[improvement](match) add SIMD-accelerated TokenSearcher fast path for match_any/match_all#60874

Open
airborne12 wants to merge 1 commit into apache:master from airborne12:worktree-match

@airborne12
Member

What problem does this PR solve?

Problem Summary:
When enable_match_without_inverted_index=true, the match_any and match_all functions fall back to per-row Lucene tokenization, which is expensive. This PR adds a SIMD-accelerated fast path using TokenSearcher (ported from ClickHouse's hasToken pattern) that eliminates per-row tokenization overhead for PARSER_STANDARD columns.

The fast path uses SSE4.1 SIMD instructions for substring search combined with token boundary verification. It activates only for PARSER_STANDARD with no char_filter_map and single-term query tokens, falling back to the original Lucene path otherwise.
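To make the "substring search plus token boundary verification" idea concrete, here is a minimal scalar sketch. It is not the code in this PR: the SIMD search is replaced by `std::string_view::find`, the names `is_separator` and `has_token` are hypothetical, and the separator rule (any ASCII byte that is not a letter or digit) is an assumption based on the description above.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Assumption: a separator is any ASCII byte that is not a letter or digit.
// Non-ASCII bytes are treated as part of a token in this sketch.
inline bool is_separator(unsigned char c) {
    const bool is_ascii = c < 0x80;
    const bool is_alnum = (c >= '0' && c <= '9') || (c >= 'a' && c <= 'z') ||
                          (c >= 'A' && c <= 'Z');
    return is_ascii && !is_alnum;
}

// Returns true if `token` occurs in `haystack` as a whole token, i.e. every
// candidate substring hit is checked for a separator (or string edge) on both sides.
inline bool has_token(std::string_view haystack, std::string_view token) {
    assert(!token.empty());
    std::size_t pos = haystack.find(token);  // stand-in for the SIMD substring search
    while (pos != std::string_view::npos) {
        const std::size_t end = pos + token.size();
        const bool left_ok = pos == 0 || is_separator(haystack[pos - 1]);
        const bool right_ok = end == haystack.size() || is_separator(haystack[end]);
        if (left_ok && right_ok) {
            return true;
        }
        pos = haystack.find(token, pos + 1);  // keep scanning after a rejected hit
    }
    return false;
}
```

With this sketch, `has_token("error: disk full", "disk")` succeeds, while `has_token("diskette", "disk")` fails because the byte after the hit is alphanumeric. No per-row token list is built, which is the saving the description refers to.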

Key changes:

  • Port case-insensitive ASCII StringSearcher from ClickHouse with SSE4.1 SIMD acceleration
  • Add is_lowercase field to InvertedIndexAnalyzerCtx for case-sensitivity dispatch
  • Add can_use_token_search guard with strict conditions (PARSER_STANDARD only, no char_filter_map, no array offsets)
  • Implement fast path for both match_any (OR semantics) and match_all (AND semantics)
  • Add comprehensive unit tests (36 tests) and Google Benchmark
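As an illustration of the OR versus AND semantics mentioned in the list above, the following sketch evaluates one row against a set of query terms. The function names and the `TokenPredicate` type are hypothetical and do not come from the PR; the predicate stands in for whatever per-term token search the fast path uses.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <string_view>
#include <vector>

// Hypothetical per-term check: does `row` contain `term` as a token?
using TokenPredicate = std::function<bool(std::string_view row, std::string_view term)>;

// match_any: OR semantics, the row matches if at least one term is found.
inline bool match_any_row(std::string_view row,
                          const std::vector<std::string_view>& terms,
                          const TokenPredicate& contains) {
    assert(static_cast<bool>(contains));
    return std::any_of(terms.begin(), terms.end(),
                       [&](std::string_view term) { return contains(row, term); });
}

// match_all: AND semantics, the row matches only if every term is found.
inline bool match_all_row(std::string_view row,
                          const std::vector<std::string_view>& terms,
                          const TokenPredicate& contains) {
    assert(static_cast<bool>(contains));
    return std::all_of(terms.begin(), terms.end(),
                       [&](std::string_view term) { return contains(row, term); });
}
```

Both helpers short-circuit: `match_any_row` stops at the first hit and `match_all_row` stops at the first miss, so the number of searches per row is at most the number of query terms.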

Release note

Improve match_any/match_all performance for no-index queries on PARSER_STANDARD columns using SIMD-accelerated token search.

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes. match_any/match_all without inverted index now uses a faster code path for PARSER_STANDARD columns. Results are semantically equivalent.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

… match_any/match_all

Add a fast path for match_any and match_all functions that bypasses
per-row Lucene tokenization by using SIMD-accelerated TokenSearcher
(ported from ClickHouse's hasToken pattern). This significantly improves
performance for no-index match queries on PARSER_STANDARD columns.

The fast path uses SSE4.1 SIMD instructions for substring search
combined with token boundary verification (non-alphanumeric ASCII
characters are separators). It activates only when:
- analyzer_ctx is present with PARSER_STANDARD
- no array offsets (non-array columns)
- no char_filter_map transformations
- all query tokens are single terms
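The four conditions above could be expressed as a guard along the following lines. This is only a sketch: the struct, enum, and function names are hypothetical stand-ins rather than the actual Doris types, and "single term" is modeled here as each query token carrying exactly one term, which is an assumption.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-ins for the analyzer context described in the PR.
enum class ParserSketch { kNone, kStandard, kOther };

struct AnalyzerCtxSketch {
    ParserSketch parser = ParserSketch::kNone;
    std::map<std::string, std::string> char_filter_map;  // empty means no char filter
    bool is_lowercase = false;                           // selects case-insensitive search
};

// Returns true only when every listed condition holds; otherwise the caller
// is expected to fall back to the original tokenizing path.
inline bool can_use_token_search_sketch(
        const AnalyzerCtxSketch* ctx, bool has_array_offsets,
        const std::vector<std::vector<std::string>>& query_tokens) {
    if (ctx == nullptr || ctx->parser != ParserSketch::kStandard) {
        return false;  // analyzer_ctx must be present with the standard parser
    }
    if (has_array_offsets) {
        return false;  // array columns are excluded
    }
    if (!ctx->char_filter_map.empty()) {
        return false;  // char filters would rewrite the text before tokenizing
    }
    for (const auto& terms : query_tokens) {
        if (terms.size() != 1) {
            return false;  // assumption: each query token must hold exactly one term
        }
    }
    return true;  // an empty query passes here; the real guard may differ
}
```

Keeping the guard strict means any unsupported configuration silently takes the existing path, so the fast path cannot change results for cases it was not designed for.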

Case-insensitive search is supported via dual-pattern SIMD matching
when the index has lower_case=true.
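One plausible reading of "dual-pattern" matching is that the needle is stored in both lowercase and uppercase form, and a haystack byte matches if it equals either one. The scalar sketch below shows that idea only; the type name `DualPatternSketch` is hypothetical, and the PR's SSE4.1 implementation presumably compares 16 bytes at a time rather than looping byte by byte.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

inline char ascii_lower(char c) {
    return (c >= 'A' && c <= 'Z') ? static_cast<char>(c - 'A' + 'a') : c;
}

inline char ascii_upper(char c) {
    return (c >= 'a' && c <= 'z') ? static_cast<char>(c - 'a' + 'A') : c;
}

// Hypothetical case-insensitive ASCII searcher that keeps two copies of the needle.
struct DualPatternSketch {
    std::string lower;
    std::string upper;

    explicit DualPatternSketch(std::string_view needle) {
        assert(!needle.empty());
        for (char c : needle) {
            lower.push_back(ascii_lower(c));
            upper.push_back(ascii_upper(c));
        }
    }

    // Returns the first match position at or after `from`, or npos if there is none.
    std::size_t find(std::string_view haystack, std::size_t from = 0) const {
        const std::size_t n = lower.size();
        for (std::size_t i = from; i + n <= haystack.size(); ++i) {
            std::size_t j = 0;
            // A byte matches if it equals either the lowercase or uppercase pattern byte.
            while (j < n && (haystack[i + j] == lower[j] || haystack[i + j] == upper[j])) {
                ++j;
            }
            if (j == n) {
                return i;
            }
        }
        return std::string_view::npos;
    }
};
```

Because only ASCII letters differ between the two patterns, non-letter and non-ASCII bytes must match exactly, and the haystack itself is never copied or lowercased.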
@airborne12
Member Author

run buildall

@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@doris-robot
TPC-H: Total hot run time: 28730 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit f882d32f24ea00f273699c47324b551b71a8b524, data reload: false

------ Round 1 ----------------------------------
============================================
q1	17651	4509	4284	4284
q2	q3	10653	833	535	535
q4	4684	361	251	251
q5	7559	1210	1015	1015
q6	173	173	146	146
q7	783	872	671	671
q8	9300	1467	1338	1338
q9	4809	4770	4680	4680
q10	6830	1858	1629	1629
q11	452	272	235	235
q12	712	571	460	460
q13	17788	4236	3397	3397
q14	234	241	217	217
q15	948	801	786	786
q16	762	716	676	676
q17	732	880	419	419
q18	5926	5464	5270	5270
q19	1119	982	624	624
q20	495	495	376	376
q21	4676	2072	1471	1471
q22	421	340	250	250
Total cold run time: 96707 ms
Total hot run time: 28730 ms

----- Round 2, with runtime_filter_mode=off -----
============================================
q1	4606	4533	4528	4528
q2	q3	1837	2226	1775	1775
q4	861	1193	747	747
q5	4051	4412	4308	4308
q6	191	174	137	137
q7	1761	1641	1519	1519
q8	2565	2819	2638	2638
q9	7651	7490	7365	7365
q10	2642	2862	2433	2433
q11	517	435	432	432
q12	507	623	428	428
q13	3982	4392	3637	3637
q14	296	314	286	286
q15	867	842	785	785
q16	705	754	705	705
q17	1180	1521	1293	1293
q18	6997	6741	6824	6741
q19	881	864	901	864
q20	2188	2184	2030	2030
q21	3946	3454	3372	3372
q22	461	433	378	378
Total cold run time: 48692 ms
Total hot run time: 46401 ms

@doris-robot
TPC-DS: Total hot run time: 183935 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit f882d32f24ea00f273699c47324b551b71a8b524, data reload: false

query5	5143	650	525	525
query6	324	215	197	197
query7	4206	453	284	284
query8	335	236	233	233
query9	8704	2735	2669	2669
query10	518	408	338	338
query11	16967	17432	17239	17239
query12	223	130	127	127
query13	1488	471	363	363
query14	6923	3308	3099	3099
query14_1	2973	2907	2947	2907
query15	209	202	176	176
query16	1043	507	475	475
query17	1077	764	640	640
query18	2736	445	347	347
query19	214	207	185	185
query20	138	129	128	128
query21	211	141	118	118
query22	5021	5065	4909	4909
query23	17343	16780	16644	16644
query23_1	16620	16599	16756	16599
query24	7178	1609	1227	1227
query24_1	1227	1235	1220	1220
query25	557	481	433	433
query26	1293	258	140	140
query27	2780	465	280	280
query28	4489	1894	1877	1877
query29	767	547	453	453
query30	310	245	210	210
query31	903	753	653	653
query32	80	72	72	72
query33	521	340	279	279
query34	915	890	556	556
query35	641	688	579	579
query36	1083	1157	1022	1022
query37	136	93	83	83
query38	2963	3006	2855	2855
query39	1000	880	851	851
query39_1	844	830	934	830
query40	234	156	134	134
query41	65	60	58	58
query42	110	106	107	106
query43	395	385	361	361
query44	
query45	198	188	181	181
query46	876	967	603	603
query47	2126	2110	2055	2055
query48	317	313	224	224
query49	619	455	374	374
query50	679	278	213	213
query51	4087	4160	4084	4084
query52	103	109	99	99
query53	289	331	293	293
query54	334	266	265	265
query55	89	88	83	83
query56	318	312	310	310
query57	1376	1338	1277	1277
query58	292	280	267	267
query59	2664	2707	2650	2650
query60	335	334	319	319
query61	147	140	144	140
query62	619	587	543	543
query63	312	281	282	281
query64	4860	1222	964	964
query65	
query66	1373	462	344	344
query67	16375	16518	16226	16226
query68	
query69	379	320	283	283
query70	1009	928	973	928
query71	336	308	298	298
query72	2706	2646	2426	2426
query73	527	542	326	326
query74	10056	9856	9768	9768
query75	2859	2739	2457	2457
query76	2312	1035	683	683
query77	358	380	327	327
query78	11201	11616	10774	10774
query79	1122	787	586	586
query80	1407	656	528	528
query81	562	278	250	250
query82	991	150	112	112
query83	350	258	242	242
query84	250	126	96	96
query85	1020	475	434	434
query86	426	343	338	338
query87	3115	3151	3005	3005
query88	3558	2666	2635	2635
query89	429	367	339	339
query90	1934	178	174	174
query91	171	154	132	132
query92	75	77	73	73
query93	1013	854	505	505
query94	644	316	282	282
query95	582	339	377	339
query96	633	508	230	230
query97	2490	2474	2375	2375
query98	231	218	216	216
query99	1008	1005	927	927
Total cold run time: 254580 ms
Total hot run time: 183935 ms

@hello-stephen
Contributor

BE UT Coverage Report

Increment line coverage 34.68% (77/222) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 52.50% (19574/37283)
Line Coverage 36.13% (182705/505716)
Region Coverage 32.48% (141905/436909)
Branch Coverage 33.45% (61543/184011)

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 68.02% (151/222) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 71.45% (26093/36517)
Line Coverage 54.21% (273354/504225)
Region Coverage 51.58% (227528/441088)
Branch Coverage 53.01% (97850/184603)
