[Bug] KQL Validation Add Wildcard w/ Space token value #5753
## Summary
Fixes KQL parser to support wildcard values containing spaces (e.g., `*S3 Browser*`), which work in Kibana but were rejected by our unit tests.
**Issue:** #5750
## Changes
### Grammar (`lib/kql/kql/kql.g`)
- Added `WILDCARD_LITERAL` token with priority 3 to match wildcard patterns containing spaces
- Uses negative lookahead to stop before `or`/`and`/`not` keywords
- Added to `value` rule (not `literal`) so field names remain unaffected
### Parser (`lib/kql/kql/parser.py`)
- Handle new `WILDCARD_LITERAL` token type as wildcards
- Quoted strings (`"*text*"`) now treated as literals, matching Kibana behavior
## Behavior
| Query | Before | After |
|-------|--------|-------|
| `field: *S3 Browser*` | ❌ Parse error | ✅ Wildcard |
| `field: *test*` | ✅ Wildcard | ✅ Wildcard |
| `common.*: value` | ✅ Works | ✅ Works |
| `field: "*text*"` | Wildcard | ✅ Literal (matches Kibana) |
## Test plan
- [x] All 63 existing KQL unit tests pass
- [x] New wildcard-with-spaces patterns parse correctly
- [x] Wildcard field names (`common.*`) still work
- [x] Keywords (`or`, `and`, `not`) correctly recognized as separators
- [x] Tested against rule file from PR #5694
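The quoted-versus-unquoted distinction in the Behavior table can be illustrated with a minimal sketch. The function `classify_value` is hypothetical and is not part of `lib/kql`; it assumes a value counts as quoted only when it both starts and ends with a double quote, and it ignores escaped `\*` sequences.

```python
def classify_value(raw: str) -> str:
    """Classify a KQL value as 'wildcard' or 'literal' (illustrative only)."""
    raw = raw.strip()
    # Assumption: a fully double-quoted value is always a literal,
    # even if it contains '*', mirroring the Kibana behavior described above.
    if len(raw) >= 2 and raw[0] == '"' and raw[-1] == '"':
        return "literal"
    # Assumption: any unquoted value containing '*' is a wildcard.
    if "*" in raw:
        return "wildcard"
    return "literal"
```

For example, `classify_value('*S3 Browser*')` yields `"wildcard"`, while `classify_value('"*text*"')` yields `"literal"`.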
Bug - Guidelines
These guidelines serve as a reminder set of considerations when addressing a bug in the code.
Documentation and Context
Code Standards and Practices
Testing
Additional Checks
update kibana and kql pyproject.toml versions
…thub.com/elastic/detection-rules into bug_fix_kql_parser_spaces_with_wildcards
I don't think this works as intended yet. If you run the updated parser over the current rules on main and have it log each time it sees a wildcard literal, it will detect the value
Reference Implementation for detection
Logfile result after installing via
Here is a suggestion that appears to fix the issue. There is probably a better way to do this; it is just an example.
…ding keywords
Add negative lookahead at start of Pattern 2: uses `(?!(?:or|and|not)\b)` at the start to prevent matching values that begin with keywords like `not /path*`
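The effect of that lookahead can be checked in isolation with Python's `re` module. In this sketch only the guard `(?!(?:or|and|not)\b)` comes from the commit message; `PATTERN_2` is a simplified, hypothetical stand-in for the real Pattern 2 in `kql.g`, which is not shown in this thread.

```python
import re

# Guard taken from the commit message: fail if the value starts with a keyword
# followed by a word boundary.
KEYWORD_GUARD = r"(?!(?:or|and|not)\b)"

# Hypothetical stand-in for Pattern 2: a non-space, non-'*' start,
# then anything without '*', then a trailing '*'.
BODY = r"[^\s*][^*]*\*"

UNGUARDED = re.compile(BODY)
PATTERN_2 = re.compile(KEYWORD_GUARD + BODY)


def matches(pattern: re.Pattern, text: str) -> bool:
    """Return True if the pattern matches at the start of text."""
    return pattern.match(text) is not None
```

Without the guard, `not /path*` is swallowed as one value; with it, the match fails at the keyword while `notafile*` and `S3 Browser*` still match, because `\b` requires the keyword to end at a word boundary.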
…aced phrase
# KQL Parser Changes - Wildcard Spaces and NOT Prefix Fix
## Overview
This update fixes two issues in the KQL parser:
1. **Wildcard values with spaces** - Values like `*S3 Browser*` now parse correctly
2. **NOT prefix false match** - Values like `not /tmp/go-build*` are no longer incorrectly consumed as a single wildcard literal
## Files Modified
### `lib/kql/kql/kql.g` (Grammar)
**Added `optional_not` rule** to handle `NOT` as an explicit grammar element:
```
?list_of_values: "(" or_list_of_values ")"
               | optional_not value
?optional_not: NOT optional_not
             |
```
**Expanded `WILDCARD_LITERAL`** with 4 patterns to support all wildcard-with-space cases:
| Pattern | Description | Example |
|---------|-------------|---------|
| 1 | Starts with `*` | `*S3 Browser`, `*S3 Browser*` |
| 2 | Ends with `*` (doesn't start with `*`) | `S3 Browser*` |
| 3a | `*` appears after a space | `S3 B*owser` |
| 3b | `*` appears before a space | `S3* Browser` |
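To make the four cases in the table concrete, here is a small classifier written from the descriptions alone. It is not the regex used in `kql.g`; `wildcard_pattern` is a hypothetical helper, and it assumes that a value qualifies only if it contains both a `*` and a space, and that patterns 3a and 3b are distinguished by whether the first space or the first `*` comes first.

```python
from typing import Optional


def wildcard_pattern(value: str) -> Optional[str]:
    """Return the pattern label ('1', '2', '3a', '3b') or None (illustrative)."""
    # Assumption: only wildcard values that contain a space are covered here.
    if "*" not in value or " " not in value:
        return None
    if value.startswith("*"):
        return "1"   # starts with '*'
    if value.endswith("*"):
        return "2"   # ends with '*' but does not start with it
    first_star = value.index("*")
    first_space = value.index(" ")
    # Assumption: 3a means the first '*' follows a space, 3b means it precedes one.
    return "3a" if first_space < first_star else "3b"
```

With these assumptions, `wildcard_pattern("S3 B*owser")` gives `"3a"` and `wildcard_pattern("S3* Browser")` gives `"3b"`.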
### `lib/kql/kql/parser.py`
Added methods to handle the new grammar rules:
- `list_of_values()` - handles `optional_not value` structure
- `optional_not()` - counts NOT occurrences and wraps values with `NotValue`
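The "count and wrap" step can be sketched as follows. The classes below are hypothetical stand-ins, not the actual AST nodes in `lib/kql`, and the sketch assumes one `NotValue` wrapper per `NOT`; the real implementation may instead collapse double negation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Wildcard:
    """Hypothetical stand-in for a parsed wildcard value."""
    pattern: str


@dataclass(frozen=True)
class NotValue:
    """Hypothetical stand-in for a negated value node."""
    value: object


def apply_optional_not(not_count: int, value: object) -> object:
    """Wrap value in NotValue once per NOT seen (assumption: no collapsing)."""
    for _ in range(not_count):
        value = NotValue(value)
    return value
```

For `process.executable: not /tmp/go-build*`, the count is 1, so the wildcard would be wrapped once: `apply_optional_not(1, Wildcard("/tmp/go-build*"))`.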
### `lib/kql/kql/kql2eql.py`
Added corresponding methods for EQL conversion:
- `list_of_values()` - handles `optional_not value` structure
- `optional_not()` - counts NOT occurrences and wraps with `eql.ast.Not`
## Test Results
All 63 kuery tests pass. Verified wildcard cases:
| Input | Result |
|-------|--------|
| `field: *S3 Browser*` | `field:*S3\ Browser*` |
| `field: S3 Browser*` | `field:S3\ Browser*` |
| `field: *S3 Browser` | `field:*S3\ Browser` |
| `field: S3 B*owser` | `field:S3\ B*owser` |
| `field: S3* Browser` | `field:S3*\ Browser` |
| `field: foo* bar* baz` | `field:foo*\ bar*\ baz` |
| `process.executable: not /tmp/go-build*` | `not process.executable:/tmp/go-build*` |
| `field < value` | `field < value` (range expression, not wildcard) |
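The wildcard rows in the Result column show spaces escaped with a backslash. A minimal sketch of that rendering, assuming only spaces need escaping (the real serializer very likely handles more characters), with `render_field_value` as a hypothetical name:

```python
def render_field_value(field: str, value: str) -> str:
    """Render 'field:value' with spaces backslash-escaped (illustrative only)."""
    # Assumption: only the space character is escaped in this sketch.
    escaped = value.replace(" ", "\\ ")
    return f"{field}:{escaped}"
```

Calling `render_field_value("field", "foo* bar* baz")` reproduces the `field:foo*\ bar*\ baz` row from the table.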
## Technical Notes
### Pattern 3a Fix
Pattern 3a requires at least one character AFTER the `*` (uses `[...]+` instead of `[...]*`). This prevents Pattern 2 from incorrectly matching shorter strings like `S3 B*` when the full value is `S3 B*owser`.
### NOT Keyword Handling
The `optional_not` grammar approach explicitly parses `NOT` as a keyword before the value, preventing it from being consumed as part of a wildcard literal. This is safer than regex-only approaches because:
- `NOT` token only matches the exact word "not" (case-insensitive)
- Values like `notafile*` are still parsed as `UNQUOTED_LITERAL`
- Edge case: literal value "not" must be quoted: `field: "not"`
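The three points above can be demonstrated without the grammar by peeling leading `not` keywords off a value with a word-boundary regex. This is an approximation of the described behavior and not the Lark-based parser itself; `split_leading_nots` is a hypothetical helper.

```python
import re
from typing import Tuple

# Matches the exact word "not" (any case) at the current position.
LEADING_NOT = re.compile(r"\s*not\b", re.IGNORECASE)


def split_leading_nots(text: str) -> Tuple[int, str]:
    """Return (number of leading NOT keywords, remaining value)."""
    count = 0
    pos = 0
    while True:
        match = LEADING_NOT.match(text, pos)
        if match is None:
            break
        count += 1
        pos = match.end()
    return count, text[pos:].strip()
```

Here `not /tmp/go-build*` splits into one keyword plus the wildcard, `notafile*` is left untouched, and a bare `not` is consumed entirely as a keyword, which is why the literal value has to be written as `"not"`.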
@eric-forte-elastic I implemented your suggestion; the changes are summarized under "KQL Parser Changes - Wildcard Spaces and NOT Prefix Fix" above.
Working Query in Kibana
Parser changes working with rule from PR #5694
All KQL unit tests passing w/ changes