[Bug] KQL Validation Add Wildcard w/ Space token value #5753

Open
imays11 wants to merge 8 commits into main from bug_fix_kql_parser_spaces_with_wildcards

imays11 commented Feb 20, 2026

Test plan evidence (screenshots attached to the PR):

  • Working query in Kibana (screenshot, 2026-02-20, 2:08:56 PM)
  • Parser changes working with the rule from PR #5694 (screenshot, 2026-02-20, 6:28:41 PM)
  • All KQL unit tests passing with the changes (screenshot, 2026-02-20, 6:07:49 PM)

## Summary
Fixes KQL parser to support wildcard values containing spaces (e.g., `*S3 Browser*`), which work in Kibana but were rejected by our unit tests.

**Issue:** #5750

## Changes

### Grammar (`lib/kql/kql/kql.g`)
- Added `WILDCARD_LITERAL` token with priority 3 to match wildcard patterns containing spaces
- Uses negative lookahead to stop before `or`/`and`/`not` keywords
- Added to `value` rule (not `literal`) so field names remain unaffected
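
As a rough illustration of the lookahead idea, the sketch below uses Python's `re` module to match a value that starts with `*`, may contain spaces, and stops before a standalone `or`, `and`, or `not`. The regex in `kql.g` is not shown in this description, so the pattern and the names `WILDCARD_WITH_SPACES` and `match_wildcard` are assumptions made for the example.

```python
import re

# Hypothetical pattern, NOT the one in kql.g: a value starting with "*",
# followed by space-separated words, where no word may be a boolean keyword.
WILDCARD_WITH_SPACES = re.compile(
    r'\*[^\s()"]*(?:\s+(?!(?:or|and|not)\b)[^\s()"]+)*',
    re.IGNORECASE,
)

def match_wildcard(text):
    """Return the wildcard value at the start of `text`, or None."""
    m = WILDCARD_WITH_SPACES.match(text)
    return m.group(0) if m else None

# The keyword ends the value; a word that merely starts with "or" does not.
print(match_wildcard("*S3 Browser* or other.field: x"))  # *S3 Browser*
print(match_wildcard("*big order* and x"))               # *big order*
```

The `\b` after the keyword group is what keeps words such as `order` from being mistaken for `or`.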

### Parser (`lib/kql/kql/parser.py`)
- Handle new `WILDCARD_LITERAL` token type as wildcards
- Quoted strings (`"*text*"`) now treated as literals, matching Kibana behavior
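
The quoted-versus-unquoted distinction can be pictured with a small sketch. `classify_value` is a hypothetical helper, not a function in `parser.py`; it only mirrors the behavior table in this description and ignores escaped characters such as `\*`.

```python
def classify_value(raw: str) -> str:
    """Return "literal" or "wildcard" for a raw KQL value (illustrative only)."""
    text = raw.strip()
    # A double-quoted value is a phrase: any "*" inside it is a plain character.
    if len(text) >= 2 and text.startswith('"') and text.endswith('"'):
        return "literal"
    # An unquoted value containing "*" is treated as a wildcard.
    return "wildcard" if "*" in text else "literal"

print(classify_value('"*text*"'))      # literal
print(classify_value("*S3 Browser*"))  # wildcard
```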

## Behavior

| Query | Before | After |
|-------|--------|-------|
| `field: *S3 Browser*` | ❌ Parse error | ✅ Wildcard |
| `field: *test*` | ✅ Wildcard | ✅ Wildcard |
| `common.*: value` | ✅ Works | ✅ Works |
| `field: "*text*"` | Wildcard | ✅ Literal (matches Kibana) |

## Test plan
- [x] All 63 existing KQL unit tests pass
- [x] New wildcard-with-spaces patterns parse correctly
- [x] Wildcard field names (`common.*`) still work
- [x] Keywords (`or`, `and`, `not`) correctly recognized as separators
- [x] Tested against rule file from PR #5694
imays11 self-assigned this on Feb 20, 2026
imays11 added the bug, python, and Team: TRADE labels on Feb 20, 2026
github-actions commented

Bug - Guidelines

These guidelines are a reminder of the considerations to keep in mind when fixing a bug in the code.

Documentation and Context

  • Provide detailed documentation (description, screenshots, reproducing the bug, etc.) of the bug if not already documented in an issue.
  • Include additional context or details about the problem.
  • Ensure the fix includes necessary updates to the release documentation and versioning.

Code Standards and Practices

  • Code follows established design patterns within the repo and avoids duplication.
  • Ensure that the code is modular and reusable where applicable.

Testing

  • New unit tests have been added to cover the bug fix or edge cases.
  • Existing unit tests have been updated to reflect the changes.
  • Provide evidence of testing and detecting the bug fix (e.g., test logs, screenshots).
  • Validate that any rules affected by the bug are correctly updated.
  • Ensure that performance is not negatively impacted by the changes.
  • Verify that any release artifacts are properly generated and tested.
  • Conducted system testing, including fleet, import, and create APIs (e.g., run make test-cli, make test-remote-cli, make test-hunting-cli)

Additional Checks

  • Verify that the bug fix works across all relevant environments (e.g., different OS versions).
  • Confirm that the proper version label (patch, minor, or major) is applied to the PR.

imays11 added the patch label on Feb 23, 2026
eric-forte-elastic commented

I don't think this works as intended yet. If you run the updated parser over the current rules on main and have it log each time it sees a wildcard literal, it detects the value not /tmp/go-build* as a wildcard literal in rules/linux/execution_suspicious_executable_running_system_commands.toml, which I do not think we want.

Reference Implementation for detection

parser_py.patch

Logfile result after installing via make deps and running unit tests:
(this is all one detection, just runs through multiple unit tests)

---------- WILDCARD LITERAL DETECTED ----------
token.value: not /tmp/go-build*
(these two lines repeat 11 more times in the log)


Here is a suggestion that appears to fix the issue. There is probably a better way to do this; it is just an example.

my_changes.patch

…ding keywords

Add a negative lookahead at the start of Pattern 2: it uses (?!(?:or|and|not)\b) to prevent matching values that begin with a keyword, such as 'not /path*'.
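
To see why a lookahead at the start of the pattern helps, here is a sketch with two simplified regexes. Neither is the actual Pattern 2 from kql.g; `NAIVE`, `GUARDED`, and `is_single_token` are stand-ins written for this example, and only the `(?!(?:or|and|not)\b)` prefix comes from the commit message.

```python
import re

# Simplified "ends with *" pattern: space-separated words, then a final "*".
BODY = r"[^\s*]+(?: [^\s*]+)*\*"
NAIVE = re.compile(BODY, re.IGNORECASE)
# Same pattern, but refuse to start on a standalone boolean keyword.
GUARDED = re.compile(r"(?!(?:or|and|not)\b)" + BODY, re.IGNORECASE)

def is_single_token(pattern, text):
    """True if the whole text would be consumed as one wildcard token."""
    return pattern.fullmatch(text) is not None

print(is_single_token(NAIVE, "not /tmp/go-build*"))    # True  (the reported bug)
print(is_single_token(GUARDED, "not /tmp/go-build*"))  # False
print(is_single_token(GUARDED, "notafile*"))           # True  ("not" is only a prefix)
```
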
imays11 marked this pull request as draft on February 23, 2026, 17:50
…aced phrase

# KQL Parser Changes - Wildcard Spaces and NOT Prefix Fix

## Overview

This update fixes two issues in the KQL parser:
1. **Wildcard values with spaces** - Values like `*S3 Browser*` now parse correctly
2. **NOT prefix false match** - Values like `not /tmp/go-build*` are no longer incorrectly consumed as a single wildcard literal

## Files Modified

### `lib/kql/kql/kql.g` (Grammar)

**Added `optional_not` rule** to handle `NOT` as an explicit grammar element:
```
?list_of_values: "(" or_list_of_values ")"
               | optional_not value
?optional_not: NOT optional_not
             |
```

**Expanded `WILDCARD_LITERAL`** with 4 patterns to support all wildcard-with-space cases:

| Pattern | Description | Example |
|---------|-------------|---------|
| 1 | Starts with `*` | `*S3 Browser`, `*S3 Browser*` |
| 2 | Ends with `*` (doesn't start with `*`) | `S3 Browser*` |
| 3a | `*` appears after a space | `S3 B*owser` |
| 3b | `*` appears before a space | `S3* Browser` |
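
The table can be read as a decision procedure. The sketch below labels a value with the pattern number it falls under; `wildcard_pattern` is a hypothetical helper and the order of the checks is an assumption, since the grammar's regex alternation is not reproduced here.

```python
from typing import Optional

def wildcard_pattern(value: str) -> Optional[str]:
    """Label a space-containing wildcard value with the table's pattern number."""
    if "*" not in value or " " not in value:
        return None  # not a wildcard-with-space value
    if value.startswith("*"):
        return "1"
    if value.endswith("*"):
        return "2"
    # Interior "*": compare the first "*" with the first space.
    if value.index("*") < value.index(" "):
        return "3b"  # "*" appears before a space
    return "3a"      # "*" appears after a space

for example in ("*S3 Browser*", "S3 Browser*", "S3 B*owser", "S3* Browser"):
    print(example, "->", wildcard_pattern(example))
```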

### `lib/kql/kql/parser.py`

Added methods to handle the new grammar rules:
- `list_of_values()` - handles `optional_not value` structure
- `optional_not()` - counts NOT occurrences and wraps values with `NotValue`
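
A minimal sketch of the wrapping step, assuming each `NOT` adds one layer of negation. The `Value` and `NotValue` classes below are stand-ins defined for this example, not the real classes from the `kql` package, and whether the actual implementation nests or collapses double negation is not stated in this PR.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Value:
    """Stand-in for a parsed KQL value node."""
    text: str

@dataclass(frozen=True)
class NotValue:
    """Stand-in for the parser's negation wrapper."""
    inner: object

def apply_nots(not_count: int, value):
    """Wrap `value` once per NOT keyword that preceded it (assumed behavior)."""
    for _ in range(not_count):
        value = NotValue(value)
    return value

print(apply_nots(1, Value("/tmp/go-build*")))
```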

### `lib/kql/kql/kql2eql.py`

Added corresponding methods for EQL conversion:
- `list_of_values()` - handles `optional_not value` structure
- `optional_not()` - counts NOT occurrences and wraps with `eql.ast.Not`

## Test Results

All 63 kuery tests pass. Verified wildcard cases:

| Input | Result |
|-------|--------|
| `field: *S3 Browser*` | `field:*S3\ Browser*` |
| `field: S3 Browser*` | `field:S3\ Browser*` |
| `field: *S3 Browser` | `field:*S3\ Browser` |
| `field: S3 B*owser` | `field:S3\ B*owser` |
| `field: S3* Browser` | `field:S3*\ Browser` |
| `field: foo* bar* baz` | `field:foo*\ bar*\ baz` |
| `process.executable: not /tmp/go-build*` | `not process.executable:/tmp/go-build*` |
| `field < value` | `field < value` (range expression, not wildcard) |
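
The `Result` column shows spaces in unquoted values rendered as `\ `. A sketch of that one escaping step follows; `render_field_value` is a hypothetical helper, and the real serializer presumably escapes other characters as well.

```python
def render_field_value(field: str, value: str) -> str:
    """Render `field:value`, backslash-escaping spaces in the unquoted value."""
    escaped = value.replace(" ", "\\ ")
    return f"{field}:{escaped}"

print(render_field_value("field", "*S3 Browser*"))   # field:*S3\ Browser*
print(render_field_value("field", "foo* bar* baz"))  # field:foo*\ bar*\ baz
```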

## Technical Notes

### Pattern 3a Fix
Pattern 3a requires at least one character AFTER the `*` (uses `[...]+` instead of `[...]*`). This prevents Pattern 2 from incorrectly matching shorter strings like `S3 B*` when the full value is `S3 B*owser`.

### NOT Keyword Handling
The `optional_not` grammar approach explicitly parses `NOT` as a keyword before the value, preventing it from being consumed as part of a wildcard literal. This is safer than regex-only approaches because:
- `NOT` token only matches the exact word "not" (case-insensitive)
- Values like `notafile*` are still parsed as `UNQUOTED_LITERAL`
- Edge case: literal value "not" must be quoted: `field: "not"`
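
The three bullets can be checked with a small sketch that peels leading `NOT` keywords off a value. It uses Python's `re` module rather than the grammar itself; `LEADING_NOT` and `split_leading_nots` are hypothetical names, and treating "followed by whitespace or end of input" as the keyword boundary is an assumption.

```python
import re

# A leading NOT keyword: the word "not" in any case, followed by
# whitespace or the end of the input.
LEADING_NOT = re.compile(r"not(?:\s+|$)", re.IGNORECASE)

def split_leading_nots(text: str):
    """Return (number of leading NOT keywords, remaining value text)."""
    count = 0
    rest = text.lstrip()
    while True:
        m = LEADING_NOT.match(rest)
        if not m:
            return count, rest
        count += 1
        rest = rest[m.end():]

print(split_leading_nots("not /tmp/go-build*"))  # (1, '/tmp/go-build*')
print(split_leading_nots("notafile*"))           # (0, 'notafile*')
print(split_leading_nots('"not"'))               # (0, '"not"')
```

A bare `not` comes back as `(1, '')`, a keyword with no value left, which is why the literal has to be quoted.
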

imays11 commented Feb 23, 2026

@eric-forte-elastic I implemented your optional_not approach and added support for phrases that contain a space with the wildcard somewhere in the middle. The previous implementation only handled a wildcard at the beginning or at the end.

imays11 marked this pull request as ready for review on February 23, 2026, 18:24
Labels

backport: auto, bug, patch, python, Team: TRADE
