Add work-in-progress implementation of a new Python parser by JukkaL · Pull Request #20856 · python/mypy

JukkaL · 2026-02-21T13:23:28Z

The new "native" parser (mypy.nativeparse) will eventually replace the current parser (mypy.fastparse). The native parser uses a Rust extension that wraps the Ruff parser to generate a serialized AST, and mypy will deserialize the AST directly into a mypy AST. The binary format is the same one we already use for mypy fixed-format incremental caches.
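The sketch below makes this description more concrete. It deserializes a tagged byte stream straight into AST node objects, with no intermediate tree in between. It borrows only the general shape (a tag, a payload, an end marker). The tag values, the `Buffer` class and the `IntExpr`/`NameExpr` stand-ins are hypothetical. They are not mypy's fixed-format encoding or its node classes.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical tag values for this sketch only.
INT_EXPR = 1
NAME_EXPR = 2
END_TAG = 255


@dataclass
class IntExpr:
    value: int


@dataclass
class NameExpr:
    name: str


class Buffer:
    """Minimal read cursor over a bytes object (stand-in for a real read buffer)."""

    def __init__(self, data: bytes) -> None:
        self.data = data
        self.pos = 0

    def read_byte(self) -> int:
        b = self.data[self.pos]
        self.pos += 1
        return b

    def read_str(self) -> str:
        # One length byte followed by UTF-8 data; enough for short names.
        n = self.read_byte()
        s = self.data[self.pos : self.pos + n].decode("utf-8")
        self.pos += n
        return s


def expect_end_tag(buf: Buffer) -> None:
    if buf.read_byte() != END_TAG:
        raise ValueError("expected end tag")


def read_expression(buf: Buffer) -> IntExpr | NameExpr:
    """Read one tagged node and build the AST object directly."""
    tag = buf.read_byte()
    expr: IntExpr | NameExpr
    if tag == INT_EXPR:
        expr = IntExpr(buf.read_byte())  # only 0..255 in this sketch
    elif tag == NAME_EXPR:
        expr = NameExpr(buf.read_str())
    else:
        raise ValueError(f"unknown tag {tag}")
    expect_end_tag(buf)
    return expr
```

For example, the bytes `[2, 1, ord("x"), 255]` decode to `NameExpr(name='x')`.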

This is still work in progress and some features aren't supported. The most important missing feature is probably function type comments. Also, the Rust extension needs to be manually compiled from https://github.com/mypyc/ast_serialize. Refer to the ast_serialize repository for instructions. There is no CI support for the new parser right now -- there are tests, but they are skipped unless the ast_serialize extension is installed, and it isn't installed in CI right now.

Once the Rust extension is installed, use --native-parser to enable the new parser. The main type checker test suite can be run using the native parser via TEST_NATIVE_PARSER=1 pytest mypy/test/testcheck.py (the TEST_NATIVE_PARSER environment variable needs to be set). A bunch of tests are still failing.
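A test module could combine the two conditions described above: the extension must be installed, and the environment variable must be set. The sketch below shows one way to do that. The helper names `native_parser_available` and `use_native_parser` are hypothetical and are not taken from mypy's test code. The sketch also assumes that any non-empty value of TEST_NATIVE_PARSER opts in.

```python
import importlib.util
import os


def native_parser_available() -> bool:
    """Return True if the ast_serialize extension can be found on the import path."""
    return importlib.util.find_spec("ast_serialize") is not None


def use_native_parser() -> bool:
    """Hypothetical gate: opt in via TEST_NATIVE_PARSER, but only if the extension exists."""
    return bool(os.environ.get("TEST_NATIVE_PARSER")) and native_parser_available()
```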

Related issue with more context: #19776

Remaining work is tracked here for now: https://github.com/mypyc/ast_serialize/issues

Here are the expected benefits over the old mypy parser, adapted from the docstring of mypy/nativeparse.py:

- No intermediate non-mypyc Python-level AST is created, which improves performance
- Parsing doesn't need the GIL, so multiple threads can construct serialized ASTs in parallel
- Import dependencies can be produced without building an AST, which helps parallel type checking
- All Python syntax is supported even if mypy is running on an older Python version
- An AST is generated even if there are syntax errors
- Potential to support incremental parsing (quickly process modified sections of a file)
- Function bodies in third-party code can be stripped earlier, for extra performance
- We have the option to easily add support for # mypy: ignore comments
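Parsing does not need the GIL, so a driver could spread serialization across threads. The minimal sketch below uses `concurrent.futures`. `serialize_file` is a hypothetical stand-in for a call into the Rust extension. The pure-Python stand-in shown here gives no real speedup. A speedup only happens if the extension releases the GIL while it parses.

```python
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor


def serialize_file(source: str) -> bytes:
    # Hypothetical stand-in: a real implementation would call into the Rust
    # extension, which can release the GIL while it parses and serializes.
    return source.encode("utf-8")


def serialize_all(sources: list[str], max_workers: int = 4) -> list[bytes]:
    """Serialize many sources on a thread pool; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(serialize_file, sources))
```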

Most of the code is straightforward and repetitive deserialization code. I relied heavily on coding agents to implement the deserialization and to add tests. The tests are separate from the pre-existing parser tests, but we can unify them later (or delete the old tests once we delete the old parser).

@ilevkivskyi contributed to this PR.


github-actions · 2026-02-21T13:43:48Z

According to mypy_primer, this change doesn't affect type check results on a corpus of open source code. ✅

ilevkivskyi

LG, thanks! Here are some comments; these are mostly minor. If you want to, you can address them in a follow-up PR (but then please don't forget to, because I will).

ilevkivskyi · 2026-02-21T14:01:50Z

mypy/nativeparse.py

+import os
+from typing import Any, Final, cast
+
+import ast_serialize  # type: ignore[import-untyped, import-not-found, unused-ignore]


import-untyped should not be needed anymore; we now ship the stub in the latest ast_serialize.

ilevkivskyi · 2026-02-21T14:03:00Z

mypy/nativeparse.py

+class State:
+    def __init__(self, options: Options) -> None:
+        self.options = options
+        self.errors: list[dict[str, Any]] = []


I think it is better to use a TypedDict here.
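The sketch below shows what this suggestion might look like. The TypedDict name `ParseErrorInfo` and its keys (`line`, `column`, `message`) are hypothetical. The real error payload may carry different or additional fields. The `add_error` helper is also an illustration only.

```python
from __future__ import annotations

from typing import Any, TypedDict


class ParseErrorInfo(TypedDict):
    """Hypothetical shape of one parse error reported by the serializer."""

    line: int
    column: int
    message: str


class State:
    def __init__(self, options: Any) -> None:
        self.options = options
        # Unlike dict[str, Any], a TypedDict lets mypy check the keys and
        # value types of each error entry.
        self.errors: list[ParseErrorInfo] = []

    def add_error(self, line: int, column: int, message: str) -> None:
        self.errors.append({"line": line, "column": column, "message": message})
```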

ilevkivskyi · 2026-02-21T14:13:27Z

mypy/nativeparse.py

+        1 -> An IfStmt if the reachability of it can't be inferred,
+             i.e. the truth value is unknown.
+    """
+    infer_reachability_of_if_statement(stmt, options)


This looks like double work: we already infer the reachability of if-blocks in ast_serialize, right? Or am I missing something?

ilevkivskyi · 2026-02-21T14:15:08Z

mypy/nativeparse.py

+
+def native_parse(
+    filename: str, options: Options, skip_function_bodies: bool = False
+) -> tuple[MypyFile, list[dict[str, Any]], TypeIgnores]:


Same as above: we should return a TypedDict (or maybe even an instance of a trivial class, like ParseError).
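The sketch below shows the "trivial class" alternative. It is a plain class with `__slots__`, so callers use attribute access (`err.line`) instead of string keys. With it, the return type would become `tuple[MypyFile, list[ParseError], TypeIgnores]`. The fields and the `to_parse_errors` conversion helper are hypothetical. So are the dict keys the helper reads.

```python
from __future__ import annotations

from typing import Any


class ParseError:
    """Hypothetical plain error record with a fixed set of fields."""

    __slots__ = ("line", "column", "message", "code")

    def __init__(self, line: int, column: int, message: str, code: str | None = None) -> None:
        self.line = line
        self.column = column
        self.message = message
        self.code = code

    def __repr__(self) -> str:
        return f"ParseError({self.line}:{self.column}: {self.message})"


def to_parse_errors(raw: list[dict[str, Any]]) -> list[ParseError]:
    """Convert dict-shaped errors into ParseError objects (assumed key names)."""
    return [ParseError(e["line"], e["column"], e["message"], e.get("code")) for e in raw]
```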

ilevkivskyi · 2026-02-21T14:18:11Z

mypy/nativeparse.py

+
+    Returns:
+        A tuple containing:
+        - MypyFile: The parsed AST as a mypy AST node


Explain which attributes the caller should set manually (I see the caller in parse.py adds ignored_lines and is_stub).

ilevkivskyi · 2026-02-21T14:43:50Z

mypy/nativeparse.py

+                code="misc",
+            )
+
+    # Process keyword arguments


Again, multiple pointless comments here and below.

ilevkivskyi · 2026-02-21T14:45:46Z

mypy/nativeparse.py

+bin_ops: Final = ["+", "-", "*", "@", "/", "%", "**", "<<", ">>", "|", "^", "&", "//"]
+bool_ops: Final = ["and", "or"]
+cmp_ops: Final = ["==", "!=", "<", "<=", ">", ">=", "is", "is not", "in", "not in"]
+unary_ops: Final = ["~", "not", "+", "-"]


Mention that the order of these must be kept in sync with ast_serialize.
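The ordering dependency would matter if the serialized stream stores each operator as its index into these lists. The excerpt does not confirm that encoding, so the sketch below assumes it. It adds the requested sync comment and decodes an index with a bounds check. The `decode_op` helper is hypothetical.

```python
from typing import Final

# NOTE: The order of these lists must be kept in sync with ast_serialize,
# since (in this sketch) operators are encoded as indices into them.
bin_ops: Final = ["+", "-", "*", "@", "/", "%", "**", "<<", ">>", "|", "^", "&", "//"]
bool_ops: Final = ["and", "or"]
cmp_ops: Final = ["==", "!=", "<", "<=", ">", ">=", "is", "is not", "in", "not in"]
unary_ops: Final = ["~", "not", "+", "-"]


def decode_op(table: list[str], index: int) -> str:
    """Map a serialized operator index back to its source spelling."""
    if not 0 <= index < len(table):
        raise ValueError(f"operator index {index} out of range for table of size {len(table)}")
    return table[index]
```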

ilevkivskyi · 2026-02-21T14:59:11Z

mypy/nativeparse.py

+        read_loc(data, expr)
+        expect_end_tag(data)
+        return expr
+    elif tag == nodes.BIG_INT_EXPR:


Why do we need both INT_EXPR and BIG_INT_EXPR? Can we simplify this?
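The excerpt doesn't say why two integer tags exist. A common reason is that small values use a compact fixed-width encoding, while values outside that range need an arbitrary-precision representation. If that is the motivation, a single variable-length encoding is one possible simplification. The excerpt does not show mypy or ast_serialize doing this. The sketch below uses zigzag mapping plus unsigned LEB128, so that one tag can cover every Python int. The function names are hypothetical.

```python
from __future__ import annotations


def encode_int(n: int) -> bytes:
    """Encode any Python int as zigzag + unsigned LEB128 bytes."""
    # Zigzag maps 0, -1, 1, -2, 2, ... to 0, 1, 2, 3, 4, ... so small magnitudes stay short.
    z = 2 * n if n >= 0 else -2 * n - 1
    out = bytearray()
    while True:
        byte = z & 0x7F
        z >>= 7
        if z:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_int(data: bytes, pos: int = 0) -> tuple[int, int]:
    """Decode an int written by encode_int; return (value, next position)."""
    z = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        z |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            break
    # Undo the zigzag mapping.
    n = z // 2 if z % 2 == 0 else -(z + 1) // 2
    return n, pos
```

Small values such as 63 take one byte, and 64 takes two (`b"\x80\x01"`). Very large values simply use more bytes, so there is no separate "big int" case.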

ilevkivskyi · 2026-02-21T15:00:12Z

mypy/nativeparse.py

+        read_loc(data, expr)
+        expect_end_tag(data)
+        return expr
+    elif tag == nodes.NAMED_EXPR:


This tag name is easy to confuse with NAME_EXPR, it may be better to rename it to ASSIGNMENT_EXPR.

ilevkivskyi · 2026-02-21T15:04:35Z

mypy/nativeparse.py

+def read_expression(state: State, data: ReadBuffer) -> Expression:
+    tag = read_tag(data)
+    expr: Expression
+    if tag == nodes.CALL_EXPR:


It may be beneficial to manually order branches here in terms of how "hot" they are (probably also for statements and/or types), unless you already did this. I did this kind of "manual PGO" for types (by looking at how many instances we create for each during mypy self-check) to help the compiler.
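The "manual PGO" described here involves two steps. First, count how often each node kind occurs on a representative workload such as a self-check. Then put the most frequent kinds first in the `if`/`elif` chain, so that common cases are matched after the fewest comparisons. The sketch below covers the counting step with `collections.Counter`. The tag strings and the idea of collecting observed tags into an iterable are hypothetical.

```python
from __future__ import annotations

from collections import Counter
from collections.abc import Iterable


def branch_order(observed_tags: Iterable[str]) -> list[str]:
    """Return tags from most to least frequent (ties keep first-seen order)."""
    return [tag for tag, _ in Counter(observed_tags).most_common()]
```

The resulting list gives a suggested order for the branches in `read_expression`. Whether the reordering helps measurably depends on the compiled code, so it is worth benchmarking.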

JukkaL added 30 commits January 1, 2026 14:47

[WIP] Add initial nativeparse test case

0c4185f

[WIP] First steps towards parsing something

8c0f906

Support multiple defs

d5291be

Read line/column information

4c72241

Remove debug print

3798d06

Fix self check, update docstring

32d855e

WIP add parse/deserialize benchmark

cf9aefd

Update for new cache format

9da3988

Deserialize member expr

c374e0a

Add data-driven tests

46fe324

Fix empty line in test output

e8ce8f8

Deserialize tuple expressions

bab774b

Deserialize binary operations

ffd4b8d

Deserialize int expressions

310a445

Deserialize assignment

2f92446

Deserialize if statement

845ae1f

Show information about panics in tests

52ed3ac

Deserialize additional node types

0681829

Deserialize comparison and bool ops

8046eaf

Add deserialization test for None, True and False

ddba816

Deserialize func defs (partial) and return statements

ce7e013

Deserialize 'pass' and test func defs more

a92202c

Deserialize parameter defaults

4db2423

Deserialize keyword args in calls

2430f83

Test *args and **kwargs in calls

5cf0bef

Minimal deserialization of class defs

7ff2f48

Support base classes

7d50e25

Deserialize floats

321c215

Deserialize unary expressions

4e27e8f

Deserialize dict expressions

e5859df

JukkaL and others added 27 commits February 15, 2026 14:08

Fix failing native parser test

83914fd

Fix test case

9e1f65d

Refactor line/column range set

95319e3

Add note about ast_serialize repo

60f733c

Remove some low-value comments from mypy.nativeparse

7ccf14f

Use native parser in testcheck.py only if env variable TEST_NATIVE_PARSER is set

fd23cb8

Example: `TEST_NATIVE_PARSER=1 pytest mypy/test/testcheck.py`.

Lint

98d9a1b

Update semantic analyzer test outputs

3e87c1a

Fix class keywords in tree transform

840bd1e

Fix an obvious ordering bug to correctly set line

144f6d6

Always allow new union syntax inside strings/comments

67ff2e5

Couple tuple-related fixes (#20840)

4270ff8

This is the mypy counter-part of mypyc/ast_serialize#12. Depends on that PR to work.

Detect partial stub packages (#20844)

3756c17

This is the mypy counter-part of mypyc/ast_serialize#13 (I am not actually using the new flag yet in `build.py`, I will do this later when the branch is in master)

Add support for inline TypedDicts (#20847)

854eea8

This is the mypy counterpart of mypyc/ast_serialize#17

Fix couple more edge cases type comments/strings (#20848)

fa08d89

This is mypy counterpart for mypyc/ast_serialize#18

Update docstring

3869d79

Remove test cases that aren't useful any more

512bf15

Update comments

9c2ca82

Remove type ignore that isn't needed any more

f7926c1

Fix self check

f09d7ba

Lint

ca7d19d

Add reachability tests

61478c2

Update docstring

99da49b

Add docstring and reorganize functions

659ecbf

Move some type read functions to be close to each other

8b31855

Clean up test code

0017e18

Merge branch 'master' into new-parser

11d04c6

JukkaL requested a review from ilevkivskyi February 21, 2026 13:23

ilevkivskyi approved these changes Feb 21, 2026
