-
Notifications
You must be signed in to change notification settings - Fork 1.6k
Description
Problem
SQLAlchemy's create_engine() supports json_serializer and json_deserializer parameters to customize how JSON columns are serialized/deserialized. This is documented behavior and works with PostgreSQL, MySQL, SQLite, and other dialects.
However, passing these to the Spanner dialect raises TypeError:
engine = create_engine(
"spanner:///projects/p/instances/i/databases/d",
json_serializer=lambda obj: json.dumps(obj, cls=MyEncoder),
)
# TypeError: Invalid argument(s) 'json_serializer' sent to create_engine(),
# using configuration SpannerDialect/QueuePool/Engine.Root cause
SQLAlchemy's create_engine() uses util.get_cls_kwargs() to determine which kwargs the dialect accepts. Since SpannerDialect.__init__ does not declare json_serializer or json_deserializer, they are rejected.
Other dialects (e.g., PGDialect) accept these in their __init__ and store them on the instance, where DefaultDialect exposes them as _json_serializer / _json_deserializer for use by JSON.bind_processor().
Additional complexity
Even if the dialect accepted the kwargs, there's a pipeline mismatch. SQLAlchemy expects _json_serializer to be a json.dumps-like callable (returns a string), but the Spanner dialect sets _json_serializer = JsonObject — a class constructor that produces a JsonObject instance. The actual string serialization happens later in _helpers._make_param_value_pb when it calls obj.serialize().
A proposed fix uses a serialize-then-wrap strategy: pre-serialize with the user's function, then JsonObject.from_str() the result. This requires no changes to JsonObject or the core google-cloud-spanner library.
Expected behavior
engine = create_engine(
"spanner:///...",
json_serializer=lambda obj: json.dumps(obj, default=my_handler),
)should work, allowing custom types in JSON columns to be serialized correctly through the DML path.
Environment
sqlalchemy-spanner: 1.7.0sqlalchemy: 2.0.xgoogle-cloud-spanner: 3.x