Draft
18 commits
5be0a01
docs: design for restricted diagrams (#865, #1110)
dimitri-yatsenko Feb 21, 2026
3adc661
docs: add restriction semantics for non-downstream nodes and OR/AND
dimitri-yatsenko Feb 21, 2026
db0ab4e
docs: clarify convergence semantics and delete vs export scope
dimitri-yatsenko Feb 21, 2026
315bef8
docs: two distinct operators — cascade (OR) and restrict (AND)
dimitri-yatsenko Feb 21, 2026
6049fc3
docs: unify drop under diagram, shared traversal infrastructure
dimitri-yatsenko Feb 21, 2026
d63c130
feat: implement graph-driven cascade delete and restrict on Diagram
dimitri-yatsenko Feb 22, 2026
9fe2df7
Merge upstream/master into design/restricted-diagram
dimitri-yatsenko Feb 22, 2026
ae0eddd
fix: resolve mypy errors in codecs and hash_registry
dimitri-yatsenko Feb 22, 2026
3c028d1
ci: trigger fresh CI run
dimitri-yatsenko Feb 23, 2026
8cdf42d
feat: bump version to 2.2.0dev0
dimitri-yatsenko Feb 23, 2026
f4742be
fix: use restrict_in_place for cascade restrictions in Diagram
dimitri-yatsenko Feb 23, 2026
a2d2693
fix: store part_integrity and cascade_seed on Diagram instance
dimitri-yatsenko Feb 24, 2026
d2626e0
fix: use post-hoc enforce check matching old Table.delete() behavior
dimitri-yatsenko Feb 24, 2026
b88ede7
fix: use restriction_attributes property instead of private _restrict…
dimitri-yatsenko Feb 24, 2026
0bede1d
feat: implement Diagram.prune() to remove empty tables
dimitri-yatsenko Feb 25, 2026
0ae8c80
docs: update design docs to reflect actual implementation
dimitri-yatsenko Mar 2, 2026
934a6fc
docs: rewrite design docs as authoritative specs
dimitri-yatsenko Mar 2, 2026
b8fd688
docs: make bare issue references clickable links
dimitri-yatsenko Mar 2, 2026
207 changes: 207 additions & 0 deletions docs/design/restricted-diagram-spec.md
@@ -0,0 +1,207 @@
# Restricted Diagram Specification

**Design:** [restricted-diagram.md](restricted-diagram.md)

## Architecture

A single class, `Diagram(nx.DiGraph)`, keeps all operational methods available at all times. Only the visualization methods (`draw`, `make_dot`, `make_svg`, `make_png`, `make_image`, `make_mermaid`, `save`, `_repr_svg_`) are gated on `diagram_active`.

```python
class Diagram(nx.DiGraph):
    # Always available: __init__, +/-/*, cascade, restrict, prune,
    #   delete, drop, preview, topo_sort, _from_table, ...
    # Gated on diagram_active: draw, make_dot, make_svg, make_png,
    #   make_image, make_mermaid, save, _repr_svg_
    ...
```

`Dependencies` is the canonical store of the FK graph. `Diagram` copies from it and constructs derived views.

## Instance Attributes

```python
self._connection # Connection
self._cascade_restrictions # dict[str, list] — per-node OR restrictions
self._restrict_conditions # dict[str, AndList] — per-node AND restrictions
self._restriction_attrs # dict[str, set] — restriction attribute names per node
self._part_integrity # str — "enforce", "ignore", or "cascade"
```

Initialized empty in `__init__`. Deep-copied in the copy constructor (`Diagram(other_diagram)`).

## Restriction Modes

A diagram operates in one of three states: **unrestricted** (initial), **cascade**, or **restrict**. The modes are mutually exclusive. `cascade` is applied once; `restrict` can be chained.

```python
# cascade: applied once, OR at convergence, for delete
rd = dj.Diagram(schema).cascade(Session & 'subject_id=1')

# restrict: chainable, AND at convergence, for export
rd = dj.Diagram(schema).restrict(Session & cond).restrict(Stimulus & cond2)

# Each of the following raises DataJointError (mixed modes, or cascade applied twice):
dj.Diagram(schema).cascade(A & c).restrict(B & c)
dj.Diagram(schema).restrict(A & c).cascade(B & c)
dj.Diagram(schema).cascade(A & c1).cascade(B & c2)
```
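
The mode rules above can be sketched as a small state machine. This is a toy model, not the library's implementation: `ToyDiagram` and `ModeError` are hypothetical names, the graph and the restrictions themselves are omitted, and `ModeError` stands in for `DataJointError`.

```python
class ModeError(Exception):
    """Hypothetical stand-in for DataJointError in this sketch."""


class ToyDiagram:
    """Tracks only the restriction mode, not the graph or the restrictions."""

    def __init__(self, mode="unrestricted"):
        self.mode = mode

    def cascade(self, expr):
        # cascade is one-shot: allowed only from the unrestricted state
        if self.mode != "unrestricted":
            raise ModeError(f"cannot cascade a diagram in {self.mode!r} mode")
        return ToyDiagram("cascade")

    def restrict(self, expr):
        # restrict is chainable, but never after cascade
        if self.mode == "cascade":
            raise ModeError("cannot restrict a diagram in 'cascade' mode")
        return ToyDiagram("restrict")


def raises(fn):
    """Return True if calling fn() raises ModeError."""
    try:
        fn()
    except ModeError:
        return True
    return False


d = ToyDiagram()
chained = d.restrict("a").restrict("b")  # allowed: restrict chains
```

Because each call returns a new object, the original diagram stays unrestricted and can be reused for a different operation.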

## Methods

### `cascade(self, table_expr, part_integrity="enforce") -> Diagram`

Apply cascade restriction and propagate downstream. Returns a new `Diagram`.

**Semantics:** OR at convergence. A child row is affected if *any* restricted ancestor taints it. Used for delete.

1. Verify no existing cascade or restrict restrictions (raise if present)
2. `result = Diagram(self)` — copy
3. Seed `result._cascade_restrictions[root]` with `list(table_expr.restriction)`
4. Call `result._propagate_restrictions(root, mode="cascade", part_integrity=part_integrity)`
5. Return `result`

### `restrict(self, table_expr) -> Diagram`

Apply restrict condition and propagate downstream. Returns a new `Diagram`. Chainable.

**Semantics:** AND at convergence. A child row is included only if it satisfies *all* restricted ancestors. Used for export.

1. Verify no existing cascade restrictions (raise if present)
2. `result = Diagram(self)` — copy
3. Seed/extend `result._restrict_conditions[root]` with `table_expr.restriction`
4. Call `result._propagate_restrictions(root, mode="restrict")`
5. Return `result`

### `_propagate_restrictions(self, start_node, mode, part_integrity="enforce")`

Internal. Propagate restrictions from `start_node` to all its descendants in topological order. Only processes descendants of `start_node` to avoid duplicate propagation when chaining `restrict()`.

Uses multiple passes (up to 10) to handle `part_integrity="cascade"` upward propagation, which can add new restricted nodes requiring further propagation.

For each restricted node, iterates over `out_edges(node)`:

1. If target is an alias node (`.isdigit()`), follow through to real child via `out_edges(alias_node)`
2. Delegate to `_apply_propagation_rule()` for the restriction computation
3. Track propagated edges to avoid duplicate work
4. Handle `part_integrity="cascade"`: if child is a part table and its master is not already restricted, propagate upward from part to master using `make_condition(master, (master.proj() & part.proj()).to_arrays(), ...)`, expand the allowed node set, and continue to next pass
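
The core of the traversal (descendants only, parents before children) can be sketched with the standard library alone. This is a simplified illustration under stated assumptions: the `children` graph and the `"via ..."` markers are made up, and alias nodes, the multi-pass loop, and `part_integrity` handling are left out.

```python
from graphlib import TopologicalSorter

# Hypothetical FK graph: parent -> list of children
children = {
    "Subject": ["Session"],
    "Session": ["Recording"],
    "Stimulus": ["Recording"],
    "Recording": ["Spikes"],
    "Spikes": [],
}


def descendants(start):
    """All nodes reachable from start, excluding start itself."""
    seen, stack = set(), [start]
    while stack:
        for child in children[stack.pop()]:
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def propagate(start, restrictions):
    """Visit start and its descendants parents-first, pushing a marker to each child."""
    allowed = descendants(start) | {start}
    # TopologicalSorter expects a mapping of node -> predecessors
    predecessors = {node: set() for node in allowed}
    for parent in allowed:
        for child in children[parent]:
            predecessors[child].add(parent)
    for node in TopologicalSorter(predecessors).static_order():
        if node not in restrictions:
            continue
        for child in children[node]:
            restrictions.setdefault(child, []).append(f"via {node}")
    return restrictions


result = propagate("Session", {"Session": ["subject_id=1"]})
```

Limiting the walk to descendants of the start node means a second `restrict()` seeded at `Stimulus` would not revisit the `Session` branch.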

### `_apply_propagation_rule(self, parent_ft, parent_attrs, child_node, attr_map, aliased, mode, restrictions)`

Internal. Apply one of three propagation rules to a parent→child edge:

| Condition | Child restriction |
|-----------|-------------------|
| Non-aliased AND `parent_restriction_attrs ⊆ child.primary_key` | Copy parent restriction directly |
| Aliased FK (`attr_map` renames columns) | `parent_ft.proj(**{fk: pk for fk, pk in attr_map.items()})` |
| Non-aliased AND `parent_restriction_attrs ⊄ child.primary_key` | `parent_ft.proj()` |

Accumulates on child:
- `cascade` mode: `restrictions.setdefault(child, []).extend(...)` — list = OR
- `restrict` mode: `restrictions.setdefault(child, AndList()).extend(...)` — AndList = AND
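
The rule selection in the table can be sketched as a pure function. The function name and the returned labels are hypothetical, and the aliased flag is derived here from `attr_map` rather than passed in as in the signature above.

```python
def choose_rule(parent_restriction_attrs, child_primary_key, attr_map):
    """Pick a propagation rule for one parent-to-child edge.

    attr_map maps child FK attribute names to parent PK attribute names.
    """
    aliased = any(fk != pk for fk, pk in attr_map.items())
    if aliased:
        return "project with renames"
    if set(parent_restriction_attrs) <= set(child_primary_key):
        return "copy restriction"
    return "project parent"


child_pk = ["mouse_id", "session_idx"]
identity = {"mouse_id": "mouse_id"}

rule_copy = choose_rule({"mouse_id"}, child_pk, identity)
rule_proj = choose_rule({"species"}, child_pk, identity)
rule_alias = choose_rule({"mouse_id"}, child_pk, {"source_mouse": "mouse_id"})
```

Copying the restriction directly is only safe when every attribute it mentions exists in the child; otherwise the parent is projected onto its primary key so the child can be matched on the FK columns.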

### `delete(self, transaction=True, prompt=None) -> int`

Execute cascading delete using `_cascade_restrictions`. Requires `cascade()` first.

1. Get non-alias nodes with restrictions in topological order
2. If `prompt`: show preview (table name + row count for each)
3. Start transaction (if `transaction=True`)
4. Iterate in **reverse** topological order (leaves first):
   - `ft = FreeTable(conn, table_name)`
   - `ft.restrict_in_place(self._cascade_restrictions[table_name])`
   - `ft.delete_quick(get_count=True)`
   - Track which tables had rows deleted
5. On `IntegrityError`: cancel transaction, diagnostic fallback — parse FK error for actionable message about unloaded schemas
6. Post-check `part_integrity="enforce"`: if any part table had rows deleted but its master did not, cancel transaction and raise `DataJointError`
7. Confirm/commit transaction
8. Return count from the root table
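
The leaves-first ordering can be demonstrated with the standard-library `sqlite3` module. This is an illustrative sketch, not DataJoint code: the two tables, their rows, and the hand-written `WHERE` clauses are assumptions standing in for propagated restrictions.

```python
import sqlite3

# Autocommit mode so the transaction is managed explicitly below
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE session (session_id INTEGER PRIMARY KEY, subject_id INTEGER)")
conn.execute(
    "CREATE TABLE recording ("
    " rec_id INTEGER PRIMARY KEY,"
    " session_id INTEGER NOT NULL REFERENCES session(session_id))"
)
conn.executemany("INSERT INTO session VALUES (?, ?)", [(1, 1), (2, 1), (3, 2)])
conn.executemany("INSERT INTO recording VALUES (?, ?)", [(10, 1), (11, 2), (12, 3)])

# Parents-first order, each table paired with the restriction it received
topo_order = [
    ("session", "subject_id = 1"),
    ("recording", "session_id IN (SELECT session_id FROM session WHERE subject_id = 1)"),
]

deleted = {}
conn.execute("BEGIN")
try:
    for table, where in reversed(topo_order):  # leaves first
        cursor = conn.execute(f"DELETE FROM {table} WHERE {where}")
        deleted[table] = cursor.rowcount
    conn.execute("COMMIT")
except sqlite3.IntegrityError:
    conn.execute("ROLLBACK")
    raise

root_count = deleted["session"]
remaining = conn.execute("SELECT COUNT(*) FROM recording").fetchone()[0]
```

Deleting `recording` before `session` means the foreign key constraint is never violated, so no error has to be caught and parsed on the normal path.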

### `drop(self, prompt=None, part_integrity="enforce")`

Drop all tables in `nodes_to_show` in reverse topological order.

1. Get non-alias nodes from `nodes_to_show` in topological order
2. Pre-check `part_integrity`: if any part's master is not in the set, raise error
3. If `prompt`: show preview, ask confirmation
4. Iterate in reverse order: `FreeTable(conn, t).drop_quick()`

### `preview(self) -> dict[str, int]`

Show affected tables and row counts without modifying data. Requires `cascade()` or `restrict()` first.

Returns `{full_table_name: row_count}` for each node with a restriction.

### `prune(self) -> Diagram`

Remove tables with zero matching rows from the diagram. Returns a new `Diagram`.

1. `result = Diagram(self)` — copy
2. If restrictions exist (`_cascade_restrictions` or `_restrict_conditions`):
   - For each restricted node, build `FreeTable` with restriction applied
   - If `len(ft) == 0`: remove from restrictions dict, `_restriction_attrs`, and `nodes_to_show`
3. If no restrictions (unrestricted diagram):
   - For each node in `nodes_to_show`, check `len(FreeTable(conn, node))`
   - If 0: remove from `nodes_to_show`
4. Return `result`

Properties: idempotent, chainable (`restrict()` can follow `prune()`), skips alias nodes.
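
A minimal sketch of the pruning logic, assuming row counts come from a caller-supplied function. The `prune` function, the `count_rows` callback, and the sample counts are hypothetical, and `_restriction_attrs` bookkeeping is omitted.

```python
def prune(nodes_to_show, restrictions, count_rows):
    """Return copies of the inputs with zero-row tables removed."""
    kept_nodes = set(nodes_to_show)
    kept_restrictions = dict(restrictions)
    # Restricted diagram: check restricted nodes. Unrestricted: check all shown nodes.
    candidates = kept_restrictions.keys() if kept_restrictions else kept_nodes
    for node in list(candidates):
        if node.isdigit():  # alias nodes are skipped
            continue
        if count_rows(node, kept_restrictions.get(node)) == 0:
            kept_nodes.discard(node)
            kept_restrictions.pop(node, None)
    return kept_nodes, kept_restrictions


counts = {"Session": 3, "Recording": 0, "Spikes": 2}


def count_rows(node, restriction):
    # Stand-in for len() of a restricted FreeTable
    return counts[node]


shown = {"Session", "Recording", "Spikes", "1"}
restricted = {"Session": ["subject_id=1"], "Recording": ["via Session"]}

nodes, restr = prune(shown, restricted, count_rows)
nodes_unrestricted, _ = prune(shown, {}, count_rows)
```

Running `prune` again on its own output changes nothing, which is the idempotence property listed above.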

### `_from_table(cls, table_expr) -> Diagram`

Classmethod factory for `Table.delete()` and `Table.drop()`. Creates a Diagram containing `table_expr` and all its descendants, bypassing the normal `__init__` (no caller-frame introspection or source-type resolution).

## `Table` Integration

### `Table.delete()`

Delegates to `Diagram`:

```python
def delete(self, transaction=True, prompt=None, part_integrity="enforce"):
    from .diagram import Diagram
    diagram = Diagram._from_table(self)
    diagram = diagram.cascade(self, part_integrity=part_integrity)
    return diagram.delete(transaction=transaction, prompt=prompt)
```

### `Table.drop()`

Delegates to `Diagram`:

```python
def drop(self, prompt=None, part_integrity="enforce"):
    if self.restriction:
        raise DataJointError("A restricted Table cannot be dropped.")
    from .diagram import Diagram
    diagram = Diagram._from_table(self)
    diagram.drop(prompt=prompt, part_integrity=part_integrity)
```

### `Part.drop()`

Passes `part_integrity` through to `super().drop()`.

## Restriction Semantics

| DataJoint type | Python type | SQL meaning |
|----------------|-------------|-------------|
| OR-combined restrictions | `list` | `WHERE (r1) OR (r2) OR ...` |
| AND-combined restrictions | `AndList` | `WHERE (r1) AND (r2) AND ...` |
| No restriction | empty `list` or `AndList()` | No WHERE clause (all rows) |

`_cascade_restrictions` values are `list` (OR). An unrestricted cascade stores `[]`, meaning all rows.

`_restrict_conditions` values are `AndList` (AND). Each `.restrict()` call extends the AndList.
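
The mapping in the table can be illustrated by rendering both container types to SQL text. This is a sketch only: the `AndList` defined here is a local stand-in for DataJoint's class, and `where_clause` is a hypothetical helper, not how the library builds queries.

```python
class AndList(list):
    """Local stand-in: a list whose items are combined with AND."""


def where_clause(restriction):
    """Render a list (OR) or AndList (AND) of SQL conditions as a WHERE clause."""
    if not restriction:
        return ""  # empty restriction selects all rows
    joiner = " AND " if isinstance(restriction, AndList) else " OR "
    return "WHERE " + joiner.join(f"({cond})" for cond in restriction)


or_sql = where_clause(["subject_id = 1", "subject_id = 2"])
and_sql = where_clause(AndList(["subject_id = 1", "type = 'visual'"]))
all_rows = where_clause([])
```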

## Edge Cases

1. **Unrestricted delete**: `(Session()).delete()` — empty restriction propagates as "all rows" to all descendants.

2. **Mutual exclusivity**: `cascade` and `restrict` cannot be mixed. `cascade` is one-shot. `restrict` is chainable. Violations raise `DataJointError`.

3. **Alias nodes**: Walk `out_edges(parent)`. If target is alias (`.isdigit()`), read `attr_map` from parent→alias edge, follow alias→child. Apply Rule 2 (aliased projection). Multiple alias paths from same parent to same child produce OR entries.

4. **Circular import**: `diagram.py` needs `FreeTable` from `table.py`. `table.py` needs `Diagram` from `diagram.py`. Both use lazy imports inside method bodies.

5. **Nodes not in graph**: If `table_expr.full_table_name` not in `self.nodes()`, raise `DataJointError`.

6. **Disabled visualization**: Operational methods always work. Only visualization methods check `diagram_active`.
133 changes: 133 additions & 0 deletions docs/design/restricted-diagram.md
@@ -0,0 +1,133 @@
# Restricted Diagrams

**Issues:** [#865](https://github.com/datajoint/datajoint-python/issues/865), [#1110](https://github.com/datajoint/datajoint-python/issues/1110)

## Motivation

### Error-driven cascade is fragile

The original cascade delete worked by trial-and-error: attempt `DELETE` on the parent, catch the FK integrity error, parse the MySQL error message to discover which child table is blocking, then recursively delete from that child first.

This approach has several problems:

- **MySQL 8 with limited privileges:** Returns error 1217 (`ROW_IS_REFERENCED`) instead of 1451 (`ROW_IS_REFERENCED_2`), which provides no table name. The cascade crashes ([#1110](https://github.com/datajoint/datajoint-python/issues/1110)).
- **PostgreSQL overhead:** PostgreSQL aborts the entire transaction on any error. Each failed delete attempt requires `SAVEPOINT` / `ROLLBACK TO SAVEPOINT` round-trips.
- **Fragile parsing:** Different MySQL versions and privilege levels produce different error message formats.

### Graph-driven approach

`drop()` already uses graph-driven traversal — walking the dependency graph in reverse topological order, dropping leaves first. The same pattern applies to cascade delete, with the addition of **restriction propagation** through FK attribute mappings.

### Data subsetting

`dj.Diagram` provides set operators for specifying subsets of *tables*. Per-node restrictions complete the functionality for specifying cross-sections of *data* — enabling delete, export, backup, and sharing.

## Core Concept

A restricted diagram is a `Diagram` augmented with per-node restrictions. Two operators apply restrictions with different propagation semantics:

- **`cascade(expr)`** — OR at convergence. "This data and everything depending on it." For delete.
- **`restrict(expr)`** — AND at convergence. "The cross-section matching all criteria." For export.

Both propagate restrictions downstream through FK edges using `attr_map`. They differ only in how restrictions combine when multiple restricted ancestors converge at the same child.

## Restriction Propagation

A restriction applied to one table node propagates downstream through FK edges in topological order. Each downstream node accumulates a restriction derived from its restricted parent(s).

**Propagation rules for edge `Parent → Child` with `attr_map`:**

1. **Non-aliased FK** (`attr_map` is identity, e.g. `{'mouse_id': 'mouse_id'}`):
If the parent's restriction attributes are a subset of the child's primary key, copy the restriction directly. Otherwise, restrict child by `parent.proj()`.

2. **Aliased FK** (`attr_map` renames, e.g. `{'source_mouse': 'mouse_id'}`):
Restrict child by `parent.proj(**{fk: pk for fk, pk in attr_map.items()})`.
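
The renaming step for an aliased FK can be sketched on plain dictionaries. The `project_keys` helper and the sample rows are hypothetical; in the real system the projection is a DataJoint query expression rather than a list of dicts.

```python
def project_keys(parent_keys, attr_map):
    """Rename parent PK attributes to the child's FK attribute names.

    attr_map maps child FK attribute -> parent PK attribute.
    """
    return [{fk: key[pk] for fk, pk in attr_map.items()} for key in parent_keys]


def matches(row, key):
    """True if row agrees with key on every attribute in key."""
    return all(row[name] == value for name, value in key.items())


restricted_mice = [{"mouse_id": 1}, {"mouse_id": 4}]
pairings = [
    {"pair_id": 1, "source_mouse": 1, "target_mouse": 2},
    {"pair_id": 2, "source_mouse": 3, "target_mouse": 4},
    {"pair_id": 3, "source_mouse": 4, "target_mouse": 1},
]

allowed = project_keys(restricted_mice, {"source_mouse": "mouse_id"})
selected = [row["pair_id"] for row in pairings if any(matches(row, key) for key in allowed)]
```

Pairing 2 is not selected here because only the `source_mouse` path is followed; a second alias path through `target_mouse` would contribute its own entry.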

### Converging paths

A child node may have multiple restricted ancestors. The combination rule depends on the operator:

```
Session ──→ Recording ←── Stimulus
   ↓                         ↓
subject=1              type="visual"
```

`Recording` receives two propagated restrictions: R1 from Session, R2 from Stimulus.

**`cascade` — OR (union):** A recording is deleted if tainted by *any* restricted parent. Correct for referential integrity: if the parent row is being deleted, all child rows referencing it must go.

**`restrict` — AND (intersection):** A recording is included only if it satisfies *all* restricted ancestors. Correct for subsetting: only rows matching every condition are selected.

**Implementation:** `cascade` appends to a `list` (OR in DataJoint). `restrict` appends to an `AndList` (AND in DataJoint). The two modes are mutually exclusive on the same diagram.
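
The difference between the two combination rules can be shown with sets of row identifiers. The recordings below are made-up data, with R1 and R2 computed directly rather than through FK propagation.

```python
recordings = [
    {"rec_id": 1, "subject": 1, "stim_type": "visual"},
    {"rec_id": 2, "subject": 1, "stim_type": "auditory"},
    {"rec_id": 3, "subject": 2, "stim_type": "visual"},
    {"rec_id": 4, "subject": 2, "stim_type": "auditory"},
]

# R1: rows reached from Session & 'subject=1'
r1 = {r["rec_id"] for r in recordings if r["subject"] == 1}
# R2: rows reached from Stimulus & 'type="visual"'
r2 = {r["rec_id"] for r in recordings if r["stim_type"] == "visual"}

cascade_rows = r1 | r2   # OR: any restricted parent taints the row
restrict_rows = r1 & r2  # AND: the row must satisfy every restricted ancestor
```

With cascade, recordings 1, 2, and 3 would be deleted because each references at least one doomed parent row. With restrict, only recording 1 belongs to the exported cross-section.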

### Multiple FK paths from same parent (alias nodes)

A child may reference the same parent through multiple FKs (e.g., `source_mouse` and `target_mouse` both referencing `Mouse`). These are represented as alias nodes in the dependency graph.

Multiple FK paths from the same restricted parent always combine with **OR** regardless of operation — structural, not operation-dependent.

### `part_integrity`

| Mode | Behavior |
|------|----------|
| `"enforce"` | Error if parts would be deleted without their masters |
| `"ignore"` | Allow deleting parts without masters |
| `"cascade"` | Propagate restriction upward from part to master, then re-propagate downstream |
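
A sketch of the `"enforce"` check, assuming per-table deleted-row counts and a part-to-master mapping are available. The function name, its arguments, and the table names are hypothetical; `"cascade"` is not modeled because it acts during propagation rather than as a check.

```python
def part_integrity_violations(deleted_counts, masters, part_integrity="enforce"):
    """List part tables that lost rows while their master lost none."""
    if part_integrity != "enforce":
        return []
    return [
        part
        for part, master in masters.items()
        if deleted_counts.get(part, 0) > 0 and deleted_counts.get(master, 0) == 0
    ]


masters = {"Session__Trial": "Session"}  # part table -> master table

ok = part_integrity_violations({"Session": 2, "Session__Trial": 7}, masters)
bad = part_integrity_violations({"Session": 0, "Session__Trial": 7}, masters)
ignored = part_integrity_violations({"Session": 0, "Session__Trial": 7}, masters, "ignore")
```

A non-empty result would be the trigger for cancelling the transaction and raising an error.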

### Pruning

After applying restrictions, some tables may have zero matching rows. `prune()` removes these from the diagram, leaving only the subgraph with actual data. Without prior restrictions, `prune()` removes physically empty tables.

### Unloaded schemas

If a child table lives in a schema not loaded into the dependency graph, the graph-driven delete won't know about it. The final parent `delete_quick()` fails with an FK error. Error-message parsing is retained as a **diagnostic fallback** to produce an actionable error: "activate schema X."
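
The diagnostic fallback might look like the following sketch. The regular expression targets the message text MySQL emits for error 1451; the sample message, the schema names, and the `diagnose` helper are assumptions for illustration, not the library's code.

```python
import re

# Matches the "`schema`.`table`" pair in a MySQL error 1451 message
FK_ERROR = re.compile(
    r"a foreign key constraint fails \(`(?P<schema>[^`]+)`\.`(?P<table>[^`]+)`"
)


def diagnose(message, loaded_schemas):
    """Return an actionable hint if the blocking table is in an unloaded schema."""
    match = FK_ERROR.search(message)
    if match is None:
        return None  # e.g. error 1217, whose message names no table
    schema, table = match["schema"], match["table"]
    if schema in loaded_schemas:
        return None
    return f"Table `{schema}`.`{table}` blocks the delete; activate schema {schema!r} and retry."


# Illustrative message in the error 1451 format
sample = (
    "Cannot delete or update a parent row: a foreign key constraint fails "
    "(`analysis`.`result`, CONSTRAINT `result_ibfk_1` FOREIGN KEY (`session_id`) "
    "REFERENCES `session` (`session_id`))"
)

hint = diagnose(sample, loaded_schemas={"lab"})
```

Because parsing is used only to improve an error message, a failed match degrades to a generic error instead of breaking the cascade.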

## API

```python
# cascade: OR propagation for delete
rd = dj.Diagram(schema).cascade(Session & 'subject_id=1')
rd.preview() # show affected tables and row counts
rd.delete() # downstream only, OR at convergence

# restrict: AND propagation for data subsetting
rd = (dj.Diagram(schema)
      .restrict(Session & 'subject_id=1')
      .restrict(Stimulus & 'type="visual"'))
rd.preview() # show selected tables and row counts

# prune: remove tables with zero matching rows
rd = (dj.Diagram(schema)
      .restrict(Subject & {'species': 'mouse'})
      .restrict(Session & 'session_date > "2024-01-01"')
      .prune())
rd.preview() # only tables with matching rows
rd # visualize the export subgraph

# unrestricted prune: remove physically empty tables
dj.Diagram(schema).prune()

# drop: no restriction, drops entire tables
dj.Diagram(Session).drop()

# cascade with part_integrity
dj.Diagram(schema).cascade(PartTable & 'key=1', part_integrity="cascade").delete()

# Table.delete() delegates to Diagram internally
(Session & 'subject_id=1').delete()
# equivalent to:
# dj.Diagram._from_table(Session).cascade(Session & 'subject_id=1').delete()
```

## Advantages

| | Error-driven | Graph-driven |
|---|---|---|
| MySQL 8 + limited privileges | Crashes ([#1110](https://github.com/datajoint/datajoint-python/issues/1110)) | Works — no error parsing needed |
| PostgreSQL | Savepoint overhead per attempt | No errors triggered |
| Multiple FKs to same child | One-at-a-time via retry loop | All paths resolved upfront |
| part_integrity enforcement | Post-hoc check after delete | Post-check with transaction rollback |
| Unloaded schemas | Crash with opaque error | Clear error: "activate schema X" |
| Reusability | Delete-only | Delete, drop, export, prune |
| Inspectability | Opaque recursive cascade | Preview affected data before executing |
1 change: 1 addition & 0 deletions src/datajoint/builtin_codecs/attach.py
@@ -107,6 +107,7 @@ def decode(self, stored: bytes, *, key: dict | None = None) -> str:
config = (key or {}).get("_config")
if config is None:
from ..settings import config
assert config is not None
download_path = Path(config.get("download_path", "."))
download_path.mkdir(parents=True, exist_ok=True)
local_path = download_path / filename
1 change: 1 addition & 0 deletions src/datajoint/builtin_codecs/filepath.py
@@ -103,6 +103,7 @@ def encode(self, value: Any, *, key: dict | None = None, store_name: str | None
config = (key or {}).get("_config")
if config is None:
from ..settings import config
assert config is not None

path = str(value)
