65 changes: 64 additions & 1 deletion src/explanation/whats-new-22.md
@@ -1,6 +1,6 @@
# What's New in DataJoint 2.2

DataJoint 2.2 introduces **isolated instances** and **thread-safe mode** for applications that need multiple independent database connections—web servers, multi-tenant notebooks, parallel pipelines, and testing.
DataJoint 2.2 introduces **isolated instances**, **thread-safe mode**, and **graph-driven diagram operations** for applications that need multiple independent database connections, explicit cascade control, and operational use of the dependency graph.

> **Upgrading from 2.0 or 2.1?** No breaking changes. All existing code using `dj.config` and `dj.Schema()` continues to work. The new Instance API is purely additive.
@@ -201,9 +201,72 @@ class MyTable(dj.Manual):

Once a Schema is created, table definitions, inserts, queries, and all other operations work identically regardless of which pattern was used to create the Schema.

## Graph-Driven Diagram Operations

DataJoint 2.2 promotes `dj.Diagram` from a visualization tool to an operational component. The same dependency graph that renders pipeline diagrams now powers cascade delete, table drop, and data subsetting.

### From Visualization to Operations

In prior versions, `dj.Diagram` existed solely for visualization — drawing the dependency graph as SVG or Mermaid output. The cascade logic inside `Table.delete()` traversed dependencies independently, with no way to inspect or control the cascade before it executed.

In 2.2, `Table.delete()` and `Table.drop()` delegate internally to `dj.Diagram`. The user-facing behavior of `Table.delete()` is unchanged, but the diagram-level API is now available as a more powerful interface for complex scenarios.

### The Preview-Then-Execute Pattern

The key benefit of the diagram-level API is the ability to build a cascade explicitly, inspect it, and then decide whether to execute:

```python
import datajoint as dj

# Build the dependency graph (assumes `schema` and `Session` are already defined)
diag = dj.Diagram(schema)

# Apply cascade restriction — nothing is deleted yet
restricted = diag.cascade(Session & {'subject_id': 'M001'})

# Inspect: what tables and how many rows would be affected?
counts = restricted.preview()
# {'`lab`.`session`': 3, '`lab`.`trial`': 45, '`lab`.`processed_data`': 45}

# Execute only after reviewing the blast radius
restricted.delete(prompt=False)
```

This is valuable when working with unfamiliar pipelines, large datasets, or multi-schema dependencies where the cascade impact is not immediately obvious.

### Two Propagation Modes

The diagram supports two restriction propagation modes with different convergence semantics:

**`cascade()` uses OR at convergence.** When a child table has multiple restricted ancestors, the child row is affected if *any* parent path reaches it. This is the right semantics for delete — if any reason exists to remove a row, it should be removed. `cascade()` is one-shot: it can only be called once on an unrestricted diagram.

**`restrict()` uses AND at convergence.** A child row is included only if *all* restricted ancestors match. This is the right semantics for data subsetting and export — only rows satisfying every condition are selected. `restrict()` is chainable: call it multiple times to build up conditions from different tables.

The two modes are mutually exclusive on the same diagram. This prevents accidental mixing of incompatible semantics.
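
The difference between the two convergence rules can be illustrated with plain Python sets. This is a toy model only, not DataJoint's implementation: the two sets and their names are hypothetical and stand for the child rows reached through two different dependency paths.

```python
# Toy model of convergence semantics; not DataJoint's implementation.
# Each set holds the (hypothetical) keys of child rows reached via one path.
rows_via_path_a = {1, 2, 3}
rows_via_path_b = {2, 3, 4}

# cascade(): OR at convergence, a row is affected if any path reaches it
cascade_rows = rows_via_path_a | rows_via_path_b

# restrict(): AND at convergence, a row is kept only if every path matches
restrict_rows = rows_via_path_a & rows_via_path_b

print(sorted(cascade_rows))   # [1, 2, 3, 4]
print(sorted(restrict_rows))  # [2, 3]
```

The union is never smaller than the intersection, which is why OR suits delete (nothing dependent is left behind) and AND suits export (nothing extra is included).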

### Pruning Empty Tables

After applying restrictions, some tables in the diagram may have zero matching rows. The `prune()` method removes these tables from the diagram, leaving only the subgraph with actual data:

```python
export = (dj.Diagram(schema)
    .restrict(Subject & {'species': 'mouse'})
    .restrict(Session & 'session_date > "2024-01-01"')
    .prune())

export.preview()  # only tables with matching rows
export            # visualize the export subgraph
```

Without prior restrictions, `prune()` removes physically empty tables. This is useful for understanding which parts of a pipeline are populated.
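
Conceptually, pruning amounts to dropping zero-count nodes and any edges that touch them. The sketch below models this with a plain dictionary and edge list. The table names, counts, and the `prune_counts` helper are hypothetical and only illustrate the idea; they do not reflect how `prune()` is implemented.

```python
# Hypothetical mapping of table name -> number of matching rows.
counts = {"subject": 12, "session": 3, "trial": 0, "processed_data": 0}
# Hypothetical dependency edges (parent, child).
edges = [("subject", "session"), ("session", "trial"), ("trial", "processed_data")]


def prune_counts(counts, edges):
    """Keep tables with at least one matching row and the edges between them."""
    kept = {table: n for table, n in counts.items() if n > 0}
    kept_edges = [(p, c) for p, c in edges if p in kept and c in kept]
    return kept, kept_edges


kept, kept_edges = prune_counts(counts, edges)
print(kept)        # {'subject': 12, 'session': 3}
print(kept_edges)  # [('subject', 'session')]
```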

### Architecture

`Table.delete()` now constructs a `Diagram` internally, calls `cascade()`, and then `delete()`. This means every table-level delete benefits from the same graph-driven logic. The diagram-level API simply exposes this machinery for direct use when more control is needed.
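
As a rough mental model of a graph-driven cascade, the sketch below collects a target table and its descendants from a dependency graph and orders them children-first, so that dependent rows are removed before the rows they reference. It uses only the standard library. The graph, table names, and the `cascade_order` helper are assumptions for illustration and are not DataJoint internals.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency graph: table -> set of parent tables it references.
parents = {
    "session": {"subject"},
    "trial": {"session"},
    "processed_data": {"trial"},
}


def cascade_order(parents, target):
    """Return target and all its descendants, ordered children-first."""
    # static_order() yields parents before children.
    order = list(TopologicalSorter(parents).static_order())
    affected = {target}
    for table in order:
        # A table is affected if any of its parents is affected.
        if parents.get(table, set()) & affected:
            affected.add(table)
    # Reverse so that leaf tables are deleted first.
    return [t for t in reversed(order) if t in affected]


print(cascade_order(parents, "session"))
# ['processed_data', 'trial', 'session']
```

Tables upstream of the target (here `subject`) are untouched, which matches the usual expectation that a delete cascades only to dependents.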

## See Also

- [Use Isolated Instances](../how-to/use-instances.md/) — Task-oriented guide
- [Working with Instances](../tutorials/advanced/instances.ipynb/) — Step-by-step tutorial
- [Configuration Reference](../reference/configuration.md/) — Thread-safe mode settings
- [Configure Database](../how-to/configure-database.md/) — Connection setup
- [Diagram Specification](../reference/specs/diagram.md/) — Full reference for diagram operations
- [Delete Data](../how-to/delete-data.md/) — Task-oriented delete guide
35 changes: 35 additions & 0 deletions src/how-to/delete-data.md
@@ -189,8 +189,43 @@ count = (Subject & restriction).delete(prompt=False)
print(f"Deleted {count} subjects")
```

## Diagram-Level Delete

!!! version-added "New in 2.2"
    Diagram-level delete was added in DataJoint 2.2.

For complex scenarios — previewing the blast radius, working across schemas, or understanding the dependency graph before deleting — use `dj.Diagram` to build and inspect the cascade before executing.

### Build, Preview, Execute

```python
import datajoint as dj

# 1. Build the dependency graph
diag = dj.Diagram(schema)

# 2. Apply cascade restriction (nothing deleted yet)
restricted = diag.cascade(Session & {'subject_id': 'M001'})

# 3. Preview: see affected tables and row counts
counts = restricted.preview()
# {'`lab`.`session`': 3, '`lab`.`trial`': 45, '`lab`.`processed_data`': 45}

# 4. Execute only after reviewing
restricted.delete(prompt=False)
```

### When to Use

- **Preview blast radius**: Understand what a cascade delete will affect before committing
- **Multi-schema cascades**: Build a diagram spanning multiple schemas and delete across them in one operation
- **Programmatic control**: Use `preview()` return values to make decisions in automated workflows

For routine deletes that start from a single table, `(Table & restriction).delete()` remains the simplest approach. Use the diagram-level API when you need more visibility or control.
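
For the programmatic-control case, one option is to gate the delete on a row budget computed from the preview counts. The sketch below assumes the counts arrive as a mapping of table name to row count, shaped like the `preview()` output in the example above. The `within_budget` helper and the `max_rows` threshold are hypothetical and not part of DataJoint.

```python
def within_budget(counts, max_rows):
    """Return True if the total number of affected rows is at most max_rows."""
    return sum(counts.values()) <= max_rows


# Mapping shaped like the preview() output in the example above.
counts = {'`lab`.`session`': 3, '`lab`.`trial`': 45, '`lab`.`processed_data`': 45}

if within_budget(counts, max_rows=100):
    # In a real workflow this is where restricted.delete(prompt=False) would run.
    print("OK to delete")
else:
    print("Too many rows; review manually")
```

Keeping the decision in a small pure function makes the threshold easy to test without a database connection.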

## See Also

- [Diagram Specification](../reference/specs/diagram.md/) — Full reference for diagram operations
- [Master-Part Tables](master-part.ipynb) — Compositional data patterns
- [Model Relationships](model-relationships.ipynb) — Foreign key patterns
- [Insert Data](insert-data.md) — Adding data to tables
47 changes: 3 additions & 44 deletions src/how-to/read-diagrams.ipynb
@@ -1325,22 +1325,7 @@
"cell_type": "markdown",
"id": "cell-ops-ref",
"metadata": {},
"source": [
"**Operation Reference:**\n",
"\n",
"| Operation | Meaning |\n",
"|-----------|--------|\n",
"| `dj.Diagram(schema)` | Entire schema |\n",
"| `dj.Diagram(Table) - N` | Table + N levels upstream |\n",
"| `dj.Diagram(Table) + N` | Table + N levels downstream |\n",
"| `D1 + D2` | Union of two diagrams |\n",
"| `D1 * D2` | Intersection (common nodes) |\n",
"\n",
"**Finding paths:** Use intersection to find connection paths:\n",
"```python\n",
"(dj.Diagram(upstream) + 100) * (dj.Diagram(downstream) - 100)\n",
"```"
]
"source": "**Operation Reference:**\n\n| Operation | Meaning |\n|-----------|--------|\n| `dj.Diagram(schema)` | Entire schema |\n| `dj.Diagram(Table) - N` | Table + N levels upstream |\n| `dj.Diagram(Table) + N` | Table + N levels downstream |\n| `D1 + D2` | Union of two diagrams |\n| `D1 * D2` | Intersection (common nodes) |\n| `D.prune()` | Remove tables with zero matching rows *(2.2+)* |\n\n**Finding paths:** Use intersection to find connection paths:\n```python\n(dj.Diagram(upstream) + 100) * (dj.Diagram(downstream) - 100)\n```"
},
{
"cell_type": "markdown",
@@ -3322,33 +3307,7 @@
"cell_type": "markdown",
"id": "cell-summary-md",
"metadata": {},
"source": [
"## Summary\n",
"\n",
"| Visual | Meaning |\n",
"|--------|--------|\n",
"| **Thick solid** | One-to-one extension |\n",
"| **Thin solid** | One-to-many containment |\n",
"| **Dashed** | Reference (independent identity) |\n",
"| **Underlined** | Introduces new dimension |\n",
"| **Orange dots** | Renamed FK via `.proj()` |\n",
"| **Colors** | Green=Manual, Gray=Lookup, Red=Computed, Blue=Imported |\n",
"| **Grouped boxes** | Tables grouped by schema/module |\n",
"| **3D box (gray)** | Collapsed schema *(2.1+)* |\n",
"\n",
"| Feature | Method |\n",
"|---------|--------|\n",
"| Layout direction | `dj.config.display.diagram_direction` |\n",
"| Mermaid output | `.make_mermaid()` |\n",
"| Collapse schema | `.collapse()` *(2.1+)* |\n",
"\n",
"## Related\n",
"\n",
"- [Diagram Specification](../reference/specs/diagram.md)\n",
"- [Entity Integrity: Dimensions](../explanation/entity-integrity.md#schema-dimensions)\n",
"- [Semantic Matching](../reference/specs/semantic-matching.md)\n",
"- [Schema Design Tutorial](../tutorials/basics/02-schema-design.ipynb)"
]
"source": "## Summary\n\n| Visual | Meaning |\n|--------|--------|\n| **Thick solid** | One-to-one extension |\n| **Thin solid** | One-to-many containment |\n| **Dashed** | Reference (independent identity) |\n| **Underlined** | Introduces new dimension |\n| **Orange dots** | Renamed FK via `.proj()` |\n| **Colors** | Green=Manual, Gray=Lookup, Red=Computed, Blue=Imported |\n| **Grouped boxes** | Tables grouped by schema/module |\n| **3D box (gray)** | Collapsed schema *(2.1+)* |\n\n| Feature | Method |\n|---------|--------|\n| Layout direction | `dj.config.display.diagram_direction` |\n| Mermaid output | `.make_mermaid()` |\n| Collapse schema | `.collapse()` *(2.1+)* |\n| Prune empty tables | `.prune()` *(2.2+)* |\n\n## Related\n\n- [Diagram Specification](../reference/specs/diagram.md)\n- [Entity Integrity: Dimensions](../explanation/entity-integrity.md#schema-dimensions)\n- [Semantic Matching](../reference/specs/semantic-matching.md)\n- [Schema Design Tutorial](../tutorials/basics/02-schema-design.ipynb)"
},
{
"cell_type": "code",
@@ -3397,4 +3356,4 @@
},
"nbformat": 4,
"nbformat_minor": 5
}
}
3 changes: 3 additions & 0 deletions src/reference/specs/data-manipulation.md
@@ -332,6 +332,9 @@ Delete automatically cascades to all dependent tables:
2. Recursively delete matching rows in child tables
3. Delete rows in target table

!!! version-added "New in 2.2"
    `Table.delete()` now uses graph-driven cascade internally via `dj.Diagram`. User-facing behavior is unchanged: the same parameters and return values apply. For direct control over the cascade (preview, multi-schema operations), use the [Diagram operational methods](diagram.md#operational-methods).

### 4.3 Basic Usage

```python