diff --git a/DESIGN_ISSUES.md b/DESIGN_ISSUES.md index 8f035042..340e9276 100644 --- a/DESIGN_ISSUES.md +++ b/DESIGN_ISSUES.md @@ -416,10 +416,10 @@ message text. A shared parameterized validation helper would eliminate the dupli ## `src/orcapod/core/operators/` — Async execution ### O1 — Operators use barrier-mode `async_execute` only; streaming/incremental overrides needed -**Status:** open +**Status:** in progress **Severity:** medium -All operators currently use the default barrier-mode `async_execute` inherited from +All operators originally used the default barrier-mode `async_execute` inherited from `StaticOutputPod`: collect all input rows into memory, materialize to `ArrowTableStream`(s), run the existing sync `static_process`, then emit results. This works correctly but negates the latency and memory benefits of the push-based channel model. @@ -428,20 +428,27 @@ Three categories of improvement are planned: 1. **Streaming overrides (row-by-row, zero buffering)** — for operators that process rows independently: - - `PolarsFilter` — evaluate predicate per row, emit or drop immediately - - `MapTags` / `MapPackets` — rename columns per row, emit immediately - - `SelectTagColumns` / `SelectPacketColumns` — project columns per row, emit immediately - - `DropTagColumns` / `DropPacketColumns` — drop columns per row, emit immediately + - ~~`PolarsFilter` — evaluate predicate per row, emit or drop immediately~~ (kept barrier: + Polars expressions require DataFrame context for evaluation) + - `MapTags` / `MapPackets` — rename columns per row, emit immediately ✅ + - `SelectTagColumns` / `SelectPacketColumns` — project columns per row, emit immediately ✅ + - `DropTagColumns` / `DropPacketColumns` — drop columns per row, emit immediately ✅ 2. 
**Incremental overrides (stateful, eager emit)** — for multi-input operators that can produce partial results before all inputs are consumed: - - `Join` — symmetric hash join: index each input by tag keys, emit matches as they arrive - - `MergeJoin` — same approach, with list-merge on colliding packet columns - - `SemiJoin` — buffer the right (filter) input fully, then stream the left input and emit - matches (right must be fully consumed first, but left can stream) + - `Join` — symmetric hash join for 2 inputs (streaming, with correct + system-tag name-extending via `input_pipeline_hashes` passed directly + to `async_execute`); barrier fallback for N>2 inputs via `static_process`. ✅ + - `MergeJoin` — kept barrier: complex column-merging logic + - `SemiJoin` — build right, stream left through hash lookup ✅ + +3. **Streaming accumulation:** + - `Batch` — emit full batches as they accumulate (`batch_size > 0`); barrier fallback + when `batch_size == 0` (batch everything) ✅ -3. **Barrier-only (no change needed):** - - `Batch` — inherently requires all rows before grouping; barrier mode is correct +**Remaining:** `PolarsFilter` (barrier), `MergeJoin` (barrier) could receive incremental +overrides in the future but require careful handling of Polars expression evaluation and +system-tag evolution respectively. --- @@ -527,6 +534,29 @@ await AddResult(grade_pf).async_execute([input_ch], output_ch) --- +## `src/orcapod/hashing/semantic_hashing/` + +### H1 — Semantic hasher does not support PEP 604 union types (`int | None`) +**Status:** open +**Severity:** medium + +The `BaseSemanticHasher` raises `BeartypeDoorNonpepException` when hashing a +`PythonPacketFunction` whose return type uses PEP 604 syntax (`int | None`). +The hasher's `_handle_unknown` path receives `types.UnionType` (the Python 3.10+ type for +`X | Y` expressions) and has no registered handler for it. + +`typing.Optional[int]` also fails (different error path through beartype). 
+ +This means packet functions cannot use union return types — a common pattern for functions +that may filter packets by returning `None`. + +**Workaround:** Use non-union return types and raise/return sentinel values instead. + +**Fix needed:** Register a `TypeHandlerProtocol` for `types.UnionType` (and +`typing.Union`/`typing.Optional`) in the semantic hasher's type handler registry. + +--- + ### G2 — Pod Group abstraction for other composite pod patterns **Status:** open **Severity:** low diff --git a/orcapod-design.md b/orcapod-design.md index 169ab142..7684a2cc 100644 --- a/orcapod-design.md +++ b/orcapod-design.md @@ -468,15 +468,57 @@ async def async_execute( Nodes consume `(Tag, Packet)` pairs from input channels and produce them to an output channel. This enables push-based, streaming execution where data flows through the pipeline as soon as it's available, with backpressure propagated via bounded channel buffers. -**Operator async strategies:** +**FunctionPod async strategy:** Streaming mode — each input `(tag, packet)` is processed independently with semaphore-controlled concurrency. Uses `asyncio.TaskGroup` for structured concurrency. + +#### Operator Async Strategies + +Each operator overrides `async_execute` with the most efficient streaming pattern its semantics permit. The default fallback (inherited from `StaticOutputPod`) is barrier mode: collect all inputs via `asyncio.gather`, materialize to `ArrowTableStream`, call `static_process`, and emit results. Operators override this default when a more incremental strategy is possible. 
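The barrier-mode default described above can be sketched as follows. This is an illustrative reduction, not the real implementation: `Channel` here is a minimal `asyncio.Queue`-based stand-in for OrcaPod's bounded channels, and `static_process` is passed in as a plain callable over fully collected rows.

```python
import asyncio


class Channel:
    """Illustrative stand-in for OrcaPod's bounded channel (not the real API)."""

    def __init__(self) -> None:
        self._q: asyncio.Queue = asyncio.Queue()

    async def send(self, item) -> None:
        await self._q.put(item)

    async def close(self) -> None:
        await self._q.put(None)  # sentinel marks end-of-stream

    def __aiter__(self):
        return self

    async def __anext__(self):
        item = await self._q.get()
        if item is None:
            raise StopAsyncIteration
        return item


async def barrier_async_execute(inputs, output, static_process) -> None:
    """Barrier mode: drain every input fully, run the sync path, emit results."""

    async def drain(ch):
        return [item async for item in ch]

    try:
        # Block until every input channel is exhausted (the "barrier").
        collected = await asyncio.gather(*(drain(ch) for ch in inputs))
        # Hand the fully materialized inputs to the existing sync logic.
        for row in static_process(*collected):
            await output.send(row)
    finally:
        await output.close()
```

Each strategy in the table below is an override of this shape that starts emitting before the `asyncio.gather` barrier would have completed.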
| Strategy | Description | Operators | |---|---|---| -| **Barrier mode** (default) | Collect all inputs, run `static_process`, emit results | Batch (inherently barrier) | -| **Streaming overrides** | Process rows individually, zero buffering | PolarsFilter, MapTags, MapPackets, Select/Drop columns | -| **Incremental overrides** | Stateful, emit partial results as inputs arrive | Join (symmetric hash join), MergeJoin, SemiJoin (buffer right, stream left) | +| **Per-row streaming** | Transform each `(Tag, Packet)` independently as it arrives; zero buffering beyond the current row | SelectTagColumns, SelectPacketColumns, DropTagColumns, DropPacketColumns, MapTags, MapPackets | +| **Accumulate-and-emit** | Buffer rows up to `batch_size`, emit full batches immediately, flush partial at end | Batch (`batch_size > 0`) | +| **Build-probe** | Collect one side fully (build), then stream the other through a hash lookup (probe) | SemiJoin | +| **Symmetric hash join** | Read both sides concurrently, buffer + index both, emit matches as they're found | Join (2 inputs) | +| **Barrier mode** | Collect all inputs, run `static_process`, emit results | PolarsFilter, MergeJoin, Batch (`batch_size = 0`), Join (N > 2 inputs) | -**FunctionPod async strategy:** Streaming mode — each input `(tag, packet)` is processed independently with semaphore-controlled concurrency. Uses `asyncio.TaskGroup` for structured concurrency. +#### Per-Row Streaming (Unary Column/Map Operators) + +For operators that transform each row independently (column selection, column dropping, column renaming), the async path iterates `async for tag, packet in inputs[0]` and applies the transformation per row. Column metadata (which columns to drop, the rename map, etc.) is computed lazily on the first row and cached for subsequent rows. This avoids materializing the entire input into an Arrow table, enabling true pipeline-level streaming where upstream producers and downstream consumers run concurrently. 
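A minimal sketch of the per-row pattern, using column dropping as the example. Packets are modeled as plain dicts and the input as any async iterable; the real operators work on Tag/Packet protocol objects and Arrow columns, but the lazy first-row metadata computation is the same idea.

```python
from collections.abc import AsyncIterator


async def streaming_drop_columns(rows: AsyncIterator, drop: set) -> AsyncIterator:
    """Per-row streaming sketch: each (tag, packet) pair is transformed and
    re-emitted as soon as it arrives; packets are modeled as plain dicts."""
    keep = None  # column metadata, computed lazily from the first row
    async for tag, packet in rows:
        if keep is None:
            # First row decides which columns survive; cached for the rest.
            keep = [name for name in packet if name not in drop]
        yield tag, {name: packet[name] for name in keep}
```

Because each row is re-emitted immediately, a downstream consumer receives output after the first upstream row rather than after full materialization.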
+ +#### Accumulate-and-Emit (Batch) + +When `batch_size > 0`, Batch accumulates rows into a buffer and emits a batched result stream each time the buffer reaches `batch_size`. Any partial batch at the end is emitted unless `drop_partial_batch` is set. When `batch_size = 0` (meaning "batch everything into one group"), the operator must see all input before producing output, so it falls back to barrier mode. + +#### Build-Probe (SemiJoin) + +SemiJoin is non-commutative: the left side is filtered by the right side. The async implementation collects the right (build) side fully, constructs a hash set of its key tuples, then streams the left (probe) side through the lookup — emitting each left row whose keys appear in the right set. This is the same pattern as Kafka's KStream-KTable join: the table side is materialized, the stream side drives output. + +#### Symmetric Hash Join + +The 2-input Join uses a symmetric hash join — the same algorithm used by Apache Kafka for KStream-KStream joins and by Apache Flink for regular streaming joins. Both input channels are drained concurrently into a shared `asyncio.Queue`. For each arriving row: + +1. Buffer the row on its side and index it by the shared key columns. +2. Probe the opposite side's index for matching keys. +3. Emit all matches immediately. + +When the first rows from both sides have arrived, the shared key columns are determined (intersection of tag column names). Any rows that arrived before shared keys were known are re-indexed and cross-matched in a one-time reconciliation step. 
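The three steps above can be sketched as follows. This is an illustrative reduction of the operator: rows are plain dicts, the shared key columns are assumed known up front (so the one-time reconciliation step is omitted), and both sides feed a single `asyncio.Queue`.

```python
import asyncio
from collections import defaultdict


async def symmetric_hash_join(left, right, keys):
    """Symmetric hash join sketch over two async iterables of dict rows.

    Both sides are pumped concurrently into one queue; each arriving row is
    buffered and indexed on its own side, probed against the other side's
    index, and every match is emitted immediately.
    """
    queue: asyncio.Queue = asyncio.Queue()
    DONE = object()  # per-side end-of-stream marker

    async def pump(side: int, rows) -> None:
        async for row in rows:
            await queue.put((side, row))
        await queue.put((side, DONE))

    tasks = [asyncio.create_task(pump(0, left)),
             asyncio.create_task(pump(1, right))]
    index = (defaultdict(list), defaultdict(list))  # key tuple -> buffered rows
    finished = 0
    while finished < 2:
        side, row = await queue.get()
        if row is DONE:
            finished += 1
            continue
        key = tuple(row[c] for c in keys)
        index[side][key].append(row)          # 1. buffer + index this side
        for other in index[1 - side][key]:    # 2. probe the opposite index
            l, r = (row, other) if side == 0 else (other, row)
            yield {**l, **r}                  # 3. emit the match immediately
    for t in tasks:
        await t
```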
+ +**Comparison with industry stream processors:** + +| Aspect | Kafka Streams (KStream-KStream) | Apache Flink (Regular Join) | OrcaPod | +|---|---|---|---| +| Algorithm | Symmetric windowed hash join | Symmetric hash join with state TTL | Symmetric hash join | +| Windowing | Required (sliding window bounds state) | Optional (TTL evicts old state) | Not needed (finite streams) | +| State backend | RocksDB state stores for fault tolerance | RocksDB / heap state with checkpointing | In-memory buffers | +| State cleanup | Window expiry evicts old records | TTL or watermark eviction | Natural termination — inputs are finite | +| N-way joins | Chained pairwise joins | Chained pairwise joins | 2-way: symmetric hash; N > 2: barrier + Arrow join | + +The symmetric hash join is optimal for our use case: it emits results with minimum latency (as soon as a match exists on both sides) and requires no windowing complexity since OrcaPod streams are finite. For N > 2 inputs, the operator falls back to barrier mode with Arrow-level join execution, which is efficient for bounded data and avoids the complexity of chaining pairwise streaming joins. + +**Why not build-probe for Join?** Since Join is commutative and input sizes are unknown upfront, there is no principled way to choose which side to build vs. probe. Symmetric hash join avoids this asymmetry. SemiJoin, being non-commutative, has a natural build (right) and probe (left) side. + +**Why barrier for PolarsFilter and MergeJoin?** PolarsFilter requires a Polars DataFrame context for predicate evaluation, which needs full materialization. MergeJoin's column-merging semantics (colliding columns become sorted `list[T]`) require seeing all rows to produce correctly typed output columns. 
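For contrast with the symmetric case, the build-probe pattern used by SemiJoin reduces to a few lines. As in the other sketches, rows are plain dicts and the key columns are given explicitly; this is not the real operator code.

```python
async def semi_join(left, right, keys):
    """Build-probe sketch: fully consume the right (build) side into a set of
    key tuples, then stream the left (probe) side through the lookup."""
    build = {tuple(row[c] for c in keys) async for row in right}  # build phase
    async for row in left:                                        # probe phase
        if tuple(row[c] for c in keys) in build:
            yield row
```

The asymmetry is visible in the code: the right side must be fully drained before the first left row can be emitted, which is exactly why Join (commutative, with unknown input sizes) uses the symmetric variant instead.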
### Sync / Async Equivalence diff --git a/plan.md b/plan.md new file mode 100644 index 00000000..5ac73a63 --- /dev/null +++ b/plan.md @@ -0,0 +1,830 @@ +# Plan: Unified `process_packet` / `async_process_packet` + Node `async_execute` + +## Goal + +Establish `process_packet` and `async_process_packet` as **the** universal per-packet +interface across FunctionPod, FunctionPodStream, FunctionNode, and PersistentFunctionNode. +All iteration paths — sequential, concurrent, and async — route through these methods. +Add `async_execute` to all four Node classes. Add cache-aware `async_call` to +`CachedPacketFunction`. Remove `_execute_concurrent` module-level helper. + +--- + +## What exists today + +### Class hierarchy + +``` +_FunctionPodBase (TraceableBase) + ├── process_packet(tag, packet) → calls packet_function.call(packet) + ├── FunctionPod + │ ├── process() → FunctionPodStream + │ └── async_execute() → calls packet_function.async_call(packet) DIRECTLY + │ + FunctionPodStream (StreamBase) + │ ├── _iter_packets_sequential() → calls _function_pod.process_packet(tag, packet) ✓ + │ └── _iter_packets_concurrent() → calls _execute_concurrent(packet_function, ...) DIRECTLY + │ + FunctionNode (StreamBase) + │ ├── _iter_packets_sequential() → calls _packet_function.call(packet) DIRECTLY + │ ├── _iter_packets_concurrent() → calls _execute_concurrent(_packet_function, ...) DIRECTLY + │ └── (no async_execute) + │ + PersistentFunctionNode (FunctionNode) + ├── process_packet(tag, packet) → calls _packet_function.call(packet, skip_cache_*=...) + │ then add_pipeline_record(...) 
+ ├── iter_packets() → Phase 1: replay from DB + │ Phase 2: calls self.process_packet(tag, packet) ✓ + └── (no async_execute) + +OperatorNode (StreamBase) + ├── run() → calls _operator.process(*streams) + └── (no async_execute) + +PersistentOperatorNode (OperatorNode) + ├── _compute_and_store() → calls _operator.process() + bulk DB write + ├── _replay_from_cache() → loads from DB + └── (no async_execute) +``` + +### Module-level helpers + +```python +def _executor_supports_concurrent(packet_function) -> bool: + """True if the pf's executor supports concurrent execution.""" + +def _execute_concurrent(packet_function, packets) -> list[PacketProtocol | None]: + """Submit all packets concurrently via asyncio.gather(pf.async_call(...)). + Falls back to sequential pf.call() if already inside a running event loop.""" +``` + +### Problems + +1. **FunctionPod.async_execute** bypasses `process_packet` — calls `packet_function.async_call` + directly (line 317). +2. **FunctionPodStream._iter_packets_concurrent** bypasses `process_packet` — calls + `_execute_concurrent(packet_function, ...)` directly (line 472). +3. **FunctionNode._iter_packets_sequential** bypasses any process_packet — calls + `_packet_function.call(packet)` directly (line 831). +4. **FunctionNode._iter_packets_concurrent** same — calls `_execute_concurrent` directly + (line 852). +5. **CachedPacketFunction.async_call** inherits from `PacketFunctionWrapper` — completely + **bypasses the cache** (no lookup, no recording). +6. **No `async_process_packet`** exists anywhere. +7. **No `async_execute`** on any Node class. +8. **`_execute_concurrent`** is a module-level function that takes a raw `packet_function` + and list of bare `packets` — no way to route through `process_packet`. + +--- + +## Design principles + +### A. `process_packet` / `async_process_packet` is the single per-packet entry point + +Every class in the function pod hierarchy defines these two methods. 
**All** iteration and +execution paths go through them — sequential, concurrent, and async. No direct +`packet_function.call()` or `packet_function.async_call()` calls outside of these methods. + +``` +_FunctionPodBase.process_packet(tag, pkt) → packet_function.call(pkt) +_FunctionPodBase.async_process_packet(tag, pkt) → await packet_function.async_call(pkt) + +FunctionNode.process_packet(tag, pkt) → self._function_pod.process_packet(tag, pkt) +FunctionNode.async_process_packet(tag, pkt) → await self._function_pod.async_process_packet(tag, pkt) + +PersistentFunctionNode.process_packet(tag, pkt) → cache check → self._function_pod.process_packet → pipeline record +PersistentFunctionNode.async_process_packet(tag, pkt) → cache check → await self._function_pod.async_process_packet → pipeline record +``` + +Wait — there's a subtlety with PersistentFunctionNode. Today its `process_packet` calls +`self._packet_function.call(packet, skip_cache_lookup=..., skip_cache_insert=...)` directly, +where `self._packet_function` is a `CachedPacketFunction` (which wraps the original pf). +It does NOT delegate to the pod's `process_packet`. That's because PersistentFunctionNode +needs to pass `skip_cache_*` kwargs that the base `process_packet` doesn't accept. + +The cleanest structure: + +``` +PersistentFunctionNode.process_packet(tag, pkt) + → self._packet_function.call(pkt, skip_cache_*=...) # CachedPacketFunction (sync) + → self.add_pipeline_record(...) # pipeline DB (sync) + +PersistentFunctionNode.async_process_packet(tag, pkt) + → await self._packet_function.async_call(pkt, skip_cache_*=...) # CachedPacketFunction (async) + → self.add_pipeline_record(...) # pipeline DB (sync) +``` + +This is the same as today for the sync path. The `CachedPacketFunction` handles the result +cache internally. The `PersistentFunctionNode` handles pipeline records. 
Neither delegates +to the pod's `process_packet` — the pod is bypassed because the `CachedPacketFunction` +replaced the raw packet function in `__init__`. + +### B. Concurrent iteration routes through `async_process_packet` + +The concurrent path is inherently async — it uses `asyncio.gather`. So it naturally routes +through `async_process_packet`. The fallback path (when already inside an event loop) routes +through `process_packet` (sync). + +For **FunctionPodStream**, the target is the pod: +```python +# concurrent +await self._function_pod.async_process_packet(tag, pkt) +# fallback +self._function_pod.process_packet(tag, pkt) +``` + +For **FunctionNode**, the target is `self` — so overrides (PersistentFunctionNode) kick in: +```python +# concurrent +await self.async_process_packet(tag, pkt) +# fallback +self.process_packet(tag, pkt) +``` + +This means PersistentFunctionNode's concurrent path **automatically** gets cache checks + +pipeline records via polymorphism. No special handling needed. + +### C. `_execute_concurrent` is removed + +The module-level `_execute_concurrent(packet_function, packets)` helper is removed. Its +logic (asyncio.gather with event-loop fallback) is inlined into `_iter_packets_concurrent` +methods, but now routes through `process_packet` / `async_process_packet` instead of raw +`packet_function.call` / `packet_function.async_call`. + +The `_executor_supports_concurrent` helper stays — it's just a predicate check. + +### D. Sync and async are cleanly separated execution modes + +- Sync: `iter_packets()` / `as_table()` / `run()` +- Async: `async_execute(inputs, output)` + +They don't populate each other's caches. DB persistence (for Persistent variants) provides +durability that works across both modes. + +### E. OperatorNode delegates to operator, PersistentOperatorNode intercepts for storage + +Operators are opaque stream transformers — no per-packet hook. `OperatorNode` passes through +directly. 
`PersistentOperatorNode` uses an intermediate channel + `TaskGroup` to forward +results downstream immediately while collecting them for post-hoc DB storage. + +### F. DB operations stay synchronous + +The `ArrowDatabaseProtocol` is sync. All DB reads/writes within async methods are sync calls. +Acceptable because DB is typically in-process and fast. Async DB protocol is deferred. + +--- + +## Implementation steps + +### Step 1: Add `async_process_packet` to `_FunctionPodBase` + +**File:** `src/orcapod/core/function_pod.py` + +Add alongside existing `process_packet` (after line 180): + +```python +async def async_process_packet( + self, tag: TagProtocol, packet: PacketProtocol +) -> tuple[TagProtocol, PacketProtocol | None]: + """Async counterpart of ``process_packet``.""" + return tag, await self.packet_function.async_call(packet) +``` + +### Step 2: Fix `FunctionPod.async_execute` to use `async_process_packet` + +**File:** `src/orcapod/core/function_pod.py` + +Change the `process_one` inner function (lines 315-322): + +```python +async def process_one(tag: TagProtocol, packet: PacketProtocol) -> None: + try: + tag, result_packet = await self.async_process_packet(tag, packet) + if result_packet is not None: + await output.send((tag, result_packet)) + finally: + if sem is not None: + sem.release() +``` + +### Step 3: Fix `FunctionPodStream._iter_packets_concurrent` to use `async_process_packet` + +**File:** `src/orcapod/core/function_pod.py` + +Replace the `_execute_concurrent` call (lines 454-482) with direct `async_process_packet` +routing: + +```python +def _iter_packets_concurrent( + self, +) -> Iterator[tuple[TagProtocol, PacketProtocol]]: + """Collect remaining inputs, execute concurrently, and yield results in order.""" + input_iter = self._cached_input_iterator + + all_inputs: list[tuple[int, TagProtocol, PacketProtocol]] = [] + to_compute: list[tuple[int, TagProtocol, PacketProtocol]] = [] + for i, (tag, packet) in enumerate(input_iter): + 
all_inputs.append((i, tag, packet)) + if i not in self._cached_output_packets: + to_compute.append((i, tag, packet)) + self._cached_input_iterator = None + + if to_compute: + try: + loop = asyncio.get_running_loop() + except RuntimeError: + loop = None + + if loop is not None: + # Already in event loop — fall back to sequential sync + results = [ + self._function_pod.process_packet(tag, pkt) + for _, tag, pkt in to_compute + ] + else: + # No event loop — run concurrently via asyncio.run + async def _gather() -> list[tuple[TagProtocol, PacketProtocol | None]]: + return list( + await asyncio.gather( + *[ + self._function_pod.async_process_packet(tag, pkt) + for _, tag, pkt in to_compute + ] + ) + ) + + results = asyncio.run(_gather()) + + for (i, _, _), (tag, output_packet) in zip(to_compute, results): + self._cached_output_packets[i] = (tag, output_packet) + + for i, *_ in all_inputs: + tag, packet = self._cached_output_packets[i] + if packet is not None: + yield tag, packet +``` + +**Note:** The method signature drops the `packet_function` parameter — it no longer needs +it since it routes through `self._function_pod`. 
+ +The `iter_packets` method that calls this also needs updating — remove the `pf` argument: + +```python +def iter_packets(self) -> Iterator[tuple[TagProtocol, PacketProtocol]]: + if self.is_stale: + self.clear_cache() + if self._cached_input_iterator is not None: + if _executor_supports_concurrent(self._function_pod.packet_function): + yield from self._iter_packets_concurrent() + else: + yield from self._iter_packets_sequential() + else: + for i in range(len(self._cached_output_packets)): + tag, packet = self._cached_output_packets[i] + if packet is not None: + yield tag, packet +``` + +### Step 4: Fix `FunctionNode._iter_packets_sequential` to use `process_packet` + +**File:** `src/orcapod/core/function_pod.py` + +Change line 831 from: +```python +output_packet = self._packet_function.call(packet) +self._cached_output_packets[i] = (tag, output_packet) +``` +to: +```python +tag, output_packet = self.process_packet(tag, packet) +self._cached_output_packets[i] = (tag, output_packet) +``` + +### Step 5: Fix `FunctionNode._iter_packets_concurrent` to use `async_process_packet` + +**File:** `src/orcapod/core/function_pod.py` + +Same transformation as Step 3, but routing through `self` instead of `self._function_pod`: + +```python +def _iter_packets_concurrent( + self, +) -> Iterator[tuple[TagProtocol, PacketProtocol]]: + """Collect remaining inputs, execute concurrently, and yield results in order.""" + input_iter = self._cached_input_iterator + + all_inputs: list[tuple[int, TagProtocol, PacketProtocol]] = [] + to_compute: list[tuple[int, TagProtocol, PacketProtocol]] = [] + for i, (tag, packet) in enumerate(input_iter): + all_inputs.append((i, tag, packet)) + if i not in self._cached_output_packets: + to_compute.append((i, tag, packet)) + self._cached_input_iterator = None + + if to_compute: + try: + loop = asyncio.get_running_loop() + except RuntimeError: + loop = None + + if loop is not None: + # Already in event loop — fall back to sequential sync + results = [ + 
self.process_packet(tag, pkt) + for _, tag, pkt in to_compute + ] + else: + # No event loop — run concurrently via asyncio.run + async def _gather() -> list[tuple[TagProtocol, PacketProtocol | None]]: + return list( + await asyncio.gather( + *[ + self.async_process_packet(tag, pkt) + for _, tag, pkt in to_compute + ] + ) + ) + + results = asyncio.run(_gather()) + + for (i, _, _), (tag, output_packet) in zip(to_compute, results): + self._cached_output_packets[i] = (tag, output_packet) + + for i, *_ in all_inputs: + tag, packet = self._cached_output_packets[i] + if packet is not None: + yield tag, packet +``` + +**Critical difference from Step 3:** Uses `self.process_packet` / `self.async_process_packet` +instead of `self._function_pod.*`. This means when `PersistentFunctionNode` inherits this +method, it automatically routes through its overridden `process_packet` / +`async_process_packet` which include cache checks + pipeline record storage. + +### Step 6: Remove `_execute_concurrent` + +**File:** `src/orcapod/core/function_pod.py` + +Delete the `_execute_concurrent` function (lines 52-82). Its logic is now inlined into the +`_iter_packets_concurrent` methods. + +### Step 7: Add `process_packet` and `async_process_packet` to `FunctionNode` + +**File:** `src/orcapod/core/function_pod.py` + +FunctionNode currently has no `process_packet`. 
Add delegation to the function pod: + +```python +def process_packet( + self, tag: TagProtocol, packet: PacketProtocol +) -> tuple[TagProtocol, PacketProtocol | None]: + """Process a single packet by delegating to the function pod.""" + return self._function_pod.process_packet(tag, packet) + +async def async_process_packet( + self, tag: TagProtocol, packet: PacketProtocol +) -> tuple[TagProtocol, PacketProtocol | None]: + """Async counterpart of ``process_packet``.""" + return await self._function_pod.async_process_packet(tag, packet) +``` + +### Step 8: Add `FunctionNode.async_execute` + +**File:** `src/orcapod/core/function_pod.py` + +Sequential streaming through `async_process_packet`: + +```python +async def async_execute( + self, + inputs: Sequence[ReadableChannel[tuple[TagProtocol, PacketProtocol]]], + output: WritableChannel[tuple[TagProtocol, PacketProtocol]], +) -> None: + """Streaming async execution — process each packet via async_process_packet.""" + try: + async for tag, packet in inputs[0]: + tag, result_packet = await self.async_process_packet(tag, packet) + if result_packet is not None: + await output.send((tag, result_packet)) + finally: + await output.close() +``` + +### Step 9: Add async cache-aware `async_call` to `CachedPacketFunction` + +**File:** `src/orcapod/core/packet_function.py` + +Override `async_call` to mirror the sync `call()` logic (lines 508-533): + +```python +async def async_call( + self, + packet: PacketProtocol, + *, + skip_cache_lookup: bool = False, + skip_cache_insert: bool = False, +) -> PacketProtocol | None: + """Async counterpart of ``call`` with cache check and recording.""" + output_packet = None + if not skip_cache_lookup: + logger.info("Checking for cache...") + output_packet = self.get_cached_output_for_packet(packet) + if output_packet is not None: + logger.info(f"Cache hit for {packet}!") + if output_packet is None: + output_packet = await self._packet_function.async_call(packet) + if output_packet is not None: + 
if not skip_cache_insert: + self.record_packet(packet, output_packet) + output_packet = output_packet.with_meta_columns( + **{self.RESULT_COMPUTED_FLAG: True} + ) + return output_packet +``` + +### Step 10: Add `async_process_packet` to `PersistentFunctionNode` + +**File:** `src/orcapod/core/function_pod.py` + +PersistentFunctionNode already has `process_packet` (line 1027-1066) which calls +`self._packet_function.call(packet, skip_cache_*=...)` (where `_packet_function` is a +`CachedPacketFunction`) then `self.add_pipeline_record(...)`. Add the async counterpart: + +```python +async def async_process_packet( + self, + tag: TagProtocol, + packet: PacketProtocol, + skip_cache_lookup: bool = False, + skip_cache_insert: bool = False, +) -> tuple[TagProtocol, PacketProtocol | None]: + """Async counterpart of ``process_packet``. + + Uses the CachedPacketFunction's async_call for computation + result caching. + Pipeline record storage is synchronous (DB protocol is sync). + """ + output_packet = await self._packet_function.async_call( + packet, + skip_cache_lookup=skip_cache_lookup, + skip_cache_insert=skip_cache_insert, + ) + + if output_packet is not None: + result_computed = bool( + output_packet.get_meta_value( + self._packet_function.RESULT_COMPUTED_FLAG, False + ) + ) + self.add_pipeline_record( + tag, + packet, + packet_record_id=output_packet.datagram_id, + computed=result_computed, + ) + + return tag, output_packet +``` + +### Step 11: Add `PersistentFunctionNode.async_execute` (two-phase) + +**File:** `src/orcapod/core/function_pod.py` + +Overrides `FunctionNode.async_execute`: + +```python +async def async_execute( + self, + inputs: Sequence[ReadableChannel[tuple[TagProtocol, PacketProtocol]]], + output: WritableChannel[tuple[TagProtocol, PacketProtocol]], +) -> None: + """Two-phase async execution: replay cached, then compute missing.""" + try: + # Phase 1: emit existing results from DB + existing = self.get_all_records(columns={"meta": True}) + 
computed_hashes: set[str] = set() + if existing is not None and existing.num_rows > 0: + tag_keys = self._input_stream.keys()[0] + hash_col = constants.INPUT_PACKET_HASH_COL + computed_hashes = set( + cast(list[str], existing.column(hash_col).to_pylist()) + ) + data_table = existing.drop([hash_col]) + existing_stream = ArrowTableStream(data_table, tag_columns=tag_keys) + for tag, packet in existing_stream.iter_packets(): + await output.send((tag, packet)) + + # Phase 2: process packets not already in the DB + async for tag, packet in inputs[0]: + input_hash = packet.content_hash().to_string() + if input_hash in computed_hashes: + continue + tag, output_packet = await self.async_process_packet(tag, packet) + if output_packet is not None: + await output.send((tag, output_packet)) + finally: + await output.close() +``` + +### Step 12: Add `OperatorNode.async_execute` + +**File:** `src/orcapod/core/operator_node.py` + +Direct pass-through: + +```python +async def async_execute( + self, + inputs: Sequence[ReadableChannel[tuple[TagProtocol, PacketProtocol]]], + output: WritableChannel[tuple[TagProtocol, PacketProtocol]], +) -> None: + """Delegate to operator's async_execute.""" + await self._operator.async_execute(inputs, output) +``` + +### Step 13: Extract `_store_output_stream` from `PersistentOperatorNode._compute_and_store` + +**File:** `src/orcapod/core/operator_node.py` + +```python +def _store_output_stream(self, stream: StreamProtocol) -> None: + """Materialize stream and store in the pipeline database with per-row dedup.""" + output_table = stream.as_table( + columns={"source": True, "system_tags": True}, + ) + + arrow_hasher = self.data_context.arrow_hasher + record_hashes = [] + for batch in output_table.to_batches(): + for i in range(len(batch)): + record_hashes.append( + arrow_hasher.hash_table(batch.slice(i, 1)).to_hex() + ) + + output_table = output_table.add_column( + 0, + self.HASH_COLUMN_NAME, + pa.array(record_hashes, type=pa.large_string()), + ) + + 
self._pipeline_database.add_records( + self.pipeline_path, + output_table, + record_id_column=self.HASH_COLUMN_NAME, + skip_duplicates=True, + ) + + self._cached_output_table = output_table.drop(self.HASH_COLUMN_NAME) +``` + +Refactor `_compute_and_store`: + +```python +def _compute_and_store(self) -> None: + self._cached_output_stream = self._operator.process(*self._input_streams) + if self._cache_mode == CacheMode.OFF: + self._update_modified_time() + return + self._store_output_stream(self._cached_output_stream) + self._update_modified_time() +``` + +### Step 14: Add `PersistentOperatorNode.async_execute` + +**File:** `src/orcapod/core/operator_node.py` + +```python +async def async_execute( + self, + inputs: Sequence[ReadableChannel[tuple[TagProtocol, PacketProtocol]]], + output: WritableChannel[tuple[TagProtocol, PacketProtocol]], +) -> None: + """Async execution with cache mode handling. + + REPLAY: emit from DB, close output. + OFF: delegate to operator, forward results. + LOG: delegate to operator, forward + collect results, then store in DB. 
+ """ + try: + if self._cache_mode == CacheMode.REPLAY: + self._replay_from_cache() + assert self._cached_output_stream is not None + for tag, packet in self._cached_output_stream.iter_packets(): + await output.send((tag, packet)) + return # finally block closes output + + # OFF or LOG: delegate to operator, forward results downstream + intermediate = Channel[tuple[TagProtocol, PacketProtocol]]() + collected: list[tuple[TagProtocol, PacketProtocol]] = [] + + async def forward() -> None: + async for item in intermediate.reader: + collected.append(item) + await output.send(item) + + async with asyncio.TaskGroup() as tg: + tg.create_task( + self._operator.async_execute(inputs, intermediate.writer) + ) + tg.create_task(forward()) + + # TaskGroup has completed — all results are in `collected` + # Store if LOG mode (sync DB write, post-hoc) + if self._cache_mode == CacheMode.LOG and collected: + stream = StaticOutputPod._materialize_to_stream(collected) + self._cached_output_stream = stream + self._store_output_stream(stream) + + self._update_modified_time() + finally: + await output.close() +``` + +### Step 15: Add imports + +**`src/orcapod/core/operator_node.py`** — add: +```python +import asyncio +from collections.abc import Sequence + +from orcapod.channels import Channel, ReadableChannel, WritableChannel +from orcapod.core.static_output_pod import StaticOutputPod +``` + +**`src/orcapod/core/function_pod.py`** — already has all needed imports. + +### Step 16: Update regression test for `_execute_concurrent` removal + +**File:** `tests/test_core/test_regression_fixes.py` + +`TestExecuteConcurrentInRunningLoop` imports and tests `_execute_concurrent` directly. +Since we're removing that function, this test class needs to be rewritten to test the +behavior through the actual classes: + +- Test that `FunctionPodStream._iter_packets_concurrent` falls back to sequential + `process_packet` when called inside a running event loop. 
+- Test that `FunctionNode._iter_packets_concurrent` does the same. + +The tested behavior (event-loop fallback) is preserved — it's just now method-internal +rather than in a standalone helper. + +### Step 17: Tests for new functionality + +**File:** `tests/test_channels/test_node_async_execute.py` (new) + +``` +TestProtocolConformance + - test_function_node_satisfies_async_executable_protocol + - test_persistent_function_node_satisfies_async_executable_protocol + - test_operator_node_satisfies_async_executable_protocol + - test_persistent_operator_node_satisfies_async_executable_protocol + +TestCachedPacketFunctionAsync + - test_async_call_cache_miss_computes_and_records + - test_async_call_cache_hit_returns_cached + - test_async_call_skip_cache_lookup + - test_async_call_skip_cache_insert + +TestProcessPacketRouting + - test_function_pod_stream_sequential_uses_process_packet + - test_function_pod_stream_concurrent_uses_async_process_packet + - test_function_node_sequential_uses_process_packet + - test_function_node_concurrent_uses_async_process_packet + - test_persistent_function_node_concurrent_uses_overridden_async_process_packet + - test_concurrent_fallback_in_event_loop_uses_sync_process_packet + +TestFunctionNodeAsyncExecute + - test_basic_streaming_matches_sync + - test_empty_input_closes_cleanly + - test_none_packets_filtered_out + +TestPersistentFunctionNodeAsyncExecute + - test_no_cache_processes_all_inputs + - test_phase1_emits_cached_results + - test_phase2_skips_cached_computes_new + - test_pipeline_records_created_for_new_packets + - test_result_cache_populated_for_new_packets + +TestOperatorNodeAsyncExecute + - test_unary_op_delegation (SelectPacketColumns) + - test_binary_op_delegation (SemiJoin) + - test_nary_op_delegation (Join) + - test_results_match_sync_run + +TestPersistentOperatorNodeAsyncExecute + - test_off_mode_computes_no_db_write + - test_log_mode_computes_and_stores + - test_log_mode_results_match_sync + - 
test_replay_mode_emits_from_db + - test_replay_empty_db_returns_empty + +TestEndToEnd + - test_source_to_persistent_function_node_pipeline + - test_source_to_persistent_operator_node_pipeline +``` + +### Step 18: Run full test suite + +```bash +uv run pytest tests/ -x +``` + +--- + +## Summary of all changes + +### Call chains after changes + +**Sync sequential path:** +``` +FunctionPodStream._iter_packets_sequential + → self._function_pod.process_packet(tag, pkt) # already correct + → packet_function.call(pkt) + +FunctionNode._iter_packets_sequential + → self.process_packet(tag, pkt) # CHANGED: was _packet_function.call(pkt) + → self._function_pod.process_packet(tag, pkt) + → packet_function.call(pkt) + +PersistentFunctionNode._iter_packets_sequential (inherited from FunctionNode) + → self.process_packet(tag, pkt) # polymorphism kicks in + → CachedPacketFunction.call(pkt, skip_cache_*=...) # cache check + compute + record + → self.add_pipeline_record(...) # pipeline DB +``` + +**Sync concurrent path:** +``` +FunctionPodStream._iter_packets_concurrent + → asyncio.run(gather( + self._function_pod.async_process_packet(tag, pkt) ... # CHANGED: was _execute_concurrent + )) + OR (if event loop running): + self._function_pod.process_packet(tag, pkt) ... # fallback + +FunctionNode._iter_packets_concurrent + → asyncio.run(gather( + self.async_process_packet(tag, pkt) ... # CHANGED: was _execute_concurrent + )) + OR (if event loop running): + self.process_packet(tag, pkt) ... # fallback + +PersistentFunctionNode._iter_packets_concurrent (inherited from FunctionNode) + → asyncio.run(gather( + self.async_process_packet(tag, pkt) ... # polymorphism kicks in + → await CachedPacketFunction.async_call(pkt) # cache + compute + → self.add_pipeline_record(...) 
# pipeline DB + )) +``` + +**Async execution path:** +``` +FunctionPod.async_execute + → await self.async_process_packet(tag, pkt) # CHANGED: was packet_function.async_call + → await packet_function.async_call(pkt) + +FunctionNode.async_execute # NEW + → await self.async_process_packet(tag, pkt) + → await self._function_pod.async_process_packet(tag, pkt) + → await packet_function.async_call(pkt) + +PersistentFunctionNode.async_execute # NEW (two-phase) + Phase 1: emit from DB + Phase 2: + → await self.async_process_packet(tag, pkt) # polymorphic override + → await CachedPacketFunction.async_call(pkt) # cache + compute + → self.add_pipeline_record(...) # pipeline DB (sync) + +OperatorNode.async_execute # NEW + → await operator.async_execute(inputs, output) + +PersistentOperatorNode.async_execute # NEW + REPLAY: emit from DB + OFF/LOG: + TaskGroup: + operator.async_execute(inputs, intermediate.writer) + forward(intermediate.reader → output + collect) + if LOG: _store_output_stream(materialize(collected)) # sync DB write +``` + +### Files modified + +| File | Changes | +|------|---------| +| `src/orcapod/core/packet_function.py` | Add `CachedPacketFunction.async_call` override with cache logic | +| `src/orcapod/core/function_pod.py` | (1) Add `_FunctionPodBase.async_process_packet` | +| | (2) Fix `FunctionPod.async_execute` to use `async_process_packet` | +| | (3) Rewrite `FunctionPodStream._iter_packets_concurrent` — route through `_function_pod.async_process_packet` / `process_packet`, drop `packet_function` param | +| | (4) Update `FunctionPodStream.iter_packets` — remove `pf` arg to `_iter_packets_concurrent` | +| | (5) Fix `FunctionNode._iter_packets_sequential` to use `self.process_packet` | +| | (6) Rewrite `FunctionNode._iter_packets_concurrent` — route through `self.async_process_packet` / `self.process_packet` | +| | (7) Add `FunctionNode.process_packet` + `async_process_packet` (delegate to pod) | +| | (8) Add `FunctionNode.async_execute` | +| | (9) Add 
`PersistentFunctionNode.async_process_packet` (cache + pipeline records) | +| | (10) Add `PersistentFunctionNode.async_execute` (two-phase) | +| | (11) Remove `_execute_concurrent` module-level helper | +| `src/orcapod/core/operator_node.py` | (1) Add imports | +| | (2) Add `OperatorNode.async_execute` (pass-through) | +| | (3) Extract `PersistentOperatorNode._store_output_stream` | +| | (4) Refactor `PersistentOperatorNode._compute_and_store` | +| | (5) Add `PersistentOperatorNode.async_execute` (TaskGroup + post-hoc storage) | +| `tests/test_core/test_regression_fixes.py` | Rewrite `TestExecuteConcurrentInRunningLoop` — test through classes instead of removed helper | +| `tests/test_channels/test_node_async_execute.py` | New test file | diff --git a/src/orcapod/core/function_pod.py b/src/orcapod/core/function_pod.py index 7fa5ca51..a7618d7a 100644 --- a/src/orcapod/core/function_pod.py +++ b/src/orcapod/core/function_pod.py @@ -49,38 +49,6 @@ def _executor_supports_concurrent( return executor is not None and executor.supports_concurrent_execution -def _execute_concurrent( - packet_function: PacketFunctionProtocol, - packets: list[PacketProtocol], -) -> list[PacketProtocol | None]: - """Submit all *packets* to the executor concurrently and return results in order. - - Uses ``asyncio.gather`` to run all tasks concurrently, then blocks - until all complete. If an event loop is already running (e.g. inside - ``async def`` code, notebooks, or pytest-asyncio), falls back to - sequential execution to avoid ``RuntimeError``. - """ - import asyncio - - try: - loop = asyncio.get_running_loop() - except RuntimeError: - loop = None - - if loop is not None: - # Already inside an event loop -- cannot call asyncio.run(). - # Fall back to sequential synchronous execution. 
- return [packet_function.call(pkt) for pkt in packets] - - async def _gather() -> list[PacketProtocol | None]: - return list( - await asyncio.gather( - *[packet_function.async_call(pkt) for pkt in packets] - ) - ) - - return asyncio.run(_gather()) - class _FunctionPodBase(TraceableBase): """Base pod that applies a packet function to each input packet.""" @@ -179,6 +147,12 @@ def process_packet( """ return tag, self.packet_function.call(packet) + async def async_process_packet( + self, tag: TagProtocol, packet: PacketProtocol + ) -> tuple[TagProtocol, PacketProtocol | None]: + """Async counterpart of ``process_packet``.""" + return tag, await self.packet_function.async_call(packet) + def handle_input_streams(self, *streams: StreamProtocol) -> StreamProtocol: """Handle multiple input streams by joining them if necessary. @@ -314,7 +288,7 @@ async def async_execute( async def process_one(tag: TagProtocol, packet: PacketProtocol) -> None: try: - result_packet = await self.packet_function.async_call(packet) + tag, result_packet = await self.async_process_packet(tag, packet) if result_packet is not None: await output.send((tag, result_packet)) finally: @@ -419,9 +393,8 @@ def iter_packets(self) -> Iterator[tuple[TagProtocol, PacketProtocol]]: if self.is_stale: self.clear_cache() if self._cached_input_iterator is not None: - pf = self._function_pod.packet_function - if _executor_supports_concurrent(pf): - yield from self._iter_packets_concurrent(pf) + if _executor_supports_concurrent(self._function_pod.packet_function): + yield from self._iter_packets_concurrent() else: yield from self._iter_packets_sequential() else: @@ -453,7 +426,6 @@ def _iter_packets_sequential( def _iter_packets_concurrent( self, - packet_function: PacketFunctionProtocol, ) -> Iterator[tuple[TagProtocol, PacketProtocol]]: """Collect remaining inputs, execute concurrently, and yield results in order.""" input_iter = self._cached_input_iterator @@ -467,12 +439,33 @@ def _iter_packets_concurrent( 
to_compute.append((i, tag, packet)) self._cached_input_iterator = None - # Submit uncached packets concurrently and cache results. + # Submit uncached packets concurrently via async_process_packet. if to_compute: - results = _execute_concurrent( - packet_function, [pkt for _, _, pkt in to_compute] - ) - for (i, tag, _), output_packet in zip(to_compute, results): + try: + loop = asyncio.get_running_loop() + except RuntimeError: + loop = None + + if loop is not None: + # Already in event loop — fall back to sequential sync + results = [ + self._function_pod.process_packet(tag, pkt) + for _, tag, pkt in to_compute + ] + else: + async def _gather() -> list[tuple[TagProtocol, PacketProtocol | None]]: + return list( + await asyncio.gather( + *[ + self._function_pod.async_process_packet(tag, pkt) + for _, tag, pkt in to_compute + ] + ) + ) + + results = asyncio.run(_gather()) + + for (i, _, _), (tag, output_packet) in zip(to_compute, results): self._cached_output_packets[i] = (tag, output_packet) # Yield everything in original order. 
@@ -818,6 +811,18 @@ def iter_packets(self) -> Iterator[tuple[TagProtocol, PacketProtocol]]: if packet is not None: yield tag, packet + def process_packet( + self, tag: TagProtocol, packet: PacketProtocol + ) -> tuple[TagProtocol, PacketProtocol | None]: + """Process a single packet by delegating to the function pod.""" + return self._function_pod.process_packet(tag, packet) + + async def async_process_packet( + self, tag: TagProtocol, packet: PacketProtocol + ) -> tuple[TagProtocol, PacketProtocol | None]: + """Async counterpart of ``process_packet``.""" + return await self._function_pod.async_process_packet(tag, packet) + def _iter_packets_sequential( self, ) -> Iterator[tuple[TagProtocol, PacketProtocol]]: @@ -828,7 +833,7 @@ def _iter_packets_sequential( if packet is not None: yield tag, packet else: - output_packet = self._packet_function.call(packet) + tag, output_packet = self.process_packet(tag, packet) self._cached_output_packets[i] = (tag, output_packet) if output_packet is not None: yield tag, output_packet @@ -849,10 +854,31 @@ def _iter_packets_concurrent( self._cached_input_iterator = None if to_compute: - results = _execute_concurrent( - self._packet_function, [pkt for _, _, pkt in to_compute] - ) - for (i, tag, _), output_packet in zip(to_compute, results): + try: + loop = asyncio.get_running_loop() + except RuntimeError: + loop = None + + if loop is not None: + # Already in event loop — fall back to sequential sync + results = [ + self.process_packet(tag, pkt) + for _, tag, pkt in to_compute + ] + else: + async def _gather() -> list[tuple[TagProtocol, PacketProtocol | None]]: + return list( + await asyncio.gather( + *[ + self.async_process_packet(tag, pkt) + for _, tag, pkt in to_compute + ] + ) + ) + + results = asyncio.run(_gather()) + + for (i, _, _), (tag, output_packet) in zip(to_compute, results): self._cached_output_packets[i] = (tag, output_packet) for i, *_ in all_inputs: @@ -945,13 +971,22 @@ def as_table( ) return output_table + # 
------------------------------------------------------------------ + # Async channel execution + # ------------------------------------------------------------------ + async def async_execute( self, inputs: Sequence[ReadableChannel[tuple[TagProtocol, PacketProtocol]]], output: WritableChannel[tuple[TagProtocol, PacketProtocol]], pipeline_config: PipelineConfig | None = None, ) -> None: - """Streaming async execution for FunctionNode.""" + """Streaming async execution for FunctionNode. + + Routes each packet through ``async_process_packet`` so that + subclasses (e.g. ``PersistentFunctionNode``) can override the + per-packet logic without re-implementing the concurrency scaffold. + """ try: pipeline_config = pipeline_config or PipelineConfig() node_config = ( @@ -964,9 +999,11 @@ async def async_execute( async def process_one(tag: TagProtocol, packet: PacketProtocol) -> None: try: - result_packet = self._packet_function.call(packet) + tag_out, result_packet = await self.async_process_packet( + tag, packet + ) if result_packet is not None: - await output.send((tag, result_packet)) + await output.send((tag_out, result_packet)) finally: if sem is not None: sem.release() @@ -1099,6 +1136,39 @@ def process_packet( return tag, output_packet + async def async_process_packet( + self, + tag: TagProtocol, + packet: PacketProtocol, + skip_cache_lookup: bool = False, + skip_cache_insert: bool = False, + ) -> tuple[TagProtocol, PacketProtocol | None]: + """Async counterpart of ``process_packet``. + + Uses the CachedPacketFunction's async_call for computation + result + caching. Pipeline record storage is synchronous (DB protocol is sync). 
+ """ + output_packet = await self._packet_function.async_call( + packet, + skip_cache_lookup=skip_cache_lookup, + skip_cache_insert=skip_cache_insert, + ) + + if output_packet is not None: + result_computed = bool( + output_packet.get_meta_value( + self._packet_function.RESULT_COMPUTED_FLAG, False + ) + ) + self.add_pipeline_record( + tag, + packet, + packet_record_id=output_packet.datagram_id, + computed=result_computed, + ) + + return tag, output_packet + def add_pipeline_record( self, tag: TagProtocol, @@ -1262,6 +1332,42 @@ def run(self) -> None: for _ in self.iter_packets(): pass + # ------------------------------------------------------------------ + # Async channel execution (two-phase) + # ------------------------------------------------------------------ + + async def async_execute( + self, + inputs: Sequence[ReadableChannel[tuple[TagProtocol, PacketProtocol]]], + output: WritableChannel[tuple[TagProtocol, PacketProtocol]], + ) -> None: + """Two-phase async execution: replay cached, then compute missing.""" + try: + # Phase 1: emit existing results from DB + existing = self.get_all_records(columns={"meta": True}) + computed_hashes: set[str] = set() + if existing is not None and existing.num_rows > 0: + tag_keys = self._input_stream.keys()[0] + hash_col = constants.INPUT_PACKET_HASH_COL + computed_hashes = set( + cast(list[str], existing.column(hash_col).to_pylist()) + ) + data_table = existing.drop([hash_col]) + existing_stream = ArrowTableStream(data_table, tag_columns=tag_keys) + for tag, packet in existing_stream.iter_packets(): + await output.send((tag, packet)) + + # Phase 2: process packets not already in the DB + async for tag, packet in inputs[0]: + input_hash = packet.content_hash().to_string() + if input_hash in computed_hashes: + continue + tag, output_packet = await self.async_process_packet(tag, packet) + if output_packet is not None: + await output.send((tag, output_packet)) + finally: + await output.close() + def as_source(self): """Return 
a DerivedSource backed by the DB records of this node.""" from orcapod.core.sources.derived_source import DerivedSource diff --git a/src/orcapod/core/operator_node.py b/src/orcapod/core/operator_node.py index 3bf87485..6de06e70 100644 --- a/src/orcapod/core/operator_node.py +++ b/src/orcapod/core/operator_node.py @@ -1,13 +1,14 @@ from __future__ import annotations +import asyncio import logging from collections.abc import Iterator, Sequence from typing import TYPE_CHECKING, Any -from orcapod.channels import ReadableChannel, WritableChannel - from orcapod import contexts +from orcapod.channels import Channel, ReadableChannel, WritableChannel from orcapod.config import Config +from orcapod.core.static_output_pod import StaticOutputPod from orcapod.core.streams.base import StreamBase from orcapod.core.tracker import DEFAULT_TRACKER_MANAGER from orcapod.protocols.core_protocols import ( @@ -158,13 +159,25 @@ def as_table( assert self._cached_output_stream is not None return self._cached_output_stream.as_table(columns=columns, all_info=all_info) + # ------------------------------------------------------------------ + # Async channel execution + # ------------------------------------------------------------------ + async def async_execute( self, inputs: Sequence[ReadableChannel[tuple[TagProtocol, PacketProtocol]]], output: WritableChannel[tuple[TagProtocol, PacketProtocol]], ) -> None: - """Delegate to the wrapped operator's async_execute.""" - await self._operator.async_execute(inputs, output) + """Delegate to the wrapped operator's async_execute. + + Passes pipeline hashes from the input streams so that + multi-input operators can compute canonical system-tag + column names without storing state during validation. 
+ """ + hashes = [s.pipeline_hash() for s in self._input_streams] + await self._operator.async_execute( + inputs, output, input_pipeline_hashes=hashes + ) def __repr__(self) -> str: return ( @@ -242,18 +255,9 @@ def pipeline_path(self) -> tuple[str, ...]: + (f"node:{self._pipeline_node_hash}",) ) - def _compute_and_store(self) -> None: - """Compute operator output, optionally store in DB.""" - self._cached_output_stream = self._operator.process( - *self._input_streams, - ) - - if self._cache_mode == CacheMode.OFF: - self._update_modified_time() - return - - # Materialize for DB storage (LOG and REPLAY modes) - output_table = self._cached_output_stream.as_table( + def _store_output_stream(self, stream: StreamProtocol) -> None: + """Materialize stream and store in the pipeline database with per-row dedup.""" + output_table = stream.as_table( columns={"source": True, "system_tags": True}, ) @@ -281,6 +285,18 @@ def _compute_and_store(self) -> None: ) self._cached_output_table = output_table.drop(self.HASH_COLUMN_NAME) + + def _compute_and_store(self) -> None: + """Compute operator output, optionally store in DB.""" + self._cached_output_stream = self._operator.process( + *self._input_streams, + ) + + if self._cache_mode == CacheMode.OFF: + self._update_modified_time() + return + + self._store_output_stream(self._cached_output_stream) self._update_modified_time() def _replay_from_cache(self) -> None: @@ -368,6 +384,61 @@ def get_all_records( return results if results.num_rows > 0 else None + # ------------------------------------------------------------------ + # Async channel execution + # ------------------------------------------------------------------ + + async def async_execute( + self, + inputs: Sequence[ReadableChannel[tuple[TagProtocol, PacketProtocol]]], + output: WritableChannel[tuple[TagProtocol, PacketProtocol]], + ) -> None: + """Async execution with cache mode handling. + + REPLAY: emit from DB, close output. + OFF: delegate to operator, forward results. 
+ LOG: delegate to operator, forward + collect results, then store in DB. + """ + try: + if self._cache_mode == CacheMode.REPLAY: + self._replay_from_cache() + assert self._cached_output_stream is not None + for tag, packet in self._cached_output_stream.iter_packets(): + await output.send((tag, packet)) + return # finally block closes output + + # OFF or LOG: delegate to operator, forward results downstream + intermediate: Channel[tuple[TagProtocol, PacketProtocol]] = Channel() + should_collect = self._cache_mode == CacheMode.LOG + collected: list[tuple[TagProtocol, PacketProtocol]] = [] + + async def forward() -> None: + async for item in intermediate.reader: + if should_collect: + collected.append(item) + await output.send(item) + + hashes = [s.pipeline_hash() for s in self._input_streams] + async with asyncio.TaskGroup() as tg: + tg.create_task( + self._operator.async_execute( + inputs, + intermediate.writer, + input_pipeline_hashes=hashes, + ) + ) + tg.create_task(forward()) + + # TaskGroup has completed — store if LOG mode (sync DB write, post-hoc) + if should_collect and collected: + stream = StaticOutputPod._materialize_to_stream(collected) + self._cached_output_stream = stream + self._store_output_stream(stream) + + self._update_modified_time() + finally: + await output.close() + # ------------------------------------------------------------------ # DerivedSource # ------------------------------------------------------------------ diff --git a/src/orcapod/core/operators/base.py b/src/orcapod/core/operators/base.py index ab7b5fc2..fbc20fbf 100644 --- a/src/orcapod/core/operators/base.py +++ b/src/orcapod/core/operators/base.py @@ -13,7 +13,7 @@ StreamProtocol, TagProtocol, ) -from orcapod.types import ColumnConfig, Schema +from orcapod.types import ColumnConfig, ContentHash, Schema class UnaryOperator(StaticOutputPod): @@ -72,6 +72,8 @@ async def async_execute( self, inputs: Sequence[ReadableChannel[tuple[TagProtocol, PacketProtocol]]], output: 
WritableChannel[tuple[TagProtocol, PacketProtocol]], + *, + input_pipeline_hashes: Sequence[ContentHash] | None = None, ) -> None: """Barrier-mode: collect single input, run unary_static_process, emit.""" try: @@ -154,6 +156,8 @@ async def async_execute( self, inputs: Sequence[ReadableChannel[tuple[TagProtocol, PacketProtocol]]], output: WritableChannel[tuple[TagProtocol, PacketProtocol]], + *, + input_pipeline_hashes: Sequence[ContentHash] | None = None, ) -> None: """Barrier-mode: collect both inputs concurrently, run binary_static_process, emit.""" try: diff --git a/src/orcapod/core/operators/batch.py b/src/orcapod/core/operators/batch.py index d49eeaa6..28e5cc4c 100644 --- a/src/orcapod/core/operators/batch.py +++ b/src/orcapod/core/operators/batch.py @@ -1,8 +1,10 @@ +from collections.abc import Sequence from typing import TYPE_CHECKING, Any +from orcapod.channels import ReadableChannel, WritableChannel from orcapod.core.operators.base import UnaryOperator from orcapod.core.streams import ArrowTableStream -from orcapod.protocols.core_protocols import StreamProtocol +from orcapod.protocols.core_protocols import PacketProtocol, StreamProtocol, TagProtocol from orcapod.types import ColumnConfig from orcapod.utils.lazy_module import LazyModule @@ -91,5 +93,48 @@ def unary_output_schema( # TODO: check if this is really necessary return Schema(batched_tag_types), Schema(batched_packet_types) + async def async_execute( + self, + inputs: Sequence[ReadableChannel[tuple[TagProtocol, PacketProtocol]]], + output: WritableChannel[tuple[TagProtocol, PacketProtocol]], + **kwargs: Any, + ) -> None: + """Streaming batch: emit full batches as they accumulate. + + When ``batch_size > 0``, each group of ``batch_size`` rows is + materialized and emitted immediately, allowing downstream consumers + to start processing before all input is consumed. When + ``batch_size == 0`` (batch everything), falls back to barrier mode. 
+ """ + try: + if self.batch_size == 0: + # Must collect all rows — barrier fallback + rows = await inputs[0].collect() + if rows: + stream = self._materialize_to_stream(rows) + result = self.unary_static_process(stream) + for tag, packet in result.iter_packets(): + await output.send((tag, packet)) + return + + batch: list[tuple[TagProtocol, PacketProtocol]] = [] + async for tag, packet in inputs[0]: + batch.append((tag, packet)) + if len(batch) >= self.batch_size: + stream = self._materialize_to_stream(batch) + result = self.unary_static_process(stream) + for out_tag, out_packet in result.iter_packets(): + await output.send((out_tag, out_packet)) + batch = [] + + # Flush partial batch + if batch and not self.drop_partial_batch: + stream = self._materialize_to_stream(batch) + result = self.unary_static_process(stream) + for out_tag, out_packet in result.iter_packets(): + await output.send((out_tag, out_packet)) + finally: + await output.close() + def identity_structure(self) -> Any: return (self.__class__.__name__, self.batch_size, self.drop_partial_batch) diff --git a/src/orcapod/core/operators/column_selection.py b/src/orcapod/core/operators/column_selection.py index ee09cd11..96099ee1 100644 --- a/src/orcapod/core/operators/column_selection.py +++ b/src/orcapod/core/operators/column_selection.py @@ -1,11 +1,12 @@ import logging -from collections.abc import Collection, Mapping +from collections.abc import Collection, Mapping, Sequence from typing import TYPE_CHECKING, Any +from orcapod.channels import ReadableChannel, WritableChannel from orcapod.core.operators.base import UnaryOperator from orcapod.core.streams import ArrowTableStream from orcapod.errors import InputValidationError -from orcapod.protocols.core_protocols import StreamProtocol +from orcapod.protocols.core_protocols import PacketProtocol, StreamProtocol, TagProtocol from orcapod.system_constants import constants from orcapod.types import ColumnConfig, Schema from orcapod.utils.lazy_module import 
LazyModule @@ -82,6 +83,34 @@ def unary_output_schema( return Schema(new_tag_schema), packet_schema + async def async_execute( + self, + inputs: Sequence[ReadableChannel[tuple[TagProtocol, PacketProtocol]]], + output: WritableChannel[tuple[TagProtocol, PacketProtocol]], + **kwargs: Any, + ) -> None: + """Streaming: select tag columns per row without materializing.""" + try: + tags_to_drop: list[str] | None = None + async for tag, packet in inputs[0]: + if tags_to_drop is None: + tag_keys = tag.keys() + if self.strict: + missing = set(self.columns) - set(tag_keys) + if missing: + raise InputValidationError( + f"Missing tag columns: {missing}. Make sure all " + f"specified columns to select are present or use " + f"strict=False to ignore missing columns" + ) + tags_to_drop = [c for c in tag_keys if c not in self.columns] + if not tags_to_drop: + await output.send((tag, packet)) + else: + await output.send((tag.drop(*tags_to_drop), packet)) + finally: + await output.close() + def identity_structure(self) -> Any: return ( self.__class__.__name__, @@ -163,6 +192,34 @@ def unary_output_schema( return tag_schema, Schema(new_packet_schema) + async def async_execute( + self, + inputs: Sequence[ReadableChannel[tuple[TagProtocol, PacketProtocol]]], + output: WritableChannel[tuple[TagProtocol, PacketProtocol]], + **kwargs: Any, + ) -> None: + """Streaming: select packet columns per row without materializing.""" + try: + pkts_to_drop: list[str] | None = None + async for tag, packet in inputs[0]: + if pkts_to_drop is None: + pkt_keys = packet.keys() + if self.strict: + missing = set(self.columns) - set(pkt_keys) + if missing: + raise InputValidationError( + f"Missing packet columns: {missing}. 
Make sure all " + f"specified columns to select are present or use " + f"strict=False to ignore missing columns" + ) + pkts_to_drop = [c for c in pkt_keys if c not in self.columns] + if not pkts_to_drop: + await output.send((tag, packet)) + else: + await output.send((tag, packet.drop(*pkts_to_drop))) + finally: + await output.close() + def identity_structure(self) -> Any: return ( self.__class__.__name__, @@ -237,6 +294,38 @@ def unary_output_schema( return Schema(new_tag_schema), packet_schema + async def async_execute( + self, + inputs: Sequence[ReadableChannel[tuple[TagProtocol, PacketProtocol]]], + output: WritableChannel[tuple[TagProtocol, PacketProtocol]], + **kwargs: Any, + ) -> None: + """Streaming: drop tag columns per row without materializing.""" + try: + effective_drops: list[str] | None = None + async for tag, packet in inputs[0]: + if effective_drops is None: + tag_keys = tag.keys() + if self.strict: + missing = set(self.columns) - set(tag_keys) + if missing: + raise InputValidationError( + f"Missing tag columns: {missing}. 
Make sure all " + f"specified columns to drop are present or use " + f"strict=False to ignore missing columns" + ) + effective_drops = ( + list(self.columns) + if self.strict + else [c for c in self.columns if c in tag_keys] + ) + if not effective_drops: + await output.send((tag, packet)) + else: + await output.send((tag.drop(*effective_drops), packet)) + finally: + await output.close() + def identity_structure(self) -> Any: return ( self.__class__.__name__, @@ -314,6 +403,38 @@ def unary_output_schema( return tag_schema, Schema(new_packet_schema) + async def async_execute( + self, + inputs: Sequence[ReadableChannel[tuple[TagProtocol, PacketProtocol]]], + output: WritableChannel[tuple[TagProtocol, PacketProtocol]], + **kwargs: Any, + ) -> None: + """Streaming: drop packet columns per row without materializing.""" + try: + effective_drops: list[str] | None = None + async for tag, packet in inputs[0]: + if effective_drops is None: + pkt_keys = packet.keys() + if self.strict: + missing = set(self.columns) - set(pkt_keys) + if missing: + raise InputValidationError( + f"Missing packet columns: {missing}. 
Make sure all " + f"specified columns to drop are present or use " + f"strict=False to ignore missing columns" + ) + effective_drops = ( + list(self.columns) + if self.strict + else [c for c in self.columns if c in pkt_keys] + ) + if not effective_drops: + await output.send((tag, packet)) + else: + await output.send((tag, packet.drop(*effective_drops))) + finally: + await output.close() + def identity_structure(self) -> Any: return ( self.__class__.__name__, diff --git a/src/orcapod/core/operators/join.py b/src/orcapod/core/operators/join.py index 9a1f793d..7fd6d3fc 100644 --- a/src/orcapod/core/operators/join.py +++ b/src/orcapod/core/operators/join.py @@ -1,12 +1,19 @@ -from collections.abc import Collection +import asyncio +from collections.abc import Collection, Sequence from typing import TYPE_CHECKING, Any +from orcapod.channels import ReadableChannel, WritableChannel from orcapod.core.operators.base import NonZeroInputOperator from orcapod.core.streams import ArrowTableStream from orcapod.errors import InputValidationError -from orcapod.protocols.core_protocols import ArgumentGroup, StreamProtocol +from orcapod.protocols.core_protocols import ( + ArgumentGroup, + PacketProtocol, + StreamProtocol, + TagProtocol, +) from orcapod.system_constants import constants -from orcapod.types import ColumnConfig, Schema +from orcapod.types import ColumnConfig, ContentHash, Schema from orcapod.utils import arrow_data_utils, schema_utils from orcapod.utils.lazy_module import LazyModule @@ -28,6 +35,7 @@ def kernel_id(self) -> tuple[str, ...]: return (f"{self.__class__.__name__}",) def validate_nonzero_inputs(self, *streams: StreamProtocol) -> None: + """Validate that input streams are compatible for joining.""" try: self.output_schema(*streams) except Exception as e: @@ -168,6 +176,282 @@ def static_process(self, *streams: StreamProtocol) -> StreamProtocol: tag_columns=tuple(tag_keys), ) + # ------------------------------------------------------------------ + # Async 
execution
+    # ------------------------------------------------------------------
+
+    def _compute_system_tag_suffixes(
+        self,
+        input_pipeline_hashes: Sequence[ContentHash],
+    ) -> list[str]:
+        """Compute per-input system-tag suffixes from pipeline hashes.
+
+        Each suffix is ``{truncated_hash}:{canonical_position}`` where
+        canonical position is determined by a stable sort of the hashes
+        (matching the deterministic ordering used by ``static_process``).
+
+        Args:
+            input_pipeline_hashes: Pipeline hash per input, positionally
+                matching the input channels.
+
+        Returns:
+            List of suffix strings, one per input position.
+        """
+        n_char = self.orcapod_config.system_tag_hash_n_char
+        hex_strings = [h.to_hex() for h in input_pipeline_hashes]
+
+        # Stable argsort gives distinct positions even for duplicate hashes
+        order = sorted(range(len(hex_strings)), key=hex_strings.__getitem__)
+        canon_pos = {orig: pos for pos, orig in enumerate(order)}
+
+        suffixes: list[str] = []
+        for orig_idx in range(len(hex_strings)):
+            truncated = input_pipeline_hashes[orig_idx].to_hex(n_char)
+            suffixes.append(f"{truncated}:{canon_pos[orig_idx]}")
+        return suffixes
+
+    async def async_execute(
+        self,
+        inputs: Sequence[ReadableChannel[tuple[TagProtocol, PacketProtocol]]],
+        output: WritableChannel[tuple[TagProtocol, PacketProtocol]],
+        *,
+        input_pipeline_hashes: Sequence[ContentHash] | None = None,
+    ) -> None:
+        """Async join with streaming symmetric hash join for two inputs.
+
+        Single input: streams through directly without any buffering.
+
+        Two inputs: symmetric hash join — each arriving row is
+        immediately probed against the opposite side's buffer, emitting
+        matches as soon as found. System-tag columns are correctly
+        renamed using the ``input_pipeline_hashes``.
+
+        Three or more inputs: collects all inputs concurrently, then
+        delegates to ``static_process`` for the Polars N-way join.
+
+        Args:
+            inputs: Readable channels, one per upstream.
+            output: Writable channel for downstream.
+ input_pipeline_hashes: Pipeline hash for each input, + positionally matching ``inputs``. Required for + correct system-tag renaming with 2+ inputs. + """ + try: + if len(inputs) == 1: + async for tag, packet in inputs[0]: + await output.send((tag, packet)) + return + + if len(inputs) == 2: + suffixes = ( + self._compute_system_tag_suffixes(input_pipeline_hashes) + if input_pipeline_hashes is not None + else ["0", "1"] + ) + await self._symmetric_hash_join( + inputs[0], inputs[1], output, suffixes + ) + return + + # N > 2: concurrent collection + static_process + all_rows = await asyncio.gather(*(ch.collect() for ch in inputs)) + + # Guard against empty inputs — join with an empty side is empty + if any(len(rows) == 0 for rows in all_rows): + return + + streams = [self._materialize_to_stream(rows) for rows in all_rows] + result = self.static_process(*streams) + for tag, packet in result.iter_packets(): + await output.send((tag, packet)) + finally: + await output.close() + + async def _symmetric_hash_join( + self, + left_ch: ReadableChannel[tuple[TagProtocol, PacketProtocol]], + right_ch: ReadableChannel[tuple[TagProtocol, PacketProtocol]], + output: WritableChannel[tuple[TagProtocol, PacketProtocol]], + suffixes: list[str], + ) -> None: + """Symmetric hash join for two inputs. + + Both sides are read concurrently via a merged bounded queue. + Each arriving row is added to its side's index and immediately + probed against the opposite side. Matched rows are emitted to + ``output`` as soon as found, so downstream consumers can begin + work before either input is fully consumed. + + Args: + left_ch: Left input channel. + right_ch: Right input channel. + output: Output channel for matched rows. + suffixes: Per-input system-tag suffixes (positional), + computed from pipeline hashes and canonical ordering. + """ + # Bounded queue preserves backpressure — producers block when full. 
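As an aside on the algorithm used here: the symmetric hash join can be sketched in a few lines of plain Python. This is a toy, synchronous sketch under assumed simplifications (rows are `(key, value)` tuples and `symmetric_hash_join` is an illustrative name, not part of this codebase; channels, the sentinel queue, and backpressure are stripped out). It shows why matches can be emitted as soon as both sides of a key have been seen, before either input is exhausted:

```python
# Toy symmetric hash join: events interleave rows from two sides.
def symmetric_hash_join(events):
    """events: iterable of (side, row) pairs with side in {0, 1} and
    row = (key, value). Yields (left_row, right_row) matches eagerly."""
    buffers = ({}, {})  # per side: key -> list of rows seen so far
    for side, row in events:
        key, _ = row
        other = 1 - side
        # Index the new row on its own side...
        buffers[side].setdefault(key, []).append(row)
        # ...then immediately probe the opposite side's buffer.
        for match in buffers[other].get(key, []):
            # Emit in (left, right) order regardless of arrival order
            yield (row, match) if side == 0 else (match, row)

events = [(0, ("a", 1)), (1, ("b", 9)), (1, ("a", 7)), (0, ("b", 2))]
print(list(symmetric_hash_join(events)))
# → [(('a', 1), ('a', 7)), (('b', 2), ('b', 9))]
```

Because each row is buffered after emission, later rows with the same key still match it, preserving inner-join multiplicity.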
+ _SENTINEL = object() + queue: asyncio.Queue = asyncio.Queue(maxsize=64) + + async def _drain( + ch: ReadableChannel[tuple[TagProtocol, PacketProtocol]], + side: int, + ) -> None: + async for item in ch: + await queue.put((side, item)) + await queue.put((side, _SENTINEL)) + + block_sep = constants.BLOCK_SEPARATOR + + async with asyncio.TaskGroup() as tg: + tg.create_task(_drain(left_ch, 0)) + tg.create_task(_drain(right_ch, 1)) + + # buffers[i] holds all rows seen so far from input i + buffers: list[list[tuple[TagProtocol, PacketProtocol]]] = [[], []] + # indexes[i] maps shared-key tuple → list of indices into buffers[i] + indexes: list[dict[tuple, list[int]]] = [{}, {}] + + shared_keys: tuple[str, ...] | None = None + needs_reindex = False + closed_count = 0 + + while closed_count < 2: + side, item = await queue.get() + + if item is _SENTINEL: + closed_count += 1 + continue + + tag, pkt = item + other = 1 - side + + # Determine shared tag keys once we have rows from both sides + if shared_keys is None: + if not buffers[other]: + # Other side empty — just buffer this row for later + buffers[side].append((tag, pkt)) + continue + + # We have data from both sides; compute shared keys + this_keys = set(tag.keys()) + other_keys = set(buffers[other][0][0].keys()) + shared_keys = tuple(sorted(this_keys & other_keys)) + needs_reindex = True + + # One-time re-index of all rows buffered before shared_keys + if needs_reindex: + needs_reindex = False + for buf_side in (0, 1): + for j, (bt, _bp) in enumerate(buffers[buf_side]): + btd = bt.as_dict() + k = ( + tuple(btd[sk] for sk in shared_keys) + if shared_keys + else (0,) + ) + indexes[buf_side].setdefault(k, []).append(j) + + # Emit matches for all already-buffered rows across sides + for li, (lt, lp) in enumerate(buffers[0]): + ltd = lt.as_dict() + lk = ( + tuple(ltd[sk] for sk in shared_keys) + if shared_keys + else (0,) + ) + for ri in indexes[1].get(lk, []): + rt, rp = buffers[1][ri] + await output.send( + 
self._merge_row_pair(
+                                lt, lp, rt, rp, suffixes, block_sep
+                            )
+                        )
+
+                # Index the new row
+                td = tag.as_dict()
+                key = (
+                    tuple(td[sk] for sk in shared_keys) if shared_keys else (0,)
+                )
+                row_idx = len(buffers[side])
+                buffers[side].append((tag, pkt))
+                indexes[side].setdefault(key, []).append(row_idx)
+
+                # Probe the opposite buffer for matches
+                matching_indices = indexes[other].get(key, [])
+                for mi in matching_indices:
+                    other_tag, other_pkt = buffers[other][mi]
+                    if side == 0:
+                        merged = self._merge_row_pair(
+                            tag, pkt, other_tag, other_pkt,
+                            suffixes, block_sep,
+                        )
+                    else:
+                        merged = self._merge_row_pair(
+                            other_tag, other_pkt, tag, pkt,
+                            suffixes, block_sep,
+                        )
+                    await output.send(merged)
+
+    @staticmethod
+    def _merge_row_pair(
+        left_tag: TagProtocol,
+        left_pkt: PacketProtocol,
+        right_tag: TagProtocol,
+        right_pkt: PacketProtocol,
+        suffixes: list[str],
+        block_sep: str,
+    ) -> tuple[TagProtocol, PacketProtocol]:
+        """Merge a matched pair of rows into one joined (Tag, Packet).
+
+        System-tag keys are renamed by appending
+        ``{block_sep}{suffix}`` to match the canonical name-extending
+        scheme used by ``static_process``. Suffixes derive from the
+        sorted pipeline hashes, so merged names are order-independent.
+ """ + from orcapod.core.datagrams import Packet, Tag + + sys_prefix = constants.SYSTEM_TAG_PREFIX + + # Merge tag dicts (shared keys come from left) + merged_tag_d: dict = {} + merged_tag_d.update(left_tag.as_dict()) + for k, v in right_tag.as_dict().items(): + if k not in merged_tag_d: + merged_tag_d[k] = v + + # Rename and merge system tags with canonical suffixes + merged_sys: dict = {} + for k, v in left_tag.system_tags().items(): + new_key = ( + f"{k}{block_sep}{suffixes[0]}" + if k.startswith(sys_prefix) + else k + ) + merged_sys[new_key] = v + for k, v in right_tag.system_tags().items(): + new_key = ( + f"{k}{block_sep}{suffixes[1]}" + if k.startswith(sys_prefix) + else k + ) + merged_sys[new_key] = v + + merged_tag = Tag(merged_tag_d, system_tags=merged_sys) + + # Merge packet dicts (non-overlapping by Join's validation) + merged_pkt_d: dict = {} + merged_pkt_d.update(left_pkt.as_dict()) + merged_pkt_d.update(right_pkt.as_dict()) + + merged_si: dict = {} + merged_si.update(left_pkt.source_info()) + merged_si.update(right_pkt.source_info()) + + merged_pkt = Packet(merged_pkt_d, source_info=merged_si) + + return merged_tag, merged_pkt + def identity_structure(self) -> Any: return self.__class__.__name__ diff --git a/src/orcapod/core/operators/mappers.py b/src/orcapod/core/operators/mappers.py index d28b2dec..e9c51510 100644 --- a/src/orcapod/core/operators/mappers.py +++ b/src/orcapod/core/operators/mappers.py @@ -1,10 +1,11 @@ -from collections.abc import Mapping +from collections.abc import Mapping, Sequence from typing import TYPE_CHECKING, Any +from orcapod.channels import ReadableChannel, WritableChannel from orcapod.core.operators.base import UnaryOperator from orcapod.core.streams import ArrowTableStream from orcapod.errors import InputValidationError -from orcapod.protocols.core_protocols import StreamProtocol +from orcapod.protocols.core_protocols import PacketProtocol, StreamProtocol, TagProtocol from orcapod.system_constants import constants from 
orcapod.types import ColumnConfig, Schema
 from orcapod.utils.lazy_module import LazyModule
@@ -110,6 +111,34 @@ def unary_output_schema(
         return tag_schema, Schema(new_packet_schema)
+    async def async_execute(
+        self,
+        inputs: Sequence[ReadableChannel[tuple[TagProtocol, PacketProtocol]]],
+        output: WritableChannel[tuple[TagProtocol, PacketProtocol]],
+        **kwargs: Any,
+    ) -> None:
+        """Streaming: rename packet columns per row without materializing."""
+        try:
+            rename_map: dict[str, str] | None = None
+            unmapped: list[str] | None = None
+            async for tag, packet in inputs[0]:
+                if rename_map is None:
+                    pkt_keys = packet.keys()
+                    rename_map = {
+                        k: self.name_map[k] for k in pkt_keys if k in self.name_map
+                    }
+                    if self.drop_unmapped:
+                        unmapped = [k for k in pkt_keys if k not in self.name_map]
+                if not rename_map and not unmapped:
+                    await output.send((tag, packet))
+                else:
+                    new_pkt = packet.rename(rename_map) if rename_map else packet
+                    if unmapped:
+                        new_pkt = new_pkt.drop(*unmapped)
+                    await output.send((tag, new_pkt))
+        finally:
+            await output.close()
+
     def identity_structure(self) -> Any:
         return (
             self.__class__.__name__,
@@ -208,6 +237,34 @@ def unary_output_schema(
         return Schema(new_tag_schema), packet_schema
+    async def async_execute(
+        self,
+        inputs: Sequence[ReadableChannel[tuple[TagProtocol, PacketProtocol]]],
+        output: WritableChannel[tuple[TagProtocol, PacketProtocol]],
+        **kwargs: Any,
+    ) -> None:
+        """Streaming: rename tag columns per row without materializing."""
+        try:
+            rename_map: dict[str, str] | None = None
+            unmapped: list[str] | None = None
+            async for tag, packet in inputs[0]:
+                if rename_map is None:
+                    tag_keys = tag.keys()
+                    rename_map = {
+                        k: self.name_map[k] for k in tag_keys if k in self.name_map
+                    }
+                    if self.drop_unmapped:
+                        unmapped = [k for k in tag_keys if k not in self.name_map]
+                if not rename_map and not unmapped:
+                    await output.send((tag, packet))
+                else:
+                    new_tag = tag.rename(rename_map) if rename_map else tag
+                    if unmapped:
+                        new_tag = new_tag.drop(*unmapped)
+                    await output.send((new_tag, packet))
+        finally:
+ await output.close() + def identity_structure(self) -> Any: return ( self.__class__.__name__, diff --git a/src/orcapod/core/operators/semijoin.py b/src/orcapod/core/operators/semijoin.py index 0a36f342..59dde1fc 100644 --- a/src/orcapod/core/operators/semijoin.py +++ b/src/orcapod/core/operators/semijoin.py @@ -1,9 +1,11 @@ +from collections.abc import Sequence from typing import TYPE_CHECKING, Any +from orcapod.channels import ReadableChannel, WritableChannel from orcapod.core.operators.base import BinaryOperator from orcapod.core.streams import ArrowTableStream from orcapod.errors import InputValidationError -from orcapod.protocols.core_protocols import StreamProtocol +from orcapod.protocols.core_protocols import PacketProtocol, StreamProtocol, TagProtocol from orcapod.types import ColumnConfig, Schema from orcapod.utils import schema_utils from orcapod.utils.lazy_module import LazyModule @@ -93,6 +95,9 @@ def validate_binary_inputs( """ Validates that the input streams are compatible for semi-join. Checks that overlapping columns have compatible types. + + Stores the common keys so that ``async_execute`` can use them + to determine the correct empty-right behavior without data. """ try: left_tag_schema, left_packet_schema = left_stream.output_schema() @@ -107,7 +112,8 @@ def validate_binary_inputs( ) # intersection_schemas will raise an error if types are incompatible - schema_utils.intersection_schemas(left_all_schema, right_all_schema) + common = schema_utils.intersection_schemas(left_all_schema, right_all_schema) + self._validated_common_keys: tuple[str, ...] = tuple(common.keys()) except Exception as e: raise InputValidationError( @@ -117,5 +123,88 @@ def validate_binary_inputs( def is_commutative(self) -> bool: return False + def _common_keys_from_schema(self) -> tuple[str, ...]: + """Return the common keys computed during input validation. + + Falls back to an empty tuple if validation hasn't been called + (shouldn't happen in normal pipeline execution). 
+ """ + return getattr(self, "_validated_common_keys", ()) + + async def async_execute( + self, + inputs: Sequence[ReadableChannel[tuple[TagProtocol, PacketProtocol]]], + output: WritableChannel[tuple[TagProtocol, PacketProtocol]], + **kwargs: Any, + ) -> None: + """Build-probe: collect right input, then stream left through a hash lookup. + + Phase 1 — Build: collect all rows from the right (filter) channel and + index them by the common-key values. + Phase 2 — Probe: stream left rows one at a time; for each row whose + common-key values appear in the right-side index, emit immediately. + + Falls back to barrier mode when the right input is empty (schema + cannot be inferred from data) or when there are no common keys. + """ + try: + left_ch, right_ch = inputs[0], inputs[1] + + # Phase 1: Build right-side lookup + right_rows = await right_ch.collect() + + if not right_rows: + # Empty right: determine common keys from the validated + # input schemas (set during __init__) to match sync semantics. + # Common keys exist → empty result; no common keys → pass left through. + common = self._common_keys_from_schema() + if common: + # Drain left channel (discard) — result is empty + await left_ch.collect() + return + # No common keys — pass all left rows through unchanged + async for tag, packet in left_ch: + await output.send((tag, packet)) + return + + # Determine right-side keys from first row + right_tag_keys = set(right_rows[0][0].keys()) + right_pkt_keys = set(right_rows[0][1].keys()) + right_all_keys = right_tag_keys | right_pkt_keys + + # Phase 2: Probe — stream left rows + common_keys: tuple[str, ...] 
| None = None + right_lookup: set[tuple] | None = None + + async for tag, packet in left_ch: + if common_keys is None: + # First left row — determine common keys and build index + left_tag_keys = set(tag.keys()) + left_pkt_keys = set(packet.keys()) + left_all_keys = left_tag_keys | left_pkt_keys + common_keys = tuple(sorted(left_all_keys & right_all_keys)) + + if not common_keys: + # No common keys — pass all left rows through + await output.send((tag, packet)) + async for t, p in left_ch: + await output.send((t, p)) + return + + # Build right-side lookup + right_lookup = set() + for rt, rp in right_rows: + rd = rt.as_dict() + rd.update(rp.as_dict()) + right_lookup.add(tuple(rd[k] for k in common_keys)) + + # Probe + ld = tag.as_dict() + ld.update(packet.as_dict()) + if tuple(ld[k] for k in common_keys) in right_lookup: # type: ignore[arg-type] + await output.send((tag, packet)) + finally: + await output.close() + def identity_structure(self) -> Any: return self.__class__.__name__ diff --git a/src/orcapod/core/packet_function.py b/src/orcapod/core/packet_function.py index ed3d8234..7b27fc9f 100644 --- a/src/orcapod/core/packet_function.py +++ b/src/orcapod/core/packet_function.py @@ -532,6 +532,30 @@ def call( return output_packet + async def async_call( + self, + packet: PacketProtocol, + *, + skip_cache_lookup: bool = False, + skip_cache_insert: bool = False, + ) -> PacketProtocol | None: + """Async counterpart of ``call`` with cache check and recording.""" + output_packet = None + if not skip_cache_lookup: + logger.info("Checking for cache...") + output_packet = self.get_cached_output_for_packet(packet) + if output_packet is not None: + logger.info(f"Cache hit for {packet}!") + if output_packet is None: + output_packet = await self._packet_function.async_call(packet) + if output_packet is not None: + if not skip_cache_insert: + self.record_packet(packet, output_packet) + output_packet = output_packet.with_meta_columns( + **{self.RESULT_COMPUTED_FLAG: True} + ) + 
return output_packet + def get_cached_output_for_packet( self, input_packet: PacketProtocol ) -> PacketProtocol | None: diff --git a/src/orcapod/core/static_output_pod.py b/src/orcapod/core/static_output_pod.py index c52fb3cd..e5366d04 100644 --- a/src/orcapod/core/static_output_pod.py +++ b/src/orcapod/core/static_output_pod.py @@ -21,7 +21,7 @@ TagProtocol, TrackerManagerProtocol, ) -from orcapod.types import ColumnConfig, Schema +from orcapod.types import ColumnConfig, ContentHash, Schema from orcapod.utils.lazy_module import LazyModule logger = logging.getLogger(__name__) @@ -207,11 +207,21 @@ async def async_execute( self, inputs: Sequence[ReadableChannel[tuple[TagProtocol, PacketProtocol]]], output: WritableChannel[tuple[TagProtocol, PacketProtocol]], + *, + input_pipeline_hashes: Sequence[ContentHash] | None = None, ) -> None: """Default barrier-mode async execution. Collects all inputs, runs ``static_process``, emits results. Subclasses override for streaming or incremental strategies. + + Args: + inputs: Readable channels, one per upstream node. + output: Writable channel for downstream consumption. + input_pipeline_hashes: Pipeline hash for each input stream, + positionally matching ``inputs``. Multi-input operators + (e.g. Join) use these to compute canonical system-tag + column names. Ignored by single-input operators. 
""" try: all_rows = await asyncio.gather(*(ch.collect() for ch in inputs)) diff --git a/src/orcapod/pipeline/nodes.py b/src/orcapod/pipeline/nodes.py index 2245b760..ae436067 100644 --- a/src/orcapod/pipeline/nodes.py +++ b/src/orcapod/pipeline/nodes.py @@ -1,10 +1,11 @@ from __future__ import annotations import logging -from collections.abc import Iterator +from collections.abc import Iterator, Sequence from typing import TYPE_CHECKING, Any from orcapod import contexts +from orcapod.channels import ReadableChannel, WritableChannel from orcapod.config import Config from orcapod.core.streams.arrow_table_stream import ArrowTableStream from orcapod.core.tracker import SourceNode @@ -144,6 +145,20 @@ def as_table( assert self._cached_stream is not None return self._cached_stream.as_table(columns=columns, all_info=all_info) + async def async_execute( + self, + inputs: Sequence[ReadableChannel[tuple[TagProtocol, PacketProtocol]]], + output: WritableChannel[tuple[TagProtocol, PacketProtocol]], + ) -> None: + """Materialize to cache DB, then push cached rows to the output channel.""" + try: + self._ensure_stream() + assert self._cached_stream is not None + for tag, packet in self._cached_stream.iter_packets(): + await output.send((tag, packet)) + finally: + await output.close() + def get_all_records(self) -> "pa.Table | None": """Retrieve all stored records from the cache database.""" return self._cache_database.get_all_records(self.cache_path) diff --git a/src/orcapod/pipeline/orchestrator.py b/src/orcapod/pipeline/orchestrator.py index 17de6743..86ca9ee3 100644 --- a/src/orcapod/pipeline/orchestrator.py +++ b/src/orcapod/pipeline/orchestrator.py @@ -1,7 +1,10 @@ """Async pipeline orchestrator for push-based channel execution. -Compiles a ``GraphTracker``'s DAG into channels and launches all nodes -concurrently via ``asyncio.TaskGroup``. 
+Walks a compiled ``Pipeline``'s persistent node graph and launches all +nodes concurrently via ``asyncio.TaskGroup``, wiring them together with +bounded channels. After execution, results are available in the +pipeline databases via the usual ``get_all_records()`` / ``as_source()`` +accessors on each persistent node. """ from __future__ import annotations @@ -12,163 +15,129 @@ from typing import TYPE_CHECKING, Any from orcapod.channels import BroadcastChannel, Channel -from orcapod.core.static_output_pod import StaticOutputPod -from orcapod.core.tracker import GraphTracker, SourceNode from orcapod.types import PipelineConfig if TYPE_CHECKING: import networkx as nx - from orcapod.core.streams.arrow_table_stream import ArrowTableStream - from orcapod.protocols.core_protocols import PacketProtocol, StreamProtocol, TagProtocol + from orcapod.pipeline.graph import Pipeline logger = logging.getLogger(__name__) class AsyncPipelineOrchestrator: - """Executes a compiled DAG asynchronously using channels and TaskGroup. + """Execute a compiled ``Pipeline`` asynchronously using channels. - After ``GraphTracker.compile()``, the orchestrator: + After ``Pipeline.compile()``, the orchestrator: - 1. Identifies source, intermediate, and terminal nodes. - 2. Creates bounded channels (or broadcast channels for fan-out) between - connected nodes. - 3. Launches every node's ``async_execute`` concurrently. - 4. Collects the terminal node's output and materializes it as a stream. + 1. Walks ``Pipeline._node_graph`` (persistent nodes) in topological + order. + 2. Creates bounded channels (or broadcast channels for fan-out) + between connected nodes. + 3. Launches every node's ``async_execute`` concurrently via + ``asyncio.TaskGroup``. + + Results are written to the pipeline databases by the persistent + nodes themselves (``PersistentFunctionNode``, ``PersistentOperatorNode`` + in LOG mode, etc.). After ``run()`` returns, callers retrieve data + via ``pipeline.