
bug fixes#1564

Merged
nikhilsinhaparseable merged 2 commits into parseablehq:main from nikhilsinhaparseable:query-bug-fixes
Mar 2, 2026
Conversation


@nikhilsinhaparseable nikhilsinhaparseable commented Mar 2, 2026

  1. update metrics call for listing queries
  2. fix delete on retention

Summary by CodeRabbit

  • Refactor
    • Improved error handling and logging around snapshot retention to reduce unexpected failures.
    • Optimized internal metrics collection to simplify and stabilize request timing and status reporting.
  • Chores
    • Updated build asset checksum metadata.


coderabbitai bot commented Mar 2, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 730710c and f3965a7.

📒 Files selected for processing (1)
  • Cargo.toml
✅ Files skipped from review due to trivial changes (1)
  • Cargo.toml

Walkthrough

Adjusted catalog retention cleanup to log errors instead of propagating them; refactored storage metrics wrapper by replacing a const-generic labels array with explicit provider/method/status fields and updated metric recording. Also updated an assets SHA1 checksum in Cargo.toml.
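The log-instead-of-propagate change in `src/catalog/mod.rs` can be sketched as follows. This is a minimal illustration of the pattern, not the actual Parseable code: `cleanup_fanout` is a hypothetical stand-in for the dispatched retention-cleanup future, and `eprintln!` stands in for `tracing::error!` to keep the sketch self-contained.

```rust
// Hypothetical stand-in for the retention-cleanup fanout; pretend one
// ingestor failed its cleanup request.
fn cleanup_fanout() -> Result<(), String> {
    Err("ingestor unreachable".to_string())
}

fn remove_manifest_from_snapshot_sketch() -> Result<(), String> {
    // Before: cleanup_fanout()?;   // error bubbled up to the caller
    // After:  the result is discarded with `let _ =` and any error is
    // only logged, so this function reports success regardless of the
    // fanout outcome.
    let _ = cleanup_fanout()
        .map_err(|e| eprintln!("retention cleanup failed: {e}"));
    Ok(())
}

fn main() {
    // The caller now always sees Ok, even though the fanout failed.
    assert!(remove_manifest_from_snapshot_sketch().is_ok());
    println!("errors are logged, not propagated");
}
```

This trade-off is exactly what the review comment below flags: swallowing the error keeps retention deletion from aborting, at the cost of hiding failed ingestor cleanups from the caller.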

Changes

Cohort / File(s) Summary
Catalog Error Handling
src/catalog/mod.rs
Changed remove_manifest_from_snapshot to ignore the dispatched retention-cleanup future (let _ =) and map errors to tracing::error rather than propagating them.
Metrics Layer Refactor
src/storage/metrics_layer.rs
Removed const-generic N from StreamMetricWrapper; replaced fixed labels array with fields provider: String, method: &'static str, status: &'static str; updated wrapper construction and poll_next to use the new fields for metric labels.
Metadata Update
Cargo.toml
Updated package.metadata.parseable_ui.assets-sha1 checksum value.
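The metrics-layer refactor described above can be sketched like this. It is an assumed shape, not the real `src/storage/metrics_layer.rs`: the point is replacing a const-generic `[&'static str; N]` labels array with three explicit, named fields handed to the histogram in a fixed order.

```rust
// Before (roughly):
//   struct StreamMetricWrapper<'a, const N: usize, T> {
//       labels: [&'static str; N], ...
//   }
// After: explicit, named label fields with a fixed arity of three.
struct StreamMetricWrapper {
    provider: String,     // e.g. "s3" or "localfs"
    method: &'static str, // e.g. "LIST"
    status: &'static str, // e.g. "200"
}

impl StreamMetricWrapper {
    // Label values in the order the metric expects them.
    fn label_values(&self) -> [&str; 3] {
        [self.provider.as_str(), self.method, self.status]
    }
}

fn main() {
    let w = StreamMetricWrapper {
        provider: "s3".to_string(),
        method: "LIST",
        status: "200",
    };
    assert_eq!(w.label_values(), ["s3", "LIST", "200"]);
    println!("labels: {:?}", w.label_values());
}
```

Naming the fields makes the label set self-documenting and lets `status` be updated per-item (as the nitpick comment below suggests) instead of being baked into an opaque array at construction time.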

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested labels

for next release

Suggested reviewers

  • parmesant

Poem

🐰 I hopped through code with joyful cheer,
I logged the errors that crept near.
I split the labels, neat and spry,
Timed metrics watch the streams go by.
A tiny rabbit says: "Well-coded!" 🥕

🚥 Pre-merge checks | ❌ 3

❌ Failed checks (2 warnings, 1 inconclusive)

  • Description check — ⚠️ Warning. The PR description is minimal and lacks the required template structure, missing detailed explanations of goals, solutions, rationale, and key changes. Resolution: complete the full PR description template with a detailed explanation of each fix, including problem statement, chosen solution, rationale, and verification that the changes were tested.
  • Docstring Coverage — ⚠️ Warning. Docstring coverage is 0.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
  • Title check — ❓ Inconclusive. The title 'bug fixes' is vague and generic, failing to convey which bugs were fixed or which components were affected. Resolution: revise the title to something specific, such as 'Fix retention cleanup error handling and metrics labels for listing queries'.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


@coderabbitai coderabbitai bot left a comment

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/catalog/mod.rs (1)

536-558: ⚠️ Potential issue | 🟠 Major

Do not silently succeed when retention cleanup fanout fails.

Line [542] dispatches through for_each_live_node (which includes queriers in Mode::Prism), and Line [558] logs then discards all errors. If any ingestor cleanup fails, this path still returns success after local snapshot mutation, which can leave cluster state inconsistent.

Suggested fix direction
-        let _ = for_each_live_node(tenant_id, move |ingestor| {
+        // fan out only to ingestors, then propagate failures
+        if let Err(err) = for_each_live_node(tenant_id, move |ingestor| {
             let stream_name = stream_name_clone.clone();
             let dates = dates_clone.clone();
             async move {
                 let url = format!(
                     "{}{}/logstream/{}/retention/cleanup",
                     ingestor.domain_name,
                     base_path_without_preceding_slash(),
                     stream_name
                 );
                 handlers::http::cluster::send_retention_cleanup_request(&url, ingestor, &dates)
                     .await?;
                 Ok::<(), ObjectStorageError>(())
             }
         })
-        .await
-        .map_err(|e| tracing::error!("{e}"));
+        .await
+        {
+            tracing::error!(error = ?err, stream = %stream_name_clone, dates = ?dates_clone, "retention cleanup fanout failed");
+            return Err(err);
+        }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/catalog/mod.rs` around lines 536-558: the fanout to other nodes uses
for_each_live_node and currently logs and swallows errors (map_err(|e|
tracing::error!("{e}"))), which can hide failed ingestor cleanup; change the
logic to (1) only target ingestors (not queriers) when dispatching retention
cleanup—filter nodes by role or use an ingestor-specific helper instead of
for_each_live_node if available—and (2) propagate any error instead of
discarding it: await the future, return Err(...) or use ? so the caller observes
failures from handlers::http::cluster::send_retention_cleanup_request; update
the closure/return types around send_retention_cleanup_request,
for_each_live_node, and the surrounding function so errors bubble up rather than
being logged and ignored.
🧹 Nitpick comments (1)
src/storage/metrics_layer.rs (1)

294-299: LIST/LIST_OFFSET metrics still use a hard-coded success status.

Line [298] and Line [315] set status to "200", and it is never updated before emission on Line [421]. That makes list failures indistinguishable from successful calls in latency metrics.

Possible refactor
-struct StreamMetricWrapper<'a, T> {
+struct StreamMetricWrapper<'a> {
     time: time::Instant,
     provider: String,
     method: &'static str,
-    status: &'static str,
-    inner: BoxStream<'a, T>,
+    status: &'static str,
+    inner: BoxStream<'a, ObjectStoreResult<ObjectMeta>>,
 }

-impl<T> Stream for StreamMetricWrapper<'_, T> {
-    type Item = T;
+impl Stream for StreamMetricWrapper<'_> {
+    type Item = ObjectStoreResult<ObjectMeta>;

     fn poll_next(
         mut self: std::pin::Pin<&mut Self>,
         cx: &mut Context<'_>,
     ) -> Poll<Option<Self::Item>> {
         match self.inner.poll_next_unpin(cx) {
+            t @ Poll::Ready(Some(Err(ref err))) => {
+                STORAGE_REQUEST_RESPONSE_TIME
+                    .with_label_values(&[&self.provider, self.method, error_to_status_code(err)])
+                    .observe(self.time.elapsed().as_secs_f64());
+                t
+            }
             t @ Poll::Ready(None) => {
                 STORAGE_REQUEST_RESPONSE_TIME
                     .with_label_values(&[&self.provider, self.method, self.status])
                     .observe(self.time.elapsed().as_secs_f64());
                 t
             }
             t => t,
         }
     }
 }

Also applies to: 311-316, 403-422


ℹ️ Review info

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3bf5673 and 730710c.

📒 Files selected for processing (2)
  • src/catalog/mod.rs
  • src/storage/metrics_layer.rs

coderabbitai bot previously approved these changes Mar 2, 2026
@nikhilsinhaparseable nikhilsinhaparseable merged commit f7b01e3 into parseablehq:main Mar 2, 2026
12 checks passed