Skip to content

[file-diet] Refactor pkg/parser/import_processor.go (1104 lines) into focused modules #17920

@github-actions

Description

@github-actions

Overview

The file pkg/parser/import_processor.go has grown to 1104 lines, well above the project's healthy threshold of 800 lines. It currently handles five distinct concerns in a single file: import parsing, BFS traversal, content extraction/merging, cycle detection, and remote import context. This task involves splitting it into smaller, focused files with clear boundaries.

Current State

  • File: pkg/parser/import_processor.go
  • Size: 1104 lines
  • Test file: No dedicated import_processor_test.go — coverage is spread across 9 separate test files (import_cache_test.go, import_cycle_test.go, import_error_test.go, import_remote_nested_test.go, import_syntax_test.go, import_topological_test.go, etc.)
  • Complexity: High — single BFS loop (lines 182–847) handles 20+ config merge operations; highly repetitive extract-merge pattern repeated ~15–20 times
Full File Analysis

Function / Type Inventory

Symbol Visibility ~Lines Responsibility
ImportsResult exported struct 20–52 Aggregates all merged config fields from imports
ImportInputDefinition exported struct 59–65 Input parameter definition
ImportSpec exported struct 68–74 Single import spec (path + inputs)
importLog var 17 Logger instance
remoteImportOrigin unexported struct 94–99 Tracks GitHub repo context for nested imports
importQueueItem unexported struct 102–109 BFS queue item with context
ProcessImportsFromFrontmatter exported func 82 Public entry point
ProcessImportsFromFrontmatterWithManifest exported func 171 Extended entry point with manifest
ProcessImportsFromFrontmatterWithSource exported func 177 Extended entry point with source tracking
parseRemoteOrigin unexported func 117 Parses workflowspec format into remote origin
processImportsFromFrontmatterWithManifestAndSource unexported func 182–847 Core BFS loop — 665 lines
findCyclePath unexported func 851 DFS helper to reconstruct cycle path
dfsForCycle unexported func 877 DFS traversal for cycle detection
topologicalSortImports unexported func 918 Kahn's algorithm topological sort
extractImportPaths unexported func 1072 Extracts import paths from frontmatter

Duplicate / Repetitive Patterns

Pattern 1 — Extract-from-Content (lines 635–804, ~170 lines)
Repeated ~15–20 times for different config sections (engines, steps, services, runtimes, permissions, safe-outputs, etc.):

xContent, err := extractXFromContent(string(content))
if err == nil && xContent != "" && xContent != "{}" {
    xBuilder.WriteString(xContent + "\n")
}

Pattern 2 — Array Deduplication with Set (lines 694–787, ~4 occurrences)
Identical dedup logic for bots, skip-roles, skip-bots, plugins, labels arrays.

Pattern 3 — Section Extraction (lines 288–295, 507–515, 933–938)
#Section suffix stripped from import path in 3 separate places.

Pattern 4 — Relative-to-Remote Path Resolution (lines 521–550, 506–587)
Duplicated logic converting relative import paths to remote GitHub paths.

Refactoring Strategy

Proposed File Splits

  1. import_processor.go (trimmed to entry points only)

    • Exported structs: ImportsResult, ImportInputDefinition, ImportSpec
    • Exported entry points: ProcessImportsFromFrontmatter, ProcessImportsFromFrontmatterWithManifest, ProcessImportsFromFrontmatterWithSource
    • Internal entry: processImportsFromFrontmatterWithManifestAndSource (trimmed)
    • Estimated LOC: ~200
  2. import_traversal.go (BFS queue and core traversal logic)

    • Types: importQueueItem, remoteImportOrigin
    • Functions: parseRemoteOrigin, BFS loop body (currently lines 182–634)
    • Responsibility: Breadth-first traversal, queue management, remote context propagation
    • Estimated LOC: ~350
  3. import_merger.go (content extraction and merging)

    • Functions: all extractXFromContent call sites consolidated with a helper, dedup logic, stringBuilderMerge utility
    • Responsibility: Merge 20+ config sections from imported files into ImportsResult
    • Estimated LOC: ~250
  4. import_graph.go (cycle detection and topological sort)

    • Functions: topologicalSortImports, findCyclePath, dfsForCycle
    • Responsibility: Dependency graph, cycle detection, ordering
    • Estimated LOC: ~200
  5. import_paths.go (path parsing and resolution)

    • Functions: extractImportPaths, section-suffix parsing, relative-to-remote resolution helpers
    • Responsibility: All import path manipulation and resolution logic
    • Estimated LOC: ~100

Shared Utilities to Extract

  • mergeStringArrayUniq(dst *[]string, src []string) — eliminates 4 copies of the set-based dedup loop
  • extractSection(path string) (cleanPath, section string) — eliminates 3 copies of #Section suffix parsing

Interface Abstractions

  • Consider extracting an importResolver interface to decouple local vs. remote file fetching, improving unit testability without real filesystem/network access.
Test Coverage Plan

Existing tests cover behaviour end-to-end via the 9 test files. After splitting, verify each new file independently:

  1. import_paths_test.go

    • extractImportPaths with empty/nil frontmatter
    • Section suffix parsing (workflow#Section)
    • Relative-to-remote path conversion
  2. import_graph_test.go

    • topologicalSortImports with linear, branching, diamond graphs
    • findCyclePath / dfsForCycle with simple and complex cycles
    • Already partially covered by import_cycle_test.go and import_topological_test.go
  3. import_merger_test.go

    • mergeStringArrayUniq deduplication helper
    • Merging of individual config sections (tools, steps, engines)
    • Empty/nil content handling
  4. import_traversal_test.go

    • Remote origin parsing (parseRemoteOrigin)
    • BFS queue ordering
    • Already partially covered by import_remote_nested_test.go

Target: ≥80% coverage for all new files; all existing tests must continue to pass.

Implementation Guidelines

  1. Preserve Behaviour: All existing ProcessImports* functions must return identical results
  2. Maintain Exports: ImportsResult, ImportInputDefinition, ImportSpec, and the three ProcessImports* functions stay exported and unchanged
  3. Add Tests First: Write unit tests for extracted helpers before moving code
  4. Incremental Splits: Split one module at a time, running make test-unit after each
  5. Extract Helpers First: Start with mergeStringArrayUniq and extractSection to reduce repetition before splitting files
  6. Build Tags: Every new *_test.go file must start with //go:build !integration
  7. Linting: Run make lint after each split — watch for unused helper warnings

Acceptance Criteria

  • import_processor.go split into 4–5 focused files
  • Each new file is under 400 lines
  • mergeStringArrayUniq and extractSection helpers extracted and tested
  • All tests pass (make test-unit)
  • No breaking changes to public API
  • Code passes linting (make lint)
  • Build succeeds (make build)
Additional Context
  • Repository Guidelines: See AGENTS.md — "Prefer many smaller files grouped by functionality"
  • Validation Complexity: Target 100–200 lines per file, hard limit 300 (same principle applies here)
  • Testing: Match build-tag and table-driven patterns in existing pkg/parser/*_test.go files
  • Workflow run: §22308275991

Priority: Medium
Effort: Medium (well-defined boundaries, no API changes, existing test coverage provides safety net)
Expected Impact: Improved maintainability, easier code navigation, reduced merge conflicts, clearer ownership of each concern

Generated by Daily File Diet

  • expires on Feb 25, 2026, 1:36 PM UTC

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions