Refactor Crawler Commands

Priority: 🟡 MEDIUM
Status: Planning
Related QA Analysis: qa-analysis-overview.md

Problem Statement

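Three crawler-related commands exceed the complexity thresholds (cyclomatic complexity: 10, NPath: 250). Each entry below lists the measured value against the threshold: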

CrawlerRunCommand

File: Command/CrawlerRunCommand.php:44

  • Cyclomatic Complexity: 13/10 (30% over)
  • NPath Complexity: 1,248/250 (399% over)

CreateAdminCommand

File: Command/CreateAdminCommand.php:35

  • Cyclomatic Complexity: 12/10 (20% over)
  • NPath Complexity: 800/250 (220% over)

CalculateSimilarityCommand

File: Command/CalculateSimilarityCommand.php:28

  • Cyclomatic Complexity: 11/10 (10% over)
  • NPath Complexity: 288/250 (15% over)
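
The "% over" figures above appear to be the relative excess over each threshold. A minimal sketch of that arithmetic, where the helper name `percentOver` is hypothetical and not part of the project:

```php
<?php
declare(strict_types=1);

// Hypothetical helper: relative excess of a metric over its threshold, in whole percent.
function percentOver(int $value, int $threshold): int
{
    return (int) round(($value - $threshold) * 100 / $threshold);
}

echo percentOver(13, 10), "% over\n";    // CrawlerRunCommand, cyclomatic complexity
echo percentOver(1248, 250), "% over\n"; // CrawlerRunCommand, NPath complexity
```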

Impact

  • Maintainability: Commands should be thin orchestrators, but these three carry logic of their own
  • Developer Experience: Complex CLI tools are harder to maintain
  • Testing: High NPath values mean many execution paths to cover
  • Priority: Lower impact than core services, but still technical debt
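
To illustrate why the NPath figures are so much larger than the cyclomatic ones: under the usual NPath definition, statements in sequence multiply their path counts, while cyclomatic complexity grows by one per decision point. A rough sketch, assuming a method made only of independent, simple `if` statements without `else` (a simplification; the real commands will be structured differently):

```php
<?php
declare(strict_types=1);

// Illustrative model only: a method body of $n independent `if` statements in sequence.
// Each simple `if` without `else` has 2 paths (taken / not taken), and sequential
// statements multiply, so NPath = 2^n.
function npathOfSequentialIfs(int $n): int
{
    return 2 ** $n;
}

// Cyclomatic complexity for the same body: one per decision point, plus one.
function cyclomaticOfSequentialIfs(int $n): int
{
    return $n + 1;
}

foreach ([3, 8, 10] as $n) {
    printf("%2d ifs: CC = %2d, NPath = %4d\n", $n, cyclomaticOfSequentialIfs($n), npathOfSequentialIfs($n));
}
```

In this model, eight sequential branches already pass the 250 NPath limit. Moving a few branches into a separate method divides the path count of the original method instead of merely subtracting from it.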

Guideline Violations

  • SOLID - Single Responsibility Principle: Commands doing too much
  • Command Pattern: Commands should delegate to services

Root Cause Analysis

The commands have not been analyzed in detail yet. The complexity likely comes from:

  • Business logic embedded in command
  • Complex validation and error handling
  • Output formatting mixed with logic
  • Direct orchestration instead of service delegation
  • Multiple responsibilities in single command

Proposed Refactoring Strategy

Step 1: Analyze Each Command

CrawlerRunCommand (CC: 13, NPath: 1,248): Likely handles:

  • Command option parsing and validation
  • Crawler initialization
  • Running crawler with various options
  • Error handling and reporting
  • Progress output
  • Multiple execution modes

CreateAdminCommand (CC: 12, NPath: 800): Likely handles:

  • Interactive prompting for admin details
  • Input validation
  • Password handling
  • User creation logic
  • Role assignment
  • Error handling
  • Output formatting

CalculateSimilarityCommand (CC: 11, NPath: 288): Likely handles:

  • Calculation options
  • Batch processing
  • Progress reporting
  • Result storage
  • Error handling

Step 2: Extract Service Logic

For each command, extract business logic to services:

CrawlerRunCommand:

  • Create CrawlerRunner service
  • Extract option validation
  • Move crawler logic to service
  • Keep command as thin wrapper

CreateAdminCommand:

  • Use existing UserService or create AdminCreationService
  • Extract validation logic
  • Move user creation to service
  • Keep command for I/O only

CalculateSimilarityCommand:

  • Extract calculation logic to service (may already exist)
  • Command handles only I/O and progress reporting
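
The split described in this step could look roughly like the sketch below. It is framework-free so that it runs on its own (PHP 8.1+). `CrawlResult`, the `CrawlerRunner` method signature, the parameters, and the exit-code rule are all assumptions made for illustration, not the project's actual API:

```php
<?php
declare(strict_types=1);

// Hypothetical result object returned by the service; names are illustrative.
final class CrawlResult
{
    public function __construct(
        public readonly int $pagesCrawled,
        public readonly int $errors,
    ) {
    }
}

// Hypothetical service contract that would own the crawling logic.
interface CrawlerRunner
{
    public function run(string $startUrl, int $maxPages): CrawlResult;
}

// Framework-free stand-in for the command: it only maps input to the service call
// and the result to output lines and an exit code.
final class CrawlerRunCommand
{
    public function __construct(private readonly CrawlerRunner $runner)
    {
    }

    /** @param callable(string): void $writeLine */
    public function __invoke(string $startUrl, int $maxPages, callable $writeLine): int
    {
        $result = $this->runner->run($startUrl, $maxPages);

        $writeLine(sprintf('Crawled %d pages, %d errors', $result->pagesCrawled, $result->errors));

        // Assumed convention: any crawl error yields a non-zero exit code.
        return $result->errors === 0 ? 0 : 1;
    }
}
```

With this shape the command contains a single branch, and the crawling behavior can be tested against the service without going through the console.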

Step 3: Simplify Control Flow

  • Extract complex conditionals
  • Use early returns to reduce nesting
  • Create helper methods for output
  • Separate validation from execution
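
A small sketch of these four points together, using hypothetical option names (`url`, `limit`, `dry-run`) that may not match the real commands. Validation lives in its own function, a named predicate replaces an inline condition, and guard clauses return early instead of nesting:

```php
<?php
declare(strict_types=1);

/**
 * Validation only: returns a list of error messages, empty when the input is usable.
 *
 * @param array<string, mixed> $options
 * @return list<string>
 */
function validateOptions(array $options): array
{
    $errors = [];
    if (!is_string($options['url'] ?? null) || $options['url'] === '') {
        $errors[] = 'Option "url" is required.';
    }
    if (!is_int($options['limit'] ?? null) || $options['limit'] < 1) {
        $errors[] = 'Option "limit" must be a positive integer.';
    }

    return $errors;
}

// Named predicate: keeps the condition readable at the call site.
function isDryRun(array $options): bool
{
    return ($options['dry-run'] ?? false) === true;
}

/** Execution: each guard clause handles one case and returns. */
function execute(array $options, callable $writeLine): int
{
    $errors = validateOptions($options);
    if ($errors !== []) {
        foreach ($errors as $error) {
            $writeLine($error);
        }
        return 2; // invalid input (same value as Symfony's Command::INVALID)
    }

    if (isDryRun($options)) {
        $writeLine('Dry run: nothing executed.');
        return 0;
    }

    $writeLine(sprintf('Processing %s (limit %d)', $options['url'], $options['limit']));
    return 0;
}
```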

Step 4: Improve Output Management

  • Extract output formatting to helpers
  • Use Symfony Console helpers (ProgressBar, Table, etc.)
  • Separate output from business logic
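
One way to keep formatting apart from logic is a pure function that prepares headers and rows, leaving rendering to the command. The sketch below is an assumption about what such a helper might look like; `formatSimilarityTable` and the score data are made up for illustration:

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical formatter: turns raw per-item scores into table headers and rows.
 * It knows nothing about the console, so it can be unit-tested on its own.
 *
 * @param array<string, float> $scores item name => similarity in [0, 1]
 * @return array{headers: list<string>, rows: list<list<string>>}
 */
function formatSimilarityTable(array $scores): array
{
    arsort($scores); // highest similarity first

    $rows = [];
    foreach ($scores as $name => $score) {
        $rows[] = [(string) $name, number_format($score * 100, 1) . '%'];
    }

    return ['headers' => ['Item', 'Similarity'], 'rows' => $rows];
}

$table = formatSimilarityTable(['page-a' => 0.5, 'page-b' => 0.875]);
foreach ($table['rows'] as [$name, $similarity]) {
    echo str_pad($name, 10), $similarity, "\n";
}
```

In a Symfony command, the same arrays could be handed to the Console `Table` helper via `setHeaders()` and `setRows()` instead of being echoed by hand.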

Success Criteria

For all commands:

  • __invoke() method cyclomatic complexity < 10
  • NPath complexity < 250
  • Commands are thin orchestrators only
  • Business logic in dedicated services
  • Improved readability and testability
  • Clear separation of concerns

Detailed Refactoring Plans

CrawlerRunCommand (Priority 1)

Complexity: Highest (CC: 13, NPath: 1,248)

Steps:

  1. Create the CrawlerRunner service (see Step 2) or enhance an existing service
  2. Extract option processing
  3. Extract crawler execution logic
  4. Simplify error handling
  5. Use ProgressBar for output

Estimated Effort: 1.5 days

CreateAdminCommand (Priority 2)

Complexity: Medium-high (CC: 12, NPath: 800)

Steps:

  1. Create AdminCreationService if needed
  2. Extract validation logic
  3. Extract user creation logic
  4. Simplify input handling (use the Symfony Console QuestionHelper)
  5. Improve output formatting
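
Step 2 might be sketched as a standalone validator like the one below. The class name and the rules (a syntactically valid email, a minimum password length of 12) are placeholder assumptions, not the project's actual policy:

```php
<?php
declare(strict_types=1);

// Hypothetical validator; the rules are placeholders for illustration.
final class AdminInputValidator
{
    public const MIN_PASSWORD_LENGTH = 12;

    /** @return list<string> error messages; empty when the input is valid */
    public function validate(string $email, string $password): array
    {
        $errors = [];

        if (filter_var($email, FILTER_VALIDATE_EMAIL) === false) {
            $errors[] = 'Email address is not valid.';
        }
        if (strlen($password) < self::MIN_PASSWORD_LENGTH) {
            $errors[] = sprintf('Password must be at least %d characters.', self::MIN_PASSWORD_LENGTH);
        }

        return $errors;
    }
}
```

The command would call this on the answers it collected and print the returned messages, so the prompting code stays free of validation rules.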

Estimated Effort: 1 day

CalculateSimilarityCommand (Priority 3)

Complexity: Lowest (CC: 11, NPath: 288)

Steps:

  1. Verify similarity calculation service exists
  2. Extract any business logic to service
  3. Simplify batch processing loop
  4. Use ProgressBar for long operations
  5. Clean up error handling
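
Steps 3 and 4 could be combined in a batch runner that reports progress through a callback, so the loop itself has no console dependency. A minimal sketch; `processInBatches` and its parameters are hypothetical:

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical batch runner: processes items in fixed-size chunks and reports
 * progress through a callback.
 *
 * @param list<mixed> $items
 * @param callable(list<mixed>): void $processBatch
 * @param callable(int, int): void $onProgress receives (done, total)
 * @return int number of items processed
 */
function processInBatches(array $items, int $batchSize, callable $processBatch, callable $onProgress): int
{
    if ($batchSize < 1) {
        throw new InvalidArgumentException('Batch size must be at least 1.');
    }

    $total = count($items);
    $done = 0;

    foreach (array_chunk($items, $batchSize) as $batch) {
        $processBatch($batch);
        $done += count($batch);
        $onProgress($done, $total);
    }

    return $done;
}

processInBatches(
    range(1, 5),
    2,
    static function (array $batch): void { /* similarity calculation would go here */ },
    static function (int $done, int $total): void { echo "$done/$total\n"; },
);
```

In the command, the progress callback is where a Symfony `ProgressBar` could be advanced, while the batch callback delegates to the similarity service.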

Estimated Effort: 0.5 day

Risk Assessment

Low-Medium Risk:

  • Commands are less frequently modified than core services
  • Primarily affect developer experience, not end users
  • Testing relatively straightforward

Mitigation:

  • Test command execution scenarios
  • Verify command output
  • Test error handling paths
  • Integration tests for CLI
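
Once a command only delegates, its error path can be exercised with a test double. The sketch below is framework-free (PHP 8.1+), and the `SimilarityCalculator` contract, its `calculateAll()` method, and the exit codes are assumptions for illustration:

```php
<?php
declare(strict_types=1);

// Hypothetical service contract, reduced to what the test needs.
interface SimilarityCalculator
{
    public function calculateAll(): int; // number of pairs processed
}

// Framework-free stand-in for the refactored command.
final class CalculateSimilarityCommand
{
    public function __construct(private readonly SimilarityCalculator $calculator)
    {
    }

    /** @param callable(string): void $writeLine */
    public function __invoke(callable $writeLine): int
    {
        try {
            $writeLine(sprintf('Processed %d pairs', $this->calculator->calculateAll()));
            return 0;
        } catch (RuntimeException $e) {
            $writeLine('Error: ' . $e->getMessage());
            return 1;
        }
    }
}

// Test double that forces the error path.
$failing = new class implements SimilarityCalculator {
    public function calculateAll(): int
    {
        throw new RuntimeException('index unavailable');
    }
};

$lines = [];
$exitCode = (new CalculateSimilarityCommand($failing))(
    static function (string $line) use (&$lines): void { $lines[] = $line; }
);

assert($exitCode === 1);
assert($lines === ['Error: index unavailable']);
```

For the real commands, Symfony's `CommandTester` covers the same ground: it executes a command and exposes the captured output and the status code.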

Estimated Total Effort

Low-Medium:

  • CrawlerRunCommand: 1.5 days
  • CreateAdminCommand: 1 day
  • CalculateSimilarityCommand: 0.5 day
  • Testing & verification: 1 day
  • Total: 4 days

Implementation Order

  1. CalculateSimilarityCommand (easiest, lowest complexity)
  2. CreateAdminCommand (medium complexity)
  3. CrawlerRunCommand (highest complexity)

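This order starts with the easiest command, which is the reverse of the complexity-based priorities in the detailed plans. Alternatively, order the work by: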

  • Most frequently used command
  • Most problematic command
  • Upcoming feature needs

Dependencies

None. Each command can be refactored independently.

  • See simplify-operational-commands.md for related command refactoring
  • Similar patterns can be applied across all commands

Notes

  • Commands should ideally be <50 lines of orchestration code
  • Use Symfony Console components for I/O (ProgressBar, Table, Question)
  • Business logic belongs in services, not commands
  • Lower priority than Critical/High items but improves developer experience
  • Can be addressed opportunistically when modifying these commands
  • Good learning opportunity for command pattern best practices