Refactor CoffeeBeanPersister¶
Priority: 🟠HIGH Status: Planning Related QA Analysis: qa-analysis-overview.md
Problem Statement¶
Service/Crawler/Persistance/CoffeeBeanPersister.php:52 has excessive complexity:
Violations¶
- Method:
processCoffeeBeanFromDTO() - Cyclomatic Complexity: 14/10 (40% over threshold)
- NPath Complexity: 308/250 (23% over threshold)
Impact¶
- Data Quality: Critical path for crawler data ingestion
- Bug Risk: Complex persistence logic is fragile and error-prone
- Maintainability: Difficult to understand and modify
- Testing: Hard to test all execution paths
- Mixed Concerns: Likely mixing validation, transformation, and persistence
Guideline Violations¶
- SOLID - Single Responsibility Principle: Method doing too much
- Separation of Concerns: Business logic mixed with persistence
Current Issues¶
Based on method name and complexity, processCoffeeBeanFromDTO() likely handles:
- DTO validation
- Data transformation (DTO → Entity)
- Business rule validation
- Relationship handling (roaster, variety, region, etc.)
- Duplicate detection
- Entity creation or update logic
- Persistence orchestration
- Error handling
- Possibly logging/auditing
Root Cause Analysis¶
High complexity likely from:
- Multiple conditional paths for create vs update
- Complex relationship resolution
- Validation logic embedded in method
- Error handling for various failure scenarios
- Duplicate detection logic
- Data transformation rules
Proposed Refactoring Strategy¶
Step 1: Analyze Current Method¶
- Document all operations performed
- Identify business rules
- Map conditional branches
- Find duplication opportunities
- Review error scenarios
Step 2: Extract Validation Logic¶
Create dedicated validators:
CoffeeBeanDtoValidator- Validate DTO dataCoffeeBeanBusinessRules- Business rule validation- Keep validation separate from persistence
Step 3: Extract Transformation Logic¶
Create dedicated transformers:
CoffeeBeanDtoToEntityTransformer- DTO → Entity mapping- Handle data type conversions
- Apply transformation rules
- Keep transformation pure (no side effects)
Step 4: Extract Relationship Resolution¶
Create relationship resolver:
CoffeeBeanRelationshipResolver- Resolve roaster, variety, region, etc.
- Handle relationship creation/lookup
- Separate concern from main persistence
Step 5: Simplify Persistence Method¶
Reduce processCoffeeBeanFromDTO() to orchestration:
public function processCoffeeBeanFromDTO(CoffeeBeanDTO $dto): CoffeeBean
{
// Validate
$this->validator->validate($dto);
// Transform
$entity = $this->transformer->transform($dto);
// Resolve relationships
$this->relationshipResolver->resolve($entity, $dto);
// Check for duplicates
$existing = $this->duplicateDetector->find($entity);
if ($existing) {
return $this->updateExisting($existing, $entity);
}
// Persist
return $this->persist($entity);
}
Step 6: Extract Duplicate Detection¶
CoffeeBeanDuplicateDetectorservice- Clear duplicate detection rules
- Separate from persistence logic
Success Criteria¶
processCoffeeBeanFromDTO()cyclomatic complexity < 10- NPath complexity < 250
- Method < 30 lines (guideline adherence)
- Clear separation of concerns:
- Validation
- Transformation
- Relationship resolution
- Duplicate detection
- Persistence
- Improved testability
- No regression in data quality
- Better error messages
Testing Strategy¶
Before Refactoring¶
- Add comprehensive integration tests
- Test all known scenarios:
- New coffee bean creation
- Duplicate detection
- Update existing bean
- Various DTO variations
- Error scenarios
After Refactoring¶
- Unit test each extracted component
- Verify integration tests still pass
- Add tests for edge cases discovered
- Test error handling paths
Risk Assessment¶
High Risk:
- Critical data ingestion path
- Affects crawler data quality
- Complex existing logic to preserve
- Potential for subtle bugs
Mitigation:
- Comprehensive test coverage first
- Incremental extraction (one concern at a time)
- Extensive integration testing
- Test with production-like data
- Feature flag new implementation
- Monitor data quality metrics post-deployment
Estimated Effort¶
Medium:
- Method analysis: 0.5 day
- Test coverage: 1 day
- Validator extraction: 0.5 day
- Transformer extraction: 1 day
- Relationship resolver: 1 day
- Duplicate detector: 0.5 day
- Integration & testing: 1.5 days
- Total: 6 days
Implementation Order¶
- Add test coverage (critical!)
- Extract validator (easiest, least risky)
- Extract transformer
- Extract relationship resolver
- Extract duplicate detector
- Simplify main method
- Comprehensive testing
Dependencies¶
- H1: Simplify Domain Entities - CoffeeBean entity refactoring may impact this
- Consider coordinating if entity gets value objects
- Transformer may need updates
Related Issues¶
- May reveal data quality issues during refactoring
- Could expose unclear business rules
- Might find opportunities to improve duplicate detection
Notes¶
- This is the PRIMARY data ingestion point for crawler
- Data quality is paramount - test thoroughly
- Complex method suggests unclear requirements - may need domain expert input
- Consider adding data quality metrics/monitoring
- May discover edge cases not currently handled
- Refactoring could improve crawler reliability overall