Coffee Bean Image Management - Planning Document¶
This document covers three related image features:
1. Backend Image Dashboard (Priority: ASAP)
2. Broken Image Crawl Job (Priority: Medium-term)
3. Image Caching & Proxy (Priority: Later, needs most planning)
Design Decisions (Confirmed)¶
| Decision | Choice | Rationale |
|---|---|---|
| Phase 2: Check result storage | Separate ImageCheck table | Audit trail, check history, cleaner entity |
| Phase 2: Validation method | GET + magic bytes | Most thorough - verifies actual image data |
| Phase 3: Storage backend | S3/MinIO | Scalable, CDN-ready for future |
| Phase 3: Image transformation | Optimize + thumbnails | WebP conversion, multiple sizes |
| Phase 3: API response | Replace imageUrl | Proxy URL replaces original in API response |
Current State¶
Entity Structure¶
- CoffeeBean.imageUrl: nullable VARCHAR(255) storing external URLs
- Images extracted from JSON-LD Product schema or Open Graph og:image meta tags
- URLs normalized (scheme-less // converted to https:)
- Validated with Symfony URL constraint during FullDiscovery group
- Image URL is one of 6 "strongly recommended" fields affecting data quality status
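The normalization rule listed above can be sketched as a small pure function. This is a minimal illustration, not the project's actual extractor code: the function name normalizeImageUrl and the decision to map blank strings to null are assumptions.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: normalize an extracted image URL before it is stored.
 * Scheme-relative URLs ("//cdn.example.com/a.jpg") get an explicit https: scheme.
 * Blank input is treated as "no image" (assumption, matches the nullable column).
 */
function normalizeImageUrl(?string $url): ?string
{
    if ($url === null) {
        return null;
    }

    $url = trim($url);
    if ($url === '') {
        return null;
    }

    // Scheme-less "//host/path" becomes "https://host/path".
    if (str_starts_with($url, '//')) {
        return 'https:' . $url;
    }

    return $url;
}
```

URLs that already carry a scheme pass through unchanged, so the Symfony URL constraint remains the place where invalid values are rejected.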
Existing Patterns¶
- Admin: EasyAdmin 4.x with custom CRUD controllers and dashboards
- Scheduling: Symfony Scheduler with #[AsSchedule] providers
- Async Jobs: Symfony Messenger with Doctrine transport
- HTTP Client: HttpClientInterface with custom exception handling
Phase 1: Backend Image Dashboard (ASAP)¶
Goal¶
Create an admin view showing coffee beans with image thumbnails, accessible from the review dashboard.
Implementation Approach¶
New CRUD Controller: CoffeeBeanImageCrudController¶
- Focused view with minimal fields: thumbnail, name, roaster name
- Filters: roaster, has/missing image URL, data status
- Sortable by creation date, roaster
Fields Configuration¶
- ImageField::new('imageUrl')->setLabel('Image') - thumbnail display
- TextField::new('name')->setLabel('Bean Name')
- AssociationField or formatted text for roaster name (via CrawlUrl relationship)
- BooleanField or ChoiceField for image status (present/missing)
Filters¶
- RoasterFilter (existing) - filter by roaster
- BooleanFilter or custom filter for imageUrl IS NOT NULL
- ChoiceFilter for status
Menu & Dashboard Integration¶
- Add menu item in DashboardController under "Coffee Bean Management"
- Add link card in review_dashboard.html.twig
Files to Create/Modify¶
- src/Controller/Admin/CoffeeBeanImageCrudController.php (new)
- src/Controller/Admin/DashboardController.php (menu item)
- templates/admin/review_dashboard.html.twig (link card)
- Possibly: src/Filter/HasImageFilter.php (custom filter if needed)
Phase 2: Broken Image Crawl Job (Medium-term)¶
Goal¶
Periodically verify that image URLs return a successful HTTP response (200, or 206 for a range request) and contain valid image content.
Design: Separate ImageCheck Entity¶
#[ORM\Entity]
class ImageCheck
{
    #[ORM\Id]
    #[ORM\Column(type: 'uuid')]
    private Uuid $id;

    #[ORM\ManyToOne(targetEntity: CoffeeBean::class)]
    #[ORM\JoinColumn(nullable: false, onDelete: 'CASCADE')]
    private CoffeeBean $coffeeBean;

    #[ORM\Column(length: 255)]
    private string $imageUrl; // Snapshot of URL at check time

    #[ORM\Column(type: 'datetime_immutable')]
    private DateTimeImmutable $checkedAt;

    #[ORM\Column(enumType: ImageCheckStatus::class)]
    private ImageCheckStatus $status; // VALID, BROKEN, TIMEOUT, ERROR

    #[ORM\Column(nullable: true)]
    private ?int $httpStatusCode = null;

    #[ORM\Column(length: 100, nullable: true)]
    private ?string $contentType = null;

    #[ORM\Column(length: 20, nullable: true)]
    private ?string $detectedFormat = null; // From magic bytes: jpeg, png, webp, gif

    #[ORM\Column(nullable: true)]
    private ?int $contentLength = null;

    #[ORM\Column(type: 'text', nullable: true)]
    private ?string $errorMessage = null;
}
Validation: GET + Magic Bytes¶
Image magic byte signatures to detect:
- JPEG: FF D8 FF
- PNG: 89 50 4E 47 0D 0A 1A 0A
- GIF: 47 49 46 38 (GIF8)
- WebP: 52 49 46 46 ... 57 45 42 50 (RIFF...WEBP)
Service will:
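1. Send GET request with Range: bytes=0-15 header (requests the first 16 bytes; a server that ignores Range replies 200 with the full body, so read only the first chunk)
2. Check HTTP status code (200, or 206 when the range was honored)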
3. Verify Content-Type header starts with image/
4. Match magic bytes against known signatures
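The signature matching in step 4 can be sketched as a pure function over the first bytes of the response body. The function name detectImageFormat is hypothetical; the byte sequences are the ones listed above. Note that WebP needs at least 12 bytes (RIFF at offset 0, WEBP at offset 8), which the 16-byte range covers.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: detect an image format from leading bytes.
 * Returns 'jpeg', 'png', 'gif', 'webp', or null when nothing matches.
 */
function detectImageFormat(string $bytes): ?string
{
    // JPEG: FF D8 FF
    if (str_starts_with($bytes, "\xFF\xD8\xFF")) {
        return 'jpeg';
    }

    // PNG: 89 50 4E 47 0D 0A 1A 0A
    if (str_starts_with($bytes, "\x89PNG\r\n\x1A\n")) {
        return 'png';
    }

    // GIF: 47 49 46 38 ("GIF8", covers GIF87a and GIF89a)
    if (str_starts_with($bytes, 'GIF8')) {
        return 'gif';
    }

    // WebP: "RIFF" + 4-byte size + "WEBP"
    if (strlen($bytes) >= 12
        && str_starts_with($bytes, 'RIFF')
        && substr($bytes, 8, 4) === 'WEBP'
    ) {
        return 'webp';
    }

    return null;
}
```

The return values line up with the detectedFormat column (jpeg, png, webp, gif), so a null result can be stored as-is.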
Implementation¶
Entity & Enum¶
- src/Entity/ImageCheck.php
- src/Enum/ImageCheckStatus.php (VALID, BROKEN, TIMEOUT, ERROR)
Service: ImageValidationService¶
- validate(string $url): ImageCheckResult
- Uses HttpClient with Range header
- Magic byte detection logic
- Returns structured result
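One way the service could turn its observations into a status is sketched below. The mapping is an assumption, since the plan names the four statuses but not the rules for choosing between them: here a timeout maps to TIMEOUT, a missing response to ERROR, and any response that fails the status, Content-Type, or signature check to BROKEN. The enum is a stand-in for src/Enum/ImageCheckStatus.php with assumed backing values, and classifyImageCheck is a hypothetical name.

```php
<?php
declare(strict_types=1);

// Stand-in for src/Enum/ImageCheckStatus.php; backing values are assumed.
enum ImageCheckStatus: string
{
    case VALID = 'valid';
    case BROKEN = 'broken';
    case TIMEOUT = 'timeout';
    case ERROR = 'error';
}

/**
 * Hypothetical classification of a single check.
 *
 * @param int|null    $httpStatusCode null when no response was received
 * @param string|null $contentType    raw Content-Type header value
 * @param string|null $detectedFormat format found via magic bytes, or null
 * @param bool        $timedOut       true when the request hit the timeout
 */
function classifyImageCheck(
    ?int $httpStatusCode,
    ?string $contentType,
    ?string $detectedFormat,
    bool $timedOut = false,
): ImageCheckStatus {
    if ($timedOut) {
        return ImageCheckStatus::TIMEOUT;
    }

    // No status code at all: DNS failure, TLS error, connection refused, ...
    if ($httpStatusCode === null) {
        return ImageCheckStatus::ERROR;
    }

    $statusOk = in_array($httpStatusCode, [200, 206], true);
    $typeOk = $contentType !== null
        && str_starts_with(strtolower($contentType), 'image/');
    $bytesOk = $detectedFormat !== null;

    return ($statusOk && $typeOk && $bytesOk)
        ? ImageCheckStatus::VALID
        : ImageCheckStatus::BROKEN;
}
```

Keeping the decision in a pure function makes the rules easy to unit test without an HTTP client.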
Scheduler: ImageValidationSchedulerService¶
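- Cron: 0 3 */3 * * (3 AM on every third day of the month: the 1st, 4th, 7th, and so on; the step restarts each month, so the gap across a month boundary can be shorter than 3 days)
- Query: CoffeeBeans with imageUrl where no ImageCheck in last 3 days
- Dispatch ImageValidationMessage per bean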
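The selection rule in the query can be illustrated in plain PHP over in-memory rows. This is only a sketch of the logic; the real implementation would presumably be a repository query, and the array shape (id, imageUrl, lastCheckedAt) and the function name are assumptions.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical selection of beans that are due for an image check.
 *
 * @param array<int, array{id: string, imageUrl: ?string, lastCheckedAt: ?DateTimeImmutable}> $beans
 * @return string[] IDs of beans with an image URL and no check within the interval
 */
function selectBeansDueForCheck(array $beans, DateTimeImmutable $now, int $intervalDays = 3): array
{
    $cutoff = $now->modify(sprintf('-%d days', $intervalDays));
    $due = [];

    foreach ($beans as $bean) {
        // Beans without an image URL are never checked.
        if ($bean['imageUrl'] === null) {
            continue;
        }

        // Due when never checked, or when the latest check is older than the cutoff.
        $last = $bean['lastCheckedAt'];
        if ($last === null || $last < $cutoff) {
            $due[] = $bean['id'];
        }
    }

    return $due;
}
```

A check recorded exactly at the cutoff is treated as still fresh here, so scheduling jitter can push a bean to the following run; whether that is acceptable is a design choice for the scheduler.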
Message & Handler¶
- src/Message/ImageValidationMessage.php
- src/MessageHandler/ImageValidationHandler.php
Command: app:validate-images¶
- Manual trigger with options: --dry-run, --limit=N, --force
Files to Create¶
- src/Entity/ImageCheck.php
- src/Repository/ImageCheckRepository.php
- src/Enum/ImageCheckStatus.php
- src/Service/Image/ImageValidationService.php
- src/Scheduler/ImageValidationSchedulerService.php
- src/Message/ImageValidationMessage.php
- src/MessageHandler/ImageValidationHandler.php
- src/Command/ValidateImagesCommand.php
- Migration for image_check table
Phase 3: Image Caching & Proxy (Later - Needs Most Planning)¶
Goal¶
Cache external images in S3/MinIO, transform to optimized formats, and serve through our infrastructure.
Storage: S3/MinIO with Flysystem¶
Use league/flysystem-aws-s3-v3 for abstraction:
- Development: MinIO container
- Production: S3 or S3-compatible storage
- Easy CDN integration later (CloudFront, etc.)
Bucket Structure¶
images/
├── original/
│   └── {bean_uuid}.{ext}        # Original fetched image
├── optimized/
│   └── {bean_uuid}.webp         # Full-size WebP
└── thumbnails/
    ├── {bean_uuid}_sm.webp      # 150x150
    └── {bean_uuid}_md.webp      # 400x400
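The bucket layout above maps naturally onto a key-building helper. The sketch below assumes variant names original, full, sm, and md; the function name imageStorageKey is hypothetical and not part of the plan.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: build the object key for one stored image variant.
 *
 * @param string      $beanUuid    UUID of the coffee bean
 * @param string      $variant     'original', 'full', 'sm' or 'md' (assumed names)
 * @param string|null $originalExt file extension, required for 'original' only
 */
function imageStorageKey(string $beanUuid, string $variant, ?string $originalExt = null): string
{
    switch ($variant) {
        case 'original':
            if ($originalExt === null || $originalExt === '') {
                throw new InvalidArgumentException('The original variant needs a file extension.');
            }
            return sprintf('images/original/%s.%s', $beanUuid, strtolower($originalExt));

        case 'full':
            return sprintf('images/optimized/%s.webp', $beanUuid);

        case 'sm':
        case 'md':
            return sprintf('images/thumbnails/%s_%s.webp', $beanUuid, $variant);

        default:
            throw new InvalidArgumentException(sprintf('Unknown image variant "%s".', $variant));
    }
}
```

Centralizing key construction keeps the storage service and the proxy controller in agreement about where each variant lives.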
Database: CachedImage Entity¶
#[ORM\Entity]
class CachedImage
{
    #[ORM\Id]
    #[ORM\Column(type: 'uuid')]
    private Uuid $id;

    #[ORM\OneToOne(targetEntity: CoffeeBean::class)]
    #[ORM\JoinColumn(nullable: false, onDelete: 'CASCADE')]
    private CoffeeBean $coffeeBean;

    #[ORM\Column(length: 255)]
    private string $originalUrl; // Source URL

    #[ORM\Column(length: 64)]
    private string $originalUrlHash; // SHA256 for dedup

    #[ORM\Column(length: 64, nullable: true)]
    private ?string $contentHash = null; // SHA256 of image bytes

    #[ORM\Column(type: 'datetime_immutable')]
    private DateTimeImmutable $cachedAt;

    #[ORM\Column(type: 'datetime_immutable', nullable: true)]
    private ?DateTimeImmutable $lastValidatedAt = null;

    #[ORM\Column(enumType: CachedImageStatus::class)]
    private CachedImageStatus $status; // PENDING, CACHED, FAILED, STALE

    #[ORM\Column(length: 50, nullable: true)]
    private ?string $originalMimeType = null;

    #[ORM\Column(nullable: true)]
    private ?int $originalSize = null;

    #[ORM\Column(nullable: true)]
    private ?int $optimizedSize = null;

    #[ORM\Column(type: 'json', nullable: true)]
    private ?array $variants = null; // ['sm' => true, 'md' => true, 'full' => true]
}
Image Transformation: Intervention Image¶
Use intervention/image with GD or Imagick driver:
- Convert to WebP (80% quality)
- Generate thumbnail sizes: 150x150 (sm), 400x400 (md)
- Preserve aspect ratio with cover/contain
Sizes Configuration¶
// config/packages/image_cache.php or service parameter
'image_variants' => [
    'sm' => ['width' => 150, 'height' => 150, 'fit' => 'cover'],
    'md' => ['width' => 400, 'height' => 400, 'fit' => 'contain'],
    'full' => ['width' => 1200, 'height' => 1200, 'fit' => 'contain'], // max size
]
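To make the difference between the two fit modes concrete, the sketch below computes the output dimensions a variant would produce. It is plain arithmetic, independent of Intervention Image; the function name fitDimensions is hypothetical, and not upscaling smaller sources in contain mode is an assumption based on the "max size" comment.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: output dimensions for a variant.
 *
 * - cover:   scale to fill the box and crop the overflow, so output equals the box
 * - contain: scale down to fit inside the box, keeping the aspect ratio
 *            (assumption: smaller sources are not upscaled)
 *
 * @return array{width: int, height: int}
 */
function fitDimensions(int $srcWidth, int $srcHeight, int $boxWidth, int $boxHeight, string $fit): array
{
    if ($srcWidth <= 0 || $srcHeight <= 0 || $boxWidth <= 0 || $boxHeight <= 0) {
        throw new InvalidArgumentException('All dimensions must be positive.');
    }

    if ($fit === 'cover') {
        return ['width' => $boxWidth, 'height' => $boxHeight];
    }

    if ($fit !== 'contain') {
        throw new InvalidArgumentException(sprintf('Unknown fit mode "%s".', $fit));
    }

    // The limiting axis decides the scale; cap at 1.0 to avoid upscaling.
    $scale = min($boxWidth / $srcWidth, $boxHeight / $srcHeight, 1.0);

    return [
        'width' => max(1, (int) round($srcWidth * $scale)),
        'height' => max(1, (int) round($srcHeight * $scale)),
    ];
}
```

With these rules a 2400x1200 source yields a 400x200 md variant, so "400x400" for contain variants describes the bounding box rather than the exact output size.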
Cache Invalidation Strategy¶
- Time-based: Re-validate after 7 days (via Phase 2 job)
- On broken detection: Phase 2 marks as STALE, triggers re-fetch
- Manual: Admin action to invalidate and re-cache
- Source URL change: If CoffeeBean.imageUrl changes, invalidate
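Two of these triggers, source URL change and time-based re-validation, can be checked with data the CachedImage entity already holds. The sketch below is one possible shape; the function name, the reason strings, and passing cachedAt when lastValidatedAt is null are all assumptions.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical check: should a cached image be invalidated, and why?
 *
 * @param string            $currentImageUrl current CoffeeBean.imageUrl
 * @param string            $cachedUrlHash   CachedImage.originalUrlHash (SHA256, hex)
 * @param DateTimeImmutable $lastValidatedAt last validation time (or cachedAt if never validated)
 * @return string|null reason code, or null when the cache entry is still good
 */
function invalidationReason(
    string $currentImageUrl,
    string $cachedUrlHash,
    DateTimeImmutable $lastValidatedAt,
    DateTimeImmutable $now,
    int $maxAgeDays = 7,
): ?string {
    // Source URL change: compare against the stored SHA256 of the original URL.
    if (hash('sha256', $currentImageUrl) !== $cachedUrlHash) {
        return 'source_url_changed';
    }

    // Time-based: re-validate once the entry is older than the maximum age.
    if ($lastValidatedAt < $now->modify(sprintf('-%d days', $maxAgeDays))) {
        return 'revalidation_due';
    }

    return null;
}
```

The URL check runs first because a changed source makes the age of the old entry irrelevant.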
API Integration¶
Proxy Controller¶
GET /api/images/{beanUuid} → Full optimized WebP
GET /api/images/{beanUuid}?size=sm → 150x150 thumbnail
GET /api/images/{beanUuid}?size=md → 400x400 medium
GET /api/images/{beanUuid}?original=1 → Original (if stored)
Response headers:
- Cache-Control: public, max-age=86400 (1 day)
- ETag based on content hash
- Content-Type: image/webp (or the stored original MIME type for ?original=1)
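The header set above, together with a conditional-request check, can be sketched without any framework types. Both function names are hypothetical. The ETag is the content hash wrapped in double quotes, as HTTP entity tags require; handling If-None-Match to answer with 304 is an assumption about how the controller would use the ETag.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: response headers for a proxied image.
 *
 * @return array<string, string>
 */
function imageResponseHeaders(string $contentHash, string $mimeType = 'image/webp'): array
{
    return [
        'Cache-Control' => 'public, max-age=86400', // 1 day
        'ETag' => '"' . $contentHash . '"',
        'Content-Type' => $mimeType,
    ];
}

/**
 * Hypothetical helper: does the client's If-None-Match header match our ETag?
 * When it does, the controller could reply 304 Not Modified with no body.
 */
function isNotModified(?string $ifNoneMatch, string $contentHash): bool
{
    if ($ifNoneMatch === null) {
        return false;
    }

    $expected = '"' . $contentHash . '"';

    // The header may carry several comma-separated tags, some with a weak prefix.
    foreach (explode(',', $ifNoneMatch) as $candidate) {
        $candidate = trim($candidate);
        if ($candidate === '*') {
            return true;
        }
        if (str_starts_with($candidate, 'W/')) {
            $candidate = substr($candidate, 2);
        }
        if ($candidate === $expected) {
            return true;
        }
    }

    return false;
}
```

Because the ETag derives from the image bytes, a re-cache that produces identical content keeps client caches valid.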
DTO Changes¶
Modify CoffeeBeanDTO mapping:
- imageUrl returns proxy URL when cached, fallback to original if not cached
- Original URL stored in CachedImage.originalUrl for reference
- Add imageVariants array for size options: ['sm' => url, 'md' => url, 'full' => url]
// In EntityToDtoMapper
$imageUrl = $cachedImage?->getStatus() === CachedImageStatus::CACHED
    ? $this->imageProxyUrlGenerator->generate($coffeeBean)
    : $coffeeBean->getImageUrl();
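The imageVariants array could be built from the proxy routes listed under Proxy Controller. The following sketch assumes public (unsigned) URLs and a configurable base URL; buildImageVariantUrls is a hypothetical helper, not the planned ImageProxyUrlGenerator API.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: proxy URLs for each image size of one bean.
 *
 * @return array{sm: string, md: string, full: string}
 */
function buildImageVariantUrls(string $baseUrl, string $beanUuid): array
{
    // GET /api/images/{beanUuid} serves the full optimized WebP.
    $full = rtrim($baseUrl, '/') . '/api/images/' . rawurlencode($beanUuid);

    return [
        'sm' => $full . '?size=sm',
        'md' => $full . '?size=md',
        'full' => $full,
    ];
}
```

For example, with base URL https://api.example.com the sm entry becomes https://api.example.com/api/images/{beanUuid}?size=sm.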
Implementation¶
Services¶
- ImageCacheService: Orchestrates caching workflow
- ImageStorageService: S3/Flysystem operations
- ImageTransformService: Resize/convert with Intervention
- ImageProxyUrlGenerator: Generate signed/public URLs
Async Processing¶
- ImageCacheMessage: Trigger caching for a bean
- ImageCacheHandler: Fetch, transform, upload to S3
- Batch job for initial migration of existing images
Controller¶
ImageProxyController: Serve images, handle cache-on-demand
Integration with Phase 2¶
- Phase 2 validates → marks ImageCheck as BROKEN
- Listener detects broken check → marks CachedImage as STALE
- Re-cache job picks up STALE images → attempts re-fetch
- If still broken → CachedImage status = FAILED, API returns fallback
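The flow above amounts to a small state machine over CachedImageStatus. The sketch below models just the two transitions described; the enum is a stand-in for src/Enum/CachedImageStatus.php with assumed backing values, the function names are hypothetical, and leaving non-matching states unchanged is an assumption.

```php
<?php
declare(strict_types=1);

// Stand-in for src/Enum/CachedImageStatus.php; backing values are assumed.
enum CachedImageStatus: string
{
    case PENDING = 'pending';
    case CACHED = 'cached';
    case FAILED = 'failed';
    case STALE = 'stale';
}

/**
 * A broken ImageCheck marks a cached image as STALE.
 * Other states are left untouched (assumption).
 */
function onBrokenCheck(CachedImageStatus $current): CachedImageStatus
{
    return $current === CachedImageStatus::CACHED
        ? CachedImageStatus::STALE
        : $current;
}

/**
 * The re-cache job resolves STALE images: back to CACHED when the re-fetch
 * succeeds, FAILED when the source is still broken.
 */
function onRefetchAttempt(CachedImageStatus $current, bool $fetchSucceeded): CachedImageStatus
{
    if ($current !== CachedImageStatus::STALE) {
        return $current;
    }

    return $fetchSucceeded ? CachedImageStatus::CACHED : CachedImageStatus::FAILED;
}
```

Keeping transitions in pure functions lets the listener and the re-cache job share one set of rules.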
Files to Create¶
- src/Entity/CachedImage.php
- src/Repository/CachedImageRepository.php
- src/Enum/CachedImageStatus.php
- src/Service/Image/ImageCacheService.php
- src/Service/Image/ImageStorageService.php
- src/Service/Image/ImageTransformService.php
- src/Service/Image/ImageProxyUrlGenerator.php
- src/Controller/Api/ImageProxyController.php
- src/Message/ImageCacheMessage.php
- src/MessageHandler/ImageCacheHandler.php
- src/EventListener/BrokenImageListener.php (optional: event-driven)
- Migration for cached_image table
- Config for Flysystem S3 adapter
Dependencies to Add¶
- league/flysystem-aws-s3-v3 (Phase 3 storage)
- intervention/image (Phase 3 image transformation)
Implementation Order¶
- Phase 1: Image Dashboard - Implement ASAP (simple, self-contained)
- Phase 2: Broken Image Job - After Phase 1 (foundation for Phase 3)
- Phase 3: Image Caching - After Phase 2 (depends on validation infrastructure)
Critical Files Reference¶
Existing (patterns to follow)¶
- src/Controller/Admin/CoffeeBeanCrudController.php - CRUD controller pattern
- src/Controller/Admin/DashboardController.php - Menu items, custom routes
- src/Filter/RoasterFilter.php - Custom filter through relationships
- src/Scheduler/AvailabilityCrawlSchedulerService.php - Scheduler pattern
- src/MessageHandler/CrawlStepHandler.php - Message handler pattern
- templates/admin/review_dashboard.html.twig - Dashboard template
New Files Summary¶
Phase 1 (3-4 files)¶
- src/Controller/Admin/CoffeeBeanImageCrudController.php
- src/Filter/HasImageFilter.php (optional)
- Modify: DashboardController.php, review_dashboard.html.twig
Phase 2 (8 files + migration)¶
- src/Entity/ImageCheck.php
- src/Repository/ImageCheckRepository.php
- src/Enum/ImageCheckStatus.php
- src/Service/Image/ImageValidationService.php
- src/Scheduler/ImageValidationSchedulerService.php
- src/Message/ImageValidationMessage.php
- src/MessageHandler/ImageValidationHandler.php
- src/Command/ValidateImagesCommand.php
Phase 3 (12+ files + migration + config)¶
- src/Entity/CachedImage.php
- src/Repository/CachedImageRepository.php
- src/Enum/CachedImageStatus.php
- src/Service/Image/ImageCacheService.php
- src/Service/Image/ImageStorageService.php
- src/Service/Image/ImageTransformService.php
- src/Service/Image/ImageProxyUrlGenerator.php
- src/Controller/Api/ImageProxyController.php
- src/Message/ImageCacheMessage.php
- src/MessageHandler/ImageCacheHandler.php
- Config: Flysystem S3 adapter
- Modify: CoffeeBeanDTO.php, EntityToDtoMapper.php