Phase 3: Image Caching & Proxy

Priority: Later
Complexity: High
Dependencies: None (but benefits from Phase 2 for validation integration)


Goal

Cache external images in S3/MinIO, transform them into optimized formats, and serve them through our own infrastructure.


Design Decisions

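| Decision             | Choice                | Rationale                                   |
|----------------------|-----------------------|---------------------------------------------|
| Storage backend      | S3/MinIO              | Scalable, CDN-ready for the future          |
| Image transformation | Optimize + thumbnails | WebP conversion, multiple sizes             |
| API response         | Replace imageUrl      | Proxy URL replaces original in API response |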

Storage: S3/MinIO with Flysystem

Use league/flysystem-aws-s3-v3 for abstraction:

  • Development: MinIO container
  • Production: S3 or S3-compatible storage
  • Easy CDN integration later (CloudFront, etc.)

Bucket Structure

images/
├── original/
│   └── {bean_uuid}.{ext}      # Original fetched image
├── optimized/
│   └── {bean_uuid}.webp       # Full-size WebP
└── thumbnails/
    ├── {bean_uuid}_sm.webp    # 150x150
    └── {bean_uuid}_md.webp    # 400x400
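The layout above maps directly to object keys. A minimal sketch of a key builder follows; the function name imageStorageKey is hypothetical, and it assumes images/ is a key prefix inside the bucket rather than the bucket name itself.

```php
<?php
declare(strict_types=1);

/**
 * Builds an object key that follows the bucket layout above.
 * Hypothetical helper: the name and signature are not part of the plan.
 * Assumption: "images/" is a key prefix, not the bucket name.
 */
function imageStorageKey(string $beanUuid, string $variant, string $originalExt = 'jpg'): string
{
    return match ($variant) {
        // The original keeps its source extension (normalized to lower case).
        'original' => sprintf('images/original/%s.%s', $beanUuid, ltrim(strtolower($originalExt), '.')),
        // The full-size optimized image is always WebP.
        'full'     => sprintf('images/optimized/%s.webp', $beanUuid),
        // Thumbnails carry the variant name as a suffix.
        'sm', 'md' => sprintf('images/thumbnails/%s_%s.webp', $beanUuid, $variant),
        default    => throw new InvalidArgumentException("Unknown variant: {$variant}"),
    };
}
```

Keeping key construction in one place means the storage service and the proxy controller cannot drift apart on naming.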

Entity: CachedImage

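// Imports assume the default Symfony App\ namespace layout and symfony/uid for UUIDs.
use App\Enum\CachedImageStatus;
use App\Repository\CachedImageRepository;
use DateTimeImmutable;
use Doctrine\ORM\Mapping as ORM;
use Symfony\Component\Uid\Uuid;

#[ORM\Entity(repositoryClass: CachedImageRepository::class)]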
class CachedImage
{
    #[ORM\Id]
    #[ORM\Column(type: 'uuid')]
    private Uuid $id;

    #[ORM\OneToOne(targetEntity: CoffeeBean::class)]
    #[ORM\JoinColumn(nullable: false, onDelete: 'CASCADE')]
    private CoffeeBean $coffeeBean;

    #[ORM\Column(length: 255)]
    private string $originalUrl;  // Source URL

    #[ORM\Column(length: 64)]
    private string $originalUrlHash;  // SHA256 for dedup

    #[ORM\Column(length: 64, nullable: true)]
    private ?string $contentHash = null;  // SHA256 of image bytes

    #[ORM\Column(type: 'datetime_immutable')]
    private DateTimeImmutable $cachedAt;

    #[ORM\Column(type: 'datetime_immutable', nullable: true)]
    private ?DateTimeImmutable $lastValidatedAt = null;

    #[ORM\Column(enumType: CachedImageStatus::class)]
    private CachedImageStatus $status;  // PENDING, CACHED, FAILED, STALE

    #[ORM\Column(length: 50, nullable: true)]
    private ?string $originalMimeType = null;

    #[ORM\Column(nullable: true)]
    private ?int $originalSize = null;

    #[ORM\Column(nullable: true)]
    private ?int $optimizedSize = null;

    #[ORM\Column(type: 'json', nullable: true)]
    private ?array $variants = null;  // ['sm' => true, 'md' => true, 'full' => true]
}

Image Transformation: Intervention Image

Use intervention/image with the GD or Imagick driver:

  • Convert to WebP (80% quality)
  • Generate thumbnail sizes: 150x150 (sm), 400x400 (md)
  • Preserve aspect ratio with cover/contain

Sizes Configuration

// config/packages/image_cache.php or service parameter
'image_variants' => [
    'sm' => ['width' => 150, 'height' => 150, 'fit' => 'cover'],
    'md' => ['width' => 400, 'height' => 400, 'fit' => 'contain'],
    'full' => ['width' => 1200, 'height' => 1200, 'fit' => 'contain'],
]
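To make the difference between the two fit modes concrete, here is a sketch that computes the output dimensions for one variant. The function targetDimensions is hypothetical, and the rule that contain never upscales a small source is an assumption; the plan does not specify this. The real resize would be delegated to Intervention Image.

```php
<?php
declare(strict_types=1);

/**
 * Computes the output box for one variant from the source dimensions.
 * Hypothetical helper. Assumption: "contain" never upscales small sources.
 *
 * @param array{width: int, height: int, fit: string} $variant
 * @return array{width: int, height: int, crop: bool}
 */
function targetDimensions(int $srcWidth, int $srcHeight, array $variant): array
{
    if ($srcWidth <= 0 || $srcHeight <= 0) {
        throw new InvalidArgumentException('Source dimensions must be positive.');
    }

    if ($variant['fit'] === 'cover') {
        // cover: the output fills the box exactly; overflow is cropped away.
        return ['width' => $variant['width'], 'height' => $variant['height'], 'crop' => true];
    }

    // contain: scale uniformly so the whole image fits inside the box.
    $scale = min($variant['width'] / $srcWidth, $variant['height'] / $srcHeight, 1.0);

    return [
        'width'  => max(1, (int) round($srcWidth * $scale)),
        'height' => max(1, (int) round($srcHeight * $scale)),
        'crop'   => false,
    ];
}
```

Under these assumptions a 2400x1200 source yields a 400x200 md image, so md is bounded by 400x400 rather than always being exactly that size, while sm is always a cropped 150x150 square.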

Cache Invalidation Strategy

  1. Time-based: Re-validate after 7 days (via Phase 2 job if available)
  2. On broken detection: Phase 2 marks as STALE, triggers re-fetch
  3. Manual: Admin action to invalidate and re-cache
  4. Source URL change: If CoffeeBean.imageUrl changes, invalidate
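Rules 1 and 4 above can be checked from fields already on CachedImage. The sketch below is one possible shape for that check; the function name needsRevalidation and its parameter list are hypothetical, and using cachedAt when lastValidatedAt is null is an assumption.

```php
<?php
declare(strict_types=1);

/**
 * Decides whether a cached image should be re-validated.
 * Hypothetical helper covering the time-based and source-URL-change rules.
 * Assumption: if the image was never validated, its age counts from cachedAt.
 */
function needsRevalidation(
    ?DateTimeImmutable $lastValidatedAt,
    DateTimeImmutable $cachedAt,
    string $originalUrlHash,
    string $currentImageUrl,
    DateTimeImmutable $now,
    int $maxAgeDays = 7,
): bool {
    // Rule 4: the bean now points at a different source URL.
    if (!hash_equals($originalUrlHash, hash('sha256', $currentImageUrl))) {
        return true;
    }

    // Rule 1: the last validation (or initial caching) is older than the limit.
    $reference = $lastValidatedAt ?? $cachedAt;

    return $reference->modify("+{$maxAgeDays} days") <= $now;
}
```

Passing the current time in as an argument keeps the check deterministic, which makes it easy to unit test.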

API Integration

Proxy Controller

GET /api/images/{beanUuid}              → Full optimized WebP
GET /api/images/{beanUuid}?size=sm      → 150x150 thumbnail
GET /api/images/{beanUuid}?size=md      → 400x400 medium
GET /api/images/{beanUuid}?original=1   → Original (if stored)

Response headers:

  • Cache-Control: public, max-age=86400 (1 day)
  • ETag: based on content hash
  • Content-Type: image/webp
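The routing and header rules above could be sketched as three small, framework-free functions. All names are hypothetical. Two details are assumptions rather than part of the plan: the ETag appends the variant name to the content hash so that each size gets a distinct tag, and an explicit size=full is accepted as equivalent to omitting size.

```php
<?php
declare(strict_types=1);

/** Maps query parameters to a variant name. Hypothetical helper. */
function resolveVariant(array $query): string
{
    if (($query['original'] ?? null) === '1') {
        return 'original';
    }

    $size = $query['size'] ?? 'full';
    if (!in_array($size, ['sm', 'md', 'full'], true)) {
        throw new InvalidArgumentException("Unsupported size: {$size}");
    }

    return $size;
}

/**
 * Builds the response headers listed above.
 * Assumption: the ETag is the content hash plus the variant name.
 *
 * @return array<string, string>
 */
function imageResponseHeaders(string $contentHash, string $variant, string $mimeType = 'image/webp'): array
{
    return [
        'Cache-Control' => 'public, max-age=86400',
        'ETag'          => sprintf('"%s-%s"', $contentHash, $variant),
        'Content-Type'  => $mimeType,
    ];
}

/** Returns true when the client's If-None-Match header matches the ETag. */
function isNotModified(?string $ifNoneMatch, string $etag): bool
{
    if ($ifNoneMatch === null) {
        return false;
    }

    foreach (explode(',', $ifNoneMatch) as $candidate) {
        $candidate = trim($candidate);
        // Weak validators (W/"...") are compared without their prefix.
        if ($candidate === '*' || preg_replace('/^W\//', '', $candidate) === $etag) {
            return true;
        }
    }

    return false;
}
```

When isNotModified returns true, the controller can answer 304 Not Modified without reading the object from storage.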

DTO Changes

Modify EntityToDtoMapper:

  • imageUrl returns the proxy URL when the image is cached and falls back to the original URL otherwise
  • The original URL is stored in CachedImage.originalUrl for reference
  • Add an imageVariants array for size options

// In EntityToDtoMapper
$imageUrl = $cachedImage?->getStatus() === CachedImageStatus::CACHED
    ? $this->imageProxyUrlGenerator->generate($coffeeBean)
    : $coffeeBean->getImageUrl();

Implementation

Services

| Service                | Responsibility                   |
|------------------------|----------------------------------|
| ImageCacheService      | Orchestrates caching workflow    |
| ImageStorageService    | S3/Flysystem operations          |
| ImageTransformService  | Resize/convert with Intervention |
| ImageProxyUrlGenerator | Generate signed/public URLs      |

Async Processing

  • ImageCacheMessage: Trigger caching for a bean
  • ImageCacheHandler: Fetch, transform, upload to S3
  • Batch job for initial migration of existing images
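The fetch, transform, upload flow can be sketched independently of Messenger, Flysystem, and Intervention by injecting each step as a callable. This is only an illustration: the message fields, the function handleImageCache, the callable signatures, and the shape of the returned array are all assumptions. A real handler would be an invokable class wired through Symfony Messenger.

```php
<?php
declare(strict_types=1);

/** Hypothetical message shape: the plan only says it triggers caching for a bean. */
final class ImageCacheMessage
{
    public function __construct(
        public readonly string $beanUuid,
        public readonly string $imageUrl,
    ) {}
}

/**
 * @param callable(string): ?string        $fetch     URL to raw bytes, or null on failure
 * @param callable(string, string): string $transform (bytes, variant) to WebP bytes
 * @param callable(string, string): void   $store     (object key, bytes)
 * @return array{status: string, contentHash: ?string, variants: array<string, bool>}
 */
function handleImageCache(ImageCacheMessage $message, callable $fetch, callable $transform, callable $store): array
{
    $bytes = $fetch($message->imageUrl);
    if ($bytes === null) {
        return ['status' => 'FAILED', 'contentHash' => null, 'variants' => []];
    }

    // Keys follow the bucket structure; "images/" is assumed to be a key prefix.
    $keys = [
        'full' => "images/optimized/{$message->beanUuid}.webp",
        'sm'   => "images/thumbnails/{$message->beanUuid}_sm.webp",
        'md'   => "images/thumbnails/{$message->beanUuid}_md.webp",
    ];

    $variants = [];
    foreach ($keys as $variant => $key) {
        $store($key, $transform($bytes, $variant));
        $variants[$variant] = true;
    }

    return ['status' => 'CACHED', 'contentHash' => hash('sha256', $bytes), 'variants' => $variants];
}
```

Injecting the three steps keeps the orchestration testable with in-memory fakes, before any S3 or image-library wiring exists.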

Controller

  • ImageProxyController: Serve images, handle cache-on-demand

Integration with Phase 2

If Phase 2 (Broken Image Detection) is implemented:

  1. Phase 2 validates → marks ImageCheck as BROKEN
  2. Listener detects broken check → marks CachedImage as STALE
  3. Re-cache job picks up STALE images → attempts re-fetch
  4. If still broken → CachedImage status = FAILED, API returns fallback
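The four steps above amount to a small state machine over CachedImageStatus. A minimal sketch follows; the enum's string backing values and the two function names are assumptions, and the rule that only CACHED images become STALE is one reading of step 2.

```php
<?php
declare(strict_types=1);

// Backing values are an assumption; the plan only names the four cases.
enum CachedImageStatus: string
{
    case PENDING = 'pending';
    case CACHED  = 'cached';
    case FAILED  = 'failed';
    case STALE   = 'stale';
}

/** Step 2: a broken check turns a cached image stale; other states are left alone. */
function onBrokenCheck(CachedImageStatus $current): CachedImageStatus
{
    return $current === CachedImageStatus::CACHED ? CachedImageStatus::STALE : $current;
}

/** Steps 3 and 4: a stale image becomes CACHED again on success, FAILED otherwise. */
function afterRefetch(CachedImageStatus $current, bool $fetchSucceeded): CachedImageStatus
{
    if ($current !== CachedImageStatus::STALE) {
        return $current;
    }

    return $fetchSucceeded ? CachedImageStatus::CACHED : CachedImageStatus::FAILED;
}
```

Because the API only serves the proxy URL when the status is CACHED, both STALE and FAILED images fall back to the original URL under this sketch.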

Files to Create

| File                                         | Description                                  |
|----------------------------------------------|----------------------------------------------|
| src/Entity/CachedImage.php                   | Entity                                       |
| src/Repository/CachedImageRepository.php     | Repository                                   |
| src/Enum/CachedImageStatus.php               | Status enum (PENDING, CACHED, FAILED, STALE) |
| src/Service/Image/ImageCacheService.php      | Orchestration                                |
| src/Service/Image/ImageStorageService.php    | S3 operations                                |
| src/Service/Image/ImageTransformService.php  | Image transformation                         |
| src/Service/Image/ImageProxyUrlGenerator.php | URL generation                               |
| src/Controller/Api/ImageProxyController.php  | Proxy endpoint                               |
| src/Message/ImageCacheMessage.php            | Message class                                |
| src/MessageHandler/ImageCacheHandler.php     | Handler                                      |
| src/EventListener/BrokenImageListener.php    | Optional: event-driven                       |
| Migration                                    | cached_image table                           |
| config/packages/flysystem.php                | S3 adapter config                            |

Dependencies to Add

composer require league/flysystem-aws-s3-v3
composer require intervention/image

Reference Files

  • src/Scheduler/AvailabilityCrawlSchedulerService.php - Scheduler pattern
  • src/MessageHandler/CrawlStepHandler.php - Message handler pattern
  • src/DTO/Api/CoffeeBeanDTO.php - DTO structure
  • src/Service/Api/Mapper/EntityToDtoMapper.php - DTO mapping