Feature Implementation Plan: Background Similarity Calculation

📋 Todo Checklist

  • [ ] Create CoffeeBeanSimilarity entity and migration to store scores.
  • [ ] Create CoffeeBeanSimilarityService for the core calculation logic.
  • [ ] Implement Symfony Messenger for background processing (CalculateSimilarityForBean message and handler).
  • [ ] Create a Doctrine listener to trigger calculations for new coffee beans.
  • [ ] Create a Symfony command for batch and periodic calculations.
  • [ ] Update the API endpoint to read pre-calculated scores from the new table.
  • [ ] Final Review and Testing.

🔍 Analysis & Investigation

Codebase Structure & Architecture

This plan shifts the feature's architecture from a synchronous, on-demand calculation to an asynchronous, event-driven one.

  • Persistence: A new entity CoffeeBeanSimilarity will be created in src/Entity/ to store the calculated scores. This is more scalable than adding a JSON field to the CoffeeBean entity.
  • Background Processing: The presence of src/Message and src/MessageHandler directories indicates the project uses the Symfony Messenger component. This is the ideal tool for offloading the similarity calculation to a background worker.
  • Event-Driven Triggers:
    • For new beans, a Doctrine Entity Listener (src/EventListener/) is the standard Symfony way to react to entity lifecycle events like postPersist.
    • For batch processing, a Symfony Console Command (src/Command/) provides a way to trigger the calculations manually or via a cron job.
  • API Layer: The API controller (src/Controller/Api/CoffeeBeanController.php) will become much simpler. Instead of triggering a heavy calculation, it will perform a fast read operation on the new coffee_bean_similarity table.

Considerations & Challenges

  • Data Volume: With N beans, there are potentially N * (N-1) / 2 pairs, so the cost of scoring every pair grows quadratically with N. The plan mitigates this by calculating similarity only for a source bean against a smaller set of relevant "candidates" (e.g., from the same roaster or country).
  • Stale Data: Similarity scores could become stale if a coffee bean's properties are updated. The periodic refresh command (app:similarity:calculate --all) addresses this by queueing all beans for recalculation.
  • Configuration: The number of similar beans to store (12) and the weights for the algorithm should be configurable, perhaps as parameters in services.yaml, to allow for easier tuning.
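
To make the data-volume point concrete, the following minimal sketch compares the number of comparisons for all pairs with the number for a capped candidate set. The function names and the cap of 200 candidates per bean are illustrative assumptions, not values from this plan.

```php
<?php
declare(strict_types=1);

/** Number of unordered pairs among $n beans: n * (n - 1) / 2. */
function allPairsCount(int $n): int
{
    return intdiv($n * ($n - 1), 2);
}

/** Comparisons when each bean is scored against at most $candidates other beans. */
function candidateComparisons(int $n, int $candidates): int
{
    return $n * min($candidates, max($n - 1, 0));
}

// Hypothetical catalogue of 10,000 beans with a hypothetical cap of 200 candidates.
echo allPairsCount(10000), PHP_EOL;             // 49995000
echo candidateComparisons(10000, 200), PHP_EOL; // 2000000
```

With these assumed numbers, restricting each bean to a candidate set cuts the work by roughly a factor of 25, and the cost grows linearly rather than quadratically as beans are added.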

📝 Implementation Plan

Step-by-Step Implementation

  1. Step 1: Create CoffeeBeanSimilarity Entity

    • File to create: src/Entity/CoffeeBeanSimilarity.php
    • Changes needed:
      • Create a new Doctrine entity with a composite primary key or a unique constraint on the two bean fields.
      • Properties:
        • sourceBean (ManyToOne relationship to CoffeeBean)
        • similarBean (ManyToOne relationship to CoffeeBean)
        • score (float)
        • createdAt (DateTimeImmutable)
    • Action: Run php bin/console make:migration to generate the database migration.
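
A minimal sketch of what the entity could look like, using Doctrine ORM attribute mapping. It assumes a surrogate integer ID plus a unique constraint (the plan also allows a composite primary key); the column names, the constraint name, and the cascade-on-delete behaviour are assumptions for illustration.

```php
<?php
declare(strict_types=1);

namespace App\Entity;

use Doctrine\ORM\Mapping as ORM;

#[ORM\Entity]
#[ORM\Table(name: 'coffee_bean_similarity')]
#[ORM\UniqueConstraint(name: 'uniq_source_similar', columns: ['source_bean_id', 'similar_bean_id'])]
class CoffeeBeanSimilarity
{
    #[ORM\Id]
    #[ORM\GeneratedValue]
    #[ORM\Column(type: 'integer')]
    private ?int $id = null;

    // Assumption: similarity rows are removed when either bean is deleted.
    #[ORM\ManyToOne(targetEntity: CoffeeBean::class)]
    #[ORM\JoinColumn(name: 'source_bean_id', nullable: false, onDelete: 'CASCADE')]
    private CoffeeBean $sourceBean;

    #[ORM\ManyToOne(targetEntity: CoffeeBean::class)]
    #[ORM\JoinColumn(name: 'similar_bean_id', nullable: false, onDelete: 'CASCADE')]
    private CoffeeBean $similarBean;

    #[ORM\Column(type: 'float')]
    private float $score;

    #[ORM\Column(type: 'datetime_immutable')]
    private \DateTimeImmutable $createdAt;

    public function __construct(CoffeeBean $sourceBean, CoffeeBean $similarBean, float $score)
    {
        $this->sourceBean = $sourceBean;
        $this->similarBean = $similarBean;
        $this->score = $score;
        $this->createdAt = new \DateTimeImmutable();
    }

    public function getSourceBean(): CoffeeBean
    {
        return $this->sourceBean;
    }

    public function getSimilarBean(): CoffeeBean
    {
        return $this->similarBean;
    }

    public function getScore(): float
    {
        return $this->score;
    }

    public function getCreatedAt(): \DateTimeImmutable
    {
        return $this->createdAt;
    }
}
```

The entity is immutable after construction, which fits the handler's delete-and-reinsert approach: rows are replaced, never updated.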
  2. Step 2: Create CoffeeBeanSimilarityService

    • File to create: src/Service/Similarity/CoffeeBeanSimilarityService.php
    • Changes needed:
      • This service will contain the pure calculation logic.
      • Public Method: calculateScores(CoffeeBean $sourceBean): array.
      • It will:
        1. Fetch candidate beans via CoffeeBeanRepository->findSimilarityCandidates().
        2. Calculate a similarity score for each candidate against the source bean using the weighted algorithm from the previous plan.
        3. Return an array of ['bean' => CoffeeBean, 'score' => float] entries, sorted by score in descending order.
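
The weighted algorithm itself is defined in the previous plan and is not reproduced here. As a stand-in, the sketch below shows one possible shape for the score-and-sort step: a weighted share of matching attributes. Beans are represented as plain arrays to keep the example self-contained, and the attribute names (roaster, country, process) and weights are hypothetical.

```php
<?php
declare(strict_types=1);

/**
 * Weighted share of matching attributes, between 0.0 and 1.0.
 *
 * @param array<string, string> $source
 * @param array<string, string> $candidate
 * @param array<string, float>  $weights   attribute name => weight
 */
function similarityScore(array $source, array $candidate, array $weights): float
{
    $total = array_sum($weights);
    if ($total <= 0) {
        return 0.0;
    }

    $matched = 0.0;
    foreach ($weights as $attribute => $weight) {
        if (isset($source[$attribute], $candidate[$attribute])
            && $source[$attribute] === $candidate[$attribute]) {
            $matched += $weight;
        }
    }

    return $matched / $total;
}

/**
 * Scores every candidate against the source and sorts the result, highest score first.
 *
 * @param list<array<string, string>> $candidates
 * @return list<array{bean: array<string, string>, score: float}>
 */
function calculateScores(array $source, array $candidates, array $weights): array
{
    $results = [];
    foreach ($candidates as $candidate) {
        $results[] = [
            'bean' => $candidate,
            'score' => similarityScore($source, $candidate, $weights),
        ];
    }

    usort($results, static fn (array $a, array $b): int => $b['score'] <=> $a['score']);

    return $results;
}
```

In the real service, the arrays would be replaced by CoffeeBean entities and the weights would come from configuration, as suggested under Considerations & Challenges.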
  3. Step 3: Implement Symfony Messenger Components

    • File to create: src/Message/CalculateSimilarityForBean.php
      • A simple message class containing the coffeeBeanId.
    • File to create: src/MessageHandler/CalculateSimilarityForBeanHandler.php
      • This handler will be invoked by the Messenger worker.
      • Logic:
        1. Inject CoffeeBeanRepository, CoffeeBeanSimilarityService, and EntityManagerInterface.
        2. In the __invoke method, fetch the source CoffeeBean using the ID from the message.
        3. Call CoffeeBeanSimilarityService->calculateScores().
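        4. Delete existing similarity entries for the source bean.
        5. Persist the top 12 new CoffeeBeanSimilarity entities to the database. Steps 4 and 5 should run in a single transaction so the API never reads an empty or partially written set.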
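
The handler logic above could be sketched as follows. This assumes the message exposes a public coffeeBeanId property, that CoffeeBeanSimilarity takes (source, similar, score) in its constructor, and that the limit of 12 is injected as a constructor argument; none of these details are fixed by the plan. The early return for a missing bean is also an addition, covering the case where a bean is deleted before its job runs.

```php
<?php
declare(strict_types=1);

namespace App\MessageHandler;

use App\Entity\CoffeeBeanSimilarity;
use App\Message\CalculateSimilarityForBean;
use App\Repository\CoffeeBeanRepository;
use App\Service\Similarity\CoffeeBeanSimilarityService;
use Doctrine\ORM\EntityManagerInterface;
use Symfony\Component\Messenger\Attribute\AsMessageHandler;

#[AsMessageHandler]
final class CalculateSimilarityForBeanHandler
{
    public function __construct(
        private CoffeeBeanRepository $beans,
        private CoffeeBeanSimilarityService $similarity,
        private EntityManagerInterface $em,
        private int $limit = 12, // assumed default; could be bound to a parameter
    ) {
    }

    public function __invoke(CalculateSimilarityForBean $message): void
    {
        $source = $this->beans->find($message->coffeeBeanId);
        if ($source === null) {
            return; // the bean no longer exists, nothing to calculate
        }

        // calculateScores() returns entries sorted by score, highest first.
        $top = array_slice($this->similarity->calculateScores($source), 0, $this->limit);

        // Delete and re-insert atomically; wrapInTransaction() flushes before committing.
        $this->em->wrapInTransaction(function () use ($source, $top): void {
            $this->em
                ->createQuery(sprintf(
                    'DELETE FROM %s s WHERE s.sourceBean = :source',
                    CoffeeBeanSimilarity::class
                ))
                ->setParameter('source', $source)
                ->execute();

            foreach ($top as $row) {
                $this->em->persist(new CoffeeBeanSimilarity($source, $row['bean'], $row['score']));
            }
        });
    }
}
```

Carrying only the ID in the message, rather than the entity, keeps the payload small and ensures the worker always loads the bean's current state.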
  4. Step 4: Create Doctrine Listener for New Beans

    • File to create: src/EventListener/CoffeeBeanListener.php
    • Changes needed:
      • Create a listener that acts on the postPersist event for CoffeeBean entities.
      • Inject the MessageBusInterface.
      • In the postPersist method, dispatch a new CalculateSimilarityForBean message with the new bean's ID.
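      • Register this listener in config/services.yaml. Because it targets a single entity class, the doctrine.orm.entity_listener tag (with the CoffeeBean entity and the postPersist event) is the better fit. With the generic doctrine.event_listener tag, the listener is invoked for every entity and must check for CoffeeBean itself.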
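
A minimal sketch of the listener, assuming a recent DoctrineBundle where the AsEntityListener attribute can replace the manual tag in services.yaml. The CoffeeBean::getId() accessor and the message constructor taking a string ID are assumptions about code not shown in this plan.

```php
<?php
declare(strict_types=1);

namespace App\EventListener;

use App\Entity\CoffeeBean;
use App\Message\CalculateSimilarityForBean;
use Doctrine\Bundle\DoctrineBundle\Attribute\AsEntityListener;
use Doctrine\ORM\Events;
use Symfony\Component\Messenger\MessageBusInterface;

// Only invoked for CoffeeBean entities, so no instanceof check is needed.
#[AsEntityListener(event: Events::postPersist, method: 'postPersist', entity: CoffeeBean::class)]
final class CoffeeBeanListener
{
    public function __construct(private MessageBusInterface $bus)
    {
    }

    public function postPersist(CoffeeBean $bean): void
    {
        // The ID is available here because postPersist runs after the INSERT.
        $this->bus->dispatch(new CalculateSimilarityForBean((string) $bean->getId()));
    }
}
```

The listener does no calculation itself; it only queues a message, so creating a bean stays fast regardless of catalogue size.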
  5. Step 5: Create Console Command for Batch Processing

    • File to create: src/Command/CalculateSimilarityCommand.php
    • Changes needed:
      • Create a new command, e.g., app:similarity:calculate.
      • Options:
        • --missing-only: Finds beans with no entries in the coffee_bean_similarity table and dispatches a message for each.
        • --all: Dispatches a message for every single coffee bean in the database.
      • Inject CoffeeBeanRepository and MessageBusInterface.
      • The command's logic will query for the appropriate beans and loop through them, dispatching a message for each one.
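
The command could be sketched as below. The repository methods findAllIds() and findIdsWithoutSimilarity() are hypothetical names for queries that return bean IDs, and requiring exactly one of the two options is a design assumption rather than something the plan specifies.

```php
<?php
declare(strict_types=1);

namespace App\Command;

use App\Message\CalculateSimilarityForBean;
use App\Repository\CoffeeBeanRepository;
use Symfony\Component\Console\Attribute\AsCommand;
use Symfony\Component\Console\Command\Command;
use Symfony\Component\Console\Input\InputInterface;
use Symfony\Component\Console\Input\InputOption;
use Symfony\Component\Console\Output\OutputInterface;
use Symfony\Component\Messenger\MessageBusInterface;

#[AsCommand(name: 'app:similarity:calculate', description: 'Queues similarity calculations for coffee beans.')]
final class CalculateSimilarityCommand extends Command
{
    public function __construct(
        private CoffeeBeanRepository $beans,
        private MessageBusInterface $bus,
    ) {
        parent::__construct();
    }

    protected function configure(): void
    {
        $this
            ->addOption('missing-only', null, InputOption::VALUE_NONE, 'Queue only beans without stored scores')
            ->addOption('all', null, InputOption::VALUE_NONE, 'Queue every bean');
    }

    protected function execute(InputInterface $input, OutputInterface $output): int
    {
        $all = (bool) $input->getOption('all');
        $missingOnly = (bool) $input->getOption('missing-only');

        if ($all === $missingOnly) {
            $output->writeln('<error>Pass exactly one of --all or --missing-only.</error>');

            return Command::INVALID;
        }

        // Both repository methods are hypothetical and are expected to return bean IDs.
        $ids = $all ? $this->beans->findAllIds() : $this->beans->findIdsWithoutSimilarity();

        $count = 0;
        foreach ($ids as $id) {
            $this->bus->dispatch(new CalculateSimilarityForBean((string) $id));
            ++$count;
        }

        $output->writeln(sprintf('Queued %d bean(s) for similarity calculation.', $count));

        return Command::SUCCESS;
    }
}
```

A periodic refresh would then be a cron entry running php bin/console app:similarity:calculate --all, with the actual work picked up by the Messenger workers.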
  6. Step 6: Update API Endpoint

    • File to modify: src/Repository/CoffeeBeanRepository.php
      • Add a new method: findSimilarBeans(string $sourceBeanId, int $limit = 12): array.
      • This method will query the CoffeeBeanSimilarity entity, filtering by sourceBean, ordering by score in descending order, and joining the similarBean relationship to fetch the bean data in the same query.
    • File to modify: src/Controller/Api/CoffeeBeanController.php
      • Modify the GET /api/coffee-beans/{id}/similar endpoint.
      • The controller method will now simply call CoffeeBeanRepository->findSimilarBeans().
      • It will then map the results to the SimilarCoffeeBeanDTO (which should be updated to accept a CoffeeBean and a score).
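
A minimal sketch of the repository method, assuming CoffeeBeanRepository extends DoctrineBundle's ServiceEntityRepository and that the entity properties are named sourceBean, similarBean, and score as in Step 1. Only the new method is shown; any existing methods in the class are omitted.

```php
<?php
declare(strict_types=1);

namespace App\Repository;

use App\Entity\CoffeeBean;
use App\Entity\CoffeeBeanSimilarity;
use Doctrine\Bundle\DoctrineBundle\Repository\ServiceEntityRepository;
use Doctrine\Persistence\ManagerRegistry;

class CoffeeBeanRepository extends ServiceEntityRepository
{
    public function __construct(ManagerRegistry $registry)
    {
        parent::__construct($registry, CoffeeBean::class);
    }

    /**
     * Returns the stored similarity rows for a bean, best match first.
     *
     * @return CoffeeBeanSimilarity[]
     */
    public function findSimilarBeans(string $sourceBeanId, int $limit = 12): array
    {
        return $this->getEntityManager()->createQueryBuilder()
            ->select('s', 'b')                      // fetch the similar bean with each row
            ->from(CoffeeBeanSimilarity::class, 's')
            ->join('s.similarBean', 'b')
            ->where('s.sourceBean = :sourceId')
            ->setParameter('sourceId', $sourceBeanId)
            ->orderBy('s.score', 'DESC')
            ->setMaxResults($limit)
            ->getQuery()
            ->getResult();
    }
}
```

Each returned row carries both the similar bean and its score, which is what the controller needs to build a SimilarCoffeeBeanDTO without further queries.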

Testing Strategy

  • Unit Tests:
    • Test the CoffeeBeanSimilarityService calculation logic thoroughly.
    • Test the CalculateSimilarityCommand to ensure it queries for the correct beans based on the options.
    • Test the CoffeeBeanListener to verify that it dispatches a message.
  • Integration Tests:
    • Test the CalculateSimilarityForBeanHandler. This can be done using an in-memory messenger transport to assert that the handler consumes a message and correctly saves data to the database.
    • Test the API endpoint GET /api/coffee-beans/{id}/similar to ensure it correctly retrieves the pre-calculated data from the coffee_bean_similarity table.

🎯 Success Criteria

  • When a new CoffeeBean is created, a background job is queued to calculate its similarity scores.
  • The app:similarity:calculate command successfully queues jobs for calculation.
  • The GET /api/coffee-beans/{id}/similar endpoint responds quickly with a list of 12 (or fewer) similar beans and their scores, read from the database.
  • Similarity calculation runs entirely in background workers, so API response times stay unaffected as the number of coffee beans grows.