Feature Implementation Plan: Background Similarity Calculation
📋 Todo Checklist
- [ ] Create CoffeeBeanSimilarity entity and migration to store scores.
- [ ] Create CoffeeBeanSimilarityService for the core calculation logic.
- [ ] Implement Symfony Messenger for background processing (CalculateSimilarityForBean message and handler).
- [ ] Create a Doctrine listener to trigger calculations for new coffee beans.
- [ ] Create a Symfony command for batch and periodic calculations.
- [ ] Update the API endpoint to read pre-calculated scores from the new table.
- [ ] Final Review and Testing.
🔍 Analysis & Investigation
Codebase Structure & Architecture
This plan shifts the feature's architecture from a synchronous, on-demand calculation to an asynchronous, event-driven one.
- Persistence: A new entity CoffeeBeanSimilarity will be created in src/Entity/ to store the calculated scores. This is more scalable than adding a JSON field to the CoffeeBean entity.
- Background Processing: The presence of src/Message and src/MessageHandler directories indicates the project uses the Symfony Messenger component. This is the ideal tool for offloading the similarity calculation to a background worker.
- Event-Driven Triggers:
  - For new beans, a Doctrine Entity Listener (src/EventListener/) is the standard Symfony way to react to entity lifecycle events like postPersist.
  - For batch processing, a Symfony Console Command (src/Command/) provides a way to trigger the calculations manually or via a cron job.
- API Layer: The API controller (src/Controller/Api/CoffeeBeanController.php) will become much simpler. Instead of triggering a heavy calculation, it will perform a fast read operation on the new coffee_bean_similarity table.
Considerations & Challenges
- Data Volume: With N beans, there are potentially N * (N-1) / 2 pairs. Calculating all pairs is computationally expensive. The plan mitigates this by calculating similarity only for a source bean against a smaller set of relevant "candidates" (e.g., from the same roaster or country).
- Stale Data: Similarity scores could become stale if a coffee bean's properties are updated. The periodic refresh command (app:similarity:calculate --all) addresses this by queueing all beans for recalculation.
- Configuration: The number of similar beans to store (12) and the weights for the algorithm should be configurable, perhaps as parameters in services.yaml, to allow for easier tuning.
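To make the data-volume point concrete, here is a small arithmetic sketch comparing the all-pairs count with a candidate-limited approach. The figures of 10,000 beans and 200 candidates per bean are illustrative assumptions, not numbers from this project.

```php
<?php
declare(strict_types=1);

/** Number of unordered bean pairs: N * (N - 1) / 2. */
function pairCount(int $beans): int
{
    return intdiv($beans * ($beans - 1), 2);
}

/** Comparisons when each bean is scored only against a bounded candidate set. */
function candidateComparisons(int $beans, int $candidatesPerBean): int
{
    return $beans * $candidatesPerBean;
}

// Hypothetical catalogue size and candidate-set size, for illustration only.
printf("all pairs: %d\n", pairCount(10000));                         // 49995000
printf("candidate-limited: %d\n", candidateComparisons(10000, 200)); // 2000000
```

Under these assumed numbers the candidate filter cuts the work by roughly a factor of 25, and the gap widens as the catalogue grows because the all-pairs count is quadratic in N.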
📝 Implementation Plan
Step-by-Step Implementation
- Step 1: Create CoffeeBeanSimilarity Entity
  - File to create: src/Entity/CoffeeBeanSimilarity.php
  - Changes needed:
    - Create a new Doctrine entity with a composite primary key or a unique constraint on the two bean fields.
    - Properties:
      - sourceBean (ManyToOne relationship to CoffeeBean)
      - similarBean (ManyToOne relationship to CoffeeBean)
      - score (float)
      - createdAt (DateTimeImmutable)
  - Action: Run php bin/console make:migration to generate the database migration.
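One possible shape for the entity in Step 1 is sketched below using Doctrine's PHP 8 attribute mapping. It takes the unique-constraint option with a surrogate integer ID rather than a composite key. The column names, the constraint name, the CASCADE delete behaviour, and the constructor signature are all assumptions made for illustration.

```php
<?php
declare(strict_types=1);

namespace App\Entity;

use Doctrine\ORM\Mapping as ORM;

#[ORM\Entity]
#[ORM\Table(name: 'coffee_bean_similarity')]
// Assumed constraint name and column names; one row per (source, similar) pair.
#[ORM\UniqueConstraint(name: 'uniq_source_similar', columns: ['source_bean_id', 'similar_bean_id'])]
class CoffeeBeanSimilarity
{
    #[ORM\Id]
    #[ORM\GeneratedValue]
    #[ORM\Column(type: 'integer')]
    private ?int $id = null;

    #[ORM\ManyToOne(targetEntity: CoffeeBean::class)]
    #[ORM\JoinColumn(name: 'source_bean_id', nullable: false, onDelete: 'CASCADE')]
    private CoffeeBean $sourceBean;

    #[ORM\ManyToOne(targetEntity: CoffeeBean::class)]
    #[ORM\JoinColumn(name: 'similar_bean_id', nullable: false, onDelete: 'CASCADE')]
    private CoffeeBean $similarBean;

    #[ORM\Column(type: 'float')]
    private float $score;

    #[ORM\Column(type: 'datetime_immutable')]
    private \DateTimeImmutable $createdAt;

    public function __construct(CoffeeBean $sourceBean, CoffeeBean $similarBean, float $score)
    {
        $this->sourceBean = $sourceBean;
        $this->similarBean = $similarBean;
        $this->score = $score;
        $this->createdAt = new \DateTimeImmutable();
    }

    public function getSourceBean(): CoffeeBean
    {
        return $this->sourceBean;
    }

    public function getSimilarBean(): CoffeeBean
    {
        return $this->similarBean;
    }

    public function getScore(): float
    {
        return $this->score;
    }

    public function getCreatedAt(): \DateTimeImmutable
    {
        return $this->createdAt;
    }
}
```

A surrogate key keeps the mapping simple; a composite key on the two relations would avoid the extra column at the cost of slightly more involved mapping.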
- Step 2: Create CoffeeBeanSimilarityService
  - File to create: src/Service/Similarity/CoffeeBeanSimilarityService.php
  - Changes needed:
    - This service will contain the pure calculation logic.
    - Public Method: calculateScores(CoffeeBean $sourceBean): array.
    - It will:
      - Fetch candidate beans via CoffeeBeanRepository->findSimilarityCandidates().
      - Calculate a similarity score for each candidate against the source bean using the weighted algorithm from the previous plan.
      - Return an array of ['bean' => CoffeeBean, 'score' => float], sorted by score.
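The weighted algorithm itself is defined in the previous plan and is not reproduced here. As a stand-in, the sketch below shows the general shape of a weighted score and the sorted return value, using plain arrays in place of CoffeeBean objects. The attribute names (roaster, country, process), the weights, the exact-match rule, and the descending sort order are all hypothetical.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical weighted score: every attribute whose value matches exactly
 * contributes its weight; the sum is normalized to the range 0..1.
 */
function similarityScore(array $source, array $candidate, array $weights): float
{
    $total = array_sum($weights);
    if ($total <= 0) {
        return 0.0;
    }

    $matched = 0.0;
    foreach ($weights as $attribute => $weight) {
        if (isset($source[$attribute], $candidate[$attribute])
            && $source[$attribute] === $candidate[$attribute]) {
            $matched += $weight;
        }
    }

    return $matched / $total;
}

/** Scores every candidate and returns the rows sorted by score, highest first. */
function rankCandidates(array $source, array $candidates, array $weights): array
{
    $scored = [];
    foreach ($candidates as $candidate) {
        $scored[] = [
            'bean' => $candidate,
            'score' => similarityScore($source, $candidate, $weights),
        ];
    }

    usort($scored, static fn (array $a, array $b): int => $b['score'] <=> $a['score']);

    return $scored;
}

// Illustrative data only.
$weights = ['roaster' => 5, 'country' => 3, 'process' => 2];
$source = ['id' => 0, 'roaster' => 'A', 'country' => 'ET', 'process' => 'washed'];
$candidates = [
    ['id' => 2, 'roaster' => 'B', 'country' => 'ET', 'process' => 'washed'],
    ['id' => 1, 'roaster' => 'A', 'country' => 'ET', 'process' => 'natural'],
    ['id' => 3, 'roaster' => 'B', 'country' => 'CO', 'process' => 'natural'],
];
$ranked = rankCandidates($source, $candidates, $weights);
```

In the real service the weights would come from configuration, as suggested under Considerations & Challenges, and the comparison would read properties from CoffeeBean entities.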
- Step 3: Implement Symfony Messenger Components
  - File to create: src/Message/CalculateSimilarityForBean.php
    - A simple message class containing the coffeeBeanId.
  - File to create: src/MessageHandler/CalculateSimilarityForBeanHandler.php
    - This handler will be invoked by the Messenger worker.
    - Logic:
      - Inject CoffeeBeanRepository, CoffeeBeanSimilarityService, and EntityManagerInterface.
      - In the __invoke method, fetch the source CoffeeBean using the ID from the message.
      - Call CoffeeBeanSimilarityService->calculateScores().
      - Delete existing similarity entries for the source bean.
      - Persist the top 12 new CoffeeBeanSimilarity entities to the database.
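The message and handler from Step 3 could look roughly like the sketch below. Both files are shown in one block for brevity. The string type of the ID, the CoffeeBeanSimilarity constructor signature, the hard-coded limit of 12, and the assumption that calculateScores() returns rows sorted highest first are illustrative choices; the code requires PHP 8.1 or later.

```php
<?php
declare(strict_types=1);

// src/Message/CalculateSimilarityForBean.php
namespace App\Message;

final class CalculateSimilarityForBean
{
    public function __construct(public readonly string $coffeeBeanId)
    {
    }
}

// src/MessageHandler/CalculateSimilarityForBeanHandler.php
namespace App\MessageHandler;

use App\Entity\CoffeeBeanSimilarity;
use App\Message\CalculateSimilarityForBean;
use App\Repository\CoffeeBeanRepository;
use App\Service\Similarity\CoffeeBeanSimilarityService;
use Doctrine\ORM\EntityManagerInterface;
use Symfony\Component\Messenger\Attribute\AsMessageHandler;

#[AsMessageHandler]
final class CalculateSimilarityForBeanHandler
{
    // The plan suggests making this configurable; hard-coded here for brevity.
    private const MAX_SIMILAR = 12;

    public function __construct(
        private readonly CoffeeBeanRepository $coffeeBeanRepository,
        private readonly CoffeeBeanSimilarityService $similarityService,
        private readonly EntityManagerInterface $entityManager,
    ) {
    }

    public function __invoke(CalculateSimilarityForBean $message): void
    {
        $sourceBean = $this->coffeeBeanRepository->find($message->coffeeBeanId);
        if ($sourceBean === null) {
            return; // Bean was deleted, or is not visible to this worker yet.
        }

        // Assumed to be sorted by score, highest first.
        $scores = $this->similarityService->calculateScores($sourceBean);

        // Remove the previous results for this source bean.
        $this->entityManager->createQuery(
            'DELETE FROM ' . CoffeeBeanSimilarity::class . ' s WHERE s.sourceBean = :source'
        )->setParameter('source', $sourceBean)->execute();

        // Store only the top entries.
        foreach (array_slice($scores, 0, self::MAX_SIMILAR) as $row) {
            $this->entityManager->persist(
                new CoffeeBeanSimilarity($sourceBean, $row['bean'], $row['score'])
            );
        }

        $this->entityManager->flush();
    }
}
```

Carrying only the ID in the message keeps the payload small and means the handler always works with the current database state rather than a serialized snapshot of the entity.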
- Step 4: Create Doctrine Listener for New Beans
  - File to create: src/EventListener/CoffeeBeanListener.php
  - Changes needed:
    - Create a listener that acts on the postPersist event for CoffeeBean entities. Listeners registered with the doctrine.event_listener tag receive the event for every entity, so the method must check that the persisted object is a CoffeeBean.
    - Inject the MessageBusInterface.
    - In the postPersist method, dispatch a new CalculateSimilarityForBean message with the new bean's ID.
    - Register this listener in config/services.yaml with the doctrine.event_listener tag.
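A minimal sketch of the listener in Step 4 follows. It assumes CoffeeBean exposes a getId() accessor and that the installed Doctrine ORM version provides PostPersistEventArgs (ORM 2.14 or later); older versions would type-hint the generic lifecycle event arguments instead.

```php
<?php
declare(strict_types=1);

namespace App\EventListener;

use App\Entity\CoffeeBean;
use App\Message\CalculateSimilarityForBean;
use Doctrine\ORM\Event\PostPersistEventArgs;
use Symfony\Component\Messenger\MessageBusInterface;

final class CoffeeBeanListener
{
    public function __construct(private readonly MessageBusInterface $messageBus)
    {
    }

    public function postPersist(PostPersistEventArgs $args): void
    {
        $entity = $args->getObject();

        // The listener is called for every persisted entity, so filter by type.
        if (!$entity instanceof CoffeeBean) {
            return;
        }

        // getId() is an assumed accessor on CoffeeBean.
        $this->messageBus->dispatch(
            new CalculateSimilarityForBean((string) $entity->getId())
        );
    }
}
```

Because postPersist fires during flush, before the surrounding transaction commits, a fast worker could in principle receive the message before the new row is visible. A handler that returns early when the bean cannot be found tolerates that case, and the periodic --missing-only run would pick up any bean skipped this way.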
- Step 5: Create Console Command for Batch Processing
  - File to create: src/Command/CalculateSimilarityCommand.php
  - Changes needed:
    - Create a new command, e.g., app:similarity:calculate.
    - Options:
      - --missing-only: Finds beans with no entries in the coffee_bean_similarity table and dispatches a message for each.
      - --all: Dispatches a message for every single coffee bean in the database.
    - Inject CoffeeBeanRepository and MessageBusInterface.
    - The command's logic will query for the appropriate beans and loop through them, dispatching a message for each one.
- Step 6: Update API Endpoint
  - File to modify: src/Repository/CoffeeBeanRepository.php
    - Add a new method: findSimilarBeans(string $sourceBeanId, int $limit = 12): array.
    - This method will query the CoffeeBeanSimilarity entity, filtering by sourceBean, ordering by score, and joining the similarBean relationship to fetch the bean data.
  - File to modify: src/Controller/Api/CoffeeBeanController.php
    - Modify the GET /api/coffee-beans/{id}/similar endpoint.
    - The controller method will now simply call CoffeeBeanRepository->findSimilarBeans().
    - It will then map the results to the SimilarCoffeeBeanDTO (which should be updated to accept a CoffeeBean and a score).
Testing Strategy
- Unit Tests:
  - Test the CoffeeBeanSimilarityService calculation logic thoroughly.
  - Test the CalculateSimilarityCommand to ensure it queries for the correct beans based on the options.
  - Test the CoffeeBeanListener to verify that it dispatches a message.
- Integration Tests:
  - Test the CalculateSimilarityForBeanHandler. This can be done using an in-memory messenger transport to assert that the handler consumes a message and correctly saves data to the database.
  - Test the API endpoint GET /api/coffee-beans/{id}/similar to ensure it correctly retrieves the pre-calculated data from the coffee_bean_similarity table.
🎯 Success Criteria
- When a new CoffeeBean is created, a background job is queued to calculate its similarity scores.
- The app:similarity:calculate command successfully queues jobs for calculation.
- The GET /api/coffee-beans/{id}/similar endpoint responds quickly with a list of 12 (or fewer) similar beans and their scores, read from the database.
- The system is robust and can handle similarity calculations for a large number of coffee beans without impacting API performance.