Feature Implementation Plan: Migrate to Symfony AI Bundle (Experimental)
📋 Todo Checklist
- [ ] Create a new feature branch for this implementation (e.g., feature/symfony-ai-bundle).
- [ ] Install the experimental symfony/ai-bundle.
- [ ] Configure the bundle with the existing OpenAI API key.
- [ ] Refactor LlmExtractionStepProcessor to use the new AI services.
- [ ] Delete the custom StructuredOutputLlmExtractor.
- [ ] Thoroughly test the new implementation on the feature branch.
- [ ] Document the process and any findings for future reference.
🔍 Analysis & Investigation
Codebase Structure
- Target for Deletion: The primary goal is to completely remove the custom src/Service/Crawler/Extraction/StructuredOutputLlmExtractor.php.
- Target for Refactoring: src/Service/Crawler/Step/Processors/LlmExtractionStepProcessor.php will be the main focus of the refactoring effort. It will be updated to use services provided by the new bundle instead of the custom extractor.
- Configuration: New configuration files will be created under config/packages/ to set up the symfony/ai-bundle.
Current Architecture & Problem
- Problem: The current implementation is tightly coupled to the OpenAI API. The model, API endpoint, and request structure are hardcoded in a custom service. This makes it difficult to switch providers or experiment with different models.
- Solution: This plan outlines the migration to the official symfony/ai-bundle. This will replace the custom, rigid implementation with a flexible, vendor-agnostic abstraction layer. This is a strategic move that aligns the project with the official Symfony ecosystem, reducing maintenance and increasing future flexibility.
Dependencies & Integration Points
- symfony/ai-bundle: This will be a new, experimental dependency. The plan involves installing it and configuring it to use the existing OpenAI API key.
- JSON Schema: A critical part of the investigation will be to ensure that the new bundle's "structured output" or "tool calling" features can replicate the current system's use of JSON Schema for reliable data extraction.
Considerations & Challenges
- Experimental Nature: This is the main challenge. The bundle's API is subject to change. This implementation should be considered a proof-of-concept on a feature branch and should not be merged to production until the bundle reaches a stable release.
- Feature Parity: We must verify that the Symfony AI bundle can fully replace the functionality of our custom extractor, especially the strict enforcement of a JSON Schema. If it can't, we need to document the gap and potentially contribute to the bundle to close it.
- Configuration: The new bundle will have its own configuration format. The plan involves mapping the existing setup (API key, model name) to this new format.
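The feature-parity check described above can be made concrete with a small comparison harness. The sketch below is a minimal, hypothetical helper (`findSchemaViolations` is not part of the project or the bundle) that checks decoded LLM output against only the `required` and top-level property `type` keywords of a JSON Schema. It assumes PHP 8.1+ and output decoded into associative arrays; a full JSON Schema validator would be needed to prove strict enforcement.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: checks only "required" and per-property "type".
 * It is a rough parity probe, not a complete JSON Schema validator.
 *
 * @param array<string, mixed> $data   Decoded LLM output.
 * @param array<string, mixed> $schema Decoded JSON Schema of type "object".
 *
 * @return list<string> Human-readable violations; empty when none were found.
 */
function findSchemaViolations(array $data, array $schema): array
{
    $violations = [];

    // Every property listed under "required" must be present in the output.
    foreach ($schema['required'] ?? [] as $key) {
        if (!array_key_exists($key, $data)) {
            $violations[] = sprintf('Missing required property "%s".', $key);
        }
    }

    // Map JSON Schema primitive types to PHP type checks.
    $checks = [
        'string'  => 'is_string',
        'integer' => 'is_int',
        'number'  => static fn ($v): bool => is_int($v) || is_float($v),
        'boolean' => 'is_bool',
        'array'   => static fn ($v): bool => is_array($v) && array_is_list($v),
        'object'  => 'is_array',
    ];

    foreach ($schema['properties'] ?? [] as $key => $definition) {
        $type = $definition['type'] ?? null;

        // Skip absent properties and types this sketch does not understand.
        if (!array_key_exists($key, $data) || !is_string($type) || !isset($checks[$type])) {
            continue;
        }

        if (!$checks[$type]($data[$key])) {
            $violations[] = sprintf('Property "%s" is not of type "%s".', $key, $type);
        }
    }

    return $violations;
}

// Illustrative schema and output; the field names are made up for this sketch.
$schema = [
    'type' => 'object',
    'required' => ['name', 'price'],
    'properties' => [
        'name' => ['type' => 'string'],
        'price' => ['type' => 'number'],
    ],
];

// "price" arrives as a string here, so one type violation is reported.
$violations = findSchemaViolations(['name' => 'Example', 'price' => '12'], $schema);
```

Running the same fixtures through both the old extractor and the new bundle and comparing the reported violations would give a simple way to document any gap.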
📝 Implementation Plan
Prerequisites
- Create and check out a new feature branch:
  git checkout -b feature/symfony-ai-bundle
Step-by-Step Implementation
- Install the Experimental Bundle
  - Run the following command to install the bundle and its dependencies. Note that you may need to adjust your composer.json to allow dev-stability packages if you haven't already.
    composer require symfony/ai-bundle
- Configure the AI Bundle
  - Files to create: config/packages/ai.yaml
  - Changes needed: Create the configuration file to set up the OpenAI provider.
- Refactor the LlmExtractionStepProcessor
  - Files to modify: src/Service/Crawler/Step/Processors/LlmExtractionStepProcessor.php
  - Changes needed:
    - Remove the injection of the custom StructuredOutputLlmExtractor.
    - Inject the new AI service provided by the bundle. This will likely be an implementation of Symfony\Component\Ai\Llm\LlmInterface or a specific chat client; the exact interface name must be confirmed against the installed bundle version.
    - In the process method, replace the call to $this->llmExtractor->extract() with the equivalent call using the new AI service.
    - The new call will involve passing the system and user prompts, and configuring the call to use the JSON schema for structured output. This might look something like:
      // This is a hypothetical example based on the bundle's likely design.
      // Class and method names must be checked against the installed version.
      $jsonSchema = $this->schemaLoader->load($schemaName);

      $response = $this->llm->chat(
          messages: [
              new SystemMessage('You are analyzing...'),
              new UserMessage($processedContent),
          ],
          options: [
              // Force the model to answer through the extraction tool.
              'tool_choice' => 'my_tool_name',
              'tools' => [
                  [
                      'type' => 'function',
                      'function' => [
                          'name' => 'my_tool_name',
                          'description' => 'Extract data from the page.',
                          'parameters' => $jsonSchema,
                      ],
                  ],
              ],
          ]
      );

      // Guard against a response without tool calls (e.g., a refusal).
      $toolCalls = $response->getToolCalls();
      if ($toolCalls === []) {
          throw new \RuntimeException('The model returned no tool call.');
      }
      $extractedData = $toolCalls[0]->getArguments();
- Delete the Custom Extractor
  - Files to delete: src/Service/Crawler/Extraction/StructuredOutputLlmExtractor.php
  - Changes needed: Once the refactoring is complete and tested, delete the now-redundant custom service file.
Testing Strategy
- Unit Tests:
  - Write new unit tests for the refactored LlmExtractionStepProcessor. This will likely involve mocking the LlmInterface from the Symfony AI bundle to simulate different responses (success, failure, refusal).
- Integration Tests:
  - Create an integration test that processes a real HTML fixture. This test will make a live call to the OpenAI API through the new bundle.
  - Assert that the LlmExtractionStepProcessor correctly receives the structured data and saves it as expected. This will validate that the new implementation works end-to-end.
- Manual Testing:
  - On the feature branch, manually trigger a crawl for a few different roasters and verify that the extraction process completes successfully and the data is correct.
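The unit-test idea above (simulating success, failure, and refusal) can be sketched without depending on the bundle's still-changing API. Everything named below is an assumption made for illustration: `LlmClient` stands in for whatever interface the bundle actually exposes, `ScriptedLlmClient` is a hand-written test double, and `processStep` is a simplified stand-in for the processor's process method, not its real signature.

```php
<?php

declare(strict_types=1);

// Hypothetical stand-in for the bundle's LLM interface; the real name and
// method signature must be taken from the installed bundle.
interface LlmClient
{
    /** @return array<string, mixed>|null Structured data, or null on refusal. */
    public function extract(string $content): ?array;
}

// Test double that replays one scripted outcome per instance.
final class ScriptedLlmClient implements LlmClient
{
    public function __construct(private array|\Throwable|null $outcome)
    {
    }

    public function extract(string $content): ?array
    {
        if ($this->outcome instanceof \Throwable) {
            throw $this->outcome;
        }

        return $this->outcome;
    }
}

/**
 * Simplified stand-in for the refactored processor logic.
 *
 * @return array{status: string, data: array<string, mixed>|null}
 */
function processStep(LlmClient $llm, string $html): array
{
    try {
        $data = $llm->extract($html);
    } catch (\RuntimeException $e) {
        // Transport or API errors are reported as a failed step.
        return ['status' => 'failed', 'data' => null];
    }

    if ($data === null) {
        // The model declined to produce structured output.
        return ['status' => 'refused', 'data' => null];
    }

    return ['status' => 'ok', 'data' => $data];
}

// One scripted client per scenario named in the testing strategy.
$success = processStep(new ScriptedLlmClient(['name' => 'Example Roaster']), '<html></html>');
$failure = processStep(new ScriptedLlmClient(new \RuntimeException('timeout')), '<html></html>');
$refusal = processStep(new ScriptedLlmClient(null), '<html></html>');
```

In the project's own test suite, PHPUnit's createMock() could replace the hand-written double once the real interface is known; the three scenarios would stay the same.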
🎯 Success Criteria
- The custom StructuredOutputLlmExtractor is successfully removed from the codebase.
- All LLM calls are now made through the symfony/ai-bundle.
- The application can be configured to use different models or even different providers (like Anthropic) with only configuration changes.
- The extraction process is as reliable as, or more reliable than, the previous implementation.
- The feature branch is fully tested and ready to be merged once the Symfony AI bundle reaches a stable release.