Feature Implementation Plan: Migrate to Symfony AI Bundle (Experimental)

📋 Todo Checklist

  • [ ] Create a new feature branch for this implementation (e.g., feature/symfony-ai-bundle).
  • [ ] Install the experimental symfony/ai-bundle.
  • [ ] Configure the bundle with the existing OpenAI API key.
  • [ ] Refactor LlmExtractionStepProcessor to use the new AI services.
  • [ ] Delete the custom StructuredOutputLlmExtractor.
  • [ ] Thoroughly test the new implementation on the feature branch.
  • [ ] Document the process and any findings for future reference.

🔍 Analysis & Investigation

Codebase Structure

  • Target for Deletion: The primary goal is to completely remove the custom src/Service/Crawler/Extraction/StructuredOutputLlmExtractor.php.
  • Target for Refactoring: The src/Service/Crawler/Step/Processors/LlmExtractionStepProcessor.php will be the main focus of the refactoring effort. It will be updated to use services provided by the new bundle instead of the custom extractor.
  • Configuration: New configuration files will be created under config/packages/ to set up the symfony/ai-bundle.

Current Architecture & Problem

  • Problem: The current implementation is tightly coupled to the OpenAI API. The model, API endpoint, and request structure are hardcoded in a custom service. This makes it difficult to switch providers or experiment with different models.
  • Solution: Migrate to the official symfony/ai-bundle, replacing the custom, rigid implementation with a flexible, vendor-agnostic abstraction layer. This aligns the project with the official Symfony ecosystem, reduces maintenance burden, and makes it easier to swap models or providers later.

Dependencies & Integration Points

  • symfony/ai-bundle: This will be a new, experimental dependency. The plan involves installing it and configuring it to use the existing OpenAI API key.
  • JSON Schema: A critical part of the investigation will be to ensure that the new bundle's "structured output" or "tool calling" features can replicate the current system's use of JSON Schema for reliable data extraction.
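    For reference, here is a minimal JSON Schema of the kind the current extractor is assumed to enforce (the field names are purely illustrative). Note that OpenAI's strict structured-output mode additionally requires "additionalProperties": false and every listed property to appear in "required":

```json
{
    "type": "object",
    "properties": {
        "name": { "type": "string" },
        "price": { "type": "number" }
    },
    "required": ["name", "price"],
    "additionalProperties": false
}
```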

Considerations & Challenges

  • Experimental Nature: This is the main challenge. The bundle's API is subject to change. This implementation should be considered a proof-of-concept on a feature branch and should not be merged to production until the bundle reaches a stable release.
  • Feature Parity: We must verify that the Symfony AI bundle can fully replace the functionality of our custom extractor, especially the strict enforcement of a JSON Schema. If it can't, we need to document the gap and potentially contribute to the bundle to close it.
  • Configuration: The new bundle will have its own configuration format. The plan involves mapping the existing setup (API key, model name) to this new format.

📝 Implementation Plan

Prerequisites

  • Create and check out a new feature branch: git checkout -b feature/symfony-ai-bundle

Step-by-Step Implementation

  1. Install the Experimental Bundle

    • Run the following command to install the bundle and its dependencies. Because the bundle is experimental, Composer may refuse to install it at the default stability level; you may need to set minimum-stability (or a per-package stability flag) in composer.json first.
      composer require symfony/ai-bundle
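      If Composer rejects the package at your current stability setting, one common approach (an assumption about this project's composer.json; adjust as needed) is to allow dev packages globally while still preferring stable releases:

```json
{
    "minimum-stability": "dev",
    "prefer-stable": true
}
```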
      
  2. Configure the AI Bundle

    • Files to create: config/packages/ai.yaml
    • Changes needed: Create the configuration file to set up the OpenAI provider. Because the bundle is experimental, its configuration keys may change between releases; treat the structure below as indicative and verify it against the installed version's documentation.
      ai:
          llms:
              openai_chat:
                  type: 'openai'
                  model: 'gpt-4.1-mini' # Or your desired model
                  api_key: '%env(APP_OPENAI_API_TOKEN)%'
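      To illustrate the vendor-agnostic goal, switching providers should then be a configuration-only change. The provider type and env var names below are illustrative:

```yaml
ai:
    llms:
        anthropic_chat:
            type: 'anthropic'
            model: '%env(APP_AI_MODEL)%' # hypothetical env var; a model name could be hardcoded instead
            api_key: '%env(APP_ANTHROPIC_API_TOKEN)%'
```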
      
  3. Refactor the LlmExtractionStepProcessor

    • Files to modify: src/Service/Crawler/Step/Processors/LlmExtractionStepProcessor.php
    • Changes needed:
      • Remove the injection of the custom StructuredOutputLlmExtractor.
      • Inject the new AI service provided by the bundle. This will likely be an implementation of an LLM or chat-client interface such as Symfony\Component\Ai\Llm\LlmInterface (verify the exact namespace against the installed version).
      • In the process method, replace the call to $this->llmExtractor->extract() with the equivalent call using the new AI service.
      • The new call will involve passing the system and user prompts, and configuring the call to use the JSON schema for structured output. This might look something like:
        // This is a hypothetical example based on the bundle's likely design
        $jsonSchema = $this->schemaLoader->load($schemaName);
        $response = $this->llm->chat(
            messages: [
                new SystemMessage('You are analyzing...'),
                new UserMessage($processedContent),
            ],
            options: [
                'tool_choice' => 'my_tool_name',
                'tools' => [
                    [
                        'type' => 'function',
                        'function' => [
                            'name' => 'my_tool_name',
                            'description' => 'Extract data from the page.',
                            'parameters' => $jsonSchema,
                        ],
                    ],
                ],
            ]
        );
        $extractedData = $response->getToolCalls()[0]->getArguments();
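        The corresponding dependency change in the processor's constructor might look like this (the interface name is an assumption; inject whichever service the installed bundle actually registers):

```php
// Before: tightly coupled to the custom extractor
// public function __construct(private StructuredOutputLlmExtractor $llmExtractor) {}

// After: depend on the bundle's abstraction (hypothetical interface name)
public function __construct(private LlmInterface $llm) {}
```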
        
  4. Delete the Custom Extractor

    • Files to delete: src/Service/Crawler/Extraction/StructuredOutputLlmExtractor.php
    • Changes needed: Once the refactoring is complete and tested, delete the now-redundant custom service file.

Testing Strategy

  • Unit Tests:
    • Write new unit tests for the refactored LlmExtractionStepProcessor. This will likely involve mocking the LlmInterface from the Symfony AI bundle to simulate different responses (success, failure, refusal).
  • Integration Tests:
    • Create an integration test that processes a real HTML fixture. This test will make a live call to the OpenAI API through the new bundle.
    • Assert that the LlmExtractionStepProcessor correctly receives the structured data and saves it as expected. This will validate that the new implementation works end-to-end.
  • Manual Testing:
    • On the feature branch, manually trigger a crawl for a few different roasters and verify that the extraction process completes successfully and the data is correct.
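  A unit test for the refactored processor could follow this shape. The interface, response, and processor APIs shown are assumptions and must be adapted to the installed bundle and to the processor's real signature:

```php
use PHPUnit\Framework\TestCase;

final class LlmExtractionStepProcessorTest extends TestCase
{
    public function testProcessReturnsDataFromToolCall(): void
    {
        // Stub the bundle's LLM service (hypothetical interface name).
        $llm = $this->createStub(LlmInterface::class);

        // Simulate a response whose first tool call carries the extracted data
        // (hypothetical response shape, mirroring the sketch in step 3).
        $response = $this->createStub(LlmResponseInterface::class);
        // ... configure $response->getToolCalls() to return a fake tool call ...
        $llm->method('chat')->willReturn($response);

        $processor = new LlmExtractionStepProcessor($llm /* , other dependencies */);

        // Assert the processor surfaces the structured data unchanged, e.g.:
        // self::assertSame(['name' => 'Example Roaster'], $processor->process($step));
    }
}
```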

🎯 Success Criteria

  • The custom StructuredOutputLlmExtractor is successfully removed from the codebase.
  • All LLM calls are now made through the symfony/ai-bundle.
  • The application can be configured to use different models or even different providers (like Anthropic) with only configuration changes.
  • The extraction process is as reliable as, or more reliable than, the previous implementation.
  • The feature branch is fully tested and ready to be merged once the Symfony AI bundle reaches a stable release.