
Feature Implementation Plan: Improve Worker Error Logging

📋 Todo Checklist

  • [ ] Modify RoasterCrawlerService to log and re-throw exceptions.
  • [ ] Modify CoffeeBeanCrawlerService to re-throw exceptions.
  • [ ] Update monolog.php with a dedicated handler for messenger workers.
  • [ ] Final Review and Testing.

🔍 Analysis & Investigation

Codebase Structure

The application uses a message-driven architecture for handling background crawling tasks, leveraging the Symfony Messenger component. Key components in this system include:

  • Message Handlers: src/MessageHandler/RoasterCrawlHandler.php and src/MessageHandler/CrawlStepHandler.php consume messages and orchestrate the crawling process.
  • Crawler Services: src/Service/Crawler/RoasterCrawlerService.php and src/Service/Crawler/CoffeeBeanCrawlerService.php contain the core logic for fetching and processing data from roaster websites.
  • Logging Configuration: config/packages/monolog.php defines how logs are handled for different environments.

The files inspected during this analysis were:

  • config/packages/monolog.php
  • src/MessageHandler/RoasterCrawlHandler.php
  • src/Service/Crawler/RoasterCrawlerService.php
  • src/Service/Crawler/CoffeeBeanCrawlerService.php

Current Architecture

The current architecture suffers from two main issues that lead to silent failures in background jobs:

  1. Exception Swallowing: Both RoasterCrawlerService and CoffeeBeanCrawlerService have try...catch blocks that catch all Throwable exceptions but fail to re-throw them. This prevents the Symfony Messenger component from recognizing that the job has failed. As a result, the message is acknowledged as "successful" and removed from the queue, even though the underlying task failed.
  2. Overly-Restrictive Logging: The production logging configuration in monolog.php uses a fingers_crossed handler for the main log stream. This handler buffers all log messages and only outputs them if a message of error level or higher is recorded. Because the crawler services swallow exceptions and, in the case of RoasterCrawlerService, don't even log an error, the buffered info and debug messages are discarded, leaving no trace of what happened during the failed job.
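The first issue can be illustrated with a minimal, self-contained sketch. It does not use Symfony Messenger itself: `failingCrawl`, `crawlSwallowing`, `crawlRethrowing`, and `runHandler` are hypothetical names, and `runHandler` is only a simplified model of how a worker decides between acknowledging a message and treating it as failed.

```php
<?php

declare(strict_types=1);

// Hypothetical stand-in for a crawler call that always fails.
function failingCrawl(): array
{
    throw new RuntimeException('Connection refused');
}

// Current behaviour: the exception is caught and turned into a return value.
function crawlSwallowing(): array
{
    try {
        return failingCrawl();
    } catch (Throwable $e) {
        return ['success' => false, 'error' => $e->getMessage()];
    }
}

// Proposed behaviour: handle the exception locally, then let it propagate.
function crawlRethrowing(): array
{
    try {
        return failingCrawl();
    } catch (Throwable $e) {
        // Logging would happen here.
        throw $e;
    }
}

// Simplified model of a worker (an assumption, not the Messenger API):
// a handler that returns normally is acknowledged, one that throws has failed.
function runHandler(callable $handler): string
{
    try {
        $handler();

        return 'acknowledged';
    } catch (Throwable $e) {
        return 'failed';
    }
}

echo runHandler('crawlSwallowing'), PHP_EOL; // acknowledged
echo runHandler('crawlRethrowing'), PHP_EOL; // failed
```

In this model the swallowing variant is indistinguishable from a successful job, which is the behaviour the plan sets out to remove.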

Dependencies & Integration Points

The plan involves changes to the core crawler services and their interaction with the Symfony Messenger bus and the Monolog logging library. The goal is to ensure these components interact correctly to provide transparent and useful error reporting.

Considerations & Challenges

The primary consideration is to improve observability without introducing excessive log noise. The proposed solution addresses this by creating a dedicated logging channel for messenger workers, allowing for verbose logging from workers without affecting the logging level of the main application (e.g., web requests).
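The trade-off can be sketched with a simplified model of the two handler styles. This is not Monolog code: `BufferedHandler` and `ChannelHandler` are hypothetical classes that only imitate a fingers_crossed-style buffer and a channel-restricted stream handler, and the `request` channel name is made up for illustration. The numeric levels mirror Monolog's values for debug, info, and error.

```php
<?php

declare(strict_types=1);

const LEVEL_DEBUG = 100;
const LEVEL_INFO = 200;
const LEVEL_ERROR = 400;

// Imitates a fingers_crossed-style handler: records are buffered and only
// written once a record at or above the action level arrives.
final class BufferedHandler
{
    private array $buffer = [];
    public array $output = [];
    private bool $activated = false;

    public function __construct(private int $actionLevel) {}

    public function handle(string $channel, int $level, string $message): void
    {
        if ($this->activated) {
            $this->output[] = $message;

            return;
        }

        $this->buffer[] = $message;
        if ($level >= $this->actionLevel) {
            // Flush everything buffered so far and stop buffering.
            $this->activated = true;
            $this->output = $this->buffer;
            $this->buffer = [];
        }
    }
}

// Imitates a stream handler restricted to a single channel.
final class ChannelHandler
{
    public array $output = [];

    public function __construct(private string $channel, private int $minLevel) {}

    public function handle(string $channel, int $level, string $message): void
    {
        if ($channel === $this->channel && $level >= $this->minLevel) {
            $this->output[] = $message;
        }
    }
}

$main = new BufferedHandler(LEVEL_ERROR);
$worker = new ChannelHandler('messenger', LEVEL_DEBUG);

$records = [
    ['messenger', LEVEL_INFO, 'worker: crawl started'],
    ['messenger', LEVEL_DEBUG, 'worker: fetching page'],
    ['request', LEVEL_INFO, 'web: page rendered'],
];

foreach ($records as [$channel, $level, $message]) {
    $main->handle($channel, $level, $message);
    $worker->handle($channel, $level, $message);
}

echo count($main->output), PHP_EOL;   // 0: no error was recorded, nothing is flushed
echo count($worker->output), PHP_EOL; // 2: both worker records, no web record
```

The buffered handler stays silent because no error-level record arrives, while the channel-restricted handler emits the worker records without picking up the web request's info line.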

📝 Implementation Plan

Prerequisites

No special prerequisites are required. This plan involves modifying existing PHP files.

Step-by-Step Implementation

Step 1: Fix RoasterCrawlerService Error Handling

  • File to modify: src/Service/Crawler/RoasterCrawlerService.php
  • Changes needed: In the crawlRegularPage method, the catch (Throwable $e) block currently returns a success-like array with an error message. This should be changed to properly log the error and then re-throw the exception.

    // src/Service/Crawler/RoasterCrawlerService.php
    
    // ... inside crawlRegularPage method ...
    try {
        $crawlResult = $this->apiClient->crawl($config, $requestBody);
        $this->validateResults($crawlResult);
    } catch (Throwable $e) {
        // Log the error with full context
        $this->logger->error('Error during regular page crawl', [
            'url' => $url,
            'config_id' => $config->getId(),
            'error' => $e->getMessage(),
            'trace' => $e->getTraceAsString(),
        ]);
    
        // Re-throw the exception to fail the message handler
        throw $e;
    }
    // ...
    
Step 2: Fix CoffeeBeanCrawlerService Error Handling

  • File to modify: src/Service/Crawler/CoffeeBeanCrawlerService.php
  • Changes needed: The catch (Throwable $e) block in the crawl method already logs the error correctly. However, it does not re-throw the exception. Add throw $e; at the end of the catch block to ensure the job fails in the message queue.

    // src/Service/Crawler/CoffeeBeanCrawlerService.php
    
    // ... inside crawl method ...
    } catch (Throwable $e) {
        // Handle the exception and update the CrawlUrl
        $this->updateCrawlUrlFailure($crawlUrl, $e->getMessage());
    
        $this->logger->error('Crawler error', [
            'status'  => 'error',
            'message' => $e->getMessage(),
            'context' => [
                'url'       => $crawlUrl->getUrl(),
                'config_id' => $crawlUrl->getRoasterCrawlConfig()?->getId(),
                'trace'     => $e->getTrace(),
            ],
        ]);
    
        // Re-throw the exception
        throw $e;
    }
    
Step 3: Improve Logging Configuration for Workers

  • File to modify: config/packages/monolog.php
  • Changes needed: In the production environment configuration (if ($containerConfigurator->env() === 'prod')), add a new, dedicated handler for the messenger channel. This handler will write all logs of debug level and above directly to stderr, ensuring complete visibility into worker operations without being affected by the fingers_crossed handler.

    // config/packages/monolog.php
    
    // ... inside the 'prod' environment block ...
    $containerConfigurator->extension('monolog', [
        'handlers' => [
            // Add this new handler for workers
            'messenger' => [
                'type' => 'stream',
                'path' => 'php://stderr',
                'level' => 'debug',
                'formatter' => 'monolog.formatter.json',
                'channels' => ['messenger'],
            ],
            // Keep the existing handlers
            'error' => [
                'type' => 'stream',
                'path' => 'php://stderr',
                'level' => 'error',
                'formatter' => 'monolog.formatter.json',
            ],
            // ... other handlers
        ],
    ]);
    

Testing Strategy

  1. Manual Testing:

    • Trigger a roaster crawl that is designed to fail (e.g., by using a config that points to a non-existent website).
    • Monitor the container logs (e.g., via docker logs). You should see the detailed error message and stack trace logged from the RoasterCrawlerService.
    • Verify in the application's admin dashboard that the message has been moved to the failed transport and is available for retry.
    • Repeat the test for a CoffeeBeanCrawlerService failure.
  2. Log Verification:

    • In a production-like environment, trigger a successful background job.
    • Verify that the info and debug logs from the worker are visible in the stderr output, confirming the new messenger channel is working.
    • Trigger a web request and verify that its info logs are not present, confirming the fingers_crossed handler is still active for other channels.

🎯 Success Criteria

  • When a background job fails due to an exception, a detailed error log, including a stack trace, is written to stderr.
  • The failed job is correctly marked as "failed" in the Symfony Messenger queue.
  • All log messages (debug, info, etc.) from messenger workers are visible in the production logs, providing better observability.
  • The logging behavior for non-worker processes (e.g., web requests) remains unchanged.