Skip to content

Fix Database Deadlock in Similarity Calculation

Priority: 🔴 CRITICAL - Production Database Deadlock Status: Planning Sentry Issue: BEANS-BACKEND-4C

Problem Statement

Database deadlock occurring when multiple similarity calculation workers process messages concurrently, leading to PostgreSQL deadlock errors on the region table.

Error Details

  • Error Type: Doctrine\DBAL\Exception\DeadlockException
  • Database Error: SQLSTATE[40P01]: Deadlock detected
  • Location: src/MessageHandler/CalculateSimilarityForBeanHandler.php:60
  • Frequency: 1 occurrence (but retries configured)
  • Environment: Production
  • First Seen: 2025-11-09T02:06:13.356Z

PostgreSQL Deadlock Details

Process 845172 waits for ShareLock on transaction 166374; blocked by process 845171.
Process 845171 waits for ShareLock on transaction 166373; blocked by process 845172.
CONTEXT: while updating tuple (5,14) in relation "region"

Classic Circular Wait Deadlock: - Process A locks resource 1, waits for resource 2 - Process B locks resource 2, waits for resource 1 - Neither can proceed → deadlock

Root Cause Analysis

The Deadlock Scenario

Code Flow

  1. Handler Entry (CalculateSimilarityForBeanHandler.php:25)

    public function __invoke(CalculateSimilarityForBean $message): void
    

  2. Delete Existing Similarities (Line 42-46)

    $this->entityManager->createQuery(
        'DELETE FROM App\Entity\CoffeeBeanSimilarity cbs WHERE cbs.sourceBean = :sourceBean'
    )
    ->setParameter('sourceBean', $sourceBean)
    ->execute();
    

  3. Calculate New Similarities (Line 48)

    $similarityResults = $this->similarityService->calculateScores($sourceBean);
    

  4. Persist Similarities (Lines 50-58)

    foreach ($similarityResults as $result) {
        $similarity = new CoffeeBeanSimilarity(
            $sourceBean,
            $result['bean'],  // CoffeeBean entity with eager-loaded relationships
            $result['score']
        );
        $this->entityManager->persist($similarity);
    }
    

  5. Flush All Changes (Line 60) - DEADLOCK OCCURS HERE

    $this->entityManager->flush();
    

Why Deadlocks Occur

1. Concurrent Workers Processing Different Beans

Multiple similarity_calculation workers run concurrently (messenger:consume processes messages in parallel)

2. Shared Entity Updates via Doctrine ORM

When persisting CoffeeBeanSimilarity, Doctrine: - Persists the similarity record - Updates inverse side of ManyToMany relationships on related entities - CoffeeBean has ManyToMany with: Region, ProcessingMethod, RoastLevel, Variety, FlavorWheelNode

3. Non-Deterministic Lock Order

Worker A processing Bean X:
  - Loads Bean X with regions [Brazil, Colombia, Ethiopia]
  - Creates similarities with Beans [Y, Z, W]
  - Flush tries to update regions in some order (e.g., Brazil → Colombia)

Worker B processing Bean Y:
  - Loads Bean Y with regions [Colombia, Kenya, Ethiopia]
  - Creates similarities with Beans [X, A, B]
  - Flush tries to update regions in different order (e.g., Kenya → Ethiopia)

DEADLOCK:
  Worker A: Lock Brazil → Wait for Ethiopia
  Worker B: Lock Ethiopia → Wait for Brazil

4. Region Table Update Context

The error specifically mentions: while updating tuple (5,14) in relation "region"

This happens because: - CoffeeBean has ManyToMany with Region (line 41 in CoffeeBean.php) - Region has inverse side: ManyToMany mappedBy regions (line 46 in Region.php) - When Doctrine flushes, it may update counters or metadata on Region rows - Multiple workers updating different beans with overlapping regions → lock contention

Entity Relationships

CoffeeBeanSimilarity
  ├─ sourceBean (ManyToOne → CoffeeBean)
  └─ similarBean (ManyToOne → CoffeeBean)

CoffeeBean (both source and similar beans)
  ├─ regions (ManyToMany → Region)  ← DEADLOCK HAPPENS HERE
  ├─ processingMethods (ManyToMany → ProcessingMethod)
  ├─ roastLevels (ManyToMany → RoastLevel)
  ├─ varieties (ManyToMany → Variety)
  ├─ flavorNotes (ManyToMany → FlavorWheelNode)
  └─ nonStandardFlavorNotes (ManyToMany → NonStandardFlavorNote)

Region (inverse side)
  └─ coffeeBeans (ManyToMany mappedBy regions)

Trigger Conditions

The deadlock is triggered when: 1. Multiple workers consume from similarity_calculation queue concurrently 2. Different coffee beans share common regions (or other ManyToMany entities) 3. Lock acquisition order differs between workers 4. Timing coincidence - both workers flush at approximately same time

Why It's Rare (But Critical)

  • Single occurrence suggests specific race condition
  • Happens when:
  • Multiple beans persisted simultaneously (triggers CoffeeBeanListener::postPersist)
  • Beans share common regions (e.g., "Ethiopia", "Colombia")
  • Workers process messages at same time
  • Retry mechanism makes it eventually succeed (3 retries configured)

Impact Assessment

Severity: HIGH 🟠 (Not CRITICAL due to retry mechanism)

Positive: - ✅ Retry strategy configured (3 retries with exponential backoff) - ✅ Only 1 occurrence so far - ✅ Message will eventually succeed after retry

Negative: - ❌ Causes message processing delays - ❌ Wastes system resources on retries - ❌ Can cascade if load increases - ❌ Poor user experience if many concurrent bean creations occur

Affected Functionality

  • Similarity calculation background jobs
  • New coffee bean processing pipeline
  • Recommendation system data freshness

Solution Options

Use read-only/detached entities for similarity calculation to avoid triggering updates.

public function __invoke(CalculateSimilarityForBean $message): void
{
    $coffeeBeanId = $message->coffeeBeanId;

    $this->logger->info('Starting similarity calculation for coffee bean', [
        'coffeeBeanId' => $coffeeBeanId,
    ]);

    $sourceBean = $this->coffeeBeanRepository->find($coffeeBeanId);

    if (!$sourceBean instanceof CoffeeBean) {
        $this->logger->warning('Coffee bean not found for similarity calculation', [
            'coffeeBeanId' => $coffeeBeanId,
        ]);
        return;
    }

    // Delete existing similarities
    $this->entityManager->createQuery(
        'DELETE FROM App\Entity\CoffeeBeanSimilarity cbs WHERE cbs.sourceBean = :sourceBean'
    )
    ->setParameter('sourceBean', $sourceBean)
    ->execute();

    // Calculate similarities
    $similarityResults = $this->similarityService->calculateScores($sourceBean);

    // NEW: Detach related beans to prevent relationship updates
    foreach ($similarityResults as $result) {
        // Detach the similar bean to prevent Doctrine from tracking changes
        $this->entityManager->detach($result['bean']);

        // Re-fetch as a reference (lightweight proxy without loading all relations)
        $similarBeanRef = $this->entityManager->getReference(CoffeeBean::class, $result['bean']->getId());

        $similarity = new CoffeeBeanSimilarity(
            $sourceBean,
            $similarBeanRef,  // Use reference instead of full entity
            $result['score']
        );

        $this->entityManager->persist($similarity);
    }

    $this->entityManager->flush();

    $this->logger->info('Completed similarity calculation for coffee bean', [
        'coffeeBeanId' => $coffeeBeanId,
        'similarityCount' => count($similarityResults),
    ]);
}

Pros: - Quick fix, minimal code change - Prevents Doctrine from updating related entities - Maintains functionality - Low risk

Cons: - Doesn't address root architectural issue - Still has potential for deadlock if sourceBean is shared

Option 2: Single-Threaded Similarity Queue

Configure similarity calculation to run with a single worker.

messenger.php changes:

'similarity_calculation' => [
    'dsn' => '%env(MESSENGER_TRANSPORT_DSN)%',
    'options' => [
        'queue_name' => 'similarity_calculation',
        'serializer' => 'messenger.transport.symfony_serializer',
    ],
    'retry_strategy' => [
        'max_retries' => 3,
        'delay' => 1000,
        'multiplier' => 2,
    ],
    // NEW: Limit to single worker via deployment config
],

Deployment/Supervisor config:

# Only run 1 worker for similarity_calculation
messenger:consume similarity_calculation --limit=100 --time-limit=3600

Pros: - Completely eliminates deadlock risk - Simple configuration change - No code changes needed

Cons: - Reduced throughput - Slower similarity calculation processing - Bottleneck if many beans created simultaneously - Doesn't scale

Option 3: Batch Processing with Row-Level Locking

Use explicit locking or batch processing to control concurrency.

public function __invoke(CalculateSimilarityForBean $message): void
{
    // Use transaction with explicit lock timeout
    $this->entityManager->getConnection()->beginTransaction();

    try {
        // Set lock timeout to prevent indefinite waits
        $this->entityManager->getConnection()->executeStatement(
            'SET LOCAL lock_timeout = \'2s\''
        );

        // Rest of processing...

        $this->entityManager->getConnection()->commit();
    } catch (DeadlockException $e) {
        $this->entityManager->getConnection()->rollBack();
        throw $e; // Let retry mechanism handle it
    }
}

Pros: - More control over locking behavior - Faster failure detection - Explicit error handling

Cons: - Still requires retry mechanism - More complex code - Doesn't prevent deadlock, just handles it faster

Option 4: Deferred Similarity Updates (Eventual Consistency)

Calculate similarities asynchronously without blocking, update in batches.

Pros: - Highest scalability - No blocking operations - Best long-term solution

Cons: - Significant architectural change - More complex implementation - Eventual consistency trade-off

Option 5: Optimistic Locking + Retry (Status Quo Enhancement)

Keep current approach but add optimistic locking detection.

// Add version column to CoffeeBeanSimilarity
#[ORM\Version]
#[ORM\Column(type: 'integer')]
private int $version = 0;

Pros: - Minimal changes - Leverages existing retry mechanism - Good for low-concurrency scenarios

Cons: - Doesn't prevent deadlocks - Relies on retries

Phase 1: Immediate Fix (This Week)

Implement Option 1 (Detached Entities) + Optimize Retry

  1. Use detached entities to prevent relationship updates
  2. Improve retry strategy for faster recovery
  3. Add monitoring for deadlock occurrences

Phase 2: Medium-Term (Next Sprint)

Implement Option 2 (Single Worker) OR optimize concurrent processing

Evaluate: - Current similarity calculation throughput - Typical number of concurrent bean creations - If throughput is low → single worker acceptable - If throughput is high → need better solution (Option 4)

Phase 3: Long-Term (Future)

Consider Option 4 (Batch Processing)

If similarity calculations become a bottleneck: - Implement batch similarity recalculation - Decouple from bean creation - Calculate similarities in scheduled jobs

Implementation Plan

Step 1: Add Detached Entity Pattern

File: src/MessageHandler/CalculateSimilarityForBeanHandler.php

public function __invoke(CalculateSimilarityForBean $message): void
{
    $coffeeBeanId = $message->coffeeBeanId;

    $this->logger->info('Starting similarity calculation for coffee bean', [
        'coffeeBeanId' => $coffeeBeanId,
    ]);

    $sourceBean = $this->coffeeBeanRepository->find($coffeeBeanId);

    if (!$sourceBean instanceof CoffeeBean) {
        $this->logger->warning('Coffee bean not found for similarity calculation', [
            'coffeeBeanId' => $coffeeBeanId,
        ]);
        return;
    }

    // Delete existing similarities
    $this->entityManager->createQuery(
        'DELETE FROM App\Entity\CoffeeBeanSimilarity cbs WHERE cbs.sourceBean = :sourceBean'
    )
    ->setParameter('sourceBean', $sourceBean)
    ->execute();

    // Calculate similarities
    $similarityResults = $this->similarityService->calculateScores($sourceBean);

    // Persist new similarities with detached similar beans
    foreach ($similarityResults as $result) {
        // Detach the bean to prevent Doctrine from tracking relationship changes
        $this->entityManager->detach($result['bean']);

        // Get a lightweight reference instead of the full entity
        // This creates a proxy without loading all relationships
        $similarBeanRef = $this->entityManager->getReference(
            CoffeeBean::class,
            $result['bean']->getId()
        );

        $similarity = new CoffeeBeanSimilarity(
            $sourceBean,
            $similarBeanRef,
            $result['score']
        );

        $this->entityManager->persist($similarity);
    }

    // Flush - now only updates CoffeeBeanSimilarity table, not related entities
    $this->entityManager->flush();

    $this->logger->info('Completed similarity calculation for coffee bean', [
        'coffeeBeanId' => $coffeeBeanId,
        'similarityCount' => count($similarityResults),
    ]);
}

Step 2: Improve Retry Strategy

File: config/packages/messenger.php

'similarity_calculation' => [
    'dsn' => '%env(MESSENGER_TRANSPORT_DSN)%',
    'options' => [
        'queue_name' => 'similarity_calculation',
    ],
    'retry_strategy' => [
        'max_retries' => 5,  // Increase retries
        'delay' => 500,      // Faster initial retry (500ms)
        'multiplier' => 2,   // Exponential backoff
        'max_delay' => 10000, // Cap at 10s
    ],
],

Step 3: Add Deadlock Monitoring

File: src/MessageHandler/CalculateSimilarityForBeanHandler.php

Add try-catch for deadlock detection:

use Doctrine\DBAL\Exception\DeadlockException;

public function __invoke(CalculateSimilarityForBean $message): void
{
    try {
        // ... existing code ...
    } catch (DeadlockException $e) {
        $this->logger->error('Deadlock detected during similarity calculation', [
            'coffeeBeanId' => $message->coffeeBeanId,
            'exception' => $e->getMessage(),
        ]);

        // Re-throw to trigger retry mechanism
        throw $e;
    }
}

Step 4: Add Unit Tests

File: tests/MessageHandler/CalculateSimilarityForBeanHandlerTest.php

Add tests for: 1. Successful similarity calculation 2. Deadlock scenario simulation 3. Verify detached entities don't trigger updates 4. Verify retry mechanism works

public function testDeadlockRetryMechanism(): void
{
    // Mock EntityManager to throw DeadlockException on first flush
    $entityManager = $this->createMock(EntityManagerInterface::class);
    $entityManager->expects($this->once())
        ->method('flush')
        ->willThrowException(new DeadlockException('Test deadlock', null));

    // ... rest of test

    $this->expectException(DeadlockException::class);
    $handler($message);
}

public function testDetachedEntitiesDontTriggerUpdates(): void
{
    // Verify that detaching similar beans prevents relationship updates
    // Mock flush to track what gets updated
}

Step 5: Monitor in Production

After deployment: 1. Monitor Sentry for BEANS-BACKEND-4C recurrence 2. Check retry count metrics 3. Verify similarity calculation success rate 4. Monitor database lock wait times

Testing Strategy

Unit Tests

  • [ ] Test successful similarity calculation
  • [ ] Test handling of non-existent bean
  • [ ] Test deadlock exception re-throw
  • [ ] Test detached entity behavior

Integration Tests

  • [ ] Test with multiple concurrent workers
  • [ ] Simulate deadlock scenario
  • [ ] Verify retry mechanism
  • [ ] Test with beans sharing regions

Performance Tests

  • [ ] Benchmark similarity calculation throughput
  • [ ] Test with high concurrency
  • [ ] Measure deadlock occurrence rate

Manual Testing

  • [ ] Create multiple beans simultaneously
  • [ ] Monitor database locks
  • [ ] Verify similarities are calculated correctly
  • [ ] Check retry counts in logs

Deployment Plan

Pre-Deployment

  1. Run full test suite: make test
  2. Test locally with multiple workers
  3. Review code changes
  4. Update monitoring alerts

Deployment

  1. Create branch: fix/similarity-calculation-deadlock
  2. Commit with message: fix: Prevent database deadlock in similarity calculation using detached entities - Fixes BEANS-BACKEND-4C
  3. Create PR
  4. Code review
  5. Deploy to staging
  6. Run load test with concurrent bean creation
  7. Deploy to production

Post-Deployment

  1. Monitor Sentry for 72 hours
  2. Check retry counts
  3. Verify deadlock rate drops to zero
  4. Review database performance metrics

Success Criteria

  • [ ] Zero deadlock errors in similarity calculation
  • [ ] Sentry issue BEANS-BACKEND-4C marked as resolved
  • [ ] All tests passing
  • [ ] No increase in similarity calculation latency
  • [ ] Retry rate remains low (<5%)
  • [ ] Database lock wait time unchanged or improved

Risk Assessment

Risk: LOW 🟢

  • Detaching entities is safe - prevents unintended updates
  • Retry mechanism already in place as safety net
  • Easy to rollback if issues occur

Mitigation

  • Comprehensive testing before deployment
  • Staging environment validation
  • Gradual rollout monitoring
  • Easy rollback plan

Alternative Considerations

Why Not Use Database-Level Solutions?

Option: Increase lock timeout - Doesn't fix root cause - Just delays the inevitable - Poor user experience

Option: Advisory locks - Over-engineered for this problem - Adds complexity - Harder to maintain

Option: Serializable isolation level - Performance impact - Overkill for this scenario - May cause more conflicts

Performance Impact

Expected Impact: NEUTRAL to POSITIVE

Neutral: - Detached entities reduce Doctrine overhead - No additional database queries

Positive: - Fewer locks held during flush - Faster flush operations - Reduced database contention

Risk: - Minimal - only changes in-memory entity management

Follow-up Actions

After Fix Deployed

  1. Monitor deadlock metrics for 1 week
  2. Evaluate if single-worker approach needed
  3. Consider batch processing for future scaling
  4. Document entity management patterns
  • Review other message handlers for similar patterns
  • Add database deadlock detection to monitoring
  • Create guidelines for concurrent entity updates

Additional Context

PostgreSQL Deadlock Detection

PostgreSQL automatically detects deadlocks and aborts one transaction. The deadlock_timeout (default 1s) controls detection speed.

Doctrine Behavior

  • ManyToMany relationships can trigger updates on inverse side
  • persist() doesn't immediately write to database
  • flush() executes all pending operations
  • Update order is non-deterministic
  • Detached entities are not tracked by UnitOfWork

Messenger Retry Strategy

Current config: - Max retries: 3 - Initial delay: 1s - Multiplier: 2x - Retries: 1s, 2s, 4s

References

  • Sentry Issue: https://alpipego.sentry.io/issues/7009563819/
  • PostgreSQL Deadlocks: https://www.postgresql.org/docs/current/explicit-locking.html
  • Doctrine Detach: https://www.doctrine-project.org/projects/doctrine-orm/en/latest/reference/working-with-objects.html#detaching-entities
  • Symfony Messenger Retry: https://symfony.com/doc/current/messenger.html#retries-failures

Estimated Effort

  • Detached Entities Implementation: 2-3 hours
  • Testing: 3-4 hours
  • Retry Strategy Tuning: 1 hour
  • Monitoring Setup: 1-2 hours
  • Deployment & Validation: 2-3 hours
  • Total: 9-13 hours (1.5-2 days)

Assignee

TBD

Status Updates

  • 2025-11-09: Plan created after Sentry deadlock analysis
  • Next: Implement detached entity pattern and enhanced retry strategy