Skip to content

Multi-Market Implementation: Phase 2 - Database Migration and Data Migration

Status: Planning

Dependencies

Requires completed: - Phase 1 (Entities) - Must have Market, AffiliateProgram, BuyingOption entities

Blocks: - Phase 3 (Services) - Services need these new fields - Phase 4 (Cache/Repository) - Repositories query these fields - All subsequent phases

📋 Todo Checklist

  • [ ] Add market and priority fields to RoasterCrawlConfig entity
  • [ ] Add available field to CrawlUrl entity
  • [ ] Create database migration for schema changes
  • [ ] Create data migration script for CoffeeBean.available → CrawlUrl.available
  • [ ] Test migration up/down (reversibility)
  • [ ] Verify existing queries still work
  • [ ] Document transition period behavior
  • [ ] Plan removal of CoffeeBean.available field (future phase)

🔍 Analysis & Investigation

Problem Statement

The current system needs fields added to existing entities to support multi-market architecture: 1. RoasterCrawlConfig needs market assignment and priority 2. CrawlUrl needs availability tracking (replacing global CoffeeBean.available) 3. Existing data must be migrated safely

Current State

RoasterCrawlConfig: - Has shipsTo countries (logistics) - No market assignment (monetization) - No priority system

CrawlUrl: - No availability tracking - Relies on global CoffeeBean.available

CoffeeBean: - Has global available boolean - Applied to ALL crawl URLs (not granular enough)

Target State

RoasterCrawlConfig: - Has market (nullable during transition) - Has priority (default 50) - Keeps shipsTo (serves different purpose than Market.countries)

CrawlUrl: - Has available boolean (per-URL availability)

CoffeeBean: - available field removed (AFTER transition complete)

Migration Strategy

Transition Period: - New fields are nullable/have defaults - Configs with market = NULL work in "legacy mode" - Both old and new systems work simultaneously - Gradual migration of configs to markets

Backward Compatibility: - Down migration restores previous state - Data migration is reversible - No breaking changes during transition

📝 Implementation Plan

Prerequisites

  • Phase 1 completed (Market, AffiliateProgram, BuyingOption entities exist)
  • Database backup created
  • All existing migrations applied

Step-by-Step Implementation

Step 1: Update RoasterCrawlConfig Entity

File: src/Entity/RoasterCrawlConfig.php

Add new fields and relationships:

use App\Entity\Market;

// Add to class body:

#[ORM\ManyToOne(targetEntity: Market::class, inversedBy: 'roasterCrawlConfigs')]
#[ORM\JoinColumn(name: 'market_id', referencedColumnName: 'id', nullable: true)]
private ?Market $market = null;

#[ORM\Column(type: Types::INTEGER, options: ['default' => 50])]
#[Assert\Range(min: 0, max: 1000, notInRangeMessage: 'Priority must be between 0 and 1000.')]
private int $priority = 50;

// Add getters/setters:

public function getMarket(): ?Market
{
    return $this->market;
}

public function setMarket(?Market $market): self
{
    $this->market = $market;
    return $this;
}

public function getPriority(): int
{
    return $this->priority;
}

public function setPriority(int $priority): self
{
    $this->priority = $priority;
    return $this;
}

Add indexes to entity annotations:

#[ORM\Index(name: 'idx_rcc_market', columns: ['market_id'])]
#[ORM\Index(name: 'idx_rcc_priority', columns: ['priority'])]

Notes: - market is nullable to support transition period - priority defaults to 50 (allows higher/lower on both sides) - Keep existing shippingRegion and shipsTo fields (serve different purpose)

Step 2: Update CrawlUrl Entity

File: src/Entity/CrawlUrl.php

Add availability field:

#[ORM\Column(type: Types::BOOLEAN, options: ['default' => true])]
private bool $available = true;

// Add getter/setter:

public function isAvailable(): bool
{
    return $this->available;
}

public function setAvailable(bool $available): self
{
    $this->available = $available;
    return $this;
}

Add index:

#[ORM\Index(name: 'idx_crawl_url_available', columns: ['available'])]

Notes: - Default true for new crawl URLs - Existing URLs will be updated via data migration

Step 3: Generate Schema Migration

Run Doctrine migration diff:

make migrate-diff ARGS="multi_market_add_fields"

Verify generated migration includes:

// Up migration:
$this->addSql('ALTER TABLE roaster_crawl_config ADD COLUMN market_id UUID NULL');
$this->addSql('ALTER TABLE roaster_crawl_config ADD COLUMN priority INT DEFAULT 50 NOT NULL');
$this->addSql('ALTER TABLE crawl_url ADD COLUMN available BOOLEAN DEFAULT true NOT NULL');

// Add foreign key
$this->addSql('ALTER TABLE roaster_crawl_config ADD CONSTRAINT FK_RCC_MARKET FOREIGN KEY (market_id) REFERENCES market (id)');

// Add indexes
$this->addSql('CREATE INDEX idx_rcc_market ON roaster_crawl_config (market_id)');
$this->addSql('CREATE INDEX idx_rcc_priority ON roaster_crawl_config (priority)');
$this->addSql('CREATE INDEX idx_crawl_url_available ON crawl_url (available)');

Step 4: Create Data Migration Script

File: migrations/VersionXXX_migrate_availability_data.php

Create separate migration for data transformation:

<?php

declare(strict_types=1);

namespace DoctrineMigrations;

use Doctrine\DBAL\Schema\Schema;
use Doctrine\Migrations\AbstractMigration;

final class VersionXXX_migrate_availability_data extends AbstractMigration
{
    public function getDescription(): string
    {
        return 'Migrate CoffeeBean.available to CrawlUrl.available';
    }

    public function up(Schema $schema): void
    {
        // For each bean where available = false, set all its CrawlUrls to available = false
        $this->addSql('
            UPDATE crawl_url cu
            SET available = false
            WHERE cu.coffee_bean_id IN (
                SELECT id FROM coffee_bean WHERE available = false
            )
        ');

        // Log the migration
        $affectedRows = $this->connection->executeStatement('
            SELECT COUNT(*) FROM crawl_url WHERE available = false
        ');

        $this->write(sprintf('Migrated availability for %d CrawlUrls', $affectedRows));
    }

    public function down(Schema $schema): void
    {
        // Restore CoffeeBean.available from CrawlUrl.available
        // This assumes CoffeeBean.available field still exists

        // Set bean available = false if ANY of its CrawlUrls are unavailable
        $this->addSql('
            UPDATE coffee_bean cb
            SET available = false
            WHERE EXISTS (
                SELECT 1 FROM crawl_url cu
                WHERE cu.coffee_bean_id = cb.id
                AND cu.available = false
            )
        ');

        // Set bean available = true if ALL CrawlUrls are available
        $this->addSql('
            UPDATE coffee_bean cb
            SET available = true
            WHERE NOT EXISTS (
                SELECT 1 FROM crawl_url cu
                WHERE cu.coffee_bean_id = cb.id
                AND cu.available = false
            )
        ');
    }
}

Important: Run this migration AFTER the schema migration.

Step 5: Document Transition Behavior

Create documentation file: docs/multi-market-transition.md

# Multi-Market Transition Period

## Current State (During Transition)

### RoasterCrawlConfig.market
- **Nullable**: Yes
- **NULL behavior**: Legacy mode (uses shipsTo, no affiliate transformation)
- **Non-NULL behavior**: Multi-market mode (uses market countries, affiliate transformation)

### URL Resolution Logic

For each CrawlUrl, the system checks:
1. **Logistics check**: Does RCC ship to visitor's country? (RCC.shipsTo)
2. **Monetization check**: If RCC has market, does market serve visitor's country? (Market.countries)
3. **Both must pass** for bean to be shown

### Priority System

Even in legacy mode (market = NULL), priority is used:
- Default priority: 50
- Higher priority wins when multiple configs serve same country
- Allows gradual migration without URL resolution changes

## Migration Workflow

### Immediate (After Migration)
1. All existing RCCs have `market = NULL` and `priority = 50`
2. System works as before (legacy mode)
3. All CrawlUrls inherit availability from CoffeeBean.available

### Gradual (Admin Actions)
1. Create AffiliateProgram entities
2. Create Market entities
3. Assign RCCs to markets one by one
4. Adjust priorities as needed
5. Create BuyingOptions for special cases

### Eventually (After Full Transition)
1. All RCCs have markets assigned
2. Verify all beans have correct availability
3. Remove CoffeeBean.available field (separate migration)

## Rollback Procedure

If issues occur:
1. Revert code deployment
2. Run migration down: `bin/console doctrine:migrations:migrate prev`
3. Data is restored to previous state

Step 6: Create Migration Tests

File: tests/Migration/MultiMarketMigrationTest.php

<?php

namespace App\Tests\Migration;

use Symfony\Bundle\FrameworkBundle\Test\KernelTestCase;
use Doctrine\DBAL\Connection;

class MultiMarketMigrationTest extends KernelTestCase
{
    private Connection $connection;

    protected function setUp(): void
    {
        self::bootKernel();
        $this->connection = self::getContainer()->get('doctrine')->getConnection();
    }

    public function testSchemaChangesApplied(): void
    {
        // Verify new columns exist
        $this->assertTrue(
            $this->columnExists('roaster_crawl_config', 'market_id')
        );
        $this->assertTrue(
            $this->columnExists('roaster_crawl_config', 'priority')
        );
        $this->assertTrue(
            $this->columnExists('crawl_url', 'available')
        );
    }

    public function testIndexesCreated(): void
    {
        // Verify indexes exist
        $this->assertTrue(
            $this->indexExists('roaster_crawl_config', 'idx_rcc_market')
        );
        $this->assertTrue(
            $this->indexExists('roaster_crawl_config', 'idx_rcc_priority')
        );
        $this->assertTrue(
            $this->indexExists('crawl_url', 'idx_crawl_url_available')
        );
    }

    public function testDataMigration(): void
    {
        // Create test data: bean with available=false
        $this->connection->insert('coffee_bean', [
            'id' => 'test-bean-id',
            'name' => 'Test Bean',
            'available' => false,
            // ... other required fields
        ]);

        $this->connection->insert('crawl_url', [
            'id' => 'test-url-id',
            'url' => 'https://example.com/product',
            'coffee_bean_id' => 'test-bean-id',
            'available' => true, // Default
            // ... other required fields
        ]);

        // Run data migration logic
        $this->connection->executeStatement('
            UPDATE crawl_url cu
            SET available = false
            WHERE cu.coffee_bean_id IN (
                SELECT id FROM coffee_bean WHERE available = false
            )
        ');

        // Verify CrawlUrl.available was updated
        $result = $this->connection->fetchAssociative(
            'SELECT available FROM crawl_url WHERE id = ?',
            ['test-url-id']
        );

        $this->assertFalse($result['available']);
    }

    private function columnExists(string $table, string $column): bool
    {
        $schemaManager = $this->connection->createSchemaManager();
        $columns = $schemaManager->listTableColumns($table);
        return isset($columns[$column]);
    }

    private function indexExists(string $table, string $index): bool
    {
        $schemaManager = $this->connection->createSchemaManager();
        $indexes = $schemaManager->listTableIndexes($table);
        return isset($indexes[$index]);
    }
}

Step 7: Test Migration Reversibility

Create test script to verify up/down migrations work:

#!/bin/bash
# Test migration reversibility

set -e

echo "Creating database backup..."
make db-backup

echo "Running migration up..."
make migrate

echo "Verifying schema..."
bin/console doctrine:schema:validate

echo "Running migration down..."
bin/console doctrine:migrations:migrate prev --no-interaction

echo "Verifying rollback..."
bin/console doctrine:schema:validate

echo "Running migration up again..."
make migrate

echo "Migration reversibility test passed!"

Testing Strategy

Manual Testing

  1. Database Backup:
  2. Create backup before migration
  3. Verify backup contains all data

  4. Schema Migration:

  5. Run make migrate
  6. Verify new columns exist with correct types
  7. Verify indexes created
  8. Verify foreign keys created

  9. Data Migration:

  10. Verify CrawlUrls inherit availability from CoffeeBean
  11. Check edge cases (beans with no URLs, NULL availability)

  12. Rollback Test:

  13. Run migration down
  14. Verify schema restored
  15. Verify data restored
  16. Run migration up again

Integration Tests

  1. Existing Queries:
  2. Verify all existing queries still work
  3. Test RoasterCrawlConfig queries with market = NULL
  4. Test CrawlUrl queries with available field

  5. Legacy Mode:

  6. Verify configs with market = NULL still function
  7. Test URL resolution with legacy configs
  8. Verify no breaking changes

Performance Testing

  1. Query Performance:
  2. Verify new indexes improve query performance
  3. Test queries on large datasets
  4. Check for N+1 query issues

🎯 Success Criteria

  • Schema migration runs successfully without errors
  • New fields added to RoasterCrawlConfig (market, priority)
  • New field added to CrawlUrl (available)
  • All indexes created correctly
  • Data migration completes: CrawlUrl.available reflects CoffeeBean.available
  • Migration is reversible (up/down works correctly)
  • Existing queries continue working
  • Legacy mode (market = NULL) functions correctly
  • No breaking changes to existing functionality
  • Doctrine schema validation passes
  • All tests pass

Files to Modify:

  • src/Entity/RoasterCrawlConfig.php (add market, priority)
  • src/Entity/CrawlUrl.php (add available)

Files to Create:

  • migrations/VersionXXX_multi_market_add_fields.php (auto-generated schema)
  • migrations/VersionXXX_migrate_availability_data.php (manual data migration)
  • docs/multi-market-transition.md (documentation)
  • tests/Migration/MultiMarketMigrationTest.php

Files to Keep (not remove yet):

  • src/Entity/CoffeeBean.php - Keep available field during transition
  • Will be removed in Phase 8 (Cleanup) after full transition

Notes

Why Keep RCC.shipsTo?

Different purposes: - RCC.shipsTo = Physical shipping capability - Market.countries = Monetization regions

Both are checked: 1. Can roaster ship to visitor? (RCC.shipsTo) 2. Can we monetize in visitor's country? (Market.countries)

Not deprecated: RCC.shipsTo serves a distinct purpose and will NOT be removed.

Transition Period

Duration: Indefinite - no forced migration deadline Flexibility: Admins migrate configs to markets at their own pace Safety: Both systems work simultaneously

Data Integrity

Unique constraint: BuyingOption (coffee_bean_id, market_id) prevents duplicates Foreign keys: All relationships enforced at database level Cascading deletes: Market deletion removes BuyingOptions

Next Steps

After completing this phase: - Proceed to Phase 3 (Services) to implement BuyingOptionService and AffiliateUrlService - Entities and schema ready for service layer - Data migrated and safe to use in new system