Why Data Migration Scripts Fail (And What Works Instead)

CSVImport Team · 6 min read

Custom migration scripts break in production. Learn why enterprise IT teams are switching to spec-driven import tools for reliability and maintainability.

We’ve all been there. The stakeholder says: “We need to migrate 50,000 customer records from the old system to the new API by Monday.”

Your options:

  1. Manual data entry (absolutely not)
  2. Write a custom Python/Node script
  3. Pay a consultant thousands of dollars
  4. Use an existing import tool

Most IT teams choose option 2: write a script. It seems simple enough:

# How hard can this be?
import csv, requests

for row in csv.DictReader(open('customers.csv')):
    requests.post('https://api.example.com/customers', json=row)

Three weeks later, the “simple” script has grown into hundreds of lines of authentication, retry, and validation code, and the migration still isn’t done.

This article explores why custom migration scripts fail and what enterprise teams are doing instead.

Table of Contents

  1. The Hidden Complexity of Data Migration
  2. The Real Cost of Custom Scripts
  3. What Enterprise Teams Do Instead
  4. When Custom Scripts Still Make Sense
  5. Choosing an Import Tool
  6. Migration to Spec-Driven Tools

The Hidden Complexity of Data Migration

That simple 5-line script ignores a dozen real-world problems:

1. Authentication Hell

APIs don’t just accept anonymous POST requests. They need API keys, OAuth 2.0 tokens, token refresh logic, and custom headers like request IDs.

Your script grows:

import csv, requests, time, uuid, jwt
from requests_oauthlib import OAuth2Session

# 50 lines of authentication code
oauth = OAuth2Session(client_id, token_url=token_url)
token = oauth.fetch_token(...)

# Token refresh logic
if token_expired():
    refresh_token()

# Make request with auth
response = requests.post(url, json=row, headers={
    'Authorization': f'Bearer {token}',
    'X-API-Key': api_key,
    'X-Request-ID': str(uuid.uuid4()),
})

Now your 5-line script is 100 lines, and you’re debugging OAuth flows instead of migrating data.

2. Rate Limiting

APIs have rate limits. Hit them and you get:

HTTP 429 Too Many Requests
Retry-After: 60

Your script crashes, or worse, continues with silent failures.

You add retry logic:

import time

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

session = requests.Session()
retry = Retry(
    total=5,
    backoff_factor=1,
    status_forcelist=[429, 500, 502, 503, 504],
    allowed_methods=frozenset(['POST'])  # POST isn't retried by default
)
adapter = HTTPAdapter(max_retries=retry)
session.mount('http://', adapter)
session.mount('https://', adapter)

# Add rate limiting (inside the per-row loop)
time.sleep(0.1)  # 10 requests per second

Now you’re implementing exponential backoff and jitter strategies. This is getting out of hand.
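
If you hand-roll that backoff instead of leaning on urllib3’s Retry, a minimal sketch (the function name and parameters here are ours, purely for illustration) tends to look like this:

import random
import time

# Illustrative only: hand-rolled retry with exponential backoff and jitter.
# The function and its parameters are hypothetical, not part of any library.
def post_with_backoff(session, url, payload, max_attempts=5):
    for attempt in range(max_attempts):
        response = session.post(url, json=payload)
        # Anything other than 429 or a 5xx isn't worth retrying.
        if response.status_code != 429 and response.status_code < 500:
            return response
        if attempt == max_attempts - 1:
            break
        # Honor Retry-After when the API sends it, otherwise back off exponentially
        # with a little jitter so parallel runs don't retry in lockstep.
        retry_after = response.headers.get('Retry-After')
        delay = float(retry_after) if retry_after else (2 ** attempt) + random.uniform(0, 1)
        time.sleep(delay)
    response.raise_for_status()
    return response

Either way, it’s more code that has nothing to do with your actual data.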

3. Data Validation

Your CSV has “dirty” data: empty cells, “N/A” placeholders, inconsistent date formats, mixed-case emails, and numbers stored as text.

Your script needs validation:

def clean_row(row):
    # Handle empty strings
    for key, value in row.items():
        if value == '' or value == 'N/A':
            row[key] = None

    # Parse dates
    if row.get('date'):
        row['date'] = parse_date(row['date'])  # Another 50 lines

    # Validate email
    if row.get('email'):
        row['email'] = row['email'].strip().lower()
        if not is_valid_email(row['email']):
            raise ValueError(f"Invalid email: {row['email']}")

    # Convert types
    if row.get('age'):
        row['age'] = int(row['age'])

    return row

You’re now writing a full data validation framework.
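
Even the helpers that script hand-waves over (parse_date, is_valid_email) are real work once messy data shows up. A minimal sketch, assuming only a few common date formats:

import re
from datetime import datetime

# Rough sketches of the helpers assumed above; real CSVs usually need more
# formats and stricter rules than this.
DATE_FORMATS = ['%Y-%m-%d', '%m/%d/%Y', '%d.%m.%Y']

def parse_date(value):
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"Unrecognized date format: {value!r}")

EMAIL_RE = re.compile(r'^[^@\s]+@[^@\s]+\.[^@\s]+$')

def is_valid_email(value):
    return bool(EMAIL_RE.match(value))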

4. Error Handling

Imports fail. Some rows succeed, some fail. You need to know which rows failed, why they failed, and how to retry just those rows without re-importing the ones that succeeded.

Your script needs error tracking:

import csv
import json

successful_rows = []
failed_rows = []

for i, row in enumerate(csv.DictReader(open('customers.csv')), start=2):  # row 1 is the header
    try:
        row = clean_row(row)
        response = session.post(url, json=row)
        response.raise_for_status()
        successful_rows.append(i)
    except Exception as e:
        failed_rows.append({
            'row_number': i,
            'data': json.dumps(row),  # keep the offending values for review
            'error': str(e)
        })
        continue

# Write failed rows to CSV for manual review
with open('failed_rows.csv', 'w', newline='') as f:
    writer = csv.DictWriter(f, fieldnames=['row_number', 'data', 'error'])
    writer.writeheader()
    writer.writerows(failed_rows)

print(f"Success: {len(successful_rows)}, Failed: {len(failed_rows)}")

Now you’re writing CSV generation logic and error reporting systems.

5. Schema Changes

APIs evolve. Next month the API adds a required customer_type field, renames an existing one, or tightens validation on data you’ve been sending all along.

Your script breaks:

HTTP 400 Bad Request
{
  "error": "Missing required field: customer_type"
}

You need to update the script for every API change. Multiply this by 10 APIs and you’re maintaining a mess.
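
One common band-aid is a preflight check that compares the CSV against the API’s own schema before anything is submitted. A rough sketch, assuming the OpenAPI document is saved locally and the target schema is called Customer (both assumptions for illustration):

import csv
import json

# Compare CSV headers against the required fields declared in the OpenAPI spec.
# File name and schema name are assumptions for this example.
with open('openapi.json') as f:
    spec = json.load(f)

required = set(spec['components']['schemas']['Customer'].get('required', []))

with open('customers.csv', newline='') as f:
    headers = set(next(csv.reader(f)))

missing = required - headers
if missing:
    print(f"CSV is missing required fields: {sorted(missing)}")

That’s yet another script to keep in sync with the API, which is exactly the point.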

6. No Progress Tracking

You run the script:

python migrate.py

And wait. Is it working? How many rows processed? How long until it finishes?

You add progress tracking:

from tqdm import tqdm

rows = list(csv.DictReader(open('customers.csv')))
for row in tqdm(rows, desc="Importing"):
    # ... import logic

7. The “Works on My Machine” Problem

Your script runs fine on your laptop. In production, the Python version is different, the dependencies aren’t installed, the credentials live somewhere else, and the network rules are stricter.

You’re now writing Dockerfiles and deployment documentation.

The Real Cost of Custom Scripts

Let’s do the math.

Writing the script: by the time you add authentication, retry logic, validation, error reporting, and progress tracking, the “quick script” is a small project. Total: 12 hours

Maintaining the script: every API change, new required field, and failed run pulls you back into the code. Total annual cost: 12 hours/year

Scaling to multiple migrations: ten target APIs means ten scripts, each with its own quirks and its own maintenance burden.

At $75/hour (average IT staff rate), that’s roughly $900 to write each script and another $900 per script, per year, to keep it working. Across ten migrations, that’s on the order of $18,000 in the first year alone.

And this assumes no major issues or data loss incidents.

What Enterprise Teams Do Instead

Smart IT operations teams use spec-driven import tools that handle the complexity automatically.

The Spec-Driven Approach

Instead of writing code, you:

  1. Point at an OpenAPI spec - Tool reads the API definition
  2. Upload CSV - Automatic separator detection
  3. Map fields - Visual drag-and-drop interface
  4. Submit - Batch processing with progress tracking

For a detailed walkthrough, see our complete CSV to API import guide for non-developers.

The tool handles:

✅ Authentication (reads security schemes from OpenAPI)
✅ Rate limiting (respects API limits)
✅ Data validation (enforces required fields, types)
✅ Error handling (detailed error reports with row numbers)
✅ Retry logic (exponential backoff on transient failures)
✅ Progress tracking (real-time updates)
✅ API changes (re-import spec, mappings still work)
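
To make “spec-driven” concrete: everything the script hard-coded is already described in the OpenAPI document. A rough sketch of reading the authentication requirements straight from the spec (file name and spec layout follow OpenAPI 3.x, shown only for illustration):

import json

# Derive how to authenticate from components.securitySchemes instead of
# hard-coding headers in a script. Purely illustrative.
with open('openapi.json') as f:
    spec = json.load(f)

for name, scheme in spec.get('components', {}).get('securitySchemes', {}).items():
    if scheme.get('type') == 'http' and scheme.get('scheme') == 'bearer':
        print(f"{name}: send 'Authorization: Bearer <token>'")
    elif scheme.get('type') == 'apiKey':
        print(f"{name}: send an API key in the {scheme['in']} named '{scheme['name']}'")

An import tool does the same kind of lookup for endpoints, required fields, and types, which is why re-importing the spec is usually all it takes when the API changes.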

Real-World Example: Enterprise SaaS Migration

Scenario: Migrating 250,000 customer records from legacy CRM to new API.

Custom script approach: writing the script, debugging authentication and retries, re-running failed batches, and cleaning up bad rows by hand. Total: 100 hours over 5 weeks

Spec-driven tool approach: importing the OpenAPI spec, mapping fields, running the batched import, and reviewing the error report. Total: 14 hours over 2 days

Savings: 86 hours (an 86% reduction)

When Custom Scripts Still Make Sense

Use custom scripts when:

  1. One-time migration with simple data - 100 rows, no auth, flat structure
  2. Extremely custom transformations - Complex business logic, multi-source joins
  3. No API available - Direct database access required
  4. Learning exercise - Teaching team how APIs work

Don’t use custom scripts when the opposite is true: the import is recurring, the API requires authentication or enforces rate limits, the dataset is large, or the same data has to land in multiple APIs.

Choosing an Import Tool

When evaluating import tools, look for:

Must-Have Features

✅ OpenAPI/Swagger support - Automatic endpoint discovery
✅ Visual field mapping - No code required
✅ Batch processing - Handle large datasets
✅ Error reporting - Detailed logs with row numbers
✅ Authentication support - API keys, Bearer tokens, OAuth

Nice-to-Have Features

✅ Auto-mapping - Suggest field mappings
✅ Retry logic - Automatic retry on failures
✅ Rate limiting - Respect API limits
✅ Template saving - Reuse configurations
✅ Webhook notifications - Alert when import completes

Enterprise Features

✅ Audit logs - Who ran what import, when
✅ Role-based access - Control who can import to which APIs
✅ Scheduled imports - Automate recurring imports
✅ API response storage - Keep records of created resources
✅ White-label - Embed in your own tools

Migration to Spec-Driven Tools

Already have custom scripts? Here’s how to transition:

Phase 1: New Imports

Start here: any migration that hasn’t been scripted yet goes through the tool first, so no new custom code gets added to the pile.

Phase 2: Replace High-Maintenance Scripts

Next, retire the scripts that break most often, typically the ones pointed at APIs that change frequently.

Phase 3: Deprecate All Scripts

Once the tool covers the remaining cases, archive the old scripts and keep them only as reference.

The Bottom Line

Custom migration scripts fail because:

  1. They underestimate complexity (authentication, rate limiting, errors)
  2. They’re hard to maintain (API changes break scripts)
  3. They don’t scale (writing 10 scripts takes 10x the time)
  4. They lack observability (hard to debug, poor error messages)

Spec-driven import tools succeed because:

  1. They handle complexity automatically (authentication, retry logic, validation)
  2. They adapt to changes (re-import spec, mappings persist)
  3. They scale effortlessly (same tool for any OpenAPI-compliant API)
  4. They provide visibility (progress tracking, detailed error reports)

The math is simple: 14 hours with a tool vs 100 hours with scripts.

Your time is valuable. Stop writing migration scripts and start importing data.


Ready to ditch custom scripts? CSVImport handles authentication, rate limiting, validation, and error reporting automatically. Try the demo or join the waitlist for early access.

