Why Data Migration Scripts Fail (And What Works Instead)
Custom migration scripts break in production. Learn why enterprise IT teams are switching to spec-driven import tools for reliability and maintainability.
We’ve all been there. The stakeholder says: “We need to migrate 50,000 customer records from the old system to the new API by Monday.”
Your options:
- Manual data entry (absolutely not)
- Write a custom Python/Node script
- Pay a consultant thousands of dollars
- Use an existing import tool
Most IT teams choose option 2: write a script. It seems simple enough:
```python
# How hard can this be?
import csv, requests

for row in csv.DictReader(open('customers.csv')):
    requests.post('https://api.example.com/customers', json=row)
```
Three weeks later:
- The script works… sometimes
- Half the data imported with errors you didn’t notice
- The API changed and now your script is broken
- Your manager asks “Can we use this for next month’s product import?”
- You realize you’ll be maintaining this forever
This article explores why custom migration scripts fail and what enterprise teams are doing instead.
Table of Contents
- The Hidden Complexity of Data Migration
- The Real Cost of Custom Scripts
- What Enterprise Teams Do Instead
- When Custom Scripts Still Make Sense
- Choosing an Import Tool
- Migration to Spec-Driven Tools
The Hidden Complexity of Data Migration
That simple 5-line script ignores a dozen real-world problems:
1. Authentication Hell
APIs don’t just accept anonymous POST requests. They need:
- API keys in custom headers
- OAuth 2.0 tokens that expire every hour
- JWT tokens with specific claims
- Mutual TLS certificates
- Request signing (HMAC, AWS Signature v4)
Your script grows:
```python
import csv, requests, time, uuid, jwt
from requests_oauthlib import OAuth2Session

# 50 lines of authentication code
oauth = OAuth2Session(client_id)
token = oauth.fetch_token(token_url, ...)

# Token refresh logic
if token_expired():
    token = refresh_token()

# Make request with auth
response = requests.post(url, json=row, headers={
    'Authorization': f'Bearer {token["access_token"]}',
    'X-API-Key': api_key,
    'X-Request-ID': str(uuid.uuid4()),
})
```
Now your 5-line script is 100 lines, and you’re debugging OAuth flows instead of migrating data.
2. Rate Limiting
APIs have rate limits. Hit them and you get:
```
HTTP 429 Too Many Requests
Retry-After: 60
```
Your script crashes, or worse, continues with silent failures.
You add retry logic:
```python
import time

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

# Retry transient failures automatically
session = requests.Session()
retry = Retry(
    total=5,
    backoff_factor=1,
    status_forcelist=[429, 500, 502, 503, 504]
)
adapter = HTTPAdapter(max_retries=retry)
session.mount('http://', adapter)
session.mount('https://', adapter)

# Add rate limiting between requests
time.sleep(0.1)  # 10 requests per second
```
Now you’re implementing exponential backoff and jitter strategies. This is getting out of hand.
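For reference, hand-rolled backoff with jitter tends to end up looking something like the sketch below. The post_with_backoff helper and its parameters are illustrative, not part of the original script.

```python
import random
import time

def post_with_backoff(session, url, payload, max_attempts=5):
    """Retry on 429/5xx with exponential backoff plus jitter (illustrative helper)."""
    for attempt in range(max_attempts):
        response = session.post(url, json=payload)
        if response.status_code not in (429, 500, 502, 503, 504):
            return response
        # Honour Retry-After if the API sends it; otherwise back off exponentially
        wait = float(response.headers.get("Retry-After", 2 ** attempt))
        time.sleep(wait + random.uniform(0, 1))  # jitter spreads retries apart
    return response
```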
3. Data Validation
Your CSV has “dirty” data:
- Empty strings where API expects null
"N/A"in numeric fields- Dates in 15 different formats
- Country codes (US, USA, United States, america)
- Phone numbers with/without country codes
- Emails with trailing spaces
Your script needs validation:
```python
def clean_row(row):
    # Handle empty strings
    for key, value in row.items():
        if value == '' or value == 'N/A':
            row[key] = None

    # Parse dates
    if row.get('date'):
        row['date'] = parse_date(row['date'])  # Another 50 lines

    # Validate email
    if row.get('email'):
        row['email'] = row['email'].strip().lower()
        if not is_valid_email(row['email']):
            raise ValueError(f"Invalid email: {row['email']}")

    # Convert types
    if row.get('age'):
        row['age'] = int(row['age'])

    return row
```
You’re now writing a full data validation framework.
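To make that concrete, here is roughly what the parse_date helper referenced above turns into. The format list is an illustrative assumption; in practice it only ever grows.

```python
from datetime import datetime

# A sample of the formats real exports throw at you
DATE_FORMATS = ["%Y-%m-%d", "%d/%m/%Y", "%m/%d/%Y", "%d-%b-%Y", "%Y/%m/%d %H:%M"]

def parse_date(value: str) -> str:
    """Try each known format in turn and normalise to ISO 8601."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"Unrecognised date format: {value!r}")
```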
4. Error Handling
Imports fail. Some rows succeed, some fail. You need to know:
- Which rows failed?
- Why did they fail?
- How do I retry just the failed rows?
Your script needs error tracking:
```python
import csv

successful_rows = []
failed_rows = []

for i, row in enumerate(csv.DictReader(open('customers.csv')), start=2):
    try:
        row = clean_row(row)
        response = session.post(url, json=row)
        response.raise_for_status()
        successful_rows.append(i)
    except Exception as e:
        failed_rows.append({
            'row_number': i,
            'data': row,
            'error': str(e)
        })
        continue

# Write failed rows to CSV for manual review
with open('failed_rows.csv', 'w') as f:
    writer = csv.DictWriter(f, fieldnames=['row_number', 'data', 'error'])
    writer.writeheader()
    writer.writerows(failed_rows)

print(f"Success: {len(successful_rows)}, Failed: {len(failed_rows)}")
```
Now you’re writing CSV generation logic and error reporting systems.
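And that still leaves the "retry just the failed rows" question. A hedged sketch of the follow-up script you end up writing next, reusing session, url, and clean_row from above:

```python
import csv

# Re-run only the rows that failed last time, by matching row numbers
# from failed_rows.csv against the source file.
with open('failed_rows.csv', newline='') as f:
    failed_numbers = {int(r['row_number']) for r in csv.DictReader(f)}

with open('customers.csv', newline='') as f:
    for i, row in enumerate(csv.DictReader(f), start=2):
        if i in failed_numbers:
            response = session.post(url, json=clean_row(row))
            response.raise_for_status()
```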
5. Schema Changes
APIs evolve. Next month:
- Required field added (`customer_type`)
- Field renamed (`email` → `primary_email`)
- Nested object introduced (flat `address` → nested `address.street`, `address.city`)
Your script breaks:
```
HTTP 400 Bad Request

{
  "error": "Missing required field: customer_type"
}
```
You need to update the script for every API change. Multiply this by 10 APIs and you’re maintaining a mess.
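Concretely, the flat-to-nested address change alone means rewriting the payload-building code, something like the sketch below. The source column names and the default customer_type value are assumptions for illustration, not part of any real API.

```python
def to_new_shape(row: dict) -> dict:
    """Rebuild the payload after the schema change (illustrative sketch only)."""
    payload = dict(row)
    payload['customer_type'] = payload.get('customer_type', 'standard')  # new required field, default assumed
    payload['primary_email'] = payload.pop('email', None)                # email -> primary_email
    payload['address'] = {                                               # flat address -> nested object
        'street': payload.pop('address', None),
        'city': payload.pop('city', None),
    }
    return payload
```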
6. No Progress Tracking
You run the script:
```
python migrate.py
```
And wait. Is it working? How many rows processed? How long until it finishes?
You add progress tracking:
```python
from tqdm import tqdm

rows = list(csv.DictReader(open('customers.csv')))
for row in tqdm(rows, desc="Importing"):
    # ... import logic
```
7. The “Works on My Machine” Problem
Your script runs fine on your laptop. In production:
- Different Python version
- Missing dependencies
- Different CSV encoding (UTF-8 vs Windows-1252)
- Environment variables not set
- Firewall blocks outbound HTTPS
You’re now writing Dockerfiles and deployment documentation.
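The CSV-encoding bullet alone is a classic: the file opens fine on your laptop and breaks in production. A minimal sketch of the defensive reading you end up adding, assuming a UTF-8-then-Windows-1252 fallback order:

```python
import csv

def read_rows(path):
    """Try UTF-8 first, then fall back to Windows-1252 (typical for Excel exports)."""
    for encoding in ("utf-8-sig", "cp1252"):
        try:
            with open(path, newline="", encoding=encoding) as f:
                return list(csv.DictReader(f))
        except UnicodeDecodeError:
            continue
    raise ValueError(f"Could not decode {path} with any known encoding")
```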
The Real Cost of Custom Scripts
Let’s do the math.
Writing the script:
- Initial version: 2 hours
- Authentication: 3 hours (OAuth is painful)
- Error handling: 2 hours
- Testing and debugging: 4 hours
- Documentation: 1 hour
Total: 12 hours
Maintaining the script:
- API changes (3x per year): 2 hours each = 6 hours/year
- Bug fixes when edge cases appear: 4 hours/year
- Helping colleagues use it: 2 hours/year
Total annual cost: 12 hours/year
Scaling to multiple migrations:
- You need 10 different imports per year
- Each requires a similar script
- Total time: 120 hours initial + 120 hours annual maintenance
At $75/hour (average IT staff rate), that’s:
- $9,000 first year
- $9,000 every subsequent year
And this assumes no major issues or data loss incidents.
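For transparency, here is the arithmetic behind those figures, with the hourly rate and volumes above treated as assumptions rather than benchmarks:

```python
# Back-of-the-envelope cost of maintaining custom import scripts
HOURLY_RATE = 75        # $/hour, assumed average IT staff rate
imports_per_year = 10
initial_hours = 12      # per script: write, auth, error handling, testing, docs
maintenance_hours = 12  # per script, per year: API changes, bug fixes, support

build_cost = imports_per_year * initial_hours * HOURLY_RATE        # $9,000 first year
maintain_cost = imports_per_year * maintenance_hours * HOURLY_RATE  # $9,000 every year after
print(f"Build: ${build_cost:,}  Maintain: ${maintain_cost:,}/year")
```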
What Enterprise Teams Do Instead
Smart IT operations teams use spec-driven import tools that handle the complexity automatically.
The Spec-Driven Approach
Instead of writing code, you:
1. Point at an OpenAPI spec - Tool reads the API definition
2. Upload CSV - Automatic separator detection
3. Map fields - Visual drag-and-drop interface
4. Submit - Batch processing with progress tracking
For a detailed walkthrough, see our complete CSV to API import guide for non-developers.
The tool handles:
✅ Authentication (reads security schemes from OpenAPI)
✅ Rate limiting (respects API limits)
✅ Data validation (enforces required fields, types)
✅ Error handling (detailed error reports with row numbers)
✅ Retry logic (exponential backoff on transient failures)
✅ Progress tracking (real-time updates)
✅ API changes (re-import spec, mappings still work)
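To make "spec-driven" concrete, here is a minimal sketch of the idea in plain Python, not any particular product's API: the OpenAPI document itself tells you how to authenticate and which fields are required, so none of that has to be hard-coded into every script. The file name and endpoint path below are placeholders.

```python
import json

# Read the API definition once; auth and validation come from the spec,
# not from hand-written code.
with open("openapi.json") as f:
    spec = json.load(f)

# How the API wants to be authenticated (API key, Bearer, OAuth, ...)
auth_schemes = spec["components"]["securitySchemes"]

# Which fields the target endpoint expects, and their types
request_schema = (
    spec["paths"]["/customers"]["post"]
        ["requestBody"]["content"]["application/json"]["schema"]
)

print(auth_schemes)
print(request_schema)
```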
Real-World Example: Enterprise SaaS Migration
Scenario: Migrating 250,000 customer records from legacy CRM to new API.
Custom script approach:
- Week 1: Write script (40 hours)
- Week 2: Debug authentication issues (15 hours)
- Week 3: Handle rate limiting (10 hours)
- Week 4: Fix data validation errors (20 hours)
- Week 5: Run migration, handle failures (15 hours)
Total: 100 hours over 5 weeks
Spec-driven tool approach:
- Day 1: Configure OpenAPI spec (1 hour)
- Day 1: Map fields (2 hours)
- Day 1: Test with 100 rows (1 hour)
- Day 2: Run full import (8 hours for 250k rows)
- Day 2: Fix 2% failed rows (2 hours)
Total: 14 hours over 2 days
Savings: 86 hours (86% reduction)
When Custom Scripts Still Make Sense
Use custom scripts when:
- One-time migration with simple data - 100 rows, no auth, flat structure
- Extremely custom transformations - Complex business logic, multi-source joins
- No API available - Direct database access required
- Learning exercise - Teaching team how APIs work
Don’t use custom scripts when:
- Migrating more than 1,000 rows
- API requires authentication
- Data has validation requirements
- You’ll need to do similar imports again
- API is documented with OpenAPI
Choosing an Import Tool
When evaluating import tools, look for:
Must-Have Features
✅ OpenAPI/Swagger support - Automatic endpoint discovery
✅ Visual field mapping - No code required
✅ Batch processing - Handle large datasets
✅ Error reporting - Detailed logs with row numbers
✅ Authentication support - API keys, Bearer tokens, OAuth
Nice-to-Have Features
✅ Auto-mapping - Suggest field mappings
✅ Retry logic - Automatic retry on failures
✅ Rate limiting - Respect API limits
✅ Template saving - Reuse configurations
✅ Webhook notifications - Alert when import completes
Enterprise Features
✅ Audit logs - Who ran what import, when
✅ Role-based access - Control who can import to which APIs
✅ Scheduled imports - Automate recurring imports
✅ API response storage - Keep records of created resources
✅ White-label - Embed in your own tools
Migration to Spec-Driven Tools
Already have custom scripts? Here’s how to transition:
Phase 1: New Imports
- Stop writing new scripts
- Use import tool for all new migrations
- Measure time savings
Phase 2: Replace High-Maintenance Scripts
- Identify scripts that break often
- Replace with tool configurations
- Document field mappings
Phase 3: Deprecate All Scripts
- Replace remaining scripts
- Archive old code
- Update documentation
The Bottom Line
Custom migration scripts fail because:
- They underestimate complexity (authentication, rate limiting, errors)
- They’re hard to maintain (API changes break scripts)
- They don’t scale (writing 10 scripts takes 10x the time)
- They lack observability (hard to debug, poor error messages)
Spec-driven import tools succeed because:
- They handle complexity automatically (authentication, retry logic, validation)
- They adapt to changes (re-import spec, mappings persist)
- They scale effortlessly (same tool for any OpenAPI-compliant API)
- They provide visibility (progress tracking, detailed error reports)
The math is simple: 14 hours with a tool vs 100 hours with scripts.
Your time is valuable. Stop writing migration scripts and start importing data.
Ready to ditch custom scripts? CSVImport handles authentication, rate limiting, validation, and error reporting automatically. Try the demo or join the waitlist for early access.