Why Data Migration Scripts Fail (And What Works Instead)
Custom migration scripts break in production. Learn why enterprise IT teams are switching to spec-driven import tools for reliability and maintainability.
We’ve all been there. The stakeholder says: “We need to migrate 50,000 customer records from the old system to the new API by Monday.”
Your options:
- Manual data entry (absolutely not)
- Write a custom Python/Node script
- Pay a consultant thousands of dollars
- Use an existing import tool
Most IT teams choose option 2: write a script. It seems simple enough:
```python
# How hard can this be?
import csv, requests

for row in csv.DictReader(open('customers.csv')):
    requests.post('https://api.example.com/customers', json=row)
```
Three weeks later:
- The script works… sometimes
- Half the data imported with errors you didn’t notice
- The API changed and now your script is broken
- Your manager asks “Can we use this for next month’s product import?”
- You realize you’ll be maintaining this forever
This article explores why custom migration scripts fail and what enterprise teams are doing instead.
Table of Contents
- The Hidden Complexity of Data Migration
- The Real Cost of Custom Scripts
- What Enterprise Teams Do Instead
- When Custom Scripts Still Make Sense
- Choosing an Import Tool
- Migration to Spec-Driven Tools
The Hidden Complexity of Data Migration
That simple 5-line script ignores a dozen real-world problems:
1. Authentication Hell
APIs don’t just accept anonymous POST requests. They need:
- API keys in custom headers
- OAuth 2.0 tokens that expire every hour
- JWT tokens with specific claims
- Mutual TLS certificates
- Request signing (HMAC, AWS Signature v4)
Your script grows:
```python
import csv, requests, time, uuid, jwt
from requests_oauthlib import OAuth2Session

# 50 lines of authentication code
oauth = OAuth2Session(client_id)
token = oauth.fetch_token(token_url, ...)

# Token refresh logic
if token_expired():
    token = refresh_token()

# Make request with auth
response = requests.post(url, json=row, headers={
    'Authorization': f'Bearer {token["access_token"]}',
    'X-API-Key': api_key,
    'X-Request-ID': str(uuid.uuid4()),
})
```
Now your 5-line script is 100 lines, and you’re debugging OAuth flows instead of migrating data.
2. Rate Limiting
APIs have rate limits. Hit them and you get:
```
HTTP 429 Too Many Requests
Retry-After: 60
```
Your script crashes, or worse, continues with silent failures.
You add retry logic:
```python
import time

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

# Retry transient failures automatically
session = requests.Session()
retry = Retry(
    total=5,
    backoff_factor=1,
    status_forcelist=[429, 500, 502, 503, 504]
)
adapter = HTTPAdapter(max_retries=retry)
session.mount('http://', adapter)
session.mount('https://', adapter)

# Add rate limiting between requests
time.sleep(0.1)  # 10 requests per second
```
Now you’re implementing exponential backoff and jitter strategies. This is getting out of hand.
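For reference, hand-rolled backoff with jitter tends to end up looking something like the sketch below. The post_with_backoff helper and its parameters are illustrative, not part of the original script.

```python
import random
import time

def post_with_backoff(session, url, payload, max_attempts=5):
    """Retry on 429/5xx with exponential backoff plus jitter (illustrative helper)."""
    for attempt in range(max_attempts):
        response = session.post(url, json=payload)
        if response.status_code not in (429, 500, 502, 503, 504):
            return response
        # Honour Retry-After if the API sends it; otherwise back off exponentially
        wait = float(response.headers.get("Retry-After", 2 ** attempt))
        time.sleep(wait + random.uniform(0, 1))  # jitter spreads retries apart
    return response
```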
3. Data Validation
Your CSV has “dirty” data:
- Empty strings where API expects null
"N/A"in numeric fields- Dates in 15 different formats
- Country codes (US, USA, United States, america)
- Phone numbers with/without country codes
- Emails with trailing spaces
Your script needs validation:
```python
def clean_row(row):
    # Handle empty strings
    for key, value in row.items():
        if value == '' or value == 'N/A':
            row[key] = None

    # Parse dates
    if row.get('date'):
        row['date'] = parse_date(row['date'])  # Another 50 lines

    # Validate email
    if row.get('email'):
        row['email'] = row['email'].strip().lower()
        if not is_valid_email(row['email']):
            raise ValueError(f"Invalid email: {row['email']}")

    # Convert types
    if row.get('age'):
        row['age'] = int(row['age'])

    return row
```
You’re now writing a full data validation framework.
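To make that concrete, here is roughly what the parse_date helper referenced above turns into. The format list is an illustrative assumption; in practice it only ever grows.

```python
from datetime import datetime

# A sample of the formats real exports throw at you
DATE_FORMATS = ["%Y-%m-%d", "%d/%m/%Y", "%m/%d/%Y", "%d-%b-%Y", "%Y/%m/%d %H:%M"]

def parse_date(value: str) -> str:
    """Try each known format in turn and normalise to ISO 8601."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"Unrecognised date format: {value!r}")
```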
4. Error Handling
Imports fail. Some rows succeed, some fail. You need to know:
- Which rows failed?
- Why did they fail?
- How do I retry just the failed rows?
Your script needs error tracking:
```python
import csv

successful_rows = []
failed_rows = []

for i, row in enumerate(csv.DictReader(open('customers.csv')), start=2):
    try:
        row = clean_row(row)
        response = session.post(url, json=row)
        response.raise_for_status()
        successful_rows.append(i)
    except Exception as e:
        failed_rows.append({
            'row_number': i,
            'data': row,
            'error': str(e)
        })
        continue

# Write failed rows to CSV for manual review
with open('failed_rows.csv', 'w') as f:
    writer = csv.DictWriter(f, fieldnames=['row_number', 'data', 'error'])
    writer.writeheader()
    writer.writerows(failed_rows)

print(f"Success: {len(successful_rows)}, Failed: {len(failed_rows)}")
```
Now you’re writing CSV generation logic and error reporting systems.
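And that still leaves the "retry just the failed rows" question. A hedged sketch of the follow-up script you end up writing next, reusing session, url, and clean_row from above:

```python
import csv

# Re-run only the rows that failed last time, by matching row numbers
# from failed_rows.csv against the source file.
with open('failed_rows.csv', newline='') as f:
    failed_numbers = {int(r['row_number']) for r in csv.DictReader(f)}

with open('customers.csv', newline='') as f:
    for i, row in enumerate(csv.DictReader(f), start=2):
        if i in failed_numbers:
            response = session.post(url, json=clean_row(row))
            response.raise_for_status()
```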
5. Schema Changes
APIs evolve. Next month:
- Required field added (`customer_type`)
- Field renamed (`email` → `primary_email`)
- Nested object introduced (flat `address` → nested `address.street`, `address.city`)
Your script breaks:
```
HTTP 400 Bad Request

{
  "error": "Missing required field: customer_type"
}
```
You need to update the script for every API change. Multiply this by 10 APIs and you’re maintaining a mess.
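Concretely, the flat-to-nested address change alone means rewriting the payload-building code, something like the sketch below. The source column names and the default customer_type value are assumptions for illustration, not part of any real API.

```python
def to_new_shape(row: dict) -> dict:
    """Rebuild the payload after the schema change (illustrative sketch only)."""
    payload = dict(row)
    payload['customer_type'] = payload.get('customer_type', 'standard')  # new required field, default assumed
    payload['primary_email'] = payload.pop('email', None)                # email -> primary_email
    payload['address'] = {                                               # flat address -> nested object
        'street': payload.pop('address', None),
        'city': payload.pop('city', None),
    }
    return payload
```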
6. No Progress Tracking
You run the script:
```
python migrate.py
```
And wait. Is it working? How many rows processed? How long until it finishes?
You add progress tracking:
```python
from tqdm import tqdm

rows = list(csv.DictReader(open('customers.csv')))
for row in tqdm(rows, desc="Importing"):
    # ... import logic
```
7. The “Works on My Machine” Problem
Your script runs fine on your laptop. In production:
- Different Python version
- Missing dependencies
- Different CSV encoding (UTF-8 vs Windows-1252)
- Environment variables not set
- Firewall blocks outbound HTTPS
You’re now writing Dockerfiles and deployment documentation.
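The CSV-encoding bullet alone is a classic: the file opens fine on your laptop and breaks in production. A minimal sketch of the defensive reading you end up adding, assuming a UTF-8-then-Windows-1252 fallback order:

```python
import csv

def read_rows(path):
    """Try UTF-8 first, then fall back to Windows-1252 (typical for Excel exports)."""
    for encoding in ("utf-8-sig", "cp1252"):
        try:
            with open(path, newline="", encoding=encoding) as f:
                return list(csv.DictReader(f))
        except UnicodeDecodeError:
            continue
    raise ValueError(f"Could not decode {path} with any known encoding")
```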
The Real Cost of Custom Scripts
Let’s do the math.
Writing the script:
- Initial version: 2 hours
- Authentication: 3 hours (OAuth is painful)
- Error handling: 2 hours
- Testing and debugging: 4 hours
- Documentation: 1 hour
Total: 12 hours
Maintaining the script:
- API changes (3x per year): 2 hours each = 6 hours/year
- Bug fixes when edge cases appear: 4 hours/year
- Helping colleagues use it: 2 hours/year
Total annual cost: 12 hours/year
Scaling to multiple migrations:
- You need 10 different imports per year
- Each requires a similar script
- Total time: 120 hours initial + 120 hours annual maintenance
At $75/hour (average IT staff rate), that’s:
- $9,000 first year
- $9,000 every subsequent year
And this assumes no major issues or data loss incidents.
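For transparency, here is the arithmetic behind those figures, with the hourly rate and volumes above treated as assumptions rather than benchmarks:

```python
# Back-of-the-envelope cost of maintaining custom import scripts
HOURLY_RATE = 75        # $/hour, assumed average IT staff rate
imports_per_year = 10
initial_hours = 12      # per script: write, auth, error handling, testing, docs
maintenance_hours = 12  # per script, per year: API changes, bug fixes, support

build_cost = imports_per_year * initial_hours * HOURLY_RATE        # $9,000 first year
maintain_cost = imports_per_year * maintenance_hours * HOURLY_RATE  # $9,000 every year after
print(f"Build: ${build_cost:,}  Maintain: ${maintain_cost:,}/year")
```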
What Enterprise Teams Do Instead
Smart IT operations teams use spec-driven import tools that handle the complexity automatically.
The Spec-Driven Approach
Instead of writing code, you:
1. Point at an OpenAPI spec - Tool reads the API definition
2. Upload CSV - Automatic separator detection
3. Map fields - Visual drag-and-drop interface
4. Submit - Batch processing with progress tracking
For a detailed walkthrough, see our complete CSV to API import guide for non-developers.
The tool handles:
✅ Authentication (reads security schemes from OpenAPI)
✅ Rate limiting (respects API limits)
✅ Data validation (enforces required fields, types)
✅ Error handling (detailed error reports with row numbers)
✅ Retry logic (exponential backoff on transient failures)
✅ Progress tracking (real-time updates)
✅ API changes (re-import spec, mappings still work)
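To make "spec-driven" concrete, here is a minimal sketch of the idea in plain Python, not any particular product's API: the OpenAPI document itself tells you how to authenticate and which fields are required, so none of that has to be hard-coded into every script. The file name and endpoint path below are placeholders.

```python
import json

# Read the API definition once; auth and validation come from the spec,
# not from hand-written code.
with open("openapi.json") as f:
    spec = json.load(f)

# How the API wants to be authenticated (API key, Bearer, OAuth, ...)
auth_schemes = spec["components"]["securitySchemes"]

# Which fields the target endpoint expects, and their types
request_schema = (
    spec["paths"]["/customers"]["post"]
        ["requestBody"]["content"]["application/json"]["schema"]
)

print(auth_schemes)
print(request_schema)
```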
Real-World Example: Enterprise SaaS Migration
Scenario: Migrating 250,000 customer records from legacy CRM to new API.
Custom script approach:
- Week 1: Write script (40 hours)
- Week 2: Debug authentication issues (15 hours)
- Week 3: Handle rate limiting (10 hours)
- Week 4: Fix data validation errors (20 hours)
- Week 5: Run migration, handle failures (15 hours)
Total: 100 hours over 5 weeks
Spec-driven tool approach:
- Day 1: Configure OpenAPI spec (1 hour)
- Day 1: Map fields (2 hours)
- Day 1: Test with 100 rows (1 hour)
- Day 2: Run full import (8 hours for 250k rows)
- Day 2: Fix 2% failed rows (2 hours)
Total: 14 hours over 2 days
Savings: 86 hours (86% reduction)
When Custom Scripts Still Make Sense
Use custom scripts when:
- One-time migration with simple data - 100 rows, no auth, flat structure
- Extremely custom transformations - Complex business logic, multi-source joins
- No API available - Direct database access required
- Learning exercise - Teaching team how APIs work
Don’t use custom scripts when:
- Migrating more than 1,000 rows
- API requires authentication
- Data has validation requirements
- You’ll need to do similar imports again
- API is documented with OpenAPI
Choosing an Import Tool
When evaluating import tools, look for:
Must-Have Features
✅ OpenAPI/Swagger support - Automatic endpoint discovery
✅ Visual field mapping - No code required
✅ Batch processing - Handle large datasets
✅ Error reporting - Detailed logs with row numbers
✅ Authentication support - API keys, Bearer tokens, OAuth
Nice-to-Have Features
✅ Auto-mapping - Suggest field mappings
✅ Retry logic - Automatic retry on failures
✅ Rate limiting - Respect API limits
✅ Template saving - Reuse configurations
✅ Webhook notifications - Alert when import completes
Enterprise Features
✅ Audit logs - Who ran what import, when
✅ Role-based access - Control who can import to which APIs
✅ Scheduled imports - Automate recurring imports
✅ API response storage - Keep records of created resources
✅ White-label - Embed in your own tools
Migration to Spec-Driven Tools
Already have custom scripts? Here’s how to transition:
Phase 1: New Imports
- Stop writing new scripts
- Use import tool for all new migrations
- Measure time savings
Phase 2: Replace High-Maintenance Scripts
- Identify scripts that break often
- Replace with tool configurations
- Document field mappings
Phase 3: Deprecate All Scripts
- Replace remaining scripts
- Archive old code
- Update documentation
The Bottom Line
Custom migration scripts fail because:
- They underestimate complexity (authentication, rate limiting, errors)
- They’re hard to maintain (API changes break scripts)
- They don’t scale (writing 10 scripts takes 10x the time)
- They lack observability (hard to debug, poor error messages)
Spec-driven import tools succeed because:
- They handle complexity automatically (authentication, retry logic, validation)
- They adapt to changes (re-import spec, mappings persist)
- They scale effortlessly (same tool for any OpenAPI-compliant API)
- They provide visibility (progress tracking, detailed error reports)
The math is simple: 14 hours with a tool vs 100 hours with scripts.
Your time is valuable. Stop writing migration scripts and start importing data.
Ready to ditch custom scripts? CSVImport handles authentication, rate limiting, validation, and error reporting automatically. Try the demo or join the waitlist for early access.