iMessage Pipeline Troubleshooting Guide
Version: 1.0 Last Updated: 2025-10-19 Project: iMessage Timeline Refactor
Table of Contents
- Date and Timezone Issues
- Missing Media Files
- Rate Limiting and API Errors
- Checkpoint and Resume Failures
- Validation Errors
- Performance Issues
- Common Error Messages
Date and Timezone Issues
Problem: "Date must end with Z suffix (UTC)"
Symptom: Validation fails with error about missing Z suffix.
❌ Validation failed:
- date: Date must end with Z suffix (UTC)
Cause: Non-UTC timezone in CSV data or manual edits.
Solution:
# Check the problematic message in JSON
jq '.messages[] | select(.date | endswith("Z") | not)' normalized.json
# Convert dates to UTC during ingest
chatline ingest-csv \
--input messages.csv \
--output ingested.json \
--force-utc
Prevention: Always use UTC dates. The pipeline enforces ISO 8601 with Z suffix.
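For reference, a minimal sketch of this kind of check (the regex and helper are illustrative approximations, not the pipeline's actual schema):
// Illustrative strict ISO 8601 UTC check; the real validator may differ.
const ISO_UTC = /^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}Z$/

function assertUtcDate(date: string): void {
  if (!ISO_UTC.test(date)) {
    throw new Error(`Date must end with Z suffix (UTC): ${date}`)
  }
}

assertUtcDate('2024-12-31T23:59:59.000Z') // passes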
Problem: Apple Epoch Conversion Errors
Symptom: Dates appear as year 1970 or 2159.
Date: 1970-01-01T00:00:00.000Z (should be 2024)
Cause:
- 1970: Treating Apple epoch as Unix epoch
- 2159: Apple epoch seconds interpreted as milliseconds
Apple Epoch Details:
- Apple epoch: seconds since 2001-01-01 00:00:00 UTC
- Valid range: 0 to ~5,000,000,000 (year 2159)
- Example: 757,296,000 = 2024-12-31 00:00:00 UTC
Solution:
// ✅ Correct: Add APPLE_EPOCH_SECONDS before converting
const APPLE_EPOCH_SECONDS = 978_307_200
const unixMs = (appleEpochSeconds + APPLE_EPOCH_SECONDS) * 1000
const date = new Date(unixMs).toISOString()
// ❌ Wrong: Treating as Unix timestamp
const date = new Date(appleEpochSeconds * 1000).toISOString()
Verification:
# Check date ranges in output
jq '.messages | map(.date) | unique | sort | .[0], .[-1]' normalized.json
# Should show reasonable date range:
# "2024-01-01T00:00:00.000Z"
# "2024-12-31T23:59:59.000Z"
Problem: DST Boundaries Cause Duplicate/Missing Messages
Symptom: Messages near DST transitions appear duplicated or missing.
DST Transition Times (varies by region):
- US: March 2am → 3am (spring), November 2am → 1am (fall)
- Australia: October 2am → 3am (spring), April 3am → 2am (fall)
Example Problem:
2024-03-10 02:30 local time (skipped entirely by the US spring-forward transition)
2024-11-03 01:30 local time (occurs twice during the US fall-back transition)
Solution: The pipeline uses UTC everywhere to avoid DST issues.
# Verify all dates are UTC
jq '.messages[].date | select(endswith("Z") | not)' normalized.json
# Should return no results (empty output)
If CSV has local times:
# Convert during ingest with timezone offset
chatline ingest-csv \
--input messages.csv \
--output ingested.json \
--source-timezone "America/New_York"
Problem: Leap Second Handling
Symptom: Validation fails near leap second timestamps.
❌ Invalid date: 2024-12-31T23:59:60.000Z
Cause: Leap seconds (23:59:60) are valid in UTC, but JavaScript's Date parser and the pipeline's schema reject them.
Solution: Normalize to 23:59:59.000Z:
// The pipeline handles this automatically in date-converters.ts
export function normalizeLeapSecond(dateString: string): string {
  return dateString.replace(/T23:59:60/, 'T23:59:59')
}
Verification:
# Check for leap second timestamps
grep "23:59:60" normalized.json
# Should be normalized to 23:59:59
Missing Media Files
Problem: "Attachment not found at path"
Symptom: Media messages missing files.
⚠ Missing files: 142/845
- /Users/you/Library/Messages/Attachments/aa/10/IMG_1234.heic
- /Users/you/Library/Messages/Attachments/bb/20/audio.m4a
Common Causes:
- Attachments deleted from disk
- Wrong attachment root directory
- Relative paths in CSV export
- External storage not mounted
Solution 1: Configure Multiple Attachment Roots
chatline ingest-csv \
--input messages.csv \
--output ingested.json \
--attachment-roots \
~/Library/Messages/Attachments \
/Volumes/Backup/old-messages/Attachments \
/Volumes/External/iMessage-Archive
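Conceptually, each missing path is retried against every configured root; a sketch of that lookup (an illustrative helper, not the pipeline's actual resolver):
// Illustrative attachment lookup across multiple roots. Real attachments
// live in hashed subdirectories (e.g., aa/10/), so matching by basename
// alone is a simplification.
import { existsSync } from 'node:fs'
import { basename, join } from 'node:path'

function resolveAttachment(originalPath: string, roots: string[]): string | null {
  if (existsSync(originalPath)) return originalPath
  const name = basename(originalPath)
  for (const root of roots) {
    const candidate = join(root, name)
    if (existsSync(candidate)) return candidate
  }
  return null // caller records metadata.fileStatus = "missing"
}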
Solution 2: Check Missing Files Report
# Generate detailed missing files report
jq '[.messages[] | select(.messageKind == "media" and .media.path == null) | {
  guid: .guid,
  filename: .media.filename,
  originalPath: .metadata.originalPath
}]' ingested.json > missing-files.json
# Count missing by extension
jq 'group_by(.filename | split(".") | .[-1]) | map({
  extension: (.[0].filename | split(".") | .[-1]),
  count: length
})' missing-files.json
Solution 3: Locate Files Manually
#!/bin/bash
# find-missing-attachments.sh
# Read missing files from report
jq -r '.[].filename' missing-files.json | while read -r filename; do
  echo "Searching for $filename..."
  # Search common locations
  find ~/Library/Messages/Attachments -name "$filename" 2>/dev/null
  find ~/Desktop -name "$filename" 2>/dev/null
  find /Volumes -name "$filename" 2>/dev/null
done
Solution 4: Skip Missing Files
# Continue pipeline with provenance metadata for missing files
chatline normalize-link \
--input ingested.json \
--output normalized.json \
--keep-missing-files \
--verbose
# Missing files will have:
# media.path = null
# metadata.lastSeenPath = "/original/path/IMG_1234.heic"
# metadata.fileStatus = "missing"
Problem: HEIC/TIFF Conversion Fails
Symptom: Preview generation errors during enrichment.
❌ Failed to convert HEIC to JPG: IMG_5678.heic
Error: sharp: Input buffer contains unsupported image format
Cause:
- Corrupted HEIC files
- Unsupported HEIC variant (e.g., multi-image sequences)
- Missing libheif codec
Solution 1: Verify Sharp Installation
# Reinstall sharp with native dependencies
pnpm remove sharp
pnpm add sharp --force
# Verify codec support (the heif entry should report input support)
node -e "console.log(require('sharp').format.heif)"
Solution 2: Convert Manually
# Use macOS sips command
sips -s format jpeg IMG_5678.heic --out IMG_5678.jpg
# Or ImageMagick
convert IMG_5678.heic IMG_5678.jpg
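The same conversion can be scripted with sharp, skipping files it cannot decode (a sketch, assuming sharp was built with libheif support):
// Illustrative HEIC→JPEG conversion with sharp.
import sharp from 'sharp'

async function heicToJpeg(input: string, output: string): Promise<boolean> {
  try {
    await sharp(input).jpeg({ quality: 85 }).toFile(output)
    return true
  } catch (err) {
    // Corrupt or unsupported variants land here; log and move on.
    console.warn(`Skipping ${input}: ${(err as Error).message}`)
    return false
  }
}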
Solution 3: Skip Failed Conversions
chatline enrich-ai \
--input normalized.json \
--output enriched.json \
--skip-failed-conversions \
--log-conversion-errors ./conversion-errors.json
Problem: Absolute vs Relative Paths
Symptom: Enrichment fails with "ENOENT: no such file or directory".
❌ Error: ENOENT: no such file or directory, open 'Attachments/IMG_1234.heic'
Cause: CSV exports contain relative paths, not absolute paths.
Solution: The pipeline enforces absolute paths:
# Path validator converts relative → absolute
chatline normalize-link \
--input ingested.json \
--output normalized.json \
--attachment-roots ~/Library/Messages/Attachments
# Verifies all paths are absolute
# media.path = "/Users/you/Library/Messages/Attachments/aa/10/IMG_1234.heic"
Manual verification:
# Check for relative paths
jq '.messages[] | select(.messageKind == "media") | select(.media.path | startswith("/") | not)' normalized.json
# Should return no results
Rate Limiting and API Errors
Problem: 429 Too Many Requests (Gemini API)
Symptom: Enrichment stops with rate limit errors.
❌ Gemini API Error: 429 Too Many Requests
Retry-After: 60 seconds
Message: "Quota exceeded for quota metric 'Generate Content API requests per minute'"
Cause: Exceeded Gemini free tier limits (15 RPM).
Solution 1: Increase Delay
# Increase delay between requests to 4000ms (15 req/min)
chatline enrich-ai \
--input normalized.json \
--output enriched.json \
--rate-limit 4000
Solution 2: Upgrade API Tier
Free tier limits:
- 15 RPM (requests per minute)
- 1500 RPD (requests per day)
- 1 million TPM (tokens per minute)
Upgrade to paid tier:
- Visit Google AI Studio
- Enable billing
- Increase to 60 RPM or 360 RPM
Solution 3: Use Checkpoints
# Let pipeline auto-resume after rate limit resets
chatline enrich-ai \
--input normalized.json \
--output enriched.json \
--checkpoint-interval 50 \
--resume \
--max-retries 5
The pipeline will:
- Hit the 429 error
- Wait for the Retry-After header duration (or fall back to exponential backoff)
- Write a checkpoint
- Retry automatically
Problem: Exponential Backoff Not Working
Symptom: Pipeline retries too quickly after 429 errors.
❌ Retry attempt 1/3 after 2s...
❌ Retry attempt 2/3 after 4s...
❌ Retry attempt 3/3 after 8s...
❌ Max retries exceeded
Cause: The retry loop is using pure exponential backoff instead of honoring the Retry-After header:
// Delay = 2^attempt seconds with ±25% jitter
const baseDelay = Math.pow(2, attemptNumber) * 1000 // ms
const jitter = baseDelay * 0.25 * (Math.random() * 2 - 1)
const delay = baseDelay + jitter
Solution: Respect Retry-After header:
# The pipeline automatically detects Retry-After
# Check rate-limiting.ts logs:
✓ 429 response received
→ Retry-After header: 60 seconds
→ Waiting 60000ms before retry...
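A sketch of that decision, assuming a fetch-style Response (the helper is illustrative, not rate-limiting.ts verbatim):
// Illustrative retry-delay selection: prefer Retry-After, else backoff.
function retryDelayMs(response: Response, attempt: number): number {
  const retryAfter = response.headers.get('Retry-After')
  if (retryAfter !== null) {
    const seconds = Number(retryAfter)
    if (Number.isFinite(seconds)) return seconds * 1000
  }
  // Fall back to exponential backoff with ±25% jitter.
  const base = Math.pow(2, attempt) * 1000
  return base + base * 0.25 * (Math.random() * 2 - 1)
}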
Manual override:
// In your config
{
  "gemini": {
    "rateLimitDelay": 2000,
    "maxRetries": 5,
    "backoffMultiplier": 3 // 3^n instead of 2^n
  }
}
Problem: Circuit Breaker Triggered
Symptom: Pipeline stops after consecutive failures.
❌ Circuit breaker OPEN after 5 consecutive failures
⏸ Halting enrichment to prevent cascading failures
Wait 60s for circuit to reset, then resume with --resume
Cause: Circuit breaker prevents hammering failing APIs.
Default thresholds:
- 5 consecutive failures → circuit opens
- 60 seconds → circuit resets (half-open state)
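A minimal sketch of this open/half-open behavior (illustrative, not the pipeline's source):
// Illustrative circuit breaker with the documented defaults.
class CircuitBreaker {
  private failures = 0
  private openedAt: number | null = null

  constructor(
    private threshold = 5, // consecutive failures before opening
    private resetMs = 60_000, // how long the circuit stays open
  ) {}

  canRequest(now = Date.now()): boolean {
    if (this.openedAt === null) return true
    // After resetMs, allow a probe request (half-open state).
    return now - this.openedAt >= this.resetMs
  }

  recordSuccess(): void {
    this.failures = 0
    this.openedAt = null
  }

  recordFailure(now = Date.now()): void {
    this.failures += 1
    if (this.failures >= this.threshold) this.openedAt = now
  }
}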
Solution 1: Wait for Reset
# Wait 60 seconds, then resume
sleep 60
chatline enrich-ai \
--input normalized.json \
--output enriched.json \
--resume
Solution 2: Adjust Threshold
# Increase threshold to 10 failures
chatline enrich-ai \
--input normalized.json \
--output enriched.json \
--circuit-breaker-threshold 10 \
--circuit-breaker-reset-ms 120000
Solution 3: Check API Status
# Verify Gemini API is operational
curl -H "Authorization: Bearer $GEMINI_API_KEY" \
https://generativelanguage.googleapis.com/v1beta/models
# Check status page
open https://status.cloud.google.com/
Problem: Firecrawl 503 Service Unavailable
Symptom: Link enrichment fails intermittently.
⚠ Firecrawl error: 503 Service Unavailable
→ Falling back to provider-specific parser (YouTube)
Solution: The pipeline has automatic fallbacks:
Firecrawl (primary)
↓ (fails)
YouTube Parser (if youtube.com URL)
↓ (fails)
Spotify Parser (if spotify.com URL)
↓ (fails)
Twitter Parser (if twitter.com/x.com URL)
↓ (fails)
Instagram Parser (if instagram.com URL)
↓ (fails)
Generic meta tag parser
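The chain amounts to trying parsers in order until one returns a preview; a sketch (types and names are illustrative):
// Illustrative fallback chain: try each parser in order, swallow failures.
interface LinkPreview {
  url: string
  title: string
}

type Parser = (url: string) => Promise<LinkPreview | null>

async function enrichLink(url: string, parsers: Parser[]): Promise<LinkPreview | null> {
  for (const parse of parsers) {
    try {
      const preview = await parse(url)
      if (preview) return preview
    } catch {
      // 503s and parse errors fall through to the next parser.
    }
  }
  return null
}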
Disable Firecrawl:
# Skip Firecrawl, use fallbacks only
chatline enrich-ai \
--input normalized.json \
--output enriched.json \
--enable-firecrawl false
Checkpoint and Resume Failures
Problem: "Config hash mismatch"
Symptom: Resume fails due to configuration change.
❌ Cannot resume: Config hash mismatch
Checkpoint: a3b5c8d0e2f4a6b8
Current: f9e1c3b7d5a9c1e3
Configuration has changed since checkpoint was created.
Delete checkpoint and restart, or restore original config.
Cause: Configuration changed between runs (e.g., different API key, rate limit settings).
Config hash includes:
- geminiApiKey
- firecrawlApiKey
- rateLimitDelay
- enableVisionAnalysis
- enableAudioTranscription
- enableLinkEnrichment
- imageCacheDir
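The checkpoint stores a short hex digest of these fields; one plausible shape (a sketch with node:crypto — the field list comes from above, everything else is illustrative):
// Illustrative config hash: stable field order, SHA-256, 16 hex chars.
import { createHash } from 'node:crypto'

const HASHED_FIELDS = [
  'geminiApiKey',
  'firecrawlApiKey',
  'rateLimitDelay',
  'enableVisionAnalysis',
  'enableAudioTranscription',
  'enableLinkEnrichment',
  'imageCacheDir',
] as const

function configHash(config: Record<string, unknown>): string {
  const canonical = JSON.stringify(HASHED_FIELDS.map((f) => [f, config[f] ?? null]))
  return createHash('sha256').update(canonical).digest('hex').slice(0, 16)
}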
Solution 1: Delete Checkpoint
# Remove checkpoint and start fresh
rm -f checkpoints/enrich-checkpoint-*.json
chatline enrich-ai \
--input normalized.json \
--output enriched.json
Solution 2: Restore Original Config
# Find original config hash in checkpoint
jq '.configHash, .createdAt' checkpoints/enrich-checkpoint-500.json
# Restore config to match checkpoint
# (e.g., restore .env, imessage-config.json)
Solution 3: Force Resume (DANGEROUS)
# Override hash check (may cause inconsistent enrichments)
chatline enrich-ai \
--input normalized.json \
--output enriched.json \
--resume \
--force-resume-ignore-config
⚠️ Warning: Force resuming with different config may cause:
- Duplicate enrichments with different models
- Missing enrichments if providers disabled
- Inconsistent rate limiting
Problem: Checkpoint File Corrupted
Symptom: Resume fails with parse error.
❌ Failed to load checkpoint: checkpoints/enrich-checkpoint-300.json
SyntaxError: Unexpected end of JSON input
Cause: Checkpoint write interrupted (Ctrl+C during write, disk full).
Solution 1: Load Previous Checkpoint
# List checkpoints
ls -lh checkpoints/enrich-checkpoint-*.json
# Use earlier checkpoint
cp checkpoints/enrich-checkpoint-200.json checkpoints/enrich-checkpoint-latest.json
chatline enrich-ai \
--input normalized.json \
--output enriched.json \
--resume
Solution 2: Repair JSON
# Check for truncation
tail -c 50 checkpoints/enrich-checkpoint-300.json
# Attempt repair with jq
jq '.' checkpoints/enrich-checkpoint-300.json
# If repair fails, delete and use previous checkpoint
Solution 3: Start from Scratch
# Remove all checkpoints
rm -rf checkpoints/
chatline enrich-ai \
--input normalized.json \
--output enriched.json \
--checkpoint-interval 25 # More frequent checkpoints
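Interrupted writes are typically prevented by writing to a temporary file and renaming it into place; a sketch of that pattern (illustrative, not necessarily what checkpoint.ts does):
// Illustrative atomic checkpoint write: a crash mid-write leaves only
// the .tmp file, never a truncated checkpoint.
import { renameSync, writeFileSync } from 'node:fs'

function writeCheckpointAtomic(path: string, checkpoint: unknown): void {
  const tmp = `${path}.tmp`
  writeFileSync(tmp, JSON.stringify(checkpoint, null, 2))
  renameSync(tmp, path) // rename is atomic on the same filesystem
}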
Problem: Resume Skips Messages
Symptom: Resume starts at wrong index.
✓ Loaded checkpoint: last index 500
→ Resuming from index 501
⚠ But message 500 was not fully enriched!
Cause: Checkpoint written before enrichment completed.
Resume guarantee: The pipeline resumes within ≤1 item of checkpoint.
// Resume logic (checkpoint.ts)
export function getResumeIndex(checkpoint: EnrichCheckpoint): number {
  // Resume at the next item after the last successfully processed one
  return checkpoint.lastProcessedIndex + 1
}
Solution: Check failed items list:
# Inspect checkpoint for failed items
jq '.failedItems' checkpoints/enrich-checkpoint-500.json
# Example output:
# [
# {
# "index": 245,
# "guid": "abc-123",
# "kind": "image_analysis",
# "error": "Gemini API timeout"
# }
# ]
# Re-enrich failed items manually
chatline enrich-ai \
--input normalized.json \
--output enriched.json \
--force-refresh \
--only-guids abc-123,def-456
Problem: Checkpoint Writes Too Frequently
Symptom: Enrichment slow due to frequent checkpoint writes.
✓ Checkpoint written: enrich-checkpoint-10.json (2.1 MB)
✓ Checkpoint written: enrich-checkpoint-20.json (4.2 MB)
✓ Checkpoint written: enrich-checkpoint-30.json (6.3 MB)
...
⏱ Total time: 3h 45m (expected 45m for 1000 messages)
Cause: Checkpoint interval too small (default 100).
Solution: Increase interval:
# Write checkpoint every 500 items instead of 100
chatline enrich-ai \
--input normalized.json \
--output enriched.json \
--checkpoint-interval 500
Tradeoffs:
- Smaller interval (10-50): More frequent backups, slower performance
- Larger interval (500-1000): Faster performance, lose more progress on failure
- Recommended: 100 items (default) balances speed and safety
Validation Errors
Problem: "messageKind='media' but media field missing"
Symptom: Schema validation fails.
❌ Validation failed: Message abc-123
- superRefine: messageKind='media' requires media field
Cause: CSV row classified as media but attachment path missing.
Solution: Check CSV row:
# Find problematic message
jq '.messages[] | select(.guid == "abc-123")' ingested.json
# Example:
# {
# "guid": "abc-123",
# "messageKind": "media", ← Classified as media
# "media": null ← But media field is null!
# }
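The rule behind this error is a cross-field refinement; a zod sketch (the schema here is a simplified illustration, not the pipeline's full schema):
// Illustrative zod schema enforcing messageKind='media' ⇒ media present.
import { z } from 'zod'

const messageSchema = z
  .object({
    guid: z.string().min(1),
    messageKind: z.enum(['text', 'media']),
    media: z
      .object({ filename: z.string(), path: z.string().nullable() })
      .nullable(),
  })
  .superRefine((msg, ctx) => {
    if (msg.messageKind === 'media' && msg.media === null) {
      ctx.addIssue({
        code: z.ZodIssueCode.custom,
        path: ['media'],
        message: "messageKind='media' requires media field",
      })
    }
  })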
Fix: Re-ingest with strict validation:
chatline ingest-csv \
--input messages.csv \
--output ingested.json \
--strict-media-validation \
--log-invalid ./invalid-rows.json
Problem: "Invalid GUID format"
Symptom: GUID validation fails.
❌ Validation error: guid must match pattern
guid: "invalid guid"
Expected: non-empty string
Cause: Malformed GUID in source data.
Valid GUID formats:
- CSV: csv:<rowNumber>:<partIndex> (e.g., csv:123:0)
- DB: <UUID> (e.g., 550e8400-e29b-41d4-a716-446655440000)
- Part: p:<index>/<parentGuid> (e.g., p:1/abc-123)
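A sketch of checks matching these three shapes (the regexes are illustrative approximations):
// Illustrative GUID validation for the three documented formats.
const GUID_PATTERNS: RegExp[] = [
  /^csv:\d+:\d+$/, // csv:<rowNumber>:<partIndex>
  /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i, // DB UUID
  /^p:\d+\/.+$/, // p:<index>/<parentGuid>
]

const isValidGuid = (guid: string): boolean =>
  GUID_PATTERNS.some((pattern) => pattern.test(guid))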
Solution: Regenerate GUIDs:
# Regenerate GUIDs during ingest
chatline ingest-csv \
--input messages.csv \
--output ingested.json \
--regenerate-guids
Problem: "Enrichment kind already exists"
Symptom: Idempotency check prevents enrichment.
⏭ Skipping image_analysis for media-456: already enriched
Cause: Re-running enrichment without --force-refresh.
Solution 1: Normal Behavior
This is expected behavior (idempotency). Re-running won't duplicate enrichments.
Solution 2: Force Re-enrichment
# Re-enrich all messages (overwrites existing)
chatline enrich-ai \
--input normalized.json \
--output enriched.json \
--force-refresh
Solution 3: Clear Specific Enrichments
// Clear only image_analysis, keep others
import { clearEnrichmentByKind } from '@nathanvale/chatline'
for (const msg of messages) {
  if (msg.messageKind === 'media') {
    msg.media.enrichment = clearEnrichmentByKind(
      msg.media.enrichment,
      'image_analysis',
    )
  }
}
Problem: camelCase vs snake_case Field Names
Symptom: Validation rejects snake_case fields.
❌ Validation error: Unexpected field 'message_date'
Use camelCase: 'messageDate'
Cause: The CSV export uses snake_case (e.g., message_date), but the schema requires camelCase.
Solution: The ingest stage auto-converts:
// ingest-csv.ts mapping
const mapping = {
  'Message Date': 'date', // CSV → camelCase
  'Delivered Date': 'dateDelivered',
  'Read Date': 'dateRead',
  'Is From Me': 'isFromMe',
  // ...
}
Verify conversion:
# Check message objects for snake_case fields
jq '.messages[0] | keys[]' ingested.json | grep "_"
# Should return no results (all camelCase)
Performance Issues
Problem: Enrichment Takes Too Long
Symptom: Processing 1000 messages takes >2 hours.
Expected performance:
- Ingest CSV: ~500 messages/second
- Normalize-Link: ~1000 messages/second
- Enrich-AI: ~2 messages/second (limited by API rate)
- Render: ~5000 messages/second
Solutions:
1. Reduce Rate Limit Delay
# Default 1000ms → change to 500ms
chatline enrich-ai \
--rate-limit 500
⚠️ Warning: May trigger 429 errors if too aggressive.
2. Skip Certain Enrichments
# Skip audio transcription (slow)
chatline enrich-ai \
--enable-audio false
3. Parallel Processing
# Split messages into batches (keep the {messages: ...} envelope)
jq '{messages: .messages[0:500]}' normalized.json > batch1.json
jq '{messages: .messages[500:1000]}' normalized.json > batch2.json
# Run in parallel
chatline enrich-ai --input batch1.json --output enriched1.json &
chatline enrich-ai --input batch2.json --output enriched2.json &
wait
# Merge results
jq -s '.[0].messages + .[1].messages | {messages: .}' enriched1.json enriched2.json > enriched.json
Problem: High Memory Usage
Symptom: Node.js crashes with OOM (out of memory).
<--- Last few GCs --->
FATAL ERROR: Reached heap limit Allocation failed - JavaScript heap out of memory
Cause: Loading entire message JSON into memory.
Solution: Increase Node heap size:
# Increase to 4GB
NODE_OPTIONS="--max-old-space-size=4096" \
chatline enrich-ai \
--input normalized.json \
--output enriched.json
Streaming mode (future enhancement):
# Process in chunks
chatline enrich-ai \
--input normalized.json \
--output enriched.json \
--streaming \
--chunk-size 100
Common Error Messages
Error: "EACCES: permission denied"
❌ Error: EACCES: permission denied, open '/Users/you/Library/Messages/chat.db'
Fix: Grant Full Disk Access on macOS:
- System Preferences → Security & Privacy → Privacy
- Full Disk Access → Add Terminal or your app
- Restart Terminal
Error: "Cannot find module 'sharp'"
❌ Error: Cannot find module 'sharp'
Fix:
pnpm install sharp --force
Error: "Invalid API key"
❌ Gemini API Error: 401 Unauthorized
Invalid API key
Fix:
# Verify API key is set
echo $GEMINI_API_KEY
# Re-export if needed
export GEMINI_API_KEY="AIzaSy..."
# Or update .env
echo "GEMINI_API_KEY=AIzaSy..." >> .env
Error: "No messages found in input"
❌ Error: No messages found in input file
File: ingested.json
Fix: Check JSON structure:
# Verify envelope format
jq '.messages | length' ingested.json
# Should return number (e.g., 1234)
# Not: null or error
Getting Help
If you encounter an error not covered here:
1. Enable verbose logging:
chatline enrich-ai \
--input normalized.json \
--output enriched.json \
--verbose \
--log-file debug.log
2. Check the logs:
tail -f debug.log
3. File an issue with:
- Error message (full stack trace)
- Command run
- Input file sample (first 10 messages)
- Environment (Node version, OS, pnpm version)
Related Documentation
- Usage Guide - How to run the pipeline
- Technical Specification - Architecture details
- Implementation Summary - File structure
Document Version: 1.0 Author: Generated from iMessage Pipeline implementation Last Updated: 2025-10-19