Prerequisites
Before using this skill, ensure the following are in place:
- Database credentials with SELECT and DELETE permissions on source tables
- Access to destination storage (archive table or cloud storage credentials)
- Network connectivity to cloud storage services if using S3/Azure/GCS
- Backup of database before first archival run
- Understanding of data retention requirements and compliance policies
- Monitoring tools configured to track archival job success
Instructions
Step 1: Define Archival Criteria
- Identify tables containing historical data for archival
- Define age threshold for archival (e.g., records older than 1 year)
- Determine additional criteria (status flags, record size, access frequency)
- Calculate the expected data volume to be archived (a query sketch follows this list)
- Document business requirements and compliance policies
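The sketch below estimates how many rows meet a one-year age threshold and roughly how much space they occupy. It is PostgreSQL-flavored (a PostgreSQL template appears under Resources), and the `orders` table, `created_at` column, and `status` filter are hypothetical placeholders for your own criteria.
```sql
-- Count candidate rows and estimate their on-disk size (hypothetical names)
SELECT COUNT(*)                                         AS candidate_rows,
       pg_size_pretty(SUM(pg_column_size(o.*))::bigint) AS approximate_size
FROM orders o
WHERE o.created_at < NOW() - INTERVAL '1 year'
  AND o.status = 'closed';
```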
Step 2: Choose Archival Destination
- Evaluate options: archive table in same database, separate archive database, or cold storage
- For cloud storage: select S3, Azure Blob, or GCS based on infrastructure
- Configure destination storage with appropriate security and access controls
- Configure compression for storage efficiency
- Define data format for archived records (CSV, Parquet, JSON)
Step 3: Create Archive Schema
- Design an archive table schema matching the source table structure (see the sketch after this list)
- Add metadata columns (archived_at, source_table, archive_reason)
- Create indexes on commonly queried archive columns
- For cloud storage: define bucket structure and naming conventions
- Test archive schema with sample data
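A minimal sketch of an archive table in a dedicated schema, assuming PostgreSQL and a hypothetical `public.orders` source table; the `LIKE` clause copies the source columns, and the extra metadata columns are those listed above.
```sql
-- Archive table mirrors the source and adds archival metadata (hypothetical names)
CREATE SCHEMA IF NOT EXISTS archive;

CREATE TABLE archive.orders_archive (
    LIKE public.orders INCLUDING DEFAULTS,
    archived_at    timestamptz NOT NULL DEFAULT NOW(),
    source_table   text        NOT NULL DEFAULT 'public.orders',
    archive_reason text
);

-- Index the columns archive queries are expected to filter on
CREATE INDEX idx_orders_archive_created_at  ON archive.orders_archive (created_at);
CREATE INDEX idx_orders_archive_archived_at ON archive.orders_archive (archived_at);
```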
Step 4: Implement Archival Logic
- Write SQL query to identify records meeting archival criteria
- Create extraction script to export records from source tables
- Implement transformation logic if archive format differs from source
- Build verification queries to confirm data integrity after archival
- Add transaction handling to ensure atomicity, deleting from the source only if the archive insert succeeds (sketched below)
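A sketch of the archive-then-delete transaction, assuming the hypothetical `orders` table and the archive table from Step 3. In production the delete should be keyed to the exact set of archived rows (as in the batched example under Error Handling) rather than re-evaluating the age predicate.
```sql
-- Atomic archival: the delete is committed only if the insert succeeds
BEGIN;

INSERT INTO archive.orders_archive
SELECT o.*, NOW(), 'public.orders', 'older than 1 year'
FROM public.orders o
WHERE o.created_at < NOW() - INTERVAL '1 year';

DELETE FROM public.orders
WHERE created_at < NOW() - INTERVAL '1 year';

COMMIT;
```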
Step 5: Execute Archival Process
- Run the archival in a staging environment first with a subset of data
- Verify archived data integrity and completeness
- Execute archival in production during low-traffic window
- Monitor database performance during archival operation
- Generate an archival report with record counts and storage savings (example query below)
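The report query below is one way to capture record counts and table sizes after a run, assuming the hypothetical tables from earlier steps. Note that PostgreSQL does not return deleted space to the operating system until the table is vacuumed, so the size figures are approximate.
```sql
-- Summarize today's archival run (hypothetical names)
SELECT
    (SELECT COUNT(*) FROM archive.orders_archive
      WHERE archived_at::date = CURRENT_DATE)                        AS rows_archived_today,
    pg_size_pretty(pg_total_relation_size('public.orders'))          AS source_table_size,
    pg_size_pretty(pg_total_relation_size('archive.orders_archive')) AS archive_table_size;
```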
Step 6: Automate Retention Policy
- Schedule periodic archival jobs, e.g., weekly or monthly (see the scheduling sketch after this list)
- Configure automated monitoring and alerting for job failures
- Implement cleanup of successfully archived records from source tables
- Set up expiration policies on archived data per compliance requirements
- Document archival schedule and retention periods
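One way to schedule both the archival job and archive expiration inside PostgreSQL is the pg_cron extension, sketched below. pg_cron is an assumption (cron, Airflow, or any other scheduler works equally well), and `archive_old_orders()` is a hypothetical procedure wrapping the Step 4 logic.
```sql
-- Run archival at 02:00 on the first of each month (assumes pg_cron is installed)
SELECT cron.schedule('orders-archival', '0 2 1 * *', $$CALL archive_old_orders()$$);

-- Expire archived rows past a hypothetical 7-year retention period
SELECT cron.schedule('orders-archive-expiry', '0 3 1 * *',
    $$DELETE FROM archive.orders_archive
      WHERE archived_at < NOW() - INTERVAL '7 years'$$);
```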
Output
This skill produces:
- Archival Scripts: SQL and shell scripts to extract, transform, and load data to the archive destination
- Archive Tables/Files: Structured storage containing historical records with metadata and timestamps
- Verification Reports: Row counts, data checksums, and integrity checks confirming successful archival
- Storage Metrics: Database size reduction, archive storage utilization, and cost savings estimates
- Archival Logs: Detailed logs of each archival run with timestamps, record counts, and any errors
Error Handling
Insufficient Storage Space:
- Check available disk space on the archive destination before execution (database-side size queries are sketched after this list)
- Implement storage monitoring and alerting
- Use compression to reduce archive size
- Clean up old archives per retention policy before new archival
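For in-database archive tables, the queries below check overall database size and the largest tables before a run. This is a database-side check only; free disk space on the host or the cloud destination still has to be verified outside SQL.
```sql
-- Current database size and the ten largest tables (PostgreSQL)
SELECT pg_size_pretty(pg_database_size(current_database())) AS database_size;

SELECT relname                                     AS table_name,
       pg_size_pretty(pg_total_relation_size(oid)) AS total_size
FROM pg_class
WHERE relkind = 'r'
ORDER BY pg_total_relation_size(oid) DESC
LIMIT 10;
```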
Data Integrity Issues:
- Run checksums on source data before and after archival
- Implement row-count verification between source and archive (see the sketch after this list)
- Keep source data until archive verification completes
- Rollback archive transaction if verification fails
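A sketch of row-count and checksum verification, again using the hypothetical `orders` tables. The `md5` over a deterministically ordered `string_agg` is one simple checksum approach; run the same checksum against the archive and compare the two values.
```sql
-- Row counts for the archival window (hypothetical names)
SELECT
    (SELECT COUNT(*) FROM public.orders
      WHERE created_at < NOW() - INTERVAL '1 year') AS source_rows,
    (SELECT COUNT(*) FROM archive.orders_archive)   AS archived_rows;

-- Deterministic checksum of the source rows; repeat against the archive and compare
SELECT md5(string_agg(id::text || '|' || created_at::text, ',' ORDER BY id)) AS source_checksum
FROM public.orders
WHERE created_at < NOW() - INTERVAL '1 year';
```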
Permission Denied Errors:
- Verify the database user has SELECT and DELETE on source tables and INSERT on archive tables (example grants after this list)
- Confirm cloud storage credentials have write permissions
- Check network security groups allow connections to cloud storage
- Document required permissions for archival automation
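The grants below illustrate the minimum a dedicated archival role typically needs, assuming the hypothetical role and table names used throughout; cloud storage permissions are managed separately in the provider's IAM.
```sql
-- Minimal grants for a dedicated archival role (hypothetical names)
GRANT SELECT, DELETE ON public.orders          TO archival_job;
GRANT USAGE          ON SCHEMA archive         TO archival_job;
GRANT INSERT         ON archive.orders_archive TO archival_job;
```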
Timeout During Large Archival:
- Split the archival into smaller batches by date range or primary-key window (see the batched sketch after this list)
- Run archival incrementally over multiple days
- Increase database timeout settings for archival sessions
- Schedule archival during maintenance windows with extended timeouts
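A sketch of batched archival using a writable CTE, assuming the same hypothetical tables. Each statement moves at most 10,000 rows and deletes only what was actually inserted, so it can be rerun until no candidate rows remain.
```sql
-- Archive and delete one bounded batch; rerun until zero rows are affected
WITH batch AS (
    SELECT id FROM public.orders
    WHERE created_at < NOW() - INTERVAL '1 year'
    ORDER BY id
    LIMIT 10000
),
moved AS (
    INSERT INTO archive.orders_archive
    SELECT o.*, NOW(), 'public.orders', 'older than 1 year'
    FROM public.orders o
    JOIN batch b ON b.id = o.id
    RETURNING id
)
DELETE FROM public.orders
WHERE id IN (SELECT id FROM moved);
```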
Resources
Archival Configuration Templates:
- PostgreSQL archival: {baseDir}/templates/postgresql-archive-config.yaml
- MySQL archival: {baseDir}/templates/mysql-archive-config.yaml
- S3 cold storage: {baseDir}/templates/s3-archive-config.yaml
- Azure Blob storage: {baseDir}/templates/azure-archive-config.yaml
Retention Policy Definitions: {baseDir}/policies/retention-policies.yaml
Archival Scripts Library: {baseDir}/scripts/archival/
- Extract to CSV script
- Extract to Parquet script
- S3 upload with compression
- Archive verification queries
Monitoring Dashboards: {baseDir}/monitoring/archival-dashboard.json
Cost Analysis Tools: {baseDir}/tools/storage-cost-calculator.py