Streamlining Dataset Migrations with Background Coding Agents at Spotify


At Spotify, migrating thousands of datasets to new storage or schema formats was once a painstaking manual process. To solve this, the engineering team developed background coding agents: automated programs that handle dataset transformations behind the scenes. These agents work with three key tools: Honk (a dependency tracking system), Backstage (an internal developer portal), and Fleet Management (for orchestrating large-scale deployments). The result is an efficient, error-reducing pipeline for migrating downstream consumers. Below, we explore how each component contributes and answer common questions about this approach.

1. What are background coding agents and how do they assist in dataset migrations?

Background coding agents are automated programs that run asynchronously to modify datasets without requiring manual intervention from engineers. At Spotify, these agents are designed to perform specific migration tasks—such as renaming columns, changing data types, or reformatting files—on downstream consumer datasets. They execute in the background, monitoring triggers from upstream changes and applying transformations to ensure compatibility. This reduces the need for human developers to write bespoke migration scripts for each dataset, cutting down on both time and the risk of human error. The agents can be reused across multiple migration waves, making them a scalable solution for the thousands of datasets Spotify manages. By offloading repetitive coding work to these agents, engineers can focus on higher‑level tasks, while migrations happen reliably and consistently.
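One such migration task, renaming a column, can be sketched as a small agent function. This is purely illustrative: the `Migration` dataclass and `apply_migration` helper are invented for this example, since Spotify's internal agent framework is not public.

```python
# Hypothetical sketch of a background migration agent task.
# The Migration dataclass and apply_migration are illustrative names,
# not part of any public Spotify API.
from dataclasses import dataclass


@dataclass
class Migration:
    dataset: str
    old_column: str
    new_column: str


def apply_migration(schema: dict, migration: Migration) -> dict:
    """Rename a column in a dataset schema, skipping work already done."""
    if migration.new_column in schema:
        # Already migrated: rerunning the agent leaves the schema unchanged.
        return schema
    updated = dict(schema)
    updated[migration.new_column] = updated.pop(migration.old_column)
    return updated


schema = {"track_id": "STRING", "play_count": "INT64"}
m = Migration("plays", "play_count", "stream_count")
migrated = apply_migration(schema, m)
```

Because the agent checks whether the target column already exists before transforming, a retried run is a safe no-op, which is exactly the property that lets the same agent be reused across many migration waves.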

Source: engineering.atspotify.com

2. How does Spotify's Honk system facilitate automated dataset migrations?

Honk is a system that tracks dataset dependencies and lineage across Spotify's data ecosystem. In the context of migrations, Honk plays a critical role by identifying all downstream consumers that rely on a given dataset. When a source dataset changes, Honk automatically determines which downstream datasets need to be migrated and in what order—preventing conflicts and data loss. It also stores metadata about each dependency, allowing background coding agents to fetch the correct transformation logic. Honk essentially acts as a central registry that orchestrates the migration flow: it notifies the agents when a change is detected, provides the exact transformation parameters, and verifies that the migration was successful. This tight integration eliminates manual tracking and ensures that migrations are fully automated from start to finish.
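The ordering problem Honk solves, migrating every upstream dataset before its consumers, is a topological sort over the dependency graph. A minimal sketch using Python's standard-library `graphlib` (the example graph is invented; Honk's actual data model is internal):

```python
# Sketch of dependency-ordered migration scheduling, the kind of ordering
# a lineage tracker like Honk enables. The graph below is illustrative.
from graphlib import TopologicalSorter

# Map each dataset to the upstream datasets it consumes.
deps = {
    "daily_plays": {"raw_events"},
    "artist_stats": {"daily_plays"},
    "charts": {"daily_plays", "artist_stats"},
}

# static_order() yields datasets so that every upstream dataset
# appears before any of its downstream consumers.
order = list(TopologicalSorter(deps).static_order())
```

Walking datasets in this order guarantees that a consumer is never migrated against a source that has not yet been updated, which is how conflicts and data loss are prevented.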

3. What role does Backstage play in managing and orchestrating dataset migrations?

Backstage is Spotify's internal developer portal, and for dataset migrations it serves as a control plane for visibility and governance. Engineers use Backstage to view the status of migrations across all datasets, inspect logs from background coding agents, and manually approve or roll back changes if needed. It provides a unified dashboard that aggregates data from Honk and Fleet Management, showing which datasets are migrated, in progress, or failed. Backstage also hosts service catalogs where teams can document their datasets and migration rules, making it easier to onboard new agents. By integrating with Backstage, the migration process becomes transparent: stakeholders can track progress, audit changes, and collaborate without needing deep technical knowledge of the underlying automation. This reduces friction and helps maintain data quality during large‑scale migrations.

4. How does Fleet Management support the migration process for thousands of datasets?

Fleet Management is the infrastructure layer that handles the scheduling, execution, and scaling of background coding agents. It deploys agents as jobs across Spotify's compute cluster, ensuring that migrations run efficiently even when thousands of datasets are involved. Fleet Management manages resource allocation—spinning up parallel agent instances to process independent datasets simultaneously, while respecting priority and dependency constraints defined by Honk. It also provides monitoring and retry logic: if an agent fails due to a transient error, Fleet Management automatically restarts it or escalates to an error queue. This resilience is crucial for large‑scale migrations, as it minimizes manual oversight. Combined with Honk's dependency tracking, Fleet Management ensures that migrations happen in the correct order and at a pace that doesn't overwhelm the system.
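The retry-then-escalate pattern described above can be sketched with a simple in-process worker pool. This is an assumption-laden toy: the real Fleet Management layer runs agents as cluster jobs, and `run_with_retries`, `migrate`, and the in-memory `error_queue` are invented for illustration.

```python
# Sketch of parallel execution with retries and an error queue,
# modeled on the behavior described above. All names are illustrative;
# a real system would use cluster jobs and a durable queue.
from concurrent.futures import ThreadPoolExecutor

MAX_RETRIES = 3
error_queue = []  # migrations escalated for human review after retries


def run_with_retries(migrate, dataset: str) -> bool:
    for _ in range(MAX_RETRIES):
        try:
            migrate(dataset)
            return True
        except Exception:
            continue  # transient error: retry the agent
    error_queue.append(dataset)  # persistent failure: escalate
    return False


def migrate(dataset: str) -> None:
    if dataset == "broken":
        raise RuntimeError("schema mismatch")


# Independent datasets are processed in parallel worker threads.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(
        pool.map(lambda d: run_with_retries(migrate, d),
                 ["plays", "broken", "charts"])
    )
```

Healthy datasets succeed on the first attempt, while the persistently failing one lands on the error queue instead of blocking the rest of the fleet.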


5. What are the main benefits of using background coding agents for downstream consumer migrations?

The primary benefit is dramatic time savings. Instead of engineers manually writing and running migration scripts for each downstream consumer, background coding agents automate the entire process. This reduces migration windows from weeks to hours. Additionally, errors caused by human oversight—such as forgetting to update a dependent dataset or applying the wrong schema—are virtually eliminated because agents follow predefined, tested logic. The system also scales effortlessly: as Spotify grows, new migrations are handled by simply defining new agent jobs without expanding the engineering team. Furthermore, the combination of Honk, Backstage, and Fleet Management provides end‑to‑end visibility and control, so teams retain governance without sacrificing speed. Ultimately, these agents allow Spotify to evolve its data infrastructure continuously while keeping downstream consumers running smoothly.

6. What challenges did Spotify face when migrating datasets and how were they addressed?

Before implementing background coding agents, Spotify's data teams struggled with manual, error‑prone migrations. Each dataset had to be individually assessed, requiring deep knowledge of its schema and consumers. Coordinating across teams was difficult, leading to frequent delays and production incidents when downstream datasets broke. The solution was to build an automated system that combined dependency tracking (Honk), developer portal visibility (Backstage), and scalable execution (Fleet Management). This required creating a standardized agent interface and ensuring agents could handle diverse data formats. A key challenge was designing the agents to be idempotent—so that rerunning a migration wouldn't cause double transformations. By iterating on the agent framework and integrating it tightly with the existing infrastructure, Spotify overcame these hurdles, turning dataset migrations into a quiet, background process that no longer demands constant human attention.
