How to Use Seed Files in a dbt-Core Project

Seed files are a powerful feature in dbt-core that allow you to include static, structured data directly in your analytics project. They’re especially useful for reference tables, mapping datasets, or any data that doesn’t change often but is critical for analysis. In this post, we’ll explore what seed files are, how to use them in a dbt-core project, and best practices for managing them.

What Are Seed Files?

Seed files are CSV files stored in your dbt project that dbt can load into your database as tables. These files are perfect for managing small, static datasets, such as:

  • State or country codes
  • Currency conversion rates
  • Default configuration values for models

Seeds are their own resource type in dbt, but they behave much like models: you can version control them, reference them in downstream transformations with ref(), and configure them, including the data type of each column. In this post, I use a small dataset containing data about Pokemon cards.

Setting Up Seed Files in dbt-Core

Create the Seed File
In your dbt project, create a directory named seeds/ if it doesn’t already exist. Add your CSV file to this directory and make sure it has a header row and a consistent delimiter. For example: seeds/PokemonCardsCSV.csv. The file name without the .csv extension becomes the seed’s name in dbt.

Define Your Seeds in dbt_project.yml
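You can configure seeds for the whole project or on a per-file basis in your dbt_project.yml. The example below applies to every seed in the project (here named DBTMaterials): it sets a custom schema and tells dbt that the CSV files use a semicolon as delimiter. Note that with dbt’s default schema naming, a custom schema is appended to the target schema (for example analytics_ref rather than ref), and that the delimiter config requires dbt-core 1.7 or later.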

# Configuring seeds

seeds:
  DBTMaterials:
    +schema: ref
    +delimiter: ';'
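
To configure a single file instead, nest the seed's name under the project key. The following is a minimal sketch, assuming the project is named DBTMaterials as above and that only PokemonCardsCSV.csv is semicolon-delimited:

```yaml
# dbt_project.yml (sketch; assumes the project name DBTMaterials from the example above)
seeds:
  DBTMaterials:
    +schema: ref            # applies to every seed in the project
    PokemonCardsCSV:        # seed name = CSV file name without the extension
      +delimiter: ';'       # applies to this file only (dbt-core 1.7 or later)
```

Settings defined closer to the individual seed take precedence over the project-wide ones, so other seeds keep the default comma delimiter.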

Run Your Seeds
To load the seed files into your database, use the dbt seed command:

dbt seed

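By default, dbt creates one table per seed, named after the CSV file (without the extension), and infers each column’s data type from the values in the file. If a table already exists, dbt seed truncates and reloads it; when the columns of a CSV change, run dbt seed --full-refresh to drop and recreate the table.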

Reference Seed Data in Models
Once the seed file is loaded, you can reference it in your SQL models with ref(), just as you would reference another model. For example:

SELECT
    *
FROM {{ ref("PokemonCardsCSV") }}

Best Practices for Using Seed Files

Keep Seed Files Small
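Seed files are meant for small, static datasets. Avoid using them for large or frequently changing data: every dbt seed run reloads the whole file, and large CSVs bloat your repository. Load that kind of data with your ingestion tooling and declare it as a source instead.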

Use Version Control
Like other dbt assets, seed files should be version-controlled. This ensures changes to reference data are tracked and auditable.

Validate Data
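Ensure your seed files are well-formatted and free of errors. Invalid data in a seed file propagates to every model that references it. You can attach dbt tests such as unique and not_null to seed columns, just as you would for models.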
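
As a rough sketch of what such checks could look like, the properties file below attaches two built-in tests to a seed column. The file name seeds/properties.yml and the column card_id are assumptions for illustration; they are not taken from the actual dataset.

```yaml
# seeds/properties.yml (hypothetical file name)
version: 2

seeds:
  - name: PokemonCardsCSV
    columns:
      - name: card_id        # hypothetical column name
        tests:               # dbt-core 1.8 and later also accept the key data_tests
          - unique           # fails if a card appears more than once
          - not_null         # fails if the identifier is missing
```

After loading the seed, dbt test --select PokemonCardsCSV runs only the tests defined on that seed.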

Apply Data Governance
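Set explicit data types for seed columns with the column_types config, either in dbt_project.yml or in a properties file, instead of relying on type inference. Document the seed and its columns with descriptions in a properties file (for example seeds/properties.yml).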
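
A minimal sketch of such a properties file follows. The file name, the column names card_id and hp, and the data types are assumptions for illustration; use the columns from your own CSV and type names that your warehouse supports.

```yaml
# seeds/properties.yml (hypothetical file name)
version: 2

seeds:
  - name: PokemonCardsCSV
    description: "Reference data about Pokemon cards, loaded from a CSV file."
    config:
      column_types:
        card_id: varchar(32)   # hypothetical column; forces text instead of an inferred type
        hp: integer            # hypothetical column
    columns:
      - name: card_id
        description: "Identifier of the card."
      - name: hp
        description: "Hit points printed on the card."
```

The same types can be set in dbt_project.yml with +column_types under the seed's name. A seed can only be described once across all properties files, so keep its types, descriptions and tests in a single entry.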

Automate with CI/CD
Include the dbt seed command (or dbt build, which also loads seeds) in your CI/CD pipeline so that every environment is built from the same reference data.
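
How this looks depends entirely on your CI system and warehouse. As one possible sketch, assuming GitHub Actions, a Postgres warehouse, a CI profile stored in a ci/ directory, and a repository secret named DBT_PASSWORD (all of these are assumptions), a workflow could look like this:

```yaml
# .github/workflows/dbt.yml (hypothetical path and workflow)
name: dbt-ci
on: pull_request

jobs:
  dbt:
    runs-on: ubuntu-latest
    env:
      DBT_PROFILES_DIR: ./ci                       # assumed location of a CI profiles.yml
      DBT_PASSWORD: ${{ secrets.DBT_PASSWORD }}    # hypothetical secret read by that profile
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install dbt-core dbt-postgres     # swap in the adapter for your warehouse
      - run: dbt seed --full-refresh               # rebuild seed tables from the CSVs
      - run: dbt run                               # build the models that ref() the seeds
      - run: dbt test                              # run tests on seeds and models
```

Running dbt seed before dbt run matters here, because models that reference a seed fail if its table does not exist yet.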


Conclusion

Seed files in dbt-core provide a simple yet effective way to manage static datasets within your project. By following the steps and best practices outlined here, you can leverage seeds to make your dbt project more robust and maintainable. Whether you’re mapping country codes or managing conversion rates, seed files can save you time and keep your analytics pipeline clean and efficient.