Basic dlt
When working with dlt, you should think in terms of two main components: your source and your destination. In our case, the source will be a simple list of dictionaries defined in our code, and the destination will be the same DuckDB database we’ve been using throughout this course.
This setup allows us to explore the basics of how a dlt pipeline works without adding complexity. Once you're comfortable with the mechanics, you can easily scale up to dynamic sources like APIs or cloud storage.
```python
import os

import dlt

data = [
    {"id": 1, "name": "Alice"},
    {"id": 2, "name": "Bob"},
]


@dlt.source
def simple_source():
    # A resource is a function that yields the data to be loaded.
    @dlt.resource
    def load_dict():
        yield data

    return load_dict


pipeline = dlt.pipeline(
    pipeline_name="simple_pipeline",
    destination=dlt.destinations.duckdb(os.getenv("DUCKDB_DATABASE")),
    dataset_name="mydata",
)

load_info = pipeline.run(simple_source())
```
The code above does the following:
- Creates a list containing two dicts called `data`.
- Uses the `dlt.source` decorator to define a source function; inside this source is a `dlt.resource`-decorated function that yields the list defined above.
- Creates a pipeline using `dlt.pipeline()`, which sets the pipeline name, the destination (DuckDB), and the name of the dataset as it will appear in DuckDB.
- Executes the pipeline with `pipeline.run()`.
We can execute this code by running the file:
```bash
python dlt_quick_start.py
```
Since dlt is a pure Python framework, no additional services or heavy dependencies are required; it runs natively in your Python environment.
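To confirm the load worked, you can query the database directly. The sketch below assumes the `DUCKDB_DATABASE` environment variable points at the same file the pipeline wrote to, and relies on dlt's default naming, where the dataset becomes a schema (`mydata`) and each resource becomes a table (`load_dict`):

```python
import os

import duckdb

# Assumes DUCKDB_DATABASE points at the same file the pipeline wrote to.
con = duckdb.connect(os.getenv("DUCKDB_DATABASE"))

# With dlt's default naming, the rows yielded by load_dict() land in mydata.load_dict.
print(con.sql("SELECT id, name FROM mydata.load_dict").fetchall())
```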
dlt Benefits
What should stand out most about the dlt approach is how much more ergonomic and streamlined it is compared to the code we previously wrote by hand. If you recall, when working directly with DuckDB, we had to manually manage several steps:
- Stage the data by writing it to a CSV file.
- Define the target table and schema in DuckDB ahead of time.
- Load the data using a COPY statement.
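For contrast, a rough sketch of those manual steps might look like the following; the file name, table name, and column types here are illustrative rather than the exact code from the earlier lesson:

```python
import csv

import duckdb

# Stage the data by writing it to a CSV file by hand.
with open("people.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["id", "name"])
    writer.writeheader()
    writer.writerows([{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}])

con = duckdb.connect("example.duckdb")

# Define the target table and its schema ahead of time.
con.sql("CREATE TABLE IF NOT EXISTS people (id INTEGER, name VARCHAR)")

# Load the staged file with a COPY statement.
con.sql("COPY people FROM 'people.csv' (FORMAT CSV, HEADER)")
```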
With dlt, all of these responsibilities are abstracted away. Once you define your destination, dlt takes care of the rest: schema and table creation, data type inference, and the actual data load are all handled for you. This greatly reduces boilerplate code and makes your ETL pipelines more maintainable and adaptable.
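If you want to see what dlt inferred, one option (assuming a recent dlt version, where the pipeline object exposes its default schema) is to print the schema after the run:

```python
# After pipeline.run(), dlt keeps the schema it inferred for the dataset.
# Printing it shows the tables, columns, and data types it created in DuckDB.
print(pipeline.default_schema.to_pretty_yaml())
```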