Automation · Deep dive 06
Data Pipelines & ETL
Data in the right shape at the right time. Pipelines that move, transform, and land data where it's useful — warehouse, dashboard, or the next automation. Reliable, observable, versioned.
What this covers
Production data pipelines: ingestion from source systems, transformation (SQL-first, typically dbt), landing in a warehouse (Postgres, BigQuery, Snowflake, ClickHouse), with proper tests, scheduling, and backfills.
Does this sound familiar?
- The 'Monday dashboards' require someone to manually refresh three CSVs first.
- Reports contradict each other because they pull from different sources.
- The 'data warehouse' is a Google Sheet with 60,000 rows.
- Nobody knows exactly when last night's ingest ran, or whether it ran at all.
- Adding a new source takes a week of SQL surgery because there's no pattern.
The customer payoff
What you get
What you feel once it’s running.
- A warehouse you trust — one answer per question.
- Scheduled, observable pipelines with retries and alerts.
- Transformations expressed in SQL that analysts can read and extend.
- Backfills + schema migrations that don't break production.
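The 'retries and alerts' promise above can be sketched in a few lines of plain Python. This is a minimal stdlib-only illustration, not the API of Airflow, Dagster, or Prefect; `run_with_retries`, `max_attempts`, and the `alert` callback are hypothetical names chosen for the example.

```python
import time


def run_with_retries(step, max_attempts=3, base_delay=1.0, alert=print):
    """Run one pipeline step; retry with exponential backoff, alert on final failure."""
    for attempt in range(1, max_attempts + 1):
        try:
            return step()
        except Exception as exc:
            if attempt == max_attempts:
                # Last attempt exhausted: page a human, then re-raise so the run fails loudly.
                alert(f"pipeline step failed after {attempt} attempts: {exc}")
                raise
            # Back off 1s, 2s, 4s, ... before the next attempt.
            time.sleep(base_delay * 2 ** (attempt - 1))
```

In a real orchestrator this logic comes for free (e.g. per-task retry settings); the point is that every scheduled step gets bounded retries and a loud failure path, never a silent skip.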
Phases
⏱ 6–12 weeks typical
How Data Pipelines & ETL actually runs.
01 Inventory
List every source and destination, including the hidden ones. Map ownership and freshness requirements.

02 Design
Pick the ingestion tool (Airbyte, Fivetran, or custom), warehouse shape (Postgres / BigQuery / Snowflake), and transformation layer (dbt).

03 Build + test
Pipelines in Airflow / Dagster / Prefect, transformations in dbt with tests. Data quality tests fail builds.

04 Migrate + cut
Old feeds stay running in parallel for 30 days. Cut over only when the dashboards match.
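The 'cut over only when dashboards match' check in the final phase can be made mechanical: total the key metrics per dimension from both feeds and cut over only when the diff is empty. A minimal sketch, assuming rows arrive as plain dicts; `compare_feeds` and its parameters are illustrative, not a named tool.

```python
def compare_feeds(old_rows, new_rows, key, metrics):
    """Compare per-key metric totals between the legacy feed and the new pipeline.

    Returns a list of (key_value, metric, old_total, new_total) mismatches;
    cut-over is safe only when this list is empty.
    """
    def totals(rows):
        out = {}
        for r in rows:
            bucket = out.setdefault(r[key], {m: 0 for m in metrics})
            for m in metrics:
                bucket[m] += r[m]
        return out

    old_t, new_t = totals(old_rows), totals(new_rows)
    mismatches = []
    for k in sorted(set(old_t) | set(new_t)):
        for m in metrics:
            o = old_t.get(k, {}).get(m, 0)
            n = new_t.get(k, {}).get(m, 0)
            if o != n:
                mismatches.append((k, m, o, n))
    return mismatches
```

In practice the same comparison runs as a scheduled SQL query against both feeds during the 30-day parallel window, with any non-empty result raising an alert.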
The hand-off
The package
What lands in your hands — every artefact, nothing hidden.
- Warehouse with documented schema
- Scheduled pipelines with alerts + retries
- dbt project with tests + lineage
- Runbook for common failures + backfills
- Migration guide from old sources
- BI tool connections (Metabase, Looker, or your choice)
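The backfill runbook item above rests on one idea: partition the data by day and re-run only the partitions that never landed, so re-runs stay idempotent. A stdlib-only sketch; `missing_partitions` and the `loaded` set are hypothetical names for the example.

```python
from datetime import date, timedelta


def missing_partitions(start, end, loaded):
    """List the daily partitions in [start, end] that have not yet landed.

    A backfill then re-runs only these days; because each day is loaded as a
    whole partition, re-running a day that already landed changes nothing.
    """
    days = []
    d = start
    while d <= end:
        if d.isoformat() not in loaded:
            days.append(d.isoformat())
        d += timedelta(days=1)
    return days
```

The `loaded` set would typically come from a metadata table the pipeline writes after each successful partition load.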
Straight questions
Q·01 Fivetran, Airbyte, or custom?
Fivetran if the sources are on their connector list and the cost is acceptable. Airbyte when you want self-hosting or non-standard sources. Custom pipelines when neither fits.

Q·02 Which warehouse?
Postgres for up to ~100 GB of analytical data; it stays cheap + simple. BigQuery for ad-hoc scale and Google Cloud shops. Snowflake for serious analytics workloads. ClickHouse when you need real-time and can tolerate the ops cost.

Q·03 Do you do real-time?
When warranted. Batch is almost always enough; we'll push back on real-time if it's for 'real-time' dashboards that nobody actually watches live.

Q·04 Can analysts maintain this?
dbt is chosen specifically so they can. Transformations are SQL they can read + extend. Ingestion + orchestration stay engineering-owned.

Q·05 What about GDPR / data retention?
Reviewed per pipeline. PII masking, retention policies, and access roles built in from day one of scoping.
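One common form of the PII masking mentioned in Q·05 is deterministic pseudonymisation: the same input always maps to the same token, so joins across tables still work, while the raw value never lands in the warehouse. A minimal stdlib sketch; `pseudonymise` and the salt handling are illustrative, not a prescription for your compliance setup.

```python
import hashlib


def pseudonymise(value, salt):
    """Deterministically mask a PII value (e.g. an email) with a salted SHA-256.

    The salt must be kept outside the warehouse; without it the token cannot
    be brute-forced back to the original value from common inputs.
    """
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()[:16]
```

Retention policies and access roles are separate controls layered on top; masking only covers what lands in analyst-visible tables.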
Ready to start
Data you can actually act on.
Two-day audit of sources + destinations, honest shape of the warehouse, clear build plan. Start with what your dashboards are lying about.
Start a pipeline engagement

The wider map
Every service page at a glance.
Each link below opens a dedicated page on that specific piece of one of our four service pillars. Jump sideways — different service, same way of working.
Digital Product Strategy
Service overview →
Web & Mobile Development
Service overview →
Business Automation
Service overview →
- 01 Workflow Automation
- 02 AI-Assisted Operations
- 03 Process Digitisation
- 04 Custom Internal Tools
- 05 System Integration & APIs
- 06 Data Pipelines & ETL — you’re here