"Big Data" bills too big?
It doesn't have to be this way.
Most analytical queries are actually small (think ~100MB, not terabytes), yet we're still defaulting to expensive cloud data warehouses for everything.
DuckDB's Hannes Mühleisen makes a compelling point: we're paying a "big data premium" for scale we rarely use.
The problem? Analytical workloads running in the wrong place. OLTP databases like Postgres excel at fast writes, not heavy analytics. Warehouses are overkill for smaller queries.
That’s where in-process analytics engines like DuckDB shine. Instead of overloading production databases or routing everything through warehouses, you can:
Offload mid-sized analytics (reporting, KPIs, rollups) to embedded engines
Query Parquet/CSV files directly with sub-second performance (see the sketch after this list)
Serve insights locally without provisioning clusters
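A minimal sketch of the second point, assuming a local Parquet file and hypothetical column names (region, revenue): DuckDB scans the file in-process with no load step and no server to provision.

```python
import duckdb

con = duckdb.connect()  # in-memory database, nothing to provision

# Aggregate straight off the Parquet file; no import/copy step needed.
result = con.execute("""
    SELECT region, SUM(revenue) AS total_revenue
    FROM read_parquet('sales_2024.parquet')
    GROUP BY region
    ORDER BY total_revenue DESC
""").fetchdf()

print(result)
```

At the ~100MB scale discussed above, a query like this typically finishes in well under a second on a laptop.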
The new pattern: pre-calculate insights in your warehouse, export to open formats like Parquet, then let lightweight engines handle interactive exploration and reporting.
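A sketch of that pattern, assuming the warehouse has already exported a pre-aggregated rollup to Parquet (the path and schema below are placeholders): DuckDB registers the export as a view and serves interactive drill-downs locally, without touching the warehouse again.

```python
import duckdb

con = duckdb.connect()

# Expose the exported rollup as a view so dashboards can slice it ad hoc.
con.execute("""
    CREATE VIEW daily_kpis AS
    SELECT * FROM read_parquet('exports/daily_kpis/*.parquet')
""")

# Interactive exploration runs entirely in-process.
top_products = con.execute("""
    SELECT product_id, SUM(orders) AS total_orders
    FROM daily_kpis
    WHERE order_date >= DATE '2024-01-01'
    GROUP BY product_id
    ORDER BY total_orders DESC
    LIMIT 10
""").fetchdf()
```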
Food for thought: use database logs to identify your top 5 analytical queries running in the wrong place. Pilot one workload replacement: swap a pandas report or a costly dashboard with Parquet + DuckDB. Measure the wins: query speedup, CPU load reduction, and warehouse cost savings. Crack a bottle of champagne. 🍾
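A hedged sketch of that pilot measurement, with placeholder file names, columns, and pandas logic standing in for whichever report you pick: time the existing pandas approach against the same rollup pushed down into DuckDB over Parquet.

```python
import time
import duckdb
import pandas as pd

def timed(label, fn):
    # Run fn once and report wall-clock time.
    start = time.perf_counter()
    result = fn()
    print(f"{label}: {time.perf_counter() - start:.2f}s")
    return result

# Existing approach: load everything into pandas, then aggregate in memory.
timed("pandas report", lambda: (
    pd.read_parquet("events.parquet")
      .groupby("customer_id")["amount"].sum()
      .nlargest(5)
))

# Pilot replacement: let DuckDB push the aggregation into the Parquet scan.
timed("duckdb report", lambda: duckdb.sql("""
    SELECT customer_id, SUM(amount) AS total
    FROM read_parquet('events.parquet')
    GROUP BY customer_id
    ORDER BY total DESC
    LIMIT 5
""").df())
```

Pair the timings with CPU metrics from your monitoring and the line items on your warehouse bill, and you have the before/after story for the pilot.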


