All articles

In-depth guides on streaming pipelines, cost optimization, open table formats, and building reliable data platforms.

Iceberg · Apr 14, 2026

Apache Iceberg v3: What's New?

A deep dive into everything new in Apache Iceberg format version 3 — from the Variant type and nanosecond timestamps to deletion vectors, row lineage, geospatial types, and table-level encryption.

David W

11 min read

Iceberg · Apr 10, 2026

Building a Multi-Engine Iceberg Lakehouse with QueryFlux

How to use QueryFlux as the SQL routing layer in front of Trino, DuckDB, and StarRocks — giving every client one endpoint while each query lands on the right Iceberg engine automatically.

David W

11 min read

Lakehouse · Mar 28, 2026

dbt Incremental Models That Actually Scale

Incremental models promise fast builds, but most teams hit correctness bugs within weeks. Here is how to build dbt incrementals that stay correct through late data and schema changes.

David W

10 min read

Streaming · Mar 15, 2026

Real-Time CDC with Debezium and Kafka

Build a production CDC pipeline from PostgreSQL to Kafka using Debezium with log-based capture, schema registry, exactly-once delivery, and zero-downtime snapshots.

Chris P

12 min read

Platform · Mar 1, 2026

Snowflake Cost Optimization Without Slowing Teams Down

A practical playbook for reducing Snowflake compute waste by 30-50% while protecting delivery speed and analyst productivity with governance guardrails.

Chris P

11 min read

Quality · Feb 18, 2026

Data Quality Contracts with dbt and Soda

How to implement enforceable data contracts between producers and consumers using dbt model contracts, Soda anomaly detection, and CI/CD gates that block bad data before it reaches production.

David W

11 min read

Orchestration · Feb 5, 2026

Migrating from Airflow to Dagster: A Practical Guide

A step-by-step migration path from Apache Airflow's task-centric model to Dagster's asset-based approach, covering code translation, testing patterns, and a realistic timeline.

Chris P

11 min read

Lakehouse · Jan 20, 2026

Medallion Architecture: Bronze, Silver, Gold Done Right

The medallion architecture is everywhere, but most implementations get the layer boundaries wrong. Here is how to design bronze, silver, and gold tiers that actually scale.

David W

11 min read

Platform · Jan 8, 2026

Terraform for Your Data Platform: Infrastructure as Code

Managing Snowflake warehouses, AWS S3 buckets, and IAM roles with Terraform — from provider setup and remote state to CI/CD pipelines that plan on PR and apply on merge.

Chris P

11 min read

Platform · Dec 20, 2025

PySpark Performance Tuning: From 4 Hours to 20 Minutes

Practical PySpark optimizations that reduced a production pipeline from 4 hours to 20 minutes, covering data skew, broadcast joins, partition sizing, and Adaptive Query Execution (AQE).

Chris P

11 min read

Streaming · Dec 5, 2025

Event-Driven Data Architecture with Kafka

How to design Kafka topic hierarchies, schema evolution strategies, and consumer patterns that reliably scale to thousands of events per second.

Chris P

10 min read

Quality · Nov 18, 2025

Data Pipeline Observability: From Alerts to Root Cause

Building observability into data pipelines for fast incident detection and root cause analysis — covering freshness, volume, schema, distribution, and lineage.

Chris P

11 min read