Fivetran acquires Tobiko Data (Makers of SQLMesh, SQLGlot)

Roundup #4

Sep 04, 2025

Hello data folks 👋

Fivetran just acquired Tobiko Data, the company behind SQLMesh (the main open-source competitor to dbt in the data transformation space) and the SQL parser and transpiler SQLGlot. It’s the second acquisition Fivetran has made this year, after Census, a reverse ETL tool. Fivetran is in full expansion mode, and its services now also include a fully managed data lake. So what does this mean for current SQLMesh users? Only time will tell.

Sources: Reddit, Fivetran announcement post

Launch of Polars Cloud and Distributed Polars

Polars’ blog | September 3, 2025 | 7 minute read

Polars Cloud is now generally available on AWS, and Polars' Distributed Engine has entered Open Beta.

The platform supports vertical, horizontal, and diagonal scaling (bigger machines, more machines, or both) through a single API. Its distributed engine builds on Polars' streaming architecture to run partitioned queries while still supporting order-dependent operations. Early features include native Iceberg support, with on-premises deployment and autoscaling expected in the next few months.
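To make "partitioned queries with order-dependent processing" a bit more concrete, here is a toy sketch in plain Python. It is not Polars Cloud's API or its actual query planner; it only assumes the common two-phase pattern many distributed engines use: each partition computes a partial state in parallel, then the partial states are merged. The running total shows why order-dependent operations need extra care, since each partition has to start from the sum of all earlier partitions. All names here are hypothetical.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy data model: a partition is an ordered list of (key, value)
# rows, and the list of partitions preserves the table's global row order.
Partition = list[tuple[str, int]]


def partial_sums(partition: Partition) -> dict[str, tuple[int, int]]:
    """Phase 1, run independently per partition: per-key (sum, count)."""
    acc: dict[str, tuple[int, int]] = {}
    for key, value in partition:
        s, c = acc.get(key, (0, 0))
        acc[key] = (s + value, c + 1)
    return acc


def group_mean(partitions: list[Partition]) -> dict[str, float]:
    """Two-phase group-by mean: parallel partial states, then one merge."""
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(partial_sums, partitions))
    merged: dict[str, list[int]] = defaultdict(lambda: [0, 0])
    for part in partials:
        for key, (s, c) in part.items():
            merged[key][0] += s
            merged[key][1] += c
    # Finalize only after merging; averaging per-partition means would be wrong.
    return {key: s / c for key, (s, c) in merged.items()}


def running_total(partitions: list[Partition]) -> list[int]:
    """Order-dependent cumulative sum computed partition by partition."""
    with ThreadPoolExecutor() as pool:
        totals = list(pool.map(lambda p: sum(v for _, v in p), partitions))
    # Exclusive prefix sum: each partition starts from the total of all earlier ones.
    offsets, running = [], 0
    for t in totals:
        offsets.append(running)
        running += t
    out: list[int] = []
    for offset, part in zip(offsets, partitions):
        acc = offset
        for _, value in part:
            acc += value
            out.append(acc)
    return out


if __name__ == "__main__":
    parts = [[("a", 1), ("b", 2), ("a", 3)], [("b", 4), ("a", 5)], [("c", 6)]]
    print(group_mean(parts))     # {'a': 3.0, 'b': 3.0, 'c': 6.0}
    print(running_total(parts))  # [1, 3, 6, 10, 15, 21]
```

Note that group_mean merges (sum, count) pairs instead of averaging per-partition means, which would give the wrong answer whenever partitions hold different numbers of rows for a key.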

This positions Polars as a direct alternative to Spark for teams that want distributed DataFrame processing with pandas-like simplicity.

OpenAI Acquires Product Testing Startup Statsig and Shakes Up Its Leadership Team

Techcrunch | September 2, 2025 | 2 minute read

OpenAI acquired product testing startup Statsig in a $1.1 billion all-stock deal, bringing founder and CEO Vijaye Raji on board as CTO of Applications. The deal, priced at OpenAI's current $300 billion valuation, is one of the company's largest acquisitions to date.

Statsig's experimentation platform will be integrated into OpenAI's Applications division, led by former Instacart CEO Fidji Simo, to speed up product development. The acquisition coincides with a leadership restructuring as OpenAI scales its enterprise and consumer products.

This move could also signal that major AI companies like OpenAI prefer to build experimentation capabilities in-house rather than rely on vendors. Statsig’s CEO, Vijaye Raji, tried to reassure customers: “If you’re an existing customer, don’t worry. Statsig will continue to provide our services and invest in our core products. Our customers will remain a top priority.” But you know the drill, right? 😉

The Hidden Complexity of Feature Stores: Why ML Teams Struggle Without Them

Shishir Nanga | September 3, 2025 | 5 minute read

Netflix, Uber, Airbnb, and DoorDash all built custom feature stores (centralized repositories for ML model inputs) because scaling ML isn't only about bigger models; it's also about managing features across hundreds of teams.

Feature stores address the silent killers of production ML: training-serving skew (features computed one way for training and another way at serving time), poor feature reuse, and latency/throughput trade-offs. The complexity is real: point-in-time joins are needed to prevent data leakage (accidentally using information from after the moment a prediction would have been made), and the same features must be served in real time in under 100 ms while also feeding petabyte-scale batch training.
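Point-in-time joins are easier to picture with a small example. The sketch below, in plain Python with hypothetical FeatureRow and LabelEvent types, attaches to each training event the newest feature value whose timestamp is at or before the event time. It assumes a value stamped exactly at event time was already known; a real feature store would also have to account for ingestion delays and time-to-live rules.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FeatureRow:  # hypothetical feature-history record
    entity_id: str
    ts: int        # when this feature value became known (e.g. epoch seconds)
    value: float


@dataclass(frozen=True)
class LabelEvent:  # hypothetical training example
    entity_id: str
    ts: int        # when the prediction would have been made
    label: int


def point_in_time_join(
    events: list[LabelEvent], features: list[FeatureRow]
) -> list[tuple[str, int, Optional[float], int]]:
    """Attach to each event the latest feature value known at event time."""
    # Index each entity's feature history, sorted by timestamp.
    history: dict[str, tuple[list[int], list[float]]] = {}
    for row in sorted(features, key=lambda r: r.ts):
        times, values = history.setdefault(row.entity_id, ([], []))
        times.append(row.ts)
        values.append(row.value)

    joined = []
    for ev in events:
        times, values = history.get(ev.entity_id, ([], []))
        # bisect_right returns the index of the first timestamp strictly after
        # ev.ts, so the entry just before it is the newest value with ts <= ev.ts.
        i = bisect_right(times, ev.ts)
        value = values[i - 1] if i > 0 else None  # None: nothing known yet
        joined.append((ev.entity_id, ev.ts, value, ev.label))
    return joined


if __name__ == "__main__":
    features = [
        FeatureRow("u1", 10, 0.2),
        FeatureRow("u1", 30, 0.9),
        FeatureRow("u1", 20, 0.5),
        FeatureRow("u2", 15, 1.0),
    ]
    events = [LabelEvent("u1", 25, 1), LabelEvent("u1", 5, 0), LabelEvent("u2", 15, 1)]
    for row in point_in_time_join(events, features):
        print(row)
    # ('u1', 25, 0.5, 1)
    # ('u1', 5, None, 0)
    # ('u2', 15, 1.0, 1)
```

For user u1 at time 25, a naive "latest value" lookup would return 0.9, a value recorded at time 30, which leaks the future into the training set.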

Every major cloud provider now offers a managed feature store, which shows this isn't a niche need. At scale, ML success depends more on feature engineering capabilities than on model parameter counts.

Without centralized platforms for storing and serving model inputs, teams waste resources recreating identical features, and models quietly fail in production.
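Both the reuse problem and training-serving skew come from defining the same feature more than once. Here is a minimal sketch of the "define once, use everywhere" idea, assuming a hypothetical in-memory FeatureRegistry (real feature stores add storage, materialization, and freshness guarantees on top). The training pipeline and the online service call the same registered function, so the two code paths cannot drift apart.

```python
from typing import Callable

# Hypothetical types: a raw entity record and a feature function over it.
Record = dict[str, float]
FeatureFn = Callable[[Record], float]


class FeatureRegistry:
    """Toy central catalog: every feature is defined exactly once, by name."""

    def __init__(self) -> None:
        self._features: dict[str, FeatureFn] = {}

    def register(self, name: str) -> Callable[[FeatureFn], FeatureFn]:
        """Decorator that adds a feature definition, rejecting duplicates."""
        def decorator(fn: FeatureFn) -> FeatureFn:
            if name in self._features:
                raise ValueError(f"feature {name!r} is already defined")
            self._features[name] = fn
            return fn
        return decorator

    def compute(self, names: list[str], record: Record) -> dict[str, float]:
        """Online path: features for a single request at prediction time."""
        return {name: self._features[name](record) for name in names}

    def compute_batch(self, names: list[str], records: list[Record]) -> list[dict[str, float]]:
        """Training path: the same definitions applied to historical rows."""
        return [self.compute(names, record) for record in records]


registry = FeatureRegistry()


@registry.register("avg_order_value")
def avg_order_value(record: Record) -> float:
    # Guard against division by zero for customers with no orders yet.
    if record["order_count"] == 0:
        return 0.0
    return record["total_spend"] / record["order_count"]


if __name__ == "__main__":
    history = [{"total_spend": 100.0, "order_count": 4.0},
               {"total_spend": 0.0, "order_count": 0.0}]
    print(registry.compute_batch(["avg_order_value"], history))
    # [{'avg_order_value': 25.0}, {'avg_order_value': 0.0}]
    print(registry.compute(["avg_order_value"], {"total_spend": 60.0, "order_count": 3.0}))
    # {'avg_order_value': 20.0}
```

Rejecting duplicate names is the toy version of what a shared catalog buys you: a second team looking for "average order value" finds the existing definition instead of writing a slightly different one.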

Two More Things

Confirm my suspicion about data modeling

Datawarelakebasehousemart (Reddit)

That’s the brief.

The Data Exec
