Fivetran acquires Tobiko Data (Makers of SQLMesh, SQLGlot)
Roundup #4
Hello data folks
Fivetran just acquired Tobiko Data, the creators of SQLMesh, the main open-source competitor to dbt in the data transformation space. It's the second acquisition Fivetran has made this year, the first being Census, a reverse ETL tool. Fivetran is in full expansion mode, with its services now also including a fully managed data lake. Anyway, what does that mean for current SQLMesh users? Only time will tell.
Sources: Reddit, Fivetran announcement post
Launch of Polars Cloud and Distributed Polars
Polars' blog | September 3, 2025 | 7 minute read
Polars launched Polars Cloud on AWS, now generally available, along with a Distributed Engine in open beta.
The platform offers vertical, horizontal, and diagonal scaling strategies through a single API. Their distributed engine leverages Polars' streaming architecture to handle partitioned queries while maintaining order-dependent processing capabilities. Early features include native Iceberg support, with on-premise deployment and autoscaling coming in the next few months.
This positions Polars as a direct alternative to Spark for teams that want distributed DataFrame processing with pandas-like simplicity.
OpenAI Acquires Product Testing Startup Statsig and Shakes Up Its Leadership Team
TechCrunch | September 2, 2025 | 2 minute read
OpenAI acquired product testing startup Statsig for $1.1 billion in stock, bringing founder Vijaye Raji on board as CTO of Applications. The deal is one of OpenAI's largest acquisitions, made at its $300 billion valuation.
Statsig's experimentation platform will be integrated to accelerate product development across OpenAI's Applications division, led by former Instacart CEO Fidji Simo. The acquisition coincides with leadership restructuring as OpenAI scales its enterprise and consumer product offerings.
This move could also indicate that major AI companies like OpenAI are prioritizing in-house experimentation capabilities over vendor solutions. According to Statsig's CEO, Vijaye Raji: "If you're an existing customer, don't worry. Statsig will continue to provide our services and invest in our core products. Our customers will remain a top priority." But you know the drill, right?
The Hidden Complexity of Feature Stores: Why ML Teams Struggle Without Them
Shishir Nanga | September 3, 2025 | 5 minute read
Netflix, Uber, Airbnb, and DoorDash all built custom feature stores (centralized repositories for ML model inputs) because scaling ML isn't only about bigger models. It's also about managing features across hundreds of teams.
Feature stores solve the silent killers of production ML: training-serving skew (models trained on different data than they serve), poor feature reusability, and latency-throughput tradeoffs. The complexity is real: point-in-time joins that prevent data leakage (accidentally training on information that was not yet available at prediction time), plus supporting both sub-100ms real-time serving and petabyte-scale batch training from the same feature definitions.
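To make the point-in-time join idea concrete, here is a minimal sketch in plain Python. It is not how any particular feature store implements it; the `feature_history` data, the `feature_as_of` helper, and the `labels` rows are all hypothetical names made up for illustration. The idea is simply that each training row only sees the latest feature value known at or before its own timestamp.

```python
from bisect import bisect_right
from datetime import datetime

# Hypothetical feature history per entity: (time the value became known, value),
# kept sorted by time. In a real system this would live in an offline store.
feature_history = {
    "user_1": [
        (datetime(2025, 1, 1), 3),
        (datetime(2025, 1, 10), 7),
        (datetime(2025, 1, 20), 12),
    ],
}


def feature_as_of(entity_id, event_time):
    """Return the latest feature value known at or before event_time, else None."""
    history = feature_history.get(entity_id, [])
    timestamps = [ts for ts, _ in history]
    # Index of the first entry strictly after event_time.
    idx = bisect_right(timestamps, event_time)
    if idx == 0:
        return None  # nothing was known yet, so nothing may be used
    return history[idx - 1][1]


# Hypothetical label events: (entity, time of the event we want to predict, label).
labels = [
    ("user_1", datetime(2025, 1, 5), 0),
    ("user_1", datetime(2025, 1, 15), 1),
    ("user_2", datetime(2025, 1, 15), 0),
]

# Each training row gets the feature value as it was at label time, never a later one.
training_rows = [
    (entity, ts, feature_as_of(entity, ts), label) for entity, ts, label in labels
]
```

A naive join on entity alone would attach the most recent value (12) to every `user_1` row, which is exactly the leakage the point-in-time join avoids.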
Every major cloud provider now offers managed feature stores, demonstrating this isn't a niche need. ML success at scale depends more on feature engineering capabilities than model parameters.
Without centralized platforms for storing and serving model inputs, teams waste resources recreating identical features, and models quietly fail in production.
Two More Things
Confirm my suspicion about data modeling
Datawarelakebasehousemart (Reddit)
That's the brief.


