Future Tech

OneHouse takes $35M to fight for Hudi in table format wars

Tan KW
Publish date: Thu, 27 Jun 2024, 06:28 AM

OneHouse, a data lake company based around the open source Apache Hudi table format, has secured $35 million in Series B funding led by Craft Ventures.

The haul adds to the $33 million already invested by Addition and Greylock Partners, and comes at an interesting time for the data lake market.

The sum accrued by OneHouse, born out of a data project at ride-share giant Uber, is overshadowed by the $1 billion Databricks paid for Tabular, a so-called headless data warehouse originating in Netflix and based on a rival table format to Hudi, Apache Iceberg.

Tabular was founded just last year, but its founders were behind Iceberg, which has seen vendors Snowflake, Google, Cloudera, and others rally behind it. Microsoft and SAP, meanwhile, have backed Databricks' open source format Delta Lake, a Linux Foundation project.

Observers might think Hudi had been squeezed in the middle, and OneHouse CEO Vinoth Chandar admitted he had fielded calls following the Databricks-Tabular news.

"Over the following days, I received various inquiries from friends, ex-colleagues, users, analysts and the press about Apache Hudi's future and the overall landscape," he said.

The idea behind all the open table formats is more or less the same. By employing these formats, users can analyze data where it resides with their engine of choice, without going through the cost and hassle of moving it into a data lake or data warehouse.

Speaking to The Register, Chandar said Hudi had suffered from a false dichotomy between Iceberg and Delta Lake, created for marketing purposes.

"Hudi is pre-installed in five public cloud providers, including AWS and GCP and it is directly query-able from pretty much most all of the engines, except for Snowflake," he said.

"The perceived lack of support for Hudi comes from vendors aggressively marketing and pushing Iceberg to differentiate it against Databricks in the last 24 months. That kind of sets a very artificial duopoly type situation in the market: format-wash, if you like."

Despite the focus on Iceberg and Delta, the community support around Hudi has actually been very steady compared to the other vendors, he said.

"The reason is, when you actually go to build open data lakehouses and using open source tools, Hudi has a lot to offer beyond just a table format. It has ingest tools, a very different concurrency model, indexes that help you like write data faster, record level metadata support, and so on. These are the actual technical reasons for which practitioners continue to evaluate, choose, and deploy Hudi."

OneHouse is also keen to stress interoperability with other table formats. It works with Iceberg, so data can be read from Snowflake, and with Delta Lake, so data can be read from Photon, the Databricks query engine. It is also supporting XTable, an incubating Apache project that is designed to ensure the data ecosystem does not fracture over table formats and has backing from Microsoft and Snowflake.

Despite a $35 million cash injection, OneHouse remains a minnow in the lakehouse market. But Databricks' grab for Tabular and its Iceberg authors has unsettled some observers in the market. Maybe having a third horse in the race is not such a bad idea. ®

 

https://www.theregister.com//2024/06/26/onehouse_35_million_hudi/
