Teradata takes plunge into lakehouse waters, but not everyone is convinced

Publish date: Thu, 23 May 2024, 11:01 PM

With its vision of a unified enterprise data warehouse, Teradata attracted globally dominant customers including HSBC, Unilever and Walmart. But earlier this month, it confirmed its backing of the lakehouse concept, which combines messy data lakes with structured data warehouses, together with the idea of analytics anywhere, supported by object storage and open table formats.

Although its hand may have been forced, observers pointed out that there is still a place for Teradata's mainstay high-performance, block storage-based analytics.

The 45-year-old company previously announced support for open table formats (OTFs) Apache Iceberg and Linux Foundation Delta Lake. In doing so, it embraced an industry trend towards performing analytics on data in situ rather than moving it to a single store for BI and other analysis.
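To make the in-situ idea concrete, here is a minimal sketch, not Teradata code, using the open source PyIceberg library to scan a hypothetical Iceberg table whose data files sit in object storage; the catalog endpoint, table name, and columns are assumptions made up for the illustration.

```python
# Illustrative only: scanning an Apache Iceberg table in place with PyIceberg.
# The catalog endpoint, table identifier, and column names are hypothetical.
from pyiceberg.catalog import load_catalog
from pyiceberg.expressions import GreaterThanOrEqual

# Connect to an Iceberg catalog (a REST catalog here; Glue, Hive, etc. also work).
catalog = load_catalog(
    "analytics",
    **{
        "uri": "https://iceberg-catalog.example.com",
        "s3.region": "us-east-1",
    },
)

# Load the table's metadata; the Parquet data files stay in object storage.
orders = catalog.load_table("sales.orders")

# Push a filter and column projection down to the scan, then materialize
# only the matching rows as an Arrow table for local analysis.
arrow_table = orders.scan(
    row_filter=GreaterThanOrEqual("order_date", "2024-01-01"),
    selected_fields=("order_id", "customer_id", "amount"),
).to_arrow()

print(arrow_table.num_rows)
```

The point of the pattern is the direction of travel: the query engine comes to the table's metadata and files, rather than the data being loaded into a separate store first.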

Teradata also spoke approvingly for the first time about the lakehouse architecture, a term introduced by rival Databricks to describe a single environment that supports machine learning and data exploration alongside the traditional BI and analytics usually done in the more regimented setting of an enterprise data warehouse.

AI adoption, or so Teradata claimed, had consolidated data warehouses, analytics, and data science workloads into unified lakehouses. "OTF support further enhances Teradata's lakehouse capabilities, providing a storage abstraction layer that's designed to be flexible, cost-efficient, and easy-to-use," it said in a corporate missive.

Speaking to The Register, Louis Landry, a Teradata engineering fellow, said support for OTFs did not mean the company no longer believed in the enterprise data warehouse.

"It's complementary," he told us. "We believe that we need to be able to play data where it lies. In a lot of cases, that's going to mean highly efficient block storage, for low latency and all that kind of good stuff. But in a lot of cases that's not how the data is going to be laid out. Different customers have different needs. Our goal is always is to make sure that they get the best value out of integrated data."

He said the data warehouse and lakehouse ideas were architectures more than just technologies and that customers would pick and choose which approach works for them.

"That means continuing to offer the level of service we do around that high throughput work that can really only be serviced out of block storage. But we also need to be able to address data that's sitting in an object store or some sort of external storage, so that we provide a holistic, singular view of what's available and what's accessible and security and all the things that people have come to expect out of [a] Teradata system."

Teradata has been performing analytics on data external to its main data warehouse since 2020, when it updated Teradata QueryGrid and partnered with Starburst Data to integrate the Presto connector so that users of Teradata's Vantage analytics platform could access and query a gamut of cloud and on-premises data sources.
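As an illustration of that federated pattern, and not of QueryGrid's own interface, the hedged sketch below uses the open source Trino (the project descended from Presto) Python client to join a table held in an object storage catalog with one in an operational PostgreSQL database inside a single query; the host, catalogs, and table names are assumptions.

```python
# Illustrative only: a federated query in the Presto/Trino style that joins data
# across two catalogs without copying it first. Host, catalogs, and table names
# are hypothetical; this sketches the pattern, not Teradata QueryGrid's own API.
import trino

conn = trino.dbapi.connect(
    host="trino.example.com",
    port=443,
    user="analyst",
    http_scheme="https",
)

cur = conn.cursor()
# One SQL statement spans an Iceberg catalog over object storage and an
# operational PostgreSQL database; the engine pushes work down to each source.
cur.execute("""
    SELECT c.region, sum(o.amount) AS revenue
    FROM lake.sales.orders AS o
    JOIN postgres.public.customers AS c
      ON o.customer_id = c.id
    WHERE o.order_date >= DATE '2024-01-01'
    GROUP BY c.region
""")
for region, revenue in cur.fetchall():
    print(region, revenue)
```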

But it was adamant that it would not endorse the lakehouse concept. Speaking to The Register in 2022, then CTO Stephen Brobst said data lakes and data warehouses were part of a unified architecture but discrete concepts. "There is a difference between the raw data, which is really data lake, and the data product, which is the enterprise data warehouse," he said.

Although Teradata launched its own data lake in August 2022, Brobst said there was an important distinction between where businesses put their raw data and the data warehouse, which optimizes query performance and controls governance. Creating a hybrid lakehouse was "actually not very useful because you don't want to have more copies of data than is necessary."

Landry said he and Brobst, who left Teradata in January this year, "have had a fun relationship and been debating various ideas over the course of my ten-year tenure here."

"I don't think we've changed our minds on the approach. The technology industry evolves and our goal is to provide the best possible integrated data solution for our customers. This is not new, we haven't just started working on this in the last couple of months."

However, one seasoned Teradata support engineer, who asked not to be named, told The Register he feared the company had lost its way.

"Teradata has to back this horse whether they like it or not, and whether they mean it or not," he said.

The source pointed to a precedent: Teradata had first resisted, then adopted, Hadoop during the big data boom of more than a decade ago.

Meanwhile, cloud vendors with data warehouse and data lake systems - particularly Google and Microsoft - were writing "blank checks" to try to attract Teradata's largest customers to their systems.

Although Teradata might have a superior data warehouse product in terms of user concurrency and query optimization, customers were increasingly satisfied with a dumbed-down solution so long as it got them to the cloud, he said.

At the same time, getting onto object storage and OTFs might not help efficiency but it would put users in the driving seat, he said.

"People are basically saying, 'I don't care whether you call it a lakehouse or whatever.' They're saying we just want to dump our data into object storage, then the next evolution of that is we want to process it where it is. Then they want an overlay that anyone can use so it's not a proprietary format in object storage. I think this creates major trouble for all of the vendors. Let's just choose Iceberg as the winner … it means your data is now in an open format in the cheapest storage you can possibly get your hands on. It's a winner from an end user perspective."

Hyoun Park, CEO and chief analyst at Amalgam Insights, agreed that Teradata's hand had been forced in adopting the lakehouse concept and OTFs, but he said customers still value high-performance data warehouse systems.

"Teradata has been forced to embrace the data lakehouse concept because of the importance of data lakes and unstructured data in AI and machine learning. Teradata is still a top choice for data warehouse, although of course they have to deal with the aggressiveness of Snowflake there. But nobody really doubts that Teradata can support high quality enterprise data warehouse."

Park said an enterprise data warehouse was still a "superior concept," but the problem was that the number of data/analytics applications businesses were expected to support had expanded rapidly.

"There will always be a place for data warehouse that supports your top 50 apps in the enterprise because you are going to want a high-performance data store to support analytics as fast as possible and a data warehouse is the best way to do that.

"However, the challenge is that the current enterprise of a billion dollar-plus revenue typically has over 1,000 apps. The sheer effort to bring those other apps into a data warehouse is just crippling. You have to put the rest of that data somewhere if you want to use it for anything from analytics to AI, so that's where the data lake comes in. That forces with this two-tier approach."

The expansion of data-reliant applications - like machine learning and AI - and the introduction of cloud computing and object storage have converged to transform enterprise data management and analytics environments.

While Snowflake shook things up by separating storage and compute, Databricks attached SQL-style BI workloads to its data lake-based machine learning environments.

Data lake company Cloudera and Tabular, the "headless" data warehouse vendor, both have different visions of the market, as do the powerful cloud platform providers, which similarly claim to offer an all-things-to-all-data product suite. Whether Teradata can thrive in this complex and changing market is still unclear. ®

 

https://www.theregister.com//2024/05/23/teradata_embraces_lakehouse/
