Future Tech

DuckDB promises greater stability with 1.0 release

Tan KW
Publish date: Thu, 06 Jun 2024, 09:24 AM

DuckDB has become a fully fledged database release with its 1.0 iteration, promising a new data model and greater stability to enhance backwards compatibility.

With more than a million downloads a month, the in-process analytical database management system has attracted a lot of attention since its 0.5.0 iteration, released in September 2022.

DuckDB co-developer Hannes Mühleisen, founder of support company DuckDB Labs, told The Register the 1.0 release was more about stability than new features.

Firstly, there is a new storage format.

"Every time we had a major release of DuckDB… you would have to reload your data into the system because the format would just change in non-compatible ways because we had been making a ton of changes. But now, we guarantee a backward compatibility and also limited forward compatibility with the storage format, which means that if you write data to a DuckDB database format, now, you will still be able to read that file 10 years down the line. That's a big change," he said.

He said the new file format fills a niche in the market: users can create multiple tables in a single file and make transactional updates to those files, which are efficient and compressed.

DuckDB was born in Amsterdam's Centrum Wiskunde & Informatica mathematical and theoretical computing research center, where Mühleisen is a professor. Embedded within a host process, the database requires no DBMS server software to install, update or maintain. For example, the DuckDB Python package can run queries directly on data in Python software library Pandas without importing or copying data. Written in C++, DuckDB is free and open source under the MIT License.

Former Google BigQuery engineer Jordan Tigani pointed out that DuckDB bucks the trend of cloud-based scale-out data warehousing and takes advantage of more powerful laptops. He's such a fan that he co-founded MotherDuck, a company which provides a backend extension to DuckDB.

Hyoun Park, CEO and chief analyst at Amalgam Insights, on the other hand, said he sees DuckDB as a one-trick pony whose trick is high-performance analytics and file transformations with limited resources.

"The file transformation capability is useful for transforming unstructured data in Parquet or other unstructured formats into a performant in-memory database. And the database is also useful for conducting high-performance analytics in edge environments, or remote environments, which will be increasingly useful for offloading analytic processing," he said.

Park told us DuckDB is also very easy to deploy and support.

Mühleisen said third-party technology firms were adopting DuckDB under the MIT license, as well as an increasing number of single use cases "where the data scientist is on their laptop."

For example, he said, DuckDB Labs was working with Fivetran to help it use the database in its Apache Iceberg table format implementation.

"That's something that didn't exist in the beginning - that people would just grab DuckDB and put it into their pipeline as a component - but that's really growing strongly," Mühleisen said.

Matthew Mullins, CTO at collaborative analytics company Coginiti, said: "As a tool builder we're most excited about DuckDB's close partnership with the Apache Arrow community because it's enabling us to build a new generation of highly performant data analysis tools that leverage columnar data formats. This integration not only boosts performance but also simplifies the data exchange process, enabling more efficient and scalable data operations.

"Since its initial release, DuckDB has delighted data engineers, data scientists, and tool builders by being easy to work with and incredibly performant. Users love its friendlier SQL. It's become a favorite for handling CSV and Parquet files efficiently, no matter where they reside. Additionally, the capability to directly attach to PostgreSQL and MySQL databases for zero-ETL analysis has simplified data workflows and reduced the cost of moving data," Mullins said.

The ability to work on data in PostgreSQL and MySQL comes through DuckDB plugins. Mühleisen said the team was hoping to develop a platform where people could upload, share and download DuckDB plugins for connectivity, or for supporting new scalar functions or index types. ®

 

https://www.theregister.com//2024/06/05/duckdb_promises_greater_stability_with/
