Future Tech

Codd almighty! Has it been half a century of SQL already?

Tan KW
Publish date: Sat, 01 Jun 2024, 07:09 AM
Tan KW
0 448,535
Future Tech

Feature In the fifty years since SQL was first proposed, it has become the dominant query language for working with relational databases. As such, it underpins the transactions vital to the workings of government, business and the world's economy. But like many standards, its path to global domination is not been a straight one, and in the early years of its gestation, the story could have been very different.

The clue is in the name. Although the acronym might stand for Structured Query Language, it was first known as SEQUEL, a nod to QUEL, a rival language for addressing information in relational databases developed at about the same time.

IBMers Donald Chamberlin and Raymond Boyce wrote the paper "SEQUEL: A structured English query language" in 1974 as a way of addressing data in IBM's newly proposed System R, the first database to embody Edgar Codd's paper [PDF] describing the relational model for database management.

Speaking to The Register, Chamberlin said Codd's approach was a departure from the prevailing thinking in data systems from the 1970s, and would require a new language.

"The most respected figure in the database industry was Charles Bachman. He was working for General Electric, a computer company at the time. He invented the whole concept of a database system as a distinct software layer between the operating system and the application, and he developed a particular database system called Integrated Data Store," Chamberlin said.

Bachman's creation became known as a navigational database, but it was not set to have a long-lasting influence on the industry.

"Charlie thought that you should access a database by writing a procedural program that tells the computer what to do to find the answer. Codd said, 'No that's all wrong, we should use a declarative approach and let the computer figure out how to find the answer.' That was the conflict that was raging in the industry around the early '70s," Chamberlin said.

A SQUARE language, but impossible to type

In their first attempt at creating a language for addressing the new concept of relational databases, Chamberlin and Raymond Boyce created SQUARE or Specifying Queries in A Relational Environment. But it relied on a finger-mangling combination of subscript and superscript notation, and by Chamberlin's own admission, it was difficult to type.

After moving from the east coast to IBM's San Jose Research Laboratory in 1973, Chamberlin and Boyce joined the team for System R.

"The relational model was simple and elegant and powerful. But Codd presented his concepts in a lot of mathematical jargon. He's basically a mathematician, and the first time we read his paper, it wasn't really obvious how elegant and simple this concept was. He took a simple concept and made it complicated. The other problem was that it wasn't clear at first whether these ideas could be efficiently implemented within a program that would have adequate performance," Chamberlin said.

Another database team works on the same problem

System R was IBM's attempt to demonstrate that the relational model could be executed practically by the computers available at the time, but because Codd's paper had been published in a journal, the IBM team were not the only ones working on the problem. Up the coast near San Francisco, another team at University of California Berkeley was working on a similar project.

Chamberlin said: "There were three things that were needed to make a relational database possible: a kind of a three-legged stool. We had the data model. The second thing was a query language that people could understand even though they weren't mathematical specialists, and the third thing was an optimizing compiler that could take this high level language and turn it into an efficient plan with good performance. That's exactly what people were doing both in IBM and also at Berkeley. There was a revolution going on. The cost of computing was coming down, everybody was putting their data online and the winner of that revolution was relational databases," he said.

The question was which one and using which language.

At Berkeley in the early '70s, post-doctorate Michael Stonebraker and fellow computer scientist Eugene Wong began a project to build Ingres, another early relational database, at the same time as System R. With it, they developed their own query language, QUEL.

"Chamberlin and Boyce basically turned SQUARE into a language you could type on a keyboard, but it had this nested structure to it," Stonebraker explained to The Register.

Did the best technical solution win?

However, there were queries that could not be expressed in a nested notation. The result was, Sequel added a flat notation to the nested notation, which made the language "unnecessarily complicated," Stonebraker said.

"On the other hand, QUEL started off with a flat notation from the very beginning. There's no nested notation. Sequel was hampered by the inclusion of the nested notation. But the trouble is, you can't take it out once you put it in," Stonebraker said.

At the same time, the dominance of SEQUEL - the SQL nomenclature came later - was far from a foregone conclusion, as IBM initially showed little interest in developing System R as a commercial product since it was making so much money from its existing data system.

Chamberlin explained: "It was a frustration for me, that's for sure, and for other members of the System R team. This was a business decision that IBM was making. They had a successful commercial database product [in hierarchical IMS]. Why would they want to introduce a competitor for it? It took a few years for them to figure out.

"In the meantime, the System R group published papers both on SQL and on the optimizing compiler in the open technical literature. IBM was very generous allowing us to do that, mainly it was because they didn't take the technology too seriously from a business point of view: this was research," Chamberlin said.

The birth of Oracle and influence rise of the mini-computer

Two companies were, however, interested in producing commercial relational database. As well as Stonebraker's Ingres, commercialized by the Relational Technology company he helped to found, there was the fledgling Software Development Laboratories founded by Larry Ellison, Bob Miner and Ed Oates, which developed the Oracle database, later to give its name to the company.

Oracle, adopted SEQUEL and claimed to be the first commercially available relational system, and Ingres had a significant advantage over IBM, Stonebraker said. Both were developed for mini-computers systems, distinct from the prevalent mainframes of the 1960s and 1970s.

"It was highly advantageous to both companies that at the time in the early to mid-'80s, the major database systems were all running on IBM hardware so they were written in IBM assembler, so they couldn't be ported to DEC and Data General mini-computers. The '80s was the rise of the mini computer and it presented a green field to Oracle and Ingres," Stonebraker said.

He said the implementation of Quel in Ingres was substantially better than Oracle's adoption of SQL. Stonebraker said Oracle had a head start and was achieving strong growth using "sales tactics I would not condone." However, he said that Ingres might have overtaken Oracle in 1985 were it not for an intervention by IBM.

In late 1983, Big Blue introduced Db2, which, although not its first commercial relational database, was soon to become the flagship of a new vanguard. Db2 used SQL, and with IBM's dominance of the mainframe market, it was game over for rival languages, whether they were technically superior or not.

Standards bodies and government purchasing make the call

But for Chamberlin it was the approval of national and then international standards bodies, and the adoption by US government purchasing authorities, which helped win the day for SQL.

As relational databases became more commonplace due to the success of Ingres and Oracle, the American National Standards Institute (ANSI) decided to take an interest and look at the query languages.

"Well, they had a committee called the H2 committee that has representatives from a lot of different software vendors, including IBM and Oracle, both of which at that time, were marketing SQL products. Crucially, the Ingres people didn't participate in that committee. The world could have been different if Ingres had been represented on that committee," he said.

Instead, the ANSI committee saw two relational products in the market, both of which used SQL so they were compatible, so it thought it would get the most traction if it based its standards on the commercially available SQL implementation, Chamberlin said.

Meanwhile, the US government published the Federal Information Processing Standard 127. "You had to pass that standard to sell software to the government. The FIPS 127 was identical to the ANSI standard. That was very beneficial from the point of view of marketing SQL systems to the government," Chamberlin said.

Following the ANSI standard, the International Organization for Standardization (ISO) created its standard for SQL in 1986. The most recent update from 2023, includes new features related to JSON and property graph. With SQL's world domination assured, Chamberlin reflects what his team could have done differently. He said those pointing out that SQL was not orthogonal - the idea that operations change only one thing - were making a valid criticism.

However, the goal in creating SQL was not to create a programming language as such, but to create a means that non-programmers - business analysts and so on - could use to address their data.

"We wanted the language to look as closely as possible to natural language so they could read it and understand it, like it was an English sentence. But of course, English is not an orthogonal language, so we made some compromises there," he said.

In the end, SQL is mainly used by programmers, database administrators and data scientists. "A little bit more care in making the language more orthogonal would have been beneficial in the long run as things turned out," he said.

Despite SQL's success, Chamberlin sees the need for NoSQL databases and languages to help address data needs in modern applications. But the sheer volume of data in existing relational databases, together with the popularity of open source relational databases such as MySQL, PostgreSQL and SQLite for building new applications, mean SQL is unlikely to be replaced.

Despite his language losing out, Stonebraker says that the database industry has benefited from having a single "intergalactic" standard in SQL.

"I don't see anything supplanting it at this point, unless for some reason the relational database goes out of favor, but I don't see that happening. AI might allow people to use natural language to compile SQL, but that just puts another front end on it," he said.

"There are 20 or 30 popular programming languages in the software development world. In the database world, there's only one." ®

 

https://www.theregister.com//2024/05/31/fifty_years_of_sql/

Discussions
Be the first to like this. Showing 0 of 0 comments

Post a Comment