That’s the question I’ve been asked: what exactly is so bad about SQL? It’s not easy to produce a satisfying answer.
Most of the SQL that most people use is roughly SQL-1992, with perhaps a few bits from SQL:2003 like WITH. That’s a data sub-language, and because only a small part of an application can be written in it, most of the critical logic has to be written in a general-purpose object-oriented programming language. That’s where the ‘object relational impedance mismatch’ comes from.
Here is a list of the ‘Fatal Flaws’ of this kind of SQL, taken from the Askew Wall by Hugh Darwen.
- Anonymous columns (partly addressed in 1992)
- FROM clause restricted to named tables (fixed in 1992)
- Duplicate column names
- Order of columns is significant
- Duplicate rows
- NULL
- Failure to support degenerate cases (e.g. columnless tables)
- Failure to support “=” properly
- and lots more, with probably more to come.
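A few of these flaws are easy to reproduce. The following is a minimal sketch using Python’s built-in sqlite3 module; SQLite is only a convenient stand-in rather than a conforming SQL-1992 or SQL:2003 implementation, and the supplier table and its data are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical table with no key: nothing stops the same row being stored twice.
cur.execute("CREATE TABLE supplier (name TEXT, city TEXT)")
cur.executemany(
    "INSERT INTO supplier VALUES (?, ?)",
    [("Smith", "London"), ("Smith", "London"), ("Jones", None)],
)

# Duplicate rows: a plain SELECT returns a bag, DISTINCT is needed to get a set.
all_rows = cur.execute("SELECT name, city FROM supplier").fetchall()
distinct_rows = cur.execute("SELECT DISTINCT name, city FROM supplier").fetchall()

# NULL and "=": the comparison city = city is not true when city is NULL,
# so the Jones row silently drops out of the result.
self_equal = cur.execute("SELECT name FROM supplier WHERE city = city").fetchall()

# Duplicate column names: the same heading appears twice in one result.
cur.execute("SELECT name, name FROM supplier")
headings = [column[0] for column in cur.description]

print(len(all_rows), len(distinct_rows), self_equal, headings)
```

None of this is an error as far as the engine is concerned, which is rather the point.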
Chris Date and Hugh Darwen wrote a book called Databases, Types, and the Relational Model. It contains the rationale for the Third Manifesto (TTM), and also an assessment of how well SQL stacks up against the Manifesto. The version of SQL targeted is SQL:2003 (with SQL/PSM). The book does an extensive analysis of this version of SQL against TTM, and identifies sins of omission and commission. Obviously SQL inherits many legacy features and the consequences of decisions made years ago, but many of the gaps appear to have been filled in. Rather than focussing on the bad things one can do in SQL, my question comes down to these.
- If you set out to code in a TTM compliant style, how close can you get in SQL:2003?
- If a TTM language generated SQL:2003 as a backend target, how close could it get to TTM compliance?
Obviously in either case:
- NULL should never be used (and some extra code is needed to avoid it arising during evaluations, e.g. in aggregates)
- Some comparisons need to be coded carefully to avoid mistakes (e.g. trailing spaces in CHAR)
- All queries should have DISTINCT in effect to avoid duplicate rows
- Column attributes should be unique and in the same order in all tables (that have the same columns)
- Keys should be defined as compliant candidate keys.
- Tuple level operations should be avoided (FETCH, WHERE CURRENT)
- Much care should be taken to use certain syntactic features and not use others.
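Some of these rules can be sketched in a few lines. The example below again uses Python’s sqlite3 module as a stand-in engine (an assumption of convenience, not a SQL:2003 implementation), with a hypothetical shipment table: every column is NOT NULL, a key is declared, projections use DISTINCT, and an aggregate over an empty set is wrapped in COALESCE so that no NULL escapes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical table: no column may be NULL and a candidate key is declared.
cur.execute("""
    CREATE TABLE shipment (
        supplier TEXT    NOT NULL,
        part     TEXT    NOT NULL,
        qty      INTEGER NOT NULL,
        PRIMARY KEY (supplier, part)
    )
""")
cur.executemany(
    "INSERT INTO shipment VALUES (?, ?, ?)",
    [("S1", "P1", 300), ("S1", "P2", 200), ("S2", "P1", 300)],
)

# The key stops a duplicate row from being stored in the base table.
try:
    cur.execute("INSERT INTO shipment VALUES ('S1', 'P1', 300)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# A projection can drop the key, so DISTINCT is needed to keep the result a set.
quantities = cur.execute(
    "SELECT DISTINCT qty FROM shipment ORDER BY qty"
).fetchall()

# SUM over an empty set yields NULL; COALESCE replaces it with a real value.
raw_total = cur.execute(
    "SELECT SUM(qty) FROM shipment WHERE supplier = 'S9'"
).fetchone()[0]
safe_total = cur.execute(
    "SELECT COALESCE(SUM(qty), 0) FROM shipment WHERE supplier = 'S9'"
).fetchone()[0]

print(duplicate_rejected, quantities, raw_total, safe_total)
```

The discipline is entirely on the programmer: leave out the DISTINCT or the COALESCE and the query still runs, it just quietly stops behaving relationally.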
The only failures (from that list) that I can find are:
- There are severe limitations in the type system (possreps, components, selectors, constraints, tuple types, relation types).
- No assignment (or multiple assignment) for relation types; emulation via DELETE/INSERT runs into the multiple assignment problem.
- No relation-valued attributes (but might be emulated using MULTISET)
- No public relvars (can only share database tables)
- No nested transactions; some other issues around transactions.
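To make the assignment point concrete, here is a rough sketch of emulating relational assignment to a single table with DELETE followed by INSERT inside one transaction. It uses Python’s sqlite3 module; the part table and the assign helper are hypothetical names invented for this example. It covers only the single-table case; constraints that span several tables, which is where the multiple assignment problem really bites, are not addressed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE part (id TEXT NOT NULL PRIMARY KEY, colour TEXT NOT NULL)"
)
conn.executemany("INSERT INTO part VALUES (?, ?)", [("P1", "red"), ("P2", "green")])
conn.commit()


def assign(conn, new_rows):
    """Emulate `part := new_rows` (hypothetical helper, not a standard API)."""
    # The connection context manager commits on success and rolls back
    # if an exception escapes, so the old value survives a failed assignment.
    with conn:
        conn.execute("DELETE FROM part")
        conn.executemany("INSERT INTO part VALUES (?, ?)", new_rows)


def current_value(conn):
    return conn.execute("SELECT id, colour FROM part ORDER BY id").fetchall()


# A valid assignment replaces the whole table body.
assign(conn, [("P1", "blue"), ("P3", "red")])
after_good = current_value(conn)

# An invalid assignment (duplicate key) is rolled back as a unit.
try:
    assign(conn, [("P4", "red"), ("P4", "green")])
    bad_rejected = False
except sqlite3.IntegrityError:
    bad_rejected = True
after_bad = current_value(conn)

print(after_good, bad_rejected, after_bad)
```

The transaction makes the pair of statements atomic, but the engine still checks constraints statement by statement, so this is an emulation of assignment rather than the real thing.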
I should point out here that this only applies directly to SQL:2003, but in broad terms it probably applies to later versions of the standard and implementations that claim substantial compliance with the standard (perhaps with different syntax).
So apart from the type system, the horrible syntax and the plethora of ways to shoot yourself in the foot, SQL:2003 is not all that bad! My answer to my two questions is:
- Quite close, apart from the seriously deficient type system.
- Very close indeed, if the compiler can provide its own type system on top of that provided by SQL, its own syntax, and then generate bog-standard SQL (or some dialect to suit a particular engine).
You can write good code in SQL:2003 or later, but it’s not easy. A better language sitting on top of SQL might make it a whole lot easier. Is that where Andl fits?