So, my solution for this was always
SELECT a
, b
, c
FROM ...
instead of: SELECT a,
b,
c, -- people want to allow trailing comma here
FROM ...
Leading commas also are also very regular and visible. A missing trailing comma is a lot harder to see.Before people start to mess with the standard, I'd suggest to maybe just change your coding style - and make the start of the list special - instead of the end.
I'd suggest going for a lot less trailing commas - instead of allowing more.
You have simply moved the "special" entry to the beginning rather than the end.
Side remark: I've noticed that when it comes to code formatting and grammar it's almost like there are broadly two camps. There are some instances of code formatting that involve something "clever". Your example of leading commas above for example. Another example is code ligatures. It's as if there's a dividing line of taste here where one either feels delight at the clever twist, or the total opposite, rarely anything in between. I happen to dislike these kinds of things (and particularly loathe code ligatures) but it is often hard to justify exactly why beyond taste.
Code ligature thing has something to do with just seeing the characters that are actually there rather than a smokescreen, which IMO impedes editability because I can't place the cursor half-way through a ligature and so on. But it's more than that -- you could fix those functional issues and I'll still dislike them.
That sounds like a bug in the editor. Ligatures are incredibly common font features even outside of programming, even sequences like fi are often ligatures.
E.g., ≠ is as wide as two characters.
No, you usually don't. That would "bury" the first field so you don't immediately see it if you quickly glance at the code. I'll admit I was a bit surprised when I saw a fully formatted SQL query but it does look much better:
SELECT
a,
b,
c,
d
FROM
Customers
ORDER BY
b,
c DESC
Edit: I've just seen other comments here suggesting you return an extra "pad" value at the start so you can use leading commas without "losing" the first value visually. I hardly know where to start with that. It transmits more data on the wire and, to my eyes, looks horrific. That level of hackery is proof, as if it were needed, of how badly needed trailing commas are.After that, it is about minimizing errors. Leading commas minimize errors, and is a style that is portable across databases.
I agree that new styles and syntaxes can take some getting used to. When I went from Pascal to C, braces seemed awful but I now think punctuation is much clearer than words for delimiting blocks.
But the discussion here is about whether to introduce a new clearer style. Taken on this context, your comment amounts to defending any existing syntax, no matter how horrific, because you'll get used to it eventually.
No, I'm not defending any existing syntax.
As I said, it is about minimizing errors. You can get used to anything. But some styles and syntaxes are more error-prone than others. Leading commas in SQL is less error-prone, and therefore is preferred. No matter what your initial aesthetic impression. Allowing extra trailing commas is not portable, and therefore will increase errors if you come to rely on it then move between databases. So if the feature is provided, I will still be slow to adopt it into my style.
SELECT
a
,b
,c
,d
FROM
Customers
ORDER BY
b
,c DESC
reordering, insertions and deletions much easier within the IDE as you’re manipulating lines as a whole. focus maintained while you are in the zone or tired is the biggest gain
at the cost of slightly uglier code and a miniscule… truly miniscule wire overhead.
other decisions will have much larger wire impact, say choosing a column wider than necessary
Why wouldn't that be possible? (The cursor thing)
It's still two characters as far as your editor is concerned.
1. If I `/=` in vim (find equals signs), should the "!=" ligature be highlighted? If it is, then it is misrepresenting the state: if I then press delete, the highlighted ligature won't actually be removed, only part of it will be.
2. In javascript the `===` operator is less visually distinct from `==` since if there are no comparisons nearby I have to judge from the length somehow and it takes longer, adding some additional comprehension fatigue.
Anyway like I said, you can probably try to fix all of these with weird special case handling but it's just fighting the underlying assumption my editor makes (and my brain has learned to understand): a source file is a list of runes. That's how I like to think about it.
Anyone is free to think otherwise. They're just in that other camp and that's OK. But I don't like them personally.
The cursor moves always over a character, regardless whether it is part of a ligature or not. You can select and delete or replace only a part of a ligature, exactly in the same way as when using a font without ligatures.
The only thing that ligatures change is the visual appearance of the text, which is in my opinion an extremely useful workaround for the inertia of the legacy programming languages, which continue to use the restricted ASCII character set, which has never included all the characters that would have been needed in a programming language.
Now with ligatures, we are finally able to write an ALGOL 60 program that looks like it was intended to look, 65 years ago, instead of being forced to use awkward combinations of ASCII characters, like later programmers and designers of programming languages have resigned themselves to do, because of the character set limitations that the American manufacturers of computing equipment have been unwilling to remove (causing problems not only to programmers, but also to the users of non-English languages).
ASCII has never been intended to include the characters required for writing mathematical expressions. ASCII has been intended only to include the set of characters that were in use in the commercial correspondence written in English, which had been previously available in the better typewriting machines. Because of this, ASCII has always been bad as a character set used for a programming language.
[1] `disable_ligatures cursor`
Code ligatures make text harder to read because they are not text, expecially when many of those ligatures are identical to actually existing unicode characters making the gliph more ambiguous than they laready were.
IMHO symbol should reflex their use: the dozens of different arrows used in math are meant to be handwritten or at least to be seen as a complex gliph.
=> and -> in almost every programming language is a = or a - followed by a >. We could very easily make a language where ⇒ is an operator (https://isabelle.in.tum.de/ does for example IIRC) but most languages use simple ascii characters for grammar for good reasons.
IMHO code ligatures are worse than a cursive coding font.
I actually have my editor configured to display special symbols for all control characters except U+000A NEW LINE (that includes U+0009 HORIZONTAL TABULATION as well, this is actually a control character, not a whitespace) and all whitespace characters except U+0020 SPACE.
Yes, but
1. Terms are more commonly added to the end than the beginning.
2. The beginning is often already special, e.g. starting with `SELECT`.
3. The commas are visually aligned/consistent with the indentation.
4. Because of #3, it's far easier to spot a missing comma.
I worked for years in a SQL-heavy role, and this style was the preference there, for these reasons.
I do not want to write either of your snippets, I want to write
SELECT
a,
b,
c,
FROM
Because now selected values are uniform and I can move them around or add new ones with minimal changes no matter their position in the sequence.It’s also completely wonky in many contexts e.g. CREATE TABLE. Trailing commas always works.
> I'd suggest to maybe just change your coding style - and make the start of the list special - instead of the end.
Supporting trailing commas means neither is special.
Also, to generalize a bit: if a human is expected to read it, the way humans write should be able to be parsed by it. That's subjective, to a point, but it's an easy rule-of-thumb to remember. Trailing commas are so common that people have built workarounds for them. Therefore, they can be understood as "the way humans write". If you're writing a language that you still want to be readable by humans, you really should account for that. And, no shade for it not already being done. I'm just reiterating that there should be NO pushback to allowing trailing commas. It's a completely "common-sense" proposal.
The SQL standards committee is having none of it. I can tell you that just from this one sentence.
And, more seriously, there isn't really such a thing as a common-sense proposal with SQL. The grammar is so warped after all these years that there isn't a path to consistency and the broken attempt at English syntax has rendered it nearly incomprehensible for both human and machine parsing. Any change to anything could have bizarre flow on effects.
I'd love to see trailing commas added to SELECT though. Given the mess it isn't possible to make the situation worse and the end of the list being special can be infuriating.
SELECT
a
b
c
FROM d
Or even SELECT a b c FROM d
Because now selected values are uniform and there's no superfluous punctuation to worry about.Ambigious:
SELECT a aliasA b c aliasC FROM d aliasD e
Unambigious: SELECT a as aliasA b c as aliasC FROM d as aliasD e
Alternately, a schema-aware parser could determine if 'aliasA' was an alias or a column reference.FWIW, personally, I'd rather go the full-Python, using newlines as delimiters:
SELECT
a aliasA
b
c aliasC
FROM
d aliasD
e
(With tabs for nesting.)I love Python - mostly, but significant whitespace is its worst curse, and I'd love new languages to move away from the idea. I enjoy Rust because I can write out garbage faster than I can think of it, and the auto-formatter will make it look neat.
Also, have you experienced the unspeakable horror that is templating YAML files? Of course in SQL prepared statements is the safe&sane way to go, but there are cases where you still have to generate SQL as text, such as when building an ORM.
SELECT a as aliasA b select q from foo as aliasQ c as aliasC FROM d as aliasD e
Even if you can figure it out (and add a 3rd level of sub-expression before deciding if you can), it's completely unreadable. You need either commas or parenthesis. Select Foo Foo, Bar +
1 Baz
Baz Bar
Quux Frob
from
t Table
IMO, required `as` and parenthesis are better. But it's not a clear thing where everybody will agree.
The big problem here is the seemed-good-in-the-70s syntax that died with SQL. In the best of all possible worlds they could have just used Lisp as the foundation then let people macro/transpile their own syntaxes. A subtle flaw in SQL is being just hard enough to correctly emit that most people don't build new languages on top of it.
I have really no idea what's "ideal". Haskell's "white space means application" is way nicer than most function syntaxes, end of line statement separation work very well if you don't need any long statement, the SQL complex syntax is very convenient... and yet if you mix any of those things, you'll get a truckload of disastrous issues.
But one thing that is clear is that allowing for trailing commas in SQL is a gain.
If the items in the list are long IF(...), maybe uses several lines and maybe contain SELECTs it's hard to see a missing trailing comma. At the start they're all lined up well, and it's very hard to get them wrong.
Real happy for you. Trailing commas support don’t prevent you from doing that.
SELECT ...
WHERE
foo = 1
AND bar = 2
I want to comment out a line (i.e. "--foo = 1"), but would break the syntax.The solution is to start with "WHERE true":
SELECT ...
WHERE true
--AND foo = 1
AND bar = 2
Now you can comment/comment anything.(Putting the AND at end of each line has the same problem, of course, and requires putting a "true" at the bottom instead.)
Of course then you get editors/linters/coworkers that always point out that the 'true' is unnecessary. This also doesn't work with ORs (just swap to false), but in practice it seems it is always ANDs that is being used.
SELECT ...
WHERE foo = 1
AND bar = 2
Each keyword gets a new line, the middle gutter between keyword and expressions stays in the same place, and things get really, really fugly if I need a subselect or whatever. Any given line can be commented out. (And no, none of that leading comma bullshit, somehow that looks nasty to me.) Downvote this into oblivion to protect the junior developers from being infected by whatever degeneracy has ahold of me.In order to quickly comment out the "foo = 1" here, you cannot simply comment out the whole line because it would become syntactically invalid:
SELECT ...
--WHERE foo = 1
AND bar = 2
I have a single keyboard shortcut to comment/uncomment a line because I like to work briskly with as little unnecessary typing.It has nothing to do with indentation or being "fugly". I'm talking about interactive exploration of prototyping when the final SQL isn't set in stone.
SELECT a, sum(b),
WHERE foo = 1
AND GROUP BY a
Sounds pretty SQL to me.It'd sure look funny if all keywords that connected clauses together were trailable, though.
SELECT a, sum(b),
FROM stuff
WHERE foo = 1
OR GROUP BY a
Or just as horrifying: SELECT a, b, c,
FROM foo
UNION ALL
SELECT a, b, c,
FROM bar
UNION ALL
`SeLeCt ... fRoM ... wHeRe ...` IS VALID! (And you should use a linter/formatter to avoid these categories of style war)
I catch (friendly) flak for my zealot SQL formatting (capitalization, indenting) and know it doesn't impact the execution, but there's something about working in logic / set theory that matches with strict presentation; maybe helps you think with a more rigid mindset?
I recommend looking at ClickHouse (https://github.com/ClickHouse/ClickHouse/) as an example of a modern SQL database that emphasizes developer experience, performance, and quality-of-life improvements.
I'm the author of ClickHouse, and I'm happy to see that its innovation has been inspired and adopted in other database management systems.
on top i often do a pad entry so that the elements are all on their own line
SELECT 1 as pad
, a
, b
then i can reorder lines trivially without stopping to re-comma the endpoints or maintain which special entry is on the line of the SELECT tokenwhat would be helpful is both LEADING and trailing commas
so I am suggesting:
SELECT
, a
, b
would be permissible too. The parsing step is no longer the difficult portion.Developer ease leads to less mistakes is my conjecture.
SELECT
a,
b,
c,
1 as pad FROM
Then?typically names are different lengths and the commas are hard/harder to spot
Your suggestion:
SELECT
first_column,
second_column_wider_a_lot,
(third + fourth_combined_expression),
1 as pad FROM
vs my current preference: SELECT 1 as pad
, first_column
, second_column_wider_a_lot
, (third + fourth_combined_expression)
FROM
SELECT *
FROM table
WHERE 1=1
That way, if you want to filter down the result, everything programmatically appended just needs an "AND ..." at the start, like: SELECT *
FROM table
WHERE 1=1
AND age > 21
AND xyz = 'abc' ...
Because without "WHERE 1=1", you'd had to handle the first condition different than all subsequent conditions. DELETE FROM table
WHERE 1=0[ OR 1=1
AND age > 21
AND xyz = 'abc']
;
Brackets designate selection bounds before text deletion. The above just safely does nothing after you hit DEL.Without that you’d have to delete whole [WHERE…], which leaves a very dangerous statement in the code.
That method does impact the result set, and using it for CTAS or bulk insert would require more care in column selection.
SELECT
a,
b,
c,
FROM ...
?
SELECT 1 as pad
, a
, b
SELECT
a
, b
, c
FROM ...
0 - https://mode.com/blog/should-sql-queries-use-trailing-or-lea...
No, it's still a really good reason to mess with the standard.
While the standard doesn't evolve into something slightly modern, yes, that workaround is better than the way people usually write SQL. But its existence isn't a reason not to fix the fundamental problem.
, b
, c
FROM ...
is the same as:SELECT a, b, c FROM ...
The line before the Code has to be empty to get correct formatting.
I think the right answer is to fix the merge algorithm to handle some common cases where an inserted line logically includes the delimiter at the end of the previous line.
SELECT
, a
, b
, c
FROM ...
and SELECT a,
b,
c,
FROM ...
or SELECT
a,
b,
c,
FROM ...
the only limitation is that sometimes more parenthesis are needed and that SELECT col_name new_col_name from table_name needs to be rewritten as SELECT col_name as new_col_name from table_name. good tradeoffs IMO SELECT a, b, c, NULL FROM ...
SELECT a, b, c, true FROM ...
SELECT a, b, c, 'ignore me' FROM ...
FWIW, my SQL grammar ignores trailing commas. IIRC, H2 Database does too.I also have a somewhat controversial style preference in regard to capitalization. Because SQL is case-insensitive, I will always type everything in lowercase and let syntax highlighting do its thing. I hate mixed capitalization. Not only does it feel like keywords are yelling at me, but also the moment someone else gets involved there will be inconsistent casing. Do you capitalize just the keywords or do you include functions? How about operators (e.g. “in” or “like”). More often than not I see individuals are inconsistent with their own queries. So I say to hell with it all and just keep it lowercased
You do not need to see the missing comma. That is what compilers are for. Also, literally the only reason anyone has a missing comma is because they reordered the terms and forgot to put the comma onto the one that was forced to not have one because of this monstrous failure.
Some things are nothing but good. You are on the wrong side of history.
Why do you seem to think you've cleverly solved the problem, when you've just moved the problem somewhere else just as bad, by blithely messing with the standard formatting conventions universally used by most human written languages and programming languages in the world?
Programming languages borrow commas from human written languages, and no human written languages have leading commas. And moving the problem to the beginning on the list because you sometimes add things to the end ignores the fact that that also causes problems with READING the code as well as writing it, and you READ code much more often than you WRITE it.
That's not a solution at all, and there's nothing clever about it. You've just pushed the problem to the beginning of the list, and now your code is also butt-ugly with totally non-standard formatting, which sane people don't recognize and editors and IDEs and linters don't support.
I agree with cnity that it's too clever by half, and I'm in the camp (along with Guido van Rossum and his point that "Language Design Is Not Just Solving Puzzles" [1] [2]) that's not impressed by showboating displays of pointlessly creative cleverness and Rube-Goldbergeesque hacks that makes things even worse.
Of course it's not as bad as tezza's clever by a third suggestion to add a confusing throw-away verbose noisy arbitrarily named pad element at the beginning, that actually forces the SQL database to do more work and send more useless data. Now you have three or more problems. The last thing we need is MORE CODE and network traffic contributing to complexity and climate change. I would fire and forget any developer who tried to pull that stunt.
[1] https://www.artima.com/weblogs/viewpost.jsp?thread=147358
All Things Pythonic: Language Design Is Not Just Solving Puzzles. By Guido van van Rossum, February 10, 2006.
Summary: An incident on python-dev today made me appreciate (again) that there's more to language design than puzzle-solving. A ramble on the nature of Pythonicity, culminating in a comparison of language design to user interface design.
[...] The unspoken, right brain constraint here is that the complexity introduced by a solution to a design problem must be somehow proportional to the problem's importance. In my mind, the inability of lambda to contain a print statement or a while-loop etc. is only a minor flaw; after all instead of a lambda you can just use a named function nested in the current scope.
[...] And there's the rub: there's no way to make a Rube Goldberg language feature appear simple. Features of a programming language, whether syntactic or semantic, are all part of the language's user interface. And a user interface can handle only so much complexity or it becomes unusable.
[2] http://lambda-the-ultimate.org/node/1298
The discussion is about multi-statement lambdas, but I don't want to discuss this specific issue. What's more interesting is the discussion of language as a user interface (an interface to what, you might ask), the underlying assumption that languages have character (e.g., Pythonicity), and the integrated view of semantics and syntax of language constructs when thinking about language usability.
A tad bit harsh there?
it is a trade off.
clarity and ease during design time versus slightly uglier but still consistent code as a work around. miniscule energy overhead
>tezza 4 hours ago | root | parent | next [–]
>as mentioned elsewhere, i personally introduce a pad element to get fire and forget consistency
Adding an extra pad entry is a cure much worse than the disease, and I'd expect it should be objectively obvious to anyone that you're introducing more complexity and noise than you're removing, so it's not just a matter of "style" when you're pointlessly increasing the amount of work, memory, and network traffic per row. But sadly some people are just blind to or careless about that kind of complexity and waste.
You might at least have the courtesy of writing a comment explaining "Ignore this extra unused pad argument because I'm just adding it to make the following commas line up." But that would make it even more painfully obvious that your solution was much worse than the problem you're trying to solve. You seem to have forgotten that other people have to read your code. Maybe just don't leave dumpster fires burning in your code that you want to forget in the first place.
As Guido so wisely puts it: "the complexity introduced by a solution to a design problem must be somehow proportional to the problem's importance".
It is a solution, and I'm not motivated by trying to be "clever". It just makes writing and reading the query easier for me.
>You've just pushed the problem to the beginning of the list
The beginning of the list is modified less often than the end. The two cases aren't symmetric.
>and now your code is also butt-ugly with totally non-standard formatting
"Ugly" is subjective. Personally I like how the commas line up vertically, so I can tell at a glance that I didn't miss one out. SQL doesn't have a standard formatting in any case. It's whitespace insensitive and I've seen people write it in all kinds of weird ways.
>which sane people don't recognize and editors and IDEs and linters don't support.
A difference in code-formatting taste is not "insanity". And it does not interfere with tooling at all.
>showboating displays of pointlessly creative cleverness
Where are you getting all of this from? You seem to be imagining a "type of guy" in your head, so that you can be mad at him.
I am reminded of Sayre's law: In any dispute the intensity of feeling is inversely proportional to the value of the issues at stake.
How is this any different from leading periods, which seems to have become the standard across several of the most popular programming languages?
myobject
.somefunc()
.otherfunc();
It's not subjectively pleasant in my opinion, but I think it's hard to argue that it doesn't improve maintainability and (apparently) readability for the masses. o
ムワ
The padding and separators are the least issues I wish I only had there.We don't need philosophy or morals (or jobs where that's important), we just need the way to edit these damn lines without anything screaming "syntax error!" or parasitic "++--" diffs every time you make a change. “When art critics get together they talk about Form and Structure and Meaning. When artists get together they talk about where you can buy cheap turpentine.”
-
(*) That classic claim didn't turn out true for me after so many years, that I suspect developers simply avoid admitting that it's not true for them either.
Who is writing SQL queries expecting them to fail because of this reason?
By your definition, "backwards compatible" doesn't mean anything. Literally everything will be a breaking change if you define "backwards compatible" like this.
It's backwards compatible from the perspective of the database - the db will continue to support queries from older versions.
I believe you're speaking from different perspectives.
But they never ran fine on engines that didn't support trailing commas in the first place :/
What you're calling "forwards compatible" is what I call "backwards compatible". Frankly, I suspect most people expect "backwards compatible" stuff to work like this.
Is this distinction useful in any way?
If you are trying to contribute, you will use the phrase backward compatible as we always do: my old code will still run after the change.
If you are trying to be pedantic, you will choose a new way of looking at it: My new code will not work if you go backward in time.
I don't care. I will swear to never write a dangling comma for any engine that doesn't support it. Also, I will swear to never use an engine that doesn't support it if once that does exists... Just as I did with Javascript.
What annoys me far more often is the lack of support for trailing commas in JSON.
This, 100%! And the lack of comments.
JSON is already strictly worse than XML though as it doesn’t support comments and multi-line strings in a sane way.
But I really don't need philosophy. I know what I want, and I want json with $subj, //comments and optional key quoting. Feels like some people just love making inconvenient things standard and/or teaching others how to live. (I mean the idea "json is not for X" here, not your comment)
And just to be clear: YAML is also not a config format and wasn't meant to be. YAML is for metadata style stuff that is supposed to be close to humans, heck the whole yes/no and "not so strict" typing parts.
If you want a config format you got many options: There is toml/ini and friends on one side and UCL/HCL/... on the other. Or if you want to go really simple, do something like Postgres, Tor, etc. do and just use space separated strings.
Feels like it does 75% of what PRQL does while still staying somewhat backwards compatible. Already works in BQ if you opt in.
It has the small benefit that you never, ever have to worry about running into an old version of a JSON parser that bounces your fancy trailing commas.
Worth it? Not to me but Douglas Crockford does not care what I think. Or you either, apparently.
', '.join(fields)
;)[{},{},,,,{},,
Should be fine. Now you can push things to the eof.
I can use csv or html ofc.
The first case that jumps to me: if you write "a,b,c," you can't know whether you forgot a parameter, you passed an empty string where there shouldn't be one, or you intentionally left a trailing comma. And SQL already is human-oriented - compare your typical SQL to a data operation in K [2]. It just so happens that some things are hard and off-by-one errors are famously near the top of the list [3].
Whenever a database complains that your SQL query contains a trailing comma it's a sign that you may not have paid enough attention to your arguments (or even worse, that you're building queries by hand instead of using prepared statements). From where I stand, not allowing trailing commas is a feature, not a bug, and I would therefore object against them.
[1] "be conservative in what you do, be liberal in what you accept from others" - https://en.wikipedia.org/wiki/Robustness_principle#Criticism
[2] https://news.ycombinator.com/item?id=42999650
[3] http://www.randomprogramming.com/2014/09/quote-of-the-week-h...
SQL is fundamentally unsafe "forgot the item"-wise. It doesn't allow dynamic column selection, there's no type safety, nothing. If you want guarantees, maybe make a proper language with guarantees rather than justifying random side-effects of a comma being a syntax error sometimes.
I reject any argument about how mature SQL is at this point. Just bump the version number (SQL 25) if necessary and make it so.
It all came as a very big surprise, out of thin air.
That all of you can take something so futile so seriously, spend time writing about your workarounds (which, of course, we all have in spades), is simply charming.
Thanks for you being you.
A lot of people prefer purity of grammar; these are the kind of folks who demand coding style guidelines and will be very unhappy if you violate their preference. They’re not wrong, but they have a very strong preference.
A lot of people don’t care so much about particular style decisions, but they want readable code that is easy to understand, maintain and modify. I think a lot of the trailing-comma-preferring people fall into this category. But again, it’s a preference and isn’t right or wrong.
There are probably also people who don’t care and compete in obfuscated code contests or try to minimize line counts by putting as many statements on each line as possible or otherwise writing genius-but-unreadable code by taking advantage of language and syntax idiosyncrasies. This is a preference too (although I consider it antisocial).
My preference is, if it’s unambiguous, allow trailing commas wherever there are lists, because it makes cut and paste operations much easier.
But at the end of the day that’s just my preference and not superior to anyone else.
SELECT (a b c) from sometable where id in (10 20 30)
This would be quite useful when doing exploratory work and you want also to copy/paste values from somewhere else.But to be fair the main issue in this case is handling of the WHERE clause, because (un)commenting parts is never straightforward
select foo from bar
where
x < y -- cannot just comment this
and (
z = 10 -- neither I can comment just this
or baz is not null
)
Yes, one could put AND and OR on their own like, but similarly one could put a comma on a line of its own...I'm not convinced this is better than what you are replacing.
Please add this.
Don’t forget to take into account the people not requesting this feature.
A lot of them don’t think this is worth the changeover.
I think nobody wants to make trailing commas mandatory to use
It would be really helpful if the author had provided at least a couple of these.
I can't think of any obvious examples that would be complicated/conflicting, so it's not even clear if this is real complexity or not. I mean, it might be, but let's at least demonstrate that concretely?
SQL grammar is pretty limited. Surely it can't take more than half an hour or so to check if any grammatical ambiguities could be introduced -- or at least a quick first pass? This whole post is postulating about theoreticals when it could just answer some of them.
> Option 2 is weird, how do you determine the cutoff?
The cutoff seems obvious. If it is currently always a syntax error allow a comma, if it can be valid syntax preserve the existing behaviour.
I also can't think of any other cases right now, but if needed there can be a few places that don't allow trailing commas. As long as SELECT and CREATE TABLE support it 99% of the benefit will be there.
On the downside, does this make it easier to write code that is vulnerable to command injection, because it is easier to append a bunch of statements passed as input?
For the WHERE clause, if i happen to be programatically appending conditions, i'll start with `WHERE true` so i don't need to consider if i'm appending the first condition (no AND/OR/NOT etc.) or a later condition (required AND/OR/NOT etc.). A decent query planner should ignore the fixed value.
PRQL appears to be a rather small project... not some major corporate effort.
The main limitation is developer time. There is so much that could be done with PRQL! Without corporate sponsor, parent company, or more contributors, velocity is unfortunately limited. If you'd like to see that change, please reach out!
C bindings would be great! Do you want to open an issue in the repo?
There was also a helpful comment on a HN thread a few weeks back about how to make the API better to develop against. I've been meaning to get back to that but been constrained myself.
In Rust, when you define a `#[no_mangle] pub unsafe extern "C"` function, and then compile as a shared object / dll, that function will be exposed in an ABI-compatible way the same as any C function would be. It's just a matter of defining the proper header file so that you can use it from a C program, or from any other programming language that can bind to C.
Writing a header file manually is boring and error-prone, so people will often autogenerate the header file for the exposed functions using a tool like cbindgen: https://github.com/mozilla/cbindgen
We can see that PRQL is using cbindgen here to automatically create a C header: https://github.com/PRQL/prql/tree/main/prqlc/bindings/prqlc-...
The public API that PRQL wants to expose is defined in Rust here: https://github.com/PRQL/prql/blob/main/prqlc/bindings/prqlc-...
The generated C header file is here: https://github.com/PRQL/prql/blob/main/prqlc/bindings/prqlc-...
And that C header file -- combined with the compiled library -- should be all that is needed.
I suspect that the PRQL maintainer is saying that they want to offer a more idiomatic binding for C. The raw API that is exposed may not be the most user-friendly API, especially since they don't seem to have much familiarity with what is considered "idiomatic" in C, so they haven't been ready to commit to that API being considered "stable" yet. Based on my own poking around in their existing bindings... that C binding appears to be the API that they are using internally in those other language bindings already. (I'm also not sure how else they would be creating most of those bindings, if they weren't using that C binding... apart from some special cases, like how there is a convenient alternative for exposing bindings to Python from Rust, for example.)
We can see in the dotnet bindings, for example: https://github.com/PRQL/prql/blob/main/prqlc/bindings/dotnet...
C# does not allow directly using a C header file, so it requires manually re-defining the same set of extern function signatures, but it appears to be the same.
I'm not an expert on PRQL by any means, and it's been a few years since I really used Rust, but I'm just piecing together what I can see here.
This article I found could also be helpful: https://www.greyblake.com/blog/exposing-rust-library-to-c/
Rust code normally does not adhere to a C-compatible ABI, but the purpose of these "extern" functions is to do exactly that when you're trying to expose code that can be called by standard conventions... since the industry has largely settled on C-style functions and structs, for better or worse, with all of the limitations that imposes.
> C# does not allow directly using a C header file
https://github.com/dotnet/ClangSharp?tab=readme-ov-file#gene...
there are bespoke libraries which build on top of it like CsWin32 where you specify the methods/modules to import and get nice and, often, memory-safe types and members in C#.
I think it should be possible to enhance this even further like having '// pinvkgen: #include <some_dependency.h>' at the top of any particular C# file and getting what you need in a C-like way. There are some problems with this inline approach but it could work.
The main point is there are quite a few community packages which simplify authoring bindings (there are more, like https://github.com/Cysharp/csbindgen/ for Rust or https://github.com/royalapplications/beyondnet for Swift). It really shouldn't be a necessity to write bindings by hand for large dependencies as it's both error prone and a solved problem.
I could have missed something but I don't see anything that would be impossible to parse without commas so long as there's at least one character of whitespace separation.
Personally I agree with the sentiment, I find it annoying to have to juggle the commas on the last column name but I think there is likely a valid reason to do with making lexing easier to reason about.
I'm not sure if this is a solved problem at the level of the ANSI SQL spec or if every vendor does their own thing, but there's definitely plenty of precedent that ambiguous grammar is allowed and can be resolved.
select
c1,
c2,
c3,
...,
c50,
sum(c51),
from
table
group by all
Its the same as having the imports first
If I had to choose I'd opt for better editors.
‘SELECT a, b, c, 0 FROM t1 ‘
Having newline be a valid list separator is particularly nice because it solves the "trailing comma" and "comma-first" style workarounds in a visually elegant way. The newline already provides a visual separator; we can already tell that we're at the end of most lists by way of having another keyword appear next without needing to rely on a lack of commas, for example:
select
id
name
email
from users
``` SELECT Field AS Renamed, OtherField AS AlsoRenamed ```
and
``` SELECT Field Renamed, OtherField AlsoRenamed ```
are semantically equivalent.
That said, generating SQL from s-expressions can be a very pleasant experience.
Just read the code, write your change, move on. If you find yourself missing small details like this you need to just slow down a tiny bit. Its not hard. By all means format your code cleanly, just don't spend cycles writing blog posts about it.
Whenever I edit JSON and SQL there is a lot of fiddling with commas when rearranging lines or adding new lines to the end of the list. In other languages I use (Go and TypeScript) there is no such fiddling.
Now extrapolate to the whole industry. There's a cost, regardless of how someone might want to value it.
I mean sure, if people spend most of their day just mindlessly copy pasting crap then perhaps I guess. But compared to the amount of time I find I spend thinking about the actual problem at hand, the single keystroke to edit a comma is a rounding error.
What a joke.
Frequently I will be working on a query and have something like
SELECT
a
, b
, c
from foo
and then I want to comment out a column as I am working and exploring the data. But I cant comment out column a without also removing the comma on the next line. SELECT
a,
b,
c,
From Foo
Then if I can have an extra leading comma, I can reorder, comment out or remove, or add a column at any point in the list without having to think about the rest of the projection. Also diffs are cleaner, it only highlights the rows that have changed, not the row above it/below it that had its comma added or removed. This happens a ton when im iterating on a query.
I've found LLMs do a really good job of writing/cleaning up SQL.
Every time I see a trailing comma, I think "is this a bug? did they forget an element?".
Edit: as a reply to the comments below: that's a lot of hassle to save you hitting backspace once. There's premature optimization, and then there's single keystroke optimization.
Every time this trips me up, it’s hand written SQL, because literally every other language I routinely use supports trailing commas.
The additional complexity during codegen is barely existent. If you’re using an orm or code generator this will be an issue once at most (if and when you write your own).
From my experience there are a lot of programmers that simply don't write SQL at all. They use an ORM or some query building library in their preferred programming language. So they're the ones that don't understand the problem when handwriting SQL, because they simply don't do it. Even the parent comment here is talking about PHP array syntax instead of SQL.
I write all of my queries directly in SQL, and I run into the trailing comma issue somewhat frequently. It's a minor annoyance, but I've just been dealing with it forever and accepted it.
Great contribution, thanks for nothing.
1. It makes git diff less noisy when I need to add new element at the end since you don't need to add an extra comma at the end of the now-second-to-last element.
2. It makes reordering the last element a lot more pleasant during development since you don't need to add and remove comma when you are moving the line.
And the same applies (or I wish applies) to SQL. Time and time again I wish trailing comma is a thing in SQL when I'm constructing and testing even slightly more complex queries.
Unless I change the first column, no problem.
Supporting leading or trailing redundant comma are both good options. Not supporting either isn't.
When you stop working with any other code than SQL you look back on your younger self and think ‘I was soooo young’ when typing those lovely leading commas.
Learned to love the functionality of these though and the looks grew on me.
my_things = [
"thing_1",
"thing_2",
"thing_3",
]
And not have to juggle commas when you add a new end thing.And some, amusingly, don't.
Anyway, SQL is a different beast entirely, this is specifically about hand-written SQL which needs to be optimised for readability and comprehension, both in the query itself and the diffs changing them. Spreading columns and arguments across multiple lines is common if not mandatory for writing maintainable SQL.
Those are the same thing, just different keystrokes.
> diff noise
This is a non-issue, or rather a solved issue. Any half competent diff tool does word diffs instead of just line diffs. Changing the syntax of many (every?) programming language to promote "better diffs" seems over the top. And what's the result? "a,b" -> "a,b,c" vs "a,b," -> "a,b,c," both have equally "noisy" diffs.
> Spreading columns and arguments across multiple lines is common if not mandatory for writing maintainable SQL.
I agree. But I disagree that a trailing comma makes it more readable or maintainable in any way. I can't look at a trailing comma without wondering what's missing.
Either you're cuddling the strings with your hands and it'll take time and care anyway, or you're doing programmatic massage, in which case most DB-driver capable languages has a bunch of functions that solve issues like these.
Syntax error at or near ')'