I agree that it's a tragedy. XML had so much standardization, tooling, validation, etc. available; it was great. But people used it for configuration and wrote it by hand, which sucked.
15 years later, JSON is still very far from the standardization and tooling that XML had at the time.
It reminds me of the NoSQL thing from back then too; oversimplified, it was "what if we chuck JSON blobs in a key/value store?". It took years to realize that relational databases and SQL weren't actually that bad, and/or that NoSQL had a long-term cost.
1: Standardization was made by committee lovers and/or architecture astronauts leaving us with overly convoluted (sometimes to fit lacklustre object models in early OO languages) and complex ways of working.
2: Complexity that introduced security vulnerabilities
Sure, there was some great tooling available, but how much of it was needed because all the complexity made it hopeless to work with without the tooling?
I used it for configuration and serialization in some projects, and it was actually great but I almost always diverged from the bloated norms and defaults for readability/writeability (that made it a bit annoying to specify serialization rules).
Yeah there is so much "flexibility" in those above designs but it wasn't needed 99% of the time.
JSON was and is so far the best popular compromise between "just plain data" and "some" structure to make automated processing non-painful.
Also, an improvement over XML collections (do you create a container element to specify the container target, leading to bloat, or just map some of the sub-elements to specific collections and hope you don't run into ambiguities?) is that collections are just lists assigned to a property.
The biggest drawback of JSON is that we never had a way to handle type specializations/subtyping, but had we done that we might not have gotten the universal acceptance across languages.
Yes, comments but whenever you need that you can make a single-line regexp to strip a useful subset of them without affecting anything by the standard by removing matches of
<foo>
  bar
</foo>
<nested>
  <nested_foo>
    nested_bar
  </nested_foo>
</nested>
Or should I do this?
<foo>
  bar
</foo>
<nested nested_foo="nested_bar"/>
If your goal is to simply encode a data structure that looks more or less like a C-style struct (key/values, arrays, primitives), the concept of attributes on tags is superfluous, and introduces ambiguity in how to serialize something.
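To make the ambiguity concrete, here is a rough Python sketch using only the standard library. The `<root>` wrapper and the JSON text are my own assumptions (the snippets above have no single root element, and the JSON version is not shown), and the function names are made up for illustration. The point is only that each XML encoding needs different reader code, while the JSON shape leaves one obvious way in.

```python
import json
import xml.etree.ElementTree as ET

# Two XML encodings of the same record, wrapped in a hypothetical <root>.
AS_ELEMENTS = (
    "<root><foo>bar</foo>"
    "<nested><nested_foo>nested_bar</nested_foo></nested></root>"
)
AS_ATTRIBUTE = '<root><foo>bar</foo><nested nested_foo="nested_bar"/></root>'

# Assumed JSON equivalent of the same record.
AS_JSON = '{"foo": "bar", "nested": {"nested_foo": "nested_bar"}}'


def nested_foo_from_elements(xml_text: str) -> str:
    """Reader for the child-element encoding."""
    root = ET.fromstring(xml_text)
    return root.find("nested/nested_foo").text


def nested_foo_from_attribute(xml_text: str) -> str:
    """Reader for the attribute encoding; the element path above would find nothing here."""
    root = ET.fromstring(xml_text)
    return root.find("nested").get("nested_foo")


def nested_foo_from_json(json_text: str) -> str:
    """Reader for the JSON encoding: nested objects are the only option."""
    return json.loads(json_text)["nested"]["nested_foo"]
```

A consumer has to know in advance which of the two XML conventions the producer picked; nothing in the document itself says so.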
To me, JSON is nice because it's essentially the minimum viable way of encoding a C-style struct (floating point behavior notwithstanding.) XML has extras like attributes, schemas, and DTDs, which may be useful, but they come at the cost of having additional syntax (<!DOCTYPE ...>, etc), which is auxiliary to the goal of encoding data structures, and thus makes it no longer as minimal.
To me, JSON's approach of having separate, out-of-band schema definitions (e.g. JSON Schema) is the better approach because it's less opinionated, you only pay for it when you need it, and it doesn't require separate syntax. Leave validation to a validation step, my data is my data.
You forgot to mention XSLT, which is just such a ridiculously powerful tool when you've got XML data. It allows one to transform XML into XHTML or even other formats. Edit: Browsers also support this automatically, so you open an XML document and you get to see the HTML rendered from it.
I had been tasked at least several times in my past jobs to develop XSLT scripts to transform data into user-readable content. I don't know of anyone that uses XSLT today and I have no idea if there is a JSON equivalent.
I wrote a simple CMS a few years ago that used XSLT to assemble static pages from templates and data. It worked like a dream, although writing XSLT is not the easiest task!
When you are commenting your schema, that's true.
Anything which is generated by machines doesn't need comments either.
But when it's written by people? And the values? That belongs with the 'payload'.
This is true, but it is also true that if someone who didn’t read the documentation (or the latest version of it) changes a config parameter and brings down a critical system you are responsible for, your bosses or customers won’t care whether it was in the documentation or not.
"I use JSON for just about every data file format in my games."
Not to be mean, but this has the trifecta of amateur programming:
- JSON
- Games
- One solution for everything.
Pro tip, you can store variables as they are in memory to disk. Got 1 million 2D points representing units? Each point is a couple of floats? You can store each float as-is: write the number of floats as an int (4 bytes), then the first float is the X coord, the second the Y coord (4 bytes each), then repeat 1M times, boom you just solved that in 8MB, and in a couple of milliseconds of compute. Bonus point, no escaping, no import json, just a programmer programming.
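For what it's worth, the layout described above can be sketched in a few lines of Python with `struct` and `array`. The names `save_points` and `load_points` are hypothetical, and this assumes native byte order and 4-byte floats, so the file is only readable on a machine with the same endianness.

```python
import struct
from array import array


def save_points(path, points):
    """Write a 4-byte count of floats, then x and y for each point as 4-byte floats."""
    flat = array("f")  # "f" is a C float, 4 bytes on common platforms
    for x, y in points:
        flat.append(x)
        flat.append(y)
    with open(path, "wb") as f:
        f.write(struct.pack("=i", len(flat)))  # header: number of floats that follow
        flat.tofile(f)  # raw in-memory representation, no escaping


def load_points(path):
    """Read the file written by save_points back into a list of (x, y) tuples."""
    with open(path, "rb") as f:
        (count,) = struct.unpack("=i", f.read(4))
        flat = array("f")
        flat.fromfile(f, count)
    return list(zip(flat[0::2], flat[1::2]))
```

One million points is 2,000,000 floats at 4 bytes each, so 8 MB plus the 4-byte header. Values come back rounded to single precision, which is part of the tradeoff.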
> Not to be mean, but this has the trifecta of amateur programming
But you are being mean. Also, the guy who wrote the article is no amateur, but a seasoned veteran who's likely been storing floats in files for decades. Check him out, you might be surprised.
To those who may not know, Grumpy Gamer is a blog by Ron Gilbert, the creator of iconic game titles such as Maniac Mansion and Monkey Island. A true legend.
> Why is italics *italics* and bold is **bold**? Why not *bold* and _italics_? That would make a lot more sense to me.
I think that comes from separating content and style, indicating meaning rather than explicit style: it isn't really one asterisk for italic and two for bold, it is one for emphasis and two for strong emphasis, and the renderer chooses how to display those levels. Like using HTML's “em” and “strong” tags instead of explicit “i” and “b” tags.
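As a toy illustration of that mapping (not how any real Markdown implementation works, and nowhere near the CommonMark rules), a two-regex Python sketch might look like this. `render_emphasis` is a made-up name, and it ignores escapes, underscores and nesting.

```python
import re


def render_emphasis(text: str) -> str:
    """Map **x** to <strong> and *x* to <em>; deliberately simplistic."""
    # Handle the double-asterisk form first so it is not consumed
    # as two single-asterisk spans.
    text = re.sub(r"\*\*(.+?)\*\*", r"<strong>\1</strong>", text)
    text = re.sub(r"\*(.+?)\*", r"<em>\1</em>", text)
    return text
```

The output names levels of emphasis, not fonts; whether `<em>` ends up slanted is the stylesheet's business.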
Back on Fidonet/Aminet/Spot on the Amiga in the 1990s, pre-internet, the convention was /italic/, *bold* and _underline_. This was much more intuitive for me than what we have today.
The problem is that markdown has no real standard. Well, it does, but not everything follows it because many things existed before the formal standard and many created since are made to be compatible with something that pre-dates the standard. Some optimise for matching stylistic intent (bold, italic, underscore) and others prefer to be more abstract (emphasis, strong emphasis), and yet more try to support both but that requires compromise and they don't all compromise the same way.
Many interpreters will accept underscores for italic, though they still generally (but not always) require two asterisks for bold.
I just accept it as a general idea, not a standard, and look up the local conventions for whatever tool I'm using at the time. Or if I'm writing a translator, I prioritise converting things written how I personally prefer to write plain text documentation.
Since I started taking notes for basically everything[0], markdown has been a savior. Just text is fine, but when you're able to sprinkle a little bit of formatting here and there (and provide links) without sacrificing the readability, this is just great.
This is why I have standardised on Obsidian as my data storage along with Datasette[0] (by simonw) for larger data amounts.
Watch a movie? Add a page to Obsidian with the movie title as the note title, run a python script and boom it has all of the metadata filled up along with everything relevant from TMDB and it's a pretty card with a cover image on my Movies Base.
If Obsidian turns Evil Corporate, my workflow will still be the same, the editor just changes. I'll miss Bases, but all of my own automation is a bunch of external scripts that modify markdown.
The lack of comments is a bummer; allowing /* */ style comments seems like it wouldn't complicate parsing significantly, and instead I've sometimes ended up having to do things like this:
{
  "channel-comment": "This is the Slack channel that will receive notifications.",
  "channel": "#abc"
}
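An alternative to the comment-as-key workaround is to strip comments yourself before parsing. Here is a minimal Python sketch, assuming comments are restricted to whole lines whose first non-blank characters are `//`; that restriction is my assumption, not part of any standard, and `loads_with_comments` is a hypothetical helper. Restricting to whole lines keeps the regex from touching `//` inside string values such as URLs.

```python
import json
import re

# Assumption: a comment is a whole line starting (after indentation) with "//".
COMMENT_LINE = re.compile(r"^[ \t]*//.*$", re.MULTILINE)


def loads_with_comments(text: str):
    """Drop whole-line // comments, then parse the rest as ordinary JSON."""
    return json.loads(COMMENT_LINE.sub("", text))


CONFIG = """
{
  // This is the Slack channel that will receive notifications.
  "channel": "#abc"
}
"""
```

The file on disk is no longer strict JSON, of course, so every other tool that reads it needs the same preprocessing step.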
Coming from PHP, the lack of trailing commas in JSON always bites me. It's annoying when I rearrange the items and the line with the missing comma ends up buried somewhere in the middle.
I like the comma in front for SQL better than at the end, but you still have the problem of one field missing a comma: the first field instead of the last. I guess you're more likely to move fields around and forget to add or remove a comma at the end, though.
As for JSON, I am also constantly annoyed by lack of trailing commas and mandatory quotations for keys. However I think these were the right design decisions and the slight annoyance is a small price to pay (especially when automation exists).
No trailing commas is great for enforcing consistency. I’m a huge fan of consistency in code. Same with required quotation marks, which also simplify writing (imagine having to wonder if something needs it, or be surprised when it does and things break).
I don't understand you. Forcing trailing commas is one of the best features of Go, it enforces consistency where you must have a comma at the end of the line.
Reordering lines? No worries, all of them have commas. Removing a line? No comma to change anywhere.
No trailing commas is actually INconsistent, consistency is when every element ends with a comma
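A quick way to see the difference, sketched in Python (which permits, but unlike Go does not require, a trailing comma in a multi-line literal). `is_valid_json` is a hypothetical helper built on the standard `json` module.

```python
import json

# Every line ends the same way, so lines can be reordered or removed freely.
channels = [
    "#abc",
    "#def",
]


def is_valid_json(text: str) -> bool:
    """Return True if the standard json module accepts the text."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True
```

With this, `is_valid_json('["#abc", "#def"]')` is true while `is_valid_json('["#abc", "#def",]')` is false: the same list that is fine as a Python literal is rejected as JSON.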
You've also got it backwards on quotes, it complicates writing by forcing you to write more. And with "Especially when automation exists" wondering is a non-issue, you'd get the syntax hint/error right there while typing and see if you need quotes before anything breaks
While I have no problems with indentation-based syntax, it's not very conducive to minification, so it's a no-go in JSON's case.
Coloring things is a luxury, and from my understanding not many people understand that fact. When you work in the trenches you see a lot of files on a small viewport and without coloring most of the time. Being able to parse a squiggly file on a monochrome screen just by reading is a big plus for the file format in question.
As technology progresses we tend to forget where we came from and what the fallbacks are, but we shouldn't. Not everyone is seeing the same file on a 30" 4K/8K screen with wide color gamut and HDR. Sometimes an 80x24 over some management port is all we have.
This is a popular question, the most common answer I’ve seen is:
> Commas exist mostly to help JSON be human-readable. They have no real syntactic purpose as one could make their own notation without the commas and it'd work just fine.
It’s an old convention that underline in a manuscript (handwritten or typewritten) directs the typesetter to use italics (as underlines are basically nonexistent in professional typesetting before the WWW). I expect that this is where the _italics_ thing (which predates Markdown) comes from. (There is precedent for /italics/ and I don’t think it’s unreasonable, but it is much rarer.)
To add to this, when I went to school for design a long time ago, our typography teacher basically told us to never use underlines if we can use italics instead. It tends to mess with the readability of a paragraph and shifts the visual center of gravity downward, making text more difficult to parse.
I assume that’s also why italics and underline seem to be used interchangeably from time to time, since they generally achieve the same goal of emphasizing text in the same semantic manner.
Trailing commas were common in language design long before JSON or even JavaScript existed, as they simplify machine generation of code while being comparatively trivial to handle in a parser, so a net win.
The convenience it offers for diffing is just a manifestation of the positive interaction with grammars and language tools. The convention of humans using trailing commas in lists, along with one item per line, is relatively new, though. Stylistically, this used to be frowned upon as long definition lists made source files longer, slower to scroll through, and worsened code locality from the perspective of someone using, e.g., a 25 row terminal.
Huh, I don't know about that. I find it much more convenient for editing because it means the last item in a list doesn't behave differently when it comes to cut/copy/paste.
It can also help eliminate some editing errors, when copying entries to extend a list or reordering entries.
I prefer leading commas to having a final comma with an empty clause, though some people hate that and they don't really solve all the final-entry issues (they address some of them, but others are just moved to being first-entry issues).
I'd rather have a format supporting a friendly truth on the ground (on the filesystem) than adding yet another "almost like standard behavior" quirk to tooling.
Git is generally insufferable when it comes to these. Diffing YAMLs is even worse, and it gets downright hideous when the specific document you're working with betrays YAML's rules about orderedness (the document is order invariant, while YAML is ordered). In that case, even most semantic diffing tools become unusable. This is a thing with JSON too, arrays are ordered.
I've been recently using dyff [0] to diff YAMLs in an order invariant way, and it's been absolutely liberating. Couldn't help with version control, but it's still night and day.
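For JSON at least, a crude version of order-invariant diffing can be had from the standard library by re-serializing with sorted keys before running a text diff. This is only a sketch (the `canonical` helper and the sample documents are made up), and it does nothing for YAML, which the Python standard library does not parse.

```python
import json


def canonical(text: str) -> str:
    """Re-serialize JSON with sorted keys so key order stops showing up in diffs."""
    return json.dumps(json.loads(text), sort_keys=True, indent=2)


A = '{"name": "svc", "ports": [80, 443]}'
B = '{"ports": [80, 443], "name": "svc"}'  # same data, different key order
C = '{"name": "svc", "ports": [443, 80]}'  # array order differs
```

`canonical(A)` and `canonical(B)` are identical text, while `canonical(C)` still differs, because arrays are ordered and sorting keys deliberately leaves them alone.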
It's also annoying to edit: copy a line, add it to the end of a block.
Now you need to add a comma to the previous line, and strip off the comma on the new one you added (if you didn't copy it from that previous line).
I feel that, despite its repugnant appearance, this "comma-first" approach is the best tradeoff in languages like JSON where trailing commas are forbidden; the leading `[` is much harder to accidentally omit or insert than the subtle trailing `,`. In Emacs I use js3-mode, a hack of js2-mode to support comma-first syntax.
Comma-first syntax is especially convenient in SQL, which has the forbidden-trailing-comma problem and several analogous problems. In C if I have a long Boolean conjunction
if (unpleasantly long boolean expression &&
    another unpleasantly long boolean expression &&
    yet another unpleasantly long boolean expression) {
there are several ways to fix it, such as nesting ifs or factoring the expressions into variables or functions. The "comma-first" approach is also visually unappealing for spacing reasons, requiring two extra spaces after the parenthesis:
if (  unpleasantly long boolean expression
   && another unpleasantly long boolean expression
   && yet another unpleasantly long boolean expression
   ) {
In SQL, C's alternative approaches are not available, and the "comma-first" style is much more natural:
where unpleasantly long boolean expression
  and another unpleasantly long boolean expression
  and yet another unpleasantly long boolean expression
I do agree, though, that it's better to design languages to avoid this problem, and I think the way to do that is by using item terminators or item initiators in a list rather than by using item separators. That's what C did for statements with `;`, which was a difference from the ALGOL tradition including Pascal, where `;` was a statement separator, with the unpleasant consequences described in https://www.cs.virginia.edu/~evans/cs655/readings/bwk-on-pas....
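The separator/terminator distinction is easy to see from the generating side. A small Python sketch, with hypothetical function names: the separator version has to special-case the last item, the terminator version treats every item identically.

```python
def emit_with_separators(items):
    """Separator style (JSON, SQL): the last item must not get a comma."""
    lines = []
    for i, item in enumerate(items):
        suffix = "," if i < len(items) - 1 else ""
        lines.append(f"  {item}{suffix}")
    return "\n".join(lines)


def emit_with_terminators(items):
    """Terminator style: every item ends with a comma, no special cases."""
    return "\n".join(f"  {item}," for item in items)
```

The same asymmetry is what a human hits when reordering lines by hand: with terminators any permutation of the lines is still valid, with separators only the ones that keep the comma-less line last.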
In Meta5ix http://www.canonical.org/~kragen/sw/dev3/meta5ixrun.py I experimented with using item initiators for rules in a grammar, like Markdown uses for bulleted lists. I'm not pleased with the rest of the syntactic decisions I tried in Meta5ix, but I do think that one was a good tradeoff; here's about a quarter of the Meta5ix compiler:
Note that, while comma-first layout feels like a gross abuse of a punctuation mark with `,`, it's quite common and natural with `|` in grammars and pattern-matches in languages like ML, where an initial `|` is also permitted; here's an excerpt from my port of μKanren to OCaml (http://canonical.org/~kragen/sw/dev3/mukanren.ml):
let rec walk (s : env) = function
  | Vart (Var x) when Env.mem x s -> walk s (Env.find x s)
  | u -> u
I think that's what I should have used in Meta5ix, and I will if I get around to revising it.
Yes, I agree. That's a nice trick! It's more verbose, which is a tradeoff, maybe a bad one. Too bad `,` doesn't have an identity element the way `and` does, so this doesn't solve the problem in JSON.
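I'm assuming the trick in question is the common one of opening the conjunction with an always-true term, so that every real condition becomes a uniform `AND ...` line. A sketch with Python's built-in `sqlite3`; the table, columns and `build_query` helper are all made up for illustration, and the conditions are pasted in as trusted SQL fragments, not user input.

```python
import sqlite3


def build_query(conditions):
    """Build a WHERE clause where every condition is an identical ' AND ...' line."""
    # "1=1" acts as the identity element for AND, so there is no
    # first-condition special case, even when the list is empty.
    lines = ["SELECT name FROM units", "WHERE 1=1"]
    lines += [f"  AND {c}" for c in conditions]
    return "\n".join(lines)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE units (name TEXT, x REAL, y REAL)")
conn.executemany(
    "INSERT INTO units VALUES (?, ?, ?)",
    [("a", 1.0, 2.0), ("b", 5.0, 6.0)],
)
```

Any condition line can then be commented out, reordered or appended without touching its neighbours, which is exactly what a comma-separated list cannot offer.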
Underscore for italics probably has its origins in the use of a solid underline as markup in a manuscript/typescript instructing the typesetter to set that fragment in italics (underscore generally being frowned upon in professionally set material otherwise, I think).
> I also have issue with it’s creator, John Gruber. He is a highly annoying smug Apple Fanboy. His writing was fine in the early days when Apple was #3, but got intolerable as Apple became the 800lb gorilla. It’s changed recently as Apple has snubbed him but I still can’t read anything he writes.
Came here to paste the exact same quote with a very similar comment. I'd expand on it by adding that Gruber is probably the archetypal "Apple fanboy" made flesh. They don't look like fanatics - as in sports fans or politics discussions in a bar over a beer - they just make their point with a subtle superiority and an "obviously Apple made the best choice here, as usual" spin on every post. I should also add that I stopped reading him some time ago though, things might have changed (also it feels like his posts are not as present on the HN frontpage as they used to be).
It's not that JSON itself is bad, but it's obviously for machines to author, not for humans.
- No comment
- No trailing comma
- No multi-line string
It's a terrible format to type manually. However, we just shrugged and said "at least it's not XML" and started writing it manually anyway.
And later we finally realized comments are not optional, so we got JSON5, JSONC, etc...
I mean, why did people prefer?
Over just?
JSON is rediscovering XML Schema, XML DTDs etc, when we had those a quarter century ago already.
It was so good when you could define the structure easily and validate it with standard tooling.
If I have a structure like this JSON:
Should I do this in XML?
Though as I understand it it's possible that this might not be the case for much longer: https://github.com/whatwg/html/issues/11523
But we haven't, otherwise we'd use all those better formats instead
Except they very much are. The place to explain your payload is in the API documentation, not alongside the payload. It's not code.
I'm more partial to YAML for readability, but I don't think JSON configs are an awful anti pattern.
> It's not code.
package.json has a field literally called 'scripts' where the values are shell one-liners.
https://orgmode.org/worg/org-contrib/babel/
https://www.grumpygamer.com/about/
Now I have 308 posts to read :)
It still looks good even without any formatting! (And btw. I thought that was the intention of markdown …)
[0] https://www.mindthis.io/
[0] https://datasette.io
  var
, var2
, var3
, var4
https://stackoverflow.com/a/36104693
Elsewhere such commas can be optional, e.g. in clojure: https://guide.clojure.style/#opt-commas-in-map-literals
An article like this is weakened by including an unnecessary personal attack.
Surely _underline_ would make more sense than _italics_. Somewhere I have seen /italics/ in use, but that does look kind of regexpy.
>I dislike that trailing commas are not allowed...There is no need for this and it makes writing out valid JSON more complex.
Trailing commas as a trend emerged after JSON was standardised. And thank god JSON is as well and truly standard as it is.
[0] https://github.com/homeport/dyff
FWIW with SQL multi-line booleans, I tend to do:
Because that's _underline_, and /italics/ are slanted
> I also have issue with it’s creator
Just pick a different specification of markdown, it's not like there is only one :)
Thank you! I thought it was just me...