yq: command-line YAML, JSON, XML, CSV and properties processor

(github.com)

224 points | by based2 447 days ago

16 comments

voytec 447 days ago
The yaml document from hell[1] needed three changes ("*.html", "*.png", "!.git") to be parsed by yq at all. The "Norway problem" is not a problem, as no was converted to a quoted string. Unquoted strings in the "allow_postgres_versions" part were not quoted by yq.
[1] https://ruudvanasseldonk.com/2023/01/11/the-yaml-document-fr...
- alecthomas 447 days ago
  It's a bit hard to be sure because YAML is so insane, but I would argue that yq's behaviour is not incorrect?
  It is erroring on `*.html`, which is reasonable because `.html` is an invalid anchor identifier, though the error is not that useful. The parsing of the unquoted version numbers also seems to be "correct", in that things that look like numbers are supposed to be parsed as numbers.
- EdwardDiego 446 days ago
  I fully agree with the author's statement that "templating YAML is a terrible idea". Helm charts are somewhat finicky to author.
  And I don't much like them as an end-user either, at least where there's a decent operator as an alternative.
  (The definition of "decent" starts with "good documentation", looking at you Prometheus operator.)
  - lukeschlather 446 days ago
    Templating yaml with a text templating language like Helm's templating language is a terrible idea. Templating objects and serializing them to Yaml (with input also being Yaml) I find quite nice: https://github.com/con2/emrichen
    - EdwardDiego 446 days ago
      Yep, now that is a far better approach.
  - wrldos 446 days ago
    Templating YAML is up there with putting lead in gas and invading Ukraine. It instantly turns me into an expletive spouting rage infested monkey.
    And yes let’s not even get into the state of off the shelf charts.
- deathanatos 446 days ago
  1. yq appears to accept the unquoted tag !.git for me, without change? (This is at least a correct parse syntactically, I think.)
  2. The unquoted aliases (*.html, *.png) are invalid: that yq errors on them is the correct output. (I.e., if you want those as literal strings, they MUST be quoted.)
- MuffinFlavored 447 days ago
  i wonder if the people who worked on the YAML spec regret not string wrapping stuff? i’m sure it was by design “at the time”?
  - tyingq 447 days ago
    The spec says this:
    "The plain (unquoted) style has no identifying indicators and provides no form of escaping. It is therefore the most readable, most limited and most context sensitive style."
    Which reads to me that they expected people to treat the plain style as a convenience that had notable downsides.
    Edit: Note that this is the "plain" style, where there's also single and double quoted styles.
    - tgv 447 days ago
      A reasonable choice (in contrast to "no" or "null"). YAML is simply not a format for all use cases. It's good enough for many tasks, and more readable than most other formats where it fits.
      - dragonwriter 447 days ago
        > YAML is simply not a format for all use cases.
        Maybe, but unquoted strings not being the right choice for all use cases (or, similarly, structure-by-indentation not being) doesn’t show that, since YAML supports unquoted and quoted strings, and supports both indent-sensitive “block style” and delimiter-based “flow style”.
        - tgv 446 days ago
          To me, it doesn't fit where people with a less technical background can/must edit configuration files, or where there's a large risk of mixing up null and "null". For readability, it's fine, except for the ugly node reference syntax.
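The scalar typing discussed in this thread (the "Norway problem", version numbers read as numbers) can be sketched in a few lines of Python. This is a simplified, hypothetical resolver: the name `resolve_plain_scalar` and the patterns are assumptions for illustration, not yq's or any YAML library's actual implementation. It only shows why YAML 1.1-style rules turn an unquoted `no` into a boolean and an unquoted `9.6` into a float, while a quoted scalar stays a string.

```python
import re

# Simplified YAML 1.1-style rules; real resolvers handle many more forms
# (and are stricter about letter case than this sketch).
_BOOL_TRUE = {"y", "yes", "true", "on"}
_BOOL_FALSE = {"n", "no", "false", "off"}
_NULL = {"", "~", "null"}
_INT = re.compile(r"[-+]?[0-9]+")
_FLOAT = re.compile(r"[-+]?[0-9]*\.[0-9]+")


def resolve_plain_scalar(text, quoted=False):
    """Return the Python value a YAML 1.1-like loader might assign."""
    if quoted:
        return text  # quoted scalars are always strings
    lowered = text.lower()
    if lowered in _NULL:
        return None
    if lowered in _BOOL_TRUE:
        return True
    if lowered in _BOOL_FALSE:
        return False
    if _INT.fullmatch(text):
        return int(text)
    if _FLOAT.fullmatch(text):
        return float(text)
    return text  # anything else stays a string


if __name__ == "__main__":
    for raw in ["no", "9.6", "12", "se"]:
        print(repr(raw), "->", repr(resolve_plain_scalar(raw)))
    print(repr(resolve_plain_scalar("no", quoted=True)))
```

Quoting is the only escape hatch in this sketch, which matches the spec quote above: the plain style is the most context-sensitive one.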
theonemind 447 days ago
gojq works great with YAML and reimplements jq itself in Go. I use gojq with --yaml-input or --yaml-output (or sometimes both) and flip back and forth between JSON and YAML promiscuously and have 100% jq UI compat, which helps because I use jq a lot. First thing I looked at on yq is '-s', which is 'slurp' for jq, but different for yq. Slightly altered semantics would just trip me up, and it seems like you can make a nearly straight bijection between YAML and JSON so you can just do exactly the same things with either one (with some minor exceptions.)
https://github.com/itchyny/gojq
- 0cf8612b2e1e 447 days ago
  gojq does not preserve key order or offer an option to sort keys, which is a non-starter for me. The majority of my jq use is to clean up API responses for easier human review.
  - karmakaze 447 days ago
    They should be feature requests to gojq. There must be libraries for maps with sorted keys or preserving insertion order to use in place of the std 'map'.
  - quinncom 446 days ago
    You might enjoy the httpie cli, which is better than curl for testing APIs for many reasons, one of which is automatic pretty printed and colorized text response output. https://httpie.io/cli
  - tenken 447 days ago
    yup. this. It's portable! ... But nerfed.
- pokstad 447 days ago
  I believe gojq is also bundled with Benthos. Benthos is a great Swiss army CLI tool for various data manipulations.
- filereaper 447 days ago
  yq has a merge function that gojq does not.
  We use both tools equally where ease of use is the primary driving factor.
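For the key-order concern raised in this thread, a minimal Python sketch (standard library only; the sample payload and its field names are made up) shows the two behaviours being asked for: keeping keys in the order they arrived, and sorting keys on output for easier review.

```python
import json

# Hypothetical API response; the field names are made up for illustration.
raw = '{"zeta": 1, "alpha": {"b": 2, "a": 3}, "mid": [3, 1, 2]}'

data = json.loads(raw)  # dicts keep insertion order in Python 3.7+

preserved = json.dumps(data)                   # keys in their original order
sorted_out = json.dumps(data, sort_keys=True)  # keys sorted at every level

if __name__ == "__main__":
    print(preserved)
    print(sorted_out)
```

Note that `sort_keys=True` sorts object keys recursively but leaves array element order alone, which is usually what you want when diffing responses.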
user3939382 447 days ago
I recently dug into the docs of jq and was surprised to find that, contrary to my prior belief based on shallow experience with it, jq’s expressions aren’t merely a path syntax but apparently a Turing-complete language. I was blown away.
I wish MySQL and AWS could have figured out a way to adopt it, or a subset of it, rather than each using different ones. Now I have varying levels of knowledge for 4-5 variations of JSON path semantics/standards, it’s annoying.
- nequo 447 days ago
  User odnoletkov[1] has solved several years’ worth of Advent of Code in jq:
  https://github.com/odnoletkov/advent-of-code-jq
  [1] https://news.ycombinator.com/user?id=odnoletkov
  - alecthomas 447 days ago
    That is simultaneously impressive and insane. Respect.
- justin_oaks 447 days ago
  The JSON expressions used by MySQL and the AWS command line tool (JMESpath) are extremely limited compared to jq.
  So not only do we end up learning multiple JSON path variations, but most of them are nearly useless for anything but the simplest use cases.
  I appreciate the intention of including JMESpath in awscli, but I quickly dropped it in favor of piping the JSON results to jq.
  - mdaniel 447 days ago
    I have a similar complaint but I'd guess there are (at least) two problems standing in the way of awscli getting jq language support: a python impl of the language with a license that awscli tolerates, and awscli being (in general) very conservative about changes. There are innumerable open issues about quality of life improvements that are "thank you for your input" and I'd expect that change to be similarly ignored
  - thayne 446 days ago
    Not to mention that JMESpath appears to be abandoned.
    There is a fork (https://github.com/jmespath-community/jmespath.spec), but it seems unlikely to be used by the aws cli (https://github.com/aws/aws-cli/issues/7396). Although, for that matter jq is semi-abandoned itself.
- dragonwriter 447 days ago
  > I wish MySQL and AWS could have figured out a way to adopt it, or a subset of it, rather than each using different ones.
  For AWS CLI, you can just output unfiltered JSON and pipe the results through jq; the filtering is client-side anyway, so it’s not like you are losing anything doing external filtering vs. filtering within the AWS CLI.
- WesolyKubeczek 447 days ago
  There has been a jq written in jq here on the front page.
- eru 447 days ago
  It's not just a Turing complete language, but a well designed one, too.
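The point made in this thread, that filtering a tool's JSON output is client-side work and can happen in any external program, can be sketched as follows. The payload shape, the field names and the helper `ids_by_state` are all assumptions for illustration; this is not the output format of the AWS CLI or any other real tool. The grouping step is the kind of reshaping that goes beyond a plain path lookup.

```python
import json
from collections import defaultdict

# Hypothetical tool output; the shape and field names are assumptions.
RAW = """
{"items": [
  {"id": "a1", "state": "running", "tags": {"env": "prod"}},
  {"id": "b2", "state": "stopped", "tags": {"env": "dev"}},
  {"id": "c3", "state": "running", "tags": {"env": "dev"}}
]}
"""


def ids_by_state(document):
    """Group item ids by their state, entirely on the client side."""
    grouped = defaultdict(list)
    for item in document["items"]:
        grouped[item["state"]].append(item["id"])
    return dict(grouped)


if __name__ == "__main__":
    # In a pipeline, json.load(sys.stdin) would replace json.loads(RAW).
    print(json.dumps(ids_by_state(json.loads(RAW))))
```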
VWWHFSfQ 447 days ago
Looks very cool! I don't care so much about YAML, but I do a ton of processing of JSON and csv/tsv. Any word on the performance relative to jq and xsv [1]?
[1] https://github.com/BurntSushi/xsv
- 0cf8612b2e1e 447 days ago
  I am all for faster tools, but I am curious as to your use case where the jq speed would be limiting. I only ever clean up a maximum of a few megabytes at a time, where the jq response is close enough to instant that it has never been a concern.
  - VWWHFSfQ 447 days ago
    I typically work with multi-gigabyte JSON and CSV files. I just did a quick test with yq and it's only about 30% faster than just using Python's csv and json libraries. Whereas the same thing is 1,200% faster with jq and xsv. It's just my use-case though, so YMMV.
- snacktaster 447 days ago
  Being written in Go, I would be very surprised if it's anywhere close to as fast as either of those tools.
  - VWWHFSfQ 447 days ago
    I kind of suspected as much. But would still like to see some actual benchmarks.
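For anyone wanting actual numbers, a baseline like the "Python's csv and json libraries" one mentioned in this thread can be timed with a small harness. This is a minimal sketch under stated assumptions: the synthetic data, the helper names (`make_csv`, `time_call`, `csv_to_json`) and the row count are made up, and it measures only the Python standard library, not yq, jq or xsv.

```python
import csv
import io
import json
import time


def make_csv(rows):
    """Build an in-memory CSV with a header and `rows` synthetic records."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["id", "name", "value"])
    for i in range(rows):
        writer.writerow([i, f"name{i}", i * 0.5])
    return buf.getvalue()


def time_call(func, *args, repeat=3):
    """Return (best wall-clock seconds, last result) over `repeat` runs."""
    best = float("inf")
    result = None
    for _ in range(repeat):
        start = time.perf_counter()
        result = func(*args)
        best = min(best, time.perf_counter() - start)
    return best, result


def csv_to_json(text):
    """Parse CSV text and serialize the rows as a JSON array of objects."""
    rows = list(csv.DictReader(io.StringIO(text)))
    return json.dumps(rows)


if __name__ == "__main__":
    sample = make_csv(100_000)
    seconds, _ = time_call(csv_to_json, sample)
    print(f"csv -> json: {seconds:.3f}s for {len(sample)} characters")
```

Running the same conversion through each command-line tool on the same file, and taking the best of several runs, gives a like-for-like comparison.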
CathalMullan 447 days ago
Another tool in this space is Dasel[1], which can handle querying/modifying JSON, YAML, TOML, XML and CSV files.
[1] https://github.com/TomWright/dasel
- AndyKluger 447 days ago
  I do prefer that to jq syntax, as well as alternatives jello and yamlpath.
daurnimator 446 days ago
I personally find the yq tool from https://github.com/kislyuk/yq much more useful: it has all the same options and formats as `jq` (as it's really a wrapper around jq), unlike the `yq` in the OP here, where only partial functionality exists.
ehPReth 447 days ago
honourable mention for HTML: https://github.com/mgdm/htmlq
EdwardDiego 447 days ago
`yq` is invaluable when working in the K8s world, love it.
raydiatian 446 days ago
I think my dream is yq but with JSONata and an interactive editor at the command line.
I love yq and jq, but imo the core feature they’re missing is queryability. The problem is that afaia the jq syntax doesn’t support things like “where value = x”.
There’s another lesser known but imo better querylang called JSONata [0], which is basically a querying and reshaping syntax for structured data.
I’m working on this in my spare time, but if anyone knows of one that exists so that I don’t have to GO (lang) down a rabbit hole, please do share.
[0] https://jsonata.org/
- slgurtman 446 days ago
  jq and yq both support filtering with select operator
  - raydiatian 446 days ago
    Okay, I must have forgotten this, now that you point it out. But that’s nowhere near as elegant or compact as how JSONata handles it. The ideal tool probably lets you choose: XPath, jq, JSONata, etc.
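The "where value = x" filter discussed in this thread is what jq's `select` expresses, e.g. `.[] | select(.value == "x")`. A rough Python equivalent, as a minimal sketch (the records and the helper name `where` are made up for illustration):

```python
import json

# Made-up records for illustration.
RECORDS = json.loads("""
[
  {"name": "a", "value": "x"},
  {"name": "b", "value": "y"},
  {"name": "c", "value": "x"}
]
""")


def where(records, key, expected):
    """Keep the records whose `key` equals `expected`."""
    return [r for r in records if r.get(key) == expected]


if __name__ == "__main__":
    print([r["name"] for r in where(RECORDS, "value", "x")])
```

Records that lack the key are simply skipped here, because `dict.get` returns `None` for them.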
scarface74 446 days ago
Side rant: Every normal yaml processor I’ve tried struggles with CloudFormation. I end up using the cfn-flip command line program/Python module to deal with CFT Yaml.
Scubabear68 446 days ago
Lately I have had to do a lot of flat file analysis and tools along these lines have been a godsend. Will check this out.
My go-to lately has been csvq (https://mithrandie.github.io/csvq/). Really nice to be able to run complicated selects right over a CSV file with no setup at all.
wiradikusuma 446 days ago
Another worthy alternative:
https://www.brimdata.io/blog/introducing-zq/
mattewong 446 days ago
Another for the mix: https://github.com/liquidaty/zsv
notorandit 447 days ago
This is really a great tool! Thanks Mike!
encryptluks2 447 days ago
There is also another tool named yq that is Python based, and passes everything through jq. The Go-based yq is pretty awesome, but does have some limitations.
- nivekastoreth 446 days ago
  the python yq is by far my preferred utility. i love native jq and the fact it simply wraps it means everything just works, which is not the experience i’ve had with the tool mentioned in the topic
worldsavior 447 days ago
Looks cool, but disappointing it's written in Go. Go is fast, but not as fast as Rust or C. I'm sure with large streams, you probably can see the difference in time if it's written in Rust or Go.
- avgcorrection 447 days ago
  As they say: always measure when it comes to performance. Unless it’s a programming language that you don’t like. Then you don’t even have to run the program.
  - worldsavior 447 days ago
    I don't understand why my comment is being flagged. I like the language, just thought it can be even faster in other languages, just a thought.
    - rgoulter 446 days ago
      There's a difference between:
      "I have a <very large> YAML file, and it takes yq <some long time> to parse, whereas <some workflow I use written in rust/c> takes <much less time>".
      "Go is slower than Rust, the author should have written it in that".
      It's likely for most YAML documents you encounter, the difference between using Go and using Rust or C is negligible. -- Though, if this isn't true, some numbers would be useful, too.
      A comment like "Go is a bad language to use" is just a thought; but it's also a low-effort dismissal of something someone has put effort into, and of a tool that's quite useful.
- ihucos 447 days ago
  Can we stop this programming language bashing?