SQLite Is a Library of Congress Recommended Storage Format

(sqlite.org)

175 points | by whatisabcdefgh 9 hours ago

13 comments

tnelsond4 1 hour ago
I'm always inspired by SQLite. Overall I like it, but if you're not doing writes it's really overkill.
So I made a format that will never surpass SQLite, except that it's far lighter and faster and works on zstd-compressed files. It has really small indexes and can contain binaries or text just like SQLite.
The wasm part that decompresses, reads, and searches the databases is only 38 KB uncompressed (maybe 16 KB gzipped). Compared to SQLite's 1.2 MB of wasm and glue code, it's 3% the size, and searching and loading are much faster. My program isn't really column based and isn't suitable for managing spreadsheets, but it's great for dictionaries and file archives of images and audio.
I ported the jbig2 decoder as a 17 KB wasm module, so I can load monochrome scans that are 8 KB per page and still legible.
https://github.com/tnelsond/peakslab
SQLite is very well engineered; PeakSlab is very simple.
- zoky 38 minutes ago
  something something XKCD competing standards something something
  - tnelsond4 16 minutes ago
    Believe me, I tried sticking to SQLite or aard2 or stardict, but they were fundamentally inadequate, with no good PWA tooling for cross-platform apps.
alexpotato 5 hours ago
I have always loved SQLite.
I have also heard that some firms ban its use.
Why?
Because it makes it SO easy to set up a database for your app that you end up with a super critical component of your application that looks exactly like a file. A file that can have any extension. And that file can be copied around to other servers. Even if there is PII in that file. Multiply this times the number of applications in your firm and you can see how this could get a little nuts.
DevOps and DBA teams would prefer that the database be a big, heavy iron thing that is very obviously a database server. And when you connect to it, that's also very obvious etc etc.
I still love SQLite though.
- Fwirt 5 hours ago
  The question is, do the same firms ban Excel? Excel spreadsheets often end up as shadow databases in unlikely places.
  - hermitShell 4 hours ago
    The sane thing would be to ban Excel and promote SQLite. Excel is often used for tabulated text (issue tracking), not calculations. That is a perfect use case for a relational DB.
    - frollogaston 4 hours ago
      Excel is made for calculations. But if you make it hard to make a DB, people will abuse Excel as a DB.
      - TJSomething 1 hour ago
        I mean, it might have been at first, but Microsoft figured out in 1993 that the majority of users were using it for lists without formulas, and they've strategized around that. IMHO, the biggest concession to this was when they added Power Query to core Excel in 2016.
  - silon42 1 hour ago
    IMO, almost any Excel file more than a month old should become read-only.
  - Spooky23 4 hours ago
    They generally cannot. But they do banish Access.
    - pasc1878 43 minutes ago
      Now that is different.
      Access gets used for a shared DB, and that is quite easy to corrupt. It is much more cost effective to have that in a proper central database (I suppose SQLite is better here as well).
  - DeathArrow 2 hours ago
    Do companies ban text files? Text files are used to store data.
- tehlike 3 hours ago
  There are interesting uses for sqlite, like this one: https://sqlite.org/sqlar.html
- ai_slop_hater 5 hours ago
  That's so dumb
- slopinthebag 4 hours ago
  > DevOps and DBA teams
  Ah so two teams nobody should listen to.
  - frollogaston 4 hours ago
    At least I would take it with a grain of salt when the DBA wants you to depend more on the DBA.
    - slopinthebag 4 hours ago
      Same with devops tbh.
      "Hey everyone, we need to choose the option that involves us the most and provides us the most job security"
testermelon 13 minutes ago
I'm surprised they included proprietary formats that are de facto standards in a profession or supported by multiple tools (.xls, .xlsx) in the preferred section [1]. I wonder if "well-known enough" is as good as "open" from a preservation standpoint.
[1] https://www.loc.gov/preservation/resources/rfs/data.html
- mort96 3 minutes ago
  Especially when Office 365 shows that not even Microsoft is capable of making software which can display Office files anymore... if you have a Word file which was created or has ever been modified by the Word application, working with it through Office 365 is such a pain. I've literally had images which are impossible to delete or move in the web version, and they will absolutely render in the wrong place.
- pletnes 4 minutes ago
  You can unzip the xlsx and read the xml inside. It’s not the worst format by far.
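As a rough illustration of pletnes's point about unzipping an .xlsx: the sketch below opens a ZIP archive and parses worksheet XML using only Python's standard library. The archive built here is a stand-in containing a single worksheet part, not a complete valid workbook, and the helper name `read_raw_cells` is hypothetical. It reads raw `<v>` values only; in real files, text cells usually point into a separate shared-strings part, which this sketch ignores.

```python
import io
import xml.etree.ElementTree as ET
import zipfile

# Namespace used by SpreadsheetML worksheet parts.
NS = {"m": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}

# Stand-in worksheet XML (assumption: a real workbook has many more parts).
SHEET_XML = (
    '<worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">'
    '<sheetData><row r="1">'
    '<c r="A1"><v>42</v></c><c r="B1"><v>3.5</v></c>'
    '</row></sheetData></worksheet>'
)


def read_raw_cells(xlsx_bytes, member="xl/worksheets/sheet1.xml"):
    """Return {cell reference: raw <v> text} for one worksheet part."""
    with zipfile.ZipFile(io.BytesIO(xlsx_bytes)) as zf:
        root = ET.fromstring(zf.read(member))
    cells = {}
    for cell in root.iterfind(".//m:c", NS):
        value = cell.find("m:v", NS)
        if value is not None:
            cells[cell.get("r")] = value.text
    return cells


# Build the stand-in archive in memory, then read it back.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("xl/worksheets/sheet1.xml", SHEET_XML)

cells = read_raw_cells(buf.getvalue())
print(cells)
```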
rmunn 3 hours ago
> As of this writing (2018-05-29) ...
So this news is nearly <del>six</del> EIGHT years old. But I didn't happen to know about it until now, so that's not a complaint at all; rather, this is a thank-you for posting it.
(Thanks for the correction. Brief brain malfunction in the math department there).
- tehlike 3 hours ago
  Sir, it's 2026. It's 8 years old.
  - rmunn 3 hours ago
    Corrected; thanks.
- frollogaston 2 hours ago
  Was going to say, was having deja vu reading this
faangguyindia 2 hours ago
I went from thinking "SQLite is a toy product, not reliable for real data" to "let's use SQLite for almost everything."
SQLite is very good if you can fit into the single-writer, multiple-readers pattern; you'll never lose data if you use the correct settings, which takes a minute of Google searching to figure out.
Today, most of my apps are simply a Go binary + SQLite + a systemd service file.
I've yet to lose data. Performance is great and plenty for most apps.
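The comment doesn't say which settings it means, so the following is only a sketch of pragmas that are commonly suggested for a durable single-writer setup, written with Python's standard `sqlite3` module. The file path, the `notes` table, the `open_db` helper, and the specific values are all assumptions for illustration.

```python
import os
import sqlite3
import tempfile


def open_db(path):
    """Open a database with commonly suggested settings (assumed, not canonical)."""
    conn = sqlite3.connect(path)
    # WAL lets readers keep reading while the single writer appends to a log.
    mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
    # FULL syncs on every commit. NORMAL is faster in WAL mode, but the most
    # recent commits may be lost (without corruption) after a power failure.
    conn.execute("PRAGMA synchronous=FULL")
    # Wait up to 5 seconds for a competing writer instead of failing at once.
    conn.execute("PRAGMA busy_timeout=5000")
    # Foreign key enforcement is off by default and is set per connection.
    conn.execute("PRAGMA foreign_keys=ON")
    return conn, mode


db_path = os.path.join(tempfile.mkdtemp(), "app.db")  # hypothetical location
conn, journal_mode = open_db(db_path)

with conn:  # commits on success, rolls back if an exception escapes
    conn.execute(
        "CREATE TABLE IF NOT EXISTS notes (id INTEGER PRIMARY KEY, body TEXT NOT NULL)"
    )
    conn.execute("INSERT INTO notes (body) VALUES (?)", ("hello",))

row_count = conn.execute("SELECT COUNT(*) FROM notes").fetchone()[0]
print(journal_mode, row_count)
```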
- michaelchisari 52 minutes ago
  The single writer is less of an issue in practice than it's made out to be. Modern NVMe drives are incredible, and it's trivial to get 5k writes per second in an optimized WAL setup. That's way more than most apps could ever dream of.
  And even then, I've used a batch writer pattern to get 180k writes per second on a commodity VPS.
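michaelchisari doesn't describe how the batch writer was built, so this is just one minimal reading of the idea: group many inserts into a single transaction so the per-commit cost is paid once per batch rather than once per row. The `events` table, the `write_batched` helper, and the batch size are assumptions, and the sketch makes no claim about throughput.

```python
import sqlite3


def write_batched(conn, rows, batch_size=1000):
    """Insert rows with one transaction per batch; return the commit count."""
    commits = 0
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) >= batch_size:
            with conn:  # one transaction for the whole batch
                conn.executemany(
                    "INSERT INTO events (kind, payload) VALUES (?, ?)", batch
                )
            commits += 1
            batch = []
    if batch:  # flush the final partial batch
        with conn:
            conn.executemany(
                "INSERT INTO events (kind, payload) VALUES (?, ?)", batch
            )
        commits += 1
    return commits


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, kind TEXT, payload TEXT)")

rows = (("click", f"item-{i}") for i in range(2500))
commits = write_batched(conn, rows, batch_size=1000)
total = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
print(commits, total)
```

In a real service the rows would typically arrive on a queue drained by one writer thread or goroutine, which keeps the single-writer rule intact.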
srcreigh 5 hours ago
2026 recommended storage formats: https://www.loc.gov/preservation/resources/rfs/data.html
akihitot 4 hours ago
For public-sector data preservation, it may be one of the best options.
- The specification is publicly available
- It is widely adopted
- It is likely to remain readable in the future
- It has little dependency on specific operating systems or services
- It carries low patent risk
From the perspective of long-term continuity, avoiding dependence on any particular company or service is extremely important.
- Spooky23 4 hours ago
  Archivists also love formats close to native. SQLite preserves the relationships between tables in a way that CSV cannot.
  - akihitot 4 hours ago
    That's certainly true. The ability to define table relationships is a major difference from CSV.

afshinmeh 1 hour ago
I love SQLite, and thanks for sharing it, but there should be a "(2018)" at the end of the title:
> As of this writing (2018-05-29) the only other recommended storage formats for datasets are XML, JSON, and CSV.
- maxloh 1 hour ago
  FYI, they added a lot more formats to the list after that.
    Preferred
    1. Platform-independent, character-based formats are preferred over native or binary formats as long as data is complete, and retains full detail and precision. Preferred formats include well-developed, widely adopted, de facto marketplace standards, e.g.
      a. Formats using well known schemas with public validation tool available
      b. Line-oriented, e.g. TSV, CSV, fixed-width
      c. Platform-independent open formats, e.g. .db, .db3, .sqlite, .sqlite3
    2. Any proprietary format that is a de facto standard for a profession or supported by multiple tools (e.g. Excel .xls or .xlsx, Shapefile)
    3. Character Encoding, in descending order of preference:
      a. UTF-8, UTF-16 (with BOM),
      b. US-ASCII or ISO 8859-1
      c. Other named encoding
    ---
    Acceptable
    For data (in order of preference):
    1. Non-proprietary, publicly documented formats endorsed as standards by a professional community or government agency, e.g. CDF, HDF
    2. Text-based data formats with available schema
    For aggregation or transfer:
    1. ZIP, RAR, tar, 7z with no encryption, password or other protection mechanisms.
  https://www.loc.gov/preservation/resources/rfs/data.html
  - xxs 19 minutes ago
    .7z being there just discredits the entire process. The underlying compression algorithm is free-form and can be anything [0], and implementations can contain bugs and exploits [1]. Personally, I use only zstd with .7z, which is 'non-standard' according to the official (Russian) release.
    [0]: https://7-zip.org/7z.html
    [1]: CVE-2025-0411

tombert 2 hours ago
On a recent project I needed to use exFAT. exFAT is terrible for a number of reasons, but in my case the thing I had to deal with was the lack of journaling, which could corrupt files if there were a power interruption or something.
I initially was writing a series of files and doing some quasi-append-only things with new files and compacting the old one to sort of reinvent journaling. What I did more or less worked, but it was very ad hoc and bad, and it was probably hiding a lot of bugs I would eventually have to fix.
And then I remembered SQLite. I realized that ACID was probably safe enough for my needs, and that all the hard parts I was reinventing would probably be faster and less likely to break if I used something thoroughly audited and tested, so I reworked everything to use SQLite and it worked fine.
I wish exFAT would die in a fire and a journaling filesystem would replace it as the "one filesystem you can use everywhere", but until it does I'm grateful SQLite exists.
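To make the "let SQLite handle atomicity" idea concrete, here is a small sketch (not tombert's actual code) in which a multi-row update either fully applies or fully rolls back. The `state` table and the `save_state` helper are hypothetical names chosen for the example.

```python
import sqlite3


def save_state(conn, items):
    """Replace the stored state atomically: all rows are written, or none."""
    with conn:  # commits on success, rolls back if an exception escapes
        conn.execute("DELETE FROM state")
        conn.executemany("INSERT INTO state (key, value) VALUES (?, ?)", items)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE state (key TEXT PRIMARY KEY, value TEXT NOT NULL)")
save_state(conn, [("a", "1"), ("b", "2")])

try:
    # The None violates NOT NULL partway through, so the DELETE and the
    # first INSERT are rolled back together with the failing row.
    save_state(conn, [("a", "9"), ("c", None)])
except sqlite3.IntegrityError:
    pass

state = dict(conn.execute("SELECT key, value FROM state ORDER BY key"))
print(state)
```

An in-memory database keeps the sketch self-contained. On a real exFAT volume, whether a committed transaction survives power loss also depends on the storage device honoring flush requests.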
- topham 2 hours ago
  The problem with it is you didn't solve your biggest actual problem, you just haven't had a problem bite you in the ass yet so you think your problem is solved.
  - tombert 6 minutes ago
    I am not sure the problem is actually fully solvable. I think SQLite helps at least a little.
- mmooss 2 hours ago
  > I wish exFAT would die in a fire and a journaling filesystem would replace it as the "one filesystem you can use everywhere"
  Where exactly is everywhere? Win32? All of Linux? BSDs? MacOS? IOS? ...
  - tombert 6 minutes ago
    Everywhere exFAT is supported now. Windows, Mac, Linux, FreeBSD would be fine.
  - ghrl 1 hour ago
    Something MacOS and Windows support natively would be a good start, it could grow from there.
ray_v 3 hours ago
It's so funny, because I was JUST telling a colleague of mine - another librarian - this exact fact about sqlite!
ksamantha 2 hours ago
For personal use, I really, really love SQLite.
- cpach 1 minute ago
  Welcome to Hacker News! Please write in English here. Thank you in advance from a long-time member :)