
Durable queues, streams, pub/sub, and a cron scheduler – inside your SQLite file

134 points | 9 hours | honker.dev

tptacek 6 hours ago

"Idle cost is that one lightweight SELECT per millisecond per database — no page-cache pressure, no writer-lock contention, no kernel file watcher in the mix."

I think (respectfully) the LLM that probably wrote this overshot the mark here because busy-polling a select does not actually sound better to me than a "kernel file watcher".

felooboolooomba 6 hours ago

"one lightweight SELECT per millisecond"

This reminds me of the teenager who told her dad that she was just a tiny little bit pregnant.

sroussey 2 hours ago

Think of the battery!

(read that in the way of "think of the children!")

giraffe_lady 5 hours ago

[flagged]

rv64imafdc 4 hours ago

Hold on -- if it really is "one lightweight SELECT per millisecond", and you're saying a select is "a couple hundred microseconds", say generously 200us?, then you're spending 200us out of every 1000us just selecting. That's a lot of polling!
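The arithmetic here can be sketched in a few lines. This is only an illustration: `poll_duty_cycle` is a hypothetical helper, and the 200 µs and 3 µs per-poll costs are the figures quoted in this thread, not measurements.

```python
def poll_duty_cycle(cost_us: float, interval_us: float) -> float:
    """Fraction of wall-clock time spent inside the poll itself.

    cost_us     -- assumed time one poll takes, in microseconds
    interval_us -- time between the start of consecutive polls, in microseconds
    """
    if interval_us <= 0:
        raise ValueError("interval_us must be positive")
    return cost_us / interval_us


if __name__ == "__main__":
    # A 200 us SELECT issued every 1000 us (the pessimistic estimate above).
    print(f"{poll_duty_cycle(200, 1000):.1%}")
    # A 3 us PRAGMA data_version read every 1000 us (the figure from the project page).
    print(f"{poll_duty_cycle(3, 1000):.1%}")
```

The two cases differ by almost two orders of magnitude, so the conclusion depends heavily on which per-poll cost is the realistic one.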

giraffe_lady 4 hours ago

I mean only in the same sense that you spend 1 second per second doing something. Time is probably not the best way to evaluate the resources this consumes and I doubt it takes much of anything else either.

It does seem weird though even for sqlite. I wonder how oban does it. I also wonder if OP knows oban can run on sqlite.

tptacek 5 hours ago

Yeah, again, to be clear: I get how SQLite works and I'm not dunking on the design, I'm just saying the comparison set up on this page snags. It's a classic LLM negated triptych, but "one of these things is not like the other": cache pressure: bad, writer contention: bad, kernel file watcher: ... good, actually? Intuitively seems better than this design?

8note 3 hours ago

to me it sounds like they asked it to not make a kernel file watcher, and now it writes that into every comment everywhere, despite not even being in the implementation

ncruces 5 hours ago

If you're not making any changes to the database, does the SELECT "kill" you?

And if you are making changes, don't you have to poll regardless after the file watcher wakes you?

For WAL mode, SQLite can probably satisfy this query just by inspecting some shared memory. But it is busy waiting, sure.

billywhizz 3 hours ago

SQLite has a wal hook which calls you back every time a transaction is committed to the WAL. https://www.sqlite.org/c3ref/wal_hook.html

ncruces 2 hours ago

That only catches changes made by the database connection being "hooked."

This has a thread running in the background trying to catch changes made by other connections, potentially (I'm not sure here, but I suspect as much) in different processes that are modifying the same database.

billywhizz 2 hours ago

good point. but ime and as seems to be widely understood writing from multiple connections is a bit of a minefield in SQLite. and afaik it still would be possible to have a hook on all connections you expect to be writing?

redsocksfan45 4 hours ago

[dead]

d1l 5 hours ago

Yeah, I had the same instinct - this feels very much like a "nice idea" but the execution falls short. I mean - busily banging on sqlite like this? Shit at that point just use Redis.

koito17 5 hours ago

For what it's worth, Kine (software that k3s uses to replace etcd with SQL databases) implements etcd watches on SQLite through polling[1]. The reason being that SQLite does not offer NOTIFY/LISTEN like MySQL and Postgres do. Ironically, Honker attempts to implement NOTIFY/LISTEN through polling.

k3s has been running on my home server for about three years now (using the default SQLite backend), and there doesn't seem to be excessive CPU usage despite dozens of watches existing in the simulated etcd. Of course, this doesn't say much about Honker, but it's nonetheless worth pointing out that sometimes the choice of database forces one towards a certain design.

[1] https://github.com/k3s-io/kine/blob/648a2daa/pkg/logstructur...

sroussey 2 hours ago

Are you trying to avoid sleep?

jallmann 4 hours ago

With SQLite, you're basically funneled towards a single-writer / single-process design anyway ... in which case why not use a more traditional condvar + mutex rather than polling?
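A minimal sketch of that condvar + mutex idea, assuming a single process with several threads. `JobQueue` and the `jobs` table are hypothetical names made up for illustration; this is not how honker works, and it does nothing for out-of-process writers.

```python
import sqlite3
import threading
from contextlib import closing


class JobQueue:
    """Jobs persist in SQLite; waiters block on a condition variable instead of polling."""

    def __init__(self, path):
        self.path = path
        self._cond = threading.Condition()
        with closing(sqlite3.connect(path)) as conn, conn:
            conn.execute(
                "CREATE TABLE IF NOT EXISTS jobs (id INTEGER PRIMARY KEY, payload TEXT)"
            )
            # Jobs left over from a previous run are still pending.
            self._pending = conn.execute("SELECT COUNT(*) FROM jobs").fetchone()[0]

    def enqueue(self, payload):
        # Commit first, so a woken consumer is guaranteed to see the row.
        with closing(sqlite3.connect(self.path)) as conn, conn:
            conn.execute("INSERT INTO jobs (payload) VALUES (?)", (payload,))
        with self._cond:
            self._pending += 1
            self._cond.notify()

    def dequeue(self, timeout=None):
        """Return the oldest payload, or None if nothing arrives within the timeout."""
        with self._cond:
            if not self._cond.wait_for(lambda: self._pending > 0, timeout):
                return None
            self._pending -= 1
            # Claim the row while still holding the lock so two consumers
            # in this process cannot take the same job.
            with closing(sqlite3.connect(self.path)) as conn, conn:
                row = conn.execute(
                    "SELECT id, payload FROM jobs ORDER BY id LIMIT 1"
                ).fetchone()
                if row is None:
                    return None
                conn.execute("DELETE FROM jobs WHERE id = ?", (row[0],))
                return row[1]
```

An idle consumer here sleeps inside `wait_for` and costs nothing until `enqueue` calls `notify`. The trade-off is that the wake signal lives in process memory, so a writer in another process would go unnoticed.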

tptacek 5 hours ago

I'm not even saying it's unworkable, just, my intuition is not that the "lightweight per-millisecond select" is an optimal design.

giraffe_lady 5 hours ago

Really might be in sqlite. I've learned to never trust my intuition about performance with that thing. So many times I've gone to "optimize" something and discovered that the naive hack way I had been doing it was faster anyway. It's built for this sort of bullshit.

andai 5 hours ago

What's the CPU usage? Like 2%?

I had a manual fs polling thing a while back. It was ugly (low time budget, didn't wanna mess with the native watchers), just scanned the whole thing once per second. It averaged out to like 0.3% CPU.

Not elegant, but acceptable for my purposes! (Small-ish directory, and "ping me within a second or two" was realtime enough for this use case.)

booi 3 hours ago

i mean, technically this is once per millisecond, so this would happen 1000x more. In your case due to the kernel overhead you would likely not even be able to do it (300% CPU?).

Either way this does seem like a very large overhead due to the fact that there's just no other way to do it without a deeper kernel integration which might be outside the scope of what sqlite is trying to do.

paulddraper 2 hours ago

> one lightweight SELECT per millisecond

For the low, low cost of $1 per minute, you can also lease a supercar.

djdillon 5 hours ago

[flagged]

codedokode 2 hours ago

> Once real work flows through a SQLite-backed app, you need a queue. The usual answer is “add Redis + Celery.”

Are they joking? SQLite is usually used for single-process (multiple threads) applications. The proper way to communicate between threads/processes is a ring buffer, where you allocate structs (allocation typically is incrementing a pointer), and futex/eventfd for notifications (+ some spinlocking to avoid going to kernel when the tasks arrive quickly). Why do you need redis for that? If you need persistent tasks, then you can store them in the table, and still use futex for notifications. This polling is inefficient and they should not make it a library which will cause other lazy developers to add it to their app.

> honker polls SQLite’s PRAGMA data_version every millisecond. That’s a monotonic counter SQLite increments on every commit from any connection, journal mode, or process — a ~3 µs read for a precise wake signal

That's 3 ms per second = 0.3% CPU time wasted for every waiting thread.

Like Electron, this feels like it was written by a web developer and not a real programmer.

Groxx 2 hours ago

>That's 3 ms per second = 0.3% CPU time wasted for every waiting thread.

I suspect that's actually "per process, per database (usually 1)", and not based on number of threads or tables. `data_version` semantics mean there's no need for more than one connection polling it, and it's being used as a relatively lightweight "DB has changed, check queues" check.

Also I believe this is mostly intended for multi-process use, e.g. out-of-process workers, so an in-process dirty tracker (e.g. just check after insert/update/delete) isn't sufficient.

So I do think it's somewhat crazy, but it is at least very simple. fsnotify-like monitoring seems like a fairly obvious improvement tho, not sure why that isn't part of it.

deepsun 2 hours ago

Nevertheless, expect articles like "We replaced our redis cluster with this simple extension and got it N times faster".

itopaloglu83 7 hours ago

It’s an interesting approach and can be quite fun to use for new projects.

> How it works: honker polls SQLite’s PRAGMA data_version every millisecond. That’s a monotonic counter SQLite increments on every commit from any connection, journal mode, or process — a ~3 µs read for a precise wake signal.
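For readers who want to see what polling `PRAGMA data_version` looks like, here is a rough sketch using Python's standard `sqlite3` module. It is not honker's code: `read_data_version` and `wait_for_change` are hypothetical helpers, and the 1 ms default only mirrors the figure in the quote. One caveat from the SQLite documentation: the value a connection sees changes when some other connection commits, and stays the same for commits made on that same connection, so the poller needs a connection of its own.

```python
import sqlite3
import time


def read_data_version(conn):
    """Return the current PRAGMA data_version as seen by this connection."""
    return conn.execute("PRAGMA data_version").fetchone()[0]


def wait_for_change(conn, last_version, poll_interval=0.001, timeout=None):
    """Poll until data_version differs from last_version.

    Returns the new version, or None if the timeout (in seconds) expires first.
    The polling connection must not be the one doing the writes.
    """
    deadline = None if timeout is None else time.monotonic() + timeout
    while True:
        version = read_data_version(conn)
        if version != last_version:
            return version
        if deadline is not None and time.monotonic() >= deadline:
            return None
        time.sleep(poll_interval)
```

A worker would typically call `wait_for_change` in a loop and, on each wake-up, run its real "is there a job for me?" query, since a changed `data_version` only says that something was committed, not what.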

EvanAnderson 7 hours ago

Prior discussion a few days ago: https://news.ycombinator.com/item?id=47874647

vmsp 6 hours ago

Reminds me of Litestack for Rails. Eventually, it was abandoned because Rails itself started going all out on SQLite.

https://github.com/oldmoe/litestack

nop_slide 6 hours ago

All in*

wmanley 3 hours ago

I've implemented something similar in the past, but using inotify. You need to watch the -wal file for IN_MODIFY. To make it work reliably I found I had to run:

    BEGIN IMMEDIATE TRANSACTION; ROLLBACK;

Otherwise the new changes weren't guaranteed to be visible to the process. I'm sure there's a more targeted approach that would work instead - maybe flock on a particular byte in the `-shm` file.

arlobish 6 hours ago

At the end it says: "pg-boss and Oban are the Postgres-side gold standards" -- but Oban supports SQLite now too https://github.com/oban-bg/oban

odie5533 4 hours ago

There's also Graphile Worker. https://github.com/graphile/worker

maxdo 4 hours ago

Almost feels like someone is trying to joke about a similar Postgres application.

To make it look even more absurd: SQLite is not concurrent and you’ll have tons of problems using it practically.

deferredgrant 4 hours ago

This seems especially appealing in the awkward middle: too serious for in-memory queues, not big enough to justify Kafka-shaped machinery.

andrewstuart 4 hours ago

Suggestion for the author: wind back the polling to once a second when nothing is happening.
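One way to read this suggestion is as an exponential backoff on the poll interval. The sketch below is only that, a sketch: `next_interval` is a hypothetical helper, and the 1 ms floor, 1 s ceiling, and doubling factor are assumptions chosen to match the numbers mentioned in the thread.

```python
def next_interval(current, saw_change, floor=0.001, ceiling=1.0, factor=2.0):
    """Pick the next poll interval in seconds.

    Any observed change snaps the interval back to the floor;
    each idle poll stretches it by `factor`, up to the ceiling.
    """
    if saw_change:
        return floor
    return min(ceiling, max(floor, current * factor))


if __name__ == "__main__":
    interval = 0.001
    # Simulate twelve idle polls followed by one poll that sees a change.
    for saw_change in [False] * 12 + [True]:
        interval = next_interval(interval, saw_change)
        print(f"{interval:.3f}")
```

The cost of this scheme is latency: after a long quiet period, the first new message can wait up to the ceiling before anyone notices it.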

andrewstuart 4 hours ago

I can’t see any benchmarks or performance stats.

I’d like to see messages per second.

canadiantim 4 hours ago

Could this work with Turso, the SQLite rust rewrite?