
Instant database clones with PostgreSQL 18

275 points | 10 hours ago | boringsql.com
elitan5 hours ago

For those who can't wait for PG18 or need full instance isolation: I built Velo, which does instant branching using ZFS snapshots instead of reflinks.

Works with any PG version today. Each branch is a fully isolated PostgreSQL container with its own port. ~2-5 seconds for a 100GB database.

https://github.com/elitan/velo

Main difference from PG18's approach: you get complete server isolation (useful for testing migrations, different PG configs, etc.) rather than databases sharing one instance.
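
Under the hood it boils down to ZFS snapshot + clone. A simplified sketch of that mechanic (made-up dataset/container names and ports, not the actual Velo code):

  # snapshot the main data directory's ZFS dataset, then clone it copy-on-write
  zfs snapshot tank/pg/main@branch-base
  zfs clone tank/pg/main@branch-base tank/pg/feature-x
  # the clone appears instantly regardless of data size; point an isolated
  # Postgres container at it on its own port
  docker run -d --name pg-feature-x -p 5433:5432 \
    -e POSTGRES_PASSWORD=postgres \
    -v /tank/pg/feature-x:/var/lib/postgresql/data \
    postgres:17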

72deluxe2 hours ago

Despite all of the complaints in other comments about the use of Claude Code, it looks interesting and I appreciated the video demo you put on the GitHub page.

newusertoday54 minutes ago

Thanks for sharing, it's an interesting approach. I'm not sure why people are complaining; most software is written with the help of agents these days.

theturtletalks38 minutes ago

It’s rampant. Launch anything these days and it’s bombarded with “vibe-coded” comments.

The issue of quality makes sense since it's so easy to build these days, but when the product is open source, these vibe-coded comments make no sense. Users can literally go read the code, or (my favorite) Repomix it, pop it into AI Studio, and ask Gemini what this person has built, what value it brings, and whether it solves the problem I have.

For vibe-coded proprietary apps, you can't do that, so the comments are sort of justified.

whalesalad4 hours ago

Hell yeah. I’ve been meaning to prototype this exact thing but with btrfs.

tobase5 hours ago

Cool

Rovanion4 hours ago

"You" is an interesting word to use, given that you plagiarized it.

anonymars4 hours ago

Do you have a link to the original?

wahnfrieden2 hours ago

Please share the instant Postgres clones tool this copied! I'd love to try it

teiferer5 hours ago

You mean you told Claude a bunch of details and it built it for you?

Mind you, I'm not saying it's bad per se. But shouldn't we be open and honest about this?

I wonder if this is the new normal. Somebody says "I built Xyz" but then you realize it's vibe coded.

pritambarhate4 hours ago

Let's say there is an architect who also owns a construction company. This architect then designs a building and gets it built by his employees and contractors.

In such cases, the person says, "I built this building." People who found companies say they built companies. It's commonly accepted in our society.

So even if Claude built it for GP, as long as GP designed it, paid for the tool (Claude) that built it, and tested it to make sure that it works, I personally think he has the right to say he built it.

If you don't like it, you are not required to use it.

fauigerzigerk3 hours ago

I agree that it's ultimately about the product.

But here's the problem. Five years ago, when someone on here said, "I wrote this non-trivial software", the implication was that a highly motivated and competent software engineer put a lot of effort into making sure that the project meets a reasonable standard of quality and will probably put some effort into maintaining the project.

Today, it does not necessarily imply that. We just don't know.

pbh1012 hours ago

In general that is all implication and assumption, for any code, especially OSS code.

greatgib4 hours ago

The architect knows what he is doing. And the workers are professionals, with supervisors checking that the work is done properly.

happymellon2 hours ago

That's a lot of ifs.

risyachka2 hours ago

Asking someone to build a house - and then saying I built it - is "very misleading" to put it nicely.

When you order a website on upwork - you didn't build it. You bought it.

pebble4 hours ago

No, it's more like the architect has a cousin who is like "I totally got this bro" and builds the building for them.

foobarbecue3 hours ago

Right and also in this world there are no building codes or building inspections.

testdelacc13 hours ago

What an outrageously bad analogy. Everyone involved in that building put their professional reputations and licenses on the line. If that building collapses, the people involved will lose their livelihoods and be held criminally liable.

Meanwhile this vibe coded nonsense is provided “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED. We don’t even know if he read it before committing and pushing.

pbh1012 hours ago

Same with any OSS. Up to you to validate whether or not it is worth depending on, regardless of how built. Social proof is a primary avenue to that and has little to do with how built.

tracker12 hours ago

Depends on the building type/size/scale and jurisdiction. Modern tract homes are really varied, hit or miss, and the builders in question often don't see any negative outcomes for shoddy craftsmanship.

rootnod33 hours ago

That has to be the worst analogy I have read in a while, and on HN that says something.

heliumtera2 hours ago

Every single commit is Claude. No human expert involved. Would you trust your company database to a $25 vibe session? Would you live in a $5 building? Is there any difference between a hand-tailored suit, constructed to your measurements, and a $5 t-shirt? Some people don't want to live in a five-dollar world.

elAhmo4 hours ago

It is the new normal, whether you are against it or not.

If someone used AI, whether they should explicitly disclose it is a good discussion to have, but people have been using assistive tools (auto-complete, text expanders, IDE refactoring tools) for a while, and you wouldn't comment that they didn't build it. The lines are becoming blurrier over time, but it is ridiculous to claim that someone didn't build something because they used AI tools.

dpedu4 hours ago

Huh? It says so right in the README.

https://github.com/elitan/velo/blame/12712e26b18d0935bfb6c6e...

And are we really doing this? Do we need to admit how every line of code was produced? Why? Are you expecting to see "built with the influence of Stack Overflow answers" or "Google searches" on every single piece of software ever? It's an exercise in pointlessness.

renewiltord3 hours ago

I think you need to start with the following statement:

> We would like to acknowledge the open source people, who are the traditional custodians of this code. We pay our respects to the stack overflow elders, past, present, and future, who call this place, the code and libraries that $program sits upon, their work. We are proud to continue their tradition of coming together and growing as a community. We thank the search engine for their stewardship and support, and we look forward to strengthening our ties as we continue our relationship of mutual respect and understanding

Then if you would kindly say that a Brazilian invented the airplane that would be good too. If you don’t do this you should be cancelled for your heinous crime.

earthnail4 hours ago

Not sure why this is downvoted. For a critical tool like DB cloning, I'd very much appreciate it if it were hand written, simply because that means it's also been hand reviewed at least once (by definition).

We wouldn't have called that "reviewed" in the old world, but in the AI coding world we're in now, it makes me realise that yes, writing by hand is a form of reviewing.

I use Claude a lot btw. But I wouldn’t trust it on mission critical stuff.

dpedu4 hours ago

It's being downvoted because the commenter is asking for something that is already in the README. Furthermore, it's ironic that the person raising the issue is making the same mistake they're calling out: neglecting to read something they didn't write.

ffsm84 hours ago

Eh, DB branching is mostly only necessary for testing - locally, in CI or quick rollbacks on a shared dev instance.

Or at least I cannot come up with a usecase for prod.

From that perspective, it feels like a perfect use case to embrace the LLM-guided development jank.

gavinray4 hours ago

> Eh, DB branching is mostly only necessary for testing - locally

For local DBs, when I break them, I stop the Docker container and wipe the volume mounts, then restart and apply the "migrations" folder (minus whatever new broken migration caused the issue).
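
Roughly (compose service names and paths are whatever your project uses):

  # stop the DB container and wipe its volume mounts, then start fresh
  docker compose down --volumes
  docker compose up -d db
  # re-apply the known-good migrations
  for f in migrations/*.sql; do psql "$LOCAL_DB_URL" -f "$f"; done
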
renewiltord3 hours ago

If you don't read code you execute, someone is going to steal everything on your file system one day.

6r173 hours ago

There was a recent wave of such comments on the Rust subreddit - exactly in this shape: "Oh, you mean you built this with AI". This is highly toxic, leads to no discussion, and is literally driven by some dark thought from the commentator. I really hope HN will not jump on this bandwagon and will focus instead on creating cool stuff.

Everybody in the industry is vibecoding right now - the things that stick are the ones that have sufficient quality pushed into them. Having a pessimistic, judgmental surface reaction to everything as "AI slop" is not a behavior I want to cultivate in myself.

heliumtera2 hours ago

>This is highly toxic, lead to no discussion

Why is good faith a requirement for commenting but not for submissions? I would argue the good-faith assumption should be disproportionately more important for submissions, given the one-to-many relationship. You're not lying, it is indeed toxic and rapidly spreading. I'm glad this is the case.

Most came here for discussion and enlightenment, only to be bombarded by heavily biased, low-effort marketing bullshit. Presenting something that has no value to anyone besides the builder is the opposite of good faith. These submissions bury and displace useful discussion; it's difficult to claim they are harmless and merely not useful.

Not everyone in the industry is vibe coding; that is simply not true. But that's not the point I want to make. You don't need to be defensive about your use of generative tools; it's OK to use whatever, nobody cares. Just be ready to maintain your position and defend your ideals. Nothing is more frustrating than giving honest attention to a problem, considering someone else's perspective, only to realize it was just words, words, words spewed by a slop machine. Nobody would give it a second thought if that were disclosed. You are responsible for your craft. The moment you delegate that responsibility, you belong in the trash. If the slop machine is so great, why in hell would I need you to ask it to help me? Nonsensical.

6r172 hours ago

Your bias is that you think that because you can ride a bike, my bike efforts are worthless. Considering that I often throw away what I generate, that I know I don't just generate-then-ship, and that I have a quality process that validates my work on its own, the way I reach my goals is of no consequence to my public.

The reason this discussion is pathetic is that it shifts the focus from the main topic (here, a database implementation) toward a reactionary, emotive reflex with no grace or eloquence - one mostly driven by pop culture at this point, with a justification that mostly serves your ego.

There is no point in putting yourself above someone else just to justify your behavior - in fact it only tells me what kind of person you were in the first place - and as I said, this is not the kind of attitude that I look up to.

rileymichael2 hours ago

> Everybody in the industry is vibecoding right now

no ‘everybody’ is not. a lot of us are using zero LLMs and continuing to build (quality) software just fine

6r171 hour ago

Justifiably, there is zero correlation between writing something manually and quality - in fact I'd argue it's quite the opposite, since you were unable to work through as much play and architecture to try and break things; you have spent less time experimenting and more time pushing your ego.

sheepscreek1 hour ago

I set this up for my employer many years ago when they migrated to RDS. We kept bumping into issues on production migrations that would wreck things. I decided to do something about it.

The steps were basically:

1. Clone the AWS RDS db - or spin up a new instance from a fresh backup.

2. Get the ARN and, from that, the CNAME or public IP.

3. Plug that into the DB connection in your app

4. Run the migration on pseudo prod.

This helped us catch many bugs that were specific to the production DB or data quirks and would never have been caught locally or even in CI.

Then I created a simple ruby script to automate the above and threw it into our integrity checks before any deployment. Last I heard they were still using that script I wrote in 2016!
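
For reference, the same flow with today's AWS CLI looks roughly like this (identifiers are placeholders; our script hit the API rather than these exact commands):

  # spin up a throwaway instance from the latest restorable point of prod
  aws rds restore-db-instance-to-point-in-time \
    --source-db-instance-identifier prod-db \
    --target-db-instance-identifier pseudo-prod \
    --use-latest-restorable-time
  aws rds wait db-instance-available --db-instance-identifier pseudo-prod
  # grab the endpoint and point the app / migration at it
  aws rds describe-db-instances --db-instance-identifier pseudo-prod \
    --query 'DBInstances[0].Endpoint.Address' --output text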

peterldowns3 hours ago

Really interesting article, I didn't know that the template cloning strategy was configurable. Huge fan of template cloning in general; I've used Neon to do it for "live" integration environments, and I have a golang project https://github.com/peterldowns/pgtestdb that uses templates to give you ~unit-test-speed integration tests that each get their own fully-schema-migrated Postgres database.
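
The underlying trick is plain Postgres (names here are placeholders): build one fully migrated template, then stamp out a database per test from it:

  # create the template once and run all schema migrations against it
  createdb app_template
  # ...apply migrations to app_template...
  # each test then gets its own copy, far faster than re-running migrations
  psql -c 'CREATE DATABASE test_1234 TEMPLATE app_template;'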

Back in the day (2013?) I worked at a startup where the resident Linux guru had set up "instant" staging environment databases with btrfs. Really cool to see the same idea show up over and over with slightly different implementations. Speed and ease of cloning/testing is a real advantage for Postgres and Sqlite, I wish it were possible to do similar things with Clickhouse, Mysql, etc.

riskable2 hours ago

PostgreSQL seems to have become the be-all, end-all SQL database that does everything and does it all well. And it's free!

I'm wondering why anyone would want to use anything else at this point (for SQL).

efxhoy1 hour ago

It’s the clear OLTP winner but for OLAP it’s still not amazing out of the box.

wahnfrieden2 hours ago

Can’t really run it on iOS. And its WASM story is weak

hmokiguess39 minutes ago

I've been a fan of Neon and its branching strategy; really handy for stuff like this.

majodev5 hours ago

Uff, I had no idea that Postgres v15 introduced WAL_LOG and changed the default away from FILE_COPY. For (parallel CI) test envs, it makes so much sense to switch back to the FILE_COPY strategy ... and I previously actually relied on that behavior.
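
If you want it explicit rather than relying on the default, the strategy can be spelled out per CREATE DATABASE (PG15+ syntax; db/template names are placeholders):

  # force the old file-copy path instead of the WAL_LOG default introduced in v15
  psql -c "CREATE DATABASE ci_db TEMPLATE app_template STRATEGY = FILE_COPY;"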

Raised an issue in my previous pet project for doing concurrent integration tests with real PostgreSQL DBs (https://github.com/allaboutapps/integresql) as well.

radarroark5 hours ago

In theory, a database that uses immutable data structures (the hash array mapped trie popularized by Clojure) could allow instant clones on any filesystem, not just ZFS/XFS, and allow instant clones of any subset of the data, not just the entire db. I say "in theory" but I actually built this already so it's not just a theory. I never understood why there aren't more HAMT based databases.

zX41ZdbW5 hours ago

This is typical for analytical databases, e.g., ClickHouse (which I'm the author of) uses immutable data parts, allowing table cloning: https://clickhouse.com/docs/sql-reference/statements/create/...

ozgrakkurt2 hours ago

`ClickHouse (which I'm the author of)` just casually dropped that in the middle

chamomeal3 hours ago

Does datomic have built in cloning functionality? I’ve been wanting to try datomic out but haven’t felt like putting in the work to make a real app lol

radarroark3 hours ago

Surprisingly, no it does not. Datomic has a more limited feature that lets you make an in-memory clone of the latest copy of the db for speculative writes, which might be useful for tests, but you can't take an arbitrary version of the db with as-of and use it as the basis for a new version on disk. See: https://blog.danieljanus.pl/2025/04/22/datomic-forking-the-p...

There's nothing technically that should prevent this if they are using HAMTs underneath, so I'm guessing they just didn't care about the feature. With HAMT, cloning any part of the data structure, no matter how nested, is just a pointer copy. This is more useful than you'd think but hardly any database makes it possible.

BenjaminFaal6 hours ago

For anyone looking for a simple GUI for local testing/development of Postgres-based applications: I built a tool a few years ago that simplifies the process: https://github.com/BenjaminFaal/pgtt

peterldowns3 hours ago

Is this basically using templates as "snapshots", and making it easy to go back and forth between them? Little hard to tell from the README but something like that would be useful to me and my team: right now it's a pain to iterate on sql migrations, and I think this would help.

okigan6 hours ago

Would love to see a snapshot of the GUI as part of the README.md.

Also docker link seems to be broken.

BenjaminFaal5 hours ago

Fixed the package link. Github somehow made it private. I will add a snapshot right now.

christophilus5 hours ago

As an aside, I just jumped around and read a few articles. This entire blog looks excellent. I’m going to have to spend some time reading it. I didn’t know about Postgres’s range types.

pak9rabid3 hours ago

Range types are a godsend when you need to calculate things like overlapping or intersecting time/date ranges.

zachrip2 hours ago

Can you give a real world example?

christophilus60 minutes ago

I think the examples here are pretty good: https://boringsql.com/posts/beyond-start-end-columns/

francislavoie4 hours ago

Is anyone aware of something like this for MariaDB?

Something we've been trying to solve for a long time is having instant DB resets between acceptance tests (in CI or locally) back to our known fixture state, but right now it takes decently long (like half a second to a couple seconds, I haven't benchmarked it in a while) and that's by far the slowest thing in our tests.

I just want fast snapshotted resets/rewinds to a known DB state, but I need to be using MariaDB since it's what we use in production, we can't switch DB tech at this stage of the project, even though Postgres' grass looks greener.

pak9rabid3 hours ago

I was able to accomplish this by doing each test within its own transaction session that gets rolled-back after each test. This way I'm allowed to modify the database to suit my needs for each test, then it gets magically reset back to its known state for the next test. Transaction rollbacks are very quick.
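
In its simplest form (table names are placeholders):

  # wrap each test in a transaction and throw the changes away at the end
  psql "$TEST_DB_URL" -c "
    BEGIN;
    -- per-test setup and assertions go here
    INSERT INTO users (name) VALUES ('fixture-user');
    ROLLBACK;  -- the DB is back to its known fixture state for the next test
  "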

hu32 hours ago

As a consultant, I saw many teams doing that and it works well.

The only detail is that autoincrements (SEQUENCEs for PostgreSQL folks) get bumped even if the transaction rolls back.

So tables tend to get large IDs quickly. But it's just a dev database, so no problem.

proaralyst4 hours ago

You could use LVM or btrfs snapshots (at the filesystem level) if you're ok restarting your database between runs

briffle3 hours ago

LVM snapshots work well. Used them for years with other database tools. But make sure you allocate enough write space for the COW: when the write space fills up, LVM just 'drops' the snapshot.
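
A rough sketch of that flow (volume names are placeholders; --size is the COW write space):

  # take the rollback point before the test run
  lvcreate --snapshot --name pgdata_snap --size 20G /dev/vg0/pgdata
  # ...run whatever dirties the database...
  # roll back: stop postgres, merge the snapshot back into the origin, restart
  lvconvert --merge /dev/vg0/pgdata_snap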

francislavoie4 hours ago

Restarting the DB is unfortunately way too slow. We run the DB in a docker container with a tmpfs (in-memory) volume which helps a lot with speed, but the problem is still the raw compute needed to wipe the tables and re-fill them with the fixtures every time.

ikatson3 hours ago

How about making the changes and then baking them into the DB docker image, i.e. "docker commit"?

Then spin up the DB using that image instead of an empty one for every test run.

This implies starting the DB through docker is faster than what you're doing now, of course.
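
Sketch of what I mean (names are placeholders; note the stock image declares its data dir as a VOLUME and docker commit doesn't capture volumes, so PGDATA has to live outside it):

  # seed a container once, with PGDATA outside the declared volume
  docker run -d --name pg-seed -e POSTGRES_PASSWORD=postgres -e PGDATA=/pgdata postgres:17
  # ...apply schema + fixtures to pg-seed...
  docker stop pg-seed
  docker commit pg-seed local/pg-fixtures:latest
  # test runs then start from the pre-seeded image instead of an empty one
  docker run -d --rm -p 5433:5432 local/pg-fixtures:latest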

francislavoie3 hours ago

Yeah there's absolutely no way restarting the container will be faster.

renewiltord3 hours ago

I have not done this so it’s theorycrafting but can’t you do the following?

1. Have a local data dir with initial state

2. Create an overlayfs with a temporary directory

3. Launch your job in your docker container with the overlayfs bind mount as your data directory

4. That’s it. Writes go to the overlay and the base directory is untouched
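
Concretely, something like this (paths are made up):

  # initial DB state lives read-only in the lower dir; writes land in the upper dir
  mkdir -p /tmp/pg-upper /tmp/pg-work /tmp/pg-merged
  mount -t overlay overlay \
    -o lowerdir=/srv/pg-initial-data,upperdir=/tmp/pg-upper,workdir=/tmp/pg-work \
    /tmp/pg-merged
  # use the merged view as the container's data directory
  docker run -d -p 5433:5432 -e POSTGRES_PASSWORD=postgres \
    -v /tmp/pg-merged:/var/lib/postgresql/data postgres:17
  # reset = stop the container, unmount, wipe the upper/work dirs, remount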

nateroling48 minutes ago

This is on the cluster level, while the article is talking about the database level, I believe.

horse6665 hours ago

Aurora clones are copy-on-write at the storage layer, which solves part of the problem, but RDS still provisions you a new cluster with its own endpoints, etc., which is slow (~10 mins), so it's not really practical for the integration-testing use case.

TimH6 hours ago

Looks like it would probably be quite useful when setting up git worktrees, to get multiple claude code instances spun up a bit more easily.

mvcosta917 hours ago

It looks very interesting for integration tests

radimm7 hours ago

OP here - yes, this is my use case too: integration and regression testing, as well as providing learning environments. It makes working with larger datasets a breeze.

presentation6 hours ago

We do this, preview deploys, and migration dry runs using Neon Postgres’s branching functionality - seems one benefit of that vs this is that it works even with active connections which is good for doing these things on live databases.

drakyoko6 hours ago

would this work inside test containers?

radimm5 hours ago

OP here - still have to try it (I generally operate at the VM/bare-metal level), but my understanding is that the ioctl call would get passed through to the underlying volume, i.e. you would have to mount a volume.

horse6665 hours ago

This is really cool, looking forward to trying it out.

Obligatory mention of Neon (https://neon.com/) and Xata (https://xata.io/) which both support “instant” Postgres DB branching on Postgres versions prior to 18.

1a527dd56 hours ago

Many thanks, this solves integration tests for us!

oulipo24 hours ago

Assuming I'd like to replicate my production database for either staging, or to test migrations, etc,

and that most of my data is either:

- business entities (users, projects, etc)

- and "event data" (sent by devices, etc)

where most of the database size is in the latter category, and that I'm fine with "subsetting" those (eg getting only the last month's "event data")

what would be the best strategy to create a kind of "staging clone"? ideally I'd like to tell the database (logically, without locking it expressly): do as though my next operations only apply to items created/updated BEFORE "currentTimestamp", and then:

- copy all my business tables (any update to those after currentTimestamp would be ignored magically even if they happen during the copy)

- copy a subset of my event data (same constraint)

what's the best way to do this?

gavinray4 hours ago

You can use "psql" to dump subsets of data from tables and then later import them.

Something like:

  psql <db_url> -c "\copy (SELECT * FROM event_data ORDER BY created_at DESC LIMIT 100) TO 'event-data-sample.csv' WITH CSV HEADER"
https://www.postgresql.org/docs/current/sql-copy.html
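
The import on the target side is the mirror image (same table/file names as above):

  psql <target_db_url> -c "\copy event_data FROM 'event-data-sample.csv' WITH CSV HEADER"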

It'd be really nice if pg_dump had a "data sample"/"data subset" option but unfortunately nothing like that is built in that I know of.

peterldowns3 hours ago

pg_dump has a few annoyances when it comes to doing stuff like this — tricky to select exactly the data/columns you want, and also the dumped format is not always stable. My migration tool pgmigrate has an experimental `pgmigrate dump` subcommand for doing things like this, might be useful to you or OP maybe even just as a reference. The docs are incomplete since this feature is still experimental, file an issue if you have any questions or trouble

https://github.com/peterldowns/pgmigrate

tehlike1 hour ago

Now I need to find a way to migrate from hydra columnar to pg_lake variants so I can upgrade to PG18.

7023184643 hours ago

Hi