I think the reason there are so many MongoDB wire-compatible projects (like this Postgres extension from Microsoft, and FerretDB) is that people have systems with MongoDB clients built into their storage layers but don't want to be running on MongoDB anymore, exactly because "performance, scale, and flexibility hit a wall".
If you can change the storage engine, but keep the wire protocol, it makes migrating off Mongo an awful lot cheaper.
MongoDB's protocol has fewer features (RIP easy synchronization and replication), but worse is better, and if the "core" feature of a database is its query language, then the best/worst query language won: the simple, dumb, easy-to-replicate one.
We started with Postgres and added support for MongoDB after they deprecated Atlas Device Sync in September last year.
For known attributes, the model that works best for me is rigid tables with strict typing, with the unknowns placed in one or more JSON fields. For example, there are maybe 100 known attributes of a product, and the category-specific ones, the unknowns generated dynamically at runtime, go into the "product_attributes" JSON column.
You can do actual searches inside the JSON, index the table on its JSONB contents, etc. Things that became available in MongoDB very, very late.
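A minimal sketch of that layout, with everything except the "product_attributes" column name made up for illustration: strictly typed columns for the known attributes, a JSONB column for the dynamic ones, and a GIN index so containment searches inside the JSON can use an index.

  CREATE TABLE products (
      id          bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
      sku         text NOT NULL,
      name        text NOT NULL,
      price_cents integer NOT NULL,
      -- ... the rest of the ~100 known, strictly typed attributes ...
      product_attributes jsonb NOT NULL DEFAULT '{}'
  );

  CREATE INDEX products_attrs_gin ON products USING gin (product_attributes);

  -- Containment search inside the dynamic attributes, served by the GIN index:
  SELECT id, name
  FROM products
  WHERE product_attributes @> '{"color": "red", "connector": "usb-c"}';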
Going from 150ms to 8ms with an 80% CPU reduction does not make any sense perf-wise. I stand by my point: your post is missing a lot of details, and it's probably misuse.
Postgres has a good-enough (if not better) document DB inside it, but it also has a billion other features and widespread adoption. Mongo needs a huge benefit somewhere to justify using it.
Edit: on top of that, very few things that you store in the DB are non-relational. So you will always end up recreating a relational database on top of any NoSQL database. So why bother when you can just go for an RDBMS from the start?
Sure, you can implement things to make it better, but those are added layers that balloon the complexity. Most robust systems end up requiring more than one type of database. It is nice to work on projects with a limited scope where an RDBMS is good enough.
Lol. No relational database slows to a crawl on `is_deleted=true` or versioning
In general, so far not a single claim made by NoSQL databases has been shown to be true. Except KV databases, as they have their own uses.
There's a large amount of solutions for different kinds of data for a reason.
> ... takes solutions that require engineering teams.
All it took was an understanding of the data. And just one guy (me), not an "engineering team". Mongo knows only one way of sharding data. That one way may work for some use-cases, but for the vast majority of use-cases it's a Bad Idea. Postgres lets me do things in many different ways, and that's without extensions.
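To make "many different ways" concrete, here is a small sketch with made-up table names (not the actual setup): built-in declarative partitioning alone lets you lay the same kind of data out by range or by hash, per table, before you even reach for FDW-based sharding or extensions.

  -- Range-partitioned by time:
  CREATE TABLE events_by_time (
      id         bigint,
      tenant_id  int,
      created_at timestamptz NOT NULL
  ) PARTITION BY RANGE (created_at);

  CREATE TABLE events_2025 PARTITION OF events_by_time
      FOR VALUES FROM ('2025-01-01') TO ('2026-01-01');

  -- Or hash-partitioned by tenant:
  CREATE TABLE events_by_tenant (
      id        bigint,
      tenant_id int NOT NULL
  ) PARTITION BY HASH (tenant_id);

  CREATE TABLE events_tenant_0 PARTITION OF events_by_tenant
      FOR VALUES WITH (MODULUS 4, REMAINDER 0);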
If you don't understand your data, and you buy in to the marketing bullshit of a proprietary "solution", and you're too gullible to see through their lies, well, you're doomed to fail.
This fear-mongering that you're trying to pull in favour of the pretending-to-be-a-DB that is Mongo is not going to work anymore. It's not the early 2010s.
I have worked with tables on this scale. It definitely is not a walk in the park with traditional setups. https://www.timescale.com/blog/scaling-postgresql-to-petabyt...
Now, data chunked into objects and distributed around to be accessed by lots of servers, that's no sweat.
I'd love to see how you handle database maintenance when your active data is over 100TB.
Define "huge". Define "massive".
For modern RDBMS that starts at volumes that can't really fit on one machine (for some definition of "one machine"). I doubt Mongo would be very happy at that scale, too.
On top of that an analysis of the query plan usually shows trivially fixable bottlenecks.
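As a hypothetical example of what "trivially fixable" looks like: the usual `is_deleted` slowdown is often just a missing partial index, and EXPLAIN ANALYZE points straight at it (table and column names here are made up).

  -- Partial index covering only the live rows of a hypothetical "orders" table:
  CREATE INDEX orders_live_idx
      ON orders (customer_id, created_at)
      WHERE NOT is_deleted;

  -- The hot query now gets an index scan over live rows instead of
  -- filtering deleted rows out of the whole table:
  EXPLAIN ANALYZE
  SELECT *
  FROM orders
  WHERE customer_id = 42 AND NOT is_deleted
  ORDER BY created_at DESC
  LIMIT 20;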
On top of that it also depends on how you store your versioned data (wikipedia stores gzipped diffs, and runs on PHP and MariaDB).
Again, none of the claims you presented have any solid evidence in the real world.
I'm getting paid to move a database that size this morning.
I assume since then it's grown at least ten-fold. It's already an amount of data that would cripple most NoSQL solutions on the market.
I honestly feel like talking to functional programming zealots. There's this fictional product that is oh so much better than whatever tool you're talking about. No one has seen it, no one has proven it exists, or works better than the current perfectly adequate and performant tool. But trust us, for some ridiculous vaguely specified constraints it definitely works amazingly well.
This time "RDBMS is bad at soft deletions and versions because 19TBs of revisions on one of the world's most popular websites is tiny"
[1] https://meta.wikimedia.org/wiki/Data_dumps/Dumps_sizes_and_g...
They store revisions in compressed, mostly read-only storage for archival. https://wikitech.wikimedia.org/wiki/MariaDB#External_storage
They have the layout and backup plans of their servers available.
They've got an efficient layout, and they use caching, and it is by nature very read intensive.
https://wikitech.wikimedia.org/wiki/MariaDB#/media/File:Wiki...
Archival read-only servers don't have to worry about any of the maintenance mentioned. Use ChatGPT or something to play devil's advocate for you, because what you're saying is magical and non-existent is actually quite common.
Indeed, the tag line for one of the releases, I think 9.4 or 9.5, was “NoSQL on ACID”.
Though pretty sure Bruce is a teetotaler.
Hmm, interesting! I've been having some intermittent issues recently with a Postgres table that uses JSONB, where everything seems to lock up for several minutes at a time, with no useful logs as to why.
Do you have any more info about the GIN issue, and how it can be verified as being a problem please?
It’s not a super well-documented feature in postgres, so I also wound up doing some digging in the mailing lists, but I was really appreciative of all the detail in that GL issue and the associated links.
I think it illustrates the issue he's mentioning well enough.
> You can do actual searches inside the JSON, index the table
To be clear, you can do those things with Mongo (and the features have been available for a very long time), they just don't work as well, and are less reliable at scale.
Someone looking to use jsonb to replace a NoSQL DB would probably need to look deeper.
My team has compared PostgreSQL and MongoDB on analytical queries on JSON, and the results so far have been unimpressive: https://github.com/ClickHouse/JSONBench/
It's powerful, I love that it's there, but actually using it is always such a chore.
SELECT jdoc->'guid', jdoc->'name' FROM api WHERE jdoc @> '{"tags": ["qui"]}';
or
SELECT jdoc->'guid', jdoc->'name' FROM api WHERE jdoc @? '$.tags[*] ? (@ == "qui")';
which find documents in which the key "tags" contains array element "qui"
db.api.find(
{ "tags": "qui" },
{
"guid": 1,
"name": 1,
"_id": 0
}
)
I think the biggest use case is big data and dev platforms that need application compatibility, where wrapping Atlas is less attractive for some reason.
And with things like this in the source https://github.com/microsoft/documentdb/blob/a0c44348fb473da...
I'd definitely stay away from it.
Looks like it supports background indexing too.
Date and Binary are the ones I wish JSONB supported most often.
Selectivity is pretty hard to get right though so I can see why they stubbed it out and will deal with an actual implementation later.
That design allows arbitrary nested JSON data to be indexed using inverted indexes on top of a variation of B-trees called Bw-trees, and it seems like a nice way of indexing data automatically while preserving the ability to do both exact and range matching on arbitrarily nested values.
Not sure if the query capabilities and syntax match azure docdb but the basic functionality should be workable.
GIN does not support range searches (needed for <, <=, >, >=), prefix or wildcard, etc. It also doesn't support index-only scans, last I checked. You cannot efficiently ORDER BY a nested GIN value.
I recommend reading the paper.
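For the range and ORDER BY cases specifically, the usual Postgres-side workaround (sketched here with made-up table and key names) is a B-tree expression index on the particular JSONB path you need:

  -- B-tree expression index on one extracted, cast value:
  CREATE INDEX products_price_idx
      ON products (((product_attributes->>'price')::numeric));

  -- Range filters and ORDER BY on that same expression can now use the index:
  SELECT id
  FROM products
  WHERE (product_attributes->>'price')::numeric BETWEEN 10 AND 50
  ORDER BY (product_attributes->>'price')::numeric;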
Developers, developers, developers
I think a `||` that works with JSON patches would be nice, but you can easily implement that as an extension if you need it.
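For context, a tiny illustrative snippet (not from the thread): the existing `||` merges shallowly, replacing top-level keys wholesale, so patching a nested value today means reaching for jsonb_set.

  -- Shallow merge: the whole "a" object from the left side is replaced.
  SELECT '{"a": {"x": 1}, "b": 2}'::jsonb || '{"a": {"y": 3}}'::jsonb;
  -- => {"a": {"y": 3}, "b": 2}

  -- Patching a nested path instead:
  SELECT jsonb_set('{"a": {"x": 1}, "b": 2}'::jsonb, '{a,y}', '3');
  -- => {"a": {"x": 1, "y": 3}, "b": 2}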
Not the end of the world, but I have to look up the syntax every time I need to write a query.
I would prefer to use Postgres as my database, so this is worth investigating. Taking a brief look at the github page, it looks like it will be easy to swap it out in my code. So I think I know what I'll be spending my next sprint on.
The complexity of maintaining a relational database is at the core of their proprietary business model.
For operational systems, documents are just better in every way.
Leave the rdbms for analytics and reporting.
They (Microsoft) really don’t want you using it.
Cosmos can be a drop-in replacement.
If you start thinking in Domain-Driven Design terms, you realize the technology should be dictated by the business models. In many (if not most) cases, a document database is more than sufficient for most bounded contexts and its services, events, commands, and data.
Spinning up a relational database, managing schema changes, and tuning indexes is a constant drag on productivity.
If Microsoft wanted to compete with DynamoDB or MongoDB, they'd make the Cosmos document database a first line service in Azure. But you have to first spin up Cosmos and then identify a type of data storage. There is no technical reason for this setup other than to lump various non-relational data storage types into one service and create confusion and complexity.
I've done bake-offs between Microsoft and AWS and when I talk about a standard serverless architecture, the MS people are usually confused and respond, "What do you mean?" and the AWS folks are "Cool, so Lambdas and DynamoDB. We have 17 sample solutions for you."
I'm not saying you can't do serverless in Azure. I'm saying the support and advocacy is not there.
They call it (or called it, 3 years ago) a "multi-model" database, but really it was just a wrapper around three engines. It did come with fairly standardised pricing and availability guarantees, though, so my impression was that it was trying to sell to cloud architects and CIOs who might appreciate that.