Scaling PostgreSQL to power 800M ChatGPT users

(openai.com)

38 points | by mustaphah 4 hours ago

5 comments

  • everfrustrated 2 hours ago
    This is why I love Postgres. It can get you to being one of the largest websites before you need to reconsider your architecture just by throwing CPU and disk at it. At that point you can well afford to hire people who are deep experts at sharding etc.
    • zozbot234 29 minutes ago
      PostgreSQL actually supports sharding out of the box, it's just a matter of setting up the right table partitioning and Foreign Data Wrapper (FDW) to forward queries to remote databases. I'm not sure what the post is referencing when they say that sharding requires leaving Postgres altogether.
  • bzmrgonz 1 hour ago
    Someone ask Microsoft what does it feel to be bested by an open source project on their very own cloud platform!!! Lol.
    • esjeon 25 minutes ago
      Azure offers Postgres “DBaaS”, so I’m pretty sure they are no where near that stage. It’s more likely that we should watch out for the Microsoft E-E-E strategy.
  • QuiCasseRien 28 minutes ago
    I like the way of thinking. Instead of migrating to another database, they keep that awesome one running and found smart workaround to push limits.
  • ed_mercer 2 hours ago
    Why does the [Azure PostgreSQL flexible server instance] link point to Chinese Azure?
    • noxs 38 minutes ago
      All names are Asian and mostly Chinese

      > Author Bohan Zhang

      > Acknowledgements Special thanks to Jon Lee, Sicheng Liu, Chaomin Yu, and Chenglong Hao, who contributed to this post, and to the entire team that helped scale PostgreSQL. We’d also like to thank the Azure PostgreSQL team for their strong partnership.

    • Natfan 2 hours ago
      Bohan Zhang, the article's author, is likely Chinese.

      e: and the link points to en-us at time of writing. I frankly don't see the value in your comment.

  • hu3 3 hours ago
    From what I understand they basically couldn't scale writes in PostgreSQL to their needs and had to offload what they could to Azure's NoSQL database.

    I wonder, is there another popular OLTP database solution that does this better?

    > For write traffic, we’ve migrated shardable, write-heavy workloads to sharded systems such as Azure CosmosDB.

    > Although PostgreSQL scales well for our read-heavy workloads, we still encounter challenges during periods of high write traffic. This is largely due to PostgreSQL’s multiversion concurrency control (MVCC) implementation, which makes it less efficient for write-heavy workloads. For example, when a query updates a tuple or even a single field, the entire row is copied to create a new version. Under heavy write loads, this results in significant write amplification. It also increases read amplification, since queries must scan through multiple tuple versions (dead tuples) to retrieve the latest one. MVCC introduces additional challenges such as table and index bloat, increased index maintenance overhead, and complex autovacuum tuning.

    • 0xdeafbeef 2 hours ago
      Tidb should handle it nice. I've wrote 200к inserts / sec for hour in peak. Underlying lsm works better for writes
      • anonzzzies 2 hours ago
        That would mean it improved somewhat. We always got better write performance from mysql vs postgres, however that is a while ago; we then tried tidb to go further but it was basically rather slow. Again, a while ago.

        When did you get your results, might be time to re-evaluate.