PostgreSQL on Kubernetes (#1)

The next train is about to start...

tags: #Kubernetes #Operators
published: December 16, 2024
reading time: 8 minutes

PostgreSQL on Kubernetes

For some months, if not years, one can feel a shift around AI… including in the PostgreSQL community. pgvector started that trend a few years ago, and we hear more and more about it.

If there’s another change our community is adapting to, it’s the rise of Kubernetes. Clearly, Kubernetes won what I’d call the container orchestration battle.

I just did a quick count of the Kubernetes talks given at the last three PostgreSQL Conference Europe events:

2022: Berlin had a single talk;
2023: Prague had 2 talks and 1 full day of training;
2024: Athens had 4 talks and, again, 1 full day of training.

What about Riga in 2025?…

Will we have 8 talks and 2 full days of training? Personally, I wouldn’t be surprised :-)

The Data on Kubernetes (DoK) Community is more and more visible. By the way, I strongly recommend you join DoK; it’s a welcoming and fun group :-)

All this just follows what we can see more globally in the Kubernetes world. For example, cncf.io wrote in this article about KubeCon + CloudNativeCon Europe 2024: “Over 12,000 people joined us in Paris for the largest ever KubeCon + CloudNativeCon Europe.”… And about 9,000 people attended KubeCon + CloudNativeCon North America in Salt Lake City a few weeks ago…

Why put PostgreSQL databases in Kubernetes

Because everything else is already there.

For some companies, that movement started 5 years ago. Don’t believe it’s sudden; it never is when it comes to databases!

The main goal of those big users was to finally break their monolithic applications down into small parts, often called microservices.

Some events accelerated that move. One of the best examples is Broadcom’s acquisition of VMware: some VMware prices rose by 2x, if not 5x…

StatefulSets made it possible to run databases in Kubernetes, and storing data in Kubernetes keeps getting more convenient.

And since it’s still hardware underneath, it doesn’t change much. I’ve been working with Kubernetes for a bit more than 2 years now, and at first I was really suspicious about performance there. I completely changed my mind: it is fast, especially when Kubernetes is installed directly on bare metal.

Finally, many “PostgreSQL companies” have offered products and support in that area for many years now, and they list many of their use cases on their websites.

So many users thought it was the right time to finally kill those remaining VMs in the back of the datacenter, the ones hosting the PostgreSQL databases, and to migrate them into Kubernetes, alongside everything else.

PostgreSQL in Kubernetes is cool

PostgreSQL in Kubernetes is cool… if you use an operator. There are several PostgreSQL operators for Kubernetes out there. Make your choice ;-)

Basically, for most use cases, it comes down to editing a YAML configuration file and running kubectl apply on it. Every PostgreSQL operator comes with a ton of examples and good documentation; you’ll have to read a bit.
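To make that concrete, here is a minimal Python sketch that generates such a manifest. It assumes CloudNativePG’s `Cluster` resource (`apiVersion: postgresql.cnpg.io/v1`) purely as an example; other operators, such as Crunchy Data’s PGO, use a different schema, so check your operator’s documentation. The `cluster_manifest` helper, the `pg-demo` name and the `cluster.py` file name are hypothetical. Since kubectl accepts JSON as well as YAML, the output can be piped straight into `kubectl apply -f -`.

```python
import json


def cluster_manifest(name: str, instances: int, storage_size: str) -> dict:
    """Build a minimal PostgreSQL cluster manifest as a Python dict.

    Field names follow CloudNativePG's Cluster resource, used here only as
    an example; other operators expect a different schema.
    """
    if instances < 1:
        raise ValueError("a cluster needs at least one instance")
    return {
        "apiVersion": "postgresql.cnpg.io/v1",
        "kind": "Cluster",
        "metadata": {"name": name},
        "spec": {
            # 1 = a standalone primary; 3 = one primary plus two replicas
            "instances": instances,
            # size of the persistent volume claimed by each instance
            "storage": {"size": storage_size},
        },
    }


if __name__ == "__main__":
    # kubectl accepts JSON as well as YAML, e.g.:
    #   python cluster.py | kubectl apply -f -
    print(json.dumps(cluster_manifest("pg-demo", 1, "10Gi"), indent=2))
```

Going from a standalone server to a three-instance cluster is then just a matter of changing `instances` and re-applying the same manifest.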

Speaking of installation: it’s all about deploying a container (or a small suite of containers) where everything is already installed and configured the right way for you. That is countless times easier than doing it on your own, or than managing the tools to do it. As a comparison, modern PostgreSQL stacks have used Ansible for many years now. Installing a complete cluster of many instances, in high availability, with backups, monitoring, pooling, read-only and read/write ports, virtual IP, etc., can turn into 1200 to 2000 tasks spread across 10 to 20 different roles. It is complex, and there are zillions of ways to fail, just because of the underlying OS you choose (including the major version of your preferred Linux distro…).

You can also automate all this and let programs deploy things exactly as you declared them. As an example, read the incredible series my awesome colleague Bob Pacheco wrote, starting with “Postgres GitOps with Argo and Kubernetes”. Click the “More by this author” link to find his next posts about Argo! You will see that a git commit; git push on your YAML is not only the best way to track its versions, but also a convenient way of actually deploying it :-)

Scaling from a standalone PostgreSQL server to a multi-instance cluster (with or without synchronous replication), adding high availability, monitoring, pooling… all those operations boil down to pretty much the same thing: edit the YAML again and re-apply it. Then let the operator and Kubernetes magic do the job for you.
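What that “magic” does behind the scenes is, roughly, a reconciliation loop: compare the declared state with what is actually running, and act on the difference. The sketch below is a toy model of that idea, not any operator’s actual code; `ClusterState` and `reconcile` are hypothetical names, and real operators also handle failover, backups, storage and much more.

```python
from dataclasses import dataclass


@dataclass
class ClusterState:
    """Hypothetical, simplified view of a PostgreSQL cluster."""
    instances: int    # primary + replicas; instance 0 is the primary
    pooling: bool     # connection pooler deployed?
    monitoring: bool  # metrics exporter deployed?


def reconcile(desired: ClusterState, observed: ClusterState) -> list:
    """List the actions that move the observed state toward the desired one.

    Toy model of an operator's control loop: it only diffs the declared
    spec against what is running.
    """
    if desired.instances < 1:
        raise ValueError("a cluster needs at least one instance")
    actions = []
    # Create missing instances, in increasing order.
    for i in range(observed.instances, desired.instances):
        actions.append(f"create instance {i}")
    # Remove extra instances, newest first; the primary (0) is never removed.
    for i in range(observed.instances - 1, desired.instances - 1, -1):
        actions.append(f"remove instance {i}")
    if desired.pooling != observed.pooling:
        actions.append("enable pooler" if desired.pooling else "disable pooler")
    if desired.monitoring != observed.monitoring:
        actions.append("enable monitoring" if desired.monitoring else "disable monitoring")
    return actions


if __name__ == "__main__":
    running = ClusterState(instances=1, pooling=False, monitoring=True)
    declared = ClusterState(instances=3, pooling=True, monitoring=True)
    print(reconcile(declared, running))
    print(reconcile(declared, declared))  # [] : re-applying is a no-op
```

Re-applying an unchanged manifest yields no actions, which is why editing and re-applying the same YAML over and over is safe.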

Hybrid deployments are increasingly possible, depending on the operator. You could have your data infrastructure on your own on-premises Kubernetes cluster and some of the data elsewhere in the cloud of your choice, or the reverse.

Running PostgreSQL in Kubernetes also lets you benefit from many of the innovations Kubernetes brings, such as Volume Snapshots, which are widely available in Kubernetes. Just read Brian Pace’s excellent article “PostgreSQL Snapshots and Backups with pgBackRest in Kubernetes” to get an idea of what that means.

I must say that almost every month for about 2 years now, I’ve learned something new. Not only do I learn it, but I test it, write about it, and finally see customers run it. That’s fast, but it’s also exciting and fun.

Choosing containers means trusting someone

Clearly. If you choose to go with containers, you place your full trust in a company, a group, a person… because you will not dig into the details of how things are done. You don’t have the time, nor will you take the time it requires to fully understand it. That’s probably not even within the scope of your skills.

If you are the kind of PostgreSQL user who needs to understand and master things, or if that’s simply part of your job, you’d probably be better off building your own containers. It is entirely possible, although it is still a lot of work, just like building the whole stack in Ansible was.

And if you already have the Ansible stack, why not keep using it to build the actual images you will use in Kubernetes? Ansible is such an incredible piece of software that it can do that too. The only problem will be economic: what does it cost to create and then maintain your own images, compared to your favorite vendor’s “package”, a mix of software and support? And, hell yeah, it’s so cool to have someone on the phone, even at night, to help you when things go wrong…

I also see a lot of professionals relying on an operator because they have a hard time finding an actual PostgreSQL DBA. So yes, Karen Jex is right: an operator can act as your virtual DBA :-)

But let’s be clear: nowadays, an operator can do lots of things, but it still cannot do everything, and it probably never will. It’s always better to have a real DBA in house to handle things like your schema design and how your SQL queries are written.

Change your mind and learn new things!

I may be wrong… but I strongly believe that the coming Kubernetization of databases is the next big step, as big as the virtualization of databases was many years ago.

If you have to modernize your company’s data stack, I strongly recommend you ask whether Kubernetes is already a topic somewhere in the company. You’d be surprised.

I remember that, back then, a lot of colleagues thought virtualization would kill PostgreSQL performance because of the hypervisor overhead, etc.

History proved them somewhat right. But companies like VMware adjusted their technologies really fast, and we learned how to use them better over time.

Today, who can say they really need bare metal to get the performance they need? I know very few projects that still do: I can count them on one hand.

Also, as I said earlier in this article, I am so impressed by the performance I get with Kubernetes on bare metal that I would no longer hesitate, as I did 2 years ago, when answering the question I faced back then: “Can I trust you when you say performance losses will be bearable if I go with Kubernetes?”

Frankly? Yes, no doubt. If you have performance issues in Kubernetes, 9 times out of 10 it’s because… you under-provisioned things. I see that a looooot.

If you’re curious about what we see most often at Support, grab a coffee and go watch “Lessons from the Support Desk”. BTW, thank you Evan for that great talk.

Seriously, problems arise when users don’t allocate enough resources, that’s all (4 out of 5 support tickets, I’d say)… or when they don’t upgrade (and too few do :’( ).
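As a hedged illustration of the under-provisioning point, here is a small sanity check comparing a container’s memory limit with `shared_buffers`. It assumes the PostgreSQL documentation’s rule of thumb of starting `shared_buffers` at about 25% of system memory; the `to_bytes` and `check_memory` helpers are hypothetical, and the quantity parser is deliberately simplified (no exponent or milli-unit forms).

```python
import re
from typing import Optional

# Kubernetes memory quantity suffixes. Simplified sketch: exponent forms
# such as "1e9" and the milli suffix "m" are not handled.
K8S_UNITS = {
    "": 1,
    "k": 1000, "M": 1000**2, "G": 1000**3, "T": 1000**4,
    "Ki": 1024, "Mi": 1024**2, "Gi": 1024**3, "Ti": 1024**4,
}

# PostgreSQL memory units use a multiplier of 1024, not 1000. A unitless
# shared_buffers value is read as blocks (usually 8kB), so it is rejected.
PG_UNITS = {"kB": 1024, "MB": 1024**2, "GB": 1024**3, "TB": 1024**4}


def to_bytes(value: str, units: dict) -> int:
    """Convert a quantity such as '8Gi' or '2GB' to bytes."""
    match = re.fullmatch(r"(\d+)([A-Za-z]*)", value.strip())
    if match is None or match.group(2) not in units:
        raise ValueError(f"unsupported quantity: {value!r}")
    return int(match.group(1)) * units[match.group(2)]


def check_memory(limit: str, shared_buffers: str) -> Optional[str]:
    """Return a warning if shared_buffers exceeds 25% of the memory limit."""
    limit_bytes = to_bytes(limit, K8S_UNITS)
    buffers_bytes = to_bytes(shared_buffers, PG_UNITS)
    if buffers_bytes * 4 > limit_bytes:
        return (f"shared_buffers={shared_buffers} is more than 25% of the "
                f"{limit} memory limit: raise the limit or lower shared_buffers")
    return None


if __name__ == "__main__":
    print(check_memory("8Gi", "2GB"))  # None: exactly 25% of the limit
    print(check_memory("4Gi", "2GB"))  # warning: 2GB is 50% of 4Gi
```

Note that the Kubernetes suffix `G` is decimal while PostgreSQL’s `GB` is 1024-based, so a `8G` limit with `shared_buffers = 2GB` already crosses the 25% line.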

I think one of my next posts will be about what it’s like to work with PostgreSQL in Kubernetes day to day, covering daily DBA tasks.