pgSimload (#1): why & what

What is pgSimload and why I started writing this tool

tags: #PgSimload
published: December 9, 2024

Why did I start writing this tool?

pgSimload version alpha by Patrick

pgSimload is a tool written in Go. The very first version was a POC written by my colleague Patrick McLaughlin. He wanted to show us (the Solution Architects team at my job) that something other than Apache JMeter could be used to run a loop of SELECT 1; to check whether a PostgreSQL cluster (mostly in HA) is responding or not during our demos.

Since I had a ton of ideas about that, I asked Patrick about the code, and he told me something like “do what you want with it”. So I did :-)

Reason #1: gather all my scripts in one tool

So the first reason to start my own tool was to centralize all the features, scripts, pieces of Perl and Bash, and lots of other small parts I had been using for decades…

A single tool for all that would be so cool for my daily tasks like demos, tests and benchmarks of PostgreSQL instances and clusters, in High Availability or not, on bare metal as well as on Kubernetes…

Reason #2: learn a new programming language

I hesitated for a long time here. Should I start in Rust or should I start in Go?

Even though the original alpha version Patrick wrote was in Go, I really wondered here, because Rust is probably the next standard for PostgreSQL extensions. No doubt here.

On the other hand, Go is very widely used in the Kubernetes world. As an example, PostgreSQL operators for Kubernetes are written in Go: that’s the case for PGO, from Crunchy Data, and also for cloudnative-pg.

Since I am 100% convinced that the future of PostgreSQL is containerized (that’s already the present for most of the people I work with), and that Kubernetes has won the battle of containers for a long time to come, I opted for Go.

I frankly don’t regret it.

Reason #3: coding is fun

I always had bad feelings about myself and code. The few things I did before were mostly Bash, plus a bunch of Perl. Perl is still particularly effective when it comes to cleaning data. And you can’t imagine how bad the data is out there in the real world. I’m a big consumer of French public data. I use that data to answer the tons of questions I have about everything. I still use Perl a lot to filter and correct the data. I may blog about that.

BUT, I’ve never been able to write a tool from a “blank page”. In the past I did maintain some Open Source software here and there, but never things I started or coded on my own.

So I wanted to prove to myself that I can do that, too. I learnt a lot, and I’m still learning a lot. It is really fun to code, finally. Once you consider it a game and set some reachable goals, going step by (small) step, it’s a real pleasure, and a very satisfying thing to do!! :-)

So, what’s pgSimload?

pgSimload is defined on the GitHub page as:

pgSimload, a versatile CLI tool to create activity on PostgreSQL server(s)
and/or test HA in Crunchy Postgres or Crunchy Postgres for Kubernetes

The homepage lists nearly every feature I’ve added over the last 2 years of coding this tool.

Basically, on the simulation side, aka “SQL-Loop” mode, it does the following:

simulate a load on a server, hence its name “PG SIM LOAD”
that load can be simulated with one unique client, or many, you decide
that load can be “as fast and as much as possible”, or throttled, with fixed thinking times and/or random ones (sleep and rsleep)
before executing the DML queries, it can execute a script to create a set of tables, in a schema or not, etc. Your call
the SQL queries in DML are in a SQL script, and you decide what you put in there
everything runs in a “fail & try” manner: if a write or a select fails, pgSimload tries it again, with some throttling… So if a failover is happening, it tries to survive it, application-wise, and outputs how long it had to wait, as seen from the application side, since pgSimload acts there as a PG client

It has a “Patroni watcher” mode to show the output of patronictl (list or topology), with a refresh interval you decide (à la top).

It has the same with a “Kube watcher”, which shows much less info but can still tell the primary from the replicas, etc. I wrote this mode to match what others use in my company. It’s also nice for operators that don’t rely on Patroni: users can still have some sort of “watcher” when they want to test HA scenarios, for example.

All those features work in every scenario where PostgreSQL runs:

bare metal and VMs, using SSH for watchers
Kubernetes, in every flavour of it I could test, using kubectl for watchers

So when it’s about doing a demo, like I did at the PostgreSQL Europe Conference 2024 in Athens, it’s all about having:

1 terminal with k9s
1 terminal to issue commands (kubectl (apply|delete)..., vim postgresql.yaml, etc.)
1 terminal with pgSimload in a Patroni-Watcher mode
1 terminal with a pgSimload actually writing to the cluster (using the Read-Write Port)
1 terminal with a pgSimload actually reading from the cluster (using the Read-Only Port)
1 terminal with your own SQL in a watch to check values in your test tables

And then, I can kill a pod… or delete the entire $PGDATA on the Primary’s filesystem… and let people see that everything is handled, and that the PostgreSQL platform is nearly unbreakable.

So this is what that test environment looks like:

pgSimload in action at PG EU 2024 in Athens

(Thanks to my colleague and friend Roberto Mello for this post on Twitter, from which I stole that photo!)

Conclusion

I hope this post will convince you to give pgSimload a try. I took a lot of time to write complete, comprehensive documentation too. I hope it will be clear enough. If you have suggestions or thoughts, please use GitHub for that :-)