Postgres SQL

Analyzing Postgres Queries Performance with Grafana (Part 1)

Postgres Performance Analysis

Ayush P Gupta

--

Monitoring database performance is never child’s play. The database is the lowest level element in system design and must be properly set up and should work efficiently. A fast API response is generally backed by a fast query or efficient database setup.

Postgres provides an excellent out-of-the-box solution to monitor the queries, indexing, and in general database performance.

Stats

There are 2 main stats tables that provide statistics of different queries and overall database performance:

  1. pg_stat_activity
  2. pg_stat_statements

Though other tables are also there, but the above 2 satisfy most of all our different monitoring use cases.

pg_stat_activity

This table reflects the current ongoing process or activity in the database. In one line, this showcases what queries are running in a database in real-time.

If we simply run SELECT on this table:

select * from pg_stat_activity

We shall get all the current processes running in the database both active and idle ones. The following columns are available

As the output would represent raw data, hence a better query would be:

SELECT now() - pg_stat_activity.query_start AS duration,
usesysid,
usename,
client_addr,
pid,
state,
datname,
query,
backend_type
FROM pg_stat_activity
order by duration desc;

The result is far better and more meaningful:

Here if we try to interpret

  1. duration The total uptime of execution of this query(process)
  2. usesysid, username Database user id and username
  3. pid Process unique id
  4. state The current state of the process (active/idle)
  5. datname Database name in which process is running
  6. query Actual truncated query

Though there are multiple other columns as in DDL, you are always free to use them too.

Checkpoints/Best Practices

  1. Always maintain unique(separate) credentials for every consumer(client-consuming database), and NEVER share the same credentials with two different clients. This way you can track which client is running long queries if any.
  2. Duration of active queries must not be very large. Ideally must be 30 sec to 1 min or so, depending on your requirement.
  3. An idle the state simply represents an idle connection. Too many idle connections also eat up your database hardware resources.

Bonus

You can even set up a Grafana dashboard for this query.

In this way, you can regularly watch and see the live status of your Postgres queries and hunt down the SLOW queries

pg_stat_statements

Postgres stores all query performance stats in an aggregate form in this table. By aggregate we mean, taking the mean of different metrics of a batch of queries rather than tracking individual queries at once.

If we simply run SELECT on this table:

select * from pg_stat_statements

We shall get output in the following DDL form

Again the results, are not much in meaningful form, hence we shall try to run a more realistic and meaningful version of this query:

SELECT
pss.userid,
pss.dbid,
pd.datname as db_name,
round((pss.total_exec_time + pss.total_plan_time)::numeric, 2) as total_time,
pss.calls,
round((pss.mean_exec_time+pss.mean_plan_time)::numeric, 2) as mean,
round((100 * (pss.total_exec_time + pss.total_plan_time) / sum((pss.total_exec_time + pss.total_plan_time)::numeric) OVER ())::numeric, 2) as cpu_portion_pctg,
pss.query
FROM pg_stat_statements pss, pg_database pd
WHERE pd.oid=pss.dbid
ORDER BY (pss.total_exec_time + pss.total_plan_time)
DESC LIMIT 100;

Don't be intimidated by the above query, its simply we are doing some arithmetic operations like mean, and sum.

Output is in a much more meaningful format.

The results are self-explanatory,

  1. The first query takes a max amount of CPU percentage (1.78) shared with all queries. The mean response time is 238 millisec and the total calls/ queries count is 147850.
  2. The first query is made by the client using 6794153 the user id.

To get the username:

select usename,usesysid from pg_catalog.pg_user ORDER BY usesysid desc

Bonus

You can try creating a Grafana dashboard for easy monitoring:

The highest CPU share query must be optimized and so on.

Conclusion

Both the table pg_stat_activity, pg_stat_statements can be used together to analyze the queries' performance. One can easily draw out which queries are heavier and hence can be optimized to minimize CPU usage. One can even track what are the actual client that are running the slow queries.

There are multiple other monitoring stats too like index stats, idle connections, pending transactions, etc, and we shall talk about them in our next blog of this series.

Whola! Both you and I learned something new today. Congrats
👏 👏 👏

Further Reading

  1. https://jfrog.com/community/data-science/troubleshooting-high-cpu-utilization-in-postgresql-databases-a-how-to-guide/
  2. https://www.postgresql.org/docs/current/pgstatstatements.html
  3. https://www.postgresql.org/docs/current/monitoring-stats.html

--

--

Ayush P Gupta

NodeJs | VueJs | Kubernetes | Flutter | Linux | DIY person