Analyzing Postgres Queries Performance with Grafana (Part 1)
Monitoring database performance is never child's play. The database sits at the lowest level of a system's design, so it must be set up properly and run efficiently. A fast API response is usually backed by a fast query and an efficient database setup.
Postgres provides an excellent out-of-the-box solution to monitor the queries, indexing, and in general database performance.
There are 2 main stats tables that provide statistics on individual queries and overall database performance: pg_stat_activity and pg_stat_statements.
Other stats tables exist too, but these 2 cover most of our monitoring use cases.
The first table, pg_stat_activity, reflects the current, ongoing activity in the database. In one line: it shows which queries are running in the database in real time.
If we simply run SELECT on this table:
select * from pg_stat_activity
We shall get all of the current processes running in the database, both active and idle ones. The following columns are available:
As that output is raw data, a better query would be:
SELECT now() - pg_stat_activity.query_start AS duration,
       usesysid,
       usename,
       pid,
       state,
       datname,
       query
FROM pg_stat_activity
ORDER BY duration DESC;
The result is far better and more meaningful:
Here, if we try to interpret the output:
- duration — the total running time of this query (process)
- usesysid, usename — database user id and username
- pid — unique process id
- state — the current state of the process (active/idle)
- datname — the name of the database the process is running in
- query — the actual (truncated) query text
There are multiple other columns too, as in the DDL, and you are always free to use them as well.
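If you want to see the full column list for yourself, one way is to query the standard information_schema catalog (a sketch; in psql, `\d pg_stat_activity` does the same job):

```sql
-- List every column the pg_stat_activity view exposes, with its type
SELECT column_name, data_type
FROM information_schema.columns
WHERE table_name = 'pg_stat_activity'
ORDER BY ordinal_position;
```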
- Always maintain unique (separate) credentials for every consumer (client consuming the database), and NEVER share the same credentials between two different clients. This way you can track which client is running long queries, if any.
- The duration of active queries must not be very large. Ideally it should stay within 30 sec to 1 min or so, depending on your requirements.
- The idle state simply represents an idle connection. Too many idle connections also eat up your database's hardware resources.
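To keep an eye on idle-connection buildup, a simple aggregation over pg_stat_activity works well (a sketch; the columns are standard, the grouping choice is mine):

```sql
-- Count connections per user and state to spot idle-connection buildup
SELECT usename, state, count(*) AS connection_count
FROM pg_stat_activity
GROUP BY usename, state
ORDER BY connection_count DESC;
```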
You can even set up a Grafana dashboard for this query.
In this way, you can regularly watch the live status of your Postgres queries and hunt down the SLOW ones.
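To hunt them down directly, you can filter pg_stat_activity for active queries that exceed your threshold (a sketch; the 1-minute interval is just an example value to tune):

```sql
-- Active queries running longer than 1 minute
SELECT pid, now() - query_start AS duration, usename, query
FROM pg_stat_activity
WHERE state = 'active'
  AND now() - query_start > interval '1 minute'
ORDER BY duration DESC;

-- If a query is truly stuck, it can be cancelled by its pid:
-- SELECT pg_cancel_backend(pid);
```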
The second table is pg_stat_statements. Postgres stores all query performance stats in an aggregated form here. By aggregated we mean that metrics (totals, means) are accumulated per query pattern, rather than one row being kept per individual execution.
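Note that pg_stat_statements ships as an extension and is not enabled by default; it has to be preloaded and created first (the settings below are the standard ones — a managed provider may already handle this for you):

```sql
-- In postgresql.conf (requires a server restart):
--   shared_preload_libraries = 'pg_stat_statements'
-- Then, in the target database:
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
```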
If we simply run SELECT on this table:
select * from pg_stat_statements
We shall get output in the following DDL form
Again, the results are not very meaningful in this form, so let's run a more realistic and meaningful version of this query:
SELECT pd.datname AS db_name,
       pss.userid,
       pss.calls,
       round((pss.total_exec_time + pss.total_plan_time)::numeric, 2) AS total_time,
       round((pss.mean_exec_time + pss.mean_plan_time)::numeric, 2) AS mean,
       round((100 * (pss.total_exec_time + pss.total_plan_time) /
              sum((pss.total_exec_time + pss.total_plan_time)::numeric) OVER ())::numeric, 2) AS cpu_portion_pctg,
       pss.query
FROM pg_stat_statements pss
JOIN pg_database pd ON pd.oid = pss.dbid
ORDER BY (pss.total_exec_time + pss.total_plan_time) DESC
LIMIT 100;
Don't be intimidated by the above query; it's simply doing some arithmetic like means, sums, and percentages.
Output is in a much more meaningful format.
The results are self-explanatory:
- The first query takes the largest share of CPU percentage (1.78) across all queries. Its mean response time is 238 millisec and its total call/query count is 147850.
- The first query is made by the client with user id 6794153.
To get the username:
select usename, usesysid from pg_catalog.pg_user order by usesysid desc;
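You can also join the two catalogs so the username shows up directly next to each statement (a sketch; the join on userid = usesysid is the standard linkage, the selected columns are my choice):

```sql
-- Per-statement stats with the username resolved inline
SELECT pu.usename,
       pss.calls,
       round(pss.mean_exec_time::numeric, 2) AS mean_exec_ms,
       pss.query
FROM pg_stat_statements pss
JOIN pg_catalog.pg_user pu ON pu.usesysid = pss.userid
ORDER BY pss.mean_exec_time DESC
LIMIT 20;
```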
You can try creating a Grafana dashboard for easy monitoring:
The query with the highest CPU share should be optimized first, and so on.
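After optimizing a query, it helps to measure from a clean slate; pg_stat_statements provides a reset function for exactly this (superuser or appropriate privileges required):

```sql
-- Clear the accumulated statistics so new measurements start fresh
SELECT pg_stat_statements_reset();
```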
Both tables, pg_stat_activity and pg_stat_statements, can be used together to analyze query performance. One can easily work out which queries are heavier and optimize them to minimize CPU usage, and even track which clients are actually running the slow queries.
There are multiple other monitoring stats too, like index stats, idle connections, pending transactions, etc., and we shall talk about them in the next blog of this series.
Voila! Both you and I learned something new today. Congrats!
👏 👏 👏