In the following article, I will present three ways of finding slow queries in PostgreSQL databases. Each method has its own characteristics and will not always be adequate in a given situation.
PostgreSQL slow query log
Description
This is the easiest method to implement. The server can be configured to save queries to a log file according to your needs.
It can be set to log the content of every query lasting longer than a stated duration. However, it has to be remembered that such logging has its costs:
- disk I/O operations;
- CPU cycles to perform them;
- space in the log file.
The value log_min_duration_statement = 0, which stores every executed query in the log file, is applied mostly in test environments. Using it on heavily loaded production environments may significantly slow down the system and cause excessive consumption (or even depletion) of the space assigned for logging.
In production environments, higher values are employed, which ensures capturing only the slowest queries. A good starting point is log_min_duration_statement = 5000, i.e., 5 seconds, but depending on the characteristics of the system this value is often set to 1 or 10 seconds.
The settings above ensure gathering information about queries which last longer than the specified time – the time we consider to be long for our system. They do not allow gathering a sample of all queries or assessing their number or type.
If, for example, a database slowdown is caused by thousands, or even millions, of queries each running under 1 second, this method will not reveal it.
One solution is to enable logging of the query duration with log_duration = on. This allows us to find out the number of queries, because the execution times of all queries are then logged, while their text is logged according to the settings of the above-mentioned parameters:
2022-06-22 12:37:45.839 CEST [18399] LOG: duration: 0.903 ms
2022-06-22 12:37:54.489 CEST [18399] LOG: duration: 9.415 ms
2022-06-22 12:38:42.144 CEST [18399] LOG: duration: 0.149 ms
2022-06-22 12:38:44.213 CEST [18399] LOG: duration: 0.125 ms
2022-06-22 12:38:46.270 CEST [18399] LOG: duration: 0.145 ms
2022-06-22 12:38:48.318 CEST [18399] LOG: duration: 0.126 ms
2022-06-22 12:44:49.238 CEST [18399] LOG: duration: 7.742 ms
2022-06-22 12:44:51.067 CEST [18399] LOG: duration: 5.153 ms
2022-06-22 12:44:51.252 CEST [18399] LOG: duration: 5.726 ms
2022-06-22 12:44:53.923 CEST [18399] LOG: duration: 4.626 ms
2022-06-22 12:44:53.928 CEST [18399] LOG: duration: 5.081 ms
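For illustration, a minimal postgresql.conf sketch producing entries like the ones above (the values are examples to adjust, not recommendations):
log_min_duration_statement = 5000  # log the full text of statements running over 5 s
log_duration = on                  # log the duration of every completed statement
Both parameters can be changed with a configuration reload; a server restart is not required.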
Since version 12 it is possible to set for what fraction of all executed transactions the full text of their statements will be logged (log_transaction_sample_rate).
In version 13, new possibilities of limiting the number of logged queries have been added: the parameter log_statement_sample_rate determines the fraction of queries lasting longer than log_min_duration_sample for which the full text is saved.
Independently of the above, the log_statement parameter logs all queries of the selected type. If we are using the time- or fraction-based parameters, we can leave it at the default (none) or set it to ddl (Data Definition Language).
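As a sketch, this is how the sampling parameters might be combined in postgresql.conf on PostgreSQL 13 or newer (the values are illustrative):
log_min_duration_statement = 5000   # statements over 5 s are always logged
log_min_duration_sample = 100       # statements over 100 ms are candidates for sampling...
log_statement_sample_rate = 0.1     # ...and 10% of them have their text logged
log_transaction_sample_rate = 0.01  # 1% of transactions have all statements logged
log_statement = 'ddl'               # independently, log every DDL statement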
Configuration parameters
log_min_duration_statement | Log the text of every query lasting longer than this many milliseconds. |
log_duration | Log the duration of every query; for queries shorter than log_min_duration_statement only the time is logged, not the text. |
log_transaction_sample_rate | Log the text of all queries for this fraction of transactions. |
log_statement_sample_rate | Log the text of this fraction of queries lasting longer than log_min_duration_sample. |
log_min_duration_sample | Log the text of sampled queries lasting longer than this many milliseconds; works together with log_statement_sample_rate and should be lower than log_min_duration_statement, since queries exceeding the latter are always logged. |
log_statement | Log queries of the given type: none – do not log any; ddl – only Data Definition Language queries; mod – only queries which modify data (plus DDL); all – log all queries. |
Launching
- An entry in the postgresql.conf file
log_min_duration_statement = 3000
+ reloading the database configuration
- Modification from the cluster level
postgres=# ALTER SYSTEM SET log_min_duration_statement=3000;
ALTER SYSTEM
postgres=# SELECT pg_reload_conf();
pg_reload_conf
----------------
t
(1 row)
- Modification for a specific user
postgres=# CREATE USER test;
CREATE ROLE
postgres=# ALTER USER test SET log_min_duration_statement = 10000;
ALTER ROLE
postgres=# \c postgres test
You are now connected to database "postgres" as user "test".
postgres=> SHOW log_min_duration_statement ;
log_min_duration_statement
----------------------------
10s
(1 row)
- Modification for a specific database
postgres=# SHOW log_min_duration_statement ;
log_min_duration_statement
----------------------------
3s
(1 row)
postgres=# CREATE DATABASE test;
CREATE DATABASE
postgres=# ALTER DATABASE test SET log_min_duration_statement FROM CURRENT;
ALTER DATABASE
postgres=# ALTER DATABASE test SET log_min_duration_statement TO 5000;
ALTER DATABASE
postgres=# \c test
You are now connected to database "test" as user "postgres".
test=# SHOW log_min_duration_statement ;
log_min_duration_statement
----------------------------
5s
(1 row)
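As a side note, per-user and per-database overrides such as the ones above can be reviewed in psql with the \drds command, or by querying the pg_db_role_setting catalog (outputs omitted here, as they depend on local settings):
postgres=# \drds
postgres=# SELECT * FROM pg_db_role_setting;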
pg_stat_statements
Description
This add-on enables collecting all SQL statements executed on the server, along with statistical information about their execution.
The data contained in the pg_stat_statements view allows selecting the queries which should be optimized, and also provides knowledge about potential issues.
postgres=# \d pg_stat_statements
View "public.pg_stat_statements"
Column | Type | Collation | Nullable | Default
---------------------+------------------+-----------+----------+---------
userid | oid | | |
dbid | oid | | |
toplevel | boolean | | |
queryid | bigint | | |
query | text | | |
plans | bigint | | |
total_plan_time | double precision | | |
min_plan_time | double precision | | |
max_plan_time | double precision | | |
mean_plan_time | double precision | | |
stddev_plan_time | double precision | | |
calls | bigint | | |
total_exec_time | double precision | | |
min_exec_time | double precision | | |
max_exec_time | double precision | | |
mean_exec_time | double precision | | |
stddev_exec_time | double precision | | |
rows | bigint | | |
shared_blks_hit | bigint | | |
shared_blks_read | bigint | | |
shared_blks_dirtied | bigint | | |
shared_blks_written | bigint | | |
local_blks_hit | bigint | | |
local_blks_read | bigint | | |
local_blks_dirtied | bigint | | |
local_blks_written | bigint | | |
temp_blks_read | bigint | | |
temp_blks_written | bigint | | |
blk_read_time | double precision | | |
blk_write_time | double precision | | |
wal_records | bigint | | |
wal_fpi | bigint | | |
wal_bytes | numeric | | |
In the upcoming version 15, additional columns will appear, containing information about the time spent on writing/reading temporary files, as well as statistics of the “Just-In-Time compilation” functionality:
temp_blk_read_time | double precision | | |
temp_blk_write_time | double precision | | |
jit_functions | bigint | | |
jit_generation_time | double precision | | |
jit_inlining_count | bigint | | |
jit_inlining_time | double precision | | |
jit_optimization_count | bigint | | |
jit_optimization_time | double precision | | |
jit_emission_count | bigint | | |
jit_emission_time | double precision | | |
Example information which can be obtained by using this add-on:
- Searching for the most frequently executed queries: SELECT query, calls FROM pg_stat_statements ORDER BY 2 DESC;
- Searching for the queries with the longest combined execution time: SELECT query, calls, total_exec_time + total_plan_time FROM pg_stat_statements ORDER BY 3 DESC;
- Searching for the queries which use temporary files most intensively: SELECT query, calls, pg_size_pretty(temp_blks_written * 8192) FROM pg_stat_statements ORDER BY temp_blks_written DESC; (temp_blks_written is a number of 8 kB blocks, hence the multiplication to obtain bytes).
Obviously, the examples presented above do not exhaust the full list of possible queries.
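As one more sketch (built only from columns of the view shown above), here are the statements with the highest mean execution time together with their shared-buffer cache hit ratio:
SELECT query,
       calls,
       mean_exec_time,
       round(100.0 * shared_blks_hit /
             nullif(shared_blks_hit + shared_blks_read, 0), 2) AS hit_pct
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 10;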
Configuration parameters
pg_stat_statements.max | Maximum number of statements tracked in the pg_stat_statements view; when more distinct SQL statements are executed on the server, information about the least-executed ones is discarded. |
pg_stat_statements.track | Determines which queries are tracked: top – only top-level queries; all – also nested ones, e.g., inside functions; none – disables statistics tracking. |
pg_stat_statements.track_utility | Determines whether utility commands (other than SELECT, INSERT, UPDATE, DELETE) are tracked – a list is available at: https://pgpedia.info/categories/utility-command.html. |
pg_stat_statements.track_planning | Enables/disables tracking of query planning operations; impacts performance. |
pg_stat_statements.save | Determines whether statistics are preserved across server restarts; default is on. |
Launching
To be able to use the add-on, the proper library has to be loaded via an entry in the postgresql.conf file: shared_preload_libraries = 'pg_stat_statements'.
Loading the library requires a restart of the server.
Next, the extension should be created:
CREATE EXTENSION pg_stat_statements;
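Once the extension is created, it can be checked that statistics are being collected; they can also be cleared at any moment with the pg_stat_statements_reset() function (a quick sketch, outputs omitted):
postgres=# SELECT count(*) FROM pg_stat_statements;
postgres=# SELECT pg_stat_statements_reset();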
auto_explain
Description
Sometimes we encounter a situation where we have already identified the statements to optimize and have carried out that optimization. When run manually, the statements perform well, but when launched automatically by the system they do not perform optimally. There may be various reasons for this: for instance, the data the query runs against has changed or is no longer available, or the application triggers a non-optimal form of the query (e.g., with incompatible data types).
Let’s consider such a situation:
postgres=# CREATE TABLE explain_test (id int GENERATED ALWAYS AS IDENTITY PRIMARY KEY, numerek int);
CREATE TABLE
postgres=# INSERT INTO explain_test (numerek) SELECT * FROM generate_series(1, 1000000);
INSERT 0 1000000
postgres=# EXPLAIN ANALYZE SELECT id FROM explain_test WHERE id < 1000;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------
Index Only Scan using explain_test_pkey on explain_test (cost=0.42..28.49 rows=918 width=4) (actual time=0.043..0.474 rows=999 loops=1)
Index Cond: (id < 1000)
Heap Fetches: 0
Planning Time: 0.483 ms
Execution Time: 0.769 ms
(5 rows)
As expected, the query uses the index. Now let's run the same query with a much wider range:
postgres=# EXPLAIN ANALYZE SELECT id FROM explain_test WHERE id < 1000000;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
Seq Scan on explain_test (cost=0.00..16925.00 rows=999999 width=4) (actual time=0.028..490.079 rows=999999 loops=1)
Filter: (id < 1000000)
Rows Removed by Filter: 1
Planning Time: 0.197 ms
Execution Time: 741.269 ms
(5 rows)
This query will be logged in pg_stat_statements just like the previous one (since they differ only in a constant, they share one normalized entry). However, it has a different query plan (also as expected), and we will not learn about that from the information provided by this add-on.
In such a situation, the auto_explain module (which provides the query plan) or the PostgreSQL slow query log (which provides the exact form of the query, with the actual parameter values) comes in handy.
Configuration parameters
auto_explain.log_min_duration | Log the plans of queries lasting longer than this many milliseconds. |
auto_explain.log_analyze | Logs the output of EXPLAIN ANALYZE instead of the default EXPLAIN. |
auto_explain.log_buffers | Equivalent of the BUFFERS option of the EXPLAIN command. |
auto_explain.log_wal | Equivalent of the WAL option of the EXPLAIN command. |
auto_explain.log_timing | Equivalent of the TIMING option of the EXPLAIN command. |
auto_explain.log_triggers | Includes statistics of fired triggers. |
auto_explain.log_verbose | Equivalent of the VERBOSE option of the EXPLAIN command. |
auto_explain.log_settings | Adds information about configuration settings changed from their defaults to the plan. |
auto_explain.log_format | Defines the output format of the EXPLAIN result {text, xml, json, yaml}. |
auto_explain.log_level | Sets the log level used by auto_explain {DEBUG5, DEBUG4, DEBUG3, DEBUG2, DEBUG1, INFO, NOTICE, WARNING, LOG}. |
auto_explain.log_nested_statements | Enables logging of nested statements, i.e., statements executed inside a function. |
auto_explain.sample_rate | Causes only the given fraction of queries in each session to be logged. Nested statements are not sampled individually – either all or none of them are logged, according to the auto_explain.log_nested_statements parameter. |
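Putting several of these parameters together, a minimal postgresql.conf sketch could look as follows (the values are illustrative; the ways of loading the library are described in the next section):
shared_preload_libraries = 'auto_explain'
auto_explain.log_min_duration = 1000   # log plans of queries running over 1 s
auto_explain.log_analyze = on          # include actual times and row counts
auto_explain.log_buffers = on          # include buffer usage statistics
auto_explain.log_format = 'json'       # easier to parse by external tools
Keep in mind that auto_explain.log_analyze adds timing overhead to every statement, even those which are never logged.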
Launching
- For an individual session (the user needs superuser privileges)
postgres=# LOAD 'auto_explain';
LOAD
postgres=# SET auto_explain.log_min_duration = 0;
SET
postgres=# SET auto_explain.log_analyze = true;
SET
postgres=# SELECT count(*)
postgres-# FROM pg_class, pg_index
postgres-# WHERE oid = indrelid AND indisunique;
An entry of this type will appear in the log file:
duration: 0.731 ms plan:
Query Text: SELECT count(*)
FROM pg_class, pg_index
WHERE oid = indrelid AND indisunique;
Aggregate (cost=28.19..28.20 rows=1 width=8) (actual time=0.713..0.719 rows=1 loops=1)
-> Hash Join (cost=21.91..27.84 rows=143 width=0) (actual time=0.463..0.665 rows=143 loops=1)
Hash Cond: (pg_index.indrelid = pg_class.oid)
-> Seq Scan on pg_index (cost=0.00..5.55 rows=143 width=4) (actual time=0.019..0.104 rows=143 loops=1)
Filter: indisunique
Rows Removed by Filter: 12
-> Hash (cost=16.96..16.96 rows=396 width=4) (actual time=0.429..0.430 rows=402 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 23kB
-> Seq Scan on pg_class (cost=0.00..16.96 rows=396 width=4) (actual time=0.010..0.221 rows=402 loops=1)
duration: 3.069 ms
- For the whole instance, as a library loaded during server startup, an entry is required:
shared_preload_libraries = 'auto_explain', followed by a restart of the database.
- For the whole instance, as a library loaded at the start of a session, an entry is required:
session_preload_libraries = 'auto_explain', followed by a reload of the configuration.
In that case the module will be active only for new sessions; pre-existing sessions will not be covered by it.
- For a user/database – the setting can be changed on the level of a specific user or database:
ALTER USER test SET auto_explain.log_min_duration TO 0;
Obviously, the add-on still has to be loaded in one of the two previously mentioned ways (shared_preload_libraries or session_preload_libraries).
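Analogously, the setting can be applied on the database level (a sketch using the test database created earlier):
postgres=# ALTER DATABASE test SET auto_explain.log_min_duration TO 0;
ALTER DATABASE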