PostgreSQL / EDB

PostgreSQL’s performance issues – identifying free queries

12/10/2022
Share:

In the following article, I will present three ways of finding free queries in the PostgreSQL databases. Each of the manners has its own specifics, and it would not always be adequate in a given situation.

PostgreSQL slow query log

Description

It is the easiest method to implement. It gives the ability to configure the server in such a way that it will save the query in a log file according to the needs.

It can be set to log the content of every query lasting longer than the stated duration. Likewise, it has to be remembered that such logging costs:

  • I/O disk storage;
  • CPU cycles – to perform operations;
  • space in the log file.

The value of log_min_duration_statement = 0, which enables to store every performed queries in the log file, is applied mostly in the test environments. Using it on the overloaded production environments may significantly slow down the operation of the system and cause an excessive exploitation (and even depletion) of the space assigned for logging.

In the manufacturing environments, higher values are being employed, which ensures the capturing of the slowest queries. A good starting point is setting the  log_min_duration_statement = 5000 , which is to 5 seconds, but often these values can be set to 1 or 10 seconds – depending on the characteristics of the system.

The settings above ensure gathering of the information about queries which last longer than the specified time, which we consider to be long as regards to our system. They do not enable gathering the sample of all the queries and assessing either their number or their type.

If i.e., a slowdown of base is caused by the occurrence of thousands, or even millions of queries under 1 second, we can not know about this fact by using this method.

One solution is to enable logging of the query duration time, log_duration = on. It allows us to find out the number of the queries, because the times of all the queries execution and the content according to the settings of above-mentioned parameters are logged at this moment:

2022-06-22 12:37:45.839 CEST [18399] LOG:  duration: 0.903 ms
2022-06-22 12:37:54.489 CEST [18399] LOG:  duration: 9.415 ms
2022-06-22 12:38:42.144 CEST [18399] LOG:  duration: 0.149 ms
2022-06-22 12:38:44.213 CEST [18399] LOG:  duration: 0.125 ms
2022-06-22 12:38:46.270 CEST [18399] LOG:  duration: 0.145 ms
2022-06-22 12:38:48.318 CEST [18399] LOG:  duration: 0.126 ms
2022-06-22 12:44:49.238 CEST [18399] LOG:  duration: 7.742 ms
2022-06-22 12:44:51.067 CEST [18399] LOG:  duration: 5.153 ms
2022-06-22 12:44:51.252 CEST [18399] LOG:  duration: 5.726 ms
2022-06-22 12:44:53.923 CEST [18399] LOG:  duration: 4.626 ms
2022-06-22 12:44:53.928 CEST [18399] LOG:  duration: 5.081 ms

Since version 12 it is possible to set to what percent of all executed transitions will be performed a full record of the transaction (log_transaction_sample_rate).

In version 13 new possibilities of limiting the number of the logged queries have been added: the parameter log_statement_sample_rate determines the percent of queries which last longer than log_min_duration_sample the full text of the query should be saved.

Independently to the above-mentioned, a parameter operates: log_statement – logs all of the queries of the elected type. If we are using time or percent based parameters, we can set this parameter as default (none) or ddl (Data Definition Language).

Configuration parameters

log_min_duration_statementLog the text of the query, which lasts longer than this many milliseconds.
log_durationLog the duration time of ever query, for queries lasting shorter than the  log_min_duration_statement; only the time is logged, not the text.
log_transaction_sample_rateLog the text of the query for this much percent of the all transactions.
log_statement_sample_rateLog the content of this percent of queries lasting longer than log_min_duration_sample.
log_min_duration_sampleLog the text of the query, which lasts longer than this many milliseconds; refers to log_statement_sample_rate, is not related to log_min_duration_statement.
log_statementLog the queries of the type:
none – do not log any;
ddl – only Data Definition Language queries;
mod – only queries which modify data (including DDL);
all – logs all the queries.

Launching

  • Entry in the postgresql.conf file

log_min_duration_statement=3000;
+ reloading the database

  • Modification from the cluster level
postgres=# ALTER SYSTEM SET log_min_duration_statement=3000;
ALTER SYSTEM
postgres=# Select pg_reload_conf();
 pg_reload_conf 
----------------
 t
(1 row)
  • Modification of a specified user
postgres=# CREATE USER test;
CREATE ROLE
postgres=# ALTER USER test SET log_min_duration_statement =10000;
ALTER ROLE
postgres=# \c postgres test 
You are now connected to database "postgres" as user "test".
postgres=> SHOW log_min_duration_statement ;
 log_min_duration_statement 
----------------------------
 10s
(1 row)
  • Modification of a specified database
postgres=# SHOW log_min_duration_statement ;
 log_min_duration_statement 
----------------------------
 3s
(1 row)
postgres=# CREATE DATABASE test;
CREATE DATABASE
postgres=# ALTER DATABASE test SET log_min_duration_statement 
FROM CURRENT  TO            
postgres=# ALTER DATABASE test SET log_min_duration_statement TO 5000;
ALTER DATABASE
postgres=# \c test
You are now connected to database "test" as user "postgres".
test=# SHOW log_min_duration_statement ;
 log_min_duration_statement 
----------------------------
 5s
(1 row)

pg_stat_statements

Description

This add-on enables collecting all the instances of SQL instructions conducted within the server including along with the statistical information regarding their execution.

The data contained in the overview pg_stat_statements allow for selecting the queries, which should be optimized, and also provide knowledge about potential issues.

postgres=# \d pg_stat_statements
                    View "public.pg_stat_statements"
Column    		|       Type       	| Collation | Nullable | Default 
---------------------+------------------+-----------+----------+---------
userid			| oid		        |           |          | 
dbid			| oid		        |           |          | 
toplevel		| boolean		|           |          | 
queryid			| bigint		|           |          | 
query			| text		        |           |          | 
plans			| bigint		|           |          | 
total_plan_time		| double precision	|           |          | 
min_plan_time		| double precision	|           |          | 
max_plan_time		| double precision	|           |          | 
mean_plan_time      	| double precision	|           |          | 
stddev_plan_time	| double precision	|           |          | 
calls              	| bigint           	|           |          | 
total_exec_time    	| double precision	|           |          | 
min_exec_time      	| double precision	|           |          | 
max_exec_time       	| double precision	|           |          | 
mean_exec_time      	| double precision	|           |          | 
stddev_exec_time    	| double precision	|           |          | 
rows                	| bigint           	|           |          | 
shared_blks_hit     	| bigint           	|           |          | 
shared_blks_read   	| bigint           	|           |          | 
shared_blks_dirtied 	| bigint           	|           |          | 
shared_blks_written 	| bigint           	|           |          | 
local_blks_hit      	| bigint           	|           |          | 
local_blks_read     	| bigint           	|           |          | 
local_blks_dirtied  	| bigint           	|           |          | 
local_blks_written	| bigint           	|           |          | 
temp_blks_read  	| bigint		|           |          | 
temp_blks_written	| bigint		|           |          | 
blk_read_time		| double precision	|           |          | 
blk_write_time		| double precision	|           |          | 
wal_records		| bigint		|           |          | 
wal_fpi			| bigint		|           |          | 
wal_bytes		| numeric	        |           |          | 

In the upcoming version 15 the additional columns will be found containing the information about the time spent on saving/reading of the temporary files and also the statistics of the “Just in time compilation” functionality.

temp_blk_read_time	| double precision	|           |          | 
temp_blk_write_time	| double precision	|           |          | 
jit_functions		| bigint		|           |          | 
jit_generation_time	| bigint		|           |          | 
jit_inlining_count 	| bigint		|           |          | 
jit_inlining_time 	| bigint		|           |          | 
jit_optimization_count	| bigint		|           |          | 
jit_optimization_time	| bigint		|           |          | 
jit_emission_count	| bigint		|           |          | 
jit_emission_time	| bigint		|           |          | 

Example information, which can be obtained by using this add-on:

  • Searching for the most frequently asked queries: SELECT query, calls FROM pg_stat_statements ORDER BY 2 DESC;
  • Searching for the queries of the longest combined execution time: SELECT query, calls,total_exec_time + total_plan_time  FROM pg_stat_statements ORDER BY 3 DESC;
  • Searching for the queries which use the temporary files most intensively: SELECT query, calls,pg_size_pretty(temp_blks_written) FROM pg_stat_statements ORDER BY temp_blks_written DESC.

Obviously, the examples presented above do not exhaust the full list of possible queries.

Configuration parameters

pg_stat_statements.maxMaximal number of rows visible in the pg_stat_statements view; if there are more than this many SQL instructions executed on the server, then the information will not be stored.
pg_stat_statements.trackDetermines of which queries the statistics are tracked:
top – direct queries;
all – also the embedded ones – i.e., in the functions;
none – turns off the statistics tracking.
pg_stat_statements.track_utilityDetermines, if the utility commands (other than select, insert, update, delete) are tracked – the list is available on: https://pgpedia.info/categories/utility-command.html.
pg_stat_statements.track_planningEnabling/disabling the tracking of query planning operations; impacts performance.
pg_stat_statements.saveDetermines whether statistics are to be stored between the boot ups of the server; default is on.

Launching

To be able to launch the add-on a proper library has to be loaded from the

postgresql.conf file: shared_preload_libraries = ‘pg_stat_statements’.

To launch the library, a relaunch of the server is needed.

Next, the add-on should be launched:

CREATE EXTENSION pg_stat_statements();

auto_explain

Description

Sometimes we may encounter a situation when we possess the before typed instructions for optimization, and we are carrying out the before mentioned optimization. Then after manually booting them, they work correctly but when being launched automatically by the system they do not work optimally. There could be various reasons for this situation, for instance: the data, on which the query has been executed, has changed, is not available anymore or a nonoptimal query is being triggered by the application (i.e., incompatible data types).

Let’s consider such a situation:

postgres=# CREATE TABLE explain_test (id int GENERATED ALWAYS AS IDENTITY PRIMARY KEY, numerek int);
CREATE TABLE
postgres=# INSERT INTO explain_test (numerek ) SELECT * FROM generate_series(1, 1000000);
INSERT 0 1000000
postgres=# EXPLAIN ANALYZE SELECT id FROM explain_test WHERE id < 1000;
                                                                QUERY PLAN                                                                
------------------------------------------------------------------------------------------------------------------------------------------
 Index Only Scan using explain_test_pkey on explain_test  (cost=0.42..28.49 rows=918 width=4) (actual time=0.043..0.474 rows=999 loops=1)
   Index Cond: (id < 1000)
   Heap Fetches: 0
 Planning Time: 0.483 ms
 Execution Time: 0.769 ms
(5 rows)

The query as expected use the index:

postgres=# EXPLAIN ANALYZE SELECT id FROM explain_test WHERE id < 1000000;
                                                      QUERY PLAN                                                      
----------------------------------------------------------------------------------------------------------------------
 Seq Scan on explain_test  (cost=0.00..16925.00 rows=999999 width=4) (actual time=0.028..490.079 rows=999999 loops=1)
   Filter: (id < 1000000)
   Rows Removed by Filter: 1
 Planning Time: 0.197 ms
 Execution Time: 741.269 ms
(5 rows)

The query will be logged in the  pg_stat_statements just as the previous one. However, it will have a different plan of the query (also as expected) and we will not know about it from the information provided by the mentioned add-on.

In such situation, auto_explain (provides the plan of the query) or PostgreSQL slow query log (provides the form of the query – with specified parameters of the trigger) add-ons come in handy.

Configuration parametres

auto_explain.log_min_durationLog the queries lasting longer than.
auto_explain.log_analyzeEnables the entry of the output of EXPLAIN ANALYZE command, in the place of the default EXPLAIN.
auto_explain.log_buffersEquivalent of the BUFFERS option for the EXPLAIN command.
auto_explain.log_walEquivalent of the WAL option for the EXPLAIN command.
auto_explain.log_timingEquivalent of the TIMING option for the EXPLAIN command.
auto_explain.log_triggersConsidering the statistics of the released triggers.
auto_explain.log_verboseEquivalent of the VERBOSE option for the EXPLAIN command.
auto_explain.log_settingsEnables adding information to the plan about the settings in relation to the default ones.
auto_explain.log_formatDefines the format of the returned result for the command EXPLAIN {text,xml,json,yaml}.
auto_explain.log_levelEstimates the level of logging for the add-on auto_explain {DEBUG5, DEBUG4, DEBUG3, DEBUG2, DEBUG1, INFO, NOTICE, WARNING, LOG}.
auto_explain.log_nested_statementsCauses the nesting of query plans executed within the function.
auto_explain.sample_rateCauses the logging of only the specified parts of queries from the session. Does not contain the nested queries – in that case, all or none of the quarries are logged (per the setting of auto_explain.log_nested_statements parameter).

Booting up

  • For an individual session (user needs to have the administrator privileges)
postgres=# LOAD 'auto_explain';
LOAD
postgres=# SET auto_explain.log_min_duration = 0;
SET
postgres=# SET auto_explain.log_analyze = true;
SET
postgres=# SELECT count(*)
postgres-#            FROM pg_class, pg_index
postgres-#            WHERE oid = indrelid AND indisunique;

W pliku logu pojawi się wpis typu:
duration: 0.731 ms  plan:
	Query Text: SELECT count(*)
	           FROM pg_class, pg_index
	           WHERE oid = indrelid AND indisunique;
	Aggregate  (cost=28.19..28.20 rows=1 width=8) (actual time=0.713..0.719 rows=1 loops=1)
	  ->  Hash Join  (cost=21.91..27.84 rows=143 width=0) (actual time=0.463..0.665 rows=143 loops=1)
	        Hash Cond: (pg_index.indrelid = pg_class.oid)
	        ->  Seq Scan on pg_index  (cost=0.00..5.55 rows=143 width=4) (actual time=0.019..0.104 rows=143 loops=1)
	              Filter: indisunique
	              Rows Removed by Filter: 12
	        ->  Hash  (cost=16.96..16.96 rows=396 width=4) (actual time=0.429..0.430 rows=402 loops=1)
	              Buckets: 1024  Batches: 1  Memory Usage: 23kB
	              ->  Seq Scan on pg_class  (cost=0.00..16.96 rows=396 width=4) (actual time=0.010..0.221 rows=402 loops=1)
duration: 3.069 ms
  • For the whole instance, as a library loaded during the system boot up, an entry is required:

shared_preload_libraries = ‘auto_explain’ along the reload of the database.

  • For the whole instance, as a library loaded during the srta of a session, an entry is required:

session_preload_libraries = ‘auto_explain’ along the reload of the database.

In that case the module will be launched only for the new session, the pre-existing sessions will not be covered by it.

  • For the user/database—on the level of the user or database specific setting can be changed:

ALTER USER auto_explain SET auto_explain.log_min_duration TO 0;

Obviously, the add-on has to be started by one of the two previously mentioned ways.

Look more

Leave a Reply

Your email address will not be published. Required fields are marked *