One of our customer complained about the PostgreSQL database is very slow just after migrated from Oracle database.
We did initial investigation and found very interesting different behaviours between Oracle and PostgreSQL regarding statistics.
As we know, Oracle has a daily overnight job for looking after user tables and indexes statistics. While for PostgreSQL, the way of collecting statistics is in a totally different way.
In PostgreSQL, the following default parameters are defined for collecting statistics.
select name,setting from pg_settings
where name like '%vacuu%ana%';
name | setting
---------------------------------+---------
autovacuum_analyze_scale_factor | 0.05
autovacuum_analyze_threshold | 50
(2 rows)
autovacuum_analyze_scale_factor : Specifies a fraction of the table size to add to autovacuum_analyze_threshold when deciding whether to trigger an ANALYZE. The default is 0.05 (5% of table size).
autovacuum_analyze_threshold : Specifies the minimum number of inserted, updated or deleted tuples needed to trigger an ANALYZE in any one table. The default is 50 tuples.
Let’s create a test table, then try to explain how those two parameters works together.
postgres=>create table test ( id integer);
CREATE TABLE
postgres=>select relname,n_tup_ins, n_tup_upd,n_tup_del,n_live_tup,
n_dead_tup,last_vacuum,last_autovacuum,last_analyze,last_autoanalyze,
vacuum_count,autovacuum_count,analyze_count,autoanalyze_count
from pg_stat_all_tables
where relname='test';
relname | n_tup_ins | n_tup_upd | n_tup_del | n_live_tup | n_dead_tup | last_vacuum | last_autovacuum | last_analyze | last_autoanalyze | vacuum_count | autovacuum_count | analyze_count | autoanalyze
_count
---------+-----------+-----------+-----------+------------+------------+-------------+-----------------+--------------+------------------+--------------+------------------+---------------+------------
-------
test | 0 | 0 | 0 | 0 | 0 | | | | | 0 | 0 | 0 |
0
(1 row)
For test purpose, we set up parameter autovacuum_analyze_threshold for table TEST to a smaller number 5, instead of default 50.
postgres=> ALTER TABLE test SET (autovacuum_enabled = true, autovacuum_analyze_threshold=5);
ALTER TABLE
postgres=> \d+ test
Table "public.test"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
--------+---------+-----------+----------+---------+---------+--------------+-------------
id | integer | | | | plain | |
Options: autovacuum_enabled=true, autovacuum_analyze_threshold=5
Insert just four records which should not trigger auto analysing.
postgres=> insert into test values (1);
INSERT 0 1
postgres=> insert into test values (2);
INSERT 0 1
postgres=> insert into test values (3);
INSERT 0 1
postgres=> insert into test values (4);
INSERT 0 1
After a while, check the stats without collected as expected.
postgres=> select relname,n_tup_ins, n_tup_upd,n_tup_del,n_live_tup,
n_dead_tup,last_vacuum,last_autovacuum,last_analyze,last_autoanalyze,
vacuum_count,autovacuum_count,analyze_count,autoanalyze_count
from pg_stat_all_tables
where relname='test';
relname | n_tup_ins | n_tup_upd | n_tup_del | n_live_tup | n_dead_tup | last_vacuum | last_autovacuum | last_analyze | last_autoanalyze | vacuum_count | autovacuum_count | analyze_count | autoanalyze
_count
---------+-----------+-----------+-----------+------------+------------+-------------+-----------------+--------------+------------------+--------------+------------------+---------------+------------
-------
test | 4 | 0 | 0 | 4 | 0 | | | | | 0 | 0 | 0 |
0
(1 row)
Now insert extra two records of fifth and sixth records, which should trigger auto analysing.
postgres=> insert into test values (5); INSERT 0 1 postgres=> insert into test values (6); INSERT 0 1 postgres=> select now(); now ----------------------------- 2020-10-06 03:18:27.418373+00 (1 row)
postgres=> select relname,n_tup_ins, n_tup_upd,n_tup_del,n_live_tup,
n_dead_tup,last_vacuum,last_autovacuum,last_analyze,last_autoanalyze,
vacuum_count,autovacuum_count,analyze_count,autoanalyze_count
from pg_stat_all_tables
where relname='test'; \gx
relname | n_tup_ins | n_tup_upd | n_tup_del | n_live_tup | n_dead_tup | last_vacuum | last_autovacuum | last_analyze | last_autoanalyze | vacuum_count | autovacuum_count | analyze_count
| autoanalyze_count
---------+-----------+-----------+-----------+------------+------------+-------------+-----------------+--------------+-------------------------------+--------------+------------------+---------------
+-------------------
test | 6 | 0 | 0 | 6 | 0 | | | | 2020-10-06 03:19:15.765914+00 | 0 | 0 | 0
| 1
(1 row)
-[ RECORD 1 ]-----+------------------------------
relname | test
n_tup_ins | 6
n_tup_upd | 0
n_tup_del | 0
n_live_tup | 6
n_dead_tup | 0
last_vacuum |
last_autovacuum |
last_analyze |
last_autoanalyze | 2020-10-06 03:19:15.765914+00
vacuum_count | 0
autovacuum_count | 0
analyze_count | 0
autoanalyze_count | 1
postgres=>
As expected, the autoanalyze kicked off immediately just after two records inserted.
Beside the autoanalyze, we can analyze the table manually as below.
postgres=> analyze test;
ANALYZE
postgres=> select relname,n_tup_ins, n_tup_upd,n_tup_del,n_live_tup,
n_dead_tup,last_vacuum,last_autovacuum,last_analyze,last_autoanalyze,
vacuum_count,autovacuum_count,analyze_count,autoanalyze_count
from pg_stat_all_tables
where relname='test'; \gx
relname | n_tup_ins | n_tup_upd | n_tup_del | n_live_tup | n_dead_tup | last_vacuum | last_autovacuum | last_analyze | last_autoanalyze | vacuum_count | autovacuum_count
| analyze_count | autoanalyze_count
---------+-----------+-----------+-----------+------------+------------+-------------+-----------------+-------------------------------+-------------------------------+--------------+-----------------
-+---------------+-------------------
test | 6 | 0 | 0 | 6 | 0 | | | 2020-10-06 03:31:12.055461+00 | 2020-10-06 03:19:15.765914+00 | 0 | 0
| 1 | 1
(1 row)
-[ RECORD 1 ]-----+------------------------------
relname | test
n_tup_ins | 6
n_tup_upd | 0
n_tup_del | 0
n_live_tup | 6
n_dead_tup | 0
last_vacuum |
last_autovacuum |
last_analyze | 2020-10-06 03:31:12.055461+00
last_autoanalyze | 2020-10-06 03:19:15.765914+00
vacuum_count | 0
autovacuum_count | 0
analyze_count | 1
autoanalyze_count | 1
Let’s go back to the customer’s problem. There are various reasons that can cause statistics missing , like PostgreSQL RDS restored from snapshot, etc.
Since Oracle database was migrated to PostgreSQL, all the statistics were reset to zero. In the beginning, there were not enough transactions to trigger auto-analysing, so many SQL plans were not optimised until auto-analysing kicks off or starts a manual analysing.
In this case, it is a good practice to do a manual analysing immediately after Oracle database migrated to PostgreSQL .