PostgreSQL Automatic Vacuuming

Why Vacuum ?

In PostgreSQL, when rows are updated or deleted from a table, dead rows are created and left behind, vacuum daemon processes will clean those dead tuples ( rows) periodically based on parameter setting for vacuum process. Generally the following four reasons explain why vacuum is needed:

  • To recover or reuse disk space occupied by updated or deleted rows.
  • To update data statistics used by the PostgreSQL query planner.
  • To update the visibility map, which speeds up index-only scans.
  • To protect against loss of very old data due to transaction ID wraparound or multixact ID wraparound.

Automatic Vacuuming Parameters

There are following 14 configuration parameters for PostgreSQL 11. For all the details, please refer to  Automatic Vacuuming.

postgres=> select name, setting,unit 
           from pg_settings 
            where name like '%autovacuum%';
                name                 |  setting  | unit
-------------------------------------+-----------+------
 autovacuum                          | on        |
 autovacuum_analyze_scale_factor     | 0.05      |
 autovacuum_analyze_threshold        | 50        |
 autovacuum_freeze_max_age           | 200000000 |
 autovacuum_max_workers              | 3         |
 autovacuum_multixact_freeze_max_age | 400000000 |
 autovacuum_naptime                  | 15        | s
 autovacuum_vacuum_cost_delay        | 20        | ms
 autovacuum_vacuum_cost_limit        | 200       |
 autovacuum_vacuum_scale_factor      | 0.1       |
 autovacuum_vacuum_threshold         | 50        |
 autovacuum_work_mem                 | -1        | kB
 log_autovacuum_min_duration         | 10000     | ms
 rds.force_autovacuum_logging_level  | info      |
(14 rows)

Now we have a look at the two most important parameters of automatic vacuuming.

                name                 |  setting  | unit
-------------------------------------+-----------+------
 autovacuum_vacuum_scale_factor      | 0.1       |
 autovacuum_vacuum_threshold         | 50        |

We still use table TEST as an example to demonstrate how those two parameters work together.

We intentionally modified autovacuum_vacuum_threshold value from default 50 to 5 for easy testing purpose.

postgres=> ALTER TABLE test SET (autovacuum_enabled = true,autovacuum_vacuum_threshold=5);
ALTER TABLE

postgres=> \d+ test
                                   Table "public.test"
 Column |  Type   | Collation | Nullable | Default | Storage | Stats target | Description
--------+---------+-----------+----------+---------+---------+--------------+-------------
 id     | integer |           |          |         | plain   |              |
Options: autovacuum_analyze_threshold=5, autovacuum_enabled=true, autovacuum_vacuum_threshold=5


postgres=> select relname,n_tup_ins , n_tup_upd,n_tup_del,n_live_tup,n_dead_tup,last_vacuum,last_autovacuum,last_analyze,last_autoanalyze,vacuum_count,autovacuum_count,analyze_count,autoanalyze_count
from pg_stat_all_tables where relname='test'; \gx
 relname | n_tup_ins | n_tup_upd | n_tup_del | n_live_tup | n_dead_tup | last_vacuum | last_autovacuum |         last_analyze          |       last_autoanalyze        | vacuum_count | autovacuum_count
 | analyze_count | autoanalyze_count
---------+-----------+-----------+-----------+------------+------------+-------------+-----------------+-------------------------------+-------------------------------+--------------+-----------------
-+---------------+-------------------
 test    |        12 |         0 |         0 |         12 |          0 |             |                 | 2020-10-06 03:31:12.055461+00 | 2020-10-06 10:21:34.177395+00 |            0 |                0
 |             1 |                 2
(1 row)


-[ RECORD 1 ]-----+------------------------------
relname           | test
n_tup_ins         | 12
n_tup_upd         | 0
n_tup_del         | 0
n_live_tup        | 12
n_dead_tup        | 0
last_vacuum       |
last_autovacuum   |
last_analyze      | 2020-10-06 03:31:12.055461+00
last_autoanalyze  | 2020-10-06 10:21:34.177395+00
vacuum_count      | 0
autovacuum_count  | 0
analyze_count     | 1
autoanalyze_count | 2


postgres=> delete from test where id>=6;
DELETE 7

postgres=> select relname,n_tup_ins , n_tup_upd,n_tup_del,n_live_tup,n_dead_tup,last_vacuum,last_autovacuum,last_analyze,last_autoanalyze,vacuum_count,autovacuum_count,analyze_count,autoanalyze_count
from pg_stat_all_tables where relname='test'; \gx
 relname | n_tup_ins | n_tup_upd | n_tup_del | n_live_tup | n_dead_tup | last_vacuum |        last_autovacuum        |         last_analyze          |       last_autoanalyze        | vacuum_count | au
tovacuum_count | analyze_count | autoanalyze_count
---------+-----------+-----------+-----------+------------+------------+-------------+-------------------------------+-------------------------------+-------------------------------+--------------+---
---------------+---------------+-------------------
 test    |        12 |         0 |         7 |          5 |          0 |             | 2020-10-06 10:28:35.355256+00 | 2020-10-06 03:31:12.055461+00 | 2020-10-06 10:28:35.375704+00 |            0 |
             1 |             1 |                 3
(1 row)


-[ RECORD 1 ]-----+------------------------------
relname           | test
n_tup_ins         | 12
n_tup_upd         | 0
n_tup_del         | 7
n_live_tup        | 5
n_dead_tup        | 0
last_vacuum       |
last_autovacuum   | 2020-10-06 10:28:35.355256+00
last_analyze      | 2020-10-06 03:31:12.055461+00
last_autoanalyze  | 2020-10-06 10:28:35.375704+00
vacuum_count      | 0
autovacuum_count  | 1
analyze_count     | 1
autoanalyze_count | 3

postgres=>

After deleted 7 of 12 records from TEST table, it triggers both auto analyzing and auto vacuuming immediately.

It triggers auto analyzing, because deleted records ( 7 ) > autovacuum_analyze_scale_factor( 0.05 ) *table records (12) + autovacuum_analyze_threshold (5 ).

It triggers auto vacuuming, because deleted records ( 7 ) > autovacuum_vacuum_scale_factor( 0.1 ) *table records(12) + autovacuum_vacuum_threshold (5 ).

Statistics is Missing from Tables in PostgreSQL

One of our customer complained about the PostgreSQL database is very slow just after migrated from Oracle database.

We did initial investigation and found very interesting different behaviours between Oracle and PostgreSQL regarding statistics.

As we know, Oracle has a daily overnight job for looking after user tables and indexes statistics. While for PostgreSQL, the way of collecting statistics is in a totally different way.

In PostgreSQL, the following default parameters are defined for collecting statistics.

 select name,setting  from pg_settings
 where name like '%vacuu%ana%';
              name               | setting
---------------------------------+---------
 autovacuum_analyze_scale_factor | 0.05
 autovacuum_analyze_threshold    | 50
(2 rows)

autovacuum_analyze_scale_factor : Specifies a fraction of the table size to add to autovacuum_analyze_threshold when deciding whether to trigger an ANALYZE. The default is 0.05 (5% of table size).

autovacuum_analyze_threshold : Specifies the minimum number of inserted, updated or deleted tuples needed to trigger an ANALYZE in any one table. The default is 50 tuples.

Let’s create a test table, then try to explain how those two parameters works together.

postgres=>create table test ( id integer);
CREATE TABLE

postgres=>select relname,n_tup_ins, n_tup_upd,n_tup_del,n_live_tup,
n_dead_tup,last_vacuum,last_autovacuum,last_analyze,last_autoanalyze,
vacuum_count,autovacuum_count,analyze_count,autoanalyze_count 
from pg_stat_all_tables 
where relname='test';
 relname | n_tup_ins | n_tup_upd | n_tup_del | n_live_tup | n_dead_tup | last_vacuum | last_autovacuum | last_analyze | last_autoanalyze | vacuum_count | autovacuum_count | analyze_count | autoanalyze
_count
---------+-----------+-----------+-----------+------------+------------+-------------+-----------------+--------------+------------------+--------------+------------------+---------------+------------
-------
 test    |         0 |         0 |         0 |          0 |          0 |             |                 |              |                  |            0 |                0 |             0 |
     0
(1 row)


For test purpose, we set up parameter autovacuum_analyze_threshold  for table TEST to a smaller number 5, instead of default 50.

postgres=> ALTER TABLE test SET (autovacuum_enabled = true, autovacuum_analyze_threshold=5);
ALTER TABLE
             
 postgres=> \d+ test
                                   Table "public.test"
 Column |  Type   | Collation | Nullable | Default | Storage | Stats target | Description
--------+---------+-----------+----------+---------+---------+--------------+-------------
 id     | integer |           |          |         | plain   |              |
Options: autovacuum_enabled=true, autovacuum_analyze_threshold=5

Insert just four records which should not trigger auto analysing.

postgres=> insert into test values (1);
INSERT 0 1
postgres=> insert into test values (2);
INSERT 0 1
postgres=> insert into test values (3);
INSERT 0 1
postgres=> insert into test values (4);
INSERT 0 1

After a while, check the stats without collected as expected.

postgres=> select relname,n_tup_ins, n_tup_upd,n_tup_del,n_live_tup,
n_dead_tup,last_vacuum,last_autovacuum,last_analyze,last_autoanalyze,
vacuum_count,autovacuum_count,analyze_count,autoanalyze_count 
from pg_stat_all_tables 
where relname='test';
 relname | n_tup_ins | n_tup_upd | n_tup_del | n_live_tup | n_dead_tup | last_vacuum | last_autovacuum | last_analyze | last_autoanalyze | vacuum_count | autovacuum_count | analyze_count | autoanalyze
_count
---------+-----------+-----------+-----------+------------+------------+-------------+-----------------+--------------+------------------+--------------+------------------+---------------+------------
-------
 test    |         4 |         0 |         0 |          4 |          0 |             |                 |              |                  |            0 |                0 |             0 |
     0
(1 row)

Now insert extra two records of fifth and sixth records, which should trigger auto analysing.

postgres=> insert into test values (5);
INSERT 0 1

postgres=> insert into test values (6);
INSERT 0 1

postgres=> select now();
now
-----------------------------
2020-10-06 03:18:27.418373+00
(1 row)
postgres=> select relname,n_tup_ins, n_tup_upd,n_tup_del,n_live_tup,
n_dead_tup,last_vacuum,last_autovacuum,last_analyze,last_autoanalyze,
vacuum_count,autovacuum_count,analyze_count,autoanalyze_count 
from pg_stat_all_tables 
where relname='test'; \gx
 relname | n_tup_ins | n_tup_upd | n_tup_del | n_live_tup | n_dead_tup | last_vacuum | last_autovacuum | last_analyze |       last_autoanalyze        | vacuum_count | autovacuum_count | analyze_count
| autoanalyze_count
---------+-----------+-----------+-----------+------------+------------+-------------+-----------------+--------------+-------------------------------+--------------+------------------+---------------
+-------------------
 test    |         6 |         0 |         0 |          6 |          0 |             |                 |              | 2020-10-06 03:19:15.765914+00 |            0 |                0 |             0
|                 1
(1 row)


-[ RECORD 1 ]-----+------------------------------
relname           | test
n_tup_ins         | 6
n_tup_upd         | 0
n_tup_del         | 0
n_live_tup        | 6
n_dead_tup        | 0
last_vacuum       |
last_autovacuum   |
last_analyze      |
last_autoanalyze  | 2020-10-06 03:19:15.765914+00
vacuum_count      | 0
autovacuum_count  | 0
analyze_count     | 0
autoanalyze_count | 1


postgres=>

As expected, the autoanalyze kicked off immediately just after two records inserted.

Beside the autoanalyze, we can analyze the table manually as below.

postgres=> analyze test;
ANALYZE
postgres=> select relname,n_tup_ins, n_tup_upd,n_tup_del,n_live_tup,
n_dead_tup,last_vacuum,last_autovacuum,last_analyze,last_autoanalyze,
vacuum_count,autovacuum_count,analyze_count,autoanalyze_count 
from pg_stat_all_tables 
where relname='test'; \gx
 relname | n_tup_ins | n_tup_upd | n_tup_del | n_live_tup | n_dead_tup | last_vacuum | last_autovacuum |         last_analyze          |       last_autoanalyze        | vacuum_count | autovacuum_count
 | analyze_count | autoanalyze_count
---------+-----------+-----------+-----------+------------+------------+-------------+-----------------+-------------------------------+-------------------------------+--------------+-----------------
-+---------------+-------------------
 test    |         6 |         0 |         0 |          6 |          0 |             |                 | 2020-10-06 03:31:12.055461+00 | 2020-10-06 03:19:15.765914+00 |            0 |                0
 |             1 |                 1
(1 row)


-[ RECORD 1 ]-----+------------------------------
relname           | test
n_tup_ins         | 6
n_tup_upd         | 0
n_tup_del         | 0
n_live_tup        | 6
n_dead_tup        | 0
last_vacuum       |
last_autovacuum   |
last_analyze      | 2020-10-06 03:31:12.055461+00
last_autoanalyze  | 2020-10-06 03:19:15.765914+00
vacuum_count      | 0
autovacuum_count  | 0
analyze_count     | 1
autoanalyze_count | 1

Let’s go back to the customer’s problem. There are various reasons that can cause statistics missing , like PostgreSQL RDS restored from snapshot, etc.

Since Oracle database was migrated to PostgreSQL, all the statistics were reset to zero. In the beginning, there were not enough transactions to trigger auto-analysing, so many SQL plans were not optimised until auto-analysing kicks off or starts a manual analysing.

In this case, it is a good practice to do a manual analysing immediately after Oracle database migrated to PostgreSQL .