PostgreSQL Automatic Vacuuming

Why Vacuum?

In PostgreSQL, when rows are updated or deleted, the old row versions are not removed immediately; they are left behind as dead tuples (rows). The autovacuum daemon processes clean up these dead tuples periodically, based on the vacuum-related parameter settings. Generally, the following four reasons explain why vacuuming is needed:

  • To recover or reuse disk space occupied by updated or deleted rows.
  • To update data statistics used by the PostgreSQL query planner.
  • To update the visibility map, which speeds up index-only scans.
  • To protect against loss of very old data due to transaction ID wraparound or multixact ID wraparound.

Automatic Vacuuming Parameters

PostgreSQL 11 provides the following 14 autovacuum-related configuration parameters. For full details, please refer to Automatic Vacuuming.

postgres=> select name, setting, unit
           from pg_settings
           where name like '%autovacuum%';
                name                 |  setting  | unit
-------------------------------------+-----------+------
 autovacuum                          | on        |
 autovacuum_analyze_scale_factor     | 0.05      |
 autovacuum_analyze_threshold        | 50        |
 autovacuum_freeze_max_age           | 200000000 |
 autovacuum_max_workers              | 3         |
 autovacuum_multixact_freeze_max_age | 400000000 |
 autovacuum_naptime                  | 15        | s
 autovacuum_vacuum_cost_delay        | 20        | ms
 autovacuum_vacuum_cost_limit        | 200       |
 autovacuum_vacuum_scale_factor      | 0.1       |
 autovacuum_vacuum_threshold         | 50        |
 autovacuum_work_mem                 | -1        | kB
 log_autovacuum_min_duration         | 10000     | ms
 rds.force_autovacuum_logging_level  | info      |
(14 rows)

Now let's take a look at the two most important parameters for automatic vacuuming.

                name                 |  setting  | unit
-------------------------------------+-----------+------
 autovacuum_vacuum_scale_factor      | 0.1       |
 autovacuum_vacuum_threshold         | 50        |

Autovacuum vacuums a table once its dead tuples exceed autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor × reltuples. We again use the table TEST (created in the statistics section below) as an example to demonstrate how these two parameters work together.

We intentionally lowered the autovacuum_vacuum_threshold value for this table from the default 50 to 5 to make testing easier.

postgres=> ALTER TABLE test SET (autovacuum_enabled = true, autovacuum_vacuum_threshold = 5);
ALTER TABLE

postgres=> \d+ test
                                   Table "public.test"
 Column |  Type   | Collation | Nullable | Default | Storage | Stats target | Description
--------+---------+-----------+----------+---------+---------+--------------+-------------
 id     | integer |           |          |         | plain   |              |
Options: autovacuum_analyze_threshold=5, autovacuum_enabled=true, autovacuum_vacuum_threshold=5


postgres=> select relname, n_tup_ins, n_tup_upd, n_tup_del, n_live_tup, n_dead_tup,
           last_vacuum, last_autovacuum, last_analyze, last_autoanalyze,
           vacuum_count, autovacuum_count, analyze_count, autoanalyze_count
           from pg_stat_all_tables where relname='test'; \gx
 relname | n_tup_ins | n_tup_upd | n_tup_del | n_live_tup | n_dead_tup | last_vacuum | last_autovacuum |         last_analyze          |       last_autoanalyze        | vacuum_count | autovacuum_count | analyze_count | autoanalyze_count
---------+-----------+-----------+-----------+------------+------------+-------------+-----------------+-------------------------------+-------------------------------+--------------+------------------+---------------+-------------------
 test    |        12 |         0 |         0 |         12 |          0 |             |                 | 2020-10-06 03:31:12.055461+00 | 2020-10-06 10:21:34.177395+00 |            0 |                0 |             1 |                 2
(1 row)


-[ RECORD 1 ]-----+------------------------------
relname           | test
n_tup_ins         | 12
n_tup_upd         | 0
n_tup_del         | 0
n_live_tup        | 12
n_dead_tup        | 0
last_vacuum       |
last_autovacuum   |
last_analyze      | 2020-10-06 03:31:12.055461+00
last_autoanalyze  | 2020-10-06 10:21:34.177395+00
vacuum_count      | 0
autovacuum_count  | 0
analyze_count     | 1
autoanalyze_count | 2


postgres=> delete from test where id>=6;
DELETE 7

postgres=> select relname, n_tup_ins, n_tup_upd, n_tup_del, n_live_tup, n_dead_tup,
           last_vacuum, last_autovacuum, last_analyze, last_autoanalyze,
           vacuum_count, autovacuum_count, analyze_count, autoanalyze_count
           from pg_stat_all_tables where relname='test'; \gx
 relname | n_tup_ins | n_tup_upd | n_tup_del | n_live_tup | n_dead_tup | last_vacuum |        last_autovacuum        |         last_analyze          |       last_autoanalyze        | vacuum_count | autovacuum_count | analyze_count | autoanalyze_count
---------+-----------+-----------+-----------+------------+------------+-------------+-------------------------------+-------------------------------+-------------------------------+--------------+------------------+---------------+-------------------
 test    |        12 |         0 |         7 |          5 |          0 |             | 2020-10-06 10:28:35.355256+00 | 2020-10-06 03:31:12.055461+00 | 2020-10-06 10:28:35.375704+00 |            0 |                1 |             1 |                 3
(1 row)


-[ RECORD 1 ]-----+------------------------------
relname           | test
n_tup_ins         | 12
n_tup_upd         | 0
n_tup_del         | 7
n_live_tup        | 5
n_dead_tup        | 0
last_vacuum       |
last_autovacuum   | 2020-10-06 10:28:35.355256+00
last_analyze      | 2020-10-06 03:31:12.055461+00
last_autoanalyze  | 2020-10-06 10:28:35.375704+00
vacuum_count      | 0
autovacuum_count  | 1
analyze_count     | 1
autoanalyze_count | 3

postgres=>

After deleting 7 of the 12 records in the TEST table, both autoanalyze and autovacuum are triggered almost immediately.

Autoanalyze is triggered because deleted records (7) > autovacuum_analyze_scale_factor (0.05) × table records (12) + autovacuum_analyze_threshold (5) = 5.6.

Autovacuum is triggered because deleted records (7) > autovacuum_vacuum_scale_factor (0.1) × table records (12) + autovacuum_vacuum_threshold (5) = 6.2.
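
The same arithmetic can be checked straight from the catalogs. Below is a minimal sketch that compares a table's dead-tuple count against its vacuum trigger point computed from the global settings; note it ignores per-table storage-parameter overrides such as the one we set on TEST, so treat it as a rough guide only.

postgres=> SELECT s.relname,
                  s.n_dead_tup,
                  current_setting('autovacuum_vacuum_threshold')::integer
                  + current_setting('autovacuum_vacuum_scale_factor')::numeric * c.reltuples
                  AS vacuum_trigger_point
           FROM pg_stat_all_tables s
           JOIN pg_class c ON c.oid = s.relid
           WHERE s.relname = 'test';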

Statistics Missing from Tables in PostgreSQL

One of our customers complained that their PostgreSQL database was very slow just after migrating from an Oracle database.

Our initial investigation revealed an interesting behavioural difference between Oracle and PostgreSQL regarding statistics.

As we know, Oracle has a daily overnight job that looks after user table and index statistics. PostgreSQL collects statistics in a totally different way.

In PostgreSQL, the following default parameters are defined for collecting statistics.

select name, setting from pg_settings
where name like '%vacuu%ana%';
              name               | setting
---------------------------------+---------
 autovacuum_analyze_scale_factor | 0.05
 autovacuum_analyze_threshold    | 50
(2 rows)

autovacuum_analyze_scale_factor : Specifies a fraction of the table size to add to autovacuum_analyze_threshold when deciding whether to trigger an ANALYZE. The default is 0.05 (5% of table size).

autovacuum_analyze_threshold : Specifies the minimum number of inserted, updated or deleted tuples needed to trigger an ANALYZE in any one table. The default is 50 tuples.

For example, with the defaults, a 1,000-row table is re-analyzed once 50 + 0.05 × 1,000 = 100 tuples have been inserted, updated, or deleted. Let’s create a test table and demonstrate how these two parameters work together.

postgres=> create table test (id integer);
CREATE TABLE

postgres=>select relname,n_tup_ins, n_tup_upd,n_tup_del,n_live_tup,
n_dead_tup,last_vacuum,last_autovacuum,last_analyze,last_autoanalyze,
vacuum_count,autovacuum_count,analyze_count,autoanalyze_count 
from pg_stat_all_tables 
where relname='test';
 relname | n_tup_ins | n_tup_upd | n_tup_del | n_live_tup | n_dead_tup | last_vacuum | last_autovacuum | last_analyze | last_autoanalyze | vacuum_count | autovacuum_count | analyze_count | autoanalyze_count
---------+-----------+-----------+-----------+------------+------------+-------------+-----------------+--------------+------------------+--------------+------------------+---------------+-------------------
 test    |         0 |         0 |         0 |          0 |          0 |             |                 |              |                  |            0 |                0 |             0 |                 0
(1 row)


For testing purposes, we set the parameter autovacuum_analyze_threshold for table TEST to a smaller value of 5, instead of the default 50.

postgres=> ALTER TABLE test SET (autovacuum_enabled = true, autovacuum_analyze_threshold=5);
ALTER TABLE
             
 postgres=> \d+ test
                                   Table "public.test"
 Column |  Type   | Collation | Nullable | Default | Storage | Stats target | Description
--------+---------+-----------+----------+---------+---------+--------------+-------------
 id     | integer |           |          |         | plain   |              |
Options: autovacuum_enabled=true, autovacuum_analyze_threshold=5

Insert just four records, fewer than the threshold of 5, which should not trigger autoanalyze.

postgres=> insert into test values (1);
INSERT 0 1
postgres=> insert into test values (2);
INSERT 0 1
postgres=> insert into test values (3);
INSERT 0 1
postgres=> insert into test values (4);
INSERT 0 1

After a while, check the stats: as expected, no statistics have been collected.

postgres=> select relname,n_tup_ins, n_tup_upd,n_tup_del,n_live_tup,
n_dead_tup,last_vacuum,last_autovacuum,last_analyze,last_autoanalyze,
vacuum_count,autovacuum_count,analyze_count,autoanalyze_count 
from pg_stat_all_tables 
where relname='test';
 relname | n_tup_ins | n_tup_upd | n_tup_del | n_live_tup | n_dead_tup | last_vacuum | last_autovacuum | last_analyze | last_autoanalyze | vacuum_count | autovacuum_count | analyze_count | autoanalyze_count
---------+-----------+-----------+-----------+------------+------------+-------------+-----------------+--------------+------------------+--------------+------------------+---------------+-------------------
 test    |         4 |         0 |         0 |          4 |          0 |             |                 |              |                  |            0 |                0 |             0 |                 0
(1 row)

Now insert two more records, the fifth and sixth; six inserted tuples exceed the threshold of 5, which should trigger autoanalyze.

postgres=> insert into test values (5);
INSERT 0 1

postgres=> insert into test values (6);
INSERT 0 1

postgres=> select now();
              now
-------------------------------
 2020-10-06 03:18:27.418373+00
(1 row)
postgres=> select relname,n_tup_ins, n_tup_upd,n_tup_del,n_live_tup,
n_dead_tup,last_vacuum,last_autovacuum,last_analyze,last_autoanalyze,
vacuum_count,autovacuum_count,analyze_count,autoanalyze_count 
from pg_stat_all_tables 
where relname='test'; \gx
 relname | n_tup_ins | n_tup_upd | n_tup_del | n_live_tup | n_dead_tup | last_vacuum | last_autovacuum | last_analyze |       last_autoanalyze        | vacuum_count | autovacuum_count | analyze_count | autoanalyze_count
---------+-----------+-----------+-----------+------------+------------+-------------+-----------------+--------------+-------------------------------+--------------+------------------+---------------+-------------------
 test    |         6 |         0 |         0 |          6 |          0 |             |                 |              | 2020-10-06 03:19:15.765914+00 |            0 |                0 |             0 |                 1
(1 row)


-[ RECORD 1 ]-----+------------------------------
relname           | test
n_tup_ins         | 6
n_tup_upd         | 0
n_tup_del         | 0
n_live_tup        | 6
n_dead_tup        | 0
last_vacuum       |
last_autovacuum   |
last_analyze      |
last_autoanalyze  | 2020-10-06 03:19:15.765914+00
vacuum_count      | 0
autovacuum_count  | 0
analyze_count     | 0
autoanalyze_count | 1


postgres=>

As expected, autoanalyze kicked in shortly after the two extra records were inserted.

Besides autoanalyze, we can also analyze the table manually, as below.

postgres=> analyze test;
ANALYZE
postgres=> select relname,n_tup_ins, n_tup_upd,n_tup_del,n_live_tup,
n_dead_tup,last_vacuum,last_autovacuum,last_analyze,last_autoanalyze,
vacuum_count,autovacuum_count,analyze_count,autoanalyze_count 
from pg_stat_all_tables 
where relname='test'; \gx
 relname | n_tup_ins | n_tup_upd | n_tup_del | n_live_tup | n_dead_tup | last_vacuum | last_autovacuum |         last_analyze          |       last_autoanalyze        | vacuum_count | autovacuum_count | analyze_count | autoanalyze_count
---------+-----------+-----------+-----------+------------+------------+-------------+-----------------+-------------------------------+-------------------------------+--------------+------------------+---------------+-------------------
 test    |         6 |         0 |         0 |          6 |          0 |             |                 | 2020-10-06 03:31:12.055461+00 | 2020-10-06 03:19:15.765914+00 |            0 |                0 |             1 |                 1
(1 row)


-[ RECORD 1 ]-----+------------------------------
relname           | test
n_tup_ins         | 6
n_tup_upd         | 0
n_tup_del         | 0
n_live_tup        | 6
n_dead_tup        | 0
last_vacuum       |
last_autovacuum   |
last_analyze      | 2020-10-06 03:31:12.055461+00
last_autoanalyze  | 2020-10-06 03:19:15.765914+00
vacuum_count      | 0
autovacuum_count  | 0
analyze_count     | 1
autoanalyze_count | 1

Let’s go back to the customer’s problem. There are various reasons statistics can be missing, such as a PostgreSQL RDS instance having been restored from a snapshot.

Since the Oracle database had just been migrated to PostgreSQL, all the statistics started from zero. In the beginning, there were not enough changes to trigger autoanalyze, so many SQL plans remained unoptimised until autoanalyze eventually kicked in or a manual ANALYZE was run.
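
One quick way to spot tables that have never been analyzed, either manually or automatically, is to check pg_stat_user_tables; a minimal sketch (these counters can themselves be reset, so treat the result as indicative):

postgres=> SELECT schemaname, relname
           FROM pg_stat_user_tables
           WHERE last_analyze IS NULL
             AND last_autoanalyze IS NULL;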

In this case, it is good practice to run a manual ANALYZE immediately after the Oracle database is migrated to PostgreSQL.
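
ANALYZE with no table name processes every table in the current database that the user has permission to analyze, and VERBOSE prints per-table progress; a single table can also be named:

postgres=> ANALYZE VERBOSE;   -- whole database
postgres=> ANALYZE test;      -- a single table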

How to Identify Fragmentation in PostgreSQL RDS

Fragmentation is a popular topic for any database, such as Oracle and Microsoft SQL Server. This post demonstrates how to identify fragmentation in PostgreSQL, including AWS PostgreSQL RDS.

Install and Load pgstattuple Module

As a superuser, install the pgstattuple extension as follows, if it is not installed yet.

CREATE EXTENSION pgstattuple;

Check the extension details by querying pg_extension and pg_available_extensions.
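
For example, both of these catalog views are standard and also available on RDS:

postgres=> SELECT extname, extversion
           FROM pg_extension WHERE extname = 'pgstattuple';

postgres=> SELECT name, default_version, installed_version
           FROM pg_available_extensions WHERE name = 'pgstattuple';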

Use pgstattuple Function to List Table Fragmentation Details

The following SQL shows that table cdr_record is heavily fragmented, with 98.71% free space; the live tuple percentage is only 0.09%.

postgres=>   SELECT * FROM pgstattuple('cdr_record');
  table_len  | tuple_count | tuple_len | tuple_percent | dead_tuple_count | dead_tuple_len | dead_tuple_percent | free_space  | free_percent
-------------+-------------+-----------+---------------+------------------+----------------+--------------------+-------------+--------------
 10459709440 |       58988 |   9583432 |          0.09 |             7876 |        1430999 |               0.01 | 10324502496 |        98.71
(1 row)


postgres=> select count(*) from cdr_record;
 count
-------
 58988
(1 row)

pgstattuple output columns:

Column             | Type   | Description
-------------------+--------+---------------------------------------
table_len          | bigint | Physical relation length in bytes
tuple_count        | bigint | Number of live tuples
tuple_len          | bigint | Total length of live tuples in bytes
tuple_percent      | float8 | Percentage of live tuples
dead_tuple_count   | bigint | Number of dead tuples
dead_tuple_len     | bigint | Total length of dead tuples in bytes
dead_tuple_percent | float8 | Percentage of dead tuples
free_space         | bigint | Total free space in bytes
free_percent       | float8 | Percentage of free space
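
To survey a whole schema rather than a single table, pgstattuple can be applied to every ordinary table via a LATERAL join; a minimal sketch for the public schema. Be careful on large databases: this physically scans every table it touches.

postgres=> SELECT c.oid::regclass AS table_name,
                  t.free_percent, t.dead_tuple_percent
           FROM pg_class c,
                LATERAL pgstattuple(c.oid::regclass) t
           WHERE c.relkind = 'r'
             AND c.relnamespace = 'public'::regnamespace
           ORDER BY t.free_percent DESC;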

Use pgstatindex Function to List Index Fragmentation Details

The following SQL shows that the leaf fragmentation of index idx_customers is 67.07%.

elms=> SELECT * FROM pgstatindex('idx_customers');
 version | tree_level | index_size | root_block_no | internal_pages | leaf_pages | empty_pages | deleted_pages | avg_leaf_density | leaf_fragmentation
---------+------------+------------+---------------+----------------+------------+-------------+---------------+------------------+--------------------
       3 |          3 | 1030766592 |         25480 |            835 |     101582 |           0 |         23408 |            77.48 |              67.07
(1 row)

pgstatindex output columns:

Column             | Type    | Description
-------------------+---------+-----------------------------------------
version            | integer | B-tree version number
tree_level         | integer | Tree level of the root page
index_size         | bigint  | Total index size in bytes
root_block_no      | bigint  | Location of root block
internal_pages     | bigint  | Number of "internal" (upper-level) pages
leaf_pages         | bigint  | Number of leaf pages
empty_pages        | bigint  | Number of empty pages
deleted_pages      | bigint  | Number of deleted pages
avg_leaf_density   | float8  | Average density of leaf pages
leaf_fragmentation | float8  | Leaf page fragmentation
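
A similar sweep works for indexes. pgstatindex supports only B-tree indexes (it raises an error on other access methods), so filter on the access method; a minimal sketch for the public schema:

postgres=> SELECT c.oid::regclass AS index_name,
                  s.avg_leaf_density, s.leaf_fragmentation
           FROM pg_class c
           JOIN pg_am a ON a.oid = c.relam AND a.amname = 'btree'
           CROSS JOIN LATERAL pgstatindex(c.oid::regclass) s
           WHERE c.relkind = 'i'
             AND c.relnamespace = 'public'::regnamespace
           ORDER BY s.leaf_fragmentation DESC;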

Both the pgstattuple and pgstatindex functions accumulate their results page by page at a single point in time, so they are very helpful for determining whether a vacuum is required.

pg_stat_all_tables can also be used to judge whether a vacuum is needed, but it relies on table and index statistics, which can be stale for many reasons.
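
For a cheap, statistics-based check that involves no table scan, the dead-tuple ratio from pg_stat_all_tables gives a rough indication of which tables most need a vacuum:

postgres=> SELECT relname, n_live_tup, n_dead_tup,
                  round(100.0 * n_dead_tup / nullif(n_live_tup + n_dead_tup, 0), 2) AS dead_pct
           FROM pg_stat_all_tables
           WHERE n_dead_tup > 0
           ORDER BY dead_pct DESC;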

How to Move SQL Server Database Files to a New Location

Move Database Files By SQL Script

1) Get the database file details.

SELECT name, physical_name AS NewLocation, state_desc AS OnlineStatus
FROM sys.master_files
WHERE database_id = DB_ID(N'TestDatabase')
GO

2) Run the following SQL script to set a new location for the database files.

ALTER DATABASE TestDatabase   
    MODIFY FILE ( NAME = TestDatabase,   
                  FILENAME = 'C:\MSSQLDB\MSSQL15.MSSQLNODE1INST1\UserDB\Data\TestDatabase.mdf');  
GO
 
ALTER DATABASE TestDatabase   
    MODIFY FILE ( NAME = TestDatabase_log,   
                  FILENAME = 'C:\MSSQLDB\MSSQL15.MSSQLNODE1INST1\UserDB\Log\TestDatabase_log.ldf');  
GO

3) Run the following SQL script to take the database offline:

ALTER DATABASE TestDatabase SET OFFLINE; 
 GO

4) Move the .mdf and .ldf files to the new location specified in the statements of step 2.
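
If you prefer to stay in a query window instead of Windows Explorer, the OS-level move can also be done through xp_cmdshell (disabled by default; see the sp_configure steps later in this post). A hedged sketch only: D:\OldLocation below is a placeholder for wherever the files currently live.

-- assumption: the files currently live under D:\OldLocation; adjust both paths
EXEC xp_cmdshell 'MOVE "D:\OldLocation\TestDatabase.mdf" "C:\MSSQLDB\MSSQL15.MSSQLNODE1INST1\UserDB\Data\TestDatabase.mdf"';
EXEC xp_cmdshell 'MOVE "D:\OldLocation\TestDatabase_log.ldf" "C:\MSSQLDB\MSSQL15.MSSQLNODE1INST1\UserDB\Log\TestDatabase_log.ldf"';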

5) Now the database can be brought back online by running the following query.

ALTER DATABASE TestDatabase SET ONLINE;  
GO

6) Verify the results by running the following query.

SELECT name, physical_name AS NewLocation, state_desc AS OnlineStatus
FROM sys.master_files
WHERE database_id = DB_ID(N'TestDatabase')
GO

Move Database Files by Using Detach and Attach

  1. Launch SSMS.
  2. Login to the SQL instance with an account that has the SysAdmin server role.
  3. Navigate to “Databases“.
  4. Right click the database whose files are to be moved to the new location.
  5. Select “Properties“.
  6. In the Properties window, in the Select a Page panel, click “Files“.
  7. Record down the paths and filenames for the database.
  8. Click “OK”.
  9. Right click the same database again.
  10. Select “Tasks“, then click “Detach“.
  11. In the Detach Database window, check the box “Drop Connections“.
  12. Click “OK“.
  13. Launch Windows Explorer.
  14. Browse to the path recorded in step 7.
  15. Move the database’s files to the new location.
  16. Record down the paths where you have just moved the files to.
  17. Return to SSMS again.
  18. Right click “Databases“.
  19. Select “Attach“.
  20. In the Attach Databases window, click “Add“.
  21. Browse to the path you wrote down in step 16.
  22. Select the .mdf filename for the database.
  23. Click “OK”.
  24. If the other files are not found, click the ellipsis next to the filename that is not found.
  25. In the Locate Database Files window, “browse” to the location of the file for the database that you wrote down in step 16.
  26. Select the file.
  27. Click “OK”.
  28. In the Attach Databases window, click “OK”.
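
The same detach/attach sequence can also be scripted in T-SQL instead of clicking through SSMS; a minimal sketch for the TestDatabase example (move the .mdf/.ldf files at the OS level between the two halves, and adjust the paths to your environment):

-- detach: drop existing connections, then detach the database
ALTER DATABASE TestDatabase SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
GO
EXEC sp_detach_db @dbname = N'TestDatabase';
GO

-- after moving the files, re-attach the database from the new location
CREATE DATABASE TestDatabase
    ON (FILENAME = 'C:\MSSQLDB\MSSQL15.MSSQLNODE1INST1\UserDB\Data\TestDatabase.mdf'),
       (FILENAME = 'C:\MSSQLDB\MSSQL15.MSSQLNODE1INST1\UserDB\Log\TestDatabase_log.ldf')
    FOR ATTACH;
GO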

How to Configure SQL Server to Access USB Shared Folder in VirtualBox

In SQL Server Management Studio, you will only see the local drives available to the SQL Server database engine. In this article, we look at how to configure SQL Server to access a shared folder in Oracle VirtualBox for database backup or restore commands; going further, SQL Server data files and log files can even be stored on shared folders.

Install Guest Additions  

The first step is to install Guest Additions, if not already installed, as per the following instructions.

Navigate to Devices -> Optical Drive -> Choose a disk file, then highlight the "VBoxGuestAdditions" file and click Open.


Double-click the CD drive to launch the Oracle VM VirtualBox Guest Additions setup.

[Image: start the Guest Additions installation]

Click Next and select the destination folder. It requires approximately 1 MB of space on the drive.

[Image: destination folder]

The next step shows the components to be installed.

[Image: select components]

Reboot the VM to complete the Guest Additions installation.

[Image: reboot the VM]

Set Up VM Shared Folders

VM machine -> Settings -> Shared Folders -> Add New Shared Folder (+) -> Choose Folder Path.

Click OK.

[Image: VirtualBox shared folder settings]

In File Explorer, we can now see the expected network share.

[Image: VBOXSVR network share in File Explorer]

Enable xp_cmdshell Command in SQL Server

xp_cmdshell is disabled by default, so we need to use the sp_configure command to enable it, as shown below:

EXEC sp_configure 'show advanced options', 1;
GO
RECONFIGURE;
GO

EXEC sp_configure 'xp_cmdshell', 1;
GO
RECONFIGURE;
GO

Define the Share Drive for SQL

Now we map the shared folder to a drive letter with the xp_cmdshell command, as below:

EXEC XP_CMDSHELL 'net use Z: \\VBOXSVR\mssql'

To verify the new drive, we can use the command below, which lists all files on the newly mapped drive:

EXEC XP_CMDSHELL 'Dir Z:' 

To un-map the shared folder drive from SQL Server, run:

EXEC xp_cmdshell 'net use Z: /delete'

Test the Shared Folder Drive in SQL Server

Backup Database onto Shared Folder Drive from SSMS

[Images: backing up the database to the Z: shared folder drive in SSMS]

Backup Database onto Shared Folder Drive by Running SQL Script

BACKUP DATABASE [AdventureWorks15] TO DISK = N'Z:\AdventureWorks15.bak'
WITH NOFORMAT, NOINIT, NAME = N'AdventureWorks15-Full Database Backup',
SKIP, NOREWIND, NOUNLOAD, STATS = 10
GO
10 percent processed.
20 percent processed.
30 percent processed.
40 percent processed.
50 percent processed.
60 percent processed.
70 percent processed.
80 percent processed.
90 percent processed.
Processed 25016 pages for database 'AdventureWorks15', file 'AdventureWorks15' on file 2.
100 percent processed.
Processed 296 pages for database 'AdventureWorks15', file 'AdventureWorks15_log' on file 2.
BACKUP DATABASE successfully processed 25312 pages in 8.847 seconds (22.352 MB/sec).
Completion time: 2020-09-21T18:07:50.3413113+10:00

Create a Database onto Shared Folder Drive

CREATE DATABASE [TestDatabase] ON (NAME = N'TestDatabase', FILENAME = N'Z:\TestDatabase.mdf', SIZE = 512MB, FILEGROWTH = 64MB) LOG ON (NAME = N'TestDatabase_log', FILENAME = N'Z:\TestDatabase_log.ldf', SIZE = 256MB, FILEGROWTH = 32MB) 
GO
Commands completed successfully.
Completion time: 2020-09-21T18:15:32.9791694+10:00

Restore Database onto Shared Folder Drive

Query the logical file names of the database from the previous backup.

RESTORE FILELISTONLY
FROM DISK = 'Z:\AdventureWorks15.bak'

Restore the backup to a new database by using the LogicalName column from the previous step.

RESTORE DATABASE AdventureWorks15New
FROM DISK = 'Z:\AdventureWorks15.bak'
WITH MOVE 'AdventureWorks15' TO 'Z:\AdventureWorks15New.mdf',
MOVE 'AdventureWorks15_log' TO 'Z:\AdventureWorks15New_log.ldf'
Processed 25016 pages for database 'AdventureWorks15New', file 'AdventureWorks15' on file 1.
Processed 294 pages for database 'AdventureWorks15New', file 'AdventureWorks15_log' on file 1.
RESTORE DATABASE successfully processed 25310 pages in 5.330 seconds (37.097 MB/sec).
Completion time: 2020-09-21T18:29:04.4634020+10:00

Everything works as expected. The shared folder can now be used to store backups, as well as the data files and log files of a SQL Server database.