
PostgreSQL Checksum And Data Corruption Issues


Media failure is one of the crucial things a database administrator should be aware of. Media failure is simply a physical problem reading from or writing to the files on the storage medium.

A typical example of media failure is a disk head crash, which causes the loss of all files on a disk drive. All files associated with a database are vulnerable to a disk crash, including data files, WAL files, and control files.

But what is a PostgreSQL checksum?

When the checksum is enabled, a small integer checksum is written to each “page” of data that Postgres stores on your hard drive. Upon reading that block, the checksum value is recomputed and compared to the stored one.

This detects data corruption, which (without checksums) could be silently lurking in your database for a long time.

How does PostgreSQL checksum work?

PostgreSQL maintains page validity primarily on the way into and out of its buffer cache.

A page passes through the OS page cache on its way between disk and shared buffers, so validation happens just before a page enters shared buffers, and the checksum is stamped just before it is written back out. When PostgreSQL copies a page into its buffer cache, it recomputes the checksum; if the 8 kB page is invalid, it refuses to load the page into shared buffers and errors out any query that needs that page, with a message like:

ERROR: invalid page in block 0 of relation base/13455/16395

If the copy of a block on disk is corrupt but a valid version of the page still sits in the buffer cache, the next checkpoint that writes that page out will overwrite the bad block with a freshly checksummed copy; however, you can rarely count on this happening in a real environment.

If the invalid byte is already part of a page in the PostgreSQL buffer cache, PostgreSQL will quite happily assume nothing is wrong and attempt to process the data on the page. The results are unpredictable: sometimes you will get an error, and sometimes you may end up with wrong data.
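The verify-on-read / stamp-on-write cycle described above can be sketched in a few lines. This is a toy illustration only: it uses CRC-32 over a 4-byte field, whereas PostgreSQL's real data checksum is a 2-byte, FNV-1a-derived value stored in the page header with the block number mixed in. The function names are my own.

```python
import zlib

PAGE_SIZE = 8192

def write_page(page: bytearray) -> bytearray:
    # Stamp a checksum over the page body into the first 4 bytes,
    # the way PostgreSQL stamps pd_checksum on the way out of shared buffers.
    page[0:4] = zlib.crc32(bytes(page[4:])).to_bytes(4, "little")
    return page

def read_page(page: bytearray) -> bytearray:
    # Recompute on the way in and compare with the stored value;
    # refuse the page on a mismatch, as PostgreSQL does.
    stored = int.from_bytes(page[0:4], "little")
    if zlib.crc32(bytes(page[4:])) != stored:
        raise ValueError("invalid page: checksum mismatch")
    return page

page = bytearray(PAGE_SIZE)
page[100:105] = b"hello"
write_page(page)
read_page(page)          # verifies cleanly
page[200] ^= 0xFF        # flip bits to simulate silent media corruption
# read_page(page)        # would now raise ValueError
```

Without the stored checksum there is nothing to compare against, which is why corruption can lurk silently when checksums are disabled.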

How Does PostgreSQL Check Page Validity?

Every page begins with a header. After the 8-byte LSN comes the 2-byte pd_checksum field (used only when data checksums are enabled), followed by a 2-byte field containing flag bits.

This is followed by three 2-byte integer fields (pd_lower, pd_upper, and pd_special). These contain byte offsets from the page start to the start of unallocated space, to the end of unallocated space, and to the start of the special space.

When checksums are disabled, the stored checksum value is simply zero. With checksums enabled, every time the block is read the value is recomputed and compared to the stored one, which is how data corruption is detected.

Checksums are not maintained for blocks while they sit in shared buffers, so if you inspect a buffered page with pageinspect, the checksum value you see may not be the actual one. The checksum is calculated and stamped onto the page only when the page is written out of the buffer cache into the operating system page cache.
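To make the header fields concrete, here is a minimal sketch that unpacks the 24-byte page header from a raw block, yielding the same fields that pageinspect's page_header() reports. The layout follows PostgreSQL's PageHeaderData struct (src/include/storage/bufpage.h); parse_page_header is a name of my own, and little-endian byte order is assumed.

```python
import struct

# PageHeaderData is 24 bytes: pd_lsn as two uint32 halves, then pd_checksum,
# pd_flags, pd_lower, pd_upper, pd_special, pd_pagesize_version (2 bytes each),
# and finally pd_prune_xid (4 bytes).
HEADER = struct.Struct("<IIHHHHHHI")

def parse_page_header(page: bytes) -> dict:
    (xlogid, xrecoff, checksum, flags, lower, upper,
     special, psv, prune_xid) = HEADER.unpack_from(page, 0)
    return {
        "lsn": f"{xlogid:X}/{xrecoff:X}",
        "checksum": checksum,
        "flags": flags,
        "lower": lower,
        "upper": upper,
        "special": special,
        "pagesize": psv & 0xFF00,   # page size lives in the high bits
        "version": psv & 0x00FF,    # layout version in the low byte
        "prune_xid": prune_xid,
    }

# A header packed with the same values page_header() reports below.
sample = HEADER.pack(0x0, 0x17EFCA0, 0, 0, 44, 7552, 8192, 8192 | 4, 0)
print(parse_page_header(sample))
```

Note that pd_lower = 44 corresponds to the 24-byte header plus five 4-byte line pointers, matching a page holding five rows.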

Practical Example

I have a table named check_corruption on which I am going to do all the experiments.

  • The table size is 8 kB.
  • It has 5 records.
  • The server version is PostgreSQL v12.

postgres=# select * from check_corruption;
aid | bid | abalance | filler
-----+-----+----------+--------------------------------------------------------------------------------------
1 | 1 | 0 | This is checksum example, checksum is for computing block corruption
2 | 1 | 0 | This is checksum example, checksum is for computing block corruption
3 | 1 | 0 | This is checksum example, checksum is for computing block corruption
4 | 1 | 0 | This is checksum example, checksum is for computing block corruption
5 | 1 | 0 | This is checksum example, checksum is for computing block corruption
(5 rows)

postgres=# SELECT * FROM page_header(get_raw_page('check_corruption',0));
lsn | checksum | flags | lower | upper | special | pagesize | version | prune_xid
-----------+----------+-------+-------+-------+---------+----------+---------+-----------
0/17EFCA0 | 0 | 0 | 44 | 7552 | 8192 | 8192 | 4 | 0
(1 row)

postgres=# \dt+ check_corruption
List of relations
Schema | Name | Type | Owner | Size | Description
--------+------------------+-------+----------+------------+-------------
public | check_corruption | table | postgres | 8192 bytes |
(1 row)

postgres=# select pg_relation_filepath('check_corruption');
pg_relation_filepath
----------------------
base/13455/16490
(1 row)

Check Whether Checksum Is Enabled

$ pg_controldata -D /u01/pgsql/data | grep checksum
Data page checksum version: 0

It is disabled.

Enable Checksum


$ pg_checksums -D /u01/pgsql/data --enable --progress --verbose
pg_checksums: checksums enabled in file "/u01/pgsql/data/global/2847"
pg_checksums: checksums enabled in file "/u01/pgsql/data/global/1260_fsm"
pg_checksums: checksums enabled in file "/u01/pgsql/data/global/4175"
..
..
23/23 MB (100%) computed
Checksum operation completed
Files scanned: 969
Blocks scanned: 3006
pg_checksums: syncing data directory

pg_checksums: updating control file
Data checksum version: 1
Checksums enabled in cluster

 

Check PostgreSQL page errors

$ pg_checksums -D /u01/pgsql/data --check
Checksum operation completed
Files scanned: 969
Blocks scanned: 3006
Bad checksums: 0
Data checksum version: 1

Corrupting the file  

As the data file of check_corruption is 16490 (under base/13455), I am going to corrupt it with the operating system's dd command, run from inside the base/13455 directory. Make sure not to do this kind of activity in your production environment; I am doing it here only to show how corruption behaves.

$ dd bs=8192 count=1 seek=1 of=16490 if=16490

Note: I am corrupting file 16490, which is the data file of our experimental table.
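For clarity, here is roughly what that dd invocation does, sketched in Python against a throwaway file. clobber_block is a hypothetical helper of my own; never point anything like it at a real data directory.

```python
import os
import tempfile

BLOCK = 8192

def clobber_block(path: str, src: int = 0, dst: int = 1) -> None:
    # Copy one 8 kB block over another within the same file, roughly what
    # `dd bs=8192 count=1 seek=1 of=16490 if=16490` does to the data file.
    with open(path, "r+b") as f:
        f.seek(src * BLOCK)
        block = f.read(BLOCK)
        f.seek(dst * BLOCK)
        f.write(block)

# Demonstrate on a throwaway file, never on a live cluster.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"\x01" * BLOCK)   # pretend this is block 0 of the table
clobber_block(path)            # a stale copy of block 0 lands at block 1
print(os.path.getsize(path))   # file grew from 8192 to 16384 bytes
os.remove(path)
```

Because the table was only one block long, this writes a stale copy of block 0 at block 1, so the stored checksum there no longer matches what PostgreSQL computes for that block position.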

Let’s restart the node and do a select * from the table.

$ /usr/local/pgsql/bin/pg_ctl restart -D /u01/pgsql/data

postgres=# select * from check_corruption;
aid | bid | abalance | filler
-----+-----+----------+--------------------------------------------------------------------------------------
1 | 1 | 0 | This is checksum example, checksum is for computing block corruption
2 | 1 | 0 | This is checksum example, checksum is for computing block corruption
3 | 1 | 0 | This is checksum example, checksum is for computing block corruption
4 | 1 | 0 | This is checksum example, checksum is for computing block corruption
5 | 1 | 0 | This is checksum example, checksum is for computing block corruption
(5 rows)

I still got the result, but why? I had already corrupted the physical file. Here is the answer.

As we discussed earlier, the restart triggered a checkpoint, and PostgreSQL wrote the valid copies of the pages from shared buffers back to disk, overwriting the corrupted block with a freshly checksummed one.

How can we trigger a checksum warning?

We need to get that row out of shared buffers. The quickest way to do so in this test scenario is to restart the database, then make sure we do not even look at (e.g. SELECT) the table before we make our on-disk modification. Once that is done, the checksum will fail and we will, as expected, receive a checksum error:

That is: stop the server, corrupt the file on disk, and then start the server again.

$ /usr/local/pgsql/bin/pg_ctl stop -D /u01/pgsql/data
$ dd bs=8192 count=1 seek=1 of=16490 if=16490
$ /usr/local/pgsql/bin/pg_ctl start -D /u01/pgsql/data

Now do a select * from the table. This time we do get the corruption error.

postgres=# select * from check_corruption;
2020-02-06 19:06:17.433 IST [25218] WARNING: page verification failed, calculated checksum 39428 but expected 39427
WARNING: page verification failed, calculated checksum 39428 but expected 39427
2020-02-06 19:06:17.434 IST [25218] ERROR: invalid page in block 1 of relation base/13455/16490
2020-02-06 19:06:17.434 IST [25218] STATEMENT: select * from check_corruption;
ERROR: invalid page in block 1 of relation base/13455/16490

How to resolve the PostgreSQL corrupted page issue?

You can create the hstore EXTENSION and use the following find_bad_row() function to get the ctid of the corrupted location.

postgres=# CREATE EXTENSION hstore;
CREATE EXTENSION
postgres=#
CREATE OR REPLACE FUNCTION find_bad_row(tableName TEXT)
RETURNS tid
AS $find_bad_row$
DECLARE
    result tid;
    curs REFCURSOR;
    row1 RECORD;
    row2 RECORD;
    tabName TEXT;
    count BIGINT := 0;
BEGIN
    SELECT reverse(split_part(reverse($1), '.', 1)) INTO tabName;
    OPEN curs FOR EXECUTE 'SELECT ctid FROM ' || tableName;
    count := 1;
    FETCH curs INTO row1;
    WHILE row1.ctid IS NOT NULL LOOP
        result = row1.ctid;
        count := count + 1;
        FETCH curs INTO row1;
        EXECUTE 'SELECT (each(hstore(' || tabName || '))).* FROM '
                || tableName || ' WHERE ctid = $1' INTO row2
            USING row1.ctid;
        IF count % 100000 = 0 THEN
            RAISE NOTICE 'rows processed: %', count;
        END IF;
    END LOOP;
    CLOSE curs;
    RETURN row1.ctid;
EXCEPTION
    WHEN OTHERS THEN
        RAISE NOTICE 'LAST CTID: %', result;
        RAISE NOTICE '%: %', SQLSTATE, SQLERRM;
        RETURN result;
END
$find_bad_row$
LANGUAGE plpgsql;

Now execute the function to get the ctid.

postgres=# select find_bad_row('check_corruption');
2020-02-06 19:44:24.227 IST [25929] WARNING: page verification failed, calculated checksum 39428 but expected 39427
2020-02-06 19:44:24.227 IST [25929] CONTEXT: PL/pgSQL function find_bad_row(text) line 21 at FETCH
WARNING: page verification failed, calculated checksum 39428 but expected 39427
NOTICE: LAST CTID: (0,5)
NOTICE: XX001: invalid page in block 1 of relation base/13455/16490
find_bad_row
--------------
(0,5)
(1 row)

Delete that particular ctid to resolve the issue.

postgres=# delete from check_corruption where ctid='(0,6)';
DELETE 1
postgres=#

If deleting by ctid has not worked for you, there is an alternative solution: the zero_damaged_pages parameter.

Suppose I have a master table for which the ctid solution is not working.

postgres=# select * from master;
WARNING: page verification failed, calculated checksum 8770 but expected 8769
ERROR: invalid page in block 1 of relation base/13455/16770
postgres=#

Solution:

postgres=# SET zero_damaged_pages = on;
SET
postgres=# vacuum full master;

Now I can access the table. The damaged page was zeroed out and the rows from the remaining pages are returned.

postgres=# select * from master;
WARNING: page verification failed, calculated checksum 8770 but expected 8769
WARNING: invalid page in block 1 of relation base/13455/16770; zeroing out page
id | name | city
----+---------+-----------
1 | abc | hyderabad
2 | xyz | chennai
3 | mnp | newyork

zero_damaged_pages (boolean): Detection of a damaged page header normally causes PostgreSQL to report an error, aborting the current transaction. Setting zero_damaged_pages to on causes the system to instead report a warning, zero out the damaged page in memory, and continue processing. This behavior will destroy data, namely all the rows on the damaged page. However, it does allow you to get past the error and retrieve rows from any undamaged pages that might be present in the table. It is useful for recovering data if corruption has occurred due to a hardware or software error. You should generally not set this on until you have given up hope of recovering data from the damaged pages of a table. Zeroed-out pages are not forced to disk so it is recommended to recreate the table or the index before turning this parameter off again. The default setting is off, and it can only be changed by a superuser.

Important Facts to know before Enabling CheckSum

There are a couple of things to be aware of when using this feature, though. First, checksums have a performance cost, since they introduce extra calculations for each data page (8 kB by default), so weigh the tradeoff between data safety and performance.

There are many factors that influence how much slower things are when checksums are enabled, including:

  • How likely things are to be read from shared_buffers, which depends on how large shared_buffers is set and how much of your active database fits inside it.
  • How fast your server is in general, and how well it (and your compiler) can optimize the checksum calculation.
  • How many data pages you have (which can be influenced by your data types).
  • How often you are writing new pages (via COPY, INSERT, or UPDATE).
  • How often you are reading values (via SELECT).

The more that shared buffers are used (and using them efficiently is a good general goal), the less checksumming is done and the smaller its impact on database performance. On average, enabling checksums costs a bit more than 2%; for inserts the average difference was 6%, and for selects that jumps to 19%.