Why are data files getting modified?

Started by Abhishek Bholaabout 4 years ago4 messagesgeneral
Jump to latest
#1Abhishek Bhola
abhishek.bhola@japannext.co.jp

I have a table with monthly partitions, starting from 2011.
Each partition has about 2-3 billion rows and is about 40TB of data in
total.
I am running a select query to \copy the data into csv files for each month.

I see there is an *AUTOVACUUM* job started on all the partitions that have
been read (on which select has run). I am assuming the reason why Postgres
is autovacuuming these large partitions is this:

"This implies that if a table is not otherwise vacuumed, autovacuum will be
invoked on it approximately once every autovacuum_freeze_max_age minus
vacuum_freeze_min_age transactions. For tables that are regularly vacuumed
for space reclamation purposes, this is of little importance. However, for
static tables (including tables that receive inserts, but no updates or
deletes), there is no need to vacuum for space reclamation, so it can be
useful to try to maximize the interval between forced autovacuums on very
large static tables. Obviously one can do this either by increasing
autovacuum_freeze_max_age or decreasing vacuum_freeze_min_age."

https://www.postgresql.org/docs/13/routine-vacuuming.html#VACUUM-FOR-WRAPAROUND

Please see these partition tables are static, i.e., they received data many
years ago, with no updates ever.

db=# SELECT c.relname as table_name, c.relkind as type, age(c.relfrozenxid)
as age, c.relfrozenxid FROM pg_class AS c WHERE age(c.relfrozenxid) <>
2147483647 ORDER BY 3 DESC LIMIT 20;
table_name | type | age | relfrozenxid
---------------------------+------+-----------+--------------
hist_omx_orderaudit201506 | r | 201463508 | 319891
hist_omx_orderaudit201509 | r | 201418497 | 364902
hist_omx_orderaudit201205 | r | 201142042 | 641357

autovacuum_freeze_max_age
---------------------------
200000000
(1 row)

Am I wrong in my understanding why this autovacuum is getting triggered?

However, my main question is that why do the data files in the tablespace,
corresponding to these partition tables (that have been read or selected
recently), or the ones autovacuumed recently have a latest timestamp?

There was no insert or update on these partitions, so why do I see an
updated timestamp? Does autovacuum change something on the datafiles?

Thanks

--
_This correspondence (including any attachments) is for the intended
recipient(s) only. It may contain confidential or privileged information or
both. No confidentiality or privilege is waived or lost by any
mis-transmission. If you receive this correspondence by mistake, please
contact the sender immediately, delete this correspondence (and all
attachments) and destroy any hard copies. You must not use, disclose, copy,
distribute or rely on any part of this correspondence (including any
attachments) if you are not the intended
recipient(s).本メッセージに記載および添付されている情報(以下、総称して「本情報」といいます。)は、本来の受信者による使用のみを意図しています。誤送信等により本情報を取得された場合でも、本情報に係る秘密、または法律上の秘匿特権が失われるものではありません。本電子メールを受取られた方が、本来の受信者ではない場合には、本情報及びそのコピーすべてを削除・破棄し、本電子メールが誤って届いた旨を発信者宛てにご通知下さいますようお願いします。本情報の閲覧、発信または本情報に基づくいかなる行為も明確に禁止されていることをご了承ください。_

#2Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Abhishek Bhola (#1)
Re: Why are data files getting modified?

On Wed, 2022-02-09 at 17:29 +0900, Abhishek Bhola wrote:

I have a table with monthly partitions, starting from 2011.
Each partition has about 2-3 billion rows and is about 40TB of data in total.
I am running a select query to \copy the data into csv files for each month.

I see there is an AUTOVACUUM job started on all the partitions that have been
read (on which select has run). I am assuming the reason why Postgres is
autovacuuming these large partitions is this:

db=# SELECT c.relname as table_name, c.relkind as type, age(c.relfrozenxid) as age,
c.relfrozenxid FROM pg_class AS c WHERE age(c.relfrozenxid) <> 2147483647 ORDER BY 3 DESC LIMIT 20;
        table_name         | type |    age    | relfrozenxid
---------------------------+------+-----------+--------------
 hist_omx_orderaudit201506 | r    | 201463508 |       319891
 hist_omx_orderaudit201509 | r    | 201418497 |       364902
 hist_omx_orderaudit201205 | r    | 201142042 |       641357

autovacuum_freeze_max_age
---------------------------
 200000000
(1 row)

Am I wrong in my understanding why this autovacuum is getting triggered?

However, my main question is that why do the data files in the tablespace, corresponding
to these partition tables (that have been read or selected recently), or the ones
autovacuumed recently have a latest timestamp?

There was no insert or update on these partitions, so why do I see an updated timestamp?
Does autovacuum change something on the datafiles?

Yes, autovacuum will freeze old rows in the table, which modifies the data files.

To keep this to a minimum, you can run VACUUM (FREEZE) on a partition as soon
as you know that its data won't get modified any more.
Still, I wouldn't be sure that the file modification timestamp won't change
any more after that. But that should be pretty irrelevant; at any rate, no
substantial data modifications will occur after that.

Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com

#3Abhishek Bhola
abhishek.bhola@japannext.co.jp
In reply to: Laurenz Albe (#2)
Re: Why are data files getting modified?

I see, but then why is the autovacuum getting triggered after the select
command?

On Thu, Feb 10, 2022, 12:14 AM Laurenz Albe <laurenz.albe@cybertec.at>
wrote:

On Wed, 2022-02-09 at 17:29 +0900, Abhishek Bhola wrote:

I have a table with monthly partitions, starting from 2011.
Each partition has about 2-3 billion rows and is about 40TB of data in

total.

I am running a select query to \copy the data into csv files for each

month.

I see there is an AUTOVACUUM job started on all the partitions that have

been

read (on which select has run). I am assuming the reason why Postgres is
autovacuuming these large partitions is this:

db=# SELECT c.relname as table_name, c.relkind as type,

age(c.relfrozenxid) as age,

c.relfrozenxid FROM pg_class AS c WHERE age(c.relfrozenxid) <>

2147483647 ORDER BY 3 DESC LIMIT 20;

table_name | type | age | relfrozenxid
---------------------------+------+-----------+--------------
hist_omx_orderaudit201506 | r | 201463508 | 319891
hist_omx_orderaudit201509 | r | 201418497 | 364902
hist_omx_orderaudit201205 | r | 201142042 | 641357

autovacuum_freeze_max_age
---------------------------
200000000
(1 row)

Am I wrong in my understanding why this autovacuum is getting triggered?

However, my main question is that why do the data files in the

tablespace, corresponding

to these partition tables (that have been read or selected recently), or

the ones

autovacuumed recently have a latest timestamp?

There was no insert or update on these partitions, so why do I see an

updated timestamp?

Does autovacuum change something on the datafiles?

Yes, autovacuum will freeze old rows in the table, which modifies the data
files.

To keep this to a minimum, you can run VACUUM (FREEZE) on a partition as
soon
as you know that its data won't get modified any more.
Still, I wouldn't be sure that the file modification timestamp won't change
any more after that. But that should be pretty irrelevant; at any rate, no
substantial data modifications will occur after that.

Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com

--
_This correspondence (including any attachments) is for the intended
recipient(s) only. It may contain confidential or privileged information or
both. No confidentiality or privilege is waived or lost by any
mis-transmission. If you receive this correspondence by mistake, please
contact the sender immediately, delete this correspondence (and all
attachments) and destroy any hard copies. You must not use, disclose, copy,
distribute or rely on any part of this correspondence (including any
attachments) if you are not the intended
recipient(s).本メッセージに記載および添付されている情報(以下、総称して「本情報」といいます。)は、本来の受信者による使用のみを意図しています。誤送信等により本情報を取得された場合でも、本情報に係る秘密、または法律上の秘匿特権が失われるものではありません。本電子メールを受取られた方が、本来の受信者ではない場合には、本情報及びそのコピーすべてを削除・破棄し、本電子メールが誤って届いた旨を発信者宛てにご通知下さいますようお願いします。本情報の閲覧、発信または本情報に基づくいかなる行為も明確に禁止されていることをご了承ください。_

#4Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Abhishek Bhola (#3)
Re: Why are data files getting modified?

On Thu, 2022-02-10 at 00:29 +0900, Abhishek Bhola wrote:

I see there is an AUTOVACUUM job started on all the partitions that have been
read (on which select has run). I am assuming the reason why Postgres is
autovacuuming these large partitions is this:

db=# SELECT c.relname as table_name, c.relkind as type, age(c.relfrozenxid) as age,
c.relfrozenxid FROM pg_class AS c WHERE age(c.relfrozenxid) <> 2147483647 ORDER BY 3 DESC LIMIT 20;
        table_name         | type |    age    | relfrozenxid
---------------------------+------+-----------+--------------
 hist_omx_orderaudit201506 | r    | 201463508 |       319891
 hist_omx_orderaudit201509 | r    | 201418497 |       364902
 hist_omx_orderaudit201205 | r    | 201142042 |       641357

autovacuum_freeze_max_age
---------------------------
 200000000
(1 row)

I see, but then why is the autovacuum getting triggered after the select command?

Autovacuum is triggered because "relfrozenxid" is nore than 200000000 transactions
in the past.

Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com