pg_multixact issues

Started by Kiriakos Georgiou · about 10 years ago · 5 messages · general
#1 Kiriakos Georgiou
kg.postgresql@olympiakos.com

Hello,

Our pg_multixact directory keeps growing. I did a “vacuum freeze” which didn’t help. I also did a “vacuum full” which didn’t help either.
We had this condition with 9.3.4 as well. When I upgraded our cluster to 9.4.5 (via plain sql dump and load) as expected the issue was resolved but now it’s happening again. Luckily it has no ill effect other than consuming 4G of space for an otherwise 1G database.

Can you offer any hints as to how I can cure this?

thanks,
Kiriakos Georgiou

pg_controldata output:

pg_control version number: 942
Catalog version number: 201409291
Database system identifier: 6211781659140720513
Database cluster state: in production
pg_control last modified: Wed Feb 10 13:45:02 2016
Latest checkpoint location: D/FB5FE630
Prior checkpoint location: D/FB5FE558
Latest checkpoint's REDO location: D/FB5FE5F8
Latest checkpoint's REDO WAL file: 000000010000000D000000FB
Latest checkpoint's TimeLineID: 1
Latest checkpoint's PrevTimeLineID: 1
Latest checkpoint's full_page_writes: on
Latest checkpoint's NextXID: 0/3556219
Latest checkpoint's NextOID: 2227252
Latest checkpoint's NextMultiXactId: 2316566
Latest checkpoint's NextMultiOffset: 823062151
Latest checkpoint's oldestXID: 668
Latest checkpoint's oldestXID's DB: 1
Latest checkpoint's oldestActiveXID: 3556219
Latest checkpoint's oldestMultiXid: 1
Latest checkpoint's oldestMulti's DB: 1
Time of latest checkpoint: Wed Feb 10 13:45:02 2016
Fake LSN counter for unlogged rels: 0/1
Minimum recovery ending location: 0/0
Min recovery ending loc's timeline: 0
Backup start location: 0/0
Backup end location: 0/0
End-of-backup record required: no
Current wal_level setting: hot_standby
Current wal_log_hints setting: off
Current max_connections setting: 100
Current max_worker_processes setting: 8
Current max_prepared_xacts setting: 0
Current max_locks_per_xact setting: 1024
Maximum data alignment: 8
Database block size: 8192
Blocks per segment of large relation: 131072
WAL block size: 8192
Bytes per WAL segment: 16777216
Maximum length of identifiers: 64
Maximum columns in an index: 32
Maximum size of a TOAST chunk: 1996
Size of a large-object chunk: 2048
Date/time type storage: 64-bit integers
Float4 argument passing: by value
Float8 argument passing: by value
Data page checksum version: 0

the offsets directory:

-rw------- 1 postgres dba 262144 Nov 3 15:22 0000
-rw------- 1 postgres dba 262144 Nov 5 12:45 0001
-rw------- 1 postgres dba 262144 Nov 9 14:25 0002
-rw------- 1 postgres dba 262144 Nov 13 10:10 0003
-rw------- 1 postgres dba 262144 Nov 16 15:40 0004
-rw------- 1 postgres dba 262144 Nov 20 09:55 0005
-rw------- 1 postgres dba 262144 Dec 1 08:00 0006
-rw------- 1 postgres dba 262144 Dec 9 11:50 0007
-rw------- 1 postgres dba 262144 Dec 16 08:14 0008
-rw------- 1 postgres dba 262144 Dec 21 09:40 0009
-rw------- 1 postgres dba 262144 Dec 31 09:55 000A
-rw------- 1 postgres dba 262144 Jan 4 21:17 000B
-rw------- 1 postgres dba 262144 Jan 6 10:50 000C
-rw------- 1 postgres dba 262144 Jan 7 18:20 000D
-rw------- 1 postgres dba 262144 Jan 13 13:55 000E
-rw------- 1 postgres dba 262144 Jan 15 11:55 000F
-rw------- 1 postgres dba 262144 Jan 22 07:50 0010
-rw------- 1 postgres dba 262144 Jan 26 16:35 0011
-rw------- 1 postgres dba 262144 Jan 29 10:16 0012
-rw------- 1 postgres dba 262144 Feb 3 13:17 0013
-rw------- 1 postgres dba 262144 Feb 3 16:13 0014
-rw------- 1 postgres dba 262144 Feb 4 08:24 0015
-rw------- 1 postgres dba 262144 Feb 5 13:20 0016
-rw------- 1 postgres dba 262144 Feb 8 11:26 0017
-rw------- 1 postgres dba 262144 Feb 8 11:46 0018
-rw------- 1 postgres dba 262144 Feb 8 12:25 0019
-rw------- 1 postgres dba 262144 Feb 8 13:19 001A
-rw------- 1 postgres dba 262144 Feb 8 14:23 001B
-rw------- 1 postgres dba 262144 Feb 8 15:32 001C
-rw------- 1 postgres dba 262144 Feb 8 17:01 001D
-rw------- 1 postgres dba 262144 Feb 8 19:19 001E
-rw------- 1 postgres dba 262144 Feb 8 22:11 001F
-rw------- 1 postgres dba 262144 Feb 9 01:44 0020
-rw------- 1 postgres dba 262144 Feb 9 05:57 0021
-rw------- 1 postgres dba 262144 Feb 9 10:45 0022
-rw------- 1 postgres dba 98304 Feb 10 13:35 0023

the members directory has 15723 files:
ls -l|wc -l
15723

#2 Thomas Munro
thomas.munro@gmail.com
In reply to: Kiriakos Georgiou (#1)
Re: pg_multixact issues

On Thu, Feb 11, 2016 at 7:52 AM, Kiriakos Georgiou
<kg.postgresql@olympiakos.com> wrote:

Hello,

Our pg_multixact directory keeps growing. I did a “vacuum freeze” which
didn’t help. I also did a “vacuum full” which didn’t help either.
We had this condition with 9.3.4 as well. When I upgraded our cluster to
9.4.5 (via plain sql dump and load) as expected the issue was resolved but
now it’s happening again. Luckily it has no ill effect other than consuming
4G of space for an otherwise 1G database.

Can you offer any hints as to how I can cure this?

thanks,
Kiriakos Georgiou

[pg_controldata output and pg_multixact directory listings quoted in full; see #1]

Maybe you didn't vacuum freeze your template0 database and it's
probably holding back the oldest multixact ID -- take a look at the
datminmxid column in pg_database.

You seem to be generating fairly large multixacts, and they use disk
space until the cluster-wide oldest multixact ID is advanced by vacuum
freeze. You can estimate the average number of members per multixact
like this (assuming default page size):

number of members = number of member segment files * 1636 * 32
number of multixacts = number of offsets segment files * 2048 * 32

You have about 2.3 million multixacts, and about 823 million members,
so your multixacts have an average of around 358 members each.
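The estimate above can be checked with a few lines of Python. The per-segment constants come directly from the formulas quoted above (32 SLRU pages per segment, 1636 member entries and 2048 offset entries per 8 kB page), and the segment counts and pg_controldata values are taken from the original post:

```python
# Estimate multixact member/offset usage from pg_multixact segment counts,
# using the per-segment constants from the formulas above.
MEMBERS_PER_SEGMENT = 1636 * 32     # member entries per 256 kB segment file
OFFSETS_PER_SEGMENT = 2048 * 32     # offset entries per segment file

member_segments = 15723             # files in pg_multixact/members
offset_segments = 36                # files 0000..0023 (hex) in pg_multixact/offsets

est_members = member_segments * MEMBERS_PER_SEGMENT
est_multixacts = offset_segments * OFFSETS_PER_SEGMENT

print(est_members)       # ~823 million members
print(est_multixacts)    # ~2.36 million multixacts

# Cross-check against pg_controldata: NextMultiOffset / NextMultiXactId
avg_members = 823062151 / 2316566
print(round(avg_members))
```

The two routes agree closely: the file-count estimate and the pg_controldata counters both put the average in the mid-350s of members per multixact.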

There are two ways for autovacuum to be automatically triggered to
clean up old multixact state:

1. When you have more than vacuum_multixact_freeze_table_age (default
150 million) multixacts, it will perform a wraparound vacuum to
advance the cluster's oldest multixact ID. The oldest is 1 for you,
meaning this has probably never happened, and you're only up to around
2.3 million so it's not going to happen for a long time with the
default setting.

2. When it thinks that the member address space is in danger of
wrapping, which kicks in before the above for clusters with large
average multixact size. This happens after
MULTIXACT_MEMBER_SAFE_THRESHOLD is exceeded, which is when you get to
~2.2 billion members, and you're not quite half way there yet.

I'm not sure what the best course of action is for you, but here are
some thoughts:

1. You could manually freeze all databases with vacuumdb --freeze
--all, but this requires first setting datallowconn to true for your
template0 database, because otherwise it would not be vacuum-frozen by
any command you can issue manually, and would therefore prevent the
oldest multixact ID from advancing.

2. You could change the vacuum_multixact_freeze_table_age to a value
that will cause it to do vacuum freezes more often. Assuming your
workload continues with the same average multixact size, you could
work backwards from the numbers above: since 2.3 million multixacts
occupy 4GB, if you set vacuum_multixact_freeze_table_age = 500000 you
should expect a size under 1GB. But note that the freeze operations
can be IO intensive and take a long time depending on your database
size so you don't want them too often.

3. You could do nothing and wait for autovacuum to detect that you
are using more than half the member address space and trigger a
freeze, which will happen some time after you have around 41k member
segments (occupying around 10GB of disk, since each holds 256kB).

4. You could look into whether all those multixacts with many members
are really expected. (Large numbers of concurrent FK checks or
explicit share locks on the same rows perhaps? A good proportion of
this happened on one day this week I see.)
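As a rough sanity check on the sizing claims in options 2 and 3, the arithmetic can be sketched in Python. The per-segment constants are the same as before; taking MULTIXACT_MEMBER_SAFE_THRESHOLD to be about half of the 2^32 member address space is an assumption here, consistent with the "~2.2 billion members" figure above:

```python
# Back-of-envelope arithmetic for options 2 and 3 above.
SEGMENT_BYTES = 256 * 1024          # each SLRU segment file is 256 kB
MEMBERS_PER_SEGMENT = 1636 * 32

# Option 2: ~2.3M multixacts currently occupy ~4 GB of member data,
# so a freeze ceiling of 500k multixacts should scale usage down
# proportionally, assuming the workload's average multixact size holds.
current_multixacts = 2_316_566
current_gb = 4.0
projected_gb = current_gb * 500_000 / current_multixacts
print(f"{projected_gb:.2f} GB")     # well under 1 GB

# Option 3: the emergency member-space freeze triggers around half of
# the member address space (assumption: threshold ~ 2**31 members).
safe_threshold = 2**31
segments_at_threshold = safe_threshold // MEMBERS_PER_SEGMENT
disk_gb = segments_at_threshold * SEGMENT_BYTES / 1024**3
print(segments_at_threshold)        # ~41k member segment files
print(f"{disk_gb:.1f} GB")          # ~10 GB on disk before it fires
```

This matches the figures above: roughly 41k member segments and about 10GB of pg_multixact/members before the automatic freeze kicks in.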

--
Thomas Munro
http://www.enterprisedb.com

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#3 Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Thomas Munro (#2)
Re: pg_multixact issues

Thomas Munro wrote:

4. You could look into whether all those multixacts with many members
are really expected. (Large numbers of concurrent FK checks or
explicit share locks on the same rows perhaps? A good proportion of
this happened on one day this week I see.)

I think maybe this patch
/messages/by-id/20150620043650.GX133018@postgresql.org
should help with this. I expect to come back to this and get it pushed
to 9.3 and 9.4 sometime ...

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


#4 Thomas Munro
thomas.munro@gmail.com
In reply to: Alvaro Herrera (#3)
Re: pg_multixact issues

On Thu, Feb 11, 2016 at 11:05 AM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:

Thomas Munro wrote:

4. You could look into whether all those multixacts with many members
are really expected. (Large numbers of concurrent FK checks or
explicit share locks on the same rows perhaps? A good proportion of
this happened on one day this week I see.)

I think maybe this patch
/messages/by-id/20150620043650.GX133018@postgresql.org
should help with this. I expect to come back to this and get it pushed
to 9.3 and 9.4 sometime ...

Ah, so the OP might get some improvement today by moving to 9.5, which
has the optimization already.

BTW in my message above I said vacuum_multixact_freeze_table_age when
I meant to say autovacuum_multixact_freeze_max_age (and its default is
400 million).

--
Thomas Munro
http://www.enterprisedb.com


#5 Achilleas Mantzios
achill@matrix.gatewaynet.com
In reply to: Kiriakos Georgiou (#1)
Re: pg_multixact issues

Good morning, Kiriakos

We have been running 9.3.4 on our test system for quite a few years, and 9.3.10 in production for a month or so (a little less than 1T in size, 2-5GB/day of WAL changes, about 250K transactions/day), and we
never experienced any problems with data/pg_multixact.

In our 9.3.4 :
% pg_controldata data | grep -i multi
Latest checkpoint's NextMultiXactId: 69
Latest checkpoint's NextMultiOffset: 135
Latest checkpoint's oldestMultiXid: 1
Latest checkpoint's oldestMulti's DB: 0
% du -h data/pg_multixact/
10K data/pg_multixact/members
10K data/pg_multixact/offsets
22K data/pg_multixact/

In our prod 9.3.10:
~> pg_controldata data | grep -i multi
Latest checkpoint's NextMultiXactId: 12404
Latest checkpoint's NextMultiOffset: 356
Latest checkpoint's oldestMultiXid: 12232
Latest checkpoint's oldestMulti's DB: 16426
~> du -h data/pg_multixact/
12K data/pg_multixact/members
24K data/pg_multixact/offsets
40K data/pg_multixact/

Our system comprises a JEE installation with ~500 users, plus at least 2 other applications hitting the same tables at the same time, and we see about one case of deadlocks per week.

What could be different on yours?

On 10/02/2016 20:52, Kiriakos Georgiou wrote:

Hello,

Our pg_multixact directory keeps growing. I did a “vacuum freeze” which didn’t help. I also did a “vacuum full” which didn’t help either.
We had this condition with 9.3.4 as well. When I upgraded our cluster to 9.4.5 (via plain sql dump and load) as expected the issue was resolved but now it’s happening again. Luckily it has no ill
effect other than consuming 4G of space for an otherwise 1G database.

Can you offer any hints as to how I can cure this?

thanks,
Kiriakos Georgiou

[pg_controldata output and pg_multixact directory listings quoted in full; see #1]

--
Achilleas Mantzios
IT DEV Lead
IT DEPT
Dynacom Tankers Mgmt