MultiXact member wraparound protections are disabled
Hi,
We run postgres 9.4.5.
Starting this morning, we started seeing messages like the below:
Oct 12 14:07:15 site-db01a postgres[11253]: [106430-1] app=,user=,db=,ip=LOG: MultiXact member wraparound protections are disabled because oldest checkpointed MultiXact 1 does not exist on disk
Oct 12 14:09:26 site-db01a postgres[11253]: [106526-1] app=,user=,db=,ip=LOG: MultiXact member wraparound protections are disabled because oldest checkpointed MultiXact 1 does not exist on disk
Oct 12 14:14:18 site-db01a postgres[11253]: [106608-1] app=,user=,db=,ip=LOG: MultiXact member wraparound protections are disabled because oldest checkpointed MultiXact 1 does not exist on disk
Our autovacuum_freeze_max_age = 1750000000.
site=# SELECT datname, age(datfrozenxid) FROM pg_database;
datname | age
-----------+------------
site | 1645328344
template0 | 1274558807
bench | 1274558807
postgres | 1324283514
template1 | 1274558807
So we’re about 100 million transactions away from the point where vacuuming to prevent wraparound kicks in.
We’re running precautionary vacuums on our largest offenders to try and bring our transaction id ages down.
What I’d request some clarity on is the message above. What does it mean that “oldest checkpointed MultiXact does not exist on disk”? Would we lose data if we did have to wrap around?
Is this telling us we’re not vacuuming effectively enough?
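(The ~100 million headroom figure above can be reproduced with a query like the sketch below; it reads the server's own autovacuum_freeze_max_age setting rather than hard-coding the 1750000000 value posted:)

```sql
-- Transactions remaining before anti-wraparound autovacuum triggers,
-- per database, given the cluster's autovacuum_freeze_max_age setting.
SELECT datname,
       current_setting('autovacuum_freeze_max_age')::bigint
         - age(datfrozenxid) AS xids_remaining
FROM pg_database
ORDER BY xids_remaining;
```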
Thanks,
Karthik
AnandKumar, Karthik wrote:
Ugh. Can you share the output of pg_controldata and the list of files
in pg_multixact/members and pg_multixact/offsets?
The problem here is that multixact vacuuming is separate from xid
vacuuming, so you need to be looking at datminmxid rather than
datfrozenxid. It may be that multixact wraparound has already
occurred.
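(A minimal sketch of that check on 9.4, where the mxid_age() helper of later releases isn't available, is to read the pg_database column directly and compare it against the counter reported by pg_controldata:)

```sql
-- Oldest multixact per database; compare these values with
-- "Latest checkpoint's NextMultiXactId" from pg_controldata
-- to gauge how far multixact consumption has advanced.
SELECT datname, datminmxid FROM pg_database ORDER BY datminmxid;
```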
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
root@site-db01a:/var/lib/pgsql/cmates/data # ls pg_multixact/members
0000 0001 0002 0003 0004 0005 0006 0007 0008 0009 000A 000B 000C 000D 000E 000F 0010 0011 0012 0013 0014 0015 0016 0017 0018 0019 001A 001B
root@site-db01a:/var/lib/pgsql/cmates/data # ls pg_multixact/offsets
0001 0002 0003 0004 0005 0006 0007 0008 0009 000A 000B
postgres@site-db01a:~ $ /usr/pgsql-9.4/bin/pg_controldata /var/lib/pgsql/cmates/data
pg_control version number: 942
Catalog version number: 201409291
Database system identifier: 6228991221455883206
Database cluster state: in production
pg_control last modified: Wed 12 Oct 2016 05:22:45 PM PDT
Latest checkpoint location: 62D0/BDE939F8
Prior checkpoint location: 62CF/F039BFD0
Latest checkpoint's REDO location: 62D0/8A060220
Latest checkpoint's REDO WAL file: 00000001000062D00000008A
Latest checkpoint's TimeLineID: 1
Latest checkpoint's PrevTimeLineID: 1
Latest checkpoint's full_page_writes: on
Latest checkpoint's NextXID: 1/1834305762
Latest checkpoint's NextOID: 19540327
Latest checkpoint's NextMultiXactId: 784503
Latest checkpoint's NextMultiOffset: 1445264
Latest checkpoint's oldestXID: 226141373
Latest checkpoint's oldestXID's DB: 16457
Latest checkpoint's oldestActiveXID: 1834302410
Latest checkpoint's oldestMultiXid: 1
Latest checkpoint's oldestMulti's DB: 16457
Time of latest checkpoint: Wed 12 Oct 2016 05:22:05 PM PDT
Fake LSN counter for unlogged rels: 0/1
Minimum recovery ending location: 0/0
Min recovery ending loc's timeline: 0
Backup start location: 0/0
Backup end location: 0/0
End-of-backup record required: no
Current wal_level setting: hot_standby
Current wal_log_hints setting: off
Current max_connections setting: 1500
Current max_worker_processes setting: 8
Current max_prepared_xacts setting: 0
Current max_locks_per_xact setting: 1000
Maximum data alignment: 8
Database block size: 8192
Blocks per segment of large relation: 131072
WAL block size: 8192
Bytes per WAL segment: 16777216
Maximum length of identifiers: 64
Maximum columns in an index: 32
Maximum size of a TOAST chunk: 1996
Size of a large-object chunk: 2048
Date/time type storage: 64-bit integers
Float4 argument passing: by value
Float8 argument passing: by value
Data page checksum version: 0
On 10/13/16, 5:28 AM, "Alvaro Herrera" <alvherre@2ndquadrant.com> wrote:
Sharing output
postgres@site-db01a:~/cmates/data/pg_multixact/members $ ls
0000 0001 0002 0003 0004 0005 0006 0007 0008 0009 000A 000B 000C 000D 000E 000F 0010 0011 0012 0013 0014 0015 0016 0017 0018 0019 001A 001B
postgres@site-db01a:~/cmates/data/pg_multixact/offsets $ ls
0001 0002 0003 0004 0005 0006 0007 0008 0009 000A 000B
postgres@site-db01a:/tmp $ /usr/pgsql-9.4/bin/pg_controldata -D
/var/lib/pgsql/cmates/data
pg_controldata: could not open file "-D/global/pg_control" for reading: No
such file or directory
pg_controldata is not working here even though the file is there inside
global - it is not reading from it
postgres@site-db01a:~/cmates/data/global $ ls -la pg_control
-rw-------. 1 postgres postgres 8192 Oct 12 18:55 pg_control
On Wed, Oct 12, 2016 at 4:58 PM, Alvaro Herrera <alvherre@2ndquadrant.com>
wrote:
Got the output of pg_control
postgres@site-db01a:~/cmates/data/global $
/usr/pgsql-9.4/bin/pg_controldata /var/lib/pgsql/cmates/data
pg_control version number: 942
Catalog version number: 201409291
Database system identifier: 6228991221455883206
Database cluster state: in production
pg_control last modified: Wed 12 Oct 2016 07:08:14 PM PDT
Latest checkpoint location: 62E1/890DA8D8
Prior checkpoint location: 62E0/550B2178
Latest checkpoint's REDO location: 62E1/4F054A08
Latest checkpoint's REDO WAL file: 00000001000062E10000004F
Latest checkpoint's TimeLineID: 1
Latest checkpoint's PrevTimeLineID: 1
Latest checkpoint's full_page_writes: on
Latest checkpoint's NextXID: 1/1834454859
Latest checkpoint's NextOID: 19540816
Latest checkpoint's NextMultiXactId: 784527
Latest checkpoint's NextMultiOffset: 1445313
Latest checkpoint's oldestXID: 226141373
Latest checkpoint's oldestXID's DB: 16457
Latest checkpoint's oldestActiveXID: 1834454859
Latest checkpoint's oldestMultiXid: 1
Latest checkpoint's oldestMulti's DB: 16457
Time of latest checkpoint: Wed 12 Oct 2016 07:06:45 PM PDT
Fake LSN counter for unlogged rels: 0/1
Minimum recovery ending location: 0/0
Min recovery ending loc's timeline: 0
Backup start location: 0/0
Backup end location: 0/0
End-of-backup record required: no
Current wal_level setting: hot_standby
Current wal_log_hints setting: off
Current max_connections setting: 1500
Current max_worker_processes setting: 8
Current max_prepared_xacts setting: 0
Current max_locks_per_xact setting: 1000
Maximum data alignment: 8
Database block size: 8192
Blocks per segment of large relation: 131072
WAL block size: 8192
Bytes per WAL segment: 16777216
Maximum length of identifiers: 64
Maximum columns in an index: 32
Maximum size of a TOAST chunk: 1996
Size of a large-object chunk: 2048
Date/time type storage: 64-bit integers
Float4 argument passing: by value
Float8 argument passing: by value
Data page checksum version: 0
On Wed, Oct 12, 2016 at 7:10 PM, avi Singh <avisingh19811981@gmail.com>
wrote:
We are also seeing this in our log file
Oct 12 19:08:14 site-db01a postgres[6117]: [7589-1] app=,user=,db=,ip=LOG:
MultiXact member wraparound protections are disabled because oldest
checkpointed MultiXact 1 does not exist on disk
On Wed, Oct 12, 2016 at 7:13 PM, avi Singh <avisingh19811981@gmail.com>
wrote:
AnandKumar, Karthik wrote:
root@site-db01a:/var/lib/pgsql/cmates/data # ls pg_multixact/members
0000 0001 0002 0003 0004 0005 0006 0007 0008 0009 000A 000B 000C 000D 000E 000F 0010 0011 0012 0013 0014 0015 0016 0017 0018 0019 001A 001B
root@site-db01a:/var/lib/pgsql/cmates/data # ls pg_multixact/offsets
0001 0002 0003 0004 0005 0006 0007 0008 0009 000A 000B
postgres@site-db01a:~ $ /usr/pgsql-9.4/bin/pg_controldata /var/lib/pgsql/cmates/data
Latest checkpoint's NextMultiXactId: 784503
Latest checkpoint's NextMultiOffset: 1445264
Latest checkpoint's oldestMultiXid: 1
Latest checkpoint's oldestMulti's DB: 16457
This looks perfectly normal, except that the pg_multixact/offsets/0000
file is gone. oldestMultiXid is 1, so I don't see how the file could
have gotten removed. Has this been upgraded recently from an earlier
9.3 or 9.4 version? There have been bugs in this area, but they've
been fixed for some time now.
Could the 0000 file have been removed manually, perhaps?
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Thanks. We started seeing this error right after a SAN FC re-cable effort - so yes, that would make sense.
We’ll do a little more digging to see if the 0000 could have gotten removed.
If that’s an older file that we have in our filesystem backups, is it safe to restore from there?
On 10/13/16, 3:30 PM, "Alvaro Herrera" <alvherre@2ndquadrant.com> wrote:
AnandKumar, Karthik wrote:
Sure, the files are immutable after they are completed. I worry that if
the system removed it automatically, it would just remove it again,
though. Shouldn't happen on 9.4.5, but it seems just too much of a
coincidence that that file was removed.
Changes such as FC recabling should not cause anything like this. I
mean, why a pg_multixact file and not a table data file? Very fishy.
I'd advise checking your older logs from previous restarts to see
whether the "multixact protections are enabled" message has ever
appeared, or whether it has always been "protections are disabled".
Maybe you've had the problem for ages and just never noticed ...
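(A sketch of the search pattern; the printf line stands in for a real log file, since the actual log location - /var/log/messages* or a postgres log directory - is an assumption that varies per setup:)

```shell
# Demo: extract the enabled/disabled state from a sample syslog line.
# In practice, point grep at your archived logs, e.g.:
#   grep -hE 'wraparound protections are (enabled|disabled)' /var/log/messages*
printf '%s\n' \
  'Oct 12 14:07:15 site-db01a postgres[11253]: app=,user=,db=,ip=LOG: MultiXact member wraparound protections are disabled because oldest checkpointed MultiXact 1 does not exist on disk' \
  | grep -oE 'wraparound protections are (enabled|disabled)'
```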
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
You're right - we looked back in our old logs and we do see the messages
there as well. What I still don't get is that ever since we restarted
the database after the SAN FC re-cable effort, autovacuum has been
running on all threads continuously. I have never seen autovacuum
using all the threads 24x7 on this database. Any thoughts?
On Thu, Oct 13, 2016 at 8:35 AM, Alvaro Herrera <alvherre@2ndquadrant.com>
wrote:
avi Singh wrote:
It's trying to ensure all tables are correctly frozen. As I recall,
that's working per spec and you should just let it run until it's done.
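(For anyone wanting to watch that freezing activity, one option is the standard pg_stat_activity view, available on 9.4:)

```sql
-- Currently running autovacuum workers; anti-wraparound vacuums are
-- labelled "(to prevent wraparound)" in the query text.
SELECT pid, datname, query_start, query
FROM pg_stat_activity
WHERE query LIKE 'autovacuum:%';
```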
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Thank you for your help Alvaro - we really appreciate it.
The error in fact stopped this morning - we took downtime and ran a vacuum across all of our tables, and saw increased autovacuum activity as well.
It looks like that bumped the oldest MultiXactId up to something other than 1 now:
postgres@site-db01a:~ $ /usr/pgsql-9.4/bin/pg_controldata /var/lib/pgsql/cmates/data | grep -i multi
Latest checkpoint's NextMultiXactId: 785051
Latest checkpoint's NextMultiOffset: 1446371
Latest checkpoint's oldestMultiXid: 575211
Latest checkpoint's oldestMulti's DB: 12998