ERROR: found multixact XX from before relminmxid YY

Started by Mark Fletcher, over 7 years ago; 4 messages; general

markf@corp.groups.io

over 7 years ago

Hi,

Starting yesterday morning, auto vacuuming of one of our PostgreSQL 9.6.10
(CentOS 7) tables started failing:

ERROR: found multixact 370350365 from before relminmxid 765860874
CONTEXT: automatic vacuum of table "userdb.public.subs"

This is about as plain and simple a table as there is: no triggers, no
foreign keys, and I'm not using any extensions. It has about 2.8M rows. I
have not done any consistency checks, but nothing strange has manifested in
production.
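The error compares two 32-bit multixact IDs: the multixact stored in a tuple's xmax and the table's relminmxid (all multixacts before it are supposed to have been replaced in the table). Because these IDs wrap around, "before" means older in circular order, not numerically smaller. The Python sketch below models that comparison under the assumption that it follows the signed 32-bit difference test of PostgreSQL's `MultiXactIdPrecedes()`; `check_tuple_multixact` is a hypothetical helper for illustration, not a server API.

```python
# Sketch: wraparound-aware comparison of 32-bit multixact IDs.
# Assumption: mirrors the signed-difference test used by PostgreSQL's
# MultiXactIdPrecedes(); check_tuple_multixact() is a hypothetical helper.


def to_int32(value: int) -> int:
    """Reinterpret the low 32 bits of value as a signed 32-bit integer."""
    value &= 0xFFFFFFFF
    return value - 0x1_0000_0000 if value >= 0x8000_0000 else value


def multixact_precedes(multi1: int, multi2: int) -> bool:
    """True if multi1 is older than multi2 in the circular 32-bit ID space."""
    return to_int32(multi1 - multi2) < 0


def check_tuple_multixact(multi: int, relminmxid: int) -> None:
    """Raise if a tuple's xmax multixact is older than the table's relminmxid."""
    if multixact_precedes(multi, relminmxid):
        raise ValueError(
            f"found multixact {multi} from before relminmxid {relminmxid}"
        )


if __name__ == "__main__":
    # Values from the error reported in this thread.
    print(multixact_precedes(370350365, 765860874))  # True
    try:
        check_tuple_multixact(370350365, 765860874)
    except ValueError as err:
        print(f"ERROR: {err}")
```

With the values from the report, the signed difference is -395510509, so multixact 370350365 counts as older than relminmxid 765860874 and the check fires. The same test treats 4294967295 as older than 5, which is why a plain integer comparison would be wrong near wraparound.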

Reading the various discussions about this error, the only solution I found
was here:

/messages/by-id/CAGewt-ukbL6WL8cc-G+iN9AVvmMQkhA9i2TKP4-6wJr6YOQkzA@mail.gmail.com

But I found no other reports of this solving the problem. Can someone verify
that applying the mentioned fix (and, I assume, upgrading to 9.6.11) will fix
the problem? And that it doesn't indicate table corruption?

Thanks,
Mark

Tom Lane

tgl@sss.pgh.pa.us

over 7 years ago

In reply to: Mark Fletcher (#1)

Re: ERROR: found multixact XX from before relminmxid YY

Mark Fletcher <markf@corp.groups.io> writes:

> Starting yesterday morning, auto vacuuming of one of our PostgreSQL 9.6.10
> (CentOS 7) tables started failing:
> ERROR: found multixact 370350365 from before relminmxid 765860874
> CONTEXT: automatic vacuum of table "userdb.public.subs"

Ugh.

> Reading the various discussions about this error, the only solution I found
> was here:
> /messages/by-id/CAGewt-ukbL6WL8cc-G+iN9AVvmMQkhA9i2TKP4-6wJr6YOQkzA@mail.gmail.com
> But I found no other reports of this solving the problem. Can someone verify
> that applying the mentioned fix (and, I assume, upgrading to 9.6.11) will fix
> the problem? And that it doesn't indicate table corruption?

Yeah, SELECT FOR UPDATE should overwrite the broken xmax value and thereby
fix it, I expect. However, I don't see anything in the release notes
suggesting that we've fixed any related bugs since 9.6.10, so if this
just appeared then we've still got a problem :-(. Did anything
interesting happen since your last successful autovacuum on that table?
Database crashes, WAL-related parameter changes, that sort of thing?

regards, tom lane
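Tom's suggested workaround relies on row locking rewriting the tuple header: once SELECT FOR UPDATE replaces the stale multixact in xmax with the locking transaction's own XID, vacuum's relminmxid check no longer finds a multixact on that row. The toy Python model below illustrates only that idea. It is deliberately simplified and assumes no member of the old multixact is still running; `TupleHeader`, `vacuum_check`, `select_for_update`, and the XID 123456789 are all hypothetical, and real locking also sets lock-only infomask bits that the model ignores.

```python
from dataclasses import dataclass

MASK32 = 0xFFFFFFFF


def multixact_precedes(multi1: int, multi2: int) -> bool:
    """Wraparound-aware 'older than': the 32-bit difference is negative as signed."""
    diff = (multi1 - multi2) & MASK32
    return diff >= 0x8000_0000  # top bit set means a negative int32


@dataclass
class TupleHeader:
    """Hypothetical, heavily simplified view of a heap tuple's xmax fields."""
    xmax: int
    xmax_is_multi: bool


def vacuum_check(tup: TupleHeader, relminmxid: int) -> None:
    # Only a multixact xmax is compared against relminmxid here; the model
    # omits the separate checks vacuum applies to plain XIDs.
    if tup.xmax_is_multi and multixact_precedes(tup.xmax, relminmxid):
        raise ValueError(
            f"found multixact {tup.xmax} from before relminmxid {relminmxid}"
        )


def select_for_update(tup: TupleHeader, locker_xid: int) -> None:
    # Simplification: assume no member of the old multixact is still running,
    # so the lock replaces xmax with the locking transaction's plain XID.
    tup.xmax = locker_xid
    tup.xmax_is_multi = False


if __name__ == "__main__":
    row = TupleHeader(xmax=370350365, xmax_is_multi=True)
    try:
        vacuum_check(row, relminmxid=765860874)
    except ValueError as err:
        print(f"before lock: ERROR: {err}")
    select_for_update(row, locker_xid=123456789)  # hypothetical XID
    vacuum_check(row, relminmxid=765860874)
    print("after lock: check passes")
```

The model also makes the limits of the workaround visible: it only repairs rows that are actually locked, so any other tuples carrying a stale multixact would still trip the check.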

Mark Fletcher

markf@corp.groups.io

over 7 years ago

In reply to: Tom Lane (#2)

Re: ERROR: found multixact XX from before relminmxid YY

On Fri, Dec 28, 2018 at 4:49 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Yeah, SELECT FOR UPDATE should overwrite the broken xmax value and thereby
> fix it, I expect. However, I don't see anything in the release notes
> suggesting that we've fixed any related bugs since 9.6.10, so if this
> just appeared then we've still got a problem :-(. Did anything
> interesting happen since your last successful autovacuum on that table?
> Database crashes, WAL-related parameter changes, that sort of thing?

The last autovacuum of that table was on Dec 8th, and the last auto analyze
was on Dec 26. Since then there have been no schema changes on that
particular table, no database crashes, and no WAL-related parameter changes.
We've made other schema changes during that time, but otherwise the database
has been stable.

Thanks,
Mark

Andres Freund

andres@anarazel.de

over 7 years ago

In reply to: Tom Lane (#2)

Re: ERROR: found multixact XX from before relminmxid YY

Hi,

On 2018-12-28 19:49:36 -0500, Tom Lane wrote:

> Mark Fletcher <markf@corp.groups.io> writes:
>
>> Starting yesterday morning, auto vacuuming of one of our PostgreSQL 9.6.10
>> (CentOS 7) tables started failing:
>> ERROR: found multixact 370350365 from before relminmxid 765860874
>> CONTEXT: automatic vacuum of table "userdb.public.subs"
>
> Ugh.
>
>> Reading the various discussions about this error, the only solution I found
>> was here:
>> /messages/by-id/CAGewt-ukbL6WL8cc-G+iN9AVvmMQkhA9i2TKP4-6wJr6YOQkzA@mail.gmail.com
>> But I found no other reports of this solving the problem. Can someone verify
>> that applying the mentioned fix (and, I assume, upgrading to 9.6.11) will fix
>> the problem? And that it doesn't indicate table corruption?
>
> Yeah, SELECT FOR UPDATE should overwrite the broken xmax value and thereby
> fix it, I expect.

Right.

> However, I don't see anything in the release notes
> suggesting that we've fixed any related bugs since 9.6.10, so if this
> just appeared then we've still got a problem :-(. Did anything
> interesting happen since your last successful autovacuum on that table?
> Database crashes, WAL-related parameter changes, that sort of thing?

I think it's entirely conceivable that the damage happened with earlier versions,
and just became visible now as the global horizon increased.

Greetings,

Andres Freund