Index corruption / planner issue with one table in my pg 11.6 instance

Started by Jeremy Finzel about 6 years ago; 8 messages
#1Jeremy Finzel
finzelj@gmail.com

I have a table with about 7 million records. I had a query for which I
needed 2 indexes added, one on a created-timestamp field and another on an
id field; both very high cardinality.

First I noticed the query would not use the timestamp index no matter what
session config settings I used. I finally created a temp table copy of the
table and verified that the index is used there. Then I rebuilt the main
table with VACUUM FULL, and after that the index was used.

I repeated this process for an id index on the same table. I created the
index and it would never be chosen, no matter what, until I rebuilt the
table using VACUUM FULL.

I have run bt_index_check and bt_index_parent_check with heapallindexed on
one of these indexes, but nothing came up.
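For reference, the checks above were of this general form (a sketch; the index name is a placeholder):

```sql
-- Requires the amcheck extension (contrib) on PostgreSQL 11:
CREATE EXTENSION IF NOT EXISTS amcheck;

-- Basic structural check, plus verification that every heap tuple
-- appears in the index (heapallindexed):
SELECT bt_index_check('my_index'::regclass, heapallindexed => true);

-- Stricter check that also verifies parent/child page relationships;
-- takes a stronger lock on the table:
SELECT bt_index_parent_check('my_index'::regclass, heapallindexed => true);
```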

One other noteworthy thing is that a cluster restart appears to fix the
issue: on a snapshot of this system (which has been restarted), also
at 11.6, the planner picks up the index.

We quite recently (~2-3 weeks ago) ran pg_upgrade from 9.6 to 11. The
table in question is fed via pglogical. I checked for similar behavior on
another table in the same stream and could not reproduce it. So for now,
it seems limited to this one table.

Any suggestions as to how I could verify what is going on here? Anyone
experienced the same?

Thanks!
Jeremy

#2Jeff Janes
jeff.janes@gmail.com
In reply to: Jeremy Finzel (#1)
Re: Index corruption / planner issue with one table in my pg 11.6 instance

On Mon, Dec 9, 2019 at 1:00 PM Jeremy Finzel <finzelj@gmail.com> wrote:

I have a table with about 7 million records. I had a query for which I
needed 2 indexes added, one on a created-timestamp field and another on an
id field; both very high cardinality.

First I noticed the query would not use the timestamp index no matter what
session config settings I used. I finally created a temp table copy of the
table and verified that the index is used there. Then I rebuilt the main
table with VACUUM FULL, and after that the index was used.

Were they built with CONCURRENTLY? Do you have any long-open snapshots?

Cheers,

Jeff

#3Michael Paquier
michael@paquier.xyz
In reply to: Jeff Janes (#2)
Re: Index corruption / planner issue with one table in my pg 11.6 instance

On Mon, Dec 09, 2019 at 03:51:39PM -0500, Jeff Janes wrote:

On Mon, Dec 9, 2019 at 1:00 PM Jeremy Finzel <finzelj@gmail.com> wrote:

I have a table with about 7 million records. I had a query for which I
needed 2 indexes added, one on a created-timestamp field and another on an
id field; both very high cardinality.

First I noticed the query would not use the timestamp index no matter what
session config settings I used. I finally created a temp table copy of the
table and verified that the index is used there. Then I rebuilt the main
table with VACUUM FULL, and after that the index was used.

Were they built with CONCURRENTLY? Do you have any long-open snapshots?

Something new as of 11 is that btree indexes can be built in parallel,
and before releasing it we found some bugs with covering indexes.
Perhaps we have an issue hidden behind one of these, but hard to be
sure. I have not seen that yet as of v11.
--
Michael

#4Peter Geoghegan
pg@bowt.ie
In reply to: Michael Paquier (#3)
Re: Index corruption / planner issue with one table in my pg 11.6 instance

On Mon, Dec 9, 2019 at 6:33 PM Michael Paquier <michael@paquier.xyz> wrote:

Something new as of 11 is that btree indexes can be built in parallel,
and before releasing it we found some bugs with covering indexes.
Perhaps we have an issue hidden behind one of these, but hard to be
sure.

I doubt it.

Jeremy did not report queries that give wrong answers. He only said
that the optimizer refused to use one particular index, before a
VACUUM FULL seemingly corrected the problem. OTOH, Jeremy did report
using contrib/amcheck on the index, which didn't complain. (Note also
that the amcheck functions will throw an error with an !indisvalid
index.)
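
A quick way to check whether an index was left invalid, or is subject to the xmin restriction, is to look at its pg_index row (the index name here is a placeholder):

```sql
-- indisvalid = false means the index is unusable by the planner;
-- indcheckxmin = true means it is only usable once its xmin is old enough.
SELECT indisvalid, indcheckxmin
FROM pg_index
WHERE indexrelid = 'my_index'::regclass;
```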

--
Peter Geoghegan

#5Jerry Sievers
gsievers19@comcast.net
In reply to: Peter Geoghegan (#4)
Re: Index corruption / planner issue with one table in my pg 11.6 instance

Peter Geoghegan <pg@bowt.ie> writes:

On Mon, Dec 9, 2019 at 6:33 PM Michael Paquier <michael@paquier.xyz> wrote:

Something new as of 11 is that btree indexes can be built in parallel,
and before releasing it we found some bugs with covering indexes.
Perhaps we have an issue hidden behind one of these, but hard to be
sure.

I doubt it.

Jeremy did not report queries that give wrong answers. He only said
that the optimizer refused to use one particular index, before a
VACUUM FULL seemingly corrected the problem. OTOH, Jeremy did report
using contrib/amcheck on the index, which didn't complain. (Note also
that the amcheck functions will throw an error with an !indisvalid
index.)

I suspect this was due to indcheckxmin=true for the involved index and
the documented (but IMO confusing) interplay with broken HOT chains and
visibility.

Checking the same DB today, I find 35 indexes across the entire system
having indcheckxmin=true, including one on the same table, though not
the same index that Pg refused to use recently.

Many of the indexes have very old xmins and thus should all be considered
in plans.
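
A survey like the one above can be done with something along these lines (a sketch; xmin here is the system column on the pg_index row itself):

```sql
-- List indexes still subject to the xmin check, with the age of
-- the pg_index row that imposes the restriction:
SELECT i.xmin,
       c.relname AS index_name,
       t.relname AS table_name
FROM pg_index i
JOIN pg_class c ON c.oid = i.indexrelid
JOIN pg_class t ON t.oid = i.indrelid
WHERE i.indcheckxmin;
```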

I was able to get that remaining index out of the indcheckxmin=true list
by...

1. Reindexing $index (did not change anything)
2. begin; drop; create; commit (still in the list but with a much newer
xmin.)
3. VACUUM FULL the table again (and now the index is gone from the
indcheckxmin=true list.)
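
Step 2 above, sketched out (object names are placeholders):

```sql
-- Drop and recreate the index in one transaction; the recreated index
-- gets a fresh pg_index row, hence the much newer xmin observed above.
BEGIN;
DROP INDEX my_index;
CREATE INDEX my_index ON my_table (my_column);
COMMIT;
```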

Please advise.

Thx

--
Jerry Sievers
Postgres DBA/Development Consulting
e: postgres.consulting@comcast.net

#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jerry Sievers (#5)
Re: Index corruption / planner issue with one table in my pg 11.6 instance

Jerry Sievers <gsievers19@comcast.net> writes:

I suspect this was due to indcheckxmin=true for the involved index and
the documented (but IMO confusing) interplay w/broken hot-chains and
visibility.

Yeah. The reported behavior can mostly be explained if we assume
that there's some HOT chain in the table that involves an update
of this particular column, so that if we build an index on that
column we see a broken HOT chain, but building an index on some
other column doesn't have a problem.

The thing this doesn't easily explain is that the behavior persists
across repeated index rebuilds. A broken HOT chain is only broken
as long as the older entry is still visible-to-somebody, so that
such situations ought to be self-healing as time passes. If it
fails repeatedly, this theory requires assuming that either

1. You've got some extremely old open transactions (maybe forgotten
prepared transactions?), or

2. Your workload is constantly generating new broken HOT chains of
the same sort, so that there's usually a live one when you try
to build an index.

The fact that you even notice the indcheckxmin restriction indicates
that you do tend to have long-running transactions in the system,
else the index would come free for use fairly quickly. So #1 isn't
as implausible as I might otherwise think. But #2 seems probably
more likely on the whole. OTOH, neither point is exactly within
the offered evidence.

regards, tom lane

#7Jeremy Finzel
finzelj@gmail.com
In reply to: Tom Lane (#6)
Re: Index corruption / planner issue with one table in my pg 11.6 instance

On Tue, Dec 10, 2019 at 12:09 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Yeah. The reported behavior can mostly be explained if we assume
that there's some HOT chain in the table that involves an update
of this particular column, so that if we build an index on that
column we see a broken HOT chain, but building an index on some
other column doesn't have a problem.

The problem exists, so far as I can tell, on indexing *any column* of
*this particular table*. I tried the same experiment on another table in
the same replication stream, and I cannot reproduce it.

I am building the index **non-concurrently** every time.

The thing this doesn't easily explain is that the behavior persists
across repeated index rebuilds. A broken HOT chain is only broken
as long as the older entry is still visible-to-somebody, so that
such situations ought to be self-healing as time passes. If it
fails repeatedly, this theory requires assuming that either

1. You've got some extremely old open transactions (maybe forgotten
prepared transactions?), or

No prepared_xacts, and no transactions older than a few hours. Several-hour
transactions are common in this reporting system. I have not yet seen
whether after several hours the index starts showing up in plans.
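
Checks of that sort can be done with queries like these (a sketch):

```sql
-- Forgotten two-phase transactions would show up here:
SELECT gid, prepared, owner
FROM pg_prepared_xacts;

-- Oldest open transactions, and the xmin horizon each backend holds back:
SELECT pid, xact_start, backend_xmin
FROM pg_stat_activity
WHERE backend_xmin IS NOT NULL
ORDER BY xact_start;
```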

2. Your workload is constantly generating new broken HOT chains of
the same sort, so that there's usually a live one when you try
to build an index.

The fact that you even notice the indcheckxmin restriction indicates
that you do tend to have long-running transactions in the system,
else the index would come free for use fairly quickly. So #1 isn't
as implausible as I might otherwise think. But #2 seems probably
more likely on the whole. OTOH, neither point is exactly within
the offered evidence.

Is there a way for me to test this theory? I tried the following with no
change in behavior:

1. Disable write load to table
2. Vacuum analyze table (not vac full)
3. Create index
4. Explain

Still did not pick up the index.
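
The experiment above, sketched as SQL (table, column, and query are placeholders):

```sql
-- 2. Plain vacuum + analyze (no table rewrite, unlike VACUUM FULL):
VACUUM (ANALYZE) my_table;

-- 3. Non-concurrent index build:
CREATE INDEX my_index ON my_table (created_at);

-- 4. Check whether the planner will use the new index:
EXPLAIN
SELECT *
FROM my_table
WHERE created_at > now() - interval '1 day';
```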

Thanks,
Jeremy

#8Jeremy Finzel
finzelj@gmail.com
In reply to: Jeremy Finzel (#7)
Re: Index corruption / planner issue with one table in my pg 11.6 instance

On Tue, Dec 10, 2019 at 8:25 AM Jeremy Finzel <finzelj@gmail.com> wrote:

Is there a way for me to test this theory? I tried the following with no
change in behavior:

1. Disable write load to table
2. Vacuum analyze table (not vac full)
3. Create index
4. Explain

Still did not pick up the index.

Just another followup: with no other intervention on our part, after many
hours the planner is picking up the index.

I don't quite know what is causing it still, but is this behavior actually
desired? It's pretty inconvenient when trying to build an index for a
query and use it immediately, which used to work :).

Thanks,
Jeremy