Intermittent regression test failure from index-only scans patch

Started by Tom Laneover 14 years ago7 messages
#1Tom Lane
tgl@sss.pgh.pa.us

I notice that several members of the buildfarm have been consistently
showing the same regression test failure since the index-only scans
patch went in:

*** /home/pgbuildfarm/workdir/HEAD/pgsql.27150/src/test/regress/expected/stats.out	Sat Oct  8 03:20:05 2011
--- /home/pgbuildfarm/workdir/HEAD/pgsql.27150/src/test/regress/results/stats.out	Sat Oct  8 12:30:55 2011
***************
*** 94,100 ****
   WHERE st.relname='tenk2' AND cl.relname='tenk2';
   ?column? | ?column? | ?column? | ?column? 
  ----------+----------+----------+----------
!  t        | t        | t        | t
  (1 row)
  SELECT st.heap_blks_read + st.heap_blks_hit >= pr.heap_blks + cl.relpages,
--- 94,100 ----
   WHERE st.relname='tenk2' AND cl.relname='tenk2';
   ?column? | ?column? | ?column? | ?column? 
  ----------+----------+----------+----------
!  t        | t        | t        | f
  (1 row)

SELECT st.heap_blks_read + st.heap_blks_hit >= pr.heap_blks + cl.relpages,

The diff indicates that the idx_scan count advanced but idx_tup_fetch
did not, which is not so surprising here because tenk2 hasn't been
modified in some time. If the autovacuum daemon managed to mark it
all-visible before the stats test runs, then an index-only scan will
happen, and bingo, no idx_tup_fetch increment (because indeed no heap
tuple was fetched).

I'm inclined to fix this by changing the test to examine idx_tup_read
not idx_tup_fetch. Alternatively, we could have the test force
enable_indexonlyscan off. Thoughts?

regards, tom lane

#2Cédric Villemain
cedric.villemain.debian@gmail.com
In reply to: Tom Lane (#1)
Re: Intermittent regression test failure from index-only scans patch

2011/10/8 Tom Lane <tgl@sss.pgh.pa.us>:

I notice that several members of the buildfarm have been consistently
showing the same regression test failure since the index-only scans
patch went in:

*** /home/pgbuildfarm/workdir/HEAD/pgsql.27150/src/test/regress/expected/stats.out      Sat Oct  8 03:20:05 2011
--- /home/pgbuildfarm/workdir/HEAD/pgsql.27150/src/test/regress/results/stats.out       Sat Oct  8 12:30:55 2011
***************
*** 94,100 ****
  WHERE st.relname='tenk2' AND cl.relname='tenk2';
  ?column? | ?column? | ?column? | ?column?
 ----------+----------+----------+----------
!  t        | t        | t        | t
 (1 row)
 SELECT st.heap_blks_read + st.heap_blks_hit >= pr.heap_blks + cl.relpages,
--- 94,100 ----
  WHERE st.relname='tenk2' AND cl.relname='tenk2';
  ?column? | ?column? | ?column? | ?column?
 ----------+----------+----------+----------
!  t        | t        | t        | f
 (1 row)

 SELECT st.heap_blks_read + st.heap_blks_hit >= pr.heap_blks + cl.relpages,

The diff indicates that the idx_scan count advanced but idx_tup_fetch
did not, which is not so surprising here because tenk2 hasn't been
modified in some time.  If the autovacuum daemon managed to mark it
all-visible before the stats test runs, then an index-only scan will
happen, and bingo, no idx_tup_fetch increment (because indeed no heap
tuple was fetched).

I'm inclined to fix this by changing the test to examine idx_tup_read
not idx_tup_fetch.  Alternatively, we could have the test force
enable_indexonlyscan off.  Thoughts?

No preferences, but is it interesting to add a "vacuum freeze"
somewhere and check expected result after index-only scan ? (for both
idx_tup_read and idx_tup_fetch)

                       regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

--
Cédric Villemain +33 (0)6 20 30 22 52
http://2ndQuadrant.fr/
PostgreSQL: Support 24x7 - Développement, Expertise et Formation

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Cédric Villemain (#2)
Re: Intermittent regression test failure from index-only scans patch

=?ISO-8859-1?Q?C=E9dric_Villemain?= <cedric.villemain.debian@gmail.com> writes:

2011/10/8 Tom Lane <tgl@sss.pgh.pa.us>:

The diff indicates that the idx_scan count advanced but idx_tup_fetch
did not, which is not so surprising here because tenk2 hasn't been
modified in some time. �If the autovacuum daemon managed to mark it
all-visible before the stats test runs, then an index-only scan will
happen, and bingo, no idx_tup_fetch increment (because indeed no heap
tuple was fetched).

I'm inclined to fix this by changing the test to examine idx_tup_read
not idx_tup_fetch. �Alternatively, we could have the test force
enable_indexonlyscan off. �Thoughts?

No preferences, but is it interesting to add a "vacuum freeze"
somewhere and check expected result after index-only scan ? (for both
idx_tup_read and idx_tup_fetch)

This test is only trying to make sure that the stats collection
machinery is working. I don't think that we should try to coerce things
so that it can check something as context-sensitive as whether an
index-only scan happened. It's too fragile already --- we've seen
non-reproducible failures here many times before.

regards, tom lane

#4Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#1)
Re: Intermittent regression test failure from index-only scans patch

On Oct 8, 2011, at 11:04 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I'm inclined to fix this by changing the test to examine idx_tup_read
not idx_tup_fetch. Alternatively, we could have the test force
enable_indexonlyscan off. Thoughts?

No preference.

Should we have another counter for heap fetches avoided? Seems like that could be useful to know.

...Robert

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#4)
Re: Intermittent regression test failure from index-only scans patch

Robert Haas <robertmhaas@gmail.com> writes:

On Oct 8, 2011, at 11:04 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I'm inclined to fix this by changing the test to examine idx_tup_read
not idx_tup_fetch. Alternatively, we could have the test force
enable_indexonlyscan off. Thoughts?

No preference.

I ended up doing it the second way (ie enable_indexonlyscan = off)
because it turns out that pg_stat_user_tables doesn't have the
idx_tup_read column --- we track that count per index, not per table.
I could have complicated the test's stats queries some more, but it
seemed quite not relevant to the goals of the test.

Should we have another counter for heap fetches avoided? Seems like that could be useful to know.

Hm. I'm hesitant to add another per-table (or per index?) statistics
counter because of the resultant bloat in the stats file. But it
wouldn't be a bad idea for somebody to take two steps back and rethink
what we're counting in this area. The current counter definitions are
mostly backwards-compatible with pre-8.1 behavior, and it seems like the
goalposts have moved enough that maybe it's time to break compatibility.

regards, tom lane

#6Magnus Hagander
magnus@hagander.net
In reply to: Tom Lane (#5)
Re: Intermittent regression test failure from index-only scans patch

On Sun, Oct 9, 2011 at 06:34, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

On Oct 8, 2011, at 11:04 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I'm inclined to fix this by changing the test to examine idx_tup_read
not idx_tup_fetch.  Alternatively, we could have the test force
enable_indexonlyscan off.  Thoughts?

No preference.

I ended up doing it the second way (ie enable_indexonlyscan = off)
because it turns out that pg_stat_user_tables doesn't have the
idx_tup_read column --- we track that count per index, not per table.
I could have complicated the test's stats queries some more, but it
seemed quite not relevant to the goals of the test.

Should we have another counter for heap fetches avoided?  Seems like that could be useful to know.

Hm.  I'm hesitant to add another per-table (or per index?) statistics
counter because of the resultant bloat in the stats file.  But it
wouldn't be a bad idea for somebody to take two steps back and rethink
what we're counting in this area.  The current counter definitions are
mostly backwards-compatible with pre-8.1 behavior, and it seems like the
goalposts have moved enough that maybe it's time to break compatibility.

We certainly need *some* way to figure out if this has been used,
IMHO. So yeah, if the current way doesn't scale enough, we need to
think of some other way. But I'm not sure one more counter would
really bloat it that bad? OTOH, repeating that reasoning enough time
will eventually make it enough to care about...

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: Magnus Hagander (#6)
Re: Intermittent regression test failure from index-only scans patch

Magnus Hagander <magnus@hagander.net> writes:

On Sun, Oct 9, 2011 at 06:34, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

Should we have another counter for heap fetches avoided? �Seems like that could be useful to know.

Hm. �I'm hesitant to add another per-table (or per index?) statistics
counter because of the resultant bloat in the stats file.

We certainly need *some* way to figure out if this has been used,
IMHO. So yeah, if the current way doesn't scale enough, we need to
think of some other way. But I'm not sure one more counter would
really bloat it that bad? OTOH, repeating that reasoning enough time
will eventually make it enough to care about...

You can already tell whether it's happening by comparing idx_tup_read
versus idx_tup_fetch. Now that measure does conflate some things, like
whether the tuple was not read at all or was read and rejected as not
visible, but I'm not at all convinced that another counter is worth its
weight. If invisible tuples are a significant part of the table then
index-only scanning isn't going to be very useful to you anyway.

regards, tom lane