Intermittent failure in InstallCheck-C "stat" test

Started by Thomas Munro, about 7 years ago; 5 messages; hackers
#1 Thomas Munro
thomas.munro@gmail.com

Hi,

Just now, and also once 5-and-a-bit days ago, flaviventris failed like
this, as did filefish 41 days ago[1] (there may be more, I just
checked a random sample of InstallCheck-C failures accessible via the
web interface):

  WHERE relname like 'trunc_stats_test%' order by relname;
       relname      | n_tup_ins | n_tup_upd | n_tup_del | n_live_tup | n_dead_tup
 -------------------+-----------+-----------+-----------+------------+------------
- trunc_stats_test  |         3 |         0 |         0 |          0 |          0
- trunc_stats_test1 |         4 |         2 |         1 |          1 |          0
- trunc_stats_test2 |         1 |         0 |         0 |          1 |          0
- trunc_stats_test3 |         4 |         0 |         0 |          2 |          2
- trunc_stats_test4 |         2 |         0 |         0 |          0 |          2
+ trunc_stats_test  |         0 |         0 |         0 |          0 |          0
+ trunc_stats_test1 |         0 |         0 |         0 |          0 |          0
+ trunc_stats_test2 |         0 |         0 |         0 |          0 |          0
+ trunc_stats_test3 |         0 |         0 |         0 |          0 |          0
+ trunc_stats_test4 |         0 |         0 |         0 |          0 |          0
 (5 rows)
 SELECT st.seq_scan >= pr.seq_scan + 1,
@@ -180,7 +180,7 @@
  WHERE st.relname='tenk2' AND cl.relname='tenk2';
  ?column? | ?column? | ?column? | ?column?
 ----------+----------+----------+----------
- t        | t        | t        | t
+ f        | f        | f        | f
 (1 row)
 SELECT st.heap_blks_read + st.heap_blks_hit >= pr.heap_blks + cl.relpages,
@@ -189,7 +189,7 @@
  WHERE st.relname='tenk2' AND cl.relname='tenk2';
  ?column? | ?column?
 ----------+----------
- t        | t
+ t        | f
 (1 row)

[1]: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=filefish&dt=2019-02-23%2009%3A53%3A11

--
Thomas Munro
https://enterprisedb.com

#2 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Thomas Munro (#1)
Re: Intermittent failure in InstallCheck-C "stat" test

Thomas Munro <thomas.munro@gmail.com> writes:

> Just now, and also once 5-and-a-bit days ago, flaviventris failed like
> this, as did filefish 41 days ago[1] (there may be more, I just
> checked a random sample of InstallCheck-C failures accessible via the
> web interface):

This sort of thing has pretty much always happened. I believe it is
just down to the designed-in unreliability of the current stats collection
mechanism. We might be able to get rid of it if we go over to
shared-memory stats, but I've yet to look at that patch :-(. In the
meantime I don't see any reason to think that anything's worse here
than it has been for many years.

regards, tom lane

#3 Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#2)
Re: Intermittent failure in InstallCheck-C "stat" test

Hi,

On 2019-04-05 18:19:17 -0400, Tom Lane wrote:

> We might be able to get rid of it if we go over to shared-memory
> stats, but I've yet to look at that patch :-(.

I did a few review cycles on it, and while I believe the concept is
sound, I think it needs a good bit more time to mature. Not
realistically doable for v12.

Greetings,

Andres Freund

#4 Thomas Munro
thomas.munro@gmail.com
In reply to: Tom Lane (#2)
Re: Intermittent failure in InstallCheck-C "stat" test

On Sat, Apr 6, 2019 at 11:19 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Thomas Munro <thomas.munro@gmail.com> writes:

> > Just now, and also once 5-and-a-bit days ago, flaviventris failed like
> > this, as did filefish 41 days ago[1] (there may be more, I just
> > checked a random sample of InstallCheck-C failures accessible via the
> > web interface):

> This sort of thing has pretty much always happened. I believe it is
> just down to the designed-in unreliability of the current stats collection
> mechanism. We might be able to get rid of it if we go over to
> shared-memory stats, but I've yet to look at that patch :-(. In the
> meantime I don't see any reason to think that anything's worse here
> than it has been for many years.

Does it imply that the kernel dropped a UDP packet to localhost?

--
Thomas Munro
https://enterprisedb.com

#5 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Thomas Munro (#4)
Re: Intermittent failure in InstallCheck-C "stat" test

Thomas Munro <thomas.munro@gmail.com> writes:

> On Sat, Apr 6, 2019 at 11:19 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

> > This sort of thing has pretty much always happened. I believe it is
> > just down to the designed-in unreliability of the current stats collection
> > mechanism. We might be able to get rid of it if we go over to
> > shared-memory stats, but I've yet to look at that patch :-(. In the
> > meantime I don't see any reason to think that anything's worse here
> > than it has been for many years.

> Does it imply that the kernel dropped a UDP packet to localhost?

That's a possible explanation, anyway. The problem shows up seldom enough
that it's hard to say that conclusively. So *maybe* there's a bug here
we could actually fix, but again, without any way to repro it, it's hard
to say much.

regards, tom lane
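
To make the question from #4 concrete: UDP carries no delivery guarantee,
even over the loopback interface, and a kernel may discard datagrams when
the receiving socket's buffer is full. The sketch below is only an
illustration of that general behaviour, not PostgreSQL's stats collector
code. The function name count_delivered and all of its parameters are
made up for this example, and whether any datagrams are actually lost
depends on the operating system and its buffer limits.

```python
import socket


def count_delivered(n_datagrams=2000, payload_size=1024, rcvbuf=4096):
    """Send datagrams to a loopback UDP socket that nobody is reading,
    then drain the socket and report (sent, delivered).

    Hypothetical helper for illustration only.
    """
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    # Ask for a small receive buffer so it fills up quickly. The kernel
    # may round this value up or enforce its own minimum.
    receiver.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, rcvbuf)
    receiver.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    receiver.setblocking(False)
    addr = receiver.getsockname()

    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    payload = b"x" * payload_size

    sent = 0
    for _ in range(n_datagrams):
        try:
            sender.sendto(payload, addr)
            sent += 1
        except OSError:
            # Some platforms report an error (for example ENOBUFS)
            # rather than dropping the datagram silently.
            pass

    # Drain whatever the kernel kept queued for the receiver.
    delivered = 0
    while True:
        try:
            receiver.recv(payload_size)
            delivered += 1
        except BlockingIOError:
            break  # nothing more is queued right now

    sender.close()
    receiver.close()
    return sent, delivered


if __name__ == "__main__":
    sent, delivered = count_delivered()
    print(f"sent={sent} delivered={delivered} lost={sent - delivered}")
```

The sender gets no indication of which datagrams were discarded, which is
why a loss of this kind would show up only indirectly, for example as
counters that never advance.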