Cosmic ray hits integerset

Started by Thomas Munro over 4 years ago, 6 messages
#1Thomas Munro
thomas.munro@gmail.com

Hi,

Here's a curious one-off failure in test_integerset:

+ERROR: iterate returned wrong value; got 519985430528, expected 485625692160

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=rhinoceros&dt=2021-04-01%2018:19:47

#2Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Thomas Munro (#1)
Re: Cosmic ray hits integerset

On 2021-Jun-22, Thomas Munro wrote:

Hi,

Here's a curious one-off failure in test_integerset:

+ERROR: iterate returned wrong value; got 519985430528, expected 485625692160

Cosmic rays indeed. The base-2 representation of the expected value is
111000100010001100011000000000000000000
and that of the actual value is
111100100010001100011000000000000000000

There's a single bit of difference.

--
Álvaro Herrera Valdivia, Chile
"There is no man who does not aspire to fullness, that is,
the sum of experiences of which a man is capable"
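
The single-bit claim above can be checked mechanically by XOR-ing the two values from the error message. The following is a minimal sketch, not code from PostgreSQL or the test suite; the helper name `flipped_bits` is made up for illustration.

```python
# Values copied from the test_integerset error message.
expected = 485625692160
actual = 519985430528


def flipped_bits(a: int, b: int) -> list[int]:
    """Return the 0-based positions (LSB = 0) at which a and b differ."""
    diff = a ^ b  # a bit is set in diff exactly where a and b disagree
    return [i for i in range(diff.bit_length()) if (diff >> i) & 1]


positions = flipped_bits(expected, actual)
print(positions)  # → [35]
```

A result with exactly one position means the two values differ in one bit, here bit 35 counting from the least significant bit as bit 0.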

#3Andrey Borodin
x4mmm@yandex-team.ru
In reply to: Alvaro Herrera (#2)
Re: Cosmic ray hits integerset

On 22 June 2021, at 19:21, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

On 2021-Jun-22, Thomas Munro wrote:

Hi,

Here's a curious one-off failure in test_integerset:

+ERROR: iterate returned wrong value; got 519985430528, expected 485625692160

Cosmic rays indeed. The base-2 representation of the expected value is
111000100010001100011000000000000000000
and that of the actual value is
111100100010001100011000000000000000000

There's a single bit of difference.

I've tried to explain this as something other than a single-event upset, for example an integer overflow somewhere in the 30-bit mode of simple8b, but I have found nothing so far. The actual error is in bit 35, and the next mode up is the 60-bit mode.

Looks like a cosmic ray to me too.

Best regards, Andrey Borodin.
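
To make the bit-width argument concrete: a value with bit 35 set needs 36 bits, so it cannot fit in a 30-bit slot and would only fit in the 60-bit one. The sketch below illustrates just that arithmetic. The list of slot widths is an assumption based on the usual Simple-8b scheme (the thread itself only mentions the 30-bit and 60-bit modes), and `ASSUMED_WIDTHS` and `narrowest_width` are hypothetical names, not PostgreSQL identifiers.

```python
# Assumed per-value slot widths of a Simple-8b style encoder (not taken
# from the thread; only the 30- and 60-bit modes are mentioned there).
ASSUMED_WIDTHS = [0, 1, 2, 3, 4, 5, 6, 7, 8, 10, 12, 15, 20, 30, 60]


def narrowest_width(value: int) -> int:
    """Return the smallest assumed slot width that can hold value."""
    need = value.bit_length()  # number of bits required to represent value
    for width in ASSUMED_WIDTHS:
        if width >= need:
            return width
    raise ValueError("value does not fit in any slot width")


print(narrowest_width((1 << 30) - 1))  # largest value that fits in 30 bits
print(narrowest_width(1 << 35))        # a value with only bit 35 set
```

Under these assumptions, an overflow confined to a 30-bit slot would not naturally explain a stray bit 35, which is consistent with the conclusion above.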

#4Jakub Wartak
Jakub.Wartak@tomtom.com
In reply to: Alvaro Herrera (#2)
RE: Cosmic ray hits integerset

Hi, asking out of pure technical curiosity about "the rhinoceros": what kind of animal is it? Physical box or VM? How could one get dmidecode(1) / dmesg(1) / mcelog(1) output from it (e.g. does it run ECC or not)?

-J.

#5Joe Conway
mail@joeconway.com
In reply to: Jakub Wartak (#4)
Re: Cosmic ray hits integerset

On 7/7/21 2:53 AM, Jakub Wartak wrote:

Hi, asking out of pure technical curiosity about "the rhinoceros": what kind of animal is it? Physical box or VM? How could one get dmidecode(1) / dmesg(1) / mcelog(1) output from it (e.g. does it run ECC or not)?

Rhinoceros is just a VM on a simple desktop machine. Nothing fancy.

Joe

--
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development

#6Greg Stark
stark@mit.edu
In reply to: Joe Conway (#5)
Re: Cosmic ray hits integerset

FWIW, yes, it could be a cosmic ray.

It could also just be marginally bad RAM. Bad RAM is notoriously hard
to test for reliably. It can be very sensitive to the exact bit
pattern stored in it, the timing of reads and writes, and other
factors. The whole point of the Rowhammer attacks is to push some of
those timing factors hard, but the same failures can happen randomly.

--
greg