Shared buffer hash table corrupted
Hi All,
Running 9.6.15, this morning we got a 'shared buffer hash table corrupted'
error on a query. I reran the query a couple hours later, and it completed
without error. This is running in production on a Linode instance which
hasn't seen any config changes in months.
I didn't find much on-line about this. How concerned should I be? Would you
move the instance to a different physical host?
Thanks,
Mark
Mark Fletcher <markf@corp.groups.io> writes:
> Running 9.6.15, this morning we got a 'shared buffer hash table corrupted'
> error on a query. I reran the query a couple hours later, and it completed
> without error. This is running in production on a Linode instance which
> hasn't seen any config changes in months.
> I didn't find much on-line about this. How concerned should I be? Would you
> move the instance to a different physical host?
Personally, I'd restart the postmaster, but not do more than that unless
the error recurs.
regards, tom lane
On Fri, Feb 21, 2020 at 2:53 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Personally, I'd restart the postmaster, but not do more than that unless
> the error recurs.
Thanks for the response. I did restart the postmaster yesterday. Earlier
this morning, a query that normally completes fine started to error out
with 'invalid memory alloc request size 18446744073709551613'. Needless to
say our database isn't quite that size. This query was against a table in a
different database than the one that had the corruption warning yesterday.
Restarting the postmaster again fixed the problem. For good measure I
restarted the machine as well.
I need to decide what to do next, if anything. We have a hot standby that
we also run queries against, and it hasn't shown any errors. I can switch
over to that as the primary. Or I can move the main database to a different
physical host.
Thoughts appreciated.
Thanks,
Mark
Mark Fletcher <markf@corp.groups.io> writes:
> Thanks for the response. I did restart the postmaster yesterday. Earlier
> this morning, a query that normally completes fine started to error out
> with 'invalid memory alloc request size 18446744073709551613'. Needless to
> say our database isn't quite that size. This query was against a table in a
> different database than the one that had the corruption warning yesterday.
> Restarting the postmaster again fixed the problem. For good measure I
> restarted the machine as well.
Um. At that point I'd agree with your concern about developing hardware
problems. Both of these symptoms could be easily explained by dropped
bits in PG's shared memory area. Do you happen to know if the server
has ECC RAM?
regards, tom lane
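
A side note on the number in that second error: 18446744073709551613 is 2^64 - 3, which is the bit pattern of -3 read as an unsigned 64-bit size. The Python sketch below just checks that arithmetic; the helper name as_signed_64 is made up for illustration. It is consistent with a length value that held garbage, but it says nothing on its own about where the garbage came from.

```python
# The request size reported in the error message.
reported = 18446744073709551613


def as_signed_64(value: int) -> int:
    """Reinterpret an unsigned 64-bit integer as two's-complement signed."""
    if not 0 <= value < 2**64:
        raise ValueError("value does not fit in 64 bits")
    return value - 2**64 if value >= 2**63 else value


print(hex(reported))           # bit pattern of the reported size
print(as_signed_64(reported))  # the same bits read as a signed integer
print(2**64 - reported)        # how far the value sits below 2**64
```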
On Sat, Feb 22, 2020 at 9:34 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Um. At that point I'd agree with your concern about developing hardware
> problems. Both of these symptoms could be easily explained by dropped
> bits in PG's shared memory area. Do you happen to know if the server
> has ECC RAM?

Yes, it appears that Linode uses ECC and other server grade hardware for
their machines.
Thanks,
Mark
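
On the ECC question: where the Linux kernel's EDAC driver is loaded, it exposes corrected and uncorrected memory error counters under /sys/devices/system/edac/mc. The sketch below reads them. It assumes a Linux host with EDAC available; a virtualized guest such as a Linode instance may well not expose these files, in which case only the provider can see them on the physical host. The function name read_edac_counts is hypothetical.

```python
from pathlib import Path


def read_edac_counts(edac_root="/sys/devices/system/edac/mc"):
    """Return {controller: {"ce_count": n, "ue_count": n}} from EDAC sysfs.

    Returns an empty dict when the EDAC directory is absent (driver not
    loaded, or a guest that does not expose it).
    """
    root = Path(edac_root)
    counts = {}
    if not root.is_dir():
        return counts
    for mc in sorted(root.glob("mc[0-9]*")):
        entry = {}
        for name in ("ce_count", "ue_count"):
            try:
                entry[name] = int((mc / name).read_text().strip())
            except (OSError, ValueError):
                entry[name] = None  # counter missing or unreadable
        counts[mc.name] = entry
    return counts


if __name__ == "__main__":
    found = read_edac_counts()
    if not found:
        print("No EDAC memory controllers visible on this system")
    for controller, entry in found.items():
        print(controller, "corrected:", entry["ce_count"],
              "uncorrected:", entry["ue_count"])
```

A nonzero ce_count would mean ECC has been correcting errors, and a nonzero ue_count would mean errors it could not correct; all zeros does not rule out a hardware fault elsewhere.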