incorrect resource manager data checksum in record
I've been seeing this issue in multiple separate hot standby replication
chains of PostgreSQL servers (5 so far). There are 4 servers in each chain
(some running Ubuntu 14.04 and others Ubuntu 16.04, with PostgreSQL >= 10.1
and <= 11). We also have a mix of ext4 and ZFS file systems. Here are the
details for each chain:
First chain
===========
dc1-pg105 (pg 10.1, ub 14.04.5) (primary)
|
V
dc1-pg205 (pg 10.3, ub 16.04.4)
|
V
dc2-pg105 (pg 10.1, ub 14.04.5) <-- error first occurs here
|
V
dc2-pg205 (pg 10.3, ub 16.04.4) <-- and also affects this node
Second chain
===========
dc1-pg106 (pg 10.1, ub 14.04.5, ext4) (primary)
|
V
dc1-pg206 (pg 10.3, ub 16.04.4, zfs)
|
V
dc2-pg106 (pg 10.1, ub 14.04.5, ext4) <-- error first occurs here
|
V
dc2-pg206 (pg 10.3, ub 16.04.4, zfs) <-- and also affects this node
Third chain
===========
dc1-pg107 (pg 10.1, ub 14.04.5, ext4) (primary)
|
V
dc1-pg207 (pg 10.3, ub 16.04.4, zfs)
|
V
dc2-pg107 (pg 10.1, ub 14.04.5, ext4) <-- error first occurs here
|
V
dc2-pg207 (pg 10.3, ub 16.04.4, zfs) <-- and also affects this node
Fourth chain
===========
dc1-pg108 (pg 10.3, ub 16.04.4, ext4) (primary)
|
V
dc1-pg208 (pg 10.3, ub 16.04.4, zfs)
|
V
dc2-pg108 (pg 10.3, ub 16.04.4, ext4) <-- error first occurs here
|
V
dc2-pg208 (pg 10.3, ub 16.04.4, zfs) <-- and also affects this node
Fifth chain
===========
dc1-pg110 (pg 10.3, ub 16.04.4, ext4) (primary)
|
V
dc1-pg210 (pg 10.3, ub 16.04.4, zfs)
|
V
dc2-pg110 (pg 10.3, ub 16.04.4, ext4) <-- error first occurs here
|
V
dc2-pg210 (pg 10.3, ub 16.04.4, zfs) <-- and also affects this node
The pattern is the same, regardless of Ubuntu or PostgreSQL versions. I'm
concerned this is somehow a ZFS corruption bug, because the error always
occurs downstream of the first ZFS node, and ZFS is a recent addition. I
don't know enough about what this error means, and haven't found much
online. When I restart the affected nodes, replication resumes normally,
with no side effects that I've discovered so far, but I'm no longer
confident that the data downstream from the primary is valid. I'm really
not sure how best to start tackling this issue, and am hoping to get some
guidance. The error is infrequent: we have 11 replication chains in total,
and this error has occurred on 5 of those chains in approximately 2 months.
It's possible and sometimes expected to see that error when there has been
a crash, but you didn't mention that. From your description it sounds like
it's happening in the middle of streaming, right? My first thought was
that the filesystem change is surely a red herring. But... I did find this
similar complaint that involves an ext4 primary and a btrfs replica:
I'm having trouble imagining how the filesystem could be triggering a
problem though (unless ZoL is dramatically less stable than on other
operating systems, "ZFS ate my bytes" seems like a super unlikely theory).
Perhaps by being slower, it triggers a bug elsewhere? We did have a report
recently of ZFS recycling WAL files very slowly (presumably because when it
moves the old file to become the new file, it finishes up slurping it back
into memory even though we're just going to overwrite it, and it can't see
that because our writes don't line up with the ZFS record size, possibly
unlike ye olde write-in-place 4k block filesystems, but that's just my
guess). Does your machine have ECC RAM?
--
Thomas Munro
http://www.enterprisedb.com
From your description it sounds like it's happening in the middle of
streaming, right?
Correct. None of the instances in the chain experience a crash. Most of the
time I see the "incorrect resource manager data checksum in record" error,
but I've also seen it manifested as:
invalid magic number 8813 in log segment 000000030000AEC20000009C, offset
15335424
I did find this similar complaint that involves an ext4 primary and a
btrfs replica:
It is interesting that my issue occurs on the first hop from ZFS to ext4. I
have not seen any instances of this happening going from the ext4 primary
to the first ZFS replica.
We did have a report recently of ZFS recycling WAL files very slowly
Do you know what version of ZFS that affected? We're currently on 0.6.5.6,
but could upgrade to 0.7.5 on Ubuntu 18.04.
Does your machine have ECC RAM?
Yes, all the servers have registered ECC RAM.
---
I'm considering changing the replication configuration from:
ext4 -> zfs -> ext4 -> zfs
to
ext4 -> zfs -> zfs -> ext4
If the issue only occurs downstream of ZFS, this will give me twice as many
chances for it to occur, and I would expect to see some instances where
only the last ext4 node is affected, and some where both the last ZFS and
the last ext4 nodes are affected. Not sure how much it helps, but at least I
might be able to collect more data until I find a reliable way to reproduce.
---
FYI, I'd be happy to discuss paid consulting if this is an issue in your
wheelhouse and that's something you're interested in.
On Fri, Jun 29, 2018 at 1:14 PM, Devin Christensen <quixoten@gmail.com> wrote:
From your description it sounds like it's happening in the middle of
streaming, right?
Correct. None of the instances in the chain experience a crash. Most of the
time I see the "incorrect resource manager data checksum in record" error,
but I've also seen it manifested as:
invalid magic number 8813 in log segment 000000030000AEC20000009C, offset
15335424
I note that that isn't at a segment boundary. Is that also the case
for the other error?
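For what it's worth, the boundary observation is easy to check with a little arithmetic (a sketch, assuming the default 16 MB WAL segment and 8 KB WAL page sizes for these PostgreSQL versions; the offset is the one from the error message above):

```python
# Defaults for PostgreSQL 10/11: 16 MB WAL segments, 8 KB WAL pages.
WAL_SEG_SIZE = 16 * 1024 * 1024
WAL_PAGE_SIZE = 8192

offset = 15335424  # offset from the "invalid magic number" error

# Nonzero remainder against the segment size means the failure is
# mid-segment, not at a segment switch; a zero remainder against the
# page size means it falls exactly on a WAL page boundary.
print(offset % WAL_SEG_SIZE)   # mid-segment
print(offset % WAL_PAGE_SIZE)  # page-aligned
```

So the bad magic number shows up at the start of a WAL page partway through the segment, which is consistent with the failure happening mid-stream rather than at a segment switch.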
One theory would be that there is a subtle FS cache coherency problem
between writes and reads of a file from different processes
(causality), on that particular stack. Maybe not too many programs
pass data through files with IPC to signal progress in this kinda
funky way, but that'd certainly be a violation of POSIX if it didn't
work correctly and I think people would know about that so I feel a
bit silly suggesting it. To follow that hypothesis to the next step:
I suppose it succeeds after you restart because it requests the whole
segment again and gets a coherent copy all the way down the chain.
Another idea would be that our flush pointer tracking and IPC is
somehow subtly wrong and that's exposed by different timing leading to
incoherent reads, but I feel like we would know about that by now too.
I'm not really a replication expert, so I could be missing something
simple here. Anyone?
I did find this similar complaint that involves an ext4 primary and a
btrfs replica:
It is interesting that my issue occurs on the first hop from ZFS to ext4. I
have not seen any instances of this happening going from the ext4 primary to
the first ZFS replica.
I happen to have a little office server that uses ZFS so I left it
chugging through a massive pgbench session with a chain of 3 replicas
while I worked on other stuff, and didn't see any problems (no ext4
involved though, this is a FreeBSD box). I also tried
--wal-segsize=1MB (a feature of 11) to get some more frequent
recycling happening just in case it was relevant.
We did have a report recently of ZFS recycling WAL files very slowly
Do you know what version of ZFS that affected? We're currently on 0.6.5.6,
but could upgrade to 0.7.5 on Ubuntu 18.04.
I think that issue is fundamental/all versions, and has something to
with the record size (if you have 128KB ZFS records and someone writes
8KB, it probably needs to read a whole 128KB record in, whereas with
ext4 et al you have 4KB blocks and the OS can very often skip reading
it in because it can see you're entirely overwriting blocks), and
possibly the COW design too (I dunno). Here's the recent thread,
which points back to an older one, from some Joyent guys who I gather
are heavy ZFS users:
/messages/by-id/CACPQ5FpEY9CfUF6XKs5sBBuaOoGEiO8KD4SuX06wa4ATsesaqg@mail.gmail.com
There was a ZoL bug that made headlines recently but that was in 0.7.7
so not relevant to your case.
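The record-size effect can be illustrated with a little arithmetic (a sketch only; 128 KB is the ZFS default recordsize, and `records_touched` is a hypothetical helper, not anything from ZFS or PostgreSQL):

```python
def records_touched(offset, nbytes, recordsize=128 * 1024):
    """Count the ZFS records a write overlaps. Any record the write
    doesn't fully cover must be read in before it can be rewritten."""
    first = offset // recordsize
    last = (offset + nbytes - 1) // recordsize
    return last - first + 1

# An 8 KB WAL write always lands inside one or two 128 KB records and
# never covers a whole one, so ZFS has to read the record back in
# before modifying it, whereas a write-in-place filesystem with 4 KB
# blocks can simply overwrite two fully-covered, aligned blocks.
print(records_touched(0, 8192))       # one record touched, none covered
print(records_touched(126976, 8192))  # straddles a record boundary: two
```

That read-modify-write is the plausible mechanism behind the slow WAL recycling report, independent of any corruption question.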
--
Thomas Munro
http://www.enterprisedb.com
Hi,
I'm looking at pgloader to automate data loading into a PostGIS-enabled Postgres database.
I have seen in the tutorial how the internal point type is supported, and how the PostGIS extension can be pre-installed by pgloader if necessary, but I can't see how I might take x & y values and save them as a PostGIS point. I guess I could do this via a trigger, but that is hopefully an unnecessary workaround.
if I read data in (say from CSV):
1,tan0104,1,173.567,-43.678
...
to a Postgis table:
create table station
(id int primary key,
trip char(7),
station_no int,
lon_s decimal(7,4),
lat_s decimal(6,4),
startp geometry(POINT,4326));
the startp column is populated by the SQL:
startp=ST_SetSRID(ST_MakePoint(lon_s,lat_s),4326)
This creates a point feature from the x (lon) & y (lat) coordinates,
and specifies the coordinate reference system as lat/lon degrees WGS84 (EPSG:4326) to match the column specification.
How can I implement that in the pgloader command to load the CSV file?
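In the meantime, one interim approach is a small script that rewrites each CSV row as an INSERT using the expression above (a sketch assuming the sample row's column order; `csv_to_inserts` is a hypothetical helper, and real code should use parameterized statements or COPY rather than string interpolation):

```python
import csv
import io

# Hypothetical sample matching the CSV layout in the message:
# id, trip, station_no, lon_s, lat_s
sample = "1,tan0104,1,173.567,-43.678\n"

def csv_to_inserts(fh):
    """Turn station CSV rows into INSERT statements that populate
    startp directly via ST_SetSRID(ST_MakePoint(lon, lat), 4326).
    Illustration only: values are interpolated without escaping."""
    for id_, trip, station_no, lon, lat in csv.reader(fh):
        yield (
            "INSERT INTO station (id, trip, station_no, lon_s, lat_s, startp) "
            f"VALUES ({id_}, '{trip}', {station_no}, {lon}, {lat}, "
            f"ST_SetSRID(ST_MakePoint({lon}, {lat}), 4326));"
        )

for stmt in csv_to_inserts(io.StringIO(sample)):
    print(stmt)
```

A post-load UPDATE that applies the same ST_SetSRID(ST_MakePoint(...)) expression to the already-loaded lon_s/lat_s columns would work equally well.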
Thanks
Brent Wood
Programme leader: Environmental Information Delivery
NIWA
DDI: +64 (4) 3860529
Hi Brent,
Yes I think it's possible, simply playing with the TARGET COLUMNS clause of the pgloader command. Would you mind opening an issue on Github, where I track bug fixes and user requests, so that our conversation is then publicly archived and available to other PostGIS and pgloader users?
Thanks,
--
Dimitri Fontaine
dim@tapoueh.org