[BUG] pg9.4.10 Logical decoding did not get the correct oldtuplelen

Started by 158306855over 9 years ago8 messagesbugs
Jump to latest
#1158306855
anderson2013@qq.com

hi team:

I found a postgresql 9.4.10 Logical decoding problem
heapam.c:6976 xlog store incorrect oldtuplen when tuplelen is greater than the value field of uint16.

a uint32 variable is assigned to uint16

The structure that holds the oldkey length is:
typedef struct xl_heap_header_len
{
uint16 t_len;
xl_heap_header header;
} xl_heap_header_len;

The problem will lead to logic decoding failure when oldtuplene > 65535 and relreplident = FULL.
The table that triggers the problem is usually relreplident = FULL.

#2Michael Paquier
michael@paquier.xyz
In reply to: 158306855 (#1)
Re: [BUG] pg9.4.10 Logical decoding did not get the correct oldtuplelen

On Tue, Dec 27, 2016 at 12:30 PM, anderson <anderson2013@qq.com> wrote:

I found a postgresql 9.4.10 Logical decoding problem
heapam.c:6976 xlog store incorrect oldtuplen when tuplelen is greater than
the value field of uint16.

a uint32 variable is assigned to uint16

The structure that holds the oldkey length is:
typedef struct xl_heap_header_len
{
uint16 t_len;
xl_heap_header header;
} xl_heap_header_len;

The problem will lead to logic decoding failure when oldtuplene > 65535 and
relreplident = FULL.
The table that triggers the problem is usually relreplident = FULL.

Do you have a test case able to reproduce the problem?
--
Michael

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#3158306855
anderson2013@qq.com
In reply to: Michael Paquier (#2)

yes i can.

reproduce the step:

1. create table testcase(a int,b int,c text,d text);

2. ALTER TABLE ONLY testcase REPLICA IDENTITY FULL;

3. /u01/pgsql/bin/psql -d postgres -p5559 -c "copy testcase from ‘/tmp/testcase.csv' DELIMITER E'\t' csv QUOTE '''' ";

4. select pg_create_logical_replication_slot('logical_slot','test_decoding');

--update one row
5. update testcase set b = 1;

6. select count(*) from pg_logical_slot_peek_binary_changes('logical_slot',NULL,NULL);

7. postgres=# select count(*) from pg_logical_slot_peek_binary_changes('logical_slot',NULL,NULL);
ERROR: compressed data is corrupt

------------------ 原始邮件 ------------------
发件人: "Michael Paquier";<michael.paquier@gmail.com>;
发送时间: 2016年12月27日(星期二) 中午1:29
收件人: "anderson"<anderson2013@qq.com>;
抄送: "pgsql-bugs"<pgsql-bugs@postgresql.org>;
主题: Re: [BUGS] [BUG] pg9.4.10 Logical decoding did not get the correct oldtuplelen

On Tue, Dec 27, 2016 at 12:30 PM, anderson <anderson2013@qq.com> wrote:

I found a postgresql 9.4.10 Logical decoding problem
heapam.c:6976 xlog store incorrect oldtuplen when tuplelen is greater than
the value field of uint16.

a uint32 variable is assigned to uint16

The structure that holds the oldkey length is:
typedef struct xl_heap_header_len
{
uint16 t_len;
xl_heap_header header;
} xl_heap_header_len;

The problem will lead to logic decoding failure when oldtuplene > 65535 and
relreplident = FULL.
The table that triggers the problem is usually relreplident = FULL.

Do you have a test case able to reproduce the problem?
--
Michael

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#4158306855
anderson2013@qq.com
In reply to: 158306855 (#3)
Re: [BUG] pg9.4.10 Logical decoding did not get the correct oldtuplelen

yes i can.

reproduce the step:

1. create table testcase(a int,b int,c text,d text);

2. ALTER TABLE ONLY testcase REPLICA IDENTITY FULL;

3. psql -d postgres -p5559 -c "copy testcase from ‘/tmp/testcase.csv' DELIMITER E'\t' csv QUOTE '''' ";

4. select pg_create_logical_replication_slot('logical_slot','test_decoding');

--update one row
5. update testcase set b = 1;

6. select count(*) from pg_logical_slot_peek_binary_changes('logical_slot',NULL,NULL);
ERROR: compressed data is corrupt

--The data length is incorrect and decompression fails

------------------ 原始邮件 ------------------
发件人: "anderson";<anderson2013@qq.com>;
发送时间: 2016年12月28日(星期三) 晚上7:24
收件人: "Michael Paquier"<michael.paquier@gmail.com>;
抄送: "pgsql-bugs"<pgsql-bugs@postgresql.org>;
主题: [BUG] pg9.4.10 Logical decoding did not get the correct oldtuplelen

yes i can.

reproduce the step:

1. create table testcase(a int,b int,c text,d text);

2. ALTER TABLE ONLY testcase REPLICA IDENTITY FULL;

3. /u01/pgsql/bin/psql -d postgres -p5559 -c "copy testcase from ‘/tmp/testcase.csv' DELIMITER E'\t' csv QUOTE '''' ";

4. select pg_create_logical_replication_slot('logical_slot','test_decoding');

--update one row
5. update testcase set b = 1;

6. select count(*) from pg_logical_slot_peek_binary_changes('logical_slot',NULL,NULL);

7. postgres=# select count(*) from pg_logical_slot_peek_binary_changes('logical_slot',NULL,NULL);
ERROR: compressed data is corrupt

------------------ 原始邮件 ------------------
发件人: "Michael Paquier";<michael.paquier@gmail.com>;
发送时间: 2016年12月27日(星期二) 中午1:29
收件人: "anderson"<anderson2013@qq.com>;
抄送: "pgsql-bugs"<pgsql-bugs@postgresql.org>;
主题: Re: [BUGS] [BUG] pg9.4.10 Logical decoding did not get the correct oldtuplelen

On Tue, Dec 27, 2016 at 12:30 PM, anderson <anderson2013@qq.com> wrote:

I found a postgresql 9.4.10 Logical decoding problem
heapam.c:6976 xlog store incorrect oldtuplen when tuplelen is greater than
the value field of uint16.

a uint32 variable is assigned to uint16

The structure that holds the oldkey length is:
typedef struct xl_heap_header_len
{
uint16 t_len;
xl_heap_header header;
} xl_heap_header_len;

The problem will lead to logic decoding failure when oldtuplene > 65535 and
relreplident = FULL.
The table that triggers the problem is usually relreplident = FULL.

Do you have a test case able to reproduce the problem?
--
Michael

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Attachments:

testcase.csv.zipapplication/octet-stream; charset=gb18030; name=testcase.csv.zipDownload+2-3
#5Michael Paquier
michael@paquier.xyz
In reply to: 158306855 (#4)
Re: [BUG] pg9.4.10 Logical decoding did not get the correct oldtuplelen

On Wed, Dec 28, 2016 at 8:32 PM, anderson <anderson2013@qq.com> wrote:

reproduce the step:

1. create table testcase(a int,b int,c text,d text);

2. ALTER TABLE ONLY testcase REPLICA IDENTITY FULL;

3. psql -d postgres -p5559 -c "copy testcase from ‘/tmp/testcase.csv'
DELIMITER E'\t' csv QUOTE '''' ";

4. select
pg_create_logical_replication_slot('logical_slot','test_decoding');

--update one row
5. update testcase set b = 1;

6. select count(*) from
pg_logical_slot_peek_binary_changes('logical_slot',NULL,NULL);
ERROR: compressed data is corrupt
--The data length is incorrect and decompression fails

OK, I can see the problem. And I can see as well that things have been
tackled in this area with 9.5 thanks to commit 2c03216d...
--
Michael

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#6Michael Paquier
michael@paquier.xyz
In reply to: Michael Paquier (#5)
Re: [BUG] pg9.4.10 Logical decoding did not get the correct oldtuplelen

On Sun, Jan 1, 2017 at 9:06 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:

OK, I can see the problem. And I can see as well that things have been
tackled in this area with 9.5 thanks to commit 2c03216d...

(Adding Andres in CC for some input on the matter)
Looking at that... As HeapTupleData->t_len is uint32, it is really an
oversight in the 9.4 WAL logging machinery that xl_heap_header_len
uses uint16. If xl_heap_update->flags still had a bit free, we could
have for example use XLOG_HEAP_CONTAINS_OLD_TUPLE to track the
existing short version of xl_heap_header_len and add a new bit for a
"long" version where a version of xl_heap_header_len including uint32
as t_len is read. This would preserve replay from doing stupid things
for existing 9.4 deployments and allow new records to work fully. But
there are no bits free here. As well as there are no bits for
XLOG_HEAP[1|2]. Andres, perhaps you have an idea?

Looking at this code, It is worth noting that
XLOG_HEAP_CONTAINS_OLD_TUPLE and XLOG_HEAP_CONTAINS_OLD_KEY are always
used just for XLOG_HEAP_CONTAINS_OLD, so we could merge them and have
a single flag to do the whole thing, the parent's replica identity
value being here to track if a full version of the old tuple is logged
or not.
--
Michael

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#7Andres Freund
andres@anarazel.de
In reply to: Michael Paquier (#6)
Re: [BUG] pg9.4.10 Logical decoding did not get the correct oldtuplelen

On 2017-01-05 13:09:28 +0900, Michael Paquier wrote:

On Sun, Jan 1, 2017 at 9:06 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:

OK, I can see the problem. And I can see as well that things have been
tackled in this area with 9.5 thanks to commit 2c03216d...

(Adding Andres in CC for some input on the matter)

Thanks.

Looking at that... As HeapTupleData->t_len is uint32, it is really an
oversight in the 9.4 WAL logging machinery that xl_heap_header_len
uses uint16.

Not generally, no. For new tuples and such it's perfectly fine - they
can't be bigger than MaxHeapTupleSize. It's just for the old-tuple where
(for FULL), we inline toasted columns.

If xl_heap_update->flags still had a bit free, we could
have for example use XLOG_HEAP_CONTAINS_OLD_TUPLE to track the
existing short version of xl_heap_header_len and add a new bit for a
"long" version where a version of xl_heap_header_len including uint32
as t_len is read. This would preserve replay from doing stupid things
for existing 9.4 deployments and allow new records to work fully. But
there are no bits free here. As well as there are no bits for
XLOG_HEAP[1|2]. Andres, perhaps you have an idea?

Looking at this code, It is worth noting that
XLOG_HEAP_CONTAINS_OLD_TUPLE and XLOG_HEAP_CONTAINS_OLD_KEY are always
used just for XLOG_HEAP_CONTAINS_OLD, so we could merge them and have
a single flag to do the whole thing, the parent's replica identity
value being here to track if a full version of the old tuple is logged
or not.

That sounds like a bad idea to me. But I don't have one a lot better,
and as you say the knowledge is currently not required. If both bits
are set it's a 32bit len, otherwise 16. Yay.

Andres

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#8Andres Freund
andres@anarazel.de
In reply to: Andres Freund (#7)
Re: [BUG] pg9.4.10 Logical decoding did not get the correct oldtuplelen

On 2017-01-05 06:48:20 -0800, Andres Freund wrote:

On 2017-01-05 13:09:28 +0900, Michael Paquier wrote:

If xl_heap_update->flags still had a bit free, we could
have for example use XLOG_HEAP_CONTAINS_OLD_TUPLE to track the
existing short version of xl_heap_header_len and add a new bit for a
"long" version where a version of xl_heap_header_len including uint32
as t_len is read. This would preserve replay from doing stupid things
for existing 9.4 deployments and allow new records to work fully. But
there are no bits free here. As well as there are no bits for
XLOG_HEAP[1|2]. Andres, perhaps you have an idea?

Looking at this code, It is worth noting that
XLOG_HEAP_CONTAINS_OLD_TUPLE and XLOG_HEAP_CONTAINS_OLD_KEY are always
used just for XLOG_HEAP_CONTAINS_OLD, so we could merge them and have
a single flag to do the whole thing, the parent's replica identity
value being here to track if a full version of the old tuple is logged
or not.

That sounds like a bad idea to me. But I don't have one a lot better,
and as you say the knowledge is currently not required. If both bits
are set it's a 32bit len, otherwise 16. Yay.

There luckily is a better solution: We can just disregard the old
tuple's t_len, and compute it via r->xl_len
-#size-of-all-new-tuple-stuff. There's really no need for the new
tuple's t_len via xl_heap_header_len.

Pushed a fix. Could have removed copying of xl_heap_header_len in
DecodeUpdate(), but decided that to go as small as possible.

- Andres

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs