A couple logical decoding fixes/patches

Started by Andres Freundover 11 years ago6 messages
#1Andres Freund
andres@2ndquadrant.com
3 attachment(s)

Hi,

Patch 01: Fix a couple of embarassing typos. Most of them harmless, but
one isn't and can lead to crashing or decoding wrong data.

Patch 02: Don't crash with an Assert() failure if wal_level=logical but
max_replication_slots=0.

Patch 03: Add valgrind suppression for writing out padding bytes. That's
better than zeroing the data from the get go because unitialized
accesses are still detected.

Details are in the commit messages of the individual patches.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachments:

0001-Fix-a-couple-of-brown-paper-bag-typos-in-the-logical.patchtext/x-patch; charset=us-asciiDownload
>From 9987ff3fead5cee0b9c7da94c8d8a665015be7c2 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Thu, 8 May 2014 17:00:06 +0200
Subject: [PATCH 1/3] Fix a couple of brown paper bag typos in the logical
 decoding code.

Most of these were fairly harmless but the typos in
ReorderBufferSerializeChange() could lead to crashes or wrong data
being decoded.

The ReorderBufferSerializeChange() bug was encountered by Christian
Kruse; the rest were found during a review for similar sillyness.
---
 src/backend/replication/logical/reorderbuffer.c | 13 +++++--------
 src/backend/replication/logical/snapbuild.c     | 11 +++++------
 2 files changed, 10 insertions(+), 14 deletions(-)

diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index 7f2bbca..f96e3e1 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -2064,15 +2064,15 @@ ReorderBufferSerializeChange(ReorderBuffer *rb, ReorderBufferTXN *txn,
 				if (snap->xcnt)
 				{
 					memcpy(data, snap->xip,
-						   sizeof(TransactionId) + snap->xcnt);
-					data += sizeof(TransactionId) + snap->xcnt;
+						   sizeof(TransactionId) * snap->xcnt);
+					data += sizeof(TransactionId) * snap->xcnt;
 				}
 
 				if (snap->subxcnt)
 				{
 					memcpy(data, snap->subxip,
-						   sizeof(TransactionId) + snap->subxcnt);
-					data += sizeof(TransactionId) + snap->subxcnt;
+						   sizeof(TransactionId) * snap->subxcnt);
+					data += sizeof(TransactionId) * snap->subxcnt;
 				}
 				break;
 			}
@@ -2168,15 +2168,12 @@ ReorderBufferRestoreChanges(ReorderBuffer *rb, ReorderBufferTXN *txn,
 
 		}
 
-		ReorderBufferSerializeReserve(rb, sizeof(ReorderBufferDiskChange));
-
-
 		/*
 		 * Read the statically sized part of a change which has information
 		 * about the total size. If we couldn't read a record, we're at the
 		 * end of this file.
 		 */
-
+		ReorderBufferSerializeReserve(rb, sizeof(ReorderBufferDiskChange));
 		readBytes = read(*fd, rb->outbuf, sizeof(ReorderBufferDiskChange));
 
 		/* eof */
diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c
index cb45f90..f00fd7d 100644
--- a/src/backend/replication/logical/snapbuild.c
+++ b/src/backend/replication/logical/snapbuild.c
@@ -307,8 +307,7 @@ AllocateSnapshotBuilder(ReorderBuffer *reorder,
 	builder->committed.xip =
 		palloc0(builder->committed.xcnt_space * sizeof(TransactionId));
 	builder->committed.includes_all_transactions = true;
-	builder->committed.xip =
-		palloc0(builder->committed.xcnt_space * sizeof(TransactionId));
+
 	builder->initial_xmin_horizon = xmin_horizon;
 	builder->transactions_after = start_lsn;
 
@@ -1691,7 +1690,7 @@ SnapBuildRestore(SnapBuild *builder, XLogRecPtr lsn)
 
 	/* restore running xacts information */
 	sz = sizeof(TransactionId) * ondisk.builder.running.xcnt_space;
-	ondisk.builder.running.xip = MemoryContextAlloc(builder->context, sz);
+	ondisk.builder.running.xip = MemoryContextAllocZero(builder->context, sz);
 	readBytes = read(fd, ondisk.builder.running.xip, sz);
 	if (readBytes != sz)
 	{
@@ -1705,7 +1704,7 @@ SnapBuildRestore(SnapBuild *builder, XLogRecPtr lsn)
 
 	/* restore committed xacts information */
 	sz = sizeof(TransactionId) * ondisk.builder.committed.xcnt;
-	ondisk.builder.committed.xip = MemoryContextAlloc(builder->context, sz);
+	ondisk.builder.committed.xip = MemoryContextAllocZero(builder->context, sz);
 	readBytes = read(fd, ondisk.builder.committed.xip, sz);
 	if (readBytes != sz)
 	{
@@ -1763,10 +1762,10 @@ SnapBuildRestore(SnapBuild *builder, XLogRecPtr lsn)
 	}
 	ondisk.builder.committed.xip = NULL;
 
-	builder->running.xcnt = ondisk.builder.committed.xcnt;
+	builder->running.xcnt = ondisk.builder.running.xcnt;
 	if (builder->running.xip)
 		pfree(builder->running.xip);
-	builder->running.xcnt_space = ondisk.builder.committed.xcnt_space;
+	builder->running.xcnt_space = ondisk.builder.running.xcnt_space;
 	builder->running.xip = ondisk.builder.running.xip;
 
 	/* our snapshot is not interesting anymore, build a new one */
-- 
1.8.5.rc2.dirty

0002-Remove-overeager-assertion-in-wal_level-logical-spec.patchtext/x-patch; charset=us-asciiDownload
>From 863589478ce185cba02e8ad5a4ff2a8b79588cb0 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Thu, 8 May 2014 18:02:20 +0200
Subject: [PATCH 2/3] Remove overeager assertion in wal_level=logical specific
 code.

The assertion triggered when a relation was rewritten while
wal_level=logical but max_replication_slots=0 was set. That's not a
meaningful configuration, but that's not a excuse for crashing.
---
 src/backend/access/heap/rewriteheap.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/src/backend/access/heap/rewriteheap.c b/src/backend/access/heap/rewriteheap.c
index 7b57911..687e76e 100644
--- a/src/backend/access/heap/rewriteheap.c
+++ b/src/backend/access/heap/rewriteheap.c
@@ -812,8 +812,6 @@ logical_begin_heap_rewrite(RewriteState state)
 	if (!state->rs_logical_rewrite)
 		return;
 
-	Assert(ReplicationSlotCtl != NULL);
-
 	ProcArrayGetReplicationSlotXmin(NULL, &logical_xmin);
 
 	/*
-- 
1.8.5.rc2.dirty

0003-Add-valgrind-suppression-to-ignore-padding-bytes-in-.patchtext/x-patch; charset=us-asciiDownload
>From 2cc7da5accda01b579bb785b510bb16d133516d9 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Thu, 8 May 2014 18:16:34 +0200
Subject: [PATCH 3/3] Add valgrind suppression to ignore padding bytes in
 reorderbuffer disk writes.

---
 src/tools/valgrind.supp | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/src/tools/valgrind.supp b/src/tools/valgrind.supp
index d3447d7..b0ab53d 100644
--- a/src/tools/valgrind.supp
+++ b/src/tools/valgrind.supp
@@ -63,6 +63,16 @@
 	fun:write_relcache_init_file
 }
 
+{
+	padding_reorderbuffer_serialize
+	Memcheck:Param
+	write(buf)
+
+	...
+	fun:ReorderBufferSerializeChange
+}
+
+
 
 # gcc on ppc64 can generate a four-byte read to fetch the final "char" fields
 # of a FormData_pg_cast.  This is valid compiler behavior, because a proper
-- 
1.8.5.rc2.dirty

#2Robert Haas
robertmhaas@gmail.com
In reply to: Andres Freund (#1)
Re: A couple logical decoding fixes/patches

On Thu, May 8, 2014 at 12:29 PM, Andres Freund <andres@2ndquadrant.com> wrote:

Patch 01: Fix a couple of embarassing typos. Most of them harmless, but
one isn't and can lead to crashing or decoding wrong data.

Committed.

Patch 02: Don't crash with an Assert() failure if wal_level=logical but
max_replication_slots=0.

Committed.

Patch 03: Add valgrind suppression for writing out padding bytes. That's
better than zeroing the data from the get go because unitialized
accesses are still detected.

I have no understanding of valgrind suppressions. I can commit this
blindly if nobody else wants to pick it up.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#3Andres Freund
andres@2ndquadrant.com
In reply to: Robert Haas (#2)
Re: A couple logical decoding fixes/patches

Hi,

On 2014-05-09 10:49:09 -0400, Robert Haas wrote:

Committed.

Thanks.

Patch 03: Add valgrind suppression for writing out padding bytes. That's
better than zeroing the data from the get go because unitialized
accesses are still detected.

I have no understanding of valgrind suppressions. I can commit this
blindly if nobody else wants to pick it up.

I think the only committer with previous experience in that area is
Noah. Noah?

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#4Noah Misch
noah@leadboat.com
In reply to: Andres Freund (#3)
Re: A couple logical decoding fixes/patches

On Fri, May 09, 2014 at 04:58:54PM +0200, Andres Freund wrote:

On 2014-05-09 10:49:09 -0400, Robert Haas wrote:

Patch 03: Add valgrind suppression for writing out padding bytes. That's
better than zeroing the data from the get go because unitialized
accesses are still detected.

I have no understanding of valgrind suppressions. I can commit this
blindly if nobody else wants to pick it up.

I think the only committer with previous experience in that area is
Noah. Noah?

I can pick up that patch.

Static functions having only one call site are especially vulnerable to
inlining, so avoid naming them in the suppressions file. I do see
ReorderBufferSerializeChange() inlined away at -O2 and higher. Is it fair to
tie the suppression to ReorderBufferSerializeTXN() instead?

Do you happen to have a self-contained procedure for causing the server to
reach the code in question?

--
Noah Misch
EnterpriseDB http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#5Andres Freund
andres@2ndquadrant.com
In reply to: Noah Misch (#4)
Re: A couple logical decoding fixes/patches

On 2014-05-10 00:59:59 -0400, Noah Misch wrote:

On Fri, May 09, 2014 at 04:58:54PM +0200, Andres Freund wrote:

On 2014-05-09 10:49:09 -0400, Robert Haas wrote:

Patch 03: Add valgrind suppression for writing out padding bytes. That's
better than zeroing the data from the get go because unitialized
accesses are still detected.

I have no understanding of valgrind suppressions. I can commit this
blindly if nobody else wants to pick it up.

I think the only committer with previous experience in that area is
Noah. Noah?

I can pick up that patch.

Cool.

Static functions having only one call site are especially vulnerable to
inlining, so avoid naming them in the suppressions file. I do see
ReorderBufferSerializeChange() inlined away at -O2 and higher. Is it fair to
tie the suppression to ReorderBufferSerializeTXN() instead?

Hm. That's a good point. If you're talking about tying it to
ReorderBufferSerializeTXN() you mean to list it below the write, as part
of the callstack?

{
padding_reorderbuffer_serialize
Memcheck:Param
write(buf)

...
fun:ReorderBufferSerializeTXN
}

If so, yes, that should be fine. Since there's no other writes it
shouldn't make a difference.

Do you happen to have a self-contained procedure for causing the server to
reach the code in question?

(cd contrib/test_decoding && make -s installcheck-force)
against a server running with
valgrind \
--quiet --trace-children=yes --leak-check=no --track-origins=yes \
--read-var-info=yes run-pg-dev-master -c logging_collector=on \
--suppressions=/home/andres/src/postgresql/src/tools/valgrind.supp
<path/to/postgres> \
-c wal_level=logical -c max_replication_slots=3

Does the trick here. Valgrind warns in the first (ddl) test run.

the -force is needed because it needs a server running with -c
wal_level=logical -c max_replication_slots=3.

Note that you'll possibly get a spurious regression test failure because
autovacuum ran inbetween and it's transaction is visible in the test
output like:
BEGIN
+ COMMIT
+ BEGIN
...

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#6Noah Misch
noah@leadboat.com
In reply to: Andres Freund (#5)
Re: A couple logical decoding fixes/patches

On Sat, May 10, 2014 at 04:56:51PM +0200, Andres Freund wrote:

On 2014-05-10 00:59:59 -0400, Noah Misch wrote:

Static functions having only one call site are especially vulnerable to
inlining, so avoid naming them in the suppressions file. I do see
ReorderBufferSerializeChange() inlined away at -O2 and higher. Is it fair to
tie the suppression to ReorderBufferSerializeTXN() instead?

Hm. That's a good point. If you're talking about tying it to
ReorderBufferSerializeTXN() you mean to list it below the write, as part
of the callstack?

{
padding_reorderbuffer_serialize
Memcheck:Param
write(buf)

...
fun:ReorderBufferSerializeTXN
}

If so, yes, that should be fine. Since there's no other writes it
shouldn't make a difference.

Yep. Committed that way.

Do you happen to have a self-contained procedure for causing the server to
reach the code in question?

(cd contrib/test_decoding && make -s installcheck-force)
against a server running with
valgrind \
--quiet --trace-children=yes --leak-check=no --track-origins=yes \
--read-var-info=yes run-pg-dev-master -c logging_collector=on \
--suppressions=/home/andres/src/postgresql/src/tools/valgrind.supp
<path/to/postgres> \
-c wal_level=logical -c max_replication_slots=3

Does the trick here. Valgrind warns in the first (ddl) test run.

Thanks.

--
Noah Misch
EnterpriseDB http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers