pgsql: Enable logical slots to follow timeline switches

Started by Alvaro Herreraalmost 10 years ago9 messages
#1Alvaro Herrera
alvherre@alvh.no-ip.org

Enable logical slots to follow timeline switches

When decoding from a logical slot, it's necessary for xlog reading to be
able to read xlog from historical (i.e. not current) timelines;
otherwise, decoding fails after failover, because the archives are in
the historical timeline. This is required to make "failover logical
slots" possible; it currently has no other use, although theoretically
it could be used by an extension that creates a slot on a standby and
continues to replay from the slot when the standby is promoted.

This commit includes a module in src/test/modules with functions to
manipulate the slots (which is not otherwise possible in SQL code) in
order to enable testing, and a new test in src/test/recovery to ensure
that the behavior is as expected.

Author: Craig Ringer
Reviewed-By: Oleksii Kliukin, Andres Freund, Petr Jelínek

Branch
------
master

Details
-------
http://git.postgresql.org/pg/commitdiff/24c5f1a103ce6656a5cb430d9a996c34e61ab2a5

Modified Files
--------------
src/backend/access/transam/xlogreader.c | 9 +
src/backend/access/transam/xlogutils.c | 244 +++++++++++++++--
src/backend/replication/logical/logicalfuncs.c | 17 +-
src/include/access/xlogreader.h | 18 ++
src/test/modules/Makefile | 1 +
src/test/modules/test_slot_timelines/.gitignore | 3 +
src/test/modules/test_slot_timelines/Makefile | 22 ++
src/test/modules/test_slot_timelines/README | 19 ++
.../expected/load_extension.out | 19 ++
.../test_slot_timelines/sql/load_extension.sql | 7 +
.../test_slot_timelines--1.0.sql | 16 ++
.../test_slot_timelines/test_slot_timelines.c | 133 +++++++++
.../test_slot_timelines/test_slot_timelines.conf | 2 +
.../test_slot_timelines.control | 5 +
src/test/recovery/Makefile | 2 +
.../recovery/t/006_logical_decoding_timelines.pl | 304 +++++++++++++++++++++
16 files changed, 790 insertions(+), 31 deletions(-)

--
Sent via pgsql-committers mailing list (pgsql-committers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-committers

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#1)
Re: pgsql: Enable logical slots to follow timeline switches

Alvaro Herrera <alvherre@alvh.no-ip.org> writes:

Enable logical slots to follow timeline switches

Buildfarm doesn't like this one bit :-(

regards, tom lane

--
Sent via pgsql-committers mailing list (pgsql-committers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-committers

#3Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Tom Lane (#2)
Re: pgsql: Enable logical slots to follow timeline switches

Tom Lane wrote:

Alvaro Herrera <alvherre@alvh.no-ip.org> writes:

Enable logical slots to follow timeline switches

Buildfarm doesn't like this one bit :-(

Argh, forgot the alternate expected file when the needed feature is
disabled. Will fix.

--
�lvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-committers mailing list (pgsql-committers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-committers

#4Michael Paquier
michael.paquier@gmail.com
In reply to: Alvaro Herrera (#3)
Re: pgsql: Enable logical slots to follow timeline switches

On Thu, Mar 31, 2016 at 8:45 AM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:

Tom Lane wrote:

Alvaro Herrera <alvherre@alvh.no-ip.org> writes:

Enable logical slots to follow timeline switches

Buildfarm doesn't like this one bit :-(

Argh, forgot the alternate expected file when the needed feature is
disabled. Will fix.

hamster complains here:
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hamster&amp;dt=2016-03-31%2016%3A00%3A06
[...]
# Copying slots to replica
after_basebackup|test_decoding||547|0/5000060|0/5000098
# Copying slot 'after_basebackup','test_decoding',NULL,'547','0/5000060','0/5000098'
connection error: 'psql:<stdin>:1: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
psql:<stdin>:1: connection to server was lost'
--
Michael

--
Sent via pgsql-committers mailing list (pgsql-committers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-committers

#5Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Michael Paquier (#4)
Re: [COMMITTERS] pgsql: Enable logical slots to follow timeline switches

Moving thread to -hackers, CC'ing Craig.

Michael Paquier wrote:

hamster complains here:
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hamster&amp;dt=2016-03-31%2016%3A00%3A06
[...]
# Copying slots to replica
after_basebackup|test_decoding||547|0/5000060|0/5000098
# Copying slot 'after_basebackup','test_decoding',NULL,'547','0/5000060','0/5000098'
connection error: 'psql:<stdin>:1: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
psql:<stdin>:1: connection to server was lost'

Ahem.

So, I see this:

DEBUG: server process (PID 24166) exited with exit code 0
DEBUG: forked new backend, pid=24168 socket=7
LOG: statement: SELECT test_slot_timelines_advance_logical_slot('after_basebackup', NULL, '547', '0/5000060', '0/5000098');
DEBUG: server process (PID 24168) was terminated by signal 11: Segmentation fault
DETAIL: Failed process was running: SELECT test_slot_timelines_advance_logical_slot('after_basebackup', NULL, '547', '0/5000060', '0/5000098');
LOG: server process (PID 24168) was terminated by signal 11: Segmentation fault

in pgsql.build/src/test/recovery/tmp_check/log/006_logical_decoding_timelines_replica2.log

Could you have a look at whether you have core dumps from these? If so,
backtraces would be very useful.

--
�lvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#6Petr Jelinek
petr@2ndquadrant.com
In reply to: Alvaro Herrera (#5)
Re: Re: [COMMITTERS] pgsql: Enable logical slots to follow timeline switches

On 01/04/16 03:49, Alvaro Herrera wrote:

Moving thread to -hackers, CC'ing Craig.

Michael Paquier wrote:

hamster complains here:
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hamster&amp;dt=2016-03-31%2016%3A00%3A06
[...]
# Copying slots to replica
after_basebackup|test_decoding||547|0/5000060|0/5000098
# Copying slot 'after_basebackup','test_decoding',NULL,'547','0/5000060','0/5000098'
connection error: 'psql:<stdin>:1: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
psql:<stdin>:1: connection to server was lost'

Ahem.

So, I see this:

DEBUG: server process (PID 24166) exited with exit code 0
DEBUG: forked new backend, pid=24168 socket=7
LOG: statement: SELECT test_slot_timelines_advance_logical_slot('after_basebackup', NULL, '547', '0/5000060', '0/5000098');
DEBUG: server process (PID 24168) was terminated by signal 11: Segmentation fault
DETAIL: Failed process was running: SELECT test_slot_timelines_advance_logical_slot('after_basebackup', NULL, '547', '0/5000060', '0/5000098');
LOG: server process (PID 24168) was terminated by signal 11: Segmentation fault

in pgsql.build/src/test/recovery/tmp_check/log/006_logical_decoding_timelines_replica2.log

Could you have a look at whether you have core dumps from these? If so,
backtraces would be very useful.

The function does following:
TransactionId new_xmin = (TransactionId) PG_GETARG_INT64(1);

And we are passing NULL as that parameter, that could explain this.
Also while reading it I wonder if the function should be defined with
xid type rather than bigint and use similar input code as xid.c.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#7Craig Ringer
craig@2ndquadrant.com
In reply to: Petr Jelinek (#6)
Re: Re: [COMMITTERS] pgsql: Enable logical slots to follow timeline switches

On 1 April 2016 at 11:13, Petr Jelinek <petr@2ndquadrant.com> wrote:

The function does following:
TransactionId new_xmin = (TransactionId) PG_GETARG_INT64(1);

This should be reasonable enough though; down-casting it will discard the
high bits but that's fine when we know there's nothing interesting there.

TransactionId is a uint32 anyway, but I had a reason for the above. There's
no cast from integer to xid, which is why I used bigint here, since we
don't have a uint32 native SQL type. What I *should've* done is simply used
quoted-literal arguments like

XID '1234'

or cast via text

1234::text::xid

so there was no need to use a bigint instead. I'll adjust appropriately so
it uses PG_GETARG_TRANSACTIONID(1) and is declared as accepting 'xid'.

And we are passing NULL as that parameter, that could explain this.

If we're on an int64-by-value platform that'd be wrong but still work, it'd
just be zero. On an int64-by-reference platform it could indeed segfault.
So yes, I'd say that's the cause.

Also while reading it I wonder if the function should be defined with
xid type rather than bigint and use similar input code as xid.c.

Yes, it should.

I'll prep a follow-up patch.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#8Craig Ringer
craig@2ndquadrant.com
In reply to: Craig Ringer (#7)
1 attachment(s)
Re: Re: [COMMITTERS] pgsql: Enable logical slots to follow timeline switches

On 1 April 2016 at 12:47, Craig Ringer <craig@2ndquadrant.com> wrote:

I'll prep a follow-up patch.

Done and attached.

Note that I can't use PG_GETARG_TRANSACTIONID directly since it's a macro
defined only in xid.c . It didn't seem worth extracting it and moving it to
postgres.h (where the other non-ADT-specific PG_GETARG_ macros are) or its
own new header just for this, so I've spelled it out each time.

I now remember that that's part of why I used bigint as an argument type.
The other part is that txid_current() returns bigint and there's no cast
from bigint to xid. So the tests would have to CREATE CAST or cast via
'text'. They now do the latter.

We should probably have a cast from bigint to/from xid, but the type is so
little-used that it's not much fuss.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachments:

0001-Fix-test_slot_timelines-buildfarm-failure-on-32-bit-.patchtext/x-patch; charset=US-ASCII; name=0001-Fix-test_slot_timelines-buildfarm-failure-on-32-bit-.patchDownload
From 5e63a5f8e467c7855ab22fa7e5fbdac94d9c87e7 Mon Sep 17 00:00:00 2001
From: Craig Ringer <craig@2ndquadrant.com>
Date: Fri, 1 Apr 2016 13:36:31 +0800
Subject: [PATCH] Fix test_slot_timelines buildfarm failure on 32-bit ARM

The code was reading a NULL 64-bit Datum that is pass-by-value
on 64-bit platforms but pass-by-reference on 32-bit. On 64-bit
this just produces zero, but on 32-bit it'll dereference a NULL
pointer. Since InvalidXLogRecPtr is zero this went unnoticed.

Test for NULL arguments.

In the process, switch to taking 'xid'-typed arguments instead of
bigint. Since txid_current() returns bigint and there's no cast from
bigint to xid this requires a cast through 'text' in our tests.
---
 .../expected/load_extension.out                    |  2 +-
 .../test_slot_timelines/sql/load_extension.sql     |  2 +-
 .../test_slot_timelines--1.0.sql                   |  4 +--
 .../test_slot_timelines/test_slot_timelines.c      | 31 +++++++++++++++++++---
 4 files changed, 31 insertions(+), 8 deletions(-)

diff --git a/src/test/modules/test_slot_timelines/expected/load_extension.out b/src/test/modules/test_slot_timelines/expected/load_extension.out
index 14a414a..7c2ad9d 100644
--- a/src/test/modules/test_slot_timelines/expected/load_extension.out
+++ b/src/test/modules/test_slot_timelines/expected/load_extension.out
@@ -5,7 +5,7 @@ SELECT test_slot_timelines_create_logical_slot('test_slot', 'test_decoding');
  
 (1 row)
 
-SELECT test_slot_timelines_advance_logical_slot('test_slot', txid_current(), txid_current(), pg_current_xlog_location(), pg_current_xlog_location());
+SELECT test_slot_timelines_advance_logical_slot('test_slot', txid_current()::text::xid, txid_current()::text::xid, pg_current_xlog_location(), pg_current_xlog_location());
  test_slot_timelines_advance_logical_slot 
 ------------------------------------------
  
diff --git a/src/test/modules/test_slot_timelines/sql/load_extension.sql b/src/test/modules/test_slot_timelines/sql/load_extension.sql
index a71127d..2440355 100644
--- a/src/test/modules/test_slot_timelines/sql/load_extension.sql
+++ b/src/test/modules/test_slot_timelines/sql/load_extension.sql
@@ -2,6 +2,6 @@ CREATE EXTENSION test_slot_timelines;
 
 SELECT test_slot_timelines_create_logical_slot('test_slot', 'test_decoding');
 
-SELECT test_slot_timelines_advance_logical_slot('test_slot', txid_current(), txid_current(), pg_current_xlog_location(), pg_current_xlog_location());
+SELECT test_slot_timelines_advance_logical_slot('test_slot', txid_current()::text::xid, txid_current()::text::xid, pg_current_xlog_location(), pg_current_xlog_location());
 
 SELECT pg_drop_replication_slot('test_slot');
diff --git a/src/test/modules/test_slot_timelines/test_slot_timelines--1.0.sql b/src/test/modules/test_slot_timelines/test_slot_timelines--1.0.sql
index 31d7f8e..50449bf 100644
--- a/src/test/modules/test_slot_timelines/test_slot_timelines--1.0.sql
+++ b/src/test/modules/test_slot_timelines/test_slot_timelines--1.0.sql
@@ -8,9 +8,9 @@ LANGUAGE c AS 'MODULE_PATHNAME';
 COMMENT ON FUNCTION test_slot_timelines_create_logical_slot(text, text)
 IS 'Create a logical slot at a particular lsn and xid. Do not use in production servers, it is not safe. The slot is created with an invalid xmin and lsn.';
 
-CREATE OR REPLACE FUNCTION test_slot_timelines_advance_logical_slot(slot_name text, new_xmin bigint, new_catalog_xmin bigint, new_restart_lsn pg_lsn, new_confirmed_lsn pg_lsn)
+CREATE OR REPLACE FUNCTION test_slot_timelines_advance_logical_slot(slot_name text, new_xmin xid, new_catalog_xmin xid, new_restart_lsn pg_lsn, new_confirmed_lsn pg_lsn)
 RETURNS void
 LANGUAGE c AS 'MODULE_PATHNAME';
 
-COMMENT ON FUNCTION test_slot_timelines_advance_logical_slot(text, bigint, bigint, pg_lsn, pg_lsn)
+COMMENT ON FUNCTION test_slot_timelines_advance_logical_slot(text, xid, xid, pg_lsn, pg_lsn)
 IS 'Advance a logical slot directly. Do not use this in production servers, it is not safe.';
diff --git a/src/test/modules/test_slot_timelines/test_slot_timelines.c b/src/test/modules/test_slot_timelines/test_slot_timelines.c
index 74dd1a0..e818705 100644
--- a/src/test/modules/test_slot_timelines/test_slot_timelines.c
+++ b/src/test/modules/test_slot_timelines/test_slot_timelines.c
@@ -85,10 +85,33 @@ Datum
 test_slot_timelines_advance_logical_slot(PG_FUNCTION_ARGS)
 {
 	char	   *slotname = text_to_cstring(PG_GETARG_TEXT_P(0));
-	TransactionId new_xmin = (TransactionId) PG_GETARG_INT64(1);
-	TransactionId new_catalog_xmin = (TransactionId) PG_GETARG_INT64(2);
-	XLogRecPtr	restart_lsn = PG_GETARG_LSN(3);
-	XLogRecPtr	confirmed_lsn = PG_GETARG_LSN(4);
+	TransactionId new_xmin;
+	TransactionId new_catalog_xmin;
+	XLogRecPtr	restart_lsn;
+	XLogRecPtr	confirmed_lsn;
+
+	if (PG_ARGISNULL(0))
+		elog(ERROR, "slot name may not be null");
+
+	if (PG_ARGISNULL(1))
+		new_xmin = InvalidTransactionId;
+	else
+		new_xmin = (TransactionId) DatumGetTransactionId(PG_GETARG_DATUM(1));
+
+	if (PG_ARGISNULL(2))
+		new_catalog_xmin = InvalidTransactionId;
+	else
+		new_catalog_xmin = (TransactionId) DatumGetTransactionId(PG_GETARG_DATUM(2));
+
+	if (PG_ARGISNULL(3))
+		restart_lsn = InvalidXLogRecPtr;
+	else
+		restart_lsn = PG_GETARG_LSN(3);
+
+	if (PG_ARGISNULL(4))
+		confirmed_lsn = InvalidXLogRecPtr;
+	else
+		confirmed_lsn = PG_GETARG_LSN(4);
 
 	CheckSlotRequirements();
 
-- 
2.1.0

#9Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Craig Ringer (#8)
Re: Re: [COMMITTERS] pgsql: Enable logical slots to follow timeline switches

Craig Ringer wrote:

Note that I can't use PG_GETARG_TRANSACTIONID directly since it's a macro
defined only in xid.c . It didn't seem worth extracting it and moving it to
postgres.h (where the other non-ADT-specific PG_GETARG_ macros are) or its
own new header just for this, so I've spelled it out each time.

Hm, we should probably fix this sometime. We used to think of Xids are
purely internal objects, but they are becoming more and more visible to
userland.

I now remember that that's part of why I used bigint as an argument type.
The other part is that txid_current() returns bigint and there's no cast
from bigint to xid. So the tests would have to CREATE CAST or cast via
'text'. They now do the latter.

I was rather surprised at this -- I thought that it'd break when the Xid
epoch got further than 0 (because casting from a number larger than 2^32
would throw an overflow error, I thought), but as it turns out the cast
from text to XID automatically discards the higher-order bits when a
higher epoch is used. I imagine this is intentional, but it's not
actually documented in xidin() at all. Also, I'm not sure what happens
if strtoul() refuses to work on > INT_MAX values ... it's documented to
set errno to ERANGE, so I assume the whole chain simply fails. Maybe
there's just no platform on which that happens anymore.

With regards to the rest of the patch, I decided not to use your
approach: it seemed a bit odd to accept NULL values there. I changed
the query to use COALESCE in the value that returned null instead. With
your change, the function takes a null and interprets it as InvalidXid
anyway, so it seems to work fine. (I also tested it manually.)

Let's see what hamster has to say about this.

It would be really good to have other buildfarm members run this code.
Currently only Hamster does, and only because Micha�l patched the
buildfarm script ...

--
�lvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers