Logical replication existing data copy
Hi,
as the logical replication patch is progressing closer to integration, it
seems to be a good time to revisit what to do about preexisting data.
This patch implements data synchronization for the logical replication.
It works both for initial setup of subscription as well as for tables
added later to replication when the subscription is already active.
There is a new catalog, pg_subscription_rel, which tracks the
synchronization/replication state of each table that is subscribed via
publications.
From the user perspective it adds some additional options/clauses to
CREATE/ALTER SUBSCRIPTION, namely:
- CREATE SUBSCRIPTION ... [ WITH (COPY DATA | NOCOPY DATA) ], where the user
can specify whether initial data should be copied or not, with COPY being
the default.
- ALTER SUBSCRIPTION ... REFRESH PUBLICATION [ WITH (COPY DATA | NOCOPY
DATA) ], which updates the local info about tables in the subscribed
publication, again with the option to either copy or not copy data.
- The same WITH (COPY DATA | NOCOPY DATA) clause was also added to ALTER
SUBSCRIPTION ... SET PUBLICATION, which automatically triggers the
REFRESH PUBLICATION as well.
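Put together, the proposed clauses would be used roughly like this (a sketch
only; the subscription name, connection string, and publication names are
made up for illustration):

```sql
-- Create a subscription and copy the existing table data
-- (COPY DATA is the default, spelled out here only for clarity).
CREATE SUBSCRIPTION mysub
    CONNECTION 'host=publisher dbname=postgres'
    PUBLICATION mypub
    WITH (COPY DATA);

-- Update the local list of subscribed tables without copying
-- the preexisting rows of any newly added tables.
ALTER SUBSCRIPTION mysub REFRESH PUBLICATION WITH (NOCOPY DATA);

-- Changing the publication set triggers the refresh implicitly.
ALTER SUBSCRIPTION mysub SET PUBLICATION mypub2 WITH (COPY DATA);
```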
So existing table data can be copied once the subscription is created, but
new tables can also be added later and their data copied. This is where
REFRESH PUBLICATION comes into play. Adding a table to a publication does
not make it automatically replicated by the subscription, as the
subscription does not have tracking info for that table. So to add a new
table, the user must run ALTER SUBSCRIPTION ... REFRESH PUBLICATION on the
subscriber, otherwise the data won't be replicated.
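As a concrete sketch of that workflow (all object names here are
hypothetical):

```sql
-- On the publisher:
ALTER PUBLICATION mypub ADD TABLE newtab;

-- On the subscriber: the table must already exist locally,
-- since there is no DDL replication yet ...
CREATE TABLE newtab (id int PRIMARY KEY, payload text);

-- ... and the subscription must be told to pick it up,
-- which also triggers the initial copy of its data:
ALTER SUBSCRIPTION mysub REFRESH PUBLICATION;
```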
The copy process runs in parallel with normal replication and is
visible in the pg_stat_subscription view. There is also a new GUC,
max_subscription_sync_workers, which specifies how many tables can be
synchronized in parallel.
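For example, progress could be watched with something like the following
(treat the exact pg_stat_subscription columns as an assumption based on the
description above; the idea is that a sync worker reports the table it is
copying while the main apply worker does not):

```sql
-- The GUC (name as proposed in this patch version):
SHOW max_subscription_sync_workers;

-- Watch the workers; a sync worker would show the relation it is
-- copying, while the main apply worker would have relid NULL.
SELECT subname, pid, relid
FROM pg_stat_subscription;
```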
Implementation-wise, this adds a modified apply process called sync
(tablesync.c) which at the beginning creates a temporary slot, copies the
data, then syncs to the correct LSN with the main apply process and exits
(this is actually quite complex, but it's explained in the tablesync.c
header).
Standard COPY TO/FROM is used. This is done by enhancing COPY FROM to
accept data from a callback function, which tablesync implements.
On the publisher side, the walsender is enhanced to accept standard SQL
(simple query protocol only) so that COPY can be run. This also helps with
getting info about tables using plain SELECT. The way this works is that
the walsender parser was changed to accept strings that are not valid
replication protocol commands; if such a string arrives, it is passed to
exec_simple_query instead of the walsender command handling.
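A rough sketch of what the sync worker could then send over a single
walsender connection (the slot-creation syntax and all names here are
illustrative assumptions, not taken verbatim from the patch):

```sql
-- A valid replication command, handled by the walsender as before:
CREATE_REPLICATION_SLOT "mysub_sync_slot" TEMPORARY LOGICAL pgoutput;

-- Not valid replication grammar, so it is routed to exec_simple_query;
-- used to fetch info about the table being synchronized:
SELECT a.attname
FROM pg_attribute a
WHERE a.attrelid = 'public.mytab'::regclass AND a.attnum > 0;

-- Plain COPY to stream the existing rows to the subscriber:
COPY public.mytab TO STDOUT;
```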
The attached patch applies on top of logical replication patch-set v14.
I will add this as a separate entry to the CF.
--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachments:
0001-Logical-replication-support-for-initial-data-copy.patch
On 12/19/16 4:30 AM, Petr Jelinek wrote:
So existing table data can be copied once subscription is created, but
also new tables can be added and their data copied. This is where the
REFRESH PUBLICATION comes into play. Adding table to publication does
not make it automatically replicated by the subscription as the
subscription does not have tracking info for that table. So to add new
table user must call ALTER SUBSCRIPTION ... REFRESH PUBLICATION on
subscriber otherwise the data won't be replicated.
Couldn't the subscriber automatically add tracking info when apply
stream data arrives for a relation it has not seen before?
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 20 Dec. 2016 1:27 am, "Peter Eisentraut" <peter.eisentraut@2ndquadrant.com> wrote:
On 12/19/16 4:30 AM, Petr Jelinek wrote:
So existing table data can be copied once subscription is created, but
also new tables can be added and their data copied. This is where the
REFRESH PUBLICATION comes into play. Adding table to publication does
not make it automatically replicated by the subscription as the
subscription does not have tracking info for that table. So to add new
table user must call ALTER SUBSCRIPTION ... REFRESH PUBLICATION on
subscriber otherwise the data won't be replicated.
Couldn't the subscriber automatically add tracking info when apply
stream data arrives for a relation it has not seen before?
If no table has been created by the user and we start trying to apply a
data stream, the apply will break.
Since manual action is needed to create the destination table, I don't see
a problem with requiring manual enabling too, personally.
Let the fully transparent way wait until we can do DDL replication in v11+.
On 19/12/16 18:25, Peter Eisentraut wrote:
On 12/19/16 4:30 AM, Petr Jelinek wrote:
So existing table data can be copied once subscription is created, but
also new tables can be added and their data copied. This is where the
REFRESH PUBLICATION comes into play. Adding table to publication does
not make it automatically replicated by the subscription as the
subscription does not have tracking info for that table. So to add new
table user must call ALTER SUBSCRIPTION ... REFRESH PUBLICATION on
subscriber otherwise the data won't be replicated.
Couldn't the subscriber automatically add tracking info when apply
stream data arrives for a relation it has not seen before?
Sure, but it has many caveats:
- what if the table does not exist
- what if it exists and already has data
- what if the table is rarely written to
We can't control any of that until we have DDL replication/automatic
structure dumping. Once we have those, we can add options to control
default behavior per subscriber, but with current feature set, anything
that does not require user action will behave non-deterministically
which is usually confusing.
--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 12/19/16 4:30 AM, Petr Jelinek wrote:
This patch implements data synchronization for the logical replication.
It works both for initial setup of subscription as well as for tables
added later to replication when the subscription is already active.
First detailed read-through. General structure makes sense.
Comments on some details:
No catalogs.sgml documentation for pg_subscription_rel. When that is
added, I would emphasize that entries are only added when relations
are first encountered, not immediately when a table is added to a
publication. So it is unlike pg_publication_rel in that respect.
Rename max_subscription_sync_workers ->
max_sync_workers_per_subscription (similar to
max_parallel_workers_per_gather and others)
About the changes to COPY: It took me a while to get clarity on the
direction of things. Is the read_cb reading the table, or the data?
You are copying data produced by a function into a table, so
produce_cb or data_source_cb could be clearer.
SetSubscriptionRelState(): This is doing an "upsert", so I don't think
a RowExclusiveLock is enough, or it needs to be able to retry on
concurrent changes. I suppose there shouldn't be more than one
concurrent change per sub/rel pair, but that would need to be
explained there.
SetSubscriptionRelState(): The memset(values, 0, sizeof(values)) is
kind of in an odd place. Possibly not actually needed.
GetSubscriptionRelState(): Prefer error messages that identify the
objects by name. (Also subid should be %u.)
GetSubscriptionRelationsFilter(): The state and filter arguments are
kind of weird. And there are only two callers, so all this complexity
is not even used. Perhaps, if state == SUBREL_STATE_UNKNOWN, then
return everything, else return exactly the state specified. The case
of everything-but-the-state-specified does not appear to be needed.
GetSubscriptionRelationsFilter(): RowExclusiveLock is probably too much
This patch adds the fourth definition of oid_cmp() (also known as
oidComparator()) and the 12th definition of atooid(). Let's collect
those in a central place. I'm sending a separate patch for that. (No
need to change here until that is resolved, of course.)
AlterSubscription_refresh(): Put the if (wrconn == NULL) case first
and error out, and then put the rest of the code into the main
function block, saving one level of indentation.
AlterSubscription_refresh(): remote_table_oids isn't really the
remote OIDs of the tables but the local OIDs of the remote tables.
Consider clearer variable naming in that function.
interesting_relation(): very generic name, maybe
should_apply_changes_for_rel(). Also the comment mentions a "parallel
worker" -- maybe better a "separate worker process running in
parallel", since a parallel worker is something different.
LogicalRepApplyLoop() changed to non-static, but not used anywhere
else.
process_syncing_tables_*(): Function names and associated comments are
not very clear (process what?, handle what?).
In process_syncing_tables_apply(), it says that
logicalrep_worker_count() counts the apply worker as well, but what
happens if the apply worker has temporarily disappeared? Since
logicalrep_worker_count() is only used in this one place, maybe have
it count just the sync workers and rename it slightly.
Changed some LOG messages to DEBUG, not clear what the purpose there is.
check_max_subscription_sync_workers(): Same problem as discussed
previously, checking a GUC setting against another GUC setting like
this doesn't work. Just skip it and check at run time.
wait_for_sync_status_change(): Comment is wrong/misleading: It doesn't
wait until it matches any specific state, it just waits for any state
change.
LogicalRepSyncTableStart(): The code that assembles the slot name
needs to be aware of NAMEDATALEN. (Maybe at least throw in a static
assert that NAMEDATALEN>=64.)
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 11/01/17 17:10, Peter Eisentraut wrote:
On 12/19/16 4:30 AM, Petr Jelinek wrote:
This patch implements data synchronization for the logical replication.
It works both for initial setup of subscription as well as for tables
added later to replication when the subscription is already active.
First detailed read-through. General structure makes sense.
Thanks!
Comments on some details:
No catalogs.sgml documentation for pg_subscription_rel. When that is
added, I would emphasize that entries are only added when relations
are first encountered, not immediately when a table is added to a
publication. So it is unlike pg_publication_rel in that respect.
It's not even exactly at first encounter, but I did add an explanation.
Rename max_subscription_sync_workers ->
max_sync_workers_per_subscription (similar to
max_parallel_workers_per_gather and others)
Makes sense.
About the changes to COPY: It took me a while to get clarity on the
direction of things. Is the read_cb reading the table, or the data?
You are copying data produced by a function into a table, so
produce_cb or data_source_cb could be clearer.
The data_source_cb sounds good to me.
SetSubscriptionRelState(): This is doing an "upsert", so I don't think
a RowExclusiveLock is enough, or it needs to be able to retry on
concurrent changes. I suppose there shouldn't be more than one
concurrent change per sub/rel pair, but that would need to be
explained there.
Well, technically there can be some concurrency via ALTER
SUBSCRIPTION ... REFRESH, so I increased the lock level (and the same for
Remove).
SetSubscriptionRelState(): The memset(values, 0, sizeof(values)) is
kind of in an odd place. Possibly not actually needed.
It might not be needed but we traditionally do it anyway. I moved it a bit.
GetSubscriptionRelState(): Prefer error messages that identify the
objects by name. (Also subid should be %u.)
Well, this is a cache lookup failure though; who's to say that the objects
actually exist in that case.
GetSubscriptionRelationsFilter(): The state and filter arguments are
kind of weird. And there are only two callers, so all this complexity
is not even used. Perhaps, if state == SUBREL_STATE_UNKNOWN, then
return everything, else return exactly the state specified. The case
of everything-but-the-state-specified does not appear to be needed.
I see this was a bit confusing, so I split the function into two. The two
use cases are actually everything and everything except SUBREL_STATE_READY.
GetSubscriptionRelationsFilter(): RowExclusiveLock is probably too much
Yes, same with GetSubscriptionRelState().
AlterSubscription_refresh(): remote_table_oids isn't really the
remote OIDs of the tables but the local OIDs of the remote tables.
Consider clearer variable naming in that function.
Done.
interesting_relation(): very generic name, maybe
should_apply_changes_for_rel(). Also the comment mentions a "parallel
worker" -- maybe better a "separate worker process running in
parallel", since a parallel worker is something different.
Done.
process_syncing_tables_*(): Function names and associated comments are
not very clear (process what?, handle what?).
Ok added some more explanation, hopefully it's better now.
In process_syncing_tables_apply(), it says that
logicalrep_worker_count() counts the apply worker as well, but what
happens if the apply worker has temporarily disappeared? Since
Then the function is never going to be called for that subscription as
it's only called from the apply.
logicalrep_worker_count() is only used in this one place, maybe have
it count just the sync workers and rename it slightly.
Makes sense anyway though.
Changed some LOG messages to DEBUG, not clear what the purpose there is.
In fact I changed some DEBUG messages to LOG in the main patch set, git
rebase just does not handle that very well. Fixed.
check_max_subscription_sync_workers(): Same problem as discussed
previously, checking a GUC setting against another GUC setting like
this doesn't work. Just skip it and check at run time.
Yeah, we always check at run time.
wait_for_sync_status_change(): Comment is wrong/misleading: It doesn't
wait until it matches any specific state, it just waits for any state
change.
Fixed.
LogicalRepSyncTableStart(): The code that assembles the slot name
needs to be aware of NAMEDATALEN. (Maybe at least throw in a static
assert that NAMEDATALEN>=64.)
I switched to snprintf, which seems cleaner (it also removes the need to
know about the proper memory context of a private variable from
LogicalRepSyncTableStart()).
I also added an option called SKIP CONNECT to CREATE SUBSCRIPTION (yes,
that's a WIP name, I welcome suggestions). That skips the initial
connection attempt, which is now always needed since we have to copy the
table list.
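A hypothetical usage of the new option (WIP name as stated above;
connection details made up for illustration):

```sql
-- Create the subscription without connecting to the publisher at all;
-- no table list is fetched, so nothing is marked for synchronization yet.
CREATE SUBSCRIPTION mysub
    CONNECTION 'host=publisher dbname=postgres'
    PUBLICATION mypub
    WITH (SKIP CONNECT);
```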
--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachments:
0001-Logical-replication-support-for-initial-data-copy-v2.patch
On 2017-01-18 14:46, Petr Jelinek wrote:
0001-Logical-replication-support-for-initial-data-copy-v2.patch
Applies and builds fine on top of the previous 5 patches.
Two problems:
1. alter_subscription.sgml has an unpaired <command>-tag, which breaks
the doc-build:
This is the offending patch-line:
+ <command>CREATE SUBSCRIPTION</command> with <command>COPY
DATA<command>
2. Running the below (a version of the earlier pgbench_derail.sh) I have
found that
create subscription sub1 .. with (disabled); and then alter
subscription sub1 enable;
cannot be run immediately, consecutively. The error is avoided when the
two commands are separated (for instance, below in separate psql calls).
I don't understand why this is, but it is reliably so.
The error(s):
2017-01-18 17:26:56.126 CET 24410 LOG: starting logical replication worker for subscription "sub1"
2017-01-18 17:26:56.132 CET 26291 LOG: logical replication apply for subscription sub1 started
2017-01-18 17:26:56.139 CET 26291 LOG: starting logical replication worker for subscription "sub1"
2017-01-18 17:26:56.145 CET 26295 LOG: logical replication sync for subscription sub1, table pgbench_accounts started
2017-01-18 17:26:56.534 CET 26295 ERROR: duplicate key value violates unique constraint "pgbench_accounts_pkey"
2017-01-18 17:26:56.534 CET 26295 DETAIL: Key (aid)=(1) already exists.
2017-01-18 17:26:56.534 CET 26295 CONTEXT: COPY pgbench_accounts, line 1
2017-01-18 17:26:56.536 CET 21006 LOG: worker process: logical replication worker 41015 sync 40991 (PID 26295) exited with exit code 1
2017-01-18 17:26:56.536 CET 26291 LOG: starting logical replication worker for subscription "sub1"
2017-01-18 17:26:56.542 CET 26297 LOG: logical replication sync for subscription sub1, table pgbench_branches started
2017-01-18 17:26:57.015 CET 26297 ERROR: duplicate key value violates unique constraint "pgbench_branches_pkey"
2017-01-18 17:26:57.015 CET 26297 DETAIL: Key (bid)=(1) already exists.
2017-01-18 17:26:57.015 CET 26297 CONTEXT: COPY pgbench_branches, line 1
2017-01-18 17:26:57.017 CET 21006 LOG: worker process: logical replication worker 41015 sync 40994 (PID 26297) exited with exit code 1
2017-01-18 17:26:57.017 CET 26291 LOG: starting logical replication worker for subscription "sub1"
2017-01-18 17:26:57.023 CET 26299 LOG: logical replication sync for subscription sub1, table pgbench_history started
2017-01-18 17:26:57.487 CET 26299 LOG: logical replication synchronization worker finished processing
2017-01-18 17:26:57.488 CET 26291 LOG: starting logical replication worker for subscription "sub1"
2017-01-18 17:26:57.491 CET 26301 LOG: logical replication sync for subscription sub1, table pgbench_tellers started
2017-01-18 17:26:57.948 CET 26301 ERROR: duplicate key value violates unique constraint "pgbench_tellers_pkey"
2017-01-18 17:26:57.948 CET 26301 DETAIL: Key (tid)=(1) already exists.
2017-01-18 17:26:57.948 CET 26301 CONTEXT: COPY pgbench_tellers, line 1
etc, etc.
#!/bin/sh
# assumes both instances are running, initially without publication or subscription
unset PGSERVICEFILE PGSERVICE PGPORT PGDATA PGHOST
env | grep PG
PGDATABASE=testdb
# clear logs
echo > /home/aardvark/pg_stuff/pg_installations/pgsql.logical_replication/logfile.logical_replication
echo > /home/aardvark/pg_stuff/pg_installations/pgsql.logical_replication2/logfile.logical_replication2
port1=6972
port2=6973
function cb()
{
  # display the 4 pgbench tables' accumulated content as md5s
  # a,b,t,h stand for: pgbench_accounts, -branches, -tellers, -history
  for port in $port1 $port2
  do
    md5_a=$(echo "select * from pgbench_accounts order by aid" | psql -qtAXp $port | md5sum | cut -b 1-9)
    md5_b=$(echo "select * from pgbench_branches order by bid" | psql -qtAXp $port | md5sum | cut -b 1-9)
    md5_t=$(echo "select * from pgbench_tellers  order by tid" | psql -qtAXp $port | md5sum | cut -b 1-9)
    md5_h=$(echo "select * from pgbench_history  order by hid" | psql -qtAXp $port | md5sum | cut -b 1-9)
    cnt_a=$(echo "select count(*) from pgbench_accounts" | psql -qtAXp $port)
    cnt_b=$(echo "select count(*) from pgbench_branches" | psql -qtAXp $port)
    cnt_t=$(echo "select count(*) from pgbench_tellers"  | psql -qtAXp $port)
    cnt_h=$(echo "select count(*) from pgbench_history"  | psql -qtAXp $port)
    printf "$port a,b,t,h: %6d %6d %6d %6d" $cnt_a $cnt_b $cnt_t $cnt_h
    echo -n " $md5_a $md5_b $md5_t $md5_h"
    if   [[ $port -eq $port1 ]]; then echo " master"
    elif [[ $port -eq $port2 ]]; then echo " replica"
    else echo " ERROR"
    fi
  done
}
echo "
drop table if exists pgbench_accounts;
drop table if exists pgbench_branches;
drop table if exists pgbench_tellers;
drop table if exists pgbench_history;" | psql -X -p $port1 \
&& echo "
drop table if exists pgbench_accounts;
drop table if exists pgbench_branches;
drop table if exists pgbench_tellers;
drop table if exists pgbench_history;" | psql -X -p $port2 \
&& pgbench -p $port1 -qis 1 \
&& echo "
alter table pgbench_history add column hid serial primary key;
-- alter table pgbench_history replica identity full;
" | psql -1p $port1 \
&& pg_dump -F c -p $port1 \
-t pgbench_accounts \
-t pgbench_branches \
-t pgbench_tellers \
-t pgbench_history \
| pg_restore -p $port2 -d testdb
echo "$(cb)"
sleep 2
echo "$(cb)"
echo "create publication pub1 for all tables;" | psql -p $port1 -aqtAX
# this demonstrates the bug:
echo "create subscription sub1 connection 'port=${port1}' publication pub1 with (disabled);
alter subscription sub1 enable; " | psql -p $port2 -aqtAX
# like this (create and alter together in a single psql call) it fails;
# with the commands separated (sending them separately to psql suffices)
# there is no problem.
#------------------------------------
#exit
#echo "$(cb)"
#echo "-- pgbench -c 1 -T 10 -P 5 -n (short run, first)"
# pgbench -c 1 -T 10 -P 5 -n
On 18/01/17 17:35, Erik Rijkers wrote:
On 2017-01-18 14:46, Petr Jelinek wrote:
0001-Logical-replication-support-for-initial-data-copy-v2.patch
Applies and builds fine on top of the previous 5 patches.
Two problems:
1. alter_subscription.sgml has an unpaired <command>-tag, which breaks
the doc-build:
This is the offending patch-line:
+ <command>CREATE SUBSCRIPTION</command> with <command>COPY
DATA<command>
Hmm, I wonder how that compiled on my machine, as it's indeed a syntax
error.
2. Running the below (a version of the earlier pgbench_derail.sh) I have
found that
create subscription sub1 .. with (disabled); and then alter
subscription sub1 enable;
cannot be run immediately, consecutively. The error is avoided when the
two commands are separated (for instance, below in separate psql calls).
I don't understand why this is but it is reliably so.
AFAICS you should always get an error from that test after you enable the
subscription, no matter if you enable it immediately or later. The default
behavior is to copy the data, and your test already copies it via
pg_dump/pg_restore.
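In other words, for a test that pre-seeds the replica via
pg_dump/pg_restore, the initial copy would have to be suppressed
explicitly, along these lines (a sketch using the syntax from the patch
description; connection string as in the test script, and combining the
options in one WITH clause is my assumption):

```sql
CREATE SUBSCRIPTION sub1
    CONNECTION 'port=6972'
    PUBLICATION pub1
    WITH (NOCOPY DATA, DISABLED);
ALTER SUBSCRIPTION sub1 ENABLE;
```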
--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 2017-01-18 20:45, Petr Jelinek wrote:
AFAICS you should always get error from that test after you enable the
Ah yes, you were right, of course; I had assumed the earlier mentioned
CREATE SUBSCRIPTION ... [ WITH (COPY DATA | NOCOPY DATA) ]
but that syntax wasn't implemented, I now understand.
Taking that into account, my older tests work OK again (using the
7-patches below).
Other small issue: using this patch-set:
0001-Add-PUBLICATION-catalogs-and-DDL-v18.patch
0002-Add-SUBSCRIPTION-catalog-and-DDL-v18.patch
0003-Define-logical-replication-protocol-and-output-plugi-v18.patch
0004-Add-logical-replication-workers-v18fixed.patch
0005-Add-separate-synchronous-commit-control-for-logical--v18.patch
0001-Logical-replication-support-for-initial-data-copy-v2.patch
0006-Add-RENAME-support-for-PUBLICATIONs-and-SUBSCRIPTION.patch
( This is now the patch-set to test, is that correct? )
make check complains:
*** /home/aardvark/pg_stuff/pg_sandbox/pgsql.logical_replication/src/test/regress/expected/subscription.out 2017-01-19 09:26:41.354703032 +0100
--- /home/aardvark/pg_stuff/pg_sandbox/pgsql.logical_replication/src/test/regress/results/subscription.out 2017-01-19 09:29:56.104685043 +0100
***************
*** 53,62 ****
COMMIT;
ALTER SUBSCRIPTION testsub RENAME TO testsub_foo;
\dRs
! List of subscriptions
! Name | Owner | Enabled | Publication
! -------------+---------------------------+---------+--------------------
! testsub_foo | regress_subscription_user | f | {testpub,testpub1}
(1 row)
DROP SUBSCRIPTION testsub_foo NODROP SLOT;
--- 53,62 ----
COMMIT;
ALTER SUBSCRIPTION testsub RENAME TO testsub_foo;
\dRs
! List of subscriptions
! Name | Owner | Enabled | Publication
! -------------+---------------------------+---------+-------------
! testsub_foo | regress_subscription_user | f | {testpub}
(1 row)
DROP SUBSCRIPTION testsub_foo NODROP SLOT;
======================================================================
Thanks,
Erik Rijkers
On 19/01/17 09:39, Erik Rijkers wrote:
On 2017-01-18 20:45, Petr Jelinek wrote:
AFAICS you should always get error from that test after you enable the
Ah tes, you were right, of course; I had assumed the earlier mentioned
CREATE SUBSCRIPTION ... [ WITH (COPY DATA | NOCOPY DATA) ]
but that syntax wasn't implemented, I now understand.
Taking that into account, my older tests work OK again (using the
7-patches below).
Good to hear.
Other small issue: using this patch-set:
0001-Add-PUBLICATION-catalogs-and-DDL-v18.patch
0002-Add-SUBSCRIPTION-catalog-and-DDL-v18.patch
0003-Define-logical-replication-protocol-and-output-plugi-v18.patch
0004-Add-logical-replication-workers-v18fixed.patch
0005-Add-separate-synchronous-commit-control-for-logical--v18.patch
0001-Logical-replication-support-for-initial-data-copy-v2.patch
0006-Add-RENAME-support-for-PUBLICATIONs-and-SUBSCRIPTION.patch
( This is now the patch-set to test, is that correct? )
The rename should be before this one, but I guess with v2 that didn't
work well yet as it didn't know about rename.
make check complains:
*** /home/aardvark/pg_stuff/pg_sandbox/pgsql.logical_replication/src/test/regress/expected/subscription.out 2017-01-19 09:26:41.354703032 +0100
--- /home/aardvark/pg_stuff/pg_sandbox/pgsql.logical_replication/src/test/regress/results/subscription.out 2017-01-19 09:29:56.104685043 +0100
***************
*** 53,62 ****
COMMIT;
ALTER SUBSCRIPTION testsub RENAME TO testsub_foo;
\dRs
! List of subscriptions
! Name | Owner | Enabled | Publication
! -------------+---------------------------+---------+--------------------
! testsub_foo | regress_subscription_user | f | {testpub,testpub1}
(1 row)
DROP SUBSCRIPTION testsub_foo NODROP SLOT;
--- 53,62 ----
COMMIT;
ALTER SUBSCRIPTION testsub RENAME TO testsub_foo;
\dRs
! List of subscriptions
! Name | Owner | Enabled | Publication
! -------------+---------------------------+---------+-------------
! testsub_foo | regress_subscription_user | f | {testpub}
(1 row)
DROP SUBSCRIPTION testsub_foo NODROP SLOT;
======================================================================
Meh, I really messed up the rebase this time.
Okay, here is v3 with some small fixes and rebased on top of rename.
Also it's rebased without the
0005-Add-separate-synchronous-commit-control-for-logical--v18.patch as I
don't expect that one to go further for now.
Thanks for testing!
--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachments:
0001-Logical-replication-support-for-initial-data-copy-v3.patch
On Fri, Jan 20, 2017 at 3:03 AM, Petr Jelinek <petr.jelinek@2ndquadrant.com> wrote:
Okay, here is v3 with some small fixes and rebased on top of rename.
Also it's rebased without the
0005-Add-separate-synchronous-commit-control-for-logical--v18.patch as I
don't expect that one to go further for now.
Thanks for testing!
This patch needs a rebase, moved to next CF with "waiting on author".
--
Michael
Hi,
here is the updated patch.
Note that it's rebased on top of the logical replication improvements
patches [1] (which still apply fine, to my surprise).
It will probably need another rebase once the patches from Masahiko Sawada
and Fujii Masao get in.
[1] /messages/by-id/42655eb4-6b2e-b35b-c184-509efd6eba10@2ndquadrant.com
--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 08/02/17 23:24, Petr Jelinek wrote:
Hi,
here is updated patch.
Note that it's rebased on top of logical replication improvements
patches [1] (which still apply fine to my surprise).
It will probably need another rebase once patches from Masahiko Sawada
and Fujii Masao get in.
[1] /messages/by-id/42655eb4-6b2e-b35b-c184-509efd6eba10@2ndquadrant.com
And now with the actual patch attached :)
--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachments:
0001-Logical-replication-support-for-initial-data-copy-v4.patch
On 2017-02-08 23:25, Petr Jelinek wrote:
0001-Use-asynchronous-connect-API-in-libpqwalreceiver-v2.patch
0002-Always-initialize-stringinfo-buffers-in-walsender-v2.patch
0003-Fix-after-trigger-execution-in-logical-replication-v2.patch
0004-Add-RENAME-support-for-PUBLICATIONs-and-SUBSCRIPTION-v2.patch
0001-Logical-replication-support-for-initial-data-copy-v4.patch
test 'object_address' fails, see attachment.
That's all I found in a quick first trial.
thanks,
Erik Rijkers
Attachments:
regression.diffs
On 2017-02-08 23:25, Petr Jelinek wrote:
0001-Use-asynchronous-connect-API-in-libpqwalreceiver-v2.patch
0002-Always-initialize-stringinfo-buffers-in-walsender-v2.patch
0003-Fix-after-trigger-execution-in-logical-replication-v2.patch
0004-Add-RENAME-support-for-PUBLICATIONs-and-SUBSCRIPTION-v2.patch
0001-Logical-replication-support-for-initial-data-copy-v4.patch
Apart from the failing one make check test (test 'object_address') which
I reported earlier, I find it is easy to 'confuse' the replication.
I attach a script that intends to test the default COPY DATA. There
are two instances, initially without any replication. The script inits
pgbench on the master, adds a serial column to pgbench_history, and
dump-restores the 4 pgbench-tables to the future replica. It then
empties the 4 pgbench-tables on the 'replica'. The idea is that when
logrep is initiated, data will be replicated from master, with the end
result being that there are 4 identical tables on master and replica.
This often works, but it also fails far too often (in my hands). I test
whether the tables are identical by comparing an md5 of an ordered
resultset from both replica and master. I estimate that 1 in 5 tries
fails; 'fail' being a somewhat different table on the replica (compared to
the master), most often pgbench_accounts (typically there are 10-30
differing rows). No errors or warnings in either logfile. I'm not sure,
but testing on faster machines seems to do somewhat better ('better'
being fewer replication errors).
Another, probably unrelated, problem occurs (much more rarely) when
executing 'DROP SUBSCRIPTION sub1' on the replica (see the beginning of
the script). Sometimes that command hangs and refuses to accept shutdown
of the server. I don't know how to recover from this -- I just have to
kill the replica server (the master server still obeys normal shutdown)
and restart the instances.
The script accepts 2 parameters, scale and clients (used in pgbench -s
resp. -c)
I don't think I've managed to successfully run the script with more than
1 client yet.
Can you have a look whether this is reproducible elsewhere?
thanks,
Erik Rijkers
Attachments:
pgbench_derail2.sh (text/x-shellscript)
On 2017-02-09 02:25, Erik Rijkers wrote:
On 2017-02-08 23:25, Petr Jelinek wrote:
fixes in create_subscription.sgml
Attachments:
create_subscription.sgml.diff (text/x-diff, +10 -10)
On 2017-02-11 11:16, Erik Rijkers wrote:
This often works, but in my hands it also fails far too often. I test
whether the tables are identical by comparing an md5 of an ordered
resultset from both replica and master. I estimate that 1 in 5 tries
fails; 'fail' meaning a somewhat different table on the replica (compared
to the master), most often pgbench_accounts (typically 10-30 differing
rows). There are no errors or warnings in either logfile. I'm not sure,
but testing on faster machines seems to do somewhat better ('better'
meaning fewer replication errors).
I have noticed that when I insert a wait state of a few seconds after
the CREATE SUBSCRIPTION (or actually: the enabling of the subscription),
the problem does not occur. Apparently (I assume) the initial snapshot
is taken when the subsequent pgbench run has already started, so that
the logical replication also starts somewhere 'into' that pgbench run.
Does that make sense?
I don't know what to make of it. Now that I think I understand what
happens, I hesitate to call it a bug. But I'd say it's still a
usability problem that the subscription is only 'valid' after some
time, even if it's only a few seconds.
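For what it's worth, the wait state amounts to something like the following. The subscription name, ports, and durations here are just local choices for illustration, not the script's actual values:

```shell
# Sketch of the workaround: enable the subscription, wait a few seconds,
# and only then start the pgbench run against the master.
# sub1, the ports, and the timings are assumptions about the local setup.
psql -qX -p 6973 -d postgres -c "ALTER SUBSCRIPTION sub1 ENABLE"
sleep 5                              # the wait state that seems to avoid missing rows
pgbench -p 6972 -c 1 -T 30 postgres  # concurrent load on the master
```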
(the other problem I mentioned (drop subscription hangs) still happens
every now and then)
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 13/02/17 14:51, Erik Rijkers wrote:
On 2017-02-11 11:16, Erik Rijkers wrote:
I have noticed that when I insert a wait state of a few seconds after
the CREATE SUBSCRIPTION (or actually: the enabling of the subscription),
the problem does not occur. Apparently (I assume) the initial snapshot
is taken when the subsequent pgbench run has already started, so that
the logical replication also starts somewhere 'into' that pgbench run.
Does that make sense?
I don't know what to make of it. Now that I think I understand what
happens, I hesitate to call it a bug. But I'd say it's still a
usability problem that the subscription is only 'valid' after some
time, even if it's only a few seconds.
It is a bug. We go to great lengths to create a data snapshot that
corresponds to a specific LSN, so that we are able to decode exactly the
changes that happened since the data snapshot was taken. And tablesync.c
does quite a lot to synchronize the table handover to the main apply
process, so that there is a correct continuation of the data stream as
well. The end result is that concurrent changes are supposed to be okay:
eventually replication should catch up and the contents should be the same.
That being said, I am so far having problems reproducing this on my
test machine(s), so no idea yet what causes it.
Could you periodically dump the contents of pg_subscription_rel on the
subscriber (ideally when dumping the md5 of the data) and attach that
as well?
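A polling loop along these lines should do. The subscriber port and dbname are assumptions about the local setup; srrelid, srsubstate, and srsublsn are the catalog's columns:

```shell
# Poll the subscriber's pg_subscription_rel once per second, alongside
# the md5 checks. Port/dbname are assumptions about the local setup.
n=${1:-5}   # number of polls; pass e.g. 600 for a longer run
for i in $(seq 1 "$n"); do
  echo "=== $(date +%H:%M:%S) ==="
  psql -qtAX -p 6973 -d postgres -c \
    "SELECT srrelid::regclass, srsubstate, srsublsn FROM pg_subscription_rel"
  sleep 1
done
```

The srsubstate column shows each table's sync state, so it should reveal whether a diverging table ever left the sync phase.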
--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 2017-02-16 00:43, Petr Jelinek wrote:
Could you periodically dump the contents of pg_subscription_rel on the
subscriber (ideally when dumping the md5 of the data) and attach that
as well?
I attach a bash script (and its output) that polls the 4 pgbench
tables' md5s and the pg_subscription_rel table, each second, while I
run pgbench_derail2.sh (for that, see my earlier mail).
pgbench_derail2.sh writes a 'header' into the same output stream
(search for '^===').
The .out file reflects a session where I started pgbench_derail2.sh
twice (it removes the publication and subscription at startup). So
there are 2 headers in the attached cb_20170216_10_04_47.out. The first
run ended in a successful replication (all 4 pgbench tables
md5-identical). The second run did not end correctly: it shows one of
the typical faulty end states: pgbench_accounts, the copy, has a few
fewer rows than the master table.
Another typical end state is the same number of rows but content not
identical (for some rows, typically < 20); mostly pgbench_accounts and
pgbench_history are affected.
(I see now that I made some mistakes in generating the timestamps in
the .out file, but I suppose it doesn't matter too much.)
I hope it helps; let me know if I can do any other test(s).
On 2017-02-16 00:43, Petr Jelinek wrote:
That being said, I am so far having problems reproducing this on my
test machine(s), so no idea yet what causes it.
A few extra bits:
- I have repeated this now on three different machines (Debian 7,
Debian 8, CentOS 6; one of them a pretty big server); there is always a
failure within a few tries of that test program (i.e.
pgbench_derail2.sh, with the above 5 patches).
- I have also tried going back to an older version of logrep: running
2 instances with only the first four patches (i.e., leaving out the
support-for-existing-data patch). With only those 4, the logical
replication is solid (a quick 25x repetition of a very similar test
program was 100% successful). So the problem is likely somewhere in
that last, 5th patch.
- A 25x repetition of the test on a 5-patch master + replica setup
yields 13 ok, 12 NOK.
- Is the failed 'make check' test 'object_address' unrelated? (Can you
at least reproduce that failed test?)
Maybe add this to the PostgreSQL 10 Open Items list?
https://wiki.postgresql.org/wiki/PostgreSQL_10_Open_Items
It may garner a bit more attention.
thanks,
Erik Rijkers