master and sync-replica diverging

Started by Erik Rijkersover 13 years ago5 messages
#1Erik Rijkers
er@xs4all.nl

AMD FX 8120 / centos 6.2 / latest source (git head)

It seems to be quite easy to force a 'sync' replica to not be equal to master by
recreating+loading a table in a while loop.

For this test I compiled+checked+installed three separate instances on the same machine. The
replica application_name are names 'wal_receiver_$copy' where $copy is 01, resp. 02.

$ ./sync_state.sh
pid | application_name | state | sync_state
-------+------------------+-----------+------------
19520 | wal_receiver_01 | streaming | sync
19567 | wal_receiver_02 | streaming | async
(2 rows)

port | synchronous_commit | synchronous_standby_names
------+--------------------+---------------------------
6564 | on | wal_receiver_01
(1 row)

port | synchronous_commit | synchronous_standby_names
------+--------------------+---------------------------
6565 | off |
(1 row)

port | synchronous_commit | synchronous_standby_names
------+--------------------+---------------------------
6566 | off |
(1 row)

The test consists of creating a table and loading tab-separated data from file with COPY and then
taking the rowcount of that table (13 MB, almost 200k rows) in all three instances:

# wget http://flybase.org/static_pages/downloads/FB2012_03/genes/fbgn_annotation_ID_fb_2012_03.tsv.gz

slurp_file=fbgn_annotation_ID_fb_2012_03.tsv.gz

zcat $slurp_file \
| grep -v '^#' \
| grep -Ev '^[[:space:]]*$' \
| psql -c "
drop table if exists $table cascade;
create table $table (
gene_symbol text
, primary_fbgn text
, secondary_fbgns text
, annotation_id text
, secondary_annotation_ids text
);
copy $table from stdin csv delimiter E'\t';
";

# count on master:
echo "select current_setting('port') port,count(*) from $table"|psql -qtXp 6564

# count on wal_receiver_01 (sync replica):
echo "select current_setting('port') port,count(*) from $table"|psql -qtXp 6565

# count on wal_receiver_02 (async replica):
echo "select current_setting('port') port,count(*) from $table"|psql -qtXp 6566

I expected the rowcounts from master and sync replica to always be the same.

Initially this seemed to be the case, but when I run the above sequence in a while loop for a few
minutes about 10% of rowcounts from the sync-replica are not equal to the master.

Perhaps not a likely scenario, but surely such a deviating rowcount on a sync replica should not
be possible?

thank you,

Erik Rijkers

#2Joshua Berkus
josh@agliodbs.com
In reply to: Erik Rijkers (#1)
Re: master and sync-replica diverging

Erik,

Are you taking the counts *while* the table is loading? In sync replication, it's possible for the counts to differ for a short time due to one of three things:

* transaction has been saved to the replica and confirm message hasn't reached the master yet
* replica has synched the transaction to the WAL log, but due to wal_delay settings hasn't yet applied it to the tables in memory.
* updating the master with synchronous_commit = local.

----- Original Message -----

Show quoted text

AMD FX 8120 / centos 6.2 / latest source (git head)

It seems to be quite easy to force a 'sync' replica to not be equal
to master by
recreating+loading a table in a while loop.

For this test I compiled+checked+installed three separate instances
on the same machine. The
replica application_name are names 'wal_receiver_$copy' where $copy
is 01, resp. 02.

$ ./sync_state.sh
pid | application_name | state | sync_state
-------+------------------+-----------+------------
19520 | wal_receiver_01 | streaming | sync
19567 | wal_receiver_02 | streaming | async
(2 rows)

port | synchronous_commit | synchronous_standby_names
------+--------------------+---------------------------
6564 | on | wal_receiver_01
(1 row)

port | synchronous_commit | synchronous_standby_names
------+--------------------+---------------------------
6565 | off |
(1 row)

port | synchronous_commit | synchronous_standby_names
------+--------------------+---------------------------
6566 | off |
(1 row)

The test consists of creating a table and loading tab-separated data
from file with COPY and then
taking the rowcount of that table (13 MB, almost 200k rows) in all
three instances:

# wget
http://flybase.org/static_pages/downloads/FB2012_03/genes/fbgn_annotation_ID_fb_2012_03.tsv.gz

slurp_file=fbgn_annotation_ID_fb_2012_03.tsv.gz

zcat $slurp_file \
| grep -v '^#' \
| grep -Ev '^[[:space:]]*$' \
| psql -c "
drop table if exists $table cascade;
create table $table (
gene_symbol text
, primary_fbgn text
, secondary_fbgns text
, annotation_id text
, secondary_annotation_ids text
);
copy $table from stdin csv delimiter E'\t';
";

# count on master:
echo "select current_setting('port') port,count(*) from $table"|psql
-qtXp 6564

# count on wal_receiver_01 (sync replica):
echo "select current_setting('port') port,count(*) from $table"|psql
-qtXp 6565

# count on wal_receiver_02 (async replica):
echo "select current_setting('port') port,count(*) from $table"|psql
-qtXp 6566

I expected the rowcounts from master and sync replica to always be
the same.

Initially this seemed to be the case, but when I run the above
sequence in a while loop for a few
minutes about 10% of rowcounts from the sync-replica are not equal to
the master.

Perhaps not a likely scenario, but surely such a deviating rowcount
on a sync replica should not
be possible?

thank you,

Erik Rijkers

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#3Erik Rijkers
er@xs4all.nl
In reply to: Joshua Berkus (#2)
Re: master and sync-replica diverging

On Thu, May 17, 2012 14:32, Joshua Berkus wrote:

Erik,

Are you taking the counts *while* the table is loading? In sync replication, it's possible for
the counts to differ for a short time due to one of three things:

* transaction has been saved to the replica and confirm message hasn't reached the master yet

The count(*) was done in the way that I showed, i.e. *after* psql had exited. My understanding is
that, with synchronous replication 'on' and configured properly, psql could only return *after*
the sync-replica had the data safely on disk. Either my understanding is not correct or there is
a bug in postgres sync-rep.

* replica has synched the transaction to the WAL log, but due to wal_delay settings hasn't yet
applied it to the tables in memory.

settings are untouched.

* updating the master with synchronous_commit = local.

During my test: synchronous_commit = on

Thanks,

Erik Rijkers

Show quoted text

----- Original Message -----

AMD FX 8120 / centos 6.2 / latest source (git head)

It seems to be quite easy to force a 'sync' replica to not be equal
to master by
recreating+loading a table in a while loop.

For this test I compiled+checked+installed three separate instances
on the same machine. The
replica application_name are names 'wal_receiver_$copy' where $copy
is 01, resp. 02.

$ ./sync_state.sh
pid | application_name | state | sync_state
-------+------------------+-----------+------------
19520 | wal_receiver_01 | streaming | sync
19567 | wal_receiver_02 | streaming | async
(2 rows)

port | synchronous_commit | synchronous_standby_names
------+--------------------+---------------------------
6564 | on | wal_receiver_01
(1 row)

port | synchronous_commit | synchronous_standby_names
------+--------------------+---------------------------
6565 | off |
(1 row)

port | synchronous_commit | synchronous_standby_names
------+--------------------+---------------------------
6566 | off |
(1 row)

The test consists of creating a table and loading tab-separated data
from file with COPY and then
taking the rowcount of that table (13 MB, almost 200k rows) in all
three instances:

# wget
http://flybase.org/static_pages/downloads/FB2012_03/genes/fbgn_annotation_ID_fb_2012_03.tsv.gz

slurp_file=fbgn_annotation_ID_fb_2012_03.tsv.gz

zcat $slurp_file \
| grep -v '^#' \
| grep -Ev '^[[:space:]]*$' \
| psql -c "
drop table if exists $table cascade;
create table $table (
gene_symbol text
, primary_fbgn text
, secondary_fbgns text
, annotation_id text
, secondary_annotation_ids text
);
copy $table from stdin csv delimiter E'\t';
";

# count on master:
echo "select current_setting('port') port,count(*) from $table"|psql
-qtXp 6564

# count on wal_receiver_01 (sync replica):
echo "select current_setting('port') port,count(*) from $table"|psql
-qtXp 6565

# count on wal_receiver_02 (async replica):
echo "select current_setting('port') port,count(*) from $table"|psql
-qtXp 6566

I expected the rowcounts from master and sync replica to always be
the same.

Initially this seemed to be the case, but when I run the above
sequence in a while loop for a few
minutes about 10% of rowcounts from the sync-replica are not equal to
the master.

Perhaps not a likely scenario, but surely such a deviating rowcount
on a sync replica should not
be possible?

thank you,

Erik Rijkers

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#4Ants Aasma
ants@cybertec.at
In reply to: Erik Rijkers (#3)
Re: master and sync-replica diverging

On Thu, May 17, 2012 at 4:53 PM, Erik Rijkers <er@xs4all.nl> wrote:

The count(*) was done in the way that I showed, i.e. *after* psql had exited.  My understanding is
that, with synchronous replication 'on' and configured properly, psql could only return *after*
the sync-replica had the data safely on disk.  Either my understanding is not correct or there is
a bug in postgres sync-rep.

Commit can only return when sync-replica has the data safely on disk,
but this doesn't mean that it's visible yet.

The sequence of events is in dot notation:
commit_command -> master_wal_sync -> replica_wal_sync ->
master_commit_visible -> commit_response
replica_wal_sync -> replica_replay_wal -> replica_commit_visible

If you issue a select on the replica after getting a commit response
from master you can see that the query getting a snapshot races with
replay of the commit record.

Ants Aasma
--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de

#5Erik Rijkers
er@xs4all.nl
In reply to: Ants Aasma (#4)
Re: master and sync-replica diverging

On Thu, May 17, 2012 16:10, Ants Aasma wrote:

On Thu, May 17, 2012 at 4:53 PM, Erik Rijkers <er@xs4all.nl> wrote:

The count(*) was done in the way that I showed, i.e. *after* psql had exited.  My understanding
is
that, with synchronous replication 'on' and configured properly, psql could only return *after*
the sync-replica had the data safely on disk.  Either my understanding is not correct or there
is
a bug in postgres sync-rep.

Commit can only return when sync-replica has the data safely on disk,
but this doesn't mean that it's visible yet.

The sequence of events is in dot notation:
commit_command -> master_wal_sync -> replica_wal_sync ->
master_commit_visible -> commit_response
replica_wal_sync -> replica_replay_wal -> replica_commit_visible

If you issue a select on the replica after getting a commit response
from master you can see that the query getting a snapshot races with
replay of the commit record.

Ah yes, that makes sense. I hadn't thought of that.

Thank you for that explanation.