Right way to restore logical replication

Started by Игорь Выскорко about 5 years ago, 3 messages, general
#1 Игорь Выскорко
vyskorko.igor@yandex.ru

Hi, community!
Unfortunately, I can't find an answer in the docs or on Google, so I'm counting on you.
I have two Postgres instances running locally:
:5433 - postgres 11 as master
:5434 - postgres 12 as slave

Creating a basic setup for testing:
[local]:5433 postgres@postgres=# create table tbl(id serial, d text, primary key(id));
CREATE TABLE
Time: 7,147 ms

[local]:5434 postgres@postgres=# create table tbl(id serial, d text, primary key(id));
CREATE TABLE
Time: 11,557 ms

[local]:5433 postgres@postgres=# create publication pub for table tbl;
CREATE PUBLICATION
Time: 6,646 ms

[local]:5434 postgres@postgres=# create subscription sub connection 'host=localhost port=5433 dbname=postgres user=postgres' publication pub;
NOTICE:  created replication slot "sub" on publisher
CREATE SUBSCRIPTION
Time: 18,584 ms

[local]:5433 postgres@postgres=# insert into tbl(d) values ('test');
INSERT 0 1
Time: 3,401 ms

[local]:5434 postgres@postgres=# select * from tbl;
 id |  d
----+------
  1 | test
(1 row)

Works like a charm. OK.
Let's "accidentally" drop the publication and insert new data:

[local]:5433 postgres@postgres=# drop publication pub ;
DROP PUBLICATION
Time: 3,793 ms

[local]:5433 postgres@postgres=# insert into tbl(d) values ('test2');
INSERT 0 1
Time: 9,002 ms

Log on the master:

2021-02-08 22:13:14.970 +07 [14075] postgres@postgres LOG:  starting logical decoding for slot "sub"
2021-02-08 22:13:14.970 +07 [14075] postgres@postgres DETAIL:  Streaming transactions committing after 8/FD435A80, reading WAL from 8/FD436948.
2021-02-08 22:13:14.970 +07 [14075] postgres@postgres LOG:  logical decoding found consistent point at 8/FD436948
2021-02-08 22:13:14.970 +07 [14075] postgres@postgres DETAIL:  There are no running transactions.
2021-02-08 22:13:14.970 +07 [14075] postgres@postgres ERROR:  publication "pub" does not exist
2021-02-08 22:13:14.970 +07 [14075] postgres@postgres CONTEXT:  slot "sub", output plugin "pgoutput", in the change callback, associated LSN 8/FD4369E8

Log on the slave:

2021-02-08 22:13:45.071 +07 [14110] LOG:  logical replication apply worker for subscription "sub" has started
2021-02-08 22:13:45.078 +07 [14110] ERROR:  could not receive data from WAL stream: ERROR:  publication "pub" does not exist
    CONTEXT:  slot "sub", output plugin "pgoutput", in the change callback, associated LSN 8/FD4369E8
2021-02-08 22:13:45.079 +07 [18374] LOG:  background worker "logical replication worker" (PID 14110) exited with exit code 1

Looks reasonable: publication pub does not exist. OK, trying to recreate the publication:

[local]:5433 postgres@postgres=# create publication pub for table tbl;
CREATE PUBLICATION
Time: 6,646 ms

Result: nothing changed, the same errors appear again and again.
I couldn't find a way to restore replication without dropping and recreating the subscription.

Questions here:
1. What is going on under the hood here: why does the walsender think that publication "pub" does not exist when it actually exists?
2. What is the right way to restore replication in my example?

Thanks!

#2 Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Игорь Выскорко (#1)
Re: Right way to restore logical replication

At Mon, 08 Feb 2021 22:42:21 +0700, Игорь Выскорко <vyskorko.igor@yandex.ru> wrote in

> Hi, community!
> Unfortunately can't find answer in docs and google. Hope only for you)
> [local]:5433 postgres@postgres=# drop publication pub ;
> DROP PUBLICATION
> Time: 3,793 ms
>
> [local]:5433 postgres@postgres=# insert into tbl(d) values ('test2');
> INSERT 0 1
> Time: 9,002 ms
>
> [local]:5433 postgres@postgres=# create publication pub for table tbl;
> CREATE PUBLICATION
> Time: 6,646 ms
>
> result: nothing changed, same errors appears again and again. I couldn't find
> how to restore replication without drop&create subscription again.

If you had recreated the publication before the insert, replication
would have continued.

> Questions here:
> 1. what is going under the hood here - why walsender thinks that "publication
> "pub" does not exist" when it actually exists?

The answer is "because the publication did not exist at the time of
the INSERT". Thus the insert cannot be replicated using the new
publication.

This is because logical replication looks up publications using the
same snapshot as the WAL record being sent. Although that is the
designed behavior, I'm not sure it is meant to apply to
pg_publication as well.
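
The mismatch described above can be observed by comparing the current catalog with the slot position. This is only a sketch, assuming the object names from this thread (pub, sub, tbl) and the standard system views; it inspects state and changes nothing.

```sql
-- On the publisher (:5433): the recreated publication is visible
-- in the *current* catalog.
SELECT pubname, puballtables
FROM pg_publication
WHERE pubname = 'pub';

SELECT pubname, schemaname, tablename
FROM pg_publication_tables
WHERE pubname = 'pub';

-- The slot is still positioned before the INSERT that was made while
-- "pub" did not exist, so confirmed_flush_lsn lags behind current_lsn.
SELECT slot_name, plugin, active, restart_lsn, confirmed_flush_lsn,
       pg_current_wal_lsn() AS current_lsn
FROM pg_replication_slots
WHERE slot_name = 'sub';

-- On the subscriber (:5434): the apply worker keeps restarting, so
-- received_lsn should not advance between repeated runs of this query.
SELECT subname, pid, received_lsn, latest_end_lsn
FROM pg_stat_subscription
WHERE subname = 'sub';
```

If the publication shows up in the first two queries while the slot position does not move, the walsender is failing on the old change rather than on the present catalog state.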

> 2. what is the right way to restore replication in my example?

The most conservative way is to drop the subscription, delete all
rows from the subscriber table, and then recreate the subscription.
This allows the newly created publication to work.
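
The conservative path above could look roughly like this. It is a sketch, assuming the setup from this thread (subscription sub, table tbl, publisher on port 5433) and that the publication has already been recreated on the publisher.

```sql
-- Run on the subscriber (:5434).

-- While the publisher is reachable, this also drops the replication
-- slot "sub" on the publisher, including the change it is stuck on.
DROP SUBSCRIPTION sub;

-- Remove the stale copy so the initial table sync does not hit
-- primary key conflicts.
DELETE FROM tbl;

-- copy_data defaults to true, so the rows currently on the publisher
-- are copied before streaming starts.
CREATE SUBSCRIPTION sub
    CONNECTION 'host=localhost port=5433 dbname=postgres user=postgres'
    PUBLICATION pub;

-- Once the initial sync finishes, both rows should be present.
SELECT * FROM tbl ORDER BY id;
```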

Alternatively, you can drop the subscription, manually fix the
subscriber table so that it is in sync with the publisher table, and
then create a new subscription using WITH (copy_data = false);
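
The second option might be sketched as follows. The row values are an assumption derived from this thread: the only row the subscriber missed is the second insert, which the serial column would have numbered id = 2. It also assumes nothing writes to tbl on the publisher between the manual fix and the new subscription, since changes made in that window would not be copied.

```sql
-- Run on the subscriber (:5434).
DROP SUBSCRIPTION sub;

-- Manually bring the subscriber table in line with the publisher.
-- (Assumed values: the single row inserted while "pub" was missing.)
INSERT INTO tbl (id, d) VALUES (2, 'test2');

-- Skip the initial copy; only changes made after the new slot is
-- created will be streamed.
CREATE SUBSCRIPTION sub
    CONNECTION 'host=localhost port=5433 dbname=postgres user=postgres'
    PUBLICATION pub
    WITH (copy_data = false);
```

For anything larger than a test table, the manual fix would need a real comparison between the two sides rather than a hand-written INSERT.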

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#3 Игорь Выскорко
vyskorko.igor@yandex.ru
In reply to: Kyotaro Horiguchi (#2)
Re: Right way to restore logical replication

Thanks for the reply!
It makes sense now why that happened and what to do in case of emergency.