logical replication: could not create file "state.tmp": File exists

Started by Grigory Smolkinover 6 years ago11 messagesbugs
Jump to latest
#1Grigory Smolkin
g.smolkin@postgrespro.ru

Hello!

One of my colleagues encountered an out of space condition, which broke
his logical replication setup.
It`s manifested with the following errors:

ERROR:  could not receive data from WAL stream: ERROR:  could not create
file "pg_replslot/some_sub/state.tmp": File exists

I`ve digged a bit into this problem, and it`s turned out that in
SaveSlotToPath() temp file for replication slot is opened with 'O_CREAT
| O_EXCL' flags, which makes this routine as not very reentrant.

Since an exclusive lock is taken before temp file creation, I think it
should be safe to replace O_EXCL with O_TRUNC.
Script to reproduce and patch are attached.

--
Grigory Smolkin
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachments:

replication_bug.shapplication/x-shellscript; name=replication_bug.shDownload
0001-logical_replication_fix.patchtext/x-patch; name=0001-logical_replication_fix.patchDownload+1-1
#2Michael Paquier
michael@paquier.xyz
In reply to: Grigory Smolkin (#1)
Re: logical replication: could not create file "state.tmp": File exists

On Sat, Nov 30, 2019 at 03:09:39PM +0300, Grigory Smolkin wrote:

I`ve digged a bit into this problem, and it`s turned out that in
SaveSlotToPath() temp file for replication slot is opened with 'O_CREAT |
O_EXCL' flags, which makes this routine as not very reentrant.

What did you see as I/O problem before facing the actual error
reported here? Was it just ENOSPC, a fsync failure, or just a failure
in closing the fd? The first pattern is mostly what I guess happened,
still a fsync failure would not trigger a PANIC here (actually we
really should do that!), but I am raising a different thread about
that issue.

Since an exclusive lock is taken before temp file creation, I think it
should be safe to replace O_EXCL with O_TRUNC.
Script to reproduce and patch are attached.

Agreed. I prefer the O_TRUNC option because that's less code churn.
Also, as it can still be useful to have a look at the temporary state
file after a crash or a failure, doing unlink() in the error code
paths is no good option IMO.

Have others thoughts or objections to share?
--
Michael

#3Grigory Smolkin
g.smolkin@postgrespro.ru
In reply to: Michael Paquier (#2)
Re: logical replication: could not create file "state.tmp": File exists

On 12/2/19 7:35 AM, Michael Paquier wrote:

On Sat, Nov 30, 2019 at 03:09:39PM +0300, Grigory Smolkin wrote:

I`ve digged a bit into this problem, and it`s turned out that in
SaveSlotToPath() temp file for replication slot is opened with 'O_CREAT |
O_EXCL' flags, which makes this routine as not very reentrant.

What did you see as I/O problem before facing the actual error
reported here? Was it just ENOSPC, a fsync failure, or just a failure
in closing the fd? The first pattern is mostly what I guess happened,
still a fsync failure would not trigger a PANIC here (actually we
really should do that!), but I am raising a different thread about
that issue.

Hello!

I didn`t see the very first error that left behind the temp file.
I`ve requested it just now, but it will take some time to get it (there
are several terabytes of text log).
But I assume that it was out of space error, which, by the look of the
code, should produce ERROR and leave temp file behind, just as it
happened in aforementioned� case.

Since an exclusive lock is taken before temp file creation, I think it
should be safe to replace O_EXCL with O_TRUNC.
Script to reproduce and patch are attached.

Agreed. I prefer the O_TRUNC option because that's less code churn.
Also, as it can still be useful to have a look at the temporary state
file after a crash or a failure, doing unlink() in the error code
paths is no good option IMO.

I`m sorry, but it was an production system, so, as I understand it,
stale temp file was hastily deleted without long considerations.

Thank you for your interest in this topic.

Have others thoughts or objections to share?
--
Michael

--
Grigory Smolkin
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

#4Andres Freund
andres@anarazel.de
In reply to: Grigory Smolkin (#1)
Re: logical replication: could not create file "state.tmp": File exists

Hi,

On 2019-11-30 15:09:39 +0300, Grigory Smolkin wrote:

One of my colleagues encountered an out of space condition, which broke his
logical replication setup.
It`s manifested with the following errors:

ERROR:� could not receive data from WAL stream: ERROR:� could not create
file "pg_replslot/some_sub/state.tmp": File exists

Hm. What was the log output leading to this state? Some cases of this
would end up in a PANIC, which'd remove the .tmp file during
recovery. But there's some where we won't - it seems the right fix for
this would be to unlink the tmp file in that case?

I`ve digged a bit into this problem, and it`s turned out that in
SaveSlotToPath() temp file for replication slot is opened with 'O_CREAT |
O_EXCL' flags, which makes this routine as not very reentrant.

Since an exclusive lock is taken before temp file creation, I think it
should be safe to replace O_EXCL with O_TRUNC.

I'm very doubtful about this. I think it's a good safety measure to
ensure that there's no previous state file that we're somehow
overwriting.

Script to reproduce and patch are attached.

Well:

# Imitate out_of_space/write_operation_error
touch ${PGDATA_PUB}/pg_replslot/mysub/state.tmp

Doesn't really replicate how we got into this state...

Greetings,

Andres Freund

#5Grigory Smolkin
g.smolkin@postgrespro.ru
In reply to: Andres Freund (#4)
Re: logical replication: could not create file "state.tmp": File exists

On 12/2/19 7:12 PM, Andres Freund wrote:

Hi,

On 2019-11-30 15:09:39 +0300, Grigory Smolkin wrote:

One of my colleagues encountered an out of space condition, which broke his
logical replication setup.
It`s manifested with the following errors:

ERROR:  could not receive data from WAL stream: ERROR:  could not create
file "pg_replslot/some_sub/state.tmp": File exists

Hm. What was the log output leading to this state? Some cases of this
would end up in a PANIC, which'd remove the .tmp file during
recovery. But there's some where we won't - it seems the right fix for
this would be to unlink the tmp file in that case?

I`ve digged a bit into this problem, and it`s turned out that in
SaveSlotToPath() temp file for replication slot is opened with 'O_CREAT |
O_EXCL' flags, which makes this routine as not very reentrant.

Since an exclusive lock is taken before temp file creation, I think it
should be safe to replace O_EXCL with O_TRUNC.

I'm very doubtful about this. I think it's a good safety measure to
ensure that there's no previous state file that we're somehow
overwriting.

Is it possible with exclusive lock taken before that?

Script to reproduce and patch are attached.

Well:

# Imitate out_of_space/write_operation_error
touch ${PGDATA_PUB}/pg_replslot/mysub/state.tmp

Doesn't really replicate how we got into this state...

But it replicate the exactly the same state we would get, if write() to
temp file would have failed with out of space.

Greetings,

Andres Freund

--
Grigory Smolkin
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

#6Dmitry Vasiliev
dmitry.vasiliev@coins.ph
In reply to: Grigory Smolkin (#5)
Re: logical replication: could not create file "state.tmp": File exists

Here's what happened from the publisher and subscriber point of view:

publisher: (some query) ERROR: could not write to tuplestore temporary
file: No space left on device
subscriber: db =, user =, app =, client = ERROR: could not receive data
from WAL stream: ERROR: could not write to file
"pg_logical/snapshots/2AE-F3E52FB8.snap.27574.tmp": No space left on device
subscriber: db =, user =, app =, client = LOG: background worker "logical
replication worker" (PID 23114) exited with exit code 1
subscriber: db =, user =, app =, client = LOG: logical replication apply
worker for subscription "<name> _sub" has started
publisher: LOG: received replication command: IDENTIFY_SYSTEM
publisher: LOG: received replication command: START_REPLICATION SLOT
"<name> _sub" LOGICAL 2AE/F3C0B920 (proto_version '1', publication_names
'"<name>_pub"')
publisher: ERROR: could not create file "pg_replslot
/<name>_sub/state.tmp": File exists

I think some publisher logs may not be available due to out of space
problem.

On Mon, Dec 2, 2019 at 7:54 PM Grigory Smolkin <g.smolkin@postgrespro.ru>
wrote:

Show quoted text

On 12/2/19 7:12 PM, Andres Freund wrote:

Hi,

On 2019-11-30 15:09:39 +0300, Grigory Smolkin wrote:

One of my colleagues encountered an out of space condition, which broke

his

logical replication setup.
It`s manifested with the following errors:

ERROR: could not receive data from WAL stream: ERROR: could not create
file "pg_replslot/some_sub/state.tmp": File exists

Hm. What was the log output leading to this state? Some cases of this
would end up in a PANIC, which'd remove the .tmp file during
recovery. But there's some where we won't - it seems the right fix for
this would be to unlink the tmp file in that case?

I`ve digged a bit into this problem, and it`s turned out that in
SaveSlotToPath() temp file for replication slot is opened with 'O_CREAT

|

O_EXCL' flags, which makes this routine as not very reentrant.

Since an exclusive lock is taken before temp file creation, I think it
should be safe to replace O_EXCL with O_TRUNC.

I'm very doubtful about this. I think it's a good safety measure to
ensure that there's no previous state file that we're somehow
overwriting.

Is it possible with exclusive lock taken before that?

Script to reproduce and patch are attached.

Well:

# Imitate out_of_space/write_operation_error
touch ${PGDATA_PUB}/pg_replslot/mysub/state.tmp

Doesn't really replicate how we got into this state...

But it replicate the exactly the same state we would get, if write() to
temp file would have failed with out of space.

Greetings,

Andres Freund

--
Grigory Smolkin
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

#7Grigory Smolkin
g.smolkin@postgrespro.ru
In reply to: Andres Freund (#4)
Re: logical replication: could not create file "state.tmp": File exists

On 12/2/19 7:12 PM, Andres Freund wrote:

Hi,

On 2019-11-30 15:09:39 +0300, Grigory Smolkin wrote:

One of my colleagues encountered an out of space condition, which broke his
logical replication setup.
It`s manifested with the following errors:

ERROR:  could not receive data from WAL stream: ERROR:  could not create
file "pg_replslot/some_sub/state.tmp": File exists

Hm. What was the log output leading to this state? Some cases of this
would end up in a PANIC, which'd remove the .tmp file during
recovery. But there's some where we won't - it seems the right fix for
this would be to unlink the tmp file in that case?

I`ve digged a bit into this problem, and it`s turned out that in
SaveSlotToPath() temp file for replication slot is opened with 'O_CREAT |
O_EXCL' flags, which makes this routine as not very reentrant.

Since an exclusive lock is taken before temp file creation, I think it
should be safe to replace O_EXCL with O_TRUNC.

I'm very doubtful about this. I think it's a good safety measure to
ensure that there's no previous state file that we're somehow
overwriting.

Script to reproduce and patch are attached.

Well:

# Imitate out_of_space/write_operation_error
touch ${PGDATA_PUB}/pg_replslot/mysub/state.tmp

Doesn't really replicate how we got into this state...

I`ve managed to reproduce the issue using the attached script:

2019-12-02 20:32:12.547 MSK [4180] ERROR:  could not write to file
"pg_logical/snapshots/0-160E9A28.snap.4180.tmp": No space left on device
2019-12-02 20:32:12.548 MSK [4178] ERROR:  could not receive data from
WAL stream: ERROR:  could not write to file
"pg_logical/snapshots/0-160E9A28.snap.4180.tmp": No space left on device
2019-12-02 20:32:12.557 MSK [4645] LOG:  logical replication apply
worker for subscription "mysub" has started
2019-12-02 20:32:12.562 MSK [4091] LOG:  background worker "logical
replication worker" (PID 4178) exited with exit code 1
2019-12-02 20:32:12.569 MSK [4648] LOG:  starting logical decoding for
slot "mysub"
2019-12-02 20:32:12.569 MSK [4648] DETAIL:  Streaming transactions
committing after 0/160E9A28, reading WAL from 0/1566A2D8.
2019-12-02 20:32:12.569 MSK [4648] LOG:  logical decoding found
consistent point at 0/1566A2D8
2019-12-02 20:32:12.569 MSK [4648] DETAIL:  Logical decoding will begin
using saved snapshot.
2019-12-02 20:32:12.672 MSK [4648] ERROR:  could not write to file
"pg_replslot/mysub/state.tmp": No space left on device
2019-12-02 20:32:12.672 MSK [4645] ERROR:  could not receive data from
WAL stream: ERROR:  could not write to file
"pg_replslot/mysub/state.tmp": No space left on device
2019-12-02 20:32:12.672 MSK [4091] LOG:  background worker "logical
replication worker" (PID 4645) exited with exit code 1
2019-12-02 20:32:17.684 MSK [4653] LOG:  logical replication apply
worker for subscription "mysub" has started
2019-12-02 20:32:17.693 MSK [4654] LOG:  starting logical decoding for
slot "mysub"
2019-12-02 20:32:17.693 MSK [4654] DETAIL:  Streaming transactions
committing after 0/160E9A28, reading WAL from 0/1566A2D8.
2019-12-02 20:32:17.693 MSK [4654] LOG:  logical decoding found
consistent point at 0/1566A2D8
2019-12-02 20:32:17.693 MSK [4654] DETAIL:  Logical decoding will begin
using saved snapshot.
2019-12-02 20:32:17.847 MSK [4654] ERROR:  could not write to file
"pg_logical/snapshots/0-160E9A28.snap.4654.tmp": No space left on device
2019-12-02 20:32:17.847 MSK [4653] ERROR:  could not receive data from
WAL stream: ERROR:  could not write to file
"pg_logical/snapshots/0-160E9A28.snap.4654.tmp": No space left on device
2019-12-02 20:32:17.848 MSK [4091] LOG:  background worker "logical
replication worker" (PID 4653) exited with exit code 1
2019-12-02 20:32:22.859 MSK [4658] LOG:  logical replication apply
worker for subscription "mysub" has started
2019-12-02 20:32:22.868 MSK [4659] LOG:  starting logical decoding for
slot "mysub"
2019-12-02 20:32:22.868 MSK [4659] DETAIL:  Streaming transactions
committing after 0/160E9A28, reading WAL from 0/1566A2D8.
2019-12-02 20:32:22.868 MSK [4659] LOG:  logical decoding found
consistent point at 0/1566A2D8
2019-12-02 20:32:22.868 MSK [4659] DETAIL:  Logical decoding will begin
using saved snapshot.
2019-12-02 20:32:22.868 MSK [4659] ERROR:  could not create file
"pg_replslot/mysub/state.tmp": File exists
2019-12-02 20:32:22.869 MSK [4658] ERROR:  could not receive data from
WAL stream: ERROR:  could not create file "pg_replslot/mysub/state.tmp":
File exists

Greetings,

Andres Freund

--
Grigory Smolkin
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachments:

replication_bug.shapplication/x-shellscript; name=replication_bug.shDownload
#8Michael Paquier
michael@paquier.xyz
In reply to: Andres Freund (#4)
Re: logical replication: could not create file "state.tmp": File exists

On Mon, Dec 02, 2019 at 08:12:22AM -0800, Andres Freund wrote:

I'm very doubtful about this. I think it's a good safety measure to
ensure that there's no previous state file that we're somehow
overwriting.

During the checkpoint of replication slots, SaveSlotToPath() would
just *LOG* any failure while leaving around the state.tmp of a slot,
and then any follow-up attempt to create state.tmp would just fail
because of that, preventing the slot state file from being flushed
continuously. I think that's wrong. Concurrency is not a concern
either here as the slot's LWLock to track an I/O in progress is taken
in exclusive lock.
--
Michael

#9Andres Freund
andres@anarazel.de
In reply to: Michael Paquier (#8)
Re: logical replication: could not create file "state.tmp": File exists

Hi,

On 2019-12-11 14:27:45 +0900, Michael Paquier wrote:

On Mon, Dec 02, 2019 at 08:12:22AM -0800, Andres Freund wrote:

I'm very doubtful about this. I think it's a good safety measure to
ensure that there's no previous state file that we're somehow
overwriting.

During the checkpoint of replication slots, SaveSlotToPath() would
just *LOG* any failure while leaving around the state.tmp of a slot,
and then any follow-up attempt to create state.tmp would just fail
because of that, preventing the slot state file from being flushed
continuously. I think that's wrong. Concurrency is not a concern
either here as the slot's LWLock to track an I/O in progress is taken
in exclusive lock.

I'm not clear on what you're saying? I don't see how I'm arguing for the
type of behaviour you seem to be describing?

Greetings,

Andres Freund

#10Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Grigory Smolkin (#1)
Re: logical replication: could not create file "state.tmp": File exists

On 2019-Nov-30, Grigory Smolkin wrote:

One of my colleagues encountered an out of space condition, which broke his
logical replication setup.
It`s manifested with the following errors:

ERROR:� could not receive data from WAL stream: ERROR:� could not create
file "pg_replslot/some_sub/state.tmp": File exists

We haven't addressed this problem, have we? At least, I see that this
routine still looks like this:

diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 21ae8531b3..ef415b44c8 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -1253,7 +1253,7 @@ SaveSlotToPath(ReplicationSlot *slot, const char *dir, int elevel)
sprintf(tmppath, "%s/state.tmp", dir);
sprintf(path, "%s/state", dir);

- fd = OpenTransientFile(tmppath, O_CREAT | O_EXCL | O_WRONLY | PG_BINARY);

--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#11Michael Paquier
michael@paquier.xyz
In reply to: Alvaro Herrera (#10)
Re: logical replication: could not create file "state.tmp": File exists

On Mon, May 25, 2020 at 04:35:41PM -0400, Alvaro Herrera wrote:

We haven't addressed this problem, have we? At least, I see that this
routine still looks like this:

Yeah, I think that we still need to do something here. I got this
issue on my stack for some time but I have not been able to get around
it..
--
Michael