logical replication: could not create file "state.tmp": File exists
Hello!
One of my colleagues encountered an out of space condition, which broke
his logical replication setup.
It manifested with the following errors:
ERROR: could not receive data from WAL stream: ERROR: could not create
file "pg_replslot/some_sub/state.tmp": File exists
I've dug a bit into this problem, and it turns out that in
SaveSlotToPath() the temp file for the replication slot is opened with 'O_CREAT
| O_EXCL' flags, which makes this routine not very reentrant.
Since an exclusive lock is taken before temp file creation, I think it
should be safe to replace O_EXCL with O_TRUNC.
Script to reproduce and patch are attached.
--
Grigory Smolkin
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
On Sat, Nov 30, 2019 at 03:09:39PM +0300, Grigory Smolkin wrote:
I've dug a bit into this problem, and it turns out that in
SaveSlotToPath() the temp file for the replication slot is opened with 'O_CREAT
| O_EXCL' flags, which makes this routine not very reentrant.
What did you see as the I/O problem before facing the actual error
reported here? Was it just ENOSPC, an fsync failure, or just a failure
in closing the fd? The first pattern is mostly what I guess happened;
still, an fsync failure would not trigger a PANIC here (actually we
really should do that!), but I am raising a different thread about
that issue.
Since an exclusive lock is taken before temp file creation, I think it
should be safe to replace O_EXCL with O_TRUNC.
Script to reproduce and patch are attached.
Agreed. I prefer the O_TRUNC option because that's less code churn.
Also, as it can still be useful to have a look at the temporary state
file after a crash or a failure, doing unlink() in the error code
paths is not a good option IMO.
Do others have thoughts or objections to share?
--
Michael
On 12/2/19 7:35 AM, Michael Paquier wrote:
On Sat, Nov 30, 2019 at 03:09:39PM +0300, Grigory Smolkin wrote:
I've dug a bit into this problem, and it turns out that in
SaveSlotToPath() the temp file for the replication slot is opened with 'O_CREAT
| O_EXCL' flags, which makes this routine not very reentrant.

What did you see as the I/O problem before facing the actual error
reported here? Was it just ENOSPC, an fsync failure, or just a failure
in closing the fd? The first pattern is mostly what I guess happened;
still, an fsync failure would not trigger a PANIC here (actually we
really should do that!), but I am raising a different thread about
that issue.
Hello!
I didn't see the very first error that left the temp file behind.
I've requested it just now, but it will take some time to get it (there
are several terabytes of text log).
But I assume that it was an out-of-space error, which, by the look of the
code, should produce an ERROR and leave the temp file behind, just as it
happened in the aforementioned case.
Since an exclusive lock is taken before temp file creation, I think it
should be safe to replace O_EXCL with O_TRUNC.
Script to reproduce and patch are attached.

Agreed. I prefer the O_TRUNC option because that's less code churn.
Also, as it can still be useful to have a look at the temporary state
file after a crash or a failure, doing unlink() in the error code
paths is not a good option IMO.
I'm sorry, but it was a production system, so, as I understand it, the
stale temp file was hastily deleted without much deliberation.
Thank you for your interest in this topic.
Do others have thoughts or objections to share?
--
Michael
--
Grigory Smolkin
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Hi,
On 2019-11-30 15:09:39 +0300, Grigory Smolkin wrote:
One of my colleagues encountered an out of space condition, which broke his
logical replication setup.
It manifested with the following errors:
ERROR: could not receive data from WAL stream: ERROR: could not create
file "pg_replslot/some_sub/state.tmp": File exists
Hm. What was the log output leading to this state? Some cases of this
would end up in a PANIC, which'd remove the .tmp file during
recovery. But there's some where we won't - it seems the right fix for
this would be to unlink the tmp file in that case?
I've dug a bit into this problem, and it turns out that in
SaveSlotToPath() the temp file for the replication slot is opened with 'O_CREAT
| O_EXCL' flags, which makes this routine not very reentrant.

Since an exclusive lock is taken before temp file creation, I think it
should be safe to replace O_EXCL with O_TRUNC.
I'm very doubtful about this. I think it's a good safety measure to
ensure that there's no previous state file that we're somehow
overwriting.
Script to reproduce and patch are attached.
Well:
# Imitate out_of_space/write_operation_error
touch ${PGDATA_PUB}/pg_replslot/mysub/state.tmp
Doesn't really replicate how we got into this state...
Greetings,
Andres Freund
On 12/2/19 7:12 PM, Andres Freund wrote:
Hi,
On 2019-11-30 15:09:39 +0300, Grigory Smolkin wrote:
One of my colleagues encountered an out of space condition, which broke his
logical replication setup.
It manifested with the following errors:
ERROR: could not receive data from WAL stream: ERROR: could not create
file "pg_replslot/some_sub/state.tmp": File exists

Hm. What was the log output leading to this state? Some cases of this
would end up in a PANIC, which'd remove the .tmp file during
recovery. But there's some where we won't - it seems the right fix for
this would be to unlink the tmp file in that case?

I've dug a bit into this problem, and it turns out that in
SaveSlotToPath() the temp file for the replication slot is opened with 'O_CREAT
| O_EXCL' flags, which makes this routine not very reentrant.

Since an exclusive lock is taken before temp file creation, I think it
should be safe to replace O_EXCL with O_TRUNC.

I'm very doubtful about this. I think it's a good safety measure to
ensure that there's no previous state file that we're somehow
overwriting.
Is that possible with the exclusive lock taken beforehand?
Script to reproduce and patch are attached.
Well:
# Imitate out_of_space/write_operation_error
touch ${PGDATA_PUB}/pg_replslot/mysub/state.tmp

Doesn't really replicate how we got into this state...
But it replicates exactly the same state we would get if write() to the
temp file had failed with out of space.
Greetings,
Andres Freund
--
Grigory Smolkin
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Here's what happened from the publisher and subscriber point of view:
publisher: (some query) ERROR: could not write to tuplestore temporary
file: No space left on device
subscriber: db =, user =, app =, client = ERROR: could not receive data
from WAL stream: ERROR: could not write to file
"pg_logical/snapshots/2AE-F3E52FB8.snap.27574.tmp": No space left on device
subscriber: db =, user =, app =, client = LOG: background worker "logical
replication worker" (PID 23114) exited with exit code 1
subscriber: db =, user =, app =, client = LOG: logical replication apply
worker for subscription "<name> _sub" has started
publisher: LOG: received replication command: IDENTIFY_SYSTEM
publisher: LOG: received replication command: START_REPLICATION SLOT
"<name> _sub" LOGICAL 2AE/F3C0B920 (proto_version '1', publication_names
'"<name>_pub"')
publisher: ERROR: could not create file "pg_replslot
/<name>_sub/state.tmp": File exists
I think some publisher logs may not be available due to out of space
problem.
On Mon, Dec 2, 2019 at 7:54 PM Grigory Smolkin <g.smolkin@postgrespro.ru>
wrote:
On 12/2/19 7:12 PM, Andres Freund wrote:
Hi,
On 2019-11-30 15:09:39 +0300, Grigory Smolkin wrote:
One of my colleagues encountered an out of space condition, which broke his
logical replication setup.
It manifested with the following errors:
ERROR: could not receive data from WAL stream: ERROR: could not create
file "pg_replslot/some_sub/state.tmp": File exists

Hm. What was the log output leading to this state? Some cases of this
would end up in a PANIC, which'd remove the .tmp file during
recovery. But there's some where we won't - it seems the right fix for
this would be to unlink the tmp file in that case?

I've dug a bit into this problem, and it turns out that in
SaveSlotToPath() the temp file for the replication slot is opened with 'O_CREAT
| O_EXCL' flags, which makes this routine not very reentrant.

Since an exclusive lock is taken before temp file creation, I think it
should be safe to replace O_EXCL with O_TRUNC.

I'm very doubtful about this. I think it's a good safety measure to
ensure that there's no previous state file that we're somehow
overwriting.

Script to reproduce and patch are attached.

Well:
# Imitate out_of_space/write_operation_error
touch ${PGDATA_PUB}/pg_replslot/mysub/state.tmp

Doesn't really replicate how we got into this state...
I've managed to reproduce the issue using the attached script:
2019-12-02 20:32:12.547 MSK [4180] ERROR: could not write to file
"pg_logical/snapshots/0-160E9A28.snap.4180.tmp": No space left on device
2019-12-02 20:32:12.548 MSK [4178] ERROR: could not receive data from
WAL stream: ERROR: could not write to file
"pg_logical/snapshots/0-160E9A28.snap.4180.tmp": No space left on device
2019-12-02 20:32:12.557 MSK [4645] LOG: logical replication apply
worker for subscription "mysub" has started
2019-12-02 20:32:12.562 MSK [4091] LOG: background worker "logical
replication worker" (PID 4178) exited with exit code 1
2019-12-02 20:32:12.569 MSK [4648] LOG: starting logical decoding for
slot "mysub"
2019-12-02 20:32:12.569 MSK [4648] DETAIL: Streaming transactions
committing after 0/160E9A28, reading WAL from 0/1566A2D8.
2019-12-02 20:32:12.569 MSK [4648] LOG: logical decoding found
consistent point at 0/1566A2D8
2019-12-02 20:32:12.569 MSK [4648] DETAIL: Logical decoding will begin
using saved snapshot.
2019-12-02 20:32:12.672 MSK [4648] ERROR: could not write to file
"pg_replslot/mysub/state.tmp": No space left on device
2019-12-02 20:32:12.672 MSK [4645] ERROR: could not receive data from
WAL stream: ERROR: could not write to file
"pg_replslot/mysub/state.tmp": No space left on device
2019-12-02 20:32:12.672 MSK [4091] LOG: background worker "logical
replication worker" (PID 4645) exited with exit code 1
2019-12-02 20:32:17.684 MSK [4653] LOG: logical replication apply
worker for subscription "mysub" has started
2019-12-02 20:32:17.693 MSK [4654] LOG: starting logical decoding for
slot "mysub"
2019-12-02 20:32:17.693 MSK [4654] DETAIL: Streaming transactions
committing after 0/160E9A28, reading WAL from 0/1566A2D8.
2019-12-02 20:32:17.693 MSK [4654] LOG: logical decoding found
consistent point at 0/1566A2D8
2019-12-02 20:32:17.693 MSK [4654] DETAIL: Logical decoding will begin
using saved snapshot.
2019-12-02 20:32:17.847 MSK [4654] ERROR: could not write to file
"pg_logical/snapshots/0-160E9A28.snap.4654.tmp": No space left on device
2019-12-02 20:32:17.847 MSK [4653] ERROR: could not receive data from
WAL stream: ERROR: could not write to file
"pg_logical/snapshots/0-160E9A28.snap.4654.tmp": No space left on device
2019-12-02 20:32:17.848 MSK [4091] LOG: background worker "logical
replication worker" (PID 4653) exited with exit code 1
2019-12-02 20:32:22.859 MSK [4658] LOG: logical replication apply
worker for subscription "mysub" has started
2019-12-02 20:32:22.868 MSK [4659] LOG: starting logical decoding for
slot "mysub"
2019-12-02 20:32:22.868 MSK [4659] DETAIL: Streaming transactions
committing after 0/160E9A28, reading WAL from 0/1566A2D8.
2019-12-02 20:32:22.868 MSK [4659] LOG: logical decoding found
consistent point at 0/1566A2D8
2019-12-02 20:32:22.868 MSK [4659] DETAIL: Logical decoding will begin
using saved snapshot.
2019-12-02 20:32:22.868 MSK [4659] ERROR: could not create file
"pg_replslot/mysub/state.tmp": File exists
2019-12-02 20:32:22.869 MSK [4658] ERROR: could not receive data from
WAL stream: ERROR: could not create file "pg_replslot/mysub/state.tmp":
File exists
Greetings,
Andres Freund
--
Grigory Smolkin
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
On Mon, Dec 02, 2019 at 08:12:22AM -0800, Andres Freund wrote:
I'm very doubtful about this. I think it's a good safety measure to
ensure that there's no previous state file that we're somehow
overwriting.
During the checkpoint of replication slots, SaveSlotToPath() would
just *LOG* any failure while leaving around the state.tmp of a slot,
and then any follow-up attempt to create state.tmp would just fail
because of that, preventing the slot state file from being flushed
continuously. I think that's wrong. Concurrency is not a concern
either here, as the slot's LWLock that tracks an I/O in progress is
taken in exclusive mode.
--
Michael
Hi,
On 2019-12-11 14:27:45 +0900, Michael Paquier wrote:
On Mon, Dec 02, 2019 at 08:12:22AM -0800, Andres Freund wrote:
I'm very doubtful about this. I think it's a good safety measure to
ensure that there's no previous state file that we're somehow
overwriting.

During the checkpoint of replication slots, SaveSlotToPath() would
just *LOG* any failure while leaving around the state.tmp of a slot,
and then any follow-up attempt to create state.tmp would just fail
because of that, preventing the slot state file from being flushed
continuously. I think that's wrong. Concurrency is not a concern
either here, as the slot's LWLock that tracks an I/O in progress is
taken in exclusive mode.
I'm not clear on what you're saying? I don't see how I'm arguing for the
type of behaviour you seem to be describing?
Greetings,
Andres Freund
On 2019-Nov-30, Grigory Smolkin wrote:
One of my colleagues encountered an out of space condition, which broke his
logical replication setup.
It manifested with the following errors:
ERROR: could not receive data from WAL stream: ERROR: could not create
file "pg_replslot/some_sub/state.tmp": File exists
We haven't addressed this problem, have we? At least, I see that this
routine still looks like this:
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 21ae8531b3..ef415b44c8 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -1253,7 +1253,7 @@ SaveSlotToPath(ReplicationSlot *slot, const char *dir, int elevel)
 	sprintf(tmppath, "%s/state.tmp", dir);
 	sprintf(path, "%s/state", dir);

-	fd = OpenTransientFile(tmppath, O_CREAT | O_EXCL | O_WRONLY | PG_BINARY);
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Mon, May 25, 2020 at 04:35:41PM -0400, Alvaro Herrera wrote:
We haven't addressed this problem, have we? At least, I see that this
routine still looks like this:
Yeah, I think that we still need to do something here. This issue has
been on my stack for some time, but I have not been able to get around
to it...
--
Michael