Something is wrong with wal_compression
The symptom being exhibited by Michael's new BF animal tanager
is perfectly reproducible elsewhere.
$ cat /home/postgres/tmp/temp_config
#default_toast_compression = lz4
wal_compression = lz4
$ export TEMP_CONFIG=/home/postgres/tmp/temp_config
$ cd ~/pgsql/src/test/recovery
$ make check PROVE_TESTS=t/011_crash_recovery.pl
...
+++ tap check in src/test/recovery +++
t/011_crash_recovery.pl .. 1/?
# Failed test 'new xid after restart is greater'
# at t/011_crash_recovery.pl line 53.
# '729'
# >
# '729'
# Failed test 'xid is aborted after crash'
# at t/011_crash_recovery.pl line 57.
# got: 'committed'
# expected: 'aborted'
# Looks like you failed 2 tests of 3.
Maybe this is somehow the test script's fault, but I don't see how.
It fails the same way with 'wal_compression = pglz', so I think it's
generic to that whole feature rather than specific to LZ4.
regards, tom lane
On Thu, Jan 26, 2023 at 02:43:29PM -0500, Tom Lane wrote:
The symptom being exhibited by Michael's new BF animal tanager
is perfectly reproducible elsewhere.
I think these tests have always failed with wal_compression ?
/messages/by-id/20210308.173242.463790587797836129.horikyota.ntt@gmail.com
/messages/by-id/20210313012820.GJ29463@telsasoft.com
/messages/by-id/20220222231948.GJ9008@telsasoft.com
/messages/by-id/YNqWd2GSMrnqWIfx@paquier.xyz
|My buildfarm machine has been changed to use wal_compression = lz4,
|while on it for HEAD runs.
Justin Pryzby <pryzby@telsasoft.com> writes:
On Thu, Jan 26, 2023 at 02:43:29PM -0500, Tom Lane wrote:
The symptom being exhibited by Michael's new BF animal tanager
is perfectly reproducible elsewhere.
I think these tests have always failed with wal_compression ?
If that's a known problem, and we've done nothing about it,
that is pretty horrid. That test case is demonstrating fundamental
database corruption after a crash.
regards, tom lane
On Thu, Jan 26, 2023 at 02:08:27PM -0600, Justin Pryzby wrote:
On Thu, Jan 26, 2023 at 02:43:29PM -0500, Tom Lane wrote:
The symptom being exhibited by Michael's new BF animal tanager
is perfectly reproducible elsewhere.
I think these tests have always failed with wal_compression?
/messages/by-id/20210308.173242.463790587797836129.horikyota.ntt@gmail.com
/messages/by-id/20210313012820.GJ29463@telsasoft.com
/messages/by-id/20220222231948.GJ9008@telsasoft.com
+ /messages/by-id/c86ce84f-dd38-9951-102f-13a931210f52@dunslane.net
On Thu, Jan 26, 2023 at 12:12 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
That test case is demonstrating fundamental
database corruption after a crash.
Not exactly corruption. The XID was not persisted and buffer data did not
hit disk. The database is in the correct state.
It was discussed long before WAL compression here [0]. The thing is that
it is easier to reproduce with compression, but compression has nothing
to do with it, as far as I understand.
A proposed fix is here [1], but I think it's better to fix the test. It
should not verify the XID, but rather the side effects of "CREATE TABLE
mine(x integer);".
Best regards, Andrey Borodin.
[0]: /messages/by-id/565FB155-C6B0-41E2-8C44-7B514DC25132@yandex-team.ru
[1]: /messages/by-id/20210313012820.GJ29463@telsasoft.com
Andrey Borodin <amborodin86@gmail.com> writes:
On Thu, Jan 26, 2023 at 12:12 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
That test case is demonstrating fundamental
database corruption after a crash.
Not exactly corruption. The XID was not persisted and buffer data did not
hit disk. The database is in the correct state.
Really? I don't see how this part is even a little bit okay:
[00:40:50.744](0.046s) not ok 3 - xid is aborted after crash
[00:40:50.745](0.001s)
[00:40:50.745](0.000s) # Failed test 'xid is aborted after crash'
# at t/011_crash_recovery.pl line 57.
[00:40:50.746](0.001s) # got: 'committed'
# expected: 'aborted'
If any tuples made by that transaction had reached disk,
we'd have a problem.
regards, tom lane
On Fri, Jan 27, 2023 at 11:14 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Andrey Borodin <amborodin86@gmail.com> writes:
On Thu, Jan 26, 2023 at 12:12 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
That test case is demonstrating fundamental
database corruption after a crash.
Not exactly corruption. The XID was not persisted and buffer data did not
hit disk. The database is in the correct state.
Really? I don't see how this part is even a little bit okay:
[00:40:50.744](0.046s) not ok 3 - xid is aborted after crash
[00:40:50.745](0.001s)
[00:40:50.745](0.000s) # Failed test 'xid is aborted after crash'
# at t/011_crash_recovery.pl line 57.
[00:40:50.746](0.001s) # got: 'committed'
# expected: 'aborted'
If any tuples made by that transaction had reached disk,
we'd have a problem.
The problem is that the WAL wasn't flushed, allowing the same xid to
be allocated again after crash recovery. But for any data pages to
hit the disk, we'd have to flush WAL first, so then it couldn't
happen, no? FWIW I also re-complained about the dangers of anyone
relying on pg_xact_status() for its stated purpose after seeing
tanager's failure [1].
[1]: /messages/by-id/CA+hUKGJ9p2JPPMA4eYAKq=r9d_4_8vziet_tS1LEBbiny5-ypA@mail.gmail.com
Thomas Munro <thomas.munro@gmail.com> writes:
On Fri, Jan 27, 2023 at 11:14 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
If any tuples made by that transaction had reached disk,
we'd have a problem.
The problem is that the WAL wasn't flushed, allowing the same xid to
be allocated again after crash recovery. But for any data pages to
hit the disk, we'd have to flush WAL first, so then it couldn't
happen, no?
Ah, now I get the point: the "committed xact" seen after restart
isn't the same one as we saw before the crash, but a new one that
was given the same XID because nothing about the old one had made
it to disk yet.
FWIW I also re-complained about the dangers of anyone
relying on pg_xact_status() for its stated purpose after seeing
tanager's failure[1].
Indeed, it seems like this behavior makes pg_xact_status() basically
useless as things stand.
regards, tom lane
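The failure mode Tom describes can be sketched with a toy simulation (invented names and structure, not PostgreSQL internals): the WAL record that consumed the XID never reaches disk, so recovery restarts allocation from the checkpoint's value, an unrelated post-crash transaction receives the same XID and commits, and a status probe for the old XID now reports 'committed':

```python
# Toy model: an XID allocation is durable only if the WAL record that
# consumed it was flushed before the crash. All names are invented.

class ToyServer:
    def __init__(self, next_xid=729, durable_wal=()):
        self.next_xid = next_xid
        self.wal_buffer = []              # WAL records not yet flushed to "disk"
        self.durable_wal = list(durable_wal)
        self.clog = {}                    # xid -> 'committed'

    def begin(self):
        xid = self.next_xid
        self.next_xid += 1
        self.wal_buffer.append(('assign', xid))   # lives only in memory so far
        return xid

    def commit(self, xid):
        # Commit forces a WAL flush, making everything buffered so far durable.
        self.wal_buffer.append(('commit', xid))
        self.durable_wal += self.wal_buffer
        self.wal_buffer = []
        self.clog[xid] = 'committed'

    def crash_and_recover(self):
        # The unflushed WAL buffer is lost. Restart from the checkpoint's
        # next-xid (729 here) and replay only the durable records.
        recovered = ToyServer(next_xid=729)
        for kind, xid in self.durable_wal:
            recovered.next_xid = max(recovered.next_xid, xid + 1)
            if kind == 'commit':
                recovered.clog[xid] = 'committed'
        return recovered

    def xact_status(self, xid):
        # An xid with no commit record reads as aborted.
        return self.clog.get(xid, 'aborted')

s = ToyServer()
old_xid = s.begin()          # 729, recorded only in the in-memory WAL buffer
s = s.crash_and_recover()    # allocation lost: recovery restarts at 729

new_xid = s.begin()          # the *same* xid, handed to an unrelated session
s.commit(new_xid)

# A client that saved old_xid before the crash now gets a wrong answer:
print(old_xid, new_xid, s.xact_status(old_xid))   # 729 729 committed
```

This mirrors the test output above exactly: the "new xid after restart is greater" check fails with 729 vs 729, and the aborted-after-crash check sees 'committed'.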
On Thu, Jan 26, 2023 at 3:04 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Indeed, it seems like this behavior makes pg_xact_status() basically
useless as things stand.
If we agree that xid allocation is not something persistent, let's fix
the test? We can replace a check with select * from pg_class or,
maybe, add an amcheck run.
As far as I recollect, this test was introduced to test this new
function in 857ee8e391f.
Best regards, Andrey Borodin.
Andrey Borodin <amborodin86@gmail.com> writes:
On Thu, Jan 26, 2023 at 3:04 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Indeed, it seems like this behavior makes pg_xact_status() basically
useless as things stand.
If we agree that xid allocation is not something persistent, let's fix
the test?
If we're not going to fix this behavior, we need to fix the docs
to disclaim that pg_xact_status() is of use for what it's said
to be good for.
regards, tom lane
On Thu, Jan 26, 2023 at 04:14:57PM -0800, Andrey Borodin wrote:
If we agree that xid allocation is not something persistent, let's fix
the test? We can replace a check with select * from pg_class or,
maybe, add an amcheck run.
As far as I recollect, this test was introduced to test this new
function in 857ee8e391f.
My opinion would be to make this function more reliable, FWIW, even if
that involves a performance impact when called in a close loop by
forcing more WAL flushes to ensure its report durability and
consistency. As things stand, this is basically unreliable, and we
document it as something applications can *use*. Adding a note in the
docs to say that this function can be unstable for some edge cases
does not make much sense to me, either. Commit 857ee8e itself says
that we can use it if a database connection is lost, which could
happen on a crash..
--
Michael
On Fri, Jan 27, 2023 at 1:30 PM Michael Paquier <michael@paquier.xyz> wrote:
On Thu, Jan 26, 2023 at 04:14:57PM -0800, Andrey Borodin wrote:
If we agree that xid allocation is not something persistent, let's fix
the test? We can replace a check with select * from pg_class or,
maybe, add an amcheck run.
As far as I recollect, this test was introduced to test this new
function in 857ee8e391f.
My opinion would be to make this function more reliable, FWIW, even if
that involves a performance impact when called in a close loop by
forcing more WAL flushes to ensure its report durability and
consistency. As things stand, this is basically unreliable, and we
document it as something applications can *use*. Adding a note in the
docs to say that this function can be unstable for some edge cases
does not make much sense to me, either. Commit 857ee8e itself says
that we can use it if a database connection is lost, which could
happen on a crash..
Yeah, the other thread has a patch for that. But it would hurt some
workloads. A better patch would do some kind of amortisation
(reserving N xids at a time or some such scheme, while being careful
to make sure the right CLOG pages etc exist if you crash and skip a
bunch of xids on recovery) but be more complicated. For the record,
back before release 13 added the 64 bit xid allocator, these functions
(or rather their txid_XXX ancestors) were broken in a different way:
they didn't track epochs reliably, the discovery of which led to the
new xid8-based functions, so that might provide a natural
back-patching range, if a back-patchable solution can be agreed on.
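The amortisation scheme Thomas sketches could look roughly like this (a toy model with invented names, ignoring CLOG page initialisation and concurrency): durably log a reservation of N XIDs at a time, and on recovery resume allocation past the whole reserved range, so a crash may waste XIDs but can never reuse one:

```python
# Sketch of amortised XID reservation; illustrative only, not PostgreSQL code.

RESERVE_N = 64   # reservation size; a tuning knob in this sketch

class XidAllocator:
    def __init__(self, durable_reserved_upto=0):
        self.reserved_upto = durable_reserved_upto  # durably logged high mark
        self.next_xid = durable_reserved_upto       # recovery skips the range
        self.flushes = 0

    def allocate(self):
        if self.next_xid >= self.reserved_upto:
            # Durably log the new reservation *before* handing out any xid
            # inside it; the counter stands in for an immediate WAL flush.
            self.reserved_upto = self.next_xid + RESERVE_N
            self.flushes += 1
        xid = self.next_xid
        self.next_xid += 1
        return xid

    def crash_and_recover(self):
        # In-memory state is lost, but the reservation record survived, so
        # we resume past every xid that might have been handed out.
        return XidAllocator(durable_reserved_upto=self.reserved_upto)

a = XidAllocator()
first_run = [a.allocate() for _ in range(10)]   # xids 0..9, one flush total
flushes_before_crash = a.flushes                # 1, not 10
a = a.crash_and_recover()
after_crash = a.allocate()                      # 64: skips 10..63, reuses none
```

The flush cost is paid once per RESERVE_N allocations instead of once per allocation, at the price of a gap in the XID sequence after a crash.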
Thomas Munro <thomas.munro@gmail.com> writes:
On Fri, Jan 27, 2023 at 1:30 PM Michael Paquier <michael@paquier.xyz> wrote:
My opinion would be to make this function more reliable, FWIW, even if
that involves a performance impact when called in a close loop by
forcing more WAL flushes to ensure its report durability and
consistency.
Yeah, the other thread has a patch for that. But it would hurt some
workloads.
I think we need to get the thing correct first and worry about
performance later. What's wrong with simply making pg_xact_status
write and flush a record of the XID's existence before returning it?
Yeah, it will cost you if you use that function, but not if you don't.
A better patch would do some kind of amortisation
(reserving N xids at a time or some such scheme, while being careful
to make sure the right CLOG pages etc exist if you crash and skip a
bunch of xids on recovery) but be more complicated.
Maybe that would be appropriate for HEAD, but I'd be wary of adding
anything complicated to the back branches. This is clearly a very
under-tested area.
regards, tom lane
On Fri, Jan 27, 2023 at 3:04 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Thomas Munro <thomas.munro@gmail.com> writes:
On Fri, Jan 27, 2023 at 1:30 PM Michael Paquier <michael@paquier.xyz> wrote:
My opinion would be to make this function more reliable, FWIW, even if
that involves a performance impact when called in a close loop by
forcing more WAL flushes to ensure its report durability and
consistency.
Yeah, the other thread has a patch for that. But it would hurt some
workloads.
I think we need to get the thing correct first and worry about
performance later. What's wrong with simply making pg_xact_status
write and flush a record of the XID's existence before returning it?
Yeah, it will cost you if you use that function, but not if you don't.
It would be pg_current_xact_id() that would have to pay the cost of
the WAL flush, not pg_xact_status() itself, but yeah that's what the
patch does (with some optimisations). I guess one question is whether
there are any other reasonable real world uses of
pg_current_xact_id(), other than the original goal [1]. If not, then
at least you are penalising the right users, even though they probably
only actually call pg_xact_status() in extremely rare circumstances
(if COMMIT hangs up). But I wouldn't be surprised if people have
found other reasons to be interested in xid observability, related to
distributed transactions and snapshots and suchlike. There is no
doubt that the current situation is unacceptable, though, so maybe we
really should just do it and make a faster one later. Anyone else
want to vote on this?
[1]: /messages/by-id/CAMsr+YHQiWNEi0daCTboS40T+V5s_+dst3PYv_8v2wNVH+Xx4g@mail.gmail.com
Thomas Munro <thomas.munro@gmail.com> writes:
On Fri, Jan 27, 2023 at 3:04 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
I think we need to get the thing correct first and worry about
performance later. What's wrong with simply making pg_xact_status
write and flush a record of the XID's existence before returning it?
Yeah, it will cost you if you use that function, but not if you don't.
It would be pg_current_xact_id() that would have to pay the cost of
the WAL flush, not pg_xact_status() itself,
Right, typo on my part.
regards, tom lane
On Fri, 2023-01-27 at 16:15 +1300, Thomas Munro wrote:
On Fri, Jan 27, 2023 at 3:04 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Thomas Munro <thomas.munro@gmail.com> writes:
On Fri, Jan 27, 2023 at 1:30 PM Michael Paquier <michael@paquier.xyz> wrote:
My opinion would be to make this function more reliable, FWIW, even if
that involves a performance impact when called in a close loop by
forcing more WAL flushes to ensure its report durability and
consistency.
Yeah, the other thread has a patch for that. But it would hurt some
workloads.
I think we need to get the thing correct first and worry about
performance later. What's wrong with simply making pg_xact_status
write and flush a record of the XID's existence before returning it?
Yeah, it will cost you if you use that function, but not if you don't.
There is no
doubt that the current situation is unacceptable, though, so maybe we
really should just do it and make a faster one later. Anyone else
want to vote on this?
I wasn't aware of the existence of pg_xact_status, so I suspect that it
is not a widely known and used feature. After reading the documentation,
I'd say that anybody who uses it will want it to give a reliable answer.
So I'd agree that it is better to make it more expensive, but live up to
its promise.
Yours,
Laurenz Albe
On Fri, Jan 27, 2023 at 06:06:05AM +0100, Laurenz Albe wrote:
On Fri, 2023-01-27 at 16:15 +1300, Thomas Munro wrote:
There is no
doubt that the current situation is unacceptable, though, so maybe we
really should just do it and make a faster one later. Anyone else
want to vote on this?
I wasn't aware of the existence of pg_xact_status, so I suspect that it
is not a widely known and used feature. After reading the documentation,
I'd say that anybody who uses it will want it to give a reliable answer.
So I'd agree that it is better to make it more expensive, but live up to
its promise.
A code search within the Debian packages (codesearch.debian.net) and
GitHub does not show that it is actually used; pg_xact_status() only
shows up in copies of the Postgres code in the regression tests.
FWIW, my vote goes for a more expensive but reliable function even in
stable branches. Even 857ee8e mentions that this could be used on a
lost connection, so we don't even satisfy the use case of the original
commit as things stand (right?), because a lost connection could just be
the result of a crash, and if crash recovery reassigns the XID, then the
client gets it wrong.
--
Michael
Hi,
On 2023-01-27 16:15:08 +1300, Thomas Munro wrote:
It would be pg_current_xact_id() that would have to pay the cost of
the WAL flush, not pg_xact_status() itself, but yeah that's what the
patch does (with some optimisations). I guess one question is whether
there are any other reasonable real world uses of
pg_current_xact_id(), other than the original goal[1].
txid_current() is a lot older than pg_current_xact_id(), and they're backed by
the same code afaict. 8.4 I think.
Unfortunately, txid_current() is used in plenty of monitoring setups IME.
I don't think it's a good idea to make a function that was quite cheap for 15
years, suddenly be several orders of magnitude more expensive...
Greetings,
Andres Freund
On Fri, Jan 27, 2023, 18:58 Andres Freund <andres@anarazel.de> wrote:
Hi,
On 2023-01-27 16:15:08 +1300, Thomas Munro wrote:
It would be pg_current_xact_id() that would have to pay the cost of
the WAL flush, not pg_xact_status() itself, but yeah that's what the
patch does (with some optimisations). I guess one question is whether
there are any other reasonable real world uses of
pg_current_xact_id(), other than the original goal[1].
txid_current() is a lot older than pg_current_xact_id(), and they're backed by
the same code afaict. 8.4 I think.
Unfortunately, txid_current() is used in plenty of monitoring setups IME.
I don't think it's a good idea to make a function that was quite cheap for 15
years, suddenly be several orders of magnitude more expensive...
As someone working on a monitoring tool that uses it (well, both), +1. We'd
have to rethink a few things if this becomes a performance concern.
Thanks,
Maciek
Hi,
On 2023-01-28 11:38:50 +0900, Michael Paquier wrote:
On Fri, Jan 27, 2023 at 06:06:05AM +0100, Laurenz Albe wrote:
On Fri, 2023-01-27 at 16:15 +1300, Thomas Munro wrote:
There is no
doubt that the current situation is unacceptable, though, so maybe we
really should just do it and make a faster one later. Anyone else
want to vote on this?
I wasn't aware of the existence of pg_xact_status, so I suspect that it
is not a widely known and used feature. After reading the documentation,
I'd say that anybody who uses it will want it to give a reliable answer.
So I'd agree that it is better to make it more expensive, but live up to
its promise.
A code search within the Debian packages (codesearch.debian.net) and
GitHub does not show that it is actually used; pg_xact_status() only
shows up in copies of the Postgres code in the regression tests.
Not finding a user at codesearch.debian.net provides useful information for C
APIs, but a negative result for an SQL exposed function doesn't provide any
information. Those callers will largely be in application code, which largely
won't be in debian.
And as noted two messages up, we wouldn't need to flush in pg_xact_status(),
we'd need to flush in pg_current_xact_id()/txid_current().
FWIW, my vote goes for a more expensive but reliable function even in
stable branches.
I very strenuously object. If we make txid_current() (by way of
pg_current_xact_id()) flush WAL, we'll cause outages.
Greetings,
Andres Freund