fixing old_snapshot_threshold's time->xid mapping

Started by Robert Haasalmost 6 years ago31 messageshackers
Jump to latest
#1Robert Haas
robertmhaas@gmail.com

Hi,

I'm starting a new thread for this, because the recent discussion of
problems with old_snapshot_threshold[1]/messages/by-id/20200401064008.qob7bfnnbu4w5cw4@alap3.anarazel.de touched on a lot of separate
issues, and I think it will be too confusing if we discuss all of them
on one thread. Attached are three patches.

0001 makes oldSnapshotControl "extern" rather than "static" and
exposes the struct definition via a header.

0002 adds a contrib module called old_snapshot which makes it possible
to examine the time->XID mapping via SQL. As Andres said, the comments
are not really adequate in the existing code, and the code itself is
buggy, so it was a little hard to be sure that I was understanding the
intended meaning of the different fields correctly. However, I gave it
a shot.

0003 attempts to fix bugs in MaintainOldSnapshotTimeMapping() so that
it produces a sensible mapping. I encountered and tried to fix two
issues here:

First, as previously discussed, the branch that advances the mapping
should not categorically do "oldSnapshotControl->head_timestamp = ts;"
assuming that the head_timestamp is supposed to be the timestamp for
the oldest bucket rather than the newest one. Rather, there are three
cases: (1) resetting the mapping resets head_timestamp, (2) extending
the mapping by an entry without dropping an entry leaves
head_timestamp alone, and (3) overwriting the previous head with a new
entry advances head_timestamp by 1 minute.

Second, the calculation of the number of entries by which the mapping
should advance is incorrect. It thinks that it should advance by the
number of minutes between the current head_timestamp and the incoming
timestamp. That would be correct if head_timestamp were the most
recent entry in the mapping, but it's actually the oldest. As a
result, without this fix, every time we move into a new minute, we
advance the mapping much further than we actually should. Instead of
advancing by 1, we advance by the number of entries that already exist
in the mapping - which means we now have entries that correspond to
times which are in the future, and don't advance the mapping again
until those future timestamps are in the past.

With these fixes, I seem to get reasonably sensible mappings, at least
in light testing. I tried running this in one window with \watch 10:

select *, age(newest_xmin), clock_timestamp() from
pg_old_snapshot_time_mapping();

And in another window I ran:

pgbench -T 300 -R 10

And the age does in fact advance by ~600 transactions per minute.

I'm not proposing to commit anything here right now. These patches
haven't had enough testing for that, and their interaction with other
bugs in the feature needs to be considered before we do anything.
However, I thought it might be useful to put them out for review and
comment, and I also thought that having the contrib module from 0002
might permit other people to do some better testing of this feature
and these fixes.

Thanks,

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

[1]: /messages/by-id/20200401064008.qob7bfnnbu4w5cw4@alap3.anarazel.de

Attachments:

v1-0001-Expose-oldSnapshotControl.patchapplication/octet-stream; name=v1-0001-Expose-oldSnapshotControl.patchDownload+77-54
v1-0003-Fix-bugs-in-MaintainOldSnapshotTimeMapping.patchapplication/octet-stream; name=v1-0003-Fix-bugs-in-MaintainOldSnapshotTimeMapping.patchDownload+27-4
v1-0002-contrib-old_snapshot-time-xid-mapping.patchapplication/octet-stream; name=v1-0002-contrib-old_snapshot-time-xid-mapping.patchDownload+203-1
#2Andres Freund
andres@anarazel.de
In reply to: Robert Haas (#1)
Re: fixing old_snapshot_threshold's time->xid mapping

Hi,

On 2020-04-16 12:41:55 -0400, Robert Haas wrote:

I'm starting a new thread for this, because the recent discussion of
problems with old_snapshot_threshold[1] touched on a lot of separate
issues, and I think it will be too confusing if we discuss all of them
on one thread. Attached are three patches.

Cool.

0003 attempts to fix bugs in MaintainOldSnapshotTimeMapping() so that
it produces a sensible mapping. I encountered and tried to fix two
issues here:

First, as previously discussed, the branch that advances the mapping
should not categorically do "oldSnapshotControl->head_timestamp = ts;"
assuming that the head_timestamp is supposed to be the timestamp for
the oldest bucket rather than the newest one. Rather, there are three
cases: (1) resetting the mapping resets head_timestamp, (2) extending
the mapping by an entry without dropping an entry leaves
head_timestamp alone, and (3) overwriting the previous head with a new
entry advances head_timestamp by 1 minute.

Second, the calculation of the number of entries by which the mapping
should advance is incorrect. It thinks that it should advance by the
number of minutes between the current head_timestamp and the incoming
timestamp. That would be correct if head_timestamp were the most
recent entry in the mapping, but it's actually the oldest. As a
result, without this fix, every time we move into a new minute, we
advance the mapping much further than we actually should. Instead of
advancing by 1, we advance by the number of entries that already exist
in the mapping - which means we now have entries that correspond to
times which are in the future, and don't advance the mapping again
until those future timestamps are in the past.

With these fixes, I seem to get reasonably sensible mappings, at least
in light testing. I tried running this in one window with \watch 10:

select *, age(newest_xmin), clock_timestamp() from
pg_old_snapshot_time_mapping();

And in another window I ran:

pgbench -T 300 -R 10

And the age does in fact advance by ~600 transactions per minute.

I still think we need a way to test this without waiting for hours to
hit various edge cases. You argued against a fixed binning of
old_snapshot_threshold/100 arguing its too coarse. How about a 1000 or
so? For 60 days, the current max for old_snapshot_threshold, that'd be a
granularity of 01:26:24, which seems fine. The best way I can think of
that'd keep current GUC values sensible is to change
old_snapshot_threshold to be float. Ugly, but ...?

Greetings,

Andres Freund

#3Robert Haas
robertmhaas@gmail.com
In reply to: Andres Freund (#2)
Re: fixing old_snapshot_threshold's time->xid mapping

On Thu, Apr 16, 2020 at 1:14 PM Andres Freund <andres@anarazel.de> wrote:

I still think we need a way to test this without waiting for hours to
hit various edge cases. You argued against a fixed binning of
old_snapshot_threshold/100 arguing its too coarse. How about a 1000 or
so? For 60 days, the current max for old_snapshot_threshold, that'd be a
granularity of 01:26:24, which seems fine. The best way I can think of
that'd keep current GUC values sensible is to change
old_snapshot_threshold to be float. Ugly, but ...?

Yeah, 1000 would be a lot better. However, if we switch to a fixed
number of bins, it's going to be a lot more code churn. What did you
think of my suggestion of making head_timestamp artificially move
backward to simulate the passage of time?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#4Andres Freund
andres@anarazel.de
In reply to: Robert Haas (#3)
Re: fixing old_snapshot_threshold's time->xid mapping

Hi,

On 2020-04-16 13:34:39 -0400, Robert Haas wrote:

On Thu, Apr 16, 2020 at 1:14 PM Andres Freund <andres@anarazel.de> wrote:

I still think we need a way to test this without waiting for hours to
hit various edge cases. You argued against a fixed binning of
old_snapshot_threshold/100 arguing its too coarse. How about a 1000 or
so? For 60 days, the current max for old_snapshot_threshold, that'd be a
granularity of 01:26:24, which seems fine. The best way I can think of
that'd keep current GUC values sensible is to change
old_snapshot_threshold to be float. Ugly, but ...?

Yeah, 1000 would be a lot better. However, if we switch to a fixed
number of bins, it's going to be a lot more code churn.

Given the number of things that need to be addressed around the feature,
I am not too concerned about that.

What did you think of my suggestion of making head_timestamp
artificially move backward to simulate the passage of time?

I don't think it allows to exercise the various cases well enough. We
need to be able to test this feature both interactively as well as in a
scripted manner. Edge cases like wrapping around in the time mapping imo
can not easily be tested by moving the head timestamp back.

Greetings,

Andres Freund

#5Thomas Munro
thomas.munro@gmail.com
In reply to: Andres Freund (#4)
Re: fixing old_snapshot_threshold's time->xid mapping

On Fri, Apr 17, 2020 at 5:46 AM Andres Freund <andres@anarazel.de> wrote:

On 2020-04-16 13:34:39 -0400, Robert Haas wrote:

On Thu, Apr 16, 2020 at 1:14 PM Andres Freund <andres@anarazel.de> wrote:

I still think we need a way to test this without waiting for hours to
hit various edge cases. You argued against a fixed binning of
old_snapshot_threshold/100 arguing its too coarse. How about a 1000 or
so? For 60 days, the current max for old_snapshot_threshold, that'd be a
granularity of 01:26:24, which seems fine. The best way I can think of
that'd keep current GUC values sensible is to change
old_snapshot_threshold to be float. Ugly, but ...?

Yeah, 1000 would be a lot better. However, if we switch to a fixed
number of bins, it's going to be a lot more code churn.

Given the number of things that need to be addressed around the feature,
I am not too concerned about that.

What did you think of my suggestion of making head_timestamp
artificially move backward to simulate the passage of time?

I don't think it allows to exercise the various cases well enough. We
need to be able to test this feature both interactively as well as in a
scripted manner. Edge cases like wrapping around in the time mapping imo
can not easily be tested by moving the head timestamp back.

What about a contrib function that lets you clobber
oldSnapshotControl->current_timestamp? It looks like all times in
this system come ultimately from GetSnapshotCurrentTimestamp(), which
uses that variable to make sure that time never goes backwards.
Perhaps you could abuse that, like so, from test scripts:

postgres=# select * from pg_old_snapshot_time_mapping();
array_offset | end_timestamp | newest_xmin
--------------+------------------------+-------------
0 | 3000-01-01 13:00:00+13 | 490
(1 row)

postgres=# select pg_clobber_current_snapshot_timestamp('3000-01-01 00:01:00Z');
pg_clobber_current_snapshot_timestamp
---------------------------------------

(1 row)

postgres=# select * from pg_old_snapshot_time_mapping();
array_offset | end_timestamp | newest_xmin
--------------+------------------------+-------------
0 | 3000-01-01 13:01:00+13 | 490
1 | 3000-01-01 13:02:00+13 | 490
(2 rows)

Attachments:

0003-Add-pg_clobber_current_snapshot_timestamp.patchtext/x-patch; charset=US-ASCII; name=0003-Add-pg_clobber_current_snapshot_timestamp.patchDownload+18-1
#6Thomas Munro
thomas.munro@gmail.com
In reply to: Thomas Munro (#5)
Re: fixing old_snapshot_threshold's time->xid mapping

On Fri, Apr 17, 2020 at 2:12 PM Thomas Munro <thomas.munro@gmail.com> wrote:

What about a contrib function that lets you clobber
oldSnapshotControl->current_timestamp? It looks like all times in
this system come ultimately from GetSnapshotCurrentTimestamp(), which
uses that variable to make sure that time never goes backwards.

Here's a draft TAP test that uses that technique successfully, as a
POC. It should probably be extended to cover more cases, but I
thought I'd check what people thought of the concept first before
going further. I didn't see a way to do overlapping transactions with
PostgresNode.pm, so I invented one (please excuse the bad perl); am I
missing something? Maybe it'd be better to do 002 with an isolation
test instead, but I suppose 001 can't be in an isolation test, since
it needs to connect to multiple databases, and it seemed better to do
them both the same way. It's also not entirely clear to me that
isolation tests can expect a database to be fresh and then mess with
dangerous internal state, whereas TAP tests set up and tear down a
cluster each time.

I think I found another bug in MaintainOldSnapshotTimeMapping(): if
you make time jump by more than old_snapshot_threshold in one go, then
the map gets cleared and then no early pruning or snapshot-too-old
errors happen. That's why in 002_too_old.pl it currently advances
time by 10 minutes twice, instead of 20 minutes once. To be
continued.

Attachments:

v2-0001-Expose-oldSnapshotControl.patchtext/x-patch; charset=US-ASCII; name=v2-0001-Expose-oldSnapshotControl.patchDownload+77-54
v2-0002-contrib-old_snapshot-time-xid-mapping.patchtext/x-patch; charset=US-ASCII; name=v2-0002-contrib-old_snapshot-time-xid-mapping.patchDownload+203-1
v2-0003-Fix-bugs-in-MaintainOldSnapshotTimeMapping.patchtext/x-patch; charset=US-ASCII; name=v2-0003-Fix-bugs-in-MaintainOldSnapshotTimeMapping.patchDownload+27-4
v2-0004-Add-pg_clobber_current_snapshot_timestamp.patchtext/x-patch; charset=US-ASCII; name=v2-0004-Add-pg_clobber_current_snapshot_timestamp.patchDownload+18-1
v2-0005-Truncate-old-snapshot-XIDs-before-truncating-CLOG.patchtext/x-patch; charset=US-ASCII; name=v2-0005-Truncate-old-snapshot-XIDs-before-truncating-CLOG.patchDownload+106-2
v2-0006-Add-TAP-test-for-snapshot-too-old.patchtext/x-patch; charset=US-ASCII; name=v2-0006-Add-TAP-test-for-snapshot-too-old.patchDownload+197-1
#7Dilip Kumar
dilipbalaut@gmail.com
In reply to: Thomas Munro (#6)
Re: fixing old_snapshot_threshold's time->xid mapping

On Sat, Apr 18, 2020 at 11:47 AM Thomas Munro <thomas.munro@gmail.com> wrote:

On Fri, Apr 17, 2020 at 2:12 PM Thomas Munro <thomas.munro@gmail.com> wrote:

What about a contrib function that lets you clobber
oldSnapshotControl->current_timestamp? It looks like all times in
this system come ultimately from GetSnapshotCurrentTimestamp(), which
uses that variable to make sure that time never goes backwards.

Here's a draft TAP test that uses that technique successfully, as a
POC. It should probably be extended to cover more cases, but I
thought I'd check what people thought of the concept first before
going further. I didn't see a way to do overlapping transactions with
PostgresNode.pm, so I invented one (please excuse the bad perl); am I
missing something? Maybe it'd be better to do 002 with an isolation
test instead, but I suppose 001 can't be in an isolation test, since
it needs to connect to multiple databases, and it seemed better to do
them both the same way. It's also not entirely clear to me that
isolation tests can expect a database to be fresh and then mess with
dangerous internal state, whereas TAP tests set up and tear down a
cluster each time.

I think I found another bug in MaintainOldSnapshotTimeMapping(): if
you make time jump by more than old_snapshot_threshold in one go, then
the map gets cleared and then no early pruning or snapshot-too-old
errors happen. That's why in 002_too_old.pl it currently advances
time by 10 minutes twice, instead of 20 minutes once. To be
continued.

IMHO that doesn't seems to be a problem. Because even if we jump more
than old_snapshot_threshold in one go we don't clean complete map
right. The latest snapshot timestamp will become the headtimestamp.
So in TransactionIdLimitedForOldSnapshots if (current_ts -
old_snapshot_threshold) is still >= head_timestap then we can still do
early pruning.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

#8Andres Freund
andres@anarazel.de
In reply to: Thomas Munro (#5)
Re: fixing old_snapshot_threshold's time->xid mapping

Hi,

On 2020-04-17 14:12:44 +1200, Thomas Munro wrote:

What about a contrib function that lets you clobber
oldSnapshotControl->current_timestamp? It looks like all times in
this system come ultimately from GetSnapshotCurrentTimestamp(), which
uses that variable to make sure that time never goes backwards.

It'd be better than the current test situation, and probably would be
good to have as part of testing anyway (since it'd allow to make the
tests not take long / be racy on slow machines). But I still don't think
it really allows to test the feature in a natural way. It makes it
easier to test for know edge cases / problems, but not really discover
unknown ones. For that I think we need more granular bins.

- Andres

#9Dilip Kumar
dilipbalaut@gmail.com
In reply to: Robert Haas (#1)
Re: fixing old_snapshot_threshold's time->xid mapping

On Thu, Apr 16, 2020 at 10:12 PM Robert Haas <robertmhaas@gmail.com> wrote:

Hi,

I'm starting a new thread for this, because the recent discussion of
problems with old_snapshot_threshold[1] touched on a lot of separate
issues, and I think it will be too confusing if we discuss all of them
on one thread. Attached are three patches.

0001 makes oldSnapshotControl "extern" rather than "static" and
exposes the struct definition via a header.

0002 adds a contrib module called old_snapshot which makes it possible
to examine the time->XID mapping via SQL. As Andres said, the comments
are not really adequate in the existing code, and the code itself is
buggy, so it was a little hard to be sure that I was understanding the
intended meaning of the different fields correctly. However, I gave it
a shot.

0003 attempts to fix bugs in MaintainOldSnapshotTimeMapping() so that
it produces a sensible mapping. I encountered and tried to fix two
issues here:

First, as previously discussed, the branch that advances the mapping
should not categorically do "oldSnapshotControl->head_timestamp = ts;"
assuming that the head_timestamp is supposed to be the timestamp for
the oldest bucket rather than the newest one. Rather, there are three
cases: (1) resetting the mapping resets head_timestamp, (2) extending
the mapping by an entry without dropping an entry leaves
head_timestamp alone, and (3) overwriting the previous head with a new
entry advances head_timestamp by 1 minute.

Second, the calculation of the number of entries by which the mapping
should advance is incorrect. It thinks that it should advance by the
number of minutes between the current head_timestamp and the incoming
timestamp. That would be correct if head_timestamp were the most
recent entry in the mapping, but it's actually the oldest. As a
result, without this fix, every time we move into a new minute, we
advance the mapping much further than we actually should. Instead of
advancing by 1, we advance by the number of entries that already exist
in the mapping - which means we now have entries that correspond to
times which are in the future, and don't advance the mapping again
until those future timestamps are in the past.

With these fixes, I seem to get reasonably sensible mappings, at least
in light testing. I tried running this in one window with \watch 10:

select *, age(newest_xmin), clock_timestamp() from
pg_old_snapshot_time_mapping();

And in another window I ran:

pgbench -T 300 -R 10

And the age does in fact advance by ~600 transactions per minute.

I have started reviewing these patches. I think, the fixes looks right to me.

+ LWLockAcquire(OldSnapshotTimeMapLock, LW_SHARED);
+ mapping->head_offset = oldSnapshotControl->head_offset;
+ mapping->head_timestamp = oldSnapshotControl->head_timestamp;
+ mapping->count_used = oldSnapshotControl->count_used;
+ for (int i = 0; i < OLD_SNAPSHOT_TIME_MAP_ENTRIES; ++i)
+ mapping->xid_by_minute[i] = oldSnapshotControl->xid_by_minute[i];
+ LWLockRelease(OldSnapshotTimeMapLock);

I think memcpy would be a better choice instead of looping it for all
the entries, since we are doing this under a lock?

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

#10Thomas Munro
thomas.munro@gmail.com
In reply to: Dilip Kumar (#7)
Re: fixing old_snapshot_threshold's time->xid mapping

On Sat, Apr 18, 2020 at 9:27 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:

On Sat, Apr 18, 2020 at 11:47 AM Thomas Munro <thomas.munro@gmail.com> wrote:

I think I found another bug in MaintainOldSnapshotTimeMapping(): if
you make time jump by more than old_snapshot_threshold in one go, then
the map gets cleared and then no early pruning or snapshot-too-old
errors happen. That's why in 002_too_old.pl it currently advances
time by 10 minutes twice, instead of 20 minutes once. To be
continued.

IMHO that doesn't seems to be a problem. Because even if we jump more
than old_snapshot_threshold in one go we don't clean complete map
right. The latest snapshot timestamp will become the headtimestamp.
So in TransactionIdLimitedForOldSnapshots if (current_ts -
old_snapshot_threshold) is still >= head_timestap then we can still do
early pruning.

Right, thanks. I got confused about that, and misdiagnosed something
I was seeing.

Here's a new version:

0004: Instead of writing a new kind of TAP test to demonstrate
snapshot-too-old errors, I adjusted the existing isolation tests to
use the same absolute time control technique. Previously I had
invented a way to do isolation tester-like stuff in TAP tests, which
might be interesting but strange new perl is not necessary for this.

0005: Truncates the time map when the CLOG is truncated. Its test is
now under src/test/module/snapshot_too_old/t/001_truncate.sql.

These apply on top of Robert's patches, but the only dependency is on
his patch 0001 "Expose oldSnapshotControl.", because now I have stuff
in src/test/module/snapshot_too_old/test_sto.c that wants to mess with
that object too.

Is this an improvement? I realise that there is still nothing to
actually verify that early pruning has actually happened. I haven't
thought of a good way to do that yet (stats, page inspection, ...).

Attachments:

v3-0004-Rewrite-the-snapshot_too_old-tests-with-absolute-.patchtext/x-patch; charset=US-ASCII; name=v3-0004-Rewrite-the-snapshot_too_old-tests-with-absolute-.patchDownload+264-139
v3-0005-Truncate-snapshot-too-old-time-map-when-CLOG-is-t.patchtext/x-patch; charset=US-ASCII; name=v3-0005-Truncate-snapshot-too-old-time-map-when-CLOG-is-t.patchDownload+128-1
#11Dilip Kumar
dilipbalaut@gmail.com
In reply to: Thomas Munro (#10)
Re: fixing old_snapshot_threshold's time->xid mapping

On Mon, Apr 20, 2020 at 11:24 AM Thomas Munro <thomas.munro@gmail.com> wrote:

On Sat, Apr 18, 2020 at 9:27 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:

On Sat, Apr 18, 2020 at 11:47 AM Thomas Munro <thomas.munro@gmail.com> wrote:

I think I found another bug in MaintainOldSnapshotTimeMapping(): if
you make time jump by more than old_snapshot_threshold in one go, then
the map gets cleared and then no early pruning or snapshot-too-old
errors happen. That's why in 002_too_old.pl it currently advances
time by 10 minutes twice, instead of 20 minutes once. To be
continued.

IMHO that doesn't seems to be a problem. Because even if we jump more
than old_snapshot_threshold in one go we don't clean complete map
right. The latest snapshot timestamp will become the headtimestamp.
So in TransactionIdLimitedForOldSnapshots if (current_ts -
old_snapshot_threshold) is still >= head_timestap then we can still do
early pruning.

Right, thanks. I got confused about that, and misdiagnosed something
I was seeing.

Here's a new version:

0004: Instead of writing a new kind of TAP test to demonstrate
snapshot-too-old errors, I adjusted the existing isolation tests to
use the same absolute time control technique. Previously I had
invented a way to do isolation tester-like stuff in TAP tests, which
might be interesting but strange new perl is not necessary for this.

0005: Truncates the time map when the CLOG is truncated. Its test is
now under src/test/module/snapshot_too_old/t/001_truncate.sql.

These apply on top of Robert's patches, but the only dependency is on
his patch 0001 "Expose oldSnapshotControl.", because now I have stuff
in src/test/module/snapshot_too_old/test_sto.c that wants to mess with
that object too.

Is this an improvement? I realise that there is still nothing to
actually verify that early pruning has actually happened. I haven't
thought of a good way to do that yet (stats, page inspection, ...).

Could we test the early pruning using xid-burn patch? Basically, in
xid_by_minute we have some xids with the current epoch. Now, we burns
more than 2b xid and then if we try to vacuum we might hit the case of
early pruning no. Do you wnated to this case or you had some other
case in mind which you wnated to test?

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

#12Thomas Munro
thomas.munro@gmail.com
In reply to: Dilip Kumar (#11)
Re: fixing old_snapshot_threshold's time->xid mapping

On Mon, Apr 20, 2020 at 6:35 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:

On Mon, Apr 20, 2020 at 11:24 AM Thomas Munro <thomas.munro@gmail.com> wrote:

On Sat, Apr 18, 2020 at 9:27 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:

On Sat, Apr 18, 2020 at 11:47 AM Thomas Munro <thomas.munro@gmail.com> wrote:

Is this an improvement? I realise that there is still nothing to
actually verify that early pruning has actually happened. I haven't
thought of a good way to do that yet (stats, page inspection, ...).

Could we test the early pruning using xid-burn patch? Basically, in
xid_by_minute we have some xids with the current epoch. Now, we burns
more than 2b xid and then if we try to vacuum we might hit the case of
early pruning no. Do you wnated to this case or you had some other
case in mind which you wnated to test?

I mean I want to verify that VACUUM or heap prune actually removed a
tuple that was visible to an old snapshot. An idea I just had: maybe
sto_using_select.spec should check the visibility map (somehow). For
example, the sto_using_select.spec (the version in the patch I just
posted) just checks that after time 00:11, the old snapshot gets a
snapshot-too-old error. Perhaps we could add a VACUUM before that,
and then check that the page has become all visible, meaning that the
dead tuple our snapshot could see has now been removed.

#13Dilip Kumar
dilipbalaut@gmail.com
In reply to: Thomas Munro (#12)
Re: fixing old_snapshot_threshold's time->xid mapping

On Mon, Apr 20, 2020 at 12:29 PM Thomas Munro <thomas.munro@gmail.com> wrote:

On Mon, Apr 20, 2020 at 6:35 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:

On Mon, Apr 20, 2020 at 11:24 AM Thomas Munro <thomas.munro@gmail.com> wrote:

On Sat, Apr 18, 2020 at 9:27 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:

On Sat, Apr 18, 2020 at 11:47 AM Thomas Munro <thomas.munro@gmail.com> wrote:

Is this an improvement? I realise that there is still nothing to
actually verify that early pruning has actually happened. I haven't
thought of a good way to do that yet (stats, page inspection, ...).

Could we test the early pruning using xid-burn patch? Basically, in
xid_by_minute we have some xids with the current epoch. Now, we burns
more than 2b xid and then if we try to vacuum we might hit the case of
early pruning no. Do you wnated to this case or you had some other
case in mind which you wnated to test?

I mean I want to verify that VACUUM or heap prune actually removed a
tuple that was visible to an old snapshot. An idea I just had: maybe
sto_using_select.spec should check the visibility map (somehow). For
example, the sto_using_select.spec (the version in the patch I just
posted) just checks that after time 00:11, the old snapshot gets a
snapshot-too-old error. Perhaps we could add a VACUUM before that,
and then check that the page has become all visible, meaning that the
dead tuple our snapshot could see has now been removed.

Okay, got your point. Can we try to implement some test functions
that can just call visibilitymap_get_status function internally? I
agree that we will have to pass the correct block number but that we
can find using TID. Or for testing, we can create a very small
relation that just has 1 block?

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

#14Robert Haas
robertmhaas@gmail.com
In reply to: Dilip Kumar (#9)
Re: fixing old_snapshot_threshold's time->xid mapping

On Mon, Apr 20, 2020 at 12:10 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:

I have started reviewing these patches. I think, the fixes looks right to me.

+ LWLockAcquire(OldSnapshotTimeMapLock, LW_SHARED);
+ mapping->head_offset = oldSnapshotControl->head_offset;
+ mapping->head_timestamp = oldSnapshotControl->head_timestamp;
+ mapping->count_used = oldSnapshotControl->count_used;
+ for (int i = 0; i < OLD_SNAPSHOT_TIME_MAP_ENTRIES; ++i)
+ mapping->xid_by_minute[i] = oldSnapshotControl->xid_by_minute[i];
+ LWLockRelease(OldSnapshotTimeMapLock);

I think memcpy would be a better choice instead of looping it for all
the entries, since we are doing this under a lock?

When I did it that way, it complained about "const" and I couldn't
immediately figure out how to fix it.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#15Thomas Munro
thomas.munro@gmail.com
In reply to: Dilip Kumar (#13)
Re: fixing old_snapshot_threshold's time->xid mapping

On Mon, Apr 20, 2020 at 8:02 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:

On Mon, Apr 20, 2020 at 12:29 PM Thomas Munro <thomas.munro@gmail.com> wrote:

I mean I want to verify that VACUUM or heap prune actually removed a
tuple that was visible to an old snapshot. An idea I just had: maybe
sto_using_select.spec should check the visibility map (somehow). For
example, the sto_using_select.spec (the version in the patch I just
posted) just checks that after time 00:11, the old snapshot gets a
snapshot-too-old error. Perhaps we could add a VACUUM before that,
and then check that the page has become all visible, meaning that the
dead tuple our snapshot could see has now been removed.

Okay, got your point. Can we try to implement some test functions
that can just call visibilitymap_get_status function internally? I
agree that we will have to pass the correct block number but that we
can find using TID. Or for testing, we can create a very small
relation that just has 1 block?

I think it's enough to check SELECT EVERY(all_visible) FROM
pg_visibility_map('sto1'::regclass). I realised that
src/test/module/snapshot_too_old is allowed to install
contrib/pg_visibility with EXTRA_INSTALL, so here's a new version to
try that idea. It extends sto_using_select.spec to VACUUM and check
the vis map at key times. That allows us to check that early pruning
really happens once the snapshot becomes too old. There are other
ways you could check that but this seems quite "light touch" compared
to something based on page inspection.

I also changed src/test/module/snapshot_too_old/t/001_truncate.pl back
to using Robert's contrib/old_snapshot extension to know the size of
the time/xid map, allowing an introspection function to be dropped
from test_sto.c.

As before, these two apply on top of Robert's patches (or at least his
0001 and 0002).

Attachments:

v4-0004-Rewrite-the-snapshot_too_old-tests.patchtext/x-patch; charset=US-ASCII; name=v4-0004-Rewrite-the-snapshot_too_old-tests.patchDownload+366-134
v4-0005-Truncate-snapshot-too-old-time-map-when-CLOG-is-t.patchtext/x-patch; charset=US-ASCII; name=v4-0005-Truncate-snapshot-too-old-time-map-when-CLOG-is-t.patchDownload+109-2
#16Dilip Kumar
dilipbalaut@gmail.com
In reply to: Robert Haas (#14)
Re: fixing old_snapshot_threshold's time->xid mapping

On Mon, Apr 20, 2020 at 11:31 PM Robert Haas <robertmhaas@gmail.com> wrote:

On Mon, Apr 20, 2020 at 12:10 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:

I have started reviewing these patches. I think, the fixes looks right to me.

+ LWLockAcquire(OldSnapshotTimeMapLock, LW_SHARED);
+ mapping->head_offset = oldSnapshotControl->head_offset;
+ mapping->head_timestamp = oldSnapshotControl->head_timestamp;
+ mapping->count_used = oldSnapshotControl->count_used;
+ for (int i = 0; i < OLD_SNAPSHOT_TIME_MAP_ENTRIES; ++i)
+ mapping->xid_by_minute[i] = oldSnapshotControl->xid_by_minute[i];
+ LWLockRelease(OldSnapshotTimeMapLock);

I think memcpy would be a better choice instead of looping it for all
the entries, since we are doing this under a lock?

When I did it that way, it complained about "const" and I couldn't
immediately figure out how to fix it.

I think we can typecast to (const void *). After below change, I did
not get the warning.

diff --git a/contrib/old_snapshot/time_mapping.c
b/contrib/old_snapshot/time_mapping.c
index 37e0055..cc53bdd 100644
--- a/contrib/old_snapshot/time_mapping.c
+++ b/contrib/old_snapshot/time_mapping.c
@@ -94,8 +94,9 @@ GetOldSnapshotTimeMapping(void)
        mapping->head_offset = oldSnapshotControl->head_offset;
        mapping->head_timestamp = oldSnapshotControl->head_timestamp;
        mapping->count_used = oldSnapshotControl->count_used;
-       for (int i = 0; i < OLD_SNAPSHOT_TIME_MAP_ENTRIES; ++i)
-               mapping->xid_by_minute[i] =
oldSnapshotControl->xid_by_minute[i];
+       memcpy(mapping->xid_by_minute,
+                  (const void *) oldSnapshotControl->xid_by_minute,
+                  sizeof(TransactionId) * OLD_SNAPSHOT_TIME_MAP_ENTRIES);
        LWLockRelease(OldSnapshotTimeMapLock);

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

#17Thomas Munro
thomas.munro@gmail.com
In reply to: Thomas Munro (#15)
Re: fixing old_snapshot_threshold's time->xid mapping

On Tue, Apr 21, 2020 at 2:05 PM Thomas Munro <thomas.munro@gmail.com> wrote:

As before, these two apply on top of Robert's patches (or at least his
0001 and 0002).

While trying to figure out if Robert's 0003 patch was correct, I added
yet another patch to this stack to test it. 0006 does basic xid map
maintenance that exercises the cases 0003 fixes, and I think it
demonstrates that they now work correctly. Also some minor perl
improvements to 0005. I'll attach 0001-0004 again but they are
unchanged.

Since confusion about head vs tail seems to have been at the root of
the bugs addressed by 0003, I wonder if we should also rename
head_{timestamp,offset} to oldest_{timestamp,offset}.

Attachments:

v5-0001-Expose-oldSnapshotControl.patchtext/x-patch; charset=US-ASCII; name=v5-0001-Expose-oldSnapshotControl.patchDownload+77-54
v5-0002-contrib-old_snapshot-time-xid-mapping.patchtext/x-patch; charset=US-ASCII; name=v5-0002-contrib-old_snapshot-time-xid-mapping.patchDownload+203-1
v5-0003-Fix-bugs-in-MaintainOldSnapshotTimeMapping.patchtext/x-patch; charset=US-ASCII; name=v5-0003-Fix-bugs-in-MaintainOldSnapshotTimeMapping.patchDownload+27-4
v5-0004-Rewrite-the-snapshot_too_old-tests.patchtext/x-patch; charset=US-ASCII; name=v5-0004-Rewrite-the-snapshot_too_old-tests.patchDownload+366-134
v5-0005-Truncate-snapshot-too-old-time-map-when-CLOG-is-t.patchtext/x-patch; charset=US-ASCII; name=v5-0005-Truncate-snapshot-too-old-time-map-when-CLOG-is-t.patchDownload+128-2
v5-0006-Add-TAP-test-for-snapshot-too-old-time-map-mainte.patchtext/x-patch; charset=US-ASCII; name=v5-0006-Add-TAP-test-for-snapshot-too-old-time-map-mainte.patchDownload+63-1
#18Dilip Kumar
dilipbalaut@gmail.com
In reply to: Thomas Munro (#17)
Re: fixing old_snapshot_threshold's time->xid mapping

On Tue, Apr 21, 2020 at 3:44 PM Thomas Munro <thomas.munro@gmail.com> wrote:

On Tue, Apr 21, 2020 at 2:05 PM Thomas Munro <thomas.munro@gmail.com> wrote:

As before, these two apply on top of Robert's patches (or at least his
0001 and 0002).

While trying to figure out if Robert's 0003 patch was correct, I added
yet another patch to this stack to test it. 0006 does basic xid map
maintenance that exercises the cases 0003 fixes, and I think it
demonstrates that they now work correctly.

+1,  I think we should also add a way to test the case, where we
advance the timestamp by multiple slots.  I see that you have such
case
e.g
+# test adding minutes while the map is not full
+set_time('3000-01-01 02:01:00Z');
+is(summarize_mapping(), "2|02:00:00|02:01:00");
+set_time('3000-01-01 02:05:00Z');
+is(summarize_mapping(), "6|02:00:00|02:05:00");
+set_time('3000-01-01 02:19:00Z');
+is(summarize_mapping(), "20|02:00:00|02:19:00");

But, I think we should try to extend it to test that we have put the
new xid only in those slots where we suppose to and not in other
slots?.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

#19Dilip Kumar
dilipbalaut@gmail.com
In reply to: Dilip Kumar (#18)
Re: fixing old_snapshot_threshold's time->xid mapping

On Tue, Apr 21, 2020 at 4:52 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:

On Tue, Apr 21, 2020 at 3:44 PM Thomas Munro <thomas.munro@gmail.com> wrote:

On Tue, Apr 21, 2020 at 2:05 PM Thomas Munro <thomas.munro@gmail.com> wrote:

As before, these two apply on top of Robert's patches (or at least his
0001 and 0002).

While trying to figure out if Robert's 0003 patch was correct, I added
yet another patch to this stack to test it. 0006 does basic xid map
maintenance that exercises the cases 0003 fixes, and I think it
demonstrates that they now work correctly.

+1,  I think we should also add a way to test the case, where we
advance the timestamp by multiple slots.  I see that you have such
case
e.g
+# test adding minutes while the map is not full
+set_time('3000-01-01 02:01:00Z');
+is(summarize_mapping(), "2|02:00:00|02:01:00");
+set_time('3000-01-01 02:05:00Z');
+is(summarize_mapping(), "6|02:00:00|02:05:00");
+set_time('3000-01-01 02:19:00Z');
+is(summarize_mapping(), "20|02:00:00|02:19:00");

But, I think we should try to extend it to test that we have put the
new xid only in those slots where we suppose to and not in other
slots?.

I feel that we should. probably fix this check as well? Because if ts

update_ts then it will go to else part then there it will finally

end up in the last slot only so I think we can use this case also as
fast exit.

diff --git a/src/backend/utils/time/snapmgr.c b/src/backend/utils/time/snapmgr.c
index 93a0c04..644d9b1 100644
--- a/src/backend/utils/time/snapmgr.c
+++ b/src/backend/utils/time/snapmgr.c
@@ -1831,7 +1831,7 @@
TransactionIdLimitedForOldSnapshots(TransactionId recentXmin,
                if (!same_ts_as_threshold)
                {
-                       if (ts == update_ts)
+                       if (ts >= update_ts)
                        {
                                xlimit = latest_xmin;
                                if (NormalTransactionIdFollows(xlimit,
recentXmin))

This patch can be applied on top of other v5 patches.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Attachments:

v5-0007-Fix-check-while-computing-transaction-xid-limit.patchapplication/octet-stream; name=v5-0007-Fix-check-while-computing-transaction-xid-limit.patchDownload+1-2
#20Thomas Munro
thomas.munro@gmail.com
In reply to: Dilip Kumar (#19)
Re: fixing old_snapshot_threshold's time->xid mapping

On Wed, Apr 22, 2020 at 5:39 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:

-                       if (ts == update_ts)
+                       if (ts >= update_ts)

Hi Dilip, I didn't follow this bit -- could you explain?

Here's a rebase. In the 0004 patch I chose to leave behind some
unnecessary braces to avoid reindenting a bunch of code after removing
an if branch, just for ease of review, but I'd probably remove those
in a committed version. I'm going to add this stuff to the next CF so
we don't lose track of it.

Attachments:

v6-0001-Expose-oldSnapshotControl.patchtext/x-patch; charset=US-ASCII; name=v6-0001-Expose-oldSnapshotControl.patchDownload+77-54
v6-0002-contrib-old_snapshot-time-xid-mapping.patchtext/x-patch; charset=US-ASCII; name=v6-0002-contrib-old_snapshot-time-xid-mapping.patchDownload+203-1
v6-0003-Fix-bugs-in-MaintainOldSnapshotTimeMapping.patchtext/x-patch; charset=US-ASCII; name=v6-0003-Fix-bugs-in-MaintainOldSnapshotTimeMapping.patchDownload+27-4
v6-0004-Rewrite-the-snapshot_too_old-tests.patchtext/x-patch; charset=US-ASCII; name=v6-0004-Rewrite-the-snapshot_too_old-tests.patchDownload+366-131
v6-0005-Truncate-snapshot-too-old-time-map-when-CLOG-is-t.patchtext/x-patch; charset=US-ASCII; name=v6-0005-Truncate-snapshot-too-old-time-map-when-CLOG-is-t.patchDownload+128-2
v6-0006-Add-TAP-test-for-snapshot-too-old-time-map-mainte.patchtext/x-patch; charset=US-ASCII; name=v6-0006-Add-TAP-test-for-snapshot-too-old-time-map-mainte.patchDownload+63-1
#21Thomas Munro
thomas.munro@gmail.com
In reply to: Thomas Munro (#20)
#22Thomas Munro
thomas.munro@gmail.com
In reply to: Thomas Munro (#21)
#23Michael Paquier
michael@paquier.xyz
In reply to: Thomas Munro (#22)
#24Robert Haas
robertmhaas@gmail.com
In reply to: Michael Paquier (#23)
#25Robert Haas
robertmhaas@gmail.com
In reply to: Robert Haas (#24)
#26Thomas Munro
thomas.munro@gmail.com
In reply to: Robert Haas (#25)
#27Hamid Akhtar
hamid.akhtar@gmail.com
In reply to: Thomas Munro (#26)
#28Robert Haas
robertmhaas@gmail.com
In reply to: Thomas Munro (#26)
#29Michael Paquier
michael@paquier.xyz
In reply to: Robert Haas (#28)
#30Thomas Munro
thomas.munro@gmail.com
In reply to: Michael Paquier (#29)
#31Anastasia Lubennikova
a.lubennikova@postgrespro.ru
In reply to: Thomas Munro (#30)