Challenges preventing us moving to 64 bit transaction id (XID)?

Started by 陈天舟 almost 9 years ago | 59 messages | hackers
#1 陈天舟
tianzhouchen@gmail.com

Hi Pg Hackers,

XID wraparound seems to be quite a big concern, and we have introduced changes like “adding another frozen bit to each page” [http://rhaas.blogspot.com/2016/03/no-more-full-table-vacuums.html] to tackle it. I am just wondering: what are the challenges preventing us from moving to 64-bit XIDs? This is the previous thread I found: /messages/by-id/CAEYLb_UfC+HZ4RAP7XuoFZr+2_ktQmS9xqcQgE-rNf5UCqEt5A@mail.gmail.com. The only answer there is:


“The most obvious reason for not using 64-bit xid values is that they
require more storage than 32-bit values. There is a patch floating
around that makes it safe to not forcibly safety shutdown the server
where currently it is necessary, but it doesn't work by making xids
64-bit.”

I am personally not quite convinced that this is the main reason: I feel that for databases hitting this issue, the schema is mostly non-trivial, so 8 more bytes would not matter that much. Could some Postgres experts share more insight into the challenges?

Thanks
Tianzhou

#2 Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: 陈天舟 (#1)
Re: Challenges preventing us moving to 64 bit transaction id (XID)?

On 06/05/2017 11:49 AM, Tianzhou Chen wrote:

Hi Pg Hackers,

XID wraparound seems to be quite a big concern and we introduce
changes like “adding another frozen bit to each page”
[http://rhaas.blogspot.com/2016/03/no-more-full-table-vacuums.html]
to tackle this. I am just wondering what’s the challenges preventing
us from moving to 64 bit xid? This is the previous thread I find
/messages/by-id/CAEYLb_UfC+HZ4RAP7XuoFZr+2_ktQmS9xqcQgE-rNf5UCqEt5A@mail.gmail.com,
the only answer there is:

“ The most obvious reason for not using 64-bit xid values is that
they require more storage than 32-bit values. There is a patch
floating around that makes it safe to not forcibly safety shutdown
the server where currently it is necessary, but it doesn't work by
making xids 64-bit.
"

I am personally not quite convinced that is the main reason, since I
feel for database hitting this issue, the schema is mostly
non-trivial and doesn’t matter so much with 8 more bytes. Could some
postgres experts share more insights about the challenges?

That quote is accurate. We don't want to just expand XIDs to 64 bits,
because it would significantly bloat the tuple header. PostgreSQL's
per-tuple overhead is already quite large, compared to many other systems.

The most promising approach to tackle this is to switch to 64-bit XIDs
in in-memory structures, and add some kind of an extra epoch field to
the page header. That would effectively give you 64-bit XIDs, but would
only add one field to each page, not to every tuple.

- Heikki

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
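
The epoch idea in Heikki's message above can be sketched roughly as follows. This is only an illustration, not code from PostgreSQL or from any patch; the function names `full_xid` and `split_xid` are made up for the example, and the page epoch is assumed to be a plain integer kept once per page.

```python
from typing import Tuple

XID_BITS = 32
XID_MASK = (1 << XID_BITS) - 1


def full_xid(page_epoch: int, tuple_xid: int) -> int:
    """Combine a per-page epoch with a tuple's 32-bit XID into one 64-bit XID."""
    if not 0 <= tuple_xid <= XID_MASK:
        raise ValueError("tuple XID must fit in 32 bits")
    return (page_epoch << XID_BITS) | tuple_xid


def split_xid(xid64: int) -> Tuple[int, int]:
    """Split a 64-bit XID back into (epoch, 32-bit XID)."""
    return xid64 >> XID_BITS, xid64 & XID_MASK
```

The tuple header keeps its 32-bit fields; only the page carries the high-order part, which is where the space saving over widening every tuple would come from.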

#3 Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Heikki Linnakangas (#2)
Re: Challenges preventing us moving to 64 bit transaction id (XID)?

On Mon, Jun 5, 2017 at 2:38 PM, Heikki Linnakangas <hlinnaka@iki.fi> wrote:

On 06/05/2017 11:49 AM, Tianzhou Chen wrote:

Hi Pg Hackers,

XID wraparound seems to be quite a big concern and we introduce
changes like “adding another frozen bit to each page”
[http://rhaas.blogspot.com/2016/03/no-more-full-table-vacuums.html]
to tackle this. I am just wondering what’s the challenges preventing
us from moving to 64 bit xid? This is the previous thread I find
/messages/by-id/CAEYLb_UfC+HZ4RAP7XuoFZr+2_ktQmS9xqcQgE-rNf5UCqEt5A@mail.gmail.com,
the only answer there is:

“ The most obvious reason for not using 64-bit xid values is that
they require more storage than 32-bit values. There is a patch
floating around that makes it safe to not forcibly safety shutdown
the server where currently it is necessary, but it doesn't work by
making xids 64-bit.
"

I am personally not quite convinced that is the main reason, since I
feel for database hitting this issue, the schema is mostly
non-trivial and doesn’t matter so much with 8 more bytes. Could some
postgres experts share more insights about the challenges?

That quote is accurate. We don't want to just expand XIDs to 64 bits,
because it would significantly bloat the tuple header. PostgreSQL's
per-tuple overhead is already quite large, compared to many other systems.

The most promising approach to tackle this is to switch to 64-bit XIDs in
in-memory structures, and add some kind of an extra epoch field to the page
header. That would effectively give you 64-bit XIDs, but would only add one
field to each page, not every tuple.

What happens when the epoch is so low that the rest of the XID does
not fit in the 32 bits of the tuple header? Or should such a case never arise?
--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


#4 Craig Ringer
craig@2ndquadrant.com
In reply to: Ashutosh Bapat (#3)
Re: Challenges preventing us moving to 64 bit transaction id (XID)?

On 6 June 2017 at 12:13, Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> wrote:

What happens when the epoch is so low that the rest of the XID does
not fit in 32bits of tuple header? Or such a case should never arise?

Storing an epoch implies that rows can't have (xmin,xmax) different by
more than one epoch. So if you're updating/deleting an extremely old
tuple you'll presumably have to set xmin to FrozenTransactionId if it
isn't already, so you can set a new epoch and xmax.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
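
A minimal sketch of the step Craig describes, under the assumption that every tuple already on the page is committed and visible to all transactions, so its xmin may be frozen. The `Page` and `TupleHeader` classes and `delete_tuple` are hypothetical names for this illustration; only the value 2 for FrozenTransactionId is taken from PostgreSQL.

```python
from dataclasses import dataclass, field
from typing import List

INVALID_XID = 0
FROZEN_XID = 2  # PostgreSQL's FrozenTransactionId


@dataclass
class TupleHeader:
    xmin: int                 # 32-bit XID, relative to the page epoch
    xmax: int = INVALID_XID   # 32-bit XID, relative to the page epoch


@dataclass
class Page:
    epoch: int
    tuples: List[TupleHeader] = field(default_factory=list)


def delete_tuple(page: Page, index: int, deleter_xid64: int) -> None:
    """Stamp xmax on one tuple, moving the page to the deleter's epoch first."""
    new_epoch, low_bits = deleter_xid64 >> 32, deleter_xid64 & 0xFFFFFFFF
    if new_epoch != page.epoch:
        # Assumption: all existing tuples are old enough to be frozen.
        for tup in page.tuples:
            if tup.xmax != INVALID_XID:
                raise RuntimeError("old-epoch xmax present; page needs cleanup first")
        for tup in page.tuples:
            tup.xmin = FROZEN_XID
        page.epoch = new_epoch
    page.tuples[index].xmax = low_bits
```

The sketch ignores the reserved low XID values and any WAL logging; it only shows why one delete can force header changes on every other tuple of the page.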


#5 Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Craig Ringer (#4)
Re: Challenges preventing us moving to 64 bit transaction id (XID)?

On Tue, Jun 6, 2017 at 9:48 AM, Craig Ringer <craig@2ndquadrant.com> wrote:

On 6 June 2017 at 12:13, Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> wrote:

What happens when the epoch is so low that the rest of the XID does
not fit in 32bits of tuple header? Or such a case should never arise?

Storing an epoch implies that rows can't have (xmin,xmax) different by
more than one epoch. So if you're updating/deleting an extremely old
tuple you'll presumably have to set xmin to FrozenTransactionId if it
isn't already, so you can set a new epoch and xmax.

If the page has multiple such tuples, will updating one tuple mean
updating the headers of the other tuples as well? Does this mean that those
tuples need to be locked against concurrent scans? Maybe not, since such
tuples will be visible to any concurrent scans anyway, and updating
xmin/xmax doesn't change the visibility. But we might have to prevent
multiple updates to the xmin/xmax because of concurrent updates on the
same page.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


#6 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Ashutosh Bapat (#5)
Re: Challenges preventing us moving to 64 bit transaction id (XID)?

Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> writes:

On Tue, Jun 6, 2017 at 9:48 AM, Craig Ringer <craig@2ndquadrant.com> wrote:

Storing an epoch implies that rows can't have (xmin,xmax) different by
more than one epoch. So if you're updating/deleting an extremely old
tuple you'll presumably have to set xmin to FrozenTransactionId if it
isn't already, so you can set a new epoch and xmax.

If the page has multiple such tuples, updating one tuple will mean
updating headers of other tuples as well? This means that those tuples
need to be locked for concurrent scans?

Locks for tuple header updates are taken at page level anyway, so in
principle you could run around and freeze other tuples on the page
anytime you had to change the page's high-order-XID value. Holding
the lock for long enough to do that is slightly annoying, but it
should happen so seldom as to not represent a real performance problem.

In my mind the harder problem is where to find another 32 bits for the
new page header field. You could convert the header format on-the-fly
if there's free space in the page, but what if there isn't?

regards, tom lane
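
The "is there free space in the page" question can be illustrated with a small check. This is a sketch only: it assumes the usual heap page layout in which free space is the gap between `pd_lower` (end of the line pointer array) and `pd_upper` (start of tuple data), and the function name `can_convert_in_place` is invented here. Alignment requirements are ignored.

```python
def can_convert_in_place(pd_lower: int, pd_upper: int,
                         extra_header_bytes: int = 4) -> bool:
    """Return True if the page's free gap can absorb a larger page header."""
    free_bytes = pd_upper - pd_lower
    return free_bytes >= extra_header_bytes
```

A completely full page (`pd_lower == pd_upper`) fails this check, which is exactly the case Tom asks about.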


#7 Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Tom Lane (#6)
Re: Challenges preventing us moving to 64 bit transaction id (XID)?

On Tue, Jun 6, 2017 at 10:00 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> writes:

On Tue, Jun 6, 2017 at 9:48 AM, Craig Ringer <craig@2ndquadrant.com> wrote:

Storing an epoch implies that rows can't have (xmin,xmax) different by
more than one epoch. So if you're updating/deleting an extremely old
tuple you'll presumably have to set xmin to FrozenTransactionId if it
isn't already, so you can set a new epoch and xmax.

If the page has multiple such tuples, updating one tuple will mean
updating headers of other tuples as well? This means that those tuples
need to be locked for concurrent scans?

Locks for tuple header updates are taken at page level anyway, so in
principle you could run around and freeze other tuples on the page
anytime you had to change the page's high-order-XID value. Holding
the lock for long enough to do that is slightly annoying, but it
should happen so seldom as to not represent a real performance problem.

In my mind the harder problem is where to find another 32 bits for the
new page header field. You could convert the header format on-the-fly
if there's free space in the page, but what if there isn't?

I guess, we will have to reserve 32 bits in the header. That's much
better than increasing tuple header by 32 bits.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


#8 Craig Ringer
craig@2ndquadrant.com
In reply to: Ashutosh Bapat (#7)
Re: Challenges preventing us moving to 64 bit transaction id (XID)?

On 6 June 2017 at 12:38, Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> wrote:

On Tue, Jun 6, 2017 at 10:00 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> writes:

On Tue, Jun 6, 2017 at 9:48 AM, Craig Ringer <craig@2ndquadrant.com> wrote:

Storing an epoch implies that rows can't have (xmin,xmax) different by
more than one epoch. So if you're updating/deleting an extremely old
tuple you'll presumably have to set xmin to FrozenTransactionId if it
isn't already, so you can set a new epoch and xmax.

If the page has multiple such tuples, updating one tuple will mean
updating headers of other tuples as well? This means that those tuples
need to be locked for concurrent scans?

Locks for tuple header updates are taken at page level anyway, so in
principle you could run around and freeze other tuples on the page
anytime you had to change the page's high-order-XID value. Holding
the lock for long enough to do that is slightly annoying, but it
should happen so seldom as to not represent a real performance problem.

In my mind the harder problem is where to find another 32 bits for the
new page header field. You could convert the header format on-the-fly
if there's free space in the page, but what if there isn't?

I guess, we will have to reserve 32 bits in the header. That's much
better than increasing tuple header by 32 bits.

Tom's point is, I think, that we'll want to stay pg_upgrade
compatible. So when we see a pg10 tuple and want to add a new page
with a new page header that has an epoch, but the whole page is full
so there isn't 32 bits left to move tuples "down" the page, what do we
do?

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


#9 Bruce Momjian
bruce@momjian.us
In reply to: Craig Ringer (#8)
Re: Challenges preventing us moving to 64 bit transaction id (XID)?

On Tue, Jun 6, 2017 at 06:00:54PM +0800, Craig Ringer wrote:

On 6 June 2017 at 12:38, Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> wrote:

On Tue, Jun 6, 2017 at 10:00 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

In my mind the harder problem is where to find another 32 bits for the
new page header field. You could convert the header format on-the-fly
if there's free space in the page, but what if there isn't?

I guess, we will have to reserve 32 bits in the header. That's much
better than increasing tuple header by 32 bits.

Tom's point is, I think, that we'll want to stay pg_upgrade
compatible. So when we see a pg10 tuple and want to add a new page
with a new page header that has an epoch, but the whole page is full
so there isn't 32 bits left to move tuples "down" the page, what do we
do?

I guess I am missing something. If you see an old page version number,
you know none of the tuples are from running transactions so you can
just freeze them all, after consulting the pg_clog. What am I missing?
If the page is full, why are you trying to add to the page?

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +


#10 Peter Eisentraut
peter_e@gmx.net
In reply to: Bruce Momjian (#9)
Re: Challenges preventing us moving to 64 bit transaction id (XID)?

On 6/6/17 08:29, Bruce Momjian wrote:

On Tue, Jun 6, 2017 at 06:00:54PM +0800, Craig Ringer wrote:

Tom's point is, I think, that we'll want to stay pg_upgrade
compatible. So when we see a pg10 tuple and want to add a new page
with a new page header that has an epoch, but the whole page is full
so there isn't 32 bits left to move tuples "down" the page, what do we
do?

I guess I am missing something. If you see an old page version number,
you know none of the tuples are from running transactions so you can
just freeze them all, after consulting the pg_clog. What am I missing?
If the page is full, why are you trying to add to the page?

The problem is if you want to delete from such a page. Then you need to
update the tuple's xmax and stick the new xid epoch somewhere.

We had an unconference session at PGCon about this. These issues were
all discussed and some ideas were thrown around. We can expect a patch
to appear soon, I think.

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


#11 Bruce Momjian
bruce@momjian.us
In reply to: Peter Eisentraut (#10)
Re: Challenges preventing us moving to 64 bit transaction id (XID)?

On Tue, Jun 6, 2017 at 09:05:03AM -0400, Peter Eisentraut wrote:

On 6/6/17 08:29, Bruce Momjian wrote:

On Tue, Jun 6, 2017 at 06:00:54PM +0800, Craig Ringer wrote:

Tom's point is, I think, that we'll want to stay pg_upgrade
compatible. So when we see a pg10 tuple and want to add a new page
with a new page header that has an epoch, but the whole page is full
so there isn't 32 bits left to move tuples "down" the page, what do we
do?

I guess I am missing something. If you see an old page version number,
you know none of the tuples are from running transactions so you can
just freeze them all, after consulting the pg_clog. What am I missing?
If the page is full, why are you trying to add to the page?

The problem is if you want to delete from such a page. Then you need to
update the tuple's xmax and stick the new xid epoch somewhere.

We had an unconference session at PGCon about this. These issues were
all discussed and some ideas were thrown around. We can expect a patch
to appear soon, I think.

Sorry I missed the unconference session.

OK, crazy idea. Since we know the creation is frozen, can we put the
epoch in the xmin and set some tuple bit that only has meaning on old
page versions? Yeah, I said crazy.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +


#12 Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Ashutosh Bapat (#5)
Re: Challenges preventing us moving to 64 bit transaction id (XID)?

On 06/06/2017 07:24 AM, Ashutosh Bapat wrote:

On Tue, Jun 6, 2017 at 9:48 AM, Craig Ringer <craig@2ndquadrant.com> wrote:

On 6 June 2017 at 12:13, Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> wrote:

What happens when the epoch is so low that the rest of the XID does
not fit in 32bits of tuple header? Or such a case should never arise?

Storing an epoch implies that rows can't have (xmin,xmax) different by
more than one epoch. So if you're updating/deleting an extremely old
tuple you'll presumably have to set xmin to FrozenTransactionId if it
isn't already, so you can set a new epoch and xmax.

If the page has multiple such tuples, updating one tuple will mean
updating headers of other tuples as well? This means that those tuples
need to be locked for concurrent scans? Maybe not, since such tuples
will be anyway visible to any concurrent scans and updating xmin/xmax
doesn't change the visibility. But we might have to prevent multiple
updates to the xmin/xmax because of concurrent updates on the same
page.

"Store the epoch in the page header" is actually a slightly
simpler-to-visualize, but incorrect, version of what we actually need to
do. If you only store the epoch, then all the XIDs on a page need to
belong to the same epoch, which causes trouble when the current epoch
changes. Just after the epoch changes, you cannot necessarily freeze all
the tuples from the previous epoch, because they would not yet be
visible to everyone.

The full picture is that we need to store one 64-bit XID "base" value in
the page header, and all the xmin/xmax values in the tuple headers are
offsets relative to that base. With that, you effectively have 64-bit
XIDs, as long as the *difference* between any two XIDs on a page is not
greater than 2^32. That can be guaranteed, as long as we don't allow a
transaction to be in-progress for more than 2^32 XIDs. That seems like a
reasonable limitation.

But yes, when the "current XID - base XID in page header" becomes
greater than 2^32, and you need to update a tuple on that page, you need
to first freeze the page, update the base XID on the page header to a
more recent value, and update the XID offsets on every tuple on the page
accordingly. And to do that, you need to hold a lock on the page. If you
don't move any tuples around at the same time, but just update the XID
fields, an exclusive lock on the page is enough, i.e. you don't need to
take a super-exclusive or vacuum lock. In any case, it happens so
infrequently that it should not become a serious burden.

- Heikki
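
The base-plus-offset scheme from this message can be sketched as below. It is a simplified model, not the design of any actual patch: `HeapPage`, `HeapTuple`, `rebase` and `set_xmax` are hypothetical names, a frozen xmin is represented as `None`, and the caller is assumed to supply a `horizon` below which every XID is committed and visible to all snapshots. Locking and WAL logging are left out.

```python
from dataclasses import dataclass, field
from typing import List, Optional

MAX_OFFSET = 2**32 - 1  # largest offset a 32-bit tuple field can hold


@dataclass
class HeapTuple:
    xmin_off: Optional[int]         # offset from the page base; None = frozen
    xmax_off: Optional[int] = None  # offset from the page base; None = not set


@dataclass
class HeapPage:
    xid_base: int                   # 64-bit base stored once per page
    tuples: List[HeapTuple] = field(default_factory=list)


def rebase(page: HeapPage, new_base: int) -> None:
    """Move the page base forward (caller is assumed to hold an exclusive lock)."""
    for tup in page.tuples:
        if tup.xmax_off is not None and page.xid_base + tup.xmax_off < new_base:
            raise RuntimeError("dead tuple should be pruned before rebasing")
    for tup in page.tuples:
        if tup.xmin_off is not None:
            xmin = page.xid_base + tup.xmin_off
            # XIDs older than the new base are frozen; others get a new offset.
            tup.xmin_off = None if xmin < new_base else xmin - new_base
        if tup.xmax_off is not None:
            tup.xmax_off = page.xid_base + tup.xmax_off - new_base
    page.xid_base = new_base


def set_xmax(page: HeapPage, index: int, current_xid: int, horizon: int) -> None:
    """Record current_xid as xmax of one tuple, rebasing the page if needed."""
    if current_xid - page.xid_base > MAX_OFFSET:
        rebase(page, horizon)
    offset = current_xid - page.xid_base
    if not 0 <= offset <= MAX_OFFSET:
        raise RuntimeError("XID still out of range after rebasing")
    page.tuples[index].xmax_off = offset
```

The second range check in `set_xmax` corresponds to the stated limitation: it can only fail if some transaction has stayed in progress for more than 2^32 XIDs.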


#13 Alexander Korotkov
aekorotkov@gmail.com
In reply to: Peter Eisentraut (#10)
Re: Challenges preventing us moving to 64 bit transaction id (XID)?

On Tue, Jun 6, 2017 at 4:05 PM, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote:

On 6/6/17 08:29, Bruce Momjian wrote:

On Tue, Jun 6, 2017 at 06:00:54PM +0800, Craig Ringer wrote:

Tom's point is, I think, that we'll want to stay pg_upgrade
compatible. So when we see a pg10 tuple and want to add a new page
with a new page header that has an epoch, but the whole page is full
so there isn't 32 bits left to move tuples "down" the page, what do we
do?

I guess I am missing something. If you see an old page version number,
you know none of the tuples are from running transactions so you can
just freeze them all, after consulting the pg_clog. What am I missing?
If the page is full, why are you trying to add to the page?

The problem is if you want to delete from such a page. Then you need to
update the tuple's xmax and stick the new xid epoch somewhere.

We had an unconference session at PGCon about this. These issues were
all discussed and some ideas were thrown around. We can expect a patch
to appear soon, I think.

Right. I'm now working on splitting my large patch for 64-bit xids into
a patchset.
I'm planning to post the patchset at the beginning of next week.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

#14 Alexander Korotkov
aekorotkov@gmail.com
In reply to: Heikki Linnakangas (#12)
Re: Challenges preventing us moving to 64 bit transaction id (XID)?

On Wed, Jun 7, 2017 at 10:47 AM, Heikki Linnakangas <hlinnaka@iki.fi> wrote:

On 06/06/2017 07:24 AM, Ashutosh Bapat wrote:

On Tue, Jun 6, 2017 at 9:48 AM, Craig Ringer <craig@2ndquadrant.com>
wrote:

On 6 June 2017 at 12:13, Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>
wrote:

What happens when the epoch is so low that the rest of the XID does
not fit in 32bits of tuple header? Or such a case should never arise?

Storing an epoch implies that rows can't have (xmin,xmax) different by
more than one epoch. So if you're updating/deleting an extremely old
tuple you'll presumably have to set xmin to FrozenTransactionId if it
isn't already, so you can set a new epoch and xmax.

If the page has multiple such tuples, updating one tuple will mean
updating headers of other tuples as well? This means that those tuples
need to be locked for concurrent scans? Maybe not, since such tuples
will be anyway visible to any concurrent scans and updating xmin/xmax
doesn't change the visibility. But we might have to prevent multiple
updates to the xmin/xmax because of concurrent updates on the same
page.

"Store the epoch in the page header" is actually a slightly
simpler-to-visualize, but incorrect, version of what we actually need to
do. If you only store the epoch, then all the XIDs on a page need to belong
to the same epoch, which causes trouble when the current epoch changes.
Just after the epoch changes, you cannot necessarily freeze all the tuples
from the previous epoch, because they would not yet be visible to everyone.

The full picture is that we need to store one 64-bit XID "base" value in
the page header, and all the xmin/xmax values in the tuple headers are
offsets relative to that base. With that, you effectively have 64-bit XIDs,
as long as the *difference* between any two XIDs on a page is not greater
than 2^32. That can be guaranteed, as long as we don't allow a transaction
to be in-progress for more than 2^32 XIDs. That seems like a reasonable
limitation.

Right. I used the term "64-bit epoch" during the developer unconference, but
that was ambiguous. It would be more correct to call it a "64-bit base".
BTW, we will have to store two 64-bit bases: for xids and for multixacts,
because they are completely independent counters.

But yes, when the "current XID - base XID in page header" becomes greater
than 2^32, and you need to update a tuple on that page, you need to first
freeze the page, update the base XID on the page header to a more recent
value, and update the XID offsets on every tuple on the page accordingly.
And to do that, you need to hold a lock on the page. If you don't move any
tuples around at the same time, but just update the XID fields, an
exclusive lock on the page is enough, i.e. you don't need to take a
super-exclusive or vacuum lock. In any case, it happens so infrequently
that it should not become a serious burden.

Yes, an exclusive lock seems to be enough for a single-page freeze.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

#15 Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Alexander Korotkov (#14)
Re: Challenges preventing us moving to 64 bit transaction id (XID)?

Alexander Korotkov wrote:

Right. I used the term "64-bit epoch" during developer unconference, but
that was ambiguous. It would be more correct to call it a "64-bit base".
BTW, we will have to store two 64-bit bases: for xids and for multixacts,
because they are completely independent counters.

So this takes us from 4 additional bytes per page, to 16 additional
bytes per page. With the proposal to require 4 free bytes it seemed
quite unlikely that many pages would fail to comply (so whatever
fallback mechanism was needed during page upgrade would be seldom used),
but now that they are 16, the likelihood of needing to run that page
upgrade seems a tad high.

Instead of adding a second 64 bit counter for multixacts, how about
first implementing something like TED which gets rid of multixacts (and
freezing thereof) altogether?

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


#16 Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Heikki Linnakangas (#12)
Re: Challenges preventing us moving to 64 bit transaction id (XID)?

On Wed, Jun 7, 2017 at 4:47 PM, Heikki Linnakangas <hlinnaka@iki.fi> wrote:

On 06/06/2017 07:24 AM, Ashutosh Bapat wrote:

On Tue, Jun 6, 2017 at 9:48 AM, Craig Ringer <craig@2ndquadrant.com>
wrote:

On 6 June 2017 at 12:13, Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>
wrote:

What happens when the epoch is so low that the rest of the XID does
not fit in 32bits of tuple header? Or such a case should never arise?

Storing an epoch implies that rows can't have (xmin,xmax) different by
more than one epoch. So if you're updating/deleting an extremely old
tuple you'll presumably have to set xmin to FrozenTransactionId if it
isn't already, so you can set a new epoch and xmax.

If the page has multiple such tuples, updating one tuple will mean
updating headers of other tuples as well? This means that those tuples
need to be locked for concurrent scans? Maybe not, since such tuples
will be anyway visible to any concurrent scans and updating xmin/xmax
doesn't change the visibility. But we might have to prevent multiple
updates to the xmin/xmax because of concurrent updates on the same
page.

"Store the epoch in the page header" is actually a slightly
simpler-to-visualize, but incorrect, version of what we actually need to do.
If you only store the epoch, then all the XIDs on a page need to belong to
the same epoch, which causes trouble when the current epoch changes. Just
after the epoch changes, you cannot necessarily freeze all the tuples from
the previous epoch, because they would not yet be visible to everyone.

The full picture is that we need to store one 64-bit XID "base" value in the
page header, and all the xmin/xmax values in the tuple headers are offsets
relative to that base. With that, you effectively have 64-bit XIDs, as long
as the *difference* between any two XIDs on a page is not greater than 2^32.
That can be guaranteed, as long as we don't allow a transaction to be
in-progress for more than 2^32 XIDs. That seems like a reasonable
limitation.

But yes, when the "current XID - base XID in page header" becomes greater
than 2^32, and you need to update a tuple on that page, you need to first
freeze the page, update the base XID on the page header to a more recent
value, and update the XID offsets on every tuple on the page accordingly.
And to do that, you need to hold a lock on the page. If you don't move any
tuples around at the same time, but just update the XID fields, an
exclusive lock on the page is enough, i.e. you don't need to take a
super-exclusive or vacuum lock. In any case, it happens so infrequently that
it should not become a serious burden.

So freezing a page is required when a tuple on the page is modified by a
transaction with an XID greater than 2^32. Is that right?

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


#17 Andres Freund
andres@anarazel.de
In reply to: Alvaro Herrera (#15)
Re: Challenges preventing us moving to 64 bit transaction id (XID)?

On 2017-06-07 07:49:00 -0300, Alvaro Herrera wrote:

Instead of adding a second 64 bit counter for multixacts, how about
first implementing something like TED which gets rid of multixacts (and
freezing thereof) altogether?

-1 - that seems like too high a barrier. We've punted on improvements to
this because of CSN and xid-lsn ranges, and at some point we're going to
have to make pragmatic choices, rather than strive for something more ideal.

- Andres


#18 Robert Haas
robertmhaas@gmail.com
In reply to: Andres Freund (#17)
Re: Challenges preventing us moving to 64 bit transaction id (XID)?

On Wed, Jun 7, 2017 at 12:49 PM, Andres Freund <andres@anarazel.de> wrote:

On 2017-06-07 07:49:00 -0300, Alvaro Herrera wrote:

Instead of adding a second 64 bit counter for multixacts, how about
first implementing something like TED which gets rid of multixacts (and
freezing thereof) altogether?

-1 - that seems like a too high barrier. We've punted on improvements on
this because of CSN, xid-lsn ranges, and at some point we're going to
have to make pragmatic choices, rather than strive for something more ideal.

What is the problem that we are trying to solve with this change? Is
there a practical use case for setting autovacuum_freeze_max_age >
2000000000, or is this just so that when autovacuum fails to vacuum
things in time, we can bloat clog instead of performing an emergency
shutdown?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


#19 Alexander Korotkov
aekorotkov@gmail.com
In reply to: Alexander Korotkov (#13)
Re: Challenges preventing us moving to 64 bit transaction id (XID)?

On Wed, Jun 7, 2017 at 11:33 AM, Alexander Korotkov <a.korotkov@postgrespro.ru> wrote:

On Tue, Jun 6, 2017 at 4:05 PM, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote:

On 6/6/17 08:29, Bruce Momjian wrote:

On Tue, Jun 6, 2017 at 06:00:54PM +0800, Craig Ringer wrote:

Tom's point is, I think, that we'll want to stay pg_upgrade
compatible. So when we see a pg10 tuple and want to add a new page
with a new page header that has an epoch, but the whole page is full
so there isn't 32 bits left to move tuples "down" the page, what do we
do?

I guess I am missing something. If you see an old page version number,
you know none of the tuples are from running transactions so you can
just freeze them all, after consulting the pg_clog. What am I missing?
If the page is full, why are you trying to add to the page?

The problem is if you want to delete from such a page. Then you need to
update the tuple's xmax and stick the new xid epoch somewhere.

We had an unconference session at PGCon about this. These issues were
all discussed and some ideas were thrown around. We can expect a patch
to appear soon, I think.

Right. I'm now working on splitting my large patch for 64-bit xids into a
patchset.
I'm planning to post the patchset at the beginning of next week.

Work on this patch took longer than I expected. It is still not in very good
shape, but I decided to publish it anyway so as not to stall progress in
this area.
I also tried to split this patch into several pieces. In practice I managed
to separate out only a few small ones, while most of the changes remain in
a single big diff.
Long story short, the patchset is attached.

0001-64bit-guc-relopt-1.patch
This patch implements 64-bit GUCs and relation options, which are used in
the later patches.

0002-heap-page-special-1.patch
Putting the xid and multixact bases into PageHeaderData would take an extra
16 bytes on index pages too. That would be a waste of space for indexes.
This is why I decided to put the bases into the special area of heap pages.
This patch adds a special area for heap pages containing the prune xid and a
magic number. The magic number is different for regular heap pages and
sequence pages.

0003-64bit-xid-1.patch
This is the major patch. It redefines TransactionId as a 64-bit integer and
defines a 32-bit ShortTransactionID, which is used for t_xmin and t_xmax.
Transaction id comparison becomes a plain linear comparison instead of a
circular one. Base values for xids and multixact ids are stored in the heap
page special area. SLRUs also become 64-bit and non-circular. To be able to
calculate xmin/xmax without accessing the heap page, the base values are
copied into HeapTuple. Correspondingly, HeapTupleHeader(Get|Set)(Xmin|Xmax)
becomes just HeapTuple(Get|Set)(Xmin|Xmax), which require a HeapTuple, not
just a HeapTupleHeader. heap_page_prepare_for_xid() is used to ensure that a
given xid fits the base of a particular page. If it doesn't fit, the base of
the page is shifted, which could require a single-page freeze. The WAL
format is changed in order to prevent unaligned access to TransactionId. The
*_age GUCs and relation options are changed to 64-bit. The forced
"autovacuum to prevent wraparound" is removed, but freezing is still needed
to truncate SLRUs.

0004-base-values-for-testing-1.patch
This patch is used for testing that calculations using 64-bit bases and
short 32-bit xid values are correct. It provides initdb options for the
initial xid, multixact id and multixact offset values. The regression tests
initialize the cluster with large (more than 2^32) values.
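
The base-plus-offset scheme described for 0003 can be sketched roughly as follows. This is a toy Python model under stated assumptions, not code from the patch: `Page`, `prepare_for_xid` and `store_xid` are hypothetical names, special xid values are ignored, and a real page would freeze old tuples rather than raise an error.

```python
from dataclasses import dataclass, field

MAX_SHORT_XID = 2**32 - 1  # largest value a 32-bit on-tuple xid can hold


@dataclass
class Page:
    """Toy heap page: one 64-bit base plus a 32-bit offset per tuple."""
    xid_base: int = 0
    short_xids: list = field(default_factory=list)

    def full_xids(self):
        # A tuple's 64-bit xid is reconstructed as base + 32-bit offset.
        return [self.xid_base + s for s in self.short_xids]


def prepare_for_xid(page, xid):
    """Make `xid` representable on `page`, shifting the base if needed.

    Returns True on success. False means the xids on the page would span
    more than 32 bits, so some tuples would have to be frozen first.
    """
    candidates = page.full_xids() + [xid]
    lo, hi = min(candidates), max(candidates)
    if hi - lo > MAX_SHORT_XID:
        return False
    if not (0 <= xid - page.xid_base <= MAX_SHORT_XID):
        # Re-encode every stored offset against the new, shifted base.
        full = page.full_xids()
        page.xid_base = lo
        page.short_xids = [x - lo for x in full]
    return True


def store_xid(page, xid):
    """Record `xid` on the page as a 32-bit offset from the page base."""
    if not prepare_for_xid(page, xid):
        raise ValueError("page needs a freeze before it can hold this xid")
    page.short_xids.append(xid - page.xid_base)


page = Page()
store_xid(page, 100)
store_xid(page, 2**32 + 50)   # does not fit base 0, so the base is shifted
print(page.xid_base, page.full_xids())
```

In this model the only case that cannot be handled by shifting the base is a page whose xids are more than 2^32 apart, which is where the single-page freeze comes in.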

There are a lot of open items, but I would like to point out some of them:
* WAL becomes significantly larger because it stores 8-byte xids instead of
4-byte xids. We probably need to use the base approach in WAL too.
* As discussed at the developer unconference, we need to write a special
background worker which would ensure that each heap page has room for the
bases. This background worker should finish its work before the database
can be pg_upgraded. Alternatively, we could find a way to store the bases in
the existing page header.
* BTPageOpaqueData contains a TransactionId in its special area.
BTPageOpaqueData should be changed to some pg_upgradable format.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachments:

0001-64bit-guc-relopt-1.patch (application/octet-stream, +514 -0)
0002-heap-page-special-1.patch (application/octet-stream, +153 -135)
0003-64bit-xid-1.patch (application/octet-stream, +3615 -3900)
0004-base-values-for-testing-1.patch (application/octet-stream, +207 -21)

#20Jim Finnerty
jfinnert@amazon.com
In reply to: Alexander Korotkov (#19)
Re: Challenges preventing us moving to 64 bit transaction id (XID)?

re: "The problem is if you want to delete from such a page. Then you need to
update the tuple's xmax and stick the new xid epoch somewhere."

When the xids on a single page span a range of more than 2^32, as could
occur in the scenario above, a single xid base value won't suffice. Do
we have a proposed solution for this problem?

If not, then allow me to put out a 'straw man' proposal: perhaps we could
mark such a row with a 'tombstone' that points off to some other page in yet
another page format that contains full 64-bit xids. Rows in this 64-bit xid
format page would all be deleted rows, and would be vacuumed away, along
with the tombstone row, when there are no more transactions that can see it.
Under the assumption that deletion of such very old rows is rare, this may
have very little impact on performance. One negative is that rarely
executed code can be a maintainability problem, but we can probably cope
with that.

Feel free to knock down this 'straw man' and propose something better!

#21Bruce Momjian
bruce@momjian.us
In reply to: Jim Finnerty (#20)
#22Bruce Momjian
bruce@momjian.us
In reply to: Jim Finnerty (#20)
#23Ildar Musin
i.musin@postgrespro.ru
In reply to: Alexander Korotkov (#19)
#24Alexander Korotkov
aekorotkov@gmail.com
In reply to: Ildar Musin (#23)
#25Alexander Korotkov
aekorotkov@gmail.com
In reply to: Alexander Korotkov (#24)
#26Amit Kapila
amit.kapila16@gmail.com
In reply to: Alexander Korotkov (#19)
#27Alexander Korotkov
aekorotkov@gmail.com
In reply to: Amit Kapila (#26)
#28Amit Kapila
amit.kapila16@gmail.com
In reply to: Alexander Korotkov (#27)
#29Robert Haas
robertmhaas@gmail.com
In reply to: Alexander Korotkov (#27)
In reply to: Robert Haas (#29)
#31Alexander Korotkov
aekorotkov@gmail.com
In reply to: Robert Haas (#29)
#32Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Robert Haas (#29)
#33Michael Paquier
michael@paquier.xyz
In reply to: Alexander Korotkov (#31)
#34Ryan Murphy
ryanfmurphy@gmail.com
In reply to: Michael Paquier (#33)
#35Ryan Murphy
ryanfmurphy@gmail.com
In reply to: Ryan Murphy (#34)
#36Tom Lane
tgl@sss.pgh.pa.us
In reply to: Ryan Murphy (#35)
#37Alexander Korotkov
aekorotkov@gmail.com
In reply to: Ryan Murphy (#35)
#38Alexander Korotkov
aekorotkov@gmail.com
In reply to: Alexander Korotkov (#37)
#39Alexander Korotkov
aekorotkov@gmail.com
In reply to: Alexander Korotkov (#38)
#40Andres Freund
andres@anarazel.de
In reply to: Alexander Korotkov (#39)
#41Alexander Korotkov
aekorotkov@gmail.com
In reply to: Andres Freund (#40)
#42Andres Freund
andres@anarazel.de
In reply to: Alexander Korotkov (#41)
#43Alexander Korotkov
aekorotkov@gmail.com
In reply to: Andres Freund (#42)
#44Andres Freund
andres@anarazel.de
In reply to: Alexander Korotkov (#43)
#45Chris Travers
chris.travers@adjust.com
In reply to: Andres Freund (#44)
#46Chris Travers
chris.travers@adjust.com
In reply to: Alexander Korotkov (#41)
#47Andres Freund
andres@anarazel.de
In reply to: Chris Travers (#45)
#48Jim Finnerty
jfinnert@amazon.com
In reply to: Andres Freund (#47)
#49Jim Finnerty
jfinnert@amazon.com
In reply to: Alexander Korotkov (#25)
#50Jim Finnerty
jfinnert@amazon.com
In reply to: Jim Finnerty (#49)
#51Jim Finnerty
jfinnert@amazon.com
In reply to: Jim Finnerty (#50)
#52David Steele
david@pgmasters.net
In reply to: Jim Finnerty (#51)
#53Thomas Munro
thomas.munro@gmail.com
In reply to: David Steele (#52)
#54David Steele
david@pgmasters.net
In reply to: Thomas Munro (#53)
#55Jim Finnerty
jfinnert@amazon.com
In reply to: David Steele (#54)
#56Michael Paquier
michael@paquier.xyz
In reply to: David Steele (#54)
#57Julien Rouhaud
rjuju123@gmail.com
In reply to: Michael Paquier (#56)
#58Michael Paquier
michael@paquier.xyz
In reply to: Julien Rouhaud (#57)
#59Michael Paquier
michael@paquier.xyz
In reply to: Michael Paquier (#58)