64-bit XIDs again
Hackers,
I know there have already been a couple of threads about 64-bit XIDs.
/messages/by-id/42CCCFE9.9040809@intellilink.co.jp
/messages/by-id/4F6C0E13.3080406@wiesinger.com
I read them carefully, but I didn't find all the arguments for 64-bit XIDs
mentioned. That's why I'd like to raise this subject again.
Hardware capabilities are now much higher than when Postgres was designed.
In modern PostgreSQL scalability tests it's typical to achieve 400,000 -
500,000 tps with pgbench. At such rates it takes only a few minutes to
reach the default autovacuum_freeze_max_age of 200 million.
The notion of wraparound has been evolving over time. Initially it was
something that almost never happened. Then it became something that could
happen rarely, and that we should be prepared for (by freezing tuples in
advance). Now it has become a fairly frequent periodic event for a
high-load database, and DB admins have to take its performance impact into
account.
A typical scenario that I've faced in real life goes like this. The database
is divided into an operational part and an archive part. The operational
part is small (dozens of gigabytes) and serves most of the transactions. The
archive part is relatively large (several terabytes) and serves rare selects
and bulk inserts. Autovacuum is very active on the operational part and very
lazy on the archive part (as expected). The system works well until one day
the age of the archive tables exceeds autovacuum_freeze_max_age. Then all
autovacuum workers start doing "autovacuum to prevent wraparound" on the
archive tables. Even if the system I/O survives this, the operational tables
get bloated because all autovacuum workers are busy with the archive tables.
In such situations I typically advise increasing autovacuum_freeze_max_age
and running VACUUM FREEZE manually when the system has enough free resources.
As I mentioned in the CSN thread, it would be nice to replace the XID with a
CSN when setting hint bits for a tuple. In that case, once hint bits are set
we don't need any additional lookups to check visibility.
/messages/by-id/CAPpHfdv7BMwGv=OfUg3S-jGVFKqHi79pR_ZK1Wsk-13oZ+cy5g@mail.gmail.com
Introducing a 32-bit CSN doesn't seem reasonable to me, because it would
double our troubles with wraparound.
Also, I think it's possible to migrate to 64-bit XIDs without breaking
pg_upgrade. Old tuples can be left with 32-bit XIDs while new tuples would
be created with 64-bit XIDs. We could use free bits in t_infomask2 to
distinguish the old and new formats.
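To make that concrete, here is a minimal sketch of such a format check,
assuming one of the currently unused t_infomask2 bits is claimed for this
purpose (the flag name and bit value are hypothetical illustrations, not
existing PostgreSQL definitions):

#include <stdbool.h>
#include <stdint.h>

#define HEAP_XIDS_ARE_64BIT 0x1000   /* hypothetical: one of the free t_infomask2 bits */

/* Old (pre-upgrade) tuples keep the bit clear and carry 32-bit xmin/xmax;
 * tuples written after the upgrade set it and carry 64-bit XIDs. */
bool
tuple_has_64bit_xids(uint16_t t_infomask2)
{
    return (t_infomask2 & HEAP_XIDS_ARE_64BIT) != 0;
}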
Any thoughts? Do you think 64-bit XIDs are worth it?
------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
On 30 July 2015 at 14:26, Alexander Korotkov <a.korotkov@postgrespro.ru>
wrote:
Any thoughts? Do you think 64-bit XIDs worth it?
The problem of freezing is painful, but not impossible, which is why we
have held out so long.
The problem of very long-lived snapshots is coming closer at the same speed
as freezing; there is no solution to that without 64-bit xids throughout the
whole infrastructure, or CSNs.
The opportunity for us to have SQL Standard historical databases becomes
possible with 64-bit xids, or CSNs. That is a high value goal.
I personally now think we should thoroughly investigate 64-bit xids. I
don't see this as mere debate, I see this as something that we can make a
patch for and scientifically analyze the pros and cons through measurement.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 07/30/2015 04:26 PM, Alexander Korotkov wrote:
Also, I think it's possible to migrate to 64-bit XIDs without breaking
pg_upgrade. Old tuples can be leaved with 32-bit XIDs while new tuples
would be created with 64-bit XIDs. We can use free bits in t_infomask2 to
distinguish old and new formats.
I think we should move to 64-bit XIDs in in-memory structs snapshots,
proc array etc. And expand clog to handle 64-bit XIDs. But keep the
xmin/xmax fields on heap pages at 32-bits, and add an epoch-like field
to the page header so that logically the xmin/xmax fields on the page
are 64 bits wide, but physically stored in 32 bits. That's possible as
long as no two XIDs on the same page are more than 2^31 XIDs apart. So
you still need to freeze old tuples on the page when that's about to
happen, but it would make it possible to have more than 2^32 XID
transactions in the clog. You'd never be forced to do anti-wraparound
vacuums, you could just let the clog grow arbitrarily large.
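To illustrate, here is a rough sketch of how a 32-bit on-page value could be
widened back to a logical 64-bit XID using a per-page reference derived from
the page's epoch field. This is only one possible reading of the scheme; the
helper is hypothetical and ignores the extreme ends of the 64-bit range.

#include <stdint.h>

/*
 * Widen a 32-bit xmin/xmax stored on the page to a full 64-bit XID, given a
 * 64-bit reference XID for the page.  This is unambiguous only because no
 * two XIDs on the same page may be more than 2^31 apart.
 */
uint64_t
page_xid_to_full_xid(uint64_t page_reference_xid, uint32_t stored_xid)
{
    uint64_t full = (page_reference_xid & 0xFFFFFFFF00000000ULL) | stored_xid;

    if (full + (UINT64_C(1) << 31) < page_reference_xid)
        full += UINT64_C(1) << 32;   /* stored value belongs to the next epoch */
    else if (full > page_reference_xid + (UINT64_C(1) << 31))
        full -= UINT64_C(1) << 32;   /* stored value belongs to the previous epoch */

    return full;
}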
There is a big downside to expanding xmin/xmax to 64 bits: it takes
space. More space means more memory needed for caching, more memory
bandwidth, more I/O, etc.
- Heikki
On 07/30/2015 07:14 AM, Simon Riggs wrote:
On 30 July 2015 at 14:26, Alexander Korotkov <a.korotkov@postgrespro.ru> wrote:
Any thoughts? Do you think 64-bit XIDs worth it?
The problem of freezing is painful, but not impossible, which is
why we have held out so long.
The problem of very long lived snapshots is coming closer at the
same speed as freezing; there is no solution to that without 64-bit
xids throughout whole infrastructure, or CSNs.
The opportunity for us to have SQL Standard historical databases
becomes possible with 64-bit xids, or CSNs. That is a high value
goal.
I personally now think we should thoroughly investigate 64-bit
xids. I don't see this as mere debate, I see this as something that
we can make a patch for and scientifically analyze the pros and
cons through measurement.
+1
I've been thinking along similar lines to both of you for quite some
time now. I think at the least we should explore an initdb-time option
-- we can and should measure the pros and cons.
--
Joe Conway
On Thu, Jul 30, 2015 at 5:24 PM, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
On 07/30/2015 04:26 PM, Alexander Korotkov wrote:
Also, I think it's possible to migrate to 64-bit XIDs without breaking
pg_upgrade. Old tuples can be leaved with 32-bit XIDs while new tuples
would be created with 64-bit XIDs. We can use free bits in t_infomask2 to
distinguish old and new formats.
I think we should move to 64-bit XIDs in in-memory structs snapshots, proc
array etc. And expand clog to handle 64-bit XIDs. But keep the xmin/xmax
fields on heap pages at 32-bits, and add an epoch-like field to the page
header so that logically the xmin/xmax fields on the page are 64 bits wide,
but physically stored in 32 bits. That's possible as long as no two XIDs on
the same page are more than 2^31 XIDs apart. So you still need to freeze
old tuples on the page when that's about to happen, but it would make it
possible to have more than 2^32 XID transactions in the clog. You'd never
be forced to do anti-wraparound vacuums, you could just let the clog grow
arbitrarily large.
Nice idea. Storing an extra epoch would mean an extra 4 bytes per heap tuple
instead of an extra 8 bytes per tuple for 64-bit xmin/xmax.
But if the first column is aligned to 8 bytes (e.g. bigserial), wouldn't we
lose this 4-byte win to alignment?
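For what it's worth, a back-of-the-envelope check of that alignment question
(a sketch, assuming the usual 23-byte heap tuple header and 8-byte MAXALIGN):
with an 8-byte-aligned first column, both a +4-byte and a +8-byte header
round up to the same 32 bytes, so the 4-byte saving would indeed be absorbed
by padding.

#include <stdio.h>

#define MAXALIGN(len) (((len) + 7) & ~((unsigned long) 7))

int main(void)
{
    printf("current header:        %lu bytes\n", MAXALIGN(23));      /* 24 */
    printf("header + 4-byte epoch: %lu bytes\n", MAXALIGN(23 + 4));  /* 32 */
    printf("header + 64-bit xids:  %lu bytes\n", MAXALIGN(23 + 8));  /* 32 */
    return 0;
}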
There is a big downside to expanding xmin/xmax to 64 bits: it takes space.
More space means more memory needed for caching, more memory bandwidth,
more I/O, etc.
------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
On 07/30/2015 05:57 PM, Alexander Korotkov wrote:
On Thu, Jul 30, 2015 at 5:24 PM, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
I think we should move to 64-bit XIDs in in-memory structs snapshots, proc
array etc. And expand clog to handle 64-bit XIDs. But keep the xmin/xmax
fields on heap pages at 32-bits, and add an epoch-like field to the page
header so that logically the xmin/xmax fields on the page are 64 bits wide,
but physically stored in 32 bits. That's possible as long as no two XIDs on
the same page are more than 2^31 XIDs apart. So you still need to freeze
old tuples on the page when that's about to happen, but it would make it
possible to have more than 2^32 XID transactions in the clog. You'd never
be forced to do anti-wraparound vacuums, you could just let the clog grow
arbitrarily large.
Nice idea. Storing extra epoch would be extra 4 bytes per heap tuple
instead of extra 8 bytes per tuple if storing 64 bits xmin/xmax.
No, I was thinking that the epoch would be stored *per page*, in the
page header.
- Heikki
On 30 July 2015 at 15:24, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
On 07/30/2015 04:26 PM, Alexander Korotkov wrote:
Also, I think it's possible to migrate to 64-bit XIDs without breaking
pg_upgrade. Old tuples can be leaved with 32-bit XIDs while new tuples
would be created with 64-bit XIDs. We can use free bits in t_infomask2 to
distinguish old and new formats.
I think we should move to 64-bit XIDs in in-memory structs snapshots, proc
array etc. And expand clog to handle 64-bit XIDs. But keep the xmin/xmax
fields on heap pages at 32-bits, and add an epoch-like field to the page
header so that logically the xmin/xmax fields on the page are 64 bits wide,
but physically stored in 32 bits. That's possible as long as no two XIDs on
the same page are more than 2^31 XIDs apart. So you still need to freeze
old tuples on the page when that's about to happen, but it would make it
possible to have more than 2^32 XID transactions in the clog. You'd never
be forced to do anti-wraparound vacuums, you could just let the clog grow
arbitrarily large.
This is a good scheme, but it assumes, as you say, that you can freeze
tuples before they become more than 2^31 xids apart. That is no longer a
safe assumption on high transaction rate systems with longer-lived
snapshots.
There is a big downside to expanding xmin/xmax to 64 bits: it takes space.
More space means more memory needed for caching, more memory bandwidth,
more I/O, etc.
My feeling is that the overhead will recede in time. Having a nice, simple
change that removes old bugs and new ones would help us be more robust.
But let's measure the overhead before we try to optimize it away.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 07/30/2015 08:04 AM, Simon Riggs wrote:
There is a big downside to expanding xmin/xmax to 64 bits: it takes
space. More space means more memory needed for caching, more memory
bandwidth, more I/O, etc.
My feeling is that the overhead will recede in time. Having a nice,
simple change to remove old bugs and new would help us be more robust.
But let's measure the overhead before we try to optimize it away.
In field experience, I would agree with you. The amount of memory people
are throwing at databases now is pretty significant. It is common to have
>64GB of memory. Heck, I run into >128GB all the time, and seeing >192GB is
no longer a "Wow".
JD
--
Command Prompt, Inc. - http://www.commandprompt.com/ 503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Announcing "I'm offended" is basically telling the world you can't
control your own emotions, so everyone else should do it for you.
On 31/07/15 02:24, Heikki Linnakangas wrote:
On 07/30/2015 04:26 PM, Alexander Korotkov wrote:
Also, I think it's possible to migrate to 64-bit XIDs without breaking
pg_upgrade. Old tuples can be leaved with 32-bit XIDs while new tuples
would be created with 64-bit XIDs. We can use free bits in t_infomask2 to
distinguish old and new formats.
I think we should move to 64-bit XIDs in in-memory structs snapshots,
proc array etc. And expand clog to handle 64-bit XIDs. But keep the
xmin/xmax fields on heap pages at 32-bits, and add an epoch-like field
to the page header so that logically the xmin/xmax fields on the page
are 64 bits wide, but physically stored in 32 bits. That's possible as
long as no two XIDs on the same page are more than 2^31 XIDs apart. So
you still need to freeze old tuples on the page when that's about to
happen, but it would make it possible to have more than 2^32 XID
transactions in the clog. You'd never be forced to do anti-wraparound
vacuums, you could just let the clog grow arbitrarily large.
There is a big downside to expanding xmin/xmax to 64 bits: it takes
space. More space means more memory needed for caching, more memory
bandwidth, more I/O, etc.
- Heikki
I think having a special case to save 32 bits per tuple would cause
unnecessary complications, and the savings are minimal compared to the
size of current modern storage devices and the typical memory used in
serious database servers.
I think it is too much pain for very little gain, especially when
looking into the future growth in storage capacity and bandwidth.
The early mainframes used a base displacement technique to keep the size
of addresses down in instructions: 16-bit addresses, comprising 4 bits
for a base register and 12 bits for the displacement (hence the use of
4KB page sizes now!). Necessary at the time, when mainframes often had
less than 128 KB! Now it would be ludicrous to do that for modern servers!
Cheers,
Gavin
(Who is ancient enough, to have programmed such MainFrames!)
Gavin Flower <GavinFlower@archidevsys.co.nz> writes:
On 31/07/15 02:24, Heikki Linnakangas wrote:
There is a big downside to expanding xmin/xmax to 64 bits: it takes
space. More space means more memory needed for caching, more memory
bandwidth, more I/O, etc.
I think having a special case to save 32 bits per tuple would cause
unnecessary complications, and the savings are minimal compared to the
size of current modern storage devices and the typical memory used in
serious database servers.
I think the argument that the savings are minimal is pretty thin.
It all depends on how wide your tables are --- but on a narrow table, say
half a dozen ints, the current tuple size is 24 bytes header plus the same
number of bytes of data. We'd be going up to 32 bytes header which makes
for a 16% increase in physical table size. If your table is large,
claiming that 16% doesn't hurt is just silly.
But the elephant in the room is on-disk compatibility. There is
absolutely no way that we can just change xmin/xmax to 64 bits without a
disk format break. However, if we do something like what Heikki is
suggesting, it's at least conceivable that we could convert incrementally
(ie, if you find a page with the old header format, assume all tuples in
it are part of epoch 0; and do not insert new tuples into it unless there
is room to convert the header to new format ... but I'm not sure what we
do about tuple deletion if the old page is totally full and we need to
write an xmax that's past 4G).
Only if you are willing to kiss off on-disk compatibility is it even
worth having a discussion about whether we can afford more bloat in
HeapTupleHeader. And that would be a pretty big pain point for a lot
of users.
regards, tom lane
On Thu, Jul 30, 2015 at 5:31 PM, Gavin Flower <GavinFlower@archidevsys.co.nz> wrote:
On 31/07/15 02:24, Heikki Linnakangas wrote:
On 07/30/2015 04:26 PM, Alexander Korotkov wrote:
Also, I think it's possible to migrate to 64-bit XIDs without breaking
pg_upgrade. Old tuples can be leaved with 32-bit XIDs while new tuples
would be created with 64-bit XIDs. We can use free bits in t_infomask2 to
distinguish old and new formats.
I think we should move to 64-bit XIDs in in-memory structs snapshots,
proc array etc. And expand clog to handle 64-bit XIDs. But keep the
xmin/xmax fields on heap pages at 32-bits, and add an epoch-like field to
the page header so that logically the xmin/xmax fields on the page are 64
bits wide, but physically stored in 32 bits. That's possible as long as no
two XIDs on the same page are more than 2^31 XIDs apart. So you still need
to freeze old tuples on the page when that's about to happen, but it would
make it possible to have more than 2^32 XID transactions in the clog. You'd
never be forced to do anti-wraparound vacuums, you could just let the clog
grow arbitrarily large.
There is a big downside to expanding xmin/xmax to 64 bits: it takes
space. More space means more memory needed for caching, more memory
bandwidth, more I/O, etc.
- Heikki
I think having a special case to save 32 bits per tuple would cause
unnecessary complications, and the savings are minimal compared to the size
of current modern storage devices and the typical memory used in serious
database servers.
I think it is too much pain for very little gain, especially when looking
into the future growth in storage capacity and bandwidth.
The early mainframes used a base displacement technique to keep the size
of addresses down in instructions: 16 bit addresses, comprising 4 bits for
a base register and 12 bits for the displacement (hence the use of 4KB
pages sizes now!). Necessary at the time when mainframes were often less
than 128 KB! Now it would ludicrous to do that for modern servers!
Cheers,
Gavin
(Who is ancient enough, to have programmed such MainFrames!)
On the other hand, PG tuple overhead is already the largest among the
alternatives.
Even if storage keeps getting faster and cheaper, you can't ignore the
overhead of adding yet another 8 bytes to each tuple.
On 07/30/2015 07:24 AM, Heikki Linnakangas wrote:
I think we should move to 64-bit XIDs in in-memory structs snapshots,
proc array etc. And expand clog to handle 64-bit XIDs. But keep the
xmin/xmax fields on heap pages at 32-bits, and add an epoch-like field
to the page header so that logically the xmin/xmax fields on the page
are 64 bits wide, but physically stored in 32 bits. That's possible as
long as no two XIDs on the same page are more than 2^31 XIDs apart. So
you still need to freeze old tuples on the page when that's about to
happen, but it would make it possible to have more than 2^32 XID
transactions in the clog. You'd never be forced to do anti-wraparound
vacuums, you could just let the clog grow arbitrarily large
When I introduced the same idea a few years back, having the clog get
arbitrarily large was cited as a major issue. I was under the
impression that clog size had some major performance impacts.
--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
On 2015-07-30 23:23, Tom Lane wrote:
Gavin Flower <GavinFlower@archidevsys.co.nz> writes:
On 31/07/15 02:24, Heikki Linnakangas wrote:
There is a big downside to expanding xmin/xmax to 64 bits: it takes
space. More space means more memory needed for caching, more memory
bandwidth, more I/O, etc.
I think having a special case to save 32 bits per tuple would cause
unnecessary complications, and the savings are minimal compared to the
size of current modern storage devices and the typical memory used in
serious database servers.
I think the argument that the savings are minimal is pretty thin.
It all depends on how wide your tables are --- but on a narrow table, say
half a dozen ints, the current tuple size is 24 bytes header plus the same
number of bytes of data. We'd be going up to 32 bytes header which makes
for a 16% increase in physical table size. If your table is large,
claiming that 16% doesn't hurt is just silly.
But the elephant in the room is on-disk compatibility. There is
absolutely no way that we can just change xmin/xmax to 64 bits without a
disk format break. However, if we do something like what Heikki is
suggesting, it's at least conceivable that we could convert incrementally
(ie, if you find a page with the old header format, assume all tuples in
it are part of epoch 0; and do not insert new tuples into it unless there
is room to convert the header to new format ...
We could theoretically do a similar thing with 64-bit xmin/xmax though -
detect that a page is in the old format and convert all tuples there to
64-bit xmin/xmax.
But I agree that we don't want to increase bloat per tuple as it's
already too big.
but I'm not sure what we
do about tuple deletion if the old page is totally full and we need to
write an xmax that's past 4G).
If the page is too full we could move some data to a different (or new) page.
For me the bigger issue is that we'll still have to "refreeze" pages,
because if tuples are updated or deleted in a different epoch than the one
they were inserted in, the new version of the tuple has to go to a different
page, and the old page will have free space that can't be used by new tuples
since the system is now in a different epoch.
--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Jul 30, 2015 2:23 PM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:
Gavin Flower <GavinFlower@archidevsys.co.nz> writes:
On 31/07/15 02:24, Heikki Linnakangas wrote:
There is a big downside to expanding xmin/xmax to 64 bits: it takes
space. More space means more memory needed for caching, more memory
bandwidth, more I/O, etc.
I think having a special case to save 32 bits per tuple would cause
unnecessary complications, and the savings are minimal compared to the
size of current modern storage devices and the typical memory used in
serious database servers.
I think the argument that the savings are minimal is pretty thin.
It all depends on how wide your tables are --- but on a narrow table, say
half a dozen ints, the current tuple size is 24 bytes header plus the same
number of bytes of data. We'd be going up to 32 bytes header which makes
for a 16% increase in physical table size. If your table is large,
claiming that 16% doesn't hurt is just silly.
But the elephant in the room is on-disk compatibility. There is
absolutely no way that we can just change xmin/xmax to 64 bits without a
disk format break. However, if we do something like what Heikki is
suggesting, it's at least conceivable that we could convert incrementally
(ie, if you find a page with the old header format, assume all tuples in
it are part of epoch 0; and do not insert new tuples into it unless there
is room to convert the header to new format ... but I'm not sure what we
do about tuple deletion if the old page is totally full and we need to
write an xmax that's past 4G).
Can we safely relegate the responsibility of tracking the per block epoch
to a relation fork?
On 07/31/2015 09:22 AM, Gurjeet Singh wrote:
On Jul 30, 2015 2:23 PM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:
But the elephant in the room is on-disk compatibility. There is
absolutely no way that we can just change xmin/xmax to 64 bits without a
disk format break. However, if we do something like what Heikki is
suggesting, it's at least conceivable that we could convert incrementally
(ie, if you find a page with the old header format, assume all tuples in
it are part of epoch 0; and do not insert new tuples into it unless there
is room to convert the header to new format ... but I'm not sure what we
do about tuple deletion if the old page is totally full and we need to
write an xmax that's past 4G).
Can we safely relegate the responsibility of tracking the per block epoch
to a relation fork?
Sounds complicated and fragile. I would rather attack the page version
problem head on.
- Heikki
On Fri, Jul 31, 2015 at 1:27 AM, Petr Jelinek <petr@2ndquadrant.com> wrote:
On 2015-07-30 23:23, Tom Lane wrote:
But the elephant in the room is on-disk compatibility. There is
absolutely no way that we can just change xmin/xmax to 64 bits without a
disk format break. However, if we do something like what Heikki is
suggesting, it's at least conceivable that we could convert incrementally
(ie, if you find a page with the old header format, assume all tuples in
it are part of epoch 0; and do not insert new tuples into it unless there
is room to convert the header to new format ...
We could theoretically do similar thing with 64bit xmin/xmax though -
detect page is in old format and convert all tuples there to 64bit
xmin/xmax.
But I agree that we don't want to increase bloat per tuple as it's already
too big.
but I'm not sure what we
do about tuple deletion if the old page is totally full and we need to
write an xmax that's past 4G).
If the page is too full we could move some data to different (or new) page.
For me bigger issue is that we'll still have to "refreeze" pages because
if tuples are updated or deleted in different epoch than the one they were
inserted in, the new version of tuple has to go to different page and the
old page will have free space that can't be used by new tuples since the
system is now in different epoch.
It is not so easy to move a heap tuple to a different page. When the table
has indexes, each tuple is referenced by index tuples as (blockNumber,
offset), and we can't remove these references without a vacuum. Thus, we
would have to invent something like multi-page HOT in order to move tuples
between pages. And that seems to be a complicated kludge.
------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
On Fri, Jul 31, 2015 at 12:23 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
But the elephant in the room is on-disk compatibility. There is
absolutely no way that we can just change xmin/xmax to 64 bits without a
disk format break.
That seems problematic. But I'm not yet convinced that there is absolutely
no way to do this.
However, if we do something like what Heikki is
suggesting, it's at least conceivable that we could convert incrementally
(ie, if you find a page with the old header format, assume all tuples in
it are part of epoch 0; and do not insert new tuples into it unless there
is room to convert the header to new format ... but I'm not sure what we
do about tuple deletion if the old page is totally full and we need to
write an xmax that's past 4G).
If a user upgrades a database cluster with pg_upgrade, he would stop the old
postmaster, run pg_upgrade, and start the new postmaster. That means we start
from a point where there are no running transactions. Thus, tuples in the old
format come in two kinds: visible to everybody and invisible to everybody.
When we update or delete an old tuple of the first kind, we don't actually
need to store its xmin anymore. We can store a 64-bit xmax in the place of
xmin/xmax.
So, in order to switch to 64-bit xmin/xmax, we would have to take both free
bits from t_infomask2 to implement it. They would indicate one of 3 possible
tuple formats:
1) Old format: both xmin/xmax are 32-bit.
2) Intermediate format: xmax is 64-bit, xmin is frozen.
3) New format: both xmin/xmax are 64-bit.
But we can use the same idea to implement the epoch in the heap page header
as well. If the new page header doesn't fit on the page, then we don't have
to insert anything into that page; we just need to set xmax and flags on the
existing tuples. Then we can use two of the formats listed above, #1 and #2,
and take one free bit from t_infomask2 for format indication.
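A sketch of how the two free t_infomask2 bits might encode those three
formats (the flag names, bit values, and enum are hypothetical illustrations,
not existing PostgreSQL definitions):

#include <stdint.h>

#define HEAP_XMAX_IS_64BIT 0x0800   /* hypothetical: xmax is stored as 64 bits */
#define HEAP_XMIN_IS_64BIT 0x1000   /* hypothetical: xmin is stored as 64 bits */

typedef enum
{
    TUPLE_FORMAT_OLD,            /* 1) both xmin/xmax are 32-bit   */
    TUPLE_FORMAT_INTERMEDIATE,   /* 2) xmin frozen, xmax is 64-bit */
    TUPLE_FORMAT_NEW             /* 3) both xmin/xmax are 64-bit   */
} TupleXidFormat;

TupleXidFormat
tuple_xid_format(uint16_t t_infomask2)
{
    if (t_infomask2 & HEAP_XMIN_IS_64BIT)
        return TUPLE_FORMAT_NEW;
    if (t_infomask2 & HEAP_XMAX_IS_64BIT)
        return TUPLE_FORMAT_INTERMEDIATE;
    return TUPLE_FORMAT_OLD;
}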
Probably I'm missing something, but I think keeping on-disk compatibility
should be somehow possible.
------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
On 31 July 2015 at 11:00, Alexander Korotkov <a.korotkov@postgrespro.ru>
wrote:
If use upgrade database cluster with pg_upgrade, he would stop old
postmaster, pg_upgrade, start new postmaster. That means we start from the
point when there is no running transactions. Thus, between tuples of old
format there are two kinds: visible for everybody and invisible for
everybody. When update or delete old tuple of first kind, we actually don't
need to store its xmin anymore. We can store 64bit xmax in the place of
xmin/xmax.
So, in order to switch to 64bit xmin/xmax, we have to take both free bits
form t_infomask2 in order to implements it. They should indicate one of 3
possible tuple formats:
1) Old format: both xmin/xmax are 32bit
2) Intermediate format: xmax is 64bit, xmin is frozen.
3) New format: both xmin/xmax are 64bit.
But we can use same idea to implement epoch in heap page header as well.
If new page header doesn't fits the page, then we don't have to insert
something to this page, we just need to set xmax and flags to existing
tuples. Then we can use two format from listed above: #1 and #2, and take
one free bit from t_infomask2 for format indication.
I think we can do it by treating the page-level epoch as a means of
compression, rather than as a barrier, which is how I first saw it.
New Page Format
New Page format has a page-level epoch.
The first tuple inserted onto a block sets the page epoch. For later inserts,
we check whether the current epoch matches the page epoch. If it doesn't, we
try to freeze the page. If all tuples on the page can be frozen, we can then
reset the page-level epoch as part of our insert. If we can't freeze all
tuples on the page, we extend the relation to allow us to add a new page with
the current epoch on it. (We can't easily track which blocks have which
epoch.)
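A toy sketch of that insert-path decision, just to make the flow concrete
(the struct and helper below are hypothetical stand-ins, not PostgreSQL
code):

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Toy stand-ins for the real structures. */
typedef struct
{
    uint32_t epoch;        /* hypothetical page-level epoch field        */
    bool     can_freeze;   /* can every tuple on the page be frozen now? */
} Page;

/* Decide where an insert running at current_epoch may place its tuple. */
Page *
select_page_for_insert(Page *candidate, Page *fresh_page, uint32_t current_epoch)
{
    if (candidate->epoch == current_epoch)
        return candidate;                 /* common, cheap path */

    if (candidate->can_freeze)
    {
        /* Freeze everything; the page can then adopt the new epoch. */
        candidate->epoch = current_epoch;
        return candidate;
    }

    /* Can't freeze all tuples: extend the relation with a new-epoch page. */
    fresh_page->epoch = current_epoch;
    return fresh_page;
}

int main(void)
{
    Page old_page = { .epoch = 3, .can_freeze = false };
    Page new_page = { 0 };
    Page *target = select_page_for_insert(&old_page, &new_page, 4);
    printf("inserting onto %s page\n", target == &old_page ? "existing" : "new");
    return 0;
}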
If an update or delete sees a tuple from a prior epoch, we will try to
freeze the tuple. If we can, then we reuse xmin as the xmax's epoch. If we
can't, we have problems and need a complex mechanism to avoid them. I think
it won't be necessary to invent that in the first release; we will just
assume it is possible.
Current Pages
Current pages don't have an epoch, so we store a base epoch in the
controlfile so we remember how to interpret them.
We don't create any new pages with this page format. For later inserts, we
check whether the current epoch matches the page epoch. If it doesn't, we
check whether it's possible to rewrite the whole page to the new format,
freezing as we go. If that is not possible, we extend the relation to allow
us to add a new page with the current epoch on it. (We can't easily track
which blocks have which epoch.)
If an update or delete sees a tuple from a prior epoch, we will try to
freeze the tuple. If we can, then we reuse xmin as the xmax's epoch.
I don't think we need any new tuple formats to do this.
This means we have
* changes to allow new bufpage format
* changes in hio.c for page selection
* changes to allow xmin to be reused when freeze bit set
Very little additional path length in the common case.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Thu, Jul 30, 2015 at 5:23 PM, Arthur Silva <arthurprs@gmail.com> wrote:
In the other hand PG tuple overhead is already the largest among the
alternatives.
Even if storage keeps getting faster and cheaper stuff you can't ignore the
overhead of adding yet another 8bytes to each tuple.
+1, very much.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 07/31/2015 12:29 AM, Josh Berkus wrote:
On 07/30/2015 07:24 AM, Heikki Linnakangas wrote:
I think we should move to 64-bit XIDs in in-memory structs snapshots,
proc array etc. And expand clog to handle 64-bit XIDs. But keep the
xmin/xmax fields on heap pages at 32-bits, and add an epoch-like field
to the page header so that logically the xmin/xmax fields on the page
are 64 bits wide, but physically stored in 32 bits. That's possible as
long as no two XIDs on the same page are more than 2^31 XIDs apart. So
you still need to freeze old tuples on the page when that's about to
happen, but it would make it possible to have more than 2^32 XID
transactions in the clog. You'd never be forced to do anti-wraparound
vacuums, you could just let the clog grow arbitrarily large
When I introduced the same idea a few years back, having the clog get
arbitrarily large was cited as a major issue. I was under the
impression that clog size had some major performance impacts.
Well, sure, if you don't want the clog to grow arbitrarily large, then
you need to freeze. And most people would want to freeze regularly, to
keep the clog size in check. The point is that you wouldn't *have* to do
so at any particular time. You would never be up against the wall, in
the "you must freeze now or your database will shut down" situation.
I'm not sure what performance impact a very large clog might have. It
takes some disk space (1 GB per 4 billion XIDs), and caching it takes
some memory. And there is a small fixed number of CLOG buffers in shared
memory. But I don't think there's any particularly nasty problem there.
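For reference, that figure follows from clog keeping two status bits per
transaction; a quick back-of-the-envelope check:

#include <stdio.h>

int main(void)
{
    unsigned long long xids  = 4ULL * 1000 * 1000 * 1000;  /* 4 billion transactions */
    unsigned long long bytes = xids * 2 / 8;                /* 2 status bits per transaction */

    printf("%llu bytes (about %.2f GB)\n", bytes, bytes / 1e9);  /* ~1 GB */
    return 0;
}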
- Heikki
On 07/31/2015 02:46 PM, Heikki Linnakangas wrote:
On 07/31/2015 12:29 AM, Josh Berkus wrote:
On 07/30/2015 07:24 AM, Heikki Linnakangas wrote:
I think we should move to 64-bit XIDs in in-memory structs snapshots,
proc array etc. And expand clog to handle 64-bit XIDs. But keep the
xmin/xmax fields on heap pages at 32-bits, and add an epoch-like field
to the page header so that logically the xmin/xmax fields on the page
are 64 bits wide, but physically stored in 32 bits. That's possible as
long as no two XIDs on the same page are more than 2^31 XIDs apart. So
you still need to freeze old tuples on the page when that's about to
happen, but it would make it possible to have more than 2^32 XID
transactions in the clog. You'd never be forced to do anti-wraparound
vacuums, you could just let the clog grow arbitrarily large
When I introduced the same idea a few years back, having the clog get
arbitrarily large was cited as a major issue. I was under the
impression that clog size had some major performance impacts.
Well, sure, if you don't want the clog to grow arbitrarily large, then
you need to freeze. And most people would want to freeze regularly, to
keep the clog size in check. The point is that you wouldn't *have* to do
so at any particular time. You would never be up against the wall, in
the "you must freeze now or your database will shut down" situation.
Well, we still have to freeze *eventually*. Just not for 122,000 years
at current real transaction rates. In 2025, though, we'll be having
this conversation again because of people doing 100 billion transactions
per second. ;-)
I'm not sure what performance impact a very large clog might have. It
takes some disk space (1 GB per 4 billion XIDs), and caching it takes
some memory. And there is a small fixed number of CLOG buffers in shared
memory. But I don't think there's any particularly nasty problem there.
Well, one way to find out, clearly.
--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
Josh Berkus <josh@agliodbs.com> writes:
On 07/31/2015 02:46 PM, Heikki Linnakangas wrote:
Well, sure, if you don't want the clog to grow arbitrarily large, then
you need to freeze. And most people would want to freeze regularly, to
keep the clog size in check. The point is that you wouldn't *have* to do
so at any particular time. You would never be up against the wall, in
the "you must freeze now or your database will shut down" situation.
Well, we still have to freeze *eventually*. Just not for 122,000 years
at current real transaction rates. In 2025, though, we'll be having
this conversation again because of people doing 100 billion transactions
per second. ;-)
Well, we'd wrap the 64-bit WAL position counters well before we wrap
64-bit XIDs ... and there is no code to support wraparound in WAL LSNs.
regards, tom lane
On 31 July 2015 at 22:46, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
On 07/31/2015 12:29 AM, Josh Berkus wrote:
On 07/30/2015 07:24 AM, Heikki Linnakangas wrote:
You'd never be forced to do anti-wraparound
vacuums, you could just let the clog grow arbitrarily large
When I introduced the same idea a few years back, having the clog get
arbitrarily large was cited as a major issue. I was under the
impression that clog size had some major performance impacts.
Well, sure, if you don't want the clog to grow arbitrarily large, then you
need to freeze.
This statement isn't quite right, things are better than that.
We don't need to freeze in order to shrink the clog, we just need to hint
and thereby ensure we move forwards the lowest unhinted xid. That does
involve scanning, but doesn't need to scan indexes. That scan won't produce
anywhere near as much additional WAL traffic or I/O.
In practice, a larger clog would only happen with a higher transaction rate,
which means more system resources, so I don't think it's too much of a
problem overall.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 30 July 2015 at 14:26, Alexander Korotkov <a.korotkov@postgrespro.ru>
wrote:
As I mentioned in CSN thread, it would be nice to replace XID with CSN
when setting hint bits for tuple. In this case when hint bits are set we
don't need any additional lookups to check visibility.
/messages/by-id/CAPpHfdv7BMwGv=OfUg3S-jGVFKqHi79pR_ZK1Wsk-13oZ+cy5g@mail.gmail.com
Introducing 32-bit CSN doesn't seem reasonable for me, because it would
double our troubles with wraparound.
Your idea to replace XIDs with CSNs instead of hinting them was a good one.
It removes the extra lookup we thought we needed to check visibility with
CSN snapshots.
I agree 32-bit CSNs would not be a good idea though, a 64-bit CSN is needed.
If we break a CSN down into an Epoch and a 32-bit value then it becomes
more easily possible. The Epoch for XID and CSN can be the same - whichever
wraps first we just increment the Epoch.
By doing this we can reuse the page-level epoch for both XID and CSN. Now
hinting a tuple is just replacing a 32-bit XID with a 32-bit CSN.
We would probably need an extra flag bit for the case where the CSN is one
epoch later than the XID.
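A very rough sketch of what hinting-with-a-CSN could look like under that
scheme; everything here (struct, flag, visibility rule) is a hypothetical
illustration that ignores the epoch handling and the extra flag bit
mentioned above:

#include <stdbool.h>
#include <stdint.h>

#define HINT_XMIN_IS_CSN 0x0001   /* hypothetical flag: the xmin slot now holds a CSN */

typedef struct
{
    uint32_t xmin_or_csn;   /* 32-bit XID until hinted, 32-bit CSN afterwards */
    uint16_t infomask;
} TupleHeaderSketch;

/* Once the inserting transaction is known to have committed at commit_csn,
 * hinting overwrites the XID with that CSN instead of merely setting a bit. */
void
hint_tuple_with_csn(TupleHeaderSketch *tup, uint32_t commit_csn)
{
    tup->xmin_or_csn = commit_csn;
    tup->infomask |= HINT_XMIN_IS_CSN;
}

/* For a hinted tuple, visibility needs no extra lookup at all: the tuple is
 * visible if it committed before the snapshot's CSN was taken. */
bool
hinted_tuple_is_visible(const TupleHeaderSketch *tup, uint32_t snapshot_csn)
{
    return (tup->infomask & HINT_XMIN_IS_CSN) != 0 &&
           tup->xmin_or_csn <= snapshot_csn;
}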
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services