4 billion record limit?
PostgreSQL uses a hidden "oid" field to uniquely identify all records
across all tables in all databases on a particular server. On my
platform, this number is an unsigned 32-bit integer, meaning 0 to about 4
billion.
1. This implies a hard limit of 4 billion records on a server, right?
2. When I delete records, are those oids ever reused?
--
Paul Caskey paul@nmxs.com Software Engineer
New Mexico Software 5041 Indian School NE Albuquerque, NM 87110
--
Paul Caskey wrote:
PostgreSQL uses a hidden "oid" field to uniquely identify all records
across all tables in all databases on a particular server. On my
platform, this number is an unsigned 32-bit integer, meaning 0 to about 4
billion.
1. This implies a hard limit of 4 billion records on a server, right?
Basically, yes.
2. When I delete records, are those oids ever reused?
No.
The real solution is to support 64 bit oids, but that has not been done
yet.
I'm not too familiar with the postgres internals, but the oid tells
postgres where the row is stored in the filesystem. So by the time
you've run out of oid's you've probably run out of space on your
filesystem.
--
Chris Bitmead <chrisb@nimrod.itg.telstra.com.au> writes:
Paul Caskey wrote:
1. This implies a hard limit of 4 billion records on a server, right?
Basically, yes.
It's only a hard limit if your application assumes OIDs are unique.
If you don't assume that, then I think it's not a big problem.
It's possible (though obviously not especially likely) that you might
get OID collisions in the system tables after an OID-counter wraparound.
I believe that in 7.0, any such collision will result in a "duplicate
key" error, because we have unique indexes on OID for all the system
tables where it matters. So worst case is you'd have to retry a table
creation or whatever the command was. This could be annoying but it's
certainly not catastrophic.
The real solution is to support 64 bit oids, but that has not been done
yet.
64-bit OIDs are probably the Final Solution here, but we're not in a
big hurry to do that because of the space cost and portability issues.
Another idea that's been discussed is to make OID column optional in
user tables --- then, simply not using OIDs in your large tables would
be enough to make the 32-bit OID supply last indefinitely.
As long as we're talking about outer limits, it might be worth pointing
out that block numbers within tables are 32 bits. With the default
blocksize of 8K, that sets a hard limit of either 16Tb or 32Tb per
table, depending on whether block-number arithmetic is unsigned or not
(I'm not sure). You could push that to 64/128Tb by using BLCKSZ=32K.
If that won't do, maybe you need a different DBMS ;-)
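The arithmetic here is easy to check with a few lines (a quick illustration only, not PostgreSQL code; the signed/unsigned question is exactly the uncertainty mentioned above):

```python
# Per-table size ceiling: a 32-bit block number times the block size.
def max_table_bytes(blcksz, signed_blocks=True):
    # 31 usable bits if block-number arithmetic is signed, 32 if unsigned
    bits = 31 if signed_blocks else 32
    return (2 ** bits) * blcksz

TB = 2 ** 40  # one terabyte (binary)

assert max_table_bytes(8192, signed_blocks=True) == 16 * TB    # 16 Tb
assert max_table_bytes(8192, signed_blocks=False) == 32 * TB   # 32 Tb
assert max_table_bytes(32768, signed_blocks=True) == 64 * TB   # 64 Tb with BLCKSZ=32K
```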
regards, tom lane
Tom Lane wrote:
Chris Bitmead <chrisb@nimrod.itg.telstra.com.au> writes:
Paul Caskey wrote:
1. This implies a hard limit of 4 billion records on a server, right?
Basically, yes.
It's only a hard limit if your application assumes OIDs are unique.
If you don't assume that, then I think it's not a big problem.
It's possible (though obviously not especially likely) that you might
get OID collisions in the system tables after an OID-counter wraparound.
This implies they do wrap around. So they are reused? Chris said no, but
you're saying yes.
(Maybe they wrap around "by accident", by adding one to MAXINT, which will
give zero on an unsigned int, I believe. Will the system choke on zero?
Has anyone tested this wraparound?)
I will not have 4 billion records in one table or even one database. But
on a large server with many databases, it is conceivable to have 4 billion
records on one machine. With a lot of insert/delete activity, over a few
years, it is certainly conceivable to have 4 billion inserts. If the oids
don't wrap, I have a problem. I can ignore it for a long time, but it
will loom, like Y2K. :-)
Even if they do wrap, if I have some old records lying around with low
OIDs, they will trip me up. Like you said, these are "the outer limits",
but I'm thinking ahead.
Someone suggested in private that I pg_dump/restore all my data to
"repack" the oids which start around 1700 on a fresh database. Thanks for
that idea. Also thanks, Tom, for the sanity check Re: terabytes of data
with 4 billion records. It's still possible, especially in coming years.
It would be a big feather in PG's cap to "fully support" 64-bit platforms
such as IRIX and Solaris (finally) and, coming soon to a theater near you,
Linux on IA-64.
--
Paul Caskey paul@nmxs.com 505-255-1999
New Mexico Software 5041 Indian School NE Albuquerque, NM 87110
--
Joseph Shraibman writes:
I'm not too familiar with the postgres internals, but the oid tells
postgres where the row is stored in the filesystem. So by the time
you've run out of oid's you've probably run out of space on your
filesystem.
The `tid' specifies the storage location. The `oid' is more or less an
arbitrary number as far as user tables are concerned.
--
Peter Eisentraut Sernanders väg 10:115
peter_e@gmx.net 75262 Uppsala
http://yi.org/peter-e/ Sweden
Paul Caskey <paul@nmxs.com> writes:
Tom Lane wrote:
It's only a hard limit if your application assumes OIDs are unique.
If you don't assume that, then I think it's not a big problem.
It's possible (though obviously not especially likely) that you might
get OID collisions in the system tables after an OID-counter wraparound.
This implies they do wrap around. So they are reused? Chris said no, but
you're saying yes.
(Maybe they wrap around "by accident", by adding one to MAXINT, which will
give zero on an unsigned int, I believe. Will the system choke on zero?
Has anyone tested this wraparound?)
Yes, yes, and yes ;-). I just committed a patch to prevent the system
from generating a zero OID during wraparound. I think that the old code
would not have done so in any case, but correct behavior would've
depended on the stored OID counter always being a multiple of 32,
which seems a tad risky to assume over a long time span. There's a
more direct check for wraparound in there now.
(No, I didn't test it the hard way ... just stopped the postmaster and
hacked on the OID counter in the pg_variable file by hand ...)
Even if they do wrap, if I have some old records lying around with low
OIDs, they will trip me up.
No doubt about it, you're likely to get a few "duplicate key" errors and
stuff like that. I'm just observing that it's not likely to be a
complete catastrophe, especially not if you don't rely on OIDs to be
unique in your user tables.
We have talked about offering 8-byte OIDs as a compile-time option,
and I think it'll happen eventually, but I'm not holding my breath.
Lots of other stuff seems more pressing...
regards, tom lane
The implications of the OIDs not wrapping are immense. We have some extremely
active databases that will easily reach this limit in two or three years. For
some applications, such as ecommerce, dumping then reinserting the rows is
not an option for large databases due to the 24-hour nature of their work.
This is a much more complex problem than it would at first seem, as the
"tripping up" over old records with low OIDs still presents a problem, yet if
the system is changed to try to manage a list of available OIDs, it will
then hit performance problems.
Simply waiting for 64bit numbers is rather inelegant and also presumes usage
parameters for the database... remember Bill Gates saying that he couldn't
foresee any usage for more than 64MB of RAM? Besides which, PostgreSQL is the
best DB around... there's a high standard to maintain!
Some initial ideas:
a) The system remains working the way that it does until a configurable
cutoff point (% of OIDs remaining), at which point the DBA can either switch
on some sort of OID "garbage collection" and take the associated performance
hit, or dump the data and reinsert it to pack the OIDs.
GARBAGE COLLECTION:
b) The system could wrap around, keeping an internal pointer of where it is
in the OID chain. It could scan the OIDs sequentially finding the first free
OID. It could then store that position as the new start point for the next
time an OID is needed.
c) An OID compression utility could be written that doesn't require bringing
the DB down (but will obviously have a performance hit). As running this
utility would be a known impact, provision could be made, or the knock
budgeted and accounted for, while not bringing the entire DB to a halt.
d) OIDs themselves could form a list. The OID system as it stands now could
be the "default mode" but if the OIDs run out, then a new OID list is
started. This means OID usage would then change to work along the same
lines as Intel memory addressing. Personally I hate it, but it
does work.
e) OIDs could be scrapped in favour of some other system.
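Option (b) might look something like this sketch (purely hypothetical; the names and the `in_use` lookup are invented for illustration of the idea, not taken from PostgreSQL):

```python
# Wrap the counter and scan forward from a remembered position to the
# first OID not currently in use. `in_use` stands in for whatever index
# the real system would have to consult -- that lookup per allocation is
# exactly the performance hit mentioned above.
def next_free_oid(start, in_use, maxoid=2 ** 32 - 1):
    oid = start
    for _ in range(maxoid):
        oid = oid + 1 if oid < maxoid else 1  # wrap around, skipping 0
        if oid not in in_use:
            return oid  # caller stores this as the new start point
    raise RuntimeError("no free OIDs left")

assert next_free_oid(5, {6, 7}) == 8
assert next_free_oid(2 ** 32 - 1, set()) == 1  # wraps past zero
```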
Well, that's my two pence worth.
Brad
--
Simply waiting for 64bit numbers is rather inelegant and also presumes usage
parameters for the database... remember Bill Gates saying that he couldn't
foresee any usage for more than 64MB of RAM?
I've heard this before and I just don't agree. 64MB ram, perhaps, but who
is going to need 64 * 2^32? The magnitude of increase is fairly vast!
I probably need not mention that a 32 bit value can store up to
4,294,967,296 distinct values, where a 64 bit number can store 4,294,967,296
times as many. If 32 bits weren't enough to keep you happy for more than a few
years, a 64 bit oid really should be enough for anyone for long enough that
you won't be using the same software/hardware any more.
Similar to how a 32 bit unix time in seconds is only good for another ~40
years, but a 64 bit one will keep us going for billions of years.
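The scale difference is easy to put numbers on (a rough sketch; the allocation rate is an assumption picked for illustration):

```python
# Rough lifetime of a 32-bit vs. 64-bit counter at an assumed sustained
# allocation rate of 10,000 OIDs per second (the rate is made up).
rate = 10_000               # allocations per second (assumption)
year = 365 * 24 * 3600      # seconds per year

years_32 = 2 ** 32 / rate / year
years_64 = 2 ** 64 / rate / year

assert years_32 < 0.02           # 32 bits: exhausted in under a week
assert years_64 > 50_000_000     # 64 bits: tens of millions of years
```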
I guess the real issue is rewriting the system so that the type of oid can
be chosen at compile time, so you can use whatever damn one you feel like.
I would also guess that the majority of systems out there using the latest
versions of postgres, already have compiler support for 64 bit integers. So
when someone gets around to doing the necessary work, everything will be
nice.
- Andrew
P.S. IMHO if you can't afford to do a drop and reload once in a while,
Postgres isn't a good choice at the moment for your application.
Tom Lane wrote:
Paul Caskey <paul@nmxs.com> writes:
Tom Lane wrote:
It's only a hard limit if your application assumes OIDs are unique.
If you don't assume that, then I think it's not a big problem.
It's possible (though obviously not especially likely) that you might
get OID collisions in the system tables after an OID-counter wraparound.
This implies they do wrap around. So they are reused? Chris said no, but
you're saying yes.
(Maybe they wrap around "by accident", by adding one to MAXINT, which will
give zero on an unsigned int, I believe. Will the system choke on zero?
Has anyone tested this wraparound?)
Yes, yes, and yes ;-).
Good answer. :-) Thanks for looking into it.
Even if they do wrap, if I have some old records lying around with low
OIDs, they will trip me up.
No doubt about it, you're likely to get a few "duplicate key" errors and
stuff like that. I'm just observing that it's not likely to be a
complete catastrophe, especially not if you don't rely on OIDs to be
unique in your user tables.
I don't rely on OID uniqueness, but I assumed Postgres does!
I don't use the OID for anything; I maintain my own "id" field with a
sequence for every table. I can catch a "duplicate key" error in my code,
but I'll treat it as if my own "id" caused a problem. It will totally
confuse me and my code if there's a
"duplicate key" error related to a hidden system field I never create or
examine. I won't know if/how to re-insert with a different key to solve
the problem.
We have talked about offering 8-byte OIDs as a compile-time option,
and I think it'll happen eventually, but I'm not holding my breath.
Lots of other stuff seems more pressing...
Fair enough, and thanks for all your work so far.
--
Paul Caskey paul@nmxs.com 505-255-1999
New Mexico Software 5041 Indian School NE Albuquerque, NM 87110
--
Paul Caskey <paul@nmxs.com> writes:
No doubt about it, you're likely to get a few "duplicate key" errors and
stuff like that. I'm just observing that it's not likely to be a
complete catastrophe, especially not if you don't rely on OIDs to be
unique in your user tables.
I don't rely on OID uniqueness, but I assumed Postgres does!
Only in the system tables, and not even in all of them. From the
system's point of view, there's no real need to assign OIDs to
user table rows at all --- so another possible answer is not to
do that, unless the user requests it.
regards, tom lane
Ummm ... I'm a newbie to this list, but hasn't this evolved into a hacker
issue?
Also, Bill Gates said something along the lines of nobody will ever need
more than 640KB of RAM ... which was the usable limit on the old XT's
(remember them :) in the early MS-DOS days :)
Dave Burbidge
Network Administrator
-----Original Message-----
<snip>
brad <brad@kieser.net> writes:
<snip>
Simply waiting for 64bit numbers is rather inelegant and also presumes usage
parameters for the database... remember Bill Gates saying that he couldn't
foresee any usage for more than 64MB of RAM? Besides which, PostgreSQL is the
best DB around... there's a high standard to maintain!
<snip>
Actually, he was purported to have said that nobody would need more
than 640KB of RAM, which was the limit of memory on MSDOS. Brings back
memories... remember having to juggle the drivers on bootup to play
that game which wanted nearly all of the base memory but you still
needed the mouse, soundcard and cdrom access? ;-)
--
Prasanth Kumar
kumar1@home.com
The Versant ODBMS uses 48 bit oids, and if you do the math I think
you'll find that should last you forever. (It uses an additional 16 bits
to identify the database, but that's another story.)
Any complex scheme to solve this seems like a waste of time. In a couple
of years when you are likely to be running out, you'll probably be
upgrading your computer to a 64bit one with a newer version of postgres,
and then the problem will disappear.
--
On Thu, 27 Jul 2000, Dave Burbidge wrote:
Also, Bill Gates said something along the lines of nobody will ever need
more than 640KB of RAM ... which was the usable limit on the old XT's
(remember them :) in the early MS-DOS days :)
Actually, I think the quote is an urban legend.
Brett W. McCoy
http://www.chapelperilous.net/~bmccoy/
-------------------------------------------------------------------------------
A girl's conscience doesn't really keep her from doing anything wrong--
it merely keeps her from enjoying it.
Chris Bitmead wrote:
Any complex scheme to solve this seems like a waste of time. In a couple
of years when you are likely to be running out, you'll probably be
upgrading your computer to a 64bit one with a newer version of postgres,
and then the problem will disappear.
That's the kind of thing people said about Y2K, isn't it?
--
Mathieu Arnold
Mathieu Arnold wrote:
Chris Bitmead wrote:
Any complex scheme to solve this seems like a waste of time. In a couple
of years when you are likely to be running out, you'll probably be
upgrading your computer to a 64bit one with a newer version of postgres,
and then the problem will disappear.
That's the kind of thing people said about Y2K, isn't it?
I don't want to start a war but I must agree here... I recoil when the
argument is put forward for a "you will never use that up" approach.
The best that I can offer is: Oh yeah? Seen some of the Beowulf clusters
around recently?
True, it is a big number and for most people I would agree with what you are saying.
Computationally the amount of activity on the database needed to use up that sort of
number is immense. But then, two years ago the prospect of a 1GHz PC processor was also
remote.
I can only say that OpenSource is exploding across the computing landscape more comprehensively
and faster than anyone thought possible and at the same time supercomputing is becoming the
order of the day rather than that rare place that you hoped to be invited into. Linux has
made this possible and OpenSource is at the very heart of this.
I would personally like to see PostgreSQL being synonymous with OpenSource installations.
To a large degree it has already achieved that. In my own experience, working with databases
of several TB in size is now becoming normal rather than exceptional.
I guess that what I am trying to say is that if the OID limit reflected the maximum storage
capacity of the system (more easily determined) rather than a transactional capacity (a lot
less easily dealt with) it would be a major feather in Postgres' cap.
However, my guess would be that were someone to write a mechanism for choosing the OID
handling algorithm, the payoff between performance/transaction limits would mean that
probably most would opt for the performance gain and deal with the OID issue as a normal
DBA procedure. I know that I would for the majority of databases that I administer.
Brad
Bradley Kieser
Director
Kieser.net
Quoting Tom Lane <tgl@sss.pgh.pa.us>:
Paul Caskey <paul@nmxs.com> writes:
No doubt about it, you're likely to get a few "duplicate key" errors and
stuff like that. I'm just observing that it's not likely to be a
complete catastrophe, especially not if you don't rely on OIDs to be
unique in your user tables.
I don't rely on OID uniqueness, but I assumed Postgres does!
Only in the system tables, and not even in all of them. From the
system's point of view, there's no real need to assign OIDs to
user table rows at all --- so another possible answer is not to
do that, unless the user requests it.
This changes things a lot. If the rows don't have to have OIDs associated
with them, then the 4bn limit is not a transactional limit... in which case
there shouldn't be a problem.
Bradley Kieser
Director
Kieser.net
My mistake! ;-)
I remember wondering who would ever need more than the 16K that the Sinclair Spectrum could give you!
Bradley Kieser
Director
Kieser.net