maximum number of rows in table - what about oid limits?

Started by John Scottalmost 25 years ago28 messagesgeneral
Jump to latest
#1John Scott
jmscott@yahoo.com

postgresql docs claim an essentially unlimited number of
rows per table.

http://postgresql.crimelabs.net/users-lounge/limitations.html

this doesn't make sense if each row has an oid.
do more subtle side effects exist if the oid wraps?

thanks - jmscottJUSTREMOVEME@yahoo.com

#2Stephan Szabo
sszabo@megazone23.bigpanda.com
In reply to: John Scott (#1)
Re: [SQL] maximum number of rows in table - what about oid limits?

On 6 Jun 2001, jmscott@REMOVEMEyahoo.com wrote:

postgresql docs claim an essentially unlimited number of
rows per table.

http://postgresql.crimelabs.net/users-lounge/limitations.html

this doesn't make sense if each row has an oid.
do more subtle side effects exist if the oid wraps?

In general, unless you're relying on unique oids, you should be fine.
You probably don't want to use oid as a unique key in your tables for that
reason. Of course, sequences aren't sufficient either (also being
int4) but some kind of int8 "sequence" mechanism would do it if you expect
more than the int4 number of rows.

You might have problems with creating system table entries with unique
oids after wraparound, but generally that can be fixed by trying again.
(Some of the system tables have a unique index on oid).

#3John Scott
jmscott@yahoo.com
In reply to: Stephan Szabo (#2)
Re: [SQL] maximum number of rows in table - what about oid limits?

well i wasn't interested in using oids in my application.
i was curious about the relationship oids
and the tuple/row limit.

i guess if what you say is true, the oids are NOT used internally
by postgres. this seems odd.

so, i guess my question still stands ... what happens when oids wrap?
are oids nothing more than a sequence with an index,
not used at all internally?

i
--- Stephan Szabo <sszabo@megazone23.bigpanda.com> wrote:

On 6 Jun 2001, jmscott@REMOVEMEyahoo.com wrote:

postgresql docs claim an essentially unlimited number of
rows per table.

http://postgresql.crimelabs.net/users-lounge/limitations.html

this doesn't make sense if each row has an oid.
do more subtle side effects exist if the oid wraps?

In general, unless you're relying on unique oids, you should be fine.
You probably don't want to use oid as a unique key in your tables for that
reason. Of course, sequences aren't sufficient either (also being
int4) but some kind of int8 "sequence" mechanism would do it if you expect
more than the int4 number of rows.

You might have problems with creating system table entries with unique
oids after wraparound, but generally that can be fixed by trying again.
(Some of the system tables have a unique index on oid).

=====
John Scott (john@august.com)
Senior Partner
August Associates

email: john@august.com
web: http://www.august.com/~jmscott

__________________________________________________
Do You Yahoo!?
Get personalized email addresses from Yahoo! Mail - only $35
a year! http://personal.mail.yahoo.com/

#4Stephan Szabo
sszabo@megazone23.bigpanda.com
In reply to: John Scott (#3)
Re: [SQL] maximum number of rows in table - what about oid limits?

On Wed, 6 Jun 2001, John Scott wrote:

well i wasn't interested in using oids in my application.
i was curious about the relationship oids
and the tuple/row limit.

i guess if what you say is true, the oids are NOT used internally
by postgres. this seems odd.

so, i guess my question still stands ... what happens when oids wrap?
are oids nothing more than a sequence with an index,
not used at all internally?

They are used for references between system tables. That's why
you'll get intermittent failures on things like create statements after
rollover (system tables may have unique index on oid).
As far as I know the system doesn't do stuff with the oid on user rows.

#5John Scott
jmscott@yahoo.com
In reply to: Stephan Szabo (#4)
Re: [SQL] maximum number of rows in table - what about oid limits?
--- Stephan Szabo <sszabo@megazone23.bigpanda.com> wrote:

On Wed, 6 Jun 2001, John Scott wrote:

well i wasn't interested in using oids in my application.
i was curious about the relationship oids
and the tuple/row limit.

i guess if what you say is true, the oids are NOT used internally
by postgres. this seems odd.

so, i guess my question still stands ... what happens when oids wrap?
are oids nothing more than a sequence with an index,
not used at all internally?

They are used for references between system tables. That's why
you'll get intermittent failures on things like create statements after
rollover (system tables may have unique index on oid).
As far as I know the system doesn't do stuff with the oid on user rows.

hmmm. guess i'm still not getting a clear picture on what
happens when oid's wrap? just seems like an obvious question.
if i wrap twice, things could get very wicked, indeed.

if i'm not comfortable with oid wrapping, then doesn't this imply
that postgres can only handle 2^32 tuples (or 2^31 - 1) -
this number of rows per table is relatively small by today's PC standards.
just converting/bulk loading an existing large commercial database into
postgres would expose this issue ;-)

what am i missing here? i am comfortable not using oids
in application logic. that' pretty standard.

my question is, since oid's are used by the
postgres kernel, what effect does oid wrapping have upon
the stability of the database. for example, does oid wrapping mean
i might clash with oid's in the system tables.

i'll poke around the source. just hoped somebody could
shed some light on what seemed like an obvious question.

thanks-john

=====
John Scott (john@august.com)
Senior Partner
August Associates

email: john@august.com
web: http://www.august.com/~jmscott

__________________________________________________
Do You Yahoo!?
Get personalized email addresses from Yahoo! Mail - only $35
a year! http://personal.mail.yahoo.com/

#6Stephan Szabo
sszabo@megazone23.bigpanda.com
In reply to: John Scott (#5)
Re: [SQL] maximum number of rows in table - what about oid limits?

my question is, since oid's are used by the
postgres kernel, what effect does oid wrapping have upon
the stability of the database. for example, does oid wrapping mean
i might clash with oid's in the system tables.

Let me try again. I've been having problems trying to write it out
I fear.

AFAIK, the only real clashes you have to worry about are oid clashes
between rows within the same system table that have unique constraints
on oid. That means that your table rows shouldn't conflict, and you
shouldn't even get conflicts between different system tables. So,
you could get a conflict when say creating a table between an old
pg_class row and the one that was going to be created and the
create table would fail and need to be tried again, but if you
had a pg_class row oid 10000, and 100 other rows with oid 10000
in your own tables there's no conflict.

#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: John Scott (#5)
Re: [SQL] maximum number of rows in table - what about oid limits?

John Scott <jmscott@yahoo.com> writes:

my question is, since oid's are used by the
postgres kernel, what effect does oid wrapping have upon
the stability of the database.

It doesn't affect stability, if by that you mean potential crashes.

What can happen after a wrap is that the OID generated for a
newly-created object might conflict with some already-existing object's
OID. If that happens, you get a duplicate-key-insertion error on the
OID index of the relevant system catalog (pg_class, pg_type, etc).
There is a unique index on OID for each system catalog wherein OID
is used to identify objects. It doesn't really matter whether the
same OID is reused in different catalogs or in user tables.

The odds of this happening seem pretty low, if you've got reasonable
numbers of objects (eg, with a couple thousand tables in a database,
you'd have a 2000/4G = 1 in 2 million chance of collision on pg_class
OID). If it does happen, you can just retry the failed object-creation
command until you get a nonconflicting OID assignment.

This is certainly not ideal, but it's not nearly as big a problem as
transaction ID wraparound. You can live with it, whereas right now
xact ID wraparound is catastrophic. That we gotta work on, soon.

regards, tom lane

#8Josh Berkus
josh@agliodbs.com
In reply to: Tom Lane (#7)
Re: maximum number of rows in table - what about oid limits?

Tom, Stephan,

What can happen after a wrap is that the OID generated for a
newly-created object might conflict with some already-existing
object's
OID. If that happens, you get a duplicate-key-insertion error on the
OID index of the relevant system catalog (pg_class, pg_type, etc).
There is a unique index on OID for each system catalog wherein OID
is used to identify objects. It doesn't really matter whether the
same OID is reused in different catalogs or in user tables.

Given this, why bother with system-generated OIDs on user rows at all?
Why not simply reserve the OIDs for the system tables?

Or are you planning to later increase the OIDs to INT8 and start using
them for OODB-analogous behavior on individual records?

This is certainly not ideal, but it's not nearly as big a problem as
transaction ID wraparound. You can live with it, whereas right now
xact ID wraparound is catastrophic. That we gotta work on, soon.

Nothing like reassuring us commercial DB users, Tom. :-P

Can you describe what you're talking about?

-Josh

______AGLIO DATABASE SOLUTIONS___________________________
Josh Berkus
Complete information technology josh@agliodbs.com
and data management solutions (415) 565-7293
for law firms, small businesses fax 621-2533
and non-profit organizations. San Francisco

#9John Scott
jmscott@yahoo.com
In reply to: Tom Lane (#7)
Re: [SQL] maximum number of rows in table - what about oid limits?
--- Tom Lane <tgl@sss.pgh.pa.us> wrote:

John Scott <jmscott@yahoo.com> writes:

my question is, since oid's are used by the
postgres kernel, what effect does oid wrapping have upon
the stability of the database.

It doesn't affect stability, if by that you mean potential crashes.

yes and know. an application workaround is ok,

What can happen after a wrap is that the OID generated for a
newly-created object might conflict with some already-existing object's
OID. If that happens, you get a duplicate-key-insertion error on the
OID index of the relevant system catalog (pg_class, pg_type, etc).
There is a unique index on OID for each system catalog wherein OID
is used to identify objects. It doesn't really matter whether the
same OID is reused in different catalogs or in user tables.

depends on how your are changing the catalog.
but does retrying fix the problem? wouldn't the second try just often
see the same oid again. does a failed attempt increment the oid????

The odds of this happening seem pretty low, if you've got reasonable
numbers of objects (eg, with a couple thousand tables in a database,
you'd have a 2000/4G = 1 in 2 million chance of collision on pg_class
OID). If it does happen, you can just retry the failed object-creation
command until you get a nonconflicting OID assignment.

same argument as above.

all this sounds like a reasonable halfway solution.
i'm comfortable NOT using OID's at the application level for many reasons.
that's not a problem. i just don't want my imported, test database of 20B
tuples crashing for mysterious reasons, obviously, right out of the gate,
with me mumbling and
waving my hands at the wall. i can do that now with
commercial systems.

This is certainly not ideal, but it's not nearly as big a problem as
transaction ID wraparound. You can live with it, whereas right now
xact ID wraparound is catastrophic. That we gotta work on, soon.

yep, 1000tx/sec ~~ 50 days before wrap.

by the way, have you written anything up on the txid wrapping problem?
we talked about this at osdn last year in san jose. i proposed
64 bits, and you discussed a clever wrapping algorithm involving 48 bits and,
if memory serves me, backward compatiablility of the disk format.
i was the one driving the car.

probably time to take this thread private?

cheers-john

regards, tom lane

__________________________________________________
Do You Yahoo!?
Get personalized email addresses from Yahoo! Mail - only $35
a year! http://personal.mail.yahoo.com/

#10Tom Lane
tgl@sss.pgh.pa.us
In reply to: John Scott (#9)
Re: [SQL] maximum number of rows in table - what about oid limits?

John Scott <jmscott@yahoo.com> writes:

but does retrying fix the problem? wouldn't the second try just often
see the same oid again. does a failed attempt increment the oid????

Yes.

by the way, have you written anything up on the txid wrapping problem?
we talked about this at osdn last year in san jose. i proposed
64 bits, and you discussed a clever wrapping algorithm involving 48 bits and,
if memory serves me, backward compatiablility of the disk format.

The wraparound idea was discussed on pghackers; people didn't like it
much, but I haven't heard a better near-term solution ...

regards, tom lane

#11Tom Lane
tgl@sss.pgh.pa.us
In reply to: Josh Berkus (#8)
Re: maximum number of rows in table - what about oid limits?

"Josh Berkus" <josh@agliodbs.com> writes:

Given this, why bother with system-generated OIDs on user rows at all?
Why not simply reserve the OIDs for the system tables?

An option to not generate OIDs unless requested (on a table-by-table
basis) has been discussed. It seems like a fine near-term solution
to me. 8-byte OIDs are a longer-term solution, because they'll break
a lot of things (including clients...)

This is certainly not ideal, but it's not nearly as big a problem as
transaction ID wraparound. You can live with it, whereas right now
xact ID wraparound is catastrophic. That we gotta work on, soon.

Nothing like reassuring us commercial DB users, Tom. :-P
Can you describe what you're talking about?

It's in the archives: after 4G transactions, your database curls up
and dies. When your pg_log starts to approach 1Gbyte (2 bits per
transaction) you'd better plan on dump/initdb/reload.

regards, tom lane

#12Paul Tomblin
ptomblin@xcski.com
In reply to: Tom Lane (#10)
Help me speed things up...

I'm doing some postgres stuff in perl. I haven't done much database stuff
in many years, and back then I was doing Oracle in C. I'm doing a major
query that gets a bunch of fields including a key, and then does other
queries with that key on other tables. Now with Oracle I would have set
up and compiled a query for the subqueries, and then only substituted in
the key variable and do the query, and repeat ad-nauseum. (Forgive the
fact that I've forgotten the terminology) There doesn't seem to be any
way to do that in Pg.pm, unless I'm missing something, and my subqueries
are making the whole thing slow as molasses. I think the query I just did
took about 45 minutes when I did the subqueries, and about 30 seconds
without.

--
Paul Tomblin <ptomblin@xcski.com>, not speaking for anybody
"But I heard him exclaim, as he climbed in the night
Merry Christmas to all, I have traffic in sight" - Phyllis Moses

#13guard
guard@ficnet.net
In reply to: Stephan Szabo (#2)
how to configure --with-tcl in CYGWIN

please help me

#14Jonathan Bartlett
johnnyb6@sdf.lonestar.org
In reply to: Tom Lane (#11)
Re: maximum number of rows in table - what about oid limits?

Besides compatibility, what breaks when you make OIDs/Txn IDs
INT8s? Maybe there should be a minor fork called Postgres64 which does
this for those needing large tables.

Jon

johnnyb6@sdf.lonestar.org
SDF Public Access UNIX System - http://sdf.lonestar.org

On Wed, 6 Jun 2001, Tom Lane wrote:

Show quoted text

"Josh Berkus" <josh@agliodbs.com> writes:

Given this, why bother with system-generated OIDs on user rows at all?
Why not simply reserve the OIDs for the system tables?

An option to not generate OIDs unless requested (on a table-by-table
basis) has been discussed. It seems like a fine near-term solution
to me. 8-byte OIDs are a longer-term solution, because they'll break
a lot of things (including clients...)

This is certainly not ideal, but it's not nearly as big a problem as
transaction ID wraparound. You can live with it, whereas right now
xact ID wraparound is catastrophic. That we gotta work on, soon.

Nothing like reassuring us commercial DB users, Tom. :-P
Can you describe what you're talking about?

It's in the archives: after 4G transactions, your database curls up
and dies. When your pg_log starts to approach 1Gbyte (2 bits per
transaction) you'd better plan on dump/initdb/reload.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly

#15Tim Barnard
tbarnard@povn.com
In reply to: John Scott (#5)
Re: Re: [SQL] maximum number of rows in table - what about oid limits?

<snip>
If it does happen, you can just retry the failed object-creation
command until you get a nonconflicting OID assignment.
<snip>

Tom, or anyone who'd know for that matter, what is the
exact error I would need to look for?

Tim

----- Original Message -----
From: "Tom Lane" <tgl@sss.pgh.pa.us>
To: <john@august.com>
Cc: "Stephan Szabo" <sszabo@megazone23.bigpanda.com>;
<pgsql-general@postgresql.org>; <pgsql-sql@postgresql.org>
Sent: Wednesday, June 06, 2001 3:37 PM
Subject: [GENERAL] Re: [SQL] maximum number of rows in table - what about
oid limits?

Show quoted text

John Scott <jmscott@yahoo.com> writes:

my question is, since oid's are used by the
postgres kernel, what effect does oid wrapping have upon
the stability of the database.

It doesn't affect stability, if by that you mean potential crashes.

What can happen after a wrap is that the OID generated for a
newly-created object might conflict with some already-existing object's
OID. If that happens, you get a duplicate-key-insertion error on the
OID index of the relevant system catalog (pg_class, pg_type, etc).
There is a unique index on OID for each system catalog wherein OID
is used to identify objects. It doesn't really matter whether the
same OID is reused in different catalogs or in user tables.

The odds of this happening seem pretty low, if you've got reasonable
numbers of objects (eg, with a couple thousand tables in a database,
you'd have a 2000/4G = 1 in 2 million chance of collision on pg_class
OID). If it does happen, you can just retry the failed object-creation
command until you get a nonconflicting OID assignment.

This is certainly not ideal, but it's not nearly as big a problem as
transaction ID wraparound. You can live with it, whereas right now
xact ID wraparound is catastrophic. That we gotta work on, soon.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to majordomo@postgresql.org)

#16Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jonathan Bartlett (#14)
Re: maximum number of rows in table - what about oid limits?

Jonathan Bartlett <johnnyb6@sdf.lonestar.org> writes:

Besides compatibility, what breaks when you make OIDs/Txn IDs
INT8s?

OIDs are part of the on-the-wire protocol (column type data returned
during a SELECT is in the form of OIDs). So extending OIDs to int8
would pose nontrivial client/server compatibility hazards.

Maybe there should be a minor fork called Postgres64 which does
this for those needing large tables.

I've been thinking it should be a compile-time option. But the protocol
issue is going to be a problem.

regards, tom lane

#17Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tim Barnard (#15)
Re: Re: [SQL] maximum number of rows in table - what about oid limits?

"Tim Barnard" <tbarnard@povn.com> writes:

<snip>
If it does happen, you can just retry the failed object-creation
command until you get a nonconflicting OID assignment.
<snip>

Tom, or anyone who'd know for that matter, what is the
exact error I would need to look for?

It'd be a complaint about attempt to insert a duplicate key into
the unique index on one of several system catalogs' OID columns.
There wouldn't be any earlier detection of the problem than that.

regards, tom lane

#18Jonathan Bartlett
johnnyb6@sdf.lonestar.org
In reply to: Tom Lane (#16)
Re: maximum number of rows in table - what about oid limits?

I've been thinking it should be a compile-time option. But the protocol
issue is going to be a problem.

The protocol issues shouldn't be a problem if you require separate
drivers. Force clients to use libpq64.so.

The reason I'm concerned about this is that there are a _lot_ of great
things you can do with OIDs if they are unique. For example, if you've
worked with Oracle Applications, they have this tacked on to the end of
all of their tables:

LAST_UPDATED_BY Int,
LAST_UPDATED_DATE Date,
...

Which could be replaced by a single table

Table UPDATE_HISTORY
REFERRED_OBJECT OID,
LAST_UPDATED_BY OID,
LAST_UPDATED_DATE Date

which applies to all tables. You could also have a notes table for
everything -

table NOTES
REFERRED_OBJECT OID,
Note Text

And so on and so forth. However, with OID wrap-around, this would be
problematic.

Jon

Show quoted text

regards, tom lane

#19Josh Berkus
josh@agliodbs.com
In reply to: Jonathan Bartlett (#18)
Re: maximum number of rows in table - what about oid limits?

Jon,

Which could be replaced by a single table

Table UPDATE_HISTORY
REFERRED_OBJECT OID,
LAST_UPDATED_BY OID,
LAST_UPDATED_DATE Date

which applies to all tables. You could also have a notes table for
everything -

table NOTES
REFERRED_OBJECT OID,
Note Text

Actually, I'm doing this with one of my applications. Rather than using
the (problematic) OID, I simply established an independant sequence
('universal_sq') and used that as the primary key for all of my
important data tables. So long as the total records in these tables
stays < 2.4 billion, I'm doing fine. Nothing in PostgreSQL prevents you
from using a single independent sequence as the key for multiple tables.

If you are concerned about having > 2.4 billion recs, then perhaps it's
time to hack an INT8 sequence functionality. I think that adding INT8
sequences to the PostgreSQL database would be a *lot* easier than
modifying OID functionality. In fact, if it matters to you, why not pay
for it to get done?

-Josh Berkus

P.S. A lot of these concerns affect only developers with high-traffic
web applications and similar. For example, in my small business
software, it will take <> 1 million days to exhaust the TXN register.
Not something I need to worry about.

Is there some good way that we can "vote with our pocketbooks" for
various development issues in the PostgreSQL to-do list, short of hiring
a C programmer ourselves? I, for one, am desperately eager for real
stored procedures, and could get my clients to contribute toward the
development, but not more than 4 figures ...

______AGLIO DATABASE SOLUTIONS___________________________
Josh Berkus
Complete information technology josh@agliodbs.com
and data management solutions (415) 565-7293
for law firms, small businesses fax 621-2533
and non-profit organizations. San Francisco

#20Jonathan Bartlett
johnnyb6@sdf.lonestar.org
In reply to: Josh Berkus (#19)
Re: maximum number of rows in table - what about oid limits?

Actually, I'm doing this with one of my applications. Rather than using
the (problematic) OID, I simply established an independant sequence
('universal_sq') and used that as the primary key for all of my
important data tables. So long as the total records in these tables
stays < 2.4 billion, I'm doing fine. Nothing in PostgreSQL prevents you
from using a single independent sequence as the key for multiple tables.

The nice thing about OIDs is that if you ever need to merge rows, you
could make a function that searched all OID-type parameters and change the
old reference to the new one.

If you are concerned about having > 2.4 billion recs, then perhaps it's
time to hack an INT8 sequence functionality. I think that adding INT8
sequences to the PostgreSQL database would be a *lot* easier than
modifying OID functionality. In fact, if it matters to you, why not pay
for it to get done?

Were I doing PostgreSQL for a business, I most assuredly would. However,
I am currently just toying around with creating a common set of base
tables that would be useful in almost any application, taking advantage of
PostgreSQL's specific capabilities. However, I asked the question because
if the only real problem is compatibility, I might be able to do it
myself (or maybe not).

P.S. A lot of these concerns affect only developers with high-traffic
web applications and similar. For example, in my small business
software, it will take <> 1 million days to exhaust the TXN register.
Not something I need to worry about.

Agreed, but if people are going to take PostgreSQL seriously, then some
of the top limits need to be expanded.

Is there some good way that we can "vote with our pocketbooks" for
various development issues in the PostgreSQL to-do list, short of hiring
a C programmer ourselves? I, for one, am desperately eager for real
stored procedures, and could get my clients to contribute toward the
development, but not more than 4 figures ...

You could probably find a college student to do whatever you wanted for a
grand.

Out of curiosity, what do you mean by "Real Stored Procedures"?

Jon

#21Josh Berkus
josh@agliodbs.com
In reply to: Jonathan Bartlett (#20)
#22Jonathan Bartlett
johnnyb6@sdf.lonestar.org
In reply to: Josh Berkus (#21)
#23Josh Berkus
josh@agliodbs.com
In reply to: Jonathan Bartlett (#22)
#24Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#11)
#25Bruce Momjian
bruce@momjian.us
In reply to: Jonathan Bartlett (#14)
#26Doug McNaught
doug@wireboard.com
In reply to: John Scott (#9)
#27GH
grasshacker@over-yonder.net
In reply to: John Scott (#3)
#28Tim Barnard
tbarnard@povn.com
In reply to: John Scott (#3)