4B row limit for CLOB tables

Started by Roger Pack · about 11 years ago · 40 messages · pgsql-hackers, pgsql-general
#1Roger Pack
rogerdpack2@gmail.com
Lists: pgsql-hackers, pgsql-general

Hello. I see on this page a mention of basically a 4B row limit for
tables that have BLOB's
https://wiki.postgresql.org/wiki/BinaryFilesInDB

Is this fact mentioned in the documentation anywhere? Is there an
official source for this? (If not, maybe consider this a feature
request to mention it in the documentation on BLOB).
Cheers and thanks.
-roger-

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#2Roger Pack
rogerdpack2@gmail.com
In reply to: Roger Pack (#1)
Re: 4B row limit for CLOB tables

On 1/29/15, Roger Pack <rogerdpack2@gmail.com> wrote:

Hello. I see on this page a mention of basically a 4B row limit for
tables that have BLOB's

Oops I meant for BYTEA or TEXT columns, but it's possible the
reasoning is the same...

https://wiki.postgresql.org/wiki/BinaryFilesInDB

Is this fact mentioned in the documentation anywhere? Is there an
official source for this? (If not, maybe consider this a feature
request to mention it in the documentation on BLOB).
Cheers and thanks.
-roger-


#3Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Roger Pack (#1)
Re: 4B row limit for CLOB tables

On 01/29/2015 09:51 AM, Roger Pack wrote:

Hello. I see on this page a mention of basically a 4B row limit for
tables that have BLOB's
https://wiki.postgresql.org/wiki/BinaryFilesInDB

Is this fact mentioned in the documentation anywhere? Is there an
official source for this? (If not, maybe consider this a feature
request to mention it in the documentation on BLOB).

Take a look at:

http://www.postgresql.org/about/

Pretty sure it has to do with this:

Maximum Table Size 32 TB

Cheers and thanks.
-roger-

--
Adrian Klaver
adrian.klaver@aklaver.com


#4Steve Atkins
steve@blighty.com
In reply to: Roger Pack (#2)
Re: 4B row limit for CLOB tables

On Jan 29, 2015, at 9:53 AM, Roger Pack <rogerdpack2@gmail.com> wrote:

On 1/29/15, Roger Pack <rogerdpack2@gmail.com> wrote:

Hello. I see on this page a mention of basically a 4B row limit for
tables that have BLOB's

Oops I meant for BYTEA or TEXT columns, but it's possible the
reasoning is the same...

It only applies to large objects, not bytea or text.

https://wiki.postgresql.org/wiki/BinaryFilesInDB

Some of that looks incorrect or out of date. (e.g. large objects can be a lot
bigger than 2GB in 9.3+).

Is this fact mentioned in the documentation anywhere? Is there an
official source for this? (If not, maybe consider this a feature
request to mention it in the documentation on BLOB).
Cheers and thanks.
-roger

I'm not sure whether it's mentioned explicitly, but large objects are
referenced by an OID, which is a 32 bit value (and a global resource).

If you had 4B BLOBs, though, running out of OIDs would probably be
the least of your worries.
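
Steve's point about the 32-bit OID space can be illustrated with a toy counter (a rough Python sketch; PostgreSQL's real allocator is C code in shared memory and also skips low reserved OIDs, which this model ignores):

```python
# Toy model of a 32-bit global OID counter that wraps around.
# Illustrative sketch only -- not PostgreSQL's actual allocator.

OID_MASK = 0xFFFFFFFF   # OIDs are unsigned 32-bit values
INVALID_OID = 0         # 0 is reserved (InvalidOid) and never handed out

class OidCounter:
    def __init__(self, start=1):
        self.value = start

    def next_oid(self):
        """Return the next OID, wrapping modulo 2^32 and skipping 0."""
        oid = self.value
        self.value = (self.value + 1) & OID_MASK
        if self.value == INVALID_OID:
            self.value = 1   # wrap past the reserved value
        return oid

counter = OidCounter(start=0xFFFFFFFE)
print(counter.next_oid())   # 4294967294
print(counter.next_oid())   # 4294967295 -- the last 32-bit value
print(counter.next_oid())   # 1 -- wrapped: collisions now possible
```

After the wrap, nothing stops the counter from handing out a value something is already using; uniqueness is only enforced at the point of use.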

Cheers,
Steve


#5Bill Moran
wmoran@potentialtech.com
In reply to: Steve Atkins (#4)
Re: 4B row limit for CLOB tables

On Thu, 29 Jan 2015 10:41:58 -0800
Steve Atkins <steve@blighty.com> wrote:

Is this fact mentioned in the documentation anywhere? Is there an
official source for this? (If not, maybe consider this a feature
request to mention it in the documentation on BLOB).
Cheers and thanks.
-roger

I'm not sure whether it's mentioned explicitly, but large objects are
referenced by an OID, which is a 32 bit value (and a global resource).

If you had 4B BLOBs, though, running out of OIDs would probably be
the least of your worries.

Because of how other RDBMS systems use BLOB-ish types, I think a lot of
people get confused about when to use bytea vs. a large object in
PostgreSQL ... and as a result, end up using large objects more often
than is really necessary.

Large objects are for LARGE data ... keep in mind that a bytea column
can store up to 1GB of data. While that seems excessive, it's perfectly
reasonable to use it to store images and other data that's frequently
in the "several megabytes" range. In general, if you can transfer the
entirety of the data in a single shot, then bytea will work fine for
you.

Large objects are for something more like streaming media, where it's
impractical to store the entire file in memory, even for a short time.
I.e. you'd read ~100k from the DB into application memory, do
processing on that data, then discard it and read another 100k. While
large objects certainly fill a niche and for some uses are the only
way to make things work, I have never actually seen an implementation
where large objects were the right solution to the problem. (Of course,
I've never worked in the movie or music industry.) But I've found
that bytea is usually the correct storage method for things like PDF
files, word-processor files, images, etc. ... anything where the entire
file needs to be delivered before it can be used.
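
Bill's rule of thumb could be condensed into a tiny helper (the 1GB cutoff and the function name are illustrative assumptions for this sketch, not official PostgreSQL guidance):

```python
def suggest_storage(size_bytes, streamed=False):
    """Pick a storage strategy per the heuristic above: bytea when the
    whole value is delivered in one shot, large objects (or plain files
    plus metadata in the DB) when the data must be streamed piecewise
    or is very large. Thresholds are assumptions, not hard limits."""
    GB = 1024 ** 3
    if streamed or size_bytes >= GB:
        return "large object / filesystem"
    return "bytea"

print(suggest_storage(5 * 1024 * 1024))                # 5 MB PDF: bytea
print(suggest_storage(10 * 1024 ** 3, streamed=True))  # streaming video
```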

--
Bill Moran


#6Roger Pack
rogerdpack2@gmail.com
In reply to: Roger Pack (#1)
Re: 4B row limit for CLOB tables

Forgot to reply all on this one; many thanks to Steve, Adrian, and Bill
for their answers.

On Jan 29, 2015, at 12:32 PM, Roger Pack <rogerdpack2@gmail.com> wrote:

On 1/29/15, Steve Atkins <steve@blighty.com> wrote:

On Jan 29, 2015, at 9:53 AM, Roger Pack <rogerdpack2@gmail.com> wrote:

On 1/29/15, Roger Pack <rogerdpack2@gmail.com> wrote:

Hello. I see on this page a mention of basically a 4B row limit for
tables that have BLOB's

Oops I meant for BYTEA or TEXT columns, but it's possible the
reasoning is the same...

It only applies to large objects, not bytea or text.

https://wiki.postgresql.org/wiki/BinaryFilesInDB

Some of that looks incorrect or out of date. (e.g. large objects can be a lot
bigger than 2GB in 9.3+).

Is this fact mentioned in the documentation anywhere? Is there an
official source for this? (If not, maybe consider this a feature
request to mention it in the documentation on BLOB).
Cheers and thanks.
-roger

I'm not sure whether it's mentioned explicitly, but large objects are
referenced by an OID, which is a 32 bit value (and a global resource).

Thanks for the info, precisely what I was looking for.
As a follow-up, could you elaborate on what you mean by "global
resource"? I believe OID's are generated from a global counter. Does
this mean the maximum number of large objects in the database is 4B?

Well, OIDs are generated from a global counter, but there's nothing to
stop that wrapping around.

OIDs are used in a lot of places in the system - to identify tables,
functions, loaded modules, and suchlike - and duplicates are prevented
by unique indexes or similar. But that means that if the OID counter were
to wrap around and return an OID that was already in use in the same
context then the attempt to use it would fail. For instance, you might
try to create a table, and it would fail because the "next" OID was already
used to specify another table.

Wrapping the OID counter around will cause all sorts of things to break.

Use of OIDs by the user (as opposed to by postgresql itself for internal
bookkeeping) has been deprecated for years.

That's one reason, but not the only reason, that I don't believe anyone
should ever use the postgresql large object infrastructure. For small
(megabytes rather than gigabytes) chunks of data that might be processed
in the database or might not, bytea or text types are the right thing. For
anything larger, or anything that's not actually processed within the database
(e.g. images or PDFs handled by a webapp) then leaving the file on the
filesystem and just storing metadata in the database is usually the right
thing.

If you actually had that many BLOB's (and the counter wrapped) I
assume that lo_create would start failing [i.e. it has some kind of
uniqueness constraint on the oid]? Or something like that?

lo_create() would fail, but so would the rest of the database. Nothing
would work.

A billion large objects is too many. (Exactly a billion too many, in fact).

Cheers,
Steve


#7Roger Pack
rogerdpack2@gmail.com
In reply to: Steve Atkins (#4)
Re: 4B row limit for CLOB tables

On 1/29/15, Steve Atkins <steve@blighty.com> wrote:

On Jan 29, 2015, at 9:53 AM, Roger Pack <rogerdpack2@gmail.com> wrote:

On 1/29/15, Roger Pack <rogerdpack2@gmail.com> wrote:

Hello. I see on this page a mention of basically a 4B row limit for
tables that have BLOB's

Oops I meant for BYTEA or TEXT columns, but it's possible the
reasoning is the same...

It only applies to large objects, not bytea or text.

OK I think I figured out possibly why the wiki says this. I guess
BYTEA entries > 8KB will be autostored via TOAST, which uses an OID in
its backend. So BYTEA has the same limitation. It appears that
disabling TOAST is not an option [1].
So I guess if the number of BYTEA entries (summed over all tables?
partitioning doesn't help?) with size > 8KB is > 4 billion then there
is actually no option there? (I'm not running into that case, just
conjecturing.)
Thanks!
-roger-

[1]: /messages/by-id/20130405140348.GC4326@awork2.anarazel.de


#8Tatsuo Ishii
t-ishii@sra.co.jp
In reply to: Steve Atkins (#4)
Re: 4B row limit for CLOB tables

I'm not sure whether it's mentioned explicitly, but large objects are
referenced by an OID, which is a 32 bit value (and a global resource).

Large objects are not necessarily referenced by a counter-generated OID
since 8.1. You can assign arbitrary 32-bit integers as long as they are
unique in the pg_largeobject table.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp


#9Roger Pack
rogerdpack2@gmail.com
In reply to: Roger Pack (#7)
Fwd: [GENERAL] 4B row limit for CLOB tables

On 1/29/15, Roger Pack <rogerdpack2@gmail.com> wrote:

Hello. I see on this page a mention of basically a 4B row limit for
tables that have BLOB's

Oops I meant for BYTEA or TEXT columns, but it's possible the
reasoning is the same...

It only applies to large objects, not bytea or text.

OK I think I figured out possibly why the wiki says this. I guess
BYTEA entries > 2KB will be autostored via TOAST, which uses an OID in
its backend. So BYTEA has the same limitation. It appears that
disabling TOAST is not an option [1].
So I guess if the number of BYTEA entries (summed over all tables?
partitioning doesn't help?) with size > 2KB is > 4 billion then there
is actually no option there? If this occurred it might cause "all
sorts of things to break"? [2]
Thanks!
-roger-

[1]: /messages/by-id/20130405140348.GC4326@awork2.anarazel.de
[2]: /messages/by-id/CAL1QdWfb-p5kE9DT2pMqBxohaKG=vxmDREmSBjc+7TkbOeKBBw@mail.gmail.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#10Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Roger Pack (#9)
Re: Fwd: [GENERAL] 4B row limit for CLOB tables

On 1/30/15 11:54 AM, Roger Pack wrote:

On 1/29/15, Roger Pack <rogerdpack2@gmail.com> wrote:

Hello. I see on this page a mention of basically a 4B row limit for
tables that have BLOB's

Oops I meant for BYTEA or TEXT columns, but it's possible the
reasoning is the same...

It only applies to large objects, not bytea or text.

OK I think I figured out possibly why the wiki says this. I guess
BYTEA entries > 2KB will be autostored via TOAST, which uses an OID in
its backend. So BYTEA has a same limitation. It appears that
disabling TOAST is not an option [1].
So I guess if the number of BYTEA entries (in the sum all tables?
partitioning doesn't help?) with size > 2KB is > 4 billion then there
is actually no option there? If this occurred it might cause "all
sorts of things to break"? [2]

It's a bit more complex than that. First, toast isn't limited to bytea;
it holds for ALL varlena fields in a table that are allowed to store
externally. Second, the limit is actually per-table: every table gets
its own toast table, and each toast table is limited to 4B unique OIDs.
Third, the OID counter is actually global, but the code should handle
conflicts by trying to get another OID. See toast_save_datum(), which
calls GetNewOidWithIndex().

Now, the reality is that GetNewOidWithIndex() is going to keep
incrementing the global OID counter until it finds an OID that isn't in
the toast table. That means that if you actually get anywhere close to
using 4B OIDs you're going to become extremely unhappy with the
performance of toasting new data.
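
The behaviour Jim describes can be mimicked with a toy model of GetNewOidWithIndex() (a Python set stands in for the toast table's unique index on chunk_id; the real thing is C code probing an index, so this is only a sketch):

```python
# Toy GetNewOidWithIndex(): bump a wrapping 32-bit counter until the
# candidate OID is absent from this table's set of used toast OIDs.

MASK = 0xFFFFFFFF

def get_new_oid(counter, used):
    """Return (oid, new_counter, probes); probes counts how many
    candidates were examined before a free OID was found."""
    probes = 0
    while True:
        counter = (counter + 1) & MASK or 1  # wrap, skipping InvalidOid (0)
        probes += 1
        if counter not in used:
            used.add(counter)
            return counter, counter, probes

# Nearly empty table: one probe. Heavily used OID range: every
# allocation first walks the whole run of already-taken OIDs.
used = set(range(1, 1_000_001))    # pretend OIDs 1..1,000,000 are taken
oid, _, probes = get_new_oid(0, used)
print(oid, probes)                 # 1000001 1000001
```

Which is why, anywhere close to 4B used OIDs, each new toast insertion can scan enormous stretches of the counter space before it succeeds.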

I don't think it would be horrifically hard to change the way toast OIDs
are assigned (I'm thinking we'd basically switch to creating a sequence
for every toast table), but I don't think anyone's ever tried to push
toast hard enough to hit this kind of limit.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


#11Roger Pack
rogerdpack2@gmail.com
In reply to: Roger Pack (#1)
Re: Fwd: [GENERAL] 4B row limit for CLOB tables

Oops forgot to forward to the list (suggestion/feature request to the
list admin for the various pg lists: make the default "reply to" go to
the list, not the sender, if at all possible).

Response below:

On 1/30/15, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:

On 1/30/15 11:54 AM, Roger Pack wrote:

On 1/29/15, Roger Pack <rogerdpack2@gmail.com> wrote:

Hello. I see on this page a mention of basically a 4B row limit for
tables that have BLOB's

Oops I meant for BYTEA or TEXT columns, but it's possible the
reasoning is the same...

It only applies to large objects, not bytea or text.

OK I think I figured out possibly why the wiki says this. I guess
BYTEA entries > 2KB will be autostored via TOAST, which uses an OID in
its backend. So BYTEA has a same limitation. It appears that
disabling TOAST is not an option [1].
So I guess if the number of BYTEA entries (in the sum all tables?
partitioning doesn't help?) with size > 2KB is > 4 billion then there
is actually no option there? If this occurred it might cause "all
sorts of things to break"? [2]

It's a bit more complex than that. First, toast isn't limited to bytea;
it holds for ALL varlena fields in a table that are allowed to store
externally. Second, the limit is actually per-table: every table gets
it's own toast table, and each toast table is limited to 4B unique OIDs.
Third, the OID counter is actually global, but the code should handle
conflicts by trying to get another OID. See toast_save_datum(), which
calls GetNewOidWithIndex().

Now, the reality is that GetNewOidWithIndex() is going to keep
incrementing the global OID counter until it finds an OID that isn't in
the toast table. That means that if you actually get anywhere close to
using 4B OIDs you're going to become extremely unhappy with the
performance of toasting new data.

OK, so "system stability" doesn't degrade per se when it wraps (since
they all use that GetNewOidWithIndex method or similar?), good to know.

So basically when it gets near 4B TOAST'ed rows it may have to wrap that
counter and search for an "unused" number, and for each number it's
querying the TOAST table to see if it's already used, degrading
performance.

So I guess partitioning tables for now is an acceptable work around,
good to know.

Thanks much for your response, good to know the details before we dive
into postgres with our 8B row table with BYTEA's in it :)


#12Álvaro Hernández Tortosa
In reply to: Roger Pack (#11)
Re: Fwd: [GENERAL] 4B row limit for CLOB tables

On 31/01/15 14:55, Roger Pack wrote:

[...]

Now, the reality is that GetNewOidWithIndex() is going to keep
incrementing the global OID counter until it finds an OID that isn't in
the toast table. That means that if you actually get anywhere close to
using 4B OIDs you're going to become extremely unhappy with the
performance of toasting new data.
OK so "system stability" doesn't degrade per se when it wraps [since
they all use that GetNewOid method or similar [?] good to know.

So basically when it gets near 4B TOAST'ed rows it may have to wrap that
counter and search for "unused" number, and for each number it's
querying the TOAST table to see if it's already used, degrading
performance.

The problem here is that performance degrades exponentially, or
worse. Speaking from experience here: we already tested this for a very
similar case (table creation, where two OIDs are consumed from the global
sequence when inserting into pg_class). Have a look at
http://www.slideshare.net/nosys/billion-tables-project-nycpug-2013,
slides 43-45. We tested this scenario there and showed that table
creations per second dropped from 10K to a few per second and then to a
few per day. In the graphs you can't even tell that more tables were
being created. At around 8K tables short of the theoretical limit of 4B
OIDs consumed, the process basically stopped (doing more insertions).
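
A back-of-the-envelope model of that collapse (this assumes independent uniform probes, which the real sequential counter is not, so actual behaviour near the limit is even worse; purely illustrative):

```python
# If a fraction `fill` of the 2^32 OID space is occupied and each probe
# were independent, finding a free OID would take ~1/(1 - fill) probes
# on average: negligible for a long time, then a cliff.

def expected_probes(fill):
    return 1.0 / (1.0 - fill)

for fill in (0.5, 0.9, 0.99, 0.999999):
    print(f"fill={fill}: ~{expected_probes(fill):,.0f} probes/allocation")
```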

Hope that this information helps.

Best regards,

Álvaro

--
Álvaro Hernández Tortosa

-----------
8Kdata


#13José Luis Tallón
jltallon@adv-solutions.net
In reply to: Jim Nasby (#10)
Re: Fwd: [GENERAL] 4B row limit for CLOB tables

On 01/31/2015 12:25 AM, Jim Nasby wrote:

[snip]
It's a bit more complex than that. First, toast isn't limited to
bytea; it holds for ALL varlena fields in a table that are allowed to
store externally. Second, the limit is actually per-table: every table
gets it's own toast table, and each toast table is limited to 4B
unique OIDs. Third, the OID counter is actually global, but the code
should handle conflicts by trying to get another OID. See
toast_save_datum(), which calls GetNewOidWithIndex().

Now, the reality is that GetNewOidWithIndex() is going to keep
incrementing the global OID counter until it finds an OID that isn't
in the toast table. That means that if you actually get anywhere close
to using 4B OIDs you're going to become extremely unhappy with the
performance of toasting new data.

Indeed ......

I don't think it would be horrifically hard to change the way toast
OIDs are assigned (I'm thinking we'd basically switch to creating a
sequence for every toast table), but I don't think anyone's ever tried
to push toast hard enough to hit this kind of limit.

We did. The Billion Table Project, part 2 (a.k.a. "when does Postgres'
OID allocator become a bottleneck").... The allocator becomes
essentially unusable at about 2.1B OIDs, whereas it performed very well
at "quite empty" (< 100M objects) levels.

So yes, using one sequence per TOAST table should help.
Combined with the new SequenceAMs / sequence implementation being
proposed (specifically: one file for all sequences in a certain
tablespace) this should scale much better.

My 2c.

Regards,

/ J.L.


#14Roger Pack
rogerdpack2@gmail.com
In reply to: Roger Pack (#1)
Re: Fwd: [GENERAL] 4B row limit for CLOB tables

On 1/30/15, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:

On 1/30/15 11:54 AM, Roger Pack wrote:

On 1/29/15, Roger Pack <rogerdpack2@gmail.com> wrote:

Hello. I see on this page a mention of basically a 4B row limit for
tables that have BLOB's

Oops I meant for BYTEA or TEXT columns, but it's possible the
reasoning is the same...

It only applies to large objects, not bytea or text.

OK I think I figured out possibly why the wiki says this. I guess
BYTEA entries > 2KB will be autostored via TOAST, which uses an OID in
its backend. So BYTEA has a same limitation. It appears that
disabling TOAST is not an option [1].
So I guess if the number of BYTEA entries (in the sum all tables?
partitioning doesn't help?) with size > 2KB is > 4 billion then there
is actually no option there? If this occurred it might cause "all
sorts of things to break"? [2]

It's a bit more complex than that. First, toast isn't limited to bytea;
it holds for ALL varlena fields in a table that are allowed to store
externally. Second, the limit is actually per-table: every table gets
it's own toast table, and each toast table is limited to 4B unique OIDs.
Third, the OID counter is actually global, but the code should handle
conflicts by trying to get another OID. See toast_save_datum(), which
calls GetNewOidWithIndex().

Now, the reality is that GetNewOidWithIndex() is going to keep
incrementing the global OID counter until it finds an OID that isn't in
the toast table. That means that if you actually get anywhere close to
using 4B OIDs you're going to become extremely unhappy with the
performance of toasting new data.

OK so "system stability" doesn't degrade per se when it wraps, good to know.

So basically when it gets near 4B rows it may have to wrap that
counter multiple times, and for each "entry" it has to search whether it's
already used, etc.

So I guess partitioning tables for now is an acceptable work around,
good to know.

Thanks much for your response, good to know the details before we dive
into postgres with our 8B row table with BYTEA's in it :)


#15Roger Pack
rogerdpack2@gmail.com
In reply to: Roger Pack (#1)
Re: Fwd: [GENERAL] 4B row limit for CLOB tables

On 2/2/15, José Luis Tallón <jltallon@adv-solutions.net> wrote:

On 01/31/2015 12:25 AM, Jim Nasby wrote:

[snip]
It's a bit more complex than that. First, toast isn't limited to
bytea; it holds for ALL varlena fields in a table that are allowed to
store externally. Second, the limit is actually per-table: every table
gets it's own toast table, and each toast table is limited to 4B
unique OIDs. Third, the OID counter is actually global, but the code
should handle conflicts by trying to get another OID. See
toast_save_datum(), which calls GetNewOidWithIndex().

Now, the reality is that GetNewOidWithIndex() is going to keep
incrementing the global OID counter until it finds an OID that isn't
in the toast table. That means that if you actually get anywhere close
to using 4B OIDs you're going to become extremely unhappy with the
performance of toasting new data.

Indeed ......

I don't think it would be horrifically hard to change the way toast
OIDs are assigned (I'm thinking we'd basically switch to creating a
sequence for every toast table), but I don't think anyone's ever tried
to push toast hard enough to hit this kind of limit.

We did. The Billion Table Project, part2 (a.k.a. "when does Postgres'
OID allocator become a bottleneck").... The allocator becomes
essentially unusable at about 2.1B OIDs, where it performed very well at
"quite empty"(< 100M objects) levels.

So yes, using one sequence per TOAST table should help.
Combined with the new SequenceAMs / sequence implementation being
proposed (specifically: one file for all sequences in a certain
tablespace) this should scale much better.

But it wouldn't be perfect, right? I mean if you had multiple
deletion/insertions and pass 4B then the "one sequence per TOAST
table" would still wrap [albeit more slowly], and performance starts
degrading the same way. And there would still be the hard 4B limit.
Perhaps the foreign key to the TOAST table could be changed from oid
(32 bits) to something else (64 bits) [as well the sequence] so that
it never wraps? What do you think? And would a more aggressive change
like this have a chance of being accepted into the code base?
Thanks.
-roger-


#16Tom Lane
tgl@sss.pgh.pa.us
In reply to: Roger Pack (#15)
Re: Fwd: [GENERAL] 4B row limit for CLOB tables

Roger Pack <rogerdpack2@gmail.com> writes:

On 2/2/15, José Luis Tallón <jltallon@adv-solutions.net> wrote:

So yes, using one sequence per TOAST table should help.
Combined with the new SequenceAMs / sequence implementation being
proposed (specifically: one file for all sequences in a certain
tablespace) this should scale much better.

But it wouldn't be perfect, right? I mean if you had multiple
deletion/insertions and pass 4B then the "one sequence per TOAST
table" would still wrap [albeit more slowly], and performance start
degrading the same way. And there would still be the hard 4B limit.
Perhaps the foreign key to the TOAST table could be changed from oid
(32 bits) to something else (64 bits) [as well the sequence] so that
it never wraps? What do you think? And would a more aggressive change
like this have a chance of being accepted into the code base?

There has been some thought about this, but I have seen no, zero, reports
of anyone actually running into problems *in practice* (as opposed to
contrived cases like "can we create a billion tables"). So we probably
aren't going to want to address it until it starts being a real problem.

The reason it's not as significant as you might think is that small field
values (less than a couple KB *after compression*) don't get pushed out
to the TOAST table, so they don't consume OIDs. And large field values,
like megabytes worth, aren't a problem either because you just aren't
gonna have that many of them. (Simple arithmetic.) You could potentially
get into trouble if you had a whole lot of entries that were just a little
over the toasting threshold, because then you'd have a lot of OIDs
consumed but still a manageable total amount of disk space. But that
doesn't seem to be a very common usage pattern.
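
Tom's "simple arithmetic" can be spelled out with the figures quoted earlier in the thread (~2KB toast threshold, one OID per externally stored value; approximations for illustration, not exact internals):

```python
# How much disk would 4B toast entries occupy at various value sizes?
OIDS = 2 ** 32            # OID space available per toast table
TOAST_MIN = 2 * 1024      # ~2 KB: roughly the smallest value toasted

# Worst case: every value sits just over the threshold.
print(OIDS * TOAST_MIN / 2 ** 40, "TiB")   # 8.0 TiB

# Megabyte-sized values: you hit petabytes long before 4B entries.
print(OIDS * 2 ** 20 / 2 ** 50, "PiB")     # 4.0 PiB
```

So only the narrow "lots of values just above the threshold" pattern can plausibly exhaust the OID space before disk space becomes the problem.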

Also, partitioning the table largely eliminates the problem because each
partition will have its own TOAST table. I'm on record as saying that
many people are far too quick to decide that they need partitioning; but
once you get into the volume of data where 4B toast entries starts to
look like a limitation, you will probably have other reasons to think
that you need to partition.

In short, this is something that's theoretically interesting but doesn't
seem worth doing in practice --- yet anyway.

regards, tom lane


#17José Luis Tallón
jltallon@adv-solutions.net
In reply to: Roger Pack (#15)
Re: Fwd: [GENERAL] 4B row limit for CLOB tables

On 02/02/2015 09:36 PM, Roger Pack wrote:

On 2/2/15, José Luis Tallón <jltallon@adv-solutions.net> wrote:

On 01/31/2015 12:25 AM, Jim Nasby wrote:

[snip]
It's a bit more complex than that. First, toast isn't limited to
bytea; it holds for ALL varlena fields in a table that are allowed to
store externally. Second, the limit is actually per-table: every table
gets it's own toast table, and each toast table is limited to 4B
unique OIDs. Third, the OID counter is actually global, but the code
should handle conflicts by trying to get another OID. See
toast_save_datum(), which calls GetNewOidWithIndex().

Now, the reality is that GetNewOidWithIndex() is going to keep
incrementing the global OID counter until it finds an OID that isn't
in the toast table. That means that if you actually get anywhere close
to using 4B OIDs you're going to become extremely unhappy with the
performance of toasting new data.

Indeed ......

I don't think it would be horrifically hard to change the way toast
OIDs are assigned (I'm thinking we'd basically switch to creating a
sequence for every toast table), but I don't think anyone's ever tried
to push toast hard enough to hit this kind of limit.

We did. The Billion Table Project, part2 (a.k.a. "when does Postgres'
OID allocator become a bottleneck").... The allocator becomes
essentially unusable at about 2.1B OIDs, where it performed very well at
"quite empty"(< 100M objects) levels.

So yes, using one sequence per TOAST table should help.
Combined with the new SequenceAMs / sequence implementation being
proposed (specifically: one file for all sequences in a certain
tablespace) this should scale much better.

But it wouldn't be perfect, right? I mean if you had multiple
deletion/insertions and pass 4B then the "one sequence per TOAST
table" would still wrap [albeit more slowly], and performance start
degrading the same way. And there would still be the hard 4B limit.
Perhaps the foreign key to the TOAST table could be changed from oid
(32 bits) to something else (64 bits) [as well the sequence] so that
it never wraps?

Hmm.... 2^32 times approx. 2kB (as per the usual heuristics, ~4 rows per
heap page) is 8,796,093,022,208 (~9e13) bytes
... which results in 8192 1GB segments :O
Looks like partitioning might be needed much sooner than that (if only
for index efficiency reasons)... unless access is purely sequential.

The problem with changing the id from 32 to 64 bits is that the storage
*for everybody else* doubles, making the implementation slower for
most... though this might actually not be that important.
The alternative could be some "long LOB" ("HugeOBject"?) using the
equivalent to "serial8" whereas regular LOBs would use "serial4".

Anybody actually reaching this limit out there?

Regards,

/ J .L.


#18Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: José Luis Tallón (#17)
Re: Fwd: [GENERAL] 4B row limit for CLOB tables

On 2/2/15 3:50 PM, José Luis Tallón wrote:

Hmm.... 2^32 times aprox. 2kB (as per usual heuristics, ~4 rows per heap
page) is 8796093022208 (~9e13) bytes
... which results in 8192 1GB segments :O
Looks like partitioning might be needed much sooner than that (if only
for index efficiency reasons)... unless access is purely sequential.

The problem with changing the id from 32 to 64 bits is that the storage
*for everybody else* doubles, making the implementation slower for
most.... though this might be actually not that important.
The alternative could be some "long LOB" ("HugeOBject"?) using the
equivalent to "serial8" whereas regular LOBs would use "serial4".

Well, it depends on how we did this. We could (for example) add a field
to pg_class that determines what type to use for toast pointers; OID,
int, or bigint. That could then be taken into account in the *toast*
functions.

But as others have pointed out, we haven't even had any real complaints
about toast using OIDs as being an issue until now, so I think it's
premature to start messing with this. At most it's just something to
keep in mind so we don't preclude doing this in the future.

BTW, regarding the size of what gets toasted; I've often thought it
would be useful to allow a custom size limit on columns so that you
could easily force data to be toasted if you knew you were very unlikely
to access it. Basically, a cheap form of vertical partitioning.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


#19Matthew Kelly
mkelly@tripadvisor.com
In reply to: José Luis Tallón (#17)
hackersgeneral
Re: [GENERAL] 4B row limit for CLOB tables

Hmm.... 2^32 times approx. 2kB (as per usual heuristics, ~4 rows per heap page) is 8796093022208 (~8.8e12) bytes
... which results in 8192 1GB segments :O

8192 1GB segments is just 8TB; it's not _that_ large. At TripAdvisor we’ve been using a NoSQL solution for session storage. We are looking to probably swap that out for Postgres (every other db backing the site is Postgres). Essentially, what I’m building is a system with 1 logical table that maps session id to a 2KB+ grab bag of ever-changing session attributes which is partially normalized, partially json. 315 million uniques a month multiplied by the retention policy means I need to hold 2-4 billion session objects (and somehow expire old ones). Additionally, most http calls can update the session, so between maintenance windows I expect to take around 20 billion 'upserts'.

Obviously, I will have to shard and partition the table in practice, but this weekend I ran a test that demonstrated that a single table on a 9.4 server + logical replication + Dell 730xd can handle 4x that workload. Well, it can for 38 hours… until you wrap xids on the toast table. :P I’ll be the first to admit that isn’t the normal use case though. I’m happy to have found this thread, however, because I’m going to have to build around the global oid counter, explicitly to prevent the problem I explain below regarding clustering.

Anybody actually reaching this limit out there?

Well, it's not the 4 billion row limit that concerns me; it's the global shared counter in conjunction with pg_restore/clustering that is actually pretty concerning.

Just checked through all of TripAdvisor’s normal databases, and the max tuples I see in a single toast table is 17,000,000, so that is still a couple of orders of magnitude too small. (However, close enough that it’ll be a concern in a few years.)

However, I do have active databases where the current oid is between 1 billion and 2 billion. They were last dump-restored for a hardware upgrade a couple years ago and were a bit more than half the size. I therefore can imagine that I have tables which are keyed by ~8,000,000 consecutive oids.

I would argue that when it wraps there will be a single insert that will probably block for 2-5 minutes while it tries to accomplish ~8,000,000 index scans inside of GetNewOidWithIndex. Even partitioning doesn’t protect you from this potential problem.
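To see where the "2-5 minutes" estimate comes from: GetNewOidWithIndex keeps probing the index with successive candidate OIDs until it finds a free one, so a run of ~8,000,000 consecutively used OIDs costs ~8,000,000 probes. A rough Python estimate (the per-probe cost is an assumed figure for a cached btree lookup, not a measurement):

```python
probes = 8_000_000       # consecutive in-use OIDs to skip past
cost_per_probe = 25e-6   # assumed: ~25 us per cached index probe

stall = probes * cost_per_probe
print(stall)             # 200.0 seconds, i.e. ~3.3 minutes
```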

What's even weirder is that this issue can be triggered by consuming too many oids in a *different* database in the same cluster (i.e. by creating large numbers of temp tables).

The problem with changing the id from 32 to 64 bits is that the storage *for everybody else* doubles, making the implementation slower for most.... though this might actually not be that important.

Well, you aren’t doubling the storage. Even if you have to store the key in 4 places, you are adding 16 bytes per TOAST tuple. If we work off the 2KB estimate for each TOAST tuple, then you are only increasing the storage by ~0.8%. I’m sure there are more hidden costs, but we are really only talking about a low single-digit percent increase. In exchange, you get to drop one index scan per toast insert; an index scan looking at the only hot part of the index.
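The overhead figure works out as follows (Python; 4 extra bytes in each of the 4 places the key is stored, against the ~2kB-per-tuple estimate):

```python
extra_bytes = 4 * (8 - 4)   # widening a 4-byte key to 8 bytes, 4 copies
tuple_bytes = 2 * 1024      # ~2 kB per TOAST tuple

print(extra_bytes / tuple_bytes)   # 0.0078125, i.e. just under 0.8%
```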

That being said, I’d be perfectly happy merely giving each TOAST table its own sequence, as that almost entirely mitigates the risk of an unexpected lock-up on reasonably sized tables/partitions, and provides a functional workaround for those of us with larger-than-average installs.

- Matt K

#20José Luis Tallón
jltallon@adv-solutions.net
In reply to: Jim Nasby (#18)
hackersgeneral
Re: Fwd: [GENERAL] 4B row limit for CLOB tables

On 02/03/2015 03:44 AM, Jim Nasby wrote:

[snip]

The alternative could be some "long LOB" ("HugeOBject"?) using the
equivalent to "serial8" whereas regular LOBs would use "serial4".

Well, it depends on how we did this. We could (for example) add a
field to pg_class that determines what type to use for toast pointers;
OID, int, or bigint. That could then be taken into account in the
*toast* functions.

But as others have pointed out, we haven't even had any real
complaints about toast using OIDs as being an issue until now, so I
think it's premature to start messing with this. At most it's just
something to keep in mind so we don't preclude doing this in the future.

A patch creating those HOBs (Huge Objects) might well make sense *after*
the sequence refactoring gets merged.
Removing the bottleneck due to the OID allocator for this use case would
be definitely welcome.
(I don't dare code that just yet, but here's hoping someone will
step in O:-)

BTW, regarding the size of what gets toasted; I've often thought it
would be useful to allow a custom size limit on columns so that you
could easily force data to be toasted if you knew you were very
unlikely to access it. Basically, a cheap form of vertical partitioning.

Hmmm.... alter column set storage external / set storage extended ?

From http://www.postgresql.org/docs/9.4/static/sql-altertable.html :
ALTER [ COLUMN ] column_name SET STORAGE { PLAIN | EXTERNAL |
EXTENDED | MAIN }

This would do what you described, right?

HTH,

/ J.L.


#21Tom Lane
tgl@sss.pgh.pa.us
In reply to: Matthew Kelly (#19)
hackersgeneral
#22David Steele
david@pgmasters.net
In reply to: José Luis Tallón (#20)
hackersgeneral
#23Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: David Steele (#22)
hackersgeneral
#24Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Tom Lane (#21)
hackersgeneral
#25David Steele
david@pgmasters.net
In reply to: Jim Nasby (#23)
hackersgeneral
#26Matthew Kelly
mkelly@tripadvisor.com
In reply to: Jim Nasby (#24)
hackersgeneral
#27Bruce Momjian
bruce@momjian.us
In reply to: Álvaro Hernández Tortosa (#12)
hackersgeneral
#28Stephen Frost
sfrost@snowman.net
In reply to: Bruce Momjian (#27)
hackersgeneral
#29Tom Lane
tgl@sss.pgh.pa.us
In reply to: Stephen Frost (#28)
hackersgeneral
#30Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#29)
hackersgeneral
#31Stephen Frost
sfrost@snowman.net
In reply to: Robert Haas (#30)
hackersgeneral
#32José Luis Tallón
In reply to: Tom Lane (#29)
hackersgeneral
#33Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Álvaro Hernández Tortosa (#32)
hackersgeneral
#34Bruce Momjian
bruce@momjian.us
In reply to: Jim Nasby (#33)
hackersgeneral
#35Álvaro Hernández Tortosa
In reply to: Jim Nasby (#33)
hackersgeneral
#36Stephen Frost
sfrost@snowman.net
In reply to: Álvaro Hernández Tortosa (#35)
hackersgeneral
#37Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Bruce Momjian (#34)
hackersgeneral
#38Roger Pack
rogerdpack2@gmail.com
In reply to: Jim Nasby (#37)
hackersgeneral
#39José Luis Tallón
jltallon@adv-solutions.net
In reply to: Jim Nasby (#37)
hackersgeneral
#40Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: José Luis Tallón (#39)
hackersgeneral