CLOB & BLOB limitations in PostgreSQL

Started by Noname · almost 12 years ago · 8 messages
#1 Noname
Jack.O'Sullivan@tessella.com

I am working for a client who is interested in migrating from Oracle to
Postgres. Their database is currently ~20TB in size, and is growing. The
biggest table in this database is effectively a BLOB store and currently
has around 1 billion rows.

From reading around Postgres, there are a couple of limits which are
concerning in terms of being able to migrate this database. We are not up
against these limits just yet, but it is likely that they will be a
potential blocker within the next few years.

1) Table can be maximum of 32TB (http://www.postgresql.org/about/)

2) When storing bytea or text datatypes there is a limit of 4 billion
entries per table (https://wiki.postgresql.org/wiki/BinaryFilesInDB)

With both of these, are they hard limits or can they be worked around with
partitioning of tables? Could we set the table up in such a way that each
child table was limited, but there was no limit on the number of children?

With point two, does this mean that any table with a bytea datatype is
limited to 4 billion rows (which would seem in conflict with the
"unlimited rows" shown by http://www.postgresql.org/about)? If we had rows
where the bytea was a "null" entry would they contribute towards this
total or is it 4 billion non-null entries?

Thanks.

#2 Andy Colson
andy@squeakycode.net
In reply to: Noname (#1)
Re: CLOB & BLOB limitations in PostgreSQL


Sorry I can't answer any of your questions, but I do have a few more to
raise:

1) I assume Oracle is pretty efficient on disk. You might wanna do a
quick test of a million rows or so and compare the on disk size of an
Oracle db vs PG. It wouldn't surprise me if PG used more space. (I
mean regular varchar, integer, etc.)

2) Does the Oracle blob compress? PG will compress but I'd bet they
compress differently. Again, you might wanna dump out a million blobs
and compare their space usage. At 20TB, a 10% increase in disk usage is
quite a bit.

3) There are two ways to store blob data. Bytea in your table and Large
Object support (in a separate table). Google "postgres bytea vs large
object" might offer useful reading.

I don't know if bytea or large object offer more efficient storage, but
it might be another thing you can test. Large object might be a little
more work to use, but if it saves lots of disk space, it might be worth it.

4) Is this blob data binary'ish? We have json/hstore if it's text'ish,
which might make it more usable.

-Andy

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#3 Noname
Jack.O'Sullivan@tessella.com
In reply to: Andy Colson (#2)
Re: CLOB & BLOB limitations in PostgreSQL

Hi Andy,

Thanks for getting those thoughts out so quickly.

As far as points 1 and 2 go, it is definitely something to think about,
but they are largely tangential to what I need to worry about at this
moment. I am less concerned about "how much disk do we need to store
this" than "is it even possible to store this".

If I'm understanding your point 3 correctly, you are referring to the
OID/LOB type, which, from what I have read, is even more restrictive than
bytea or text in that you have a limit of 4 billion objects per database (
https://wiki.postgresql.org/wiki/BinaryFilesInDB). The fact that each
object can be 2G rather than 1G is not much help to us as our objects are
unlikely to ever be more than ~100k.

The table in question supports storing information either as XML, which we
store in a text/char type column, or binary compressed xml, which is in
the BLOB type column. The particular client I'm doing this for uses the
compressed version, so all of their data in this table is binary.

Thanks

Jack


#4 Albe Laurenz
laurenz.albe@wien.gv.at
In reply to: Noname (#1)
Re: CLOB & BLOB limitations in PostgreSQL

Jack.O'Sullivan wrote:

I am working for a client who is interested in migrating from Oracle to Postgres. Their database is
currently ~20TB in size, and is growing. The biggest table in this database is effectively a BLOB
store and currently has around 1 billion rows.

From reading around Postgres, there are a couple of limits which are concerning in terms of being able
to migrate this database. We are not up against these limits just yet, but it is likely that they will
be a potential blocker within the next few years.

1) Table can be maximum of 32TB (http://www.postgresql.org/about/)

2) When storing bytea or text datatypes there is a limit of 4 billion
entries per table (https://wiki.postgresql.org/wiki/BinaryFilesInDB)

With both of these, are they hard limits or can they be worked around with partitioning of tables?
Could we set the table up in such a way that each child table was limited, but there was no limit on
the number of children?

Yes, if you store the BLOBs as bytea. The limits will be per partition.

If you want to use LOBs, there cannot be more than 2^32 per database.
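
For what it's worth, a minimal sketch of that per-partition scheme (table
and column names are hypothetical; on 9.x this is inheritance partitioning
with CHECK constraints, and inserts would additionally need a routing
trigger or application-side logic):

```sql
-- Parent table holds no data itself.
CREATE TABLE blob_store (
    id      bigint NOT NULL,
    payload bytea
);

-- Each child is a plain table with its own TOAST relation, so the
-- ~4 billion out-of-line-entry limit and the 32TB size limit apply
-- per child, not to the set as a whole.
CREATE TABLE blob_store_p0 (
    CHECK (id >= 0 AND id < 1000000000)
) INHERITS (blob_store);

CREATE TABLE blob_store_p1 (
    CHECK (id >= 1000000000 AND id < 2000000000)
) INHERITS (blob_store);

-- Further children can be added as the data grows; there is no fixed
-- limit on the number of child tables.
```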

With point two, does this mean that any table with a bytea datatype is limited to 4 billion rows
(which would seem in conflict with the "unlimited rows" shown by http://www.postgresql.org/about)?
If we had rows where the bytea was a "null" entry would they contribute towards this total or is it
4 billion non-null entries?

I think it is "4 billion rows that contain a column that is TOASTed".
NULLs won't contribute.

Yours,
Laurenz Albe


#5 Ivan Voras
ivoras@freebsd.org
In reply to: Noname (#1)
Re: CLOB & BLOB limitations in PostgreSQL

On 11/04/2014 16:45, Jack.O'Sullivan@tessella.com wrote:

With point two, does this mean that any table with a bytea datatype is
limited to 4 billion rows (which would seem in conflict with the
"unlimited rows" shown by http://www.postgresql.org/about)? If we had
rows where the bytea was a "null" entry would they contribute towards
this total or is it 4 billion non-null entries?

This seems strange. A core developer should confirm this but it doesn't
make much sense - "bytea" fields are stored the same as "text" fields
(including varchar etc), i.e. the "varlena" internal representation, so
having the limit you are talking about would mean that any non-trivial
table with long-ish text fields would be limited to 2^32 entries...

#6 Bruce Momjian
bruce@momjian.us
In reply to: Ivan Voras (#5)
Re: [GENERAL] CLOB & BLOB limitations in PostgreSQL


[ moved to hackers ]

Uh, I had not thought of this before but I think we need oids for toast
storage, which would explain this wiki text:

https://wiki.postgresql.org/wiki/BinaryFilesInDB

Storing binary data using bytea or text data types

Minus

bytea and text data type both use TOAST
limited to 1G per entry
--> 4 Billion entries per table

Is that correct?

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ Everyone has their own god. +

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#7 David G Johnston
david.g.johnston@gmail.com
In reply to: Bruce Momjian (#6)
Re: [GENERAL] CLOB & BLOB limitations in PostgreSQL


Reading only http://www.postgresql.org/docs/9.3/static/storage-toast.html
...

Since only actual out-of-line values require a chunk_id (an OID), the limit
caps out-of-line values rather than rows, so the number of main table rows
at which the pool runs out has a minimum but no maximum. That minimum would
appear to be "2^32 / {# of toast-able columns}": each table can have only
one "pg_class.reltoastrelid", so all toast-able columns on that table pull
from the same OID pool.
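
Under that reading, the arithmetic is simple enough to sketch (a
back-of-the-envelope illustration, nothing PostgreSQL-specific):

```python
# Worst case: every row stores an out-of-line value in every
# toast-able column, and each out-of-line value consumes one
# chunk_id from the table's single shared OID pool.
OID_POOL = 2**32

def min_rows_before_exhaustion(n_toastable_columns: int) -> int:
    """Smallest row count at which the toast OID pool could run out."""
    return OID_POOL // n_toastable_columns

# One toast-able column: the full 2^32.
print(min_rows_before_exhaustion(1))  # 4294967296
# Two toast-able columns, both toasted in every row: half that.
print(min_rows_before_exhaustion(2))  # 2147483648
```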

David J.

--


#8 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#6)
Re: [GENERAL] CLOB & BLOB limitations in PostgreSQL

Bruce Momjian <bruce@momjian.us> writes:

Uh, I had not thought of this before but I think we need oids for toast
storage, which would explain this wiki text:

https://wiki.postgresql.org/wiki/BinaryFilesInDB

Storing binary data using bytea or text data types

Minus

bytea and text data type both use TOAST
limited to 1G per entry
--> 4 Billion entries per table

Is that correct?

No. It'd be 4 billion toasted-out-of-line entries per table (actually,
you'd start hitting performance issues well below that, but 4G would be
the hard limit). Small values, up to probably a KB or so, don't count
against the limit.
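
For scale, a back-of-the-envelope check against the numbers in this thread
(assuming, purely for illustration, that the blob table accounts for most
of the ~20 TB database; all figures approximate):

```python
TB = 1024**4

current_size = 20 * TB           # whole database, mostly the blob table (assumption)
current_rows = 1_000_000_000     # ~1 billion rows today
table_cap    = 32 * TB           # per-table (or per-partition) size limit
oid_cap      = 4_000_000_000     # ~2^32 out-of-line entries per table

# Integer arithmetic: rows that fit under the 32TB cap at the
# current average row size.
rows_at_size_cap = table_cap * current_rows // current_size

print(rows_at_size_cap)            # 1600000000, i.e. ~1.6 billion rows
print(rows_at_size_cap < oid_cap)  # True: the size cap bites first
```

So at the current average row size, a single table of this shape would hit
the 32TB cap well before the 4-billion-entry cap, which argues for
partitioning either way.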

regards, tom lane
