Doubt with physical storage being used by postgres when storing LOBs

Started by Víctor Cosqui · over 12 years ago · 3 messages · general
#1 Víctor Cosqui
victor.cosqui@gmail.com

Hi all

I am developing an application which uses PostgreSQL 9.2 to store binaries as
large objects referenced by oid.

CREATE TABLE content (contentname text, contentoid oid);

I am running some tests to estimate how much disk space I will need to
store these objects.

To measure the space used by PostgreSQL I have used two different methods,
both with the same results:

1.- Checking physical disk usage by running "sudo du -sb
/opt/PostgreSQL/9.2/data/base/" before and after inserting the data

2.- Asking PostgreSQL directly for the table size: "select
pg_size_pretty(pg_relation_size('pg_largeobject'))"
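For comparison, pg_total_relation_size also counts a table's indexes and
TOAST data, so it should capture the index mentioned below as well:

```sql
-- Table data only (main fork):
select pg_size_pretty(pg_relation_size('pg_largeobject'));

-- Table plus its indexes (including pg_largeobject_loid_pn_index) and TOAST:
select pg_size_pretty(pg_total_relation_size('pg_largeobject'));
```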

I have tested with different binaries and I am getting different results.
For example, when I store the content of a 17MB zipped file, disk usage
increases by 24MB. The reason for this increase seems to be an index
created on the "pg_largeobject" table, namely
"pg_largeobject_loid_pn_index".

On the other hand, when I store, say, 17MB of zeroes, the increase in
disk usage is much smaller.

I think this is because TOAST compresses the stored content: it can
compress the zeroes but not the already-compressed zip content.

My question is: Is this increase of ~40% normal? Has someone else
experienced this?

Thanks in advance for your help!

Víctor Cosqui

#2 Merlin Moncure
mmoncure@gmail.com
In reply to: Víctor Cosqui (#1)
Re: Doubt with physical storage being used by postgres when storing LOBs


TOAST will compress data if it thinks it can (you can disable this
behavior and arguably should if your data is pre-compressed). 40% for
the index seems high but it may be accurate. Personally, I prefer
bytea storage to LOB although LOB is a little bit faster.
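To sketch that suggestion (the table and column names here are
hypothetical, not from the original schema): setting a bytea column's
storage to EXTERNAL keeps out-of-line TOAST storage but skips the
compression attempt, which is wasted work on pre-compressed data.

```sql
-- Hypothetical variant of the schema using bytea instead of a large object oid:
create table content_bytea (contentname text, data bytea);

-- EXTERNAL = out-of-line TOAST storage, no compression; appropriate when
-- the data is already compressed (e.g. zip files):
alter table content_bytea alter column data set storage external;
```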

merlin

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#3 Víctor Cosqui
victor.cosqui@gmail.com
In reply to: Merlin Moncure (#2)
Re: Doubt with physical storage being used by postgres when storing LOBs

Hello again,

I have dug a bit more into this and found that the increase in table size
is not caused by the index, but by the way the data is laid out on disk.
As explained in the documentation of pg_largeobject (
http://www.postgresql.org/docs/9.2/static/catalog-pg-largeobject.html):

"Each row of pg_largeobject holds data for one page of a large object,
beginning at byte offset (pageno * LOBLKSIZE) within the object. The
implementation allows sparse storage: pages might be missing, and might be
shorter than LOBLKSIZE bytes even if they are not the last page of the
object. Missing regions within a large object read as zeroes."

I confirmed this by reading the on-disk file backing the table: when I
store a file whose content is like

[aaa…abbb…bccc…c]

what I actually read from the file on disk is like

[aaa…000bbb…000ccc…0000]

So pages are not completely filled with object data; they are padded with
zeroes up to LOBLKSIZE.

I have also played with different LOBLKSIZE values to see what happens
with smaller and bigger settings, and the conclusion is that the bigger
LOBLKSIZE is, the more space is wasted on zero padding.
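The per-page fill can also be inspected directly from SQL (12345 below is
a placeholder for a real large object OID; reading pg_largeobject may
require superuser privileges):

```sql
-- How many bytes each pg_largeobject page actually holds; pages shorter
-- than LOBLKSIZE (2048 by default) are padded with zeroes on disk:
select pageno, length(data) as bytes_used
from pg_largeobject
where loid = 12345  -- placeholder: substitute an actual large object OID
order by pageno;
```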

This is what happens when you store a 17MB compressed file under
different LOBLKSIZE settings (the default LOBLKSIZE is 2KB):

----------------------------------+-------+-------+-------
LOBLKSIZE [B]                     |   512 |  2048 |  8000
----------------------------------+-------+-------+-------
Rows in pg_largeobject table      | 34000 |  8712 |  2231
----------------------------------+-------+-------+-------
Size of pg_largeobject table [MB] |    21 |    24 |    28
----------------------------------+-------+-------+-------
Overhead [%]                      | 23.53 | 41.18 | 64.71
----------------------------------+-------+-------+-------
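As a sanity check on the row counts above, the expected number of pages
for a 17MB object is simply its size divided by LOBLKSIZE:

```sql
-- Expected pg_largeobject rows for a 17MB object at the default LOBLKSIZE:
select ceil(17 * 1024 * 1024 / 2048.0) as expected_pages;
-- 8704, close to the 8712 rows observed above
```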

IMHO the overhead added by these zeroes is too high.

Thanks for your support and regards!
