size of bytea + performance issues

Started by tschakabout 20 years ago6 messagesgeneral
Jump to latest
#1tschak
jochen.schlosser@gmail.com

Hi everyone,

my first question concerns the the size of a table with a bytea row. In
the documentation it says something like 4 Bytes + 1 Byte for each
escaped octet sequence per row. For example an insertion into a table
storing just one column with bytea data looks like this:
insert into test values (''\\003''); ---- this allows 185 inserts per
page until a new one is needed
insert into test values (''\\003\\123\\123\\111''); ---- this yields
exactly the same nr. of rows per page!
insert into test values (''\\003\\123\\123\\111\\001''); ---- this one
finally needs more pages!

How can that be, if the system internally allocates ONE BYTE per
octet...
Does it acually take 4 Bytes?

My second question is more generall:
My dbms (acutally it is supposed to be a decision support system, so I
do not really need rollbacks and transactions etc. --- can those
features be turned off to enhance performance?) needs to store 1
Billion rows in a single table (I know that I could use horizontal
partitioning) and I wonder if postgres is powerfull enough to handle
such large tables? If anyone has experience with tables this size and
could give me a hint which system to use (DB2, Sybase, Oracle,
Informix, Postgres) this would be great...

THX

tschak

#2Martijn van Oosterhout
kleptog@svana.org
In reply to: tschak (#1)
Re: size of bytea + performance issues

On Tue, Jan 31, 2006 at 07:58:30AM -0800, tschak wrote:

Hi everyone,

my first question concerns the the size of a table with a bytea row. In
the documentation it says something like 4 Bytes + 1 Byte for each
escaped octet sequence per row. For example an insertion into a table
storing just one column with bytea data looks like this:
insert into test values (''\\003''); ---- this allows 185 inserts per
page until a new one is needed
insert into test values (''\\003\\123\\123\\111''); ---- this yields
exactly the same nr. of rows per page!
insert into test values (''\\003\\123\\123\\111\\001''); ---- this one
finally needs more pages!

How can that be, if the system internally allocates ONE BYTE per
octet...
Does it acually take 4 Bytes?

Alignment. A new row needs to start on a multiple of 4 (or 8) boundary.
So even though your data value might take 5 bytes, you may end up with
some slack before the next tuple.

My second question is more generall:
My dbms (acutally it is supposed to be a decision support system, so I
do not really need rollbacks and transactions etc. --- can those
features be turned off to enhance performance?) needs to store 1

Well, you might need to take this up on the -performance list, but
there are people running databases that large. In general such features
can't be turned off, though they don't really cost much on data that
doesn't change.

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/

Show quoted text

Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
tool for doing 5% of the work and then sitting around waiting for someone
else to do the other 95% so you can sue them.

#3tschak
jochen.schlosser@gmail.com
In reply to: Martijn van Oosterhout (#2)
Re: size of bytea + performance issues

Allright... do you mean an alignment with an offset of 4/8 bit or byte?
If it is just bit I cannot really follow the calculation...
Nevertheless it sounds like an explanation for this "effect".

I have one more question concerning size. The following table
create table test(val0 unsigned smallint,
... ... ... .. ..., val7 unsigned smallint,
bulk bytea);

needs app. 100 Bytes per row even though the user data only uses
8*2Bytes (vali) + 40Bytes (the sizye of my bytea). Does that mean, that
the system needs 46 bytes for internal represantationsor is the
alignment factor a again a source of this "blowup"?

Thanks for your help,
Tschak

#4Martijn van Oosterhout
kleptog@svana.org
In reply to: tschak (#3)
Re: size of bytea + performance issues

On Tue, Jan 31, 2006 at 09:15:18AM -0800, tschak wrote:

Allright... do you mean an alignment with an offset of 4/8 bit or byte?
If it is just bit I cannot really follow the calculation...
Nevertheless it sounds like an explanation for this "effect".

I have one more question concerning size. The following table
create table test(val0 unsigned smallint,
... ... ... .. ..., val7 unsigned smallint,
bulk bytea);

needs app. 100 Bytes per row even though the user data only uses
8*2Bytes (vali) + 40Bytes (the sizye of my bytea). Does that mean, that
the system needs 46 bytes for internal represantationsor is the
alignment factor a again a source of this "blowup"?

Well, it's in the FAQ under "How much database disk space is required
to store data from a typical text file?" but the per tuple overhead is
between 36 and 44 bytes. depends a bit on the version.

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/

Show quoted text

Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
tool for doing 5% of the work and then sitting around waiting for someone
else to do the other 95% so you can sue them.

#5tschak
jochen.schlosser@gmail.com
In reply to: Martijn van Oosterhout (#4)
Re: size of bytea + performance issues

Well, it's in the FAQ under "How much database disk space is required
to store data from a typical text file?" but the per tuple overhead is
between 36 and 44 bytes. depends a bit on the version.

allright... thx a lot!
I did not see this point in the FAQ because I am not storing data from
a textfile and did not look in that direction... What store is
basically a encoding scheme of several hash bins which store
distances. Thus it is just a bitvector and to the best of my knowledge
a bytea or varyiing bitarray is the most space efficient method to
store my information(???).

Thanks again,

Tschak

#6Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: tschak (#5)
Re: size of bytea + performance issues

On Tue, Jan 31, 2006 at 08:15:44PM +0100, Jochen Schlosser wrote:

Well, it's in the FAQ under "How much database disk space is required
to store data from a typical text file?" but the per tuple overhead is
between 36 and 44 bytes. depends a bit on the version.

allright... thx a lot!
I did not see this point in the FAQ because I am not storing data from
a textfile and did not look in that direction... What store is
basically a encoding scheme of several hash bins which store
distances. Thus it is just a bitvector and to the best of my knowledge
a bytea or varyiing bitarray is the most space efficient method to
store my information(???).

It is if it's truely variable in length. If you know it's limited to say
4 bytes, you'd probably be better off with an int4, which doesn't have
the varlena overhead (4 bytes) that a bytea does.
--
Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461