Re: Postgres table size

Started by SHARMILA JOTHIRAJAHover 18 years ago8 messagesgeneral
Jump to latest
#1SHARMILA JOTHIRAJAH
sharmi_jo@yahoo.com

Hi
Thanks. Ill post it on the mailing list when I get the results.
Im trying to calculate and see how the tablesize works for a simple
table.

I have a table with 10 cols
5 varchars _ it is declared as varchar(40) but contains data of length 3
5 numeric - declared as numeric(22,0) but contains data of precision 10
There are 10000 rows

select * from pg_relation_size gives 1548288 bytes as the table's size
select relpages from pg_class for that table gives 189 pages

Calculation
varchar = (overhead) 4 + (actual length of string) 3 = 7 bytes
*for 5 varchar
cols = 5*7 = 35 bytes

numeric
(according to manual--- The actual storage requirement is two bytes for
each group of four decimal digits, plus eight bytes overhead )
numeric = ( 10/4)*2 +8 = 13 bytes
*for 5 numeric cols = 13 *5 = 65

*row overhead = 32

So
Bytes per row = 35 + 65 +32 = 132 bytes
for 10000
rows = 1320000

The manual says ---The first 20 bytes of each page consists of a page header (PageHeaderData).
There are 189 pages
*page header cost = 20*189 = 3780 bytes

Therefore Total = 1320000 (row cost) + 3780 (page header cost)=1323780 bytes for this table

Is this calculation right? But the size of the table according to pg_relation_size is 1548288 bytes

What am I missing in my calculation?

Thanks for your help.

Thanks
sharmila

----- Original Message ----
From: Jeff Davis <pgsql@j-davis.com>
To: SHARMILA JOTHIRAJAH <sharmi_jo@yahoo.com>
Sent: Friday, November 16, 2007 2:12:46 PM
Subject: Re: [GENERAL] Postgres table size

On Fri, 2007-11-16 at 07:36 -0800, SHARMILA JOTHIRAJAH wrote:

Hi,

You were right. I installed beta2 and the table size now is
4682817536. Thanks

How do you estimate the table size generally., ie.,
what is the storage size of varchar, numeric and timestamp columns

and

What is the row overhead?

For example, If I have a table (in postgres8.2.3 version) with 10

rows

and 3 columns (varchar,numeric(22,0),timestamp) , how do I estimate
their storage size. How does 8.3Beta-2 handle this?
this will be very helpful for me for allocating the space properly
Thanks again for your help
sharmila

That's good news. Please post to pgsql-advocacy to show the reduction
in
table size that 8.3 brings for you, and the performance difference that
means for you. If it helps you to choose PostgreSQL instead of some
other database that would be great to mention too (however, some
databases don't like you to publish benchmarks, so be careful not to
violate your license).

The official docs are here:
http://developer.postgresql.org/pgdocs/postgres/storage.html (8.3)
http://www.postgresql.org/docs/8.2/static/storage.html (8.2)

The way I think about it is simple:

In 8.2:
* 32 bytes of row overhead: 28 byte row header + 4 byte line pointer
* 4 bytes of overhead for every variable-width type: to store length

In 8.3:
* 28 bytes of row overhead: 24 byte row header + 4 byte line pointer
* 1-4 bytes of overhead for every variable-width type: only one byte of
overhead if length < 127 bytes, up to 4 bytes if it is longer.

Regards,
Jeff Davis

____________________________________________________________________________________
Be a better pen pal.
Text or chat with friends inside Yahoo! Mail. See how. http://overview.mail.yahoo.com/

#2Erik Jones
erik@myemma.com
In reply to: SHARMILA JOTHIRAJAH (#1)

On Nov 21, 2007, at 9:17 AM, SHARMILA JOTHIRAJAH wrote:

Hi
Thanks. Ill post it on the mailing list when I get the results. Im
trying to calculate and see how the tablesize works for a simple
table.

I have a table with 10 cols
5 varchars _ it is declared as varchar(40) but contains data of
length 3
5 numeric - declared as numeric(22,0) but contains data of
precision 10
There are 10000 rows

select * from pg_relation_size gives 1548288 bytes as the table's size
select relpages from pg_class for that table gives 189 pages

Calculation
varchar = (overhead) 4 + (actual length of string) 3 = 7 bytes
*for 5 varchar cols =
5*7 = 35 bytes

numeric (according to manual--- The actual storage requirement is
two bytes for each group of four decimal digits, plus eight bytes
overhead )
numeric = ( 10/4)*2 +8 = 13 bytes
*for 5 numeric cols = 13 *5 = 65

Not that it will make much difference, but you need to round up in
the 10/4 part so you get 14 bytes, not 13.

*row overhead = 32

So
Bytes per row = 35 + 65 +32 = 132 bytes
for 10000 rows = 1320000

The manual says ---The first 20 bytes of each page consists of a
page header (PageHeaderData).
There are 189 pages
*page header cost = 20*189 = 3780 bytes

Therefore Total = 1320000 (row cost) + 3780 (page header cost)
=1323780 bytes for this table

Is this calculation right? But the size of the table according to
pg_relation_size is 1548288 bytes

What am I missing in my calculation?

One thing to note is that the relpages value is only exact from the
time of a vacuum until the next dml statement on the table, i.e.
relpages is not updated after inserts, update, and deletes, just
after vacuums.

Erik Jones

Software Developer | Emma®
erik@myemma.com
800.595.4401 or 615.292.5888
615.292.0777 (fax)

Emma helps organizations everywhere communicate & market in style.
Visit us online at http://www.myemma.com

#3SHARMILA JOTHIRAJAH
sharmi_jo@yahoo.com
In reply to: Erik Jones (#2)

I vacuumed the table before I got this relpages count. When i make
the
change tat you mentioned the total table size as per my calculation is
1373780 and the result thro pg_relation_size is 1548288.

What other overheads are there for the table? Im not sure how to find
the free space? But I didnot delete/insert/update any rows after the
first insertion.

Thanks

sharmila

----- Original Message ----
From: Erik Jones <erik@myemma.com>
To: SHARMILA JOTHIRAJAH <sharmi_jo@yahoo.com>
Cc: pgsql-general@postgresql.org
Sent: Wednesday, November 21, 2007 11:38:44 AM
Subject: Re: [GENERAL] Postgres table size

On Nov 21, 2007, at 9:17 AM, SHARMILA JOTHIRAJAH wrote:

Hi
Thanks. Ill post it on the mailing list when I get the results. Im
trying to calculate and see how the tablesize works for a simple
table.

I have a table with 10 cols
5 varchars _ it is declared as varchar(40) but contains data of
length 3
5 numeric - declared as numeric(22,0) but contains data of
precision 10
There are 10000 rows

select * from pg_relation_size gives 1548288 bytes as the table's

size

select relpages from pg_class for that table gives 189 pages

Calculation
varchar = (overhead) 4 + (actual length of string) 3 = 7 bytes
*for 5 varchar cols =
5*7 = 35 bytes

numeric (according to manual--- The actual storage requirement is
two bytes for each group of four decimal digits, plus eight bytes
overhead )
numeric = ( 10/4)*2 +8 = 13 bytes
*for 5 numeric cols = 13 *5 = 65

Not that it will make much difference, but you need to round up in
the 10/4 part so you get 14 bytes, not 13.

*row overhead = 32

So
Bytes per row = 35 + 65 +32 = 132 bytes
for 10000 rows = 1320000

The manual says ---The first 20 bytes of each page consists of a
page header (PageHeaderData).
There are 189 pages
*page header cost = 20*189 = 3780 bytes

Therefore Total = 1320000 (row cost) + 3780 (page header cost)
=1323780 bytes for this table

Is this calculation right? But the size of the table according to
pg_relation_size is 1548288 bytes

What am I missing in my calculation?

One thing to note is that the relpages value is only exact from the
time of a vacuum until the next dml statement on the table, i.e.
relpages is not updated after inserts, update, and deletes, just
after vacuums.

Erik Jones

Software Developer | Emma®
erik@myemma.com
800.595.4401 or 615.292.5888
615.292.0777 (fax)

Emma helps organizations everywhere communicate & market in style.
Visit us online at http://www.myemma.com

---------------------------(end of
broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster

____________________________________________________________________________________
Be a better sports nut! Let your teams follow you
with Yahoo Mobile. Try it now. http://mobile.yahoo.com/sports;_ylt=At9_qDKvtAbMuh1G1SQtBI7ntAcJ

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Erik Jones (#2)

Erik Jones <erik@myemma.com> writes:

On Nov 21, 2007, at 9:17 AM, SHARMILA JOTHIRAJAH wrote:

Calculation
varchar = (overhead) 4 + (actual length of string) 3 = 7 bytes
*for 5 varchar cols =
5*7 = 35 bytes

numeric (according to manual--- The actual storage requirement is
two bytes for each group of four decimal digits, plus eight bytes
overhead )
numeric = ( 10/4)*2 +8 = 13 bytes
*for 5 numeric cols = 13 *5 = 65

Not that it will make much difference, but you need to round up in
the 10/4 part so you get 14 bytes, not 13.

Also, this calculation is ignoring the fact that (pre-8.3) varlena
values have to be int-aligned, so there's wasted pad space too.
The varchar values really need 8 bytes each, and the numeric values
16, so the actual data payload in each row is 120 bytes. Then add
HeapTupleHeader (28 bytes), and then MAXALIGN the whole row size.
So the rowsize would be either 148 or 152 bytes depending on if you
were on a machine with 8-byte MAXALIGN. Then add the per-row item
pointer, giving total per-row space of 152 or 156 bytes. That
means you can fit either 53 or 52 rows per page, giving either 188
or 192 pages as the minimum possible file size. Evidently, Sharmila
is using a MAXALIGN=4 machine and has a few dead rows in there.

regards, tom lane

#5SHARMILA JOTHIRAJAH
sharmi_jo@yahoo.com
In reply to: Tom Lane (#4)

Calculation
varchar = (overhead) 4 + (actual length of string) 3 = 7 bytes
*for 5 varchar cols =
5*7 = 35 bytes

numeric (according to manual--- The actual storage requirement is

two bytes for each group of four decimal digits, plus eight bytes
overhead )
numeric = ( 10/4)*2 +8 = 13 bytes
*for 5 numeric cols = 13 *5 = 65

Not that it will make much difference, but you need to round up in
the 10/4 part so you get 14 bytes, not 13.

Also, this calculation is ignoring the fact that (pre-8.3) varlena
values have to be int-aligned, so there's wasted pad space too.
The varchar values really need 8 bytes each, and the numeric values
16, so the actual data payload in each row is 120 bytes. Then add
HeapTupleHeader (28 bytes), and then MAXALIGN the whole row size.
So the rowsize would be either 148 or 152 bytes depending on if you
were on a machine with 8-byte MAXALIGN. Then add the per-row item
pointer, giving total per-row space of 152 or 156 bytes. That
means you can fit either 53 or 52 rows per page, giving either 188
or 192 pages as the minimum possible file size. Evidently, Sharmila
is using a MAXALIGN=4 machine and has a few dead rows in there.

Thanks for the detailed explanation Tom. You are rigt...The number of pages for this tables as per pg_class is 189.

I have a few questions
1. How do you find the MAXALIGN of the machine? And what is that used for?
2. How does null columns account for this space. For example, if I have the same table (10 cols --5 varchars and 5 numerics) in which 8 cols are null and 2 cols(1 varchar and 1 num) are not null, how is the storage space affected for this case?

Thanks again
sharmila

____________________________________________________________________________________
Get easy, one-click access to your favorites.
Make Yahoo! your homepage.
http://www.yahoo.com/r/hs

#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: SHARMILA JOTHIRAJAH (#5)

SHARMILA JOTHIRAJAH <sharmi_jo@yahoo.com> writes:

1. How do you find the MAXALIGN of the machine? And what is that used for?

pg_controldata will show "maximum data alignment". A rule of thumb is
that it's 4 on 32-bit machines and 8 on 64-bit machines, but there are
exceptions.

2. How does null columns account for this space.

If there are any nulls in a row then you pay for a null bitmap with 1
bit/column, but the null columns themselves aren't stored and hence
take zero space. In your example the bitmap needs 10 bits, but after
allowing for alignment the effect is that the heap tuple header gets
4 bytes bigger if there's any nulls.

regards, tom lane

#7SHARMILA JOTHIRAJAH
sharmi_jo@yahoo.com
In reply to: Tom Lane (#6)

Thanks Tom and Erik and all the others who helped.
You guys really rock!!!
Sharmila

----- Original Message ----
From: Tom Lane <tgl@sss.pgh.pa.us>
To: SHARMILA JOTHIRAJAH <sharmi_jo@yahoo.com>
Cc: Erik Jones <erik@myemma.com>; pgsql-general@postgresql.org
Sent: Wednesday, November 21, 2007 1:14:02 PM
Subject: Re: [GENERAL] Postgres table size

SHARMILA JOTHIRAJAH <sharmi_jo@yahoo.com> writes:

1. How do you find the MAXALIGN of the machine? And what is that

used for?

pg_controldata will show "maximum data alignment". A rule of thumb is
that it's 4 on 32-bit machines and 8 on 64-bit machines, but there are
exceptions.

2. How does null columns account for this space.

If there are any nulls in a row then you pay for a null bitmap with 1
bit/column, but the null columns themselves aren't stored and hence
take zero space. In your example the bitmap needs 10 bits, but after
allowing for alignment the effect is that the heap tuple header gets
4 bytes bigger if there's any nulls.

regards, tom lane

____________________________________________________________________________________
Never miss a thing. Make Yahoo your home page.
http://www.yahoo.com/r/hs

#8Martijn van Oosterhout
kleptog@svana.org
In reply to: SHARMILA JOTHIRAJAH (#5)

On Wed, Nov 21, 2007 at 09:51:22AM -0800, SHARMILA JOTHIRAJAH wrote:

Thanks for the detailed explanation Tom. You are rigt...The number of pages for this tables as per pg_class is 189.

I have a few questions
1. How do you find the MAXALIGN of the machine? And what is that used for?

MAXALIGN is the preferred alignment for structure objects, iirc. It's
generally less efficient to access unaligned objects than aligned ones,
though this is architechure specific. You should be able to find it in
your pg_config.h under MAXIMUM_ALIGNOF,.

2. How does null columns account for this space. For example, if I
have the same table (10 cols --5 varchars and 5 numerics) in which 8
cols are null and 2 cols(1 varchar and 1 num) are not null, how is
the storage space affected for this case?

The cost is typically one bit per column in the table, unless there are
no NULLs in which case it costs nothing. 10 cols = 2 bytes, which may
or may not be swallowed by alignment.

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/

Show quoted text

Those who make peaceful revolution impossible will make violent revolution inevitable.
-- John F Kennedy