Re: Postgres table size
Hi
Thanks. Ill post it on the mailing list when I get the results.
Im trying to calculate and see how the tablesize works for a simple
table.
I have a table with 10 cols
5 varchars _ it is declared as varchar(40) but contains data of length 3
5 numeric - declared as numeric(22,0) but contains data of precision 10
There are 10000 rows
select * from pg_relation_size gives 1548288 bytes as the table's size
select relpages from pg_class for that table gives 189 pages
Calculation
varchar = (overhead) 4 + (actual length of string) 3 = 7 bytes
*for 5 varchar
cols = 5*7 = 35 bytes
numeric
(according to manual--- The actual storage requirement is two bytes for
each group of four decimal digits, plus eight bytes overhead )
numeric = ( 10/4)*2 +8 = 13 bytes
*for 5 numeric cols = 13 *5 = 65
*row overhead = 32
So
Bytes per row = 35 + 65 +32 = 132 bytes
for 10000
rows = 1320000
The manual says ---The first 20 bytes of each page consists of a page header (PageHeaderData).
There are 189 pages
*page header cost = 20*189 = 3780 bytes
Therefore Total = 1320000 (row cost) + 3780 (page header cost)=1323780 bytes for this table
Is this calculation right? But the size of the table according to pg_relation_size is 1548288 bytes
What am I missing in my calculation?
Thanks for your help.
Thanks
sharmila
----- Original Message ----
From: Jeff Davis <pgsql@j-davis.com>
To: SHARMILA JOTHIRAJAH <sharmi_jo@yahoo.com>
Sent: Friday, November 16, 2007 2:12:46 PM
Subject: Re: [GENERAL] Postgres table size
On Fri, 2007-11-16 at 07:36 -0800, SHARMILA JOTHIRAJAH wrote:
Hi,
You were right. I installed beta2 and the table size now is
4682817536. ThanksHow do you estimate the table size generally., ie.,
what is the storage size of varchar, numeric and timestamp columns
and
What is the row overhead?
For example, If I have a table (in postgres8.2.3 version) with 10
rows
and 3 columns (varchar,numeric(22,0),timestamp) , how do I estimate
their storage size. How does 8.3Beta-2 handle this?
this will be very helpful for me for allocating the space properly
Thanks again for your help
sharmila
That's good news. Please post to pgsql-advocacy to show the reduction
in
table size that 8.3 brings for you, and the performance difference that
means for you. If it helps you to choose PostgreSQL instead of some
other database that would be great to mention too (however, some
databases don't like you to publish benchmarks, so be careful not to
violate your license).
The official docs are here:
http://developer.postgresql.org/pgdocs/postgres/storage.html (8.3)
http://www.postgresql.org/docs/8.2/static/storage.html (8.2)
The way I think about it is simple:
In 8.2:
* 32 bytes of row overhead: 28 byte row header + 4 byte line pointer
* 4 bytes of overhead for every variable-width type: to store length
In 8.3:
* 28 bytes of row overhead: 24 byte row header + 4 byte line pointer
* 1-4 bytes of overhead for every variable-width type: only one byte of
overhead if length < 127 bytes, up to 4 bytes if it is longer.
Regards,
Jeff Davis
____________________________________________________________________________________
Be a better pen pal.
Text or chat with friends inside Yahoo! Mail. See how. http://overview.mail.yahoo.com/
On Nov 21, 2007, at 9:17 AM, SHARMILA JOTHIRAJAH wrote:
Hi
Thanks. Ill post it on the mailing list when I get the results. Im
trying to calculate and see how the tablesize works for a simple
table.I have a table with 10 cols
5 varchars _ it is declared as varchar(40) but contains data of
length 3
5 numeric - declared as numeric(22,0) but contains data of
precision 10
There are 10000 rowsselect * from pg_relation_size gives 1548288 bytes as the table's size
select relpages from pg_class for that table gives 189 pagesCalculation
varchar = (overhead) 4 + (actual length of string) 3 = 7 bytes
*for 5 varchar cols =
5*7 = 35 bytesnumeric (according to manual--- The actual storage requirement is
two bytes for each group of four decimal digits, plus eight bytes
overhead )
numeric = ( 10/4)*2 +8 = 13 bytes
*for 5 numeric cols = 13 *5 = 65
Not that it will make much difference, but you need to round up in
the 10/4 part so you get 14 bytes, not 13.
*row overhead = 32
So
Bytes per row = 35 + 65 +32 = 132 bytes
for 10000 rows = 1320000The manual says ---The first 20 bytes of each page consists of a
page header (PageHeaderData).
There are 189 pages
*page header cost = 20*189 = 3780 bytesTherefore Total = 1320000 (row cost) + 3780 (page header cost)
=1323780 bytes for this tableIs this calculation right? But the size of the table according to
pg_relation_size is 1548288 bytesWhat am I missing in my calculation?
One thing to note is that the relpages value is only exact from the
time of a vacuum until the next dml statement on the table, i.e.
relpages is not updated after inserts, update, and deletes, just
after vacuums.
Erik Jones
Software Developer | Emma®
erik@myemma.com
800.595.4401 or 615.292.5888
615.292.0777 (fax)
Emma helps organizations everywhere communicate & market in style.
Visit us online at http://www.myemma.com
I vacuumed the table before I got this relpages count. When i make
the
change tat you mentioned the total table size as per my calculation is
1373780 and the result thro pg_relation_size is 1548288.
What other overheads are there for the table? Im not sure how to find
the free space? But I didnot delete/insert/update any rows after the
first insertion.
Thanks
sharmila
----- Original Message ----
From: Erik Jones <erik@myemma.com>
To: SHARMILA JOTHIRAJAH <sharmi_jo@yahoo.com>
Cc: pgsql-general@postgresql.org
Sent: Wednesday, November 21, 2007 11:38:44 AM
Subject: Re: [GENERAL] Postgres table size
On Nov 21, 2007, at 9:17 AM, SHARMILA JOTHIRAJAH wrote:
Hi
Thanks. Ill post it on the mailing list when I get the results. Im
trying to calculate and see how the tablesize works for a simple
table.I have a table with 10 cols
5 varchars _ it is declared as varchar(40) but contains data of
length 3
5 numeric - declared as numeric(22,0) but contains data of
precision 10
There are 10000 rowsselect * from pg_relation_size gives 1548288 bytes as the table's
size
select relpages from pg_class for that table gives 189 pages
Calculation
varchar = (overhead) 4 + (actual length of string) 3 = 7 bytes
*for 5 varchar cols =
5*7 = 35 bytesnumeric (according to manual--- The actual storage requirement is
two bytes for each group of four decimal digits, plus eight bytes
overhead )
numeric = ( 10/4)*2 +8 = 13 bytes
*for 5 numeric cols = 13 *5 = 65
Not that it will make much difference, but you need to round up in
the 10/4 part so you get 14 bytes, not 13.
*row overhead = 32
So
Bytes per row = 35 + 65 +32 = 132 bytes
for 10000 rows = 1320000The manual says ---The first 20 bytes of each page consists of a
page header (PageHeaderData).
There are 189 pages
*page header cost = 20*189 = 3780 bytesTherefore Total = 1320000 (row cost) + 3780 (page header cost)
=1323780 bytes for this tableIs this calculation right? But the size of the table according to
pg_relation_size is 1548288 bytesWhat am I missing in my calculation?
One thing to note is that the relpages value is only exact from the
time of a vacuum until the next dml statement on the table, i.e.
relpages is not updated after inserts, update, and deletes, just
after vacuums.
Erik Jones
Software Developer | Emma®
erik@myemma.com
800.595.4401 or 615.292.5888
615.292.0777 (fax)
Emma helps organizations everywhere communicate & market in style.
Visit us online at http://www.myemma.com
---------------------------(end of
broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster
____________________________________________________________________________________
Be a better sports nut! Let your teams follow you
with Yahoo Mobile. Try it now. http://mobile.yahoo.com/sports;_ylt=At9_qDKvtAbMuh1G1SQtBI7ntAcJ
Import Notes
Resolved by subject fallback
Erik Jones <erik@myemma.com> writes:
On Nov 21, 2007, at 9:17 AM, SHARMILA JOTHIRAJAH wrote:
Calculation
varchar = (overhead) 4 + (actual length of string) 3 = 7 bytes
*for 5 varchar cols =
5*7 = 35 bytesnumeric (according to manual--- The actual storage requirement is
two bytes for each group of four decimal digits, plus eight bytes
overhead )
numeric = ( 10/4)*2 +8 = 13 bytes
*for 5 numeric cols = 13 *5 = 65
Not that it will make much difference, but you need to round up in
the 10/4 part so you get 14 bytes, not 13.
Also, this calculation is ignoring the fact that (pre-8.3) varlena
values have to be int-aligned, so there's wasted pad space too.
The varchar values really need 8 bytes each, and the numeric values
16, so the actual data payload in each row is 120 bytes. Then add
HeapTupleHeader (28 bytes), and then MAXALIGN the whole row size.
So the rowsize would be either 148 or 152 bytes depending on if you
were on a machine with 8-byte MAXALIGN. Then add the per-row item
pointer, giving total per-row space of 152 or 156 bytes. That
means you can fit either 53 or 52 rows per page, giving either 188
or 192 pages as the minimum possible file size. Evidently, Sharmila
is using a MAXALIGN=4 machine and has a few dead rows in there.
regards, tom lane
Calculation
varchar = (overhead) 4 + (actual length of string) 3 = 7 bytes
*for 5 varchar cols =
5*7 = 35 bytesnumeric (according to manual--- The actual storage requirement is
two bytes for each group of four decimal digits, plus eight bytes
overhead )
numeric = ( 10/4)*2 +8 = 13 bytes
*for 5 numeric cols = 13 *5 = 65
Not that it will make much difference, but you need to round up in
the 10/4 part so you get 14 bytes, not 13.
Also, this calculation is ignoring the fact that (pre-8.3) varlena
values have to be int-aligned, so there's wasted pad space too.
The varchar values really need 8 bytes each, and the numeric values
16, so the actual data payload in each row is 120 bytes. Then add
HeapTupleHeader (28 bytes), and then MAXALIGN the whole row size.
So the rowsize would be either 148 or 152 bytes depending on if you
were on a machine with 8-byte MAXALIGN. Then add the per-row item
pointer, giving total per-row space of 152 or 156 bytes. That
means you can fit either 53 or 52 rows per page, giving either 188
or 192 pages as the minimum possible file size. Evidently, Sharmila
is using a MAXALIGN=4 machine and has a few dead rows in there.
Thanks for the detailed explanation Tom. You are rigt...The number of pages for this tables as per pg_class is 189.
I have a few questions
1. How do you find the MAXALIGN of the machine? And what is that used for?
2. How does null columns account for this space. For example, if I have the same table (10 cols --5 varchars and 5 numerics) in which 8 cols are null and 2 cols(1 varchar and 1 num) are not null, how is the storage space affected for this case?
Thanks again
sharmila
____________________________________________________________________________________
Get easy, one-click access to your favorites.
Make Yahoo! your homepage.
http://www.yahoo.com/r/hs
Import Notes
Resolved by subject fallback
SHARMILA JOTHIRAJAH <sharmi_jo@yahoo.com> writes:
1. How do you find the MAXALIGN of the machine? And what is that used for?
pg_controldata will show "maximum data alignment". A rule of thumb is
that it's 4 on 32-bit machines and 8 on 64-bit machines, but there are
exceptions.
2. How does null columns account for this space.
If there are any nulls in a row then you pay for a null bitmap with 1
bit/column, but the null columns themselves aren't stored and hence
take zero space. In your example the bitmap needs 10 bits, but after
allowing for alignment the effect is that the heap tuple header gets
4 bytes bigger if there's any nulls.
regards, tom lane
Thanks Tom and Erik and all the others who helped.
You guys really rock!!!
Sharmila
----- Original Message ----
From: Tom Lane <tgl@sss.pgh.pa.us>
To: SHARMILA JOTHIRAJAH <sharmi_jo@yahoo.com>
Cc: Erik Jones <erik@myemma.com>; pgsql-general@postgresql.org
Sent: Wednesday, November 21, 2007 1:14:02 PM
Subject: Re: [GENERAL] Postgres table size
SHARMILA JOTHIRAJAH <sharmi_jo@yahoo.com> writes:
1. How do you find the MAXALIGN of the machine? And what is that
used for?
pg_controldata will show "maximum data alignment". A rule of thumb is
that it's 4 on 32-bit machines and 8 on 64-bit machines, but there are
exceptions.
2. How does null columns account for this space.
If there are any nulls in a row then you pay for a null bitmap with 1
bit/column, but the null columns themselves aren't stored and hence
take zero space. In your example the bitmap needs 10 bits, but after
allowing for alignment the effect is that the heap tuple header gets
4 bytes bigger if there's any nulls.
regards, tom lane
____________________________________________________________________________________
Never miss a thing. Make Yahoo your home page.
http://www.yahoo.com/r/hs
Import Notes
Resolved by subject fallback
On Wed, Nov 21, 2007 at 09:51:22AM -0800, SHARMILA JOTHIRAJAH wrote:
Thanks for the detailed explanation Tom. You are rigt...The number of pages for this tables as per pg_class is 189.
I have a few questions
1. How do you find the MAXALIGN of the machine? And what is that used for?
MAXALIGN is the preferred alignment for structure objects, iirc. It's
generally less efficient to access unaligned objects than aligned ones,
though this is architechure specific. You should be able to find it in
your pg_config.h under MAXIMUM_ALIGNOF,.
2. How does null columns account for this space. For example, if I
have the same table (10 cols --5 varchars and 5 numerics) in which 8
cols are null and 2 cols(1 varchar and 1 num) are not null, how is
the storage space affected for this case?
The cost is typically one bit per column in the table, unless there are
no NULLs in which case it costs nothing. 10 cols = 2 bytes, which may
or may not be swallowed by alignment.
Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
Show quoted text
Those who make peaceful revolution impossible will make violent revolution inevitable.
-- John F Kennedy