database size
Hi,
I created a table with two columns of type int, and loaded about 300 K records
in it. So, the total size of the table is approx. that of 600 K integers,
roughly 2.4 MB.
But, the file corresponding to the table in pgsql/data/base directory
has a size of 19 MB. I was wondering if I have done something wrong in
the installation or usage, or is it the normal behavior ?
Also, I was trying to execute the query:
select item as item, count(*) as cnt into table C_temp
from temp group by item;
Here, temp is the name of the table which contains the data and item is an
integer attribute. While doing the sort for the group by, the size of one of
the temporary pg_psort relation grows to about 314 MB. The size of the temp
table is as mentioned above. If someone tried similar queries, could you
please tell me if this is normal.
The above query did not finish even after 2 hours. I am executing it on a
Sun Sparc 5 running Sun OS 5.5.
Thanks
--shiby
I created a table with two columns of type int, and loaded about 300 K records
in it. So, the total size of the table is approx. that of 600 K integers,
roughly 2.4 MB.
But, the file corresponding to the table in pgsql/data/base directory
has a size of 19 MB. I was wondering if I have done something wrong in
the installation or usage, or is it the normal behavior ?
48 bytes + each row header (on my aix box..._your_ mileage may vary)
8 bytes + two int fields @ 4 bytes each
4 bytes + pointer on page to tuple
-------- =
60 bytes per tuple
8192 / 60 gives 136 tuples per page.
300000 / 136 ... round up ... need 2206 pages which gives us ...
2206 * 8192 = 18,071,552
So 19 MB is about right. And this is the best to be done, unless
you can make do with int2s which would optimally shrink the table
size to 16,834,560 bytes. Any nulls in there might add a few bytes
per offending row too, but other than that, this should be considered
normal postgresql behavior.
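The arithmetic above can be sketched in a few lines (a hypothetical Python check using the 6.2.1-era AIX numbers quoted here; the row header size and alignment on your platform may differ):

```python
import math

# Per-tuple costs quoted above (6.2.1 on AIX; your mileage may vary).
ROW_HEADER = 48        # bytes of row header
DATA = 8               # two int4 fields @ 4 bytes each
LINE_POINTER = 4       # pointer on page to tuple
PAGE_SIZE = 8192

tuple_size = ROW_HEADER + DATA + LINE_POINTER      # 60 bytes per tuple
tuples_per_page = PAGE_SIZE // tuple_size          # 136
pages = math.ceil(300_000 / tuples_per_page)       # 2206
print(pages * PAGE_SIZE)                           # 18071552 -> about 19 MB
```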
...
One massive sort file...
...
This one I don't know if it is "normal"...
Darren aka darrenk@insightdist.com
Shiby Thomas wrote:
Hi,
I created a table with two columns of type int, and loaded about 300 K records
in it. So, the total size of the table is approx. that of 600 K integers,
roughly 2.4 MB.
But, the file corresponding to the table in pgsql/data/base directory
has a size of 19 MB. I was wondering if I have done something wrong in
the installation or usage, or is it the normal behavior ?
This is OK. First thing - int is not 2 bytes long, it's 4 bytes long.
Use int2 if you want that. Second - you have to add up other per-record
stuff like oids and other internal attributes.
Also, I was trying to execute the query:
select item as item, count(*) as cnt into table C_temp
from temp group by item;
Here, temp is the name of the table which contains the data and item is an
integer attribute. While doing the sort for the group by, the size of one of
the temporary pg_psort relation grows to about 314 MB. The size of the temp
table is as mentioned above.
It ain't good. Seems like the psort is very hungry.
Mike
--
WWW: http://www.lodz.pdi.net/~mimo tel: Int. Acc. Code + 48 42 148340
add: Michal Mosiewicz * Bugaj 66 m.54 * 95-200 Pabianice * POLAND
On Tue, 6 Jan 1998, Shiby Thomas wrote:
Hi,
I created a table with two columns of type int, and loaded about 300 K records
in it. So, the total size of the table is approx. that of 600 K integers,
roughly 2.4 MB.
But, the file corresponding to the table in pgsql/data/base directory
has a size of 19 MB. I was wondering if I have done something wrong in
the installation or usage, or is it the normal behavior ?
Also, I was trying to execute the query:
select item as item, count(*) as cnt into table C_temp
from temp group by item;
Here, temp is the name of the table which contains the data and item is an
integer attribute. While doing the sort for the group by, the size of one of
the temporary pg_psort relation grows to about 314 MB. The size of the temp
table is as mentioned above. If someone tried similar queries, could you
please tell me if this is normal.
The above query did not finish even after 2 hours. I am executing it on a
Sun Sparc 5 running Sun OS 5.5.
What version of PostgreSQL are you running? *raised eyebrow*
Marc G. Fournier
Systems Administrator @ hub.org
primary: scrappy@hub.org secondary: scrappy@{freebsd|postgresql}.org
On Tue, 6 Jan 1998, Darren King wrote:
48 bytes + each row header (on my aix box..._your_ mileage may vary)
8 bytes + two int fields @ 4 bytes each
4 bytes + pointer on page to tuple
-------- =
60 bytes per tuple
8192 / 60 gives 136 tuples per page.
300000 / 136 ... round up ... need 2206 pages which gives us ...
2206 * 8192 = 18,071,552
So 19 MB is about right. And this is the best to be done, unless
you can make do with int2s which would optimally shrink the table
size to 16,834,560 bytes. Any nulls in there might add a few bytes
per offending row too, but other than that, this should be considered
normal postgresql behavior.
Bruce...this would be *great* to have in the FAQ!! What we do need is
a section of the User Manual dealing with computing resources required for
a table, similar to this :)
Marc G. Fournier
Systems Administrator @ hub.org
primary: scrappy@hub.org secondary: scrappy@{freebsd|postgresql}.org
=> What version of PostgreSQL are you running? *raised eyebrow*
6.2.1. I haven't yet applied the patches (put on the PostgreSQL web page)
though.
--shiby
I created a table with two columns of type int, and loaded about 300 K records
in it. So, the total size of the table is approx. that of 600 K integers,
roughly 2.4 MB.
But, the file corresponding to the table in pgsql/data/base directory
has a size of 19 MB. I was wondering if I have done something wrong in
the installation or usage, or is it the normal behavior ?
48 bytes + each row header (on my aix box..._your_ mileage may vary)
8 bytes + two int fields @ 4 bytes each
4 bytes + pointer on page to tuple
-------- =
60 bytes per tuple
8192 / 60 gives 136 tuples per page.
300000 / 136 ... round up ... need 2206 pages which gives us ...
2206 * 8192 = 18,071,552
So 19 MB is about right. And this is the best to be done, unless
you can make do with int2s which would optimally shrink the table
size to 16,834,560 bytes. Any nulls in there might add a few bytes
per offending row too, but other than that, this should be considered
normal postgresql behavior.
Nice math exercise.
Does anyone want to tell me the row overhead on commercial databases?
--
Bruce Momjian
maillist@candle.pha.pa.us
On Tue, 6 Jan 1998, Darren King wrote:
48 bytes + each row header (on my aix box..._your_ mileage may vary)
8 bytes + two int fields @ 4 bytes each
4 bytes + pointer on page to tuple
-------- =
60 bytes per tuple
8192 / 60 gives 136 tuples per page.
300000 / 136 ... round up ... need 2206 pages which gives us ...
2206 * 8192 = 18,071,552
So 19 MB is about right. And this is the best to be done, unless
you can make do with int2s which would optimally shrink the table
size to 16,834,560 bytes. Any nulls in there might add a few bytes
per offending row too, but other than that, this should be considered
normal postgresql behavior.
Bruce...this would be *great* to have in the FAQ!! What we do need is
a section of the User Manual dealing with computing resources required for
a table, similar to this :)
Added to FAQ.
--
Bruce Momjian
maillist@candle.pha.pa.us
48 bytes + each row header (on my aix box..._your_ mileage may vary)
8 bytes + two int fields @ 4 bytes each
4 bytes + pointer on page to tuple
-------- =
60 bytes per tuple...
Nice math exercise.
Does anyone want to tell me the row overhead on commercial databases?
I've seen this for Oracle, but I _can't_ find it right now. I'll dig it
up tonite...this is driving me nuts trying to remember where it is now.
But this I do have handy! It's an HTML page from IBM DB2 docs. A touch
long, but I found it most interesting.
If there are any of the linked pages that someone else is interested in,
contact me and if I have it, I can send it to you off-list.
Darren aka darrenk@insightdist.com
<HTML>
<HEAD>
<TITLE>DB2 Administration Guide</TITLE>
</HEAD>
<BODY TEXT="#000000" BGCOLOR="#FFFFFF" LINK="#9900CC" VLINK="#3366CC" ALINK="#3399CC">
<H2><A NAME="HDRDBSIZE"></A>Estimating Space Requirements for Tables</H2>
<P>The following information provides a general rule for estimating the
size of a database: </P>
<UL COMPACT>
<LI><A HREF="#HDROPCAT">"System Catalog Tables"</A> </LI>
<LI><A HREF="#HDROPDAT">"User Table Data"</A> </LI>
<LI><A HREF="#HDROPLF">"Long Field Data"</A> </LI>
<LI><A HREF="#HDROPLOB">"Large Object (LOB) Data"</A> </LI>
<LI><A HREF="#HDROPINX">"Index Space"</A> </LI>
</UL>
<P>After reading these sections, you should read <A HREF="sqld00025.html#HDRTBSPACE">"Designing
and Choosing Table Spaces"</A>. </P>
<P>Information is not provided for the space required by such things as:
</P>
<UL COMPACT>
<LI>The local database directory file </LI>
<LI>The system database directory file </LI>
<LI>The file management overhead required by the operating system, including:
</LI>
<UL COMPACT>
<LI>file block size </LI>
<LI>directory control space </LI>
</UL>
</UL>
<P>Information such as row size and structure is precise. However, multiplication
factors for file overhead because of disk fragmentation, free space, and
variable length columns will vary in your own database since there is such
a wide range of possibilities for the column types and lengths of rows
in a database. After initially estimating your database size, create a
test database and populate it with representative data. You will then find
a multiplication factor that is more accurate for your own particular database
design. </P>
<H3><A NAME="HDROPCAT"></A>System Catalog Tables</H3>
<P>When a database is initially created, system catalog tables are created.
These system tables will grow as user tables, views, indexes, authorizations,
and packages are added to the database. Initially, they use approximately
1600 KB of disk space. </P>
<P>The amount of space allocated for the catalog tables depends on the
type of table space and the extent size for the table space. For example,
if a DMS table space with an extent size of 32 is used, the catalog table
space will initially be allocated 20MB of space. For more information,
see <A HREF="sqld00025.html#HDRTBSPACE">"Designing and Choosing Table
Spaces"</A>. </P>
<H3><A NAME="HDROPDAT"></A>User Table Data</H3>
<P>Table data is stored on 4KB pages. Each page contains 76 bytes of overhead
for the database manager. This leaves 4020 bytes to hold user data (or
rows), although no row can exceed 4005 bytes in length. A row will <I>not</I>
span multiple pages. </P>
<P>Note that the table data pages <B>do not</B> contain the data for columns
defined with LONG VARCHAR, LONG VARGRAPHIC, BLOB, CLOB, or DBCLOB data
types. The rows in a table data page do, however, contain a descriptor
of these columns. (See <A HREF="#HDROPLF">"Long Field Data"</A>
for information about estimating the space required for the table objects
that will contain the data stored using these data types.) </P>
<P>Rows are inserted into the table in a first-fit order. The file is searched
(using a free space map) for the first available space that is large enough
to hold the new row. When a row is updated, it is updated in place unless
there is insufficient room left on the 4KB page to contain it. If this
is the case, a "tombstone record" is created in the original
row location which points to the new location in the table file of the
updated row. </P>
<P>See <A HREF="#HDROPLF">"Long Field Data"</A> for information
about how LONG VARCHAR, LONG VARGRAPHIC, BLOB, CLOB and DBCLOB data is
stored and for estimating the space required to store these types of columns.
</P>
<P>For each user table in the database, the space needed is: </P>
<PRE> (average row size + 8) * number of rows * 1.5
</PRE>
<P>The average row size is the sum of the average column sizes. For information
on the size of each column, see CREATE TABLE in the <A HREF="/data/db2/support/sqls00aa/sqls0.html"><I>SQL
Reference</I>. </A></P>
<P>The factor of "1.5" is for overhead such as page overhead
and free space. </P>
<H3><A NAME="HDROPLF"></A>Long Field Data</H3>
<P>If a table has LONG VARCHAR or LONG VARGRAPHIC data, in addition to
the byte count of 20 for the LONG VARCHAR or LONG VARGRAPHIC descriptor
(in the table row), the data itself must be stored. Long field data is
stored in a separate table object which is structured differently from
the other data types (see <A HREF="#HDROPDAT">"User Table Data"</A>
and <A HREF="#HDROPLOB">"Large Object (LOB) Data"</A>). </P>
<P>Data is stored in 32KB areas that are broken up into segments whose
sizes are "powers of two" times 512 bytes. (Hence these segments
can be 512 bytes, 1024 bytes, 2048 bytes, and so on, up to 32KB.) </P>
<P>They are stored in a fashion that enables free space to be reclaimed
easily. Allocation and free space information is stored in 4KB allocation
pages, which appear infrequently throughout the object. </P>
<P>The amount of unused space in the object depends on the size of the
long field data and whether this size is relatively constant across all
occurrences of the data. For data entries larger than 255 bytes, this unused
space can be up to 50 percent of the size of the long field data. </P>
<P>If character data is less than 4KB in length, the CHAR, GRAPHIC, VARCHAR,
or VARGRAPHIC data types should be used instead of LONG VARCHAR or LONG
VARGRAPHIC. </P>
<H3><A NAME="HDROPLOB"></A>Large Object (LOB) Data</H3>
<P>If a table has BLOB, CLOB, or DBCLOB data, in addition to the byte count
(between 72 and 280 bytes) for the BLOB, CLOB, or DBCLOB descriptor (in
the table row), the data itself must be stored. This data is stored in
two separate table objects that are structured differently than other data
types (see <A HREF="#HDROPDAT">"User Table Data"</A>). </P>
<P>To estimate the space required by large object data, you need to consider
the two table objects used to store data defined with these data types:
</P>
<UL>
<LI><B>LOB Data Objects</B> </LI>
<P>Data is stored in 64MB areas that are broken up into segments whose
sizes are "powers of two" times 1024 bytes. (Hence these segments
can be 1024 bytes, 2048 bytes, 4096 bytes, and so on, up to 64MB.) </P>
<P>To reduce the amount of disk space used by the LOB data, you can use
the COMPACT parameter on the <I>lob-options-clause</I> on the CREATE TABLE
and ALTER TABLE statements. The COMPACT option minimizes the amount of
disk space required by allowing the LOB data to be split into smaller segments
so that it will use the smallest amount of space possible. Without the
COMPACT option, the entire LOB value must contiguously fit into a single
segment. Appending to LOB values stored using the COMPACT option may result
in slower performance compared to LOB values for which the COMPACT option
is not specified. </P>
<P>The amount of free space contained in LOB data objects will be influenced
by the amount of update and delete activity, as well as the size of the
LOB values being inserted. </P>
<LI><B>LOB Allocation Objects</B> </LI>
<P>Allocation and free space information is stored in 4KB allocation pages
separated from the actual data. The number of these 4KB pages is dependent
on the amount of data, including unused space, allocated for the large
object data. The overhead is calculated as follows: one 4KB page for every
64GB plus one 4KB page for every 8MB. </P>
</UL>
<P>If character data is less than 4KB in length, the CHAR, GRAPHIC, VARCHAR,
or VARGRAPHIC data types should be used instead of BLOB, CLOB or DBCLOB.
</P>
<H3><A NAME="HDROPINX"></A>Index Space</H3>
<P>For each index, the space needed can be estimated as: </P>
<PRE> (average index key size + 8) * number of rows * 2
</PRE>
<P>where: </P>
<UL COMPACT>
<LI>The "average index key size" is the byte count of each column
in the index key. See the CREATE TABLE statement <A HREF="/data/db2/support/sqls00aa/sqls0.html"><I>SQL
Reference</I> </A>for information on how to calculate the byte count for
columns with different data types. (Note that to estimate the average column
size for VARCHAR and VARGRAPHIC columns, use an average of the current
data size, plus one byte. Do not use the maximum declared size.) </LI>
<LI>The factor of 2 is for overhead, such as non-leaf pages and free space.
</LI>
</UL>
<P><B>Note: </B></P>
<BLOCKQUOTE>
<P>For every column that allows nulls, add one extra byte for the null
indicator. </P>
</BLOCKQUOTE>
<P>Temporary space is required when creating the index. The maximum amount
of temporary space required during index creation can be estimated as:
</P>
<PRE> (average index key size + 8) * number of rows * 3.2
</PRE>
<P>where the factor of 3.2 is for index overhead as well as space required
for the sorting needed to create the index. </P>
<P>
<HR><B>[ <A HREF="sqld0.html#ToC">Table of Contents</A>
| <A HREF="sqld00022.html">Previous Page</A> | <A HREF="sqld00024.html">Next
Page</A> ]</B>
<HR></P>
</BODY>
</HTML>
I've seen this for Oracle, but I _can't_ find it right now. I'll dig it
up tonite...this is driving me nuts trying to remember where it is now.
But this I do have handy! It's an HTML page from IBM DB2 docs. A touch
long, but I found it most interesting.
If there are any of the linked pages that someone else is interested in,
contact me and if I have it, I can send it to you off-list.
Interesting that they have "tombstone" records, which sounds like our
time travel that vacuum cleans up.
They recommend (rowsize+8) * 1.5.
Sounds like we are not too bad.
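As an illustrative check (not from the thread itself), DB2's rule of thumb from the attached doc can be applied to the table under discussion, assuming two int columns give an average row size of 8 bytes:

```python
# DB2's rule of thumb: (average row size + 8) * number of rows * 1.5,
# where the 1.5 covers page overhead and free space.
avg_row_size = 8       # two 4-byte int columns (illustrative assumption)
rows = 300_000
estimate = (avg_row_size + 8) * rows * 1.5
print(int(estimate))   # 7200000 bytes, roughly 7.2 MB on DB2's 4KB pages
```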
I assume our index overhead is not as large as data rows, but still
significant. I am adding a mention of it to the FAQ. That comes up
often too.
Indexes do not contain the same overhead, but do contain the
data that is being indexed, so they can be large also.
--
Bruce Momjian
maillist@candle.pha.pa.us
I created a table with two columns of type int, and loaded about 300 K records
in it. So, the total size of the table is approx. that of 600 K integers,
roughly 2.4 MB.
But, the file corresponding to the table in pgsql/data/base directory
has a size of 19 MB. I was wondering if I have done something wrong in
the installation or usage, or is it the normal behavior ?
48 bytes + each row header (on my aix box..._your_ mileage may vary)
8 bytes + two int fields @ 4 bytes each
4 bytes + pointer on page to tuple
-------- =
60 bytes per tuple
8192 / 60 gives 136 tuples per page.
300000 / 136 ... round up ... need 2206 pages which gives us ...
2206 * 8192 = 18,071,552
The above is for the current release of 6.2.1. For 6.3, a couple of things
have been removed from the header that give a 13% size savings for the above.
That percentage will go down of course as you add fields to the table.
A little more accurate figure comes from including the tuple rounding before
storage. For me the above would still be true if there are one or two int4s,
since the four bytes I would save would be taken back by the double-word
tuple alignment.
With the current src tree...again, all with aix alignment...
40 bytes + each row header
8 bytes + two int fields @ 4 bytes each
--------- =
48 bytes per tuple (round up to next highest multiple of 8)
4 bytes + pointer on page to tuple
--------- =
52 bytes per tuple
8192 bytes - page size
8 bytes - page header
0 bytes - "special" Opaque space at page end...currently unused.
---------- =
8184 bytes
8184 / 52 gives 157 tuples per page.
300000 / 157 ... round up ... need 1911 pages which gives us ...
1911 * 8192 = 15,654,912 ... 13% smaller than 6.2 file size!
space = pg_sz * ceil(num_tuples / floor((pg_sz - pg_hdr - pg_opaque) / tup_sz))
where tup_sz is figured out from above. You can figure out what your
platform is using by creating the table, inserting one record and then
examining the table file with a binary editor such as bpatch or beav.
Using the above and knowing the size of the fields, you should be able
to accurately calculate the amount of space any table will require before
you create it.
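The closing formula can be written out directly (a hypothetical sketch; the header, alignment, and page-layout numbers below are the 6.3-era AIX values quoted above, and should be checked on your own platform as suggested):

```python
import math

def table_bytes(num_tuples, tuple_data_bytes, *,
                page_size=8192, page_header=8, page_opaque=0,
                row_header=40, line_pointer=4, align=8):
    """space = pg_sz * ceil(num_tuples / floor((pg_sz - pg_hdr - pg_opaque) / tup_sz)).
    Defaults are the 6.3-era AIX numbers from the message above."""
    tup = row_header + tuple_data_bytes
    tup = math.ceil(tup / align) * align     # double-word tuple alignment
    tup += line_pointer                      # pointer on page to tuple
    usable = page_size - page_header - page_opaque
    tuples_per_page = usable // tup
    pages = math.ceil(num_tuples / tuples_per_page)
    return page_size * pages

# Two int4 columns (8 data bytes), 300,000 rows:
print(table_bytes(300_000, 8))   # 15654912, matching the calculation above
```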
darrenk
A little more accurate by including the tuple rounding before storage. For
me the above would still be true if there is one or two int4s since the four
bytes I would save would be taken back by the double-word tuple alignment.
With the current src tree...again, all with aix alignment...
40 bytes + each row header
8 bytes + two int fields @ 4 bytes each
--------- =
48 bytes per tuple (round up to next highest mulitple of 8)
4 bytes + pointer on page to tuple
--------- =
52 bytes per tuple
Thanks. Updated FAQ.
--
Bruce Momjian
maillist@candle.pha.pa.us