Database Bloat
Hi,
I am using PostgreSQL 9.1 and loading very large tables (13 million
rows each). The flat file size is only 25M, yet the equivalent
database table is 548MB. This is without any indexes applied and with
autovacuum turned on. I have read that tables can take around 5 times
the space of the equivalent flat files, so over 20 times seems quite excessive.
Any ideas on how to go about decreasing this bloat, or is this not
unexpected for such large tables?
Thanks
On Mon, Aug 20, 2012 at 1:54 PM, elliott wrote [the original message, quoted above].
Kinda guessing here, but that 5x estimate has some assumptions built in. I
am guessing that a table holding mostly large plain-text data would compress
better than one with a mixture of numbers, varchars, dates, and other
less-compressible datatypes.
It would help to provide a general description of the structure of said
tables and how large individual fields (if bytea or text) tend to be.
I would think that filesystem parameters come into play as well, and you do
not specify the OS on which you are running.
Do you have any idea which specific tables are "bloated"? If there are only
a few main contributors, what is different about them?
More questions than answers but something to ponder while you wait for more
knowledgeable persons to respond.
David J.
On 08/20/12 10:53 AM, elliott wrote [the original message, quoted above].
What do the fields of this table look like? (Output from \d tablename
would be useful.)
--
john r pierce N 37, W 122
santa cruz ca mid-left coast
envdb=# \d astgtm2_n60e073;
Table "public.astgtm2_n60e073"
 Column |  Type   | Modifiers
--------+---------+-----------
 lat    | real    |
 lon    | real    |
 alt    | integer |
Indexes:
"q3c_astgtm2_n60e073_idx" btree (q3c_ang2ipix(lon, lat)) CLUSTER
On 08/20/12 11:46 AM, elliott wrote [the \d astgtm2_n60e073 output, quoted above].
So, you DO have an index. What type does the function
q3c_ang2ipix(real, real) return? Googling it suggested a bigint, which
means every 12-byte (real, real, integer) row has a corresponding b-tree entry
of 8 bytes plus b-tree overhead. I see you used CLUSTER on this; did you
run the CLUSTER after populating the table, and before checking the
pg_total_relation_size that you reported as 20x your data?
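To put rough numbers on John's point, here is a sketch of a lower bound on the index size; it assumes roughly 20 bytes per leaf entry (a 4-byte line pointer, an 8-byte IndexTuple header, and the 8-byte bigint key), and it ignores inner pages and the free space b-tree pages keep, so treat the result as a floor:

```sql
-- rough lower bound on b-tree leaf size for 13 million bigint keys:
-- 4 (line pointer) + 8 (IndexTuple header) + 8 (bigint key) = 20 bytes/entry
SELECT pg_size_pretty(13000000::bigint * 20);
```

Anything much larger than this in pg_relation_size for the index would point at genuine index bloat rather than inherent overhead.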
On Mon, Aug 20, 2012 at 2:33 PM, John R Pierce <pierce@hogranch.com> wrote [the analysis above].
Apparently it returns a bigint:
https://www.google.com/search?q=q3c_ang2ipix&rlz=1C1CHKZ_enUS437US437&sugexp=chrome,mod=17&sourceid=chrome&ie=UTF-8
Anyway, the relative overhead is going to be higher for very narrow
tuples. Your tuple is only 12 bytes, while each tuple typically carries 23
bytes of header overhead; see
http://www.postgresql.org/docs/9.1/interactive/storage-page-layout.html.
Also, measuring total database size is pretty silly because a
number of things are global and don't increase as your data
increases. To get the table and index sizes, try doing
SELECT pg_size_pretty(pg_relation_size('astgtm2_n60e073'));
SELECT pg_size_pretty(pg_relation_size('q3c_astgtm2_n60e073_idx'));
merlin
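As a back-of-the-envelope check of the per-tuple overhead Merlin describes (a sketch; it assumes the 9.1 heap layout with an 8-byte MAXALIGN: the 23-byte header is padded to 24, the 12 bytes of column data bring the tuple to 36 bytes, which rounds up to 40, plus a 4-byte line pointer):

```sql
-- per row: 24 (padded header) + 12 (data) = 36, MAXALIGN'd to 40,
-- plus a 4-byte line pointer = 44 bytes per tuple
SELECT pg_size_pretty(13000000::bigint * 44);
```

That comes out to roughly 550MB before even counting page headers, which is in the same ballpark as the observed 548MB heap, so the heap itself may not be bloated at all.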
elliott <elliott@cpi.com> wrote [the original message, quoted above].
Well, check whether the table has a low fillfactor setting:
\d+ footable
This hasn't come up yet in the others' remarks on this thread, but it is worth verifying.
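Another way to check (a sketch; reloptions is NULL when no per-table storage options such as fillfactor have ever been set, so a NULL here rules this cause out):

```sql
-- show any non-default storage options; a lowered fillfactor would appear
-- here as e.g. {fillfactor=50}
SELECT relname, reloptions
FROM pg_class
WHERE relname = 'astgtm2_n60e073';
```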
Thanks
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
--
Jerry Sievers
Postgres DBA/Development Consulting
e: postgres.consulting@comcast.net
p: 732.216.7255
On Mon, Aug 20, 2012 at 10:53 AM, elliott <elliott@cpi.com> wrote:
I am using PostgreSQL 9.1 and loading very large tables (13 million rows
each). The flat file size is only 25M.
That is only 2 bytes per row. Is the size given for the flat file for
a compressed file?
Cheers,
Jeff
Yes, it is a tif file. Uncompressed it is around 85M.
On Wed, Aug 22, 2012 at 12:25 PM, elliott <elliott@cpi.com> wrote:
Yes, it is a tif file. Uncompressed it is around 85M.
OK, 85MB -> 548MB is reasonable considering you have very narrow rows
and an index that covers 2/3 of your column data. If you want to see a
dramatic reduction in table size, you probably need to explore the use of
arrays in some fashion.
merlin
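Merlin's array suggestion might look something like this (a hypothetical sketch; the table and column names are illustrative, and it assumes the altitude samples form a regular grid so each latitude band can hold its longitude steps implicitly by array position):

```sql
-- hypothetical restructuring: one row per latitude band, with the altitudes
-- packed into an array, amortizing the ~28 bytes of per-row overhead
-- across thousands of values instead of paying it per sample
CREATE TABLE astgtm2_n60e073_packed (
    lat  real,
    alts integer[]   -- altitudes for successive lon steps along this latitude
);
```

The trade-off is that point lookups then need an array subscript rather than a simple WHERE clause, and the functional q3c index would have to be rethought.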
On 08/22/12 10:25 AM, elliott wrote:
Yes, it is a tif file. Uncompressed it is around 85M.
A tif file is a pixel-map image, i.e. graphics, no? I thought we were
talking about CSV data here?
On Wednesday, August 22, 2012 4:32 PM, John R Pierce wrote [the question above].
If I had to hazard a guess, I think he meant "TAR", since he is talking about
compression...
David J.