Database Bloat
Hi,
I am using PostgreSQL 9.1 and loading very large tables (13 million
rows each). The flat file size is only 25M, yet the equivalent
database table is 548MB. This is without any indexes applied and with
autovacuum turned on. I have read that tables can take around 5 times
the space of the equivalent flat files, so over 20 times seems quite excessive.
Any ideas on how to go about decreasing this bloat, or is this not
unexpected for such large tables?
Thanks
On Mon, Aug 20, 2012 at 1:54 PM, elliott wrote [the original message, quoted above].
Kinda guessing here, but that 5x estimate has some assumptions built in. I
am guessing that a table holding mostly large plain-text data would compress
better than one with a mixture of numbers, varchars, dates, and other
less-compressible datatypes.
It would help to provide a general description of the structure of said
tables and how large individual fields (if bytea or text) tend to be.
I would think that filesystem parameters come into play as well, and you do
not specify the OS on which you are running.
Do you have any idea which specific tables are "bloated"? If there are only
a few main contributors, what is different about them?
More questions than answers but something to ponder while you wait for more
knowledgeable persons to respond.
David J.
On 08/20/12 10:53 AM, elliott wrote [the original message, quoted above].
What do the fields of this table look like? (Output from \d tablename
would be useful.)
--
john r pierce N 37, W 122
santa cruz ca mid-left coast
envdb=# \d astgtm2_n60e073;
Table "public.astgtm2_n60e073"
 Column |  Type   | Modifiers
--------+---------+-----------
 lat    | real    |
 lon    | real    |
 alt    | integer |
Indexes:
"q3c_astgtm2_n60e073_idx" btree (q3c_ang2ipix(lon, lat)) CLUSTER
On 08/20/12 11:46 AM, elliott wrote [the \d astgtm2_n60e073 output, quoted above].
So, you DO have an index. What type does the function
q3c_ang2ipix(real, real) return? Googling it suggested a bigint, which
means every 12-byte (real, real, integer) row has a corresponding b-tree entry
of 8 bytes plus b-tree overhead. I see you used CLUSTER on this; did you
run the CLUSTER after populating the table, and before checking the
pg_total_relation_size that you reported as 20x your data?
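To put rough numbers on John's point, here is a sketch of a lower bound on the index size; it assumes roughly 20 bytes per leaf entry (a 4-byte line pointer, an 8-byte IndexTuple header, and the 8-byte bigint key), and it ignores inner pages and the free space b-tree pages keep, so treat the result as a floor:

```sql
-- rough lower bound on b-tree leaf size for 13 million bigint keys:
-- 4 (line pointer) + 8 (IndexTuple header) + 8 (bigint key) = 20 bytes/entry
SELECT pg_size_pretty(13000000::bigint * 20);
```

Anything much larger than this in pg_relation_size for the index would point at genuine index bloat rather than inherent overhead.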
On Mon, Aug 20, 2012 at 2:33 PM, John R Pierce <pierce@hogranch.com> wrote [the analysis above].
Apparently it returns a bigint:
https://www.google.com/search?q=q3c_ang2ipix&rlz=1C1CHKZ_enUS437US437&sugexp=chrome,mod=17&sourceid=chrome&ie=UTF-8
Anyway, the relative overhead is going to be higher for very narrow
tuples. Your tuple is only 12 bytes, while each tuple typically carries 23
bytes of header overhead; see
http://www.postgresql.org/docs/9.1/interactive/storage-page-layout.html.
Also, measuring total database size is pretty silly because a
number of things are global and don't increase as your data
increases. To get the table and index sizes, try doing
SELECT pg_size_pretty(pg_relation_size('astgtm2_n60e073'));
SELECT pg_size_pretty(pg_relation_size('q3c_astgtm2_n60e073_idx'));
merlin
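As a back-of-the-envelope check of the per-tuple overhead Merlin describes (a sketch; it assumes the 9.1 heap layout with an 8-byte MAXALIGN: the 23-byte header is padded to 24, the 12 bytes of column data bring the tuple to 36 bytes, which rounds up to 40, plus a 4-byte line pointer):

```sql
-- per row: 24 (padded header) + 12 (data) = 36, MAXALIGN'd to 40,
-- plus a 4-byte line pointer = 44 bytes per tuple
SELECT pg_size_pretty(13000000::bigint * 44);
```

That comes out to roughly 550MB before even counting page headers, which is in the same ballpark as the observed 548MB heap, so the heap itself may not be bloated at all.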
elliott <elliott@cpi.com> wrote [the original message, quoted above].
Well, check whether the table has a low fillfactor setting:
\d+ footable
This hasn't come up yet in the others' remarks on this thread, but it is worth verifying.
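Another way to check (a sketch; reloptions is NULL when no per-table storage options such as fillfactor have ever been set, so a NULL here rules this cause out):

```sql
-- show any non-default storage options; a lowered fillfactor would appear
-- here as e.g. {fillfactor=50}
SELECT relname, reloptions
FROM pg_class
WHERE relname = 'astgtm2_n60e073';
```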
Thanks
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
--
Jerry Sievers
Postgres DBA/Development Consulting
e: postgres.consulting@comcast.net
p: 732.216.7255
On Mon, Aug 20, 2012 at 10:53 AM, elliott <elliott@cpi.com> wrote:
I am using PostgreSQL 9.1 and loading very large tables (13 million rows
each). The flat file size is only 25M.
That is only 2 bytes per row. Is the size given for the flat file for
a compressed file?
Cheers,
Jeff
Yes, it is a tif file. Uncompressed it is around 85M.
On Wed, Aug 22, 2012 at 12:25 PM, elliott <elliott@cpi.com> wrote:
Yes, it is a tif file. Uncompressed it is around 85M.
OK, 85MB -> 548MB is reasonable considering you have very narrow rows
and an index that covers 2/3 of your column data. If you want to see a
dramatic reduction in table size, you probably need to explore the use of
arrays in some fashion.
merlin
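Merlin's array suggestion might look something like this (a hypothetical sketch; the table and column names are illustrative, and it assumes the altitude samples form a regular grid so each latitude band can hold its longitude steps implicitly by array position):

```sql
-- hypothetical restructuring: one row per latitude band, with the altitudes
-- packed into an array, amortizing the ~28 bytes of per-row overhead
-- across thousands of values instead of paying it per sample
CREATE TABLE astgtm2_n60e073_packed (
    lat  real,
    alts integer[]   -- altitudes for successive lon steps along this latitude
);
```

The trade-off is that point lookups then need an array subscript rather than a simple WHERE clause, and the functional q3c index would have to be rethought.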
On 08/22/12 10:25 AM, elliott wrote:
Yes, it is a tif file. Uncompressed it is around 85M.
A tif file is a pixel-map image, i.e. graphics, no? I thought we were
talking about CSV data here?
On Wednesday, August 22, 2012 4:32 PM, John R Pierce wrote [the question above].
If I had to hazard a guess, I think he meant "TAR", since he is talking about
compression...
David J.