Re: space taken by a row & compressed data
> We have an FAQ item about this.
Damn! I didn't see that one! Sorry...
> Long data values are automatically compressed.
The reason I'm asking is:
we have a system that stores 200,000,000 rows per month
(other tables store 10,000,000 rows per month)
Every row has 400 columns of integers + 2 columns (date+integer) as index.
Our system compresses rows before writing them to a binary file on disk.
Data don't usually need to be updated/removed.
We usually access all columns of a row (hence compression on a per-row basis
makes sense).
Is there any way to compress data on a per-row basis? Maybe with
a User-Defined type?
Leonardo Francalanci wrote:
> > We have an FAQ item about this.
> Damn! I didn't see that one! Sorry...
> > Long data values are automatically compressed.
> The reason I'm asking is:
> we have a system that stores 200,000,000 rows per month
> (other tables store 10,000,000 rows per month)
> Every row has 400 columns of integers + 2 columns (date+integer) as index.
> Our system compresses rows before writing them to a binary file on disk.
> Data don't usually need to be updated/removed.
> We usually access all columns of a row (hence compression on a per-row basis
> makes sense).
> Is there any way to compress data on a per-row basis? Maybe with
> a User-Defined type?
Ah, we only compress long row values, which integers would not be. I
don't see any way to compress an entire row even with a user-defined
type unless you put multiple values into a single column and compress
those as a single value. In fact, if you used an array or some special
data type it would become a long value and would be automatically
compressed.
However, as integers, there would have to be a lot of duplicate values
before compression would be a win.
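
For concreteness, here is a minimal sketch of the array approach (table and
column names are invented for illustration): the 400 integers go into a
single integer[] column, so each row carries one wide value instead of 400
narrow ones.

    CREATE TABLE samples (
        sample_date date    NOT NULL,
        sample_id   integer NOT NULL,
        vals        integer[] NOT NULL,   -- all 400 ints as one value
        PRIMARY KEY (sample_date, sample_id)
    );

    -- One row's worth of integers stored as a single array value
    -- (only four elements shown here):
    INSERT INTO samples VALUES ('2004-08-26', 1, ARRAY[0, 0, 7, 12]);

    -- Individual values are still reachable by subscript:
    SELECT vals[3] FROM samples
    WHERE sample_date = '2004-08-26' AND sample_id = 1;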
--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073
"Leonardo Francalanci" <lfrancalanci@simtel.ie> writes:
> we have a system that stores 200,000,000 rows per month
> (other tables store 10,000,000 rows per month)
> Every row has 400 columns of integers + 2 columns (date+integer) as index.
> Our system compresses rows before writing them to a binary file on disk.
> Data don't usually need to be updated/removed.
> We usually access all columns of a row (hence compression on a per-row basis
> makes sense).
> Is there any way to compress data on a per-row basis? Maybe with
> a User-Defined type?
If you just stuck all the integers into a single integer-array column,
it would be 1600 bytes wide, which is ... hmm ... not quite wide enough
to trigger the toast logic. Perhaps it would be worthwhile for you to
run a custom build with TOAST_TUPLE_THRESHOLD/TOAST_TUPLE_TARGET set
to half their standard values (see src/include/access/tuptoaster.h).
You'd not need to write any specialized code that way.
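
For reference, a sketch of the sort of edit being suggested. The exact macro
definitions vary between PostgreSQL versions, so treat this as illustrative
and check your own src/include/access/tuptoaster.h before changing anything:

    /*
     * src/include/access/tuptoaster.h -- illustrative excerpt only.
     * By default the toaster only considers a tuple once it exceeds
     * roughly a quarter of a page; halving these values makes a
     * ~1600-byte array row eligible for compression.
     */
    #define TOAST_TUPLE_THRESHOLD   (MaxTupleSize / 8)   /* default: / 4 */
    #define TOAST_TUPLE_TARGET      (MaxTupleSize / 8)   /* default: / 4 */

You would then rebuild the server from source, as with any custom build.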
Note that if you sometimes search on the values of one of the non-index
columns, this might be a bad idea.
regards, tom lane
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> However, as integers, there would have to be a lot of duplicate values
> before compression would be a win.
Not necessarily. If for instance most of the values fit in int2, then
the upper zero bytes would be fodder for compression. (If they *all*
fit in int2 then of course he's missing a trick...) The fact that they
are successfully using row compression on their old platform indicates
that there's some win available there.
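
In other words (a sketch, reusing the invented array layout from earlier in
the thread): if every value is known to fit in 16 bits, declaring the column
as int2[] halves the element data outright, before compression enters the
picture at all.

    -- smallint elements take 2 bytes instead of 4, so 400 values
    -- occupy ~800 bytes of element data rather than ~1600:
    CREATE TABLE samples_int2 (
        sample_date date    NOT NULL,
        sample_id   integer NOT NULL,
        vals        int2[]  NOT NULL
    );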
regards, tom lane