PostgreSQL Arrays and Performance

Started by Marc Philipp · over 20 years ago · 6 messages · general
#1Marc Philipp
mail@marcphilipp.de

A few performance issues with PostgreSQL's arrays led us to the
question of how Postgres actually stores variable-length arrays.
First, let me explain our situation.

We have a rather large table containing a simple integer primary key
and a couple more columns of fixed size. However, there is a dates
column of type "timestamp without time zone[]" that is apparently
causing some severe performance problems.

During a daily update process new timestamps are collected and
existing data rows are being updated (new rows are also being added).
These changes affect a large percentage of the existing rows.

What we have observed over the last few weeks is that the overall
database size is increasing rapidly due to this table, and vacuum
processes seem to deadlock with other processes querying data from it.

Therefore, the database keeps growing and becomes more and more
unusable. The only thing that helps is dumping and restoring it, which
is nothing you want to do on a large live system on a daily basis.

This problem led us to the question of how these arrays are stored
internally. Are they stored "in-place" with the other columns, or
merely as a pointer to another file?

Would it be more efficient not to use an array for this purpose but
to split the table into two parts?

Any help is appreciated!

Marc Philipp

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Marc Philipp (#1)
Re: PostgreSQL Arrays and Performance

Marc Philipp <mail@marcphilipp.de> writes:

A few performance issues with PostgreSQL's arrays led us to the
question of how Postgres actually stores variable-length arrays.
First, let me explain our situation.

We have a rather large table containing a simple integer primary key
and a couple more columns of fixed size. However, there is a dates
column of type "timestamp without time zone[]" that is apparently
causing some severe performance problems.

How large are the arrays? PG is definitely not designed to do well with
very large arrays (say more than a couple hundred elements). You should
reconsider your data design if you find yourself trying to do that.
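To get a feel for whether the arrays are in that danger zone, a quick distribution check could be run against the table. The table and column names below are invented for illustration; substitute the real ones:

```sql
-- Hypothetical names: "events" table, "dates" array column.
-- Shows the largest array lengths and how many rows have each.
SELECT array_upper(dates, 1) AS num_elements,
       count(*)              AS num_rows
FROM events
GROUP BY 1
ORDER BY 1 DESC
LIMIT 10;
```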

regards, tom lane

#3Joe Conway
mail@joeconway.com
In reply to: Marc Philipp (#1)
Re: PostgreSQL Arrays and Performance

Marc Philipp wrote:

During a daily update process new timestamps are collected and existing
data rows are being updated (new rows are also being added). These
changes affect a large percentage of the existing rows.

What we have observed over the last few weeks is that the overall
database size is increasing rapidly due to this table, and vacuum
processes seem to deadlock with other processes querying data from it.

This sounds like it has more to do with inadequate freespace map
settings than use of arrays. Every update creates a dead tuple, and if
it is large (because the array is large) and leaked (because you have no
room in your freespace map), that would explain a rapidly increasing
database size.
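For reference, in PostgreSQL releases of that era (before 8.4, when the free-space map became self-managing) the FSM was sized by two settings in postgresql.conf. The values below are illustrative, not recommendations; a database-wide VACUUM VERBOSE reports at the end how many pages and relations the FSM would actually need:

```
# postgresql.conf (PostgreSQL 8.x and earlier; removed in 8.4)
# If max_fsm_pages is smaller than the number of pages with
# reusable free space, that space is "leaked" and only a
# VACUUM FULL or dump/restore gets it back.
max_fsm_pages = 200000       # illustrative value
max_fsm_relations = 1000     # illustrative value
```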

Therefore, the database keeps growing and becomes more and more
unusable. The only thing that helps is dumping and restoring it, which
is nothing you want to do on a large live system on a daily basis.

Arrays are stored as compressed varlenas, pretty much exactly like
varchar or text fields. However, if your use of arrays causes you to
need to perform updates to a large percentage of your rows on a daily
basis, instead of just inserting new rows, you should probably rethink
your data model.

Would it be more efficient not to use an array for this purpose but
to split the table into two parts?

I think so.
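As a sketch of such a split (schema and names are invented here, not taken from the thread): moving the timestamps into a child table turns the daily update into inserts of small, fixed-size rows instead of rewrites of large array values:

```sql
-- Hypothetical schema: replace the timestamp array with a child table.
CREATE TABLE item (
    id      integer PRIMARY KEY
    -- ... other fixed-size columns ...
);

CREATE TABLE item_dates (
    item_id integer NOT NULL REFERENCES item(id),
    ts      timestamp without time zone NOT NULL
);
CREATE INDEX item_dates_item_id_idx ON item_dates (item_id);

-- The daily process then appends narrow rows:
INSERT INTO item_dates (item_id, ts) VALUES (42, now());
```

This keeps each UPDATE/INSERT small, so dead tuples are small too and routine VACUUM has a much easier time reclaiming the space.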

Joe

#4Marc Philipp
mail@marcphilipp.de
In reply to: Joe Conway (#3)
Re: PostgreSQL Arrays and Performance

This sounds like it has more to do with inadequate freespace map
settings than use of arrays. Every update creates a dead tuple, and if
it is large (because the array is large) and leaked (because you have
no room in your freespace map), that would explain a rapidly
increasing database size.

Can you tell me more about free-space map settings? What exactly is the
free-space map? The information in the documentation is not very
helpful. How can dead tuples leak?

Regards, Marc Philipp

#5Marc Philipp
mail@marcphilipp.de
In reply to: Tom Lane (#2)
Re: PostgreSQL Arrays and Performance

How large are the arrays? PG is definitely not designed to do well
with very large arrays (say more than a couple hundred elements). You
should reconsider your data design if you find yourself trying to do
that.
At the moment, the arrays are not larger than 200 entries, but there
is no upper bound on their size.

Regards, Marc Philipp

#6Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Marc Philipp (#4)
Re: PostgreSQL Arrays and Performance

On Sun, Jan 08, 2006 at 10:22:22AM +0100, Marc Philipp wrote:

This sounds like it has more to do with inadequate freespace map
settings than use of arrays. Every update creates a dead tuple, and if
it is large (because the array is large) and leaked (because you have
no room in your freespace map), that would explain a rapidly
increasing database size.

Can you tell me more about free-space map settings? What exactly is the
free-space map? The information in the documentation is not very
helpful. How can dead tuples leak?

http://www.pervasive-postgres.com/lp/newsletters/2005/Insights_opensource_Nov.asp#3
is an article I wrote that might clear things up.
http://www.pervasive-postgres.com/lp/newsletters/2005/Insights_opensource_Dec.asp#2
might also be an interesting read, though it's just about MVCC in
general.
--
Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461