splitting up tables based on read/write frequency of columns
This is really a theoretical/anecdotal question, as I'm not yet at a scale where this would be measurable. I want to investigate while this is fresh in my mind...
I recall reading that unless a row has columns that are TOASTed, an `UPDATE` is essentially an `INSERT + DELETE`, with the previous row marked for vacuuming.
A few of my tables have the following characteristics:
- The Primary Key has many other tables/columns that FKEY onto it.
- Many columns (30+) of small data size
- Most columns (90%) are 1 WRITE(UPDATE) for 1000 READS
- Some columns (10%) do a bit of internal bookkeeping and are 1 WRITE(UPDATE) for 50 READS
Has anyone done testing/benchmarking on potential efficiency/savings by consolidating the frequent UPDATE columns into their own table?
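For context, the row-versioning behavior described above is easy to observe through the system column `ctid`, which identifies a tuple's physical location. A minimal sketch (table and column names are made up for illustration):

```sql
-- Hypothetical table illustrating MVCC row versioning
CREATE TABLE widget (id serial PRIMARY KEY, name text, hits bigint DEFAULT 0);
INSERT INTO widget (name) VALUES ('a');

SELECT ctid FROM widget WHERE id = 1;  -- e.g. (0,1)

-- Updating any single column writes a whole new row version;
-- the old tuple remains dead in the heap until VACUUM reclaims it.
UPDATE widget SET hits = hits + 1 WHERE id = 1;

SELECT ctid FROM widget WHERE id = 1;  -- a different tuple, e.g. (0,2)
```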
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Hi
I'm pretty sure PostgreSQL can handle this.
But since you asked from a theoretical angle,
it's probably worthwhile to look at column stores (like [1]).
-S.
[1] http://citusdata.github.io/cstore_fdw/
2015-01-19 22:47 GMT+01:00 Jonathan Vanasco <postgres@2xlp.com>:
On Jan 19, 2015, at 5:07 PM, Stefan Keller wrote:
Wow. I didn't know there was a column store extension for PG -- this would come in handy for some of the analytics we run!
I know that PG can handle my current system "at scale". I'm really just wondering what the possible slowdowns/improvements would be.
Rewriting the entire row plus updating the various indexes seems like a lot of unnecessary I/O. At some point it will make sense to minimize that and keep the heavy-write columns from dragging down the rest of the table's performance.
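A minimal sketch of the split being discussed, with hypothetical names: the rarely-updated columns stay in the wide main table, and the write-heavy bookkeeping columns move to a narrow side table that shares its primary key:

```sql
-- Wide, read-mostly table: FKs elsewhere keep pointing here.
CREATE TABLE account (
    id    serial PRIMARY KEY,
    name  text NOT NULL
    -- ... 30+ other small, rarely-updated columns ...
);

-- Narrow, write-heavy bookkeeping table, 1:1 with account.
CREATE TABLE account_stats (
    account_id   integer PRIMARY KEY REFERENCES account (id),
    login_count  bigint NOT NULL DEFAULT 0,
    last_seen    timestamptz
);

-- Frequent updates now rewrite only the narrow row and its
-- single index, touching none of account's columns or indexes.
UPDATE account_stats
   SET login_count = login_count + 1, last_seen = now()
 WHERE account_id = 1;
```

The trade-off, as discussed below, is an extra join whenever both halves are needed at once.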
Jonathan Vanasco wrote:
Consider another in-database mechanism that mitigates the need to do this
manually:
HOT (heap-only tuples)
http://pgsql.tapoueh.org/site/html/misc/hot.html
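As a hedged illustration of the HOT idea (the `account` table name is hypothetical): HOT updates require free space on the same heap page and that no indexed column changes, so leaving headroom via `fillfactor` and keeping the hot columns unindexed encourages them. Whether updates are actually HOT is visible in the statistics views:

```sql
-- Leave 30% of each heap page free so new row versions can
-- stay on the same page (a prerequisite for a HOT update).
ALTER TABLE account SET (fillfactor = 70);

-- HOT only applies when no indexed column changes, so avoid
-- indexing the frequently-updated bookkeeping columns.

-- Compare HOT updates against total updates:
SELECT relname, n_tup_upd, n_tup_hot_upd
  FROM pg_stat_user_tables
 WHERE relname = 'account';
```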
I haven't done any benchmarking, but I do currently use this idea to
segregate read-only fields from read-write fields. Likely there is also a
data-model reason why these fields have different update frequencies, so they
can be partitioned out both logically and physically.
The only harm I see is that it can make the schema more difficult to use -
though that can be somewhat mitigated by putting (updatable) views in front
of the partitioned tables.
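A sketch of that mitigation, assuming a hypothetical split into `account` and `account_stats` tables: a view re-joins the halves, and since a join view is not automatically updatable, writes are routed with an `INSTEAD OF` trigger (trigger function elided):

```sql
-- Present the split tables as one logical relation.
CREATE VIEW account_full AS
SELECT a.id, a.name, s.login_count, s.last_seen
  FROM account a
  JOIN account_stats s ON s.account_id = a.id;

-- Route UPDATEs on the view to the underlying tables
-- (the function account_full_upd_fn is left as an exercise):
-- CREATE TRIGGER account_full_upd
--   INSTEAD OF UPDATE ON account_full
--   FOR EACH ROW EXECUTE PROCEDURE account_full_upd_fn();
```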
If you can logically partition them, I would go for it; if it is a purely
physical concern, then I'd probably ponder it for another couple of days and
then go for it anyway. The main additional costs to ponder are the extra
parsing and the extra join. Neither is super expensive, but if you're only
doing this for physical reasons, you need to evaluate your planned usage
patterns. With a logical split you are more likely to find situations where
you don't care about one table or the other at all and so can avoid the
join entirely.
David J.
--