Table Design for Many Updates
I have a multi-tenant database that I'm migrating from SQL Server to
PostgreSQL 9.6.1. I read the recent articles about the potential write
amplification issue in Postgres. I have one particular table that has 14
columns, a primary key, five foreign keys, and eight indexes. We have a
little over a thousand devices (this number will increase over time) on the
Internet that will insert a row into this table and then proceed to update
two columns in that row about once a minute for the next two hours. The two
columns are NOT NULL and are not FK or indexed columns. I've thought about
moving them to a one-to-one related table. Any thoughts on whether this is a
wise move, or am I making a mountain out of a molehill? It looks like
this scenario would be covered by Heap-Only Tuple (HOT) updates, but with over
a hundred updates to the same row and over a thousand different rows being
updated at a time, will I reap the benefits?
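For concreteness, the kind of table in question might look something like this (all names are invented for illustration; only the shape matters):

```sql
-- Hypothetical sketch: 14 columns, a primary key, five foreign keys,
-- eight indexes, and two NOT NULL columns that are neither FKs nor
-- indexed but are updated about once a minute for two hours.
CREATE TABLE device_session (
    session_id   bigserial PRIMARY KEY,
    tenant_id    int NOT NULL REFERENCES tenant (tenant_id),
    device_id    int NOT NULL REFERENCES device (device_id),
    -- ... remaining static columns and foreign keys ...
    last_seen_at timestamptz NOT NULL,  -- updated ~once a minute
    status_code  int NOT NULL           -- updated ~once a minute
);
```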
Thanks,
Craig
With a reasonable fill factor on the table you would probably be OK, but
I'm partial to separating the static and dynamic data into separate
tables if the rest of the model and the intended applications support it. The
main concern is how many of your queries have a WHERE clause that
includes fields from both sets. Cross-table statistical estimates are
problematic, but if you don't have to worry about them, it would be
conceptually cleaner to set up a one-to-one relationship here.
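For illustration, the two options might be sketched like this (table and column names are invented, and the fillfactor value is just a starting point to tune):

```sql
-- Option 1: leave free space in each heap page so HOT updates can
-- write the new tuple version into the same block (default is 100).
-- Note this only affects pages written after the change; existing
-- data would need a table rewrite to repack.
ALTER TABLE device_session SET (fillfactor = 70);

-- Option 2: split the two frequently updated columns into a
-- one-to-one table, so the wide, heavily indexed table stays static.
CREATE TABLE device_session_status (
    session_id   bigint PRIMARY KEY
                        REFERENCES device_session (session_id),
    last_seen_at timestamptz NOT NULL,
    status_code  int NOT NULL
);
```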
David J.
I do hope those 1000-and-growing devices aren't connecting directly to
SQL, but are instead talking to some sort of app service that queues up
requests to a sane number of database worker processes, where the worker
count can be tuned to balance throughput and response time?
If these two updated fields are not indexed, it's unlikely they're being
used in WHERE clauses.
The real question with HOT is whether the table can be vacuumed frequently
enough that the tuples can continue to be updated in the same block.
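For example, something along these lines (table name invented) makes autovacuum visit the table far more aggressively than the defaults, and the statistics view shows whether updates are actually taking the HOT path:

```sql
-- Vacuum after ~1% of the table changes instead of the default 20%,
-- and remove the cost-based throttling delay for this table.
ALTER TABLE device_session SET (
    autovacuum_vacuum_scale_factor = 0.01,
    autovacuum_vacuum_cost_delay   = 0
);

-- Compare total updates against HOT updates to verify the benefit:
SELECT n_tup_upd, n_tup_hot_upd
FROM pg_stat_user_tables
WHERE relname = 'device_session';
```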
--
john r pierce, recycling bits in santa cruz
Thanks for the insights. I don’t think we have any WHERE clauses that would filter on a column from the related table.
Craig
From: David G. Johnston [mailto:david.g.johnston@gmail.com]
Sent: Tuesday, January 10, 2017 1:42 PM
To: Craig Boucher <craig@wesvic.com>
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] Table Design for Many Updates
No, they aren’t connecting directly to the database. They go through a web API. The data is queued on the device and uploaded in the background, so a little latency isn’t a problem. I’ve looked at Amazon’s SQS, but I don’t like the idea of being tied to a specific hosting vendor.
Thanks,
Craig
From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-owner@postgresql.org] On Behalf Of John R Pierce
Sent: Tuesday, January 10, 2017 1:48 PM
To: pgsql-general@postgresql.org
Subject: Re: [GENERAL] Table Design for Many Updates