performance problems with bulk inserts/updates on tsrange with gist-based exclude constrains
Hi All,
I have quite a few tables that follow a pattern like this:
Table "public.my_model"
Column | Type | Modifiers
--------+-------------------+-----------
period | tsrange | not null
key | character varying | not null
value | integer |
Indexes:
"my_model_pkey" PRIMARY KEY, btree (period, key)
"my_model_period_key_excl" EXCLUDE USING gist (period WITH &&, key
WITH =)
Check constraints:
"my_model_period_check" CHECK (period <> 'empty'::tsrange)
So, a primary key of a period column and one or more other columns
(usually int or string) and an exclude constraint to prevent overlaps,
and a check constraint to prevent empty ranges.
However, I'm hitting performance problems on moderate bulk inserts and
updates, with ~700k rows taking around 13 minutes. Profiling my python
code suggests that most of the time is being taken by Postgres (9.4 in
this case...)
What can I do to speed things up? Is there a different type of index I
can use to achieve the same exclude constraint? Is there something I can
do to have the index changes only done on the commit of the bulk batches?
cheers,
Chris
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
On 9/16/2016 2:01 AM, Chris Withers wrote:
Hi All,
I have quite a few tables that follow a pattern like this:
Table "public.my_model"
Column | Type | Modifiers
--------+-------------------+-----------
period | tsrange | not null
key | character varying | not null
value | integer |
Indexes:
"my_model_pkey" PRIMARY KEY, btree (period, key)
"my_model_period_key_excl" EXCLUDE USING gist (period WITH &&,
key WITH =)
Check constraints:
"my_model_period_check" CHECK (period <> 'empty'::tsrange)So, a primary key of a period column and one or more other columns
(usually int or string) and an exclude constraint to prevent overlaps,
and a check constraint to prevent empty ranges.However, I'm hitting performance problems on moderate bulk inserts and
updates, with ~700k rows taking around 13 minutes. Profiling my python
code suggests that most of the time is being taken by Postgres (9.4 in
this case...)What can I do to speed things up? Is there a different type of index I
can use to achieve the same exclude constraint? Is there something I
can do to have the index changes only done on the commit of the bulk
batches?
if (period,key) is unique, by virtue of being the primary key, then
whats the point of the exclusion ??
I'm curious, how fast do your insert/updates run if you remove the key
exclusion and check constraint ? tsvector operations are a lot more
complicated than simple matches in indexing....
--
john r pierce, recycling bits in santa cruz
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
On 9/16/2016 2:12 AM, John R Pierce wrote:
Column | Type | Modifiers
--------+-------------------+-----------
period | tsrange | not null
wait, what is a tsrange? the standard textsearch data types in
postgres are tsvector and tsquery,
--
john r pierce, recycling bits in santa cruz
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
On 9/16/2016 2:23 AM, John R Pierce wrote:
wait, what is a tsrange? the standard textsearch data types in
postgres are tsvector and tsquery,
never mind, I should have known, its a timestamp range. ...
when you do updates, are you changing any of the indexed fields, or just
"value" ?
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
On 16/09/2016 10:26, John R Pierce wrote:
On 9/16/2016 2:23 AM, John R Pierce wrote:
wait, what is a tsrange? the standard textsearch data types in
postgres are tsvector and tsquery,never mind, I should have known, its a timestamp range. ...
when you do updates, are you changing any of the indexed fields, or
just "value" ?
Yeah, it's a temporal table, so "updates" involve modifying the period
column for a row to set its end ts, and then inserting a new row with a
start ts running on from that.
Of course, the adds are just inserting new rows.
cheers,
Chris
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
On 16/09/2016 10:26, John R Pierce wrote:
On 9/16/2016 2:23 AM, John R Pierce wrote:
wait, what is a tsrange? the standard textsearch data types in
postgres are tsvector and tsquery,never mind, I should have known, its a timestamp range. ...
when you do updates, are you changing any of the indexed fields, or
just "value" ?
Yeah, it's a temporal table, so "updates" involve modifying the period
column for a row to set its end ts, and then inserting a new row with a
start ts running on from that.
Of course, the adds are just inserting new rows.
cheers,
Chris
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
On 9/16/2016 3:46 AM, Chris Withers wrote:
when you do updates, are you changing any of the indexed fields, or
just "value" ?Yeah, it's a temporal table, so "updates" involve modifying the period
column for a row to set its end ts, and then inserting a new row with
a start ts running on from that.
thats expensive, as it has to reindex that row. and range indexes are
more expensive than timestamp indexes
modifiyng the primary key is kind of a violation of one of the basic
rules of relational databases as it means the row can't be referenced by
another table.
I expect the expensive one is the constraint that ensures no periods
overlap for the given key. I'm not sure how that can be done short of
a full scan for each update/insert. it might actually perform better
if you write the index with the key first as presumably the key is
invariant ?
--
john r pierce, recycling bits in santa cruz
On 16/09/2016 12:00, John R Pierce wrote:
On 9/16/2016 3:46 AM, Chris Withers wrote:
when you do updates, are you changing any of the indexed fields, or
just "value" ?Yeah, it's a temporal table, so "updates" involve modifying the period
column for a row to set its end ts, and then inserting a new row with
a start ts running on from that.thats expensive, as it has to reindex that row. and range indexes are
more expensive than timestamp indexesmodifiyng the primary key is kind of a violation of one of the basic
rules of relational databases as it means the row can't be referenced by
another table.
Right, but these rows have no natural primary key. Would it help if I
just added an auto-incrementing integer key? Would that make a positive
difference or would it just be a wasted column?
I expect the expensive one is the constraint that ensures no periods
overlap for the given key. I'm not sure how that can be done short of
a full scan for each update/insert.
Indeed, I wonder if making the constraint deferrable might help for the
bulk case?
it might actually perform better
if you write the index with the key first as presumably the key is
invariant ?
You mean:
PRIMARY KEY, btree (period, key) as opposed to
--
john r pierce, recycling bits in santa cruz
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
On 16/09/2016 12:00, John R Pierce wrote:
On 9/16/2016 3:46 AM, Chris Withers wrote:
when you do updates, are you changing any of the indexed fields, or
just "value" ?Yeah, it's a temporal table, so "updates" involve modifying the period
column for a row to set its end ts, and then inserting a new row with
a start ts running on from that.thats expensive, as it has to reindex that row. and range indexes are
more expensive than timestamp indexesmodifiyng the primary key is kind of a violation of one of the basic
rules of relational databases as it means the row can't be referenced by
another table.
Right, but these rows have no natural primary key. Would it help if I
just added an auto-incrementing integer key? Would that make a positive
difference or would it just be a wasted column?
I expect the expensive one is the constraint that ensures no periods
overlap for the given key. I'm not sure how that can be done short of
a full scan for each update/insert.
Indeed, I wonder if making the constraint deferrable might help for the
bulk case?
it might actually perform better
if you write the index with the key first as presumably the key is
invariant ?
You mean:
PRIMARY KEY, btree (key1, key2, period)
as opposed to
PRIMARY KEY, btree (period, key)
Interesting, I'd assumed postgres would optimise that under the covers...
Chris
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
-----Original Message-----
From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-owner@postgresql.org] On Behalf Of Chris Withers
Sent: Friday, September 16, 2016 6:47 AM
To: John R Pierce <pierce@hogranch.com>; pgsql-general@postgresql.org
Subject: Re: [GENERAL] performance problems with bulk inserts/updates on tsrange with gist-based exclude constrains
On 16/09/2016 10:26, John R Pierce wrote:
On 9/16/2016 2:23 AM, John R Pierce wrote:
wait, what is a tsrange? the standard textsearch data types in
postgres are tsvector and tsquery,never mind, I should have known, its a timestamp range. ...
when you do updates, are you changing any of the indexed fields, or
just "value" ?
Yeah, it's a temporal table, so "updates" involve modifying the period column for a row to set its end ts, and then inserting a new row with a start ts running on from that.
Of course, the adds are just inserting new rows.
cheers,
Chris
____________________________________________________________________________________________________
So, what is the value for "end ts", when the record is inserted (the range just started)?
Regards,
Igor Neyman
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
On 16/09/2016 14:54, Igor Neyman wrote:
-----Original Message-----
From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-owner@postgresql.org] On Behalf Of Chris Withers
Sent: Friday, September 16, 2016 6:47 AM
To: John R Pierce <pierce@hogranch.com>; pgsql-general@postgresql.org
Subject: Re: [GENERAL] performance problems with bulk inserts/updates on tsrange with gist-based exclude constrainsOn 16/09/2016 10:26, John R Pierce wrote:
On 9/16/2016 2:23 AM, John R Pierce wrote:
wait, what is a tsrange? the standard textsearch data types in
postgres are tsvector and tsquery,never mind, I should have known, its a timestamp range. ...
when you do updates, are you changing any of the indexed fields, or
just "value" ?Yeah, it's a temporal table, so "updates" involve modifying the period column for a row to set its end ts, and then inserting a new row with a start ts running on from that.
Of course, the adds are just inserting new rows.
So, what is the value for "end ts", when the record is inserted (the range just started)?
It's open ended, so the period is [start_ts, )
Chris
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Chris Withers <chris@simplistix.co.uk> writes:
On 16/09/2016 14:54, Igor Neyman wrote:
So, what is the value for "end ts", when the record is inserted (the range just started)?
It's open ended, so the period is [start_ts, )
I've not looked at the GiST range opclass, but I would not be surprised if
having lots of those is pretty destructive to the index's ability to be
selective about && searches.
regards, tom lane
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
On 16/09/2016 15:29, Tom Lane wrote:
Chris Withers <chris@simplistix.co.uk> writes:
On 16/09/2016 14:54, Igor Neyman wrote:
So, what is the value for "end ts", when the record is inserted (the range just started)?
It's open ended, so the period is [start_ts, )
I've not looked at the GiST range opclass, but I would not be surprised if
having lots of those is pretty destructive to the index's ability to be
selective about && searches.
If that's so, that's a little disappointing...
(I'd have thought the special case end value (open ended) and the ending
type (inclusive/exclusive) would just be sentinel values)
How would I verify your suspicions?
cheers,
Chris
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
On Fri, Sep 16, 2016 at 2:01 AM, Chris Withers <chris@simplistix.co.uk>
wrote:
Hi All,
I have quite a few tables that follow a pattern like this:
Table "public.my_model"
Column | Type | Modifiers
--------+-------------------+-----------
period | tsrange | not null
key | character varying | not null
value | integer |
Indexes:
"my_model_pkey" PRIMARY KEY, btree (period, key)
"my_model_period_key_excl" EXCLUDE USING gist (period WITH &&, key
WITH =)
Check constraints:
"my_model_period_check" CHECK (period <> 'empty'::tsrange)
Try swapping the order of the columns in the exclude constraint. You want
the more selective criterion to appear first in the index/constraint.
Presumably "key with =" is the most selective, especially if many of your
periods are unbounded.
Cheers,
Jeff
Jeff Janes wrote
Try swapping the order of the columns in the exclude constraint. You want
the more selective criterion to appear first in the index/constraint.
Presumably "key with =" is the most selective, especially if many of your
periods are unbounded.
I would not be so sure with that:
http://use-the-index-luke.com/sql/myth-directory/most-selective-first
--
View this message in context: http://postgresql.nabble.com/performance-problems-with-bulk-inserts-updates-on-tsrange-with-gist-based-exclude-constrains-tp5921498p5922219.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
On Wed, Sep 21, 2016 at 2:18 PM, pinker <pinker@onet.eu> wrote:
Jeff Janes wrote
Try swapping the order of the columns in the exclude constraint. You
want
the more selective criterion to appear first in the index/constraint.
Presumably "key with =" is the most selective, especially if many of your
periods are unbounded.I would not be so sure with that:
As a rule, I generally don't spout random nonsense. Or at least, not
without including a disclaimer. I didn't test it on the OPs exact case,
because he has need blessed us with his data or his scripts. But I have
tested it on other data, and it does work.
http://use-the-index-luke.com/sql/myth-directory/most-selective-first
I don't see how anything there applies to GiST indexes. Indeed, there
doesn't seem to be much there worth reading at all. The only thing vaguely
informative, other than trivia about other RDBMS, is "It’s useless to have
the most selective column of the index on the left if very few queries
filter on it", which is rather obvious, but also obviously does not apply
to this case.
Cheers,
Jeff
Hey Tom,
I appreciate you're busy, but did you ever get a chance to look at this?
On 19/09/2016 08:40, Chris Withers wrote:
On 16/09/2016 15:29, Tom Lane wrote:
Chris Withers <chris@simplistix.co.uk> writes:
On 16/09/2016 14:54, Igor Neyman wrote:
So, what is the value for "end ts", when the record is inserted (the
range just started)?It's open ended, so the period is [start_ts, )
I've not looked at the GiST range opclass, but I would not be
surprised if
having lots of those is pretty destructive to the index's ability to be
selective about && searches.If that's so, that's a little disappointing...
(I'd have thought the special case end value (open ended) and the ending
type (inclusive/exclusive) would just be sentinel values)How would I verify your suspicions?
cheers,
Chris
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Hey Tom,
I appreciate you're busy, but did you ever get a chance to look at this?
On 19/09/2016 08:40, Chris Withers wrote:
On 16/09/2016 15:29, Tom Lane wrote:
Chris Withers <chris@simplistix.co.uk> writes:
On 16/09/2016 14:54, Igor Neyman wrote:
So, what is the value for "end ts", when the record is inserted (the
range just started)?It's open ended, so the period is [start_ts, )
I've not looked at the GiST range opclass, but I would not be
surprised if
having lots of those is pretty destructive to the index's ability to be
selective about && searches.If that's so, that's a little disappointing...
(I'd have thought the special case end value (open ended) and the ending
type (inclusive/exclusive) would just be sentinel values)How would I verify your suspicions?
cheers,
Chris
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general