performance problems with bulk inserts/updates on tsrange with gist-based exclude constrains

Started by Chris Withersover 9 years ago18 messagesgeneral

chris@simplistix.co.uk

over 9 years ago

Hi All,

I have quite a few tables that follow a pattern like this:

So, a primary key of a period column and one or more other columns
(usually int or string) and an exclude constraint to prevent overlaps,
and a check constraint to prevent empty ranges.

However, I'm hitting performance problems on moderate bulk inserts and
updates, with ~700k rows taking around 13 minutes. Profiling my python
code suggests that most of the time is being taken by Postgres (9.4 in
this case...)

What can I do to speed things up? Is there a different type of index I
can use to achieve the same exclude constraint? Is there something I can
do to have the index changes only done on the commit of the bulk batches?

cheers,

Chris

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

John R Pierce

pierce@hogranch.com

over 9 years ago

In reply to: Chris Withers (#1)

Re: performance problems with bulk inserts/updates on tsrange with gist-based exclude constrains

On 9/16/2016 2:01 AM, Chris Withers wrote:

Hi All,

I have quite a few tables that follow a pattern like this:

Table "public.my_model"
Column | Type | Modifiers
--------+-------------------+-----------
period | tsrange | not null
key | character varying | not null
value | integer |
Indexes:
"my_model_pkey" PRIMARY KEY, btree (period, key)
"my_model_period_key_excl" EXCLUDE USING gist (period WITH &&,
key WITH =)
Check constraints:
"my_model_period_check" CHECK (period <> 'empty'::tsrange)

So, a primary key of a period column and one or more other columns
(usually int or string) and an exclude constraint to prevent overlaps,
and a check constraint to prevent empty ranges.

However, I'm hitting performance problems on moderate bulk inserts and
updates, with ~700k rows taking around 13 minutes. Profiling my python
code suggests that most of the time is being taken by Postgres (9.4 in
this case...)

What can I do to speed things up? Is there a different type of index I
can use to achieve the same exclude constraint? Is there something I
can do to have the index changes only done on the commit of the bulk
batches?

if (period,key) is unique, by virtue of being the primary key, then
whats the point of the exclusion ??

I'm curious, how fast do your insert/updates run if you remove the key
exclusion and check constraint ? tsvector operations are a lot more
complicated than simple matches in indexing....

--
john r pierce, recycling bits in santa cruz

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

John R Pierce

pierce@hogranch.com

over 9 years ago

In reply to: John R Pierce (#2)

Re: performance problems with bulk inserts/updates on tsrange with gist-based exclude constrains

On 9/16/2016 2:12 AM, John R Pierce wrote:

Column | Type | Modifiers
--------+-------------------+-----------
period | tsrange | not null

wait, what is a tsrange? the standard textsearch data types in
postgres are tsvector and tsquery,

--
john r pierce, recycling bits in santa cruz

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

John R Pierce

pierce@hogranch.com

over 9 years ago

In reply to: John R Pierce (#3)

Re: performance problems with bulk inserts/updates on tsrange with gist-based exclude constrains

On 9/16/2016 2:23 AM, John R Pierce wrote:

wait, what is a tsrange? the standard textsearch data types in
postgres are tsvector and tsquery,

never mind, I should have known, its a timestamp range. ...

when you do updates, are you changing any of the indexed fields, or just
"value" ?

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Chris Withers

chris@withers.org

over 9 years ago

In reply to: John R Pierce (#4)

Re: performance problems with bulk inserts/updates on tsrange with gist-based exclude constrains

On 16/09/2016 10:26, John R Pierce wrote:

On 9/16/2016 2:23 AM, John R Pierce wrote:

wait, what is a tsrange? the standard textsearch data types in
postgres are tsvector and tsquery,

never mind, I should have known, its a timestamp range. ...

when you do updates, are you changing any of the indexed fields, or
just "value" ?

Yeah, it's a temporal table, so "updates" involve modifying the period
column for a row to set its end ts, and then inserting a new row with a
start ts running on from that.

Of course, the adds are just inserting new rows.

cheers,

Chris

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Chris Withers

chris@simplistix.co.uk

over 9 years ago

In reply to: John R Pierce (#4)

Re: performance problems with bulk inserts/updates on tsrange with gist-based exclude constrains

On 16/09/2016 10:26, John R Pierce wrote:

On 9/16/2016 2:23 AM, John R Pierce wrote:

wait, what is a tsrange? the standard textsearch data types in
postgres are tsvector and tsquery,

never mind, I should have known, its a timestamp range. ...

when you do updates, are you changing any of the indexed fields, or
just "value" ?

Yeah, it's a temporal table, so "updates" involve modifying the period
column for a row to set its end ts, and then inserting a new row with a
start ts running on from that.

Of course, the adds are just inserting new rows.

cheers,

Chris

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

John R Pierce

pierce@hogranch.com

over 9 years ago

In reply to: Chris Withers (#6)

Re: performance problems with bulk inserts/updates on tsrange with gist-based exclude constrains

On 9/16/2016 3:46 AM, Chris Withers wrote:

when you do updates, are you changing any of the indexed fields, or
just "value" ?

Yeah, it's a temporal table, so "updates" involve modifying the period
column for a row to set its end ts, and then inserting a new row with
a start ts running on from that.

thats expensive, as it has to reindex that row. and range indexes are
more expensive than timestamp indexes

modifiyng the primary key is kind of a violation of one of the basic
rules of relational databases as it means the row can't be referenced by
another table.

I expect the expensive one is the constraint that ensures no periods
overlap for the given key. I'm not sure how that can be done short of
a full scan for each update/insert. it might actually perform better
if you write the index with the key first as presumably the key is
invariant ?

--
john r pierce, recycling bits in santa cruz

Chris Withers

chris@withers.org

over 9 years ago

In reply to: John R Pierce (#7)

Re: performance problems with bulk inserts/updates on tsrange with gist-based exclude constrains

On 16/09/2016 12:00, John R Pierce wrote:

On 9/16/2016 3:46 AM, Chris Withers wrote:

when you do updates, are you changing any of the indexed fields, or
just "value" ?

Yeah, it's a temporal table, so "updates" involve modifying the period
column for a row to set its end ts, and then inserting a new row with
a start ts running on from that.

thats expensive, as it has to reindex that row. and range indexes are
more expensive than timestamp indexes

modifiyng the primary key is kind of a violation of one of the basic
rules of relational databases as it means the row can't be referenced by
another table.

Right, but these rows have no natural primary key. Would it help if I
just added an auto-incrementing integer key? Would that make a positive
difference or would it just be a wasted column?

I expect the expensive one is the constraint that ensures no periods
overlap for the given key. I'm not sure how that can be done short of
a full scan for each update/insert.

Indeed, I wonder if making the constraint deferrable might help for the
bulk case?

it might actually perform better
if you write the index with the key first as presumably the key is
invariant ?

You mean:

PRIMARY KEY, btree (period, key) as opposed to

--
john r pierce, recycling bits in santa cruz

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Chris Withers

chris@simplistix.co.uk

over 9 years ago

In reply to: John R Pierce (#7)

Re: performance problems with bulk inserts/updates on tsrange with gist-based exclude constrains

On 16/09/2016 12:00, John R Pierce wrote:

On 9/16/2016 3:46 AM, Chris Withers wrote:

when you do updates, are you changing any of the indexed fields, or
just "value" ?

Yeah, it's a temporal table, so "updates" involve modifying the period
column for a row to set its end ts, and then inserting a new row with
a start ts running on from that.

thats expensive, as it has to reindex that row. and range indexes are
more expensive than timestamp indexes

modifiyng the primary key is kind of a violation of one of the basic
rules of relational databases as it means the row can't be referenced by
another table.

Right, but these rows have no natural primary key. Would it help if I
just added an auto-incrementing integer key? Would that make a positive
difference or would it just be a wasted column?

I expect the expensive one is the constraint that ensures no periods
overlap for the given key. I'm not sure how that can be done short of
a full scan for each update/insert.

Indeed, I wonder if making the constraint deferrable might help for the
bulk case?

it might actually perform better
if you write the index with the key first as presumably the key is
invariant ?

You mean:

PRIMARY KEY, btree (key1, key2, period)

as opposed to

PRIMARY KEY, btree (period, key)

Interesting, I'd assumed postgres would optimise that under the covers...

Chris

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#10

Igor Neyman

ineyman@perceptron.com

over 9 years ago

In reply to: Chris Withers (#6)

Re: performance problems with bulk inserts/updates on tsrange with gist-based exclude constrains

-----Original Message-----
From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-owner@postgresql.org] On Behalf Of Chris Withers
Sent: Friday, September 16, 2016 6:47 AM
To: John R Pierce <pierce@hogranch.com>; pgsql-general@postgresql.org
Subject: Re: [GENERAL] performance problems with bulk inserts/updates on tsrange with gist-based exclude constrains

On 16/09/2016 10:26, John R Pierce wrote:

On 9/16/2016 2:23 AM, John R Pierce wrote:

wait, what is a tsrange? the standard textsearch data types in
postgres are tsvector and tsquery,

never mind, I should have known, its a timestamp range. ...

when you do updates, are you changing any of the indexed fields, or
just "value" ?

Yeah, it's a temporal table, so "updates" involve modifying the period column for a row to set its end ts, and then inserting a new row with a start ts running on from that.

Of course, the adds are just inserting new rows.

cheers,

Chris

____________________________________________________________________________________________________

So, what is the value for "end ts", when the record is inserted (the range just started)?

Regards,
Igor Neyman

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#11

Chris Withers

chris@simplistix.co.uk

over 9 years ago

In reply to: Igor Neyman (#10)

Re: performance problems with bulk inserts/updates on tsrange with gist-based exclude constrains

On 16/09/2016 14:54, Igor Neyman wrote:

-----Original Message-----
From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-owner@postgresql.org] On Behalf Of Chris Withers
Sent: Friday, September 16, 2016 6:47 AM
To: John R Pierce <pierce@hogranch.com>; pgsql-general@postgresql.org
Subject: Re: [GENERAL] performance problems with bulk inserts/updates on tsrange with gist-based exclude constrains

On 16/09/2016 10:26, John R Pierce wrote:

On 9/16/2016 2:23 AM, John R Pierce wrote:

wait, what is a tsrange? the standard textsearch data types in
postgres are tsvector and tsquery,

never mind, I should have known, its a timestamp range. ...

when you do updates, are you changing any of the indexed fields, or
just "value" ?

Yeah, it's a temporal table, so "updates" involve modifying the period column for a row to set its end ts, and then inserting a new row with a start ts running on from that.

Of course, the adds are just inserting new rows.

So, what is the value for "end ts", when the record is inserted (the range just started)?

It's open ended, so the period is [start_ts, )

Chris

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#12

Tom Lane

tgl@sss.pgh.pa.us

over 9 years ago

In reply to: Chris Withers (#11)

Re: performance problems with bulk inserts/updates on tsrange with gist-based exclude constrains

Chris Withers <chris@simplistix.co.uk> writes:

On 16/09/2016 14:54, Igor Neyman wrote:

So, what is the value for "end ts", when the record is inserted (the range just started)?

It's open ended, so the period is [start_ts, )

I've not looked at the GiST range opclass, but I would not be surprised if
having lots of those is pretty destructive to the index's ability to be
selective about && searches.

regards, tom lane

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#13

Chris Withers

chris@simplistix.co.uk

over 9 years ago

In reply to: Tom Lane (#12)

Re: performance problems with bulk inserts/updates on tsrange with gist-based exclude constrains

On 16/09/2016 15:29, Tom Lane wrote:

Chris Withers <chris@simplistix.co.uk> writes:

On 16/09/2016 14:54, Igor Neyman wrote:

So, what is the value for "end ts", when the record is inserted (the range just started)?

It's open ended, so the period is [start_ts, )

I've not looked at the GiST range opclass, but I would not be surprised if
having lots of those is pretty destructive to the index's ability to be
selective about && searches.

If that's so, that's a little disappointing...
(I'd have thought the special case end value (open ended) and the ending
type (inclusive/exclusive) would just be sentinel values)

How would I verify your suspicions?

cheers,

Chris

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#14

Jeff Janes

jeff.janes@gmail.com

over 9 years ago

In reply to: Chris Withers (#1)

Re: performance problems with bulk inserts/updates on tsrange with gist-based exclude constrains

On Fri, Sep 16, 2016 at 2:01 AM, Chris Withers <chris@simplistix.co.uk>
wrote:

Hi All,

I have quite a few tables that follow a pattern like this:

Table "public.my_model"
Column | Type | Modifiers
--------+-------------------+-----------
period | tsrange | not null
key | character varying | not null
value | integer |
Indexes:
"my_model_pkey" PRIMARY KEY, btree (period, key)
"my_model_period_key_excl" EXCLUDE USING gist (period WITH &&, key
WITH =)
Check constraints:
"my_model_period_check" CHECK (period <> 'empty'::tsrange)

Try swapping the order of the columns in the exclude constraint. You want
the more selective criterion to appear first in the index/constraint.
Presumably "key with =" is the most selective, especially if many of your
periods are unbounded.

Cheers,

Jeff

#15

pinker

pinker@onet.eu

over 9 years ago

In reply to: Jeff Janes (#14)

Re: performance problems with bulk inserts/updates on tsrange with gist-based exclude constrains

Jeff Janes wrote

Try swapping the order of the columns in the exclude constraint. You want
the more selective criterion to appear first in the index/constraint.
Presumably "key with =" is the most selective, especially if many of your
periods are unbounded.

I would not be so sure with that:
http://use-the-index-luke.com/sql/myth-directory/most-selective-first

--
View this message in context: http://postgresql.nabble.com/performance-problems-with-bulk-inserts-updates-on-tsrange-with-gist-based-exclude-constrains-tp5921498p5922219.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#16

Jeff Janes

jeff.janes@gmail.com

over 9 years ago

In reply to: pinker (#15)

Re: Re: performance problems with bulk inserts/updates on tsrange with gist-based exclude constrains

On Wed, Sep 21, 2016 at 2:18 PM, pinker <pinker@onet.eu> wrote:

Jeff Janes wrote

Try swapping the order of the columns in the exclude constraint. You

want

the more selective criterion to appear first in the index/constraint.
Presumably "key with =" is the most selective, especially if many of your
periods are unbounded.

I would not be so sure with that:

As a rule, I generally don't spout random nonsense. Or at least, not
without including a disclaimer. I didn't test it on the OPs exact case,
because he has need blessed us with his data or his scripts. But I have
tested it on other data, and it does work.

http://use-the-index-luke.com/sql/myth-directory/most-selective-first

I don't see how anything there applies to GiST indexes. Indeed, there
doesn't seem to be much there worth reading at all. The only thing vaguely
informative, other than trivia about other RDBMS, is "It’s useless to have
the most selective column of the index on the left if very few queries
filter on it", which is rather obvious, but also obviously does not apply
to this case.

Cheers,

Jeff

#17

Chris Withers

chris@withers.org

over 9 years ago

In reply to: Chris Withers (#13)

Re: performance problems with bulk inserts/updates on tsrange with gist-based exclude constrains

Hey Tom,

I appreciate you're busy, but did you ever get a chance to look at this?

On 19/09/2016 08:40, Chris Withers wrote:

On 16/09/2016 15:29, Tom Lane wrote:

Chris Withers <chris@simplistix.co.uk> writes:

On 16/09/2016 14:54, Igor Neyman wrote:

So, what is the value for "end ts", when the record is inserted (the
range just started)?

It's open ended, so the period is [start_ts, )

I've not looked at the GiST range opclass, but I would not be
surprised if
having lots of those is pretty destructive to the index's ability to be
selective about && searches.

If that's so, that's a little disappointing...
(I'd have thought the special case end value (open ended) and the ending
type (inclusive/exclusive) would just be sentinel values)

How would I verify your suspicions?

cheers,

Chris

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#18

Chris Withers

chris@simplistix.co.uk

over 9 years ago

In reply to: Chris Withers (#13)

Re: performance problems with bulk inserts/updates on tsrange with gist-based exclude constrains

Hey Tom,

I appreciate you're busy, but did you ever get a chance to look at this?

On 19/09/2016 08:40, Chris Withers wrote:

On 16/09/2016 15:29, Tom Lane wrote:

Chris Withers <chris@simplistix.co.uk> writes:

On 16/09/2016 14:54, Igor Neyman wrote:

So, what is the value for "end ts", when the record is inserted (the
range just started)?

It's open ended, so the period is [start_ts, )

I've not looked at the GiST range opclass, but I would not be
surprised if
having lots of those is pretty destructive to the index's ability to be
selective about && searches.

If that's so, that's a little disappointing...
(I'd have thought the special case end value (open ended) and the ending
type (inclusive/exclusive) would just be sentinel values)

How would I verify your suspicions?

cheers,

Chris

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general