Partitioning

Started by James Sewellabout 11 years ago12 messagesgeneral

james.sewell@lisasoft.com

about 11 years ago

Hello,

I am using partitioning with around 100 sub-tables. Each sub-table is
around 11GB and partitioned on the 'id' column. I have an index on the id
column on each sub-table.

Is it possible to get a query like the following working using constraint
exclusion, or am I doomed to do index/sequential scans of every sub-table?

I want to select all rows which have an id which is in another query, so
something like:

WITH idlist as (SELECT id from othertable)
SELECT id from mastertable WHERE id = idlist.id);

I am guessing that I am not getting constraint exclusion to work as the
planner doesn't know the outcome of my subquery at plan time?

Any tricks I am overlooking?

James Sewell,
PostgreSQL Team Lead / Solutions Architect
______________________________________

Level 2, 50 Queen St, Melbourne VIC 3000

*P *(+61) 3 8370 8000 *W* www.lisasoft.com *F *(+61) 3 8370 8099

------------------------------
The contents of this email are confidential and may be subject to legal or
professional privilege and copyright. No representation is made that this
email is free of viruses or other defects. If you have received this
communication in error, you may not copy or distribute any part of it or
otherwise disclose its contents to anyone. Please advise the sender of your
incorrect receipt of this correspondence.

John R Pierce

pierce@hogranch.com

about 11 years ago

In reply to: James Sewell (#1)

Re: Partitioning

On 1/18/2015 5:58 PM, James Sewell wrote:

WITH idlist as (SELECT id from othertable)
SELECT id from mastertable WHERE id = idlist.id <http://idlist.id>);

select mt.id, ... from mastertable mt join othertable ot on
mt.id=ot.id;

might optimize better.

--
john r pierce 37N 122W
somewhere on the middle of the left coast

James Sewell

james.sewell@lisasoft.com

about 11 years ago

In reply to: John R Pierce (#2)

Re: Partitioning

Sadly not ... I still hit all the tables.

Cheers,

James Sewell,
PostgreSQL Team Lead / Solutions Architect
______________________________________

Level 2, 50 Queen St, Melbourne VIC 3000

*P *(+61) 3 8370 8000 *W* www.lisasoft.com *F *(+61) 3 8370 8099

On Mon, Jan 19, 2015 at 1:54 PM, John R Pierce <pierce@hogranch.com> wrote:

On 1/18/2015 5:58 PM, James Sewell wrote:

WITH idlist as (SELECT id from othertable)
SELECT id from mastertable WHERE id = idlist.id);

select mt.id, ... from mastertable mt join othertable ot on mt.id=
ot.id;

might optimize better.

--
john r pierce 37N 122W
somewhere on the middle of the left coast

Kyotaro Horiguchi

horikyota.ntt@gmail.com

about 11 years ago

In reply to: James Sewell (#3)

Re: Partitioning

Hello,

an 2015 14:13:37 +1100, James Sewell <james.sewell@lisasoft.com> wrote in <CANkGpBs8GypQ3TQGKdjTD+n-w1rkq5uO97h3tuhg5eWaKR6RbA@mail.gmail.com>

Sadly not ... I still hit all the tables.

| 5.9.4. Partitioning and Constraint Exclusion

http://www.postgresql.org/docs/9.4/static/ddl-partitioning.html

Constraint exclusion is a mechanism to omit tables that are known
to have no hit by the query *beforehand* execution. So the
criteria cannot rely on out of the query itself (and CHECK
constraints, of course).

Your query uses the result of the WITH-clause-query in the WHERE
clause which is unknown to the planner so constraint exclusion
does not work. JOINs don't change the situation.

On Mon, Jan 19, 2015 at 1:54 PM, John R Pierce <pierce@hogranch.com> wrote:

On 1/18/2015 5:58 PM, James Sewell wrote:

WITH idlist as (SELECT id from othertable)
SELECT id from mastertable WHERE id = idlist.id);

select mt.id, ... from mastertable mt join othertable ot on mt.id=
ot.id;

might optimize better.

As the result, the query inevitably scans all the tables, but not
necessariry in sequqntial scans or simple index scans. The
suggestion above seeems showing the notation which the planner
can find the better plans on that premise.

For example, if you have an index on id of one of the two tables,
(and some other conditions match, of course) index only scan will
be selected for it and the suggested query will give you a
seemingly better plan than your query.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

James Sewell

james.sewell@lisasoft.com

about 11 years ago

In reply to: Kyotaro Horiguchi (#4)

Re: Partitioning

Hello,

Thanks for the reply. I can write queries which get index scans, but they
are still slow.

Each index is about 2.5GB, I suspect I am trying to read a these into
memory in entirety.

Perhaps there is no way to tune this?

Cheers, james

On Monday, 19 January 2015, Kyotaro HORIGUCHI <
horiguchi.kyotaro@lab.ntt.co.jp> wrote:

Hello,

an 2015 14:13:37 +1100, James Sewell <james.sewell@lisasoft.com
<javascript:;>> wrote in <
CANkGpBs8GypQ3TQGKdjTD+n-w1rkq5uO97h3tuhg5eWaKR6RbA@mail.gmail.com
<javascript:;>>

Sadly not ... I still hit all the tables.

| 5.9.4. Partitioning and Constraint Exclusion

http://www.postgresql.org/docs/9.4/static/ddl-partitioning.html

Constraint exclusion is a mechanism to omit tables that are known
to have no hit by the query *beforehand* execution. So the
criteria cannot rely on out of the query itself (and CHECK
constraints, of course).

Your query uses the result of the WITH-clause-query in the WHERE
clause which is unknown to the planner so constraint exclusion
does not work. JOINs don't change the situation.

On Mon, Jan 19, 2015 at 1:54 PM, John R Pierce <pierce@hogranch.com

<javascript:;>> wrote:

On 1/18/2015 5:58 PM, James Sewell wrote:

WITH idlist as (SELECT id from othertable)
SELECT id from mastertable WHERE id = idlist.id);

select mt.id, ... from mastertable mt join othertable ot on

mt.id=

ot.id;

might optimize better.

As the result, the query inevitably scans all the tables, but not
necessariry in sequqntial scans or simple index scans. The
suggestion above seeems showing the notation which the planner
can find the better plans on that premise.

For example, if you have an index on id of one of the two tables,
(and some other conditions match, of course) index only scan will
be selected for it and the suggested query will give you a
seemingly better plan than your query.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

James Sewell,
PostgreSQL Team Lead / Solutions Architect
______________________________________

Level 2, 50 Queen St, Melbourne VIC 3000

*P *(+61) 3 8370 8000 *W* www.lisasoft.com *F *(+61) 3 8370 8099

John R Pierce

pierce@hogranch.com

about 11 years ago

In reply to: James Sewell (#5)

Re: Partitioning

On 1/18/2015 11:13 PM, James Sewell wrote:

Each index is about 2.5GB, I suspect I am trying to read a these into
memory in entirety.

an 11GB table with a (presumably integer) primary key requires an 2.5GB
index ? 100 of these would need 250GB of shared_buffers to stay
resident, not likely.

--
john r pierce 37N 122W
somewhere on the middle of the left coast

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

James Sewell

james.sewell@lisasoft.com

about 11 years ago

In reply to: John R Pierce (#6)

Re: Partitioning

Yeah definitely not staying resident once read, although the machine does
gave 256GB of memory so some will persist in the OS cache.

Actually this brings up another question, if I have an unique integer index
of 2.gb what percentage would I expect to read for a value that was higher
or lower than all index values?

Cheers,
James

On Monday, 19 January 2015, John R Pierce <pierce@hogranch.com> wrote:

On 1/18/2015 11:13 PM, James Sewell wrote:

Each index is about 2.5GB, I suspect I am trying to read a these into
memory in entirety.

an 11GB table with a (presumably integer) primary key requires an 2.5GB
index ? 100 of these would need 250GB of shared_buffers to stay
resident, not likely.

--
john r pierce 37N 122W
somewhere on the middle of the left coast

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

James Sewell,
PostgreSQL Team Lead / Solutions Architect
______________________________________

Level 2, 50 Queen St, Melbourne VIC 3000

*P *(+61) 3 8370 8000 *W* www.lisasoft.com *F *(+61) 3 8370 8099

John R Pierce

pierce@hogranch.com

about 11 years ago

In reply to: James Sewell (#7)

Re: Partitioning

On 1/18/2015 11:59 PM, James Sewell wrote:

Actually this brings up another question, if I have an unique integer
index of 2.gb <http://2.gb> what percentage would I expect to read for
a value that was higher or lower than all index values?

a couple 8k blocks. its a b-tree.

--
john r pierce 37N 122W
somewhere on the middle of the left coast

Kyotaro Horiguchi

horikyota.ntt@gmail.com

about 11 years ago

In reply to: John R Pierce (#8)

Re: Partitioning

Hi,

At Mon, 19 Jan 2015 00:14:55 -0800, John R Pierce <pierce@hogranch.com> wrote in <54BCBCFF.5040100@hogranch.com>

On 1/18/2015 11:59 PM, James Sewell wrote:

Actually this brings up another question, if I have an unique integer
index of 2.gb <http://2.gb> what percentage would I expect to read for
a value that was higher or lower than all index values?

a couple 8k blocks. its a b-tree.

Yeah, scanning children reading highest/lowest values for each of
them would make it faster than the first single query.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#10

Kyotaro Horiguchi

horikyota.ntt@gmail.com

about 11 years ago

In reply to: James Sewell (#3)

Re: Partitioning

Sorry,

Yeah, scanning children reading highest/lowest values for each of them

would make it faster than the first single query.

Mmm..no. It has nothing different from querieng on the parent table. Please
Ignore the silly thing.

--
Kyotaro Horiguchi

#11

François Beausoleil

francois@teksol.info

about 11 years ago

In reply to: James Sewell (#1)

Re: Partitioning

Le 2015-01-18 à 20:58, James Sewell <james.sewell@lisasoft.com> a écrit :

Hello,

I am using partitioning with around 100 sub-tables. Each sub-table is around 11GB and partitioned on the 'id' column. I have an index on the id column on each sub-table.

Is it possible to get a query like the following working using constraint exclusion, or am I doomed to do index/sequential scans of every sub-table?

I want to select all rows which have an id which is in another query, so something like:

WITH idlist as (SELECT id from othertable)
SELECT id from mastertable WHERE id = idlist.id);

I am guessing that I am not getting constraint exclusion to work as the planner doesn't know the outcome of my subquery at plan time?

How many rows in idlist? Can you do two queries? Fetch the ids, then call the 2nd query with those values hard-coded in the query.

If necessary, and you know where each partition starts, tou could sort in the app and query the correct ranges, in multiple queries.

Hope that helps!
François Beausoleil

#12

Kyotaro Horiguchi

horikyota.ntt@gmail.com

about 11 years ago

In reply to: François Beausoleil (#11)

Re: Partitioning

Hi,

19 Jan 2015 06:58:21 -0500, François Beausoleil <francois@teksol.info> wrote in <CC2FD572-320A-4225-B98C-48F20992527C@teksol.info>

Le 2015-01-18 à 20:58, James Sewell <james.sewell@lisasoft.com> a écrit :

Hello,

I am using partitioning with around 100 sub-tables. Each sub-table is around 11GB and partitioned on the 'id' column. I have an index on the id column on each sub-table.

Is it possible to get a query like the following working using constraint exclusion, or am I doomed to do index/sequential scans of every sub-table?

I want to select all rows which have an id which is in another query, so something like:

WITH idlist as (SELECT id from othertable)
SELECT id from mastertable WHERE id = idlist.id);

I am guessing that I am not getting constraint exclusion to work as the planner doesn't know the outcome of my subquery at plan time?

How many rows in idlist? Can you do two queries? Fetch the ids, then call the 2nd query with those values hard-coded in the query.

If necessary, and you know where each partition starts, tou could sort in the app and query the correct ranges, in multiple queries.

The strategy's effectiveness is depends mainly on how many ids
come from the othertable. It wins if fewer than certain number or
converged in a few partitions, however, straight joins will win
elsewise.

The result of EXPLAIN ANALYZE might draw more precise advices.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general