Re: Partition performance causing ddl commands to slow down significantly

Started by Burgess, Freddieabout 13 years ago4 messagesbugs

FBurgess@Radiantblue.com

about 13 years ago

<html><body><span style="font-family:Verdana; color:#000000; font-size:10pt;"><div>Sorry, Is this visible? </div>
<div> </div>
<div>We are having performance related problems on one of our big data Partition tables. The table is partitioned by date and the partitions are organized from Jan 2003 thru Dec 2013. <BR>We have 268 child partitions associated with the Parent table, and we have constraint_exclusion=partition set. </div>
<div>The execution of the SQL query:  select count(*) from dna_strands; </div>
<div>yields:  QUERY PLAN<BR>_____________________________________________________________________________________________   </div>
<div>Aggregate (cost=2246778.49..2246778.50 rows=1 width=0)<BR>  -> Append (0.00..2159647.04 rows=34852580 width=0)<BR>     -> Seq Scan on dna_strands (cost=0.00..0.00 rows=1 width)<BR>          Filter: (cid = 1)<BR>     -> Index Scan using dna_strands_y2003m01_cid on dna_strands_y2003m01 dna_strands (cost=0.00..677652 rows=1 width=0)<BR>           Index Cond: (cid = 1)<BR>     -> Index Scan using dna_strands_y2003m02_cid on dna_strands_y2003m02 dna_strands (cost=0.00..974423 rows=1 width=0)<BR>           Index Cond: (cid = 1)<BR>     -> Index Scan using dna_strands_y2003m03_cid on dna_strands_y2003m03 dna_strands (cost=0.00..992301 rows=1 width=0)<BR>           Index Cond: (cid = 1)<BR>     ...<BR>     ...</div>
<div>     -> Index Scan using dna_strands_y2013m12_cid on dna_strands_y2013m12 dna_strands (cost=0.00..8.27 rows=1 width=0)<BR>           Index Cond: (cid = 1)</div>
<div>Question: Is there any way to modify the Planner to do the inverse of the Index Scan's.  In other words, to start the index scans in reverse order from <BR>the most recent date to the oldest date, i.e. "dna_strands_y2013m12" backwards. Our application users query much more heavily at the most recent data that <BR>has been ingested into the PostgreSQL database.  Would this capability speed up query performance?</div>
<div> </div>
<div>Thanks</div>
<div><BR> </div>
<div> </div>
<BLOCKQUOTE style="BORDER-LEFT: blue 2px solid; PADDING-LEFT: 8px; FONT-FAMILY: verdana; COLOR: black; MARGIN-LEFT: 8px; FONT-SIZE: 10pt" id=replyBlockquote webmail="1">
<DIV id=wmQuoteWrapper>-------- Original Message --------<BR>Subject: Re: [BUGS] Partition performance causing ddl commands to slow<BR>down significantly<BR>From: Andres Freund <<a href="mailto:andres@2ndquadrant.com">andres@2ndquadrant.com</a>><BR>Date: Fri, April 12, 2013 11:36 am<BR>To: <a href="mailto:fburgess@radiantblue.com">fburgess@radiantblue.com</a><BR><BR>Hi,<BR><BR>On 2013-04-12 11:31:33 -0700, <a href="mailto:fburgess@radiantblue.com">fburgess@radiantblue.com</a> wrote:<BR>> <html><body><span style="font-family:Verdana; color:#000000; font-size:10pt;"><div>We are having performance related problems on one of our big data Partition tables. The table is partitioned by date and the partitions are organized from Jan 2003 thru Dec 2013. <BR>We have 268 child partitions associated with the Parent table, and we have constraint_exclusion=partition set. </div><BR>> <div>&nbsp;</div><BR>> <div>The execution of the SQL query:&nbsp; select count(*) from dna_strands; </div><BR>> <div>yields:&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; QUERY PLAN<BR>_____________________________________________________________________________________________&nbsp;&nbsp; </div><BR>> <div>Aggregate (cost=2246778.49..2246778.50 rows=1 width=0)<BR>&nbsp; -&gt; Append (0.00..2159647.04 rows=34852580 width=0)<BR>&nbsp;&nbsp;&nbsp;&nbsp; -&gt; Seq Scan on dna_strands (cost=0.00..0.00 rows=1 width)<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Filter: (cid = 1)<BR>&nbsp;&nbsp;&nbsp;&nbsp; -&gt; Index Scan using dna_strands_y2003m01_cid on dna_strands_y2003m01 dna_strands (cost=0.00..677652 rows=1 width=0)<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Index Cond: (cid = 1)<BR>&nbsp;&nbsp;&nbsp;&nbsp; -&gt; Index Scan using dna_strands_y2003m02_cid on dna_strands_y2003m02 dna_strands (cost=0.00..974423 rows=1 width=0)<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Index Cond: (cid = 1)<BR>&nbsp;&nbsp;&nbsp;&nbsp; -&gt; Index Scan using dna_strands_y2003m03_cid on dna_strands_y2003m03 dna_strands (cost=0.00..992301 rows=1 width=0)<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Index Cond: (cid = 1)<BR>&nbsp;&nbsp;&nbsp;&nbsp; ...<BR>&nbsp;&nbsp;&nbsp;&nbsp; ...</div><BR>> <div>&nbsp;&nbsp;&nbsp;&nbsp; ...</div><BR>> <div>&nbsp;</div><BR>> <div>&nbsp;&nbsp;&nbsp;&nbsp; -&gt; Index Scan using dna_strands_y2013m12_cid on dna_strands_y2013m12 dna_strands (cost=0.00..8.27 rows=1 width=0)<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Index Cond: (cid = 1)</div><BR>> <div>Question: Is there any way to modify the Planner to do the inverse of the Index Scan's.&nbsp; In other words, to start the index scans in reverse order from <BR>the most recent date to the oldest date, i.e. "dna_strands_y2013m12" backwards. Our application users query much more heavily at the most recent data that <BR>has been ingested into the PostgreSQL database.&nbsp; Would this capability speed up query performance?</div><BR>> <div>Thanks</div><BR>> <div><BR>&nbsp;</div></span></body></html><BR><BR>Youre sending completely unreadable html only mails again.<BR><BR>Greetings,<BR><BR>Andres Freund<BR><BR>-- <BR>Andres Freund <a href="http://www.2ndQuadrant.com">http://www.2ndQuadrant.com</a>/<BR>PostgreSQL Development, 24x7 Support, Training & Services<BR></DIV></BLOCKQUOTE></span></body></html>

John R Pierce

pierce@hogranch.com

about 13 years ago

In reply to: Burgess, Freddie (#1)

On 4/12/2013 11:46 AM, fburgess@radiantblue.com wrote:

We are having performance related problems on one of our big data
Partition tables. The table is partitioned by date and the partitions
are organized from Jan 2003 thru Dec 2013.
We have 268 child partitions associated with the Parent table, and we
have constraint_exclusion=partition set.
The execution of the SQL query: select count(*) from dna_strands;
yields: QUERY PLAN
_____________________________________________________________________________________________

Aggregate (cost=2246778.49..2246778.50 rows=1 width=0)
-> Append (0.00..2159647.04 rows=34852580 width=0)
-> Seq Scan on dna_strands (cost=0.00..0.00 rows=1 width)
Filter: (cid = 1)
-> Index Scan using dna_strands_y2003m01_cid on
dna_strands_y2003m01 dna_strands (cost=0.00..677652 rows=1 width=0)
Index Cond: (cid = 1)
-> Index Scan using dna_strands_y2003m02_cid on
dna_strands_y2003m02 dna_strands (cost=0.00..974423 rows=1 width=0)
Index Cond: (cid = 1)
-> Index Scan using dna_strands_y2003m03_cid on
dna_strands_y2003m03 dna_strands (cost=0.00..992301 rows=1 width=0)
Index Cond: (cid = 1)
...
...
-> Index Scan using dna_strands_y2013m12_cid on
dna_strands_y2013m12 dna_strands (cost=0.00..8.27 rows=1 width=0)
Index Cond: (cid = 1)
Question: Is there any way to modify the Planner to do the inverse of
the Index Scan's. In other words, to start the index scans in reverse
order from
the most recent date to the oldest date, i.e. "dna_strands_y2013m12"
backwards. Our application users query much more heavily at the most
recent data that
has been ingested into the PostgreSQL database. Would this capability
speed up query performance?

it wouldn't speed up your example, as your example has to scan every
single row of the whole mess. not sure where Filter: (cid=1) comes
from, since you showed the query as SELECT COUNT(*) FROM dna_strands;

--
john r pierce 37N 122W
somewhere on the middle of the left coast

Burgess, Freddie

FBurgess@Radiantblue.com

about 13 years ago

In reply to: John R Pierce (#2)

<html><body><span style="font-family:Verdana; color:#000000; font-size:10pt;"><div>Hi John,</div>
<div> </div>
<div>The SQL query should be: select count(*) from dna_strands where cid = 1;</div>
<div> </div>
<div>I  just realize don't think this is not going to work. if for the sake of argument that cid = 1 is much more likely be be found in a more recent partition, any inverse search mechanism in the planner will find that match first but then continue through all of the other partitions, Right? The only way to optimize this is to find some way of adding the partition key date to the where clause.</div>
<div> </div>
<div>In actuality the query's are being generated via Hibernate</div>
<div> </div>
<div>thanks</div>
<BLOCKQUOTE style="BORDER-LEFT: blue 2px solid; PADDING-LEFT: 8px; FONT-FAMILY: verdana; COLOR: black; MARGIN-LEFT: 8px; FONT-SIZE: 10pt" id=replyBlockquote webmail="1">
<DIV id=wmQuoteWrapper>-------- Original Message --------<BR>Subject: Re: [BUGS] Partition performance causing ddl commands to slow<BR>down significantly<BR>From: John R Pierce <<a href="mailto:pierce@hogranch.com">pierce@hogranch.com</a>><BR>Date: Fri, April 12, 2013 11:53 am<BR>To: <a href="mailto:pgsql-bugs@postgresql.org">pgsql-bugs@postgresql.org</a><BR><BR>
<DIV class=moz-cite-prefix>On 4/12/2013 11:46 AM, <A class=moz-txt-link-abbreviated href="mailto:fburgess@radiantblue.com" target=_blank>fburgess@radiantblue.com</A> wrote:<BR></DIV>
<BLOCKQUOTE cite=mid:20130412114630.5a830134ae84016b0174832fdc1a3173.fc7948457c.wbe@email11.secureserver.net type="cite"><SPAN style="FONT-FAMILY: Verdana; COLOR: #000000; FONT-SIZE: 10pt">
<DIV>We are having performance related problems on one of our big data Partition tables. The table is partitioned by date and the partitions are organized from Jan 2003 thru Dec 2013. <BR>We have 268 child partitions associated with the Parent table, and we have constraint_exclusion=partition set. </DIV>
<DIV>The execution of the SQL query:  select count(*) from dna_strands; </DIV>
<DIV>yields:  QUERY PLAN<BR>_____________________________________________________________________________________________   </DIV>
<DIV>Aggregate (cost=2246778.49..2246778.50 rows=1 width=0)<BR>  -> Append (0.00..2159647.04 rows=34852580 width=0)<BR>     -> Seq Scan on dna_strands (cost=0.00..0.00 rows=1 width)<BR>          Filter: (cid = 1)<BR>     -> Index Scan using dna_strands_y2003m01_cid on dna_strands_y2003m01 dna_strands (cost=0.00..677652 rows=1 width=0)<BR>           Index Cond: (cid = 1)<BR>     -> Index Scan using dna_strands_y2003m02_cid on dna_strands_y2003m02 dna_strands (cost=0.00..974423 rows=1 width=0)<BR>           Index Cond: (cid = 1)<BR>     -> Index Scan using dna_strands_y2003m03_cid on dna_strands_y2003m03 dna_strands (cost=0.00..992301 rows=1 width=0)<BR>           Index Cond: (cid = 1)<BR>     ...<BR>     ...</DIV>
<DIV>     -> Index Scan using dna_strands_y2013m12_cid on dna_strands_y2013m12 dna_strands (cost=0.00..8.27 rows=1 width=0)<BR>           Index Cond: (cid = 1)</DIV>
<DIV>Question: Is there any way to modify the Planner to do the inverse of the Index Scan's.  In other words, to start the index scans in reverse order from <BR>the most recent date to the oldest date, i.e. "dna_strands_y2013m12" backwards. Our application users query much more heavily at the most recent data that <BR>has been ingested into the PostgreSQL database.  Would this capability speed up query performance?</DIV></SPAN></BLOCKQUOTE><BR>it wouldn't speed up your example, as your example has to scan every single row of the whole mess.   not sure where Filter: (cid=1) comes from, since you showed the query as SELECT COUNT(*) FROM dna_strands;<BR><BR><BR><PRE class=moz-signature cols="72">-- john r pierce 37N 122W somewhere on the middle of the left coast</PRE></DIV></BLOCKQUOTE></span></body></html>

Import Notes

Resolved by subject fallback

John R Pierce

pierce@hogranch.com

about 13 years ago

In reply to: Burgess, Freddie (#3)

On 4/12/2013 12:10 PM, fburgess@radiantblue.com wrote:

The SQL query should be: select count(*) from dna_strands where cid = 1;

ah. I suspected as much. its so hard to analyze problems with
incorrect information and so easy to make wrong assumptions.

I just realize don't think this is not going to work. if for the sake
of argument that cid = 1 is much more likely be be found in a more
recent partition, any inverse search mechanism in the planner will
find that match first but then continue through all of the other
partitions, Right? The only way to optimize this is to find some way
of adding the partition key date to the where clause.

yeah, pretty much. all partitions HAVE to be scanned in case there's a
row with that condition, there's no way of avoiding that.

and yeah, abstraction layers like Hibernate make things even harder to
manage

btw, I think you said 268 child partitions. that is, in my experience,
way too many. we try and keep partitions under a few dozen even on our
multi-terabyte tables. 6 months by week is about as far as we go.
with your 10 year data, I'd probably partition by quarter, or something.

--
john r pierce 37N 122W
somewhere on the middle of the left coast