Proposed Query Planner TODO items

Started by Josh Berkusabout 22 years ago54 messages

josh@agliodbs.com

about 22 years ago

PG Folks,

What follows are a couple of proposed TODO items to make up for some of the
places our planner is weak compared to other leading databases.
Particularly, I'm personally concerned that as of 7.4.0 we would "fail" the
TPC benchmark even if someone sponsored us for it (see Issue #2 below).

I freely admit that I don't have the skill to implement either of these;
instead, I want them on the TODO list just so we don't lose track of them,
and just in case some new brilliant coder jumps into our community looking
for something to do.

1) MAINTAIN CORROLARY STATS ON FORIEGN KEYS

Summary: Keep correspondance statistics between FK columns.

Description: One of the areas of ongoing planner estimation problems
estimation of cross-table correspondence of column values. Indeed, as late
a 7.2.4 the WHERE EXISTS code just estimated a flat 50%.
While it would be practically impossible to maintain statistics between all
columns in a database that might possibly be compared, there is one class of
cross-table column comparisons which is both used heavily and is readily
identifiable: foriegn keys.
My proposal is to keep statistics on the correlation of values between the
key and foriegn key values in order to arrive at better estimates. Adapting
the newly committed pg_indexstats to track this as well seems to me to be the
easiest method, but I'll admit to not really understanding Manfried's code.

NOTE: This suggestion was dicussed on Hackers early last summer and received
general approval but somehow never ended up on the TODO list.

2) DEVELOP BETTER PLANS FOR "OR GROUP" QUERIES

Summary: Currently, queries with complex "or group" criteria get devolved by
the planner into canonical and-or filters resulting in very poor execution on
large data sets. We should find better ways of dealing with these queries,
for example UNIONing.

Description: While helping OSDL with their derivative TPC-R benchmark, we ran
into a query (#19) which took several hours to complete on PostgreSQL. It
was in the general form:

SELECT t1.a, t2.b
FROM t1, t2
WHERE t1.a = t2.a
AND (
( t1.c = x
AND t1.f IN (m, n, o)
AND t2.d = v
AND t2.e BETWEEN j AND k
)
OR
( t1.c = y
AND t1.f IN (n, o, p)
AND t2.d = v
AND t2.e BETWEEN k AND h
)
OR
( t1.c = z
AND t1.f IN (p, q)
AND t2.d = w
AND t2.e BETWEEN k AND h
)
)

The reason why this query is included in the TPC benchmarks is the reason I've
run into problems with similar querys before; it is the kind of query
produced by many 3rd-party decision-support and reporting applications. Its
distinguishing feature is the same thing which gives PG indigestion; the
distinct OR groups with a complex set of criteria for each.

Or planner's approach to this sort of query is to devolve the criteria into a
3-page long set of canonical and-or filters, and seq scan the entire
underlying data set. This is fine if the data set is small, but if it's
several times the size of RAM, a full-table seq scan is fatal, as it is for
TPC-R which seems specifically designed to test for this kind of failure.

One solution which suggests itself is that the following query form runs in a
couple of seconds:

SELECT t1.a, t2.b
FROM t1, t2
WHERE t1.a = t2.a
AND t1.c = x
AND t1.f IN (m, n, o)
AND t2.d = v
AND t2.e BETWEEN j AND k
UNION ALL
SELECT t1.a, t2.b
FROM t1, t2
WHERE t1.a = t2.a
AND t1.c = y
AND t1.f IN (n, o, p)
AND t2.d = v
AND t2.e BETWEEN k AND h
UNION ALL
SELECT t1.a, t2.b
FROM t1, t2
AND t1.c = z
AND t1.f IN (p, q)
AND t2.d = w
AND t2.e BETWEEN k AND h

So the trick would be teaching the planner to:
a) recognize an "or group query" when it sees one;
b) break down that query into a multi-part union and estimate the cost

However, I'm sure there are other possible solutions. Oracle and MSSQL have
solved this particular query problem; anybody know how they do it?

--
Josh Berkus
Aglio Database Solutions
San Francisco

Josh Berkus

josh@agliodbs.com

about 22 years ago

In reply to: Josh Berkus (#1)

Re: Proposed Query Planner TODO items

John,

SELECT t1.a, t2.b
FROM t1, t2
WHERE t1.a = t2.a
AND t1.c = x
AND t1.f IN (m, n, o)
AND t2.d = v
AND t2.e BETWEEN j AND k
UNION ALL

Shouldn't that be "UNION" instead of "UNION ALL"? You don't want
duplicate rows, if i'm not mistaken.

Yes, you're correct; I copied UNION ALL from a test case which was not
generic. In general, one would want UNION.

--
-Josh Berkus
Aglio Database Solutions
San Francisco

Import Notes

Reply to msg id not found: 20031205195110.GA19331@performics.com

Greg Stark

gsstark@mit.edu

about 22 years ago

In reply to: Josh Berkus (#1)

Re: Proposed Query Planner TODO items

I know Oracle is capable of producing the UNION plan. but I don't know if
that's the only option. I'm curious what indexes the rewritten union-based
query used.

Josh Berkus <josh@agliodbs.com> writes:

SELECT t1.a, t2.b
FROM t1, t2
WHERE t1.a = t2.a
AND (
( t1.c = x
AND t1.f IN (m, n, o)
AND t2.d = v
AND t2.e BETWEEN j AND k
)
OR
( t1.c = y
AND t1.f IN (n, o, p)
AND t2.d = v
AND t2.e BETWEEN k AND h
)
OR
( t1.c = z
AND t1.f IN (p, q)
AND t2.d = w
AND t2.e BETWEEN k AND h
)
)

In this case it seems like it might be possible to look for a covering set
that is guaranteed to include all the records and doesn't include any ORs. If
that covering set can be scanned quickly then the complex conditions could be
tested on the resulting records individually.

In this case it would be something like

select t1.a,t2.b from t1,t2 where t1.a = t2.a
and ( t1.c in (x,y,z)
and t1.f in (m,n,o,p,q)
and t2.d in (v,w)
and t2.e between min(j,k) and max(k,h)
)
and (.... the above constraints...)

It seems like it would be a lot of work and only help in narrow cases though.

--
greg

Tom Lane

tgl@sss.pgh.pa.us

about 22 years ago

In reply to: Josh Berkus (#1)

Re: Proposed Query Planner TODO items

Josh Berkus <josh@agliodbs.com> writes:

Summary: Currently, queries with complex "or group" criteria get devolved by
the planner into canonical and-or filters resulting in very poor execution on
large data sets. We should find better ways of dealing with these queries,
for example UNIONing.

Could we see the actual present query plans for both the TPC-R query
and the UNION version? (I'll settle for "explain" on the slow
version, but "explain analyze" on the other, please.)

In general I am suspicious of proposals to rewrite queries into UNION
"equivalents", because the "equivalent" usually isn't exactly
equivalent, at least not without conditions that the planner can't
easily prove. This proposal looks a lot like the KSQO optimization that
we put in and then took out again several years ago --- it also rewrote
queries into a UNION form, only the UNION didn't necessarily produce
identical results.

I am thinking that the guys who do this query fast are probably
extracting single-relation subsets of the big OR/AND clause, so that
they can do some filtering of the input tables before the join. Our
existing planner would think that the OR/AND clause is only usable at
the join step, which is why it's seqscanning. But if we pulled out
subsets, we could have for instance

WHERE t1.a = t2.a
AND (
( t1.c = x
AND t1.f IN (m, n, o)
AND t2.d = v
AND t2.e BETWEEN j AND k
)
OR
( t1.c = y
AND t1.f IN (n, o, p)
AND t2.d = v
AND t2.e BETWEEN k AND h
)
OR
( t1.c = z
AND t1.f IN (p, q)
AND t2.d = w
AND t2.e BETWEEN k AND h
)
)
AND ( t1.c = x OR t1.c = y OR t1.c = z )

which is redundant, but that last clause could enable an indexscan on t1.c.

However ... the planner has code in it already that should do something
close to that, so there may be something I am missing. Again, could we
see EXPLAIN results?

regards, tom lane

Josh Berkus

josh@agliodbs.com

about 22 years ago

In reply to: Tom Lane (#4)

Re: Proposed Query Planner TODO items

Tom,

In general I am suspicious of proposals to rewrite queries into UNION
"equivalents", because the "equivalent" usually isn't exactly
equivalent, at least not without conditions that the planner can't
easily prove.

As I said, I'm not sure that UNIONing the query is the solution, we just need
something other than what the planner currently does, which does not
complete.

Explains later today.

--
Josh Berkus
Aglio Database Solutions
San Francisco

Josh Berkus

josh@agliodbs.com

about 22 years ago

In reply to: Tom Lane (#4)

Re: Proposed Query Planner TODO items

Tom,

Could we see the actual present query plans for both the TPC-R query
and the UNION version? (I'll settle for "explain" on the slow
version, but "explain analyze" on the other, please.)

I'm not going to be able to set this up. I just had to put my server into
cold storage due to dismantling my office, and running the TPC stuff on my
laptop is a joke.

I'll contact the OSDL folks to see if they can run it.

--
Josh Berkus
Aglio Database Solutions
San Francisco

Joshua D. Drake

jd@commandprompt.com

about 22 years ago

In reply to: Josh Berkus (#6)

Re: Proposed Query Planner TODO items

I'm not going to be able to set this up. I just had to put my server into
cold storage due to dismantling my office, and running the TPC stuff on my
laptop is a joke.

I'll contact the OSDL folks to see if they can run it.

We can... depending on what you need for a server.

-- 
Command Prompt, Inc., home of Mammoth PostgreSQL - S/ODBC and S/JDBC
Postgresql support, programming shared hosting and dedicated hosting.
+1-503-667-4564 - jd@commandprompt.com - http://www.commandprompt.com
Mammoth PostgreSQL Replicator. Integrated Replication for PostgreSQL

Tom Lane

tgl@sss.pgh.pa.us

about 22 years ago

In reply to: Josh Berkus (#1)

Re: Proposed Query Planner TODO items

Josh Berkus <josh@agliodbs.com> writes:

2) DEVELOP BETTER PLANS FOR "OR GROUP" QUERIES

Summary: Currently, queries with complex "or group" criteria get devolved by
the planner into canonical and-or filters resulting in very poor execution on
large data sets. We should find better ways of dealing with these queries,
for example UNIONing.

Description: While helping OSDL with their derivative TPC-R benchmark, we ran
into a query (#19) which took several hours to complete on PostgreSQL.

I've made some progress on this over the last week or two. Would it be
possible to retry that benchmark with CVS tip?

regards, tom lane

Noname

markw@osdl.org

about 22 years ago

In reply to: Tom Lane (#8)

Re: Proposed Query Planner TODO items

On 5 Jan, Tom Lane wrote:

Josh Berkus <josh@agliodbs.com> writes:

2) DEVELOP BETTER PLANS FOR "OR GROUP" QUERIES

Summary: Currently, queries with complex "or group" criteria get devolved by
the planner into canonical and-or filters resulting in very poor execution on
large data sets. We should find better ways of dealing with these queries,
for example UNIONing.

Description: While helping OSDL with their derivative TPC-R benchmark, we ran
into a query (#19) which took several hours to complete on PostgreSQL.

I've made some progress on this over the last week or two. Would it be
possible to retry that benchmark with CVS tip?

Yeah, no problem. We'll pull the code from CVS and give it a try.

Mark

#10

Josh Berkus

josh@agliodbs.com

about 22 years ago

In reply to: Tom Lane (#8)

Re: Proposed Query Planner TODO items

Tom,

I've made some progress on this over the last week or two. Would it be
possible to retry that benchmark with CVS tip?

Yes! I'll just need some time to get my laptop set up for running it. My
server is, alas, in storage due to me being "between offices".

--
-Josh Berkus
Aglio Database Solutions
San Francisco

#11

Noname

markw@osdl.org

almost 22 years ago

In reply to: Tom Lane (#8)

Re: Proposed Query Planner TODO items

On 5 Jan, Tom Lane wrote:

Josh Berkus <josh@agliodbs.com> writes:

2) DEVELOP BETTER PLANS FOR "OR GROUP" QUERIES

Summary: Currently, queries with complex "or group" criteria get devolved by
the planner into canonical and-or filters resulting in very poor execution on
large data sets. We should find better ways of dealing with these queries,
for example UNIONing.

Description: While helping OSDL with their derivative TPC-R benchmark, we ran
into a query (#19) which took several hours to complete on PostgreSQL.

I've made some progress on this over the last week or two. Would it be
possible to retry that benchmark with CVS tip?

regards, tom lane

Sorry it's taking so long. I tried to take a export from CVS today and
the database appears not to be able to connect to the postmaster when I
attempt to create the database. Let me know if getting a trace of
anything will help, if you guys already aren't already aware of the
problem.

Mark

#12

Tom Lane

tgl@sss.pgh.pa.us

almost 22 years ago

In reply to: Noname (#11)

Re: Proposed Query Planner TODO items

markw@osdl.org writes:

Sorry it's taking so long. I tried to take a export from CVS today and
the database appears not to be able to connect to the postmaster when I
attempt to create the database. Let me know if getting a trace of
anything will help, if you guys already aren't already aware of the
problem.

CVS tip is not broken to my knowledge. Details please?

regards, tom lane

#13

Noname

markw@osdl.org

almost 22 years ago

In reply to: Tom Lane (#12)

1 attachment(s)

Re: Proposed Query Planner TODO items

On 6 Feb, Tom Lane wrote:

markw@osdl.org writes:

Sorry it's taking so long. I tried to take a export from CVS today and
the database appears not to be able to connect to the postmaster when I
attempt to create the database. Let me know if getting a trace of
anything will help, if you guys already aren't already aware of the
problem.

CVS tip is not broken to my knowledge. Details please?

I ran this:

$ strace -o /tmp/initdb-7.5.out initdb -D /opt/pgdb/dbt2
The files belonging to this database system will be owned by user "markw".
This user must also own the server process.

The database cluster will be initialized with locale C.

creating directory /opt/pgdb/dbt2 ... ok
creating directory /opt/pgdb/dbt2/global ... ok
creating directory /opt/pgdb/dbt2/pg_xlog ... ok
creating directory /opt/pgdb/dbt2/pg_clog ... ok
creating directory /opt/pgdb/dbt2/base ... ok
creating directory /opt/pgdb/dbt2/base/1 ... ok
selecting default max_connections ... 100
selecting default shared_buffers ... 1000
creating configuration files ... ok
creating template1 database in /opt/pgdb/dbt2/base/1 ... ERROR: relnatts disagrees with indnatts for index 16601
initdb: child process exited with exit code 1
initdb: failed
initdb: removing data directory "/opt/pgdb/dbt2"

I've never seen this relnatts and indnatts disagreements message before.
I'll attach a compressed strace.

Thanks,
Mark

#14

Tom Lane

tgl@sss.pgh.pa.us

almost 22 years ago

In reply to: Noname (#13)

Re: Proposed Query Planner TODO items

markw@osdl.org writes:

creating template1 database in /opt/pgdb/dbt2/base/1 ... ERROR: relnatts disagrees with indnatts for index 16601

Wow, that's a bizarre one. Are you sure you did a clean rebuild?
I usually like to do "make distclean" before or after "cvs update";
it tends to save me a lot of wasted time chasing build inconsistencies.
Which is what I suspect this is.

FWIW, my last CVS pull was yesterday about 15:00 EST, and it works fine.

regards, tom lane

#15

Noname

markw@osdl.org

almost 22 years ago

In reply to: Tom Lane (#14)

Re: Proposed Query Planner TODO items

On 6 Feb, Tom Lane wrote:

markw@osdl.org writes:

creating template1 database in /opt/pgdb/dbt2/base/1 ... ERROR: relnatts disagrees with indnatts for index 16601

Wow, that's a bizarre one. Are you sure you did a clean rebuild?
I usually like to do "make distclean" before or after "cvs update";
it tends to save me a lot of wasted time chasing build inconsistencies.
Which is what I suspect this is.

FWIW, my last CVS pull was yesterday about 15:00 EST, and it works fine.

Well, that make distclean did the trick. I actually did an export this
morning, not a checkout, but not like that should matter. Ok, will
hopefully get back with results soon.

Thanks,
Mark

#16

Noname

markw@osdl.org

almost 22 years ago

In reply to: Noname (#15)

Re: Proposed Query Planner TODO items

On 6 Feb, To: tgl@sss.pgh.pa.us wrote:

On 5 Jan, Tom Lane wrote:

Josh Berkus <josh@agliodbs.com> writes:

2) DEVELOP BETTER PLANS FOR "OR GROUP" QUERIES

Summary: Currently, queries with complex "or group" criteria get devolved by
the planner into canonical and-or filters resulting in very poor execution on
large data sets. We should find better ways of dealing with these queries,
for example UNIONing.

Description: While helping OSDL with their derivative TPC-R benchmark, we ran
into a query (#19) which took several hours to complete on PostgreSQL.

http://developer.osdl.org/markw/dbt3-pgsql/

There's a short summary of the tests I ran over the weekend, with links
to detailed retults. Comparing runs 43 (7.4) and 52 (7.5devel), it
looks like query #7 had the only significant improvement. Oprofile data
should be there too, if that'll help. Let us know if there's anything
else we can try for you.

Mark

Import Notes

Resolved by subject fallback

#17

Tom Lane

tgl@sss.pgh.pa.us

almost 22 years ago

In reply to: Noname (#16)

Re: Proposed Query Planner TODO items

markw@osdl.org writes:

http://developer.osdl.org/markw/dbt3-pgsql/

There's a short summary of the tests I ran over the weekend, with links
to detailed retults. Comparing runs 43 (7.4) and 52 (7.5devel), it
looks like query #7 had the only significant improvement. Oprofile data
should be there too, if that'll help. Let us know if there's anything
else we can try for you.

I couldn't figure out anything at all from that, possibly because many
of the links are dead, eg the "task" descriptions. I don't even see
where you see the time for query #7.

What would be interesting from my perspective is "explain" results (or
even better, "explain analyze" results) for the problem queries. Any
chance of extracting such a thing?

regards, tom lane

#18

Noname

markw@osdl.org

almost 22 years ago

In reply to: Tom Lane (#17)

Re: Proposed Query Planner TODO items

On 9 Feb, Tom Lane wrote:

markw@osdl.org writes:

http://developer.osdl.org/markw/dbt3-pgsql/

There's a short summary of the tests I ran over the weekend, with links
to detailed retults. Comparing runs 43 (7.4) and 52 (7.5devel), it
looks like query #7 had the only significant improvement. Oprofile data
should be there too, if that'll help. Let us know if there's anything
else we can try for you.

I couldn't figure out anything at all from that, possibly because many
of the links are dead, eg the "task" descriptions. I don't even see
where you see the time for query #7.

What would be interesting from my perspective is "explain" results (or
even better, "explain analyze" results) for the problem queries. Any
chance of extracting such a thing?

Sorry about the task links, I think I've got that corrected.

I'll see what I can do about the "explain" and "explain analyze"
results. I remember in the past that someone said it would be most
interesting to execute the latter while the test while running, as
opposed to before or after a test. Should I do that here too?

Mark

#19

Tom Lane

tgl@sss.pgh.pa.us

almost 22 years ago

In reply to: Noname (#18)

Re: Proposed Query Planner TODO items

markw@osdl.org writes:

I'll see what I can do about the "explain" and "explain analyze"
results. I remember in the past that someone said it would be most
interesting to execute the latter while the test while running, as
opposed to before or after a test. Should I do that here too?

If possible, but I'd settle for a standalone result, so long as it's
executed against the correct database contents (including pg_statistic
settings).

regards, tom lane

#20

Noname

markw@osdl.org

almost 22 years ago

In reply to: Tom Lane (#19)

Re: Proposed Query Planner TODO items

On 9 Feb, Tom Lane wrote:

markw@osdl.org writes:

I'll see what I can do about the "explain" and "explain analyze"
results. I remember in the past that someone said it would be most
interesting to execute the latter while the test while running, as
opposed to before or after a test. Should I do that here too?

If possible, but I'd settle for a standalone result, so long as it's
executed against the correct database contents (including pg_statistic
settings).

Ok, I've found that the kit does capture "explain" results and I've
added a "Query Plans" links under the query time charts on each of the
pages. Um, but I did notice a couple of problems. It looks liks one of
the 22 queries is missing and they're not labeled. I'll see about
getting that fixed.

Mark

#21

Josh Berkus

josh@agliodbs.com

almost 22 years ago

In reply to: Noname (#20)

Re: Proposed Query Planner TODO items

Mark,

Ok, I've found that the kit does capture "explain" results and I've
added a "Query Plans" links under the query time charts on each of the
pages. Um, but I did notice a couple of problems. It looks liks one of
the 22 queries is missing and they're not labeled. I'll see about
getting that fixed.

If #19 is missing it's because Oleg & I could not get it to complete. That
was also the query which we are most interested in testing.

--
Josh Berkus
Aglio Database Solutions
San Francisco

#22

Tom Lane

tgl@sss.pgh.pa.us

almost 22 years ago

In reply to: Josh Berkus (#21)

Re: Proposed Query Planner TODO items

Josh Berkus <josh@agliodbs.com> writes:

If #19 is missing it's because Oleg & I could not get it to complete. That
was also the query which we are most interested in testing.

Q19 doesn't seem to be particularly slow in either the 7.4 or 7.5 tests
--- there are many others with longer runtimes.  I speculate that what
is actually being run here is a modified Q19 query with the merge join
condition pulled out by hand.  The CVS-tip planner should be able to do
that for itself, though, and obtain essentially this same performance
with the per-spec query.

regards, tom lane

#23

Noname

markw@osdl.org

almost 22 years ago

In reply to: Josh Berkus (#21)

Re: Proposed Query Planner TODO items

On 9 Feb, Josh Berkus wrote:

Mark,

Ok, I've found that the kit does capture "explain" results and I've
added a "Query Plans" links under the query time charts on each of the
pages. Um, but I did notice a couple of problems. It looks liks one of
the 22 queries is missing and they're not labeled. I'll see about
getting that fixed.

If #19 is missing it's because Oleg & I could not get it to complete. That
was also the query which we are most interested in testing.

Oh, it's probably because we've altered Q19 and Q20. I'm still not all
that familiar with this kit, so I'm learning as we go. So we need to
change it back to make it worthwhile for you.

Mark

#24

Josh Berkus

josh@agliodbs.com

almost 22 years ago

In reply to: Noname (#23)

Re: Proposed Query Planner TODO items

Jenny,

For 19, we moved the common conditions out of the big ORs, for 20, we
added distinct. We can change the query back if the optimizer can
handle it now.

Well, we want to test if it can.

--
Josh Berkus
Aglio Database Solutions
San Francisco

Import Notes

Reply to msg id not found: 1076374550.14046.47.camel@ibm-a.pdx.osdl.net

#25

Jenny Zhang

jenny@osdl.org

almost 22 years ago

In reply to: Noname (#23)

Re: Proposed Query Planner TODO items

On Mon, 2004-02-09 at 11:39, markw@osdl.org wrote:

On 9 Feb, Josh Berkus wrote:

Mark,

Ok, I've found that the kit does capture "explain" results and I've
added a "Query Plans" links under the query time charts on each of the
pages. Um, but I did notice a couple of problems. It looks liks one of
the 22 queries is missing and they're not labeled. I'll see about
getting that fixed.

If #19 is missing it's because Oleg & I could not get it to complete. That
was also the query which we are most interested in testing.

Oh, it's probably because we've altered Q19 and Q20. I'm still not all
that familiar with this kit, so I'm learning as we go. So we need to
change it back to make it worthwhile for you.

The missing one is query 15, since it has create view, and I can not get
execution plan for that query.

I started the kit on PG 7.3. We can not finish query 19 and 20 at that
time. So I rewrote 19 and 20:
For 19, we moved the common conditions out of the big ORs, for 20, we
added distinct. We can change the query back if the optimizer can
handle it now.

Jenny

#26

Jenny Zhang

jenny@osdl.org

almost 22 years ago

In reply to: Josh Berkus (#24)

1 attachment(s)

Re: Proposed Query Planner TODO items

On Mon, 2004-02-09 at 16:53, Josh Berkus wrote:

Jenny,

For 19, we moved the common conditions out of the big ORs, for 20, we
added distinct. We can change the query back if the optimizer can
handle it now.

Well, we want to test if it can.

Replace the file 19.sql under datagen/pgsql-queries with the attachment
should do it.

Jenny

#27

Noname

markw@osdl.org

almost 22 years ago

In reply to: Tom Lane (#22)

Re: Proposed Query Planner TODO items

Ok, I have EXPLAIN ANALYZE results for both the power and throughput
tests:
http://developer.osdl.org/markw/dbt3-pgsql/

It's run #60 and the links are towards the bottom of the page under the
"Run log data" heading. The results from the power test is
"power_query.result" and "thuput_qs1.result", etc. for each stream in
the throughput test.

Mark

#28

Josh Berkus

josh@agliodbs.com

almost 22 years ago

In reply to: Noname (#27)

Re: Proposed Query Planner TODO items

Mark,

It's run #60 and the links are towards the bottom of the page under the
"Run log data" heading. The results from the power test is
"power_query.result" and "thuput_qs1.result", etc. for each stream in
the throughput test.

I'm confused. Were you able to get the original-form query #19 to complete,
or not?

--
-Josh Berkus
Aglio Database Solutions
San Francisco

#29

Noname

markw@osdl.org

almost 22 years ago

In reply to: Josh Berkus (#28)

Re: Proposed Query Planner TODO items

On 12 Feb, Josh Berkus wrote:

Mark,

It's run #60 and the links are towards the bottom of the page under the
"Run log data" heading. The results from the power test is
"power_query.result" and "thuput_qs1.result", etc. for each stream in
the throughput test.

I'm confused. Were you able to get the original-form query #19 to complete,
or not?

Oh sorry, I completely forgot that Q19 the whole purpose of this. So
#60 doesn't have the right Q19. I'll run with the one you want now.

Mark

#30

Josh Berkus

josh@agliodbs.com

almost 22 years ago

In reply to: Noname (#29)

Re: Proposed Query Planner TODO items

Mark,

Oh sorry, I completely forgot that Q19 the whole purpose of this. So
#60 doesn't have the right Q19. I'll run with the one you want now.

Thanks! And the original, not the "fixed", Q19 if you please. It's the
original that wouldn't finish on Postgres 7.3.

--
-Josh Berkus
Aglio Database Solutions
San Francisco

#31

Tom Lane

tgl@sss.pgh.pa.us

almost 22 years ago

In reply to: Noname (#27)

Re: Proposed Query Planner TODO items

markw@osdl.org writes:

Ok, I have EXPLAIN ANALYZE results for both the power and throughput
tests:
http://developer.osdl.org/markw/dbt3-pgsql/

Thanks. I just looked at Q9 and Q21, since those are the slowest
queries according to your chart. (Are all the queries weighted the same
for evaluation purposes, or are some more important than others?)

The problem with Q9 seems to be an estimation failure:

-> Nested Loop (cost=0.00..437591.67 rows=92 width=74) (actual time=12.030..1603892.783 rows=681518 loops=1)
-> Nested Loop (cost=0.00..65364.57 rows=61720 width=43) (actual time=0.326..5667.573 rows=90676 loops=1)
-> Seq Scan on part (cost=0.00..15733.27 rows=15992 width=11) (actual time=0.183..1539.306 rows=22669 loops=1)
Filter: ((p_name)::text ~~ '%hot%'::text)
-> Index Scan using i_ps_partkey on partsupp (cost=0.00..3.05 rows=4 width=32) (actual time=0.119..0.151 rows=4 loops=22669)
Index Cond: ("outer".p_partkey = partsupp.ps_partkey)
-> Index Scan using i_l_suppkey_partkey on lineitem (cost=0.00..6.02 rows=1 width=64) (actual time=2.183..17.564 rows=8 loops=90676)
Index Cond: (("outer".p_partkey = lineitem.l_partkey) AND ("outer".ps_suppkey = lineitem.l_suppkey))

The estimate for the part/partsupp join is close enough (60K vs 90K
rows), but why is it estimating 92 rows out of the join to lineitem when
the true figure is 681518? With a more accurate estimate the planner
would probably have chosen different join methods above this point.

Can you show us the pg_stats rows for the columns p_partkey, l_partkey,
ps_suppkey, and l_suppkey?

It would also be interesting to see whether a better estimate emerges
if you increase default_statistics_target (try 100 or so).

Q21 is a more interesting case:

EXPLAIN ANALYZE
select s_name, count(*) as numwait
from supplier, lineitem l1, orders, nation
where s_suppkey = l1.l_suppkey and o_orderkey = l1.l_orderkey and o_orderstatus = 'F' and l1.l_receiptdate > l1.l_commitdate
and exists ( select * from lineitem l2 where l2.l_orderkey = l1.l_orderkey and l2.l_suppkey <> l1.l_suppkey )
and not exists ( select * from lineitem l3 where l3.l_orderkey = l1.l_orderkey and l3.l_suppkey <> l1.l_suppkey and l3.l_receiptdate > l3.l_commitdate )
and s_nationkey = n_nationkey and n_name = 'MOROCCO'
group by s_name
order by numwait desc, s_name
LIMIT 100;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=2984783.51..2984783.76 rows=100 width=29) (actual time=1490860.249..1490860.460 rows=100 loops=1)
-> Sort (cost=2984783.51..2984831.91 rows=19361 width=29) (actual time=1490860.244..1490860.320 rows=100 loops=1)
Sort Key: count(*), supplier.s_name
-> HashAggregate (cost=2983356.52..2983404.92 rows=19361 width=29) (actual time=1490853.802..1490856.472 rows=760 loops=1)
-> Nested Loop (cost=0.00..2983259.72 rows=19361 width=29) (actual time=350.991..1490777.523 rows=7471 loops=1)
-> Nested Loop (cost=0.00..2862119.72 rows=40000 width=40) (actual time=350.805..1453771.752 rows=15369 loops=1)
-> Nested Loop (cost=0.00..994.08 rows=802 width=40) (actual time=0.152..187.510 rows=760 loops=1)
Join Filter: ("inner".s_nationkey = "outer".n_nationkey)
-> Seq Scan on nation (cost=0.00..1.31 rows=1 width=9) (actual time=0.088..0.113 rows=1 loops=1)
Filter: (n_name = 'MOROCCO'::bpchar)
-> Seq Scan on supplier (cost=0.00..742.34 rows=20034 width=49) (actual time=0.010..136.902 rows=20000 loops=1)
-> Index Scan using i_l_suppkey on lineitem l1 (cost=0.00..3566.81 rows=54 width=21) (actual time=87.928..1912.454 rows=20 loops=760)
Index Cond: ("outer".s_suppkey = l1.l_suppkey)
Filter: ((l_receiptdate > l_commitdate) AND (subplan) AND (NOT (subplan)))
SubPlan
-> Index Scan using i_l_orderkey on lineitem l3 (cost=0.00..3.13 rows=3 width=178) (actual time=0.066..0.066 rows=1 loops=277343)
Index Cond: (l_orderkey = $0)
Filter: ((l_suppkey <> $1) AND (l_receiptdate > l_commitdate))
-> Index Scan using i_l_orderkey on lineitem l2 (cost=0.00..3.11 rows=7 width=178) (actual time=0.812..0.812 rows=1 loops=287821)
Index Cond: (l_orderkey = $0)
Filter: (l_suppkey <> $1)
-> Index Scan using orders_pkey on orders (cost=0.00..3.02 rows=1 width=11) (actual time=2.397..2.399 rows=0 loops=15369)
Index Cond: (orders.o_orderkey = "outer".l_orderkey)
Filter: (o_orderstatus = 'F'::bpchar)
Total runtime: 1490867.126 ms
(25 rows)

I think the key issue here is that the two EXISTS tests depend only on
l1.l_orderkey and l1.l_suppkey of the outer query. Therefore they get
"pushed down" in the plan tree to be evaluated during the initial scan
of l1. This is normally a good heuristic choice, but because the EXISTS
tests are relatively expensive, that ends up forcing the planner to use
a nestloop-with-inner-index-scan join between nation/supplier and l1.
Any other join technique will involve a seqscan of l1 causing the EXISTS
tests to be evaluated at every row of lineitem; the planner correctly
ranks those alternatives as even worse than this.

The trouble is that the nestloop is hugely expensive: you can see that
the repeated indexscans on l1 take up 1912.454*760 - 0.066*277343 -
0.812*287821 or 1201449.750 msec, about 80% of the total.

It seems that the correct way to plan this query would require
postponing evaluation of the EXISTS clauses. If those were further up
the tree, the planner would have chosen a merge or hash join at this
step, which would probably take a tenth as much time. The cost to run
the EXISTS clauses themselves wouldn't change; they'd not be executed
any more frequently in this case.

I recall seeing traces in the code of logic that would attempt to delay
the evaluation of expensive WHERE tests, but that's been gone since
Berkeley days. Perhaps we should think about resurrecting it, or at
least putting in some kind of heuristic to try to cope better with this
case.

It would be interesting to see what the runtime looks like if you add
the following to the WHERE clauses of both inner EXISTS:
AND s_nationkey = n_nationkey AND o_orderkey = l1.l_orderkey
This would not change the results AFAICS, but it would force the
evaluation of the EXISTS clauses up to the top level of the outer plan
(since the planner would then see 'em as join constraints).

regards, tom lane

#32

Noname

markw@osdl.org

almost 22 years ago

In reply to: Josh Berkus (#30)

Re: Proposed Query Planner TODO items

On 12 Feb, Josh Berkus wrote:

Mark,

Oh sorry, I completely forgot that Q19 the whole purpose of this. So
#60 doesn't have the right Q19. I'll run with the one you want now.

Thanks! And the original, not the "fixed", Q19 if you please. It's the
original that wouldn't finish on Postgres 7.3.

Josh,

http://developer.osdl.org/markw/dbt3-pgsql/

Check out #61. I replaced the Q19 template with the one Jenny sent out.
Looks like it ran just fine. This run also has the EXPLAIN ANALYZE
results, but none of the other things Tom has asked for yet.

Mark

#33

Dennis Haney

davh@diku.dk

almost 22 years ago

In reply to: Tom Lane (#31)

Re: Proposed Query Planner TODO items

You are refering to:

@inproceedings{ hellerstein93predicate,
author = "Joseph M. Hellerstein and Michael Stonebraker",
title = "Predicate migration: optimizing queries with expensive
predicates",
pages = "267--276",
year = "1993",
abstract = "The traditional focus of relational query optimization
schemes has been on the choice of join methods and join orders.
Restrictions have typically been handled in query optimizers by
"predicate pushdown" rules, which apply restrictions in some random
order before as many joins as possible. These rules work under the
assumption that restriction is essentially a zero-time operation.
However, today's extensible and object-oriented database systems allow
users to define time-consuming functions,...",
url = "citeseer.nj.nec.com/article/hellerstein92predicate.html" }

Tom Lane wrote:

I think the key issue here is that the two EXISTS tests depend only on
l1.l_orderkey and l1.l_suppkey of the outer query. Therefore they get
"pushed down" in the plan tree to be evaluated during the initial scan
of l1. This is normally a good heuristic choice, but because the EXISTS
tests are relatively expensive, that ends up forcing the planner to use
a nestloop-with-inner-index-scan join between nation/supplier and l1.
Any other join technique will involve a seqscan of l1 causing the EXISTS
tests to be evaluated at every row of lineitem; the planner correctly
ranks those alternatives as even worse than this.

The trouble is that the nestloop is hugely expensive: you can see that
the repeated indexscans on l1 take up 1912.454*760 - 0.066*277343 -
0.812*287821 or 1201449.750 msec, about 80% of the total.

It seems that the correct way to plan this query would require
postponing evaluation of the EXISTS clauses. If those were further up
the tree, the planner would have chosen a merge or hash join at this
step, which would probably take a tenth as much time. The cost to run
the EXISTS clauses themselves wouldn't change; they'd not be executed
any more frequently in this case.

I recall seeing traces in the code of logic that would attempt to delay
the evaluation of expensive WHERE tests, but that's been gone since
Berkeley days. Perhaps we should think about resurrecting it, or at
least putting in some kind of heuristic to try to cope better with this
case.

It would be interesting to see what the runtime looks like if you add
the following to the WHERE clauses of both inner EXISTS:
AND s_nationkey = n_nationkey AND o_orderkey = l1.l_orderkey
This would not change the results AFAICS, but it would force the
evaluation of the EXISTS clauses up to the top level of the outer plan
(since the planner would then see 'em as join constraints).

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 9: the planner will ignore your desire to choose an index scan if your
joining column's datatypes do not match

--
Dennis

#34

Tom Lane

tgl@sss.pgh.pa.us

almost 22 years ago

In reply to: Dennis Haney (#33)

Re: Proposed Query Planner TODO items

Dennis Haney <davh@diku.dk> writes:

You are refering to:
@inproceedings{ hellerstein93predicate,
author = "Joseph M. Hellerstein and Michael Stonebraker",
title = "Predicate migration: optimizing queries with expensive
predicates",

Yup, I sure am. This is the same thesis referred to here:
http://archives.postgresql.org/pgsql-hackers/2002-06/msg00085.php

We may need to put some of it back ;-)

regards, tom lane

#35

Noname

markw@osdl.org

almost 22 years ago

In reply to: Tom Lane (#31)

Re: Proposed Query Planner TODO items

On 12 Feb, Tom Lane wrote:

markw@osdl.org writes:

Ok, I have EXPLAIN ANALYZE results for both the power and throughput
tests:
http://developer.osdl.org/markw/dbt3-pgsql/

Thanks. I just looked at Q9 and Q21, since those are the slowest
queries according to your chart. (Are all the queries weighted the same
for evaluation purposes, or are some more important than others?)

[snip]

The estimate for the part/partsupp join is close enough (60K vs 90K
rows), but why is it estimating 92 rows out of the join to lineitem when
the true figure is 681518? With a more accurate estimate the planner
would probably have chosen different join methods above this point.

Can you show us the pg_stats rows for the columns p_partkey, l_partkey,
ps_suppkey, and l_suppkey?

It would also be interesting to see whether a better estimate emerges
if you increase default_statistics_target (try 100 or so).

http://developer.osdl.org/markw/dbt3-pgsql/62/

This run changes default_statistics_target to 1000 and I have p_partkey,
l_partkey, ps_suppkey, and l_suppkey pg_stats here at 1 min intervals
http (no links on the web page.) Pretty significant performance change.

Power:

http://developer.osdl.org/markw/dbt3-pgsql/62/db_stat/power.l_partkey.out
http://developer.osdl.org/markw/dbt3-pgsql/62/db_stat/power.l_suppkey.out
http://developer.osdl.org/markw/dbt3-pgsql/62/db_stat/power.p_partkey.out
http://developer.osdl.org/markw/dbt3-pgsql/62/db_stat/power.ps_suppkey.out

Throughput:

http://developer.osdl.org/markw/dbt3-pgsql/62/db_stat/thuput.l_partkey.out
http://developer.osdl.org/markw/dbt3-pgsql/62/db_stat/thuput.l_suppkey.out
http://developer.osdl.org/markw/dbt3-pgsql/62/db_stat/thuput.p_partkey.out
http://developer.osdl.org/markw/dbt3-pgsql/62/db_stat/thuput.ps_suppkey.out

Something went wrong when I tried to run another test with the Q21
changes overnight, so I'll have to get back to you on that one.

Mark

#36

Tom Lane

tgl@sss.pgh.pa.us

almost 22 years ago

In reply to: Noname (#35)

Re: Proposed Query Planner TODO items

markw@osdl.org writes:

http://developer.osdl.org/markw/dbt3-pgsql/66/

There's a run with a modified Q21. Made a huge improvement in Q21.

Okay, looks like we know what we need to attack to solve Q21... actually
solving it will be a tad harder ;-) but we understand where the problem is.

I see what is going on to make Q4 slow, too. It's this:

where o_orderdate >= date '1995-04-01' and o_orderdate < date '1995-04-01' + interval '3 month'

(o_orderdate is of type "date", unsurprisingly). This produces

-> Index Scan using i_o_orderdate on orders (cost=0.00..2603496.38 rows=253677 width=19) (actual time=45.908..202483.023 rows=104083 loops=1)
Index Cond: (o_orderdate >= '1995-04-01'::date)
Filter: (((o_orderdate)::timestamp without time zone < '1995-07-01 00:00:00'::timestamp without time zone) AND (subplan))

that is, the lower bound is recognized as an indexscan constraint,
but the upper bound isn't because of the datatype mismatch. So we end
up fetching the whole table up through its ending date.

Up to now, all we could do about this sort of issue was to suggest that
people cast to eliminate the datatype mismatch:

where o_orderdate >= date '1995-04-01' and o_orderdate < CAST(date '1995-04-01' + interval '3 month' AS date)

but I dunno whether that's an allowed query modification under the TPC-H
rules.

As of CVS tip the issue could be eliminated by introducing
cross-data-type comparison operators between types date and timestamp
without time zone, and then making these be members of the date index
opclass. I'm strongly tempted to do so ...

regards, tom lane

Import Notes

Reply to msg id not found: 200402132326.i1DNQpE23476@mail.osdl.orgReference msg id not found: 200402132326.i1DNQpE23476@mail.osdl.org | Resolved by subject fallback

#37

Tom Lane

tgl@sss.pgh.pa.us

almost 22 years ago

In reply to: Tom Lane (#36)

Re: Proposed Query Planner TODO items

I wrote:

I see what is going on to make Q4 slow, too. It's this:
where o_orderdate >= date '1995-04-01' and o_orderdate < date '1995-04-01' + interval '3 month'
...
As of CVS tip the issue could be eliminated by introducing
cross-data-type comparison operators between types date and timestamp
without time zone, and then making these be members of the date index
opclass. I'm strongly tempted to do so ...

I have now done this, so if you care to re-sync with CVS tip you should
find that the queries using this sort of date constraint go faster.
(You do have indexes on all the date columns, no?)

regards, tom lane

#38

Dennis Haney

davh@diku.dk

almost 22 years ago

In reply to: Noname (#35)

Re: Proposed Query Planner TODO items

markw@osdl.org wrote:

On 12 Feb, Tom Lane wrote:

http://developer.osdl.org/markw/dbt3-pgsql/62/

This run changes default_statistics_target to 1000 and I have p_partkey,
l_partkey, ps_suppkey, and l_suppkey pg_stats here at 1 min intervals
http (no links on the web page.) Pretty significant performance change.

Why the filesystem change to ext2 at the same time?

Something went wrong when I tried to run another test with the Q21
changes overnight, so I'll have to get back to you on that one.

--
Dennis

#39

Noname

markw@osdl.org

almost 22 years ago

In reply to: Dennis Haney (#38)

Re: Proposed Query Planner TODO items

On 16 Feb, Dennis Haney wrote:

markw@osdl.org wrote:

On 12 Feb, Tom Lane wrote:

http://developer.osdl.org/markw/dbt3-pgsql/62/

This run changes default_statistics_target to 1000 and I have p_partkey,
l_partkey, ps_suppkey, and l_suppkey pg_stats here at 1 min intervals
http (no links on the web page.) Pretty significant performance change.

Why the filesystem change to ext2 at the same time?

I've been rotating filesystems occasionally. Otherwise no specific
reason.

Mark

#40

Noname

markw@osdl.org

almost 22 years ago

In reply to: Tom Lane (#37)

Re: Proposed Query Planner TODO items

On 15 Feb, Tom Lane wrote:

I wrote:

I see what is going on to make Q4 slow, too. It's this:
where o_orderdate >= date '1995-04-01' and o_orderdate < date '1995-04-01' + interval '3 month'
...
As of CVS tip the issue could be eliminated by introducing
cross-data-type comparison operators between types date and timestamp
without time zone, and then making these be members of the date index
opclass. I'm strongly tempted to do so ...

I have now done this, so if you care to re-sync with CVS tip you should
find that the queries using this sort of date constraint go faster.
(You do have indexes on all the date columns, no?)

I ran a test with the CAST you recommended for Q4 over the weekend:
http://developer.osdl.org/markw/dbt3-pgsql/68/

But it didn't seem to have much of an affect on Q4, compared to run
#66. I'll still give the CVS tip a try.

Mark

#41

Tom Lane

tgl@sss.pgh.pa.us

almost 22 years ago

In reply to: Noname (#40)

Re: Proposed Query Planner TODO items

markw@osdl.org writes:

I ran a test with the CAST you recommended for Q4 over the weekend:
http://developer.osdl.org/markw/dbt3-pgsql/68/
But it didn't seem to have much of an affect on Q4, compared to run
#66. I'll still give the CVS tip a try.

Hm. Disappointing. I can see from the EXPLAIN results that it is
picking up the additional index constraint correctly in this run.
That should have saved a good number of useless heap fetches.
[ works with the numbers a little... ] Actually, I guess it did:
it looks like the time spent in the indexscan proper went down from
44msec to 7msec. The problem is that the bulk of the query time is
actually going into the repeated EXISTS() sub-selects, and those didn't
get any better.

There are some other queries in the set that also have date limits of
this kind, so I still think it's worth redoing a run with CVS tip to
see if we pick up anything overall. (You do have indexes created on
all the date columns no?)

There's probably no way to make Q4 fly without finding a way to optimize
the EXISTS into an IN-join. I'll put that on my to-do list ... in the
meantime, if you feel like making a run to confirm that theory, try
modifying Q4 to replace

and exists ( select * from lineitem
where l_orderkey = o_orderkey and l_commitdate < l_receiptdate )

with

and o_orderkey in ( select l_orderkey from lineitem
where l_commitdate < l_receiptdate )

I think that either 7.4 or CVS tip will do better with this variant,
but it probably ought to be checked.

regards, tom lane

#42

Noname

markw@osdl.org

almost 22 years ago

In reply to: Tom Lane (#41)

Re: Proposed Query Planner TODO items

On 16 Feb, Tom Lane wrote:

markw@osdl.org writes:

I ran a test with the CAST you recommended for Q4 over the weekend:
http://developer.osdl.org/markw/dbt3-pgsql/68/
But it didn't seem to have much of an affect on Q4, compared to run
#66. I'll still give the CVS tip a try.

Hm. Disappointing. I can see from the EXPLAIN results that it is
picking up the additional index constraint correctly in this run.
That should have saved a good number of useless heap fetches.
[ works with the numbers a little... ] Actually, I guess it did:
it looks like the time spent in the indexscan proper went down from
44msec to 7msec. The problem is that the bulk of the query time is
actually going into the repeated EXISTS() sub-selects, and those didn't
get any better.

There are some other queries in the set that also have date limits of
this kind, so I still think it's worth redoing a run with CVS tip to
see if we pick up anything overall. (You do have indexes created on
all the date columns no?)

There's probably no way to make Q4 fly without finding a way to optimize
the EXISTS into an IN-join. I'll put that on my to-do list ... in the
meantime, if you feel like making a run to confirm that theory, try
modifying Q4 to replace

and exists ( select * from lineitem
where l_orderkey = o_orderkey and l_commitdate < l_receiptdate )

with

and o_orderkey in ( select l_orderkey from lineitem
where l_commitdate < l_receiptdate )

I think that either 7.4 or CVS tip will do better with this variant,
but it probably ought to be checked.

It looks like we have indexes on all of the date columns except
l_commitdate, which appears to be in Q4.

So I think I'll run against the CVS tip as is, again with an index on
l_commitdate, and then another test to confirm your theory. Sound good?

Mark

#43

Tom Lane

tgl@sss.pgh.pa.us

almost 22 years ago

In reply to: Noname (#42)

Re: Proposed Query Planner TODO items

markw@osdl.org writes:

It looks like we have indexes on all of the date columns except
l_commitdate, which appears to be in Q4.

So I think I'll run against the CVS tip as is, again with an index on
l_commitdate, and then another test to confirm your theory. Sound good?

Sure, it's only cycles ;-). I am not certain that an index on
commitdate would help any, but it's worth trying.

regards, tom lane

#44

Noname

markw@osdl.org

almost 22 years ago

In reply to: Tom Lane (#43)

Re: Proposed Query Planner TODO items

On 16 Feb, Tom Lane wrote:

markw@osdl.org writes:

It looks like we have indexes on all of the date columns except
l_commitdate, which appears to be in Q4.

So I think I'll run against the CVS tip as is, again with an index on
l_commitdate, and then another test to confirm your theory. Sound good?

Sure, it's only cycles ;-). I am not certain that an index on
commitdate would help any, but it's worth trying.

http://developer.osdl.org/markw/dbt3-pgsql/70/

Those are results from a pull from CVS I did this morning.
I reverted Q4 (removed the CAST), but the extra WHERE constraints are
still in Q21.

Mark

#45

Tatsuo Ishii

t-ishii@sra.co.jp

about 21 years ago

In reply to: Noname (#16)

Re: Proposed Query Planner TODO items

Mark,

I see nice graphs for each DBT3 query(for example,
http://developer.osdl.org/markw/dbt3-pgsql/42/q_time.png). It seems
they do not come with normal dbt3-1.4 kit. How did you get them?
Maybe you have slightly modified dbt3 kit?
--
Tatsuo Ishii

Show quoted text

On 6 Feb, To: tgl@sss.pgh.pa.us wrote:

On 5 Jan, Tom Lane wrote:

Josh Berkus <josh@agliodbs.com> writes:

2) DEVELOP BETTER PLANS FOR "OR GROUP" QUERIES

Summary: Currently, queries with complex "or group" criteria get devolved by
the planner into canonical and-or filters resulting in very poor execution on
large data sets. We should find better ways of dealing with these queries,
for example UNIONing.

Description: While helping OSDL with their derivative TPC-R benchmark, we ran
into a query (#19) which took several hours to complete on PostgreSQL.

http://developer.osdl.org/markw/dbt3-pgsql/

There's a short summary of the tests I ran over the weekend, with links
to detailed retults. Comparing runs 43 (7.4) and 52 (7.5devel), it
looks like query #7 had the only significant improvement. Oprofile data
should be there too, if that'll help. Let us know if there's anything
else we can try for you.

Mark

---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to majordomo@postgresql.org)

#46

Reini Urban

rurban@x-ray.at

about 21 years ago

In reply to: Tatsuo Ishii (#45)

Re: Proposed Query Planner TODO items

Tatsuo Ishii schrieb:

I see nice graphs for each DBT3 query(for example,
http://developer.osdl.org/markw/dbt3-pgsql/42/q_time.png). It seems
they do not come with normal dbt3-1.4 kit. How did you get them?
Maybe you have slightly modified dbt3 kit?

This looks like a simple ploticus one-liner.
like:
pl -png -o vbars.png -prefab vbars data=dbt3.data x=1 y=2 barwidth=line

see for example: http://ploticus.sourceforge.net/doc/prefab_vbars.html
or
http://phpwiki.sourceforge.net/phpwiki/PhpMemoryExhausted/Testresults
--
Reini Urban
http://xarch.tu-graz.ac.at/home/rurban/

#47

Mark Wong

markw@osdl.org

about 21 years ago

In reply to: Tatsuo Ishii (#45)

Re: Proposed Query Planner TODO items

Hi Tatsuo,

Yes, I've been updating the dbt3 kit over the past several months.
The query time graph is a new feature. It's available via BitKeeper
at bk://developer.osdl.org:/var/bk/dbt3 but I haven't tested the kit
well enough to make a v1.5 release yet. If BitKeeper isn't something
you can use, I can make a preliminary tarball for you.

Mark

On Mon, Oct 25, 2004 at 01:59:46PM +0900, Tatsuo Ishii wrote:

Mark,

I see nice graphs for each DBT3 query(for example,
http://developer.osdl.org/markw/dbt3-pgsql/42/q_time.png). It seems
they do not come with normal dbt3-1.4 kit. How did you get them?
Maybe you have slightly modified dbt3 kit?
--
Tatsuo Ishii

On 6 Feb, To: tgl@sss.pgh.pa.us wrote:

On 5 Jan, Tom Lane wrote:

Josh Berkus <josh@agliodbs.com> writes:

2) DEVELOP BETTER PLANS FOR "OR GROUP" QUERIES

Summary: Currently, queries with complex "or group" criteria get devolved by
the planner into canonical and-or filters resulting in very poor execution on
large data sets. We should find better ways of dealing with these queries,
for example UNIONing.

Description: While helping OSDL with their derivative TPC-R benchmark, we ran
into a query (#19) which took several hours to complete on PostgreSQL.

http://developer.osdl.org/markw/dbt3-pgsql/

There's a short summary of the tests I ran over the weekend, with links
to detailed retults. Comparing runs 43 (7.4) and 52 (7.5devel), it
looks like query #7 had the only significant improvement. Oprofile data
should be there too, if that'll help. Let us know if there's anything
else we can try for you.

Mark

---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to majordomo@postgresql.org)

--
Mark Wong - - markw@osdl.org
Open Source Development Lab Inc - A non-profit corporation
12725 SW Millikan Way - Suite 400 - Beaverton, OR 97005
(503) 626-2455 x 32 (office)
(503) 626-2436 (fax)
http://developer.osdl.org/markw/

#48

Tatsuo Ishii

t-ishii@sra.co.jp

about 21 years ago

In reply to: Mark Wong (#47)

Re: Proposed Query Planner TODO items

Hi,

Thanks for the info. Would you give me the tarball?
--
Tatsuo Ishii

Show quoted text

Hi Tatsuo,

Yes, I've been updating the dbt3 kit over the past several months.
The query time graph is a new feature. It's available via BitKeeper
at bk://developer.osdl.org:/var/bk/dbt3 but I haven't tested the kit
well enough to make a v1.5 release yet. If BitKeeper isn't something
you can use, I can make a preliminary tarball for you.

Mark

On Mon, Oct 25, 2004 at 01:59:46PM +0900, Tatsuo Ishii wrote:

Mark,

I see nice graphs for each DBT3 query(for example,
http://developer.osdl.org/markw/dbt3-pgsql/42/q_time.png). It seems
they do not come with normal dbt3-1.4 kit. How did you get them?
Maybe you have slightly modified dbt3 kit?
--
Tatsuo Ishii

On 6 Feb, To: tgl@sss.pgh.pa.us wrote:

On 5 Jan, Tom Lane wrote:

Josh Berkus <josh@agliodbs.com> writes:

2) DEVELOP BETTER PLANS FOR "OR GROUP" QUERIES

Summary: Currently, queries with complex "or group" criteria get devolved by
the planner into canonical and-or filters resulting in very poor execution on
large data sets. We should find better ways of dealing with these queries,
for example UNIONing.

Description: While helping OSDL with their derivative TPC-R benchmark, we ran
into a query (#19) which took several hours to complete on PostgreSQL.

http://developer.osdl.org/markw/dbt3-pgsql/

There's a short summary of the tests I ran over the weekend, with links
to detailed retults. Comparing runs 43 (7.4) and 52 (7.5devel), it
looks like query #7 had the only significant improvement. Oprofile data
should be there too, if that'll help. Let us know if there's anything
else we can try for you.

Mark

---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to majordomo@postgresql.org)

--
Mark Wong - - markw@osdl.org
Open Source Development Lab Inc - A non-profit corporation
12725 SW Millikan Way - Suite 400 - Beaverton, OR 97005
(503) 626-2455 x 32 (office)
(503) 626-2436 (fax)
http://developer.osdl.org/markw/

---------------------------(end of broadcast)---------------------------
TIP 7: don't forget to increase your free space map settings

#49

Mark Wong

markw@osdl.org

about 21 years ago

In reply to: Tatsuo Ishii (#48)

Re: Proposed Query Planner TODO items

Hi Tatsuo,

I've made a new release:
http://prdownloads.sourceforge.net/osdldbt/dbt3-v1.5.tar.gz?download

Let me know if there are any problems.

Show quoted text

On Tue, Oct 26, 2004 at 12:44:49PM +0900, Tatsuo Ishii wrote:

Hi,

Thanks for the info. Would you give me the tarball?

#50

Tatsuo Ishii

t-ishii@sra.co.jp

about 21 years ago

In reply to: Mark Wong (#49)

Re: Proposed Query Planner TODO items

Hi Tatsuo,

I've made a new release:
http://prdownloads.sourceforge.net/osdldbt/dbt3-v1.5.tar.gz?download

Let me know if there are any problems.

Thanks!

Show quoted text

On Tue, Oct 26, 2004 at 12:44:49PM +0900, Tatsuo Ishii wrote:

Hi,

Thanks for the info. Would you give me the tarball?

#51

Tatsuo Ishii

t-ishii@sra.co.jp

about 21 years ago

In reply to: Tatsuo Ishii (#50)

Re: Proposed Query Planner TODO items

Hi Tatsuo,

I've made a new release:
http://prdownloads.sourceforge.net/osdldbt/dbt3-v1.5.tar.gz?download

Let me know if there are any problems.

Thanks!

Just for quick note, it seems query 19 takes forever. Have you
successfully run Q19?
--
Tatsuo Ishii

#52

Tatsuo Ishii

t-ishii@sra.co.jp

about 21 years ago

In reply to: Tatsuo Ishii (#51)

1 attachment(s)

DBT-3 v1.5 Q19 (Re: Proposed Query Planner TODO items)

Hi Tatsuo,

I've made a new release:
http://prdownloads.sourceforge.net/osdldbt/dbt3-v1.5.tar.gz?download

Let me know if there are any problems.

Thanks!

Just for quick note, it seems query 19 takes forever. Have you
successfully run Q19?

Here is the more detailed info. The query was not finished within 3
days and was canceled on a Dual Xeon 2.8GHz with 2.5GB RAM running
Linux. PostgreSQL is 7.4.5 with default postgresql.conf. An explain
output is attatched.
--
Tatsuo Ishii

#53

Tom Lane

tgl@sss.pgh.pa.us

about 21 years ago

In reply to: Tatsuo Ishii (#52)

Re: DBT-3 v1.5 Q19 (Re: Proposed Query Planner TODO items)

Tatsuo Ishii <t-ishii@sra.co.jp> writes:

Just for quick note, it seems query 19 takes forever. Have you
successfully run Q19?

Here is the more detailed info. The query was not finished within 3
days and was canceled on a Dual Xeon 2.8GHz with 2.5GB RAM running
Linux. PostgreSQL is 7.4.5 with default postgresql.conf.

7.4's planner is not able to do anything useful with the complicated
WHERE clause in Q19. I believe I've improved that situation for 8.0.
A really slick solution will probably have to await the appearance of
bitmap indexes, though.

regards, tom lane

#54

Tatsuo Ishii

t-ishii@sra.co.jp

about 21 years ago

In reply to: Tom Lane (#53)

Re: DBT-3 v1.5 Q19

Tatsuo Ishii <t-ishii@sra.co.jp> writes:

Just for quick note, it seems query 19 takes forever. Have you
successfully run Q19?

Here is the more detailed info. The query was not finished within 3
days and was canceled on a Dual Xeon 2.8GHz with 2.5GB RAM running
Linux. PostgreSQL is 7.4.5 with default postgresql.conf.

7.4's planner is not able to do anything useful with the complicated
WHERE clause in Q19. I believe I've improved that situation for 8.0.
A really slick solution will probably have to await the appearance of
bitmap indexes, though.

Thanks. I will try with 8.0.
--
Tatsuo Ishii

Proposed Query Planner TODO items

Attachments:

Attachments:

Attachments: