Re: "SELECT IN" Still Broken in 7.4b
On Wed, 20 Aug 2003, Mike Winter wrote:
I'm sure many on this list are sick of hearing about this problem; it
was on the fix list for 7.4 but doesn't appear to have been changed.
IN (subselect) was changed for 7.4 (although I'm not sure if the list
mentions the difference). I don't know of any major changes to IN
(valuelist) though.
On Wed, 20 Aug 2003, Stephan Szabo wrote:
On Wed, 20 Aug 2003, Mike Winter wrote:
I'm sure many on this list are sick of hearing about this problem; it
was on the fix list for 7.4 but doesn't appear to have been changed.
IN (subselect) was changed for 7.4 (although I'm not sure if the list
mentions the difference). I don't know of any major changes to IN
(valuelist) though.
Thanks, Stephan. I was really hoping that the IN(valuelist) was going to
be changed at the same time, because it really is unusable for anything
over a couple of thousand values.
--
_______________________________________________________________________
Front Logic Inc. Tel: 306.653.2725 x14
226 Pacific Ave or 1.800.521.4510 x14
Saskatoon, SK Fax: 306.653.0972
S7K 1N9 Canada Cell: 306.717.2550
http://www.frontlogic.com mike.winter@frontlogic.com
Thanks, Stephan. I was really hoping that the IN(valuelist) was going to
be changed at the same time, because it really is unusable for anything
over a couple of thousand values.
Changed to do what?
I suppose that the ability to combine several index scans via a bitmap
would help to linearize those, but that is far from an IN(valuelist)
specific enhancement.
On Wed, 20 Aug 2003, Rod Taylor wrote:
Thanks, Stephan. I was really hoping that the IN(valuelist) was going to
be changed at the same time, because it really is unusable for anything
over a couple of thousand values.
Changed to do what?
One possibility might be to act as if the valuelist was a table and do the
IN as if it were that way, rather than treating it as a set of ORs. That
would be basically like doing the temporary table solution, but without
requiring the user to do it.
On Wed, 2003-08-20 at 17:41, Stephan Szabo wrote:
On Wed, 20 Aug 2003, Rod Taylor wrote:
Thanks, Stephan. I was really hoping that the IN(valuelist) was going to
be changed at the same time, because it really is unusable for anything
over a couple of thousand values.
Changed to do what?
One possibility might be to act as if the valuelist was a table and do the
IN as if it were that way, rather than treating it as a set of ORs. That
would be basically like doing the temporary table solution, but without
requiring the user to do it.
Is the temp table version any faster? I realize it has a higher limit
to the number of items you can have in the list.
On Wed, 20 Aug 2003, Rod Taylor wrote:
On Wed, 2003-08-20 at 17:41, Stephan Szabo wrote:
On Wed, 20 Aug 2003, Rod Taylor wrote:
Thanks, Stephan. I was really hoping that the IN(valuelist) was going to
be changed at the same time, because it really is unusable for anything
over a couple of thousand values.
Changed to do what?
One possibility might be to act as if the valuelist was a table and do the
IN as if it were that way, rather than treating it as a set of ORs. That
would be basically like doing the temporary table solution, but without
requiring the user to do it.
Is the temp table version any faster? I realize it has a higher limit
to the number of items you can have in the list.
Within the scope of the new hashed IN stuff I believe so in at least some
cases. I have a few million row table of integers where searching for
values IN (~10000 values) takes longer than creating the temp table,
copying into it and doing the in subquery. That's not a particularly
meaningful test case, but sending the psql output to /dev/null gives me:
create temp table / copy 10001 entries / select in subquery - 0.8 sec
select in (value list, 9998 entries) - ~2 min 19 sec
explain select in (value list) - ~4.8 sec
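The temp-table workaround benchmarked above can be sketched as follows. This is an illustrative model only, not the actual test: it uses Python's sqlite3 as a stand-in engine, and the table and column names (`big`, `vals`, `val`) are invented for the example. The point is the shape of the two query forms, not the timings.

```python
# Compare a literal IN (valuelist) against the temp-table-plus-subquery
# workaround. sqlite3 stands in for PostgreSQL; names are invented.
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE big (val INTEGER)")
cur.executemany("INSERT INTO big VALUES (?)", [(i,) for i in range(50000)])
cur.execute("CREATE INDEX big_val_idx ON big (val)")

wanted = list(range(10000, 10900))  # 900 values, kept under sqlite's parameter limit

# Form 1: a literal IN (valuelist) -- expands into one huge expression.
placeholders = ",".join("?" * len(wanted))
n_literal = cur.execute(
    f"SELECT count(*) FROM big WHERE val IN ({placeholders})", wanted
).fetchone()[0]

# Form 2: the workaround -- load the values into a temp table and use
# IN (subselect), which the planner can execute as a hashed semi-join.
cur.execute("CREATE TEMP TABLE vals (val INTEGER)")
cur.executemany("INSERT INTO vals VALUES (?)", [(v,) for v in wanted])
n_temp = cur.execute(
    "SELECT count(*) FROM big WHERE val IN (SELECT val FROM vals)"
).fetchone()[0]

assert n_literal == n_temp == 900  # both forms match the same rows
```

In PostgreSQL 7.x the second form would be `CREATE TEMP TABLE`, `COPY`, then the subselect, as in the benchmark above.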
On Wed, 20 Aug 2003, Stephan Szabo wrote:
On Wed, 20 Aug 2003, Rod Taylor wrote:
Thanks, Stephan. I was really hoping that the IN(valuelist) was going to
be changed at the same time, because it really is unusable for anything
over a couple of thousand values.
Changed to do what?
One possibility might be to act as if the valuelist was a table and do the
IN as if it were that way, rather than treating it as a set of ORs. That
would be basically like doing the temporary table solution, but without
requiring the user to do it.
For integers we use contrib/intarray as a workaround. In principle,
it's possible to extend intarray to general arrays.
Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83
Stephan Szabo wrote:
On Wed, 20 Aug 2003, Rod Taylor wrote:
...Is the temp table version any faster? I realize it has a higher limit
to the number of items you can have in the list.
Within the scope of the new hashed IN stuff I believe so in at least some
cases. I have a few million row table of integers where searching for
values IN (~10000 values) takes longer than creating the temp table,
copying into it and doing the in subquery. That's not a particularly
meaningful test case, but sending the psql output to /dev/null gives me: ...
But where do your values come from in the first place?
Couldn't you optimize your model so that you don't have to copy around
such amounts of data?
Regards,
Dani
On Thu, 21 Aug 2003, Dani Oderbolz wrote:
Stephan Szabo wrote:
On Wed, 20 Aug 2003, Rod Taylor wrote:
...Is the temp table version any faster? I realize it has a higher limit
to the number of items you can have in the list.
Within the scope of the new hashed IN stuff I believe so in at least some
cases. I have a few million row table of integers where searching for
values IN (~10000 values) takes longer than creating the temp table,
copying into it and doing the in subquery. That's not a particularly
meaningful test case, but sending the psql output to /dev/null gives me: ...
But where do your values come from in the first place?
Couldn't you optimize your model so that you don't have to copy around
such amounts of data?
I wasn't the OP; I was doing a simple test as a comparison between
the two forms of the queries, to see if making a temp table, populating
it, and then doing the subselect form could ever be faster than the current
conversion of valuelists to a sequence of OR conditions. In 7.3, it
probably was not possible for a conversion to an IN subselect to be faster,
but with the new hash subquery stuff it was worth trying again.
Stephan Szabo <sszabo@megazone.bigpanda.com> writes:
Within the scope of the new hashed IN stuff I believe so in at least some
cases. I have a few million row table of integers where searching for
values IN (~10000 values) takes longer than creating the temp table,
copying into it and doing the in subquery.
I did some profiling and soon realized that the main problem is the
executor's trick for not returning the same row multiple times in a
multiple-indexscan plan node. The point is that given
WHERE a = 1 OR b = 1
you could create a plan that first indexscans on a, then indexscans on
b --- but you mustn't return any tuples in the second scan that you
already returned in the first. IndexNext solves this by evaluating the
prior-scan index conditions to see if they are true. Which is okay if
there aren't too many of them. But when you have an N-element IN list
this means you are going to do O(N^2) index expression evaluations.
In the 10000-element IN-list test case, ExecQual gets called almost
50 million times :-(
I'm toying with the notion of ripping out that logic and instead
building an in-memory hashtable of already-returned TIDs. This could
theoretically run out of memory if the multiple indexscan returns enough
tuples, but I think in practice that wouldn't happen because the planner
wouldn't choose an indexscan when very large numbers of tuples are being
selected.
Comments?
regards, tom lane
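The difference between the two strategies Tom describes can be modeled in a few lines. This is a toy sketch under stated assumptions, not PostgreSQL source: a "TID" is just a row id, each "scan" yields the rows matching one OR-arm, and arms may overlap.

```python
# Toy model of how a multiple-indexscan node can avoid returning the
# same row twice. Not PostgreSQL code; rows stand in for TIDs.

def multi_scan_recheck(scans):
    """Old approach: for each tuple of scan i, re-evaluate the quals of
    scans 0..i-1 and skip the tuple if an earlier scan already matched it.
    With N single-value IN-list arms this is O(N^2) qual evaluations."""
    for i, (qual, rows) in enumerate(scans):
        for row in rows:
            if any(prev_qual(row) for prev_qual, _ in scans[:i]):
                continue  # an earlier scan already returned this row
            yield row

def multi_scan_hashtable(scans):
    """Proposed approach: remember already-returned TIDs in an in-memory
    hash table -- one probe/insert per tuple instead of i quals."""
    seen = set()
    for _, rows in scans:
        for row in rows:
            if row not in seen:
                seen.add(row)
                yield row

# Example: WHERE a = 1 OR a = 2 OR a = 1 -- overlapping arms.
table = {101: 1, 102: 2, 103: 1}  # tid -> value of column a
def arm(v):
    rows = [tid for tid, a in table.items() if a == v]
    return (lambda tid, v=v: table[tid] == v, rows)

scans = [arm(1), arm(2), arm(1)]
assert sorted(multi_scan_recheck(scans)) == sorted(multi_scan_hashtable(scans)) == [101, 102, 103]
```

Both versions return each row once; the hashtable version just pays a constant cost per tuple rather than one qual evaluation per prior scan.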
On Thu, 21 Aug 2003, Tom Lane wrote:
Stephan Szabo <sszabo@megazone.bigpanda.com> writes:
Within the scope of the new hashed IN stuff I believe so in at least some
cases. I have a few million row table of integers where searching for
values IN (~10000 values) takes longer than creating the temp table,
copying into it and doing the in subquery.
I did some profiling and soon realized that the main problem is the
executor's trick for not returning the same row multiple times in a
multiple-indexscan plan node. The point is that given
WHERE a = 1 OR b = 1
you could create a plan that first indexscans on a, then indexscans on
b --- but you mustn't return any tuples in the second scan that you
already returned in the first. IndexNext solves this by evaluating the
prior-scan index conditions to see if they are true. Which is okay if
there aren't too many of them. But when you have an N-element IN list
this means you are going to do O(N^2) index expression evaluations.
In the 10000-element IN-list test case, ExecQual gets called almost
50 million times :-(
I'm toying with the notion of ripping out that logic and instead
building an in-memory hashtable of already-returned TIDs. This could
theoretically run out of memory if the multiple indexscan returns enough
tuples, but I think in practice that wouldn't happen because the planner
wouldn't choose an indexscan when very large numbers of tuples are being
selected.
Well, if you want to be safer, I guess you could (at runtime) decide that
the table's gotten too big and fall back to the old method if you didn't
entirely rip it out. I'm not sure if that'd be too ugly though, but it
would mean that you wouldn't have to worry about it returning too many
tuples.
On Thu, 21 Aug 2003 16:42:20 -0400, Tom Lane <tgl@sss.pgh.pa.us>
wrote:
The point is that given
WHERE a = 1 OR b = 1
you could create a plan that first indexscans on a, then indexscans on
b --- but you mustn't return any tuples in the second scan that you
already returned in the first. IndexNext solves this by evaluating the
prior-scan index conditions to see if they are true.
WHERE a = 1 OR b = 2
and
WHERE a = 1 OR a = 2
are totally different things. In the latter case you don't have to
check prior conditions because the conditions are mutually exclusive.
Is this reasonably easy to find out at plan creation time?
Yes, I changed your example to make my point clear, because
WHERE a = 1 OR a = 1
has its own set of problems.
Servus
Manfred
Tom,
I'm toying with the notion of ripping out that logic and instead
building an in-memory hashtable of already-returned TIDs. This could
theoretically run out of memory if the multiple indexscan returns enough
tuples, but I think in practice that wouldn't happen because the planner
wouldn't choose an indexscan when very large numbers of tuples are being
selected.
Don't forget all of the tyros who tune their queries through "set
enable_seqscan=false". I think we'd need some kind of memory safety valve on
this, like sandboxing it in sort_mem.
--
-Josh Berkus
Aglio Database Solutions
San Francisco
Manfred Koizar <mkoi-pg@aon.at> writes:
WHERE a = 1 OR a = 2
are totally different things. In the latter case you don't have to
check prior conditions because the conditions are mutually exclusive.
Is this reasonably easy to find out at plan creation time?
Yeah, I know, but I see no easy way to verify this (where "easy" means
"doesn't take an unreasonable amount of time"). A naive check would
take O(N^2) time, putting you right back where you started. Even a
smart check would surely take more time per item than one hashtable probe.
I'm also concerned about how much the planner would have to assume about
the semantics of the operators in order to prove the conditions are
mutually exclusive.
Finally, I suspect that once we get rid of the O(N^2) behavior in the
executor, we will find that the next biggest bottleneck is in the
planner; adding more work for it to do per OR-clause item will make
things worse not better.
regards, tom lane
Tom Lane wrote:
I'm toying with the notion of ripping out that logic and instead
building an in-memory hashtable of already-returned TIDs. This could
theoretically run out of memory if the multiple indexscan returns enough
tuples, but I think in practice that wouldn't happen because the planner
wouldn't choose an indexscan when very large numbers of tuples are being
selected.
I thought with your recent changes, you could use dynahash, which
already spills to disk instead of consuming all memory?
Joe
Joe Conway <mail@joeconway.com> writes:
Tom Lane wrote:
I'm toying with the notion of ripping out that logic and instead
building an in-memory hashtable of already-returned TIDs. This could
theoretically run out of memory if the multiple indexscan returns enough
tuples, but I think in practice that wouldn't happen because the planner
wouldn't choose an indexscan when very large numbers of tuples are being
selected.
I thought with your recent changes, you could use dynahash, which
already spills to disk instead of consuming all memory?
I was going to use dynahash, but it doesn't spill to disk. You're
confusing that with the HashJoin mechanism, which is quite different and
only really useful for joins.
regards, tom lane
I'm toying with the notion of ripping out that logic and instead
building an in-memory hashtable of already-returned TIDs. This could
theoretically run out of memory if the multiple indexscan returns enough
tuples, but I think in practice that wouldn't happen because the planner
wouldn't choose an indexscan when very large numbers of tuples are being
selected.
Comments?
Sounds kind of like a bitmap index, almost...
Chris
On 21 Aug 2003 at 16:42, Tom Lane wrote:
Stephan Szabo <sszabo@megazone.bigpanda.com> writes:
Within the scope of the new hashed IN stuff I believe so in at least some
cases. I have a few million row table of integers where searching for
values IN (~10000 values) takes longer than creating the temp table,
copying into it and doing the in subquery.
I did some profiling and soon realized that the main problem is the
executor's trick for not returning the same row multiple times in a
multiple-indexscan plan node. The point is that given
WHERE a = 1 OR b = 1
you could create a plan that first indexscans on a, then indexscans on
b --- but you mustn't return any tuples in the second scan that you
already returned in the first. IndexNext solves this by evaluating the
prior-scan index conditions to see if they are true. Which is okay if
there aren't too many of them. But when you have an N-element IN list
this means you are going to do O(N^2) index expression evaluations.
In the 10000-element IN-list test case, ExecQual gets called almost
50 million times :-(
I'm toying with the notion of ripping out that logic and instead
building an in-memory hashtable of already-returned TIDs. This could
theoretically run out of memory if the multiple indexscan returns enough
tuples, but I think in practice that wouldn't happen because the planner
wouldn't choose an indexscan when very large numbers of tuples are being
selected.
If I understand it correctly, we are essentially looking at ~10000 individual
index scans in the above example, aren't we? So if the planner takes this into
account, it probably would not opt for a seq. scan.
Another idea could be to index the IN list first and search the index using the
locality of values in the IN list. If the 10K values in the list are between
10K and 20K, they should be pretty easy to locate with a single index sweep,
assuming uniform distribution. Maybe the planner could build this IN-list index
before drawing a plan, to determine the cardinality and distribution of the
individual values.
At the very least, it could draw a plan for fetching a tuple range between the
min and max of the IN values and building an in-memory hash of these values for
comparison. That would surely be cheaper than scanning the entire table/result
set over and over.
Frankly, I would say a temp table is a far better solution.. :-)
Just a thought...
Bye
Shridhar
--
Youth doesn't excuse everything. -- Dr. Janice Lester (in Kirk's body),
"Turnabout Intruder", stardate 5928.5.
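Shridhar's min/max range idea can be sketched like this. This is an assumption about how such a scan might work, not an existing PostgreSQL feature: descend the index once to min(values), sweep forward until max(values), and filter fetched keys through an in-memory hash of the IN list.

```python
# Sketch of a single index-range sweep plus hash probe, replacing N
# separate index probes. A sorted list of (key, tid) pairs stands in
# for a btree index; not PostgreSQL code.
import bisect

def range_scan_with_hash(sorted_index, in_values):
    """Return the tids whose key is in in_values, via one range sweep."""
    wanted = set(in_values)               # in-memory hash of the IN list
    lo, hi = min(wanted), max(wanted)
    keys = [k for k, _ in sorted_index]
    start = bisect.bisect_left(keys, lo)  # one descent to the range start
    out = []
    for k, tid in sorted_index[start:]:
        if k > hi:                        # past max(values): stop the sweep
            break
        if k in wanted:                   # one hash probe per fetched row
            out.append(tid)
    return out

# 200 rows whose keys cycle through 0..49; each key appears 4 times.
index = sorted((v % 50, t) for t, v in enumerate(range(200)))
hits = range_scan_with_hash(index, [10, 11, 12])
assert sorted(hits) == [10, 11, 12, 60, 61, 62, 110, 111, 112, 160, 161, 162]
```

As the email notes, this only wins when the IN values are clustered; a sparse list over a wide key range would sweep far more rows than it keeps.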
Tom Lane <tgl@sss.pgh.pa.us> writes:
Finally, I suspect that once we get rid of the O(N^2) behavior in the
executor, we will find that the next biggest bottleneck is in the
planner; adding more work for it to do per OR-clause item will make
things worse not better.
But time spent in the planner doesn't matter if you're using prepared queries
which is true for OLTP applications where there are very few queries being
executed many many times. I hear web sites are quite popular these days.
I missed the beginning of this thread so I'm not sure if it's relevant. But
wouldn't it be possible to just check all the clauses and see if they're
using precisely the same index with an equality type operator? That won't
catch things like "a BETWEEN 1 AND 2 OR a BETWEEN 5 AND 6" but it will catch
things like "a IN (1,2,3,4,5,6)". And I don't see how it wouldn't be linear
in the number of clauses.
--
greg
On 22 Aug 2003, Greg Stark wrote:
Tom Lane <tgl@sss.pgh.pa.us> writes:
Finally, I suspect that once we get rid of the O(N^2) behavior in the
executor, we will find that the next biggest bottleneck is in the
planner; adding more work for it to do per OR-clause item will make
things worse not better.
But time spent in the planner doesn't matter if you're using prepared queries
which is true for OLTP applications where there are very few queries being
executed many many times. I hear web sites are quite popular these days.
True, but not every application is such. You still need to balance
planning time with other concerns.
I missed the beginning of this thread so I'm not sure if it's relevant. But
wouldn't it be possible to just check all the clauses and see if they're
using precisely the same index with an equality type operator? That won't
catch things like "a BETWEEN 1 AND 2 OR a BETWEEN 5 AND 6" but it will catch
things like "a IN (1,2,3,4,5,6)". And I don't see how it wouldn't be linear
in the number of clauses.
But that wouldn't help unless you also made sure that what was being compared
was not the same, AFAICT (since the point would be to not need to check
whether the rows were already returned), since a=1 or a=1 is legal if
meaningless. And that's not necessarily immediately obvious depending on the
behavior of the = operator without trying it; for example, does
where a='c ' or a='c' have a redundancy or not?
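The linear check Greg proposes, together with Stephan's duplicate caveat, might look like the sketch below. The assumptions are simplified: every OR arm is (column, '=', constant), and constants count as distinct only when they differ as raw values. As the reply points out, real operator semantics are subtler (a text '=' that ignores trailing spaces would make 'c ' and 'c' redundant even though the raw strings differ), so this is only safe for operators known to agree with raw-value equality.

```python
# Linear-time check that a list of OR arms is provably mutually
# exclusive: all plain equality, all on one column, all distinct
# constants. A hypothetical planner helper, not PostgreSQL code.

def arms_mutually_exclusive(arms):
    """arms: list of (column, op, constant) tuples. O(len(arms))."""
    seen = set()
    column = None
    for col, op, const in arms:
        if op != "=":
            return False      # only plain equality is analyzable here
        if column is None:
            column = col
        elif col != column:
            return False      # mixed columns: arms may overlap rows
        if const in seen:
            return False      # duplicate constant, e.g. a=1 OR a=1
        seen.add(const)
    return True

assert arms_mutually_exclusive([("a", "=", 1), ("a", "=", 2)])
assert not arms_mutually_exclusive([("a", "=", 1), ("a", "=", 1)])
assert not arms_mutually_exclusive([("a", "=", 1), ("b", "=", 1)])
```

One set membership test per arm keeps the whole check linear, avoiding the O(N^2) pairwise comparison Tom warns about.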
Stephan Szabo <sszabo@megazone.bigpanda.com> writes:
Well, if you want to be safer, I guess you could (at runtime) decide that
the table's gotten too big and fall back to the old method if you didn't
entirely rip it out. I'm not sure if that'd be too ugly though, but it
would mean that you wouldn't have to worry about it returning too many
tuples.
I did it this way --- it falls back to the old code if the TID hash
table grows to exceed SortMem. Should be noticeably faster than the
old code for reasonably-sized IN lists.
regards, tom lane
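The shape of that fix, hash table with a memory-budget fallback, can be sketched as below. This is an assumption about the approach, not the committed PostgreSQL code: if the TID hash would outgrow the work-memory budget (SortMem in 7.x terms), discard it and fall back to re-evaluating prior scans' quals, which needs no extra memory.

```python
# Toy model of TID dedup with a capped hash table and a fallback to the
# old O(N^2) recheck path. Rows stand in for TIDs; not PostgreSQL code.

def multi_scan_dedup(scans, max_hash_entries):
    """scans: list of (qual, rows); yields each matching row once."""
    seen = set()
    use_hash = True
    for i, (qual, rows) in enumerate(scans):
        for row in rows:
            if use_hash and len(seen) >= max_hash_entries:
                use_hash = False          # budget exceeded: fall back
                seen = None               # free the hash table
            if use_hash:
                if row in seen:
                    continue              # fast path: one hash probe
                seen.add(row)
            elif any(prev_qual(row) for prev_qual, _ in scans[:i]):
                continue                  # slow path: recheck prior quals
            yield row

# WHERE-style arms that overlap on row 2.
qual1 = lambda r: r in (1, 2)
qual2 = lambda r: r in (2, 3)
scans = [(qual1, [1, 2]), (qual2, [2, 3])]
assert list(multi_scan_dedup(scans, 100)) == [1, 2, 3]  # stays on the hash path
assert list(multi_scan_dedup(scans, 1)) == [1, 2, 3]    # falls back mid-stream
```

The mid-stream fallback is safe because a single index scan never returns the same tuple twice on its own; only cross-scan duplicates need rechecking, and those are covered by the prior-qual test.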