Question with hashed IN

Started by Stephan Szabo in August 2003; 9 messages
#1 Stephan Szabo
sszabo@megazone.bigpanda.com

I've noticed that when the stats are wrong (like in cases where you've
loaded data but reltuples hasn't been updated yet), a hashed NOT IN seems
to take a significant time penalty. Is this to be expected?

On a pktest table with 1 million integers and a dual table with a single
integer and sort_mem set high enough to give a hashed subplan for the
various reltuples values, I saw the following behavior for

explain analyze select * from dual where a not in (select a from pktest);

with reltuples=1000 for pktest, query takes about 96 seconds
reltuples=10000, query takes about 15 seconds
reltuples=100000, query takes about 8 seconds

And the memory usage seemed to be the same even if I set sort_mem back
to 1024.
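
For concreteness, a minimal sketch of the setup described above (the real
data came from a Perl script, attached in a later message; generate_series()
may not exist on older servers, in which case the script is the way to load
the rows, and the sort_mem value is just "big enough"):

create table pktest (a integer);
insert into pktest select g from generate_series(1, 1000000) g;

create table dual (a integer);
insert into dual values (1000001);  -- one row, one column; any value works

set sort_mem = 65536;  -- high enough that the planner hashes the subplan
explain analyze select * from dual where a not in (select a from pktest);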

#2 Stephan Szabo
sszabo@megazone.bigpanda.com
In reply to: Stephan Szabo (#1)
Re: Question with hashed IN

On Sat, 16 Aug 2003, Stephan Szabo wrote:

I've noticed that when the stats are wrong (like in cases where you've
loaded data but reltuples hasn't been updated yet), a hashed NOT IN seems
to take a significant time penalty. Is this to be expected?

On a pktest table with 1 million integers and a dual table with a single
integer and sort_mem set high enough to give a hashed subplan for the
various reltuples values, I saw the following behavior for

explain analyze select * from dual where a not in (select a from pktest);

with reltuples=1000 for pktest, query takes about 96 seconds
reltuples=10000, query takes about 15 seconds
reltuples=100000, query takes about 8 seconds

And the memory usage seemed to be the same even if I set sort_mem back
to 1024.

Errm, I meant in the cases where it still chose a hashed
subplan. Stupid cold medicine.

#3 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Stephan Szabo (#2)
Re: Question with hashed IN

Stephan Szabo <sszabo@megazone.bigpanda.com> writes:

with reltuples=1000 for pktest, query takes about 96 seconds
reltuples=10000, query takes about 15 seconds
reltuples=100000, query takes about 8 seconds

Errm, I meant in the cases where it still chose a hashed
subplan. Stupid cold medicine.

I'm confused too. Please explain again when you're feeling better...

regards, tom lane

#4 Stephan Szabo
sszabo@megazone.bigpanda.com
In reply to: Tom Lane (#3)
Re: Question with hashed IN

On Sun, 17 Aug 2003, Tom Lane wrote:

Stephan Szabo <sszabo@megazone.bigpanda.com> writes:

with reltuples=1000 for pktest, query takes about 96 seconds
reltuples=10000, query takes about 15 seconds
reltuples=100000, query takes about 8 seconds

Errm, I meant in the cases where it still chose a hashed
subplan. Stupid cold medicine.

I'm confused too. Please explain again when you're feeling better...

Basically, the first thing I noticed was that changing reltuples
on the pg_class row for a table affected the speed of

explain analyze select * from othertable where foo not in (select bar from
table);

even when the plan wasn't changing, seqscan + filter on hashed subquery.
I thought that was kind of odd since the plan didn't seem any different,
but the real-world time changed by about a factor of 10.

Then I noted that changing sort_mem changed the point at which it would
choose a hashed subquery in the initial plan based on the estimated
tuples, but didn't seem to actually affect the real memory usage. That
means a table with a few million rows but reltuples still set at 1000
can eat up a very large amount of memory (in my case it sent my machine
a few hundred megs into swap).
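
To make that concrete, the reltuples fudging looks like this (a sketch
only; editing pg_class by hand needs superuser rights and belongs on a
scratch database):

-- pretend pktest holds 1000 rows; it really holds 1000000
update pg_class set reltuples = 1000 where relname = 'pktest';

-- with the estimate that small, even sort_mem = 1024 still yields the
-- hashed subplan, and all 1000000 real rows get hashed regardless
set sort_mem = 1024;
explain analyze select * from dual where a not in (select a from pktest);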

#5 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Stephan Szabo (#4)
Re: Question with hashed IN

Stephan Szabo <sszabo@megazone.bigpanda.com> writes:

Basically, the first thing I noticed was that changing reltuples
on the pg_class row for a table affected the speed of
explain analyze select * from othertable where foo not in (select bar from
table);
even when the plan wasn't changing, seqscan + filter on hashed subquery.

That doesn't make any sense to me --- AFAICS, only the planner pays any
attention to reltuples, so it could only affect things via changing the
plan. Could we see details?

Then I noted that changing sort_mem changed the point at which it would
choose a hashed subquery in the initial plan based on the estimated
tuples, but didn't seem to actually affect the real memory usage,

Yeah, the hashed-subquery code doesn't make any attempt to spill to
disk. So if the planner's estimate is badly off, you could see actual
usage well in excess of sort_mem.
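
The corollary is to keep the estimate current before relying on a hashed
subplan, e.g. (the obvious mitigation, not a fix):

-- refresh reltuples so both the plan choice and the hash sizing
-- reflect the real row count
analyze pktest;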

regards, tom lane

#6 Stephan Szabo
sszabo@megazone.bigpanda.com
In reply to: Tom Lane (#5)
2 attachment(s)
Re: Question with hashed IN

On Sun, 17 Aug 2003, Tom Lane wrote:

Stephan Szabo <sszabo@megazone.bigpanda.com> writes:

Basically, the first thing I noticed was that changing reltuples
on the pg_class row for a table affected the speed of
explain analyze select * from othertable where foo not in (select bar from
table);
even when the plan wasn't changing, seqscan + filter on hashed subquery.

That doesn't make any sense to me --- AFAICS, only the planner pays any
attention to reltuples, so it could only affect things via changing the
plan. Could we see details?

I've included a perl file that generates data like what I was using, and
the output of running the commands through psql -E on my machine. The
times seem pretty repeatable in any order, so caching and such doesn't
seem to be playing a big part.

Then I noted that changing sort_mem changed the point at which it would
choose a hashed subquery in the initial plan based on the estimated
tuples, but didn't seem to actually affect the real memory usage,

Yeah, the hashed-subquery code doesn't make any attempt to spill to
disk. So if the planner's estimate is badly off, you could see actual
usage well in excess of sort_mem.

Ah, that makes sense then.

Attachments:

hashtest.pl (application/x-perl)
hashtest.out (text/plain; charset=US-ASCII)

#7 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Stephan Szabo (#6)
Re: Question with hashed IN

Stephan Szabo <sszabo@megazone.bigpanda.com> writes:

On Sun, 17 Aug 2003, Tom Lane wrote:

That doesn't make any sense to me --- AFAICS, only the planner pays any
attention to reltuples, so it could only affect things via changing the
plan. Could we see details?

I've included a perl file that generates data like what I was using, and
the output of running the commands through psql -E on my machine. The
times seem pretty repeatable in any order, so caching and such doesn't
seem to be playing a big part.

Oh, I see what it is. The initial sizing of the hash table (number of
buckets) is done using the planner's estimate of the number of rows out
of the subplan. In your later examples, the hash table is woefully
overloaded and so searching it takes longer (too many items on each
hash chain).
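
In round numbers (an illustration of the effect, not figures taken from
the code): if the bucket count is set from an estimate of 1000 rows while
a million rows actually arrive, the average chain length becomes

    n / m = 1000000 / 1000 = 1000

comparisons per NOT IN probe, where an accurately sized table stays near
one comparison per probe.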

I'm not sure how important this is to work on. We could try to make the
executor's hash code more able to adapt when the hash table grows beyond
what it was expecting (by rehashing, etc) but personally I'd rather spend
the time on trying to improve the estimate to begin with.

regards, tom lane

#8 Stephan Szabo
sszabo@megazone.bigpanda.com
In reply to: Tom Lane (#7)
Re: Question with hashed IN

On Sun, 17 Aug 2003, Tom Lane wrote:

Stephan Szabo <sszabo@megazone.bigpanda.com> writes:

On Sun, 17 Aug 2003, Tom Lane wrote:

That doesn't make any sense to me --- AFAICS, only the planner pays any
attention to reltuples, so it could only affect things via changing the
plan. Could we see details?

I've included a perl file that generates data like that I was using and
the output of the commands from that through psql -E on my machine. The
times seem pretty repeatable in any order so caching and such doesn't seem
to be playing a big part.

Oh, I see what it is. The initial sizing of the hash table (number of
buckets) is done using the planner's estimate of the number of rows out
of the subplan. In your later examples, the hash table is woefully
overloaded and so searching it takes longer (too many items on each
hash chain).

I'm not sure how important this is to work on. We could try to make the
executor's hash code more able to adapt when the hash table grows beyond
what it was expecting (by rehashing, etc) but personally I'd rather spend
the time on trying to improve the estimate to begin with.

Right.

In case you're wondering, this all came out of playing with doing the NOT
IN query for the ALTER TABLE ADD CONSTRAINT stuff for foreign keys, where
the stats are potentially still at their defaults when the ALTER occurs.
I've built a not-quite-complete version using NOT IN and another using NOT
EXISTS for performance-testing purposes (roughly the shapes sketched
below; neither handles the zero-column case, although I don't think you
can actually specify that syntactically, and neither handles permissions
correctly if the owner can't read the other table). The things I noticed
were side effects of watching that, distilled into simpler tests.
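
The two shapes under test look roughly like this (table and column names
invented for illustration; the real queries are generated from the
constraint definition):

-- NOT IN form
select 1 from fktab f
where f.col is not null
  and f.col not in (select p.col from pktab p)
limit 1;

-- NOT EXISTS form
select 1 from fktab f
where f.col is not null
  and not exists (select 1 from pktab p where p.col = f.col)
limit 1;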

#9 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#7)
Re: Question with hashed IN

I said:

Oh, I see what it is. The initial sizing of the hash table (number of
buckets) is done using the planner's estimate of the number of rows out
of the subplan. In your later examples, the hash table is woefully
overloaded and so searching it takes longer (too many items on each
hash chain).

I'm not sure how important this is to work on. We could try to make the
executor's hash code more able to adapt when the hash table grows beyond
what it was expecting (by rehashing, etc) but personally I'd rather spend
the time on trying to improve the estimate to begin with.

After some further looking, I realized that the backend's standard hashtable
management code (dynahash.c) already has a perfectly good algorithm for
expanding hashtables when they start to get full. But the code I'd
written for hashed aggregation and grouping didn't use dynahash.c,
because the latter didn't support anything more complex than memcmp()
for comparing keys. This was obviously a dumb decision in hindsight,
so I've invested the couple hours' work needed to improve dynahash's API
and make the hashed-aggregation code use it.
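
Since that same executor hash code also backs hashed aggregation and
grouping, a query like (a hypothetical beneficiary, not one tested here)

select a, count(*) from pktest group by a;

planned as a hashed aggregate from a bad row estimate should degrade more
gracefully now as well.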

As of CVS tip I get more reasonable behavior in your example:

QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
Seq Scan on fktest (cost=12510.00..12532.50 rows=500 width=4) (actual time=7803.77..7803.77 rows=0 loops=1)
Filter: (NOT (hashed subplan))
SubPlan
-> Seq Scan on pktest (cost=0.00..10010.00 rows=1000000 width=4) (actual time=248.34..5408.39 rows=1000000 loops=1)
Total runtime: 7979.45 msec
(5 rows)

QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
Seq Scan on fktest (cost=1260.00..1282.50 rows=500 width=4) (actual time=4482.05..4482.05 rows=0 loops=1)
Filter: (NOT (hashed subplan))
SubPlan
-> Seq Scan on pktest (cost=0.00..1010.00 rows=100000 width=4) (actual time=0.09..2052.05 rows=1000000 loops=1)
Total runtime: 4651.02 msec
(5 rows)

QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
Seq Scan on fktest (cost=135.00..157.50 rows=500 width=4) (actual time=5830.16..5830.16 rows=0 loops=1)
Filter: (NOT (hashed subplan))
SubPlan
-> Seq Scan on pktest (cost=0.00..110.00 rows=10000 width=4) (actual time=0.03..3413.05 rows=1000000 loops=1)
Total runtime: 5994.26 msec
(5 rows)

QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
Seq Scan on fktest (cost=22.50..45.00 rows=500 width=4) (actual time=4160.85..4160.85 rows=0 loops=1)
Filter: (NOT (hashed subplan))
SubPlan
-> Seq Scan on pktest (cost=0.00..20.00 rows=1000 width=4) (actual time=0.03..1755.32 rows=1000000 loops=1)
Total runtime: 4326.24 msec
(5 rows)

Before, the same four tests gave runtimes like these on this machine:
Total runtime: 16590.97 msec
Total runtime: 13792.45 msec
Total runtime: 22702.87 msec
Total runtime: 115465.43 msec
(I forget now whether those times were taken with a backend compiled for
profiling --- so they may not be directly comparable. But at least I
can say that the increase in runtime with smaller estimates is gone.)

regards, tom lane