Re:Am I really stupid???
I would like to ask again, because I feel really stupid, I get no answer. I always got an answer to my
questions when I mailed in mailing lists until now. What's happening? At least somebody could answer
simply: "Your question is not worth an answer!" but no answer makes me feel really stupid. As I have
spent a lot of time reading documentation and making tests (perhaps not enough) I would really be
pleased if I had an answer to this simple problem (tell me if it is my fault please):
Original message from: Dragos Stoichita
Show quoted text
Hi, I'm a new user to SQL and PostgreSQL so perhaps my questions below will be a little stupid so
please excuse me.I do this:
CREATE TABLE t1 ( PRIMARY KEY (f1), f1 INTEGER, f2 INTEGER);
CREATE TABLE t2 ( PRIMARY KEY (f1), f1 INTEGER, f2 INTEGER);Then I fill each of these tables with say, around 10000 rows.
When I do:
SELECT f2 FROM t1 WHERE f1 > 100;
It is amazingly fast! It takes less than 1 second. And it returns around 3000 rows.
I do then:
SELECT f2 FROM t2 WHERE f1 > 100;
It is also amazingly fast and returns around 4000 rows.
Then I do:
SELECT f2 FROM t1 WHERE f1 > 100 INTERSECT SELECT f2 FROM t2 WHERE f1 > 100;
And it is incredibly *SLOW*!!! I really don't understand, I run postmaster on a 400Mhz pc with 64 megs
of ram. What's happening? It is only an intersection of integers. If I had to do it in C, I would Quicksort
the results from the first query, Quicksort the results from the second query, then unique them, then
intersect them. On a 400 Mhz processor I think it would take less than 1 second. I tested my Quicksort
routines on a Pentium 120 and remembered it sorted more than 100000 integers per second. And a
unique algorithm when the elements are ordered is very fast. The same for an intersection algorithm. But
it takes more than 8 seconds for PostgreSQL to process the INTERSECT.Is there an explanation? Is it my fault? Please help me I already switched from another database to this
one and hoped PostgreSQL would perform well :(Dragos Stoichita, 19 year old student in electronics at ESIEE (http://www.esiee.fr)
I used to reformat broken email and respond, but lately I don't have
the time for the effort, if you follow the guidlines at
http://www.lemis.com/email.html
You'll probably have a lot more luck in the future.
As far as INTERSECT being slow, I really don't know, I've found EXCEPT to
be horribly slow in Postgresql, right now we're trying to work out some
funding to get this resolved.
In the meanwhile I think a better way to accomplish your query would
be this:
SELECT
t1.f2
FROM
t1, t2
WHERE
t1.f1 > 100 AND t2.f1 > 100
AND t1.f2 = t2.f2
;
(I hope) :)
-Alfred
* Dragos Stoichita <ddd@genesis.homeip.net> [000516 15:46] wrote:
I would like to ask again, because I feel really stupid, I get no answer. I always got an answer to my
questions when I mailed in mailing lists until now. What's happening? At least somebody could answer
simply: "Your question is not worth an answer!" but no answer makes me feel really stupid. As I have
spent a lot of time reading documentation and making tests (perhaps not enough) I would really be
pleased if I had an answer to this simple problem (tell me if it is my fault please):Original message from: Dragos Stoichita
Hi, I'm a new user to SQL and PostgreSQL so perhaps my questions below will be a little stupid so
please excuse me.I do this:
CREATE TABLE t1 ( PRIMARY KEY (f1), f1 INTEGER, f2 INTEGER);
CREATE TABLE t2 ( PRIMARY KEY (f1), f1 INTEGER, f2 INTEGER);Then I fill each of these tables with say, around 10000 rows.
When I do:
SELECT f2 FROM t1 WHERE f1 > 100;
It is amazingly fast! It takes less than 1 second. And it returns around 3000 rows.
I do then:
SELECT f2 FROM t2 WHERE f1 > 100;
It is also amazingly fast and returns around 4000 rows.
Then I do:
SELECT f2 FROM t1 WHERE f1 > 100 INTERSECT SELECT f2 FROM t2 WHERE f1 > 100;
And it is incredibly *SLOW*!!! I really don't understand, I run postmaster on a 400Mhz pc with 64 megs
of ram. What's happening? It is only an intersection of integers. If I had to do it in C, I would Quicksort
the results from the first query, Quicksort the results from the second query, then unique them, then
intersect them. On a 400 Mhz processor I think it would take less than 1 second. I tested my Quicksort
routines on a Pentium 120 and remembered it sorted more than 100000 integers per second. And a
unique algorithm when the elements are ordered is very fast. The same for an intersection algorithm. But
it takes more than 8 seconds for PostgreSQL to process the INTERSECT.Is there an explanation? Is it my fault? Please help me I already switched from another database to this
one and hoped PostgreSQL would perform well :(Dragos Stoichita, 19 year old student in electronics at ESIEE (http://www.esiee.fr)
--
-Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org]
"I have the heart of a child; I keep it in a jar on my desk."
* Alfred Perlstein (bright@wintelcom.net) [20000516 22:54]:
In the meanwhile I think a better way to accomplish your query would
be this:SELECT
t1.f2
FROM
t1, t2
WHERE
t1.f1 > 100 AND t2.f1 > 100
AND t1.f2 = t2.f2
;(I hope) :)
You really don't need the t2.f1 > 100 bit because of the latter join.
i.e.,
SELECT
t1.f2
FROM
t1, t2
WHERE
t1.f1 > 100 AND t1.f2 = t2.f2
;
should be sufficient.
Hope this helps,
patrick
--
Abstainer: a weak person who yields to the temptation of
denying himself a pleasure.
-- Ambrose Bierce
Original message from: Raul Chirea
Anyway, try to learn some SQL bases before asking that kind of things
(especially the "join between two or more tables" notion and what is an
"index" and how to use it) !
You'll save a lot of time, yours and other's.
1) I intensively use indexes
2) I already used joins but my problem is a lot more complicated than this one, I have
to do around 20 intersects between very complicated SQL queries, so rewriting this in
joins is very difficult if not near impossible.
3) I think I have given some very strong arguments in my message.
I have to write a database system for an online employment site and there is a
search with more than 20 criteria. There should be around 10000 candidates after 1
year, but I prefer to be sure the search is very fast with 100000 or 1000000 so that
in the future there will be no problems. For each one of the search criteria I have done simple
tables with 2 columns, one being the index and the other an integer indicating the
candidate identifier. After having done multiple selects, I need to do an intersect.
Of course I can't say it is impossible to write the same thing in joins, but believe me
it would be a lot slower, here's my idea:
Each of the selects returns around 2000 integers. I have a 400 Mhz pc for the
development, but the final machine will be an IBM Netfinity server with several
pentium 3 processors. What I do is a very high quality work and the server must
be able to handle a huge demand.
Now of course every people in this forum will tell me to rewrite the query in a join,
because I gave a simple example, but I could tell you a SQL request that's near
1 page long, between multiple intersects and unions.
Now that's ONE thing that I think nobody here will be able to excuse:
Sorting integers on today's 400+ Mhz pc's, especially 10000 ones, is really fast.
Doing an unique on sorted integers is really fast too.
Doing an intersect on sorted, unique integers is really fast.
So intersecting 2000x2000x3000x2000x5000 on today's 400+ Mhz pc's should
always take less than 1 second (a lot less).
Nobody really answered my question. I did not ask you to tell me how to rewrite
my question, because I already know that, don't think I do not read the docs,
tutorials, etc. I need this fast intersect because if I did not have it the complexity
of the problem would have been multiplied by at least a factor of 10 believe me.
Now listen to me: 13 seconds for intersecting two 2000 element arrays it is a
SHAME.
Dragos Stoichita.
Don't blame a poor student in electronics that's only 19 years old, and look at
the real arguments he gives.
Import Notes
Resolved by subject fallback
Original message from: Fabrice Scemama
On finit par penser que ta question comporte en elle
sa propre reponse :-)
informe-toi sur les joins en particulier et le SQL en general.
Tu critiqueras la liste et les gens ensuite.(translation to English-speaking mailing-list users:
I replied 'RTFM' :-)Fabrice Scemama
Dragos Stoichita wrote:
Let's get rid of those kind of messages. When you write a message
saying you are a beginner, there's always a little bunch of people like
you who particularly like to say: "go read the docs newbie" or
"go learn that" etc.
I am not someone who never reads the docs before asking a question.
I know SQL, I know joins, and all the people who told me to rewrite
the query would better have not answered my message, because
I already knew that.
I take your message as a personal insult, because I don't criticize
people without reason, and I did criticize this list because it was
justified.
I need help about only once every 2 or three months, because I am
able to use the docs and do my own work alone.
HERE I AM NOT ASKING HELP!
I asked a simple question: why the query with intersects is not only
slower, but with a factor of 10^something slower than the join one.
If the select returns a table of 1 column with n rows all integers,
sorting them ~n*logn, unique is ~n and intersect is ~n. I have here
a book on Algorithmics and have validated a course on this, I know
what I talk about. Why, with performant sorting, unique and intersect
algorithms, it takes 13 seconds???
I just want a real serious person that has knowledge to answer this
precise question, using an algorithmic demonstration. I only talk about
maths and algorithms here.
Please be serious in your answer next time.
Dragos Stoichita.
Import Notes
Resolved by subject fallback
Dragos Stoichita wrote:
Original message from: Fabrice Scemama
On finit par penser que ta question comporte en elle
sa propre reponse :-)
informe-toi sur les joins en particulier et le SQL en general.
Tu critiqueras la liste et les gens ensuite.(translation to English-speaking mailing-list users:
I replied 'RTFM' :-)Fabrice Scemama
Dragos Stoichita wrote:
Let's get rid of those kind of messages.
I agree.
When you write a message
saying you are a beginner, there's always a little bunch of people like
you who particularly like to say: "go read the docs newbie" or
"go learn that" etc.I am not someone who never reads the docs before asking a question.
I know SQL, I know joins, and all the people who told me to rewrite
the query would better have not answered my message, because
I already knew that.
Unfortunantely email often does not include a knowlege meter that tells
others what is know or not. Some people are better at making guesses that
others, but it's not always easy.
I take your message as a personal insult, because I don't criticize
people without reason, and I did criticize this list because it was
justified.
I'm not convinced that your criticism of the list was justified (you are
of course welcome to draw your own conclusions).
I need help about only once every 2 or three months, because I am
able to use the docs and do my own work alone.HERE I AM NOT ASKING HELP!
I asked a simple question: why the query with intersects is not only
slower, but with a factor of 10^something slower than the join one.
If the select returns a table of 1 column with n rows all integers,
sorting them ~n*logn, unique is ~n and intersect is ~n. I have here
a book on Algorithmics and have validated a course on this, I know
what I talk about. Why, with performant sorting, unique and intersect
algorithms, it takes 13 seconds???I just want a real serious person that has knowledge to answer this
precise question, using an algorithmic demonstration. I only talk about
maths and algorithms here.
If you wish to talk about the actual implemented algorithms, you might
have done better posting to the hackers mailing list. But remember,
unless you pay for support, you have no intrinsic right to expect that
others will take the problem as seriously as you do. Particularly if you
cannot be VERY clear about what the problem is and what questions you want
answerred. That being said, the developers have always treated me well.
As for your specific question, I'd like to ask if you've run the vaious
queries and subqueries using 'explain' -- it is sometimes the case that
indexes do not behave as you would expect or hope on complex queries. I
think that historical experience with this fact is what caused some well
meaning responders to suggest rewrite alternatives for you. I do not
think it was out of ill will or a desire to insult your intelligence.
For example, if you look through the archive you will see quite a few
discussions about slow returns from "SELECT * FROM foo WHERE x in (...)"
-- this is a fact about the current implementation of PostgreSQL and the
(only?) solution is to rewrite using "WHERE EXISTS ...." Though I forget
the exact reason, it turns out that it is not a trivial matter to have the
optimizer rewrite the first query into the second (and of course the first
is slow because it cannot use inidces). Your question could be a similar
case, or it could be a repairable shortcoming in the planner/optimizer.
In either case, explain will help and the developers/hackers may be able
to give specific insight, as may browsing the code.
Please be serious in your answer next time.
I am being as serious as I can. Please do not take offense at anything I
have said -- offense or insult is explicitly not my intent. My intent is
to help. And maybe to gently suggest that in most cases where insult is
percieved on these lists, it was not intended, and that it always takes
two parties to escalate the sense of insult.
Karl DeBisschop
Original message from: Karl DeBisschop
I agree.
...
two parties to escalate the sense of insult.Karl DeBisschop
Thank you for your answer, you know, that's *THAT* kind of answer that I often get,
people that look a minimum at the question and that either agree, disagree with
the problem, try to find out, etc...
But sometimes there are people like Fabrice Scemama that for no reason do not like
me. Those people even send me private insult messages. Fabrice Scemama has done
so.
I do not like people that forget about being polite.
It is true that I insisted on my question, perhaps I should have waited one week before
doing it. That was because I usually get an answer really quickly.
Perhaps I overestimated the speed of this mailing list or the number of persons
consulting it every day.
If I have done a disaster please excuse me. I am working very hard and am tired.
Dragos Stoichita.
Import Notes
Resolved by subject fallback
Karl DeBisschop <kdebisschop@h00a0cc3b7988.ne.mediaone.net> writes:
For example, if you look through the archive you will see quite a few
discussions about slow returns from "SELECT * FROM foo WHERE x in (...)"
-- this is a fact about the current implementation of PostgreSQL and the
(only?) solution is to rewrite using "WHERE EXISTS ...." Though I forget
the exact reason, it turns out that it is not a trivial matter to have the
optimizer rewrite the first query into the second (and of course the first
is slow because it cannot use inidces). Your question could be a similar
case, or it could be a repairable shortcoming in the planner/optimizer.
In fact, IN (subselect), INTERSECT, and EXCEPT are all pretty much the
same thing, and they're all pretty slow in the current code :-(, because
they all work by rescanning the inner query for each outer tuple --- in
other words, they're all implemented like plain nestloop joins. EXISTS
is marginally better because the planner can figure out how to use an
index on the inner table, if there is one.
The right way to fix this is to promote these operations into
full-fledged join types so that the optimizer can consider alternatives
like mergejoin (which is basically the method Dragos is talking about),
hashjoin, index-driven nestloop, etc. That's not a small task. I'm
hoping to see it happen as part of the querytree redesign scheduled for
7.2, which will also give us outer joins. If you think about it, all
of these are variants on the theme of outer join...
regards, tom lane
At 11:18 AM 17-05-2000 -0400, Tom Lane wrote:
In fact, IN (subselect), INTERSECT, and EXCEPT are all pretty much the
same thing, and they're all pretty slow in the current code :-(, because
they all work by rescanning the inner query for each outer tuple --- in
other words, they're all implemented like plain nestloop joins. EXISTS
is marginally better because the planner can figure out how to use an
index on the inner table, if there is one.
Does that mean for two tables, one 1000 rows another 2000 rows, it's a
total of 1000 X 2000 rows scanned?
I suppose one way to slightly reduce the number of rows would be to select
stuff into temp tables first.
Cheerio,
Link.
Lincoln Yeoh <lylyeoh@mecomb.com> writes:
At 11:18 AM 17-05-2000 -0400, Tom Lane wrote:
In fact, IN (subselect), INTERSECT, and EXCEPT are all pretty much the
same thing, and they're all pretty slow in the current code :-(, because
they all work by rescanning the inner query for each outer tuple --- in
other words, they're all implemented like plain nestloop joins. EXISTS
is marginally better because the planner can figure out how to use an
index on the inner table, if there is one.
Does that mean for two tables, one 1000 rows another 2000 rows, it's a
total of 1000 X 2000 rows scanned?
You got it :-(. (Unless we find what we're looking for before scanning
all the way through --- so M*N is the worst case, but the average case
is probably less.)
I suppose one way to slightly reduce the number of rows would be to select
stuff into temp tables first.
7.0 is smart enough to do that if the inner query looks complicated.
If it's just a sequential scan itself, then of course there's no win...
regards, tom lane
Does that mean for two tables, one 1000 rows another 2000 rows, it's a
total of 1000 X 2000 rows scanned?You got it :-(. (Unless we find what we're looking for before scanning
all the way through --- so M*N is the worst case, but the average case
is probably less.)I suppose one way to slightly reduce the number of rows would be to select
stuff into temp tables first.7.0 is smart enough to do that if the inner query looks complicated.
If it's just a sequential scan itself, then of course there's no win...
Ok I got it. I'm going to do simple selects and doing the intersects myself :)
Dragos.
Import Notes
Resolved by subject fallback