SQL query runs fine on one platform (FreeBSD), but hangs on another (Windows)
The query I'm running is:
"select page.*, coalesce((select COUNT(*) from sentence where
sentence."PageURL" = page."URL" group by page."URL"), 0) as
NoOfSentences from page WHERE "Classification" LIKE CASE WHEN "<>"
THEN " ELSE '%' END ORDER BY "PublishDate" DESC Offset 0 LIMIT 100"
I can post the table definitions if that would be helpful but I don't
have them on hand at the moment.
The gist of it though is that "page" and "sentence" are two tables.
page.URL maps to sentence.PageURL. The page table has the columns
"Classification", and "PublishDate". URL, PageURL, and Classification
are strings. PublishDate is a timestamp with timezone.
Both queries are run from a Java project using the latest JDBC driver.
The PostgreSQL Server versions it's being run on are 9.2 and 9.3. The
query executes and returns just fine when run on a FreeBSD-based
platform, but executes forever when run under Windows.
Does anyone have any idea why this might be happening? Are there
platform/syntax compatibility issues I'm triggering here that I'm
unaware of? Is there something wrong with the query?
We're going to try to test it under Linux too, but that system will
have to be set up first so it might be a while before we know those
results.
Any thoughts would be appreciated,
David Noel
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
On 29/04/2014 09:59, David Noel wrote:
The query I'm running is:
"select page.*, coalesce((select COUNT(*) from sentence where
sentence."PageURL" = page."URL" group by page."URL"), 0) as
NoOfSentences from page WHERE "Classification" LIKE CASE WHEN "<>"
THEN " ELSE '%' END ORDER BY "PublishDate" DESC Offset 0 LIMIT 100"
In all honesty, this query is very badly written. It seems like it was ported from some other
system. The inner group by in the coalesce is redundant since the result is always one row,
moreover, it is wrong since coalesce accepts a scalar value, it hits the eye at first sight.
Additionally, ''<>'' always returns false, what's the purpose of the CASE statement?
I can post the table definitions if that would be helpful but I don't
have them on hand at the moment.The gist of it though is that "page" and "sentence" are two tables.
page.URL maps to sentence.PageURL. The page table has the columns
"Classification", and "PublishDate". URL, PageURL, and Classification
are strings. PublishDate is a timestamp with timezone.Both queries are run from a Java project using the latest JDBC driver.
The PostgreSQL Server versions it's being run on are 9.2 and 9.3. The
query executes and returns just fine when run on a FreeBSD-based
platform, but executes forever when run under Windows.Does anyone have any idea why this might be happening? Are there
platform/syntax compatibility issues I'm triggering here that I'm
unaware of? Is there something wrong with the query?We're going to try to test it under Linux too, but that system will
have to be set up first so it might be a while before we know those
results.Any thoughts would be appreciated,
Try to re-write the query in a good form, and then perform EXPLAIN ANALYZE on both systems
to see what's wrong.
David Noel
--
Achilleas Mantzios
Head of IT DEV
IT DEPT
Dynacom Tankers Mgmt
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
On 4/29/14, David Noel <david.i.noel@gmail.com> wrote:
The query I'm running is:
"select page.*, coalesce((select COUNT(*) from sentence where
sentence."PageURL" = page."URL" group by page."URL"), 0) as
NoOfSentences from page WHERE "Classification" LIKE CASE WHEN "<>"
THEN " ELSE '%' END ORDER BY "PublishDate" DESC Offset 0 LIMIT 100"
Relevant schema below:
-------------------------------------------------
CREATE TABLE page
(
"URL" text NOT NULL,
"Title" text,
"Article" text,
"PublishDate" timestamp with time zone,
"SiteName" text,
"Classification" text,
...etc...
CONSTRAINT page_pkey PRIMARY KEY ("URL")
)
WITH (
OIDS=FALSE
);
ALTER TABLE page
OWNER TO dba;
-------------------------------------------------
CREATE TABLE sentence
(
"UUID" serial NOT NULL,
"IDSentence" text NOT NULL,
"Contents" text,
"IDAuthor" text,
"CreatedAt" text,
"PageURL" text NOT NULL,
CONSTRAINT sentence_pkey PRIMARY KEY ("UUID"),
CONSTRAINT idpage_fkey FOREIGN KEY ("PageURL")
REFERENCES page ("URL") MATCH Unknown
ON UPDATE NO ACTION ON DELETE NO ACTION
)
WITH (
OIDS=FALSE
);
ALTER TABLE sentence
OWNER TO dba;
-------------------------------------------------
-David
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
On 4/29/14, Achilleas Mantzios <achill@matrix.gatewaynet.com> wrote:
On 29/04/2014 09:59, David Noel wrote:
"select page.*, coalesce((select COUNT(*) from sentence where
sentence."PageURL" = page."URL" group by page."URL"), 0) as
NoOfSentences from page WHERE "Classification" LIKE CASE WHEN "<>"
THEN " ELSE '%' END ORDER BY "PublishDate" DESC Offset 0 LIMIT 100"In all honesty, this query is very badly written. It seems like it was
ported from some other
system. The inner group by in the coalesce is redundant since the result is
always one row,
moreover, it is wrong since coalesce accepts a scalar value, it hits the eye
at first sight.
Additionally, ''<>'' always returns false, what's the purpose of the CASE
statement?
Ok, thanks for the heads up. It confused me, too. It's code I'm just
picking up from another developer, so I don't know why it was done the
way it was done. I'm not super proficient with SQL but I'll take a
stab at rewriting it.
Try to re-write the query in a good form, and then perform EXPLAIN ANALYZE
on both systems to see what's wrong.
Will do. Thanks for the advice.
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Ahh, sorry, copied the query over incorrectly. It should read as follows:
select page.*, coalesce((select COUNT(*) from sentence where
sentence."PageURL" = page."URL" group by page."URL"), 0) as
NoOfSentences from page WHERE "Classification" LIKE CASE WHEN
'health'<>'' THEN 'health' ELSE '%' END ORDER BY "PublishDate" DESC
Offset 0 LIMIT 100
Does that make any more sense?
On 4/29/14, David Noel <david.i.noel@gmail.com> wrote:
On 4/29/14, Achilleas Mantzios <achill@matrix.gatewaynet.com> wrote:
On 29/04/2014 09:59, David Noel wrote:
"select page.*, coalesce((select COUNT(*) from sentence where
sentence."PageURL" = page."URL" group by page."URL"), 0) as
NoOfSentences from page WHERE "Classification" LIKE CASE WHEN "<>"
THEN " ELSE '%' END ORDER BY "PublishDate" DESC Offset 0 LIMIT 100"In all honesty, this query is very badly written. It seems like it was
ported from some other
system. The inner group by in the coalesce is redundant since the result
is
always one row,
moreover, it is wrong since coalesce accepts a scalar value, it hits the
eye
at first sight.
Additionally, ''<>'' always returns false, what's the purpose of the CASE
statement?Ok, thanks for the heads up. It confused me, too. It's code I'm just
picking up from another developer, so I don't know why it was done the
way it was done. I'm not super proficient with SQL but I'll take a
stab at rewriting it.Try to re-write the query in a good form, and then perform EXPLAIN
ANALYZE
on both systems to see what's wrong.Will do. Thanks for the advice.
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
On 4/29/2014 12:42 AM, David Noel wrote:
Ok, thanks for the heads up. It confused me, too. It's code I'm just
picking up from another developer, so I don't know why it was done the
way it was done. I'm not super proficient with SQL but I'll take a
stab at rewriting it.
wild guess says it was barfed out of some ORM or similar data distraction.
--
john r pierce 37N 122W
somewhere on the middle of the left coast
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
On 04/29/2014 09:44 AM, David Noel wrote:
Ahh, sorry, copied the query over incorrectly. It should read as follows:
select page.*, coalesce((select COUNT(*) from sentence where
sentence."PageURL" = page."URL" group by page."URL"), 0) as
NoOfSentences from page WHERE "Classification" LIKE CASE WHEN
'health'<>'' THEN 'health' ELSE '%' END ORDER BY "PublishDate" DESC
Offset 0 LIMIT 100Does that make any more sense?
For 9.3, you can write that as:
select p.*, s.NoOfSentences
from page p,
lateral (select count(*) as NoOfSentences
from sentence s
where s."PageURL" = p."URL") s
where "Classification" like case ... end
order by "PublishDate" desc
limit 100;
Performance will be much, much better than what you have but it won't
work at all on the 9.2 server.
--
Vik
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
select p.*, s.NoOfSentences
from page p,
lateral (select count(*) as NoOfSentences
from sentence s
where s."PageURL" = p."URL") s
where "Classification" like case ... end
order by "PublishDate" desc
limit 100;
Great. Thanks so much!
Could I make it even simpler and drop the case entirely?
select p.*, s.NoOfSentences
from page p,
lateral (select count(*) as NoOfSentences
from sentence s
where s."PageURL" = p."URL") s
where "Classification" like 'health'
order by "PublishDate" desc
limit 100;
I'm not sure what "case WHEN 'health'<>'' THEN 'health' ELSE '%' end"
does. I follow everything just fine until I get to the 'health'<>''
condition. What does the single quotation mark mean? I can't seem to
find it in the documentation.
-David
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Ehh, to clarify I'm referring to the lone _double_ quotation mark at
the end of the condition 'health'<>''. I called it a "single quotation
mark" because it was a quotation mark all by itself, but realize that
could be misread. Single quotation marks are technically this: '
Sorry for the newbie spam -- I can't run
less-than/greater-than/quotation marks through Google for answers.
On 4/29/14, David Noel <david.i.noel@gmail.com> wrote:
select p.*, s.NoOfSentences
from page p,
lateral (select count(*) as NoOfSentences
from sentence s
where s."PageURL" = p."URL") s
where "Classification" like case ... end
order by "PublishDate" desc
limit 100;Great. Thanks so much!
Could I make it even simpler and drop the case entirely?
select p.*, s.NoOfSentences
from page p,
lateral (select count(*) as NoOfSentences
from sentence s
where s."PageURL" = p."URL") s
where "Classification" like 'health'
order by "PublishDate" desc
limit 100;I'm not sure what "case WHEN 'health'<>'' THEN 'health' ELSE '%' end"
does. I follow everything just fine until I get to the 'health'<>''
condition. What does the single quotation mark mean? I can't seem to
find it in the documentation.-David
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
On 29/04/2014 12:39, David Noel wrote:
Ehh, to clarify I'm referring to the lone _double_ quotation mark at
the end of the condition 'health'<>''. I called it a "single quotation
mark" because it was a quotation mark all by itself, but realize that
could be misread. Single quotation marks are technically this: '
" (double quotation mark) designates a column name, table name, and rest of database objects.
' (single quotation mark) designates a text literal e.g. 'john', 'david', etc...
'health'<>'' (if that is what you have) means a boolean expression that compares the
literal 'health' with the empty literal '' which is of course always false.
Maybe *health* is a column name somewhere ? In this case it should be written :
"health" <> '' (i.e. comparison between the value of column "health" and the literal value '')
Sorry for the newbie spam -- I can't run
less-than/greater-than/quotation marks through Google for answers.On 4/29/14, David Noel <david.i.noel@gmail.com> wrote:
select p.*, s.NoOfSentences
from page p,
lateral (select count(*) as NoOfSentences
from sentence s
where s."PageURL" = p."URL") s
where "Classification" like case ... end
order by "PublishDate" desc
limit 100;Great. Thanks so much!
Could I make it even simpler and drop the case entirely?
select p.*, s.NoOfSentences
from page p,
lateral (select count(*) as NoOfSentences
from sentence s
where s."PageURL" = p."URL") s
where "Classification" like 'health'
order by "PublishDate" desc
limit 100;I'm not sure what "case WHEN 'health'<>'' THEN 'health' ELSE '%' end"
does. I follow everything just fine until I get to the 'health'<>''
condition. What does the single quotation mark mean? I can't seem to
find it in the documentation.-David
--
Achilleas Mantzios
Head of IT DEV
IT DEPT
Dynacom Tankers Mgmt
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
'health'<>'' (if that is what you have) means a boolean expression that
compares the
literal 'health' with the empty literal '' which is of course always false.
Ah. Gotcha. Thanks. I didn't know you could use a single double
quotation mark in a query -- I thought like in most languages that you
needed two of them for it to be valid.
Maybe *health* is a column name somewhere ? In this case it should be
written :
"health" <> '' (i.e. comparison between the value of column "health" and the
literal value '')
'health' is one of the accepted values of the page table's
"Classification" column.
Many thanks,
-David
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
On 29/04/2014 12:54, David Noel wrote:
'health'<>'' (if that is what you have) means a boolean expression that
compares the
literal 'health' with the empty literal '' which is of course always false.Ah. Gotcha. Thanks. I didn't know you could use a single double
quotation mark in a query -- I thought like in most languages that you
needed two of them for it to be valid.
But there are two of them : ' and ' makes ''. If you use only one psql/parser will complain.
Maybe *health* is a column name somewhere ? In this case it should be
written :
"health" <> '' (i.e. comparison between the value of column "health" and the
literal value '')'health' is one of the accepted values of the page table's
"Classification" column.Many thanks,
-David
--
Achilleas Mantzios
Head of IT DEV
IT DEPT
Dynacom Tankers Mgmt
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
On 29 Duben 2014, 8:59, David Noel wrote:
The query I'm running is:
"select page.*, coalesce((select COUNT(*) from sentence where
sentence."PageURL" = page."URL" group by page."URL"), 0) as
NoOfSentences from page WHERE "Classification" LIKE CASE WHEN "<>"
THEN " ELSE '%' END ORDER BY "PublishDate" DESC Offset 0 LIMIT 100"I can post the table definitions if that would be helpful but I don't
have them on hand at the moment.The gist of it though is that "page" and "sentence" are two tables.
page.URL maps to sentence.PageURL. The page table has the columns
"Classification", and "PublishDate". URL, PageURL, and Classification
are strings. PublishDate is a timestamp with timezone.Both queries are run from a Java project using the latest JDBC driver.
The PostgreSQL Server versions it's being run on are 9.2 and 9.3. The
query executes and returns just fine when run on a FreeBSD-based
platform, but executes forever when run under Windows.
Is both server/client running on FreeBSD or Windows, or are you switching
only part of the stack?
Does anyone have any idea why this might be happening? Are there
platform/syntax compatibility issues I'm triggering here that I'm
unaware of? Is there something wrong with the query?
It shouldn't get stuck. It might be slower on some platforms, but it
shouldn't really get stuck, so it might be a bug.
On linux I'd recommend perf/strace/... to investigate the issue, but I'm
not familiar with similar tool on Windows.
Is the query eating a lot of CPU, or is it just sitting there idle, doing
nothing? Or is there some other process doing a lot of CPU (e.g. Java)?
Can you try running the query through 'psql' directly, to rule out JDBC
issues etc.? Try to collect explain plans for the query (maybe there's
something wrong with it).
Tomas
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
But there are two of them : ' and ' makes ''. If you use only one
psql/parser will complain.
Ha! Wow. That one totally flew by me. It's not a double quotation mark
(one character), it's a double _single_ quotation mark (two
characters). Yeah, that makes complete sense. Wow. Can't believe I
missed that one. I blame it on it being 5am. Yeah. Wow.
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Is both server/client running on FreeBSD or Windows, or are you switching
only part of the stack?
When I run it it's all FreeBSD. When the other developer working on it
runs it it's all Windows.
It shouldn't get stuck. It might be slower on some platforms, but it
shouldn't really get stuck, so it might be a bug.
That's what I was starting to thing.
On linux I'd recommend perf/strace/... to investigate the issue, but I'm
not familiar with similar tool on Windows.
On Windows I'm not sure what he could run either.
Is the query eating a lot of CPU, or is it just sitting there idle, doing
nothing? Or is there some other process doing a lot of CPU (e.g. Java)?
IIRC it's eating CPU. I'll double check though.
Can you try running the query through 'psql' directly, to rule out JDBC
issues etc.? Try to collect explain plans for the query (maybe there's
something wrong with it).
That's a good idea. I'll see what he can come up with.
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
David Noel <david.i.noel@gmail.com> writes:
Both queries are run from a Java project using the latest JDBC driver.
The PostgreSQL Server versions it's being run on are 9.2 and 9.3. The
query executes and returns just fine when run on a FreeBSD-based
platform, but executes forever when run under Windows.
Um .. which one is 9.2, and which one is 9.3? Do you get the same
plan according to EXPLAIN on both systems? (Even if you do, let's
see the EXPLAIN output. And maybe EXPLAIN ANALYZE, on the system
where it completes.)
regards, tom lane
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Very strange. I ran the query and it seemed slow so I rewrote it with
a join instead. Using join it finished in 800ms. The query using the
lateral finished in more than a minute. I guess I need to do some
analysis on those queries to figure out why there was such a vast
difference in performance. %10, %20, %50, even %100 differences in
performance are huge, but for something to take nearly 100x -- %10000
longer to complete? Something just doesn't seem right.
On Tue, Apr 29, 2014 at 3:38 AM, Vik Fearing <vik.fearing@dalibo.com> wrote:
On 04/29/2014 09:44 AM, David Noel wrote:
Ahh, sorry, copied the query over incorrectly. It should read as follows:
select page.*, coalesce((select COUNT(*) from sentence where
sentence."PageURL" = page."URL" group by page."URL"), 0) as
NoOfSentences from page WHERE "Classification" LIKE CASE WHEN
'health'<>'' THEN 'health' ELSE '%' END ORDER BY "PublishDate" DESC
Offset 0 LIMIT 100Does that make any more sense?
For 9.3, you can write that as:
select p.*, s.NoOfSentences
from page p,
lateral (select count(*) as NoOfSentences
from sentence s
where s."PageURL" = p."URL") s
where "Classification" like case ... end
order by "PublishDate" desc
limit 100;Performance will be much, much better than what you have but it won't
work at all on the 9.2 server.--
Vik
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
The FreeBSD system is running 9.3, the Windows systems are running
9.2. I am waiting on the output from the other developer.
On Tue, Apr 29, 2014 at 8:46 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
David Noel <david.i.noel@gmail.com> writes:
Both queries are run from a Java project using the latest JDBC driver.
The PostgreSQL Server versions it's being run on are 9.2 and 9.3. The
query executes and returns just fine when run on a FreeBSD-based
platform, but executes forever when run under Windows.Um .. which one is 9.2, and which one is 9.3? Do you get the same
plan according to EXPLAIN on both systems? (Even if you do, let's
see the EXPLAIN output. And maybe EXPLAIN ANALYZE, on the system
where it completes.)regards, tom lane
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
For 9.3, you can write that as:
select p.*, s.NoOfSentences
from page p,
lateral (select count(*) as NoOfSentences
from sentence s
where s."PageURL" = p."URL") s
where "Classification" like case ... end
order by "PublishDate" desc
limit 100;Performance will be much, much better than what you have but it won't
work at all on the 9.2 server.
Some interesting feedback on that query you provided. It took nearly
80 seconds to complete.
I rewrote it* as a join and it took .8 seconds to complete:
select p.*, count(*) as NoOfSentences
from page p
inner join sentence c on p."URL" = c."URL"
where "Classification" = 'health'
group by p."URL"
*I may have written it incorrectly but it does _seem_ to produce correct output.
Something seems odd with laterals. I'll have to dig into it more later
and report back, I'm not sure it behaves this way.
For the record, with modification the query you provided wound up
getting executed looking like this:
select p.*, s.NoOfSentences
from page p,
lateral (select count(*) as NoOfSentences
from sentence s
where s."PageURL" = p."URL") s
where "Classification" = 'health'
order by "PublishDate" desc
limit 100;
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
On 30 Duben 2014, 10:46, David Noel wrote:
Very strange. I ran the query and it seemed slow so I rewrote it with
a join instead. Using join it finished in 800ms. The query using the
lateral finished in more than a minute. I guess I need to do some
analysis on those queries to figure out why there was such a vast
difference in performance. %10, %20, %50, even %100 differences in
performance are huge, but for something to take nearly 100x -- %10000
longer to complete? Something just doesn't seem right.
That is not strange at all.
In an ideal world the database would able to "understand" the semantics of
the query perfectly, and rewrite it to the best plan possible (returning
the desired result). In practice that is not the case, sadly - the planner
has limited knowledge and while it can do many clever tweaks, the way you
write a query somehow limits the options. So when you use LATERAL in the
query, it may or may not be able to rewrite it to the better plan.
To really understand what's going on here we need to see the explain plans
of the queries.
Tomas
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general