SQL query help!

Started by Arcadius A. about 23 years ago, 9 messages

Arcadius A.

ahouans@sh.cvut.cz

about 23 years ago

Hello!

I hope that someone here could help.

I'm using PostgreSQL 7.1.3.

I have 3 tables in my DB: the tables are defined in the following way:

CREATE TABLE category(
id SERIAL NOT NULL PRIMARY KEY
-- etc. etc. (remaining columns omitted)
)
;

CREATE TABLE subcategory(
id SERIAL NOT NULL PRIMARY KEY,
categoryid int CONSTRAINT subcategory__ref_category
REFERENCES category (id)
-- etc. etc. (remaining columns omitted)
)
;

CREATE TABLE entry(
entryid SERIAL NOT NULL PRIMARY KEY,
isapproved CHAR(1) NOT NULL DEFAULT 'n',
subcategoryid int CONSTRAINT entry__ref_subcategory
REFERENCES subcategory (id)
-- etc. (remaining columns omitted)
)
;

I have the following SQL query :

"SELECT * FROM entry where isapproved='y' AND subcategoryid IN (SELECT id
FROM subcategory WHERE
categoryid='"+catID+"') ORDER BY subcategoryid DESC";

For a given categoryid (catID), the query returns all entries in the
"entry" table whose subcategoryid is among those returned by the inner
subquery.

But I want to return only a limited number of entries for each
subcategory. Let's say that I want to return at most 5 entries of each
subcategory (for instance, if the inner subquery returns 3 results,
I will have at most 15 entries in total).

How can this be achieved?

I'm aware of PostgreSQL's "LIMIT" and "GROUP BY" clauses, but so far I'm
not able to put all this together.

Thanks in advance.

Arcadius.

Luis Sousa

llsousa@ualg.pt

about 23 years ago

In reply to: Arcadius A. (#1)

Re: SQL query help!

Tell me what you tried with LIMIT and GROUP BY.
Where you use IN, why don't you use EXISTS instead? It runs much faster!

Regards,
Luis Sousa
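
For reference, the IN query from the first message and a correlated EXISTS form can be put side by side. The sketch below is only an illustration under stated assumptions: it uses Python's sqlite3 module as a stand-in for PostgreSQL so that it runs on its own, the sample rows are made up, and the category id is passed as a bound parameter instead of being concatenated into the string. Whether EXISTS is actually faster depends on the server version and its planner, so the speed claim is something to measure, not to assume.

```python
import sqlite3

# In-memory SQLite database standing in for the PostgreSQL schema above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE subcategory (id INTEGER PRIMARY KEY, categoryid INTEGER);
CREATE TABLE entry (
    entryid INTEGER PRIMARY KEY,
    isapproved CHAR(1) NOT NULL DEFAULT 'n',
    subcategoryid INTEGER REFERENCES subcategory (id)
);
-- Made-up sample rows: subcategories 1 and 2 belong to category 2.
INSERT INTO subcategory (id, categoryid) VALUES (1, 2), (2, 2), (3, 9);
INSERT INTO entry (entryid, isapproved, subcategoryid) VALUES
    (1, 'y', 1), (2, 'n', 1), (3, 'y', 2), (4, 'y', 3);
""")

# The original query, with a placeholder for the category id.
IN_QUERY = """
SELECT entryid FROM entry
WHERE isapproved = 'y'
  AND subcategoryid IN (SELECT id FROM subcategory WHERE categoryid = ?)
ORDER BY subcategoryid DESC
"""

# Same filter written with EXISTS; the subquery refers to the outer row.
EXISTS_QUERY = """
SELECT e.entryid FROM entry e
WHERE e.isapproved = 'y'
  AND EXISTS (SELECT 1 FROM subcategory s
              WHERE s.id = e.subcategoryid
                AND s.categoryid = ?)
ORDER BY e.subcategoryid DESC
"""

cat_id = 2  # hypothetical category id
in_rows = conn.execute(IN_QUERY, (cat_id,)).fetchall()
exists_rows = conn.execute(EXISTS_QUERY, (cat_id,)).fetchall()
print(in_rows, exists_rows)
```

The condition `s.id = e.subcategoryid` is what makes the two forms equivalent. Without it, the EXISTS test is true for every row of entry as soon as the category has any subcategory at all, and the filter is lost.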

---------------------------(end of broadcast)---------------------------
TIP 4: Don't 'kill -9' the postmaster

Achilleus Mantzios

achill@matrix.gatewaynet.com

about 23 years ago

In reply to: Luis Sousa (#2)

FreeBSD, Linux: select, select count(*) performance

Hi,

I ran 2 queries on 2 similar boxes (one running Linux 2.4.7, Red Hat 7.1,
and the other running FreeBSD 4.7-RELEASE-p2).

Both boxes run PostgreSQL 7.2.3.

I get some performance results that are not obvious (at least to me).

I have one table named "noon" with 108095 rows.

The 2 queries are:
q1: SELECT count(*) from noon;
q2: SELECT * from noon;

Linux q1
========
dynacom=# EXPLAIN ANALYZE SELECT count(*) from noon;
NOTICE: QUERY PLAN:

Aggregate (cost=20508.19..20508.19 rows=1 width=0) (actual time=338.17..338.17 rows=1 loops=1)
  -> Seq Scan on noon (cost=0.00..20237.95 rows=108095 width=0) (actual time=0.01..225.73 rows=108095 loops=1)
Total runtime: 338.25 msec

Linux q2
========
dynacom=# EXPLAIN ANALYZE SELECT * from noon;
NOTICE: QUERY PLAN:

Seq Scan on noon (cost=0.00..20237.95 rows=108095 width=1960) (actual time=1.22..67909.31 rows=108095 loops=1)
Total runtime: 68005.96 msec

FreeBSD q1
==========
dynacom=# EXPLAIN ANALYZE SELECT count(*) from noon;
NOTICE: QUERY PLAN:

Aggregate (cost=20508.19..20508.19 rows=1 width=0) (actual time=888.93..888.94 rows=1 loops=1)
  -> Seq Scan on noon (cost=0.00..20237.95 rows=108095 width=0) (actual time=0.02..501.09 rows=108095 loops=1)
Total runtime: 889.06 msec

FreeBSD q2
==========
dynacom=# EXPLAIN ANALYZE SELECT * from noon;
NOTICE: QUERY PLAN:

Seq Scan on noon (cost=0.00..20237.95 rows=108095 width=1975) (actual time=1.08..53470.93 rows=108095 loops=1)
Total runtime: 53827.37 msec

The pgsql configuration for both systems is identical
(the FreeBSD system has less memory, but vmstat doesn't show
any paging activity, so I assume this is not an issue here).

The interesting part is that FreeBSD does better in select *,
whereas Linux seems to do much better in select count(*).

Paging and disk IO activity for both systems is near 0.

When I run the select count(*) on Linux I notice a small
increase (15%) in context switches per second, whereas on FreeBSD
I notice a big increase in context switches (300%) and
a huge increase in system calls per second (from normally
9-10 to 110,000).
(Linux vmstat gives no syscall info.)

The same results come out for every count(*) I try.
Is it just the reporting from EXPLAIN ANALYZE?

Can any hacker shed some light on this?

Thanks.

==================================================================
Achilleus Mantzios
S/W Engineer
IT dept
Dynacom Tankers Mngmt
Nikis 4, Glyfada
Athens 16610
Greece
tel: +30-10-8981112
fax: +30-10-8981877
email: achill@matrix.gatewaynet.com
mantzios@softlab.ece.ntua.gr
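
To put the four reported runtimes on a common scale, they can be divided by the row count. The short Python sketch below does only that arithmetic on the figures quoted in this message; the variable names are made up for illustration and nothing is measured anew.

```python
ROWS = 108095  # rows in "noon", as reported above

# Total runtimes in milliseconds, copied from the EXPLAIN ANALYZE output.
runtime_ms = {
    ("Linux", "count(*)"): 338.25,
    ("Linux", "select *"): 68005.96,
    ("FreeBSD", "count(*)"): 889.06,
    ("FreeBSD", "select *"): 53827.37,
}

# Microseconds spent per row for each run.
per_row_us = {key: ms * 1000.0 / ROWS for key, ms in runtime_ms.items()}

# How many times slower the slower box is for each query.
count_ratio = runtime_ms[("FreeBSD", "count(*)")] / runtime_ms[("Linux", "count(*)")]
select_ratio = runtime_ms[("Linux", "select *")] / runtime_ms[("FreeBSD", "select *")]

for (system, query), us in sorted(per_row_us.items()):
    print(f"{system:8s} {query:9s} {us:8.2f} us/row")
print(f"count(*): FreeBSD/Linux = {count_ratio:.2f}")
print(f"select *: Linux/FreeBSD = {select_ratio:.2f}")
```

On these figures count(*) costs under ten microseconds per row on both systems, while select * costs roughly 0.5 to 0.6 milliseconds per row, so the gap between the two queries is far larger than the gap between the two operating systems.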

Tom Lane

tgl@sss.pgh.pa.us

about 23 years ago

In reply to: Achilleus Mantzios (#3)

Re: FreeBSD, Linux: select, select count(*) performance

Achilleus Mantzios <achill@matrix.gatewaynet.com> writes:

Linux q1
========
dynacom=# EXPLAIN ANALYZE SELECT count(*) from noon;
NOTICE: QUERY PLAN:

Aggregate (cost=20508.19..20508.19 rows=1 width=0) (actual time=338.17..338.17 rows=1 loops=1)
  -> Seq Scan on noon (cost=0.00..20237.95 rows=108095 width=0) (actual time=0.01..225.73 rows=108095 loops=1)
Total runtime: 338.25 msec

Linux q2
========
dynacom=# EXPLAIN ANALYZE SELECT * from noon;
NOTICE: QUERY PLAN:

Seq Scan on noon (cost=0.00..20237.95 rows=108095 width=1960) (actual time=1.22..67909.31 rows=108095 loops=1)
Total runtime: 68005.96 msec

You didn't say what was *in* the table, exactly ... but I'm betting
there are a lot of toasted columns, and that the extra runtime
represents the time to fetch (and perhaps decompress) the TOAST entries.

regards, tom lane

Oleg Bartunov

oleg@sai.msu.su

about 23 years ago

In reply to: Tom Lane (#4)

Re: [GENERAL] FreeBSD, Linux: select, select count(*) performance

On Wed, 27 Nov 2002, Tom Lane wrote:

You didn't say what was *in* the table, exactly ... but I'm betting
there are a lot of toasted columns, and that the extra runtime
represents the time to fetch (and perhaps decompress) the TOAST entries.

Is there any reason to "fetch (and perhaps decompress) the TOAST entries"
just to count(*) without any WHERE clause?

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83

Achilleus Mantzios

achill@matrix.gatewaynet.com

about 23 years ago

In reply to: Tom Lane (#4)

Re: FreeBSD, Linux: select, select count(*) performance

On Wed, 27 Nov 2002, Tom Lane wrote:

You didn't say what was *in* the table, exactly ... but I'm betting
there are a lot of toasted columns, and that the extra runtime
represents the time to fetch (and perhaps decompress) the TOAST entries.

278 columns of various types.
namely,

The data, as I told you, come from the same db dump of the production system.
This same dump file was used to populate both (Linux, FreeBSD) databases.

How is it possible for one to have toasted columns and the other not?
How can one identify toasted columns?

Thanx,

Achilleus
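
On identifying columns that can be toasted: the system catalogs record a storage strategy per column in `pg_attribute.attstorage`, and `pg_class.reltoastrelid` is non-zero when the table has an associated TOAST table. The sketch below is a hedged illustration in Python. It only holds the catalog query as a string, to be run with whatever client is at hand (psql, for example), plus two made-up helpers, `describe_storage` and `may_be_toasted`, that spell out the one-letter codes. It shows which columns may be toasted, not which individual values actually are.

```python
# Catalog query text only; it is not executed in this sketch.
# The table name 'noon' is taken from the thread.
TOAST_QUERY = """
SELECT a.attname, a.attstorage, c.reltoastrelid
FROM pg_class c, pg_attribute a
WHERE a.attrelid = c.oid
  AND c.relname = 'noon'
  AND a.attnum > 0
ORDER BY a.attnum
"""

# Meaning of the one-letter codes stored in pg_attribute.attstorage.
STORAGE_CODES = {
    "p": "plain: always stored inline, never compressed",
    "m": "main: may be compressed, moved out of line only as a last resort",
    "e": "external: may be moved out of line, never compressed",
    "x": "extended: may be compressed and moved out of line",
}


def describe_storage(code):
    """Return a readable description for an attstorage code."""
    return STORAGE_CODES.get(code, "unknown storage code %r" % code)


def may_be_toasted(code):
    """True for every known strategy except plain storage."""
    return code in ("m", "e", "x")


for code in "pmex":
    print(code, may_be_toasted(code), describe_storage(code))
```

Since both databases were loaded from the same dump, the column storage strategies reported by such a query would be expected to match on the two boxes.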

Tom Lane

tgl@sss.pgh.pa.us

about 23 years ago

In reply to: Achilleus Mantzios (#6)

Re: FreeBSD, Linux: select, select count(*) performance

Achilleus Mantzios <achill@matrix.gatewaynet.com> writes:

On Wed, 27 Nov 2002, Tom Lane wrote:

You didn't say what was *in* the table, exactly ... but I'm betting
there are a lot of toasted columns, and that the extra runtime
represents the time to fetch (and perhaps decompress) the TOAST entries.

278 columns of various types.
namely,
[snip]

Hmm, no particularly wide columns there --- but 278 columns is a lot.
I think the extra time might just be the time involved in fetching all
those column values out of the table row?

If you're interested in pursuing it, I'd suggest rebuilding the backend
with profiling enabled so you can see exactly where the time goes.

regards, tom lane

Tom Lane

tgl@sss.pgh.pa.us

about 23 years ago

In reply to: Oleg Bartunov (#5)

Re: [GENERAL] FreeBSD, Linux: select, select count(*) performance

Oleg Bartunov <oleg@sai.msu.su> writes:

Is there any reason to "fetch (and perhaps decompress) the TOAST entries"
just to count(*) without any WHERE clause?

It doesn't. That was my point...

regards, tom lane

Arcadius A.

ahouans@sh.cvut.cz

about 23 years ago

In reply to: Luis Sousa (#2)

Re: SQL query help!

Hello!

"Luis Sousa" <llsousa@ualg.pt> wrote in message
news:3DE498E4.2050002@ualg.pt...

Tell me what you tried with LIMIT and GROUP BY.
Where you use IN, why don't you use EXISTS instead? It runs much faster!

Thanks for the reply!
Alright, I'll use EXISTS instead of IN. I didn't know that EXISTS is
faster.

About my query, I have tried :

"
SELECT * FROM entry where isapproved='y' AND EXISTS (SELECT id
FROM subcategory WHERE catid='2') ORDER BY subcatid DESC LIMIT 5;
";
This will return only 5 rows in total.

But when I add the GROUP BY, I get an error:
"
SELECT * FROM entry where isapproved='y' AND EXISTS (SELECT id
FROM subcategory WHERE catid='2') ORDER BY subcatid DESC LIMIT 5 GROUP BY
subcatid;
"

: ERROR: parser: parse error at or near "GROUP"

Thanks.....

Arcadius.
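
Two remarks on the attempts above, followed by one possible way to get "at most 5 per subcategory". In a SELECT, GROUP BY has to be written before ORDER BY and LIMIT, which is why the parser stops at "GROUP". Even in the right order, GROUP BY collapses rows into one per group and LIMIT caps the whole result, so neither gives a per-subcategory cap on its own. The sketch below uses a correlated count subquery instead. It is an illustration under stated assumptions, not a tested PostgreSQL 7.1.3 solution: it runs against Python's sqlite3 module as a stand-in, the rows are made up, "first 5" is arbitrarily taken to mean the 5 lowest entryid values, and the column names follow the CREATE TABLE statements from the first message.

```python
import sqlite3

PER_SUBCATEGORY = 5  # cap per subcategory, taken from the question
CAT_ID = 2           # hypothetical category id

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE subcategory (id INTEGER PRIMARY KEY, categoryid INTEGER);
CREATE TABLE entry (
    entryid INTEGER PRIMARY KEY,
    isapproved CHAR(1) NOT NULL DEFAULT 'n',
    subcategoryid INTEGER REFERENCES subcategory (id)
);
INSERT INTO subcategory (id, categoryid) VALUES (1, 2), (2, 2), (3, 9);
""")

# Made-up rows: 7 approved entries in subcategory 1, 3 approved and
# 1 unapproved in subcategory 2, 6 approved in subcategory 3.
rows = [("y", 1)] * 7 + [("y", 2)] * 3 + [("n", 2)] + [("y", 3)] * 6
conn.executemany(
    "INSERT INTO entry (isapproved, subcategoryid) VALUES (?, ?)", rows
)

# Keep an entry only if at most PER_SUBCATEGORY approved entries of the
# same subcategory have an entryid less than or equal to its own.
TOP_N_QUERY = """
SELECT e.entryid, e.subcategoryid
FROM entry e, subcategory s
WHERE s.id = e.subcategoryid
  AND s.categoryid = ?
  AND e.isapproved = 'y'
  AND (SELECT count(*)
       FROM entry e2
       WHERE e2.subcategoryid = e.subcategoryid
         AND e2.isapproved = 'y'
         AND e2.entryid <= e.entryid) <= ?
ORDER BY e.subcategoryid DESC, e.entryid
"""

result = conn.execute(TOP_N_QUERY, (CAT_ID, PER_SUBCATEGORY)).fetchall()
for entryid, subcategoryid in result:
    print(subcategoryid, entryid)
```

The inner count is re-evaluated for every candidate row, so the cost grows quickly with the number of entries per subcategory. For large tables, running one LIMIT query per subcategory from the application may turn out to be the simpler choice.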