Question about ltree....
Hello,
First let me thank Mike for his earlier suggestion that I consider using
ltree - it really is working very nicely!...
I do have a question about performance/design though...
I have a table like this:
CREATE TABLE sometable (
id SERIAL,
category0 LTREE,
category1 LTREE,
category2 LTREE
);
CREATE INDEX sometable_category0 ON sometable USING gist(category0) WHERE
category0 IS NOT NULL;
CREATE INDEX sometable_category1 ON sometable USING gist(category1) WHERE
category1 IS NOT NULL;
CREATE INDEX sometable_category2 ON sometable USING gist(category2) WHERE
category2 IS NOT NULL;
... The idea is that every row in sometable can be assigned to 1, 2, or 3
categories of a users choosing....
I essentially have categories like this:
a
a.a
a.a.a
a.a.a.a
a.a.a.b
a.a.a.c
a.a.b
a.a.b.a
a.a.b.b
... so the categories are essentially 3 levels deep (not counting the top
level) - in my test case I have about 4300 rows of data in this table.
Unfortunately, queries against this table can be a bit inefficient, for
example, if I want to get a count of all of items within a category I
have do something like:
SELECT count(*) AS count FROM
(SELECT category0 AS category FROM sometable WHERE category0 IS NOT NULL
UNION SELECT category1 AS category FROM sometable WHERE category1
IS NOT NULL
UNION SELECT category2 AS category FROM sometable WHERE category2
IS NOT NULL) AS b
WHERE b.category <@ 'a.b';
If I want to retrieve rows which fit into one or more of the categories,
I have to do something like:
SELECT * FROM sometable WHERE category0 <@ 'a.b' OR category1 <@ 'a.b' OR
category2 <@ 'a.c';
.. What is interesting is that this does a sequential scan:
SELECT * FROM sometable WHERE category0 <@ 'a.b';
.. This on the other hand does an index scan, despite the fact that every
row IS NOT NULL and it is about 10 times faster:
SELECT * FROM sometable WHERE category0 <@ 'a.b' AND category0 IS NOT NULL;
.. But, as soon as I add a second OR clause the planner decides to start
doing sequential scans again, despite the fact all category1 columns are NULL:
SELECT * FROM sometable WHERE (category0 <@ 'a.b' AND category0 IS NOT
NULL) OR (category1 <@ 'a.b' AND category1 IS NOT NULL);
Is there anything which jumps out as being something that I could do to
improve the schema design or the performance of the two queries I
identified above - as these get executed very often any performance
improvement would dramatically boost performance...
Somehow I missed the ltree[] array stuff - this solves fairly nicely at
least the second part of my problem, but I'm still not sure how to
optimize the query which contains the union...
My optimized table looks like this:
CREATE TABLE sometable (
id SERIAL,
category LTREE[]
);
CREATE INDEX sometable_category_gist_idx ON sometable(category);
My select query looks like this and I get quite good performance out of it:
SELECT * FROM sometable WHERE category <@ 'a.b';
... The remaining problem is how to get a count of the number of rows
which fall into each category, which looks like this:
CREATE TABLE category (
category LTREE,
head1 VARCHAR,
head2 VARCHAR,
head3 VARCHAR
-- I use head1, head2, and head3 to hold the long-name
representations of the three lower-level category levels
);
SELECT a.category,count(*) AS count,a
.head1,a.head2,a.head3,nlevel(a.category) AS level FROM categories a,
(SELECT category[1] AS category FROM sometable
UNION ALL SELECT category[2] AS category FROM sometable
UNION ALL SELECT category[3] AS category FROM sometable) AS b
WHERE a.category @> b.category AND b.category IS NOT NULL
GROUP BY a.category;
- Greg
Hello,
First let me thank Mike for his earlier suggestion that I consider using
ltree - it really is working very nicely!...I do have a question about performance/design though...
I have a table like this:
CREATE TABLE sometable (
id SERIAL,
category0 LTREE,
category1 LTREE,
category2 LTREE
);CREATE INDEX sometable_category0 ON sometable USING gist(category0) WHERE
category0 IS NOT NULL;
CREATE INDEX sometable_category1 ON sometable USING gist(category1) WHERE
category1 IS NOT NULL;
CREATE INDEX sometable_category2 ON sometable USING gist(category2) WHERE
category2 IS NOT NULL;... The idea is that every row in sometable can be assigned to 1, 2, or 3
categories of a users choosing....I essentially have categories like this:
a
a.a
a.a.a
a.a.a.a
a.a.a.b
a.a.a.c
a.a.b
a.a.b.a
a.a.b.b... so the categories are essentially 3 levels deep (not counting the top
level) - in my test case I have about 4300 rows of data in this table.Unfortunately, queries against this table can be a bit inefficient, for
example, if I want to get a count of all of items within a category I
have do something like:SELECT count(*) AS count FROM
(SELECT category0 AS category FROM sometable WHERE category0 IS NOT NULL
UNION SELECT category1 AS category FROM sometable WHERE category1
IS NOT NULL
UNION SELECT category2 AS category FROM sometable WHERE category2
IS NOT NULL) AS b
WHERE b.category <@ 'a.b';If I want to retrieve rows which fit into one or more of the categories,
I have to do something like:SELECT * FROM sometable WHERE category0 <@ 'a.b' OR category1 <@ 'a.b' OR
category2 <@ 'a.c';.. What is interesting is that this does a sequential scan:
SELECT * FROM sometable WHERE category0 <@ 'a.b';
.. This on the other hand does an index scan, despite the fact that every
row IS NOT NULL and it is about 10 times faster:SELECT * FROM sometable WHERE category0 <@ 'a.b' AND category0 IS NOT NULL;
.. But, as soon as I add a second OR clause the planner decides to start
doing sequential scans again, despite the fact all category1 columns are
NULL:
Show quoted text
SELECT * FROM sometable WHERE (category0 <@ 'a.b' AND category0 IS NOT
NULL) OR (category1 <@ 'a.b' AND category1 IS NOT NULL);Is there anything which jumps out as being something that I could do to
improve the schema design or the performance of the two queries I
identified above - as these get executed very often any performance
improvement would dramatically boost performance...
Import Notes
Reply to msg id not found: 20041026133505.30591@mail.net-virtual.comReference msg id not found: 20041026133505.30591@mail.net-virtual.com | Resolved by subject fallback
Greg,
I don't understand your idea ! Why do you need three Ltrees ?
Oleg
On Tue, 26 Oct 2004, Greg Saylor wrote:
Hello,
First let me thank Mike for his earlier suggestion that I consider using
ltree - it really is working very nicely!...I do have a question about performance/design though...
I have a table like this:
CREATE TABLE sometable (
id SERIAL,
category0 LTREE,
category1 LTREE,
category2 LTREE
);CREATE INDEX sometable_category0 ON sometable USING gist(category0) WHERE
category0 IS NOT NULL;
CREATE INDEX sometable_category1 ON sometable USING gist(category1) WHERE
category1 IS NOT NULL;
CREATE INDEX sometable_category2 ON sometable USING gist(category2) WHERE
category2 IS NOT NULL;... The idea is that every row in sometable can be assigned to 1, 2, or 3
categories of a users choosing....I essentially have categories like this:
a
a.a
a.a.a
a.a.a.a
a.a.a.b
a.a.a.c
a.a.b
a.a.b.a
a.a.b.b... so the categories are essentially 3 levels deep (not counting the top
level) - in my test case I have about 4300 rows of data in this table.Unfortunately, queries against this table can be a bit inefficient, for
example, if I want to get a count of all of items within a category I
have do something like:SELECT count(*) AS count FROM
(SELECT category0 AS category FROM sometable WHERE category0 IS NOT NULL
UNION SELECT category1 AS category FROM sometable WHERE category1
IS NOT NULL
UNION SELECT category2 AS category FROM sometable WHERE category2
IS NOT NULL) AS b
WHERE b.category <@ 'a.b';If I want to retrieve rows which fit into one or more of the categories,
I have to do something like:SELECT * FROM sometable WHERE category0 <@ 'a.b' OR category1 <@ 'a.b' OR
category2 <@ 'a.c';.. What is interesting is that this does a sequential scan:
SELECT * FROM sometable WHERE category0 <@ 'a.b';
.. This on the other hand does an index scan, despite the fact that every
row IS NOT NULL and it is about 10 times faster:SELECT * FROM sometable WHERE category0 <@ 'a.b' AND category0 IS NOT NULL;
.. But, as soon as I add a second OR clause the planner decides to start
doing sequential scans again, despite the fact all category1 columns are NULL:SELECT * FROM sometable WHERE (category0 <@ 'a.b' AND category0 IS NOT
NULL) OR (category1 <@ 'a.b' AND category1 IS NOT NULL);Is there anything which jumps out as being something that I could do to
improve the schema design or the performance of the two queries I
identified above - as these get executed very often any performance
improvement would dramatically boost performance...---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to majordomo@postgresql.org)
Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83