Conditional JOINs ?

Started by Leon Mergenabout 18 years ago11 messagesgeneral
Jump to latest
#1Leon Mergen
leon@solatis.com

Hello,

I'm currently camping with a bit of a problem -- i have a common
requests table, and based on an entry's type, I need to join a
specific other table. Consider this database layout:

###########################
-- table where all common request data is stored
CREATE TABLE log.requests (
id BIGSERIAL,
type SMALLINT NOT NULL,
timestamp INTEGER NOT NULL,
-- all kinds of common data
)

-- only rows where log.requests.type = 1
CREATE TABLE log.requests1 (
request_id BIGINT NOT NULL
)

-- only rows where log.requests.type = 2
CREATE TABLE log.requests2 (
request_id BIGINT NOT NULL
)

etc (about 10 different types)

the current VIEW:
CREATE VIEW requests_view AS
SELECT * FROM log.requests LEFT JOIN log.requests1 ON
(log.requests.id = log.requests1.request_id) LEFT JOIN log.requests2
ON (log.requests.id = log.requests2.request_id) -- etc
###########################

As you can see, if the requests table gets a bit large (50mil+ rows)
and when there are many different types, there will be a lot of wasted
JOINs.

Now, based on a previous post on the PostgreSQL mailing list
[http://archives.postgresql.org/pgsql-general/2007-11/msg00723.php] I
came up with this solution:

###########################
-- table where all common request data is stored
CREATE TABLE log.requests (
id BIGSERIAL,
type SMALLINT NOT NULL,
ref1 BIGINT CHECK (type = 1) = (ref1 IS NOT NULL)),
ref2 BIGINT CHECK (type = 2) = (ref2 IS NOT NULL)),
timestamp INTEGER NOT NULL,
-- all kinds of common data
)

-- only rows where log.requests.type = 1
CREATE TABLE log.requests1 (
request_id BIGINT NOT NULL
)

-- only rows where log.requests.type = 2
CREATE TABLE log.requestsz2 (
request_id BIGINT NOT NULL
)

etc (about 10 different types)

the current VIEW:
CREATE VIEW requests_view AS
SELECT * FROM log.requests LEFT JOIN log.requests1 ON
(log.requests.ref1 = log.requests1.request_id) LEFT JOIN log.requests2
ON (log.requests.ref2 = log.requests2.request_id) -- etc
###########################

Now, in my theory, you would say that if postgresql encounters ref1 =
NULL, it will not attempt to JOIN the log.requests1 table. However,
I've been told that because the PostgreSQL planner doesn't know that
ref1 (or any other refX for that matter) is NULL, it will attempt to
JOIN all tables for all rows.

Is this true, and if so.. is there a workaround for this (perhaps that
my database design is flawed) ?

Regards,

Leon Mergen

#2Alban Hertroys
dalroi@solfertje.student.utwente.nl
In reply to: Leon Mergen (#1)
Re: Conditional JOINs ?

On Mar 18, 2008, at 8:06 PM, Leon Mergen wrote:

Hello,

Now, based on a previous post on the PostgreSQL mailing list
[http://archives.postgresql.org/pgsql-general/2007-11/msg00723.php] I
came up with this solution:

###########################
-- table where all common request data is stored
CREATE TABLE log.requests (
id BIGSERIAL,
type SMALLINT NOT NULL,
ref1 BIGINT CHECK (type = 1) = (ref1 IS NOT NULL)),
ref2 BIGINT CHECK (type = 2) = (ref2 IS NOT NULL)),
timestamp INTEGER NOT NULL,
-- all kinds of common data
)

-- only rows where log.requests.type = 1
CREATE TABLE log.requests1 (
request_id BIGINT NOT NULL
)

-- only rows where log.requests.type = 2
CREATE TABLE log.requestsz2 (
request_id BIGINT NOT NULL
)

etc (about 10 different types)

the current VIEW:
CREATE VIEW requests_view AS
SELECT * FROM log.requests LEFT JOIN log.requests1 ON
(log.requests.ref1 = log.requests1.request_id) LEFT JOIN log.requests2
ON (log.requests.ref2 = log.requests2.request_id) -- etc
###########################

Now, in my theory, you would say that if postgresql encounters ref1 =
NULL, it will not attempt to JOIN the log.requests1 table. However,
I've been told that because the PostgreSQL planner doesn't know that
ref1 (or any other refX for that matter) is NULL, it will attempt to
JOIN all tables for all rows.

Is this true, and if so.. is there a workaround for this (perhaps that
my database design is flawed) ?

This looks almost like table partitioning. If you inherit your
requestxxx tables from a common requests table and add a check
constraint to each inheriting table (a "partition"), the planner is
smart enough to figure out that no rows in that partition can
possibly match (constraint exclusion) and skips it.

Instead of joining, it uses something equivalent to a UNION ALL btw,
which I think is what you're looking for anyway.

There have been plenty of posts about how to set up table
partitioning, there's even an article about it in the standard
Postgres documentation:

http://www.postgresql.org/docs/8.1/static/ddl-partitioning.html

Regards,
Alban Hertroys

--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.

!DSPAM:737,47e019899786732118417!

#3Leon Mergen
leon@solatis.com
In reply to: Alban Hertroys (#2)
Re: Conditional JOINs ?

Hello Alban,

On 3/18/08, Alban Hertroys <dalroi@solfertje.student.utwente.nl> wrote:

Now, in my theory, you would say that if postgresql encounters ref1 =
NULL, it will not attempt to JOIN the log.requests1 table. However,
I've been told that because the PostgreSQL planner doesn't know that
ref1 (or any other refX for that matter) is NULL, it will attempt to
JOIN all tables for all rows.

Is this true, and if so.. is there a workaround for this (perhaps that
my database design is flawed) ?

This looks almost like table partitioning. If you inherit your
requestxxx tables from a common requests table and add a check
constraint to each inheriting table (a "partition"), the planner is
smart enough to figure out that no rows in that partition can
possibly match (constraint exclusion) and skips it.

Instead of joining, it uses something equivalent to a UNION ALL btw,
which I think is what you're looking for anyway.

Well, the thing (as far as I'm aware) is that table partinioning and
UNION ALL expect the table layouts to look the same, don't they ? The
problem I'm having is that each row in a table has some 'additional'
information, which is in another table, and can be retrieved based on
a specific column in the table (request_type).

Now, I fail to see how UNION ALL or table partitioning can solve this
problem, which can be my problem -- am I missing some technique how
table partitioning can be used to extend a base table with several
extra tables that provide extra information ?

--
Leon Mergen
http://www.solatis.com

#4Erik Jones
erik@myemma.com
In reply to: Leon Mergen (#3)
Re: Conditional JOINs ?

On Mar 18, 2008, at 3:06 PM, Leon Mergen wrote:

Hello Alban,

On 3/18/08, Alban Hertroys <dalroi@solfertje.student.utwente.nl>
wrote:

Now, in my theory, you would say that if postgresql encounters
ref1 =
NULL, it will not attempt to JOIN the log.requests1 table. However,
I've been told that because the PostgreSQL planner doesn't know that
ref1 (or any other refX for that matter) is NULL, it will attempt to
JOIN all tables for all rows.

Is this true, and if so.. is there a workaround for this (perhaps
that
my database design is flawed) ?

This looks almost like table partitioning. If you inherit your
requestxxx tables from a common requests table and add a check
constraint to each inheriting table (a "partition"), the planner is
smart enough to figure out that no rows in that partition can
possibly match (constraint exclusion) and skips it.

Instead of joining, it uses something equivalent to a UNION ALL btw,
which I think is what you're looking for anyway.

Well, the thing (as far as I'm aware) is that table partinioning and
UNION ALL expect the table layouts to look the same, don't they ? The
problem I'm having is that each row in a table has some 'additional'
information, which is in another table, and can be retrieved based on
a specific column in the table (request_type).

Now, I fail to see how UNION ALL or table partitioning can solve this
problem, which can be my problem -- am I missing some technique how
table partitioning can be used to extend a base table with several
extra tables that provide extra information ?

Table partitioning is normally implemented via table inheritance and
you are free to add more, and different, columns to the "child" tables.

Observe:

CREATE SEQUENCE part_seq;
CREATE TABLE parent (
id integer PRIMARY KEY DEFAULT nextval('part_seq'),
foo text
);

CREATE TABLE child1 (
bar text,
CHECK(foo='some_type1'),
PRIMARY KEY (id)
) INHERITS (parent);

CREATE TABLE child2 (
baz text,
CHECK(foo='some_type2'),
PRIMARY KEY (id)
) INHERITS (parent);

Now, both child1 and child2 have id and foo fields, child1 will only
allow entries with foo='some_type1', child2 will only allow entries
with foo='some_type2', and both children have extra fields that
weren't present in the parent.

Erik Jones

DBA | Emma®
erik@myemma.com
800.595.4401 or 615.292.5888
615.292.0777 (fax)

Emma helps organizations everywhere communicate & market in style.
Visit us online at http://www.myemma.com

#5Leon Mergen
leon@solatis.com
In reply to: Erik Jones (#4)
Re: Conditional JOINs ?

Hello Erik,

On 3/18/08, Erik Jones <erik@myemma.com> wrote:

Table partitioning is normally implemented via table inheritance and
you are free to add more, and different, columns to the "child" tables.

Observe:

CREATE SEQUENCE part_seq;
CREATE TABLE parent (
id integer PRIMARY KEY DEFAULT nextval('part_seq'),
foo text
);

CREATE TABLE child1 (
bar text,
CHECK(foo='some_type1'),
PRIMARY KEY (id)
) INHERITS (parent);

CREATE TABLE child2 (
baz text,
CHECK(foo='some_type2'),
PRIMARY KEY (id)
) INHERITS (parent);

Now, both child1 and child2 have id and foo fields, child1 will only
allow entries with foo='some_type1', child2 will only allow entries
with foo='some_type2', and both children have extra fields that
weren't present in the parent.

Ah, silly that I failed to understand that.

Thanks a lot for your response (Alban too) -- I can see table
partitioning solving my problem.

--
Leon Mergen
http://www.solatis.com

#6Erik Jones
erik@myemma.com
In reply to: Leon Mergen (#5)
Re: Conditional JOINs ?

On Mar 18, 2008, at 3:50 PM, Leon Mergen wrote:

Ah, silly that I failed to understand that.

Nah, we all do that stuff.

Thanks a lot for your response (Alban too) -- I can see table
partitioning solving my problem.

You're welcome!

Erik Jones

DBA | Emma®
erik@myemma.com
800.595.4401 or 615.292.5888
615.292.0777 (fax)

Emma helps organizations everywhere communicate & market in style.
Visit us online at http://www.myemma.com

#7Leon Mergen
leon@solatis.com
In reply to: Erik Jones (#4)
Re: Conditional JOINs ?

Hello,

On 3/18/08, Erik Jones <erik@myemma.com> wrote:

Observe:

CREATE SEQUENCE part_seq;
CREATE TABLE parent (
id integer PRIMARY KEY DEFAULT nextval('part_seq'),
foo text
);

CREATE TABLE child1 (
bar text,
CHECK(foo='some_type1'),
PRIMARY KEY (id)
) INHERITS (parent);

CREATE TABLE child2 (
baz text,
CHECK(foo='some_type2'),
PRIMARY KEY (id)
) INHERITS (parent);

Now, both child1 and child2 have id and foo fields, child1 will only
allow entries with foo='some_type1', child2 will only allow entries
with foo='some_type2', and both children have extra fields that
weren't present in the parent.

Excuse me for bumping this up again, but I still don't understand how
to use this approach to sequentially walk through all different child
tables in one select, without having to JOIN these tables all the time
-- or will the planner 'understand' a query such as this:

SELECT parent.*, child1.*, child2.* FROM parent LEFT JOIN child1 ON
(parent.id = child1.id) LEFT JOIN child2 ON (parent.id = child2.id);

When running explain on this, as I interpret it, it shows that the
query plan will join both child1 and child2 on all the rows inside the
parent table:

QUERY PLAN
-----------------------------------------------------------------------------------
Hash Left Join (cost=56.00..189.50 rows=2760 width=172)
Hash Cond: (public.parent.id = child1.id)
-> Hash Left Join (cost=28.00..123.55 rows=2760 width=104)
Hash Cond: (public.parent.id = child2.id)
-> Append (cost=0.00..57.60 rows=2760 width=36)
-> Seq Scan on parent (cost=0.00..21.60 rows=1160 width=36)
-> Seq Scan on child1 parent (cost=0.00..18.00
rows=800 width=36)
-> Seq Scan on child2 parent (cost=0.00..18.00
rows=800 width=36)
-> Hash (cost=18.00..18.00 rows=800 width=68)
-> Seq Scan on child2 (cost=0.00..18.00 rows=800 width=68)
-> Hash (cost=18.00..18.00 rows=800 width=68)
-> Seq Scan on child1 (cost=0.00..18.00 rows=800 width=68)

Now, of course there must be something I'm missing here.. but this
seems like the solution of table inheritance will only result in the
same problem I was having before -- either I need to JOIN every row on
all child tables, or I need to specifically iterate over all the child
tables, one child table at a time (which will probably result in even
worse performance, since the 'parent' table is huge).

Am I misunderstanding something here, or is there simple no solution
for what I want ?

Regards,

Leon Mergen

#8Leon Mergen
leon@solatis.com
In reply to: Leon Mergen (#7)
Re: Conditional JOINs ?

On 3/19/08, Leon Mergen <leon@solatis.com> wrote:

Excuse me for bumping this up again, but I still don't understand how
to use this approach to sequentially walk through all different child
tables in one select, without having to JOIN these tables all the time

Apparently a UNION all solved this problem -- sorry for the noise.

--
Leon Mergen
http://www.solatis.com

#9Erik Jones
erik@myemma.com
In reply to: Leon Mergen (#8)
Re: Conditional JOINs ?

On Mar 19, 2008, at 9:01 AM, Leon Mergen wrote:

On 3/19/08, Leon Mergen <leon@solatis.com> wrote:

Excuse me for bumping this up again, but I still don't understand how
to use this approach to sequentially walk through all different child
tables in one select, without having to JOIN these tables all the
time

Apparently a UNION all solved this problem -- sorry for the noise.

If you have the child tables INHERITing from the parent, then a simple

SELECT parent.* FROM parent;

would be equivalent to manually spelling out a UNION ALL that
explicitly lists all of the tables.

Erik Jones

DBA | Emma®
erik@myemma.com
800.595.4401 or 615.292.5888
615.292.0777 (fax)

Emma helps organizations everywhere communicate & market in style.
Visit us online at http://www.myemma.com

#10Leon Mergen
leon@solatis.com
In reply to: Erik Jones (#9)
Re: Conditional JOINs ?

On 3/19/08, Erik Jones <erik@myemma.com> wrote:

Excuse me for bumping this up again, but I still don't understand how
to use this approach to sequentially walk through all different child
tables in one select, without having to JOIN these tables all the
time

Apparently a UNION all solved this problem -- sorry for the noise.

If you have the child tables INHERITing from the parent, then a simple

SELECT parent.* FROM parent;

would be equivalent to manually spelling out a UNION ALL that
explicitly lists all of the tables.

But this will only display the information that is common for all the
child tables -- if I also want to display all the information that is
specific for the child tables, as I understand it, I have to use a
UNION ALL and merge all the child tables together this way.

The EXPLAIN of this query:

Append (cost=0.00..2169.52 rows=34376 width=94)
-> Seq Scan ON child1 (cost=0.00..1824.71 rows=34371 width=94)
-> Seq Scan ON child2 (cost=0.00..1.05 rows=5 width=56)

Regards,

Leon Mergen

#11Joris Dobbelsteen
Joris@familiedobbelsteen.nl
In reply to: Leon Mergen (#10)
Re: Conditional JOINs ?

Leon Mergen wrote:

On 3/19/08, Erik Jones <erik@myemma.com> wrote:

Excuse me for bumping this up again, but I still don't understand how
to use this approach to sequentially walk through all different child
tables in one select, without having to JOIN these tables all the
time

Apparently a UNION all solved this problem -- sorry for the noise.

If you have the child tables INHERITing from the parent, then a simple

SELECT parent.* FROM parent;

would be equivalent to manually spelling out a UNION ALL that
explicitly lists all of the tables.

But this will only display the information that is common for all the
child tables -- if I also want to display all the information that is
specific for the child tables, as I understand it, I have to use a
UNION ALL and merge all the child tables together this way.

The EXPLAIN of this query:

Append (cost=0.00..2169.52 rows=34376 width=94)
-> Seq Scan ON child1 (cost=0.00..1824.71 rows=34371 width=94)
-> Seq Scan ON child2 (cost=0.00..1.05 rows=5 width=56)

Regards,

Leon Mergen

What I think you desire is more in the form of:
SELECT id, foo, bar, NULL AS "baz"
FROM child1
UNION ALL
SELECT id, foo, NULL, baz
FROM child2.

I think if you compare it to I/O volume, the joins will not cause many
additional I/Os as long as the indexes on "id" for tables child1 and
child2 will fit into memory.

- Joris