nested loops in joins, ambiguous rewrite rules

Started by Charles Hornberger · about 27 years ago · 3 messages · general
#1 Charles Hornberger
hornberger@tabloid.net

I've got a Postgres 6.4 DB that is doing, to my mind, funny things.

The main problem is that joins over multiple tables are taking a long, long
time to execute.

When I do EXPLAINs on join queries, I find that the optimizer is choosing
to do a lot of nested loops and sequential scans, instead of Merge Joins or
something that sounds more efficient.

For instance, here's an EXPLAIN SELECT on a view that joins 7 tables. All
of the join columns are indexed.

NOTICE: QUERY PLAN:

Nested Loop  (cost=0.00 size=1 width=140)
  ->  Nested Loop  (cost=0.00 size=1 width=124)
        ->  Nested Loop  (cost=0.00 size=1 width=108)
              ->  Nested Loop  (cost=0.00 size=1 width=92)
                    ->  Nested Loop  (cost=0.00 size=1 width=76)
                          ->  Merge Join  (cost=0.00 size=1 width=60)
                                ->  Seq Scan  (cost=0.00 size=0 width=0)
                                      ->  Sort  (cost=0.00 size=0 width=0)
                                            ->  Seq Scan on article  (cost=0.00 size=0 width=32)
                                ->  Seq Scan  (cost=0.00 size=0 width=0)
                                      ->  Sort  (cost=0.00 size=0 width=0)
                                            ->  Seq Scan on article_text  (cost=0.00 size=0 width=28)
                          ->  Seq Scan on article_source  (cost=0.00 size=0 width=16)
                    ->  Seq Scan on section  (cost=0.00 size=0 width=16)
              ->  Seq Scan on locale  (cost=0.00 size=0 width=16)
        ->  Seq Scan on volume  (cost=0.00 size=0 width=16)
  ->  Seq Scan on issue  (cost=0.00 size=0 width=16)

EXPLAIN

This view was created with:

CREATE VIEW all_articles AS
SELECT article.article_id, article.print_publ_date,
article.print_page_no,
article_text.headline, article_text.subhead,
article_source.source_name,
section.section_name,
locale.locale_name,
volume.volume_name,
issue.issue_name
FROM article,
article_text,
article_source,
section,
locale,
volume,
issue
WHERE article.article_id = article_text.article_id
AND article.article_source_id = article_source.source_id
AND article.section_id = section.section_id
AND article.locale_id = locale.locale_id
AND article.volume_id = volume.volume_id
AND article.issue_id = issue.issue_id ;

It takes 5-7 minutes to perform a query on this view, even though there are
no records in any of these tables.

One more thing I noticed. In trying to figure out what's going wrong, I
dumped the structure of an existing DB using `pg_dump -s`. Then when I
tried to create a new DB from the dump file, I got the following error:

CREATE RULE "_RETall_articles" AS ON SELECT TO "all_articles" DO INSTEAD
SELECT "article_id", "print_publ_date", "print_page_no", "headline",
"subhead", "source_name", "section_name", "locale_name", "volume_name",
"issue_name" FROM "article", "article_text", "article_source", "section",
"locale", "volume", "issue" WHERE ((((("article_id" = "article_id") AND
("article_source_id" = "source_id")) AND ("section_id" = "section_id")) AND
("locale_id" = "locale_id")) AND ("volume_id" = "volume_id")) AND
("issue_id" = "issue_id");
ERROR: Column article_id is ambiguous

It seems that pg_dump isn't preserving the table qualifiers on the view's
column references when it dumps the rule.
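
For comparison, the original CREATE VIEW qualifies every join column with
its table name. A version of the rule's WHERE clause with those qualifiers
restored (just a sketch of what the dumped rule would presumably need; I
haven't tested loading it) would read:

```sql
-- The same predicates as in the CREATE VIEW above, with the table
-- qualifiers restored; in the dumped rule they have been stripped,
-- which is what makes "article_id" ambiguous:
... WHERE "article"."article_id"        = "article_text"."article_id"
      AND "article"."article_source_id" = "article_source"."source_id"
      AND "article"."section_id"        = "section"."section_id"
      AND "article"."locale_id"         = "locale"."locale_id"
      AND "article"."volume_id"         = "volume"."volume_id"
      AND "article"."issue_id"          = "issue"."issue_id";
```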

Thanks in advance for any advice.

Charlie

#2 Charles Hornberger
hornberger@tabloid.net
In reply to: Charles Hornberger (#1)
Re: [GENERAL] nested loops in joins, ambiguous rewrite rules

Hello! I'm in a bit of a fix, and I bet that someone out there has just the
right answer for me. I've been struggling with this optimization issue for
a couple of days now.

The explanation that follows is very detailed ... I include the dumps from
a few versions of one of my test databases.

It's been suggested to me by others that perhaps there's some kind of bug
in the optimizer, but that's not a diagnosis I could make on my own.
I'd really appreciate a fresh view on this problem.

The most recent suggestion I got from Vadim was to run vacuum analyze
against the databases. I've tried this multiple times, but it has no effect.

What is happening (please forgive my non-technical explanation) is that the
optimizer is occasionally lapsing into these "nested loop" query plans,
rather than using something more efficient. It doesn't matter whether I
vacuum the DB; it seems to depend on something else entirely. But I can't
figure out what that is.

Here are two identically structured sets of tables from two separate
databases, with slightly different data in them.

This DB is named apx13:
----------------------
CREATE SEQUENCE "article_article_id_seq" start 10 increment 1 maxvalue
2147483647 minvalue 1 cache 1 ;
SELECT nextval ('article_article_id_seq');
CREATE TABLE "article" ("article_id" "int4" DEFAULT nextval (
'article_article_id_seq' ) NOT NULL, "section_id" "int4" NOT NULL,
"locale_id" "int4" NOT NULL, "article_source_id" "int4", "volume_id"
"int4", "issue_id" "int4", "print_page_no" "int4", "print_publ_date"
"date", "site_publ_date" "date" NOT NULL, "site_publ_time" "datetime" NOT
NULL, "inputter_id" "int4" NOT NULL, "input_date" "datetime" DEFAULT text
'now', "published" "int4" DEFAULT 0);
INSERT INTO "article" ("article_id","section_id","locale_id",
  "article_source_id","volume_id","issue_id","print_page_no",
  "print_publ_date","site_publ_date","site_publ_time","inputter_id",
  "input_date","published") values (10,3,1,4,2,3,4,'04-05-2006',
  '01-28-1999','Thu Jan 28 19:28:40 1999 PST',100,
  'Thu Jan 28 19:28:40 1999 PST',0);
CREATE UNIQUE INDEX "article_article_id_key" on "article" using btree (
"article_id" "int4_ops" );
CREATE INDEX "article_vol_ix" on "article" using btree ( "volume_id"
"int4_ops" );
CREATE INDEX "article_source_ix" on "article" using btree (
"article_source_id" "int4_ops" );
CREATE INDEX "article_issue_ix" on "article" using btree ( "issue_id"
"int4_ops" );
CREATE INDEX "article_locale_ix" on "article" using btree ( "locale_id"
"int4_ops" );
CREATE INDEX "article_section_ix" on "article" using btree ( "section_id"
"int4_ops" );
CREATE TABLE "article_text" ("article_id" "int4" NOT NULL, "headline"
varchar, "subhead" varchar);
INSERT INTO "article_text" ("article_id","headline","subhead") values
(10,'Mayor Signs Contract With Company','Legally binding document said to
be four pages long');

Here's the same thing, but with a couple of extra rows in the 'article' and
'article_text' tables.

This one is named apx14
-----------------------
CREATE SEQUENCE "article_article_id_seq" start 10 increment 1 maxvalue
2147483647 minvalue 1 cache 1 ;
SELECT nextval ('article_article_id_seq');
CREATE TABLE "article" ("article_id" "int4" DEFAULT nextval (
'article_article_id_seq' ) NOT NULL, "section_id" "int4" NOT NULL,
"locale_id" "int4" NOT NULL, "article_source_id" "int4", "volume_id"
"int4", "issue_id" "int4", "print_page_no" "int4", "print_publ_date"
"date", "site_publ_date" "date" NOT NULL, "site_publ_time" "datetime" NOT
NULL, "inputter_id" "int4" NOT NULL, "input_date" "datetime" DEFAULT text
'now', "published" "int4" DEFAULT 0);
INSERT INTO "article" ("article_id","section_id","locale_id",
  "article_source_id","volume_id","issue_id","print_page_no",
  "print_publ_date","site_publ_date","site_publ_time","inputter_id",
  "input_date","published") values (10,3,1,4,2,3,4,'04-05-2006',
  '01-28-1999','Thu Jan 28 19:28:40 1999 PST',100,
  'Thu Jan 28 19:28:40 1999 PST',0);
INSERT INTO "article" ("article_id","section_id","locale_id",
  "article_source_id","volume_id","issue_id","print_page_no",
  "print_publ_date","site_publ_date","site_publ_time","inputter_id",
  "input_date","published") values (11,3,1,4,2,3,4,'04-05-2006',
  '01-28-1999','Thu Jan 28 19:28:40 1999 PST',100,
  'Thu Jan 28 19:28:40 1999 PST',0);
CREATE UNIQUE INDEX "article_article_id_key" on "article" using btree (
"article_id" "int4_ops" );
CREATE INDEX "article_vol_ix" on "article" using btree ( "volume_id"
"int4_ops" );
CREATE INDEX "article_source_ix" on "article" using btree (
"article_source_id" "int4_ops" );
CREATE INDEX "article_issue_ix" on "article" using btree ( "issue_id"
"int4_ops" );
CREATE INDEX "article_locale_ix" on "article" using btree ( "locale_id"
"int4_ops" );
CREATE INDEX "article_section_ix" on "article" using btree ( "section_id"
"int4_ops" );
CREATE TABLE "article_text" ("article_id" "int4" NOT NULL, "headline"
varchar, "subhead" varchar);
INSERT INTO "article_text" ("article_id","headline","subhead") values
(10,'Mayor Signs Contract With Company','Legally binding document said to
be four pages long');
INSERT INTO "article_text" ("article_id","headline","subhead") values
(11,'Mayor Cancels Contract','Company Promises to Sue Over Scuttled Deal');
CREATE INDEX "article_text_ix" on "article_text" using btree (
"article_id" "int4_ops" );

When I run the following query...

EXPLAIN SELECT a.article_id, b.article_id
FROM article a, article_text b
WHERE a.article_id = b.article_id;

... against apx13, it produces:

NOTICE: QUERY PLAN:

Nested Loop (cost=2.07 size=1 width=8)
-> Seq Scan on article a (cost=1.03 size=1 width=4)
-> Seq Scan on article_text b (cost=1.03 size=1 width=4)

EXPLAIN

When I run it against apx14, it produces:

NOTICE: QUERY PLAN:

Merge Join  (cost=0.00 size=1 width=8)
  ->  Seq Scan  (cost=0.00 size=0 width=0)
        ->  Sort  (cost=0.00 size=0 width=0)
              ->  Seq Scan on article a  (cost=0.00 size=0 width=4)
  ->  Seq Scan  (cost=0.00 size=0 width=0)
        ->  Sort  (cost=0.00 size=0 width=0)
              ->  Seq Scan on article_text b  (cost=0.00 size=0 width=4)

EXPLAIN

To track this problem down further, I dumped the schema from apx14 and the
data from apx13, then created a new database, apx15, by reloading the apx14
schema and the apx13 data. Join queries in the new apx15 database also
produce "Merge Join" query plans instead of nested loops.

So what's wrong with apx13?

Can anyone give me any hints as to how to fix this?

Thanks in advance!

Charlie

(Vadim's response to my previous posting follows:)

At 07:03 PM 1/29/99 +0700, you wrote:

Charles Hornberger wrote:

I've got a Postgres 6.4 DB that is doing, to my mind, funny things.

The main problem is that joins over multiple tables are taking a long, long
time to execute.

man vacuum

Vadim


#3 Charles Hornberger
hornberger@tabloid.net
In reply to: Charles Hornberger (#2)
Re: [GENERAL] nested loops in joins, ambiguous rewrite rules

Hello again. I've got a few more notes on this "strange" optimizer
behavior, which might be useful to anyone interested in this problem:

As I mentioned in my last posting, I have one DB called apx13 that is
producing expensive "nested loop" query plans when I try to join two
tables, and one called apx14 that's using "merge join" query plans. I've
been trying to figure out why this is. (This is all happening under 6.4.
We haven't upgraded to 6.4.2 yet but I don't see anything in the 6.4.2
HISTORY file that suggests that major changes have been made to the
optimizer, so I guess that probably wouldn't change anything?)

In any case, I dumped the entire contents the apx14 database using `pg_dump
-D apx14 > apx14.out`. Then I created a new database called apx15 and ran
`psql -f apx14.out apx15`.

Then I ran a join query against apx15:

apx15=> \i ../query
explain select a.article_id, b.article_id
from article a, article_text b
where a.article_id = b.article_id;
NOTICE: QUERY PLAN:

Nested Loop (cost=3.20 size=2 width=8)
-> Seq Scan on article a (cost=1.07 size=2 width=4)
-> Seq Scan on article_text b (cost=1.07 size=2 width=4)

EXPLAIN

So I ran vacuum analyze and did it again. And got the same query plan.

At that point, I decided to create yet another DB, apx16. I just used the
schema from apx14, and the contents of the two tables being used in my join
query.

In other words:
`pg_dump -s apx14 > apx14.schema.out`
`pg_dump -aDt article apx14 > apx14.article.out`
`pg_dump -aDt article_text apx14 > apx14.article_text.out`
`psql -f apx14.schema.out apx16`
`psql -f apx14.article.out apx16`
`psql -f apx14.article_text.out apx16`

Then I ran the same join query against apx16, and this time the optimizer
decided to use a "merge join":

apx16=> \i ../query
explain select a.article_id, b.article_id
from article a, article_text b
where a.article_id = b.article_id;
NOTICE: QUERY PLAN:

Merge Join  (cost=0.00 size=1 width=8)
  ->  Seq Scan  (cost=0.00 size=0 width=0)
        ->  Sort  (cost=0.00 size=0 width=0)
              ->  Seq Scan on article a  (cost=0.00 size=0 width=4)
  ->  Seq Scan  (cost=0.00 size=0 width=0)
        ->  Sort  (cost=0.00 size=0 width=0)
              ->  Seq Scan on article_text b  (cost=0.00 size=0 width=4)

EXPLAIN

Maybe I'm missing something, but I can't understand why two identical join
queries run against two identical tables should produce different query plans.
Maybe there's something in the way my tables and indices are set up, but I
can't imagine what that is.

One final note: I just did another test, hoping to exclude other factors
from the equation. I created two more databases using only the two
relevant tables (and sequences & indices) from apx16. What I discovered is
that if I build a database from "full" dump files (i.e., dump files that
include both table schema & data), the optimizer will always elect to use
"nested loops" for its query plans. On the other hand, if I build the
database by first creating the tables and indices, and then doing the
inserts in a second operation, it uses the less costly "merge join". As a
matter of fact, it appears that if you create an index on a table *after*
inserting data into that table, joins performed on that table will *always*
use these expensive nested loops.

So my theory is:

Since pg_dump creates dump files that perform INSERTs on tables before
performing any CREATE INDEX statements on tables, using raw dump files to
create databases will cause this "nested loop" behavior.

And there's another catch: If you create your tables and indices first,
then insert your data, the optimizer will continue to build "merge join"
query plans until you make the mistake of running a "vacuum analyze". If
you run "vacuum analyze", the optimizer will then decide to begin using the
more costly nested loops to handle any join queries. The only way to get
around this is to delete all records from the tables being joined,
re-insert them and remember NOT to run vacuum analyze.
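
If this theory holds, the reload order that has worked for me (the apx16
procedure above) can be summarized as a recipe: restore the schema, indexes
included, before loading any data, and then leave vacuum analyze alone. A
sketch, using the same pg_dump/psql invocations as above (the file names
and the apx17 database name are illustrative, and I haven't tested this
exact sequence end to end):

```shell
# Reload order that produced the cheaper "merge join" plans in my tests.
pg_dump -s apx14 > apx14.schema.out   # schema only: tables AND indexes
pg_dump -a -D apx14 > apx14.data.out  # data only, as INSERT statements
createdb apx17                        # new, empty database
psql -f apx14.schema.out apx17        # create tables + indexes first
psql -f apx14.data.out apx17          # then insert the data
# ...and do NOT run "vacuum analyze" afterwards, or the nested-loop
# plans come back.
```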

If anyone wants to replicate this, I've documented my tests below.

(And, of course, if I've totally missed the point on anything in here,
please correct me.)

===============
Table structure
===============

apx16=> \d article
Table    = article
+-------------------+---------------------------------+--------+
|       Field       |              Type               | Length |
+-------------------+---------------------------------+--------+
| article_id        | int4 not null default nextval ( |      4 |
| section_id        | int4 not null                   |      4 |
| locale_id         | int4 not null                   |      4 |
| article_source_id | int4                            |      4 |
| volume_id         | int4                            |      4 |
| issue_id          | int4                            |      4 |
| print_page_no     | int4                            |      4 |
| print_publ_date   | date                            |      4 |
| site_publ_date    | date not null                   |      4 |
| site_publ_time    | datetime not null               |      8 |
| inputter_id       | int4 not null                   |      4 |
| input_date        | datetime default text 'now'     |      8 |
| published         | int4 default 0                  |      4 |
+-------------------+---------------------------------+--------+
Indices:  article_article_id_key
          article_issue_ix
          article_locale_ix
          article_section_ix
          article_source_ix
          article_vol_ix

apx16=> \d article_article_id_key

Table    = article_article_id_key
+------------+------+--------+
|   Field    | Type | Length |
+------------+------+--------+
| article_id | int4 |      4 |
+------------+------+--------+

apx16=> \d article_text

Table    = article_text
+------------+---------------+--------+
|   Field    |     Type      | Length |
+------------+---------------+--------+
| article_id | int4 not null |      4 |
| headline   | varchar()     |      0 |
| subhead    | varchar()     |      0 |
+------------+---------------+--------+
Index:    article_text_ix

apx16=> \d article_text_ix

Table    = article_text_ix
+------------+------+--------+
|   Field    | Type | Length |
+------------+------+--------+
| article_id | int4 |      4 |
+------------+------+--------+

=================
CREATE STATEMENTS
=================

CREATE SEQUENCE "article_article_id_seq" start 10 increment 1 maxvalue
2147483647 minvalue 1 cache 1 ;
SELECT nextval ('article_article_id_seq');
CREATE TABLE "article" ("article_id" "int4" DEFAULT nextval (
'article_article_id_seq' ) NOT NULL, "section_id" "int4" NOT NULL,
"locale_id" "int4" NOT NULL, "article_source_id" "int4", "volume_id"
"int4", "issue_id" "int4", "print_page_no" "int4", "print_publ_date"
"date", "site_publ_date" "date" NOT NULL, "site_publ_time" "datetime" NOT
NULL, "inputter_id" "int4" NOT NULL, "input_date" "datetime" DEFAULT text
'now', "published" "int4" DEFAULT 0);
CREATE UNIQUE INDEX "article_article_id_key" on "article" using btree (
"article_id" "int4_ops" );
CREATE INDEX "article_vol_ix" on "article" using btree ( "volume_id"
"int4_ops" );
CREATE INDEX "article_source_ix" on "article" using btree (
"article_source_id" "int4_ops" );
CREATE INDEX "article_issue_ix" on "article" using btree ( "issue_id"
"int4_ops" );
CREATE INDEX "article_locale_ix" on "article" using btree ( "locale_id"
"int4_ops" );
CREATE INDEX "article_section_ix" on "article" using btree ( "section_id"
"int4_ops" );
CREATE TABLE "article_text" ("article_id" "int4" NOT NULL, "headline"
varchar, "subhead" varchar);
CREATE INDEX "article_text_ix" on "article_text" using btree (
"article_id" "int4_ops" );

=================
INSERT STATEMENTS
=================

INSERT INTO "article" ("article_id","section_id","locale_id",
  "article_source_id","volume_id","issue_id","print_page_no",
  "print_publ_date","site_publ_date","site_publ_time","inputter_id",
  "input_date","published") values (10,3,1,4,2,3,4,'04-05-2006',
  '01-28-1999','Thu Jan 28 19:28:40 1999 PST',100,
  'Thu Jan 28 19:28:40 1999 PST',0);
INSERT INTO "article" ("article_id","section_id","locale_id",
  "article_source_id","volume_id","issue_id","print_page_no",
  "print_publ_date","site_publ_date","site_publ_time","inputter_id",
  "input_date","published") values (11,3,1,4,2,3,4,'04-05-2006',
  '01-28-1999','Thu Jan 28 19:28:40 1999 PST',100,
  'Thu Jan 28 19:28:40 1999 PST',0);
INSERT INTO "article_text" ("article_id","headline","subhead") values
(10,'Mayor Signs Contract With Company','Legally binding document said to
be four pages long');
INSERT INTO "article_text" ("article_id","headline","subhead") values
(11,'Mayor Cancels Contract','Company Promises to Sue Over Scuttled Deal');

==================
Steps to replicate
==================

Vacuum analyze 'problem'
------------------------
1. Create a fresh database and execute the CREATE statements above
2. Execute the INSERT statements above
3. Execute the following query:
EXPLAIN SELECT a.article_id, b.article_id
FROM article a, article_text b
WHERE a.article_id = b.article_id;
4. You should see:

NOTICE: QUERY PLAN:

Merge Join  (cost=0.00 size=1 width=8)
  ->  Seq Scan  (cost=0.00 size=0 width=0)
        ->  Sort  (cost=0.00 size=0 width=0)
              ->  Seq Scan on article a  (cost=0.00 size=0 width=4)
  ->  Seq Scan  (cost=0.00 size=0 width=0)
        ->  Sort  (cost=0.00 size=0 width=0)
              ->  Seq Scan on article_text b  (cost=0.00 size=0 width=4)

EXPLAIN

5. Do "vacuum analyze".
6. Execute the query again.
7. Now you'll see:

NOTICE: QUERY PLAN:

Nested Loop (cost=3.20 size=2 width=8)
-> Seq Scan on article a (cost=1.07 size=2 width=4)
-> Seq Scan on article_text b (cost=1.07 size=2 width=4)

EXPLAIN

pg_dump 'problem'
-----------------

1. Dump the DB you created above to a file with
`pg_dump -D dbname > foo`

2. Create a new DB

3. Do `psql -f foo newdb`

4. Execute the join query:
EXPLAIN SELECT a.article_id, b.article_id
FROM article a, article_text b
WHERE a.article_id = b.article_id;

5. You should see:

NOTICE: QUERY PLAN:

Nested Loop (cost=3.20 size=2 width=8)
-> Seq Scan on article a (cost=1.07 size=2 width=4)
-> Seq Scan on article_text b (cost=1.07 size=2 width=4)

EXPLAIN