Tool for database design documentation?
We are trying to find a good way to document our database design: what the
rationale is behind each table/field, what kind of information each field is
supposed to contain, and perhaps also something about the linking between the
tables. Does anybody have experience with this? Is the built-in 'comments'
feature in pgsql the best tool, or are there special tools that would be
recommendable?
I prefer to use a third-party tool like Case Studio 2
(http://www.casestudio.com) for this. Import your schema into CS,
document your various entities in the table/element/key/etc. popups, then
export everything to HTML. CS generates pretty complete documentation.
Druid (http://druid.sourceforge.net) works fairly well. (Though the
interface is abysmal, it does its job.) It generates javadoc-style
documentation for tables and columns.
Jeff
I use a combination of the GNOME-based diagramming application Dia
(http://www.gnome.org/projects/dia/) and the tedia2sql diagram-to-DDL
SQL translation script, which is written in Perl
(http://tedia2sql.tigris.org/).
It's a bit awkward, but:
1) Dia is a wonderful, open source diagramming tool I can run on
Windows, Mac OS X (via Fink) or Linux - nearly all of the commercial
tools have traditionally been Windows only, though this is slowly
changing;
2) Dia comes with a set of UML diagram objects which work quite
well for laying out a detailed data model. There are also ERD
diagram objects, but they don't seem to be of much use;
3) tedia2sql does a very good job translating most standard
SQL-92 (and some additional SQL-99 extensions) from UML objects to
SQL objects;
4) Dia saves its diagrams in XML format, which makes it
relatively straightforward to add to the SQL translations being
performed (see the many tedia2sql-like translators on the Dia links
page: http://www.gnome.org/projects/dia/links.html). Some of the
PostgreSQL-specific features such as inheritance and rules can be
diagrammed using the UML formalism, and one can write Perl
extensions to tedia2sql to turn that into DDL SQL;
There are other SQL-to-Dia-diagram translators (e.g., PostgreSQL
Autodoc [http://www.rbt.ca/autodoc/]) which allow for
reverse-engineering a data model diagram from an existing PostgreSQL
DDL SQL schema. There are also other Dia-to-SQL tools (see the Dia
links page), some of them even more specifically tailored to
PostgreSQL than tedia2sql is. I came to using tedia2sql over these
because I found it supported more of the general SQL entities and
relations I needed, and it was relatively easy to use.
I should add that this approach of sticking with open source tools
that are relatively straightforward to modify and extend is not for
the faint of heart. You absolutely must have a solid grasp of what a
database Entity-Relationship Diagram (ERD) is. Understanding the
translation process (XML Dia diagram objects to DDL SQL code
generation) can be quite helpful in making the best use of tedia2sql,
though it is not necessary for getting started turning out DDL SQL
from your Dia data models. A thorough understanding of this process
and knowledge of XML document parsing IS required to extend the
translator.
This more "home grown" solution doesn't do as good a job of separating
logical data models (implementable in any RDBMS) from physical data
models (a version of the logical model designed for a specific RDBMS).
I've found it easiest and quickest to simply create the physical
model diagram and focus on getting that translated properly into
PostgreSQL DDL SQL. The Dia data model diagrams I create aren't
easily re-usable in other RDBMSs, but then again, I don't intend to
stop using PostgreSQL unless I'm driven from it for some
application-specific reason.
I'm working on a port of tedia2sql to Ruby (my scripting language of
choice of late), mainly because I've got Object-Relational mapping
libraries there which will make it easier to create a complete,
"round trip" ERD system - i.e., be able both to generate Dia diagrams
from existing PostgreSQL DDL SQL schemas and to turn Dia diagrams into
DDL SQL that can be used to directly instantiate the schema in
PostgreSQL (via DBI code).
Cheers,
Bill
Not a free utility, but a good one:
http://www.datanamic.com/dezign/index.html
Hi,
I generated a big SQL script (about 20 MB) and fed it to psql. I was very
surprised that within a minute psql became quite slow. There were stretches
where fewer than 10 rows were inserted per second.
This is on a WinXP machine with a local 8.0.3 server, and only I use it.
Looking at the PG log files, the commands are executed fast.
I have monitored psql with the Task Manager, and it was at 45-50% CPU (this is
a Hyper-Threaded machine, so 50% is the max for one thread), and memory
allocation/deallocation was very active, around +-2 MB/sec.
The commands were simple CREATE TABLE and INSERT commands, chunked into
several lines like this:
CREATE TABLE aeloleg(
vevo CHAR(6),
szallito INTEGER,
datum DATE,
hatarido DATE,
vevo_nev CHAR(30),
ir_szam INTEGER,
helyseg CHAR(20),
cim CHAR(20),
befizetes INTEGER,
terheles INTEGER,
hitel INTEGER,
rendeles INTEGER,
jel CHAR(1),
trans INTEGER,
szoveg TEXT,
storno BOOLEAN) WITHOUT OIDS;
The insert commands for one table were surrounded by BEGIN and COMMIT like
this:
CREATE
BEGIN
INSERT
INSERT
...
INSERT
COMMIT
I know it would be faster with COPY, but this is extremely slow, and the
bottleneck is psql.
What is the problem?
Regards,
Otto
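A script like Otto's can often be rewritten mechanically into the COPY format he mentions. As a minimal sketch (the helper name is hypothetical, and it assumes one single-row `INSERT INTO t VALUES (...);` per line with simple literals, no commas inside quoted values):

```python
import re

# Matches a simple single-line, single-row INSERT statement.
INSERT_RE = re.compile(
    r"INSERT\s+INTO\s+(\w+)\s+VALUES\s*\((.*)\)\s*;\s*$", re.IGNORECASE
)

def inserts_to_copy(lines):
    """Rewrite runs of simple INSERTs as COPY ... FROM stdin blocks,
    which the server parses far more cheaply than individual INSERTs.
    Non-INSERT lines are passed through unchanged."""
    current_table = None
    out = []
    for line in lines:
        m = INSERT_RE.match(line.strip())
        if m:
            table, values = m.group(1), m.group(2)
            if table != current_table:
                if current_table is not None:
                    out.append("\\.")          # terminate previous COPY block
                out.append(f"COPY {table} FROM stdin;")
                current_table = table
            # Tab-separate the values; strip surrounding single quotes.
            fields = [v.strip().strip("'") for v in values.split(",")]
            out.append("\t".join(fields))
        else:
            if current_table is not None:
                out.append("\\.")
                current_table = None
            out.append(line.rstrip("\n"))
    if current_table is not None:
        out.append("\\.")
    return out
```

This is only a sketch: real data with embedded commas, tabs, or NULLs would need proper quoting per the COPY text format.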
=?ISO-8859-1?Q?Havasv=F6lgyi_Ott=F3?= <h.otto@freemail.hu> writes:
I know it would be faster with COPY, but this is extremely slow, and the
bottleneck is psql.
What is the problem?
Hmm, does the Windows port have readline support, and if so, does adding
the "-n" switch to the psql invocation fix the problem? Or you could
try feeding the script with the -f switch or \i rather than "psql <script".
Readline adds a fair amount of overhead, which is completely invisible
at human typing speeds but can be annoying when reading scripts.
regards, tom lane
Tom,
Thanks for the suggestion. I have just applied both switches, -f (I had
applied this in the previous case too) and -n, but it becomes slow again. At
the beginning it reads about 300 KB a second, and when it has read 1.5 MB,
it reads only about 10 KB a second; it slows down gradually. Maybe others
should also try this scenario. Can I help with anything?
Best Regards,
Otto
----- Original Message -----
From: "Tom Lane" <tgl@sss.pgh.pa.us>
To: "Havasvölgyi Ottó" <h.otto@freemail.hu>
Cc: <pgsql-general@postgresql.org>
Sent: Tuesday, August 02, 2005 3:54 AM
Subject: Re: [GENERAL] feeding big script to psql
Hi,
The effect is the same even if I redirect the output to a file with the -o
switch.
At the beginning, 200 KB/sec; at 1.5 MB, the speed is less than 20 KB/sec.
Best Regards,
Otto
Hi,
Now I am at 7 MB, and the reading speed is 3-4 KB/sec.
Best Regards,
Otto
At 1:57 PM +0200 8/2/05, Havasvölgyi Ottó wrote:
Hi,
Now I am at 7 MB, and the reading speed is 3-4KB/sec.
Have you checked to see if you're swapping as this goes on, either in the
client or on the server?
--
Dan
--------------------------------------it's like this-------------------
Dan Sugalski even samurai
dan@sidhe.org have teddy bears and even
teddy bears get drunk
Have you tried inserting VACUUM commands into the script every now and then?
I found a while ago that after inserting a lot of rows into a clean
Postgres table it would take several minutes just to analyse a command,
not even starting the execution. That was on version 8.0. On version
7.4.x the query never returned at all.
Pete
--
Peter Wilson - YellowHawk Ltd : http://www.yellowhawk.co.uk
I bet you've got an issue where a seq scan on an FK field or something
works fine for the first few thousand rows. At some point, pgsql should
switch to an index scan, but it just doesn't know it.
Try wrapping every 10,000 or so inserts with
begin;
<insert 10,000 rows>
commit;
analyze;
begin;
Rinse, wash, repeat.
You probably won't need an ANALYZE after the first one, though.
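A rough sketch of the batching Scott describes (the function name is hypothetical; it just rewrites a flat list of INSERT statements into BEGIN/COMMIT batches with an ANALYZE after the first one, so the planner gets statistics early):

```python
def batch_statements(inserts, batch_size=10000, analyze_once=True):
    """Wrap a list of INSERT statements in BEGIN/COMMIT batches,
    issuing ANALYZE after the first batch so the planner has
    row-count statistics for the rest of the load."""
    out = []
    analyzed = False
    for i in range(0, len(inserts), batch_size):
        out.append("BEGIN;")
        out.extend(inserts[i:i + batch_size])
        out.append("COMMIT;")
        if not analyzed:
            out.append("ANALYZE;")
            analyzed = analyze_once  # skip further ANALYZEs if requested
    return out
```

Writing the returned lines to a file and feeding that to psql reproduces the BEGIN / inserts / COMMIT / ANALYZE pattern above.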
Peter Wilson <petew@yellowhawk.co.uk> writes:
I found a while ago that after inserting a lot of rows into a clean
Postgres table it would take several minutes just to analyse a command,
not even starting the execution.
Oh? Could you provide a test case for this? I can certainly believe
that the planner might choose a bad plan if it has no statistics, but
it shouldn't take a long time to do it.
regards, tom lane
I was a little busy with deadlines at the time, but I saved the database
in its slow configuration so I could investigate during a quieter period.
I'll do a restore now and see whether I can remember back to April, when
I came across this issue.
Scott,
There were no foreign keys (and not even any indices) during the data import,
and none of the tables had more than 4000 records. I have also checked the log
for durations, and all INSERT statements were 0.000 ms. So it seems the
problem is not on the server side.
During the process no other application did anything. No other HDD activity
either.
Best Regards,
Otto
Hi Tom,
I've spent half an hour restoring the database from the backup I took. This
was back in December '04. The database dump is about 95 MB; none of the
tables are particularly big.
On investigation, the problem occurs on 'EXPLAIN ANALYZE' - which is what
pgAdmin III does when you press the explain button. According to the manual,
this actually executes the command and gives actual times. (I think pgAdmin is
wrong here - if you ask it to explain a command, you want the plan it's going
to use, not to wait for it to finish!)
That said, I'll include the details here, at least partly because the tables
are small in database terms, and I don't understand how this query, even with
lots of scanning, can take 100% CPU for over 10 minutes (654 seconds for
EXPLAIN ANALYZE, 651 seconds for plain execution).
OK - the query is:
SELECT DISTINCT c.client_id, c.instance, c.contact_id, c.uname FROM contacts c
WHERE c.client_id = 'gadget'
AND c.instance = '0'
AND ( c.type = 'COMMUNITY'
OR c.type = 'OU'
OR c.type = 'INDIVIDUAL'
)
AND c.contact_id in (
SELECT subb.community_id FROM contact_att subb
WHERE subb.client_id = 'gadget'
AND subb.instance = '0'
AND subb.contact_id = 3854.000000
) ;
By itself the sub-select executes in 235 ms and yields a set that does not
vary with the outer select - it's a constant set, in effect. It would seem
that in the worst case, assuming no index or size information, the planner
should spot the invariance of the sub-select, and a sequential scan of the
'contacts' table would be the worst result I would expect.
There are two tables involved in this query.
'contacts' contains 3728 records.
'contact_att' contains 19217 records.
The query plan yields :
Unique (cost=12.05..12.06 rows=1 width=90) (actual time=654491.967..654491.984 rows=3 loops=1)
-> Sort (cost=12.05..12.05 rows=1 width=90) (actual time=654491.958..654491.959 rows=3 loops=1)
Sort Key: c.client_id, c.instance, c.contact_id, c.uname
-> Nested Loop IN Join (cost=0.00..12.04 rows=1 width=90) (actual time=577763.884..654491.864 rows=3 loops=1)
Join Filter: ("outer".contact_id = "inner".community_id)
-> Index Scan using cos_uk on contacts c (cost=0.00..6.02 rows=1 width=90) (actual time=0.227..123.862 rows=2791 loops=1)
Index Cond: ((instance = '0'::bpchar) AND ((client_id)::text = 'gadget'::text))
Filter: ((("type")::text = 'COMMUNITY'::text) OR (("type")::text = 'OU'::text) OR (("type")::text = 'INDIVIDUAL'::text))
-> Index Scan using ca_pk on contact_att subb (cost=0.00..6.01 rows=1 width=8) (actual time=0.207..234.423 rows=3 loops=2791)
Index Cond: ((instance = '0'::bpchar) AND ((client_id)::text = 'gadget'::text))
Filter: ((contact_id)::numeric = 3854.000000)
Total runtime: 654492.320 ms
Definitions for the two relevant tables are:
create table contacts (
INSTANCE CHARACTER (1) NOT NULL ,
client_id varchar (50) not null ,
contact_id bigint default nextval('contacts_contact_id_seq'::text),
UNAME VARCHAR (32) NOT NULL ,
TYPE VARCHAR (20) NOT NULL DEFAULT 'INDIVIDUAL',
parent bigint,
NAME VARCHAR (240) ,
PHONE VARCHAR (32) ,
FAX VARCHAR (32) ,
EMAIL VARCHAR (240) ,
BASIC_PW VARCHAR (128) ,
DESCRIPTION VARCHAR (240),
custom_data varchar(8192),
CONSTRAINT COS_PK PRIMARY KEY(INSTANCE,CLIENT_ID,CONTACT_ID)
)
create table contact_att(
instance character(1),
client_id varchar(50) not null,
contact_id bigint,
community_id bigint,
inherited smallint,
CONSTRAINT CA_PK PRIMARY KEY(INSTANCE,CLIENT_ID,CONTACT_ID,COMMUNITY_ID)
)
CREATE INDEX CO_PA_IND ON CONTACTS (PARENT);
CREATE INDEX CO_TY_IND ON CONTACTS (TYPE);
--
Hope you find that useful! If there is anything else I can provide you with then please do let me know.
Pete
------------------------------------------------------------------------
Peter Wilson - YellowHawk Ltd, http://www.yellowhawk.co.uk
After a vacuum the query plan becomes:
Unique (cost=1438.65..1438.66 rows=1 width=39) (actual time=260.473..260.489 rows=3 loops=1)
-> Sort (cost=1438.65..1438.65 rows=1 width=39) (actual time=260.468..260.471 rows=3 loops=1)
Sort Key: c.client_id, c.instance, c.contact_id, c.uname
-> Nested Loop (cost=1434.14..1438.64 rows=1 width=39) (actual time=260.007..260.306 rows=3 loops=1)
-> HashAggregate (cost=1434.14..1434.14 rows=1 width=8) (actual time=259.666..259.686 rows=3 loops=1)
-> Index Scan using ca_pk on contact_att subb (cost=0.00..1433.95 rows=78 width=8) (actual time=0.367..259.617 rows=3 loops=1)
Index Cond: ((instance = '0'::bpchar) AND ((client_id)::text = 'gadget'::text))
Filter: ((contact_id)::numeric = 3854.000000)
-> Index Scan using cos_pk on contacts c (cost=0.00..4.48 rows=1 width=39) (actual time=0.178..0.182 rows=1 loops=3)
Index Cond: ((c.instance = '0'::bpchar) AND ((c.client_id)::text = 'gadget'::text) AND (c.contact_id = "outer".community_id))
Filter: ((("type")::text = 'COMMUNITY'::text) OR (("type")::text = 'OU'::text) OR (("type")::text = 'INDIVIDUAL'::text))
Total runtime: 260.886 ms
whitebeam_slow=> \d contacts
Table "public.contacts"
Column | Type | Modifiers
-------------+-------------------------+-----------------------------------------------------------
instance | character(1) | not null
client_id | character varying(50) | not null
contact_id | bigint | not null default nextval('contacts_contact_id_seq'::text)
uname | character varying(32) | not null
type | character varying(20) | not null default 'INDIVIDUAL'::character varying
parent | bigint |
name | character varying(240) |
phone | character varying(32) |
fax | character varying(32) |
email | character varying(240) |
basic_pw | character varying(128) |
description | character varying(240) |
custom_data | character varying(8192) |
Indexes:
"cos_pk" PRIMARY KEY, btree (instance, client_id, contact_id)
"cos_uk" UNIQUE, btree (instance, client_id, uname)
"co_pa_ind" btree (parent)
"co_ty_ind" btree ("type")
whitebeam_slow-> \d contact_att
Table "public.contact_att"
Column | Type | Modifiers
--------------+-----------------------+-----------
instance | character(1) | not null
client_id | character varying(50) | not null
contact_id | bigint | not null
community_id | bigint | not null
inherited | smallint |
Indexes:
"ca_pk" PRIMARY KEY, btree (instance, client_id, contact_id, community_id)
Foreign-key constraints:
"ca_cos_comm" FOREIGN KEY (instance, client_id, community_id) REFERENCES contacts(instance, client_id, contact_id) ON UPDATE RESTRICT ON DELETE
RESTRICT
"ca_cos_fk" FOREIGN KEY (instance, client_id, contact_id) REFERENCES contacts(instance, client_id, contact_id) ON UPDATE RESTRICT ON DELETE RESTRICT
------------------------------------------------------------------------
Peter Wilson - YellowHawk Ltd, http://www.yellowhawk.co.uk
Peter Wilson <petew@yellowhawk.co.uk> writes:
Tom Lane wrote:
Oh? Could you provide a test case for this? I can certainly believe
that the planner might choose a bad plan if it has no statistics, but
it shouldn't take a long time to do it.
On investigation the problems occurs on 'EXPLAIN ANALYZE' - which is
what pgadminIII does when you press the explain button.
Ah. Well, this is an ideal example of why you need statistics ---
without 'em, the planner is more or less flying blind about the number
of matching rows. The original plan had
-> Index Scan using ca_pk on contact_att subb (cost=0.00..6.01 rows=1 width=8) (actual time=0.207..234.423 rows=3 loops=2791)
Index Cond: ((instance = '0'::bpchar) AND ((client_id)::text = 'gadget'::text))
Filter: ((contact_id)::numeric = 3854.000000)
while your "after a vacuum" (I suppose really a vacuum analyze) plan has
-> Index Scan using ca_pk on contact_att subb (cost=0.00..1433.95 rows=78 width=8) (actual time=0.367..259.617 rows=3 loops=1)
Index Cond: ((instance = '0'::bpchar) AND ((client_id)::text = 'gadget'::text))
Filter: ((contact_id)::numeric = 3854.000000)
This is the identical scan plan ... but now that the planner realizes
it's going to be pretty expensive, it arranges the join in a way that
requires only one scan of contact_att and not 2791 of 'em.
The key point here is that the index condition on instance/client_id
is not selective --- it'll pull out a lot of rows. All but 3 of 'em are
then discarded by the contact_id condition, but the damage in terms
of runtime was already done. With stats, the planner can realize this
--- without stats, it has no chance.
Looking at your table definition, I suppose you were expecting the
contact_id condition to be used with the index, but since contact_id is
bigint, comparing it to a numeric-type constant is not considered indexable.
You want to lose the ".000000" in the query.
regards, tom lane
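To make Tom's point concrete, a sketch of the repaired sub-select (same tables as above; the only change is the integer literal, plus an ANALYZE so the planner has statistics):

```sql
-- Collect statistics so the planner can estimate row counts:
ANALYZE contact_att;

-- Compare contact_id against an integer literal rather than
-- 3854.000000; a numeric-type constant keeps the bigint column
-- out of the ca_pk index condition:
SELECT subb.community_id
FROM contact_att subb
WHERE subb.client_id = 'gadget'
  AND subb.instance = '0'
  AND subb.contact_id = 3854;
```

With the cast gone, the contact_id comparison can be part of the index condition instead of a filter applied after the scan.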
=?iso-8859-2?Q?Havasv=F6lgyi_Ott=F3?= <h.otto@freemail.hu> writes:
Thanks for the suggestion. I have just applied both switches, -f (I had
applied this in the previous case too) and -n, but it becomes slow again. At
the beginning it reads about 300 KB a second, and when it has read 1.5 MB,
it reads only about 10 KB a second; it slows down gradually. Maybe others
should also try this scenario. Can I help with anything?
Well, I don't see it happening here. I made up a script consisting of a
whole lot of repetitions of
insert into t1 values(1,2,3);
with one of these inserted every 1000 lines:
\echo 1000 `date`
so I could track the performance. I created a table by hand:
create table t1(f1 int, f2 int, f3 int);
and then started the script with
psql -q -f big.sql testdb
At the beginning I was seeing about two echoes per second. I let it run
for an hour, and I was still seeing about two echoes per second. That's
something close to 170 MB of script file read (over 5.7 million rows
inserted by the time I stopped it).
So, either this test case is too simple to expose your problem, or
there's something platform-specific going on. I don't have a windows
machine to try it on ...
regards, tom lane
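Tom's test harness is easy to reproduce. A sketch (the function name is hypothetical; the file name big.sql and the invocation are from his message) that writes the script with a timing marker every 1000 rows:

```python
def make_test_script(total_rows, marker_every=1000):
    """Build the kind of script Tom describes: repeated single-row
    INSERTs with an \\echo timing marker every `marker_every` lines,
    so slowdowns show up as the markers arriving further apart."""
    lines = []
    for i in range(1, total_rows + 1):
        lines.append("insert into t1 values(1,2,3);")
        if i % marker_every == 0:
            lines.append(f"\\echo {marker_every} `date`")
    return lines

# Usage (after creating the table by hand, as Tom did):
#   create table t1(f1 int, f2 int, f3 int);
# write the lines to big.sql, then run
#   psql -q -f big.sql testdb
# and watch how quickly the echo markers appear.
```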