7.4Beta

Started by Peter Childs over 22 years ago, 99 messages
#1 Peter Childs
Blue.Dragon@blueyonder.co.uk

Just loaded up 7.4Beta on a test system, to try and work out what the major
changes and improvements are. I also started up pg_vacuum, as this is now
working very well on our semi-live system (semi-live because we have not
finished developing the system yet).
Anyway, after I tried to throw our 7.3 config file at it and it crashed badly,
I resorted to the defaults. Some of the settings in the config file have
changed. (Can someone make the config files backward compatible, so that at
least they ignore deprecated settings!)
I threw last night's backup at it. The data went in in about half an hour; then
the constraints went in, and they took an age, about two hours.
Is there any way to speed up the database constraint code? Because, quite
frankly, at the current speed you're probably better off without the
constraints. (Same problem with 7.3, come to think of it.)
Otherwise 7.4 seems fine.
Question: when are replication and rolling backup coming? I want to be able
to take the last backup and a list of all the updates since then, and get back
to the current database quickly.

Peter Childs

#2 Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Peter Childs (#1)
Re: [GENERAL] 7.4Beta

[ Beta discussion moved to hackers.]

Peter Childs wrote:

Just loaded up 7.4Beta on a test system, to try and work out what the major
changes and improvements are. I also started up pg_vacuum, as this is now
working very well on our semi-live system (semi-live because we have not
finished developing the system yet).
Anyway, after I tried to throw our 7.3 config file at it and it crashed badly,
I resorted to the defaults. Some of the settings in the config file have
changed. (Can someone make the config files backward compatible, so that at
least they ignore deprecated settings!)

There will be a list of config name and other changes in a special
section soon.

I threw last night's backup at it. The data went in in about half an hour; then
the constraints went in, and they took an age, about two hours.
Is there any way to speed up the database constraint code? Because, quite
frankly, at the current speed you're probably better off without the
constraints. (Same problem with 7.3, come to think of it.)
Otherwise 7.4 seems fine.
Question: when are replication and rolling backup coming? I want to be able
to take the last backup and a list of all the updates since then, and get back
to the current database quickly.

Rolling backups should be in 7.5, I hope.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
#3 Christopher Kings-Lynne
chriskl@familyhealth.com.au
In reply to: Bruce Momjian (#2)
Re: [GENERAL] 7.4Beta

I threw last night's backup at it. The data went in in about half an hour; then
the constraints went in, and they took an age, about two hours.
Is there any way to speed up the database constraint code? Because, quite
frankly, at the current speed you're probably better off without the
constraints. (Same problem with 7.3, come to think of it.)

I can also attest to the horrendously long time it takes to restore the ADD
FOREIGN KEY section...

Chris

#4 Stephan Szabo
sszabo@megazone.bigpanda.com
In reply to: Christopher Kings-Lynne (#3)
Re: [GENERAL] 7.4Beta

On Fri, 15 Aug 2003, Christopher Kings-Lynne wrote:

I threw last night's backup at it. The data went in in about half an hour; then
the constraints went in, and they took an age, about two hours.
Is there any way to speed up the database constraint code? Because, quite
frankly, at the current speed you're probably better off without the
constraints. (Same problem with 7.3, come to think of it.)

I can also attest to the horrendously long time it takes to restore the ADD
FOREIGN KEY section...

That really needs to be rewritten to do a single check over the table
rather than running the constraint for every row. I keep meaning to get
around to it and never actually do. :( I'm not sure that in practice
you'll get a better plan at restore time; it depends on what the default
statistics give you.

#5 Christopher Kings-Lynne
chriskl@familyhealth.com.au
In reply to: Stephan Szabo (#4)
Re: [GENERAL] 7.4Beta

I can also attest to the horrendously long time it takes to restore the ADD
FOREIGN KEY section...

That really needs to be rewritten to do a single check over the table
rather than running the constraint for every row. I keep meaning to get
around to it and never actually do. :( I'm not sure that in practice
you'll get a better plan at restore time depending on what the default
statistics give you.

Surely in the default case it would reduce to using the new hashed IN()
feature, so it'd be a lot faster?

Chris

#6 Gavin Sherry
swm@linuxworld.com.au
In reply to: Stephan Szabo (#4)
Re: [GENERAL] 7.4Beta

On Thu, 14 Aug 2003, Stephan Szabo wrote:

That really needs to be rewritten to do a single check over the table
rather than running the constraint for every row. I keep meaning to get
around to it and never actually do. :( I'm not sure that in practice
you'll get a better plan at restore time depending on what the default
statistics give you.

Perhaps it would be easier to allow SKIP VALIDATION (or something) with
ALTER TABLE ADD ...., which can set FkConstraint->skip_validation. If we're
just handling pg_dump output, then presumably the data is already
validated. On the other hand, it might encourage users to bypass FKs
when they feel like it...

Thanks,

Gavin

#7 Stephan Szabo
sszabo@megazone.bigpanda.com
In reply to: Christopher Kings-Lynne (#5)
Re: [GENERAL] 7.4Beta

On Fri, 15 Aug 2003, Christopher Kings-Lynne wrote:

I can also attest to the horrendously long time it takes to restore the ADD
FOREIGN KEY section...

That really needs to be rewritten to do a single check over the table
rather than running the constraint for every row. I keep meaning to get
around to it and never actually do. :( I'm not sure that in practice
you'll get a better plan at restore time depending on what the default
statistics give you.

Surely in the default case it would reduce to using the new hashed IN()
feature, so it'd be a lot faster?

If we wrote the query using IN, that'd be the hope (I've not played with it
enough to guarantee that).

However, on a simple test comparing

select * from fk where not exists(select * from pk where pk.key=fk.key)
and key is not null;

(doing a seq scan, with the subplan doing an index scan, which is probably
close to the current system)

and

select * from fk where key in (select key from pk) and key is not null;

on a pk table with 100k rows and an fk table with 1m rows, I get a
difference of about 2x on my machine.

But that's with a single column int4 key, I haven't tried multi-column
keys or larger key types.
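Stephan's NOT EXISTS versus IN comparison can be reproduced outside PostgreSQL. The sketch below uses Python's sqlite3 module as a stand-in (the pk/fk table and key names follow the test above, but the data set and row counts are invented, and sqlite's planner differs from PostgreSQL's, so this only demonstrates that the two checks find the same violating rows, not the 2x timing):

```python
import sqlite3

# Toy stand-in for the pk/fk tables in the test above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE pk (key INTEGER PRIMARY KEY);
    CREATE TABLE fk (key INTEGER);
""")
conn.executemany("INSERT INTO pk VALUES (?)", [(i,) for i in range(100)])
# Keys 0..99 have referenced rows; keys 100..104 violate the would-be FK.
conn.executemany("INSERT INTO fk VALUES (?)", [(i % 105,) for i in range(500)])

# Check in the NOT EXISTS style (close to the per-row trigger logic):
not_exists = conn.execute("""
    SELECT key FROM fk
    WHERE NOT EXISTS (SELECT 1 FROM pk WHERE pk.key = fk.key)
      AND key IS NOT NULL
""").fetchall()

# Same check in the IN style that could use the hashed-IN machinery:
not_in = conn.execute("""
    SELECT key FROM fk
    WHERE key NOT IN (SELECT key FROM pk)
      AND key IS NOT NULL
""").fetchall()

# Both formulations report exactly the same violating rows.
assert sorted(not_exists) == sorted(not_in)
print(len(not_exists), "violating rows")
```

With these invented row counts, both queries report the 20 rows whose keys fall in 100..104.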

#8 Stephan Szabo
sszabo@megazone.bigpanda.com
In reply to: Gavin Sherry (#6)
Re: [GENERAL] 7.4Beta

On Fri, 15 Aug 2003, Gavin Sherry wrote:

On Thu, 14 Aug 2003, Stephan Szabo wrote:

That really needs to be rewritten to do a single check over the table
rather than running the constraint for every row. I keep meaning to get
around to it and never actually do. :( I'm not sure that in practice
you'll get a better plan at restore time depending on what the default
statistics give you.

Perhaps it would be easier to allow SKIP VALIDATION (or something) with
ALTER TABLE ADD ...., which can set FkConstraint->skip_validation. If we're
just handling pg_dump output, then presumably the data is already
validated. On the other hand, it might encourage users to bypass FKs
when they feel like it...

We've talked about stuff like that in the past, but we generally seem to
get stuck on how to specify it. If we add it to ALTER TABLE ADD as an
option, then we're generating statements that are almost like a standard
SQL statement, but not quite, and some people didn't like that. A SET
option that only affected ALTER TABLE ADD CONSTRAINT wouldn't be too bad,
but I'd be worried about making it apply to the normal checks as well
(which I believe was suggested last time this came up). In any case,
making the full checks faster for when you really do care isn't a bad
plan. :)

#9 Christopher Kings-Lynne
chriskl@familyhealth.com.au
In reply to: Stephan Szabo (#8)
Re: [GENERAL] 7.4Beta

We've talked about stuff like that in the past, but we seem to generally
get stuck about how to specify it. If we add it to the alter table add as
an option then we're generating statements that are almost like a standard
sql statement, but not quite, and some people didn't like that. A set
option that only affected ALTER TABLE ADD CONSTRAINT wouldn't be too bad,
but I'd be worried about making it apply to the normal checks as well
(which I believe was suggested last time this came up). In any case,
making the full checks faster for when you really do care isn't a bad
plan. :)

How about having an "I'm restoring" SET var:

SET data_restore = true;

which means (among other things) that FKs aren't checked?

Chris

#10 Christopher Kings-Lynne
chriskl@familyhealth.com.au
In reply to: Stephan Szabo (#8)
Re: [GENERAL] 7.4Beta

We've talked about stuff like that in the past, but we seem to generally
get stuck about how to specify it. If we add it to the alter table add as
an option then we're generating statements that are almost like a standard
sql statement, but not quite, and some people didn't like that. A set
option that only affected ALTER TABLE ADD CONSTRAINT wouldn't be too bad,
but I'd be worried about making it apply to the normal checks as well
(which I believe was suggested last time this came up). In any case,
making the full checks faster for when you really do care isn't a bad
plan. :)

Or, this:

1. Add the FK to the table BEFORE COPYing the data
2. Use the old "update blah set reltriggers = 0" trick
3. Restore the data
4. Undo step 2

Chris

#11 Stephan Szabo
sszabo@megazone.bigpanda.com
In reply to: Christopher Kings-Lynne (#10)
Re: [GENERAL] 7.4Beta

On Fri, 15 Aug 2003, Christopher Kings-Lynne wrote:

We've talked about stuff like that in the past, but we seem to generally
get stuck about how to specify it. If we add it to the alter table add as
an option then we're generating statements that are almost like a standard
sql statement, but not quite, and some people didn't like that. A set
option that only affected ALTER TABLE ADD CONSTRAINT wouldn't be too bad,
but I'd be worried about making it apply to the normal checks as well
(which I believe was suggested last time this came up). In any case,
making the full checks faster for when you really do care isn't a bad
plan. :)

Or, this:

1. Add the FK to the table BEFORE COPYing data
2. Use the old update blah set reltriggers = 0 trick
3. restore the data
4. Undo step 2

The problem with that is that I think the reltriggers=0 trick only works
if you're a superuser; I thought that's why the trigger disabling became
optional. A SET that affected only ALTER TABLE ADD CONSTRAINT would
probably be reasonable in the dump (and presumably easy to do). It'd also
carry over to future cases where we separate out some check constraints
(for example, ones that refer directly or indirectly to the same table in
a subselect).

#12 Christopher Kings-Lynne
chriskl@familyhealth.com.au
In reply to: Stephan Szabo (#11)
Re: [GENERAL] 7.4Beta

1. Add the FK to the table BEFORE COPYing data
2. Use the old update blah set reltriggers = 0 trick
3. restore the data
4. Undo step 2

The problem with that is that I think the reltriggers=0 trick only works
if you're superuser, I thought that's why the trigger disabling became
optional. A set that affected only atac would probably be reasonable in
the dump (and presumably easy to do). It'd also carry over to future
cases where we separate some check constraints (for examples ones
that refer directly or indirectly to the same table in a subselect).

Well yes, this would be a superuser-only ability. Are you worried about
people restoring dumps as non-superuser?

Chris

#13 Stephan Szabo
sszabo@megazone.bigpanda.com
In reply to: Christopher Kings-Lynne (#12)
Re: [GENERAL] 7.4Beta

On Fri, 15 Aug 2003, Christopher Kings-Lynne wrote:

1. Add the FK to the table BEFORE COPYing data
2. Use the old update blah set reltriggers = 0 trick
3. restore the data
4. Undo step 2

The problem with that is that I think the reltriggers=0 trick only works
if you're superuser, I thought that's why the trigger disabling became
optional. A set that affected only atac would probably be reasonable in
the dump (and presumably easy to do). It'd also carry over to future
cases where we separate some check constraints (for examples ones
that refer directly or indirectly to the same table in a subselect).

Well yes, this would be a super-user only ability. Are you worried about
people restoring dumps as non-superuser?

Basically, yes (you might dump only some tables that you own, for example).
I think that's why the data-only dumps no longer do the reltriggers thing
by default and you need an option to get at it. We could make a similar
option for this case, but it'd only work when restored by a superuser.

#14 Andreas Pflug
pgadmin@pse-consulting.de
In reply to: Stephan Szabo (#4)
Re: [GENERAL] 7.4Beta

Stephan Szabo wrote:

On Fri, 15 Aug 2003, Christopher Kings-Lynne wrote:

I threw last night's backup at it. The data went in in about half an hour; then
the constraints went in, and they took an age, about two hours.
Is there any way to speed up the database constraint code? Because, quite
frankly, at the current speed you're probably better off without the
constraints. (Same problem with 7.3, come to think of it.)

I can also attest to the horrendously long time it takes to restore the ADD
FOREIGN KEY section...

That really needs to be rewritten to do a single check over the table
rather than running the constraint for every row. I keep meaning to get
around to it and never actually do. :( I'm not sure that in practice
you'll get a better plan at restore time depending on what the default
statistics give you.

This is clearly a case for a statement-level trigger, as soon as the
affected rows can be identified.
One remark on that enable/disable triggers stuff: from a user's
perspective, I wouldn't consider a constraint trigger to be a trigger, so
if I disabled all triggers on a table, I would still expect all
constraints to be checked.

Regards,
Andreas

#15 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Stephan Szabo (#4)
ADD FOREIGN KEY (was Re: [GENERAL] 7.4Beta)

Stephan Szabo <sszabo@megazone.bigpanda.com> writes:

On Fri, 15 Aug 2003, Christopher Kings-Lynne wrote:

I can also attest to the horrendously long time it takes to restore the ADD
FOREIGN KEY section...

That really needs to be rewritten to do a single check over the table
rather than running the constraint for every row. I keep meaning to get
around to it and never actually do. :( I'm not sure that in practice
you'll get a better plan at restore time depending on what the default
statistics give you.

In simple cases I think that the creation of indexes would be enough to
get you a passable plan --- CREATE INDEX does update pg_class.reltuples,
so the planner will know how big the tables are, and for single-column
primary keys the existence of a unique index is enough to cue the
planner that the column is unique, even without any ANALYZE stats.
Those are the biggest levers on the plan choice.

This assumes that pg_dump always dumps CREATE INDEX before ADD FOREIGN
KEY; I'm not certain if there's anything to enforce that at the
moment...

I assume what you have in mind is to replace
validateForeignKeyConstraint() with something that does a join of the
two tables via an SPI command. But supposing that we want to keep the
present ability to report (one of) the failing key values, it seems
like the query has to look like

SELECT keycolumns FROM referencing_table
WHERE keycolumns NOT IN (SELECT refcols FROM referenced_table);

which is only gonna do the right thing for one of the MATCH styles
(not sure which, offhand ... actually it may not do the right thing
for any match style if there are nulls in referenced_table ...).
How would you make it work for all the MATCH styles? And will it
really be all that efficient? (NOT IN is a lot more circumscribed
than IN.)

regards, tom lane
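The NULL hazard Tom points out is plain SQL three-valued logic, so it can be demonstrated in any engine. A minimal sketch using Python's sqlite3 module (table names borrowed from the query above; the data is invented, and this is only an illustration of the NOT IN semantics, not the proposed validateForeignKeyConstraint() replacement):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE referenced_table (refcol INTEGER);
    CREATE TABLE referencing_table (keycol INTEGER);
""")
conn.executemany("INSERT INTO referenced_table VALUES (?)", [(1,), (2,)])
conn.executemany("INSERT INTO referencing_table VALUES (?)", [(1,), (99,)])

query = """
    SELECT keycol FROM referencing_table
    WHERE keycol NOT IN (SELECT refcol FROM referenced_table)
"""
# With no NULLs in referenced_table, the violating row is reported:
assert conn.execute(query).fetchall() == [(99,)]

# A single NULL in referenced_table makes NOT IN report no rows at all,
# because 99 NOT IN (1, 2, NULL) evaluates to UNKNOWN, not TRUE:
conn.execute("INSERT INTO referenced_table VALUES (NULL)")
assert conn.execute(query).fetchall() == []
```

So the naive NOT IN form silently stops detecting violations as soon as the referenced column contains a NULL, which is exactly the failure mode raised above.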

#16 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Stephan Szabo (#7)
Re: [GENERAL] 7.4Beta

Stephan Szabo <sszabo@megazone.bigpanda.com> writes:

select * from fk where not exists(select * from pk where pk.key=fk.key)
and key is not null;
(doing seq scan/subplan doing index scan - which is probably close to the
current system)

Actually, even that would probably be noticeably better than the current
system. I haven't profiled it (someone should) but I suspect that
executor startup/shutdown time is a huge hit. Even though the trigger
is caching a plan, it has to instantiate that plan for each referencing
tuple --- and the executor is not designed for quick startup/shutdown.

(Of course, this would become less relevant if the triggers got
rewritten to not go through SPI ...)

regards, tom lane

#17 Stephan Szabo
sszabo@megazone.bigpanda.com
In reply to: Tom Lane (#16)
Re: [GENERAL] 7.4Beta

On Fri, 15 Aug 2003, Tom Lane wrote:

Stephan Szabo <sszabo@megazone.bigpanda.com> writes:

select * from fk where not exists(select * from pk where pk.key=fk.key)
and key is not null;
(doing seq scan/subplan doing index scan - which is probably close to the
current system)

Actually, even that would probably be noticeably better than the current
system. I haven't profiled it (someone should) but I suspect that
executor startup/shutdown time is a huge hit. Even though the trigger
is caching a plan, it has to instantiate that plan for each referencing
tuple --- and the executor is not designed for quick startup/shutdown.

Yeah, but it was pretty much the best I could do testing on the command
line. And it was still a fair bit more expensive than using IN (my tests
on various key types showed IN anywhere from 15% to 300% faster than
EXISTS for this).

#18 Stephan Szabo
sszabo@megazone.bigpanda.com
In reply to: Andreas Pflug (#14)
Re: [GENERAL] 7.4Beta

On Fri, 15 Aug 2003, Andreas Pflug wrote:

Stephan Szabo wrote:

That really needs to be rewritten to do a single check over the table
rather than running the constraint for every row. I keep meaning to get
around to it and never actually do. :( I'm not sure that in practice
you'll get a better plan at restore time depending on what the default
statistics give you.

This is clearly a case for a statement level trigger, as soon as
affected rows can be identified.

Well, I think single inserts might be more expensive using a
statement-level trigger (because the query is more involved for the
table-joining case), so we'd probably want to profile the cases.

#19 Stephan Szabo
sszabo@megazone.bigpanda.com
In reply to: Tom Lane (#15)
Re: ADD FOREIGN KEY (was Re: [GENERAL] 7.4Beta)

On Fri, 15 Aug 2003, Tom Lane wrote:

Stephan Szabo <sszabo@megazone.bigpanda.com> writes:

On Fri, 15 Aug 2003, Christopher Kings-Lynne wrote:

I can also attest to the horrendously long time it takes to restore the ADD
FOREIGN KEY section...

That really needs to be rewritten to do a single check over the table
rather than running the constraint for every row. I keep meaning to get
around to it and never actually do. :( I'm not sure that in practice
you'll get a better plan at restore time depending on what the default
statistics give you.

In simple cases I think that the creation of indexes would be enough to
get you a passable plan --- CREATE INDEX does update pg_class.reltuples,
so the planner will know how big the tables are, and for single-column
primary keys the existence of a unique index is enough to cue the
planner that the column is unique, even without any ANALYZE stats.
Those are the biggest levers on the plan choice.

This assumes that pg_dump always dumps CREATE INDEX before ADD FOREIGN
KEY; I'm not certain if there's anything to enforce that at the
moment...

I assume what you have in mind is to replace
validateForeignKeyConstraint() with something that does a join of the
two tables via an SPI command. But supposing that we want to keep the
present ability to report (one of) the failing key values, it seems
like the query has to look like
SELECT keycolumns FROM referencing_table WHERE
keycolumns NOT IN (SELECT refcols FROM referenced_table);
which is only gonna do the right thing for one of the MATCH styles
(not sure which, offhand ... actually it may not do the right thing
for any match style if there are nulls in referenced_table ...).

Yes, in practice you'd have to put IS NOT NULL checks in the subselect,
which is fine for the two match types we support, since a referenced row
with a NULL isn't a candidate referenced row for those. I think MATCH
PARTIAL might have to fall back to the repeated check unless we can make
the query work, which would be harder: you only want to compare the
columns for a particular row where the key column is not null, and I
can't think of a query for that that'd be particularly clean and likely
to be fast. Then again, I don't think the constraint would be either. :(

It'd probably be:

MATCH unspecified:
SELECT keycolumns FROM referencing_table
WHERE (keycolumns) NOT IN (SELECT refcols FROM referenced_table
                           WHERE refcol1 IS NOT NULL AND ...)
  AND keycolumn1 IS NOT NULL AND ...;

MATCH FULL (something like this; I haven't tried it):
SELECT keycolumns FROM referencing_table
WHERE ((keycolumns) NOT IN (SELECT refcols FROM referenced_table
                            WHERE refcol1 IS NOT NULL AND ...)
       AND (keycolumn1 IS NOT NULL AND ...))
   OR ((keycolumn1 IS NOT NULL) != (keycolumn2 IS NOT NULL) OR ...);

How would you make it work for all the MATCH styles? And will it
really be all that efficient? (NOT IN is a lot more circumscribed
than IN.)

I'm not really sure yet. Limited tests seem to show that it'll probably
be as fast, if not faster, for all reasonable cases, but I'd want to
generate a much larger random data set and actually put it in to make a
fair comparison (maybe temporarily with a SET to allow people to try both
cases on real-world data). One other advantage here is that we don't need
to take row locks while checking this if we've already taken the exclusive
table locks on both tables involved. I'm not sure if we do that currently,
though.
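The MATCH unspecified query above can be checked against a toy data set. A sketch using Python's sqlite3 module, with invented single-column tables standing in for the elided column lists (so the "AND ..." parts collapse to one condition):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE referenced_table (refcol1 INTEGER);
    CREATE TABLE referencing_table (keycolumn1 INTEGER);
""")
# Referenced side has one real key and one NULL row.
conn.executemany("INSERT INTO referenced_table VALUES (?)", [(1,), (None,)])
# Referencing side: a valid reference, a NULL key (exempt under MATCH
# unspecified), and a dangling reference.
conn.executemany("INSERT INTO referencing_table VALUES (?)",
                 [(1,), (None,), (42,)])

# MATCH unspecified check: NULL referencing keys are exempt, and NULL
# referenced rows are filtered out so NOT IN keeps its usual semantics.
violations = conn.execute("""
    SELECT keycolumn1 FROM referencing_table
    WHERE keycolumn1 NOT IN (SELECT refcol1 FROM referenced_table
                             WHERE refcol1 IS NOT NULL)
      AND keycolumn1 IS NOT NULL
""").fetchall()
assert violations == [(42,)]
```

Only the dangling key 42 is reported: the NULL referencing row is skipped by the outer IS NOT NULL, and the NULL referenced row no longer poisons the NOT IN.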

#20 Andreas Pflug
pgadmin@pse-consulting.de
In reply to: Stephan Szabo (#18)
Re: [GENERAL] 7.4Beta

Stephan Szabo wrote:

On Fri, 15 Aug 2003, Andreas Pflug wrote:

Stephan Szabo wrote:

That really needs to be rewritten to do a single check over the table
rather than running the constraint for every row. I keep meaning to get
around to it and never actually do. :( I'm not sure that in practice
you'll get a better plan at restore time depending on what the default
statistics give you.

This is clearly a case for a statement level trigger, as soon as
affected rows can be identified.

Well, I think single inserts might be more expensive (because the query is
more involved for the table joining case) using a statement level trigger,
so we'd probably want to profile the cases.

This really depends. If a constraint is just a check on the
inserted/updated column, so that no other row needs to be checked, there's
no faster way than the current row trigger. But FK constraints need to
execute a query to retrieve the referenced row, and every RDBMS prefers
to execute a single statement over many rows rather than many statements
over a single row, because the former profits from optimization. And even
if only a single row is inserted or updated, there's still the need to
look up the reference.

Regards,
Andreas

#21 Stephan Szabo
sszabo@megazone.bigpanda.com
In reply to: Andreas Pflug (#20)
Re: [GENERAL] 7.4Beta

On Fri, 15 Aug 2003, Andreas Pflug wrote:

Stephan Szabo wrote:

On Fri, 15 Aug 2003, Andreas Pflug wrote:

Stephan Szabo wrote:

That really needs to be rewritten to do a single check over the table
rather than running the constraint for every row. I keep meaning to get
around to it and never actually do. :( I'm not sure that in practice
you'll get a better plan at restore time depending on what the default
statistics give you.

This is clearly a case for a statement level trigger, as soon as
affected rows can be identified.

Well, I think single inserts might be more expensive (because the query is
more involved for the table joining case) using a statement level trigger,
so we'd probably want to profile the cases.

This really depends. If a constraint is just a check on the
inserted/updated column, so that no other row needs to be checked, there's
no faster way than the current row trigger. But FK constraints need to
execute a query to retrieve the referenced row, and every RDBMS prefers
to execute a single statement over many rows rather than many statements
over a single row, because the former profits from optimization. And even
if only a single row is inserted or updated, there's still the need to
look up the reference.

I don't think that addresses the issue I brought up. If you're doing a
bunch of single inserts:
begin;
insert into foo values (1);
insert into foo values (1);
insert into foo values (1);
insert into foo values (1);
insert into foo values (1);
end;

Each of those statement triggers is still only going to be dealing with a
single row. If you're in immediate mode, there's not much you can do about
that, since the constraint is checked between inserts. If you're in
deferred mode, right now it won't help, because it's not going to batch
them; it's going to be 5 statement triggers, AFAICT, each with its own
1-row affected table.

I believe that the more complicated approach of joining the old/new table
with the pk table and doing the constraint check is going to be slightly
slower than the current row behavior for such cases, because the trigger
query is going to be more complicated. What would be nice would be some
way to choose between a single query per statement and a simpler query
per row, based on what's happening.
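The per-row versus per-statement trade-off can be sketched outside the trigger machinery. The toy benchmark below (Python's sqlite3, invented table names and row counts) contrasts one reference lookup per row with one set-oriented query for the whole batch; it ignores PostgreSQL's executor startup costs, so it only illustrates the shape of the comparison, not real trigger timings:

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE pk (key INTEGER PRIMARY KEY);
    CREATE TABLE staged (key INTEGER);  -- stand-in for a statement's new rows
""")
conn.executemany("INSERT INTO pk VALUES (?)", [(i,) for i in range(10000)])
new_rows = [(i % 10000,) for i in range(5000)]
conn.executemany("INSERT INTO staged VALUES (?)", new_rows)

# Per-row style: one lookup per inserted row (the current trigger behavior).
t0 = time.perf_counter()
for (k,) in new_rows:
    conn.execute("SELECT 1 FROM pk WHERE key = ?", (k,)).fetchone()
per_row = time.perf_counter() - t0

# Per-statement style: one set-oriented query covering the whole batch.
t0 = time.perf_counter()
missing = conn.execute("""
    SELECT key FROM staged
    WHERE key NOT IN (SELECT key FROM pk) AND key IS NOT NULL
""").fetchall()
per_stmt = time.perf_counter() - t0

assert missing == []  # every staged key has a referenced row
print(f"per-row {per_row:.4f}s vs per-statement {per_stmt:.4f}s")
```

For a 5-row statement the two styles would be close, which is the case being debated; the set-oriented form pulls ahead as the batch grows.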

#22 Jan Wieck
JanWieck@Yahoo.com
In reply to: Tom Lane (#16)
Re: [GENERAL] 7.4Beta

Tom Lane wrote:

Stephan Szabo <sszabo@megazone.bigpanda.com> writes:

select * from fk where not exists(select * from pk where pk.key=fk.key)
and key is not null;
(doing seq scan/subplan doing index scan - which is probably close to the
current system)

Actually, even that would probably be noticeably better than the current
system. I haven't profiled it (someone should) but I suspect that
executor startup/shutdown time is a huge hit. Even though the trigger
is caching a plan, it has to instantiate that plan for each referencing
tuple --- and the executor is not designed for quick startup/shutdown.

(Of course, this would become less relevant if the triggers got
rewritten to not go through SPI ...)

One of the reasons why we used SPI to generate the plans was ease of
use. I'm not 100% sure, but I think the standard doesn't require the
referencing and referenced column(s) to be identical, only compatible.
So, for example, a text-type foreign key can reference a varchar(), and an
int4 can reference an int8. Not using SPI for that lookup therefore does
not reduce to a simple index or seq scan (depending on index availability
on the foreign key attributes).

Even if the standard does require it, we did not for a couple of releases,
and breaking that backward compatibility is IMHO not an option.

I'm thinking instead of a way to "cache" entire executors for this. Each
SPI plan used during a transaction would need its own executor, and I
don't know offhand what type and how many resources an executor requires
(I think it's only some memory that gets initialized and the VFDs
opened). If I also remember correctly, the executor holds the pointer to
the parameters in the execstate, and the actual values stay just in the
caller-provided array. All that can change for a given plan between
SPI_execp() calls is this parameter array and the maxtuple arg. If an
executor is comparably cheap resource-wise, SPI_execp() should be able
to just manipulate the parameter array in the execstate (with
propagation into the scankeys, I fear) and then let it do a rescan. At
transaction commit time we'd need to close all executors, like we do
with cursors.

Does that all make any sense to you?

Jan

--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#================================================== JanWieck@Yahoo.com #

#23 Andreas Pflug
pgadmin@pse-consulting.de
In reply to: Stephan Szabo (#21)
Re: [GENERAL] 7.4Beta

Stephan Szabo wrote:

On Fri, 15 Aug 2003, Andreas Pflug wrote:

Stephan Szabo wrote:

On Fri, 15 Aug 2003, Andreas Pflug wrote:

Stephan Szabo wrote:

That really needs to be rewritten to do a single check over the table
rather than running the constraint for every row. I keep meaning to get
around to it and never actually do. :( I'm not sure that in practice
you'll get a better plan at restore time depending on what the default
statistics give you.

This is clearly a case for a statement level trigger, as soon as
affected rows can be identified.

Well, I think single inserts might be more expensive (because the query is
more involved for the table joining case) using a statement level trigger,
so we'd probably want to profile the cases.

This really depends. If a constraint is just a check on the
inserted/updated column, so that no other row needs to be checked, there's
no faster way than the current row trigger. But FK constraints need to
execute a query to retrieve the referenced row, and every RDBMS prefers
to execute a single statement over many rows rather than many statements
over a single row, because the former profits from optimization. And even
if only a single row is inserted or updated, there's still the need to
look up the reference.

I don't think that addresses the issue I brought up. If you're doing a
bunch of single inserts:
begin;
insert into foo values (1);
insert into foo values (1);
insert into foo values (1);
insert into foo values (1);
insert into foo values (1);
end;

Each of those statement triggers is still only going to be dealing with a
single row. If you're in immediate mode there's not much you can do about
that since the constraint is checked between inserts. If you're in
deferred mode, right now it won't help because it's not going to batch
them, it's going to be 5 statement triggers AFAICT each with its own 1 row
affected table.

I believe that the more complicated join the old/new table with the pk
table and do the constraint check is going to be slightly slower than the
current row behavior for such cases because the trigger query is going to
be more complicated. What would be nice would be some way to choose
whether to use a single query per statement vs a simpler query per row
based on what's happening.

Deferring the constraint check would mean checking 5 single rows, right.
But I still can't see why you think that a row-level trigger would be
cheaper in this case. I had a look at ri_triggers.c, and what's coded
there looks just as I expected: it does a query on the referenced table.
The queries might look a bit different when checking multiple rows at
once, but carefully designed, I doubt there would be a performance hit
from this. In case it *is* significantly slower, single-row updates could
be handled separately using the current triggers, with statement triggers
for multiple rows. This would cover both scenarios best. At the moment,
update/insert doesn't scale too well.

Best thing in the situation above would certainly be if all 5 rows were
checked in a single query, but that looks quite impossible, because a
mixture of inserts/updates/deletes on different tables might be deferred.

Regards,
Andreas

#24Stephan Szabo
sszabo@megazone.bigpanda.com
In reply to: Andreas Pflug (#23)
Re: [GENERAL] 7.4Beta

On Fri, 15 Aug 2003, Andreas Pflug wrote:

Stephan Szabo wrote:

On Fri, 15 Aug 2003, Andreas Pflug wrote:

Stephan Szabo wrote:

Well, I think single inserts might be more expensive (because the query is
more involved for the table joining case) using a statement level trigger,
so we'd probably want to profile the cases.

This really depends. If a constraint is just a check on the
inserted/updated column, so no other row needs to be checked, there's no
faster way than the current row trigger. But FK constraints need to
execute a query to retrieve the referenced row, and every RDBMS prefers
to execute a single statement with many rows over many statements with a
single row, because the first will profit from optimization. And even if
only a single row is inserted or updated, there's still the need to
lookup the reference.

I don't think that addresses the issue I brought up. If you're doing a
bunch of single inserts:
begin;
insert into foo values (1);
insert into foo values (1);
insert into foo values (1);
insert into foo values (1);
insert into foo values (1);
end;

Each of those statement triggers is still only going to be dealing with a
single row. If you're in immediate mode there's not much you can do about
that since the constraint is checked between inserts. If you're in
deferred mode, right now it won't help because it's not going to batch
them, it's going to be 5 statement triggers AFAICT each with its own 1 row
affected table.

I believe that the more complicated approach (join the old/new table with
the pk table and do the constraint check) is going to be slightly slower
than the current row behavior for such cases, because the trigger query is
going to be more complicated. What would be nice would be some way to
choose whether to use a single query per statement vs a simpler query per
row based on what's happening.

Deferring the constraint check would mean checking 5 single rows, right.
But I still can't see why you think that a row level trigger would be
cheaper in this case. I had a look at ri_triggers.c and what's coded
there looks just as I expected, doing a query on the referenced table.
The queries might look a bit different when checking multiple rows at
once, but carefully designed I doubt that there would be a performance
hit from this. In case it *is* significantly slower, single row updates

I don't know if there will be or not, but in one case it's a single table
select with constant values, in the other it's probably some kind of scan
and subselect. I'm just not going to rule out the possibility, so we
should profile it in large transactions with say 100k single inserts and
see.

could be handled separately using the current triggers, and statement
triggers for multiple rows. This would cover both scenarios best. At the

Yep. I'd wish that it could do it without actually needing to queue up
both triggers, but I don't know if that'd be possible without tying
some knowledge of the fk functions deeper down.

Best thing in the situation above would certainly be if all 5 rows would
be checked in a single query, but that looks quite impossible because a
mixture of inserts/updates/deletes on different tables might be deferred.

Yeah, the 5 above are pretty easy to show that it's safe, but other cases
and referential action cases won't necessarily be so easy.

#25Andreas Pflug
pgadmin@pse-consulting.de
In reply to: Stephan Szabo (#24)
Re: [GENERAL] 7.4Beta

Stephan Szabo wrote:

I don't know if there will be or not, but in one case it's a single table
select with constant values, in the other it's probably some kind of scan
and subselect. I'm just not going to rule out the possibility, so we
should profile it in large transactions with say 100k single inserts and
see.

You're talking about bulk operations, which should be handled carefully
as well. Usually loading all data into a temporary table and doing an
INSERT INTO xxx SELECT FROM tmptable gives better performance where
indexes and constraints are concerned. PostgreSQL shouldn't be expected
to accept the most abusive ways of operation, but it should offer a
reasonable set of tools enabling the job in a convenient way. The best
situation is when many small random transactions perform well, for
TPC-like loads, as well as bulk operations. Nobody should expect that a
database will smoothly convert a bunch of single transactions into an
optimized bulk one. That's the job of a programmer.
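
As a sketch (all table, column, and file names invented for the example), the temp-table route would be something like:

```sql
-- Load constraint-free, then insert only the rows that satisfy the FK.
CREATE TEMP TABLE tmp_orders (id int, customer_id int, amount numeric);
COPY tmp_orders FROM '/tmp/orders.dat';
INSERT INTO orders (id, customer_id, amount)
SELECT t.id, t.customer_id, t.amount
FROM tmp_orders t
JOIN customers c ON c.id = t.customer_id;  -- one set-based FK check
```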

Yeah, the 5 above are pretty easy to show that it's safe, but other cases
and referential action cases won't necessarily be so easy.

So it's the programmer's responsibility to offer mass data to the
backend, not separate inserts that by chance might be handled in a
similar way. An RDBMS is not clairvoyant.

Regards,
Andreas

#26Dann Corbit
DCorbit@connx.com
In reply to: Andreas Pflug (#25)
Re: [GENERAL] 7.4Beta

-----Original Message-----
From: Andreas Pflug [mailto:pgadmin@pse-consulting.de]
Sent: Friday, August 15, 2003 10:49 AM
To: Stephan Szabo
Cc: PostgreSQL-development
Subject: Re: [HACKERS] [GENERAL] 7.4Beta

Stephan Szabo wrote:

I don't know if there will be or not, but in one case it's a single
table select with constant values, in the other it's probably some kind
of scan and subselect. I'm just not going to rule out the possibility,
so we should profile it in large transactions with say 100k single
inserts and see.

You're talking about bulk operations, which should be handled carefully
as well. Usually loading all data into a temporary table and doing an
INSERT INTO xxx SELECT FROM tmptable gives better performance where
indexes and constraints are concerned. PostgreSQL shouldn't be expected
to accept the most abusive ways of operation, but it should offer a
reasonable set of tools enabling the job in a convenient way. The best
situation is when many small random transactions perform well, for
TPC-like loads, as well as bulk operations. Nobody should expect that a
database will smoothly convert a bunch of single transactions into an
optimized bulk one. That's the job of a programmer.

Yeah, the 5 above are pretty easy to show that it's safe, but other
cases and referential action cases won't necessarily be so easy.

So it's the programmer's responsibility to offer mass data to the
backend, not separate inserts that by chance might be handled in a
similar way. An RDBMS is not clairvoyant.

Simplification of bulk operations can be very important for customers
(on the other hand). For the CONNX tool set, we offer an escape on
INSERT/SELECT that performs the operation in bulk mode.

There are serious downsides to bulk operations also (such as not being
logged and therefore existing outside of a transaction). Therefore,
they are useful really only in cases where a complete destruction and
repopulation is called for. If anything goes haywire, you can't simply
roll it back.

Yet the speed savings can be enormous (orders of magnitude).

Compared to iteration over a set of prepared inserts, a bulk insert
(such as using Microsoft's BCP API or Oracle's Direct Path loading) can
be 100 times faster. If you are moving gigabytes of data and performing
a complete refresh, the method to use becomes obvious.

When we go outside of the language bounds, a curly braced escape
notation is used. For instance, an insert/select might look like this:
INSERT INTO <destination_table> SELECT <column_list> FROM <source_table>
{fn commitcount 1000} {bulkmode}
The commit count says to use batches of 1000 rows and bulkmode says to
use the fastest possible insert method.

Imagine (for instance) that PostgreSQL can use an ODBC {or similar} data
source as a table. Then, with a syntax such as the above (or SELECT
INTO etc.), you could very rapidly move data from one system into
another.

#27Andreas Pflug
pgadmin@pse-consulting.de
In reply to: Dann Corbit (#26)
Re: [GENERAL] 7.4Beta

Dann Corbit wrote:

Simplification of bulk operations can be very important for customers
(on the other hand). For the CONNX tool set, we offer an escape on
INSERT/SELECT that performs the operation in bulk mode.

There are serious downsides to bulk operations also (such as not being
logged and therefore existing outside of a transaction). Therefore,
they are useful really only in cases where a complete destruction and
repopulation is called for. If anything goes haywire, you can't simply
roll it back.

Yet the speed savings can be enormous (orders of magnitude).

Compared to iteration over a set of prepared inserts, a bulk insert
(such as using Microsoft's BCP API or Oracle's Direct Path loading) can
be 100 times faster. If you are moving gigabytes of data and performing
a complete refresh, the method to use becomes obvious.

When we go outside of the language bounds, a curly braced escape
notation is used. For instance, an insert/select might look like this:
INSERT INTO <destination_table> SELECT <column_list> FROM <source_table>
{fn commitcount 1000} {bulkmode}
The commit count says to use batches of 1000 rows and bulkmode says to
use the fastest possible insert method.

Imagine (for instance) that PostgreSQL can use an ODBC {or similar} data
source as a table. Then, with a syntax such as the above (or SELECT
INTO etc.), you could very rapidly move data from one system into
another.

When saying "bulk operation" I don't necessarily mean using bulk load or
stuff like that. What I mean is handling large amounts of similar data
at the same time. That doesn't say anything about transactions or
logging problems.
Imagine you have 100k rows to load, each having FKs to (hopefully)
existing rows:
- Step 1: load the rows into a temp table
- Step 2: identify duplicates, logging and deleting them
- Step 3: insert all data satisfying the FK constraints.
- Step 4: log all that didn't insert.

These are relatively few statements (not the simplest), which can be
handled in a safe manner.
Just an example of how an RDBMS can do the job in a mass-oriented (and
thus optimizable) way.
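
A sketch of those four steps (table names, the error log, and the file path are all invented for the example):

```sql
-- Step 1: constraint-free load into a temp table.
CREATE TEMP TABLE tmp_items (id int, category_id int);
COPY tmp_items FROM '/tmp/items.dat';

-- Step 2: log and delete rows whose key already exists in the target.
INSERT INTO load_errors (id, reason)
SELECT t.id, 'duplicate' FROM tmp_items t, items i WHERE i.id = t.id;
DELETE FROM tmp_items
WHERE EXISTS (SELECT 1 FROM items i WHERE i.id = tmp_items.id);

-- Step 3: insert everything whose FK target exists.
INSERT INTO items (id, category_id)
SELECT t.id, t.category_id
FROM tmp_items t JOIN categories c ON c.id = t.category_id;

-- Step 4: log the rows that had no FK target.
INSERT INTO load_errors (id, reason)
SELECT t.id, 'missing fk'
FROM tmp_items t LEFT JOIN categories c ON c.id = t.category_id
WHERE c.id IS NULL;
```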

Regards,
Andreas

#28Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jan Wieck (#22)
Re: [GENERAL] 7.4Beta

Jan Wieck <JanWieck@Yahoo.com> writes:

I'm thinking instead of a way to "cache" entire executors for this. Each
SPI plan used during a transaction would need its own executor, and I
don't know offhand what type and how much resources an executor requires
(I think it's only some memory that gets initialized and the VFDs
opened).

Hmm. This is probably more feasible now than it would have been a year
ago, because I did some cleanup work to ensure that executor state is
localized into a specific memory context. I'm not certain about the
amount of overhead either, but it's surely worth a try.

regards, tom lane

#29Dann Corbit
DCorbit@connx.com
In reply to: Tom Lane (#28)
Re: [GENERAL] 7.4Beta

-----Original Message-----
From: Andreas Pflug [mailto:pgadmin@pse-consulting.de]
Sent: Friday, August 15, 2003 11:36 AM
To: Dann Corbit
Cc: Stephan Szabo; PostgreSQL-development
Subject: Re: [HACKERS] [GENERAL] 7.4Beta

When saying "bulk operation" I don't necessarily mean using bulk load or
stuff like that. What I mean is handling large amounts of similar data
at the same time. That doesn't say anything about transactions or
logging problems.
Imagine you have 100k rows to load, each having FKs to (hopefully)
existing rows:
- Step 1: load the rows into a temp table
- Step 2: identify duplicates, logging and deleting them
- Step 3: insert all data satisfying the FK constraints.
- Step 4: log all that didn't insert.

These are relatively few statements (not the simplest), which can be
handled in a safe manner.
Just an example of how an RDBMS can do the job in a mass-oriented (and
thus optimizable) way.

I really, really like SQL*Server's solution to the problem you have
stated above. A typical example is you have a pile of incoming words
and you want to create a dictionary. An example might be a library
database, where you have an abstract for each item. You parse the words
and look for new ones to add.

What SQL*Server allows is an index with an option called "IGNORE
DUPLICATES" that simply tosses out rows that are already in the table.
For applications like what I have described and what you have described
it is an incredibly useful extension. Once I got used to it, I found
myself using it all the time. Of course, you must be very careful to
ensure that the duplicates really are completely unimportant.
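
Lacking such an index option, much the same effect can be had in the insert statement itself; a sketch with invented names (the DISTINCT also handles duplicates within the incoming set, which the index option would catch as well):

```sql
-- Insert only words not already in the dictionary, silently skipping
-- duplicates. Table and column names are illustrative.
INSERT INTO dictionary (word)
SELECT DISTINCT w.word
FROM incoming_words w
WHERE NOT EXISTS (SELECT 1 FROM dictionary d WHERE d.word = w.word);
```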

#30Andreas Pflug
pgadmin@pse-consulting.de
In reply to: Dann Corbit (#29)
Re: [GENERAL] 7.4Beta

Dann Corbit wrote:

I really, really like SQL*Server's solution to the problem you have
stated above. A typical example is you have a pile of incoming words
and you want to create a dictionary. An example might be a library
database, where you have an abstract for each item. You parse the words
and look for new ones to add.

What SQL*Server allows is an index with an option called "IGNORE
DUPLICATES" that simply tosses out rows that are already in the table.
For applications like what I have described and what you have described
it is an incredibly useful extension. Once I got used to it, I found
myself using it all the time. Of course, you must be very careful to
ensure that the duplicates really are completely unimportant.

Hm, that option is quite a quick hack. I like logging in the database much
better than some sql messages. And it's only about duplicates, not about
fk constraint violations.

Regards,
Andreas

#31Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Tom Lane (#28)
Re: [GENERAL] 7.4Beta

Is there a TODO here?

---------------------------------------------------------------------------

Tom Lane wrote:

Jan Wieck <JanWieck@Yahoo.com> writes:

I'm thinking instead of a way to "cache" entire executors for this. Each
SPI plan used during a transaction would need its own executor, and I
don't know offhand what type and how much resources an executor requires
(I think it's only some memory that gets initialized and the VFDs
opened).

Hmm. This is probably more feasible now than it would have been a year
ago, because I did some cleanup work to ensure that executor state is
localized into a specific memory context. I'm not certain about the
amount of overhead either, but it's surely worth a try.

regards, tom lane

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
#32Jan Wieck
JanWieck@Yahoo.com
In reply to: Bruce Momjian (#31)
Re: [GENERAL] 7.4Beta

Bruce Momjian wrote:

Is there a TODO here?

Maybe!? It's one of those premature things no one can tell yet. So the
TODO would be "investigation" for now.

Jan

---------------------------------------------------------------------------

Tom Lane wrote:

Jan Wieck <JanWieck@Yahoo.com> writes:

I'm thinking instead of a way to "cache" entire executors for this. Each
SPI plan used during a transaction would need its own executor, and I
don't know offhand what type and how much resources an executor requires
(I think it's only some memory that gets initialized and the VFDs
opened).

Hmm. This is probably more feasible now than it would have been a year
ago, because I did some cleanup work to ensure that executor state is
localized into a specific memory context. I'm not certain about the
amount of overhead either, but it's surely worth a try.

regards, tom lane

--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#================================================== JanWieck@Yahoo.com #

#33Tom Lane
tgl@sss.pgh.pa.us
In reply to: Stephan Szabo (#19)
Re: ADD FOREIGN KEY (was Re: [GENERAL] 7.4Beta)

[ continuing a discussion from mid-August ]

Stephan Szabo <sszabo@megazone.bigpanda.com> writes:

I assume what you have in mind is to replace
validateForeignKeyConstraint() with something that does a join of the
two tables via an SPI command.

It'd probably be:
MATCH unspecified:
SELECT keycolumns FROM referencing_table WHERE
(keycolumns) NOT IN (SELECT refcols FROM referenced_table
WHERE refcol1 IS NOT NULL AND ... )
AND keycolumn1 IS NOT NULL AND ...;

MATCH FULL: (something like, I haven't tried it)
SELECT keycolumns FROM referencing_table WHERE
((keycolumns) NOT IN (SELECT refcols FROM referenced_table
WHERE refcol1 IS NOT NULL AND ...)
AND
(keycolumn1 IS NOT NULL AND ...)
)
OR ((keycolumn1 IS NOT NULL)!=(keycolumn2 IS NOT NULL) OR ...)

I thought of what seems to be a better design for the check query: use
a LEFT JOIN and check for NULL in the righthand joined column. For
example, I think a MATCH UNSPECIFIED on two columns could be tested like
this:

select f1,f2
from fk left join pk on (fk.f1=pk.f1 and fk.f2=pk.f2)
where pk.f1 is null and (fk.f1 is not null and fk.f2 is not null);

and MATCH FULL is the same except

where pk.f1 is null and (fk.f1 is not null OR fk.f2 is not null);

MATCH PARTIAL would be harder; I think you'd need to generate a separate
query for each subset of the columns, in which you would probe for
unmatched rows having exactly that subset non-null. But it could be
done.

Do you see any logical error here?

In some preliminary tests, the planner seems to be able to choose
reasonable plans for this type of query even without pg_statistic data,
as long as it knows the table sizes (which it would do after CREATE INDEX).
So it would work reasonably well during a pg_dump script, I think.
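
Spelled out against a pair of toy tables (invented here just to exercise the query), the MATCH unspecified form behaves like this:

```sql
CREATE TEMP TABLE pk (f1 int, f2 int);
CREATE TEMP TABLE fk (f1 int, f2 int);
INSERT INTO pk VALUES (1, 1);
INSERT INTO fk VALUES (1, 1);      -- valid reference
INSERT INTO fk VALUES (2, 1);      -- violation: no (2,1) in pk
INSERT INTO fk VALUES (NULL, 1);   -- NULL key: passes MATCH unspecified

SELECT fk.f1, fk.f2
FROM fk LEFT JOIN pk ON (fk.f1 = pk.f1 AND fk.f2 = pk.f2)
WHERE pk.f1 IS NULL
  AND (fk.f1 IS NOT NULL AND fk.f2 IS NOT NULL);
-- should return just the (2,1) row
```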

regards, tom lane

#34Stephan Szabo
sszabo@megazone.bigpanda.com
In reply to: Tom Lane (#33)
Re: ADD FOREIGN KEY (was Re: [GENERAL] 7.4Beta)

On Sat, 27 Sep 2003, Tom Lane wrote:

[ continuing a discussion from mid-August ]

Stephan Szabo <sszabo@megazone.bigpanda.com> writes:

I assume what you have in mind is to replace
validateForeignKeyConstraint() with something that does a join of the
two tables via an SPI command.

It'd probably be:
MATCH unspecified:
SELECT keycolumns FROM referencing_table WHERE
(keycolumns) NOT IN (SELECT refcols FROM referenced_table
WHERE refcol1 IS NOT NULL AND ... )
AND keycolumn1 IS NOT NULL AND ...;

MATCH FULL: (something like, I haven't tried it)
SELECT keycolumns FROM referencing_table WHERE
((keycolumns) NOT IN (SELECT refcols FROM referenced_table
WHERE refcol1 IS NOT NULL AND ...)
AND
(keycolumn1 IS NOT NULL AND ...)
)
OR ((keycolumn1 IS NOT NULL)!=(keycolumn2 IS NOT NULL) OR ...)

I thought of what seems to be a better design for the check query: use
a LEFT JOIN and check for NULL in the righthand joined column. For
example, I think a MATCH UNSPECIFIED on two columns could be tested like
this:

select f1,f2
from fk left join pk on (fk.f1=pk.f1 and fk.f2=pk.f2)
where pk.f1 is null and (fk.f1 is not null and fk.f2 is not null);

and MATCH FULL is the same except

where pk.f1 is null and (fk.f1 is not null OR fk.f2 is not null);

MATCH PARTIAL would be harder; I think you'd need to generate a separate
query for each subset of the columns, in which you would probe for
unmatched rows having exactly that subset non-null. But it could be
done.

Do you see any logical error here?

In some preliminary tests, the planner seems to be able to choose
reasonable plans for this type of query even without pg_statistic data,
as long as it knows the table sizes (which it would do after CREATE INDEX).
So it would work reasonably well during a pg_dump script, I think.

Hmm, my initial testing showed that it really was a little slower
than a more complicated one with NOT EXISTS so I'd abandoned it. How does
it fare for you compared to:
select f1, f2 from fk where not exists (select 1 from pk where pk.f1=fk.f1
and pk.f2=fk.f2) and fk.f1 is not null and fk.f2 is not null;

I believe the above is the appropriate not exists form for match
unspecified.

I've actually got code (that no longer cleanly applies, but...) that uses
the single query version with NOT EXISTS (which could be easily changed to
either of the other forms) and was planning to put it together for a patch
when 7.5 devel started because I figured it wasn't precisely a bug and
wouldn't get accepted for 7.4.

#35Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Stephan Szabo (#34)
Re: ADD FOREIGN KEY (was Re: [GENERAL] 7.4Beta)

Stephan Szabo wrote:

Hmm, my initial testing showed that it really was a little slower
than a more complicated one with NOT EXISTS so I'd abandoned it. How does
it fare for you compared to:
select f1, f2 from fk where not exists (select 1 from pk where pk.f1=fk.f1
and pk.f2=fk.f2) and fk.f1 is not null and fk.f2 is not null;

I believe the above is the appropriate not exists form for match
unspecified.

I've actually got code (that no longer cleanly applies, but...) that uses
the single query version with NOT EXISTS (which could be easily changed to
either of the other forms) and was planning to put it together for a patch
when 7.5 devel started because I figured it wasn't precisely a bug and
wouldn't get accepted for 7.4.

I am a little lost on this point myself --- are we talking 7.4 or 7.5
for this change?

#36Tom Lane
tgl@sss.pgh.pa.us
In reply to: Stephan Szabo (#34)
Re: ADD FOREIGN KEY (was Re: [GENERAL] 7.4Beta)

Stephan Szabo <sszabo@megazone.bigpanda.com> writes:

On Sat, 27 Sep 2003, Tom Lane wrote:

I thought of what seems to be a better design for the check query: use
a LEFT JOIN and check for NULL in the righthand joined column.

Hmm, my initial testing showed that it really was a little slower
than a more complicated one with NOT EXISTS so I'd abandoned it. How does
it fare for you compared to:
select f1, f2 from fk where not exists (select 1 from pk where pk.f1=fk.f1
and pk.f2=fk.f2) and fk.f1 is not null and fk.f2 is not null;

Were you testing against 7.3 or 7.4? On what kinds of tables?

In 7.4 I think that the JOIN would yield as good or better a plan. The
best possible plan for the NOT EXISTS query is effectively a nestloop
with inner indexscan, which is great if the FK table is small and the
PK table is large, but it sucks otherwise. The planner should choose a
plan of this form for the LEFT JOIN given that combination of table
sizes, and so there shouldn't be any great difference in runtime in that
case. But in other combinations, such as large FK and small PK, other
plan types will beat the pants off nestloop.
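
The difference should show up directly in the plans; something along these lines (toy table names again):

```sql
-- The LEFT JOIN form leaves the planner free to pick hash or merge
-- strategies when fk is large:
EXPLAIN SELECT fk.f1 FROM fk LEFT JOIN pk ON fk.f1 = pk.f1
WHERE pk.f1 IS NULL AND fk.f1 IS NOT NULL;

-- The NOT EXISTS form is essentially locked into a nestloop with an
-- inner indexscan on pk:
EXPLAIN SELECT f1 FROM fk
WHERE NOT EXISTS (SELECT 1 FROM pk WHERE pk.f1 = fk.f1)
  AND fk.f1 IS NOT NULL;
```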

I've actually got code (that no longer cleanly applies, but...) that uses
the single query version with NOT EXISTS (which could be easily changed to
either of the other forms) and was planning to put it together for a patch
when 7.5 devel started because I figured it wasn't precisely a bug and
wouldn't get accepted for 7.4.

Well, Bruce has this on his open-items list, so I figure we have a green
light to do something for 7.4 if we can work out what to do.

regards, tom lane

#37Stephan Szabo
sszabo@megazone.bigpanda.com
In reply to: Tom Lane (#36)
Re: ADD FOREIGN KEY (was Re: [GENERAL] 7.4Beta)

On Sun, 28 Sep 2003, Tom Lane wrote:

Stephan Szabo <sszabo@megazone.bigpanda.com> writes:

On Sat, 27 Sep 2003, Tom Lane wrote:

I thought of what seems to be a better design for the check query: use
a LEFT JOIN and check for NULL in the righthand joined column.

Hmm, my initial testing showed that it really was a little slower
than a more complicated one with NOT EXISTS so I'd abandoned it. How does
it fare for you compared to:
select f1, f2 from fk where not exists (select 1 from pk where pk.f1=fk.f1
and pk.f2=fk.f2) and fk.f1 is not null and fk.f2 is not null;

Were you testing against 7.3 or 7.4? On what kinds of tables?

7.4 with tables of 1-2 integer key columns with 10k-1m pk rows of
sequential data (in the 2 key case it was value and #rows-value iirc) and
1m-20m fk rows of randomly generated valid data. It wasn't any sort of
amazingly detailed test, and those aren't huge tables, but I don't
exactly have a huge machine. I can go back through, do more tests, and
report back.

In 7.4 I think that the JOIN would yield as good or better a plan. The
best possible plan for the NOT EXISTS query is effectively a nestloop
with inner indexscan, which is great if the FK table is small and the
PK table is large, but it sucks otherwise. The planner should choose a
plan of this form for the LEFT JOIN given that combination of table
sizes, and so there shouldn't be any great difference in runtime in that
case. But in other combinations, such as large FK and small PK, other
plan types will beat the pants off nestloop.

That's what I was expecting too. I expected the speed ranking to be
basically NOT IN, LEFT JOIN, NOT EXISTS (at least once hashing kicked in
for IN, given the NOT IN enhancements), but didn't actually see that.

I've actually got code (that no longer cleanly applies, but...) that uses
the single query version with NOT EXISTS (which could be easily changed to
either of the other forms) and was planning to put it together for a patch
when 7.5 devel started because I figured it wasn't precisely a bug and
wouldn't get accepted for 7.4.

Well, Bruce has this on his open-items list, so I figure we have a green
light to do something for 7.4 if we can work out what to do.

I must have missed that. I'd have mentioned it earlier then.

#38Stephan Szabo
sszabo@megazone.bigpanda.com
In reply to: Bruce Momjian (#35)
Re: ADD FOREIGN KEY (was Re: [GENERAL] 7.4Beta)

On Sun, 28 Sep 2003, Bruce Momjian wrote:

Stephan Szabo wrote:

Hmm, my initial testing showed that it really was a little slower
than a more complicated one with NOT EXISTS so I'd abandoned it. How does
it fare for you compared to:
select f1, f2 from fk where not exists (select 1 from pk where pk.f1=fk.f1
and pk.f2=fk.f2) and fk.f1 is not null and fk.f2 is not null;

I believe the above is the appropriate not exists form for match
unspecified.

I've actually got code (that no longer cleanly applies, but...) that uses
the single query version with NOT EXISTS (which could be easily changed to
either of the other forms) and was planning to put it together for a patch
when 7.5 devel started because I figured it wasn't precisely a bug and
wouldn't get accepted for 7.4.

I am a little lost on this point myself --- are we talking 7.4 or 7.5
for this change?

I'd thought 7.5, but I now see that it's on the 7.4 open items list.

#39Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Tom Lane (#36)
Re: ADD FOREIGN KEY (was Re: [GENERAL] 7.4Beta)

Tom Lane wrote:

I've actually got code (that no longer cleanly applies, but...) that uses
the single query version with NOT EXISTS (which could be easily changed to
either of the other forms) and was planning to put it together for a patch
when 7.5 devel started because I figured it wasn't precisely a bug and
wouldn't get accepted for 7.4.

Well, Bruce has this on his open-items list, so I figure we have a green
light to do something for 7.4 if we can work out what to do.

I put it on because I wasn't clear exactly what was happening in the
discussion. There also was discussion that we want to improve this now
because everyone will be using it when upgrading to 7.4, and with a ~50% db
reload speed improvement, it is hard to ignore.

I am not against the idea of adding it to 7.4 if we can do it cleanly,
and in fact we are sort of waiting for more serious bug reports at this
time, so doing something else to improve the code isn't out of the
question if we can do it without stumbling --- seems dump/reload gets
full attention only during beta, which makes sense.

However, I think we have to be honest that this is a performance
_improvement_, not a fix. Yeah, you can say it was a bug that we did it
the way we did in the past, but you have to look real hard to see it
that way. :-)

Let's have multiple people eyeball the patch and give it an OK and we
can add it for 7.4 if people want it. If you look really hard, you can
say it is a fix for a missing pg_upgrade!

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
#40Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#39)
Re: ADD FOREIGN KEY (was Re: [GENERAL] 7.4Beta)

Bruce Momjian <pgman@candle.pha.pa.us> writes:

Let's have multiple people eyeball the patch and give it an OK and we
can add it for 7.4 if people want it.

Well, we haven't even *got* a proposed patch yet, but yeah we should
tread carefully. I do think it'd be okay to apply a patch if we can
come up with one that Stephan and Jan and I all like. As you say,
dump/reload speed normally doesn't get thought about except at this
stage of the release cycle, so ...

regards, tom lane

#41Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Tom Lane (#40)
Re: ADD FOREIGN KEY (was Re: [GENERAL] 7.4Beta)

Tom Lane wrote:

Bruce Momjian <pgman@candle.pha.pa.us> writes:

Let's have multiple people eyeball the patch and give it an OK and we
can add it for 7.4 if people want it.

Well, we haven't even *got* a proposed patch yet, but yeah we should
tread carefully. I do think it'd be okay to apply a patch if we can
come up with one that Stephan and Jan and I all like. As you say,
dump/reload speed normally doesn't get thought about except at this
stage of the release cycle, so ...

OK. What releases had this slow restore problem?

#42Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#41)
Re: ADD FOREIGN KEY (was Re: [GENERAL] 7.4Beta)

Bruce Momjian <pgman@candle.pha.pa.us> writes:

Tom Lane wrote:

Well, we haven't even *got* a proposed patch yet, but yeah we should
tread carefully.

OK. What releases had this slow restore problem?

We introduced it in 7.3 --- before that, FKs were simply dumped as
"create trigger" commands, and there was no check overhead. So arguably
it is a bug; a performance bug maybe, but that's still a bug. No one
has yet gone through a dump/reload cycle in which they had to face this
penalty.

regards, tom lane

#43Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Tom Lane (#42)
Re: ADD FOREIGN KEY (was Re: [GENERAL] 7.4Beta)

Tom Lane wrote:

Bruce Momjian <pgman@candle.pha.pa.us> writes:

Tom Lane wrote:

Well, we haven't even *got* a proposed patch yet, but yeah we should
tread carefully.

OK. What releases had this slow restore problem?

We introduced it in 7.3 --- before that, FKs were simply dumped as
"create trigger" commands, and there was no check overhead. So arguably
it is a bug; a performance bug maybe, but that's still a bug. No one
has yet gone through a dump/reload cycle in which they had to face this
penalty.

Now that is a strong argument. I knew you would find one. :-)

#44Christopher Kings-Lynne
chriskl@familyhealth.com.au
In reply to: Bruce Momjian (#43)
Re: ADD FOREIGN KEY (was Re: [GENERAL] 7.4Beta)

You could just as easily argue that the lack of integrity testing at
data load time was equally a bug.

I think we need some way of telling postgres to suppress a foreign key check.

The main problem is that the foreign key column is often not indexed.

Chris

Bruce Momjian wrote:

Tom Lane wrote:

Bruce Momjian <pgman@candle.pha.pa.us> writes:

Tom Lane wrote:

Well, we haven't even *got* a proposed patch yet, but yeah we should
tread carefully.

OK. What releases had this slow restore problem?

We introduced it in 7.3 --- before that, FKs were simply dumped as
"create trigger" commands, and there was no check overhead. So arguably
it is a bug; a performance bug maybe, but that's still a bug. No one
has yet gone through a dump/reload cycle in which they had to face this
penalty.

Now that is a strong argument. I knew you would find one. :-)

#45Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Christopher Kings-Lynne (#44)
Re: ADD FOREIGN KEY (was Re: [GENERAL] 7.4Beta)

Christopher Kings-Lynne wrote:

You could just as easily argue that the lack of integrity testing at
data load time was equally a bug.

I think we need some way of telling postgres to suppress a foreign key check.

The main problem is that the foreign key column is often not indexed.

As I remember, the new code is showing full table checks of a few
seconds, rather than minutes, but I agree we do need a way to turn off
checks sometimes.

#46Tom Lane
tgl@sss.pgh.pa.us
In reply to: Christopher Kings-Lynne (#44)
Re: ADD FOREIGN KEY (was Re: [GENERAL] 7.4Beta)

Christopher Kings-Lynne <chriskl@familyhealth.com.au> writes:

I think we need some way of telling postgres to suppress a foreign key check.

Well, the subtext argument here is "do we fix it by providing a way to
suppress the check, or do we fix it by making the check fast enough to
be tolerable?"

I think the advantages of choice (b) are obvious --- it doesn't allow
bogus data to be loaded accidentally, and it doesn't create a problem
with loading existing 7.3 dump files that don't know how to suppress the
check.

If we find there is no way to do (b) acceptably well, then and only then
would I want to consider (a).

regards, tom lane

#47Christopher Kings-Lynne
chriskl@familyhealth.com.au
In reply to: Tom Lane (#46)
Re: ADD FOREIGN KEY (was Re: [GENERAL] 7.4Beta)

I think the advantages of choice (b) are obvious --- it doesn't allow
bogus data to be loaded accidentally, and it doesn't create a problem
with loading existing 7.3 dump files that don't know how to suppress the
check.

OK, I didn't realise there was a (b). I volunteer to do speed tests on
data reloading on real data for our site.

Chris

#48Andreas Pflug
pgadmin@pse-consulting.de
In reply to: Christopher Kings-Lynne (#44)
Re: ADD FOREIGN KEY (was Re: [GENERAL] 7.4Beta)

Christopher Kings-Lynne wrote:

You could just as easily argue that the lack of integrity testing at
data load time was equally a bug.

I think we need some way of telling postgres to suppress a foreign key
check.

The main problem is that the foreign key column is often not indexed.

So a db designer made a bloody mistake.
The problem is there's no easy way to find out what's missing.
I'd really like EXPLAIN to display all subsequent triggered queries
also, to see the full scans caused by missing indexes.

Regards,
Andreas

#49Christopher Kings-Lynne
chriskl@familyhealth.com.au
In reply to: Andreas Pflug (#48)
Re: ADD FOREIGN KEY (was Re: [GENERAL] 7.4Beta)

So a db designer made a bloody mistake.
The problem is there's no easy way to find out what's missing.
I'd really like EXPLAIN to display all subsequent triggered queries
also, to see the full scans caused by missing indexes.

I'd sure second that!

Chris

#50Hannu Krosing
hannu@tm.ee
In reply to: Andreas Pflug (#48)
Re: ADD FOREIGN KEY (was Re: [GENERAL] 7.4Beta)

Andreas Pflug kirjutas E, 29.09.2003 kell 12:04:

Christopher Kings-Lynne wrote:

You could just as easily argue that the lack of integrity testing at
data load time was equally a bug.

I think we need some way of telling postgres to suppress a foreign key
check.

The main problem is that the foreign key column is often not indexed.

So a db designer made a bloody mistake.
The problem is there's no easy way to find out what's missing.
I'd really like EXPLAIN to display all subsequent triggered queries
also, to see the full scans caused by missing indexes.

It could probably be doable for EXPLAIN ANALYZE (by actually tracing
execution), but then you will see really _all_ queries, i.e. for a 1000
row update you would see 1 UPDATE query and 1000 fk checks ...

OTOH, you probably can get that already from logs with right logging
parameters.

-------------
Hannu

#51Christopher Kings-Lynne
chriskl@familyhealth.com.au
In reply to: Hannu Krosing (#50)
Re: ADD FOREIGN KEY (was Re: [GENERAL] 7.4Beta)

So a db designer made a bloody mistake.
The problem is there's no easy way to find out what's missing.
I'd really like EXPLAIN to display all subsequent triggered queries
also, to see the full scans caused by missing indexes.

It could probably be doable for EXPLAIN ANALYZE (by actually tracing
execution), but then you will see really _all_ queries, i.e. for a 1000
row update you would see 1 UPDATE query and 1000 fk checks ...

OTOH, you probably can get that already from logs with right logging
parameters.

Actually - it shouldn't be too hard to write a query that returns all
unindexed foreign keys, surely?

Chris
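
Such a query can indeed be written against the system catalogs. Here is a
rough sketch (the catalog layout shown is from later PostgreSQL releases
than the 7.x versions discussed here, and the `@>` containment test ignores
column order, so treat matches only as candidates):

```sql
-- Foreign-key constraints whose referencing table has no index
-- covering the constrained columns.
SELECT con.conname            AS constraint_name,
       con.conrelid::regclass AS referencing_table
FROM   pg_constraint con
WHERE  con.contype = 'f'            -- 'f' = foreign-key constraint
AND    NOT EXISTS (
         SELECT 1
         FROM   pg_index i
         WHERE  i.indrelid = con.conrelid
         AND    i.indkey::int2[] @> con.conkey  -- heuristic: ignores order
       );
```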

#52Shridhar Daithankar
shridhar_daithankar@persistent.co.in
In reply to: Christopher Kings-Lynne (#51)
Re: ADD FOREIGN KEY (was Re: [GENERAL] 7.4Beta)

On Monday 29 September 2003 15:58, Christopher Kings-Lynne wrote:

So a db designer made a bloody mistake.
The problem is there's no easy way to find out what's missing.
I'd really like EXPLAIN to display all subsequent triggered queries
also, to see the full scans caused by missing indexes.

It could probably be doable for EXPLAIN ANALYZE (by actually tracing
execution), but then you will see really _all_ queries, i.e. for a 1000
row update you would see 1 UPDATE query and 1000 fk checks ...

OTOH, you probably can get that already from logs with right logging
parameters.

Actually - it shouldn't be too hard to write a query that returns all
unindexed foreign keys, surely?

Correct me if I am wrong, but I remember PostgreSQL throwing an error that
the foreign key field was not unique in the foreign table. Obviously it
cannot detect that without an index. Either a primary key or a unique
constraint would need an index.

What am I missing here?

IOW, how do I exactly create foreign keys without an index?

Shridhar

#53Nigel J. Andrews
nandrews@investsystems.co.uk
In reply to: Christopher Kings-Lynne (#49)
Re: ADD FOREIGN KEY (was Re: [GENERAL] 7.4Beta)

On Mon, 29 Sep 2003, Christopher Kings-Lynne wrote:

So a db designer made a bloody mistake.
The problem is there's no easy way to find out what's missing.
I'd really like EXPLAIN to display all subsequent triggered queries
also, to see the full scans caused by missing indexes.

I'd sure second that!

That's only partially determinable though. The trigger code could branch and
run two different queries depending on the values supplied in the 'input' tuple
of the trigger.

Nigel

#54Hannu Krosing
hannu@tm.ee
In reply to: Shridhar Daithankar (#52)
Re: ADD FOREIGN KEY (was Re: [GENERAL] 7.4Beta)

Shridhar Daithankar kirjutas E, 29.09.2003 kell 13:34:

On Monday 29 September 2003 15:58, Christopher Kings-Lynne wrote:

So a db designer made a bloody mistake.
The problem is there's no easy way to find out what's missing.
I'd really like EXPLAIN to display all subsequent triggered queries
also, to see the full scans caused by missing indexes.

It could probably be doable for EXPLAIN ANALYZE (by actually tracing
execution), but then you will see really _all_ queries, i.e. for a 1000
row update you would see 1 UPDATE query and 1000 fk checks ...

OTOH, you probably can get that already from logs with right logging
parameters.

Actually - it shouldn't be too hard to write a query that returns all
unindexed foreign keys, surely?

Correct me if I am wrong, but I remember PostgreSQL throwing an error that
the foreign key field was not unique in the foreign table. Obviously it
cannot detect that without an index. Either a primary key or a unique
constraint would need an index.

What am I missing here?

IOW, how do I exactly create foreign keys without an index?

hannu=# create table pkt(i int primary key);
NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index
'pkt_pkey' for table 'pkt'
CREATE TABLE
hannu=# create table fkt(j int references pkt);
NOTICE: CREATE TABLE will create implicit trigger(s) for FOREIGN KEY
check(s)
CREATE TABLE
hannu=#

Now the *foreign key* column (fkt.j) is without an index. As foreign keys
are enforced in both directions, this can be a problem when changing table
pkt or bulk-creating FKs on big tables.

----------------
Hannu
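
In that example, nothing creates an index on fkt.j automatically; if the
referencing side needs one, it has to be added by hand (the index name
below is just illustrative):

```sql
-- Cover the referencing column so that FK enforcement on changes to pkt
-- (and bulk checks) can use an index scan instead of a seqscan.
CREATE INDEX fkt_j_idx ON fkt (j);
```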

#55Andreas Pflug
pgadmin@pse-consulting.de
In reply to: Nigel J. Andrews (#53)
Re: ADD FOREIGN KEY (was Re: [GENERAL] 7.4Beta)

Nigel J. Andrews wrote:

On Mon, 29 Sep 2003, Christopher Kings-Lynne wrote:

So a db designer made a bloody mistake.
The problem is there's no easy way to find out what's missing.
I'd really like EXPLAIN to display all subsequent triggered queries
also, to see the full scans caused by missing indexes.

I'd sure second that!

That's only partially determinable though. The trigger code could branch and
run two different queries depending on the values supplied in the 'input' tuple
of the trigger.

That would be ok; if I got a problem with a certain query, I don't
expect to find problems I might get with other queries. Though this
would be nice, how about a general pg_gimme_all_problems() function :-)

Regards,
Andreas

#56Greg Stark
gsstark@mit.edu
In reply to: Nigel J. Andrews (#53)
Re: ADD FOREIGN KEY (was Re: [GENERAL] 7.4Beta)

So a db designer made a bloody mistake.

Not necessarily. If I'm never going to update or delete from the parent table
the index would be useless. I find very few of my foreign key relationships
actually need indexes on the child table. I usually only have the unique index
on the parent table.

And often the child table is the big table. The index would be very large and
have awful selectivity. The last thing I want is a 5-million record table with
half a dozen indexes each with 10-20 unique values.

The problem is there's no easy way to find out what's missing.
I'd really like EXPLAIN to display all subsequent triggered queries
also, to see the full scans caused by missing indexes.

I'd sure second that!

I think the root of problem here is the same as the root of the problem with
foreign key checks being slow for large batch updates and inserts. Namely that
foreign key constraint checks are being handled as a million small queries.

To handle foreign key constraints optimally they would really have to be
merged into the plan in a kind of join. For most inserts/updates something
like a nested-loop join that is effectively the same as the current triggers
would be used. But for large batch updates/inserts it's quite possible that it
would look more like a hash join or even a merge join.

To do that would probably mean throwing out the whole trigger-based
implementation though, which seems like an awfully big project.

And being able to disable and reenable constraints would still be nice.
They're never going to be instantaneous. And besides, speed isn't the only
reason to want to disable constraints temporarily. The database is a tool, it
should be there to do the DBA's bidding, not the other way around :)

--
greg

#57Stephan Szabo
sszabo@megazone.bigpanda.com
In reply to: Tom Lane (#36)
Re: ADD FOREIGN KEY (was Re: [GENERAL] 7.4Beta)

On Sun, 28 Sep 2003, Tom Lane wrote:

Stephan Szabo <sszabo@megazone.bigpanda.com> writes:

On Sat, 27 Sep 2003, Tom Lane wrote:

I thought of what seems to be a better design for the check query: use
a LEFT JOIN and check for NULL in the righthand joined column.

Hmm, my initial testing showed that it really was a little slower
than a more complicated one with NOT EXISTS, so I'd abandoned it. How does
it fare for you compared to:
select f1, f2 from fk where not exists (select 1 from pk where pk.f1=fk.f1
and pk.f2=fk.f2) and fk.f1 is not null and fk.f2 is not null;

Were you testing against 7.3 or 7.4? On what kinds of tables?

In 7.4 I think that the JOIN would yield as good or better a plan. The
best possible plan for the NOT EXISTS query is effectively a nestloop
with inner indexscan, which is great if the FK table is small and the
PK table is large, but it sucks otherwise. The planner should choose a
plan of this form for the LEFT JOIN given that combination of table
sizes, and so there shouldn't be any great difference in runtime in that
case. But in other combinations, such as large FK and small PK, other
plan types will beat the pants off nestloop.

As an update, so far I still am getting better results with NOT EXISTS
than the left join. For a 50m row fk, 10k row pk where the rows are just
the keys, I'm getting a plan like
Merge Join
Index scan on pktable
Sort
Seqscan on fktable

which is taking about 2-4 times longer for me than the not exists
depending on sort_mem (at 4096,64000,128000).

When I lowered random_page_cost to 1, I got an indexscan on fktable, but
that hadn't seemed to finish after about 2 hours (as opposed to about
30-35 minutes for the not exists and about 30 min - 1 1/2 hours for the
sort+seqscan version).

I want to do some more tests where there's extraneous data in both tables
and see what that does to the results.
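
For reference, the two check-query shapes being compared look roughly like
this, using the fk/pk tables and f1/f2 columns from the quoted query (a
sketch; in either form, any row returned is a violating foreign-key value):

```sql
-- NOT EXISTS form: each fk row probes pk, at best via an inner indexscan.
SELECT f1, f2
FROM   fk
WHERE  NOT EXISTS (SELECT 1 FROM pk
                   WHERE pk.f1 = fk.f1 AND pk.f2 = fk.f2)
AND    fk.f1 IS NOT NULL AND fk.f2 IS NOT NULL;

-- LEFT JOIN form: unmatched fk rows surface with NULLs on the pk side,
-- and the planner is free to choose merge or hash join for large inputs.
SELECT fk.f1, fk.f2
FROM   fk LEFT JOIN pk ON pk.f1 = fk.f1 AND pk.f2 = fk.f2
WHERE  pk.f1 IS NULL
AND    fk.f1 IS NOT NULL AND fk.f2 IS NOT NULL;
```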

#58Stephan Szabo
sszabo@megazone.bigpanda.com
In reply to: Stephan Szabo (#57)
Re: ADD FOREIGN KEY (was Re: [GENERAL] 7.4Beta)

On Mon, 29 Sep 2003, Stephan Szabo wrote:

When I lowered random_page_cost to 1, I got an indexscan on fktable, but
that hadn't seemed to finish after about 2 hours (as opposed to about
30-35 minutes for the not exists and about 30 min - 1 1/2 hours for the

Small correction, I'd meant to type 20-35 minutes above, not 30-35.

#59Tom Lane
tgl@sss.pgh.pa.us
In reply to: Stephan Szabo (#57)
Re: ADD FOREIGN KEY (was Re: [GENERAL] 7.4Beta)

Stephan Szabo <sszabo@megazone.bigpanda.com> writes:

As an update, so far I still am getting better results with NOT EXISTS
than the left join.

Hm. Don't suppose you were using EXPLAIN ANALYZE so we could see what's
happening? This is clearly a planner failure, although I'm unsure if we
can expect the planner to get the right answer with no pg_statistic entries.

regards, tom lane

#60Stephan Szabo
sszabo@megazone.bigpanda.com
In reply to: Tom Lane (#59)
Re: ADD FOREIGN KEY (was Re: [GENERAL] 7.4Beta)

On Mon, 29 Sep 2003, Tom Lane wrote:

Stephan Szabo <sszabo@megazone.bigpanda.com> writes:

As an update, so far I still am getting better results with NOT EXISTS
than the left join.

Hm. Don't suppose you were using EXPLAIN ANALYZE so we could see what's
happening? This is clearly a planner failure, although I'm unsure if we
can expect the planner to get the right answer with no pg_statistic entries.

For the sort+seq one and the not exists, I had, but I'll re-run it (it's
on my home desktop, which I won't be able to access). The other one, where
I forced it to use an index scan, I haven't let complete yet, and I don't
know how long that will take.

I was also planning to run a set after running analyze, so I'll include
those too. It'll probably be a few hours before the results are in. :)

Are there any other options (enable_mergejoin, etc) that you want me to
try with?

#61Stephan Szabo
sszabo@megazone.bigpanda.com
In reply to: Tom Lane (#59)
Re: ADD FOREIGN KEY (was Re: [GENERAL] 7.4Beta)

On Mon, 29 Sep 2003, Tom Lane wrote:

Stephan Szabo <sszabo@megazone.bigpanda.com> writes:

As an update, so far I still am getting better results with NOT EXISTS
than the left join.

Hm. Don't suppose you were using EXPLAIN ANALYZE so we could see what's
happening? This is clearly a planner failure, although I'm unsure if we
can expect the planner to get the right answer with no pg_statistic entries.

The left join one seems to give me values like the following:

At sort_mem=4096
Merge Right Join (cost=9966071.76..10349763.45 rows=49501250 width=4)
(actual time=4383435.48..4383435.48 rows=0 loops=1)
Merge Cond: (("outer".a = "inner".b) AND ("outer".b = "inner".c))
Filter: ("outer".a IS NULL)
-> Index Scan using pktest_a_key on pktest (cost=0.00..52.00
rows=1000 width=8) (actual time=17.82..1609.97 rows=10000 loops=1)
-> Sort (cost=9966071.76..10089824.88 rows=49501250 width=8) (actual
time=3876614.87..4157850.82 rows=50000000 loops=1)
Sort Key: fktest.b, fktest.c
-> Seq Scan on fktest (cost=0.00..745099.00 rows=49501250
width=8) (actual time=4.09..273798.65 rows=50000000 loops=1)
Filter: ((b IS NOT NULL) AND (c IS NOT NULL))
Total runtime: 4384366.79 msec
(9 rows)

At sort_mem=128000
Merge Right Join (cost=69.32..134.00 rows=991 width=4) (actual
time=2183787.83..2183787.83 rows=0 loops=1)
Merge Cond: (("outer".a = "inner".b) AND ("outer".b = "inner".c))
Filter: ("outer".a IS NULL)
-> Index Scan using pktest_a_key on pktest (cost=0.00..52.00
rows=1000 width=8) (actual time=13.11..390.40 rows=10000
loops=1)
-> Sort (cost=69.32..71.79 rows=991 width=8) (actual
time=1944240.67..2048954.65 rows=50000000 loops=1)
Sort Key: fktest.b, fktest.c
-> Seq Scan on fktest (cost=0.00..20.00 rows=991 width=8)
(actual time=2.61..225967.79 rows=50000000 loops=1)
Filter: ((b IS NOT NULL) AND (c IS NOT NULL))
Total runtime: 2184348.78 msec

I haven't finished a run with it doing an index scan on fktest yet,
still...

The not exists gives me:

Seq Scan on fktest (cost=0.00..242021289.48 rows=24750625 width=8)
(actual time=2032607.68..2032607.68 rows=0 loops=1)
Filter: ((b IS NOT NULL) AND (c IS NOT NULL) AND (NOT (subplan)))
SubPlan
-> Index Scan using pktest_a_key on pktest (cost=0.00..4.83 rows=1
width=0) (actual time=0.03..0.03 rows=1 loops=50000000)
Index Cond: ((a = $0) AND (b = $1))
Total runtime: 2032607.87 msec
(6 rows)

But this time was one of the higher times for this query. I'd seen times
down at about 1400000 msec yesterday.

#62Andreas Pflug
pgadmin@pse-consulting.de
In reply to: Greg Stark (#56)
Re: ADD FOREIGN KEY (was Re: [GENERAL] 7.4Beta)

Greg Stark wrote:

So a db designer made a bloody mistake.

Not necessarily. If I'm never going to update or delete from the parent table
the index would be useless. I find very few of my foreign key relationships
actually need indexes on the child table. I usually only have the unique index
on the parent table.

And often the child table is the big table. The index would be very large and
have awful selectivity. The last thing I want is a 5-million record table with
half a dozen indexes each with 10-20 unique values.

The problem is there's no easy way to find out what's missing.
I'd really like EXPLAIN to display all subsequent triggered queries
also, to see the full scans caused by missing indexes.

I'd sure second that!

I think the root of problem here is the same as the root of the problem with
foreign key checks being slow for large batch updates and inserts. Namely that
foreign key constraint checks are being handled as a million small queries.

Apart from missing indices, this is certainly a problem. Statement-level
triggers will solve this, as soon as they are fully implemented and
support OLD and NEW record sets.

Regards,
Andreas

#63Tom Lane
tgl@sss.pgh.pa.us
In reply to: Stephan Szabo (#61)
Re: ADD FOREIGN KEY (was Re: [GENERAL] 7.4Beta)

Stephan Szabo <sszabo@megazone.bigpanda.com> writes:

Hm. Don't suppose you were using EXPLAIN ANALYZE so we could see what's
happening? This is clearly a planner failure, although I'm unsure if we
can expect the planner to get the right answer with no pg_statistic entries.

The left join one seems to give me values like the following:

There are some fishy row estimates in here:

-> Index Scan using pktest_a_key on pktest (cost=0.00..52.00
rows=1000 width=8) (actual time=17.82..1609.97 rows=10000 loops=1)

The system definitely should be expected to have the accurate row count
for the PK table, since an index should have been created on it (and we
do do that after loading the data, no?). It is possible that it'd have
the default 1000 estimate for the FK table, if there are no indexes at
all on the FK table; otherwise it should have the right number. It's
not real clear to me what conditions you're testing under, but the
estimates in the plans you're quoting aren't consistent ...

regards, tom lane

#64Stephan Szabo
sszabo@megazone.bigpanda.com
In reply to: Tom Lane (#63)
Re: ADD FOREIGN KEY (was Re: [GENERAL] 7.4Beta)

On Mon, 29 Sep 2003, Tom Lane wrote:

Stephan Szabo <sszabo@megazone.bigpanda.com> writes:

Hm. Don't suppose you were using EXPLAIN ANALYZE so we could see what's
happening? This is clearly a planner failure, although I'm unsure if we
can expect the planner to get the right answer with no pg_statistic entries.

The left join one seems to give me values like the following:

There are some fishy row estimates in here:

-> Index Scan using pktest_a_key on pktest (cost=0.00..52.00
rows=1000 width=8) (actual time=17.82..1609.97 rows=10000 loops=1)

The system definitely should be expected to have the accurate row count
for the PK table, since an index should have been created on it (and we
do do that after loading the data, no?). It is possible that it'd have
the default 1000 estimate for the FK table, if there are no indexes at
all on the FK table; otherwise it should have the right number. It's
not real clear to me what conditions you're testing under, but the
estimates in the plans you're quoting aren't consistent ...

Well, they're all from the same load of the same data with only stopping
and starting in between, but I did make the index on the pk table first,
loaded the data, and then built the fk table index (because I'd wanted to
try without the index as well), which meant that it wouldn't match the
behavior of a dump. Ugh, I'd forgotten that the primary key didn't get
created until later too.

Okay, that's much better:
Hash Left Join (cost=203.00..1487869.29 rows=49501250 width=4) (actual
time=611632.67..611632.67 rows=0 loops=1)
Hash Cond: (("outer".b = "inner".a) AND ("outer".c = "inner".b))
Filter: ("inner".a IS NULL)
-> Seq Scan on fktest (cost=0.00..745099.00 rows=49501250 width=8)
(actual time=0.01..169642.48 rows=50000000 loops=1)
Filter: ((b IS NOT NULL) AND (c IS NOT NULL))
-> Hash (cost=152.00..152.00 rows=10000 width=8) (actual
time=46.04..46.04 rows=0 loops=1)
-> Seq Scan on pktest (cost=0.00..152.00 rows=10000 width=8)
(actual time=0.02..21.38 rows=10000 loops=1)
Total runtime: 611632.95 msec
(8 rows)

That's much better. :) As long as the row estimates are reasonable it
seems to be okay, but I do wonder why it chose the merge join for the case
when it thought there were only 1000 rows.

#65Jan Wieck
JanWieck@Yahoo.com
In reply to: Tom Lane (#46)
Re: ADD FOREIGN KEY (was Re: [GENERAL] 7.4Beta)

Tom Lane wrote:

Christopher Kings-Lynne <chriskl@familyhealth.com.au> writes:

I think we need some way of telling postgres to suppress a foreign key check.

Well, the subtext argument here is "do we fix it by providing a way to
suppress the check, or do we fix it by making the check fast enough to
be tolerable?"

I think the advantages of choice (b) are obvious --- it doesn't allow
bogus data to be loaded accidentally, and it doesn't create a problem
with loading existing 7.3 dump files that don't know how to suppress the
check.

If we find there is no way to do (b) acceptably well, then and only then
would I want to consider (a).

I think I can accept it being the choice of the DBA what to do. Pg_dump
has that kind of option already; one can choose between COPY and INSERT,
for example. Why not add the choice of dumping FKeys as ALTER TABLE
or CREATE CONSTRAINT TRIGGER?

The whole "original" idea (way back a few years ago) of doing it with
the CREATE CONSTRAINT TRIGGER command was that your "backup" ought to
be consistent anyway. Finding out that your tape contains inconsistent
garbage _after_ your harddisk made that suspicious noise ... is a bit
late, isn't it?

That ALTER TABLE ... ADD CONSTRAINT needs to be improved because, at the
moment it is normally used, we cannot make any assumptions about data
consistency, no question. But just because we have such a nice and
always-asked-for ALTER TABLE command does not mean we have to force
every DBA of every well-maintained and stable system to perform
hours-long nonsense tests on known-to-be-good data.

Jan

--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#================================================== JanWieck@Yahoo.com #

#66Stephan Szabo
sszabo@megazone.bigpanda.com
In reply to: Tom Lane (#63)
Re: ADD FOREIGN KEY (was Re: [GENERAL] 7.4Beta)

(I think my previous attempt got aborted by a lost connection, so a
message like this may arrive twice)

On Mon, 29 Sep 2003, Tom Lane wrote:

Stephan Szabo <sszabo@megazone.bigpanda.com> writes:

Hm. Don't suppose you were using EXPLAIN ANALYZE so we could see what's
happening? This is clearly a planner failure, although I'm unsure if we
can expect the planner to get the right answer with no pg_statistic entries.

The left join one seems to give me values like the following:

There are some fishy row estimates in here:

-> Index Scan using pktest_a_key on pktest (cost=0.00..52.00
rows=1000 width=8) (actual time=17.82..1609.97 rows=10000 loops=1)

The system definitely should be expected to have the accurate row count
for the PK table, since an index should have been created on it (and we
do do that after loading the data, no?). It is possible that it'd have
the default 1000 estimate for the FK table, if there are no indexes at
all on the FK table; otherwise it should have the right number. It's
not real clear to me what conditions you're testing under, but the
estimates in the plans you're quoting aren't consistent ...

Also, the sequence was basically:
CREATE TABLE pktest(a int, b int, unique(a,b));
CREATE TABLE fktest(b int, c int);
COPY pktest FROM STDIN;
...
COPY fktest FROM STDIN;
...
<run some tests I didn't mention here>
CREATE INDEX fki on fktest(b,c);
<run the above test>

With stopping and restarting the server involved and running the tests
multiple times.

#67Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jan Wieck (#65)
Re: ADD FOREIGN KEY (was Re: [GENERAL] 7.4Beta)

Jan Wieck <JanWieck@Yahoo.com> writes:

I think I can accept it being the choice of the DBA what to do. Pg_dump
has that kind of option already; one can choose between COPY and INSERT,
for example. Why not add the choice of dumping FKeys as ALTER TABLE
or CREATE CONSTRAINT TRIGGER?

We don't want it dumping as CREATE CONSTRAINT TRIGGER, because (a) that
loses pg_depend information and (b) it's too low-level a representation;
we couldn't ever change the implementation of foreign keys as long as
dumps look like that.

Also, I don't see why you'd want to make such a choice at pg_dump time.
Probably better to control it at restore time. Accordingly, my proposal
if we were to go that route would be a boolean GUC variable that simply
prevents ALTER TABLE ADD FOREIGN KEY from doing the validity checks.

regards, tom lane
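
Under that proposal, a restore might look something like the following
sketch; the variable name is invented here for illustration (no such GUC
existed at the time):

```sql
-- Hypothetical restore-time session, per Tom's proposal.
SET skip_fk_validity_check = on;   -- invented name for the proposed GUC
ALTER TABLE fkt
    ADD CONSTRAINT fkt_j_fkey FOREIGN KEY (j) REFERENCES pkt (i);
RESET skip_fk_validity_check;      -- later ADDs validate normally again
```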

#68Jan Wieck
JanWieck@Yahoo.com
In reply to: Tom Lane (#67)
Re: ADD FOREIGN KEY (was Re: [GENERAL] 7.4Beta)

Tom Lane wrote:

Jan Wieck <JanWieck@Yahoo.com> writes:

I think I can accept it being the choice of the DBA what to do. Pg_dump
has that kind of option already; one can choose between COPY and INSERT,
for example. Why not add the choice of dumping FKeys as ALTER TABLE
or CREATE CONSTRAINT TRIGGER?

We don't want it dumping as CREATE CONSTRAINT TRIGGER, because (a) that
loses pg_depend information and (b) it's too low-level a representation;
we couldn't ever change the implementation of foreign keys as long as
dumps look like that.

That's finally 2 points, okay.

Also, I don't see why you'd want to make such a choice at pg_dump time.
Probably better to control it at restore time. Accordingly, my proposal
if we were to go that route would be a boolean GUC variable that simply
prevents ALTER TABLE ADD FOREIGN KEY from doing the validity checks.

Okay too. And this would be simple and safe enough to add for the time
being.

Jan

--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#================================================== JanWieck@Yahoo.com #

#69Christopher Kings-Lynne
chriskl@familyhealth.com.au
In reply to: Shridhar Daithankar (#52)
Re: ADD FOREIGN KEY (was Re: [GENERAL] 7.4Beta)

Correct me if I am wrong, but I remember postgresql throwing an error that the
foreign key field was not unique in the foreign table. Obviously it cannot
detect that without an index. Either a primary key or a unique constraint
would need an index.

What am I missing here?

IOW, how do I exactly create foreign keys without an index?

You are talking about the primary key or the referenced key, not the
foreign key (i.e. the source column).

Chris

#70Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Jan Wieck (#68)
Re: ADD FOREIGN KEY (was Re: [GENERAL] 7.4Beta)

Jan Wieck wrote:

Tom Lane wrote:

Jan Wieck <JanWieck@Yahoo.com> writes:

I think I can accept it to be the choice of the DBA what to do. Pg_dump
has that kind of options already, one can choose between COPY and INSERT
for example. Why not add the choice of dumping FKeys as ALTER TABLE
or CREATE CONSTRAINT TRIGGER?

We don't want it dumping as CREATE CONSTRAINT TRIGGER, because (a) that
loses pg_depend information and (b) it's too low-level a representation;
we couldn't ever change the implementation of foreign keys as long as
dumps look like that.

That's finally 2 points, okay.

Also, I don't see why you'd want to make such a choice at pg_dump time.
Probably better to control it at restore time. Accordingly, my proposal
if we were to go that route would be a boolean GUC variable that simply
prevents ALTER TABLE ADD FOREIGN KEY from doing the validity checks.

Okay too. And this would be simple and safe enough to add for the time
being.

If we go that direction, why don't we just make a GUC variable to
disable constraint checking? Is that what this will do, or is it more
limited? I know it breaks referential integrity, but we have had many
folks ask for it, it is on the TODO list, and there are tons of server
functions flying around that do just this by fiddling with pg_class. I
would rather just have it be a GUC for that particular backend. People
are going to need to turn it off anyway, so why not give them a clean
way to do it.

Also, how does someone turn it on at restore time if they are piping
into psql?

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
#71Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#70)
Re: ADD FOREIGN KEY (was Re: [GENERAL] 7.4Beta)

Bruce Momjian <pgman@candle.pha.pa.us> writes:

Jan Wieck wrote:

Tom Lane wrote:

if we were to go that route would be a boolean GUC variable that simply
prevents ALTER TABLE ADD FOREIGN KEY from doing the validity checks.

Okay too. And this would be simple and safe enough to add for the time
being.

If we go that direction, why don't we just make a GUC variable to
disable constraint checking?

You mean in general, even for plain old insert/update/delete changes?
Yipes. What happened to ACID compliance?

What I actually expected to ensue was a discussion about how we could
narrow down the effects of a disable-foreign-key-verification switch to
reduce the odds of shooting oneself in the foot. (For example, maybe
disallow it from being set in postgresql.conf.) I wasn't expecting
proposals to enlarge the gauge of the foot-gun ...

Also, how does someone turn it on at restore time if they are piping
into psql?

Something like
export PGOPTIONS="-c disable-fk-verification=true"
then run psql or pg_restore.

regards, tom lane

#72Stephan Szabo
sszabo@megazone.bigpanda.com
In reply to: Bruce Momjian (#70)
Re: ADD FOREIGN KEY (was Re: [GENERAL] 7.4Beta)

On Tue, 30 Sep 2003, Bruce Momjian wrote:

Jan Wieck wrote:

Tom Lane wrote:

Jan Wieck <JanWieck@Yahoo.com> writes:

I think I can accept it to be the choice of the DBA what to do. Pg_dump
has that kind of options already, one can choose between COPY and INSERT
for example. Why not add the choice of dumping FKeys as ALTER TABLE
or CREATE CONSTRAINT TRIGGER?

We don't want it dumping as CREATE CONSTRAINT TRIGGER, because (a) that
loses pg_depend information and (b) it's too low-level a representation;
we couldn't ever change the implementation of foreign keys as long as
dumps look like that.

That's finally 2 points, okay.

Also, I don't see why you'd want to make such a choice at pg_dump time.
Probably better to control it at restore time. Accordingly, my proposal
if we were to go that route would be a boolean GUC variable that simply
prevents ALTER TABLE ADD FOREIGN KEY from doing the validity checks.

Okay too. And this would be simple and safe enough to add for the time
being.

If we go that direction, why don't we just make a GUC variable to
disable constraint checking? Is that what this will do, or is it more
limited? I know it breaks referential integrity, but we have had many
folks ask for it, it is on the TODO list, and there are tons of server
functions flying around that do just this by fiddling with pg_class. I
would rather just have it be a GUC for that particular backend. People
are going to need to turn it off anyway, so why not give them a clean
way to do it.

But such a GUC wouldn't affect just one backend. It'd potentially affect
all backends that were doing concurrent modifications that would be
involved since the locks aren't taken. In addition, who would be allowed
to set this value and what constraints would it affect? If it's only
superusers, then it doesn't help for non-superuser restores. If it's
settable by anyone and affects only constraints on tables that user owns
and that refer to tables that user owns it might be okay. If it's
settable by anyone and affects all tables it renders the constraints
meaningless since anyone could break them.

#73Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Tom Lane (#71)
Re: ADD FOREIGN KEY (was Re: [GENERAL] 7.4Beta)

Tom Lane wrote:

If we go that direction, why don't we just make a GUC variable to
disable constraint checking?

You mean in general, even for plain old insert/update/delete changes?
Yipes. What happened to ACID compliance?

What I actually expected to ensue was a discussion about how we could
narrow down the effects of a disable-foreign-key-verification switch to
reduce the odds of shooting oneself in the foot. (For example, maybe
disallow it from being set in postgresql.conf.) I wasn't expecting
proposals to enlarge the gauge of the foot-gun ...

Fact is, folks are doing it anyway by modifying pg_class. I know one
guy who did it in a transaction so he was the only one to see the
triggers disabled! The PostgreSQL cookbook page has an example too.
People are always asking how to do this. Why not just make it settable
only by the super-user?
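
The pg_class hack being alluded to (widely circulated for 7.x-era PostgreSQL, and very much unsupported) looks roughly like this; the table name is illustrative:

```sql
-- Zero the trigger count so FK (and all other) triggers stop firing.
-- Unsupported catalog hack; superuser only; bypasses integrity checking.
BEGIN;
UPDATE pg_class SET reltriggers = 0 WHERE relname = 'fktest';

-- ... bulk load into fktest here ...

-- Put the real trigger count back so the constraints fire again.
UPDATE pg_class
   SET reltriggers = (SELECT count(*) FROM pg_trigger
                       WHERE pg_trigger.tgrelid = pg_class.oid)
 WHERE relname = 'fktest';
COMMIT;
```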

FYI, TODO has:

* Allow triggers to be disabled [trigger]
* With disabled triggers, allow pg_dump to use ALTER TABLE ADD FOREIGN
KEY

The second one is the one we are discussing. If we never want to do it,
I will remove it from the TODO list. However, I think we might be
making things too controlled by not allowing administrators to do this.

Also, how does someone turn it on at restore time if they are piping
into psql?

Something like
export PGOPTIONS="-c disable-fk-verification=true"
then run psql or pg_restore.

How many folks are going to remember to do this? Why make it hard for
them? Someone is going to forget too easily. "Why is this restore
taking so long? Oh, I forgot that switch." Or they put it in a login
file and forget it is set. Seems safer for it to be in the dump file.

#74Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Stephan Szabo (#72)
Re: ADD FOREIGN KEY (was Re: [GENERAL] 7.4Beta)

Stephan Szabo wrote:

If we go that direction, why don't we just make a GUC variable to
disable constraint checking? Is that what this will do, or is it more
limited? I know it breaks referential integrity, but we have had many
folks ask for it, it is on the TODO list, and there are tons of server
functions flying around that do just this by fiddling with pg_class. I
would rather just have it be a GUC for that particular backend. People
are going to need to turn it off anyway, so why not give them a clean
way to do it.

But such a GUC wouldn't affect just one backend. It'd potentially affect
all backends that were doing concurrent modifications that would be
involved since the locks aren't taken. In addition, who would be allowed
to set this value and what constraints would it affect? If it's only
superusers, then it doesn't help for non-superuser restores. If it's
settable by anyone and affects only constraints on tables that user owns
and that refer to tables that user owns it might be okay. If it's
settable by anyone and affects all tables it renders the constraints
meaningless since anyone could break them.

I assume it would be only settable by the super-user. They are mucking
around with pg_class anyway (and have permission to do so), so let them
do it cleanly at least. Allowing non-supers to do it for tables they
own would be OK, I guess. Is there a problem if some of the primary
table is owned by someone else? Not sure.

#75Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#73)
Re: ADD FOREIGN KEY (was Re: [GENERAL] 7.4Beta)

Bruce Momjian <pgman@candle.pha.pa.us> writes:

How many folks are going to remember to do this? Why make it hard for
them? Someone is going to forget too easily. "Why is this restore
taking so long? Oh, I forgot that switch." Or they put it in a login
file and forget it is set. Seems safer for it to be in the dump file.

I disagree. The "how many folks are going to remember to do this"
argument applies just as well to magic pg_dump switches; that's not
a tenable argument against doing it at restore time.

The difference between controlling it at pg_dump time and pg_restore
time is that if you change your mind after having made the dump, it's
too late, if the decision was nailed down in the dump file. In an
upgrade situation it's very likely that you no longer have the option
to re-do your dump, because you already blew away your old installation.

Since there's no performance difference at pg_dump time, I can't see any
advantage to freezing your decision then.

regards, tom lane

#76Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#74)
Re: ADD FOREIGN KEY (was Re: [GENERAL] 7.4Beta)

Bruce Momjian <pgman@candle.pha.pa.us> writes:

I assume it would be only settable by the super-user.

That might be a good restriction too (on top of my speculation about not
allowing it in postgresql.conf). Only allow it to be SET per-session,
and only by a superuser.

regards, tom lane

#77Stephan Szabo
sszabo@megazone.bigpanda.com
In reply to: Bruce Momjian (#74)
Re: ADD FOREIGN KEY (was Re: [GENERAL] 7.4Beta)

On Tue, 30 Sep 2003, Bruce Momjian wrote:

Stephan Szabo wrote:

If we go that direction, why don't we just make a GUC variable to
disable constraint checking? Is that what this will do, or is it more
limited? I know it breaks referential integrity, but we have had many
folks ask for it, it is on the TODO list, and there are tons of server
functions flying around that do just this by fiddling with pg_class. I
would rather just have it be a GUC for that particular backend. People
are going to need to turn it off anyway, so why not give them a clean
way to do it.

But such a GUC wouldn't affect just one backend. It'd potentially affect
all backends that were doing concurrent modifications that would be
involved since the locks aren't taken. In addition, who would be allowed
to set this value and what constraints would it affect? If it's only
superusers, then it doesn't help for non-superuser restores. If it's
settable by anyone and affects only constraints on tables that user owns
and that refer to tables that user owns it might be okay. If it's
settable by anyone and affects all tables it renders the constraints
meaningless since anyone could break them.

I assume it would be only setable by the super-user. They are mucking
around with pg_class anyway (and have permission to do so), so let them
do it cleanly at least. Allowing non-supers to do it for tables they
own would be OK, I guess. Is there a problem if some of the primary
table is owned by someone else? Not sure.

The problem I have with a super-user only solution is that it doesn't
solve the problem for restores in general. I think we need a mechanism
that works for any user that wants to restore a table (or tables) from
dump(s), so for the dump/restore mechanism I think we should be looking in
that direction.

#78Christopher Browne
cbbrowne@acm.org
In reply to: Bruce Momjian (#73)
Re: ADD FOREIGN KEY

Centuries ago, Nostradamus foresaw when tgl@sss.pgh.pa.us (Tom Lane) would write:

Since there's no performance difference at pg_dump time, I can't see any
advantage to freezing your decision then.

This parallels the common suggestion of throwing an ANALYZE in at the
bottom of a pg_dump script. On that particular note, I'd think it
preferable to analyze after loading each table, since the data for the
specific table will still be in memory. But that's a _bit_ of a
change of subject.
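
The per-table variant suggested here would amount to something like the following in the load script (a sketch using the thread's own test tables; the server-side file paths are illustrative):

```sql
-- Analyze each table immediately after its COPY, while its pages are
-- still likely cached, instead of one ANALYZE at the end of the dump.
COPY pktest FROM '/tmp/pktest.dat';
ANALYZE pktest;

COPY fktest FROM '/tmp/fktest.dat';
ANALYZE fktest;
```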

This looks like something where a "hook" would be valuable such that
there is something in the pg_dump that can be configured AFTER the
fact to control how it's loaded.

It would surely seem valuable to have a way of making loads go As Fast
As Possible, even with the possibility of "breakneck speed" offering
the possibility of actually getting seriously injured (breaking one's
neck?). If the hardware fails during the recovery, consider that you
were _recovering_ from a _backup_; that surely ought to be an
eminently redoable operation, quite unlike accepting a random SQL
request from a user.

I have done some "recoveries" recently (well, more precisely,
"installs") by taking a tarball of a pre-existing database and
dropping it into place. I had no problem with the fact that if my
hand slipped and hit ^C at the wrong moment ("quelle horreur!"), I
would be forced to restart the "cd $TARGETDIR; tar xfvz Flex.tgz"
process.

I would be pretty "game" for a near-single-user-mode approach that
would turn off some of the usual functionality that we knew we didn't
need because the data source was an already-committed-and-FK-checked
set of data.
--
output = reverse("ac.notelrac.teneerf" "@" "454aa")
http://www.ntlug.org/~cbbrowne/spiritual.html
"Another result of the tyranny of Pascal is that beginners don't use
function pointers." --Rob Pike

#79Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Stephan Szabo (#77)
Re: ADD FOREIGN KEY (was Re: [GENERAL] 7.4Beta)

Stephan Szabo wrote:

On Tue, 30 Sep 2003, Bruce Momjian wrote:

Stephan Szabo wrote:

If we go that direction, why don't we just make a GUC variable to
disable constraint checking? Is that what this will do, or is it more
limited? I know it breaks referential integrity, but we have had many
folks ask for it, it is on the TODO list, and there are tons of server
functions flying around that do just this by fiddling with pg_class. I
would rather just have it be a GUC for that particular backend. People
are going to need to turn it off anyway, so why not give them a clean
way to do it.

But such a GUC wouldn't affect just one backend. It'd potentially affect
all backends that were doing concurrent modifications that would be
involved since the locks aren't taken. In addition, who would be allowed
to set this value and what constraints would it affect? If it's only
superusers, then it doesn't help for non-superuser restores. If it's
settable by anyone and affects only constraints on tables that user owns
and that refer to tables that user owns it might be okay. If it's
settable by anyone and affects all tables it renders the constraints
meaningless since anyone could break them.

I assume it would be only settable by the super-user. They are mucking
around with pg_class anyway (and have permission to do so), so let them
do it cleanly at least. Allowing non-supers to do it for tables they
own would be OK, I guess. Is there a problem if some of the primary
table is owned by someone else? Not sure.

The problem I have with a super-user only solution is that it doesn't
solve the problem for restores in general. I think we need a mechanism
that works for any user that wants to restore a table (or tables) from
dump(s), so for the dump/restore mechanism I think we should be looking in
that direction.

OK. Let's explore that. What does ownership mean? If I grant all
permissions on an object I own to you, what can you not do? I think
GRANT/REVOKE and ALTER TABLE are the only two, right?

So, if I own it, I am the only one who can ALTER the table to add/remove
the foreign key constraint. So, if I already have a foreign key
constraint on a table, I can easily remove it if I am the owner and do
whatever I want with the table.

Now, the big question is, is there harm in my saying in the system
catalogs that I have a foreign key constraint on a table, when I might
have turned off the constraint via GUC and modified the table so the
foreign key constraint isn't valid?  I think that is the big question
--- is there harm to others in saying something I own has a foreign key,
when it might not?

#80Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Tom Lane (#75)
Re: ADD FOREIGN KEY (was Re: [GENERAL] 7.4Beta)

Tom Lane wrote:

Bruce Momjian <pgman@candle.pha.pa.us> writes:

How many folks are going to remember to do this? Why make it hard for
them? Someone is going to forget too easily. "Why is this restore
taking so long? Oh, I forgot that switch." Or they put it in a login
file and forget it is set. Seems safer for it to be in the dump file.

I disagree. The "how many folks are going to remember to do this"
argument applies just as well to magic pg_dump switches; that's not
a tenable argument against doing it at restore time.

The difference between controlling it at pg_dump time and pg_restore
time is that if you change your mind after having made the dump, it's
too late, if the decision was nailed down in the dump file. In an
upgrade situation it's very likely that you no longer have the option
to re-do your dump, because you already blew away your old installation.

Since there's no performance difference at pg_dump time, I can't see any
advantage to freezing your decision then.

I understand, and if everyone used pg_restore, then adding a flag to
pg_restore to do this would make sense. However, everyone is used to
treating that dump file as a simple dump and throwing it into psql.
Psql doesn't have any special dump flags, so you have to do the
environment variable trick, which you must admit is pretty ugly looking
and prone to typing errors, and easy to forget, because they are used to
invoking psql all the time.

Maybe we need a psql dump-reload flag? Would we be able to do any other
optimizations, like increasing sort_mem or something? That would be a
clean solution, and might open the door to further speedups.

#81Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Tom Lane (#76)
Re: ADD FOREIGN KEY (was Re: [GENERAL] 7.4Beta)

Tom Lane wrote:

Bruce Momjian <pgman@candle.pha.pa.us> writes:

I assume it would be only settable by the super-user.

That might be a good restriction too (on top of my speculation about not
allowing it in postgresql.conf). Only allow it to be SET per-session,

We don't have a way to make something unsettable in postgresql.conf right
now, do we?

and only by a superuser.

See my recent email on this about "ownership". I personally am happy
with super-user only (or db-owner and super-user only). As I said, it
is a question of what documenting a foreign key in the system catalogs
means to folks who don't own the table.

#82Andreas Pflug
pgadmin@pse-consulting.de
In reply to: Bruce Momjian (#73)
Re: ADD FOREIGN KEY (was Re: [GENERAL] 7.4Beta)

Bruce Momjian wrote:

Fact is, folks are doing it anyway by modifying pg_class. I know one
guy who did it in a transaction so he was the only one to see the
triggers disabled! The PostgreSQL cookbook page has an example too.
People are always asking how to do this. Why not just make it setable
only by the super-user.

FYI, TODO has:

* Allow triggers to be disabled [trigger]
* With disabled triggers, allow pg_dump to use ALTER TABLE ADD FOREIGN
KEY

For practical reasons, I'd prefer the "disable trigger" not to influence
fk triggers, or at least to have such a default flavor. When restoring a
database, you might consider the data as consistent and complete, so no
triggers and ref checks are needed at all. But in the cases of some kind
of application data import, you might like the data to have fk ref
checked, but don't want to trigger all user triggers.
The implementation of fk checking by triggers should normally be hidden
from the user.

Regards,
Andreas

#83Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#80)
Re: ADD FOREIGN KEY (was Re: [GENERAL] 7.4Beta)

Bruce Momjian <pgman@candle.pha.pa.us> writes:

Tom Lane wrote:

Since there's no performance difference at pg_dump time, I can't see any
advantage to freezing your decision then.

I understand, and if everyone used pg_restore, then adding a flag to
pg_restore to do this would make sense. However, everyone is used to
treating that dump file as a simple dump and throwing it into psql.

So? A GUC variable could be set equally easily either way. In fact
more so.

Psql doesn't have any special dump flags, so you have to do the
environment variable trick,

You forgot SET ...

regards, tom lane
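
That is, the switch could simply appear as an ordinary SQL statement at the top of whatever is fed to psql, with no environment-variable trick; the GUC name is again the hypothetical one from earlier in the thread:

```sql
-- First statement of the reload session, ahead of the dump contents.
SET disable_fk_verification = on;
-- ... schema, COPY data, and ALTER TABLE ADD FOREIGN KEY follow ...
```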

#84Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#81)
Re: ADD FOREIGN KEY (was Re: [GENERAL] 7.4Beta)

Bruce Momjian <pgman@candle.pha.pa.us> writes:

Tom Lane wrote:

That might be a good restriction too (on top of my speculation about not
allowing it in postgresql.conf). Only allow it to be SET per-session,

We don't have a way to make something unsettable in postgresql.conf right
now, do we?

Yeah, we do --- see GUC_DISALLOW_IN_FILE. The existing variable
zero_damaged_pages has the same restrictions we're talking about here,
and for largely the same reasons: you can shoot yourself in the foot
with it.

regards, tom lane

#85Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#79)
Re: ADD FOREIGN KEY (was Re: [GENERAL] 7.4Beta)

Bruce Momjian <pgman@candle.pha.pa.us> writes:

Stephan Szabo wrote:

The problem I have with a super-user only solution is that it doesn't
solve the problem for restores in general.

OK. Let's explore that. What does ownership mean?

It does not normally mean the ability to bypass consistency checks;
for example, if you put a CHECK constraint on a table, you don't get
to violate it because you own the table. (Of course superuserness
doesn't let you do so either...)

I see where Stephan is coming from, but in my mind disabling consistency
checks ought to be a feature reserved to the DBA (ie superuser), who
presumably has some clue about the tradeoffs involved. I don't think
ordinary users should be able to do it. If we can get the cost of
performing the initial check down to something reasonable (and I don't
mean "near zero", I mean something that's small in comparison to the
other costs of loading data and creating indexes), then I think we've
done as much as we should do for ordinary users.

regards, tom lane

#86Stephan Szabo
sszabo@megazone.bigpanda.com
In reply to: Tom Lane (#85)
Re: ADD FOREIGN KEY (was Re: [GENERAL] 7.4Beta)

On Tue, 30 Sep 2003, Tom Lane wrote:

I see where Stephan is coming from, but in my mind disabling consistency
checks ought to be a feature reserved to the DBA (ie superuser), who
presumably has some clue about the tradeoffs involved. I don't think
ordinary users should be able to do it. If we can get the cost of
performing the initial check down to something reasonable (and I don't
mean "near zero", I mean something that's small in comparison to the
other costs of loading data and creating indexes), then I think we've
done as much as we should do for ordinary users.

Limiting the cases under which constraint ignoring works is certainly
fine by me, but I was assuming that we were trying to make it accessible
to any restore. If that's not true, then we don't need to worry about that
part of the issue.

As a side note, in the partial implementation I'd already done, I noticed
a potential problem if the person doing the alter table didn't have read
permissions on the pktable. I'd written it to bail and do the slow check
in that case (well actually in most error cases that didn't themselves
cause an elog), does anyone have a better idea?

#87Tom Lane
tgl@sss.pgh.pa.us
In reply to: Stephan Szabo (#86)
Re: ADD FOREIGN KEY (was Re: [GENERAL] 7.4Beta)

Stephan Szabo <sszabo@megazone.bigpanda.com> writes:

As a side note, in the partial implementation I'd already done, I noticed
a potential problem if the person doing the alter table didn't have read
permissions on the pktable. I'd written it to bail and do the slow check
in that case (well actually in most error cases that didn't themselves
cause an elog), does anyone have a better idea?

Wouldn't all the subsequent triggers fail also in such a case? (For
that matter, wouldn't the existing implementation of the initial check
fail?) I can't see a reason to expend code to avoid failing here. It's
not very sensible to be able to create an FK on a table you don't have
read permission for.

regards, tom lane

#88Stephan Szabo
sszabo@megazone.bigpanda.com
In reply to: Tom Lane (#87)
Re: ADD FOREIGN KEY (was Re: [GENERAL] 7.4Beta)

On Tue, 30 Sep 2003, Tom Lane wrote:

Stephan Szabo <sszabo@megazone.bigpanda.com> writes:

As a side note, in the partial implementation I'd already done, I noticed
a potential problem if the person doing the alter table didn't have read
permissions on the pktable. I'd written it to bail and do the slow check
in that case (well actually in most error cases that didn't themselves
cause an elog), does anyone have a better idea?

Wouldn't all the subsequent triggers fail also in such a case? (For
that matter, wouldn't the existing implementation of the initial check
fail?) I can't see a reason to expend code to avoid failing here. It's

No, because the triggers change permissions to the owner of the
appropriate (either fk or pk) table before running the query, so the old
method works as well as the final constraint would. However, if the two
owners are not the same, you can't switch to both of them during the single query.

not very sensible to be able to create an FK on a table you don't have
read permission for.

IIRC, you only need references permissions to make an fk constraint, not
select.

#89Jan Wieck
JanWieck@Yahoo.com
In reply to: Stephan Szabo (#86)
Re: ADD FOREIGN KEY (was Re: [GENERAL] 7.4Beta)

Stephan Szabo wrote:

On Tue, 30 Sep 2003, Tom Lane wrote:

I see where Stephan is coming from, but in my mind disabling consistency
checks ought to be a feature reserved to the DBA (ie superuser), who
presumably has some clue about the tradeoffs involved. I don't think
ordinary users should be able to do it. If we can get the cost of
performing the initial check down to something reasonable (and I don't
mean "near zero", I mean something that's small in comparison to the
other costs of loading data and creating indexes), then I think we've
done as much as we should do for ordinary users.

Limiting the cases under which constraint ignoring works is certainly
fine by me, but I was assuming that we were trying to make it accessible
to any restore. If that's not true, then we don't need to worry about that
part of the issue.

It is not true.

Fact is that restoring can require more rights than creating the dump.
That is already the case if you want to restore anything that contains
objects owned by different users. Trying to enable everyone who can take
a dump also to restore it, by whatever mechanism, gives someone the
right to revert things in time and create a situation (consistent or
not) that he could not (re)create without doing dump/restore. This is
wrong and should not be possible.

Jan


#90Stephan Szabo
sszabo@megazone.bigpanda.com
In reply to: Jan Wieck (#89)
Re: ADD FOREIGN KEY (was Re: [GENERAL] 7.4Beta)

On Tue, 30 Sep 2003, Jan Wieck wrote:

Stephan Szabo wrote:

On Tue, 30 Sep 2003, Tom Lane wrote:

I see where Stephan is coming from, but in my mind disabling consistency
checks ought to be a feature reserved to the DBA (ie superuser), who
presumably has some clue about the tradeoffs involved. I don't think
ordinary users should be able to do it. If we can get the cost of
performing the initial check down to something reasonable (and I don't
mean "near zero", I mean something that's small in comparison to the
other costs of loading data and creating indexes), then I think we've
done as much as we should do for ordinary users.

Limiting the cases under which constraint ignoring works is certainly
fine by me, but I was assuming that we were trying to make it accessible
to any restore. If that's not true, then we don't need to worry about that
part of the issue.

It is not true.

Fact is that restoring can require more rights than creating the dump.
That is already the case if you want to restore anything that contains
objects owned by different users. Trying to enable everyone who can take
a dump also to restore it, by whatever mechanism, gives someone the
right to revert things in time and create a situation (consistent or
not) that he could not (re)create without doing dump/restore. This is
wrong and should not be possible.

I think this is a larger argument than the one that was being discussed
above. Given a dump of objects I own, can I restore them without the FK
check being run when I ALTER TABLE ... ADD a FOREIGN KEY constraint? If
the answer to that is no, then the option can be put in as a
superuser-only option and it's relatively easy. If the answer is yes,
then there are additional issues that need to be resolved.
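
[For context, this is roughly the restore pattern the thread is arguing
about, as pg_dump orders it; table and column names below are made up
for illustration, and the proposed "skip the initial check" option never
got a settled syntax in this discussion:]

```sql
-- A restore loads the data first, then adds the constraint.
CREATE TABLE pktable (id integer PRIMARY KEY);
CREATE TABLE fktable (pkid integer);

COPY pktable FROM stdin;   -- bulk data load
COPY fktable FROM stdin;

-- This is where the restore time goes: as of 7.3/7.4 the
-- ADD CONSTRAINT verifies every existing row of fktable
-- against pktable before the constraint is accepted.
ALTER TABLE fktable
    ADD CONSTRAINT fktable_pkid_fkey
    FOREIGN KEY (pkid) REFERENCES pktable (id);
```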

#91Jan Wieck
JanWieck@Yahoo.com
In reply to: Stephan Szabo (#90)
Re: ADD FOREIGN KEY (was Re: [GENERAL] 7.4Beta)

Stephan Szabo wrote:

On Tue, 30 Sep 2003, Jan Wieck wrote:

Stephan Szabo wrote:

On Tue, 30 Sep 2003, Tom Lane wrote:

I see where Stephan is coming from, but in my mind disabling consistency
checks ought to be a feature reserved to the DBA (ie superuser), who
presumably has some clue about the tradeoffs involved. I don't think
ordinary users should be able to do it. If we can get the cost of
performing the initial check down to something reasonable (and I don't
mean "near zero", I mean something that's small in comparison to the
other costs of loading data and creating indexes), then I think we've
done as much as we should do for ordinary users.

Limiting the cases under which constraint ignoring works is certainly
fine by me, but I was assuming that we were trying to make it accessible
to any restore. If that's not true, then we don't need to worry about that
part of the issue.

It is not true.

Fact is that restoring can require more rights than creating the dump.
That is already the case if you want to restore anything that contains
objects owned by different users. Trying to enable everyone who can take
a dump also to restore it, by whatever mechanism, gives someone the
right to revert things in time and create a situation (consistent or
not) that he could not (re)create without doing dump/restore. This is
wrong and should not be possible.

I think this is a larger argument than the one that was being discussed
above. Given a dump of objects I own, can I restore them without the FK
check being run when I ALTER TABLE ... ADD a FOREIGN KEY constraint? If
the answer to that is no, then the option can be put in as a
superuser-only option and it's relatively easy. If the answer is yes,
then there are additional issues that need to be resolved.

Well, with the CREATE CONSTRAINT TRIGGER you _can_, but we already have
a consensus that we don't _want_ that. Probably we should declare it
deprecated and remove it in 7.5. And the option currently under
discussion is exactly what will cause ALTER TABLE to let you, but IMHO
that _should_ be restricted.

Jan

--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#================================================== JanWieck@Yahoo.com #

#92Manfred Koizar
mkoi-pg@aon.at
In reply to: Christopher Browne (#78)
Re: ADD FOREIGN KEY

On Tue, 30 Sep 2003 08:00:07 -0400, Christopher Browne
<cbbrowne@acm.org> wrote:

I would be pretty "game" for a near-single-user-mode approach that
would turn off some of the usual functionality that we knew we didn't
need because the data source was an already-committed-and-FK-checked
set of data.

Single user mode is a good idea, IMHO. But it should only make sure
that there is not more than one user connected to the database (or to
the postmaster). Everything else should depend on special GUC
variables that are only settable in single user mode:

db=> SET disable-fk-verification = true;
ERROR: "disable-fk-verification" can only be set in single user mode
db=> SET SINGLE USER MODE ON;
ERROR: permission denied
HINT: Must be superuser or owner of database "db".
db=> \c - dbo
You are now connected as new user "dbo".
db=> SET SINGLE USER MODE ON;
ERROR: cannot enter single user mode
HINT: You are not the only user connected to database "db".
-- after other users have logged out ...
db=> SET SINGLE USER MODE ON;
SET
db=> SET disable-fk-verification = true;
SET

Single user mode would also help in several cases where now a
standalone backend is required ...

Servus
Manfred

#93Christopher Browne
cbbrowne@libertyrms.info
In reply to: Bruce Momjian (#73)
Re: ADD FOREIGN KEY

mkoi-pg@aon.at (Manfred Koizar) writes:

On Tue, 30 Sep 2003 08:00:07 -0400, Christopher Browne
<cbbrowne@acm.org> wrote:

I would be pretty "game" for a near-single-user-mode approach that
would turn off some of the usual functionality that we knew we didn't
need because the data source was an already-committed-and-FK-checked
set of data.

Single user mode is a good idea, IMHO. But it should only make sure
that there is not more than one user connected to the database (or
to the postmaster).

Well, there already exists an honest-to-goodness single-user mode,
where you start a postmaster directly.

This is the way that you need to connect to PG in order to be able to
regenerate indexes for any "nailed" system tables.

If I could be certain that a "pg_fast_recovery" program could run
several times faster than the existing approach of "psql <
recoveryfile.sql", then it might well be worthwhile to have something
invoked something like the following:

% zcat /backups/latest_backup.gz | postmaster -D $PGDATA -F -N 0 --fast-recovery-off-ACID --log /tmp/recovery.log mydb

-N 0 means that there won't even be as many as one user connected to
the database.

I would, given an ideal world, prefer to be able to have a connection
or two live during this to let me monitor the DB and even get an early
peek at the data. But if I could save a few hours of recovery time,
it might be livable to lose that.
--
select 'cbbrowne' || '@' || 'libertyrms.info';
<http://dev6.int.libertyrms.com/>
Christopher Browne
(416) 646 3304 x124 (land)

#94Tom Lane
tgl@sss.pgh.pa.us
In reply to: Manfred Koizar (#92)
Re: ADD FOREIGN KEY

Manfred Koizar <mkoi-pg@aon.at> writes:

db=> SET disable-fk-verification = true;
ERROR: "disable-fk-verification" can only be set in single user mode

I don't really see the point of such a restriction. Restricting the
system to a single user has nothing to do with making
disable-fk-verification safer. It would simply be an artificial
restriction making the feature harder to use.

Also, not very long ago we were speculating about the possible value of
parallel restore processes --- while I recall being unimpressed with the
likely gains, I wouldn't want to put a permanent kibosh on the idea by
adopting a philosophy that restores are supposed to be done in
single-user mode.

regards, tom lane

#95Greg Stark
gsstark@mit.edu
In reply to: Christopher Browne (#93)
Re: ADD FOREIGN KEY

Christopher Browne <cbbrowne@libertyrms.info> writes:

I would, given an ideal world, prefer to be able to have a connection
or two live during this to let me monitor the DB and even get an early
peek at the data.

On that note, how hard would it be to implement a read-dirty mode in postgres?
This would be useful for a few things; the only one I can think of is
progress indicators for long-running updates/inserts.

It seems like it falls naturally out of the MVCC algorithm, simply have it set
the transaction id of the current transaction to be a magic value that
compares greater than any transaction id. So all uncommitted transactions are
seen as having been committed in the past.

I don't see any real need for updates or inserts, but reasonable semantics for
them also fall out of MVCC. Any updates or inserts should be seen as being
committed infinitely far in the future. So they can only be seen by other
read-dirty transactions.

The main use for this that I see is doing select count(*) on tables being
imported or inserted into. Or perhaps being able to peek at records being
updated by another session in a long-running job.

If nothing else it'll save the load on the mailing list every time people ask
how to calculate how much longer their data load is going to take based on the
size of the files in the postgres data directory.

I'm sure I'm skipping a few steps. What I said doesn't quite make sense on its
own. I think I'm missing some key elements of the postgres MVCC system.

--
greg
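
[A side note on the SQL-level surface such a mode might use: the SQL
standard already names a READ UNCOMMITTED isolation level, and
PostgreSQL accepts the syntax but, lacking dirty reads, treats it as
READ COMMITTED. The table name below is made up:]

```sql
-- Standard isolation-level syntax PostgreSQL already parses;
-- READ UNCOMMITTED is silently promoted to READ COMMITTED
-- because dirty reads are not implemented:
BEGIN;
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
SELECT count(*) FROM sometable;  -- still sees only committed rows
COMMIT;
```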

#96Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Jan Wieck (#91)
Re: ADD FOREIGN KEY (was Re: [GENERAL] 7.4Beta)

Jan Wieck wrote:

I think this is a larger argument than the one that was being discussed
above. Given a dump of objects I own, can I restore them without the FK
check being run when I ALTER TABLE ... ADD a FOREIGN KEY constraint? If
the answer to that is no, then the option can be put in as a
superuser-only option and it's relatively easy. If the answer is yes,
then there are additional issues that need to be resolved.

Well, with the CREATE CONSTRAINT TRIGGER you _can_, but we already have
a consensus that we don't _want_ that. Probably we should declare it
deprecated and remove it in 7.5. And the option currently under
discussion is exactly what will cause ALTER TABLE to let you, but IMHO
that _should_ be restricted.

Added to TODO:

* Remove CREATE CONSTRAINT TRIGGER

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
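
[For reference, this is approximately what a pre-7.3 pg_dump emitted for
a foreign key; the exact trigger names and argument lists varied by
version, and the table/column names here are illustrative. The point of
the complaint is visible: the statement installs the per-row enforcement
trigger without ever checking the rows already loaded.]

```sql
-- Approximate pre-7.3 pg_dump output for one side of a foreign key.
-- No verification of existing rows is performed:
CREATE CONSTRAINT TRIGGER "<unnamed>"
    AFTER INSERT OR UPDATE ON fktable
    FROM pktable NOT DEFERRABLE INITIALLY IMMEDIATE
    FOR EACH ROW
    EXECUTE PROCEDURE "RI_FKey_check_ins"
        ('<unnamed>', 'fktable', 'pktable', 'UNSPECIFIED', 'pkid', 'id');
```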
#97Christopher Kings-Lynne
chriskl@familyhealth.com.au
In reply to: Bruce Momjian (#96)
Re: ADD FOREIGN KEY (was Re: [GENERAL] 7.4Beta)

Well, with the CREATE CONSTRAINT TRIGGER you _can_, but we already have
a consensus that we don't _want_ that. Probably we should declare it
deprecated and remove it in 7.5. And the option currently under
discussion is exactly what will cause ALTER TABLE to let you, but IMHO
that _should_ be restricted.

How can we ever remove it - what about people upgrading from 7.0, 7.1,
7.2? Also, people upgrading from 7.3 who've never heard of adddepend...

Chris

#98Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Christopher Kings-Lynne (#97)
Re: ADD FOREIGN KEY (was Re: [GENERAL] 7.4Beta)

Christopher Kings-Lynne wrote:

Well, with the CREATE CONSTRAINT TRIGGER you _can_, but we already have
a consensus that we don't _want_ that. Probably we should declare it
deprecated and remove it in 7.5. And the option currently under
discussion is exactly what will cause ALTER TABLE to let you, but IMHO
that _should_ be restricted.

How can we ever remove it - what about people upgrading from 7.0, 7.1,
7.2? Also, people upgrading from 7.3 who've never heard of adddepend...

Not sure. We can remove documentation about it, at least.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
#99Jan Wieck
JanWieck@Yahoo.com
In reply to: Bruce Momjian (#98)
Re: ADD FOREIGN KEY (was Re: [GENERAL] 7.4Beta)

Bruce Momjian wrote:

Christopher Kings-Lynne wrote:

Well, with the CREATE CONSTRAINT TRIGGER you _can_, but we already have
a consensus that we don't _want_ that. Probably we should declare it
deprecated and remove it in 7.5. And the option currently under
discussion is exactly what will cause ALTER TABLE to let you, but IMHO
that _should_ be restricted.

How can we ever remove it - what about people upgrading from 7.0, 7.1,
7.2? Also, people upgrading from 7.3 who've never heard of adddepend...

Not sure. We can remove documentation about it, at least.

If the idea is to support any 7.n -> 7.m (where n < m) upgrade directly,
then it's IMHO time for 8.0, clearly stating that 7.x -> 8.y is only
supported as 7.x -> 7.3 -> 8.0 -> 8.y and "you're on your own with any
other attempt".

Don't get this wrong, I am a big friend of easy upgrades. But I am not a
big friend of making improvements impossible through "backward
compatibility forever". It was the "backward compatibility" with CP/M-80
(v2.2) that caused MS-DOS 7.0 to have a maximum command-line length of
127 characters ... that was taking compatibility too far. Well, M$ took
it too far the other way from there and is compatible with nothing any
more, not even with themselves ... but at least they learned from that
mistake.

Jan

--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#================================================== JanWieck@Yahoo.com #