7.4Beta
Just loaded up 7.4Beta on a test system to try and work out what the major
changes/improvements are. I also started up pg_vacuum, as this is now
working very well on our semi-live system (semi-live because we have not
finished developing the system yet).
Anyway, after trying to throw our 7.3 config file at it, which crashed it
badly, I resorted to the defaults; some of the settings in the config file
have changed. (Can someone make the config files backward compatible so
that they at least ignore deprecated settings!)
I threw last night's backup at it. The data went in in about half an hour,
then the constraints went in and they took an age: about 2 hours.
Is there any way to speed up the database constraint code? Because quite
frankly, at the current speed you're probably better off without the
constraints. (Same problem with 7.3, come to think of it.)
Otherwise 7.4 seems fine.
Question: when are replication and rolling backup coming? I want to be able
to take the last backup and a list of all the updates since then and get
back to the current database quickly.
Peter Childs
[ Beta discussion moved to hackers.]
Peter Childs wrote:
> Just loaded up 7.4Beta on a test system to try and work out what the major
> changes/improvements are. I also started up pg_vacuum, as this is now
> working very well on our semi-live system (semi-live because we have not
> finished developing the system yet).
> Anyway, after trying to throw our 7.3 config file at it, which crashed it
> badly, I resorted to the defaults; some of the settings in the config file
> have changed. (Can someone make the config files backward compatible so
> that they at least ignore deprecated settings!)
There will be a list of configuration name and other changes in a special
section soon.
> I threw last night's backup at it. The data went in in about half an hour,
> then the constraints went in and they took an age: about 2 hours.
> Is there any way to speed up the database constraint code? Because quite
> frankly, at the current speed you're probably better off without the
> constraints. (Same problem with 7.3, come to think of it.)
> Otherwise 7.4 seems fine.
> Question: when are replication and rolling backup coming? I want to be able
> to take the last backup and a list of all the updates since then and get
> back to the current database quickly.
Rolling backups should be in 7.5, I hope.
--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073
> I threw last night's backup at it. The data went in in about half an hour,
> then the constraints went in and they took an age: about 2 hours.
> Is there any way to speed up the database constraint code? Because quite
> frankly, at the current speed you're probably better off without the
> constraints. (Same problem with 7.3, come to think of it.)
I can also attest to the horrendously long time it takes to restore the ADD
FOREIGN KEY section...
Chris
On Fri, 15 Aug 2003, Christopher Kings-Lynne wrote:
> > I threw last night's backup at it. The data went in in about half an
> > hour, then the constraints went in and they took an age: about 2 hours.
> > Is there any way to speed up the database constraint code? Because quite
> > frankly, at the current speed you're probably better off without the
> > constraints. (Same problem with 7.3, come to think of it.)
>
> I can also attest to the horrendously long time it takes to restore the
> ADD FOREIGN KEY section...
That really needs to be rewritten to do a single check over the table
rather than running the constraint for every row. I keep meaning to get
around to it and never actually do. :( I'm not sure that in practice
you'll get a better plan at restore time depending on what the default
statistics give you.
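The per-row-versus-single-check distinction can be sketched in miniature. This is an illustration only, with Python and SQLite standing in for the database: the pk/fk tables and their data are invented, and none of it reflects PostgreSQL internals.

```python
import sqlite3

# Invented pk/fk tables, purely to contrast the two validation styles.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE pk (key INTEGER PRIMARY KEY)")
cur.execute("CREATE TABLE fk (key INTEGER)")
cur.executemany("INSERT INTO pk VALUES (?)", [(i,) for i in range(100)])
cur.executemany("INSERT INTO fk VALUES (?)", [(i % 120,) for i in range(1000)])

# Per-row style: one lookup per referencing row, roughly what running the
# constraint trigger for every row amounts to.
violations_per_row = 0
for (k,) in cur.execute("SELECT key FROM fk").fetchall():
    if k is not None:
        cur.execute("SELECT 1 FROM pk WHERE key = ?", (k,))
        if cur.fetchone() is None:
            violations_per_row += 1

# Set-based style: one query over the whole table.
(violations_single,) = cur.execute(
    "SELECT count(*) FROM fk WHERE key IS NOT NULL "
    "AND NOT EXISTS (SELECT 1 FROM pk WHERE pk.key = fk.key)"
).fetchone()

assert violations_per_row == violations_single
```

Both styles find the same violations; the difference is purely how many statements get executed to find them.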
> > I can also attest to the horrendously long time it takes to restore the
> > ADD FOREIGN KEY section...
>
> That really needs to be rewritten to do a single check over the table
> rather than running the constraint for every row. I keep meaning to get
> around to it and never actually do. :( I'm not sure that in practice
> you'll get a better plan at restore time depending on what the default
> statistics give you.
Surely in the default case it would reduce to using the new hashed IN()
feature, so it'd be a lot faster?
Chris
On Thu, 14 Aug 2003, Stephan Szabo wrote:
> That really needs to be rewritten to do a single check over the table
> rather than running the constraint for every row. I keep meaning to get
> around to it and never actually do. :( I'm not sure that in practice
> you'll get a better plan at restore time depending on what the default
> statistics give you.
Perhaps it would be easier to allow SKIP VALIDATION (or something) with
ALTER TABLE ADD ...., which can set FkConstraint->skip_validation. If we're
just handling pg_dump output, then presumably the data is already
validated. On the other hand, it might encourage users to bypass FKs
when they feel like it...
Thanks,
Gavin
On Fri, 15 Aug 2003, Christopher Kings-Lynne wrote:
> > > I can also attest to the horrendously long time it takes to restore
> > > the ADD FOREIGN KEY section...
> >
> > That really needs to be rewritten to do a single check over the table
> > rather than running the constraint for every row. I keep meaning to get
> > around to it and never actually do. :( I'm not sure that in practice
> > you'll get a better plan at restore time depending on what the default
> > statistics give you.
>
> Surely in the default case it would reduce to using the new hashed IN()
> feature, so it'd be a lot faster?
If we wrote the query using IN, that'd be the hope (I've not played with it
enough to guarantee that).
However, on a simple test comparing
select * from fk where not exists(select * from pk where pk.key=fk.key)
and key is not null;
(doing seq scan/subplan doing index scan - which is probably close to the
current system)
and
select * from fk where key in (select key from pk) and key is not null
on a pk table with 100k rows and fk table with 1m rows gives me a
difference of about 2x on my machine.
But that's with a single column int4 key, I haven't tried multi-column
keys or larger key types.
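As a sanity check on the two formulations, the EXISTS-style violation query and the IN-style membership query should partition the non-NULL referencing rows between them. A toy illustration (Python with SQLite, invented single-column int data; this says nothing about relative plan cost in PostgreSQL):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE pk (key INTEGER PRIMARY KEY)")
cur.execute("CREATE TABLE fk (key INTEGER)")
cur.executemany("INSERT INTO pk VALUES (?)", [(i,) for i in range(50)])
cur.executemany("INSERT INTO fk VALUES (?)",
                [(i % 60,) for i in range(300)] + [(None,)] * 10)

# Violations, NOT EXISTS formulation (close to the per-row check's logic).
(bad,) = cur.execute(
    "SELECT count(*) FROM fk WHERE NOT EXISTS "
    "(SELECT 1 FROM pk WHERE pk.key = fk.key) AND key IS NOT NULL"
).fetchone()

# Valid rows, IN formulation (the shape that could use the hashed IN feature).
(good,) = cur.execute(
    "SELECT count(*) FROM fk WHERE key IN (SELECT key FROM pk) "
    "AND key IS NOT NULL"
).fetchone()

(nonnull,) = cur.execute(
    "SELECT count(*) FROM fk WHERE key IS NOT NULL"
).fetchone()

# The two predicates split the non-NULL rows exactly between them.
assert bad + good == nonnull
```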
On Fri, 15 Aug 2003, Gavin Sherry wrote:
> On Thu, 14 Aug 2003, Stephan Szabo wrote:
> > That really needs to be rewritten to do a single check over the table
> > rather than running the constraint for every row. I keep meaning to get
> > around to it and never actually do. :( I'm not sure that in practice
> > you'll get a better plan at restore time depending on what the default
> > statistics give you.
>
> Perhaps it would be easier to allow SKIP VALIDATION (or something) with
> ALTER TABLE ADD ...., which can set FkConstraint->skip_validation. If
> we're just handling pg_dump output, then presumably the data is already
> validated. On the other hand, it might encourage users to bypass FKs
> when they feel like it...
We've talked about stuff like that in the past, but we seem to generally
get stuck on how to specify it. If we add it to ALTER TABLE ADD as an
option, then we're generating statements that are almost like a standard
SQL statement, but not quite, and some people didn't like that. A SET
option that only affected ALTER TABLE ADD CONSTRAINT wouldn't be too bad,
but I'd be worried about making it apply to the normal checks as well
(which I believe was suggested the last time this came up). In any case,
making the full checks faster for when you really do care isn't a bad
plan. :)
> We've talked about stuff like that in the past, but we seem to generally
> get stuck on how to specify it. If we add it to ALTER TABLE ADD as an
> option, then we're generating statements that are almost like a standard
> SQL statement, but not quite, and some people didn't like that. A SET
> option that only affected ALTER TABLE ADD CONSTRAINT wouldn't be too bad,
> but I'd be worried about making it apply to the normal checks as well
> (which I believe was suggested the last time this came up). In any case,
> making the full checks faster for when you really do care isn't a bad
> plan. :)
How about having an "I'm restoring" SET variable:
SET data_restore = true;
which means (among other things) that FKs aren't checked?
Chris
> We've talked about stuff like that in the past, but we seem to generally
> get stuck on how to specify it. If we add it to ALTER TABLE ADD as an
> option, then we're generating statements that are almost like a standard
> SQL statement, but not quite, and some people didn't like that. A SET
> option that only affected ALTER TABLE ADD CONSTRAINT wouldn't be too bad,
> but I'd be worried about making it apply to the normal checks as well
> (which I believe was suggested the last time this came up). In any case,
> making the full checks faster for when you really do care isn't a bad
> plan. :)
Or, this:
1. Add the FK to the table BEFORE COPYing data
2. Use the old update blah set reltriggers = 0 trick
3. restore the data
4. Undo step 2
Chris
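A sketch of what a script for those four steps might emit, as an assumption-laden illustration only: `my_table` is a placeholder name, step 1 (creating the table with its FK before loading) is assumed to have happened already, the pg_class update is superuser-only, and the re-enable statement follows (from memory) the shape that old pg_dump disable-triggers output used.

```python
def fk_restore_script(table):
    """Generate the SQL for Chris's reltriggers trick (illustrative only)."""
    return [
        # Step 2: hide the table's triggers (including FK constraint
        # triggers) so the load isn't checked row by row.
        f"UPDATE pg_class SET reltriggers = 0 WHERE relname = '{table}';",
        # Step 3: bulk-load the data.
        f"COPY {table} FROM stdin;",
        # Step 4: re-enable by counting the real triggers back in.
        f"UPDATE pg_class SET reltriggers = (SELECT count(*) FROM "
        f"pg_trigger WHERE pg_class.oid = tgrelid) "
        f"WHERE relname = '{table}';",
    ]

stmts = fk_restore_script("my_table")
assert stmts[1] == "COPY my_table FROM stdin;"
```

The obvious downside, as discussed in this thread, is that the pg_class update only works for a superuser, and it skips validation entirely rather than making it faster.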
On Fri, 15 Aug 2003, Christopher Kings-Lynne wrote:
> > We've talked about stuff like that in the past, but we seem to generally
> > get stuck on how to specify it. If we add it to ALTER TABLE ADD as an
> > option, then we're generating statements that are almost like a standard
> > SQL statement, but not quite, and some people didn't like that. A SET
> > option that only affected ALTER TABLE ADD CONSTRAINT wouldn't be too
> > bad, but I'd be worried about making it apply to the normal checks as
> > well (which I believe was suggested the last time this came up). In any
> > case, making the full checks faster for when you really do care isn't a
> > bad plan. :)
>
> Or, this:
>
> 1. Add the FK to the table BEFORE COPYing data
> 2. Use the old update blah set reltriggers = 0 trick
> 3. Restore the data
> 4. Undo step 2
The problem with that is that I think the reltriggers=0 trick only works
if you're a superuser; I thought that's why the trigger disabling became
optional. A SET that affected only ALTER TABLE ADD CONSTRAINT would
probably be reasonable in the dump (and presumably easy to do). It'd also
carry over to future cases where we separate out some check constraints
(for example, ones that refer directly or indirectly to the same table in
a subselect).
> > 1. Add the FK to the table BEFORE COPYing data
> > 2. Use the old update blah set reltriggers = 0 trick
> > 3. Restore the data
> > 4. Undo step 2
>
> The problem with that is that I think the reltriggers=0 trick only works
> if you're a superuser; I thought that's why the trigger disabling became
> optional. A SET that affected only ALTER TABLE ADD CONSTRAINT would
> probably be reasonable in the dump (and presumably easy to do). It'd also
> carry over to future cases where we separate out some check constraints
> (for example, ones that refer directly or indirectly to the same table in
> a subselect).
Well yes, this would be a superuser-only ability. Are you worried about
people restoring dumps as non-superusers?
Chris
On Fri, 15 Aug 2003, Christopher Kings-Lynne wrote:
> > > 1. Add the FK to the table BEFORE COPYing data
> > > 2. Use the old update blah set reltriggers = 0 trick
> > > 3. Restore the data
> > > 4. Undo step 2
> >
> > The problem with that is that I think the reltriggers=0 trick only works
> > if you're a superuser; I thought that's why the trigger disabling became
> > optional. A SET that affected only ALTER TABLE ADD CONSTRAINT would
> > probably be reasonable in the dump (and presumably easy to do). It'd
> > also carry over to future cases where we separate out some check
> > constraints (for example, ones that refer directly or indirectly to the
> > same table in a subselect).
>
> Well yes, this would be a superuser-only ability. Are you worried about
> people restoring dumps as non-superusers?
Basically, yes (you might dump only some tables that you own, for
example). I think that's why the data-only dumps no longer do the
reltriggers thing by default and you need an option to get at it. We could
make a similar option for this case, but it'd only work when restored by a
superuser.
Stephan Szabo wrote:
> On Fri, 15 Aug 2003, Christopher Kings-Lynne wrote:
> > > I threw last night's backup at it. The data went in in about half an
> > > hour, then the constraints went in and they took an age: about 2 hours.
> > > Is there any way to speed up the database constraint code? Because
> > > quite frankly, at the current speed you're probably better off without
> > > the constraints. (Same problem with 7.3, come to think of it.)
> >
> > I can also attest to the horrendously long time it takes to restore the
> > ADD FOREIGN KEY section...
>
> That really needs to be rewritten to do a single check over the table
> rather than running the constraint for every row. I keep meaning to get
> around to it and never actually do. :( I'm not sure that in practice
> you'll get a better plan at restore time depending on what the default
> statistics give you.
This is clearly a case for a statement-level trigger, as soon as
affected rows can be identified.
One remark on that enable/disable triggers stuff: from a user's
perspective, I wouldn't consider a constraint trigger a trigger, so if I
disabled all triggers on a table, I would still expect all constraints to
be checked.
Regards,
Andreas
Stephan Szabo <sszabo@megazone.bigpanda.com> writes:
> On Fri, 15 Aug 2003, Christopher Kings-Lynne wrote:
> > I can also attest to the horrendously long time it takes to restore the
> > ADD FOREIGN KEY section...
>
> That really needs to be rewritten to do a single check over the table
> rather than running the constraint for every row. I keep meaning to get
> around to it and never actually do. :( I'm not sure that in practice
> you'll get a better plan at restore time depending on what the default
> statistics give you.
In simple cases I think that the creation of indexes would be enough to
get you a passable plan --- CREATE INDEX does update pg_class.reltuples,
so the planner will know how big the tables are, and for single-column
primary keys the existence of a unique index is enough to cue the
planner that the column is unique, even without any ANALYZE stats.
Those are the biggest levers on the plan choice.
This assumes that pg_dump always dumps CREATE INDEX before ADD FOREIGN
KEY; I'm not certain if there's anything to enforce that at the
moment...
I assume what you have in mind is to replace
validateForeignKeyConstraint() with something that does a join of the
two tables via an SPI command. But supposing that we want to keep the
present ability to report (one of) the failing key values, it seems
like the query has to look like
SELECT keycolumns FROM referencing_table WHERE
keycolumns NOT IN (SELECT refcols FROM referenced_table);
which is only gonna do the right thing for one of the MATCH styles
(not sure which, offhand ... actually it may not do the right thing
for any match style if there are nulls in referenced_table ...).
How would you make it work for all the MATCH styles? And will it
really be all that efficient? (NOT IN is a lot more circumscribed
than IN.)
regards, tom lane
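Tom's worry about NULLs in referenced_table is easy to reproduce concretely. A minimal sketch (Python with SQLite, invented data; nothing here reflects PostgreSQL's planner): once the subselect can produce a NULL, `keycol NOT IN (...)` evaluates to unknown rather than true for an unmatched value, so the naive query silently reports no violations, while an IS NOT NULL guard in the subselect restores the intended behaviour.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE referenced_table (refcol INTEGER)")
cur.execute("CREATE TABLE referencing_table (keycol INTEGER)")
cur.executemany("INSERT INTO referenced_table VALUES (?)",
                [(1,), (2,), (None,)])
cur.execute("INSERT INTO referencing_table VALUES (99)")  # a real violation

# Naive NOT IN: "99 NOT IN (1, 2, NULL)" is unknown, so the violating
# row is NOT reported.
naive = cur.execute(
    "SELECT keycol FROM referencing_table WHERE keycol NOT IN "
    "(SELECT refcol FROM referenced_table)"
).fetchall()

# With the IS NOT NULL guard, the violation is found.
guarded = cur.execute(
    "SELECT keycol FROM referencing_table WHERE keycol NOT IN "
    "(SELECT refcol FROM referenced_table WHERE refcol IS NOT NULL)"
).fetchall()

assert naive == []
assert guarded == [(99,)]
```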
Stephan Szabo <sszabo@megazone.bigpanda.com> writes:
> select * from fk where not exists(select * from pk where pk.key=fk.key)
> and key is not null;
> (doing seq scan/subplan doing index scan - which is probably close to the
> current system)
Actually, even that would probably be noticeably better than the current
system. I haven't profiled it (someone should) but I suspect that
executor startup/shutdown time is a huge hit. Even though the trigger
is caching a plan, it has to instantiate that plan for each referencing
tuple --- and the executor is not designed for quick startup/shutdown.
(Of course, this would become less relevant if the triggers got
rewritten to not go through SPI ...)
regards, tom lane
On Fri, 15 Aug 2003, Tom Lane wrote:
> Stephan Szabo <sszabo@megazone.bigpanda.com> writes:
> > select * from fk where not exists(select * from pk where pk.key=fk.key)
> > and key is not null;
> > (doing seq scan/subplan doing index scan - which is probably close to
> > the current system)
>
> Actually, even that would probably be noticeably better than the current
> system. I haven't profiled it (someone should) but I suspect that
> executor startup/shutdown time is a huge hit. Even though the trigger
> is caching a plan, it has to instantiate that plan for each referencing
> tuple --- and the executor is not designed for quick startup/shutdown.
Yeah, but it was pretty much the best I could do testing on the command
line. And it was still a fair bit more expensive than using IN (my tests
on various key types showed anywhere from 15% to 300% better speed for IN
over EXISTS for this).
On Fri, 15 Aug 2003, Andreas Pflug wrote:
> Stephan Szabo wrote:
> > That really needs to be rewritten to do a single check over the table
> > rather than running the constraint for every row. I keep meaning to get
> > around to it and never actually do. :( I'm not sure that in practice
> > you'll get a better plan at restore time depending on what the default
> > statistics give you.
>
> This is clearly a case for a statement-level trigger, as soon as
> affected rows can be identified.
Well, I think single inserts might be more expensive (because the query is
more involved for the table joining case) using a statement level trigger,
so we'd probably want to profile the cases.
On Fri, 15 Aug 2003, Tom Lane wrote:
> Stephan Szabo <sszabo@megazone.bigpanda.com> writes:
> > On Fri, 15 Aug 2003, Christopher Kings-Lynne wrote:
> > > I can also attest to the horrendously long time it takes to restore
> > > the ADD FOREIGN KEY section...
> >
> > That really needs to be rewritten to do a single check over the table
> > rather than running the constraint for every row. I keep meaning to get
> > around to it and never actually do. :( I'm not sure that in practice
> > you'll get a better plan at restore time depending on what the default
> > statistics give you.
>
> In simple cases I think that the creation of indexes would be enough to
> get you a passable plan --- CREATE INDEX does update pg_class.reltuples,
> so the planner will know how big the tables are, and for single-column
> primary keys the existence of a unique index is enough to cue the
> planner that the column is unique, even without any ANALYZE stats.
> Those are the biggest levers on the plan choice.
>
> This assumes that pg_dump always dumps CREATE INDEX before ADD FOREIGN
> KEY; I'm not certain if there's anything to enforce that at the
> moment...
>
> I assume what you have in mind is to replace
> validateForeignKeyConstraint() with something that does a join of the
> two tables via an SPI command. But supposing that we want to keep the
> present ability to report (one of) the failing key values, it seems
> like the query has to look like
>
>   SELECT keycolumns FROM referencing_table WHERE
>     keycolumns NOT IN (SELECT refcols FROM referenced_table);
>
> which is only gonna do the right thing for one of the MATCH styles
> (not sure which, offhand ... actually it may not do the right thing
> for any match style if there are nulls in referenced_table ...).
Yes, in practice you'd have to put IS NOT NULL checks in the subselect,
which is fine for the two match types we support, since a referenced row
with a NULL isn't a candidate referenced row for those. I think MATCH
PARTIAL might have to fall back to the repeated check unless we can make
the query work, which would be harder because you only want to compare the
columns for a particular row where the key column is not null, and I can't
think of a query for that that'd be particularly clean and likely to be
fast; then again, I don't think the constraint would be either. :(
It'd probably be:

MATCH unspecified:

  SELECT keycolumns FROM referencing_table WHERE
    (keycolumns) NOT IN (SELECT refcols FROM referenced_table
                         WHERE refcol1 IS NOT NULL AND ...)
    AND keycolumn1 IS NOT NULL AND ...;

MATCH FULL (something like this; I haven't tried it):

  SELECT keycolumns FROM referencing_table WHERE
    ((keycolumns) NOT IN (SELECT refcols FROM referenced_table
                          WHERE refcol1 IS NOT NULL AND ...)
     AND (keycolumn1 IS NOT NULL AND ...))
    OR ((keycolumn1 IS NOT NULL) != (keycolumn2 IS NOT NULL) OR ...);
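The MATCH-unspecified shape can be exercised concretely on a two-column key. An illustrative sketch (Python with SQLite, which needs SQLite 3.15+ for row-value comparisons; the tables and data are invented): a referencing row containing any NULL key column is skipped, referenced rows containing NULLs are filtered out of the subselect, and only the genuine violation surfaces.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE referenced_table (r1 INTEGER, r2 INTEGER)")
cur.execute("CREATE TABLE referencing_table (k1 INTEGER, k2 INTEGER)")
cur.executemany("INSERT INTO referenced_table VALUES (?, ?)",
                [(1, 1), (2, 2), (None, 3)])
cur.executemany("INSERT INTO referencing_table VALUES (?, ?)",
                [(1, 1),     # matches (1, 1): fine
                 (None, 3),  # has a NULL: MATCH unspecified ignores it
                 (9, 9)])    # no match: a real violation

# The MATCH-unspecified validation query, in the shape sketched above.
rows = cur.execute(
    "SELECT k1, k2 FROM referencing_table WHERE "
    "(k1, k2) NOT IN (SELECT r1, r2 FROM referenced_table "
    "                 WHERE r1 IS NOT NULL AND r2 IS NOT NULL) "
    "AND k1 IS NOT NULL AND k2 IS NOT NULL"
).fetchall()

assert rows == [(9, 9)]
```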
> How would you make it work for all the MATCH styles? And will it
> really be all that efficient? (NOT IN is a lot more circumscribed
> than IN.)
I'm not really sure yet. Limited tests seem to show that it'll probably
be as fast if not faster for all reasonable cases, but I'd want to
generate a much larger random data set and actually put it in to make a
fair comparison (maybe temporarily with a set to allow people to try both
cases on real world data). One other advantage here is that we don't need
to get row locks while checking this if we've already gotten the exclusive
table locks on both tables involved. I'm not sure if we do that currently
though.
Stephan Szabo wrote:
> On Fri, 15 Aug 2003, Andreas Pflug wrote:
> > Stephan Szabo wrote:
> > > That really needs to be rewritten to do a single check over the table
> > > rather than running the constraint for every row. I keep meaning to
> > > get around to it and never actually do. :( I'm not sure that in
> > > practice you'll get a better plan at restore time depending on what
> > > the default statistics give you.
> >
> > This is clearly a case for a statement-level trigger, as soon as
> > affected rows can be identified.
>
> Well, I think single inserts might be more expensive (because the query
> is more involved for the table-joining case) using a statement-level
> trigger, so we'd probably want to profile the cases.
This really depends. If a constraint is just a check on the
inserted/updated column, so that no other row needs to be checked, there's
no faster way than the current row trigger. But FK constraints need to
execute a query to retrieve the referenced row, and every RDBMS prefers a
single statement over many rows to many statements over a single row,
because the first profits from optimization. And even if only a single row
is inserted or updated, there's still the need to look up the reference.
Regards,
Andreas