minimal update
For some time I have been working on removing some inefficiencies from a
large DW-type app. This app does a large daily batch update, and this is
what is the major bottleneck. One of the things I have been doing is to
remove unnecessary updates (which are particualrly expensive in our
index-rich setting). Several times now I have wished that there was a
switch on the UPDATE command that said "do minimal instead of maximal
updating". i.e., don't update records with identical replacements. At
the moment I have to write things like:
update tname set foo = bar ... where foo is null or foo <> bar ...
This becomes more than tedious when the update might be setting thirty
or forty fields, and I have to write such tests for each of them. It
would be so much nicer to be able to write something like:
update tname minimally set foo = bar ...
Is this an insane idea, or would it be possible, practical and useful?
cheers
andrew
On Fri, Nov 02, 2007 at 11:49:38AM -0400, Andrew Dunstan wrote:
For some time I have been working on removing some inefficiencies
from a large DW-type app. This app does a large daily batch update,
and this is what is the major bottleneck. One of the things I have
been doing is to remove unnecessary updates (which are particualrly
expensive in our index-rich setting). Several times now I have
wished that there was a switch on the UPDATE command that said "do
minimal instead of maximal updating". i.e., don't update records
with identical replacements. At the moment I have to write things
like:update tname set foo = bar ... where foo is null or foo <> bar
...
One way I've done this is make RULEs which basically drop non-updating
"UPDATEs" on the floor.
CREATE RULE foo_drop_empty_updates AS
ON UPDATE TO foo
WHERE ROW(OLD.*)::foo IS NOT DISTINCT FROM ROW(NEW.*)::foo
DO INSTEAD NOTHING;
It's pretty easy to automate rule creation, but since Postgres doesn't
have DDL triggers, it's also a bit of a foot gun.
By the way, the above has what I think of as an infelicity in 8.2.5,
namely that you need non-obvious contortions to get it to work. I'm
thinking OLD IS NOT DISTINCT FROM NEW should Just Work(TM).
This becomes more than tedious when the update might be setting thirty
or forty fields, and I have to write such tests for each of them. It
would be so much nicer to be able to write something like:update tname minimally set foo = bar ...
Is this an insane idea, or would it be possible, practical and useful?
I don't know about the sanity, but I've done it a couple of places :)
Cheers,
David.
--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david.fetter@gmail.com
Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
David Fetter <david@fetter.org> writes:
On Fri, Nov 02, 2007 at 11:49:38AM -0400, Andrew Dunstan wrote:
At the moment I have to write things like:
update tname set foo = bar ... where foo is null or foo <> bar
One way I've done this is make RULEs which basically drop non-updating
"UPDATEs" on the floor.
A BEFORE UPDATE trigger would be better, and probably hardly more
expensive than a wired-in facility (especially if you were willing to
write it in C).
regards, tom lane
Tom Lane wrote:
David Fetter <david@fetter.org> writes:
On Fri, Nov 02, 2007 at 11:49:38AM -0400, Andrew Dunstan wrote:
At the moment I have to write things like:
update tname set foo = bar ... where foo is null or foo <> bar
One way I've done this is make RULEs which basically drop non-updating
"UPDATEs" on the floor.A BEFORE UPDATE trigger would be better, and probably hardly more
expensive than a wired-in facility (especially if you were willing to
write it in C).
Yes. I also prefer the trigger idea to a rule because triggers are easy
to enable and disable. It's still a lot of work for what must be a
common want, though. Could it be done generically?
cheers
andrew
Andrew Dunstan <andrew@dunslane.net> writes:
Tom Lane wrote:
A BEFORE UPDATE trigger would be better, and probably hardly more
expensive than a wired-in facility (especially if you were willing to
write it in C).
Yes. I also prefer the trigger idea to a rule because triggers are easy
to enable and disable. It's still a lot of work for what must be a
common want, though. Could it be done generically?
Well, you could write the trigger in C and it'd work for any table.
I think it could be as simple as a memcmp of the tuples' data areas,
since we now require padding bytes to be 0 ...
regards, tom lane
Tom Lane wrote:
Andrew Dunstan <andrew@dunslane.net> writes:
Tom Lane wrote:
A BEFORE UPDATE trigger would be better, and probably hardly more
expensive than a wired-in facility (especially if you were willing to
write it in C).Yes. I also prefer the trigger idea to a rule because triggers are easy
to enable and disable. It's still a lot of work for what must be a
common want, though. Could it be done generically?Well, you could write the trigger in C and it'd work for any table.
I think it could be as simple as a memcmp of the tuples' data areas,
since we now require padding bytes to be 0 ...
Ah. Good. Thanks, that's the piece I was missing.
cheers
andrew
Tom Lane wrote:
Andrew Dunstan <andrew@dunslane.net> writes:
Tom Lane wrote:
A BEFORE UPDATE trigger would be better, and probably hardly more
expensive than a wired-in facility (especially if you were willing to
write it in C).Yes. I also prefer the trigger idea to a rule because triggers are easy
to enable and disable. It's still a lot of work for what must be a
common want, though. Could it be done generically?Well, you could write the trigger in C and it'd work for any table.
I think it could be as simple as a memcmp of the tuples' data areas,
since we now require padding bytes to be 0 ...
Something like this fragment?
newtuple = trigdata->tg_newtuple;
oldtuple = trigdata->tg_trigtuple;
rettuple = newtuple;
if (newtuple->t_len == oldtuple->t_len &&
newtuple->t_data->t_hoff == oldtuple->t_data->t_hoff &&
memcmp(GETSTRUCT(newtuple),GETSTRUCT(oldtuple),
newtuple->t_len - newtuple->t_data->t_hoff) == 0)
rettuple = NULL;
return PointerGetDatum(rettuple);
Also, when did we first require padding bytes to be 0?
cheers
andrew
Andrew Dunstan <andrew@dunslane.net> writes:
Tom Lane wrote:
Well, you could write the trigger in C and it'd work for any table.
I think it could be as simple as a memcmp of the tuples' data areas,
since we now require padding bytes to be 0 ...
Something like this fragment?
newtuple = trigdata->tg_newtuple;
oldtuple = trigdata->tg_trigtuple;
rettuple = newtuple;
if (newtuple->t_len == oldtuple->t_len &&
newtuple->t_data->t_hoff == oldtuple->t_data->t_hoff &&
memcmp(GETSTRUCT(newtuple),GETSTRUCT(oldtuple),
newtuple->t_len - newtuple->t_data->t_hoff) == 0)
rettuple = NULL;
return PointerGetDatum(rettuple);
Close, but I think you also need to take care to compare natts and
the null bitmaps (if any). Might be worth comparing OIDs too, though
AFAIR there is no mechanism for substituting a different OID during
UPDATE. Probably the easiest coding is to memcmp all the way from
offsetof(t_bits) to t_len, after comparing natts and the HASNULL and
HASOID flags.
Also, when did we first require padding bytes to be 0?
The 8.3 varvarlena patch is what requires it, but in practice
heap_formtuple has always started with a palloc0, so I think it would
work a long ways back.
regards, tom lane
On Nov 2, 2007, at 13:44 , Andrew Dunstan wrote:
Ah. Good. Thanks, that's the piece I was missing.
What would be the disadvantages of always doing this, i.e., just
making this part of the normal update path in the backend? I'd think
it should save on unnecessarily dead tuples as well.
Michael Glaesemann
grzm seespotcode net
Michael Glaesemann <grzm@seespotcode.net> writes:
What would be the disadvantages of always doing this, i.e., just
making this part of the normal update path in the backend?
(1) cycles wasted to no purpose in the vast majority of cases.
(2) visibly inconsistent behavior for apps that pay attention
to ctid/xmin/etc.
(3) visibly inconsistent behavior for apps that have AFTER triggers.
There's enough other overhead in issuing an update (network,
parsing/planning/etc) that a sanely coded application should try
to avoid issuing no-op updates anyway. The proposed trigger is
just a band-aid IMHO.
I think having it as an optional trigger is a reasonable compromise.
regards, tom lane
Tom Lane wrote:
Michael Glaesemann <grzm@seespotcode.net> writes:
What would be the disadvantages of always doing this, i.e., just
making this part of the normal update path in the backend?(1) cycles wasted to no purpose in the vast majority of cases.
(2) visibly inconsistent behavior for apps that pay attention
to ctid/xmin/etc.(3) visibly inconsistent behavior for apps that have AFTER triggers.
There's enough other overhead in issuing an update (network,
parsing/planning/etc) that a sanely coded application should try
to avoid issuing no-op updates anyway. The proposed trigger is
just a band-aid IMHO.I think having it as an optional trigger is a reasonable compromise.
Right. I never proposed making this the default behaviour, for all these
good reasons.
The point about making the app try to avoid no-op updates is that this
can impose some quite considerable code complexity on the app,
especially where the number of updated fields is large. It's fragile and
error-prone. A simple switch that can turn a trigger on or off will be
nicer. Syntax support for that might be even nicer, but there appears to
be some resistance to that, so I can easily settle for the trigger.
cheers
andrew
On Nov 8, 2007, at 10:46 , Andrew Dunstan wrote:
Tom Lane wrote:
Michael Glaesemann <grzm@seespotcode.net> writes:
What would be the disadvantages of always doing this, i.e., just
making this part of the normal update path in the backend?(1) cycles wasted to no purpose in the vast majority of cases.
(2) visibly inconsistent behavior for apps that pay attention
to ctid/xmin/etc.(3) visibly inconsistent behavior for apps that have AFTER triggers.
There's enough other overhead in issuing an update (network,
parsing/planning/etc) that a sanely coded application should try
to avoid issuing no-op updates anyway. The proposed trigger is
just a band-aid IMHO.I think having it as an optional trigger is a reasonable compromise.
Right. I never proposed making this the default behaviour, for all
these good reasons.The point about making the app try to avoid no-op updates is that
this can impose some quite considerable code complexity on the app,
especially where the number of updated fields is large. It's
fragile and error-prone. A simple switch that can turn a trigger on
or off will be nicer. Syntax support for that might be even nicer,
but there appears to be some resistance to that, so I can easily
settle for the trigger.
This confirms what I thought. Thanks.
Michael Glaesemann
grzm seespotcode net
On Nov 2, 2007, at 10:49 AM, Andrew Dunstan wrote:
update tname set foo = bar ... where foo is null or foo <> bar ...
FYI, you should be able to do WHERE foo IS DISTINCT FROM bar instead.
--
Decibel!, aka Jim C. Nasby, Database Architect decibel@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828
Attachments:
Decibel! wrote:
On Nov 2, 2007, at 10:49 AM, Andrew Dunstan wrote:
update tname set foo = bar ... where foo is null or foo <> bar ...
FYI, you should be able to do WHERE foo IS DISTINCT FROM bar instead.
True, that's a bit nicer. It's still more than somewhat ugly and fragile
if there a lot of foos and the bars are complex expressions.
cheers
andrew
Tom Lane wrote:
Andrew Dunstan <andrew@dunslane.net> writes:
Tom Lane wrote:
Well, you could write the trigger in C and it'd work for any table.
I think it could be as simple as a memcmp of the tuples' data areas,
since we now require padding bytes to be 0 ...Something like this fragment?
newtuple = trigdata->tg_newtuple;
oldtuple = trigdata->tg_trigtuple;
rettuple = newtuple;if (newtuple->t_len == oldtuple->t_len &&
newtuple->t_data->t_hoff == oldtuple->t_data->t_hoff &&
memcmp(GETSTRUCT(newtuple),GETSTRUCT(oldtuple),
newtuple->t_len - newtuple->t_data->t_hoff) == 0)
rettuple = NULL;return PointerGetDatum(rettuple);
Close, but I think you also need to take care to compare natts and
the null bitmaps (if any). Might be worth comparing OIDs too, though
AFAIR there is no mechanism for substituting a different OID during
UPDATE. Probably the easiest coding is to memcmp all the way from
offsetof(t_bits) to t_len, after comparing natts and the HASNULL and
HASOID flags.
How does this look?
if (newtuple->t_len == oldtuple->t_len &&
newtuple->t_data->t_hoff == oldtuple->t_data->t_hoff &&
HeapTupleHeaderGetNatts(newtuple) == HeapTupleHeaderGetNatts(oldtuple) &&
(newtuple->t_data->t_infomask & (HEAP_HASOID|HEAP_HASNULL)) == (oldtuple->t_data->t_infomask & (HEAP_HASOID|HEAP_HASNULL)) &&
memcmp(newtuple->t_data + offsetof(HeapTupleHeaderData, t_bits),
oldtuple->t_data + offsetof(HeapTupleHeaderData, t_bits)
newtuple->t_len - offsetof(HeapTupleHeaderData, t_bits)) == 0)
rettuple = NULL;
return PointerGetDatum(rettuple);
cheers
andrew
Show quoted text
Also, when did we first require padding bytes to be 0?
The 8.3 varvarlena patch is what requires it, but in practice
heap_formtuple has always started with a palloc0, so I think it would
work a long ways back.regards, tom lane
Andrew Dunstan <andrew@dunslane.net> writes:
How does this look?
if (newtuple->t_len == oldtuple->t_len &&
newtuple->t_data->t_hoff == oldtuple->t_data->t_hoff &&
HeapTupleHeaderGetNatts(newtuple) == HeapTupleHeaderGetNatts(oldtuple) &&
(newtuple->t_data->t_infomask & (HEAP_HASOID|HEAP_HASNULL)) == (oldtuple->t_data->t_infomask & (HEAP_HASOID|HEAP_HASNULL)) &&
memcmp(newtuple->t_data + offsetof(HeapTupleHeaderData, t_bits),
oldtuple->t_data + offsetof(HeapTupleHeaderData, t_bits)
newtuple->t_len - offsetof(HeapTupleHeaderData, t_bits)) == 0)
rettuple = NULL;
Looks sane. It might be even saner if you compare all of the
non-visibility-related infomask bits, viz
(newtuple->t_data->t_infomask & ~HEAP_XACT_MASK) ==
(oldtuple->t_data->t_infomask & ~HEAP_XACT_MASK)
rather than just HASOID and HASNULL.
regards, tom lane
Tom Lane wrote:
Andrew Dunstan <andrew@dunslane.net> writes:
How does this look?
if (newtuple->t_len == oldtuple->t_len &&
newtuple->t_data->t_hoff == oldtuple->t_data->t_hoff &&
HeapTupleHeaderGetNatts(newtuple) == HeapTupleHeaderGetNatts(oldtuple) &&
(newtuple->t_data->t_infomask & (HEAP_HASOID|HEAP_HASNULL)) == (oldtuple->t_data->t_infomask & (HEAP_HASOID|HEAP_HASNULL)) &&
memcmp(newtuple->t_data + offsetof(HeapTupleHeaderData, t_bits),
oldtuple->t_data + offsetof(HeapTupleHeaderData, t_bits)
newtuple->t_len - offsetof(HeapTupleHeaderData, t_bits)) == 0)rettuple = NULL;
Looks sane. It might be even saner if you compare all of the
non-visibility-related infomask bits, viz(newtuple->t_data->t_infomask & ~HEAP_XACT_MASK) ==
(oldtuple->t_data->t_infomask & ~HEAP_XACT_MASK)rather than just HASOID and HASNULL.
Sadly, the memcmp is failing on my test ("update foo set bar = bar") on
8.2. Looks like I'm in for weekend with my fave debugger :-(
cheers
andrew
Andrew Dunstan wrote:
Tom Lane wrote:
Andrew Dunstan <andrew@dunslane.net> writes:
How does this look?
if (newtuple->t_len == oldtuple->t_len &&
newtuple->t_data->t_hoff == oldtuple->t_data->t_hoff &&
HeapTupleHeaderGetNatts(newtuple) ==
HeapTupleHeaderGetNatts(oldtuple) &&
(newtuple->t_data->t_infomask & (HEAP_HASOID|HEAP_HASNULL))
== (oldtuple->t_data->t_infomask & (HEAP_HASOID|HEAP_HASNULL)) &&
memcmp(newtuple->t_data + offsetof(HeapTupleHeaderData,
t_bits),
oldtuple->t_data + offsetof(HeapTupleHeaderData, t_bits)
newtuple->t_len - offsetof(HeapTupleHeaderData,
t_bits)) == 0)rettuple = NULL;
Looks sane. It might be even saner if you compare all of the
non-visibility-related infomask bits, viz(newtuple->t_data->t_infomask & ~HEAP_XACT_MASK) ==
(oldtuple->t_data->t_infomask & ~HEAP_XACT_MASK)rather than just HASOID and HASNULL.
Sadly, the memcmp is failing on my test ("update foo set bar = bar")
on 8.2. Looks like I'm in for weekend with my fave debugger :-(
Turns out we needed those pointers used in the arguments to memcmp cast
to char * so the pointer arithmetic would work right.
I'll be suggesting we add a utility function like this for 8.4.
cheers
andrew
I assume don't want a TODO for this? (Suppress UPDATE no changed
columns)
---------------------------------------------------------------------------
Andrew Dunstan wrote:
Tom Lane wrote:
Michael Glaesemann <grzm@seespotcode.net> writes:
What would be the disadvantages of always doing this, i.e., just
making this part of the normal update path in the backend?(1) cycles wasted to no purpose in the vast majority of cases.
(2) visibly inconsistent behavior for apps that pay attention
to ctid/xmin/etc.(3) visibly inconsistent behavior for apps that have AFTER triggers.
There's enough other overhead in issuing an update (network,
parsing/planning/etc) that a sanely coded application should try
to avoid issuing no-op updates anyway. The proposed trigger is
just a band-aid IMHO.I think having it as an optional trigger is a reasonable compromise.
Right. I never proposed making this the default behaviour, for all these
good reasons.The point about making the app try to avoid no-op updates is that this
can impose some quite considerable code complexity on the app,
especially where the number of updated fields is large. It's fragile and
error-prone. A simple switch that can turn a trigger on or off will be
nicer. Syntax support for that might be even nicer, but there appears to
be some resistance to that, so I can easily settle for the trigger.cheers
andrew
---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?
--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://postgres.enterprisedb.com
+ If your life is a hard drive, Christ can be your backup. +
On Fri, Mar 7, 2008 at 9:40 PM, Bruce Momjian <bruce@momjian.us> wrote:
I assume don't want a TODO for this? (Suppress UPDATE no changed
columns)
I am starting to implement this. Do we want to have this trigger function in
the server, or in an external module?
Best regards,
---------------------------------------------------------------------------
Andrew Dunstan wrote:
Tom Lane wrote:
Michael Glaesemann <grzm@seespotcode.net> writes:
What would be the disadvantages of always doing this, i.e., just
making this part of the normal update path in the backend?(1) cycles wasted to no purpose in the vast majority of cases.
(2) visibly inconsistent behavior for apps that pay attention
to ctid/xmin/etc.(3) visibly inconsistent behavior for apps that have AFTER triggers.
There's enough other overhead in issuing an update (network,
parsing/planning/etc) that a sanely coded application should try
to avoid issuing no-op updates anyway. The proposed trigger is
just a band-aid IMHO.I think having it as an optional trigger is a reasonable compromise.
Right. I never proposed making this the default behaviour, for all these
good reasons.The point about making the app try to avoid no-op updates is that this
can impose some quite considerable code complexity on the app,
especially where the number of updated fields is large. It's fragile and
error-prone. A simple switch that can turn a trigger on or off will be
nicer. Syntax support for that might be even nicer, but there appears to
be some resistance to that, so I can easily settle for the trigger.cheers
andrew
---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://postgres.enterprisedb.com+ If your life is a hard drive, Christ can be your backup. +
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
--
gurjeet[.singh]@EnterpriseDB.com
singh.gurjeet@{ gmail | hotmail | indiatimes | yahoo }.com
EnterpriseDB http://www.enterprisedb.com
17° 29' 34.37"N, 78° 30' 59.76"E - Hyderabad *
18° 32' 57.25"N, 73° 56' 25.42"E - Pune
37° 47' 19.72"N, 122° 24' 1.69" W - San Francisco
Mail sent from my BlackLaptop device