Faster Updates

Started by PFC over 19 years ago, 5 messages
#1 PFC
lists@peufeu.com

Hello,
Sometimes people complain that UPDATE is slow in postgres. UPDATE...

- generates dead tuples which must be vacuumed.
- needs to hit all indexes even if only one column was modified.

From what I know, UPDATE creates a new copy of the row with the
relevant ctid pointers, then indexes it. On COMMIT the old version becomes
dead but stays in the table and the indexes until VACUUM.
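
Roughly, each tuple carries version metadata along these lines (a
simplified sketch only; the real PostgreSQL header, HeapTupleHeaderData,
is laid out differently):

#include <stdint.h>

/* Simplified sketch of per-tuple version metadata; the real
 * PostgreSQL header (HeapTupleHeaderData) differs in layout. */
typedef struct ItemPointerSketch {
    uint32_t block;            /* heap page number */
    uint16_t offset;           /* line pointer within the page */
} ItemPointerSketch;

typedef struct TupleVersionSketch {
    uint32_t xmin;             /* transaction that created this version */
    uint32_t xmax;             /* transaction that updated/deleted it,
                                * or invalid if it is still live */
    ItemPointerSketch ctid;    /* after an UPDATE, points at the newer
                                * version; otherwise points at itself */
} TupleVersionSketch;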
I propose a simple idea, which may be idiotic, but who knows.

When a row is UPDATED, instead of storing a new copy of the entire row,
only a differential would be stored. The old row stays in the page anyway,
so we might as well store only the binary-encoded equivalent of "Take row
version X and change column A to value Y".
This is possible only if the differential fits in the free space on the
page.
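
To make this concrete, the differential could be a small record along
these lines (a purely hypothetical format, just to illustrate the idea):

#include <stdint.h>

/* Hypothetical on-page differential record; no such structure exists
 * in PostgreSQL. It encodes "take base version X on this page and
 * replace column A with the value that follows". */
typedef struct TupleDelta {
    uint16_t base_offset;  /* line pointer of the base row version on
                            * this same page (hence the free-space
                            * restriction above) */
    uint32_t xmin;         /* transaction that created this delta */
    uint16_t attnum;       /* number of the column being overwritten */
    uint16_t value_len;    /* length of the new value in bytes; the
                            * value bytes follow this struct */
} TupleDelta;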
In this case, a lot less dead space is generated. VACUUM would
consolidate the differentials of committed transactions into a new base
version of the row.
When reading the page looking for a specific version of a row, all the
differentials would need to be applied. This adds overhead, but it might
still be a win.
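
Reconstruction could look something like this toy example (self-contained
and illustrative only; nothing here matches real PostgreSQL code, it just
shows the extra work the read path would do):

#include <stdio.h>
#include <string.h>

#define NCOLS  3
#define MAXVAL 16

typedef struct { char cols[NCOLS][MAXVAL]; } Row;
typedef struct { int attnum; char value[MAXVAL]; } Delta;

/* Rebuild the visible row: start from the base version and apply each
 * committed delta in order, oldest first. */
static Row reconstruct(const Row *base, const Delta *deltas, int ndeltas)
{
    Row r = *base;
    for (int i = 0; i < ndeltas; i++)
        strcpy(r.cols[deltas[i].attnum], deltas[i].value);
    return r;
}

int main(void)
{
    Row base = {{ "alice", "paris", "active" }}; /* the base version */
    Delta ds[] = {             /* two later UPDATEs, one column each */
        { 1, "lyon" },         /* ... SET city = 'lyon'  */
        { 2, "idle" },         /* ... SET state = 'idle' */
    };
    Row cur = reconstruct(&base, ds, 2);
    printf("%s %s %s\n", cur.cols[0], cur.cols[1], cur.cols[2]);
    return 0;
}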
With this method, it could be possible to avoid updating indexes whose
columns were not modified. This is a big win.

What do you think?

#2 Tom Lane
tgl@sss.pgh.pa.us
In reply to: PFC (#1)
Re: Faster Updates

PFC <lists@peufeu.com> writes:

What do you think?

Sounds enormously complicated and of very doubtful net win --- you're
moving a lot of overhead into SELECT in order to make UPDATE cheaper,
and on top of that the restriction to same-page will limit the
usefulness quite a lot (unless we deliberately keep pages less than
full, which costs a lot in distributed extra I/O).

Basically this is an extension of the notion of update tuple chains.
Anyone who's been around the project awhile will know that we've had
an unreasonably large number of corner-case bugs associated with tuple
chains (particularly in VACUUM FULL), so adding a second level of
complexity to 'em doesn't sound very appealing to me.

regards, tom lane

#3 Nicolai Petri
nicolai@catpipe.net
In reply to: Tom Lane (#2)
Re: Faster Updates

On Saturday 03 June 2006 17:27, Tom Lane wrote:

PFC <lists@peufeu.com> writes:

[snip - complicated update logic proposal]
What do you think?

Sounds enormously complicated and of very doubtful net win --- you're

[snip - ... bad idea reasoning] :)

What if every backend, while processing a transaction, collected a list
of touched records - probably with a max number of entries (GUC) collected
per transaction. When the transaction completes, the list of tuples would
be sent to pg_autovacuum, or possibly a new process, that selectively
vacuums only those tuples. Of course, this would need some logic so we
don't visit the tuples for vacuum unless we are quite sure no running
transaction would block adding the blocks to the FSM. We might be able to
queue up the blocks until a later time (GUC queue-max-time +
queue-size-limit) if we cannot determine that it is safe to add the blocks
to the FSM at the current time.
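
As a purely hypothetical sketch of what such a per-backend list might
look like (no such code exists in PostgreSQL today):

#include <stdint.h>
#include <stddef.h>

/* Hypothetical per-backend list of tuples touched by the current
 * transaction. The backend appends entries while the transaction runs
 * and hands the list to a vacuum-like worker at commit; the capacity
 * would come from a GUC. */
typedef struct TouchedTid {
    uint32_t block;             /* heap page number */
    uint16_t offset;            /* line pointer within the page */
} TouchedTid;

typedef struct TouchedList {
    size_t     ntids;
    size_t     capacity;        /* GUC-configured cap per transaction */
    TouchedTid tids[];          /* entries follow the header */
} TouchedList;

/* Record a touched tuple; once the cap is hit we stop collecting and
 * fall back to ordinary vacuuming for this transaction. */
static int touched_list_add(TouchedList *l, uint32_t blk, uint16_t off)
{
    if (l->ntids >= l->capacity)
        return 0;               /* list is full: give up */
    l->tids[l->ntids].block  = blk;
    l->tids[l->ntids].offset = off;
    l->ntids++;
    return 1;
}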

I guess this has probably been suggested before and there is probably a
reason why it cannot be done or wouldn't be effective. But it could
probably be a big win for common workloads like webpages. Where it would
be troublesome is systems with long-running transactions - it might as
well just be disabled there.

Best regards,
Nicolai Petri

#4 Jim Nasby
jnasby@pervasive.com
In reply to: Nicolai Petri (#3)
Re: Faster Updates

On Jun 3, 2006, at 2:05 PM, Nicolai Petri wrote:

On Saturday 03 June 2006 17:27, Tom Lane wrote:

PFC <lists@peufeu.com> writes:

[snip - complicated update logic proposal]
What do you think?

Sounds enormously complicated and of very doubtful net win --- you're

[snip - ... bad idea reasoning] :)

What if every backend, while processing a transaction, collected a list
of touched records - probably with a max number of entries (GUC) collected
per transaction. When the transaction completes, the list of tuples would
be sent to pg_autovacuum, or possibly a new process, that selectively
vacuums only those tuples. Of course, this would need some logic so we
don't visit the tuples for vacuum unless we are quite sure no running
transaction would block adding the blocks to the FSM. We might be able to
queue up the blocks until a later time (GUC queue-max-time +
queue-size-limit) if we cannot determine that it is safe to add the blocks
to the FSM at the current time.

I guess this has probably been suggested before and there is probably a
reason

Yup. Search the archives for 'dead space map'.
--
Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461

#5 Jim Nasby
jnasby@pervasive.com
In reply to: Tom Lane (#2)
Re: Faster Updates

On Jun 3, 2006, at 10:27 AM, Tom Lane wrote:

PFC <lists@peufeu.com> writes:

What do you think?

Sounds enormously complicated and of very doubtful net win --- you're
moving a lot of overhead into SELECT in order to make UPDATE cheaper,
and on top of that the restriction to same-page will limit the
usefulness quite a lot (unless we deliberately keep pages less than
full, which costs a lot in distributed extra I/O).

A lot of CPU overhead, which in many cases won't really matter.

If someone has interest in testing this to see what impact it has,
how hard would it be to hack together enough code to test the base
concept? I'm thinking only basic SELECT and UPDATE support, along
with a means to leave a certain percentage of each page empty.
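
Hypothetically, the "leave pages partly empty" piece could be as simple
as a check like this (names and the reserve scheme are made up for
illustration):

#include <stddef.h>

/* Hypothetical page-space check for the prototype described above:
 * treat a page as full once less than reserve_pct percent of it is
 * free, so later UPDATEs still find room for same-page differentials. */
static int page_has_room(size_t free_bytes, size_t page_size,
                         size_t needed, unsigned reserve_pct)
{
    size_t reserved = page_size * reserve_pct / 100;
    return free_bytes >= needed + reserved;
}

With an 8 kB page and reserve_pct = 20, new tuples would stop being
placed once less than roughly 1.6 kB plus the tuple size remains free.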