Rewritten rows on unchanged values

Started by Bertrand Janinabout 13 years ago12 messagesgeneral
Jump to latest
#1Bertrand Janin
b@janin.com

I noticed how rows were re-written to a different location (new ctid) even
without changes to the values. This illustrate what I mean:

CREATE TABLE demo (id serial, value text);

-- generate a few pages of dummy data
INSERT INTO demo (value)
SELECT md5(s.a::text)
FROM generate_series(1, 1000) AS s(a);

-- ctid = (0,1)
SELECT id, xmin, ctid, value
FROM demo
WHERE id = 1;

UPDATE demo
SET value = value
WHERE id = 1;

-- ctid = (8,41)
SELECT id, xmin, ctid, value
FROM demo
WHERE id = 1;

I'm curious as to what would prevent keeping the row where it is and maybe
change xmin in place?

Thanks,
-b

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#2Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Bertrand Janin (#1)
Re: Rewritten rows on unchanged values

On 03/22/2013 05:32 AM, Bertrand Janin wrote:

I noticed how rows were re-written to a different location (new ctid) even
without changes to the values. This illustrate what I mean:

CREATE TABLE demo (id serial, value text);

-- generate a few pages of dummy data
INSERT INTO demo (value)
SELECT md5(s.a::text)
FROM generate_series(1, 1000) AS s(a);

-- ctid = (0,1)
SELECT id, xmin, ctid, value
FROM demo
WHERE id = 1;

UPDATE demo
SET value = value
WHERE id = 1;

-- ctid = (8,41)
SELECT id, xmin, ctid, value
FROM demo
WHERE id = 1;

I'm curious as to what would prevent keeping the row where it is and maybe
change xmin in place?

Because Postgres uses MVCC:

http://www.postgresql.org/docs/9.2/static/mvcc-intro.html

So an update is a delete and an insert and you are really seeing a new row.

Thanks,
-b

--
Adrian Klaver
adrian.klaver@gmail.com

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#3Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Adrian Klaver (#2)
Re: Rewritten rows on unchanged values

Adrian Klaver wrote:

On 03/22/2013 05:32 AM, Bertrand Janin wrote:

I noticed how rows were re-written to a different location (new ctid) even
without changes to the values. This illustrate what I mean:

-- ctid = (0,1)
SELECT id, xmin, ctid, value
FROM demo
WHERE id = 1;

UPDATE demo
SET value = value
WHERE id = 1;

-- ctid = (8,41)
SELECT id, xmin, ctid, value
FROM demo
WHERE id = 1;

I'm curious as to what would prevent keeping the row where it is and maybe
change xmin in place?

Because Postgres uses MVCC:

http://www.postgresql.org/docs/9.2/static/mvcc-intro.html

So an update is a delete and an insert and you are really seeing a new row.

You could use a BEFORE UPDATE trigger that returns
NULL if all fields are unchanged, but I'm not sure if
that is what the OP is looking for.

Yours,
Laurenz Albe

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#4Ryan Kelly
rpkelly22@gmail.com
In reply to: Adrian Klaver (#2)
Re: Rewritten rows on unchanged values

On Fri, Mar 03/22/13, 2013 at 06:16:11AM -0700, Adrian Klaver wrote:

On 03/22/2013 05:32 AM, Bertrand Janin wrote:

I noticed how rows were re-written to a different location (new ctid) even
without changes to the values. This illustrate what I mean:

CREATE TABLE demo (id serial, value text);

-- generate a few pages of dummy data
INSERT INTO demo (value)
SELECT md5(s.a::text)
FROM generate_series(1, 1000) AS s(a);

-- ctid = (0,1)
SELECT id, xmin, ctid, value
FROM demo
WHERE id = 1;

UPDATE demo
SET value = value
WHERE id = 1;

-- ctid = (8,41)
SELECT id, xmin, ctid, value
FROM demo
WHERE id = 1;

I'm curious as to what would prevent keeping the row where it is and maybe
change xmin in place?

Because Postgres uses MVCC:

http://www.postgresql.org/docs/9.2/static/mvcc-intro.html

So an update is a delete and an insert and you are really seeing a new row.

I'm having trouble understanding why it is necessary to generate a new
tuple even when nothing has changed. It seems that the OP understands
that MVCC is at work, but is questioning why this exact behavior occurs.
I too have the same question.

Perhaps you could provide an example where an replacing the tuple would
be required in the presence of multiple transactions?

-Ryan Kelly

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Ryan Kelly (#4)
Re: Rewritten rows on unchanged values

Ryan Kelly <rpkelly22@gmail.com> writes:

I'm having trouble understanding why it is necessary to generate a new
tuple even when nothing has changed. It seems that the OP understands
that MVCC is at work, but is questioning why this exact behavior occurs.
I too have the same question.

It's not *necessary* to do so. However, avoiding it would require
sitting there and comparing the old and new tuples, which would be a
waste of cycles for most applications, which don't do useless updates.
We'd be trading off a small loss on every update for a moderate gain
on a few updates, with all of the benefit going to poorly-coded
applications. This has come up before and we've always judged that
it would be a net loss not gain of performance to check for useless
updates.

regards, tom lane

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#6Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Ryan Kelly (#4)
Re: Rewritten rows on unchanged values

On 03/22/2013 06:41 AM, Ryan Kelly wrote:

On Fri, Mar 03/22/13, 2013 at 06:16:11AM -0700, Adrian Klaver wrote:

On 03/22/2013 05:32 AM, Bertrand Janin wrote:

I noticed how rows were re-written to a different location (new ctid) even
without changes to the values. This illustrate what I mean:

CREATE TABLE demo (id serial, value text);

-- generate a few pages of dummy data
INSERT INTO demo (value)
SELECT md5(s.a::text)
FROM generate_series(1, 1000) AS s(a);

-- ctid = (0,1)
SELECT id, xmin, ctid, value
FROM demo
WHERE id = 1;

UPDATE demo
SET value = value
WHERE id = 1;

-- ctid = (8,41)
SELECT id, xmin, ctid, value
FROM demo
WHERE id = 1;

I'm curious as to what would prevent keeping the row where it is and maybe
change xmin in place?

Because Postgres uses MVCC:

http://www.postgresql.org/docs/9.2/static/mvcc-intro.html

So an update is a delete and an insert and you are really seeing a new row.

I'm having trouble understanding why it is necessary to generate a new
tuple even when nothing has changed. It seems that the OP understands
that MVCC is at work, but is questioning why this exact behavior occurs.
I too have the same question.

I don't have the answer but Tom Lane does, from:

http://postgresql.1045698.n5.nabble.com/GENERAL-Update-on-tables-when-the-row-doesn-t-change-td1844002.html

"Because testing for this would almost surely be a net loss for the vast
majority of applications. Checking to see if the new row value exactly
equals the old is hardly a zero-cost operation; if you pay that on every
update, that's a lot of overhead that you are hoping to make back by
sometimes avoiding the physical store of the new tuple. In most
applications I think the "sometimes" isn't going to be often enough
to justify doing it.

If you have a particular table in a particular app where it is worth it,
I'd recommend writing a BEFORE UPDATE trigger to make the comparisons
and suppress the update when NEW and OLD are equal. "

Perhaps you could provide an example where an replacing the tuple would
be required in the presence of multiple transactions?

I am not sure what you are asking above?

-Ryan Kelly

--
Adrian Klaver
adrian.klaver@gmail.com

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#7Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Ryan Kelly (#4)
Re: Rewritten rows on unchanged values

Ryan Kelly <rpkelly22@gmail.com> wrote:

I'm having trouble understanding why it is necessary to generate a new
tuple even when nothing has changed. It seems that the OP understands
that MVCC is at work, but is questioning why this exact behavior occurs.
I too have the same question.

Perhaps you could provide an example where an replacing the tuple would
be required in the presence of multiple transactions?

Well, someone might use such an update for concurrency control
purposes, since the SELECT FOR UPDATE in PostgreSQL isn't as strong
as it is in some other products.  An actual UPDATE will generate a
write conflict in some circumstances where SELECT FOR UPDATE will
not.  Aside from that, I think it's mainly an issue of performance
-- it is slower to process an UPDATE command and check whether any
of the columns in the SET list are actually being set to a new
value than to not generate the UPDATE in the first place.  There is
a trigger function to do this extra work for those who need it;
take a look at the suppress_redundant_updates_trigger() function:

http://www.postgresql.org/docs/current/interactive/functions-trigger.html

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#8Hannes Erven
hannes@erven.at
In reply to: Tom Lane (#5)
Re: Rewritten rows on unchanged values

Hi folks,

On 2013-03-22 13:32, Bertrand Janin wrote:

UPDATE demo
SET value = value
WHERE id = 1;

On 2013-03-22 14:55, Tom Lane wrote:

It's not *necessary* to do so. However, avoiding it would require
sitting there and comparing the old and new tuples,

But in this case, no tuples would need to be compared: just by looking
at the query is becomes clear that it is a no-op.
Isn't this something the planner is or could be aware of?

Betrand, out of curiosity: is this a synthentic test case or do you
really have an application that generates queries like this?

Best regards,

-hannes

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#9Bertrand Janin
b@janin.com
In reply to: Hannes Erven (#8)
Re: Rewritten rows on unchanged values

Hannes Erven wrote :

Hi folks,

On 2013-03-22 13:32, Bertrand Janin wrote:

UPDATE demo
SET value = value
WHERE id = 1;

On 2013-03-22 14:55, Tom Lane wrote:

It's not *necessary* to do so. However, avoiding it would require
sitting there and comparing the old and new tuples,

But in this case, no tuples would need to be compared: just by
looking at the query is becomes clear that it is a no-op.
Isn't this something the planner is or could be aware of?

Betrand, out of curiosity: is this a synthentic test case or do you
really have an application that generates queries like this?

I used the value=value to emphasize the example, in my use case it would be
value='whatever is already there'.

Tom is right, this would be an optimization for a corner case, I noticed this
when running a generated script for a batch update that wasn't given a ton of
attention. The BEFORE UPDATE trigger will work great.

-b

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#10Tom Lane
tgl@sss.pgh.pa.us
In reply to: Hannes Erven (#8)
Re: Rewritten rows on unchanged values

Hannes Erven <hannes@erven.at> writes:

On 2013-03-22 13:32, Bertrand Janin wrote:

UPDATE demo
SET value = value
WHERE id = 1;

On 2013-03-22 14:55, Tom Lane wrote:

It's not *necessary* to do so. However, avoiding it would require
sitting there and comparing the old and new tuples,

But in this case, no tuples would need to be compared: just by looking
at the query is becomes clear that it is a no-op.

That's not clear at all; for example, there might be a BEFORE UPDATE
trigger that changes the row some more (perhaps by updating a timestamp
column, for instance). To avoid changing the behavior of triggers,
any optimization like this would have to be performed at a very low
level when the new tuple is just about to be written to disk.

Another factor in all this is that if you have an application that's in
the habit of issuing useless updates often enough for the situation to
be worth optimizing, it'd be far better to fix the application to not
issue such commands in the first place. No amount of short-circuiting
in the planner or executor can buy back the costs of transmitting and
parsing such queries.

regards, tom lane

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#11Jasen Betts
jasen@xnet.co.nz
In reply to: Bertrand Janin (#1)
Re: Rewritten rows on unchanged values

On 2013-03-22, Ryan Kelly <rpkelly22@gmail.com> wrote:

On Fri, Mar 03/22/13, 2013 at 06:16:11AM -0700, Adrian Klaver wrote:

On 03/22/2013 05:32 AM, Bertrand Janin wrote:

I noticed how rows were re-written to a different location (new ctid) even
without changes to the values. This illustrate what I mean:

CREATE TABLE demo (id serial, value text);

-- generate a few pages of dummy data
INSERT INTO demo (value)
SELECT md5(s.a::text)
FROM generate_series(1, 1000) AS s(a);

-- ctid = (0,1)
SELECT id, xmin, ctid, value
FROM demo
WHERE id = 1;

UPDATE demo
SET value = value
WHERE id = 1;

-- ctid = (8,41)
SELECT id, xmin, ctid, value
FROM demo
WHERE id = 1;

I'm curious as to what would prevent keeping the row where it is and maybe
change xmin in place?

Because Postgres uses MVCC:

http://www.postgresql.org/docs/9.2/static/mvcc-intro.html

So an update is a delete and an insert and you are really seeing a new row.

I'm having trouble understanding why it is necessary to generate a new
tuple even when nothing has changed. It seems that the OP understands
that MVCC is at work, but is questioning why this exact behavior occurs.
I too have the same question.

Perhaps you could provide an example where an replacing the tuple would
be required in the presence of multiple transactions?

create temp table foo as select 1::integer as x ;
create temp table bar as select 1::integer as y ;
create temp table goo as select 1::integer as z ;

A B
begin transaction; begin transaction;
update foo set x=1;
update bar set y=3;
update foo set x=2;
update goo set z=3;
commit;
commit;

One of those has to fail.
if you discard the "update foo set x=1;"
bot can succeed and you end up with an inconsistant
state.

--
⚂⚃ 100% natural

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#12Chris Angelico
rosuav@gmail.com
In reply to: Bertrand Janin (#9)
Re: Rewritten rows on unchanged values

On Sat, Mar 23, 2013 at 1:50 AM, Bertrand Janin <b@janin.com> wrote:

Tom is right, this would be an optimization for a corner case, I noticed this
when running a generated script for a batch update that wasn't given a ton of
attention. The BEFORE UPDATE trigger will work great.

If you know the app, just tweak its one query:

UPDATE demo
SET value = some_value
WHERE id = 1 AND value != some_value;

ChrisA

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general