OIDs, CTIDs, updateable cursors and friends

Started by Shachar Shemeshalmost 22 years ago11 messages
#1Shachar Shemesh
psql@shemesh.biz

Hi all,

I'm in the process of writing an OLE DB provider for postgres. I am,
right now, at the point where updating an entry becomes an issue.
Ideally, I would open an updateable cursor for a table/view, and use
that. Unfortunetly, Postgres doesn't seem to support those at all.

As an alternative, I was thinking of using some unique field to
identify, with certanity, the line in question. This will allow me to
use an "update" command to modify said line, in case there is a need
with the OLE DB client. My quetion is this - what can I use?

I have tried to find some docs regarding OIDs and CTIDs, but the docs
seem scarce. If I understand this correctly - OID is meant to identify a
row in a table, though it is not guarenteed to be unique across tables,
or even inside a given table. A CTID is meant to identify the physical
location at which a row is stored. Are these correct?

Would adding "OID" to the rows returned by each "Select" call, and then
doing "update blah where oid=xxx" when I'm requested to update the row
sound like a reasonable stategy, in lieu of updateable cursors? Can
anyone suggest a better way?

Shachar

--
Shachar Shemesh
Lingnu Open Systems Consulting
http://www.lingnu.com/

#2Dave Page
dpage@vale-housing.co.uk
In reply to: Shachar Shemesh (#1)
Re: OIDs, CTIDs, updateable cursors and friends

-----Original Message-----
From: Shachar Shemesh [mailto:psql@shemesh.biz]
Sent: 18 February 2004 13:18
To: Hackers; PostgreSQL OLE DB development
Subject: [HACKERS] OIDs, CTIDs, updateable cursors and friends

Would adding "OID" to the rows returned by each "Select"
call, and then doing "update blah where oid=xxx" when I'm
requested to update the row sound like a reasonable stategy,
in lieu of updateable cursors? Can anyone suggest a better way?

Ignoring potential OID wraparound problems (which we do in pgAdmin) this
should work, assuming there is an OID column. I would suggest trying the
following methods in sequence:

1) Use the tables primary key.
2) Use the OID (and check that only one record will be affected).
3) Build a where clause based on all known original values (and check
that only one record will be affected).
4) Fail with an appropriate error.

2 & 3 can potentially affect more than one record, but even Microsoft
code runs into that problem from time to time and fails with an
appropriate error message. In pgAdmin II we used to ask the user if they
wanted to update all matching rows, but of course that is not
appropriate in a driver.

Regards, Dave.

#3Shachar Shemesh
psql@shemesh.biz
In reply to: Dave Page (#2)
Re: OIDs, CTIDs, updateable cursors and friends

Dave Page wrote:

-----Original Message-----
From: Shachar Shemesh [mailto:psql@shemesh.biz]
Sent: 18 February 2004 13:18
To: Hackers; PostgreSQL OLE DB development
Subject: [HACKERS] OIDs, CTIDs, updateable cursors and friends

Would adding "OID" to the rows returned by each "Select"
call, and then doing "update blah where oid=xxx" when I'm
requested to update the row sound like a reasonable stategy,
in lieu of updateable cursors? Can anyone suggest a better way?

Ignoring potential OID wraparound problems (which we do in pgAdmin) this
should work, assuming there is an OID column. I would suggest trying the
following methods in sequence:

1) Use the tables primary key.

I would, except I'm not sure how many queries I would need in order to
find what the primary key is. Also, what happens if the primary key is
not a part of the fields returned by the query?

2) Use the OID (and check that only one record will be affected).

That may work. Do a query for "how many would be affected". Then again,
I'm currently not inside a transaction. The plan was not to be inside a
transaction unless I needed to. I'm not sure how safe this is to perform
many queries.

3) Build a where clause based on all known original values (and check
that only one record will be affected).

Again - what happens when I'm not inside a transaction?

4) Fail with an appropriate error.

2 & 3 can potentially affect more than one record, but even Microsoft
code runs into that problem from time to time and fails with an
appropriate error message. In pgAdmin II we used to ask the user if they
wanted to update all matching rows, but of course that is not
appropriate in a driver.

Regards, Dave.

The doc mentions something about making the OID column unique. Would
that not cause other problems? What happens if I define the OID field as
unique, and I get a wraparound and an attempt to put a new field in with
existing value? Would the OID skip to the next unique per table, or
would the insert fail?

Shachar

--
Shachar Shemesh
Lingnu Open Systems Consulting
http://www.lingnu.com/

#4Dave Page
dpage@vale-housing.co.uk
In reply to: Shachar Shemesh (#3)
Re: OIDs, CTIDs, updateable cursors and friends

-----Original Message-----
From: Shachar Shemesh [mailto:psql@shemesh.biz]
Sent: 18 February 2004 14:10
To: Dave Page
Cc: Hackers; PostgreSQL OLE DB development
Subject: Re: [HACKERS] OIDs, CTIDs, updateable cursors and friends

I would, except I'm not sure how many queries I would need in
order to find what the primary key is.

Well this is the only safe way to update a specific record. To find the
pkey, look for an index on the table in pg_index with indisprimary =
true. The indkey column holds an array of pg_attribute.attnum's that are
in the index iirc.

Also, what happens if
the primary key is not a part of the fields returned by the query?

Add them as you proprosed to do with the OID, or fall back to the next
method. ADO etc. normally fail to update rows if the programmer hasn't
included a suitable key in the recordset.

2) Use the OID (and check that only one record will be affected).

That may work. Do a query for "how many would be affected".
Then again, I'm currently not inside a transaction. The plan
was not to be inside a transaction unless I needed to. I'm
not sure how safe this is to perform many queries.

Should be perfectly safe.

3) Build a where clause based on all known original values

(and check

that only one record will be affected).

Again - what happens when I'm not inside a transaction?

You might find a new row that wasn;t there before but is now, or vice
versa.

The doc mentions something about making the OID column
unique. Would that not cause other problems? What happens if
I define the OID field as unique, and I get a wraparound and
an attempt to put a new field in with existing value? Would
the OID skip to the next unique per table, or would the insert fail?

It is not the drivers place to mess with peoples schemas, but yes, it
could cause an insert to fail following wraparound.

Regards, Dave.

#5Shachar Shemesh
psql@shemesh.biz
In reply to: Dave Page (#4)
Re: OIDs, CTIDs, updateable cursors and friends

Dave Page wrote:

-----Original Message-----
From: Shachar Shemesh [mailto:psql@shemesh.biz]
Sent: 18 February 2004 14:10
To: Dave Page
Cc: Hackers; PostgreSQL OLE DB development
Subject: Re: [HACKERS] OIDs, CTIDs, updateable cursors and friends

I would, except I'm not sure how many queries I would need in
order to find what the primary key is.

Well this is the only safe way to update a specific record. To find the
pkey, look for an index on the table in pg_index with indisprimary =
true. The indkey column holds an array of pg_attribute.attnum's that are
in the index iirc.

I'll have a look at that. How would updateable cursors do it? By locking
the row?

Also, what happens if
the primary key is not a part of the fields returned by the query?

Add them as you proprosed to do with the OID, or fall back to the next
method. ADO etc. normally fail to update rows if the programmer hasn't
included a suitable key in the recordset.

So, basically, I would not be able to update a table that has no primary
key?

2) Use the OID (and check that only one record will be affected).

That may work. Do a query for "how many would be affected".
Then again, I'm currently not inside a transaction. The plan
was not to be inside a transaction unless I needed to. I'm
not sure how safe this is to perform many queries.

Should be perfectly safe.

What happens if I check how many would be updated, and get "1" as a
result. I then actually do it, but between asking and performing,
someone added a second row that matches the criteria?

3) Build a where clause based on all known original values

(and check

that only one record will be affected).

Again - what happens when I'm not inside a transaction?

You might find a new row that wasn;t there before but is now, or vice
versa.

But what if someone else changes some of the known values of my row?

The doc mentions something about making the OID column
unique. Would that not cause other problems? What happens if
I define the OID field as unique, and I get a wraparound and
an attempt to put a new field in with existing value? Would
the OID skip to the next unique per table, or would the insert fail?

It is not the drivers place to mess with peoples schemas, but yes, it
could cause an insert to fail following wraparound.

Then it's not a good enough solution, even if the driver did have the
prorogative to change the table.

Regards, Dave.

Ok, it seems to me there are several options here.
1. Find out which is the primary key for the table. What happens if the
primary key is a multi-row thing? What happens if there is no primary key?
2. If I'm in a transaction, use OID for the insert after checking with a
select that I'm only affecting one row. If I'm not in a transaction -
perform the update in a generated transaction, and roll it back if there
is more than one row affected.

I like 1 better, frankly. Dillemas dillemas dillemas.

--
Shachar Shemesh
Lingnu Open Systems Consulting
http://www.lingnu.com/

#6Dave Page
dpage@vale-housing.co.uk
In reply to: Shachar Shemesh (#5)
Re: OIDs, CTIDs, updateable cursors and friends

-----Original Message-----
From: Shachar Shemesh [mailto:psql@shemesh.biz]
Sent: 18 February 2004 14:56
To: Dave Page
Cc: Hackers; PostgreSQL OLE DB development
Subject: Re: [HACKERS] OIDs, CTIDs, updateable cursors and friends

I'll have a look at that. How would updateable cursors do it?
By locking the row?

Dunno, we don't have them!

So, basically, I would not be able to update a table that has
no primary key?

Yes, unless you feel back to the value matching type update.

Realistically though, how can anyone expect to edit data successfully
unless they have defined a key to identify rows with? Whilst it's nice
to get it to work 100% of the time no matter how brain dead the schema
it's not that practical.

What happens if I check how many would be updated, and get
"1" as a result. I then actually do it, but between asking
and performing, someone added a second row that matches the criteria?

It'll update both rows unless you do it in one transaction.

But what if someone else changes some of the known values of my row?

The update will fail to find any rows. This is almost certainly what
happens when MS Access starts pinting #DELETED# in rows of a linked
table.

Ok, it seems to me there are several options here.
1. Find out which is the primary key for the table. What
happens if the primary key is a multi-row thing? What happens
if there is no primary key?

I guess you mean multicolumn? No different, you just need all columns in
your WHERE clause. If there is no pkey (and I would be inclined to say
if there is none in the user's query and not try to add it yourself)
then you fail with an error.

2. If I'm in a transaction, use OID for the insert after
checking with a select that I'm only affecting one row. If
I'm not in a transaction - perform the update in a generated
transaction, and roll it back if there is more than one row affected.

I like 1 better, frankly. Dillemas dillemas dillemas.

1 is definitely better and is the only way that is guaranteed to be
safe. Thinking about it more, that is almost certainly the position a
driver should take. In pgAdmin we can afford a little artistic licence
(no pun intended) because no one will be using pgAdmin as a driver to
connect another program to a database, plus we can ask the user what
action to take if we don't know if the result will be exactly what was
intended. You do not have that luxury in a driver of course.

Regards, Dave

#7Brett Schwarz
brett_schwarz@yahoo.com
In reply to: Dave Page (#6)
Re: OIDs, CTIDs, updateable cursors and friends

Ok, it seems to me there are several options here.
1. Find out which is the primary key for the

table. What

happens if the primary key is a multi-row thing?

What happens

if there is no primary key?

I guess you mean multicolumn? No different, you just
need all columns in
your WHERE clause. If there is no pkey (and I would
be inclined to say
if there is none in the user's query and not try to
add it yourself)
then you fail with an error.

2. If I'm in a transaction, use OID for the insert

after

checking with a select that I'm only affecting one

row. If

I'm not in a transaction - perform the update in a

generated

transaction, and roll it back if there is more

than one row affected.

I like 1 better, frankly. Dillemas dillemas

dillemas.

1 is definitely better and is the only way that is
guaranteed to be
safe. Thinking about it more, that is almost
certainly the position a
driver should take. In pgAdmin we can afford a
little artistic licence
(no pun intended) because no one will be using
pgAdmin as a driver to
connect another program to a database, plus we can
ask the user what
action to take if we don't know if the result will
be exactly what was
intended. You do not have that luxury in a driver of
course.

Just as another datapoint, pgaccess does the same
thing (finding the pkey(s), and using those to
uniquely identify the row). It is kind of a PITA, but
as Dave says, it's the best way to do this.

HTH,

--brett

__________________________________
Do you Yahoo!?
Yahoo! Mail SpamGuard - Read only the mail you want.
http://antispam.yahoo.com/tools

#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: Dave Page (#2)
Re: OIDs, CTIDs, updateable cursors and friends

"Dave Page" <dpage@vale-housing.co.uk> writes:

Ignoring potential OID wraparound problems (which we do in pgAdmin) this
should work, assuming there is an OID column. I would suggest trying the
following methods in sequence:

1) Use the tables primary key.
2) Use the OID (and check that only one record will be affected).
3) Build a where clause based on all known original values (and check
that only one record will be affected).
4) Fail with an appropriate error.

I don't think it's a good idea for driver-level code to depend on OIDs
for this; to do that you need the knowledge and cooperation of the
database designer. The OID column may not exist at all (CREATE TABLE
... WITHOUT OIDS). If it does exist, it's not guaranteed to be unique
unless someone put a unique index on it (and I agree with Dave that a
driver has no business installing such an index). Furthermore, if
there's not an index on OID then an update specifying "WHERE oid = nnn"
is going to be very slow because it will have to seqscan the whole
table.

I believe the ODBC driver uses CTID for this sort of problem. CTID is
guaranteed to exist and to be fast to access (since it's a physical
locator). Against this you have the problem that concurrent updates
of the record will move it, leaving your CTID invalid. However, that
could be a good thing, as it's debatable that you want to blindly apply
your update in such a case anyway.

If you are willing to hold open a transaction while the user edits the
record, you can lock the record with SELECT FOR UPDATE, and then your
CTID is guaranteed good for the duration of the transaction.

If you don't want to do that, I'd suggest reading both CTID and XMIN
when you initially read the tuple. When you are ready to commit
changes, do this:

BEGIN;
SELECT xmin FROM table WHERE ctid = whatever FOR UPDATE;
-- check that you get a record and its xmin matches
-- what you had; if so, you can go ahead and do
UPDATE table SET ... WHERE ctid = whatever;
COMMIT;

If some other client changed the record behind your back, you'll know it
because the xmin changes, and you can then do whatever seems best to
cope.

regards, tom lane

#9Zeugswetter Andreas SB SD
ZeugswetterA@spardat.at
In reply to: Tom Lane (#8)
Re: OIDs, CTIDs, updateable cursors and friends

I believe the ODBC driver uses CTID for this sort of problem. CTID is
guaranteed to exist and to be fast to access (since it's a physical
locator). Against this you have the problem that concurrent updates
of the record will move it, leaving your CTID invalid. However, that

IIRC the ctid access follows the chain up to the currently valid
tuple ? I thought the only enemy of ctid access was "vacuum" ?

Andreas

#10Tom Lane
tgl@sss.pgh.pa.us
In reply to: Zeugswetter Andreas SB SD (#9)
Re: OIDs, CTIDs, updateable cursors and friends

"Zeugswetter Andreas SB SD" <ZeugswetterA@spardat.at> writes:

IIRC the ctid access follows the chain up to the currently valid
tuple ?

No.

I think Hiroshi or someone put in a function you can use to follow the
chain, but a simple "WHERE ctid = whatever" won't do it. In any case,
if you're not holding an open transaction then you have to be prepared
to have the dead tuple vacuumed out from under you, in which case you'd
not be able to follow the chain anyway.

regards, tom lane

#11Greg Stark
gsstark@mit.edu
In reply to: Shachar Shemesh (#1)
Re: OIDs, CTIDs, updateable cursors and friends

Shachar Shemesh <psql@shemesh.biz> writes:

Would adding "OID" to the rows returned by each "Select" call, and then doing
"update blah where oid=xxx" when I'm requested to update the row sound like a
reasonable stategy, in lieu of updateable cursors? Can anyone suggest a better
way?

If you're in control of the database schema and can ensure that all tables
will have OIDs enabled and you can add a unique index on OID on all these
tables then yes. But it's not ideal. If OID wraps around you'll get errors
from unique key violations.

A better strategy is to pull the primary key columns from information_schema
and use those columns. This would be more work but would work on any table
with a primary key.

This won't work for tables without primary keys, but in that case, arguably,
updating records doesn't really make sense anyways.

Something like this, though I'm not really very familiar with the
information_schema.

db=> SELECT ordinal_position,column_name
FROM information_schema.table_constraints AS a
JOIN information_schema.key_column_usage AS b USING (constraint_schema,constraint_name)
WHERE a.constraint_type = 'PRIMARY KEY'
AND a.table_schema = 'public'
AND a.table_name = 'country'
ORDER BY ordinal_position;
ordinal_position | column_name
------------------+--------------
1 | country_code
(1 row)

--
greg