Re: Synthesize support for Statement.getGeneratedKeys()?

Started by Ken Johansonover 18 years ago41 messagesgeneral

pg-user@kensystem.com

over 18 years ago

The previous discussion does detail the remaining steps needed to get
something ready to be committed: proper quoting, error checking, test
cases, ... If you have the time and skill to work on these that would
be appreciated.

Kris (and folks), I did actually find some justification for the time to
work on this. The attached patch should follow on from our last state in
Feb. Essentially the only thing I added was error checking of the
arguments to 'executeUpdate's columnNames. The checking is done in a new
Utils.needsQuoted(String in) (see javadoc) (feel free to rename this
method).

I opted to use the Quoting mechanism I already had in executeUpdate for
now, since the string validation (no 0x00 && no nested quotes) is being
done in needsQuoted (in the same loop that validates quotes and scans
for whitespace).

Let me know in detail what else needs to be implemented in terms of
error checking or methods. I did not add any unit-tests (that will be a
learning curve; seeking volunteers).

Questions:

-is whitespace the sole determinator for needing quoting? And other chars?

-is it fine to leave the string un-quoted if it contains no ws, vs
always quoting it (my feeling is yes).

-is '"' the only legal quoting chars? (I cant remember for having
dabbled with too many non-spec databases)

-my needsQuoted method throws if the identifier contains nested quotes
(foo"bar or "foo"bar"); is there a legal quote-escaping mechanism
similar to apostrophe doubling? eg: how or would one pass foo"bar (I
imagine quotes are never allowed in identifiers but don't have an SQL
spec handy)

Ken

PS - Kris, I recall you said the backslashes in the patch were
troublesome; did you find any fix for your patch tool aside from
translating them to '/'? If not I will translate them from hereto forth.

Import Notes

Reply to msg id not found: 4610AA0C.30608@ejurka.comReference msg id not found: 22a9f7570703301215y44a937d0r5faa706bad7948c@mail.gmail.comReference msg id not found: 4610AA0C.30608@ejurka.com

pg-user@kensystem.com

over 18 years ago

In reply to: Ken Johanson (#1)

Version with 4-space instead of tabs..

pg-user@kensystem.com

over 18 years ago

In reply to: Ken Johanson (#2)

Statement.executeUpdate(String sql, int columnIndexes[]) via RETURNING clause?

Does anyone have knowledge of how to implement:

Statement.executeUpdate(String sql, int columnIndexes[])

This would be in the context of using the server's RETURNING clause. For
named-identifiers this is straightforward but I do not know how the
'returning' clause could support numbers... hopefully it can.

Something like?:

... RETURNING [1],[2]

Ken

pg-user@kensystem.com

over 18 years ago

In reply to: Ken Johanson (#2)

RETURNING clause: how to specifiy column indexes?

I am attempting to implement (in a driver)(PG JDBC) support for
specifying which column indexes (that generated keys) to return, so I'm
searching for a way to get the server to return the values of the
columns by their index, not name. By name, it is simply to append the
RETURNING clause and column names to the query:

INSERT... RETURNING foo,bar

Does anyone know how (if) this is possible by index? A standard or
server-specific syntax is fine since this is being implemented in a
server-driver.

Something like?:

INSERT... RETURNING [1],[2] (obviously this will not work)

Would I otherwise need to?:

INSERT... RETURNING *

then extract the user-requested columns? This seems inefficient as it
returns all columns / non-key ones (blobs etc).

While the values of an API that specifies the table's columns by
ordinaility may seem dubious, it is an API that I think should be
implemented anyway.

Thanks,
Ken

tgl@sss.pgh.pa.us

over 18 years ago

In reply to: Ken Johanson (#4)

Re: RETURNING clause: how to specifiy column indexes?

Ken Johanson <pg-user@kensystem.com> writes:

While the values of an API that specifies the table's columns by
ordinaility may seem dubious, it is an API that I think should be
implemented anyway.

Every few weeks we get a complaint from someone who thinks that it
should be easy to rearrange the logical order of table columns.
If that comes to pass, it would be a seriously bad idea to have
encouraged applications to rely on table column numbers. And given
the number of votes for that (probably in the hundreds by now)
versus the number of votes for this idea (one), I think column
reordering is much more likely to get done ...

regards, tom lane

books@ejurka.com

over 18 years ago

In reply to: Tom Lane (#5)

Re: RETURNING clause: how to specifiy column indexes?

On Wed, 12 Dec 2007, Tom Lane wrote:

Every few weeks we get a complaint from someone who thinks that it
should be easy to rearrange the logical order of table columns.
If that comes to pass, it would be a seriously bad idea to have
encouraged applications to rely on table column numbers.

I think the expectation is that:

CREATE TABLE t(a int, b int);
INSERT INTO t(b,a) VALUES (1,2) RETURNING *;

will return 1,2 instead of 2,1 as it does now. In this case the op is
not expecting that the (potentially reorganized) table order is
driving the results, but the order that they've actually specified the
columns in creates the result.

Kris Jurka

tgl@sss.pgh.pa.us

over 18 years ago

In reply to: Kris Jurka (#6)

Re: RETURNING clause: how to specifiy column indexes?

Kris Jurka <books@ejurka.com> writes:

I think the expectation is that:

CREATE TABLE t(a int, b int);
INSERT INTO t(b,a) VALUES (1,2) RETURNING *;

will return 1,2 instead of 2,1 as it does now.

Hmm ... I see your point, but on what grounds could one argue that
a "*" targetlist here should return something different from what
"SELECT * FROM t" would return?

I'd say that an app that wants that should write

INSERT INTO t(b,a) VALUES (1,2) RETURNING b,a;

which is surely not that hard if you've got the code to produce
the "(b,a)" part.

In any case it's not clear this is the same thing Ken is complaining
about ...

regards, tom lane

pg-user@kensystem.com

over 18 years ago

In reply to: Tom Lane (#7)

Re: RETURNING clause: how to specifiy column indexes?

Tom Lane wrote:

Kris Jurka <books@ejurka.com> writes:

I think the expectation is that:

CREATE TABLE t(a int, b int);
INSERT INTO t(b,a) VALUES (1,2) RETURNING *;

will return 1,2 instead of 2,1 as it does now.

Hmm ... I see your point, but on what grounds could one argue that
a "*" targetlist here should return something different from what
"SELECT * FROM t" would return?

I'd say that an app that wants that should write

INSERT INTO t(b,a) VALUES (1,2) RETURNING b,a;

which is surely not that hard if you've got the code to produce
the "(b,a)" part.

In any case it's not clear this is the same thing Ken is complaining
about ...

I am only seeking to have the columns returned in the order they appear
naturally. JDBC says "This array contains the indexes of the columns in
the target table that contain the auto-generated keys that should be
made available."

For the record I was not "complaining", only citing in advance the fact
that while some consider selecting the keys by index to be dubious, it
nonetheless must be done because an API requires it. Casting my question
into a complaint is another topic.

pg-user@kensystem.com

over 18 years ago

In reply to: Kris Jurka (#6)

Re: RETURNING clause: how to specifiy column indexes?

Kris Jurka wrote:

I think the expectation is that:

CREATE TABLE t(a int, b int);
INSERT INTO t(b,a) VALUES (1,2) RETURNING *;

will return 1,2 instead of 2,1 as it does now. In this case the op is
not expecting that the (potentially reorganized) table order is driving
the results, but the order that they've actually specified the columns
in creates the result.

Kris, do you have pointers to a spec that says the named-columns should
specify the index, or that it should instead be the order in the table?
My interpretation from the JDBC spec was that the latter is true, I may
be wrong...

In the case where it is table-order, then I presume in PG that the
"natural" order of the columns (even if reordering is allowed at a alter
date) is specified by data in one of the pg_* tables (pg_class,
pg_index, etc). Does anyone know if this is true/false?

If true, my next idea would be to derive the column name using a
subquery in the returning clause. But it sounds like this may have
potential security contraints (will any INSERT query always have read
access to the PG tables?). And no guarantee of the order matching in the
long term.

Is there a more elegant approach, like?:

INSERT... RETURNING (PG_LIST_KEYS(tblname))

I looked but did not find such a utility. It seems that such function
would be best implemented in the server instead of in a driver (eg.
having hardcoded subquery to the schema).

Ken

pg-user@kensystem.com

over 18 years ago

In reply to: Ken Johanson (#2)

Kris, do you what token parsers utils exists in the current JDBC
package? E.g the most tried and true way to get the schema and table
name from:

INSERT INTO foo (col1, col2..) VALUES ..
INSERT INTO foo VALUES ..
INSERT INTO "foo" VALUES ..
INSERT INTO mydb."foo" VALUES ..
etc.

Thanks,
Ken

books@ejurka.com

over 18 years ago

In reply to: Ken Johanson (#9)

Re: RETURNING clause: how to specifiy column indexes?

On Wed, 12 Dec 2007, Ken Johanson wrote:

Kris, do you have pointers to a spec that says the named-columns should
specify the index, or that it should instead be the order in the table? My
interpretation from the JDBC spec was that the latter is true, I may be
wrong...

No, I was actually misremembering what the JDBC spec said, although I
think it's an interesting case to consider regardless of any specs.

In the case where it is table-order, then I presume in PG that the "natural"
order of the columns (even if reordering is allowed at a alter date) is
specified by data in one of the pg_* tables (pg_class, pg_index, etc). Does
anyone know if this is true/false?

pg_attribute.attnum stores column order at the moment. If/when
reordering is allowed, there will be another column indicating the
logical order of the colums.

If true, my next idea would be to derive the column name using a subquery in
the returning clause. But it sounds like this may have potential security
contraints (will any INSERT query always have read access to the PG tables?).
And no guarantee of the order matching in the long term.

There is no requirement that insert permission on a user table implies
read access to pg_catalog. Still many clients will break if they can't
read pg_catalog. For example, all of the JDBC driver's MetaData results
need to query pg tables, updatable ResultSets need to query pg tables to
know what the primary key is and so on. So if this functionality required
access to pg_catalog that would neither be unprecedented nor unreasonable.

Is there a more elegant approach, like?:

INSERT... RETURNING (PG_LIST_KEYS(tblname))

You can't dynamically derive the returning clause for the same reason you
can't say "INSERT INTO (SELECT myfunc()) VALUES (...)", using myfunc to
determine the table name at runtime. The planner needs to know all the
tables/columns/other database parts up front before executing anything.

Kris Jurka

books@ejurka.com

over 18 years ago

In reply to: Ken Johanson (#10)

On Wed, 12 Dec 2007, Ken Johanson wrote:

Kris, do you what token parsers utils exists in the current JDBC
package? E.g the most tried and true way to get the schema and table
name from:

INSERT INTO foo (col1, col2..) VALUES ..

Most of the parsing code in the driver is focused on finding placeholders
and escape sequences and doesn't care what the query is actually doing.
Deriving the base tables of a query happens in only one place, updatable
resultset support. See
org.postgresql.jdbc2.AbstractJdbc2ResultSet#parseQuery. That said, the
current implementation is terrible and is fooled by many queries. It just
looks for the first " FROM " and takes anything after that as the table
the select is based on. Clearly this doesn't work for SELECT col AS from
FROM tab; or SELECT /* FROM */ * FROM tab; or SELECT (SELECT col FROM tab)
FROM tab2; and many other ways. So I doubt modelling new code on it is a
good idea.

Kris Jurka

pg-user@kensystem.com

over 18 years ago

In reply to: Kris Jurka (#11)

Re: RETURNING clause: how to specifiy column indexes?

If true, my next idea would be to derive the column name using a
subquery in the returning clause. But it sounds like this may have
potential security contraints (will any INSERT query always have read
access to the PG tables?). And no guarantee of the order matching in
the long term.

There is no requirement that insert permission on a user table implies
read access to pg_catalog. Still many clients will break if they can't
read pg_catalog. For example, all of the JDBC driver's MetaData results
need to query pg tables, updatable ResultSets need to query pg tables to
know what the primary key is and so on. So if this functionality
required access to pg_catalog that would neither be unprecedented nor
unreasonable.

So it sounds like this may be the best approach, do you agree? I'll try
and find the cycles to code this up although the limited value of
getGeneratedKeys by index makes me think my time would be better spent
elsewhere on the JDBC driver. For now at least. If you can respond to my
earlier query (5 Dec) about what robustness improvements are needed,
I'll start there..

Thanks,
Ken

books@ejurka.com

over 18 years ago

In reply to: Ken Johanson (#1)

On Wed, 5 Dec 2007, Ken Johanson wrote:

I opted to use the Quoting mechanism I already had in executeUpdate for now,
since the string validation (no 0x00 && no nested quotes) is being done in
needsQuoted (in the same loop that validates quotes and scans for
whitespace).

-is whitespace the sole determinator for needing quoting? And other chars?

Any keywords would need quoting: If you had a column named "user" it must
be quoted.

jurka=# create temp table zz(a int, "user" text);
CREATE TABLE

jurka=# insert into zz values(1,'a') returning a, user, "user";
a | current_user | user
---+--------------+------
1 | jurka | a

-is it fine to leave the string un-quoted if it contains no ws, vs always
quoting it (my feeling is yes).

I was thinking about this some more and I think we should quote everything
regardless of whether it needs it or not. This forces the caller to
provide the column in the correct case because it won't be folded any
more, but that's something we're already doing in DatabaseMetaData. If we
don't do this there will be no way for the user to indicate that he has
case-sensitive column names. (Unless of course we implemented
getGeneratedKeys with column names similar to how we might implemented it
for interger column indexes. If we used RETURNING * and only did the
extraction once it got back to the driver, then we have some more
flexibility in handling names.)

-is '"' the only legal quoting chars? (I cant remember for having dabbled
with too many non-spec databases)

Yes.

-my needsQuoted method throws if the identifier contains nested quotes
(foo"bar or "foo"bar"); is there a legal quote-escaping mechanism similar to
apostrophe doubling? eg: how or would one pass foo"bar (I imagine quotes are
never allowed in identifiers but don't have an SQL spec handy)

Nested quotes are legal and escaped just like apostrophe doubling:

create table "abc""def" ( """" int);

PS - Kris, I recall you said the backslashes in the patch were troublesome;
did you find any fix for your patch tool aside from translating them to '/'?
If not I will translate them from hereto forth.

I haven't looked at it since then. Let's get another draft or two and
I'll see what needs to be done.

Kris Jurka

pg-user@kensystem.com

over 18 years ago

In reply to: Ken Johanson (#13)

Re: RETURNING clause: how to specifiy column indexes?

Kris and all,

Here is the query I will call to the get the name of columns by ordinal
position. Do you see any compatibility drivers will older server
versions, or other issues?

SELECT column_name
FROM information_schema.columns
WHERE table_catalog=? AND table_schema=? AND table_name=?
ORDER BY ordinal_position

Ken

pg-user@kensystem.com

over 18 years ago

In reply to: Kris Jurka (#14)

Kris Jurka wrote:

Any keywords would need quoting: If you had a column named "user" it
must be quoted.

Enough said. I will quote all IDs and provide a diff tonight hopefully.

books@ejurka.com

over 18 years ago

In reply to: Ken Johanson (#15)

Re: RETURNING clause: how to specifiy column indexes?

On Thu, 13 Dec 2007, Ken Johanson wrote:

Here is the query I will call to the get the name of columns by ordinal
position. Do you see any compatibility drivers will older server versions, or
other issues?

SELECT column_name
FROM information_schema.columns
WHERE table_catalog=? AND table_schema=? AND table_name=?
ORDER BY ordinal_position

Using pg_catalog tables is better than using information_schema because of
the way permissions work. For information_schema you must be the table
owner, while people who only have permissions to access a table will most
likely be able to read pg_catalog.

Kris Jurka

pg-user@kensystem.com

over 18 years ago

In reply to: Kris Jurka (#17)

Re: RETURNING clause: how to specifiy column indexes?

Kris Jurka wrote:

Using pg_catalog tables is better than using information_schema because
of the way permissions work. For information_schema you must be the
table owner, while people who only have permissions to access a table
will most likely be able to read pg_catalog.

Do you have an equivalent query/join handy that will get the catalog and
schema and table and column names frm the pg tables?

SELECT column_name
FROM information_schema.columns
WHERE table_catalog=? AND table_schema=? AND table_name=?
ORDER BY ordinal_position

pg-user@kensystem.com

over 18 years ago

In reply to: Ken Johanson (#16)

Patch for Statement.getGeneratedKeys()

Kris, please try to apply the attached and let me know what errors if
any you get.

All ids are now quoted in: executeUpdate(String sql, String
columnIndexes[]), and: int executeUpdate(String sql, int
columnIndexes[]) is implemented, however it currently will work ONLY if
the fully qualified table is set in the insert:

INSERT INTO foocatalog.fooschema.tbl .....(quoted or not)

It will support normalizing the not-supplied catalog and schema names --
after I find out how to extract these from the Connection (hopefully
this would not require an additional round trip). Any suggestions on this?

Ken

pg-user@kensystem.com

over 18 years ago

In reply to: Ken Johanson (#19)

Re: Patch for Statement.getGeneratedKeys()

Kris, were you able to apply the last patch I sent? Let me know what
you'd like. I would like to proceed so I can close this project out.

books@ejurka.com

over 18 years ago

In reply to: Ken Johanson (#20)

pg-user@kensystem.com

over 18 years ago

In reply to: Kris Jurka (#21)

books@ejurka.com

over 18 years ago

In reply to: Ken Johanson (#19)

pg-user@kensystem.com

over 18 years ago

In reply to: Kris Jurka (#23)

pg-user@kensystem.com

over 18 years ago

In reply to: Kris Jurka (#23)

books@ejurka.com

over 18 years ago

In reply to: Ken Johanson (#24)

books@ejurka.com

over 18 years ago

In reply to: Ken Johanson (#25)

pg-user@kensystem.com

over 18 years ago

In reply to: Kris Jurka (#26)

pg-user@kensystem.com

over 18 years ago

In reply to: Kris Jurka (#27)

books@ejurka.com

over 18 years ago

In reply to: Ken Johanson (#28)

pg-user@kensystem.com

over 18 years ago

In reply to: Kris Jurka (#30)

pg-user@kensystem.com

over 18 years ago

In reply to: Ken Johanson (#31)

books@ejurka.com

over 18 years ago

In reply to: Ken Johanson (#32)

tgl@sss.pgh.pa.us

over 18 years ago

In reply to: Kris Jurka (#33)

pg-user@kensystem.com

over 18 years ago

In reply to: Tom Lane (#34)

books@ejurka.com

over 18 years ago

In reply to: Ken Johanson (#35)

tgl@sss.pgh.pa.us

over 18 years ago

In reply to: Ken Johanson (#35)

pg-user@kensystem.com

over 18 years ago

In reply to: Tom Lane (#37)

tivvpgsqljdbc@gtech-ua.com

over 18 years ago

In reply to: Ken Johanson (#35)

tgl@sss.pgh.pa.us

over 18 years ago

In reply to: Ken Johanson (#38)

pg-user@kensystem.com

over 18 years ago

In reply to: Kris Jurka (#26)