proposal: function parse_ident
Hi
I miss a functionality that helps with parsing any identifier to basic
three parts - database, schema, objectname. We have this function
internally, but it is not available for SQL layer.
FUNCTION parse_ident(IN ident text, OUT dbname text, OUT schemaname text,
OUT objectname text)
Examples:
SELECT parse_ident('"some schema".tablename');
Comments, ideas, notes?
Regards
Pavel
Pavel Stehule <pavel.stehule@gmail.com> writes:
I miss a functionality that helps with parsing any identifier to basic
three parts - database, schema, objectname. We have this function
internally, but it is not available for SQL layer.
FUNCTION parse_ident(IN ident text, OUT dbname text, OUT schemaname text,
OUT objectname text)
What exactly would you do with this that would not be better done with,
for example, regclass?
Don't say "parse names for things other than tables". Only a minority
of the types of objects used in the database have names that meet this
specification.
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Hi
2015-08-19 21:33 GMT+02:00 Tom Lane <tgl@sss.pgh.pa.us>:
Pavel Stehule <pavel.stehule@gmail.com> writes:
I miss a functionality that helps with parsing any identifier to basic
three parts - database, schema, objectname. We have this function
internally, but it is not available for SQL layer.FUNCTION parse_ident(IN ident text, OUT dbname text, OUT schemaname text,
OUT objectname text)What exactly would you do with this that would not be better done with,
for example, regclass?Don't say "parse names for things other than tables". Only a minority
of the types of objects used in the database have names that meet this
specification.
I see one important reason and one minor reason:
Important - cast to regclass is possible only for existing objects -
parse_ident doesn't check validity of parsed ident.
minor - cast to regclass depends on search_path - but parse_ident not -
with this function I am able to detect if ident depends (or not) on
search_path.
Regards
Pavel
Show quoted text
regards, tom lane
On 8/19/15 2:44 PM, Pavel Stehule wrote:
Don't say "parse names for things other than tables". Only a minority
of the types of objects used in the database have names that meet this
specification.
Really? My impression is that almost everything that's not a shared
object allows for a schema...
I see one important reason and one minor reason:
Important - cast to regclass is possible only for existing objects -
parse_ident doesn't check validity of parsed ident.
minor - cast to regclass depends on search_path - but parse_ident not -
with this function I am able to detect if ident depends (or not) on
search_path.
I've been forced to write this several times. I'd really like to expose
this functionality.
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Data in Trouble? Get it in Treble! http://BlueTreble.com
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Jim Nasby <Jim.Nasby@bluetreble.com> writes:
Don't say "parse names for things other than tables". Only a minority
of the types of objects used in the database have names that meet this
specification.
Really? My impression is that almost everything that's not a shared
object allows for a schema...
Tables meet this naming spec. Columns, functions, operators, operator
classes/families, collations, constraints, and conversions do not (you
need more data to name them). Schemas, databases, languages, extensions,
and some other things also do not, because you need *less* data to name
them. Types also don't really meet this naming spec, because you need to
contend with special cases like "int[]" or "timestamp with time zone".
So this proposal doesn't seem very carefully thought-through to me,
or at least the use case is much narrower than it could be.
Also, if "object does not exist" isn't supposed to be an error case,
what of "name is not correctly formatted"? It seems a bit arbitrary
to me to throw an error in one case but not the other.
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
2015-08-20 2:22 GMT+02:00 Tom Lane <tgl@sss.pgh.pa.us>:
Jim Nasby <Jim.Nasby@bluetreble.com> writes:
Don't say "parse names for things other than tables". Only a minority
of the types of objects used in the database have names that meet this
specification.Really? My impression is that almost everything that's not a shared
object allows for a schema...Tables meet this naming spec. Columns, functions, operators, operator
classes/families, collations, constraints, and conversions do not (you
need more data to name them). Schemas, databases, languages, extensions,
and some other things also do not, because you need *less* data to name
them. Types also don't really meet this naming spec, because you need to
contend with special cases like "int[]" or "timestamp with time zone".
So this proposal doesn't seem very carefully thought-through to me,
or at least the use case is much narrower than it could be.Also, if "object does not exist" isn't supposed to be an error case,
what of "name is not correctly formatted"? It seems a bit arbitrary
to me to throw an error in one case but not the other.
When I would to work with living object, then behave of cast to regclass is
correct, but I can work with object, that will be created in future, and I
need to take some other information about this future object - and then
cast has to fail.
Regards
Pavel
Show quoted text
regards, tom lane
On 8/19/15 7:22 PM, Tom Lane wrote:
Jim Nasby <Jim.Nasby@bluetreble.com> writes:
Don't say "parse names for things other than tables". Only a minority
of the types of objects used in the database have names that meet this
specification.Really? My impression is that almost everything that's not a shared
object allows for a schema...Tables meet this naming spec. Columns, functions, operators, operator
classes/families, collations, constraints, and conversions do not (you
need more data to name them). Schemas, databases, languages, extensions,
and some other things also do not, because you need *less* data to name
them. Types also don't really meet this naming spec, because you need to
contend with special cases like "int[]" or "timestamp with time zone".
So this proposal doesn't seem very carefully thought-through to me,
or at least the use case is much narrower than it could be.Also, if "object does not exist" isn't supposed to be an error case,
what of "name is not correctly formatted"? It seems a bit arbitrary
to me to throw an error in one case but not the other.
I think the important point here is this is *parse*_ident(). It's not
meant to guarantee an object actually exists, what kind of object it is,
or whether it's syntactically correct. It's meant only to separate an
identifier into it's 3 (or in some cases 2) components. If this was as
simple as string_to_array(foo, '.') then it'd be a bit pointless, but
it's obviously a lot more complex than that.
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Data in Trouble? Get it in Treble! http://BlueTreble.com
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
2015-08-20 21:16 GMT+02:00 Jim Nasby <Jim.Nasby@bluetreble.com>:
On 8/19/15 7:22 PM, Tom Lane wrote:
Jim Nasby <Jim.Nasby@bluetreble.com> writes:
Don't say "parse names for things other than tables". Only a minority
of the types of objects used in the database have names that meet this
specification.Really? My impression is that almost everything that's not a shared
object allows for a schema...
Tables meet this naming spec. Columns, functions, operators, operator
classes/families, collations, constraints, and conversions do not (you
need more data to name them). Schemas, databases, languages, extensions,
and some other things also do not, because you need *less* data to name
them. Types also don't really meet this naming spec, because you need to
contend with special cases like "int[]" or "timestamp with time zone".
So this proposal doesn't seem very carefully thought-through to me,
or at least the use case is much narrower than it could be.Also, if "object does not exist" isn't supposed to be an error case,
what of "name is not correctly formatted"? It seems a bit arbitrary
to me to throw an error in one case but not the other.I think the important point here is this is *parse*_ident(). It's not
meant to guarantee an object actually exists, what kind of object it is, or
whether it's syntactically correct. It's meant only to separate an
identifier into it's 3 (or in some cases 2) components. If this was as
simple as string_to_array(foo, '.') then it'd be a bit pointless, but it's
obviously a lot more complex than that.
parsing composite identifier is pretty complex - and almost all work is
done - it just need SQL envelope only
Pavel
Show quoted text
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Data in Trouble? Get it in Treble! http://BlueTreble.com
Hi
2015-08-21 7:15 GMT+02:00 Pavel Stehule <pavel.stehule@gmail.com>:
2015-08-20 21:16 GMT+02:00 Jim Nasby <Jim.Nasby@bluetreble.com>:
On 8/19/15 7:22 PM, Tom Lane wrote:
Jim Nasby <Jim.Nasby@bluetreble.com> writes:
Don't say "parse names for things other than tables". Only a minority
of the types of objects used in the database have names that meet this
specification.Really? My impression is that almost everything that's not a shared
object allows for a schema...
Tables meet this naming spec. Columns, functions, operators, operator
classes/families, collations, constraints, and conversions do not (you
need more data to name them). Schemas, databases, languages, extensions,
and some other things also do not, because you need *less* data to name
them. Types also don't really meet this naming spec, because you need to
contend with special cases like "int[]" or "timestamp with time zone".
So this proposal doesn't seem very carefully thought-through to me,
or at least the use case is much narrower than it could be.Also, if "object does not exist" isn't supposed to be an error case,
what of "name is not correctly formatted"? It seems a bit arbitrary
to me to throw an error in one case but not the other.I think the important point here is this is *parse*_ident(). It's not
meant to guarantee an object actually exists, what kind of object it is, or
whether it's syntactically correct. It's meant only to separate an
identifier into it's 3 (or in some cases 2) components. If this was as
simple as string_to_array(foo, '.') then it'd be a bit pointless, but it's
obviously a lot more complex than that.parsing composite identifier is pretty complex - and almost all work is
done - it just need SQL envelope only
here is the patch
It is really trivial - all heavy work was done done before.
Regards
Pavel
Show quoted text
Pavel
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Data in Trouble? Get it in Treble! http://BlueTreble.com
Attachments:
parse_ident.patchtext/x-patch; charset=US-ASCII; name=parse_ident.patchDownload+100-0
On 2015-08-23 17:46:36 +0200, Pavel Stehule wrote:
here is the patch
It is really trivial - all heavy work was done done before.
This seems to entirely disregard the comments in
http://archives.postgresql.org/message-id/29030.1440030152%40sss.pgh.pa.us
about the fact that this approach only works for a few object types?
Also, for the umpteenst time: Start actually quoting in a sane manner.
Greetings,
Andres Freund
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
2015-09-03 13:11 GMT+02:00 Andres Freund <andres@anarazel.de>:
On 2015-08-23 17:46:36 +0200, Pavel Stehule wrote:
here is the patch
It is really trivial - all heavy work was done done before.
This seems to entirely disregard the comments in
http://archives.postgresql.org/message-id/29030.1440030152%40sss.pgh.pa.us
about the fact that this approach only works for a few object types?
The alghoritm for parsing identifiers is same - the differences are in a
names of levels, and in ending symbols.
I don't would to write totally generic parser - more without access to
system catalog or without external hint, you cannot to decide if identifier
is schema.table or table.column. But the rules for parsing is exactly same.
The function can be redesigned little bit:
FUNCTION parse_ident(OUT level1 text,OUT level2 text,OUT level3 text,OUT
specific text)
so it can parse function myschema.myfunc(xx int)
level1: NULL
level2: myschema
level3: myfunc
specific: (xx int)
Is it acceptable?
Regards
Pavel
Show quoted text
Also, for the umpteenst time: Start actually quoting in a sane manner.
Greetings,
Andres Freund
On Fri, Sep 4, 2015 at 12:24 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
The alghoritm for parsing identifiers is same - the differences are in a
names of levels, and in ending symbols.I don't would to write totally generic parser - more without access to
system catalog or without external hint, you cannot to decide if identifier
is schema.table or table.column. But the rules for parsing is exactly same.The function can be redesigned little bit:
FUNCTION parse_ident(OUT level1 text,OUT level2 text,OUT level3 text,OUT
specific text)so it can parse function myschema.myfunc(xx int)
level1: NULL
level2: myschema
level3: myfunc
specific: (xx int)Is it acceptable?
Well, *I* think that would be useful. I'm not sure it belongs in
core, but useful? Yeah, definitely. I would probably make it text[]
rather than level1, level2, level3, though.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
2015-09-08 14:06 GMT+02:00 Robert Haas <robertmhaas@gmail.com>:
On Fri, Sep 4, 2015 at 12:24 AM, Pavel Stehule <pavel.stehule@gmail.com>
wrote:The alghoritm for parsing identifiers is same - the differences are in a
names of levels, and in ending symbols.I don't would to write totally generic parser - more without access to
system catalog or without external hint, you cannot to decide ifidentifier
is schema.table or table.column. But the rules for parsing is exactly
same.
The function can be redesigned little bit:
FUNCTION parse_ident(OUT level1 text,OUT level2 text,OUT level3 text,OUT
specific text)so it can parse function myschema.myfunc(xx int)
level1: NULL
level2: myschema
level3: myfunc
specific: (xx int)Is it acceptable?
Well, *I* think that would be useful. I'm not sure it belongs in
core, but useful? Yeah, definitely. I would probably make it text[]
rather than level1, level2, level3, though.
Returning a array is a good idea.
Pavel
Show quoted text
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Tue, Sep 8, 2015 at 8:57 AM, Pavel Stehule <pavel.stehule@gmail.com>
wrote:
2015-09-08 14:06 GMT+02:00 Robert Haas <robertmhaas@gmail.com>:
On Fri, Sep 4, 2015 at 12:24 AM, Pavel Stehule <pavel.stehule@gmail.com>
wrote:The alghoritm for parsing identifiers is same - the differences are in a
names of levels, and in ending symbols.I don't would to write totally generic parser - more without access to
system catalog or without external hint, you cannot to decide ifidentifier
is schema.table or table.column. But the rules for parsing is exactly
same.
The function can be redesigned little bit:
FUNCTION parse_ident(OUT level1 text,OUT level2 text,OUT level3 text,OUT
specific text)so it can parse function myschema.myfunc(xx int)
level1: NULL
level2: myschema
level3: myfunc
specific: (xx int)Is it acceptable?
Well, *I* think that would be useful. I'm not sure it belongs in
core, but useful? Yeah, definitely. I would probably make it text[]
rather than level1, level2, level3, though.Returning a array is a good idea.
+1
I would have immediate use for this. So often a function is written with a
table name as a parameter and it's not immediately clear if the schema is
to be parsed out of the string, prescribed, or a separate parameter...in
which case the function signature now has a clumsy optional schema
parameter somewhere. I've written this bit of code probably five times now,
let's make it a solved problem.
text[] return seems most sensible. While I can see the use for a record
output, it wouldn't be used as often.
2015-09-08 20:17 GMT+02:00 Corey Huinker <corey.huinker@gmail.com>:
I would have immediate use for this. So often a function is written with a
table name as a parameter and it's not immediately clear if the schema is
to be parsed out of the string, prescribed, or a separate parameter...in
which case the function signature now has a clumsy optional schema
parameter somewhere. I've written this bit of code probably five times now,
let's make it a solved problem.text[] return seems most sensible. While I can see the use for a record
output, it wouldn't be used as often.
here is a patch
I cannot to use current SplitIdentifierString because it is designed for
different purpose - and it cannot to separate non identifier part. But the
code is simple - and will be cleaned.
postgres=# select * from parse_ident('"AHOJ".NAZDAR[]'::text);
┌───────────────┬───────┐
│ parts │ other │
╞═══════════════╪═══════╡
│ {AHOJ,nazdar} │ [] │
└───────────────┴───────┘
(1 row)
Regards
Pavel
Attachments:
parse_ident-to-text-array.patchtext/x-patch; charset=US-ASCII; name=parse_ident-to-text-array.patchDownload+168-0
Hi
I sending the path with little bit enhanced parser - it doesn't support
utf8 alpha in identifiers yet
Regards
Pavel
Attachments:
parse_ident-to-text-array.patchtext/x-patch; charset=US-ASCII; name=parse_ident-to-text-array.patchDownload+198-0
next iteration - fixed bug in parsing UTF8 chars, enhanced error messages.
Regards
Pavel
Attachments:
parse_ident-to-text-array-01.patchtext/x-patch; charset=US-ASCII; name=parse_ident-to-text-array-01.patchDownload+199-0
Pavel Stehule wrote:
I cannot to use current SplitIdentifierString because it is designed for
different purpose - and it cannot to separate non identifier part. But the
code is simple - and will be cleaned.postgres=# select * from parse_ident('"AHOJ".NAZDAR[]'::text);
┌───────────────┬───────┐
│ parts │ other │
╞═══════════════╪═══════╡
│ {AHOJ,nazdar} │ [] │
└───────────────┴───────┘
(1 row)
Um. Now this is really getting into much of the same territory I got
into with the objname/objargs arrays for pg_get_object_address. I think
the "other" bit is a very poor solution to that.
If you want to be able to parse names for all kinds of objects, you need
a solution much more complex than this function. I think a clean
solution returns three sets of things; one is the primary part of the
name, which is an array of text; the second is the secondary name, which
is another array of text; the third is an array of TypeName.
For the name of a relation, only the first of these arrays is used. For
the name of objects subsidiary to a relation, the first two are used
(the first array is the name of the relation itself, and the second
array is the name of the object; for instance a constraint name, or a
trigger name).
The array of type names is necessary because the parsing of TypeName is
completely unlike parsing of plain names. You need [] decorator and
typmod. If you consider objects such as casts, you need two TypeNames
("from" and "to"), hence this is an array and not a single one. As far
as I recall there are other object types that also need more than one
TypeName.
For the name of a function, you need the first text array, and the array
of TypeName which are the input argument types.
If you don't want to have all this complexity, I think you need to forgo
the idea of the "other" thingy that you propose above, and just concern
yourself with the first bits. I don't think "AHOJ".NAZDAR[] is an
identifier.
--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Hi
2015-09-09 21:55 GMT+02:00 Alvaro Herrera <alvherre@2ndquadrant.com>:
Pavel Stehule wrote:
I cannot to use current SplitIdentifierString because it is designed for
different purpose - and it cannot to separate non identifier part. Butthe
code is simple - and will be cleaned.
postgres=# select * from parse_ident('"AHOJ".NAZDAR[]'::text);
┌───────────────┬───────┐
│ parts │ other │
╞═══════════════╪═══════╡
│ {AHOJ,nazdar} │ [] │
└───────────────┴───────┘
(1 row)Um. Now this is really getting into much of the same territory I got
into with the objname/objargs arrays for pg_get_object_address. I think
the "other" bit is a very poor solution to that.If you want to be able to parse names for all kinds of objects, you need
a solution much more complex than this function. I think a clean
solution returns three sets of things; one is the primary part of the
name, which is an array of text; the second is the secondary name, which
is another array of text; the third is an array of TypeName.For the name of a relation, only the first of these arrays is used. For
the name of objects subsidiary to a relation, the first two are used
(the first array is the name of the relation itself, and the second
array is the name of the object; for instance a constraint name, or a
trigger name).The array of type names is necessary because the parsing of TypeName is
completely unlike parsing of plain names. You need [] decorator and
typmod. If you consider objects such as casts, you need two TypeNames
("from" and "to"), hence this is an array and not a single one. As far
as I recall there are other object types that also need more than one
TypeName.For the name of a function, you need the first text array, and the array
of TypeName which are the input argument types.If you don't want to have all this complexity, I think you need to forgo
the idea of the "other" thingy that you propose above, and just concern
yourself with the first bits. I don't think "AHOJ".NAZDAR[] is an
identifier.
yes, usually I don't need a "other" part. And If I need it, then I can get
it as difference against a original string. But sometimes you don't get a
clean string - and you have to find a end of identifier. The
SplitIdentifierString calculates only with separator char, and it cannot to
find end of ident. So little bit modified API can look like
CREATE OR REPLACE FUNCTION parse_ident(str text, strict boolean DEFAULT
true) RETURNS text[]
raise exception "syntax error" for '"AHOJ".NAZDAR[]' when "strict" is true
returns "AHOJ".nazdar for '"AHOJ".NAZDAR[]' when "strict" is false
Pavel
Show quoted text
--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi
new update of parse_ident function patch
Regards
Pavel