proposal: function parse_ident

Started by Pavel Stehuleover 10 years ago51 messages

pavel.stehule@gmail.com

over 10 years ago

I miss a functionality that helps with parsing any identifier to basic
three parts - database, schema, objectname. We have this function
internally, but it is not available for SQL layer.

FUNCTION parse_ident(IN ident text, OUT dbname text, OUT schemaname text,
OUT objectname text)

Examples:

SELECT parse_ident('"some schema".tablename');

Comments, ideas, notes?

Regards

Pavel

Tom Lane

tgl@sss.pgh.pa.us

over 10 years ago

In reply to: Pavel Stehule (#1)

Re: proposal: function parse_ident

Pavel Stehule <pavel.stehule@gmail.com> writes:

I miss a functionality that helps with parsing any identifier to basic
three parts - database, schema, objectname. We have this function
internally, but it is not available for SQL layer.

FUNCTION parse_ident(IN ident text, OUT dbname text, OUT schemaname text,
OUT objectname text)

What exactly would you do with this that would not be better done with,
for example, regclass?

Don't say "parse names for things other than tables". Only a minority
of the types of objects used in the database have names that meet this
specification.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Pavel Stehule

pavel.stehule@gmail.com

over 10 years ago

In reply to: Tom Lane (#2)

Re: proposal: function parse_ident

2015-08-19 21:33 GMT+02:00 Tom Lane <tgl@sss.pgh.pa.us>:

Pavel Stehule <pavel.stehule@gmail.com> writes:

I miss a functionality that helps with parsing any identifier to basic
three parts - database, schema, objectname. We have this function
internally, but it is not available for SQL layer.

FUNCTION parse_ident(IN ident text, OUT dbname text, OUT schemaname text,
OUT objectname text)

What exactly would you do with this that would not be better done with,
for example, regclass?

Don't say "parse names for things other than tables". Only a minority
of the types of objects used in the database have names that meet this
specification.

I see one important reason and one minor reason:

Important - cast to regclass is possible only for existing objects -
parse_ident doesn't check validity of parsed ident.
minor - cast to regclass depends on search_path - but parse_ident not -
with this function I am able to detect if ident depends (or not) on
search_path.

Regards

Pavel

Show quoted text

regards, tom lane

Jim Nasby

Jim.Nasby@BlueTreble.com

over 10 years ago

In reply to: Pavel Stehule (#3)

Re: proposal: function parse_ident

On 8/19/15 2:44 PM, Pavel Stehule wrote:

Don't say "parse names for things other than tables". Only a minority
of the types of objects used in the database have names that meet this
specification.

Really? My impression is that almost everything that's not a shared
object allows for a schema...

I see one important reason and one minor reason:

Important - cast to regclass is possible only for existing objects -
parse_ident doesn't check validity of parsed ident.
minor - cast to regclass depends on search_path - but parse_ident not -
with this function I am able to detect if ident depends (or not) on
search_path.

I've been forced to write this several times. I'd really like to expose
this functionality.
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Data in Trouble? Get it in Treble! http://BlueTreble.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Tom Lane

tgl@sss.pgh.pa.us

over 10 years ago

In reply to: Jim Nasby (#4)

Re: proposal: function parse_ident

Jim Nasby <Jim.Nasby@bluetreble.com> writes:

Don't say "parse names for things other than tables". Only a minority
of the types of objects used in the database have names that meet this
specification.

Really? My impression is that almost everything that's not a shared
object allows for a schema...

Tables meet this naming spec. Columns, functions, operators, operator
classes/families, collations, constraints, and conversions do not (you
need more data to name them). Schemas, databases, languages, extensions,
and some other things also do not, because you need *less* data to name
them. Types also don't really meet this naming spec, because you need to
contend with special cases like "int[]" or "timestamp with time zone".
So this proposal doesn't seem very carefully thought-through to me,
or at least the use case is much narrower than it could be.

Also, if "object does not exist" isn't supposed to be an error case,
what of "name is not correctly formatted"? It seems a bit arbitrary
to me to throw an error in one case but not the other.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Pavel Stehule

pavel.stehule@gmail.com

over 10 years ago

In reply to: Tom Lane (#5)

Re: proposal: function parse_ident

2015-08-20 2:22 GMT+02:00 Tom Lane <tgl@sss.pgh.pa.us>:

Jim Nasby <Jim.Nasby@bluetreble.com> writes:

Don't say "parse names for things other than tables". Only a minority
of the types of objects used in the database have names that meet this
specification.

Really? My impression is that almost everything that's not a shared
object allows for a schema...

Tables meet this naming spec. Columns, functions, operators, operator
classes/families, collations, constraints, and conversions do not (you
need more data to name them). Schemas, databases, languages, extensions,
and some other things also do not, because you need *less* data to name
them. Types also don't really meet this naming spec, because you need to
contend with special cases like "int[]" or "timestamp with time zone".
So this proposal doesn't seem very carefully thought-through to me,
or at least the use case is much narrower than it could be.

Also, if "object does not exist" isn't supposed to be an error case,
what of "name is not correctly formatted"? It seems a bit arbitrary
to me to throw an error in one case but not the other.

When I would to work with living object, then behave of cast to regclass is
correct, but I can work with object, that will be created in future, and I
need to take some other information about this future object - and then
cast has to fail.

Regards

Pavel

Show quoted text

regards, tom lane

Jim Nasby

Jim.Nasby@BlueTreble.com

over 10 years ago

In reply to: Tom Lane (#5)

Re: proposal: function parse_ident

On 8/19/15 7:22 PM, Tom Lane wrote:

Jim Nasby <Jim.Nasby@bluetreble.com> writes:

Don't say "parse names for things other than tables". Only a minority
of the types of objects used in the database have names that meet this
specification.

Really? My impression is that almost everything that's not a shared
object allows for a schema...

Tables meet this naming spec. Columns, functions, operators, operator
classes/families, collations, constraints, and conversions do not (you
need more data to name them). Schemas, databases, languages, extensions,
and some other things also do not, because you need *less* data to name
them. Types also don't really meet this naming spec, because you need to
contend with special cases like "int[]" or "timestamp with time zone".
So this proposal doesn't seem very carefully thought-through to me,
or at least the use case is much narrower than it could be.

Also, if "object does not exist" isn't supposed to be an error case,
what of "name is not correctly formatted"? It seems a bit arbitrary
to me to throw an error in one case but not the other.

I think the important point here is this is *parse*_ident(). It's not
meant to guarantee an object actually exists, what kind of object it is,
or whether it's syntactically correct. It's meant only to separate an
identifier into it's 3 (or in some cases 2) components. If this was as
simple as string_to_array(foo, '.') then it'd be a bit pointless, but
it's obviously a lot more complex than that.
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Data in Trouble? Get it in Treble! http://BlueTreble.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Pavel Stehule

pavel.stehule@gmail.com

over 10 years ago

In reply to: Jim Nasby (#7)

Re: proposal: function parse_ident

2015-08-20 21:16 GMT+02:00 Jim Nasby <Jim.Nasby@bluetreble.com>:

On 8/19/15 7:22 PM, Tom Lane wrote:

Jim Nasby <Jim.Nasby@bluetreble.com> writes:

Don't say "parse names for things other than tables". Only a minority

of the types of objects used in the database have names that meet this
specification.

Really? My impression is that almost everything that's not a shared

object allows for a schema...

Tables meet this naming spec. Columns, functions, operators, operator
classes/families, collations, constraints, and conversions do not (you
need more data to name them). Schemas, databases, languages, extensions,
and some other things also do not, because you need *less* data to name
them. Types also don't really meet this naming spec, because you need to
contend with special cases like "int[]" or "timestamp with time zone".
So this proposal doesn't seem very carefully thought-through to me,
or at least the use case is much narrower than it could be.

Also, if "object does not exist" isn't supposed to be an error case,
what of "name is not correctly formatted"? It seems a bit arbitrary
to me to throw an error in one case but not the other.

I think the important point here is this is *parse*_ident(). It's not
meant to guarantee an object actually exists, what kind of object it is, or
whether it's syntactically correct. It's meant only to separate an
identifier into it's 3 (or in some cases 2) components. If this was as
simple as string_to_array(foo, '.') then it'd be a bit pointless, but it's
obviously a lot more complex than that.

parsing composite identifier is pretty complex - and almost all work is
done - it just need SQL envelope only

Pavel

Show quoted text

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Data in Trouble? Get it in Treble! http://BlueTreble.com

Pavel Stehule

pavel.stehule@gmail.com

over 10 years ago

In reply to: Pavel Stehule (#8)

1 attachment(s)

Re: proposal: function parse_ident

2015-08-21 7:15 GMT+02:00 Pavel Stehule <pavel.stehule@gmail.com>:

2015-08-20 21:16 GMT+02:00 Jim Nasby <Jim.Nasby@bluetreble.com>:

On 8/19/15 7:22 PM, Tom Lane wrote:

Jim Nasby <Jim.Nasby@bluetreble.com> writes:

Don't say "parse names for things other than tables". Only a minority

of the types of objects used in the database have names that meet this
specification.

Really? My impression is that almost everything that's not a shared

object allows for a schema...

Tables meet this naming spec. Columns, functions, operators, operator
classes/families, collations, constraints, and conversions do not (you
need more data to name them). Schemas, databases, languages, extensions,
and some other things also do not, because you need *less* data to name
them. Types also don't really meet this naming spec, because you need to
contend with special cases like "int[]" or "timestamp with time zone".
So this proposal doesn't seem very carefully thought-through to me,
or at least the use case is much narrower than it could be.

Also, if "object does not exist" isn't supposed to be an error case,
what of "name is not correctly formatted"? It seems a bit arbitrary
to me to throw an error in one case but not the other.

I think the important point here is this is *parse*_ident(). It's not
meant to guarantee an object actually exists, what kind of object it is, or
whether it's syntactically correct. It's meant only to separate an
identifier into it's 3 (or in some cases 2) components. If this was as
simple as string_to_array(foo, '.') then it'd be a bit pointless, but it's
obviously a lot more complex than that.

parsing composite identifier is pretty complex - and almost all work is
done - it just need SQL envelope only

here is the patch

It is really trivial - all heavy work was done done before.

Regards

Pavel

Show quoted text

Pavel

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Data in Trouble? Get it in Treble! http://BlueTreble.com

Attachments:

parse_ident.patchtext/x-patch; charset=US-ASCII; name=parse_ident.patchDownload

commit 1546ec8d173c4fa91e08012af7fbfe0d64585ef0
Author: Pavel Stehule <pavel.stehule@gmail.com>
Date:   Sun Aug 23 17:43:20 2015 +0200

    initial

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 617d0b3..5d678bc 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -1707,6 +1707,23 @@
       <row>
        <entry>
         <indexterm>
+         <primary>parse_ident</primary>
+        </indexterm>
+        <literal><function>parse_ident(<parameter>qualified_identifier</parameter> <type>text</type>,
+        OUT <parameter>dbname</parameter> <type>text</type>, OUT <parameter>schemaname</parameter> <type>text</type>,
+        OUT <parameter>objectname</parameter> <type>text</type>)</function></literal>
+       </entry>
+       <entry><type>record</type></entry>
+       <entry>Split <parameter>qualified identifier</parameter> to parts <parameter>dbname</parameter>,
+        <parameter>schemaname</parameter> and <parameter>objectname</parameter>.
+       </entry>
+       <entry><literal>parse_ident('"SomeSchema".someTable')</literal></entry>
+       <entry><literal>(,SomeSchema,sometable)</literal></entry>
+      </row>
+
+      <row>
+       <entry>
+        <indexterm>
          <primary>pg_client_encoding</primary>
         </indexterm>
         <literal><function>pg_client_encoding()</function></literal>
diff --git a/src/backend/utils/adt/misc.c b/src/backend/utils/adt/misc.c
index c0495d9..19f87db 100644
--- a/src/backend/utils/adt/misc.c
+++ b/src/backend/utils/adt/misc.c
@@ -21,7 +21,9 @@
 #include <unistd.h>
 
 #include "access/sysattr.h"
+#include "access/htup_details.h"
 #include "catalog/catalog.h"
+#include "catalog/namespace.h"
 #include "catalog/pg_tablespace.h"
 #include "catalog/pg_type.h"
 #include "commands/dbcommands.h"
@@ -598,3 +600,59 @@ pg_column_is_updatable(PG_FUNCTION_ARGS)
 
 	PG_RETURN_BOOL((events & REQ_EVENTS) == REQ_EVENTS);
 }
+
+/*
+ * parse_ident - decompose text identifier to basic three parts
+ */
+Datum
+parse_ident(PG_FUNCTION_ARGS)
+{
+	text		*qualname = PG_GETARG_TEXT_PP(0);
+	List		*name_list;
+	RangeVar	*rv;
+	Datum		values[3];
+	bool		nulls[3];
+	TupleDesc	tupdesc;
+
+	name_list = stringToQualifiedNameList(text_to_cstring(qualname));
+	rv = makeRangeVarFromNameList(name_list);
+
+	/* Prepare result tuple desc */
+	tupdesc = CreateTemplateTupleDesc(3, false);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 1, "dbname",
+					    TEXTOID, -1, 0);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 2, "schemaname",
+					    TEXTOID, -1, 0);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 3, "objectname",
+					    TEXTOID, -1, 0);
+
+	BlessTupleDesc(tupdesc);
+
+	/* Fill returned values */
+	if (rv->catalogname != NULL)
+	{
+		values[0] = CStringGetTextDatum(rv->catalogname);
+		nulls[0] = false;
+	}
+	else
+		nulls[0] = true;
+
+	if (rv->schemaname != NULL)
+	{
+		values[1] = CStringGetTextDatum(rv->schemaname);
+		nulls[1] = false;
+	}
+	else
+		nulls[1] = true;
+
+	if (rv->relname != NULL)
+	{
+		values[2] = CStringGetTextDatum(rv->relname);
+		nulls[2] = false;
+	}
+	else
+		nulls[2] = true;
+
+	PG_RETURN_DATUM(HeapTupleGetDatum(
+						heap_form_tuple(tupdesc, values, nulls)));
+}
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index ddf7c67..68a694a 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -3516,6 +3516,9 @@ DESCR("I/O");
 DATA(insert OID = 4086 (  to_regnamespace	PGNSP PGUID 12 1 0 0 0 f f f f t f s 1 0 4089 "2275" _null_ _null_ _null_ _null_ _null_ to_regnamespace _null_ _null_ _null_ ));
 DESCR("convert namespace name to regnamespace");
 
+DATA(insert OID = 3300 (  parse_ident		PGNSP PGUID 12 1 0 0 0 f f f f t f i 1 0 2249 "25" "{25,25,25,25}" "{i,o,o,o}" "{identifier,dbname,schemaname,objectname}" _null_ _null_ parse_ident _null_ _null_ _null_ ));
+DESCR("parse pattern dbname.schema.object");
+
 DATA(insert OID = 2246 ( fmgr_internal_validator PGNSP PGUID 12 1 0 0 0 f f f f t f s 1 0 2278 "26" _null_ _null_ _null_ _null_ _null_ fmgr_internal_validator _null_ _null_ _null_ ));
 DESCR("(internal)");
 DATA(insert OID = 2247 ( fmgr_c_validator	PGNSP PGUID 12 1 0 0 0 f f f f t f s 1 0 2278 "26" _null_ _null_ _null_ _null_ _null_ fmgr_c_validator _null_ _null_ _null_ ));
diff --git a/src/include/utils/builtins.h b/src/include/utils/builtins.h
index fc1679e..0cb491d 100644
--- a/src/include/utils/builtins.h
+++ b/src/include/utils/builtins.h
@@ -495,6 +495,7 @@ extern Datum pg_typeof(PG_FUNCTION_ARGS);
 extern Datum pg_collation_for(PG_FUNCTION_ARGS);
 extern Datum pg_relation_is_updatable(PG_FUNCTION_ARGS);
 extern Datum pg_column_is_updatable(PG_FUNCTION_ARGS);
+extern Datum parse_ident(PG_FUNCTION_ARGS);
 
 /* oid.c */
 extern Datum oidin(PG_FUNCTION_ARGS);
diff --git a/src/test/regress/expected/name.out b/src/test/regress/expected/name.out
index b359d52..662633f 100644
--- a/src/test/regress/expected/name.out
+++ b/src/test/regress/expected/name.out
@@ -124,3 +124,14 @@ SELECT '' AS two, c.f1 FROM NAME_TBL c WHERE c.f1 ~ '.*asdf.*';
 (2 rows)
 
 DROP TABLE NAME_TBL;
+DO $$
+DECLARE r record;
+BEGIN
+  r := parse_ident('Schemax.Tabley');
+  RAISE NOTICE '%', format('%I.%I', r.schemaname, r.objectname);
+  r := parse_ident('"SchemaX"."TableY"');
+  RAISE NOTICE '%', format('%I.%I', r.schemaname, r.objectname);
+END;
+$$;
+NOTICE:  schemax.tabley
+NOTICE:  "SchemaX"."TableY"
diff --git a/src/test/regress/sql/name.sql b/src/test/regress/sql/name.sql
index 1c7a671..964ecf2 100644
--- a/src/test/regress/sql/name.sql
+++ b/src/test/regress/sql/name.sql
@@ -52,3 +52,13 @@ SELECT '' AS three, c.f1 FROM NAME_TBL c WHERE c.f1 ~ '[0-9]';
 SELECT '' AS two, c.f1 FROM NAME_TBL c WHERE c.f1 ~ '.*asdf.*';
 
 DROP TABLE NAME_TBL;
+
+DO $$
+DECLARE r record;
+BEGIN
+  r := parse_ident('Schemax.Tabley');
+  RAISE NOTICE '%', format('%I.%I', r.schemaname, r.objectname);
+  r := parse_ident('"SchemaX"."TableY"');
+  RAISE NOTICE '%', format('%I.%I', r.schemaname, r.objectname);
+END;
+$$;

#10

Andres Freund

andres@anarazel.de

over 10 years ago

In reply to: Pavel Stehule (#9)

Re: proposal: function parse_ident

On 2015-08-23 17:46:36 +0200, Pavel Stehule wrote:

here is the patch

It is really trivial - all heavy work was done done before.

This seems to entirely disregard the comments in
http://archives.postgresql.org/message-id/29030.1440030152%40sss.pgh.pa.us
about the fact that this approach only works for a few object types?

Also, for the umpteenst time: Start actually quoting in a sane manner.

Greetings,

Andres Freund

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#11

Pavel Stehule

pavel.stehule@gmail.com

over 10 years ago

In reply to: Andres Freund (#10)

Re: proposal: function parse_ident

2015-09-03 13:11 GMT+02:00 Andres Freund <andres@anarazel.de>:

On 2015-08-23 17:46:36 +0200, Pavel Stehule wrote:

here is the patch

It is really trivial - all heavy work was done done before.

This seems to entirely disregard the comments in
http://archives.postgresql.org/message-id/29030.1440030152%40sss.pgh.pa.us
about the fact that this approach only works for a few object types?

The alghoritm for parsing identifiers is same - the differences are in a
names of levels, and in ending symbols.

I don't would to write totally generic parser - more without access to
system catalog or without external hint, you cannot to decide if identifier
is schema.table or table.column. But the rules for parsing is exactly same.

The function can be redesigned little bit:

FUNCTION parse_ident(OUT level1 text,OUT level2 text,OUT level3 text,OUT
specific text)

so it can parse function myschema.myfunc(xx int)

level1: NULL
level2: myschema
level3: myfunc
specific: (xx int)

Is it acceptable?

Regards

Pavel

Show quoted text

Also, for the umpteenst time: Start actually quoting in a sane manner.

Greetings,

Andres Freund

#12

Robert Haas

robertmhaas@gmail.com

over 10 years ago

In reply to: Pavel Stehule (#11)

Re: proposal: function parse_ident

On Fri, Sep 4, 2015 at 12:24 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:

The alghoritm for parsing identifiers is same - the differences are in a
names of levels, and in ending symbols.

I don't would to write totally generic parser - more without access to
system catalog or without external hint, you cannot to decide if identifier
is schema.table or table.column. But the rules for parsing is exactly same.

The function can be redesigned little bit:

FUNCTION parse_ident(OUT level1 text,OUT level2 text,OUT level3 text,OUT
specific text)

so it can parse function myschema.myfunc(xx int)

level1: NULL
level2: myschema
level3: myfunc
specific: (xx int)

Is it acceptable?

Well, *I* think that would be useful. I'm not sure it belongs in
core, but useful? Yeah, definitely. I would probably make it text[]
rather than level1, level2, level3, though.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#13

Pavel Stehule

pavel.stehule@gmail.com

over 10 years ago

In reply to: Robert Haas (#12)

Re: proposal: function parse_ident

2015-09-08 14:06 GMT+02:00 Robert Haas <robertmhaas@gmail.com>:

On Fri, Sep 4, 2015 at 12:24 AM, Pavel Stehule <pavel.stehule@gmail.com>
wrote:

The alghoritm for parsing identifiers is same - the differences are in a
names of levels, and in ending symbols.

I don't would to write totally generic parser - more without access to
system catalog or without external hint, you cannot to decide if

identifier

is schema.table or table.column. But the rules for parsing is exactly

same.

The function can be redesigned little bit:

FUNCTION parse_ident(OUT level1 text,OUT level2 text,OUT level3 text,OUT
specific text)

so it can parse function myschema.myfunc(xx int)

level1: NULL
level2: myschema
level3: myfunc
specific: (xx int)

Is it acceptable?

Well, *I* think that would be useful. I'm not sure it belongs in
core, but useful? Yeah, definitely. I would probably make it text[]
rather than level1, level2, level3, though.

Returning a array is a good idea.

Pavel

Show quoted text

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#14

Corey Huinker

corey.huinker@gmail.com

over 10 years ago

In reply to: Pavel Stehule (#13)

Re: proposal: function parse_ident

On Tue, Sep 8, 2015 at 8:57 AM, Pavel Stehule <pavel.stehule@gmail.com>
wrote:

2015-09-08 14:06 GMT+02:00 Robert Haas <robertmhaas@gmail.com>:

On Fri, Sep 4, 2015 at 12:24 AM, Pavel Stehule <pavel.stehule@gmail.com>
wrote:

The alghoritm for parsing identifiers is same - the differences are in a
names of levels, and in ending symbols.

I don't would to write totally generic parser - more without access to
system catalog or without external hint, you cannot to decide if

identifier

is schema.table or table.column. But the rules for parsing is exactly

same.

The function can be redesigned little bit:

FUNCTION parse_ident(OUT level1 text,OUT level2 text,OUT level3 text,OUT
specific text)

so it can parse function myschema.myfunc(xx int)

level1: NULL
level2: myschema
level3: myfunc
specific: (xx int)

Is it acceptable?

Well, *I* think that would be useful. I'm not sure it belongs in
core, but useful? Yeah, definitely. I would probably make it text[]
rather than level1, level2, level3, though.

Returning a array is a good idea.

I would have immediate use for this. So often a function is written with a
table name as a parameter and it's not immediately clear if the schema is
to be parsed out of the string, prescribed, or a separate parameter...in
which case the function signature now has a clumsy optional schema
parameter somewhere. I've written this bit of code probably five times now,
let's make it a solved problem.

text[] return seems most sensible. While I can see the use for a record
output, it wouldn't be used as often.

#15

Pavel Stehule

pavel.stehule@gmail.com

over 10 years ago

In reply to: Corey Huinker (#14)

1 attachment(s)

Re: proposal: function parse_ident

2015-09-08 20:17 GMT+02:00 Corey Huinker <corey.huinker@gmail.com>:

I would have immediate use for this. So often a function is written with a
table name as a parameter and it's not immediately clear if the schema is
to be parsed out of the string, prescribed, or a separate parameter...in
which case the function signature now has a clumsy optional schema
parameter somewhere. I've written this bit of code probably five times now,
let's make it a solved problem.

text[] return seems most sensible. While I can see the use for a record
output, it wouldn't be used as often.

here is a patch

I cannot to use current SplitIdentifierString because it is designed for
different purpose - and it cannot to separate non identifier part. But the
code is simple - and will be cleaned.

postgres=# select * from parse_ident('"AHOJ".NAZDAR[]'::text);
┌───────────────┬───────┐
│ parts │ other │
╞═══════════════╪═══════╡
│ {AHOJ,nazdar} │ [] │
└───────────────┴───────┘
(1 row)

Regards

Pavel

Attachments:

parse_ident-to-text-array.patchtext/x-patch; charset=US-ASCII; name=parse_ident-to-text-array.patchDownload

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
new file mode 100644
index b3b78d2..75ea33a
*** a/doc/src/sgml/func.sgml
--- b/doc/src/sgml/func.sgml
***************
*** 1707,1712 ****
--- 1707,1727 ----
        <row>
         <entry>
          <indexterm>
+          <primary>parse_ident</primary>
+         </indexterm>
+         <literal><function>parse_ident(<parameter>qualified_identifier</parameter> <type>text</type>,
+         OUT <parameter>parts</parameter> <type>text[]</type>, OUT <parameter>other</parameter> <type>text</type>)</function></literal>
+        </entry>
+        <entry><type>record</type></entry>
+        <entry>Split <parameter>qualified identifier</parameter> to array <parameter>parts</parameter>.
+        </entry>
+        <entry><literal>parse_ident('"SomeSchema".someTable')</literal></entry>
+        <entry><literal>("{SomeSchema,sometable}",)</literal></entry>
+       </row>
+ 
+       <row>
+        <entry>
+         <indexterm>
           <primary>pg_client_encoding</primary>
          </indexterm>
          <literal><function>pg_client_encoding()</function></literal>
diff --git a/src/backend/utils/adt/misc.c b/src/backend/utils/adt/misc.c
new file mode 100644
index c0495d9..f5b6067
*** a/src/backend/utils/adt/misc.c
--- b/src/backend/utils/adt/misc.c
***************
*** 21,32 ****
--- 21,35 ----
  #include <unistd.h>
  
  #include "access/sysattr.h"
+ #include "access/htup_details.h"
  #include "catalog/catalog.h"
+ #include "catalog/namespace.h"
  #include "catalog/pg_tablespace.h"
  #include "catalog/pg_type.h"
  #include "commands/dbcommands.h"
  #include "funcapi.h"
  #include "miscadmin.h"
+ #include "parser/scansup.h"
  #include "parser/keywords.h"
  #include "postmaster/syslogger.h"
  #include "rewrite/rewriteHandler.h"
***************
*** 38,43 ****
--- 41,47 ----
  #include "utils/ruleutils.h"
  #include "tcop/tcopprot.h"
  #include "utils/acl.h"
+ #include "utils/array.h"
  #include "utils/builtins.h"
  #include "utils/timestamp.h"
  
*************** pg_column_is_updatable(PG_FUNCTION_ARGS)
*** 598,600 ****
--- 602,728 ----
  
  	PG_RETURN_BOOL((events & REQ_EVENTS) == REQ_EVENTS);
  }
+ 
+ static bool
+ is_ident_start(char c)
+ {
+ 	return  c == '_' || (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
+ }
+ 
+ static bool
+ is_ident_cont(char c)
+ {
+ 	return  c == '_' || (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
+ 		    || (c >= '0' && c <= '9');
+ 
+ }
+ 
+ /*
+  * parse_ident - decompose text identifier to basic three parts
+  */
+ Datum
+ parse_ident(PG_FUNCTION_ARGS)
+ {
+ 	text		*qualname = PG_GETARG_TEXT_PP(0);
+ 	char		*qualname_str;
+ 	Datum		values[2];
+ 	bool		nulls[2];
+ 	TupleDesc	tupdesc;
+ 	ArrayBuildState *astate = NULL;
+ 	char	*nextp;
+ 
+ 	qualname_str = text_to_cstring(qualname);
+ 	nextp = qualname_str;
+ 
+ 	/* Prepare result tuple desc */
+ 	tupdesc = CreateTemplateTupleDesc(2, false);
+ 	TupleDescInitEntry(tupdesc, (AttrNumber) 1, "parts",
+ 					    TEXTARRAYOID, -1, 0);
+ 	TupleDescInitEntry(tupdesc, (AttrNumber) 2, "other",
+ 					    TEXTOID, -1, 0);
+ 
+ 	BlessTupleDesc(tupdesc);
+ 
+ 	nulls[0] = false;
+ 	nulls[1] = true;
+ 
+ 	/* skip leading whitespace */
+ 	while (isspace((unsigned char) *nextp))
+ 		nextp++;
+ 
+ 	do
+ 	{
+ 		char		*curname;
+ 		char		*endp;
+ 
+ 		if (*nextp == '\"')
+ 		{
+ 			curname = nextp + 1;
+ 			for (;;)
+ 			{
+ 				endp = strchr(nextp + 1, '\"');
+ 				if (endp == NULL)
+ 					elog(ERROR, "unclosed double quotes");
+ 				if (endp[1] != '\"')
+ 					break;
+ 				memmove(endp, endp + 1, strlen(endp));
+ 				nextp = endp;
+ 			}
+ 			nextp = endp + 1;
+ 			*endp = '\0';
+ 
+ 			astate = accumArrayResult(astate,
+ 				CStringGetTextDatum(curname), false,
+ 						    TEXTOID, CurrentMemoryContext);
+ 		}
+ 		else
+ 		{
+ 			if (is_ident_start(*nextp))
+ 			{
+ 				char *downname;
+ 				int	len;
+ 				text	*part;
+ 
+ 				curname = nextp++;
+ 				while (is_ident_cont(*nextp))
+ 					nextp++;
+ 
+ 				len = nextp - curname;
+ 
+ 				downname = downcase_truncate_identifier(curname, len, false);
+ 				part = cstring_to_text_with_len(downname, len);
+ 				astate = accumArrayResult(astate,
+ 					PointerGetDatum(part), false,
+ 							    TEXTOID, CurrentMemoryContext);
+ 			}
+ 		}
+ 
+ 		while (isspace((unsigned char) *nextp))
+ 			nextp++;
+ 
+ 		if (*nextp == '.')
+ 		{
+ 			nextp++;
+ 			while (isspace((unsigned char) *nextp))
+ 				nextp++;
+ 			continue;
+ 		}
+ 		else if (*nextp == '\0')
+ 		{
+ 			break;
+ 		}
+ 		else
+ 		{
+ 			values[1] = CStringGetTextDatum(nextp);
+ 			nulls[1] = false;
+ 			break;
+ 		}
+ 	} while (true);
+ 
+ 
+ 	values[0] = makeArrayResult(astate, CurrentMemoryContext);
+ 
+ 
+ 	PG_RETURN_DATUM(HeapTupleGetDatum(
+ 			    heap_form_tuple(tupdesc, values, nulls)));
+ }
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
new file mode 100644
index ddf7c67..3ae7b50
*** a/src/include/catalog/pg_proc.h
--- b/src/include/catalog/pg_proc.h
*************** DESCR("I/O");
*** 3516,3521 ****
--- 3516,3524 ----
  DATA(insert OID = 4086 (  to_regnamespace	PGNSP PGUID 12 1 0 0 0 f f f f t f s 1 0 4089 "2275" _null_ _null_ _null_ _null_ _null_ to_regnamespace _null_ _null_ _null_ ));
  DESCR("convert namespace name to regnamespace");
  
+ DATA(insert OID = 3300 (  parse_ident		PGNSP PGUID 12 1 0 0 0 f f f f t f i 1 0 2249 "25" "{25,1009,25}" "{i,o,o}" "{identifier,parts,other}" _null_ _null_ parse_ident _null_ _null_ _null_ ));
+ DESCR("parse pattern dbname.schema.object");
+ 
  DATA(insert OID = 2246 ( fmgr_internal_validator PGNSP PGUID 12 1 0 0 0 f f f f t f s 1 0 2278 "26" _null_ _null_ _null_ _null_ _null_ fmgr_internal_validator _null_ _null_ _null_ ));
  DESCR("(internal)");
  DATA(insert OID = 2247 ( fmgr_c_validator	PGNSP PGUID 12 1 0 0 0 f f f f t f s 1 0 2278 "26" _null_ _null_ _null_ _null_ _null_ fmgr_c_validator _null_ _null_ _null_ ));
diff --git a/src/include/utils/builtins.h b/src/include/utils/builtins.h
new file mode 100644
index fc1679e..0cb491d
*** a/src/include/utils/builtins.h
--- b/src/include/utils/builtins.h
*************** extern Datum pg_typeof(PG_FUNCTION_ARGS)
*** 495,500 ****
--- 495,501 ----
  extern Datum pg_collation_for(PG_FUNCTION_ARGS);
  extern Datum pg_relation_is_updatable(PG_FUNCTION_ARGS);
  extern Datum pg_column_is_updatable(PG_FUNCTION_ARGS);
+ extern Datum parse_ident(PG_FUNCTION_ARGS);
  
  /* oid.c */
  extern Datum oidin(PG_FUNCTION_ARGS);
diff --git a/src/test/regress/expected/name.out b/src/test/regress/expected/name.out
new file mode 100644
index b359d52..2c84d8a
*** a/src/test/regress/expected/name.out
--- b/src/test/regress/expected/name.out
*************** SELECT '' AS two, c.f1 FROM NAME_TBL c W
*** 124,126 ****
--- 124,137 ----
  (2 rows)
  
  DROP TABLE NAME_TBL;
+ DO $$
+ DECLARE r record;
+ BEGIN
+   r := parse_ident('Schemax.Tabley');
+   RAISE NOTICE '%', format('%I.%I', r.parts[1], r.parts[2]);
+   r := parse_ident('"SchemaX"."TableY"');
+   RAISE NOTICE '%', format('%I.%I', r.parts[1], r.parts[2]);
+ END;
+ $$;
+ NOTICE:  schemax.tabley
+ NOTICE:  "SchemaX"."TableY"
diff --git a/src/test/regress/sql/name.sql b/src/test/regress/sql/name.sql
new file mode 100644
index 1c7a671..e80f9aa
*** a/src/test/regress/sql/name.sql
--- b/src/test/regress/sql/name.sql
*************** SELECT '' AS three, c.f1 FROM NAME_TBL c
*** 52,54 ****
--- 52,64 ----
  SELECT '' AS two, c.f1 FROM NAME_TBL c WHERE c.f1 ~ '.*asdf.*';
  
  DROP TABLE NAME_TBL;
+ 
+ DO $$
+ DECLARE r record;
+ BEGIN
+   r := parse_ident('Schemax.Tabley');
+   RAISE NOTICE '%', format('%I.%I', r.parts[1], r.parts[2]);
+   r := parse_ident('"SchemaX"."TableY"');
+   RAISE NOTICE '%', format('%I.%I', r.parts[1], r.parts[2]);
+ END;
+ $$;

#16

Pavel Stehule

pavel.stehule@gmail.com

over 10 years ago

In reply to: Pavel Stehule (#15)

1 attachment(s)

Re: proposal: function parse_ident

I sending the path with little bit enhanced parser - it doesn't support
utf8 alpha in identifiers yet

Regards

Pavel

Attachments:

parse_ident-to-text-array.patchtext/x-patch; charset=US-ASCII; name=parse_ident-to-text-array.patchDownload

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
new file mode 100644
index b3b78d2..75ea33a
*** a/doc/src/sgml/func.sgml
--- b/doc/src/sgml/func.sgml
***************
*** 1707,1712 ****
--- 1707,1727 ----
        <row>
         <entry>
          <indexterm>
+          <primary>parse_ident</primary>
+         </indexterm>
+         <literal><function>parse_ident(<parameter>qualified_identifier</parameter> <type>text</type>,
+         OUT <parameter>parts</parameter> <type>text[]</type>, OUT <parameter>other</parameter> <type>text</type>)</function></literal>
+        </entry>
+        <entry><type>record</type></entry>
+        <entry>Split <parameter>qualified identifier</parameter> to array <parameter>parts</parameter>.
+        </entry>
+        <entry><literal>parse_ident('"SomeSchema".someTable')</literal></entry>
+        <entry><literal>("{SomeSchema,sometable}",)</literal></entry>
+       </row>
+ 
+       <row>
+        <entry>
+         <indexterm>
           <primary>pg_client_encoding</primary>
          </indexterm>
          <literal><function>pg_client_encoding()</function></literal>
diff --git a/src/backend/utils/adt/misc.c b/src/backend/utils/adt/misc.c
new file mode 100644
index c0495d9..a116e8e
*** a/src/backend/utils/adt/misc.c
--- b/src/backend/utils/adt/misc.c
***************
*** 21,32 ****
--- 21,35 ----
  #include <unistd.h>
  
  #include "access/sysattr.h"
+ #include "access/htup_details.h"
  #include "catalog/catalog.h"
+ #include "catalog/namespace.h"
  #include "catalog/pg_tablespace.h"
  #include "catalog/pg_type.h"
  #include "commands/dbcommands.h"
  #include "funcapi.h"
  #include "miscadmin.h"
+ #include "parser/scansup.h"
  #include "parser/keywords.h"
  #include "postmaster/syslogger.h"
  #include "rewrite/rewriteHandler.h"
***************
*** 38,43 ****
--- 41,47 ----
  #include "utils/ruleutils.h"
  #include "tcop/tcopprot.h"
  #include "utils/acl.h"
+ #include "utils/array.h"
  #include "utils/builtins.h"
  #include "utils/timestamp.h"
  
*************** pg_column_is_updatable(PG_FUNCTION_ARGS)
*** 598,600 ****
--- 602,758 ----
  
  	PG_RETURN_BOOL((events & REQ_EVENTS) == REQ_EVENTS);
  }
+ 
+ 
+ /*
+  * This simple parser utility are compatible with lexer implementation,
+  * used only in parse_ident function
+  */
+ static bool
+ is_ident_start(char c)
+ {
+ 	if (c == '_')
+ 		return true;
+ 
+ 	if (c >= 'a' && c <= 'z')
+ 		return true;
+ 	if (c >= 'A' && c <= 'Z')
+ 		return true;
+ 
+ 	if (c >= 0200 && c <= 0377)
+ 		return true;
+ 
+ 	return true;
+ }
+ 
+ static bool
+ is_ident_cont(char c)
+ {
+ 	if (c >= '0' && c <= '9')
+ 		return true;
+ 
+ 	return is_ident_start(c);
+ }
+ 
+ /*
+  * parse_ident - decompose text identifier to parts of
+  * identifiers and the rest (doesn't follow a identifier rule).
+  */
+ Datum
+ parse_ident(PG_FUNCTION_ARGS)
+ {
+ 	text		*qualname = PG_GETARG_TEXT_PP(0);
+ 	char		*qualname_str;
+ 	Datum		values[2];
+ 	bool		nulls[2];
+ 	TupleDesc	tupdesc;
+ 	ArrayBuildState *astate = NULL;
+ 	char	*nextp;
+ 
+ 	qualname_str = text_to_cstring(qualname);
+ 	nextp = qualname_str;
+ 
+ 	/* Prepare result tuple desc */
+ 	tupdesc = CreateTemplateTupleDesc(2, false);
+ 	TupleDescInitEntry(tupdesc, (AttrNumber) 1, "parts",
+ 					    TEXTARRAYOID, -1, 0);
+ 	TupleDescInitEntry(tupdesc, (AttrNumber) 2, "other",
+ 					    TEXTOID, -1, 0);
+ 
+ 	BlessTupleDesc(tupdesc);
+ 
+ 	nulls[0] = false;
+ 	nulls[1] = true;
+ 
+ 	/* skip leading whitespace */
+ 	while (isspace((unsigned char) *nextp))
+ 		nextp++;
+ 
+ 	do
+ 	{
+ 		char		*curname;
+ 		char		*endp;
+ 		bool		missing_ident;
+ 
+ 		missing_ident = true;
+ 
+ 		if (*nextp == '\"')
+ 		{
+ 			curname = nextp + 1;
+ 			for (;;)
+ 			{
+ 				endp = strchr(nextp + 1, '\"');
+ 				if (endp == NULL)
+ 					elog(ERROR, "unclosed double quotes");
+ 				if (endp[1] != '\"')
+ 					break;
+ 				memmove(endp, endp + 1, strlen(endp));
+ 				nextp = endp;
+ 			}
+ 			nextp = endp + 1;
+ 			*endp = '\0';
+ 
+ 			if (endp - curname == 0)
+ 				elog(ERROR, "empty identifier");
+ 
+ 			astate = accumArrayResult(astate,
+ 				CStringGetTextDatum(curname), false,
+ 						    TEXTOID, CurrentMemoryContext);
+ 			missing_ident = false;
+ 		}
+ 		else
+ 		{
+ 			if (is_ident_start(*nextp))
+ 			{
+ 				char *downname;
+ 				int	len;
+ 				text	*part;
+ 
+ 				curname = nextp++;
+ 				while (is_ident_cont(*nextp))
+ 					nextp++;
+ 
+ 				len = nextp - curname;
+ 
+ 				downname = downcase_truncate_identifier(curname, len, false);
+ 				part = cstring_to_text_with_len(downname, len);
+ 				astate = accumArrayResult(astate,
+ 					PointerGetDatum(part), false,
+ 							    TEXTOID, CurrentMemoryContext);
+ 				missing_ident = false;
+ 			}
+ 		}
+ 
+ 		if (missing_ident)
+ 			elog(ERROR, "missing identifier");
+ 
+ 		while (isspace((unsigned char) *nextp))
+ 			nextp++;
+ 
+ 		if (*nextp == '.')
+ 		{
+ 			nextp++;
+ 			while (isspace((unsigned char) *nextp))
+ 				nextp++;
+ 			continue;
+ 		}
+ 		else if (*nextp == '\0')
+ 		{
+ 			break;
+ 		}
+ 		else
+ 		{
+ 			values[1] = CStringGetTextDatum(nextp);
+ 			nulls[1] = false;
+ 			break;
+ 		}
+ 	} while (true);
+ 
+ 
+ 	values[0] = makeArrayResult(astate, CurrentMemoryContext);
+ 	nulls[0] = false;
+ 
+ 	PG_RETURN_DATUM(HeapTupleGetDatum(
+ 			    heap_form_tuple(tupdesc, values, nulls)));
+ }
+     
\ No newline at end of file
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
new file mode 100644
index ddf7c67..3ae7b50
*** a/src/include/catalog/pg_proc.h
--- b/src/include/catalog/pg_proc.h
*************** DESCR("I/O");
*** 3516,3521 ****
--- 3516,3524 ----
  DATA(insert OID = 4086 (  to_regnamespace	PGNSP PGUID 12 1 0 0 0 f f f f t f s 1 0 4089 "2275" _null_ _null_ _null_ _null_ _null_ to_regnamespace _null_ _null_ _null_ ));
  DESCR("convert namespace name to regnamespace");
  
+ DATA(insert OID = 3300 (  parse_ident		PGNSP PGUID 12 1 0 0 0 f f f f t f i 1 0 2249 "25" "{25,1009,25}" "{i,o,o}" "{identifier,parts,other}" _null_ _null_ parse_ident _null_ _null_ _null_ ));
+ DESCR("parse pattern dbname.schema.object");
+ 
  DATA(insert OID = 2246 ( fmgr_internal_validator PGNSP PGUID 12 1 0 0 0 f f f f t f s 1 0 2278 "26" _null_ _null_ _null_ _null_ _null_ fmgr_internal_validator _null_ _null_ _null_ ));
  DESCR("(internal)");
  DATA(insert OID = 2247 ( fmgr_c_validator	PGNSP PGUID 12 1 0 0 0 f f f f t f s 1 0 2278 "26" _null_ _null_ _null_ _null_ _null_ fmgr_c_validator _null_ _null_ _null_ ));
diff --git a/src/include/utils/builtins.h b/src/include/utils/builtins.h
new file mode 100644
index fc1679e..0cb491d
*** a/src/include/utils/builtins.h
--- b/src/include/utils/builtins.h
*************** extern Datum pg_typeof(PG_FUNCTION_ARGS)
*** 495,500 ****
--- 495,501 ----
  extern Datum pg_collation_for(PG_FUNCTION_ARGS);
  extern Datum pg_relation_is_updatable(PG_FUNCTION_ARGS);
  extern Datum pg_column_is_updatable(PG_FUNCTION_ARGS);
+ extern Datum parse_ident(PG_FUNCTION_ARGS);
  
  /* oid.c */
  extern Datum oidin(PG_FUNCTION_ARGS);
diff --git a/src/test/regress/expected/name.out b/src/test/regress/expected/name.out
new file mode 100644
index b359d52..2c84d8a
*** a/src/test/regress/expected/name.out
--- b/src/test/regress/expected/name.out
*************** SELECT '' AS two, c.f1 FROM NAME_TBL c W
*** 124,126 ****
--- 124,137 ----
  (2 rows)
  
  DROP TABLE NAME_TBL;
+ DO $$
+ DECLARE r record;
+ BEGIN
+   r := parse_ident('Schemax.Tabley');
+   RAISE NOTICE '%', format('%I.%I', r.parts[1], r.parts[2]);
+   r := parse_ident('"SchemaX"."TableY"');
+   RAISE NOTICE '%', format('%I.%I', r.parts[1], r.parts[2]);
+ END;
+ $$;
+ NOTICE:  schemax.tabley
+ NOTICE:  "SchemaX"."TableY"
diff --git a/src/test/regress/sql/name.sql b/src/test/regress/sql/name.sql
new file mode 100644
index 1c7a671..e80f9aa
*** a/src/test/regress/sql/name.sql
--- b/src/test/regress/sql/name.sql
*************** SELECT '' AS three, c.f1 FROM NAME_TBL c
*** 52,54 ****
--- 52,64 ----
  SELECT '' AS two, c.f1 FROM NAME_TBL c WHERE c.f1 ~ '.*asdf.*';
  
  DROP TABLE NAME_TBL;
+ 
+ DO $$
+ DECLARE r record;
+ BEGIN
+   r := parse_ident('Schemax.Tabley');
+   RAISE NOTICE '%', format('%I.%I', r.parts[1], r.parts[2]);
+   r := parse_ident('"SchemaX"."TableY"');
+   RAISE NOTICE '%', format('%I.%I', r.parts[1], r.parts[2]);
+ END;
+ $$;

#17

Pavel Stehule

pavel.stehule@gmail.com

over 10 years ago

In reply to: Pavel Stehule (#16)

1 attachment(s)

Re: proposal: function parse_ident

next iteration - fixed bug in parsing UTF8 chars, enhanced error messages.

Regards

Pavel

Attachments:

parse_ident-to-text-array-01.patchtext/x-patch; charset=US-ASCII; name=parse_ident-to-text-array-01.patchDownload

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
new file mode 100644
index b3b78d2..75ea33a
*** a/doc/src/sgml/func.sgml
--- b/doc/src/sgml/func.sgml
***************
*** 1707,1712 ****
--- 1707,1727 ----
        <row>
         <entry>
          <indexterm>
+          <primary>parse_ident</primary>
+         </indexterm>
+         <literal><function>parse_ident(<parameter>qualified_identifier</parameter> <type>text</type>,
+         OUT <parameter>parts</parameter> <type>text[]</type>, OUT <parameter>other</parameter> <type>text</type>)</function></literal>
+        </entry>
+        <entry><type>record</type></entry>
+        <entry>Split <parameter>qualified identifier</parameter> to array <parameter>parts</parameter>.
+        </entry>
+        <entry><literal>parse_ident('"SomeSchema".someTable')</literal></entry>
+        <entry><literal>("{SomeSchema,sometable}",)</literal></entry>
+       </row>
+ 
+       <row>
+        <entry>
+         <indexterm>
           <primary>pg_client_encoding</primary>
          </indexterm>
          <literal><function>pg_client_encoding()</function></literal>
diff --git a/src/backend/utils/adt/misc.c b/src/backend/utils/adt/misc.c
new file mode 100644
index c0495d9..59f1e3e
*** a/src/backend/utils/adt/misc.c
--- b/src/backend/utils/adt/misc.c
***************
*** 21,32 ****
--- 21,35 ----
  #include <unistd.h>
  
  #include "access/sysattr.h"
+ #include "access/htup_details.h"
  #include "catalog/catalog.h"
+ #include "catalog/namespace.h"
  #include "catalog/pg_tablespace.h"
  #include "catalog/pg_type.h"
  #include "commands/dbcommands.h"
  #include "funcapi.h"
  #include "miscadmin.h"
+ #include "parser/scansup.h"
  #include "parser/keywords.h"
  #include "postmaster/syslogger.h"
  #include "rewrite/rewriteHandler.h"
***************
*** 38,43 ****
--- 41,47 ----
  #include "utils/ruleutils.h"
  #include "tcop/tcopprot.h"
  #include "utils/acl.h"
+ #include "utils/array.h"
  #include "utils/builtins.h"
  #include "utils/timestamp.h"
  
*************** pg_column_is_updatable(PG_FUNCTION_ARGS)
*** 598,600 ****
--- 602,759 ----
  
  	PG_RETURN_BOOL((events & REQ_EVENTS) == REQ_EVENTS);
  }
+ 
+ 
+ /*
+  * This simple parser utility are compatible with lexer implementation,
+  * used only in parse_ident function
+  */
+ static bool
+ is_ident_start(unsigned char c)
+ {
+ 	if (c == '_')
+ 		return true;
+ 	if (isalpha(c))
+ 		return true;
+ 
+ 	if (c >= 0200 && c <= 0377)
+ 		return true;
+ 
+ 	return false;
+ }
+ 
+ static bool
+ is_ident_cont(unsigned char c)
+ {
+ 	if (isdigit(c))
+ 		return true;
+ 
+ 	return is_ident_start(c);
+ }
+ 
+ /*
+  * parse_ident - decompose text identifier to parts of
+  * identifiers and the rest (doesn't follow a identifier rule).
+  */
+ Datum
+ parse_ident(PG_FUNCTION_ARGS)
+ {
+ 	text		*qualname = PG_GETARG_TEXT_PP(0);
+ 	char		*qualname_str;
+ 	Datum		values[2];
+ 	bool		nulls[2];
+ 	TupleDesc	tupdesc;
+ 	ArrayBuildState *astate = NULL;
+ 	char	*nextp;
+ 
+ 	qualname_str = text_to_cstring(qualname);
+ 	nextp = qualname_str;
+ 
+ 	/* Prepare result tuple desc */
+ 	tupdesc = CreateTemplateTupleDesc(2, false);
+ 	TupleDescInitEntry(tupdesc, (AttrNumber) 1, "parts",
+ 					    TEXTARRAYOID, -1, 0);
+ 	TupleDescInitEntry(tupdesc, (AttrNumber) 2, "other",
+ 					    TEXTOID, -1, 0);
+ 
+ 	BlessTupleDesc(tupdesc);
+ 
+ 	nulls[0] = false;
+ 	nulls[1] = true;
+ 
+ 	/* skip leading whitespace */
+ 	while (isspace((unsigned char) *nextp))
+ 		nextp++;
+ 
+ 	do
+ 	{
+ 		char		*curname;
+ 		char		*endp;
+ 		bool		missing_ident;
+ 
+ 		missing_ident = true;
+ 
+ 		if (*nextp == '\"')
+ 		{
+ 			curname = nextp + 1;
+ 			for (;;)
+ 			{
+ 				endp = strchr(nextp + 1, '\"');
+ 				if (endp == NULL)
+ 					ereport(ERROR,
+ 						(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ 						 errmsg("unclused double quotes")));
+ 				if (endp[1] != '\"')
+ 					break;
+ 				memmove(endp, endp + 1, strlen(endp));
+ 				nextp = endp;
+ 			}
+ 			nextp = endp + 1;
+ 			*endp = '\0';
+ 
+ 			if (endp - curname == 0)
+ 				ereport(ERROR,
+ 					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ 					 errmsg("identifier should not be empty")));
+ 
+ 			astate = accumArrayResult(astate,
+ 				CStringGetTextDatum(curname), false,
+ 						    TEXTOID, CurrentMemoryContext);
+ 			missing_ident = false;
+ 		}
+ 		else
+ 		{
+ 			if (is_ident_start((unsigned char) *nextp))
+ 			{
+ 				char *downname;
+ 				int	len;
+ 				text	*part;
+ 
+ 				curname = nextp++;
+ 				while (is_ident_cont((unsigned char) *nextp))
+ 					nextp++;
+ 
+ 				len = nextp - curname;
+ 
+ 				downname = downcase_truncate_identifier(curname, len, false);
+ 				part = cstring_to_text_with_len(downname, len);
+ 				astate = accumArrayResult(astate,
+ 					PointerGetDatum(part), false,
+ 							    TEXTOID, CurrentMemoryContext);
+ 				missing_ident = false;
+ 			}
+ 		}
+ 
+ 		if (missing_ident)
+ 			ereport(ERROR,
+ 				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ 				 errmsg("missing identifier after \".\" symbol")));
+ 
+ 		while (isspace((unsigned char) *nextp))
+ 			nextp++;
+ 
+ 		if (*nextp == '.')
+ 		{
+ 			nextp++;
+ 			while (isspace((unsigned char) *nextp))
+ 				nextp++;
+ 			continue;
+ 		}
+ 		else if (*nextp == '\0')
+ 		{
+ 			break;
+ 		}
+ 		else
+ 		{
+ 			values[1] = CStringGetTextDatum(nextp);
+ 			nulls[1] = false;
+ 			break;
+ 		}
+ 	} while (true);
+ 
+ 	values[0] = makeArrayResult(astate, CurrentMemoryContext);
+ 	nulls[0] = false;
+ 
+ 	PG_RETURN_DATUM(HeapTupleGetDatum(
+ 			    heap_form_tuple(tupdesc, values, nulls)));
+ }
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
new file mode 100644
index ddf7c67..3ae7b50
*** a/src/include/catalog/pg_proc.h
--- b/src/include/catalog/pg_proc.h
*************** DESCR("I/O");
*** 3516,3521 ****
--- 3516,3524 ----
  DATA(insert OID = 4086 (  to_regnamespace	PGNSP PGUID 12 1 0 0 0 f f f f t f s 1 0 4089 "2275" _null_ _null_ _null_ _null_ _null_ to_regnamespace _null_ _null_ _null_ ));
  DESCR("convert namespace name to regnamespace");
  
+ DATA(insert OID = 3300 (  parse_ident		PGNSP PGUID 12 1 0 0 0 f f f f t f i 1 0 2249 "25" "{25,1009,25}" "{i,o,o}" "{identifier,parts,other}" _null_ _null_ parse_ident _null_ _null_ _null_ ));
+ DESCR("parse pattern dbname.schema.object");
+ 
  DATA(insert OID = 2246 ( fmgr_internal_validator PGNSP PGUID 12 1 0 0 0 f f f f t f s 1 0 2278 "26" _null_ _null_ _null_ _null_ _null_ fmgr_internal_validator _null_ _null_ _null_ ));
  DESCR("(internal)");
  DATA(insert OID = 2247 ( fmgr_c_validator	PGNSP PGUID 12 1 0 0 0 f f f f t f s 1 0 2278 "26" _null_ _null_ _null_ _null_ _null_ fmgr_c_validator _null_ _null_ _null_ ));
diff --git a/src/include/utils/builtins.h b/src/include/utils/builtins.h
new file mode 100644
index fc1679e..0cb491d
*** a/src/include/utils/builtins.h
--- b/src/include/utils/builtins.h
*************** extern Datum pg_typeof(PG_FUNCTION_ARGS)
*** 495,500 ****
--- 495,501 ----
  extern Datum pg_collation_for(PG_FUNCTION_ARGS);
  extern Datum pg_relation_is_updatable(PG_FUNCTION_ARGS);
  extern Datum pg_column_is_updatable(PG_FUNCTION_ARGS);
+ extern Datum parse_ident(PG_FUNCTION_ARGS);
  
  /* oid.c */
  extern Datum oidin(PG_FUNCTION_ARGS);
diff --git a/src/test/regress/expected/name.out b/src/test/regress/expected/name.out
new file mode 100644
index b359d52..2c84d8a
*** a/src/test/regress/expected/name.out
--- b/src/test/regress/expected/name.out
*************** SELECT '' AS two, c.f1 FROM NAME_TBL c W
*** 124,126 ****
--- 124,137 ----
  (2 rows)
  
  DROP TABLE NAME_TBL;
+ DO $$
+ DECLARE r record;
+ BEGIN
+   r := parse_ident('Schemax.Tabley');
+   RAISE NOTICE '%', format('%I.%I', r.parts[1], r.parts[2]);
+   r := parse_ident('"SchemaX"."TableY"');
+   RAISE NOTICE '%', format('%I.%I', r.parts[1], r.parts[2]);
+ END;
+ $$;
+ NOTICE:  schemax.tabley
+ NOTICE:  "SchemaX"."TableY"
diff --git a/src/test/regress/sql/name.sql b/src/test/regress/sql/name.sql
new file mode 100644
index 1c7a671..e80f9aa
*** a/src/test/regress/sql/name.sql
--- b/src/test/regress/sql/name.sql
*************** SELECT '' AS three, c.f1 FROM NAME_TBL c
*** 52,54 ****
--- 52,64 ----
  SELECT '' AS two, c.f1 FROM NAME_TBL c WHERE c.f1 ~ '.*asdf.*';
  
  DROP TABLE NAME_TBL;
+ 
+ DO $$
+ DECLARE r record;
+ BEGIN
+   r := parse_ident('Schemax.Tabley');
+   RAISE NOTICE '%', format('%I.%I', r.parts[1], r.parts[2]);
+   r := parse_ident('"SchemaX"."TableY"');
+   RAISE NOTICE '%', format('%I.%I', r.parts[1], r.parts[2]);
+ END;
+ $$;

#18

Alvaro Herrera

alvherre@2ndquadrant.com

over 10 years ago

In reply to: Pavel Stehule (#15)

Re: proposal: function parse_ident

Pavel Stehule wrote:

I cannot to use current SplitIdentifierString because it is designed for
different purpose - and it cannot to separate non identifier part. But the
code is simple - and will be cleaned.

postgres=# select * from parse_ident('"AHOJ".NAZDAR[]'::text);
┌───────────────┬───────┐
│ parts │ other │
╞═══════════════╪═══════╡
│ {AHOJ,nazdar} │ [] │
└───────────────┴───────┘
(1 row)

Um. Now this is really getting into much of the same territory I got
into with the objname/objargs arrays for pg_get_object_address. I think
the "other" bit is a very poor solution to that.

If you want to be able to parse names for all kinds of objects, you need
a solution much more complex than this function. I think a clean
solution returns three sets of things; one is the primary part of the
name, which is an array of text; the second is the secondary name, which
is another array of text; the third is an array of TypeName.

For the name of a relation, only the first of these arrays is used. For
the name of objects subsidiary to a relation, the first two are used
(the first array is the name of the relation itself, and the second
array is the name of the object; for instance a constraint name, or a
trigger name).

The array of type names is necessary because the parsing of TypeName is
completely unlike parsing of plain names. You need [] decorator and
typmod. If you consider objects such as casts, you need two TypeNames
("from" and "to"), hence this is an array and not a single one. As far
as I recall there are other object types that also need more than one
TypeName.

For the name of a function, you need the first text array, and the array
of TypeName which are the input argument types.

If you don't want to have all this complexity, I think you need to forgo
the idea of the "other" thingy that you propose above, and just concern
yourself with the first bits. I don't think "AHOJ".NAZDAR[] is an
identifier.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#19

Pavel Stehule

pavel.stehule@gmail.com

over 10 years ago

In reply to: Alvaro Herrera (#18)

Re: proposal: function parse_ident

2015-09-09 21:55 GMT+02:00 Alvaro Herrera <alvherre@2ndquadrant.com>:

Pavel Stehule wrote:

I cannot to use current SplitIdentifierString because it is designed for
different purpose - and it cannot to separate non identifier part. But

the

code is simple - and will be cleaned.

postgres=# select * from parse_ident('"AHOJ".NAZDAR[]'::text);
┌───────────────┬───────┐
│ parts │ other │
╞═══════════════╪═══════╡
│ {AHOJ,nazdar} │ [] │
└───────────────┴───────┘
(1 row)

Um. Now this is really getting into much of the same territory I got
into with the objname/objargs arrays for pg_get_object_address. I think
the "other" bit is a very poor solution to that.

If you want to be able to parse names for all kinds of objects, you need
a solution much more complex than this function. I think a clean
solution returns three sets of things; one is the primary part of the
name, which is an array of text; the second is the secondary name, which
is another array of text; the third is an array of TypeName.

For the name of a relation, only the first of these arrays is used. For
the name of objects subsidiary to a relation, the first two are used
(the first array is the name of the relation itself, and the second
array is the name of the object; for instance a constraint name, or a
trigger name).

The array of type names is necessary because the parsing of TypeName is
completely unlike parsing of plain names. You need [] decorator and
typmod. If you consider objects such as casts, you need two TypeNames
("from" and "to"), hence this is an array and not a single one. As far
as I recall there are other object types that also need more than one
TypeName.

For the name of a function, you need the first text array, and the array
of TypeName which are the input argument types.

If you don't want to have all this complexity, I think you need to forgo
the idea of the "other" thingy that you propose above, and just concern
yourself with the first bits. I don't think "AHOJ".NAZDAR[] is an
identifier.

yes, usually I don't need a "other" part. And If I need it, then I can get
it as difference against a original string. But sometimes you don't get a
clean string - and you have to find a end of identifier. The
SplitIdentifierString calculates only with separator char, and it cannot to
find end of ident. So little bit modified API can look like

CREATE OR REPLACE FUNCTION parse_ident(str text, strict boolean DEFAULT
true) RETURNS text[]

raise exception "syntax error" for '"AHOJ".NAZDAR[]' when "strict" is true
returns "AHOJ".nazdar for '"AHOJ".NAZDAR[]' when "strict" is false

Pavel

Show quoted text

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#20

Pavel Stehule

pavel.stehule@gmail.com

over 10 years ago

In reply to: Pavel Stehule (#19)

1 attachment(s)

Re: proposal: function parse_ident

new update of parse_ident function patch

Regards

Pavel

Attachments:

parse_ident-to-text-array-02.patchtext/x-patch; charset=US-ASCII; name=parse_ident-to-text-array-02.patchDownload

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
new file mode 100644
index 22d4f61..1581d5a
*** a/doc/src/sgml/func.sgml
--- b/doc/src/sgml/func.sgml
***************
*** 1707,1712 ****
--- 1707,1727 ----
        <row>
         <entry>
          <indexterm>
+          <primary>parse_ident</primary>
+         </indexterm>
+         <literal><function>parse_ident(<parameter>str</parameter> <type>text</type>,
+            [ <parameter>strictmode</parameter> <type>boolean</type> DEFAULT true ] )</function></literal>
+        </entry>
+        <entry><type>text[]</type></entry>
+        <entry>Split <parameter>qualified identifier</parameter> to array <parameter>parts</parameter>.
+        </entry>
+        <entry><literal>parse_ident('"SomeSchema".someTable')</literal></entry>
+        <entry><literal>"SomeSchema,sometable"</literal></entry>
+       </row>
+ 
+       <row>
+        <entry>
+         <indexterm>
           <primary>pg_client_encoding</primary>
          </indexterm>
          <literal><function>pg_client_encoding()</function></literal>
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
new file mode 100644
index ccc030f..9a7e89d
*** a/src/backend/catalog/system_views.sql
--- b/src/backend/catalog/system_views.sql
*************** RETURNS jsonb
*** 940,942 ****
--- 940,949 ----
  LANGUAGE INTERNAL
  STRICT IMMUTABLE
  AS 'jsonb_set';
+ 
+ CREATE OR REPLACE FUNCTION
+   parse_ident(str text, strictmode boolean DEFAULT true)
+ RETURNS text[]
+ LANGUAGE INTERNAL
+ STRICT IMMUTABLE
+ AS 'parse_ident';
diff --git a/src/backend/utils/adt/misc.c b/src/backend/utils/adt/misc.c
new file mode 100644
index c0495d9..8a68533
*** a/src/backend/utils/adt/misc.c
--- b/src/backend/utils/adt/misc.c
***************
*** 21,32 ****
--- 21,35 ----
  #include <unistd.h>
  
  #include "access/sysattr.h"
+ #include "access/htup_details.h"
  #include "catalog/catalog.h"
+ #include "catalog/namespace.h"
  #include "catalog/pg_tablespace.h"
  #include "catalog/pg_type.h"
  #include "commands/dbcommands.h"
  #include "funcapi.h"
  #include "miscadmin.h"
+ #include "parser/scansup.h"
  #include "parser/keywords.h"
  #include "postmaster/syslogger.h"
  #include "rewrite/rewriteHandler.h"
***************
*** 38,43 ****
--- 41,47 ----
  #include "utils/ruleutils.h"
  #include "tcop/tcopprot.h"
  #include "utils/acl.h"
+ #include "utils/array.h"
  #include "utils/builtins.h"
  #include "utils/timestamp.h"
  
*************** pg_column_is_updatable(PG_FUNCTION_ARGS)
*** 598,600 ****
--- 602,754 ----
  
  	PG_RETURN_BOOL((events & REQ_EVENTS) == REQ_EVENTS);
  }
+ 
+ 
+ /*
+  * This simple parser utility are compatible with lexer implementation,
+  * used only in parse_ident function
+  */
+ static bool
+ is_ident_start(unsigned char c)
+ {
+ 	if (c == '_')
+ 		return true;
+ 	if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z'))
+ 		return true;
+ 
+ 	if (c >= 0200 && c <= 0377)
+ 		return true;
+ 
+ 	return false;
+ }
+ 
+ static bool
+ is_ident_cont(unsigned char c)
+ {
+ 	if (c >= '0' && c <= '9')
+ 		return true;
+ 
+ 	return is_ident_start(c);
+ }
+ 
+ /*
+  * parse_ident - decompose text identifier to parts of
+  * identifiers and the rest (doesn't follow a identifier rule).
+  */
+ Datum
+ parse_ident(PG_FUNCTION_ARGS)
+ {
+ 	text		*qualname;
+ 	char		*qualname_str;
+ 	bool		strict_mode;
+ 	ArrayBuildState *astate = NULL;
+ 	char	*nextp;
+ 
+ 	qualname = PG_GETARG_TEXT_PP(0);
+ 	qualname_str = text_to_cstring(qualname);
+ 	strict_mode = PG_GETARG_BOOL(1);
+ 
+ 	nextp = qualname_str;
+ 
+ 	/* skip leading whitespace */
+ 	while (isspace((unsigned char) *nextp))
+ 		nextp++;
+ 
+ 	do
+ 	{
+ 		char		*curname;
+ 		char		*endp;
+ 		bool		missing_ident;
+ 
+ 		missing_ident = true;
+ 
+ 		if (*nextp == '\"')
+ 		{
+ 			curname = nextp + 1;
+ 			for (;;)
+ 			{
+ 				endp = strchr(nextp + 1, '\"');
+ 				if (endp == NULL)
+ 					ereport(ERROR,
+ 						(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ 						 errmsg("unclused double quotes"),
+ 						 errdetail("string \"%s\" is not valid identifier",
+ 							    text_to_cstring(qualname))));
+ 				if (endp[1] != '\"')
+ 					break;
+ 				memmove(endp, endp + 1, strlen(endp));
+ 				nextp = endp;
+ 			}
+ 			nextp = endp + 1;
+ 			*endp = '\0';
+ 
+ 			if (endp - curname == 0)
+ 				ereport(ERROR,
+ 					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ 					 errmsg("identifier should not be empty"),
+ 					 errdetail("string \"%s\" is not valid identifier",
+ 							    text_to_cstring(qualname))));
+ 
+ 			astate = accumArrayResult(astate,
+ 				CStringGetTextDatum(curname), false,
+ 						    TEXTOID, CurrentMemoryContext);
+ 			missing_ident = false;
+ 		}
+ 		else
+ 		{
+ 			if (is_ident_start((unsigned char) *nextp))
+ 			{
+ 				char *downname;
+ 				int	len;
+ 				text	*part;
+ 
+ 				curname = nextp++;
+ 				while (is_ident_cont((unsigned char) *nextp))
+ 					nextp++;
+ 
+ 				len = nextp - curname;
+ 
+ 				downname = downcase_truncate_identifier(curname, len, false);
+ 				part = cstring_to_text_with_len(downname, len);
+ 				astate = accumArrayResult(astate,
+ 					PointerGetDatum(part), false,
+ 							    TEXTOID, CurrentMemoryContext);
+ 				missing_ident = false;
+ 			}
+ 		}
+ 
+ 		if (missing_ident)
+ 			ereport(ERROR,
+ 				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ 				 errmsg("missing identifier after \".\" symbol"),
+ 				 errdetail("string \"%s\" is not valid identifier",
+ 							    text_to_cstring(qualname))));
+ 
+ 		while (isspace((unsigned char) *nextp))
+ 			nextp++;
+ 
+ 		if (*nextp == '.')
+ 		{
+ 			nextp++;
+ 			while (isspace((unsigned char) *nextp))
+ 				nextp++;
+ 			continue;
+ 		}
+ 		else if (*nextp == '\0')
+ 		{
+ 			break;
+ 		}
+ 		else
+ 		{
+ 			if (strict_mode)
+ 				ereport(ERROR,
+ 					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ 					 errmsg("identifier contains disallowed chars"),
+ 					 errdetail("string \"%s\" is not valid identifier",
+ 								    text_to_cstring(qualname))));
+ 			break;
+ 		}
+ 	} while (true);
+ 
+ 	PG_RETURN_DATUM(makeArrayResult(astate, CurrentMemoryContext));
+ }
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
new file mode 100644
index ddf7c67..923c99b
*** a/src/include/catalog/pg_proc.h
--- b/src/include/catalog/pg_proc.h
*************** DESCR("I/O");
*** 3516,3521 ****
--- 3516,3524 ----
  DATA(insert OID = 4086 (  to_regnamespace	PGNSP PGUID 12 1 0 0 0 f f f f t f s 1 0 4089 "2275" _null_ _null_ _null_ _null_ _null_ to_regnamespace _null_ _null_ _null_ ));
  DESCR("convert namespace name to regnamespace");
  
+ DATA(insert OID = 3300 (  parse_ident		PGNSP PGUID 12 1 0 0 0 f f f f t f i 2 0 1009 "25 16" _null_ _null_ "{str,strictmode}" _null_ _null_ parse_ident _null_ _null_ _null_ ));
+ DESCR("parse qualified identifier to array of identifiers");
+ 
  DATA(insert OID = 2246 ( fmgr_internal_validator PGNSP PGUID 12 1 0 0 0 f f f f t f s 1 0 2278 "26" _null_ _null_ _null_ _null_ _null_ fmgr_internal_validator _null_ _null_ _null_ ));
  DESCR("(internal)");
  DATA(insert OID = 2247 ( fmgr_c_validator	PGNSP PGUID 12 1 0 0 0 f f f f t f s 1 0 2278 "26" _null_ _null_ _null_ _null_ _null_ fmgr_c_validator _null_ _null_ _null_ ));
diff --git a/src/include/utils/builtins.h b/src/include/utils/builtins.h
new file mode 100644
index fc1679e..0cb491d
*** a/src/include/utils/builtins.h
--- b/src/include/utils/builtins.h
*************** extern Datum pg_typeof(PG_FUNCTION_ARGS)
*** 495,500 ****
--- 495,501 ----
  extern Datum pg_collation_for(PG_FUNCTION_ARGS);
  extern Datum pg_relation_is_updatable(PG_FUNCTION_ARGS);
  extern Datum pg_column_is_updatable(PG_FUNCTION_ARGS);
+ extern Datum parse_ident(PG_FUNCTION_ARGS);
  
  /* oid.c */
  extern Datum oidin(PG_FUNCTION_ARGS);
diff --git a/src/test/regress/expected/name.out b/src/test/regress/expected/name.out
new file mode 100644
index b359d52..ebd7f0a
*** a/src/test/regress/expected/name.out
--- b/src/test/regress/expected/name.out
*************** SELECT '' AS two, c.f1 FROM NAME_TBL c W
*** 124,126 ****
--- 124,152 ----
  (2 rows)
  
  DROP TABLE NAME_TBL;
+ DO $$
+ DECLARE r text[];
+ BEGIN
+   r := parse_ident('Schemax.Tabley');
+   RAISE NOTICE '%', format('%I.%I', r[1], r[2]);
+   r := parse_ident('"SchemaX"."TableY"');
+   RAISE NOTICE '%', format('%I.%I', r[1], r[2]);
+ END;
+ $$;
+ NOTICE:  schemax.tabley
+ NOTICE:  "SchemaX"."TableY"
+ SELECT parse_ident('foo.boo');
+  parse_ident 
+ -------------
+  {foo,boo}
+ (1 row)
+ 
+ SELECT parse_ident('foo.boo[]'); -- should fail
+ ERROR:  identifier contains disallowed chars
+ DETAIL:  string "foo.boo[]" is not valid identifier
+ SELECT parse_ident('foo.boo[]', strictmode => false); -- ok
+  parse_ident 
+ -------------
+  {foo,boo}
+ (1 row)
+ 
diff --git a/src/test/regress/sql/name.sql b/src/test/regress/sql/name.sql
new file mode 100644
index 1c7a671..629e23f
*** a/src/test/regress/sql/name.sql
--- b/src/test/regress/sql/name.sql
*************** SELECT '' AS three, c.f1 FROM NAME_TBL c
*** 52,54 ****
--- 52,69 ----
  SELECT '' AS two, c.f1 FROM NAME_TBL c WHERE c.f1 ~ '.*asdf.*';
  
  DROP TABLE NAME_TBL;
+ 
+ DO $$
+ DECLARE r text[];
+ BEGIN
+   r := parse_ident('Schemax.Tabley');
+   RAISE NOTICE '%', format('%I.%I', r[1], r[2]);
+   r := parse_ident('"SchemaX"."TableY"');
+   RAISE NOTICE '%', format('%I.%I', r[1], r[2]);
+ END;
+ $$;
+ 
+ SELECT parse_ident('foo.boo');
+ SELECT parse_ident('foo.boo[]'); -- should fail
+ SELECT parse_ident('foo.boo[]', strictmode => false); -- ok
+

#21

Peter Eisentraut

peter_e@gmx.net

over 10 years ago

In reply to: Pavel Stehule (#20)

Re: proposal: function parse_ident

On 9/11/15 6:25 AM, Pavel Stehule wrote:

new update of parse_ident function patch

How would you actually use this?

I know several people have spoken up that they could use this, but could
you provide a few actual practical examples?

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#22

Pavel Stehule

pavel.stehule@gmail.com

over 10 years ago

In reply to: Peter Eisentraut (#21)

Re: proposal: function parse_ident

2015-09-16 2:41 GMT+02:00 Peter Eisentraut <peter_e@gmx.net>:

On 9/11/15 6:25 AM, Pavel Stehule wrote:

new update of parse_ident function patch

How would you actually use this?

I know several people have spoken up that they could use this, but could
you provide a few actual practical examples?

I see two basic use cases

1. processing user input with little bit more comfort - the user doesn't
need to separate schema and table

CREATE OR REPLACE FUNCTION createtable(tablename text)
RETURNS void AS $$
DECLARE names text[];
BEGIN
names := parse_ident(tablename);
IF array_length(names) > 2 || array_length(names) = 0 THEN
RAISE EXCEPTION 'wrong identifier';
END IF;
IF names[2] IS NOT NULL THEN
CREATE SCHEMA IF NOT EXISTS names[2];
END IF;
CREATE TABLE tablename;
END;
$$ LANGUAGE plpgsql;

2. parsing error messages or some automatic variables

Regards

Pavel

#23

Marko Tiikkaja

marko@joh.to

about 10 years ago

In reply to: Pavel Stehule (#20)

Re: proposal: function parse_ident

On 9/11/15 12:25 PM, Pavel Stehule wrote:

new update of parse_ident function patch

Nice! I've certainly wanted something like this a number of times.

Some comments about the v2 of the patch:

- The patch doesn't apply anymore, so it should be rebased.
- The docs don't even try and explain what the "strictmode"
parameter does.
- The comment before the parse_ident function is not up to date
anymore, since "the rest" was removed from the interface.
- I can't immediately grep for any uses of do { .. } while (true)
from our code base. AFAICT the majority look like for (;;); I see no
reason not to be consistent here.
- What should happen if the input is a string like
'one.two.three.four.five.six'? Do we want to accept input like that?
- I haven't reviewed the actual parsing code carefully, but didn't
we already have a function which splits identifiers up? I of course
can't find one with my grepping right now, so I might be wrong.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#24

Pavel Stehule

pavel.stehule@gmail.com

about 10 years ago

In reply to: Marko Tiikkaja (#23)

1 attachment(s)

Re: proposal: function parse_ident

2015-11-17 1:49 GMT+01:00 Marko Tiikkaja <marko@joh.to>:

On 9/11/15 12:25 PM, Pavel Stehule wrote:

new update of parse_ident function patch

Nice! I've certainly wanted something like this a number of times.

Some comments about the v2 of the patch:

- The patch doesn't apply anymore, so it should be rebased.

done

- The docs don't even try and explain what the "strictmode" parameter
does.

fixed

- The comment before the parse_ident function is not up to date anymore,

since "the rest" was removed from the interface.

fixed

- I can't immediately grep for any uses of do { .. } while (true) from

our code base. AFAICT the majority look like for (;;); I see no reason
not to be consistent here.

fixed

- What should happen if the input is a string like

'one.two.three.four.five.six'? Do we want to accept input like that?

I don't see the reason why not. It is pretty simple to count fields in
result array and raise error later. The application has better information
about expected and valid numbers. But any opinion in this question should
be valid. I have not strong position here.

- I haven't reviewed the actual parsing code carefully, but didn't we
already have a function which splits identifiers up? I of course can't
find one with my grepping right now, so I might be wrong.

There is: SplitIdentifierString or textToQualifiedNameList in varlena.c. My
first patch was based on these functions. But I cannot to use it.

1. major reason: The buildin parser is based on searching the dot "." and
doesn't search any disallowed identifiers chars. So there is not possible
to implement non strict mode - find last char of last identifier and ignore
other.
2. minor reason: little bit more precious diagnostics - buildin routines
returns only true (valid) and false (invalid).

Regards

Pavel

Show quoted text

.m

Attachments:

parse_ident-to-text-array-04.patchtext/x-patch; charset=US-ASCII; name=parse_ident-to-text-array-04.patchDownload

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
new file mode 100644
index 60b9a09..7b65ef4
*** a/doc/src/sgml/func.sgml
--- b/doc/src/sgml/func.sgml
***************
*** 1707,1712 ****
--- 1707,1729 ----
        <row>
         <entry>
          <indexterm>
+          <primary>parse_ident</primary>
+         </indexterm>
+         <literal><function>parse_ident(<parameter>str</parameter> <type>text</type>,
+            [ <parameter>strictmode</parameter> <type>boolean</type> DEFAULT true ] )</function></literal>
+        </entry>
+        <entry><type>text[]</type></entry>
+        <entry>Split <parameter>qualified identifier</parameter> to array <parameter>parts</parameter>.
+        When second parameter is true, then no any chars after last identifier is allowed. When
+        second parameter is false, then chars after last identifier are ignored.
+        </entry>
+        <entry><literal>parse_ident('"SomeSchema".someTable')</literal></entry>
+        <entry><literal>"SomeSchema,sometable"</literal></entry>
+       </row>
+ 
+       <row>
+        <entry>
+         <indexterm>
           <primary>pg_client_encoding</primary>
          </indexterm>
          <literal><function>pg_client_encoding()</function></literal>
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
new file mode 100644
index ccc030f..9a7e89d
*** a/src/backend/catalog/system_views.sql
--- b/src/backend/catalog/system_views.sql
*************** RETURNS jsonb
*** 940,942 ****
--- 940,949 ----
  LANGUAGE INTERNAL
  STRICT IMMUTABLE
  AS 'jsonb_set';
+ 
+ CREATE OR REPLACE FUNCTION
+   parse_ident(str text, strictmode boolean DEFAULT true)
+ RETURNS text[]
+ LANGUAGE INTERNAL
+ STRICT IMMUTABLE
+ AS 'parse_ident';
diff --git a/src/backend/utils/adt/misc.c b/src/backend/utils/adt/misc.c
new file mode 100644
index 3ef6e43..2540dac
*** a/src/backend/utils/adt/misc.c
--- b/src/backend/utils/adt/misc.c
***************
*** 21,32 ****
--- 21,35 ----
  #include <unistd.h>
  
  #include "access/sysattr.h"
+ #include "access/htup_details.h"
  #include "catalog/catalog.h"
+ #include "catalog/namespace.h"
  #include "catalog/pg_tablespace.h"
  #include "catalog/pg_type.h"
  #include "commands/dbcommands.h"
  #include "funcapi.h"
  #include "miscadmin.h"
+ #include "parser/scansup.h"
  #include "parser/keywords.h"
  #include "postmaster/syslogger.h"
  #include "rewrite/rewriteHandler.h"
***************
*** 38,43 ****
--- 41,47 ----
  #include "utils/ruleutils.h"
  #include "tcop/tcopprot.h"
  #include "utils/acl.h"
+ #include "utils/array.h"
  #include "utils/builtins.h"
  #include "utils/timestamp.h"
  
*************** pg_column_is_updatable(PG_FUNCTION_ARGS)
*** 598,600 ****
--- 602,755 ----
  
  	PG_RETURN_BOOL((events & REQ_EVENTS) == REQ_EVENTS);
  }
+ 
+ 
+ /*
+  * This simple parser utility are compatible with lexer implementation,
+  * used only in parse_ident function
+  */
+ static bool
+ is_ident_start(unsigned char c)
+ {
+ 	if (c == '_')
+ 		return true;
+ 	if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z'))
+ 		return true;
+ 
+ 	if (c >= 0200 && c <= 0377)
+ 		return true;
+ 
+ 	return false;
+ }
+ 
+ static bool
+ is_ident_cont(unsigned char c)
+ {
+ 	if (c >= '0' && c <= '9')
+ 		return true;
+ 
+ 	return is_ident_start(c);
+ }
+ 
+ /*
+  * parse_ident - parse SQL composed identifier to separate identifiers.
+  * When strict mode is active (second parameter), then any chars after
+  * last identifiers are disallowed.
+  */
+ Datum
+ parse_ident(PG_FUNCTION_ARGS)
+ {
+ 	text		*qualname;
+ 	char		*qualname_str;
+ 	bool		strict_mode;
+ 	ArrayBuildState *astate = NULL;
+ 	char	*nextp;
+ 
+ 	qualname = PG_GETARG_TEXT_PP(0);
+ 	qualname_str = text_to_cstring(qualname);
+ 	strict_mode = PG_GETARG_BOOL(1);
+ 
+ 	nextp = qualname_str;
+ 
+ 	/* skip leading whitespace */
+ 	while (isspace((unsigned char) *nextp))
+ 		nextp++;
+ 
+ 	for (;;)
+ 	{
+ 		char		*curname;
+ 		char		*endp;
+ 		bool		missing_ident;
+ 
+ 		missing_ident = true;
+ 
+ 		if (*nextp == '\"')
+ 		{
+ 			curname = nextp + 1;
+ 			for (;;)
+ 			{
+ 				endp = strchr(nextp + 1, '\"');
+ 				if (endp == NULL)
+ 					ereport(ERROR,
+ 						(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ 						 errmsg("unclosed double quotes"),
+ 						 errdetail("string \"%s\" is not valid identifier",
+ 							    text_to_cstring(qualname))));
+ 				if (endp[1] != '\"')
+ 					break;
+ 				memmove(endp, endp + 1, strlen(endp));
+ 				nextp = endp;
+ 			}
+ 			nextp = endp + 1;
+ 			*endp = '\0';
+ 
+ 			if (endp - curname == 0)
+ 				ereport(ERROR,
+ 					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ 					 errmsg("identifier should not be empty"),
+ 					 errdetail("string \"%s\" is not valid identifier",
+ 							    text_to_cstring(qualname))));
+ 
+ 			astate = accumArrayResult(astate,
+ 				CStringGetTextDatum(curname), false,
+ 						    TEXTOID, CurrentMemoryContext);
+ 			missing_ident = false;
+ 		}
+ 		else
+ 		{
+ 			if (is_ident_start((unsigned char) *nextp))
+ 			{
+ 				char *downname;
+ 				int	len;
+ 				text	*part;
+ 
+ 				curname = nextp++;
+ 				while (is_ident_cont((unsigned char) *nextp))
+ 					nextp++;
+ 
+ 				len = nextp - curname;
+ 
+ 				downname = downcase_truncate_identifier(curname, len, false);
+ 				part = cstring_to_text_with_len(downname, len);
+ 				astate = accumArrayResult(astate,
+ 					PointerGetDatum(part), false,
+ 							    TEXTOID, CurrentMemoryContext);
+ 				missing_ident = false;
+ 			}
+ 		}
+ 
+ 		if (missing_ident)
+ 			ereport(ERROR,
+ 				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ 				 errmsg("missing identifier after \".\" symbol"),
+ 				 errdetail("string \"%s\" is not valid identifier",
+ 							    text_to_cstring(qualname))));
+ 
+ 		while (isspace((unsigned char) *nextp))
+ 			nextp++;
+ 
+ 		if (*nextp == '.')
+ 		{
+ 			nextp++;
+ 			while (isspace((unsigned char) *nextp))
+ 				nextp++;
+ 			continue;
+ 		}
+ 		else if (*nextp == '\0')
+ 		{
+ 			break;
+ 		}
+ 		else
+ 		{
+ 			if (strict_mode)
+ 				ereport(ERROR,
+ 					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ 					 errmsg("identifier contains disallowed chars"),
+ 					 errdetail("string \"%s\" is not valid identifier",
+ 								    text_to_cstring(qualname))));
+ 			break;
+ 		}
+ 	}
+ 
+ 	PG_RETURN_DATUM(makeArrayResult(astate, CurrentMemoryContext));
+ }
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
new file mode 100644
index d8640db..0c27876
*** a/src/include/catalog/pg_proc.h
--- b/src/include/catalog/pg_proc.h
*************** DESCR("I/O");
*** 3520,3525 ****
--- 3520,3528 ----
  DATA(insert OID = 4086 (  to_regnamespace	PGNSP PGUID 12 1 0 0 0 f f f f t f s s 1 0 4089 "2275" _null_ _null_ _null_ _null_ _null_ to_regnamespace _null_ _null_ _null_ ));
  DESCR("convert namespace name to regnamespace");
  
+ DATA(insert OID = 3317 (  parse_ident		PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 1009 "25 16" _null_ _null_ "{str,strictmode}" _null_ _null_ parse_ident _null_ _null_ _null_ ));
+ DESCR("parse qualified identifier to array of identifiers");
+ 
  DATA(insert OID = 2246 ( fmgr_internal_validator PGNSP PGUID 12 1 0 0 0 f f f f t f s s 1 0 2278 "26" _null_ _null_ _null_ _null_ _null_ fmgr_internal_validator _null_ _null_ _null_ ));
  DESCR("(internal)");
  DATA(insert OID = 2247 ( fmgr_c_validator	PGNSP PGUID 12 1 0 0 0 f f f f t f s s 1 0 2278 "26" _null_ _null_ _null_ _null_ _null_ fmgr_c_validator _null_ _null_ _null_ ));
diff --git a/src/include/utils/builtins.h b/src/include/utils/builtins.h
new file mode 100644
index e610bf3..773af5d
*** a/src/include/utils/builtins.h
--- b/src/include/utils/builtins.h
*************** extern Datum pg_typeof(PG_FUNCTION_ARGS)
*** 495,500 ****
--- 495,501 ----
  extern Datum pg_collation_for(PG_FUNCTION_ARGS);
  extern Datum pg_relation_is_updatable(PG_FUNCTION_ARGS);
  extern Datum pg_column_is_updatable(PG_FUNCTION_ARGS);
+ extern Datum parse_ident(PG_FUNCTION_ARGS);
  
  /* oid.c */
  extern Datum oidin(PG_FUNCTION_ARGS);
diff --git a/src/test/regress/expected/name.out b/src/test/regress/expected/name.out
new file mode 100644
index b359d52..ebd7f0a
*** a/src/test/regress/expected/name.out
--- b/src/test/regress/expected/name.out
*************** SELECT '' AS two, c.f1 FROM NAME_TBL c W
*** 124,126 ****
--- 124,152 ----
  (2 rows)
  
  DROP TABLE NAME_TBL;
+ DO $$
+ DECLARE r text[];
+ BEGIN
+   r := parse_ident('Schemax.Tabley');
+   RAISE NOTICE '%', format('%I.%I', r[1], r[2]);
+   r := parse_ident('"SchemaX"."TableY"');
+   RAISE NOTICE '%', format('%I.%I', r[1], r[2]);
+ END;
+ $$;
+ NOTICE:  schemax.tabley
+ NOTICE:  "SchemaX"."TableY"
+ SELECT parse_ident('foo.boo');
+  parse_ident 
+ -------------
+  {foo,boo}
+ (1 row)
+ 
+ SELECT parse_ident('foo.boo[]'); -- should fail
+ ERROR:  identifier contains disallowed chars
+ DETAIL:  string "foo.boo[]" is not valid identifier
+ SELECT parse_ident('foo.boo[]', strictmode => false); -- ok
+  parse_ident 
+ -------------
+  {foo,boo}
+ (1 row)
+ 
diff --git a/src/test/regress/sql/name.sql b/src/test/regress/sql/name.sql
new file mode 100644
index 1c7a671..629e23f
*** a/src/test/regress/sql/name.sql
--- b/src/test/regress/sql/name.sql
*************** SELECT '' AS three, c.f1 FROM NAME_TBL c
*** 52,54 ****
--- 52,69 ----
  SELECT '' AS two, c.f1 FROM NAME_TBL c WHERE c.f1 ~ '.*asdf.*';
  
  DROP TABLE NAME_TBL;
+ 
+ DO $$
+ DECLARE r text[];
+ BEGIN
+   r := parse_ident('Schemax.Tabley');
+   RAISE NOTICE '%', format('%I.%I', r[1], r[2]);
+   r := parse_ident('"SchemaX"."TableY"');
+   RAISE NOTICE '%', format('%I.%I', r[1], r[2]);
+ END;
+ $$;
+ 
+ SELECT parse_ident('foo.boo');
+ SELECT parse_ident('foo.boo[]'); -- should fail
+ SELECT parse_ident('foo.boo[]', strictmode => false); -- ok
+

#25

Jim Nasby

Jim.Nasby@BlueTreble.com

about 10 years ago

In reply to: Pavel Stehule (#22)

Re: proposal: function parse_ident

On 9/15/15 11:49 PM, Pavel Stehule wrote:

1. processing user input with little bit more comfort - the user doesn't
need to separate schema and table

This is especially useful if you're doing anything that needs to
dynamically work with different objects. I'd say about 80% of the time
I'm doing this ::regclass is good enough, but obviously the object has
to exist then, and ::regclass doesn't separate schema from name.

There's a number of other handy convenience functions/views you can
create to interface with the catalog, none of which are rocket science.
But you're pretty screwed if what you want isn't in the catalog yet. (On
a side note, something my TODO is to restart pg_newsysviews as an
extension, and then add a bunch of convenience functions on top of
that... things like relation_info(regclass) RETURNS (everything in
pg_class, plus other useful bits like nspname), and
relation_schema(regclass) RETURNS regnamespace).

FWIW, the other function I've wanted in the past that's difficult to
implement externally is parsing the arguments of a function definition.
::regprocedure kinda works for this, but it blows up on certain things
(defaults being one, iirc). I've specifically wanted that capability for
a function I wrote that made it easy to specify *everything* about a
function in a single call, including it's permissions and a comment on
the function. That may sound trivial, but it's a PITA to cut and paste
the whole argument list into multiple REVOKE/GRANT/COMMENT on
statements. Even worse, not all the options of CREATE FUNCTION are
supported in those other commands, so often you can't even just cut and
paste.
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#26

Michael Paquier

michael.paquier@gmail.com

about 10 years ago

In reply to: Pavel Stehule (#24)

Re: proposal: function parse_ident

On Thu, Dec 3, 2015 at 5:31 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:

There is: SplitIdentifierString or textToQualifiedNameList in varlena.c. My
first patch was based on these functions. But I cannot to use it.

1. major reason: The buildin parser is based on searching the dot "." and
doesn't search any disallowed identifiers chars. So there is not possible to
implement non strict mode - find last char of last identifier and ignore
other.
2. minor reason: little bit more precious diagnostics - buildin routines
returns only true (valid) and false (invalid).

I am moving that to next CF because there is a patch but no actual
reviews, and a couple of hackers have showed interest in having that
based on the latest updates on this thread.
--
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#27

Marko Tiikkaja

marko@joh.to

almost 10 years ago

In reply to: Pavel Stehule (#24)

Re: proposal: function parse_ident

Hi Pavel,

Sorry for the lack of review here. I didn't realize I was still the
reviewer for this after it had been moved to another commitfest.

That said, I think I've exhausted my usefulness here as a reviewer.
Marking ready for committer.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#28

Michael Paquier

michael.paquier@gmail.com

almost 10 years ago

In reply to: Marko Tiikkaja (#27)

Re: proposal: function parse_ident

On Sat, Jan 23, 2016 at 1:25 AM, Marko Tiikkaja <marko@joh.to> wrote:

Hi Pavel,

Sorry for the lack of review here. I didn't realize I was still the
reviewer for this after it had been moved to another commitfest.

That said, I think I've exhausted my usefulness here as a reviewer. Marking
ready for committer.

+                      errmsg("identifier contains disallowed chars"),
+                      errdetail("string \"%s\" is not valid identifier",
+                                     text_to_cstring(qualname))));
Perhaps, "identifier contains not allowed character" is better?
-- 
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#29

Tom Lane

tgl@sss.pgh.pa.us

almost 10 years ago

In reply to: Michael Paquier (#28)

Re: proposal: function parse_ident

Michael Paquier <michael.paquier@gmail.com> writes:

On Sat, Jan 23, 2016 at 1:25 AM, Marko Tiikkaja <marko@joh.to> wrote:
+                      errmsg("identifier contains disallowed chars"),
+                      errdetail("string \"%s\" is not valid identifier",
+                                     text_to_cstring(qualname))));
Perhaps, "identifier contains not allowed character" is better?

"disallowed" reads better to me. I agree with expanding "chars" to
"characters" though. Also, the errdetail is conveying no actual extra
detail AFAICS. I'd go with something like

errmsg("identifier contains disallowed characters: \"%s\"",
text_to_cstring(qualname)));

regards, tom lane

The errdeta

regards, tom lane

--
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#30

Pavel Stehule

pavel.stehule@gmail.com

almost 10 years ago

In reply to: Tom Lane (#29)

1 attachment(s)

Re: proposal: function parse_ident

2016-01-23 16:36 GMT+01:00 Tom Lane <tgl@sss.pgh.pa.us>:

Michael Paquier <michael.paquier@gmail.com> writes:
On Sat, Jan 23, 2016 at 1:25 AM, Marko Tiikkaja <marko@joh.to> wrote:
+                      errmsg("identifier contains disallowed chars"),
+                      errdetail("string \"%s\" is not valid identifier",
+                                     text_to_cstring(qualname))));
Perhaps, "identifier contains not allowed character" is better?
"disallowed" reads better to me. I agree with expanding "chars" to
"characters" though. Also, the errdetail is conveying no actual extra
detail AFAICS. I'd go with something like

errmsg("identifier contains disallowed characters: \"%s\"",
text_to_cstring(qualname)));

regards, tom lane

rebased, messages changes per Tom's proposal

Regards

Pavel

Show quoted text

The errdeta

regards, tom lane

--
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachments:

parse_ident-to-text-array-05.patchtext/x-patch; charset=US-ASCII; name=parse_ident-to-text-array-05.patchDownload

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
new file mode 100644
index 9c143b2..f85bfd0
*** a/doc/src/sgml/func.sgml
--- b/doc/src/sgml/func.sgml
***************
*** 1778,1783 ****
--- 1778,1800 ----
        <row>
         <entry>
          <indexterm>
+          <primary>parse_ident</primary>
+         </indexterm>
+         <literal><function>parse_ident(<parameter>str</parameter> <type>text</type>,
+            [ <parameter>strictmode</parameter> <type>boolean</type> DEFAULT true ] )</function></literal>
+        </entry>
+        <entry><type>text[]</type></entry>
+        <entry>Split <parameter>qualified identifier</parameter> to array <parameter>parts</parameter>.
+        When second parameter is true, then no any chars after last identifier is allowed. When
+        second parameter is false, then chars after last identifier are ignored.
+        </entry>
+        <entry><literal>parse_ident('"SomeSchema".someTable')</literal></entry>
+        <entry><literal>"SomeSchema,sometable"</literal></entry>
+       </row>
+ 
+       <row>
+        <entry>
+         <indexterm>
           <primary>pg_client_encoding</primary>
          </indexterm>
          <literal><function>pg_client_encoding()</function></literal>
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
new file mode 100644
index 923fe58..61d5b80
*** a/src/backend/catalog/system_views.sql
--- b/src/backend/catalog/system_views.sql
*************** RETURNS jsonb
*** 965,967 ****
--- 965,974 ----
  LANGUAGE INTERNAL
  STRICT IMMUTABLE
  AS 'jsonb_set';
+ 
+ CREATE OR REPLACE FUNCTION
+   parse_ident(str text, strictmode boolean DEFAULT true)
+ RETURNS text[]
+ LANGUAGE INTERNAL
+ STRICT IMMUTABLE
+ AS 'parse_ident';
diff --git a/src/backend/utils/adt/misc.c b/src/backend/utils/adt/misc.c
new file mode 100644
index 6a306f3..a6c3452
*** a/src/backend/utils/adt/misc.c
--- b/src/backend/utils/adt/misc.c
***************
*** 21,32 ****
--- 21,35 ----
  #include <unistd.h>
  
  #include "access/sysattr.h"
+ #include "access/htup_details.h"
  #include "catalog/catalog.h"
+ #include "catalog/namespace.h"
  #include "catalog/pg_tablespace.h"
  #include "catalog/pg_type.h"
  #include "commands/dbcommands.h"
  #include "funcapi.h"
  #include "miscadmin.h"
+ #include "parser/scansup.h"
  #include "parser/keywords.h"
  #include "postmaster/syslogger.h"
  #include "rewrite/rewriteHandler.h"
***************
*** 38,43 ****
--- 41,47 ----
  #include "utils/ruleutils.h"
  #include "tcop/tcopprot.h"
  #include "utils/acl.h"
+ #include "utils/array.h"
  #include "utils/builtins.h"
  #include "utils/timestamp.h"
  
*************** pg_column_is_updatable(PG_FUNCTION_ARGS)
*** 598,600 ****
--- 602,752 ----
  
  	PG_RETURN_BOOL((events & REQ_EVENTS) == REQ_EVENTS);
  }
+ 
+ 
+ /*
+  * This simple parser utility are compatible with lexer implementation,
+  * used only in parse_ident function
+  */
+ static bool
+ is_ident_start(unsigned char c)
+ {
+ 	if (c == '_')
+ 		return true;
+ 	if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z'))
+ 		return true;
+ 
+ 	if (c >= 0200 && c <= 0377)
+ 		return true;
+ 
+ 	return false;
+ }
+ 
+ static bool
+ is_ident_cont(unsigned char c)
+ {
+ 	if (c >= '0' && c <= '9')
+ 		return true;
+ 
+ 	return is_ident_start(c);
+ }
+ 
+ /*
+  * parse_ident - parse SQL composed identifier to separate identifiers.
+  * When strict mode is active (second parameter), then any chars after
+  * last identifiers are disallowed.
+  */
+ Datum
+ parse_ident(PG_FUNCTION_ARGS)
+ {
+ 	text		*qualname;
+ 	char		*qualname_str;
+ 	bool		strict_mode;
+ 	ArrayBuildState *astate = NULL;
+ 	char	*nextp;
+ 
+ 	qualname = PG_GETARG_TEXT_PP(0);
+ 	qualname_str = text_to_cstring(qualname);
+ 	strict_mode = PG_GETARG_BOOL(1);
+ 
+ 	nextp = qualname_str;
+ 
+ 	/* skip leading whitespace */
+ 	while (isspace((unsigned char) *nextp))
+ 		nextp++;
+ 
+ 	for (;;)
+ 	{
+ 		char		*curname;
+ 		char		*endp;
+ 		bool		missing_ident;
+ 
+ 		missing_ident = true;
+ 
+ 		if (*nextp == '\"')
+ 		{
+ 			curname = nextp + 1;
+ 			for (;;)
+ 			{
+ 				endp = strchr(nextp + 1, '\"');
+ 				if (endp == NULL)
+ 					ereport(ERROR,
+ 						(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ 						 errmsg("unclosed double quotes"),
+ 						 errdetail("string \"%s\" is not valid identifier",
+ 							    text_to_cstring(qualname))));
+ 				if (endp[1] != '\"')
+ 					break;
+ 				memmove(endp, endp + 1, strlen(endp));
+ 				nextp = endp;
+ 			}
+ 			nextp = endp + 1;
+ 			*endp = '\0';
+ 
+ 			if (endp - curname == 0)
+ 				ereport(ERROR,
+ 					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ 					 errmsg("identifier should not be empty: \"%s\"",
+ 								    text_to_cstring(qualname))));
+ 
+ 			astate = accumArrayResult(astate,
+ 				CStringGetTextDatum(curname), false,
+ 						    TEXTOID, CurrentMemoryContext);
+ 			missing_ident = false;
+ 		}
+ 		else
+ 		{
+ 			if (is_ident_start((unsigned char) *nextp))
+ 			{
+ 				char *downname;
+ 				int	len;
+ 				text	*part;
+ 
+ 				curname = nextp++;
+ 				while (is_ident_cont((unsigned char) *nextp))
+ 					nextp++;
+ 
+ 				len = nextp - curname;
+ 
+ 				downname = downcase_truncate_identifier(curname, len, false);
+ 				part = cstring_to_text_with_len(downname, len);
+ 				astate = accumArrayResult(astate,
+ 					PointerGetDatum(part), false,
+ 							    TEXTOID, CurrentMemoryContext);
+ 				missing_ident = false;
+ 			}
+ 		}
+ 
+ 		if (missing_ident)
+ 			ereport(ERROR,
+ 				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ 				 errmsg("missing identifier after \".\" symbol: \"%s\"",
+ 							    text_to_cstring(qualname))));
+ 
+ 		while (isspace((unsigned char) *nextp))
+ 			nextp++;
+ 
+ 		if (*nextp == '.')
+ 		{
+ 			nextp++;
+ 			while (isspace((unsigned char) *nextp))
+ 				nextp++;
+ 			continue;
+ 		}
+ 		else if (*nextp == '\0')
+ 		{
+ 			break;
+ 		}
+ 		else
+ 		{
+ 			if (strict_mode)
+ 				ereport(ERROR,
+ 					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ 					 errmsg("identifier contains disallowed characters: \"%s\"",
+ 								    text_to_cstring(qualname))));
+ 			break;
+ 		}
+ 	}
+ 
+ 	PG_RETURN_DATUM(makeArrayResult(astate, CurrentMemoryContext));
+ }
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
new file mode 100644
index 79e92ff..a67518e
*** a/src/include/catalog/pg_proc.h
--- b/src/include/catalog/pg_proc.h
*************** DESCR("I/O");
*** 3443,3448 ****
--- 3443,3451 ----
  DATA(insert OID = 4086 (  to_regnamespace	PGNSP PGUID 12 1 0 0 0 f f f f t f s s 1 0 4089 "25" _null_ _null_ _null_ _null_ _null_ to_regnamespace _null_ _null_ _null_ ));
  DESCR("convert namespace name to regnamespace");
  
+ DATA(insert OID = 3318 (  parse_ident		PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 1009 "25 16" _null_ _null_ "{str,strictmode}" _null_ _null_ parse_ident _null_ _null_ _null_ ));
+ DESCR("parse qualified identifier to array of identifiers");
+ 
  DATA(insert OID = 2246 ( fmgr_internal_validator PGNSP PGUID 12 1 0 0 0 f f f f t f s s 1 0 2278 "26" _null_ _null_ _null_ _null_ _null_ fmgr_internal_validator _null_ _null_ _null_ ));
  DESCR("(internal)");
  DATA(insert OID = 2247 ( fmgr_c_validator	PGNSP PGUID 12 1 0 0 0 f f f f t f s s 1 0 2278 "26" _null_ _null_ _null_ _null_ _null_ fmgr_c_validator _null_ _null_ _null_ ));
diff --git a/src/include/utils/builtins.h b/src/include/utils/builtins.h
new file mode 100644
index c2e529f..193b049
*** a/src/include/utils/builtins.h
--- b/src/include/utils/builtins.h
*************** extern Datum pg_typeof(PG_FUNCTION_ARGS)
*** 503,508 ****
--- 503,509 ----
  extern Datum pg_collation_for(PG_FUNCTION_ARGS);
  extern Datum pg_relation_is_updatable(PG_FUNCTION_ARGS);
  extern Datum pg_column_is_updatable(PG_FUNCTION_ARGS);
+ extern Datum parse_ident(PG_FUNCTION_ARGS);
  
  /* oid.c */
  extern Datum oidin(PG_FUNCTION_ARGS);
diff --git a/src/test/regress/expected/name.out b/src/test/regress/expected/name.out
new file mode 100644
index b359d52..8de3f79
*** a/src/test/regress/expected/name.out
--- b/src/test/regress/expected/name.out
*************** SELECT '' AS two, c.f1 FROM NAME_TBL c W
*** 124,126 ****
--- 124,151 ----
  (2 rows)
  
  DROP TABLE NAME_TBL;
+ DO $$
+ DECLARE r text[];
+ BEGIN
+   r := parse_ident('Schemax.Tabley');
+   RAISE NOTICE '%', format('%I.%I', r[1], r[2]);
+   r := parse_ident('"SchemaX"."TableY"');
+   RAISE NOTICE '%', format('%I.%I', r[1], r[2]);
+ END;
+ $$;
+ NOTICE:  schemax.tabley
+ NOTICE:  "SchemaX"."TableY"
+ SELECT parse_ident('foo.boo');
+  parse_ident 
+ -------------
+  {foo,boo}
+ (1 row)
+ 
+ SELECT parse_ident('foo.boo[]'); -- should fail
+ ERROR:  identifier contains disallowed characters: "foo.boo[]"
+ SELECT parse_ident('foo.boo[]', strictmode => false); -- ok
+  parse_ident 
+ -------------
+  {foo,boo}
+ (1 row)
+ 
diff --git a/src/test/regress/sql/name.sql b/src/test/regress/sql/name.sql
new file mode 100644
index 1c7a671..629e23f
*** a/src/test/regress/sql/name.sql
--- b/src/test/regress/sql/name.sql
*************** SELECT '' AS three, c.f1 FROM NAME_TBL c
*** 52,54 ****
--- 52,69 ----
  SELECT '' AS two, c.f1 FROM NAME_TBL c WHERE c.f1 ~ '.*asdf.*';
  
  DROP TABLE NAME_TBL;
+ 
+ DO $$
+ DECLARE r text[];
+ BEGIN
+   r := parse_ident('Schemax.Tabley');
+   RAISE NOTICE '%', format('%I.%I', r[1], r[2]);
+   r := parse_ident('"SchemaX"."TableY"');
+   RAISE NOTICE '%', format('%I.%I', r[1], r[2]);
+ END;
+ $$;
+ 
+ SELECT parse_ident('foo.boo');
+ SELECT parse_ident('foo.boo[]'); -- should fail
+ SELECT parse_ident('foo.boo[]', strictmode => false); -- ok
+

#31

Teodor Sigaev

teodor@sigaev.ru

almost 10 years ago

In reply to: Pavel Stehule (#30)

Re: proposal: function parse_ident

rebased, messages changes per Tom's proposal

Cool feature and sometimes I needed it a lot.

But, seems, there are some bugs in error processing.

1
Following query is okay:
# select * from parse_ident(E'"Some \r Schema".someTable');
parse_ident
------------------------------
{"Some \r Schema",sometable}
but:
% select * from parse_ident(E'"Some \r Schema".9someTable');
Schema".9someTable"tifier after "." symbol: ""Some

Return carriage is not escaped in error message. Suppose, any other
special charaters will not be escaped.

2
# select * from parse_ident('.someTable');
ERROR: missing identifier after "." symbol: ".someTable"
Why AFTER instead of BEFORE?

2
Function succesfully truncates long indentifier but not in case of quoted
identifier.
select length(a[1]), length(a[2]) from
parse_ident('"xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx".yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy')
as a ;
length | length
--------+--------
414 | 63

--
Teodor Sigaev E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#32

Pavel Stehule

pavel.stehule@gmail.com

almost 10 years ago

In reply to: Teodor Sigaev (#31)

Re: proposal: function parse_ident

2016-02-08 16:55 GMT+01:00 Teodor Sigaev <teodor@sigaev.ru>:

rebased, messages changes per Tom's proposal

Cool feature and sometimes I needed it a lot.

But, seems, there are some bugs in error processing.

I am looking on it

Regards

Pavel

#33

Alvaro Herrera

alvherre@2ndquadrant.com

almost 10 years ago

In reply to: Pavel Stehule (#32)

Re: proposal: function parse_ident

Pavel Stehule wrote:

I am looking on it

Thanks, closing as returned-with-feedback.

--
ï¿½lvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#34

Pavel Stehule

pavel.stehule@gmail.com

almost 10 years ago

In reply to: Teodor Sigaev (#31)

Re: proposal: function parse_ident

2016-02-08 16:55 GMT+01:00 Teodor Sigaev <teodor@sigaev.ru>:

rebased, messages changes per Tom's proposal

Cool feature and sometimes I needed it a lot.

But, seems, there are some bugs in error processing.

1
Following query is okay:
# select * from parse_ident(E'"Some \r Schema".someTable');
parse_ident
------------------------------
{"Some \r Schema",sometable}
but:
% select * from parse_ident(E'"Some \r Schema".9someTable');
Schema".9someTable"tifier after "." symbol: ""Some

Return carriage is not escaped in error message. Suppose, any other
special charaters will not be escaped.

2
# select * from parse_ident('.someTable');
ERROR: missing identifier after "." symbol: ".someTable"
Why AFTER instead of BEFORE?

fixed - now the function produce more adequate message - see regress tests

2
Function succesfully truncates long indentifier but not in case of quoted
identifier.
select length(a[1]), length(a[2]) from
parse_ident('"xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx".yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy')
as a ;
length | length
--------+--------
414 | 63

fixed - I used the function downcase_truncate_identifier, that does
truncating. I agree - in this case default truncating isn't practical - and
user can explicitly truncate (or implicitly by casting to "name")

New patch attached

Thank you for test

Regards

Pavel

Show quoted text

--
Teodor Sigaev E-mail: teodor@sigaev.ru
WWW:
http://www.sigaev.ru/

#35

Pavel Stehule

pavel.stehule@gmail.com

almost 10 years ago

In reply to: Pavel Stehule (#34)

1 attachment(s)

Re: proposal: function parse_ident

sorry, I am sending missing attachment

Regards

Pavel

Attachments:

parse_ident-to-text-array-06.patchtext/x-patch; charset=US-ASCII; name=parse_ident-to-text-array-06.patchDownload

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
new file mode 100644
index f9eea76..bfba459
*** a/doc/src/sgml/func.sgml
--- b/doc/src/sgml/func.sgml
***************
*** 1821,1826 ****
--- 1821,1843 ----
        <row>
         <entry>
          <indexterm>
+          <primary>parse_ident</primary>
+         </indexterm>
+         <literal><function>parse_ident(<parameter>str</parameter> <type>text</type>,
+            [ <parameter>strictmode</parameter> <type>boolean</type> DEFAULT true ] )</function></literal>
+        </entry>
+        <entry><type>text[]</type></entry>
+        <entry>Split <parameter>qualified identifier</parameter> to array <parameter>parts</parameter>.
+        When second parameter is true, then no any chars after last identifier is allowed. When
+        second parameter is false, then chars after last identifier are ignored.
+        </entry>
+        <entry><literal>parse_ident('"SomeSchema".someTable')</literal></entry>
+        <entry><literal>"SomeSchema,sometable"</literal></entry>
+       </row>
+ 
+       <row>
+        <entry>
+         <indexterm>
           <primary>pg_client_encoding</primary>
          </indexterm>
          <literal><function>pg_client_encoding()</function></literal>
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
new file mode 100644
index 923fe58..61d5b80
*** a/src/backend/catalog/system_views.sql
--- b/src/backend/catalog/system_views.sql
*************** RETURNS jsonb
*** 965,967 ****
--- 965,974 ----
  LANGUAGE INTERNAL
  STRICT IMMUTABLE
  AS 'jsonb_set';
+ 
+ CREATE OR REPLACE FUNCTION
+   parse_ident(str text, strictmode boolean DEFAULT true)
+ RETURNS text[]
+ LANGUAGE INTERNAL
+ STRICT IMMUTABLE
+ AS 'parse_ident';
diff --git a/src/backend/parser/scansup.c b/src/backend/parser/scansup.c
new file mode 100644
index 2b4ab20..7aa5b76
*** a/src/backend/parser/scansup.c
--- b/src/backend/parser/scansup.c
*************** scanstr(const char *s)
*** 130,135 ****
--- 130,144 ----
  char *
  downcase_truncate_identifier(const char *ident, int len, bool warn)
  {
+ 	return downcase_identifier(ident, len, warn, true);
+ }
+ 
+ /*
+  * a workhorse for downcase_truncate_identifier
+  */
+ char *
+ downcase_identifier(const char *ident, int len, bool warn, bool truncate)
+ {
  	char	   *result;
  	int			i;
  	bool		enc_is_single_byte;
*************** downcase_truncate_identifier(const char
*** 158,169 ****
  	}
  	result[i] = '\0';
  
! 	if (i >= NAMEDATALEN)
  		truncate_identifier(result, i, warn);
  
  	return result;
  }
  
  /*
   * truncate_identifier() --- truncate an identifier to NAMEDATALEN-1 bytes.
   *
--- 167,179 ----
  	}
  	result[i] = '\0';
  
! 	if (i >= NAMEDATALEN && truncate)
  		truncate_identifier(result, i, warn);
  
  	return result;
  }
  
+ 
  /*
   * truncate_identifier() --- truncate an identifier to NAMEDATALEN-1 bytes.
   *
diff --git a/src/backend/utils/adt/misc.c b/src/backend/utils/adt/misc.c
new file mode 100644
index 43f36db..8c2a2f6
*** a/src/backend/utils/adt/misc.c
--- b/src/backend/utils/adt/misc.c
***************
*** 21,32 ****
--- 21,35 ----
  #include <unistd.h>
  
  #include "access/sysattr.h"
+ #include "access/htup_details.h"
  #include "catalog/catalog.h"
+ #include "catalog/namespace.h"
  #include "catalog/pg_tablespace.h"
  #include "catalog/pg_type.h"
  #include "commands/dbcommands.h"
  #include "funcapi.h"
  #include "miscadmin.h"
+ #include "parser/scansup.h"
  #include "parser/keywords.h"
  #include "postmaster/syslogger.h"
  #include "rewrite/rewriteHandler.h"
***************
*** 38,43 ****
--- 41,47 ----
  #include "utils/ruleutils.h"
  #include "tcop/tcopprot.h"
  #include "utils/acl.h"
+ #include "utils/array.h"
  #include "utils/builtins.h"
  #include "utils/timestamp.h"
  
*************** pg_column_is_updatable(PG_FUNCTION_ARGS)
*** 719,721 ****
--- 723,895 ----
  
  	PG_RETURN_BOOL((events & REQ_EVENTS) == REQ_EVENTS);
  }
+ 
+ 
+ /*
+  * This simple parser utility are compatible with lexer implementation,
+  * used only in parse_ident function
+  */
+ static bool
+ is_ident_start(unsigned char c)
+ {
+ 	if (c == '_')
+ 		return true;
+ 	if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z'))
+ 		return true;
+ 
+ 	if (c >= 0200 && c <= 0377)
+ 		return true;
+ 
+ 	return false;
+ }
+ 
+ static bool
+ is_ident_cont(unsigned char c)
+ {
+ 	if (c >= '0' && c <= '9')
+ 		return true;
+ 
+ 	return is_ident_start(c);
+ }
+ 
+ /*
+  * parse_ident - parse SQL composed identifier to separate identifiers.
+  * When strict mode is active (second parameter), then any chars after
+  * last identifiers are disallowed.
+  */
+ Datum
+ parse_ident(PG_FUNCTION_ARGS)
+ {
+ 	text		*qualname;
+ 	char		*qualname_str;
+ 	bool		strict_mode;
+ 	ArrayBuildState *astate = NULL;
+ 	char	*nextp;
+ 	bool		after_dot = false;
+ 
+ 	qualname = PG_GETARG_TEXT_PP(0);
+ 	qualname_str = text_to_cstring(qualname);
+ 	strict_mode = PG_GETARG_BOOL(1);
+ 
+ 	nextp = qualname_str;
+ 
+ 	/* skip leading whitespace */
+ 	while (isspace((unsigned char) *nextp))
+ 		nextp++;
+ 
+ 	for (;;)
+ 	{
+ 		char		*curname;
+ 		char		*endp;
+ 		bool		missing_ident;
+ 
+ 		missing_ident = true;
+ 
+ 		if (*nextp == '\"')
+ 		{
+ 			curname = nextp + 1;
+ 			for (;;)
+ 			{
+ 				endp = strchr(nextp + 1, '\"');
+ 				if (endp == NULL)
+ 					ereport(ERROR,
+ 						(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ 						 errmsg("unclosed double quotes"),
+ 						 errdetail("string \"%s\" is not valid identifier",
+ 							    text_to_cstring(qualname))));
+ 				if (endp[1] != '\"')
+ 					break;
+ 				memmove(endp, endp + 1, strlen(endp));
+ 				nextp = endp;
+ 			}
+ 			nextp = endp + 1;
+ 			*endp = '\0';
+ 
+ 			if (endp - curname == 0)
+ 				ereport(ERROR,
+ 					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ 					 errmsg("identifier should not be empty: \"%s\"",
+ 								    text_to_cstring(qualname))));
+ 
+ 			astate = accumArrayResult(astate,
+ 				CStringGetTextDatum(curname), false,
+ 						    TEXTOID, CurrentMemoryContext);
+ 			missing_ident = false;
+ 		}
+ 		else
+ 		{
+ 			if (is_ident_start((unsigned char) *nextp))
+ 			{
+ 				char *downname;
+ 				int	len;
+ 				text	*part;
+ 
+ 				curname = nextp++;
+ 				while (is_ident_cont((unsigned char) *nextp))
+ 					nextp++;
+ 
+ 				len = nextp - curname;
+ 
+ 				/*
+ 				 * We don't would to truncate identifier implicitli in this
+ 				 * moment, so we use downcase_identifier instead
+ 				 * dowcase_truncate_identifier. The user can do truncating
+ 				 * later if it is necessary for his purpouse.
+ 				 */
+ 				downname = downcase_identifier(curname, len, false, false);
+ 				part = cstring_to_text_with_len(downname, len);
+ 				astate = accumArrayResult(astate,
+ 					PointerGetDatum(part), false,
+ 							    TEXTOID, CurrentMemoryContext);
+ 				missing_ident = false;
+ 			}
+ 		}
+ 
+ 		if (missing_ident)
+ 		{
+ 			/* choose text of adequate error message */
+ 			if (*nextp == '.')
+ 				ereport(ERROR,
+ 					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ 					 errmsg("missing identifier before \".\" symbol: \"%s\"",
+ 								    text_to_cstring(qualname))));
+ 			else if (after_dot)
+ 				ereport(ERROR,
+ 					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ 					 errmsg("missing identifier after \".\" symbol: \"%s\"",
+ 								    text_to_cstring(qualname))));
+ 			else
+ 				ereport(ERROR,
+ 					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ 					 errmsg("missing identifier: \"%s\"",
+ 								    text_to_cstring(qualname))));
+ 		}
+ 
+ 		while (isspace((unsigned char) *nextp))
+ 			nextp++;
+ 
+ 		if (*nextp == '.')
+ 		{
+ 			after_dot = true;
+ 			nextp++;
+ 			while (isspace((unsigned char) *nextp))
+ 				nextp++;
+ 			continue;
+ 		}
+ 		else if (*nextp == '\0')
+ 		{
+ 			break;
+ 		}
+ 		else
+ 		{
+ 			if (strict_mode)
+ 				ereport(ERROR,
+ 					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ 					 errmsg("identifier contains disallowed characters: \"%s\"",
+ 								    text_to_cstring(qualname))));
+ 			break;
+ 		}
+ 	}
+ 
+ 	PG_RETURN_DATUM(makeArrayResult(astate, CurrentMemoryContext));
+ }
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
new file mode 100644
index 1c0ef9a..94b1a22
*** a/src/include/catalog/pg_proc.h
--- b/src/include/catalog/pg_proc.h
*************** DESCR("I/O");
*** 3457,3462 ****
--- 3457,3465 ----
  DATA(insert OID = 4086 (  to_regnamespace	PGNSP PGUID 12 1 0 0 0 f f f f t f s s 1 0 4089 "25" _null_ _null_ _null_ _null_ _null_ to_regnamespace _null_ _null_ _null_ ));
  DESCR("convert namespace name to regnamespace");
  
+ DATA(insert OID = 3318 (  parse_ident		PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 1009 "25 16" _null_ _null_ "{str,strictmode}" _null_ _null_ parse_ident _null_ _null_ _null_ ));
+ DESCR("parse qualified identifier to array of identifiers");
+ 
  DATA(insert OID = 2246 ( fmgr_internal_validator PGNSP PGUID 12 1 0 0 0 f f f f t f s s 1 0 2278 "26" _null_ _null_ _null_ _null_ _null_ fmgr_internal_validator _null_ _null_ _null_ ));
  DESCR("(internal)");
  DATA(insert OID = 2247 ( fmgr_c_validator	PGNSP PGUID 12 1 0 0 0 f f f f t f s s 1 0 2278 "26" _null_ _null_ _null_ _null_ _null_ fmgr_c_validator _null_ _null_ _null_ ));
diff --git a/src/include/parser/scansup.h b/src/include/parser/scansup.h
new file mode 100644
index 4f4164b..4f95c81
*** a/src/include/parser/scansup.h
--- b/src/include/parser/scansup.h
*************** extern char *scanstr(const char *s);
*** 20,25 ****
--- 20,28 ----
  extern char *downcase_truncate_identifier(const char *ident, int len,
  							 bool warn);
  
+ extern char *downcase_identifier(const char *ident, int len,
+ 							 bool warn, bool truncate);
+ 
  extern void truncate_identifier(char *ident, int len, bool warn);
  
  extern bool scanner_isspace(char ch);
diff --git a/src/include/utils/builtins.h b/src/include/utils/builtins.h
new file mode 100644
index affcc01..b22a6b8
*** a/src/include/utils/builtins.h
--- b/src/include/utils/builtins.h
*************** extern Datum pg_typeof(PG_FUNCTION_ARGS)
*** 508,513 ****
--- 508,514 ----
  extern Datum pg_collation_for(PG_FUNCTION_ARGS);
  extern Datum pg_relation_is_updatable(PG_FUNCTION_ARGS);
  extern Datum pg_column_is_updatable(PG_FUNCTION_ARGS);
+ extern Datum parse_ident(PG_FUNCTION_ARGS);
  
  /* oid.c */
  extern Datum oidin(PG_FUNCTION_ARGS);
diff --git a/src/test/regress/expected/name.out b/src/test/regress/expected/name.out
new file mode 100644
index b359d52..5fe3670
*** a/src/test/regress/expected/name.out
--- b/src/test/regress/expected/name.out
*************** SELECT '' AS two, c.f1 FROM NAME_TBL c W
*** 124,126 ****
--- 124,164 ----
  (2 rows)
  
  DROP TABLE NAME_TBL;
+ DO $$
+ DECLARE r text[];
+ BEGIN
+   r := parse_ident('Schemax.Tabley');
+   RAISE NOTICE '%', format('%I.%I', r[1], r[2]);
+   r := parse_ident('"SchemaX"."TableY"');
+   RAISE NOTICE '%', format('%I.%I', r[1], r[2]);
+ END;
+ $$;
+ NOTICE:  schemax.tabley
+ NOTICE:  "SchemaX"."TableY"
+ SELECT parse_ident('foo.boo');
+  parse_ident 
+ -------------
+  {foo,boo}
+ (1 row)
+ 
+ SELECT parse_ident('foo.boo[]'); -- should fail
+ ERROR:  identifier contains disallowed characters: "foo.boo[]"
+ SELECT parse_ident('foo.boo[]', strictmode => false); -- ok
+  parse_ident 
+ -------------
+  {foo,boo}
+ (1 row)
+ 
+ -- should fail
+ SELECT parse_ident(' ');
+ ERROR:  missing identifier: " "
+ SELECT parse_ident(' .aaa');
+ ERROR:  missing identifier before "." symbol: " .aaa"
+ SELECT parse_ident(' aaa . ');
+ ERROR:  missing identifier after "." symbol: " aaa . "
+ SELECT length(a[1]), length(a[2]) from parse_ident('"xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx".yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy') as a ;
+  length | length 
+ --------+--------
+     414 |    289
+ (1 row)
+ 
diff --git a/src/test/regress/sql/name.sql b/src/test/regress/sql/name.sql
new file mode 100644
index 1c7a671..3050613
*** a/src/test/regress/sql/name.sql
--- b/src/test/regress/sql/name.sql
*************** SELECT '' AS three, c.f1 FROM NAME_TBL c
*** 52,54 ****
--- 52,75 ----
  SELECT '' AS two, c.f1 FROM NAME_TBL c WHERE c.f1 ~ '.*asdf.*';
  
  DROP TABLE NAME_TBL;
+ 
+ DO $$
+ DECLARE r text[];
+ BEGIN
+   r := parse_ident('Schemax.Tabley');
+   RAISE NOTICE '%', format('%I.%I', r[1], r[2]);
+   r := parse_ident('"SchemaX"."TableY"');
+   RAISE NOTICE '%', format('%I.%I', r[1], r[2]);
+ END;
+ $$;
+ 
+ SELECT parse_ident('foo.boo');
+ SELECT parse_ident('foo.boo[]'); -- should fail
+ SELECT parse_ident('foo.boo[]', strictmode => false); -- ok
+ 
+ -- should fail
+ SELECT parse_ident(' ');
+ SELECT parse_ident(' .aaa');
+ SELECT parse_ident(' aaa . ');
+ 
+ SELECT length(a[1]), length(a[2]) from parse_ident('"xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx".yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy') as a ;

#36

Jim Nasby

Jim.Nasby@BlueTreble.com

almost 10 years ago

In reply to: Pavel Stehule (#35)

Re: proposal: function parse_ident

The following review has been posted through the commitfest application:
make installcheck-world: tested, passed
Implements feature: tested, passed
Spec compliant: not tested
Documentation: not tested

NEEDS CATVERSION BUMP

I editorialized the docs and some of the comments. In particular, I documented behavior of not truncating, and recommended casting to name[] if user cares about that. Added a unit test to verify that works. BTW, I saw mention in the thread about not truncated spaces, but the function *does* truncate them, unless they're inside quotes, where they're legitimate.

Also added test for invalid characters.

I think "strict" would be more in line with other uses in code. There are currently no other occurrences of 'strictmode' in the code. There are loads of references to 'strict', but I didn't go through all of them to see if any were used as externally visible function parameter names.

qualname_str is used in exactly 1 place. Either it should be gotten rid of, or all the uses of text_to_cstring(qualname) should be changed to qualname_str.

I think the code would have been clearer if instead of the big if (*nextp == '\"') it did the same "inquote" looping that is done elsewhere, but I don't have a strong opinion on it.

The new status of this patch is: Waiting on Author

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#37

Jim Nasby

Jim.Nasby@BlueTreble.com

almost 10 years ago

In reply to: Jim Nasby (#36)

1 attachment(s)

Re: proposal: function parse_ident

On 2/10/16 12:26 PM, Jim Nasby wrote:

I editorialized the docs and some of the comments. In particular, I documented behavior of not truncating, and recommended casting to name[] if user cares about that. Added a unit test to verify that works. BTW, I saw mention in the thread about not truncated spaces, but the function*does* truncate them, unless they're inside quotes, where they're legitimate.

New patch for that.
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com

Attachments:

parse_ident-to-text-array-07.patchtext/plain; charset=UTF-8; name=parse_ident-to-text-array-07.patch; x-mac-creator=0; x-mac-type=0Download

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 139aa2b..b4a2898 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -1778,6 +1778,27 @@
       <row>
        <entry>
         <indexterm>
+         <primary>parse_ident</primary>
+        </indexterm>
+        <literal><function>parse_ident(<parameter>str</parameter> <type>text</type>,
+           [ <parameter>strictmode</parameter> <type>boolean</type> DEFAULT true ] )</function></literal>
+       </entry>
+       <entry><type>text[]</type></entry>
+       <entry>Split <parameter>qualified identifier</parameter> into array <parameter>parts</parameter>.
+       When <parameter>strictmode</parameter> is false, extra characters after the identifier are ignored.
+       This is useful for parsing identifiers for objects like functions and arrays that may have trailing
+       characters. By default, extra characters after the last identifier are considered an error.
+       second parameter is false, then chars after last identifier are ignored. Note that this function
+       does not truncate quoted identifiers. If you care about that you should cast the result of this
+	   function to name[].
+       </entry>
+       <entry><literal>parse_ident('"SomeSchema".someTable')</literal></entry>
+       <entry><literal>"SomeSchema,sometable"</literal></entry>
+      </row>
+
+      <row>
+       <entry>
+        <indexterm>
          <primary>pg_client_encoding</primary>
         </indexterm>
         <literal><function>pg_client_encoding()</function></literal>
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 923fe58..61d5b80 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -965,3 +965,10 @@ RETURNS jsonb
 LANGUAGE INTERNAL
 STRICT IMMUTABLE
 AS 'jsonb_set';
+
+CREATE OR REPLACE FUNCTION
+  parse_ident(str text, strictmode boolean DEFAULT true)
+RETURNS text[]
+LANGUAGE INTERNAL
+STRICT IMMUTABLE
+AS 'parse_ident';
diff --git a/src/backend/parser/scansup.c b/src/backend/parser/scansup.c
index 2b4ab20..7aa5b76 100644
--- a/src/backend/parser/scansup.c
+++ b/src/backend/parser/scansup.c
@@ -130,6 +130,15 @@ scanstr(const char *s)
 char *
 downcase_truncate_identifier(const char *ident, int len, bool warn)
 {
+	return downcase_identifier(ident, len, warn, true);
+}
+
+/*
+ * a workhorse for downcase_truncate_identifier
+ */
+char *
+downcase_identifier(const char *ident, int len, bool warn, bool truncate)
+{
 	char	   *result;
 	int			i;
 	bool		enc_is_single_byte;
@@ -158,12 +167,13 @@ downcase_truncate_identifier(const char *ident, int len, bool warn)
 	}
 	result[i] = '\0';
 
-	if (i >= NAMEDATALEN)
+	if (i >= NAMEDATALEN && truncate)
 		truncate_identifier(result, i, warn);
 
 	return result;
 }
 
+
 /*
  * truncate_identifier() --- truncate an identifier to NAMEDATALEN-1 bytes.
  *
diff --git a/src/backend/utils/adt/misc.c b/src/backend/utils/adt/misc.c
index 6a306f3..3072c32 100644
--- a/src/backend/utils/adt/misc.c
+++ b/src/backend/utils/adt/misc.c
@@ -21,12 +21,15 @@
 #include <unistd.h>
 
 #include "access/sysattr.h"
+#include "access/htup_details.h"
 #include "catalog/catalog.h"
+#include "catalog/namespace.h"
 #include "catalog/pg_tablespace.h"
 #include "catalog/pg_type.h"
 #include "commands/dbcommands.h"
 #include "funcapi.h"
 #include "miscadmin.h"
+#include "parser/scansup.h"
 #include "parser/keywords.h"
 #include "postmaster/syslogger.h"
 #include "rewrite/rewriteHandler.h"
@@ -38,6 +41,7 @@
 #include "utils/ruleutils.h"
 #include "tcop/tcopprot.h"
 #include "utils/acl.h"
+#include "utils/array.h"
 #include "utils/builtins.h"
 #include "utils/timestamp.h"
 
@@ -598,3 +602,173 @@ pg_column_is_updatable(PG_FUNCTION_ARGS)
 
 	PG_RETURN_BOOL((events & REQ_EVENTS) == REQ_EVENTS);
 }
+
+
+/*
+ * This simple parser utility are compatible with lexer implementation,
+ * used only in parse_ident function
+ */
+static bool
+is_ident_start(unsigned char c)
+{
+	if (c == '_')
+		return true;
+	if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z'))
+		return true;
+
+	if (c >= 0200 && c <= 0377)
+		return true;
+
+	return false;
+}
+
+static bool
+is_ident_cont(unsigned char c)
+{
+	if (c >= '0' && c <= '9')
+		return true;
+
+	return is_ident_start(c);
+}
+
+/*
+ * parse_ident - parse SQL composed identifier to separate identifiers.
+ * When strict mode is active (second parameter), then any chars after
+ * last identifiers are disallowed.
+ */
+Datum
+parse_ident(PG_FUNCTION_ARGS)
+{
+	text		*qualname;
+	char		*qualname_str;
+	bool		strict_mode;
+	ArrayBuildState *astate = NULL;
+	char	*nextp;
+	bool		after_dot = false;
+
+	qualname = PG_GETARG_TEXT_PP(0);
+	qualname_str = text_to_cstring(qualname);
+	strict_mode = PG_GETARG_BOOL(1);
+
+	nextp = qualname_str;
+
+	/* skip leading whitespace */
+	while (isspace((unsigned char) *nextp))
+		nextp++;
+
+	for (;;)
+	{
+		char		*curname;
+		char		*endp;
+		bool		missing_ident;
+
+		missing_ident = true;
+
+		if (*nextp == '\"')
+		{
+			curname = nextp + 1;
+			for (;;)
+			{
+				endp = strchr(nextp + 1, '\"');
+				if (endp == NULL)
+					ereport(ERROR,
+						(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+						 errmsg("unclosed double quotes"),
+						 errdetail("string \"%s\" is not valid identifier",
+							    text_to_cstring(qualname))));
+				if (endp[1] != '\"')
+					break;
+				memmove(endp, endp + 1, strlen(endp));
+				nextp = endp;
+			}
+			nextp = endp + 1;
+			*endp = '\0';
+
+			if (endp - curname == 0)
+				ereport(ERROR,
+					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+					 errmsg("identifier should not be empty: \"%s\"",
+								    text_to_cstring(qualname))));
+
+			astate = accumArrayResult(astate,
+				CStringGetTextDatum(curname), false,
+						    TEXTOID, CurrentMemoryContext);
+			missing_ident = false;
+		}
+		else
+		{
+			if (is_ident_start((unsigned char) *nextp))
+			{
+				char *downname;
+				int	len;
+				text	*part;
+
+				curname = nextp++;
+				while (is_ident_cont((unsigned char) *nextp))
+					nextp++;
+
+				len = nextp - curname;
+
+				/*
+				 * We don't would to truncate identifier implicitli in this
+				 * moment, so we use downcase_identifier instead
+				 * dowcase_truncate_identifier. The user can do truncating
+				 * later if it is necessary for his purpouse.
+				 */
+				downname = downcase_identifier(curname, len, false, false);
+				part = cstring_to_text_with_len(downname, len);
+				astate = accumArrayResult(astate,
+					PointerGetDatum(part), false,
+							    TEXTOID, CurrentMemoryContext);
+				missing_ident = false;
+			}
+		}
+
+		if (missing_ident)
+		{
+			/* choose text of adequate error message */
+			if (*nextp == '.')
+				ereport(ERROR,
+					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+					 errmsg("missing identifier before \".\" symbol: \"%s\"",
+								    text_to_cstring(qualname))));
+			else if (after_dot)
+				ereport(ERROR,
+					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+					 errmsg("missing identifier after \".\" symbol: \"%s\"",
+								    text_to_cstring(qualname))));
+			else
+				ereport(ERROR,
+					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+					 errmsg("missing identifier: \"%s\"",
+								    text_to_cstring(qualname))));
+		}
+
+		while (isspace((unsigned char) *nextp))
+			nextp++;
+
+		if (*nextp == '.')
+		{
+			after_dot = true;
+			nextp++;
+			while (isspace((unsigned char) *nextp))
+				nextp++;
+			continue;
+		}
+		else if (*nextp == '\0')
+		{
+			break;
+		}
+		else
+		{
+			if (strict_mode)
+				ereport(ERROR,
+					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+					 errmsg("identifier contains disallowed characters: \"%s\"",
+								    text_to_cstring(qualname))));
+			break;
+		}
+	}
+
+	PG_RETURN_DATUM(makeArrayResult(astate, CurrentMemoryContext));
+}
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index 5ded13e..d7c09a7 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -3451,6 +3451,9 @@ DESCR("I/O");
 DATA(insert OID = 4086 (  to_regnamespace	PGNSP PGUID 12 1 0 0 0 f f f f t f s s 1 0 4089 "25" _null_ _null_ _null_ _null_ _null_ to_regnamespace _null_ _null_ _null_ ));
 DESCR("convert namespace name to regnamespace");
 
+DATA(insert OID = 3318 (  parse_ident		PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 1009 "25 16" _null_ _null_ "{str,strictmode}" _null_ _null_ parse_ident _null_ _null_ _null_ ));
+DESCR("parse qualified identifier to array of identifiers");
+
 DATA(insert OID = 2246 ( fmgr_internal_validator PGNSP PGUID 12 1 0 0 0 f f f f t f s s 1 0 2278 "26" _null_ _null_ _null_ _null_ _null_ fmgr_internal_validator _null_ _null_ _null_ ));
 DESCR("(internal)");
 DATA(insert OID = 2247 ( fmgr_c_validator	PGNSP PGUID 12 1 0 0 0 f f f f t f s s 1 0 2278 "26" _null_ _null_ _null_ _null_ _null_ fmgr_c_validator _null_ _null_ _null_ ));
diff --git a/src/include/parser/scansup.h b/src/include/parser/scansup.h
index 4f4164b..4f95c81 100644
--- a/src/include/parser/scansup.h
+++ b/src/include/parser/scansup.h
@@ -20,6 +20,9 @@ extern char *scanstr(const char *s);
 extern char *downcase_truncate_identifier(const char *ident, int len,
 							 bool warn);
 
+extern char *downcase_identifier(const char *ident, int len,
+							 bool warn, bool truncate);
+
 extern void truncate_identifier(char *ident, int len, bool warn);
 
 extern bool scanner_isspace(char ch);
diff --git a/src/include/utils/builtins.h b/src/include/utils/builtins.h
index 5e8e832..b1b6ef6 100644
--- a/src/include/utils/builtins.h
+++ b/src/include/utils/builtins.h
@@ -504,6 +504,7 @@ extern Datum pg_typeof(PG_FUNCTION_ARGS);
 extern Datum pg_collation_for(PG_FUNCTION_ARGS);
 extern Datum pg_relation_is_updatable(PG_FUNCTION_ARGS);
 extern Datum pg_column_is_updatable(PG_FUNCTION_ARGS);
+extern Datum parse_ident(PG_FUNCTION_ARGS);
 
 /* oid.c */
 extern Datum oidin(PG_FUNCTION_ARGS);
diff --git a/src/test/regress/expected/name.out b/src/test/regress/expected/name.out
index b359d52..66569f4 100644
--- a/src/test/regress/expected/name.out
+++ b/src/test/regress/expected/name.out
@@ -124,3 +124,55 @@ SELECT '' AS two, c.f1 FROM NAME_TBL c WHERE c.f1 ~ '.*asdf.*';
 (2 rows)
 
 DROP TABLE NAME_TBL;
+DO $$
+DECLARE r text[];
+BEGIN
+  r := parse_ident('Schemax.Tabley');
+  RAISE NOTICE '%', format('%I.%I', r[1], r[2]);
+  r := parse_ident('"SchemaX"."TableY"');
+  RAISE NOTICE '%', format('%I.%I', r[1], r[2]);
+END;
+$$;
+NOTICE:  schemax.tabley
+NOTICE:  "SchemaX"."TableY"
+SELECT parse_ident('foo.boo');
+ parse_ident 
+-------------
+ {foo,boo}
+(1 row)
+
+SELECT parse_ident('foo.boo[]'); -- should fail
+ERROR:  identifier contains disallowed characters: "foo.boo[]"
+SELECT parse_ident('foo.boo[]', strictmode => false); -- ok
+ parse_ident 
+-------------
+ {foo,boo}
+(1 row)
+
+-- should fail
+SELECT parse_ident(' ');
+ERROR:  missing identifier: " "
+SELECT parse_ident(' .aaa');
+ERROR:  missing identifier before "." symbol: " .aaa"
+SELECT parse_ident(' aaa . ');
+ERROR:  missing identifier after "." symbol: " aaa . "
+SELECT parse_ident('aaa.a%b');
+ERROR:  identifier contains disallowed characters: "aaa.a%b"
+SELECT length(a[1]), length(a[2]) from parse_ident('"xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx".yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy') as a ;
+ length | length 
+--------+--------
+    414 |    289
+(1 row)
+
+SELECT parse_ident(' first . "  second  " ."   third   ". "  ' || repeat('x',66) || '"');
+                                                parse_ident                                                
+-----------------------------------------------------------------------------------------------------------
+ {first,"  second  ","   third   ","  xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"}
+(1 row)
+
+SELECT parse_ident(' first . "  second  " ."   third   ". "  ' || repeat('x',66) || '"')::name[];
+                                             parse_ident                                              
+------------------------------------------------------------------------------------------------------
+ {first,"  second  ","   third   ","  xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"}
+(1 row)
+
diff --git a/src/test/regress/sql/name.sql b/src/test/regress/sql/name.sql
index 1c7a671..2e70e4e 100644
--- a/src/test/regress/sql/name.sql
+++ b/src/test/regress/sql/name.sql
@@ -52,3 +52,28 @@ SELECT '' AS three, c.f1 FROM NAME_TBL c WHERE c.f1 ~ '[0-9]';
 SELECT '' AS two, c.f1 FROM NAME_TBL c WHERE c.f1 ~ '.*asdf.*';
 
 DROP TABLE NAME_TBL;
+
+DO $$
+DECLARE r text[];
+BEGIN
+  r := parse_ident('Schemax.Tabley');
+  RAISE NOTICE '%', format('%I.%I', r[1], r[2]);
+  r := parse_ident('"SchemaX"."TableY"');
+  RAISE NOTICE '%', format('%I.%I', r[1], r[2]);
+END;
+$$;
+
+SELECT parse_ident('foo.boo');
+SELECT parse_ident('foo.boo[]'); -- should fail
+SELECT parse_ident('foo.boo[]', strictmode => false); -- ok
+
+-- should fail
+SELECT parse_ident(' ');
+SELECT parse_ident(' .aaa');
+SELECT parse_ident(' aaa . ');
+SELECT parse_ident('aaa.a%b');
+
+SELECT length(a[1]), length(a[2]) from parse_ident('"xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx".yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy') as a ;
+
+SELECT parse_ident(' first . "  second  " ."   third   ". "  ' || repeat('x',66) || '"');
+SELECT parse_ident(' first . "  second  " ."   third   ". "  ' || repeat('x',66) || '"')::name[];

#38

Pavel Stehule

pavel.stehule@gmail.com

almost 10 years ago

In reply to: Jim Nasby (#36)

1 attachment(s)

Re: proposal: function parse_ident

2016-02-10 19:26 GMT+01:00 Jim Nasby <Jim.Nasby@bluetreble.com>:

The following review has been posted through the commitfest application:
make installcheck-world: tested, passed
Implements feature: tested, passed
Spec compliant: not tested
Documentation: not tested

NEEDS CATVERSION BUMP

I editorialized the docs and some of the comments. In particular, I
documented behavior of not truncating, and recommended casting to name[] if
user cares about that. Added a unit test to verify that works. BTW, I saw
mention in the thread about not truncated spaces, but the function *does*
truncate them, unless they're inside quotes, where they're legitimate.

Also added test for invalid characters.

I think "strict" would be more in line with other uses in code. There are
currently no other occurrences of 'strictmode' in the code. There are loads
of references to 'strict', but I didn't go through all of them to see if
any were used as externally visible function parameter names.

I am sorry, I don't understand to this point. You unlike the name of
parameter "strictmode" ? Have you any proposal? Maybe "restrictive" ?

qualname_str is used in exactly 1 place. Either it should be gotten rid
of, or all the uses of text_to_cstring(qualname) should be changed to
qualname_str.

fixed, qualname_str is used everywhere

I think the code would have been clearer if instead of the big if (*nextp
== '\"') it did the same "inquote" looping that is done elsewhere, but I
don't have a strong opinion on it.

The almost all code +/- is related to human readable error messages. We can
move some code to separate static functions - read_quoted_ident,
read_unquoted_ident, but there will be some magic about parameters, and the
code will not be much better, than it is now.

The new status of this patch is: Waiting on Author

Thank you for your work on documentation.

I am sending updated version of this patch.

Regards

Pavel

Show quoted text

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachments:

parse_ident-to-text-array-08.patchtext/x-patch; charset=US-ASCII; name=parse_ident-to-text-array-08.patchDownload

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
new file mode 100644
index f9eea76..9eed19a
*** a/doc/src/sgml/func.sgml
--- b/doc/src/sgml/func.sgml
***************
*** 1821,1826 ****
--- 1821,1847 ----
        <row>
         <entry>
          <indexterm>
+          <primary>parse_ident</primary>
+         </indexterm>
+         <literal><function>parse_ident(<parameter>str</parameter> <type>text</type>,
+            [ <parameter>strictmode</parameter> <type>boolean</type> DEFAULT true ] )</function></literal>
+        </entry>
+        <entry><type>text[]</type></entry>
+        <entry>Split <parameter>qualified identifier</parameter> into array <parameter>parts</parameter>.
+        When <parameter>strictmode</parameter> is false, extra characters after the identifier are ignored.
+        This is useful for parsing identifiers for objects like functions and arrays that may have trailing
+        characters. By default, extra characters after the last identifier are considered an error.
+        second parameter is false, then chars after last identifier are ignored. Note that this function
+        does not truncate quoted identifiers. If you care about that you should cast the result of this
+ 	   function to name[].
+        </entry>
+        <entry><literal>parse_ident('"SomeSchema".someTable')</literal></entry>
+        <entry><literal>"SomeSchema,sometable"</literal></entry>
+       </row>
+ 
+       <row>
+        <entry>
+         <indexterm>
           <primary>pg_client_encoding</primary>
          </indexterm>
          <literal><function>pg_client_encoding()</function></literal>
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
new file mode 100644
index 923fe58..61d5b80
*** a/src/backend/catalog/system_views.sql
--- b/src/backend/catalog/system_views.sql
*************** RETURNS jsonb
*** 965,967 ****
--- 965,974 ----
  LANGUAGE INTERNAL
  STRICT IMMUTABLE
  AS 'jsonb_set';
+ 
+ CREATE OR REPLACE FUNCTION
+   parse_ident(str text, strictmode boolean DEFAULT true)
+ RETURNS text[]
+ LANGUAGE INTERNAL
+ STRICT IMMUTABLE
+ AS 'parse_ident';
diff --git a/src/backend/parser/scansup.c b/src/backend/parser/scansup.c
new file mode 100644
index 2b4ab20..7aa5b76
*** a/src/backend/parser/scansup.c
--- b/src/backend/parser/scansup.c
*************** scanstr(const char *s)
*** 130,135 ****
--- 130,144 ----
  char *
  downcase_truncate_identifier(const char *ident, int len, bool warn)
  {
+ 	return downcase_identifier(ident, len, warn, true);
+ }
+ 
+ /*
+  * a workhorse for downcase_truncate_identifier
+  */
+ char *
+ downcase_identifier(const char *ident, int len, bool warn, bool truncate)
+ {
  	char	   *result;
  	int			i;
  	bool		enc_is_single_byte;
*************** downcase_truncate_identifier(const char
*** 158,169 ****
  	}
  	result[i] = '\0';
  
! 	if (i >= NAMEDATALEN)
  		truncate_identifier(result, i, warn);
  
  	return result;
  }
  
  /*
   * truncate_identifier() --- truncate an identifier to NAMEDATALEN-1 bytes.
   *
--- 167,179 ----
  	}
  	result[i] = '\0';
  
! 	if (i >= NAMEDATALEN && truncate)
  		truncate_identifier(result, i, warn);
  
  	return result;
  }
  
+ 
  /*
   * truncate_identifier() --- truncate an identifier to NAMEDATALEN-1 bytes.
   *
diff --git a/src/backend/utils/adt/misc.c b/src/backend/utils/adt/misc.c
new file mode 100644
index 43f36db..778b71a
*** a/src/backend/utils/adt/misc.c
--- b/src/backend/utils/adt/misc.c
***************
*** 21,32 ****
--- 21,35 ----
  #include <unistd.h>
  
  #include "access/sysattr.h"
+ #include "access/htup_details.h"
  #include "catalog/catalog.h"
+ #include "catalog/namespace.h"
  #include "catalog/pg_tablespace.h"
  #include "catalog/pg_type.h"
  #include "commands/dbcommands.h"
  #include "funcapi.h"
  #include "miscadmin.h"
+ #include "parser/scansup.h"
  #include "parser/keywords.h"
  #include "postmaster/syslogger.h"
  #include "rewrite/rewriteHandler.h"
***************
*** 38,43 ****
--- 41,47 ----
  #include "utils/ruleutils.h"
  #include "tcop/tcopprot.h"
  #include "utils/acl.h"
+ #include "utils/array.h"
  #include "utils/builtins.h"
  #include "utils/timestamp.h"
  
*************** pg_column_is_updatable(PG_FUNCTION_ARGS)
*** 719,721 ****
--- 723,895 ----
  
  	PG_RETURN_BOOL((events & REQ_EVENTS) == REQ_EVENTS);
  }
+ 
+ 
+ /*
+  * This simple parser utility are compatible with lexer implementation,
+  * used only in parse_ident function
+  */
+ static bool
+ is_ident_start(unsigned char c)
+ {
+ 	if (c == '_')
+ 		return true;
+ 	if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z'))
+ 		return true;
+ 
+ 	if (c >= 0200 && c <= 0377)
+ 		return true;
+ 
+ 	return false;
+ }
+ 
+ static bool
+ is_ident_cont(unsigned char c)
+ {
+ 	if (c >= '0' && c <= '9')
+ 		return true;
+ 
+ 	return is_ident_start(c);
+ }
+ 
+ /*
+  * parse_ident - parse SQL composed identifier to separate identifiers.
+  * When strict mode is active (second parameter), then any chars after
+  * last identifiers are disallowed.
+  */
+ Datum
+ parse_ident(PG_FUNCTION_ARGS)
+ {
+ 	text		*qualname;
+ 	char		*qualname_str;
+ 	bool		strict_mode;
+ 	ArrayBuildState *astate = NULL;
+ 	char	*nextp;
+ 	bool		after_dot = false;
+ 
+ 	qualname = PG_GETARG_TEXT_PP(0);
+ 	qualname_str = text_to_cstring(qualname);
+ 	strict_mode = PG_GETARG_BOOL(1);
+ 
+ 	nextp = qualname_str;
+ 
+ 	/* skip leading whitespace */
+ 	while (isspace((unsigned char) *nextp))
+ 		nextp++;
+ 
+ 	for (;;)
+ 	{
+ 		char		*curname;
+ 		char		*endp;
+ 		bool		missing_ident;
+ 
+ 		missing_ident = true;
+ 
+ 		if (*nextp == '\"')
+ 		{
+ 			curname = nextp + 1;
+ 			for (;;)
+ 			{
+ 				endp = strchr(nextp + 1, '\"');
+ 				if (endp == NULL)
+ 					ereport(ERROR,
+ 						(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ 						 errmsg("unclosed double quotes"),
+ 						 errdetail("string \"%s\" is not valid identifier",
+ 													qualname_str)));
+ 				if (endp[1] != '\"')
+ 					break;
+ 				memmove(endp, endp + 1, strlen(endp));
+ 				nextp = endp;
+ 			}
+ 			nextp = endp + 1;
+ 			*endp = '\0';
+ 
+ 			if (endp - curname == 0)
+ 				ereport(ERROR,
+ 					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ 					 errmsg("identifier should not be empty: \"%s\"",
+ 												qualname_str)));
+ 
+ 			astate = accumArrayResult(astate,
+ 				CStringGetTextDatum(curname), false,
+ 						    TEXTOID, CurrentMemoryContext);
+ 			missing_ident = false;
+ 		}
+ 		else
+ 		{
+ 			if (is_ident_start((unsigned char) *nextp))
+ 			{
+ 				char *downname;
+ 				int	len;
+ 				text	*part;
+ 
+ 				curname = nextp++;
+ 				while (is_ident_cont((unsigned char) *nextp))
+ 					nextp++;
+ 
+ 				len = nextp - curname;
+ 
+ 				/*
+ 				 * We don't would to truncate identifier implicitli in this
+ 				 * moment, so we use downcase_identifier instead
+ 				 * dowcase_truncate_identifier. The user can do truncating
+ 				 * later if it is necessary for his purpouse.
+ 				 */
+ 				downname = downcase_identifier(curname, len, false, false);
+ 				part = cstring_to_text_with_len(downname, len);
+ 				astate = accumArrayResult(astate,
+ 					PointerGetDatum(part), false,
+ 							    TEXTOID, CurrentMemoryContext);
+ 				missing_ident = false;
+ 			}
+ 		}
+ 
+ 		if (missing_ident)
+ 		{
+ 			/* choose text of adequate error message */
+ 			if (*nextp == '.')
+ 				ereport(ERROR,
+ 					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ 					 errmsg("missing identifier before \".\" symbol: \"%s\"",
+ 												qualname_str)));
+ 			else if (after_dot)
+ 				ereport(ERROR,
+ 					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ 					 errmsg("missing identifier after \".\" symbol: \"%s\"",
+ 												qualname_str)));
+ 			else
+ 				ereport(ERROR,
+ 					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ 					 errmsg("missing identifier: \"%s\"",
+ 												qualname_str)));
+ 		}
+ 
+ 		while (isspace((unsigned char) *nextp))
+ 			nextp++;
+ 
+ 		if (*nextp == '.')
+ 		{
+ 			after_dot = true;
+ 			nextp++;
+ 			while (isspace((unsigned char) *nextp))
+ 				nextp++;
+ 			continue;
+ 		}
+ 		else if (*nextp == '\0')
+ 		{
+ 			break;
+ 		}
+ 		else
+ 		{
+ 			if (strict_mode)
+ 				ereport(ERROR,
+ 					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ 					 errmsg("identifier contains disallowed characters: \"%s\"",
+ 												qualname_str)));
+ 			break;
+ 		}
+ 	}
+ 
+ 	PG_RETURN_DATUM(makeArrayResult(astate, CurrentMemoryContext));
+ }
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
new file mode 100644
index 1c0ef9a..94b1a22
*** a/src/include/catalog/pg_proc.h
--- b/src/include/catalog/pg_proc.h
*************** DESCR("I/O");
*** 3457,3462 ****
--- 3457,3465 ----
  DATA(insert OID = 4086 (  to_regnamespace	PGNSP PGUID 12 1 0 0 0 f f f f t f s s 1 0 4089 "25" _null_ _null_ _null_ _null_ _null_ to_regnamespace _null_ _null_ _null_ ));
  DESCR("convert namespace name to regnamespace");
  
+ DATA(insert OID = 3318 (  parse_ident		PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 1009 "25 16" _null_ _null_ "{str,strictmode}" _null_ _null_ parse_ident _null_ _null_ _null_ ));
+ DESCR("parse qualified identifier to array of identifiers");
+ 
  DATA(insert OID = 2246 ( fmgr_internal_validator PGNSP PGUID 12 1 0 0 0 f f f f t f s s 1 0 2278 "26" _null_ _null_ _null_ _null_ _null_ fmgr_internal_validator _null_ _null_ _null_ ));
  DESCR("(internal)");
  DATA(insert OID = 2247 ( fmgr_c_validator	PGNSP PGUID 12 1 0 0 0 f f f f t f s s 1 0 2278 "26" _null_ _null_ _null_ _null_ _null_ fmgr_c_validator _null_ _null_ _null_ ));
diff --git a/src/include/parser/scansup.h b/src/include/parser/scansup.h
new file mode 100644
index 4f4164b..4f95c81
*** a/src/include/parser/scansup.h
--- b/src/include/parser/scansup.h
*************** extern char *scanstr(const char *s);
*** 20,25 ****
--- 20,28 ----
  extern char *downcase_truncate_identifier(const char *ident, int len,
  							 bool warn);
  
+ extern char *downcase_identifier(const char *ident, int len,
+ 							 bool warn, bool truncate);
+ 
  extern void truncate_identifier(char *ident, int len, bool warn);
  
  extern bool scanner_isspace(char ch);
diff --git a/src/include/utils/builtins.h b/src/include/utils/builtins.h
new file mode 100644
index affcc01..b22a6b8
*** a/src/include/utils/builtins.h
--- b/src/include/utils/builtins.h
*************** extern Datum pg_typeof(PG_FUNCTION_ARGS)
*** 508,513 ****
--- 508,514 ----
  extern Datum pg_collation_for(PG_FUNCTION_ARGS);
  extern Datum pg_relation_is_updatable(PG_FUNCTION_ARGS);
  extern Datum pg_column_is_updatable(PG_FUNCTION_ARGS);
+ extern Datum parse_ident(PG_FUNCTION_ARGS);
  
  /* oid.c */
  extern Datum oidin(PG_FUNCTION_ARGS);
diff --git a/src/test/regress/expected/name.out b/src/test/regress/expected/name.out
new file mode 100644
index b359d52..66569f4
*** a/src/test/regress/expected/name.out
--- b/src/test/regress/expected/name.out
*************** SELECT '' AS two, c.f1 FROM NAME_TBL c W
*** 124,126 ****
--- 124,178 ----
  (2 rows)
  
  DROP TABLE NAME_TBL;
+ DO $$
+ DECLARE r text[];
+ BEGIN
+   r := parse_ident('Schemax.Tabley');
+   RAISE NOTICE '%', format('%I.%I', r[1], r[2]);
+   r := parse_ident('"SchemaX"."TableY"');
+   RAISE NOTICE '%', format('%I.%I', r[1], r[2]);
+ END;
+ $$;
+ NOTICE:  schemax.tabley
+ NOTICE:  "SchemaX"."TableY"
+ SELECT parse_ident('foo.boo');
+  parse_ident 
+ -------------
+  {foo,boo}
+ (1 row)
+ 
+ SELECT parse_ident('foo.boo[]'); -- should fail
+ ERROR:  identifier contains disallowed characters: "foo.boo[]"
+ SELECT parse_ident('foo.boo[]', strictmode => false); -- ok
+  parse_ident 
+ -------------
+  {foo,boo}
+ (1 row)
+ 
+ -- should fail
+ SELECT parse_ident(' ');
+ ERROR:  missing identifier: " "
+ SELECT parse_ident(' .aaa');
+ ERROR:  missing identifier before "." symbol: " .aaa"
+ SELECT parse_ident(' aaa . ');
+ ERROR:  missing identifier after "." symbol: " aaa . "
+ SELECT parse_ident('aaa.a%b');
+ ERROR:  identifier contains disallowed characters: "aaa.a%b"
+ SELECT length(a[1]), length(a[2]) from parse_ident('"xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx".yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy') as a ;
+  length | length 
+ --------+--------
+     414 |    289
+ (1 row)
+ 
+ SELECT parse_ident(' first . "  second  " ."   third   ". "  ' || repeat('x',66) || '"');
+                                                 parse_ident                                                
+ -----------------------------------------------------------------------------------------------------------
+  {first,"  second  ","   third   ","  xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"}
+ (1 row)
+ 
+ SELECT parse_ident(' first . "  second  " ."   third   ". "  ' || repeat('x',66) || '"')::name[];
+                                              parse_ident                                              
+ ------------------------------------------------------------------------------------------------------
+  {first,"  second  ","   third   ","  xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"}
+ (1 row)
+ 
diff --git a/src/test/regress/sql/name.sql b/src/test/regress/sql/name.sql
new file mode 100644
index 1c7a671..2e70e4e
*** a/src/test/regress/sql/name.sql
--- b/src/test/regress/sql/name.sql
*************** SELECT '' AS three, c.f1 FROM NAME_TBL c
*** 52,54 ****
--- 52,79 ----
  SELECT '' AS two, c.f1 FROM NAME_TBL c WHERE c.f1 ~ '.*asdf.*';
  
  DROP TABLE NAME_TBL;
+ 
+ DO $$
+ DECLARE r text[];
+ BEGIN
+   r := parse_ident('Schemax.Tabley');
+   RAISE NOTICE '%', format('%I.%I', r[1], r[2]);
+   r := parse_ident('"SchemaX"."TableY"');
+   RAISE NOTICE '%', format('%I.%I', r[1], r[2]);
+ END;
+ $$;
+ 
+ SELECT parse_ident('foo.boo');
+ SELECT parse_ident('foo.boo[]'); -- should fail
+ SELECT parse_ident('foo.boo[]', strictmode => false); -- ok
+ 
+ -- should fail
+ SELECT parse_ident(' ');
+ SELECT parse_ident(' .aaa');
+ SELECT parse_ident(' aaa . ');
+ SELECT parse_ident('aaa.a%b');
+ 
+ SELECT length(a[1]), length(a[2]) from parse_ident('"xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx".yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy') as a ;
+ 
+ SELECT parse_ident(' first . "  second  " ."   third   ". "  ' || repeat('x',66) || '"');
+ SELECT parse_ident(' first . "  second  " ."   third   ". "  ' || repeat('x',66) || '"')::name[];

#39

Pavel Stehule

pavel.stehule@gmail.com

almost 10 years ago

In reply to: Pavel Stehule (#38)

Re: proposal: function parse_ident

Hi Jim

2016-02-11 8:27 GMT+01:00 Pavel Stehule <pavel.stehule@gmail.com>

ok

Also added test for invalid characters.

I think "strict" would be more in line with other uses in code. There are
currently no other occurrences of 'strictmode' in the code. There are loads
of references to 'strict', but I didn't go through all of them to see if
any were used as externally visible function parameter names.

I am sorry, I don't understand to this point. You unlike the name of
parameter "strictmode" ? Have you any proposal? Maybe "restrictive" ?

Please, can you explain this point?

Regards

Pavel

Show quoted text

#40

Jim Nasby

Jim.Nasby@BlueTreble.com

almost 10 years ago

In reply to: Pavel Stehule (#38)

1 attachment(s)

Re: proposal: function parse_ident

On 2/11/16 1:27 AM, Pavel Stehule wrote:

I editorialized the docs and some of the comments. In particular, I
documented behavior of not truncating, and recommended casting to
name[] if user cares about that. Added a unit test to verify that
works. BTW, I saw mention in the thread about not truncated spaces,
but the function *does* truncate them, unless they're inside quotes,
where they're legitimate.

ok

I missed some of my edits. Updated patch with those in place attached.

Also added test for invalid characters.

I think "strict" would be more in line with other uses in code.
There are currently no other occurrences of 'strictmode' in the
code. There are loads of references to 'strict', but I didn't go
through all of them to see if any were used as externally visible
function parameter names.

I am sorry, I don't understand to this point. You unlike the name of
parameter "strictmode" ? Have you any proposal? Maybe "restrictive" ?

I would just call it strict. There's precedent for that in the code.

The almost all code +/- is related to human readable error messages. We
can move some code to separate static functions - read_quoted_ident,
read_unquoted_ident, but there will be some magic about parameters, and
the code will not be much better, than it is now.

What I'm saying is that most places that need to do de-quoting or
similar just run a simple while loop and use an in_quote variable to
track whether they're inside a quote or not. See
backend/utils/adt/rowtypes.c line 199 for an example.

As I said, I don't have a strong opinion on it, so if you prefer it this
way that's fine with me.
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com

Attachments:

parse_ident-9.patchtext/plain; charset=UTF-8; name=parse_ident-9.patch; x-mac-creator=0; x-mac-type=0Download

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 139aa2b..b4a2898 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -1778,6 +1778,27 @@
       <row>
        <entry>
         <indexterm>
+         <primary>parse_ident</primary>
+        </indexterm>
+        <literal><function>parse_ident(<parameter>str</parameter> <type>text</type>,
+           [ <parameter>strictmode</parameter> <type>boolean</type> DEFAULT true ] )</function></literal>
+       </entry>
+       <entry><type>text[]</type></entry>
+       <entry>Split <parameter>qualified identifier</parameter> into array <parameter>parts</parameter>.
+       When <parameter>strictmode</parameter> is false, extra characters after the identifier are ignored.
+       This is useful for parsing identifiers for objects like functions and arrays that may have trailing
+       characters. By default, extra characters after the last identifier are considered an error.
+       second parameter is false, then chars after last identifier are ignored. Note that this function
+       does not truncate quoted identifiers. If you care about that you should cast the result of this
+	   function to name[].
+       </entry>
+       <entry><literal>parse_ident('"SomeSchema".someTable')</literal></entry>
+       <entry><literal>"SomeSchema,sometable"</literal></entry>
+      </row>
+
+      <row>
+       <entry>
+        <indexterm>
          <primary>pg_client_encoding</primary>
         </indexterm>
         <literal><function>pg_client_encoding()</function></literal>
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 923fe58..61d5b80 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -965,3 +965,10 @@ RETURNS jsonb
 LANGUAGE INTERNAL
 STRICT IMMUTABLE
 AS 'jsonb_set';
+
+CREATE OR REPLACE FUNCTION
+  parse_ident(str text, strictmode boolean DEFAULT true)
+RETURNS text[]
+LANGUAGE INTERNAL
+STRICT IMMUTABLE
+AS 'parse_ident';
diff --git a/src/backend/parser/scansup.c b/src/backend/parser/scansup.c
index 2b4ab20..7aa5b76 100644
--- a/src/backend/parser/scansup.c
+++ b/src/backend/parser/scansup.c
@@ -130,6 +130,15 @@ scanstr(const char *s)
 char *
 downcase_truncate_identifier(const char *ident, int len, bool warn)
 {
+	return downcase_identifier(ident, len, warn, true);
+}
+
+/*
+ * a workhorse for downcase_truncate_identifier
+ */
+char *
+downcase_identifier(const char *ident, int len, bool warn, bool truncate)
+{
 	char	   *result;
 	int			i;
 	bool		enc_is_single_byte;
@@ -158,12 +167,13 @@ downcase_truncate_identifier(const char *ident, int len, bool warn)
 	}
 	result[i] = '\0';
 
-	if (i >= NAMEDATALEN)
+	if (i >= NAMEDATALEN && truncate)
 		truncate_identifier(result, i, warn);
 
 	return result;
 }
 
+
 /*
  * truncate_identifier() --- truncate an identifier to NAMEDATALEN-1 bytes.
  *
diff --git a/src/backend/utils/adt/misc.c b/src/backend/utils/adt/misc.c
index 6a306f3..f7d60b1 100644
--- a/src/backend/utils/adt/misc.c
+++ b/src/backend/utils/adt/misc.c
@@ -21,12 +21,15 @@
 #include <unistd.h>
 
 #include "access/sysattr.h"
+#include "access/htup_details.h"
 #include "catalog/catalog.h"
+#include "catalog/namespace.h"
 #include "catalog/pg_tablespace.h"
 #include "catalog/pg_type.h"
 #include "commands/dbcommands.h"
 #include "funcapi.h"
 #include "miscadmin.h"
+#include "parser/scansup.h"
 #include "parser/keywords.h"
 #include "postmaster/syslogger.h"
 #include "rewrite/rewriteHandler.h"
@@ -38,6 +41,7 @@
 #include "utils/ruleutils.h"
 #include "tcop/tcopprot.h"
 #include "utils/acl.h"
+#include "utils/array.h"
 #include "utils/builtins.h"
 #include "utils/timestamp.h"
 
@@ -598,3 +602,174 @@ pg_column_is_updatable(PG_FUNCTION_ARGS)
 
 	PG_RETURN_BOOL((events & REQ_EVENTS) == REQ_EVENTS);
 }
+
+
+/*
+ * This simple parser utility are compatible with lexer implementation,
+ * used only in parse_ident function
+ */
+static bool
+is_ident_start(unsigned char c)
+{
+	if (c == '_')
+		return true;
+	if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z'))
+		return true;
+
+	if (c >= 0200 && c <= 0377)
+		return true;
+
+	return false;
+}
+
+static bool
+is_ident_cont(unsigned char c)
+{
+	if (c >= '0' && c <= '9')
+		return true;
+
+	return is_ident_start(c);
+}
+
+/*
+ * parse_ident - parse SQL composed identifier to separate identifiers.
+ * When strict mode is active (second parameter), then any chars after
+ * last identifiers are disallowed.
+ */
+Datum
+parse_ident(PG_FUNCTION_ARGS)
+{
+	text		*qualname;
+	char		*qualname_str;
+	bool		strict_mode;
+	ArrayBuildState *astate = NULL;
+	char	*nextp;
+	bool		after_dot = false;
+
+	qualname = PG_GETARG_TEXT_PP(0);
+	qualname_str = text_to_cstring(qualname);
+	strict_mode = PG_GETARG_BOOL(1);
+
+	nextp = qualname_str;
+
+	/* skip leading whitespace */
+	while (isspace((unsigned char) *nextp))
+		nextp++;
+
+	for (;;)
+	{
+		char		*curname;
+		char		*endp;
+		bool		missing_ident;
+
+		missing_ident = true;
+
+		if (*nextp == '\"')
+		{
+			curname = nextp + 1;
+			for (;;)
+			{
+				endp = strchr(nextp + 1, '\"');
+				if (endp == NULL)
+					ereport(ERROR,
+						(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+						 errmsg("unclosed double quotes"),
+						 errdetail("string \"%s\" is not valid identifier",
+													qualname_str)));
+				if (endp[1] != '\"')
+					break;
+				memmove(endp, endp + 1, strlen(endp));
+				nextp = endp;
+			}
+			nextp = endp + 1;
+			*endp = '\0';
+
+			if (endp - curname == 0)
+				ereport(ERROR,
+					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+					 errmsg("identifier should not be empty: \"%s\"",
+												qualname_str)));
+
+			astate = accumArrayResult(astate,
+				CStringGetTextDatum(curname), false,
+						    TEXTOID, CurrentMemoryContext);
+			missing_ident = false;
+		}
+		else
+		{
+			if (is_ident_start((unsigned char) *nextp))
+			{
+				char *downname;
+				int	len;
+				text	*part;
+
+				curname = nextp++;
+				while (is_ident_cont((unsigned char) *nextp))
+					nextp++;
+
+				len = nextp - curname;
+
+				/*
+				 * Unlike name, we don't implicitly truncate identifiers. This
+				 * is useful for allowing the user to check for specific parts
+				 * of the identifier being too long. It's easy enough for the
+				 * user to get the truncated names by casting our output to
+				 * name[].
+				 */
+				downname = downcase_identifier(curname, len, false, false);
+				part = cstring_to_text_with_len(downname, len);
+				astate = accumArrayResult(astate,
+					PointerGetDatum(part), false,
+							    TEXTOID, CurrentMemoryContext);
+				missing_ident = false;
+			}
+		}
+
+		if (missing_ident)
+		{
+			/* Different error messages based on where we failed. */
+			if (*nextp == '.')
+				ereport(ERROR,
+					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+					 errmsg("missing identifier before \".\" symbol: \"%s\"",
+												qualname_str)));
+			else if (after_dot)
+				ereport(ERROR,
+					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+					 errmsg("missing identifier after \".\" symbol: \"%s\"",
+												qualname_str)));
+			else
+				ereport(ERROR,
+					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+					 errmsg("missing identifier: \"%s\"",
+												qualname_str)));
+		}
+
+		while (isspace((unsigned char) *nextp))
+			nextp++;
+
+		if (*nextp == '.')
+		{
+			after_dot = true;
+			nextp++;
+			while (isspace((unsigned char) *nextp))
+				nextp++;
+			continue;
+		}
+		else if (*nextp == '\0')
+		{
+			break;
+		}
+		else
+		{
+			if (strict_mode)
+				ereport(ERROR,
+					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+					 errmsg("identifier contains disallowed characters: \"%s\"",
+												qualname_str)));
+			break;
+		}
+	}
+
+	PG_RETURN_DATUM(makeArrayResult(astate, CurrentMemoryContext));
+}
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index 5ded13e..d7c09a7 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -3451,6 +3451,9 @@ DESCR("I/O");
 DATA(insert OID = 4086 (  to_regnamespace	PGNSP PGUID 12 1 0 0 0 f f f f t f s s 1 0 4089 "25" _null_ _null_ _null_ _null_ _null_ to_regnamespace _null_ _null_ _null_ ));
 DESCR("convert namespace name to regnamespace");
 
+DATA(insert OID = 3318 (  parse_ident		PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 1009 "25 16" _null_ _null_ "{str,strictmode}" _null_ _null_ parse_ident _null_ _null_ _null_ ));
+DESCR("parse qualified identifier to array of identifiers");
+
 DATA(insert OID = 2246 ( fmgr_internal_validator PGNSP PGUID 12 1 0 0 0 f f f f t f s s 1 0 2278 "26" _null_ _null_ _null_ _null_ _null_ fmgr_internal_validator _null_ _null_ _null_ ));
 DESCR("(internal)");
 DATA(insert OID = 2247 ( fmgr_c_validator	PGNSP PGUID 12 1 0 0 0 f f f f t f s s 1 0 2278 "26" _null_ _null_ _null_ _null_ _null_ fmgr_c_validator _null_ _null_ _null_ ));
diff --git a/src/include/parser/scansup.h b/src/include/parser/scansup.h
index 4f4164b..4f95c81 100644
--- a/src/include/parser/scansup.h
+++ b/src/include/parser/scansup.h
@@ -20,6 +20,9 @@ extern char *scanstr(const char *s);
 extern char *downcase_truncate_identifier(const char *ident, int len,
 							 bool warn);
 
+extern char *downcase_identifier(const char *ident, int len,
+							 bool warn, bool truncate);
+
 extern void truncate_identifier(char *ident, int len, bool warn);
 
 extern bool scanner_isspace(char ch);
diff --git a/src/include/utils/builtins.h b/src/include/utils/builtins.h
index 5e8e832..b1b6ef6 100644
--- a/src/include/utils/builtins.h
+++ b/src/include/utils/builtins.h
@@ -504,6 +504,7 @@ extern Datum pg_typeof(PG_FUNCTION_ARGS);
 extern Datum pg_collation_for(PG_FUNCTION_ARGS);
 extern Datum pg_relation_is_updatable(PG_FUNCTION_ARGS);
 extern Datum pg_column_is_updatable(PG_FUNCTION_ARGS);
+extern Datum parse_ident(PG_FUNCTION_ARGS);
 
 /* oid.c */
 extern Datum oidin(PG_FUNCTION_ARGS);
diff --git a/src/test/regress/expected/name.out b/src/test/regress/expected/name.out
index b359d52..66569f4 100644
--- a/src/test/regress/expected/name.out
+++ b/src/test/regress/expected/name.out
@@ -124,3 +124,55 @@ SELECT '' AS two, c.f1 FROM NAME_TBL c WHERE c.f1 ~ '.*asdf.*';
 (2 rows)
 
 DROP TABLE NAME_TBL;
+DO $$
+DECLARE r text[];
+BEGIN
+  r := parse_ident('Schemax.Tabley');
+  RAISE NOTICE '%', format('%I.%I', r[1], r[2]);
+  r := parse_ident('"SchemaX"."TableY"');
+  RAISE NOTICE '%', format('%I.%I', r[1], r[2]);
+END;
+$$;
+NOTICE:  schemax.tabley
+NOTICE:  "SchemaX"."TableY"
+SELECT parse_ident('foo.boo');
+ parse_ident 
+-------------
+ {foo,boo}
+(1 row)
+
+SELECT parse_ident('foo.boo[]'); -- should fail
+ERROR:  identifier contains disallowed characters: "foo.boo[]"
+SELECT parse_ident('foo.boo[]', strictmode => false); -- ok
+ parse_ident 
+-------------
+ {foo,boo}
+(1 row)
+
+-- should fail
+SELECT parse_ident(' ');
+ERROR:  missing identifier: " "
+SELECT parse_ident(' .aaa');
+ERROR:  missing identifier before "." symbol: " .aaa"
+SELECT parse_ident(' aaa . ');
+ERROR:  missing identifier after "." symbol: " aaa . "
+SELECT parse_ident('aaa.a%b');
+ERROR:  identifier contains disallowed characters: "aaa.a%b"
+SELECT length(a[1]), length(a[2]) from parse_ident('"xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx".yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy') as a ;
+ length | length 
+--------+--------
+    414 |    289
+(1 row)
+
+SELECT parse_ident(' first . "  second  " ."   third   ". "  ' || repeat('x',66) || '"');
+                                                parse_ident                                                
+-----------------------------------------------------------------------------------------------------------
+ {first,"  second  ","   third   ","  xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"}
+(1 row)
+
+SELECT parse_ident(' first . "  second  " ."   third   ". "  ' || repeat('x',66) || '"')::name[];
+                                             parse_ident                                              
+------------------------------------------------------------------------------------------------------
+ {first,"  second  ","   third   ","  xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"}
+(1 row)
+
diff --git a/src/test/regress/sql/name.sql b/src/test/regress/sql/name.sql
index 1c7a671..2e70e4e 100644
--- a/src/test/regress/sql/name.sql
+++ b/src/test/regress/sql/name.sql
@@ -52,3 +52,28 @@ SELECT '' AS three, c.f1 FROM NAME_TBL c WHERE c.f1 ~ '[0-9]';
 SELECT '' AS two, c.f1 FROM NAME_TBL c WHERE c.f1 ~ '.*asdf.*';
 
 DROP TABLE NAME_TBL;
+
+DO $$
+DECLARE r text[];
+BEGIN
+  r := parse_ident('Schemax.Tabley');
+  RAISE NOTICE '%', format('%I.%I', r[1], r[2]);
+  r := parse_ident('"SchemaX"."TableY"');
+  RAISE NOTICE '%', format('%I.%I', r[1], r[2]);
+END;
+$$;
+
+SELECT parse_ident('foo.boo');
+SELECT parse_ident('foo.boo[]'); -- should fail
+SELECT parse_ident('foo.boo[]', strictmode => false); -- ok
+
+-- should fail
+SELECT parse_ident(' ');
+SELECT parse_ident(' .aaa');
+SELECT parse_ident(' aaa . ');
+SELECT parse_ident('aaa.a%b');
+
+SELECT length(a[1]), length(a[2]) from parse_ident('"xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx".yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy') as a ;
+
+SELECT parse_ident(' first . "  second  " ."   third   ". "  ' || repeat('x',66) || '"');
+SELECT parse_ident(' first . "  second  " ."   third   ". "  ' || repeat('x',66) || '"')::name[];

#41

Pavel Stehule

pavel.stehule@gmail.com

almost 10 years ago

In reply to: Jim Nasby (#40)

1 attachment(s)

Re: proposal: function parse_ident

2016-02-17 1:38 GMT+01:00 Jim Nasby <Jim.Nasby@bluetreble.com>:

On 2/11/16 1:27 AM, Pavel Stehule wrote:

I editorialized the docs and some of the comments. In particular, I
documented behavior of not truncating, and recommended casting to
name[] if user cares about that. Added a unit test to verify that
works. BTW, I saw mention in the thread about not truncated spaces,
but the function *does* truncate them, unless they're inside quotes,
where they're legitimate.

ok

I missed some of my edits. Updated patch with those in place attached.

Also added test for invalid characters.

I think "strict" would be more in line with other uses in code.
There are currently no other occurrences of 'strictmode' in the
code. There are loads of references to 'strict', but I didn't go
through all of them to see if any were used as externally visible
function parameter names.

I am sorry, I don't understand to this point. You unlike the name of
parameter "strictmode" ? Have you any proposal? Maybe "restrictive" ?

I would just call it strict. There's precedent for that in the code.

fixed in attached patch

The almost all code +/- is related to human readable error messages. We

can move some code to separate static functions - read_quoted_ident,
read_unquoted_ident, but there will be some magic about parameters, and
the code will not be much better, than it is now.

What I'm saying is that most places that need to do de-quoting or similar
just run a simple while loop and use an in_quote variable to track whether
they're inside a quote or not. See backend/utils/adt/rowtypes.c line 199
for an example.

As I said, I don't have a strong opinion on it, so if you prefer it this
way that's fine with me.

yes, I don't see string differences between for(;;) and break and
while(var). I prefer current state.

Regards

Pavel

Show quoted text

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com

Attachments:

parse_ident-10.patchtext/x-patch; charset=US-ASCII; name=parse_ident-10.patchDownload

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
new file mode 100644
index f9eea76..9eed19a
*** a/doc/src/sgml/func.sgml
--- b/doc/src/sgml/func.sgml
***************
*** 1821,1826 ****
--- 1821,1847 ----
        <row>
         <entry>
          <indexterm>
+          <primary>parse_ident</primary>
+         </indexterm>
+         <literal><function>parse_ident(<parameter>str</parameter> <type>text</type>,
+            [ <parameter>strictmode</parameter> <type>boolean</type> DEFAULT true ] )</function></literal>
+        </entry>
+        <entry><type>text[]</type></entry>
+        <entry>Split <parameter>qualified identifier</parameter> into array <parameter>parts</parameter>.
+        When <parameter>strictmode</parameter> is false, extra characters after the identifier are ignored.
+        This is useful for parsing identifiers for objects like functions and arrays that may have trailing
+        characters. By default, extra characters after the last identifier are considered an error.
+        second parameter is false, then chars after last identifier are ignored. Note that this function
+        does not truncate quoted identifiers. If you care about that you should cast the result of this
+ 	   function to name[].
+        </entry>
+        <entry><literal>parse_ident('"SomeSchema".someTable')</literal></entry>
+        <entry><literal>"SomeSchema,sometable"</literal></entry>
+       </row>
+ 
+       <row>
+        <entry>
+         <indexterm>
           <primary>pg_client_encoding</primary>
          </indexterm>
          <literal><function>pg_client_encoding()</function></literal>
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
new file mode 100644
index 923fe58..9a65bc9
*** a/src/backend/catalog/system_views.sql
--- b/src/backend/catalog/system_views.sql
*************** RETURNS jsonb
*** 965,967 ****
--- 965,974 ----
  LANGUAGE INTERNAL
  STRICT IMMUTABLE
  AS 'jsonb_set';
+ 
+ CREATE OR REPLACE FUNCTION
+   parse_ident(str text, strict boolean DEFAULT true)
+ RETURNS text[]
+ LANGUAGE INTERNAL
+ STRICT IMMUTABLE
+ AS 'parse_ident';
diff --git a/src/backend/parser/scansup.c b/src/backend/parser/scansup.c
new file mode 100644
index 2b4ab20..7aa5b76
*** a/src/backend/parser/scansup.c
--- b/src/backend/parser/scansup.c
*************** scanstr(const char *s)
*** 130,135 ****
--- 130,144 ----
  char *
  downcase_truncate_identifier(const char *ident, int len, bool warn)
  {
+ 	return downcase_identifier(ident, len, warn, true);
+ }
+ 
+ /*
+  * a workhorse for downcase_truncate_identifier
+  */
+ char *
+ downcase_identifier(const char *ident, int len, bool warn, bool truncate)
+ {
  	char	   *result;
  	int			i;
  	bool		enc_is_single_byte;
*************** downcase_truncate_identifier(const char
*** 158,169 ****
  	}
  	result[i] = '\0';
  
! 	if (i >= NAMEDATALEN)
  		truncate_identifier(result, i, warn);
  
  	return result;
  }
  
  /*
   * truncate_identifier() --- truncate an identifier to NAMEDATALEN-1 bytes.
   *
--- 167,179 ----
  	}
  	result[i] = '\0';
  
! 	if (i >= NAMEDATALEN && truncate)
  		truncate_identifier(result, i, warn);
  
  	return result;
  }
  
+ 
  /*
   * truncate_identifier() --- truncate an identifier to NAMEDATALEN-1 bytes.
   *
diff --git a/src/backend/utils/adt/misc.c b/src/backend/utils/adt/misc.c
new file mode 100644
index 43f36db..3a8d9f5
*** a/src/backend/utils/adt/misc.c
--- b/src/backend/utils/adt/misc.c
***************
*** 21,32 ****
--- 21,35 ----
  #include <unistd.h>
  
  #include "access/sysattr.h"
+ #include "access/htup_details.h"
  #include "catalog/catalog.h"
+ #include "catalog/namespace.h"
  #include "catalog/pg_tablespace.h"
  #include "catalog/pg_type.h"
  #include "commands/dbcommands.h"
  #include "funcapi.h"
  #include "miscadmin.h"
+ #include "parser/scansup.h"
  #include "parser/keywords.h"
  #include "postmaster/syslogger.h"
  #include "rewrite/rewriteHandler.h"
***************
*** 38,43 ****
--- 41,47 ----
  #include "utils/ruleutils.h"
  #include "tcop/tcopprot.h"
  #include "utils/acl.h"
+ #include "utils/array.h"
  #include "utils/builtins.h"
  #include "utils/timestamp.h"
  
*************** pg_column_is_updatable(PG_FUNCTION_ARGS)
*** 719,721 ****
--- 723,896 ----
  
  	PG_RETURN_BOOL((events & REQ_EVENTS) == REQ_EVENTS);
  }
+ 
+ 
+ /*
+  * This simple parser utility are compatible with lexer implementation,
+  * used only in parse_ident function
+  */
+ static bool
+ is_ident_start(unsigned char c)
+ {
+ 	if (c == '_')
+ 		return true;
+ 	if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z'))
+ 		return true;
+ 
+ 	if (c >= 0200 && c <= 0377)
+ 		return true;
+ 
+ 	return false;
+ }
+ 
+ static bool
+ is_ident_cont(unsigned char c)
+ {
+ 	if (c >= '0' && c <= '9')
+ 		return true;
+ 
+ 	return is_ident_start(c);
+ }
+ 
+ /*
+  * parse_ident - parse SQL composed identifier to separate identifiers.
+  * When strict mode is active (second parameter), then any chars after
+  * last identifiers are disallowed.
+  */
+ Datum
+ parse_ident(PG_FUNCTION_ARGS)
+ {
+ 	text		*qualname;
+ 	char		*qualname_str;
+ 	bool		strict;
+ 	ArrayBuildState *astate = NULL;
+ 	char	*nextp;
+ 	bool		after_dot = false;
+ 
+ 	qualname = PG_GETARG_TEXT_PP(0);
+ 	qualname_str = text_to_cstring(qualname);
+ 	strict = PG_GETARG_BOOL(1);
+ 
+ 	nextp = qualname_str;
+ 
+ 	/* skip leading whitespace */
+ 	while (isspace((unsigned char) *nextp))
+ 		nextp++;
+ 
+ 	for (;;)
+ 	{
+ 		char		*curname;
+ 		char		*endp;
+ 		bool		missing_ident;
+ 
+ 		missing_ident = true;
+ 
+ 		if (*nextp == '\"')
+ 		{
+ 			curname = nextp + 1;
+ 			for (;;)
+ 			{
+ 				endp = strchr(nextp + 1, '\"');
+ 				if (endp == NULL)
+ 					ereport(ERROR,
+ 						(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ 						 errmsg("unclosed double quotes"),
+ 						 errdetail("string \"%s\" is not valid identifier",
+ 													qualname_str)));
+ 				if (endp[1] != '\"')
+ 					break;
+ 				memmove(endp, endp + 1, strlen(endp));
+ 				nextp = endp;
+ 			}
+ 			nextp = endp + 1;
+ 			*endp = '\0';
+ 
+ 			if (endp - curname == 0)
+ 				ereport(ERROR,
+ 					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ 					 errmsg("identifier should not be empty: \"%s\"",
+ 												qualname_str)));
+ 
+ 			astate = accumArrayResult(astate,
+ 				CStringGetTextDatum(curname), false,
+ 						    TEXTOID, CurrentMemoryContext);
+ 			missing_ident = false;
+ 		}
+ 		else
+ 		{
+ 			if (is_ident_start((unsigned char) *nextp))
+ 			{
+ 				char *downname;
+ 				int	len;
+ 				text	*part;
+ 
+ 				curname = nextp++;
+ 				while (is_ident_cont((unsigned char) *nextp))
+ 					nextp++;
+ 
+ 				len = nextp - curname;
+ 
+ 				/*
+ 				 * Unlike name, we don't implicitly truncate identifiers. This
+ 				 * is useful for allowing the user to check for specific parts
+ 				 * of the identifier being too long. It's easy enough for the
+ 				 * user to get the truncated names by casting our output to
+ 				 * name[].
+ 				 */
+ 				downname = downcase_identifier(curname, len, false, false);
+ 				part = cstring_to_text_with_len(downname, len);
+ 				astate = accumArrayResult(astate,
+ 					PointerGetDatum(part), false,
+ 							    TEXTOID, CurrentMemoryContext);
+ 				missing_ident = false;
+ 			}
+ 		}
+ 
+ 		if (missing_ident)
+ 		{
+ 			/* Different error messages based on where we failed. */
+ 			if (*nextp == '.')
+ 				ereport(ERROR,
+ 					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ 					 errmsg("missing identifier before \".\" symbol: \"%s\"",
+ 												qualname_str)));
+ 			else if (after_dot)
+ 				ereport(ERROR,
+ 					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ 					 errmsg("missing identifier after \".\" symbol: \"%s\"",
+ 												qualname_str)));
+ 			else
+ 				ereport(ERROR,
+ 					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ 					 errmsg("missing identifier: \"%s\"",
+ 												qualname_str)));
+ 		}
+ 
+ 		while (isspace((unsigned char) *nextp))
+ 			nextp++;
+ 
+ 		if (*nextp == '.')
+ 		{
+ 			after_dot = true;
+ 			nextp++;
+ 			while (isspace((unsigned char) *nextp))
+ 				nextp++;
+ 			continue;
+ 		}
+ 		else if (*nextp == '\0')
+ 		{
+ 			break;
+ 		}
+ 		else
+ 		{
+ 			if (strict)
+ 				ereport(ERROR,
+ 					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ 					 errmsg("identifier contains disallowed characters: \"%s\"",
+ 												qualname_str)));
+ 			break;
+ 		}
+ 	}
+ 
+ 	PG_RETURN_DATUM(makeArrayResult(astate, CurrentMemoryContext));
+ }
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
new file mode 100644
index b24e434..8354a82
*** a/src/include/catalog/pg_proc.h
--- b/src/include/catalog/pg_proc.h
*************** DESCR("I/O");
*** 3457,3462 ****
--- 3457,3465 ----
  DATA(insert OID = 4086 (  to_regnamespace	PGNSP PGUID 12 1 0 0 0 f f f f t f s s 1 0 4089 "25" _null_ _null_ _null_ _null_ _null_ to_regnamespace _null_ _null_ _null_ ));
  DESCR("convert namespace name to regnamespace");
  
+ DATA(insert OID = 3318 (  parse_ident		PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 1009 "25 16" _null_ _null_ "{str,strict}" _null_ _null_ parse_ident _null_ _null_ _null_ ));
+ DESCR("parse qualified identifier to array of identifiers");
+ 
  DATA(insert OID = 2246 ( fmgr_internal_validator PGNSP PGUID 12 1 0 0 0 f f f f t f s s 1 0 2278 "26" _null_ _null_ _null_ _null_ _null_ fmgr_internal_validator _null_ _null_ _null_ ));
  DESCR("(internal)");
  DATA(insert OID = 2247 ( fmgr_c_validator	PGNSP PGUID 12 1 0 0 0 f f f f t f s s 1 0 2278 "26" _null_ _null_ _null_ _null_ _null_ fmgr_c_validator _null_ _null_ _null_ ));
diff --git a/src/include/parser/scansup.h b/src/include/parser/scansup.h
new file mode 100644
index 4f4164b..4f95c81
*** a/src/include/parser/scansup.h
--- b/src/include/parser/scansup.h
*************** extern char *scanstr(const char *s);
*** 20,25 ****
--- 20,28 ----
  extern char *downcase_truncate_identifier(const char *ident, int len,
  							 bool warn);
  
+ extern char *downcase_identifier(const char *ident, int len,
+ 							 bool warn, bool truncate);
+ 
  extern void truncate_identifier(char *ident, int len, bool warn);
  
  extern bool scanner_isspace(char ch);
diff --git a/src/include/utils/builtins.h b/src/include/utils/builtins.h
new file mode 100644
index affcc01..b22a6b8
*** a/src/include/utils/builtins.h
--- b/src/include/utils/builtins.h
*************** extern Datum pg_typeof(PG_FUNCTION_ARGS)
*** 508,513 ****
--- 508,514 ----
  extern Datum pg_collation_for(PG_FUNCTION_ARGS);
  extern Datum pg_relation_is_updatable(PG_FUNCTION_ARGS);
  extern Datum pg_column_is_updatable(PG_FUNCTION_ARGS);
+ extern Datum parse_ident(PG_FUNCTION_ARGS);
  
  /* oid.c */
  extern Datum oidin(PG_FUNCTION_ARGS);
diff --git a/src/test/regress/expected/name.out b/src/test/regress/expected/name.out
new file mode 100644
index b359d52..2a8876f
*** a/src/test/regress/expected/name.out
--- b/src/test/regress/expected/name.out
*************** SELECT '' AS two, c.f1 FROM NAME_TBL c W
*** 124,126 ****
--- 124,178 ----
  (2 rows)
  
  DROP TABLE NAME_TBL;
+ DO $$
+ DECLARE r text[];
+ BEGIN
+   r := parse_ident('Schemax.Tabley');
+   RAISE NOTICE '%', format('%I.%I', r[1], r[2]);
+   r := parse_ident('"SchemaX"."TableY"');
+   RAISE NOTICE '%', format('%I.%I', r[1], r[2]);
+ END;
+ $$;
+ NOTICE:  schemax.tabley
+ NOTICE:  "SchemaX"."TableY"
+ SELECT parse_ident('foo.boo');
+  parse_ident 
+ -------------
+  {foo,boo}
+ (1 row)
+ 
+ SELECT parse_ident('foo.boo[]'); -- should fail
+ ERROR:  identifier contains disallowed characters: "foo.boo[]"
+ SELECT parse_ident('foo.boo[]', strict => false); -- ok
+  parse_ident 
+ -------------
+  {foo,boo}
+ (1 row)
+ 
+ -- should fail
+ SELECT parse_ident(' ');
+ ERROR:  missing identifier: " "
+ SELECT parse_ident(' .aaa');
+ ERROR:  missing identifier before "." symbol: " .aaa"
+ SELECT parse_ident(' aaa . ');
+ ERROR:  missing identifier after "." symbol: " aaa . "
+ SELECT parse_ident('aaa.a%b');
+ ERROR:  identifier contains disallowed characters: "aaa.a%b"
+ SELECT length(a[1]), length(a[2]) from parse_ident('"xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx".yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy') as a ;
+  length | length 
+ --------+--------
+     414 |    289
+ (1 row)
+ 
+ SELECT parse_ident(' first . "  second  " ."   third   ". "  ' || repeat('x',66) || '"');
+                                                 parse_ident                                                
+ -----------------------------------------------------------------------------------------------------------
+  {first,"  second  ","   third   ","  xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"}
+ (1 row)
+ 
+ SELECT parse_ident(' first . "  second  " ."   third   ". "  ' || repeat('x',66) || '"')::name[];
+                                              parse_ident                                              
+ ------------------------------------------------------------------------------------------------------
+  {first,"  second  ","   third   ","  xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"}
+ (1 row)
+ 
diff --git a/src/test/regress/sql/name.sql b/src/test/regress/sql/name.sql
new file mode 100644
index 1c7a671..90b7ea0
*** a/src/test/regress/sql/name.sql
--- b/src/test/regress/sql/name.sql
*************** SELECT '' AS three, c.f1 FROM NAME_TBL c
*** 52,54 ****
--- 52,79 ----
  SELECT '' AS two, c.f1 FROM NAME_TBL c WHERE c.f1 ~ '.*asdf.*';
  
  DROP TABLE NAME_TBL;
+ 
+ DO $$
+ DECLARE r text[];
+ BEGIN
+   r := parse_ident('Schemax.Tabley');
+   RAISE NOTICE '%', format('%I.%I', r[1], r[2]);
+   r := parse_ident('"SchemaX"."TableY"');
+   RAISE NOTICE '%', format('%I.%I', r[1], r[2]);
+ END;
+ $$;
+ 
+ SELECT parse_ident('foo.boo');
+ SELECT parse_ident('foo.boo[]'); -- should fail
+ SELECT parse_ident('foo.boo[]', strict => false); -- ok
+ 
+ -- should fail
+ SELECT parse_ident(' ');
+ SELECT parse_ident(' .aaa');
+ SELECT parse_ident(' aaa . ');
+ SELECT parse_ident('aaa.a%b');
+ 
+ SELECT length(a[1]), length(a[2]) from parse_ident('"xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx".yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy') as a ;
+ 
+ SELECT parse_ident(' first . "  second  " ."   third   ". "  ' || repeat('x',66) || '"');
+ SELECT parse_ident(' first . "  second  " ."   third   ". "  ' || repeat('x',66) || '"')::name[];

#42

Teodor Sigaev

teodor@sigaev.ru

almost 10 years ago

In reply to: Pavel Stehule (#41)

Re: proposal: function parse_ident

I missed some of my edits. Updated patch with those in place attached.

I'm sorry, but special chararacter still isn't escaped correctly in error messages:

% select parse_ident(E'X\rXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX');
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX""X
Time: 0,510 ms

--
Teodor Sigaev E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#43

Pavel Stehule

pavel.stehule@gmail.com

almost 10 years ago

In reply to: Teodor Sigaev (#42)

Re: proposal: function parse_ident

2016-02-17 14:02 GMT+01:00 Teodor Sigaev <teodor@sigaev.ru>:

I missed some of my edits. Updated patch with those in place attached.

I'm sorry, but special chararacter still isn't escaped correctly in error
messages:

% select
parse_ident(E'X\rXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX');
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX""X
Time: 0,510 ms

:(, .. I'll fix it today or tomorrow, when I'll have free time

Pavel

Show quoted text

--
Teodor Sigaev E-mail: teodor@sigaev.ru
WWW:
http://www.sigaev.ru/

#44

Pavel Stehule

pavel.stehule@gmail.com

almost 10 years ago

In reply to: Pavel Stehule (#43)

1 attachment(s)

Re: proposal: function parse_ident

2016-02-18 4:59 GMT+01:00 Pavel Stehule <pavel.stehule@gmail.com>:

select

parse_ident(E'X\rXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX');

I am sending updated patch - I used json function for correct escaping -
the escaping behave is same.

Regards

Pavel

Attachments:

parse_ident-11.patchtext/x-patch; charset=US-ASCII; name=parse_ident-11.patchDownload

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
new file mode 100644
index f9eea76..c1c113a
*** a/doc/src/sgml/func.sgml
--- b/doc/src/sgml/func.sgml
***************
*** 1821,1826 ****
--- 1821,1847 ----
        <row>
         <entry>
          <indexterm>
+          <primary>parse_ident</primary>
+         </indexterm>
+         <literal><function>parse_ident(<parameter>str</parameter> <type>text</type>,
+            [ <parameter>strictmode</parameter> <type>boolean</type> DEFAULT true ] )</function></literal>
+        </entry>
+        <entry><type>text[]</type></entry>
+        <entry>Split <parameter>qualified identifier</parameter> into array <parameter>parts</parameter>.
+        When <parameter>strictmode</parameter> is false, extra characters after the identifier are ignored.
+        This is useful for parsing identifiers for objects like functions and arrays that may have trailing
+        characters. By default, extra characters after the last identifier are considered an error.
+        second parameter is false, then chars after last identifier are ignored. Note that this function
+        does not truncate quoted identifiers. If you care about that you should cast the result of this
+        function to name[].
+        </entry>
+        <entry><literal>parse_ident('"SomeSchema".someTable')</literal></entry>
+        <entry><literal>"SomeSchema,sometable"</literal></entry>
+       </row>
+ 
+       <row>
+        <entry>
+         <indexterm>
           <primary>pg_client_encoding</primary>
          </indexterm>
          <literal><function>pg_client_encoding()</function></literal>
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
new file mode 100644
index abf9a70..38af138
*** a/src/backend/catalog/system_views.sql
--- b/src/backend/catalog/system_views.sql
*************** RETURNS jsonb
*** 971,973 ****
--- 971,980 ----
  LANGUAGE INTERNAL
  STRICT IMMUTABLE
  AS 'jsonb_set';
+ 
+ CREATE OR REPLACE FUNCTION
+   parse_ident(str text, strict boolean DEFAULT true)
+ RETURNS text[]
+ LANGUAGE INTERNAL
+ STRICT IMMUTABLE
+ AS 'parse_ident';
diff --git a/src/backend/parser/scansup.c b/src/backend/parser/scansup.c
new file mode 100644
index 2b4ab20..7aa5b76
*** a/src/backend/parser/scansup.c
--- b/src/backend/parser/scansup.c
*************** scanstr(const char *s)
*** 130,135 ****
--- 130,144 ----
  char *
  downcase_truncate_identifier(const char *ident, int len, bool warn)
  {
+ 	return downcase_identifier(ident, len, warn, true);
+ }
+ 
+ /*
+  * a workhorse for downcase_truncate_identifier
+  */
+ char *
+ downcase_identifier(const char *ident, int len, bool warn, bool truncate)
+ {
  	char	   *result;
  	int			i;
  	bool		enc_is_single_byte;
*************** downcase_truncate_identifier(const char
*** 158,169 ****
  	}
  	result[i] = '\0';
  
! 	if (i >= NAMEDATALEN)
  		truncate_identifier(result, i, warn);
  
  	return result;
  }
  
  /*
   * truncate_identifier() --- truncate an identifier to NAMEDATALEN-1 bytes.
   *
--- 167,179 ----
  	}
  	result[i] = '\0';
  
! 	if (i >= NAMEDATALEN && truncate)
  		truncate_identifier(result, i, warn);
  
  	return result;
  }
  
+ 
  /*
   * truncate_identifier() --- truncate an identifier to NAMEDATALEN-1 bytes.
   *
diff --git a/src/backend/utils/adt/misc.c b/src/backend/utils/adt/misc.c
new file mode 100644
index 43f36db..d11581a
*** a/src/backend/utils/adt/misc.c
--- b/src/backend/utils/adt/misc.c
***************
*** 21,32 ****
--- 21,35 ----
  #include <unistd.h>
  
  #include "access/sysattr.h"
+ #include "access/htup_details.h"
  #include "catalog/catalog.h"
+ #include "catalog/namespace.h"
  #include "catalog/pg_tablespace.h"
  #include "catalog/pg_type.h"
  #include "commands/dbcommands.h"
  #include "funcapi.h"
  #include "miscadmin.h"
+ #include "parser/scansup.h"
  #include "parser/keywords.h"
  #include "postmaster/syslogger.h"
  #include "rewrite/rewriteHandler.h"
***************
*** 38,44 ****
--- 41,49 ----
  #include "utils/ruleutils.h"
  #include "tcop/tcopprot.h"
  #include "utils/acl.h"
+ #include "utils/array.h"
  #include "utils/builtins.h"
+ #include "utils/json.h"
  #include "utils/timestamp.h"
  
  #define atooid(x)  ((Oid) strtoul((x), NULL, 10))
*************** pg_column_is_updatable(PG_FUNCTION_ARGS)
*** 719,721 ****
--- 724,912 ----
  
  	PG_RETURN_BOOL((events & REQ_EVENTS) == REQ_EVENTS);
  }
+ 
+ 
+ /*
+  * This simple parser utility are compatible with lexer implementation,
+  * used only in parse_ident function
+  */
+ static bool
+ is_ident_start(unsigned char c)
+ {
+ 	if (c == '_')
+ 		return true;
+ 	if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z'))
+ 		return true;
+ 
+ 	if (c >= 0200 && c <= 0377)
+ 		return true;
+ 
+ 	return false;
+ }
+ 
+ static bool
+ is_ident_cont(unsigned char c)
+ {
+ 	if (c >= '0' && c <= '9')
+ 		return true;
+ 
+ 	return is_ident_start(c);
+ }
+ 
+ /*
+  * Sanitize enhanced string for using in error message.
+  */
+ static char *
+ sanitize_str(const char *str)
+ {
+ 	StringInfoData	ds;
+ 
+ 	/* we share same escaping rules with json escaping function */
+ 	initStringInfo(&ds);
+ 	escape_json(&ds, str);
+ 
+ 	return ds.data;
+ }
+ 
+ /*
+  * parse_ident - parse SQL composed identifier to separate identifiers.
+  * When strict mode is active (second parameter), then any chars after
+  * last identifiers are disallowed.
+  */
+ Datum
+ parse_ident(PG_FUNCTION_ARGS)
+ {
+ 	text		*qualname;
+ 	char		*qualname_str;
+ 	bool		strict;
+ 	ArrayBuildState *astate = NULL;
+ 	char	*nextp;
+ 	bool		after_dot = false;
+ 
+ 	qualname = PG_GETARG_TEXT_PP(0);
+ 	qualname_str = text_to_cstring(qualname);
+ 	strict = PG_GETARG_BOOL(1);
+ 
+ 	nextp = qualname_str;
+ 
+ 	/* skip leading whitespace */
+ 	while (isspace((unsigned char) *nextp))
+ 		nextp++;
+ 
+ 	for (;;)
+ 	{
+ 		char		*curname;
+ 		char		*endp;
+ 		bool		missing_ident;
+ 
+ 		missing_ident = true;
+ 
+ 		if (*nextp == '\"')
+ 		{
+ 			curname = nextp + 1;
+ 			for (;;)
+ 			{
+ 				endp = strchr(nextp + 1, '\"');
+ 				if (endp == NULL)
+ 					ereport(ERROR,
+ 						(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ 						 errmsg("unclosed double quotes"),
+ 						 errdetail("string %s is not valid identifier",
+ 													sanitize_str(qualname_str))));
+ 				if (endp[1] != '\"')
+ 					break;
+ 				memmove(endp, endp + 1, strlen(endp));
+ 				nextp = endp;
+ 			}
+ 			nextp = endp + 1;
+ 			*endp = '\0';
+ 
+ 			if (endp - curname == 0)
+ 				ereport(ERROR,
+ 					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ 					 errmsg("identifier should not be empty: %s",
+ 												sanitize_str(qualname_str))));
+ 
+ 			astate = accumArrayResult(astate,
+ 				CStringGetTextDatum(curname), false,
+ 						    TEXTOID, CurrentMemoryContext);
+ 			missing_ident = false;
+ 		}
+ 		else
+ 		{
+ 			if (is_ident_start((unsigned char) *nextp))
+ 			{
+ 				char *downname;
+ 				int	len;
+ 				text	*part;
+ 
+ 				curname = nextp++;
+ 				while (is_ident_cont((unsigned char) *nextp))
+ 					nextp++;
+ 
+ 				len = nextp - curname;
+ 
+ 				/*
+ 				 * Unlike name, we don't implicitly truncate identifiers. This
+ 				 * is useful for allowing the user to check for specific parts
+ 				 * of the identifier being too long. It's easy enough for the
+ 				 * user to get the truncated names by casting our output to
+ 				 * name[].
+ 				 */
+ 				downname = downcase_identifier(curname, len, false, false);
+ 				part = cstring_to_text_with_len(downname, len);
+ 				astate = accumArrayResult(astate,
+ 					PointerGetDatum(part), false,
+ 							    TEXTOID, CurrentMemoryContext);
+ 				missing_ident = false;
+ 			}
+ 		}
+ 
+ 		if (missing_ident)
+ 		{
+ 			/* Different error messages based on where we failed. */
+ 			if (*nextp == '.')
+ 				ereport(ERROR,
+ 					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ 					 errmsg("missing identifier before \".\" symbol: %s",
+ 												sanitize_str(qualname_str))));
+ 			else if (after_dot)
+ 				ereport(ERROR,
+ 					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ 					 errmsg("missing identifier after \".\" symbol: %s",
+ 												sanitize_str(qualname_str))));
+ 			else
+ 				ereport(ERROR,
+ 					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ 					 errmsg("missing identifier: %s",
+ 												sanitize_str(qualname_str))));
+ 		}
+ 
+ 		while (isspace((unsigned char) *nextp))
+ 			nextp++;
+ 
+ 		if (*nextp == '.')
+ 		{
+ 			after_dot = true;
+ 			nextp++;
+ 			while (isspace((unsigned char) *nextp))
+ 				nextp++;
+ 			continue;
+ 		}
+ 		else if (*nextp == '\0')
+ 		{
+ 			break;
+ 		}
+ 		else
+ 		{
+ 			if (strict)
+ 				ereport(ERROR,
+ 					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ 					 errmsg("identifier contains disallowed characters: %s",
+ 												sanitize_str(qualname_str))));
+ 			break;
+ 		}
+ 	}
+ 
+ 	PG_RETURN_DATUM(makeArrayResult(astate, CurrentMemoryContext));
+ }
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
new file mode 100644
index 2222e8f..8eef46c
*** a/src/include/catalog/pg_proc.h
--- b/src/include/catalog/pg_proc.h
*************** DESCR("I/O");
*** 3457,3462 ****
--- 3457,3465 ----
  DATA(insert OID = 4086 (  to_regnamespace	PGNSP PGUID 12 1 0 0 0 f f f f t f s s 1 0 4089 "25" _null_ _null_ _null_ _null_ _null_ to_regnamespace _null_ _null_ _null_ ));
  DESCR("convert namespace name to regnamespace");
  
+ DATA(insert OID = 3318 (  parse_ident		PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 1009 "25 16" _null_ _null_ "{str,strict}" _null_ _null_ parse_ident _null_ _null_ _null_ ));
+ DESCR("parse qualified identifier to array of identifiers");
+ 
  DATA(insert OID = 2246 ( fmgr_internal_validator PGNSP PGUID 12 1 0 0 0 f f f f t f s s 1 0 2278 "26" _null_ _null_ _null_ _null_ _null_ fmgr_internal_validator _null_ _null_ _null_ ));
  DESCR("(internal)");
  DATA(insert OID = 2247 ( fmgr_c_validator	PGNSP PGUID 12 1 0 0 0 f f f f t f s s 1 0 2278 "26" _null_ _null_ _null_ _null_ _null_ fmgr_c_validator _null_ _null_ _null_ ));
diff --git a/src/include/parser/scansup.h b/src/include/parser/scansup.h
new file mode 100644
index 4f4164b..4f95c81
*** a/src/include/parser/scansup.h
--- b/src/include/parser/scansup.h
*************** extern char *scanstr(const char *s);
*** 20,25 ****
--- 20,28 ----
  extern char *downcase_truncate_identifier(const char *ident, int len,
  							 bool warn);
  
+ extern char *downcase_identifier(const char *ident, int len,
+ 							 bool warn, bool truncate);
+ 
  extern void truncate_identifier(char *ident, int len, bool warn);
  
  extern bool scanner_isspace(char ch);
diff --git a/src/include/utils/builtins.h b/src/include/utils/builtins.h
new file mode 100644
index a784de9..9643a6a
*** a/src/include/utils/builtins.h
--- b/src/include/utils/builtins.h
*************** extern Datum pg_typeof(PG_FUNCTION_ARGS)
*** 508,513 ****
--- 508,514 ----
  extern Datum pg_collation_for(PG_FUNCTION_ARGS);
  extern Datum pg_relation_is_updatable(PG_FUNCTION_ARGS);
  extern Datum pg_column_is_updatable(PG_FUNCTION_ARGS);
+ extern Datum parse_ident(PG_FUNCTION_ARGS);
  
  /* oid.c */
  extern Datum oidin(PG_FUNCTION_ARGS);
diff --git a/src/test/regress/expected/name.out b/src/test/regress/expected/name.out
new file mode 100644
index b359d52..a6bc88a
*** a/src/test/regress/expected/name.out
--- b/src/test/regress/expected/name.out
*************** SELECT '' AS two, c.f1 FROM NAME_TBL c W
*** 124,126 ****
--- 124,180 ----
  (2 rows)
  
  DROP TABLE NAME_TBL;
+ DO $$
+ DECLARE r text[];
+ BEGIN
+   r := parse_ident('Schemax.Tabley');
+   RAISE NOTICE '%', format('%I.%I', r[1], r[2]);
+   r := parse_ident('"SchemaX"."TableY"');
+   RAISE NOTICE '%', format('%I.%I', r[1], r[2]);
+ END;
+ $$;
+ NOTICE:  schemax.tabley
+ NOTICE:  "SchemaX"."TableY"
+ SELECT parse_ident('foo.boo');
+  parse_ident 
+ -------------
+  {foo,boo}
+ (1 row)
+ 
+ SELECT parse_ident('foo.boo[]'); -- should fail
+ ERROR:  identifier contains disallowed characters: "foo.boo[]"
+ SELECT parse_ident('foo.boo[]', strict => false); -- ok
+  parse_ident 
+ -------------
+  {foo,boo}
+ (1 row)
+ 
+ -- should fail
+ SELECT parse_ident(' ');
+ ERROR:  missing identifier: " "
+ SELECT parse_ident(' .aaa');
+ ERROR:  missing identifier before "." symbol: " .aaa"
+ SELECT parse_ident(' aaa . ');
+ ERROR:  missing identifier after "." symbol: " aaa . "
+ SELECT parse_ident('aaa.a%b');
+ ERROR:  identifier contains disallowed characters: "aaa.a%b"
+ SELECT parse_ident(E'X\rXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX');
+ ERROR:  identifier contains disallowed characters: "X\rXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
+ SELECT length(a[1]), length(a[2]) from parse_ident('"xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx".yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy') as a ;
+  length | length 
+ --------+--------
+     414 |    289
+ (1 row)
+ 
+ SELECT parse_ident(' first . "  second  " ."   third   ". "  ' || repeat('x',66) || '"');
+                                                 parse_ident                                                
+ -----------------------------------------------------------------------------------------------------------
+  {first,"  second  ","   third   ","  xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"}
+ (1 row)
+ 
+ SELECT parse_ident(' first . "  second  " ."   third   ". "  ' || repeat('x',66) || '"')::name[];
+                                              parse_ident                                              
+ ------------------------------------------------------------------------------------------------------
+  {first,"  second  ","   third   ","  xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"}
+ (1 row)
+ 
diff --git a/src/test/regress/sql/name.sql b/src/test/regress/sql/name.sql
new file mode 100644
index 1c7a671..3741d1b
*** a/src/test/regress/sql/name.sql
--- b/src/test/regress/sql/name.sql
*************** SELECT '' AS three, c.f1 FROM NAME_TBL c
*** 52,54 ****
--- 52,81 ----
  SELECT '' AS two, c.f1 FROM NAME_TBL c WHERE c.f1 ~ '.*asdf.*';
  
  DROP TABLE NAME_TBL;
+ 
+ DO $$
+ DECLARE r text[];
+ BEGIN
+   r := parse_ident('Schemax.Tabley');
+   RAISE NOTICE '%', format('%I.%I', r[1], r[2]);
+   r := parse_ident('"SchemaX"."TableY"');
+   RAISE NOTICE '%', format('%I.%I', r[1], r[2]);
+ END;
+ $$;
+ 
+ SELECT parse_ident('foo.boo');
+ SELECT parse_ident('foo.boo[]'); -- should fail
+ SELECT parse_ident('foo.boo[]', strict => false); -- ok
+ 
+ -- should fail
+ SELECT parse_ident(' ');
+ SELECT parse_ident(' .aaa');
+ SELECT parse_ident(' aaa . ');
+ SELECT parse_ident('aaa.a%b');
+ SELECT parse_ident(E'X\rXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX');
+ 
+ SELECT length(a[1]), length(a[2]) from parse_ident('"xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx".yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy') as a ;
+ 
+ SELECT parse_ident(' first . "  second  " ."   third   ". "  ' || repeat('x',66) || '"');
+ SELECT parse_ident(' first . "  second  " ."   third   ". "  ' || repeat('x',66) || '"')::name[];
+

#45

Teodor Sigaev

teodor@sigaev.ru

almost 10 years ago

In reply to: Pavel Stehule (#44)

Re: proposal: function parse_ident

select
parse_ident(E'X\rXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX');

I am sending updated patch - I used json function for correct escaping - the
escaping behave is same.

Hmm, it doesn't look so:
% select parse_ident(E'_\005');
ERROR: identifier contains disallowed characters: "_\u0005"
% select parse_ident(E'\005');
ERROR: missing identifier: "\u0005"

but

# select parse_ident(E'"\005"');
parse_ident
-------------
{\x05}

Error messages above point wrong character wrongly.

One more inconsistence:
# select parse_ident(E'"\005"') as "\005";
\005
--------
{\x05}

Display outputs of actual identifier and parse_indent are differ.

Actually, I can live with both but are any other opinions? Seems, at least
difference of actual identifier and output of parse_indent should be pointed in
docs.

--
Teodor Sigaev E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#46

Pavel Stehule

pavel.stehule@gmail.com

almost 10 years ago

In reply to: Teodor Sigaev (#45)

1 attachment(s)

Re: proposal: function parse_ident

2016-03-10 15:34 GMT+01:00 Teodor Sigaev <teodor@sigaev.ru>:

select

parse_ident(E'X\rXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX');

I am sending updated patch - I used json function for correct escaping -
the
escaping behave is same.

Hmm, it doesn't look so:
% select parse_ident(E'_\005');
ERROR: identifier contains disallowed characters: "_\u0005"
% select parse_ident(E'\005');
ERROR: missing identifier: "\u0005"

but

# select parse_ident(E'"\005"');
parse_ident
-------------
{\x05}

Error messages above point wrong character wrongly.

One more inconsistence:
# select parse_ident(E'"\005"') as "\005";
\005
--------
{\x05}

Display outputs of actual identifier and parse_indent are differ.

Actually, I can live with both but are any other opinions? Seems, at least
difference of actual identifier and output of parse_indent should be
pointed in docs.

I afraid so I cannot to fix this inconsistency (if this is inconsistency -
the binary values are same) - the parameter of function is raw string with
processed escape codes, and I have not any information about original
escape sequences. When you enter octet value, and I show it as hex value,
then there should be difference. Buy I have not information about your
input (octet or hex). I have the original string of SQL identifier inside
parser, executor, but I have not original string of function parameter
inside function (not without pretty complex and long code).

I am trying describe it in doc (I am sorry for my less level English) in
new patch. Fixed duplicated oid too.

Regards

Pavel

Show quoted text

--
Teodor Sigaev E-mail: teodor@sigaev.ru
WWW:
http://www.sigaev.ru/

Attachments:

parse_ident-12.patchtext/x-patch; charset=US-ASCII; name=parse_ident-12.patchDownload

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
new file mode 100644
index 4b5ee81..05b3cf9
*** a/doc/src/sgml/func.sgml
--- b/doc/src/sgml/func.sgml
***************
*** 1821,1826 ****
--- 1821,1849 ----
        <row>
         <entry>
          <indexterm>
+          <primary>parse_ident</primary>
+         </indexterm>
+         <literal><function>parse_ident(<parameter>str</parameter> <type>text</type>,
+            [ <parameter>strictmode</parameter> <type>boolean</type> DEFAULT true ] )</function></literal>
+        </entry>
+        <entry><type>text[]</type></entry>
+        <entry>Split <parameter>qualified identifier</parameter> into array <parameter>parts</parameter>.
+        When <parameter>strictmode</parameter> is false, extra characters after the identifier are ignored.
+        This is useful for parsing identifiers for objects like functions and arrays that may have trailing
+        characters. By default, extra characters after the last identifier are considered an error.
+        second parameter is false, then chars after last identifier are ignored. Note that this function
+        does not truncate quoted identifiers. If you care about that you should cast the result of this
+        function to name[]. A non printable chars (like 0 to 31) are displayed as hexadecimal codes always,
+        what can be different from PostgreSQL internal SQL identifiers processing, when the oroginal
+        escaped value is displayed.
+        </entry>
+        <entry><literal>parse_ident('"SomeSchema".someTable')</literal></entry>
+        <entry><literal>"SomeSchema,sometable"</literal></entry>
+       </row>
+ 
+       <row>
+        <entry>
+         <indexterm>
           <primary>pg_client_encoding</primary>
          </indexterm>
          <literal><function>pg_client_encoding()</function></literal>
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
new file mode 100644
index abf9a70..38af138
*** a/src/backend/catalog/system_views.sql
--- b/src/backend/catalog/system_views.sql
*************** RETURNS jsonb
*** 971,973 ****
--- 971,980 ----
  LANGUAGE INTERNAL
  STRICT IMMUTABLE
  AS 'jsonb_set';
+ 
+ CREATE OR REPLACE FUNCTION
+   parse_ident(str text, strict boolean DEFAULT true)
+ RETURNS text[]
+ LANGUAGE INTERNAL
+ STRICT IMMUTABLE
+ AS 'parse_ident';
diff --git a/src/backend/parser/scansup.c b/src/backend/parser/scansup.c
new file mode 100644
index 2b4ab20..7aa5b76
*** a/src/backend/parser/scansup.c
--- b/src/backend/parser/scansup.c
*************** scanstr(const char *s)
*** 130,135 ****
--- 130,144 ----
  char *
  downcase_truncate_identifier(const char *ident, int len, bool warn)
  {
+ 	return downcase_identifier(ident, len, warn, true);
+ }
+ 
+ /*
+  * a workhorse for downcase_truncate_identifier
+  */
+ char *
+ downcase_identifier(const char *ident, int len, bool warn, bool truncate)
+ {
  	char	   *result;
  	int			i;
  	bool		enc_is_single_byte;
*************** downcase_truncate_identifier(const char
*** 158,169 ****
  	}
  	result[i] = '\0';
  
! 	if (i >= NAMEDATALEN)
  		truncate_identifier(result, i, warn);
  
  	return result;
  }
  
  /*
   * truncate_identifier() --- truncate an identifier to NAMEDATALEN-1 bytes.
   *
--- 167,179 ----
  	}
  	result[i] = '\0';
  
! 	if (i >= NAMEDATALEN && truncate)
  		truncate_identifier(result, i, warn);
  
  	return result;
  }
  
+ 
  /*
   * truncate_identifier() --- truncate an identifier to NAMEDATALEN-1 bytes.
   *
diff --git a/src/backend/utils/adt/misc.c b/src/backend/utils/adt/misc.c
new file mode 100644
index 43f36db..d11581a
*** a/src/backend/utils/adt/misc.c
--- b/src/backend/utils/adt/misc.c
***************
*** 21,32 ****
--- 21,35 ----
  #include <unistd.h>
  
  #include "access/sysattr.h"
+ #include "access/htup_details.h"
  #include "catalog/catalog.h"
+ #include "catalog/namespace.h"
  #include "catalog/pg_tablespace.h"
  #include "catalog/pg_type.h"
  #include "commands/dbcommands.h"
  #include "funcapi.h"
  #include "miscadmin.h"
+ #include "parser/scansup.h"
  #include "parser/keywords.h"
  #include "postmaster/syslogger.h"
  #include "rewrite/rewriteHandler.h"
***************
*** 38,44 ****
--- 41,49 ----
  #include "utils/ruleutils.h"
  #include "tcop/tcopprot.h"
  #include "utils/acl.h"
+ #include "utils/array.h"
  #include "utils/builtins.h"
+ #include "utils/json.h"
  #include "utils/timestamp.h"
  
  #define atooid(x)  ((Oid) strtoul((x), NULL, 10))
*************** pg_column_is_updatable(PG_FUNCTION_ARGS)
*** 719,721 ****
--- 724,912 ----
  
  	PG_RETURN_BOOL((events & REQ_EVENTS) == REQ_EVENTS);
  }
+ 
+ 
+ /*
+  * This simple parser utility are compatible with lexer implementation,
+  * used only in parse_ident function
+  */
+ static bool
+ is_ident_start(unsigned char c)
+ {
+ 	if (c == '_')
+ 		return true;
+ 	if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z'))
+ 		return true;
+ 
+ 	if (c >= 0200 && c <= 0377)
+ 		return true;
+ 
+ 	return false;
+ }
+ 
+ static bool
+ is_ident_cont(unsigned char c)
+ {
+ 	if (c >= '0' && c <= '9')
+ 		return true;
+ 
+ 	return is_ident_start(c);
+ }
+ 
+ /*
+  * Sanitize enhanced string for using in error message.
+  */
+ static char *
+ sanitize_str(const char *str)
+ {
+ 	StringInfoData	ds;
+ 
+ 	/* we share same escaping rules with json escaping function */
+ 	initStringInfo(&ds);
+ 	escape_json(&ds, str);
+ 
+ 	return ds.data;
+ }
+ 
+ /*
+  * parse_ident - parse SQL composed identifier to separate identifiers.
+  * When strict mode is active (second parameter), then any chars after
+  * last identifiers are disallowed.
+  */
+ Datum
+ parse_ident(PG_FUNCTION_ARGS)
+ {
+ 	text		*qualname;
+ 	char		*qualname_str;
+ 	bool		strict;
+ 	ArrayBuildState *astate = NULL;
+ 	char	*nextp;
+ 	bool		after_dot = false;
+ 
+ 	qualname = PG_GETARG_TEXT_PP(0);
+ 	qualname_str = text_to_cstring(qualname);
+ 	strict = PG_GETARG_BOOL(1);
+ 
+ 	nextp = qualname_str;
+ 
+ 	/* skip leading whitespace */
+ 	while (isspace((unsigned char) *nextp))
+ 		nextp++;
+ 
+ 	for (;;)
+ 	{
+ 		char		*curname;
+ 		char		*endp;
+ 		bool		missing_ident;
+ 
+ 		missing_ident = true;
+ 
+ 		if (*nextp == '\"')
+ 		{
+ 			curname = nextp + 1;
+ 			for (;;)
+ 			{
+ 				endp = strchr(nextp + 1, '\"');
+ 				if (endp == NULL)
+ 					ereport(ERROR,
+ 						(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ 						 errmsg("unclosed double quotes"),
+ 						 errdetail("string %s is not valid identifier",
+ 													sanitize_str(qualname_str))));
+ 				if (endp[1] != '\"')
+ 					break;
+ 				memmove(endp, endp + 1, strlen(endp));
+ 				nextp = endp;
+ 			}
+ 			nextp = endp + 1;
+ 			*endp = '\0';
+ 
+ 			if (endp - curname == 0)
+ 				ereport(ERROR,
+ 					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ 					 errmsg("identifier should not be empty: %s",
+ 												sanitize_str(qualname_str))));
+ 
+ 			astate = accumArrayResult(astate,
+ 				CStringGetTextDatum(curname), false,
+ 						    TEXTOID, CurrentMemoryContext);
+ 			missing_ident = false;
+ 		}
+ 		else
+ 		{
+ 			if (is_ident_start((unsigned char) *nextp))
+ 			{
+ 				char *downname;
+ 				int	len;
+ 				text	*part;
+ 
+ 				curname = nextp++;
+ 				while (is_ident_cont((unsigned char) *nextp))
+ 					nextp++;
+ 
+ 				len = nextp - curname;
+ 
+ 				/*
+ 				 * Unlike name, we don't implicitly truncate identifiers. This
+ 				 * is useful for allowing the user to check for specific parts
+ 				 * of the identifier being too long. It's easy enough for the
+ 				 * user to get the truncated names by casting our output to
+ 				 * name[].
+ 				 */
+ 				downname = downcase_identifier(curname, len, false, false);
+ 				part = cstring_to_text_with_len(downname, len);
+ 				astate = accumArrayResult(astate,
+ 					PointerGetDatum(part), false,
+ 							    TEXTOID, CurrentMemoryContext);
+ 				missing_ident = false;
+ 			}
+ 		}
+ 
+ 		if (missing_ident)
+ 		{
+ 			/* Different error messages based on where we failed. */
+ 			if (*nextp == '.')
+ 				ereport(ERROR,
+ 					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ 					 errmsg("missing identifier before \".\" symbol: %s",
+ 												sanitize_str(qualname_str))));
+ 			else if (after_dot)
+ 				ereport(ERROR,
+ 					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ 					 errmsg("missing identifier after \".\" symbol: %s",
+ 												sanitize_str(qualname_str))));
+ 			else
+ 				ereport(ERROR,
+ 					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ 					 errmsg("missing identifier: %s",
+ 												sanitize_str(qualname_str))));
+ 		}
+ 
+ 		while (isspace((unsigned char) *nextp))
+ 			nextp++;
+ 
+ 		if (*nextp == '.')
+ 		{
+ 			after_dot = true;
+ 			nextp++;
+ 			while (isspace((unsigned char) *nextp))
+ 				nextp++;
+ 			continue;
+ 		}
+ 		else if (*nextp == '\0')
+ 		{
+ 			break;
+ 		}
+ 		else
+ 		{
+ 			if (strict)
+ 				ereport(ERROR,
+ 					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ 					 errmsg("identifier contains disallowed characters: %s",
+ 												sanitize_str(qualname_str))));
+ 			break;
+ 		}
+ 	}
+ 
+ 	PG_RETURN_DATUM(makeArrayResult(astate, CurrentMemoryContext));
+ }
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
new file mode 100644
index a0f821a..c56b496
*** a/src/include/catalog/pg_proc.h
--- b/src/include/catalog/pg_proc.h
*************** DESCR("I/O");
*** 3461,3466 ****
--- 3461,3469 ----
  DATA(insert OID = 4086 (  to_regnamespace	PGNSP PGUID 12 1 0 0 0 f f f f t f s s 1 0 4089 "25" _null_ _null_ _null_ _null_ _null_ to_regnamespace _null_ _null_ _null_ ));
  DESCR("convert namespace name to regnamespace");
  
+ DATA(insert OID = 3319 (  parse_ident		PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 1009 "25 16" _null_ _null_ "{str,strict}" _null_ _null_ parse_ident _null_ _null_ _null_ ));
+ DESCR("parse qualified identifier to array of identifiers");
+ 
  DATA(insert OID = 2246 ( fmgr_internal_validator PGNSP PGUID 12 1 0 0 0 f f f f t f s s 1 0 2278 "26" _null_ _null_ _null_ _null_ _null_ fmgr_internal_validator _null_ _null_ _null_ ));
  DESCR("(internal)");
  DATA(insert OID = 2247 ( fmgr_c_validator	PGNSP PGUID 12 1 0 0 0 f f f f t f s s 1 0 2278 "26" _null_ _null_ _null_ _null_ _null_ fmgr_c_validator _null_ _null_ _null_ ));
diff --git a/src/include/parser/scansup.h b/src/include/parser/scansup.h
new file mode 100644
index 4f4164b..4f95c81
*** a/src/include/parser/scansup.h
--- b/src/include/parser/scansup.h
*************** extern char *scanstr(const char *s);
*** 20,25 ****
--- 20,28 ----
  extern char *downcase_truncate_identifier(const char *ident, int len,
  							 bool warn);
  
+ extern char *downcase_identifier(const char *ident, int len,
+ 							 bool warn, bool truncate);
+ 
  extern void truncate_identifier(char *ident, int len, bool warn);
  
  extern bool scanner_isspace(char ch);
diff --git a/src/include/utils/builtins.h b/src/include/utils/builtins.h
new file mode 100644
index 115f8af..5237e27
*** a/src/include/utils/builtins.h
--- b/src/include/utils/builtins.h
*************** extern Datum pg_typeof(PG_FUNCTION_ARGS)
*** 509,514 ****
--- 509,515 ----
  extern Datum pg_collation_for(PG_FUNCTION_ARGS);
  extern Datum pg_relation_is_updatable(PG_FUNCTION_ARGS);
  extern Datum pg_column_is_updatable(PG_FUNCTION_ARGS);
+ extern Datum parse_ident(PG_FUNCTION_ARGS);
  
  /* oid.c */
  extern Datum oidin(PG_FUNCTION_ARGS);
diff --git a/src/test/regress/expected/name.out b/src/test/regress/expected/name.out
new file mode 100644
index b359d52..a6bc88a
*** a/src/test/regress/expected/name.out
--- b/src/test/regress/expected/name.out
*************** SELECT '' AS two, c.f1 FROM NAME_TBL c W
*** 124,126 ****
--- 124,180 ----
  (2 rows)
  
  DROP TABLE NAME_TBL;
+ DO $$
+ DECLARE r text[];
+ BEGIN
+   r := parse_ident('Schemax.Tabley');
+   RAISE NOTICE '%', format('%I.%I', r[1], r[2]);
+   r := parse_ident('"SchemaX"."TableY"');
+   RAISE NOTICE '%', format('%I.%I', r[1], r[2]);
+ END;
+ $$;
+ NOTICE:  schemax.tabley
+ NOTICE:  "SchemaX"."TableY"
+ SELECT parse_ident('foo.boo');
+  parse_ident 
+ -------------
+  {foo,boo}
+ (1 row)
+ 
+ SELECT parse_ident('foo.boo[]'); -- should fail
+ ERROR:  identifier contains disallowed characters: "foo.boo[]"
+ SELECT parse_ident('foo.boo[]', strict => false); -- ok
+  parse_ident 
+ -------------
+  {foo,boo}
+ (1 row)
+ 
+ -- should fail
+ SELECT parse_ident(' ');
+ ERROR:  missing identifier: " "
+ SELECT parse_ident(' .aaa');
+ ERROR:  missing identifier before "." symbol: " .aaa"
+ SELECT parse_ident(' aaa . ');
+ ERROR:  missing identifier after "." symbol: " aaa . "
+ SELECT parse_ident('aaa.a%b');
+ ERROR:  identifier contains disallowed characters: "aaa.a%b"
+ SELECT parse_ident(E'X\rXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX');
+ ERROR:  identifier contains disallowed characters: "X\rXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
+ SELECT length(a[1]), length(a[2]) from parse_ident('"xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx".yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy') as a ;
+  length | length 
+ --------+--------
+     414 |    289
+ (1 row)
+ 
+ SELECT parse_ident(' first . "  second  " ."   third   ". "  ' || repeat('x',66) || '"');
+                                                 parse_ident                                                
+ -----------------------------------------------------------------------------------------------------------
+  {first,"  second  ","   third   ","  xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"}
+ (1 row)
+ 
+ SELECT parse_ident(' first . "  second  " ."   third   ". "  ' || repeat('x',66) || '"')::name[];
+                                              parse_ident                                              
+ ------------------------------------------------------------------------------------------------------
+  {first,"  second  ","   third   ","  xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"}
+ (1 row)
+ 
diff --git a/src/test/regress/sql/name.sql b/src/test/regress/sql/name.sql
new file mode 100644
index 1c7a671..3741d1b
*** a/src/test/regress/sql/name.sql
--- b/src/test/regress/sql/name.sql
*************** SELECT '' AS three, c.f1 FROM NAME_TBL c
*** 52,54 ****
--- 52,81 ----
  SELECT '' AS two, c.f1 FROM NAME_TBL c WHERE c.f1 ~ '.*asdf.*';
  
  DROP TABLE NAME_TBL;
+ 
+ DO $$
+ DECLARE r text[];
+ BEGIN
+   r := parse_ident('Schemax.Tabley');
+   RAISE NOTICE '%', format('%I.%I', r[1], r[2]);
+   r := parse_ident('"SchemaX"."TableY"');
+   RAISE NOTICE '%', format('%I.%I', r[1], r[2]);
+ END;
+ $$;
+ 
+ SELECT parse_ident('foo.boo');
+ SELECT parse_ident('foo.boo[]'); -- should fail
+ SELECT parse_ident('foo.boo[]', strict => false); -- ok
+ 
+ -- should fail
+ SELECT parse_ident(' ');
+ SELECT parse_ident(' .aaa');
+ SELECT parse_ident(' aaa . ');
+ SELECT parse_ident('aaa.a%b');
+ SELECT parse_ident(E'X\rXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX');
+ 
+ SELECT length(a[1]), length(a[2]) from parse_ident('"xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx".yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy') as a ;
+ 
+ SELECT parse_ident(' first . "  second  " ."   third   ". "  ' || repeat('x',66) || '"');
+ SELECT parse_ident(' first . "  second  " ."   third   ". "  ' || repeat('x',66) || '"')::name[];
+

#47

Teodor Sigaev

teodor@sigaev.ru

almost 10 years ago

In reply to: Pavel Stehule (#46)

1 attachment(s)

Re: proposal: function parse_ident

I afraid so I cannot to fix this inconsistency (if this is inconsistency - the
binary values are same) - the parameter of function is raw string with processed
escape codes, and I have not any information about original escape sequences.
When you enter octet value, and I show it as hex value, then there should be
difference. Buy I have not information about your input (octet or hex). I have
the original string of SQL identifier inside parser, executor, but I have not
original string of function parameter inside function (not without pretty
complex and long code).

Ok, agree

I am trying describe it in doc (I am sorry for my less level English) in new
patch. Fixed duplicated oid too.

Edited a bit + fix some typos and remove unneeded headers, patch attached

Sorry, I can't find all corner-cases at once, but:
SELECT parse_ident(E'"c".X XXXXXXXXXX');
ERROR: identifier contains disallowed characters: "\"c"

Error message wrongly points to the reason of error.

--
Teodor Sigaev E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/

Attachments:

parse_ident-13.patchbinary/octet-stream; name=parse_ident-13.patchDownload

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 000489d..918356c 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -1821,6 +1821,32 @@
       <row>
        <entry>
         <indexterm>
+         <primary>parse_ident</primary>
+        </indexterm>
+        <literal><function>parse_ident(<parameter>str</parameter> <type>text</type>,
+           [ <parameter>strictmode</parameter> <type>boolean</type> DEFAULT true ] )</function></literal>
+       </entry>
+       <entry><type>text[]</type></entry>
+       <entry>Split <parameter>qualified identifier</parameter> into array
+       <parameter>parts</parameter>. When <parameter>strictmode</parameter> is
+       false, extra characters after the identifier are ignored. This is useful
+       for parsing identifiers for objects like functions and arrays that may
+       have trailing characters. By default, extra characters after the last
+       identifier are considered an error, but if second parameter is false,
+       then chararacters after last identifier are ignored. Note that this
+       function does not truncate quoted identifiers. If you care about that
+       you should cast the result of this function to name[]. A non-printable
+       chararacters (like 0 to 31) are displayed as hexadecimal codes always,
+       what can be different from PostgreSQL internal SQL identifiers
+       processing, when the original escaped value is displayed.
+       </entry>
+       <entry><literal>parse_ident('"SomeSchema".someTable')</literal></entry>
+       <entry><literal>"SomeSchema,sometable"</literal></entry>
+      </row>
+
+      <row>
+       <entry>
+        <indexterm>
          <primary>pg_client_encoding</primary>
         </indexterm>
         <literal><function>pg_client_encoding()</function></literal>
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 84aa061..cb6fbb2 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -972,3 +972,10 @@ RETURNS jsonb
 LANGUAGE INTERNAL
 STRICT IMMUTABLE
 AS 'jsonb_set';
+
+CREATE OR REPLACE FUNCTION
+  parse_ident(str text, strict boolean DEFAULT true)
+RETURNS text[]
+LANGUAGE INTERNAL
+STRICT IMMUTABLE
+AS 'parse_ident';
diff --git a/src/backend/parser/scansup.c b/src/backend/parser/scansup.c
index 2b4ab20..7aa5b76 100644
--- a/src/backend/parser/scansup.c
+++ b/src/backend/parser/scansup.c
@@ -130,6 +130,15 @@ scanstr(const char *s)
 char *
 downcase_truncate_identifier(const char *ident, int len, bool warn)
 {
+	return downcase_identifier(ident, len, warn, true);
+}
+
+/*
+ * a workhorse for downcase_truncate_identifier
+ */
+char *
+downcase_identifier(const char *ident, int len, bool warn, bool truncate)
+{
 	char	   *result;
 	int			i;
 	bool		enc_is_single_byte;
@@ -158,12 +167,13 @@ downcase_truncate_identifier(const char *ident, int len, bool warn)
 	}
 	result[i] = '\0';
 
-	if (i >= NAMEDATALEN)
+	if (i >= NAMEDATALEN && truncate)
 		truncate_identifier(result, i, warn);
 
 	return result;
 }
 
+
 /*
  * truncate_identifier() --- truncate an identifier to NAMEDATALEN-1 bytes.
  *
diff --git a/src/backend/utils/adt/misc.c b/src/backend/utils/adt/misc.c
index 43f36db..fab1331 100644
--- a/src/backend/utils/adt/misc.c
+++ b/src/backend/utils/adt/misc.c
@@ -27,6 +27,7 @@
 #include "commands/dbcommands.h"
 #include "funcapi.h"
 #include "miscadmin.h"
+#include "parser/scansup.h"
 #include "parser/keywords.h"
 #include "postmaster/syslogger.h"
 #include "rewrite/rewriteHandler.h"
@@ -39,6 +40,7 @@
 #include "tcop/tcopprot.h"
 #include "utils/acl.h"
 #include "utils/builtins.h"
+#include "utils/json.h"
 #include "utils/timestamp.h"
 
 #define atooid(x)  ((Oid) strtoul((x), NULL, 10))
@@ -719,3 +721,187 @@ pg_column_is_updatable(PG_FUNCTION_ARGS)
 
 	PG_RETURN_BOOL((events & REQ_EVENTS) == REQ_EVENTS);
 }
+
+
+/*
+ * This simple parser utility are compatible with lexer implementation,
+ * used only in parse_ident function
+ */
+static bool
+is_ident_start(unsigned char c)
+{
+	if (c == '_')
+		return true;
+	if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z'))
+		return true;
+
+	if (c >= 0200 && c <= 0377)
+		return true;
+
+	return false;
+}
+
+static bool
+is_ident_cont(unsigned char c)
+{
+	if (c >= '0' && c <= '9')
+		return true;
+
+	return is_ident_start(c);
+}
+
+/*
+ * Sanitize enhanced string for using in error message.
+ */
+static char *
+sanitize_str(const char *str)
+{
+	StringInfoData	ds;
+
+	/* we share same escaping rules with json escaping function */
+	initStringInfo(&ds);
+	escape_json(&ds, str);
+
+	return ds.data;
+}
+
+/*
+ * parse_ident - parse SQL composed identifier to separate identifiers.
+ * When strict mode is active (second parameter), then any chars after
+ * last identifiers are disallowed.
+ */
+Datum
+parse_ident(PG_FUNCTION_ARGS)
+{
+	text		*qualname;
+	char		*qualname_str;
+	bool		strict;
+	ArrayBuildState *astate = NULL;
+	char	*nextp;
+	bool		after_dot = false;
+
+	qualname = PG_GETARG_TEXT_PP(0);
+	qualname_str = text_to_cstring(qualname);
+	strict = PG_GETARG_BOOL(1);
+
+	nextp = qualname_str;
+
+	/* skip leading whitespace */
+	while (isspace((unsigned char) *nextp))
+		nextp++;
+
+	for (;;)
+	{
+		char		*curname;
+		char		*endp;
+		bool		missing_ident;
+
+		missing_ident = true;
+
+		if (*nextp == '\"')
+		{
+			curname = nextp + 1;
+			for (;;)
+			{
+				endp = strchr(nextp + 1, '\"');
+				if (endp == NULL)
+					ereport(ERROR,
+						(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+						 errmsg("unclosed double quotes"),
+						 errdetail("string %s is not valid identifier",
+									sanitize_str(qualname_str))));
+				if (endp[1] != '\"')
+					break;
+				memmove(endp, endp + 1, strlen(endp));
+				nextp = endp;
+			}
+			nextp = endp + 1;
+			*endp = '\0';
+
+			if (endp - curname == 0)
+				ereport(ERROR,
+					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+					 errmsg("identifier should not be empty: %s",
+							sanitize_str(qualname_str))));
+
+			astate = accumArrayResult(astate, CStringGetTextDatum(curname),
+									  false, TEXTOID, CurrentMemoryContext);
+			missing_ident = false;
+		}
+		else
+		{
+			if (is_ident_start((unsigned char) *nextp))
+			{
+				char *downname;
+				int	len;
+				text	*part;
+
+				curname = nextp++;
+				while (is_ident_cont((unsigned char) *nextp))
+					nextp++;
+
+				len = nextp - curname;
+
+				/*
+				 * Unlike name, we don't implicitly truncate identifiers. This
+				 * is useful for allowing the user to check for specific parts
+				 * of the identifier being too long. It's easy enough for the
+				 * user to get the truncated names by casting our output to
+				 * name[].
+				 */
+				downname = downcase_identifier(curname, len, false, false);
+				part = cstring_to_text_with_len(downname, len);
+				astate = accumArrayResult(astate, PointerGetDatum(part), false,
+										  TEXTOID, CurrentMemoryContext);
+				missing_ident = false;
+			}
+		}
+
+		if (missing_ident)
+		{
+			/* Different error messages based on where we failed. */
+			if (*nextp == '.')
+				ereport(ERROR,
+					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+					 errmsg("missing identifier before \".\" symbol: %s",
+							sanitize_str(qualname_str))));
+			else if (after_dot)
+				ereport(ERROR,
+					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+					 errmsg("missing identifier after \".\" symbol: %s",
+							sanitize_str(qualname_str))));
+			else
+				ereport(ERROR,
+					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+					 errmsg("missing identifier: %s",
+							sanitize_str(qualname_str))));
+		}
+
+		while (isspace((unsigned char) *nextp))
+			nextp++;
+
+		if (*nextp == '.')
+		{
+			after_dot = true;
+			nextp++;
+			while (isspace((unsigned char) *nextp))
+				nextp++;
+			continue;
+		}
+		else if (*nextp == '\0')
+		{
+			break;
+		}
+		else
+		{
+			if (strict)
+				ereport(ERROR,
+					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+					 errmsg("identifier contains disallowed characters: %s",
+							sanitize_str(qualname_str))));
+			break;
+		}
+	}
+
+	PG_RETURN_DATUM(makeArrayResult(astate, CurrentMemoryContext));
+}
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index 5c71bce..31ffa71 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -3463,6 +3463,9 @@ DESCR("I/O");
 DATA(insert OID = 4086 (  to_regnamespace	PGNSP PGUID 12 1 0 0 0 f f f f t f s s 1 0 4089 "25" _null_ _null_ _null_ _null_ _null_ to_regnamespace _null_ _null_ _null_ ));
 DESCR("convert namespace name to regnamespace");
 
+DATA(insert OID = 1268 (  parse_ident		PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 1009 "25 16" _null_ _null_ "{str,strict}" _null_ _null_ parse_ident _null_ _null_ _null_ ));
+DESCR("parse qualified identifier to array of identifiers");
+
 DATA(insert OID = 2246 ( fmgr_internal_validator PGNSP PGUID 12 1 0 0 0 f f f f t f s s 1 0 2278 "26" _null_ _null_ _null_ _null_ _null_ fmgr_internal_validator _null_ _null_ _null_ ));
 DESCR("(internal)");
 DATA(insert OID = 2247 ( fmgr_c_validator	PGNSP PGUID 12 1 0 0 0 f f f f t f s s 1 0 2278 "26" _null_ _null_ _null_ _null_ _null_ fmgr_c_validator _null_ _null_ _null_ ));
diff --git a/src/include/parser/scansup.h b/src/include/parser/scansup.h
index 4f4164b..4f95c81 100644
--- a/src/include/parser/scansup.h
+++ b/src/include/parser/scansup.h
@@ -20,6 +20,9 @@ extern char *scanstr(const char *s);
 extern char *downcase_truncate_identifier(const char *ident, int len,
 							 bool warn);
 
+extern char *downcase_identifier(const char *ident, int len,
+							 bool warn, bool truncate);
+
 extern void truncate_identifier(char *ident, int len, bool warn);
 
 extern bool scanner_isspace(char ch);
diff --git a/src/include/utils/builtins.h b/src/include/utils/builtins.h
index 59a00bb..206288d 100644
--- a/src/include/utils/builtins.h
+++ b/src/include/utils/builtins.h
@@ -510,6 +510,7 @@ extern Datum pg_typeof(PG_FUNCTION_ARGS);
 extern Datum pg_collation_for(PG_FUNCTION_ARGS);
 extern Datum pg_relation_is_updatable(PG_FUNCTION_ARGS);
 extern Datum pg_column_is_updatable(PG_FUNCTION_ARGS);
+extern Datum parse_ident(PG_FUNCTION_ARGS);
 
 /* oid.c */
 extern Datum oidin(PG_FUNCTION_ARGS);
diff --git a/src/test/regress/expected/name.out b/src/test/regress/expected/name.out
index b359d52..a6bc88a 100644
--- a/src/test/regress/expected/name.out
+++ b/src/test/regress/expected/name.out
@@ -124,3 +124,57 @@ SELECT '' AS two, c.f1 FROM NAME_TBL c WHERE c.f1 ~ '.*asdf.*';
 (2 rows)
 
 DROP TABLE NAME_TBL;
+DO $$
+DECLARE r text[];
+BEGIN
+  r := parse_ident('Schemax.Tabley');
+  RAISE NOTICE '%', format('%I.%I', r[1], r[2]);
+  r := parse_ident('"SchemaX"."TableY"');
+  RAISE NOTICE '%', format('%I.%I', r[1], r[2]);
+END;
+$$;
+NOTICE:  schemax.tabley
+NOTICE:  "SchemaX"."TableY"
+SELECT parse_ident('foo.boo');
+ parse_ident 
+-------------
+ {foo,boo}
+(1 row)
+
+SELECT parse_ident('foo.boo[]'); -- should fail
+ERROR:  identifier contains disallowed characters: "foo.boo[]"
+SELECT parse_ident('foo.boo[]', strict => false); -- ok
+ parse_ident 
+-------------
+ {foo,boo}
+(1 row)
+
+-- should fail
+SELECT parse_ident(' ');
+ERROR:  missing identifier: " "
+SELECT parse_ident(' .aaa');
+ERROR:  missing identifier before "." symbol: " .aaa"
+SELECT parse_ident(' aaa . ');
+ERROR:  missing identifier after "." symbol: " aaa . "
+SELECT parse_ident('aaa.a%b');
+ERROR:  identifier contains disallowed characters: "aaa.a%b"
+SELECT parse_ident(E'X\rXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX');
+ERROR:  identifier contains disallowed characters: "X\rXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
+SELECT length(a[1]), length(a[2]) from parse_ident('"xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx".yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy') as a ;
+ length | length 
+--------+--------
+    414 |    289
+(1 row)
+
+SELECT parse_ident(' first . "  second  " ."   third   ". "  ' || repeat('x',66) || '"');
+                                                parse_ident                                                
+-----------------------------------------------------------------------------------------------------------
+ {first,"  second  ","   third   ","  xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"}
+(1 row)
+
+SELECT parse_ident(' first . "  second  " ."   third   ". "  ' || repeat('x',66) || '"')::name[];
+                                             parse_ident                                              
+------------------------------------------------------------------------------------------------------
+ {first,"  second  ","   third   ","  xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"}
+(1 row)
+
diff --git a/src/test/regress/sql/name.sql b/src/test/regress/sql/name.sql
index 1c7a671..3741d1b 100644
--- a/src/test/regress/sql/name.sql
+++ b/src/test/regress/sql/name.sql
@@ -52,3 +52,30 @@ SELECT '' AS three, c.f1 FROM NAME_TBL c WHERE c.f1 ~ '[0-9]';
 SELECT '' AS two, c.f1 FROM NAME_TBL c WHERE c.f1 ~ '.*asdf.*';
 
 DROP TABLE NAME_TBL;
+
+DO $$
+DECLARE r text[];
+BEGIN
+  r := parse_ident('Schemax.Tabley');
+  RAISE NOTICE '%', format('%I.%I', r[1], r[2]);
+  r := parse_ident('"SchemaX"."TableY"');
+  RAISE NOTICE '%', format('%I.%I', r[1], r[2]);
+END;
+$$;
+
+SELECT parse_ident('foo.boo');
+SELECT parse_ident('foo.boo[]'); -- should fail
+SELECT parse_ident('foo.boo[]', strict => false); -- ok
+
+-- should fail
+SELECT parse_ident(' ');
+SELECT parse_ident(' .aaa');
+SELECT parse_ident(' aaa . ');
+SELECT parse_ident('aaa.a%b');
+SELECT parse_ident(E'X\rXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX');
+
+SELECT length(a[1]), length(a[2]) from parse_ident('"xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx".yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy') as a ;
+
+SELECT parse_ident(' first . "  second  " ."   third   ". "  ' || repeat('x',66) || '"');
+SELECT parse_ident(' first . "  second  " ."   third   ". "  ' || repeat('x',66) || '"')::name[];
+

#48

Pavel Stehule

pavel.stehule@gmail.com

almost 10 years ago

In reply to: Teodor Sigaev (#47)

Re: proposal: function parse_ident

2016-03-14 17:39 GMT+01:00 Teodor Sigaev <teodor@sigaev.ru>:

I afraid so I cannot to fix this inconsistency (if this is inconsistency -

the
binary values are same) - the parameter of function is raw string with
processed
escape codes, and I have not any information about original escape
sequences.
When you enter octet value, and I show it as hex value, then there should
be
difference. Buy I have not information about your input (octet or hex). I
have
the original string of SQL identifier inside parser, executor, but I have
not
original string of function parameter inside function (not without pretty
complex and long code).

Ok, agree

I am trying describe it in doc (I am sorry for my less level English) in
new
patch. Fixed duplicated oid too.

Edited a bit + fix some typos and remove unneeded headers, patch attached

Sorry, I can't find all corner-cases at once, but:
SELECT parse_ident(E'"c".X XXXXXXXXXX');
ERROR: identifier contains disallowed characters: "\"c"

I'll check it tomorrow

Thank you

Pavel

Show quoted text

Error message wrongly points to the reason of error.

--
Teodor Sigaev E-mail: teodor@sigaev.ru
WWW:
http://www.sigaev.ru/

#49

Pavel Stehule

pavel.stehule@gmail.com

almost 10 years ago

In reply to: Teodor Sigaev (#47)

1 attachment(s)

Re: proposal: function parse_ident

2016-03-14 17:39 GMT+01:00 Teodor Sigaev <teodor@sigaev.ru>:

I afraid so I cannot to fix this inconsistency (if this is inconsistency -

the
binary values are same) - the parameter of function is raw string with
processed
escape codes, and I have not any information about original escape
sequences.
When you enter octet value, and I show it as hex value, then there should
be
difference. Buy I have not information about your input (octet or hex). I
have
the original string of SQL identifier inside parser, executor, but I have
not
original string of function parameter inside function (not without pretty
complex and long code).

Ok, agree

I am trying describe it in doc (I am sorry for my less level English) in
new
patch. Fixed duplicated oid too.

Edited a bit + fix some typos and remove unneeded headers, patch attached

Sorry, I can't find all corner-cases at once, but:
SELECT parse_ident(E'"c".X XXXXXXXXXX');
ERROR: identifier contains disallowed characters: "\"c"

Error message wrongly points to the reason of error.

I forgot my original plan - show full original string. Now, complete
original parameter is used as part of message everywhere. It is more
consistent.

I cannot to reuse escape_json - it escape double quotes, and then the
paremeter in message looks strange.

I hope so the messages are ok now. Few more regress tests added.

Thank you for your patience.

Regards

Pavel

Show quoted text

--
Teodor Sigaev E-mail: teodor@sigaev.ru
WWW:
http://www.sigaev.ru/

Attachments:

parse_ident-14.patchtext/x-patch; charset=US-ASCII; name=parse_ident-14.patchDownload

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
new file mode 100644
index 000489d..918356c
*** a/doc/src/sgml/func.sgml
--- b/doc/src/sgml/func.sgml
***************
*** 1821,1826 ****
--- 1821,1852 ----
        <row>
         <entry>
          <indexterm>
+          <primary>parse_ident</primary>
+         </indexterm>
+         <literal><function>parse_ident(<parameter>str</parameter> <type>text</type>,
+            [ <parameter>strictmode</parameter> <type>boolean</type> DEFAULT true ] )</function></literal>
+        </entry>
+        <entry><type>text[]</type></entry>
+        <entry>Split <parameter>qualified identifier</parameter> into array
+        <parameter>parts</parameter>. When <parameter>strictmode</parameter> is
+        false, extra characters after the identifier are ignored. This is useful
+        for parsing identifiers for objects like functions and arrays that may
+        have trailing characters. By default, extra characters after the last
+        identifier are considered an error, but if second parameter is false,
+        then chararacters after last identifier are ignored. Note that this
+        function does not truncate quoted identifiers. If you care about that
+        you should cast the result of this function to name[]. A non-printable
+        chararacters (like 0 to 31) are displayed as hexadecimal codes always,
+        what can be different from PostgreSQL internal SQL identifiers
+        processing, when the original escaped value is displayed.
+        </entry>
+        <entry><literal>parse_ident('"SomeSchema".someTable')</literal></entry>
+        <entry><literal>"SomeSchema,sometable"</literal></entry>
+       </row>
+ 
+       <row>
+        <entry>
+         <indexterm>
           <primary>pg_client_encoding</primary>
          </indexterm>
          <literal><function>pg_client_encoding()</function></literal>
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
new file mode 100644
index fef67bd..9ae1ef4
*** a/src/backend/catalog/system_views.sql
--- b/src/backend/catalog/system_views.sql
*************** RETURNS jsonb
*** 990,992 ****
--- 990,999 ----
  LANGUAGE INTERNAL
  STRICT IMMUTABLE
  AS 'jsonb_set';
+ 
+ CREATE OR REPLACE FUNCTION
+   parse_ident(str text, strict boolean DEFAULT true)
+ RETURNS text[]
+ LANGUAGE INTERNAL
+ STRICT IMMUTABLE
+ AS 'parse_ident';
diff --git a/src/backend/parser/scansup.c b/src/backend/parser/scansup.c
new file mode 100644
index 2b4ab20..7aa5b76
*** a/src/backend/parser/scansup.c
--- b/src/backend/parser/scansup.c
*************** scanstr(const char *s)
*** 130,135 ****
--- 130,144 ----
  char *
  downcase_truncate_identifier(const char *ident, int len, bool warn)
  {
+ 	return downcase_identifier(ident, len, warn, true);
+ }
+ 
+ /*
+  * a workhorse for downcase_truncate_identifier
+  */
+ char *
+ downcase_identifier(const char *ident, int len, bool warn, bool truncate)
+ {
  	char	   *result;
  	int			i;
  	bool		enc_is_single_byte;
*************** downcase_truncate_identifier(const char
*** 158,169 ****
  	}
  	result[i] = '\0';
  
! 	if (i >= NAMEDATALEN)
  		truncate_identifier(result, i, warn);
  
  	return result;
  }
  
  /*
   * truncate_identifier() --- truncate an identifier to NAMEDATALEN-1 bytes.
   *
--- 167,179 ----
  	}
  	result[i] = '\0';
  
! 	if (i >= NAMEDATALEN && truncate)
  		truncate_identifier(result, i, warn);
  
  	return result;
  }
  
+ 
  /*
   * truncate_identifier() --- truncate an identifier to NAMEDATALEN-1 bytes.
   *
diff --git a/src/backend/utils/adt/misc.c b/src/backend/utils/adt/misc.c
new file mode 100644
index 43f36db..8917b1e
*** a/src/backend/utils/adt/misc.c
--- b/src/backend/utils/adt/misc.c
***************
*** 27,32 ****
--- 27,33 ----
  #include "commands/dbcommands.h"
  #include "funcapi.h"
  #include "miscadmin.h"
+ #include "parser/scansup.h"
  #include "parser/keywords.h"
  #include "postmaster/syslogger.h"
  #include "rewrite/rewriteHandler.h"
***************
*** 39,44 ****
--- 40,46 ----
  #include "tcop/tcopprot.h"
  #include "utils/acl.h"
  #include "utils/builtins.h"
+ #include "utils/json.h"
  #include "utils/timestamp.h"
  
  #define atooid(x)  ((Oid) strtoul((x), NULL, 10))
*************** pg_column_is_updatable(PG_FUNCTION_ARGS)
*** 719,721 ****
--- 721,946 ----
  
  	PG_RETURN_BOOL((events & REQ_EVENTS) == REQ_EVENTS);
  }
+ 
+ 
+ /*
+  * This simple parser utility are compatible with lexer implementation,
+  * used only in parse_ident function
+  */
+ static bool
+ is_ident_start(unsigned char c)
+ {
+ 	if (c == '_')
+ 		return true;
+ 	if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z'))
+ 		return true;
+ 
+ 	if (c >= 0200 && c <= 0377)
+ 		return true;
+ 
+ 	return false;
+ }
+ 
+ static bool
+ is_ident_cont(unsigned char c)
+ {
+ 	if (c >= '0' && c <= '9')
+ 		return true;
+ 
+ 	return is_ident_start(c);
+ }
+ 
+ /*
+  * Sanitize SQL string for using in error message.
+  */
+ static char *
+ sanitize_text(text *t)
+ {
+ 	int			len = VARSIZE_ANY_EXHDR(t);
+ 	const char *p = VARDATA_ANY(t);
+ 	StringInfo	dstr;
+ 
+ 	dstr = makeStringInfo();
+ 
+ 	appendStringInfoChar(dstr, '"');
+ 
+ 	while (len--)
+ 	{
+ 		switch (*p)
+ 		{
+ 			case '\b':
+ 				appendStringInfoString(dstr, "\\b");
+ 				break;
+ 			case '\f':
+ 				appendStringInfoString(dstr, "\\f");
+ 				break;
+ 			case '\n':
+ 				appendStringInfoString(dstr, "\\n");
+ 				break;
+ 			case '\r':
+ 				appendStringInfoString(dstr, "\\r");
+ 				break;
+ 			case '\t':
+ 				appendStringInfoString(dstr, "\\t");
+ 				break;
+ 			case '\'':
+ 				appendStringInfoString(dstr, "''");
+ 				break;
+ 			case '\\':
+ 				appendStringInfoString(dstr, "\\\\");
+ 				break;
+ 			default:
+ 				if ((unsigned char) *p < ' ')
+ 					appendStringInfo(dstr, "\\u%04x", (int) *p);
+ 				else
+ 					appendStringInfoCharMacro(dstr, *p);
+ 				break;
+ 		}
+ 		p++;
+ 	}
+ 
+ 	appendStringInfoChar(dstr, '"');
+ 
+ 	return dstr->data;
+ }
+ 
+ /*
+  * parse_ident - parse SQL composed identifier to separate identifiers.
+  * When strict mode is active (second parameter), then any chars after
+  * last identifiers are disallowed.
+  */
+ Datum
+ parse_ident(PG_FUNCTION_ARGS)
+ {
+ 	text		*qualname;
+ 	char		*qualname_str;
+ 	bool		strict;
+ 	ArrayBuildState *astate = NULL;
+ 	char	*nextp;
+ 	bool		after_dot = false;
+ 
+ 	qualname = PG_GETARG_TEXT_PP(0);
+ 	qualname_str = text_to_cstring(qualname);
+ 	strict = PG_GETARG_BOOL(1);
+ 
+ 	nextp = qualname_str;
+ 
+ 	/* skip leading whitespace */
+ 	while (isspace((unsigned char) *nextp))
+ 		nextp++;
+ 
+ 	for (;;)
+ 	{
+ 		char		*curname;
+ 		char		*endp;
+ 		bool		missing_ident;
+ 
+ 		missing_ident = true;
+ 
+ 		if (*nextp == '\"')
+ 		{
+ 			curname = nextp + 1;
+ 			for (;;)
+ 			{
+ 				endp = strchr(nextp + 1, '\"');
+ 				if (endp == NULL)
+ 					ereport(ERROR,
+ 						(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ 						 errmsg("unclosed double quotes"),
+ 						 errdetail("string %s is not valid identifier",
+ 									sanitize_text(qualname))));
+ 				if (endp[1] != '\"')
+ 					break;
+ 				memmove(endp, endp + 1, strlen(endp));
+ 				nextp = endp;
+ 			}
+ 			nextp = endp + 1;
+ 			*endp = '\0';
+ 
+ 			/* Show complete input string in this case. */
+ 			if (endp - curname == 0)
+ 				ereport(ERROR,
+ 					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ 					 errmsg("identifier should not be empty: %s",
+ 							sanitize_text(qualname))));
+ 
+ 			astate = accumArrayResult(astate, CStringGetTextDatum(curname),
+ 									  false, TEXTOID, CurrentMemoryContext);
+ 			missing_ident = false;
+ 		}
+ 		else
+ 		{
+ 			if (is_ident_start((unsigned char) *nextp))
+ 			{
+ 				char *downname;
+ 				int	len;
+ 				text	*part;
+ 
+ 				curname = nextp++;
+ 				while (is_ident_cont((unsigned char) *nextp))
+ 					nextp++;
+ 
+ 				len = nextp - curname;
+ 
+ 				/*
+ 				 * Unlike name, we don't implicitly truncate identifiers. This
+ 				 * is useful for allowing the user to check for specific parts
+ 				 * of the identifier being too long. It's easy enough for the
+ 				 * user to get the truncated names by casting our output to
+ 				 * name[].
+ 				 */
+ 				downname = downcase_identifier(curname, len, false, false);
+ 				part = cstring_to_text_with_len(downname, len);
+ 				astate = accumArrayResult(astate, PointerGetDatum(part), false,
+ 										  TEXTOID, CurrentMemoryContext);
+ 				missing_ident = false;
+ 			}
+ 		}
+ 
+ 		if (missing_ident)
+ 		{
+ 			/* Different error messages based on where we failed. */
+ 			if (*nextp == '.')
+ 				ereport(ERROR,
+ 					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ 					 errmsg("missing valid identifier before \".\" symbol: %s",
+ 							sanitize_text(qualname))));
+ 			else if (after_dot)
+ 				ereport(ERROR,
+ 					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ 					 errmsg("missing valid identifier after \".\" symbol: %s",
+ 							sanitize_text(qualname))));
+ 			else
+ 				ereport(ERROR,
+ 					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ 					 errmsg("missing valid identifier: %s",
+ 							sanitize_text(qualname))));
+ 		}
+ 
+ 		while (isspace((unsigned char) *nextp))
+ 			nextp++;
+ 
+ 		if (*nextp == '.')
+ 		{
+ 			after_dot = true;
+ 			nextp++;
+ 			while (isspace((unsigned char) *nextp))
+ 				nextp++;
+ 		}
+ 		else if (*nextp == '\0')
+ 		{
+ 			break;
+ 		}
+ 		else
+ 		{
+ 			if (strict)
+ 				ereport(ERROR,
+ 					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ 					 errmsg("identifier contains disallowed characters: %s",
+ 							sanitize_text(qualname))));
+ 			break;
+ 		}
+ 	}
+ 
+ 	PG_RETURN_DATUM(makeArrayResult(astate, CurrentMemoryContext));
+ }
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
new file mode 100644
index ceb8129..a595327
*** a/src/include/catalog/pg_proc.h
--- b/src/include/catalog/pg_proc.h
*************** DESCR("I/O");
*** 3463,3468 ****
--- 3463,3471 ----
  DATA(insert OID = 4086 (  to_regnamespace	PGNSP PGUID 12 1 0 0 0 f f f f t f s s 1 0 4089 "25" _null_ _null_ _null_ _null_ _null_ to_regnamespace _null_ _null_ _null_ ));
  DESCR("convert namespace name to regnamespace");
  
+ DATA(insert OID = 1268 (  parse_ident		PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 1009 "25 16" _null_ _null_ "{str,strict}" _null_ _null_ parse_ident _null_ _null_ _null_ ));
+ DESCR("parse qualified identifier to array of identifiers");
+ 
  DATA(insert OID = 2246 ( fmgr_internal_validator PGNSP PGUID 12 1 0 0 0 f f f f t f s s 1 0 2278 "26" _null_ _null_ _null_ _null_ _null_ fmgr_internal_validator _null_ _null_ _null_ ));
  DESCR("(internal)");
  DATA(insert OID = 2247 ( fmgr_c_validator	PGNSP PGUID 12 1 0 0 0 f f f f t f s s 1 0 2278 "26" _null_ _null_ _null_ _null_ _null_ fmgr_c_validator _null_ _null_ _null_ ));
diff --git a/src/include/parser/scansup.h b/src/include/parser/scansup.h
new file mode 100644
index 4f4164b..4f95c81
*** a/src/include/parser/scansup.h
--- b/src/include/parser/scansup.h
*************** extern char *scanstr(const char *s);
*** 20,25 ****
--- 20,28 ----
  extern char *downcase_truncate_identifier(const char *ident, int len,
  							 bool warn);
  
+ extern char *downcase_identifier(const char *ident, int len,
+ 							 bool warn, bool truncate);
+ 
  extern void truncate_identifier(char *ident, int len, bool warn);
  
  extern bool scanner_isspace(char ch);
diff --git a/src/include/utils/builtins.h b/src/include/utils/builtins.h
new file mode 100644
index 59a00bb..206288d
*** a/src/include/utils/builtins.h
--- b/src/include/utils/builtins.h
*************** extern Datum pg_typeof(PG_FUNCTION_ARGS)
*** 510,515 ****
--- 510,516 ----
  extern Datum pg_collation_for(PG_FUNCTION_ARGS);
  extern Datum pg_relation_is_updatable(PG_FUNCTION_ARGS);
  extern Datum pg_column_is_updatable(PG_FUNCTION_ARGS);
+ extern Datum parse_ident(PG_FUNCTION_ARGS);
  
  /* oid.c */
  extern Datum oidin(PG_FUNCTION_ARGS);
diff --git a/src/test/regress/expected/name.out b/src/test/regress/expected/name.out
new file mode 100644
index b359d52..56139d4
*** a/src/test/regress/expected/name.out
--- b/src/test/regress/expected/name.out
*************** SELECT '' AS two, c.f1 FROM NAME_TBL c W
*** 124,126 ****
--- 124,192 ----
  (2 rows)
  
  DROP TABLE NAME_TBL;
+ DO $$
+ DECLARE r text[];
+ BEGIN
+   r := parse_ident('Schemax.Tabley');
+   RAISE NOTICE '%', format('%I.%I', r[1], r[2]);
+   r := parse_ident('"SchemaX"."TableY"');
+   RAISE NOTICE '%', format('%I.%I', r[1], r[2]);
+ END;
+ $$;
+ NOTICE:  schemax.tabley
+ NOTICE:  "SchemaX"."TableY"
+ SELECT parse_ident('foo.boo');
+  parse_ident 
+ -------------
+  {foo,boo}
+ (1 row)
+ 
+ SELECT parse_ident('foo.boo[]'); -- should fail
+ ERROR:  identifier contains disallowed characters: "foo.boo[]"
+ SELECT parse_ident('foo.boo[]', strict => false); -- ok
+  parse_ident 
+ -------------
+  {foo,boo}
+ (1 row)
+ 
+ -- should fail
+ SELECT parse_ident(' ');
+ ERROR:  missing valid identifier: " "
+ SELECT parse_ident(' .aaa');
+ ERROR:  missing valid identifier before "." symbol: " .aaa"
+ SELECT parse_ident(' aaa . ');
+ ERROR:  missing valid identifier after "." symbol: " aaa . "
+ SELECT parse_ident('aaa.a%b');
+ ERROR:  identifier contains disallowed characters: "aaa.a%b"
+ SELECT parse_ident(E'X\rXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX');
+ ERROR:  identifier contains disallowed characters: "X\rXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
+ SELECT length(a[1]), length(a[2]) from parse_ident('"xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx".yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy') as a ;
+  length | length 
+ --------+--------
+     414 |    289
+ (1 row)
+ 
+ SELECT parse_ident(' first . "  second  " ."   third   ". "  ' || repeat('x',66) || '"');
+                                                 parse_ident                                                
+ -----------------------------------------------------------------------------------------------------------
+  {first,"  second  ","   third   ","  xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"}
+ (1 row)
+ 
+ SELECT parse_ident(' first . "  second  " ."   third   ". "  ' || repeat('x',66) || '"')::name[];
+                                              parse_ident                                              
+ ------------------------------------------------------------------------------------------------------
+  {first,"  second  ","   third   ","  xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"}
+ (1 row)
+ 
+ SELECT parse_ident(E'"c".X XXXX\002XXXXXX');
+ ERROR:  identifier contains disallowed characters: ""c".X XXXX\u0002XXXXXX"
+ SELECT parse_ident('1020');
+ ERROR:  missing valid identifier: "1020"
+ SELECT parse_ident('10.20');
+ ERROR:  missing valid identifier: "10.20"
+ SELECT parse_ident('.');
+ ERROR:  missing valid identifier before "." symbol: "."
+ SELECT parse_ident('.1020');
+ ERROR:  missing valid identifier before "." symbol: ".1020"
+ SELECT parse_ident('xxx.1020');
+ ERROR:  missing valid identifier after "." symbol: "xxx.1020"
diff --git a/src/test/regress/sql/name.sql b/src/test/regress/sql/name.sql
new file mode 100644
index 1c7a671..602bf26
*** a/src/test/regress/sql/name.sql
--- b/src/test/regress/sql/name.sql
*************** SELECT '' AS three, c.f1 FROM NAME_TBL c
*** 52,54 ****
--- 52,87 ----
  SELECT '' AS two, c.f1 FROM NAME_TBL c WHERE c.f1 ~ '.*asdf.*';
  
  DROP TABLE NAME_TBL;
+ 
+ DO $$
+ DECLARE r text[];
+ BEGIN
+   r := parse_ident('Schemax.Tabley');
+   RAISE NOTICE '%', format('%I.%I', r[1], r[2]);
+   r := parse_ident('"SchemaX"."TableY"');
+   RAISE NOTICE '%', format('%I.%I', r[1], r[2]);
+ END;
+ $$;
+ 
+ SELECT parse_ident('foo.boo');
+ SELECT parse_ident('foo.boo[]'); -- should fail
+ SELECT parse_ident('foo.boo[]', strict => false); -- ok
+ 
+ -- should fail
+ SELECT parse_ident(' ');
+ SELECT parse_ident(' .aaa');
+ SELECT parse_ident(' aaa . ');
+ SELECT parse_ident('aaa.a%b');
+ SELECT parse_ident(E'X\rXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX');
+ 
+ SELECT length(a[1]), length(a[2]) from parse_ident('"xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx".yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy') as a ;
+ 
+ SELECT parse_ident(' first . "  second  " ."   third   ". "  ' || repeat('x',66) || '"');
+ SELECT parse_ident(' first . "  second  " ."   third   ". "  ' || repeat('x',66) || '"')::name[];
+ 
+ SELECT parse_ident(E'"c".X XXXX\002XXXXXX');
+ SELECT parse_ident('1020');
+ SELECT parse_ident('10.20');
+ SELECT parse_ident('.');
+ SELECT parse_ident('.1020');
+ SELECT parse_ident('xxx.1020');

#50

Teodor Sigaev

teodor@sigaev.ru

almost 10 years ago

In reply to: Pavel Stehule (#49)

Re: proposal: function parse_ident

I hope so the messages are ok now. Few more regress tests added.

Thank you, committed.

--
Teodor Sigaev E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#51

Pavel Stehule

pavel.stehule@gmail.com

almost 10 years ago

In reply to: Teodor Sigaev (#50)

Re: proposal: function parse_ident

2016-03-18 16:26 GMT+01:00 Teodor Sigaev <teodor@sigaev.ru>:

I hope so the messages are ok now. Few more regress tests added.

Thank you, committed.

Thank you very much

Pavel

Show quoted text

--
Teodor Sigaev E-mail: teodor@sigaev.ru
WWW:
http://www.sigaev.ru/