proposal - function string_to_table

Started by Pavel Stehulealmost 6 years ago22 messageshackers
Jump to latest
#1Pavel Stehule
pavel.stehule@gmail.com

Hi

I propose new function string_to_table. This function is significantly
faster (and simpler) variant of regexp_split_to_array function. There was
same process years ago when we implemented string_agg as faster variant of
array_to_string(array_agg()). string_to_table is faster variant (and little
bit more intuitive alternative of unnest(string_to_array()).

string_to_table is about 15% faster than unnest(string_to_array()) and
about 40% faster than regexp_split_to_array.

Initial patch is attached

Notes, comments?

Regards

Pavel

Attachments:

string_to_table.patchtext/x-patch; charset=US-ASCII; name=string_to_table.patchDownload+263-35
#2Justin Pryzby
pryzby@telsasoft.com
In reply to: Pavel Stehule (#1)
Re: proposal - function string_to_table

On Fri, Apr 17, 2020 at 07:47:15PM +0200, Pavel Stehule wrote:

I propose new function string_to_table. This function is significantly

+1

+/*
+ * Add text to result set (table or array). Build a table when set is a expected or build
+ * a array

as expected (??)
*an* array

+select string_to_table('abc', '', 'abc');
+ string_to_table 
+-----------------
+ 
+(1 row)

Maybe you should \pset null '(null)' for this

--
Justin

#3Pavel Stehule
pavel.stehule@gmail.com
In reply to: Justin Pryzby (#2)
Re: proposal - function string_to_table

pá 17. 4. 2020 v 23:29 odesílatel Justin Pryzby <pryzby@telsasoft.com>
napsal:

On Fri, Apr 17, 2020 at 07:47:15PM +0200, Pavel Stehule wrote:

I propose new function string_to_table. This function is significantly

+1

+/*
+ * Add text to result set (table or array). Build a table when set is a

expected or build

+ * a array

as expected (??)
*an* array

I tried to fix this comment

+select string_to_table('abc', '', 'abc');
+ string_to_table
+-----------------
+
+(1 row)

Maybe you should \pset null '(null)' for this

changing NULL output can break lot of existing tests, but I add second
column with info about null

+select string_to_table('1,2,3,4,*,6', ',', '*'),
string_to_table('1,2,3,4,*,6', ',', '*') IS NULL;
+ string_to_table | ?column?
+-----------------+----------
+ 1               | f
+ 2               | f
+ 3               | f
+ 4               | f
+                 | t
+ 6               | f
+(6 rows)

Regards

Pavel

Show quoted text

--
Justin

Attachments:

string_to_table-20200418.patchtext/x-patch; charset=US-ASCII; name=string_to_table-20200418.patchDownload+263-35
#4movead.li@highgo.ca
movead.li@highgo.ca
In reply to: Pavel Stehule (#1)
Re: proposal - function string_to_table
+{ oid => '2228', descr => 'split delimited text',
+  proname => 'string_to_table', prorows => '1000', proretset => 't',
+  prorettype => 'text', proargtypes => 'text text',
+  prosrc => 'text_to_table' },
+{ oid => '2282', descr => 'split delimited text with null string',
+  proname => 'string_to_table', prorows => '1000', proretset => 't',
+  prorettype => 'text', proargtypes => 'text text text',
+  prosrc => 'text_to_table_null' },

I go through the patch, and everything looks good to me. But I do not know
why it needs a 'text_to_table_null()', it's ok to put a 'text_to_table' there, I think.

Regards,
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca
EMAIL: mailto:movead(dot)li(at)highgo(dot)ca

#5Pavel Stehule
pavel.stehule@gmail.com
In reply to: movead.li@highgo.ca (#4)
Re: proposal - function string_to_table

Hi

čt 4. 6. 2020 v 11:49 odesílatel movead.li@highgo.ca <movead.li@highgo.ca>
napsal:

+{ oid => '2228', descr => 'split delimited text',
+  proname => 'string_to_table', prorows => '1000', proretset => 't',
+  prorettype => 'text', proargtypes => 'text text',
+  prosrc => 'text_to_table' },
+{ oid => '2282', descr => 'split delimited text with null string',
+  proname => 'string_to_table', prorows => '1000', proretset => 't',
+  prorettype => 'text', proargtypes => 'text text text',
+  prosrc => 'text_to_table_null' },

I go through the patch, and everything looks good to me. But I do not know
why it needs a 'text_to_table_null()', it's ok to put a 'text_to_table'
there, I think.

It is a convention in Postgres - every SQL unique signature has its own
unique internal C function.

I am sending a refreshed patch.

Regards

Pavel

Show quoted text

------------------------------
Regards,
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca
EMAIL: mailto:movead(dot)li(at)highgo(dot)ca

Attachments:

string_to_table-20200605.patchtext/x-patch; charset=US-ASCII; name=string_to_table-20200605.patchDownload+301-50
#6Pavel Stehule
pavel.stehule@gmail.com
In reply to: Pavel Stehule (#5)
Re: proposal - function string_to_table

pá 5. 6. 2020 v 13:55 odesílatel Pavel Stehule <pavel.stehule@gmail.com>
napsal:

Hi

čt 4. 6. 2020 v 11:49 odesílatel movead.li@highgo.ca <movead.li@highgo.ca>
napsal:

+{ oid => '2228', descr => 'split delimited text',
+  proname => 'string_to_table', prorows => '1000', proretset => 't',
+  prorettype => 'text', proargtypes => 'text text',
+  prosrc => 'text_to_table' },
+{ oid => '2282', descr => 'split delimited text with null string',
+  proname => 'string_to_table', prorows => '1000', proretset => 't',
+  prorettype => 'text', proargtypes => 'text text text',
+  prosrc => 'text_to_table_null' },

I go through the patch, and everything looks good to me. But I do not know
why it needs a 'text_to_table_null()', it's ok to put a 'text_to_table'
there, I think.

It is a convention in Postgres - every SQL unique signature has its own
unique internal C function.

I am sending a refreshed patch.

rebase

Regards

Pavel

Show quoted text

Regards

Pavel

------------------------------
Regards,
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca
EMAIL: mailto:movead(dot)li(at)highgo(dot)ca

Attachments:

string_to_table-20200705.patchtext/x-patch; charset=US-ASCII; name=string_to_table-20200705.patchDownload+301-50
#7Pavel Stehule
pavel.stehule@gmail.com
In reply to: Pavel Stehule (#6)
Re: proposal - function string_to_table

ne 5. 7. 2020 v 13:30 odesílatel Pavel Stehule <pavel.stehule@gmail.com>
napsal:

pá 5. 6. 2020 v 13:55 odesílatel Pavel Stehule <pavel.stehule@gmail.com>
napsal:

Hi

čt 4. 6. 2020 v 11:49 odesílatel movead.li@highgo.ca <movead.li@highgo.ca>
napsal:

+{ oid => '2228', descr => 'split delimited text',
+  proname => 'string_to_table', prorows => '1000', proretset => 't',
+  prorettype => 'text', proargtypes => 'text text',
+  prosrc => 'text_to_table' },
+{ oid => '2282', descr => 'split delimited text with null string',
+  proname => 'string_to_table', prorows => '1000', proretset => 't',
+  prorettype => 'text', proargtypes => 'text text text',
+  prosrc => 'text_to_table_null' },

I go through the patch, and everything looks good to me. But I do not
know
why it needs a 'text_to_table_null()', it's ok to put a 'text_to_table'
there, I think.

It is a convention in Postgres - every SQL unique signature has its own
unique internal C function.

I am sending a refreshed patch.

rebase

two fresh fix

a) remove garbage from patch that breaks doc

b) these functions should not be strict - be consistent with
string_to_array functions

Regards

Pavel

Show quoted text

Regards

Pavel

Regards

Pavel

------------------------------
Regards,
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca
EMAIL: mailto:movead(dot)li(at)highgo(dot)ca

Attachments:

string_to_table-20200706-2.patchtext/x-patch; charset=US-ASCII; name=string_to_table-20200706-2.patchDownload+306-35
#8Peter Smith
smithpb2250@gmail.com
In reply to: Pavel Stehule (#7)
Re: proposal - function string_to_table

Hi.

I have been looking at the patch: string_to_table-20200706-2.patch

Below are some review comments for your consideration.

====

COMMENT func.sgml (style)

+       <para>
+        splits string into table using supplied delimiter and
+        optional null string.
+       </para>

The format style of the short description is inconsistent with the
other functions.
e.g. Should start with Capital letter.
e.g. Should tag the parameter names properly

Something like:
<para>
Splits <parameter>string</parameter> into a table
using supplied <parameter>delimiter</parameter>
and optional null string <parameter>nullstr</parameter>.
</para>

====

COMMENT func.sgml (what does nullstr do)

The description does not sufficiently describe the purpose/behaviour
of the nullstr.

e.g. Firstly I thought that it meant if 2 consecutive delimiters were
encountered it would substitute this string as the row value. But it
is doing the opposite of what I guessed - if the extracted row value
is the same as nullstr then a NULL row is inserted instead.

====

COMMENT func.sgml (wrong sample output)

+<programlisting>xx
+yy,
+zz</programlisting>

This output is incorrect for the sample given. There is no "yy," in
the output because there is a 'yy' nullstr substitution.

Should be:
---
xx
NULL
zz
---

====

COMMENT func.sgml (related to regexp_split_to_table)

Because this new function is similar to the existing
regexp_split_to_table, perhaps they should cross-reference each other
so a reader of this documentation is made aware of the alternative
function?

====

COMMENT (test cases)

It is impossible to tell difference in the output between empty
strings and nulls currently, so maybe you can change all the tests to
have a form like below so they can be validated properly:

# select v, v IS NULL as "is null" from
string_to_table('a,b,*,c,d,',',','*') g(v);
v | is null
---+---------
a | f
b | f
| t
c | f
d | f
| f
(6 rows)

or maybe like this is even easier:

# select quote_nullable(string_to_table('a|*||c|d|','|','*'));
quote_nullable
----------------
'a'
NULL
''
'c'
'd'
''
(6 rows)

Something similar was already proposed before [1]/messages/by-id/CAFj8pRDSzDYmaS06dfMXBfbr8x+3xjDJxA5kbL3h8+eOGoRUcA@mail.gmail.com but that never got
put into the test code.
[1]: /messages/by-id/CAFj8pRDSzDYmaS06dfMXBfbr8x+3xjDJxA5kbL3h8+eOGoRUcA@mail.gmail.com

====

COMMENT (test cases)

There are multiple combinations of the parameters to this function and
MANY different results depending on different values they can take, so
the existing tests only cover a small sample.

I have attached a lot more test scenarios that you may want to include
for better test coverage. Everything seemed to work as expected.

PSA test-cases.pdf

====

COMMENT (accum_result)

+ Datum values[1];
+ bool nulls[1];
+
+ values[0] = PointerGetDatum(result_text);
+ nulls[0] = is_null;

Why not use variables instead of arrays with only 1 element?

====

COMMENT (text_to_array_internal)

+ if (!tstate.astate)
+ PG_RETURN_ARRAYTYPE_P(construct_empty_array(TEXTOID));

Maybe the condition is more readable when expressed as:
if (tstate.astate == NULL)

====

Kind Regards,
Peter Smith.
Fujitsu Australia

Attachments:

test-cases.pdfapplication/pdf; name=test-cases.pdfDownload+4-4
#9Pavel Stehule
pavel.stehule@gmail.com
In reply to: Peter Smith (#8)
Re: proposal - function string_to_table

Hi

čt 20. 8. 2020 v 4:07 odesílatel Peter Smith <smithpb2250@gmail.com> napsal:

Hi.

I have been looking at the patch: string_to_table-20200706-2.patch

Below are some review comments for your consideration.

====

COMMENT func.sgml (style)

+       <para>
+        splits string into table using supplied delimiter and
+        optional null string.
+       </para>

The format style of the short description is inconsistent with the
other functions.
e.g. Should start with Capital letter.
e.g. Should tag the parameter names properly

Something like:
<para>
Splits <parameter>string</parameter> into a table
using supplied <parameter>delimiter</parameter>
and optional null string <parameter>nullstr</parameter>.
</para>

done

====

COMMENT func.sgml (what does nullstr do)

The description does not sufficiently describe the purpose/behaviour
of the nullstr.

e.g. Firstly I thought that it meant if 2 consecutive delimiters were
encountered it would substitute this string as the row value. But it
is doing the opposite of what I guessed - if the extracted row value
is the same as nullstr then a NULL row is inserted instead.

done

====

COMMENT func.sgml (wrong sample output)

+<programlisting>xx
+yy,
+zz</programlisting>

This output is incorrect for the sample given. There is no "yy," in
the output because there is a 'yy' nullstr substitution.

Should be:
---
xx
NULL
zz
---

fixed

====

COMMENT func.sgml (related to regexp_split_to_table)

Because this new function is similar to the existing
regexp_split_to_table, perhaps they should cross-reference each other
so a reader of this documentation is made aware of the alternative
function?

I wrote new sentence with ref

====

COMMENT (test cases)

It is impossible to tell difference in the output between empty
strings and nulls currently, so maybe you can change all the tests to
have a form like below so they can be validated properly:

# select v, v IS NULL as "is null" from
string_to_table('a,b,*,c,d,',',','*') g(v);
v | is null
---+---------
a | f
b | f
| t
c | f
d | f
| f
(6 rows)

or maybe like this is even easier:

# select quote_nullable(string_to_table('a|*||c|d|','|','*'));
quote_nullable
----------------
'a'
NULL
''
'c'
'd'
''
(6 rows)

I prefer the first variant, it is clean. It is good idea, done

Something similar was already proposed before [1] but that never got
put into the test code.
[1]
/messages/by-id/CAFj8pRDSzDYmaS06dfMXBfbr8x+3xjDJxA5kbL3h8+eOGoRUcA@mail.gmail.com

====

COMMENT (test cases)

There are multiple combinations of the parameters to this function and
MANY different results depending on different values they can take, so
the existing tests only cover a small sample.

I have attached a lot more test scenarios that you may want to include
for better test coverage. Everything seemed to work as expected.

ok, merged

PSA test-cases.pdf

====

COMMENT (accum_result)

+ Datum values[1];
+ bool nulls[1];
+
+ values[0] = PointerGetDatum(result_text);
+ nulls[0] = is_null;

Why not use variables instead of arrays with only 1 element?

Technically it is equivalent, but I think so using one element array is
more correct, because function heap_form_tuple expects an array. Sure in C
language there is no difference between pointer to value or pointer to
array, but minimally the name of the argument "values" implies so argument
is an array.

This pattern is used more times in Postgres. You can find a fragments where
although we know so array has only one field, still we works with array

misc.c
hash.c
execTuples.c

but I can this code simplify little bit - I can use function
tuplestore_putvalues(tupstore, tupdesc, values, nulls);

I see, so this code can be reduced more, and I don't need local variables,
but I prefer to be consistent with other parts, and I feel better if I pass
an array where the array is expected.

This is not extra important, and I can it change, just I think this variant
is cleaner little bit

====

COMMENT (text_to_array_internal)

+ if (!tstate.astate)
+ PG_RETURN_ARRAYTYPE_P(construct_empty_array(TEXTOID));

Maybe the condition is more readable when expressed as:
if (tstate.astate == NULL)

done

new patch attached

Thank you for precious review

Regards

Pavel

====

Show quoted text

Kind Regards,
Peter Smith.
Fujitsu Australia

Attachments:

string_to_table-20200820.patchtext/x-patch; charset=US-ASCII; name=string_to_table-20200820.patchDownload+862-36
#10Peter Smith
smithpb2250@gmail.com
In reply to: Pavel Stehule (#9)
Re: proposal - function string_to_table

On Fri, Aug 21, 2020 at 5:21 AM Pavel Stehule <pavel.stehule@gmail.com> wrote:

new patch attached

Thanks for taking some of my previous review comments.

I have re-checked the string_to_table_20200820.patch.

Below are some remaining questions/comments:

====

COMMENT (help text)

+        Splits the <parameter>string</parameter> at occurrences
+        of <parameter>delimiter</parameter> and forms the remaining data
+        into a <type>text</type> tavke.

What did you mean by "remaining" in that description?
It gets a bit strange thinking about remaining NULLs, or remaining
empty strings.

Why not just say "... and forms the data into a <type>text</type> table."

---

+        Splits the <parameter>string</parameter> at occurrences
+        of <parameter>delimiter</parameter> and forms the remaining data
+        into a <type>text</type> tavke.

Typo: "tavke." -> "table."

====

COMMENT (help text reference to regexp_split_to_table)

+        input <parameter>string</parameter> can be done by function
+        <function>regexp_split_to_table</function> (see <xref
linkend="functions-posix-regexp"/>).
+       </para>

In the previous review I suggested adding a reference to the
regexp_split_to_table function.
A hyperlink would be a bonus, but maybe it is not possible.

The hyperlink added in the latest patch is to page for POSIX Regular
Expressions, which doesn't seem appropriate.

====

QUESTION (test cases)

Thanks for merging lots of my additional test cases!

Actually, the previous PDF I sent was 2 pages long but you only merged
the tests of page 1.
I wondered was it accidental to omit all those 2nd page tests?

====

QUESTION (function name?)

I noticed that ALL current string functions that use delimiters have
the word "split" in their name.

e.g.
* regexp_split_to_array
* regexp_split_to_table
* split_part

But "string_to_table" is not following this pattern.

Maybe a different choice of function name would be more consistent
with what is already there?
e.g. split_to_table, string_split_to_table, etc.

====

Kind Regards,
Peter Smith.
Fujitsu Australia

#11Pavel Stehule
pavel.stehule@gmail.com
In reply to: Peter Smith (#10)
Re: proposal - function string_to_table

pá 21. 8. 2020 v 9:44 odesílatel Peter Smith <smithpb2250@gmail.com> napsal:

On Fri, Aug 21, 2020 at 5:21 AM Pavel Stehule <pavel.stehule@gmail.com>
wrote:

new patch attached

Thanks for taking some of my previous review comments.

I have re-checked the string_to_table_20200820.patch.

Below are some remaining questions/comments:

====

COMMENT (help text)

+        Splits the <parameter>string</parameter> at occurrences
+        of <parameter>delimiter</parameter> and forms the remaining data
+        into a <type>text</type> tavke.

What did you mean by "remaining" in that description?
It gets a bit strange thinking about remaining NULLs, or remaining
empty strings.

Why not just say "... and forms the data into a <type>text</type> table."

---

+        Splits the <parameter>string</parameter> at occurrences
+        of <parameter>delimiter</parameter> and forms the remaining data
+        into a <type>text</type> tavke.

Typo: "tavke." -> "table."

This text is taken from doc for string_to_array

====

COMMENT (help text reference to regexp_split_to_table)

+        input <parameter>string</parameter> can be done by function
+        <function>regexp_split_to_table</function> (see <xref
linkend="functions-posix-regexp"/>).
+       </para>

In the previous review I suggested adding a reference to the
regexp_split_to_table function.
A hyperlink would be a bonus, but maybe it is not possible.

The hyperlink added in the latest patch is to page for POSIX Regular
Expressions, which doesn't seem appropriate.

ok I remove it

====

QUESTION (test cases)

Thanks for merging lots of my additional test cases!

Actually, the previous PDF I sent was 2 pages long but you only merged
the tests of page 1.
I wondered was it accidental to omit all those 2nd page tests?

I'll check it

====

QUESTION (function name?)

I noticed that ALL current string functions that use delimiters have
the word "split" in their name.

e.g.
* regexp_split_to_array
* regexp_split_to_table
* split_part

But "string_to_table" is not following this pattern.

Maybe a different choice of function name would be more consistent
with what is already there?
e.g. split_to_table, string_split_to_table, etc.

I don't agree. This function is twin (with almost identical behaviour) for
"string_to_array" function, so I think so the name is correct.

Show quoted text

====

Kind Regards,
Peter Smith.
Fujitsu Australia

#12Pavel Stehule
pavel.stehule@gmail.com
In reply to: Pavel Stehule (#11)
Re: proposal - function string_to_table

pá 21. 8. 2020 v 11:08 odesílatel Pavel Stehule <pavel.stehule@gmail.com>
napsal:

pá 21. 8. 2020 v 9:44 odesílatel Peter Smith <smithpb2250@gmail.com>
napsal:

On Fri, Aug 21, 2020 at 5:21 AM Pavel Stehule <pavel.stehule@gmail.com>
wrote:

new patch attached

Thanks for taking some of my previous review comments.

I have re-checked the string_to_table_20200820.patch.

Below are some remaining questions/comments:

====

COMMENT (help text)

+        Splits the <parameter>string</parameter> at occurrences
+        of <parameter>delimiter</parameter> and forms the remaining data
+        into a <type>text</type> tavke.

What did you mean by "remaining" in that description?
It gets a bit strange thinking about remaining NULLs, or remaining
empty strings.

Why not just say "... and forms the data into a <type>text</type> table."

---

+        Splits the <parameter>string</parameter> at occurrences
+        of <parameter>delimiter</parameter> and forms the remaining data
+        into a <type>text</type> tavke.

Typo: "tavke." -> "table."

This text is taken from doc for string_to_array

I fixed typo. I hope and expect so doc will be finalized by native
speakers.

====

COMMENT (help text reference to regexp_split_to_table)

+        input <parameter>string</parameter> can be done by function
+        <function>regexp_split_to_table</function> (see <xref
linkend="functions-posix-regexp"/>).
+       </para>

In the previous review I suggested adding a reference to the
regexp_split_to_table function.
A hyperlink would be a bonus, but maybe it is not possible.

The hyperlink added in the latest patch is to page for POSIX Regular
Expressions, which doesn't seem appropriate.

ok I remove it

====

QUESTION (test cases)

Thanks for merging lots of my additional test cases!

Actually, the previous PDF I sent was 2 pages long but you only merged
the tests of page 1.
I wondered was it accidental to omit all those 2nd page tests?

I'll check it

I forgot it - now it is merged. Maybe it is over dimensioned for one
function, but it is (at the end) a test of string_to_array function too.

====

QUESTION (function name?)

I noticed that ALL current string functions that use delimiters have
the word "split" in their name.

e.g.
* regexp_split_to_array
* regexp_split_to_table
* split_part

But "string_to_table" is not following this pattern.

Maybe a different choice of function name would be more consistent
with what is already there?
e.g. split_to_table, string_split_to_table, etc.

I don't agree. This function is twin (with almost identical behaviour) for
"string_to_array" function, so I think so the name is correct.

Unfortunately - there is not consistency in naming already, But I think so
string_to_table is a better name, because this function is almost identical
with string_to_array.

Regards

Pavel

Show quoted text

====

Kind Regards,
Peter Smith.
Fujitsu Australia

Attachments:

string_to_table-20200821.patchtext/x-patch; charset=US-ASCII; name=string_to_table-20200821.patchDownload+1251-36
#13Peter Smith
smithpb2250@gmail.com
In reply to: Pavel Stehule (#12)
Re: proposal - function string_to_table

I have re-checked the string_to_table_20200821.patch.

Below is one remaining problem.

====

COMMENT (help text)

+        Splits the <parameter>string</parameter> at occurrences
+        of <parameter>delimiter</parameter> and forms the remaining data
+        into a table with one <type>text</type> type column.
+        If <parameter>delimiter</parameter> is <literal>NULL</literal>,
+        each character in the <parameter>string</parameter> will become a
+        separate element in the array.

Seems like here is a cut/paste error from the string_to_array help text.

"separate element in the array" should say "separate row of the table"

====

Maybe a different choice of function name would be more consistent
with what is already there?
e.g. split_to_table, string_split_to_table, etc.

I don't agree. This function is twin (with almost identical behaviour) for "string_to_array" function, so I think so the name is correct.

OK

====

Kind Regards,
Peter Smith.
Fujitsu Australia

#14Pavel Stehule
pavel.stehule@gmail.com
In reply to: Peter Smith (#13)
Re: proposal - function string_to_table

po 24. 8. 2020 v 4:19 odesílatel Peter Smith <smithpb2250@gmail.com> napsal:

I have re-checked the string_to_table_20200821.patch.

Below is one remaining problem.

====

COMMENT (help text)

+        Splits the <parameter>string</parameter> at occurrences
+        of <parameter>delimiter</parameter> and forms the remaining data
+        into a table with one <type>text</type> type column.
+        If <parameter>delimiter</parameter> is <literal>NULL</literal>,
+        each character in the <parameter>string</parameter> will become a
+        separate element in the array.

Seems like here is a cut/paste error from the string_to_array help text.

"separate element in the array" should say "separate row of the table"

fixed

====

Maybe a different choice of function name would be more consistent
with what is already there?
e.g. split_to_table, string_split_to_table, etc.

I don't agree. This function is twin (with almost identical behaviour)

for "string_to_array" function, so I think so the name is correct.

OK

====

please, check attached patch

Regards

Pavel

Show quoted text

Kind Regards,
Peter Smith.
Fujitsu Australia

Attachments:

string_to_table-20200824.patchtext/x-patch; charset=US-ASCII; name=string_to_table-20200824.patchDownload+1251-36
#15Peter Smith
smithpb2250@gmail.com
In reply to: Pavel Stehule (#14)
Re: proposal - function string_to_table

Hi.

I have re-checked the string_to_table_20200824.patch.

====

On Tue, Aug 25, 2020 at 2:34 AM Pavel Stehule <pavel.stehule@gmail.com> wrote:

COMMENT (help text)

+        Splits the <parameter>string</parameter> at occurrences
+        of <parameter>delimiter</parameter> and forms the remaining data
+        into a table with one <type>text</type> type column.
+        If <parameter>delimiter</parameter> is <literal>NULL</literal>,
+        each character in the <parameter>string</parameter> will become a
+        separate element in the array.

Seems like here is a cut/paste error from the string_to_array help text.

"separate element in the array" should say "separate row of the table"

fixed

No. You wrote "separate row of table". Should say "separate row of the table".

====

QUESTION (pg_proc.dat)

I noticed the oids of the functions are modified in this latest patch.
They seem 1000's away from the next nearest oid.
I was curious about the reason for those particular numbers (8432, 8433)?

====

Kind Regards,
Peter Smith.
Fujitsu Australia

#16Pavel Stehule
pavel.stehule@gmail.com
In reply to: Peter Smith (#15)
Re: proposal - function string_to_table

út 25. 8. 2020 v 1:19 odesílatel Peter Smith <smithpb2250@gmail.com> napsal:

Hi.

I have re-checked the string_to_table_20200824.patch.

====

On Tue, Aug 25, 2020 at 2:34 AM Pavel Stehule <pavel.stehule@gmail.com>
wrote:

COMMENT (help text)

+        Splits the <parameter>string</parameter> at occurrences
+        of <parameter>delimiter</parameter> and forms the remaining

data

+        into a table with one <type>text</type> type column.
+        If <parameter>delimiter</parameter> is <literal>NULL</literal>,
+        each character in the <parameter>string</parameter> will

become a

+ separate element in the array.

Seems like here is a cut/paste error from the string_to_array help text.

"separate element in the array" should say "separate row of the table"

fixed

No. You wrote "separate row of table". Should say "separate row of the
table".

should be fixed now

====

QUESTION (pg_proc.dat)

I noticed the oids of the functions are modified in this latest patch.
They seem 1000's away from the next nearest oid.
I was curious about the reason for those particular numbers (8432, 8433)?

When you run ./unused_oids script, then you get this message

[pavel@nemesis catalog]$ ./unused_oids
4 - 9
560 - 583
786 - 789
811 - 816
1136 - 1137
2121
2137
2228
3435
3585
4035
4142
4179 - 4180
4198 - 4199
4225 - 4301
4388 - 4401
4450 - 4451
4532 - 4565
4572 - 4999
5097 - 5999
6015 - 6099
6105
6107 - 6109
6116
6122 - 8431
8434 - 8455
8457 - 9999
Patches should use a more-or-less consecutive range of OIDs.
Best practice is to start with a random choice in the range 8000-9999.
Suggested random unused OID: 8973 (1027 consecutive OID(s) available
starting here)

For me, this is simple protection against oid collision under development,
and I expect so commiters does oid' space defragmentation.

Regards

Pavel

Show quoted text

====

Kind Regards,
Peter Smith.
Fujitsu Australia

Attachments:

string_to_table-20200825.patchtext/x-patch; charset=US-ASCII; name=string_to_table-20200825.patchDownload+1251-36
#17Peter Smith
smithpb2250@gmail.com
In reply to: Pavel Stehule (#16)
Re: proposal - function string_to_table

On Tue, Aug 25, 2020 at 4:58 PM Pavel Stehule <pavel.stehule@gmail.com> wrote:

When you run ./unused_oids script, then you get this message

[pavel@nemesis catalog]$ ./unused_oids

<snip>

Patches should use a more-or-less consecutive range of OIDs.
Best practice is to start with a random choice in the range 8000-9999.
Suggested random unused OID: 8973 (1027 consecutive OID(s) available starting here)

For me, this is simple protection against oid collision under development, and I expect so commiters does oid' space defragmentation.

I have not used that tool before. Thanks for teaching me!

===

I have re-checked the string_to_table_20200825.patch.

Everything looks good to me now, so I am marking this as "ready for committer".

Kind Regards,
Peter Smith.
Fujitsu Australia

#18Pavel Stehule
pavel.stehule@gmail.com
In reply to: Peter Smith (#17)
Re: proposal - function string_to_table

út 25. 8. 2020 v 11:19 odesílatel Peter Smith <smithpb2250@gmail.com>
napsal:

On Tue, Aug 25, 2020 at 4:58 PM Pavel Stehule <pavel.stehule@gmail.com>
wrote:

When you run ./unused_oids script, then you get this message

[pavel@nemesis catalog]$ ./unused_oids

<snip>

Patches should use a more-or-less consecutive range of OIDs.
Best practice is to start with a random choice in the range 8000-9999.
Suggested random unused OID: 8973 (1027 consecutive OID(s) available

starting here)

For me, this is simple protection against oid collision under

development, and I expect so commiters does oid' space defragmentation.

I have not used that tool before. Thanks for teaching me!

:)

===

I have re-checked the string_to_table_20200825.patch.

Everything looks good to me now, so I am marking this as "ready for
committer".

Thank you very much :)

Regard

Pavel

Show quoted text

Kind Regards,
Peter Smith.
Fujitsu Australia

#19Tom Lane
tgl@sss.pgh.pa.us
In reply to: Pavel Stehule (#18)
Re: proposal - function string_to_table

Pavel Stehule <pavel.stehule@gmail.com> writes:

[ string_to_table-20200825.patch ]

I reviewed this, whacked it around a little, and pushed it.

Possibly the most controversial thing I did was to move the existing
documentation entry for string_to_array() into the string-functions
table. I did not like it one bit that the patch was documenting
string_to_table() far away from string_to_array(), and on reflection
I concluded that you'd picked the right place and the issue here is
that string_to_array() was in the wrong place.

Also, I pared the proposed regression tests a great deal, ending up
with something that matches the existing tests for string_to_array().
The proposed tests seemed mighty duplicative, and they even contained
syntax errors, so I didn't believe that they were carefully considered.

regards, tom lane

#20Peter Smith
smithpb2250@gmail.com
In reply to: Tom Lane (#19)
Re: proposal - function string_to_table

On Thu, Sep 3, 2020 at 8:30 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

The proposed tests seemed mighty duplicative, and they even contained
syntax errors, so I didn't believe that they were carefully considered.

Can you please share examples of what syntax errors were in those
previous tests?

Kind Regards,
Peter Smith.
Fujitsu Australia

#21Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter Smith (#20)
#22Pavel Stehule
pavel.stehule@gmail.com
In reply to: Tom Lane (#19)