[PATCH] Add missing type conversion functions for PL/Python

Started by Haozhou Wangabout 8 years ago16 messageshackers
Jump to latest
#1Haozhou Wang
hawang@pivotal.io

Hi All,

PL/Python already has different type conversion functions to
convert PostgreSQL datum to Python object. However, the conversion
functions from Python object to PostgreSQL datum only has Boolean,
Bytea and String functions.

In this patch, we add rest of Integer and Float related datatype conversion
functions
and can increase the performance of data conversion greatly especially
when returning a large array.

We did a quick test about the performance of returning array in PL/Python:

The following UDF is used for test:

```
CREATE OR REPLACE FUNCTION pyoutfloat8(num int) RETURNS float8[] AS $$
return [x/3.0 for x in range(num)]
$$ LANGUAGE plpythonu;
```

The test command is

```
select count(pyoutfloat8(n));
```

The count is used for avoid large output, where n is the number of element
in returned array, and the size is from 1.5k to15m.

Size of Array 1.5k | 15k |
150k | 1.5m | 15m |

Origin 2.324ms | 19.834ms | 194.991ms
| 1927.28ms | 19982.1ms |

With this patch 1.168ms | 3.052ms | 21.888ms |
213.39ms | 2138.5ms |

All test for both PG and PL/Python are passed.

Thanks very much.

--
Regards,
Haozhou

Attachments:

0001-Add-missing-type-conversion-functions-for-PL-Python.patchapplication/octet-stream; name=0001-Add-missing-type-conversion-functions-for-PL-Python.patchDownload+201-1
#2Anthony Bykov
a.bykov@postgrespro.ru
In reply to: Haozhou Wang (#1)
Re: [PATCH] Add missing type conversion functions for PL/Python

On Wed, 31 Jan 2018 11:57:12 +0800
Haozhou Wang <hawang@pivotal.io> wrote:

Hi All,

PL/Python already has different type conversion functions to
convert PostgreSQL datum to Python object. However, the conversion
functions from Python object to PostgreSQL datum only has Boolean,
Bytea and String functions.

In this patch, we add rest of Integer and Float related datatype
conversion functions
and can increase the performance of data conversion greatly especially
when returning a large array.

We did a quick test about the performance of returning array in
PL/Python:

The following UDF is used for test:

```
CREATE OR REPLACE FUNCTION pyoutfloat8(num int) RETURNS float8[] AS $$
return [x/3.0 for x in range(num)]
$$ LANGUAGE plpythonu;
```

The test command is

```
select count(pyoutfloat8(n));
```

The count is used for avoid large output, where n is the number of
element in returned array, and the size is from 1.5k to15m.

Size of Array 1.5k | 15k |
150k | 1.5m | 15m |

Origin 2.324ms | 19.834ms | 194.991ms
| 1927.28ms | 19982.1ms |

With this patch 1.168ms | 3.052ms |
21.888ms | 213.39ms | 2138.5ms |

All test for both PG and PL/Python are passed.

Thanks very much.

Hello,
sounds like a really nice patch. I've started looking
through the code and noticed a sort of a typos (or I just couldn't
understand what did you mean) in comments.

file "src/pl/plpython/plpy_typeio.c"
the comment is
* If can not convert if directly, fallback to PLyObject_ToDatum
* to convert it

Maybe it should be something like ("it" instead of second "if")
* If can not convert it directly, fallback to PLyObject_ToDatum
* to convert it

And the same typo is repeated several times in comments.

--
Anthony Bykov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

#3Haozhou Wang
hawang@pivotal.io
In reply to: Anthony Bykov (#2)
Re: [PATCH] Add missing type conversion functions for PL/Python

Thank you very much for your review!

I attached a new patch with typo fixed.

Regards,
Haozhou

On Mon, Feb 19, 2018 at 2:37 PM, Anthony Bykov <a.bykov@postgrespro.ru>
wrote:

Show quoted text

On Wed, 31 Jan 2018 11:57:12 +0800
Haozhou Wang <hawang@pivotal.io> wrote:

Hi All,

PL/Python already has different type conversion functions to
convert PostgreSQL datum to Python object. However, the conversion
functions from Python object to PostgreSQL datum only has Boolean,
Bytea and String functions.

In this patch, we add rest of Integer and Float related datatype
conversion functions
and can increase the performance of data conversion greatly especially
when returning a large array.

We did a quick test about the performance of returning array in
PL/Python:

The following UDF is used for test:

```
CREATE OR REPLACE FUNCTION pyoutfloat8(num int) RETURNS float8[] AS $$
return [x/3.0 for x in range(num)]
$$ LANGUAGE plpythonu;
```

The test command is

```
select count(pyoutfloat8(n));
```

The count is used for avoid large output, where n is the number of
element in returned array, and the size is from 1.5k to15m.

Size of Array 1.5k | 15k |
150k | 1.5m | 15m |

Origin 2.324ms | 19.834ms | 194.991ms
| 1927.28ms | 19982.1ms |

With this patch 1.168ms | 3.052ms |
21.888ms | 213.39ms | 2138.5ms |

All test for both PG and PL/Python are passed.

Thanks very much.

Hello,
sounds like a really nice patch. I've started looking
through the code and noticed a sort of a typos (or I just couldn't
understand what did you mean) in comments.

file "src/pl/plpython/plpy_typeio.c"
the comment is
* If can not convert if directly, fallback to PLyObject_ToDatum
* to convert it

Maybe it should be something like ("it" instead of second "if")
* If can not convert it directly, fallback to PLyObject_ToDatum
* to convert it

And the same typo is repeated several times in comments.

--
Anthony Bykov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachments:

0001-Add-missing-type-conversion-functions-for-PL-Python-v2.patchapplication/octet-stream; name=0001-Add-missing-type-conversion-functions-for-PL-Python-v2.patchDownload+201-1
#4David Steele
david@pgmasters.net
In reply to: Haozhou Wang (#3)
Re: Re: [PATCH] Add missing type conversion functions for PL/Python

On 2/20/18 10:14 AM, Haozhou Wang wrote:

Thank you very much for your review!

I attached a new patch with typo fixed.

I think it's a bit premature to mark this Ready for Committer after a
review consisting of a few typos. Anthony only said that he started
looking at it so I've marked it Needs Review.

Anthony, were you planning to have a deeper look?

Regards,
--
-David
david@pgmasters.net

#5Nikita Glukhov
n.gluhov@postgrespro.ru
In reply to: David Steele (#4)
Re: [PATCH] Add missing type conversion functions for PL/Python

On 26.03.2018 17:19, David Steele wrote:

On 2/20/18 10:14 AM, Haozhou Wang wrote:

Thank you very much for your review!

I attached a new patch with typo fixed.

I think it's a bit premature to mark this Ready for Committer after a
review consisting of a few typos. Anthony only said that he started
looking at it so I've marked it Needs Review.

Hi.

I also have looked at this patch and found some problems.
Attached fixed 3th version of the patch:
* initialization of arg->u.scalar was moved into PLy_output_setup_func()
* added range checks for int16 and int32 types
* added subroutine PLyInt_AsLong() for correct handling OverflowError that can
be thrown from PyInt_AsLong()
* casting from Python float to PostgreSQL numeric using PyFloat_AsDouble() was
removed because it can return incorrect result for Python long and
float8_numeric() uses float8 and numeric I/O functions
* fixed whitespace

--
Nikita Glukhov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachments:

0001-Add-missing-type-conversion-functions-for-PL-Python-v3.patchtext/x-patch; name=0001-Add-missing-type-conversion-functions-for-PL-Python-v3.patchDownload+238-3
#6Haozhou Wang
hawang@pivotal.io
In reply to: Nikita Glukhov (#5)
Re: [PATCH] Add missing type conversion functions for PL/Python

Thanks Nikita!

On Tue, Mar 27, 2018 at 12:07 AM, Nikita Glukhov <n.gluhov@postgrespro.ru>
wrote:

On 26.03.2018 17:19, David Steele wrote:

On 2/20/18 10:14 AM, Haozhou Wang wrote:

Thank you very much for your review!

I attached a new patch with typo fixed.

I think it's a bit premature to mark this Ready for Committer after a
review consisting of a few typos. Anthony only said that he started
looking at it so I've marked it Needs Review.

Hi.

I also have looked at this patch and found some problems.
Attached fixed 3th version of the patch:
* initialization of arg->u.scalar was moved into PLy_output_setup_func()
* added range checks for int16 and int32 types
* added subroutine PLyInt_AsLong() for correct handling OverflowError
that can
be thrown from PyInt_AsLong()
* casting from Python float to PostgreSQL numeric using
PyFloat_AsDouble() was
removed because it can return incorrect result for Python long and
float8_numeric() uses float8 and numeric I/O functions
* fixed whitespace

--
Nikita Glukhov

Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

--
Regards,
Haozhou

#7David Steele
david@pgmasters.net
In reply to: Nikita Glukhov (#5)
Re: [PATCH] Add missing type conversion functions for PL/Python

On 3/26/18 12:07 PM, Nikita Glukhov wrote:

On 26.03.2018 17:19, David Steele wrote:

On 2/20/18 10:14 AM, Haozhou Wang wrote:

Thank you very much for your review!

I attached a new patch with typo fixed.

I think it's a bit premature to mark this Ready for Committer after a
review consisting of a few typos.  Anthony only said that he started
looking at it so I've marked it Needs Review.

Hi.

I also have looked at this patch and found some problems.
Attached fixed 3th version of the patch:
 * initialization of arg->u.scalar was moved into PLy_output_setup_func()
 * added range checks for int16 and int32 types
 * added subroutine PLyInt_AsLong() for correct handling OverflowError
that can
   be thrown from PyInt_AsLong()
 * casting from Python float to PostgreSQL numeric using
PyFloat_AsDouble() was
   removed because it can return incorrect result for Python long and
   float8_numeric() uses float8 and numeric I/O functions
 * fixed whitespace

There's a new patch on this thread but since it was not submitted by the
author I've moved this entry to the next CF in Waiting for Author state.

Regards,
--
-David
david@pgmasters.net

#8Haozhou Wang
hawang@pivotal.io
In reply to: David Steele (#7)
Re: [PATCH] Add missing type conversion functions for PL/Python

Hi David,

Thanks for your email.
Just wondering will I need to re-submit this patch?

Thanks a lot!

Regards,
Haozhou

On Tue, Apr 10, 2018 at 9:35 PM, David Steele <david@pgmasters.net> wrote:

Show quoted text

On 3/26/18 12:07 PM, Nikita Glukhov wrote:

On 26.03.2018 17:19, David Steele wrote:

On 2/20/18 10:14 AM, Haozhou Wang wrote:

Thank you very much for your review!

I attached a new patch with typo fixed.

I think it's a bit premature to mark this Ready for Committer after a
review consisting of a few typos. Anthony only said that he started
looking at it so I've marked it Needs Review.

Hi.

I also have looked at this patch and found some problems.
Attached fixed 3th version of the patch:
* initialization of arg->u.scalar was moved into PLy_output_setup_func()
* added range checks for int16 and int32 types
* added subroutine PLyInt_AsLong() for correct handling OverflowError
that can
be thrown from PyInt_AsLong()
* casting from Python float to PostgreSQL numeric using
PyFloat_AsDouble() was
removed because it can return incorrect result for Python long and
float8_numeric() uses float8 and numeric I/O functions
* fixed whitespace

There's a new patch on this thread but since it was not submitted by the
author I've moved this entry to the next CF in Waiting for Author state.

Regards,
--
-David
david@pgmasters.net

#9David Steele
david@pgmasters.net
In reply to: Haozhou Wang (#8)
Re: [PATCH] Add missing type conversion functions for PL/Python

On 4/11/18 2:36 AM, Haozhou Wang wrote:

Thanks for your email.
Just wondering will I need to re-submit this patch?

No need to resubmit, the CF entry has been moved here:

https://commitfest.postgresql.org/18/1499/

You should have a look at Nikita's patch, though.

Regards,
--
-David
david@pgmasters.net

#10Haozhou Wang
hawang@pivotal.io
In reply to: David Steele (#9)
Re: [PATCH] Add missing type conversion functions for PL/Python

Hi David,

This new patch is look good for me. So I change the status to need review.
Thanks!

Regards,
Haozhou

On Wed, Apr 11, 2018 at 7:52 PM, David Steele <david@pgmasters.net> wrote:

Show quoted text

On 4/11/18 2:36 AM, Haozhou Wang wrote:

Thanks for your email.
Just wondering will I need to re-submit this patch?

No need to resubmit, the CF entry has been moved here:

https://commitfest.postgresql.org/18/1499/

You should have a look at Nikita's patch, though.

Regards,
--
-David
david@pgmasters.net

#11Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Nikita Glukhov (#5)
Re: [PATCH] Add missing type conversion functions for PL/Python

On 26/03/18 19:07, Nikita Glukhov wrote:

Attached fixed 3th version of the patch:

Thanks, I'm reviewing this now. Nice speedup!

There is no test coverage for some of the added code. You can get a code
coverage report with:

./configure --enable-coverage ...
make
make -C src/pl/plpython check
make coverage-html

That produces a code coverage report in coverage/index.html. Please look
at the coverage of the new functions, and add tests to
src/pl/plpython/sql/plpython_types.sql so that all the new code is covered.

In some places, where you've already checked the object type e.g. with
PyFloat_Check(), I think you could use the less safe variants, like
PyFloat_AS_DOUBLE() instead of PyFloat_AsDouble(). Since this patch is
all about performance, seems worth doing.

Some of the conversions seem a bit questionable. For example:

/*
* Convert a Python object to a PostgreSQL float8 datum directly.
* If can not convert it directly, fallback to PLyObject_ToScalar
* to convert it.
*/
static Datum
PLyObject_ToFloat8(PLyObToDatum *arg, PyObject *plrv,
bool *isnull, bool inarray)
{
if (plrv == Py_None)
{
*isnull = true;
return (Datum) 0;
}

if (PyFloat_Check(plrv) || PyInt_Check(plrv) || PyLong_Check(plrv))
{
*isnull = false;
return Float8GetDatum((double)PyFloat_AsDouble(plrv));
}

return PLyObject_ToScalar(arg, plrv, isnull, inarray);
}

The conversion from Python int to C double is performed by
PyFloat_AsDouble(). But there's no error checking. And wouldn't
PyLong_AsDouble() be more appropriate in that case, since we already
checked the python type?

- Heikki

#12Haozhou Wang
hawang@pivotal.io
In reply to: Heikki Linnakangas (#11)
Re: [PATCH] Add missing type conversion functions for PL/Python

Hi Heikki,

Thank you very much for your review!
I will prepare a new patch and make it ready soon.

Regards,
Haozhou

On Thu, Jul 12, 2018 at 2:03 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:

On 26/03/18 19:07, Nikita Glukhov wrote:

Attached fixed 3th version of the patch:

Thanks, I'm reviewing this now. Nice speedup!

There is no test coverage for some of the added code. You can get a code
coverage report with:

./configure --enable-coverage ...
make
make -C src/pl/plpython check
make coverage-html

That produces a code coverage report in coverage/index.html. Please look
at the coverage of the new functions, and add tests to
src/pl/plpython/sql/plpython_types.sql so that all the new code is covered.

In some places, where you've already checked the object type e.g. with
PyFloat_Check(), I think you could use the less safe variants, like
PyFloat_AS_DOUBLE() instead of PyFloat_AsDouble(). Since this patch is
all about performance, seems worth doing.

Some of the conversions seem a bit questionable. For example:

/*
* Convert a Python object to a PostgreSQL float8 datum directly.
* If can not convert it directly, fallback to PLyObject_ToScalar
* to convert it.
*/
static Datum
PLyObject_ToFloat8(PLyObToDatum *arg, PyObject *plrv,
bool *isnull, bool inarray)
{
if (plrv == Py_None)
{
*isnull = true;
return (Datum) 0;
}

if (PyFloat_Check(plrv) || PyInt_Check(plrv) || PyLong_Check(plrv))
{
*isnull = false;
return Float8GetDatum((double)PyFloat_AsDouble(plrv));
}

return PLyObject_ToScalar(arg, plrv, isnull, inarray);
}

The conversion from Python int to C double is performed by
PyFloat_AsDouble(). But there's no error checking. And wouldn't
PyLong_AsDouble() be more appropriate in that case, since we already
checked the python type?

- Heikki

--
Regards,
Haozhou

#13Nikita Glukhov
n.gluhov@postgrespro.ru
In reply to: Heikki Linnakangas (#11)
Re: [PATCH] Add missing type conversion functions for PL/Python

On 11.07.2018 21:03, Heikki Linnakangas wrote:

On 26/03/18 19:07, Nikita Glukhov wrote:

Attached fixed 3th version of the patch:

Thanks, I'm reviewing this now. Nice speedup!

Thank you for your review.

There is no test coverage for some of the added code. You can get a
code coverage report with:

./configure --enable-coverage ...
make
make -C src/pl/plpython check
make coverage-html

That produces a code coverage report in coverage/index.html. Please
look at the coverage of the new functions, and add tests to
src/pl/plpython/sql/plpython_types.sql so that all the new code is
covered.

I have added some cross-type test cases and now almost all new code is covered
(excluding several error cases which can be triggered only by custom numeric
type implementations).

In some places, where you've already checked the object type e.g. with
PyFloat_Check(), I think you could use the less safe variants, like
PyFloat_AS_DOUBLE() instead of PyFloat_AsDouble(). Since this patch is
all about performance, seems worth doing.

Fixed.

Some of the conversions seem a bit questionable. For example:

/*
 * Convert a Python object to a PostgreSQL float8 datum directly.
 * If can not convert it directly, fallback to PLyObject_ToScalar
 * to convert it.
 */
static Datum
PLyObject_ToFloat8(PLyObToDatum *arg, PyObject *plrv,
                   bool *isnull, bool inarray)
{
    if (plrv == Py_None)
    {
        *isnull = true;
        return (Datum) 0;
    }

    if (PyFloat_Check(plrv) || PyInt_Check(plrv) || PyLong_Check(plrv))
    {
        *isnull = false;
        return Float8GetDatum((double)PyFloat_AsDouble(plrv));
    }

    return PLyObject_ToScalar(arg, plrv, isnull, inarray);
}

The conversion from Python int to C double is performed by
PyFloat_AsDouble(). But there's no error checking. And wouldn't
PyLong_AsDouble() be more appropriate in that case, since we already
checked the python type?

Yes, this might be wrong, but PyFloat_AsDouble() internally tries first to
convert number to float. Also, after gaining more experience in PL/Python
during the implementation of jsonb transforms, I found a lot of similar
problems in the code. All of them are fixed in the 4th version of the patch.

--
Nikita Glukhov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachments:

0001-Add-missing-type-conversion-functions-for-PL-Python-v4.patchtext/x-patch; name=0001-Add-missing-type-conversion-functions-for-PL-Python-v4.patchDownload+1146-3
#14Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Nikita Glukhov (#13)
Re: [PATCH] Add missing type conversion functions for PL/Python

On 12/07/18 18:06, Nikita Glukhov wrote:

I have added some cross-type test cases and now almost all new code is covered
(excluding several error cases which can be triggered only by custom numeric
type implementations).

Thanks! Some of those new tests actually fail, if you run them against
unpatched master. For example:

  SELECT * FROM test_type_conversion_float8_int2(100::float8);
  INFO:  (100.0, <type 'float'>)
- test_type_conversion_float8_int2
-----------------------------------
-                              100
-(1 row)
-
+ERROR:  invalid input syntax for integer: "100.0"
+CONTEXT:  while creating return value
+PL/Python function "test_type_conversion_float8_int2"

So this patch is making some subtle changes to behavior. I don't think
we want that.

- Heikki

#15Haozhou Wang
hawang@pivotal.io
In reply to: Heikki Linnakangas (#14)
Re: [PATCH] Add missing type conversion functions for PL/Python
+1, I also think that we may not change the previous behavior of plpython.
@Nikita Glukhov <n.gluhov@postgrespro.ru> maybe we just check the
whether pyobject is int or long only in related conversion functions, and
fallback otherwise?

On Fri, Jul 13, 2018 at 12:09 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:

On 12/07/18 18:06, Nikita Glukhov wrote:

I have added some cross-type test cases and now almost all new code is

covered

(excluding several error cases which can be triggered only by custom

numeric

type implementations).

Thanks! Some of those new tests actually fail, if you run them against
unpatched master. For example:

SELECT * FROM test_type_conversion_float8_int2(100::float8);
INFO:  (100.0, <type 'float'>)
- test_type_conversion_float8_int2
-----------------------------------
-                              100
-(1 row)
-
+ERROR:  invalid input syntax for integer: "100.0"
+CONTEXT:  while creating return value
+PL/Python function "test_type_conversion_float8_int2"

So this patch is making some subtle changes to behavior. I don't think
we want that.

- Heikki

--
Regards,
Haozhou

#16Michael Paquier
michael@paquier.xyz
In reply to: Haozhou Wang (#15)
Re: [PATCH] Add missing type conversion functions for PL/Python

On Mon, Jul 16, 2018 at 03:35:57PM +0800, Haozhou Wang wrote:

+1, I also think that we may not change the previous behavior of plpython.
@Nikita Glukhov <n.gluhov@postgrespro.ru> maybe we just check the
whether pyobject is int or long only in related conversion functions, and
fallback otherwise?

This patch was around for some time, and the changes in behavior are not
really nice, so this is marked as returned with feedback.
--
Michael