[PATCH] Add some documentation on how to call internal functions

Started by Florents Tselaiover 1 year ago9 messageshackers
Jump to latest
#1Florents Tselai
florents.tselai@gmail.com

Hi,

The documentation on extending using C functions,
leaves a blank on how to call other internal Postgres functions.
This can leave first-timers hanging.

I think it’d be helpful to spend some paragraphs on discussing the DirectFunctionCall API.

Here’s an attempt (also a PR[0]https://github.com/Florents-Tselai/postgres/pull/1/commits/1651b7bb68e0f9c2b61e1462367295d846d253ec).

Here’s the relevant HTML snippet for convenience:
To call another version-1 function, you can use DirectFunctionCalln(func, arg1,...,argn). This is particularly useful when you want to call functions defined in the standard internal library, by using an interface similar to their SQL signature.

Different flavors of similar macros can be found in fmgr.h. The main point though is that they expect a C function name to call as their first argument (or its Oid in some cases), and actual arguments should be supplied as Datums. They always return Datum.

For example, to call the starts_with(text, text) from C, you can search through the catalog and find out that its C implementation is based on the Datum text_starts_with(PG_FUNCTION_ARGS) function.

In fmgr.h there are also available macros the facilitate conversions between C types and Datum. For example to turn text* into Datum, you can use DatumGetTextPP(X). If your extension defines additional types, it is usually convenient to define similar macros for these types too.

I’ve also added the below example function:

PG_FUNCTION_INFO_V1(t_starts_with);

Datum
t_starts_with(PG_FUNCTION_ARGS)
{
Datum t1 = PG_GETARG_DATUM(0);
Datum t2 = PG_GETARG_DATUM(1);
bool bool_res;

Datum datum_res = DirectFunctionCall2(text_starts_with, t1, t2);
bool_res = DatumGetBool(datum_res);

PG_RETURN_BOOL(bool_res);
}
PS1: I was not sure if src/tutorial is still relevant with this part of the documentation.
If so, it needs updating too.

[0]: https://github.com/Florents-Tselai/postgres/pull/1/commits/1651b7bb68e0f9c2b61e1462367295d846d253ec

Attachments:

0001-Add-some-documentation-on-how-to-call-internal-funct.patchapplication/octet-stream; name=0001-Add-some-documentation-on-how-to-call-internal-funct.patch; x-unix-mode=0644Download+51-1
#2Pavel Stehule
pavel.stehule@gmail.com
In reply to: Florents Tselai (#1)
Re: [PATCH] Add some documentation on how to call internal functions

Hi

I am checking this patch

čt 12. 9. 2024 v 11:42 odesílatel Florents Tselai <florents.tselai@gmail.com>
napsal:

Hi,

The documentation on extending using C functions,
leaves a blank on how to call other internal Postgres functions.
This can leave first-timers hanging.

I think it’d be helpful to spend some paragraphs on discussing the
DirectFunctionCall API.

Here’s an attempt (also a PR[0]).

Here’s the relevant HTML snippet for convenience:

To call another version-1 function, you can use DirectFunctionCall*n*(func,
arg1,...,argn). This is particularly useful when you want to call
functions defined in the standard internal library, by using an interface
similar to their SQL signature.

Different flavors of similar macros can be found in fmgr.h. The main
point though is that they expect a C function name to call as their first
argument (or its Oid in some cases), and actual arguments should be
supplied as Datums. They always return Datum.

For example, to call the starts_with(text, text) from C, you can search
through the catalog and find out that its C implementation is based on the Datum
text_starts_with(PG_FUNCTION_ARGS) function.

In fmgr.h there are also available macros the facilitate conversions
between C types and Datum. For example to turn text* into Datum, you can
use DatumGetTextPP(X). If your extension defines additional types, it is
usually convenient to define similar macros for these types too.
I’ve also added the below example function:

PG_FUNCTION_INFO_V1(t_starts_with);

Datum
t_starts_with(PG_FUNCTION_ARGS)
{
Datum t1 = PG_GETARG_DATUM(0);
Datum t2 = PG_GETARG_DATUM(1);
bool bool_res;

Datum datum_res = DirectFunctionCall2(text_starts_with, t1, t2);
bool_res = DatumGetBool(datum_res);

PG_RETURN_BOOL(bool_res);
}

PS1: I was not sure if src/tutorial is still relevant with this part of
the documentation.
If so, it needs updating too.

I have few points

1. I am missing warning so NULL is not supported by DirectFunctionCall - on
input and on output. Any argument should not be null, and the result should
not be null.

2. The example looks little bit not consistent

I propose

Text *t1 = PG_GETARG_TEXT_PP(0);
Text *t2 = PG_GETARG_TEXT_PP(1);
bool result;

result = DatumGetBool(
DirectFunctionCall2(text_starts_with,
PointerGetDatum(t1),
PointerGetDatum(t2));

PG_RETURN_BOOL(result);

or

Datum t1 = PG_GETARG_DATUM(0);
Datum t2 = PG_GETARG_DATUM(1);
Datum result;

result = DirectFunctionCall2(text_starts_with, t1, t2);

PG_RETURN_DATUM(result);

Both examples can show some interesting patterns (maybe can be used both)

Regards

Pavel

Show quoted text

[0]
https://github.com/Florents-Tselai/postgres/pull/1/commits/1651b7bb68e0f9c2b61e1462367295d846d253ec

#3Florents Tselai
florents.tselai@gmail.com
In reply to: Pavel Stehule (#2)
Re: [PATCH] Add some documentation on how to call internal functions

On 9 Oct 2024, at 2:57 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:

Hi

I am checking this patch

čt 12. 9. 2024 v 11:42 odesílatel Florents Tselai <florents.tselai@gmail.com <mailto:florents.tselai@gmail.com>> napsal:

Hi,

The documentation on extending using C functions,
leaves a blank on how to call other internal Postgres functions.
This can leave first-timers hanging.

I think it’d be helpful to spend some paragraphs on discussing the DirectFunctionCall API.

Here’s an attempt (also a PR[0]).

Here’s the relevant HTML snippet for convenience:
To call another version-1 function, you can use DirectFunctionCalln(func, arg1,...,argn). This is particularly useful when you want to call functions defined in the standard internal library, by using an interface similar to their SQL signature.

Different flavors of similar macros can be found in fmgr.h. The main point though is that they expect a C function name to call as their first argument (or its Oid in some cases), and actual arguments should be supplied as Datums. They always return Datum.

For example, to call the starts_with(text, text) from C, you can search through the catalog and find out that its C implementation is based on the Datum text_starts_with(PG_FUNCTION_ARGS) function.

In fmgr.h there are also available macros the facilitate conversions between C types and Datum. For example to turn text* into Datum, you can use DatumGetTextPP(X). If your extension defines additional types, it is usually convenient to define similar macros for these types too.

I’ve also added the below example function:

PG_FUNCTION_INFO_V1(t_starts_with);

Datum
t_starts_with(PG_FUNCTION_ARGS)
{
Datum t1 = PG_GETARG_DATUM(0);
Datum t2 = PG_GETARG_DATUM(1);
bool bool_res;

Datum datum_res = DirectFunctionCall2(text_starts_with, t1, t2);
bool_res = DatumGetBool(datum_res);

PG_RETURN_BOOL(bool_res);
}
PS1: I was not sure if src/tutorial is still relevant with this part of the documentation.
If so, it needs updating too.

I have few points

1. I am missing warning so NULL is not supported by DirectFunctionCall - on input and on output. Any argument should not be null, and the result should not be null.

You’re right.

2. The example looks little bit not consistent

I propose

Text *t1 = PG_GETARG_TEXT_PP(0);
Text *t2 = PG_GETARG_TEXT_PP(1);
bool result;

result = DatumGetBool(
DirectFunctionCall2(text_starts_with,
PointerGetDatum(t1),
PointerGetDatum(t2));

PG_RETURN_BOOL(result);

or

Datum t1 = PG_GETARG_DATUM(0);
Datum t2 = PG_GETARG_DATUM(1);
Datum result;

result = DirectFunctionCall2(text_starts_with, t1, t2);

PG_RETURN_DATUM(result);

Both examples can show some interesting patterns (maybe can be used both)

The example can be extended a bit more.
I like your first proposal.

My primary purpose with this patch is to show explicitly how
one goes from SQL types, to Datum and/or char*/text if necessary,
use some internal C function (even if it doesn’t have an SQL equivalent).
and then return back an SQL type.

So, simply GET_ARG_DATUM() and then PG_RETURN_DATUM() defeats the purpose.

#4Pavel Stehule
pavel.stehule@gmail.com
In reply to: Florents Tselai (#3)
Re: [PATCH] Add some documentation on how to call internal functions

st 9. 10. 2024 v 14:39 odesílatel Florents Tselai <florents.tselai@gmail.com>
napsal:

On 9 Oct 2024, at 2:57 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:

Hi

I am checking this patch

čt 12. 9. 2024 v 11:42 odesílatel Florents Tselai <
florents.tselai@gmail.com> napsal:

Hi,

The documentation on extending using C functions,
leaves a blank on how to call other internal Postgres functions.
This can leave first-timers hanging.

I think it’d be helpful to spend some paragraphs on discussing the
DirectFunctionCall API.

Here’s an attempt (also a PR[0]).

Here’s the relevant HTML snippet for convenience:

To call another version-1 function, you can use DirectFunctionCall*n*(func,
arg1,...,argn). This is particularly useful when you want to call
functions defined in the standard internal library, by using an interface
similar to their SQL signature.

Different flavors of similar macros can be found in fmgr.h. The main
point though is that they expect a C function name to call as their first
argument (or its Oid in some cases), and actual arguments should be
supplied as Datums. They always return Datum.

For example, to call the starts_with(text, text) from C, you can search
through the catalog and find out that its C implementation is based on the Datum
text_starts_with(PG_FUNCTION_ARGS) function.

In fmgr.h there are also available macros the facilitate conversions
between C types and Datum. For example to turn text* into Datum, you can
use DatumGetTextPP(X). If your extension defines additional types, it is
usually convenient to define similar macros for these types too.
I’ve also added the below example function:

PG_FUNCTION_INFO_V1(t_starts_with);

Datum
t_starts_with(PG_FUNCTION_ARGS)
{
Datum t1 = PG_GETARG_DATUM(0);
Datum t2 = PG_GETARG_DATUM(1);
bool bool_res;

Datum datum_res = DirectFunctionCall2(text_starts_with, t1, t2);
bool_res = DatumGetBool(datum_res);

PG_RETURN_BOOL(bool_res);
}

PS1: I was not sure if src/tutorial is still relevant with this part of
the documentation.
If so, it needs updating too.

I have few points

1. I am missing warning so NULL is not supported by DirectFunctionCall -
on input and on output. Any argument should not be null, and the result
should not be null.

You’re right.

2. The example looks little bit not consistent

I propose

Text *t1 = PG_GETARG_TEXT_PP(0);
Text *t2 = PG_GETARG_TEXT_PP(1);
bool result;

result = DatumGetBool(
DirectFunctionCall2(text_starts_with,
PointerGetDatum(t1),
PointerGetDatum(t2));

PG_RETURN_BOOL(result);

or

Datum t1 = PG_GETARG_DATUM(0);
Datum t2 = PG_GETARG_DATUM(1);
Datum result;

result = DirectFunctionCall2(text_starts_with, t1, t2);

PG_RETURN_DATUM(result);

Both examples can show some interesting patterns (maybe can be used both)

The example can be extended a bit more.
I like your first proposal.

My primary purpose with this patch is to show explicitly how
one goes from SQL types, to Datum and/or char*/text if necessary,
use some internal C function (even if it doesn’t have an SQL equivalent).
and then return back an SQL type.

So, simply GET_ARG_DATUM() and then PG_RETURN_DATUM() defeats the purpose.

so please, send updated patch + sentence missing null support, and I think
so this patch can be ok

Regards

Pavel

#5Florents Tselai
florents.tselai@gmail.com
In reply to: Pavel Stehule (#4)
Re: [PATCH] Add some documentation on how to call internal functions

Here's a v2
- note on NULL args/returns
- ref PointerGetDatum
- used your example. Started adding some comments but don't think they're
really necessary. The reader gets the point as-is I think.

Attachments:

v2-0001-Add-some-documentation-on-how-to-call-internal-fu.patchapplication/octet-stream; name=v2-0001-Add-some-documentation-on-how-to-call-internal-fu.patchDownload+51-1
v2-0002-Add-note-on-NULL.-Update-example-to-use-text.patchapplication/octet-stream; name=v2-0002-Add-note-on-NULL.-Update-example-to-use-text.patchDownload+10-7
#6Pavel Stehule
pavel.stehule@gmail.com
In reply to: Florents Tselai (#5)
Re: [PATCH] Add some documentation on how to call internal functions

Hi

st 9. 10. 2024 v 20:54 odesílatel Florents Tselai <florents.tselai@gmail.com>
napsal:

Here's a v2
- note on NULL args/returns
- ref PointerGetDatum
- used your example. Started adding some comments but don't think they're
really necessary. The reader gets the point as-is I think.

Now, I don't see any issue

I tested this patch and there is not any problem with patching and compiling

I'll mark this patch as ready for committer

Regards

Pavel

Show quoted text
#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: Pavel Stehule (#6)
Re: [PATCH] Add some documentation on how to call internal functions

Pavel Stehule <pavel.stehule@gmail.com> writes:

I'll mark this patch as ready for committer

I spent a little time looking at this. I agree that it's better to
show conversion of the example function's arguments to and from text*
rather than leaving them as Datum. I also do think that we need to
incorporate the example into src/tutorial, if only because that
provides a chance to test it. And indeed, testing exposed that the
example doesn't work:

$ cd src/tutorial
$ make
$ psql postgres
psql (18devel)
Type "help" for help.

postgres=# \i funcs.sql

psql:funcs.sql:152: ERROR: could not determine which collation to use for string comparison
HINT: Use the COLLATE clause to set the collation explicitly.

The problem here is that we failed to pass through the result of
PG_GET_COLLATION() to text_starts_with. We could do that, certainly,
for a couple more lines of code. But it feels like this is getting
into details that obscure the main point. I wonder if it'd be better
to choose a different example that calls a non-collation-dependent
target function.

Anyway, proposed v3 attached. I folded the two patches into one,
and editorialized on the text a little.

regards, tom lane

Attachments:

v3-Add-some-documentation.patchtext/x-diff; charset=us-ascii; name=v3-Add-some-documentation.patchDownload+100-4
#8Pavel Stehule
pavel.stehule@gmail.com
In reply to: Tom Lane (#7)
Re: [PATCH] Add some documentation on how to call internal functions

pá 18. 10. 2024 v 22:23 odesílatel Tom Lane <tgl@sss.pgh.pa.us> napsal:

Pavel Stehule <pavel.stehule@gmail.com> writes:

I'll mark this patch as ready for committer

I spent a little time looking at this. I agree that it's better to
show conversion of the example function's arguments to and from text*
rather than leaving them as Datum. I also do think that we need to
incorporate the example into src/tutorial, if only because that
provides a chance to test it. And indeed, testing exposed that the
example doesn't work:

$ cd src/tutorial
$ make
$ psql postgres
psql (18devel)
Type "help" for help.

postgres=# \i funcs.sql

psql:funcs.sql:152: ERROR: could not determine which collation to use for
string comparison
HINT: Use the COLLATE clause to set the collation explicitly.

The problem here is that we failed to pass through the result of
PG_GET_COLLATION() to text_starts_with. We could do that, certainly,
for a couple more lines of code. But it feels like this is getting
into details that obscure the main point. I wonder if it'd be better
to choose a different example that calls a non-collation-dependent
target function.

This can be a trap for some beginners too. So example of
DirectFunctionCall2Coll can be nice

<-->result = DatumGetBool(DirectFunctionCall2Coll(text_starts_with,
<--><--><--><--><--><--><--><--><--><--><--><--> DEFAULT_COLLATION_OID,
<--><--><--><--><--><--><--><--><--><--><--><--> PointerGetDatum(t1),
<--><--><--><--><--><--><--><--><--><--><--><--> PointerGetDatum(t2)));

With comment so text based functions can require collation - and simple
solution can be using DEFAULT_COLLATION_OID, and we can
introduce second example of just DirectFunctionCall

Datum
bytea_left(PG_FUNCTION_ARGS)
{
<-->bytea<-> *t = PG_GETARG_BYTEA_PP(0);
<-->int32<-><-->l = PG_GETARG_INT32(1);
<-->bytea<-> *result;

<-->result = DatumGetByteaPP(DirectFunctionCall3(bytea_substr,
<--><--><--><--><--><--><--><--><--><--><--><--> PointerGetDatum(t),
<--><--><--><--><--><--><--><--><--><--><--><--> Int32GetDatum(1),
<--><--><--><--><--><--><--><--><--><--><--><--> Int32GetDatum(l)));
<-->PG_RETURN_BYTEA_P(result);
}

Show quoted text

Anyway, proposed v3 attached. I folded the two patches into one,
and editorialized on the text a little.

regards, tom lane

#9Tom Lane
tgl@sss.pgh.pa.us
In reply to: Pavel Stehule (#8)
Re: [PATCH] Add some documentation on how to call internal functions

Pavel Stehule <pavel.stehule@gmail.com> writes:

pá 18. 10. 2024 v 22:23 odesílatel Tom Lane <tgl@sss.pgh.pa.us> napsal:

The problem here is that we failed to pass through the result of
PG_GET_COLLATION() to text_starts_with. We could do that, certainly,
for a couple more lines of code. But it feels like this is getting
into details that obscure the main point. I wonder if it'd be better
to choose a different example that calls a non-collation-dependent
target function.

This can be a trap for some beginners too. So example of
DirectFunctionCall2Coll can be nice

Hearing no other comments, I fixed the example to pass through
collation and pushed it.

regards, tom lane