Weird behaviour of C extension function

Started by Amaury Bouchard almost 6 years ago, 2 messages, general
#1 Amaury Bouchard
amaury.bouchard@anasen.com

Hello everybody,

I am seeing really strange behaviour with a C function which takes a text
parameter.
Everything works fine when I call the function directly with a text
literal as the argument. But a problem occurs when the argument is read
from a table column.

To illustrate the problem, I stripped the function down to the minimum. The
source code is below, but first, here is the behaviour:

Direct call
-----------

select passthru('hello world!'), passthru('utf8 çhàràtérs'), passthru(' h3110 123 456 ');
INFO: INPUT STRING: 'hello world!' (12)
INFO: INPUT STRING: 'utf8 çhàràtérs' (18)
INFO: INPUT STRING: ' h3110 123 456 ' (15)

(as you can see, the log messages show the correct input, with the number
of bytes between parentheses)

Reading data from a table
-------------------------

create table mytable ( str text);
insert into mytable (str) values ('hello world!'), ('utf8 çhàràtérs'), (' h3110 123 456 ');

select passthru(str) from mytable;

INFO: INPUT STRING: 'lo world!' (12)
INFO: INPUT STRING: '8 çhàràtérs' (18)
INFO: INPUT STRING: '110 123 456 �
' (15)
INFO: INPUT STRING: '��' (5)
INFO: INPUT STRING: '' (3)

There, you can see that the pointer seems to be shifted 3 bytes too far.

Do you have any clue for this strange behaviour?

The source code
---------------

#include "postgres.h"
#include "fmgr.h"
#include "funcapi.h"

// PG module init
#ifdef PG_MODULE_MAGIC
PG_MODULE_MAGIC;
#endif
void _PG_init(void);
Datum passthru(PG_FUNCTION_ARGS);
PG_FUNCTION_INFO_V1(passthru);

void _PG_init() {
}

Datum passthru(PG_FUNCTION_ARGS) {
    // get the input string
    text *input = PG_GETARG_TEXT_PP(0);
    char *input_pt = (char*)VARDATA(input);
    int32 input_len = VARSIZE_ANY_EXHDR(input);
    // create a null terminated copy of the input string
    char *str_copy = calloc(1, input_len + 1);
    memcpy(str_copy, input_pt, input_len);
    // log message
    elog(INFO, "INPUT STRING: '%s' (%d)", str_copy, input_len);
    free(str_copy);
    PG_RETURN_NULL();
}

Thank you.
Best regards,

Amaury Bouchard

#2 Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Amaury Bouchard (#1)
Re: Weird behaviour of C extension function

On Fri, 2020-04-24 at 14:53 +0200, Amaury Bouchard wrote:

I am seeing really strange behaviour with a C function which takes a text parameter.
Everything works fine when I call the function directly with a text literal as the argument. But a problem occurs when the argument is read from a table column.

To illustrate the problem, I stripped the function down to the minimum. The source code is below, but first, here is the behaviour:

Direct call
-----------

select passthru('hello world!'), passthru('utf8 çhàràtérs'), passthru(' h3110 123 456 ');

INFO: INPUT STRING: 'hello world!' (12)
INFO: INPUT STRING: 'utf8 çhàràtérs' (18)
INFO: INPUT STRING: ' h3110 123 456 ' (15)

(as you can see, the log messages show the correct input, with the number of bytes between parentheses)

Reading data from a table
-------------------------

create table mytable ( str text);
insert into mytable (str) values ('hello world!'), ('utf8 çhàràtérs'), (' h3110 123 456 ');
select passthru(str) from mytable;

INFO: INPUT STRING: 'lo world!' (12)
INFO: INPUT STRING: '8 çhàràtérs' (18)
INFO: INPUT STRING: '110 123 456 �
' (15)
INFO: INPUT STRING: '��' (5)
INFO: INPUT STRING: '' (3)

There, you can see that the pointer seems to be shifted 3 bytes too far.

Do you have any clue for this strange behaviour?

The source code
---------------

#include "postgres.h"
#include "fmgr.h"
#include "funcapi.h"

// PG module init
#ifdef PG_MODULE_MAGIC
PG_MODULE_MAGIC;
#endif
void _PG_init(void);
Datum passthru(PG_FUNCTION_ARGS);
PG_FUNCTION_INFO_V1(passthru);

void _PG_init() {
}

Datum passthru(PG_FUNCTION_ARGS) {
    // get the input string
    text *input = PG_GETARG_TEXT_PP(0);
    char *input_pt = (char*)VARDATA(input);
    int32 input_len = VARSIZE_ANY_EXHDR(input);
    // create a null terminated copy of the input string
    char *str_copy = calloc(1, input_len + 1);
    memcpy(str_copy, input_pt, input_len);
    // log message
    elog(INFO, "INPUT STRING: '%s' (%d)", str_copy, input_len);
    free(str_copy);
    PG_RETURN_NULL();
}

You will find this in "postgres.h":

* In consumers oblivious to data alignment, call PG_DETOAST_DATUM_PACKED(),
* VARDATA_ANY(), VARSIZE_ANY() and VARSIZE_ANY_EXHDR(). Elsewhere, call
* PG_DETOAST_DATUM(), VARDATA() and VARSIZE(). Directly fetching an int16,
* int32 or wider field in the struct representing the datum layout requires
* aligned data. memcpy() is alignment-oblivious, as are most operations on
* datatypes, such as text, whose layout struct contains only char fields.

So you should use VARDATA_ANY.
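
For illustration, here is a minimal sketch of how the function from the original post could look with that change applied. Only the VARDATA_ANY() line is the actual fix. Using palloc()/pfree() instead of calloc()/free() is an optional change and not something the problem requires. On PostgreSQL 16 or later the varlena macros may need an explicit include of "varatt.h" (an assumption to check against your server version).

```c
#include "postgres.h"
#include "fmgr.h"

PG_MODULE_MAGIC;

PG_FUNCTION_INFO_V1(passthru);

Datum
passthru(PG_FUNCTION_ARGS)
{
    /* The _PP variant may hand back a datum with a 1-byte or a 4-byte header. */
    text   *input = PG_GETARG_TEXT_PP(0);

    /* VARDATA_ANY() skips whichever header is actually present. */
    char   *input_pt = VARDATA_ANY(input);
    int32   input_len = VARSIZE_ANY_EXHDR(input);

    /* Null-terminated copy, allocated in the current memory context. */
    char   *str_copy = palloc(input_len + 1);

    memcpy(str_copy, input_pt, input_len);
    str_copy[input_len] = '\0';

    elog(INFO, "INPUT STRING: '%s' (%d)", str_copy, input_len);

    pfree(str_copy);
    PG_RETURN_NULL();
}
```

An alternative that keeps VARDATA() would be to fetch the argument with PG_GETARG_TEXT_P(0), which always returns a datum with a 4-byte header, at the cost of a possible extra copy.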

What happens is that these short text columns have a 1-byte TOAST header,
but you skip the first 4 bytes unconditionally, assuming they were detoasted.
That is why the pointer ends up 3 bytes too far.
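
The effect can be reproduced outside the server with a small standalone model of the two header formats. This is only a sketch: it assumes the little-endian header layout (low bit set for a 1-byte header, total length in the remaining bits), and every name in it is a hypothetical stand-in, not one of PostgreSQL's macros.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified model of the little-endian varlena header layout.
 * All names are hypothetical stand-ins for illustration only. */
enum { HDR_LONG = 4, HDR_SHORT = 1 };

/* Build a short datum: one header byte, low bit set,
 * total size (header included) in the upper 7 bits. */
static size_t make_short(unsigned char *buf, const char *s)
{
    size_t len = strlen(s);

    assert(len + HDR_SHORT <= 0x7F);    /* total size must fit in 7 bits */
    buf[0] = (unsigned char) (((len + HDR_SHORT) << 1) | 0x01);
    memcpy(buf + HDR_SHORT, s, len);
    return len + HDR_SHORT;
}

/* Build a long datum: four header bytes, low two bits clear,
 * total size (header included) in the upper 30 bits. */
static size_t make_long(unsigned char *buf, const char *s)
{
    size_t   len = strlen(s);
    uint32_t hdr = (uint32_t) (len + HDR_LONG) << 2;

    buf[0] = (unsigned char) (hdr & 0xFF);
    buf[1] = (unsigned char) ((hdr >> 8) & 0xFF);
    buf[2] = (unsigned char) ((hdr >> 16) & 0xFF);
    buf[3] = (unsigned char) ((hdr >> 24) & 0xFF);
    memcpy(buf + HDR_LONG, s, len);
    return len + HDR_LONG;
}

static int is_short(const unsigned char *d)
{
    return (d[0] & 0x01) != 0;
}

/* Payload length without header, in the spirit of VARSIZE_ANY_EXHDR(). */
static size_t payload_len(const unsigned char *d)
{
    uint32_t hdr;

    if (is_short(d))
        return (size_t) ((d[0] >> 1) & 0x7F) - HDR_SHORT;
    hdr = (uint32_t) d[0] | ((uint32_t) d[1] << 8) |
          ((uint32_t) d[2] << 16) | ((uint32_t) d[3] << 24);
    return (size_t) ((hdr >> 2) & 0x3FFFFFFF) - HDR_LONG;
}

/* Always skips 4 bytes, in the spirit of VARDATA(). */
static const char *payload_fixed(const unsigned char *d)
{
    return (const char *) d + HDR_LONG;
}

/* Skips the header that is actually there, in the spirit of VARDATA_ANY(). */
static const char *payload_any(const unsigned char *d)
{
    return (const char *) d + (is_short(d) ? HDR_SHORT : HDR_LONG);
}
```

With a zero-filled buffer holding a short datum for 'hello world!', payload_len() still reports 12, but payload_fixed() points 3 bytes past payload_any(), so the text starts at "lo world!", as in the log output from the table query. For a datum built with make_long(), both accessors return the same pointer, which matches the direct call working fine.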

Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com