Weird problems with C extension and bytea as input type

Started by Adrian Schreyerabout 15 years ago13 messagesgeneral
Jump to latest
#1Adrian Schreyer
ams214@cam.ac.uk

Hi,

I have a weird problem with my custom functions (written in C,C++)
that use bytea as input type (cstring works fine). The functions will
work as expected if they are the only function that uses the bytea
column in a query; as soon as there is a second function using the
same column, the C function will return one of the following: an empty
cstring, a substring of the bytea or the correct cstring. Based on
these symptoms I assume there is something fundamental that I do wrong
(or that is missing) with handling the bytea pointer.

In one specific example, the bytea contains a binary file format that
the function converts into a string format. I convert the bytea to a
C++ string with string(VARDATA(b), VARSIZE(b)-VARHDRSZ).

bytea *b = PG_GETARG_BYTEA_P(0);
char *ism;

ism = function(b);

PG_RETURN_CSTRING(ism);

#2Merlin Moncure
mmoncure@gmail.com
In reply to: Adrian Schreyer (#1)
Re: Weird problems with C extension and bytea as input type

On Tue, Mar 22, 2011 at 8:22 AM, Adrian Schreyer <ams214@cam.ac.uk> wrote:

Hi,

I have a weird problem with my custom functions (written in C,C++)
that use bytea as input type (cstring works fine). The functions will
work as expected if they are the only function that uses the bytea
column in a query; as soon as there is a second function using the
same column, the C function will return one of the following: an empty
cstring, a substring of the bytea or the correct cstring. Based on
these symptoms I assume there is something fundamental that I do wrong
(or that is missing) with handling the bytea pointer.

In one specific example, the bytea contains a binary file format that
the function converts into a string format. I convert the bytea to a
C++ string with string(VARDATA(b), VARSIZE(b)-VARHDRSZ).

bytea *b = PG_GETARG_BYTEA_P(0);
char *ism;

ism = function(b);

PG_RETURN_CSTRING(ism);

your problem is probably inside 'function' -- are you properly copying
the data out of the bytea struct?. also, are you really sure you want
to be returning cstring type, not text?

merlin

#3Adrian Schreyer
ams214@cam.ac.uk
In reply to: Merlin Moncure (#2)
Re: Weird problems with C extension and bytea as input type

On Tue, Mar 22, 2011 at 16:07, Merlin Moncure <mmoncure@gmail.com> wrote:

On Tue, Mar 22, 2011 at 8:22 AM, Adrian Schreyer <ams214@cam.ac.uk> wrote:

Hi,

I have a weird problem with my custom functions (written in C,C++)
that use bytea as input type (cstring works fine). The functions will
work as expected if they are the only function that uses the bytea
column in a query; as soon as there is a second function using the
same column, the C function will return one of the following: an empty
cstring, a substring of the bytea or the correct cstring. Based on
these symptoms I assume there is something fundamental that I do wrong
(or that is missing) with handling the bytea pointer.

In one specific example, the bytea contains a binary file format that
the function converts into a string format. I convert the bytea to a
C++ string with string(VARDATA(b), VARSIZE(b)-VARHDRSZ).

bytea *b = PG_GETARG_BYTEA_P(0);
char *ism;

ism = function(b);

PG_RETURN_CSTRING(ism);

your problem is probably inside 'function' -- are you properly copying
the data out of the bytea struct?.  also, are you really sure you want
to be returning cstring type, not text?

merlin

yes, the function in the C++ toolkit I use takes a std::string as
input. So far I have used string(VARDATA(oeb), VARSIZE(oeb)-VARHDRSZ)
to convert it. I want to change both input and output to text later on
once this is working. I suspect I will get similar problems with
pg_getarg_text_p() or any other struct. I think you are right, I am
not coping the data out of the struct correctly; all my functions that
use VARDATA() have these problems. What would be the proper way?

Cheers,

Adrian

#4Adrian Schreyer
ams214@cam.ac.uk
In reply to: Merlin Moncure (#2)
Re: Weird problems with C extension and bytea as input type

On Tue, Mar 22, 2011 at 16:07, Merlin Moncure <mmoncure@gmail.com> wrote:

On Tue, Mar 22, 2011 at 8:22 AM, Adrian Schreyer <ams214@cam.ac.uk> wrote:

Hi,

I have a weird problem with my custom functions (written in C,C++)
that use bytea as input type (cstring works fine). The functions will
work as expected if they are the only function that uses the bytea
column in a query; as soon as there is a second function using the
same column, the C function will return one of the following: an empty
cstring, a substring of the bytea or the correct cstring. Based on
these symptoms I assume there is something fundamental that I do wrong
(or that is missing) with handling the bytea pointer.

In one specific example, the bytea contains a binary file format that
the function converts into a string format. I convert the bytea to a
C++ string with string(VARDATA(b), VARSIZE(b)-VARHDRSZ).

bytea *b = PG_GETARG_BYTEA_P(0);
char *ism;

ism = function(b);

PG_RETURN_CSTRING(ism);

your problem is probably inside 'function' -- are you properly copying
the data out of the bytea struct?.  also, are you really sure you want
to be returning cstring type, not text?

merlin

yes, the function in the C++ toolkit I use takes a std::string as
input. So far I have used string(VARDATA(oeb), VARSIZE(oeb)-VARHDRSZ)
to convert it. I want to change both input and output to text later on
once this is working. I suspect I will get similar problems with
pg_getarg_text_p() or any other struct. I think you are right, I am
not coping the data out of the struct correctly; all my functions that
use VARDATA() have these problems. What would be the proper way?

Cheers,

Adrian

#5Merlin Moncure
mmoncure@gmail.com
In reply to: Adrian Schreyer (#3)
Re: Weird problems with C extension and bytea as input type

On Tue, Mar 22, 2011 at 11:57 AM, Adrian Schreyer <ams214@cam.ac.uk> wrote:

On Tue, Mar 22, 2011 at 16:07, Merlin Moncure <mmoncure@gmail.com> wrote:

On Tue, Mar 22, 2011 at 8:22 AM, Adrian Schreyer <ams214@cam.ac.uk> wrote:

Hi,

I have a weird problem with my custom functions (written in C,C++)
that use bytea as input type (cstring works fine). The functions will
work as expected if they are the only function that uses the bytea
column in a query; as soon as there is a second function using the
same column, the C function will return one of the following: an empty
cstring, a substring of the bytea or the correct cstring. Based on
these symptoms I assume there is something fundamental that I do wrong
(or that is missing) with handling the bytea pointer.

In one specific example, the bytea contains a binary file format that
the function converts into a string format. I convert the bytea to a
C++ string with string(VARDATA(b), VARSIZE(b)-VARHDRSZ).

bytea *b = PG_GETARG_BYTEA_P(0);
char *ism;

ism = function(b);

PG_RETURN_CSTRING(ism);

your problem is probably inside 'function' -- are you properly copying
the data out of the bytea struct?.  also, are you really sure you want
to be returning cstring type, not text?

merlin

yes, the function in the C++ toolkit I use takes a std::string as
input. So far I have used string(VARDATA(oeb), VARSIZE(oeb)-VARHDRSZ)
to convert it. I want to change both input and output to text later on
once this is working. I suspect I will get similar problems with
pg_getarg_text_p() or any other struct. I think you are right, I am
not coping the data out of the struct correctly; all my functions that
use VARDATA() have these problems. What would be the proper way?

Well, C++ string constructor is proper in the sense it makes copy of
the source data. however, it's a little weird that you are passing
bytea like this...bytea can contain null and c++ string initialization
stops at any 0 byte. Maybe you should be encoding the data to text
(say, to hex) first?

merlin

#6David W Noon
dwnoon@ntlworld.com
In reply to: Merlin Moncure (#5)
Re: Weird problems with C extension and bytea as input type

On Tue, 22 Mar 2011 16:14:47 -0500, Merlin Moncure wrote about Re:
[GENERAL] Weird problems with C extension and bytea as input type:

[snip]

On Tue, Mar 22, 2011 at 8:22 AM, Adrian Schreyer <ams214@cam.ac.uk>
wrote:

[snip]

bytea *b = PG_GETARG_BYTEA_P(0);
char *ism;

ism = function(b);

PG_RETURN_CSTRING(ism);

What is the prototype for function()? If it returns a char * then you
will likely have either scope problems, reentrancy problems or memory
leaks. If you are going to buy the C++ religion then you usually need to
buy it wholesale: do all if your string processing as std::string
objects and only return to char * when you revert to C.

As a rough example:

bytea *b = PG_GETARG_BYTEA_P(0);
std::string ism;

ism = function(std::string(VARDATA(b), VARSIZE(b)-VARHDRSZ));

PG_RETURN_CSTRING(ism.c_str());

Note that this returns an ASCIIZ string, which is not necessarily the
same as the C++ string. You would be better off creating a
PostgreSQL text object and then return that.

Well, C++ string constructor is proper in the sense it makes copy of
the source data. however, it's a little weird that you are passing
bytea like this...bytea can contain null and c++ string initialization
stops at any 0 byte.

Not so. If the constructor also specifies a length then the data
pointer's area is not assumed to be NUL-terminated.

Maybe you should be encoding the data to text (say, to hex) first?

Better to use the supplied length in the varlena descriptor.
--
Regards,

Dave [RLU #314465]
*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*
dwnoon@ntlworld.com (David W Noon)
*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*

#7Adrian Schreyer
adrian@schreyer.me
In reply to: David W Noon (#6)
Re: Weird problems with C extension and bytea as input type

On Tue, Mar 22, 2011 at 22:21, David W Noon <dwnoon@ntlworld.com> wrote:

On Tue, 22 Mar 2011 16:14:47 -0500, Merlin Moncure wrote about Re:
[GENERAL] Weird problems with C extension and bytea as input type:

[snip]

On Tue, Mar 22, 2011 at 8:22 AM, Adrian Schreyer <ams214@cam.ac.uk>
wrote:

[snip]

bytea *b = PG_GETARG_BYTEA_P(0);
char *ism;

ism = function(b);

PG_RETURN_CSTRING(ism);

What is the prototype for function()?  If it returns a char * then you
will likely have either scope problems, reentrancy problems or memory
leaks. If you are going to buy the C++ religion then you usually need to
buy it wholesale: do all if your string processing as std::string
objects and only return to char * when you revert to C.

you are right, it returns a char *.

The prototype:

char *function(bytea *b);

The actual C++ function looks roughly like this

extern "C"
char *function(bytea *b)
{
string ism;
[...]
return ism.c_str();
}

The postgres wrapper in C like this:

PG_FUNCTION_INFO_V1(bin_to_string);
Datum bin_to_string(PG_FUNCTION_ARGS)
{
bytea *b= PG_GETARG_BYTEA_P(0);

char *ism = function(b);

PG_RETURN_CSTRING(ism);
}

I have another function in C++ that parses the binary string (file)
into an object that is then further processed. This works for all
functions returning boolean or numeric values, only the string methods
produce these odd results. So as you said, the way in which strings
are passed between C++ and C in my code must be horribly wrong. What
would be the correct way?

Show quoted text

As a rough example:

 bytea *b = PG_GETARG_BYTEA_P(0);
 std::string ism;

 ism = function(std::string(VARDATA(b), VARSIZE(b)-VARHDRSZ));

 PG_RETURN_CSTRING(ism.c_str());

Note that this returns an ASCIIZ string, which is not necessarily the
same as the C++ string.  You would be better off creating a
PostgreSQL text object and then return that.

Well, C++ string constructor is proper in the sense it makes copy of
the source data.  however, it's a little weird that you are passing
bytea like this...bytea can contain null and c++ string initialization
stops at any 0 byte.

Not so.  If the constructor also specifies a length then the data
pointer's area is not assumed to be NUL-terminated.

Maybe you should be encoding the data to text (say, to hex) first?

Better to use the supplied length in the varlena descriptor.
--
Regards,

Dave  [RLU #314465]
*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*
dwnoon@ntlworld.com (David W Noon)
*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*

#8Adrian Schreyer
ams214@cam.ac.uk
In reply to: David W Noon (#6)
Re: Weird problems with C extension and bytea as input type

On Tue, Mar 22, 2011 at 22:21, David W Noon <dwnoon@ntlworld.com> wrote:

On Tue, 22 Mar 2011 16:14:47 -0500, Merlin Moncure wrote about Re:
[GENERAL] Weird problems with C extension and bytea as input type:

[snip]

On Tue, Mar 22, 2011 at 8:22 AM, Adrian Schreyer <ams214@cam.ac.uk>
wrote:

[snip]

bytea *b = PG_GETARG_BYTEA_P(0);
char *ism;

ism = function(b);

PG_RETURN_CSTRING(ism);

What is the prototype for function()?  If it returns a char * then you
will likely have either scope problems, reentrancy problems or memory
leaks. If you are going to buy the C++ religion then you usually need to
buy it wholesale: do all if your string processing as std::string
objects and only return to char * when you revert to C.

you are right, it returns a char *.

The prototype:

char *function(bytea *b);

The actual C++ function looks roughly like this

extern "C"
char *function(bytea *b)
{
string ism;
[...]
return ism.c_str();
}

The postgres wrapper in C like this:

PG_FUNCTION_INFO_V1(bin_to_string);
Datum bin_to_string(PG_FUNCTION_ARGS)
{
bytea *b= PG_GETARG_BYTEA_P(0);

char *ism = function(b);

PG_RETURN_CSTRING(ism);
}

I have another function in C++ that parses the binary string (file)
into an object that is then further processed. This works for all
functions returning boolean or numeric values, only the string methods
produce these odd results. So as you said, the way in which strings
are passed between C++ and C in my code must be horribly wrong. What
would be the correct way?

Show quoted text

As a rough example:

 bytea *b = PG_GETARG_BYTEA_P(0);
 std::string ism;

 ism = function(std::string(VARDATA(b), VARSIZE(b)-VARHDRSZ));

 PG_RETURN_CSTRING(ism.c_str());

Note that this returns an ASCIIZ string, which is not necessarily the
same as the C++ string.  You would be better off creating a
PostgreSQL text object and then return that.

Well, C++ string constructor is proper in the sense it makes copy of
the source data.  however, it's a little weird that you are passing
bytea like this...bytea can contain null and c++ string initialization
stops at any 0 byte.

Not so.  If the constructor also specifies a length then the data
pointer's area is not assumed to be NUL-terminated.

Maybe you should be encoding the data to text (say, to hex) first?

Better to use the supplied length in the varlena descriptor.
--
Regards,

Dave  [RLU #314465]
*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*
dwnoon@ntlworld.com (David W Noon)
*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*

#9dennis jenkins
dennis.jenkins.75@gmail.com
In reply to: Adrian Schreyer (#8)
Re: Weird problems with C extension and bytea as input type

On Wed, Mar 23, 2011 at 5:08 AM, Adrian Schreyer <ams214@cam.ac.uk> wrote:

you are right, it returns a char *.

The prototype:

char *function(bytea *b);

The actual C++ function looks roughly like this

extern "C"
char *function(bytea *b)
{
  string ism;
  [...]
  return ism.c_str();
}

Don't do that. You are returning a pointer to an unallocated buffer
(previously held by a local variable). c_str() is just a const
pointer to a buffer held inside "ism". When ism goes out of scope,
that buffer if freed.

Either return "std::string", or strdup() the string and have the
caller free that. (but use the postgresql alloc pool function to
handle the strdup. I don't recall that function's name off the top of
my head).

#10Merlin Moncure
mmoncure@gmail.com
In reply to: dennis jenkins (#9)
Re: Weird problems with C extension and bytea as input type

On Wed, Mar 23, 2011 at 9:04 AM, dennis jenkins
<dennis.jenkins.75@gmail.com> wrote:

On Wed, Mar 23, 2011 at 5:08 AM, Adrian Schreyer <ams214@cam.ac.uk> wrote:

you are right, it returns a char *.

The prototype:

char *function(bytea *b);

The actual C++ function looks roughly like this

extern "C"
char *function(bytea *b)
{
  string ism;
  [...]
  return ism.c_str();
}

Don't do that.  You are returning a pointer to an unallocated buffer
(previously held by a local variable).  c_str() is just a const
pointer to a buffer held inside "ism".  When ism goes out of scope,
that buffer if freed.

Either return "std::string", or strdup() the string and have the
caller free that.  (but use the postgresql alloc pool function to
handle the strdup.  I don't recall that function's name off the top of
my head).

that would be pstrdup, and it's the way to go (you don't have to
pfree). who says C doesn't have garbage collection?

merlin

#11Tom Lane
tgl@sss.pgh.pa.us
In reply to: Adrian Schreyer (#8)
Re: Weird problems with C extension and bytea as input type

Adrian Schreyer <ams214@cam.ac.uk> writes:

The actual C++ function looks roughly like this

extern "C"
char *function(bytea *b)
{
string ism;
[...]
return ism.c_str();
}

My C++ is pretty rusty, but is the pointer returned by c_str() still
valid after the string variable goes out of scope? I'm wondering if you
need a pstrdup before returning.

regards, tom lane

#12Adrian Schreyer
adrian@schreyer.me
In reply to: Merlin Moncure (#10)
Re: Weird problems with C extension and bytea as input type

On Wed, Mar 23, 2011 at 14:08, Merlin Moncure <mmoncure@gmail.com> wrote:

On Wed, Mar 23, 2011 at 9:04 AM, dennis jenkins
<dennis.jenkins.75@gmail.com> wrote:

On Wed, Mar 23, 2011 at 5:08 AM, Adrian Schreyer <ams214@cam.ac.uk> wrote:

you are right, it returns a char *.

The prototype:

char *function(bytea *b);

The actual C++ function looks roughly like this

extern "C"
char *function(bytea *b)
{
  string ism;
  [...]
  return ism.c_str();
}

Don't do that.  You are returning a pointer to an unallocated buffer
(previously held by a local variable).  c_str() is just a const
pointer to a buffer held inside "ism".  When ism goes out of scope,
that buffer if freed.

Either return "std::string", or strdup() the string and have the
caller free that.  (but use the postgresql alloc pool function to
handle the strdup.  I don't recall that function's name off the top of
my head).

that would be pstrdup, and it's the way to go (you don't have to
pfree).  who says C doesn't have garbage collection?

merlin

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

I am using pstrdup and it works now as expected. Thank you all for your help,

Adrian

#13Adrian Schreyer
ams214@cam.ac.uk
In reply to: Merlin Moncure (#10)
Re: Weird problems with C extension and bytea as input type

On Wed, Mar 23, 2011 at 14:08, Merlin Moncure <mmoncure@gmail.com> wrote:

On Wed, Mar 23, 2011 at 9:04 AM, dennis jenkins
<dennis.jenkins.75@gmail.com> wrote:

On Wed, Mar 23, 2011 at 5:08 AM, Adrian Schreyer <ams214@cam.ac.uk> wrote:

you are right, it returns a char *.

The prototype:

char *function(bytea *b);

The actual C++ function looks roughly like this

extern "C"
char *function(bytea *b)
{
  string ism;
  [...]
  return ism.c_str();
}

Don't do that.  You are returning a pointer to an unallocated buffer
(previously held by a local variable).  c_str() is just a const
pointer to a buffer held inside "ism".  When ism goes out of scope,
that buffer if freed.

Either return "std::string", or strdup() the string and have the
caller free that.  (but use the postgresql alloc pool function to
handle the strdup.  I don't recall that function's name off the top of
my head).

that would be pstrdup, and it's the way to go (you don't have to
pfree).  who says C doesn't have garbage collection?

merlin

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

I am using pstrdup and it works now as expected. Thank you all for your help,

Adrian