problem with splitting a string

Started by Werner Echezuria over 16 years ago, 6 messages
#1 Werner Echezuria
wercool@gmail.com

Hi,

I'm trying to develop a contrib module to parse SQLf queries. I'm using
lemon as an LALR parser generator (because I think it's easier than bison)
and re2c (because I think it's easier than flex), but when I try to split
the string into words inside PostgreSQL, some weird characters get added
(the same code works when compiled with plain gcc). I pass in something like
"CREATE FUZZY PREDICATE joven ON 0..120 AS (0,0,35,120);", but PostgreSQL
adds a stray character at the end of "joven" and of the other words.

The code I use to split the string is:

void parse_query(char *str, const char **sqlf){
    parse_words(str);
    *sqlf = fuzzy_query;
}

void parse_words(char *str){
    char *word;
    int token;
    const char semicolon = ';';
    const char dot = '.';
    const char comma = ',';
    const char open_bracket = '(';
    const char close_bracket = ')';
    struct Token sToken;

    int i = 0;

    void *pParser = ParseAlloc(malloc);

    while (str[i] != '\0'){
        int c = 0;

        word = (char *) malloc(sizeof(char));

        if (isspace(str[i]) || str[i] == semicolon){
            i++;
            continue;
        }

        if (str[i] == open_bracket || str[i] == close_bracket ||
            str[i] == dot || str[i] == comma){
            word[c] = str[i];
            i++;
            token = scan(word, strlen(word));
            Parse(pParser, token, sToken);
            continue;
        } else {
            while (!isspace(str[i]) && str[i] != semicolon && str[i] != '\0' &&
                   str[i] != open_bracket && str[i] != close_bracket &&
                   str[i] != dot && str[i] != comma){
                word[c++] = str[i++];
            }
        }

        token = scan(word, strlen(word));

        if (token == PARAMETRO){
            //TODO: I don't know why it needs the malloc function again,
            //      all I know is it's working
            const char *param = word;
            word = (char *) malloc(sizeof(char));
            sToken.z = param;
        }

        Parse(pParser, token, sToken);
        free(word);
    }
    Parse(pParser, 0, sToken);
    ParseFree(pParser, free);
}
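
For comparison, here is a minimal sketch of the same loop with the word
buffer explicitly sized and NUL-terminated before it is handed to scan().
The scan()/Parse()/ParseAlloc() interface is assumed to be the one above;
the fixed 64-byte buffer and the pstrdup() copy (PostgreSQL's palloc-backed
strdup) are my own choices, not part of the module:

/* Sketch only: collect each word into a bounded, NUL-terminated buffer. */
void parse_words_sketch(char *str){
    void *pParser = ParseAlloc(malloc);
    struct Token sToken;
    int i = 0;

    while (str[i] != '\0'){
        char word[64];                      /* room for one token */
        int c = 0;
        int token;

        if (isspace((unsigned char) str[i]) || str[i] == ';'){
            i++;                            /* skip whitespace and ';' */
            continue;
        }

        if (strchr("().,", str[i]) != NULL){
            word[c++] = str[i++];           /* single-character token */
        } else {
            while (str[i] != '\0' && !isspace((unsigned char) str[i]) &&
                   strchr(";().,", str[i]) == NULL &&
                   c < (int) sizeof(word) - 1){
                word[c++] = str[i++];       /* copy one word */
            }
        }

        word[c] = '\0';                     /* terminate before strlen()/scan() */

        token = scan(word, strlen(word));
        if (token == PARAMETRO)
            sToken.z = pstrdup(word);       /* private copy for the parser */
        Parse(pParser, token, sToken);
    }
    Parse(pParser, 0, sToken);
    ParseFree(pParser, free);
}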

Header:

#ifndef SQLF_H_
#define SQLF_H_

typedef struct Token {
const char *z;
int value;
unsigned n;
} Token;
void parse_query(char *str,const char **sqlf);
void parse_words(char *str);
int scan(char *s, int l);

#endif /* SQLF_H_ */

Screen:

postgres=# select * from fuzzy.sqlf('CREATE FUZZY PREDICATE joven ON 0..120
AS (0,0,35,120);'::text);
ERROR: syntax error at or near ""
LINE 1: INSERT INTO fuzzydb.pg_fuzzypredicate VALUES(joven,0�,120�...
...
^
QUERY: INSERT INTO fuzzydb.pg_fuzzypredicate VALUES(joven,0�,120�,0�,0�,35�,120�);

Thanks for any help

#2 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Werner Echezuria (#1)
Re: problem with splitting a string

Werner Echezuria <wercool@gmail.com> writes:

I'm trying to develop a contrib module to parse SQLf queries. I'm using
lemon as an LALR parser generator (because I think it's easier than bison)
and re2c (because I think it's easier than flex), but when I try to split
the string into words inside PostgreSQL, some weird characters get added
(the same code works when compiled with plain gcc). I pass in something like
"CREATE FUZZY PREDICATE joven ON 0..120 AS (0,0,35,120);", but PostgreSQL
adds a stray character at the end of "joven" and of the other words.

Maybe you are expecting 'text' values to be null-terminated? They are
not. You might look into using TextDatumGetCString or related functions
to convert.
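
A minimal sketch of what that conversion looks like (the function and
variable names here are only illustrative, not part of the module):

#include "postgres.h"
#include "fmgr.h"
#include "utils/builtins.h"

PG_FUNCTION_INFO_V1(text_arg_example);      /* hypothetical example function */

Datum
text_arg_example(PG_FUNCTION_ARGS){
    /* A text value is a varlena: a length header followed by the bytes,
     * with no trailing '\0', so its contents must not be passed directly
     * to strlen(), strchr(), and friends. */
    text *t = PG_GETARG_TEXT_PP(0);

    /* text_to_cstring() (which TextDatumGetCString() expands to) pallocs
     * a copy of the bytes and appends the terminator. */
    char *str = text_to_cstring(t);

    elog(NOTICE, "argument is %d bytes: %s", (int) VARSIZE_ANY_EXHDR(t), str);

    PG_RETURN_TEXT_P(cstring_to_text(str));
}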

regards, tom lane

PS: the chances of us accepting a contrib module that requires
significant unusual infrastructure to build seem pretty low from
where I sit. You're certainly free to do whatever you want for
private work, or even for a pgfoundry project --- but if you do
have ambitions of this eventually becoming contrib, "it's easier"
is not going to be sufficient rationale to not use bison/flex.

#3 Werner Echezuria
wercool@gmail.com
In reply to: Tom Lane (#2)
Re: problem with splitting a string

Hi,

Well, I do use TextDatumGetCString in the main file, but the weird
characters are still there.

This is the main file:

#include "postgres.h"
#include "fmgr.h"
#include "gram.h"
#include "sqlf.h"
#include "utils/builtins.h"

extern Datum sqlf(PG_FUNCTION_ARGS);

PG_MODULE_MAGIC;

PG_FUNCTION_INFO_V1(sqlf);

Datum
sqlf(PG_FUNCTION_ARGS){

    char *query = TextDatumGetCString(PG_GETARG_DATUM(0));
    const char *parse_str;
    char *result;

    parse_query(query, &parse_str);

    result = parse_str;

    PG_RETURN_TEXT_P(cstring_to_text(result));
}

About the PS: OK, I understand that if I want this to be included as a
contrib module I need to use bison/flex; I had never thought about that.
But now I have a couple of questions:
What are the chances of it really being included in PostgreSQL as a contrib module?
Are there any requirements I have to follow?

#4 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Werner Echezuria (#3)
Re: problem with splitting a string

Werner Echezuria <wercool@gmail.com> writes:

Well, I do use TextDatumGetCString in the main file, but the weird
characters are still there.

Hmm, no ideas then. Your interface code looks fine (making parse_str
const seems a bit strange, but it's not related to the problem at hand).
Given that the problems appear at token boundaries I'd guess that re2c
isn't behaving the way you expect, but I'm not familiar with that tool
so I can't give any specific advice.

About the PS: OK, I understand that if I want this to be included as a
contrib module I need to use bison/flex; I had never thought about that.
But now I have a couple of questions:
What are the chances of it really being included in PostgreSQL as a contrib module?
Are there any requirements I have to follow?

Well, it'd mainly be a question of whether there's enough interest out
there, which I can't judge. From a project standpoint we just require
that it be BSD-licensed and not impose any undue new burden on
maintainers (thus not wanting new build tools), but beyond that it's a
matter of how many people might use it.

regards, tom lane

#5 Alvaro Herrera
alvherre@commandprompt.com
In reply to: Tom Lane (#4)
Re: problem with splitting a string

Tom Lane wrote:

Well, it'd mainly be a question of whether there's enough interest out
there, which I can't judge. From a project standpoint we just require
that it be BSD-licensed and not impose any undue new burden on
maintainers (thus not wanting new build tools), but beyond that it's a
matter of how many people might use it.

What use is there for fuzzy predicates? I think it would mainly be to
stop more students from coming up with new implementations of the same
thing over and over.

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

#6 Werner Echezuria
wercool@gmail.com
In reply to: Alvaro Herrera (#5)
Re: problem with splitting a string

What use is there for fuzzy predicates? I think it would mainly be to
stop more students from coming up with new implementations of the same
thing over and over.

Well, I'm sorry that none of us involved in these projects has explained
the real usefulness of SQLf and fuzzy databases; I guess we focused only
on the technical problems and never explained the theory.

For example, here is a paragraph from the paper "Flexible queries in
relational databases":

"This paper deals with this second type of "uncertainty" and is concerned
essentially with
database language extensions in order to deal with more expressive
requirements. Indeed,
consider a query such that, for instance, "retrieve the apartments which are
not too expensive
and not too far from downtown". In such a case, there does not exist a
definite threshold for
which the price becomes suddenly too high, but rather we have to
discriminate between
prices which are perfectly acceptable for the user, and other prices,
somewhat higher, which
are still more or less acceptable (especially if the apartment is close to
downtown). Note that
the meaning of vague predicate expressions like "not too expensive" is
context/user
dependent, rather than universal. Fuzzy set membership functions [26] are
convenient tools
for modelling user's preference profiles and the large panoply of fuzzy set
connectives can
capture the different user attitudes concerning the way the different
criteria present in his/her
query compensate or not; see [4] for a unified presentation in the fuzzy set
framework of the
existing proposals for handling flexible queries. Moreover in a given query,
some part of the
request may be less important to fulfill (e.g., in the above example, the
price requirement
may be judged more important than the distance to downtown); the handling of
importance
leads to the need for weighted connectives, as it will be seen in the
following."

I really think this could be useful, but it is sometimes difficult to
implement, and I'm trying to find a different, easier way to do it.

regards