workaround for lack of REPLACE() function
Hi,
I looked around for a definitive answer to this question but to no avail. It appears that PostgreSQL does not implement the SQL function REPLACE() , which, in MySQL works like this:
REPLACE(str,from_str,to_str)
Returns the string str with all all occurrences of the string from_str replaced by the string to_str:
What do folks usually do when they have to do a global search/replace on a big table?
Thanks,
Matt
"Agent155 Support" <matt@planetnet.com> writes:
What do folks usually do when they have to do a global search/replace on a =
big table?
You can code pretty much any text transformation you'd like in plperl or
pltcl, both of which languages are very strong on string manipulations.
So there's not been a lot of concern about the lack of a SQL-level
substitution operator.
IIRC, SQL99 does specify some sort of substring replacement function,
and Thomas recently implemented it for 7.3. But it's not very bright
and I suspect people will keep falling back on plperl or pltcl to do
anything nontrivial.
regards, tom lane
Tom Lane wrote:
"Agent155 Support" <matt@planetnet.com> writes:
What do folks usually do when they have to do a global search/replace on a =
big table?You can code pretty much any text transformation you'd like in plperl or
pltcl, both of which languages are very strong on string manipulations.
So there's not been a lot of concern about the lack of a SQL-level
substitution operator.IIRC, SQL99 does specify some sort of substring replacement function,
and Thomas recently implemented it for 7.3. But it's not very bright
and I suspect people will keep falling back on plperl or pltcl to do
anything nontrivial.
I think Thomas did just recently commit the SQL99 OVERLAY function, but
similar to Tom's comment, I don't like the way SQL99 defines it. I've
written a replace() C function (along with a couple of other string
manipulation functions) for my own use. If you'd like a copy let me know
and I'll gladly send it to you.
I have thought about sending it in as a contrib, but wasn't sure if
there was enough interest to warrant it.
Joe
...
I think Thomas did just recently commit the SQL99 OVERLAY function, but
similar to Tom's comment, I don't like the way SQL99 defines it. I've
written a replace() C function (along with a couple of other string
manipulation functions) for my own use. If you'd like a copy let me know
and I'll gladly send it to you.
OK, what don't you like about it? If you can define some functionality
that we *should* have, then it is likely to go into the main distro.
Either as an extension to existing functions or as a separate function.
"Style" counts for not-much, but "can't do it with what we have" counts
for a lot.
- Thomas
Thomas Lockhart wrote:
I think Thomas did just recently commit the SQL99 OVERLAY function, but
similar to Tom's comment, I don't like the way SQL99 defines it. I've
written a replace() C function (along with a couple of other string
manipulation functions) for my own use. If you'd like a copy let me know
and I'll gladly send it to you.OK, what don't you like about it? If you can define some functionality
that we *should* have, then it is likely to go into the main distro.
Either as an extension to existing functions or as a separate function."Style" counts for not-much, but "can't do it with what we have" counts
for a lot.
Hmmm, making justify my comment ;-)
Well, OVERLAY is defined as:
overlay(string placing string from integer [for integer])
and replace() is defined (by me at least) as:
replace(inputstring, old-substr, new-substr)
OVERLAY requires that I know the "from" position and possibly the "for"
in advance. Other functions (such as strpos() and substr()) can be used
to help, but consider the following:
test=# create table strtest(f1 text);
CREATE TABLE
test=# insert into strtest values('/usr/local/pgsql/data');
INSERT 124955 1
test=# select replace(f1,'/local','') from strtest;
replace
-----------------
/usr/pgsql/data
(1 row)
Now, how can I do this with overlay()? If I happen to know in advance
that my only input string is '/usr/local/pgsql/data', then I can do:
test=# select overlay(f1 placing '' from 5 for 6) from strtest;
overlay
-----------------
/usr/pgsql/data
(1 row)
But what if now I do:
test=# insert into strtest values('/m1/usr/local/pgsql/data');
INSERT 124957 1
Now
test=# select replace(f1,'/local','') from strtest;
replace
--------------------
/usr/pgsql/data
/m1/usr/pgsql/data
(2 rows)
works fine, but
test=# select overlay(f1 placing '' from 5 for 6) from strtest;
overlay
--------------------
/usr/pgsql/data
/m1/cal/pgsql/data
(2 rows)
doesn't give the desired result. Of course you can work around this, but
it starts to get ugly:
test=# select overlay(f1 placing '' from strpos(f1,'/local') for 6) from
strtest;
overlay
--------------------
/usr/pgsql/data
/m1/usr/pgsql/data
(2 rows)
But now what happens if you wanted to replace all of the '/' characters
with '\'?
test=# select replace(f1,'/','\\') from strtest;
replace
--------------------------
\usr\local\pgsql\data
\m1\usr\local\pgsql\data
(2 rows)
You can't do this at all with overlay(), unless you want to write a
PL/pgSQL function and loop through each string. I started out with
exactly this, using strpos() and substr(), but I thought a C function
was cleaner, and it is certainly faster.
BTW, the other functions already in the string manipulation module are:
to_hex -- Accepts bigint and returns it as equivilent hex string
to_hex(bigint inputnum) RETURNS text
test=# select to_hex(123456789::bigint);
to_hex
---------
75bcd15
(1 row)
and
extract_tok -- Extracts and returns individual token from delimited
text
extract_tok(text inputstring, text delimiter, int posn) RETURNS text
test=# select extract_tok(extract_tok('f=1&g=3&h=4','&',2),'=',2);
extract_tok
-------------
3
(1 row)
extract_tok() is actually already in dblink (dblink_strtok), because it
is useful in that context, but it probably belongs in a contrib for
string manipulation instead. In fact, now that I think about it, so is
replace() (dblink_replace).
Regards,
Joe
(crossposted to -hackers, should follow up on that list)
Well, OVERLAY is defined as:
overlay(string placing string from integer [for integer])
and replace() is defined (by me at least) as:
replace(inputstring, old-substr, new-substr)
OK.
OVERLAY requires that I know the "from" position and possibly the "for"
in advance. Other functions (such as strpos() and substr()) can be used
to help...
Right. So you can do your example pretty easily:
thomas=# select overlay(f1 placing '' from position('/local' in f1)
thomas-# for length('/local')) from strtest;
overlay
--------------------
/usr/pgsql/data
/m1/usr/pgsql/data
And if you don't like that much typing you can do:
thomas=# create function replace(text, text, text) returns text as '
thomas'# select overlay($1 placing $3 from position($2 in $1) for
length($2));
thomas'# ' language 'sql';
CREATE FUNCTION
thomas=# select replace(f1, '/local', '') from strtest;
replace
--------------------
/usr/pgsql/data
/m1/usr/pgsql/data
But now what happens if you wanted to replace all of the '/' characters
with '\'?...
You can't do this at all with overlay(), unless you want to write a
PL/pgSQL function and loop through each string. I started out with
exactly this, using strpos() and substr(), but I thought a C function
was cleaner, and it is certainly faster.
OK, this is in the "can't do it what we have" category. Should we have
it accept a regular expression rather than a simple string? In either
case it should probably go into the main distro. Except that I see
"REPLACE" is mentioned as a reserved word in SQL99. But has no other
mention in my copy of the draft standard. Anyone else have an idea what
it might be used for in the standard?
The other functions look useful too, unless to_char() and varbit can be
evolved to support this functionality.
- Thomas
Thomas Lockhart wrote:
(crossposted to -hackers, should follow up on that list)
<snip>
OK, this is in the "can't do it what we have" category. Should we have
it accept a regular expression rather than a simple string? In either
case it should probably go into the main distro. Except that I see
"REPLACE" is mentioned as a reserved word in SQL99. But has no other
mention in my copy of the draft standard. Anyone else have an idea what
it might be used for in the standard?
Not sure, but I see what you mean. Perhaps because of Oracle pushing to
legitimize the "CREATE OR REPLACE" syntax? In any case, this works in 8i:
SQL> select replace('hello','l','x') from dual;
REPLACE('HELLO','L','X')
------------------------
hexxo
and here it is in MSSQL 7:
select replace('hello','l','x')
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
hexxo
(1 row(s) affected)
and my proposed PostgreSQL function:
test=# select replace('hello','l','x');
replace
---------
hexxo
(1 row)
so at least we would be consistant/compatable with these two.
The other functions look useful too, unless to_char() and varbit can be
evolved to support this functionality.
I will take a look at merging these into existing functions, but I have
a few other things ahead of this in my queue.
One of the reasons I wasn't pushing too hard to get replace() into the
backend is because my current solution is a bit of a hack. It uses the
builtin length, strpos and substr text functions (which I think makes
sense since they already know how to deal with mb strings), but because
they accept and return text, I'm doing lots of conversions back and
forth from (* text) to (* char). To do this "right" probably means
reworking the text string manipulation functions to be wrappers around
some equivalent functions accepting and returning C strings. That was
more work than I had time for when I wrote the current replace(). But as
I said, if there is support for getting this into the backend, I'll add
it to my todo list:
- Create new backend function replace()
- Either create new backend functions, or merge into existing functions:
to_hex() and extract_tok()
Joe
Joe Conway wrote:
more work than I had time for when I wrote the current replace(). But as
I said, if there is support for getting this into the backend, I'll add
it to my todo list:- Create new backend function replace()
- Either create new backend functions, or merge into existing functions:
to_hex() and extract_tok()
I'm just starting to take a look at this again. While studying the
current text_substr() function I found two behaviors which conflict with
specific SQL92/SQL99 requirements, and one bug. First the spec
compliance -- SQL92 section 6.7/SQL99 section 6.18 say:
If <character substring function> is specified, then:
a) Let C be the value of the <character value expression>, let LC be the
length of C, and let S be the value of the <start position>.
b) If <string length> is specified, then let L be the value of <string
length> and let E be S+L. Otherwise, let E be the larger of LC + 1
and S.
c) If either C, S, or L is the null value, then the result of the
<character substring function> is the null value.
d) If E is less than S, then an exception condition is raised: data
exception-substring error.
e) Case:
i) If S is greater than LC or if E is less than 1, then the result of
the <character substring function> is a zero-length string.
ii) Otherwise,
1) Let SI be the larger of S and 1. Let El be the smaller of E and
LC+l. Let Ll be El-Sl.
2) The result of the <character substring function> is a character
string containing the Ll characters of C starting at character
number Sl in the same order that the characters appear in C.
The only way for d) to be true is when L < 0. Instead of an error, we do:
test=# select substr('hello',2,-1);
substr
--------
ello
(1 row)
The other spec issue is wrt para e)i). If E (=S+L) < 1, we should return
a zero-length string. Currently I get:
test=# select substr('hello',-4,3);
substr
--------
hello
(1 row)
Neither behavior is documented (unless it's somewhere other than:
http://developer.postgresql.org/docs/postgres/functions-string.html ).
The bug is this one:
test=# create DATABASE testmb with encoding = 'EUC_JP';
CREATE DATABASE
test=# \c testmb
You are now connected to database testmb.
testmb=# select substr('hello',6,2);
substr
--------
~
(1 row)
testmb=# \c test
You are now connected to database test.
test=# select substr('hello',6,2);
substr
--------
(1 row)
The multibyte database behavior is the bug. The SQL_ASCII behavior is
correct (zero-length string):
test=# select substr('hello',6,2) is null;
?column?
----------
f
(1 row)
Any objection if I rework this function to meet SQL92 and fix the bug?
Or is the SQL92 part not desirable because it breaks backward
compatability?
In any case, can the #ifdef MULTIBYTE's be removed now in favor of a
test for encoding max length?
Joe
Any objection if I rework this function to meet SQL92 and fix the bug?
I don't object.
Or is the SQL92 part not desirable because it breaks backward
compatability?
I don't think so.
In any case, can the #ifdef MULTIBYTE's be removed now in favor of a
test for encoding max length?
Sure.
--
Tatsuo Ishii
Tatsuo Ishii wrote:
Any objection if I rework this function to meet SQL92 and fix the bug?
I've started working on text_substr() as described in this thread (which
is hopefully prep work for the replace() function that started the
thread). I haven't really looked at toast or multibyte closely before,
so I'd like to ask a couple of questions to be sure I'm understanding
the relevant issues correctly.
First, in textlen() I see (ignoring multibyte for a moment):
text *t = PG_GETARG_TEXT_P(0);
PG_RETURN_INT32(VARSIZE(t) - VARHDRSZ);
Tom has pointed out to me before that PG_GETARG_TEXT_P(n) incurs the
overhead of retrieving and possibly decompressing a toasted datum. So my
first question is, can we simply do:
PG_RETURN_INT32(toast_raw_datum_size(PG_GETARG_DATUM(0)) - VARHDRSZ);
and save the overhead of retrieving and decompressing the whole datum?
Now, in the multibyte case, again in textlen(), I see:
/* optimization for single byte encoding */
if (pg_database_encoding_max_length() <= 1)
PG_RETURN_INT32(VARSIZE(t) - VARHDRSZ);
PG_RETURN_INT32(
pg_mbstrlen_with_len(VARDATA(t), VARSIZE(t) - VARHDRSZ));
Three questions here.
1) In the case of encoding max length == 1, can we treat it the same as
the non-multibyte case (I presume they are exactly the same)?
2) Can encoding max length ever be < 1? Doesn't make sense to me.
3) In the case of encoding max length > 1, if I understand correctly,
each encoded character can be one *or more* bytes, up to and encluding
encoding max length bytes. So the *only* way presently to get the length
of the original character string is to loop through the entire string
checking the length of each individual character (that's what
pg_mbstrlen_with_len() does it seems)?
Finally, if 3) is true, then there is no way to avoid the retrieval and
decompression of the datum just to find out its length. For large
datums, detoasting plus the looping through each character would add a
huge amount of overhead just to get at the length of the original
string. I don't know if we need to be able to get *just* the length
often enough to really care, but if we do, I had an idea for some future
release (I wouldn't propose doing this for 7.3):
- add a new EXTENDED state to va_external for MULTIBYTE
- any string with max encoding length > 1 would be EXTENDED even if it
is not EXTERNAL and not COMPRESSED.
- to each of the structs in the union, add va_strlen
- populate va_strlen on INSERT and maintain it on UPDATE.
Now a new function similar to toast_raw_datum_size(), maybe
toast_raw_datum_strlen() could be used to get the original string
length, whether MB or not, without needing to retrieve and decompress
the entire datum.
I understand we would either: have to steal another bit from the VARHDR
which would reduce the effective size of a valena from 1GB down to .5GB;
or we would need to add a byte or two to the VARHDR which is extra
per-datum overhead. I'm not sure we would want to do either. But I
wanted to toss out the idea while it was fresh on my mind.
Thanks,
Joe
Tatsuo Ishii wrote:
Any objection if I rework this function to meet SQL92 and fix the bug?
I don't object.
One more question on this: how can I generate some characters with > 1
encoding length? I need a way to test the work I'm doing, and I'm not
quite sure how to test it.
Just making a database that uses a MB encoding doesn't make 0-9/A-Z/a-z
into characters of > 1 byte, does it? Sorry, but never having used MB
encoding has left me a bit clueless on this ;-)
Joe
Now, in the multibyte case, again in textlen(), I see:
/* optimization for single byte encoding */
if (pg_database_encoding_max_length() <= 1)
PG_RETURN_INT32(VARSIZE(t) - VARHDRSZ);PG_RETURN_INT32(
pg_mbstrlen_with_len(VARDATA(t), VARSIZE(t) - VARHDRSZ));Three questions here.
1) In the case of encoding max length == 1, can we treat it the same as
the non-multibyte case (I presume they are exactly the same)?
Yes.
2) Can encoding max length ever be < 1? Doesn't make sense to me.
No. It seems just a defensive coding.
3) In the case of encoding max length > 1, if I understand correctly,
each encoded character can be one *or more* bytes, up to and encluding
encoding max length bytes.
Right.
So the *only* way presently to get the length
of the original character string is to loop through the entire string
checking the length of each individual character (that's what
pg_mbstrlen_with_len() does it seems)?
Yes.
Finally, if 3) is true, then there is no way to avoid the retrieval and
decompression of the datum just to find out its length. For large
datums, detoasting plus the looping through each character would add a
huge amount of overhead just to get at the length of the original
string. I don't know if we need to be able to get *just* the length
often enough to really care, but if we do, I had an idea for some future
release (I wouldn't propose doing this for 7.3):- add a new EXTENDED state to va_external for MULTIBYTE
- any string with max encoding length > 1 would be EXTENDED even if it
is not EXTERNAL and not COMPRESSED.
- to each of the structs in the union, add va_strlen
- populate va_strlen on INSERT and maintain it on UPDATE.Now a new function similar to toast_raw_datum_size(), maybe
toast_raw_datum_strlen() could be used to get the original string
length, whether MB or not, without needing to retrieve and decompress
the entire datum.I understand we would either: have to steal another bit from the VARHDR
which would reduce the effective size of a valena from 1GB down to .5GB;
or we would need to add a byte or two to the VARHDR which is extra
per-datum overhead. I'm not sure we would want to do either. But I
wanted to toss out the idea while it was fresh on my mind.
Interesting idea. I also was thinking about adding some extra
infomation to text data types such as character set, collation
etc. for 7.4 or later.
--
Tatsuo Ishii
Tatsuo Ishii wrote:
Now a new function similar to toast_raw_datum_size(), maybe
toast_raw_datum_strlen() could be used to get the original string
length, whether MB or not, without needing to retrieve and decompress
the entire datum.I understand we would either: have to steal another bit from the VARHDR
which would reduce the effective size of a valena from 1GB down to .5GB;
or we would need to add a byte or two to the VARHDR which is extra
per-datum overhead. I'm not sure we would want to do either. But I
wanted to toss out the idea while it was fresh on my mind.Interesting idea. I also was thinking about adding some extra
infomation to text data types such as character set, collation
etc. for 7.4 or later.
I ran some tests to confirm the theory above regarding overhead;
create table strtest(f1 text);
do 100 times
insert into strtest values('12345....'); -- 100000 characters
loop
do 1000 times
select length(f1) from strtest;
loop
Results:
SQL_ASCII database, new code:
=============================
PG_RETURN_INT32(toast_raw_datum_size(PG_GETARG_DATUM(0)) - VARHDRSZ);
==> 2 seconds
SQL_ASCII database, old code:
=============================
text
*t = PG_GETARG_TEXT_P(0);
PG_RETURN_INT32(VARSIZE(t) - VARHDRSZ);
==> 66 seconds
EUC_JP database, new & old code:
================================
text
*t = PG_GETARG_TEXT_P(0);
PG_RETURN_INT32(pg_mbstrlen_with_len(VARDATA(t),
VARSIZE(t) - VARHDRSZ));
==> 469 seconds
So it appears that, while detoasting is moderately expensive (adds 64
seconds to the test), the call to pg_mbstrlen_with_len() is very
expensive (adds 403 seconds to the test).
Joe
Tatsuo Ishii wrote:
Joe Conway wrote:
Any objection if I rework this function to meet SQL92 and fix the bug?
I don't object.
Or is the SQL92 part not desirable because it breaks backward
compatability?I don't think so.
In any case, can the #ifdef MULTIBYTE's be removed now in favor of a
test for encoding max length?Sure.
<sorry so long-winded>
Attached is a patch that implements the above items wrt text_substr(). I
also modified textlen(), textoctetlen(), byteaoctetlen(), and
bytea_substr(). Here's a summary of the change to each:
- text_substr(): rewrite function to meet SQL92, fix MB related bug,
and remove #ifdef MULTIBYTE.
- bytea_substr(): same as text_substr() enc max len == 1.
- textoctetlen(), byteaoctetlen(): use
toast_raw_datum_size(PG_GETARG_DATUM(0)) - VARHDRSZ)
to avoid detoasting.
- textlen(): same as textoctetlen() for enc max len == 1, and remove
#ifdef MULTIBYTE.
I did some benchmarking to ensure no performance degradation, and to
help me understand MB and related performance issues. The results were
very enlightening:
===================================================================
First test - textlen() (already reported, repeated here for completeness):
-------------------------------------------------------------------
create table strtest(f1 text);
do 100 times
insert into strtest values('12345....'); -- 100000 characters
loop
do 1000 times
select length(f1) from strtest;
loop
-------------------------------------------------------------------
Results:
SQL_ASCII database, new code 2 seconds
SQL_ASCII database, old code 66 seconds
EUC_JP database, new & old code 469 seconds
===================================================================
Second test - short string test:
-------------------------------------------------------------------
create table parts(partnum text);
<fill with ~220000 rows, 8 to 12 characters each>
do 300 times
select substr(partnum, 3, 3) from parts;
loop
-------------------------------------------------------------------
Results:
SQL_ASCII database, old code 352 seconds
SQL_ASCII database, new code 350 seconds
EUC_JP database, old code 461 seconds
EUC_JP database, new code 422 seconds
===================================================================
Third test - long string, EXTENDED storage (EXTERNAL+COMPRESSED):
-------------------------------------------------------------------
create table strtest(f1 text);
do 100 times
insert into strtest values('12345....'); -- 100000 characters
loop
do 1000 times
select substr(f1, 89000, 10000) from strtest;
loop
-------------------------------------------------------------------
Results:
SQL_ASCII database, old code 59 seconds
SQL_ASCII database, new code 58 seconds
EUC_JP database, old code 915 seconds
EUC_JP database, new code 912 seconds
===================================================================
Forth test - long string, EXTERNAL storage (not COMPRESSED)
-------------------------------------------------------------------
create table strtest(f1 text);
do 100 times
insert into strtest values('12345....'); -- 100000 characters
loop
do 1000 times
select substr(f1, 89000, 10000) from strtest;
loop
-------------------------------------------------------------------
Results:
SQL_ASCII database, old code 17 seconds
SQL_ASCII database, new code 17 seconds
EUC_JP database, old code 918 seconds
EUC_JP database, new code 911 seconds
The only remaining problem is that this causes opr_sanity to fail based
on this query:
-- Considering only built-in procs (prolang = 12), look for multiple
-- uses of the same internal function (ie, matching prosrc fields).
-- It's OK to have several entries with different pronames for the same
-- internal function, but conflicts in the number of arguments and other
-- critical items should be complained of.
SELECT p1.oid, p1.proname, p2.oid, p2.proname
FROM pg_proc AS p1, pg_proc AS p2
WHERE p1.oid != p2.oid AND
p1.prosrc = p2.prosrc AND
p1.prolang = 12 AND p2.prolang = 12 AND
(p1.prolang != p2.prolang OR
p1.proisagg != p2.proisagg OR
p1.prosecdef != p2.prosecdef OR
p1.proisstrict != p2.proisstrict OR
p1.proretset != p2.proretset OR
p1.provolatile != p2.provolatile OR
p1.pronargs != p2.pronargs);
This fails because I implemented text_substr() and bytea_substr() to
take either 2 or 3 args. This was necessary for SQL92 spec compliance.
SQL92 requires L < 0 to throw an error, and L IS NULL to return NULL. It
also requires that if L is not provided, the length to the end of the
string is assumed. Current code handles L IS NULL correctly but not L <
0 -- it assumes L < 0 is the same as L is not provided. By allowing the
function to determine if it was passed 2 or 3 args, this can be handled
properly.
So the question is, can/should I change opr_sanity to allow this case?
I also still owe some additions to the strings regression test to make
it cover toasted values.
Other than those two issues, I think the patch is ready to go. I'm
planning to take on the replace function next.
Thanks,
Joe
Attachments:
substr.2002.08.13.3.patchtext/plain; name=substr.2002.08.13.3.patchDownload
Index: src/backend/utils/adt/varlena.c
===================================================================
RCS file: /opt/src/cvs/pgsql-server/src/backend/utils/adt/varlena.c,v
retrieving revision 1.87
diff -c -r1.87 varlena.c
*** src/backend/utils/adt/varlena.c 4 Aug 2002 06:44:47 -0000 1.87
--- src/backend/utils/adt/varlena.c 14 Aug 2002 05:42:45 -0000
***************
*** 18,23 ****
--- 18,24 ----
#include "mb/pg_wchar.h"
#include "miscadmin.h"
+ #include "access/tuptoaster.h"
#include "utils/builtins.h"
#include "utils/pg_locale.h"
***************
*** 285,303 ****
Datum
textlen(PG_FUNCTION_ARGS)
{
! text *t = PG_GETARG_TEXT_P(0);
! #ifdef MULTIBYTE
! /* optimization for single byte encoding */
! if (pg_database_encoding_max_length() <= 1)
! PG_RETURN_INT32(VARSIZE(t) - VARHDRSZ);
!
! PG_RETURN_INT32(
! pg_mbstrlen_with_len(VARDATA(t), VARSIZE(t) - VARHDRSZ)
! );
! #else
! PG_RETURN_INT32(VARSIZE(t) - VARHDRSZ);
! #endif
}
/*
--- 286,309 ----
Datum
textlen(PG_FUNCTION_ARGS)
{
! /* fastpath when max encoding length is one */
! if (pg_database_encoding_max_length() == 1)
! PG_RETURN_INT32(toast_raw_datum_size(PG_GETARG_DATUM(0)) - VARHDRSZ);
! if (pg_database_encoding_max_length() > 1)
! {
! text *t = PG_GETARG_TEXT_P(0);
!
! PG_RETURN_INT32(pg_mbstrlen_with_len(VARDATA(t),
! VARSIZE(t) - VARHDRSZ));
! }
!
! /* should never get here */
! elog(ERROR, "Invalid backend encoding; encoding max length "
! "is less than one.");
!
! /* notreached: suppress compiler warning */
! PG_RETURN_NULL();
}
/*
***************
*** 308,316 ****
Datum
textoctetlen(PG_FUNCTION_ARGS)
{
! text *arg = PG_GETARG_TEXT_P(0);
!
! PG_RETURN_INT32(VARSIZE(arg) - VARHDRSZ);
}
/*
--- 314,320 ----
Datum
textoctetlen(PG_FUNCTION_ARGS)
{
! PG_RETURN_INT32(toast_raw_datum_size(PG_GETARG_DATUM(0)) - VARHDRSZ);
}
/*
***************
*** 358,363 ****
--- 362,375 ----
PG_RETURN_TEXT_P(result);
}
+
+ #define PG_TEXTARG_GET_STR(arg_) \
+ DatumGetCString(DirectFunctionCall1(textout, PG_GETARG_DATUM(arg_)))
+ #define PG_TEXT_GET_STR(textp_) \
+ DatumGetCString(DirectFunctionCall1(textout, PointerGetDatum(textp_)))
+ #define PG_STR_GET_TEXT(str_) \
+ DatumGetTextP(DirectFunctionCall1(textin, CStringGetDatum(str_)))
+
/*
* text_substr()
* Return a substring starting at the specified position.
***************
*** 382,471 ****
* - Thomas Lockhart 1998-12-10
* Now uses faster TOAST-slicing interface
* - John Gray 2002-02-22
*/
Datum
text_substr(PG_FUNCTION_ARGS)
{
! text *string;
! int32 m = PG_GETARG_INT32(1);
! int32 n = PG_GETARG_INT32(2);
! int32 sm;
! int32 sn;
! int eml = 1;
! #ifdef MULTIBYTE
! int i;
! int len;
! text *ret;
! char *p;
! #endif
! /*
! * starting position before the start of the string? then offset into
! * the string per SQL92 spec...
! */
! if (m < 1)
{
! n += (m - 1);
! m = 1;
! }
! /* Check for m > octet length is made in TOAST access routine */
! /* m will now become a zero-based starting position */
! sm = m - 1;
! sn = n;
! #ifdef MULTIBYTE
! eml = pg_database_encoding_max_length ();
! if (eml > 1)
! {
! sm = 0;
! if (n > -1)
! sn = (m + n) * eml + 3; /* +3 to avoid mb characters overhanging slice end */
else
! sn = n; /* n < 0 is special-cased by heap_tuple_untoast_attr_slice */
! }
! #endif
! string = PG_GETARG_TEXT_P_SLICE (0, sm, sn);
! if (eml == 1)
! {
! PG_RETURN_TEXT_P (string);
! }
! #ifndef MULTIBYTE
! PG_RETURN_NULL(); /* notreached: suppress compiler warning */
! #endif
! #ifdef MULTIBYTE
! if (n > -1)
! len = pg_mbstrlen_with_len (VARDATA (string), sn - 3);
! else /* n < 0 is special-cased; need full string length */
! len = pg_mbstrlen_with_len (VARDATA (string), VARSIZE(string)-VARHDRSZ);
!
! if (m > len)
! {
! m = 1;
! n = 0;
! }
! m--;
! if (((m + n) > len) || (n < 0))
! n = (len - m);
!
! p = VARDATA(string);
! for (i = 0; i < m; i++)
! p += pg_mblen(p);
! m = p - VARDATA(string);
! for (i = 0; i < n; i++)
! p += pg_mblen(p);
! n = p - (VARDATA(string) + m);
! ret = (text *) palloc(VARHDRSZ + n);
! VARATT_SIZEP(ret) = VARHDRSZ + n;
! memcpy(VARDATA(ret), VARDATA(string) + m, n);
! PG_RETURN_TEXT_P(ret);
! #endif
}
/*
--- 394,582 ----
* - Thomas Lockhart 1998-12-10
* Now uses faster TOAST-slicing interface
* - John Gray 2002-02-22
+ * Remove "#ifdef MULTIBYTE" and test for encoding_max_length instead. Change
+ * behaviors conflicting with SQL92 to meet SQL92 (if E = S + L < S throw
+ * error; if E < 1, return '', not entire string). Fixed MB related bug when
+ * S > LC and < LC + 4 sometimes garbage characters are returned.
+ * - Joe Conway 2002-08-10
*/
Datum
text_substr(PG_FUNCTION_ARGS)
{
! int S = PG_GETARG_INT32(1); /* start position */
! int eml = pg_database_encoding_max_length();
! int S1; /* adjusted start position */
! int L1; /* adjusted substring length */
! /* life is easy if the encoding max length is 1 */
! if (eml == 1)
{
! S1 = Max(S, 1);
! if (fcinfo->nargs == 2)
! {
! /*
! * Not passed a length - PG_GETARG_TEXT_P_SLICE()
! * grabs everything to the end of the string if we pass it
! * a negative value for length.
! */
! L1 = -1;
! }
! else
! {
! /* end position */
! int E = S + PG_GETARG_INT32(2);
! /*
! * A negative value for L is the only way for the end position
! * to be before the start. SQL99 says to throw an error.
! */
! if (E < S)
! elog(ERROR, "negative substring length not allowed");
! /*
! * A zero or negative value for the end position can happen if the start
! * was negative or one. SQL99 says to return a zero-length string.
! */
! if (E < 1)
! PG_RETURN_TEXT_P(PG_STR_GET_TEXT(""));
!
! L1 = E - S1;
! }
!
! /*
! * If the start position is past the end of the string,
! * SQL99 says to return a zero-length string --
! * PG_GETARG_TEXT_P_SLICE() will do that for us.
! * Convert to zero-based starting position
! */
! PG_RETURN_TEXT_P (PG_GETARG_TEXT_P_SLICE (0, S1 - 1, L1));
! }
! else if (eml > 1)
! {
! /*
! * When encoding max length is > 1, we can't get LC without
! * detoasting, so we'll grab a conservatively large slice
! * now and go back later to do the right thing
! */
! int slice_start;
! int slice_size;
! int slice_strlen;
! text *slice;
! int E1;
! int i;
! char *p;
! char *s;
! text *ret;
!
! /*
! * if S is past the end of the string, the tuple toaster
! * will return a zero-length string to us
! */
! S1 = Max(S, 1);
!
! /*
! * We need to start at position zero because there is no
! * way to know in advance which byte offset corresponds to
! * the supplied start position.
! */
! slice_start = 0;
!
! if (fcinfo->nargs == 2)
! {
! /*
! * If we were not passed a length, the spec says that
! * E = Max(LC + 1, S). Since we don't know LC yet, set
! * slice_size = -1 which will cause heap_tuple_untoast_attr_slice
! * to give use everything to the end of the string.
! * If S > LC + 1, we'll get back a zero length string anyway.
! */
! slice_size = L1 = -1;
! }
else
! {
! int E = S + PG_GETARG_INT32(2);
! /*
! * A negative value for L is the only way for the end position
! * to be before the start. SQL99 says to throw an error.
! */
! if (E < S)
! elog(ERROR, "negative substring length not allowed");
! /*
! * A zero or negative value for the end position can happen if the start
! * was negative or one. SQL99 says to return a zero-length string.
! */
! if (E < 1)
! PG_RETURN_TEXT_P(PG_STR_GET_TEXT(""));
! /*
! * if E is past the end of the string, the tuple toaster
! * will truncate the length for us
! */
! L1 = E - S1;
! /*
! * Total slice size in bytes can't be any longer than the start
! * position plus substring length times the encoding max length.
! */
! slice_size = (S1 + L1) * eml;
! }
! slice = PG_GETARG_TEXT_P_SLICE (0, slice_start, slice_size);
! /* see if we got back an empty string */
! if ((VARSIZE(slice) - VARHDRSZ) == 0)
! PG_RETURN_TEXT_P(PG_STR_GET_TEXT(""));
!
! /* Now we can get the actual length of the slice in MB characters */
! slice_strlen = pg_mbstrlen_with_len (VARDATA(slice), VARSIZE(slice) - VARHDRSZ);
!
! /* Check that the start position wasn't > slice_strlen. If so,
! * SQL99 says to return a zero-length string.
! */
! if (S1 > slice_strlen)
! PG_RETURN_TEXT_P(PG_STR_GET_TEXT(""));
!
! /*
! * Adjust L1 and E1 now that we know the slice string length.
! * Again remember that S1 is one based, and slice_start is zero based.
! */
! if (L1 > -1)
! E1 = Min(S1 + L1 , slice_start + 1 + slice_strlen);
! else
! E1 = slice_start + 1 + slice_strlen;
!
! /*
! * Find the start position in the slice;
! * remember S1 is not zero based
! */
! p = VARDATA(slice);
! for (i = 0; i < S1 - 1; i++)
! p += pg_mblen(p);
!
! /* hang onto a pointer to our start position */
! s = p;
!
! /*
! * Count the actual bytes used by the substring of
! * the requested length.
! */
! for (i = S1; i < E1; i++)
! p += pg_mblen(p);
!
! ret = (text *) palloc(VARHDRSZ + (p - s));
! VARATT_SIZEP(ret) = VARHDRSZ + (p - s);
! memcpy(VARDATA(ret), s, (p - s));
!
! PG_RETURN_TEXT_P(ret);
! }
! else
! elog(ERROR, "Invalid backend encoding; encoding max length "
! "is less than one.");
!
! /* notreached: suppress compiler warning */
! PG_RETURN_NULL();
}
/*
***************
*** 758,766 ****
Datum
byteaoctetlen(PG_FUNCTION_ARGS)
{
! bytea *v = PG_GETARG_BYTEA_P(0);
!
! PG_RETURN_INT32(VARSIZE(v) - VARHDRSZ);
}
/*
--- 869,875 ----
Datum
byteaoctetlen(PG_FUNCTION_ARGS)
{
! PG_RETURN_INT32(toast_raw_datum_size(PG_GETARG_DATUM(0)) - VARHDRSZ);
}
/*
***************
*** 805,810 ****
--- 914,921 ----
PG_RETURN_BYTEA_P(result);
}
+ #define PG_STR_GET_BYTEA(str_) \
+ DatumGetByteaP(DirectFunctionCall1(byteain, CStringGetDatum(str_)))
/*
* bytea_substr()
* Return a substring starting at the specified position.
***************
*** 813,845 ****
* Input:
* - string
* - starting position (is one-based)
! * - string length
*
* If the starting position is zero or less, then return from the start of the string
* adjusting the length to be consistent with the "negative start" per SQL92.
! * If the length is less than zero, return the remaining string.
! *
*/
Datum
bytea_substr(PG_FUNCTION_ARGS)
{
! int32 m = PG_GETARG_INT32(1);
! int32 n = PG_GETARG_INT32(2);
!
! /*
! * starting position before the start of the string? then offset into
! * the string per SQL92 spec...
! */
! if (m < 1)
! {
! n += (m - 1);
! m = 1;
}
! /* m will now become a zero-based starting position */
! m--;
!
! PG_RETURN_BYTEA_P(PG_GETARG_BYTEA_P_SLICE (0, m, n));
}
/*
--- 924,983 ----
* Input:
* - string
* - starting position (is one-based)
! * - string length (optional)
*
* If the starting position is zero or less, then return from the start of the string
* adjusting the length to be consistent with the "negative start" per SQL92.
! * If the length is less than zero, an ERROR is thrown. If no third argument
! * (length) is provided, the length to the end of the string is assumed.
*/
Datum
bytea_substr(PG_FUNCTION_ARGS)
{
! int S = PG_GETARG_INT32(1); /* start position */
! int S1; /* adjusted start position */
! int L1; /* adjusted substring length */
!
! S1 = Max(S, 1);
!
! if (fcinfo->nargs == 2)
! {
! /*
! * Not passed a length - PG_GETARG_BYTEA_P_SLICE()
! * grabs everything to the end of the string if we pass it
! * a negative value for length.
! */
! L1 = -1;
}
+ else
+ {
+ /* end position */
+ int E = S + PG_GETARG_INT32(2);
! /*
! * A negative value for L is the only way for the end position
! * to be before the start. SQL99 says to throw an error.
! */
! if (E < S)
! elog(ERROR, "negative substring length not allowed");
!
! /*
! * A zero or negative value for the end position can happen if the start
! * was negative or one. SQL99 says to return a zero-length string.
! */
! if (E < 1)
! PG_RETURN_BYTEA_P(PG_STR_GET_BYTEA(""));
!
! L1 = E - S1;
! }
!
! /*
! * If the start position is past the end of the string,
! * SQL99 says to return a zero-length string --
! * PG_GETARG_TEXT_P_SLICE() will do that for us.
! * Convert to zero-based starting position
! */
! PG_RETURN_BYTEA_P(PG_GETARG_BYTEA_P_SLICE (0, S1 - 1, L1));
}
/*
Index: src/include/catalog/pg_proc.h
===================================================================
RCS file: /opt/src/cvs/pgsql-server/src/include/catalog/pg_proc.h,v
retrieving revision 1.253
diff -c -r1.253 pg_proc.h
*** src/include/catalog/pg_proc.h 9 Aug 2002 16:45:15 -0000 1.253
--- src/include/catalog/pg_proc.h 14 Aug 2002 02:34:38 -0000
***************
*** 2121,2127 ****
DESCR("remove initial characters from string");
DATA(insert OID = 882 ( rtrim PGNSP PGUID 14 f f t f i 1 25 "25" "select rtrim($1, \' \')" - _null_ ));
DESCR("remove trailing characters from string");
! DATA(insert OID = 883 ( substr PGNSP PGUID 14 f f t f i 2 25 "25 23" "select substr($1, $2, -1)" - _null_ ));
DESCR("return portion of string");
DATA(insert OID = 884 ( btrim PGNSP PGUID 12 f f t f i 2 25 "25 25" btrim - _null_ ));
DESCR("trim both ends of string");
--- 2121,2127 ----
DESCR("remove initial characters from string");
DATA(insert OID = 882 ( rtrim PGNSP PGUID 14 f f t f i 1 25 "25" "select rtrim($1, \' \')" - _null_ ));
DESCR("remove trailing characters from string");
! DATA(insert OID = 883 ( substr PGNSP PGUID 12 f f t f i 2 25 "25 23" text_substr - _null_ ));
DESCR("return portion of string");
DATA(insert OID = 884 ( btrim PGNSP PGUID 12 f f t f i 2 25 "25 25" btrim - _null_ ));
DESCR("trim both ends of string");
***************
*** 2130,2136 ****
DATA(insert OID = 936 ( substring PGNSP PGUID 12 f f t f i 3 25 "25 23 23" text_substr - _null_ ));
DESCR("return portion of string");
! DATA(insert OID = 937 ( substring PGNSP PGUID 14 f f t f i 2 25 "25 23" "select substring($1, $2, -1)" - _null_ ));
DESCR("return portion of string");
/* for multi-byte support */
--- 2130,2136 ----
DATA(insert OID = 936 ( substring PGNSP PGUID 12 f f t f i 3 25 "25 23 23" text_substr - _null_ ));
DESCR("return portion of string");
! DATA(insert OID = 937 ( substring PGNSP PGUID 12 f f t f i 2 25 "25 23" text_substr - _null_ ));
DESCR("return portion of string");
/* for multi-byte support */
***************
*** 2778,2784 ****
DESCR("concatenate");
DATA(insert OID = 2012 ( substring PGNSP PGUID 12 f f t f i 3 17 "17 23 23" bytea_substr - _null_ ));
DESCR("return portion of string");
! DATA(insert OID = 2013 ( substring PGNSP PGUID 14 f f t f i 2 17 "17 23" "select substring($1, $2, -1)" - _null_ ));
DESCR("return portion of string");
DATA(insert OID = 2014 ( position PGNSP PGUID 12 f f t f i 2 23 "17 17" byteapos - _null_ ));
DESCR("return position of substring");
--- 2778,2784 ----
DESCR("concatenate");
DATA(insert OID = 2012 ( substring PGNSP PGUID 12 f f t f i 3 17 "17 23 23" bytea_substr - _null_ ));
DESCR("return portion of string");
! DATA(insert OID = 2013 ( substring PGNSP PGUID 12 f f t f i 2 17 "17 23" bytea_substr - _null_ ));
DESCR("return portion of string");
DATA(insert OID = 2014 ( position PGNSP PGUID 12 f f t f i 2 23 "17 17" byteapos - _null_ ));
DESCR("return position of substring");
Joe Conway <mail@joeconway.com> writes:
The only remaining problem is that this causes opr_sanity to fail based
on this query: ...
This fails because I implemented text_substr() and bytea_substr() to
take either 2 or 3 args. This was necessary for SQL92 spec compliance.
Rather than loosening the opr_sanity test, I'd suggest setting this
up as two separate builtin functions. They can call a common
implementation routine if you like. But a runtime test on the number
of arguments doesn't offer any attractive improvement.
I'm planning to take on the replace function next.
Isn't Gavin on that already?
regards, tom lane
Tom Lane wrote:
I'm planning to take on the replace function next.
Isn't Gavin on that already?
No, sorry for the confusion. I meant:
replace(bigstring, substr, newsubstr)
which I discussed (mainly with with Thomas) a few weeks ago. This
current work was a warmup, since I wasn't comfortable with MB character
handling, and noticed some issues whilst studying it.
Joe
Tom Lane wrote:
Joe Conway <mail@joeconway.com> writes:
The only remaining problem is that this causes opr_sanity to fail based
on this query: ...
This fails because I implemented text_substr() and bytea_substr() to
take either 2 or 3 args. This was necessary for SQL92 spec compliance.Rather than loosening the opr_sanity test, I'd suggest setting this
up as two separate builtin functions. They can call a common
implementation routine if you like. But a runtime test on the number
of arguments doesn't offer any attractive improvement.
I took Tom's advice and added wrapper functions around text_substr() and
bytea_substr() to cover the 2 argument case.
I also added tests to strings.sql to cover substr() on toasted columns
of both text and bytea.
If there are no objections, please apply.
Thanks,
Joe
Attachments:
substr.2002.08.14.1.patchtext/plain; name=substr.2002.08.14.1.patchDownload
Index: src/backend/utils/adt/varlena.c
===================================================================
RCS file: /opt/src/cvs/pgsql-server/src/backend/utils/adt/varlena.c,v
retrieving revision 1.87
diff -c -r1.87 varlena.c
*** src/backend/utils/adt/varlena.c 4 Aug 2002 06:44:47 -0000 1.87
--- src/backend/utils/adt/varlena.c 14 Aug 2002 16:44:36 -0000
***************
*** 18,23 ****
--- 18,24 ----
#include "mb/pg_wchar.h"
#include "miscadmin.h"
+ #include "access/tuptoaster.h"
#include "utils/builtins.h"
#include "utils/pg_locale.h"
***************
*** 285,303 ****
Datum
textlen(PG_FUNCTION_ARGS)
{
! text *t = PG_GETARG_TEXT_P(0);
! #ifdef MULTIBYTE
! /* optimization for single byte encoding */
! if (pg_database_encoding_max_length() <= 1)
! PG_RETURN_INT32(VARSIZE(t) - VARHDRSZ);
!
! PG_RETURN_INT32(
! pg_mbstrlen_with_len(VARDATA(t), VARSIZE(t) - VARHDRSZ)
! );
! #else
! PG_RETURN_INT32(VARSIZE(t) - VARHDRSZ);
! #endif
}
/*
--- 286,309 ----
Datum
textlen(PG_FUNCTION_ARGS)
{
! /* fastpath when max encoding length is one */
! if (pg_database_encoding_max_length() == 1)
! PG_RETURN_INT32(toast_raw_datum_size(PG_GETARG_DATUM(0)) - VARHDRSZ);
! if (pg_database_encoding_max_length() > 1)
! {
! text *t = PG_GETARG_TEXT_P(0);
!
! PG_RETURN_INT32(pg_mbstrlen_with_len(VARDATA(t),
! VARSIZE(t) - VARHDRSZ));
! }
!
! /* should never get here */
! elog(ERROR, "Invalid backend encoding; encoding max length "
! "is less than one.");
!
! /* notreached: suppress compiler warning */
! PG_RETURN_NULL();
}
/*
***************
*** 308,316 ****
Datum
textoctetlen(PG_FUNCTION_ARGS)
{
! text *arg = PG_GETARG_TEXT_P(0);
!
! PG_RETURN_INT32(VARSIZE(arg) - VARHDRSZ);
}
/*
--- 314,320 ----
Datum
textoctetlen(PG_FUNCTION_ARGS)
{
! PG_RETURN_INT32(toast_raw_datum_size(PG_GETARG_DATUM(0)) - VARHDRSZ);
}
/*
***************
*** 358,363 ****
--- 362,375 ----
PG_RETURN_TEXT_P(result);
}
+
+ #define PG_TEXTARG_GET_STR(arg_) \
+ DatumGetCString(DirectFunctionCall1(textout, PG_GETARG_DATUM(arg_)))
+ #define PG_TEXT_GET_STR(textp_) \
+ DatumGetCString(DirectFunctionCall1(textout, PointerGetDatum(textp_)))
+ #define PG_STR_GET_TEXT(str_) \
+ DatumGetTextP(DirectFunctionCall1(textin, CStringGetDatum(str_)))
+
/*
* text_substr()
* Return a substring starting at the specified position.
***************
*** 382,471 ****
* - Thomas Lockhart 1998-12-10
* Now uses faster TOAST-slicing interface
* - John Gray 2002-02-22
*/
Datum
text_substr(PG_FUNCTION_ARGS)
{
! text *string;
! int32 m = PG_GETARG_INT32(1);
! int32 n = PG_GETARG_INT32(2);
! int32 sm;
! int32 sn;
! int eml = 1;
! #ifdef MULTIBYTE
! int i;
! int len;
! text *ret;
! char *p;
! #endif
! /*
! * starting position before the start of the string? then offset into
! * the string per SQL92 spec...
! */
! if (m < 1)
{
! n += (m - 1);
! m = 1;
! }
! /* Check for m > octet length is made in TOAST access routine */
! /* m will now become a zero-based starting position */
! sm = m - 1;
! sn = n;
! #ifdef MULTIBYTE
! eml = pg_database_encoding_max_length ();
! if (eml > 1)
! {
! sm = 0;
! if (n > -1)
! sn = (m + n) * eml + 3; /* +3 to avoid mb characters overhanging slice end */
else
! sn = n; /* n < 0 is special-cased by heap_tuple_untoast_attr_slice */
! }
! #endif
! string = PG_GETARG_TEXT_P_SLICE (0, sm, sn);
! if (eml == 1)
! {
! PG_RETURN_TEXT_P (string);
! }
! #ifndef MULTIBYTE
! PG_RETURN_NULL(); /* notreached: suppress compiler warning */
! #endif
! #ifdef MULTIBYTE
! if (n > -1)
! len = pg_mbstrlen_with_len (VARDATA (string), sn - 3);
! else /* n < 0 is special-cased; need full string length */
! len = pg_mbstrlen_with_len (VARDATA (string), VARSIZE(string)-VARHDRSZ);
!
! if (m > len)
! {
! m = 1;
! n = 0;
! }
! m--;
! if (((m + n) > len) || (n < 0))
! n = (len - m);
!
! p = VARDATA(string);
! for (i = 0; i < m; i++)
! p += pg_mblen(p);
! m = p - VARDATA(string);
! for (i = 0; i < n; i++)
! p += pg_mblen(p);
! n = p - (VARDATA(string) + m);
! ret = (text *) palloc(VARHDRSZ + n);
! VARATT_SIZEP(ret) = VARHDRSZ + n;
! memcpy(VARDATA(ret), VARDATA(string) + m, n);
! PG_RETURN_TEXT_P(ret);
! #endif
}
/*
--- 394,593 ----
* - Thomas Lockhart 1998-12-10
* Now uses faster TOAST-slicing interface
* - John Gray 2002-02-22
+ * Remove "#ifdef MULTIBYTE" and test for encoding_max_length instead. Change
+ * behaviors conflicting with SQL92 to meet SQL92 (if E = S + L < S throw
+ * error; if E < 1, return '', not entire string). Fixed MB related bug when
+ * S > LC and < LC + 4 sometimes garbage characters are returned.
+ * - Joe Conway 2002-08-10
*/
Datum
text_substr(PG_FUNCTION_ARGS)
{
! int S = PG_GETARG_INT32(1); /* start position */
! int eml = pg_database_encoding_max_length();
! int S1; /* adjusted start position */
! int L1; /* adjusted substring length */
! /* life is easy if the encoding max length is 1 */
! if (eml == 1)
{
! S1 = Max(S, 1);
!
! if (fcinfo->nargs == 2)
! {
! /*
! * Not passed a length - PG_GETARG_TEXT_P_SLICE()
! * grabs everything to the end of the string if we pass it
! * a negative value for length.
! */
! L1 = -1;
! }
! else
! {
! /* end position */
! int E = S + PG_GETARG_INT32(2);
! /*
! * A negative value for L is the only way for the end position
! * to be before the start. SQL99 says to throw an error.
! */
! if (E < S)
! elog(ERROR, "negative substring length not allowed");
! /*
! * A zero or negative value for the end position can happen if the start
! * was negative or one. SQL99 says to return a zero-length string.
! */
! if (E < 1)
! PG_RETURN_TEXT_P(PG_STR_GET_TEXT(""));
! L1 = E - S1;
! }
!
! /*
! * If the start position is past the end of the string,
! * SQL99 says to return a zero-length string --
! * PG_GETARG_TEXT_P_SLICE() will do that for us.
! * Convert to zero-based starting position
! */
! PG_RETURN_TEXT_P (PG_GETARG_TEXT_P_SLICE (0, S1 - 1, L1));
! }
! else if (eml > 1)
! {
! /*
! * When encoding max length is > 1, we can't get LC without
! * detoasting, so we'll grab a conservatively large slice
! * now and go back later to do the right thing
! */
! int slice_start;
! int slice_size;
! int slice_strlen;
! text *slice;
! int E1;
! int i;
! char *p;
! char *s;
! text *ret;
!
! /*
! * if S is past the end of the string, the tuple toaster
! * will return a zero-length string to us
! */
! S1 = Max(S, 1);
!
! /*
! * We need to start at position zero because there is no
! * way to know in advance which byte offset corresponds to
! * the supplied start position.
! */
! slice_start = 0;
!
! if (fcinfo->nargs == 2)
! {
! /*
! * If we were not passed a length, the spec says that
! * E = Max(LC + 1, S). Since we don't know LC yet, set
! * slice_size = -1 which will cause heap_tuple_untoast_attr_slice
! * to give use everything to the end of the string.
! * If S > LC + 1, we'll get back a zero length string anyway.
! */
! slice_size = L1 = -1;
! }
else
! {
! int E = S + PG_GETARG_INT32(2);
!
! /*
! * A negative value for L is the only way for the end position
! * to be before the start. SQL99 says to throw an error.
! */
! if (E < S)
! elog(ERROR, "negative substring length not allowed");
! /*
! * A zero or negative value for the end position can happen if the start
! * was negative or one. SQL99 says to return a zero-length string.
! */
! if (E < 1)
! PG_RETURN_TEXT_P(PG_STR_GET_TEXT(""));
! /*
! * if E is past the end of the string, the tuple toaster
! * will truncate the length for us
! */
! L1 = E - S1;
!
! /*
! * Total slice size in bytes can't be any longer than the start
! * position plus substring length times the encoding max length.
! */
! slice_size = (S1 + L1) * eml;
! }
! slice = PG_GETARG_TEXT_P_SLICE (0, slice_start, slice_size);
! /* see if we got back an empty string */
! if ((VARSIZE(slice) - VARHDRSZ) == 0)
! PG_RETURN_TEXT_P(PG_STR_GET_TEXT(""));
!
! /* Now we can get the actual length of the slice in MB characters */
! slice_strlen = pg_mbstrlen_with_len (VARDATA(slice), VARSIZE(slice) - VARHDRSZ);
!
! /* Check that the start position wasn't > slice_strlen. If so,
! * SQL99 says to return a zero-length string.
! */
! if (S1 > slice_strlen)
! PG_RETURN_TEXT_P(PG_STR_GET_TEXT(""));
!
! /*
! * Adjust L1 and E1 now that we know the slice string length.
! * Again remember that S1 is one based, and slice_start is zero based.
! */
! if (L1 > -1)
! E1 = Min(S1 + L1 , slice_start + 1 + slice_strlen);
! else
! E1 = slice_start + 1 + slice_strlen;
! /*
! * Find the start position in the slice;
! * remember S1 is not zero based
! */
! p = VARDATA(slice);
! for (i = 0; i < S1 - 1; i++)
! p += pg_mblen(p);
!
! /* hang onto a pointer to our start position */
! s = p;
!
! /*
! * Count the actual bytes used by the substring of
! * the requested length.
! */
! for (i = S1; i < E1; i++)
! p += pg_mblen(p);
!
! ret = (text *) palloc(VARHDRSZ + (p - s));
! VARATT_SIZEP(ret) = VARHDRSZ + (p - s);
! memcpy(VARDATA(ret), s, (p - s));
! PG_RETURN_TEXT_P(ret);
! }
! else
! elog(ERROR, "Invalid backend encoding; encoding max length "
! "is less than one.");
!
! /* notreached: suppress compiler warning */
! PG_RETURN_NULL();
! }
!
! /*
! * text_substr_no_len -
! * Wrapper to avoid opr_sanity failure due to
! * one function accepting a different number of args.
! */
! Datum
! text_substr_no_len(PG_FUNCTION_ARGS)
! {
! return text_substr(fcinfo);
}
/*
***************
*** 758,766 ****
Datum
byteaoctetlen(PG_FUNCTION_ARGS)
{
! bytea *v = PG_GETARG_BYTEA_P(0);
!
! PG_RETURN_INT32(VARSIZE(v) - VARHDRSZ);
}
/*
--- 880,886 ----
Datum
byteaoctetlen(PG_FUNCTION_ARGS)
{
! PG_RETURN_INT32(toast_raw_datum_size(PG_GETARG_DATUM(0)) - VARHDRSZ);
}
/*
***************
*** 805,810 ****
--- 925,932 ----
PG_RETURN_BYTEA_P(result);
}
+ #define PG_STR_GET_BYTEA(str_) \
+ DatumGetByteaP(DirectFunctionCall1(byteain, CStringGetDatum(str_)))
/*
* bytea_substr()
* Return a substring starting at the specified position.
***************
*** 813,845 ****
* Input:
* - string
* - starting position (is one-based)
! * - string length
*
* If the starting position is zero or less, then return from the start of the string
* adjusting the length to be consistent with the "negative start" per SQL92.
! * If the length is less than zero, return the remaining string.
! *
*/
Datum
bytea_substr(PG_FUNCTION_ARGS)
{
! int32 m = PG_GETARG_INT32(1);
! int32 n = PG_GETARG_INT32(2);
!
! /*
! * starting position before the start of the string? then offset into
! * the string per SQL92 spec...
! */
! if (m < 1)
! {
! n += (m - 1);
! m = 1;
}
! /* m will now become a zero-based starting position */
! m--;
! PG_RETURN_BYTEA_P(PG_GETARG_BYTEA_P_SLICE (0, m, n));
}
/*
--- 935,1005 ----
* Input:
* - string
* - starting position (is one-based)
! * - string length (optional)
*
* If the starting position is zero or less, then return from the start of the string
* adjusting the length to be consistent with the "negative start" per SQL92.
! * If the length is less than zero, an ERROR is thrown. If no third argument
! * (length) is provided, the length to the end of the string is assumed.
*/
Datum
bytea_substr(PG_FUNCTION_ARGS)
{
! int S = PG_GETARG_INT32(1); /* start position */
! int S1; /* adjusted start position */
! int L1; /* adjusted substring length */
!
! S1 = Max(S, 1);
!
! if (fcinfo->nargs == 2)
! {
! /*
! * Not passed a length - PG_GETARG_BYTEA_P_SLICE()
! * grabs everything to the end of the string if we pass it
! * a negative value for length.
! */
! L1 = -1;
}
+ else
+ {
+ /* end position */
+ int E = S + PG_GETARG_INT32(2);
! /*
! * A negative value for L is the only way for the end position
! * to be before the start. SQL99 says to throw an error.
! */
! if (E < S)
! elog(ERROR, "negative substring length not allowed");
!
! /*
! * A zero or negative value for the end position can happen if the start
! * was negative or one. SQL99 says to return a zero-length string.
! */
! if (E < 1)
! PG_RETURN_BYTEA_P(PG_STR_GET_BYTEA(""));
!
! L1 = E - S1;
! }
!
! /*
! * If the start position is past the end of the string,
! * SQL99 says to return a zero-length string --
! * PG_GETARG_TEXT_P_SLICE() will do that for us.
! * Convert to zero-based starting position
! */
! PG_RETURN_BYTEA_P(PG_GETARG_BYTEA_P_SLICE (0, S1 - 1, L1));
! }
! /*
! * bytea_substr_no_len -
! * Wrapper to avoid opr_sanity failure due to
! * one function accepting a different number of args.
! */
! Datum
! bytea_substr_no_len(PG_FUNCTION_ARGS)
! {
! return bytea_substr(fcinfo);
}
/*
Index: src/include/catalog/pg_proc.h
===================================================================
RCS file: /opt/src/cvs/pgsql-server/src/include/catalog/pg_proc.h,v
retrieving revision 1.253
diff -c -r1.253 pg_proc.h
*** src/include/catalog/pg_proc.h 9 Aug 2002 16:45:15 -0000 1.253
--- src/include/catalog/pg_proc.h 14 Aug 2002 17:44:01 -0000
***************
*** 2121,2127 ****
DESCR("remove initial characters from string");
DATA(insert OID = 882 ( rtrim PGNSP PGUID 14 f f t f i 1 25 "25" "select rtrim($1, \' \')" - _null_ ));
DESCR("remove trailing characters from string");
! DATA(insert OID = 883 ( substr PGNSP PGUID 14 f f t f i 2 25 "25 23" "select substr($1, $2, -1)" - _null_ ));
DESCR("return portion of string");
DATA(insert OID = 884 ( btrim PGNSP PGUID 12 f f t f i 2 25 "25 25" btrim - _null_ ));
DESCR("trim both ends of string");
--- 2121,2127 ----
DESCR("remove initial characters from string");
DATA(insert OID = 882 ( rtrim PGNSP PGUID 14 f f t f i 1 25 "25" "select rtrim($1, \' \')" - _null_ ));
DESCR("remove trailing characters from string");
! DATA(insert OID = 883 ( substr PGNSP PGUID 12 f f t f i 2 25 "25 23" text_substr_no_len - _null_ ));
DESCR("return portion of string");
DATA(insert OID = 884 ( btrim PGNSP PGUID 12 f f t f i 2 25 "25 25" btrim - _null_ ));
DESCR("trim both ends of string");
***************
*** 2130,2136 ****
DATA(insert OID = 936 ( substring PGNSP PGUID 12 f f t f i 3 25 "25 23 23" text_substr - _null_ ));
DESCR("return portion of string");
! DATA(insert OID = 937 ( substring PGNSP PGUID 14 f f t f i 2 25 "25 23" "select substring($1, $2, -1)" - _null_ ));
DESCR("return portion of string");
/* for multi-byte support */
--- 2130,2136 ----
DATA(insert OID = 936 ( substring PGNSP PGUID 12 f f t f i 3 25 "25 23 23" text_substr - _null_ ));
DESCR("return portion of string");
! DATA(insert OID = 937 ( substring PGNSP PGUID 12 f f t f i 2 25 "25 23" text_substr_no_len - _null_ ));
DESCR("return portion of string");
/* for multi-byte support */
***************
*** 2778,2784 ****
DESCR("concatenate");
DATA(insert OID = 2012 ( substring PGNSP PGUID 12 f f t f i 3 17 "17 23 23" bytea_substr - _null_ ));
DESCR("return portion of string");
! DATA(insert OID = 2013 ( substring PGNSP PGUID 14 f f t f i 2 17 "17 23" "select substring($1, $2, -1)" - _null_ ));
DESCR("return portion of string");
DATA(insert OID = 2014 ( position PGNSP PGUID 12 f f t f i 2 23 "17 17" byteapos - _null_ ));
DESCR("return position of substring");
--- 2778,2788 ----
DESCR("concatenate");
DATA(insert OID = 2012 ( substring PGNSP PGUID 12 f f t f i 3 17 "17 23 23" bytea_substr - _null_ ));
DESCR("return portion of string");
! DATA(insert OID = 2013 ( substring PGNSP PGUID 12 f f t f i 2 17 "17 23" bytea_substr_no_len - _null_ ));
! DESCR("return portion of string");
! DATA(insert OID = 2084 ( substr PGNSP PGUID 12 f f t f i 3 17 "17 23 23" bytea_substr - _null_ ));
! DESCR("return portion of string");
! DATA(insert OID = 2085 ( substr PGNSP PGUID 12 f f t f i 2 17 "17 23" bytea_substr_no_len - _null_ ));
DESCR("return portion of string");
DATA(insert OID = 2014 ( position PGNSP PGUID 12 f f t f i 2 23 "17 17" byteapos - _null_ ));
DESCR("return position of substring");
Index: src/include/utils/builtins.h
===================================================================
RCS file: /opt/src/cvs/pgsql-server/src/include/utils/builtins.h,v
retrieving revision 1.190
diff -c -r1.190 builtins.h
*** src/include/utils/builtins.h 9 Aug 2002 16:45:16 -0000 1.190
--- src/include/utils/builtins.h 14 Aug 2002 16:35:26 -0000
***************
*** 447,452 ****
--- 447,453 ----
extern Datum textoctetlen(PG_FUNCTION_ARGS);
extern Datum textpos(PG_FUNCTION_ARGS);
extern Datum text_substr(PG_FUNCTION_ARGS);
+ extern Datum text_substr_no_len(PG_FUNCTION_ARGS);
extern Datum name_text(PG_FUNCTION_ARGS);
extern Datum text_name(PG_FUNCTION_ARGS);
extern int varstr_cmp(char *arg1, int len1, char *arg2, int len2);
***************
*** 476,481 ****
--- 477,483 ----
extern Datum byteacat(PG_FUNCTION_ARGS);
extern Datum byteapos(PG_FUNCTION_ARGS);
extern Datum bytea_substr(PG_FUNCTION_ARGS);
+ extern Datum bytea_substr_no_len(PG_FUNCTION_ARGS);
/* version.c */
extern Datum pgsql_version(PG_FUNCTION_ARGS);
Index: src/test/regress/expected/strings.out
===================================================================
RCS file: /opt/src/cvs/pgsql-server/src/test/regress/expected/strings.out,v
retrieving revision 1.12
diff -c -r1.12 strings.out
*** src/test/regress/expected/strings.out 11 Jun 2002 15:41:38 -0000 1.12
--- src/test/regress/expected/strings.out 14 Aug 2002 17:52:39 -0000
***************
*** 573,575 ****
--- 573,649 ----
text and varchar
(1 row)
+ --
+ -- test substr with toasted text values
+ --
+ CREATE TABLE toasttest(f1 text);
+ insert into toasttest values(repeat('1234567890',10000));
+ insert into toasttest values(repeat('1234567890',10000));
+ -- If the starting position is zero or less, then return from the start of the string
+ -- adjusting the length to be consistent with the "negative start" per SQL92.
+ SELECT substr(f1, -1, 5) from toasttest;
+ substr
+ --------
+ 123
+ 123
+ (2 rows)
+
+ -- If the length is less than zero, an ERROR is thrown.
+ SELECT substr(f1, 5, -1) from toasttest;
+ ERROR: negative substring length not allowed
+ -- If no third argument (length) is provided, the length to the end of the
+ -- string is assumed.
+ SELECT substr(f1, 99995) from toasttest;
+ substr
+ --------
+ 567890
+ 567890
+ (2 rows)
+
+ -- If start plus length is > string length, the result is truncated to
+ -- string length
+ SELECT substr(f1, 99995, 10) from toasttest;
+ substr
+ --------
+ 567890
+ 567890
+ (2 rows)
+
+ DROP TABLE toasttest;
+ --
+ -- test substr with toasted bytea values
+ --
+ CREATE TABLE toasttest(f1 bytea);
+ insert into toasttest values(decode(repeat('1234567890',10000),'escape'));
+ insert into toasttest values(decode(repeat('1234567890',10000),'escape'));
+ -- If the starting position is zero or less, then return from the start of the string
+ -- adjusting the length to be consistent with the "negative start" per SQL92.
+ SELECT substr(f1, -1, 5) from toasttest;
+ substr
+ --------
+ 123
+ 123
+ (2 rows)
+
+ -- If the length is less than zero, an ERROR is thrown.
+ SELECT substr(f1, 5, -1) from toasttest;
+ ERROR: negative substring length not allowed
+ -- If no third argument (length) is provided, the length to the end of the
+ -- string is assumed.
+ SELECT substr(f1, 99995) from toasttest;
+ substr
+ --------
+ 567890
+ 567890
+ (2 rows)
+
+ -- If start plus length is > string length, the result is truncated to
+ -- string length
+ SELECT substr(f1, 99995, 10) from toasttest;
+ substr
+ --------
+ 567890
+ 567890
+ (2 rows)
+
+ DROP TABLE toasttest;
Index: src/test/regress/sql/strings.sql
===================================================================
RCS file: /opt/src/cvs/pgsql-server/src/test/regress/sql/strings.sql,v
retrieving revision 1.8
diff -c -r1.8 strings.sql
*** src/test/regress/sql/strings.sql 11 Jun 2002 15:41:38 -0000 1.8
--- src/test/regress/sql/strings.sql 14 Aug 2002 17:42:36 -0000
***************
*** 197,199 ****
--- 197,249 ----
SELECT text 'text' || char(20) ' and characters' AS "Concat text to char";
SELECT text 'text' || varchar ' and varchar' AS "Concat text to varchar";
+
+ --
+ -- test substr with toasted text values
+ --
+ CREATE TABLE toasttest(f1 text);
+
+ insert into toasttest values(repeat('1234567890',10000));
+ insert into toasttest values(repeat('1234567890',10000));
+
+ -- If the starting position is zero or less, then return from the start of the string
+ -- adjusting the length to be consistent with the "negative start" per SQL92.
+ SELECT substr(f1, -1, 5) from toasttest;
+
+ -- If the length is less than zero, an ERROR is thrown.
+ SELECT substr(f1, 5, -1) from toasttest;
+
+ -- If no third argument (length) is provided, the length to the end of the
+ -- string is assumed.
+ SELECT substr(f1, 99995) from toasttest;
+
+ -- If start plus length is > string length, the result is truncated to
+ -- string length
+ SELECT substr(f1, 99995, 10) from toasttest;
+
+ DROP TABLE toasttest;
+
+ --
+ -- test substr with toasted bytea values
+ --
+ CREATE TABLE toasttest(f1 bytea);
+
+ insert into toasttest values(decode(repeat('1234567890',10000),'escape'));
+ insert into toasttest values(decode(repeat('1234567890',10000),'escape'));
+
+ -- If the starting position is zero or less, then return from the start of the string
+ -- adjusting the length to be consistent with the "negative start" per SQL92.
+ SELECT substr(f1, -1, 5) from toasttest;
+
+ -- If the length is less than zero, an ERROR is thrown.
+ SELECT substr(f1, 5, -1) from toasttest;
+
+ -- If no third argument (length) is provided, the length to the end of the
+ -- string is assumed.
+ SELECT substr(f1, 99995) from toasttest;
+
+ -- If start plus length is > string length, the result is truncated to
+ -- string length
+ SELECT substr(f1, 99995, 10) from toasttest;
+
+ DROP TABLE toasttest;
Joe Conway wrote:
I took Tom's advice and added wrapper functions around text_substr() and
bytea_substr() to cover the 2 argument case.I also added tests to strings.sql to cover substr() on toasted columns
of both text and bytea.
Please replace the original patch (substr.2002.08.14.1.patch) with the
attached. It includes everything from the previous one, plus newly
implemented builtin functions:
replace(string, from, to)
-- replaces all occurrences of "from" in "string" to "to"
split(string, fldsep, column)
-- splits "string" on "fldsep" and returns "column" number piece
to_hex(int32_num) & to_hex(int64_num)
-- takes integer number and returns as hex string
All previously discussed on the list; see thread at:
http://archives.postgresql.org/pgsql-hackers/2002-07/msg00511.php
Examples:
SELECT replace('yabadabadoo', 'ba', '123') AS "ya123da123doo";
ya123da123doo
---------------
ya123da123doo
(1 row)
select split('joeuser@mydatabase','@',1) AS "joeuser";
joeuser
---------
joeuser
(1 row)
select split('joeuser@mydatabase','@',2) AS "mydatabase";
mydatabase
------------
mydatabase
(1 row)
select to_hex(256::bigint*256::bigint*256::bigint*256::bigint - 1) AS
"ffffffff";
ffffffff
----------
ffffffff
(1 row)
Tests have been added to the regression suite.
Passes all regression tests. I've checked the strings.sql script in a
multibyte database and it works fine also. I'd appreciate a good look by
someone more familiar with multibyte related issues though.
If it is OK, I'd like to hold off on docs until this is committed and
after beta starts.
If there are no objections, please apply.
Thanks,
Joe
Attachments:
str_funcs.2002.08.16.1.patchtext/plain; name=str_funcs.2002.08.16.1.patchDownload
Index: src/backend/utils/adt/varlena.c
===================================================================
RCS file: /opt/src/cvs/pgsql-server/src/backend/utils/adt/varlena.c,v
retrieving revision 1.87
diff -c -r1.87 varlena.c
*** src/backend/utils/adt/varlena.c 4 Aug 2002 06:44:47 -0000 1.87
--- src/backend/utils/adt/varlena.c 16 Aug 2002 19:54:03 -0000
***************
*** 18,23 ****
--- 18,25 ----
#include "mb/pg_wchar.h"
#include "miscadmin.h"
+ #include "access/tuptoaster.h"
+ #include "lib/stringinfo.h"
#include "utils/builtins.h"
#include "utils/pg_locale.h"
***************
*** 27,34 ****
--- 29,62 ----
#define DatumGetUnknownP(X) ((unknown *) PG_DETOAST_DATUM(X))
#define PG_GETARG_UNKNOWN_P(n) DatumGetUnknownP(PG_GETARG_DATUM(n))
#define PG_RETURN_UNKNOWN_P(x) PG_RETURN_POINTER(x)
+ #define PG_TEXTARG_GET_STR(arg_) \
+ DatumGetCString(DirectFunctionCall1(textout, PG_GETARG_DATUM(arg_)))
+ #define PG_TEXT_GET_STR(textp_) \
+ DatumGetCString(DirectFunctionCall1(textout, PointerGetDatum(textp_)))
+ #define PG_STR_GET_TEXT(str_) \
+ DatumGetTextP(DirectFunctionCall1(textin, CStringGetDatum(str_)))
+ #define TEXTLEN(textp) \
+ text_length(PointerGetDatum(textp))
+ #define TEXTPOS(buf_text, from_sub_text) \
+ text_position(PointerGetDatum(buf_text), PointerGetDatum(from_sub_text), 1)
+ #define TEXTDUP(textp) \
+ DatumGetTextPCopy(PointerGetDatum(textp))
+ #define LEFT(buf_text, from_sub_text) \
+ text_substring(PointerGetDatum(buf_text), \
+ 1, \
+ TEXTPOS(buf_text, from_sub_text) - 1, false)
+ #define RIGHT(buf_text, from_sub_text, from_sub_text_len) \
+ text_substring(PointerGetDatum(buf_text), \
+ TEXTPOS(buf_text, from_sub_text) + from_sub_text_len, \
+ -1, true)
static int text_cmp(text *arg1, text *arg2);
+ static int32 text_length(Datum str);
+ static int32 text_position(Datum str, Datum search_str, int matchnum);
+ static text *text_substring(Datum str,
+ int32 start,
+ int32 length,
+ bool length_not_specified);
/*****************************************************************************
***************
*** 285,303 ****
Datum
textlen(PG_FUNCTION_ARGS)
{
! text *t = PG_GETARG_TEXT_P(0);
! #ifdef MULTIBYTE
! /* optimization for single byte encoding */
! if (pg_database_encoding_max_length() <= 1)
! PG_RETURN_INT32(VARSIZE(t) - VARHDRSZ);
!
! PG_RETURN_INT32(
! pg_mbstrlen_with_len(VARDATA(t), VARSIZE(t) - VARHDRSZ)
! );
! #else
! PG_RETURN_INT32(VARSIZE(t) - VARHDRSZ);
! #endif
}
/*
--- 313,348 ----
Datum
textlen(PG_FUNCTION_ARGS)
{
! PG_RETURN_INT32(text_length(PG_GETARG_DATUM(0)));
! }
! /*
! * text_length -
! * Does the real work for textlen()
! * This is broken out so it can be called directly by other string processing
! * functions.
! */
! static int32
! text_length(Datum str)
! {
! /* fastpath when max encoding length is one */
! if (pg_database_encoding_max_length() == 1)
! PG_RETURN_INT32(toast_raw_datum_size(str) - VARHDRSZ);
!
! if (pg_database_encoding_max_length() > 1)
! {
! text *t = DatumGetTextP(str);
!
! PG_RETURN_INT32(pg_mbstrlen_with_len(VARDATA(t),
! VARSIZE(t) - VARHDRSZ));
! }
!
! /* should never get here */
! elog(ERROR, "Invalid backend encoding; encoding max length "
! "is less than one.");
!
! /* not reached: suppress compiler warning */
! return 0;
}
/*
***************
*** 308,316 ****
Datum
textoctetlen(PG_FUNCTION_ARGS)
{
! text *arg = PG_GETARG_TEXT_P(0);
!
! PG_RETURN_INT32(VARSIZE(arg) - VARHDRSZ);
}
/*
--- 353,359 ----
Datum
textoctetlen(PG_FUNCTION_ARGS)
{
! PG_RETURN_INT32(toast_raw_datum_size(PG_GETARG_DATUM(0)) - VARHDRSZ);
}
/*
***************
*** 382,471 ****
* - Thomas Lockhart 1998-12-10
* Now uses faster TOAST-slicing interface
* - John Gray 2002-02-22
*/
Datum
text_substr(PG_FUNCTION_ARGS)
{
! text *string;
! int32 m = PG_GETARG_INT32(1);
! int32 n = PG_GETARG_INT32(2);
! int32 sm;
! int32 sn;
! int eml = 1;
! #ifdef MULTIBYTE
! int i;
! int len;
! text *ret;
! char *p;
! #endif
! /*
! * starting position before the start of the string? then offset into
! * the string per SQL92 spec...
! */
! if (m < 1)
{
! n += (m - 1);
! m = 1;
! }
! /* Check for m > octet length is made in TOAST access routine */
! /* m will now become a zero-based starting position */
! sm = m - 1;
! sn = n;
! #ifdef MULTIBYTE
! eml = pg_database_encoding_max_length ();
! if (eml > 1)
{
! sm = 0;
! if (n > -1)
! sn = (m + n) * eml + 3; /* +3 to avoid mb characters overhanging slice end */
else
! sn = n; /* n < 0 is special-cased by heap_tuple_untoast_attr_slice */
! }
! #endif
! string = PG_GETARG_TEXT_P_SLICE (0, sm, sn);
! if (eml == 1)
! {
! PG_RETURN_TEXT_P (string);
! }
! #ifndef MULTIBYTE
! PG_RETURN_NULL(); /* notreached: suppress compiler warning */
! #endif
! #ifdef MULTIBYTE
! if (n > -1)
! len = pg_mbstrlen_with_len (VARDATA (string), sn - 3);
! else /* n < 0 is special-cased; need full string length */
! len = pg_mbstrlen_with_len (VARDATA (string), VARSIZE(string)-VARHDRSZ);
!
! if (m > len)
! {
! m = 1;
! n = 0;
! }
! m--;
! if (((m + n) > len) || (n < 0))
! n = (len - m);
!
! p = VARDATA(string);
! for (i = 0; i < m; i++)
! p += pg_mblen(p);
! m = p - VARDATA(string);
! for (i = 0; i < n; i++)
! p += pg_mblen(p);
! n = p - (VARDATA(string) + m);
! ret = (text *) palloc(VARHDRSZ + n);
! VARATT_SIZEP(ret) = VARHDRSZ + n;
! memcpy(VARDATA(ret), VARDATA(string) + m, n);
! PG_RETURN_TEXT_P(ret);
! #endif
}
/*
--- 425,625 ----
* - Thomas Lockhart 1998-12-10
* Now uses faster TOAST-slicing interface
* - John Gray 2002-02-22
+ * Remove "#ifdef MULTIBYTE" and test for encoding_max_length instead. Change
+ * behaviors conflicting with SQL92 to meet SQL92 (if E = S + L < S throw
+ * error; if E < 1, return '', not entire string). Fixed MB related bug when
+ * S > LC and < LC + 4 sometimes garbage characters are returned.
+ * - Joe Conway 2002-08-10
*/
Datum
text_substr(PG_FUNCTION_ARGS)
{
! PG_RETURN_TEXT_P(text_substring(PG_GETARG_DATUM(0),
! PG_GETARG_INT32(1),
! PG_GETARG_INT32(2),
! false));
! }
! /*
! * text_substr_no_len -
! * Wrapper to avoid opr_sanity failure due to
! * one function accepting a different number of args.
! */
! Datum
! text_substr_no_len(PG_FUNCTION_ARGS)
! {
! PG_RETURN_TEXT_P(text_substring(PG_GETARG_DATUM(0),
! PG_GETARG_INT32(1),
! -1, true));
! }
!
! /*
! * text_substring -
! * Does the real work for text_substr() and text_substr_no_len()
! * This is broken out so it can be called directly by other string processing
! * functions.
! */
! static text*
! text_substring(Datum str, int32 start, int32 length, bool length_not_specified)
! {
! int32 eml = pg_database_encoding_max_length();
! int32 S = start; /* start position */
! int32 S1; /* adjusted start position */
! int32 L1; /* adjusted substring length */
!
! /* life is easy if the encoding max length is 1 */
! if (eml == 1)
{
! S1 = Max(S, 1);
! if (length_not_specified) /* special case - get length to end of string */
! L1 = -1;
! else
! {
! /* end position */
! int E = S + length;
! /*
! * A negative value for L is the only way for the end position
! * to be before the start. SQL99 says to throw an error.
! */
! if (E < S)
! elog(ERROR, "negative substring length not allowed");
! /*
! * A zero or negative value for the end position can happen if the start
! * was negative or one. SQL99 says to return a zero-length string.
! */
! if (E < 1)
! return PG_STR_GET_TEXT("");
!
! L1 = E - S1;
! }
!
! /*
! * If the start position is past the end of the string,
! * SQL99 says to return a zero-length string --
! * PG_GETARG_TEXT_P_SLICE() will do that for us.
! * Convert to zero-based starting position
! */
! return DatumGetTextPSlice(str, S1 - 1, L1);
! }
! else if (eml > 1)
{
! /*
! * When encoding max length is > 1, we can't get LC without
! * detoasting, so we'll grab a conservatively large slice
! * now and go back later to do the right thing
! */
! int32 slice_start;
! int32 slice_size;
! int32 slice_strlen;
! text *slice;
! int32 E1;
! int32 i;
! char *p;
! char *s;
! text *ret;
!
! /*
! * if S is past the end of the string, the tuple toaster
! * will return a zero-length string to us
! */
! S1 = Max(S, 1);
!
! /*
! * We need to start at position zero because there is no
! * way to know in advance which byte offset corresponds to
! * the supplied start position.
! */
! slice_start = 0;
!
! if (length_not_specified) /* special case - get length to end of string */
! slice_size = L1 = -1;
else
! {
! int E = S + length;
!
! /*
! * A negative value for L is the only way for the end position
! * to be before the start. SQL99 says to throw an error.
! */
! if (E < S)
! elog(ERROR, "negative substring length not allowed");
! /*
! * A zero or negative value for the end position can happen if the start
! * was negative or one. SQL99 says to return a zero-length string.
! */
! if (E < 1)
! return PG_STR_GET_TEXT("");
! /*
! * if E is past the end of the string, the tuple toaster
! * will truncate the length for us
! */
! L1 = E - S1;
!
! /*
! * Total slice size in bytes can't be any longer than the start
! * position plus substring length times the encoding max length.
! */
! slice_size = (S1 + L1) * eml;
! }
! slice = DatumGetTextPSlice(str, slice_start, slice_size);
! /* see if we got back an empty string */
! if ((VARSIZE(slice) - VARHDRSZ) == 0)
! return PG_STR_GET_TEXT("");
! /* Now we can get the actual length of the slice in MB characters */
! slice_strlen = pg_mbstrlen_with_len (VARDATA(slice), VARSIZE(slice) - VARHDRSZ);
! /* Check that the start position wasn't > slice_strlen. If so,
! * SQL99 says to return a zero-length string.
! */
! if (S1 > slice_strlen)
! return PG_STR_GET_TEXT("");
!
! /*
! * Adjust L1 and E1 now that we know the slice string length.
! * Again remember that S1 is one based, and slice_start is zero based.
! */
! if (L1 > -1)
! E1 = Min(S1 + L1 , slice_start + 1 + slice_strlen);
! else
! E1 = slice_start + 1 + slice_strlen;
!
! /*
! * Find the start position in the slice;
! * remember S1 is not zero based
! */
! p = VARDATA(slice);
! for (i = 0; i < S1 - 1; i++)
! p += pg_mblen(p);
!
! /* hang onto a pointer to our start position */
! s = p;
!
! /*
! * Count the actual bytes used by the substring of
! * the requested length.
! */
! for (i = S1; i < E1; i++)
! p += pg_mblen(p);
!
! ret = (text *) palloc(VARHDRSZ + (p - s));
! VARATT_SIZEP(ret) = VARHDRSZ + (p - s);
! memcpy(VARDATA(ret), s, (p - s));
!
! return ret;
! }
! else
! elog(ERROR, "Invalid backend encoding; encoding max length "
! "is less than one.");
!
! /* not reached: suppress compiler warning */
! return PG_STR_GET_TEXT("");
}
/*
***************
*** 481,536 ****
Datum
textpos(PG_FUNCTION_ARGS)
{
! text *t1 = PG_GETARG_TEXT_P(0);
! text *t2 = PG_GETARG_TEXT_P(1);
! int pos;
! int px,
! p;
! int len1,
len2;
- pg_wchar *p1,
- *p2;
! #ifdef MULTIBYTE
! pg_wchar *ps1,
! *ps2;
! #endif
if (VARSIZE(t2) <= VARHDRSZ)
PG_RETURN_INT32(1); /* result for empty pattern */
len1 = (VARSIZE(t1) - VARHDRSZ);
len2 = (VARSIZE(t2) - VARHDRSZ);
! #ifdef MULTIBYTE
! ps1 = p1 = (pg_wchar *) palloc((len1 + 1) * sizeof(pg_wchar));
! (void) pg_mb2wchar_with_len((unsigned char *) VARDATA(t1), p1, len1);
! len1 = pg_wchar_strlen(p1);
! ps2 = p2 = (pg_wchar *) palloc((len2 + 1) * sizeof(pg_wchar));
! (void) pg_mb2wchar_with_len((unsigned char *) VARDATA(t2), p2, len2);
! len2 = pg_wchar_strlen(p2);
! #else
! p1 = VARDATA(t1);
! p2 = VARDATA(t2);
! #endif
! pos = 0;
px = (len1 - len2);
! for (p = 0; p <= px; p++)
{
! #ifdef MULTIBYTE
! if ((*p2 == *p1) && (pg_wchar_strncmp(p1, p2, len2) == 0))
! #else
! if ((*p2 == *p1) && (strncmp(p1, p2, len2) == 0))
! #endif
{
! pos = p + 1;
! break;
! };
! p1++;
! };
! #ifdef MULTIBYTE
! pfree(ps1);
! pfree(ps2);
! #endif
PG_RETURN_INT32(pos);
}
--- 635,729 ----
Datum
textpos(PG_FUNCTION_ARGS)
{
! PG_RETURN_INT32(text_position(PG_GETARG_DATUM(0), PG_GETARG_DATUM(1), 1));
! }
!
! /*
! * text_position -
! * Does the real work for textpos()
! * This is broken out so it can be called directly by other string processing
! * functions.
! */
! static int32
! text_position(Datum str, Datum search_str, int matchnum)
! {
! int eml = pg_database_encoding_max_length();
! text *t1 = DatumGetTextP(str);
! text *t2 = DatumGetTextP(search_str);
! int match = 0,
! pos = 0,
! p = 0,
! px,
! len1,
len2;
! if(matchnum == 0)
! return 0; /* result for 0th match */
if (VARSIZE(t2) <= VARHDRSZ)
PG_RETURN_INT32(1); /* result for empty pattern */
len1 = (VARSIZE(t1) - VARHDRSZ);
len2 = (VARSIZE(t2) - VARHDRSZ);
!
! /* no use in searching str past point where search_str will fit */
px = (len1 - len2);
!
! if (eml == 1) /* simple case - single byte encoding */
{
! char *p1,
! *p2;
!
! p1 = VARDATA(t1);
! p2 = VARDATA(t2);
!
! for (p = 0; p <= px; p++)
{
! if ((*p2 == *p1) && (strncmp(p1, p2, len2) == 0))
! {
! if (++match == matchnum)
! {
! pos = p + 1;
! break;
! }
! }
! p1++;
! }
! }
! else if (eml > 1) /* not as simple - multibyte encoding */
! {
! pg_wchar *p1,
! *p2,
! *ps1,
! *ps2;
!
! ps1 = p1 = (pg_wchar *) palloc((len1 + 1) * sizeof(pg_wchar));
! (void) pg_mb2wchar_with_len((unsigned char *) VARDATA(t1), p1, len1);
! len1 = pg_wchar_strlen(p1);
! ps2 = p2 = (pg_wchar *) palloc((len2 + 1) * sizeof(pg_wchar));
! (void) pg_mb2wchar_with_len((unsigned char *) VARDATA(t2), p2, len2);
! len2 = pg_wchar_strlen(p2);
!
! for (p = 0; p <= px; p++)
! {
! if ((*p2 == *p1) && (pg_wchar_strncmp(p1, p2, len2) == 0))
! {
! if (++match == matchnum)
! {
! pos = p + 1;
! break;
! }
! }
! p1++;
! }
!
! pfree(ps1);
! pfree(ps2);
! }
! else
! elog(ERROR, "Invalid backend encoding; encoding max length "
! "is less than one.");
!
PG_RETURN_INT32(pos);
}
***************
*** 758,766 ****
Datum
byteaoctetlen(PG_FUNCTION_ARGS)
{
! bytea *v = PG_GETARG_BYTEA_P(0);
!
! PG_RETURN_INT32(VARSIZE(v) - VARHDRSZ);
}
/*
--- 951,957 ----
Datum
byteaoctetlen(PG_FUNCTION_ARGS)
{
! PG_RETURN_INT32(toast_raw_datum_size(PG_GETARG_DATUM(0)) - VARHDRSZ);
}
/*
***************
*** 805,810 ****
--- 996,1003 ----
PG_RETURN_BYTEA_P(result);
}
+ #define PG_STR_GET_BYTEA(str_) \
+ DatumGetByteaP(DirectFunctionCall1(byteain, CStringGetDatum(str_)))
/*
* bytea_substr()
* Return a substring starting at the specified position.
***************
*** 813,845 ****
* Input:
* - string
* - starting position (is one-based)
! * - string length
*
* If the starting position is zero or less, then return from the start of the string
* adjusting the length to be consistent with the "negative start" per SQL92.
! * If the length is less than zero, return the remaining string.
! *
*/
Datum
bytea_substr(PG_FUNCTION_ARGS)
{
! int32 m = PG_GETARG_INT32(1);
! int32 n = PG_GETARG_INT32(2);
! /*
! * starting position before the start of the string? then offset into
! * the string per SQL92 spec...
! */
! if (m < 1)
{
! n += (m - 1);
! m = 1;
}
! /* m will now become a zero-based starting position */
! m--;
! PG_RETURN_BYTEA_P(PG_GETARG_BYTEA_P_SLICE (0, m, n));
}
/*
--- 1006,1076 ----
* Input:
* - string
* - starting position (is one-based)
! * - string length (optional)
*
* If the starting position is zero or less, then return from the start of the string
* adjusting the length to be consistent with the "negative start" per SQL92.
! * If the length is less than zero, an ERROR is thrown. If no third argument
! * (length) is provided, the length to the end of the string is assumed.
*/
Datum
bytea_substr(PG_FUNCTION_ARGS)
{
! int S = PG_GETARG_INT32(1); /* start position */
! int S1; /* adjusted start position */
! int L1; /* adjusted substring length */
! S1 = Max(S, 1);
!
! if (fcinfo->nargs == 2)
! {
! /*
! * Not passed a length - PG_GETARG_BYTEA_P_SLICE()
! * grabs everything to the end of the string if we pass it
! * a negative value for length.
! */
! L1 = -1;
! }
! else
{
! /* end position */
! int E = S + PG_GETARG_INT32(2);
!
! /*
! * A negative value for L is the only way for the end position
! * to be before the start. SQL99 says to throw an error.
! */
! if (E < S)
! elog(ERROR, "negative substring length not allowed");
!
! /*
! * A zero or negative value for the end position can happen if the start
! * was negative or one. SQL99 says to return a zero-length string.
! */
! if (E < 1)
! PG_RETURN_BYTEA_P(PG_STR_GET_BYTEA(""));
!
! L1 = E - S1;
}
! /*
! * If the start position is past the end of the string,
! * SQL99 says to return a zero-length string --
! * PG_GETARG_TEXT_P_SLICE() will do that for us.
! * Convert to zero-based starting position
! */
! PG_RETURN_BYTEA_P(PG_GETARG_BYTEA_P_SLICE (0, S1 - 1, L1));
! }
! /*
! * bytea_substr_no_len -
! * Wrapper to avoid opr_sanity failure due to
! * one function accepting a different number of args.
! */
! Datum
! bytea_substr_no_len(PG_FUNCTION_ARGS)
! {
! return bytea_substr(fcinfo);
}
/*
***************
*** 1422,1424 ****
--- 1653,1834 ----
PG_RETURN_INT32(cmp);
}
+
+ /*
+ * replace_text
+ * replace all occurences of 'old_sub_str' in 'orig_str'
+ * with 'new_sub_str' to form 'new_str'
+ *
+ * returns 'orig_str' if 'old_sub_str' == '' or 'orig_str' == ''
+ * otherwise returns 'new_str'
+ */
+ Datum
+ replace_text(PG_FUNCTION_ARGS)
+ {
+ text *left_text;
+ text *right_text;
+ text *buf_text;
+ text *ret_text;
+ int curr_posn;
+ text *src_text = PG_GETARG_TEXT_P(0);
+ int src_text_len = TEXTLEN(src_text);
+ text *from_sub_text = PG_GETARG_TEXT_P(1);
+ int from_sub_text_len = TEXTLEN(from_sub_text);
+ text *to_sub_text = PG_GETARG_TEXT_P(2);
+ char *to_sub_str = PG_TEXT_GET_STR(to_sub_text);
+ StringInfo str = makeStringInfo();
+
+ if (src_text_len == 0 || from_sub_text_len == 0)
+ PG_RETURN_TEXT_P(src_text);
+
+ buf_text = TEXTDUP(src_text);
+ curr_posn = TEXTPOS(buf_text, from_sub_text);
+
+ while (curr_posn > 0)
+ {
+ left_text = LEFT(buf_text, from_sub_text);
+ right_text = RIGHT(buf_text, from_sub_text, from_sub_text_len);
+
+ appendStringInfo(str, PG_TEXT_GET_STR(left_text));
+ appendStringInfo(str, to_sub_str);
+
+ pfree(buf_text);
+ pfree(left_text);
+ buf_text = right_text;
+ curr_posn = TEXTPOS(buf_text, from_sub_text);
+ }
+
+ appendStringInfo(str, PG_TEXT_GET_STR(buf_text));
+ pfree(buf_text);
+
+ ret_text = PG_STR_GET_TEXT(str->data);
+ pfree(str->data);
+ pfree(str);
+
+ PG_RETURN_TEXT_P(ret_text);
+ }
+
+ /*
+ * split_text
+ * parse input string
+ * return ord item (1 based)
+ * based on provided field separator
+ */
+ Datum
+ split_text(PG_FUNCTION_ARGS)
+ {
+ text *inputstring = PG_GETARG_TEXT_P(0);
+ int inputstring_len = TEXTLEN(inputstring);
+ text *fldsep = PG_GETARG_TEXT_P(1);
+ int fldsep_len = TEXTLEN(fldsep);
+ int fldnum = PG_GETARG_INT32(2);
+ int start_posn = 0;
+ int end_posn = 0;
+ text *result_text;
+
+ /* return empty string for empty input string */
+ if (inputstring_len < 1)
+ PG_RETURN_TEXT_P(PG_STR_GET_TEXT(""));
+
+ /* empty field separator */
+ if (fldsep_len < 1)
+ {
+ if (fldnum == 1) /* first field - just return the input string */
+ PG_RETURN_TEXT_P(inputstring);
+ else /* otherwise return an empty string */
+ PG_RETURN_TEXT_P(PG_STR_GET_TEXT(""));
+ }
+
+ /* field number is 1 based */
+ if (fldnum < 1)
+ elog(ERROR, "field position must be > 0");
+
+ start_posn = text_position(PointerGetDatum(inputstring),
+ PointerGetDatum(fldsep),
+ fldnum - 1);
+ end_posn = text_position(PointerGetDatum(inputstring),
+ PointerGetDatum(fldsep),
+ fldnum);
+
+ if ((start_posn == 0) && (end_posn == 0)) /* fldsep not found */
+ {
+ if (fldnum == 1) /* first field - just return the input string */
+ PG_RETURN_TEXT_P(inputstring);
+ else /* otherwise return an empty string */
+ PG_RETURN_TEXT_P(PG_STR_GET_TEXT(""));
+ }
+ else if ((start_posn != 0) && (end_posn == 0))
+ {
+ /* last field requested */
+ result_text = text_substring(PointerGetDatum(inputstring), start_posn + fldsep_len, -1, true);
+ PG_RETURN_TEXT_P(result_text);
+ }
+ else if ((start_posn == 0) && (end_posn != 0))
+ {
+ /* first field requested */
+ result_text = LEFT(inputstring, fldsep);
+ PG_RETURN_TEXT_P(result_text);
+ }
+ else
+ {
+ /* prior to last field requested */
+ result_text = text_substring(PointerGetDatum(inputstring), start_posn + fldsep_len, end_posn - start_posn - fldsep_len, false);
+ PG_RETURN_TEXT_P(result_text);
+ }
+ }
+
+ #define HEXBASE 16
+ /*
+ * Convert a int32 to a string containing a base 16 (hex) representation of
+ * the number.
+ */
+ Datum
+ to_hex32(PG_FUNCTION_ARGS)
+ {
+ static char digits[] = "0123456789abcdef";
+ char buf[32]; /* bigger than needed, but reasonable */
+ char *ptr,
+ *end;
+ text *result_text;
+ int32 value = PG_GETARG_INT32(0);
+
+ end = ptr = buf + sizeof(buf) - 1;
+ *ptr = '\0';
+
+ do
+ {
+ *--ptr = digits[value % HEXBASE];
+ value /= HEXBASE;
+ } while (ptr > buf && value);
+
+ result_text = PG_STR_GET_TEXT(ptr);
+ PG_RETURN_TEXT_P(result_text);
+ }
+
+ /*
+ * Convert a int64 to a string containing a base 16 (hex) representation of
+ * the number.
+ */
+ Datum
+ to_hex64(PG_FUNCTION_ARGS)
+ {
+ static char digits[] = "0123456789abcdef";
+ char buf[32]; /* bigger than needed, but reasonable */
+ char *ptr,
+ *end;
+ text *result_text;
+ int64 value = PG_GETARG_INT64(0);
+
+ end = ptr = buf + sizeof(buf) - 1;
+ *ptr = '\0';
+
+ do
+ {
+ *--ptr = digits[value % HEXBASE];
+ value /= HEXBASE;
+ } while (ptr > buf && value);
+
+ result_text = PG_STR_GET_TEXT(ptr);
+ PG_RETURN_TEXT_P(result_text);
+ }
+
Index: src/include/catalog/pg_proc.h
===================================================================
RCS file: /opt/src/cvs/pgsql-server/src/include/catalog/pg_proc.h,v
retrieving revision 1.254
diff -c -r1.254 pg_proc.h
*** src/include/catalog/pg_proc.h 15 Aug 2002 02:51:27 -0000 1.254
--- src/include/catalog/pg_proc.h 16 Aug 2002 18:53:13 -0000
***************
*** 2121,2127 ****
DESCR("remove initial characters from string");
DATA(insert OID = 882 ( rtrim PGNSP PGUID 14 f f t f i 1 25 "25" "select rtrim($1, \' \')" - _null_ ));
DESCR("remove trailing characters from string");
! DATA(insert OID = 883 ( substr PGNSP PGUID 14 f f t f i 2 25 "25 23" "select substr($1, $2, -1)" - _null_ ));
DESCR("return portion of string");
DATA(insert OID = 884 ( btrim PGNSP PGUID 12 f f t f i 2 25 "25 25" btrim - _null_ ));
DESCR("trim both ends of string");
--- 2121,2127 ----
DESCR("remove initial characters from string");
DATA(insert OID = 882 ( rtrim PGNSP PGUID 14 f f t f i 1 25 "25" "select rtrim($1, \' \')" - _null_ ));
DESCR("remove trailing characters from string");
! DATA(insert OID = 883 ( substr PGNSP PGUID 12 f f t f i 2 25 "25 23" text_substr_no_len - _null_ ));
DESCR("return portion of string");
DATA(insert OID = 884 ( btrim PGNSP PGUID 12 f f t f i 2 25 "25 25" btrim - _null_ ));
DESCR("trim both ends of string");
***************
*** 2130,2137 ****
DATA(insert OID = 936 ( substring PGNSP PGUID 12 f f t f i 3 25 "25 23 23" text_substr - _null_ ));
DESCR("return portion of string");
! DATA(insert OID = 937 ( substring PGNSP PGUID 14 f f t f i 2 25 "25 23" "select substring($1, $2, -1)" - _null_ ));
DESCR("return portion of string");
/* for multi-byte support */
--- 2130,2145 ----
DATA(insert OID = 936 ( substring PGNSP PGUID 12 f f t f i 3 25 "25 23 23" text_substr - _null_ ));
DESCR("return portion of string");
! DATA(insert OID = 937 ( substring PGNSP PGUID 12 f f t f i 2 25 "25 23" text_substr_no_len - _null_ ));
DESCR("return portion of string");
+ DATA(insert OID = 2087 ( replace PGNSP PGUID 12 f f t f i 3 25 "25 25 25" replace_text - _null_ ));
+ DESCR("replace all occurrences of old_substr with new_substr in string");
+ DATA(insert OID = 2088 ( split PGNSP PGUID 12 f f t f i 3 25 "25 25 23" split_text - _null_ ));
+ DESCR("split string by field_sep and return field_num");
+ DATA(insert OID = 2089 ( to_hex PGNSP PGUID 12 f f t f i 1 25 "23" to_hex32 - _null_ ));
+ DESCR("convert int32 number to hex");
+ DATA(insert OID = 2090 ( to_hex PGNSP PGUID 12 f f t f i 1 25 "20" to_hex64 - _null_ ));
+ DESCR("convert int64 number to hex");
/* for multi-byte support */
***************
*** 2778,2784 ****
DESCR("concatenate");
DATA(insert OID = 2012 ( substring PGNSP PGUID 12 f f t f i 3 17 "17 23 23" bytea_substr - _null_ ));
DESCR("return portion of string");
! DATA(insert OID = 2013 ( substring PGNSP PGUID 14 f f t f i 2 17 "17 23" "select substring($1, $2, -1)" - _null_ ));
DESCR("return portion of string");
DATA(insert OID = 2014 ( position PGNSP PGUID 12 f f t f i 2 23 "17 17" byteapos - _null_ ));
DESCR("return position of substring");
--- 2786,2796 ----
DESCR("concatenate");
DATA(insert OID = 2012 ( substring PGNSP PGUID 12 f f t f i 3 17 "17 23 23" bytea_substr - _null_ ));
DESCR("return portion of string");
! DATA(insert OID = 2013 ( substring PGNSP PGUID 12 f f t f i 2 17 "17 23" bytea_substr_no_len - _null_ ));
! DESCR("return portion of string");
! DATA(insert OID = 2085 ( substr PGNSP PGUID 12 f f t f i 3 17 "17 23 23" bytea_substr - _null_ ));
! DESCR("return portion of string");
! DATA(insert OID = 2086 ( substr PGNSP PGUID 12 f f t f i 2 17 "17 23" bytea_substr_no_len - _null_ ));
DESCR("return portion of string");
DATA(insert OID = 2014 ( position PGNSP PGUID 12 f f t f i 2 23 "17 17" byteapos - _null_ ));
DESCR("return position of substring");
Index: src/include/utils/builtins.h
===================================================================
RCS file: /opt/src/cvs/pgsql-server/src/include/utils/builtins.h,v
retrieving revision 1.191
diff -c -r1.191 builtins.h
*** src/include/utils/builtins.h 15 Aug 2002 02:51:27 -0000 1.191
--- src/include/utils/builtins.h 16 Aug 2002 18:53:13 -0000
***************
*** 447,458 ****
--- 447,463 ----
extern Datum textoctetlen(PG_FUNCTION_ARGS);
extern Datum textpos(PG_FUNCTION_ARGS);
extern Datum text_substr(PG_FUNCTION_ARGS);
+ extern Datum text_substr_no_len(PG_FUNCTION_ARGS);
extern Datum name_text(PG_FUNCTION_ARGS);
extern Datum text_name(PG_FUNCTION_ARGS);
extern int varstr_cmp(char *arg1, int len1, char *arg2, int len2);
extern List *textToQualifiedNameList(text *textval, const char *caller);
extern bool SplitIdentifierString(char *rawstring, char separator,
List **namelist);
+ extern Datum replace_text(PG_FUNCTION_ARGS);
+ extern Datum split_text(PG_FUNCTION_ARGS);
+ extern Datum to_hex32(PG_FUNCTION_ARGS);
+ extern Datum to_hex64(PG_FUNCTION_ARGS);
extern Datum unknownin(PG_FUNCTION_ARGS);
extern Datum unknownout(PG_FUNCTION_ARGS);
***************
*** 476,481 ****
--- 481,487 ----
extern Datum byteacat(PG_FUNCTION_ARGS);
extern Datum byteapos(PG_FUNCTION_ARGS);
extern Datum bytea_substr(PG_FUNCTION_ARGS);
+ extern Datum bytea_substr_no_len(PG_FUNCTION_ARGS);
/* version.c */
extern Datum pgsql_version(PG_FUNCTION_ARGS);
Index: src/test/regress/expected/strings.out
===================================================================
RCS file: /opt/src/cvs/pgsql-server/src/test/regress/expected/strings.out,v
retrieving revision 1.12
diff -c -r1.12 strings.out
*** src/test/regress/expected/strings.out 11 Jun 2002 15:41:38 -0000 1.12
--- src/test/regress/expected/strings.out 16 Aug 2002 18:53:13 -0000
***************
*** 573,575 ****
--- 573,738 ----
text and varchar
(1 row)
+ --
+ -- test substr with toasted text values
+ --
+ CREATE TABLE toasttest(f1 text);
+ insert into toasttest values(repeat('1234567890',10000));
+ insert into toasttest values(repeat('1234567890',10000));
+ -- If the starting position is zero or less, then return from the start of the string
+ -- adjusting the length to be consistent with the "negative start" per SQL92.
+ SELECT substr(f1, -1, 5) from toasttest;
+ substr
+ --------
+ 123
+ 123
+ (2 rows)
+
+ -- If the length is less than zero, an ERROR is thrown.
+ SELECT substr(f1, 5, -1) from toasttest;
+ ERROR: negative substring length not allowed
+ -- If no third argument (length) is provided, the length to the end of the
+ -- string is assumed.
+ SELECT substr(f1, 99995) from toasttest;
+ substr
+ --------
+ 567890
+ 567890
+ (2 rows)
+
+ -- If start plus length is > string length, the result is truncated to
+ -- string length
+ SELECT substr(f1, 99995, 10) from toasttest;
+ substr
+ --------
+ 567890
+ 567890
+ (2 rows)
+
+ DROP TABLE toasttest;
+ --
+ -- test substr with toasted bytea values
+ --
+ CREATE TABLE toasttest(f1 bytea);
+ insert into toasttest values(decode(repeat('1234567890',10000),'escape'));
+ insert into toasttest values(decode(repeat('1234567890',10000),'escape'));
+ -- If the starting position is zero or less, then return from the start of the string
+ -- adjusting the length to be consistent with the "negative start" per SQL92.
+ SELECT substr(f1, -1, 5) from toasttest;
+ substr
+ --------
+ 123
+ 123
+ (2 rows)
+
+ -- If the length is less than zero, an ERROR is thrown.
+ SELECT substr(f1, 5, -1) from toasttest;
+ ERROR: negative substring length not allowed
+ -- If no third argument (length) is provided, the length to the end of the
+ -- string is assumed.
+ SELECT substr(f1, 99995) from toasttest;
+ substr
+ --------
+ 567890
+ 567890
+ (2 rows)
+
+ -- If start plus length is > string length, the result is truncated to
+ -- string length
+ SELECT substr(f1, 99995, 10) from toasttest;
+ substr
+ --------
+ 567890
+ 567890
+ (2 rows)
+
+ DROP TABLE toasttest;
+ --
+ -- test length
+ --
+ SELECT length('abcdef') AS "length_6";
+ length_6
+ ----------
+ 6
+ (1 row)
+
+ --
+ -- test strpos
+ --
+ SELECT strpos('abcdef', 'cd') AS "pos_3";
+ pos_3
+ -------
+ 3
+ (1 row)
+
+ SELECT strpos('abcdef', 'xy') AS "pos_0";
+ pos_0
+ -------
+ 0
+ (1 row)
+
+ --
+ -- test replace
+ --
+ SELECT replace('abcdef', 'de', '45') AS "abc45f";
+ abc45f
+ --------
+ abc45f
+ (1 row)
+
+ SELECT replace('yabadabadoo', 'ba', '123') AS "ya123da123doo";
+ ya123da123doo
+ ---------------
+ ya123da123doo
+ (1 row)
+
+ SELECT replace('yabadoo', 'bad', '') AS "yaoo";
+ yaoo
+ ------
+ yaoo
+ (1 row)
+
+ --
+ -- test split
+ --
+ select split('joeuser@mydatabase','@',0) AS "an error";
+ ERROR: field position must be > 0
+ select split('joeuser@mydatabase','@',1) AS "joeuser";
+ joeuser
+ ---------
+ joeuser
+ (1 row)
+
+ select split('joeuser@mydatabase','@',2) AS "mydatabase";
+ mydatabase
+ ------------
+ mydatabase
+ (1 row)
+
+ select split('joeuser@mydatabase','@',3) AS "empty string";
+ empty string
+ --------------
+
+ (1 row)
+
+ select split('@joeuser@mydatabase@','@',2) AS "joeuser";
+ joeuser
+ ---------
+ joeuser
+ (1 row)
+
+ --
+ -- test to_hex
+ --
+ select to_hex(256*256*256 - 1) AS "ffffff";
+ ffffff
+ --------
+ ffffff
+ (1 row)
+
+ select to_hex(256::bigint*256::bigint*256::bigint*256::bigint - 1) AS "ffffffff";
+ ffffffff
+ ----------
+ ffffffff
+ (1 row)
+
Index: src/test/regress/sql/strings.sql
===================================================================
RCS file: /opt/src/cvs/pgsql-server/src/test/regress/sql/strings.sql,v
retrieving revision 1.8
diff -c -r1.8 strings.sql
*** src/test/regress/sql/strings.sql 11 Jun 2002 15:41:38 -0000 1.8
--- src/test/regress/sql/strings.sql 16 Aug 2002 18:53:13 -0000
***************
*** 197,199 ****
--- 197,292 ----
SELECT text 'text' || char(20) ' and characters' AS "Concat text to char";
SELECT text 'text' || varchar ' and varchar' AS "Concat text to varchar";
+
+ --
+ -- test substr with toasted text values
+ --
+ CREATE TABLE toasttest(f1 text);
+
+ insert into toasttest values(repeat('1234567890',10000));
+ insert into toasttest values(repeat('1234567890',10000));
+
+ -- If the starting position is zero or less, then return from the start of the string
+ -- adjusting the length to be consistent with the "negative start" per SQL92.
+ SELECT substr(f1, -1, 5) from toasttest;
+
+ -- If the length is less than zero, an ERROR is thrown.
+ SELECT substr(f1, 5, -1) from toasttest;
+
+ -- If no third argument (length) is provided, the length to the end of the
+ -- string is assumed.
+ SELECT substr(f1, 99995) from toasttest;
+
+ -- If start plus length is > string length, the result is truncated to
+ -- string length
+ SELECT substr(f1, 99995, 10) from toasttest;
+
+ DROP TABLE toasttest;
+
+ --
+ -- test substr with toasted bytea values
+ --
+ CREATE TABLE toasttest(f1 bytea);
+
+ insert into toasttest values(decode(repeat('1234567890',10000),'escape'));
+ insert into toasttest values(decode(repeat('1234567890',10000),'escape'));
+
+ -- If the starting position is zero or less, then return from the start of the string
+ -- adjusting the length to be consistent with the "negative start" per SQL92.
+ SELECT substr(f1, -1, 5) from toasttest;
+
+ -- If the length is less than zero, an ERROR is thrown.
+ SELECT substr(f1, 5, -1) from toasttest;
+
+ -- If no third argument (length) is provided, the length to the end of the
+ -- string is assumed.
+ SELECT substr(f1, 99995) from toasttest;
+
+ -- If start plus length is > string length, the result is truncated to
+ -- string length
+ SELECT substr(f1, 99995, 10) from toasttest;
+
+ DROP TABLE toasttest;
+
+ --
+ -- test length
+ --
+
+ SELECT length('abcdef') AS "length_6";
+
+ --
+ -- test strpos
+ --
+
+ SELECT strpos('abcdef', 'cd') AS "pos_3";
+
+ SELECT strpos('abcdef', 'xy') AS "pos_0";
+
+ --
+ -- test replace
+ --
+ SELECT replace('abcdef', 'de', '45') AS "abc45f";
+
+ SELECT replace('yabadabadoo', 'ba', '123') AS "ya123da123doo";
+
+ SELECT replace('yabadoo', 'bad', '') AS "yaoo";
+
+ --
+ -- test split
+ --
+ select split('joeuser@mydatabase','@',0) AS "an error";
+
+ select split('joeuser@mydatabase','@',1) AS "joeuser";
+
+ select split('joeuser@mydatabase','@',2) AS "mydatabase";
+
+ select split('joeuser@mydatabase','@',3) AS "empty string";
+
+ select split('@joeuser@mydatabase@','@',2) AS "joeuser";
+
+ --
+ -- test to_hex
+ --
+ select to_hex(256*256*256 - 1) AS "ffffff";
+
+ select to_hex(256::bigint*256::bigint*256::bigint*256::bigint - 1) AS "ffffffff";
Your patch has been added to the PostgreSQL unapplied patches list at:
http://candle.pha.pa.us/cgi-bin/pgpatches
I will try to apply it within the next 48 hours.
---------------------------------------------------------------------------
Joe Conway wrote:
Joe Conway wrote:
I took Tom's advice and added wrapper functions around text_substr() and
bytea_substr() to cover the 2 argument case.I also added tests to strings.sql to cover substr() on toasted columns
of both text and bytea.Please replace the original patch (substr.2002.08.14.1.patch) with the
attached. It includes everything from the previous one, plus newly
implemented builtin functions:replace(string, from, to)
-- replaces all occurrences of "from" in "string" to "to"
split(string, fldsep, column)
-- splits "string" on "fldsep" and returns "column" number piece
to_hex(int32_num) & to_hex(int64_num)
-- takes integer number and returns as hex stringAll previously discussed on the list; see thread at:
http://archives.postgresql.org/pgsql-hackers/2002-07/msg00511.phpExamples:
SELECT replace('yabadabadoo', 'ba', '123') AS "ya123da123doo";
ya123da123doo
---------------
ya123da123doo
(1 row)select split('joeuser@mydatabase','@',1) AS "joeuser";
joeuser
---------
joeuser
(1 row)select split('joeuser@mydatabase','@',2) AS "mydatabase";
mydatabase
------------
mydatabase
(1 row)select to_hex(256::bigint*256::bigint*256::bigint*256::bigint - 1) AS
"ffffffff";
ffffffff
----------
ffffffff
(1 row)Tests have been added to the regression suite.
Passes all regression tests. I've checked the strings.sql script in a
multibyte database and it works fine also. I'd appreciate a good look by
someone more familiar with multibyte related issues though.If it is OK, I'd like to hold off on docs until this is committed and
after beta starts.If there are no objections, please apply.
Thanks,
Joe
Index: src/backend/utils/adt/varlena.c =================================================================== RCS file: /opt/src/cvs/pgsql-server/src/backend/utils/adt/varlena.c,v retrieving revision 1.87 diff -c -r1.87 varlena.c *** src/backend/utils/adt/varlena.c 4 Aug 2002 06:44:47 -0000 1.87 --- src/backend/utils/adt/varlena.c 16 Aug 2002 19:54:03 -0000 *************** *** 18,23 **** --- 18,25 ----#include "mb/pg_wchar.h"
#include "miscadmin.h"
+ #include "access/tuptoaster.h"
+ #include "lib/stringinfo.h"
#include "utils/builtins.h"
#include "utils/pg_locale.h"*************** *** 27,34 **** --- 29,62 ---- #define DatumGetUnknownP(X) ((unknown *) PG_DETOAST_DATUM(X)) #define PG_GETARG_UNKNOWN_P(n) DatumGetUnknownP(PG_GETARG_DATUM(n)) #define PG_RETURN_UNKNOWN_P(x) PG_RETURN_POINTER(x) + #define PG_TEXTARG_GET_STR(arg_) \ + DatumGetCString(DirectFunctionCall1(textout, PG_GETARG_DATUM(arg_))) + #define PG_TEXT_GET_STR(textp_) \ + DatumGetCString(DirectFunctionCall1(textout, PointerGetDatum(textp_))) + #define PG_STR_GET_TEXT(str_) \ + DatumGetTextP(DirectFunctionCall1(textin, CStringGetDatum(str_))) + #define TEXTLEN(textp) \ + text_length(PointerGetDatum(textp)) + #define TEXTPOS(buf_text, from_sub_text) \ + text_position(PointerGetDatum(buf_text), PointerGetDatum(from_sub_text), 1) + #define TEXTDUP(textp) \ + DatumGetTextPCopy(PointerGetDatum(textp)) + #define LEFT(buf_text, from_sub_text) \ + text_substring(PointerGetDatum(buf_text), \ + 1, \ + TEXTPOS(buf_text, from_sub_text) - 1, false) + #define RIGHT(buf_text, from_sub_text, from_sub_text_len) \ + text_substring(PointerGetDatum(buf_text), \ + TEXTPOS(buf_text, from_sub_text) + from_sub_text_len, \ + -1, true)static int text_cmp(text *arg1, text *arg2); + static int32 text_length(Datum str); + static int32 text_position(Datum str, Datum search_str, int matchnum); + static text *text_substring(Datum str, + int32 start, + int32 length, + bool length_not_specified);/*****************************************************************************
***************
*** 285,303 ****
Datum
textlen(PG_FUNCTION_ARGS)
{
! text *t = PG_GETARG_TEXT_P(0);! #ifdef MULTIBYTE
! /* optimization for single byte encoding */
! if (pg_database_encoding_max_length() <= 1)
! PG_RETURN_INT32(VARSIZE(t) - VARHDRSZ);
!
! PG_RETURN_INT32(
! pg_mbstrlen_with_len(VARDATA(t), VARSIZE(t) - VARHDRSZ)
! );
! #else
! PG_RETURN_INT32(VARSIZE(t) - VARHDRSZ);
! #endif
}/* --- 313,348 ---- Datum textlen(PG_FUNCTION_ARGS) { ! PG_RETURN_INT32(text_length(PG_GETARG_DATUM(0))); ! }! /*
! * text_length -
! * Does the real work for textlen()
! * This is broken out so it can be called directly by other string processing
! * functions.
! */
! static int32
! text_length(Datum str)
! {
! /* fastpath when max encoding length is one */
! if (pg_database_encoding_max_length() == 1)
! PG_RETURN_INT32(toast_raw_datum_size(str) - VARHDRSZ);
!
! if (pg_database_encoding_max_length() > 1)
! {
! text *t = DatumGetTextP(str);
!
! PG_RETURN_INT32(pg_mbstrlen_with_len(VARDATA(t),
! VARSIZE(t) - VARHDRSZ));
! }
!
! /* should never get here */
! elog(ERROR, "Invalid backend encoding; encoding max length "
! "is less than one.");
!
! /* not reached: suppress compiler warning */
! return 0;
}/*
***************
*** 308,316 ****
Datum
textoctetlen(PG_FUNCTION_ARGS)
{
! text *arg = PG_GETARG_TEXT_P(0);
!
! PG_RETURN_INT32(VARSIZE(arg) - VARHDRSZ);
}/* --- 353,359 ---- Datum textoctetlen(PG_FUNCTION_ARGS) { ! PG_RETURN_INT32(toast_raw_datum_size(PG_GETARG_DATUM(0)) - VARHDRSZ); }/*
***************
*** 382,471 ****
* - Thomas Lockhart 1998-12-10
* Now uses faster TOAST-slicing interface
* - John Gray 2002-02-22
*/
Datum
text_substr(PG_FUNCTION_ARGS)
{
! text *string;
! int32 m = PG_GETARG_INT32(1);
! int32 n = PG_GETARG_INT32(2);
! int32 sm;
! int32 sn;
! int eml = 1;
! #ifdef MULTIBYTE
! int i;
! int len;
! text *ret;
! char *p;
! #endif! /*
! * starting position before the start of the string? then offset into
! * the string per SQL92 spec...
! */
! if (m < 1)
{
! n += (m - 1);
! m = 1;
! }
! /* Check for m > octet length is made in TOAST access routine */! /* m will now become a zero-based starting position */
! sm = m - 1;
! sn = n;! #ifdef MULTIBYTE
! eml = pg_database_encoding_max_length ();! if (eml > 1)
{
! sm = 0;
! if (n > -1)
! sn = (m + n) * eml + 3; /* +3 to avoid mb characters overhanging slice end */
else
! sn = n; /* n < 0 is special-cased by heap_tuple_untoast_attr_slice */
! }
! #endif! string = PG_GETARG_TEXT_P_SLICE (0, sm, sn);
! if (eml == 1)
! {
! PG_RETURN_TEXT_P (string);
! }
! #ifndef MULTIBYTE
! PG_RETURN_NULL(); /* notreached: suppress compiler warning */
! #endif
! #ifdef MULTIBYTE
! if (n > -1)
! len = pg_mbstrlen_with_len (VARDATA (string), sn - 3);
! else /* n < 0 is special-cased; need full string length */
! len = pg_mbstrlen_with_len (VARDATA (string), VARSIZE(string)-VARHDRSZ);
!
! if (m > len)
! {
! m = 1;
! n = 0;
! }
! m--;
! if (((m + n) > len) || (n < 0))
! n = (len - m);
!
! p = VARDATA(string);
! for (i = 0; i < m; i++)
! p += pg_mblen(p);
! m = p - VARDATA(string);
! for (i = 0; i < n; i++)
! p += pg_mblen(p);
! n = p - (VARDATA(string) + m);! ret = (text *) palloc(VARHDRSZ + n);
! VARATT_SIZEP(ret) = VARHDRSZ + n;! memcpy(VARDATA(ret), VARDATA(string) + m, n);
! PG_RETURN_TEXT_P(ret);
! #endif
}/* --- 425,625 ---- * - Thomas Lockhart 1998-12-10 * Now uses faster TOAST-slicing interface * - John Gray 2002-02-22 + * Remove "#ifdef MULTIBYTE" and test for encoding_max_length instead. Change + * behaviors conflicting with SQL92 to meet SQL92 (if E = S + L < S throw + * error; if E < 1, return '', not entire string). Fixed MB related bug when + * S > LC and < LC + 4 sometimes garbage characters are returned. + * - Joe Conway 2002-08-10 */ Datum text_substr(PG_FUNCTION_ARGS) { ! PG_RETURN_TEXT_P(text_substring(PG_GETARG_DATUM(0), ! PG_GETARG_INT32(1), ! PG_GETARG_INT32(2), ! false)); ! }! /*
! * text_substr_no_len -
! * Wrapper to avoid opr_sanity failure due to
! * one function accepting a different number of args.
! */
! Datum
! text_substr_no_len(PG_FUNCTION_ARGS)
! {
! PG_RETURN_TEXT_P(text_substring(PG_GETARG_DATUM(0),
! PG_GETARG_INT32(1),
! -1, true));
! }
!
! /*
! * text_substring -
! * Does the real work for text_substr() and text_substr_no_len()
! * This is broken out so it can be called directly by other string processing
! * functions.
! */
! static text*
! text_substring(Datum str, int32 start, int32 length, bool length_not_specified)
! {
! int32 eml = pg_database_encoding_max_length();
! int32 S = start; /* start position */
! int32 S1; /* adjusted start position */
! int32 L1; /* adjusted substring length */
!
! /* life is easy if the encoding max length is 1 */
! if (eml == 1)
{
! S1 = Max(S, 1);! if (length_not_specified) /* special case - get length to end of string */
! L1 = -1;
! else
! {
! /* end position */
! int E = S + length;! /*
! * A negative value for L is the only way for the end position
! * to be before the start. SQL99 says to throw an error.
! */
! if (E < S)
! elog(ERROR, "negative substring length not allowed");! /*
! * A zero or negative value for the end position can happen if the start
! * was negative or one. SQL99 says to return a zero-length string.
! */
! if (E < 1)
! return PG_STR_GET_TEXT("");
!
! L1 = E - S1;
! }
!
! /*
! * If the start position is past the end of the string,
! * SQL99 says to return a zero-length string --
! * PG_GETARG_TEXT_P_SLICE() will do that for us.
! * Convert to zero-based starting position
! */
! return DatumGetTextPSlice(str, S1 - 1, L1);
! }
! else if (eml > 1)
{
! /*
! * When encoding max length is > 1, we can't get LC without
! * detoasting, so we'll grab a conservatively large slice
! * now and go back later to do the right thing
! */
! int32 slice_start;
! int32 slice_size;
! int32 slice_strlen;
! text *slice;
! int32 E1;
! int32 i;
! char *p;
! char *s;
! text *ret;
!
! /*
! * if S is past the end of the string, the tuple toaster
! * will return a zero-length string to us
! */
! S1 = Max(S, 1);
!
! /*
! * We need to start at position zero because there is no
! * way to know in advance which byte offset corresponds to
! * the supplied start position.
! */
! slice_start = 0;
!
! if (length_not_specified) /* special case - get length to end of string */
! slice_size = L1 = -1;
else
! {
! int E = S + length;
!
! /*
! * A negative value for L is the only way for the end position
! * to be before the start. SQL99 says to throw an error.
! */
! if (E < S)
! elog(ERROR, "negative substring length not allowed");! /*
! * A zero or negative value for the end position can happen if the start
! * was negative or one. SQL99 says to return a zero-length string.
! */
! if (E < 1)
! return PG_STR_GET_TEXT("");! /*
! * if E is past the end of the string, the tuple toaster
! * will truncate the length for us
! */
! L1 = E - S1;
!
! /*
! * Total slice size in bytes can't be any longer than the start
! * position plus substring length times the encoding max length.
! */
! slice_size = (S1 + L1) * eml;
! }
! slice = DatumGetTextPSlice(str, slice_start, slice_size);! /* see if we got back an empty string */
! if ((VARSIZE(slice) - VARHDRSZ) == 0)
! return PG_STR_GET_TEXT("");! /* Now we can get the actual length of the slice in MB characters */
! slice_strlen = pg_mbstrlen_with_len (VARDATA(slice), VARSIZE(slice) - VARHDRSZ);! /* Check that the start position wasn't > slice_strlen. If so,
! * SQL99 says to return a zero-length string.
! */
! if (S1 > slice_strlen)
! return PG_STR_GET_TEXT("");
!
! /*
! * Adjust L1 and E1 now that we know the slice string length.
! * Again remember that S1 is one based, and slice_start is zero based.
! */
! if (L1 > -1)
! E1 = Min(S1 + L1 , slice_start + 1 + slice_strlen);
! else
! E1 = slice_start + 1 + slice_strlen;
!
! /*
! * Find the start position in the slice;
! * remember S1 is not zero based
! */
! p = VARDATA(slice);
! for (i = 0; i < S1 - 1; i++)
! p += pg_mblen(p);
!
! /* hang onto a pointer to our start position */
! s = p;
!
! /*
! * Count the actual bytes used by the substring of
! * the requested length.
! */
! for (i = S1; i < E1; i++)
! p += pg_mblen(p);
!
! ret = (text *) palloc(VARHDRSZ + (p - s));
! VARATT_SIZEP(ret) = VARHDRSZ + (p - s);
! memcpy(VARDATA(ret), s, (p - s));
!
! return ret;
! }
! else
! elog(ERROR, "Invalid backend encoding; encoding max length "
! "is less than one.");
!
! /* not reached: suppress compiler warning */
! return PG_STR_GET_TEXT("");
}/*
***************
*** 481,536 ****
Datum
textpos(PG_FUNCTION_ARGS)
{
! text *t1 = PG_GETARG_TEXT_P(0);
! text *t2 = PG_GETARG_TEXT_P(1);
! int pos;
! int px,
! p;
! int len1,
len2;
- pg_wchar *p1,
- *p2;! #ifdef MULTIBYTE
! pg_wchar *ps1,
! *ps2;
! #endifif (VARSIZE(t2) <= VARHDRSZ)
PG_RETURN_INT32(1); /* result for empty pattern */len1 = (VARSIZE(t1) - VARHDRSZ);
len2 = (VARSIZE(t2) - VARHDRSZ);
! #ifdef MULTIBYTE
! ps1 = p1 = (pg_wchar *) palloc((len1 + 1) * sizeof(pg_wchar));
! (void) pg_mb2wchar_with_len((unsigned char *) VARDATA(t1), p1, len1);
! len1 = pg_wchar_strlen(p1);
! ps2 = p2 = (pg_wchar *) palloc((len2 + 1) * sizeof(pg_wchar));
! (void) pg_mb2wchar_with_len((unsigned char *) VARDATA(t2), p2, len2);
! len2 = pg_wchar_strlen(p2);
! #else
! p1 = VARDATA(t1);
! p2 = VARDATA(t2);
! #endif
! pos = 0;
px = (len1 - len2);
! for (p = 0; p <= px; p++)
{
! #ifdef MULTIBYTE
! if ((*p2 == *p1) && (pg_wchar_strncmp(p1, p2, len2) == 0))
! #else
! if ((*p2 == *p1) && (strncmp(p1, p2, len2) == 0))
! #endif
{
! pos = p + 1;
! break;
! };
! p1++;
! };
! #ifdef MULTIBYTE
! pfree(ps1);
! pfree(ps2);
! #endif
PG_RETURN_INT32(pos);
}--- 635,729 ---- Datum textpos(PG_FUNCTION_ARGS) { ! PG_RETURN_INT32(text_position(PG_GETARG_DATUM(0), PG_GETARG_DATUM(1), 1)); ! } ! ! /* ! * text_position - ! * Does the real work for textpos() ! * This is broken out so it can be called directly by other string processing ! * functions. ! */ ! static int32 ! text_position(Datum str, Datum search_str, int matchnum) ! { ! int eml = pg_database_encoding_max_length(); ! text *t1 = DatumGetTextP(str); ! text *t2 = DatumGetTextP(search_str); ! int match = 0, ! pos = 0, ! p = 0, ! px, ! len1, len2;! if(matchnum == 0)
! return 0; /* result for 0th match */if (VARSIZE(t2) <= VARHDRSZ)
PG_RETURN_INT32(1); /* result for empty pattern */len1 = (VARSIZE(t1) - VARHDRSZ);
len2 = (VARSIZE(t2) - VARHDRSZ);
!
! /* no use in searching str past point where search_str will fit */
px = (len1 - len2);
!
! if (eml == 1) /* simple case - single byte encoding */
{
! char *p1,
! *p2;
!
! p1 = VARDATA(t1);
! p2 = VARDATA(t2);
!
! for (p = 0; p <= px; p++)
{
! if ((*p2 == *p1) && (strncmp(p1, p2, len2) == 0))
! {
! if (++match == matchnum)
! {
! pos = p + 1;
! break;
! }
! }
! p1++;
! }
! }
! else if (eml > 1) /* not as simple - multibyte encoding */
! {
! pg_wchar *p1,
! *p2,
! *ps1,
! *ps2;
!
! ps1 = p1 = (pg_wchar *) palloc((len1 + 1) * sizeof(pg_wchar));
! (void) pg_mb2wchar_with_len((unsigned char *) VARDATA(t1), p1, len1);
! len1 = pg_wchar_strlen(p1);
! ps2 = p2 = (pg_wchar *) palloc((len2 + 1) * sizeof(pg_wchar));
! (void) pg_mb2wchar_with_len((unsigned char *) VARDATA(t2), p2, len2);
! len2 = pg_wchar_strlen(p2);
!
! for (p = 0; p <= px; p++)
! {
! if ((*p2 == *p1) && (pg_wchar_strncmp(p1, p2, len2) == 0))
! {
! if (++match == matchnum)
! {
! pos = p + 1;
! break;
! }
! }
! p1++;
! }
!
! pfree(ps1);
! pfree(ps2);
! }
! else
! elog(ERROR, "Invalid backend encoding; encoding max length "
! "is less than one.");
!
PG_RETURN_INT32(pos);
}***************
*** 758,766 ****
Datum
byteaoctetlen(PG_FUNCTION_ARGS)
{
! bytea *v = PG_GETARG_BYTEA_P(0);
!
! PG_RETURN_INT32(VARSIZE(v) - VARHDRSZ);
}/* --- 951,957 ---- Datum byteaoctetlen(PG_FUNCTION_ARGS) { ! PG_RETURN_INT32(toast_raw_datum_size(PG_GETARG_DATUM(0)) - VARHDRSZ); }/* *************** *** 805,810 **** --- 996,1003 ---- PG_RETURN_BYTEA_P(result); }+ #define PG_STR_GET_BYTEA(str_) \ + DatumGetByteaP(DirectFunctionCall1(byteain, CStringGetDatum(str_))) /* * bytea_substr() * Return a substring starting at the specified position. *************** *** 813,845 **** * Input: * - string * - starting position (is one-based) ! * - string length * * If the starting position is zero or less, then return from the start of the string * adjusting the length to be consistent with the "negative start" per SQL92. ! * If the length is less than zero, return the remaining string. ! * */ Datum bytea_substr(PG_FUNCTION_ARGS) { ! int32 m = PG_GETARG_INT32(1); ! int32 n = PG_GETARG_INT32(2);! /*
! * starting position before the start of the string? then offset into
! * the string per SQL92 spec...
! */
! if (m < 1)
{
! n += (m - 1);
! m = 1;
}! /* m will now become a zero-based starting position */
! m--;! PG_RETURN_BYTEA_P(PG_GETARG_BYTEA_P_SLICE (0, m, n));
}/* --- 1006,1076 ---- * Input: * - string * - starting position (is one-based) ! * - string length (optional) * * If the starting position is zero or less, then return from the start of the string * adjusting the length to be consistent with the "negative start" per SQL92. ! * If the length is less than zero, an ERROR is thrown. If no third argument ! * (length) is provided, the length to the end of the string is assumed. */ Datum bytea_substr(PG_FUNCTION_ARGS) { ! int S = PG_GETARG_INT32(1); /* start position */ ! int S1; /* adjusted start position */ ! int L1; /* adjusted substring length */! S1 = Max(S, 1);
!
! if (fcinfo->nargs == 2)
! {
! /*
! * Not passed a length - PG_GETARG_BYTEA_P_SLICE()
! * grabs everything to the end of the string if we pass it
! * a negative value for length.
! */
! L1 = -1;
! }
! else
{
! /* end position */
! int E = S + PG_GETARG_INT32(2);
!
! /*
! * A negative value for L is the only way for the end position
! * to be before the start. SQL99 says to throw an error.
! */
! if (E < S)
! elog(ERROR, "negative substring length not allowed");
!
! /*
! * A zero or negative value for the end position can happen if the start
! * was negative or one. SQL99 says to return a zero-length string.
! */
! if (E < 1)
! PG_RETURN_BYTEA_P(PG_STR_GET_BYTEA(""));
!
! L1 = E - S1;
}! /*
! * If the start position is past the end of the string,
! * SQL99 says to return a zero-length string --
! * PG_GETARG_TEXT_P_SLICE() will do that for us.
! * Convert to zero-based starting position
! */
! PG_RETURN_BYTEA_P(PG_GETARG_BYTEA_P_SLICE (0, S1 - 1, L1));
! }! /*
! * bytea_substr_no_len -
! * Wrapper to avoid opr_sanity failure due to
! * one function accepting a different number of args.
! */
! Datum
! bytea_substr_no_len(PG_FUNCTION_ARGS)
! {
! return bytea_substr(fcinfo);
}/* *************** *** 1422,1424 **** --- 1653,1834 ----PG_RETURN_INT32(cmp); } + + /* + * replace_text + * replace all occurences of 'old_sub_str' in 'orig_str' + * with 'new_sub_str' to form 'new_str' + * + * returns 'orig_str' if 'old_sub_str' == '' or 'orig_str' == '' + * otherwise returns 'new_str' + */ + Datum + replace_text(PG_FUNCTION_ARGS) + { + text *left_text; + text *right_text; + text *buf_text; + text *ret_text; + int curr_posn; + text *src_text = PG_GETARG_TEXT_P(0); + int src_text_len = TEXTLEN(src_text); + text *from_sub_text = PG_GETARG_TEXT_P(1); + int from_sub_text_len = TEXTLEN(from_sub_text); + text *to_sub_text = PG_GETARG_TEXT_P(2); + char *to_sub_str = PG_TEXT_GET_STR(to_sub_text); + StringInfo str = makeStringInfo(); + + if (src_text_len == 0 || from_sub_text_len == 0) + PG_RETURN_TEXT_P(src_text); + + buf_text = TEXTDUP(src_text); + curr_posn = TEXTPOS(buf_text, from_sub_text); + + while (curr_posn > 0) + { + left_text = LEFT(buf_text, from_sub_text); + right_text = RIGHT(buf_text, from_sub_text, from_sub_text_len); + + appendStringInfo(str, PG_TEXT_GET_STR(left_text)); + appendStringInfo(str, to_sub_str); + + pfree(buf_text); + pfree(left_text); + buf_text = right_text; + curr_posn = TEXTPOS(buf_text, from_sub_text); + } + + appendStringInfo(str, PG_TEXT_GET_STR(buf_text)); + pfree(buf_text); + + ret_text = PG_STR_GET_TEXT(str->data); + pfree(str->data); + pfree(str); + + PG_RETURN_TEXT_P(ret_text); + } + + /* + * split_text + * parse input string + * return ord item (1 based) + * based on provided field separator + */ + Datum + split_text(PG_FUNCTION_ARGS) + { + text *inputstring = PG_GETARG_TEXT_P(0); + int inputstring_len = TEXTLEN(inputstring); + text *fldsep = PG_GETARG_TEXT_P(1); + int fldsep_len = TEXTLEN(fldsep); + int fldnum = PG_GETARG_INT32(2); + int start_posn = 0; + int end_posn = 0; + text *result_text; + + /* return empty string for empty input string */ + if (inputstring_len < 1) + PG_RETURN_TEXT_P(PG_STR_GET_TEXT("")); + + /* empty field separator */ + if (fldsep_len < 1) + { + if (fldnum == 1) /* first field - just return the input string */ + PG_RETURN_TEXT_P(inputstring); + else /* otherwise return an empty string */ + PG_RETURN_TEXT_P(PG_STR_GET_TEXT("")); + } + + /* field number is 1 based */ + if (fldnum < 1) + elog(ERROR, "field position must be > 0"); + + start_posn = text_position(PointerGetDatum(inputstring), + PointerGetDatum(fldsep), + fldnum - 1); + end_posn = text_position(PointerGetDatum(inputstring), + PointerGetDatum(fldsep), + fldnum); + + if ((start_posn == 0) && (end_posn == 0)) /* fldsep not found */ + { + if (fldnum == 1) /* first field - just return the input string */ + PG_RETURN_TEXT_P(inputstring); + else /* otherwise return an empty string */ + PG_RETURN_TEXT_P(PG_STR_GET_TEXT("")); + } + else if ((start_posn != 0) && (end_posn == 0)) + { + /* last field requested */ + result_text = text_substring(PointerGetDatum(inputstring), start_posn + fldsep_len, -1, true); + PG_RETURN_TEXT_P(result_text); + } + else if ((start_posn == 0) && (end_posn != 0)) + { + /* first field requested */ + result_text = LEFT(inputstring, fldsep); + PG_RETURN_TEXT_P(result_text); + } + else + { + /* prior to last field requested */ + result_text = text_substring(PointerGetDatum(inputstring), start_posn + fldsep_len, end_posn - start_posn - fldsep_len, false); + PG_RETURN_TEXT_P(result_text); + } + } + + #define HEXBASE 16 + /* + * Convert a int32 to a string containing a base 16 (hex) representation of + * the number. + */ + Datum + to_hex32(PG_FUNCTION_ARGS) + { + static char digits[] = "0123456789abcdef"; + char buf[32]; /* bigger than needed, but reasonable */ + char *ptr, + *end; + text *result_text; + int32 value = PG_GETARG_INT32(0); + + end = ptr = buf + sizeof(buf) - 1; + *ptr = '\0'; + + do + { + *--ptr = digits[value % HEXBASE]; + value /= HEXBASE; + } while (ptr > buf && value); + + result_text = PG_STR_GET_TEXT(ptr); + PG_RETURN_TEXT_P(result_text); + } + + /* + * Convert a int64 to a string containing a base 16 (hex) representation of + * the number. + */ + Datum + to_hex64(PG_FUNCTION_ARGS) + { + static char digits[] = "0123456789abcdef"; + char buf[32]; /* bigger than needed, but reasonable */ + char *ptr, + *end; + text *result_text; + int64 value = PG_GETARG_INT64(0); + + end = ptr = buf + sizeof(buf) - 1; + *ptr = '\0'; + + do + { + *--ptr = digits[value % HEXBASE]; + value /= HEXBASE; + } while (ptr > buf && value); + + result_text = PG_STR_GET_TEXT(ptr); + PG_RETURN_TEXT_P(result_text); + } + Index: src/include/catalog/pg_proc.h =================================================================== RCS file: /opt/src/cvs/pgsql-server/src/include/catalog/pg_proc.h,v retrieving revision 1.254 diff -c -r1.254 pg_proc.h *** src/include/catalog/pg_proc.h 15 Aug 2002 02:51:27 -0000 1.254 --- src/include/catalog/pg_proc.h 16 Aug 2002 18:53:13 -0000 *************** *** 2121,2127 **** DESCR("remove initial characters from string"); DATA(insert OID = 882 ( rtrim PGNSP PGUID 14 f f t f i 1 25 "25" "select rtrim($1, \' \')" - _null_ )); DESCR("remove trailing characters from string"); ! DATA(insert OID = 883 ( substr PGNSP PGUID 14 f f t f i 2 25 "25 23" "select substr($1, $2, -1)" - _null_ )); DESCR("return portion of string"); DATA(insert OID = 884 ( btrim PGNSP PGUID 12 f f t f i 2 25 "25 25" btrim - _null_ )); DESCR("trim both ends of string"); --- 2121,2127 ---- DESCR("remove initial characters from string"); DATA(insert OID = 882 ( rtrim PGNSP PGUID 14 f f t f i 1 25 "25" "select rtrim($1, \' \')" - _null_ )); DESCR("remove trailing characters from string"); ! DATA(insert OID = 883 ( substr PGNSP PGUID 12 f f t f i 2 25 "25 23" text_substr_no_len - _null_ )); DESCR("return portion of string"); DATA(insert OID = 884 ( btrim PGNSP PGUID 12 f f t f i 2 25 "25 25" btrim - _null_ )); DESCR("trim both ends of string"); *************** *** 2130,2137 ****DATA(insert OID = 936 ( substring PGNSP PGUID 12 f f t f i 3 25 "25 23 23" text_substr - _null_ ));
DESCR("return portion of string");
! DATA(insert OID = 937 ( substring PGNSP PGUID 14 f f t f i 2 25 "25 23" "select substring($1, $2, -1)" - _null_ ));
DESCR("return portion of string");/* for multi-byte support */
--- 2130,2145 ----DATA(insert OID = 936 ( substring PGNSP PGUID 12 f f t f i 3 25 "25 23 23" text_substr - _null_ )); DESCR("return portion of string"); ! DATA(insert OID = 937 ( substring PGNSP PGUID 12 f f t f i 2 25 "25 23" text_substr_no_len - _null_ )); DESCR("return portion of string"); + DATA(insert OID = 2087 ( replace PGNSP PGUID 12 f f t f i 3 25 "25 25 25" replace_text - _null_ )); + DESCR("replace all occurrences of old_substr with new_substr in string"); + DATA(insert OID = 2088 ( split PGNSP PGUID 12 f f t f i 3 25 "25 25 23" split_text - _null_ )); + DESCR("split string by field_sep and return field_num"); + DATA(insert OID = 2089 ( to_hex PGNSP PGUID 12 f f t f i 1 25 "23" to_hex32 - _null_ )); + DESCR("convert int32 number to hex"); + DATA(insert OID = 2090 ( to_hex PGNSP PGUID 12 f f t f i 1 25 "20" to_hex64 - _null_ )); + DESCR("convert int64 number to hex");/* for multi-byte support */
*************** *** 2778,2784 **** DESCR("concatenate"); DATA(insert OID = 2012 ( substring PGNSP PGUID 12 f f t f i 3 17 "17 23 23" bytea_substr - _null_ )); DESCR("return portion of string"); ! DATA(insert OID = 2013 ( substring PGNSP PGUID 14 f f t f i 2 17 "17 23" "select substring($1, $2, -1)" - _null_ )); DESCR("return portion of string"); DATA(insert OID = 2014 ( position PGNSP PGUID 12 f f t f i 2 23 "17 17" byteapos - _null_ )); DESCR("return position of substring"); --- 2786,2796 ---- DESCR("concatenate"); DATA(insert OID = 2012 ( substring PGNSP PGUID 12 f f t f i 3 17 "17 23 23" bytea_substr - _null_ )); DESCR("return portion of string"); ! DATA(insert OID = 2013 ( substring PGNSP PGUID 12 f f t f i 2 17 "17 23" bytea_substr_no_len - _null_ )); ! DESCR("return portion of string"); ! DATA(insert OID = 2085 ( substr PGNSP PGUID 12 f f t f i 3 17 "17 23 23" bytea_substr - _null_ )); ! DESCR("return portion of string"); ! DATA(insert OID = 2086 ( substr PGNSP PGUID 12 f f t f i 2 17 "17 23" bytea_substr_no_len - _null_ )); DESCR("return portion of string"); DATA(insert OID = 2014 ( position PGNSP PGUID 12 f f t f i 2 23 "17 17" byteapos - _null_ )); DESCR("return position of substring"); Index: src/include/utils/builtins.h =================================================================== RCS file: /opt/src/cvs/pgsql-server/src/include/utils/builtins.h,v retrieving revision 1.191 diff -c -r1.191 builtins.h *** src/include/utils/builtins.h 15 Aug 2002 02:51:27 -0000 1.191 --- src/include/utils/builtins.h 16 Aug 2002 18:53:13 -0000 *************** *** 447,458 **** --- 447,463 ---- extern Datum textoctetlen(PG_FUNCTION_ARGS); extern Datum textpos(PG_FUNCTION_ARGS); extern Datum text_substr(PG_FUNCTION_ARGS); + extern Datum text_substr_no_len(PG_FUNCTION_ARGS); extern Datum name_text(PG_FUNCTION_ARGS); extern Datum text_name(PG_FUNCTION_ARGS); extern int varstr_cmp(char *arg1, int len1, char *arg2, int len2); extern List *textToQualifiedNameList(text *textval, const char *caller); extern bool SplitIdentifierString(char *rawstring, char separator, List **namelist); + extern Datum replace_text(PG_FUNCTION_ARGS); + extern Datum split_text(PG_FUNCTION_ARGS); + extern Datum to_hex32(PG_FUNCTION_ARGS); + extern Datum to_hex64(PG_FUNCTION_ARGS);extern Datum unknownin(PG_FUNCTION_ARGS); extern Datum unknownout(PG_FUNCTION_ARGS); *************** *** 476,481 **** --- 481,487 ---- extern Datum byteacat(PG_FUNCTION_ARGS); extern Datum byteapos(PG_FUNCTION_ARGS); extern Datum bytea_substr(PG_FUNCTION_ARGS); + extern Datum bytea_substr_no_len(PG_FUNCTION_ARGS);/* version.c */ extern Datum pgsql_version(PG_FUNCTION_ARGS); Index: src/test/regress/expected/strings.out =================================================================== RCS file: /opt/src/cvs/pgsql-server/src/test/regress/expected/strings.out,v retrieving revision 1.12 diff -c -r1.12 strings.out *** src/test/regress/expected/strings.out 11 Jun 2002 15:41:38 -0000 1.12 --- src/test/regress/expected/strings.out 16 Aug 2002 18:53:13 -0000 *************** *** 573,575 **** --- 573,738 ---- text and varchar (1 row)+ -- + -- test substr with toasted text values + -- + CREATE TABLE toasttest(f1 text); + insert into toasttest values(repeat('1234567890',10000)); + insert into toasttest values(repeat('1234567890',10000)); + -- If the starting position is zero or less, then return from the start of the string + -- adjusting the length to be consistent with the "negative start" per SQL92. + SELECT substr(f1, -1, 5) from toasttest; + substr + -------- + 123 + 123 + (2 rows) + + -- If the length is less than zero, an ERROR is thrown. + SELECT substr(f1, 5, -1) from toasttest; + ERROR: negative substring length not allowed + -- If no third argument (length) is provided, the length to the end of the + -- string is assumed. + SELECT substr(f1, 99995) from toasttest; + substr + -------- + 567890 + 567890 + (2 rows) + + -- If start plus length is > string length, the result is truncated to + -- string length + SELECT substr(f1, 99995, 10) from toasttest; + substr + -------- + 567890 + 567890 + (2 rows) + + DROP TABLE toasttest; + -- + -- test substr with toasted bytea values + -- + CREATE TABLE toasttest(f1 bytea); + insert into toasttest values(decode(repeat('1234567890',10000),'escape')); + insert into toasttest values(decode(repeat('1234567890',10000),'escape')); + -- If the starting position is zero or less, then return from the start of the string + -- adjusting the length to be consistent with the "negative start" per SQL92. + SELECT substr(f1, -1, 5) from toasttest; + substr + -------- + 123 + 123 + (2 rows) + + -- If the length is less than zero, an ERROR is thrown. + SELECT substr(f1, 5, -1) from toasttest; + ERROR: negative substring length not allowed + -- If no third argument (length) is provided, the length to the end of the + -- string is assumed. + SELECT substr(f1, 99995) from toasttest; + substr + -------- + 567890 + 567890 + (2 rows) + + -- If start plus length is > string length, the result is truncated to + -- string length + SELECT substr(f1, 99995, 10) from toasttest; + substr + -------- + 567890 + 567890 + (2 rows) + + DROP TABLE toasttest; + -- + -- test length + -- + SELECT length('abcdef') AS "length_6"; + length_6 + ---------- + 6 + (1 row) + + -- + -- test strpos + -- + SELECT strpos('abcdef', 'cd') AS "pos_3"; + pos_3 + ------- + 3 + (1 row) + + SELECT strpos('abcdef', 'xy') AS "pos_0"; + pos_0 + ------- + 0 + (1 row) + + -- + -- test replace + -- + SELECT replace('abcdef', 'de', '45') AS "abc45f"; + abc45f + -------- + abc45f + (1 row) + + SELECT replace('yabadabadoo', 'ba', '123') AS "ya123da123doo"; + ya123da123doo + --------------- + ya123da123doo + (1 row) + + SELECT replace('yabadoo', 'bad', '') AS "yaoo"; + yaoo + ------ + yaoo + (1 row) + + -- + -- test split + -- + select split('joeuser@mydatabase','@',0) AS "an error"; + ERROR: field position must be > 0 + select split('joeuser@mydatabase','@',1) AS "joeuser"; + joeuser + --------- + joeuser + (1 row) + + select split('joeuser@mydatabase','@',2) AS "mydatabase"; + mydatabase + ------------ + mydatabase + (1 row) + + select split('joeuser@mydatabase','@',3) AS "empty string"; + empty string + -------------- + + (1 row) + + select split('@joeuser@mydatabase@','@',2) AS "joeuser"; + joeuser + --------- + joeuser + (1 row) + + -- + -- test to_hex + -- + select to_hex(256*256*256 - 1) AS "ffffff"; + ffffff + -------- + ffffff + (1 row) + + select to_hex(256::bigint*256::bigint*256::bigint*256::bigint - 1) AS "ffffffff"; + ffffffff + ---------- + ffffffff + (1 row) + Index: src/test/regress/sql/strings.sql =================================================================== RCS file: /opt/src/cvs/pgsql-server/src/test/regress/sql/strings.sql,v retrieving revision 1.8 diff -c -r1.8 strings.sql *** src/test/regress/sql/strings.sql 11 Jun 2002 15:41:38 -0000 1.8 --- src/test/regress/sql/strings.sql 16 Aug 2002 18:53:13 -0000 *************** *** 197,199 **** --- 197,292 ---- SELECT text 'text' || char(20) ' and characters' AS "Concat text to char";SELECT text 'text' || varchar ' and varchar' AS "Concat text to varchar"; + + -- + -- test substr with toasted text values + -- + CREATE TABLE toasttest(f1 text); + + insert into toasttest values(repeat('1234567890',10000)); + insert into toasttest values(repeat('1234567890',10000)); + + -- If the starting position is zero or less, then return from the start of the string + -- adjusting the length to be consistent with the "negative start" per SQL92. + SELECT substr(f1, -1, 5) from toasttest; + + -- If the length is less than zero, an ERROR is thrown. + SELECT substr(f1, 5, -1) from toasttest; + + -- If no third argument (length) is provided, the length to the end of the + -- string is assumed. + SELECT substr(f1, 99995) from toasttest; + + -- If start plus length is > string length, the result is truncated to + -- string length + SELECT substr(f1, 99995, 10) from toasttest; + + DROP TABLE toasttest; + + -- + -- test substr with toasted bytea values + -- + CREATE TABLE toasttest(f1 bytea); + + insert into toasttest values(decode(repeat('1234567890',10000),'escape')); + insert into toasttest values(decode(repeat('1234567890',10000),'escape')); + + -- If the starting position is zero or less, then return from the start of the string + -- adjusting the length to be consistent with the "negative start" per SQL92. + SELECT substr(f1, -1, 5) from toasttest; + + -- If the length is less than zero, an ERROR is thrown. + SELECT substr(f1, 5, -1) from toasttest; + + -- If no third argument (length) is provided, the length to the end of the + -- string is assumed. + SELECT substr(f1, 99995) from toasttest; + + -- If start plus length is > string length, the result is truncated to + -- string length + SELECT substr(f1, 99995, 10) from toasttest; + + DROP TABLE toasttest; + + -- + -- test length + -- + + SELECT length('abcdef') AS "length_6"; + + -- + -- test strpos + -- + + SELECT strpos('abcdef', 'cd') AS "pos_3"; + + SELECT strpos('abcdef', 'xy') AS "pos_0"; + + -- + -- test replace + -- + SELECT replace('abcdef', 'de', '45') AS "abc45f"; + + SELECT replace('yabadabadoo', 'ba', '123') AS "ya123da123doo"; + + SELECT replace('yabadoo', 'bad', '') AS "yaoo"; + + -- + -- test split + -- + select split('joeuser@mydatabase','@',0) AS "an error"; + + select split('joeuser@mydatabase','@',1) AS "joeuser"; + + select split('joeuser@mydatabase','@',2) AS "mydatabase"; + + select split('joeuser@mydatabase','@',3) AS "empty string"; + + select split('@joeuser@mydatabase@','@',2) AS "joeuser"; + + -- + -- test to_hex + -- + select to_hex(256*256*256 - 1) AS "ffffff"; + + select to_hex(256::bigint*256::bigint*256::bigint*256::bigint - 1) AS "ffffffff";
---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?
--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073
Patch applied. Thanks.
---------------------------------------------------------------------------
Joe Conway wrote:
Joe Conway wrote:
I took Tom's advice and added wrapper functions around text_substr() and
bytea_substr() to cover the 2 argument case.I also added tests to strings.sql to cover substr() on toasted columns
of both text and bytea.Please replace the original patch (substr.2002.08.14.1.patch) with the
attached. It includes everything from the previous one, plus newly
implemented builtin functions:replace(string, from, to)
-- replaces all occurrences of "from" in "string" to "to"
split(string, fldsep, column)
-- splits "string" on "fldsep" and returns "column" number piece
to_hex(int32_num) & to_hex(int64_num)
-- takes integer number and returns as hex stringAll previously discussed on the list; see thread at:
http://archives.postgresql.org/pgsql-hackers/2002-07/msg00511.phpExamples:
SELECT replace('yabadabadoo', 'ba', '123') AS "ya123da123doo";
ya123da123doo
---------------
ya123da123doo
(1 row)select split('joeuser@mydatabase','@',1) AS "joeuser";
joeuser
---------
joeuser
(1 row)select split('joeuser@mydatabase','@',2) AS "mydatabase";
mydatabase
------------
mydatabase
(1 row)select to_hex(256::bigint*256::bigint*256::bigint*256::bigint - 1) AS
"ffffffff";
ffffffff
----------
ffffffff
(1 row)Tests have been added to the regression suite.
Passes all regression tests. I've checked the strings.sql script in a
multibyte database and it works fine also. I'd appreciate a good look by
someone more familiar with multibyte related issues though.If it is OK, I'd like to hold off on docs until this is committed and
after beta starts.If there are no objections, please apply.
Thanks,
Joe
Index: src/backend/utils/adt/varlena.c =================================================================== RCS file: /opt/src/cvs/pgsql-server/src/backend/utils/adt/varlena.c,v retrieving revision 1.87 diff -c -r1.87 varlena.c *** src/backend/utils/adt/varlena.c 4 Aug 2002 06:44:47 -0000 1.87 --- src/backend/utils/adt/varlena.c 16 Aug 2002 19:54:03 -0000 *************** *** 18,23 **** --- 18,25 ----#include "mb/pg_wchar.h"
#include "miscadmin.h"
+ #include "access/tuptoaster.h"
+ #include "lib/stringinfo.h"
#include "utils/builtins.h"
#include "utils/pg_locale.h"*************** *** 27,34 **** --- 29,62 ---- #define DatumGetUnknownP(X) ((unknown *) PG_DETOAST_DATUM(X)) #define PG_GETARG_UNKNOWN_P(n) DatumGetUnknownP(PG_GETARG_DATUM(n)) #define PG_RETURN_UNKNOWN_P(x) PG_RETURN_POINTER(x) + #define PG_TEXTARG_GET_STR(arg_) \ + DatumGetCString(DirectFunctionCall1(textout, PG_GETARG_DATUM(arg_))) + #define PG_TEXT_GET_STR(textp_) \ + DatumGetCString(DirectFunctionCall1(textout, PointerGetDatum(textp_))) + #define PG_STR_GET_TEXT(str_) \ + DatumGetTextP(DirectFunctionCall1(textin, CStringGetDatum(str_))) + #define TEXTLEN(textp) \ + text_length(PointerGetDatum(textp)) + #define TEXTPOS(buf_text, from_sub_text) \ + text_position(PointerGetDatum(buf_text), PointerGetDatum(from_sub_text), 1) + #define TEXTDUP(textp) \ + DatumGetTextPCopy(PointerGetDatum(textp)) + #define LEFT(buf_text, from_sub_text) \ + text_substring(PointerGetDatum(buf_text), \ + 1, \ + TEXTPOS(buf_text, from_sub_text) - 1, false) + #define RIGHT(buf_text, from_sub_text, from_sub_text_len) \ + text_substring(PointerGetDatum(buf_text), \ + TEXTPOS(buf_text, from_sub_text) + from_sub_text_len, \ + -1, true)static int text_cmp(text *arg1, text *arg2); + static int32 text_length(Datum str); + static int32 text_position(Datum str, Datum search_str, int matchnum); + static text *text_substring(Datum str, + int32 start, + int32 length, + bool length_not_specified);/*****************************************************************************
***************
*** 285,303 ****
Datum
textlen(PG_FUNCTION_ARGS)
{
! text *t = PG_GETARG_TEXT_P(0);! #ifdef MULTIBYTE
! /* optimization for single byte encoding */
! if (pg_database_encoding_max_length() <= 1)
! PG_RETURN_INT32(VARSIZE(t) - VARHDRSZ);
!
! PG_RETURN_INT32(
! pg_mbstrlen_with_len(VARDATA(t), VARSIZE(t) - VARHDRSZ)
! );
! #else
! PG_RETURN_INT32(VARSIZE(t) - VARHDRSZ);
! #endif
}/* --- 313,348 ---- Datum textlen(PG_FUNCTION_ARGS) { ! PG_RETURN_INT32(text_length(PG_GETARG_DATUM(0))); ! }! /*
! * text_length -
! * Does the real work for textlen()
! * This is broken out so it can be called directly by other string processing
! * functions.
! */
! static int32
! text_length(Datum str)
! {
! /* fastpath when max encoding length is one */
! if (pg_database_encoding_max_length() == 1)
! PG_RETURN_INT32(toast_raw_datum_size(str) - VARHDRSZ);
!
! if (pg_database_encoding_max_length() > 1)
! {
! text *t = DatumGetTextP(str);
!
! PG_RETURN_INT32(pg_mbstrlen_with_len(VARDATA(t),
! VARSIZE(t) - VARHDRSZ));
! }
!
! /* should never get here */
! elog(ERROR, "Invalid backend encoding; encoding max length "
! "is less than one.");
!
! /* not reached: suppress compiler warning */
! return 0;
}/*
***************
*** 308,316 ****
Datum
textoctetlen(PG_FUNCTION_ARGS)
{
! text *arg = PG_GETARG_TEXT_P(0);
!
! PG_RETURN_INT32(VARSIZE(arg) - VARHDRSZ);
}/* --- 353,359 ---- Datum textoctetlen(PG_FUNCTION_ARGS) { ! PG_RETURN_INT32(toast_raw_datum_size(PG_GETARG_DATUM(0)) - VARHDRSZ); }/*
***************
*** 382,471 ****
* - Thomas Lockhart 1998-12-10
* Now uses faster TOAST-slicing interface
* - John Gray 2002-02-22
*/
Datum
text_substr(PG_FUNCTION_ARGS)
{
! text *string;
! int32 m = PG_GETARG_INT32(1);
! int32 n = PG_GETARG_INT32(2);
! int32 sm;
! int32 sn;
! int eml = 1;
! #ifdef MULTIBYTE
! int i;
! int len;
! text *ret;
! char *p;
! #endif! /*
! * starting position before the start of the string? then offset into
! * the string per SQL92 spec...
! */
! if (m < 1)
{
! n += (m - 1);
! m = 1;
! }
! /* Check for m > octet length is made in TOAST access routine */! /* m will now become a zero-based starting position */
! sm = m - 1;
! sn = n;! #ifdef MULTIBYTE
! eml = pg_database_encoding_max_length ();! if (eml > 1)
{
! sm = 0;
! if (n > -1)
! sn = (m + n) * eml + 3; /* +3 to avoid mb characters overhanging slice end */
else
! sn = n; /* n < 0 is special-cased by heap_tuple_untoast_attr_slice */
! }
! #endif! string = PG_GETARG_TEXT_P_SLICE (0, sm, sn);
! if (eml == 1)
! {
! PG_RETURN_TEXT_P (string);
! }
! #ifndef MULTIBYTE
! PG_RETURN_NULL(); /* notreached: suppress compiler warning */
! #endif
! #ifdef MULTIBYTE
! if (n > -1)
! len = pg_mbstrlen_with_len (VARDATA (string), sn - 3);
! else /* n < 0 is special-cased; need full string length */
! len = pg_mbstrlen_with_len (VARDATA (string), VARSIZE(string)-VARHDRSZ);
!
! if (m > len)
! {
! m = 1;
! n = 0;
! }
! m--;
! if (((m + n) > len) || (n < 0))
! n = (len - m);
!
! p = VARDATA(string);
! for (i = 0; i < m; i++)
! p += pg_mblen(p);
! m = p - VARDATA(string);
! for (i = 0; i < n; i++)
! p += pg_mblen(p);
! n = p - (VARDATA(string) + m);! ret = (text *) palloc(VARHDRSZ + n);
! VARATT_SIZEP(ret) = VARHDRSZ + n;! memcpy(VARDATA(ret), VARDATA(string) + m, n);
! PG_RETURN_TEXT_P(ret);
! #endif
}/* --- 425,625 ---- * - Thomas Lockhart 1998-12-10 * Now uses faster TOAST-slicing interface * - John Gray 2002-02-22 + * Remove "#ifdef MULTIBYTE" and test for encoding_max_length instead. Change + * behaviors conflicting with SQL92 to meet SQL92 (if E = S + L < S throw + * error; if E < 1, return '', not entire string). Fixed MB related bug when + * S > LC and < LC + 4 sometimes garbage characters are returned. + * - Joe Conway 2002-08-10 */ Datum text_substr(PG_FUNCTION_ARGS) { ! PG_RETURN_TEXT_P(text_substring(PG_GETARG_DATUM(0), ! PG_GETARG_INT32(1), ! PG_GETARG_INT32(2), ! false)); ! }! /*
! * text_substr_no_len -
! * Wrapper to avoid opr_sanity failure due to
! * one function accepting a different number of args.
! */
! Datum
! text_substr_no_len(PG_FUNCTION_ARGS)
! {
! PG_RETURN_TEXT_P(text_substring(PG_GETARG_DATUM(0),
! PG_GETARG_INT32(1),
! -1, true));
! }
!
! /*
! * text_substring -
! * Does the real work for text_substr() and text_substr_no_len()
! * This is broken out so it can be called directly by other string processing
! * functions.
! */
! static text*
! text_substring(Datum str, int32 start, int32 length, bool length_not_specified)
! {
! int32 eml = pg_database_encoding_max_length();
! int32 S = start; /* start position */
! int32 S1; /* adjusted start position */
! int32 L1; /* adjusted substring length */
!
! /* life is easy if the encoding max length is 1 */
! if (eml == 1)
{
! S1 = Max(S, 1);! if (length_not_specified) /* special case - get length to end of string */
! L1 = -1;
! else
! {
! /* end position */
! int E = S + length;! /*
! * A negative value for L is the only way for the end position
! * to be before the start. SQL99 says to throw an error.
! */
! if (E < S)
! elog(ERROR, "negative substring length not allowed");! /*
! * A zero or negative value for the end position can happen if the start
! * was negative or one. SQL99 says to return a zero-length string.
! */
! if (E < 1)
! return PG_STR_GET_TEXT("");
!
! L1 = E - S1;
! }
!
! /*
! * If the start position is past the end of the string,
! * SQL99 says to return a zero-length string --
! * PG_GETARG_TEXT_P_SLICE() will do that for us.
! * Convert to zero-based starting position
! */
! return DatumGetTextPSlice(str, S1 - 1, L1);
! }
! else if (eml > 1)
{
! /*
! * When encoding max length is > 1, we can't get LC without
! * detoasting, so we'll grab a conservatively large slice
! * now and go back later to do the right thing
! */
! int32 slice_start;
! int32 slice_size;
! int32 slice_strlen;
! text *slice;
! int32 E1;
! int32 i;
! char *p;
! char *s;
! text *ret;
!
! /*
! * if S is past the end of the string, the tuple toaster
! * will return a zero-length string to us
! */
! S1 = Max(S, 1);
!
! /*
! * We need to start at position zero because there is no
! * way to know in advance which byte offset corresponds to
! * the supplied start position.
! */
! slice_start = 0;
!
! if (length_not_specified) /* special case - get length to end of string */
! slice_size = L1 = -1;
else
! {
! int E = S + length;
!
! /*
! * A negative value for L is the only way for the end position
! * to be before the start. SQL99 says to throw an error.
! */
! if (E < S)
! elog(ERROR, "negative substring length not allowed");! /*
! * A zero or negative value for the end position can happen if the start
! * was negative or one. SQL99 says to return a zero-length string.
! */
! if (E < 1)
! return PG_STR_GET_TEXT("");! /*
! * if E is past the end of the string, the tuple toaster
! * will truncate the length for us
! */
! L1 = E - S1;
!
! /*
! * Total slice size in bytes can't be any longer than the start
! * position plus substring length times the encoding max length.
! */
! slice_size = (S1 + L1) * eml;
! }
! slice = DatumGetTextPSlice(str, slice_start, slice_size);! /* see if we got back an empty string */
! if ((VARSIZE(slice) - VARHDRSZ) == 0)
! return PG_STR_GET_TEXT("");! /* Now we can get the actual length of the slice in MB characters */
! slice_strlen = pg_mbstrlen_with_len (VARDATA(slice), VARSIZE(slice) - VARHDRSZ);! /* Check that the start position wasn't > slice_strlen. If so,
! * SQL99 says to return a zero-length string.
! */
! if (S1 > slice_strlen)
! return PG_STR_GET_TEXT("");
!
! /*
! * Adjust L1 and E1 now that we know the slice string length.
! * Again remember that S1 is one based, and slice_start is zero based.
! */
! if (L1 > -1)
! E1 = Min(S1 + L1 , slice_start + 1 + slice_strlen);
! else
! E1 = slice_start + 1 + slice_strlen;
!
! /*
! * Find the start position in the slice;
! * remember S1 is not zero based
! */
! p = VARDATA(slice);
! for (i = 0; i < S1 - 1; i++)
! p += pg_mblen(p);
!
! /* hang onto a pointer to our start position */
! s = p;
!
! /*
! * Count the actual bytes used by the substring of
! * the requested length.
! */
! for (i = S1; i < E1; i++)
! p += pg_mblen(p);
!
! ret = (text *) palloc(VARHDRSZ + (p - s));
! VARATT_SIZEP(ret) = VARHDRSZ + (p - s);
! memcpy(VARDATA(ret), s, (p - s));
!
! return ret;
! }
! else
! elog(ERROR, "Invalid backend encoding; encoding max length "
! "is less than one.");
!
! /* not reached: suppress compiler warning */
! return PG_STR_GET_TEXT("");
}/*
***************
*** 481,536 ****
Datum
textpos(PG_FUNCTION_ARGS)
{
! text *t1 = PG_GETARG_TEXT_P(0);
! text *t2 = PG_GETARG_TEXT_P(1);
! int pos;
! int px,
! p;
! int len1,
len2;
- pg_wchar *p1,
- *p2;! #ifdef MULTIBYTE
! pg_wchar *ps1,
! *ps2;
! #endifif (VARSIZE(t2) <= VARHDRSZ)
PG_RETURN_INT32(1); /* result for empty pattern */len1 = (VARSIZE(t1) - VARHDRSZ);
len2 = (VARSIZE(t2) - VARHDRSZ);
! #ifdef MULTIBYTE
! ps1 = p1 = (pg_wchar *) palloc((len1 + 1) * sizeof(pg_wchar));
! (void) pg_mb2wchar_with_len((unsigned char *) VARDATA(t1), p1, len1);
! len1 = pg_wchar_strlen(p1);
! ps2 = p2 = (pg_wchar *) palloc((len2 + 1) * sizeof(pg_wchar));
! (void) pg_mb2wchar_with_len((unsigned char *) VARDATA(t2), p2, len2);
! len2 = pg_wchar_strlen(p2);
! #else
! p1 = VARDATA(t1);
! p2 = VARDATA(t2);
! #endif
! pos = 0;
px = (len1 - len2);
! for (p = 0; p <= px; p++)
{
! #ifdef MULTIBYTE
! if ((*p2 == *p1) && (pg_wchar_strncmp(p1, p2, len2) == 0))
! #else
! if ((*p2 == *p1) && (strncmp(p1, p2, len2) == 0))
! #endif
{
! pos = p + 1;
! break;
! };
! p1++;
! };
! #ifdef MULTIBYTE
! pfree(ps1);
! pfree(ps2);
! #endif
PG_RETURN_INT32(pos);
}--- 635,729 ---- Datum textpos(PG_FUNCTION_ARGS) { ! PG_RETURN_INT32(text_position(PG_GETARG_DATUM(0), PG_GETARG_DATUM(1), 1)); ! } ! ! /* ! * text_position - ! * Does the real work for textpos() ! * This is broken out so it can be called directly by other string processing ! * functions. ! */ ! static int32 ! text_position(Datum str, Datum search_str, int matchnum) ! { ! int eml = pg_database_encoding_max_length(); ! text *t1 = DatumGetTextP(str); ! text *t2 = DatumGetTextP(search_str); ! int match = 0, ! pos = 0, ! p = 0, ! px, ! len1, len2;! if(matchnum == 0)
! return 0; /* result for 0th match */if (VARSIZE(t2) <= VARHDRSZ)
PG_RETURN_INT32(1); /* result for empty pattern */len1 = (VARSIZE(t1) - VARHDRSZ);
len2 = (VARSIZE(t2) - VARHDRSZ);
!
! /* no use in searching str past point where search_str will fit */
px = (len1 - len2);
!
! if (eml == 1) /* simple case - single byte encoding */
{
! char *p1,
! *p2;
!
! p1 = VARDATA(t1);
! p2 = VARDATA(t2);
!
! for (p = 0; p <= px; p++)
{
! if ((*p2 == *p1) && (strncmp(p1, p2, len2) == 0))
! {
! if (++match == matchnum)
! {
! pos = p + 1;
! break;
! }
! }
! p1++;
! }
! }
! else if (eml > 1) /* not as simple - multibyte encoding */
! {
! pg_wchar *p1,
! *p2,
! *ps1,
! *ps2;
!
! ps1 = p1 = (pg_wchar *) palloc((len1 + 1) * sizeof(pg_wchar));
! (void) pg_mb2wchar_with_len((unsigned char *) VARDATA(t1), p1, len1);
! len1 = pg_wchar_strlen(p1);
! ps2 = p2 = (pg_wchar *) palloc((len2 + 1) * sizeof(pg_wchar));
! (void) pg_mb2wchar_with_len((unsigned char *) VARDATA(t2), p2, len2);
! len2 = pg_wchar_strlen(p2);
!
! for (p = 0; p <= px; p++)
! {
! if ((*p2 == *p1) && (pg_wchar_strncmp(p1, p2, len2) == 0))
! {
! if (++match == matchnum)
! {
! pos = p + 1;
! break;
! }
! }
! p1++;
! }
!
! pfree(ps1);
! pfree(ps2);
! }
! else
! elog(ERROR, "Invalid backend encoding; encoding max length "
! "is less than one.");
!
PG_RETURN_INT32(pos);
}***************
*** 758,766 ****
Datum
byteaoctetlen(PG_FUNCTION_ARGS)
{
! bytea *v = PG_GETARG_BYTEA_P(0);
!
! PG_RETURN_INT32(VARSIZE(v) - VARHDRSZ);
}/* --- 951,957 ---- Datum byteaoctetlen(PG_FUNCTION_ARGS) { ! PG_RETURN_INT32(toast_raw_datum_size(PG_GETARG_DATUM(0)) - VARHDRSZ); }/* *************** *** 805,810 **** --- 996,1003 ---- PG_RETURN_BYTEA_P(result); }+ #define PG_STR_GET_BYTEA(str_) \ + DatumGetByteaP(DirectFunctionCall1(byteain, CStringGetDatum(str_))) /* * bytea_substr() * Return a substring starting at the specified position. *************** *** 813,845 **** * Input: * - string * - starting position (is one-based) ! * - string length * * If the starting position is zero or less, then return from the start of the string * adjusting the length to be consistent with the "negative start" per SQL92. ! * If the length is less than zero, return the remaining string. ! * */ Datum bytea_substr(PG_FUNCTION_ARGS) { ! int32 m = PG_GETARG_INT32(1); ! int32 n = PG_GETARG_INT32(2);! /*
! * starting position before the start of the string? then offset into
! * the string per SQL92 spec...
! */
! if (m < 1)
{
! n += (m - 1);
! m = 1;
}! /* m will now become a zero-based starting position */
! m--;! PG_RETURN_BYTEA_P(PG_GETARG_BYTEA_P_SLICE (0, m, n));
}/* --- 1006,1076 ---- * Input: * - string * - starting position (is one-based) ! * - string length (optional) * * If the starting position is zero or less, then return from the start of the string * adjusting the length to be consistent with the "negative start" per SQL92. ! * If the length is less than zero, an ERROR is thrown. If no third argument ! * (length) is provided, the length to the end of the string is assumed. */ Datum bytea_substr(PG_FUNCTION_ARGS) { ! int S = PG_GETARG_INT32(1); /* start position */ ! int S1; /* adjusted start position */ ! int L1; /* adjusted substring length */! S1 = Max(S, 1);
!
! if (fcinfo->nargs == 2)
! {
! /*
! * Not passed a length - PG_GETARG_BYTEA_P_SLICE()
! * grabs everything to the end of the string if we pass it
! * a negative value for length.
! */
! L1 = -1;
! }
! else
{
! /* end position */
! int E = S + PG_GETARG_INT32(2);
!
! /*
! * A negative value for L is the only way for the end position
! * to be before the start. SQL99 says to throw an error.
! */
! if (E < S)
! elog(ERROR, "negative substring length not allowed");
!
! /*
! * A zero or negative value for the end position can happen if the start
! * was negative or one. SQL99 says to return a zero-length string.
! */
! if (E < 1)
! PG_RETURN_BYTEA_P(PG_STR_GET_BYTEA(""));
!
! L1 = E - S1;
}! /*
! * If the start position is past the end of the string,
! * SQL99 says to return a zero-length string --
! * PG_GETARG_TEXT_P_SLICE() will do that for us.
! * Convert to zero-based starting position
! */
! PG_RETURN_BYTEA_P(PG_GETARG_BYTEA_P_SLICE (0, S1 - 1, L1));
! }! /*
! * bytea_substr_no_len -
! * Wrapper to avoid opr_sanity failure due to
! * one function accepting a different number of args.
! */
! Datum
! bytea_substr_no_len(PG_FUNCTION_ARGS)
! {
! return bytea_substr(fcinfo);
}/* *************** *** 1422,1424 **** --- 1653,1834 ----PG_RETURN_INT32(cmp); } + + /* + * replace_text + * replace all occurences of 'old_sub_str' in 'orig_str' + * with 'new_sub_str' to form 'new_str' + * + * returns 'orig_str' if 'old_sub_str' == '' or 'orig_str' == '' + * otherwise returns 'new_str' + */ + Datum + replace_text(PG_FUNCTION_ARGS) + { + text *left_text; + text *right_text; + text *buf_text; + text *ret_text; + int curr_posn; + text *src_text = PG_GETARG_TEXT_P(0); + int src_text_len = TEXTLEN(src_text); + text *from_sub_text = PG_GETARG_TEXT_P(1); + int from_sub_text_len = TEXTLEN(from_sub_text); + text *to_sub_text = PG_GETARG_TEXT_P(2); + char *to_sub_str = PG_TEXT_GET_STR(to_sub_text); + StringInfo str = makeStringInfo(); + + if (src_text_len == 0 || from_sub_text_len == 0) + PG_RETURN_TEXT_P(src_text); + + buf_text = TEXTDUP(src_text); + curr_posn = TEXTPOS(buf_text, from_sub_text); + + while (curr_posn > 0) + { + left_text = LEFT(buf_text, from_sub_text); + right_text = RIGHT(buf_text, from_sub_text, from_sub_text_len); + + appendStringInfo(str, PG_TEXT_GET_STR(left_text)); + appendStringInfo(str, to_sub_str); + + pfree(buf_text); + pfree(left_text); + buf_text = right_text; + curr_posn = TEXTPOS(buf_text, from_sub_text); + } + + appendStringInfo(str, PG_TEXT_GET_STR(buf_text)); + pfree(buf_text); + + ret_text = PG_STR_GET_TEXT(str->data); + pfree(str->data); + pfree(str); + + PG_RETURN_TEXT_P(ret_text); + } + + /* + * split_text + * parse input string + * return ord item (1 based) + * based on provided field separator + */ + Datum + split_text(PG_FUNCTION_ARGS) + { + text *inputstring = PG_GETARG_TEXT_P(0); + int inputstring_len = TEXTLEN(inputstring); + text *fldsep = PG_GETARG_TEXT_P(1); + int fldsep_len = TEXTLEN(fldsep); + int fldnum = PG_GETARG_INT32(2); + int start_posn = 0; + int end_posn = 0; + text *result_text; + + /* return empty string for empty input string */ + if (inputstring_len < 1) + PG_RETURN_TEXT_P(PG_STR_GET_TEXT("")); + + /* empty field separator */ + if (fldsep_len < 1) + { + if (fldnum == 1) /* first field - just return the input string */ + PG_RETURN_TEXT_P(inputstring); + else /* otherwise return an empty string */ + PG_RETURN_TEXT_P(PG_STR_GET_TEXT("")); + } + + /* field number is 1 based */ + if (fldnum < 1) + elog(ERROR, "field position must be > 0"); + + start_posn = text_position(PointerGetDatum(inputstring), + PointerGetDatum(fldsep), + fldnum - 1); + end_posn = text_position(PointerGetDatum(inputstring), + PointerGetDatum(fldsep), + fldnum); + + if ((start_posn == 0) && (end_posn == 0)) /* fldsep not found */ + { + if (fldnum == 1) /* first field - just return the input string */ + PG_RETURN_TEXT_P(inputstring); + else /* otherwise return an empty string */ + PG_RETURN_TEXT_P(PG_STR_GET_TEXT("")); + } + else if ((start_posn != 0) && (end_posn == 0)) + { + /* last field requested */ + result_text = text_substring(PointerGetDatum(inputstring), start_posn + fldsep_len, -1, true); + PG_RETURN_TEXT_P(result_text); + } + else if ((start_posn == 0) && (end_posn != 0)) + { + /* first field requested */ + result_text = LEFT(inputstring, fldsep); + PG_RETURN_TEXT_P(result_text); + } + else + { + /* prior to last field requested */ + result_text = text_substring(PointerGetDatum(inputstring), start_posn + fldsep_len, end_posn - start_posn - fldsep_len, false); + PG_RETURN_TEXT_P(result_text); + } + } + + #define HEXBASE 16 + /* + * Convert a int32 to a string containing a base 16 (hex) representation of + * the number. + */ + Datum + to_hex32(PG_FUNCTION_ARGS) + { + static char digits[] = "0123456789abcdef"; + char buf[32]; /* bigger than needed, but reasonable */ + char *ptr, + *end; + text *result_text; + int32 value = PG_GETARG_INT32(0); + + end = ptr = buf + sizeof(buf) - 1; + *ptr = '\0'; + + do + { + *--ptr = digits[value % HEXBASE]; + value /= HEXBASE; + } while (ptr > buf && value); + + result_text = PG_STR_GET_TEXT(ptr); + PG_RETURN_TEXT_P(result_text); + } + + /* + * Convert a int64 to a string containing a base 16 (hex) representation of + * the number. + */ + Datum + to_hex64(PG_FUNCTION_ARGS) + { + static char digits[] = "0123456789abcdef"; + char buf[32]; /* bigger than needed, but reasonable */ + char *ptr, + *end; + text *result_text; + int64 value = PG_GETARG_INT64(0); + + end = ptr = buf + sizeof(buf) - 1; + *ptr = '\0'; + + do + { + *--ptr = digits[value % HEXBASE]; + value /= HEXBASE; + } while (ptr > buf && value); + + result_text = PG_STR_GET_TEXT(ptr); + PG_RETURN_TEXT_P(result_text); + } + Index: src/include/catalog/pg_proc.h =================================================================== RCS file: /opt/src/cvs/pgsql-server/src/include/catalog/pg_proc.h,v retrieving revision 1.254 diff -c -r1.254 pg_proc.h *** src/include/catalog/pg_proc.h 15 Aug 2002 02:51:27 -0000 1.254 --- src/include/catalog/pg_proc.h 16 Aug 2002 18:53:13 -0000 *************** *** 2121,2127 **** DESCR("remove initial characters from string"); DATA(insert OID = 882 ( rtrim PGNSP PGUID 14 f f t f i 1 25 "25" "select rtrim($1, \' \')" - _null_ )); DESCR("remove trailing characters from string"); ! DATA(insert OID = 883 ( substr PGNSP PGUID 14 f f t f i 2 25 "25 23" "select substr($1, $2, -1)" - _null_ )); DESCR("return portion of string"); DATA(insert OID = 884 ( btrim PGNSP PGUID 12 f f t f i 2 25 "25 25" btrim - _null_ )); DESCR("trim both ends of string"); --- 2121,2127 ---- DESCR("remove initial characters from string"); DATA(insert OID = 882 ( rtrim PGNSP PGUID 14 f f t f i 1 25 "25" "select rtrim($1, \' \')" - _null_ )); DESCR("remove trailing characters from string"); ! DATA(insert OID = 883 ( substr PGNSP PGUID 12 f f t f i 2 25 "25 23" text_substr_no_len - _null_ )); DESCR("return portion of string"); DATA(insert OID = 884 ( btrim PGNSP PGUID 12 f f t f i 2 25 "25 25" btrim - _null_ )); DESCR("trim both ends of string"); *************** *** 2130,2137 ****DATA(insert OID = 936 ( substring PGNSP PGUID 12 f f t f i 3 25 "25 23 23" text_substr - _null_ ));
DESCR("return portion of string");
! DATA(insert OID = 937 ( substring PGNSP PGUID 14 f f t f i 2 25 "25 23" "select substring($1, $2, -1)" - _null_ ));
DESCR("return portion of string");/* for multi-byte support */
--- 2130,2145 ----DATA(insert OID = 936 ( substring PGNSP PGUID 12 f f t f i 3 25 "25 23 23" text_substr - _null_ )); DESCR("return portion of string"); ! DATA(insert OID = 937 ( substring PGNSP PGUID 12 f f t f i 2 25 "25 23" text_substr_no_len - _null_ )); DESCR("return portion of string"); + DATA(insert OID = 2087 ( replace PGNSP PGUID 12 f f t f i 3 25 "25 25 25" replace_text - _null_ )); + DESCR("replace all occurrences of old_substr with new_substr in string"); + DATA(insert OID = 2088 ( split PGNSP PGUID 12 f f t f i 3 25 "25 25 23" split_text - _null_ )); + DESCR("split string by field_sep and return field_num"); + DATA(insert OID = 2089 ( to_hex PGNSP PGUID 12 f f t f i 1 25 "23" to_hex32 - _null_ )); + DESCR("convert int32 number to hex"); + DATA(insert OID = 2090 ( to_hex PGNSP PGUID 12 f f t f i 1 25 "20" to_hex64 - _null_ )); + DESCR("convert int64 number to hex");/* for multi-byte support */
*************** *** 2778,2784 **** DESCR("concatenate"); DATA(insert OID = 2012 ( substring PGNSP PGUID 12 f f t f i 3 17 "17 23 23" bytea_substr - _null_ )); DESCR("return portion of string"); ! DATA(insert OID = 2013 ( substring PGNSP PGUID 14 f f t f i 2 17 "17 23" "select substring($1, $2, -1)" - _null_ )); DESCR("return portion of string"); DATA(insert OID = 2014 ( position PGNSP PGUID 12 f f t f i 2 23 "17 17" byteapos - _null_ )); DESCR("return position of substring"); --- 2786,2796 ---- DESCR("concatenate"); DATA(insert OID = 2012 ( substring PGNSP PGUID 12 f f t f i 3 17 "17 23 23" bytea_substr - _null_ )); DESCR("return portion of string"); ! DATA(insert OID = 2013 ( substring PGNSP PGUID 12 f f t f i 2 17 "17 23" bytea_substr_no_len - _null_ )); ! DESCR("return portion of string"); ! DATA(insert OID = 2085 ( substr PGNSP PGUID 12 f f t f i 3 17 "17 23 23" bytea_substr - _null_ )); ! DESCR("return portion of string"); ! DATA(insert OID = 2086 ( substr PGNSP PGUID 12 f f t f i 2 17 "17 23" bytea_substr_no_len - _null_ )); DESCR("return portion of string"); DATA(insert OID = 2014 ( position PGNSP PGUID 12 f f t f i 2 23 "17 17" byteapos - _null_ )); DESCR("return position of substring"); Index: src/include/utils/builtins.h =================================================================== RCS file: /opt/src/cvs/pgsql-server/src/include/utils/builtins.h,v retrieving revision 1.191 diff -c -r1.191 builtins.h *** src/include/utils/builtins.h 15 Aug 2002 02:51:27 -0000 1.191 --- src/include/utils/builtins.h 16 Aug 2002 18:53:13 -0000 *************** *** 447,458 **** --- 447,463 ---- extern Datum textoctetlen(PG_FUNCTION_ARGS); extern Datum textpos(PG_FUNCTION_ARGS); extern Datum text_substr(PG_FUNCTION_ARGS); + extern Datum text_substr_no_len(PG_FUNCTION_ARGS); extern Datum name_text(PG_FUNCTION_ARGS); extern Datum text_name(PG_FUNCTION_ARGS); extern int varstr_cmp(char *arg1, int len1, char *arg2, int len2); extern List *textToQualifiedNameList(text *textval, const char *caller); extern bool SplitIdentifierString(char *rawstring, char separator, List **namelist); + extern Datum replace_text(PG_FUNCTION_ARGS); + extern Datum split_text(PG_FUNCTION_ARGS); + extern Datum to_hex32(PG_FUNCTION_ARGS); + extern Datum to_hex64(PG_FUNCTION_ARGS);extern Datum unknownin(PG_FUNCTION_ARGS); extern Datum unknownout(PG_FUNCTION_ARGS); *************** *** 476,481 **** --- 481,487 ---- extern Datum byteacat(PG_FUNCTION_ARGS); extern Datum byteapos(PG_FUNCTION_ARGS); extern Datum bytea_substr(PG_FUNCTION_ARGS); + extern Datum bytea_substr_no_len(PG_FUNCTION_ARGS);/* version.c */ extern Datum pgsql_version(PG_FUNCTION_ARGS); Index: src/test/regress/expected/strings.out =================================================================== RCS file: /opt/src/cvs/pgsql-server/src/test/regress/expected/strings.out,v retrieving revision 1.12 diff -c -r1.12 strings.out *** src/test/regress/expected/strings.out 11 Jun 2002 15:41:38 -0000 1.12 --- src/test/regress/expected/strings.out 16 Aug 2002 18:53:13 -0000 *************** *** 573,575 **** --- 573,738 ---- text and varchar (1 row)+ -- + -- test substr with toasted text values + -- + CREATE TABLE toasttest(f1 text); + insert into toasttest values(repeat('1234567890',10000)); + insert into toasttest values(repeat('1234567890',10000)); + -- If the starting position is zero or less, then return from the start of the string + -- adjusting the length to be consistent with the "negative start" per SQL92. + SELECT substr(f1, -1, 5) from toasttest; + substr + -------- + 123 + 123 + (2 rows) + + -- If the length is less than zero, an ERROR is thrown. + SELECT substr(f1, 5, -1) from toasttest; + ERROR: negative substring length not allowed + -- If no third argument (length) is provided, the length to the end of the + -- string is assumed. + SELECT substr(f1, 99995) from toasttest; + substr + -------- + 567890 + 567890 + (2 rows) + + -- If start plus length is > string length, the result is truncated to + -- string length + SELECT substr(f1, 99995, 10) from toasttest; + substr + -------- + 567890 + 567890 + (2 rows) + + DROP TABLE toasttest; + -- + -- test substr with toasted bytea values + -- + CREATE TABLE toasttest(f1 bytea); + insert into toasttest values(decode(repeat('1234567890',10000),'escape')); + insert into toasttest values(decode(repeat('1234567890',10000),'escape')); + -- If the starting position is zero or less, then return from the start of the string + -- adjusting the length to be consistent with the "negative start" per SQL92. + SELECT substr(f1, -1, 5) from toasttest; + substr + -------- + 123 + 123 + (2 rows) + + -- If the length is less than zero, an ERROR is thrown. + SELECT substr(f1, 5, -1) from toasttest; + ERROR: negative substring length not allowed + -- If no third argument (length) is provided, the length to the end of the + -- string is assumed. + SELECT substr(f1, 99995) from toasttest; + substr + -------- + 567890 + 567890 + (2 rows) + + -- If start plus length is > string length, the result is truncated to + -- string length + SELECT substr(f1, 99995, 10) from toasttest; + substr + -------- + 567890 + 567890 + (2 rows) + + DROP TABLE toasttest; + -- + -- test length + -- + SELECT length('abcdef') AS "length_6"; + length_6 + ---------- + 6 + (1 row) + + -- + -- test strpos + -- + SELECT strpos('abcdef', 'cd') AS "pos_3"; + pos_3 + ------- + 3 + (1 row) + + SELECT strpos('abcdef', 'xy') AS "pos_0"; + pos_0 + ------- + 0 + (1 row) + + -- + -- test replace + -- + SELECT replace('abcdef', 'de', '45') AS "abc45f"; + abc45f + -------- + abc45f + (1 row) + + SELECT replace('yabadabadoo', 'ba', '123') AS "ya123da123doo"; + ya123da123doo + --------------- + ya123da123doo + (1 row) + + SELECT replace('yabadoo', 'bad', '') AS "yaoo"; + yaoo + ------ + yaoo + (1 row) + + -- + -- test split + -- + select split('joeuser@mydatabase','@',0) AS "an error"; + ERROR: field position must be > 0 + select split('joeuser@mydatabase','@',1) AS "joeuser"; + joeuser + --------- + joeuser + (1 row) + + select split('joeuser@mydatabase','@',2) AS "mydatabase"; + mydatabase + ------------ + mydatabase + (1 row) + + select split('joeuser@mydatabase','@',3) AS "empty string"; + empty string + -------------- + + (1 row) + + select split('@joeuser@mydatabase@','@',2) AS "joeuser"; + joeuser + --------- + joeuser + (1 row) + + -- + -- test to_hex + -- + select to_hex(256*256*256 - 1) AS "ffffff"; + ffffff + -------- + ffffff + (1 row) + + select to_hex(256::bigint*256::bigint*256::bigint*256::bigint - 1) AS "ffffffff"; + ffffffff + ---------- + ffffffff + (1 row) + Index: src/test/regress/sql/strings.sql =================================================================== RCS file: /opt/src/cvs/pgsql-server/src/test/regress/sql/strings.sql,v retrieving revision 1.8 diff -c -r1.8 strings.sql *** src/test/regress/sql/strings.sql 11 Jun 2002 15:41:38 -0000 1.8 --- src/test/regress/sql/strings.sql 16 Aug 2002 18:53:13 -0000 *************** *** 197,199 **** --- 197,292 ---- SELECT text 'text' || char(20) ' and characters' AS "Concat text to char";SELECT text 'text' || varchar ' and varchar' AS "Concat text to varchar"; + + -- + -- test substr with toasted text values + -- + CREATE TABLE toasttest(f1 text); + + insert into toasttest values(repeat('1234567890',10000)); + insert into toasttest values(repeat('1234567890',10000)); + + -- If the starting position is zero or less, then return from the start of the string + -- adjusting the length to be consistent with the "negative start" per SQL92. + SELECT substr(f1, -1, 5) from toasttest; + + -- If the length is less than zero, an ERROR is thrown. + SELECT substr(f1, 5, -1) from toasttest; + + -- If no third argument (length) is provided, the length to the end of the + -- string is assumed. + SELECT substr(f1, 99995) from toasttest; + + -- If start plus length is > string length, the result is truncated to + -- string length + SELECT substr(f1, 99995, 10) from toasttest; + + DROP TABLE toasttest; + + -- + -- test substr with toasted bytea values + -- + CREATE TABLE toasttest(f1 bytea); + + insert into toasttest values(decode(repeat('1234567890',10000),'escape')); + insert into toasttest values(decode(repeat('1234567890',10000),'escape')); + + -- If the starting position is zero or less, then return from the start of the string + -- adjusting the length to be consistent with the "negative start" per SQL92. + SELECT substr(f1, -1, 5) from toasttest; + + -- If the length is less than zero, an ERROR is thrown. + SELECT substr(f1, 5, -1) from toasttest; + + -- If no third argument (length) is provided, the length to the end of the + -- string is assumed. + SELECT substr(f1, 99995) from toasttest; + + -- If start plus length is > string length, the result is truncated to + -- string length + SELECT substr(f1, 99995, 10) from toasttest; + + DROP TABLE toasttest; + + -- + -- test length + -- + + SELECT length('abcdef') AS "length_6"; + + -- + -- test strpos + -- + + SELECT strpos('abcdef', 'cd') AS "pos_3"; + + SELECT strpos('abcdef', 'xy') AS "pos_0"; + + -- + -- test replace + -- + SELECT replace('abcdef', 'de', '45') AS "abc45f"; + + SELECT replace('yabadabadoo', 'ba', '123') AS "ya123da123doo"; + + SELECT replace('yabadoo', 'bad', '') AS "yaoo"; + + -- + -- test split + -- + select split('joeuser@mydatabase','@',0) AS "an error"; + + select split('joeuser@mydatabase','@',1) AS "joeuser"; + + select split('joeuser@mydatabase','@',2) AS "mydatabase"; + + select split('joeuser@mydatabase','@',3) AS "empty string"; + + select split('@joeuser@mydatabase@','@',2) AS "joeuser"; + + -- + -- test to_hex + -- + select to_hex(256*256*256 - 1) AS "ffffff"; + + select to_hex(256::bigint*256::bigint*256::bigint*256::bigint - 1) AS "ffffffff";
---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?
--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073