Eliminate more detoast copies for packed varlenas

Started by Gregory Starkover 18 years ago14 messages
#1Gregory Stark
stark@enterprisedb.com
1 attachment(s)

Ok, this removes what should be most if not all of the call sites where we're
detoasting text or byteas. In particular it gets all the regexp/like functions
and all the trim/pad functions. It also gets hashtext and hash_any.

$ zcat packed-varlena-efficiency_v0.patch.gz | diffstat
backend/access/hash/hashfunc.c | 12 !!
backend/utils/adt/like.c | 80 !!!!!!!!!!!!!!!!!!!
backend/utils/adt/oracle_compat.c | 157 !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
backend/utils/adt/regexp.c | 119 !!!!!!!!!!!!!!!!!!!!!!!!!!!!
include/fmgr.h | 1
5 files changed, 5 insertions(+), 364 modifications(!)

Attachments:

packed-varlena-efficiency_v0.patch.gzapplication/octet-streamDownload
#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Gregory Stark (#1)
Re: Eliminate more detoast copies for packed varlenas

Gregory Stark <stark@enterprisedb.com> writes:

Ok, this removes what should be most if not all of the call sites where we're
detoasting text or byteas. In particular it gets all the regexp/like functions
and all the trim/pad functions. It also gets hashtext and hash_any.

Applied with some fixes --- you'd missed like_match.c, which doubtless
explains Guillame's complaint that it didn't work ...

regards, tom lane

#3Gregory Stark
stark@enterprisedb.com
In reply to: Tom Lane (#2)
Re: [PATCHES] Eliminate more detoast copies for packed varlenas

"Tom Lane" <tgl@sss.pgh.pa.us> writes:

Gregory Stark <stark@enterprisedb.com> writes:

Ok, this removes what should be most if not all of the call sites where we're
detoasting text or byteas. In particular it gets all the regexp/like functions
and all the trim/pad functions. It also gets hashtext and hash_any.

Applied with some fixes --- you'd missed like_match.c, which doubtless
explains Guillame's complaint that it didn't work ...

Strange. It passed all regression tests for me and it seems like this is
something that would have been caught even in single-byte mode by a simple
test. It seems to me that like_match.c only used for SIMILAR is that right?
That would explain it as there don't appear to be any tests of SIMILAR.

Separately:

Although I'd misdiagnosed the reason for the recent failures on buildfarm
member grebe, I see no reason to revert the 1-byte-header-friendly changes I
made in varlena.c. Instead, tweak the code a little bit to get more
advantage out of that.

This may have been a misdiagnosis of the buildfarm failures but it looks like
a correct bug fix to me. That is, it looks like regexp_split_to_array() was
capable of passing a packed varlena to setup_regexp_matches which wasn't
expecting it. But this I don't understand why it wouldn't cause regression
failures and indeed when I tested it with my build it didn't cause any
problems.

This all brings up the question of what other files should be considered for
fixing. There is the possibility that some of the other sites could turn out
to be performance critical for a given application. In particular I'm
primarily concerned with calls which could be invoked during a data load or
index build.

Of the following list it seems to me the calls in adt/acl.c are probably
interesting to fix. Especially since it seems we could just replace all those
text* parameters with Datum parameters since they're only going to be passed
to textin anyways.

After that the tsearch and xml data types are probably interesting, and the
various data type input functions and contrib gist/gin operator classes.

I'm fine doing the drudge work but wanted to check before I do it that I'm not
doing something we don't think is worth doing at this point in time.

src/backend/access/transam/xlog.c:3
src/backend/catalog/pg_conversion.c:2
src/backend/commands/sequence.c:1
src/backend/libpq/be-fsstubs.c:2
src/backend/tsearch/dict.c:3
src/backend/tsearch/to_tsany.c:6
src/backend/tsearch/wparser.c:6
src/backend/utils/adt/acl.c:61
src/backend/utils/adt/char.c:1
src/backend/utils/adt/date.c:3
src/backend/utils/adt/dbsize.c:2
src/backend/utils/adt/encode.c:1
src/backend/utils/adt/formatting.c:14
src/backend/utils/adt/genfile.c:3
src/backend/utils/adt/not_in.c:2
src/backend/utils/adt/quote.c:2
src/backend/utils/adt/regproc.c:1
src/backend/utils/adt/ruleutils.c:6
src/backend/utils/adt/tid.c:1
src/backend/utils/adt/timestamp.c:8
src/backend/utils/adt/tsquery_rewrite.c:1
src/backend/utils/adt/tsvector_op.c:3
src/backend/utils/adt/xml.c:24
src/backend/utils/mb/mbutils.c:1
src/tutorial/funcs_new.c:3

contrib/adminpack/adminpack.c:6
contrib/chkpass/chkpass.c:2
contrib/dblink/dblink.c:41
contrib/fuzzystrmatch/dmetaphone.c:2
contrib/fuzzystrmatch/fuzzystrmatch.c:6
contrib/hstore/hstore_gin.c:1
contrib/hstore/hstore_gist.c:1
contrib/hstore/hstore_op.c:6
contrib/intarray/_int_op.c:1
contrib/ltree/ltree_op.c:3
contrib/pageinspect/btreefuncs.c:3
contrib/pageinspect/rawpage.c:1
contrib/pg_trgm/trgm_gin.c:2
contrib/pg_trgm/trgm_gist.c:1
contrib/pg_trgm/trgm_op.c:3
contrib/pgcrypto/pgcrypto.c:10
contrib/pgcrypto/pgp-pgsql.c:1
contrib/pgrowlocks/pgrowlocks.c:1
contrib/pgstattuple/pgstatindex.c:2
contrib/pgstattuple/pgstattuple.c:1
contrib/sslinfo/sslinfo.c:2
contrib/tablefunc/tablefunc.c:14
contrib/tsearch2/dict.c:3
contrib/tsearch2/dict_ex.c:1
contrib/tsearch2/dict_ispell.c:1
contrib/tsearch2/dict_snowball.c:3
contrib/tsearch2/dict_syn.c:1
contrib/tsearch2/dict_thesaurus.c:1
contrib/tsearch2/query.c:4
contrib/tsearch2/query_rewrite.c:1
contrib/tsearch2/ts_cfg.c:1
contrib/tsearch2/ts_stat.c:2
contrib/tsearch2/tsvector.c:2
contrib/tsearch2/wparser.c:9
contrib/uuid-ossp/uuid-ossp.c:2
contrib/xml2/xpath.c:25
contrib/xml2/xslt_proc.c:3

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com

#4Andrew Dunstan
andrew@dunslane.net
In reply to: Gregory Stark (#3)
Re: [PATCHES] Eliminate more detoast copies for packed varlenas

Gregory Stark wrote:

"Tom Lane" <tgl@sss.pgh.pa.us> writes:

Gregory Stark <stark@enterprisedb.com> writes:

Ok, this removes what should be most if not all of the call sites where we're
detoasting text or byteas. In particular it gets all the regexp/like functions
and all the trim/pad functions. It also gets hashtext and hash_any.

Applied with some fixes --- you'd missed like_match.c, which doubtless
explains Guillame's complaint that it didn't work ...

Strange. It passed all regression tests for me and it seems like this is
something that would have been caught even in single-byte mode by a simple
test. It seems to me that like_match.c only used for SIMILAR is that right?
That would explain it as there don't appear to be any tests of SIMILAR.

No. like_match.c contains the template for all the various incarnations
of LIKE and ILIKE functions. It is included multiple times with
different sets of #defines in like.c to create those functions
(currently there are 4 of them). It also supplies the template for the
like_escape functions, and this is where the macros are used. Those
functions are apparently only called if there is an explicit ESCAPE
clause. Some of our regression tests do have this, so I'm not sure what
happened.

cheers

andrew

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Gregory Stark (#3)
Re: [PATCHES] Eliminate more detoast copies for packed varlenas

Gregory Stark <stark@enterprisedb.com> writes:

This may have been a misdiagnosis of the buildfarm failures but it looks like
a correct bug fix to me. That is, it looks like regexp_split_to_array() was
capable of passing a packed varlena to setup_regexp_matches which wasn't
expecting it. But this I don't understand why it wouldn't cause regression
failures and indeed when I tested it with my build it didn't cause any
problems.

The particular regression tests we have for those functions seem to pass
constants to them, not table columns, and so they don't see packed inputs.

(It might be interesting to make textin produce a packed result when
possible, just to see what breaks; but I would be afraid to try to do
that for production...)

This all brings up the question of what other files should be considered for
fixing.

I'm very much against such a wholesale edit as you seem to have in mind
here. We already had some destabilization from the limited patch that
went in; now when we're trying to get to beta is not the time for more.
Maybe at the beginning of 8.4 devel cycle would be a reasonable time
to consider touching a lot of files.

regards, tom lane

#6Gregory Stark
stark@enterprisedb.com
In reply to: Tom Lane (#5)
Re: [PATCHES] Eliminate more detoast copies for packed varlenas

"Tom Lane" <tgl@sss.pgh.pa.us> writes:

Gregory Stark <stark@enterprisedb.com> writes:

(It might be interesting to make textin produce a packed result when
possible, just to see what breaks; but I would be afraid to try to do
that for production...)

This all brings up the question of what other files should be considered for
fixing.

I'm very much against such a wholesale edit as you seem to have in mind
here. We already had some destabilization from the limited patch that
went in; now when we're trying to get to beta is not the time for more.
Maybe at the beginning of 8.4 devel cycle would be a reasonable time
to consider touching a lot of files.

Well I did expect that sort of concern if I went ahead and did all of them, or
nearly all of them. That's why I'm asking if any of the list seem like they
might be important enough to do now.

For 8.4 I'm starting to think it would make sense to make the distinction
between a "real" varlena and a possibly-unaligned pointer so text* wouldn't be
an ambiguous type which might not be what it appears to be.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com

#7Gregory Stark
stark@enterprisedb.com
In reply to: Tom Lane (#5)
1 attachment(s)
Re: [PATCHES] Eliminate more detoast copies for packed varlenas

"Tom Lane" <tgl@sss.pgh.pa.us> writes:

(It might be interesting to make textin produce a packed result when
possible, just to see what breaks; but I would be afraid to try to do
that for production...)

Reassuringly all checks pass with a hack like that in place. (attached)

I think the right approach here is to define a new type text_packed * (which
would just be a char* or varattrib_1b* or something like that). Then
PG_GETARG_*_PP would return one of these new pointers.

This does leave us with warnings whenever an old-style function calls a
new-style function but I think there would be relatively few of those since
we'll tackle the higher traffic areas first which will be the lower level
functions.

The benefit is that it will give us a warning if we try to pass a pointer from
a new-style function to an old-style function which isn't prepared to receive
it.

Attachments:

debug-packed-varlena_v0.patchtext/x-diffDownload
Index: src/backend/utils/adt/varlena.c
===================================================================
RCS file: /home/stark/src/REPOSITORY/pgsql/src/backend/utils/adt/varlena.c,v
retrieving revision 1.159
diff -c -r1.159 varlena.c
*** src/backend/utils/adt/varlena.c	22 Sep 2007 04:40:03 -0000	1.159
--- src/backend/utils/adt/varlena.c	24 Sep 2007 14:08:30 -0000
***************
*** 263,268 ****
--- 263,278 ----
  	int			len;
  
  	len = strlen(inputText);
+ 
+ #ifdef DEBUG_PACKED_VARLENA
+ 	if (len < VARATT_SHORT_MAX-VARHDRSZ_SHORT) {
+ 		result = (text*) palloc(len + VARHDRSZ_SHORT);
+ 		memcpy(VARDATA_SHORT(result), inputText, len);
+ 		SET_VARSIZE_SHORT(result, len+VARHDRSZ_SHORT);
+ 		PG_RETURN_TEXT_P(result);
+ 	}
+ #endif
+ 
  	result = (text *) palloc(len + VARHDRSZ);
  	SET_VARSIZE(result, len + VARHDRSZ);
  
#8Brendan Jurd
direvus@gmail.com
In reply to: Gregory Stark (#1)
1 attachment(s)
Re: Eliminate more detoast copies for packed varlenas

On 9/22/07, Gregory Stark <stark@enterprisedb.com> wrote:

Ok, this removes what should be most if not all of the call sites where we're
detoasting text or byteas. In particular it gets all the regexp/like functions
and all the trim/pad functions. It also gets hashtext and hash_any.

Looks like there's some more of this in src/tutorial/funcs.c and funcs_new.c.

On a related note, while I was trawling through header files trying to
wrap my head around all this toast and varlena business, I found the
following comment, in fmgr.h and reiterated in postgres.h:

<>
WARNING: It is only safe to use PG_DETOAST_DATUM_UNPACKED() and
VARDATA_ANY() if you really don't care about the alignment.
</>

Shouldn't this be PG_DETOAST_DATUM_PACKED()? I'm emboldened by the
fact that there is no macro called PG_TOAST_DATUM_UNPACKED defined
anywhere in postgres.

Patch attached, in case I've got the right idea.

Regards,
BJ

Attachments:

packed-comment.difftext/plain; name=packed-comment.diffDownload
Index: src/include/fmgr.h
===================================================================
--- src/include/fmgr.h	(revision 29144)
+++ src/include/fmgr.h	(working copy)
@@ -158,11 +158,11 @@
  * The resulting datum can be accessed using VARSIZE_ANY() and VARDATA_ANY()
  * (beware of multiple evaluations in those macros!)
  *
- * WARNING: It is only safe to use PG_DETOAST_DATUM_UNPACKED() and
- * VARDATA_ANY() if you really don't care about the alignment. Either because
- * you're working with something like text where the alignment doesn't matter
- * or because you're not going to access its constituent parts and just use
- * things like memcpy on it anyways.
+ * WARNING: It is only safe to use PG_DETOAST_DATUM_PACKED() and VARDATA_ANY()
+ * if you really don't care about the alignment. Either because you're working
+ * with something like text where the alignment doesn't matter or because
+ * you're not going to access its constituent parts and just use things like
+ * memcpy on it anyways.
  *
  * Note: it'd be nice if these could be macros, but I see no way to do that
  * without evaluating the arguments multiple times, which is NOT acceptable.
Index: src/include/postgres.h
===================================================================
--- src/include/postgres.h	(revision 29144)
+++ src/include/postgres.h	(working copy)
@@ -237,7 +237,7 @@
  * code that specifically wants to work with still-toasted Datums.
  *
  * WARNING: It is only safe to use VARDATA_ANY() -- typically with
- * PG_DETOAST_DATUM_UNPACKED() -- if you really don't care about the alignment.
+ * PG_DETOAST_DATUM_PACKED() -- if you really don't care about the alignment.
  * Either because you're working with something like text where the alignment
  * doesn't matter or because you're not going to access its constituent parts
  * and just use things like memcpy on it anyways.
#9Tom Lane
tgl@sss.pgh.pa.us
In reply to: Brendan Jurd (#8)
Re: Eliminate more detoast copies for packed varlenas

"Brendan Jurd" <direvus@gmail.com> writes:

On 9/22/07, Gregory Stark <stark@enterprisedb.com> wrote:

Ok, this removes what should be most if not all of the call sites where we're
detoasting text or byteas. In particular it gets all the regexp/like functions
and all the trim/pad functions. It also gets hashtext and hash_any.

Looks like there's some more of this in src/tutorial/funcs.c and funcs_new.c.

Well, those are just tutorial code, so I'd vote for keeping it simple.

On a related note, while I was trawling through header files trying to
wrap my head around all this toast and varlena business, I found the
following comment, in fmgr.h and reiterated in postgres.h:
WARNING: It is only safe to use PG_DETOAST_DATUM_UNPACKED() and
VARDATA_ANY() if you really don't care about the alignment.
Shouldn't this be PG_DETOAST_DATUM_PACKED()?

Yup, good catch. In the context of fmgr.h, it seems like the comment
is more sensible if it refers to the underlying function
pg_detoast_datum_packed (which is what the preceding paragraph is
speaking of), so I made it say that instead.

regards, tom lane

#10Gregory Stark
stark@enterprisedb.com
In reply to: Gregory Stark (#7)
Re: [PATCHES] Eliminate more detoast copies for packed varlenas

"Gregory Stark" <stark@enterprisedb.com> writes:

"Tom Lane" <tgl@sss.pgh.pa.us> writes:

(It might be interesting to make textin produce a packed result when
possible, just to see what breaks; but I would be afraid to try to do
that for production...)

Reassuringly all checks pass with a hack like that in place. (attached)

For the record I've been doing some more testing and found one place that
could be a problem down the road. I'm not sure why it didn't show up
previously. In selfuncs.c we use VARDATA/VARSIZE on data that is taken from
parser Const nodes and from the histogram arrays without detoasting them.

Currently this is safe as array elements are not packed and parser nodes
contain values read using textin and never stored in a tuple. But down the
road I expect we'll want to pack array element so this code would need to
detoast the elements or prepare to handle packed elements.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com

#11Tom Lane
tgl@sss.pgh.pa.us
In reply to: Gregory Stark (#10)
Re: [PATCHES] Eliminate more detoast copies for packed varlenas

Gregory Stark <stark@enterprisedb.com> writes:

For the record I've been doing some more testing and found one place that
could be a problem down the road. I'm not sure why it didn't show up
previously. In selfuncs.c we use VARDATA/VARSIZE on data that is taken from
parser Const nodes and from the histogram arrays without detoasting them.

Currently this is safe as array elements are not packed and parser nodes
contain values read using textin and never stored in a tuple. But down the
road I expect we'll want to pack array element so this code would need to
detoast the elements or prepare to handle packed elements.

Hmmm ... I think this should be fixed now, actually. I'm far from
convinced that a Const could never contain a toasted datum. Consider
constant-folding in the planner --- it just stuffs the result of a
function into a Const node.

In fact, it seems there's a different risk here: if such a datum were
toasted out-of-line, the reference in a cached plan might live longer
than the underlying toast-table data. Maybe we need a forcible detoast
in evaluate_function().

regards, tom lane

#12Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#11)
Re: [PATCHES] Eliminate more detoast copies for packed varlenas

I wrote:

In fact, it seems there's a different risk here: if such a datum were
toasted out-of-line, the reference in a cached plan might live longer
than the underlying toast-table data. Maybe we need a forcible detoast
in evaluate_function().

Sure enough, it seems we do. The attached script fails in every release
back to 7.3. It's a rather contrived case, because a function marked
immutable probably shouldn't be reading from a table at all, especially
not one you are likely to change or drop. That's probably why we've not
heard reports of this before. But it's definitely a bug.

regards, tom lane

create table big(f1 text);
insert into big values(repeat('xyzzy',100000));

create or replace function getbig() returns text as
'select f1 from big' language sql immutable;

create or replace function usebig(text) returns bool as '
begin return $1 ~ ''xyzzy''; end
' language plpgsql;

prepare foo as select usebig(getbig()) from int4_tbl;

execute foo;

drop table big;

execute foo;

#13Bruce Momjian
bruce@momjian.us
In reply to: Gregory Stark (#3)
Re: [PATCHES] Eliminate more detoast copies for packed varlenas

This has been saved for the 8.4 release:

http://momjian.postgresql.org/cgi-bin/pgpatches_hold

---------------------------------------------------------------------------

Gregory Stark wrote:

"Tom Lane" <tgl@sss.pgh.pa.us> writes:

Gregory Stark <stark@enterprisedb.com> writes:

Ok, this removes what should be most if not all of the call sites where we're
detoasting text or byteas. In particular it gets all the regexp/like functions
and all the trim/pad functions. It also gets hashtext and hash_any.

Applied with some fixes --- you'd missed like_match.c, which doubtless
explains Guillame's complaint that it didn't work ...

Strange. It passed all regression tests for me and it seems like this is
something that would have been caught even in single-byte mode by a simple
test. It seems to me that like_match.c only used for SIMILAR is that right?
That would explain it as there don't appear to be any tests of SIMILAR.

Separately:

Although I'd misdiagnosed the reason for the recent failures on buildfarm
member grebe, I see no reason to revert the 1-byte-header-friendly changes I
made in varlena.c. Instead, tweak the code a little bit to get more
advantage out of that.

This may have been a misdiagnosis of the buildfarm failures but it looks like
a correct bug fix to me. That is, it looks like regexp_split_to_array() was
capable of passing a packed varlena to setup_regexp_matches which wasn't
expecting it. But this I don't understand why it wouldn't cause regression
failures and indeed when I tested it with my build it didn't cause any
problems.

This all brings up the question of what other files should be considered for
fixing. There is the possibility that some of the other sites could turn out
to be performance critical for a given application. In particular I'm
primarily concerned with calls which could be invoked during a data load or
index build.

Of the following list it seems to me the calls in adt/acl.c are probably
interesting to fix. Especially since it seems we could just replace all those
text* parameters with Datum parameters since they're only going to be passed
to textin anyways.

After that the tsearch and xml data types are probably interesting, and the
various data type input functions and contrib gist/gin operator classes.

I'm fine doing the drudge work but wanted to check before I do it that I'm not
doing something we don't think is worth doing at this point in time.

src/backend/access/transam/xlog.c:3
src/backend/catalog/pg_conversion.c:2
src/backend/commands/sequence.c:1
src/backend/libpq/be-fsstubs.c:2
src/backend/tsearch/dict.c:3
src/backend/tsearch/to_tsany.c:6
src/backend/tsearch/wparser.c:6
src/backend/utils/adt/acl.c:61
src/backend/utils/adt/char.c:1
src/backend/utils/adt/date.c:3
src/backend/utils/adt/dbsize.c:2
src/backend/utils/adt/encode.c:1
src/backend/utils/adt/formatting.c:14
src/backend/utils/adt/genfile.c:3
src/backend/utils/adt/not_in.c:2
src/backend/utils/adt/quote.c:2
src/backend/utils/adt/regproc.c:1
src/backend/utils/adt/ruleutils.c:6
src/backend/utils/adt/tid.c:1
src/backend/utils/adt/timestamp.c:8
src/backend/utils/adt/tsquery_rewrite.c:1
src/backend/utils/adt/tsvector_op.c:3
src/backend/utils/adt/xml.c:24
src/backend/utils/mb/mbutils.c:1
src/tutorial/funcs_new.c:3

contrib/adminpack/adminpack.c:6
contrib/chkpass/chkpass.c:2
contrib/dblink/dblink.c:41
contrib/fuzzystrmatch/dmetaphone.c:2
contrib/fuzzystrmatch/fuzzystrmatch.c:6
contrib/hstore/hstore_gin.c:1
contrib/hstore/hstore_gist.c:1
contrib/hstore/hstore_op.c:6
contrib/intarray/_int_op.c:1
contrib/ltree/ltree_op.c:3
contrib/pageinspect/btreefuncs.c:3
contrib/pageinspect/rawpage.c:1
contrib/pg_trgm/trgm_gin.c:2
contrib/pg_trgm/trgm_gist.c:1
contrib/pg_trgm/trgm_op.c:3
contrib/pgcrypto/pgcrypto.c:10
contrib/pgcrypto/pgp-pgsql.c:1
contrib/pgrowlocks/pgrowlocks.c:1
contrib/pgstattuple/pgstatindex.c:2
contrib/pgstattuple/pgstattuple.c:1
contrib/sslinfo/sslinfo.c:2
contrib/tablefunc/tablefunc.c:14
contrib/tsearch2/dict.c:3
contrib/tsearch2/dict_ex.c:1
contrib/tsearch2/dict_ispell.c:1
contrib/tsearch2/dict_snowball.c:3
contrib/tsearch2/dict_syn.c:1
contrib/tsearch2/dict_thesaurus.c:1
contrib/tsearch2/query.c:4
contrib/tsearch2/query_rewrite.c:1
contrib/tsearch2/ts_cfg.c:1
contrib/tsearch2/ts_stat.c:2
contrib/tsearch2/tsvector.c:2
contrib/tsearch2/wparser.c:9
contrib/uuid-ossp/uuid-ossp.c:2
contrib/xml2/xpath.c:25
contrib/xml2/xslt_proc.c:3

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com

---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?

http://archives.postgresql.org

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://postgres.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

#14Bruce Momjian
bruce@momjian.us
In reply to: Gregory Stark (#3)
Re: [PATCHES] Eliminate more detoast copies for packed varlenas

Added to TODO:

* Research reducing deTOASTing in more places

http://archives.postgresql.org/pgsql-hackers/2007-09/msg00895.php

---------------------------------------------------------------------------

Gregory Stark wrote:

"Tom Lane" <tgl@sss.pgh.pa.us> writes:

Gregory Stark <stark@enterprisedb.com> writes:

Ok, this removes what should be most if not all of the call sites where we're
detoasting text or byteas. In particular it gets all the regexp/like functions
and all the trim/pad functions. It also gets hashtext and hash_any.

Applied with some fixes --- you'd missed like_match.c, which doubtless
explains Guillame's complaint that it didn't work ...

Strange. It passed all regression tests for me and it seems like this is
something that would have been caught even in single-byte mode by a simple
test. It seems to me that like_match.c only used for SIMILAR is that right?
That would explain it as there don't appear to be any tests of SIMILAR.

Separately:

Although I'd misdiagnosed the reason for the recent failures on buildfarm
member grebe, I see no reason to revert the 1-byte-header-friendly changes I
made in varlena.c. Instead, tweak the code a little bit to get more
advantage out of that.

This may have been a misdiagnosis of the buildfarm failures but it looks like
a correct bug fix to me. That is, it looks like regexp_split_to_array() was
capable of passing a packed varlena to setup_regexp_matches which wasn't
expecting it. But this I don't understand why it wouldn't cause regression
failures and indeed when I tested it with my build it didn't cause any
problems.

This all brings up the question of what other files should be considered for
fixing. There is the possibility that some of the other sites could turn out
to be performance critical for a given application. In particular I'm
primarily concerned with calls which could be invoked during a data load or
index build.

Of the following list it seems to me the calls in adt/acl.c are probably
interesting to fix. Especially since it seems we could just replace all those
text* parameters with Datum parameters since they're only going to be passed
to textin anyways.

After that the tsearch and xml data types are probably interesting, and the
various data type input functions and contrib gist/gin operator classes.

I'm fine doing the drudge work but wanted to check before I do it that I'm not
doing something we don't think is worth doing at this point in time.

src/backend/access/transam/xlog.c:3
src/backend/catalog/pg_conversion.c:2
src/backend/commands/sequence.c:1
src/backend/libpq/be-fsstubs.c:2
src/backend/tsearch/dict.c:3
src/backend/tsearch/to_tsany.c:6
src/backend/tsearch/wparser.c:6
src/backend/utils/adt/acl.c:61
src/backend/utils/adt/char.c:1
src/backend/utils/adt/date.c:3
src/backend/utils/adt/dbsize.c:2
src/backend/utils/adt/encode.c:1
src/backend/utils/adt/formatting.c:14
src/backend/utils/adt/genfile.c:3
src/backend/utils/adt/not_in.c:2
src/backend/utils/adt/quote.c:2
src/backend/utils/adt/regproc.c:1
src/backend/utils/adt/ruleutils.c:6
src/backend/utils/adt/tid.c:1
src/backend/utils/adt/timestamp.c:8
src/backend/utils/adt/tsquery_rewrite.c:1
src/backend/utils/adt/tsvector_op.c:3
src/backend/utils/adt/xml.c:24
src/backend/utils/mb/mbutils.c:1
src/tutorial/funcs_new.c:3

contrib/adminpack/adminpack.c:6
contrib/chkpass/chkpass.c:2
contrib/dblink/dblink.c:41
contrib/fuzzystrmatch/dmetaphone.c:2
contrib/fuzzystrmatch/fuzzystrmatch.c:6
contrib/hstore/hstore_gin.c:1
contrib/hstore/hstore_gist.c:1
contrib/hstore/hstore_op.c:6
contrib/intarray/_int_op.c:1
contrib/ltree/ltree_op.c:3
contrib/pageinspect/btreefuncs.c:3
contrib/pageinspect/rawpage.c:1
contrib/pg_trgm/trgm_gin.c:2
contrib/pg_trgm/trgm_gist.c:1
contrib/pg_trgm/trgm_op.c:3
contrib/pgcrypto/pgcrypto.c:10
contrib/pgcrypto/pgp-pgsql.c:1
contrib/pgrowlocks/pgrowlocks.c:1
contrib/pgstattuple/pgstatindex.c:2
contrib/pgstattuple/pgstattuple.c:1
contrib/sslinfo/sslinfo.c:2
contrib/tablefunc/tablefunc.c:14
contrib/tsearch2/dict.c:3
contrib/tsearch2/dict_ex.c:1
contrib/tsearch2/dict_ispell.c:1
contrib/tsearch2/dict_snowball.c:3
contrib/tsearch2/dict_syn.c:1
contrib/tsearch2/dict_thesaurus.c:1
contrib/tsearch2/query.c:4
contrib/tsearch2/query_rewrite.c:1
contrib/tsearch2/ts_cfg.c:1
contrib/tsearch2/ts_stat.c:2
contrib/tsearch2/tsvector.c:2
contrib/tsearch2/wparser.c:9
contrib/uuid-ossp/uuid-ossp.c:2
contrib/xml2/xpath.c:25
contrib/xml2/xslt_proc.c:3

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com

---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?

http://archives.postgresql.org

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://postgres.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +