WIP: index support for regexp search
Hackers,
A WIP patch adding index support for regexp search to the pg_trgm contrib
module is attached.
Unlike techniques which extract continuous text parts from the regexp,
this patch uses a technique of automaton transformation. That allows more
comprehensive trigram extraction.
A little example of the possible performance benefit.
test=# explain analyze select * from words where s ~ 'a[bc]+[de]';
QUERY PLAN
-------------------------------------------------------------------------------------------------------
Seq Scan on words (cost=0.00..1703.11 rows=10 width=9) (actual time=3.092..242.303 rows=662 loops=1)
Filter: (s ~ 'a[bc]+[de]'::text)
Rows Removed by Filter: 97907
Total runtime: 243.213 ms
(4 rows)
test=# explain analyze select * from words where s ~ 'a[bc]+[de]';
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on words (cost=260.08..295.83 rows=10 width=9) (actual time=4.166..7.506 rows=662 loops=1)
Recheck Cond: (s ~ 'a[bc]+[de]'::text)
Rows Removed by Index Recheck: 18
-> Bitmap Index Scan on words_trgm_idx (cost=0.00..260.07 rows=10 width=0) (actual time=4.076..4.076 rows=680 loops=1)
Index Cond: (s ~ 'a[bc]+[de]'::text)
Total runtime: 8.424 ms
(6 rows)
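To make the idea concrete: in the regexp above, any match where [bc]+ consumes exactly one character must contain one of the trigrams abd, abe, acd or ace, while longer matches contain a trigram of the form a[bc][bc] and one of the form [bc][bc][de]. Below is a minimal sketch, not code from the patch, that enumerates the trigrams produced by three consecutive character classes. The function name expand_classes and its interface are made up for illustration, and word padding and case folding are ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not taken from the patch: write every trigram that
 * three consecutive character classes can produce into out[] and return
 * how many were written.  Each class is given as a string of its members.
 */
size_t
expand_classes(const char *c1, const char *c2, const char *c3,
               char out[][4], size_t max_out)
{
    size_t      n = 0;

    for (const char *a = c1; *a; a++)
        for (const char *b = c2; *b; b++)
            for (const char *c = c3; *c; c++)
            {
                assert(n < max_out);    /* caller must size out[] */
                out[n][0] = *a;
                out[n][1] = *b;
                out[n][2] = *c;
                out[n][3] = '\0';
                n++;
            }
    return n;
}
```

Calling expand_classes("a", "bc", "de", ...) yields the four trigrams for the shortest matches. The number of alternatives is the product of the class sizes, which is one reason large classes get expensive.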
The current version of the patch has some limitations:
1) The algorithm for extracting the logical expression on trigrams has high
computational complexity, so it can become really slow on regexps with many
branches. Improvements to this algorithm are probably possible.
2) Naturally, there is no performance benefit if no trigrams can be extracted
from the regexp. That's inevitable.
3) Currently, only the GIN index is supported. There are no serious problems;
the GiST code for it is just not written yet.
4) There appears to be some kind of problem with extracting a multibyte-encoded
character from pg_wchar. I've posted a question about it here:
http://archives.postgresql.org/pgsql-hackers/2011-11/msg01222.php
Meanwhile, I've hardcoded a dirty solution, so
PG_EUC_JP, PG_EUC_CN, PG_EUC_KR, PG_EUC_TW, PG_EUC_JIS_2004 are not
supported yet.
------
With best regards,
Alexander Korotkov.
On Tue, Nov 22, 2011 at 2:38 PM, Alexander Korotkov
<aekorotkov@gmail.com> wrote:
A WIP patch adding index support for regexp search to the pg_trgm contrib
module is attached.
Unlike techniques which extract continuous text parts from the regexp,
this patch uses a technique of automaton transformation. That allows more
comprehensive trigram extraction.
Please add this patch here so it does not get lost in the shuffle:
https://commitfest.postgresql.org/action/commitfest_view/open
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Thu, Dec 1, 2011 at 12:29 AM, Robert Haas <robertmhaas@gmail.com> wrote:
Please add this patch here so it does not get lost in the shuffle:
https://commitfest.postgresql.org/action/commitfest_view/open
Done.
------
With best regards,
Alexander Korotkov.
On 22.11.2011 21:38, Alexander Korotkov wrote:
A WIP patch adding index support for regexp search to the pg_trgm contrib
module is attached.
Unlike techniques which extract continuous text parts from the regexp,
this patch uses a technique of automaton transformation. That allows more
comprehensive trigram extraction.
Nice!
The current version of the patch has some limitations:
1) The algorithm for extracting the logical expression on trigrams has high
computational complexity, so it can become really slow on regexps with many
branches. Improvements to this algorithm are probably possible.
2) Naturally, there is no performance benefit if no trigrams can be extracted
from the regexp. That's inevitable.
3) Currently, only the GIN index is supported. There are no serious problems;
the GiST code for it is just not written yet.
4) There appears to be some kind of problem with extracting a multibyte-encoded
character from pg_wchar. I've posted a question about it here:
http://archives.postgresql.org/pgsql-hackers/2011-11/msg01222.php
Meanwhile, I've hardcoded a dirty solution, so
PG_EUC_JP, PG_EUC_CN, PG_EUC_KR, PG_EUC_TW, PG_EUC_JIS_2004 are not
supported yet.
This is pretty far from being in committable state, so I'm going to mark
this as "returned with feedback" in the commitfest app. The feedback:
The code badly needs comments. There is no explanation of how the
trigram extraction code in trgm_regexp.c works. Guessing from the
variable names, it seems to be some sort of a coloring algorithm that
works on a graph, but that all needs to be explained. Can this algorithm
be found somewhere in literature, perhaps? A link to a paper would be nice.
Apart from that, the multibyte issue seems like the big one. Any way
around that?
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
On Fri, Jan 20, 2012 at 12:30 AM, Heikki Linnakangas <
heikki.linnakangas@enterprisedb.com> wrote:
The code badly needs comments. There is no explanation of how the trigram
extraction code in trgm_regexp.c works.
Sure. I hoped to find time for comments before the commitfest started.
Unfortunately I didn't, sorry.
Guessing from the variable names, it seems to be some sort of a coloring
algorithm that works on a graph, but that all needs to be explained. Can
this algorithm be found somewhere in literature, perhaps? A link to a paper
would be nice.
I hope it's truly novel, at least in its application to regular expressions.
I'm going to write a paper about it.
Apart from that, the multibyte issue seems like the big one. Any way
around that?
Conversion of pg_wchar to a multibyte character is the only way I found to
avoid serious hacking of the existing regexp code. Do you think an additional
function in pg_wchar_tbl which converts pg_wchar back to a multibyte
character is a possible solution?
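For the UTF-8 case, at least, such a reverse conversion is mechanical. The sketch below assumes that the wide character simply holds a Unicode code point, which is an assumption made for this illustration and says nothing about how the EUC_* encodings pack their bytes. The function name codepoint_to_utf8 is hypothetical and is not an existing pg_wchar_tbl entry.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Illustrative only: encode one Unicode code point as UTF-8.
 * Returns the number of bytes written to buf (1..4), or 0 if cp is not
 * a valid code point.  buf must have room for 4 bytes.
 */
size_t
codepoint_to_utf8(uint32_t cp, unsigned char *buf)
{
    assert(buf != NULL);

    if (cp <= 0x7F)
    {
        buf[0] = (unsigned char) cp;
        return 1;
    }
    if (cp <= 0x7FF)
    {
        buf[0] = (unsigned char) (0xC0 | (cp >> 6));
        buf[1] = (unsigned char) (0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp <= 0xFFFF)
    {
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;           /* surrogates are not valid code points */
        buf[0] = (unsigned char) (0xE0 | (cp >> 12));
        buf[1] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
        buf[2] = (unsigned char) (0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF)
    {
        buf[0] = (unsigned char) (0xF0 | (cp >> 18));
        buf[1] = (unsigned char) (0x80 | ((cp >> 12) & 0x3F));
        buf[2] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
        buf[3] = (unsigned char) (0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;                   /* beyond the Unicode range */
}
```

The other multibyte server encodings would each need their own inverse, which is presumably why a per-encoding function pointer in the table is being proposed.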
------
With best regards,
Alexander Korotkov.
I also have a question about pg_wchar.
/*
*-------------------------------------------------------------------
* encoding info table
* XXX must be sorted by the same order as enum pg_enc (in mb/pg_wchar.h)
*-------------------------------------------------------------------
*/
pg_wchar_tbl pg_wchar_table[] = {
{pg_ascii2wchar_with_len, pg_ascii_mblen, pg_ascii_dsplen,
pg_ascii_verifier, 1}, /* PG_SQL_ASCII */
{pg_eucjp2wchar_with_len, pg_eucjp_mblen, pg_eucjp_dsplen,
pg_eucjp_verifier, 3}, /* PG_EUC_JP */
{pg_euccn2wchar_with_len, pg_euccn_mblen, pg_euccn_dsplen,
pg_euccn_verifier, 2}, /* PG_EUC_CN */
{pg_euckr2wchar_with_len, pg_euckr_mblen, pg_euckr_dsplen,
pg_euckr_verifier, 3}, /* PG_EUC_KR */
{pg_euctw2wchar_with_len, pg_euctw_mblen, pg_euctw_dsplen,
pg_euctw_verifier, 4}, /* PG_EUC_TW */
{pg_eucjp2wchar_with_len, pg_eucjp_mblen, pg_eucjp_dsplen,
pg_eucjp_verifier, 3}, /* PG_EUC_JIS_2004 */
{pg_utf2wchar_with_len, pg_utf_mblen, pg_utf_dsplen, pg_utf8_verifier, 4}, /*
PG_UTF8 */
{pg_mule2wchar_with_len, pg_mule_mblen, pg_mule_dsplen, pg_mule_verifier,
4}, /* PG_MULE_INTERNAL */
{pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen,
pg_latin1_verifier, 1}, /* PG_LATIN1 */
{pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen,
pg_latin1_verifier, 1}, /* PG_LATIN2 */
{pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen,
pg_latin1_verifier, 1}, /* PG_LATIN3 */
{pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen,
pg_latin1_verifier, 1}, /* PG_LATIN4 */
{pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen,
pg_latin1_verifier, 1}, /* PG_LATIN5 */
{pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen,
pg_latin1_verifier, 1}, /* PG_LATIN6 */
{pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen,
pg_latin1_verifier, 1}, /* PG_LATIN7 */
{pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen,
pg_latin1_verifier, 1}, /* PG_LATIN8 */
{pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen,
pg_latin1_verifier, 1}, /* PG_LATIN9 */
{pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen,
pg_latin1_verifier, 1}, /* PG_LATIN10 */
{pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen,
pg_latin1_verifier, 1}, /* PG_WIN1256 */
{pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen,
pg_latin1_verifier, 1}, /* PG_WIN1258 */
{pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen,
pg_latin1_verifier, 1}, /* PG_WIN866 */
{pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen,
pg_latin1_verifier, 1}, /* PG_WIN874 */
{pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen,
pg_latin1_verifier, 1}, /* PG_KOI8R */
{pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen,
pg_latin1_verifier, 1}, /* PG_WIN1251 */
{pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen,
pg_latin1_verifier, 1}, /* PG_WIN1252 */
{pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen,
pg_latin1_verifier, 1}, /* ISO-8859-5 */
{pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen,
pg_latin1_verifier, 1}, /* ISO-8859-6 */
{pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen,
pg_latin1_verifier, 1}, /* ISO-8859-7 */
{pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen,
pg_latin1_verifier, 1}, /* ISO-8859-8 */
{pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen,
pg_latin1_verifier, 1}, /* PG_WIN1250 */
{pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen,
pg_latin1_verifier, 1}, /* PG_WIN1253 */
{pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen,
pg_latin1_verifier, 1}, /* PG_WIN1254 */
{pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen,
pg_latin1_verifier, 1}, /* PG_WIN1255 */
{pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen,
pg_latin1_verifier, 1}, /* PG_WIN1257 */
{pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen,
pg_latin1_verifier, 1}, /* PG_KOI8U */
{0, pg_sjis_mblen, pg_sjis_dsplen, pg_sjis_verifier, 2}, /* PG_SJIS */
{0, pg_big5_mblen, pg_big5_dsplen, pg_big5_verifier, 2}, /* PG_BIG5 */
{0, pg_gbk_mblen, pg_gbk_dsplen, pg_gbk_verifier, 2}, /* PG_GBK */
{0, pg_uhc_mblen, pg_uhc_dsplen, pg_uhc_verifier, 2}, /* PG_UHC */
{0, pg_gb18030_mblen, pg_gb18030_dsplen, pg_gb18030_verifier, 4}, /*
PG_GB18030 */
{0, pg_johab_mblen, pg_johab_dsplen, pg_johab_verifier, 3}, /* PG_JOHAB */
{0, pg_sjis_mblen, pg_sjis_dsplen, pg_sjis_verifier, 2} /*
PG_SHIFT_JIS_2004 */
};
What do the last 7 zeros in the first column mean? Is no conversion to
pg_wchar possible from these encodings?
------
With best regards,
Alexander Korotkov.
On Fri, Jan 20, 2012 at 1:07 AM, Alexander Korotkov <aekorotkov@gmail.com> wrote:
What do the last 7 zeros in the first column mean? Is no conversion to
pg_wchar possible from these encodings?
Uh, I see. These encodings are not supported as server encodings.
------
With best regards,
Alexander Korotkov.
On Thu, January 19, 2012 21:30, Heikki Linnakangas wrote:
On 22.11.2011 21:38, Alexander Korotkov wrote:
A WIP patch adding index support for regexp search to the pg_trgm contrib
module is attached.
Unlike techniques which extract continuous text parts from the regexp,
this patch uses a technique of automaton transformation. That allows more
comprehensive trigram extraction.
Nice!
Yes, wonderful stuff; I tested quite a bit with this patch; FWIW, here's what I found.
The patch yields spectacular speedups with small, simple-enough regexen. But it does not do a
good enough job when guessing where to use the index and where to fall back to Seq Scan. This can
lead to (also spectacular) slow-downs, compared to Seq Scan.
I used the following to generate 3 test tables with lines of 80 random chars (just in case it's
handy for others to use):
$ cat create_data.sh
#!/bin/sh
for power in 4 5 6
do
  table=azjunk${power}
  index=${table}_trgmrgx_txt_01_idx
  echo "-- generating table $table with index $index";
  time perl -E'
    sub ss{ join"",@_[ map{rand @_} 1 .. shift ] };
    say(ss(80,"a".."g"," ","h".."m"," ","n".."s"," ","t".."z"))
      for 1 .. 1e'"${power};" \
  | psql -aqXc "drop table if exists $table;
    create table $table(txt text); copy $table from stdin;
    -- set session maintenance_work_mem='20GB';
    create index $index on $table using gin(txt gin_trgm_ops);
    analyze $table;";
done
\dt+ public.azjunk*
List of relations
Schema | Name | Type | Owner | Size | Description
--------+---------+-------+-----------+---------+-------------
public | azjunk4 | table | breinbaas | 1152 kB |
public | azjunk5 | table | breinbaas | 11 MB |
public | azjunk6 | table | breinbaas | 112 MB |
(3 rows)
I guessed that MAX_COLOR_CHARS limits the character class size (to 4, in your patch), is that
true? I can understand you want that value to be low to limit the above risk, but now it reduces
the usability of the feature a bit: one has to split up larger char-classes into several smaller
ones to make a statement use the index: i.e.:
txt ~ 'f[aeio]n' OR txt ~ 'f[uy]n'
instead of
txt ~ 'f[aeiouy]n'
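The manual split shown above can be done mechanically. The following is only a sketch of that workaround: the helper name split_class and its interface are invented here, and it does no escaping of characters that are special inside a bracket expression (such as ']' or '^').

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: rewrite  col ~ 'prefix[class]suffix'  as an OR of
 * conditions whose classes hold at most max_chars members each.
 * Writes the SQL fragment into out and returns the number of conditions,
 * or -1 if out is too small.  No regex escaping is attempted.
 */
int
split_class(const char *col, const char *prefix, const char *cls,
            const char *suffix, size_t max_chars,
            char *out, size_t out_size)
{
    size_t      len = strlen(cls);
    size_t      used = 0;
    int         parts = 0;

    assert(max_chars > 0 && out_size > 0);
    out[0] = '\0';

    for (size_t pos = 0; pos < len; pos += max_chars)
    {
        size_t      chunk = (len - pos < max_chars) ? len - pos : max_chars;
        int         w = snprintf(out + used, out_size - used,
                                 "%s%s ~ '%s[%.*s]%s'",
                                 parts > 0 ? " OR " : "", col, prefix,
                                 (int) chunk, cls + pos, suffix);

        if (w < 0 || (size_t) w >= out_size - used)
            return -1;          /* output buffer too small */
        used += (size_t) w;
        parts++;
    }
    return parts;
}
```

With max_chars set to 4 and the class "aeiouy", this produces the two-condition form shown above. Whether the planner then picks the index for each branch still depends on the patch's own limits.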
I made compiled instances with larger values for MAX_COLOR_CHARS (6 and 9), and sure enough they
used the index for larger classes such as the above, but of course also got into problems easier
when quantifiers are added (*, +, {n,m}).
A better calculation to decide index-use seems necessary, and ideally one that allows for a larger
MAX_COLOR_CHARS than 4. Especially quantifiers could perhaps be inspected wrt that decision.
IMHO, the functionality would still be very useful when only very simple regexen were considered.
Btw, it seems impossible to Ctrl-C out of a search once it is submitted; I suppose this is
normally necessary for performance reasons, but it would be useful to be able to compile a test
version that allows it. I don't know how hard that would be.
There is also a minor bug, I think, when running with 'set enable_seqscan=off' in combination
with a too-large regex:
$ cat fail.sql
set enable_bitmapscan=on;
set enable_seqscan=on;
explain analyze select txt from azjunk4 where txt ~ 'f[aeio]n'; -- OK
explain analyze select txt from azjunk4 where txt ~ 'f[aeiou]n'; -- OK
set enable_bitmapscan=on;
set enable_seqscan=off;
explain analyze select txt from azjunk4 where txt ~ 'f[aeio]n'; -- OK
explain analyze select txt from azjunk4 where txt ~ 'f[aeiou]n'; -- crashes
$ psql -f fail.sql
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on azjunk4 (cost=52.01..56.02 rows=1 width=81) (actual time=1.011..5.291
rows=131 loops=1)
Recheck Cond: (txt ~ 'f[aeio]n'::text)
-> Bitmap Index Scan on azjunk4_trgmrgx_txt_01_idx (cost=0.00..52.01 rows=1 width=0) (actual
time=0.880..0.880 rows=131 loops=1)
Index Cond: (txt ~ 'f[aeio]n'::text)
Total runtime: 5.700 ms
(5 rows)
QUERY PLAN
-------------------------------------------------------------------------------------------------------
Seq Scan on azjunk4 (cost=0.00..268.00 rows=1 width=81) (actual time=1.491..36.049 rows=164
loops=1)
Filter: (txt ~ 'f[aeiou]n'::text)
Rows Removed by Filter: 9836
Total runtime: 36.112 ms
(4 rows)
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on azjunk4 (cost=52.01..56.02 rows=1 width=81) (actual time=0.346..0.927
rows=131 loops=1)
Recheck Cond: (txt ~ 'f[aeio]n'::text)
-> Bitmap Index Scan on azjunk4_trgmrgx_txt_01_idx (cost=0.00..52.01 rows=1 width=0) (actual
time=0.316..0.316 rows=131 loops=1)
Index Cond: (txt ~ 'f[aeio]n'::text)
Total runtime: 0.996 ms
(5 rows)
psql:fail.sql:24: connection to server was lost
Sorry: I found the code, esp. the regex engine, hard to understand so far, so all the above are
somewhat inconclusive remarks; I hope nonetheless they are useful.
Thanks,
Erik Rijkers
On Fri, Jan 20, 2012 at 01:33, Erik Rijkers <er@xs4all.nl> wrote:
Btw, it seems impossible to Ctrl-C out of a search once it is submitted; I suppose this is
normally necessary for performance reasons, but it would be useful to be able to compile a test
version that allows it.
I believe being interruptible is a requirement for the patch to be accepted.
CHECK_FOR_INTERRUPTS(); should be added to the indeterminate loops.
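As a rough, self-contained illustration of that pattern: the sketch below uses a stand-in flag and macro, since the real CHECK_FOR_INTERRUPTS() needs the backend headers and leaves the loop through the error mechanism rather than by returning a value. All names in it (interrupt_pending, SKETCH_CHECK_FOR_INTERRUPTS, expand_states) are made up.

```c
#include <assert.h>
#include <signal.h>

/* Stand-in for the backend's pending-interrupt flag (hypothetical name). */
volatile sig_atomic_t interrupt_pending = 0;

/*
 * Stand-in for CHECK_FOR_INTERRUPTS().  In this sketch a pending interrupt
 * simply makes the enclosing function return -1.
 */
#define SKETCH_CHECK_FOR_INTERRUPTS() \
    do { if (interrupt_pending) return -1; } while (0)

/*
 * A loop whose length depends on the input.  Returns the number of steps
 * performed, or -1 if it was cancelled part-way.
 */
long
expand_states(long nstates)
{
    long        done = 0;

    assert(nstates >= 0);
    for (long i = 0; i < nstates; i++)
    {
        SKETCH_CHECK_FOR_INTERRUPTS();  /* cheap check on every iteration */
        done++;                 /* placeholder for the real per-state work */
    }
    return done;
}
```

The check is a single flag test, so placing it inside each loop whose length depends on the regex costs little compared with the work done per iteration.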
Regards,
Marti
Hi!
Thank you for your feedback!
On Fri, Jan 20, 2012 at 3:33 AM, Erik Rijkers <er@xs4all.nl> wrote:
The patch yields spectacular speedups with small, simple-enough regexen. But it does not do a
good enough job when guessing where to use the index and where to fall back to Seq Scan. This can
lead to (also spectacular) slow-downs, compared to Seq Scan.
Could you give some examples of regexes where index scan becomes slower
than seq scan?
I guessed that MAX_COLOR_CHARS limits the character class size (to 4, in your patch), is that
true? I can understand you want that value to be low to limit the above risk, but now it reduces
the usability of the feature a bit: one has to split up larger char-classes into several smaller
ones to make a statement use the index: i.e.:
Yes, MAX_COLOR_CHARS is the maximum number of characters an automaton color
may contain for that color to be split into separate characters. And it's
likely there could be a better solution than just having this hard limit.
Btw, it seems impossible to Ctrl-C out of a search once it is submitted; I suppose this is
normally necessary for performance reasons, but it would be useful to be able to compile a test
version that allows it. I don't know how hard that would be.
It seems that Ctrl-C was impossible because the trigram extraction procedure
takes so long and is not interruptible. It's not difficult to make this
procedure interruptible, but actually it just shouldn't take so long.
There is also a minor bug, I think, when running with 'set enable_seqscan=off' in combination
with a too-large regex:
Thanks for pointing that out. It will be fixed.
------
With best regards,
Alexander Korotkov.
On Fri, Jan 20, 2012 at 8:45 PM, Marti Raudsepp <marti@juffo.org> wrote:
On Fri, Jan 20, 2012 at 01:33, Erik Rijkers <er@xs4all.nl> wrote:
Btw, it seems impossible to Ctrl-C out of a search once it is submitted; I suppose this is
normally necessary for performance reasons, but it would be useful to be able to compile a test
version that allows it.
I believe being interruptible is a requirement for the patch to be accepted.
CHECK_FOR_INTERRUPTS(); should be added to the indeterminate loops.
Sure. It's easy to fix. But it seems that in this case it is the GIN
extract_query method that becomes slow (the index scan itself is
interruptible). So, it just shouldn't run for so long.
------
With best regards,
Alexander Korotkov.
On Fri, Jan 20, 2012 at 12:54 AM, Alexander Korotkov
<aekorotkov@gmail.com> wrote:
On Fri, Jan 20, 2012 at 12:30 AM, Heikki Linnakangas <
heikki.linnakangas@enterprisedb.com> wrote:
Apart from that, the multibyte issue seems like the big one. Any way
around that?
Conversion of pg_wchar to a multibyte character is the only way I found to
avoid serious hacking of the existing regexp code. Do you think an additional
function in pg_wchar_tbl which converts pg_wchar back to a multibyte
character is a possible solution?
Do you have any comments on this? I could write a patch which adds such a
function to core.
------
With best regards,
Alexander Korotkov.
On Sat, January 21, 2012 06:26, Alexander Korotkov wrote:
Hi!
Thank you for your feedback!
On Fri, Jan 20, 2012 at 3:33 AM, Erik Rijkers <er@xs4all.nl> wrote:
The patch yields spectacular speedups with small, simple-enough regexen. But it does not do a
good enough job when guessing where to use the index and where to fall back to Seq Scan. This can
lead to (also spectacular) slow-downs, compared to Seq Scan.
Could you give some examples of regexes where index scan becomes slower
than seq scan?
x[aeio]+q takes many minutes, uninterruptible. It's now running for almost 30 minutes, I'm sure
it will come back eventually, but I think I'll kill it later on; I suppose you get the point ;-)
Even with {n,m} quantifiers it's easy to hit slowdowns:
(table azjunk6 is 112 MB, the index 693 MB.)
(MAX_COLOR_CHARS=4 <= your original patch)
instance    table    regex          plan              time
HEAD        azjunk6  x[aeio]{1,3}q  Seq Scan          3566.088 ms
HEAD        azjunk6  x[aeio]{1,3}q  Seq Scan          3540.606 ms
HEAD        azjunk6  x[aeio]{1,3}q  Seq Scan          3495.034 ms
HEAD        azjunk6  x[aeio]{1,3}q  Seq Scan          3510.403 ms
trgm_regex  azjunk6  x[aeio]{1,3}q  Bitmap Heap Scan  3724.131 ms
trgm_regex  azjunk6  x[aeio]{1,3}q  Bitmap Heap Scan  3844.999 ms
trgm_regex  azjunk6  x[aeio]{1,3}q  Bitmap Heap Scan  3835.190 ms
trgm_regex  azjunk6  x[aeio]{1,3}q  Bitmap Heap Scan  3724.016 ms
HEAD        azjunk6  x[aeio]{1,4}q  Seq Scan          3617.997 ms
HEAD        azjunk6  x[aeio]{1,4}q  Seq Scan          3644.215 ms
HEAD        azjunk6  x[aeio]{1,4}q  Seq Scan          3636.976 ms
HEAD        azjunk6  x[aeio]{1,4}q  Seq Scan          3625.493 ms
trgm_regex  azjunk6  x[aeio]{1,4}q  Bitmap Heap Scan  7885.247 ms
trgm_regex  azjunk6  x[aeio]{1,4}q  Bitmap Heap Scan  8799.082 ms
trgm_regex  azjunk6  x[aeio]{1,4}q  Bitmap Heap Scan  7754.152 ms
trgm_regex  azjunk6  x[aeio]{1,4}q  Bitmap Heap Scan  7721.332 ms
This is with your patch as is; in instances compiled with higher MAX_COLOR_CHARS (I did 6 and 9),
it is of course even easier to dream up a slow regex...
Erik Rijkers
Hi!
A new version of the patch is attached. The changes are as follows:
1) The right way to convert from pg_wchar to multibyte.
2) Optimization of producing the CNFA-like graph on trigrams (produces
smaller, but equivalent, graphs in less time).
3) Comments and refactoring.
------
With best regards,
Alexander Korotkov.
On Mon, November 19, 2012 22:58, Alexander Korotkov wrote:
New version of patch is attached.
Hi Alexander,
I get some compile-errors:
(Centos 6.3, Linux 2.6.32-279.14.1.el6.x86_64 GNU/Linux, gcc (GCC) 4.7.2)
make contrib
trgm_regexp.c:73:2: error: unknown type name TrgmStateKey
make[1]: *** [trgm_regexp.o] Error 1
make: *** [all-pg_trgm-recurse] Error 2
trgm_regexp.c:73:2: error: unknown type name TrgmStateKey
make[1]: *** [trgm_regexp.o] Error 1
make: *** [install-pg_trgm-recurse] Error 2
Did I forget something?
Thanks,
Erik Rijkers
On 19.11.2012 22:58, Alexander Korotkov wrote:
Hi!
A new version of the patch is attached. The changes are as follows:
1) The right way to convert from pg_wchar to multibyte.
2) Optimization of producing the CNFA-like graph on trigrams (produces
smaller, but equivalent, graphs in less time).
3) Comments and refactoring.
Hi,
thanks for the updated message-id. I've done the initial review:
1) Patch applies fine to the master.
2) It's common to use upper-case names for macros, but trgm.h defines
macro "iswordchr" - I see it's moved from trgm_op.c but maybe we
could make it a bit more correct?
3) I see there are two "#ifdef KEEPONLYALNUM" blocks right next to each
other in trgm.h - maybe it'd be a good idea to join them?
4) The two new method prototypes at the end of trgm.h use different
indentation than the rest (spaces only instead of tabs).
5) There are no regression tests / updated docs (yet).
6) It does not compile - I do get a bunch of errors like this
trgm_regexp.c:73:2: error: expected specifier-qualifier-list before
‘TrgmStateKey’
trgm_regexp.c: In function ‘addKeys’:
trgm_regexp.c:291:24: error: ‘TrgmState’ has no member named ‘keys’
trgm_regexp.c:304:10: error: ‘TrgmState’ has no member named ‘keys’
...
It seems this is caused by the order of typedefs in trgm_regexp.c, namely
TrgmState referencing TrgmStateKey before it's defined. Moving the
TrgmStateKey before TrgmState fixed the issue (I'm using gcc-4.5.4 but
I think it's not compiler-dependent.)
7) Once fixed, it seems to work
CREATE EXTENSION pg_trgm ;
CREATE TABLE TEST (val TEXT);
INSERT INTO test
SELECT md5(i::text) FROM generate_series(1,1000000) s(i);
CREATE INDEX trgm_idx ON test USING gin (val gin_trgm_ops);
ANALYZE test;
EXPLAIN SELECT * FROM test WHERE val ~ '.*qqq.*';
QUERY PLAN
---------------------------------------------------------------------
Bitmap Heap Scan on test (cost=16.77..385.16 rows=100 width=33)
Recheck Cond: (val ~ '.*qqq.*'::text)
-> Bitmap Index Scan on trgm_idx (cost=0.00..16.75 rows=100
width=0)
Index Cond: (val ~ '.*qqq.*'::text)
(4 rows)
but I do get a bunch of NOTICE messages with debugging info (no matter
if the GIN index is used or not, so it's somewhere in the common regexp
code). But I guess that's due to WIP status.
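Regarding point 6, the fix amounts to declaring the key type before the struct that embeds it. The sketch below shows only that ordering: the type names come from the error messages, while every field in it is invented for illustration and does not reflect the patch's actual layout.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Field layouts here are hypothetical; only the declaration order matters.
 * The key type must be complete before any struct embeds it by value.
 */
typedef struct
{
    int         prefix_len;     /* invented field */
    int         nstate;         /* invented field */
} TrgmStateKey;

typedef struct
{
    TrgmStateKey enterKey;      /* fine: TrgmStateKey is already defined */
    int         nkeys;          /* invented field */
} TrgmState;

/* Small accessor so the sketch does something observable. */
int
state_prefix_len(const TrgmState *state)
{
    assert(state != NULL);
    return state->enterKey.prefix_len;
}
```

Swapping the two typedefs reproduces an "unknown type name" style error, since C needs the complete type at the point where a member of that type is declared.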
regards
Tomas
Some quick comments.
On Tue, Nov 20, 2012 at 3:02 AM, Tomas Vondra <tv@fuzzy.cz> wrote:
6) It does not compile - I do get a bunch of errors like this
Fixed.
7) Once fixed, it seems to work
CREATE EXTENSION pg_trgm ;
CREATE TABLE TEST (val TEXT);
INSERT INTO test
SELECT md5(i::text) FROM generate_series(1,1000000) s(i);
CREATE INDEX trgm_idx ON test USING gin (val gin_trgm_ops);
ANALYZE test;
EXPLAIN SELECT * FROM test WHERE val ~ '.*qqq.*';
QUERY PLAN
---------------------------------------------------------------------
Bitmap Heap Scan on test (cost=16.77..385.16 rows=100 width=33)
Recheck Cond: (val ~ '.*qqq.*'::text)
-> Bitmap Index Scan on trgm_idx (cost=0.00..16.75 rows=100
width=0)
Index Cond: (val ~ '.*qqq.*'::text)
(4 rows)
but I do get a bunch of NOTICE messages with debugging info (no matter
if the GIN index is used or not, so it's somewhere in the common regexp
code). But I guess that's due to WIP status.
It's due to the TRGM_REGEXP_DEBUG macro. I've disabled it by default. But I
think the pieces of code hidden behind that macro could be useful for
debugging even after WIP status.
------
With best regards,
Alexander Korotkov.
Glad to see this patch hasn't been totally forgotten. Being able to use
indexes for regular expressions would be really cool!
Back in January, I asked for some high-level description of how the
algorithm works
(http://archives.postgresql.org/message-id/4F187D5C.30701@enterprisedb.com).
That's still sorely needed. Googling around, I found the slides for your
presentation on this from PGConf.EU - it would be great to have the
information from that presentation included in the patch.
- Heikki
Hello
I tested it now and it is working very well!
Tested with UTF-8, case-sensitive and case-insensitive.
Regards
Pavel Stehule
On Tue, Nov 20, 2012 at 3:02 AM, Tomas Vondra <tv@fuzzy.cz> wrote:
2) It's common to use upper-case names for macros, but trgm.h defines
macro "iswordchr" - I see it's moved from trgm_op.c but maybe we
could make it a bit more correct?
3) I see there are two "#ifdef KEEPONLYALNUM" blocks right next to each
other in trgm.h - maybe it'd be a good idea to join them?
4) The two new method prototypes at the end of trgm.h use different
indentation than the rest (spaces only instead of tabs).
These issues are fixed in the attached patch.
Additionally, 3 new macros are introduced: MAX_RESULT_STATES, MAX_RESULT_ARCS,
MAX_RESULT_PATHS. They limit resource usage during regex processing.
------
With best regards,
Alexander Korotkov.
hello
do you plan to support GiST?
Regards
Pavel
On Tue, Nov 20, 2012 at 1:43 PM, Heikki Linnakangas <hlinnakangas@vmware.com>
wrote:
Glad to see this patch hasn't been totally forgotten. Being able to use
indexes for regular expressions would be really cool!
Back in January, I asked for some high-level description of how the
algorithm works
(http://archives.postgresql.org/message-id/4F187D5C.30701@enterprisedb.com).
That's still sorely needed. Googling around, I found the slides for your
presentation on this from PGConf.EU - it would be great to have the
information from that presentation included in the patch.
A new version of the patch is attached. The changes are as follows:
1) A big comment with a high-level description of what is going on.
2) Regression tests.
3) Documentation update.
4) Some more refactoring.
------
With best regards,
Alexander Korotkov.
Hi!
On Wed, Nov 21, 2012 at 12:51 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
do you plan to support GiST?
First, I would note that the pg_trgm GiST opclass is quite ridiculous for
supporting regex search (and, actually, for LIKE/ILIKE search, which is
already implemented too), because in the GiST opclass we store a set of
trigrams in leaf pages. It was designed for trigram similarity search and
makes sense there because it avoids recomputing the trigram set. But for
regex or LIKE/ILIKE search this representation is both lossy and bigger than
the original string itself. Perhaps we could think about another GiST opclass
focused on regex and LIKE/ILIKE search?
However, I can create an additional patch for the current GiST opclass anyway.
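The lossiness mentioned above can be seen on a toy example: a set of trigrams forgets order and repetition, so two different strings can map to the same set. The sketch below is a simplification that handles a single word of single-byte characters, padded with two leading spaces and one trailing space as the pg_trgm documentation describes. The function names are made up and this is not pg_trgm's own extraction code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_TRGM 64

static int
cmp_trgm(const void *a, const void *b)
{
    return memcmp(a, b, 3);
}

/*
 * Collect the distinct trigrams of one padded word into set[], sorted.
 * Returns how many distinct trigrams were stored.
 */
size_t
trigram_set(const char *word, char set[][3])
{
    char        padded[MAX_TRGM];
    size_t      len = strlen(word);
    size_t      n;
    size_t      uniq = 0;

    assert(len > 0 && len + 3 < sizeof(padded));
    memcpy(padded, "  ", 2);
    memcpy(padded + 2, word, len);
    padded[len + 2] = ' ';

    n = len + 1;                /* len + 3 characters give len + 1 trigrams */
    for (size_t i = 0; i < n; i++)
        memcpy(set[i], padded + i, 3);
    qsort(set, n, 3, cmp_trgm);

    for (size_t i = 0; i < n; i++)      /* drop adjacent duplicates */
        if (i == 0 || memcmp(set[i], set[uniq - 1], 3) != 0)
            memmove(set[uniq++], set[i], 3);
    return uniq;
}

/* Returns 1 when both words yield exactly the same trigram set. */
int
same_trigram_set(const char *a, const char *b)
{
    char        sa[MAX_TRGM][3];
    char        sb[MAX_TRGM][3];
    size_t      na = trigram_set(a, sa);
    size_t      nb = trigram_set(b, sb);

    return na == nb && memcmp(sa, sb, na * 3) == 0;
}
```

For instance, "abcab" and "abcabcab" produce the same six trigrams, so a regex such as 'abcabc' cannot be decided from the set alone and the original string has to be rechecked.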
------
With best regards,
Alexander Korotkov.
On 25.11.2012 22:55, Alexander Korotkov wrote:
On Tue, Nov 20, 2012 at 1:43 PM, Heikki Linnakangas <hlinnakangas@vmware.com>
wrote:
Glad to see this patch hasn't been totally forgotten. Being able to use
indexes for regular expressions would be really cool!
Back in January, I asked for some high-level description of how the
algorithm works
(http://archives.postgresql.org/message-id/4F187D5C.30701@enterprisedb.com).
That's still sorely needed. Googling around, I found the slides for your
presentation on this from PGConf.EU - it would be great to have the
information from that presentation included in the patch.
A new version of the patch is attached. The changes are as follows:
1) A big comment with a high-level description of what is going on.
2) Regression tests.
3) Documentation update.
4) Some more refactoring.
Great, that top-level comment helped tremendously! I feel enlightened.
I fixed some spelling, formatting etc. trivial stuff while reading
through the patch, see attached. Below is some feedback on the details:
* I don't like the PG_TRY/CATCH trick. It's not generally safe to catch
an error, without propagating it further or rolling back the whole
(sub)transaction. It might work in this case, as you're only suppressing
errors with the special sqlcode that are used in the same file, but it
nevertheless feels naughty. I believe none of the limits that are being
checked are strict; it's OK to exceed the limits somewhat, as long as
you terminate the processing in a reasonable time, in case of
pathological input. I'd suggest putting an explicit check for the limits
somewhere, and not rely on ereport(). Something like this, in the code
that recurses:
    if (trgmCNFA->arcsCount > MAX_RESULT_ARCS ||
        hash_get_num_entries(trgmCNFA->states) > MAX_RESULT_STATES)
    {
        trgmCNFA->overflowed = true;
        return;
    }
And then check for the overflowed flag at the top level.
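A self-contained sketch of that flag-based approach follows. The struct is a simplified stand-in for the patch's trgmCNFA, the limit value is arbitrary, and the binary-tree recursion is only a placeholder for the real state expansion; none of it is taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_RESULT_STATES 128   /* illustrative value, not the patch's */

/* Simplified stand-in for trgmCNFA with only the fields this sketch needs. */
typedef struct
{
    int         statesCount;
    bool        overflowed;
} SketchCNFA;

/*
 * Recursively "expand" states in a binary-tree shape.  Once the limit is
 * reached, set the flag and unwind quietly instead of raising an error.
 */
void
expand_state(SketchCNFA *cnfa, int depth)
{
    assert(cnfa != NULL);
    if (cnfa->overflowed)
        return;                 /* an earlier call already gave up */
    if (cnfa->statesCount >= MAX_RESULT_STATES)
    {
        cnfa->overflowed = true;
        return;
    }
    cnfa->statesCount++;
    if (depth > 0)
    {
        expand_state(cnfa, depth - 1);
        expand_state(cnfa, depth - 1);
    }
}

/*
 * Top level: returns false when the limit was hit, so the caller can treat
 * the regex as too complex.  *nstates receives the number of states built.
 */
bool
build_graph(int depth, int *nstates)
{
    SketchCNFA  cnfa = {0, false};

    expand_state(&cnfa, depth);
    *nstates = cnfa.statesCount;
    return !cnfa.overflowed;
}
```

Because every recursive call checks the flag first, the work done after hitting the limit is bounded by the depth of the call stack, and no error has to be caught and suppressed.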
* This part of the high-level comment was not clear to me:
* States of the graph produced in the first stage are marked with "keys". Key is a pair
* of a "prefix" and a state of the original automaton. "Prefix" is a last
* characters. So, knowing the prefix is enough to know what is a trigram when we read some new
* character. However, we can know single character of prefix or don't know any
* characters of it. Each state of resulting graph have an "enter key" (with that
* key we've entered this state) and a set of keys which are reachable without
* reading any predictable trigram. The algorithm of processing each state
* of resulting graph are so:
* 1) Add all keys which achievable without reading of any predictable trigram.
* 2) Add outgoing arcs labeled with trigrams.
* Step 2 leads to creation of new states and recursively processing them. So,
* we use depth-first algorithm.
I didn't understand that. Can you elaborate? It might help to work
through an example, with some ascii art depicting the graph.
* It would be nice to add some comments to TrgmCNFA struct, explaining
which fields are valid at which stages. For example, it seems that
'trgms' array is calculated only after building the CNFA, by
getTrgmVector() function, while arcsCount is updated on the fly, while
recursing in the getState() function.
* What is the representation used for the path matrix? Needs a comment.
* What do the getColorinfo() and scanColorMap() functions do? What
exactly does a color represent? What's the tradeoff in choosing
MAX_COLOR_CHARS?
- Heikki
Attachments:
trgm-regexp-0.5-heikki1.patch (text/x-diff)
diff --git a/contrib/pg_trgm/Makefile b/contrib/pg_trgm/Makefile
index 64fd69f..8033733 100644
--- a/contrib/pg_trgm/Makefile
+++ b/contrib/pg_trgm/Makefile
@@ -1,7 +1,7 @@
# contrib/pg_trgm/Makefile
MODULE_big = pg_trgm
-OBJS = trgm_op.o trgm_gist.o trgm_gin.o
+OBJS = trgm_op.o trgm_gist.o trgm_gin.o trgm_regexp.o
EXTENSION = pg_trgm
DATA = pg_trgm--1.0.sql pg_trgm--unpackaged--1.0.sql
diff --git a/contrib/pg_trgm/expected/pg_trgm.out b/contrib/pg_trgm/expected/pg_trgm.out
index 81d0ca8..ee0131f 100644
--- a/contrib/pg_trgm/expected/pg_trgm.out
+++ b/contrib/pg_trgm/expected/pg_trgm.out
@@ -54,7 +54,7 @@ select similarity('wow',' WOW ');
(1 row)
CREATE TABLE test_trgm(t text);
-\copy test_trgm from 'data/trgm.data
+\copy test_trgm from 'data/trgm.data'
select t,similarity(t,'qwertyu0988') as sml from test_trgm where t % 'qwertyu0988' order by sml desc, t;
t | sml
-------------+----------
@@ -3515,6 +3515,47 @@ select * from test2 where t ilike 'qua%';
quark
(1 row)
+select * from test2 where t ~ '[abc]{3}';
+ t
+--------
+ abcdef
+(1 row)
+
+select * from test2 where t ~ 'a[bc]+d';
+ t
+--------
+ abcdef
+(1 row)
+
+select * from test2 where t ~* 'DEF';
+ t
+--------
+ abcdef
+(1 row)
+
+select * from test2 where t ~ 'dEf';
+ t
+---
+(0 rows)
+
+select * from test2 where t ~* '^q';
+ t
+-------
+ quark
+(1 row)
+
+select * from test2 where t ~* '[abc]{3}[def]{3}';
+ t
+--------
+ abcdef
+(1 row)
+
+select * from test2 where t ~ 'q.*rk$';
+ t
+-------
+ quark
+(1 row)
+
drop index test2_idx_gin;
create index test2_idx_gist on test2 using gist (t gist_trgm_ops);
set enable_seqscan=off;
diff --git a/contrib/pg_trgm/pg_trgm--1.0.sql b/contrib/pg_trgm/pg_trgm--1.0.sql
index 8067bd6..ca9bcaa 100644
--- a/contrib/pg_trgm/pg_trgm--1.0.sql
+++ b/contrib/pg_trgm/pg_trgm--1.0.sql
@@ -163,4 +163,6 @@ AS
ALTER OPERATOR FAMILY gin_trgm_ops USING gin ADD
OPERATOR 3 pg_catalog.~~ (text, text),
- OPERATOR 4 pg_catalog.~~* (text, text);
+ OPERATOR 4 pg_catalog.~~* (text, text),
+ OPERATOR 5 pg_catalog.~ (text, text),
+ OPERATOR 6 pg_catalog.~* (text, text);
diff --git a/contrib/pg_trgm/sql/pg_trgm.sql b/contrib/pg_trgm/sql/pg_trgm.sql
index 81ab1e7..7d8a151 100644
--- a/contrib/pg_trgm/sql/pg_trgm.sql
+++ b/contrib/pg_trgm/sql/pg_trgm.sql
@@ -13,7 +13,7 @@ select similarity('wow',' WOW ');
CREATE TABLE test_trgm(t text);
-\copy test_trgm from 'data/trgm.data
+\copy test_trgm from 'data/trgm.data'
select t,similarity(t,'qwertyu0988') as sml from test_trgm where t % 'qwertyu0988' order by sml desc, t;
select t,similarity(t,'gwertyu0988') as sml from test_trgm where t % 'gwertyu0988' order by sml desc, t;
@@ -52,6 +52,14 @@ select * from test2 where t like '%bcd%';
select * from test2 where t like E'%\\bcd%';
select * from test2 where t ilike '%BCD%';
select * from test2 where t ilike 'qua%';
+
+select * from test2 where t ~ '[abc]{3}';
+select * from test2 where t ~ 'a[bc]+d';
+select * from test2 where t ~* 'DEF';
+select * from test2 where t ~ 'dEf';
+select * from test2 where t ~* '^q';
+select * from test2 where t ~* '[abc]{3}[def]{3}';
+select * from test2 where t ~ 'q.*rk$';
drop index test2_idx_gin;
create index test2_idx_gist on test2 using gist (t gist_trgm_ops);
set enable_seqscan=off;
diff --git a/contrib/pg_trgm/trgm.h b/contrib/pg_trgm/trgm.h
index 067f29d..6ec9345 100644
--- a/contrib/pg_trgm/trgm.h
+++ b/contrib/pg_trgm/trgm.h
@@ -7,7 +7,6 @@
#include "access/gist.h"
#include "access/itup.h"
#include "storage/bufpage.h"
-#include "utils/builtins.h"
/* options */
#define LPADDING 2
@@ -28,6 +27,8 @@
#define DistanceStrategyNumber 2
#define LikeStrategyNumber 3
#define ILikeStrategyNumber 4
+#define RegExpStrategyNumber 5
+#define RegExpStrategyNumberICase 6
typedef char trgm[3];
@@ -46,8 +47,10 @@ uint32 trgm2int(trgm *ptr);
#ifdef KEEPONLYALNUM
#define ISPRINTABLECHAR(a) ( isascii( *(unsigned char*)(a) ) && (isalnum( *(unsigned char*)(a) ) || *(unsigned char*)(a)==' ') )
+#define ISWORDCHR(c) (t_isalpha(c) || t_isdigit(c))
#else
#define ISPRINTABLECHAR(a) ( isascii( *(unsigned char*)(a) ) && isprint( *(unsigned char*)(a) ) )
+#define ISWORDCHR(c) (!t_isspace(c))
#endif
#define ISPRINTABLETRGM(t) ( ISPRINTABLECHAR( ((char*)(t)) ) && ISPRINTABLECHAR( ((char*)(t))+1 ) && ISPRINTABLECHAR( ((char*)(t))+2 ) )
@@ -99,11 +102,21 @@ typedef char *BITVECP;
#define GETARR(x) ( (trgm*)( (char*)x+TRGMHDRSIZE ) )
#define ARRNELEM(x) ( ( VARSIZE(x) - TRGMHDRSIZE )/sizeof(trgm) )
+typedef struct
+{
+ int count;
+ char data[0];
+} PackedTrgmPaths;
+
extern float4 trgm_limit;
+#define ERRCODE_TRGM_REGEX_TOO_COMPLEX MAKE_SQLSTATE('T','M','0','0','0')
+
TRGM *generate_trgm(char *str, int slen);
TRGM *generate_wildcard_trgm(const char *str, int slen);
float4 cnt_sml(TRGM *trg1, TRGM *trg2);
bool trgm_contained_by(TRGM *trg1, TRGM *trg2);
+void cnt_trigram(trgm *trgmptr, char *str, int bytelen);
+TRGM *createTrgmCNFA(text *text_re, MemoryContext context, PackedTrgmPaths **paths);
#endif /* __TRGM_H__ */
diff --git a/contrib/pg_trgm/trgm_gin.c b/contrib/pg_trgm/trgm_gin.c
index 114fb78..9f53644 100644
--- a/contrib/pg_trgm/trgm_gin.c
+++ b/contrib/pg_trgm/trgm_gin.c
@@ -75,19 +75,20 @@ gin_extract_value_trgm(PG_FUNCTION_ARGS)
Datum
gin_extract_query_trgm(PG_FUNCTION_ARGS)
{
- text *val = (text *) PG_GETARG_TEXT_P(0);
- int32 *nentries = (int32 *) PG_GETARG_POINTER(1);
- StrategyNumber strategy = PG_GETARG_UINT16(2);
+ text *val = (text *) PG_GETARG_TEXT_P(0);
+ int32 *nentries = (int32 *) PG_GETARG_POINTER(1);
+ StrategyNumber strategy = PG_GETARG_UINT16(2);
- /* bool **pmatch = (bool **) PG_GETARG_POINTER(3); */
- /* Pointer *extra_data = (Pointer *) PG_GETARG_POINTER(4); */
- /* bool **nullFlags = (bool **) PG_GETARG_POINTER(5); */
- int32 *searchMode = (int32 *) PG_GETARG_POINTER(6);
- Datum *entries = NULL;
- TRGM *trg;
- int32 trglen;
- trgm *ptr;
- int32 i;
+ /* bool **pmatch = (bool **) PG_GETARG_POINTER(3); */
+ Pointer **extra_data = (Pointer **) PG_GETARG_POINTER(4);
+ /* bool **nullFlags = (bool **) PG_GETARG_POINTER(5); */
+ int32 *searchMode = (int32 *) PG_GETARG_POINTER(6);
+ Datum *entries = NULL;
+ TRGM *trg;
+ int32 trglen;
+ trgm *ptr;
+ int32 i;
+ PackedTrgmPaths *paths;
switch (strategy)
{
@@ -107,6 +108,32 @@ gin_extract_query_trgm(PG_FUNCTION_ARGS)
*/
trg = generate_wildcard_trgm(VARDATA(val), VARSIZE(val) - VARHDRSZ);
break;
+ case RegExpStrategyNumberICase:
+#ifndef IGNORECASE
+ elog(ERROR, "cannot handle ~* with case-sensitive trigrams");
+#endif
+ /* FALL THRU */
+ case RegExpStrategyNumber:
+ trg = createTrgmCNFA(val, fcinfo->flinfo->fn_mcxt, &paths);
+ if (trg && ARRNELEM(trg) > 0)
+ {
+ /*
+ * Successful of regex processing: store path matrix as an
+ * extra_data.
+ */
+ *extra_data = (Pointer *)palloc0(sizeof(Pointer) *
+ ARRNELEM(trg));
+ for (i = 0; i < ARRNELEM(trg); i++)
+ (*extra_data)[i] = (Pointer)paths;
+ }
+ else
+ {
+ /* No result: have to do full index scan. */
+ *nentries = 0;
+ *searchMode = GIN_SEARCH_MODE_ALL;
+ PG_RETURN_POINTER(entries);
+ }
+ break;
default:
elog(ERROR, "unrecognized strategy number: %d", strategy);
trg = NULL; /* keep compiler quiet */
@@ -147,11 +174,15 @@ gin_trgm_consistent(PG_FUNCTION_ARGS)
/* text *query = PG_GETARG_TEXT_P(2); */
int32 nkeys = PG_GETARG_INT32(3);
- /* Pointer *extra_data = (Pointer *) PG_GETARG_POINTER(4); */
+ Pointer *extra_data = (Pointer *) PG_GETARG_POINTER(4);
bool *recheck = (bool *) PG_GETARG_POINTER(5);
- bool res;
- int32 i,
+ bool res, f;
+ int32 i, j,
ntrue;
+ PackedTrgmPaths *paths;
+ char *path;
+ int pathCount, bitmaskLength = (nkeys + 7) / 8;
+
/* All cases served by this function are inexact */
*recheck = true;
@@ -189,6 +220,37 @@ gin_trgm_consistent(PG_FUNCTION_ARGS)
}
}
break;
+ case RegExpStrategyNumber:
+ case RegExpStrategyNumberICase:
+ if (nkeys < 1)
+ {
+ /* Regex processing gives no result: do full index scan */
+ res = true;
+ break;
+ }
+ /* Try to find path conforming this set of trigrams */
+ paths = (PackedTrgmPaths *)extra_data[0];
+ pathCount = paths->count;
+ res = false;
+ for (i = 0; i < pathCount; i++)
+ {
+ path = &paths->data[bitmaskLength * i];
+ f = true;
+ for (j = 0; j < nkeys; j++)
+ {
+ if (GETBIT(path, j) && !check[j])
+ {
+ f = false;
+ break;
+ }
+ }
+ if (f)
+ {
+ res = true;
+ break;
+ }
+ }
+ break;
default:
elog(ERROR, "unrecognized strategy number: %d", strategy);
res = false; /* keep compiler quiet */
diff --git a/contrib/pg_trgm/trgm_op.c b/contrib/pg_trgm/trgm_op.c
index 87dffd1..71aa938 100644
--- a/contrib/pg_trgm/trgm_op.c
+++ b/contrib/pg_trgm/trgm_op.c
@@ -77,12 +77,6 @@ unique_array(trgm *a, int len)
return curend + 1 - a;
}
-#ifdef KEEPONLYALNUM
-#define iswordchr(c) (t_isalpha(c) || t_isdigit(c))
-#else
-#define iswordchr(c) (!t_isspace(c))
-#endif
-
/*
* Finds first word in string, returns pointer to the word,
* endword points to the character after word
@@ -92,7 +86,7 @@ find_word(char *str, int lenstr, char **endword, int *charlen)
{
char *beginword = str;
- while (beginword - str < lenstr && !iswordchr(beginword))
+ while (beginword - str < lenstr && !ISWORDCHR(beginword))
beginword += pg_mblen(beginword);
if (beginword - str >= lenstr)
@@ -100,7 +94,7 @@ find_word(char *str, int lenstr, char **endword, int *charlen)
*endword = beginword;
*charlen = 0;
- while (*endword - str < lenstr && iswordchr(*endword))
+ while (*endword - str < lenstr && ISWORDCHR(*endword))
{
*endword += pg_mblen(*endword);
(*charlen)++;
@@ -109,8 +103,7 @@ find_word(char *str, int lenstr, char **endword, int *charlen)
return beginword;
}
-#ifdef USE_WIDE_UPPER_LOWER
-static void
+void
cnt_trigram(trgm *tptr, char *str, int bytelen)
{
if (bytelen == 3)
@@ -131,7 +124,6 @@ cnt_trigram(trgm *tptr, char *str, int bytelen)
CPTRGM(tptr, &crc);
}
}
-#endif
/*
* Adds trigrams from words (already padded).
@@ -287,7 +279,7 @@ get_wildcard_part(const char *str, int lenstr,
{
if (in_escape)
{
- if (iswordchr(beginword))
+ if (ISWORDCHR(beginword))
break;
in_escape = false;
in_leading_wildcard_meta = false;
@@ -298,7 +290,7 @@ get_wildcard_part(const char *str, int lenstr,
in_escape = true;
else if (ISWILDCARDCHAR(beginword))
in_leading_wildcard_meta = true;
- else if (iswordchr(beginword))
+ else if (ISWORDCHR(beginword))
break;
else
in_leading_wildcard_meta = false;
@@ -341,7 +333,7 @@ get_wildcard_part(const char *str, int lenstr,
clen = pg_mblen(endword);
if (in_escape)
{
- if (iswordchr(endword))
+ if (ISWORDCHR(endword))
{
memcpy(s, endword, clen);
(*charlen)++;
@@ -369,7 +361,7 @@ get_wildcard_part(const char *str, int lenstr,
in_trailing_wildcard_meta = true;
break;
}
- else if (iswordchr(endword))
+ else if (ISWORDCHR(endword))
{
memcpy(s, endword, clen);
(*charlen)++;
diff --git a/contrib/pg_trgm/trgm_regexp.c b/contrib/pg_trgm/trgm_regexp.c
new file mode 100644
index 0000000..be7d846
--- /dev/null
+++ b/contrib/pg_trgm/trgm_regexp.c
@@ -0,0 +1,1209 @@
+/*
+ * contrib/pg_trgm/trgm_regexp.c - regular expression matching using trigrams
+ *
+ * The general idea of index support for a regular expression (regex) search
+ * is to transform regex to a logical expression on trigrams. For example:
+ *
+ * (ab|cd)efg => ((abe & bef) | (cde & def)) & efg
+ *
+ * If a string matches the regex, then it must match the logical expression of
+ * trigrams. The opposite is not necessarily true, however: a string that matches
+ * the logical expression might not match the original regex. Such false
+ * positives are removed during recheck.
+ *
+ * The algorithm to convert a regex to a logical expression is based on
+ * analysis of an automaton that corresponds to regex. The algorithm consists
+ * of two stages:
+ * 1) Transform the automaton to an automaton-like graph of trigrams.
+ * 2) Collect all minimal paths of that graph into a matrix.
+ *
+ * Automaton we have after processing a regular expression is a graph where
+ * vertices are "states" and arcs are labeled with characters. There are
+ * two special states: "initial" and "final". If you can traverse from the
+ * initial state to the final state, and type given string by arc labels then
+ * the string matches the regular expression.
+ *
+ * We use CNFA (colored non-deterministic finite-state automaton) produced by
+ * the PostgreSQL regex library. CNFA means that:
+ * 1) Characters are grouped into colors, so arcs are labeled with colors.
+ * 2) Multiple outgoing arcs from same state can be labeled with the same color.
+ * This makes the automaton non-deterministic, because it can be in many
+ * states simultaneously.
+ * 3) It has finite number of states (actually infinite-state automata are
+ * almost never considered).
+ *
+ * As result of the first stage we have a CNFA-like graph with the following
+ * property: If you can traverse from the initial state to the final state, via
+ * arcs labeled with trigrams that are present in the string, then the string
+ * might match the regex. Otherwise, it does not. Actually, this graph is a
+ * form of representation of logical expression we need.
+ *
+ * States of the graph produced in the first stage are marked with "keys". Key is a pair
+ * of a "prefix" and a state of the original automaton. "Prefix" is a last
+ * characters. So, knowing the prefix is enough to know what is a trigram when we read some new
+ * character. However, we can know single character of prefix or don't know any
+ * characters of it. Each state of resulting graph have an "enter key" (with that
+ * key we've entered this state) and a set of keys which are reachable without
+ * reading any predictable trigram. The algorithm of processing each state
+ * of resulting graph are so:
+ * 1) Add all keys which achievable without reading of any predictable trigram.
+ * 2) Add outgoing arcs labeled with trigrams.
+ * Step 2 leads to creation of new states and recursively processing them. So,
+ * we use depth-first algorithm.
+ *
+ * At the second stage we collect all the paths in graph from first stage. We
+ * only need "minimal" paths. For example, if we have two paths "abc & bcd" and
+ * "abc & bcd & cde" then the second one doesn't matter because the first one
+ * includes all the strings which the second one does. In order to evade
+ * enumeration of too many non-minimal paths we use breadth-first search with
+ * keeping matrix of minimal paths in each state.
+ */
+#include "postgres.h"
+
+#include "trgm.h"
+
+#include "catalog/pg_collation.h"
+#include "fmgr.h"
+#include "miscadmin.h"
+#include "mb/pg_wchar.h"
+#include "nodes/pg_list.h"
+#include "regex/regex.h"
+#undef INFINITY /* avoid conflict of INFINITY definition */
+#include "regex/regguts.h"
+#include "tsearch/ts_locale.h"
+#include "utils/hsearch.h"
+
+/*
+ * Uncomment to print intermediate stages, for exploring and debugging the
+ * algorithm implementation.
+ */
+/* #define TRGM_REGEXP_DEBUG */
+
+/* How big colors we're considering as separate arcs */
+#define MAX_COLOR_CHARS 4
+
+/*---
+ * Following group of parameters are used in order to limit our computations.
+ * Otherwise regex processing could be too slow and memory-consuming.
+ *
+ * MAX_RESULT_STATES - How many states we allow in result CNFA-like graph
+ * MAX_RESULT_ARCS - How many arcs we allow in result CNFA-like graph
+ * MAX_RESULT_PATHS - How many paths we allow in single path matrix
+ */
+#define MAX_RESULT_STATES 128
+#define MAX_RESULT_ARCS 1024
+#define MAX_RESULT_PATHS 256
+
+/*
+ * Widechar trigram datatype for holding trigram before possible conversion into
+ * CRC32
+ */
+typedef pg_wchar Trgm[3];
+
+/*
+ * Maximum length of multibyte encoding character is 4. So, we can hold it in
+ * 32 bit integer for handling simplicity.
+ */
+typedef uint32 mb_char;
+
+/*----
+ * Attributes of CNFA colors:
+ *
+ * containAlpha - flag indicates if color might contain alphanumeric characters
+ * (which are extracted into trigrams)
+ * charsCount - count of characters in color
+ * chars - characters of color
+ *
+ * All non alphanumeric character are treated as same zero character, because
+ * there are no difference between them for trigrams. Exact value of
+ * "charsCount" and value of "chars" is meaningful only when
+ * charsCount <= MAX_COLOR_CHARS. When charsCount > MAX_COLOR_CHARS we can
+ * expect any characters from this color.
+ */
+typedef struct
+{
+ int charsCount;
+ bool containAlpha;
+ mb_char chars[MAX_COLOR_CHARS];
+} ColorInfo;
+
+/*
+ * Prefix is information about last two read characters when coming into
+ * specific CNFA state. "s" contain that characters itself. But s[0] or both
+ * s[0] and s[1] could be zeros. "ambiguity" flag tells us how to treat that.
+ * If "ambiguity" is false then zeros in s indicates start of trigram. When
+ * "ambiguity" is true then zeros in s indicates that it could be any
+ * characters. Having "ambiguity" flag this structure actually express a class
+ * of prefixes.
+ */
+typedef struct
+{
+ mb_char s[2];
+ bool ambiguity;
+} TrgmPrefix;
+
+/*
+ * "Key" of resulting state: pair of prefix and source CNFA state.
+ */
+typedef struct
+{
+ TrgmPrefix prefix;
+ int nstate;
+} TrgmStateKey;
+
+/*---
+ * State of resulting graph.
+ *
+ * enterKey - a key with which we can enter this state
+ * keys - all keys achievable without reading any predictable trigram
+ * arcs - outgoing arcs of this state
+ * fin - flag indicating that this state is final
+ * queued - flag indicating that this state is queued in the path matrix
+ * collection algorithm
+ */
+typedef struct
+{
+ TrgmStateKey enterKey;
+ List *keys;
+ List *arcs;
+ List *path;
+ bool fin;
+ bool queued;
+} TrgmState;
+
+/*
+ * Arc of trigram CNFA-like structure. Arc is labeled with trigram.
+ */
+typedef struct
+{
+ TrgmState *target;
+ Trgm trgm;
+} TrgmArc;
+
+/*---
+ * A single path is a path matrix of some state.
+ *
+ * queued - flag indicated that this path is new since last processing of this
+ * state and it's queued for processing.
+ * path - bit array representing a path itself.
+ */
+typedef struct
+{
+ bool queued;
+ char path[0];
+} TrgmStatePath;
+
+/*---
+ * Data structure representing all the data we need during regex processing.
+ *
+ * states - hash of states of resulting graph
+ * cnfa - source CNFA of regex
+ * colorInfo - processed information of regex colors
+ * initState - pointer to initial state of resulting graph
+ * trgms - array of all trigrams present in the graph
+ * trgmCount - count of those trigrams
+ * arcsCount - total number of arcs of resulting graph (for resource limiting)
+ * bitmaskLen - length of bitmask representing single path in path matrix
+ * path - resulting path matrix
+ * queue - queue for path matrix producing
+ */
+typedef struct
+{
+ HTAB *states;
+ struct cnfa *cnfa;
+ ColorInfo *colorInfo;
+ TrgmState *initState;
+ Trgm *trgms;
+ int trgmCount;
+ int arcsCount;
+ int bitmaskLen;
+ List *path;
+ List *queue;
+} TrgmCNFA;
+
+static TrgmState *getState(TrgmCNFA *trgmCNFA, TrgmStateKey *key);
+
+/*
+ * Convert pg_wchar to multibyte character.
+ */
+static mb_char
+convertPgWchar(pg_wchar c)
+{
+ /*
+ * "s" has enough of space for, at maximum 4 byte multibyte character and
+ * a zero-byte at the end.
+ */
+ char s[5];
+ char *lowerCased;
+ mb_char result;
+
+ if (c == 0)
+ return 0;
+
+ MemSet(s, 0, sizeof(s));
+ pg_wchar2mb_with_len(&c, s, 1);
+
+ /* Convert to lowercase if needed */
+#ifdef IGNORECASE
+ lowerCased = lowerstr(s);
+#else
+ lowerCased = s;
+#endif
+ strncpy((char *)&result, lowerCased, 4);
+ return result;
+}
+
+/*
+ * Recursive function of colormap scanning.
+ */
+static void
+scanColorMap(union tree tree[NBYTS], union tree *t, ColorInfo *colorInfos,
+ int level, pg_wchar p)
+{
+ int i;
+
+ check_stack_depth();
+
+ if (level < NBYTS - 1)
+ {
+ for (i = 0; i < BYTTAB; i++)
+ {
+ /*
+ * This condition checks if all underlying levels express zero
+ * color. Zero color uses multiple links to same color table. So,
+ * avoid scanning it because it's expensive.
+ */
+ if (t->tptr[i] == &tree[level + 1])
+ continue;
+ /* Recursive scanning of next level color table */
+ scanColorMap(tree, t->tptr[i], colorInfos, level + 1, (p << 8) | i);
+ }
+ }
+ else
+ {
+ p <<= 8;
+ for (i = 0; i < BYTTAB; i++)
+ {
+ ColorInfo *colorInfo = &colorInfos[t->tcolor[i]];
+ int j;
+ pg_wchar c;
+
+ /* Convert to multibyte character */
+ c = convertPgWchar(p | i);
+
+ /* Update color attributes according to next character */
+ if (ISWORDCHR((char *)&c))
+ colorInfo->containAlpha = true;
+ else
+ c = 0;
+ if (colorInfo->charsCount <= MAX_COLOR_CHARS)
+ {
+ bool found = false;
+ for (j = 0; j < colorInfo->charsCount; j++)
+ {
+ if (colorInfo->chars[j] == c)
+ {
+ found = true;
+ break;
+ }
+ }
+ if (found)
+ continue;
+ }
+ if (colorInfo->charsCount < MAX_COLOR_CHARS)
+ colorInfo->chars[colorInfo->charsCount] = c;
+ colorInfo->charsCount++;
+ }
+ }
+}
+
+/*
+ * Obtain attributes of colors.
+ */
+static ColorInfo *
+getColorInfo(regex_t *regex)
+{
+ struct guts *g;
+ struct colormap *cm;
+ ColorInfo *result;
+ int colorsCount;
+
+ g = (struct guts *) regex->re_guts;
+ cm = &g->cmap;
+ colorsCount = cm->max + 1;
+
+ result = (ColorInfo *) palloc0(colorsCount * sizeof(ColorInfo));
+
+ /*
+ * Zero color is a default color which contains all character which aren't
+ * in explicitly expressed classes. Mark that we can expect everything
+ * from it.
+ */
+ result[0].containAlpha = true;
+ result[0].charsCount = MAX_COLOR_CHARS + 1;
+ scanColorMap(cm->tree, &cm->tree[0], result, 0, 0);
+
+#ifdef TRGM_REGEXP_DEBUG
+ {
+ int i;
+ for (i = 0; i < colorsCount; i++)
+ {
+ ColorInfo *colorInfo = &result[i];
+ elog(NOTICE, "COLOR %d %d %d %08X %08X %08X %08X", i,
+ colorInfo->charsCount, colorInfo->containAlpha,
+ colorInfo->chars[0], colorInfo->chars[1],
+ colorInfo->chars[2], colorInfo->chars[3]);
+ }
+ }
+#endif
+ return result;
+}
+
+/*
+ * Check if prefix1 "contains" prefix2. "contains" mean that any exact prefix
+ * (which no ambiguity) which satisfy to prefix2 also satisfy to prefix1.
+ */
+static bool
+prefixContains(TrgmPrefix *prefix1, TrgmPrefix *prefix2)
+{
+ if (prefix1->ambiguity)
+ {
+ if (prefix1->s[1] == 0)
+ {
+ /* Fully ambiguous prefix contains everything */
+ return true;
+ }
+ else
+ {
+ /*
+ * Prefix with only first ambiguous characters contains every prefix
+ * with same second character.
+ */
+ if (prefix1->s[1] == prefix2->s[1])
+ return true;
+ else
+ return false;
+ }
+ }
+ else
+ {
+ /* Exact prefix contains only exactly same prefix */
+ if (prefix1->s[0] == prefix2->s[0] && prefix1->s[1] == prefix2->s[1]
+ && !prefix2->ambiguity)
+ return true;
+ else
+ return false;
+ }
+}
+
+/*
+ * Add all keys which can be achieved without reading any trigram to state of
+ * CNFA-like graph on trigrams.
+ */
+static void
+addKeys(TrgmCNFA *trgmCNFA, TrgmState *state, TrgmStateKey *key)
+{
+ struct carc *s;
+ TrgmStateKey destKey;
+ ListCell *cell, *prev, *next;
+ TrgmStateKey *keyCopy;
+
+ MemSet(&destKey, 0, sizeof(TrgmStateKey));
+
+ /* Adjust list of keys with new one */
+ prev = NULL;
+ cell = list_head(state->keys);
+ while (cell)
+ {
+ TrgmStateKey *listKey = (TrgmStateKey *) lfirst(cell);
+ next = lnext(cell);
+ if (listKey->nstate == key->nstate)
+ {
+ if (prefixContains(&listKey->prefix, &key->prefix))
+ {
+ /* Already had this key: nothing to do */
+ return;
+ }
+ if (prefixContains(&key->prefix, &listKey->prefix))
+ state->keys = list_delete_cell(state->keys, cell, prev);
+ else
+ prev = cell;
+ }
+ else
+ prev = cell;
+ cell = next;
+ }
+ keyCopy = (TrgmStateKey *) palloc(sizeof(TrgmStateKey));
+ memcpy(keyCopy, key, sizeof(TrgmStateKey));
+ state->keys = lappend(state->keys, keyCopy);
+
+ /* Mark final state */
+ if (key->nstate == trgmCNFA->cnfa->post)
+ {
+ state->fin = true;
+ return;
+ }
+
+ s = trgmCNFA->cnfa->states[key->nstate];
+ while (s->co != COLORLESS)
+ {
+ ColorInfo *colorInfo;
+
+ if (s->co == trgmCNFA->cnfa->bos[1])
+ {
+ /* Start of line (^) */
+ destKey.nstate = s->to;
+
+ /* Mark prefix as start of new trigram */
+ destKey.prefix.s[0] = 0;
+ destKey.prefix.s[1] = 0;
+ destKey.prefix.ambiguity = false;
+
+ /* Add key to this state */
+ addKeys(trgmCNFA, state, &destKey);
+ if (state->fin)
+ return;
+ }
+ else if (s->co == trgmCNFA->cnfa->eos[1])
+ {
+ /* End of string ($) */
+ if (key->prefix.ambiguity)
+ {
+ destKey.nstate = s->to;
+
+ /*
+ * Let's think prefix to become ambiguous (in order to prevent
+ * latter fiddling around with keys).
+ */
+ destKey.prefix.s[1] = 0;
+ destKey.prefix.s[0] = 0;
+ destKey.prefix.ambiguity = true;
+
+ /* Prefix is ambiguous, add key to the same state */
+ addKeys(trgmCNFA, state, &destKey);
+ if (state->fin)
+ return;
+ }
+ }
+ else
+ {
+ /* Regular color */
+ colorInfo = &trgmCNFA->colorInfo[s->co];
+
+ if (colorInfo->charsCount > 0 &&
+ colorInfo->charsCount <= MAX_COLOR_CHARS)
+ {
+ /* We can enumerate characters of this color */
+ int i;
+ for (i = 0; i < colorInfo->charsCount; i++)
+ {
+ mb_char c = colorInfo->chars[i];
+
+ /*
+ * ----
+ * Create new prefix
+ * 1 (end of prefix) - current character "c"
+ * 0 (start of prefix) - end of previous prefix
+ *
+ * Zero "c" means non alphanumeric character, this means
+ * new prefix should indicate start of new trigram.
+ */
+ destKey.prefix.s[1] = c;
+ destKey.prefix.s[0] = c ? key->prefix.s[1] : 0;
+
+ destKey.nstate = s->to;
+
+ if (destKey.prefix.s[0] || c == 0)
+ destKey.prefix.ambiguity = false;
+ else
+ destKey.prefix.ambiguity = key->prefix.ambiguity;
+
+ if (key->prefix.ambiguity ||
+ (key->prefix.s[1] == 0 && c == 0))
+ {
+ /*
+ * If we have ambiguity or start of new trigram then add
+ * corresponding keys to same state.
+ */
+ addKeys(trgmCNFA, state, &destKey);
+ if (state->fin)
+ return;
+ }
+ }
+ }
+ else
+ {
+ /*
+ * Can't enumerate characters. Add corresponding key to this
+ * state.
+ */
+ destKey.nstate = s->to;
+ destKey.prefix.s[0] = 0;
+ destKey.prefix.s[1] = 0;
+ destKey.prefix.ambiguity = colorInfo->containAlpha;
+ addKeys(trgmCNFA, state, &destKey);
+ if (state->fin)
+ return;
+ }
+ }
+ s++;
+ }
+}
+
+/*
+ * Add outgoing arc from state if needed.
+ */
+static void
+addArc(TrgmCNFA *trgmCNFA, TrgmState *state, TrgmStateKey *key,
+ TrgmStateKey *destKey, mb_char c)
+{
+ TrgmArc *arc;
+ ListCell *cell2;
+
+ if (key->prefix.ambiguity || (key->prefix.s[1] == 0 && c == 0))
+ return;
+
+ /* If we have the same key here, we don't need to add new arc */
+ foreach(cell2, state->keys)
+ {
+ TrgmStateKey *key2 = (TrgmStateKey *) lfirst(cell2);
+ if (key2->nstate == destKey->nstate &&
+ prefixContains(&key2->prefix, &destKey->prefix))
+ {
+ return;
+ }
+ }
+
+ /* Not found, add new arc */
+ arc = (TrgmArc *) palloc(sizeof(TrgmArc));
+ arc->target = getState(trgmCNFA, destKey);
+ arc->trgm[0] = key->prefix.s[0];
+ arc->trgm[1] = key->prefix.s[1];
+ arc->trgm[2] = c;
+ state->arcs = lappend(state->arcs, arc);
+ trgmCNFA->arcsCount++;
+ if (trgmCNFA->arcsCount > MAX_RESULT_ARCS)
+ ereport(ERROR,
+ (errcode(ERRCODE_TRGM_REGEX_TOO_COMPLEX),
+ errmsg("Too many resulting arcs.")));
+}
+
+/*
+ * Add outgoing arcs from given state.
+ */
+static void
+addArcs(TrgmCNFA *trgmCNFA, TrgmState *state)
+{
+ struct carc *s;
+ TrgmStateKey destKey;
+ ListCell *cell;
+
+ MemSet(&destKey, 0, sizeof(TrgmStateKey));
+
+ /*
+ * Iterate over keys associated with this output state.
+ */
+ foreach(cell, state->keys)
+ {
+ TrgmStateKey *key = (TrgmStateKey *) lfirst(cell);
+ s = trgmCNFA->cnfa->states[key->nstate];
+ while (s->co != COLORLESS)
+ {
+ ColorInfo *colorInfo;
+ if (s->co == trgmCNFA->cnfa->bos[1])
+ {
+ /* Should be already handled by addKeys. */
+ }
+ else if (s->co == trgmCNFA->cnfa->eos[1])
+ {
+ /* End of the string ($) */
+ destKey.nstate = s->to;
+
+ /* Assume prefix to become ambiguous after end of the string */
+ destKey.prefix.s[1] = 0;
+ destKey.prefix.s[0] = 0;
+ destKey.prefix.ambiguity = true;
+
+ addArc(trgmCNFA, state, key, &destKey, 0);
+ }
+ else
+ {
+ /* Regular color */
+ colorInfo = &trgmCNFA->colorInfo[s->co];
+
+ if (colorInfo->charsCount > 0
+ && colorInfo->charsCount <= MAX_COLOR_CHARS)
+ {
+ /* We can enumerate characters of this color */
+ int i;
+ for (i = 0; i < colorInfo->charsCount; i++)
+ {
+ mb_char c = colorInfo->chars[i];
+
+ /*
+ * ----
+ * Create new prefix
+ * 1 (end of prefix) - current character "c"
+ * 0 (start of prefix) - end of previous prefix
+ *
+ * Zero "c" means non alphanumeric character, this means
+ * new prefix should indicate start of new trigram.
+ */
+ destKey.prefix.s[1] = c;
+ destKey.prefix.s[0] = c ? key->prefix.s[1] : 0;
+
+ destKey.nstate = s->to;
+
+ if (destKey.prefix.s[0] || c == 0)
+ destKey.prefix.ambiguity = false;
+ else
+ destKey.prefix.ambiguity = key->prefix.ambiguity;
+
+ addArc(trgmCNFA, state, key, &destKey, c);
+ }
+ }
+ }
+ s++;
+ }
+ }
+}
+
+/*
+ * Get state of trigram CNFA-like graph of given enter key and process it if
+ * needed.
+ */
+static TrgmState *
+getState(TrgmCNFA *trgmCNFA, TrgmStateKey *key)
+{
+ bool found;
+ TrgmState *state;
+
+ state = hash_search(trgmCNFA->states, key, HASH_ENTER, &found);
+ if (found)
+ return state;
+ else
+ {
+ if (hash_get_num_entries(trgmCNFA->states) > MAX_RESULT_STATES)
+ ereport(ERROR,
+ (errcode(ERRCODE_TRGM_REGEX_TOO_COMPLEX),
+ errmsg("Too many resulting states.")));
+ state->arcs = NIL;
+ state->keys = NIL;
+ state->path = NIL;
+ state->fin = false;
+ state->queued = false;
+ addKeys(trgmCNFA, state, key);
+ if (!state->fin)
+ addArcs(trgmCNFA, state);
+ }
+ return state;
+}
+
+#ifdef TRGM_REGEXP_DEBUG
+/*
+ * Log source CNFA.
+ */
+static void
+printCNFA(struct cnfa *cnfa)
+{
+ int state;
+ for (state = 0; state < cnfa->nstates; state++)
+ {
+ struct carc *s;
+ elog(NOTICE, "STATE %d", state);
+ s = cnfa->states[state];
+ while (s->co != COLORLESS)
+ {
+ elog(NOTICE, "ARC %d %d", s->co, s->to);
+ s++;
+ }
+ }
+}
+
+/*
+ * Log resulting trigram-based CNFA-like structure.
+ */
+static void
+printTrgmCNFA(TrgmCNFA *trgmCNFA)
+{
+ HASH_SEQ_STATUS scan_status;
+ TrgmState *state;
+
+ elog(NOTICE, "INITSTATE %p", (void *)trgmCNFA->initState);
+
+ hash_seq_init(&scan_status, trgmCNFA->states);
+ while ((state = (TrgmState *) hash_seq_search(&scan_status)) != NULL)
+ {
+ ListCell *cell;
+ elog(NOTICE, "STATE %p %d", (void *)state, (int)state->fin);
+ foreach(cell, state->keys)
+ {
+ TrgmStateKey *key = (TrgmStateKey *) lfirst(cell);
+ elog(NOTICE, "KEY %08X %08X %d %d", key->prefix.s[0],
+ key->prefix.s[1], (int)key->prefix.ambiguity, key->nstate);
+ }
+
+ foreach(cell, state->arcs)
+ {
+ TrgmArc *arc = (TrgmArc *) lfirst(cell);
+ elog(NOTICE, "ARC %p %08X %08X %08X", (void *)arc->target,
+ (uint32)arc->trgm[0], (uint32)arc->trgm[1], arc->trgm[2]);
+ }
+ }
+}
+
+/*
+ * Log path matrix on trigrams.
+ */
+static void
+printTrgm(TRGM *trg, char *paths)
+{
+ int i,
+ pathsCount,
+ trgmCount,
+ bitmaskSize,
+ j;
+ for (i = 0; i < ARRNELEM(trg); i++)
+ {
+ elog(NOTICE, "TRGM %c %c %c",
+ GETARR(trg)[i][0], GETARR(trg)[i][1], GETARR(trg)[i][2]);
+ }
+
+ trgmCount = ARRNELEM(trg);
+ bitmaskSize = (trgmCount + 7) / 8;
+ pathsCount = *((int *)paths);
+ for (i = 0; i < pathsCount; i++)
+ {
+ char *path = paths + sizeof(int) + i * bitmaskSize;
+ char msg[1024], *p;
+ p = msg;
+
+ for (j = 0; j < trgmCount; j++)
+ {
+ if (GETBIT(path, j))
+ *p++ = '1';
+ else
+ *p++ = '0';
+ }
+ *p = 0;
+ elog(NOTICE, "%s", msg);
+ }
+}
+#endif
+
+/*
+ * Compare function for sorting of Trgm datatype.
+ */
+static int
+trgmCmp(const void *p1, const void *p2)
+{
+ return memcmp(p1, p2, sizeof(Trgm));
+}
+
+/*
+ * Calculate vector of all unique trigrams which are used
+ */
+static void
+getTrgmVector(TrgmCNFA *trgmCNFA)
+{
+ HASH_SEQ_STATUS scan_status;
+ int totalCount = 0, i = 0;
+ Trgm *p1, *p2;
+ TrgmState *state;
+
+ hash_seq_init(&scan_status, trgmCNFA->states);
+ while ((state = (TrgmState *) hash_seq_search(&scan_status)) != NULL)
+ {
+ ListCell *cell;
+ foreach(cell, state->arcs)
+ {
+ totalCount++;
+ }
+ }
+ trgmCNFA->trgms = (Trgm *) palloc(sizeof(Trgm) * totalCount);
+
+ hash_seq_init(&scan_status, trgmCNFA->states);
+ while ((state = (TrgmState *) hash_seq_search(&scan_status)) != NULL)
+ {
+ ListCell *cell;
+ foreach(cell, state->arcs)
+ {
+ TrgmArc *arc = (TrgmArc *) lfirst(cell);
+ memcpy(&trgmCNFA->trgms[i], &arc->trgm, sizeof(Trgm));
+ i++;
+ }
+ }
+
+ if (totalCount < 2)
+ {
+ trgmCNFA->trgmCount = totalCount;
+ return;
+ }
+
+ qsort(trgmCNFA->trgms, totalCount, sizeof(Trgm), trgmCmp);
+
+ /* Remove duplicates */
+ p1 = trgmCNFA->trgms;
+ p2 = trgmCNFA->trgms;
+ while (p1 - trgmCNFA->trgms < totalCount)
+ {
+ if (memcmp(p1, p2, sizeof(Trgm)) != 0)
+ {
+ p2++;
+ memcpy(p2, p1, sizeof(Trgm));
+ }
+ p1++;
+ }
+ trgmCNFA->trgmCount = p2 + 1 - trgmCNFA->trgms;
+}
+
+/*
+ * Compare two paths in trigram CNFA-like structure. "contain" means that "new"
+ * path contains all trigrams from "origin" path. "contained" means that "origin"
+ * contains all trigrams from "new" path.
+ */
+static void
+compareMasks(char *new, char *origin, int len, bool *contain, bool *contained)
+{
+ int i;
+ *contain = true;
+ *contained = true;
+ for (i = 0; i < len; i++)
+ {
+ if ((~new[i]) & origin[i])
+ *contain = false;
+ if (new[i] & (~origin[i]))
+ *contained = false;
+ }
+}
+
+/*
+ * Add new path into path matrix.
+ */
+static void
+addPath(List **pathMatrix, TrgmStatePath *path, int len, bool *modify)
+{
+ ListCell *cell, *prev, *next;
+ int count = 0;
+
+ *modify = false;
+ prev = NULL;
+ cell = list_head(*pathMatrix);
+ while(cell)
+ {
+ bool contain, contained;
+
+ next = lnext(cell);
+ compareMasks(path->path,
+ ((TrgmStatePath *) lfirst(cell))->path,
+ len, &contain, &contained);
+
+ if (contain)
+ {
+ /* We already have same or wider path in matrix. Nothing to do. */
+ return;
+ }
+ if (contained)
+ {
+ /* New path is wider than other path in matrix. Delete latter. */
+ *pathMatrix = list_delete_cell(*pathMatrix, cell, prev);
+ }
+ else
+ {
+ prev = cell;
+ count++;
+ }
+
+ cell = next;
+ }
+ *modify = true;
+ if (count >= MAX_RESULT_PATHS)
+ ereport(ERROR,
+ (errcode(ERRCODE_TRGM_REGEX_TOO_COMPLEX),
+ errmsg( "Too many paths.")));
+ *pathMatrix = lappend(*pathMatrix, path);
+}
+
+/*
+ * Add path to path matrix of corresponding state.
+ */
+static void
+adjustPaths(TrgmCNFA *trgmCNFA, TrgmState *state, TrgmStatePath *path)
+{
+ bool modify;
+
+ if (state->fin)
+ {
+ /*
+ * It's a final state. Add path to the final path matrix.
+ */
+ addPath(&trgmCNFA->path, path, trgmCNFA->bitmaskLen, &modify);
+ }
+ else
+ {
+ addPath(&state->path, path, trgmCNFA->bitmaskLen, &modify);
+
+ /* Did we actually change anything? */
+ if (!modify)
+ return;
+
+ /*
+ * Plan to scan outgoing arcs from this state if it's not done already.
+ */
+ if (!state->queued)
+ {
+ trgmCNFA->queue = lappend(trgmCNFA->queue, state);
+ state->queued = true;
+ }
+ }
+}
+
+/*
+ * Process queue of trigram CNFA-like states until we collect all the paths.
+ */
+static void
+processQueue(TrgmCNFA *trgmCNFA)
+{
+ while (trgmCNFA->queue != NIL)
+ {
+ TrgmState *state = (TrgmState *) linitial(trgmCNFA->queue);
+ ListCell *arcCell, *pathCell;
+
+ state->queued = false;
+ trgmCNFA->queue = list_delete_first(trgmCNFA->queue);
+
+ foreach(pathCell, state->path)
+ {
+ TrgmStatePath *path = (TrgmStatePath *) lfirst(pathCell);
+ if (!path->queued)
+ continue;
+ foreach(arcCell, state->arcs)
+ {
+ TrgmArc *arc = (TrgmArc *) lfirst(arcCell);
+ int index;
+ size_t size;
+ TrgmStatePath *pathCopy;
+
+ /* Create path according to arc (with corresponding bit set) */
+ size = sizeof(bool) + sizeof(char) * trgmCNFA->bitmaskLen;
+ pathCopy = (TrgmStatePath *)palloc(size);
+ memcpy(pathCopy, path, size);
+ index = (Trgm *)bsearch(&arc->trgm, trgmCNFA->trgms,
+ trgmCNFA->trgmCount, sizeof(Trgm), trgmCmp)
+ - trgmCNFA->trgms;
+ SETBIT(pathCopy->path, index);
+ adjustPaths(trgmCNFA, arc->target, pathCopy);
+ }
+ path->queued = false;
+ }
+ }
+}
+
+/*
+ * Convert trigram into trgm datatype.
+ */
+static void
+fillTrgm(trgm *ptrgm, Trgm trgm)
+{
+ char text[14], *p;
+ int i;
+
+ /* Write multibyte string into "text". */
+ p = text;
+ for (i = 0; i < 3; i++)
+ {
+ int len;
+ if (trgm[i] != 0)
+ {
+ len = strnlen((char *)&trgm[i], 4);
+ memcpy(p, &trgm[i], len);
+ p += len;
+ }
+ else
+ {
+ *p++ = ' ';
+ }
+ }
+ *p = 0;
+
+ /* Extract trigrams from "text" */
+ cnt_trigram(ptrgm, text, p - text);
+}
+
+/*
+ * Final pack of path matrix: convert trigrams into trgm datatype and remove
+ * empty columns from the matrix.
+ */
+static TRGM *
+finalPack(TrgmCNFA *trgmCNFA, MemoryContext context, PackedTrgmPaths **pathsOut)
+{
+ ListCell *cell;
+ char *unionPath;
+ int i, nonEmptyCount = 0, j;
+ TRGM *trg;
+ trgm *trgms;
+ int newBitMaskLen, pathIndex = 0;
+ PackedTrgmPaths *newPaths;
+
+ /* Calculate union of all paths in order to detect empty columns */
+ unionPath = (char *) palloc0(sizeof(char) * trgmCNFA->bitmaskLen);
+ foreach(cell, trgmCNFA->path)
+ {
+ char *path = ((TrgmStatePath *) lfirst(cell))->path;
+ for (i = 0; i < trgmCNFA->bitmaskLen; i++)
+ unionPath[i] |= path[i];
+ }
+
+ /* Count non-empty columns */
+ for (i = 0; i < trgmCNFA->trgmCount; i++)
+ {
+ if (GETBIT(unionPath, i))
+ nonEmptyCount++;
+ }
+
+ /* Convert trigrams into trgm datatype */
+ trg = (TRGM *) palloc0(TRGMHDRSIZE + nonEmptyCount * sizeof(trgm));
+ trg->flag = ARRKEY;
+ SET_VARSIZE(trg, CALCGTSIZE(ARRKEY, nonEmptyCount));
+ trgms = GETARR(trg);
+ j = 0;
+ for (i = 0; i < trgmCNFA->trgmCount; i++)
+ {
+ if (GETBIT(unionPath, i))
+ {
+ fillTrgm(&trgms[j], trgmCNFA->trgms[i]);
+ j++;
+ }
+ }
+
+ /* Fill new matrix without empty columns */
+ newBitMaskLen = (nonEmptyCount + 7) / 8;
+ newPaths = (PackedTrgmPaths *)
+ MemoryContextAllocZero(context,
+ offsetof(PackedTrgmPaths, data)
+ + newBitMaskLen * sizeof(char) * list_length(trgmCNFA->path));
+ newPaths->count = list_length(trgmCNFA->path);
+ foreach(cell, trgmCNFA->path)
+ {
+ char *path = ((TrgmStatePath *) lfirst(cell))->path;
+ char *path2 = &newPaths->data[newBitMaskLen * pathIndex];
+ j = 0;
+ for (i = 0; i < trgmCNFA->trgmCount; i++)
+ {
+ if (GETBIT(unionPath, i))
+ {
+ if (GETBIT(path, i))
+ SETBIT(path2, j);
+ j++;
+ }
+ }
+ pathIndex++;
+ }
+ *pathsOut = newPaths;
+ return trg;
+}
+
+/*
+ * Main function of regex processing. Returns an array of trigrams and a paths
+ * matrix of those trigrams.
+ */
+TRGM *
+createTrgmCNFA(text *text_re, MemoryContext context, PackedTrgmPaths **pathsOut)
+{
+ HASHCTL hashCtl;
+ struct guts *g;
+ TrgmStateKey key;
+ TrgmCNFA trgmCNFA;
+ TrgmStatePath *path;
+ regex_t *regex;
+ TRGM *trg;
+ MemoryContext ccxt = CurrentMemoryContext;
+
+ PG_TRY();
+ {
+#ifdef IGNORECASE
+ regex = RE_compile_and_cache(text_re, REG_ADVANCED | REG_ICASE, DEFAULT_COLLATION_OID);
+#else
+ regex = RE_compile_and_cache(text_re, REG_ADVANCED, DEFAULT_COLLATION_OID);
+#endif
+ g = (struct guts *) regex->re_guts;
+
+ /* Collect color info */
+ trgmCNFA.cnfa = &g->search;
+ trgmCNFA.path = NIL;
+ trgmCNFA.queue = NIL;
+ trgmCNFA.colorInfo = getColorInfo(regex);
+ trgmCNFA.arcsCount = 0;
+
+#ifdef TRGM_REGEXP_DEBUG
+ printCNFA(&g->search);
+#endif
+
+ /* Create hash of states */
+ hashCtl.keysize = sizeof(TrgmStateKey);
+ hashCtl.entrysize = sizeof(TrgmState);
+ hashCtl.hcxt = CurrentMemoryContext;
+ hashCtl.hash = tag_hash;
+ hashCtl.match = memcmp;
+ trgmCNFA.states = hash_create("Trigram CNFA",
+ 1024,
+ &hashCtl,
+ HASH_ELEM | HASH_CONTEXT
+ | HASH_FUNCTION | HASH_COMPARE);
+
+ /* Create initial state of CNFA-like graph on trigrams */
+ MemSet(&key, 0, sizeof(TrgmStateKey));
+ key.prefix.s[0] = 0;
+ key.prefix.s[1] = 0;
+ key.nstate = trgmCNFA.cnfa->pre;
+ key.prefix.ambiguity = true;
+
+ /* Recursively build CNFA-like graph on trigrams */
+ trgmCNFA.initState = getState(&trgmCNFA, &key);
+
+#ifdef TRGM_REGEXP_DEBUG
+ printTrgmCNFA(&trgmCNFA);
+#endif
+
+ /* Get vector of unique trigrams */
+ getTrgmVector(&trgmCNFA);
+ trgmCNFA.bitmaskLen = (trgmCNFA.trgmCount + 7) / 8;
+
+ /*
+ * Add an empty path to the initial state, and create the matrix by
+ * processing queue (BFS search).
+ */
+ path = (TrgmStatePath *) palloc0(sizeof(bool) +
+ sizeof(char) * trgmCNFA.bitmaskLen);
+ path->queued = true;
+ adjustPaths(&trgmCNFA, trgmCNFA.initState, path);
+ processQueue(&trgmCNFA);
+
+ /* Final pack of paths matrix */
+ trg = finalPack(&trgmCNFA, context, pathsOut);
+
+#ifdef TRGM_REGEXP_DEBUG
+ printTrgm(trg, *pathsOut);
+#endif
+ }
+ PG_CATCH();
+ {
+ ErrorData *errdata;
+ MemoryContext ecxt;
+
+ ecxt = MemoryContextSwitchTo(ccxt);
+ errdata = CopyErrorData();
+ if (errdata->sqlerrcode == ERRCODE_TRGM_REGEX_TOO_COMPLEX)
+ {
+ trg = NULL;
+ FlushErrorState();
+ }
+ else
+ {
+ MemoryContextSwitchTo(ecxt);
+ PG_RE_THROW();
+ }
+ }
+ PG_END_TRY();
+ return trg;
+}
diff --git a/doc/src/sgml/pgtrgm.sgml b/doc/src/sgml/pgtrgm.sgml
index 30e5355..c4105e1 100644
--- a/doc/src/sgml/pgtrgm.sgml
+++ b/doc/src/sgml/pgtrgm.sgml
@@ -145,9 +145,10 @@
operator classes that allow you to create an index over a text column for
the purpose of very fast similarity searches. These index types support
the above-described similarity operators, and additionally support
- trigram-based index searches for <literal>LIKE</> and <literal>ILIKE</>
- queries. (These indexes do not support equality nor simple comparison
- operators, so you may need a regular B-tree index too.)
+ trigram-based index searches for <literal>LIKE</>, <literal>ILIKE</>,
+ <literal>~</> and <literal>~*</> queries. (These indexes do not
+ support equality nor simple comparison operators, so you may need a
+ regular B-tree index too.)
</para>
<para>
@@ -203,6 +204,23 @@ SELECT * FROM test_trgm WHERE t LIKE '%foo%bar';
</para>
<para>
+ Beginning in <productname>PostgreSQL</> 9.3, these index types also support
+ index searches for <literal>~</> and <literal>~*</> (regular expression
+ matching), for example
+<programlisting>
+SELECT * FROM test_trgm WHERE t ~ '(foo|bar)';
+</programlisting>
+ The index search works by extracting trigrams from the regular expression
+ and then looking these up in the index. The more trigrams that can be
+ extracted from the regular expression, the more effective the index search is.
+ Unlike B-tree based searches, the search string need not be left-anchored.
+ However, a regular expression is sometimes too complex to analyze; such a
+ search performs the same as when no trigrams can be extracted from the
+ regular expression. In this situation a full index scan or a sequential
+ scan will be performed, depending on the query plan.
+ </para>
+
+ <para>
The choice between GiST and GIN indexing depends on the relative
performance characteristics of GiST and GIN, which are discussed elsewhere.
As a rule of thumb, a GIN index is faster to search than a GiST index, but
diff --git a/src/backend/utils/adt/regexp.c b/src/backend/utils/adt/regexp.c
index 92dfbb1..b30e753 100644
--- a/src/backend/utils/adt/regexp.c
+++ b/src/backend/utils/adt/regexp.c
@@ -128,7 +128,7 @@ static Datum build_regexp_split_result(regexp_matches_ctx *splitctx);
* Pattern is given in the database encoding. We internally convert to
* an array of pg_wchar, which is what Spencer's regex package wants.
*/
-static regex_t *
+regex_t *
RE_compile_and_cache(text *text_re, int cflags, Oid collation)
{
int text_re_len = VARSIZE_ANY_EXHDR(text_re);
diff --git a/src/include/regex/regex.h b/src/include/regex/regex.h
index 616c2c6..7e19e8a 100644
--- a/src/include/regex/regex.h
+++ b/src/include/regex/regex.h
@@ -171,5 +171,6 @@ extern int pg_regprefix(regex_t *, pg_wchar **, size_t *);
extern void pg_regfree(regex_t *);
extern size_t pg_regerror(int, const regex_t *, char *, size_t);
extern void pg_set_regex_collation(Oid collation);
+extern regex_t *RE_compile_and_cache(text *text_re, int cflags, Oid collation);
#endif /* _REGEX_H_ */
On Mon, Nov 26, 2012 at 4:55 PM, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:
Great, that top-level comment helped tremendously! I feel enlightened.
I fixed some spelling, formatting etc. trivial stuff while reading through
the patch, see attached. Below is some feedback on the details:
* I don't like the PG_TRY/CATCH trick. It's not generally safe to catch an
error, without propagating it further or rolling back the whole
(sub)transaction. It might work in this case, as you're only suppressing
errors with the special sqlcode that are used in the same file, but it
nevertheless feels naughty. I believe none of the limits that are being
checked are strict; it's OK to exceed the limits somewhat, as long as you
terminate the processing in a reasonable time, in case of pathological
input. I'd suggest putting an explicit check for the limits somewhere, and
not rely on ereport(). Something like this, in the code that recurses:
if (trgmCNFA->arcsCount > MAX_RESULT_ARCS ||
    hash_get_num_entries(trgmCNFA->states) > MAX_RESULT_STATES)
{
    trgmCNFA->overflowed = true;
    return;
}
The PG_TRY/CATCH trick is replaced with a number of if/return checks. The code
becomes a bit more complicated, but your notes do matter.
And then check for the overflowed flag at the top level.
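The explicit check suggested above can be sketched in isolation as follows. This is only an illustration under stated assumptions: SketchGraph, sketch_expand(), sketch_build() and the two limit values are hypothetical stand-ins, not the patch's structures; only the idea of an overflowed flag tested at the top level comes from the suggestion itself.

```c
#include <stdbool.h>

/* Hypothetical limits; the patch defines its own MAX_RESULT_* values. */
#define SKETCH_MAX_STATES 128
#define SKETCH_MAX_ARCS   512

typedef struct
{
    int  statesCount;   /* states created so far */
    int  arcsCount;     /* arcs created so far */
    bool overflowed;    /* set once any limit has been exceeded */
} SketchGraph;

/*
 * Recursively "expand" a state with the given fanout down to the given
 * depth. Instead of raising an error, stop quietly when a limit is hit.
 */
static void
sketch_expand(SketchGraph *g, int fanout, int depth)
{
    int i;

    if (g->overflowed)
        return;
    if (g->arcsCount > SKETCH_MAX_ARCS || g->statesCount > SKETCH_MAX_STATES)
    {
        g->overflowed = true;
        return;
    }

    g->statesCount++;
    if (depth == 0)
        return;

    for (i = 0; i < fanout; i++)
    {
        g->arcsCount++;
        sketch_expand(g, fanout, depth - 1);
        if (g->overflowed)
            return;         /* unwind without doing more work */
    }
}

/* Top level: false means "too complex", so the caller falls back. */
static bool
sketch_build(int fanout, int depth)
{
    SketchGraph g = {0, 0, false};

    sketch_expand(&g, fanout, depth);
    return !g.overflowed;
}
```

As in the suggestion, the limits are soft: they may be exceeded by a small amount before the flag is noticed, which is acceptable as long as processing stops in reasonable time.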
* This part of the high-level comment was not clear to me:
 * States of the graph produced in the first stage are marked with "keys". Key
 * is a pair of a "prefix" and a state of the original automaton. "Prefix" is
 * a last characters. So, knowing the prefix is enough to know what is a
 * trigram when we read some new character. However, we can know single
 * character of prefix or don't know any characters of it. Each state of
 * resulting graph have an "enter key" (with that key we've entered this
 * state) and a set of keys which are reachable without reading any
 * predictable trigram. The algorithm of processing each state of resulting
 * graph are so:
 * 1) Add all keys which achievable without reading of any predictable trigram.
 * 2) Add outgoing arcs labeled with trigrams.
 * Step 2 leads to creation of new states and recursively processing them. So,
 * we use depth-first algorithm.
I didn't understand that. Can you elaborate? It might help to work through
an example, with some ascii art depicting the graph.
This comment is extended with an example.
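For readers following along, the role of the "prefix" in the quoted comment can be illustrated with a small standalone sketch. It assumes a prefix is simply the last two characters read, each of which may be unknown; SketchPrefix and sketch_read_char() are hypothetical names and do not mirror the patch's TrgmStateKey layout.

```c
#include <stdbool.h>

typedef struct
{
    char c[2];      /* the last two characters read */
    bool known[2];  /* whether each of them is actually known */
} SketchPrefix;

/*
 * Read one more character. If both prefix characters and the new character
 * are known, the three of them form a trigram, which is written to "out",
 * and true is returned. In every case the prefix is shifted so that it
 * holds the last two characters again.
 */
static bool
sketch_read_char(SketchPrefix *p, char next, bool nextKnown, char out[3])
{
    bool haveTrigram = p->known[0] && p->known[1] && nextKnown;

    if (haveTrigram)
    {
        out[0] = p->c[0];
        out[1] = p->c[1];
        out[2] = next;
    }

    p->c[0] = p->c[1];
    p->known[0] = p->known[1];
    p->c[1] = next;
    p->known[1] = nextKnown;
    return haveTrigram;
}
```

Reading "abc" from an empty prefix yields the trigram only on the third character; one unknown character afterwards suppresses trigrams until two known characters have been read again.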
* It would be nice to add some comments to TrgmCNFA struct, explaining
which fields are valid at which stages. For example, it seems that 'trgms'
array is calculated only after building the CNFA, by getTrgmVector()
function, while arcsCount is updated on the fly, while recursing in the
getState() function.
TrgmCNFA comment is extended with this.
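The reason the 'trgms' array only becomes valid after the graph is built is that it is a sorted, de-duplicated vector of all arc labels, which later serves as the column index of the path bitmasks. A minimal sketch of that step, assuming a trigram is three character codes; SketchTrgm and the sketch_* functions are illustrative names, not the patch's:

```c
#include <stdlib.h>
#include <string.h>

typedef struct
{
    unsigned int c[3];  /* three character codes (assumed layout) */
} SketchTrgm;

static int
sketch_trgm_cmp(const void *a, const void *b)
{
    return memcmp(a, b, sizeof(SketchTrgm));
}

/* Sort the array and drop duplicates in place; returns the new length. */
static size_t
sketch_unique_trgms(SketchTrgm *trgms, size_t count)
{
    size_t i, last = 0;

    if (count < 2)
        return count;

    qsort(trgms, count, sizeof(SketchTrgm), sketch_trgm_cmp);
    for (i = 1; i < count; i++)
    {
        if (memcmp(&trgms[i], &trgms[last], sizeof(SketchTrgm)) != 0)
        {
            last++;
            if (last != i)      /* avoid copying an element onto itself */
                trgms[last] = trgms[i];
        }
    }
    return last + 1;
}

/* Position of a trigram in the sorted, unique array, or -1 if absent. */
static long
sketch_trgm_index(const SketchTrgm *trgms, size_t count, const SketchTrgm *key)
{
    const SketchTrgm *hit = bsearch(key, trgms, count, sizeof(SketchTrgm),
                                    sketch_trgm_cmp);

    return hit ? (long) (hit - trgms) : -1;
}
```

The index returned by the lookup is what would be used as the bit position when an arc's trigram is recorded in a path.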
* What is the representation used for the path matrix? Needs a comment.
Comments are added to PackedTrgmPaths and TrgmStatePath.
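As a rough illustration of the representation being asked about: judging from compareMasks() and addPath() in the patch, each path is a bitmask over the unique trigram vector, and the matrix is a set of such rows. The sketch below reads the matrix as an OR of rows, each row being an AND of its trigrams. How the index code actually evaluates it is an assumption here, and the function names are hypothetical.

```c
#include <stdbool.h>

/* true if every bit set in b is also set in a (a is a superset of b) */
static bool
sketch_mask_contains(const unsigned char *a, const unsigned char *b, int len)
{
    int i;

    for (i = 0; i < len; i++)
    {
        if ((unsigned char) ~a[i] & b[i])
            return false;
    }
    return true;
}

/*
 * The text may match if at least one row has all of its trigrams present.
 * "rows" holds nrows bitmasks of len bytes each, stored back to back;
 * "present" is the bitmask of trigrams found in the text.
 */
static bool
sketch_matrix_may_match(const unsigned char *rows, int nrows, int len,
                        const unsigned char *present)
{
    int i;

    for (i = 0; i < nrows; i++)
    {
        if (sketch_mask_contains(present, rows + i * len, len))
            return true;
    }
    return false;
}
```

Under this reading, a new path whose bitmask is a superset of an existing row adds nothing: any text satisfying it already satisfies the existing row, which is why addPath() drops it.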
* What do the getColorInfo()
The getColorInfo comment now references the ColorInfo struct, which has a comment.
and scanColorMap() functions do?
The scanColorMap comment now has a description of the colormap structure.
What exactly does a color represent?
This is added to the top comment.
What's the tradeoff in choosing MAX_COLOR_CHARS?
Comment is added to the macro.
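One way to picture the tradeoff, offered as a reading of the question and not as a description of what the patch does: a color stands for a set of characters that the regex treats identically, and each color on an arc has to be expanded into concrete characters to form trigrams. A larger limit allows more trigrams to be extracted but multiplies their number. The names and the limit value below are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_MAX_COLOR_CHARS 4    /* hypothetical value */

typedef struct
{
    const char *chars;  /* characters that share this color */
    size_t      nchars;
} SketchColor;

/* A color is expanded into concrete characters only if it is small enough. */
static bool
sketch_color_expandable(const SketchColor *color)
{
    return color->nchars > 0 && color->nchars <= SKETCH_MAX_COLOR_CHARS;
}

/*
 * Number of trigrams produced when three colors follow each other,
 * or 0 when any of them is too large to expand.
 */
static size_t
sketch_trigram_count(const SketchColor *a, const SketchColor *b,
                     const SketchColor *c)
{
    if (!sketch_color_expandable(a) ||
        !sketch_color_expandable(b) ||
        !sketch_color_expandable(c))
        return 0;
    return a->nchars * b->nchars * c->nchars;
}
```

For a fragment such as a, then [bc], then [de], this gives 1 * 2 * 2 = 4 candidate trigrams, while a single large character class in any position makes the sequence unusable for extraction.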
------
With best regards,
Alexander Korotkov.
Attachments:
trgm-regexp-0.6.patch.gz (application/x-gzip)
On Mon, November 26, 2012 20:49, Alexander Korotkov wrote:
trgm-regexp-0.6.patch.gz
I ran the simple-minded tests against generated data (similar to the ones I did in January 2012).
The problems of that older version seem pretty much all removed. (although I didn't do much work
on it -- just reran these tests).
I used two instances, 'HEAD' and 'trgm_regex', which were both compiled with
'--enable-depend' '--with-openssl' '--with-perl' '--with-libxml'
Tables used:
table    rowcount      size table | size index (trgm)
azjunk4  10^4 rows      1,171,456 |      9,781,248
azjunk5  10^5 rows     11,706,368 |     65,093,632
azjunk6  10^6 rows    117,030,912 |    726,310,912
azjunk7  10^7 rows  1,170,292,736 |  4,976,189,440
(See my previous emails for a generating script)
Tables contain randomly generated text:
table azjunk7 limit 5;
txt
----------------------------------------------------------------------------------
i kzzhv ssaa zv x xlepzxsgbdkxev v wn dmvqkuwb qxkyvgab gpidaosaqbewqimmai jxj
bvwn zbevtzyhibbn hoctxurutn pvlatjjyxf f runa owpltbcunrbq ux peoook rxwoscbytz
bbjlbbhhkivjivklgbh tvapzogh rj ky ahvgkvvlfudotvqapznludohdoyqrp kvothyclbckbxu
hvic gomewbp izsjifqggyqgzcghdat lb kud ltfqaxqxjjom qkw wqggikgvph yi sftmbjt
edbjfl vtcasudjpgfgjaf swooxygsse flnqd pxzsdmesqhqbzgirswysote muakq agk p w uq
(5 rows)
with index on column 'txt':
create index az7_idx on azjunk7 using gin (txt gin_trgm_ops);
Queries were of the form:
explain analyze select txt from azjunkXX where txt ~ '$REGEX';
The main problem with the January version was that it chose to use the trgm index even when it
could take a long time (hours). This has been resolved as far as I can see, and the results are
now very attractive.
(There does seem to be a very slight regression on the seqscan, but it's so small that I'm not yet
sure it's not noise)
Hardware: AMD FX-8120 with Linux 2.6.32-279.14.1.el6.x86_64 x86_64 GNU/Linux
PostgreSQL 9.3devel-trgm_regex-20121127_2353-e78d288c895bd296e3cb1ca29c7fe2431eef3fcd on
x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.7.2, 64-bit
port instance table regex rows method expl.analyze timing
6543 HEAD azjunk4 x[ae]q 46 Seq Scan 12.962 ms
6554 trgm_regex azjunk4 x[ae]q 46 Bitmap Heap Scan 0.800 ms
6543 HEAD azjunk4 x[ae]{1}q 46 Seq Scan 12.487 ms
6554 trgm_regex azjunk4 x[ae]{1}q 46 Bitmap Heap Scan 0.209 ms
6543 HEAD azjunk4 x[ae]{1,1}q 46 Seq Scan 12.266 ms
6554 trgm_regex azjunk4 x[ae]{1,1}q 46 Bitmap Heap Scan 0.210 ms
6543 HEAD azjunk4 x[ae]{,2}q 0 Seq Scan 14.322 ms
6554 trgm_regex azjunk4 x[ae]{,2}q 0 Bitmap Heap Scan 0.610 ms
6543 HEAD azjunk4 x[ae]{,10}q 0 Seq Scan 20.503 ms
6554 trgm_regex azjunk4 x[ae]{,10}q 0 Bitmap Heap Scan 0.511 ms
6543 HEAD azjunk4 x[ae]{1,2}q 49 Seq Scan 13.060 ms
6554 trgm_regex azjunk4 x[ae]{1,2}q 49 Bitmap Heap Scan 0.429 ms
6543 HEAD azjunk4 x[aei]q 81 Seq Scan 12.487 ms
6554 trgm_regex azjunk4 x[aei]q 81 Bitmap Heap Scan 0.367 ms
6543 HEAD azjunk4 x[aei]{1}q 81 Seq Scan 12.132 ms
6554 trgm_regex azjunk4 x[aei]{1}q 81 Bitmap Heap Scan 0.336 ms
6543 HEAD azjunk4 x[aei]{1,1}q 81 Seq Scan 12.168 ms
6554 trgm_regex azjunk4 x[aei]{1,1}q 81 Bitmap Heap Scan 0.319 ms
6543 HEAD azjunk4 x[aei]{,2}q 0 Seq Scan 14.586 ms
6554 trgm_regex azjunk4 x[aei]{,2}q 0 Bitmap Heap Scan 0.621 ms
6543 HEAD azjunk4 x[aei]{,10}q 0 Seq Scan 20.134 ms
6554 trgm_regex azjunk4 x[aei]{,10}q 0 Bitmap Heap Scan 0.552 ms
6543 HEAD azjunk4 x[aei]{1,2}q 89 Seq Scan 12.553 ms
6554 trgm_regex azjunk4 x[aei]{1,2}q 89 Bitmap Heap Scan 0.916 ms
6543 HEAD azjunk4 x[aei]{1,3}q 89 Seq Scan 13.055 ms
6554 trgm_regex azjunk4 x[aei]{1,3}q 89 Seq Scan 13.064 ms
6543 HEAD azjunk4 x[aei]q 81 Seq Scan 11.856 ms
6554 trgm_regex azjunk4 x[aei]q 81 Bitmap Heap Scan 0.398 ms
6543 HEAD azjunk4 x[aei]{1}q 81 Seq Scan 11.951 ms
6554 trgm_regex azjunk4 x[aei]{1}q 81 Bitmap Heap Scan 0.369 ms
6543 HEAD azjunk4 x[aei]{1,1}q 81 Seq Scan 12.750 ms
6554 trgm_regex azjunk4 x[aei]{1,1}q 81 Bitmap Heap Scan 0.355 ms
6543 HEAD azjunk4 x[aei]{,2}q 0 Seq Scan 14.032 ms
6554 trgm_regex azjunk4 x[aei]{,2}q 0 Bitmap Heap Scan 0.540 ms
6543 HEAD azjunk4 x[aei]{,10}q 0 Seq Scan 20.377 ms
6554 trgm_regex azjunk4 x[aei]{,10}q 0 Bitmap Heap Scan 0.550 ms
6543 HEAD azjunk4 x[aei]{1,2}q 89 Seq Scan 12.706 ms
6554 trgm_regex azjunk4 x[aei]{1,2}q 89 Bitmap Heap Scan 0.969 ms
6543 HEAD azjunk4 x[aei]{1,3}q 89 Seq Scan 13.127 ms
6554 trgm_regex azjunk4 x[aei]{1,3}q 89 Seq Scan 13.025 ms
6543 HEAD azjunk4 x[aeio]q 105 Seq Scan 12.533 ms
6554 trgm_regex azjunk4 x[aeio]q 105 Bitmap Heap Scan 0.391 ms
6543 HEAD azjunk4 x[aeio]{1}q 105 Seq Scan 12.532 ms
6554 trgm_regex azjunk4 x[aeio]{1}q 105 Bitmap Heap Scan 0.362 ms
6543 HEAD azjunk4 x[aeio]{1,1}q 105 Seq Scan 12.323 ms
6554 trgm_regex azjunk4 x[aeio]{1,1}q 105 Bitmap Heap Scan 0.449 ms
6543 HEAD azjunk4 x[aeio]{,2}q 0 Seq Scan 14.417 ms
6554 trgm_regex azjunk4 x[aeio]{,2}q 0 Bitmap Heap Scan 0.844 ms
6543 HEAD azjunk4 x[aeio]{,10}q 0 Seq Scan 23.056 ms
6554 trgm_regex azjunk4 x[aeio]{,10}q 0 Bitmap Heap Scan 0.668 ms
6543 HEAD azjunk4 x[aeio]{1,2}q 121 Seq Scan 13.072 ms
6554 trgm_regex azjunk4 x[aeio]{1,2}q 121 Seq Scan 13.750 ms
6543 HEAD azjunk4 x[aeio]{1,3}q 123 Seq Scan 12.916 ms
6554 trgm_regex azjunk4 x[aeio]{1,3}q 123 Seq Scan 13.078 ms
6543 HEAD azjunk4 x[aeio]{1,4}q 124 Seq Scan 13.478 ms
6554 trgm_regex azjunk4 x[aeio]{1,4}q 124 Seq Scan 14.334 ms
6543 HEAD azjunk4 x[aeio]{2,4}q 19 Seq Scan 13.922 ms
6554 trgm_regex azjunk4 x[aeio]{2,4}q 19 Seq Scan 13.503 ms
6543 HEAD azjunk4 x[aeio]{3,4}q 3 Seq Scan 14.325 ms
6554 trgm_regex azjunk4 x[aeio]{3,4}q 3 Seq Scan 13.429 ms
6543 HEAD azjunk4 x[aeiou]q 134 Seq Scan 12.356 ms
6554 trgm_regex azjunk4 x[aeiou]q 134 Seq Scan 13.215 ms
6543 HEAD azjunk4 x[aeiou]{1}q 134 Seq Scan 13.005 ms
6554 trgm_regex azjunk4 x[aeiou]{1}q 134 Seq Scan 12.893 ms
6543 HEAD azjunk4 x[aeiou]{1,1}q 134 Seq Scan 12.430 ms
6554 trgm_regex azjunk4 x[aeiou]{1,1}q 134 Seq Scan 13.108 ms
6543 HEAD azjunk4 x[aeiou]{,2}q 0 Seq Scan 14.486 ms
6554 trgm_regex azjunk4 x[aeiou]{,2}q 0 Bitmap Heap Scan 0.349 ms
6543 HEAD azjunk4 x[aeiou]{,10}q 0 Seq Scan 21.597 ms
6554 trgm_regex azjunk4 x[aeiou]{,10}q 0 Bitmap Heap Scan 0.363 ms
6543 HEAD azjunk4 x[aeiou]{1,2}q 156 Seq Scan 13.069 ms
6554 trgm_regex azjunk4 x[aeiou]{1,2}q 156 Seq Scan 13.879 ms
6543 HEAD azjunk4 x[aeiou]{1,3}q 160 Seq Scan 13.005 ms
6554 trgm_regex azjunk4 x[aeiou]{1,3}q 160 Seq Scan 14.016 ms
6543 HEAD azjunk4 x[aeiou]{1,4}q 161 Seq Scan 13.603 ms
6554 trgm_regex azjunk4 x[aeiou]{1,4}q 161 Seq Scan 14.667 ms
6543 HEAD azjunk4 x[aeiou]{2,4}q 27 Seq Scan 13.656 ms
6554 trgm_regex azjunk4 x[aeiou]{2,4}q 27 Seq Scan 14.113 ms
6543 HEAD azjunk4 x[aeiou]{3,4}q 5 Seq Scan 13.541 ms
6554 trgm_regex azjunk4 x[aeiou]{3,4}q 5 Seq Scan 14.265 ms
6543 HEAD azjunk4 x[aeiou]{1,5}q 162 Seq Scan 13.750 ms
6554 trgm_regex azjunk4 x[aeiou]{1,5}q 162 Seq Scan 13.858 ms
6543 HEAD azjunk4 x[aeiou]{2,5}q 28 Seq Scan 13.745 ms
6554 trgm_regex azjunk4 x[aeiou]{2,5}q 28 Seq Scan 13.680 ms
6543 HEAD azjunk4 x[aeiou]{4,5}q 2 Seq Scan 13.577 ms
6554 trgm_regex azjunk4 x[aeiou]{4,5}q 2 Seq Scan 13.755 ms
6543 HEAD azjunk4 x[aeiouy]q 173 Seq Scan 12.106 ms
6554 trgm_regex azjunk4 x[aeiouy]q 173 Seq Scan 12.655 ms
6543 HEAD azjunk4 x[aeiouy]{1}q 173 Seq Scan 12.488 ms
6554 trgm_regex azjunk4 x[aeiouy]{1}q 173 Seq Scan 12.723 ms
6543 HEAD azjunk4 x[aeiouy]{1,1}q 173 Seq Scan 12.525 ms
6554 trgm_regex azjunk4 x[aeiouy]{1,1}q 173 Seq Scan 13.957 ms
6543 HEAD azjunk4 x[aeiouy]{,2}q 0 Seq Scan 14.574 ms
6554 trgm_regex azjunk4 x[aeiouy]{,2}q 0 Bitmap Heap Scan 0.295 ms
6543 HEAD azjunk4 x[aeiouy]{,10}q 0 Seq Scan 20.916 ms
6554 trgm_regex azjunk4 x[aeiouy]{,10}q 0 Bitmap Heap Scan 0.311 ms
6543 HEAD azjunk4 x[aeiouy]{1,2}q 204 Seq Scan 12.730 ms
6554 trgm_regex azjunk4 x[aeiouy]{1,2}q 204 Seq Scan 13.392 ms
6543 HEAD azjunk4 x[aeiouy]{1,3}q 215 Seq Scan 12.824 ms
6554 trgm_regex azjunk4 x[aeiouy]{1,3}q 215 Seq Scan 13.083 ms
6543 HEAD azjunk4 x[aeiouy]{1,4}q 218 Seq Scan 13.985 ms
6554 trgm_regex azjunk4 x[aeiouy]{1,4}q 218 Seq Scan 13.890 ms
6543 HEAD azjunk4 x[aeiouy]{2,4}q 46 Seq Scan 13.735 ms
6554 trgm_regex azjunk4 x[aeiouy]{2,4}q 46 Seq Scan 13.724 ms
6543 HEAD azjunk4 x[aeiouy]{3,4}q 14 Seq Scan 13.470 ms
6554 trgm_regex azjunk4 x[aeiouy]{3,4}q 14 Seq Scan 14.046 ms
6543 HEAD azjunk4 x[aeiouy]{1,5}q 219 Seq Scan 14.245 ms
6554 trgm_regex azjunk4 x[aeiouy]{1,5}q 219 Seq Scan 14.370 ms
6543 HEAD azjunk4 x[aeiouy]{2,5}q 47 Seq Scan 13.483 ms
6554 trgm_regex azjunk4 x[aeiouy]{2,5}q 47 Seq Scan 15.065 ms
6543 HEAD azjunk4 x[aeiouy]{4,5}q 4 Seq Scan 13.394 ms
6554 trgm_regex azjunk4 x[aeiouy]{4,5}q 4 Seq Scan 14.158 ms
6543 HEAD azjunk5 x[ae]q 677 Seq Scan 114.862 ms
6554 trgm_regex azjunk5 x[ae]q 677 Bitmap Heap Scan 2.213 ms
6543 HEAD azjunk5 x[ae]{1}q 677 Seq Scan 119.024 ms
6554 trgm_regex azjunk5 x[ae]{1}q 677 Bitmap Heap Scan 1.800 ms
6543 HEAD azjunk5 x[ae]{1,1}q 677 Seq Scan 118.500 ms
6554 trgm_regex azjunk5 x[ae]{1,1}q 677 Bitmap Heap Scan 1.788 ms
6543 HEAD azjunk5 x[ae]{,2}q 0 Seq Scan 138.023 ms
6554 trgm_regex azjunk5 x[ae]{,2}q 0 Bitmap Heap Scan 9.822 ms
6543 HEAD azjunk5 x[ae]{,10}q 0 Seq Scan 223.479 ms
6554 trgm_regex azjunk5 x[ae]{,10}q 0 Bitmap Heap Scan 9.141 ms
6543 HEAD azjunk5 x[ae]{1,2}q 723 Seq Scan 122.973 ms
6554 trgm_regex azjunk5 x[ae]{1,2}q 723 Bitmap Heap Scan 3.865 ms
6543 HEAD azjunk5 x[aei]q 982 Seq Scan 121.424 ms
6554 trgm_regex azjunk5 x[aei]q 982 Bitmap Heap Scan 2.639 ms
6543 HEAD azjunk5 x[aei]{1}q 982 Seq Scan 119.213 ms
6554 trgm_regex azjunk5 x[aei]{1}q 982 Bitmap Heap Scan 2.769 ms
6543 HEAD azjunk5 x[aei]{1,1}q 982 Seq Scan 121.673 ms
6554 trgm_regex azjunk5 x[aei]{1,1}q 982 Bitmap Heap Scan 2.657 ms
6543 HEAD azjunk5 x[aei]{,2}q 0 Seq Scan 142.256 ms
6554 trgm_regex azjunk5 x[aei]{,2}q 0 Bitmap Heap Scan 5.588 ms
6543 HEAD azjunk5 x[aei]{,10}q 0 Seq Scan 214.769 ms
6554 trgm_regex azjunk5 x[aei]{,10}q 0 Bitmap Heap Scan 9.007 ms
6543 HEAD azjunk5 x[aei]{1,2}q 1075 Seq Scan 128.672 ms
6554 trgm_regex azjunk5 x[aei]{1,2}q 1075 Bitmap Heap Scan 8.079 ms
6543 HEAD azjunk5 x[aei]{1,3}q 1086 Seq Scan 127.069 ms
6554 trgm_regex azjunk5 x[aei]{1,3}q 1086 Bitmap Heap Scan 27.654 ms
6543 HEAD azjunk5 x[aei]q 982 Seq Scan 121.431 ms
6554 trgm_regex azjunk5 x[aei]q 982 Bitmap Heap Scan 2.782 ms
6543 HEAD azjunk5 x[aei]{1}q 982 Seq Scan 121.270 ms
6554 trgm_regex azjunk5 x[aei]{1}q 982 Bitmap Heap Scan 2.603 ms
6543 HEAD azjunk5 x[aei]{1,1}q 982 Seq Scan 120.032 ms
6554 trgm_regex azjunk5 x[aei]{1,1}q 982 Bitmap Heap Scan 2.627 ms
6543 HEAD azjunk5 x[aei]{,2}q 0 Seq Scan 143.379 ms
6554 trgm_regex azjunk5 x[aei]{,2}q 0 Bitmap Heap Scan 4.906 ms
6543 HEAD azjunk5 x[aei]{,10}q 0 Seq Scan 196.212 ms
6554 trgm_regex azjunk5 x[aei]{,10}q 0 Bitmap Heap Scan 4.707 ms
6543 HEAD azjunk5 x[aei]{1,2}q 1075 Seq Scan 127.050 ms
6554 trgm_regex azjunk5 x[aei]{1,2}q 1075 Bitmap Heap Scan 8.474 ms
6543 HEAD azjunk5 x[aei]{1,3}q 1086 Seq Scan 127.090 ms
6554 trgm_regex azjunk5 x[aei]{1,3}q 1086 Bitmap Heap Scan 27.646 ms
6543 HEAD azjunk5 x[aeio]q 1292 Seq Scan 119.951 ms
6554 trgm_regex azjunk5 x[aeio]q 1292 Bitmap Heap Scan 3.881 ms
6543 HEAD azjunk5 x[aeio]{1}q 1292 Seq Scan 123.444 ms
6554 trgm_regex azjunk5 x[aeio]{1}q 1292 Bitmap Heap Scan 3.346 ms
6543 HEAD azjunk5 x[aeio]{1,1}q 1292 Seq Scan 124.024 ms
6554 trgm_regex azjunk5 x[aeio]{1,1}q 1292 Bitmap Heap Scan 3.681 ms
6543 HEAD azjunk5 x[aeio]{,2}q 0 Seq Scan 152.181 ms
6554 trgm_regex azjunk5 x[aeio]{,2}q 0 Bitmap Heap Scan 5.774 ms
6543 HEAD azjunk5 x[aeio]{,10}q 0 Seq Scan 214.168 ms
6554 trgm_regex azjunk5 x[aeio]{,10}q 0 Bitmap Heap Scan 12.074 ms
6543 HEAD azjunk5 x[aeio]{1,2}q 1441 Seq Scan 128.491 ms
6554 trgm_regex azjunk5 x[aeio]{1,2}q 1441 Bitmap Heap Scan 22.538 ms
6543 HEAD azjunk5 x[aeio]{1,3}q 1461 Seq Scan 132.987 ms
6554 trgm_regex azjunk5 x[aeio]{1,3}q 1461 Seq Scan 125.682 ms
6543 HEAD azjunk5 x[aeio]{1,4}q 1464 Seq Scan 132.729 ms
6554 trgm_regex azjunk5 x[aeio]{1,4}q 1464 Seq Scan 133.625 ms
6543 HEAD azjunk5 x[aeio]{2,4}q 175 Seq Scan 135.328 ms
6554 trgm_regex azjunk5 x[aeio]{2,4}q 175 Seq Scan 134.194 ms
6543 HEAD azjunk5 x[aeio]{3,4}q 23 Seq Scan 131.590 ms
6554 trgm_regex azjunk5 x[aeio]{3,4}q 23 Seq Scan 135.435 ms
6543 HEAD azjunk5 x[aeiou]q 1598 Seq Scan 124.063 ms
6554 trgm_regex azjunk5 x[aeiou]q 1598 Seq Scan 124.983 ms
6543 HEAD azjunk5 x[aeiou]{1}q 1598 Seq Scan 134.563 ms
6554 trgm_regex azjunk5 x[aeiou]{1}q 1598 Seq Scan 128.089 ms
6543 HEAD azjunk5 x[aeiou]{1,1}q 1598 Seq Scan 124.158 ms
6554 trgm_regex azjunk5 x[aeiou]{1,1}q 1598 Seq Scan 128.355 ms
6543 HEAD azjunk5 x[aeiou]{,2}q 0 Seq Scan 144.541 ms
6554 trgm_regex azjunk5 x[aeiou]{,2}q 0 Bitmap Heap Scan 2.369 ms
6543 HEAD azjunk5 x[aeiou]{,10}q 0 Seq Scan 208.091 ms
6554 trgm_regex azjunk5 x[aeiou]{,10}q 0 Bitmap Heap Scan 2.528 ms
6543 HEAD azjunk5 x[aeiou]{1,2}q 1838 Seq Scan 130.474 ms
6554 trgm_regex azjunk5 x[aeiou]{1,2}q 1838 Seq Scan 130.433 ms
6543 HEAD azjunk5 x[aeiou]{1,3}q 1886 Seq Scan 134.002 ms
6554 trgm_regex azjunk5 x[aeiou]{1,3}q 1886 Seq Scan 134.786 ms
6543 HEAD azjunk5 x[aeiou]{1,4}q 1892 Seq Scan 137.588 ms
6554 trgm_regex azjunk5 x[aeiou]{1,4}q 1892 Seq Scan 145.194 ms
6543 HEAD azjunk5 x[aeiou]{2,4}q 299 Seq Scan 136.125 ms
6554 trgm_regex azjunk5 x[aeiou]{2,4}q 299 Seq Scan 138.212 ms
6543 HEAD azjunk5 x[aeiou]{3,4}q 54 Seq Scan 135.205 ms
6554 trgm_regex azjunk5 x[aeiou]{3,4}q 54 Seq Scan 134.146 ms
6543 HEAD azjunk5 x[aeiou]{1,5}q 1895 Seq Scan 137.151 ms
6554 trgm_regex azjunk5 x[aeiou]{1,5}q 1895 Seq Scan 140.986 ms
6543 HEAD azjunk5 x[aeiou]{2,5}q 302 Seq Scan 142.189 ms
6554 trgm_regex azjunk5 x[aeiou]{2,5}q 302 Seq Scan 137.368 ms
6543 HEAD azjunk5 x[aeiou]{4,5}q 9 Seq Scan 138.165 ms
6554 trgm_regex azjunk5 x[aeiou]{4,5}q 9 Seq Scan 137.122 ms
6543 HEAD azjunk5 x[aeiouy]q 1913 Seq Scan 126.283 ms
6554 trgm_regex azjunk5 x[aeiouy]q 1913 Seq Scan 130.424 ms
6543 HEAD azjunk5 x[aeiouy]{1}q 1913 Seq Scan 125.947 ms
6554 trgm_regex azjunk5 x[aeiouy]{1}q 1913 Seq Scan 131.957 ms
6543 HEAD azjunk5 x[aeiouy]{1,1}q 1913 Seq Scan 126.529 ms
6554 trgm_regex azjunk5 x[aeiouy]{1,1}q 1913 Seq Scan 130.958 ms
6543 HEAD azjunk5 x[aeiouy]{,2}q 0 Seq Scan 147.704 ms
6554 trgm_regex azjunk5 x[aeiouy]{,2}q 0 Bitmap Heap Scan 2.331 ms
6543 HEAD azjunk5 x[aeiouy]{,10}q 0 Seq Scan 221.774 ms
6554 trgm_regex azjunk5 x[aeiouy]{,10}q 0 Bitmap Heap Scan 2.522 ms
6543 HEAD azjunk5 x[aeiouy]{1,2}q 2275 Seq Scan 134.044 ms
6554 trgm_regex azjunk5 x[aeiouy]{1,2}q 2275 Seq Scan 136.827 ms
6543 HEAD azjunk5 x[aeiouy]{1,3}q 2358 Seq Scan 135.599 ms
6554 trgm_regex azjunk5 x[aeiouy]{1,3}q 2358 Seq Scan 134.196 ms
6543 HEAD azjunk5 x[aeiouy]{1,4}q 2376 Seq Scan 138.685 ms
6554 trgm_regex azjunk5 x[aeiouy]{1,4}q 2376 Seq Scan 141.408 ms
6543 HEAD azjunk5 x[aeiouy]{2,4}q 474 Seq Scan 142.223 ms
6554 trgm_regex azjunk5 x[aeiouy]{2,4}q 474 Seq Scan 143.439 ms
6543 HEAD azjunk5 x[aeiouy]{3,4}q 103 Seq Scan 138.690 ms
6554 trgm_regex azjunk5 x[aeiouy]{3,4}q 103 Seq Scan 136.192 ms
6543 HEAD azjunk5 x[aeiouy]{1,5}q 2381 Seq Scan 140.836 ms
6554 trgm_regex azjunk5 x[aeiouy]{1,5}q 2381 Seq Scan 143.374 ms
6543 HEAD azjunk5 x[aeiouy]{2,5}q 479 Seq Scan 140.223 ms
6554 trgm_regex azjunk5 x[aeiouy]{2,5}q 479 Seq Scan 139.995 ms
6543 HEAD azjunk5 x[aeiouy]{4,5}q 23 Seq Scan 139.976 ms
6554 trgm_regex azjunk5 x[aeiouy]{4,5}q 23 Seq Scan 138.114 ms
6543 HEAD azjunk6 x[ae]q 6448 Seq Scan 1219.490 ms
6554 trgm_regex azjunk6 x[ae]q 6448 Bitmap Heap Scan 23.452 ms
6543 HEAD azjunk6 x[ae]{1}q 6448 Seq Scan 1153.371 ms
6554 trgm_regex azjunk6 x[ae]{1}q 6448 Bitmap Heap Scan 18.492 ms
6543 HEAD azjunk6 x[ae]{1,1}q 6448 Seq Scan 1189.951 ms
6554 trgm_regex azjunk6 x[ae]{1,1}q 6448 Bitmap Heap Scan 24.596 ms
6543 HEAD azjunk6 x[ae]{,2}q 0 Seq Scan 1423.474 ms
6554 trgm_regex azjunk6 x[ae]{,2}q 0 Bitmap Heap Scan 41.593 ms
6543 HEAD azjunk6 x[ae]{,10}q 0 Seq Scan 1957.142 ms
6554 trgm_regex azjunk6 x[ae]{,10}q 0 Bitmap Heap Scan 45.238 ms
6543 HEAD azjunk6 x[ae]{1,2}q 6886 Seq Scan 1253.761 ms
6554 trgm_regex azjunk6 x[ae]{1,2}q 6886 Bitmap Heap Scan 31.247 ms
6543 HEAD azjunk6 x[aei]q 9600 Seq Scan 1203.022 ms
6554 trgm_regex azjunk6 x[aei]q 9600 Bitmap Heap Scan 31.467 ms
6543 HEAD azjunk6 x[aei]{1}q 9600 Seq Scan 1213.834 ms
6554 trgm_regex azjunk6 x[aei]{1}q 9600 Bitmap Heap Scan 26.008 ms
6543 HEAD azjunk6 x[aei]{1,1}q 9600 Seq Scan 1244.158 ms
6554 trgm_regex azjunk6 x[aei]{1,1}q 9600 Bitmap Heap Scan 25.997 ms
6543 HEAD azjunk6 x[aei]{,2}q 0 Seq Scan 1432.935 ms
6554 trgm_regex azjunk6 x[aei]{,2}q 0 Bitmap Heap Scan 44.843 ms
6543 HEAD azjunk6 x[aei]{,10}q 0 Seq Scan 1940.611 ms
6554 trgm_regex azjunk6 x[aei]{,10}q 0 Bitmap Heap Scan 45.838 ms
6543 HEAD azjunk6 x[aei]{1,2}q 10604 Seq Scan 1235.913 ms
6554 trgm_regex azjunk6 x[aei]{1,2}q 10604 Bitmap Heap Scan 78.764 ms
6543 HEAD azjunk6 x[aei]{1,3}q 10704 Seq Scan 1244.960 ms
6554 trgm_regex azjunk6 x[aei]{1,3}q 10704 Bitmap Heap Scan 272.049 ms
6543 HEAD azjunk6 x[aei]q 9600 Seq Scan 1211.965 ms
6554 trgm_regex azjunk6 x[aei]q 9600 Bitmap Heap Scan 26.230 ms
6543 HEAD azjunk6 x[aei]{1}q 9600 Seq Scan 1218.431 ms
6554 trgm_regex azjunk6 x[aei]{1}q 9600 Bitmap Heap Scan 25.462 ms
6543 HEAD azjunk6 x[aei]{1,1}q 9600 Seq Scan 1250.050 ms
6554 trgm_regex azjunk6 x[aei]{1,1}q 9600 Bitmap Heap Scan 25.711 ms
6543 HEAD azjunk6 x[aei]{,2}q 0 Seq Scan 1457.725 ms
6554 trgm_regex azjunk6 x[aei]{,2}q 0 Bitmap Heap Scan 43.491 ms
6543 HEAD azjunk6 x[aei]{,10}q 0 Seq Scan 2034.895 ms
6554 trgm_regex azjunk6 x[aei]{,10}q 0 Bitmap Heap Scan 46.139 ms
6543 HEAD azjunk6 x[aei]{1,2}q 10604 Seq Scan 1250.820 ms
6554 trgm_regex azjunk6 x[aei]{1,2}q 10604 Bitmap Heap Scan 78.067 ms
6543 HEAD azjunk6 x[aei]{1,3}q 10704 Seq Scan 1265.146 ms
6554 trgm_regex azjunk6 x[aei]{1,3}q 10704 Bitmap Heap Scan 274.109 ms
6543 HEAD azjunk6 x[aeio]q 12784 Seq Scan 1235.647 ms
6554 trgm_regex azjunk6 x[aeio]q 12784 Bitmap Heap Scan 35.613 ms
6543 HEAD azjunk6 x[aeio]{1}q 12784 Seq Scan 1206.185 ms
6554 trgm_regex azjunk6 x[aeio]{1}q 12784 Bitmap Heap Scan 39.618 ms
6543 HEAD azjunk6 x[aeio]{1,1}q 12784 Seq Scan 1210.467 ms
6554 trgm_regex azjunk6 x[aeio]{1,1}q 12784 Bitmap Heap Scan 34.513 ms
6543 HEAD azjunk6 x[aeio]{,2}q 0 Seq Scan 1457.918 ms
6554 trgm_regex azjunk6 x[aeio]{,2}q 0 Bitmap Heap Scan 55.732 ms
6543 HEAD azjunk6 x[aeio]{,10}q 0 Seq Scan 2104.860 ms
6554 trgm_regex azjunk6 x[aeio]{,10}q 0 Bitmap Heap Scan 62.129 ms
6543 HEAD azjunk6 x[aeio]{1,2}q 14538 Seq Scan 1286.881 ms
6554 trgm_regex azjunk6 x[aeio]{1,2}q 14538 Bitmap Heap Scan 182.161 ms
6543 HEAD azjunk6 x[aeio]{1,3}q 14761 Seq Scan 1291.199 ms
6554 trgm_regex azjunk6 x[aeio]{1,3}q 14761 Bitmap Heap Scan 1445.593 ms
6543 HEAD azjunk6 x[aeio]{1,4}q 14791 Seq Scan 1331.960 ms
6554 trgm_regex azjunk6 x[aeio]{1,4}q 14791 Seq Scan 1340.845 ms
6543 HEAD azjunk6 x[aeio]{2,4}q 2024 Seq Scan 1337.631 ms
6554 trgm_regex azjunk6 x[aeio]{2,4}q 2024 Seq Scan 1354.844 ms
6543 HEAD azjunk6 x[aeio]{3,4}q 257 Seq Scan 1321.271 ms
6554 trgm_regex azjunk6 x[aeio]{3,4}q 257 Seq Scan 1335.737 ms
6543 HEAD azjunk6 x[aeiou]q 15976 Seq Scan 1237.313 ms
6554 trgm_regex azjunk6 x[aeiou]q 15976 Seq Scan 1268.531 ms
6543 HEAD azjunk6 x[aeiou]{1}q 15976 Seq Scan 1251.777 ms
6554 trgm_regex azjunk6 x[aeiou]{1}q 15976 Seq Scan 1268.431 ms
6543 HEAD azjunk6 x[aeiou]{1,1}q 15976 Seq Scan 1243.416 ms
6554 trgm_regex azjunk6 x[aeiou]{1,1}q 15976 Seq Scan 1263.152 ms
6543 HEAD azjunk6 x[aeiou]{,2}q 0 Seq Scan 1476.587 ms
6554 trgm_regex azjunk6 x[aeiou]{,2}q 0 Bitmap Heap Scan 19.583 ms
6543 HEAD azjunk6 x[aeiou]{,10}q 0 Seq Scan 2084.845 ms
6554 trgm_regex azjunk6 x[aeiou]{,10}q 0 Bitmap Heap Scan 21.377 ms
6543 HEAD azjunk6 x[aeiou]{1,2}q 18692 Seq Scan 1302.585 ms
6554 trgm_regex azjunk6 x[aeiou]{1,2}q 18692 Seq Scan 1330.683 ms
6543 HEAD azjunk6 x[aeiou]{1,3}q 19128 Seq Scan 1290.309 ms
6554 trgm_regex azjunk6 x[aeiou]{1,3}q 19128 Seq Scan 1317.831 ms
6543 HEAD azjunk6 x[aeiou]{1,4}q 19202 Seq Scan 1347.727 ms
6554 trgm_regex azjunk6 x[aeiou]{1,4}q 19202 Seq Scan 1361.307 ms
6543 HEAD azjunk6 x[aeiou]{2,4}q 3268 Seq Scan 1362.704 ms
6554 trgm_regex azjunk6 x[aeiou]{2,4}q 3268 Seq Scan 1372.468 ms
6543 HEAD azjunk6 x[aeiou]{3,4}q 523 Seq Scan 1321.774 ms
6554 trgm_regex azjunk6 x[aeiou]{3,4}q 523 Seq Scan 1346.200 ms
6543 HEAD azjunk6 x[aeiou]{1,5}q 19214 Seq Scan 1367.949 ms
6554 trgm_regex azjunk6 x[aeiou]{1,5}q 19214 Seq Scan 1428.444 ms
6543 HEAD azjunk6 x[aeiou]{2,5}q 3280 Seq Scan 1349.375 ms
6554 trgm_regex azjunk6 x[aeiou]{2,5}q 3280 Seq Scan 1375.887 ms
6543 HEAD azjunk6 x[aeiou]{4,5}q 88 Seq Scan 1324.008 ms
6554 trgm_regex azjunk6 x[aeiou]{4,5}q 88 Seq Scan 1394.067 ms
6543 HEAD azjunk6 x[aeiouy]q 19168 Seq Scan 1262.363 ms
6554 trgm_regex azjunk6 x[aeiouy]q 19168 Seq Scan 1248.167 ms
6543 HEAD azjunk6 x[aeiouy]{1}q 19168 Seq Scan 1257.760 ms
6554 trgm_regex azjunk6 x[aeiouy]{1}q 19168 Seq Scan 1276.502 ms
6543 HEAD azjunk6 x[aeiouy]{1,1}q 19168 Seq Scan 1282.770 ms
6554 trgm_regex azjunk6 x[aeiouy]{1,1}q 19168 Seq Scan 1284.173 ms
6543 HEAD azjunk6 x[aeiouy]{,2}q 0 Seq Scan 1483.940 ms
6554 trgm_regex azjunk6 x[aeiouy]{,2}q 0 Bitmap Heap Scan 20.634 ms
6543 HEAD azjunk6 x[aeiouy]{,10}q 0 Seq Scan 2058.701 ms
6554 trgm_regex azjunk6 x[aeiouy]{,10}q 0 Bitmap Heap Scan 21.596 ms
6543 HEAD azjunk6 x[aeiouy]{1,2}q 23069 Seq Scan 1340.593 ms
6554 trgm_regex azjunk6 x[aeiouy]{1,2}q 23069 Seq Scan 1322.919 ms
6543 HEAD azjunk6 x[aeiouy]{1,3}q 23844 Seq Scan 1321.853 ms
6554 trgm_regex azjunk6 x[aeiouy]{1,3}q 23844 Seq Scan 1333.974 ms
6543 HEAD azjunk6 x[aeiouy]{1,4}q 23993 Seq Scan 1377.787 ms
6554 trgm_regex azjunk6 x[aeiouy]{1,4}q 23993 Seq Scan 1389.073 ms
6543 HEAD azjunk6 x[aeiouy]{2,4}q 4903 Seq Scan 1392.936 ms
6554 trgm_regex azjunk6 x[aeiouy]{2,4}q 4903 Seq Scan 1399.154 ms
6543 HEAD azjunk6 x[aeiouy]{3,4}q 944 Seq Scan 1342.379 ms
6554 trgm_regex azjunk6 x[aeiouy]{3,4}q 944 Seq Scan 1375.420 ms
6543 HEAD azjunk6 x[aeiouy]{1,5}q 24028 Seq Scan 1402.588 ms
6554 trgm_regex azjunk6 x[aeiouy]{1,5}q 24028 Seq Scan 1482.936 ms
6543 HEAD azjunk6 x[aeiouy]{2,5}q 4938 Seq Scan 1378.311 ms
6554 trgm_regex azjunk6 x[aeiouy]{2,5}q 4938 Seq Scan 1402.020 ms
6543 HEAD azjunk6 x[aeiouy]{4,5}q 189 Seq Scan 1348.171 ms
6554 trgm_regex azjunk6 x[aeiouy]{4,5}q 189 Seq Scan 1392.002 ms
6543 HEAD azjunk7 x[ae]q 63781 Seq Scan 11722.978 ms
6554 trgm_regex azjunk7 x[ae]q 63781 Bitmap Heap Scan 418.407 ms
6543 HEAD azjunk7 x[ae]{1}q 63781 Seq Scan 11787.311 ms
6554 trgm_regex azjunk7 x[ae]{1}q 63781 Bitmap Heap Scan 423.027 ms
6543 HEAD azjunk7 x[ae]{1,1}q 63781 Seq Scan 11902.061 ms
6554 trgm_regex azjunk7 x[ae]{1,1}q 63781 Bitmap Heap Scan 420.819 ms
6543 HEAD azjunk7 x[ae]{,2}q 0 Seq Scan 14144.148 ms
6554 trgm_regex azjunk7 x[ae]{,2}q 0 Bitmap Heap Scan 343.806 ms
6543 HEAD azjunk7 x[ae]{,10}q 0 Seq Scan 20390.872 ms
6554 trgm_regex azjunk7 x[ae]{,10}q 0 Bitmap Heap Scan 370.856 ms
6543 HEAD azjunk7 x[ae]{1,2}q 68145 Seq Scan 12569.198 ms
6554 trgm_regex azjunk7 x[ae]{1,2}q 68145 Bitmap Heap Scan 571.570 ms
6543 HEAD azjunk7 x[aei]q 95281 Seq Scan 12027.646 ms
6554 trgm_regex azjunk7 x[aei]q 95281 Bitmap Heap Scan 579.807 ms
6543 HEAD azjunk7 x[aei]{1}q 95281 Seq Scan 12213.674 ms
6554 trgm_regex azjunk7 x[aei]{1}q 95281 Bitmap Heap Scan 581.085 ms
6543 HEAD azjunk7 x[aei]{1,1}q 95281 Seq Scan 12121.898 ms
6554 trgm_regex azjunk7 x[aei]{1,1}q 95281 Bitmap Heap Scan 587.568 ms
6543 HEAD azjunk7 x[aei]{,2}q 0 Seq Scan 14519.020 ms
6554 trgm_regex azjunk7 x[aei]{,2}q 0 Bitmap Heap Scan 440.596 ms
6543 HEAD azjunk7 x[aei]{,10}q 0 Seq Scan 20829.970 ms
6554 trgm_regex azjunk7 x[aei]{,10}q 0 Bitmap Heap Scan 443.136 ms
6543 HEAD azjunk7 x[aei]{1,2}q 105054 Seq Scan 12967.634 ms
6554 trgm_regex azjunk7 x[aei]{1,2}q 105054 Bitmap Heap Scan 1151.202 ms
6543 HEAD azjunk7 x[aei]{1,3}q 106031 Seq Scan 12601.485 ms
6554 trgm_regex azjunk7 x[aei]{1,3}q 106031 Bitmap Heap Scan 3084.092 ms
6543 HEAD azjunk7 x[aei]q 95281 Seq Scan 12251.805 ms
6554 trgm_regex azjunk7 x[aei]q 95281 Bitmap Heap Scan 579.398 ms
6543 HEAD azjunk7 x[aei]{1}q 95281 Seq Scan 12251.196 ms
6554 trgm_regex azjunk7 x[aei]{1}q 95281 Bitmap Heap Scan 579.351 ms
6543 HEAD azjunk7 x[aei]{1,1}q 95281 Seq Scan 12176.216 ms
6554 trgm_regex azjunk7 x[aei]{1,1}q 95281 Bitmap Heap Scan 577.931 ms
6543 HEAD azjunk7 x[aei]{,2}q 0 Seq Scan 14632.855 ms
6554 trgm_regex azjunk7 x[aei]{,2}q 0 Bitmap Heap Scan 434.758 ms
6543 HEAD azjunk7 x[aei]{,10}q 0 Seq Scan 20637.829 ms
6554 trgm_regex azjunk7 x[aei]{,10}q 0 Bitmap Heap Scan 440.237 ms
6543 HEAD azjunk7 x[aei]{1,2}q 105054 Seq Scan 12967.108 ms
6554 trgm_regex azjunk7 x[aei]{1,2}q 105054 Bitmap Heap Scan 1166.260 ms
6543 HEAD azjunk7 x[aei]{1,3}q 106031 Seq Scan 12820.629 ms
6554 trgm_regex azjunk7 x[aei]{1,3}q 106031 Bitmap Heap Scan 3079.662 ms
6543 HEAD azjunk7 x[aeio]q 126868 Seq Scan 12535.441 ms
6554 trgm_regex azjunk7 x[aeio]q 126868 Bitmap Heap Scan 737.634 ms
6543 HEAD azjunk7 x[aeio]{1}q 126868 Seq Scan 12338.188 ms
6554 trgm_regex azjunk7 x[aeio]{1}q 126868 Bitmap Heap Scan 749.844 ms
6543 HEAD azjunk7 x[aeio]{1,1}q 126868 Seq Scan 12579.271 ms
6554 trgm_regex azjunk7 x[aeio]{1,1}q 126868 Bitmap Heap Scan 731.667 ms
6543 HEAD azjunk7 x[aeio]{,2}q 0 Seq Scan 14806.573 ms
6554 trgm_regex azjunk7 x[aeio]{,2}q 0 Bitmap Heap Scan 555.302 ms
6543 HEAD azjunk7 x[aeio]{,10}q 0 Seq Scan 22000.135 ms
6554 trgm_regex azjunk7 x[aeio]{,10}q 0 Bitmap Heap Scan 560.526 ms
6543 HEAD azjunk7 x[aeio]{1,2}q 144180 Seq Scan 12919.840 ms
6554 trgm_regex azjunk7 x[aeio]{1,2}q 144180 Bitmap Heap Scan 2245.885 ms
6543 HEAD azjunk7 x[aeio]{1,3}q 146526 Seq Scan 12807.513 ms
6554 trgm_regex azjunk7 x[aeio]{1,3}q 146526 Bitmap Heap Scan 15261.582 ms
6543 HEAD azjunk7 x[aeio]{1,4}q 146834 Seq Scan 13179.285 ms
6554 trgm_regex azjunk7 x[aeio]{1,4}q 146834 Seq Scan 13874.164 ms
6543 HEAD azjunk7 x[aeio]{2,4}q 20220 Seq Scan 13365.779 ms
6554 trgm_regex azjunk7 x[aeio]{2,4}q 20220 Seq Scan 13544.404 ms
6543 HEAD azjunk7 x[aeio]{3,4}q 2697 Seq Scan 13224.408 ms
6554 trgm_regex azjunk7 x[aeio]{3,4}q 2697 Seq Scan 13699.898 ms
6543 HEAD azjunk7 x[aeiou]q 158778 Seq Scan 12753.739 ms
6554 trgm_regex azjunk7 x[aeiou]q 158778 Seq Scan 12736.813 ms
6543 HEAD azjunk7 x[aeiou]{1}q 158778 Seq Scan 12385.108 ms
6554 trgm_regex azjunk7 x[aeiou]{1}q 158778 Seq Scan 12852.739 ms
6543 HEAD azjunk7 x[aeiou]{1,1}q 158778 Seq Scan 12665.614 ms
6554 trgm_regex azjunk7 x[aeiou]{1,1}q 158778 Seq Scan 12482.476 ms
6543 HEAD azjunk7 x[aeiou]{,2}q 0 Seq Scan 14886.647 ms
6554 trgm_regex azjunk7 x[aeiou]{,2}q 0 Bitmap Heap Scan 197.807 ms
6543 HEAD azjunk7 x[aeiou]{,10}q 0 Seq Scan 21428.416 ms
6554 trgm_regex azjunk7 x[aeiou]{,10}q 0 Bitmap Heap Scan 210.265 ms
6543 HEAD azjunk7 x[aeiou]{1,2}q 185669 Seq Scan 12896.338 ms
6554 trgm_regex azjunk7 x[aeiou]{1,2}q 185669 Seq Scan 13354.702 ms
6543 HEAD azjunk7 x[aeiou]{1,3}q 190274 Seq Scan 12730.517 ms
6554 trgm_regex azjunk7 x[aeiou]{1,3}q 190274 Seq Scan 13026.644 ms
6543 HEAD azjunk7 x[aeiou]{1,4}q 191017 Seq Scan 13664.473 ms
6554 trgm_regex azjunk7 x[aeiou]{1,4}q 191017 Seq Scan 13743.875 ms
6543 HEAD azjunk7 x[aeiou]{2,4}q 32732 Seq Scan 13360.429 ms
6554 trgm_regex azjunk7 x[aeiou]{2,4}q 32732 Seq Scan 13804.770 ms
6543 HEAD azjunk7 x[aeiou]{3,4}q 5449 Seq Scan 13170.928 ms
6554 trgm_regex azjunk7 x[aeiou]{3,4}q 5449 Seq Scan 13394.707 ms
6543 HEAD azjunk7 x[aeiou]{1,5}q 191160 Seq Scan 13591.866 ms
6554 trgm_regex azjunk7 x[aeiou]{1,5}q 191160 Seq Scan 14158.325 ms
6543 HEAD azjunk7 x[aeiou]{2,5}q 32878 Seq Scan 13507.736 ms
6554 trgm_regex azjunk7 x[aeiou]{2,5}q 32878 Seq Scan 13687.159 ms
6543 HEAD azjunk7 x[aeiou]{4,5}q 903 Seq Scan 13329.291 ms
6554 trgm_regex azjunk7 x[aeiou]{4,5}q 903 Seq Scan 13645.331 ms
6543 HEAD azjunk7 x[aeiouy]q 190245 Seq Scan 12331.375 ms
6554 trgm_regex azjunk7 x[aeiouy]q 190245 Seq Scan 12726.390 ms
6543 HEAD azjunk7 x[aeiouy]{1}q 190245 Seq Scan 12752.629 ms
6554 trgm_regex azjunk7 x[aeiouy]{1}q 190245 Seq Scan 12712.805 ms
6543 HEAD azjunk7 x[aeiouy]{1,1}q 190245 Seq Scan 12500.269 ms
6554 trgm_regex azjunk7 x[aeiouy]{1,1}q 190245 Seq Scan 12863.557 ms
6543 HEAD azjunk7 x[aeiouy]{,2}q 0 Seq Scan 14746.988 ms
6554 trgm_regex azjunk7 x[aeiouy]{,2}q 0 Bitmap Heap Scan 194.024 ms
6543 HEAD azjunk7 x[aeiouy]{,10}q 0 Seq Scan 21648.192 ms
6554 trgm_regex azjunk7 x[aeiouy]{,10}q 0 Bitmap Heap Scan 208.955 ms
6543 HEAD azjunk7 x[aeiouy]{1,2}q 228677 Seq Scan 13359.817 ms
6554 trgm_regex azjunk7 x[aeiouy]{1,2}q 228677 Seq Scan 13358.769 ms
6543 HEAD azjunk7 x[aeiouy]{1,3}q 236512 Seq Scan 13191.587 ms
6554 trgm_regex azjunk7 x[aeiouy]{1,3}q 236512 Seq Scan 13504.745 ms
6543 HEAD azjunk7 x[aeiouy]{1,4}q 238061 Seq Scan 13756.733 ms
6554 trgm_regex azjunk7 x[aeiouy]{1,4}q 238061 Seq Scan 13929.557 ms
6543 HEAD azjunk7 x[aeiouy]{2,4}q 48681 Seq Scan 13766.984 ms
6554 trgm_regex azjunk7 x[aeiouy]{2,4}q 48681 Seq Scan 14135.231 ms
6543 HEAD azjunk7 x[aeiouy]{3,4}q 9602 Seq Scan 13429.259 ms
6554 trgm_regex azjunk7 x[aeiouy]{3,4}q 9602 Seq Scan 13539.163 ms
6543 HEAD azjunk7 x[aeiouy]{1,5}q 238407 Seq Scan 13863.009 ms
6554 trgm_regex azjunk7 x[aeiouy]{1,5}q 238407 Seq Scan 14091.161 ms
6543 HEAD azjunk7 x[aeiouy]{2,5}q 49031 Seq Scan 14006.522 ms
6554 trgm_regex azjunk7 x[aeiouy]{2,5}q 49031 Seq Scan 14105.039 ms
6543 HEAD azjunk7 x[aeiouy]{4,5}q 1935 Seq Scan 13718.130 ms
6554 trgm_regex azjunk7 x[aeiouy]{4,5}q 1935 Seq Scan 14032.751 ms
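To make the table easier to read, each HEAD row can be paired with its trgm_regex row and reduced to a speedup ratio. The following is only a sketch: `parse_row` and `speedups` are hypothetical helper names, and the parsing assumes the whitespace-separated layout shown above (port, instance, table, regex, rows, method, timing). The sample rows are copied from the table.

```python
def parse_row(line):
    """Split one result row into (instance, table, regex, method, ms)."""
    tok = line.split()
    # columns: port instance table regex rows method... timing "ms"
    return tok[1], tok[2], tok[3], " ".join(tok[5:-2]), float(tok[-2])

def speedups(lines):
    """Map (table, regex) -> HEAD time divided by trgm_regex time."""
    times = {}
    for line in lines:
        instance, table, regex, _method, ms = parse_row(line)
        times.setdefault((table, regex), {})[instance] = ms
    return {key: t["HEAD"] / t["trgm_regex"]
            for key, t in times.items()
            if t.keys() >= {"HEAD", "trgm_regex"}}

sample = [
    "6543 HEAD azjunk4 x[ae]q 46 Seq Scan 12.962 ms",
    "6554 trgm_regex azjunk4 x[ae]q 46 Bitmap Heap Scan 0.800 ms",
    "6543 HEAD azjunk7 x[aeio]{1,3}q 146526 Seq Scan 12807.513 ms",
    "6554 trgm_regex azjunk7 x[aeio]{1,3}q 146526 Bitmap Heap Scan 15261.582 ms",
]

for (table, regex), ratio in speedups(sample).items():
    print(f"{table} {regex}: {ratio:.2f}x")
```

A ratio above 1 means the patched instance was faster. The second pair is one of the few rows where the bitmap scan was chosen and came out slower than the sequential scan on HEAD.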
(You also asked for testing against real text; I'll probably do some of that too, although I do
not expect all that many differences.)
Thanks, great work!
Erik Rijkers
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
One thing that bothers me with this algorithm is that the overflow
mechanism is all-or-nothing. In many cases, even when there is a huge
number of states in the diagram, you could still extract at least a few
trigrams that must be present in any matching string, with little
effort. At least, it seems like that to a human :-).
For example, consider this:
explain analyze select count(*) from azjunk4 where txt ~
('^aabaacaadaaeaafaagaahaaiaajaakaalaamaanaaoaapaaqaaraasaataauaavaawaaxaayaazabaabbabcabdabeabfabgabhabiabjabkablabmabnaboabpabqabrabsabtabuabvabwabxabyabzacaacbaccacdaceacfacgachaciacjackaclacmacnacoacpacqacracsactacuacvacwacxacyaczadaadbadcaddadeadfadgadhadiadjadkadladmadnadoadpadqadradsadtaduadvadwadxadyadzaeaaebaecaedaeeaefaegaehaeiaejaekaelaemaenaeoaepaeqaeraesaetaeuaevaewaexaeyaezafaafbafcafdafeaffafgafhafiafjafkaflafmafnafoafpafqafrafsaftafuafvafwafxafyafzagaagbagcagdageagfaggaghagiagjagkaglagmagnagoagpagqagragsagtaguagvagwagxagyagzahaahbahcahdaheahfahgahhahiahjahkahlahmahnahoahpahqahrahs$');
you get a query plan like this (the long regexp string edited out):
Aggregate (cost=228148.02..228148.03 rows=1 width=0) (actual time=131.100..131.101 rows=1 loops=1)
-> Bitmap Heap Scan on azjunk4 (cost=228144.01..228148.02 rows=1 width=0) (actual time=131.096..131.096 rows=0 loops=1)
Recheck Cond: (txt ~ <ridiculously long regexp>)
Rows Removed by Index Recheck: 10000
-> Bitmap Index Scan on azjunk4_trgmrgx_txt_01_idx (cost=0.00..228144.01 rows=1 width=0) (actual time=82.914..82.914 rows=10000 loops=1)
Index Cond: (txt ~ <ridiculously long regexp>)
Total runtime: 131.230 ms
(7 rows)
That ridiculously long string exceeds the limit on the number of states
(I think; it could be the number of paths or arcs too), and the algorithm
gives up, resorting to scanning the whole index, as can be seen from the
"Rows Removed by Index Recheck" line. However, it's easy to see that any
matching string must contain *every one* of the trigrams the algorithm
extracts. If it could safely return just a few of them, say "aab" and
"abz", and discard the rest, that would already be much better than a
full index scan.
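The point about keeping only a few trigrams can be checked on a shortened literal. This is a sketch under stated assumptions, not the patch's code: `trigrams` and `candidates` are hypothetical helpers, no pg_trgm-style padding is applied, and a list scan stands in for the index probe. For a pattern that is a plain literal, every trigram of the literal is required, so probing with any subset can only add candidates, never lose a true match.

```python
def trigrams(s):
    """All 3-character substrings of s (no pg_trgm-style padding)."""
    return {s[i:i + 3] for i in range(len(s) - 2)}

def candidates(rows, required):
    """Rows containing every required trigram: what an index probe would return."""
    return [r for r in rows if required <= trigrams(r)]

literal = "aabaacaadaae"          # shortened stand-in for the long regexp
rows = ["aabaacaadaae", "xxaabyyaaczz", "aabzzzz", "zzzzzz"]

full = candidates(rows, trigrams(literal))   # probe with every trigram
partial = candidates(rows, {"aab", "aac"})   # probe with just two of them

# The partial probe returns a superset; the recheck removes the extras.
assert set(full) <= set(partial)
print(full)
print(partial)
```

The extra row returned by the partial probe is a false positive that the heap recheck would discard, which is still far cheaper than rechecking all 10000 rows.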
Would it be safe to simply cut the depth-first search short on
overflow, and proceed with the graph that was constructed up to that point?
- Heikki
On Thu, Nov 29, 2012 at 5:25 PM, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:
> One thing that bothers me with this algorithm is that the overflow
> mechanism is all-or-nothing. In many cases, even when there is a huge
> number of states in the diagram, you could still extract at least a few
> trigrams that must be present in any matching string, with little effort.
> At least, it seems like that to a human :-).
> For example, consider this:
> explain analyze select count(*) from azjunk4 where txt ~
> ('^aabaacaadaaeaafaagaahaaiaajaakaalaamaanaaoaapaaqaaraasaataauaavaawaaxaayaazabaabbabcabdabeabfabgabhabiabjabkablabmabnaboabpabqabrabsabtabuabvabwabxabyabzacaacbaccacdaceacfacgachaciacjackaclacmacnacoacpacqacracsactacuacvacwacxacyaczadaadbadcaddadeadfadgadhadiadjadkadladmadnadoadpadqadradsadtaduadvadwadxadyadzaeaaebaecaedaeeaefaegaehaeiaejaekaelaemaenaeoaepaeqaeraesaetaeuaevaewaexaeyaezafaafbafcafdafeaffafgafhafiafjafkaflafmafnafoafpafqafrafsaftafuafvafwafxafyafzagaagbagcagdageagfaggaghagiagjagkaglagmagnagoagpagqagragsagtaguagvagwagxagyagzahaahbahcahdaheahfahgahhahiahjahkahlahmahnahoahpahqahrahs$');
> you get a query plan like this (the long regexp string edited out):
> Aggregate (cost=228148.02..228148.03 rows=1 width=0) (actual time=131.100..131.101 rows=1 loops=1)
> -> Bitmap Heap Scan on azjunk4 (cost=228144.01..228148.02 rows=1 width=0) (actual time=131.096..131.096 rows=0 loops=1)
> Recheck Cond: (txt ~ <ridiculously long regexp>)
> Rows Removed by Index Recheck: 10000
> -> Bitmap Index Scan on azjunk4_trgmrgx_txt_01_idx (cost=0.00..228144.01 rows=1 width=0) (actual time=82.914..82.914 rows=10000 loops=1)
> Index Cond: (txt ~ <ridiculously long regexp>)
> Total runtime: 131.230 ms
> (7 rows)
> That ridiculously long string exceeds the limit on the number of states
> (I think; it could be the number of paths or arcs too), and the algorithm
> gives up, resorting to scanning the whole index, as can be seen from the
> "Rows Removed by Index Recheck" line. However, it's easy to see that any
> matching string must contain *every one* of the trigrams the algorithm
> extracts. If it could safely return just a few of them, say "aab" and
> "abz", and discard the rest, that would already be much better than a
> full index scan.
> Would it be safe to simply cut the depth-first search short on overflow,
> and proceed with the graph that was constructed up to that point?
With depth-first search it's not. But your proposal does make sense, so I've
changed the search to breadth-first. Then, on overflow, it's safe to mark all
unprocessed states as final. That means we assume every unprocessed branch
finishes immediately with a match (this can give us more false positives,
but no false negatives).
For overflow of matrix collection, it's safe to do just OR between all the
trigrams.
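The overflow rule described above can be sketched roughly as follows. This is only an illustration in Python, not the patch's C code: the graph representation, the `arcs_of` callback, and the `max_states` limit are assumptions made up for the example.

```python
from collections import deque

def build_truncated_bfs(initial, finals, arcs_of, max_states):
    """Expand states breadth-first; on overflow, treat unexpanded states as final.

    arcs_of(state) is a hypothetical callback returning (trigram, next_state)
    pairs. Returns (arcs, finals): the expanded part of the graph and the
    possibly enlarged set of final states.
    """
    arcs = {}
    finals = set(finals)
    queue = deque([initial])
    seen = {initial}
    while queue:
        state = queue.popleft()
        if len(arcs) >= max_states:
            # Overflow: assume this branch matches immediately. That can only
            # add false positives, never false negatives.
            finals.add(state)
            continue
        arcs[state] = list(arcs_of(state))
        for _trigram, nxt in arcs[state]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return arcs, finals

# Toy chain 0 -abc-> 1 -bcd-> 2 -cde-> 3, where 3 is the real final state.
chain = {0: [("abc", 1)], 1: [("bcd", 2)], 2: [("cde", 3)], 3: []}
trunc_arcs, trunc_finals = build_truncated_bfs(0, {3}, lambda s: chain[s], max_states=2)
```

With a limit of two expanded states, state 2 is never expanded and is simply added to the final states, so the truncated graph still accepts everything the full graph would accept.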
New version of patch is attached.
------
With best regards,
Alexander Korotkov.
Attachments:
trgm-regexp-0.7.patch.gz (application/x-gzip)
On Thu, Nov 29, 2012 at 12:58 PM, er <er@xs4all.nl> wrote:
On Mon, November 26, 2012 20:49, Alexander Korotkov wrote:
trgm-regexp-0.6.patch.gz
I ran the simple-minded tests against generated data (similar to the ones
I did in January 2012).
The problems of that older version seem pretty much all removed (although
I didn't do much work on it -- just reran these tests).
Thanks a lot for testing! Could you repeat for 0.7 version of patch which
has new overflow handling?
------
With best regards,
Alexander Korotkov.
On Fri, Nov 30, 2012 at 3:20 PM, Alexander Korotkov <aekorotkov@gmail.com> wrote:
For depth-first it's not.
Oh, I didn't explain it.
In order to stop graph processing, we need to be sure that every state
either has all of its outgoing arcs added or is assumed to be final. With
DFS we can already be producing the final part of the graph while some arc
(with a new trigram) from the initial state directly to the final state has
still not been added. That obviously leads to false negatives.
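A small toy case may make the DFS problem concrete. The sketch below is an assumption-laden illustration (the graph, the arc budget, and the function name are all made up here), not the patch's code: it stops a depth-first expansion when a budget of arcs runs out and shows that the initial state ends up missing one of its arcs.

```python
def dfs_truncated(state, arcs_of, budget, arcs):
    """Depth-first expansion that just stops once the arc budget is used up."""
    for trigram, nxt in arcs_of(state):
        if budget[0] == 0:
            return  # overflow: the remaining arcs of `state` are silently lost
        budget[0] -= 1
        arcs.setdefault(state, []).append((trigram, nxt))
        if nxt not in arcs:
            dfs_truncated(nxt, arcs_of, budget, arcs)

# State 0 is initial, state 4 is final. Besides the long chain there is a
# direct arc 0 -xyz-> 4.
full = {0: [("abc", 1), ("xyz", 4)], 1: [("bcd", 2)],
        2: [("cde", 3)], 3: [("def", 4)], 4: []}
truncated = {}
dfs_truncated(0, lambda s: full[s], [2], truncated)
```

Here the budget is spent deep inside the chain before the search ever returns to state 0, so the "xyz" arc is never recorded. A string that matches only through "xyz" would then be rejected by the index, which is the false negative mentioned above. Marking unexpanded states as final does not help, because state 0 itself is only partially expanded.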
------
With best regards,
Alexander Korotkov.
On 30.11.2012 13:20, Alexander Korotkov wrote:
On Thu, Nov 29, 2012 at 5:25 PM, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:
Would it be safe to simply stop short the depth-first search on overflow,
and proceed with the graph that was constructed up to that point?
For depth-first it's not. But your proposal naturally makes sense. I've
changed it to breadth-first search. And then it's safe to mark all
unprocessed states as final when overflow. It means that we assume every
unprocessed branch to immediately finish with matching (this can give us
more false positives but no false negatives).
For overflow of matrix collection, it's safe to do just OR between all the
trigrams.
New version of patch is attached.
Thanks, sounds good.
I've spent quite a long time trying to understand the transformation the
getState/addKeys/addArcs functions do to the original CNFA graph. I
think that still needs more comments to explain the steps involved in it.
One thing that occurs to me is that it might be better to delay
expanding colors to characters. You could build and transform the graph,
and even create the paths, all while operating on colors. You would end
up with lists of "color trigrams", consisting of sequences of three
colors that must appear in the source string. Only at the last step you
would expand the color trigrams to real character trigrams. I think that
would save a lot of processing while building the graph, if you have
colors that contain many characters. It would allow us to do better than
the fixed small MAX_COLOR_CHARS limit. For example with MAX_COLOR_CHARS
= 4 in the current patch, it's a shame that we can't do anything with a
fairly simple regexp like "^a[b-g]h$"
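To illustrate the late expansion suggested above, here is a minimal sketch. The color ids and the `color_chars` map are hypothetical, and the blank padding pg_trgm adds at word boundaries is ignored; the point is only that one color trigram stands for the cross product of its colors' characters.

```python
from itertools import product

def expand_color_trigram(color_trigram, color_chars):
    """Expand one trigram of colors into all character trigrams it stands for."""
    choices = (color_chars[color] for color in color_trigram)
    return ["".join(chars) for chars in product(*choices)]

# Hypothetical color map for "^a[b-g]h$": the color ids are made up.
color_chars = {1: "a", 2: "bcdefg", 3: "h"}
trigrams = expand_color_trigram((1, 2, 3), color_chars)
```

The graph work is done once on the single color trigram (1, 2, 3), and only the final step produces the six character trigrams "abh" through "agh".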
- Heikki
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Fri, November 30, 2012 12:22, Alexander Korotkov wrote:
Hi!
On Thu, Nov 29, 2012 at 12:58 PM, er <er@xs4all.nl> wrote:
On Mon, November 26, 2012 20:49, Alexander Korotkov wrote:
I ran the simple-minded tests against generated data (similar to the ones
I did in January 2012).
The problems of that older version seem pretty much all removed (although
I didn't do much work on it -- just reran these tests).
Thanks a lot for testing! Could you repeat for 0.7 version of patch which
has new overflow handling?
I've attached a similar test re-run that compares HEAD with patch versions 0.6, and 0.7.
Erik Rijkers
On Fri, Nov 30, 2012 at 6:23 PM, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:
On 30.11.2012 13:20, Alexander Korotkov wrote:
On Thu, Nov 29, 2012 at 5:25 PM, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:
Would it be safe to simply stop short the depth-first search on overflow,
and proceed with the graph that was constructed up to that point?
For depth-first it's not. But your proposal naturally makes sense. I've
changed it to breadth-first search. And then it's safe to mark all
unprocessed states as final when overflow. It means that we assume every
unprocessed branch to immediately finish with matching (this can give us
more false positives but no false negatives).
For overflow of matrix collection, it's safe to do just OR between all the
trigrams.
New version of patch is attached.
Thanks, sounds good.
I've spent quite a long time trying to understand the transformation the
getState/addKeys/addArcs functions do to the original CNFA graph. I think
that still needs more comments to explain the steps involved in it.
One thing that occurs to me is that it might be better to delay expanding
colors to characters. You could build and transform the graph, and even
create the paths, all while operating on colors. You would end up with
lists of "color trigrams", consisting of sequences of three colors that
must appear in the source string. Only at the last step you would expand
the color trigrams to real character trigrams. I think that would save a
lot of processing while building the graph, if you have colors that contain
many characters. It would allow us to do better than the fixed small
MAX_COLOR_CHARS limit. For example with MAX_COLOR_CHARS = 4 in the current
patch, it's a shame that we can't do anything with a fairly simple regexp
like "^a[b-g]h$"
Nice idea to delay expanding colors to characters! Obviously, we should
delay expanding only alphanumerical characters, because non-alphanumerical
characters influence graph structure. Trying to implement...
------
With best regards,
Alexander Korotkov.
On Sat, Dec 1, 2012 at 3:22 PM, Erik Rijkers <er@xs4all.nl> wrote:
On Fri, November 30, 2012 12:22, Alexander Korotkov wrote:
Hi!
On Thu, Nov 29, 2012 at 12:58 PM, er <er@xs4all.nl> wrote:
On Mon, November 26, 2012 20:49, Alexander Korotkov wrote:
I ran the simple-minded tests against generated data (similar to the ones
I did in January 2012).
The problems of that older version seem pretty much all removed (although
I didn't do much work on it -- just reran these tests).
Thanks a lot for testing! Could you repeat for 0.7 version of patch which
has new overflow handling?
I've attached a similar test re-run that compares HEAD with patch versions
0.6, and 0.7.
Thanks! Did you write scripts for automated testing? It would be nice if you
shared them.
------
With best regards,
Alexander Korotkov.
Alexander Korotkov <aekorotkov@gmail.com> writes:
Nice idea to delay expanding colors to characters! Obviously, we should
delay expanding only alphanumerical characters. Because non-alphanumerical
characters influence graph structure. Trying to implement...
Uh, why would that be? Colors are colors. The regexp machinery doesn't
care whether they represent alphanumerics or not. (Or to be more
precise, if there is a situation where it makes a difference, separate
colors will have been created for each set of characters that need to be
distinguished.)
regards, tom lane
On Sun, December 2, 2012 19:07, Alexander Korotkov wrote:
I've attached a similar test re-run that compares HEAD with patch versions
0.6, and 0.7.
Thanks! Did you write scripts for automated testing? It would be nice if you
shared them.
Sure, here they are.
The perl program does depend a bit on my particular setup (it reads the port from the
postgresql.conf of each instance, and the script knows the data_dir locations), but I suppose it's
easy enough to remove that.
(Just hardcode the %instances hash, instead of calling instances().)
If you need help to get them to run I can 'generalise' them, but for now I'll send them as they are.
Erik Rijkers
On 02.12.2012 20:19, Tom Lane wrote:
Alexander Korotkov <aekorotkov@gmail.com> writes:
Nice idea to delay expanding colors to characters! Obviously, we should
delay expanding only alphanumerical characters. Because non-alphanumerical
characters influence graph structure. Trying to implement...
Uh, why would that be? Colors are colors. The regexp machinery doesn't
care whether they represent alphanumerics or not. (Or to be more
precise, if there is a situation where it makes a difference, separate
colors will have been created for each set of characters that need to be
distinguished.)
The regexp machinery doesn't care, but the trigrams that pg_trgm
extracts only contain alphanumeric characters. So if by looking at the
CNFA graph produced by the regexp machinery you conclude that any
matching strings must contain three-letter sequences "%oo" and "#oo",
you can just club them together into " oo" trigram.
I think you can run a pre-processing step to the colors, and merge
colors that are equivalent as far as trigrams are considered. For
example, if you have a color that contains only character '%', and
another that contains character '#', you can treat them as the same
color. You might then be able to simplify the CNFA. Actually, it would
be even better if you could apply the pre-processing to the regexp
before the regexp machinery turns it into a CNFA. Not sure how easy it
would be to do such pre-processing.
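The equivalence described above can be sketched as follows. This assumes, as a simplification, that every non-alphanumeric character behaves like a blank for trigram purposes; the function names and the color map are hypothetical, and Python's `str.isalnum` only stands in for whatever test pg_trgm actually applies.

```python
def trigram_key(seq):
    """Replace every non-alphanumeric character of a 3-char sequence by a blank."""
    return "".join(ch if ch.isalnum() else " " for ch in seq)

def merge_blank_colors(color_chars):
    """Map every color without alphanumeric characters to one shared class."""
    return {
        color: color if any(ch.isalnum() for ch in chars) else "blank"
        for color, chars in color_chars.items()
    }

same_key = trigram_key("%oo") == trigram_key("#oo") == " oo"
merged = merge_blank_colors({1: "%", 2: "#", 3: "o"})
```

In this toy map the colors for '%' and '#' collapse into one class while the color for 'o' stays on its own.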
BTW, why create the path matrix? You could check the "check" array of
trigrams in the consistent function directly against the graph.
Consistent should return true, iff there is a path through the graph
following only arcs that contain trigrams present in the check array.
Finding a path through a complex graph could be expensive, O(|E|), but
if the path is complex, the path matrix would be large as well, and
checking against a large matrix isn't exactly free either. It would
allow you to avoid "overflows" caused by having too many paths through
the graph.
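A rough sketch of such a graph-based consistent check is below. The graph layout, the function name, and the parallel `trigrams`/`check` lists are assumptions for illustration, not the GIN support function's real signature.

```python
def consistent(arcs, initial, finals, trigrams, check):
    """Return True iff some path from `initial` to a final state follows only
    arcs whose trigram is flagged as present in `check`."""
    present = {t for t, found in zip(trigrams, check) if found}
    stack, seen = [initial], {initial}
    while stack:
        state = stack.pop()
        if state in finals:
            return True
        for trigram, nxt in arcs.get(state, ()):
            if trigram in present and nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False

# Toy graph: 0 -abc-> 1 -bcd-> 2 and a shortcut 0 -xyz-> 2; state 2 is final.
arcs = {0: [("abc", 1), ("xyz", 2)], 1: [("bcd", 2)]}
trigrams = ["abc", "bcd", "xyz"]
only_abc = consistent(arcs, 0, {2}, trigrams, [True, False, False])
abc_bcd = consistent(arcs, 0, {2}, trigrams, [True, True, False])
only_xyz = consistent(arcs, 0, {2}, trigrams, [False, False, True])
```

Each state is visited at most once, so the cost is bounded by the number of arcs, and no list of all paths is ever materialized.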
- Heikki
On Mon, Dec 3, 2012 at 2:05 PM, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:
On 02.12.2012 20:19, Tom Lane wrote:
Alexander Korotkov <aekorotkov@gmail.com> writes:
Nice idea to delay expanding colors to characters! Obviously, we should
delay expanding only alphanumerical characters. Because non-alphanumerical
characters influence graph structure. Trying to implement...
Uh, why would that be? Colors are colors. The regexp machinery doesn't
care whether they represent alphanumerics or not. (Or to be more
precise, if there is a situation where it makes a difference, separate
colors will have been created for each set of characters that need to be
distinguished.)
The regexp machinery doesn't care, but the trigrams that pg_trgm extracts
only contain alphanumeric characters. So if by looking at the CNFA graph
produced by the regexp machinery you conclude that any matching strings
must contain three-letter sequences "%oo" and "#oo", you can just club them
together into " oo" trigram.
I think you can run a pre-processing step to the colors, and merge colors
that are equivalent as far as trigrams are considered. For example, if you
have a color that contains only character '%', and another that contains
character '#', you can treat them as the same color. You might then be
able to simplify the CNFA. Actually, it would be even better if you could
apply the pre-processing to the regexp before the regexp machinery turns it
into a CNFA. Not sure how easy it would be to do such pre-processing.
Treating colors as the same should be possible only for colors which have no
alphanumeric characters, because colors are non-overlapping. However, this
optimization could be significant in some cases.
BTW, why create the path matrix? You could check the "check" array of
trigrams in the consistent function directly against the graph. Consistent
should return true, iff there is a path through the graph following only
arcs that contain trigrams present in the check array. Finding a path
through a complex graph could be expensive, O(|E|), but if the path is
complex, the path matrix would be large as well, and checking against a
large matrix isn't exactly free either. It would allow you to avoid
"overflows" caused by having too many paths through the graph.
Actually, I generally dislike the path matrix for the same reasons. But:
1) Output graphs could contain trigrams which are completely useless for
search. For example, for regex /(abcdefgh)*ijk/ we need only the "ijk" trigram,
while the graph would contain many more. The path matrix is a method to get rid
of all of them.
2) If we use color trigrams then we need some criteria for which color
trigrams to expand into trigrams. At the same time, we shouldn't allow a path
from the initial state to the final one through unexpanded trigrams. It seems
much harder to do with a graph than with a matrix.
------
With best regards,
Alexander Korotkov.
On Mon, Dec 3, 2012 at 4:31 PM, Alexander Korotkov <aekorotkov@gmail.com> wrote:
Actually, I generally dislike path matrix for same reasons. But:
1) Output graphs could contain trigrams which are completely useless for
search. For example, for regex /(abcdefgh)*ijk/ we need only "ijk" trigram
while graph would contain much more. Path matrix is a method to get rid of
all of them.
2) If we use color trigrams then we need some criteria for which color
trigrams to expand into trigrams. Simultaneously, we shouldn't allow path
from initial state to the final by unexpanded trigrams. It seems much
harder to do with graph than with matrix.
Now I have an idea for a simplification of the graph that is not
comprehensive but is simple and fast. I'm doing experiments now. If it
succeeds, we could get rid of the path matrix.
------
With best regards,
Alexander Korotkov.
On Fri, Dec 14, 2012 at 1:34 AM, Alexander Korotkov <aekorotkov@gmail.com> wrote:
On Mon, Dec 3, 2012 at 4:31 PM, Alexander Korotkov <aekorotkov@gmail.com> wrote:
Actually, I generally dislike path matrix for same reasons. But:
1) Output graphs could contain trigrams which are completely useless for
search. For example, for regex /(abcdefgh)*ijk/ we need only "ijk" trigram
while graph would contain much more. Path matrix is a method to get rid of
all of them.
2) If we use color trigrams then we need some criteria for which color
trigrams to expand into trigrams. Simultaneously, we shouldn't allow path
from initial state to the final by unexpanded trigrams. It seems much
harder to do with graph than with matrix.
Now, I have an idea about doing some not comprehensive but simple and fast
simplification of graph. I'm doing experiments now. In case of success we
could get rid of path matrix.
The attached patch has the following changes:
1) Postpone expansion of colors. The graph is built on color trigrams.
2) Selective expansion of color trigrams into simple trigrams. All
non-expanded color trigrams are removed. Such removal leads to the union of
all state pairs connected by the corresponding arcs. Of course, this must not
lead to the union of the initial and final states: that would make all the
previous work pointless.
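The state merging in point 2 can be pictured with a small union-find sketch. Everything here (the arc list layout, the `expanded` set, returning None when the initial and final states would collapse) is an assumption for illustration; the patch itself chooses which color trigrams to expand so that this collapse does not happen.

```python
def merge_unexpanded(n_states, arcs, expanded, initial, final):
    """Union the endpoints of every arc whose color trigram is not expanded.

    arcs is a list of (src, color_trigram, dst). Returns a list mapping each
    state to its representative, or None if the initial and final states
    would end up merged.
    """
    parent = list(range(n_states))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for src, color_trigram, dst in arcs:
        if color_trigram not in expanded:
            parent[find(src)] = find(dst)
    if find(initial) == find(final):
        return None
    return [find(state) for state in range(n_states)]

# Chain 0 -A-> 1 -B-> 2 -C-> 3, where "A", "B", "C" name color trigrams.
chain_arcs = [(0, "A", 1), (1, "B", 2), (2, "C", 3)]
reps = merge_unexpanded(4, chain_arcs, {"A", "C"}, initial=0, final=3)
collapsed = merge_unexpanded(4, chain_arcs, set(), initial=0, final=3)
```

Dropping only "B" merges states 1 and 2 and keeps a usable graph. Dropping all three color trigrams would merge state 0 with state 3, which is the case the text says must be avoided.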
------
With best regards,
Alexander Korotkov.
On Sun, December 16, 2012 22:25, Alexander Korotkov wrote:
trgm-regexp-0.8.patch.gz 22 k
Hi Alexander,
I gave this a quick try; the patch works when compiled for DEBUG, but crashes as a
'speed'-compiled binary:
Compile for speed:
$ pg_config --configure
'--prefix=/home/aardvark/pg_stuff/pg_installations/pgsql.trgm_regex8' '--with-pgport=6556'
'--enable-depend' '--with-openssl' '--with-perl' '--with-libxml'
$ psql
psql (9.3devel-trgm_regex8-20121216_2336-c299477229559d4ee7db68720d86d3fb391db761)
Type "help" for help.
testdb=# explain analyze select txt from azjunk5 where txt ~ 'x[aeiouy]{2,5}q';
The connection to the server was lost. Attempting reset: Failed.
!> \q
log after such:
-----------------
2012-12-17 09:31:02.337 CET 15801 LOG: server process (PID 16903) was terminated by signal 11: Segmentation fault
2012-12-17 09:31:02.337 CET 15801 DETAIL: Failed process was running: explain analyze select txt from azjunk5 where txt ~ 'x[aeiouy]{2,5}q';
2012-12-17 09:31:02.347 CET 15801 LOG: terminating any other active server processes
2012-12-17 09:31:02.348 CET 17049 FATAL: the database system is in recovery mode
2012-12-17 09:31:02.722 CET 15801 LOG: all server processes terminated; reinitializing
2012-12-17 09:31:03.506 CET 17052 LOG: database system was interrupted; last known up at 2012-12-17 09:26:00 CET
2012-12-17 09:31:03.693 CET 17052 LOG: database system was not properly shut down; automatic recovery in progress
2012-12-17 09:31:04.493 CET 17052 LOG: record with zero length at 2/7E3C7588
2012-12-17 09:31:04.494 CET 17052 LOG: redo is not required
2012-12-17 09:31:06.940 CET 15801 LOG: database system is ready to accept connections
----------------
A debug-compile with below options runs OK (so far):
Compile for debug:
$ pg_config --configure
'--prefix=/home/aardvark/pg_stuff/pg_installations/pgsql.trgm_regex8b' '--with-pgport=6560'
'--enable-depend' '--enable-cassert' '--enable-debug' '--with-openssl' '--with-perl'
'--with-libxml'
which does show some speed gain, I think. When I have time I'll post comparisons between HEAD,
versions 6, 7, 8. (BTW, is v6 still interesting?)
Thanks,
Erik Rijkers
Hi!
On Mon, Dec 17, 2012 at 12:54 PM, Erik Rijkers <er@xs4all.nl> wrote:
On Sun, December 16, 2012 22:25, Alexander Korotkov wrote:
trgm-regexp-0.8.patch.gz 22 k
Hi Alexander,
I gave this a quick try; the patch works when compiled for DEBUG, but
crashes as a 'speed'-compiled binary:
Compile for speed:
$ pg_config --configure
'--prefix=/home/aardvark/pg_stuff/pg_installations/pgsql.trgm_regex8' '--with-pgport=6556'
'--enable-depend' '--with-openssl' '--with-perl' '--with-libxml'
$ psql
psql (9.3devel-trgm_regex8-20121216_2336-c299477229559d4ee7db68720d86d3fb391db761)
Type "help" for help.
testdb=# explain analyze select txt from azjunk5 where txt ~ 'x[aeiouy]{2,5}q';
The connection to the server was lost. Attempting reset: Failed.
!> \q
I haven't reproduced it yet. Can you retry with this line uncommented:
#define TRGM_REGEXP_DEBUG
Then we can see at which stage it fails.
------
With best regards,
Alexander Korotkov.
On Mon, Dec 17, 2012 at 1:16 PM, Alexander Korotkov <aekorotkov@gmail.com> wrote:
Didn't reproduce it yet. Can you retry it with this line uncommented:
#define TRGM_REGEXP_DEBUG
Then we can see which stage it fails.
The bug has been found and fixed in the attached patch.
------
With best regards,
Alexander Korotkov.
On Tue, December 18, 2012 08:04, Alexander Korotkov wrote:
trgm-regexp-0.9.patch.gz 22 k
Hi.
I ran the same test again: HEAD versus trgm_regex v6, 7 and 9. In v9 there is some gain but also
some regression.
It remains a difficult problem...
If I get some time in the holidays I'll try to diversify the test program; it is now too simple.
Thanks,
Erik Rijkers
On Tue, Dec 18, 2012 at 11:45 AM, Erik Rijkers <er@xs4all.nl> wrote:
On Tue, December 18, 2012 08:04, Alexander Korotkov wrote:
I ran the same test again: HEAD versus trgm_regex v6, 7 and 9. In v9
there is some gain but also some regression.
It remains a difficult problem...
If I get some time in the holidays I'll try to diversify the test program;
it is now too simple.
Note that regexes containing {,n} probably do not do what you expect.
test=# select 'xq' ~ 'x[aeiou]{,2}q';
?column?
----------
f
(1 row)
test=# select 'xa{,2}q' ~ 'x[aeiou]{,2}q';
?column?
----------
t
(1 row)
You should use {0,n} to express from 0 to n occurrences.
------
With best regards,
Alexander Korotkov.
On Tue, December 18, 2012 09:45, Alexander Korotkov wrote:
You should use {0,n} to express from 0 to n occurrences.
Thanks, but I know that of course. It's a testing program; and in the end robustness with
unexpected or even wrong input is as important as performance. (to put it bluntly, I am also
trying to get your patch to fall over ;-))
Thanks,
Erik Rijkers
On Tue, Dec 18, 2012 at 12:51 PM, Erik Rijkers <er@xs4all.nl> wrote:
On Tue, December 18, 2012 09:45, Alexander Korotkov wrote:
You should use {0,n} to express from 0 to n occurrences.
Thanks, but I know that of course. It's a testing program; and in the end
robustness with
unexpected or even wrong input is as important as performance. (to put it
bluntly, I am also
trying to get your patch to fall over ;-))
I found most of the regressions in version 0.9 to be in {,n} cases. The new
version of the patch uses more trigrams than previous versions.
For example, for the regex 'x[aeiou]{,2}q':
In version 0.7 we use trigrams '__2', '_2_' and '__q'.
In version 0.9 we use trigrams 'xa_', 'xe_', 'xi_', 'xo_', 'xu_', '__2',
'_2_' and '__q'.
But trigrams '__2' and '_2_' actually never occur. It is enough to have one
of them; all the others just cause a slowdown. At the same time, we can't
reasonably decide which trigrams to use without knowing their frequencies.
For example, if trigrams 'xa_', 'xe_', 'xi_', 'xo_', 'xu_' were altogether
rarer than '__2', the newer version of the patch would be faster.
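To make the frequency argument concrete, here is a minimal C sketch. It is not code from the patch. It assumes we somehow had an estimate of how many rows contain each trigram; pg_trgm keeps no such statistics, so the TrgmEstimate struct, the function cheapest_required_trigram and any numbers fed to it are hypothetical. The point is only that when all trigrams in a set are required, probing the rarest one already bounds the candidate rows, and every extra trigram adds index lookups.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: a trigram plus an assumed estimate of rows containing it. */
typedef struct
{
    const char *trgm;       /* trigram text, '_' standing for a padding blank */
    long        est_rows;   /* assumed estimate, not a real pg_trgm statistic */
} TrgmEstimate;

/*
 * Among trigrams that must ALL be present, return the index of the one with
 * the lowest estimate (first one wins on ties), or -1 for an empty set.
 */
static int
cheapest_required_trigram(const TrgmEstimate *required, size_t n)
{
    size_t      i;
    int         best = -1;

    assert(required != NULL || n == 0);

    for (i = 0; i < n; i++)
    {
        if (best < 0 || required[i].est_rows < required[best].est_rows)
            best = (int) i;
    }
    return best;
}
```

With made-up estimates of 0 rows for '__2' and '_2_' and a large number for '__q', the function picks '__2', which matches the observation that one never-occurring trigram is enough to rule out every row.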
------
With best regards,
Alexander Korotkov.
On 18.12.2012 09:04, Alexander Korotkov wrote:
Bug is found and fixed in attached patch.
I finally got around to looking at this. I like this new version, without
the path matrix, much better.
I spent quite some time honing the code and comments, trying to organize
it so that it's easier to understand. In particular, I divided the
processing more clearly into four separate stages, and added comments
indicating which functions and which fields in the structs are needed in
which stage.
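As a rough illustration of what the output of the last stage is used for, here is a minimal standalone sketch of evaluating a packed graph against the set of trigrams found in the index. It follows the patch's convention that state 0 is the initial state and state 1 is the final state, but the Sketch* types and sketch_reaches_final() are simplified stand-ins made up for this example, not the patch's PackedGraph or trigramsMatchGraph().

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define SKETCH_MAX_STATES 16    /* arbitrary limit for this sketch */

typedef struct
{
    int         target;         /* index of the target state */
    int         trgm;           /* index of the trigram labelling the arc */
} SketchArc;

typedef struct
{
    int         narcs;
    const SketchArc *arcs;
} SketchState;

/* Depth-first walk that follows only arcs whose trigram is present. */
static bool
sketch_visit(const SketchState *states, const bool *trgm_present,
             bool *seen, int stateno)
{
    int         i;

    if (stateno == 1)
        return true;            /* reached the final state */
    seen[stateno] = true;

    for (i = 0; i < states[stateno].narcs; i++)
    {
        const SketchArc *arc = &states[stateno].arcs[i];

        if (!trgm_present[arc->trgm])
            continue;           /* required trigram is missing */
        if (!seen[arc->target] &&
            sketch_visit(states, trgm_present, seen, arc->target))
            return true;
    }
    return false;
}

/* True if the final state is reachable from the initial state. */
static bool
sketch_reaches_final(const SketchState *states, int nstates,
                     const bool *trgm_present)
{
    bool        seen[SKETCH_MAX_STATES];

    assert(nstates >= 2 && nstates <= SKETCH_MAX_STATES);
    memset(seen, 0, sizeof(seen));
    return sketch_visit(states, trgm_present, seen, 0);
}
```

A false result means the row cannot match the regex. A true result only means it might, so the heap tuple still has to be rechecked against the real regex.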
I understand the other stages fairly well now, but the transformation
from the source CNFA form into the transformed graph is still a black
box to me. The addKeys/addArcs functions still need more explanation.
Can you come up with some extra comments or refactoring to clarify those?
I'd like to see a few more regression test cases, to cover the various
overflow cases. In particular, I built with --enable-coverage and ran
"make installcheck", and it looks like the state merging code isn't
exercised at all. Report attached.
To visualize the graphs, I rewrote the debugging print* functions to
write the graphs in graphviz .dot format. That helped a lot. See
attached graphs, generated from the regexp '^(abc|def)(ghi|jk[lmn])$'.
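For readers who haven't used Graphviz, here is a minimal sketch of what writing such a graph in .dot format can look like. It is not the patch's print* code: the DotArc struct, the write_dot() function, the "s<N>" node names and the choice of a box for the initial state and a double border for the final state are all assumptions made for this example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical arc description: source and target state, plus a label. */
typedef struct
{
    int         source;
    int         target;
    const char *label;          /* e.g. the trigram on the arc */
} DotArc;

/*
 * Write the arcs as a Graphviz digraph. Labels are written verbatim, so
 * this sketch requires that they contain no double quotes.
 */
static void
write_dot(FILE *fp, const DotArc *arcs, int narcs,
          int init_state, int final_state)
{
    int         i;

    assert(fp != NULL && (arcs != NULL || narcs == 0));

    fprintf(fp, "digraph sketch {\n");
    fprintf(fp, "  s%d [shape=box];\n", init_state);
    fprintf(fp, "  s%d [peripheries=2];\n", final_state);
    for (i = 0; i < narcs; i++)
    {
        assert(strchr(arcs[i].label, '"') == NULL);
        fprintf(fp, "  s%d -> s%d [label=\"%s\"];\n",
                arcs[i].source, arcs[i].target, arcs[i].label);
    }
    fprintf(fp, "}\n");
}
```

A file written this way can be rendered with, for example, "dot -Tpng graph.dot -o graph.png".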
There's still a lot of cleanup to do. I'm going to continue working on
this tomorrow, but wanted to share what I have so far.
- Heikki
Attachments:
trgm-regexp-0.9-heikki-1.patch (text/x-diff)
diff --git a/contrib/pg_trgm/Makefile b/contrib/pg_trgm/Makefile
index 64fd69f..8033733 100644
--- a/contrib/pg_trgm/Makefile
+++ b/contrib/pg_trgm/Makefile
@@ -1,7 +1,7 @@
# contrib/pg_trgm/Makefile
MODULE_big = pg_trgm
-OBJS = trgm_op.o trgm_gist.o trgm_gin.o
+OBJS = trgm_op.o trgm_gist.o trgm_gin.o trgm_regexp.o
EXTENSION = pg_trgm
DATA = pg_trgm--1.0.sql pg_trgm--unpackaged--1.0.sql
diff --git a/contrib/pg_trgm/expected/pg_trgm.out b/contrib/pg_trgm/expected/pg_trgm.out
index 81d0ca8..ee0131f 100644
--- a/contrib/pg_trgm/expected/pg_trgm.out
+++ b/contrib/pg_trgm/expected/pg_trgm.out
@@ -54,7 +54,7 @@ select similarity('wow',' WOW ');
(1 row)
CREATE TABLE test_trgm(t text);
-\copy test_trgm from 'data/trgm.data
+\copy test_trgm from 'data/trgm.data'
select t,similarity(t,'qwertyu0988') as sml from test_trgm where t % 'qwertyu0988' order by sml desc, t;
t | sml
-------------+----------
@@ -3515,6 +3515,47 @@ select * from test2 where t ilike 'qua%';
quark
(1 row)
+select * from test2 where t ~ '[abc]{3}';
+ t
+--------
+ abcdef
+(1 row)
+
+select * from test2 where t ~ 'a[bc]+d';
+ t
+--------
+ abcdef
+(1 row)
+
+select * from test2 where t ~* 'DEF';
+ t
+--------
+ abcdef
+(1 row)
+
+select * from test2 where t ~ 'dEf';
+ t
+---
+(0 rows)
+
+select * from test2 where t ~* '^q';
+ t
+-------
+ quark
+(1 row)
+
+select * from test2 where t ~* '[abc]{3}[def]{3}';
+ t
+--------
+ abcdef
+(1 row)
+
+select * from test2 where t ~ 'q.*rk$';
+ t
+-------
+ quark
+(1 row)
+
drop index test2_idx_gin;
create index test2_idx_gist on test2 using gist (t gist_trgm_ops);
set enable_seqscan=off;
diff --git a/contrib/pg_trgm/pg_trgm--1.0.sql b/contrib/pg_trgm/pg_trgm--1.0.sql
index 8067bd6..ca9bcaa 100644
--- a/contrib/pg_trgm/pg_trgm--1.0.sql
+++ b/contrib/pg_trgm/pg_trgm--1.0.sql
@@ -163,4 +163,6 @@ AS
ALTER OPERATOR FAMILY gin_trgm_ops USING gin ADD
OPERATOR 3 pg_catalog.~~ (text, text),
- OPERATOR 4 pg_catalog.~~* (text, text);
+ OPERATOR 4 pg_catalog.~~* (text, text),
+ OPERATOR 5 pg_catalog.~ (text, text),
+ OPERATOR 6 pg_catalog.~* (text, text);
diff --git a/contrib/pg_trgm/sql/pg_trgm.sql b/contrib/pg_trgm/sql/pg_trgm.sql
index 81ab1e7..7d8a151 100644
--- a/contrib/pg_trgm/sql/pg_trgm.sql
+++ b/contrib/pg_trgm/sql/pg_trgm.sql
@@ -13,7 +13,7 @@ select similarity('wow',' WOW ');
CREATE TABLE test_trgm(t text);
-\copy test_trgm from 'data/trgm.data
+\copy test_trgm from 'data/trgm.data'
select t,similarity(t,'qwertyu0988') as sml from test_trgm where t % 'qwertyu0988' order by sml desc, t;
select t,similarity(t,'gwertyu0988') as sml from test_trgm where t % 'gwertyu0988' order by sml desc, t;
@@ -52,6 +52,14 @@ select * from test2 where t like '%bcd%';
select * from test2 where t like E'%\\bcd%';
select * from test2 where t ilike '%BCD%';
select * from test2 where t ilike 'qua%';
+
+select * from test2 where t ~ '[abc]{3}';
+select * from test2 where t ~ 'a[bc]+d';
+select * from test2 where t ~* 'DEF';
+select * from test2 where t ~ 'dEf';
+select * from test2 where t ~* '^q';
+select * from test2 where t ~* '[abc]{3}[def]{3}';
+select * from test2 where t ~ 'q.*rk$';
drop index test2_idx_gin;
create index test2_idx_gist on test2 using gist (t gist_trgm_ops);
set enable_seqscan=off;
diff --git a/contrib/pg_trgm/trgm.h b/contrib/pg_trgm/trgm.h
index 067f29d..b3ce65b 100644
--- a/contrib/pg_trgm/trgm.h
+++ b/contrib/pg_trgm/trgm.h
@@ -7,7 +7,6 @@
#include "access/gist.h"
#include "access/itup.h"
#include "storage/bufpage.h"
-#include "utils/builtins.h"
/* options */
#define LPADDING 2
@@ -28,6 +27,8 @@
#define DistanceStrategyNumber 2
#define LikeStrategyNumber 3
#define ILikeStrategyNumber 4
+#define RegExpStrategyNumber 5
+#define RegExpStrategyNumberICase 6
typedef char trgm[3];
@@ -46,8 +47,10 @@ uint32 trgm2int(trgm *ptr);
#ifdef KEEPONLYALNUM
#define ISPRINTABLECHAR(a) ( isascii( *(unsigned char*)(a) ) && (isalnum( *(unsigned char*)(a) ) || *(unsigned char*)(a)==' ') )
+#define ISWORDCHR(c) (t_isalpha(c) || t_isdigit(c))
#else
#define ISPRINTABLECHAR(a) ( isascii( *(unsigned char*)(a) ) && isprint( *(unsigned char*)(a) ) )
+#define ISWORDCHR(c) (!t_isspace(c))
#endif
#define ISPRINTABLETRGM(t) ( ISPRINTABLECHAR( ((char*)(t)) ) && ISPRINTABLECHAR( ((char*)(t))+1 ) && ISPRINTABLECHAR( ((char*)(t))+2 ) )
@@ -99,11 +102,16 @@ typedef char *BITVECP;
#define GETARR(x) ( (trgm*)( (char*)x+TRGMHDRSIZE ) )
#define ARRNELEM(x) ( ( VARSIZE(x) - TRGMHDRSIZE )/sizeof(trgm) )
+typedef struct PackedGraph PackedGraph;
+
extern float4 trgm_limit;
TRGM *generate_trgm(char *str, int slen);
TRGM *generate_wildcard_trgm(const char *str, int slen);
float4 cnt_sml(TRGM *trg1, TRGM *trg2);
bool trgm_contained_by(TRGM *trg1, TRGM *trg2);
+void cnt_trigram(trgm *trgmptr, char *str, int bytelen);
+TRGM *createTrgmCNFA(text *text_re, MemoryContext context, PackedGraph **paths);
+bool trigramsMatchGraph(PackedGraph *graph, bool *check);
#endif /* __TRGM_H__ */
diff --git a/contrib/pg_trgm/trgm_gin.c b/contrib/pg_trgm/trgm_gin.c
index 114fb78..b519901 100644
--- a/contrib/pg_trgm/trgm_gin.c
+++ b/contrib/pg_trgm/trgm_gin.c
@@ -80,7 +80,7 @@ gin_extract_query_trgm(PG_FUNCTION_ARGS)
StrategyNumber strategy = PG_GETARG_UINT16(2);
/* bool **pmatch = (bool **) PG_GETARG_POINTER(3); */
- /* Pointer *extra_data = (Pointer *) PG_GETARG_POINTER(4); */
+ Pointer **extra_data = (Pointer **) PG_GETARG_POINTER(4);
/* bool **nullFlags = (bool **) PG_GETARG_POINTER(5); */
int32 *searchMode = (int32 *) PG_GETARG_POINTER(6);
Datum *entries = NULL;
@@ -88,6 +88,7 @@ gin_extract_query_trgm(PG_FUNCTION_ARGS)
int32 trglen;
trgm *ptr;
int32 i;
+ PackedGraph *graph;
switch (strategy)
{
@@ -107,6 +108,32 @@ gin_extract_query_trgm(PG_FUNCTION_ARGS)
*/
trg = generate_wildcard_trgm(VARDATA(val), VARSIZE(val) - VARHDRSZ);
break;
+ case RegExpStrategyNumberICase:
+#ifndef IGNORECASE
+ elog(ERROR, "cannot handle ~* with case-sensitive trigrams");
+#endif
+ /* FALL THRU */
+ case RegExpStrategyNumber:
+ trg = createTrgmCNFA(val, fcinfo->flinfo->fn_mcxt, &graph);
+ if (trg && ARRNELEM(trg) > 0)
+ {
+ /*
+ * Successful regex processing: store CNFA-like graph as an
+ * extra_data.
+ */
+ *extra_data = (Pointer *) palloc0(sizeof(Pointer) *
+ ARRNELEM(trg));
+ for (i = 0; i < ARRNELEM(trg); i++)
+ (*extra_data)[i] = (Pointer) graph;
+ }
+ else
+ {
+ /* No result: have to do full index scan. */
+ *nentries = 0;
+ *searchMode = GIN_SEARCH_MODE_ALL;
+ PG_RETURN_POINTER(entries);
+ }
+ break;
default:
elog(ERROR, "unrecognized strategy number: %d", strategy);
trg = NULL; /* keep compiler quiet */
@@ -147,7 +174,7 @@ gin_trgm_consistent(PG_FUNCTION_ARGS)
/* text *query = PG_GETARG_TEXT_P(2); */
int32 nkeys = PG_GETARG_INT32(3);
- /* Pointer *extra_data = (Pointer *) PG_GETARG_POINTER(4); */
+ Pointer *extra_data = (Pointer *) PG_GETARG_POINTER(4);
bool *recheck = (bool *) PG_GETARG_POINTER(5);
bool res;
int32 i,
@@ -189,6 +216,17 @@ gin_trgm_consistent(PG_FUNCTION_ARGS)
}
}
break;
+ case RegExpStrategyNumber:
+ case RegExpStrategyNumberICase:
+ if (nkeys < 1)
+ {
+ /* Regex processing gave no result: do full index scan */
+ res = true;
+ break;
+ }
+ res = trigramsMatchGraph((PackedGraph *) extra_data[0], check);
+
+ break;
default:
elog(ERROR, "unrecognized strategy number: %d", strategy);
res = false; /* keep compiler quiet */
diff --git a/contrib/pg_trgm/trgm_op.c b/contrib/pg_trgm/trgm_op.c
index 87dffd1..71aa938 100644
--- a/contrib/pg_trgm/trgm_op.c
+++ b/contrib/pg_trgm/trgm_op.c
@@ -77,12 +77,6 @@ unique_array(trgm *a, int len)
return curend + 1 - a;
}
-#ifdef KEEPONLYALNUM
-#define iswordchr(c) (t_isalpha(c) || t_isdigit(c))
-#else
-#define iswordchr(c) (!t_isspace(c))
-#endif
-
/*
* Finds first word in string, returns pointer to the word,
* endword points to the character after word
@@ -92,7 +86,7 @@ find_word(char *str, int lenstr, char **endword, int *charlen)
{
char *beginword = str;
- while (beginword - str < lenstr && !iswordchr(beginword))
+ while (beginword - str < lenstr && !ISWORDCHR(beginword))
beginword += pg_mblen(beginword);
if (beginword - str >= lenstr)
@@ -100,7 +94,7 @@ find_word(char *str, int lenstr, char **endword, int *charlen)
*endword = beginword;
*charlen = 0;
- while (*endword - str < lenstr && iswordchr(*endword))
+ while (*endword - str < lenstr && ISWORDCHR(*endword))
{
*endword += pg_mblen(*endword);
(*charlen)++;
@@ -109,8 +103,7 @@ find_word(char *str, int lenstr, char **endword, int *charlen)
return beginword;
}
-#ifdef USE_WIDE_UPPER_LOWER
-static void
+void
cnt_trigram(trgm *tptr, char *str, int bytelen)
{
if (bytelen == 3)
@@ -131,7 +124,6 @@ cnt_trigram(trgm *tptr, char *str, int bytelen)
CPTRGM(tptr, &crc);
}
}
-#endif
/*
* Adds trigrams from words (already padded).
@@ -287,7 +279,7 @@ get_wildcard_part(const char *str, int lenstr,
{
if (in_escape)
{
- if (iswordchr(beginword))
+ if (ISWORDCHR(beginword))
break;
in_escape = false;
in_leading_wildcard_meta = false;
@@ -298,7 +290,7 @@ get_wildcard_part(const char *str, int lenstr,
in_escape = true;
else if (ISWILDCARDCHAR(beginword))
in_leading_wildcard_meta = true;
- else if (iswordchr(beginword))
+ else if (ISWORDCHR(beginword))
break;
else
in_leading_wildcard_meta = false;
@@ -341,7 +333,7 @@ get_wildcard_part(const char *str, int lenstr,
clen = pg_mblen(endword);
if (in_escape)
{
- if (iswordchr(endword))
+ if (ISWORDCHR(endword))
{
memcpy(s, endword, clen);
(*charlen)++;
@@ -369,7 +361,7 @@ get_wildcard_part(const char *str, int lenstr,
in_trailing_wildcard_meta = true;
break;
}
- else if (iswordchr(endword))
+ else if (ISWORDCHR(endword))
{
memcpy(s, endword, clen);
(*charlen)++;
diff --git a/contrib/pg_trgm/trgm_regexp.c b/contrib/pg_trgm/trgm_regexp.c
new file mode 100644
index 0000000..cfcabbb
--- /dev/null
+++ b/contrib/pg_trgm/trgm_regexp.c
@@ -0,0 +1,1797 @@
+/*
+ * contrib/pg_trgm/trgm_regexp.c - regular expression matching using trigrams
+ *
+ * The general idea of index support for a regular expression (regex) search
+ * is to transform the regex to a logical expression on trigrams. For example:
+ *
+ * (ab|cd)efg => ((abe & bef) | (cde & def)) & efg
+ *
+ * If a string matches the regex, then it must match the logical expression of
+ * trigrams. The opposite is not necessarily true, however: a string that matches
+ * the logical expression might not match the original regex. Such false
+ * positives are removed during recheck.
+ *
+ * The algorithm to convert a regex to a logical expression is based on
+ * analysis of an automaton that corresponds to regex. The algorithm consists
+ * of four stages:
+ *
+ * 1) Compile the regexp to CNFA form. This is handled by the PostgreSQL
+ * regexp library, but we have to peek into the data structures it produces.
+ *
+ * 2) Transform the original CNFA into an automaton-like graph, where arcs
+ * are labeled with trigrams that must be present in order to move from one
+ * state to another via the arc. The trigrams used in this stage consist
+ * of colors, not characters, like the original CNFA.
+ *
+ * 3) Expand the color trigrams into regular trigrams consisting of
+ * characters. If too many distinct trigrams are produced, trigrams are
+ * eliminated and the graph is simplified until it's simple enough.
+ *
+ * 4) Finally, the resulting graph is packed into a PackedGraph struct, and
+ * returned to the caller.
+ *
+ *
+ * 1) Compile the regexp to CNFA form
+ * ----------------------------------
+ * The automaton returned by the regexp compiler is a graph where vertices
+ * are "states" and arcs are labeled with colors. Each color represents
+ * a group of characters, so that all characters assigned to the same color
+ * are interchangeable, as far as matching the regexp is concerned. There are
+ * two special states: "initial" and "final". There can be multiple outgoing
+ * arcs from a state labeled with the same color, which makes the automaton
+ * non-deterministic, because it can be in many states simultaneously.
+ *
+ * 2) Transform the original CNFA into an automaton-like graph
+ * -----------------------------------------------------------
+ * In the 2nd stage, the automaton is transformed into a graph that resembles
+ * the original CNFA. Each state in the transformed graph represents a state
+ * from the original CNFA, with an "enter key". The enter key consists of the
+ * last two characters (colors, to be precise) read before entering the state.
+ * There can be one or more states in the transformed graph for each state in
+ * the original CNFA, depending on what characters can precede it. Each arc
+ * is labelled with a trigram that must be present in the string to match.
+ *
+ * So the transformed graph resembles the original CNFA, but the arcs are
+ * labeled with trigrams instead of individual characters, and there are
+ * more states. It is a lossy representation of the original CNFA: any string
+ * that matches the original regexp must match the transformed graph, but the
+ * reverse is not true.
+ *
+ * When building the graph, if the number of states or arcs exceeds pre-defined
+ * limits, we give up and simply mark any states not yet processed as final
+ * states. Roughly speaking, that means that we make use of some portion from
+ * the beginning of the regexp.
+ *
+ * 3) Expand the color trigrams into regular trigrams
+ * --------------------------------------------------
+ * The trigrams in the transformed graph are "color trigrams", consisting
+ * of three consecutive colors that must be present in the string. But for
+ * search, we need regular trigrams consisting of characters. In the 3rd
+ * stage, the color trigrams are expanded into regular trigrams. Since each
+ * color can represent many characters, the total number of regular trigrams
+ * after expansion could be very large. Because searching an index with
+ * thousands of trigrams would be slow, and would likely produce so many
+ * "false positives" that you would have to traverse a large fraction of the
+ * index, the graph is simplified further in a lossy fashion by removing
+ * color trigrams until the number of trigrams after expansion is below
+ * MAX_TRGM_COUNT threshold. When a color trigram is removed, the states
+ * connected by any arcs labelled with that trigram are merged.
+ *
+ * 4) Pack the graph into a compact representation
+ * -----------------------------------------------
+ * The 2nd and 3rd stages might have eliminated or merged many of the states
+ * and trigrams created earlier, so in this final stage, the graph is
+ * compacted and packed into a simpler struct that contains only the
+ * information needed to evaluate it.
+ *
+ *
+ * OLD COMMENTS:
+ * States of the graph produced in the first stage are marked with "keys". A
+ * key is a pair of a "prefix" and a state of the original automaton. A "prefix"
+ * is the pair of colors of the last two characters read. So, knowing the prefix
+ * is enough to know the color trigram when we read a new character with some
+ * particular color. However, we might know only one color of the prefix, or
+ * none at all. Each state of the resulting graph has an "enter key" (the key
+ * with which we entered this state) and a set of keys that are reachable
+ * without reading any predictable color trigram. Each state of the resulting
+ * graph is processed as follows:
+ * 1) Add all keys that are reachable without reading any predictable color
+ * trigram.
+ * 2) Add outgoing arcs labeled with trigrams.
+ * Step 2 leads to the creation of new states. We use a breadth-first algorithm
+ * to process them.
+ *
+ * Consider the regex ab[cd] as an example. This regex is transformed into the
+ * following CNFA (for simplicity, the example doesn't use colors):
+ *
+ * 4#
+ * c/
+ * a b /
+ * 1* --- 2 ---- 3
+ * \
+ * d\
+ * 5#
+ *
+ * We use * to mark the initial state and # to mark a final state. It's not
+ * depicted, but states 1, 4 and 5 have self-referencing arcs for all possible
+ * characters, because the pattern can match any part of the string.
+ *
+ * As the result of the first stage we will have the following graph:
+ *
+ * abc abd
+ * 2# <---- 1* ----> 3#
+ *
+ * Let us walk through how the algorithm produces this graph.
+ * We will designate state key as (prefix.s, prefix.ambiguity, nstate).
+ * 1) Create state 1 with enter key (" ", true, 1).
+ * 2) Add key (" a", true, 2) to state 1.
+ * 3) Add key ("ab", false, 3) to state 1.
+ * 4) Add arc from output state 1 to new state 2 with enter key
+ * ("bc", false, 4).
+ * 5) Mark state 2 final because state 4 of source CNFA is marked as final.
+ * 6) Add arc from output state 1 to new state 3 with enter key
+ * ("bd", false, 5).
+ * 7) Mark state 3 final because state 5 of source CNFA is marked as final.
+ *
+ * At the second stage we select color trigrams to expand into simple trigrams.
+ * We remove color trigrams starting from the widest ones. When removing a
+ * color trigram we have to merge the states connected by the corresponding
+ * arcs. We must not merge the initial and final states, because the graph
+ * becomes useless in that case.
+ */
+#include "postgres.h"
+
+#include "trgm.h"
+
+#include "catalog/pg_collation.h"
+#include "fmgr.h"
+#include "miscadmin.h"
+#include "mb/pg_wchar.h"
+#include "nodes/pg_list.h"
+#include "regex/regex.h"
+#undef INFINITY /* avoid conflict of INFINITY definition */
+#include "regex/regguts.h"
+#include "tsearch/ts_locale.h"
+#include "utils/hsearch.h"
+
+/*
+ * Uncomment to print intermediate stages, for exploring and debugging the
+ * algorithm implementation. This produces three graphs in Graphviz .dot
+ * format, in /tmp.
+ */
+#define TRGM_REGEXP_DEBUG
+
+/*---
+ * Following group of parameters are used in order to limit our computations.
+ * Otherwise regex processing could be too slow and memory-consuming.
+ *
+ * MAX_RESULT_STATES - How many states we allow in result CNFA-like graph
+ * MAX_RESULT_ARCS - How many arcs we allow in result CNFA-like graph
+ * MAX_TRGM_COUNT - How many simple trigrams we allow to extract
+ */
+#define MAX_RESULT_STATES 128
+#define MAX_RESULT_ARCS 1024
+#define MAX_TRGM_COUNT 256
+
+/* Virtual color for representation in prefixes and color trigrams. */
+#define EMPTY_COLOR ((color) -1)
+#define UNKNOWN_COLOR ((color) -2)
+
+/*
+ * Widechar trigram datatype for holding trigram before possible conversion into
+ * CRC32
+ */
+typedef color ColorTrgm[3];
+
+/*
+ * Maximum length of multibyte encoding character is 4. So, we can hold it in
+ * 32 bit integer for handling simplicity.
+ */
+typedef uint32 mb_char;
+
+/*----
+ * Attributes of CNFA colors:
+ *
+ * expandable - flag indicates we potentially can expand this
+ * color into distinct characters.
+ * containNonAlpha - flag indicates if color might contain
+ * non-alphanumeric characters (which aren't
+ * extracted into trigrams)
+ * alphaCharsCount - count of characters in color
+ * alphaCharsCountAllocated - allocated size of alphaChars array
+ * alphaChars - array of alphanumeric characters of this color
+ * (which are extracted into trigrams)
+ *
+ * When expandable is false, the other attributes don't matter: we just treat
+ * this color as an always-unknown character.
+ */
+typedef struct
+{
+ bool expandable;
+ bool containNonAlpha;
+ int alphaCharsCount;
+ int alphaCharsCountAllocated;
+ mb_char *alphaChars;
+} ColorInfo;
+
+/*
+ * Prefix is information about colors of last two read characters when coming
+ * into specific CNFA state. These colors could have special values
+ * UNKNOWN_COLOR and EMPTY_COLOR. UNKNOWN_COLOR means that we read some
+ * character of unexpandable color. EMPTY_COLOR means that we read
+ * non-alphanumeric character.
+ */
+typedef struct
+{
+ color s[2];
+} TrgmPrefix;
+
+/*
+ * "Key" of resulting state: pair of prefix and source CNFA state.
+ */
+typedef struct
+{
+ TrgmPrefix prefix;
+ int nstate;
+} TrgmStateKey;
+
+/*---
+ * State of resulting graph.
+ *
+ * enterKey - a key with which we can enter this state
+ * keys - all keys achievable without reading any predictable trigram
+ * arcs - outgoing arcs of this state
+ * parent - parent state if this state has been merged
+ * children - children states if this state has been merged
+ * fin - flag indicating this state is final
+ * init - flag indicating this state is initial
+ * queued - flag indicating this state is queued in CNFA-like graph
+ * construction
+ * number - number of this state (used at the packing stage)
+ */
+typedef struct TrgmState
+{
+ TrgmStateKey enterKey;
+ List *arcs;
+ bool fin;
+ bool init;
+
+ /* (stage 2) */
+ List *keys;
+ bool queued;
+
+ struct TrgmState *parent;
+ List *children;
+ int number;
+} TrgmState;
+
+/*
+ * Arc in the transformed graph. Arc is labeled with trigram.
+ */
+typedef struct
+{
+ TrgmState *target;
+ ColorTrgm trgm;
+} TrgmArc;
+
+/*
+ * Information about arc of specific color trigram: contain pointers to the
+ * source and target states. (stage 3)
+ */
+typedef struct
+{
+ TrgmState *source;
+ TrgmState *target;
+} ArcInfo;
+
+/*---
+ * Information about color trigram: (stage 3)
+ *
+ * trgm - trigram itself
+ * number - number of this trigram (used in the package stage)
+ * count - number of simple trigrams fitting into this color trigram
+ * expanded - flag indicates this color trigram is expanded into simple trigrams
+ * arcs - list of all arcs labeled with this color trigram.
+ */
+typedef struct
+{
+ ColorTrgm trgm;
+ int number;
+ int count;
+ bool expanded;
+ List *arcs;
+} ColorTrgmInfo;
+
+/*---
+ * Data structure representing all the data we need during regex processing.
+ * Initially we set "cnfa" to the CNFA of the regex, write color information
+ * into "colorInfo" and set the "overflowed" flag to false. At the stage of
+ * CNFA-like graph creation, "states", "initState" and "arcsCount" are filled.
+ * The "overflowed" flag may be set in case of overflow. Then we collect an
+ * array of all color trigrams present into "colorTrgms" and "colorTrgmsCount"
+ * and expand them into "trg" and "totalTrgmCount".
+ *
+ * cnfa - source CNFA of regex
+ * colorInfo - processed information of regex colors
+ * ncolors - number of colors in colorInfo
+ * states - hash of states of resulting graph
+ * initState - pointer to initial state of resulting graph
+ * arcsCount - total number of arcs of resulting graph (for resource
+ * limiting)
+ * colorTrgms - array of all color trigrams presented in graph
+ * colorTrgmsCount - count of that color trigrams
+ * totalTrgmCount - total count of extracted simple trigrams
+ * queue - queue for CNFA-like graph construction
+ * overflowed - if set, we have exceeded resource limit for transformation
+ */
+typedef struct
+{
+ /*
+ * Source CNFA of the regexp, and color information extracted from it
+ * (stage 1)
+ */
+ struct cnfa *cnfa;
+ ColorInfo *colorInfo;
+ int ncolors;
+
+ /* Transformed graph (stage 2) */
+ HTAB *states;
+ TrgmState *initState;
+ int arcsCount;
+ List *queue;
+ bool overflowed;
+
+ /* Information about distinct color trigrams in the graph (stage 3) */
+ ColorTrgmInfo *colorTrgms;
+
+ int colorTrgmsCount;
+ int totalTrgmCount;
+} TrgmCNFA;
+
+
+/*
+ * Final, compact representation of CNFA-like graph.
+ */
+typedef struct
+{
+ int targetState;
+ int colorTrgm;
+} PackedArc;
+
+typedef struct
+{
+ int arcsCount;
+ PackedArc *arcs;
+} PackedState;
+
+struct PackedGraph
+{
+ /*
+ * colorTrigramsCount and colorTrigramGroups contain information
+ * about how trigrams are grouped into color trigrams. "colorTrigramsCount"
+ * is the total count of color trigrams and "colorTrigramGroups" contains
+ * the number of simple trigrams in each color trigram.
+ */
+ int colorTrigramsCount;
+ int *colorTrigramGroups;
+
+ /*
+ * "states" points to definition of "statesCount" states. 0 state is
+ * always initial state and 1 state is always final state. Each state's
+ * "arcs" points to "arcsCount" description of arcs. Each arc describe by
+ * number of color trigram and number of target state (both are zero-based).
+ */
+ int statesCount;
+ PackedState *states;
+
+ /* Temporary work space for trigramsMatchGraph() */
+ bool *colorTrigramsActive;
+ bool *statesActive;
+};
+
+
+/* prototypes for private functions */
+static bool activateState(PackedGraph *graph, int stateno);
+static ColorInfo *getColorInfo(regex_t *regex, int *ncolors);
+static TrgmState *getState(TrgmCNFA *trgmCNFA, TrgmStateKey *key);
+static void transformGraph(TrgmCNFA *trgmCNFA);
+static bool selectColorTrigrams(TrgmCNFA *trgmCNFA);
+static TRGM *expandColorTrigrams(TrgmCNFA *trgmCNFA);
+static PackedGraph *packGraph(TrgmCNFA *trgmCNFA, MemoryContext context);
+
+#ifdef TRGM_REGEXP_DEBUG
+static void printSourceCNFA(struct cnfa *cnfa, ColorInfo *colors, int ncolors);
+static void printTrgmCNFA(TrgmCNFA *trgmCNFA);
+static void printPackedGraph(PackedGraph *packedGraph, TRGM *trigrams);
+#endif
+
+
+/*
+ * Main entry point to process a regular expression. Returns the array of
+ * trigrams to search for and stores a packed graph representation of the
+ * regular expression in *graph, or returns NULL if the regex was too complex.
+ */
+TRGM *
+createTrgmCNFA(text *text_re, MemoryContext context, PackedGraph **graph)
+{
+ struct guts *g;
+ TrgmCNFA trgmCNFA;
+ regex_t *regex;
+ TRGM *trg;
+
+ /*
+ * Stage 1: Compile the regexp into a CNFA, using the PostgreSQL regexp
+ * library.
+ */
+#ifdef IGNORECASE
+ regex = RE_compile_and_cache(text_re, REG_ADVANCED | REG_ICASE, DEFAULT_COLLATION_OID);
+#else
+ regex = RE_compile_and_cache(text_re, REG_ADVANCED, DEFAULT_COLLATION_OID);
+#endif
+ g = (struct guts *) regex->re_guts;
+ trgmCNFA.cnfa = &g->search;
+
+ /* Collect color information from the CNFA */
+ trgmCNFA.colorInfo = getColorInfo(regex, &trgmCNFA.ncolors);
+
+#ifdef TRGM_REGEXP_DEBUG
+ printSourceCNFA(&g->search, trgmCNFA.colorInfo, trgmCNFA.ncolors);
+#endif
+
+ /*
+ * Stage 2: Create a transformed graph from the source CNFA.
+ */
+ transformGraph(&trgmCNFA);
+
+#ifdef TRGM_REGEXP_DEBUG
+ printTrgmCNFA(&trgmCNFA);
+#endif
+
+ if (trgmCNFA.initState->fin)
+ return NULL;
+
+ /*
+ * Stage 3: Select color trigrams to expand.
+ */
+ if (!selectColorTrigrams(&trgmCNFA))
+ return NULL;
+
+ /*
+ * Stage 4: Expand color trigrams and pack graph into final representation.
+ */
+ trg = expandColorTrigrams(&trgmCNFA);
+
+ *graph = packGraph(&trgmCNFA, context);
+
+#ifdef TRGM_REGEXP_DEBUG
+ printPackedGraph(*graph, trg);
+#endif
+
+ return trg;
+}
+
+/*
+ * Main entry point for evaluating a graph.
+ */
+bool
+trigramsMatchGraph(PackedGraph *graph, bool *check)
+{
+ int i,
+ j,
+ k;
+
+ /*
+ * Reset temporary working areas.
+ */
+ memset(graph->colorTrigramsActive, 0,
+ sizeof(bool) * graph->colorTrigramsCount);
+ memset(graph->statesActive, 0, sizeof(bool) * graph->statesCount);
+
+ /* Check which color trigrams were matched. */
+ j = 0;
+ for (i = 0; i < graph->colorTrigramsCount; i++)
+ {
+ int cnt = graph->colorTrigramGroups[i];
+ for (k = j; k < j + cnt; k++)
+ {
+ if (check[k])
+ {
+ /*
+ * Found one matched trigram in the group. Can skip the rest
+ * of them and go to the next group.
+ */
+ graph->colorTrigramsActive[i] = true;
+ break;
+ }
+ }
+ j = j + cnt;
+ }
+
+ /* Recursively evaluate graph, starting from initial state */
+ return activateState(graph, 0);
+}
+
+/*
+ * Recursive function for graph state activation. Returns true if state
+ * activation leads to activation of final state.
+ */
+static bool
+activateState(PackedGraph *graph, int stateno)
+{
+ PackedState *state = &graph->states[stateno];
+ int cnt = state->arcsCount;
+ int i;
+
+ graph->statesActive[stateno] = true;
+
+ /* Loop over arcs */
+ for (i = 0; i < cnt; i++)
+ {
+ PackedArc *arc = (PackedArc *) &state->arcs[i];
+ /*
+ * If corresponding color trigram is present then activate the
+ * corresponding state.
+ */
+ if (graph->colorTrigramsActive[arc->colorTrgm])
+ {
+ if (arc->targetState == 1)
+ return true; /* reached final state */
+ if (!graph->statesActive[arc->targetState])
+ {
+ if (activateState(graph, arc->targetState))
+ return true;
+ }
+ }
+ }
+ return false;
+}
+
+/*---------------------
+ * Subroutines for pre-processing the color map.
+ *---------------------
+ */
+
+/*
+ * Convert pg_wchar to multibyte character.
+ */
+static mb_char
+convertPgWchar(pg_wchar c)
+{
+ /*
+ * "s" has enough space for a multibyte character of 4 bytes, plus
+ * a zero byte at the end.
+ */
+ char s[5];
+ char *lowerCased;
+ mb_char result;
+
+ if (c == 0)
+ return 0;
+
+ MemSet(s, 0, sizeof(s));
+ pg_wchar2mb_with_len(&c, s, 1);
+
+ /* Convert to lowercase if needed */
+#ifdef IGNORECASE
+ lowerCased = lowerstr(s);
+ if (strncmp(lowerCased, s, 4))
+ return 0;
+#else
+ lowerCased = s;
+#endif
+ strncpy((char *) &result, lowerCased, 4);
+ return result;
+}
+
+/*
+ * Add new character into color information
+ */
+static void
+addCharToColor(ColorInfo *colorInfo, pg_wchar c)
+{
+ if (!colorInfo->alphaChars)
+ {
+ colorInfo->alphaCharsCountAllocated = 16;
+ colorInfo->alphaChars = (pg_wchar *) palloc(sizeof(pg_wchar) *
+ colorInfo->alphaCharsCountAllocated);
+ }
+ if (colorInfo->alphaCharsCount >= colorInfo->alphaCharsCountAllocated)
+ {
+ colorInfo->alphaCharsCountAllocated *= 2;
+ colorInfo->alphaChars = (pg_wchar *) repalloc(colorInfo->alphaChars,
+ sizeof(pg_wchar) * colorInfo->alphaCharsCountAllocated);
+ }
+ colorInfo->alphaChars[colorInfo->alphaCharsCount++] = c;
+}
+
+/*
+ * Recursive function which scans the colormap and updates color attributes in
+ * the ColorInfo structure. The colormap is a tree which maps characters to
+ * colors. The tree has 4 levels, each corresponding to one byte of a character.
+ * Non-leaf nodes of the tree contain pointers to descendant nodes for each
+ * byte value. Leaf nodes of the tree contain color numbers for each byte value.
+ * Potentially the colormap could be very large, but most of it points to the
+ * zero color. That's why tree nodes which correspond only to the zero color
+ * can be shared.
+ */
+static void
+scanColorMap(union tree tree[NBYTS], union tree *t, ColorInfo *colorInfos,
+ int level, pg_wchar p)
+{
+ int i;
+
+ check_stack_depth();
+
+ if (level < NBYTS - 1)
+ {
+ /* non-leaf node */
+ for (i = 0; i < BYTTAB; i++)
+ {
+ /*
+ * This condition checks whether all underlying levels express the
+ * zero color. The zero color uses multiple links to the same tree
+ * node, so skip it: scanning it would be expensive.
+ */
+ if (t->tptr[i] == &tree[level + 1])
+ continue;
+ /* Recursive scanning of next level color table */
+ scanColorMap(tree, t->tptr[i], colorInfos, level + 1, (p << 8) | i);
+ }
+ }
+ else
+ {
+ /* leaf node */
+ p <<= 8;
+ for (i = 0; i < BYTTAB; i++)
+ {
+ ColorInfo *colorInfo = &colorInfos[t->tcolor[i]];
+ pg_wchar c;
+
+ if (!colorInfo->expandable)
+ continue;
+
+ /* Convert to multibyte character */
+ c = convertPgWchar(p | i);
+
+ if (!c)
+ continue;
+
+ /* Update color attributes according to next character */
+ if (ISWORDCHR((char *)&c))
+ addCharToColor(colorInfo, c);
+ else
+ colorInfo->containNonAlpha = true;
+ }
+ }
+}
+
+/*
+ * Fill ColorInfo structure for each color by scanning colormap.
+ */
+static ColorInfo *
+getColorInfo(regex_t *regex, int *ncolors)
+{
+ struct guts *g;
+ struct colormap *cm;
+ ColorInfo *result;
+ int colorsCount;
+ int i;
+
+ g = (struct guts *) regex->re_guts;
+ cm = &g->cmap;
+ *ncolors = colorsCount = cm->max + 1;
+
+ result = (ColorInfo *) palloc0(colorsCount * sizeof(ColorInfo));
+
+ /*
+ * Zero color is a default color which contains all characters that aren't
+ * in explicitly expressed classes. Mark that we can expect everything
+ * from it.
+ */
+ result[0].expandable = false;
+ for (i = 1; i < colorsCount; i++)
+ result[i].expandable = true;
+
+ scanColorMap(cm->tree, &cm->tree[0], result, 0, 0);
+
+ return result;
+}
+
+/*---------------------
+ * Subroutines for transforming original CNFA graph into a color trigram
+ * graph.
+ *---------------------
+ */
+
+/*
+ * Check if prefix1 "contains" prefix2. "Contains" means that any exact prefix
+ * (one with no ambiguity) which satisfies prefix2 also satisfies prefix1.
+ */
+static bool
+prefixContains(TrgmPrefix *prefix1, TrgmPrefix *prefix2)
+{
+ if (prefix1->s[1] == UNKNOWN_COLOR)
+ {
+ /* Fully ambiguous prefix contains everything */
+ return true;
+ }
+ else if (prefix1->s[0] == UNKNOWN_COLOR)
+ {
+ /*
+ * Prefix with only first unknown color contains every prefix with same
+ * second color.
+ */
+ if (prefix1->s[1] == prefix2->s[1])
+ return true;
+ else
+ return false;
+ }
+ else
+ {
+ /* Exact prefix contains only exactly same prefix */
+ if (prefix1->s[0] == prefix2->s[0] && prefix1->s[1] == prefix2->s[1])
+ return true;
+ else
+ return false;
+ }
+}
+
+/*
+ * Add all keys that can be reached without reading any color trigram to
+ * state of CNFA-like graph on color trigrams.
+ */
+static void
+addKeys(TrgmCNFA *trgmCNFA, TrgmState *state, TrgmStateKey *key)
+{
+ struct carc *s;
+ TrgmStateKey destKey;
+ ListCell *cell, *prev, *next;
+ TrgmStateKey *keyCopy;
+
+ MemSet(&destKey, 0, sizeof(TrgmStateKey));
+
+ /* Adjust list of keys with new one */
+ prev = NULL;
+ cell = list_head(state->keys);
+ while (cell)
+ {
+ TrgmStateKey *existingKey = (TrgmStateKey *) lfirst(cell);
+ next = lnext(cell);
+ if (existingKey->nstate == key->nstate)
+ {
+ if (prefixContains(&existingKey->prefix, &key->prefix))
+ {
+ /* This old key already covers the new key. Nothing to do */
+ return;
+ }
+ if (prefixContains(&key->prefix, &existingKey->prefix))
+ {
+ /*
+ * The new key covers this old key. Remove the old key, it's
+ * no longer needed once we add this key to the list.
+ */
+ state->keys = list_delete_cell(state->keys, cell, prev);
+ }
+ else
+ prev = cell;
+ }
+ else
+ prev = cell;
+ cell = next;
+ }
+
+ keyCopy = (TrgmStateKey *) palloc(sizeof(TrgmStateKey));
+ memcpy(keyCopy, key, sizeof(TrgmStateKey));
+ state->keys = lappend(state->keys, keyCopy);
+
+ /* Mark final state */
+ if (key->nstate == trgmCNFA->cnfa->post)
+ {
+ state->fin = true;
+ return;
+ }
+
+ /*
+ * Loop through all outgoing arcs from the corresponding state in the
+ * original CNFA.
+ */
+ s = trgmCNFA->cnfa->states[key->nstate];
+ while (s->co != COLORLESS)
+ {
+ ColorInfo *colorInfo;
+
+ if (s->co == trgmCNFA->cnfa->bos[1])
+ {
+ /* Start of line (^) */
+ destKey.nstate = s->to;
+
+ /* Mark prefix as start of new color trigram */
+ destKey.prefix.s[0] = EMPTY_COLOR;
+ destKey.prefix.s[1] = EMPTY_COLOR;
+
+ /* Add key to this state */
+ addKeys(trgmCNFA, state, &destKey);
+ if (state->fin)
+ return;
+ }
+ else if (s->co == trgmCNFA->cnfa->eos[1])
+ {
+ /* End of string ($) */
+ if (key->prefix.s[0] == UNKNOWN_COLOR ||
+ key->prefix.s[1] == EMPTY_COLOR)
+ {
+ destKey.nstate = s->to;
+
+ /*
+ * Treat the prefix as ambiguous (in order to prevent later
+ * fiddling around with keys).
+ */
+ destKey.prefix.s[1] = UNKNOWN_COLOR;
+ destKey.prefix.s[0] = UNKNOWN_COLOR;
+
+ /* Prefix is ambiguous, add key to the same state */
+ addKeys(trgmCNFA, state, &destKey);
+ if (state->fin)
+ return;
+ }
+ }
+ else
+ {
+ /* Regular color */
+ colorInfo = &trgmCNFA->colorInfo[s->co];
+
+ if (colorInfo->expandable)
+ {
+ if (colorInfo->containNonAlpha &&
+ (key->prefix.s[0] == UNKNOWN_COLOR ||
+ key->prefix.s[1] == EMPTY_COLOR))
+ {
+ /*
+ * When the color contains a non-alphanumeric character, we
+ * should add a key with an empty prefix.
+ */
+ destKey.prefix.s[0] = EMPTY_COLOR;
+ destKey.prefix.s[1] = EMPTY_COLOR;
+ destKey.nstate = s->to;
+ addKeys(trgmCNFA, state, &destKey);
+ if (state->fin)
+ return;
+ }
+
+ if (colorInfo->alphaCharsCount > 0 &&
+ key->prefix.s[0] == UNKNOWN_COLOR)
+ {
+ /* Add corresponding key when prefix was unknown */
+ destKey.prefix.s[0] = key->prefix.s[1];
+ destKey.prefix.s[1] = s->co;
+ destKey.nstate = s->to;
+ addKeys(trgmCNFA, state, &destKey);
+ if (state->fin)
+ return;
+ }
+ }
+ else
+ {
+ /*
+ * Unexpandable color. Add corresponding key to this state.
+ */
+ destKey.nstate = s->to;
+ destKey.prefix.s[0] = UNKNOWN_COLOR;
+ destKey.prefix.s[1] = UNKNOWN_COLOR;
+ addKeys(trgmCNFA, state, &destKey);
+ if (state->fin)
+ return;
+ }
+ }
+ s++;
+ }
+}
+
+/*
+ * Add outgoing arc from state if needed.
+ */
+static void
+addArc(TrgmCNFA *trgmCNFA, TrgmState *state, TrgmStateKey *key,
+ TrgmStateKey *destKey, color co)
+{
+ TrgmArc *arc;
+ ListCell *cell2;
+
+ /* Check whether we actually can add the arc */
+ if (key->prefix.s[0] == UNKNOWN_COLOR ||
+ (key->prefix.s[1] == EMPTY_COLOR && co == EMPTY_COLOR))
+ return;
+
+ /* If we have the same key here, we don't need to add new arc */
+ foreach(cell2, state->keys)
+ {
+ TrgmStateKey *key2 = (TrgmStateKey *) lfirst(cell2);
+ if (key2->nstate == destKey->nstate &&
+ prefixContains(&key2->prefix, &destKey->prefix))
+ {
+ return;
+ }
+ }
+
+ /* Not found, add new arc */
+ arc = (TrgmArc *) palloc(sizeof(TrgmArc));
+ arc->target = getState(trgmCNFA, destKey);
+ arc->trgm[0] = key->prefix.s[0];
+ arc->trgm[1] = key->prefix.s[1];
+ arc->trgm[2] = co;
+ state->arcs = lappend(state->arcs, arc);
+ trgmCNFA->arcsCount++;
+}
+
+/*
+ * Add outgoing arcs from given state.
+ */
+static void
+addArcs(TrgmCNFA *trgmCNFA, TrgmState *state)
+{
+ struct carc *s;
+ TrgmStateKey destKey;
+ ListCell *cell;
+
+ MemSet(&destKey, 0, sizeof(TrgmStateKey));
+
+ /*
+ * Iterate over keys associated with this output state.
+ */
+ foreach(cell, state->keys)
+ {
+ TrgmStateKey *key = (TrgmStateKey *) lfirst(cell);
+ s = trgmCNFA->cnfa->states[key->nstate];
+ while (s->co != COLORLESS)
+ {
+ ColorInfo *colorInfo;
+ if (s->co == trgmCNFA->cnfa->bos[1])
+ {
+ /* Should be already handled by addKeys. */
+ }
+ else if (s->co == trgmCNFA->cnfa->eos[1])
+ {
+ /* End of the string ($) */
+ destKey.nstate = s->to;
+
+ /* Assume prefix to become ambiguous after end of the string */
+ destKey.prefix.s[1] = UNKNOWN_COLOR;
+ destKey.prefix.s[0] = UNKNOWN_COLOR;
+
+ addArc(trgmCNFA, state, key, &destKey, EMPTY_COLOR);
+ }
+ else
+ {
+ /* Regular color: try to add outgoing arcs */
+ colorInfo = &trgmCNFA->colorInfo[s->co];
+
+ if (colorInfo->expandable)
+ {
+ if (colorInfo->containNonAlpha)
+ {
+ destKey.prefix.s[0] = EMPTY_COLOR;
+ destKey.prefix.s[1] = EMPTY_COLOR;
+ destKey.nstate = s->to;
+ addArc(trgmCNFA, state, key, &destKey, EMPTY_COLOR);
+ }
+
+ if (colorInfo->alphaCharsCount > 0)
+ {
+ destKey.prefix.s[0] = key->prefix.s[1];
+ destKey.prefix.s[1] = s->co;
+ destKey.nstate = s->to;
+ addArc(trgmCNFA, state, key, &destKey, s->co);
+ }
+ }
+ }
+ s++;
+ }
+ }
+}
+
+/*
+ * Get the state of the trigram CNFA-like graph for the given enter key, and
+ * queue it for processing if it is new.
+ */
+static TrgmState *
+getState(TrgmCNFA *trgmCNFA, TrgmStateKey *key)
+{
+ bool found;
+ TrgmState *state;
+
+ state = hash_search(trgmCNFA->states, key, HASH_ENTER, &found);
+ if (!found)
+ {
+ /* New state: initialize and queue it */
+ state->arcs = NIL;
+ state->keys = NIL;
+ state->init = false;
+ state->fin = false;
+ state->children = NIL;
+ state->parent = NULL;
+ state->number = -1;
+
+ state->queued = true;
+ trgmCNFA->queue = lappend(trgmCNFA->queue, state);
+ }
+ return state;
+}
+
+/*
+ * Process the state: add keys and then add outgoing arcs.
+ */
+static void
+processState(TrgmCNFA *trgmCNFA, TrgmState *state)
+{
+ addKeys(trgmCNFA, state, &state->enterKey);
+ /*
+ * Add outgoing arc only if state isn't final (we have no interest in arc
+ * if we already match)
+ */
+ if (!state->fin)
+ addArcs(trgmCNFA, state);
+
+ state->queued = false;
+}
+
+/*
+ * Process queue of CNFA-like graph states until all the states are processed.
+ * This algorithm may be stopped due to resource limits. In this case we
+ * assume every unprocessed branch to immediately finish with matching (this
+ * can give us more false positives but no false negatives) by marking all
+ * unprocessed states as final.
+ */
+static void
+transformGraph(TrgmCNFA *trgmCNFA)
+{
+ HASHCTL hashCtl;
+ TrgmStateKey initkey;
+ TrgmState *initstate;
+
+ trgmCNFA->queue = NIL;
+ trgmCNFA->arcsCount = 0;
+ trgmCNFA->overflowed = false;
+
+ /* Initialize hash of states */
+ hashCtl.keysize = sizeof(TrgmStateKey);
+ hashCtl.entrysize = sizeof(TrgmState);
+ hashCtl.hcxt = CurrentMemoryContext;
+ hashCtl.hash = tag_hash;
+ hashCtl.match = memcmp;
+ trgmCNFA->states =
+ hash_create("Trigram CNFA",
+ 1024,
+ &hashCtl,
+ HASH_ELEM | HASH_CONTEXT | HASH_FUNCTION | HASH_COMPARE);
+
+ /* Create initial state */
+ MemSet(&initkey, 0, sizeof(TrgmStateKey));
+ initkey.prefix.s[0] = UNKNOWN_COLOR;
+ initkey.prefix.s[1] = UNKNOWN_COLOR;
+ initkey.nstate = trgmCNFA->cnfa->pre;
+
+ initstate = getState(trgmCNFA, &initkey);
+ initstate->init = true;
+ trgmCNFA->initState = initstate;
+
+ /*
+ * Build the transformed graph iteratively by processing the queue of
+ * states (breadth-first search).
+ */
+ while (trgmCNFA->queue != NIL)
+ {
+ TrgmState *state = (TrgmState *) linitial(trgmCNFA->queue);
+
+ state->queued = false;
+ trgmCNFA->queue = list_delete_first(trgmCNFA->queue);
+
+ /*
+ * If we overflow then just mark state as final. Otherwise do actual
+ * processing.
+ */
+ if (trgmCNFA->overflowed)
+ state->fin = true;
+ else
+ processState(trgmCNFA, state);
+
+ /* Did we overflow? */
+ if (trgmCNFA->arcsCount > MAX_RESULT_ARCS ||
+ hash_get_num_entries(trgmCNFA->states) > MAX_RESULT_STATES)
+ {
+ trgmCNFA->overflowed = true;
+ }
+ }
+}
+
+
+/*---------------------
+ * Subroutines for expanding color trigrams into regular trigrams.
+ *---------------------
+ */
+
+/*
+ * Compare function for sorting of color trigrams by their colors.
+ */
+static int
+colorTrgmInfoCmp(const void *p1, const void *p2)
+{
+ const ColorTrgmInfo *c1 = (const ColorTrgmInfo *)p1;
+ const ColorTrgmInfo *c2 = (const ColorTrgmInfo *)p2;
+ return memcmp(c1->trgm, c2->trgm, sizeof(ColorTrgm));
+}
+
+/*
+ * Comparison function for sorting color trigrams in descending order of
+ * their simple trigram counts.
+ */
+static int
+colorTrgmInfoCountCmp(const void *p1, const void *p2)
+{
+ const ColorTrgmInfo *c1 = (const ColorTrgmInfo *)p1;
+ const ColorTrgmInfo *c2 = (const ColorTrgmInfo *)p2;
+ if (c1->count < c2->count)
+ return 1;
+ else if (c1->count == c2->count)
+ return 0;
+ else
+ return -1;
+}
+
+/*
+ * Convert trigram into trgm datatype.
+ */
+static void
+fillTrgm(trgm *ptrgm, mb_char s[3])
+{
+ char text[14],
+ *p;
+ int i;
+
+ /* Write multibyte string into "text". */
+ p = text;
+ for (i = 0; i < 3; i++)
+ {
+ int len;
+ if (s[i] != 0)
+ {
+ len = strnlen((char *) &s[i], 4);
+ memcpy(p, &s[i], len);
+ p += len;
+ }
+ else
+ {
+ *p++ = ' ';
+ }
+ }
+ *p = 0;
+
+ /* Extract trigrams from "text" */
+ cnt_trigram(ptrgm, text, p - text);
+}
+
+/*
+ * Merge states of graph.
+ */
+static void
+mergeStates(TrgmState *state1, TrgmState *state2)
+{
+ ListCell *cell;
+
+ Assert(!state1->parent);
+ Assert(!state2->parent);
+ Assert(state1 != state2);
+
+ foreach(cell, state2->children)
+ {
+ TrgmState *state = (TrgmState *) lfirst(cell);
+ state->parent = state1;
+ }
+ state1->init = state1->init || state2->init;
+ state1->fin = state1->fin || state2->fin;
+ state2->parent = state1;
+ state1->children = list_concat(state1->children, state2->children);
+ state1->children = lappend(state1->children, state2);
+ state2->children = NIL;
+}
+
+/*
+ * Get vector of all color trigrams in graph and select which of them to expand
+ * into simple trigrams.
+ */
+static bool
+selectColorTrigrams(TrgmCNFA *trgmCNFA)
+{
+ HASH_SEQ_STATUS scan_status;
+ int arcsCount = trgmCNFA->arcsCount,
+ i;
+ TrgmState *state;
+ ColorTrgmInfo *p1, *p2, *colorTrgms;
+ int totalTrgmCount, number;
+
+ /* Collect color trigrams from all arcs */
+ colorTrgms = (ColorTrgmInfo *) palloc(sizeof(ColorTrgmInfo) * arcsCount);
+ i = 0;
+ hash_seq_init(&scan_status, trgmCNFA->states);
+ while ((state = (TrgmState *) hash_seq_search(&scan_status)) != NULL)
+ {
+ ListCell *cell;
+ foreach(cell, state->arcs)
+ {
+ TrgmArc *arc = (TrgmArc *) lfirst(cell);
+ ArcInfo *arcInfo = (ArcInfo *) palloc(sizeof(ArcInfo));
+ arcInfo->source = state;
+ arcInfo->target = arc->target;
+ colorTrgms[i].arcs = list_make1(arcInfo);
+ colorTrgms[i].expanded = true;
+ memcpy(&colorTrgms[i].trgm, &arc->trgm, sizeof(ColorTrgm));
+ i++;
+ }
+ }
+ Assert(i == arcsCount);
+
+ /* Remove duplicates and merge arcs lists */
+ if (arcsCount >= 2)
+ {
+ qsort(colorTrgms, arcsCount, sizeof(ColorTrgmInfo), colorTrgmInfoCmp);
+ p1 = colorTrgms + 1;
+ p2 = colorTrgms;
+ while (p1 - colorTrgms < arcsCount)
+ {
+ if (memcmp(p1->trgm, p2->trgm, sizeof(ColorTrgm)) != 0)
+ {
+ p2++;
+ *p2 = *p1;
+ }
+ else
+ {
+ p2->arcs = list_concat(p2->arcs, p1->arcs);
+ }
+ p1++;
+ }
+ trgmCNFA->colorTrgmsCount = p2 + 1 - colorTrgms;
+ }
+ else
+ {
+ trgmCNFA->colorTrgmsCount = arcsCount;
+ }
+
+ /* Count number of simple trigrams in each color trigram */
+ totalTrgmCount = 0;
+ for (i = 0; i < trgmCNFA->colorTrgmsCount; i++)
+ {
+ int j, count = 1;
+ ColorTrgmInfo *trgmInfo = &colorTrgms[i];
+ for (j = 0; j < 3; j++)
+ {
+ /* FIXME: count could overflow */
+ if (trgmInfo->trgm[j] != EMPTY_COLOR)
+ count *= trgmCNFA->colorInfo[trgmInfo->trgm[j]].alphaCharsCount;
+ }
+ trgmInfo->count = count;
+ totalTrgmCount += count;
+ }
+
+ /* Sort color trigrams in descending order of their simple trigram counts */
+ qsort(colorTrgms, trgmCNFA->colorTrgmsCount, sizeof(ColorTrgmInfo),
+ colorTrgmInfoCountCmp);
+ /*
+ * Remove color trigrams from the graph until the total number of simple
+ * trigrams no longer exceeds MAX_TRGM_COUNT. We start with the largest color
+ * trigrams, which are the most promising for reducing the total number of
+ * simple trigrams. When removing a color trigram we have to merge the states
+ * connected by the corresponding arcs. We must not merge the initial and
+ * final states, because the graph becomes useless in that case.
+ */
+ for (i = 0;
+ (i < trgmCNFA->colorTrgmsCount) && (totalTrgmCount > MAX_TRGM_COUNT);
+ i++)
+ {
+ ColorTrgmInfo *trgmInfo = &colorTrgms[i];
+ bool canRemove = true;
+ ListCell *cell;
+
+ /*
+ * Does any arc with this color trigram connect the initial and final
+ * states?
+ */
+ foreach(cell, trgmInfo->arcs)
+ {
+ ArcInfo *arcInfo = (ArcInfo *) lfirst(cell);
+ TrgmState *source = arcInfo->source, *target = arcInfo->target;
+ while (source->parent)
+ source = source->parent;
+ while (target->parent)
+ target = target->parent;
+ if ((source->init || target->init) && (source->fin || target->fin))
+ {
+ canRemove = false;
+ break;
+ }
+ }
+ if (!canRemove)
+ continue;
+
+ /* Merge states */
+ foreach(cell, trgmInfo->arcs)
+ {
+ ArcInfo *arcInfo = (ArcInfo *) lfirst(cell);
+ TrgmState *source = arcInfo->source, *target = arcInfo->target;
+ while (source->parent)
+ source = source->parent;
+ while (target->parent)
+ target = target->parent;
+ if (source != target)
+ mergeStates(source, target);
+ }
+
+ trgmInfo->expanded = false;
+ totalTrgmCount -= trgmInfo->count;
+ }
+ trgmCNFA->totalTrgmCount = totalTrgmCount;
+
+ /* Did we succeed in fitting into MAX_TRGM_COUNT? */
+ if (totalTrgmCount > MAX_TRGM_COUNT)
+ return false;
+
+ /*
+ * Sort color trigrams by colors (will be useful for bsearch) and enumerate
+ * expanded color trigrams.
+ */
+ number = 0;
+ qsort(colorTrgms, trgmCNFA->colorTrgmsCount, sizeof(ColorTrgmInfo),
+ colorTrgmInfoCmp);
+ for (i = 0; i < trgmCNFA->colorTrgmsCount; i++)
+ {
+ if (colorTrgms[i].expanded)
+ {
+ colorTrgms[i].number = number;
+ number++;
+ }
+ }
+
+ trgmCNFA->colorTrgms = colorTrgms;
+ return true;
+}
+
+/*
+ * Expand selected color trigrams into regular trigrams.
+ */
+static TRGM *
+expandColorTrigrams(TrgmCNFA *trgmCNFA)
+{
+ TRGM *trg;
+ trgm *p;
+ int i;
+ ColorInfo emptyColor;
+ mb_char emptyChar = 0;
+
+ /* Virtual "empty" color structure for representing zero character */
+ emptyColor.alphaCharsCount = 1;
+ emptyColor.alphaChars = &emptyChar;
+
+ trg = (TRGM *) palloc0(TRGMHDRSIZE +
+ trgmCNFA->totalTrgmCount * sizeof(trgm));
+ trg->flag = ARRKEY;
+ SET_VARSIZE(trg, CALCGTSIZE(ARRKEY, trgmCNFA->totalTrgmCount));
+ p = GETARR(trg);
+ for (i = 0; i < trgmCNFA->colorTrgmsCount; i++)
+ {
+ mb_char s[3];
+ ColorTrgmInfo *colorTrgm = &trgmCNFA->colorTrgms[i];
+ ColorInfo *c[3];
+ int j, i1, i2, i3;
+
+ if (!colorTrgm->expanded)
+ continue;
+
+ /* Get colors */
+ for (j = 0; j < 3; j++)
+ {
+ if (colorTrgm->trgm[j] != EMPTY_COLOR)
+ c[j] = &trgmCNFA->colorInfo[colorTrgm->trgm[j]];
+ else
+ c[j] = &emptyColor;
+ }
+
+ /* Iterate over all possible combinations of color characters */
+ for (i1 = 0; i1 < c[0]->alphaCharsCount; i1++)
+ {
+ s[0] = c[0]->alphaChars[i1];
+ for (i2 = 0; i2 < c[1]->alphaCharsCount; i2++)
+ {
+ s[1] = c[1]->alphaChars[i2];
+ for (i3 = 0; i3 < c[2]->alphaCharsCount; i3++)
+ {
+ s[2] = c[2]->alphaChars[i3];
+ fillTrgm(p, s);
+ p++;
+ }
+ }
+ }
+ }
+ return trg;
+}
+
+/*---------------------
+ * Subroutines for packing the graph into final representation
+ *---------------------
+ */
+
+/*
+ * Temporary structure for representing an arc during packing.
+ */
+typedef struct
+{
+ int sourceState;
+ int targetState;
+ int colorTrgm;
+} PackArcInfo;
+
+/*
+ * Comparison function for sorting arcs during packing. Compares arcs in the
+ * following order: sourceState, colorTrgm, targetState.
+ */
+static int
+packArcInfoCmp(const void *a1, const void *a2)
+{
+ const PackArcInfo *p1 = (const PackArcInfo *) a1;
+ const PackArcInfo *p2 = (const PackArcInfo *) a2;
+
+ if (p1->sourceState < p2->sourceState)
+ return -1;
+ if (p1->sourceState > p2->sourceState)
+ return 1;
+ if (p1->colorTrgm < p2->colorTrgm)
+ return -1;
+ if (p1->colorTrgm > p2->colorTrgm)
+ return 1;
+ if (p1->targetState < p2->targetState)
+ return -1;
+ if (p1->targetState > p2->targetState)
+ return 1;
+ return 0;
+}
+
+/*
+ * Pack resulting graph into final representation.
+ */
+static PackedGraph *
+packGraph(TrgmCNFA *trgmCNFA, MemoryContext context)
+{
+ int number = 2,
+ arcIndex,
+ arcsCount;
+ HASH_SEQ_STATUS scan_status;
+ TrgmState *state;
+ PackArcInfo *arcs, *p1, *p2;
+ PackedArc *packedArcs;
+ PackedGraph *result;
+ int i,
+ j;
+
+ /* Enumerate states */
+ hash_seq_init(&scan_status, trgmCNFA->states);
+ while ((state = (TrgmState *) hash_seq_search(&scan_status)) != NULL)
+ {
+ while (state->parent)
+ state = state->parent;
+
+ if (state->number < 0)
+ {
+ if (state->init)
+ state->number = 0;
+ else if (state->fin)
+ state->number = 1;
+ else
+ {
+ state->number = number;
+ number++;
+ }
+ }
+ }
+
+ /* Collect array of all arcs */
+ arcs = (PackArcInfo *) palloc(sizeof(PackArcInfo) * trgmCNFA->arcsCount);
+ arcIndex = 0;
+ hash_seq_init(&scan_status, trgmCNFA->states);
+ while ((state = (TrgmState *) hash_seq_search(&scan_status)) != NULL)
+ {
+ TrgmState *source = state;
+ ListCell *cell;
+
+ while (source->parent)
+ source = source->parent;
+
+ foreach(cell, state->arcs)
+ {
+ TrgmArc *arc = (TrgmArc *) lfirst(cell);
+ TrgmState *target = arc->target;
+
+ while (target->parent)
+ target = target->parent;
+
+ if (source->number != target->number)
+ {
+ arcs[arcIndex].sourceState = source->number;
+ arcs[arcIndex].targetState = target->number;
+ arcs[arcIndex].colorTrgm = ((ColorTrgmInfo *)bsearch(&arc->trgm,
+ trgmCNFA->colorTrgms, trgmCNFA->colorTrgmsCount,
+ sizeof(ColorTrgmInfo), colorTrgmInfoCmp))->number;
+
+ arcIndex++;
+ }
+ }
+ }
+ qsort(arcs, arcIndex, sizeof(PackArcInfo), packArcInfoCmp);
+
+ /* We could have duplicates because states were merged. Remove them. */
+ p1 = p2 = arcs;
+ while (p1 < arcs + arcIndex)
+ {
+ if (packArcInfoCmp(p1, p2) > 0)
+ {
+ p2++;
+ *p2 = *p1;
+ }
+ p1++;
+ }
+ arcsCount = (arcIndex > 0) ? (p2 - arcs + 1) : 0;
+
+ /* Write packed representation */
+ result = (PackedGraph *) MemoryContextAlloc(context, sizeof(PackedGraph));
+
+ /* Pack color trigrams information */
+ result->colorTrigramsCount = 0;
+ for (i = 0; i < trgmCNFA->colorTrgmsCount; i++)
+ if (trgmCNFA->colorTrgms[i].expanded)
+ result->colorTrigramsCount++;
+ result->colorTrigramGroups = (int *) MemoryContextAlloc(context,
+ sizeof(int) * result->colorTrigramsCount);
+ j = 0;
+ for (i = 0; i < trgmCNFA->colorTrgmsCount; i++)
+ {
+ if (trgmCNFA->colorTrgms[i].expanded)
+ {
+ result->colorTrigramGroups[j] = trgmCNFA->colorTrgms[i].count;
+ j++;
+ }
+ }
+
+ /* Pack states and arcs information */
+ result->states = (PackedState *) MemoryContextAlloc(context,
+ number * sizeof(PackedState));
+ result->statesCount = number;
+ packedArcs = (PackedArc *) MemoryContextAlloc(context,
+ arcsCount * sizeof(PackedArc));
+ j = 0;
+ for (i = 0; i < number; i++)
+ {
+ int cnt = 0;
+ result->states[i].arcs = &packedArcs[j];
+ while (j < arcsCount && arcs[j].sourceState == i)
+ {
+ packedArcs[j].targetState = arcs[j].targetState;
+ packedArcs[j].colorTrgm = arcs[j].colorTrgm;
+ cnt++;
+ j++;
+ }
+ result->states[i].arcsCount = cnt;
+ }
+
+ /* Allocate memory for evaluation */
+ result->colorTrigramsActive = (bool *) MemoryContextAlloc(context,
+ sizeof(bool) * result->colorTrigramsCount);
+ result->statesActive = (bool *) MemoryContextAlloc(context,
+ sizeof(bool) * result->statesCount);
+
+ return result;
+}
+
+
+/*---------------------
+ * Debugging functions
+ *---------------------
+ */
+
+#ifdef TRGM_REGEXP_DEBUG
+/*
+ * Print initial CNFA, in regexp library's representation
+ */
+static void
+printSourceCNFA(struct cnfa *cnfa, ColorInfo *colors, int ncolors)
+{
+ int state;
+ StringInfoData buf;
+ int i;
+
+ initStringInfo(&buf);
+
+ appendStringInfo(&buf, "\ndigraph sourceCNFA {\n");
+
+ for (state = 0; state < cnfa->nstates; state++)
+ {
+ struct carc *s = cnfa->states[state];
+
+ appendStringInfo(&buf, "s%d", state);
+ if (cnfa->post == state)
+ appendStringInfo(&buf, " [shape = doublecircle]");
+ appendStringInfo(&buf, ";\n");
+
+ while (s->co != COLORLESS)
+ {
+ appendStringInfo(&buf, " s%d -> s%d [label = \"%d\"];\n",
+ state, s->to, s->co);
+ s++;
+ }
+ }
+
+ appendStringInfo(&buf, " node [shape = point ]; initial;\n");
+ appendStringInfo(&buf, " initial -> s%d;\n", cnfa->pre);
+
+ /* Print colors */
+ appendStringInfo(&buf, " { rank = sink;\n");
+ appendStringInfo(&buf, " Colors [shape = none, margin=0, label=<\n");
+
+ for (i = 0; i < ncolors; i++)
+ {
+ ColorInfo *color = &colors[i];
+ int j;
+
+ appendStringInfo(&buf, "<br/>Color %d: ", i);
+
+ if (!color->expandable)
+ appendStringInfo(&buf, "not expandable");
+ else
+ {
+ for (j = 0; j < color->alphaCharsCount; j++)
+ {
+ char s[5];
+ memcpy(s, (char *) &color->alphaChars[j], 4);
+ s[4] = '\0';
+ appendStringInfo(&buf, "%s", s);
+ }
+ }
+ appendStringInfo(&buf, "\n");
+ }
+
+ appendStringInfo(&buf, " >];\n");
+ appendStringInfo(&buf, " }\n");
+
+
+ appendStringInfo(&buf, "}\n");
+
+ {
+ /* dot -Tpng -o /tmp/source.png < /tmp/source.dot */
+ FILE *fp = fopen("/tmp/source.dot", "w");
+ fprintf(fp, "%s", buf.data);
+ fclose(fp);
+ }
+
+ pfree(buf.data);
+}
+
+/*
+ * Print trigram-based CNFA-like graph.
+ */
+static void
+printTrgmCNFA(TrgmCNFA *trgmCNFA)
+{
+ StringInfoData buf;
+ HASH_SEQ_STATUS scan_status;
+ TrgmState *state;
+ TrgmState *initstate = NULL;
+
+ initStringInfo(&buf);
+
+ appendStringInfo(&buf, "\ndigraph transformedCNFA {\n");
+
+ hash_seq_init(&scan_status, trgmCNFA->states);
+ while ((state = (TrgmState *) hash_seq_search(&scan_status)) != NULL)
+ {
+ ListCell *cell;
+
+ appendStringInfo(&buf, "s%p", state);
+ if (state->fin)
+ appendStringInfo(&buf, " [shape = doublecircle]");
+ if (state->init)
+ initstate = state;
+ appendStringInfo(&buf, " [label = \"%d\"]", state->enterKey.nstate);
+ appendStringInfo(&buf, ";\n");
+
+ foreach(cell, state->arcs)
+ {
+ TrgmArc *arc = (TrgmArc *) lfirst(cell);
+
+ appendStringInfo(&buf, " s%p -> s%p [label = \"%d %d %d\"];\n",
+ (void *) state, (void *) arc->target,
+ (uint32) arc->trgm[0], (uint32) arc->trgm[1], (uint32) arc->trgm[2]);
+ }
+ }
+
+ if (initstate)
+ {
+ appendStringInfo(&buf, " node [shape = point ]; initial;\n");
+ appendStringInfo(&buf, " initial -> s%p;\n", (void *) initstate);
+ }
+ appendStringInfo(&buf, "}\n");
+
+ {
+ /* dot -Tpng -o /tmp/transformed.png < /tmp/transformed.dot */
+ FILE *fp = fopen("/tmp/transformed.dot", "w");
+ fprintf(fp, "%s", buf.data);
+ fclose(fp);
+ }
+
+ pfree(buf.data);
+}
+
+/*
+ * Print final packed representation of resulting trigram-based CNFA-like graph.
+ */
+static void
+printPackedGraph(PackedGraph *packedGraph, TRGM *trigrams)
+{
+ int i;
+ trgm *p;
+ StringInfoData buf;
+ initStringInfo(&buf);
+
+ appendStringInfo(&buf, "\ndigraph packedGraph {\n");
+
+ for (i = 0; i < packedGraph->statesCount; i++)
+ {
+ PackedState *state = &packedGraph->states[i];
+ int j;
+
+ appendStringInfo(&buf, " s%d", i);
+ if (i == 1)
+ appendStringInfo(&buf, " [shape = doublecircle]");
+
+ appendStringInfo(&buf, " [label = <s%d>];\n", i);
+
+ for (j = 0; j < state->arcsCount; j++)
+ {
+ PackedArc *arc = &state->arcs[j];
+ appendStringInfo(&buf, " s%d -> s%d [label = \"trigram %d\"];\n",
+ i, arc->targetState, arc->colorTrgm);
+ }
+ }
+
+ appendStringInfo(&buf, " node [shape = point ]; initial;\n");
+ appendStringInfo(&buf, " initial -> s%d;\n", 0);
+
+ /* Print trigrams */
+ appendStringInfo(&buf, " { rank = sink;\n");
+ appendStringInfo(&buf, " Trigrams [shape = none, margin=0, label=<\n");
+
+ p = GETARR(trigrams);
+ for (i = 0; i < packedGraph->colorTrigramsCount; i++)
+ {
+ int count = packedGraph->colorTrigramGroups[i];
+ int j;
+
+ appendStringInfo(&buf, "<br/>Trigram %d: ", i);
+
+ for (j = 0; j < count; j++)
+ {
+ if (j > 0)
+ appendStringInfo(&buf, ", ");
+ appendStringInfo(&buf, "\"%c%c%c\"", (*p)[0], (*p)[1], (*p)[2]);
+ p++;
+ }
+ }
+ appendStringInfo(&buf, " >];\n");
+ appendStringInfo(&buf, " }\n");
+
+ appendStringInfo(&buf, "}\n");
+
+ {
+ /* dot -Tpng -o /tmp/packed.png < /tmp/packed.dot */
+ FILE *fp = fopen("/tmp/packed.dot", "w");
+ fprintf(fp, "%s", buf.data);
+ fclose(fp);
+ }
+
+ pfree(buf.data);
+}
+#endif
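To make the evaluation step in trigramsMatchGraph() and activateState() above easier to follow, here is a minimal standalone sketch of the same idea. It is not code from the patch: the Sketch* types, the fixed-size work arrays and the demo graph are assumptions made up for illustration. Only the conventions are taken from the patch (state 0 is initial, state 1 is final, and check[] is laid out group by group, one group per color trigram).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_MAX_STATES 8
#define SKETCH_MAX_GROUPS 8

/* Hypothetical, simplified stand-ins for PackedArc/PackedState/PackedGraph. */
typedef struct { int targetState; int colorTrgm; } SketchArc;
typedef struct { int arcsCount; const SketchArc *arcs; } SketchState;
typedef struct
{
    int statesCount;
    const SketchState *states;
    int groupsCount;        /* number of color trigrams */
    const int *groupSizes;  /* simple trigrams per color trigram */
} SketchGraph;

/* Depth-first walk: follow only arcs whose color trigram is active. */
static bool
sketch_activate(const SketchGraph *g, const bool *groupActive,
                bool *visited, int stateno)
{
    const SketchState *st = &g->states[stateno];

    visited[stateno] = true;
    for (int i = 0; i < st->arcsCount; i++)
    {
        const SketchArc *arc = &st->arcs[i];

        if (!groupActive[arc->colorTrgm])
            continue;
        if (arc->targetState == 1)
            return true;        /* reached the final state */
        if (!visited[arc->targetState] &&
            sketch_activate(g, groupActive, visited, arc->targetState))
            return true;
    }
    return false;
}

/* A color trigram is active if any simple trigram of its group was found. */
static bool
sketch_match(const SketchGraph *g, const bool *check)
{
    bool groupActive[SKETCH_MAX_GROUPS] = {false};
    bool visited[SKETCH_MAX_STATES] = {false};
    int k = 0;

    assert(g->groupsCount <= SKETCH_MAX_GROUPS);
    assert(g->statesCount <= SKETCH_MAX_STATES);

    for (int i = 0; i < g->groupsCount; i++)
    {
        for (int j = 0; j < g->groupSizes[i]; j++)
            if (check[k + j])
                groupActive[i] = true;
        k += g->groupSizes[i];
    }
    return sketch_activate(g, groupActive, visited, 0);
}

/* Demo graph (made up): 0 --trigram 0--> 2 --trigram 1--> 1 */
static const SketchArc demoArcs0[] = {{2, 0}};
static const SketchArc demoArcs2[] = {{1, 1}};
static const SketchState demoStates[] = {
    {1, demoArcs0},     /* state 0: initial */
    {0, NULL},          /* state 1: final */
    {1, demoArcs2},     /* state 2: intermediate */
};
static const int demoGroupSizes[] = {2, 1};
static const SketchGraph demoGraph = {3, demoStates, 2, demoGroupSizes};
```

In the demo graph the first color trigram expands to two simple trigrams and the second to one, so check[] has three entries. A match needs at least one of the first two entries and the third entry to be true. As in the patch, a false result means the indexed value cannot match, while a true result still has to be rechecked against the heap tuple.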
diff --git a/doc/src/sgml/pgtrgm.sgml b/doc/src/sgml/pgtrgm.sgml
index 30e5355..c4105e1 100644
--- a/doc/src/sgml/pgtrgm.sgml
+++ b/doc/src/sgml/pgtrgm.sgml
@@ -145,9 +145,10 @@
operator classes that allow you to create an index over a text column for
the purpose of very fast similarity searches. These index types support
the above-described similarity operators, and additionally support
- trigram-based index searches for <literal>LIKE</> and <literal>ILIKE</>
- queries. (These indexes do not support equality nor simple comparison
- operators, so you may need a regular B-tree index too.)
+ trigram-based index searches for <literal>LIKE</>, <literal>ILIKE</>,
+ <literal>~</> and <literal>~*</> queries. (These indexes do not
+ support equality nor simple comparison operators, so you may need a
+ regular B-tree index too.)
</para>
<para>
@@ -203,6 +204,23 @@ SELECT * FROM test_trgm WHERE t LIKE '%foo%bar';
</para>
<para>
+ Beginning in <productname>PostgreSQL</> 9.3, these index types also support
+ index searches for <literal>~</> and <literal>~*</> (regular expression
+ matching), for example
+<programlisting>
+SELECT * FROM test_trgm WHERE t ~ '(foo|bar)';
+</programlisting>
+ The index search works by extracting trigrams from the regular expression
+ and then looking these up in the index. The more trigrams that can be
+ extracted from the regular expression, the more effective the index search
+ is. Unlike B-tree based searches, the search string need not be
+ left-anchored. However, a regular expression is sometimes too complex to
+ analyze; it is then treated the same as one from which no trigrams can be
+ extracted. In that situation a full index scan or a sequential scan will
+ be performed, depending on the query plan.
+ </para>
+
+ <para>
The choice between GiST and GIN indexing depends on the relative
performance characteristics of GiST and GIN, which are discussed elsewhere.
As a rule of thumb, a GIN index is faster to search than a GiST index, but
diff --git a/src/backend/utils/adt/regexp.c b/src/backend/utils/adt/regexp.c
index 6a89fab..e29e734 100644
--- a/src/backend/utils/adt/regexp.c
+++ b/src/backend/utils/adt/regexp.c
@@ -128,7 +128,7 @@ static Datum build_regexp_split_result(regexp_matches_ctx *splitctx);
* Pattern is given in the database encoding. We internally convert to
* an array of pg_wchar, which is what Spencer's regex package wants.
*/
-static regex_t *
+regex_t *
RE_compile_and_cache(text *text_re, int cflags, Oid collation)
{
int text_re_len = VARSIZE_ANY_EXHDR(text_re);
diff --git a/src/include/regex/regex.h b/src/include/regex/regex.h
index 616c2c6..7e19e8a 100644
--- a/src/include/regex/regex.h
+++ b/src/include/regex/regex.h
@@ -171,5 +171,6 @@ extern int pg_regprefix(regex_t *, pg_wchar **, size_t *);
extern void pg_regfree(regex_t *);
extern size_t pg_regerror(int, const regex_t *, char *, size_t);
extern void pg_set_regex_collation(Oid collation);
+extern regex_t *RE_compile_and_cache(text *text_re, int cflags, Oid collation);
#endif /* _REGEX_H_ */
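
For readers new to pg_trgm, the documentation hunk above says the index search "works by extracting trigrams ... and then looking these up in the index". Here is a minimal standalone sketch of that idea for a plain word. It is not the patch's automaton-based algorithm. The names sketch_word_trigrams and sketch_may_match are hypothetical and exist only for this illustration. I assume single-byte characters and the padding convention from the pg_trgm documentation (two blanks prefixed, one blank suffixed).

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define TRGM_LEN 3

/*
 * Hypothetical helper, not part of the patch: write the trigrams of one
 * word into out[] (each entry NUL-terminated) and return how many were
 * produced.  The word is lower-cased and padded with two leading blanks
 * and one trailing blank.  Duplicates are not removed.
 */
size_t
sketch_word_trigrams(const char *word, char out[][TRGM_LEN + 1],
					 size_t max_out)
{
	char		padded[256];
	size_t		len = strlen(word);
	size_t		i;
	size_t		n = 0;

	assert(out != NULL);

	/* Reject empty words and words that would overflow the buffer. */
	if (len == 0 || len + 3 >= sizeof(padded))
		return 0;

	padded[0] = padded[1] = ' ';
	for (i = 0; i < len; i++)
		padded[i + 2] = (char) tolower((unsigned char) word[i]);
	padded[len + 2] = ' ';
	padded[len + 3] = '\0';

	/* Slide a three-character window over the padded word. */
	for (i = 0; i + TRGM_LEN <= len + 3 && n < max_out; i++)
	{
		memcpy(out[n], padded + i, TRGM_LEN);
		out[n][TRGM_LEN] = '\0';
		n++;
	}
	return n;
}

/*
 * Hypothetical index-style filter: return 1 when every required trigram
 * occurs among the trigrams of the candidate word.  Passing the filter is
 * necessary but not sufficient, so a real scan still has to recheck the row
 * against the original expression.
 */
int
sketch_may_match(const char *candidate,
				 const char required[][TRGM_LEN + 1], size_t nrequired)
{
	char		have[256][TRGM_LEN + 1];
	size_t		nhave = sketch_word_trigrams(candidate, have, 256);
	size_t		i;
	size_t		j;

	for (i = 0; i < nrequired; i++)
	{
		for (j = 0; j < nhave; j++)
		{
			if (strcmp(required[i], have[j]) == 0)
				break;
		}
		if (j == nhave)
			return 0;			/* a required trigram is missing */
	}
	return 1;
}
```

With the single required trigram "foo", the candidate "seafood" passes the filter and "fob" does not. A candidate that passes still needs the recheck step, which is what "Rows Removed by Index Recheck" reports in the plan quoted earlier in this mail.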
Attachment: trgm-regexp-lcov-report.tar.gz (application/x-gzip)