Triggers and Full Text Search *
More than a year ago, I implemented full text search on one of my sites.
From the beginning, there was one problem (or at least, what I perceive to
be a problem): when I use a script to insert many documents at once, they
do *not* get indexed in fts. If a document is created or inserted one at a
time, fts indexes immediately. The workaround I came up with was just to
open each of those script inserted documents and then close them. As soon
as they are opened, they get indexed.
I assume this has to do with the trigger, which is set to BEFORE, and which
I carefully followed from the blog post that I got the code from. I wrote
to that author at the time, but he was of no help. My thought was that the
trigger was not firing, and thus the documents were not getting indexed,
because until the document was actually there, there was nothing to index.
Therefore, I thought a simple switch from BEFORE to AFTER would solve my
problem. However, in the example in the official docs, BEFORE is used as
well, so I abandoned that idea and decided to post this question.
Another solution I had in mind was to simply include an additional step in
my insert script to sleep for one second, during which the current document
would be opened, and hopefully indexed, and then closed, so the script
could go on to the next document. Note my insert script is in Python and
goes through Django. This is not a ‘pure’ postgresql operation, if that
matters.
My questions are:
1.
Does this sleep / open / close / proceed idea seem like a workable
solution?
2.
Is there a better workaround?
3.
At first blush, I would think the speed of insertion would not be an
issue for any trigger - it would seem to defeat the purpose - but am I
wrong about that?
I am not (yet) posting the trigger code because this post is long already,
and if your answers are 1) yes, 2) no and 3) triggers often work / fail
like this, then there’s no point and we can wrap this up. But if not, I
will happily post what I have. Thank you.
*“None of you has faith until he loves for his brother or his neighbor what
he loves for himself.”*
På tirsdag 21. april 2020 kl. 19:24:10, skrev Malik Rumi <
malik.a.rumi@gmail.com <mailto:malik.a.rumi@gmail.com>>:
[...]
I am not (yet) posting the trigger code because this post is long already, and
if your answers are 1) yes, 2) no and 3) triggers often work / fail like this,
then there’s no point and we can wrap this up. But if not, I will happily post
what I have. Thank you.
This is too much prose for the regular programmer, show us the code, and point
out what doesn't work for you, then we can help:-)
--
Andreas Joseph Krogh
I think COPY bypasses the triggers.
Best Regards
- Ericson Smith
+1 876-375-9857 (whatsapp)
+1 646-483-3420 (sms)
On Wed, Apr 22, 2020 at 12:32 AM Andreas Joseph Krogh <andreas@visena.com>
wrote:
Show quoted text
På tirsdag 21. april 2020 kl. 19:24:10, skrev Malik Rumi <
malik.a.rumi@gmail.com>:[...]
I am not (yet) posting the trigger code because this post is long already,
and if your answers are 1) yes, 2) no and 3) triggers often work / fail
like this, then there’s no point and we can wrap this up. But if not, I
will happily post what I have. Thank you.This is too much prose for the regular programmer, show us the code, and
point out what doesn't work for you, then we can help:-)--
Andreas Joseph Krogh
On 4/21/20 11:04 AM, Ericson Smith wrote:
I think COPY bypasses the triggers.
No:
https://www.postgresql.org/docs/12/sql-copy.html
"COPY FROM will invoke any triggers and check constraints on the
destination table. However, it will not invoke rules."
Best Regards - Ericson Smith +1 876-375-9857 (whatsapp) +1 646-483-3420 (sms)On Wed, Apr 22, 2020 at 12:32 AM Andreas Joseph Krogh
<andreas@visena.com <mailto:andreas@visena.com>> wrote:På tirsdag 21. april 2020 kl. 19:24:10, skrev Malik Rumi
<malik.a.rumi@gmail.com <mailto:malik.a.rumi@gmail.com>>:[...]
I am not (yet) posting the trigger code because this post is
long already, and if your answers are 1) yes, 2) no and 3)
triggers often work / fail like this, then there’s no point and
we can wrap this up. But if not, I will happily post what I
have. Thank you.This is too much prose for the regular programmer, show us the code,
and point out what doesn't work for you, then we can help:-)
--
Andreas Joseph Krogh
--
Adrian Klaver
adrian.klaver@aklaver.com
@Ericson,
Forgive me for seeming dense, but how does COPY help or hurt here?
@Andreas,
I had to laugh at your reference to "prose". Would you believe I am
actually a published playwright? Long before I started coding, of course.
Old habits die hard.....
entry_search_vector_trigger
BEGIN
SELECT setweight(to_tsvector(NEW.title), 'A') ||
setweight(to_tsvector(NEW.content), 'B') ||
setweight(to_tsvector(NEW.category), 'D') ||
setweight(to_tsvector(COALESCE(string_agg(tag.tag, ',
'), '')), 'C')
INTO NEW.search_vector
FROM ktab_entry AS entry
LEFT JOIN ktab_entry_tags AS entry_tags ON
entry_tags.entry_id = entry.id
LEFT JOIN ktab_tag AS tag ON tag.id = entry_tags.tag_id
WHERE entry.id = NEW.id
GROUP BY entry.id, category;
RETURN NEW;
END;
tag_search_vector_trigger
BEGIN
UPDATE ktab_entry SET id = id WHERE id IN (
SELECT entry_id FROM ktab_entry_tags WHERE tag_id = NEW.id
);
RETURN NEW;
END;
tags_search_vector_trigger
BEGIN
IF (TG_OP = 'DELETE') THEN
UPDATE ktab_entry SET id = id WHERE id = OLD.entry_id;
RETURN OLD;
ELSE
UPDATE ktab_entry SET id = id WHERE id = NEW.entry_id;
RETURN NEW;
END IF;
END;
search_vector_update
BEGIN
SELECT setweight(to_tsvector(NEW.title), 'A') ||
setweight(to_tsvector(NEW.content), 'B') ||
setweight(to_tsvector(NEW.category), 'D') ||
setweight(to_tsvector(COALESCE(string_agg(tag.tag, ',
'), '')), 'C')
INTO NEW.search_vector
FROM ktab_entry AS entry
LEFT JOIN ktab_entry_tags AS entry_tags ON
entry_tags.entry_id = entry.id
LEFT JOIN ktab_tag AS tag ON tag.id = entry_tags.tag_id
WHERE entry.id = NEW.id
GROUP BY entry.id, category;
RETURN NEW;
END;
search_vector_update (tags)
BEGIN
IF (TG_OP = 'DELETE') THEN
UPDATE ktab_entry SET id = id WHERE id = OLD.entry_id;
RETURN OLD;
ELSE
UPDATE ktab_entry SET id = id WHERE id = NEW.entry_id;
RETURN NEW;
END IF;
END;
Thank you!
*“None of you has faith until he loves for his brother or his neighbor what
he loves for himself.”*
On Tue, Apr 21, 2020 at 1:05 PM Ericson Smith <esconsult1@gmail.com> wrote:
Show quoted text
I think COPY bypasses the triggers.
Best Regards - Ericson Smith +1 876-375-9857 (whatsapp) +1 646-483-3420 (sms)On Wed, Apr 22, 2020 at 12:32 AM Andreas Joseph Krogh <andreas@visena.com>
wrote:På tirsdag 21. april 2020 kl. 19:24:10, skrev Malik Rumi <
malik.a.rumi@gmail.com>:[...]
I am not (yet) posting the trigger code because this post is long
already, and if your answers are 1) yes, 2) no and 3) triggers often work /
fail like this, then there’s no point and we can wrap this up. But if not,
I will happily post what I have. Thank you.This is too much prose for the regular programmer, show us the code, and
point out what doesn't work for you, then we can help:-)--
Andreas Joseph Krogh
My apologies - I did not look closely at the manual. Many many years ago
(6.xx days I had a similar problem and leapt to answer).
Could you post your CREATE TRIGGER statements as well?
On Wed, Apr 22, 2020 at 1:21 AM Malik Rumi <malik.a.rumi@gmail.com> wrote:
Show quoted text
@Ericson,
Forgive me for seeming dense, but how does COPY help or hurt here?@Andreas,
I had to laugh at your reference to "prose". Would you believe I am
actually a published playwright? Long before I started coding, of course.
Old habits die hard.....entry_search_vector_trigger
BEGIN
SELECT setweight(to_tsvector(NEW.title), 'A') ||
setweight(to_tsvector(NEW.content), 'B') ||
setweight(to_tsvector(NEW.category), 'D') ||
setweight(to_tsvector(COALESCE(string_agg(tag.tag, ',
'), '')), 'C')
INTO NEW.search_vector
FROM ktab_entry AS entry
LEFT JOIN ktab_entry_tags AS entry_tags ON
entry_tags.entry_id = entry.id
LEFT JOIN ktab_tag AS tag ON tag.id = entry_tags.tag_id
WHERE entry.id = NEW.id
GROUP BY entry.id, category;
RETURN NEW;
END;tag_search_vector_trigger
BEGIN
UPDATE ktab_entry SET id = id WHERE id IN (
SELECT entry_id FROM ktab_entry_tags WHERE tag_id = NEW.id
);
RETURN NEW;
END;tags_search_vector_trigger
BEGIN
IF (TG_OP = 'DELETE') THEN
UPDATE ktab_entry SET id = id WHERE id = OLD.entry_id;
RETURN OLD;
ELSE
UPDATE ktab_entry SET id = id WHERE id = NEW.entry_id;
RETURN NEW;
END IF;
END;search_vector_update
BEGIN
SELECT setweight(to_tsvector(NEW.title), 'A') ||
setweight(to_tsvector(NEW.content), 'B') ||
setweight(to_tsvector(NEW.category), 'D') ||
setweight(to_tsvector(COALESCE(string_agg(tag.tag, ',
'), '')), 'C')
INTO NEW.search_vector
FROM ktab_entry AS entry
LEFT JOIN ktab_entry_tags AS entry_tags ON
entry_tags.entry_id = entry.id
LEFT JOIN ktab_tag AS tag ON tag.id = entry_tags.tag_id
WHERE entry.id = NEW.id
GROUP BY entry.id, category;
RETURN NEW;
END;search_vector_update (tags)
BEGIN
IF (TG_OP = 'DELETE') THEN
UPDATE ktab_entry SET id = id WHERE id = OLD.entry_id;
RETURN OLD;
ELSE
UPDATE ktab_entry SET id = id WHERE id = NEW.entry_id;
RETURN NEW;
END IF;
END;Thank you!
*“None of you has faith until he loves for his brother or his neighbor
what he loves for himself.”*On Tue, Apr 21, 2020 at 1:05 PM Ericson Smith <esconsult1@gmail.com>
wrote:I think COPY bypasses the triggers.
Best Regards - Ericson Smith +1 876-375-9857 (whatsapp) +1 646-483-3420 (sms)On Wed, Apr 22, 2020 at 12:32 AM Andreas Joseph Krogh <andreas@visena.com>
wrote:På tirsdag 21. april 2020 kl. 19:24:10, skrev Malik Rumi <
malik.a.rumi@gmail.com>:[...]
I am not (yet) posting the trigger code because this post is long
already, and if your answers are 1) yes, 2) no and 3) triggers often work /
fail like this, then there’s no point and we can wrap this up. But if not,
I will happily post what I have. Thank you.This is too much prose for the regular programmer, show us the code, and
point out what doesn't work for you, then we can help:-)--
Andreas Joseph Krogh
On 4/21/20 11:21 AM, Malik Rumi wrote:
@Ericson,
Forgive me for seeming dense, but how does COPY help or hurt here?@Andreas,
I had to laugh at your reference to "prose". Would you believe I am
actually a published playwright? Long before I started coding, of
course. Old habits die hard.....
The script code via Python/Django/psycopg2 would be helpful as my
suspicion is that you are seeing the effects of open transactions.
entry_search_vector_trigger
BEGIN
SELECT setweight(to_tsvector(NEW.title), 'A') ||
setweight(to_tsvector(NEW.content), 'B') ||
setweight(to_tsvector(NEW.category), 'D') ||
setweight(to_tsvector(COALESCE(string_agg(tag.tag,
', '), '')), 'C')
INTO NEW.search_vector
FROM ktab_entry AS entry
LEFT JOIN ktab_entry_tags AS entry_tags ON
entry_tags.entry_id = entry.id <http://entry.id>
LEFT JOIN ktab_tag AS tag ON tag.id <http://tag.id> =
entry_tags.tag_id
WHERE entry.id <http://entry.id> = NEW.id
GROUP BY entry.id <http://entry.id>, category;
RETURN NEW;
END;tag_search_vector_trigger
BEGIN
UPDATE ktab_entry SET id = id WHERE id IN (
SELECT entry_id FROM ktab_entry_tags WHERE tag_id = NEW.id
);
RETURN NEW;
END;tags_search_vector_trigger
BEGIN
IF (TG_OP = 'DELETE') THEN
UPDATE ktab_entry SET id = id WHERE id = OLD.entry_id;
RETURN OLD;
ELSE
UPDATE ktab_entry SET id = id WHERE id = NEW.entry_id;
RETURN NEW;
END IF;
END;search_vector_update
BEGIN
SELECT setweight(to_tsvector(NEW.title), 'A') ||
setweight(to_tsvector(NEW.content), 'B') ||
setweight(to_tsvector(NEW.category), 'D') ||
setweight(to_tsvector(COALESCE(string_agg(tag.tag,
', '), '')), 'C')
INTO NEW.search_vector
FROM ktab_entry AS entry
LEFT JOIN ktab_entry_tags AS entry_tags ON
entry_tags.entry_id = entry.id <http://entry.id>
LEFT JOIN ktab_tag AS tag ON tag.id <http://tag.id> =
entry_tags.tag_id
WHERE entry.id <http://entry.id> = NEW.id
GROUP BY entry.id <http://entry.id>, category;
RETURN NEW;
END;search_vector_update (tags)
BEGIN
IF (TG_OP = 'DELETE') THEN
UPDATE ktab_entry SET id = id WHERE id = OLD.entry_id;
RETURN OLD;
ELSE
UPDATE ktab_entry SET id = id WHERE id = NEW.entry_id;
RETURN NEW;
END IF;
END;Thank you!
*/“None of you has faith until he loves for his brother or his neighbor
what he loves for himself.”/*On Tue, Apr 21, 2020 at 1:05 PM Ericson Smith <esconsult1@gmail.com
<mailto:esconsult1@gmail.com>> wrote:I think COPY bypasses the triggers.
Best Regards - Ericson Smith +1 876-375-9857 (whatsapp) +1 646-483-3420 (sms)On Wed, Apr 22, 2020 at 12:32 AM Andreas Joseph Krogh
<andreas@visena.com <mailto:andreas@visena.com>> wrote:På tirsdag 21. april 2020 kl. 19:24:10, skrev Malik Rumi
<malik.a.rumi@gmail.com <mailto:malik.a.rumi@gmail.com>>:[...]
I am not (yet) posting the trigger code because this post is
long already, and if your answers are 1) yes, 2) no and 3)
triggers often work / fail like this, then there’s no point
and we can wrap this up. But if not, I will happily post
what I have. Thank you.This is too much prose for the regular programmer, show us the
code, and point out what doesn't work for you, then we can help:-)
--
Andreas Joseph Krogh
--
Adrian Klaver
adrian.klaver@aklaver.com
On Tue, 2020-04-21 at 12:24 -0500, Malik Rumi wrote:
More than a year ago, I implemented full text search on one of my sites.
From the beginning, there was one problem (or at least, what I perceive
to be a problem): when I use a script to insert many documents at once,
they do *not* get indexed in fts. If a document is created or inserted
one at a time, fts indexes immediately. The workaround I came up with
was just to open each of those script inserted documents and then close
them. As soon as they are opened, they get indexed.
A trigger will fire and update the index immediately.
That opening and closing you are talking about does not sound like
a database activity. Rather, it sounds like your software is delaying
the actual insert into the database, which would of course explain
why you cannot find it in the index.
Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com
@Ericson
I have the script (statements?) by which I created the triggers, but since
you asked I do not see them in pga4, from which I manage my postgres. I
don't know if this is significant or not. Also, this was originally done
quite a while ago, so my memory may be fuzzy. From the text of the
statement "runSQL..." I think I ran this in the terminal. So this is the
closest thing I can find to your request. The text that I previously posted
can be found in 'triggers functions' under this schema in pga4, but not
these statements. Also, further up the pga4 tree, "event triggers" is
blank. I mention these things because I am not sure of their importance.
Thanks.
triggers.py
# Trigger on insert or update of ktab.Entry
migrations.RunSQL('''CREATE OR REPLACE FUNCTION
entry_search_vector_trigger() RETURNS trigger AS $$
BEGIN
SELECT setweight(to_tsvector(NEW.title), 'A') ||
setweight(to_tsvector(NEW.content), 'B') ||
setweight(to_tsvector(NEW.category), 'D') ||
setweight(to_tsvector(COALESCE(string_agg(tags.tag, ', '), '')),
'C')
INTO NEW.search_vector
FROM ktab_entry AS entry
LEFT JOIN ktab_entry_tags AS entry_tags ON entry_tags.entry_id =
entry.id
LEFT JOIN ktab_tags AS tag ON tag.id = entry_tags.tag_id
WHERE entry.id = NEW.id
GROUP BY entry.id, category;
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER search_vector_update BEFORE INSERT OR UPDATE ON ktab_entry
FOR EACH ROW EXECUTE PROCEDURE entry_search_vector_trigger();''')
# Trigger after ktab.Author is updated
'''
Since I don't have author, and besides, his author was a separate table -
SKIP
CREATE OR REPLACE FUNCTION author_search_vector_trigger() RETURNS trigger
AS $$
BEGIN
UPDATE ktab_entry SET id = id WHERE author_id = NEW.id;
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER search_vector_update AFTER INSERT OR UPDATE ON ktab_author
FOR EACH ROW EXECUTE PROCEDURE author_search_vector_trigger();
'''
# Trigger after ktab.Entry.tags are added, deleted from a entry
migrations.RunSQL('''CREATE OR REPLACE FUNCTION
tags_search_vector_trigger() RETURNS trigger AS $$
BEGIN
IF (TG_OP = 'DELETE') THEN
UPDATE ktab_entry SET id = id WHERE id = OLD.entry_id;
RETURN OLD;
ELSE
UPDATE ktab_entry SET id = id WHERE id = NEW.entry_id;
RETURN NEW;
END IF;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER search_vector_update AFTER INSERT OR UPDATE OR DELETE ON
ktab_entry_tags
FOR EACH ROW EXECUTE PROCEDURE tags_search_vector_trigger();
''')
# Trigger after ktab.Tag is updated
migrations.RunSQL('''CREATE OR REPLACE FUNCTION tag_search_vector_trigger()
RETURNS trigger AS $$
BEGIN
UPDATE ktab_entry SET id = id WHERE id IN (
SELECT entry_id FROM ktab_entry_tags WHERE tag_id = NEW.id
);
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER search_vector_update AFTER UPDATE ON ktab_tag
FOR EACH ROW EXECUTE PROCEDURE tag_search_vector_trigger();
*“None of you has faith until he loves for his brother or his neighbor what
he loves for himself.”*
On Tue, Apr 21, 2020 at 1:25 PM Ericson Smith <esconsult1@gmail.com> wrote:
Show quoted text
My apologies - I did not look closely at the manual. Many many years ago
(6.xx days I had a similar problem and leapt to answer).Could you post your CREATE TRIGGER statements as well?
On Wed, Apr 22, 2020 at 1:21 AM Malik Rumi <malik.a.rumi@gmail.com> wrote:
@Ericson,
Forgive me for seeming dense, but how does COPY help or hurt here?@Andreas,
I had to laugh at your reference to "prose". Would you believe I am
actually a published playwright? Long before I started coding, of course.
Old habits die hard.....entry_search_vector_trigger
BEGIN
SELECT setweight(to_tsvector(NEW.title), 'A') ||
setweight(to_tsvector(NEW.content), 'B') ||
setweight(to_tsvector(NEW.category), 'D') ||
setweight(to_tsvector(COALESCE(string_agg(tag.tag,
', '), '')), 'C')
INTO NEW.search_vector
FROM ktab_entry AS entry
LEFT JOIN ktab_entry_tags AS entry_tags ON
entry_tags.entry_id = entry.id
LEFT JOIN ktab_tag AS tag ON tag.id = entry_tags.tag_id
WHERE entry.id = NEW.id
GROUP BY entry.id, category;
RETURN NEW;
END;tag_search_vector_trigger
BEGIN
UPDATE ktab_entry SET id = id WHERE id IN (
SELECT entry_id FROM ktab_entry_tags WHERE tag_id = NEW.id
);
RETURN NEW;
END;tags_search_vector_trigger
BEGIN
IF (TG_OP = 'DELETE') THEN
UPDATE ktab_entry SET id = id WHERE id = OLD.entry_id;
RETURN OLD;
ELSE
UPDATE ktab_entry SET id = id WHERE id = NEW.entry_id;
RETURN NEW;
END IF;
END;search_vector_update
BEGIN
SELECT setweight(to_tsvector(NEW.title), 'A') ||
setweight(to_tsvector(NEW.content), 'B') ||
setweight(to_tsvector(NEW.category), 'D') ||
setweight(to_tsvector(COALESCE(string_agg(tag.tag,
', '), '')), 'C')
INTO NEW.search_vector
FROM ktab_entry AS entry
LEFT JOIN ktab_entry_tags AS entry_tags ON
entry_tags.entry_id = entry.id
LEFT JOIN ktab_tag AS tag ON tag.id = entry_tags.tag_id
WHERE entry.id = NEW.id
GROUP BY entry.id, category;
RETURN NEW;
END;search_vector_update (tags)
BEGIN
IF (TG_OP = 'DELETE') THEN
UPDATE ktab_entry SET id = id WHERE id = OLD.entry_id;
RETURN OLD;
ELSE
UPDATE ktab_entry SET id = id WHERE id = NEW.entry_id;
RETURN NEW;
END IF;
END;Thank you!
*“None of you has faith until he loves for his brother or his neighbor
what he loves for himself.”*On Tue, Apr 21, 2020 at 1:05 PM Ericson Smith <esconsult1@gmail.com>
wrote:I think COPY bypasses the triggers.
Best Regards - Ericson Smith +1 876-375-9857 (whatsapp) +1 646-483-3420 (sms)On Wed, Apr 22, 2020 at 12:32 AM Andreas Joseph Krogh <
andreas@visena.com> wrote:På tirsdag 21. april 2020 kl. 19:24:10, skrev Malik Rumi <
malik.a.rumi@gmail.com>:[...]
I am not (yet) posting the trigger code because this post is long
already, and if your answers are 1) yes, 2) no and 3) triggers often work /
fail like this, then there’s no point and we can wrap this up. But if not,
I will happily post what I have. Thank you.This is too much prose for the regular programmer, show us the code,
and point out what doesn't work for you, then we can help:-)--
Andreas Joseph Krogh