Trigger that spawns forked process

Started by Christopher Murtagh, May 2005 · 19 messages · pgsql-general
#1Christopher Murtagh
christopher.murtagh@mcgill.ca

I would like to write a trigger or function that spawns a forked
process so that the transaction is considered 'complete' to the client,
but continues to perform more work. I've been looking for examples to
steal^H^H^H^H^H learn from but have only found someone asking pretty
much the same question:

http://archives.postgresql.org/pgsql-general/2002-12/msg01187.php

Has anyone done anything like this?

Cheers,

Chris

--
Christopher Murtagh
Enterprise Systems Administrator
ISR / Web Service Group
McGill University
Montreal, Quebec
Canada

Tel.: (514) 398-3122
Fax: (514) 398-2017

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Christopher Murtagh (#1)
Re: Trigger that spawns forked process

Christopher Murtagh <christopher.murtagh@mcgill.ca> writes:

I would like to write a trigger or function that spawns a forked
process so that the transaction is considered 'complete' to the client,
but continues to perform more work.

It's not very clear what you are hoping to have the forked process do,
but if it's supposed to be another backend, forget it --- won't work.
See this thread:
http://archives.postgresql.org/pgsql-sql/2005-04/msg00329.php

regards, tom lane

#3Christopher Murtagh
christopher.murtagh@mcgill.ca
In reply to: Tom Lane (#2)
Re: Trigger that spawns forked process

On Mon, 2005-05-09 at 15:38 -0400, Tom Lane wrote:

Christopher Murtagh <christopher.murtagh@mcgill.ca> writes:

I would like to write a trigger or function that spawns a forked
process so that the transaction is considered 'complete' to the client,
but continues to perform more work.

It's not very clear what you are hoping to have the forked process do,
but if it's supposed to be another backend, forget it --- won't work.

No, I don't want the trigger to do any db stuff at all. Basically, I've
got a content management system that is going to be split across a
cluster. Upon an update submission from one of them, I want to replicate
across the others (which can happen in pseudo real time). So, basically
the DB is the master; once it makes an update, it will spawn a process
to do the synchronization (all of this code is written). I just don't want
the initial update process to wait for replication to finish (which is
only a second or two under normal load). I could write a daemon that
would sit and listen to these replication requests, but that just seems
to be more complex than I need.

Cheers,

Chris

--
Christopher Murtagh
Enterprise Systems Administrator
ISR / Web Service Group
McGill University
Montreal, Quebec
Canada

Tel.: (514) 398-3122
Fax: (514) 398-2017

#4Doug McNaught
doug@mcnaught.org
In reply to: Christopher Murtagh (#3)
Re: Trigger that spawns forked process

Christopher Murtagh <christopher.murtagh@mcgill.ca> writes:

No, I don't want the trigger to do any db stuff at all. Basically, I've
got a content management system that is going to be split across a
cluster. Upon an update submission from one of them, I want to replicate
across the others (which can happen in pseudo real time). So, basically
the DB is the master; once it makes an update, it will spawn a process
to do the synchronization (all of this code is written). I just don't want
the initial update process to wait for replication to finish (which is
only a second or two under normal load). I could write a daemon that
would sit and listen to these replication requests, but that just seems
to be more complex than I need.

Why not have a client connection LISTENing and doing the
synchronization, and have the trigger use NOTIFY?

Or, you could have the trigger write to a table, and have another
client periodically scanning the table for new sync events.

Either one of those would be simpler and more robust than fork()ing
inside the backend.
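
For example, the trigger side might look something like this (table and
notification names are hypothetical; plpgsql syntax as of 8.0):

```sql
-- Hypothetical trigger: signal any listening client after each change.
-- NOTIFY is transactional, so listeners only hear about committed work.
CREATE OR REPLACE FUNCTION notify_sync() RETURNS trigger AS $$
BEGIN
    NOTIFY content_changed;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER content_sync AFTER INSERT OR UPDATE ON content
    FOR EACH ROW EXECUTE PROCEDURE notify_sync();
```

The LISTENing client just issues LISTEN content_changed once and then
waits for notifications to arrive.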

-Doug

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Doug McNaught (#4)
Re: Trigger that spawns forked process

Douglas McNaught <doug@mcnaught.org> writes:

Why not have a client connection LISTENing and doing the
synchronization, and have the trigger use NOTIFY?
Or, you could have the trigger write to a table, and have another
client periodically scanning the table for new sync events.
Either one of those would be simpler and more robust than fork()ing
inside the backend.

... not to mention it would avoid the risk of propagating
not-yet-committed changes.

regards, tom lane

#6Christopher Murtagh
christopher.murtagh@mcgill.ca
In reply to: Doug McNaught (#4)
Re: Trigger that spawns forked process

On Mon, 2005-05-09 at 17:01 -0400, Douglas McNaught wrote:

Why not have a client connection LISTENing and doing the
synchronization, and have the trigger use NOTIFY?

Or, you could have the trigger write to a table, and have another
client periodically scanning the table for new sync events.

Either one of those would be simpler and more robust than fork()ing
inside the backend.

How is writing a daemon simpler than using something that could be done
within Postgres? Forking is something that should be natural to Unix
systems, I shouldn't need to write another application to do this. I
don't see how a daemon would necessarily be more robust either.

Cheers,

Chris

--
Christopher Murtagh
Enterprise Systems Administrator
ISR / Web Service Group
McGill University
Montreal, Quebec
Canada

Tel.: (514) 398-3122
Fax: (514) 398-2017

#7Christopher Murtagh
christopher.murtagh@mcgill.ca
In reply to: Tom Lane (#5)
Re: Trigger that spawns forked process

On Mon, 2005-05-09 at 17:07 -0400, Tom Lane wrote:

Douglas McNaught <doug@mcnaught.org> writes:

Why not have a client connection LISTENing and doing the
synchronization, and have the trigger use NOTIFY?
Or, you could have the trigger write to a table, and have another
client periodically scanning the table for new sync events.
Either one of those would be simpler and more robust than fork()ing
inside the backend.

... not to mention it would avoid the risk of propagating
not-yet-committed changes.

How's that? If I can notify a daemon that the change is committed, then
why couldn't I write a forking plperl function that executes when the
transaction is done? How is one riskier than the other? Is there
something obvious I'm missing here?

Cheers,

Chris

--
Christopher Murtagh
Enterprise Systems Administrator
ISR / Web Service Group
McGill University
Montreal, Quebec
Canada

Tel.: (514) 398-3122
Fax: (514) 398-2017

#8Doug McNaught
doug@mcnaught.org
In reply to: Christopher Murtagh (#6)
Re: Trigger that spawns forked process

Christopher Murtagh <christopher.murtagh@mcgill.ca> writes:

On Mon, 2005-05-09 at 17:01 -0400, Douglas McNaught wrote:

Why not have a client connection LISTENing and doing the
synchronization, and have the trigger use NOTIFY?

Or, you could have the trigger write to a table, and have another
client periodically scanning the table for new sync events.

Either one of those would be simpler and more robust than fork()ing
inside the backend.

How is writing a daemon simpler than using something that could be done
within Postgres? Forking is something that should be natural to Unix
systems, I shouldn't need to write another application to do this. I
don't see how a daemon would necessarily be more robust either.

Why do random code surgery on the backend, which needs to be utterly
robust and stable, when client-level solutions are just as easy and
much less dangerous?

It's kind of akin to the Linux principle of "don't do in the kernel
what you can do in userspace."

That's my philosophy, anyway. :)

-Doug

#9Tom Lane
tgl@sss.pgh.pa.us
In reply to: Christopher Murtagh (#7)
Re: Trigger that spawns forked process

Christopher Murtagh <christopher.murtagh@mcgill.ca> writes:

On Mon, 2005-05-09 at 17:07 -0400, Tom Lane wrote:

... not to mention it would avoid the risk of propagating
not-yet-committed changes.

How's that? If I can notify a daemon that the change is committed, then
why couldn't I write a forking plperl function that executes when the
transaction is done? How is one riskier than the other? Is there
something obvious I'm missing here?

Yes: the mechanisms that are being suggested to you already exist.
There is not, AND NEVER WILL BE, any mechanism to invoke random
user-defined functions during the post-commit sequence. That code
sequence cannot afford to do anything that will potentially incur
errors.

regards, tom lane

#10Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Christopher Murtagh (#6)
Re: Trigger that spawns forked process

On Mon, May 09, 2005 at 09:07:40PM -0400, Christopher Murtagh wrote:

On Mon, 2005-05-09 at 17:01 -0400, Douglas McNaught wrote:

Why not have a client connection LISTENing and doing the
synchronization, and have the trigger use NOTIFY?

Or, you could have the trigger write to a table, and have another
client periodically scanning the table for new sync events.

Either one of those would be simpler and more robust than fork()ing
inside the backend.

How is writing a daemon simpler than using something that could be done
within Postgres? Forking is something that should be natural to Unix
systems, I shouldn't need to write another application to do this. I
don't see how a daemon would necessarily be more robust either.

Well, LISTEN and NOTIFY are built into PostgreSQL
(http://www.postgresql.org/docs/8.0/interactive/sql-notify.html). If the
processes that you're trying to notify of the changes are connected to
the database then this might be the easiest way to do what you're
looking for. Setting up some form of replication, such as Slony, also
comes to mind. But it's impossible to really make a recommendation
without having a better idea of what you're doing.

BTW, my understanding is that it's pretty easy to write a daemon in
perl, and there are examples of how to do this floating around.
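
A rough sketch of such a daemon (assuming DBD::Pg, a notification named
content_changed, and a hypothetical path for the replication script):

```perl
#!/usr/bin/perl
# Sketch of a sync daemon: waits on LISTEN and runs the already-written
# replication script whenever a notification arrives.
use strict;
use DBI;

my $dbh = DBI->connect('dbi:Pg:dbname=cms', 'cms', '', { AutoCommit => 1 })
    or die $DBI::errstr;
$dbh->do('LISTEN content_changed');

while (1) {
    # pg_notifies returns a [name, backend_pid] pair when one is pending
    if ($dbh->func('pg_notifies')) {
        system('/path/to/replicate.pl') == 0
            or warn "replication failed: $?";
    } else {
        sleep 1;    # simple poll; retry/backoff logic could go here
    }
}
```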
--
Jim C. Nasby, Database Consultant decibel@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"

#11Christopher Murtagh
christopher.murtagh@mcgill.ca
In reply to: Jim Nasby (#10)
Re: Trigger that spawns forked process

On Tue, 2005-05-10 at 11:11 -0500, Jim C. Nasby wrote:

Well, LISTEN and NOTIFY are built into PostgreSQL
(http://www.postgresql.org/docs/8.0/interactive/sql-notify.html). If the
processes that you're trying to notify of the changes are connected to
the database then this might be the easiest way to do what you're
looking for. Setting up some form of replication, such as Slony, also
comes to mind. But it's impossible to really make a recommendation
without having a better idea of what you're doing.

BTW, my understanding is that it's pretty easy to write a daemon in
perl, and there are examples of how to do this floating around.

Yes, I saw the LISTEN/NOTIFY stuff, and it could be interesting. As to
the replication, Slony won't do it for me, as it isn't the database I
want to replicate. Here's a basic description:

I have 4 cluster nodes all running the same content management software
(home grown). When a change request comes in to one of them (update to
an XML document), it submits the new XML doc to the database (which is
the master repository of all content), then performs an XSLT. Upon the
new change, I want the database to propagate the new result of the XSLT
to the other nodes so that they can pre-cache it (to avoid page loading
latency).

I was given an example of how to spawn a forked process with plperlu,
and it looks pretty simple and straightforward and exactly what I want:

CREATE or REPLACE function somefunc() returns void as $$
    $SIG{CHLD} = 'IGNORE';
    # the preceding line removes any zombies created, assuming you
    # do not want to handle the return value from the child process

    unless (defined ($pid = fork)) {
        die "cannot fork: $!";
    }
    unless ($pid) {
        $cmd = "your command here";
        system "$cmd";
        if ($? != 0) {
            # handle errors here
        }
        exit;
    }
    return;
$$ language plperlu;

This seems to be pretty trivial, and near fail-proof to me. My '$cmd'
would then be a script that handles the replication of the cached file
to the nodes (already written and tested). Why is a daemon more robust
than this? (BTW, I ask out of ignorance, not out of arrogance.)

Cheers,

Chris
--
Christopher Murtagh
Enterprise Systems Administrator
ISR / Web Service Group
McGill University
Montreal, Quebec
Canada

Tel.: (514) 398-3122
Fax: (514) 398-2017

#12Tom Lane
tgl@sss.pgh.pa.us
In reply to: Christopher Murtagh (#11)
Re: Trigger that spawns forked process

Christopher Murtagh <christopher.murtagh@mcgill.ca> writes:

I was given an example of how to spawn a forked process with plperlu,
and it looks pretty simple and straightforward and exactly what I want:

CREATE or REPLACE function somefunc() returns void as $$
$SIG{CHLD}='IGNORE';

... let's see, you already broke the backend there --- unless its normal
setting of SIGCHLD is IGNORE, in which case munging it is unnecessary
anyway ...

unless ($pid) {
$cmd="your command here";
system "$cmd";
if ($? != 0) {
# handle errors here
}
exit;
}

I'm not sure what happens when you do "exit" here, but I'll lay odds
against it being exactly the right things. (An atexit hook in a backend
process is entitled to suppose it is cleaning up a backend.) Also,
exactly what is your "handle errors" step going to be? You don't get
to reflect anything back into the database at that point.

This seems to be pretty trivial, and near fail-proof to me. my '$cmd'
would then be a script that handles the replication of the cached file
to the nodes (already written and tested). Why is a daemon more robust
than this? (BTW, I ask out of ignorance, not out of arrogance).

The main reason why this is probably a bad idea is that your
transaction is causing side-effects outside the database that cannot
be undone if the transaction later rolls back. The general consensus
of people who have looked at this is that it's safer to fire those
operations after the transaction actually commits. (As an example,
this gives you an opportunity to retry the outside operations if *they*
fail, and in any case to keep some state information about whether the
outside-the-DB state is actually synced with inside-the-DB or not.)
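
The safer pattern Tom describes might be sketched as a queue table plus
an external worker (table and column names hypothetical):

```sql
-- Queue written by the trigger; because inserts are transactional, the
-- worker only ever sees rows from transactions that actually committed.
CREATE TABLE sync_queue (
    id     serial PRIMARY KEY,
    doc_id integer NOT NULL,
    done   boolean NOT NULL DEFAULT false
);

-- The external worker loops over pending rows, runs the outside-the-DB
-- operation, and marks each row done only on success, so failed pushes
-- are naturally retried on the next pass:
--   SELECT id, doc_id FROM sync_queue WHERE NOT done ORDER BY id;
--   UPDATE sync_queue SET done = true WHERE id = <id>;
```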

regards, tom lane

#13Scott Marlowe
smarlowe@g2switchworks.com
In reply to: Christopher Murtagh (#11)
Re: Trigger that spawns forked process

On Tue, 2005-05-10 at 15:02, Christopher Murtagh wrote:

On Tue, 2005-05-10 at 11:11 -0500, Jim C. Nasby wrote:

Well, LISTEN and NOTIFY are built into PostgreSQL
(http://www.postgresql.org/docs/8.0/interactive/sql-notify.html). If the
processes that you're trying to notify of the changes are connected to
the database then this might be the easiest way to do what you're
looking for. Setting up some form of replication, such as Slony, also
comes to mind. But it's impossible to really make a recommendation
without having a better idea of what you're doing.

BTW, my understanding is that it's pretty easy to write a daemon in
perl, and there are examples of how to do this floating around.

Yes, I saw the LISTEN/NOTIFY stuff, and it could be interesting. As to
the replication, Slony won't do it for me, as it isn't the database I
want to replicate. Here's a basic description:

I have 4 cluster nodes all running the same content management software
(home grown). When a change request comes in to one of them (update to
an XML document), it submits the new XML doc to the database (which is
the master repository of all content), then performs an XSLT. Upon the
new change, I want the database to propagate the new result of the XSLT
to the other nodes so that they can pre-cache it (to avoid page loading
latency).

Seeing as how Slony replicates tables you choose to have it replicate,
it seems to me you could just have it replicate the post-xslt table and
it would do what you want.

#14David Fetter
david@fetter.org
In reply to: Christopher Murtagh (#11)
Re: Trigger that spawns forked process

On Tue, May 10, 2005 at 04:02:59PM -0400, Christopher Murtagh wrote:

On Tue, 2005-05-10 at 11:11 -0500, Jim C. Nasby wrote:

Well, LISTEN and NOTIFY are built into PostgreSQL
(http://www.postgresql.org/docs/8.0/interactive/sql-notify.html).
If the processes that you're trying to notify of the changes are
connected to the database then this might be the easiest way to do
what you're looking for. Setting up some form of replication, such
as Slony, also comes to mind. But it's impossible to really make a
recommendation without having a better idea of what you're doing.

BTW, my understanding is that it's pretty easy to write a daemon
in perl, and there are examples of how to do this floating around.

Yes, I saw the LISTEN/NOTIFY stuff, and it could be interesting. As
to the replication, Slony won't do it for me, as it isn't the
database I want to replicate. Here's a basic description:

I have 4 cluster nodes all running the same content management
software (home grown). When a change request comes in to one of them
(update to an XML document), it submits the new XML doc to the
database (which is the master repository of all content), then
performs an XSLT. Upon the new change, I want the database to
propagate the new result of the XSLT to the other nodes so that they
can pre-cache it (to avoid page loading latency).

Why do you think Slony won't work for this? One way it could do it is
to have an ON INSERT trigger that populates one or more tables with
the result of the XSLT, which table(s) Slony replicates to the other
servers.

Cheers,
D
--
David Fetter david@fetter.org http://fetter.org/
phone: +1 510 893 6100 mobile: +1 415 235 3778

Remember to vote!

#15Christopher Murtagh
christopher.murtagh@mcgill.ca
In reply to: David Fetter (#14)
Re: Trigger that spawns forked process

On Tue, 2005-05-10 at 13:50 -0700, David Fetter wrote:

Why do you think Slony won't work for this? One way it could do it is
to have an ON INSERT trigger that populates one or more tables with
the result of the XSLT, which table(s) Slony replicates to the other
servers.

Because the nodes are not databases, they are Apache/PHP web servers
which have file system caching where the URL = directory/file. The XSLT
also converts XML objects to PHP code. So basically a content editor can
do something like:

<br /><br />
This is my course description: <coursedesc courseid="AAA 123" />
...
etc.

and the XSLT converts the <coursedesc /> tag into a PHP function (that
does a db lookup).

Cheers,

Chris

#16Christopher Murtagh
christopher.murtagh@mcgill.ca
In reply to: Tom Lane (#12)
Re: Trigger that spawns forked process

On Tue, 2005-05-10 at 16:17 -0400, Tom Lane wrote:

... let's see, you already broke the backend there --- unless its normal
setting of SIGCHLD is IGNORE, in which case munging it is unnecessary
anyway ...

Here's my (probably all garbled) explanation: essentially, that code is
a self-daemonizing perl wrapper. Setting SIGCHLD to IGNORE will
prevent zombie processes from hanging around, essentially
daemonizing/orphaning the forked perl script.

unless ($pid) {
$cmd="your command here";
system "$cmd";
if ($? != 0) {
# handle errors here
}
exit;
}

I'm not sure what happens when you do "exit" here, but I'll lay odds
against it being exactly the right things.

It ends the daemonized process, kinda like a wrapper suicide. :-)

(An atexit hook in a backend
process is entitled to suppose it is cleaning up a backend.) Also,
exactly what is your "handle errors" step going to be?

Well, if my command fails for some reason, I can replace '#handle
errors' with something that notifies me (email, or by populating the
database, etc.).

You don't get to reflect anything back into the database at that point.

That's ok, my $cmd might or might not have db connections in it, same
for the error checking script (which could be written in a totally
different language).

The main reason why this is probably a bad idea is that your
transaction is causing side-effects outside the database that cannot
be undone if the transaction later rolls back. The general consensus
of people who have looked at this is that it's safer to fire those
operations after the transaction actually commits.

I should have stated that this will get used only by single auto-commit
transactions. Any rollbacks are essentially changes to the past and
aren't permitted. Instead, if someone wanted to 'undo' a change, they
would re-submit a previous version. This way, I can keep my replication
code to very atomic things, which makes it very simple to write and
maintain.

From my (somewhat limited experience) point of view, I think that this
plperlu script isn't much different from writing a daemon to receive
signals via NOTIFY. Instead the script is self daemonizing, and it will
always run (instead of a couple of NOTIFY's building up and only one
being sent), which is more in line with what I want.

Sorry, my explanation probably isn't very clear at all, I've been
writing talk material and my brain is in a totally different space. Feel
free to deliver any LARTs. :-)

Cheers,

Chris

#17Martijn van Oosterhout
kleptog@svana.org
In reply to: Christopher Murtagh (#16)
Re: Trigger that spawns forked process

On Tue, May 10, 2005 at 05:31:56PM -0400, Christopher Murtagh wrote:

I'm not sure what happens when you do "exit" here, but I'll lay odds
against it being exactly the right things.

It ends the daemonized process, kinda like a wrapper suicide. :-)

I think you have a problem here. PostgreSQL is a complete program
which uses signal handlers, atexit handlers, and callbacks, all of which
fork() preserves. When your "little daemon" exits it may trigger all the
code normally run on backend exit, you know, closing WAL files, etc...

The rest of the program has no idea it's a forked process rather than a
real backend. Say the backend had an alarm() set and it goes off in your
forked process. Havoc ensues...

From my (somewhat limited experience) point of view, I think that this
plperlu script isn't much different from writing a daemon to receive
signals via NOTIFY. Instead the script is self daemonizing, and it will
always run (instead of a couple of NOTIFY's building up and only one
being sent), which is more in line with what I want.

Except that the daemon will be a client process that uses the database
to do work whereas with the other you're duplicating the server without
telling it and putting your data at risk...

Sorry, my explanation probably isn't very clear at all, I've been
writing talk material and my brain is in a totally different space. Feel
free to deliver any LARTs. :-)

You should never fork() without exec() in a large program unless it
knows you're doing it. Note that exec() doesn't run atexit handlers but
exit() does. Big difference...
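
The difference is easy to see in plain Perl, where END blocks play the
role of atexit handlers (a standalone sketch, not tied to the backend):

```perl
#!/usr/bin/perl
# exit() runs END blocks (the Perl analogue of atexit handlers);
# exec() replaces the process image and runs none of them.
use strict;

END { print "cleanup ran in pid $$\n"; }

my $pid = fork;
die "cannot fork: $!" unless defined $pid;
unless ($pid) {
    exec '/bin/true';        # child: the END block above never fires
    die "exec failed: $!";   # reached only if exec itself failed
}
waitpid($pid, 0);
# the parent exits normally here, so its END block *does* fire
```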

Hope this helps,

--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/


Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
tool for doing 5% of the work and then sitting around waiting for someone
else to do the other 95% so you can sue them.

#18Christopher Murtagh
christopher.murtagh@mcgill.ca
In reply to: Martijn van Oosterhout (#17)
Re: Trigger that spawns forked process

On Wed, 2005-05-11 at 00:08 +0200, Martijn van Oosterhout wrote:

On Tue, May 10, 2005 at 05:31:56PM -0400, Christopher Murtagh wrote:

I'm not sure what happens when you do "exit" here, but I'll lay odds
against it being exactly the right things.

It ends the daemonized process, kinda like a wrapper suicide. :-)

I think you have a problem here. PostgreSQL is a complete program
which uses signal handlers, atexit handlers, and callbacks, all of which
fork() preserves. When your "little daemon" exits it may trigger all the
code normally run on backend exit, you know, closing WAL files, etc...

The rest of the program has no idea it's a forked process rather than a
real backend. Say the backend had an alarm() set and it goes off in your
forked process. Havoc ensues...

Ok, I think I'm seeing the light here. Thanks for your input!

So, if I made a slight modification to my script to this instead:

CREATE or REPLACE function exec_test() returns void as '
    unless (defined ($pid = fork)) {
        die "cannot fork: $!";
    }
    unless ($pid) {
        $cmd = "/path/to/some/script.pl";
        exec "$cmd";
    }
    return;
' language plperlu;

Then the exec'd $cmd shouldn't inherit any of the signals or atexit
handlers. My script.pl can handle any errors it encounters (other than
not being executed of course, but I can live with that), and this way I
don't have to write a daemon that polls listening for a NOTIFY.

Is this less objectionable? Again, thanks to all for your input and
feedback, I really do appreciate it.

Cheers,

Chris

--
Christopher Murtagh
Enterprise Systems Administrator
ISR / Web Service Group
McGill University
Montreal, Quebec
Canada

Tel.: (514) 398-3122
Fax: (514) 398-2017

#19Tom Lane
tgl@sss.pgh.pa.us
In reply to: Christopher Murtagh (#18)
Re: Trigger that spawns forked process

Christopher Murtagh <christopher.murtagh@mcgill.ca> writes:

So, if I made a slight modification to my script to this instead:

CREATE or REPLACE function exec_test() returns void as '
    unless (defined ($pid = fork)) {
        die "cannot fork: $!";
    }
    unless ($pid) {
        $cmd = "/path/to/some/script.pl";
        exec "$cmd";
    }
    return;
' language plperlu;

Is this less objectionable?

Well, it's better, but you need to think about what happens if the exec
fails (eg, script is not where you thought). Does plperl let you get at
abort(), or some other way of terminating the process other than exit()?

There still remains the point that this violates transaction semantics,
in that an aborted transaction may still have caused things to happen
outside the database.

regards, tom lane