Review: Extra Daemons / bgworker
Hello Álvaro,
first of all, thank you for bringing this up again and providing a
patch. My first attempt at that was more than two years ago [1]. As the
author of a former bgworker patch, I'd like to provide an additional
review - KaiGai was simply faster to sign up as a reviewer on the
commitfest app...
My review is based on rev 6d7e51.. from your bgworker branch
on github, only to figure out later that it wasn't quite up to date. I
upgraded to bgworker-7.patch. I hope that's the most recent version.
# Concept
I appreciate the ability to run daemons that are not connected to
Postgres shared memory. It satisfies a user request that came up several
times when I talked about the bgworkers in Postgres-R.
Another good point is the flexible interface via extensions, even
allowing different starting points for such background workers.
One thing I'd miss (for use of that framework in Postgres-R) is the
ability to start a registered bgworker only upon request, way after the
system started. So that the bgworker process doesn't even exist until it
is really needed. I realize that RegisterBackgroundWorker() is still
necessary in advance to reserve the slots in BackgroundWorkerList and
shared memory.
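(For readers who haven't looked at the patch: registration happens at library load time, roughly as sketched below. This is my reading of the bgworker-7 API; the exact field and flag names come from the patch and may still change.)

```c
/* Sketch of registering a bgworker from an extension's _PG_init(),
 * per my reading of the bgworker-7 patch; exact field and flag names
 * may differ in the final API. */
#include "postgres.h"
#include "postmaster/bgworker.h"

static void
my_worker_main(void *main_arg)
{
    /* set up signal handlers, unblock signals, then loop on WaitLatch() */
}

void
_PG_init(void)
{
    BackgroundWorker worker;

    worker.bgw_name = "my worker";
    worker.bgw_flags = BGWORKER_SHMEM_ACCESS;
    worker.bgw_start_time = BgWorkerStart_RecoveryFinished;
    worker.bgw_restart_time = 60;       /* seconds between restarts */
    worker.bgw_main = my_worker_main;
    worker.bgw_main_arg = NULL;
    RegisterBackgroundWorker(&worker);
}
```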
As a use case independent of Postgres-R, think of something akin to a
worker_spi, but wanting that to perform a task every 24h on hundreds of
databases. You don't want to keep hundreds of processes occupying PGPROC
slots just to perform a measly task every 24h.
(We've discussed the ability to let bgworkers re-connect to another
database back then. For one, you'd still have currently unneeded worker
processes around all the time. And second, I think that's hard to get
right - after all, a terminated process is guaranteed to not leak any
stale data into a newly started one, no matter what.)
From my point of view, autovacuum is the very first example of a
background worker process. And I'm a bit puzzled about it not being
properly integrated into this framework. Once you think of autovacuum as
a background job which needs access to Postgres shared memory and a
database, but no client connection, it looks like a bit of code
duplication (and not using code we already have). I realize this kind of
needs the above feature being able to request the (re)start of bgworkers
at arbitrary points in time. However, it would also be a nice test case
for the bgworker infrastructure.
I'd be happy to help with extending the current patch into that
direction, if you agree it's generally useful. Or adapt my bgworker code
accordingly.
# Minor technical issues or questions
In assign_maxconnections, et al, GetNumRegisteredBackgroundWorkers() is
used in relation to MAX_BACKENDS or to calculate MaxBackends. That seems
to "leak" the bgworkers that registered with neither
BGWORKER_SHMEM_ACCESS nor BGWORKER_BACKEND_DATABASE_CONNECTION set. Or
is there any reason to discount such extra daemon processes?
The additional contrib modules auth_counter and worker_spi are missing
from the contrib/Makefile. If that's intentional, they should probably
be listed under "Missing".
The auth_counter module leaves worker.bgw_restart_time uninitialized.
Being an example, it might make sense for auth_counter to provide a
signal that just calls SetLatch() to interrupt WaitLatch() and make
auth_counter emit its log line upon request, i.e. show how to use SIGUSR1.
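A sketch of what I have in mind, assuming the worker has shared memory access and waits on its PGPROC latch; the handler and variable names below are made up for illustration:

```c
/* Hypothetical SIGUSR1 handler plus main loop for auth_counter,
 * assuming BGWORKER_SHMEM_ACCESS; all names are illustrative. */
#include "postgres.h"
#include "storage/latch.h"
#include "storage/proc.h"

static volatile sig_atomic_t got_sigusr1 = false;

static void
worker_sigusr1(SIGNAL_ARGS)
{
    got_sigusr1 = true;
    SetLatch(&MyProc->procLatch);   /* wake the main loop immediately */
}

static void
worker_main_loop(long naptime_s)
{
    for (;;)
    {
        /* sleep interruptibly rather than calling sleep() */
        WaitLatch(&MyProc->procLatch, WL_LATCH_SET | WL_TIMEOUT,
                  naptime_s * 1000L);
        ResetLatch(&MyProc->procLatch);

        if (got_sigusr1)
        {
            got_sigusr1 = false;
            elog(LOG, "auth_counter: emitting counters upon request");
        }
    }
}
```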
The bgw_restart_time doesn't always work (certainly not the way I'm
expecting it to). For example, if you forget to pass the
BGWORKER_SHMEM_ACCESS flag, trying to LWLockAcquire() leads to the
worker being restarted immediately and repeatedly - independent of the
bgw_restart_time setting. The same holds true if the bgworker exits with
status 0 or in case it segfaults. Not when exiting with code 1, though.
Why is that? Especially in case of a segfault or equally "hard" errors
that can be expected to occur repeatedly, I don't want the worker to be
restarted that frequently.
# Documentation
There are two working examples in contrib. The auth_counter module could
use a header comment similar to worker_spi's, quickly describing what it does.
There's no example of a plain extra daemon, without shmem access.
Coding guidelines for bgworker / extra daemon writers are missing. I
read that these must not use sleep(), with an explanation in both examples.
Other questions that come to mind: what about signal handlers? fork()?
threads? Can/should it use PG_TRY/PG_CATCH or setjmp()/longjmp()? How to
best load other 3rd-party libraries, e.g. for OpenGL processing?
Especially for extra daemons (i.e. bgw_flags = 0), differences to
running an external daemon should be documented. I myself am unclear
about all of the implications that running as a child of the postmaster
has (OOM and cgroups come to mind - there certainly are other aspects).
(I myself still have a hard time finding a proper use case for extra
daemons. I don't want the postmaster to turn into a general purpose
watchdog for everything and the kitchen sink.)
# Tests
No additional automated tests are included - hard to see how that could
be tested automatically, given the lack of a proper test harness.
I hope this review provides useful input.
Regards
Markus Wanner
[1]: That was during commit fest 2010-09. Back then, neither extensions, latches nor the walsender existed. The patches had a dependency on imessages, which was not accepted. Other than that, it's a working, pooled bgworker implementation on top of which autovacuum runs just fine. It lacks extensibility and the extra daemons feature, though. For those who missed it:
https://commitfest.postgresql.org/action/patch_view?id=339
http://archives.postgresql.org/message-id/4C3C789C.1040409@bluegap.ch
http://git.postgres-r.org/?p=bgworker;a=summary
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Markus Wanner wrote:
Hi Markus,
Many thanks for your review.
first of all, thank you for bringing this up again and providing a
patch. My first attempt on that was more than two years ago [1]. As the
author of a former bgworker patch, I'd like to provide an additional
review - KaiGai was simply faster to sign up as a reviewer on the
commitfest app...
I remember your patchset. I didn't look at it for this round, for no
particular reason. I did look at KaiGai's submission from two
commitfests ago, and also at a patch from Simon which AFAIK was never
published openly. Simon's patch merged the autovacuum code to make
autovac workers behave like bgworkers as handled by his patch, just like
you suggest. To me it didn't look all that good; I didn't have the guts
for that, hence the separation. If later bgworkers are found to work
well enough, we can "port" autovac workers to use this framework, but
for now it seems to me that the conservative thing is to leave autovac
untouched.
(As an example, we introduced "ilist" some commits back and changed some
uses to it; but there are still many places where we're using SHM_QUEUE,
or List, or open-coded lists, which we could port to the new
infrastructure, but there's no pressing need to do it.)
My review is based on rev 6d7e51.. from your bgworker branch
on github, only to figure out later that it wasn't quite up to date. I
upgraded to bgworker-7.patch. I hope that's the most recent version.
Sorry about that -- forgot to push to github. bgworker-7 corresponds to
commit 0a49a540b which I have just pushed to github.
# Concept
I appreciate the ability to run daemons that are not connected to
Postgres shared memory. It satisfies a user request that came up several
times when I talked about the bgworkers in Postgres-R.

Another good point is the flexible interface via extensions, even
allowing different starting points for such background workers.
Great.
One thing I'd miss (for use of that framework in Postgres-R) is the
ability to start a registered bgworker only upon request, way after the
system started. So that the bgworker process doesn't even exist until it
is really needed. I realize that RegisterBackgroundWorker() is still
necessary in advance to reserve the slots in BackgroundWorkerList and
shared memory.

As a use case independent of Postgres-R, think of something akin to a
worker_spi, but wanting that to perform a task every 24h on hundreds of
databases. You don't want to keep hundreds of processes occupying PGPROC
slots just to perform a measly task every 24h.
Yeah, this is something I specifically kept out initially to keep things
simple.
Maybe one thing to do in this area would be to ensure that there is a
certain number of PGPROC elements reserved specifically for bgworkers;
kind of like autovacuum workers have. That would avoid having regular
clients exhausting slots for bgworkers, and vice versa.
How are you envisioning that the requests would occur?
# Minor technical issues or questions
In assign_maxconnections, et al, GetNumRegisteredBackgroundWorkers() is
used in relation to MAX_BACKENDS or to calculate MaxBackends. That seems
to "leak" the bgworkers that registered with neither
BGWORKER_SHMEM_ACCESS nor BGWORKER_BACKEND_DATABASE_CONNECTION set. Or
is there any reason to discount such extra daemon processes?
No, I purposefully left those out, because those don't need a PGPROC. In
fact that seems to me to be the whole point of non-shmem-connected
workers: you can have as many as you like and they won't cause a
backend-side impact. You can use such a worker to connect via libpq to
the server, for example.
The additional contrib modules auth_counter and worker_spi are missing
from the contrib/Makefile. If that's intentional, they should probably
be listed under "Missing".

The auth_counter module leaves worker.bgw_restart_time uninitialized.
Being an example, it might make sense for auth_counter to provide a
signal that just calls SetLatch() to interrupt WaitLatch() and make
auth_counter emit its log line upon request, i.e. show how to use SIGUSR1.
KaiGai proposed that we remove auth_counter. I don't see why not; I
mean, worker_spi is already doing most of what auth_counter is doing, so
why not? However, as you say, maybe we need more coding examples.
The bgw_restart_time doesn't always work (certainly not the way I'm
expecting it to). For example, if you forget to pass the
BGWORKER_SHMEM_ACCESS flag, trying to LWLockAcquire() leads to the
worker being restarted immediately and repeatedly - independent of the
bgw_restart_time setting. The same holds true if the bgworker exits with
status 0 or in case it segfaults. Not when exiting with code 1, though.
Why is that? Especially in case of a segfault or equally "hard" errors
that can be expected to occur repeatedly, I don't want the worker to be
restarted that frequently.
Ah, that's just a bug, of course.
--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 11/28/2012 03:51 PM, Alvaro Herrera wrote:
I remember your patchset. I didn't look at it for this round, for no
particular reason. I did look at KaiGai's submission from two
commitfests ago, and also at a patch from Simon which AFAIK was never
published openly. Simon's patch merged the autovacuum code to make
autovac workers behave like bgworkers as handled by his patch, just like
you suggest. To me it didn't look all that good; I didn't have the guts
for that, hence the separation. If later bgworkers are found to work
well enough, we can "port" autovac workers to use this framework, but
for now it seems to me that the conservative thing is to leave autovac
untouched.
Hm.. interesting to see how the same idea grows into different patches
and gets refined until we end up with something considered committable.
Do you remember anything in particular that didn't look good? Would you
help reviewing a patch on top of bgworker-7 that changed autovacuum into
a bgworker?
(As an example, we introduced "ilist" some commits back and changed some
uses to it; but there are still many places where we're using SHM_QUEUE,
or List, or open-coded lists, which we could port to the new
infrastructure, but there's no pressing need to do it.)
Well, I usually like cleaning things up earlier rather than later (my
desk clearly being an exception to that rule, though). But yeah, new
code usually brings new bugs with it.
Sorry about that -- forgot to push to github. bgworker-7 corresponds to
commit 0a49a540b which I have just pushed to github.
Thanks.
Maybe one thing to do in this area would be to ensure that there is a
certain number of PGPROC elements reserved specifically for bgworkers;
kind of like autovacuum workers have. That would avoid having regular
clients exhausting slots for bgworkers, and vice versa.
Yeah, I think that's mandatory, anyways, see below.
How are you envisioning that the requests would occur?
Just like av_launcher does it now: set a flag in shared memory and
signal the postmaster (PMSIGNAL_START_AUTOVAC_WORKER).
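For reference, the existing autovacuum path boils down to the following; SendPostmasterSignal() and PMSIGNAL_START_AUTOVAC_WORKER are the real, existing interfaces, while a bgworker variant would presumably add its own (hypothetical) PMSIGNAL_* reason plus a shared memory slot saying which worker to start:

```c
/* How the avlauncher requests a worker today: put the work item in
 * shared memory, then poke the postmaster.  An on-demand bgworker
 * facility could mirror this with a new reason code. */
#include "postgres.h"
#include "storage/pmsignal.h"

static void
request_worker_start(void)
{
    SendPostmasterSignal(PMSIGNAL_START_AUTOVAC_WORKER);
}
```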
(That's why I'm so puzzled: it looks like it's pretty much all there,
already. I even remember a discussion about that mechanism probably not
being fast enough to spawn bgworkers. And a proposal to add multiple
such flags, so an avlauncher-like daemon could ask for multiple
bgworkers to be started in parallel. I've even measured the serial
bgworker fork rate back then, IIRC it was in the hundreds of forks per
second...)
# Minor technical issues or questions
In assign_maxconnections, et al, GetNumRegisteredBackgroundWorkers() is
used in relation to MAX_BACKENDS or to calculate MaxBackends. That seems
to "leak" the bgworkers that registered with neither
BGWORKER_SHMEM_ACCESS nor BGWORKER_BACKEND_DATABASE_CONNECTION set. Or
is there any reason to discount such extra daemon processes?

No, I purposefully left those out, because those don't need a PGPROC. In
fact that seems to me to be the whole point of non-shmem-connected
workers: you can have as many as you like and they won't cause a
backend-side impact. You can use such a worker to connect via libpq to
the server, for example.
Hm.. well, in that case, the shmem-connected ones are *not* counted. If
I create and run an extension that registers 100 shmem-connected
bgworkers, I cannot connect to a system with max_connections=100
anymore, because bgworkers then occupy all of the connections, already.
Please add the registered shmem-connected bgworkers to the
max_connections limit. I think it's counter intuitive to have autovacuum
workers reserved separately, but extension's bgworkers eat (client)
connections. Or put another way: max_connections should always be the
max number of *client* connections the DBA wants to allow.
(Or, if that's in some way complicated, please give the DBA an
additional GUC akin to max_background_workers. That can be merged with
the current max_autovacuum_workers, once/if we rebase autovacuum onto
bgworkers).
The additional contrib modules auth_counter and worker_spi are missing
from the contrib/Makefile. If that's intentional, they should probably
be listed under "Missing".

The auth_counter module leaves worker.bgw_restart_time uninitialized.
Being an example, it might make sense for auth_counter to provide a
signal that just calls SetLatch() to interrupt WaitLatch() and make
auth_counter emit its log line upon request, i.e. show how to use SIGUSR1.

KaiGai proposed that we remove auth_counter. I don't see why not; I
mean, worker_spi is already doing most of what auth_counter is doing, so
why not?
Agreed.
However, as you say, maybe we need more coding examples.
Maybe a minimally usable extra daemon example? Showing how to avoid
common pitfalls? Use cases, anybody? :-)
Ah, that's just a bug, of course.
I see. Glad my review found it.
Regards
Markus Wanner
Markus Wanner <markus@bluegap.ch> writes:
However, as you say, maybe we need more coding examples.
Maybe a minimally usable extra daemon example? Showing how to avoid
common pitfalls? Use cases, anybody? :-)
What about the PGQ ticker, pgqd?
https://github.com/markokr/skytools/tree/master/sql/ticker
https://github.com/markokr/skytools/blob/master/sql/ticker/pgqd.c
Or maybe pgAgent, which seems to live there, but is in C++ so might need
a rewrite to the specs:
Maybe it would be easier to have a version of GNU mcron as an extension,
with the ability to fire PostgreSQL stored procedures directly? (That
way the cron specific parts of the logic are already implemented)
http://www.gnu.org/software/mcron/
Another idea would be to have a pgbouncer extension. We would still need
of course to have pgbouncer as a separate component so that client
connection can outlive a postmaster crash, but that would still be very
useful as a first step into admission control. Let's not talk about the
feedback loop and per-cluster resource usage monitoring yet, but I guess
that you can see the drift.
Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
2012/11/30 Dimitri Fontaine <dimitri@2ndquadrant.fr>:
Markus Wanner <markus@bluegap.ch> writes:
However, as you say, maybe we need more coding examples.
Maybe a minimally usable extra daemon example? Showing how to avoid
common pitfalls? Use cases, anybody? :-)

What about the PGQ ticker, pgqd?
https://github.com/markokr/skytools/tree/master/sql/ticker
https://github.com/markokr/skytools/blob/master/sql/ticker/pgqd.c

Or maybe pgAgent, which seems to live there, but is in C++ so might need
a rewrite to the specs:

Maybe it would be easier to have a version of GNU mcron as an extension,
with the ability to fire PostgreSQL stored procedures directly? (That
way the cron specific parts of the logic are already implemented)

http://www.gnu.org/software/mcron/
Another idea would be to have a pgbouncer extension. We would still need
of course to have pgbouncer as a separate component so that client
connection can outlive a postmaster crash, but that would still be very
useful as a first step into admission control. Let's not talk about the
feedback loop and per-cluster resource usage monitoring yet, but I guess
that you can see the drift.
both will be nice
Pavel
Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
On 11/30/2012 11:09 AM, Dimitri Fontaine wrote:
Markus Wanner <markus@bluegap.ch> writes:
However, as you say, maybe we need more coding examples.
Maybe a minimally usable extra daemon example? Showing how to avoid
common pitfalls? Use cases, anybody? :-)

What about the PGQ ticker, pgqd?
https://github.com/markokr/skytools/tree/master/sql/ticker
https://github.com/markokr/skytools/blob/master/sql/ticker/pgqd.c
AFAICS pgqd currently uses libpq, so I think it would rather turn into
what I call a background worker, with a connection to Postgres shared
memory. I perfectly well see use cases (plural!) for those.
What I'm questioning is the use for what I rather call "extra daemons",
that is, additional processes started by the postmaster without a
connection to Postgres shared memory (and thus without a database
connection).
I was asking for a minimal example of such an extra daemon, similar to
worker_spi, showing how to properly write such a thing. Ideally showing
how to avoid common pitfalls.
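Lacking such an example, here is roughly what I'd expect one to look like. Since the process is not attached to Postgres shared memory, it waits on a private latch rather than MyProc->procLatch. This is a sketch under my reading of the patch, not code from it, and all names are illustrative:

```c
/* Minimal extra daemon (bgw_flags = 0): no shared memory, no database
 * connection.  The patch may additionally require explicitly
 * unblocking signals on startup. */
#include "postgres.h"
#include "storage/latch.h"
#include "storage/ipc.h"

static volatile sig_atomic_t got_sigterm = false;
static Latch mylatch;

static void
daemon_sigterm(SIGNAL_ARGS)
{
    got_sigterm = true;
    SetLatch(&mylatch);
}

static void
extra_daemon_main(void *main_arg)
{
    InitLatch(&mylatch);            /* private, process-local latch */
    pqsignal(SIGTERM, daemon_sigterm);

    while (!got_sigterm)
    {
        /* sleep interruptibly; never plain sleep() under the postmaster */
        WaitLatch(&mylatch, WL_LATCH_SET | WL_TIMEOUT, 10000L);
        ResetLatch(&mylatch);
        /* ... periodic work that needs no shared memory ... */
    }
    proc_exit(0);
}
```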
Maybe it would be easier to have a version of GNU mcron as an extension,
with the ability to fire PostgreSQL stored procedures directly? (That
way the cron specific parts of the logic are already implemented)
Again, that's something that would eventually require a database
connection. Or at least access to Postgres shared memory to eventually
start the required background jobs. (Something Alvaro's patch doesn't
implement, yet. But which exactly matches what the coordinator and
bgworkers in Postgres-R do.)
For ordinary extra daemons, I'm worried about things like an OOM killer
deciding to kill the postmaster, being its parent. Or (io)niceness
settings. Or Linux cgroups. IMO all of these things just get harder to
administrate for extra daemons, when they move under the hat of the
postmaster.
Without access to shared memory, the only thing an extra daemon would
gain by moving under postmaster is the "knowledge" of the start, stop
and restart (crash) events of the database. And that in a very
inflexible way: the extra process is forced to start, stop and restart
together with the database to "learn" about these events.
Using a normal client connection arguably is a better way to learn about
crash events - it doesn't have the above limitation. And the start and
stop events, well, the DBA or sysadmin is under control of these,
already. We can possibly improve on that, yes. But extra daemons are not
the way to go, IMO.
Another idea would be to have a pgbouncer extension. We would still need
of course to have pgbouncer as a separate component so that client
connection can outlive a postmaster crash, but that would still be very
useful as a first step into admission control. Let's not talk about the
feedback loop and per-cluster resource usage monitoring yet, but I guess
that you can see the drift.
Sorry, I don't. Especially not with something like pgbouncer, because I
definitely *want* to control and administrate that separately.
So I guess this is a vote to disallow `worker.bgw_flags = 0` on the
basis that everything a process, which doesn't need to access Postgres
shared memory, better does whatever it does outside of Postgres. For the
benefit of the stability of Postgres and for ease of administration of
the two.
Or, maybe, rather drop the BGWORKER_SHMEM_ACCESS flag and imply that
every registered process wants to have access to Postgres shared memory.
Then document the gotchas and requirements of living under the
Postmaster, as I've requested before. (If you want a foot gun, you can
still write an extension that doesn't need to access Postgres shared
memory, but hey.. we can't help those who desperately try to shoot
their foot).
Regards
Markus Wanner
Markus Wanner <markus@bluegap.ch> writes:
AFAICS pgqd currently uses libpq, so I think it would rather turn into
what I call a background worker, with a connection to Postgres shared
memory. I perfectly well see use cases (plural!) for those.

What I'm questioning is the use for what I rather call "extra daemons",
that is, additional processes started by the postmaster without a
connection to Postgres shared memory (and thus without a database
connection).
I totally missed the need to connect to shared memory to be able to
connect to a database and query it. Can't we just link the bgworker
code to the client libpq library, just as plproxy is doing I believe?
Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
Dimitri Fontaine wrote:
Markus Wanner <markus@bluegap.ch> writes:
AFAICS pgqd currently uses libpq, so I think it would rather turn into
what I call a background worker, with a connection to Postgres shared
memory. I perfectly well see use cases (plural!) for those.

What I'm questioning is the use for what I rather call "extra daemons",
that is, additional processes started by the postmaster without a
connection to Postgres shared memory (and thus without a database
connection).

I totally missed the need to connect to shared memory to be able to
connect to a database and query it. Can't we just link the bgworker
code to the client libpq library, just as plproxy is doing I believe?
One of the uses for bgworkers that don't have shmem connection is to
have them use libpq connections instead. I don't really see the point
of forcing everyone to use backend connections when libpq connections
are enough. In particular, they are easier to port from existing code;
and they make it easier to share code with systems that still have to
support older PG versions.
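Such a worker is essentially ordinary libpq client code that happens to run under the postmaster; a sketch, where the connection string and query are illustrative placeholders:

```c
/* Sketch of a bgw_flags = 0 worker talking to its own server over
 * libpq, exactly as an external client would.  Connection string and
 * query are placeholders. */
#include "postgres.h"
#include "storage/ipc.h"
#include "libpq-fe.h"

static void
libpq_worker_main(void *main_arg)
{
    PGconn   *conn = PQconnectdb("dbname=postgres");
    PGresult *res;

    if (PQstatus(conn) != CONNECTION_OK)
        proc_exit(1);

    res = PQexec(conn, "SELECT count(*) FROM pg_stat_activity");
    if (PQresultStatus(res) == PGRES_TUPLES_OK)
        elog(LOG, "worker sees %s active backends",
             PQgetvalue(res, 0, 0));

    PQclear(res);
    PQfinish(conn);
    proc_exit(0);
}
```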
--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 2012-11-30 09:57:20 -0300, Alvaro Herrera wrote:
Dimitri Fontaine wrote:
Markus Wanner <markus@bluegap.ch> writes:
AFAICS pgqd currently uses libpq, so I think it would rather turn into
what I call a background worker, with a connection to Postgres shared
memory. I perfectly well see use cases (plural!) for those.

What I'm questioning is the use for what I rather call "extra daemons",
that is, additional processes started by the postmaster without a
connection to Postgres shared memory (and thus without a database
connection).

I totally missed the need to connect to shared memory to be able to
connect to a database and query it. Can't we just link the bgworker
code to the client libpq library, just as plproxy is doing I believe?

One of the uses for bgworkers that don't have shmem connection is to
have them use libpq connections instead. I don't really see the point
of forcing everyone to use backend connections when libpq connections
are enough. In particular, they are easier to port from existing code;
and they make it easier to share code with systems that still have to
support older PG versions.
They also can get away with a lot more crazy stuff without corrupting
the database. You'd better know something about what you're doing before
doing something with direct shared memory access.
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Andres Freund <andres@2ndquadrant.com> writes:
One of the uses for bgworkers that don't have shmem connection is to
have them use libpq connections instead. I don't really see the point
of forcing everyone to use backend connections when libpq connections
are enough. In particular, they are easier to port from existing code;
and they make it easier to share code with systems that still have to
support older PG versions.
Exactly, I think most bgworkers would just use libpq if that's available,
using a backend's infrastructure is not that good a fit here. I mean,
connect from your worker to a database using libpq and call a backend's
function (provided by the same extension I guess) in there.
That's how I think pgqd would get integrated into the worker
infrastructure, right?
They also can get away with a lot more crazy stuff without corrupting
the database. You'd better know something about what you're doing before
doing something with direct shared memory access.
And there's a whole lot you can already do just with a C coded stored
procedure already.
Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
2012/11/30 Dimitri Fontaine <dimitri@2ndquadrant.fr>:
Andres Freund <andres@2ndquadrant.com> writes:
One of the uses for bgworkers that don't have shmem connection is to
have them use libpq connections instead. I don't really see the point
of forcing everyone to use backend connections when libpq connections
are enough. In particular, they are easier to port from existing code;
and they make it easier to share code with systems that still have to
support older PG versions.

Exactly, I think most bgworkers would just use libpq if that's available,
using a backend's infrastructure is not that good a fit here. I mean,
connect from your worker to a database using libpq and call a backend's
function (provided by the same extension I guess) in there.

That's how I think pgqd would get integrated into the worker
infrastructure, right?
One thing we have to pay attention to is that the backend code cannot
distinguish a connection from a bgworker via libpq from other regular
connections, from the perspective of access control.
Even if we implement the major portion with a C-function, do we have a
reliable way to prohibit that C-function from being invoked by a user's
query?
I also plan to use bgworker to load data chunks from shared_buffers to a
GPU device in parallel; that shall be performed under the regular access
control stuff.
I think using libpq is a good "option" for 95% of development; however, it
should also be possible to use the SPI interface for corner-case usage.
Thanks,
--
KaiGai Kohei <kaigai@kaigai.gr.jp>
On 11/30/2012 01:40 PM, Dimitri Fontaine wrote:
I totally missed the need to connect to shared memory to be able to
connect to a database and query it. Can't we just link the bgworkder
code to the client libpq library, just as plproxy is doing I believe?
Well, sure you can use libpq. But so can external processes. So that's
no benefit of running under the postmaster.
Regards
Markus Wanner
On 11/30/2012 01:59 PM, Andres Freund wrote:
On 2012-11-30 09:57:20 -0300, Alvaro Herrera wrote:
One of the uses for bgworkers that don't have shmem connection is to
have them use libpq connections instead. I don't really see the point
of forcing everyone to use backend connections when libpq connections
are enough.
Requiring a libpq connection is a good indication for *not* wanting the
process to run under the postmaster, IMO.
In particular, they are easier to port from existing code;
and they make it easier to share code with systems that still have to
support older PG versions.They also can get away with a lot more crazy stuff without corrupting
the database.
Exactly. That's a good reason to *not* tie that to the postmaster, then.
Please keep as much of the potentially dangerous stuff separate (and
advice developers to do so as well, instead of offering them a foot
gun). So that our postmaster can do its job. And do it reliably, without
trying to be a general purpose start/stop daemon. There are better and
well established tools for that.
Regards
Markus Wanner
Kohei KaiGai <kaigai@kaigai.gr.jp> writes:
One thing we have to pay attention to is that the backend code cannot
distinguish a libpq connection made by a bgworker from other regular
connections, from the perspective of access control.
Even if we implement the major portion as C functions, do we have a
reliable way to prohibit those C functions from being invoked from a
user's query?
Why would you want to do that? And maybe the way to enforce that would
be by having your extension do its connecting using SPI to be able to
place things in known pieces of memory for the function to check?
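Dimitri's suggestion might be sketched roughly as follows: a worker that connects to a database and issues its queries through SPI rather than libpq, so it runs inside a normal backend environment. This is only a sketch against the API of the patch under review; the function names (BackgroundWorkerInitializeConnection in particular) and the bgw_main signature are assumptions that may not match the committed interface, and the fragment compiles only inside a server source tree.

```c
#include "postgres.h"
#include "postmaster/bgworker.h"
#include "executor/spi.h"
#include "access/xact.h"
#include "utils/snapmgr.h"

/* Hypothetical entry point of a database-connected bgworker. */
static void
worker_spi_main(Datum main_arg)
{
    /* Connect to a database; from here on the worker behaves much
     * like a regular backend, bypassing the fe/be protocol. */
    BackgroundWorkerInitializeConnection("postgres", NULL);

    StartTransactionCommand();
    SPI_connect();
    PushActiveSnapshot(GetTransactionSnapshot());

    /* Run SQL directly through SPI; read-only here. */
    SPI_execute("SELECT count(*) FROM pg_class", true, 0);

    SPI_finish();
    PopActiveSnapshot();
    CommitTransactionCommand();
}
```

Such a worker could also place results in known pieces of shared memory for backend-side functions to check, as suggested above.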
I also plan to use a bgworker to load data chunks from shared buffers
to a GPU device in parallel; that should be performed under the regular
access control machinery.
That sounds like a job where you need shared memory access but maybe not
a database connection?
I think using libpq is a good "option" for 95% of development; however, it
should also be possible to use the SPI interface for corner cases.
+1, totally agreed.
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
Markus Wanner wrote:
On 11/30/2012 01:40 PM, Dimitri Fontaine wrote:
I totally missed the need to connect to shared memory to be able to
connect to a database and query it. Can't we just link the bgworker
code to the client libpq library, just as plproxy is doing, I believe?
Well, sure you can use libpq. But so can external processes. So that's
no benefit of running under the postmaster.
No, it's not a benefit of that; but such a process would get started up
when postmaster is started, and shut down when postmaster stops. So it
makes it easier to have processes that need to run alongside postmaster.
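The lifecycle coupling Alvaro describes falls out of the registration call alone. A minimal sketch, assuming the field names and flags of the patch under review (bgw_name as a plain pointer, a bgw_flags of 0 meaning neither shared-memory access nor a database connection); the committed API may differ, and this compiles only against server headers:

```c
#include "postgres.h"
#include "postmaster/bgworker.h"

/* Hypothetical worker body: an "extra daemon" that merely lives
 * and dies with the postmaster. */
static void my_daemon_main(Datum main_arg);

void
_PG_init(void)
{
    BackgroundWorker worker;

    MemSet(&worker, 0, sizeof(worker));
    worker.bgw_name = "my extra daemon";
    worker.bgw_flags = 0;               /* no shmem, no DB connection */
    worker.bgw_start_time = BgWorkerStart_PostmasterStart;
    worker.bgw_restart_time = BGW_NEVER_RESTART;
    worker.bgw_main = my_daemon_main;
    RegisterBackgroundWorker(&worker);
}
```

Registration must happen at shared-library preload time, which is why the slot reservation mentioned earlier in the review has to be done in advance.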
--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
2012/11/30 Dimitri Fontaine <dimitri@2ndquadrant.fr>:
Kohei KaiGai <kaigai@kaigai.gr.jp> writes:
One thing we have to pay attention to is that the backend code cannot
distinguish a libpq connection made by a bgworker from other regular
connections, from the perspective of access control.
Even if we implement the major portion as C functions, do we have a
reliable way to prohibit those C functions from being invoked from a
user's query?
Why would you want to do that? And maybe the way to enforce that would
be by having your extension do its connecting using SPI to be able to
place things in known pieces of memory for the function to check?
As long as SPI is an option for bgworkers, I have nothing to argue about
here. If the framework forced extensions acting as background workers to
use libpq for database connections, certain kinds of work tied to
internal stuff would have to be implemented as C functions. That is what
I was worried about.
I also plan to use a bgworker to load data chunks from shared buffers
to a GPU device in parallel; that should be performed under the regular
access control machinery.
That sounds like a job where you need shared memory access but maybe not
a database connection?
Both of them are needed in this case. This "io-worker" will act on
requests from regular backend processes, fetching tuples from the heap
into a DMA buffer located in shared memory.
I think using libpq is a good "option" for 95% of development; however, it
should also be possible to use the SPI interface for corner cases.
+1, totally agreed.
Thanks,
--
KaiGai Kohei <kaigai@kaigai.gr.jp>
Alvaro,
On 11/30/2012 02:44 PM, Alvaro Herrera wrote:
So it
makes it easier to have processes that need to run alongside postmaster.
That's where we respectfully disagree, then. I don't think that's
easier overall; in my eyes, this looks like a foot gun.
As long as things like pgbouncer, pgqd, etc. remain available as
separate daemons, I'm happy, though.
Regards
Markus Wanner
2012/11/30 Markus Wanner <markus@bluegap.ch>:
Alvaro,
On 11/30/2012 02:44 PM, Alvaro Herrera wrote:
So it
makes it easier to have processes that need to run alongside postmaster.
That's where we respectfully disagree, then. I don't think that's
easier overall; in my eyes, this looks like a foot gun.
As long as things like pgbouncer, pgqd, etc. remain available as
separate daemons, I'm happy, though.
This feature does not force them to be implemented with the new
framework. If they can run as separate daemons, that is fine.
But those are not all the cases; there are others where we want
background workers tied to the postmaster's lifetime.
--
KaiGai Kohei <kaigai@kaigai.gr.jp>
On 11/30/2012 03:16 PM, Kohei KaiGai wrote:
This feature does not force them to be implemented with the new
framework. If they can run as separate daemons, that is fine.
I'm not clear on what exactly you envision, but if a process needs
access to shared buffers, it sounds like it should be a bgworker. I
don't quite understand why that process also wants a libpq connection,
but that's certainly doable.
But those are not all the cases; there are others where we want
background workers tied to the postmaster's lifetime.
Not wanting a process to be tied to the postmaster's lifetime is a
strong indication that it had better not be a bgworker, I think.
Regards
Markus Wanner
2012/11/30 Markus Wanner <markus@bluegap.ch>:
On 11/30/2012 03:16 PM, Kohei KaiGai wrote:
This feature does not force them to be implemented with the new
framework. If they can run as separate daemons, that is fine.
I'm not clear on what exactly you envision, but if a process needs
access to shared buffers, it sounds like it should be a bgworker. I
don't quite understand why that process also wants a libpq connection,
but that's certainly doable.
It seemed to me you were advocating that any use case for background
workers can be implemented with the existing separate-daemon approach.
What I wanted to say is that there are cases where the background-worker
framework allows such extensions to be implemented with a more
reasonable design.
Yes, one reason I want to use background workers is access to the
shared-memory segment. Another is connecting to databases outside the
usual access controls, as autovacuum does. That is why I'm saying the
SPI interface should also be an option, not libpq only.
(If an extension chooses libpq, it doesn't need anything special, does it?)
But those are not all the cases; there are others where we want
background workers tied to the postmaster's lifetime.
Not wanting a process to be tied to the postmaster's lifetime is a
strong indication that it had better not be a bgworker, I think.
It probably also means that when a process's lifetime should be tied to
the postmaster's, its author can consider implementing it as a
background worker.
Thanks,
--
KaiGai Kohei <kaigai@kaigai.gr.jp>