Passing values to a dynamic background worker

Started by Keith Fiske over 8 years ago · 6 messages
#1Keith Fiske
keith@omniti.com

So after reading a recent thread on the steep learning curve for PG
internals [1], I figured I'd share where I've gotten stuck with this in a
new thread vs hijacking that one.

One of the goals I had with pg_partman was to see if I could get the
partitioning python scripts redone as C functions using a dynamic
background worker to be able to commit in batches with a single call. My
thinking was to have a user-function that can accept arguments for things
like the interval value, batch size, and other arguments to the python
script, then start/stop a dynamic bgw up for each batch so it can commit
after each one. The dynamic bgw would essentially just have to call the
already existing partition_data() plpgsql function, but I have to be able
to pass the argument values that the user gave down into the dynamic bgw.

I've reached a roadblock in that bgw_main_arg can only accept a single
argument that must be passed by value for a dynamic bgw. I already worked
around this for passing the database name to my existing use of a bgw
with doing partition maintenance (pass a simple integer to use as an index
array value). But I'm not sure how to do this for passing multiple values
in. I'm assuming this would be the place where I'd see about storing values
in shared memory to be able to re-use later? I'm not even sure if that's
the right approach, and if it is, where to even start to understand how to
do that. Let alone in the context of how that would interact with the
background worker system. If you look at my existing C code, you can see
it's very simple and doesn't do much more than the worker_spi example. I've
yet to have to interact with any memory contexts or such things, and as the
referenced thread below mentions, doing so is quite a steep learning curve.
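For reference, the single-integer workaround can be sketched in a self-contained way like this (the database names and function name here are hypothetical; in the real code the value travels as `worker.bgw_main_arg = Int32GetDatum(idx)` and comes back out in the worker with `DatumGetInt32(main_arg)`):

```c
#include <assert.h>
#include <string.h>

/* Hypothetical list of databases the launcher knows about.  The worker
 * receives only an index into it, because bgw_main_arg is a single
 * by-value Datum. */
static const char *db_names[] = { "postgres", "partman_test", "prod" };

/* Worker side: turn the integer that arrived in bgw_main_arg back into
 * the database name to connect to. */
static const char *
worker_lookup_db(int main_arg)
{
    return db_names[main_arg];
}
```

This works for one small scalar, but as the rest of the thread discusses, it does not extend to multiple or variable-size arguments.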

Any guidance for a newer internals dev here would be great.

1.
/messages/by-id/CAH=t1kqwCBF7J1bP0RjgsTcp-SaJaHrF4Yhb1iiQZMe3W-FX2w@mail.gmail.com

--
Keith Fiske
Database Administrator
OmniTI Computer Consulting, Inc.
http://www.keithf4.com

#2Kyotaro HORIGUCHI
horiguchi.kyotaro@lab.ntt.co.jp
In reply to: Keith Fiske (#1)
Re: Passing values to a dynamic background worker

Hello,

At Mon, 17 Apr 2017 16:19:13 -0400, Keith Fiske <keith@omniti.com> wrote in <CAG1_KcAFJ60pac_QnnZX0qeO12NENiPOcohuoQvs297WaT_ObQ@mail.gmail.com>

So after reading a recent thread on the steep learning curve for PG
internals [1], I figured I'd share where I've gotten stuck with this in a
new thread vs hijacking that one.

One of the goals I had with pg_partman was to see if I could get the
partitioning python scripts redone as C functions using a dynamic
background worker to be able to commit in batches with a single call. My
thinking was to have a user-function that can accept arguments for things
like the interval value, batch size, and other arguments to the python
script, then start/stop a dynamic bgw up for each batch so it can commit
after each one. The dymanic bgw would essentially just have to call the
already existing partition_data() plpgsql function, but I have to be able
to pass the argument values that the user gave down into the dynamic bgw.

I've reached a roadblock in that bgw_main_arg can only accept a single
argument that must be passed by value for a dynamic bgw. I already worked
around this for passing the database name to the my existing use of a bgw
with doing partition maintenance (pass a simple integer to use as an index
array value). But I'm not sure how to do this for passing multiple values
in. I'm assuming this would be the place where I'd see about storing values
in shared memory to be able to re-use later? I'm not even sure if that's
the right approach, and if it is, where to even start to understand how to
do that.

I think you are on the right track; shared memory is the way to do
it. There are two ways to acquire shared memory for such a purpose.
One is static shared memory, which lives alongside shared_buffers;
the other is dynamic shared memory (DSM). If you need a fixed-size
memory segment, the former will work. If you need an indefinite
amount, DSM will.

You will see how to use (static) shared memory in the following
section of the documentation, or pg_stat_statements.c will be a
good reference. This kind of shared memory is guaranteed to be
mapped at the same address in every process, so plain pointers can
be used within it.

https://www.postgresql.org/docs/devel/static/xfunc-c.html#idp83376336

On the other hand, AFAICS, DSM doesn't seem well documented. I
managed to find a related document in the Postgres Wiki, but it
seems a bit old.

https://wiki.postgresql.org/wiki/Parallel_Internal_Sort

This is a little more complex than static shared memory, and it is
*not* guaranteed to be mapped at the same address among workers. You
will see an example in LaunchParallelWorkers() and the related
functions in parallel.c. The basics of its usage are as follows.

- Create a segment:
  dsm_segment *seg = dsm_create(size);
- Send its handle via bgw_main_arg:
  worker.bgw_main_arg = dsm_segment_handle(seg);
- Attach the memory on the other side:
  dsm_segment *seg = dsm_attach(main_arg);

On both sides, the address of the attached shared memory is
obtained using dsm_segment_address(seg).

dsm_detach(seg) detaches the segment. Once all users of the
segment have detached, it will be destroyed.

You might also need a locking or notification mechanism. Usually
the mechanisms named LWLock and Latch are used for that purpose.
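The packing side of these steps can be sketched in a self-contained way. The `partman_args` struct below is hypothetical, and plain malloc stands in for the DSM segment, since the real dsm_create()/dsm_attach() calls need a running server; with real DSM you would dsm_create(sizeof(partman_args)), fill it through dsm_segment_address(), and pass dsm_segment_handle() via bgw_main_arg:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical fixed-size argument block for the worker. */
typedef struct partman_args
{
    char    parent_table[64];   /* e.g. "public.measurements" */
    int32_t batch_count;
    int32_t batch_interval;     /* simplified: interval as a plain int */
} partman_args;

/* "Launcher" side: fill the segment the worker will later attach to. */
static void
fill_args(void *seg_addr, const char *parent, int32_t count, int32_t interval)
{
    partman_args *args = (partman_args *) seg_addr;

    strncpy(args->parent_table, parent, sizeof(args->parent_table) - 1);
    args->parent_table[sizeof(args->parent_table) - 1] = '\0';
    args->batch_count = count;
    args->batch_interval = interval;
}

/* "Worker" side: read the values back out after attaching. */
static partman_args
read_args(void *seg_addr)
{
    return *(partman_args *) seg_addr;
}
```

Because the struct contains only fixed-size fields and no pointers, it is safe to read regardless of where the segment gets mapped in the worker.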

Let alone in the context of how that would interact with the
background worker system. If you look at my existing C code, you can see
it's very simple and doesn't do much more than the worker_spi example. I've
yet to have to interact with any memory contexts or such things, and as the
referenced thread below mentions, doing so is quite a steep learning curve.

Any guidance for a newer internals dev here would be great.

1.
/messages/by-id/CAH=t1kqwCBF7J1bP0RjgsTcp-SaJaHrF4Yhb1iiQZMe3W-FX2w@mail.gmail.com


Good luck!

--
Kyotaro Horiguchi
NTT Open Source Software Center

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#3Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Kyotaro HORIGUCHI (#2)
Re: Passing values to a dynamic background worker

On 2017/04/18 18:12, Kyotaro HORIGUCHI wrote:

At Mon, 17 Apr 2017 16:19:13 -0400, Keith Fiske wrote:

So after reading a recent thread on the steep learning curve for PG
internals [1], I figured I'd share where I've gotten stuck with this in a
new thread vs hijacking that one.

One of the goals I had with pg_partman was to see if I could get the
partitioning python scripts redone as C functions using a dynamic
background worker to be able to commit in batches with a single call. My
thinking was to have a user-function that can accept arguments for things
like the interval value, batch size, and other arguments to the python
script, then start/stop a dynamic bgw up for each batch so it can commit
after each one. The dynamic bgw would essentially just have to call the
already existing partition_data() plpgsql function, but I have to be able
to pass the argument values that the user gave down into the dynamic bgw.

I've reached a roadblock in that bgw_main_arg can only accept a single
argument that must be passed by value for a dynamic bgw. I already worked
around this for passing the database name to my existing use of a bgw
with doing partition maintenance (pass a simple integer to use as an index
array value). But I'm not sure how to do this for passing multiple values
in. I'm assuming this would be the place where I'd see about storing values
in shared memory to be able to re-use later? I'm not even sure if that's
the right approach, and if it is, where to even start to understand how to
do that.

On the other hand, AFAICS, DSM doesn't seem well documented. I
managed to find a related document in the Postgres Wiki, but it
seems a bit old.

https://wiki.postgresql.org/wiki/Parallel_Internal_Sort

This is a little more complex than static shared memory, and it is
*not* guaranteed to be mapped at the same address among workers. You
will see an example in LaunchParallelWorkers() and the related
functions in parallel.c. The basics of its usage are as follows.

- Create a segment:
  dsm_segment *seg = dsm_create(size);
- Send its handle via bgw_main_arg:
  worker.bgw_main_arg = dsm_segment_handle(seg);
- Attach the memory on the other side:
  dsm_segment *seg = dsm_attach(main_arg);

On both sides, the address of the attached shared memory is
obtained using dsm_segment_address(seg).

dsm_detach(seg) detaches the segment. Once all users of the
segment have detached, it will be destroyed.

Perhaps the more modern DSA mechanism could be applicable here, too.

Some recent commits demonstrate DSA usage, such as the BRIN
autosummarization commit (7526e10224f) and tidbitmap.c's shared
iteration support commit (98e6e89040a05).
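The reason DSA hands out dsa_pointer offsets rather than raw pointers is the mapping caveat mentioned above: the segment may land at a different address in each backend. A self-contained sketch of that offset trick (two plain char buffers stand in for the same segment mapped at two different addresses; the names here are illustrative, not the real DSA API):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A relocatable "pointer": an offset from the segment base, so it stays
 * valid even when another process maps the segment at a different
 * address.  This is the idea behind dsa_pointer. */
typedef size_t rel_ptr;

/* Turn an absolute address within the segment into an offset. */
static rel_ptr
make_rel(void *base, void *addr)
{
    return (size_t) ((char *) addr - (char *) base);
}

/* Turn an offset back into an address, relative to *this* mapping. */
static void *
resolve_rel(void *base, rel_ptr off)
{
    return (char *) base + off;
}
```

Storing offsets instead of addresses is what lets two processes share a linked structure inside a segment without agreeing on where it is mapped.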

Thanks,
Amit


#4Keith Fiske
keith@omniti.com
In reply to: Amit Langote (#3)
Re: Passing values to a dynamic background worker

On Tue, Apr 18, 2017 at 5:40 AM, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp

wrote:

On 2017/04/18 18:12, Kyotaro HORIGUCHI wrote:

At Mon, 17 Apr 2017 16:19:13 -0400, Keith Fiske wrote:

So after reading a recent thread on the steep learning curve for PG
internals [1], I figured I'd share where I've gotten stuck with this in a
new thread vs hijacking that one.

One of the goals I had with pg_partman was to see if I could get the
partitioning python scripts redone as C functions using a dynamic
background worker to be able to commit in batches with a single call. My
thinking was to have a user-function that can accept arguments for things
like the interval value, batch size, and other arguments to the python
script, then start/stop a dynamic bgw up for each batch so it can commit
after each one. The dynamic bgw would essentially just have to call the
already existing partition_data() plpgsql function, but I have to be able
to pass the argument values that the user gave down into the dynamic bgw.

I've reached a roadblock in that bgw_main_arg can only accept a single
argument that must be passed by value for a dynamic bgw. I already worked
around this for passing the database name to my existing use of a bgw
with doing partition maintenance (pass a simple integer to use as an index
array value). But I'm not sure how to do this for passing multiple values
in. I'm assuming this would be the place where I'd see about storing values
in shared memory to be able to re-use later? I'm not even sure if that's
the right approach, and if it is, where to even start to understand how
to do that.

On the other hand, AFAICS, DSM doesn't seem well documented. I
managed to find a related document in the Postgres Wiki, but it
seems a bit old.

https://wiki.postgresql.org/wiki/Parallel_Internal_Sort

This is a little more complex than static shared memory, and it is
*not* guaranteed to be mapped at the same address among workers. You
will see an example in LaunchParallelWorkers() and the related
functions in parallel.c. The basics of its usage are as follows.

- Create a segment:
  dsm_segment *seg = dsm_create(size);
- Send its handle via bgw_main_arg:
  worker.bgw_main_arg = dsm_segment_handle(seg);
- Attach the memory on the other side:
  dsm_segment *seg = dsm_attach(main_arg);

On both sides, the address of the attached shared memory is
obtained using dsm_segment_address(seg).

dsm_detach(seg) detaches the segment. Once all users of the
segment have detached, it will be destroyed.

Perhaps, the more modern DSA mechanism could be applicable here, too.

Some recent commits demonstrate examples of DSA usage, such as BRIN
autosummarization commit (7526e10224f) and tidbitmap.c's shared iteration
support commit (98e6e89040a05).

Thanks,
Amit

Thank you both very much for the suggestions!

Keith

#5Peter Eisentraut
peter.eisentraut@2ndquadrant.com
In reply to: Keith Fiske (#1)
Re: Passing values to a dynamic background worker

On 4/17/17 16:19, Keith Fiske wrote:

I've reached a roadblock in that bgw_main_arg can only accept a single
argument that must be passed by value for a dynamic bgw. I already
worked around this for passing the database name to my existing use
of a bgw with doing partition maintenance (pass a simple integer to use
as an index array value). But I'm not sure how to do this for passing
multiple values in.

You can also store this kind of information in a table.
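A sketch of that pattern: store the full argument set in a table keyed by a small id, pass only the id through bgw_main_arg, and have the worker look the rest up. Here an in-memory array stands in for the table, and all names are hypothetical; in a real worker this lookup would be an SPI SELECT after connecting to the database:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical row of a "job args" table.  In the real thing the worker
 * would run something like:
 *   SELECT parent_table, batch_count FROM partman.bgw_job_args WHERE id = $1 */
typedef struct job_args
{
    int32_t id;
    char    parent_table[64];
    int32_t batch_count;
} job_args;

/* Stand-in for the table: the launcher INSERTs, the worker SELECTs. */
static job_args jobs[16];
static int      njobs;

/* Launcher side: record the arguments, get back a small id that fits
 * in bgw_main_arg. */
static int32_t
insert_job(const char *parent, int32_t count)
{
    job_args *j = &jobs[njobs];

    j->id = njobs + 1;                  /* pretend serial column */
    strncpy(j->parent_table, parent, sizeof(j->parent_table) - 1);
    j->batch_count = count;
    njobs++;
    return j->id;
}

/* Worker side: recover the full argument set from the id. */
static const job_args *
lookup_job(int32_t id)
{
    for (int i = 0; i < njobs; i++)
        if (jobs[i].id == id)
            return &jobs[i];
    return NULL;
}
```

The trade-off versus shared memory is durability and simplicity: a table survives crashes and is easy to inspect, at the cost of a catalog object and a transaction on each side.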

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


#6Keith Fiske
keith@omniti.com
In reply to: Peter Eisentraut (#5)
Re: Passing values to a dynamic background worker

On Tue, Apr 18, 2017 at 12:34 PM, Peter Eisentraut <
peter.eisentraut@2ndquadrant.com> wrote:

On 4/17/17 16:19, Keith Fiske wrote:

I've reached a roadblock in that bgw_main_arg can only accept a single
argument that must be passed by value for a dynamic bgw. I already
worked around this for passing the database name to my existing use
of a bgw with doing partition maintenance (pass a simple integer to use
as an index array value). But I'm not sure how to do this for passing
multiple values in.

You can also store this kind of information in a table.


True, but that seemed like the easy way out. :) Trying to find ways to
learn internals better through projects I'm actively working on.

Keith