proposal: contrib module - generic command scheduler

Started by Pavel Stehule over 10 years ago, 15 messages

pavel.stehule@gmail.com

over 10 years ago

Generic simple scheduler to contrib
===================================
Job schedulers are an important, and sometimes very complex, part of any
software. PostgreSQL lacks one. I propose a new contrib module that can be
used directly for simple tasks and that can serve as a base for richer
schedulers. I prefer a minimalist design, but one strong enough to be
extended when necessary. Some complex logic can be implemented better in a
PL than in C. Motto: simple to learn, simple to use, simple to customize.

Motivation
----------
Simplify the administration of repeated tasks. Make it possible to write
complex schedulers in PL/pgSQL or another PL.

Design
------
Every scheduled command will be executed in an independent worker. The
number of workers for one command can be limited, and the total number of
workers will be limited too. Every command will be executed as a specified
user, with a known timeout, in the current database. A next step could be a
global scheduler, but we do not have an environment for running server-side
global scripts, so I am not considering it at this moment.

This scheduler does not guarantee the number of executions. When no worker
is available the execution is suspended, and after a crash the execution can
be repeated. This can be solved in an upper layer if necessary. It is not
designed as a realtime system. A scheduled task is executed as soon as a
suitable worker is free, but the execution window lasts only until the next
scheduled start.

This design does not try to provide a mechanism for repeating tasks that
have crashed. That can be solved better in a PL, in a custom layer, when
necessary.

The scheduled time is stored in the type scheduled_time:

create type scheduled_time as (second int[], minute int[], hour int[], dow
int[], month int[]);

(,"{0,10,20,30,40,50}",,,) .. run every 10 minutes
(,"{5}",,,) .. run once per hour
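A minimal Python sketch of how a value of this type could be matched against a timestamp. The class and function names are hypothetical, and two points are assumptions rather than part of the proposal: an empty (NULL) field is treated as "any value", and dow is numbered 0 = Sunday to 6 = Saturday.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, Sequence


@dataclass(frozen=True)
class ScheduledTime:
    """Mirror of the proposed composite type; None stands for an empty field."""
    second: Optional[Sequence[int]] = None
    minute: Optional[Sequence[int]] = None
    hour: Optional[Sequence[int]] = None
    dow: Optional[Sequence[int]] = None    # assumed: 0 = Sunday .. 6 = Saturday
    month: Optional[Sequence[int]] = None  # 1 .. 12


def matches(stime: ScheduledTime, ts: datetime) -> bool:
    """Return True if ts is on the schedule; an empty field matches anything (assumption)."""
    dow = ts.isoweekday() % 7  # Python: Monday=1..Sunday=7 -> Sunday=0..Saturday=6
    checks = (
        (stime.second, ts.second),
        (stime.minute, ts.minute),
        (stime.hour, ts.hour),
        (stime.dow, dow),
        (stime.month, ts.month),
    )
    return all(allowed is None or value in allowed for allowed, value in checks)


every_ten_minutes = ScheduledTime(minute=(0, 10, 20, 30, 40, 50))
hourly = ScheduledTime(minute=(5,))
```

With the wildcard assumption, a schedule that only sets minute matches every second of that minute, so a real scheduler would still need the "once per window" rule described under Design.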

The core is the table pg_scheduled_commands:

oid: 1
name: xxxx
user: pavel
stime: (,"{5}",,,)
max_workers: 1
timeout: 10s
command: SELECT plpgsql_entry(scheduled_time(), scheduled_command_oid())

Setting timeout to 0 means unlimited; -1 means the default statement_timeout.
Setting max_workers to 0 disables the task.
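The two special values can be illustrated with a small Python sketch. The ScheduledCommand class, its field names, and the use of seconds are assumptions made for illustration; the only outside fact relied on is that a statement_timeout of 0 disables the timeout in PostgreSQL.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ScheduledCommand:
    """Hypothetical in-memory image of one pg_scheduled_commands row."""
    name: str
    command: str
    max_workers: int = 1   # 0 disables the task
    timeout_s: int = -1    # 0 = unlimited, -1 = use statement_timeout


def is_enabled(cmd: ScheduledCommand) -> bool:
    """A task with max_workers = 0 is never started."""
    return cmd.max_workers > 0


def effective_timeout(cmd: ScheduledCommand, statement_timeout_s: int) -> Optional[int]:
    """Return the timeout in seconds, or None when the command may run without limit."""
    if cmd.timeout_s == 0:
        return None
    if cmd.timeout_s < 0:
        # Fall back to the session default; 0 there also means "no timeout".
        return statement_timeout_s if statement_timeout_s > 0 else None
    return cmd.timeout_s
```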

API
---
pg_create_scheduled_command(name,
stime,
command,
user default current_user,
max_workers default 1,
timeout default -1);

pg_drop_scheduled_command(oid);
pg_drop_scheduled_command(name);

pg_update_scheduled_command(oid | name, ...);

Usage
-----
pg_create_scheduled_command('delete obsolete data', '(,,"{1}",,)',
    $$DELETE FROM data WHERE inserted < current_timestamp - interval '1 month'$$);
pg_update_scheduled_command('delete obsolete data', max_workers => 2,
    timeout => '1h');
pg_drop_scheduled_command('delete obsolete data');

select * from pg_scheduled_commands;

Comments, notices?

Regards

Pavel

Dave Page

dpage@pgadmin.org

over 10 years ago

In reply to: Pavel Stehule (#1)

Re: proposal: contrib module - generic command scheduler

On Tue, May 12, 2015 at 10:25 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:

> [...]
> Comments, notices?

It's not integrated with the server (though it is integrated with
pgAdmin), but pgAgent provides scheduling services for PostgreSQL
already, offering multi-schedule, multi-step job execution.

http://www.pgadmin.org/docs/1.20/pgagent.html

--
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake

EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Pavel Stehule

pavel.stehule@gmail.com

over 10 years ago

In reply to: Dave Page (#2)

Re: proposal: contrib module - generic command scheduler

2015-05-12 10:45 GMT+02:00 Dave Page <dpage@pgadmin.org>:

> It's not integrated with the server (though it is integrated with
> pgAdmin), but pgAgent provides scheduling services for PostgreSQL
> already, offering multi-schedule, multi-step job execution.
>
> http://www.pgadmin.org/docs/1.20/pgagent.html

I know pgAgent. This proposal is about deeper integration with core: a
scheduler based on background workers, without any other dependency.

Regards

Pavel


hubert depesz lubaczewski

depesz@depesz.com

over 10 years ago

In reply to: Pavel Stehule (#1)

Re: proposal: contrib module - generic command scheduler

On Tue, May 12, 2015 at 09:25:50AM +0200, Pavel Stehule wrote:

> create type scheduled_time as (second int[], minute int[], hour int[], dow
> int[], month int[]);
> (,"{0,10,20,30,40,50}",,,) .. run every 10 minutes
> (,"{5}",,,) .. run once per hour
> Comments, notices?

First, please note that I'm definitely not a hacker, just a user.

One comment I'd like to make is that, since we're in the planning phase,
I think it would be great to add the ability to limit the number of
executions of a given command.
This would allow running things like "at" in Unix: run once, at a given
time, and that's it.

Best regards,

depesz

--
The best thing about modern society is how easy it is to avoid contact with it.
http://depesz.com/


Pavel Stehule

pavel.stehule@gmail.com

over 10 years ago

In reply to: hubert depesz lubaczewski (#4)

Re: proposal: contrib module - generic command scheduler

2015-05-12 11:27 GMT+02:00 hubert depesz lubaczewski <depesz@depesz.com>:

> On Tue, May 12, 2015 at 09:25:50AM +0200, Pavel Stehule wrote:
>
>> create type scheduled_time as (second int[], minute int[], hour int[], dow
>> int[], month int[]);
>> (,"{0,10,20,30,40,50}",,,) .. run every 10 minutes
>> (,"{5}",,,) .. run once per hour
>> Comments, notices?
>
> First, please note that I'm definitely not a hacker, just a user.
>
> One comment I'd like to make is that, since we're in the planning phase,
> I think it would be great to add the ability to limit the number of
> executions of a given command.
> This would allow running things like "at" in Unix: run once, at a given
> time, and that's it.

I would rather not store state at this level, so "at" should be implemented
at a higher level. There is a very large number of possible strategies for
handling failed tasks, and I would rather not open that topic. I believe
that with the proposed scheduler anybody can simply implement what they need
in PL/pgSQL with dynamic SQL. On the other hand, "run once" can be
implemented with the proposed API too.

pg_create_scheduled_command('delete obsolete data', '(,,"{1}","{1}",)',
$$DO $_$
BEGIN
  DELETE FROM data WHERE inserted < current_timestamp - interval '1month';
  PERFORM pg_update_scheduled_command(scheduled_command_oid(),
                                      max_workers => 0);
END $_$
$$);

Regards

Pavel


Craig Ringer

craig@2ndquadrant.com

over 10 years ago

In reply to: Pavel Stehule (#5)

Re: proposal: contrib module - generic command scheduler

On 13 May 2015 at 00:31, Pavel Stehule <pavel.stehule@gmail.com> wrote:

> I would rather not store state at this level, so "at" should be implemented
> at a higher level. There is a very large number of possible strategies for
> handling failed tasks, and I would rather not open that topic. I believe
> that with the proposed scheduler anybody can simply implement what they need
> in PL/pgSQL with dynamic SQL. On the other hand, "run once" can be
> implemented with the proposed API too.

That seems reasonable in a v1, so long as there's room to easily extend it
without pain to add "at"-like one-shot commands, at-startup commands, etc.

I'd prefer to see a scheduling interface that's a close match for cron's or
that leaves room for it - so things like "*/5" for every five minutes,
ranges like "Mon-Fri", etc. If there's a way to express similar
capabilities more cleanly using PostgreSQL's types and conventions that
makes sense, but I'm not sure a composite type of arrays fits that.
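To make the cron-style notation concrete, here is a minimal Python sketch that expands one cron-like field into the kind of integer list the proposed composite type stores. The function name, the DOW_NAMES table, and the 0 = Sunday numbering are assumptions for illustration; wrapping ranges such as "Fri-Mon" are deliberately rejected to keep the sketch short.

```python
from typing import Dict, List, Optional

# Assumed day numbering: 0 = Sunday .. 6 = Saturday.
DOW_NAMES = {"sun": 0, "mon": 1, "tue": 2, "wed": 3, "thu": 4, "fri": 5, "sat": 6}


def expand_field(spec: str, low: int, high: int,
                 names: Optional[Dict[str, int]] = None) -> List[int]:
    """Expand a cron-like field ("*/5", "Mon-Fri", "1,15-17") into sorted integers."""
    names = names or {}

    def to_int(token: str) -> int:
        token = token.strip().lower()
        return names[token] if token in names else int(token)

    values = set()
    for part in spec.split(","):
        part = part.strip()
        rng, _, step_text = part.partition("/")
        step = int(step_text) if step_text else 1
        if step < 1:
            raise ValueError(f"step must be positive: {part!r}")
        if rng == "*":
            start, end = low, high
        elif "-" in rng:
            first, last = rng.split("-", 1)
            start, end = to_int(first), to_int(last)
        else:
            start = to_int(rng)
            end = high if step_text else start  # "5/10" runs from 5 up to the maximum
        if not (low <= start <= end <= high):
            raise ValueError(f"field out of range: {part!r}")
        values.update(range(start, end + 1, step))
    return sorted(values)
```

For example, expand_field("*/5", 0, 59) yields the minutes 0, 5, ..., 55, and expand_field("Mon-Fri", 0, 6, DOW_NAMES) yields 1 through 5.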

How do you plan to manage the bgworkers?

In BDR, where we have a similar need to have workers across multiple
databases, and where each database contains a list of workers to launch, we
have:

* A single static "supervisor" bgworker. In 9.5 this will connect with
InvalidOid as the target database so it can only access shared catalogs. In
9.4 this isn't possible in the bgworker API so we have to connect to a
dummy database.

* A dynamic background worker for each database in which BDR is enabled,
which is launched from the supervisor. We check which DBs are BDR-enabled
by (ab)using database security labels and checking pg_shseclabel from the
supervisor worker so we only launch bgworkers on BDR-enabled DBs.

* A dynamic background worker for each peer node, launched by the
per-database worker based on the contents of that database's
bdr.bdr_connections table.

What I suspect you're going to want is:

* A static worker launched by your extension when it starts, which launches
per-db workers for each DB in which the scheduler is enabled. You could use
a GUC listing scheduler-enabled DBs in postgresql.conf and have an
on-reload hook to update it, you don't need to do the security label hack.

* A DB scheduler worker, which looks up the scheduled tasks list, finds the
next scheduled event, and sleeps on a long latch timeout until then,
resetting it when interrupted. When it reaches the scheduled event it would
launch a one-shot BGW_NO_RESTART worker to run the desired PL/PgSQL
procedure over the SPI.

* A task runner worker, which gets launched by the db scheduler to actually
run a task using the SPI.
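The "find the next scheduled event and sleep until then" step of the DB scheduler worker could look roughly like the Python sketch below. It is only a model of the control flow, not bgworker code: the schedule representation (a dict of allowed values per field), the minute resolution, and the brute-force search are all simplifying assumptions.

```python
from datetime import datetime, timedelta
from typing import Dict, Optional, Set, Tuple

# Hypothetical schedule: keys "minute", "hour", "dow", "month"; a missing key means "any".
Schedule = Dict[str, Set[int]]


def _matches(schedule: Schedule, ts: datetime) -> bool:
    fields = {"minute": ts.minute, "hour": ts.hour,
              "dow": ts.isoweekday() % 7, "month": ts.month}
    return all(value in schedule[key] for key, value in fields.items() if key in schedule)


def next_run(schedule: Schedule, after: datetime, horizon_days: int = 366) -> Optional[datetime]:
    """First whole minute strictly after `after` that is on the schedule, or None."""
    candidate = after.replace(second=0, microsecond=0) + timedelta(minutes=1)
    limit = after + timedelta(days=horizon_days)
    while candidate <= limit:
        if _matches(schedule, candidate):
            return candidate
        candidate += timedelta(minutes=1)
    return None


def next_event(tasks: Dict[str, Schedule], now: datetime) -> Optional[Tuple[str, datetime, float]]:
    """Return (task name, start time, seconds to sleep) for the earliest upcoming task."""
    upcoming = []
    for name, schedule in tasks.items():
        run = next_run(schedule, now)
        if run is not None:
            upcoming.append((run, name))
    if not upcoming:
        return None
    run, name = min(upcoming)
    return name, run, (run - now).total_seconds()
```

In a real worker the returned sleep time would become the latch timeout, and the computation would be repeated whenever the latch is set because the task list changed.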

Does that match your current thinking?

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Pavel Stehule

pavel.stehule@gmail.com

over 10 years ago

In reply to: Craig Ringer (#6)

Re: proposal: contrib module - generic command scheduler

2015-05-13 4:08 GMT+02:00 Craig Ringer <craig@2ndquadrant.com>:

> That seems reasonable in a v1, so long as there's room to easily extend it
> without pain to add "at"-like one-shot commands, at-startup commands, etc.
>
> I'd prefer to see a scheduling interface that's a close match for cron's
> or that leaves room for it - so things like "*/5" for every five minutes,
> ranges like "Mon-Fri", etc. If there's a way to express similar
> capabilities more cleanly using PostgreSQL's types and conventions that
> makes sense, but I'm not sure a composite type of arrays fits that.

I thought about it too, but the parser for cron time specifications would
probably be longer than all the other code. I see a possibility of writing
constructors that simplify creating a value of this type. Something like

make_scheduled_time(secs => '*/5', dows => 'Mon-Fri') or
make_scheduled_time(at =>'2015-014-05 10:00:0'::timestamp);

There are two possible ways: a composite with arrays or a custom composite.
I'll decide later.

These are the basic points:

1. It does not hold state or the results of commands.
2. It executes a task once, immediately, within the related time window
(from one start to the next start), when the necessary worker is available.
3. When a command fails, it only writes info to the log.
4. When a command runs too long (over the specified timeout), it is killed.
5. When a command waits for a free worker, this is written to the log.
6. When a command was not executed due to missing workers (and
max_workers > 0), this is written to the log.
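The dispatch rules in points 2, 5 and 6 can be sketched as one decision function in Python. Everything here is hypothetical: the Action names, the parameters, and the use of plain numbers (seconds) for time. The `done` flag is meant as in-memory bookkeeping for the current window only, in keeping with point 1.

```python
from enum import Enum


class Action(Enum):
    RUN = "run"            # start the command now
    WAIT = "wait"          # point 5: no free worker yet, write to log
    SKIP = "skip"          # point 6: window closed without a run, write to log
    DISABLED = "disabled"  # max_workers = 0
    IDLE = "idle"          # nothing to do for this window


def decide(now: float, window_start: float, window_end: float,
           running: int, max_workers: int, free_workers: int, done: bool) -> Action:
    """Decide what to do with one command in its current window.

    window_start is the scheduled start and window_end the next scheduled start.
    running counts this command's active workers; free_workers is the global pool.
    """
    if max_workers == 0:
        return Action.DISABLED
    if done or now < window_start:
        return Action.IDLE
    if now >= window_end:
        return Action.SKIP
    if running < max_workers and free_workers > 0:
        return Action.RUN
    return Action.WAIT
```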

> How do you plan to manage the bgworkers?

I am thinking about one static supervisor that holds a calendar in shared
memory and starts dynamic bgworkers for commands per database. The scheduler
is enabled in all databases where the proposed extension is installed.

For the prototype I am planning to use SPI, but maybe it is not necessary,
so that commands like VACUUM, CREATE DATABASE, and DROP DATABASE can be
supported too. But I haven't tested it and I don't know whether it is
possible. It can define new hooks too, so other extensions can be based on
it.


Jim Nasby

Jim.Nasby@BlueTreble.com

over 10 years ago

In reply to: Pavel Stehule (#7)

Re: proposal: contrib module - generic command scheduler

On 5/12/15 11:32 PM, Pavel Stehule wrote:

>>> I would rather not store state at this level, so "at" should be
>>> implemented at a higher level. There is a very large number of
>>> possible strategies for handling failed tasks, and I would rather
>>> not open that topic. I believe that with the proposed scheduler
>>> anybody can simply implement what they need in PL/pgSQL with dynamic
>>> SQL. On the other hand, "run once" can be implemented with the
>>> proposed API too.
>>
>> That seems reasonable in a v1, so long as there's room to easily
>> extend it without pain to add "at"-like one-shot commands,
>> at-startup commands, etc.

Yeah, being able to run things after certain system events would be nice.

>> I'd prefer to see a scheduling interface that's a close match for
>> cron's or that leaves room for it - so things like "*/5" for every
>> five minutes, ranges like "Mon-Fri", etc. If there's a way to
>> express similar capabilities more cleanly using PostgreSQL's types
>> and conventions that makes sense, but I'm not sure a composite type
>> of arrays fits that.

It seems unfortunate to go with cron's limited syntax when we have such
fully capable timestamp and interval capabilities already in the
database. :/

Is there anything worth stealing from pgAgent?

> I thought about it too, but the parser for cron time specifications would
> probably be longer than all the other code. I see a possibility of writing
> constructors that simplify creating a value of this type. Something like
>
> make_scheduled_time(secs => '*/5', dows => 'Mon-Fri') or
> make_scheduled_time(at =>'2015-014-05 10:00:0'::timestamp);

Wouldn't that be just as bad as writing the parser in the first place?

> 1. It does not hold state or the results of commands.
>
> ...
>
> 3. When a command fails, it only writes info to the log.

Unfortunate, but understandable in a first pass.

> 4. When a command runs too long (over the specified timeout), it is killed.

I think that needs to be optional.

> 5. When a command waits for a free worker, this is written to the log.
> 6. When a command was not executed due to missing workers (and
> max_workers > 0), this is written to the log.

Also unfortunate. We already don't provide enough monitoring capability
and this just makes that worse.

Perhaps it would be better to put something into PGXN first; this
doesn't really feel like it's baked enough for contrib yet. (And I say
that as someone who's really wanted this ability in the past...)
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


Pavel Stehule

pavel.stehule@gmail.com

over 10 years ago

In reply to: Jim Nasby (#8)

Re: proposal: contrib module - generic command scheduler

2015-05-13 7:50 GMT+02:00 Jim Nasby <Jim.Nasby@bluetreble.com>:

> It seems unfortunate to go with cron's limited syntax when we have such
> fully capable timestamp and interval capabilities already in the database.
> :/

That is another option; the MySQL event scheduler uses it. The usage is
trivial, but it is a little bit weak: it is hard to describe some asymmetric
events, like running on working days only. But if I use named parameters and
an auxiliary constructor function, I think it can be supported too.

make_scheduled_time(at => '2015-014-05 10:00:0', repeat => '1day',
stop_after => '...')
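A minimal Python sketch of how an interval-style schedule like this could enumerate its run times. The function name is hypothetical, and so is the reading of stop_after as a timestamp after which no further runs happen; the message leaves its value elided, so it could equally be meant as a count or an interval.

```python
from datetime import datetime, timedelta
from typing import Iterator, Optional


def occurrences(at: datetime,
                repeat: Optional[timedelta] = None,
                stop_after: Optional[datetime] = None) -> Iterator[datetime]:
    """Yield run times: `at`, then every `repeat`, up to and including `stop_after`.

    Without `repeat` this is a one-shot schedule; without `stop_after` it never ends.
    """
    if repeat is not None and repeat <= timedelta(0):
        raise ValueError("repeat must be a positive interval")
    current = at
    while stop_after is None or current <= stop_after:
        yield current
        if repeat is None:
            return
        current += repeat
```

Called with only `at`, the generator yields a single timestamp, which is the "at"-like one-shot case raised earlier in the thread.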

> Is there anything worth stealing from pgAgent?

Surely not, though I have slightly different goals: pgAgent is a top-end
scheduler, a little bit complex because it supports jobs and steps. My
target is the implementation of a low-end scheduler. pgAgent and others can
be implemented as a next layer. It should be strong enough for some simple
admin tasks, and strong enough to be a base for implementing complex
scheduler and workflow systems, but it should be as simple as possible. At
this moment PL/pgSQL is strong enough for implementing a very complex
workflow system, but the low-end scheduling functionality is missing.

>> I thought about it too, but the parser for cron time specifications would
>> probably be longer than all the other code. I see a possibility of writing
>> constructors that simplify creating a value of this type. Something like
>>
>> make_scheduled_time(secs => '*/5', dows => 'Mon-Fri') or
>> make_scheduled_time(at =>'2015-014-05 10:00:0'::timestamp);
>
> Wouldn't that be just as bad as writing the parser in the first place?

Yes. I am thinking about a special type whose input function is empty, so
that a value has to be created with a constructor function. That can
simplify the parser a lot.

>> 1. It does not hold state or the results of commands.
>
> ...
>
>> 3. When a command fails, it only writes info to the log.
>
> Unfortunate, but understandable in a first pass.
>
>> 4. When a command runs too long (over the specified timeout), it is killed.
>
> I think that needs to be optional.

You can specify a timeout for any command, so if you specify timeout 0, it
will run without a timeout.

>> 5. When a command waits for a free worker, this is written to the log.
>> 6. When a command was not executed due to missing workers (and
>> max_workers > 0), this is written to the log.
>
> Also unfortunate. We already don't provide enough monitoring capability
> and this just makes that worse.

Theoretically, some pg_stat_ view could be supported, but I would rather not
implement a history table for commands. Again, that is a task for higher
layers.

> Perhaps it would be better to put something into PGXN first; this doesn't
> really feel like it's baked enough for contrib yet. (And I say that as
> someone who's really wanted this ability in the past...)

It is plan B. I think PostgreSQL is missing a low-end scheduler, so I am
asking here. Some features can be implemented later, some features can be
implemented elsewhere. I have to specify the limits and borders: what a
simple scheduler is, and what it is not. A full-functionality scheduler is a
relatively heavy application, so it should not be a contrib module. But a
simple generic scheduler can be good enough for 50% of cases and, with some
simple plpgsql code, for another 40%.

Regards

Pavel


#10

Jim Nasby

Jim.Nasby@BlueTreble.com

over 10 years ago

In reply to: Pavel Stehule (#9)

Re: proposal: contrib module - generic command scheduler

On 5/13/15 1:32 AM, Pavel Stehule wrote:

>>> 5. When a command waits for a free worker, this is written to the log.
>>> 6. When a command was not executed due to missing workers (and
>>> max_workers > 0), this is written to the log.
>>
>> Also unfortunate. We already don't provide enough monitoring
>> capability and this just makes that worse.
>
> Theoretically, some pg_stat_ view could be supported, but I would rather
> not implement a history table for commands. Again, that is a task for
> higher layers.

I don't think we want to log statements, but we should be able to log
when a job has run and whether it succeeded or not. (log in a table, not
just a logfile).

This isn't something that can be done at higher layers either; only the
scheduler will know if the job failed to even start, or whether it tried
to run the job.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


#11

Pavel Stehule

pavel.stehule@gmail.com

over 10 years ago

In reply to: Jim Nasby (#10)

Re: proposal: contrib module - generic command scheduler

2015-05-14 8:01 GMT+02:00 Jim Nasby <Jim.Nasby@bluetreble.com>:

> I don't think we want to log statements, but we should be able to log when
> a job has run and whether it succeeded or not. (log in a table, not just a
> logfile).
>
> This isn't something that can be done at higher layers either; only the
> scheduler will know if the job failed to even start, or whether it tried to
> run the job.

I don't agree. A generic scheduler can run your procedure, and there you can
log the start, run other commands, and log the result (nowadays it is no
problem to catch any nonfatal exception). Personally I am afraid of the
responsibility of maintaining this log table: when and by whom it should be
cleaned, who can see the results, ... This is a job for a top-end scheduler.
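The kind of wrapper procedure described here, one that logs the start, runs the real work in a protected block, and logs the result, can be sketched in Python as follows. The names are hypothetical and the `log` list merely stands in for a user-maintained log table; in PL/pgSQL the same shape would use a BEGIN ... EXCEPTION block.

```python
from typing import Callable, List, Tuple

LogEntry = Tuple[str, str]  # (job name, event)


def run_logged(job_name: str, job: Callable[[], None], log: List[LogEntry]) -> bool:
    """Run `job`, recording its start and outcome; return True on success."""
    log.append((job_name, "started"))
    try:
        job()
    except Exception as exc:  # stands in for catching a nonfatal exception
        log.append((job_name, f"failed: {exc}"))
        return False
    log.append((job_name, "succeeded"))
    return True
```

Note that a wrapper like this only produces log entries once it has been started; a job that the scheduler never launches leaves no trace in it.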

Regards

Pavel


#12

Jim Nasby

Jim.Nasby@BlueTreble.com

over 10 years ago

In reply to: Pavel Stehule (#11)

Re: proposal: contrib module - generic command scheduler

On 5/14/15 1:36 AM, Pavel Stehule wrote:

>> I don't think we want to log statements, but we should be able to
>> log when a job has run and whether it succeeded or not. (log in a
>> table, not just a logfile).
>>
>> This isn't something that can be done at higher layers either; only
>> the scheduler will know if the job failed to even start, or whether
>> it tried to run the job.
>
> I don't agree. A generic scheduler can run your procedure, and there you
> can log the start, run other commands, and log the result (nowadays it is
> no problem to catch any nonfatal exception).

And what happens when the job fails to even start? You get no logging.

> Personally I am afraid of the responsibility of maintaining this log
> table: when and by whom it should be cleaned, who can see the results, ...
> This is a job for a top-end scheduler.

Only if the top-end scheduler has callbacks for every time the bottom-end
scheduler tries to start a job. Otherwise, the top has no clue what the
bottom has actually attempted.

To be clear, I don't think these need to be done in a first pass. I am
concerned about not painting ourselves into a corner though.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#13

Pavel Stehule

pavel.stehule@gmail.com

over 10 years ago

In reply to: Jim Nasby (#12)

Re: proposal: contrib module - generic command scheduler

2015-05-14 19:12 GMT+02:00 Jim Nasby <Jim.Nasby@bluetreble.com>:

On 5/14/15 1:36 AM, Pavel Stehule wrote:

I don't think we want to log statements, but we should be able to
log when a job has run and whether it succeeded or not. (log in a
table, not just a logfile).

This isn't something that can be done at higher layers either; only
the scheduler will know if the job failed to even start, or whether
it tried to run the job.

I don't agree. A generic scheduler can run your procedure, and there you
can log the start, run other commands, and log the result (there is now no
problem catching any nonfatal exception in production).

And what happens when the job fails to even start? You get no logging.

There is only one such case: when the job is not started because no worker
is available. In every other case the top-end executor is started, and it
can run in a protected block.

Personally, I am afraid of the responsibility of maintaining this log
table: when and by whom it should be cleaned, who can see the results, ...
This is a job for a top-end scheduler.

Only if the top-end scheduler has callbacks for every time the bottom-end
scheduler tries to start a job. Otherwise, the top has no clue what the
bottom has actually attempted.

Sure.

To be clear, I don't think these need to be done in a first pass. I am
concerned about not painting ourselves into a corner though.

I understand.

Regards

Pavel


#14

Alvaro Herrera

alvherre@2ndquadrant.com

over 10 years ago

In reply to: Pavel Stehule (#1)

Re: proposal: contrib module - generic command scheduler

Pavel Stehule wrote:

Hi,

Job schedulers are important and sometimes very complex part of any
software. PostgreSQL miss it. I propose new contrib module, that can be
used simply for some tasks, and that can be used as base for other more
richer schedulers. I prefer minimalist design - but strong enough for
enhancing when it is necessary. Some complex logic can be implemented in PL
better than in C. Motto: Simply to learn, simply to use, simply to
customize.

Have you made any progress on this?

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


#15

Pavel Stehule

pavel.stehule@gmail.com

over 10 years ago

In reply to: Alvaro Herrera (#14)

Re: proposal: contrib module - generic command scheduler

2015-08-20 16:42 GMT+02:00 Alvaro Herrera <alvherre@2ndquadrant.com>:

Pavel Stehule wrote:

Hi,

Job schedulers are important and sometimes very complex part of any
software. PostgreSQL miss it. I propose new contrib module, that can be
used simply for some tasks, and that can be used as base for other more
richer schedulers. I prefer minimalist design - but strong enough for
enhancing when it is necessary. Some complex logic can be implemented in PL
better than in C. Motto: Simply to learn, simply to use, simply to
customize.

Have you made any progress on this?

I am working on a second-iteration prototype, or rather I was working on it
a month ago. I finished the basic design and the basic infrastructure. The
architecture is based on one coordinator and dynamically started workers.

I found that some fairness policy should probably be implemented in the
future.

I have to finish other requests now, so I plan to continue at the end of
autumn, but the sources are public:

https://github.com/okbob/generic-scheduler

I am not sure about the code quality, as I have not had time to debug it.
But the mental model (UI) is almost complete:

https://github.com/okbob/generic-scheduler/blob/master/schedulerx--1.0.sql

I find it an interesting idea to handle not only time events but our
notifications too. It could be a perfect base for building complex workflow
systems. But I have done zero work on this topic.
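
For reference, a short sketch of the existing notification mechanism such a
scheduler could react to. The channel name job_events and the payload
import_done are hypothetical; how the scheduler would subscribe to channels
is not designed yet, so only the standard LISTEN/NOTIFY commands are shown.

```sql
-- A session (or, hypothetically, the scheduler's coordinator)
-- subscribes to a channel.
LISTEN job_events;

-- Any other session can raise an event on that channel; the
-- notification is delivered when the sending transaction commits.
NOTIFY job_events, 'import_done';

-- Equivalent function form, usable from PL/pgSQL or triggers.
SELECT pg_notify('job_events', 'import_done');
```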

Regards

Pavel
