using a postgres table as a multi-writer multi-updater queue
Hi All,
I wondered if any of you could recommend best practices for using a
postgres table as a queue. Roughly speaking, 100-200 workers will vomit
rows at rates of a few hundred per second into the table, leaving the
status as new, and then as many workers as needed to keep up with the
load will plough through the queue, changing the status to something
other than new.
My naive implementation would be something along the lines of:
CREATE TABLE event (
    ts timestamp,
    event char(40),
    status char(10),
    CONSTRAINT pkey PRIMARY KEY(ts, event)
);
...with writers doing INSERT or COPY to get data into the table and
readers doing something like:
SELECT * FROM event WHERE status='new' LIMIT 1000 FOR UPDATE;
...so, grabbing batches of 1,000, working on them and then setting their
status.
But, am I correct in thinking that SELECT FOR UPDATE will not prevent
multiple workers selecting the same rows?
Anyway, is this approach reasonable? If so, what tweaks/optimisations
should I be looking to make?
If it's totally wrong, how should I be looking to approach the problem?
cheers,
Chris
Hello.
On 23.11.2015 11:41, Chris Withers wrote:
Hi All,
I wondered if any of you could recommend best practices for using a postgres
table as a queue. Roughly speaking, 100-200 workers will vomit rows at rates of
a few hundred per second into the table, leaving the status as new, and then as
many workers as needed to keep up with the load will plough through the queue,
changing the status to something other than new.

My naive implementation would be something along the lines of:

CREATE TABLE event (
    ts timestamp,
    event char(40),
    status char(10),
    CONSTRAINT pkey PRIMARY KEY(ts, event)
);

...with writers doing INSERT or COPY to get data into the table and readers
doing something like:

SELECT * FROM event WHERE status='new' LIMIT 1000 FOR UPDATE;

...so, grabbing batches of 1,000, working on them and then setting their status.

But, am I correct in thinking that SELECT FOR UPDATE will not prevent multiple
workers selecting the same rows?

Anyway, is this approach reasonable? If so, what tweaks/optimisations should I
be looking to make?

If it's totally wrong, how should I be looking to approach the problem?
I suggest an excellent read on this topic:
http://www.depesz.com/2013/08/30/pick-a-task-to-work-on/
Highly recommended if you haven't read it yet.
Also, if you are aiming for 9.5 (not released yet), it will introduce:
SELECT...
FOR UPDATE
SKIP LOCKED -- this is new
which supports exactly this use-case (i.e. to implement a job queue).
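In terms of the schema from the original post, a consumer query using that feature might look like this (a sketch, not a tested recipe; note the locking clause goes after LIMIT):

```sql
SELECT * FROM event
WHERE status = 'new'
LIMIT 1000
FOR UPDATE SKIP LOCKED;  -- rows locked by other workers are skipped, not waited on
```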
HTH,
Ladislav Lenart
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
On 11/23/15 6:12 AM, Ladislav Lenart wrote:
I suggest an excellent read on this topic:
http://www.depesz.com/2013/08/30/pick-a-task-to-work-on/
Highly recommended if you haven't read it yet.
One thing it doesn't mention that you need to be aware of is the vacuum
workload on a queue table. In a busy queue, it will be difficult or even
impossible for vacuum to keep the amount of dead rows down to something
manageable. That's why PgQ and Slony don't even attempt it; instead,
they rotate through a fixed set of tables. Once all the entries in a
table have been processed, the table is truncated.
If you go the delete route, make sure you don't index any fields in the
queue that get updated (otherwise you won't get HOT updates), and run a
very aggressive manual vacuum so the table stays small.
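As a rough illustration of "aggressive" vacuuming, autovacuum can also be tightened per table with storage parameters; the values below are placeholders to illustrate the knobs, not recommendations:

```sql
ALTER TABLE event SET (
    autovacuum_vacuum_scale_factor = 0.0,  -- don't scale the trigger with table size
    autovacuum_vacuum_threshold    = 1000  -- vacuum after roughly 1000 dead rows
);
```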
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
On 11/23/2015 4:41 AM, Chris Withers wrote:
Hi All,
I wondered if any of you could recommend best practices for using a
postgres table as a queue. Roughly speaking, 100-200 workers will vomit
rows and rates of a few hundres per second into the table leaving the
status as new and then as many workers as needed to keep up with the
load will plough through the queue changing the status to something
other than new.My naive implementation would be something along the lines of:
CREATE TABLE event (
ts timestamp,
event char(40),
status char(10),
CONSTRAINT pkey PRIMARY KEY(ts, event)
);...with writers doing INSERT or COPY to get data into the table and
readers doing something like:SELECT FOR UPDATE * FROM event WHERE status='new' LIMIT 1000;
...so, grabbing batches of 1,000, working on them and then setting their
status.But, am I correct in thinking that SELECT FOR UPDATE will not prevent
multiple workers selecting the same rows?Anyway, is this approach reasonable? If so, what tweaks/optimisations
should I be looking to make?If it's totally wrong, how should I be looking to approach the problem?
cheers,
Chris
Have you tried Redis? It's really good at that sort of thing.
-Andy
On 11/23/2015 2:41 AM, Chris Withers wrote:
If it's totally wrong, how should I be looking to approach the problem?
depending on where these queue entries are coming from, I'd consider
using a message queueing system like AMS, MQseries, etc, rather than
trying to use a relational database table as a queue. your external data
source(s) would write messages to this queue, and you'd have
'subscriber' processes that listen to the queue and process the
messages, inserting persistent data into the database as needed.
--
john r pierce, recycling bits in santa cruz
On Mon, Nov 23, 2015 at 12:54 PM, John R Pierce <pierce@hogranch.com> wrote:
On 11/23/2015 2:41 AM, Chris Withers wrote:
If it's totally wrong, how should I be looking to approach the problem?
depending on where these queue entries are coming from, I'd consider
using a message queueing system like AMS, MQseries, etc, rather than trying
to use a relational database table as a queue. your external data source(s)
would write messages to this queue, and you'd have 'subscriber' processes
that listen to the queue and process the messages, inserting persistent data
into the database as needed.
I just don't agree with this generalization. Keeping the state of the
queue in the database has a lot of advantages and is a lot easier to
deal with from a programming perspective especially if SQL is your
core competency. Being able to produce and consume in SQL based on
other relational datasources is...elegant.
Specialized queue systems are a very heavy dependency and adding a new
server to your platform to manage queues is not something to take
lightly. This advice also applies to scheduling systems like quartz,
specialized search like solr and elasticsearch, and distributed data
platforms like hadoop. I've used all of these things and have tended
to wish I had just used the database instead in just about every case.
Also, check out pgq (https://wiki.postgresql.org/wiki/PGQ_Tutorial).
Personally, I tend to roll my own queues. It's not difficult.
merlin
On 11/23/2015 2:51 PM, Merlin Moncure wrote:
On Mon, Nov 23, 2015 at 12:54 PM, John R Pierce<pierce@hogranch.com> wrote:
depending on where these queue entries are coming from, I'd consider
using a message queueing system like AMS, MQseries, etc, rather than trying
to use a relational database table as a queue. your external data source(s)
would write messages to this queue, and you'd have 'subscriber' processes
that listen to the queue and process the messages, inserting persistent data
into the database as needed.

I just don't agree with this generalization. Keeping the state of the
queue in the database has a lot of advantages and is a lot easier to
deal with from a programming perspective especially if SQL is your
core competency. Being able to produce and consume in SQL based on
other relational datasources is...elegant.
our whole system at $job is message based as it's a distributed
system. clients send messages to middleware servers that talk to the
database servers. no one talks directly to the database, instead they
use messaging. also the several databases in our core cluster talk to
each other with messaging, where the front end database publishes events
that the other reporting database servers subscribe to. it's a very
powerful model for building complex distributed systems and maintaining
quite a lot of implementation flexibility, as the exact nature of the
schema only needs to be known by a few publisher and subscriber modules.
--
john r pierce, recycling bits in santa cruz
I'll add my two cents.....
I set up something similar a while ago. Here are my suggestions for what
they are worth.
You don't ever want to delete from such a table so you need to set up
something which allows you to truncate the tables when you no longer need
them.
One way to accomplish this is with rules (not triggers; rules are blazingly
fast compared to triggers). Set up a table inheritance scheme based on
whatever search criteria you have (date, sequence etc). Set up a cron job
to create the child tables well ahead and to set up the rule redirecting
the insert. For example let's say you have a date partitioned table and you
want to keep a table for every day. Your cron job would run once a day
and would create the next seven days worth of tables (just in case the cron
job fails to run for some reason) and would rewrite the rule to insert into
the table with if/then/else type logic. This method is preferable to
the dynamic creation of the table name with string concatenation because
again it's significantly faster.

Another method I tried was to have one "primary" child table and "archival"
child tables and insert directly into the primary child table. For example
say you have a table called "Q". You set up a table called Q_in which
inherits from Q. Your code inserts into the Q_in table, you select from
the Q table. On a periodic basis you do this:
BEGIN TRANSACTION;
LOCK TABLE Q_IN IN EXCLUSIVE MODE;
ALTER TABLE Q_IN RENAME TO Q_SOME_DATETIME;
CREATE TABLE Q_IN
    (LIKE Q_SOME_DATETIME INCLUDING DEFAULTS INCLUDING CONSTRAINTS INCLUDING INDEXES);
ALTER TABLE Q_IN INHERIT Q;  -- LIKE does not copy inheritance, so re-attach the new table
ALTER SEQUENCE q_in_id_seq OWNED BY q_in.id;
-- Set some constraints so the query optimizer knows what to do
END TRANSACTION;
There is one other method, which is the Kafka approach. (You can use this
in addition to the above methods.)

Create N tables for the incoming queue, each one with a sequence for its ID
number. N should be determined by how many clients you expect to run.
Create a rule which uses a hash function, round robin, or random choice to
insert data into one of the tables. Create a different table which keeps
track of client connections. The clients use this table to keep track of
the last id fetched. For example let's say I have three types of
processes that run on the incoming data: p1, p2, p3. The table logs the
highest ID fetched from each table for each process. When a client connects,
it connects to the table with the lowest used ID for that process, and it
locks it for that client (not process, because you can have multiple clients
running each process), it processes the records, it updates the id, it
unlocks the table and it backs off for a few seconds. The next client
which woke up goes through the same process and so on. Both Apache Kafka
and Amazon Kinesis use this approach. One nice thing about this approach
is that you can put each table in its own tablespace on its own disk for
higher performance.
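A hedged sketch of the N-table routing idea with N=2, using conditional rules (all table and column names here are hypothetical; hashtext() is PostgreSQL's built-in text hash):

```sql
CREATE TABLE q (id bigserial, payload text);
CREATE TABLE q_0 (LIKE q INCLUDING ALL);
CREATE TABLE q_1 (LIKE q INCLUDING ALL);

-- Route each insert on q to one of the underlying tables by hash of payload.
CREATE RULE q_route_0 AS ON INSERT TO q
    WHERE abs(hashtext(NEW.payload)) % 2 = 0
    DO INSTEAD INSERT INTO q_0 (payload) VALUES (NEW.payload);
CREATE RULE q_route_1 AS ON INSERT TO q
    WHERE abs(hashtext(NEW.payload)) % 2 = 1
    DO INSTEAD INSERT INTO q_1 (payload) VALUES (NEW.payload);
```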
One other thing. This is dangerous, but you can turn off logging for the
tables (i.e. make them UNLOGGED); this will make the inserts much faster.
Finally:
There is no need to do any of this. Kinesis is cheap, Kafka is pretty
awesome, Rabbit is crazy useful.
Hope this helps, ping me offline if you want more details.
Cheers.
On Tue, Nov 24, 2015 at 11:51 AM, Merlin Moncure <mmoncure@gmail.com> wrote:
On Mon, Nov 23, 2015 at 2:41 AM, Chris Withers <chris@simplistix.co.uk> wrote:
Hi All,
I wondered if any of you could recommend best practices for using a postgres
table as a queue. Roughly speaking, 100-200 workers will vomit rows at
rates of a few hundred per second into the table, leaving the status as new,
and then as many workers as needed to keep up with the load will plough
through the queue, changing the status to something other than new.
Is that several hundred per second per worker, or just several hundred
per second in total?
What happens if the worker finds the database has crashed when it goes
to insert the records? That would determine what kind of
transactional system you need. On the consumer side, what would
happen if a record were processed twice?
My naive implementation would be something along the lines of:
CREATE TABLE event (
    ts timestamp,
    event char(40),
    status char(10),
    CONSTRAINT pkey PRIMARY KEY(ts, event)
);
How long are you going to keep these records around once processed?
Unless you delete them immediately, you will probably want a partial
index on (ts) where status='new'
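In terms of the schema from the original post, that partial index would be something like (index name is made up):

```sql
CREATE INDEX event_new_idx ON event (ts) WHERE status = 'new';
```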
...with writers doing INSERT or COPY to get data into the table and readers
doing something like:

SELECT * FROM event WHERE status='new' LIMIT 1000 FOR UPDATE;
...so, grabbing batches of 1,000, working on them and then setting their
status.
Why 1000 at a time? Also, you probably need an ORDER BY
But, am I correct in thinking that SELECT FOR UPDATE will not prevent
multiple workers selecting the same rows?
You are correct, but the way it will prevent multiple workers from
selecting them at the same time is that the next worker will block
until the first one commits. You would either need to use SKIP
LOCKED in the 9.5 release, or use pg_try_advisory_xact_lock on lower
versions, to avoid that.
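On pre-9.5 versions, the advisory-lock variant might look roughly like this, a sketch using the original post's schema (there is no integer id, so the event text is hashed to get a lock key; hash collisions and planner evaluation-order caveats apply):

```sql
-- Rows whose advisory lock is already held by another worker are filtered
-- out; the locks are released automatically at transaction end.
SELECT *
FROM event
WHERE status = 'new'
  AND pg_try_advisory_xact_lock(hashtext(event))
LIMIT 1000;
```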
And, how do you unlock the rows? There are two general approaches.
One is to lock the row using PostgreSQL's FOR UPDATE type locks, and
hold the transaction open while processing, then updating the row to
mark it done and committing to release the lock. This cleans up after
itself in the case a worker crashes, but there is no visibility into
what is going on.
The other is to claim the row for the worker by updating a status
field (for example, to have a hostname and pid), and committing that.
And then doing the processing, then updating it again to set it as
done, and committing that. An abnormally terminated worker will need to
have someone or something clean up after it, but it gives you much
better visibility into what is happening.
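A sketch of that claim-by-update pattern, assuming a worker_id column has been added to the original schema (the column and the 'host1:1234' value are hypothetical):

```sql
-- Claim a batch and commit immediately, so the claim is visible to everyone.
UPDATE event
SET status = 'claimed', worker_id = 'host1:1234'
WHERE ctid IN (SELECT ctid FROM event WHERE status = 'new' LIMIT 1000);

-- ...process the claimed rows, then mark them done in a second transaction...
UPDATE event
SET status = 'done'
WHERE status = 'claimed' AND worker_id = 'host1:1234';
```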
Anyway, is this approach reasonable? If so, what tweaks/optimisations should
I be looking to make?
The best optimization would be to not do it at all. Why can't the 200
inserting workers just do the work themselves immediately (either
client side or server side), instead of queuing it? If there is any
possibility of making the processing fast enough to do it that way,
I'd surely spend my time optimizing the actual work, rather than
optimizing a queuing system.
And depending on what failure modes you can tolerate, consider a
best-effort dedicated queuing system rather than a perfectly ACID one
built on PostgreSQL.
Cheers,
Jeff
(Kindly forgive my top-post. My stupid email client software (Microsoft Outlook Express on Win XP) refuses to respect its own "Reply" option settings for inline reply text. I've been looking for a replacement email client but so far without success.)
* * *
* * *
You don't ever want to delete from such a table so you need to set up something which allows you to truncate the tables when you no longer need them.
I am migrating a web PHP application (called ITS-ETO) from mysql to postgres. The app INSERTs a row into a postgres table <eto_sql_tb_session_www> to manage each session with a web browser. Here is the DDL for the session table:
CREATE TABLE its_eto.eto_sql_tb_session_www
(
session_www_code char(32) NOT NULL UNIQUE PRIMARY KEY,
session_www_type int NOT NULL,
session_www_state int NOT NULL,
session_verify_code char(7) NOT NULL,
session_www_serno SERIAL NOT NULL UNIQUE,
session_target_serno int NULL,
session_target_data varchar(1000) NULL,
session_www_init_utc timestamp NOT NULL,
session_www_last_utc timestamp NOT NULL,
session_www_expiry_utc timestamp NOT NULL,
session_www_delete_utc timestamp NOT NULL,
session_www_hit_count int NOT NULL,
session_www_act_seqno int NULL
);
CREATE INDEX ON its_eto.eto_sql_tb_session_www (session_www_serno);
Using a "fuzzy" probability mechanism, some randomly-selected fraction of the HTTP requests that initiate a new session, also SELECT and DELETE expired rows from the session table. I naively assumed that the database server would automatically recycle the storage space dynamically released in this way.
Now, I'm reading in this forum that in fact, postgres does not efficiently automatically recycle storage space released by row DELETion.
* * *
* * *
My application is quite simple and will be supporting a modest workload, using a small amount of storage space, compared to the massive transaction rates and gigantic space usages, I'm reading about in this forum.
I do have the luxury of being able to shut down the application for a few minutes periodically e.g every 24 hours.
My plan was always to avoid eventual exhaustion of the SERIAL sequence number integer value series by swapping in, during the periodic app shutdown, a freshly truncated postgres <eto_sql_tb_session_www> table.
Before going online, the freshly truncated postgres <eto_sql_tb_session_www> table will receive INSERTs of any active session rows copied over from the old <eto_sql_tb_session_www> table. The copied session rows will get new sequence numbers, but that won't matter, because a session row is referenced within each incoming HTTP request, not by its row serial number column <session_www_serno> integer value, but by a randomly-generated (MD5) 32-character unique key column <session_www_code> value.
So my question to this postgres forum is -- should I just remove from the online app the "fuzzy" probability mechanism, that DELETEs expired rows from the session table -- because the postgres server is not going to dynamically recycle the released storage space anyway?
Any comments appreciated.
Steve
On Wed, Nov 25, 2015 at 8:39 PM, Steve Petrie, P.Eng.
<apetrie@aspetrie.net> wrote:
You don't ever want to delete from such a table so you need to set up
something which allows you to truncate the tables when you no longer need
them.

I am migrating a web PHP application (called ITS-ETO) from mysql to
postgres. The app INSERTs a row into a postgres table
<eto_sql_tb_session_www> to manage each session with a web browser. Here is
the DDL for the session table:

CREATE TABLE its_eto.eto_sql_tb_session_www
(
session_www_code char(32) NOT NULL UNIQUE PRIMARY KEY,
session_www_type int NOT NULL,
session_www_state int NOT NULL,
session_verify_code char(7) NOT NULL,
session_www_serno SERIAL NOT NULL UNIQUE,
session_target_serno int NULL,
session_target_data varchar(1000) NULL,
session_www_init_utc timestamp NOT NULL,
session_www_last_utc timestamp NOT NULL,
session_www_expiry_utc timestamp NOT NULL,
session_www_delete_utc timestamp NOT NULL,
session_www_hit_count int NOT NULL,
session_www_act_seqno int NULL
);

CREATE INDEX ON its_eto.eto_sql_tb_session_www (session_www_serno);

Using a "fuzzy" probability mechanism, some randomly-selected fraction of
the HTTP requests that initiate a new session, also SELECT and DELETE
expired rows from the session table. I naively assumed that the database
server would automatically recycle the storage space dynamically released in
this way.

Now, I'm reading in this forum that in fact, postgres does not efficiently
automatically recycle storage space released by row DELETion.
My application is quite simple and will be supporting a modest workload,
using a small amount of storage space, compared to the massive transaction
rates and gigantic space usages, I'm reading about in this forum.
Truncation is far more efficient than deletion + vacuuming. If you
are running on the edge of your hardware's capabilities, this
efficiency is important. But if you are not on the edge, then it is
not worth worrying about. Just make sure your autovacuum settings are
at least as aggressive as the default settings.
I do have the luxury of being able to shut down the application for a few
minutes periodically, e.g. every 24 hours.

My plan was always to avoid eventual exhaustion of the SERIAL sequence
number integer value series by swapping in, during the periodic app
shutdown, a freshly truncated postgres <eto_sql_tb_session_www> table.
I'd make the serial column a bigserial, and then forget about it.
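For the existing table, that change might be sketched as follows (the sequence behind a serial column is already 64-bit internally, so only the column type needs widening; this rewrites the table):

```sql
ALTER TABLE its_eto.eto_sql_tb_session_www
    ALTER COLUMN session_www_serno TYPE bigint;
```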
Cheers,
Jeff
On Wed, 25 Nov 2015 23:39:39 -0500, "Steve Petrie, P.Eng."
<apetrie@aspetrie.net> wrote:
Now, I'm reading in this forum that in fact, postgres does not efficiently
automatically recycle storage space released by row DELETion.
Yes and no.
Space resulting from deleted rows is not given back to the OS unless
you perform a full vacuum. Autovacuum marks the space occupied by dead
rows as reusable, so new insertions can fill it.
If the table is heavily used, you will reuse the free space quickly
anyway. The problem you may run into is needing to autovacuum too
often to control the size of the table.
Space from truncated or dropped tables is immediately given back to
the OS.
The issue with heavy deletion usually is performance. Deleting scans
the rows and fires any relevant triggers ... truncating or dropping
the table does not.
My plan was always, to avoid eventual exhaustion of the SERIAL
sequence number integer value series, by swapping in during the
periodic app shutdown, a freshly truncated postgres
<eto_sql_tb_session_www> table.
Is there a logical problem with letting the sequence wrap around?
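If wraparound is acceptable, the sequence can be told to cycle back to its minimum instead of erroring at its maximum. A sketch, guessing at the sequence name postgres would generate for the posted DDL:

```sql
ALTER SEQUENCE its_eto.eto_sql_tb_session_www_session_www_serno_seq CYCLE;
```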
So my question to this postgres forum is -- should I just remove from
the online app the "fuzzy" probability mechanism, that DELETEs expired
rows from the session table -- because the postgres server is not going
to dynamically recycle the released storage space anyway?
I'm not sure I understand the reason for "fuzzy" deletion. There are
a number of timestamps in your data ... is it not possible to delete
deterministically based on one of them?
Hope this helps,
George
On Wed, 25 Nov 2015 23:39:39 -0500, "Steve Petrie, P.Eng."
<apetrie@aspetrie.net> wrote:
My stupid email client software (Microsoft Outlook Express on Win XP)
refuses to respect its own "Reply" option settings for inline reply text.
I've been looking for a replacement email client but so far without success.)
Without further comment about a 15 year old, unsupported OS ...
My vote for an email client would be Thunderbird. It runs on XP or
higher and you can import Outlook's PST files so as to keep your mail
archives. Importing PST files directly requires Outlook be available
on the same system [there is also a less friendly way to do it via EML
files exported from Outlook where Outlook is not on the same system].
Thunderbird directly supports net news, so you don't have to get news
mixed with your mail (unless you want to).
Hope this helps,
George
Thanks to Jeff for the helpful response. My remarks are below.
----- Original Message -----
From: "Jeff Janes" <jeff.janes@gmail.com>
To: "Steve Petrie, P.Eng." <apetrie@aspetrie.net>
Cc: "Tim Uckun" <timuckun@gmail.com>; "Merlin Moncure" <mmoncure@gmail.com>;
"John R Pierce" <pierce@hogranch.com>; "PostgreSQL General"
<pgsql-general@postgresql.org>
Sent: Thursday, November 26, 2015 2:07 AM
Subject: Re: [GENERAL] using a postgres table as a multi-writer
multi-updater queue
On Wed, Nov 25, 2015 at 8:39 PM, Steve Petrie, P.Eng.
<apetrie@aspetrie.net> wrote:

You don't ever want to delete from such a table so you need to set up
something which allows you to truncate the tables when you no longer need
them.
...
Truncation is far more efficient than deletion + vacuuming. If you
are running on the edge of your hardware's capabilities, this
efficiency is important. But if you are not on the edge, then it is
not worth worrying about.
This is what I was hoping to learn -- because I doubt my app workload will
ever approach the edge of hardware capability.
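For reference, the efficiency difference Jeff describes can be sketched in SQL (table and column names follow the earlier example in this thread; this is an illustration, not the app's actual code):

```sql
-- TRUNCATE reclaims the whole table at once, leaving no dead tuples
-- behind for VACUUM to clean up; DELETE must mark each row dead instead.
TRUNCATE TABLE event;                     -- fast, takes an ACCESS EXCLUSIVE lock
-- versus:
DELETE FROM event WHERE status <> 'new';  -- leaves dead rows for (auto)vacuum
```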
Just make sure your autovacuum settings are
at least as aggressive as the default settings.
I'll keep that in mind. And by a happy coincidence, in another recent forum
thread (26 Nov 2015, subject "Re: [GENERAL] Re: [webmaster] How to
commit/vacuum a batch of delete statements in a postgresql function") there
is advice from Adrian Klaver about the need to execute VACUUM outside of
a transaction block.
My plan was always, to avoid eventual exhaustion of the SERIAL sequence
number integer value series, by swapping in during the periodic app
shutdown, a freshly truncated postgres <eto_sql_tb_session_www> table.
I'd make the serial column a bigserial, and then forget about it.
I considered using bigint, but decided against doing so for three reasons.
1. the int range is already way more than is needed, 2.
avoid the extra resource consumption incurred by bigint as compared to int,
and 3. keep strictly 32-bit compatible with my 32-bit Win XP / PHP
development system. In fact the PHP app is programmed to terminate
abnormally on detecting a postgres SERIAL sequence number that exceeds the
maximum positive value of a strictly 32-bit signed integer.
Cheers,
Jeff
Steve Petrie, P.Eng.
ITS-ETO Consortium
Oakville, Ontario, Canada
apetrie@aspetrie.net
Thanks to George for the helpful comments. My remarks are below.
"George Neuner" <gneuner2@comcast.net> wrote in message
news:gvad5bllba9slstdhkn6ql2jbplgd78p98@4ax.com...
On Wed, 25 Nov 2015 23:39:39 -0500, "Steve Petrie, P.Eng."
<apetrie@aspetrie.net> wrote:
My plan was always, to avoid eventual exhaustion of the SERIAL
sequence number integer value series, by swapping in during the
periodic app shutdown, a freshly truncated postgres
<eto_sql_tb_session_www> table.
Is there a logical problem with letting the sequence wrap around?
I can't think of any reason why letting the sequence wrap would be a logical
problem, but for operational simplicity I prefer to reset the sequence
number back to one, during each brief daily postgres server shutdown window.
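A minimal sketch of that daily reset, run during the shutdown window (the sequence name is an assumption — the default PostgreSQL name for a SERIAL column called "id"):

```sql
-- Swap in a fresh table and reset the SERIAL sequence to 1.
TRUNCATE TABLE eto_sql_tb_session_www;
ALTER SEQUENCE eto_sql_tb_session_www_id_seq RESTART WITH 1;
-- or, as a single statement on PostgreSQL 8.4 and later:
-- TRUNCATE TABLE eto_sql_tb_session_www RESTART IDENTITY;
```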
I don't relish the challenge of mentally grappling with gigantic session row
sequence numbers, when I'm investigating some future operational anomaly.
I'm not sure I understand the reason for "fuzzy" deletion. There are
a number of timestamps in your data ... is it not possible to delete
deterministically based on one of them?
You are correct. The timestamps are there in the session row, and the PHP
app's session row deletion process does use them to select rows for
deletion. There is also a decrementing integer "hit count" limit column
(presently initialized == 25) that kills a session when its "budget" of HTTP
requests is used up.
I should have clarified, that I use the term "fuzzy" to refer to the
probability mechanism, that hooks a small burst of session row deletion
activity, to each one of a randomly-selected portion (presently 1/5) of the
HTTP requests that cause a new session row to be INSERTed into the session
table.
This means that on average, only every 5th HTTP request that creates a new
session row, will also incur the session deletion workload. When the session
row deletion process occurs, its (aggressive) limit for deletion workload is
2X as many expired rows as needed on average, to keep up with the rate of
session row creation (so the 2X DELETE limit presently == 10 rows).
The idea is to make the process of DELETing expired session rows,
automatically scale its activity, to closely and aggressively match the rate
of session row creation. A heavy burst of new sessions being created will
work proportionately more aggressively to DELETE expired session rows. This
way, the (HTTP request-driven) PHP app will be self-tuning its own session
table space recycling. And there is no process (e.g. cron-driven),
external to the PHP app itself, that is doing session row deletion.
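A rough SQL sketch of the deletion burst described above (the expiry column and its test are assumptions; the 1-in-5 gating would live in the PHP app, before this statement is issued):

```sql
-- Run after a session INSERT, on roughly 1 in 5 requests.
-- PostgreSQL's DELETE has no LIMIT clause, so a ctid subquery
-- caps the batch at the aggressive 2x limit of 10 rows.
DELETE FROM eto_sql_tb_session_www
 WHERE ctid IN (
       SELECT ctid
         FROM eto_sql_tb_session_www
        WHERE expiry_ts < now()   -- "expired" test is an assumption
        LIMIT 10);                -- 2x the average creation rate per burst
```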
Based on what I've learned from this forum (but before I have studied
AUTOVACUUM in any detail) my thinking is to include an AUTOVACUUM command
(outside of any SQL transaction block) in the HTTP request-driven PHP app,
immediately following any time the PHP app completes a session row DELETE
command.
Or maybe the AUTOVACUUM request should occur less frequently?
Hope this helps,
George
Steve Petrie, P.Eng.
ITS-ETO Consortium
Oakville, Ontario, Canada
apetrie@aspetrie.net
Thanks to George for the Thunderbird email client recommendation. My remarks
are below.
"George Neuner" <gneuner2@comcast.net> wrote in message
news:kaed5btl92qr4v8ndevlgtv0f28qaaeju7@4ax.com...
On Wed, 25 Nov 2015 23:39:39 -0500, "Steve Petrie, P.Eng."
<apetrie@aspetrie.net> wrote:
My stupid email client software (Microsoft Outlook Express on Win XP)
refuses to respect its own "Reply" option settings for inline reply text.
I've been looking for a replacement email client but so far without
success.)
Without further comment about a 15 year old, unsupported OS ...
(Suddenly, my MS Outlook Express is letting me do inline comments.)
George, your self-restraint is admirable :) And my plan is to move from Win
XP to Linux in the (somewhat near) future.
Before I lose all credibility with this excellent forum -- be assured that
the PHP website app (not yet online) I'm migrating to postgres from mysql,
will ABSOLUTELY NOT be running in production under any kind of Microsoft
server software. Right now I'm planning to use as a server o/s, DragonFlyBSD
with its robust HAMMER filesystem.
My vote for an email client would be Thunderbird. It runs on XP or
higher and you can import Outlook's PST files so as to keep your mail
archives. Importing PST files directly requires Outlook be available
on the same system [there is also a less friendly way to do it via EML
files exported from Outlook where Outlook is not on the same system].
It's a common misconception that MS Outlook Express is compatible with MS
Outlook. But in fact the two products are architecturally unrelated. Much
the same way that Java and Javascript are unrelated.
MS OE does not use .PST files, but there are open source utilities that will
extract the contents of MS OE mail folders for migration to alternate email
clients.
I am considering Thunderbird as an MS OE replacement, but my understanding
is that Mozilla has abandoned all but security-related support for
Thunderbird. I have been kicking the (email client functionality) tires of
SeaMonkey under my Win XP. I believe that much of SeaMonkey is built on a
Mozilla code base.
Hope this helps,
George
--
Steve Petrie, P.Eng.
ITS-ETO Consortium
Oakville, Ontario, Canada
(905) 847-3253
apetrie@aspetrie.net
On 11/29/2015 02:04 AM, Steve Petrie, P.Eng. wrote:
Thanks to George for the Thunderbird email client recommendation. My remarks
are below.
[...]
I am considering Thunderbird as an MS OE replacement, but my understanding
is that Mozilla has abandoned all but security-related support for
Thunderbird. I have been kicking the (email client functionality) tires of
SeaMonkey under my Win XP. I believe that much of SeaMonkey is built on a
Mozilla code base.
My take is SeaMonkey is Thunderbird with a bunch of unneeded features. I
use Thunderbird on a variety of machines and it works just fine. To me
an email client is a solved problem and security fixes are all that are
required, still Mozilla is making changes. For the latest version 38:
https://support.mozilla.org/en-US/kb/new-thunderbird-38
Hope this helps,
George
--
Adrian Klaver
adrian.klaver@aklaver.com
On 11/29/2015 01:59 AM, Steve Petrie, P.Eng. wrote:
Thanks to Jeff for the helpful response. My remarks are below.
[...]
I'll keep that in mind. And by a happy coincidence, in another recent forum
thread (26 Nov 2015, subject "Re: [GENERAL] Re: [webmaster] How to
commit/vacuum a batch of delete statements in a postgresql function") there
is advice from Adrian Klaver about the need to execute VACUUM outside of
a transaction block.
Just to be clear my reference was to someone running VACUUM manually.
Jeff's reference was to the autovacuum daemon which runs VACUUM and
ANALYZE as a background processes:
http://www.postgresql.org/docs/9.4/interactive/routine-vacuuming.html#AUTOVACUUM
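To illustrate the distinction: autovacuum is a background daemon, not a SQL command, and it can be tuned per table through storage parameters (the parameter names below are real; the values are illustrative only):

```sql
-- Make autovacuum more aggressive on the session table than the defaults.
ALTER TABLE eto_sql_tb_session_www
  SET (autovacuum_vacuum_scale_factor = 0.1,   -- default is 0.2
       autovacuum_vacuum_threshold    = 50);   -- default is 50
-- A manual VACUUM (outside any transaction block) is a separate operation:
-- VACUUM eto_sql_tb_session_www;
```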
--
Adrian Klaver
adrian.klaver@aklaver.com
On Sun, 29 Nov 2015 05:04:42 -0500, "Steve Petrie, P.Eng."
<apetrie@aspetrie.net> wrote:
"George Neuner" <gneuner2@comcast.net> wrote in message
news:kaed5btl92qr4v8ndevlgtv0f28qaaeju7@4ax.com...
My vote for an email client would be Thunderbird. It runs on XP or
higher and you can import Outlook's PST files so as to keep your mail
archives. Importing PST files directly requires Outlook be available
on the same system [there is also a less friendly way to do it via EML
files exported from Outlook where Outlook is not on the same system].
It's a common misconception that MS Outlook Express is compatible with MS
Outlook. But in fact the two products are architecturally unrelated.
My understanding was that OE was based on the old (Win9x) Outlook. I
know it isn't the same as the "enterprise" version.
I wasn't aware that OE used a different file format. But, AFAIK, it
does still export EML files, so you can move your mailboxes into
Thunderbird (or whatever).
I am considering Thunderbird as an MS OE replacement, but my understanding
is that Mozilla has abandoned all but security-related support for
Thunderbird. I have been kicking the (email client functionality) tires of
SeaMonkey under my Win XP. I believe that much of SeaMonkey is built on a
Mozilla code base.
Yes and no. Mozilla has Thunderbird on a slow development track. It
does occasionally get new features, but mostly now by having some very
popular extension becoming built in.
Seamonkey was a fork from a discontinued Mozilla application suite. It
is not a Mozilla project, although it does incorporate Mozilla code
from Firefox and Thunderbird.
The problem I have with Seamonkey is that it tries to be all things to
all web users. "Jack of all trades, master of none" is a truism.
YMMV, but I would rather have very reliable purpose specific tools
than an integrated suite which may do more but be less reliable
overall.
I'm not knocking Seamonkey per se - it seems to be quite well done -
I'm just making a general observation re: integrated application
suites. Netscape failed in part because it bit off too much, trying
to do mail and news on top of the browser [and not doing them well - I
loved the Netscape browser, but its mail and news interface was just
bad]. Mozilla discontinued its web application suite because too few
people wanted it.
George
On Nov 29, 2015 18:34, "George Neuner" <gneuner2@comcast.net> wrote:
On Sun, 29 Nov 2015 05:04:42 -0500, "Steve Petrie, P.Eng."
<apetrie@aspetrie.net> wrote:
[...]
It's a common misconception that MS Outlook Express is compatible with MS
Outlook. But in fact the two products are architecturally unrelated.
My understanding was that OE was based on the old (Win9x) Outlook. I
know it isn't the same as the "enterprise" version.
This is fantastically off topic but no, it was not. OE was based on the old
"Internet Mail and News". The actual Outlook product has always been
separate.
/Magnus