Unloading a table consistently
Hi,
I will have a log table which, once a day or so, is copied to a file
(for movement to a data warehouse), and the log table emptied. For
performance, the log table on the production system has no indexes,
and is write-only. (The unload process is the only reader.)
To unload it, I will be doing:
BEGIN;
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
COPY log TO 'filename-path';
TRUNCATE log;
COMMIT;
My understanding is that I need the SERIALIZABLE isolation level so
that the COPY and TRUNCATE see exactly the same view of the table.
Obviously, I don't want to lose data by having the TRUNCATE delete
records that appeared while the COPY was executing. Is that
correct? Is there a better way to handle this kind of thing that I'm
missing?
Thanks!
-- Xof
Christophe <xof@thebuild.com> writes:
> I will have a log table which, once a day or so, is copied to a file
> (for movement to a data warehouse), and the log table emptied. For
> performance, the log table on the production system has no indexes,
> and is write-only. (The unload process is the only reader.)
>
> To unload it, I will be doing:
>
> BEGIN;
> SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
> COPY log TO 'filename-path';
> TRUNCATE log;
> COMMIT;
>
> My understanding is that I need the SERIALIZABLE isolation level so
> that the COPY and TRUNCATE see exactly the same view of the table.
No, the SERIALIZABLE bit is useless if not actually counterproductive
here. What you'll actually need is an exclusive lock:
BEGIN;
LOCK TABLE log;
COPY log TO 'filename-path';
TRUNCATE log;
COMMIT;
to prevent anyone from inserting new rows during the COPY. Otherwise
any such rows would be lost at the TRUNCATE.
You might be able to get away with a lock that only locks out writers
and not readers (offhand I think IN SHARE MODE is the right thing),
but from your description that would be a useless refinement anyway.
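The race the exclusive lock closes can be illustrated with a toy Python model (pure simulation, not PostgreSQL): a lock object stands in for the table-level lock, so writers block while the copy-and-truncate runs, and the snapshot that is copied is exactly the set of rows discarded.

```python
import threading

# Toy model of the log table: a list guarded by a lock that plays the
# role of PostgreSQL's table-level exclusive lock.
table = []
table_lock = threading.Lock()

def insert(row):
    # Writers must take the lock, so they would block while the unload runs.
    with table_lock:
        table.append(row)

def unload():
    # Holding the lock across both steps means the rows we copied are
    # exactly the rows we discard -- nothing can slip in between.
    with table_lock:
        copied = list(table)   # stands in for COPY log TO 'file'
        table.clear()          # stands in for TRUNCATE log
    return copied

for i in range(5):
    insert(i)
exported = unload()
insert(99)  # a writer that arrived after the unload finished
print(exported, table)   # rows 0-4 exported, only row 99 remains
```

Without the lock, a row appended between the `list(table)` and the `clear()` would be silently dropped, which is the data-loss mode described above.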
If you can't tolerate locking out writers for that long, you won't
be able to use TRUNCATE. The operation I think you were imagining is
BEGIN;
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
COPY log TO 'filename-path';
DELETE FROM log;
COMMIT;
VACUUM log; -- highly recommended at this point
Here, the serializable mode is exactly what you want to guarantee
that the DELETE deletes the same rows the COPY copied. This is
a great deal less efficient than TRUNCATE, but it's secure for
concurrent insertions, which TRUNCATE is definitely not.
regards, tom lane
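The snapshot semantics that make the COPY + DELETE variant safe can also be sketched in a few lines of Python (a toy model of the serializable snapshot, not PostgreSQL itself): a row committed by a concurrent writer is invisible to the snapshot, so it is neither copied nor deleted.

```python
# Toy model of SERIALIZABLE COPY + DELETE: the transaction works against
# a snapshot, so a row inserted concurrently is neither copied nor deleted.
table = [1, 2, 3]

snapshot = list(table)   # the rows our transaction can see
copied = list(snapshot)  # COPY log TO 'file' sees only the snapshot

# A concurrent writer commits while our transaction is still open:
table.append(4)

# DELETE FROM log removes only rows visible to the snapshot:
table = [row for row in table if row not in snapshot]

print(copied)  # the three snapshot rows
print(table)   # the concurrently inserted row survives
```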
On May 3, 2008, at 9:56 AM, Tom Lane wrote:
> This is
> a great deal less efficient than TRUNCATE, but it's secure for
> concurrent insertions, which TRUNCATE is definitely not.
Exactly my question; thank you!
-- Xof
In article <9478.1209833817@sss.pgh.pa.us>, Tom Lane <tgl@sss.pgh.pa.us> wrote:
% If you can't tolerate locking out writers for that long, you won't
% be able to use TRUNCATE. The operation I think you were imagining is
%
% BEGIN;
% SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
% COPY log TO 'filename-path';
% DELETE FROM log;
% COMMIT;
% VACUUM log; -- highly recommended at this point
How about something along the lines of
BEGIN;
ALTER TABLE log RENAME to log_old;
CREATE TABLE log(...);
COMMIT;
BEGIN;
LOCK table log_old;
COPY log_old TO 'filename-path';
DROP TABLE log_old;
COMMIT;
I believe this will keep the writers writing while keeping the efficiency
of truncating.
--
Patrick TJ McPhee
North York Canada
ptjm@interlog.com
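Patrick's rename-and-swap sequence can be exercised end to end. The sketch below runs it against SQLite purely to illustrate the order of statements; the real target is PostgreSQL, where the two transactions would also take the locks discussed earlier in the thread.

```python
import sqlite3

# Rename-and-swap, against an in-memory SQLite database for illustration.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

cur.execute("CREATE TABLE log (msg TEXT)")
cur.executemany("INSERT INTO log VALUES (?)", [("a",), ("b",), ("c",)])
conn.commit()

# Transaction 1: swap a fresh, empty table in under the old name.
cur.execute("ALTER TABLE log RENAME TO log_old")
cur.execute("CREATE TABLE log (msg TEXT)")
conn.commit()

# Writers now land in the new log while we drain the old one.
cur.execute("INSERT INTO log VALUES ('d')")

# Transaction 2: export log_old (stands in for COPY TO) and drop it.
exported = [row[0] for row in cur.execute("SELECT msg FROM log_old")]
cur.execute("DROP TABLE log_old")
conn.commit()

remaining = [row[0] for row in cur.execute("SELECT msg FROM log")]
print(exported, remaining)   # ['a', 'b', 'c'] ['d']
```

The export sees every row written before the swap, and writers that arrive afterwards go into the new table untouched, which is the property being claimed.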
Patrick TJ McPhee wrote:
> How about something along the lines of
>
> BEGIN;
> ALTER TABLE log RENAME to log_old;
> CREATE TABLE log(...);
> COMMIT;
>
> BEGIN;
> LOCK table log_old;
> COPY log_old TO 'filename-path';
> DROP TABLE log_old;
> COMMIT;
>
> I believe this will keep the writers writing while keeping the efficiency
> of truncating.
It's almost a pity that there's no
TRUNCATE TABLE log MOVE DATA TO log_copy;
or similar; i.e., with two identical table definitions `log' and `log_copy'
swap the backing file from `log' to `log_copy' before truncating `log'.
`log_copy' could be a new temp table created with CREATE TEMPORARY TABLE
... LIKE.
This sort of thing doesn't seem to come up all that much, though.
--
Craig Ringer
On May 3, 2008, at 9:29 PM, Patrick TJ McPhee wrote:
> How about something along the lines of
>
> BEGIN;
> ALTER TABLE log RENAME to log_old;
> CREATE TABLE log(...);
> COMMIT;
>
> BEGIN;
> LOCK table log_old;
> COPY log_old TO 'filename-path';
> DROP TABLE log_old;
> COMMIT;
>
> I believe this will keep the writers writing while keeping the efficiency
> of truncating.
Brilliant.
On Sat, 2008-05-03 at 09:11 -0700, Christophe wrote:
> I will have a log table which, once a day or so, is copied to a file
> (for movement to a data warehouse), and the log table emptied.
> ...
> Is there a better way to handle this kind of thing that I'm
> missing?
Anyone care to comment about whether a table partition might be better
here? Could an insert rule on a table partition automatically create the
inherited table, if needed, and insert there?
Thanks,
Reece
--
Reece Hart, http://harts.net/reece/, GPG:0x25EC91A0
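Reece's partition idea can be modelled without PostgreSQL at all: route each insert to a per-day partition created on first use, so "unloading" a day is just detaching its partition while writers continue into the current one. The sketch below is pure Python (in real PostgreSQL of that era this would be table inheritance plus an insert rule or trigger; the routing logic is the point, not the mechanism).

```python
from collections import defaultdict
from datetime import date

# Partition name -> rows; defaultdict creates a partition on first insert,
# standing in for auto-creating the inherited table.
partitions = defaultdict(list)

def insert(row, when):
    # Route the row to the partition for its date.
    partitions[f"log_{when.isoformat()}"].append(row)

insert("a", date(2008, 5, 2))
insert("b", date(2008, 5, 3))
insert("c", date(2008, 5, 3))

# Unloading yesterday is cheap: detach the whole partition in one step,
# leaving today's partition (and its writers) untouched.
exported = partitions.pop("log_2008-05-02")
print(exported, sorted(partitions))   # ['a'] ['log_2008-05-03']
```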