[Solved] Generic logging system for pre-hstore using plperl triggers
Hi, I had to implement a logging system for some DBs at work. It's
generic and implemented in plperl. I've seen tons of mails on the
list from newbies asking for something like this in plpgsql, but no
concrete solution is ever pointed out to them. I think this may
discourage some of them.
The system is implemented in a separate schema with whatever name
you want, and has some really nice features: relevant tables can be
rotated to facilitate auditing, and each logged action refers to the
modified tuple by primary key, which you don't even have to (but can)
specify when creating the trigger, and some more. Performance is very
acceptable (test cases and suggestions are welcome), and it has been
working for about 8 months without a single problem.
Some points are mentioned in the wiki, but it is all _too_ general,
and on a vanilla pg-8.4 you won't have the hstore facility (note that
upgrading is not always an option).
Would it be worth pasting the little code here, or would that be ignored?
--
Diego Augusto Molina
diegoaugustomolina@gmail.com
ES: Por favor, evite adjuntar documentos de Microsoft Office. Serán
desestimados.
EN: Please, avoid attaching Microsoft Office documents. They shall be discarded.
LINK: http://www.gnu.org/philosophy/no-word-attachments.html
2011/9/27 Diego Augusto Molina <diegoaugustomolina@gmail.com>
Would it be worth pasting the little code here, or would that be ignored?
If you can share it, do it; maybe someone will have use of your work. It's
always welcome.
Publishing it on the web and linking to it here instead of pasting would be
even better.
Filip
/* Created by Diego Augusto Molina in 2011 for Tucuman Government,
Argentina. */
/*
-- Execute the following accordingly to your needs.
CREATE TRUSTED PROCEDURAL LANGUAGE 'plpgsql';
CREATE TRUSTED PROCEDURAL LANGUAGE 'plperl';
*/
CREATE ROLE auditor NOSUPERUSER NOINHERIT NOCREATEDB NOCREATEROLE;
CREATE ROLE audit LOGIN ENCRYPTED PASSWORD 'test.1234' NOSUPERUSER
NOINHERIT NOCREATEDB NOCREATEROLE;
CREATE SCHEMA audit AUTHORIZATION audit;
ALTER ROLE auditor SET search_path=audit;
ALTER ROLE audit SET search_path=audit;
SET search_path=audit;
SET SESSION AUTHORIZATION audit;
CREATE SEQUENCE seq_audit
INCREMENT 1
MINVALUE -9223372036854775808
MAXVALUE 9223372036854775807
START 0
CACHE 1
CYCLE;
ALTER TABLE seq_audit OWNER TO audit;
CREATE SEQUENCE seq_elems
INCREMENT 1
MINVALUE -32768
MAXVALUE 32767
START 0
CACHE 1
CYCLE;
ALTER TABLE seq_elems OWNER TO audit;
CREATE TABLE field
(
id smallint NOT NULL DEFAULT nextval('seq_elems'::regclass),
value name NOT NULL,
CONSTRAINT field_pk PRIMARY KEY (id)
WITH (FILLFACTOR=100),
CONSTRAINT field_uq_value UNIQUE (value)
WITH (FILLFACTOR=100)
)
WITH (
OIDS=FALSE
);
ALTER TABLE field OWNER TO audit;
GRANT ALL ON TABLE field TO audit;
GRANT SELECT ON TABLE field TO auditor;
CREATE TABLE client_inet
(
id smallint NOT NULL DEFAULT nextval('seq_elems'::regclass),
value inet NOT NULL DEFAULT inet_client_addr(),
CONSTRAINT dir_inet_pk PRIMARY KEY (id )
WITH (FILLFACTOR=100),
CONSTRAINT dir_inet_uq_value UNIQUE (value)
WITH (FILLFACTOR=95)
)
WITH (
OIDS=FALSE
);
ALTER TABLE client_inet
OWNER TO audit;
GRANT ALL ON TABLE client_inet TO audit;
GRANT SELECT ON TABLE client_inet TO auditor;
CREATE TABLE schema
(
id smallint NOT NULL DEFAULT nextval('seq_elems'::regclass),
value name NOT NULL,
CONSTRAINT schema_pk PRIMARY KEY (id )
WITH (FILLFACTOR=100),
CONSTRAINT schema_uq_value UNIQUE (value )
WITH (FILLFACTOR=100)
)
WITH (
OIDS=FALSE
);
ALTER TABLE schema
OWNER TO audit;
GRANT ALL ON TABLE schema TO audit;
GRANT SELECT ON TABLE schema TO auditor;
CREATE TABLE "table"
(
id smallint NOT NULL DEFAULT nextval('seq_elems'::regclass),
value name NOT NULL,
CONSTRAINT table_pk PRIMARY KEY (id)
WITH (FILLFACTOR=100),
CONSTRAINT table_uq_value UNIQUE (value)
WITH (FILLFACTOR=100)
)
WITH (
OIDS=FALSE
);
ALTER TABLE "table"
OWNER TO audit;
GRANT ALL ON TABLE "table" TO audit;
GRANT SELECT ON TABLE "table" TO auditor;
CREATE TABLE "user"
(
id smallint NOT NULL DEFAULT nextval('seq_elems'::regclass),
value name NOT NULL DEFAULT "current_user"(),
CONSTRAINT user_pk PRIMARY KEY (id)
WITH (FILLFACTOR=100),
CONSTRAINT user_uq_value UNIQUE (value)
WITH (FILLFACTOR=95)
)
WITH (
OIDS=FALSE
);
ALTER TABLE "user"
OWNER TO audit;
GRANT ALL ON TABLE "user" TO audit;
GRANT SELECT ON TABLE "user" TO auditor;
CREATE TABLE audit
(
id bigint,
type character(1),
tstmp timestamp with time zone DEFAULT now(),
schema smallint,
"table" smallint,
"user" smallint,
client_inet smallint,
client_port integer DEFAULT inet_client_port(),
pid integer DEFAULT pg_backend_pid()
)
WITH (
OIDS=FALSE
);
ALTER TABLE audit OWNER TO audit;
GRANT ALL ON TABLE audit TO audit;
GRANT SELECT ON TABLE audit TO auditor;
CREATE TABLE audet
(
id bigint,
field smallint,
is_pk boolean,
before text,
after text
)
WITH (
OIDS=FALSE
);
ALTER TABLE audet OWNER TO audit;
GRANT ALL ON TABLE audet TO audit;
GRANT SELECT ON TABLE audet TO auditor;
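To make the layout concrete, a query along these lines (purely illustrative,
assuming only the objects defined above) would join the pieces back into a
readable log:

```sql
-- Illustrative only: one row per audited field change, with names resolved.
-- The quoted identifiers follow the reserved-word table/column names above.
select a.id, a.type, a.tstmp,
       s.value as schema_name,
       t.value as table_name,
       u.value as user_name,
       f.value as field_name,
       d.is_pk, d.before, d.after
from audit a
  join audet d on d.id = a.id
  join field f on f.id = d.field
  join "schema" s on s.id = a.schema
  join "table" t on t.id = a."table"
  join "user" u on u.id = a."user"
order by a.tstmp, a.id;
```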
CREATE OR REPLACE FUNCTION tgf_ins_audet()
RETURNS trigger AS
$BODY$
begin
execute E'insert into audet_' || tg_argv[0] || E'
(
id,
field,
is_pk,
before,
after
) values
(
'||coalesce(new.id::text,'NULL')||E',
'||coalesce(new.field::text,'NULL')||E',
'||coalesce(new.is_pk::text,'NULL')||E',
'||coalesce(quote_literal(new.before),'NULL')||E',
'||coalesce(quote_literal(new.after),'NULL')||E'
)';
return null;
end;$BODY$ LANGUAGE plpgsql VOLATILE;
ALTER FUNCTION tgf_ins_audet() SET search_path=audit;
ALTER FUNCTION tgf_ins_audet() OWNER TO audit;
GRANT EXECUTE ON FUNCTION tgf_ins_audet() TO audit;
CREATE OR REPLACE FUNCTION tgf_ins_audit()
RETURNS trigger AS
$BODY$
begin
execute E'insert into audit_' || tg_argv[0] || E'
(
id,
type,
tstmp,
schema,
"table",
"user",
client_inet,
client_port,
pid
) values
(
'||coalesce(new.id::text,'NULL')||E',
'||coalesce(quote_literal(new.type),'NULL')||E',
'||coalesce(quote_literal(new.tstmp),'NULL')||E',
'||coalesce(new.schema::text,'NULL')||E',
'||coalesce(new."table"::text,'NULL')||E',
'||coalesce(new."user"::text,'NULL')||E',
'||coalesce(new.client_inet::text,'NULL')||E',
'||coalesce(new.client_port::text,'NULL')||E',
'||coalesce(new.pid::text,'NULL')||E'
)';
return null;
end;$BODY$ LANGUAGE plpgsql VOLATILE;
ALTER FUNCTION tgf_ins_audit() SET search_path=audit;
ALTER FUNCTION tgf_ins_audit() OWNER TO audit;
GRANT EXECUTE ON FUNCTION tgf_ins_audit() TO audit;
CREATE TRIGGER tg_audit_20110518
BEFORE INSERT
ON audit
FOR EACH ROW
EXECUTE PROCEDURE tgf_ins_audit(20110518);
CREATE TRIGGER tg_audet_20110907
BEFORE INSERT
ON audet
FOR EACH ROW
EXECUTE PROCEDURE tgf_ins_audet(20110907);
CREATE OR REPLACE FUNCTION rotate(character)
RETURNS void AS
$BODY$
/* Created by Diego Augusto Molina in 2011 for Tucuman Government,
Argentina. */
declare
first_execution boolean := false;
cur_start char(8) := null;
cur_tstmp_min timestamp with time zone;
cur_tstmp_max timestamp with time zone;
cur_id_min bigint;
cur_id_max bigint;
new_start char(8);
begin
/* Determine the creation tstmp of the tables
=========================================================================
*/
select substring(max(c.relname::text) from $1 || E'_(........)')
into cur_start
from
pg_namespace n inner join
pg_class c on (n.oid = c.relnamespace)
where
n.nspname = 'audit'::name and
c.relname::text like $1 || '_%';
if cur_start is null then
first_execution := true;
cur_start := '';
end if;
new_start := cast(to_char(current_timestamp,'YYYYMMDD') as name);
case $1
when 'audit' then /* if I'm rotating the table audit
================================================================== */
/* current table */
if not first_execution then
execute 'select min(tstmp), max(tstmp) from audit_' || cur_start
into cur_tstmp_min, cur_tstmp_max;
execute $$
alter index idx_audit_$$|| cur_start ||$$_id
set (fillfactor = 100);
alter index idx_audit_$$|| cur_start ||$$_tstmp
set (fillfactor = 100);
alter index idx_audit_$$|| cur_start ||$$_schema__table
set (fillfactor = 100);
alter index idx_audit_$$|| cur_start ||$$_user
set (fillfactor = 100);
cluster audit_$$|| cur_start ||$$;
analyze audit_$$|| cur_start ||$$;
alter table audit_$$|| cur_start ||$$ add
constraint audit_$$|| cur_start ||$$_ck_exclusion
check (
tstmp >= '$$|| cur_tstmp_min ||$$'
and
tstmp <= '$$|| cur_tstmp_max ||$$'
)
$$;
end if;
execute $$
/* new table */
create table audit_$$|| new_start ||$$ () inherits (audit);
create index idx_audit_$$|| new_start ||$$_id
on audit_$$|| new_start ||$$ using btree (id) with
(fillfactor = 99);
create index idx_audit_$$|| new_start ||$$_tstmp
on audit_$$|| new_start ||$$ using btree (tstmp) with
(fillfactor = 99);
create index idx_audit_$$|| new_start ||$$_schema__table
on audit_$$|| new_start ||$$ using btree (schema, "table") with
(fillfactor = 95);
create index idx_audit_$$|| new_start ||$$_user
on audit_$$|| new_start ||$$ using btree ("user") with
(fillfactor = 95);
cluster audit_$$|| new_start ||$$ using idx_audit_$$||
new_start ||$$_tstmp;
/* Parent table */
drop trigger if exists tg_audit_$$|| cur_start ||$$ on audit;
create trigger tg_audit_$$|| new_start ||$$
before insert
on audit
for each row
execute procedure tgf_ins_audit('$$|| new_start ||$$');
$$;
when 'audet' then /* if I'm rotating the table audet
================================================================== */
/* current table */
if not first_execution then
execute 'select min(id), max(id) from audet_' || cur_start
into cur_id_min, cur_id_max;
execute $$
alter index idx_audet_$$|| cur_start ||$$_id set
(fillfactor = 100);
alter index idx_audet_$$|| cur_start ||$$_fieldpk set
(fillfactor = 100);
cluster audet_$$|| cur_start ||$$;
analyze audet_$$|| cur_start ||$$;
alter table audet_$$|| cur_start ||$$ add
constraint audet_$$|| cur_start ||$$_ck_exclusion
check (
id >= '$$|| cur_id_min ||$$'
and
id <= '$$|| cur_id_max ||$$'
);
/* Parent table */
drop trigger tg_audet_$$|| cur_start ||$$ on audet;
$$;
end if;
execute $$
/* new table */
create table audet_$$|| new_start ||$$ () inherits (audet);
create index idx_audet_$$|| new_start ||$$_id on
audet_$$|| new_start ||$$ using btree (id) with (fillfactor = 99);
create index idx_audet_$$|| new_start ||$$_fieldpk on
audet_$$|| new_start ||$$ using btree (field) with (fillfactor = 99)
where is_pk;
cluster audet_$$|| new_start ||$$ using idx_audet_$$||
new_start ||$$_id;
/* Parent table */
create trigger tg_audet_$$|| new_start ||$$
before insert
on audet
for each row
execute procedure tgf_ins_audet('$$|| new_start ||$$');
$$;
else /* if I got a wrong argument
=====================================================================================
*/
raise notice E'Error: expected \'audit\' or \'audet\'. Got \'%\'.', $1;
return;
end case;
end;$BODY$ LANGUAGE plpgsql VOLATILE SECURITY DEFINER;
ALTER FUNCTION rotate(character) SET search_path=audit;
ALTER FUNCTION rotate(character) OWNER TO audit;
GRANT EXECUTE ON FUNCTION rotate(character) TO audit;
CREATE OR REPLACE FUNCTION audit() RETURNS trigger AS $BODY$
## Created by Diego Augusto Molina in 2011 for Tucuman Government, Argentina.
## This is the trigger function which should be called by any table we want to
## audit. If it receives arguments, they will be interpreted as the primary
## key of the table. If any of the arguments is not a column of the table, or
## no arguments are received at all, a probing process takes place which ends
## up determining the pk of the table. Thus, it is better to create the
## trigger calling this function with no arguments. See the TODO list.
## Usage:
##   CREATE TRIGGER <tg_name> AFTER {INSERT | UPDATE | DELETE} [ OR ...] ON <table_name>
##     FOR EACH ROW EXECUTE PROCEDURE audit.audit( { <column_list> | <nothing_at_all> } );
## KNOWN ISSUE #1: you don't want to use this trigger on a table which has a
##   column of some type that doesn't have an implicit cast to string. That
##   would cause a runtime error killing your transaction! See TODO #2. Rest
##   easy, most 'common' types have an implicit cast to string by default.
## TODO #1: in 'P3', instead of asking '( scalar keys %pk == 0 )' each time,
##   put an 'else'.
## TODO #1.1: if the pk was not passed as argument, at the end of the probing
##   execute an 'alter trigger' so that next time there's no probing at all.
##   This would be unfriendly with modifications of the table definition,
##   which should carry an update of the trigger (putting no arguments at all
##   would imply probing again just the first time, and then we just use it!).
## TODO #2: search for a way to save the binary contents of the columns
##   instead of the formatted content. The table 'field' would have an extra
##   column of type 'type' which would help describe the field audited. That
##   would solve the problem with strange fields (assuming that _any_ value
##   can be converted to its binary/internal representation). This may carry
##   some extra complexity, maybe needing extra tables holding information
##   about types.
## TODO #3: make this function receive only two parameters: two arrays of
##   type name[], the first holding the set of columns which make up the
##   primary key, the second the set of columns to be registered in addition
##   to the pk ones. Note that pk columns will always be registered because
##   they identify the tuple modified.
## TODO #4: support for the TRUNCATE event.

# P0. Declarations and general definitions ####################################
my $elog_pref =
  "(schm:$_TD->{table_schema};tab:$_TD->{table_name};trg:$_TD->{name};evt:$_TD->{event}):";
my $rv = "";     # Query execution
my $val = "";    # Iterating value
my %tables = (   # Values of the respective tables inserted in "audit"
  "user" => 'pg_catalog."session_user"()',
  "table" => "'$_TD->{table_name}'",
  "schema" => "'$_TD->{table_schema}'",
  "client_inet" => "pg_catalog.inet_client_addr()"
);
my $id = "";     # Id of the tuple inserted in "audit"
my $field = "";  # Field id
my $is_pk = 0;   # Determines if a field is part of the PK
my $before = ""; # Value of a field in OLD
my $after = "";  # Value of a field in NEW
my %cols = ();   # Columns of the table
my %pk = ();     # Primary key
# Copy columns from some available transition variable ------------------------
if (exists $_TD->{new}){
  %cols = %{$_TD->{new}};
} else {
  %cols = %{$_TD->{old}};
}
# P1. Create necessary tuples in user, table, schema and client_inet ##########
foreach $val (keys %tables){
  $rv = spi_exec_query("select id from \"$val\" where value = $tables{$val}");
  if ( $rv->{status} != SPI_OK_SELECT ){
    elog(ERROR, "$elog_pref Error querying table '$val'.");
  }
  if ( $rv->{processed} == 1 ){
    $tables{$val} = $rv->{rows}[0]->{id};
  } else {
    $rv = spi_exec_query("insert into \"$val\" (value) values ($tables{$val}) returning id");
    if ( $rv->{status} != SPI_OK_INSERT_RETURNING ){
      elog(ERROR, "$elog_pref Error inserting in table '$val'.");
    }
    $tables{$val} = $rv->{rows}[0]->{id};
  }
}
# P2. Insert in audit #########################################################
$rv = spi_exec_query("select nextval('seq_audit'::regclass) as id");
if ( $rv->{status} != SPI_OK_SELECT ){
  elog(ERROR, "$elog_pref Error querying next value of sequence 'seq_audit'.");
}
$id = $rv->{rows}[0]->{id};
$rv = spi_exec_query("insert into audit (id, type, schema, \"table\", \"user\", client_inet)
  values (
    $id,
    substring('$_TD->{event}', 1, 1),
    $tables{'schema'},
    $tables{'table'},
    $tables{'user'},
    $tables{'client_inet'}
  )
");
if ($rv->{status} != SPI_OK_INSERT){
  elog(ERROR,"$elog_pref Error inserting tuple in table 'audit'.");
}
# P3. Determine the PK of the table ###########################################
if ( scalar keys %pk == 0 ){
  # Criterion 1: if we got params, each of them is a column of the table, and
  # all of them together make up the pk of the table ---------------------------
  elog(DEBUG, "$elog_pref Searching pk in the trigger's params.");
  if ($_TD->{argc} > 0){
    ARGS: foreach $val ( @{$_TD->{args}} ){
      if (exists $cols{$val}){
        $pk{$val} = "-";
      } else {
        %pk = ();
        elog(DEBUG, "$elog_pref The column '$val' given as argument does not exist. Skipping to next criterion.");
        last ARGS;
      }
    }
  }
}
if ( scalar keys %pk == 0 ) {
  # Criterion 2: search for the pk in the system catalogs ----------------------
  elog(DEBUG, "$elog_pref Searching pk in system catalogs.");
  $rv = spi_exec_query("
    select a.attname from
      ( select cl.oid, unnest(c.conkey) as att
        from
          pg_catalog.pg_constraint c inner join
          pg_catalog.pg_class cl on (c.conrelid = cl.oid)
        where
          c.contype = 'p'
          and cl.oid = $_TD->{relid}
      ) as c inner join
      pg_catalog.pg_attribute a on (c.att = a.attnum and c.oid = a.attrelid)
  ");
  if ( $rv->{status} == SPI_OK_SELECT ){
    if ( $rv->{processed} > 0 ){
      foreach $val (@{$rv->{rows}}){
        $pk{$val->{attname}} = "-";
      }
    }
  } else {
    elog(DEBUG, "$elog_pref Error querying the system catalogs. Skipping to next criterion.");
  }
}
if ( scalar keys %pk == 0 ) {
  # Criterion 3: if the table has OIDs, use that as pk and emit a warning ------
  elog(DEBUG, "$elog_pref Searching for OIDs in the table.");
  $rv = spi_exec_query("select * from pg_catalog.pg_class where oid = $_TD->{relid} and relhasoids = true");
  if( $rv->{status} == SPI_OK_SELECT ){
    if ( $rv->{processed} > 0 ){
      %pk = ("oid" => "-");
      elog(WARNING, "$elog_pref Using OIDs as table pk for '$_TD->{table_name}' because no previous criterion could find one.");
    }
  } else {
    elog(DEBUG, "$elog_pref Error querying the system catalogs. Skipping to next criterion.");
  }
}
if ( scalar keys %pk == 0 ){
  # Default criterion: log all columns -----------------------------------------
  elog(WARNING, "$elog_pref Could not find a suitable pk. Logging every column.");
  %pk = %cols;
}
# P4. Insert in audet #########################################################
foreach $val (keys %cols){
  $is_pk = 0 + exists($pk{$val});
  if ( $_TD->{event} ne "UPDATE" || $is_pk || $_TD->{new}{$val} ne $_TD->{old}{$val} ){
    $before = (exists $_TD->{old}) ? "'".$_TD->{old}{$val}."'" : "NULL";
    $after = (exists $_TD->{new}) ? "'".$_TD->{new}{$val}."'" : "NULL";
    if ( $_TD->{event} eq "UPDATE" && $_TD->{new}{$val} eq $_TD->{old}{$val} ){
      # While updating, we don't save the previous state of a pk column
      # if it hasn't changed.
      $before = "NULL";
    }
    $rv = spi_exec_query("select id from field where value = '$val'");
    if ( $rv->{status} != SPI_OK_SELECT ){
      elog(ERROR, "$elog_pref Error querying table 'field'.");
    }
    if ( $rv->{processed} > 0 ){
      $field = $rv->{rows}[0]->{id};
    } else {
      $rv = spi_exec_query("insert into field (value) values ('$val') returning id");
      if ( $rv->{status} != SPI_OK_INSERT_RETURNING ){
        elog(ERROR, "$elog_pref Error executing insert returning in table 'field'.");
      }
      $field = $rv->{rows}[0]->{id};
    }
    $rv = spi_exec_query("insert into audet (id, field, is_pk, before, after)
      values ($id, $field, cast($is_pk as boolean), cast($before as text), cast($after as text))");
    if ( $rv->{status} != SPI_OK_INSERT ){
      elog(ERROR, "$elog_pref Error inserting tuples in table 'audet'.");
    }
  }
}
# P5. Finishing ###############################################################
return;
$BODY$
LANGUAGE plperl VOLATILE SECURITY DEFINER;
ALTER FUNCTION audit() SET search_path=audit;
ALTER FUNCTION audit() OWNER TO audit;
GRANT EXECUTE ON FUNCTION audit() TO public;
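For reference, wiring a table up would look like this (public.customer and its
pk column id are hypothetical; with no arguments the function probes for the
pk itself, as described in section P3 of the comments):

```sql
-- Hypothetical application table; any table is audited the same way.
CREATE TRIGGER tg_audit_customer
  AFTER INSERT OR UPDATE OR DELETE ON public.customer
  FOR EACH ROW EXECUTE PROCEDURE audit.audit(id);
```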
Sh*#@, the code got wrapped! If you can suggest an external site to
host the code, I can put it there and send the link.
Maybe the wiki...?
Nevertheless, I can't seem to publish to the wiki. I think I've
followed all the steps but still can't.
Anyway, there aren't many long lines, except for those which are comments.
Honestly, I don't remember why I used triggers instead of rules on the
"audit" and "audet" tables.
--
Diego Augusto Molina
diegoaugustomolina@gmail.com
2011/9/27, Diego Augusto Molina <diegoaugustomolina@gmail.com>:
Honestly, I don't remember why I used triggers instead of rules in the
"audit" and "audet" tables.
I remember now: that's because in my case, operations over tuples are
done very lightly (one or two in the same statement at a time). So, for
a case like that, rules end up being more expensive than triggers
(right?).
--
Diego Augusto Molina
diegoaugustomolina@gmail.com
2011/9/27 Diego Augusto Molina <diegoaugustomolina@gmail.com>:
/* Created by Diego Augusto Molina in 2011 for Tucuman Government,
Argentina. */
OK, few random comments:
*) when posting schema definitions, particularly in email format, try
not to use dumped definitions from pg_dump or pgadmin. This creates a
lot of noise in the script that detracts from what you are trying to
do. Also an attached file would probably have been more appropriate.
*) using smallint key for client_inet is really dubious. why not just
use the inet itself?
*) what is the audet table for? Are you truly storing a record for
every field of every audited table? This will be incredibly
efficient, especially for large, wide tables.
*) surely, creating a table called 'table' is not a good idea.
*) this approach obviously is a lot more complicated than hstore.
however, for 8.4 and down, hstore won't work. but, what about just
storing the record as text?
*) I can't quite follow the perl criterion steps -- what is happening
there? What are the loops doing?
merlin
On Tue, Sep 27, 2011 at 04:52:08PM -0300, Diego Augusto Molina wrote:
2011/9/27, Diego Augusto Molina <diegoaugustomolina@gmail.com>:
Honestly, I don't remember why I used triggers instead of rules on the
"audit" and "audet" tables. [...] rules end up being more expensive than
triggers (right?).
There's an even better reason not to use rules: they're going away in
a not too distant version of PostgreSQL.
Cheers,
David.
--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics
Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
2011/9/27, Merlin Moncure <mmoncure@gmail.com>:
*) when posting schema definitions, particularly in email format, try
not to use dumped definitions from pg_dump or pgadmin. This creates a
lot of noise in the script that detracts from what you are trying to
do. Also an attached file would probably have been more appropriate.
Right! I'll send it attached, from the source files instead of the
pgAdminIII dump. Well, that'll be in the next mail; I don't have the
files right now.
*) using smallint key for client_inet is really dubious. why not just
use the inet itself?
Sure, this has two reasons:
1) I wanted the audit table to be as narrow as possible. Believe me, I
migrated the tens of millions of tuples from the previous (slow,
inefficient, extremely difficult to maintain) logging system onto a test
server just to try things out: the narrower the table, the better it
performs on searches. And I mean it! You wouldn't imagine. I don't
know what I did with the annotated results, but I will search for them
to share.
2) I put many things outside the audit table (like the table, schema
and field names too); that makes it narrower but also makes it very
easy to see all values without querying the audit table: I can see all
clients' inet addresses in a very small table (and you could gather
more info there if you wanted to). Note that for us most of the
accesses to the database come from the web server, which implements a
new version of a big application, so it would be a pity to allocate 5
extra bytes (7 bytes for IPv4 versus 2 for smallint) just to see mostly
the same IP address. So, why bother logging the IP at all? Well,
besides completeness, it lets us see whether they were using the new
application or the old one, which accessed the database server directly
from the client's computer.
Other fields, namely client_port and pid, aren't mapped out to other
tables because they don't widen the table much and because I wouldn't
analyze those contents very often: like never, but some day some kind
of weird attack may need to be analyzed with this data.
*) what is the audet table for? Are you truly storing a record for
every field of every audited table? This will be incredibly
efficient, especially for large, wide tables.
See the answer about the criteria.
*) surely, creating a table called 'table' is not a good idea.
Ok, name it "tables" then.
*) this approach obviously is a lot more complicated than hstore.
however, for 8.4 and down, hstore won't work. but, what about just
storing the record as text?
Third time lucky! See the next answer...
*) I can't quite follow the perl criteron steps -- what is happening
there? What are the loops doing?
We have some very wide tables (20 or more columns). We only log the
values of two kinds of columns: i) those which make up the primary
key of the table (which helps track down the alteration); and ii)
those whose values change in the event. Note that columns in group i)
can also be in group ii).
This carries the complexity of the criteria, which are meant to
determine the primary key of the table at any cost. Each failing
criterion makes the following one take its place. These are the
criteria I could think of:
1) If we got parameters, _check_ them and consider each of them one
column of the primary key. This is the cheapest and almost always the
path taken. We really have to check, because if there's e.g. some typo,
the whole transaction outside the trigger would fail unconditionally
along with it, and we want this logging system to interfere as little
as possible. A little less performance in exchange for some more
stability.
2) Search the system catalogs for a primary key constraint.
3) Search the system catalogs for the unique constraint which has the
fewest columns (in fact, I think it should be "the narrowest unique
constraint").
4) If the table has OIDs, use those and emit a warning (that's never a
real pk, unless you make an external unique index, which I have no
intention of checking right now).
5) The "else" (or "default") case is to log every column, emitting a
warning (really guys, use primary keys! ;).
We wouldn't bear this complexity every time but only once: if
criterion 1) fails, after determining the "primary key" we should
execute a string which drops this same trigger and re-creates it,
passing it the names of the columns which were determined to be the pk,
so that the next time we don't go further than 1). This works; I tried
it out a while ago but never made the change (oops!). I mean, dropping
the trigger from the trigger itself (the docs say 'alter trigger' can
only rename it).
In my case, all this headache was needed: one of the requirements of
the logging system was that every tuple should be trackable. Using a
recursive "with" query, it is possible to track down the changes to any
single tuple in the audited tables (or make a function, whatever).
Fortunately, they never specified a maximum time for that ;). If
instead we had built a string from the record, we wouldn't have been
able to easily track the tuple.
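That recursive tracking could be sketched roughly like this, for a
hypothetical starting audit event 42 and a single-column pk; it is only an
illustration, and a real version would need to constrain each step to the
next matching event:

```sql
-- Rough sketch only: follow one tuple forward through pk changes.
with recursive trail as (
  select a.id, d.after as pk_value
  from audit a join audet d on d.id = a.id
  where a.id = 42 and d.is_pk        -- 42: hypothetical starting event
  union all
  select a.id, coalesce(d.after, d.before)
  from trail tr
    join audit a on a.id > tr.id
    join audet d on d.id = a.id and d.is_pk
  where coalesce(d.before, d.after) = tr.pk_value
)
select * from trail;
```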
Note also the "rotate(character)" function. It hot-rotates the
audit/audet tables much like a logrotate program on *nix. You never
stop logging, and you can dump the old table before dropping it (to
save space) and restore it somewhere else to analyze it. We would
usually rotate every 6 months or so, but that interval keeps shortening
as the system grows.
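Rotation itself is then a one-liner per table, typically from a cron job
(sketch against the schema above; the old audit_YYYYMMDD / audet_YYYYMMDD
children can afterwards be dumped and dropped):

```sql
SELECT audit.rotate('audit');
SELECT audit.rotate('audet');
```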
The *real* system is very discreet with I/U/D operations: it's not
usual to have more than one of those operations per second. On
higher-transaction systems I think this logging system would be more
noticeable (right now things go smoothly). As I said before, if you
have test cases they're very welcome.
--
Diego Augusto Molina
diegoaugustomolina@gmail.com
On Wed, Sep 28, 2011 at 06:20:04AM -0700, David Fetter wrote:
There's an even better reason not to use rules: they're going away in
a not too distant version of PostgreSQL.
Really? How? I thought views were done using rules under the hood?
Also, it'd be awfully nice if, in case rules are going away, the
documentation actually said prominently in the rules section, "Rules
are deprecated and are planned to be removed at some point in the
future." Right now, anyone coming to Postgres for the first time
could easily understand the manual to say that Postgres has this cool
feature on which they can rely.
Best,
A
--
Andrew Sullivan
ajs@crankycanuck.ca
Andrew Sullivan <ajs@crankycanuck.ca> writes:
On Wed, Sep 28, 2011 at 06:20:04AM -0700, David Fetter wrote:
There's an even better reason not to use rules: they're going away in
a not too distant version of PostgreSQL.
Really? How? I thought views were done using rules under the hood?
[...]
I think the true state of affairs is this: rules have a lot of
surprising behaviors, and if we could think of something that works more
straightforwardly, we'd love to replace them. But I think we'd have to
have the "something" in place before we consider deprecating rules.
At the moment we don't even have a glimmer of a design, so David's
statement is many years premature.
regards, tom lane
On Wed, Sep 28, 2011 at 10:34:32AM -0400, Tom Lane wrote:
I think the true state of affairs is this: rules have a lot of
surprising behaviors, and if we could think of something that works more
straightforwardly, we'd love to replace them.
Oh. Well, _that's_ not news :-)
--
Andrew Sullivan
ajs@crankycanuck.ca
On Wed, Sep 28, 2011 at 8:20 AM, Diego Augusto Molina
<diegoaugustomolina@gmail.com> wrote:
2011/9/27, Merlin Moncure <mmoncure@gmail.com>:
*) when posting schema definitions, particularly in email format, try
not to use dumped definitions from pg_dump or pgadmin. This creates a
lot of noise in the script that detracts from what you are trying to
do. Also an attached file would probably have been more appropriate.
Right! I'm sending it attached and from the source files instead of
the pgAdminIII dump. Well, that'll be the next mail, I don't have the
files right now.
*) using smallint key for client_inet is really dubious. why not just
use the inet itself?
Sure, this has two reasons:
1) I wanted the audit table to be as narrow as possible. Believe me, I
migrated the tens of millions of tuples from the previous (slow,
inefficient, extremely difficult to maintain) logging system on a test
server just to try things out: the narrower the table, the better it
performs with searches. And I mean it! You wouldn't imagine. I don't
know what I did with the annotated results, but I will search for them
to share that.
2) I put many things outside the audit table (like the table, schema
and field names too); that makes it narrower but also makes it very
easy to see all values without querying the audit table: I can see in
a very small table all inet's from clients (and you could gather more
info if you would like too). Note that for us most of the accesses to
the database come from the web server which implements a new version
of a big application, so it would be a pity to allocate 5 extra bytes
(7 bytes for an IPv4 inet, 2 for smallint) just to log mostly the same
IP address. So, why bother logging the IP at all? Well, besides adding
completeness, it allows us to see whether clients were using the new
application or the old one, which accessed the database server
directly from the client's computer.
Other fields, namely client_port and pid, aren't mapped out to other
tables because they do not add much to the table width and because I
wouldn't analyze their contents very often: like never, but some day
there may be some kind of weird attack that needs to be analyzed with
this data.
*) what is the audet table for? Are you truly storing a record for
every field of every audited table? This will be incredibly
inefficient, especially for large, wide tables.
See the answer about the criteria.
*) surely, creating a table called 'table' is not a good idea.
Ok, name it "tables" then.
*) this approach obviously is a lot more complicated than hstore.
however, for 8.4 and down, hstore won't work. but, what about just
storing the record as text?
Third time lucky! See the next...
*) I can't quite follow the perl criterion steps -- what is happening
there? What are the loops doing?
We have some very wide tables (like 20 or more columns). We only log
the values of two kinds of columns: i) those which make up the primary
key of the table (which helps to track down the alteration); and ii)
those whose values change in the event. Note that columns in group i)
can also be in group ii).
This is where the complexity of the criteria comes in: they are meant
to determine the primary key of the table at any cost. Each failing
criterion makes the following one take its place. These are the
criteria I could think of:
1) If we got parameters, _check_ them and consider each of them as one
column of the primary key. This is the cheapest path and almost always
the way through. We really have to check, because if there's e.g. some
typo, the whole transaction outside the trigger would fail
unconditionally, and we want this logging system to interfere as
little as possible. A little less performance in exchange for some
more stability.
2) Search the system catalogs for a primary key constraint.
3) Search the system catalogs for the unique constraint with the
fewest columns (in fact, I think it should be "the narrowest unique
constraint").
4) If the table has OIDs, use those and emit a warning (that's never a
real pk, unless you make an external unique index, which I don't
intend to check right now).
5) The "else" (or "default") case is to log every column, emitting a
warning (really guys, use primary keys! ;).
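Criterion 2) boils down to a catalog lookup. A minimal sketch of such a query (the table name is a placeholder; inside a plperl trigger the table's oid is available as $_TD->{relid} instead):

```sql
-- Find the primary key columns of a table (criterion 2).
SELECT a.attname
FROM pg_index i
JOIN pg_attribute a ON a.attrelid = i.indrelid
                   AND a.attnum = ANY (i.indkey)
WHERE i.indrelid = 'public.some_table'::regclass  -- placeholder name
  AND i.indisprimary;
```

Criterion 3) is the same query against the non-primary unique indexes, ordered by column count.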
We wouldn't bear this complexity every time but only once: if
criterion 1) fails, after determining the "primary key" we should
execute a string which drops this same trigger and re-creates it,
passing it the names of the columns which were determined to be the pk
so that next time we don't go further than 1). This works; I tried it
out some time ago but never made the change (oops!). I mean, dropping
the trigger from the trigger itself (the docs say that 'alter trigger'
can only rename it).
For my case, all this headache was needed: it was specified as one of
the requirements of the logging system that every tuple should be
trackable. Using a recursive "with" query, it is possible to track
down the changes to any single tuple in the audited tables (or make a
function, whatever). Fortunately, they never specified a maximum time
for that ;). If instead we had made a string from the record, we
wouldn't have been able to easily track the tuple.
Note also the "rotate(character)" function. It hot-rotates the
audit/audet tables similarly to a logrotate program on *nix. At the
same time, you never stop logging, and you can dump the old table
before dropping it (to save space) and restore it somewhere else to
exploit it. We would usually rotate every 6 months or so, but that
interval keeps shrinking as the system grows.
The *real* system sees few I/U/D operations: it's not usual to have
more than one of those operations per second. For higher-volume
transactional systems I think this logging system would be more
noticeable (right now things go smoothly). As I said before, if you
have test cases they're very welcome.
Yup -- I get it now. Well, one point about this is that it seems
directed towards your personal requirements. This is a very 'heavy'
audit system that will not be suitable for high performance
transactional systems. That said, it looks well thought out. Storing
only the changed fields is pretty clever, but I can't help but wonder
if you're better off using arrays for that purpose:
create type audit_field_t as (field text, old_value text, new_value text);
and inside the audit table itself having
fields audit_field_t[],
and, if at all possible, constructing the array of audit fields in a
single expression. This will be much more compact than one record per
field -- normally, arrays in table definitions tend to be bad mojo but
this is one case they could be useful. Audit records are WORM, 'Write
Once Read Maybe', so compactness is important. Obviously, for 9.0+,
I would be rigging a solution around hstore for an 'all sql' solution
which is usually better if you can get away with it.
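A minimal sketch of that layout (all names here are illustrative, and the column is assumed to be an array of the composite type):

```sql
create type audit_field_t as (field text, old_value text, new_value text);

-- One row per audited operation; changed columns packed into an array.
create table audit (
    id bigserial primary key,
    tstamp timestamptz not null default now(),
    schema_name text not null,
    table_name text not null,
    operation "char" not null,       -- 'I', 'U' or 'D'
    fields audit_field_t[] not null  -- one element per changed column
);

-- e.g. logging an UPDATE that changed a single column:
insert into audit (schema_name, table_name, operation, fields)
values ('public', 'accounts', 'U',
        array[row('balance', '100', '150')::audit_field_t]);
```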
merlin
On 09/28/2011 08:34 AM, Tom Lane wrote:
Andrew Sullivan <ajs@crankycanuck.ca> writes:
On Wed, Sep 28, 2011 at 06:20:04AM -0700, David Fetter wrote:
There's an even better reason not to use rules: they're going away in
a not too distant version of PostgreSQL.
Really? How? I thought views were done using rules under the hood?
Also, it'd be awfully nice if, in case rules are going away, the
documentation actually said prominently in the rules section, "Rules
are deprecated and are planned to be removed at some point in the
future." Right now, anyone coming to Postgres for the first time
could easily understand the manual to say that Postgres has this cool
feature on which they can rely.
I think the true state of affairs is this: rules have a lot of
surprising behaviors, and if we could think of something that works more
straightforwardly, we'd love to replace them. But I think we'd have to
have the "something" in place before we consider deprecating rules.
At the moment we don't even have a glimmer of a design, so David's
statement is many years premature.
regards, tom lane
Yoda: "Damned by faint praise, these rules are"
Would this future something more likely be a totally different concept
or a re-implementation?
Is there a list of "gotchas" w.r.t. rules?
2011/9/28, Merlin Moncure <mmoncure@gmail.com>:
Yup -- I get it now. Well, one point about this is that it seems
directed towards your personal requirements. This is a very 'heavy'
audit system that will not be suitable for high performance
transactional systems. That said, it looks well thought out. Storing
only the changed fields is pretty clever, but I can't help but wonder
if you're better off using arrays for that purpose:
create type audit_field_t as (field text, old_value text, new_value text);
and inside the audit table itself having
fields audit_field_t[],
and, if at all possible, constructing the array of audit fields in a
single expression. This will be much more compact than one record per
field -- normally, arrays in table definitions tend to be bad mojo but
this is one case they could be useful. Audit records are WORM, 'Write
Once Read Maybe', so compactness is important. Obviously, for 9.0+,
I would be rigging a solution around hstore for an 'all sql' solution
which is usually better if you can get away with it.
merlin
Well that sounds pretty fair to me. But that flow would not allow me
to make partial indexes on primary key fields. As you can see in the
"audet" table, there's a column named "is_pk" which tells if that
column was considered a primary key at the moment of the logging.
Normally there's no indexes, but when I have to make some audits I do
the following:
1) Dump the audits.
2) Restore somewhere else.
3) Generate some indexes on: timestamp, schema|table, field|is_pk and
id (I think, I've got the procedure annotated too, but not here hehe).
This indexing is a pain sometimes, but even adding its cost to the
time it takes to run one query it is really cheap. Making the indexes
gets far more necessary if you run more than one query (which is
probably the case).
I had considered the solution you're posting, but it would be a
_real_ pain to run a query with 'unnest's and 'array_agg's. Also, note
that some of these may not be available in versions of PostgreSQL
prior to 8.4 (I think), so if you're planning to track the tuple you
won't be able to do it directly (maybe using temp tables).
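For what it's worth, on 8.4+ the unnest side need not be too painful. Assuming a `fields audit_field_t[]` column as suggested, a set-returning expression in the target list expands the arrays back to one row per field:

```sql
-- Explode the per-row arrays back into one row per changed field.
-- (SRF-in-target-list form, which works even before LATERAL existed.)
SELECT a.id, a.tstamp, (unnest(a.fields)).*
FROM audit a;
```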
But! All those arguments above get beaten by the one you asserted:
that "WORM" thing. You are definitely right about that. Logging in the
majority of cases should be meaningful, light to run,
compact/compressed, and rotated so that it doesn't eat up all your
space over time.
Having said that, I'm going to take your advice for the next version,
which I hope that also checks some TODO's in the list. When I get home
I'll send the current code attached and when I get some fresh air at
work I'll make the changes and post the new version.
Any other ideas for the new version? (get some previews in the TODO
list at the top of the perl trigger function in the attachment of the
next mail).
--
Diego Augusto Molina
diegoaugustomolina@gmail.com
ES: Por favor, evite adjuntar documentos de Microsoft Office. Serán
desestimados.
EN: Please, avoid attaching Microsoft Office documents. They shall be discarded.
LINK: http://www.gnu.org/philosophy/no-word-attachments.html
On Wed, Sep 28, 2011 at 11:45 AM, Diego Augusto Molina
<diegoaugustomolina@gmail.com> wrote:
2011/9/28, Merlin Moncure <mmoncure@gmail.com>:
Yup -- I get it now. Well, one point about this is that it seems
directed towards your personal requirements. This is a very 'heavy'
audit system that will not be suitable for high performance
transactional systems. That said, it looks well thought out. Storing
only the changed fields is pretty clever, but I can't help but wonder
if you're better off using arrays for that purpose:
create type audit_field_t as (field text, old_value text, new_value text);
and inside the audit table itself having
fields audit_field_t[],
and, if at all possible, constructing the array of audit fields in a
single expression. This will be much more compact than one record per
field -- normally, arrays in table definitions tend to be bad mojo but
this is one case they could be useful. Audit records are WORM, 'Write
Once Read Maybe', so compactness is important. Obviously, for 9.0+,
I would be rigging a solution around hstore for an 'all sql' solution
which is usually better if you can get away with it.
merlin
Well that sounds pretty fair to me. But that flow would not allow me
to make partial indexes on primary key fields. As you can see in the
"audet" table, there's a column named "is_pk" which tells if that
column was considered a primary key at the moment of the logging.
Normally there's no indexes, but when I have to make some audits I do
the following:
1) Dump the audits.
2) Restore somewhere else.
3) Generate some indexes on: timestamp, schema|table, field|is_pk and
id (I think, I've got the procedure annotated too, but not here hehe).
This indexing is a pain sometimes but even adding it to the time it
takes to run one query it is really cheap. Making the indexes gets far
more necessary if you run more than one query (which is probably the
case).
I had considered the solution you're posting, but it would get a
_real_ pain to run a query with 'unnest's and 'array_agg's. Also, note
that some of these may not be available in versions of PostgreSQL
prior to 8.4 (I think), so if you're planning to track the tupple you
won't be able to do it in clear (maybe using temp tables).
I disagree. unnest() and array_agg() (or, even better, array()
constructor syntax) are an absolute joy to work with and thinking in a
more functional way, which is usually the key to making things run
quickly. Also both functions are trivial to emulate in userland for
compatibility. Arrays of composites IIRC only go back to 8.3 so that
would be a true stopper for any solution in that vein.
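As an example of that userland emulation, unnest() itself (built in since 8.4) can be defined on 8.3 as:

```sql
-- 8.3-compatible stand-in for the built-in 8.4 unnest().
CREATE OR REPLACE FUNCTION unnest(anyarray) RETURNS SETOF anyelement AS $$
    SELECT $1[i]
    FROM generate_series(array_lower($1, 1), array_upper($1, 1)) AS i;
$$ LANGUAGE sql IMMUTABLE;
```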
As for the rest of it, I'd be looking to try and come up with an all
sql implementation. Also you should give an honest comparison between
what you've come up with vs. this:
http://pgfoundry.org/projects/tablelog/.
merlin
On Wed, Sep 28, 2011 at 10:53 AM, Rob Sargent <robjsargent@gmail.com> wrote:
On 09/28/2011 08:34 AM, Tom Lane wrote:
Andrew Sullivan <ajs@crankycanuck.ca> writes:
On Wed, Sep 28, 2011 at 06:20:04AM -0700, David Fetter wrote:
There's an even better reason not to use rules: they're going away in
a not too distant version of PostgreSQL.
Really? How? I thought views were done using rules under the hood?
Also, it'd be awfully nice if, in case rules are going away, the
documentation actually said prominently in the rules section, "Rules
are deprecated and are planned to be removed at some point in the
future." Right now, anyone coming to Postgres for the first time
could easily understand the manual to say that Postgres has this cool
feature on which they can rely.
I think the true state of affairs is this: rules have a lot of
surprising behaviors, and if we could think of something that works more
straightforwardly, we'd love to replace them. But I think we'd have to
have the "something" in place before we consider deprecating rules.
At the moment we don't even have a glimmer of a design, so David's
statement is many years premature.
regards, tom lane
Yoda: "Damned by faint praise, these rules are"
Would this future something more likely be a totally different concept
or a re-implementation?
Is there a list of "gotchas" w.r.t. rules?
yes, and it is huge -- see the archives. note that views are based on
rules (SELECT rules) and there is nothing wrong there -- so it's not
quite correct to say they will be going away completely.
view triggers removed the #1 thing that most people wanted to do with
rules, namely updatable views. the basic problem with rules is that
by hacking the sql you send to the server, you lose control over
various aspects of the statement that normally are fairly predictable.
they are almost impossible to get working properly -- I've tried many
times. aside from the multiple evaluation thing, you have bizarre
interactions with many sql features that came in later (like
RETURNING).
folks, don't use RULES! use triggers -- and as much as possible, keep
triggers simple, short, and to the point (simple validation, custom
RI, auditing/logging, etc).
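For reference, the 9.1-style replacement for an updatable-view rule is an INSTEAD OF trigger. A minimal sketch (all names illustrative):

```sql
-- 9.1+: make a view updatable with a trigger instead of a rule.
CREATE VIEW active_users AS
    SELECT id, name FROM users WHERE active;

CREATE FUNCTION active_users_ins() RETURNS trigger AS $$
BEGIN
    INSERT INTO users (id, name, active) VALUES (NEW.id, NEW.name, true);
    RETURN NEW;  -- RETURNING works here, unlike with rules
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER active_users_ins INSTEAD OF INSERT ON active_users
    FOR EACH ROW EXECUTE PROCEDURE active_users_ins();
```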
merlin
Hi,
folks, don't use RULES! use triggers -- and as much as possible, keep
triggers simple, short, and to the point (simple validation, custom
RI, auditing/logging, etc).
I like them :). 'DO INSTEAD' rules are great for partitioning so you
can insert (or update) to parent table and 'DO INSTEAD' rule takes
care about the rest.
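That pattern, roughly as sketched in the manual's partitioning chapter (names illustrative):

```sql
-- Route inserts on the parent table to a partition with a rule.
CREATE TABLE measurements (logdate date NOT NULL, value int);
CREATE TABLE measurements_2011
    (CHECK (logdate >= '2011-01-01' AND logdate < '2012-01-01'))
    INHERITS (measurements);

CREATE RULE measurements_ins_2011 AS
    ON INSERT TO measurements
    WHERE NEW.logdate >= '2011-01-01' AND NEW.logdate < '2012-01-01'
    DO INSTEAD
    INSERT INTO measurements_2011 VALUES (NEW.*);
```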
--
Ondrej Ivanic
(ondrej.ivanic@gmail.com)
-----Original Message-----
From: Ondrej Ivanič [mailto:ondrej.ivanic@gmail.com]
Sent: Wednesday, September 28, 2011 6:47 PM
To: pgsql-general@postgresql.org
Subject: Re: Rules going away
Hi,
folks, don't use RULES! use triggers -- and as much as possible, keep
triggers simple, short, and to the point (simple validation, custom
RI, auditing/logging, etc).
I like them :). 'DO INSTEAD' rules are great for partitioning so you
can insert (or update) to parent table and 'DO INSTEAD' rule takes
care about the rest.
--
Ondrej Ivanic
(ondrej.ivanic@gmail.com)
Years ago (PG version 8.2), I was working on implementing partitioned tables, and evaluated whether to use triggers or rules to "automagically" redirect inserts/updates/delete from main table to appropriate partition based on the value of partitioning key.
Unfortunately, I didn't document my research, but the result was that triggers were unusable for my purposes, and I ended up with rules calling functions in which, using dynamic SQL, I decide which partition should be used for the insert/update/delete (at least on "selects" PG takes care of choosing the proper partition, of course with the configuration parameter properly set).
These rules (and the functions they call) have worked perfectly for me all these years in many, many, many installations.
So, until PG takes care of the problem I'm solving with rules (implementing a "complete" partitioning feature, not just for "selects"), please leave rules where they are; they definitely have their use when properly implemented for a specific purpose.
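A hedged sketch of that approach, assuming a measurements(logdate date, value int) parent table (all names illustrative; EXECUTE ... USING needs 8.4+, so on 8.2 the statement string would have to be built with quote_literal() instead):

```sql
-- A DO INSTEAD rule hands the new row's columns to a function that
-- builds the INSERT against the right partition with dynamic SQL.
CREATE FUNCTION route_insert(d date, v int) RETURNS void AS $$
BEGIN
    EXECUTE 'INSERT INTO measurements_' || to_char(d, 'YYYY')
         || ' VALUES ($1, $2)' USING d, v;
END;
$$ LANGUAGE plpgsql;

CREATE RULE measurements_route AS
    ON INSERT TO measurements
    DO INSTEAD SELECT route_insert(NEW.logdate, NEW.value);
```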
Regards,
Igor Neyman
speaking of DO INSTEAD, for insert/update case. Try using RETURNING
with that and rules ;) Good luck
-----Original Message-----
From: Gregg Jaskiewicz [mailto:gryzman@gmail.com]
Sent: Friday, September 30, 2011 5:18 AM
To: Igor Neyman
Cc: Ondrej Ivanič; pgsql-general@postgresql.org
Subject: Re: [GENERAL] Rules going away
speaking of DO INSTEAD, for insert/update case. Try using RETURNING
with that and rules ;) Good luck
Well, I designed it for specific case.
And in my application I had no need for RETURNING clause.
That's why I'm saying, it works perfectly for me, but I'm not trying to generalize, like those who say "Rules are evil and don't ever use them".
Regards,
Igor Neyman