Writing Trigger Functions in C

Started by Charles Gomesabout 13 years ago10 messages
#1Charles Gomes
charlesrg@outlook.com

Hello guys,

I've been finding performance issues when using a trigger to modify inserts on a partitioned table.
If using the trigger the total time goes from 1 Hour to 4 hours.

The trigger is pretty simple:

CREATE OR REPLACE FUNCTION quotes_insert_trigger()
RETURNS trigger AS $$
BEGIN
EXECUTE 'INSERT INTO quotes_'|| to_char(new.received_time,'YYYY_MM_DD') ||' VALUES (($1).*)' USING NEW ;
RETURN NULL;
END;
$$
LANGUAGE plpgsql;

I've seen that some of you guys have worked on writing triggers in C.

Does anyone have had an experience writing a trigger for partitioning in C ?

If you have some code to paste so I can start from I will really appreciate.

Thanks

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#2Merlin Moncure
mmoncure@gmail.com
In reply to: Charles Gomes (#1)
Re: Writing Trigger Functions in C

On Fri, Dec 21, 2012 at 10:25 AM, Charles Gomes <charlesrg@outlook.com> wrote:

Hello guys,

I've been finding performance issues when using a trigger to modify inserts on a partitioned table.
If using the trigger the total time goes from 1 Hour to 4 hours.

The trigger is pretty simple:

CREATE OR REPLACE FUNCTION quotes_insert_trigger()
RETURNS trigger AS $$
BEGIN
EXECUTE 'INSERT INTO quotes_'|| to_char(new.received_time,'YYYY_MM_DD') ||' VALUES (($1).*)' USING NEW ;
RETURN NULL;
END;
$$
LANGUAGE plpgsql;

I've seen that some of you guys have worked on writing triggers in C.

Does anyone have had an experience writing a trigger for partitioning in C ?

If you have some code to paste so I can start from I will really appreciate.

Honestly I'd leave the trigger alone and modify the client code in
performance sensitive places to insert directly to the correct
partition table.

merlin

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#3Pavel Stehule
pavel.stehule@gmail.com
In reply to: Charles Gomes (#1)
Re: Writing Trigger Functions in C

Hello

you can find lot of examples in PostgreSQL source code - see
postgresql/contrib/spi directory

and documentation http://www.postgresql.org/docs/9.0/static/trigger-example.html

Regards

Pavel Stehule

2012/12/21 Charles Gomes <charlesrg@outlook.com>:

Hello guys,

I've been finding performance issues when using a trigger to modify inserts on a partitioned table.
If using the trigger the total time goes from 1 Hour to 4 hours.

The trigger is pretty simple:

CREATE OR REPLACE FUNCTION quotes_insert_trigger()
RETURNS trigger AS $$
BEGIN
EXECUTE 'INSERT INTO quotes_'|| to_char(new.received_time,'YYYY_MM_DD') ||' VALUES (($1).*)' USING NEW ;
RETURN NULL;
END;
$$
LANGUAGE plpgsql;

I've seen that some of you guys have worked on writing triggers in C.

Does anyone have had an experience writing a trigger for partitioning in C ?

If you have some code to paste so I can start from I will really appreciate.

Thanks

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#4Joe Conway
mail@joeconway.com
In reply to: Merlin Moncure (#2)
Re: Writing Trigger Functions in C

On 12/21/2012 08:39 AM, Merlin Moncure wrote:

On Fri, Dec 21, 2012 at 10:25 AM, Charles Gomes <charlesrg@outlook.com> wrote:

Hello guys,

I've been finding performance issues when using a trigger to modify inserts on a partitioned table.
If using the trigger the total time goes from 1 Hour to 4 hours.

The trigger is pretty simple:

CREATE OR REPLACE FUNCTION quotes_insert_trigger()
RETURNS trigger AS $$
BEGIN
EXECUTE 'INSERT INTO quotes_'|| to_char(new.received_time,'YYYY_MM_DD') ||' VALUES (($1).*)' USING NEW ;
RETURN NULL;
END;
$$
LANGUAGE plpgsql;

I've seen that some of you guys have worked on writing triggers in C.

Does anyone have had an experience writing a trigger for partitioning in C ?

If you have some code to paste so I can start from I will really appreciate.

Honestly I'd leave the trigger alone and modify the client code in
performance sensitive places to insert directly to the correct
partition table.

I second that recommendation -- your performance will be much, much, better.

Joe

--
Joe Conway
credativ LLC: http://www.credativ.us
Linux, PostgreSQL, and general Open Source
Training, Service, Consulting, & 24x7 Support

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#5Christopher Browne
cbbrowne@gmail.com
In reply to: Charles Gomes (#1)
Re: Writing Trigger Functions in C

On Fri, Dec 21, 2012 at 11:25 AM, Charles Gomes <charlesrg@outlook.com>
wrote:

Hello guys,

I've been finding performance issues when using a trigger to modify

inserts on a partitioned table.

If using the trigger the total time goes from 1 Hour to 4 hours.

The trigger is pretty simple:

CREATE OR REPLACE FUNCTION quotes_insert_trigger()
RETURNS trigger AS $
BEGIN
EXECUTE 'INSERT INTO quotes_'|| to_char(new.received_time,'YYYY_MM_DD')

||' VALUES (($1).*)' USING NEW ;

RETURN NULL;
END;
$
LANGUAGE plpgsql;

I've seen that some of you guys have worked on writing triggers in C.

Does anyone have had an experience writing a trigger for partitioning in

C ?

I'd want to be very careful about assuming that implementing the trigger
function in C
would necessarily improve performance. It's pretty likely that it wouldn't
help much,
as a fair bit of the cost of firing a trigger have to do with figuring out
which function to
call, marshalling arguments, and calling the function, none of which would
magically disappear by virtue of implementing in C.

A *major* cost that your existing implementation has is that it's
re-planning
the queries for every single invocation. This is an old, old problem from
the
Lisp days, "EVAL considered evil" <
http://stackoverflow.com/questions/2571401/why-exactly-is-eval-evil&gt;

The EXECUTE winds up replanning queries every time the trigger fires.

If you can instead enumerate the partitions explicitly, putting them into
(say) a
CASE clause, the planner could generate the plan once, rather than a
million
times, which would be a HUGE savings, vastly greater than you could expect
from
recoding into C.

The function might look more like:

create or replace function quotes_insert_trigger () returns trigger as $$
declare
c_rt text;
begin
c_rt := to_char(new.received_time, 'YYYY_MM_DD');
case c_rt
when '2012_03_01' then
insert into 2012_03_01 values (NEW.*) using new;
when '2012_03_02' then
insert into 2012_03_02 values (NEW.*) using new;
else
raise exception 'Need a new partition function for %', c_rt;
end case;
end $$ language plpgsql;

You'd periodically need to change the function to reflect the existing set
of
partitions, but that's cheaper than creating a new partition.

The case statement gets more expensive (in effect O(n) on the number of
partitions, n) as the number of partitions increases. You could split
the date into pieces (e.g. - years, months, days) to diminish that cost.

But at any rate, this should be *way* faster than what you're running now,
and not at any heinous change in development costs (as would likely
be the case reimplementing using SPI).
--
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"

#6Pavel Stehule
pavel.stehule@gmail.com
In reply to: Joe Conway (#4)
Re: Writing Trigger Functions in C

2012/12/21 Joe Conway <mail@joeconway.com>:

On 12/21/2012 08:39 AM, Merlin Moncure wrote:

On Fri, Dec 21, 2012 at 10:25 AM, Charles Gomes <charlesrg@outlook.com> wrote:

Hello guys,

I've been finding performance issues when using a trigger to modify inserts on a partitioned table.
If using the trigger the total time goes from 1 Hour to 4 hours.

The trigger is pretty simple:

CREATE OR REPLACE FUNCTION quotes_insert_trigger()
RETURNS trigger AS $$
BEGIN
EXECUTE 'INSERT INTO quotes_'|| to_char(new.received_time,'YYYY_MM_DD') ||' VALUES (($1).*)' USING NEW ;
RETURN NULL;
END;
$$
LANGUAGE plpgsql;

I've seen that some of you guys have worked on writing triggers in C.

Does anyone have had an experience writing a trigger for partitioning in C ?

If you have some code to paste so I can start from I will really appreciate.

Honestly I'd leave the trigger alone and modify the client code in
performance sensitive places to insert directly to the correct
partition table.

I second that recommendation -- your performance will be much, much, better.

sure

Pavel

Joe

--
Joe Conway
credativ LLC: http://www.credativ.us
Linux, PostgreSQL, and general Open Source
Training, Service, Consulting, & 24x7 Support

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#7Charles Gomes
charlesrg@outlook.com
In reply to: Christopher Browne (#5)
Re: Writing Trigger Functions in C

________________________________

Date: Fri, 21 Dec 2012 11:56:25 -0500
Subject: Re: [HACKERS] Writing Trigger Functions in C
From: cbbrowne@gmail.com
To: charlesrg@outlook.com
CC: pgsql-hackers@postgresql.org

On Fri, Dec 21, 2012 at 11:25 AM, Charles Gomes
<charlesrg@outlook.com<mailto:charlesrg@outlook.com>> wrote:

Hello guys,

I've been finding performance issues when using a trigger to modify

inserts on a partitioned table.

If using the trigger the total time goes from 1 Hour to 4 hours.

The trigger is pretty simple:

CREATE OR REPLACE FUNCTION quotes_insert_trigger()
RETURNS trigger AS $
BEGIN
EXECUTE 'INSERT INTO quotes_'||

to_char(new.received_time,'YYYY_MM_DD') ||' VALUES (($1).*)' USING NEW
;

RETURN NULL;
END;
$
LANGUAGE plpgsql;

I've seen that some of you guys have worked on writing triggers in C.

Does anyone have had an experience writing a trigger for partitioning

in C ?

I'd want to be very careful about assuming that implementing the
trigger function in C
would necessarily improve performance. It's pretty likely that it
wouldn't help much,
as a fair bit of the cost of firing a trigger have to do with figuring
out which function to
call, marshalling arguments, and calling the function, none of which would
magically disappear by virtue of implementing in C.

A *major* cost that your existing implementation has is that it's re-planning
the queries for every single invocation. This is an old, old problem
from the
Lisp days, "EVAL considered evil"
<http://stackoverflow.com/questions/2571401/why-exactly-is-eval-evil&gt;

The EXECUTE winds up replanning queries every time the trigger fires.

If you can instead enumerate the partitions explicitly, putting them
into (say) a
CASE clause, the planner could generate the plan once, rather than a million
times, which would be a HUGE savings, vastly greater than you could
expect from
recoding into C.

The function might look more like:

create or replace function quotes_insert_trigger () returns trigger as $$
declare
c_rt text;
begin
c_rt := to_char(new.received_time, 'YYYY_MM_DD');
case c_rt
when '2012_03_01' then
insert into 2012_03_01 values (NEW.*) using new;
when '2012_03_02' then
insert into 2012_03_02 values (NEW.*) using new;
else
raise exception 'Need a new partition function for %', c_rt;
end case;
end $$ language plpgsql;

You'd periodically need to change the function to reflect the existing set of
partitions, but that's cheaper than creating a new partition.

The case statement gets more expensive (in effect O(n) on the number of
partitions, n) as the number of partitions increases. You could split
the date into pieces (e.g. - years, months, days) to diminish that cost.

But at any rate, this should be *way* faster than what you're running now,
and not at any heinous change in development costs (as would likely
be the case reimplementing using SPI).
--
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"

I will change and implement it this way, I was not aware of such optimization.
Will post back after my benchmark runs.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#8Charles Gomes
charlesrg@outlook.com
In reply to: Charles Gomes (#7)
Re: Writing Trigger Functions in C

----------------------------------------

From: charlesrg@outlook.com
To: cbbrowne@gmail.com
CC: pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] Writing Trigger Functions in C
Date: Fri, 21 Dec 2012 12:27:26 -0500

________________________________

Date: Fri, 21 Dec 2012 11:56:25 -0500
Subject: Re: [HACKERS] Writing Trigger Functions in C
From: cbbrowne@gmail.com
To: charlesrg@outlook.com
CC: pgsql-hackers@postgresql.org

On Fri, Dec 21, 2012 at 11:25 AM, Charles Gomes
<charlesrg@outlook.com<mailto:charlesrg@outlook.com>> wrote:

Hello guys,

I've been finding performance issues when using a trigger to modify

inserts on a partitioned table.

If using the trigger the total time goes from 1 Hour to 4 hours.

The trigger is pretty simple:

CREATE OR REPLACE FUNCTION quotes_insert_trigger()
RETURNS trigger AS $
BEGIN
EXECUTE 'INSERT INTO quotes_'||

to_char(new.received_time,'YYYY_MM_DD') ||' VALUES (($1).*)' USING NEW
;

RETURN NULL;
END;
$
LANGUAGE plpgsql;

I've seen that some of you guys have worked on writing triggers in C.

Does anyone have had an experience writing a trigger for partitioning

in C ?

I'd want to be very careful about assuming that implementing the
trigger function in C
would necessarily improve performance. It's pretty likely that it
wouldn't help much,
as a fair bit of the cost of firing a trigger have to do with figuring
out which function to
call, marshalling arguments, and calling the function, none of which would
magically disappear by virtue of implementing in C.

A *major* cost that your existing implementation has is that it's re-planning
the queries for every single invocation. This is an old, old problem
from the
Lisp days, "EVAL considered evil"
<http://stackoverflow.com/questions/2571401/why-exactly-is-eval-evil&gt;

The EXECUTE winds up replanning queries every time the trigger fires.

If you can instead enumerate the partitions explicitly, putting them
into (say) a
CASE clause, the planner could generate the plan once, rather than a million
times, which would be a HUGE savings, vastly greater than you could
expect from
recoding into C.

The function might look more like:

create or replace function quotes_insert_trigger () returns trigger as $$
declare
c_rt text;
begin
c_rt := to_char(new.received_time, 'YYYY_MM_DD');
case c_rt
when '2012_03_01' then
insert into 2012_03_01 values (NEW.*) using new;
when '2012_03_02' then
insert into 2012_03_02 values (NEW.*) using new;
else
raise exception 'Need a new partition function for %', c_rt;
end case;
end $$ language plpgsql;

You'd periodically need to change the function to reflect the existing set of
partitions, but that's cheaper than creating a new partition.

The case statement gets more expensive (in effect O(n) on the number of
partitions, n) as the number of partitions increases. You could split
the date into pieces (e.g. - years, months, days) to diminish that cost.

But at any rate, this should be *way* faster than what you're running now,
and not at any heinous change in development costs (as would likely
be the case reimplementing using SPI).
--
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"

I will change and implement it this way, I was not aware of such optimization.
Will post back after my benchmark runs.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

I've to have add 2 weeks of data at a time, therefore I had to keep two weeks of case statements
Replaced the short trigger function to:

CREATE OR REPLACE FUNCTION quotes_insert_trigger()
RETURNS trigger AS $$
DECLARE
r_date text;
BEGIN
r_date = to_char(new.received_time, 'YYYY_MM_DD');
case r_date
    when '2012_09_10' then
        insert into quotes_2012_09_10 values (NEW.*) using new;
        return;
    when '2012_09_11' then
        insert into quotes_2012_09_11 values (NEW.*) using new;
        return;
    when '2012_09_12' then
        insert into quotes_2012_09_12 values (NEW.*) using new;
        return;
    when '2012_09_13' then
        insert into quotes_2012_09_13 values (NEW.*) using new;
        return;
    when '2012_09_14' then
        insert into quotes_2012_09_14 values (NEW.*) using new;
        return;
    when '2012_09_15' then
        insert into quotes_2012_09_15 values (NEW.*) using new;
        return;
    when '2012_09_16' then
        insert into quotes_2012_09_16 values (NEW.*) using new;
        return;
    when '2012_09_17' then
        insert into quotes_2012_09_17 values (NEW.*) using new;
        return;
    when '2012_09_18' then
        insert into quotes_2012_09_18 values (NEW.*) using new;
        return;
    when '2012_09_19' then
        insert into quotes_2012_09_19 values (NEW.*) using new;
        return;
    when '2012_09_20' then
        insert into quotes_2012_09_20 values (NEW.*) using new;
        return;
    when '2012_09_21' then
        insert into quotes_2012_09_21 values (NEW.*) using new;
        return;
    when '2012_09_22' then
        insert into quotes_2012_09_22 values (NEW.*) using new;
        return;
    when '2012_09_23' then
        insert into quotes_2012_09_23 values (NEW.*) using new;
        return;
    when '2012_09_24' then
        insert into quotes_2012_09_24 values (NEW.*) using new;
        return;
end case
RETURN NULL;
END;
$$
LANGUAGE plpgsql;

And I had no performance improvements at all.
Took the same time as with the previous EXECUTE statement;

I don't see what am I doing wrong.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#9Robert Haas
robertmhaas@gmail.com
In reply to: Charles Gomes (#8)
Re: Writing Trigger Functions in C

On Mon, Dec 24, 2012 at 10:43 AM, Charles Gomes <charlesrg@outlook.com> wrote:

And I had no performance improvements at all.
Took the same time as with the previous EXECUTE statement;

I don't see what am I doing wrong.

You might not be doing anything wrong. Triggers ARE slow.

If you have "perf" on your system, you could use "perf top" or "perf
record -a" to find out where the CPU time is going while you're doing
stuff that fires this trigger. That might provide some clues about
how to optimize. But it may be that you'll get a completely flat
profile, or something that otherwise boils down to ... triggers are
slow.

In answer to your original question, there is a C language trigger in
contrib/tcn. But, without some proof that the use of PL/pgsql is the
problem, I don't know how far down that road it's worth going. It
might be worth writing a C trigger that does nothing but return the
original tuple, or even a PL/pgsql one. This obviously wouldn't
accomplish anything as far as partitioning goes, but it would let you
measure the overhead of calling a no-op trigger, which could be a
useful thing to know.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#10Charles Gomes
charlesrg@outlook.com
In reply to: Robert Haas (#9)
Re: Writing Trigger Functions in C

----------------------------------------

Date: Sat, 29 Dec 2012 23:45:06 -0500
Subject: Re: [HACKERS] Writing Trigger Functions in C
From: robertmhaas@gmail.com
To: charlesrg@outlook.com
CC: cbbrowne@gmail.com; pgsql-hackers@postgresql.org

On Mon, Dec 24, 2012 at 10:43 AM, Charles Gomes <charlesrg@outlook.com> wrote:

And I had no performance improvements at all.
Took the same time as with the previous EXECUTE statement;

I don't see what am I doing wrong.

You might not be doing anything wrong. Triggers ARE slow.

If you have "perf" on your system, you could use "perf top" or "perf
record -a" to find out where the CPU time is going while you're doing
stuff that fires this trigger. That might provide some clues about
how to optimize. But it may be that you'll get a completely flat
profile, or something that otherwise boils down to ... triggers are
slow.

In answer to your original question, there is a C language trigger in
contrib/tcn. But, without some proof that the use of PL/pgsql is the
problem, I don't know how far down that road it's worth going. It
might be worth writing a C trigger that does nothing but return the
original tuple, or even a PL/pgsql one. This obviously wouldn't
accomplish anything as far as partitioning goes, but it would let you
measure the overhead of calling a no-op trigger, which could be a
useful thing to know.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

I've translated the trigger to C and performance had not increased, just like you guys said. I've created an article with the trigger and the metrics in case anyone becomes interested in the future http://www.charlesrg.com/linux/71-postgresql-partitioning-the-database-the-fastest-way

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers