Why plpython functions increase transaction counter much more then plpgsql functions?

Started by Michał Albrychtover 1 year ago4 messagesgeneral
Jump to latest
#1Michał Albrycht
michalalbrycht@gmail.com

I'm trying to understand why plpython function has much bigger impact on
transaction counter in Postgres than plpgSQL function. Below is example
which uses 2 functions:

Version with plpgSQL (each part done in separate transactions one after
another)
- check txid_current
- SQL query which calls the `f1_plpgsql` function which calls the
`insert_row_to_db` function 100 times
- check txid_current

Then we compare txid_currnent values and difference is 2 which means that
whole sql with 100 calls to `f1_plpgsql` and `insert_row_to_db` increased
transaction counter only by 1.

Here is the code:
```
CREATE TABLE insert_rows_table(
i BIGINT
);

CREATE OR REPLACE FUNCTION insert_row_to_db(i BIGINT)
RETURNS VOID
AS $$
BEGIN
INSERT INTO insert_rows_table SELECT i;
END
$$ LANGUAGE plpgsql SECURITY DEFINER VOLATILE PARALLEL UNSAFE;

CREATE OR REPLACE FUNCTION f1_plpgsql(i BIGINT)
RETURNS bigint
AS $$
BEGIN
PERFORM insert_row_to_db(i);
RETURN i;
END
$$ LANGUAGE plpgsql SECURITY DEFINER VOLATILE PARALLEL UNSAFE;

SELECT txid_current();
SELECT f1_plpgsql(i::BIGINT) FROM generate_series(1,100) as i;
SELECT txid_current();
```

Example output:

txid_current
500

f1_plpgsql
1
2
...
99
100

txid_current
502

Here is a code reproduction on db-fiddle:
https://www.db-fiddle.com/f/4jyoMCicNSZpjMt4jFYoz5/15135

Now let's replace `f1_plpgsql` with function written in plpython:

```
CREATE OR REPLACE FUNCTION f1_plpython(i BIGINT)
RETURNS bigint
AS $$
rows = plpy.execute("SELECT insert_row_to_db(" + str(i) + ")")
return i
$$ LANGUAGE plpython3u SECURITY DEFINER VOLATILE PARALLEL UNSAFE;
```

I get:

txid_current
500

f1_plpgsql
1
2
...
99
100

txid_current
602

This proves that the plpython function affects the transaction counter much
more. Does anyone know why? Is there anything I can do about it?

What's interesting it happens only if the function called by plpyhon makes
changes to DB. When I replace `INSERT INTO insert_rows_table SELECT i;`
with `SELECT i` both plpython and plpgsql functions behave the same.
Regards,

Michał Albrycht

#2Ron
ronljohnsonjr@gmail.com
In reply to: Michał Albrycht (#1)
Re: Why plpython functions increase transaction counter much more then plpgsql functions?

Because the plpython function is executing dynamic SQL?

On Fri, Nov 8, 2024 at 2:59 AM Michał Albrycht <michalalbrycht@gmail.com>
wrote:

I'm trying to understand why plpython function has much bigger impact on
transaction counter in Postgres than plpgSQL function. Below is example
which uses 2 functions:

Version with plpgSQL (each part done in separate transactions one after
another)
- check txid_current
- SQL query which calls the `f1_plpgsql` function which calls the
`insert_row_to_db` function 100 times
- check txid_current

Then we compare txid_currnent values and difference is 2 which means that
whole sql with 100 calls to `f1_plpgsql` and `insert_row_to_db` increased
transaction counter only by 1.

Here is the code:
```
CREATE TABLE insert_rows_table(
i BIGINT
);

CREATE OR REPLACE FUNCTION insert_row_to_db(i BIGINT)
RETURNS VOID
AS $$
BEGIN
INSERT INTO insert_rows_table SELECT i;
END
$$ LANGUAGE plpgsql SECURITY DEFINER VOLATILE PARALLEL UNSAFE;

CREATE OR REPLACE FUNCTION f1_plpgsql(i BIGINT)
RETURNS bigint
AS $$
BEGIN
PERFORM insert_row_to_db(i);
RETURN i;
END
$$ LANGUAGE plpgsql SECURITY DEFINER VOLATILE PARALLEL UNSAFE;

SELECT txid_current();
SELECT f1_plpgsql(i::BIGINT) FROM generate_series(1,100) as i;
SELECT txid_current();
```

Example output:

txid_current
500

f1_plpgsql
1
2
...
99
100

txid_current
502

Here is a code reproduction on db-fiddle:
https://www.db-fiddle.com/f/4jyoMCicNSZpjMt4jFYoz5/15135

Now let's replace `f1_plpgsql` with function written in plpython:

```
CREATE OR REPLACE FUNCTION f1_plpython(i BIGINT)
RETURNS bigint
AS $$
rows = plpy.execute("SELECT insert_row_to_db(" + str(i) + ")")
return i
$$ LANGUAGE plpython3u SECURITY DEFINER VOLATILE PARALLEL UNSAFE;
```

I get:

txid_current
500

f1_plpgsql
1
2
...
99
100

txid_current
602

This proves that the plpython function affects the transaction counter
much more. Does anyone know why? Is there anything I can do about it?

What's interesting it happens only if the function called by plpyhon makes
changes to DB. When I replace `INSERT INTO insert_rows_table SELECT i;`
with `SELECT i` both plpython and plpgsql functions behave the same.
Regards,

Michał Albrycht

--
Death to <Redacted>, and butter sauce.
Don't boil me, I'm still alive.
<Redacted> lobster!

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Michał Albrycht (#1)
Re: Why plpython functions increase transaction counter much more then plpgsql functions?

=?UTF-8?Q?Micha=C5=82_Albrycht?= <michalalbrycht@gmail.com> writes:

This proves that the plpython function affects the transaction counter much
more. Does anyone know why?

A quick look at PLy_spi_execute_query shows that it runs each command
in a subtransaction. It pretty much has to, because the coding rules
for a Python method don't permit it to just longjmp out of the Python
interpreter, so it has to set up a subtransaction so it can catch any
error. In this example, each subtransaction will consume an XID.

The plpgsql example is different because it doesn't trap errors,
hence no subtransaction needed, and all the rows will get inserted
under the XID of the outer command's main transaction. If you were
to wrap the insert_row_to_db call in BEGIN ... EXCEPTION then it'd
consume the same number of XIDs as plpython, for the same reason.

Is there anything I can do about it?

Batch the DB updates, perhaps?

What's interesting it happens only if the function called by plpyhon makes
changes to DB.

Totally unsurprising. XIDs are acquired only when the current
transaction or subtransaction first needs to change the DB.

regards, tom lane

#4Michał Albrycht
michalalbrycht@gmail.com
In reply to: Tom Lane (#3)
Re: Why plpython functions increase transaction counter much more then plpgsql functions?

A quick look at PLy_spi_execute_query shows that it runs each command
in a subtransaction. It pretty much has to, because the coding rules
for a Python method don't permit it to just longjmp out of the Python
interpreter, so it has to set up a subtransaction so it can catch any
error. In this example, each subtransaction will consume an XID.

That makes sense. Thank you very much.

Regards,

Michał Albrycht