Invoking SQL function while doing CREATE OR REPLACE on it

Started by Nagendra Mahesh (namahesh)almost 3 years ago4 messagesgeneral
Jump to latest

I have a Postgres 14.4 cluster (AWS Aurora) to which I connect from my application using JDBC.

I use liquibase for schema management - not only tables, but also a bunch of SQL stored procedures and functions. Basically, there is one liquibase changeSet that runs last and executes a set of SQL files which contain stored procedures and functions.

CREATE OR REPLACE FUNCTION bar(arg1 text, arg2 text) RETURNS record LANGUAGE "plpgsql" AS '
BEGIN
// function body
END;
';

These functions / procedures are replaced ONLY when there is a change in one / more SQL files which are part of this changeSet. (runOnChange: true).

Whenever I do a rolling deployment of my application (say, with a change in the function body of bar()), liquibase will execute the CREATE OR REPLACE FUNCTION bar() as part of a transaction.

In the few milliseconds while bar() is being replaced, there are other ongoing transactions (from other replicas of my application) which are continuously trying to invoke bar().

Only in this tiny time window, few transactions fail with the following error:

ERROR: function bar(arg1 => text, arg2 => text) does not exist
Hint: No function matches the given name and argument types. You might need to add explicit type casts.
Position: 4 : errorCode = 42883

I don't want any of these transactions to fail (we do not have any proper way of re-trying them from the application layer).
This is seen as affecting availability.
However, it is acceptable for these transactions to BLOCK (for a few hundred ms) while the SQL function body is being replaced, and then proceed with invocation.

Is there a way to safely modify a stored function / procedure in PostgreSQL while that function / procedure is being invoked continuously by multiple transactions?

Thanks!

#2Erik Wienhold
ewie@ewie.name
In reply to: Nagendra Mahesh (namahesh) (#1)
Re: Invoking SQL function while doing CREATE OR REPLACE on it

On 03/05/2023 20:17 CEST Nagendra Mahesh (namahesh) <namahesh@cisco.com> wrote:

I have a Postgres 14.4 cluster (AWS Aurora) to which I connect from my
application using JDBC.

I use liquibase for schema management - not only tables, but also a bunch of
SQL stored procedures and functions. Basically, there is one liquibase
changeSet that runs last and executes a set of SQL files which contain stored
procedures and functions.

CREATE OR REPLACE FUNCTION bar(arg1 text, arg2 text) RETURNS record LANGUAGE "plpgsql" AS '
BEGIN
// function body
END;
';

These functions / procedures are replaced ONLY when there is a change in one /
more SQL files which are part of this changeSet. (runOnChange: true).

Whenever I do a rolling deployment of my application (say, with a change in
the function body of bar()), liquibase will execute the CREATE OR REPLACE FUNCTION bar()
as part of a transaction.

In the few milliseconds while bar() is being replaced, there are other ongoing
transactions (from other replicas of my application) which are continuously
trying to invoke bar().

Only in this tiny time window, few transactions fail with the following error:

ERROR: function bar(arg1 => text, arg2 => text) does not exist
Hint: No function matches the given name and argument types. You might need to add explicit type casts.
Position: 4 : errorCode = 42883

CREATE OR REPLACE FUNCTION should be atomic and cannot change the function
signature. I don't see how a function cannot exist at some point in this case.

Are you sure that Liquibase is not dropping the function before re-creating it?
If Liquibase drops and re-creates the function in separate transactions, the
transactions trying to execute that function may find it dropped when using the
read committed isolation level.

There's also a race condition bug in v14.4 that may be relevant. It got fixed
in v14.5. See "Fix race condition when checking transaction visibility" in
https://www.postgresql.org/docs/14/release-14-5.html.

--
Erik

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Erik Wienhold (#2)
Re: Invoking SQL function while doing CREATE OR REPLACE on it

Erik Wienhold <ewie@ewie.name> writes:

On 03/05/2023 20:17 CEST Nagendra Mahesh (namahesh) <namahesh@cisco.com> wrote:

I have a Postgres 14.4 cluster (AWS Aurora) to which I connect from my
application using JDBC.

Aurora uses, I believe, some other storage engine entirely than
community Postgres has.

Only in this tiny time window, few transactions fail with the following error:
ERROR: function bar(arg1 => text, arg2 => text) does not exist
Hint: No function matches the given name and argument types. You might need to add explicit type casts.

There's also a race condition bug in v14.4 that may be relevant. It got fixed
in v14.5. See "Fix race condition when checking transaction visibility" in
https://www.postgresql.org/docs/14/release-14-5.html.

That race could easily explain this symptom: during the update, there
are two versions of the function's pg_proc row, and it could be that
both of them appear "dead" to an onlooker transaction if one of them
is inspected during the race window. Then the onlooker would find
no live version of the row and report that it doesn't exist.

But having said that, it's not clear to me whether Aurora's storage
engine shares this bug with community PG, or you're seeing some
independent bug of theirs that happens to have a similar symptom.
It's even less clear whether AWS would have applied the fix yet if it
is a shared bug. You really need to discuss this with AWS support.

regards, tom lane

In reply to: Erik Wienhold (#2)
Re: Invoking SQL function while doing CREATE OR REPLACE on it

Thanks for the insight, Erik.
Turns out, it was indeed liquibase which was:
- dropping the functions in one transaction (inadvertently as the result of a DROP TYPE CASCADE)
- immediately followed by the next transaction which did a CREATE OR REPLACE FUNCTION which recreated these functions

The tiny gap in time after the first transaction commit, before the second transaction could commit was when the function actually ceased to exist.
So, the invocations during that time failed.

Thanks again.

From: Erik Wienhold <ewie@ewie.name>
Sent: Thursday, May 4, 2023 2:33 AM
To: Nagendra Mahesh (namahesh) <namahesh@cisco.com>; pgsql-general@postgresql.org <pgsql-general@postgresql.org>
Subject: Re: Invoking SQL function while doing CREATE OR REPLACE on it
 

On 03/05/2023 20:17 CEST Nagendra Mahesh (namahesh) <namahesh@cisco.com> wrote:

I have a Postgres 14.4 cluster (AWS Aurora) to which I connect from my
application using JDBC.

I use liquibase for schema management - not only tables, but also a bunch of
SQL stored procedures and functions. Basically, there is one liquibase
changeSet that runs last and executes a set of SQL files which contain stored
procedures and functions.

CREATE OR REPLACE FUNCTION bar(arg1 text, arg2 text) RETURNS record LANGUAGE "plpgsql" AS '
BEGIN
    // function body
END;
';

These functions / procedures are replaced ONLY when there is a change in one /
more SQL files which are part of this changeSet. (runOnChange: true).

Whenever I do a rolling deployment of my application (say, with a change in
the function body of bar()), liquibase will execute the CREATE OR REPLACE FUNCTION bar()
as part of a transaction.

In the few milliseconds while bar() is being replaced, there are other ongoing
transactions (from other replicas of my application) which are continuously
trying to invoke bar().

Only in this tiny time window, few transactions fail with the following error:

ERROR: function bar(arg1 => text, arg2 => text) does not exist
   Hint: No function matches the given name and argument types. You might need to add explicit type casts.
Position: 4 : errorCode = 42883

CREATE OR REPLACE FUNCTION should be atomic and cannot change the function
signature.  I don't see how a function cannot exist at some point in this case.

Are you sure that Liquibase is not dropping the function before re-creating it?
If Liquibase drops and re-creates the function in separate transactions, the
transactions trying to execute that function may find it dropped when using the
read committed isolation level.

There's also a race condition bug in v14.4 that may be relevant.  It got fixed
in v14.5.  See "Fix race condition when checking transaction visibility" in
https://www.postgresql.org/docs/14/release-14-5.html.

--
Erik