Keeping state in a foreign data wrapper

Started by Stelios Sfakianakisover 5 years ago8 messagesgeneral
Jump to latest
#1Stelios Sfakianakis
sgsfak@gmail.com

Hi,

I am trying to implement a FDW in Postgres for accessing a web api and I would like to keep information like for example the total number of requests submiited. Ideally these data should be kept “per-user” and of course with the proper locking to eliminate race conditions. So the question I have is how to design such a global (but per user and database) state, using the C FDW API of Postgres. I have gone through the API and for example I see various fdw_private fields in the query planning structures and callbacks but these do not seem to be relevant to my requirements. Another possiblity seems to be to use shared memory (https://www.postgresql.org/docs/12/xfunc-c.html#id-1.8.3.13.14) but it is even less clear how to do it.

So is there any simple way to do implement such shared thread/process safe state?
Thank you
Stelios

#2Ian Lawrence Barwick
barwick@gmail.com
In reply to: Stelios Sfakianakis (#1)
Re: Keeping state in a foreign data wrapper

2020年8月4日(火) 1:24 Stelios Sfakianakis <sgsfak@gmail.com>:

Hi,

I am trying to implement a FDW in Postgres for accessing a web api and I would like to keep information like for example the total number of requests submiited. Ideally these data should be kept “per-user” and of course with the proper locking to eliminate race conditions. So the question I have is how to design such a global (but per user and database) state, using the C FDW API of Postgres. I have gone through the API and for example I see various fdw_private fields in the query planning structures and callbacks but these do not seem to be relevant to my requirements. Another possiblity seems to be to use shared memory (https://www.postgresql.org/docs/12/xfunc-c.html#id-1.8.3.13.14) but it is even less clear how to do it.

Shared memory would probably work; you'd need to load the FDW via
"shared_preload_libraries" and have the FDW handler function(s) update
shared memory with whatever statistics you want to track. You could
then define SQL functions to retrieve the stored values, and possibly
persist them over server restarts by storing/retrieving them from a
file.

Look at "pg_stat_statements" for a good example of how to do that kind of thing.

Regards

Ian Barwick

--
Ian Barwick https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#3Stelios Sfakianakis
sgsfak@gmail.com
In reply to: Ian Lawrence Barwick (#2)
Re: Keeping state in a foreign data wrapper

Thank you Ian for the prompt reply! I will certainly have a look at pg_stat_statements

I also came across mysql_fdw (https://github.com/EnterpriseDB/mysql_fdw <https://github.com/EnterpriseDB/mysql_fdw&gt;) that features a connection pool shared across queries. It uses a hash table with the serverid and userid as lookup key : https://github.com/EnterpriseDB/mysql_fdw/blob/REL-2_5_4/connection.c#L55 <https://github.com/EnterpriseDB/mysql_fdw/blob/REL-2_5_4/connection.c#L55&gt; The hash table is allocated in the cache memory context but it worries me that 1) no locks are used, 2) the "ConnectionHash" variable is declared static so in the multi-process architecture of Postgres could have been the case that multiple copies of this exist when the shared library of mysql_fdw is loaded?

Best regards
Stelios

Show quoted text

On 4 Aug 2020, at 06:25, Ian Lawrence Barwick <barwick@gmail.com> wrote:

2020年8月4日(火) 1:24 Stelios Sfakianakis <sgsfak@gmail.com>:

Hi,

I am trying to implement a FDW in Postgres for accessing a web api and I would like to keep information like for example the total number of requests submiited. Ideally these data should be kept “per-user” and of course with the proper locking to eliminate race conditions. So the question I have is how to design such a global (but per user and database) state, using the C FDW API of Postgres. I have gone through the API and for example I see various fdw_private fields in the query planning structures and callbacks but these do not seem to be relevant to my requirements. Another possiblity seems to be to use shared memory (https://www.postgresql.org/docs/12/xfunc-c.html#id-1.8.3.13.14) but it is even less clear how to do it.

Shared memory would probably work; you'd need to load the FDW via
"shared_preload_libraries" and have the FDW handler function(s) update
shared memory with whatever statistics you want to track. You could
then define SQL functions to retrieve the stored values, and possibly
persist them over server restarts by storing/retrieving them from a
file.

Look at "pg_stat_statements" for a good example of how to do that kind of thing.

Regards

Ian Barwick

--
Ian Barwick https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#4Ian Lawrence Barwick
barwick@gmail.com
In reply to: Stelios Sfakianakis (#3)
Re: Keeping state in a foreign data wrapper

2020年8月4日(火) 14:54 Stelios Sfakianakis <sgsfak@gmail.com>:

On 4 Aug 2020, at 06:25, Ian Lawrence Barwick <barwick@gmail.com> wrote:

2020年8月4日(火) 1:24 Stelios Sfakianakis <sgsfak@gmail.com>:

Hi,

I am trying to implement a FDW in Postgres for accessing a web api and I would like to keep information like for example the total number of requests submiited. Ideally these data should be kept “per-user” and of course with the proper locking to eliminate race conditions. So the question I have is how to design such a global (but per user and database) state, using the C FDW API of Postgres. I have gone through the API and for example I see various fdw_private fields in the query planning structures and callbacks but these do not seem to be relevant to my requirements. Another possiblity seems to be to use shared memory (https://www.postgresql.org/docs/12/xfunc-c.html#id-1.8.3.13.14) but it is even less clear how to do it.

Shared memory would probably work; you'd need to load the FDW via
"shared_preload_libraries" and have the FDW handler function(s) update
shared memory with whatever statistics you want to track. You could
then define SQL functions to retrieve the stored values, and possibly
persist them over server restarts by storing/retrieving them from a
file.

Look at "pg_stat_statements" for a good example of how to do that kind of thing.

Thank you Ian for the prompt reply! I will certainly have a look at pg_stat_statements

I also came across mysql_fdw (https://github.com/EnterpriseDB/mysql_fdw) that features a connection pool shared across queries. It uses a hash table with the serverid and userid as lookup key : https://github.com/EnterpriseDB/mysql_fdw/blob/REL-2_5_4/connection.c#L55

This is essentially the same as what "postgres_fdw" and similar FDW
implementations do.

The hash table is allocated in the cache memory context but it worries me that 1) no locks are used, 2) the "ConnectionHash" variable is declared static so in the multi-process architecture of Postgres could have been the case that multiple copies of this exist when the shared library of mysql_fdw is loaded?

The hash table is specific to each running backend so will only be
accessed by that process.

Pre-loading a shared library just gives the library an opportunity to
set up shared memory etc. You can always try adding one of the FDW
libraries to "shared_preload_libraries" and see what happens
(theoretically nothing).

Regards

Ian Barwick

Show quoted text

Best regards
Stelios

Regards

Ian Barwick

--
Ian Barwick https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#5Stelios Sfakianakis
sgsfak@gmail.com
In reply to: Ian Lawrence Barwick (#4)
Re: Keeping state in a foreign data wrapper

Thank you again, I have another question in order to make sure I have a clear understanding:

On 4 Aug 2020, at 11:24, Ian Lawrence Barwick <barwick@gmail.com> wrote:

The hash table is specific to each running backend so will only be
accessed by that process.

Pre-loading a shared library just gives the library an opportunity to
set up shared memory etc. You can always try adding one of the FDW
libraries to "shared_preload_libraries" and see what happens
(theoretically nothing).

My impression was that since each client (e.g. libpq) connection results in the creation of a Postgres process in the backend (https://www.postgresql.org/developer/backend/) then this (mysql) "connection pool" hash table is not global per se and shared among the different client / users sessions. But that defeats the purpose, no?

Thank you
Best
Stelios

#6Ian Lawrence Barwick
barwick@gmail.com
In reply to: Stelios Sfakianakis (#5)
Re: Keeping state in a foreign data wrapper

On 2020/08/04 19:21, Stelios Sfakianakis wrote:

Thank you again, I have another question in order to make sure I have a clear understanding:

On 4 Aug 2020, at 11:24, Ian Lawrence Barwick <barwick@gmail.com> wrote:

The hash table is specific to each running backend so will only be
accessed by that process.

Pre-loading a shared library just gives the library an opportunity to
set up shared memory etc. You can always try adding one of the FDW
libraries to "shared_preload_libraries" and see what happens
(theoretically nothing).

My impression was that since each client (e.g. libpq) connection results in the creation of a Postgres process in the backend (https://www.postgresql.org/developer/backend/) then this (mysql) "connection pool" hash table is not global per se and shared among the different client / users sessions.

Correct, the connections are specific to each individual backend.

But that defeats the purpose, no?

The purpose is to cache connections within the session, to avoid the overhead
of reconnecting to the remote server each time a query for that server is issued
in that session.

Regards

Ian Barwick

--
Ian Barwick https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#7Ian Lawrence Barwick
barwick@gmail.com
In reply to: Stelios Sfakianakis (#5)
Re: Keeping state in a foreign data wrapper

On 2020/08/04 19:21, Stelios Sfakianakis wrote:

Thank you again, I have another question in order to make sure I have a clear understanding:

On 4 Aug 2020, at 11:24, Ian Lawrence Barwick <barwick@gmail.com> wrote:

The hash table is specific to each running backend so will only be
accessed by that process.

Pre-loading a shared library just gives the library an opportunity to
set up shared memory etc. You can always try adding one of the FDW
libraries to "shared_preload_libraries" and see what happens
(theoretically nothing).

My impression was that since each client (e.g. libpq) connection results in the creation of a Postgres process in the backend (https://www.postgresql.org/developer/backend/) then this (mysql) "connection pool" hash table is not global per se and shared among the different client / users sessions.

Correct, the connections are specific to each individual backend.

But that defeats the purpose, no?

The purpose is to cache connections within the session, to avoid the overhead
of reconnecting to the remote server each time a query for that server is issued
in that session.

Regards

Ian Barwick

--
Ian Barwick https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#8Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Stelios Sfakianakis (#1)
Re: Keeping state in a foreign data wrapper

On Mon, 2020-08-03 at 19:23 +0300, Stelios Sfakianakis wrote:

I am trying to implement a FDW in Postgres for accessing a web api and I would
like to keep information like for example the total number of requests submiited.
Ideally these data should be kept “per-user” and of course with the proper locking
to eliminate race conditions. So the question I have is how to design such a global
(but per user and database) state, using the C FDW API of Postgres. I have gone
through the API and for example I see various fdw_private fields in the query
planning structures and callbacks but these do not seem to be relevant to my requirements.
Another possiblity seems to be to use shared memory
(https://www.postgresql.org/docs/12/xfunc-c.html#id-1.8.3.13.14) but it is even
less clear how to do it.

So is there any simple way to do implement such shared thread/process safe state?

I would create a table as part of the extension and use SPI to store
the data there.

Yours,
Laurenz Albe