Using Ephemeral Named Relation like a temporary table

Started by Yugo NAGATAalmost 3 years ago3 messages
#1Yugo NAGATA
nagata@sraoss.co.jp
1 attachment(s)

Hello,

Temporary tables are often used to store transient data in
batch processing and the contents can be accessed multiple
times. However, frequent use of temporary tables has a problem
that the system catalog tends to bloat. I know there has been
several proposals to attack this problem, but I would like to
propose a new one.

The idea is to use Ephemeral Named Relation (ENR) like a
temporary table. ENR information is not stored into the system
catalog, but in QueryEnvironment, so it never bloat the system
catalog.

Although we cannot perform insert, update or delete on ENR,
I wonder it could be beneficial if we need to reference to a
result of a query multiple times in a batch processing.

The attached is a concept patch. This adds a new syntax
"OPEN cursor INTO TABLE tablename" to pl/pgSQL, that stores
a result of the cursor query into a ENR with specified name.
However, this is a tentative interface to demonstrate the
concept of feature.

Here is an example;

postgres=# \sf fnc
CREATE OR REPLACE FUNCTION public.fnc()
RETURNS TABLE(sum1 integer, avg1 integer, sum2 integer, avg2 integer)
LANGUAGE plpgsql
AS $function$
DECLARE
sum1 integer;
sum2 integer;
avg1 integer;
avg2 integer;
curs CURSOR FOR SELECT aid, bid, abalance FROM pgbench_accounts
WHERE abalance BETWEEN 100 AND 200;
BEGIN
OPEN curs INTO TABLE tmp_accounts;
SELECT count(abalance) , avg(abalance) INTO sum1, avg1
FROM tmp_accounts;
SELECT count(bbalance), avg(bbalance) INTO sum2, avg2
FROM tmp_accounts a, pgbench_branches b WHERE a.bid = b.bid;
RETURN QUERY SELECT sum1,avg1,sum2,avg2;
END;
$function$

postgres=# select fnc();
fnc
--------------------
(541,151,541,3937)
(1 row)

As above, we can use the same query result for multiple
aggregations, and also join it with other tables.

What do you think of using ENR for this way?

Regards,
Yugo Nagata

--
Yugo NAGATA <nagata@sraoss.co.jp>

Attachments:

use_enr_like_temptable.patchtext/x-diff; name=use_enr_like_temptable.patchDownload
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index e3a170c38b..27e94ecc87 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -20,6 +20,7 @@
 #include "access/xact.h"
 #include "catalog/heap.h"
 #include "catalog/pg_type.h"
+#include "commands/portalcmds.h"
 #include "commands/trigger.h"
 #include "executor/executor.h"
 #include "executor/spi_priv.h"
@@ -3378,3 +3379,30 @@ SPI_register_trigger_data(TriggerData *tdata)
 
 	return SPI_OK_TD_REGISTER;
 }
+
+int
+SPI_register_portal(Portal portal, char *name)
+{
+	int			rc;
+	EphemeralNamedRelation enr;
+
+	if (portal == NULL || name == NULL)
+		return SPI_ERROR_ARGUMENT;
+
+	PortalCreateHoldStore(portal);
+	PersistHoldablePortal(portal);
+
+	enr = palloc(sizeof(EphemeralNamedRelationData));
+
+	enr->md.name = name;
+	enr->md.reliddesc = InvalidOid;
+	enr->md.tupdesc = portal->tupDesc;
+	enr->md.enrtype = ENR_NAMED_TUPLESTORE;
+	enr->md.enrtuples = tuplestore_tuple_count(portal->holdStore);
+	enr->reldata = portal->holdStore;
+	rc = SPI_register_relation(enr);
+	if (rc != SPI_OK_REL_REGISTER)
+			return rc;
+
+	return SPI_OK_TD_REGISTER;
+}
diff --git a/src/include/executor/spi.h b/src/include/executor/spi.h
index d1de139a3b..40753dc78a 100644
--- a/src/include/executor/spi.h
+++ b/src/include/executor/spi.h
@@ -199,6 +199,7 @@ extern void SPI_cursor_close(Portal portal);
 extern int	SPI_register_relation(EphemeralNamedRelation enr);
 extern int	SPI_unregister_relation(const char *name);
 extern int	SPI_register_trigger_data(TriggerData *tdata);
+extern int	SPI_register_portal(Portal portal, char *name);
 
 extern void SPI_start_transaction(void);
 extern void SPI_commit(void);
diff --git a/src/pl/plpgsql/src/pl_exec.c b/src/pl/plpgsql/src/pl_exec.c
index b0a2cac227..5d561b39bd 100644
--- a/src/pl/plpgsql/src/pl_exec.c
+++ b/src/pl/plpgsql/src/pl_exec.c
@@ -4776,6 +4776,9 @@ exec_stmt_open(PLpgSQL_execstate *estate, PLpgSQL_stmt_open *stmt)
 		elog(ERROR, "could not open cursor: %s",
 			 SPI_result_code_string(SPI_result));
 
+	if (stmt->tablename)
+		SPI_register_portal(portal, stmt->tablename);
+
 	/*
 	 * If cursor variable was NULL, store the generated portal name in it,
 	 * after verifying it's okay to assign to.
diff --git a/src/pl/plpgsql/src/pl_gram.y b/src/pl/plpgsql/src/pl_gram.y
index edeb72c380..576ea86943 100644
--- a/src/pl/plpgsql/src/pl_gram.y
+++ b/src/pl/plpgsql/src/pl_gram.y
@@ -185,7 +185,7 @@ static	void			check_raise_parameters(PLpgSQL_stmt_raise *stmt);
 %type <ival>	foreach_slice
 %type <stmt>	for_control
 
-%type <str>		any_identifier opt_block_label opt_loop_label opt_label
+%type <str>		any_identifier opt_block_label opt_into_table opt_loop_label opt_label
 %type <str>		option_value
 
 %type <list>	proc_sect stmt_elsifs stmt_else
@@ -2064,7 +2064,17 @@ stmt_dynexecute : K_EXECUTE
 				;
 
 
-stmt_open		: K_OPEN cursor_variable
+opt_into_table	:
+					{
+						$$ = NULL;
+					}
+				| K_INTO K_TABLE any_identifier
+					{
+						$$ = $3;
+					}
+				;
+
+stmt_open		: K_OPEN cursor_variable opt_into_table
 					{
 						PLpgSQL_stmt_open *new;
 						int			tok;
@@ -2075,6 +2085,7 @@ stmt_open		: K_OPEN cursor_variable
 						new->stmtid = ++plpgsql_curr_compile->nstatements;
 						new->curvar = $2->dno;
 						new->cursor_options = CURSOR_OPT_FAST_PLAN;
+						new->tablename = $3;
 
 						if ($2->cursor_explicit_expr == NULL)
 						{
diff --git a/src/pl/plpgsql/src/plpgsql.h b/src/pl/plpgsql/src/plpgsql.h
index 355c9f678d..f1b1fa147b 100644
--- a/src/pl/plpgsql/src/plpgsql.h
+++ b/src/pl/plpgsql/src/plpgsql.h
@@ -768,6 +768,7 @@ typedef struct PLpgSQL_stmt_open
 	PLpgSQL_expr *query;
 	PLpgSQL_expr *dynquery;
 	List	   *params;			/* USING expressions */
+	char *tablename;
 } PLpgSQL_stmt_open;
 
 /*
#2Pavel Stehule
pavel.stehule@gmail.com
In reply to: Yugo NAGATA (#1)
Re: Using Ephemeral Named Relation like a temporary table

Hi

st 29. 3. 2023 v 6:54 odesílatel Yugo NAGATA <nagata@sraoss.co.jp> napsal:

Hello,

Temporary tables are often used to store transient data in
batch processing and the contents can be accessed multiple
times. However, frequent use of temporary tables has a problem
that the system catalog tends to bloat. I know there has been
several proposals to attack this problem, but I would like to
propose a new one.

The idea is to use Ephemeral Named Relation (ENR) like a
temporary table. ENR information is not stored into the system
catalog, but in QueryEnvironment, so it never bloat the system
catalog.

Although we cannot perform insert, update or delete on ENR,
I wonder it could be beneficial if we need to reference to a
result of a query multiple times in a batch processing.

The attached is a concept patch. This adds a new syntax
"OPEN cursor INTO TABLE tablename" to pl/pgSQL, that stores
a result of the cursor query into a ENR with specified name.
However, this is a tentative interface to demonstrate the
concept of feature.

Here is an example;

postgres=# \sf fnc
CREATE OR REPLACE FUNCTION public.fnc()
RETURNS TABLE(sum1 integer, avg1 integer, sum2 integer, avg2 integer)
LANGUAGE plpgsql
AS $function$
DECLARE
sum1 integer;
sum2 integer;
avg1 integer;
avg2 integer;
curs CURSOR FOR SELECT aid, bid, abalance FROM pgbench_accounts
WHERE abalance BETWEEN 100 AND 200;
BEGIN
OPEN curs INTO TABLE tmp_accounts;
SELECT count(abalance) , avg(abalance) INTO sum1, avg1
FROM tmp_accounts;
SELECT count(bbalance), avg(bbalance) INTO sum2, avg2
FROM tmp_accounts a, pgbench_branches b WHERE a.bid = b.bid;
RETURN QUERY SELECT sum1,avg1,sum2,avg2;
END;
$function$

postgres=# select fnc();
fnc
--------------------
(541,151,541,3937)
(1 row)

As above, we can use the same query result for multiple
aggregations, and also join it with other tables.

What do you think of using ENR for this way?

The idea looks pretty good. I think it can be very useful. I am not sure if
this design is intuitive. If I remember well, the Oracle's has similar
features, and can be nice if we use the same or more similar syntax
(although I am not sure how it can be implementable)? I think so PL/SQL
design has an advantage, because you don't need to solve the scope of the
cursor's assigned table.

OPEN curs INTO TABLE tmp_accounts; -- it looks little bit strange. I miss
info, so tmp_accounts is not normal table

what about

OPEN curs INTO CURSOR TABLE xxx;

or

OPEN curs FOR CURSOR TABLE xxx

Regards

Pavel

Show quoted text

Regards,
Yugo Nagata

--
Yugo NAGATA <nagata@sraoss.co.jp>

#3Corey Huinker
corey.huinker@gmail.com
In reply to: Yugo NAGATA (#1)
Re: Using Ephemeral Named Relation like a temporary table

On Wed, Mar 29, 2023 at 12:54 AM Yugo NAGATA <nagata@sraoss.co.jp> wrote:

Hello,

Temporary tables are often used to store transient data in
batch processing and the contents can be accessed multiple
times. However, frequent use of temporary tables has a problem
that the system catalog tends to bloat. I know there has been
several proposals to attack this problem, but I would like to
propose a new one.

The idea is to use Ephemeral Named Relation (ENR) like a
temporary table. ENR information is not stored into the system
catalog, but in QueryEnvironment, so it never bloat the system
catalog.

Although we cannot perform insert, update or delete on ENR,
I wonder it could be beneficial if we need to reference to a
result of a query multiple times in a batch processing.

The attached is a concept patch. This adds a new syntax
"OPEN cursor INTO TABLE tablename" to pl/pgSQL, that stores
a result of the cursor query into a ENR with specified name.
However, this is a tentative interface to demonstrate the
concept of feature.

Here is an example;

postgres=# \sf fnc
CREATE OR REPLACE FUNCTION public.fnc()
RETURNS TABLE(sum1 integer, avg1 integer, sum2 integer, avg2 integer)
LANGUAGE plpgsql
AS $function$
DECLARE
sum1 integer;
sum2 integer;
avg1 integer;
avg2 integer;
curs CURSOR FOR SELECT aid, bid, abalance FROM pgbench_accounts
WHERE abalance BETWEEN 100 AND 200;
BEGIN
OPEN curs INTO TABLE tmp_accounts;
SELECT count(abalance) , avg(abalance) INTO sum1, avg1
FROM tmp_accounts;
SELECT count(bbalance), avg(bbalance) INTO sum2, avg2
FROM tmp_accounts a, pgbench_branches b WHERE a.bid = b.bid;
RETURN QUERY SELECT sum1,avg1,sum2,avg2;
END;
$function$

postgres=# select fnc();
fnc
--------------------
(541,151,541,3937)
(1 row)

As above, we can use the same query result for multiple
aggregations, and also join it with other tables.

What do you think of using ENR for this way?

Regards,
Yugo Nagata

--
Yugo NAGATA <nagata@sraoss.co.jp>

This looks like a slightly more flexible version of the Oracle pl/sql table
type.

For those not familiar, PL/SQL can have record types, and in-memory
collections of records types, and you can either build up multiple records
in a collection manually, or you can bulk-collect them from a query. Then,
you can later reference that collection in a regular SQL query with FROM
TABLE(collection_name). It's a neat system for certain types of workloads.

example link, I'm sure there's better out there:
https://oracle-base.com/articles/12c/using-the-table-operator-with-locally-defined-types-in-plsql-12cr1

My first take is there are likely customers out there that will want this.
However, those customers will want to manually add/delete rows from the
ENR, so we'll want a way to do that.

I haven't looked at ENRs in a while, when would the memory from that ENR
get freed?