query multiple schemas

Started by Norbert Sándoralmost 2 years ago8 messagesgeneral

sandor.norbert@erinors.com

almost 2 years ago

Hello,

I have a database with multiple tenants with a separate schema for each
tenant.
The structure of each schema is identical, the tenant ID is the name of
the schema.

What I would like to achieve is to able to query tables in all schemas
at once with the tenant IDs added to the result set.

I experimented with typed solutions like described in
https://dev.to/staab/how-to-query-multiple-schemas-with-postgresql-28kj
without much success.
So I turned to a more dynamic JSON-based solution.

Please note that I'm new to plpgsql, so /any/ (even a less related)
advice is welcome :)

My current experimental function is:

CREATE OR REPLACE FUNCTION demo.tenant_union_query(tbl anyelement)

RETURNS setof json AS $func$

declare

_select text;

begin

_select := (select

string_agg(

format('select t.*, %L tenantId from %I.%I t', schema_name,
schema_name, pg_typeof(tbl)),

E'\n' || ' union all ' || E'\n')

from (

SELECT schema_name

FROM information_schema.schemata

where schema_name not in ('information_schema') and schema_name not
like 'pg_%'

) tenants

);

return query execute 'select row_to_json(r) from (' || _select || ')
as r';

END;

$func$ LANGUAGE plpgsql;

And this is how I use it to query a "usual" result-set-like result with
the tenant ID in the 1st column, followed by the fields from the given
table:

select r->>'tenantid' tenantId, json_populate_record(null::mytable,
r) from tenant_union_query(null::mytable) r order by tenantId; -- (1)

The above solution seems to work, my questions are:

1. Is there a better way to achieve the same functionality? Maybe
without using JSON as an intermediate representation?
2. Is there a way to further simplify the usage, like instead of the
query (1) above something more simple, like: select * from
tenant_union_query_2(null::mytable) order by tenantId;

Thanks for your help in advance.
Best regards,
Norbi

Steve Baldwin

steve.baldwin@gmail.com

almost 2 years ago

In reply to: Norbert Sándor (#1)

Re: query multiple schemas

Hi Norbi,

If the number of tenant schemas is reasonably static, you could write a
plpgsql function to create a set of UNION ALL views with one view for each
table in all tenant schemas. You could re-run the function each time a
tenant schema is added. Having the set of views would allow you to query
them as you would any of the underlying tables, and the query planner could
likely optimise the query better. With your current function, if you needed
to add a WHERE clause and the underlying tables were large, it would likely
not perform as well as the UNION ALL view.

Cheers,

Steve

On Mon, Apr 22, 2024 at 6:12 AM Norbert Sándor <sandor.norbert@erinors.com>
wrote:

Show quoted text

Hello,

I have a database with multiple tenants with a separate schema for each
tenant.
The structure of each schema is identical, the tenant ID is the name of
the schema.

What I would like to achieve is to able to query tables in all schemas at
once with the tenant IDs added to the result set.

I experimented with typed solutions like described in
https://dev.to/staab/how-to-query-multiple-schemas-with-postgresql-28kj
without much success.
So I turned to a more dynamic JSON-based solution.

Please note that I'm new to plpgsql, so *any* (even a less related)
advice is welcome :)

My current experimental function is:

CREATE OR REPLACE FUNCTION demo.tenant_union_query(tbl anyelement)

RETURNS setof json AS $func$

declare

_select text;

begin

_select := (select

string_agg(

format('select t.*, %L tenantId from %I.%I t', schema_name, schema_name,
pg_typeof(tbl)),

E'\n' || ' union all ' || E'\n')

from (

SELECT schema_name

FROM information_schema.schemata

where schema_name not in ('information_schema') and schema_name not like
'pg_%'

) tenants

);

return query execute 'select row_to_json(r) from (' || _select || ') as r'
;

END;

$func$ LANGUAGE plpgsql;

And this is how I use it to query a "usual" result-set-like result with
the tenant ID in the 1st column, followed by the fields from the given
table:

select r->>'tenantid' tenantId, json_populate_record(null::mytable, r)
from tenant_union_query(null::mytable) r order by tenantId; -- (1)

The above solution seems to work, my questions are:

1. Is there a better way to achieve the same functionality? Maybe
without using JSON as an intermediate representation?
2. Is there a way to further simplify the usage, like instead of the
query (1) above something more simple, like: select * from
tenant_union_query_2(null::mytable) order by tenantId;

Thanks for your help in advance.
Best regards,
Norbi

David G. Johnston

david.g.johnston@gmail.com

almost 2 years ago

In reply to: Norbert Sándor (#1)

On Sunday, April 21, 2024, Norbert Sándor <sandor.norbert@erinors.com>
wrote:

The structure of each schema is identical, the tenant ID is the name of
the schema.

You’ve hit the main reason why the scheme you choose is usually avoided.
Better to just add tenant_id to your tables in the first place. And use
partitioned tables if you desire physical separation.

The above solution seems to work, my questions are:

1. Is there a better way to achieve the same functionality? Maybe
without using JSON as an intermediate representation?

In-database, I doubt it (though I didn’t study your specific solution in
depth). Json provides the easiest way to generate the virtual tables you
need.

Otherwise maybe try something with say bash scripting and psql scripts; or
some other client-side setup where you separate the query and the metadata
lookups so the queries just return normal results and the client takes them
are merges them.

David J.

Tom Lane

tgl@sss.pgh.pa.us

almost 2 years ago

In reply to: Steve Baldwin (#2)

Re: query multiple schemas

Steve Baldwin <steve.baldwin@gmail.com> writes:

If the number of tenant schemas is reasonably static, you could write a
plpgsql function to create a set of UNION ALL views with one view for each
table in all tenant schemas. You could re-run the function each time a
tenant schema is added. Having the set of views would allow you to query
them as you would any of the underlying tables, and the query planner could
likely optimise the query better. With your current function, if you needed
to add a WHERE clause and the underlying tables were large, it would likely
not perform as well as the UNION ALL view.

Another idea is to build a partitioned table comprising all the
individual tables. This might require the tables to have all the
same owner (not sure about that) and it'd require you to store the
partition key, ie tenant, explicitly in every table which might
seem a bit wasteful. But it'll likely outperform any other solution
for the union queries. Notably, adding or deleting a partition would
be much less painful than redefining a UNION ALL view.

regards, tom lane

Dominique Devienne

ddevienne@gmail.com

almost 2 years ago

In reply to: Tom Lane (#4)

Re: query multiple schemas

On Sun, Apr 21, 2024 at 11:12 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Steve Baldwin <steve.baldwin@gmail.com> writes:

If the number of tenant schemas is reasonably static, you could write a
plpgsql function to create a set of UNION ALL views

Another idea is to build a partitioned table

Hi Norbert. I asked a [similar question][1]https://postgrespro.com/list/thread-id/2673517 a while back,
and unfortunately didn't get any actionable input, perhaps
because I already mentioned in my message the options
proposed here so far. Seems like people like us, using a
dynamic number of schemas, are outliers in database-land.

In my case, the revision/version of the schema could be
different as well, which would complicate the partitioning idea.

In any case, I'm interested in what works well for you.
And if/when I get back to this issue myself, I'll do the same.

Thanks, --DD

[1]: https://postgrespro.com/list/thread-id/2673517

Norbert Sándor

sandor.norbert@erinors.com

almost 2 years ago

In reply to: Dominique Devienne (#5)

Re: query multiple schemas

<!DOCTYPE html>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body>
<i>> using a dynamic number of schemas</i><br>
<br>
Although there are historical reasons behind our "schema per tenant"
architecture, it provides very good logical separation of data, and
is very convenient that we don't need to include the tenant ID in
each query (I'm sure that it would cause lots of bugs and trouble).<br>
Besides, we use Hibernate and <a
href="https://docs.jboss.org/hibernate/orm/6.4/userguide/html_single/Hibernate_User_Guide.html#multitenacy-separate-schema">it
has great support</a> for this architecture.<br>
<br>
<i>> In any case, I'm interested in what works well for you.</i><br>
<br>
I went with the dynamic solution I proposed in my original email
(which performs a UNION of the queried table from all tenants).<br>
Performance is currently not a priority in our use-cases, otherwise
I would have probably chosen a more static solution (which wouldn't
be easy to maintain continuously, I'm sure of).<br>
<br>
I further simplified its usage, so finally it is fairly comfortable
to use either directly:<br>
<blockquote><font face="monospace">select tenantId, (record).*<br>
from tenant_query(null::mytable)<br>
where (record).type=2<br>
order by tenantId, (record).name;</font><br>
</blockquote>
or by using a temporary view:<br>
<blockquote><font face="monospace">create temp view
all_tenant_mytable as<br>
select tenantId, (record).* from tenant_query(null::mytable);<br>
<br>
select *<br>
from all_tenant_mytable<br>
where type=2<br>
order by tenantId, name;</font><br>
</blockquote>
<i>> In my case, the revision/version of the schema could be
different as well</i><br>
<br>
This complicates things very much, it is probably not possible to
implement a solution as comfortable as in my case.<br>
<br>
Although I think the JSON-based solutions proposed in this thread
would work in your case as well.<br>
In my current solution I also use JSON as intermediate
representation - although I <i>feel </i>it is because of my lack
of deeper knowledge of Postgresql's type system.<br>
The difference is that you would need to use JSON as the final
representation, and reference the JSON fields using <a
href="https://www.postgresql.org/docs/current/functions-json.html">Postgres's
JSON operators</a>.<br>
<br>
<i>> And if/when I get back to this issue myself, I'll do the
same.</i><br>
<br>
My current solution is not much different than the one I posted in
my original question.<br>
My main difficulty was the relatively static nature of Postgresql's
type system, so this solution is a result of lots of trial-and-error
rounds :)<br>
Take a look at it, and you (and maybe others) may have
recommendations, e.g. how I could get rid of the usage of the
intermediate JSON "layer".<br>
(Although I have to admit: it is amazing that it is possible to
implement this at all ;) )<br>
<blockquote>
<div style="background-color:#ffffff;padding:0px 0px 0px 2px;">
<div
style="color:#000000;background-color:#ffffff;font-family:"Liberation Mono";font-size:8pt;white-space:pre;"><p
style="margin:0;">CREATE OR REPLACE FUNCTION tenant_query_json(<span
style="text-decoration:underline;text-decoration-color:#ff8040;text-decoration-style:wavy;">tbl</span> <span
style="text-decoration:underline;text-decoration-color:#ff8040;text-decoration-style:wavy;">anyelement</span>)</p><p
style="margin:0;">RETURNS <span
style="text-decoration:underline;text-decoration-color:#ff8040;text-decoration-style:wavy;">setof</span> <span
style="text-decoration:underline;text-decoration-color:#ff8040;text-decoration-style:wavy;">json</span> AS $<span
style="text-decoration:underline;text-decoration-color:#ff8040;text-decoration-style:wavy;">func</span>$</p><p
style="margin:0;">declare</p><p style="margin:0;"> _select text;</p><p
style="margin:0;">begin</p><p style="margin:0;"> _select := (select</p><p
style="margin:0;"> string_agg(</p><p style="margin:0;"> format('select t.*, %L tenantId from %I.%I t', schema_name, schema_name, pg_typeof(<span
style="text-decoration:underline;text-decoration-color:#ff8040;text-decoration-style:wavy;">tbl</span>)),</p><p
style="margin:0;"> E'\n' || ' union all ' || E'\n')</p><p
style="margin:0;"> from (</p><p style="margin:0;"> SELECT schema_name</p><p
style="margin:0;"> FROM information_schema.schemata</p><p
style="margin:0;"> where schema_name not in ('information_schema') and schema_name not like '<span
style="text-decoration:underline;text-decoration-color:#ff8040;text-decoration-style:wavy;">pg</span>_%'</p><p
style="margin:0;"> ) tenants</p><p style="margin:0;"> );</p><p
style="margin:0;">
</p><p style="margin:0;"> return query execute 'select row_to_json(r) from (' || _select || ') as r';</p><p
style="margin:0;">END;</p><p style="margin:0;">$<span
style="text-decoration:underline;text-decoration-color:#ff8040;text-decoration-style:wavy;">func</span>$ LANGUAGE <span
style="text-decoration:underline;text-decoration-color:#ff8040;text-decoration-style:wavy;">plpgsql</span>;</p><p
style="margin:0;">
</p><p style="margin:0;">
</p><p style="margin:0;">CREATE OR REPLACE FUNCTION tenant_query(<span
style="text-decoration:underline;text-decoration-color:#ff8040;text-decoration-style:wavy;">tbl</span> <span
style="text-decoration:underline;text-decoration-color:#ff8040;text-decoration-style:wavy;">anyelement</span>)</p><p
style="margin:0;">RETURNS table(tenantId text, record <span
style="text-decoration:underline;text-decoration-color:#ff8040;text-decoration-style:wavy;">anyelement</span>) AS $<span
style="text-decoration:underline;text-decoration-color:#ff8040;text-decoration-style:wavy;">func</span>$</p><p
style="margin:0;">begin</p><p style="margin:0;"> return query</p><p
style="margin:0;"> select t.tenantId, t.rec</p><p
style="margin:0;"> from (</p><p style="margin:0;"> select</p><p
style="margin:0;"> <span
style="text-decoration:underline;text-decoration-color:#ff8040;text-decoration-style:wavy;">jr</span>->>'<span
style="text-decoration:underline;text-decoration-color:#ff8040;text-decoration-style:wavy;">tenantid</span>' tenantId,</p><p
style="margin:0;"> json_populate_record(<span
style="text-decoration:underline;text-decoration-color:#ff8040;text-decoration-style:wavy;">tbl</span>, <span
style="text-decoration:underline;text-decoration-color:#ff8040;text-decoration-style:wavy;">jr</span>) <span
style="text-decoration:underline;text-decoration-color:#ff8040;text-decoration-style:wavy;">rec</span></p><p
style="margin:0;"> from tenant_query_json(<span
style="text-decoration:underline;text-decoration-color:#ff8040;text-decoration-style:wavy;">tbl</span>) <span
style="text-decoration:underline;text-decoration-color:#ff8040;text-decoration-style:wavy;">jr</span></p><p
style="margin:0;"> ) t;</p><p style="margin:0;">END;</p><p
style="margin:0;">$<span
style="text-decoration:underline;text-decoration-color:#ff8040;text-decoration-style:wavy;">func</span>$ LANGUAGE <span
style="text-decoration:underline;text-decoration-color:#ff8040;text-decoration-style:wavy;">plpgsql</span>;</p><p
style="margin:0;">
</p></div>
</div>
</blockquote>
As you can see in my examples above, I use the <font
face="monospace">tenant_query()</font> function but in your case
(if your schemas are different) something similar to <font
face="monospace">tenant_query_json()</font> may work better.<br>
<br>
--<br>
Norbi<br>
<br>
<div class="moz-cite-prefix">On 2024. 04. 23. 9:33, Dominique
Devienne wrote:<br>
</div>
<blockquote type="cite"
cite="mid:CAFCRh--7Oh2CFdAhqN-GXbGt63djxHN2SYLHUwt_MKcYZZuTKg@mail.gmail.com">
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
<div dir="ltr">
<div dir="ltr">On Sun, Apr 21, 2024 at 11:12 PM Tom Lane <<a
href="mailto:tgl@sss.pgh.pa.us" moz-do-not-send="true"
class="moz-txt-link-freetext">tgl@sss.pgh.pa.us</a>>
wrote:<br>
</div>
<div class="gmail_quote">
<blockquote class="gmail_quote"
style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Steve
Baldwin <<a href="mailto:steve.baldwin@gmail.com"
target="_blank" moz-do-not-send="true"
class="moz-txt-link-freetext">steve.baldwin@gmail.com</a>>
writes:<br>
> If the number of tenant schemas is reasonably static,
you could write a<br>
> plpgsql function to create a set of UNION ALL views<br>
<br>
Another idea is to build a partitioned table<br>
</blockquote>
<div><br>
</div>
<div>Hi Norbert. I asked a [similar question][1] a while back,</div>
<div>and unfortunately didn't get any actionable input,
perhaps</div>
<div>because I already mentioned in my message the options</div>
<div>proposed here so far. Seems like people like us, using a</div>
<div>dynamic number of schemas, are outliers in database-land.</div>
<div><br>
</div>
<div>In my case, the revision/version of the schema could be</div>
<div>different as well, which would complicate the
partitioning idea.</div>
<div><br>
</div>
<div>In any case, I'm interested in what works well for you.</div>
<div>And if/when I get back to this issue myself, I'll do the
same.</div>
<div><br>
</div>
<div>Thanks, --DD</div>
<div><br>
</div>
<div>[1]: <a
href="https://postgrespro.com/list/thread-id/2673517"
moz-do-not-send="true" class="moz-txt-link-freetext">https://postgrespro.com/list/thread-id/2673517</a> </div>
</div>
</div>
</blockquote>
<br>
</body>
</html>

Dominique Devienne

ddevienne@gmail.com

almost 2 years ago

In reply to: Norbert Sándor (#6)

Re: query multiple schemas

On Tue, Apr 23, 2024 at 11:08 AM Norbert Sándor <sandor.norbert@erinors.com>
wrote:

*> And if/when I get back to this issue myself, I'll do the same.*
My current solution is not much different than the one I posted in my
original question.

CREATE OR REPLACE FUNCTION tenant_query_json(tbl anyelement)

RETURNS setof json AS $func$ [...]

CREATE OR REPLACE FUNCTION tenant_query(tbl anyelement)

RETURNS table(tenantId text, record anyelement) AS $func$ [...]

Thanks for sharing Norbi. I'm not well versed in PG/PLsql, so using

`anyelement` and `returns setof / table` is interesting to see.

Regarding the type system, I don't know if PostgreSQL supports "structural"
typing,
i.e. types from different schemas, despite having the same "shape", are not
interoperable.
Thus your need to go through JSON to "anonymize" the types in your inner
function, then
de-anonymize them (to a schema's specific type) as a record, in your outer
function.

One solution in your case of homogenous tables types, is to centralize your
types in
a single data-less schema, and then create each tenants tables based on
those types,
i.e. all tables (across tenant schemas) share the same type. But that's
speculation mostly.

--DD

I.e. 2nd form of CREATE TABLE, from
https://www.postgresql.org/docs/current/sql-createtable.html:
CREATE [...] TABLE [...] *table_name* OF *type_name*

Marcos Pegoraro

marcos@f10.com.br

almost 2 years ago

In reply to: Norbert Sándor (#1)

Re: query multiple schemas

Em dom., 21 de abr. de 2024 às 17:12, Norbert Sándor <
sandor.norbert@erinors.com> escreveu:

Hello,

I have a database with multiple tenants with a separate schema for each
tenant.
The structure of each schema is identical, the tenant ID is the name of
the schema.

What I would like to achieve is to able to query tables in all schemas at
once with the tenant IDs added to the result set.

I have a similar structure and do my multi tenant queries this way.
The only problem I see is that we have to define every result type, because
I return a record, but it runs fine.

create function sql_per_tenant(sql text, tenants text[]) returns setof
record language plpgsql AS $function$
declare
Result record;
schemas text;
begin
for schemas in select unnest(tenants) loop
execute Format('set local search_path to %s, public;', schemas);
for Result in execute Format('select $$%1$s$$ tenant, * from (%2$s) x',
schemas, sql) loop
return next Result;
end loop;
end loop;
end;$function$;

select * from sql_per_tenant('select Order_ID, sum(Value) from Orders inner
join Items using(Order_ID)
where Due_Date = Current_Date','{cus_001,cus_035,cus_175}')
as (SchemaName text, Order_ID integer, sum_of_items Numeric)

regards
Marcos