Query on postgres_fdw extension

Started by Swathi Palmost 5 years ago18 messagesbugsgeneral
Jump to latest
#1Swathi P
swathi.bluepearl@gmail.com
bugsgeneral

Hello EveryOne,

Hope you are all doing well and staying safe.

Am Swathi, have been working with postgres for last 3 years. Currently we
are working on a project to build a sharding solution with the help of
native postgres_fdw extension. During this process, we have encountered an
issue with postgres_fdw. I tried to give as much as details below on the
issue we are facing, it would be of great help if you can help us overcome
this issue.

- We have Host_A and Host_B , where Host_A is out coordinator node and
Host_B is used as our data node.

- Host_B has a table "Table_B" with a sequence id column which auto
generates the series by default
CREATE TABLE public.table_a
(
id bigint NOT NULL DEFAULT nextval('table_a_id_seq'::regclass),
topic character varying(50) NOT NULL,
CONSTRAINT table_a_pk PRIMARY KEY (id)
)

- on Host_A we have a foreign table created with the ddl below
CREATE FOREIGN TABLE public.table_a
(
id bigint ,
topic character varying(50) NOT NULL,
) SERVER Host_A OPTIONS (schema_name 'public', table_name 'table_a');

- When we try to insert data directly on the table_a while connected to
Host_B, works fine with the auto incremented values for the id column

- But the same insert fails when run from the coordinator node with below
error.
poc=> insert into table_a(topic) values ('test');
ERROR: null value in column "id" of relation "table_a" violates not-null
constraint
DETAIL: Failing row contains (null, test).
CONTEXT: remote SQL command: INSERT INTO public.table_a(id, topic) VALUES
($1, $2)

- If we omit the primary key and not null constraints on the table_a on
remote host (Host_B) , inserts will work fine on Host_A but a NULL value
insertedd for the id column instead of sequence

We are looking for some help on understanding the issue here and the best
possible workaround for the same.

Your help will be greatly appreciated

Thanks
Swathi P

#2Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Swathi P (#1)
bugsgeneral
Re: Query on postgres_fdw extension

On Thu, May 13, 2021 at 4:58 PM Swathi P <swathi.bluepearl@gmail.com> wrote:

Hello EveryOne,

Hope you are all doing well and staying safe.

Am Swathi, have been working with postgres for last 3 years. Currently we are working on a project to build a sharding solution with the help of native postgres_fdw extension. During this process, we have encountered an issue with postgres_fdw. I tried to give as much as details below on the issue we are facing, it would be of great help if you can help us overcome this issue.

- We have Host_A and Host_B , where Host_A is out coordinator node and Host_B is used as our data node.

- Host_B has a table "Table_B" with a sequence id column which auto generates the series by default
CREATE TABLE public.table_a
(
id bigint NOT NULL DEFAULT nextval('table_a_id_seq'::regclass),
topic character varying(50) NOT NULL,
CONSTRAINT table_a_pk PRIMARY KEY (id)
)

- on Host_A we have a foreign table created with the ddl below
CREATE FOREIGN TABLE public.table_a
(
id bigint ,
topic character varying(50) NOT NULL,
) SERVER Host_A OPTIONS (schema_name 'public', table_name 'table_a');

- When we try to insert data directly on the table_a while connected to Host_B, works fine with the auto incremented values for the id column

- But the same insert fails when run from the coordinator node with below error.
poc=> insert into table_a(topic) values ('test');
ERROR: null value in column "id" of relation "table_a" violates not-null constraint
DETAIL: Failing row contains (null, test).
CONTEXT: remote SQL command: INSERT INTO public.table_a(id, topic) VALUES ($1, $2)

- If we omit the primary key and not null constraints on the table_a on remote host (Host_B) , inserts will work fine on Host_A but a NULL value insertedd for the id column instead of sequence

We are looking for some help on understanding the issue here and the best possible workaround for the same.

Your help will be greatly appreciated

I think you need to declare your foreign table column id as "serial"
type instead of "bigint". Below is what I tried from my end.

On remote server:
CREATE USER foreign_user;
DROP TABLE table_a;
CREATE TABLE table_a
(
id serial NOT NULL,
topic character varying(50) NOT NULL,
CONSTRAINT table_a_pk PRIMARY KEY (id)
);
GRANT ALL PRIVILEGES ON ALL TABLES IN SCHEMA public TO foreign_user;

On local server:
DROP EXTENSION postgres_fdw CASCADE;
CREATE EXTENSION postgres_fdw;
CREATE SERVER foreign_server
FOREIGN DATA WRAPPER postgres_fdw
OPTIONS (host 'localhost', port '5433', dbname 'postgres');
CREATE USER MAPPING FOR public
SERVER foreign_server
OPTIONS (user 'foreign_user', password '');
CREATE FOREIGN TABLE table_a (id serial NOT NULL,
topic character varying(50) NOT NULL)
SERVER foreign_server
OPTIONS (schema_name 'public', table_name 'table_a');
SELECT * FROM table_a;
INSERT INTO table_a(topic) VALUES('row1');
INSERT INTO table_a(topic) VALUES('row2');
INSERT INTO table_a(topic) VALUES('row3');

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

#3Swathi P
swathi.bluepearl@gmail.com
In reply to: Bharath Rupireddy (#2)
bugsgeneral
Re: Query on postgres_fdw extension

Thanks Bharath for the details.

In our sharding solution, we have multiple coodinator nodes. If we declare
the table column as serial data type, we might end up having duplicate
values for id column in the table_a in host_b (data node) as cconnections
come from multiple coordinatoor nodes and might end up in duplicate key
violations.

Hence we decided to have the coordinator nodes as stateless and hence
declared the column with no serial/sequence. Let me know if this makes
sense.

Have come across multiple articles on the same issue, i would like to know
if we are doing something wrong here or we have bette workaround for this
issue,

/messages/by-id/CAP=oouH=FccW4V2zm1VjGit=NZDCXzU2tYBoZe88v3mXrEA9Qg@mail.gmail.com
https://stackoverflow.com/questions/66582694/how-to-get-the-generated-id-from-an-insert-to-a-postgres-foreign-table
https://paquier.xyz/postgresql-2/global-sequences-with-postgres_fdw-and-postgres-core/

Thanks again for your time.

- Swathi

On Thu, May 13, 2021 at 5:34 PM Bharath Rupireddy <
bharath.rupireddyforpostgres@gmail.com> wrote:

Show quoted text

On Thu, May 13, 2021 at 4:58 PM Swathi P <swathi.bluepearl@gmail.com>
wrote:

Hello EveryOne,

Hope you are all doing well and staying safe.

Am Swathi, have been working with postgres for last 3 years. Currently

we are working on a project to build a sharding solution with the help of
native postgres_fdw extension. During this process, we have encountered an
issue with postgres_fdw. I tried to give as much as details below on the
issue we are facing, it would be of great help if you can help us overcome
this issue.

- We have Host_A and Host_B , where Host_A is out coordinator node and

Host_B is used as our data node.

- Host_B has a table "Table_B" with a sequence id column which auto

generates the series by default

CREATE TABLE public.table_a
(
id bigint NOT NULL DEFAULT nextval('table_a_id_seq'::regclass),
topic character varying(50) NOT NULL,
CONSTRAINT table_a_pk PRIMARY KEY (id)
)

- on Host_A we have a foreign table created with the ddl below
CREATE FOREIGN TABLE public.table_a
(
id bigint ,
topic character varying(50) NOT NULL,
) SERVER Host_A OPTIONS (schema_name 'public', table_name 'table_a');

- When we try to insert data directly on the table_a while connected to

Host_B, works fine with the auto incremented values for the id column

- But the same insert fails when run from the coordinator node with

below error.

poc=> insert into table_a(topic) values ('test');
ERROR: null value in column "id" of relation "table_a" violates

not-null constraint

DETAIL: Failing row contains (null, test).
CONTEXT: remote SQL command: INSERT INTO public.table_a(id, topic)

VALUES ($1, $2)

- If we omit the primary key and not null constraints on the table_a on

remote host (Host_B) , inserts will work fine on Host_A but a NULL value
insertedd for the id column instead of sequence

We are looking for some help on understanding the issue here and the

best possible workaround for the same.

Your help will be greatly appreciated

I think you need to declare your foreign table column id as "serial"
type instead of "bigint". Below is what I tried from my end.

On remote server:
CREATE USER foreign_user;
DROP TABLE table_a;
CREATE TABLE table_a
(
id serial NOT NULL,
topic character varying(50) NOT NULL,
CONSTRAINT table_a_pk PRIMARY KEY (id)
);
GRANT ALL PRIVILEGES ON ALL TABLES IN SCHEMA public TO foreign_user;

On local server:
DROP EXTENSION postgres_fdw CASCADE;
CREATE EXTENSION postgres_fdw;
CREATE SERVER foreign_server
FOREIGN DATA WRAPPER postgres_fdw
OPTIONS (host 'localhost', port '5433', dbname 'postgres');
CREATE USER MAPPING FOR public
SERVER foreign_server
OPTIONS (user 'foreign_user', password '');
CREATE FOREIGN TABLE table_a (id serial NOT NULL,
topic character varying(50) NOT NULL)
SERVER foreign_server
OPTIONS (schema_name 'public', table_name 'table_a');
SELECT * FROM table_a;
INSERT INTO table_a(topic) VALUES('row1');
INSERT INTO table_a(topic) VALUES('row2');
INSERT INTO table_a(topic) VALUES('row3');

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

#4Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Swathi P (#3)
bugsgeneral
Re: Query on postgres_fdw extension

On Fri, May 14, 2021 at 12:37 PM Swathi P <swathi.bluepearl@gmail.com> wrote:

Thanks Bharath for the details.

In our sharding solution, we have multiple coodinator nodes. If we declare the table column as serial data type, we might end up having duplicate values for id column in the table_a in host_b (data node) as cconnections come from multiple coordinatoor nodes and might end up in duplicate key violations.

Hence we decided to have the coordinator nodes as stateless and hence declared the column with no serial/sequence. Let me know if this makes sense.

Hm.

Have come across multiple articles on the same issue, i would like to know if we are doing something wrong here or we have bette workaround for this issue,

/messages/by-id/CAP=oouH=FccW4V2zm1VjGit=NZDCXzU2tYBoZe88v3mXrEA9Qg@mail.gmail.com
https://stackoverflow.com/questions/66582694/how-to-get-the-generated-id-from-an-insert-to-a-postgres-foreign-table
https://paquier.xyz/postgresql-2/global-sequences-with-postgres_fdw-and-postgres-core/

Did you try using the approach specified by Michael in [1]https://paquier.xyz/postgresql-2/global-sequences-with-postgres_fdw-and-postgres-core/? Won't that help?

"As a conclusion, you can create tables using unique values across
multiple nodes by associating for example foreign_seq_nextval() with
DEFAULT for a column.
=# CREATE TABLE tab (a int DEFAULT foreign_seq_nextval());
CREATE TABLE
=# INSERT INTO tab VALUES (DEFAULT), (DEFAULT), (DEFAULT);
INSERT 0 3
=# SELECT * FROM tab;
a
----
9
10
11
(3 rows)
"

[1]: https://paquier.xyz/postgresql-2/global-sequences-with-postgres_fdw-and-postgres-core/

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Swathi P (#3)
bugsgeneral
Re: Query on postgres_fdw extension

[ removing -bugs ]

Swathi P <swathi.bluepearl@gmail.com> writes:

Hence we decided to have the coordinator nodes as stateless and hence
declared the column with no serial/sequence. Let me know if this makes
sense.

Attaching serial-sequence defaults on both sides would certainly not
work very well, because the sequences wouldn't stay in sync.

Unfortunately, postgres_fdw just doesn't have a good way right now
to make use of dynamically-generated defaults at the remote server.
If you leave out a column in your INSERT, it's going to compute
and send the locally-defined default (which is just null in this
case), so the remote's default expression is never used.

I remember that we spent a great deal of effort in postgres_fdw's
early days, trying to find a way that we could use the remote's
defaults in cases like this. But everything we tried ended up
causing horrible semantic inconsistencies, so we ended up with
the always-use-the-local-default approach. There was some feeling
that maybe this could be revisited later, but no one's done so.

One conceivable workaround is to do your insertions through a
foreign table that doesn't even have the serial column, so that
the INSERT command received by the remote server lacks that
column and the default gets applied. Probably too messy though.

regards, tom lane

#6Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Tom Lane (#5)
bugsgeneral
Re: Query on postgres_fdw extension

On Fri, May 14, 2021 at 8:23 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

[ removing -bugs ]

Swathi P <swathi.bluepearl@gmail.com> writes:

Hence we decided to have the coordinator nodes as stateless and hence
declared the column with no serial/sequence. Let me know if this makes
sense.

Attaching serial-sequence defaults on both sides would certainly not
work very well, because the sequences wouldn't stay in sync.

Unfortunately, postgres_fdw just doesn't have a good way right now
to make use of dynamically-generated defaults at the remote server.
If you leave out a column in your INSERT, it's going to compute
and send the locally-defined default (which is just null in this
case), so the remote's default expression is never used.

I remember that we spent a great deal of effort in postgres_fdw's
early days, trying to find a way that we could use the remote's
defaults in cases like this. But everything we tried ended up
causing horrible semantic inconsistencies, so we ended up with
the always-use-the-local-default approach. There was some feeling
that maybe this could be revisited later, but no one's done so.

One conceivable workaround is to do your insertions through a
foreign table that doesn't even have the serial column, so that
the INSERT command received by the remote server lacks that
column and the default gets applied. Probably too messy though.

Then the serial column cannot be selected via the foreign table.
Maybe, drop the serial column from the foreign table before insertions
and add the serial column before the selects from the foreign table.
This is not elegant though.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bharath Rupireddy (#6)
bugsgeneral
Re: Query on postgres_fdw extension

Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> writes:

On Fri, May 14, 2021 at 8:23 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

One conceivable workaround is to do your insertions through a
foreign table that doesn't even have the serial column, so that
the INSERT command received by the remote server lacks that
column and the default gets applied. Probably too messy though.

Then the serial column cannot be selected via the foreign table.

Yeah, you'd probably need to use different tables for inserting
and reading, which is why I called it messy.

The idea you pointed to in Michael's blog might work though,
ie make a local function that reaches over to the remote server
to get a globally-valid next sequence value, and use that as
the default expression of the foreign table.

regards, tom lane

#8Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Tom Lane (#5)
bugsgeneral
Re: Query on postgres_fdw extension

On Fri, 2021-05-14 at 10:53 -0400, Tom Lane wrote:

Swathi P <swathi.bluepearl@gmail.com> writes:

Hence we decided to have the coordinator nodes as stateless and hence
declared the column with no serial/sequence. Let me know if this makes
sense.

Attaching serial-sequence defaults on both sides would certainly not
work very well, because the sequences wouldn't stay in sync.

Unfortunately, postgres_fdw just doesn't have a good way right now
to make use of dynamically-generated defaults at the remote server.
If you leave out a column in your INSERT, it's going to compute
and send the locally-defined default (which is just null in this
case), so the remote's default expression is never used.

I remember that we spent a great deal of effort in postgres_fdw's
early days, trying to find a way that we could use the remote's
defaults in cases like this. But everything we tried ended up
causing horrible semantic inconsistencies, so we ended up with
the always-use-the-local-default approach. There was some feeling
that maybe this could be revisited later, but no one's done so.

One conceivable workaround is to do your insertions through a
foreign table that doesn't even have the serial column, so that
the INSERT command received by the remote server lacks that
column and the default gets applied. Probably too messy though.

One possibility might be to define a trigger on the remote table
that fetches the next sequence value if you try to insert NULL.

Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com

#9Etsuro Fujita
fujita.etsuro@lab.ntt.co.jp
In reply to: Swathi P (#3)
bugsgeneral
Re: Query on postgres_fdw extension

Hi,

On Fri, May 14, 2021 at 6:08 PM Swathi P <swathi.bluepearl@gmail.com> wrote:

In our sharding solution, we have multiple coodinator nodes. If we declare the table column as serial data type, we might end up having duplicate values for id column in the table_a in host_b (data node) as cconnections come from multiple coordinatoor nodes and might end up in duplicate key violations.

Hence we decided to have the coordinator nodes as stateless and hence declared the column with no serial/sequence. Let me know if this makes sense.

It seems reasonable to me to make coodinator nodes stateless, but may
I ask the reason you use multiple coordinator nodes?

Best regards,
Etsuro Fujita

#10Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Etsuro Fujita (#9)
bugsgeneral
Re: Query on postgres_fdw extension

On Tue, Jun 1, 2021 at 3:31 PM Etsuro Fujita <etsuro.fujita@gmail.com> wrote:

Hi,

On Fri, May 14, 2021 at 6:08 PM Swathi P <swathi.bluepearl@gmail.com> wrote:

In our sharding solution, we have multiple coodinator nodes. If we declare the table column as serial data type, we might end up having duplicate values for id column in the table_a in host_b (data node) as cconnections come from multiple coordinatoor nodes and might end up in duplicate key violations.

Hence we decided to have the coordinator nodes as stateless and hence declared the column with no serial/sequence. Let me know if this makes sense.

It seems reasonable to me to make coodinator nodes stateless, but may
I ask the reason you use multiple coordinator nodes?

Perhaps, as a redundant node to avoid single point of failures? It's
just a guess as I'm not the right one to answer that question though.

With Regards,
Bharath Rupireddy.

#11Swathi P
swathi.bluepearl@gmail.com
In reply to: Bharath Rupireddy (#10)
bugsgeneral
Re: Query on postgres_fdw extension

You are right, we added more than one
coordinator nodes for high availability and to avoid single point of
failure.

Thanks
Swathi

On Tue, Jun 1, 2021, 3:54 PM Bharath Rupireddy <
bharath.rupireddyforpostgres@gmail.com> wrote:

Show quoted text

On Tue, Jun 1, 2021 at 3:31 PM Etsuro Fujita <etsuro.fujita@gmail.com>
wrote:

Hi,

On Fri, May 14, 2021 at 6:08 PM Swathi P <swathi.bluepearl@gmail.com>

wrote:

In our sharding solution, we have multiple coodinator nodes. If we

declare the table column as serial data type, we might end up having
duplicate values for id column in the table_a in host_b (data node) as
cconnections come from multiple coordinatoor nodes and might end up in
duplicate key violations.

Hence we decided to have the coordinator nodes as stateless and hence

declared the column with no serial/sequence. Let me know if this makes
sense.

It seems reasonable to me to make coodinator nodes stateless, but may
I ask the reason you use multiple coordinator nodes?

Perhaps, as a redundant node to avoid single point of failures? It's
just a guess as I'm not the right one to answer that question though.

With Regards,
Bharath Rupireddy.

#12Duarte Carreira
dncarreira@gmail.com
In reply to: Laurenz Albe (#8)
bugsgeneral
Re: Query on postgres_fdw extension

Hello everyone.

I got here after encountering the same difficulty, although on a much more
mundane scenario.

I'm used to fdw on a read-only basis. I was just inserting a new record on
a foreign table and got blocked... and after much searching got here.

Not to rant or anything, but I am completely surprised by this limitation.
As far as I can see it is impossible to use fdw to insert records on 99% of
tables, since all have some kind of primary sequential key.

I'm just a user so cannot really understand the intricacies involved in
this process. Tried to find past messages and up to 2013 without
understanding the real problem.

The simplest workaround seems to be to quit using auto-numbering mechanisms
and implement numbering trigger functions, which is really just going back
to the 90s...

Another option would be a local function that would get the remote default
and use it in a local insert trigger. The complexity is just orders of
magnitude higher. We are talking about auto-numbering keys...

I don't know... realistically what do you guys see as a best/simple
approach?

Having 2 tables seems to me the easiest, less complex solution, but it's
hard on maintenance...

And don't take this the wrong way, but is it really that hard to have a
compromise: if there's a serial on the remote, then the user could change
the local definition so to just send the "DEFAULT" keyword to the remote
and let it figure it out? At least the user would have a chance of setting
the preferred behavior without much fuss, on a per-table basis. And still
use the basic functionality of serial/identity columns. PostgreSQL has such
complex stuff that this seems odd to be left out.

Well I hope I didn't cross over as negative or anything. I do love pgsql
and always promote it as the best thing under the sun.

Best regards,
Duarte

Laurenz Albe <laurenz.albe@cybertec.at> escreveu no dia quinta, 20/01/2022
à(s) 15:36:

Show quoted text

On Fri, 2021-05-14 at 10:53 -0400, Tom Lane wrote:

Swathi P <swathi.bluepearl@gmail.com> writes:

Hence we decided to have the coordinator nodes as stateless and hence
declared the column with no serial/sequence. Let me know if this makes
sense.

Attaching serial-sequence defaults on both sides would certainly not
work very well, because the sequences wouldn't stay in sync.

Unfortunately, postgres_fdw just doesn't have a good way right now
to make use of dynamically-generated defaults at the remote server.
If you leave out a column in your INSERT, it's going to compute
and send the locally-defined default (which is just null in this
case), so the remote's default expression is never used.

I remember that we spent a great deal of effort in postgres_fdw's
early days, trying to find a way that we could use the remote's
defaults in cases like this. But everything we tried ended up
causing horrible semantic inconsistencies, so we ended up with
the always-use-the-local-default approach. There was some feeling
that maybe this could be revisited later, but no one's done so.

One conceivable workaround is to do your insertions through a
foreign table that doesn't even have the serial column, so that
the INSERT command received by the remote server lacks that
column and the default gets applied. Probably too messy though.

One possibility might be to define a trigger on the remote table
that fetches the next sequence value if you try to insert NULL.

Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com

#13Vijaykumar Jain
vijaykumarjain.github@gmail.com
In reply to: Duarte Carreira (#12)
bugsgeneral
Re: Query on postgres_fdw extension

On Thu, 20 Jan 2022 at 21:29, Duarte Carreira <dncarreira@gmail.com> wrote:

Hello everyone.

I don't know... realistically what do you guys see as a best/simple
approach?

We implemented a custom sharding (directory sharding with lookup tables)
layer of 10 shards, but it was write local, read global.
the api was responsible for all rebalancing incase of hotspots.
other api sharding examples ...
Database Sharding: Solving Performance in a Multi-Tenant Restaurant Data
Analytics System (gotenzo.com)
<https://blog.gotenzo.com/tech/database-sharding-solving-performance-in-a-multi-tenant-restaurant-data-analytics-system&gt;

<https://blog.gotenzo.com/tech/database-sharding-solving-performance-in-a-multi-tenant-restaurant-data-analytics-system&gt;
although
it worked really well, when you are maintaining it on your own, it gets
really painful, much beyond id generation globally.

i will not go into the details, but in short, sharded setup is not the same
as local setup. there would be many more things that would not work as
expected
which would otherwise work really well on a standalone setup.

writes over shard may work, but you realize it is over the network, so you
can lock you table for a much longer duration and cause a much more serious
outage,
if you really wanted to have distributed writes with unique keys, you can
go with uuid i think or have your own seq generator globally (see below).

*Move ID generation out of the database to an ID generation service outside
of the database… As soon as a piece of work enters their system, an ID gets
assigned to it… and that ID generated in a way that is known to be globally
unique within their system*

A Better ID Generator For PostgreSQL | robconery
<https://rob.conery.io/2014/05/29/a-better-id-generator-for-postgresql/&gt;
Index of /shard_manager/shard_manager-0.0.1/ (pgxn.org)
<https://api.pgxn.org/src/shard_manager/shard_manager-0.0.1/&gt; (pretty old
but if you can use your coordinator server as a id_generator(), then you
can generate ids which are globally unique)
Sharding & IDs at Instagram. With more than 25 photos and 90 likes… | by
Instagram Engineering | Instagram Engineering (instagram-engineering.com)
<https://instagram-engineering.com/sharding-ids-at-instagram-1cf5a71e5a5c&gt;

imho, do not try sharding manually, unless you have enough dbas to maintain
the shards, try using citus, it would make a lot of the manual stuff easier.

also, the below work arounds are bad, incase you just want to rush through

postgres=# \c localdb
You are now connected to database "localdb" as user "postgres".
localdb=#
localdb=# \dt
Did not find any relations.
localdb=# \det
List of foreign tables
Schema | Table | Server
--------+-------+---------------
public | t | remote_server
(1 row)

localdb=# \det+ t
List of foreign tables
Schema | Table | Server | FDW options |
Description
--------+-------+---------------+----------------------------------------+-------------
public | t | remote_server | (schema_name 'public', table_name 't') |
(1 row)

localdb=# \det t
List of foreign tables
Schema | Table | Server
--------+-------+---------------
public | t | remote_server
(1 row)

localdb=# create or replace function getnext() returns int as $_$ select id
FROM dblink ('dbname = remotedb', $$ select nextval('t_id_seq') $$ ) as
newtable(id int); $_$ language sql;
CREATE FUNCTION
localdb=# \c remotedb
You are now connected to database "remotedb" as user "postgres".
remotedb=# \dt t
List of relations
Schema | Name | Type | Owner
--------+------+-------+----------
public | t | table | postgres
(1 row)

remotedb=# \ds t_id_seq
List of relations
Schema | Name | Type | Owner
--------+----------+----------+----------
public | t_id_seq | sequence | postgres
(1 row)

remotedb=# \c localdb
You are now connected to database "localdb" as user "postgres".
localdb=# insert into t values (getnext(), 100);
INSERT 0 1
localdb=# insert into t values (getnext(), 100);
INSERT 0 1
localdb=# select * from t;
id | col1
----+------
11 | 4
12 | 5
13 | 100
14 | 100
(4 rows)

just my opinion, ignore it not useful.

#14Duarte Carreira
dncarreira@gmail.com
In reply to: Vijaykumar Jain (#13)
bugsgeneral
Re: Query on postgres_fdw extension

Thanks for your help!

I'm not going forward with the id generating scheme... I prefer to let the
bd do that work on its own. Sharding is way over my head.
For now I just created the 2 tables, one for inserting (without the id
column), another for everything else. It's awkward and prone to human error
but as long as nothing changes and no one deletes it thinking it's
garbage...

Thanks.

Vijaykumar Jain <vijaykumarjain.github@gmail.com> escreveu no dia quinta,
20/01/2022 à(s) 17:39:

Show quoted text

On Thu, 20 Jan 2022 at 21:29, Duarte Carreira <dncarreira@gmail.com>
wrote:

Hello everyone.

I don't know... realistically what do you guys see as a best/simple
approach?

We implemented a custom sharding (directory sharding with lookup tables)
layer of 10 shards, but it was write local, read global.
the api was responsible for all rebalancing incase of hotspots.
other api sharding examples ...
Database Sharding: Solving Performance in a Multi-Tenant Restaurant Data
Analytics System (gotenzo.com)
<https://blog.gotenzo.com/tech/database-sharding-solving-performance-in-a-multi-tenant-restaurant-data-analytics-system&gt;

<https://blog.gotenzo.com/tech/database-sharding-solving-performance-in-a-multi-tenant-restaurant-data-analytics-system&gt; although
it worked really well, when you are maintaining it on your own, it gets
really painful, much beyond id generation globally.

i will not go into the details, but in short, sharded setup is not the
same as local setup. there would be many more things that would not work as
expected
which would otherwise work really well on a standalone setup.

writes over shard may work, but you realize it is over the network, so you
can lock you table for a much longer duration and cause a much more serious
outage,
if you really wanted to have distributed writes with unique keys, you can
go with uuid i think or have your own seq generator globally (see below).

*Move ID generation out of the database to an ID generation service
outside of the database… As soon as a piece of work enters their system, an
ID gets assigned to it… and that ID generated in a way that is known to be
globally unique within their system*

A Better ID Generator For PostgreSQL | robconery
<https://rob.conery.io/2014/05/29/a-better-id-generator-for-postgresql/&gt;
Index of /shard_manager/shard_manager-0.0.1/ (pgxn.org)
<https://api.pgxn.org/src/shard_manager/shard_manager-0.0.1/&gt; (pretty
old but if you can use your coordinator server as a id_generator(), then
you can generate ids which are globally unique)
Sharding & IDs at Instagram. With more than 25 photos and 90 likes… | by
Instagram Engineering | Instagram Engineering (instagram-engineering.com)
<https://instagram-engineering.com/sharding-ids-at-instagram-1cf5a71e5a5c&gt;

imho, do not try sharding manually, unless you have enough dbas to
maintain the shards, try using citus, it would make a lot of the manual
stuff easier.

also, the below work arounds are bad, incase you just want to rush through

postgres=# \c localdb
You are now connected to database "localdb" as user "postgres".
localdb=#
localdb=# \dt
Did not find any relations.
localdb=# \det
List of foreign tables
Schema | Table | Server
--------+-------+---------------
public | t | remote_server
(1 row)

localdb=# \det+ t
List of foreign tables
Schema | Table | Server | FDW options |
Description

--------+-------+---------------+----------------------------------------+-------------
public | t | remote_server | (schema_name 'public', table_name 't') |
(1 row)

localdb=# \det t
List of foreign tables
Schema | Table | Server
--------+-------+---------------
public | t | remote_server
(1 row)

localdb=# create or replace function getnext() returns int as $_$ select
id FROM dblink ('dbname = remotedb', $$ select nextval('t_id_seq') $$ )
as newtable(id int); $_$ language sql;
CREATE FUNCTION
localdb=# \c remotedb
You are now connected to database "remotedb" as user "postgres".
remotedb=# \dt t
List of relations
Schema | Name | Type | Owner
--------+------+-------+----------
public | t | table | postgres
(1 row)

remotedb=# \ds t_id_seq
List of relations
Schema | Name | Type | Owner
--------+----------+----------+----------
public | t_id_seq | sequence | postgres
(1 row)

remotedb=# \c localdb
You are now connected to database "localdb" as user "postgres".
localdb=# insert into t values (getnext(), 100);
INSERT 0 1
localdb=# insert into t values (getnext(), 100);
INSERT 0 1
localdb=# select * from t;
id | col1
----+------
11 | 4
12 | 5
13 | 100
14 | 100
(4 rows)

just my opinion, ignore it not useful.

#15Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Duarte Carreira (#12)
bugsgeneral
Re: Query on postgres_fdw extension

On Thu, 2022-01-20 at 15:59 +0000, Duarte Carreira wrote:

I got here after encountering the same difficulty, although on a much more mundane scenario.

I'm used to fdw on a read-only basis. I was just inserting a new record on a foreign table
and got blocked... and after much searching got here.

As far as I can see it is impossible to use fdw to insert records on 99% of tables,
since all have some kind of primary sequential key.

Yes, this is tricky. You could use something like this:

CREATE TABLE local (
id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
data text
);

CREATE FOREIGN TABLE remote (id bigint NOT NULL, data text)
SERVER whatever OPTIONS (table_name 'local');

CREATE FOREIGN TABLE remote_noid (data text)
SERVER whatever OPTIONS (table_name 'local');

CREATE VIEW v_remote AS SELECT * FROM remote;

CREATE FUNCTION ins_trig() RETURNS trigger LANGUAGE plpgsql AS
$$BEGIN
INSERT INTO remote_noid (data) VALUES (NEW.data);
RETURN NEW;
END;$$;

CREATE TRIGGER ins_trig INSTEAD OF INSERT ON v_remote
FOR EACH ROW EXECUTE FUNCTION ins_trig();

INSERT INTO v_remote (data) VALUES ('something');

SELECT * FROM v_remote;

id │ data
════╪═══════════
1 │ something

Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com

#16Duarte Carreira
dncarreira@gmail.com
In reply to: Laurenz Albe (#15)
bugsgeneral
Re: Query on postgres_fdw extension

Hmmm... I don't think a view or trigger are necessary.

If we just create the 2 foreign tables, one complete and one without id,
you can simply insert into the table without id and it will work fine.
To select and show data, you use the "complete" table that has the id
column.

No need for trigger and view. If I understood correctly.

I have this 2 table setup working.

It's a workaround that quickly escalates out of hand though... with little
added value.

Thanks.

Laurenz Albe <laurenz.albe@cybertec.at> escreveu no dia sexta, 21/01/2022
à(s) 13:59:

Show quoted text

On Thu, 2022-01-20 at 15:59 +0000, Duarte Carreira wrote:

I got here after encountering the same difficulty, although on a much

more mundane scenario.

I'm used to fdw on a read-only basis. I was just inserting a new record

on a foreign table

and got blocked... and after much searching got here.

As far as I can see it is impossible to use fdw to insert records on 99%

of tables,

since all have some kind of primary sequential key.

Yes, this is tricky. You could use something like this:

CREATE TABLE local (
id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
data text
);

CREATE FOREIGN TABLE remote (id bigint NOT NULL, data text)
SERVER whatever OPTIONS (table_name 'local');

CREATE FOREIGN TABLE remote_noid (data text)
SERVER whatever OPTIONS (table_name 'local');

CREATE VIEW v_remote AS SELECT * FROM remote;

CREATE FUNCTION ins_trig() RETURNS trigger LANGUAGE plpgsql AS
$$BEGIN
INSERT INTO remote_noid (data) VALUES (NEW.data);
RETURN NEW;
END;$$;

CREATE TRIGGER ins_trig INSTEAD OF INSERT ON v_remote
FOR EACH ROW EXECUTE FUNCTION ins_trig();

INSERT INTO v_remote (data) VALUES ('something');

SELECT * FROM v_remote;

id │ data
════╪═══════════
1 │ something

Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com

#17Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Duarte Carreira (#16)
bugsgeneral
Re: Query on postgres_fdw extension

On Fri, 2022-01-21 at 14:33 +0000, Duarte Carreira wrote:

If we just create the 2 foreign tables, one complete and one without id,
you can simply insert into the table without id and it will work fine.
To select and show data, you use the "complete" table that has the id column.

No need for trigger and view. If I understood correctly.

I have this 2 table setup working.

It's a workaround that quickly escalates out of hand though... with little added value.

Sure, that will work.

I just wanted to show how you can do without dealing with two different
foreign tables explicitly.

Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com

#18Shaozhong SHI
shishaozhong@gmail.com
In reply to: Laurenz Albe (#17)
bugsgeneral
Re: Query on postgres_fdw extension

Any functional code to be tested to confirm?

Regards,

David

On Fri, 21 Jan 2022 at 15:55, Laurenz Albe <laurenz.albe@cybertec.at> wrote:

Show quoted text

On Fri, 2022-01-21 at 14:33 +0000, Duarte Carreira wrote:

If we just create the 2 foreign tables, one complete and one without id,
you can simply insert into the table without id and it will work fine.
To select and show data, you use the "complete" table that has the id

column.

No need for trigger and view. If I understood correctly.

I have this 2 table setup working.

It's a workaround that quickly escalates out of hand though... with

little added value.

Sure, that will work.

I just wanted to show how you can do without dealing with two different
foreign tables explicitly.

Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com