Managing primary key conflicts while restoring data to a table with existing data
Hello all,
I have been using PostgreSQL for an enterprise-quality accounts
automation and inventory management software called GNUKhata
<https://gnukhata.in>.
Our team is planning to add a backup and restore function to the software.
But we don't want to dump the entire database and then restore it as-is.
What we are trying to do is to copy the data specific to one organization.
The challenge here is that I might copy all the data (account heads, bills,
vouchers, etc.) for one organization from an instance on one machine.
I then take the archive, in whatever format, to another machine and
attempt to restore it.
The risk here is that if, for example, the primary key value for orgcode in
the organization table is 5, it might conflict with existing data on the
machine where I am attempting the restore.
The same holds true for bills, invoices, etc.
A certain account head with accountcode 1 might already be present on
the second machine.
I am not expecting users to empty all data from the destination
machine before restoring a backup.
The reason is that an auditor may have many clients' data, and one can't
predict what primary key values are going to come in from a backup.
Basically, you could even say this is a copy-paste rather than a pure
backup and restore.
Can anyone suggest how to handle such conflicts?
--
Regards,
Krishnakant Mane,
Project Founder and Leader,
GNUKhata <https://gnukhata.in/>
(Open source Accounting, Billing and Inventory Management Software)
On Sep 25, 2019, at 1:15 AM, Krishnakant Mane <kkmane@riseup.net> wrote:
> Can anyone suggest how to handle such conflicts?
I’m not sure I like your definition of ‘backup and restore’ but you might get away with your approach if your keys were UUIDs. But I’ll bet dollars to doughnuts you’re using serial keys. Those seem to appeal to accounting types.
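For illustration, a minimal sketch of what a UUID key looks like (the table
and column names are assumptions, not GNUKhata's actual schema; here the
pgcrypto extension supplies gen_random_uuid()):

CREATE EXTENSION IF NOT EXISTS pgcrypto;  -- provides gen_random_uuid()

CREATE TABLE organization (
    orgcode uuid PRIMARY KEY DEFAULT gen_random_uuid(),
    orgname text NOT NULL
);

-- Keys generated independently on two machines will, for all practical
-- purposes, never collide, so rows copied from one instance can be
-- inserted into another without any remapping.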
On 25/09/19 4:32 PM, Rob Sargent wrote:
> I’m not sure I like your definition of ‘backup and restore’ but you might
> get away with your approach if your keys were UUIDs. But I’ll bet dollars
> to doughnuts you’re using serial keys. Those seem to appeal to accounting
> types.
Hi Rob,
Yes, you are right, they are serial keys, because timestamps or anything
thereof are aggressively despised by the accountants and the like.
Now I am really stuck as to what could be done.
--
Regards,
Krishnakant Mane,
Project Founder and Leader,
GNUKhata <https://gnukhata.in/>
(Open source Accounting, Billing and Inventory Management Software)
On 9/25/19 12:15 AM, Krishnakant Mane wrote:
> Can anyone suggest how to handle such conflicts?
Hard to say. If the data is held in common tables (bills, vouchers,
etc.) then the only thing I see happening is changing the PK values to an
unused value. That could turn into a nightmare though. Not only that, you
lose the connection to the original data source. If the data can be
broken out into separate tables then I could see placing them in their
own schema.
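As a rough sketch of the separate-schema idea (the schema and table names
are hypothetical):

-- One schema per restored organization, in the same database.
CREATE SCHEMA org_client_a;
-- Clone the table definition (columns, defaults, indexes) into it.
CREATE TABLE org_client_a.vouchers (LIKE public.vouchers INCLUDING ALL);
-- Restore the archived rows into org_client_a.vouchers; key values can
-- then repeat across schemas without ever conflicting.
SET search_path TO org_client_a, public;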
--
Adrian Klaver
adrian.klaver@aklaver.com
On 25/09/19 7:50 PM, Adrian Klaver wrote:
> Hard to say. If the data is held in common tables (bills, vouchers,
> etc.) then the only thing I see happening is changing the PK values to
> an unused value. That could turn into a nightmare though. [...] If the
> data can be broken out into separate tables then I could see placing
> them in their own schema.
Hi Adrian,
I too am thinking of doing some kind of upsert in this situation.
And I would have to set the pkey to an unassigned value when there is a
conflict.
I may also choose to replace the serials with timestamps, but I don't know
if the target customers would like it.
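For illustration, a minimal sketch of such an upsert during a restore
(accountheads and its columns are assumed names):

INSERT INTO accountheads (accountcode, accountname, orgcode)
VALUES (1, 'Cash in Hand', 5)
ON CONFLICT (accountcode) DO NOTHING;

-- DO NOTHING silently drops the incoming row when the key is already
-- taken, so rows skipped this way still need a second pass that
-- re-inserts them under a fresh, unassigned key.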
--
Regards,
Krishnakant Mane,
Project Founder and Leader,
GNUKhata <https://gnukhata.in/>
(Open source Accounting, Billing and Inventory Management Software)
On Sep 25, 2019, at 8:24 AM, Krishnakant Mane <kkmane@riseup.net> wrote:
> I too am thinking of doing some kind of upsert in this situation.
> And I would have to set the pkey to an unassigned value when there is a
> conflict.
It would likely be easier to rethink your backup and restore plan. Putting each restore into its own space would be one tack.
> If the data can be broken out into separate tables then I could see
> placing them in their own schema.
Why not have separate databases for each "restore"? They can run together
on one machine still. What is the user doing with this copy of the data?
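A sketch of that, with made-up names:

-- One database per restored client, all on the same cluster.
CREATE DATABASE client_acme;
-- Then restore the client's archive into it, e.g. from the shell:
--   pg_restore -d client_acme client_acme_backup.dump
-- Key values from one client's data never meet another client's, so no
-- conflicts can arise.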
On Sep 25, 2019, at 8:24 AM, Krishnakant Mane <kkmane@riseup.net> wrote:
> I too am thinking of doing some kind of upsert in this situation.
So to be clear, the tables you are working with can have records from
multiple organizations in a single table?
> And I would have to set the pkey to an unassigned value when there is a
> conflict.
I am seeing nextval() in your future :)
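A rough sketch of what that could look like, assuming the archive is first
loaded into a staging table and that the destination's serial column is
backed by the default-named sequence:

-- Give every incoming organization a fresh key from the destination's
-- own sequence, remembering the old -> new mapping.
CREATE TEMP TABLE orgcode_map AS
SELECT s.orgcode AS old_orgcode,
       nextval('organization_orgcode_seq') AS new_orgcode
FROM staging_organization s;

INSERT INTO organization (orgcode, orgname)
SELECT m.new_orgcode, s.orgname
FROM staging_organization s
JOIN orgcode_map m ON m.old_orgcode = s.orgcode;

-- Child rows (vouchers, bills, ...) are then inserted through the same
-- mapping so their foreign keys point at the remapped parents.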
> I may also choose to replace the serials with timestamps, but I don't
> know if the target customers would like it.
I would avoid that. In my opinion timestamps are too volatile to
serve as a PK. If you are going to change, I would go with the previous
suggestion of UUID:
https://www.postgresql.org/docs/11/datatype-uuid.html
Not sure your customers would like that either.
--
Adrian Klaver
adrian.klaver@aklaver.com
On 26/09/19 12:03 AM, Adrian Klaver wrote:
> I would avoid that. In my opinion timestamps are too volatile to serve
> as a PK. If you are going to change, I would go with the previous
> suggestion of UUID:
> https://www.postgresql.org/docs/11/datatype-uuid.html
Hi Adrian,
I think I can make them like the uuid idea.
So now what I am thinking is to revamp the database by first removing all
the primary key constraints and then deleting all the old key values.
Then loop through the existing data and generate a uuid in that column for
every row.
I might also need to update all the references to this value as foreign
keys in related tables.
But I guess some kind of ON UPDATE CASCADE might do well.
I know this would slow down the system, but given that this will be a
one-time process for an individual user (and only if he has existing
data), I would take that trade-off.
What do you say?
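For what it's worth, the cascade part might be sketched like this (the
constraint and table names are assumed; ON UPDATE CASCADE propagates
changed key values, but switching the column type from serial to uuid
still has to be done explicitly on both sides):

ALTER TABLE vouchers
    DROP CONSTRAINT IF EXISTS vouchers_orgcode_fkey,
    ADD CONSTRAINT vouchers_orgcode_fkey
        FOREIGN KEY (orgcode) REFERENCES organization (orgcode)
        ON UPDATE CASCADE;

-- Rewriting the parent key now rewrites every referencing voucher row too.
UPDATE organization SET orgcode = 9999 WHERE orgcode = 5;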
--
Regards,
Krishnakant Mane,
Project Founder and Leader,
GNUKhata <https://gnukhata.in/>
(Open source Accounting, Billing and Inventory Management Software)
On Sep 26, 2019, at 12:27 AM, Krishnakant Mane <kkmane@riseup.net> wrote:
> So now what I am thinking is to revamp the database by first removing
> all the primary key constraints and then deleting all the old key
> values. Then loop through the existing data and generate a uuid in that
> column for every row. [...] What do you say?
You might think about adding the new UUID column and using the existing
primary key to inform the updates in dependent tables. Then remove the old
PK column and constraint, followed by promoting the UUID to primary key.
This could be safely scripted and applied to all instances of your data.
That said, this is only truly necessary if you have production databases
to worry about.
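A minimal sketch of the "use the existing primary key to inform the
updates" step, with assumed table names (gen_random_uuid() as in the
earlier sketch):

-- New UUID key on the parent, plus an empty UUID column on a dependent.
ALTER TABLE organization
    ADD COLUMN orgid uuid NOT NULL DEFAULT gen_random_uuid();
ALTER TABLE vouchers ADD COLUMN orgid uuid;

-- The old serial key does the matching; afterwards it can be dropped and
-- orgid promoted to the primary key.
UPDATE vouchers v
SET orgid = o.orgid
FROM organization o
WHERE o.orgcode = v.orgcode;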
On 26/09/19 6:53 PM, Rob Sargent wrote:
> You might think about adding the new UUID column and using the existing
> primary key to inform the updates in dependent tables. Then remove the
> old PK column and constraint, followed by promoting the UUID to primary
> key. This could be safely scripted and applied to all instances of your
> data.
Thanks a million, this is the most logical and safe way.
Yes, I have a lot of production databases to worry about.
I am only confused about what you mean by "use the existing primary key
to inform the updates in dependent tables."
Are you referring to a cascading effect?
If yes, does it mean I first program my upgrade script to manually go
through all the new uuid keys and update them in the dependent tables,
matching on the old primary key that serves as the foreign key there?
--
Regards,
Krishnakant Mane,
Project Founder and Leader,
GNUKhata <https://gnukhata.in/>
(Open source Accounting, Billing and Inventory Management Software)
On Sep 26, 2019, at 7:44 AM, Krishnakant Mane <kkmane@riseup.net> wrote:
> If yes, does it mean I first program my upgrade script to manually go
> through all the new uuid keys and update them in the dependent tables,
> matching on the old primary key that serves as the foreign key there?
Yes. You will need the old keys to recreate referential integrity
(foreign keys, etc.).
It occurs to me you will also need to "duplicate" the columns in which
you have foreign keys. How many tables are there in the affected
schemata, and what is their size? Pretty sure you will have to go offline
to perform this sort of transition. Nearly everything will be touched.
Krishnakant Mane <kkmane@riseup.net> writes:
> If yes, does it mean I first program my upgrade script to manually go
> through all the new uuid keys and update them in the dependent tables,
> matching on the old primary key that serves as the foreign key there?
I guess that is the safest option, given that the databases are in
production. Roughly (see the sketch after this list):
1. Add a UNIQUE uuid column.
2. Add references to it in the dependent tables, identifying rows using
   the old pkey.
3. Remove the old pkey references and their columns.
4. Make the uuid column the pkey.
5. Remove the old pkey column.
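A sketch of those steps in order, for one parent/child pair (all names are
assumed; the new foreign key is best added last, once the uuid column has
become the primary key, because a foreign key pins the unique index it
references):

-- 1. Add a UNIQUE uuid column.
ALTER TABLE organization
    ADD COLUMN orgid uuid UNIQUE DEFAULT gen_random_uuid();
-- 2. Add uuid columns to the dependent tables, filled via the old pkey.
ALTER TABLE vouchers ADD COLUMN orgid uuid;
UPDATE vouchers v
SET orgid = o.orgid
FROM organization o
WHERE o.orgcode = v.orgcode;
-- 3. Remove the old pkey references and their columns.
ALTER TABLE vouchers DROP COLUMN orgcode;  -- also drops its foreign key
-- 4. Make the uuid column the pkey.
ALTER TABLE organization
    DROP CONSTRAINT organization_pkey,       -- the old serial PK
    DROP CONSTRAINT organization_orgid_key,  -- the step-1 UNIQUE
    ADD PRIMARY KEY (orgid);
-- 5. Remove the old pkey column, then point the dependents at the new key.
ALTER TABLE organization DROP COLUMN orgcode;
ALTER TABLE vouchers
    ADD FOREIGN KEY (orgid) REFERENCES organization (orgid);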
--
Pankaj Jangid