combining semi-duplicate rows
Hello,
I have a table of rows which partially duplicate one another and need to
be merged and moved into a table with a primary key.
As an additional complication, some of the duplicates contain different
information, ex.:
schema1.datatable: key1 None None 3 4
schema2.datatable: key1 1 2 7 None
desired result:
schema1.datatable: key1 1 2 3 4
I looked for a specific function that would do this sort of merge and
failed. So I tried:
UPDATE schema1.datatable SET schema1.datatable.field1 =
schema2.datatable.field1 FROM schema2.datatable
WHERE schema2.datatable.keyfield = schema1.datatable.keyfield
AND schema1.datatable.field1 = None;
Which is suboptimal because I'd need a command for each field, but it
would be a start. However, the schema names are not recognised. I get
error messages to the effect that "cross-database references are not
implemented" or "relation "schema1/2" does not exist.Even the much simpler
SELECT DISTINCT schema2.datatable INTO schema1.datatable;
...gives me these messages. Qualifying right up to the database level
produces "improper qualified name (too many dotted names)".
I'm pretty sure that this isn't a capitalization/quoting problem as
described in the FAQ. Is it not possible to use these functions between
schemas? Or am I misusing the functions in a more basic way?
The problem is somewhat similar to this one:
http://archives.postgresql.org/pgsql-sql/2007-02/msg00055.php
Namely,
Regards,
H.Jenkins
"hjenkins" <hjenkins@uvic.ca> writes:
So I tried:
UPDATE schema1.datatable SET schema1.datatable.field1 =
schema2.datatable.field1 FROM schema2.datatable
WHERE schema2.datatable.keyfield = schema1.datatable.keyfield
AND schema1.datatable.field1 = None;
Which is suboptimal because I'd need a command for each field, but it
would be a start. However, the schema names are not recognised. I get
error messages to the effect that "cross-database references are not
implemented" or "relation "schema1/2" does not exist.
The target column of a SET clause can't be qualified with the relation
name; it would introduce ambiguity in the case of composite-type fields,
and it's useless anyway since the target relation was already given.
Your example works for me (syntactically at least) as
regression=# UPDATE schema1.datatable SET field1 =
schema2.datatable.field1 FROM schema2.datatable
WHERE schema2.datatable.keyfield = schema1.datatable.keyfield
AND schema1.datatable.field1 = 'None';
UPDATE 0
Personally, though, I'd use some aliases to improve readability and
forestall the onset of carpal tunnel syndrome:
regression=# UPDATE schema1.datatable t SET field1 =
s.field1 FROM schema2.datatable s
WHERE s.keyfield = t.keyfield
AND t.field1 = 'None';
UPDATE 0
regards, tom lane