Limitting full join to one match
Dear Experts,
I have a couple of tables that I want to reconcile, finding rows
that match and places where rows are missing from one table or the
other:
db=> select * from a;
+------------+--------+
| date | amount |
+------------+--------+
| 2018-01-01 | 10.00 |
| 2018-02-01 | 5.00 | <-- missing from b
| 2018-04-01 | 5.00 |
+------------+--------+
db=> select * from b;
+------------+--------+
| date | amount |
+------------+--------+
| 2018-01-01 | 10.00 |
| 2018-03-01 | 8.00 | <-- missing from a
| 2018-04-01 | 5.00 |
+------------+--------+
db=> select a.date, a.amount, b.date, b.amount from a full join b using (date,amount);
+------------+--------+------------+--------+
| date | amount | date | amount |
+------------+--------+------------+--------+
| 2018-01-01 | 10.00 | 2018-01-01 | 10.00 |
| 2018-02-01 | 5.00 | | |
| | | 2018-03-01 | 8.00 |
| 2018-04-01 | 5.00 | 2018-04-01 | 5.00 |
+------------+--------+------------+--------+
This works fine until I have multiple items with the same date
and amount:
db=> select * from a;
+------------+--------+
| date | amount |
+------------+--------+
| 2018-01-01 | 10.00 |
| 2018-02-01 | 5.00 |
| 2018-04-01 | 5.00 |
| 2018-05-01 | 20.00 | <--
| 2018-05-01 | 20.00 | <--
+------------+--------+
db=> select * from b;
+------------+--------+
| date | amount |
+------------+--------+
| 2018-01-01 | 10.00 |
| 2018-03-01 | 8.00 |
| 2018-04-01 | 5.00 |
| 2018-05-01 | 20.00 | <--
| 2018-05-01 | 20.00 | <--
+------------+--------+
db=> select a.date, a.amount, b.date, b.amount from a full join b using (date,amount);
+------------+--------+------------+--------+
| date | amount | date | amount |
+------------+--------+------------+--------+
| 2018-01-01 | 10.00 | 2018-01-01 | 10.00 |
| 2018-02-01 | 5.00 | | |
| | | 2018-03-01 | 8.00 |
| 2018-04-01 | 5.00 | 2018-04-01 | 5.00 |
| 2018-05-01 | 20.00 | 2018-05-01 | 20.00 | 1
| 2018-05-01 | 20.00 | 2018-05-01 | 20.00 | 2
| 2018-05-01 | 20.00 | 2018-05-01 | 20.00 | 3
| 2018-05-01 | 20.00 | 2018-05-01 | 20.00 | 4
+------------+--------+------------+--------+
It has, of course, put four rows in the output for the new items.
So my question is: how can I modify my query to output only two rows,
like this:?
+------------+--------+------------+--------+
| date | amount | date | amount |
+------------+--------+------------+--------+
| 2018-01-01 | 10.00 | 2018-01-01 | 10.00 |
| 2018-02-01 | 5.00 | | |
| | | 2018-03-01 | 8.00 |
| 2018-04-01 | 5.00 | 2018-04-01 | 5.00 |
| 2018-05-01 | 20.00 | 2018-05-01 | 20.00 | 1
| 2018-05-01 | 20.00 | 2018-05-01 | 20.00 | 2
+------------+--------+------------+--------+
Any suggestions anyone?
The best I have found so far is something involving EXCEPT ALL:
db=> select * from a except all select * from b;
db=> select * from b except all select * from a;
That's not ideal, though, as what I ultimately want is something
that lists everything with its status:
+------------+--------+--------+
| date | amount | status |
+------------+--------+--------+
| 2018-01-01 | 10.00 | OK |
| 2018-02-01 | 5.00 | a_only |
| 2018-03-01 | 8.00 | b_only |
| 2018-04-01 | 5.00 | OK |
| 2018-05-01 | 20.00 | OK |
| 2018-05-01 | 20.00 | OK |
+------------+--------+--------+
That would be easy enough to achieve from the JOIN.
Thanks, Phil.
On Wed, Dec 5, 2018 at 4:34 PM Phil Endecott <
spam_from_pgsql_lists@chezphil.org> wrote:
Dear Experts,
I have a couple of tables that I want to reconcile, finding rows
that match and places where rows are missing from one table or the
other:...
So my question is: how can I modify my query to output only two rows,
like this:?+------------+--------+------------+--------+ | date | amount | date | amount | +------------+--------+------------+--------+ | 2018-01-01 | 10.00 | 2018-01-01 | 10.00 | | 2018-02-01 | 5.00 | | | | | | 2018-03-01 | 8.00 | | 2018-04-01 | 5.00 | 2018-04-01 | 5.00 | | 2018-05-01 | 20.00 | 2018-05-01 | 20.00 | 1 | 2018-05-01 | 20.00 | 2018-05-01 | 20.00 | 2 +------------+--------+------------+--------+
Evening Phil,
Window functions are your friend here. I prefer views for this stuff - but
subqueries would work just fine.
create view a_rows as (select *,
row_number() OVER (PARTITION BY date, amount) AS pos
from a);
create view b_rows as (select *,
row_number() OVER (PARTITION BY date, amount) AS pos
from b);
select
a_rows.date,
a_rows.amount,
a_rows.pos,
b_rows.date,
b_rows.amount,
b_rows.pos
from
a_rows full join b_rows using (date,amount,pos);
Example here - http://sqlfiddle.com/#!17/305d6/3
John
Show quoted text
Any suggestions anyone?
The best I have found so far is something involving EXCEPT ALL:
db=> select * from a except all select * from b;
db=> select * from b except all select * from a;That's not ideal, though, as what I ultimately want is something
that lists everything with its status:+------------+--------+--------+
| date | amount | status |
+------------+--------+--------+
| 2018-01-01 | 10.00 | OK |
| 2018-02-01 | 5.00 | a_only |
| 2018-03-01 | 8.00 | b_only |
| 2018-04-01 | 5.00 | OK |
| 2018-05-01 | 20.00 | OK |
| 2018-05-01 | 20.00 | OK |
+------------+--------+--------+That would be easy enough to achieve from the JOIN.
Thanks, Phil.
On 12/5/2018 8:30 PM, John W Higgins wrote:
On Wed, Dec 5, 2018 at 4:34 PM Phil Endecott
<spam_from_pgsql_lists@chezphil.org
<mailto:spam_from_pgsql_lists@chezphil.org>> wrote:Dear Experts,
I have a couple of tables that I want to reconcile, finding rows
that match and places where rows are missing from one table or the
other:...
So my question is: how can I modify my query to output only two rows,
like this:?+------------+--------+------------+--------+ | date | amount | date | amount | +------------+--------+------------+--------+ | 2018-01-01 | 10.00 | 2018-01-01 | 10.00 | | 2018-02-01 | 5.00 | | | | | | 2018-03-01 | 8.00 | | 2018-04-01 | 5.00 | 2018-04-01 | 5.00 | | 2018-05-01 | 20.00 | 2018-05-01 | 20.00 | 1 | 2018-05-01 | 20.00 | 2018-05-01 | 20.00 | 2 +------------+--------+------------+--------+Evening Phil,
Window functions are your friend here. I prefer views for this stuff -
but subqueries would work just fine.create view a_rows as (select *,
row_number() OVER (PARTITION BY date, amount)
AS pos from a);
create view b_rows as (select *,
row_number() OVER (PARTITION BY date, amount)
AS pos from b);select
a_rows.date,
a_rows.amount,
a_rows.pos,
b_rows.date,
b_rows.amount,
b_rows.pos
from
a_rows full join b_rows using (date,amount,pos);Example here - http://sqlfiddle.com/#!17/305d6/3
John
Any suggestions anyone?
The best I have found so far is something involving EXCEPT ALL:
db=> select * from a except all select * from b;
db=> select * from b except all select * from a;That's not ideal, though, as what I ultimately want is something
that lists everything with its status:+------------+--------+--------+
| date | amount | status |
+------------+--------+--------+
| 2018-01-01 | 10.00 | OK |
| 2018-02-01 | 5.00 | a_only |
| 2018-03-01 | 8.00 | b_only |
| 2018-04-01 | 5.00 | OK |
| 2018-05-01 | 20.00 | OK |
| 2018-05-01 | 20.00 | OK |
+------------+--------+--------+That would be easy enough to achieve from the JOIN.
Thanks, Phil.
This question is always asked time to time.
I have found an old article with so far the best solution for big tables.
https://asktom.oracle.com/pls/asktom/f?p=100:11:::::P11_QUESTION_ID:2151582681236#15393095283923
On the same test data
create table a (date date, amount money);
create table b (date date, amount money);
insert into a values ('2018-01-01', 10);
insert into a values ('2018-02-01', 5);
insert into a values ('2018-04-01', 5);
insert into a values ('2018-05-01', 20);
insert into a values ('2018-05-01', 20);
insert into b values ('2018-01-01', 10);
insert into b values ('2018-03-01', 8);
insert into b values ('2018-04-01', 5);
insert into b values ('2018-05-01', 20);
insert into b values ('2018-05-01', 20);
select tt.date,
tt.amount,
count(tt.src1) CNT1,
count(tt.src2) CNT2
from
(
select a.date,
a.amount,
1 src1,
null::integer src2
from a
union all
select b.date,
b.amount,
null::integer src1,
2 src2
from b
) tt
group by tt.date, tt.amount;
date amount cnt1 cnt2
2018-01-01 $10.00 1 1
2018-02-01 $5.00 1 0
2018-03-01 $8.00 0 1
2018-04-01 $5.00 1 1
2018-05-01 $20.00 2 2
It requires a sort, so you may want to increase work_mem before
execution, and then return it back like
SET work_mem = '512MB';
... run your query
RESET work_mem;
Regards,
Sergei Agalakov
On 12/05/2018 06:34 PM, Phil Endecott wrote:
Dear Experts,
I have a couple of tables that I want to reconcile, finding rows
that match and places where rows are missing from one table or the
other:db=> select * from a;
+------------+--------+
| date | amount |
+------------+--------+
| 2018-01-01 | 10.00 |
| 2018-02-01 | 5.00 | <-- missing from b
| 2018-04-01 | 5.00 |
+------------+--------+db=> select * from b;
+------------+--------+
| date | amount |
+------------+--------+
| 2018-01-01 | 10.00 |
| 2018-03-01 | 8.00 | <-- missing from a
| 2018-04-01 | 5.00 |
+------------+--------+db=> select a.date, a.amount, b.date, b.amount from a full join b using (date,amount); +------------+--------+------------+--------+ | date | amount | date | amount | +------------+--------+------------+--------+ | 2018-01-01 | 10.00 | 2018-01-01 | 10.00 | | 2018-02-01 | 5.00 | | | | | | 2018-03-01 | 8.00 | | 2018-04-01 | 5.00 | 2018-04-01 | 5.00 | +------------+--------+------------+--------+This works fine until I have multiple items with the same date
and amount:db=> select * from a;
+------------+--------+
| date | amount |
+------------+--------+
| 2018-01-01 | 10.00 |
| 2018-02-01 | 5.00 |
| 2018-04-01 | 5.00 |
| 2018-05-01 | 20.00 | <--
| 2018-05-01 | 20.00 | <--
+------------+--------+db=> select * from b;
+------------+--------+
| date | amount |
+------------+--------+
| 2018-01-01 | 10.00 |
| 2018-03-01 | 8.00 |
| 2018-04-01 | 5.00 |
| 2018-05-01 | 20.00 | <--
| 2018-05-01 | 20.00 | <--
+------------+--------+
What's your PK on "a" and "b"?
(Also, gmail seems to think that all -- or at least most -- of your email is
spam.)
db=> select a.date, a.amount, b.date, b.amount from a full join b using (date,amount); +------------+--------+------------+--------+ | date | amount | date | amount | +------------+--------+------------+--------+ | 2018-01-01 | 10.00 | 2018-01-01 | 10.00 | | 2018-02-01 | 5.00 | | | | | | 2018-03-01 | 8.00 | | 2018-04-01 | 5.00 | 2018-04-01 | 5.00 | | 2018-05-01 | 20.00 | 2018-05-01 | 20.00 | 1 | 2018-05-01 | 20.00 | 2018-05-01 | 20.00 | 2 | 2018-05-01 | 20.00 | 2018-05-01 | 20.00 | 3 | 2018-05-01 | 20.00 | 2018-05-01 | 20.00 | 4 +------------+--------+------------+--------+It has, of course, put four rows in the output for the new items.
So my question is: how can I modify my query to output only two rows,
like this:?+------------+--------+------------+--------+ | date | amount | date | amount | +------------+--------+------------+--------+ | 2018-01-01 | 10.00 | 2018-01-01 | 10.00 | | 2018-02-01 | 5.00 | | | | | | 2018-03-01 | 8.00 | | 2018-04-01 | 5.00 | 2018-04-01 | 5.00 | | 2018-05-01 | 20.00 | 2018-05-01 | 20.00 | 1 | 2018-05-01 | 20.00 | 2018-05-01 | 20.00 | 2 +------------+--------+------------+--------+Any suggestions anyone?
The best I have found so far is something involving EXCEPT ALL:
db=> select * from a except all select * from b;
db=> select * from b except all select * from a;That's not ideal, though, as what I ultimately want is something
that lists everything with its status:+------------+--------+--------+
| date | amount | status |
+------------+--------+--------+
| 2018-01-01 | 10.00 | OK |
| 2018-02-01 | 5.00 | a_only |
| 2018-03-01 | 8.00 | b_only |
| 2018-04-01 | 5.00 | OK |
| 2018-05-01 | 20.00 | OK |
| 2018-05-01 | 20.00 | OK |
+------------+--------+--------+That would be easy enough to achieve from the JOIN.
Thanks, Phil.
--
Angular momentum makes the world go 'round.
John W Higgins wrote:
On Wed, Dec 5, 2018 at 4:34 PM Phil Endecott <
spam_from_pgsql_lists@chezphil.org> wrote:Dear Experts,
I have a couple of tables that I want to reconcile, finding rows
that match and places where rows are missing from one table or the
other:...
So my question is: how can I modify my query to output only two rows,
like this:?+------------+--------+------------+--------+ | date | amount | date | amount | +------------+--------+------------+--------+ | 2018-01-01 | 10.00 | 2018-01-01 | 10.00 | | 2018-02-01 | 5.00 | | | | | | 2018-03-01 | 8.00 | | 2018-04-01 | 5.00 | 2018-04-01 | 5.00 | | 2018-05-01 | 20.00 | 2018-05-01 | 20.00 | 1 | 2018-05-01 | 20.00 | 2018-05-01 | 20.00 | 2 +------------+--------+------------+--------+Evening Phil,
Window functions are your friend here. I prefer views for this stuff - but
subqueries would work just fine.create view a_rows as (select *,
row_number() OVER (PARTITION BY date, amount) AS pos
from a);
create view b_rows as (select *,
row_number() OVER (PARTITION BY date, amount) AS pos
from b);select
a_rows.date,
a_rows.amount,
a_rows.pos,
b_rows.date,
b_rows.amount,
b_rows.pos
from
a_rows full join b_rows using (date,amount,pos);
Thanks John, that's great. I'm a little surprised that there isn't an
easier way, but this certainly works.
Regard, Phil.
Hi Ron,
Ron wrote:
On 12/05/2018 06:34 PM, Phil Endecott wrote:
Dear Experts,
I have a couple of tables that I want to reconcile, finding rows
that match and places where rows are missing from one table or the
other:db=> select * from a;
+------------+--------+
| date | amount |
+------------+--------+
| 2018-01-01 | 10.00 |
| 2018-02-01 | 5.00 | <-- missing from b
| 2018-04-01 | 5.00 |
+------------+--------+db=> select * from b;
+------------+--------+
| date | amount |
+------------+--------+
| 2018-01-01 | 10.00 |
| 2018-03-01 | 8.00 | <-- missing from a
| 2018-04-01 | 5.00 |
+------------+--------+db=> select a.date, a.amount, b.date, b.amount from a full join b using (date,amount); +------------+--------+------------+--------+ | date | amount | date | amount | +------------+--------+------------+--------+ | 2018-01-01 | 10.00 | 2018-01-01 | 10.00 | | 2018-02-01 | 5.00 | | | | | | 2018-03-01 | 8.00 | | 2018-04-01 | 5.00 | 2018-04-01 | 5.00 | +------------+--------+------------+--------+This works fine until I have multiple items with the same date
and amount:db=> select * from a;
+------------+--------+
| date | amount |
+------------+--------+
| 2018-01-01 | 10.00 |
| 2018-02-01 | 5.00 |
| 2018-04-01 | 5.00 |
| 2018-05-01 | 20.00 | <--
| 2018-05-01 | 20.00 | <--
+------------+--------+db=> select * from b;
+------------+--------+
| date | amount |
+------------+--------+
| 2018-01-01 | 10.00 |
| 2018-03-01 | 8.00 |
| 2018-04-01 | 5.00 |
| 2018-05-01 | 20.00 | <--
| 2018-05-01 | 20.00 | <--
+------------+--------+What's your PK on "a" and "b"?
These input tables can have duplicate rows, so defining a primary key
requires something like a row ID or similar.
(Also, gmail seems to think that all -- or at least most -- of your email is
spam.)
Yes, it is becoming increasingly difficult to persuade gmail etc. that
you are not a spammer if you run your own mail server. If you have any
interesting headers suggesting exactly what they disliked about my message,
could you please forward them off-list? Thanks.
Regards, Phil.
Yes, it is becoming increasingly difficult to persuade gmail etc. that> you are not a spammer if you run your own mail server. If you
have any> interesting headers suggesting exactly what they disliked about my
message,> could you please forward them off-list? Thanks.
It is for this reason (and few others) I am off gmail and other free
email accounts.I have tried gmail / outlook / yahoo / aol and all of them mark many
mails to this listas spam.
I like fastmail a lot and $3 per month is practically free.