filter duplicates by priority

Started by Clark Slaterover 16 years ago6 messagesgeneral
Jump to latest
#1Clark Slater
pg@slatech.com

Hello-

I am trying to use DISTINCT ON to filter out *potential* duplicate values
from a set of sub queries. There are certain cases where there can be
repetitive part numbers that are priced differently. I'm trying to start
with the full list, ordered by priority, and then remove any repeats that
have a lesser priority.

SELECT DISTINCT ON (part_number) * FROM (
SELECT part_number, priority FROM ...
UNION ALL
SELECT part_number, priority FROM ...
UNION ALL
SELECT part_number, priority FROM ...
) AS filter_duplicates ORDER BY priority,part_number

The above statement does not work because if I ORDER BY
priority,part_number then I have to DISTINCT ON (priority,part_number).
But DISTINCT ON (priority, part_number) does not remove the repeated rows
because the same part_number with a different priority becomes a distinct
tuple.

Any suggestions are appreciated.

--------------------------------------

A more detailed explanation of my problem follows:

I am working on an e-commerce system that has different lists of products
which contain many of the same products, at different prices. When a user
searches for a certain set of part numbers, I would like the resulting
products (and prices) to come from one of the lists, according to the
list's priority. Each user can have a different set of lists and
priorities.

Table: product_lists
id | name | priority | user_id
-----+------------------------------+----------+----------
5 | General List of Products | 2 | 23
3 | Different List of Products | 3 | 23
150 | Customer-Specific Products | 1 | 23

Table: products
product_list_id | part_number | price
-----------------+-------------+--------
3 | 92298A | 123.38
5 | 92298A | 111.04
3 | C39207 | 78.38
150 | C39207 | 67.93

Below is a simplified example of the structure of the query I am working
with. I realize that in this case, I could re-factor all of this into one
statement, but each sub-query in the real case has a more complex set of
joins that determines the price. The pricing joins from one sub-query to
the next vary, so a collection of sub-queries seemed to be a logical
solution. Some part numbers are found in only one of the lists, while
other part numbers are repeated across lists at different prices.

This is what I would *like* to say:

SELECT DISTINCT ON (part_number) * FROM (

SELECT
product_list_id,part_number,price,priority
FROM products, product_lists
WHERE
product_list_id=product_lists.id
AND product_list_id=150
AND (part_number='92298A' OR part_number='C39207' OR part_number=...)

UNION ALL

SELECT
product_list_id,part_number,price,priority
FROM products, product_lists
WHERE
product_list_id= product_lists.id
AND product_list_id=5
AND (part_number='92298A' OR part_number='C39207' OR part_number=...)

UNION ALL

SELECT
product_list_id,part_number,price,priority
FROM products, product_lists
WHERE
product_list_id= product_lists.id
AND product_list_id=3
AND (part_number='92298A' OR part_number='C39207' OR part_number=...)

) AS filter_duplicates ORDER BY priority,part_number

I need to ORDER BY priority so that, in the case of duplicates, the
product from the desired list is returned first. Then the purpose of
DISTINCT ON is to filter out any duplicate part numbers that have a lesser
priority. But, the above statement fails because the DISTINCT ON
expression must match the leftmost ORDER BY expression. However,
inserting the priority into the DISTINCT ON expression means that all of
the resulting tuples are unique, even though the part_number is the same.

#2Hartman, Matthew
Matthew.Hartman@krcc.on.ca
In reply to: Clark Slater (#1)
Re: filter duplicates by priority

From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-
owner@postgresql.org] On Behalf Of Clark Slater
Sent: Tuesday, July 14, 2009 10:04 AM
Subject: [GENERAL] filter duplicates by priority

Maybe I'm missing something, but why not something like (incoming
pseudo-SQL):

Select part_number, max(priority)
From (
Select part_number, priority from TableAndCriteria
Union all
Select part_number, priority from TableAndCriteria
Union all
Select part_number, priority from TableAndCriteria
) as allTables
Group by part_number

Matthew Hartman
Programmer/Analyst
Information Management, ICP
Kingston General Hospital
.now.

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Clark Slater (#1)
Re: filter duplicates by priority

"Clark Slater" <pg@slatech.com> writes:

I am trying to use DISTINCT ON to filter out *potential* duplicate values
from a set of sub queries. There are certain cases where there can be
repetitive part numbers that are priced differently. I'm trying to start
with the full list, ordered by priority, and then remove any repeats that
have a lesser priority.

SELECT DISTINCT ON (part_number) * FROM (
SELECT part_number, priority FROM ...
UNION ALL
SELECT part_number, priority FROM ...
UNION ALL
SELECT part_number, priority FROM ...
) AS filter_duplicates ORDER BY priority,part_number

The above statement does not work because if I ORDER BY
priority,part_number then I have to DISTINCT ON (priority,part_number).
But DISTINCT ON (priority, part_number) does not remove the repeated rows
because the same part_number with a different priority becomes a distinct
tuple.

AFAICS, changing it to ORDER BY part_number,priority would solve the
stated problem. If you really need the final result in priority rather
than part number order, put the whole thing in a sub-select and re-sort
outside it.

regards, tom lane

#4Martijn van Oosterhout
kleptog@svana.org
In reply to: Clark Slater (#1)
Re: filter duplicates by priority

On Tue, Jul 14, 2009 at 10:04:12AM -0400, Clark Slater wrote:

Hello-

I am trying to use DISTINCT ON to filter out *potential* duplicate values
from a set of sub queries. There are certain cases where there can be
repetitive part numbers that are priced differently. I'm trying to start
with the full list, ordered by priority, and then remove any repeats that
have a lesser priority.

I think what you need to do is order by part_number first, do the
DISTINCT ON () and then do an ORDER BY priority around that.

Hope this helps,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/

Show quoted text

Please line up in a tree and maintain the heap invariant while
boarding. Thank you for flying nlogn airlines.

#5Sam Mason
sam@samason.me.uk
In reply to: Clark Slater (#1)
Re: filter duplicates by priority

On Tue, Jul 14, 2009 at 10:04:12AM -0400, Clark Slater wrote:

SELECT DISTINCT ON (part_number) * FROM (
SELECT part_number, priority FROM ...
UNION ALL
SELECT part_number, priority FROM ...
UNION ALL
SELECT part_number, priority FROM ...
) AS filter_duplicates ORDER BY priority,part_number

The above statement does not work because if I ORDER BY
priority,part_number then I have to DISTINCT ON (priority,part_number).
But DISTINCT ON (priority, part_number) does not remove the repeated rows
because the same part_number with a different priority becomes a distinct
tuple.

I think you just want to swap the ORDER BY columns around; i.e:

ORDER BY part_number, priority

--
Sam http://samason.me.uk/

#6Clark Slater
pg@slatech.com
In reply to: Tom Lane (#3)
Re: filter duplicates by priority

This solved my problem. Now why didn't I think of that!? Thank you very
much everybody. This list is an incredible resource.

-Clark

Show quoted text

AFAICS, changing it to ORDER BY part_number,priority would solve the
stated problem. If you really need the final result in priority rather
than part number order, put the whole thing in a sub-select and re-sort
outside it.

regards, tom lane

"Clark Slater" <pg@slatech.com> writes:

I am trying to use DISTINCT ON to filter out *potential* duplicate
values
from a set of sub queries. There are certain cases where there can be
repetitive part numbers that are priced differently. I'm trying to
start
with the full list, ordered by priority, and then remove any repeats
that
have a lesser priority.

SELECT DISTINCT ON (part_number) * FROM (
SELECT part_number, priority FROM ...
UNION ALL
SELECT part_number, priority FROM ...
UNION ALL
SELECT part_number, priority FROM ...
) AS filter_duplicates ORDER BY priority,part_number

The above statement does not work because if I ORDER BY
priority,part_number then I have to DISTINCT ON (priority,part_number).
But DISTINCT ON (priority, part_number) does not remove the repeated
rows
because the same part_number with a different priority becomes a
distinct
tuple.