Searching for big differences between values

Started by Durumdaraover 8 years ago5 messagesgeneral
Jump to latest
#1Durumdara
durumdara@gmail.com

Hello!

Somewhere the users made mistakes on prices (stock).

I need to search for big differences between values.
For example:

20
21
21,5
30
28
..
46392 <-
46392 <-

But it could be:

42300
43100
44000
43800
65000 <-
42100

Human eye could locate these values, but there we need to check 30.000
articles and 450.000 values.

Do you have any idea, how to this with SQL?

In most cases the first values are ok, the second interval (after a date)
it could be wrong...

I don't know how to define the solution, but I think PGSQL have intelligent
solution for this problem.

We need to search for elements have bigger value like base price *
tolerance.
But the base price is calculated dynamically from the lower values... The
tolerance is lower on highest base prices.

Thank you for any help!

Best regards
dd

#2Chris Travers
chris.travers@gmail.com
In reply to: Durumdara (#1)
Re: Searching for big differences between values

On Nov 30, 2017 08:35, "Durumdara" <durumdara@gmail.com> wrote:

Hello!

Somewhere the users made mistakes on prices (stock).

I need to search for big differences between values.
For example:

20
21
21,5
30
28
..
46392 <-
46392 <-

But it could be:

42300
43100
44000
43800
65000 <-
42100

Human eye could locate these values, but there we need to check 30.000
articles and 450.000 values.

Do you have any idea, how to this with SQL?

In most cases the first values are ok, the second interval (after a date)
it could be wrong...

I don't know how to define the solution, but I think PGSQL have intelligent
solution for this problem.

We need to search for elements have bigger value like base price *
tolerance.
But the base price is calculated dynamically from the lower values... The
tolerance is lower on highest base prices.

The good news is relational dbs excel at performing. However you have not
provided nearly enough info to help write a query. How are such things
calculated?

Thank you for any help!

Best regards
dd

#3Chris Mair
chris@1006.org
In reply to: Durumdara (#1)
Re: Searching for big differences between values

I need to search for big differences between values.

[...]

Hi,

from an SQL point of view this is not difficult, but you need to
carefully define a criteria for the outliers.

For example, to find values that are more than a standard deviation
away from the mean, do something like this:

chris=# select * from val;
x
-------
20
21
21.5
30
28
46392
46393
40
(8 rows)

chris=# select * from val where x > (select avg(x) + stddev(x) from val) or x < (select avg(x) - stddev(x) from val);
x
-------
46392
46393
(2 rows)

Try with n*stddev(x) for n = 2, 3, 4, 5, ... to see
if you can get to your outliers...

Bye,
Chris.

#4Rory Campbell-Lange
rory@campbell-lange.net
In reply to: Durumdara (#1)
Re: Searching for big differences between values

On 30/11/17, Durumdara (durumdara@gmail.com) wrote:

Somewhere the users made mistakes on prices (stock).

I need to search for big differences between values.
For example:

20
21
21,5
30
28
..
46392 <-
46392 <-

You could use window functions
https://www.postgresql.org/docs/current/static/functions-window.html

Eg
costings=> create table vals (num integer);

costings=> insert into vals values (20), (21), (30), (28), (46392);

costings=> select num, num - lag(num, 1, num) over () as diff from vals
order by num;
num | diff
-------+-------
20 | 0
21 | 1
28 | -2
30 | 9
46392 | 46364
(5 rows)

Although you might want to use averaging or percentages to throw up
errors instead.

#5Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Durumdara (#1)
Re: Searching for big differences between values

Durumdara wrote:

Somewhere the users made mistakes on prices (stock).

I need to search for big differences between values.
For example:

20
21
21,5
30
28
..
46392 <-
46392 <-

But it could be:

42300
43100
44000
43800
65000 <-
42100

Human eye could locate these values, but there we need to check 30.000 articles and 450.000 values.

Do you have any idea, how to this with SQL?

You could use some variant of

SELECT id, price
FROM (SELECT id, price,
avg(price) OVER (ORDER BY id ROWS BETWEEN 3 PRECEDING AND 3 FOLLOWING) AS av
FROM stock) q
WHERE NOT (price BETWEEN 0.8 * av AND 1.25 * av);

Yours,
Laurenz Albe