Refining query statement

Started by Rich Shepardover 7 years ago33 messagesgeneral
Jump to latest
#1Rich Shepard
rshepard@appl-ecosys.com

Working with my sales/client management system using psql I have a select
statement to identify contacts to be made. This statement works:

select (C.contact_id, C.lname, C.fname, C.direct_phone, O.org_name,
A.next_contact)
from Contacts as C, Organizations as O, Activities as A
where C.org_id = O.org_id and C.contact_id = A.contact_id and
A.next_contact <= 'today' and A.next_contact > '2018-12-31' and
A.next_contact is not null;

but would benefit from tweaking. When I have had multiple contacts with
someone I want only the most recent one displayed, not all, and they should
be no more ancient than a defined period (e.g., a month).

I want to learn how to make this query cleaner and more flexible. When I
write the UI for this I want to be able to specify a data range in addition
to a fixed 'today'. Pointers on what to read will be very helpful.

TIA,

Rich

#2Thomas Kellerer
spam_eater@gmx.net
In reply to: Rich Shepard (#1)
Re: Refining query statement

Rich Shepard schrieb am 15.01.2019 um 16:39:

  Working with my sales/client management system using psql I have a select
statement to identify contacts to be made. This statement works:

select (C.contact_id, C.lname, C.fname, C.direct_phone, O.org_name, A.next_contact)
from Contacts as C, Organizations as O, Activities as A
where C.org_id = O.org_id and C.contact_id = A.contact_id and
      A.next_contact <= 'today' and A.next_contact > '2018-12-31' and
      A.next_contact is not null;

but would benefit from tweaking. When I have had multiple contacts with
someone I want only the most recent one displayed, not all, and they should
be no more ancient than a defined period (e.g., a month).

  I want to learn how to make this query cleaner and more flexible. When I
write the UI for this I want to be able to specify a data range in addition
to a fixed 'today'. Pointers on what to read will be very helpful.

With regards to "cleaner": the first thing to do is to remove the parentheses around the column list.
In Postgres "(a,b,c)" creates a single column with an anonymous record type (that contains three fields), rather than selecting three columns.
In other DBMS those parentheses are simply useless.

"cleaner" is always subjective, but I find explicit JOIN operators a lot cleaner than the old implicit joins.

The condition "A.next_contact is not null" is actually no necessary because you already have a condition on that column, so NULL values won't be returned anyway.

To get the "most recent one" in Postgres, DISTINCT ON () is usually the best way to do it:

So we end up with something like this:

select distinct on (C.contact_id) C.contact_id, C.lname, C.fname, C.direct_phone, O.org_name, A.next_contact
from Contacts as C
join Organizations as O on C.org_id = O.org_id
join Activities as A on C.contact_id = A.contact_id
where A.next_contact <= 'today'
and A.next_contact > '2018-12-31'
order by c.contact_id, a.next_contact DESC;

#3Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Rich Shepard (#1)
Re: Refining query statement

On 1/15/19 7:39 AM, Rich Shepard wrote:

  Working with my sales/client management system using psql I have a
select
statement to identify contacts to be made. This statement works:

select (C.contact_id, C.lname, C.fname, C.direct_phone, O.org_name,
A.next_contact)
from Contacts as C, Organizations as O, Activities as A
where C.org_id = O.org_id and C.contact_id = A.contact_id and
      A.next_contact <= 'today' and A.next_contact > '2018-12-31' and
      A.next_contact is not null;

but would benefit from tweaking. When I have had multiple contacts with
someone I want only the most recent one displayed, not all, and they should
be no more ancient than a defined period (e.g., a month).

  I want to learn how to make this query cleaner and more flexible. When I
write the UI for this I want to be able to specify a data range in addition
to a fixed 'today'. Pointers on what to read will be very helpful.

For the above I could see using a datepicker widget that allows for
multidate select. The specifics would depend on the software you are
using to write the UI.

TIA,

Rich

--
Adrian Klaver
adrian.klaver@aklaver.com

#4Ron
ronljohnsonjr@gmail.com
In reply to: Thomas Kellerer (#2)
Re: Refining query statement

On 1/15/19 9:47 AM, Thomas Kellerer wrote:

Rich Shepard schrieb am 15.01.2019 um 16:39:

  Working with my sales/client management system using psql I have a select
statement to identify contacts to be made. This statement works:

select (C.contact_id, C.lname, C.fname, C.direct_phone, O.org_name, A.next_contact)
from Contacts as C, Organizations as O, Activities as A
where C.org_id = O.org_id and C.contact_id = A.contact_id and
      A.next_contact <= 'today' and A.next_contact > '2018-12-31' and
      A.next_contact is not null;

but would benefit from tweaking. When I have had multiple contacts with
someone I want only the most recent one displayed, not all, and they should
be no more ancient than a defined period (e.g., a month).

  I want to learn how to make this query cleaner and more flexible. When I
write the UI for this I want to be able to specify a data range in addition
to a fixed 'today'. Pointers on what to read will be very helpful.

With regards to "cleaner": the first thing to do is to remove the parentheses around the column list.
In Postgres "(a,b,c)" creates a single column with an anonymous record type (that contains three fields), rather than selecting three columns.
In other DBMS those parentheses are simply useless.

"cleaner" is always subjective, but I find explicit JOIN operators a lot cleaner than the old implicit joins.

The condition "A.next_contact is not null" is actually no necessary because you already have a condition on that column, so NULL values won't be returned anyway.

To get the "most recent one" in Postgres, DISTINCT ON () is usually the best way to do it:

So we end up with something like this:

select distinct on (C.contact_id) C.contact_id, C.lname, C.fname, C.direct_phone, O.org_name, A.next_contact
from Contacts as C
join Organizations as O on C.org_id = O.org_id
join Activities as A on C.contact_id = A.contact_id
where A.next_contact <= 'today'
and A.next_contact > '2018-12-31'
order by c.contact_id, a.next_contact DESC;

And I've never liked this method (though I'm old and crotchety)....

--
Angular momentum makes the world go 'round.

#5Rich Shepard
rshepard@appl-ecosys.com
In reply to: Adrian Klaver (#3)
Re: Refining query statement

On Tue, 15 Jan 2019, Adrian Klaver wrote:

For the above I could see using a datepicker widget that allows for
multidate select. The specifics would depend on the software you are using
to write the UI.

Adrian,

I'm using wxPython4, and I will use a calendar with that. But, is there a
way to enhance the psql version, too?

Thanks,

Rich

#6Rich Shepard
rshepard@appl-ecosys.com
In reply to: Thomas Kellerer (#2)
Re: Refining query statement

On Tue, 15 Jan 2019, Thomas Kellerer wrote:

With regards to "cleaner": the first thing to do is to remove the
parentheses around the column list. In Postgres "(a,b,c)" creates a single
column with an anonymous record type (that contains three fields), rather
than selecting three columns. In other DBMS those parentheses are simply
useless.

Thomas,

I should have seen that myself. Thanks for pointing out the obvious to me.

"cleaner" is always subjective, but I find explicit JOIN operators a lot
cleaner than the old implicit joins.

This does make sense; I need to refresh my knowledge of JOIN operators and
will do so.

The condition "A.next_contact is not null" is actually no necessary
because you already have a condition on that column, so NULL values won't
be returned anyway.

True that.

To get the "most recent one" in Postgres, DISTINCT ON () is usually the
best way to do it: So we end up with something like this:

select distinct on (C.contact_id) C.contact_id, C.lname, C.fname, C.direct_phone, O.org_name, A.next_contact
from Contacts as C
join Organizations as O on C.org_id = O.org_id
join Activities as A on C.contact_id = A.contact_id
where A.next_contact <= 'today'
and A.next_contact > '2018-12-31'
order by c.contact_id, a.next_contact DESC;

This insight really helps. Now I know how to approach other queries.

Many thanks,

Rich

#7Rich Shepard
rshepard@appl-ecosys.com
In reply to: Ron (#4)
Re: Refining query statement

On Tue, 15 Jan 2019, Ron wrote:

And I've never liked this method (though I'm old and crotchety)....

Ron,

I'm older but not crotchety (most days), and I'm taking my SQL knowledge
beyone what I've used in the past. I would appreciate you're explaining why
you don't like the explicit JOINs presented by Thomas. The more I learn from
you more experienced SQL programmers the better I am.

Regards,

Rich

#8Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Rich Shepard (#5)
Re: Refining query statement

On 1/15/19 8:26 AM, Rich Shepard wrote:

On Tue, 15 Jan 2019, Adrian Klaver wrote:

For the above I could see using a datepicker widget that allows for
multidate select. The specifics would depend on the software you are
using
to write the UI.

Adrian,

  I'm using wxPython4, and I will use a calendar with that. But, is
there a
way to enhance the psql version, too?

Use BETWEEN?:
https://www.postgresql.org/docs/10/functions-comparison.html
a BETWEEN x AND y between

So:

next_contact BETWEEN '01/01/2019'::date AND '01/15/2019'::date

Or a range function:

https://www.postgresql.org/docs/10/functions-range.html

<@ element is contained by

Thanks,

Rich

--
Adrian Klaver
adrian.klaver@aklaver.com

#9Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Ron (#4)
Re: Refining query statement

On 1/15/19 8:02 AM, Ron wrote:

the best way to do it:

So we end up with something like this:

     select distinct on (C.contact_id) C.contact_id, C.lname, C.fname,
C.direct_phone, O.org_name, A.next_contact
     from Contacts as C
       join Organizations as O on C.org_id = O.org_id
       join Activities as A on C.contact_id = A.contact_id
     where A.next_contact <= 'today'
       and A.next_contact > '2018-12-31'
     order by c.contact_id, a.next_contact DESC;

And I've never liked this method (though I'm old and crotchety)....

What is the specific objection?

To me:

1) Plus side. It saves an explicit sub-query

2) Minus side. It is not portable. But then complete portability is a
unicorn in my opinion.

--
Adrian Klaver
adrian.klaver@aklaver.com

#10Rich Shepard
rshepard@appl-ecosys.com
In reply to: Adrian Klaver (#8)
Re: Refining query statement

On Tue, 15 Jan 2019, Adrian Klaver wrote:

Use BETWEEN?:
https://www.postgresql.org/docs/10/functions-comparison.html
a BETWEEN x AND y between

So:

next_contact BETWEEN '01/01/2019'::date AND '01/15/2019'::date

Or a range function:

https://www.postgresql.org/docs/10/functions-range.html

Adrian,

Ah, yes! I do need to refresh my SQL knowledge and am working on that.

Thanks very much again,

Rich

#11Thomas Kellerer
spam_eater@gmx.net
In reply to: Adrian Klaver (#9)
Re: Refining query statement

Adrian Klaver schrieb am 15.01.2019 um 17:44:

So we end up with something like this:

     select distinct on (C.contact_id) C.contact_id, C.lname, C.fname, C.direct_phone, O.org_name, A.next_contact
     from Contacts as C
       join Organizations as O on C.org_id = O.org_id
       join Activities as A on C.contact_id = A.contact_id
     where A.next_contact <= 'today'
       and A.next_contact > '2018-12-31'
     order by c.contact_id, a.next_contact DESC;

And I've never liked this method (though I'm old and crotchety)....

What is the specific objection?

To me:

1) Plus side. It saves an explicit sub-query

2) Minus side. It is not portable. But then complete portability is a unicorn in my opinion.

I think Ron was referring to the JOIN operator, rather than the DISTINCT ON

#12Rob Sargent
robjsargent@gmail.com
In reply to: Ron (#4)
Re: Refining query statement

On 1/15/19 9:02 AM, Ron wrote: select distinct on (C.contact_id)
C.contact_id, C.lname, C.fname, C.direct_phone, O.org_name, A.next_contact

     from Contacts as C
       join Organizations as O on C.org_id = O.org_id
       join Activities as A on C.contact_id = A.contact_id
     where A.next_contact <= 'today'
       and A.next_contact > '2018-12-31'
     order by c.contact_id, a.next_contact DESC;

And I've never liked this method (though I'm old and crotchety)....

I suspect I can match your old and crotchety, but jump in, the water is
fine (once you get used to it)

#13Rich Shepard
rshepard@appl-ecosys.com
In reply to: Thomas Kellerer (#2)
Re: Refining query statement

On Tue, 15 Jan 2019, Thomas Kellerer wrote:

select distinct on (C.contact_id) C.contact_id, C.lname, C.fname, C.direct_phone, O.org_name, A.next_contact
from Contacts as C
join Organizations as O on C.org_id = O.org_id
join Activities as A on C.contact_id = A.contact_id
where A.next_contact <= 'today'
and A.next_contact > '2018-12-31'
order by c.contact_id, a.next_contact DESC;

Combining this with Adrian's advice to use BETWEEN I have this statement
that almost works:

SELECT DISTINCT ON (c.contact_id) c.contact_id, c.lname, c.fname, c.direct_phone, o.org_name, a.next_contact
FROM Contacts AS c
JOIN Organizations AS o ON c.org_id = o.org_id
JOIN Activities AS a ON c.contact_id = a.contact_id
WHERE next_contact BETWEEN '01/01/2019'::date AND 'today'::date
ORDER BY c.contact_id, a.next_contact DESC;

It fails when the most recent next_contact column in Activities is NULL and
an earier row has a non-NULL date in the specified range.

I tried specifying max(a.next_contact) and added GROUP BY, but the result
set all returned o.org_name columns to the same one.

The WHERE clause needs to exclude a contact_id where the most current row in
Activities has NULL for the next_contact column. I've tried a few ideas but
none work so I need to learn the proper syntax, and I don't find that in
Rick van der Lans' or Joe Celko's books I have.

Looking forward to learning,

Rich

#14Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Rich Shepard (#13)
Re: Refining query statement

On 1/17/19 8:14 AM, Rich Shepard wrote:

On Tue, 15 Jan 2019, Thomas Kellerer wrote:

   select distinct on (C.contact_id) C.contact_id, C.lname, C.fname,
C.direct_phone, O.org_name, A.next_contact
   from Contacts as C
     join Organizations as O on C.org_id = O.org_id
     join Activities as A on C.contact_id = A.contact_id
   where A.next_contact <= 'today'
     and A.next_contact > '2018-12-31'
   order by c.contact_id, a.next_contact DESC;

Combining this with Adrian's advice to use BETWEEN I have this statement
that almost works:

SELECT DISTINCT ON (c.contact_id) c.contact_id, c.lname, c.fname,
c.direct_phone, o.org_name, a.next_contact
FROM Contacts AS c
  JOIN Organizations AS o ON c.org_id = o.org_id
  JOIN Activities AS a ON c.contact_id = a.contact_id WHERE
next_contact BETWEEN '01/01/2019'::date AND 'today'::date ORDER BY
c.contact_id, a.next_contact DESC;

It fails when the most recent next_contact column in Activities is NULL and
an earier row has a non-NULL date in the specified range.

I tried specifying max(a.next_contact) and added GROUP BY, but the result
set all returned o.org_name columns to the same one.

The WHERE clause needs to exclude a contact_id where the most current
row in
Activities has NULL for the next_contact column. I've tried a few ideas but
none work so I need to learn the proper syntax, and I don't find that in
Rick van der Lans' or Joe Celko's books I have.

?
...

WHERE
next_contact
BETWEEN
'01/01/2019'::date AND 'today'::date
AND
a.next_contact IS NOT NULL
ORDER BY
c.contact_id, a.next_contact DESC;

Looking forward to learning,

Rich

--
Adrian Klaver
adrian.klaver@aklaver.com

#15David G. Johnston
david.g.johnston@gmail.com
In reply to: Rich Shepard (#13)
Re: Refining query statement

On Thu, Jan 17, 2019 at 9:14 AM Rich Shepard <rshepard@appl-ecosys.com> wrote:

ORDER BY c.contact_id, a.next_contact DESC;

The WHERE clause needs to exclude a contact_id where the most current row in
Activities has NULL for the next_contact column.

Why is next_contact allowed to be null?

Your concept of "most current row" is strictly based upon next_contact
so if next_contact is null it is impossible to know whether it is the
most current row (consider that two activities could be missing the
next_contact date - which of those is the most current?)

David J.

#16Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Rich Shepard (#13)
Re: Refining query statement

On 1/17/19 8:14 AM, Rich Shepard wrote:

On Tue, 15 Jan 2019, Thomas Kellerer wrote:

   select distinct on (C.contact_id) C.contact_id, C.lname, C.fname,
C.direct_phone, O.org_name, A.next_contact
   from Contacts as C
     join Organizations as O on C.org_id = O.org_id
     join Activities as A on C.contact_id = A.contact_id
   where A.next_contact <= 'today'
     and A.next_contact > '2018-12-31'
   order by c.contact_id, a.next_contact DESC;

Combining this with Adrian's advice to use BETWEEN I have this statement
that almost works:

SELECT DISTINCT ON (c.contact_id) c.contact_id, c.lname, c.fname,
c.direct_phone, o.org_name, a.next_contact
FROM Contacts AS c
  JOIN Organizations AS o ON c.org_id = o.org_id
  JOIN Activities AS a ON c.contact_id = a.contact_id WHERE
next_contact BETWEEN '01/01/2019'::date AND 'today'::date ORDER BY
c.contact_id, a.next_contact DESC;

It fails when the most recent next_contact column in Activities is NULL and
an earier row has a non-NULL date in the specified range.

I tried specifying max(a.next_contact) and added GROUP BY, but the result
set all returned o.org_name columns to the same one.

The WHERE clause needs to exclude a contact_id where the most current
row in
Activities has NULL for the next_contact column. I've tried a few ideas but
none work so I need to learn the proper syntax, and I don't find that in
Rick van der Lans' or Joe Celko's books I have.

Got to thinking more and realized the answer depends on what you want
the query to produce. Can you let us know what is you are trying to pull
out with the query?

Looking forward to learning,

Rich

--
Adrian Klaver
adrian.klaver@aklaver.com

#17Rich Shepard
rshepard@appl-ecosys.com
In reply to: David G. Johnston (#15)
Re: Refining query statement

On Thu, 17 Jan 2019, David G. Johnston wrote:

Why is next_contact allowed to be null?

David,

There are a number of reasons. The prospect might have retired, told me to
pound sand, or has put off a decision.

Your concept of "most current row" is strictly based upon next_contact so
if next_contact is null it is impossible to know whether it is the most
current row (consider that two activities could be missing the
next_contact date - which of those is the most current?)

That's quite true. I realize I've been asking the question incorrectly.

What I want is a list of contacts to make today. This includes ones that
should have been made earlier but weren't and excludes earlier contacts that
have no scheduled next contact (therefore, the nulls.).

Does this clarify what I'm asking of the data?

Thanks,

Rich

#18Rich Shepard
rshepard@appl-ecosys.com
In reply to: Adrian Klaver (#16)
Re: Refining query statement

On Thu, 17 Jan 2019, Adrian Klaver wrote:

Got to thinking more and realized the answer depends on what you want the
query to produce. Can you let us know what is you are trying to pull out
with the query?

Adrian,

Certainly. Over breakfast I realized the same thing: the existing SELECT
query is not asking for the correct data.

As I replied in response to David's message, what I want is a list of
contacts to make today. This includes ones that should have been made
earlier but weren't and excludes earlier contacts that have no scheduled
next contact (therefore, the nulls.).

Thanks,

Rich

#19David G. Johnston
david.g.johnston@gmail.com
In reply to: Rich Shepard (#17)
Re: Refining query statement

On Thu, Jan 17, 2019 at 9:47 AM Rich Shepard <rshepard@appl-ecosys.com> wrote:

What I want is a list of contacts to make today. This includes ones that
should have been made earlier but weren't and excludes earlier contacts that
have no scheduled next contact (therefore, the nulls.).

Does this clarify what I'm asking of the data?

Yes...though now it just sounds like a flawed data model. How stuck
are you in that regard? Those "future" contacts should have their own
records and not be derived via an optional field on an existing
record.

Put differently, how do you know which activities are completed and
which are not?

Dave

#20Rich Shepard
rshepard@appl-ecosys.com
In reply to: David G. Johnston (#19)
Re: Refining query statement

On Thu, 17 Jan 2019, David G. Johnston wrote:

Yes...though now it just sounds like a flawed data model.

David,

This is what I thought.

How stuck are you in that regard? Those "future" contacts should have their
own records and not be derived via an optional field on an existing
record.

My goal is to make a functioning business tracking application for my
consulting services. Almost all my prior postgres databases hold
environmental data for statistical and spatio-temporal analyses so writing a
business application is a new experience for me and I want to get it
correct.

Put differently, how do you know which activities are completed and
which are not?

The direct answer is that a completed activity has a row with either a
future next-activity date or a null (which is the case when the status of
that organization or contact is 'no further contact'.)

I should rename the Contacts table as People and the Activities table as
Contacts. The original names came from a sales management system I used as a
design guide, but they're probably confusing to others as well as to me. :-)

I can provide my current schema (eight tables) to the list (perhaps as an
attachment), an individual, or put in on a cloud site and pass the URL.

Thanks,

Rich

#21Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Rich Shepard (#20)
#22David G. Johnston
david.g.johnston@gmail.com
In reply to: Rich Shepard (#20)
#23Rich Shepard
rshepard@appl-ecosys.com
In reply to: Adrian Klaver (#21)
#24Rich Shepard
rshepard@appl-ecosys.com
In reply to: Adrian Klaver (#21)
#25Rich Shepard
rshepard@appl-ecosys.com
In reply to: David G. Johnston (#22)
#26Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Rich Shepard (#24)
#27Rich Shepard
rshepard@appl-ecosys.com
In reply to: Adrian Klaver (#26)
#28Rich Shepard
rshepard@appl-ecosys.com
In reply to: Adrian Klaver (#16)
#29Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Rich Shepard (#28)
#30Rich Shepard
rshepard@appl-ecosys.com
In reply to: Adrian Klaver (#29)
#31David G. Johnston
david.g.johnston@gmail.com
In reply to: Rich Shepard (#28)
#32Rich Shepard
rshepard@appl-ecosys.com
In reply to: David G. Johnston (#31)
#33Rich Shepard
rshepard@appl-ecosys.com
In reply to: Rich Shepard (#1)