SQL group by help

Started by Chris Stephensover 5 years ago3 messagesgeneral

cstephens16@gmail.com

over 5 years ago

I'm trying to create a visual representation of a 6x8 grid of samples on a
rack using the following SQL format:

with rack_display as (
select sr.ts rack_ts
, sr.rack_id
, r.rack_barcode
, 1 as row_pos
, max(case when rack_well = 0 then 'A1: '||sample_barcode end) as
col1
, max(case when rack_well = 1 then 'A2: '||sample_barcode end) as
col2
, max(case when rack_well = 2 then 'A3: '||sample_barcode end) as
col3
, max(case when rack_well = 3 then 'A4: '||sample_barcode end) as
col4
, max(case when rack_well = 4 then 'A5: '||sample_barcode end) as
col5
, max(case when rack_well = 5 then 'A6: '||sample_barcode end) as
col6
from rack r
, sample_rack sr
, sample s
where r.rack_id = sr.rack_id
and sr.sample_id = s.sample_id
and sr.rack_well < 6
group by sr.ts, sr.rack_id, r.rack_barcode, row_pos
union all
select sr.ts rack_ts
, sr.rack_id
, r.rack_barcode
, 2 as row_pos
, max(case when rack_well = 6 then 'B1: '||sample_barcode end) as
col1
, max(case when rack_well = 7 then 'B2: '||sample_barcode end) as
col2
, max(case when rack_well = 8 then 'B3: '||sample_barcode end) as
col3
, max(case when rack_well = 9 then 'B4: '||sample_barcode end) as
col4
, max(case when rack_well = 10 then 'B5: '||sample_barcode end) as
col5
, max(case when rack_well = 11 then 'B6: '||sample_barcode end) as
col6
from rack r
, sample_rack sr
, sample s
where r.rack_id = sr.rack_id
and sr.sample_id = s.sample_id
and sr.rack_well >= 6
and sr.rack_well < 12
group by sr.ts, sr.rack_id, r.rack_barcode, row_pos
union all
...
)
select * from rack_display order by rack_ts, rack_id, row_pos;

the "union all"s continue for another 6 blocks. reports would filter on
rack_id and timestamp.

if timestamps for each load of a rack were guaranteed to be the same, this
would work. however, the "sr.ts" values may vary by a few seconds so there
is potential for the "group by" to break. ts differences will be a minimum
of 5 minutes for each distinct load of a rack.

what i think i need is to manufacture a group by column based off rows in
"sample_rack" that have "ts" values that are < 1 minute from each other and
rack_id is the same. i'm coming up blank on how to accomplish that though.
my first thought was to create an interval of +/- 1 min then find all rows
that overlap and assign a group number but i'm not sure how to accomplish
that.

there's also no guarantee an entire rack is full of samples so some "cells"
of display might be null. i think that makes the use of tablefunc crosstab
a little harder. if i remember correctly, it does not handle missing values
well. i'm open to any pivoting strategy.

anyways, i thought i'd reach out for ideas while i do my own digging.

thanks for any input! let me know if i'm not being clear on the problem and
desired outcome.

Rob Sargent

robjsargent@gmail.com

over 5 years ago

In reply to: Chris Stephens (#1)

Re: SQL group by help

What

On Dec 11, 2020, at 10:24 AM, Chris Stephens <cstephens16@gmail.com> wrote:

I'm trying to create a visual representation of a 6x8 grid of samples on a rack using the following SQL format:

with rack_display as (
select sr.ts rack_ts
, sr.rack_id
, r.rack_barcode
, 1 as row_pos
, max(case when rack_well = 0 then 'A1: '||sample_barcode end) as col1
, max(case when rack_well = 1 then 'A2: '||sample_barcode end) as col2
, max(case when rack_well = 2 then 'A3: '||sample_barcode end) as col3
, max(case when rack_well = 3 then 'A4: '||sample_barcode end) as col4
, max(case when rack_well = 4 then 'A5: '||sample_barcode end) as col5
, max(case when rack_well = 5 then 'A6: '||sample_barcode end) as col6
from rack r
, sample_rack sr
, sample s
where r.rack_id = sr.rack_id
and sr.sample_id = s.sample_id
and sr.rack_well < 6
group by sr.ts, sr.rack_id, r.rack_barcode, row_pos
union all
select sr.ts rack_ts
, sr.rack_id
, r.rack_barcode
, 2 as row_pos
, max(case when rack_well = 6 then 'B1: '||sample_barcode end) as col1
, max(case when rack_well = 7 then 'B2: '||sample_barcode end) as col2
, max(case when rack_well = 8 then 'B3: '||sample_barcode end) as col3
, max(case when rack_well = 9 then 'B4: '||sample_barcode end) as col4
, max(case when rack_well = 10 then 'B5: '||sample_barcode end) as col5
, max(case when rack_well = 11 then 'B6: '||sample_barcode end) as col6
from rack r
, sample_rack sr
, sample s
where r.rack_id = sr.rack_id
and sr.sample_id = s.sample_id
and sr.rack_well >= 6
and sr.rack_well < 12
group by sr.ts, sr.rack_id, r.rack_barcode, row_pos
union all
...
)
select * from rack_display order by rack_ts, rack_id, row_pos;

the "union all"s continue for another 6 blocks. reports would filter on rack_id and timestamp.

if timestamps for each load of a rack were guaranteed to be the same, this would work. however, the "sr.ts" values may vary by a few seconds so there is potential for the "group by" to break. ts differences will be a minimum of 5 minutes for each distinct load of a rack.

what i think i need is to manufacture a group by column based off rows in "sample_rack" that have "ts" values that are < 1 minute from each other and rack_id is the same. i'm coming up blank on how to accomplish that though. my first thought was to create an interval of +/- 1 min then find all rows that overlap and assign a group number but i'm not sure how to accomplish that.

there's also no guarantee an entire rack is full of samples so some "cells" of display might be null. i think that makes the use of tablefunc crosstab a little harder. if i remember correctly, it does not handle missing values well. i'm open to any pivoting strategy.

anyways, i thought i'd reach out for ideas while i do my own digging.

thanks for any input! let me know if i'm not being clear on the problem and desired outcome.

What stack are you usinglr. I would hope you could hand data off to front end which should have tools necessary

Alban Hertroys

haramrae@gmail.com

over 5 years ago

In reply to: Chris Stephens (#1)

Re: SQL group by help

On 11 Dec 2020, at 18:24, Chris Stephens <cstephens16@gmail.com> wrote:

I'm trying to create a visual representation of a 6x8 grid of samples on a rack using the following SQL format:

with rack_display as (
select sr.ts rack_ts
, sr.rack_id
, r.rack_barcode
, 1 as row_pos
, max(case when rack_well = 0 then 'A1: '||sample_barcode end) as col1
, max(case when rack_well = 1 then 'A2: '||sample_barcode end) as col2
, max(case when rack_well = 2 then 'A3: '||sample_barcode end) as col3
, max(case when rack_well = 3 then 'A4: '||sample_barcode end) as col4
, max(case when rack_well = 4 then 'A5: '||sample_barcode end) as col5
, max(case when rack_well = 5 then 'A6: '||sample_barcode end) as col6
from rack r
, sample_rack sr
, sample s
where r.rack_id = sr.rack_id
and sr.sample_id = s.sample_id
and sr.rack_well < 6
group by sr.ts, sr.rack_id, r.rack_barcode, row_pos
union all
select sr.ts rack_ts
, sr.rack_id
, r.rack_barcode
, 2 as row_pos
, max(case when rack_well = 6 then 'B1: '||sample_barcode end) as col1
, max(case when rack_well = 7 then 'B2: '||sample_barcode end) as col2
, max(case when rack_well = 8 then 'B3: '||sample_barcode end) as col3
, max(case when rack_well = 9 then 'B4: '||sample_barcode end) as col4
, max(case when rack_well = 10 then 'B5: '||sample_barcode end) as col5
, max(case when rack_well = 11 then 'B6: '||sample_barcode end) as col6
from rack r
, sample_rack sr
, sample s
where r.rack_id = sr.rack_id
and sr.sample_id = s.sample_id
and sr.rack_well >= 6
and sr.rack_well < 12
group by sr.ts, sr.rack_id, r.rack_barcode, row_pos
union all
...
)
select * from rack_display order by rack_ts, rack_id, row_pos;

the "union all"s continue for another 6 blocks. reports would filter on rack_id and timestamp.

Is time really what groups these batches? I would double-check whether you may be omitting to store some data relevant to this process.

if timestamps for each load of a rack were guaranteed to be the same, this would work. however, the "sr.ts" values may vary by a few seconds so there is potential for the "group by" to break. ts differences will be a minimum of 5 minutes for each distinct load of a rack.

what i think i need is to manufacture a group by column based off rows in "sample_rack" that have "ts" values that are < 1 minute from each other and rack_id is the same. i'm coming up blank on how to accomplish that though. my first thought was to create an interval of +/- 1 min then find all rows that overlap and assign a group number but i'm not sure how to accomplish that.

You could date_trunc those timestamps to the minute and group on that.

there's also no guarantee an entire rack is full of samples so some "cells" of display might be null. i think that makes the use of tablefunc crosstab a little harder. if i remember correctly, it does not handle missing values well. i'm open to any pivoting strategy.

Many reporting tools have features to support just that. We use WebFOCUS, which calls those ACROSS columns. It’s a common requirement in reporting.

Alban Hertroys
--
There is always an exception to always.