STRING_AGG and GROUP BY
Good afternoon,
I have prepared an SQL Fiddle for my question:
http://sqlfiddle.com/#!17/4ef8b/2
Here are my 4 test tables:
CREATE TABLE players (
uid SERIAL PRIMARY KEY,
name text NOT NULL
);
CREATE TABLE games (
gid SERIAL PRIMARY KEY,
player1 integer NOT NULL REFERENCES players ON DELETE CASCADE,
player2 integer NOT NULL REFERENCES players ON DELETE CASCADE
);
CREATE TABLE moves (
mid BIGSERIAL PRIMARY KEY,
uid integer NOT NULL REFERENCES players ON DELETE CASCADE,
gid integer NOT NULL REFERENCES games ON DELETE CASCADE,
played timestamptz NOT NULL,
tiles jsonb NOT NULL
);
CREATE TABLE scores (
mid bigint NOT NULL REFERENCES moves ON DELETE CASCADE,
uid integer NOT NULL REFERENCES players ON DELETE CASCADE,
gid integer NOT NULL REFERENCES games ON DELETE CASCADE,
word text NOT NULL CHECK(word ~ '^[A-Z]{2,}$'),
score integer NOT NULL CHECK(score >= 0)
);
Here they are filled with test data (two players Alice and Bob
interchangeably performing moves in game #1):
INSERT INTO players (name) VALUES ('Alice'), ('Bob');
INSERT INTO games (player1, player2) VALUES (1, 2);
INSERT INTO moves (uid, gid, played, tiles) VALUES
(1, 1, now() + interval '1 min', '[{"col": 7, "row": 12, "value": 3,
"letter": "A"}, {"col": 8, "row": 12, "value": 10, "letter": "A"}, {"col":
9, "row": 12, "value": 1, "letter": "C"}, {"col": 10, "row": 12, "value":
2, "letter": "D"}]
'::jsonb),
(2, 1, now() + interval '2 min', '[{"col": 7, "row": 12, "value": 3,
"letter": "X"}, {"col": 8, "row": 12, "value": 10, "letter": "X"}, {"col":
9, "row": 12, "value": 1, "letter": "Z"}]
'::jsonb),
(1, 1, now() + interval '3 min', '[{"col": 7, "row": 12, "value": 3,
"letter": "K"}, {"col": 8, "row": 12, "value": 10, "letter": "K"}, {"col":
9, "row": 12, "value": 1, "letter": "M"}, {"col": 10, "row": 12, "value":
2, "letter": "N"}]
'::jsonb),
(2, 1, now() + interval '4 min', '[]'::jsonb),
(1, 1, now() + interval '5 min', '[{"col": 7, "row": 12, "value": 3,
"letter": "A"}, {"col": 8, "row": 12, "value": 10, "letter": "B"}, {"col":
9, "row": 12, "value": 1, "letter": "C"}, {"col": 10, "row": 12, "value":
2, "letter": "D"}]
'::jsonb),
(2, 1, now() + interval '6 min', '[{"col": 7, "row": 12, "value": 3,
"letter": "P"}, {"col": 8, "row": 12, "value": 10, "letter": "P"}]
'::jsonb);
INSERT INTO scores (mid, uid, gid, word, score) VALUES
(1, 1, 1, 'AACD', 40),
(2, 2, 1, 'XXZ', 30),
(2, 2, 1, 'XAB', 30),
(3, 1, 1, 'KKMN', 40),
(3, 1, 1, 'KYZ', 30),
(5, 1, 1, 'ABCD', 40),
(6, 2, 1, 'PP', 20),
(6, 2, 1, 'PABCD', 50);
For a PHP-script which would display all moves+words+scores played in a
certain game I am trying:
SELECT
mid,
STRING_AGG(x->>'letter', '') AS tiles,
STRING_AGG(DISTINCT y, ', ') AS words
FROM (
SELECT
mid,
JSONB_ARRAY_ELEMENTS(m.tiles) AS x,
FORMAT('%s (%s)', s.word, s.score) AS y
FROM moves m
LEFT JOIN scores s
USING (mid)
WHERE m.gid = 1
) AS z
GROUP BY mid;
Which produces a slightly wrong result (the played letters are duplicated):
mid tiles words
1 AACD AACD (40)
2 XXZXXZ XAB (30), XXZ (30)
3 KKMNKKMN KKMN (40), KYZ (30)
5 ABCD ABCD (40)
6 PPPP PABCD (50), PP (20)
My expected result would actually be:
mid tiles words
1 AACD AACD (40)
2 XXZ XAB (30), XXZ (30)
3 KKMN KKMN (40), KYZ (30)
5 ABCD ABCD (40)
6 PP PABCD (50), PP (20)
Is that actually possible by the means of SQL or should I do it in the PHP
script?
And I can not change the query to: STRING_AGG(DISTINCT x->>'letter', '') AS
tiles,
Because for example in the last move with mid=6 the player Bob had played 2
tiles, both with letter-value "P" and has formed 2 words (PP and PABCD),
but adding distinct would suggest he played a single tile "P".
Thank you
Alex
On Fri, Mar 16, 2018 at 7:17 AM, Alexander Farber <
alexander.farber@gmail.com> wrote:
And I can not change the query to: STRING_AGG(DISTINCT x->>'letter', '')
AS tiles,Because for example in the last move with mid=6 the player Bob had played
2 tiles, both with letter-value "P" and has formed 2 words (PP and PABCD),
but adding distinct would suggest he played a single tile "P".
First reaction is to ARRAY_AGG(DISTINCT x) and then write a function that
converts that array into a string by extracting 'letter' from each cell in
the array.
Thinking it over a bit you have two columns that both are aggregates but
that are otherwise independent of each other. Since they are independent
they cannot be aggregated at the same time. You need to write a two
subqueries, either in the target list or as separate from/join items, and
then join the already aggregated queries together on their common group by
column.
The presence of DISTINCT here (and, IMO, generally), even if it worked,
would be an indicator that something is not quite right.
David J.
Hi David -
On Fri, Mar 16, 2018 at 4:40 PM, David G. Johnston <
david.g.johnston@gmail.com> wrote:
First reaction is to ARRAY_AGG(DISTINCT x) and then write a function that
converts that array into a string by extracting 'letter' from each cell in
the array.Thinking it over a bit you have two columns that both are aggregates but
that are otherwise independent of each other. Since they are independent
they cannot be aggregated at the same time. You need to write a two
subqueries, either in the target list or as separate from/join items, and
then join the already aggregated queries together on their common group by
column.The presence of DISTINCT here (and, IMO, generally), even if it worked,
would be an indicator that something is not quite right.
thank you for confirming my feeling that DISTINCT is a bad indicator here...
But you say that "tiles" and "word (score)" are unrelated and this does
not seem true to me:
For each move id aka "mid" there is a JSON value, describing how the player
played the letter tiles.
And for the same "mid" there is a list of one or more "word (score)"s
achieved...
Maybe it is possible to GROUP BY tiles somehow or some kind of special SQL
JOIN?
Regards
Alex
On Fri, Mar 16, 2018 at 9:10 AM, Alexander Farber <
alexander.farber@gmail.com> wrote:
But you say that "tiles" and "word (score)" are unrelated and this does
not seem true to me:For each move id aka "mid" there is a JSON value, describing how the
player played the letter tiles.
And for the same "mid" there is a list of one or more "word (score)"s
achieved...
For each mid you want to know all tiles played and all word scores
achieved - but you want to forget/ignore that a given tile achieved a given
word score. IOW, you are intentionally forgetting/ignoring the fact that
the tiles and the corresponding word scores are related to each other,
beyond the simple/incomplete relationship that both share the same mid.
You thus need to write a query that only relates tiles and word scores to
mid and not to each other.
Maybe it is possible to GROUP BY tiles somehow or some kind of special SQL
JOIN?
I do not know if the tables or columns below match your model but the
concept should still come across intact.
SELECT mid,
(SELECT string_agg(tiles, '; ') FROM tiles WHERE ta.mid = moves.mid) AS
mid_tiles,
(SELECT string_agg(words, '; ') FROM words WHERE words.mid = moves.mid) AS
mid_words
FROM moves
There are other ways to write that that could perform better but the idea
holds.
David J.
Thank you, David -
On Fri, Mar 16, 2018 at 5:40 PM, David G. Johnston <
david.g.johnston@gmail.com> wrote:
On Fri, Mar 16, 2018 at 9:10 AM, Alexander Farber <
alexander.farber@gmail.com> wrote:SELECT mid,
(SELECT string_agg(tiles, '; ') FROM tiles WHERE ta.mid = moves.mid) AS
mid_tiles,
(SELECT string_agg(words, '; ') FROM words WHERE words.mid = moves.mid) AS
mid_words
FROM movesThere are other ways to write that that could perform better but the idea
holds.
I've come up with the following query, wonder if you meant something
similar -
http://sqlfiddle.com/#!17/4ef8b/48
WITH cte1 AS (
SELECT
mid,
STRING_AGG(x->>'letter', '') AS tiles
FROM (
SELECT
mid,
JSONB_ARRAY_ELEMENTS(m.tiles) AS x
FROM moves m
WHERE m.gid = 1
) AS z
GROUP BY mid),
cte2 AS (
SELECT
mid,
STRING_AGG(y, ', ') AS words
FROM (
SELECT
mid,
FORMAT('%s (%s)', s.word, s.score) AS y
FROM scores s
WHERE s.gid = 1
) AS z
GROUP BY mid)
SELECT mid, tiles, words
FROM cte1 JOIN cte2 using (mid) ORDER BY mid ASC;
Regards
Alex
P.S. Below is the complete test data in case SQL Fiddle link stops working:
CREATE TABLE players (
uid SERIAL PRIMARY KEY,
name text NOT NULL
);
CREATE TABLE games (
gid SERIAL PRIMARY KEY,
player1 integer NOT NULL REFERENCES players ON DELETE CASCADE,
player2 integer NOT NULL REFERENCES players ON DELETE CASCADE
);
CREATE TABLE moves (
mid BIGSERIAL PRIMARY KEY,
uid integer NOT NULL REFERENCES players ON DELETE CASCADE,
gid integer NOT NULL REFERENCES games ON DELETE CASCADE,
played timestamptz NOT NULL,
tiles jsonb NOT NULL
);
CREATE TABLE scores (
mid bigint NOT NULL REFERENCES moves ON DELETE CASCADE,
uid integer NOT NULL REFERENCES players ON DELETE CASCADE,
gid integer NOT NULL REFERENCES games ON DELETE CASCADE,
word text NOT NULL CHECK(word ~ '^[A-Z]{2,}$'),
score integer NOT NULL CHECK(score >= 0)
);
INSERT INTO players (name) VALUES ('Alice'), ('Bob');
INSERT INTO games (player1, player2) VALUES (1, 2);
INSERT INTO moves (uid, gid, played, tiles) VALUES
(1, 1, now() + interval '1 min', '[{"col": 7, "row": 12, "value": 3,
"letter": "A"}, {"col": 8, "row": 12, "value": 10, "letter": "A"}, {"col":
9, "row": 12, "value": 1, "letter": "C"}, {"col": 10, "row": 12, "value":
2, "letter": "D"}]
'::jsonb),
(2, 1, now() + interval '2 min', '[{"col": 7, "row": 12, "value": 3,
"letter": "X"}, {"col": 8, "row": 12, "value": 10, "letter": "X"}, {"col":
9, "row": 12, "value": 1, "letter": "Z"}]
'::jsonb),
(1, 1, now() + interval '3 min', '[{"col": 7, "row": 12, "value": 3,
"letter": "K"}, {"col": 8, "row": 12, "value": 10, "letter": "K"}, {"col":
9, "row": 12, "value": 1, "letter": "M"}, {"col": 10, "row": 12, "value":
2, "letter": "N"}]
'::jsonb),
(2, 1, now() + interval '4 min', '[]'::jsonb),
(1, 1, now() + interval '5 min', '[{"col": 7, "row": 12, "value": 3,
"letter": "A"}, {"col": 8, "row": 12, "value": 10, "letter": "B"}, {"col":
9, "row": 12, "value": 1, "letter": "C"}, {"col": 10, "row": 12, "value":
2, "letter": "D"}]
'::jsonb),
(2, 1, now() + interval '6 min', '[{"col": 7, "row": 12, "value": 3,
"letter": "P"}, {"col": 8, "row": 12, "value": 10, "letter": "Q"}]
'::jsonb);
INSERT INTO scores (mid, uid, gid, word, score) VALUES
(1, 1, 1, 'AACD', 40),
(2, 2, 1, 'XXZ', 30),
(2, 2, 1, 'XAB', 30),
(3, 1, 1, 'KKMN', 40),
(3, 1, 1, 'KYZ', 30),
(5, 1, 1, 'ABCD', 40),
(6, 2, 1, 'PQ', 20),
(6, 2, 1, 'PABCD', 50);
On Mon, Mar 19, 2018 at 1:54 PM, Alexander Farber <
alexander.farber@gmail.com> wrote:
I've come up with the following query, wonder if you meant something
similar -http://sqlfiddle.com/#!17/4ef8b/48
WITH cte1 AS (
SELECT
mid,
STRING_AGG(x->>'letter', '') AS tiles
FROM (
SELECT
mid,
JSONB_ARRAY_ELEMENTS(m.tiles) AS x
FROM moves m
WHERE m.gid = 1
) AS z
GROUP BY mid),
cte2 AS (
SELECT
mid,
STRING_AGG(y, ', ') AS words
FROM (
SELECT
mid,
FORMAT('%s (%s)', s.word, s.score) AS y
FROM scores s
WHERE s.gid = 1
) AS z
GROUP BY mid)
SELECT mid, tiles, words
FROM cte1 JOIN cte2 using (mid) ORDER BY mid ASC;
Yes. It does end up presuming that the sets moves.mid and scores.mid
are identical but that is probably a safe assumption. Repetition of m.gid
= 1 is worth avoiding in theory though depending on how its done the
solution can be worse than the problem (if the planner ends up unable to
push the predicate down).
David J.