Need some advice on a difficult query
Hi all - I'd like some advice on how to write a rather complicated (for me,
anyway) query and if there's any nifty Postgres features I can take
advantage of in this situation. Imagine a database, if you will, used to
store recipes. I have a recipes table:
RecipeId
RecipeTitle
RecipeRating
And an ingredients table:
RecipeId
IngredientId
Amount
When the user searches for a new pasta dish, the UI would generate a query
something like this:
SELECT * FROM Recipes where RecipeTitle ilike '%pasta%';
I only need the data from the recipes table since I display a summary of the
search results and don't load the full recipe until the user clicks on the
link. This works great. However, I'm now in the process of implementing an
ingredient blacklist. This means NEVER show me any recipes which have one
of my blacklisted ingredients, as if I ingest any I will die a painful
death. Or maybe my god will smite me or something. This table would look
like:
UserId
IngredientId
Ok, now the question. What's the best way to query for my pasta dish above
while excluding any recipe that contains one or more of my blacklisted
ingredients? I don't want to join in Ingredients because I'd then have a
row for each ingredient of each recipe (which would cause me massive
headache and redesign).. I thought about some weird nested query, but I'm
concerned it might be slow (searching for recipes is the central feature of
my site and must be fast, fast and fast). This basically boils down to some
weird Cartesian product thing.
I am, however, willing to redesign the mechanism I store blacklisted
ingredients with. I could store this as an array on the Users table, or
anything else that might make things easier. Any advice on this?
Mike
Hi Mike,
here is an untested "weird nested query" for your problem:
SELECT * FROM Recipes r where lower(RecipeTitle) like lower('%pasta%')
and not exists
(select 1 from ingredients inner join blacklist using (IngredientId) where
RecipeId = r.RecipeId and blacklist.UserId = 123 limit 1);
Should be quite fast with the right indexes.
Timo
Von: pgsql-general-owner@postgresql.org
[mailto:pgsql-general-owner@postgresql.org] Im Auftrag von Mike Christensen
Gesendet: Mittwoch, 16. Dezember 2009 09:48
An: pgsql-general@postgresql.org
Betreff: [GENERAL] Need some advice on a difficult query
Hi all - I'd like some advice on how to write a rather complicated (for me,
anyway) query and if there's any nifty Postgres features I can take
advantage of in this situation. Imagine a database, if you will, used to
store recipes. I have a recipes table:
RecipeId
RecipeTitle
RecipeRating
And an ingredients table:
RecipeId
IngredientId
Amount
When the user searches for a new pasta dish, the UI would generate a query
something like this:
SELECT * FROM Recipes where RecipeTitle ilike '%pasta%';
I only need the data from the recipes table since I display a summary of the
search results and don't load the full recipe until the user clicks on the
link. This works great. However, I'm now in the process of implementing an
ingredient blacklist. This means NEVER show me any recipes which have one
of my blacklisted ingredients, as if I ingest any I will die a painful
death. Or maybe my god will smite me or something. This table would look
like:
UserId
IngredientId
Ok, now the question. What's the best way to query for my pasta dish above
while excluding any recipe that contains one or more of my blacklisted
ingredients? I don't want to join in Ingredients because I'd then have a
row for each ingredient of each recipe (which would cause me massive
headache and redesign).. I thought about some weird nested query, but I'm
concerned it might be slow (searching for recipes is the central feature of
my site and must be fast, fast and fast). This basically boils down to some
weird Cartesian product thing.
I am, however, willing to redesign the mechanism I store blacklisted
ingredients with. I could store this as an array on the Users table, or
anything else that might make things easier. Any advice on this?
Mike
Awesome, I'll give this a shot.. Blacklist.UserId will be indexed and all
the recipe links are of course already indexed, but I'll run it under the
query analyzer to see if there's any other fine tuning needed. I appreciate
your help!
Mike
On Wed, Dec 16, 2009 at 1:27 AM, Timo Klecker <klecker@decoit.de> wrote:
Show quoted text
Hi Mike,
here is an untested "weird nested query" for your problem:
SELECT * FROM Recipes r where lower(RecipeTitle) like lower('%pasta%')
and not exists
(select 1 from ingredients inner join blacklist using (IngredientId) where
RecipeId = r.RecipeId and blacklist.UserId = 123 limit 1);Should be quite fast with the right indexes.
Timo
* *
*Von:* pgsql-general-owner@postgresql.org [mailto:
pgsql-general-owner@postgresql.org] *Im Auftrag von *Mike Christensen
*Gesendet:* Mittwoch, 16. Dezember 2009 09:48
*An:* pgsql-general@postgresql.org
*Betreff:* [GENERAL] Need some advice on a difficult queryHi all - I'd like some advice on how to write a rather complicated (for me,
anyway) query and if there's any nifty Postgres features I can take
advantage of in this situation. Imagine a database, if you will, used to
store recipes. I have a recipes table:RecipeId
RecipeTitle
RecipeRatingAnd an ingredients table:
RecipeId
IngredientId
AmountWhen the user searches for a new pasta dish, the UI would generate a query
something like this:SELECT * FROM Recipes where RecipeTitle ilike '%pasta%';
I only need the data from the recipes table since I display a summary of
the search results and don't load the full recipe until the user clicks on
the link. This works great. However, I'm now in the process of
implementing an ingredient blacklist. This means NEVER show me any recipes
which have one of my blacklisted ingredients, as if I ingest any I will die
a painful death. Or maybe my god will smite me or something. This table
would look like:UserId
IngredientIdOk, now the question. What's the best way to query for my pasta dish above
while excluding any recipe that contains one or more of my blacklisted
ingredients? I don't want to join in Ingredients because I'd then have a
row for each ingredient of each recipe (which would cause me massive
headache and redesign).. I thought about some weird nested query, but I'm
concerned it might be slow (searching for recipes is the central feature of
my site and must be fast, fast and fast). This basically boils down to some
weird Cartesian product thing.I am, however, willing to redesign the mechanism I store blacklisted
ingredients with. I could store this as an array on the Users table, or
anything else that might make things easier. Any advice on this?Mike
On Wed, Dec 16, 2009 at 12:47:36AM -0800, Mike Christensen wrote:
When the user searches for a new pasta dish, the UI would generate a query
something like this:SELECT * FROM Recipes where RecipeTitle ilike '%pasta%';
I only need the data from the recipes table since I display a summary of the
search results and don't load the full recipe until the user clicks on the
link. This works great. However, I'm now in the process of implementing an
ingredient blacklist. This means NEVER show me any recipes which have one
of my blacklisted ingredients, as if I ingest any I will die a painful
death.
If you expect the number of blacklisted recipes to be low, the
following may be a good alternative as well:
SELECT r.*
FROM recipes r LEFT JOIN (
SELECT i.recipeid FROM ingredients i, blacklist b
WHERE i.ingredientid = b.ingredientid
AND b.userid = 123
GROUP BY i.recipeid) x ON r.recipeid = x.recipeid
WHERE r.recipetitle ILIKE '%pasta%'
AND x.recipeid IS NULL;
Note that it's generally considered bad form to include "*" in the
return of a query when it's code dealing with the response. Humans can
deal with the columns coming back differently, but code has the habit of
getting confused.
Also, you may want to consider using full text search when searching the
titles. That ILIKE requires a full table scan and will slow down as
more recipes get added.
--
Sam http://samason.me.uk/
Thanks! The queries I wrote in my email were just an example, my actual
implementation specifies all column names required and also uses full text
search. I just didn't want to paste in that much cruft :)
I'll do some tests with your technique below and see which works better..
Mike
On Wed, Dec 16, 2009 at 7:01 AM, Sam Mason <sam@samason.me.uk> wrote:
Show quoted text
On Wed, Dec 16, 2009 at 12:47:36AM -0800, Mike Christensen wrote:
When the user searches for a new pasta dish, the UI would generate a
query
something like this:
SELECT * FROM Recipes where RecipeTitle ilike '%pasta%';
I only need the data from the recipes table since I display a summary of
the
search results and don't load the full recipe until the user clicks on
the
link. This works great. However, I'm now in the process of implementing
an
ingredient blacklist. This means NEVER show me any recipes which have
one
of my blacklisted ingredients, as if I ingest any I will die a painful
death.If you expect the number of blacklisted recipes to be low, the
following may be a good alternative as well:SELECT r.*
FROM recipes r LEFT JOIN (
SELECT i.recipeid FROM ingredients i, blacklist b
WHERE i.ingredientid = b.ingredientid
AND b.userid = 123
GROUP BY i.recipeid) x ON r.recipeid = x.recipeid
WHERE r.recipetitle ILIKE '%pasta%'
AND x.recipeid IS NULL;Note that it's generally considered bad form to include "*" in the
return of a query when it's code dealing with the response. Humans can
deal with the columns coming back differently, but code has the habit of
getting confused.Also, you may want to consider using full text search when searching the
titles. That ILIKE requires a full table scan and will slow down as
more recipes get added.--
Sam http://samason.me.uk/--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general