Need some advice on a difficult query

Started by Mike Christensenover 16 years ago5 messagesgeneral
Jump to latest
#1Mike Christensen
mike@kitchenpc.com

Hi all - I'd like some advice on how to write a rather complicated (for me,
anyway) query and if there's any nifty Postgres features I can take
advantage of in this situation. Imagine a database, if you will, used to
store recipes. I have a recipes table:

RecipeId
RecipeTitle
RecipeRating

And an ingredients table:

RecipeId
IngredientId
Amount

When the user searches for a new pasta dish, the UI would generate a query
something like this:

SELECT * FROM Recipes where RecipeTitle ilike '%pasta%';

I only need the data from the recipes table since I display a summary of the
search results and don't load the full recipe until the user clicks on the
link. This works great. However, I'm now in the process of implementing an
ingredient blacklist. This means NEVER show me any recipes which have one
of my blacklisted ingredients, as if I ingest any I will die a painful
death. Or maybe my god will smite me or something. This table would look
like:

UserId
IngredientId

Ok, now the question. What's the best way to query for my pasta dish above
while excluding any recipe that contains one or more of my blacklisted
ingredients? I don't want to join in Ingredients because I'd then have a
row for each ingredient of each recipe (which would cause me massive
headache and redesign).. I thought about some weird nested query, but I'm
concerned it might be slow (searching for recipes is the central feature of
my site and must be fast, fast and fast). This basically boils down to some
weird Cartesian product thing.

I am, however, willing to redesign the mechanism I store blacklisted
ingredients with. I could store this as an array on the Users table, or
anything else that might make things easier. Any advice on this?

Mike

#2Timo Klecker
klecker@decoit.de
In reply to: Mike Christensen (#1)
Re: Need some advice on a difficult query

Hi Mike,

here is an untested "weird nested query" for your problem:

SELECT * FROM Recipes r where lower(RecipeTitle) like lower('%pasta%')

and not exists

(select 1 from ingredients inner join blacklist using (IngredientId) where
RecipeId = r.RecipeId and blacklist.UserId = 123 limit 1);

Should be quite fast with the right indexes.

Timo

Von: pgsql-general-owner@postgresql.org
[mailto:pgsql-general-owner@postgresql.org] Im Auftrag von Mike Christensen
Gesendet: Mittwoch, 16. Dezember 2009 09:48
An: pgsql-general@postgresql.org
Betreff: [GENERAL] Need some advice on a difficult query

Hi all - I'd like some advice on how to write a rather complicated (for me,
anyway) query and if there's any nifty Postgres features I can take
advantage of in this situation. Imagine a database, if you will, used to
store recipes. I have a recipes table:

RecipeId
RecipeTitle
RecipeRating

And an ingredients table:

RecipeId
IngredientId
Amount

When the user searches for a new pasta dish, the UI would generate a query
something like this:

SELECT * FROM Recipes where RecipeTitle ilike '%pasta%';

I only need the data from the recipes table since I display a summary of the
search results and don't load the full recipe until the user clicks on the
link. This works great. However, I'm now in the process of implementing an
ingredient blacklist. This means NEVER show me any recipes which have one
of my blacklisted ingredients, as if I ingest any I will die a painful
death. Or maybe my god will smite me or something. This table would look
like:

UserId
IngredientId

Ok, now the question. What's the best way to query for my pasta dish above
while excluding any recipe that contains one or more of my blacklisted
ingredients? I don't want to join in Ingredients because I'd then have a
row for each ingredient of each recipe (which would cause me massive
headache and redesign).. I thought about some weird nested query, but I'm
concerned it might be slow (searching for recipes is the central feature of
my site and must be fast, fast and fast). This basically boils down to some
weird Cartesian product thing.

I am, however, willing to redesign the mechanism I store blacklisted
ingredients with. I could store this as an array on the Users table, or
anything else that might make things easier. Any advice on this?

Mike

#3Mike Christensen
mike@kitchenpc.com
In reply to: Timo Klecker (#2)
Re: Need some advice on a difficult query

Awesome, I'll give this a shot.. Blacklist.UserId will be indexed and all
the recipe links are of course already indexed, but I'll run it under the
query analyzer to see if there's any other fine tuning needed. I appreciate
your help!

Mike

On Wed, Dec 16, 2009 at 1:27 AM, Timo Klecker <klecker@decoit.de> wrote:

Show quoted text

Hi Mike,

here is an untested "weird nested query" for your problem:

SELECT * FROM Recipes r where lower(RecipeTitle) like lower('%pasta%')

and not exists

(select 1 from ingredients inner join blacklist using (IngredientId) where
RecipeId = r.RecipeId and blacklist.UserId = 123 limit 1);

Should be quite fast with the right indexes.

Timo

* *

*Von:* pgsql-general-owner@postgresql.org [mailto:
pgsql-general-owner@postgresql.org] *Im Auftrag von *Mike Christensen
*Gesendet:* Mittwoch, 16. Dezember 2009 09:48
*An:* pgsql-general@postgresql.org
*Betreff:* [GENERAL] Need some advice on a difficult query

Hi all - I'd like some advice on how to write a rather complicated (for me,
anyway) query and if there's any nifty Postgres features I can take
advantage of in this situation. Imagine a database, if you will, used to
store recipes. I have a recipes table:

RecipeId
RecipeTitle
RecipeRating

And an ingredients table:

RecipeId
IngredientId
Amount

When the user searches for a new pasta dish, the UI would generate a query
something like this:

SELECT * FROM Recipes where RecipeTitle ilike '%pasta%';

I only need the data from the recipes table since I display a summary of
the search results and don't load the full recipe until the user clicks on
the link. This works great. However, I'm now in the process of
implementing an ingredient blacklist. This means NEVER show me any recipes
which have one of my blacklisted ingredients, as if I ingest any I will die
a painful death. Or maybe my god will smite me or something. This table
would look like:

UserId
IngredientId

Ok, now the question. What's the best way to query for my pasta dish above
while excluding any recipe that contains one or more of my blacklisted
ingredients? I don't want to join in Ingredients because I'd then have a
row for each ingredient of each recipe (which would cause me massive
headache and redesign).. I thought about some weird nested query, but I'm
concerned it might be slow (searching for recipes is the central feature of
my site and must be fast, fast and fast). This basically boils down to some
weird Cartesian product thing.

I am, however, willing to redesign the mechanism I store blacklisted
ingredients with. I could store this as an array on the Users table, or
anything else that might make things easier. Any advice on this?

Mike

#4Sam Mason
sam@samason.me.uk
In reply to: Mike Christensen (#1)
Re: Need some advice on a difficult query

On Wed, Dec 16, 2009 at 12:47:36AM -0800, Mike Christensen wrote:

When the user searches for a new pasta dish, the UI would generate a query
something like this:

SELECT * FROM Recipes where RecipeTitle ilike '%pasta%';

I only need the data from the recipes table since I display a summary of the
search results and don't load the full recipe until the user clicks on the
link. This works great. However, I'm now in the process of implementing an
ingredient blacklist. This means NEVER show me any recipes which have one
of my blacklisted ingredients, as if I ingest any I will die a painful
death.

If you expect the number of blacklisted recipes to be low, the
following may be a good alternative as well:

SELECT r.*
FROM recipes r LEFT JOIN (
SELECT i.recipeid FROM ingredients i, blacklist b
WHERE i.ingredientid = b.ingredientid
AND b.userid = 123
GROUP BY i.recipeid) x ON r.recipeid = x.recipeid
WHERE r.recipetitle ILIKE '%pasta%'
AND x.recipeid IS NULL;

Note that it's generally considered bad form to include "*" in the
return of a query when it's code dealing with the response. Humans can
deal with the columns coming back differently, but code has the habit of
getting confused.

Also, you may want to consider using full text search when searching the
titles. That ILIKE requires a full table scan and will slow down as
more recipes get added.

--
Sam http://samason.me.uk/

#5Mike Christensen
mike@kitchenpc.com
In reply to: Sam Mason (#4)
Re: Need some advice on a difficult query

Thanks! The queries I wrote in my email were just an example, my actual
implementation specifies all column names required and also uses full text
search. I just didn't want to paste in that much cruft :)

I'll do some tests with your technique below and see which works better..

Mike

On Wed, Dec 16, 2009 at 7:01 AM, Sam Mason <sam@samason.me.uk> wrote:

Show quoted text

On Wed, Dec 16, 2009 at 12:47:36AM -0800, Mike Christensen wrote:

When the user searches for a new pasta dish, the UI would generate a

query

something like this:

SELECT * FROM Recipes where RecipeTitle ilike '%pasta%';

I only need the data from the recipes table since I display a summary of

the

search results and don't load the full recipe until the user clicks on

the

link. This works great. However, I'm now in the process of implementing

an

ingredient blacklist. This means NEVER show me any recipes which have

one

of my blacklisted ingredients, as if I ingest any I will die a painful
death.

If you expect the number of blacklisted recipes to be low, the
following may be a good alternative as well:

SELECT r.*
FROM recipes r LEFT JOIN (
SELECT i.recipeid FROM ingredients i, blacklist b
WHERE i.ingredientid = b.ingredientid
AND b.userid = 123
GROUP BY i.recipeid) x ON r.recipeid = x.recipeid
WHERE r.recipetitle ILIKE '%pasta%'
AND x.recipeid IS NULL;

Note that it's generally considered bad form to include "*" in the
return of a query when it's code dealing with the response. Humans can
deal with the columns coming back differently, but code has the habit of
getting confused.

Also, you may want to consider using full text search when searching the
titles. That ILIKE requires a full table scan and will slow down as
more recipes get added.

--
Sam http://samason.me.uk/

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general