use null or 0 in foreign key column, to mean "no value"?
Hi,
(Maybe my subject line should be: `is not distinct from` and indexes.)
In Postgres 9.4, I’ve got a table of ‘items’ that references a table ‘colors’. Not all items have colors, so I created a nullable column in items like:
color_id bigint references colors
There is also an index on color_id:
create index on items (color_id);
I thought this was the right way to do it, but now I’m not so sure... In application code, prepared statements want to say: `select * from items where color_id = ?` and that `?` might be a int or null, so that doesn’t work. I used `is not distinct from` instead of =, which has the right meaning, but now I notice it doesn’t use the index for queries that replace `=` with `is not distinct from`, and queries run much slower. Using `explain` confirms: it’s doing sequential scans where `=` was using index.
So… is this bad DB design to use null to mean that an item has no color? Should I instead put a special row in `colors`, maybe with id = 0, to represent the “no color” value? Or is there some way to make an index work with nulls and `is not distinct from`?
thank you,
Rob
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
On June 26, 2015 11:59:05 PM Robert Nikander wrote:
Hi,
(Maybe my subject line should be: `is not distinct from` and indexes.)
In Postgres 9.4, I’ve got a table of ‘items’ that references a table
‘colors’. Not all items have colors, so I created a nullable column in
items like:color_id bigint references colors
There is also an index on color_id:
create index on items (color_id);
I thought this was the right way to do it, but now I’m not so sure... In
application code, prepared statements want to say: `select * from items
where color_id = ?` and that `?` might be a int or null, so that doesn’t
work. I used `is not distinct from` instead of =, which has the right
meaning, but now I notice it doesn’t use the index for queries that replace
`=` with `is not distinct from`, and queries run much slower. Using
`explain` confirms: it’s doing sequential scans where `=` was using index.
I test for NULL in my application code and emit '... WHERE foo = ?' if the
value is not NULL and '... WHERE foo IS NOT NULL' otherwise. ISTR that that
actually uses indexes.
So… is this bad DB design to use null to mean that an item has no color?
Should I instead put a special row in `colors`, maybe with id = 0, to
represent the “no color” value? Or is there some way to make an index work
with nulls and `is not distinct from`?thank you,
Rob
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
On Friday, June 26, 2015, Robert Nikander <rob.nikander@gmail.com> wrote:
So… is this bad DB design to use null to mean that an item has no color?
Should I instead put a special row in `colors`, maybe with id = 0, to
represent the “no color” value? Or is there some way to make an index work
with nulls and `is not distinct from`?
Not sure about getting is distinct to work with indexes but in this
particular case I would add a "colorless" color to the table and make the
column constraint not null.
I would also likely just make the text value the unique key and forget the
surrogate integer key.
David J.
On Fri, Jun 26, 2015 at 10:59 PM, Robert Nikander <rob.nikander@gmail.com>
wrote:
Hi,
(Maybe my subject line should be: `is not distinct from` and indexes.)
In Postgres 9.4, I’ve got a table of ‘items’ that references a table
‘colors’. Not all items have colors, so I created a nullable column in
items like:color_id bigint references colors
There is also an index on color_id:
create index on items (color_id);
I thought this was the right way to do it, but now I’m not so sure... In
application code, prepared statements want to say: `select * from items
where color_id = ?` and that `?` might be a int or null, so that doesn’t
work. I used `is not distinct from` instead of =, which has the right
meaning, but now I notice it doesn’t use the index for queries that replace
`=` with `is not distinct from`, and queries run much slower. Using
`explain` confirms: it’s doing sequential scans where `=` was using index.So… is this bad DB design to use null to mean that an item has no color?
Should I instead put a special row in `colors`, maybe with id = 0, to
represent the “no color” value? Or is there some way to make an index work
with nulls and `is not distinct from`?
_My_ personal opinion is that NULL should _never_ be used to "indicate"
anything at all, other than something like "?I don't have that
information!" NULL should be anathema because, as you have seen, just
complicates coding SQL queries.
What I have read (Joe Celko mainly) is that it is better to have "encoded
special values" for "unusual" things. In your case, what does "no color"
really mean? Does it mean "it has some color, but I don't know what it is"
or is it more like a "transparent" substance, such as pure glass? Or is it
"reflective" so that it is seems to have the color of what ever color of
light is illuminating it? Maybe a perfect mirror of some sort. Are the
"normal" color_id values always non-negative (0 or greater)? If so then I'd
encode color_id something like -1 == "some unknown color", -2 ==
"transparent", -3 == "reflective", or some other meaning. One example that
we actually have at work is in an "expiration date" on certain types of
data files. Most of the time, this is just a normal date like 2017-01-20.
But, sometimes, we want codes to indicate things such as "keep the last 3
copies, and expire previous copies" or "keep until it has not been
referenced in 90 days" or "keep forever". We either have to have some
"special" date values to indicate these latter non-date case, or we need
yet another column for "type of expiration" which would be values for "use
date", "permanent", "keep ??? days non-reference", "keep ??? copies" and so
forth. We actually don't use an SQL date field, but a text field which we
can test and, if necessary, convert to a date or to some "explanatory" text.
thank you,
Rob
--
Schrodinger's backup: The condition of any backup is unknown until a
restore is attempted.
Yoda of Borg, we are. Futile, resistance is, yes. Assimilated, you will be.
My sister opened a computer store in Hawaii. She sells C shells down by the
seashore.
If someone tell you that nothing is impossible:
Ask him to dribble a football.
He's about as useful as a wax frying pan.
10 to the 12th power microphones = 1 Megaphone
Maranatha! <><
John McKown
On 27 Jun 2015, at 5:59, Robert Nikander <rob.nikander@gmail.com> wrote:
In application code, prepared statements want to say: `select * from items where color_id = ?` and that `?` might be a int or null, so that doesn’t work.
You could add another parameter to test which expression to 'activate' in the query, something like:
select * from items where ('notnull' = ? and color_id = ?) or ('null' = ? and color_id is null);
Of course, with those _positional_ query parameters that means you need to add the same value TWICE into the query. You wouldn't need to with _named_ query parameters, if those are available to you.
Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general