Approaches for Lookup values (codes) in OLTP application

Started by James B. Byrne, about 18 years ago, 8 messages, general
#1James B. Byrne
byrnejb@harte-lyne.ca

I am considering how best to handle the issue of attribute encoding for an
OLTP application conversion. The existing system, which does not employ a
relational DBMS in the commonly accepted sense, uses a "system_table" to
validate system codes. This dataset uses concatenated fields to form a unique
key. The fields are "table_name", "table_column_name", and "value_as_char".

The conversion project framework is Ruby on Rails which embeds the practice of
arbitrary integer primary keys assigned by sequencers rather than so-called
"natural" keys or predicates that define the unique portion of the table-row.

My questions revolve around how best to implement this in PostgreSQL given the
expectations of Rails. Is it best that I create a table with the three key
columns and an additional id, have a unique index on the three values, but
store the id in the referencing row? Or do I store the code value in the
referencing row and use the implied table_name, table_column_name, and stored
value to perform a lookup on the system_table? Is there another approach that
I am not aware of that is superior to both of these?

Comments most welcome.

--
*** E-Mail is NOT a SECURE channel ***
James B. Byrne mailto:ByrneJB@Harte-Lyne.ca
Harte & Lyne Limited http://www.harte-lyne.ca
9 Brockley Drive vox: +1 905 561 1241
Hamilton, Ontario fax: +1 905 561 0757
Canada L8E 3C3

#2Richard Huxton
dev@archonet.com
In reply to: James B. Byrne (#1)
Re: Approaches for Lookup values (codes) in OLTP application

James B. Byrne wrote:

I am considering how best to handle the issue of attribute encoding for an
OLTP application conversion.

[snip]

The conversion project framework is Ruby on Rails which embeds the practice of
arbitrary integer primary keys assigned by sequencers rather than so-called
"natural" keys or predicates that define the unique portion of the table-row.

I'm not a Rails guy, but everything I've read about it suggests if
you're going to gain any advantage from it, then you should follow its
way of doing things. That means not converting anything, but rather
writing a rails app that does the same as your current app (if I'm
making myself clear).

--
Richard Huxton
Archonet Ltd

#3James B. Byrne
byrnejb@harte-lyne.ca
In reply to: Richard Huxton (#2)
Re: Approaches for Lookup values (codes) in OLTP application

On Fri, February 15, 2008 12:38, Richard Huxton wrote:

I'm not a Rails guy, but everything I've read about it suggests if
you're going to gain any advantage from it, then you should follow its
way of doing things. That means not converting anything, but rather
writing a rails app that does the same as your current app (if I'm
making myself clear).

Rails is "opinionated" software, but it will allow non-arbitrary keys. I
realize that I am not expressing myself well, but this is in large measure due
to transitioning from a non-RDBMS environment to relational technology while
at the same time moving from a host-based application to a web-based n-tier
application. So, I tend to get muddled from time to time.

To restate my original query in more straightforward terms: what is
considered appropriate RDBMS practice for dealing with encoded information
that has to be validated on input? One always has the option of simply putting
a table of values into the application itself, but I have not found much to
recommend this approach.

I can override Rails' assumptions and force a primary key formed from multiple
columns, for which a unique index is automatically created, for the
previously described "system_values_table". My question still hinges upon
what to put into the referencing table: a foreign key lookup, or just the
encoded value, letting the application do the reference checking?

Consider the example of ISO 3166 country codes. There are at least two ways
to handle this:

1. Have a table just for country codes and have the code the primary key

2. Have a systems value table having a code prefix column and the code value
concatenated into a key
(table_prefix = "country_codes" + table_value ="CA" for example)
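Option one might be sketched roughly as follows; the table and column names
here are only illustrative, not taken from the actual schema:

```sql
-- Option 1: a dedicated lookup table keyed by the ISO 3166 code itself.
CREATE TABLE country_codes (
    code char(2) PRIMARY KEY,   -- e.g. 'CA'
    name text NOT NULL
);

-- A referencing table stores the code directly; the FK validates it.
CREATE TABLE addresses (
    id           serial PRIMARY KEY,
    country_code char(2) NOT NULL REFERENCES country_codes (code)
);
```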

For something externally provided and widely used like country codes then
option one is attractive and possibly the most sensible and robust solution.
But consider things like transaction status codes. Perhaps an invoice
transaction has five possible codes and a credit-note has only three, but one
of those three is not valid for invoices. Where does one put such things?
What is the generally accepted best practice? Does one construct a separate
code table for every transaction type? Is it good practice to have a
transaction_type table, a code table, and a transaction_code_union table, and
look up against the union?

This is perhaps a very minor and basic issue for this list's audience, but I
am going to live with these decisions for a very long time, and I would prefer
to have some idea of what is considered appropriate RDBMS treatment for
application validation data as opposed to business state data.


#4Scott Marlowe
scott.marlowe@gmail.com
In reply to: James B. Byrne (#3)
Re: Approaches for Lookup values (codes) in OLTP application

On Fri, Feb 15, 2008 at 12:12 PM, James B. Byrne <byrnejb@harte-lyne.ca> wrote:

I can over-ride Rails assumptions and force a primary key formed by multiple
columns which will have a unique index automatically created for the
previously described "system_values_table". My question still hinges upon
what to put into the referential table, a foreign key lookup or just the
encoded value and let the application do the reference checking?

Consider the example of ISO 3166 country codes. There are at least two ways
to handle this:

1. Have a table just for country codes and have the code the primary key

2. Have a systems value table having a code prefix column and the code value
concatenated into a key
(table_prefix = "country_codes" + table_value ="CA" for example)

Generally speaking, I tend towards using the real value as the key and
foreign key in lookup tables, but occasionally using an artificial
numeric key is a better choice.

If you'll generally always need the actual value, you should store it
directly, because then it will be in the main table as well. But if you
access that value only once for every 100 accesses, it will likely be faster
to keep it on the other end of an int key, which usually takes up less space.
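The two variants described above might be sketched as follows; the table and
column names are hypothetical:

```sql
-- Variant A: the real value is the key and is stored in the main table.
CREATE TABLE statuses_a (
    code text PRIMARY KEY
);
CREATE TABLE orders_a (
    id     serial PRIMARY KEY,
    status text NOT NULL REFERENCES statuses_a (code)
);

-- Variant B: an artificial numeric key; the text value lives only in the
-- lookup table, so reading it back requires a join.
CREATE TABLE statuses_b (
    id   serial PRIMARY KEY,
    code text NOT NULL UNIQUE
);
CREATE TABLE orders_b (
    id        serial PRIMARY KEY,
    status_id integer NOT NULL REFERENCES statuses_b (id)
);

-- SELECT o.id, s.code
--   FROM orders_b o JOIN statuses_b s ON s.id = o.status_id;
```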

For something externally provided and widely used like country codes then
option one is attractive and possibly the most sensible and robust solution.
But consider things like transaction status codes. Perhaps an invoice
transaction has five possible codes and a credit-note has only three, but one
of those three is not valid for invoices. Where does one put such things?

You could use a simple multi-part check constraint for that, or, if it
needs to be more fluid than that, you could use some kind of multi-key
table that maps each tx type to its valid codes on a one-to-many basis;
when you insert, the FK check validates the two values against that table.
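The check-constraint approach mentioned first might look like the following
sketch; the type and code values here are hypothetical:

```sql
-- A multi-part CHECK constraint: which codes are valid depends on tx_type.
CREATE TABLE transactions (
    id      serial PRIMARY KEY,
    tx_type text NOT NULL,
    tx_code text NOT NULL,
    CHECK (
        (tx_type = 'invoice' AND tx_code IN ('inv1', 'inv2', 'inv3'))
     OR (tx_type = 'credit'  AND tx_code IN ('inv1', 'crd1'))
    )
);
```

This is simple, but changing the set of valid codes means an ALTER TABLE,
which is why the lookup-table alternative is more fluid.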

#5James B. Byrne
byrnejb@harte-lyne.ca
In reply to: Scott Marlowe (#4)
Re: Approaches for Lookup values (codes) in OLTP application

On Fri, February 15, 2008 14:43, Scott Marlowe wrote:

For something externally provided and widely used like country codes
then option one is attractive and possibly the most sensible and
robust solution. But consider things like transaction status codes.
Perhaps an invoice transaction has five possible codes and a credit-
note has only three, but one of those three is not valid for invoices.
Where does one put such things?

You could use a simple multi-part check constraint for that, or, if it
needs to be more fluid than that, you could use some kind of multi-key
table that points to a valid tx type list on a 1 to many basis, and
when you insert you FK check the two values against that table.

Is this to say that one should establish a table with the code as a
"non-unique" index and then have as its dependent values the usage contexts,
which are applied as filters? I do not comprehend what you mean by a valid tx
type list on a one-to-many basis. If employed, an FK check presumably has
to resolve to a unique entry in the case of code validation.

I should rather think that one should set up a uniqueness constraint for a
particular code/context combination. One might then select by code value and
context as WHERE clause parameters.

Say :

CREATE TABLE system_values_table (
    value_as_char char(8),
    value_context char(30),
    PRIMARY KEY (value_as_char, value_context)
);

I understand from the PostgreSQL documentation (CREATE TABLE) that "PRIMARY
KEY" implies "UNIQUE", "NOT NULL", and "INDEX". Is this correct?

Presuming a table entry having value_as_char = 'ACTV' and value_context =
'INVOICE', when I do a SELECT I would pass the code value (as char)
together with the context thus?

SELECT * FROM system_values_table
 WHERE value_as_char = input_code_as_char
   AND value_context = 'INVOICE';

I presume that the decision to place the code value first or the context value
first in the primary key construct depends upon whether one foresees the need
to span selects based on the context. So, for example, if I intended to
provide the UI with a drop down list populated with the available codes then
it would be better to have:

...
PRIMARY KEY (value_context, value_as_char)
...

and I could then populate the selection list with a select having the form:

...
SELECT * FROM system_values_table
 WHERE value_context = 'INVOICE';
...

The DBMS can then decide how to get the qualifying rows back and the index
would be usable in this case, whereas if the code value came first in the
composite key then the index would be useless for this query.

Have I got this more or less straight?


#6Scott Marlowe
scott.marlowe@gmail.com
In reply to: James B. Byrne (#5)
Re: Approaches for Lookup values (codes) in OLTP application

On Feb 15, 2008 3:31 PM, James B. Byrne <byrnejb@harte-lyne.ca> wrote:

On Fri, February 15, 2008 14:43, Scott Marlowe wrote:

For something externally provided and widely used like country codes
then option one is attractive and possibly the most sensible and
robust solution. But consider things like transaction status codes.
Perhaps an invoice transaction has five possible codes and a credit-
note has only three, but one of those three is not valid for invoices.
Where does one put such things?

You could use a simple multi-part check constraint for that, or, if it
needs to be more fluid than that, you could use some kind of multi-key
table that points to a valid tx type list on a 1 to many basis, and
when you insert you FK check the two values against that table.

Is this to say that one should establish a table with the code as the
"non-unique" index and then have as its dependent values the usage contexts
which are applied as filters? I do not comprehend what you mean by a valid tx
type list on a 1 to many basis. If employed then an fk check presumably has
to resolve to a unique entry in the case of code validation.

No, I was saying you should have a multi-value key in your lookup
table that gives the relation, something like:

create table tx_type_check (
    tx_type text,
    codes   text,
    primary key (tx_type, codes)
);

You populate it with all your possible value combinations, and then in
your master table have a FK to the tx_type_check table.

Does that make sense?

#7Scott Marlowe
scott.marlowe@gmail.com
In reply to: Scott Marlowe (#6)
Re: Approaches for Lookup values (codes) in OLTP application

On Feb 15, 2008 5:25 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:

On Feb 15, 2008 3:31 PM, James B. Byrne <byrnejb@harte-lyne.ca> wrote:

On Fri, February 15, 2008 14:43, Scott Marlowe wrote:

For something externally provided and widely used like country codes
then option one is attractive and possibly the most sensible and
robust solution. But consider things like transaction status codes.
Perhaps an invoice transaction has five possible codes and a credit-
note has only three, but one of those three is not valid for invoices.
Where does one put such things?

You could use a simple multi-part check constraint for that, or, if it
needs to be more fluid than that, you could use some kind of multi-key
table that points to a valid tx type list on a 1 to many basis, and
when you insert you FK check the two values against that table.

Is this to say that one should establish a table with the code as the
"non-unique" index and then have as its dependent values the usage contexts
which are applied as filters? I do not comprehend what you mean by a valid tx
type list on a 1 to many basis. If employed then an fk check presumably has
to resolve to a unique entry in the case of code validation.

No, I was saying you should have a multi-value key in your lookup
table that gives the relation of something like::

create table tx_type_check (tx_type text, codes text, primary key
(tx_type, codes));

You populate it with all your possible value combinations, and then in
your master table have a FK to the tx_type_check table.

Does that make sense?

Here's what I had in mind, a simple example:

-- Create and load the lookup table:
create table tx_type_check (
    tx_type text,
    codes   text,
    primary key (tx_type, codes)
);
insert into tx_type_check values ('invoice','inv1');
insert into tx_type_check values ('invoice','inv2');
insert into tx_type_check values ('invoice','inv3');
insert into tx_type_check values ('invoice','shr1');
insert into tx_type_check values ('invoice','shr2');
insert into tx_type_check values ('credit','shr1');
insert into tx_type_check values ('credit','shr2');
insert into tx_type_check values ('credit','crd1');

-- Create a master table that references this lookup table:
create table txm (
    id      serial primary key,
    tx_type text,
    tx_code text,
    foreign key (tx_type, tx_code) references tx_type_check (tx_type, codes)
);

-- test it
insert into txm (tx_type, tx_code) values ('invoice','inv1');
INSERT 0 1
insert into txm (tx_type, tx_code) values ('invoice','shr1');
INSERT 0 1
insert into txm (tx_type, tx_code) values ('invoice','crd1');
ERROR: insert or update on table "txm" violates foreign key
constraint "txm_tx_type_fkey"
DETAIL: Key (tx_type,tx_code)=(invoice,crd1) is not present in table
"tx_type_check".

and we can't insert invalid combinations of the two.

#8Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Scott Marlowe (#4)
Re: Approaches for Lookup values (codes) in OLTP application

On Feb 15, 2008, at 1:43 PM, Scott Marlowe wrote:

Generally speaking, I tend towards using the real value as the key and
foreign key in lookup tables, but occasionally using an artificial
numeric key is a better choice.

Something to consider here... any table that will have either a lot
of rows or a lot of "type" fields will likely be better off with a
phantom key (such as a serial) rather than storing text values in the
base table. As an example, we have a 500GB database at work that
currently doesn't use any phantom keys for this kind of thing. I
recently estimated that if I normalized every field where doing so
would save more than 1MB, it would reduce the size of the database by
142GB. Granted, about half of that is in a somewhat unusual table
that logs emails (many of the emails have the same text, so the gain
there comes from normalizing that), but even discounting that, 75GB is
nothing to sneeze at in an OLTP database.
--
Decibel!, aka Jim C. Nasby, Database Architect decibel@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828
