Request for advice: Table design

Started by Mattias Kregertalmost 23 years ago8 messagesgeneral
Jump to latest
#1Mattias Kregert
mattias@kregert.se

Good morning, list!
I am in the middle of a project and i am just about to decide how to lay out the table(s) for work orders. I had been planning to use one table, but now i'm not so sure that it would be optimal.

The order history will grow with something like 150K-200K rows/year and will have to be saved for ten years.
The total number of active (new+validated+planned+ready_for_invoicing) orders would be something like 5K rows.

I am thinking about two solutions:

1. One table, "orders" with a column named "status". New orders, validated orders, planned orders, ready for invoicing, and old orders, all in one big table. The status column would be updated a number of times for each order.

2. A number of tables, "new_orders", "validated_orders", "order_history" etc... No status column. Order rows would be moved from one table to another. Perhaps i should have only two tables: "orders" and "order_history"?

More tables would mean more indexes (but smaller in size), harder to use/maintain a lot of tables and indexes and triggers and stuff... One table would mean that a lot of inactive orders would slow down access to active rows in the table...

Any suggestions? Real world examples?

/M

#2Thomas Kellerer
spam_eater@gmx.net
In reply to: Mattias Kregert (#1)
Re: Request for advice: Table design

Mattias Kregert schrieb:

The order history will grow with something like 150K-200K rows/year
and will have to be saved for ten years. The total number of active
(new+validated+planned+ready_for_invoicing) orders would be something
like 5K rows.

I am thinking about two solutions:

1. One table, "orders" with a column named "status". New orders,
validated orders, planned orders, ready for invoicing, and old
orders, all in one big table. The status column would be updated a
number of times for each order.

2. A number of tables, "new_orders", "validated_orders",
"order_history" etc... No status column. Order rows would be moved
from one table to another. Perhaps i should have only two tables:
"orders" and "order_history"?

More tables would mean more indexes (but smaller in size), harder to
use/maintain a lot of tables and indexes and triggers and stuff...
One table would mean that a lot of inactive orders would slow down
access to active rows in the table...

From my experience, I'd go for two tables orders and order_history.
Then move orders which are no longer used from orders to order_history.
E.g. you could keep only the last two years in orders, and move
everything else over to order_history. Thus you'll never have more than
300-400k rows in the orders table, which isn't much at all. The only
drawback is, that your application needs to be aware of that, in order
to be able to offer reporting on historic orders.

Using different tables for different order statuses is not a good idea,
you will eventually run into problems with that.

Thomas

#3Jason Godden
jasongodden@optushome.com.au
In reply to: Mattias Kregert (#1)
Re: Request for advice: Table design

Use one table - it's easier to maintain, more relationally correct and a
growth of 150K-200K rows a year should be nothing to PostgreSQL.

2 million rows isn't very many or difficult for PostgreSQL to manage so long
as it is correctly indexed and vacuumed following any really large bulk data
changes. The speed of PostgreSQL won't be an issue either providing you have
allocated enough memory in your pg_hba.conf and you are running on a
reasonable server (ie - at leats 512mb of ram)

You could actually consider three tables though:

CREATE TABLE orders (
orderid SERIAL8 NOT NULL PRIMARY KEY,
...
<misc order record stuff>
) WITHOUT OIDS;

CREATE TABLE orderstatuses (
orderstatus VARCHAR(20) NOT NULL PRIMARY KEY,
statusdesc VARCHAR(200) NOT NULL DEFAULT 'New Status',
) WITHOUT OIDS;

CREATE TABLE orderhistory (
orderid INT8 NOT NULL REFERENCES orders (orderid) ON UPDATE CASCADE ON
DELETE CASCADE,
historyuser VARCHAR(20) NOT NULL DEFAULT current_user,
historywhen TIMESTAMP NOT NULL DEFAULT current_timestamp,
historystatus VARCHAR(20) NOT NULL REFERENCES orderstatuses (orderstatus) ON
UPDATE CASCADE ON DELETE RESTRICT,
CONSTRAINT orderhistory_pkey PRIMARY KEY (orderid,historywhen)
) WITHOUT OIDS;

... and maybe create a non-unique index on orderid in orderhistory.

This way you can always get the status of an order by selecting the most
recent entry in the orderhistory table pertaining to a particular record and
use a single plpgsql function add/modify data. You could also use a rewrite
rule on a view displaying the most recent record. And now you have an order
history defining an order's state at a point in time.

Your style may differ to mine in terms of id's etc.. I know storing a whole
bunch of varchar's in the orderhistory table and referencing orderstatuses
may be a bit funny but I've always been of the opinion that it's best to use
a key that actually describes the data.

Rgds,

Jason

Show quoted text

On Tue, 17 Jun 2003 06:35 pm, Mattias Kregert wrote:

Good morning, list!
I am in the middle of a project and i am just about to decide how to lay
out the table(s) for work orders. I had been planning to use one table, but
now i'm not so sure that it would be optimal.

The order history will grow with something like 150K-200K rows/year and
will have to be saved for ten years. The total number of active
(new+validated+planned+ready_for_invoicing) orders would be something like
5K rows.

I am thinking about two solutions:

1. One table, "orders" with a column named "status". New orders, validated
orders, planned orders, ready for invoicing, and old orders, all in one big
table. The status column would be updated a number of times for each order.

2. A number of tables, "new_orders", "validated_orders", "order_history"
etc... No status column. Order rows would be moved from one table to
another. Perhaps i should have only two tables: "orders" and
"order_history"?

More tables would mean more indexes (but smaller in size), harder to
use/maintain a lot of tables and indexes and triggers and stuff... One
table would mean that a lot of inactive orders would slow down access to
active rows in the table...

Any suggestions? Real world examples?

/M

---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org

#4Andrew Perrin
clists@perrin.socsci.unc.edu
In reply to: Jason Godden (#3)
Re: Request for advice: Table design

I agree with Jason. There's no reason to use several tables for data that
are essentially the same in character; that's the point of a relational
database to begin with! I also agree about the orderhistory table: use it
to store cumulative updates, linked one-to-many with the orders table, so
you can get the complete history of an order as well as the latest status
very easily.

Best,
Andy Perrin

----------------------------------------------------------------------
Andrew J Perrin - http://www.unc.edu/~aperrin
Assistant Professor of Sociology, U of North Carolina, Chapel Hill
clists@perrin.socsci.unc.edu * andrew_perrin (at) unc.edu

On Tue, 17 Jun 2003, Jason Godden wrote:

Show quoted text

Use one table - it's easier to maintain, more relationally correct and a
growth of 150K-200K rows a year should be nothing to PostgreSQL.

2 million rows isn't very many or difficult for PostgreSQL to manage so long
as it is correctly indexed and vacuumed following any really large bulk data
changes. The speed of PostgreSQL won't be an issue either providing you have
allocated enough memory in your pg_hba.conf and you are running on a
reasonable server (ie - at leats 512mb of ram)

You could actually consider three tables though:

CREATE TABLE orders (
orderid SERIAL8 NOT NULL PRIMARY KEY,
...
<misc order record stuff>
) WITHOUT OIDS;

CREATE TABLE orderstatuses (
orderstatus VARCHAR(20) NOT NULL PRIMARY KEY,
statusdesc VARCHAR(200) NOT NULL DEFAULT 'New Status',
) WITHOUT OIDS;

CREATE TABLE orderhistory (
orderid INT8 NOT NULL REFERENCES orders (orderid) ON UPDATE CASCADE ON
DELETE CASCADE,
historyuser VARCHAR(20) NOT NULL DEFAULT current_user,
historywhen TIMESTAMP NOT NULL DEFAULT current_timestamp,
historystatus VARCHAR(20) NOT NULL REFERENCES orderstatuses (orderstatus) ON
UPDATE CASCADE ON DELETE RESTRICT,
CONSTRAINT orderhistory_pkey PRIMARY KEY (orderid,historywhen)
) WITHOUT OIDS;

... and maybe create a non-unique index on orderid in orderhistory.

This way you can always get the status of an order by selecting the most
recent entry in the orderhistory table pertaining to a particular record and
use a single plpgsql function add/modify data. You could also use a rewrite
rule on a view displaying the most recent record. And now you have an order
history defining an order's state at a point in time.

Your style may differ to mine in terms of id's etc.. I know storing a whole
bunch of varchar's in the orderhistory table and referencing orderstatuses
may be a bit funny but I've always been of the opinion that it's best to use
a key that actually describes the data.

Rgds,

Jason

On Tue, 17 Jun 2003 06:35 pm, Mattias Kregert wrote:

Good morning, list!
I am in the middle of a project and i am just about to decide how to lay
out the table(s) for work orders. I had been planning to use one table, but
now i'm not so sure that it would be optimal.

The order history will grow with something like 150K-200K rows/year and
will have to be saved for ten years. The total number of active
(new+validated+planned+ready_for_invoicing) orders would be something like
5K rows.

I am thinking about two solutions:

1. One table, "orders" with a column named "status". New orders, validated
orders, planned orders, ready for invoicing, and old orders, all in one big
table. The status column would be updated a number of times for each order.

2. A number of tables, "new_orders", "validated_orders", "order_history"
etc... No status column. Order rows would be moved from one table to
another. Perhaps i should have only two tables: "orders" and
"order_history"?

More tables would mean more indexes (but smaller in size), harder to
use/maintain a lot of tables and indexes and triggers and stuff... One
table would mean that a lot of inactive orders would slow down access to
active rows in the table...

Any suggestions? Real world examples?

/M

---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org

---------------------------(end of broadcast)---------------------------
TIP 7: don't forget to increase your free space map settings

#5Mike Mascari
mascarm@mascari.com
In reply to: Mattias Kregert (#1)
Re: Request for advice: Table design

Mattias Kregert wrote:

Good morning, list!

I am in the middle of a project and i am just about to decide
how to lay out the table(s) for work orders. I had been
planning to use one table, but now i'm not so sure
that it would be optimal.

By "optimal", do you mean "logically consistent"?

The order history will grow with something like 150K-200K
rows/year and will have to be saved for ten years.
The total number of active
(new+validated+planned+ready_for_invoicing) orders
would be something like 5K rows.

That shouldn't affect logical design, unless you know you will be
querying the database in a manner which cannot use indexes.

I am thinking about two solutions:

1. One table, "orders" with a column named "status".
New orders, validated orders, planned orders, ready
for invoicing, and old orders, all in one big table.
The status column would be updated a number of
times for each order.

2. A number of tables, "new_orders", "validated_orders",
"order_history" etc... No status column. Order rows
would be moved from one table to another. Perhaps i
should have only two tables: "orders" and "order_history"?

More tables would mean more indexes (but smaller in size),
harder to use/maintain a lot of tables and indexes and
triggers and stuff... One table would mean that a lot
of inactive orders would slow down access to active
rows in the table...

The question cannot be answered without a fully defined requirements
document from the customer.

Do you need to know only what the current status of a workorder is? If
yes, one table might do. Otherwise, you need a history table. For each
change in the the status, does the customer require the same
attributes of the transaction to be recorded? If yes, then a two-table
workorder/orderhistory table might suffice. If no, then a separate
relation for each transaction type will be required. etc. etc.

For example,

Does the transition between "planned orders" and "ready for invoicing"
involve the same sort of attributes as the transition from
non-existence to "new orders"? Or, if the data involved is different,
does the customer not care? And has stated so explicitly?

Without such information, a design would be pure speculation.

HTH,

Mike Mascari
mascarm@mascari.com

#6SZŰCS Gábor
surrano@mailbox.hu
In reply to: Mattias Kregert (#1)
Re: Request for advice: Table design

Mattias,

Here goes how we do it:

order_status (id, name):
valid order statuses
order (partners, dates, status, etc):
main data of order
refs to order_status
order_statuschange (order, timestamp, oldstatus, newstatus, user):
history of status changes on each order
refs to order (cascade), order_status
order_mod (order, mod, user, closed, etc):
separate modifications for an order (only the newest is able to receive
items,
see below)
refs to order (cascade)
order_item (order, itemno, archive, archive_from_mod):
items in each order.
refs to order (cascade), order_mod
order_notify (order, timestamp, user, operation):
log and notify table
does NOT ref to order so delete from order leaves this table alone.

The same structure goes for shipments and invoices. A note: order_mod is a
historical thing we'd like to get rid of, but our hands are always full to
do that, so we let it stay.

G.
------------------------------- cut here -------------------------------
----- Original Message -----
From: "Mattias Kregert" <mattias@kregert.se>
Sent: Tuesday, June 17, 2003 10:35 AM

More tables would mean more indexes (but smaller in size), harder to
use/maintain a lot of tables and indexes and triggers and stuff... One table
would mean that a lot of inactive orders would slow down access to active
rows in the table...

Any suggestions? Real world examples?

#7Guy Fraser
guy@incentre.net
In reply to: Andrew Perrin (#4)
Re: Request for advice: Table design

Hi

As an additional note;

Older data is moved into a seperate table to reduce the number of
records that require regular vacuuming. Since the tables would contain
similar data it is simple to use union selections in a view with an
additional column to indicate which table the data comes from. Using a
view that combines the data from the two tables using a union, the data
will appear to be comming from a single table. This method make archival
access transparent.

I have a realtime data collection system that I built. The data is put
into tables on a yearly and monthly basis on the fly and new tables are
created as needed. I use a union to join tables to access the data over
several months. I just thought of a new idea, I am going to write a
function to join the tables required over a timespan - but that's
another story.

Two tables are easy to join with a union :
{if the column types are exactly matched}

create view all_data
select *,'current_data'::text as data_table from current_data ...
union
select *,'archive_data'::text from archive_data ...
;

The last column will indicate the data's origin.

Now to see all the data :

select * from all_data ;

Thats about it, using this method allows the "dynamic" table to small
for quick maintenace and operation, while the "static" table needs less
maintenace so it can be large with out the penalties incurred by
frequent maintenace.

Guy

Andrew Perrin wrote:

Show quoted text

I agree with Jason. There's no reason to use several tables for data that
are essentially the same in character; that's the point of a relational
database to begin with! I also agree about the orderhistory table: use it
to store cumulative updates, linked one-to-many with the orders table, so
you can get the complete history of an order as well as the latest status
very easily.

Best,
Andy Perrin

----------------------------------------------------------------------
Andrew J Perrin - http://www.unc.edu/~aperrin
Assistant Professor of Sociology, U of North Carolina, Chapel Hill
clists@perrin.socsci.unc.edu * andrew_perrin (at) unc.edu

On Tue, 17 Jun 2003, Jason Godden wrote:

Use one table - it's easier to maintain, more relationally correct and a
growth of 150K-200K rows a year should be nothing to PostgreSQL.

2 million rows isn't very many or difficult for PostgreSQL to manage so long
as it is correctly indexed and vacuumed following any really large bulk data
changes. The speed of PostgreSQL won't be an issue either providing you have
allocated enough memory in your pg_hba.conf and you are running on a
reasonable server (ie - at leats 512mb of ram)

You could actually consider three tables though:

CREATE TABLE orders (
orderid SERIAL8 NOT NULL PRIMARY KEY,
...
<misc order record stuff>
) WITHOUT OIDS;

CREATE TABLE orderstatuses (
orderstatus VARCHAR(20) NOT NULL PRIMARY KEY,
statusdesc VARCHAR(200) NOT NULL DEFAULT 'New Status',
) WITHOUT OIDS;

CREATE TABLE orderhistory (
orderid INT8 NOT NULL REFERENCES orders (orderid) ON UPDATE CASCADE ON
DELETE CASCADE,
historyuser VARCHAR(20) NOT NULL DEFAULT current_user,
historywhen TIMESTAMP NOT NULL DEFAULT current_timestamp,
historystatus VARCHAR(20) NOT NULL REFERENCES orderstatuses (orderstatus) ON
UPDATE CASCADE ON DELETE RESTRICT,
CONSTRAINT orderhistory_pkey PRIMARY KEY (orderid,historywhen)
) WITHOUT OIDS;

... and maybe create a non-unique index on orderid in orderhistory.

This way you can always get the status of an order by selecting the most
recent entry in the orderhistory table pertaining to a particular record and
use a single plpgsql function add/modify data. You could also use a rewrite
rule on a view displaying the most recent record. And now you have an order
history defining an order's state at a point in time.

Your style may differ to mine in terms of id's etc.. I know storing a whole
bunch of varchar's in the orderhistory table and referencing orderstatuses
may be a bit funny but I've always been of the opinion that it's best to use
a key that actually describes the data.

Rgds,

Jason

On Tue, 17 Jun 2003 06:35 pm, Mattias Kregert wrote:

Good morning, list!
I am in the middle of a project and i am just about to decide how to lay
out the table(s) for work orders. I had been planning to use one table, but
now i'm not so sure that it would be optimal.

The order history will grow with something like 150K-200K rows/year and
will have to be saved for ten years. The total number of active
(new+validated+planned+ready_for_invoicing) orders would be something like
5K rows.

I am thinking about two solutions:

1. One table, "orders" with a column named "status". New orders, validated
orders, planned orders, ready for invoicing, and old orders, all in one big
table. The status column would be updated a number of times for each order.

2. A number of tables, "new_orders", "validated_orders", "order_history"
etc... No status column. Order rows would be moved from one table to
another. Perhaps i should have only two tables: "orders" and
"order_history"?

More tables would mean more indexes (but smaller in size), harder to
use/maintain a lot of tables and indexes and triggers and stuff... One
table would mean that a lot of inactive orders would slow down access to
active rows in the table...

Any suggestions? Real world examples?

/M

---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org

---------------------------(end of broadcast)---------------------------
TIP 7: don't forget to increase your free space map settings

---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faqs/FAQ.html

#8Guy Fraser
guy@incentre.net
In reply to: Mattias Kregert (#1)
Re: [GENERAL] Request for advice: Table design

Hi

I realized I made a mistake. :(

The example below should have :

create view all_data as
select ...

The "as" statement was missing.

Here is a real sample of a function allowing a similar effect :

----------- sql script ---------------
--
-- delete old function [must be done before type is dropped]
--
DROP FUNCTION acct_info( TEXT , INTEGER );
--
-- delete old type
--
DROP TYPE acct_info_record;

--
-- return type for acct_info records
--
CREATE TYPE acct_info_record AS (
"Time-Stamp" ABSTIME,
"Acct-Status-Type" TEXT,
"User-Name" TEXT,
"Realm" TEXT,
"Acct-Session-Time" INTEGER,
"Acct-Input-Octets" INTEGER,
"Acct-Output-Octets" INTEGER,
"Called-Station-Id" TEXT,
"Calling-Station-Id" TEXT,
"Acct-Terminate-Cause" TEXT,
"Framed-IP-Address" INET,
"Service-Type" TEXT,
"Framed-Protocol" TEXT,
"Client-IP-Address" INET,
"NAS-IP-Address" INET,
"NAS-Port-Type" TEXT,
"NAS-Port-Id" INTEGER,
"Timestamp" INTEGER,
"Acct-Session-Id" TEXT,
"Acct-Link-Count" SMALLINT,
"Acct-Multi-Session-Id" TEXT,
"Acct-Delay-Time" INTEGER
);

--
-- function to select start and stop records as one data set by "mon"
and year.
--
CREATE FUNCTION acct_info( TEXT , INTEGER ) RETURNS SETOF
acct_info_record AS '
DECLARE
p_mon ALIAS FOR $1;
p_year ALIAS FOR $2;
v_exec TEXT;
rec RECORD;
BEGIN
v_exec := ''SELECT
"Time-Stamp",
"Acct-Status-Type",
"User-Name",
"Realm",
"Acct-Session-Time",
"Acct-Input-Octets",
"Acct-Output-Octets",
"Called-Station-Id",
"Calling-Station-Id",
"Acct-Terminate-Cause",
"Framed-IP-Address",
"Service-Type",
"Framed-Protocol",
"Client-IP-Address",
"NAS-IP-Address",
"NAS-Port-Type",
"NAS-Port-Id",
"Timestamp",
"Acct-Session-Id",
"Acct-Link-Count",
"Acct-Multi-Session-Id",
"Acct-Delay-Time"
FROM acct_start_'' || p_year || p_mon ||
'' UNION SELECT
"Time-Stamp",
"Acct-Status-Type",
"User-Name",
"Realm",
"Acct-Session-Time",
"Acct-Input-Octets",
"Acct-Output-Octets",
"Called-Station-Id",
"Calling-Station-Id",
"Acct-Terminate-Cause",
"Framed-IP-Address",
"Service-Type",
"Framed-Protocol",
"Client-IP-Address",
"NAS-IP-Address",
"NAS-Port-Type",
"NAS-Port-Id",
"Timestamp",
"Acct-Session-Id",
"Acct-Link-Count",
"Acct-Multi-Session-Id",
"Acct-Delay-Time"
FROM acct_stop_'' || p_year || p_mon ;
FOR rec IN EXECUTE v_exec
LOOP
RETURN NEXT rec;
END LOOP;
RETURN;
END;
' LANGUAGE 'plpgsql';

--
-- check to make sure it works
--
SELECT * FROM acct_info('jun','2003') LIMIT 10;
--
------- end of sql script ----------

That may not be of as much help for that project, but it was somthing I
realized I could use in one of my applications.

Guy

Dennis Gearon wrote:

Show quoted text

wow! Thanks for that info. I'm definitely filing this for use in a
future,near term project.

Guy Fraser wrote:

Hi

As an additional note;

Older data is moved into a seperate table to reduce the number of
records that require regular vacuuming. Since the tables would
contain similar data it is simple to use union selections in a view
with an additional column to indicate which table the data comes
from. Using a view that combines the data from the two tables using a
union, the data will appear to be comming from a single table. This
method make archival access transparent.

I have a realtime data collection system that I built. The data is
put into tables on a yearly and monthly basis on the fly and new
tables are created as needed. I use a union to join tables to access
the data over several months. I just thought of a new idea, I am
going to write a function to join the tables required over a timespan
- but that's another story.

Two tables are easy to join with a union :
{if the column types are exactly matched}

create view all_data
select *,'current_data'::text as data_table from current_data ...
union
select *,'archive_data'::text from archive_data ...
;

The last column will indicate the data's origin.

Now to see all the data :

select * from all_data ;

Thats about it, using this method allows the "dynamic" table to small
for quick maintenace and operation, while the "static" table needs
less maintenace so it can be large with out the penalties incurred by
frequent maintenace.

Guy