RULE questions.

Started by Neil Burrowsover 27 years ago11 messageshackers

maillist@remo.demon.co.uk

over 27 years ago

Hi,

I have what I first thought would be a trivial problem, in that I require
the 2 VARCHAR columns in the following table to have the data stored in
upper case.

test_table
+-----------------------------+------------------------------+------+
|            Field            |             Type             |Length|
+-----------------------------+------------------------------+------+
| user_id                     | int4                         |    4 |
| name                        | varchar()                    |   10 |
| password                    | varchar()                    |   10 |
+-----------------------------+------------------------------+------+

I considered just using UPPER() in every SELECT statement, or creating a
view that SELECTed the 2 VARCHAR cols using UPPER but I assume that this
would have more performance issues than having the data converted to
uppercase only once during an insert or update. (Please correct me if I am
wrong).

After looking at triggers and rules I came up with the following solution:

CREATE VIEW test AS SELECT * FROM test_table;

CREATE RULE insert_test AS
ON INSERT TO test DO INSTEAD
INSERT INTO test_table (user_id, name, password) VALUES(new.user_id,
UPPER(new.name), UPPER(new.password));

CREATE RULE update_test AS
ON UPDATE TO test DO INSTEAD
UPDATE test_table SET name = UPPER(new.name), password = UPPER(new.password)
WHERE user_id = new.user_id;

which means that any insert or update to the test view is stored in upper
case as required.

However, I still have two concerns about this.

1) What impact on performance does using a VIEW in this way have?
2) Users can still enter data straight into test_table in lower case
bypassing the "rules"

First off, is there an easier way to ensure that data is stored in uppercase
for certain columns (not the whole table). And if not does anyone have
comments on performance issues, or ways of stopping users accidentally or
intentionally inserting lower case data straight into the table rather than
the view?

Many thanks in advance,

---[ Neil Burrows ]-----------------------------------------------------
E-mail: neil.burrows@gssec.bt.co.uk British Telecom Plc.
: neil@pawprint.co.uk Glasgow Engineering Centre
Web : http://www.remo.demon.co.uk/ Highburgh Rd. Glasgow UK
-----------< Any views expressed are not those of my employer >-----------

Jan Wieck

JanWieck@Yahoo.com

over 27 years ago

In reply to: Neil Burrows (#1)

Re: [SQL] RULE questions.

Hi,

I have what I first thought would be a trivial problem, in that I require
the 2 VARCHAR columns in the following table to have the data stored in
upper case.
test_table
+-----------------------------+------------------------------+------+
|            Field            |             Type             |Length|
+-----------------------------+------------------------------+------+
| user_id                     | int4                         |    4 |
| name                        | varchar()                    |   10 |
| password                    | varchar()                    |   10 |
+-----------------------------+------------------------------+------+
I considered just using UPPER() in every SELECT statement, or creating a
view that SELECTed the 2 VARCHAR cols using UPPER but I assume that this
would have more performance issues than having the data converted to
uppercase only once during an insert or update. (Please correct me if I am
wrong).

It's right.

After looking at triggers and rules I came up with the following solution:

CREATE VIEW test AS SELECT * FROM test_table;

CREATE RULE insert_test AS
ON INSERT TO test DO INSTEAD
INSERT INTO test_table (user_id, name, password) VALUES(new.user_id,
UPPER(new.name), UPPER(new.password));

CREATE RULE update_test AS
ON UPDATE TO test DO INSTEAD
UPDATE test_table SET name = UPPER(new.name), password = UPPER(new.password)
WHERE user_id = new.user_id;

1. Make sure user_id is unique or extend the WHERE clause in
the UPDATE rule. To explain why:

user_id | name
--------+----------
1 | aaa
1 | bbb
2 | ccc

UPDATE test SET name = 'ddd' WHERE name = 'aaa';

user_id | name
--------+----------
1 | ddd
1 | ddd
2 | ccc

This is because the rule will find the user_id 1 for name
'aaa' and then updates any row with user_id 1.

2. Change the WHERE clause in the UPDATE rule to compare
against old.user_id and add "user_id = new.user_id" to
the SET clause. Otherwise it would not be possible to
change the user_id because this thrown away by the rule.

3. Don't forget the ON DELETE rule. Maybe you don't want
once given user_id's to be changed or deleted. Then 2.
and 3. aren't right.

which means that any insert or update to the test view is stored in upper
case as required.

However, I still have two concerns about this.

1) What impact on performance does using a VIEW in this way have?

Only the rewriting overhead per query. The rewrite system
changes the querytree generated by the parser in such a way
that the planner/optimizer will get the same input as if the
query really was the SELECT from test_table. If you have a
view

CREATE VIEW test AS SELECT * FROM test_table;

the two statements

SELECT * FROM test;
SELECT * FROM test_table;

are totally equivalent from the planners/optimizers (and so
from the executors) point of view. The rewriting overhead
depends on how complex the statements and rule definitions
are. But not on the number of rows affected in the statement.
Selecting thousands of rows has the same speed than doing it
from the real tables behind a view. It's very small because
compared against parser/planner/optimizer it has to do very
few system cache lookups and works mostly with the data that
is already in memory.

2) Users can still enter data straight into test_table in lower case
bypassing the "rules"

Not necessarily. Since v6.4 rule actions (in contrast to
triggers up to now) inherit the access permissions of the
owner of the relation they're fired on.

CREATE TABLE test_table ...;
CREATE VIEW test AS SELECT * FROM test_table;

REVOKE ALL ON test_table FROM public;
GRANT ALL ON test_table TO me;

REVOKE ALL ON test FROM public;
GRANT ALL ON test TO me;
GRANT SELECT, INSERT, UPDATE, DELETE ON test TO public;

Now any user can access test, but nobody but me can access
test_table. Not even a SELECT does work. They can do most
things on test. But the rule actions are executed under the
permissions of me, so they work silently.

YOU MUST NOT GRANT ALL TO PUBLIC. ALL includes RULE
permission, so a user could change the rules on test, do some
things (maybe on any of your other tables) and reinstall the
original state of rules!

In addition to that, consider the case you really don't want
once given user_id's ever to change. Nor you like them to be
ever reused. But they should disappear on DELETE.

CREATE TABLE test_table (user_id int,
name varchar(10),
pass varchar(10),
alive bool);

CREATE UNIQUE INDEX test_user_id ON test_table (user_id);

CREATE VIEW test AS SELECT * FROM test_data
WHERE alive;

CREATE RULE ins_test AS ON INSERT TO test
DO INSTEAD INSERT INTO test_table
VALUES (new.user_id, UPPER(new.name), UPPER(new.pass), 't');

CREATE RULE upd_test AS ON UPDATE TO test
DO INSTEAD UPDATE test_table
SET name = UPPER(new.name), pass = UPPER(new.pass)
WHERE user_id = old.user_id AND alive;

CREATE RULE del_test AS ON DELETE TO test
DO INSTEAD UPDATE test_table
SET alive = 'f'
WHERE user_id = old.user_id AND alive;

Plus all the REVOKE and GRANT. This setup denies changes to
user_id, makes the row's disappear on DELETE but throw's an
error 'cannot insert duplicate ...' if someone tries to reuse
a user_id. Only the owner of the test_table can reincarnate a
once deleted account.

First off, is there an easier way to ensure that data is stored in uppercase
for certain columns (not the whole table). And if not does anyone have
comments on performance issues, or ways of stopping users accidentally or
intentionally inserting lower case data straight into the table rather than
the view?

The Postgres rewrite rule system is the most powerful way to
do that.

Jan

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#======================================== jwieck@debis.com (Jan Wieck) #

D'Arcy J.M. Cain

darcy@druid.net

over 27 years ago

In reply to: Neil Burrows (#1)

Re: [SQL] RULE questions.

Thus spake Neil Burrows

First off, is there an easier way to ensure that data is stored in uppercase
for certain columns (not the whole table). And if not does anyone have
comments on performance issues, or ways of stopping users accidentally or
intentionally inserting lower case data straight into the table rather than
the view?

This makes me think of two features missing in PostgreSQL that I would
love to see. I know it's probably to late to think about it now for
6.5 but I wonder what others think about this.

First, as suggested above, how about an option to automatically convert
data to upper case on entry? I realize that triggers can do the job but
it seems to be needed often enough that putting it into the definition
for the field seems useful. I guess a lower option would make sense too.

Second, an option to CREATE INDEX to make the index case insensitive.
Other RDBMS systems do this and it is nice not to depend on users being
consistent when entering names. Consider ("albert", "Daniel", "DENNIS")
which would sort exactly opposite. Also, in a primary key field (or
unique index) it would be nice if "A" was rejected if "a" already was
in the database.

Thoughts?

Followups to hackers.

-- 
D'Arcy J.M. Cain <darcy@{druid|vex}.net>   |  Democracy is three wolves
http://www.druid.net/darcy/                |  and a sheep voting on
+1 416 424 2871     (DoD#0082)    (eNTP)   |  what's for dinner.

Neil Burrows

maillist@remo.demon.co.uk

over 27 years ago

In reply to: Jan Wieck (#2)

RE: [SQL] RULE questions.

Hi,

I have what I first thought would be a trivial problem, in that

I require

the 2 VARCHAR columns in the following table to have the data stored in
upper case.

1. Make sure user_id is unique or extend the WHERE clause in
the UPDATE rule. To explain why:

This is actually just a small test table, and the real one has quite a few
more columns, but I did mean to make user_id unique, just forgot. :)

2. Change the WHERE clause in the UPDATE rule to compare
against old.user_id and add "user_id = new.user_id" to
the SET clause. Otherwise it would not be possible to
change the user_id because this thrown away by the rule.

The thinking behind it was that user_id shouldn't be able changed but I
accidentally neglected to mention that.

2) Users can still enter data straight into test_table in lower case
bypassing the "rules"

Not necessarily. Since v6.4 rule actions (in contrast to
triggers up to now) inherit the access permissions of the
owner of the relation they're fired on.

Ahh, I see. I thought that the rule actions used the current users access
permissions, not the owners. That's much handier, thanks.

In addition to that, consider the case you really don't want
once given user_id's ever to change. Nor you like them to be
ever reused. But they should disappear on DELETE.

CREATE TABLE test_table (user_id int,
name varchar(10),
pass varchar(10),
alive bool);

And that's a great way of doing what I was going to start looking at next.
:)

The Postgres rewrite rule system is the most powerful way to
do that.

Thanks very much for your time and comments here. It's certainly made
things clearer.

Thanks again,

Tom Lane

tgl@sss.pgh.pa.us

over 27 years ago

In reply to: Neil Burrows (#4)

Re: [HACKERS] Re: [SQL] RULE questions.

"D'Arcy" "J.M." Cain <darcy@druid.net> writes:

Second, an option to CREATE INDEX to make the index case insensitive.

That, at least, we can already do: build the index on lower(field) not
just field. Or upper(field) if that seems more natural to you.

Also, in a primary key field (or
unique index) it would be nice if "A" was rejected if "a" already was
in the database.

Making either of the above a UNIQUE index should accomplish that.

regards, tom lane

Import Notes

Reply to msg id not found: YourmessageofThu11Feb1999073300-0500m10AvIi-0000c1C@druid.net | Resolved by subject fallback

Hannu Krosing

hannu@tm.ee

over 27 years ago

In reply to: D'Arcy J.M. Cain (#3)

Re: [HACKERS] Re: [SQL] RULE questions.

D'Arcy J.M. Cain wrote:

This makes me think of two features missing in PostgreSQL that I would
love to see. I know it's probably to late to think about it now for
6.5 but I wonder what others think about this.

First, as suggested above, how about an option to automatically convert
data to upper case on entry? I realize that triggers can do the job but
it seems to be needed often enough that putting it into the definition
for the field seems useful. I guess a lower option would make sense too.

These could probably be implemened more effectively using rules. Having
the
rules generated automatically for simple cases would of course be nice,
but a warning at least should be given to user about creating the rule,
like it's currently done with primary key.

Or maybe it would be better to support virtual fields, like this :

create table people(
first_name varchar(25),
last_name varchar(25),
upper_first_name VIRTUAL upper(first_name),
upper_last_name VIRTUAL upper(last_name),
full_name VIRTUAL (upper_first_name || ' ' || upper_last_name)
primary key
);

and then untangle this in the backend and create required rules and
indexes automatically ?

Second, an option to CREATE INDEX to make the index case insensitive.

If you have this option on idex, how do you plan to make sure that the
index is actually used ?

It may be better to do it explicitly -

1. create index on upper(field)

2. use where upper(field) = 'MYDATA'

---------------
Hannu

Jan Wieck

JanWieck@Yahoo.com

over 27 years ago

In reply to: Hannu Krosing (#6)

Re: [HACKERS] Re: [SQL] RULE questions.

D'Arcy J.M. Cain wrote:

This makes me think of two features missing in PostgreSQL that I would
love to see. I know it's probably to late to think about it now for
6.5 but I wonder what others think about this.

First, as suggested above, how about an option to automatically convert
data to upper case on entry? I realize that triggers can do the job but
it seems to be needed often enough that putting it into the definition
for the field seems useful. I guess a lower option would make sense too.

These could probably be implemened more effectively using rules. Having
the
rules generated automatically for simple cases would of course be nice,
but a warning at least should be given to user about creating the rule,
like it's currently done with primary key.

No it can't.

Such a rule would look like

CREATE RULE xxx AS ON INSERT TO this_table
DO INSTEAD INSERT INTO this_table ...

The rule system will be triggerd on an INSERT INTO
this_table, rewrite and generate another parsetree that is an
INSERT INTO this_table, which is recursively rewritten again
applying rule xxx...

That's an endless recursion. A rule can never do the same
operation to a table it is fired for.

The old pre-Postgres95 university version (Postgres release
4.2) had the possibility to define rules that UPDATE NEW.
They where buggy and didn't worked sometimes at all. Instead
of fixing them, this functionality got removed when Postgres
became 95 :-(

Jan

D'Arcy J.M. Cain

darcy@druid.net

over 27 years ago

In reply to: Tom Lane (#5)

Re: [HACKERS] Re: [SQL] RULE questions.

Thus spake Tom Lane

"D'Arcy" "J.M." Cain <darcy@druid.net> writes:

Second, an option to CREATE INDEX to make the index case insensitive.

That, at least, we can already do: build the index on lower(field) not
just field. Or upper(field) if that seems more natural to you.

Almost. I guess I wasn't completely clear. Here's an example.

darcy=> create table x (a int, t text);
CREATE
darcy=> create unique index ti on x (lower(t) text_ops);
CREATE
darcy=> insert into x values (1, 'abc');
INSERT 19021 1
darcy=> insert into x values (2, 'ABC');
ERROR: Cannot insert a duplicate key into a unique index
darcy=> insert into x values (2, 'Def');
INSERT 19023 1
darcy=> select * from x;
a|t
-+---
1|abc
2|Def
(2 rows)

darcy=> select * from x where t = 'ABC';
a|t
-+-
(0 rows)

Note that it prevented me from adding the upper case dup just fine. The
last select is the issue. It's necessary for the user to know how it is
stored before doing the select. I realize that you can do this.

darcy=> select * from x where lower(t) = 'abc';

But other systems make this more convenient by just making 'ABC' and 'abc'
equivalent.

Mind you, it may not be possible in our system without creating a new,
case-insensitive type.

Also, in a primary key field (or
unique index) it would be nice if "A" was rejected if "a" already was
in the database.

Making either of the above a UNIQUE index should accomplish that.

True. I'm thinking of the situation where you want the primary key to
be case-insensitive. You can't control that on the auto-generated
unique index so you have to add a second unique index on the same
field. Again, perhaps a new type is the proper way to handle this.

Speaking of primary keys, there's one more thing needed to make primary
support complete, I think. Didn't we originally say that a primary
key field was immutable? We should be able to delete the record but
not change the value of the field in an update. Would this be hard
to do?

-- 
D'Arcy J.M. Cain <darcy@{druid|vex}.net>   |  Democracy is three wolves
http://www.druid.net/darcy/                |  and a sheep voting on
+1 416 424 2871     (DoD#0082)    (eNTP)   |  what's for dinner.

Jan Wieck

JanWieck@Yahoo.com

over 27 years ago

In reply to: D'Arcy J.M. Cain (#8)

Re: [HACKERS] Re: [SQL] RULE questions.

D'Arcy J.M. Cain wrote:

But other systems make this more convenient by just making 'ABC' and 'abc'
equivalent.

Mind you, it may not be possible in our system without creating a new,
case-insensitive type.

And that wouldn't be too hard. For example, implementing
citext (case insensitive text) could use text's input/output
functions and all the things for lower/upper case conversion,
concatenation, substring etc (these are SQL language wrappers
as we already have tons of). Only new comparision operators
have to be built that compare case insensitive and then
creating a new operator class for it. All qualifications and
the sorting in indices, order by, group by are done with the
operators defined for the type.

Also comparision wrappers like to compare text = citext would
be useful, which simply uses citext_eq().

Making either of the above a UNIQUE index should accomplish that.

True. I'm thinking of the situation where you want the primary key to
be case-insensitive. You can't control that on the auto-generated
unique index so you have to add a second unique index on the same
field. Again, perhaps a new type is the proper way to handle this.

The above citext type would inherit this auto.

Speaking of primary keys, there's one more thing needed to make primary
support complete, I think. Didn't we originally say that a primary
key field was immutable? We should be able to delete the record but
not change the value of the field in an update. Would this be hard
to do?

No efford on that. I'm planning to reincarnate attribute
specification for rules and implement a RAISE statement. The
attributes (this time it will be multiple) suppress rule
action completely if none of the attributes appear in the
queries targetlist (what they must on UPDATE to change).

So at create table time, a rule like

CREATE RULE somename AS ON UPDATE TO table
ATTRIBUTE att1, att2
WHERE old.att1 != new.att1 OR old.att2 != old.att2
DO RAISE EXCEPTION 'Primary key of "table" cannot be changed';

could be installed. As long as nobody specifies the fields of
the primary key in it's update, the rewrite system will not
add the RAISE query to the querytree list, so no checking is
done at all.

But as soon as one of the attributes appears in the UPDATE,
there will be one extra query RAISE executed prior to the
UPDATE itself and check that all the new values are the old
ones. This would have the extra benefit, that the transaction
would abort BEFORE any changes have been made to the table at
all (remember that UPDATE in Postgres means another heap
tuple for each touched row and one more invalid tuple for
vacuum to throw away and for in-the-middle-aborted updates it
means so-far-I-came more never committed heap tuples that
vacuum has to send to byte-hell).

This will not appear in v6.5 (hopefully in v6.6). But it's
IMHO the best solution. With the mentioned RAISE, plus the
currently discussed deferred queries etc. we would have the
rule system ready to support ALL the constraint stuff
(cascaded delete, foreign key). But the more we use the rule
system, the more important it becomes that we get rid of the
block limit for tuples.

I think it would be better to spend your efford on that
issue.

Jan

#10

Hannu Krosing

hannu@tm.ee

over 27 years ago

In reply to: Jan Wieck (#7)

Re: [HACKERS] RULE (and ALTER TABLE) questions

Jan Wieck wrote:

These could probably be implemened more effectively using rules. Having
the
rules generated automatically for simple cases would of course be nice,
but a warning at least should be given to user about creating the rule,
like it's currently done with primary key.

No it can't.

Such a rule would look like

CREATE RULE xxx AS ON INSERT TO this_table
DO INSTEAD INSERT INTO this_table ...

The rule system will be triggerd on an INSERT INTO
this_table, rewrite and generate another parsetree that is an
INSERT INTO this_table, which is recursively rewritten again
applying rule xxx...

That's an endless recursion. A rule can never do the same
operation to a table it is fired for.

But when doing that at the table creation time, then the table can
actually
be defined as a view on storage table and rules for insert update and
delete
be defined for this view that do the actual data manipulation on the
storage table.

Or is the rule system currently not capable for this ?

When some field is changed to UPPER-ONLY status using alter table, the
table
could be renamed to staorage table and all the rules be created ?

And the other question - what is the status of ALTER TABLE commands -
can we add/remove/disable constraints without recreating the table ?

Is constraint and index disabling supported at all ?

-------------------
Hannu

#11

Jan Wieck

JanWieck@Yahoo.com

over 27 years ago

In reply to: Hannu Krosing (#10)

Re: [HACKERS] RULE (and ALTER TABLE) questions

But when doing that at the table creation time, then the table can
actually
be defined as a view on storage table and rules for insert update and
delete
be defined for this view that do the actual data manipulation on the
storage table.

That's IMHO a too specific case to do it generally with the
rule system. Should be some kind of constraint handled by
the parser in putting an UPPER() func node around the
targetlist expression.

There could be more general support implemented, in that a
user can allways tell that a custom function should be called
with the result of the TLE-expr before the value is dropped
into the tuple on INSERT/UPDATE.

Jan