The case for preserving case.
Hello, postgresql hackers.
I am working with a client with a 20k record MySQL database (that will
shortly expand to 100k/1m) and a few thousand lines of PHP code that
does a lot of DB interaction.
Their application, with a lot of relationships between data and a bunch
of data integrity requirements is perfectly suited to postgresql.
The PHP code follows a coding standard wherein variables are assigned
CamelCase identifiers. All of the objects persist themselves to the DB,
with a variable per column; on object initialization db columns are
read from the db and added as attributes of the object.
All of this breaks when I start to use postgresql, because all of the
attributes become lowercased.
Fixing this problem involves one of three things:
1.) rewriting all the code to have lowercased identifiers. This is
effectively renaming everything, as long camel case attributes become
much harder to read when they're lowercased. This also changes the
clients' preferred coding standard.
2.) using double quotes around all identifiers in sql statements. As
you're probably aware, there's no string format in PHP that lets you
write double quote marks unescaped (and do variable substitution), so
this involves rewriting hundreds of lines and imposing ongoing overhead
for every SQL query.
3.) escaping 4 lines in src/backend/parser/scansup.c , where
identifiers are lowercased.
I understand that the reason for lowercasing is because odbc
connections, etc expect case insensitivity, but the current behaviour
isn't an SQL standard nor is it really case insensitivity. I would love
case insensitivity with case preservation, but since that evidently is
a more complicated option, I would like to know how I can formulate the
'case preserving' option in a way to make it palatable for inclusion.
--
nothing can happen inside a sphere
that you could not inscribe upon it.
~mindlace http://mindlace.net
Hi EMF,
emf wrote:
Hello, postgresql hackers.
All of this breaks when I start to use postgresql, because all of the
attributes become lowercased.
Well, I'm afraid that something will have to change in the app. Your
email arrives while we are in the middle of a small war around the
subject of case changing, but none of the options going around at the
moment will match your request.
The two options considered are leaving things as they are (con: non
standard complient) or folding to uppercase (con: ugly and requires
transition, unfortunetly, in this order).
Fixing this problem involves one of three things:
1.) rewriting all the code to have lowercased identifiers. This is
effectively renaming everything, as long camel case attributes become
much harder to read when they're lowercased. This also changes the
clients' preferred coding standard.
Ok, so don't do it.
2.) using double quotes around all identifiers in sql statements. As
you're probably aware, there's no string format in PHP that lets you
write double quote marks unescaped (and do variable substitution), so
this involves rewriting hundreds of lines and imposing ongoing
overhead for every SQL query.
But is consistant with both SQL standard, and with current MySQL setup.
This will also make porting the app to further database, such as
commercial ones, easier. I think that is the proper way or you to go.
3.) escaping 4 lines in src/backend/parser/scansup.c , where
identifiers are lowercased.
You can do that for your own postgres installation, if you really like.
I'm not a member of the steering commitee, but I doubt you'll manage to
get such a patch accepted. Maybe as a per-session setting....
I understand that the reason for lowercasing is because odbc
connections, etc expect case insensitivity, but the current behaviour
isn't an SQL standard
Neither is yours, though. If we are going to change it, we might as well
change it to the standard, don't you think?
nor is it really case insensitivity.
It's as close as you can get without losing the benefits of doing hash
lookups.
I would love case insensitivity with case preservation, but since that
evidently is a more complicated option, I would like to know how I can
formulate the 'case preserving' option in a way to make it palatable
for inclusion.
Can't help you there, unless you want to compile Postgres yourself.
Shachar
--
Shachar Shemesh
Lingnu Open Source Consulting
http://www.lingnu.com/
On Apr 24, 2004, at 00:48, Tom Lane wrote:
You do realize that any such patch would be at least a thousand times
larger than that?
I am coming from a state of ignorance past the fact that commenting out
four lines of code appeared to create the behaviour I desired. I knew
that just changing it to match the behaviour *I* wanted isn't the same
thing as making a change that could work for everyone; that's why I
asked what sort of implementation of this behaviour would be
acceptable.
And have vast repercussions on existing client code?
I don't want to impose this on anyone else, I just want a postgresql
that doesn't mangle my case, as case carries meaning in my application.
From what I've seen online, other people migrating away from MySQL
would like this behaviour to be an option as well.
I'm willing to debate this, but not with people who claim it's a
four-line change. Do some research.
You are welcome to not pay attention to what I have to say; I will
probably never be deeply involved in the PostgreSQL codebase.
I am willing to do more work to make this option that is very useful to
me more widely acceptable.
--
Living on earth and in space are the same class of
problem. In one, the environment is harshly inimical to
humans: in the other, the inverse is true.
~mindlace http://mindlace.net
Import Notes
Reply to msg id not found: 7629.1082782126@sss.pgh.pa.us
On Apr 24, 2004, at 11:17, Jan Wieck wrote:
I don't think that we will break backward compatibility for existing
PostgreSQL specific code in order to gain CamelCase+MySQL porting ease
by adopting an even less standard compliant behaviour than we
currently have.
I understand and agree that breaking backward compatibility is not an
option.
As things are today, we are case insensitive for unquoted identifiers
and breaking that is not an option.
What you do to unquoted identifiers is not case insensitivity, but
lowercase folding.
I see a chance for getting your desired behaviour, case preservation,
only as a side effect if a larger move towards the standard.
That sounds great! I'd like to help if I can.
This would not be a simple per postmaster config option or even a
compiletime setting, but rather a per database option in the
pg_database system catalog, chosen at CREATE DATABASE time.
This also sounds good, but with my vast ignorance of postgresql, I have
no idea the proper way to tell scansup.c to knock it off (or fold up,
or fold down) based on something in the pg_database system catalog.
The real problem with this is that it has far greater side effects
than you seem to imagine yet.
[snip described problem] I am probably still not understanding, since
if the internals always quote in their queries, it would seem that the
internals could continue to use lowercase identifiers regardless of the
DB setting.
I am certain that most of us are open for a more complete proposal
that includes moving towards the ANSI standard, but the change you
outlined below is not acceptable.
I understand that. I am willing to do work to make a more complete
proposal, but I would appreciate some guidance as to how to code
something that would be more acceptable. I read in another thread that
the stuff going on in scansup.c isn't allowed/shouldn't talk to the
database, so I freely admit I don't know how to approach a palatable
solution.
Worst case scenario, I'm content with keeping my hacked version of
postgresql so that I can get this application ported faster. Making
that happen fast and taking advantage of postgresql's superior features
will help convince my client that the thousands of dollars he's
spending in this port were worthwhile ... once I've done that,
gradually transitioning to case-insensitive identifiers is possible,
but right now all he sees is big transition pain for gain he hasn't
seen yet.
--
nothing can happen inside a sphere
that you could not inscribe upon it.
~mindlace http://mindlace.net
Import Notes
Reply to msg id not found: 408A84F1.8020803@Yahoo.com