(another ;-)) PostgreSQL-derived project ...
~
I have been searching for a PostgreSQL-derived project with a
"less-is-best" philosophy. Even though I have read about quite a few
PG forks out there, what I have in mind is more like a baseline
than a fork.
~
My intention is not to wrap the same thing in a different package or
to code add-ons/value-added features on top of PG, but to rid PG of
quite a bit of its internal capabilities and use just its very
baseline.
~
All I would need PG for is raw data warehousing, memory and
I/O-subsystem management, MVCC/transaction management ... No fanciness
whatsoever. Why do you need to, say, format dates in the database if
formatting/pretty-printing and internationalization can be taken care
of more appropriately in the calling environment, say Python or Java?
All that is needed is to store a long representing the date. Why are
arrays needed in the DB proper when serialization and
marshaling/casting can be taken care of in the calling environment?
If you are using, say, Java, all you need PG to do is faithfully
store a sequence of bytes, and you would do the (de)serialization
very naturally indeed.
~
There used to be a postgresql-base-<version> package with the bare
minimum of source code to build and run PostgreSQL, which I think would
be a good starting point, but I can't find it on the mirror sites
anymore:
~
http://wwwmaster.postgresql.org/download/mirrors-ftp
~
Where can I find it?
~
I know the result will not be an SQL-compliant DBMS anymore, yet I
wonder how much faster SQL plus client code doing such things as
formatting "on the fly" would work.
~
Do you know of such tests even in a regular PG installation?
~
Do you see any usefulness in such a project?
~
Do you know of such a project? Anyone interested? Any suggestions for
someone embarking on it?
~
It would be great if the PG developers saw any good in it and did it themselves ;-)
~
lbrtchx
~
Doesn't Yahoo! have some super-modified, mega-high-performance version
of Postgres? Last I heard (which was like 2008) they were planning on
putting it online. I think it involved a column-oriented table
format or something. Did this ever happen?
On Sep 24, 2011, at 22:54, Albretch Mueller <lbrtchx@gmail.com> wrote:
Do you see any usefulness in such a project?
~
Do you know of such a project? Anyone interested? Any suggestions to
someone embarking in it?
~
It would be great if PG developers see any good in it and do it themselves ;-)
~
lbrtchx
I can't tell if you mean this as a humorous post or if you just have something in your eye ;-)
I cannot imagine you would benefit that much by removing these capabilities compared to simply ignoring them.
As a developer I still have to deal with dates and arrays, so while PostgreSQL could do less, the work still has to be done. I am happier having a group of programmers more skilled than myself doing it instead of me. Plus, by having it in the DB I avoid considerable overhead and can now use those features within my SQL statements/queries.
I can see where adding complexity could weigh into a do/ignore decision but once it's in and tested why would you want to remove a feature? If it had serious performance implications maybe, but even then arguing for a runtime enable/disable flag would be best so everyone could decide based upon their unique circumstances.
Not a project developer, but unless and until you can identify meaningful areas of performance degradation, you are simply guessing that, in simply being feature-rich, PostgreSQL has sub-optimal performance. If most/all of the overhead is in areas that you deem critical/core, then nothing you would do would have a meaningful impact; and improving the core areas would improve not only your own situation but the core project as well.
"Premature optimization is the root of all evil." (someone not me)
David J.
Based on your description, I suggest you might want to look at SQLite. It
provides a number of compile-time options where you can exclude various features
you don't want from the binary, when simply ignoring the extra features isn't
good enough. -- Darren Duncan
Maybe you could let us in a little more on what you're trying to accomplish. What
it looks like to me right now is that you're looking for a non-SQL-compliant SQL
database where a lot of the data integrity is actually coded in the
application :-)
I bet there are database systems out there that do exactly or nearly what you
want. Maybe an object-oriented one would suit you better than a relational
system?
For me, I sure don't use all that postgresql has to offer, but I like that it
does a lot of things for me and I code most of what my application does inside
the database using views, stored procedures and triggers. That approach strips
down on application complexity. My apps don't have to do any post-processing
of the data - I query the records I need and the app merely displays them.
Yes, sometimes I wish postgresql was more "high performance" - but then, I
drive an old, paid-for, practical car and not a Formula One racer without a
boot or spare tire. My point being: postgresql does what it does very reliably
and although not the best performer on the market, it is a database I would
trust my payroll with - and there are few where I'd make that statement.
Never had any data loss ever, never had it crash on me. Give good hardware to
postgresql and you will get good performance with exceptional stability and
integrity.
Uwe
~
Well, at least I thought you would tell me where the postgresql-base
is to be found. The last version I found is:
~
http://freebsd.csie.nctu.edu.tw/pub/distfiles/postgresql/postgresql-base-8.3beta2.tar.bz2
~
and I wondered what that is and why there are no postgresql-base
packages after "8.3beta2"
~
I cannot imagine you would benefit that much by removing these capabilities compared to simply ignoring them.
Plus, by having it in the DB I avoid considerable overhead
~
Can you or do you know of anyone who has made those kinds of
imaginations falsifiable?
~
... and can now use those features within my SQL statements/queries.
~
For what exactly? Isn't a comparison on 4 numeric bytes (1 (or 1/2)
word in modern hardware) more efficient than comparing sequences of
string characters?
~
simply guessing that in simply being feature rich PostgreSQL has sub-optimal performance
~
I never said that
~
... you might want to look at SQLite. It provides a number of compile-time options where you can exclude various features you don't want from the binary
~
I couldn't find the compile options you mentioned: sqlite.org/
{faq.html, custombuild.html, docs.html}
~
... you're looking for a non-sql compliant SQL database where a lot of the data integrity is actually coded in the application :-)
~
First part of your statement I acknowledged, but how is it exactly
that "a lot of the data integrity is actually coded in the application"?
~
That approach strips down on application complexity. My apps don't have to do any post-processing
of the data - I query the records I need and the app merely displays them.
~
Again have you actually tested those assumptions?
~
My point being: postgresql does what it does very reliably
~
I never said otherwise
~
lbrtchx
On Sep 25, 2011, at 2:11, Albretch Mueller <lbrtchx@gmail.com> wrote:
~
Well, at least I thought you would tell me where the postgresql-base
is to be found. The last version I found is:
~
http://freebsd.csie.nctu.edu.tw/pub/distfiles/postgresql/postgresql-base-8.3beta2.tar.bz2
~
and I wondered what that is and why there are no postgresql-base
after "8.3beta2"
~
I cannot imagine you would benefit that much by removing these capabilities compared to simply ignoring them.
Plus, by having it in the DB I avoid considerable overhead
~
Can you or do you know of anyone who has made those kinds of
imaginations falsifiable?
No; not worth my effort.
~
... and can now use those features within my SQL statements/queries.
~
For what exactly? Isn't a comparison on 4 numeric bytes (1 (or 1/2)
word in modern hardware) more efficient than comparing sequences of
string characters?
The ARRAY_AGG() function in particular has been very useful in queries I write.
~
simply guessing that in simply being feature rich PostgreSQL has sub-optimal performance
~
I never said that
Your whole post implies this; otherwise there is no meaningful reason to look for something excluding features (assuming proper and correct implementation).
That approach strips down on application complexity. My apps don't have to do any post-processing of the data - I query the records I need and the app merely displays them.
~
Again have you actually tested those assumptions?
Is this the best response you can come up with? The crux of the counter-argument is that by having PostgreSQL handle 'advanced' features application code avoids the need to do so. The principle of code-reuse and the fact the features are executed by the same program holding the data make this a de-facto truth (and yes, one that we are probably taking for granted). But, if you really feel a bare-bones implementation of PostgreSQL is worthwhile you are the one that needs to test (and state explicitly) your own underlying assumptions to see whether they hold and thus make such an endeavor worthwhile.
David J.
On 25 Sep 2011, at 8:11, Albretch Mueller wrote:
... and can now use those features within my SQL statements/queries.
~
For what exactly? Isn't a comparison on 4 numeric bytes (1 (or 1/2)
word in modern hardware) more efficient than comparing sequences of
string characters?
Data types aren't stored in the database as character strings (unless you define your columns as text, of course).
When data in the database gets compared to data in a query (for example, when you use a WHERE clause that compares a date column to a given date), the data in the query gets transformed to the appropriate type (text to date, in this case) - just once. That's efficient enough that the difference in performance between a numerical value and the string representation doesn't matter.
I don't know what you're trying to say in the above, but you seem to base your hypothesis on wrong assumptions.
Alban Hertroys
--
The scale of a problem often equals the size of an ego.
On Sep 25, 2011, at 2:11 AM, Albretch Mueller wrote:
For what exactly? Isn't a comparison on 4 numeric bytes (1 (or 1/2)
word in modern hardware) more efficient than comparing sequences of
string characters?
What on earth makes you think the db engine compares numbers as strings???
--
Scott Ribe
scott_ribe@elevated-dev.com
http://www.elevated-dev.com/
(303) 722-0567 voice
On Sun, Sep 25, 2011 at 06:11:36AM +0000, Albretch Mueller wrote:
~
Well, at least I thought you would tell me where the postgresql-base
is to be found. The last version I found is:
~
http://freebsd.csie.nctu.edu.tw/pub/distfiles/postgresql/postgresql-base-8.3beta2.tar.bz2
Notwithstanding the rest of your post, I'm surprised you missed the
website:
http://www.postgresql.org/download/
There's a source code link, as well as several others.
Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
He who writes carelessly confesses thereby at the very outset that he does
not attach much importance to his own thoughts.
-- Arthur Schopenhauer
Albretch Mueller <lbrtchx@gmail.com> writes:
Well, at least I thought you would tell me where the postgresql-base
is to be found. The last version I found is:
http://freebsd.csie.nctu.edu.tw/pub/distfiles/postgresql/postgresql-base-8.3beta2.tar.bz2
and I wondered what that is and why there are no postgresql-base
after "8.3beta2"
We stopped bothering because the split tarballs weren't really good for
anything separately. They were never independently buildable pieces,
and were only meant to ease downloading the distribution over unreliable
internet connections. That concern became obsolete some years ago. The
only part of the PG distribution that's ever been meant to be separately
buildable is libpq and some of the client-side tools. If you want to
start stripping down the server, you're on your own.
Now, having said that, there has been some interest in pushing
lesser-used chunks like the geometric datatypes out into extensions.
I don't see how that's going to result in any significant performance
gain, though.
regards, tom lane
... you're looking for a non-sql compliant SQL database where a lot of
the data integrity is actually coded in the application :-)
~
First part of your statement I acknowledged, but how is it exactly
that "a lot of the data integrity is actually coded in the application"?
~
Take dates for example: you'd have to code very carefully to catch all the
different ways dates are represented on this planet. Your application has to
handle this since all the database knows at this point is an absolute time
(i.e. seconds since epoch or something the like) and your app has to convert
every occurrence of a date to this format or the database won't find anything
or even worse store something wrong.
Same goes for numbers: if everything is stored in a byte sequence, how does
the database know you are trying to compare the number 1 with the character "1"?
Again, your app would have to handle that.
To me, that's moving data integrity into the application.
That approach strips down on application complexity. My apps don't have
to do any post-processingof the data - I query the records I need and the app merely displays them.
~
Again have you actually tested those assumptions?
Yes, I have, as have many others. Simple example: program a website like, say,
Facebook. So you have thousands of users from all over the world. Your website
code handles all the data conversions. Now Apple comes along and sells an
iPhone which, silly enough, a lot of people like and try to use to access your
website. You now face the problem that you need a second website doing the
same thing as the first website except solely made for touch-screen devices.
You will be forced to rewrite a lot of your code because all the data
conversion is in the code. Even worse, if you had to make an iPhone or
Android app in lieu of the second website, you'd have to recode everything you
did in a different language - e.g. Objective-C.
If you leave these things to the database, you "simply" write a second client
for a different platform and you don't have to fuzz around to get the
conversions correct because the application receives the data already
converted.
Sure this all depends on what application you need this specialized database
engine for. If it's an application for a very defined environment you can
dictate how data is to be input and train users. If it's an application for
the big wild world you will have problems with users doing stupid things
beyond your control, like writing "P.O. Box 1" into the zipcode field where you
expected a 10-digit number. I'd rather have the database catch those cases and
reject storing the bad input. That saves me a lot of validation code in my
app.
On Sat, Sep 24, 2011 at 11:11 PM, Albretch Mueller <lbrtchx@gmail.com> wrote:
I cannot imagine you would benefit that much by removing these capabilities compared to simply ignoring them.
Plus, by having it in the DB I avoid considerable overhead
~
Can you or do you know of anyone who has made those kinds of
imaginations falsifiable?
~
My own experience here is that while it is generally possible to
create additional overhead by mis-use of advanced features, *in
general* you save more overhead and get clearer code by pushing what
you can into the database within reason.
... and can now use those features within my SQL statements/queries.
~
For what exactly? Isn't a comparison on 4 numeric bytes (1 (or 1/2)
word in modern hardware) more efficient than comparing sequences of
string characters?
~
simply guessing that in simply being feature rich PostgreSQL has sub-optimal performance
~
I never said that
~
... you might want to look at SQLite. It provides a number of compile-time options where you can exclude various features you don't want from the binary
~
I couldn't find the compile options you mentioned: sqlite.org/
{faq.html, custombuild.html, docs.html}
~
... you're looking for a non-sql compliant SQL database where a lot of the data integrity is actually coded in the application :-)
~
First part of your statement I acknowledged, but how is it exactly
that "a lot of the data integrity is actually coded in the application"?
I can give you a good example. Some years ago, I was working on an
accounting application someone else wrote which stored all monetary
values as double-precision floats and then handled arbitrary-precision
math in the front-end of the application. This meant:
1) To detect if an invoice was closed, it would retrieve all GL lines
associated with the invoice and an AR/AP account and see if these
totalled to 0 in the middleware. This performed OK for a small
database, but for a large one it didn't work so well... Had the
application used NUMERIC types, this could have been more easily done
with a HAVING clause, and far more efficiently
on the db server.
2) It made the application relatively sensitive to rounding errors---
sum() with group by would return different numbers with different
groupings in sufficiently large databases.
So here you get a case where the application was made less robust and
performed quite a bit worse by not using arbitrary math capabilities
of PostgreSQL.
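The rounding hazard described above is easy to reproduce. Here is a minimal Java sketch (the amounts are made up) showing that sums of doubles depend on how you group them, while BigDecimal, the client-side analogue of SQL NUMERIC, does not:

```java
import java.math.BigDecimal;

public class RoundingDemo {
    public static void main(String[] args) {
        // Summing the same monetary amounts in two different groupings:
        // with binary doubles, the grouping changes the total.
        double a = (0.10 + 0.20) + 0.30;   // 0.6000000000000001
        double b = 0.10 + (0.20 + 0.30);   // 0.6
        System.out.println(a == b);        // prints "false"

        // BigDecimal keeps exact decimal values, so grouping is irrelevant.
        BigDecimal x = new BigDecimal("0.10").add(new BigDecimal("0.20"))
                                             .add(new BigDecimal("0.30"));
        BigDecimal y = new BigDecimal("0.10").add(new BigDecimal("0.20")
                                             .add(new BigDecimal("0.30")));
        System.out.println(x.compareTo(y) == 0);  // prints "true"
    }
}
```

This is exactly why SUM() with GROUP BY over floats can return different totals for different groupings, as noted in point 2) above.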
~
That approach strips down on application complexity. My apps don't have to do any post-processing
of the data - I query the records I need and the app merely displays them.
~
Again have you actually tested those assumptions?
In general my experience is that it is far easier to tune performance
of an app as is described here (where all presentation is done in db)
than it is an app where a lot of it is done in middle-ware or
front-end.
For example, consider the following: I need to determine all of the
years that have dates in a database table with, say, 50M records. If
I have a database query which does this all at once, when it performs
badly, I can tune it, and there are fewer tradeoffs I have to make.
~
My point being: postgresql does what it does very reliably
~
I never said otherwise
I'd add it performs remarkably well IMHO as well as reliably.
Best Wishes,
Chris Travers
On 9/25/11, David Johnston <polobo@yahoo.com> wrote:
On Sep 25, 2011, at 2:11, Albretch Mueller <lbrtchx@gmail.com> wrote:
Can you or do you know of anyone who has made those kinds of
imaginations falsifiable?
No; not worth my effort.
~
;-)
~
That approach strips down on application complexity. My apps don't have
to do any post-processing of the data - I query the records I need and
the app merely displays them.
~
Again have you actually tested those assumptions?
Is this the best response you can come up with? The crux of the
counter-argument is that by having PostgreSQL handle 'advanced' features
application code avoids the need to do so. The principle of code-reuse and
the fact the features are executed by the same program holding the data make
this a de-facto truth (and yes, one that we are probably taking for
granted).
~
First, I wonder what kind of technical person would say there are
"de-facto truth(s)". I thought only politicians would talk like that.
Now, in a sense you are right, I am talking from the background of my
own experiences (and so are you). When I have developed relatively
complicated and heavily accessed websites I only use a DBMS when I need
to actually persist data. For example, there are options in Tomcat
(the Java-based web server) to offload session handling to a DBMS
(which is great when you need to statistically trace and infer users'
navigation) and establish transactions offloading a timed-out session
to an actual database hitting the hard drive (some user got
distracted ...), and that sounds great, but anything dealing with
I/O will invariably slow your apps, so what I do is use an in-memory
(lite) DBMS such as Hypersonic SQL (hsqldb.org) to keep sessions off
the I/O subsystem, and the speed increase is --very-- noticeable
~
Since any piece of code engaging the I/O subsystem, such as database
access code, should be as fast and simple as possible, yes, I would
design, say, Java code wrappers doing anything that is not strictly
INSERTing and SELECTing raw data ... let me deal with the "semantics"
and "business intelligence" of the data myself
~
But, if you really feel a bare-bones implementation of PostgreSQL
is worthwhile you are the one that needs to test (and state explicitly) your
own underlying assumptions to see whether they hold and thus make such an
endeavor worthwhile.
~
you are right and I am up to it
~
On 9/25/11, Alban Hertroys <haramrae@gmail.com> wrote:
... and can now use those features within my SQL statements/queries.
~
For what exactly? Isn't a comparison on 4 numeric bytes (1 (or 1/2)
word in modern hardware) more efficient than comparing sequences of
string characters?
Data types aren't stored in the database as character strings (unless you
define your columns as text, of course).
~
I was talking about text and any formatting option in NUMERIC or DATE data
~
On 9/25/11, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Albretch Mueller <lbrtchx@gmail.com> writes:
Well, at least I thought you would tell me where the postgresql-base
is to be found. The last version I found is:
http://freebsd.csie.nctu.edu.tw/pub/distfiles/postgresql/postgresql-base-8.3beta2.tar.bz2
and I wondered what that is and why there are no postgresql-base
after "8.3beta2"
We stopped bothering because the split tarballs weren't really good for
anything separately ...
~
Thank you for answering my question
~
Now, having said that, there has been some interest in pushing
lesser-used chunks like the geometric datatypes out into extensions ...
~
Yes, that is it, extensions! It would be great if PG could be built
in a more modular way, with just the features you need. Clusters of
dependencies would have to be checked ... I think that would enrich PG
development
~
On 9/25/11, Uwe Schroeder <uwe@oss4u.com> wrote:
... you're looking for a non-sql compliant SQL database where a lot of
the data integrity is actually coded in the application :-)
~
First part of your statement I acknowledged, but how is it exactly
that "a lot of the data integrity is actually coded in the application"?
~
Take dates for example: you'd have to code very carefully to catch all the
different ways dates are represented on this planet.
~
Yeah! And Java does an exceptionally good job at that (including
internationalization)
~
http://download.oracle.com/javase/7/docs/api/java/ {util/Date.html,
text/DateFormat.html}
~
So, you would ultimately just have to store a long value into the DBMS
~
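As a concrete sketch of that idea, the DB would persist only a long (milliseconds since the epoch) and the Java client would do all locale-specific rendering; the stored value and the two locales below are arbitrary examples:

```java
import java.text.DateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

public class DateAsLong {
    public static void main(String[] args) {
        // The only thing the stripped-down DB would store:
        long stored = 1316901600000L;  // Sep 24, 2011 22:00 UTC

        Date d = new Date(stored);
        DateFormat us = DateFormat.getDateInstance(DateFormat.LONG, Locale.US);
        DateFormat de = DateFormat.getDateInstance(DateFormat.LONG, Locale.GERMANY);
        us.setTimeZone(TimeZone.getTimeZone("UTC"));
        de.setTimeZone(TimeZone.getTimeZone("UTC"));

        // Same long, two locale-specific renderings done entirely client-side.
        System.out.println(us.format(d));
        System.out.println(de.format(d));
    }
}
```

(The exact strings printed depend on the JDK's locale data; the point is that one stored long yields different per-locale representations.)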
I am amazed to read that you/the PG community were still running
regression tests in ASCII
http://www.postgresql.org/docs/9.1/static/release-9-1.html
(E.1.3.12. Source Code)
* Run regression tests
(postgresql.org/docs/9.1/static/install-procedure.html#BUILD) in the
default encoding (Peter Eisentraut)
Regression tests were previously always run with SQL_ASCII encoding.
~
Same goes for numbers: if everything is stored in a byte sequence, how does
the database know you try to compare the number 1 with the character "1"?
~
This is something I would do with wrapping code using input- and
output-bound command objects which are indexed by the same column
index the DBMS uses
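A minimal sketch of such a per-column binder (all names here are hypothetical, not an existing API): each column index owns a binder that fixes the type, so the number 1 and the character "1" can never be confused at the storage layer.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

public class ColumnBinders {
    // One binder per column index; it converts a client value into the
    // raw bytes the stripped-down store would persist.
    interface Binder { byte[] toStored(Object v); }

    public static void main(String[] args) {
        Map<Integer, Binder> binders = new HashMap<>();
        // Column 0 is numeric: a 4-byte big-endian int.
        binders.put(0, v -> ByteBuffer.allocate(4).putInt((Integer) v).array());
        // Column 1 is text: UTF-8 bytes.
        binders.put(1, v -> ((String) v).getBytes(StandardCharsets.UTF_8));

        byte[] asNumber = binders.get(0).toStored(1);    // 00 00 00 01
        byte[] asText   = binders.get(1).toStored("1");  // 0x31
        System.out.println(asNumber.length);  // prints 4
        System.out.println(asText.length);    // prints 1
    }
}
```

The type discrimination Uwe asks about lives entirely in the binder table, which is the "moving data integrity into the application" he describes.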
~
To me, that's moving data integrity into the application.
~
Not exactly! Integrity is still a matter for the DBMS, which can now
handle it in an easier way: if someone writes a date in Bangla and
somebody else in Ukrainian, it is still the same date/time value
ultimately determined by the rotation of our planet around the sun ...
and all we need for that is a long value. Now, aren't we easing
things for the DBMS?
~
That approach strips down on application complexity. My apps don't have
to do any post-processing of the data - I query the records I need and
the app merely displays them.
~
Again have you actually tested those assumptions?
Yes, I have, as have many others. Simple example: program a website like, say,
Facebook. So you have thousands of users from all over the world. Your website
code handles all the data conversions. Now Apple comes along and sells an
iPhone which, silly enough, a lot of people like and try to use to access your
website. You now face the problem that you need a second website doing the
same thing as the first website except solely made for touch-screen devices.
You will be forced to rewrite a lot of your code because all the data
conversion is in the code.
~
Well, the code you will have to write either way, regardless of where
you keep it, and in order to not even have to restart the application
server cold I would code command objects (like function pointers in C)
to handle those cases. It is ultimately strings of characters you
are dealing with
~
Sure this all depends on what application you need this specialized database
engine for. If it's an application for a very defined environment you can
dictate how data is to be input and train users. If it's an application for
the big wild world you will have problems with users doing stupid things
beyond your control, like writing "P.O. Box 1" into the zipcode field where
you expected a 10-digit number. I'd rather have the database catch those
cases and reject storing the bad input.
~
You see, this is just an implementation issue. I would rather try to do
validation as close to the client as possible and do it in an
incremental way if necessary
~
On 9/25/11, Chris Travers <chris.travers@gmail.com> wrote:
...
So here you get a case where the application was made less robust and
performed quite a bit worse by not using arbitrary math capabilities
of PostgreSQL.
~
this example does not represent what we have been discussing here,
and rounding/decimal representation is something base-2 arithmetic was
not designed for anyway
~
thank you, and I think I will go for it anyway. Now (and this is a
question for developers), since refactoring code is something that has
crossed your collective/cultural mind, is there a way I could do my
own perusing of the code that you would find useful?
~
lbrtchx
On Sun, Sep 25, 2011 at 09:41:19PM +0000, Albretch Mueller wrote:
I am amazed to read that you/the PG community were still running regression tests
*in ASCII*:
http://www.postgresql.org/docs/9.1/static/release-9-1.html (E.1.3.12. Source Code)
* Run regression tests (postgresql.org/docs/9.1/static/install-procedure.html#BUILD) in the default encoding (Peter Eisentraut)
Regression tests were previously always run *with SQL_ASCII* encoding.
Quite obviously you have got no clue and didn't bother
checking either.
Karsten
--
GPG key ID E4071346 @ gpg-keyserver.de
E167 67FD A291 2BEA 73BD 4537 78B9 A9F9 E407 1346
First, I wonder what kind of technical person would say there are
"de-facto truth(s)". I thought only politicians would talk like that.
Well, politicians and Microsoft, Oracle etc. :-)
Now, in a sense you are right, I am talking from the background of my
own experiences (and so are you). When I have developed relatively
complicated and heavily accessed websites I only use a DBMS when I need
to actually persist any data. For example, there are options in Tomcat
(the java-based web server) to offload session handling to a DBMS
(which is great when you need to statistically trace and infer users'
navigation) and to establish transactions offloading a timed-out session
to an actual database hitting the hard drive (some user got
distracted ...), and that sounds great, but anything dealing with
I/O will invariably slow your apps, so what I do is use an in-memory (lite)
DBMS such as Hypersonic SQL (hsqldb.org) to keep sessions off the I/O
subsystem, and the speed increase is --very-- noticeable
~
Since any piece of code engaging the I/O subsystem, such as database
access code, should be as fast and simple as possible: yes, I would
design, say, java code wrappers doing anything that is not strictly
INSERTing and SELECTing raw data ... let me deal with the "semantics"
and "business intelligence" of the data myself
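A minimal sketch (all names here are hypothetical, not from any existing library) of the kind of wrapper described above: per-column "command objects" handle the (de)serialization on the client side, so the DBMS only ever stores and returns raw long values.

```java
import java.util.HashMap;
import java.util.Map;

public class ColumnCodecs {
    // a per-column "command object": converts between a rich client-side
    // type and the raw long the DBMS actually stores
    interface Codec<T> {
        long toRaw(T value);   // what gets INSERTed
        T fromRaw(long raw);   // what SELECTed values are turned into
    }

    // codecs indexed by the same column index the DBMS uses
    private final Map<Integer, Codec<?>> byColumn = new HashMap<>();

    void register(int columnIndex, Codec<?> codec) {
        byColumn.put(columnIndex, codec);
    }

    @SuppressWarnings("unchecked")
    <T> Codec<T> forColumn(int columnIndex) {
        return (Codec<T>) byColumn.get(columnIndex);
    }

    public static void main(String[] args) {
        ColumnCodecs codecs = new ColumnCodecs();
        // say column 3 holds a date, stored as epoch milliseconds
        codecs.register(3, new Codec<java.util.Date>() {
            public long toRaw(java.util.Date d) { return d.getTime(); }
            public java.util.Date fromRaw(long raw) { return new java.util.Date(raw); }
        });
        Codec<java.util.Date> c = codecs.forColumn(3);
        java.util.Date now = new java.util.Date();
        // round-trip: what we store is exactly what we get back
        System.out.println(c.fromRaw(c.toRaw(now)).equals(now));
    }
}
```

Swapping the codec registered for a column changes the conversion without touching the storage layer, which is the point of keeping it out of the DBMS.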
So you're keeping a lot in memory, which to me suggests plenty of hardware is
available. One of my current apps chews up 8Gb of memory just for the app and
I can't afford to get a 64Gb or more server. If I wanted to keep permanently
accessed data in memory, I'd need somewhere around 1/2 a terabyte of memory -
so obviously not an option (or maybe really bad database design :-) )
That said, just considering the cost/effort it takes to strip Postgresql down,
why don't you go with a server that has 1TB of solid state discs? That strips
down the I/O bottleneck considerably without any effort.
Data types aren't stored in the database as character strings (unless you
define your columns as text, of course).
~
I was talking about text, and about any formatting options in NUMERIC or DATE data
In my experience "data formatting" goes both ways, in and out. Out is
obviously not a major issue because errors don't cause data corruption. In,
however, is a different issue. Errors in "inwards" conversion will cause data
corruption. So unless you have an application that does very little "in" and a
lot of "out", you still have to code a lot of data conversion which otherwise
someone else (the postgresql developers) have already done for you.
Take dates for example: you'd have to code very carefully to catch all
the different ways dates are represented on this planet.
~
Yeah! And java does an exceptionally good job at that (including
internationalization)
Maybe it does. I never coded Java because I don't like to use technology where
Oracle can come sue me :-) I do know however that a lot of languages have
quirks with dates and internationalization (python you mentioned earlier being
one of them)
http://download.oracle.com/javase/7/docs/api/java/ {util/Date.html,
text/DateFormat.html}
~
So, you would ultimately just have to store a long value into the DBMS
Yes, a long value - which can represent pretty much any valid and invalid date
ever devised, so again you don't really know what's in the database when you
leave the validation to the application.
This is something I would do with wrapping code using input and
output bound command objects which are indexed by the same column
index the DBMS uses
Which still depends on your use case. Your assumption is that every piece of
code is coded in Java - which is fine if that's what your application calls
for. It's going to be a major hassle when you ever have to re-code in a
different language though.
To me, that's moving data integrity into the application.
~
Not exactly! Integrity is still a matter of the DBMS, which can now
handle it in an easier way: if someone writes a date in Bangla and
somebody else in Ukrainian, it is still the same date/time value
ultimately determined by the rotation of our planet around the sun ...
and all we need for that is a long value. Now, aren't we easing
things for the DBMS?
I agree to disagree on this one. The date value the database stores in this
case is a long. Any "long" can be converted into a valid date - but is it
really the date that was entered in the first place? If I give a date
representation, i.e. 12/31/2010 to the database, I personally don't really
care how the database stores the date underneath. All that interests me is
that the next time I ask for that field I get 12/31/2010 back. There is no
error that can be made other than user error if you ask the database to store
a specific date representation. There are errors you can make in your own
conversion code which can lead to a different "long" stored than intended. So
again data integrity is at least partially in the application and not the
database.
Yes, I have, as have many others. Simple example: program a website
like, say
Facebook. So you have thousands of users from all over the world. Your
website
code handles all the data conversions. Now Apple comes along and sells an
iPhone which silly enough a lot of people like and try to use to access
your website. You now face the problem that you need a second website
doing the same thing as the first website except solely made for
touch-screen devices. You will be forced to rewrite a lot of your code
because all the data conversion is in the code.
~
Well, the code you will have to write either way, regardless of where
you keep it, and in order to not even have to cold-restart the application
server I would code command objects (like function pointers in C)
to handle those cases. It is ultimately strings of characters you
are dealing with
With the right design, you will have to rewrite the visual layer, not the
application logic. Errors in the visual layer are of little consequence
(except disgruntled users). So yes, if you use some kind of middleware that
does all the converting and validating for you, the difference is negligible.
But then, why write your own when the database already provides that
functionality?
You see, this is just an implementation issue. I would rather try to do
validation as close to the client as possible and do it in an
incremental way if necessary
This one I can agree on. My background is government and financial industry and
neither use a system with client side validation - at least not that I have
seen. Actually I've seen military systems which handle 98% of the
"application" inside the database and just import/export text files without any
GUI whatsoever.
As I've said earlier, it all depends on what you're trying to do, what the
requirements are and how sensitive the data is. In my way of thinking - being
a bit paranoid as it is - I'd rather have nothing in memory and everything on at
least 10 different computer systems spread out over the planet, just so that
one of the systems survives the next asteroid impact with all my data intact
:-) (i.e. one of my webservers doesn't even have the session id in memory -
everything is in postgresql and replicated to different servers. Given, a big
I/O hog because there's multiple records for every page ... but any cleaning
woman can pull the plug and nothing will happen because the hot standbys
someplace else will simply take over - people don't even have to log in again,
the session is still valid on the standby)
On 09/25/11 7:48 PM, Uwe Schroeder wrote:
Which still depends on your use case. Your assumption is that every piece of
code is coded in Java - which is fine if that's what your application calls
for. It's going to be a major hassle when you ever have to re-code in a
different language though.
it's the old hammer and nail thing [1]. a pure Java programmer wants to
see everything in Java as it's the tool he knows.
[1]: If your only tool is a hammer, the whole world looks like a nail.
--
john r pierce N 37, W 122
santa cruz ca mid-left coast
On 9/25/11, Karsten Hilbert <Karsten.Hilbert@gmx.net> wrote:
On Sun, Sep 25, 2011 at 09:41:19PM +0000, Albretch Mueller wrote:
I am amazed to read that you/the PG community were still running
regression tests *in ASCII*:
http://www.postgresql.org/docs/9.1/static/release-9-1.html (E.1.3.12.
Source Code)
* Run regression tests
(postgresql.org/docs/9.1/static/install-procedure.html#BUILD) in the
default encoding (Peter Eisentraut)
Regression tests were previously always run *with SQL_ASCII* encoding.
Quite obviously you have got no clue and didn't bother
checking either.
~
Karsten, you are right to some extent, but this is what it plainly
says, and I can't keep a mental map of PG's code base and
culture.
~
Now, a more insufferable one, when I said:
~
... if someone writes a date in Bangla and somebody else in Ukrainian this is still the same date/time value ultimately determined by the rotation of our planet around the sun
~
I meant by the rotation of our planet around its own axis, and
even if it is primary school knowledge, as a physicist I know the
difference more than well enough, even though the revolution of our
planet around the sun does influence what we nominally consider to be
the wall clock time (daylight savings ...)
~
lbrtchx
On 9/26/11, Uwe Schroeder <uwe@oss4u.com> wrote:
In my experience "data formatting" goes both ways, in and out. Out is
obviously not a major issue because errors don't cause data corruption. In,
however, is a different issue. Errors in "inwards" conversion will cause data
corruption. So unless you have an application that does very little "in" and a
lot of "out", you still have to code a lot of data conversion which otherwise
someone else (the postgresql developers) have already done for you.
~
Well, yes Uwe, but any code in the DBMS would slow down its main job
a bit, which (to me) is storing data and keeping it consistent as fast
as possible
~
Take dates for example: you'd have to code very carefully to catch all
the different ways dates are represented on this planet.
~
Yeah! And java does an exceptionally good job at that (including
internationalization)
~
Maybe it does. I never coded Java because I don't like to use technology where
Oracle can come sue me :-)
~
Yeah! I felt like sh!t when I heard that Oracle had bought Sun, a
-slightly- more open company, at least they like to keep that
perception of themselves ...
~
http://download.oracle.com/javase/7/docs/api/java/ {util/Date.html,
text/DateFormat.html}
~
So, you would ultimately just have to store a long value into the DBMS
~
Yes, a long value - which can represent pretty much any valid and invalid date
ever devised, so again you don't really know what's in the database when you
leave the validation to the application.
~
Not if you get that long value through java ;-)
~
This is something I would do with wrapping code using input and
output bound command objects which are indexed by the same column
index the DBMS uses
~
Which still depends on your use case. Your assumption is that every piece of
code is coded in Java - which is fine if that's what your application calls
for. It's going to be a major hassle when you ever have to re-code in a
different language though.
~
Well, yes you are right
~
To me, that's moving data integrity into the application.
~
Not exactly! Integrity is still a matter of the DBMS, which can now
handle it in an easier way: if someone writes a date in Bangla and
somebody else in Ukrainian, it is still the same date/time value
ultimately determined by the rotation of our planet around the sun ...
and all we need for that is a long value. Now, aren't we easing
things for the DBMS?
~
I agree to disagree on this one. The date value the database stores in this
case is a long. Any "long" can be converted into a valid date - but is it
really the date that was entered in the first place?
~
Again, not if you get that long value through java; it takes care of
doing those checks for you. You could not, for example, enter 02/30/2011
as a date object in java and get a long out of it
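A quick sketch of that claim: java.text.SimpleDateFormat with lenient parsing turned off rejects an impossible date like 02/30/2011 outright, while a valid date yields the epoch-milliseconds long you would hand to the DBMS (the UTC timezone is set only to make the printed long deterministic).

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.TimeZone;

public class DateCheck {
    public static void main(String[] args) {
        SimpleDateFormat fmt = new SimpleDateFormat("MM/dd/yyyy");
        fmt.setLenient(false);                        // reject impossible dates instead of rolling them over
        fmt.setTimeZone(TimeZone.getTimeZone("UTC")); // fixed zone so the long is reproducible
        try {
            // a valid date parses to the long you would store
            System.out.println("stored " + fmt.parse("12/31/2010").getTime());
        } catch (ParseException e) {
            System.out.println("rejected 12/31/2010");
        }
        try {
            fmt.parse("02/30/2011");                  // February 30th does not exist
            System.out.println("stored 02/30/2011");
        } catch (ParseException e) {
            System.out.println("rejected 02/30/2011");
        }
    }
}
```

With leniency left at its default, the same parse would silently roll 02/30/2011 over to March 2nd, which is exactly the kind of corruption being debated here.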
~
If I give a date
representation, i.e. 12/31/2010 to the database, I personally don't really
care how the database stores the date underneath. All that interests me is
that the next time I ask for that field I get 12/31/2010 back.
~
But "12/31/2010" is ultimately a representation of the long you
would have inserted, which can be represented in many different
ways, depending on user preferences
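To illustrate (a sketch; the stored value is a made-up epoch-millis long, not from any real schema): java.text.DateFormat renders the very same long differently per locale.

```java
import java.text.DateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

public class LocaleDates {
    public static void main(String[] args) {
        // hypothetical value as it would come back from the DBMS:
        // 2010-12-31T00:00:00 UTC as epoch milliseconds
        long stored = 1293753600000L;
        Date d = new Date(stored);
        DateFormat us = DateFormat.getDateInstance(DateFormat.SHORT, Locale.US);
        DateFormat de = DateFormat.getDateInstance(DateFormat.SHORT, Locale.GERMANY);
        us.setTimeZone(TimeZone.getTimeZone("UTC")); // fixed zone for a deterministic example
        de.setTimeZone(TimeZone.getTimeZone("UTC"));
        // one stored long, two user-facing representations
        System.out.println(us.format(d));
        System.out.println(de.format(d));
    }
}
```

The two printed strings differ (month-first vs day-first, different separators), yet both are views of the single long in the database.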
~
There is no
error that can be made other than user error if you ask the database to store
a specific date representation. There are errors you can make in your own
conversion code which can lead to a different "long" stored than intended.
~
The conversion code will not be mine; it is part of java and, I would
guess, of python or any decent programming language
~
Well, the code you will have to write either way, regardless of where
you keep it, and in order to not even have to cold-restart the application
server I would code command objects (like function pointers in C)
to handle those cases. It is ultimately strings of characters you
are dealing with
~
With the right design, you will have to rewrite the visual layer, not the
application logic. Errors in the visual layer are of little consequence
(except disgruntled users). So yes, if you use some kind of middleware that
does all the converting and validating for you, the difference is
negligible.
But then, why write your own when the database already provides that
functionality?
~
Because DBMSs are I/O beasts and are way more likely to enter into
delayed states and concurrency conflicts, so to me, the less you
hassle them the better; for example, I never handle http sessions with
a DBMS because they are very volatile, temporary and user specific. And
also, the hardware DBMSs sit on mechanically and magnetically wears
out with usage
~
On 9/26/11, John R Pierce <pierce@hogranch.com> wrote:
it's the old hammer and nail thing [1]. a pure Java programmer wants to
see everything in Java as it's the tool he knows.
~
Well, not exactly. I am not religious about code. The most
sophisticated code I have written in my life was in FORTRAN, but the
most code I have written has been ANSI C and C++ and later
java, which I like because its HotSpot technology enables it to do
run-time optimizations (something you can't do with statically compiled
code), which can make it faster than C on web servers
~
question: which kinds of code analysis tools do you guys use?
~
lbrtchx