Predictive or scoring solution for PostgreSQL?
Hi,
Does anyone know of a predictive or database scoring solution for PostgreSQL?
I'm looking for a system that can take a table of, for example, 100,000
records with about 100 fields, in which 1,000 records have one field set to
YES.
The system should score the 100 fields to determine which of them matter
most for the 1,000 records that have the YES value.
It should then derive a formula, apply the same scoring to the rest of the
database, and estimate (the predictive part) which of the remaining 99,000
records are likely to have that field set to YES.
I hope my request is clear ... ;o)
I also hope someone has already heard of something like this and may be
able to help me ;o)
best regards,
--
Hervé
Hmmmm, it's been a while since I did this but...
This was with Sybase (it should be configurable with ODBC by now), but we used a
tool called ModelMAX (from Advanced Software Applications, or A.S.A.), which could
select a sample of records and score them on the basis of their fields (you need
some NOs as well). It produced C code that would score non-flagged records on the
basis of the new results.
Our process was to select a sample of YES/NO records and split it into two
samples. (The Yes records are actually coded as '1' and the No records as '0'.)
The No records give the system something to differentiate against.
The first and larger sample was used to generate or train the neural net. Then
the second sample (with known values) was scored using the new model, and the
known result compared with the score.
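As a rough illustration of the train-then-validate process described above, here is a minimal Python sketch. It is not what ModelMAX does internally: a plain logistic regression stands in for the neural net, the records are synthetic toy data, and every name (`train_logistic`, `make_toy_sample`, the 70/30 split) is an assumption made for the example.

```python
import math
import random


def sigmoid(z):
    # Clamp the input so math.exp cannot overflow.
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(rows, labels, epochs=200, lr=0.05):
    """Fit a logistic regression by stochastic gradient descent.

    This is a simple stand-in for the neural net mentioned in the thread.
    """
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(rows, labels):
            p = sigmoid(bias + sum(w * v for w, v in zip(weights, x)))
            err = p - y
            bias -= lr * err
            weights = [w - lr * err * v for w, v in zip(weights, x)]
    return weights, bias


def score(model, x):
    """Return the modelled probability that the flag is 1 (YES)."""
    weights, bias = model
    return sigmoid(bias + sum(w * v for w, v in zip(weights, x)))


def make_toy_sample(n, rng):
    """Synthetic records: field 0 drives the flag, field 1 is pure noise."""
    rows, labels = [], []
    for _ in range(n):
        x = [rng.random(), rng.random()]
        p_yes = 0.9 if x[0] > 0.5 else 0.1
        rows.append(x)
        labels.append(1 if rng.random() < p_yes else 0)
    return rows, labels


rng = random.Random(0)
rows, labels = make_toy_sample(400, rng)

# First, larger sample trains the model; the second is held out.
split = int(len(rows) * 0.7)
train_rows, train_labels = rows[:split], labels[:split]
hold_rows, hold_labels = rows[split:], labels[split:]

model = train_logistic(train_rows, train_labels)

# Score the held-out sample and compare with the known result.
hits = sum(
    1
    for x, y in zip(hold_rows, hold_labels)
    if (score(model, x) >= 0.5) == (y == 1)
)
holdout_accuracy = hits / len(hold_rows)
print(f"holdout accuracy: {holdout_accuracy:.2f}")
```

If the held-out accuracy is far below what the training sample suggested, the model is probably overfitted and should not be rolled out.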
Generally the score was a probability - of response or credit card application
approval or the like.
If the model is valid, the formula can be rolled out to the database.
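For PostgreSQL, one way to "roll out" a formula is to turn it into a SQL expression and let the database compute the score. The sketch below assumes the model is a logistic formula with one weight per column; the table name, column names, and weights are hypothetical, and the helper `scoring_sql` is invented for this example.

```python
def scoring_sql(table, score_column, weights, bias):
    """Build an UPDATE that writes a logistic score into score_column.

    Identifiers are interpolated as-is, so they must come from trusted
    configuration, never from user input.
    """
    terms = " + ".join(f"({w!r} * {col})" for col, w in weights.items())
    linear = f"{bias!r} + {terms}"
    return (
        f"UPDATE {table} "
        f"SET {score_column} = 1.0 / (1.0 + exp(-({linear})))"
    )


# Hypothetical table, columns, and weights, for illustration only.
sql = scoring_sql(
    "customers",
    "yes_score",
    {"field_a": 1.5, "field_b": -0.25},
    -2.0,
)
print(sql)
```

The generated statement uses only arithmetic and PostgreSQL's `exp()` function, so scoring runs inside the database without exporting the records.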
The trick is that the tool needs to understand something about the fields
available for scoring: domain and type, ranges and codings. If these are fixed,
they are a one-time setup.
Other tools do similar things - another was Knowledge Seeker from Angoss
Software - which built turnkey decision trees (this was fairly cheap depending
on the system it is running on). SAS also produced a turnkey modeling solution
(not cheap $$$$). You could also try SPSS (cheaper than SAS). Group 1 Software
also marketed an all-in-one Modeling Sol'n - Model 1 (I think) but I never
actually used it.
I'll dig around and see if I can find an article I wrote about this...
Marc A. Leith
President
redboxdata inc.
E-mail:mleith@redboxdata.com
Quoting Hervé Piedvache <footcow@noos.fr>:
Hi,
Does anyone know a predictive or a database scoring solution for PostgreSQL?
In response, Marc A. Leith wrote:
Hmmmm, it's been a while since I did this but...
Other tools do similar things - another was Knowledge Seeker from Angoss
Software - which built turnkey decision trees (this was fairly cheap depending
on the system it is running on). SAS also produced a turnkey modeling solution
(not cheap $$$$). You could also try SPSS (cheaper than SAS). Group 1 Software
also marketed an all-in-one Modeling Sol'n - Model 1 (I think) but I never
actually used it.
Would Joe Conway's PL/R procedural language be any help here? I'd guess
there's an R package to fit the bill, but then again I'm only on page 30
of Modern Applied Statistics in S-Plus. ;-)
Mike Mascari
Marc A. Leith wrote:
Other tools do similar things - another was Knowledge Seeker from Angoss
Software - which built turnkey decision trees (this was fairly cheap depending
on the system it is running on). SAS also produced a turnkey modeling solution
(not cheap $$$$). You could also try SPSS (cheaper than SAS). Group 1 Software
also marketed an all-in-one Modeling Sol'n - Model 1 (I think) but I never
actually used it.
Or try R (open source implementation of the S language, similar to
S-PLUS)...
http://www.r-project.org/
...along with PL/R:
http://www.joeconway.com/plr/
And see here for a variety of packages to do just about any kind of
analysis you can think of:
http://cran.r-project.org/
Some assembly required, but powerful and free.
HTH,
Joe
Quoting Mike Mascari <mascarm@mascari.com>:
Would Joe Conway's PL/R procedural language be any help here? I'd guess
there's an R package to fit the bill, but then again I'm only on page 30
of Modern Applied Statistics in S-Plus. ;-)
Mike Mascari
For a turnkey modeling solution, you need more than simple stat functions. These
solutions automatically transform or 'bucketize' the data and then analyze the
covariance between the score variables and the known result.
They then select a smaller number of variables and use them to build a model -
this may be done with a backpropagation neural network, a more traditional
regression model, or some sort of decision tree or CHAID system. Model 1 uses 3
or 4 approaches and selects the one with the best (truest) fit.
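To make the bucketize-then-select step concrete, here is a minimal Python sketch. It is an assumption about the general idea, not a description of how ModelMAX or Model 1 work: it uses equal-width buckets and ranks fields by absolute correlation (covariance scaled by the standard deviations) with the 0/1 flag. The field names and the toy data are hypothetical.

```python
import random
from statistics import mean


def bucketize(values, n_buckets=4):
    """Map each value to an equal-width bucket index in 0..n_buckets-1."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_buckets or 1.0  # guard against constant fields
    return [min(int((v - lo) / width), n_buckets - 1) for v in values]


def covariance(xs, ys):
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


def correlation(xs, ys):
    """Covariance scaled to [-1, 1] so fields on different scales compare."""
    sx = covariance(xs, xs) ** 0.5
    sy = covariance(ys, ys) ** 0.5
    if sx == 0 or sy == 0:
        return 0.0
    return covariance(xs, ys) / (sx * sy)


def rank_fields(table, flag, keep=2):
    """Return the `keep` fields most strongly associated with the flag."""
    strengths = {
        name: abs(correlation(bucketize(column), flag))
        for name, column in table.items()
    }
    return sorted(strengths, key=strengths.get, reverse=True)[:keep]


# Synthetic toy table: only field_a actually influences the flag.
rng = random.Random(1)
n = 500
table = {
    "field_a": [rng.uniform(18, 80) for _ in range(n)],
    "field_b": [rng.uniform(10, 100) for _ in range(n)],
    "field_c": [rng.random() for _ in range(n)],
}
flag = [
    1 if (a > 50 and rng.random() < 0.8) or rng.random() < 0.05 else 0
    for a in table["field_a"]
]

selected = rank_fields(table, flag, keep=2)
print(selected)
```

A real tool would also handle categorical fields, missing values, and non-monotonic relationships, which a single correlation figure cannot capture.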
ModelMAX (and the like) have been honed over the last decade by teams of
statisticians and still generate models that are close but not yet equal to
those that our modeling team used to build. The difference was I could build a
model in a few hours (limited by the CPU on the PC) and they took several weeks
to hand tune the result.
Marc A. Leith
President
redboxdata inc.
E-mail:mleith@redboxdata.com
Marc A. Leith wrote:
For a turnkey modeling solution, you need more than simple stat functions. These
solutions automatically transform or 'bucketize' the data and then analyze the
covariance between the score variables and the known result.
I'm obviously not in any position to define what is needed here. I only
had business statistics in college as a requirement for an economics
degree many years ago. However, I will say that you may be
underestimating R's capabilities. It includes linear and non-linear
regression models, neural networks, time-series analysis, and a host
(and I mean 100's) of other models I have yet to fathom. I'd humbly
speculate that the core developers, including the chairman of the
statistics department at Oxford, would take issue with its
characterization as "simple stat functions". But what do I know... :-)
Mike Mascari
On Thu, 05 Feb 2004 07:45:41 -0500, Mike Mascari wrote:
I'm obviously not in any position to define what is needed here. I only
had business statistics in college as a requirement for an economics
degree many years ago. However, I will say that you may be
underestimating R's capabilities. It includes linear and non-linear
regression models, neural networks, time-series analysis, and a host
(and I mean 100's) of other models I have yet to fathom. I'd humbly
speculate that the core developers, including the chairman of the
statistics department at Oxford, would take issue with its
characterization as "simple stat functions". But what do I know... :-)
Mike Mascari
Fair enough - I took a look at the links that Joe Conway provided, and R seems
very powerful and feature-complete. My comment was unfair; consider it
rephrased/withdrawn.
BUT is it turnkey? The original question sought a 'system' to score the database.
SAS & SPSS can be configured to do this, as R likely can be, but does that make it
a system?
The solutions I suggested can be run by someone with virtually no knowledge of
stats (not that I suggest this for complex issues). They can select an appropriate
model in minutes rather than needing an MA to design a solution.
Marc
Marc A. Leith
President
redboxdata inc.
e-mail: marc@redboxdata.com
cell: (416) 737 0045