Re: To all the pgsql developers..Have a look at the operators proposed by me in my researc

Started by Tasneem Memon over 18 years ago, 3 messages

tasneememon@hotmail.com

over 18 years ago

> From: josh@agliodbs.com
> To: pgsql-hackers@postgresql.org
>
> Tasneem,
>
> > > The margins to the op2, i.e. m1 and m2, are added dynamically on
> > > both the sides, considering the value it contains. To keep this
> > > margin big is important for a certain reason discussed later.
> > > The NEAR operator is supposed to obtain the values near to the op2,
> > > thus the target membership degree(md) is initially set to 0.8.
> > > The algorithm compares the op1(column) values row by row to the
> > > elements of the set that NEAR defined, i.e. the values from md 1.0
> > > to 0.8, adding matching tuples to the result set.
>
> Are we talking about a mathematical calculation on the values, or an algorithm
> against the population of the result set? I'm presuming the latter or you
> could just use a function. If so, is NEAR an absolute range or based on
> something logarithmic like standard deviation?

It is based on fuzzy logic. We treat operand2 (a crisp value supplied by the end user) as a fuzzy set, assign membership degrees to its elements, and then return the values whose membership degree lies between 1.0 and 0.8 as the values NEAR operand2.
I have made the initial membership degree a constant, i.e. 0.8. But that doesn't mean that the size of the set defined by NEAR (md = 0.8) remains constant. The larger the value in operand2, the larger the fuzzy set built around it, and so the wider the range of the set defined by NEAR.
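
As a rough illustration of the description above, the sketch below models NEAR with a triangular membership function in Python. The triangular shape, the names `membership` and `near`, and the choice of a margin equal to |operand2| on each side are all assumptions made for the example; the thread does not say how m1 and m2 are actually computed.

```python
def membership(x, op2, margin):
    """Triangular membership degree of x in the fuzzy set centred on op2."""
    if margin <= 0:
        raise ValueError("margin must be positive")
    # Degree falls linearly from 1.0 at op2 to 0.0 at op2 +/- margin.
    return max(0.0, 1.0 - abs(x - op2) / margin)


def near(values, op2, md=0.8, margin=None):
    """Return the values whose membership degree is at least md."""
    if margin is None:
        # Assumption: the margin on each side scales with operand2,
        # so a larger operand2 yields a wider NEAR range.
        margin = abs(op2)
    return [v for v in values if membership(v, op2, margin) >= md]


# Hypothetical column values; their degrees are 0.7, 0.85, 1.0, 0.85, 0.7.
column = [70, 85, 100, 115, 130]
print(near(column, 100))  # [85, 100, 115]
```

With these assumptions, md stays fixed at 0.8 while the accepted range still grows with operand2, which is the behaviour described above.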

> Beyond that, I would think that this mechanism would need some kind of extra
> heuristics to be at all performant, otherwise you're querying the entire
> table (or at least the entire index) every time you run a query. Have you
> given any thought to this?

Yes, you are right; that's my main concern. Here I have just put forward an idea for incorporating fuzziness into current database systems through ANSI SQL, but I still have to look into that problem if it is to be usable at all with large amounts of data.

Tasneem Memon


Tasneem Memon

tasneememon@hotmail.com

over 18 years ago

In reply to: Tasneem Memon (#1)

> From: decibel@decibel.org
> To: tasneememon@hotmail.com
> CC: pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] To all the pgsql developers..Have a look at the operators proposed by me in my researc
>
> On Sat, Jun 02, 2007 at 01:37:19PM +0000, Tasneem Memon wrote:
> > We can make the system ask the user as to what membership degree s/he wants to get the values, but we don't want to make the system interactive, where a user gives a membership degree value of his/her choice. These operators are supposed to work just like the other operators in SQL.. you just put them in the query and get a result. I have put 0.8 because all the case studies I have made for the NEAR, 0.8 seems to be the best choice.. 0.9 narrows the range.. 0.75 or 0.7 gets those values also that are irrelevant.. However, these values will no more seem to be irrelevant when we haven't got any values till the md 0.8, so the operator fetches them when they are the NEARest.
>
> While having them function just like any other operator is good, it
> seems like you're making quite a bit of an assumption for the user;
> namely that you know what their data looks like better than they might.
> Is it not possible that someone would come along with a dataset that
> looks different enough from your test cases so that the values you
> picked wouldn't work?
> --
> Jim Nasby decibel@decibel.org

I believe that in most cases it will get you the relevant results, because the size of the fuzzy set depends on how big the value in operand2 is, and so does the set defined by NEAR. I have tried values as small as 6 and as large as 2,147,483,647, and it gives good results. For example:
For 6, the range defined by NEAR is: 4 – 8
For 2,147,483,647, the range defined by NEAR is: 1,717,986,917 – 2,576,980,377
But yes, for other cases it may not give good results. We can give the user the option to specify the membership degree, or one can always use the BETWEEN operator when the thresholds are known exactly.
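
Both ranges quoted above are consistent with taking 20% either side of operand2 at md = 0.8 and rounding outward to whole numbers. That rule is inferred from the two examples, not stated in the thread, and `near_range` is a hypothetical helper written only to check the arithmetic:

```python
import math
from fractions import Fraction


def near_range(op2, md=Fraction(8, 10)):
    """Integer bounds of the NEAR range for a non-negative operand2.

    Assumption: the range is op2 * md .. op2 * (2 - md), rounded outward.
    Fractions are used so the rounding is not disturbed by float error.
    """
    spread = 1 - md                        # 0.2 when md is 0.8
    low = math.floor(op2 * (1 - spread))   # round the lower bound down
    high = math.ceil(op2 * (1 + spread))   # round the upper bound up
    return low, high


print(near_range(6))            # (4, 8)
print(near_range(2147483647))   # (1717986917, 2576980377)
```

Passing a different `md`, for example `Fraction(9, 10)`, narrows the range, which is one way a user-specified membership degree could be exposed.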

- Tasneem Memon



Josh Berkus

josh@agliodbs.com

over 18 years ago

In reply to: Tasneem Memon (#2)


Tasneem,

> For example:
> For 6, the range defined by NEAR is: 4 – 8
> For 2,147,483,647, the range defined by NEAR is: 1,717,986,917 – 2,576,980,377
> But yes, for other cases it may not give good results. We can
> give the option for the user to specify the membership degree, or one can
> always use the BETWEEN operator when he knows the thresholds exactly.

Oh, so this is actually based on a direct calculation on the values, rather
than on the population of data. That's much easier; heck, you could index on
it.
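
A sketch of that point: if NEAR reduces to fixed bounds computed from operand2, the query becomes an ordinary range predicate that a B-tree index can serve. The example uses Python's built-in sqlite3 module rather than PostgreSQL so that it runs as-is; the table, the index, the helper `near_bounds`, and the 20% outward-rounded rule are assumptions taken from the two quoted examples.

```python
import sqlite3


def near_bounds(op2):
    """Assumed NEAR bounds for op2 >= 0: 80%..120%, rounded outward."""
    low = (op2 * 8) // 10           # floor of 0.8 * op2
    high = -((-op2 * 12) // 10)     # ceiling of 1.2 * op2
    return low, high


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (price INTEGER)")
conn.execute("CREATE INDEX items_price_idx ON items (price)")
conn.executemany("INSERT INTO items (price) VALUES (?)",
                 [(p,) for p in range(1, 21)])

# "price NEAR 6" rewritten as a plain range condition.
low, high = near_bounds(6)
query = "SELECT price FROM items WHERE price BETWEEN ? AND ? ORDER BY price"
matches = [row[0] for row in conn.execute(query, (low, high))]
print(matches)  # [4, 5, 6, 7, 8]

# Show how SQLite plans the range query.
for row in conn.execute("EXPLAIN QUERY PLAN " + query, (low, high)):
    print(row)
```

Because the bounds depend only on operand2 and not on the stored data, they can be computed once per query before the table is touched.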

--
Josh Berkus
PostgreSQL @ Sun
San Francisco