Integration with Access Method interface

Started by Alice Lottini, over 23 years ago, 4 messages, hackers

alice_lottini@yahoo.it

over 23 years ago

Hi everybody,
we're developing a C programme which needs to directly
use the functions of the Access Methods interface.
In particular, our programme contains a function,
readFromPG, which directly calls functions such as
heap_open, heap_beginscan and so on in order to
perform a low-level retrieval of data which are to be
made available for further elaborations.

We've already written the code and we'd like to test
it, but we don't know how to make the Access Methods
visible to our programme.

Is it possible to create a common C programme in which
our function readFromPG is called and to run it
directly as a simple Unix executable (it should not
become a UDF or a stored procedure)?

Thanks in advance, alice and lorena

Tom Lane

tgl@sss.pgh.pa.us

over 23 years ago

In reply to: Alice Lottini (#1)

Re: Integration with Access Method interface

Alice Lottini <alice_lottini@yahoo.it> writes:

we're developing a C programme which needs to directly
use the functions of the Access Methods interface.
In particular, our programme contains a function,
readFromPG, which directly calls functions such as
heap_open, heap_beginscan and so on in order to
perform a low-level retrieval of data which are to be
made available for further elaborations.

Why?

The answer to your question is simple: you can't, because those are
internal backend operations and are just not available to client
programs. But I'm really at a loss why you think this would be a good
thing to do. What's wrong with a "SELECT ..." command?

regards, tom lane

Alice Lottini

alice_lottini@yahoo.it

over 23 years ago

In reply to: Tom Lane (#2)

Re: Integration with Access Method interface

Our task is to implement FPGrowth (an algorithm for
extracting association rules for data mining purposes)
as a C programme and to integrate it at low level into
Postgres. We are strictly required not to pass through
the SQL layer and to bypass even the optimiser layer,
getting the data out of tables directly with the
Access Methods.

The reason for this is that all the existing tools for
data mining obtain data either from flat files or from
a DBMS through SQL queries; since the amount of data
involved is usually very large, this high-level
integration results in rather poor performance.
Furthermore, FPGrowth is a recursive algorithm and the
data structures it needs (FPTree's) are likely not to
fit into memory.
In order to partly solve such problems, we've studied
an optimised version of the algorithm as well as a
partitioning technique for the data structures so that
they can be stored on disk instead of having to be
held in memory.

Now we must enable our programme to access the data
directly from the table so that the FPtree can be
built and, after having partitioned it according to
our strategy, stored on the disk blocks (each node of
our tree should be a tuple).
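
For readers unfamiliar with the structure being discussed, here is a minimal sketch of the prefix-tree insertion step of FP-tree construction, with each node kept as a flat record so that it could plausibly map onto one tuple. It is an illustration only, not the posters' code: the names FPNode, FPTree, fptree_init, fptree_find_child, fptree_insert, fptree_dump and the MAX_NODES limit are all hypothetical, the tree lives in a fixed in-memory array rather than in disk blocks, and the header table and node links of a full FP-tree are left out. It assumes each transaction's items are already sorted by descending global frequency.

```c
#include <assert.h>
#include <stdio.h>

#define MAX_NODES 256   /* hypothetical fixed capacity for this sketch */
#define NO_NODE   (-1)

/* One FP-tree node as a flat record; links are array indexes, not pointers,
 * so a node could be written out as a row (id, parent, item, count). */
typedef struct {
    int item;          /* item id, -1 for the root */
    int count;         /* number of transactions passing through this node */
    int parent;        /* index of the parent node, NO_NODE for the root */
    int first_child;   /* index of the first child, or NO_NODE */
    int next_sibling;  /* index of the next sibling, or NO_NODE */
} FPNode;

typedef struct {
    FPNode nodes[MAX_NODES];
    int n_nodes;
} FPTree;

/* Create a tree containing only the root at index 0. */
void fptree_init(FPTree *t)
{
    t->n_nodes = 1;
    t->nodes[0] = (FPNode){ .item = -1, .count = 0, .parent = NO_NODE,
                            .first_child = NO_NODE, .next_sibling = NO_NODE };
}

/* Return the index of the child of `parent` holding `item`, or NO_NODE. */
int fptree_find_child(const FPTree *t, int parent, int item)
{
    for (int c = t->nodes[parent].first_child; c != NO_NODE;
         c = t->nodes[c].next_sibling) {
        if (t->nodes[c].item == item)
            return c;
    }
    return NO_NODE;
}

/* Insert one transaction: walk down from the root, reusing nodes on a shared
 * prefix and creating new nodes where the path diverges. */
void fptree_insert(FPTree *t, const int *items, int n_items)
{
    int cur = 0;
    for (int i = 0; i < n_items; i++) {
        int child = fptree_find_child(t, cur, items[i]);
        if (child == NO_NODE) {
            assert(t->n_nodes < MAX_NODES);
            child = t->n_nodes++;
            /* New node becomes the head of the parent's child list. */
            t->nodes[child] = (FPNode){ .item = items[i], .count = 0,
                                        .parent = cur,
                                        .first_child = NO_NODE,
                                        .next_sibling = t->nodes[cur].first_child };
            t->nodes[cur].first_child = child;
        }
        t->nodes[child].count++;
        cur = child;
    }
}

/* Print every node as one row: node id, parent id, item, count. */
void fptree_dump(const FPTree *t, FILE *out)
{
    for (int i = 0; i < t->n_nodes; i++) {
        fprintf(out, "%d\t%d\t%d\t%d\n", i, t->nodes[i].parent,
                t->nodes[i].item, t->nodes[i].count);
    }
}
```

Calling fptree_init once and then fptree_insert for each transaction builds the tree; fptree_dump(&tree, stdout) writes one line per node. Using array indexes instead of pointers is a design choice made here so that the links survive being written out and read back, which is the property a node-per-tuple layout would need.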

We'd like to know which is the most suitable way for
integrating our algorithm into the server at the
access method level. If it is not possible simply to
invoke the access methods from an external programme,
what could be an alternative? Maybe making the whole
procedure a user defined function such as the ones in
contrib is the most viable way...
Any suggestion would be greatly appreciated.
Thanks in advance!
Best regards, alice and lorena

Jan Wieck

JanWieck@Yahoo.com

over 23 years ago

In reply to: Alice Lottini (#3)

Re: Integration with Access Method interface

Alice Lottini wrote:

Our task is to implement FPGrowth (an algorithm for
extracting association rules for data mining purposes)
as a C programme and to integrate it at low level into
Postgres. We are strictly required not to pass through
the SQL layer and to bypass even the optimiser layer,
getting the data out of tables directly with the
Access Methods.

You have no way to tell the database system what you are currently accessing.
So the data you read from the disk is allowed to be invalid and even
corrupted as the DB system sees fit (it needs to know how to return to
a consistent state, but it does not need to tell you how it plans to
accomplish that task or when it will be in the mood to do so).

The only way you can ensure consistent reads from PostgreSQL data files
is by shutting down the postmaster first. So your "digger" could be some
sort of standalone backend that can only work while the database system
is down.

The reason for this is that all the existing tools for
data mining obtain data either from flat files or from
a DBMS through SQL queries; since the amount of data
involved is usually very large, this high-level
integration results in rather poor performance.
Furthermore, FPGrowth is a recursive algorithm and the
data structures it needs (FPTree's) are likely not to
fit into memory.

But the single nodes of that FPTree still fit, no?

In order to partly solve such problems, we've studied
an optimised version of the algorithm as well as a
partitioning technique for the data structures so that
they can be stored on disk instead of having to be
held in memory.

Now we must enable our programme to access the data
directly from the table so that the FPtree can be
built and, after having partitioned it according to
our strategy, stored on the disk blocks (each node of
our tree should be a tuple).

We'd like to know which is the most suitable way for
integrating our algorithm into the server at the
access method level. If it is not possible simply to
invoke the access methods from an external programme,
what could be an alternative? Maybe making the whole
procedure a user defined function such as the ones in
contrib is the most viable way...
Any suggestion would be greatly appreciated.
Thanks in advance!
Best regards, alice and lorena

I think we know much too little about your algorithms to give you any
advice yet. It looks to me as though you could have used very
substantial, in-depth relational database knowledge and experience a
little earlier in your project. Just because everyone else using SQL has
failed so far doesn't mean that those guys had the brightest database
hotshots on their teams. Also, PostgreSQL's extensibility might offer a few
possible paths inside the "supported" boundaries of the backend's
street map. We just need to know more.

Jan

--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#================================================== JanWieck@Yahoo.com #