SQL/MED estimated time of arrival?
Hi SQL/MED developers,
Our company has just finished development of a database extension for
Informix that provides tabular access to various types of structured
files (NetCDF and HDF5, with more types to come). We would like to
port this logic to run on PostgreSQL, since many of our potential
customers use PostgreSQL.
On Informix, we were able to take advantage of the VTI (Virtual Table
Interface) feature to support "table" scans and indexing. (See
http://www.ibm.com/developerworks/data/zones/informix/library/techarticle/db_vti.html
.) Do you have any idea of how long it will be before SQL/MED on
PostgreSQL will be available, and perhaps how similar it will be to
Informix VTI?
Thanks,
Eric.
**********************************************
Eric Davies, M.Sc.
Senior Programmer Analyst
Barrodale Computing Services Ltd.
1095 McKenzie Ave., Suite 418
Victoria BC V8P 2L5
Canada
Tel: (250) 704-4428
Web: http://www.barrodale.com
Email: eric@barrodale.com
**********************************************
On Wed, 03 Nov 2010 13:32:18 -0700
Eric Davies <eric@barrodale.com> wrote:
On Informix, we were able to take advantage of the VTI (Virtual Table
Interface) feature to support "table" scans and indexing. (See
http://www.ibm.com/developerworks/data/zones/informix/library/techarticle/db_vti.html
.) Do you have any idea of how long it will be before SQL/MED on
PostgreSQL will be available, and perhaps how similar it will be to
Informix VTI?
SQL/MED is now under discussion/development for PostgreSQL 9.1, and
9.1 would be released one year after 9.0, maybe around Sep 2011? For
detail of release schedule, please see the development plan of
PostgreSQL 9.1.
http://wiki.postgresql.org/wiki/PostgreSQL_9.1_Development_Plan
I looked into VTI documents you've pointed. ISTM that VTI and SQL/MED
would have a lot of common ideas, and most of VTI items would be able
to be mapped to one of SQL/MED items, except features about updating
data and indexing.
For example:
* PRIMARY ACCESS_METHOD -> HANDLER of FOREIGN DATA WRAPPER
* am_scancost() -> FdwRoutine.EstimateCosts()
* am_open() -> FdwRoutine.Open()
* am_beginscan() -> first call of FdwRoutine.Iterate()?
* am_getnext() -> FdwRoutine.Iterate()
* am_rescan() -> FdwRoutine.ReOpen()
* am_close() -> FdwRoutine.Close()
* Table descriptor -> Relation, Form_pg_class
* Qual descriptor -> PlanState.qual
I hope the summary of SQL/MED described in wiki page helps you.
http://wiki.postgresql.org/wiki/SQL/MED
Any comments and questions are welcome.
Regards,
--
Shigeru Hanada
On Thu, Nov 4, 2010 at 6:04 PM, Shigeru HANADA
<hanada@metrosystems.co.jp> wrote:
For example:
* PRIMARY ACCESS_METHOD -> HANDLER of FOREIGN DATA WRAPPER
* am_scancost() -> FdwRoutine.EstimateCosts()
* am_open() -> FdwRoutine.Open()
* am_beginscan() -> first call of FdwRoutine.Iterate()?
It might be good to have a separated "beginscan" method if we use
asynchronous scans in multiple foreign servers in one query
because multiple foreign servers can run their queries in parallel.
(Imagine that pushing-down aggregate function into each foreign server.)
I think it is different from "open" because it is called
before query execution, for example by EXPLAIN.
* am_getnext() -> FdwRoutine.Iterate()
* am_rescan() -> FdwRoutine.ReOpen()
* am_close() -> FdwRoutine.Close()
* Table descriptor -> Relation, Form_pg_class
* Qual descriptor -> PlanState.qual
Do you think you have all counterpart methods for VTI AMs?
If so, it's a good news ;-) We could support foreign table
features as same level as Informix.
--
Itagaki Takahiro
On Thu, 4 Nov 2010 18:22:52 +0900
Itagaki Takahiro <itagaki.takahiro@gmail.com> wrote:
On Thu, Nov 4, 2010 at 6:04 PM, Shigeru HANADA
<hanada@metrosystems.co.jp> wrote:For example:
* PRIMARY ACCESS_METHOD -> HANDLER of FOREIGN DATA WRAPPER
* am_scancost() -> FdwRoutine.EstimateCosts()
* am_open() -> FdwRoutine.Open()
* am_beginscan() -> first call of FdwRoutine.Iterate()?It might be good to have a separated "beginscan" method if we use
asynchronous scans in multiple foreign servers in one query
because multiple foreign servers can run their queries in parallel.
(Imagine that pushing-down aggregate function into each foreign server.)
You mean that separated beginscan (FdwRoutine.BeginScan?) starts
asynchronous query and returns immediately, and FdwRoutine.Iterate
returns result of that query?
Pushing aggregate down to foreign server would be efficient, but need
another hook which can create one ForeignScan node which have "Agg +
ForeignScan" functionality. Same optimization would be able to apply
for Sort and Limit. Such optimization should be done in optimizer
with estimated costs? Or FDW's hook function may change plan tree
which was created by planner?
I think it is different from "open" because it is called
before query execution, for example by EXPLAIN.
Right, I've misunderstood.
VTI programmer's guide says that am_open is called before processing
SQL to initialize input or output, and called for not only SELECT but
also other queries using a virtual table such as INSERT and DROP TABLE.
The am_open would have no counterpart in SQL/MED.
Do you think you have all counterpart methods for VTI AMs?
If so, it's a good news ;-) We could support foreign table
features as same level as Informix.
Not all, but most of them for read-only access.
VTI supports updating external data and various management tasks via
SQL, but SQL/MED supports (at least in standard) only read access.
The full set of ACCESS_METHOD functions are:
am_create CREATE FOREIGN TABLE
am_drop DROP TABLE
am_stats gather statistics (ANALYZE)
am_check verify data structure and index consistency
am_open initialize access to a virtual table
(might connect to external server)
am_close finalize access to a virtual table
am_scancost estimate cost of a scan
am_beginscan initialize scan
am_getbyid get a tuple by row-id
am_getnext get next tuple(s)
am_rescan reset state of scanning
am_endscan finalize scan
am_insert insert a tuple and return row-id
am_update update a tuple by row-id
am_delete delete a tuple by row-id
am_truncate truncate table
VTI might be similar to storage engine of MySQL or heap-am of PG,
rather than SQL/MED of PG.
Like FOREIGN INDEX of HiRDB, Informix has Virtual Index Interface, and
am_getbyid is used to get a tuple by row-id. I'll research more about
VTI and VII for revising design of SQL/MED.
Regards,
--
Shigeru Hanada
On Fri, Nov 5, 2010 at 4:00 PM, Shigeru HANADA
<hanada@metrosystems.co.jp> wrote:
* am_beginscan() -> first call of FdwRoutine.Iterate()?
It might be good to have a separated "beginscan" method if we use
asynchronous scans in multiple foreign servers in one queryYou mean that separated beginscan (FdwRoutine.BeginScan?) starts
asynchronous query and returns immediately, and FdwRoutine.Iterate
returns result of that query?
Yes. Each BeginScan() in the executor node tree will be called at
the beginning of executor's run. The callback should not block
the caller. OTOH, Iterate() are called at the first time tuples
in the node are required.
PL/Proxy has a similar functionality with RUN ON ALL to start queries
in parallel. So, I think it's a infrastructure commonly required.
--
Itagaki Takahiro
On Fri, 5 Nov 2010 16:27:49 +0900
Itagaki Takahiro <itagaki.takahiro@gmail.com> wrote:
On Fri, Nov 5, 2010 at 4:00 PM, Shigeru HANADA
<hanada@metrosystems.co.jp> wrote:* am_beginscan() -> first call of FdwRoutine.Iterate()?
It might be good to have a separated "beginscan" method if we use
asynchronous scans in multiple foreign servers in one queryYou mean that separated beginscan (FdwRoutine.BeginScan?) starts
asynchronous query and returns immediately, and FdwRoutine.Iterate
returns result of that query?Yes. Each BeginScan() in the executor node tree will be called at
the beginning of executor's run. The callback should not block
the caller. OTOH, Iterate() are called at the first time tuples
in the node are required.
Thanks, now I see your point. Current FdwRoutine has no appropriate
function because Open is called from ExecutorStart which is used by
EXPLAIN too.
But then we have mismatch between executor node interface and FDW
interface about BeginScan. Should we add new function such as
ExecBeginNode and call ExecBeginXXX for each plan node?
New Query Processing Control Flow would be:
# based on README of executor directory
CreateQueryDesc
ExecutorStart
CreateExecutorState
creates per-query context
switch to per-query context to run ExecInitNode
ExecInitNode --- recursively scans plan tree
CreateExprContext
creates per-tuple context
ExecInitExpr
ExecutorRun
ExecBeginNode(new) --- recursively scans plan tree
call ExecBeginXXXS for each plan node
ExecProcNode --- recursively called in per-query context
ExecEvalExpr --- called in per-tuple context
ResetExprContext --- to free memory
ExecutorEnd
ExecEndNode --- recursively releases resources
FreeExecutorState
frees per-query context and child contexts
FreeQueryDesc
PL/Proxy has a similar functionality with RUN ON ALL to start queries
in parallel. So, I think it's a infrastructure commonly required.
I noticed the lack of consideration about cache invalidation from
reading PL/Proxy source, thanks for your mention about PL/Proxy. :-)
Regards,
--
Shigeru Hanada
Shigeru HANADA <hanada@metrosystems.co.jp> writes:
Thanks, now I see your point. Current FdwRoutine has no appropriate
function because Open is called from ExecutorStart which is used by
EXPLAIN too.
But then we have mismatch between executor node interface and FDW
interface about BeginScan. Should we add new function such as
ExecBeginNode and call ExecBeginXXX for each plan node?
That seems like a massive amount of new code, and wasted cycles during
every query startup, to fix a very small problem.
There's a flag EXEC_FLAG_EXPLAIN_ONLY that tells node Init functions
whether the query is going to be run "for real" or only EXPLAINed.
Use that to decide whether to do any real work.
regards, tom lane
2010/11/5 Shigeru HANADA <hanada@metrosystems.co.jp>:
On Fri, 5 Nov 2010 16:27:49 +0900
Itagaki Takahiro <itagaki.takahiro@gmail.com> wrote:PL/Proxy has a similar functionality with RUN ON ALL to start queries
in parallel. So, I think it's a infrastructure commonly required.I noticed the lack of consideration about cache invalidation from
reading PL/Proxy source, thanks for your mention about PL/Proxy. :-)
And if we really make this async query come true, I suggest designing
resource (i.e. remote connection) management very carefully. When the
executor fails in the middle of its execution, it possibly fails to
release its own resource; close() in ExecutorEnd() will never be
called. As far as I know files and memory are released automatically
in the current mechanism, but MED APIs will use their own resources
other than them.
Regards,
--
Hitoshi Harada
Hitoshi Harada <umi.tanuki@gmail.com> writes:
And if we really make this async query come true, I suggest designing
resource (i.e. remote connection) management very carefully. When the
executor fails in the middle of its execution, it possibly fails to
release its own resource; close() in ExecutorEnd() will never be
called. As far as I know files and memory are released automatically
in the current mechanism, but MED APIs will use their own resources
other than them.
The way to fix that is for the FDW to hook into the ResourceOwner
mechanism (via RegisterResourceReleaseCallback). Then it can track
and clean up things it knows about just as "automatically" as anything
else is.
Of course, if you lose your network connection to the remote DB,
you have to assume it will clean up of its own accord.
regards, tom lane
On Fri, 05 Nov 2010 10:43:45 -0400
Tom Lane <tgl@sss.pgh.pa.us> wrote:
Shigeru HANADA <hanada@metrosystems.co.jp> writes:
Thanks, now I see your point. Current FdwRoutine has no appropriate
function because Open is called from ExecutorStart which is used by
EXPLAIN too.But then we have mismatch between executor node interface and FDW
interface about BeginScan. Should we add new function such as
ExecBeginNode and call ExecBeginXXX for each plan node?That seems like a massive amount of new code, and wasted cycles during
every query startup, to fix a very small problem.
Agreed.
There's a flag EXEC_FLAG_EXPLAIN_ONLY that tells node Init functions
whether the query is going to be run "for real" or only EXPLAINed.
Use that to decide whether to do any real work.
I missed that flag. That flag would make ExecInitForeignScan be able
to skip calling FdwRoutine.BeginScan when the query was an EXPLAIN
without ANALYZE. Thanks for the suggestion.
Regards,
--
Shigeru Hanada
On Sat, 6 Nov 2010 16:04:37 +0900
Hitoshi Harada <umi.tanuki@gmail.com> wrote:
2010/11/5 Shigeru HANADA <hanada@metrosystems.co.jp>:
On Fri, 5 Nov 2010 16:27:49 +0900
Itagaki Takahiro <itagaki.takahiro@gmail.com> wrote:PL/Proxy has a similar functionality with RUN ON ALL to start queries
in parallel. So, I think it's a infrastructure commonly required.I noticed the lack of consideration about cache invalidation from
reading PL/Proxy source, thanks for your mention about PL/Proxy. :-)And if we really make this async query come true, I suggest designing
resource (i.e. remote connection) management very carefully. When the
executor fails in the middle of its execution, it possibly fails to
release its own resource; close() in ExecutorEnd() will never be
called. As far as I know files and memory are released automatically
in the current mechanism, but MED APIs will use their own resources
other than them.
Yes, managegement of FDW's resources is very important issue. Curren
FdwRoutine includes ConnectServer and FreeFSConnection, but they might
not be enough to manage FDW's resources by backend in common way.
Because connection is not only resource FDW use. Possible resources
are:
- Files (Virtual File descriptor would help to manage)
- Database connections (might be cached)
- Server-side cursors (would be released with DB connection?)
- Heap memory (for instance, libpq uses malloc)
For example, if postgresql_fdw uses server-side cursor to retreive
result tuples, it would be required to CLOSE cursors at the end of
transaction. Closing cursor at the end of session wouldn't be good
idea because clients might pool and reuse connections.
How about removing them, ConnectServer and FreeFSConnection, from
FdwRoutine and leaving the responsibility of resource management to
each FDW? Each FDW would have to use mechanism such as Virtual File
and ResourceOwner to manage resources properly, though.
Regards,
--
Shigeru Hanada
On Fri, 5 Nov 2010 16:27:49 +0900
Itagaki Takahiro <itagaki.takahiro@gmail.com> wrote:
On Fri, Nov 5, 2010 at 4:00 PM, Shigeru HANADA
<hanada@metrosystems.co.jp> wrote:* am_beginscan() -> first call of FdwRoutine.Iterate()?
It might be good to have a separated "beginscan" method if we use
asynchronous scans in multiple foreign servers in one queryYou mean that separated beginscan (FdwRoutine.BeginScan?) starts
asynchronous query and returns immediately, and FdwRoutine.Iterate
returns result of that query?Yes. Each BeginScan() in the executor node tree will be called at
the beginning of executor's run. The callback should not block
the caller. OTOH, Iterate() are called at the first time tuples
in the node are required.
Please find attached WIP patch for BeginScan. Postgresql_fdw has been
changed to use server-side cursor for sample. It's DECLAREd with HOLD
option to avoid transaction management, though.
Other changes since 20101025 patch are:
- Some document fixes.
- Don't call ConnectServer from ExecInitForeignScan. Instead,
postgresql_fdw calls it from pgOpen(). This change is only trial
and would be removed later.
- Add "schema" column to output of \det psql command.
- New \dE psql command shows list of foreign tables in \d format.
- \d+ <foreign table> psql command shows per-column options.
If the changes (at least adding BeginScan) are OK, I'll clean the
patch up and post it soon.
Regards,
--
Shigeru Hanada
Attachments:
fdw_select_cursor.patch.gzapplication/octet-stream; name=fdw_select_cursor.patch.gzDownload
����L fdw_select_cursor.patch �<�w���?�_��}X��
~�q�S�M��n�������D$����~gfWO�mr�����;;������ULk:e���
�~h�N�Y��;��O-���ZS�rL��M�������^7.&/�X������X�Vs���r���g����+��9�����vg8b���EV(,g����������k<��o?��`��94��������w�65��g����S�Ol>]9�4�0T��;5�d� ���N����Z�������|���4�� ��!��U?B���*,�_�g�V0��3Z44O��X���:����w~}�{���5���]�.�b9��X.4X����u���t����
S����3���`�=8�F?u������k�r�g����L-�;�ZN ��L�yZr��X�1#�Y����(, Jn�������
�����|����~a�5��k3���|)��r�W�H�9�#�f�g/������b�����`�%�8n����r�
F8�4�93~�3���}Al(�?
m>Um�`�w`�)l{jy~���K�(R���L���k�5Y�� �j��Y}
��DC
�v�Y�K����K��#ZW����d5���o�W����l���X*4��
}������K��~�m1���J%\g��Q���\�������`���`�icw����I��^�3n9�V|����3��s��6�����F���������m��!���a�x�i��W��@4���:@m�?���6q�]��Co��pm���ZT�:4�8��$��p�3\�+�p���U�������W?���6�u��$�@.���g����Ka�6��3�x+��,�� ����"i
���F�� |��zXw��T&]�������o0��n����;��v�����6n^uU�'����HBX�]���J9�����d�f:<�v���-H�����e@�����������a����tl�%{��l�M�E��z�9���`�B����
�OM�|q�A�7b�(�F8�N'
2N'�6 -$Y 9�L ���DK�m4ljX�E*l�>�t����G��J~I�����V�=�
� 0�/�&�b/I�KJ�hd�����Wv��w��f{t���0I���� �\�St�#�`zZ�N�H�A@���L��$�C��?tV�����PnG�#0G�c�Y�����w��[�p����iy���gq ,��?��9��
[o��NN_���:AH�
��+��1�b����������&����������<�T;_+����L��jT����������������������Gb)������MV��M/d� ��r�����<��z3TG�
��#����
n~ �����
��0�Z3l�e�^���9��@��>,�r{��_F��m)����X�%�i
�Z��VKbK������j3���vX� ��]���,��J������nwlV�gm������8�E���2� �0R�l�6��[m����Wp\1.E��:��U�c_��
�k����W���0 ��\�bo|�Y�������pSH�OG^����F��Z�G��^��&��
����-���r���@�.>�l�p�wb�u����V���qS��(X_w�����Pa0����7��`���t���kw�:4q��C���nP��@�8nS\�2�r�\��.'��P�r
�������bN0�e����/m���ef7dw����������i������V��C��berr�DXPW`�F�a�?���F�a�O�����4�?���a����i9k��bk�b�-�Na�4��<���#
s���"Z��FE�����u�\�;�����V�{�cF��h�]�$$��!��f5���FY"������0����8"`��8\`�n�������AN$$Q$�?l�<W�O r�/�>�V�8A�J�S���#���G�*��~q~��
h���loD�����S��v��y��OLv���*��2:�����I�$<��fNX"���:nv� u
v�=��'���L�,��,����w�����K����6z4c��n��4<0�=�|k���Pe���Z�~��Z���������mer�F�����i�1�FV����2������j���H�Vg��0�z�~m�7yl)���g��
{���� z��� �x����v��NO��c
4���W� ��>�����v�}W� �|^)u�=E������S\g����,��})�����E��� �zp���
����X�>�V8W�+'�����0��k��=��$T��onU�a�,�?���#0���P������cz�!2d1��62���G�)����~I�����?*h�����
�S�?>��XA�|�'����f4����j�f���|�I���9�`bcC���#�6��Z��P��Y�9j5�*�������\��%>����2�g���"��*B�8�,��@(f_��%��,(����#��s`26>��
���4sk�<F*�dl�#����V�'����w�VdfI&(^����A�s��2�,3+4
,��^"�^���W�\�r���.�<k6`��X�����8�UX�~C�&�����.�I����~��:m�7�\wZMo��|��:���8,��
f{�\j�=��q�������D��1���i]X��` 2�d��C�t��2���\>��F�gX�H)P��s;
=�C__~R���Pp��>Oq������IW)�\��b��$�����R?�Vp.A$-���b��p�;<�8`[������+!(`��P(`��[�y�����?.q`B�����t?�H8���!8~��,�z��Y�.�B��X����~��U�2�s,#���O�Shl���umv0uW���M�z�Z�p ������h���
k��j���{��m$��Bf �m��I�h�(���r��M8�C
FV��<q��b�'��A��*�$��1;�����^��:�\,��p��A����sa9�a��5���~������?9�����&���2�;Y�A�!z�6��N�<��B��EM�n 8�_�X����p�%���)�ih��� �J*%��P��$B��S.���gc�$�rD�Eb"P�'�s����h���o#%�����uM�}?���We]{����([3��z���
�{���\r�uLkB���dF��~|f��;�����.�����!���/
�>@?�)�6G��q�P0[���$�Uq�X���J+��p�#(���
���r��?��D[D�,IhY�KL��rx�M�J��
@��Ux�G�Z.q��0�bn|�����SK��u���peO�~���^7�2 ���&� ���V�7V�b�z�H�&:�(�:��s
4/P��80�XA
�7&�xcb�u���&D?�{3����������s My�G*��������*U�O2+�.����s�������O�YS��:HA�o�-��VX��?&\�5�n�,,%m���&"P�W���� �z�6�ga
����%���W�nE���
�Q���`����'����ud�����gf�S�>��N�$Q��u���D$�e�D�5$�%�O�fgW���{'�E�>�-��J��p5����N�"|r�4�M�'o=k�����Q����_��r�_^�����C���=��6W�2��6�H_���z��YM��� �����������6�
|*>M=��n�]y�C�q-��M�V�M��h�������������j�M�,� Y(��w���TYs�$V�rp`PB��h�d/-�"�#�t U}}
H�WRrA�%��p�4��A���5Q\�c��)1�I\I���4�G�H�O�i���s����/!������9�4�24�*�4������5�S��^�T�zvheiZ�g*7�����a9��9s��H��vJ%�f��������
U�j�/�E������h\����F���n�]^;��'�(t�:
,�4H����n�O�M �L����;n@�E��y)�ag��=6w]� -������Erz�1y<��d��9EJFlw.�p�=�_F�m�m��o�LS�c��M_��7��+-|d�6;<�������[������n^�����I������/�T^�8�V��=��I��}�1�Gd�vg�pbw�bwg2� �e[e�Sf*������#n�-S�(��y�(�t��y��7������i���;����D?�.?�N_D$��������������������<���'���M^��k�tZd��-aI��F��|}���9��R���7a�O�C��� 6���+l�=<�����P2�������p1����K8�s�GfCH���>�
���6������l��q���N���������`��� &�����Br��nPn���h�����N�r:U�������������)}i��� 5�*��;_�����J1BXi��w+�V�K��Y�$�����Z=�z��D���&�. ��wa$%�(g+���S���zt=�O0�u��N�_L[,��&����� �A��+����q���0�,
is��y�d ��r �/cs�[op��E�2�����<