[PATCH] Largeobject Access Controls (r2432)

Started by KaiGai Kohei about 16 years ago. 121 messages
#1 KaiGai Kohei
kaigai@ak.jp.nec.com
1 attachment(s)

The attached patch is a revised version of large object privileges
based on the feedback from the last commit fest.

List of updates:
* Rebased to the latest CVS HEAD
* Add the pg_largeobject_aclcheck_snapshot() interface.
In read-only access mode, the large object feature uses the
query's snapshot, not only SnapshotNow. The same behavior
should also apply to accesses to the large object's metadata.
When making an access control decision, we have to fetch the
large object's ACL from the system catalog. Here we scan
pg_largeobject_metadata with the snapshot of the large object
descriptor. (Note that it is SnapshotNow in read-write mode.)
This resolves the problem of access rights being changed while
a large object is open; see the sketch after this list.
* Add pg_dump support.
* Replace all occurrences of "largeobject" with "large object" in
user-visible messages and documentation.
* Remove the ac_largeobject_*() routines, because we decided
not to share the same entry points between DAC and MAC.
* Add a description of large object privileges in SGML.
* Add a definition of pg_largeobject_metadata in SGML.
* The \lo_list command supports both v8.5 and prior versions.
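
For reference, a rough sketch of the intended read-only behavior (the OID
16385 and the role name "alice" below are only examples):

-- session A: open a large object read-only inside a transaction
BEGIN;
SELECT lo_open(16385, 262144);    -- 262144 = x'40000' = INV_READ

-- session B: revoke the reader's privilege and commit
REVOKE ALL ON LARGE OBJECT 16385 FROM alice;

-- session A: reading through the already-opened descriptor still
-- succeeds, because the ACL is checked against the descriptor's
-- snapshot rather than SnapshotNow
SELECT loread(0, 32);
COMMIT;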

Thanks,
--
OSS Platform Development Division, NEC
KaiGai Kohei <kaigai@ak.jp.nec.com>

Attachments:

sepgsql-02-blob-8.5devel-r2432.patch.gz (application/gzip)
#2 Jaime Casanova
jcasanov@systemguards.com.ec
In reply to: KaiGai Kohei (#1)
Re: [PATCH] Largeobject Access Controls (r2432)

2009/11/12 KaiGai Kohei <kaigai@ak.jp.nec.com>:

The attached patch is a revised version of large object privileges
based on the feedback from the last commit fest.

Please update the patch; it's giving an error when 'make check' is
trying to "create template1" in initdb:

creating template1 database in
/home/postgres/pg_releases/pgsql/src/test/regress/./tmp_check/data/base/1
... TRAP: FailedAssertion("!(reln->md_fd[forkNum] == ((void *)0))",
File: "md.c", Line: 254)
child process was terminated by signal 6: Aborted

Meanwhile, I will make some comments:

This manual will be specific to 8.5, so I think all mentions of the
version should be removed.
For example:

+    In this version, a large object has OID of its owner, access permissions
+    and OID of the largeobject itself.
+         Prior to the version 8.5.x release does not have any
privilege checks on
+   large objects.
The parameter name (large_object_privilege_checks) is confusing enough
that we have to add these statements to clarify it... let's think of a
better, less confusing name:
+         Please note that it is not equivalent to disable all the security
+         checks corresponding to large objects.
+         For example, the <literal>lo_import()</literal> and
+         <literal>lo_export</literal> need superuser privileges independent
+         from this setting as prior versions were doing.

Will this not be off by default? It should be, for compatibility
reasons... I remember there was a discussion about that but can't
remember the conclusion.

Mmm... One of them? the first?
+ The one is <literal>SELECT</literal>.

+     Even if a transaction modified access rights and commit it, it is
+     not invisible from other transaction which already opened the large
+     object.

The other one, the second
+ The other is <literal>UPDATE</literal>.

it seems there is an "are" that should not be there :)
+
+     These functions are originally requires database superuser privilege,
+     and it allows to bypass the default database privilege checks, so
+     we don't need to check an obvious test twice.
a typo, obviously
+        For largeo bjects, this privilege also allows to read from
+        the target large object.
We have two versions of these functions: one that receives a Snapshot
parameter and one that doesn't...
What is the rationale for this? AFAIU, the one that doesn't receive a
Snapshot calls the other one with SnapshotNow; can't we simply
call it that way and drop the version of the functions that doesn't
have that parameter?
+ pg_largeobject_aclmask(Oid lobj_oid, Oid roleid,
+                      AclMode mask, AclMaskHow how)

+ pg_largeobject_aclcheck(Oid lobj_oid, Oid roleid, AclMode mode)

--
Atentamente,
Jaime Casanova
Soporte y capacitación de PostgreSQL
Asesoría y desarrollo de sistemas
Guayaquil - Ecuador
Cel. +59387171157

#3 Robert Haas
robertmhaas@gmail.com
In reply to: Jaime Casanova (#2)
Re: [PATCH] Largeobject Access Controls (r2432)

On Thu, Dec 3, 2009 at 12:49 PM, Jaime Casanova
<jcasanov@systemguards.com.ec> wrote:

This manual will be specific to 8.5, so I think all mentions of the
version should be removed.

Not sure I agree on this point. We have similar mentions elsewhere.

...Robert

#4 Greg Smith
greg@2ndquadrant.com
In reply to: Robert Haas (#3)
Re: [PATCH] Largeobject Access Controls (r2432)

Robert Haas wrote:

On Thu, Dec 3, 2009 at 12:49 PM, Jaime Casanova
<jcasanov@systemguards.com.ec> wrote:

This manual will be specific to 8.5, so I think all mentions of the
version should be removed.

Not sure I agree on this point. We have similar mentions elsewhere.

In this particular example, it's bad form because it's even possible
that 8.5 will actually be 9.0. You don't want to refer to a version
number that doesn't even exist for sure yet, lest it leave a loose end
that needs to be cleaned up later if that number is changed before release.

Rewriting in terms like "in earlier versions..." instead is one
approach. Then people will have to manually scan earlier docs to sort
that out, I know I end up doing that all the time. If you want to keep
the note specific, saying "in 8.4 and earlier versions [old behavior]"
is better than "before 8.5 [old behavior]" because it only mentions
version numbers that are historical rather than future.

--
Greg Smith 2ndQuadrant Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com www.2ndQuadrant.com

#5 Robert Haas
robertmhaas@gmail.com
In reply to: Greg Smith (#4)
Re: [PATCH] Largeobject Access Controls (r2432)

On Thu, Dec 3, 2009 at 1:23 PM, Greg Smith <greg@2ndquadrant.com> wrote:

Robert Haas wrote:
On Thu, Dec 3, 2009 at 12:49 PM, Jaime Casanova
<jcasanov@systemguards.com.ec> wrote:

This manual will be specific to 8.5, so I think all mentions of the
version should be removed.

Not sure I agree on this point. We have similar mentions elsewhere.

In this particular example, it's bad form because it's even possible that
8.5 will actually be 9.0.  You don't want to refer to a version number that
doesn't even exist for sure yet, lest it leave a loose end that needs to be
cleaned up later if that number is changed before release.

Rewriting in terms like "in earlier versions..." instead is one approach.
Then people will have to manually scan earlier docs to sort that out, I know
I end up doing that all the time.  If you want to keep the note specific,
saying "in 8.4 and earlier versions [old behavior]" is better than "before
8.5 [old behavior]" because it only mentions version numbers that are
historical rather than future.

Ah, yes, I like "In 8.4 and earlier versions", or maybe "earlier
releases". Compare:

http://www.postgresql.org/docs/8.4/static/sql-copy.html#AEN55855
http://www.postgresql.org/docs/8.4/static/runtime-config-logging.html#GUC-LOG-FILENAME

...Robert

#6 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#5)
Re: [PATCH] Largeobject Access Controls (r2432)

Robert Haas <robertmhaas@gmail.com> writes:

On Thu, Dec 3, 2009 at 1:23 PM, Greg Smith <greg@2ndquadrant.com> wrote:

In this particular example, it's bad form because it's even possible that
8.5 will actually be 9.0. You don't want to refer to a version number that
doesn't even exist for sure yet, lest it leave a loose end that needs to be
cleaned up later if that number is changed before release.

Ah, yes, I like "In 8.4 and earlier versions", or maybe "earlier
releases". Compare:

Please do *not* resort to awkward constructions just to avoid one
mention of the current version number. If we did decide to call the
next version 9.0, the search-and-replace effort involved is not going
to be measurably affected by any one usage. There are plenty already.

(I did the work when we decided to call 7.5 8.0, so I know whereof
I speak.)

regards, tom lane

#7 Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#6)
Re: [PATCH] Largeobject Access Controls (r2432)

On Thu, Dec 3, 2009 at 2:25 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

On Thu, Dec 3, 2009 at 1:23 PM, Greg Smith <greg@2ndquadrant.com> wrote:

In this particular example, it's bad form because it's even possible that
8.5 will actually be 9.0.  You don't want to refer to a version number that
doesn't even exist for sure yet, lest it leave a loose end that needs to be
cleaned up later if that number is changed before release.

Ah, yes, I like "In 8.4 and earlier versions", or maybe "earlier
releases".  Compare:

Please do *not* resort to awkward constructions just to avoid one
mention of the current version number.  If we did decide to call the
next version 9.0, the search-and-replace effort involved is not going
to be measurably affected by any one usage.  There are plenty already.

(I did the work when we decided to call 7.5 8.0, so I know whereof
I speak.)

I agree that search and replace isn't that hard, but I don't find the
proposed construction awkward, and we have various uses of it in the
docs already. Actually the COPY one is not quite clear whether it
means <= 7.3 or < 7.3. I think we're just aiming for consistency here
as much as anything.

...Robert

#8 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#7)
Re: [PATCH] Largeobject Access Controls (r2432)

Robert Haas <robertmhaas@gmail.com> writes:

I agree that search and replace isn't that hard, but I don't find the
proposed construction awkward, and we have various uses of it in the
docs already. Actually the COPY one is not quite clear whether it
means <= 7.3 or < 7.3. I think we're just aiming for consistency here
as much as anything.

Well, the problem is that "<= 8.4" is confusing as to whether it
includes 8.4.n. You and I know that it does because we know we
don't make feature changes in minor releases, but this is not
necessarily obvious to everyone. "< 8.5" is much less ambiguous.

regards, tom lane

#9 Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#8)
Re: [PATCH] Largeobject Access Controls (r2432)

On Thu, Dec 3, 2009 at 3:33 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

I agree that search and replace isn't that hard, but I don't find the
proposed construction awkward, and we have various uses of it in the
docs already.  Actually the COPY one is not quite clear whether it
means <= 7.3 or < 7.3.  I think we're just aiming for consistency here
as much as anything.

Well, the problem is that "<= 8.4" is confusing as to whether it
includes 8.4.n.  You and I know that it does because we know we
don't make feature changes in minor releases, but this is not
necessarily obvious to everyone.  "< 8.5" is much less ambiguous.

Ah. I would not have considered that, but it does make sense.

...Robert

#10 Greg Smith
greg@2ndquadrant.com
In reply to: Robert Haas (#7)
Re: [PATCH] Largeobject Access Controls (r2432)

Robert Haas wrote:

I agree that search and replace isn't that hard, but I don't find the
proposed construction awkward, and we have various uses of it in the
docs already. Actually the COPY one is not quite clear whether it
means <= 7.3 or < 7.3.

Yeah, I wouldn't have suggested it if it made the wording particularly
difficult in the process. I don't know what your issue with the COPY
one is:

"The following syntax was used before PostgreSQL version 7.3 and is
still supported"

I can't parse that as anything other than "<7.3"; not sure how someone
can read that to be "<="?

In any case, the two examples you gave are certainly good for showing
the standard practices used here. Specific version numbers are strewn
all about, and if there are commits mentioning 8.5 already in there, one
more won't hurt.

--
Greg Smith 2ndQuadrant Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com www.2ndQuadrant.com

#11 KaiGai Kohei
kaigai@ak.jp.nec.com
In reply to: Jaime Casanova (#2)
Re: [PATCH] Largeobject Access Controls (r2432)

Jaime Casanova wrote:

2009/11/12 KaiGai Kohei <kaigai@ak.jp.nec.com>:

The attached patch is a revised version of large object privileges
based on the feedback from the last commit fest.

Please update the patch; it's giving an error when 'make check' is
trying to "create template1" in initdb:

creating template1 database in
/home/postgres/pg_releases/pgsql/src/test/regress/./tmp_check/data/base/1
... TRAP: FailedAssertion("!(reln->md_fd[forkNum] == ((void *)0))",
File: "md.c", Line: 254)
child process was terminated by signal 6: Aborted

I could not reproduce it.
Could you run "make clean", then "make check"?
Various kinds of patches have been merged during the commit fest, and some
of them change the definitions of structures. If *.o files were built
against the older definitions, they may refer to incorrect addresses.

Meanwhile, I will make some comments:

This manual will be specific to 8.5, so I think all mentions of the
version should be removed.
For example:

+    In this version, a large object has OID of its owner, access permissions
+    and OID of the largeobject itself.
+         Prior to the version 8.5.x release does not have any
privilege checks on
+   large objects.

The conclusion is unclear to me.

Is "In the 8.4.x and prior release, ..." an ambiguous expression?

The parameter name (large_object_privilege_checks) is confusing enough
that we have to add these statements to clarify it... let's think of a
better, less confusing name:
+         Please note that it is not equivalent to disable all the security
+         checks corresponding to large objects.
+         For example, the <literal>lo_import()</literal> and
+         <literal>lo_export</literal> need superuser privileges independent
+         from this setting as prior versions were doing.

In the last commit fest it was named "largeobject_compat_acl",
but Tom Lane did not find that preferable, so he suggested renaming it
to "large_object_privilege_checks".

Other candidates:
- lo_compat_privileges (<- my preference of the four)
- large_object_compat_privs
- lo_compat_access_control
- large_object_compat_ac

I think "_compat_" should be included to emphasize that it is a
compatibility option.

Will this not be off by default? It should be, for compatibility
reasons... I remember there was a discussion about that but can't
remember the conclusion.

IIRC, we had no discussion about its default value, although similar topics
were discussed (see the sketch after this list):

* What should be checked on creation of a large object?
-> No need to check permissions on creation. Everyone is allowed to create
a new large object, as in the current implementation.
(Also note that this behavior may be changed in the future.)

* Should DELETE be checked on deletion of a large object?
-> No. PostgreSQL checks ownership of a database object on its deletion,
as with DROP TABLE. The DELETE permission is checked when we delete the
contents of a database object, not the database object itself.
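
In SQL terms, the agreed behavior is roughly the following (the OID below
is only an example):

-- anyone may create a new large object; no permission check on creation
SELECT lo_create(0);    -- returns the OID of the new large object

-- dropping it checks ownership of the large object itself, not a
-- DELETE privilege on its contents
SELECT lo_unlink(16385);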

Mmm... One of them? the first?
+ The one is <literal>SELECT</literal>.

+     Even if a transaction modified access rights and commit it, it is
+     not invisible from other transaction which already opened the large
+     object.

The other one, the second
+ The other is <literal>UPDATE</literal>.

I have no argument about the English wording.

BTW, "The one is ..., the other is ..." was a textbook pattern
for introducing two things. :-)

it seems there is an "are" that should not be there :)
+
+     These functions are originally requires database superuser privilege,
+     and it allows to bypass the default database privilege checks, so
+     we don't need to check an obvious test twice.
a typo, obviously
+        For largeo bjects, this privilege also allows to read from
+        the target large object.

Thanks, I see.

We have two versions of these functions: one that receives a Snapshot
parameter and one that doesn't...
What is the rationale for this? AFAIU, the one that doesn't receive a
Snapshot calls the other one with SnapshotNow; can't we simply
call it that way and drop the version of the functions that doesn't
have that parameter?
+ pg_largeobject_aclmask(Oid lobj_oid, Oid roleid,
+                      AclMode mask, AclMaskHow how)

+ pg_largeobject_aclcheck(Oid lobj_oid, Oid roleid, AclMode mode)

We have no reason other than cosmetics.

In the current implementation, every caller of pg_largeobject_aclcheck_*()
needs to provide the correct snapshot, which is SnapshotNow in read-write
mode. When pg_aclmask() calls pg_largeobject_aclmask(), that is the only
case where the caller assumes SnapshotNow is applied implicitly.

On the other hand, in every case where pg_largeobject_ownercheck() is
called, the caller assumes SnapshotNow is applied, so we don't need
multiple versions there.

So, I'll reorganize these APIs as follows:
- pg_largeobject_aclmask_snapshot()
- pg_largeobject_aclcheck_snapshot()
- pg_largeobject_ownercheck()

Thanks; please wait for the revised version.
--
OSS Platform Development Division, NEC
KaiGai Kohei <kaigai@ak.jp.nec.com>

#12 Itagaki Takahiro
itagaki.takahiro@oss.ntt.co.jp
In reply to: KaiGai Kohei (#11)
Re: [PATCH] Largeobject Access Controls (r2432)

KaiGai Kohei <kaigai@ak.jp.nec.com> wrote:

creating template1 database in
/home/postgres/pg_releases/pgsql/src/test/regress/./tmp_check/data/base/1
... TRAP: FailedAssertion("!(reln->md_fd[forkNum] == ((void *)0))",
File: "md.c", Line: 254)
child process was terminated by signal 6: Aborted

I could not reproduce it.

I hit the same trap before, when I mistakenly used duplicated OIDs.
Did you add a new catalog with an already-used OID?
src/include/catalog/duplicate_oids might be of help.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center

#13 KaiGai Kohei
kaigai@ak.jp.nec.com
In reply to: Itagaki Takahiro (#12)
Re: [PATCH] Largeobject Access Controls (r2432)

Itagaki Takahiro wrote:

KaiGai Kohei <kaigai@ak.jp.nec.com> wrote:

creating template1 database in
/home/postgres/pg_releases/pgsql/src/test/regress/./tmp_check/data/base/1
... TRAP: FailedAssertion("!(reln->md_fd[forkNum] == ((void *)0))",
File: "md.c", Line: 254)
child process was terminated by signal 6: Aborted

I could not reproduce it.

I hit the same trap before, when I mistakenly used duplicated OIDs.
Did you add a new catalog with an already-used OID?
src/include/catalog/duplicate_oids might be of help.

Thanks, Bingo!

toasting.h:DECLARE_TOAST(pg_trigger, 2336, 2337);

pg_largeobject_metadata.h:CATALOG(pg_largeobject_metadata,2336)

--
OSS Platform Development Division, NEC
KaiGai Kohei <kaigai@ak.jp.nec.com>

#14 KaiGai Kohei
kaigai@ak.jp.nec.com
In reply to: Jaime Casanova (#2)
1 attachment(s)
[PATCH] Largeobject Access Controls (r2460)

The attached patch is an updated revision of Largeobject Access Controls.

List of updates:
* rebased to the latest CVS HEAD

* SGML documentation fixes:
- The future version number was replaced, as in:
"In the 8.4.x series and earlier releases, ..."
- Other odd English phrasings and typos were fixed.

* Fixed OID conflicts in the system catalog definitions.
The new TOAST relation for pg_trigger used the same OID as
pg_largeobject_metadata.

* Fixed incorrect error code in pg_largeobject_ownercheck().
It raised _UNDEFINED_FUNCTION, but should be _UNDEFINED_OBJECT.

* Renamed the GUC parameter to "lo_compat_privileges" from
"large_object_privilege_checks".

* pg_largeobject_aclmask() and pg_largeobject_aclcheck(), which did not
take a snapshot argument, were removed.
Now the callers provide an appropriate snapshot themselves.

Thanks,

Jaime Casanova wrote:

2009/11/12 KaiGai Kohei <kaigai@ak.jp.nec.com>:

The attached patch is a revised version of large object privileges
based on the feedback from the last commit fest.

Please update the patch; it's giving an error when 'make check' is
trying to "create template1" in initdb:

creating template1 database in
/home/postgres/pg_releases/pgsql/src/test/regress/./tmp_check/data/base/1
... TRAP: FailedAssertion("!(reln->md_fd[forkNum] == ((void *)0))",
File: "md.c", Line: 254)
child process was terminated by signal 6: Aborted

Meanwhile, I will make some comments:

This manual will be specific to 8.5, so I think all mentions of the
version should be removed.
For example:

+    In this version, a large object has OID of its owner, access permissions
+    and OID of the largeobject itself.
+         Prior to the version 8.5.x release does not have any
privilege checks on
+   large objects.
The parameter name (large_object_privilege_checks) is confusing enough
that we have to add these statements to clarify it... let's think of a
better, less confusing name:
+         Please note that it is not equivalent to disable all the security
+         checks corresponding to large objects.
+         For example, the <literal>lo_import()</literal> and
+         <literal>lo_export</literal> need superuser privileges independent
+         from this setting as prior versions were doing.

Will this not be off by default? It should be, for compatibility
reasons... I remember there was a discussion about that but can't
remember the conclusion.

Mmm... One of them? the first?
+ The one is <literal>SELECT</literal>.

+     Even if a transaction modified access rights and commit it, it is
+     not invisible from other transaction which already opened the large
+     object.

The other one, the second
+ The other is <literal>UPDATE</literal>.

it seems there is an "are" that should not be there :)
+
+     These functions are originally requires database superuser privilege,
+     and it allows to bypass the default database privilege checks, so
+     we don't need to check an obvious test twice.
a typo, obviously
+        For largeo bjects, this privilege also allows to read from
+        the target large object.
We have two versions of these functions: one that receives a Snapshot
parameter and one that doesn't...
What is the rationale for this? AFAIU, the one that doesn't receive a
Snapshot calls the other one with SnapshotNow; can't we simply
call it that way and drop the version of the functions that doesn't
have that parameter?
+ pg_largeobject_aclmask(Oid lobj_oid, Oid roleid,
+                      AclMode mask, AclMaskHow how)

+ pg_largeobject_aclcheck(Oid lobj_oid, Oid roleid, AclMode mode)

--
OSS Platform Development Division, NEC
KaiGai Kohei <kaigai@ak.jp.nec.com>

Attachments:

sepgsql-02-blob-8.5devel-r2460.patch.gz (application/gzip)
#15 Greg Smith
greg@2ndquadrant.com
In reply to: KaiGai Kohei (#14)
Re: [PATCH] Largeobject Access Controls (r2460)

I just looked over the latest version of this patch and it seems to
satisfy all the issues suggested by the initial review. This looks like
it's ready for a committer from a quality perspective and I'm going to
mark it as such.

I have a guess what some of the first points of discussion are going to
be though, so might as well raise them here. This patch is 2.8K lines
of code that's in a lot of places: a mix of full new functions, tweaks
to existing ones, docs, and regression tests; it's a well-structured but
somewhat heavy bit of work. One obvious question is whether there's
enough demand for access controls on large objects to justify adding the
complexity involved to do so. A second thing I'm concerned about is
what implications this change would have for in-place upgrades. If
there's demand and it's not going to cause upgrade issues, then we just
need to find a committer willing to chew on it. I think those are the
main hurdles left for this patch.

--
Greg Smith 2ndQuadrant Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com www.2ndQuadrant.com

#16 KaiGai Kohei
kaigai@ak.jp.nec.com
In reply to: Greg Smith (#15)
Re: [PATCH] Largeobject Access Controls (r2460)

Greg Smith wrote:

I just looked over the latest version of this patch and it seems to
satisfy all the issues suggested by the initial review. This looks like
it's ready for a committer from a quality perspective and I'm going to
mark it as such.

Thanks for your efforts.

I have a guess what some of the first points of discussion are going to
be though, so might as well raise them here. This patch is 2.8K lines
of code that's in a lot of places: a mix of full new functions, tweaks
to existing ones, docs, regression tests, it's a well structured but
somewhat heavy bit of work. One obvious questions is whether there's
enough demand for access controls on large objects to justify adding the
complexity involved to do so.

At least, it is a todo item in the community:
http://wiki.postgresql.org/wiki/Todo#Binary_Data

Apart from SELinux, it is quite natural to apply access controls to
binary data. If we did not have any valid access controls, users would
not want to store their sensitive information, such as confidential PDF
files, as large objects.

A second thing I'm concerned about is
what implications this change would have for in-place upgrades. If
there's demand and it's not going to cause upgrade issues, then we just
need to find a committer willing to chew on it. I think those are the
main hurdles left for this patch.

I guess we need to create an empty entry with the given OID in
pg_largeobject_metadata for each large object when we upgrade in place
from an 8.4.x or earlier release to the upcoming release.
However, there are no format changes in the pg_largeobject catalog (an
empty large object simply has no rows there), so I guess we only need a
small amount of additional support in pg_dump to create the empty metadata.
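
For example, pg_dump might emit something like the following for each
pre-existing large object (just a sketch, not actual pg_dump output; the
OID and the owner name are examples):

-- recreate the metadata entry, preserving the original OID
SELECT lo_create(16385);
-- restore the ownership recorded in the old cluster
ALTER LARGE OBJECT 16385 OWNER TO original_owner;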

I would welcome any suggestions here.

Thanks,
--
OSS Platform Development Division, NEC
KaiGai Kohei <kaigai@ak.jp.nec.com>

#17 Jaime Casanova
jcasanov@systemguards.com.ec
In reply to: Greg Smith (#15)
Re: [PATCH] Largeobject Access Controls (r2460)

On Sun, Dec 6, 2009 at 11:19 PM, Greg Smith <greg@2ndquadrant.com> wrote:

I just looked over the latest version of this patch and it seems to satisfy
all the issues suggested by the initial review.  This looks like it's ready
for a committer from a quality perspective and I'm going to mark it as such.

Yes. I have just finished my tests and it seems like the patch is
working just fine...

BTW, it seems KaiGai missed this comment in
src/backend/catalog/pg_largeobject.c when renaming the parameter:
* large_object_privilege_checks is not refered here,

I still don't like the name, but we have changed it a lot of times, so
if anyone has a better idea, now is the time to speak.

--
Atentamente,
Jaime Casanova
Soporte y capacitación de PostgreSQL
Asesoría y desarrollo de sistemas
Guayaquil - Ecuador
Cel. +59387171157

#18 KaiGai Kohei
kaigai@ak.jp.nec.com
In reply to: Jaime Casanova (#17)
1 attachment(s)
Re: [PATCH] Largeobject Access Controls (r2460)

Jaime Casanova wrote:

On Sun, Dec 6, 2009 at 11:19 PM, Greg Smith <greg@2ndquadrant.com> wrote:

I just looked over the latest version of this patch and it seems to satisfy
all the issues suggested by the initial review. This looks like it's ready
for a committer from a quality perspective and I'm going to mark it as such.

Yes. I have just finished my tests and it seems like the patch is
working just fine...

BTW, it seems KaiGai missed this comment in
src/backend/catalog/pg_largeobject.c when renaming the parameter:
* large_object_privilege_checks is not refered here,

I still don't like the name, but we have changed it a lot of times, so
if anyone has a better idea, now is the time to speak.

Oops, it should be fixed to "lo_compat_privileges".
This comment also has the version number issue, so I fixed it as follows:

BEFORE:
/*
* large_object_privilege_checks is not refered here,
* because it is a compatibility option, but we don't
* have ALTER LARGE OBJECT prior to the v8.5.0.
*/

AFTER:
/*
* The 'lo_compat_privileges' is not checked here, because we
* don't have any access control features in the 8.4.x series
* or earlier releases.
* So, it is not a place where we can define a compatible behavior.
*/

Nothing else is changed, including anything corresponding to
in-place upgrading. I'm waiting for suggestions.

Thanks,
--
OSS Platform Development Division, NEC
KaiGai Kohei <kaigai@ak.jp.nec.com>

Attachments:

sepgsql-02-blob-8.5devel-r2461.patch.gz (application/gzip)
#19 Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: KaiGai Kohei (#16)
Re: [PATCH] Largeobject Access Controls (r2460)

KaiGai Kohei <kaigai@ak.jp.nec.com> wrote:

Apart from SELinux, it is quite natural to apply access
controls to binary data. If we did not have any valid access
controls, users would not want to store their sensitive
information, such as confidential PDF files, as large objects.

Absolutely. The historical security issues for large objects
immediately eliminated them as a possible place to store PDF files.

-Kevin

#20 Takahiro Itagaki
itagaki.takahiro@oss.ntt.co.jp
In reply to: KaiGai Kohei (#18)
Re: Largeobject Access Controls (r2460)

Hi, I'm reviewing LO-AC patch.

KaiGai Kohei <kaigai@ak.jp.nec.com> wrote:

Nothing else is changed, including anything corresponding to
in-place upgrading. I'm waiting for suggestions.

I have a question about the behavior -- the patch adds ownership
management of large objects. Non-privileged users cannot read, write,
or drop others' LOs. But they can read the contents of a large object
when they read pg_catalog.pg_largeobject directly. Even if the patch
is applied, we still allow "SELECT * FROM pg_largeobject" ...right?

This issue might be solved by the core SE-PgSQL patch,
but what should we do for now?

Other changes in the patch seem to be reasonable.

"GRANT/REVOKE ON LARGE OBJECT <number>" might be hard to use if used alone,
but we can use the commands as dynamic SQLs in DO statements if we want to
grant or revoke privileges in bulk.
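
For instance, something like this (just a sketch; the role name "webuser"
is only an example) could grant read access on every existing large object:

DO $$
DECLARE
    lo record;
BEGIN
    -- iterate over all large objects and grant read access on each one
    FOR lo IN SELECT oid FROM pg_largeobject_metadata LOOP
        EXECUTE 'GRANT SELECT ON LARGE OBJECT ' || lo.oid || ' TO webuser';
    END LOOP;
END;
$$;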

"SELECT oid FROM pg_largeobject_metadata" is used in some places instead of
"SELECT DISTINCT loid FROM pg_largeobject". They return the same result,
but the former will be faster because we don't use DISTINCT. pg_dump will
be slightly accelerated by the new query.

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center

#21 KaiGai Kohei
kaigai@ak.jp.nec.com
In reply to: Takahiro Itagaki (#20)
1 attachment(s)
Re: Largeobject Access Controls (r2460)

Takahiro Itagaki wrote:

Hi, I'm reviewing LO-AC patch.

KaiGai Kohei <kaigai@ak.jp.nec.com> wrote:

Nothing else is changed, including anything corresponding to
in-place upgrading. I'm waiting for suggestions.

I have a question about the behavior -- the patch adds ownership
management of large objects. Non-privileged users cannot read, write,
or drop others' LOs. But they can read the contents of a large object
when they read pg_catalog.pg_largeobject directly. Even if the patch
is applied, we still allow "SELECT * FROM pg_largeobject" ...right?

This issue might be solved by the core SE-PgSQL patch,
but what should we do for now?

Oops, I forgot to fix it.

It is a misconfiguration in database initialization, not an issue related
to the SE-PgSQL feature.

It can be solved by revoking all privileges from everybody in the initdb
phase. So, we should inject the following statement into setup_privileges():

REVOKE ALL ON pg_largeobject FROM PUBLIC;

In the default PG model, the database superuser is an exception to access
controls, so he can bypass the checks anyway. There is no difference here,
even if he can see pg_largeobject.
For unprivileged users, this configuration restricts access to large
objects to the lo_*() functions, so we can capture all of their accesses
and apply permission checks comprehensively.
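
For example (a sketch; the OID is arbitrary), an unprivileged session is
then forced through the checked entry points:

-- direct access to the catalog now fails with a permission error ...
SELECT data FROM pg_catalog.pg_largeobject;

-- ... so every read has to go through the lo_*() functions, where the
-- new privilege checks are applied
BEGIN;
SELECT lo_open(16385, 262144);    -- 262144 = INV_READ
SELECT loread(0, 32);
COMMIT;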

If the database superuser intends something malicious, such as exposing
large objects to the public, we have to give up anyway.

Other changes in the patch seem to be reasonable.

"GRANT/REVOKE ON LARGE OBJECT <number>" might be hard to use if used alone,
but we can use the commands as dynamic SQLs in DO statements if we want to
grant or revoke privileges in bulk.

We already have COMMENT ON LARGE OBJECT <number> IS <comment>; statement :-)

"SELECT oid FROM pg_largeobject_metadata" is used in some places instead of
"SELECT DISTINCT loid FROM pg_largeobject". They return the same result,
but the former will be faster because we don't use DISTINCT. pg_dump will
be slightly accelerated by the new query.

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center

--
OSS Platform Development Division, NEC
KaiGai Kohei <kaigai@ak.jp.nec.com>

Attachments:

pgsql-blob-fix-initdb.patchtext/x-patch; name=pgsql-blob-fix-initdb.patchDownload
*** base/src/bin/initdb/initdb.c	2009-11-21 05:52:12.000000000 +0900
--- blob/src/bin/initdb/initdb.c	2009-12-13 06:33:55.000000000 +0900
*************** setup_privileges(void)
*** 1783,1788 ****
--- 1783,1789 ----
  		"  WHERE relkind IN ('r', 'v', 'S') AND relacl IS NULL;\n",
  		"GRANT USAGE ON SCHEMA pg_catalog TO PUBLIC;\n",
  		"GRANT CREATE, USAGE ON SCHEMA public TO PUBLIC;\n",
+ 		"REVOKE ALL ON pg_largeobject FROM PUBLIC;\n",
  		NULL
  	};
  
#22 Takahiro Itagaki
itagaki.takahiro@oss.ntt.co.jp
In reply to: KaiGai Kohei (#21)
Re: Largeobject Access Controls (r2460)

KaiGai Kohei <kaigai@ak.jp.nec.com> wrote:

we still allow "SELECT * FROM pg_largeobject" ...right?

It can be solved by revoking all privileges from everybody in the initdb
phase. So, we should inject the following statement into setup_privileges():

REVOKE ALL ON pg_largeobject FROM PUBLIC;

OK, I'll add the following description to the documentation of pg_largeobject.

<structname>pg_largeobject</structname> should not be readable by the
public, since the catalog contains data in large objects of all users.
<structname>pg_largeobject_metadata</> is a publicly readable catalog
that only contains identifiers of large objects.

{"lo_compat_privileges", PGC_SUSET, COMPAT_OPTIONS_PREVIOUS,
gettext_noop("Turn on/off privilege checks on large objects."),

The description is true, but causes confusion, because
"lo_compat_privileges = on" means "privilege checks are turned off".

short desc: Enables backward compatibility in privilege checks on large objects
long desc: When turned on, privilege checks on large objects are disabled.

Are those descriptions appropriate?

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center

#23 KaiGai Kohei
kaigai@ak.jp.nec.com
In reply to: Takahiro Itagaki (#22)
Re: Largeobject Access Controls (r2460)

Takahiro Itagaki wrote:

KaiGai Kohei <kaigai@ak.jp.nec.com> wrote:

we still allow "SELECT * FROM pg_largeobject" ...right?

It can be solved with revoking any privileges from anybody in the initdb
phase. So, we should inject the following statement for setup_privileges().

REVOKE ALL ON pg_largeobject FROM PUBLIC;

OK, I'll add the following description to the documentation of pg_largeobject.

<structname>pg_largeobject</structname> should not be readable by the
public, since the catalog contains data in large objects of all users.
<structname>pg_largeobject_metadata</> is a publicly readable catalog
that only contains identifiers of large objects.

It makes sense to me.

{"lo_compat_privileges", PGC_SUSET, COMPAT_OPTIONS_PREVIOUS,
gettext_noop("Turn on/off privilege checks on large objects."),

The description is true, but causes confusion, because
"lo_compat_privileges = on" means "privilege checks are turned off".

short desc: Enables backward compatibility in privilege checks on large objects
long desc: When turned on, privilege checks on large objects are disabled.

Are those descriptions appropriate?

The long description is a bit confusing, because the setting does not
disable all the privilege checks; lo_export/lo_import, for example,
already check for superuser privilege in the earlier releases.

What's your opinion about:

long desc: When turned on, privilege checks on large objects perform with
backward compatibility as 8.4.x or earlier releases.

Thanks,
--
OSS Platform Development Division, NEC
KaiGai Kohei <kaigai@ak.jp.nec.com>

#24 Takahiro Itagaki
itagaki.takahiro@oss.ntt.co.jp
In reply to: KaiGai Kohei (#23)
Re: Largeobject Access Controls (r2460)

KaiGai Kohei <kaigai@ak.jp.nec.com> wrote:

What's your opinion about:
long desc: When turned on, privilege checks on large objects perform with
backward compatibility as 8.4.x or earlier releases.

I updated the description as you suggested.

Applied with minor editorialization,
mainly around tab-completion support in psql.

# This is my first commit :)

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center

#25 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Takahiro Itagaki (#22)
Re: Largeobject Access Controls (r2460)

Takahiro Itagaki <itagaki.takahiro@oss.ntt.co.jp> writes:

OK, I'll add the following description to the documentation of pg_largeobject.

<structname>pg_largeobject</structname> should not be readable by the
public, since the catalog contains data in large objects of all users.

This is going to be a problem, because it will break applications that
expect to be able to read pg_largeobject. Like, say, pg_dump.

regards, tom lane

#26 KaiGai Kohei
kaigai@ak.jp.nec.com
In reply to: Tom Lane (#25)
Re: Largeobject Access Controls (r2460)

Tom Lane wrote:

Takahiro Itagaki <itagaki.takahiro@oss.ntt.co.jp> writes:

OK, I'll add the following description to the documentation of pg_largeobject.

<structname>pg_largeobject</structname> should not be readable by the
public, since the catalog contains data in large objects of all users.

This is going to be a problem, because it will break applications that
expect to be able to read pg_largeobject. Like, say, pg_dump.

Is that the right behavior, even if we have permission checks on large objects?

If so, we can inject a hardwired rule to prevent selecting from pg_largeobject
when lo_compat_privileges is turned off, instead of REVOKE ALL FROM PUBLIC.

--
OSS Platform Development Division, NEC
KaiGai Kohei <kaigai@ak.jp.nec.com>

#27 Jaime Casanova
jcasanov@systemguards.com.ec
In reply to: KaiGai Kohei (#26)
Re: Largeobject Access Controls (r2460)

2009/12/10 KaiGai Kohei <kaigai@ak.jp.nec.com>:

If so, we can inject a hardwired rule to prevent selecting from pg_largeobject
when lo_compat_privileges is turned off, instead of REVOKE ALL FROM PUBLIC.

It doesn't seem like a good idea to make that GUC act like a GRANT or
REVOKE in the case of pg_largeobject.
Besides, if a normal user can read from pg_class, why do we deny
pg_largeobject?

--
Atentamente,
Jaime Casanova
Soporte y capacitación de PostgreSQL
Asesoría y desarrollo de sistemas
Guayaquil - Ecuador
Cel. +59387171157

#28Takahiro Itagaki
itagaki.takahiro@oss.ntt.co.jp
In reply to: KaiGai Kohei (#26)
Re: Largeobject Access Controls (r2460)

KaiGai Kohei <kaigai@ak.jp.nec.com> wrote:

Tom Lane wrote:

Takahiro Itagaki <itagaki.takahiro@oss.ntt.co.jp> writes:

<structname>pg_largeobject</structname> should not be readable by the
public, since the catalog contains data in large objects of all users.

This is going to be a problem, because it will break applications that
expect to be able to read pg_largeobject. Like, say, pg_dump.

Is it the right behavior, even if we have permission checks on large objects?

Can we use column-level access control here?

REVOKE ALL ON pg_largeobject FROM PUBLIC;
=> GRANT SELECT (loid) ON pg_largeobject TO PUBLIC;

We use "SELECT loid FROM pg_largeobject LIMIT 1" in pg_dump. We could
switch it to pg_largeobject_metadata if we only wanted to fix pg_dump,
but it's no surprise if other user applications use similar queries.
I think allowing loid to be read is a balanced solution.
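
As a sketch, the combination above would behave like this (regressuser1
stands in for any unprivileged role):

    REVOKE ALL ON pg_largeobject FROM PUBLIC;
    GRANT SELECT (loid) ON pg_largeobject TO PUBLIC;

    SET SESSION AUTHORIZATION regressuser1;
    SELECT loid FROM pg_largeobject LIMIT 1;   -- allowed
    SELECT data FROM pg_largeobject LIMIT 1;   -- permission denied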

If so, we can inject a hardwired rule to prevent selecting from pg_largeobject
when lo_compat_privileges is turned off, instead of REVOKE ALL FROM PUBLIC.

Is it enough to run "GRANT SELECT ON pg_largeobject TO PUBLIC"?

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center

#29KaiGai Kohei
kaigai@ak.jp.nec.com
In reply to: Takahiro Itagaki (#28)
Re: Largeobject Access Controls (r2460)

Takahiro Itagaki wrote:

KaiGai Kohei <kaigai@ak.jp.nec.com> wrote:

Tom Lane wrote:

Takahiro Itagaki <itagaki.takahiro@oss.ntt.co.jp> writes:

<structname>pg_largeobject</structname> should not be readable by the
public, since the catalog contains data in large objects of all users.

This is going to be a problem, because it will break applications that
expect to be able to read pg_largeobject. Like, say, pg_dump.

Is it the right behavior, even if we have permission checks on large objects?

Can we use column-level access control here?

REVOKE ALL ON pg_largeobject FROM PUBLIC;
=> GRANT SELECT (loid) ON pg_largeobject TO PUBLIC;

Indeed, that seems reasonable to me.

We use "SELECT loid FROM pg_largeobject LIMIT 1" in pg_dump. We could
switch it to pg_largeobject_metadata if we only wanted to fix pg_dump,
but it's no surprise if other user applications use similar queries.
I think allowing loid to be read is a balanced solution.

Right, and this query also has to be fixed for another reason right now.
If all the large objects are empty, this query can return nothing, even if
large object entries exist in pg_largeobject_metadata.

Please wait for a while.

If so, we can inject a hardwired rule to prevent selecting from pg_largeobject
when lo_compat_privileges is turned off, instead of REVOKE ALL FROM PUBLIC.

Is it enough to run "GRANT SELECT ON pg_largeobject TO PUBLIC"?

Agreed.

Thanks,
--
OSS Platform Development Division, NEC
KaiGai Kohei <kaigai@ak.jp.nec.com>

#30Takahiro Itagaki
itagaki.takahiro@oss.ntt.co.jp
In reply to: Jaime Casanova (#27)
Re: Largeobject Access Controls (r2460)

Jaime Casanova <jcasanov@systemguards.com.ec> wrote:

Besides, if a normal user can read from pg_class, why would we deny pg_largeobject?

pg_class and pg_largeobject_metadata contain only the metadata of objects.
Tables and pg_largeobject contain the actual data of the objects. A normal user
can read pg_class, but cannot read the contents of tables without privilege.
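
A quick way to see that distinction (the role and table names here are
made up):

    SET SESSION AUTHORIZATION some_user;
    SELECT relname FROM pg_class WHERE relname = 'other_table'; -- metadata: allowed
    SELECT * FROM other_table;                                  -- data: permission denied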

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center

#31KaiGai Kohei
kaigai@ak.jp.nec.com
In reply to: KaiGai Kohei (#29)
1 attachment(s)
Re: Largeobject Access Controls (r2460)

KaiGai Kohei wrote:

Takahiro Itagaki wrote:

KaiGai Kohei <kaigai@ak.jp.nec.com> wrote:

Tom Lane wrote:

Takahiro Itagaki <itagaki.takahiro@oss.ntt.co.jp> writes:

<structname>pg_largeobject</structname> should not be readable by the
public, since the catalog contains data in large objects of all users.

This is going to be a problem, because it will break applications that
expect to be able to read pg_largeobject. Like, say, pg_dump.

Is it the right behavior, even if we have permission checks on large objects?

Can we use column-level access control here?

REVOKE ALL ON pg_largeobject FROM PUBLIC;
=> GRANT SELECT (loid) ON pg_largeobject TO PUBLIC;

Indeed, that seems reasonable to me.

We use "SELECT loid FROM pg_largeobject LIMIT 1" in pg_dump. We could
switch it to pg_largeobject_metadata if we only wanted to fix pg_dump,
but it's no surprise if other user applications use similar queries.
I think allowing loid to be read is a balanced solution.

Right, and this query also has to be fixed for another reason right now.
If all the large objects are empty, this query can return nothing, even if
large object entries exist in pg_largeobject_metadata.

Please wait for a while.

The attached patch fixes these matters.

* It adds "GRANT SELECT (loid) ON pg_largeobject TO PUBLIC;" during the
initdb phase to resolve the matter pointed out above.

* A few queries in pg_dump were fixed to select from pg_largeobject_metadata
instead of pg_largeobject. If a dumpable large object is empty (meaning it
has no page frames in pg_largeobject), pg_dump wrongly concludes that no
such large object exists.
We have to reference pg_largeobject_metadata to check whether a certain
large object exists or not.

Thanks,

$ diffstat ~/pgsql-blob-priv-fix.patch
doc/src/sgml/catalogs.sgml | 3 !!!
src/bin/initdb/initdb.c | 1 +
src/bin/pg_dump/pg_dump.c | 8 !!!!!!!!
src/test/regress/expected/privileges.out | 15 +++++++++++++++
src/test/regress/sql/privileges.sql | 8 ++++++++
5 files changed, 24 insertions(+), 11 modifications(!)
--
OSS Platform Development Division, NEC
KaiGai Kohei <kaigai@ak.jp.nec.com>

Attachments:

pgsql-blob-priv-fix.patchtext/x-patch; name=pgsql-blob-priv-fix.patchDownload
*** base/doc/src/sgml/catalogs.sgml	(revision 2467)
--- base/doc/src/sgml/catalogs.sgml	(working copy)
***************
*** 3136,3142 ****
  
    <para>
     <structname>pg_largeobject</structname> should not be readable by the
!    public, since the catalog contains data in large objects of all users.
     <structname>pg_largeobject_metadata</> is a publicly readable catalog
     that only contains identifiers of large objects.
    </para>
--- 3136,3143 ----
  
    <para>
     <structname>pg_largeobject</structname> should not be readable by the
!    public (except for <structfield>loid</structfield>), since the catalog
!    contains data in large objects of all users.
     <structname>pg_largeobject_metadata</> is a publicly readable catalog
     that only contains identifiers of large objects.
    </para>
*** base/src/test/regress/sql/privileges.sql	(revision 2467)
--- base/src/test/regress/sql/privileges.sql	(working copy)
***************
*** 565,570 ****
--- 565,578 ----
  SELECT lo_unlink(1002);
  SELECT lo_export(1001, '/dev/null');			-- to be denied
  
+ -- don't allow unpriv users to access pg_largeobject contents
+ \c -
+ SELECT * FROM pg_largeobject LIMIT 0;
+ 
+ SET SESSION AUTHORIZATION regressuser1;
+ SELECT * FROM pg_largeobject LIMIT 0;			-- to be denied
+ SELECT loid FROM pg_largeobject LIMIT 0;
+ 
  -- test default ACLs
  \c -
  
*** base/src/test/regress/expected/privileges.out	(revision 2467)
--- base/src/test/regress/expected/privileges.out	(working copy)
***************
*** 1041,1046 ****
--- 1041,1061 ----
  SELECT lo_export(1001, '/dev/null');			-- to be denied
  ERROR:  must be superuser to use server-side lo_export()
  HINT:  Anyone can use the client-side lo_export() provided by libpq.
+ -- don't allow unpriv users to access pg_largeobject contents
+ \c -
+ SELECT * FROM pg_largeobject LIMIT 0;
+  loid | pageno | data 
+ ------+--------+------
+ (0 rows)
+ 
+ SET SESSION AUTHORIZATION regressuser1;
+ SELECT * FROM pg_largeobject LIMIT 0;			-- to be denied
+ ERROR:  permission denied for relation pg_largeobject
+ SELECT loid FROM pg_largeobject LIMIT 0;
+  loid 
+ ------
+ (0 rows)
+ 
  -- test default ACLs
  \c -
  CREATE SCHEMA testns;
*** base/src/bin/initdb/initdb.c	(revision 2467)
--- base/src/bin/initdb/initdb.c	(working copy)
***************
*** 1784,1789 ****
--- 1784,1790 ----
  		"GRANT USAGE ON SCHEMA pg_catalog TO PUBLIC;\n",
  		"GRANT CREATE, USAGE ON SCHEMA public TO PUBLIC;\n",
  		"REVOKE ALL ON pg_largeobject FROM PUBLIC;\n",
+ 		"GRANT SELECT (loid) ON pg_largeobject TO PUBLIC;\n",
  		NULL
  	};
  
*** base/src/bin/pg_dump/pg_dump.c	(revision 2467)
--- base/src/bin/pg_dump/pg_dump.c	(working copy)
***************
*** 1945,1951 ****
  	selectSourceSchema("pg_catalog");
  
  	/* Check for BLOB OIDs */
! 	if (AH->remoteVersion >= 70100)
  		blobQry = "SELECT loid FROM pg_largeobject LIMIT 1";
  	else
  		blobQry = "SELECT oid FROM pg_class WHERE relkind = 'l' LIMIT 1";
--- 1945,1953 ----
  	selectSourceSchema("pg_catalog");
  
  	/* Check for BLOB OIDs */
! 	if (AH->remoteVersion >= 80500)
! 		blobQry = "SELECT oid FROM pg_largeobject_metadata LIMIT 1";
! 	else if (AH->remoteVersion >= 70100)
  		blobQry = "SELECT loid FROM pg_largeobject LIMIT 1";
  	else
  		blobQry = "SELECT oid FROM pg_class WHERE relkind = 'l' LIMIT 1";
***************
*** 1981,1987 ****
  	selectSourceSchema("pg_catalog");
  
  	/* Cursor to get all BLOB OIDs */
! 	if (AH->remoteVersion >= 70100)
  		blobQry = "DECLARE bloboid CURSOR FOR SELECT DISTINCT loid FROM pg_largeobject";
  	else
  		blobQry = "DECLARE bloboid CURSOR FOR SELECT oid FROM pg_class WHERE relkind = 'l'";
--- 1983,1991 ----
  	selectSourceSchema("pg_catalog");
  
  	/* Cursor to get all BLOB OIDs */
! 	if (AH->remoteVersion >= 80500)
! 		blobQry = "DECLARE bloboid CURSOR FOR SELECT oid FROM pg_largeobject_metadata";
! 	else if (AH->remoteVersion >= 70100)
  		blobQry = "DECLARE bloboid CURSOR FOR SELECT DISTINCT loid FROM pg_largeobject";
  	else
  		blobQry = "DECLARE bloboid CURSOR FOR SELECT oid FROM pg_class WHERE relkind = 'l'";
#32Takahiro Itagaki
itagaki.takahiro@oss.ntt.co.jp
In reply to: KaiGai Kohei (#31)
Re: Largeobject Access Controls (r2460)

KaiGai Kohei <kaigai@ak.jp.nec.com> wrote:

The attached patch fixes these matters.

I'll start to check it.

We have to reference pg_largeobject_metadata to check whether a certain
large object exists or not.

What is the situation where there is a row in pg_largeobject_metadata
and no corresponding rows in pg_largeobject? Do we have a method to
delete all rows in pg_largeobject but leave some metadata?

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center

#33KaiGai Kohei
kaigai@ak.jp.nec.com
In reply to: Takahiro Itagaki (#32)
Re: Largeobject Access Controls (r2460)

Takahiro Itagaki wrote:

KaiGai Kohei <kaigai@ak.jp.nec.com> wrote:

The attached patch fixes these matters.

I'll start to check it.

Thanks,

We have to reference pg_largeobject_metadata to check whether a certain
large object exists or not.

What is the situation where there is a row in pg_largeobject_metadata
and no corresponding rows in pg_largeobject? Do we have a method to
delete all rows in pg_largeobject but leave some metadata?

It is the case when we create a new large object but write nothing.

postgres=# SELECT lo_create(1234);
lo_create
-----------
1234
(1 row)

postgres=# SELECT * FROM pg_largeobject_metadata WHERE oid = 1234;
lomowner | lomacl
----------+--------
10 |
(1 row)

postgres=# SELECT * FROM pg_largeobject WHERE loid = 1234;
loid | pageno | data
------+--------+------
(0 rows)

In this case, the following two queries are not equivalent.
* SELECT oid FROM pg_largeobject_metadata
* SELECT DISTINCT loid FROM pg_largeobject

The second query does not return the loid of empty large objects.

The prior implementation inserted a zero-length page to show that a large
object with this loid exists, but that became unnecessary with this
enhancement.
If we need compatibility at this level, we can insert a zero-length
page into pg_largeobject in LargeObjectCreate().
It is harmless, but its value is uncertain.
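
Concretely, that compatibility tweak would amount to the server doing the
equivalent of the following on lo_create (a sketch; the OID is arbitrary,
and this is internal behavior, not something clients would run):

    INSERT INTO pg_largeobject (loid, pageno, data) VALUES (1234, 0, '');

after which SELECT DISTINCT loid FROM pg_largeobject would see the empty
object again.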

Thanks,
--
OSS Platform Development Division, NEC
KaiGai Kohei <kaigai@ak.jp.nec.com>

#34Takahiro Itagaki
itagaki.takahiro@oss.ntt.co.jp
In reply to: KaiGai Kohei (#33)
1 attachment(s)
Re: Largeobject Access Controls (r2460)

KaiGai Kohei <kaigai@ak.jp.nec.com> wrote:

We have to reference pg_largeobject_metadata to check whether a certain
large object exists or not.

It is the case when we create a new large object but write nothing.

OK, that makes sense.

In addition to the patch, we also need to fix pg_restore with the
--clean option. I added DropBlobIfExists() in pg_backup_db.c.

A revised patch is attached. Please check it for further mistakes.

BTW, we can optimize lo_truncate because we allow metadata-only large
objects. inv_truncate() doesn't have to update the first data tuple to
be zero length. It only has to delete all corresponding tuples, as in:
DELETE FROM pg_largeobject WHERE loid = {obj_desc->id}

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center

Attachments:

pgsql-blob-priv-fix_v2.patchapplication/octet-stream; name=pgsql-blob-priv-fix_v2.patchDownload
diff -cprN head/doc/src/sgml/catalogs.sgml work/doc/src/sgml/catalogs.sgml
*** head/doc/src/sgml/catalogs.sgml	Fri Dec 11 12:39:49 2009
--- work/doc/src/sgml/catalogs.sgml	Fri Dec 11 16:38:25 2009
***************
*** 3136,3142 ****
  
    <para>
     <structname>pg_largeobject</structname> should not be readable by the
!    public, since the catalog contains data in large objects of all users.
     <structname>pg_largeobject_metadata</> is a publicly readable catalog
     that only contains identifiers of large objects.
    </para>
--- 3136,3143 ----
  
    <para>
     <structname>pg_largeobject</structname> should not be readable by the
!    public (except for <structfield>loid</structfield>), since the catalog
!    contains data in large objects of all users.
     <structname>pg_largeobject_metadata</> is a publicly readable catalog
     that only contains identifiers of large objects.
    </para>
Binary files head/dump.dmp and work/dump.dmp differ
diff -cprN head/src/bin/initdb/initdb.c work/src/bin/initdb/initdb.c
*** head/src/bin/initdb/initdb.c	Fri Dec 11 12:39:49 2009
--- work/src/bin/initdb/initdb.c	Fri Dec 11 16:38:25 2009
*************** setup_privileges(void)
*** 1784,1789 ****
--- 1784,1790 ----
  		"GRANT USAGE ON SCHEMA pg_catalog TO PUBLIC;\n",
  		"GRANT CREATE, USAGE ON SCHEMA public TO PUBLIC;\n",
  		"REVOKE ALL ON pg_largeobject FROM PUBLIC;\n",
+ 		"GRANT SELECT (loid) ON pg_largeobject TO PUBLIC;\n",
  		NULL
  	};
  
diff -cprN head/src/bin/pg_dump/pg_backup_archiver.c work/src/bin/pg_dump/pg_backup_archiver.c
*** head/src/bin/pg_dump/pg_backup_archiver.c	Tue Oct  6 16:37:35 2009
--- work/src/bin/pg_dump/pg_backup_archiver.c	Fri Dec 11 17:03:56 2009
*************** StartRestoreBlob(ArchiveHandle *AH, Oid 
*** 914,921 ****
  	ahlog(AH, 2, "restoring large object with OID %u\n", oid);
  
  	if (drop)
! 		ahprintf(AH, "SELECT CASE WHEN EXISTS(SELECT 1 FROM pg_catalog.pg_largeobject WHERE loid = '%u') THEN pg_catalog.lo_unlink('%u') END;\n",
! 				 oid, oid);
  
  	if (AH->connection)
  	{
--- 914,920 ----
  	ahlog(AH, 2, "restoring large object with OID %u\n", oid);
  
  	if (drop)
! 		DropBlobIfExists(AH, oid);
  
  	if (AH->connection)
  	{
diff -cprN head/src/bin/pg_dump/pg_backup_archiver.h work/src/bin/pg_dump/pg_backup_archiver.h
*** head/src/bin/pg_dump/pg_backup_archiver.h	Mon Aug 10 09:40:51 2009
--- work/src/bin/pg_dump/pg_backup_archiver.h	Fri Dec 11 16:56:26 2009
*************** extern void InitArchiveFmt_Tar(ArchiveHa
*** 371,376 ****
--- 371,377 ----
  extern bool isValidTarHeader(char *header);
  
  extern int	ReconnectToServer(ArchiveHandle *AH, const char *dbname, const char *newUser);
+ extern void	DropBlobIfExists(ArchiveHandle *AH, Oid oid);
  
  int			ahwrite(const void *ptr, size_t size, size_t nmemb, ArchiveHandle *AH);
  int			ahprintf(ArchiveHandle *AH, const char *fmt,...) __attribute__((format(printf, 2, 3)));
diff -cprN head/src/bin/pg_dump/pg_backup_db.c work/src/bin/pg_dump/pg_backup_db.c
*** head/src/bin/pg_dump/pg_backup_db.c	Fri Jun 12 09:52:43 2009
--- work/src/bin/pg_dump/pg_backup_db.c	Fri Dec 11 16:57:35 2009
*************** CommitTransaction(ArchiveHandle *AH)
*** 652,657 ****
--- 652,679 ----
  	ExecuteSqlCommand(AH, "COMMIT", "could not commit database transaction");
  }
  
+ void
+ DropBlobIfExists(ArchiveHandle *AH, Oid oid)
+ {
+ 	const char *lo_relname;
+ 	const char *lo_colname;
+ 
+ 	if (PQserverVersion(AH->connection) >= 80500)
+ 	{
+ 		lo_relname = "pg_largeobject_metadata";
+ 		lo_colname = "oid";
+ 	}
+ 	else
+ 	{
+ 		lo_relname = "pg_largeobject";
+ 		lo_colname = "loid";
+ 	}
+ 
+ 	/* Call lo_unlink only if exists to avoid not-found error. */
+ 	ahprintf(AH, "SELECT CASE WHEN EXISTS(SELECT 1 FROM pg_catalog.%s WHERE %s = '%u') THEN pg_catalog.lo_unlink('%u') END;\n",
+ 			 lo_relname, lo_colname, oid, oid);
+ }
+ 
  static bool
  _isIdentChar(unsigned char c)
  {
diff -cprN head/src/bin/pg_dump/pg_backup_null.c work/src/bin/pg_dump/pg_backup_null.c
*** head/src/bin/pg_dump/pg_backup_null.c	Wed Aug  5 10:11:11 2009
--- work/src/bin/pg_dump/pg_backup_null.c	Fri Dec 11 16:58:30 2009
*************** _StartBlob(ArchiveHandle *AH, TocEntry *
*** 151,158 ****
  		die_horribly(AH, NULL, "invalid OID for large object\n");
  
  	if (AH->ropt->dropSchema)
! 		ahprintf(AH, "SELECT CASE WHEN EXISTS(SELECT 1 FROM pg_catalog.pg_largeobject WHERE loid = '%u') THEN pg_catalog.lo_unlink('%u') END;\n",
! 				 oid, oid);
  
  	ahprintf(AH, "SELECT pg_catalog.lo_open(pg_catalog.lo_create('%u'), %d);\n",
  			 oid, INV_WRITE);
--- 151,157 ----
  		die_horribly(AH, NULL, "invalid OID for large object\n");
  
  	if (AH->ropt->dropSchema)
! 		DropBlobIfExists(AH, oid);
  
  	ahprintf(AH, "SELECT pg_catalog.lo_open(pg_catalog.lo_create('%u'), %d);\n",
  			 oid, INV_WRITE);
diff -cprN head/src/bin/pg_dump/pg_dump.c work/src/bin/pg_dump/pg_dump.c
*** head/src/bin/pg_dump/pg_dump.c	Fri Dec 11 12:39:49 2009
--- work/src/bin/pg_dump/pg_dump.c	Fri Dec 11 16:38:26 2009
*************** hasBlobs(Archive *AH)
*** 1945,1951 ****
  	selectSourceSchema("pg_catalog");
  
  	/* Check for BLOB OIDs */
! 	if (AH->remoteVersion >= 70100)
  		blobQry = "SELECT loid FROM pg_largeobject LIMIT 1";
  	else
  		blobQry = "SELECT oid FROM pg_class WHERE relkind = 'l' LIMIT 1";
--- 1945,1953 ----
  	selectSourceSchema("pg_catalog");
  
  	/* Check for BLOB OIDs */
! 	if (AH->remoteVersion >= 80500)
! 		blobQry = "SELECT oid FROM pg_largeobject_metadata LIMIT 1";
! 	else if (AH->remoteVersion >= 70100)
  		blobQry = "SELECT loid FROM pg_largeobject LIMIT 1";
  	else
  		blobQry = "SELECT oid FROM pg_class WHERE relkind = 'l' LIMIT 1";
*************** dumpBlobs(Archive *AH, void *arg)
*** 1981,1987 ****
  	selectSourceSchema("pg_catalog");
  
  	/* Cursor to get all BLOB OIDs */
! 	if (AH->remoteVersion >= 70100)
  		blobQry = "DECLARE bloboid CURSOR FOR SELECT DISTINCT loid FROM pg_largeobject";
  	else
  		blobQry = "DECLARE bloboid CURSOR FOR SELECT oid FROM pg_class WHERE relkind = 'l'";
--- 1983,1991 ----
  	selectSourceSchema("pg_catalog");
  
  	/* Cursor to get all BLOB OIDs */
! 	if (AH->remoteVersion >= 80500)
! 		blobQry = "DECLARE bloboid CURSOR FOR SELECT oid FROM pg_largeobject_metadata";
! 	else if (AH->remoteVersion >= 70100)
  		blobQry = "DECLARE bloboid CURSOR FOR SELECT DISTINCT loid FROM pg_largeobject";
  	else
  		blobQry = "DECLARE bloboid CURSOR FOR SELECT oid FROM pg_class WHERE relkind = 'l'";
diff -cprN head/src/test/regress/expected/privileges.out work/src/test/regress/expected/privileges.out
*** head/src/test/regress/expected/privileges.out	Fri Dec 11 12:39:49 2009
--- work/src/test/regress/expected/privileges.out	Fri Dec 11 16:38:25 2009
*************** SELECT lo_unlink(1002);
*** 1041,1046 ****
--- 1041,1061 ----
  SELECT lo_export(1001, '/dev/null');			-- to be denied
  ERROR:  must be superuser to use server-side lo_export()
  HINT:  Anyone can use the client-side lo_export() provided by libpq.
+ -- don't allow unpriv users to access pg_largeobject contents
+ \c -
+ SELECT * FROM pg_largeobject LIMIT 0;
+  loid | pageno | data 
+ ------+--------+------
+ (0 rows)
+ 
+ SET SESSION AUTHORIZATION regressuser1;
+ SELECT * FROM pg_largeobject LIMIT 0;			-- to be denied
+ ERROR:  permission denied for relation pg_largeobject
+ SELECT loid FROM pg_largeobject LIMIT 0;
+  loid 
+ ------
+ (0 rows)
+ 
  -- test default ACLs
  \c -
  CREATE SCHEMA testns;
diff -cprN head/src/test/regress/sql/privileges.sql work/src/test/regress/sql/privileges.sql
*** head/src/test/regress/sql/privileges.sql	Fri Dec 11 12:39:49 2009
--- work/src/test/regress/sql/privileges.sql	Fri Dec 11 16:38:25 2009
*************** SELECT lo_truncate(lo_open(1002, x'20000
*** 565,570 ****
--- 565,578 ----
  SELECT lo_unlink(1002);
  SELECT lo_export(1001, '/dev/null');			-- to be denied
  
+ -- don't allow unpriv users to access pg_largeobject contents
+ \c -
+ SELECT * FROM pg_largeobject LIMIT 0;
+ 
+ SET SESSION AUTHORIZATION regressuser1;
+ SELECT * FROM pg_largeobject LIMIT 0;			-- to be denied
+ SELECT loid FROM pg_largeobject LIMIT 0;
+ 
  -- test default ACLs
  \c -
  
#35Takahiro Itagaki
itagaki.takahiro@oss.ntt.co.jp
In reply to: Takahiro Itagaki (#34)
1 attachment(s)
Re: Largeobject Access Controls (r2460)

Takahiro Itagaki <itagaki.takahiro@oss.ntt.co.jp> wrote:

In addition to the patch, we also need to fix pg_restore with the
--clean option. I added DropBlobIfExists() in pg_backup_db.c.

A revised patch is attached. Please check it for further mistakes.

...and here is an additional fix for contrib modules.

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center

Attachments:

fix-lo-contrib.patchapplication/octet-stream; name=fix-lo-contrib.patchDownload
Index: contrib/lo/lo_test.sql
===================================================================
--- contrib/lo/lo_test.sql	(head)
+++ contrib/lo/lo_test.sql	(work)
@@ -12,7 +12,7 @@
 --
 
 -- Check what is in pg_largeobject
-SELECT count(DISTINCT loid) FROM pg_largeobject;
+SELECT count(oid) FROM pg_largeobject_metadata;
 
 -- ignore any errors here - simply drop the table if it already exists
 DROP TABLE a;
@@ -74,6 +74,6 @@
 DROP TABLE a;
 
 -- Check what is in pg_largeobject ... if different from original, trouble
-SELECT count(DISTINCT loid) FROM pg_largeobject;
+SELECT count(oid) FROM pg_largeobject_metadata;
 
 -- end of tests
Index: contrib/vacuumlo/vacuumlo.c
===================================================================
--- contrib/vacuumlo/vacuumlo.c	(head)
+++ contrib/vacuumlo/vacuumlo.c	(work)
@@ -142,7 +142,10 @@
 	 */
 	buf[0] = '\0';
 	strcat(buf, "CREATE TEMP TABLE vacuum_l AS ");
-	strcat(buf, "SELECT DISTINCT loid AS lo FROM pg_largeobject ");
+	if (PQserverVersion(conn) >= 80500)
+		strcat(buf, "SELECT oid AS lo FROM pg_largeobject_metadata");
+	else
+		strcat(buf, "SELECT DISTINCT loid AS lo FROM pg_largeobject");
 	res = PQexec(conn, buf);
 	if (PQresultStatus(res) != PGRES_COMMAND_OK)
 	{
#36Bruce Momjian
bruce@momjian.us
In reply to: KaiGai Kohei (#29)
Re: Largeobject Access Controls (r2460)

KaiGai Kohei wrote:

Takahiro Itagaki wrote:

KaiGai Kohei <kaigai@ak.jp.nec.com> wrote:

Tom Lane wrote:

Takahiro Itagaki <itagaki.takahiro@oss.ntt.co.jp> writes:

<structname>pg_largeobject</structname> should not be readable by the
public, since the catalog contains data in large objects of all users.

This is going to be a problem, because it will break applications that
expect to be able to read pg_largeobject. Like, say, pg_dump.

Is it the right behavior, even if we have permission checks on large objects?

Can we use column-level access control here?

REVOKE ALL ON pg_largeobject FROM PUBLIC;
=> GRANT SELECT (loid) ON pg_largeobject TO PUBLIC;

Indeed, that seems reasonable to me.

We use "SELECT loid FROM pg_largeobject LIMIT 1" in pg_dump. We could
switch it to pg_largeobject_metadata if we only wanted to fix pg_dump,
but it's no surprise if other user applications use similar queries.
I think allowing loid to be read is a balanced solution.

Right, and this query also has to be fixed for another reason right now.
If all the large objects are empty, this query can return nothing, even if
large object entries exist in pg_largeobject_metadata.

"metadata" seems very vague. Can't we come up with a more descriptive
name?

Also, how will this affect pg_migrator? pg_migrator copies
pg_largeobject and its index from the old to the new server. Is the
format inside pg_largeobject changed by this patch? What happens when
there is no entry in pg_largeobject_metadata for a specific row?

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

#37KaiGai Kohei
kaigai@kaigai.gr.jp
In reply to: Bruce Momjian (#36)
Re: Largeobject Access Controls (r2460)

Bruce Momjian wrote:

KaiGai Kohei wrote:

Takahiro Itagaki wrote:

KaiGai Kohei <kaigai@ak.jp.nec.com> wrote:

Tom Lane wrote:

Takahiro Itagaki <itagaki.takahiro@oss.ntt.co.jp> writes:

<structname>pg_largeobject</structname> should not be readable by the
public, since the catalog contains data in large objects of all users.

This is going to be a problem, because it will break applications that
expect to be able to read pg_largeobject. Like, say, pg_dump.

Is it the right behavior, even if we have permission checks on large objects?

Can we use column-level access control here?

REVOKE ALL ON pg_largeobject FROM PUBLIC;
=> GRANT SELECT (loid) ON pg_largeobject TO PUBLIC;

Indeed, that seems reasonable to me.

We use "SELECT loid FROM pg_largeobject LIMIT 1" in pg_dump. We could
switch it to pg_largeobject_metadata if we only wanted to fix pg_dump,
but it's no surprise if other user applications use similar queries.
I think allowing loid to be read is a balanced solution.

Right, and this query also has to be fixed for another reason right now.
If all the large objects are empty, this query can return nothing, even if
large object entries exist in pg_largeobject_metadata.

"metadata" seems very vague. Can't we come up with a more descriptive
name?

What about "property"?
"Metadata" was the name suggested by Robert Haas at the last
commit fest, because we may store other properties of a large
object in this catalog in the future.

Also, how will this affect pg_migrator? pg_migrator copies
pg_largeobject and its index from the old to the new server. Is the
format inside pg_largeobject changed by this patch?

The format of pg_largeobject was not touched.

What happens when
there is no entry in pg_largeobject_metadata for a specific row?

In this case, these rows become orphans.
So, I think we need to create an empty large object with the same LOID in
pg_migrator. That makes an entry in pg_largeobject_metadata without
writing anything to pg_largeobject.
I guess the rest of the migration is no different. Correct?

Thanks,

#38Bruce Momjian
bruce@momjian.us
In reply to: KaiGai Kohei (#37)
Re: Largeobject Access Controls (r2460)

KaiGai Kohei wrote:

We use "SELECT loid FROM pg_largeobject LIMIT 1" in pg_dump. We could
switch it to pg_largeobject_metadata if we only wanted to fix pg_dump,
but it's no surprise if other user applications use similar queries.
I think allowing loid to be read is a balanced solution.

Right, and this query also has to be fixed for another reason right now.
If all the large objects are empty, this query can return nothing, even if
large object entries exist in pg_largeobject_metadata.

"metadata" seems very vague. Can't we come up with a more descriptive
name?

What about "property"?
"Metadata" was the name suggested by Robert Haas at the last
commit fest, because we may store other properties of a large
object in this catalog in the future.

Well, we usually try to be more specific about what something represents
and only later abstract it out if needed, but if everyone else is fine
with 'metadata', then just leave it unchanged.

Also, how will this affect pg_migrator? pg_migrator copies
pg_largeobject and its index from the old to the new server. Is the
format inside pg_largeobject changed by this patch?

The format of pg_largeobject was not touched.

Good.

What happens when
there is no entry in pg_largeobject_metadata for a specific row?

In this case, these rows become orphans.
So, I think we need to create an empty large object with the same LOID in
pg_migrator. That makes an entry in pg_largeobject_metadata without
writing anything to pg_largeobject.
I guess the rest of the migration is no different. Correct?

Uh, yea, pg_migrator could do that pretty easily.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

#39KaiGai Kohei
kaigai@kaigai.gr.jp
In reply to: Takahiro Itagaki (#34)
Re: Largeobject Access Controls (r2460)

Takahiro Itagaki wrote:

KaiGai Kohei <kaigai@ak.jp.nec.com> wrote:

We have to reference pg_largeobject_metadata to check whether a certain
large object exists or not.

It is the case when we create a new large object but write nothing.

OK, that makes sense.

In addition to the patch, we also need to fix pg_restore with the
--clean option. I added DropBlobIfExists() in pg_backup_db.c.

A revised patch is attached. Please check it for further mistakes.

+ void
+ DropBlobIfExists(ArchiveHandle *AH, Oid oid)
+ {
+   const char *lo_relname;
+   const char *lo_colname;
+
+   if (PQserverVersion(AH->connection) >= 80500)
+   {
+       lo_relname = "pg_largeobject_metadata";
+       lo_colname = "oid";
+   }
+   else
+   {
+       lo_relname = "pg_largeobject";
+       lo_colname = "loid";
+   }
+
+   /* Call lo_unlink only if exists to avoid not-found error. */
+   ahprintf(AH, "SELECT CASE WHEN EXISTS(SELECT 1 FROM pg_catalog.%s WHERE %s = '%u') THEN pg_catalog.lo_unlink('%u') END;\n",
+            lo_relname, lo_colname, oid, oid);
+ }

I think the following approach is more reasonable for the current design.

if (PQserverVersion(AH->connection) >= 80500)
{
    /* newer query */
    ahprintf(AH, "SELECT pg_catalog.lo_unlink(oid) "
                 "FROM pg_catalog.pg_largeobject_metadata "
                 "WHERE oid = %u;\n", oid);
}
else
{
    /* original query */
    ahprintf(AH, "SELECT CASE WHEN EXISTS(SELECT 1 FROM pg_catalog.pg_largeobject WHERE loid = '%u') "
                 "THEN pg_catalog.lo_unlink('%u') END;\n", oid, oid);
}

We no longer have any reason to use CASE ... WHEN and a subquery for the
given LOID. Right?

The fix-lo-contrib.patch looks good to me.

BTW, we can optimize lo_truncate because we allow metadata-only large
objects. inv_truncate() doesn't have to update the first data tuple to
be zero length. It only has to delete all corresponding tuples, as in:
DELETE FROM pg_largeobject WHERE loid = {obj_desc->id}

Right, when inv_truncate is given a length aligned to LOBLKSIZE.

I'll also submit a small patch for CF-Jan, OK?

Thanks,
--
KaiGai Kohei <kaigai@kaigai.gr.jp>

#40Bruce Momjian
bruce@momjian.us
In reply to: KaiGai Kohei (#37)
1 attachment(s)
Re: Largeobject Access Controls (r2460)

KaiGai Kohei wrote:

What happens when
there is no entry in pg_largeobject_metadata for a specific row?

In this case, these rows become orphans.
So, I think we need to create an empty large object with the same LOID in
pg_migrator. That makes an entry in pg_largeobject_metadata without
writing anything to pg_largeobject.
I guess the rest of the migration is no different. Correct?

Agreed. I have modified pg_migrator with the attached patch which
creates a script that adds default permissions for all large object
tables.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

Attachments:

/rtmp/difftext/x-diffDownload
? tools
? log
? src/pg_migrator
Index: src/info.c
===================================================================
RCS file: /cvsroot/pg-migrator/pg_migrator/src/info.c,v
retrieving revision 1.25
diff -c -r1.25 info.c
*** src/info.c	10 Dec 2009 23:14:25 -0000	1.25
--- src/info.c	13 Dec 2009 01:17:37 -0000
***************
*** 480,486 ****
  										"SELECT DISTINCT probin "
  										"FROM	pg_catalog.pg_proc "
  										"WHERE	prolang = 13 /* C */ AND "
! 										"		probin IS NOT NULL");
  		totaltups += PQntuples(ress[dbnum]);
  
  		PQfinish(conn);
--- 480,488 ----
  										"SELECT DISTINCT probin "
  										"FROM	pg_catalog.pg_proc "
  										"WHERE	prolang = 13 /* C */ AND "
! 										"		probin IS NOT NULL AND "
! 										"		oid >= "
! 										STRINGIFY(FirstNormalObjectId) ";");
  		totaltups += PQntuples(ress[dbnum]);
  
  		PQfinish(conn);
Index: src/pg_migrator.c
===================================================================
RCS file: /cvsroot/pg-migrator/pg_migrator/src/pg_migrator.c,v
retrieving revision 1.69
diff -c -r1.69 pg_migrator.c
*** src/pg_migrator.c	10 Dec 2009 14:34:19 -0000	1.69
--- src/pg_migrator.c	13 Dec 2009 01:17:37 -0000
***************
*** 92,97 ****
--- 92,100 ----
  			sequence_script_file_name =
  				v8_3_create_sequence_script(&ctx, CLUSTER_OLD);
  	}
+ 	if (GET_MAJOR_VERSION(ctx.old.pg_version) <= 804 &&
+ 		GET_MAJOR_VERSION(ctx.new.pg_version) >= 805)
+ 		v8_4_populate_pg_largeobject_metadata(&ctx, true, CLUSTER_OLD);
  
  	/* Looks okay so far.  Prepare the pg_dump output */
  	generate_old_dump(&ctx);
***************
*** 294,299 ****
--- 297,309 ----
  		v8_3_invalidate_bpchar_pattern_ops_indexes(&ctx, false, CLUSTER_NEW);
  		stop_postmaster(&ctx, false, true);
  	}
+ 	if (GET_MAJOR_VERSION(ctx.old.pg_version) <= 804 &&
+ 		GET_MAJOR_VERSION(ctx.new.pg_version) >= 805)
+ 	{
+ 		start_postmaster(&ctx, CLUSTER_NEW, true);
+ 		v8_4_populate_pg_largeobject_metadata(&ctx, false, CLUSTER_NEW);
+ 		stop_postmaster(&ctx, false, true);
+ 	}
  	
  	pg_log(&ctx, PG_REPORT, "\n*Upgrade complete*\n");
  
***************
*** 416,422 ****
  	char		new_clog_path[MAXPGPATH];
  
  	/* copy old commit logs to new data dir */
! 	prep_status(ctx, "Deleting old commit clogs");
  
  	snprintf(old_clog_path, sizeof(old_clog_path), "%s/pg_clog", ctx->old.pgdata);
  	snprintf(new_clog_path, sizeof(new_clog_path), "%s/pg_clog", ctx->new.pgdata);
--- 426,432 ----
  	char		new_clog_path[MAXPGPATH];
  
  	/* copy old commit logs to new data dir */
! 	prep_status(ctx, "Deleting new commit clogs");
  
  	snprintf(old_clog_path, sizeof(old_clog_path), "%s/pg_clog", ctx->old.pgdata);
  	snprintf(new_clog_path, sizeof(new_clog_path), "%s/pg_clog", ctx->new.pgdata);
***************
*** 424,430 ****
  		pg_log(ctx, PG_FATAL, "Unable to delete directory %s\n", new_clog_path);
  	check_ok(ctx);
  
! 	prep_status(ctx, "Copying commit clogs");
  	/* libpgport's copydir() doesn't work in FRONTEND code */
  #ifndef WIN32
  	exec_prog(ctx, true, SYSTEMQUOTE "%s \"%s\" \"%s\"" SYSTEMQUOTE,
--- 434,440 ----
  		pg_log(ctx, PG_FATAL, "Unable to delete directory %s\n", new_clog_path);
  	check_ok(ctx);
  
! 	prep_status(ctx, "Copying old commit clogs to new server");
  	/* libpgport's copydir() doesn't work in FRONTEND code */
  #ifndef WIN32
  	exec_prog(ctx, true, SYSTEMQUOTE "%s \"%s\" \"%s\"" SYSTEMQUOTE,
Index: src/pg_migrator.h
===================================================================
RCS file: /cvsroot/pg-migrator/pg_migrator/src/pg_migrator.h,v
retrieving revision 1.75
diff -c -r1.75 pg_migrator.h
*** src/pg_migrator.h	12 Dec 2009 16:56:23 -0000	1.75
--- src/pg_migrator.h	13 Dec 2009 01:17:37 -0000
***************
*** 395,400 ****
--- 395,402 ----
  							bool check_mode, Cluster whichCluster);
  void		v8_3_invalidate_bpchar_pattern_ops_indexes(migratorContext *ctx,
  							bool check_mode, Cluster whichCluster);
+ void		v8_4_populate_pg_largeobject_metadata(migratorContext *ctx,
+ 							bool check_mode, Cluster whichCluster);
  char 		*v8_3_create_sequence_script(migratorContext *ctx,
  							Cluster whichCluster);
  void		check_for_composite_types(migratorContext *ctx,
Index: src/version.c
===================================================================
RCS file: /cvsroot/pg-migrator/pg_migrator/src/version.c,v
retrieving revision 1.32
diff -c -r1.32 version.c
*** src/version.c	7 Aug 2009 20:16:12 -0000	1.32
--- src/version.c	13 Dec 2009 01:17:37 -0000
***************
*** 421,427 ****
  					"| between your old and new clusters so the tables\n"
  					"| must be rebuilt.  The file:\n"
  					"| \t%s\n"
! 					"| when executed by psql by the database super-user,\n"
  					"| will rebuild all tables with tsvector columns.\n\n",
  					output_path);
  	}
--- 421,427 ----
  					"| between your old and new clusters so the tables\n"
  					"| must be rebuilt.  The file:\n"
  					"| \t%s\n"
! 					"| when executed by psql by the database super-user\n"
  					"| will rebuild all tables with tsvector columns.\n\n",
  					output_path);
  	}
***************
*** 535,541 ****
  					"| they must be reindexed with the REINDEX command.\n"
  					"| The file:\n"
  					"| \t%s\n"
! 					"| when executed by psql by the database super-user,\n"
  					"| will recreate all invalid indexes; until then,\n"
  					"| none of these indexes will be used.\n\n",
  					output_path);
--- 535,541 ----
  					"| they must be reindexed with the REINDEX command.\n"
  					"| The file:\n"
  					"| \t%s\n"
! 					"| when executed by psql by the database super-user\n"
  					"| will recreate all invalid indexes; until then,\n"
  					"| none of these indexes will be used.\n\n",
  					output_path);
***************
*** 664,670 ****
  					"| new clusters so they must be reindexed with the\n"
  					"| REINDEX command.  The file:\n"
  					"| \t%s\n"
! 					"| when executed by psql by the database super-user,\n"
  					"| will recreate all invalid indexes; until then,\n"
  					"| none of these indexes will be used.\n\n",
  					output_path);
--- 664,670 ----
  					"| new clusters so they must be reindexed with the\n"
  					"| REINDEX command.  The file:\n"
  					"| \t%s\n"
! 					"| when executed by psql by the database super-user\n"
  					"| will recreate all invalid indexes; until then,\n"
  					"| none of these indexes will be used.\n\n",
  					output_path);
***************
*** 675,680 ****
--- 675,762 ----
  
  
  /*
+  * v8_4_populate_pg_largeobject_metadata()
+  *
+  *	8.5 has a new pg_largeobject permission table
+  */
+ void
+ v8_4_populate_pg_largeobject_metadata(migratorContext *ctx, bool check_mode,
+ 									  Cluster whichCluster)
+ {
+ 	ClusterInfo	*active_cluster = (whichCluster == CLUSTER_OLD) ?
+ 					&ctx->old : &ctx->new;
+ 	int			dbnum;
+ 	FILE		*script = NULL;
+ 	bool		found = false;
+ 	char		output_path[MAXPGPATH];
+ 
+ 	prep_status(ctx, "Checking for large objects");
+ 
+ 	snprintf(output_path, sizeof(output_path), "%s/pg_largeobject.sql",
+ 			ctx->home_dir);
+ 
+ 	for (dbnum = 0; dbnum < active_cluster->dbarr.ndbs; dbnum++)
+ 	{
+ 		PGresult   *res;
+ 		int			i_count;
+ 		DbInfo	   *active_db = &active_cluster->dbarr.dbs[dbnum];
+ 		PGconn	   *conn = connectToServer(ctx, active_db->db_name, whichCluster);
+ 		
+ 		/* find if there are any large objects */
+ 		res = executeQueryOrDie(ctx, conn,
+ 								"SELECT count(*) "
+ 								"FROM	pg_catalog.pg_largeobject ");
+ 
+ 		i_count = PQfnumber(res, "count");
+ 		if (atoi(PQgetvalue(res, 0, i_count)) != 0)
+ 		{
+ 			found = true;
+ 			if (!check_mode)
+ 			{
+ 				if (script == NULL && (script = fopen(output_path, "w")) == NULL)
+ 						pg_log(ctx, PG_FATAL, "Could not create necessary file:  %s\n", output_path);
+ 				fprintf(script, "\\connect %s\n",
+ 						quote_identifier(ctx, active_db->db_name));
+ 				fprintf(script,
+ 					"INSERT INTO pg_catalog.pg_largeobject_metadata (lomowner)\n"
+ 								"SELECT DISTINCT loid\n"
+ 								"FROM pg_catalog.pg_largeobject;\n");
+ 			}
+ 		}
+ 
+ 		PQclear(res);
+ 		PQfinish(conn);
+ 	}
+ 
+ 	if (found)
+ 	{
+ 		if (!check_mode)
+ 			fclose(script);
+ 		report_status(ctx, PG_WARNING, "warning");
+ 		if (check_mode)
+ 			pg_log(ctx, PG_WARNING, "\n"
+ 					"| Your installation contains large objects.\n"
+ 					"| The new database has an additional large object\n"
+ 					"| permission table.  After migration, you will be\n"
+ 					"| given a command to populate the pg_largeobject\n"
+ 					"| permission table with default permissions.\n\n");
+ 		else
+ 			pg_log(ctx, PG_WARNING, "\n"
+ 					"| Your installation contains large objects.\n"
+ 					"| The new database has an additional large object\n"
+ 					"| permission table so default permissions must be\n"
+ 					"| defined for all large objects.  The file:\n"
+ 					"| \t%s\n"
+ 					"| when executed by psql by the database super-user\n"
+ 					"| will define the default permissions.\n\n",
+ 					output_path);
+ 	}
+ 	else
+ 		check_ok(ctx);
+ }
+ 
+ 
+ /*
   * v8_3_create_sequence_script()
   *
   *	8.4 added the column "start_value" to all sequences.  For this reason,
#41Bruce Momjian
bruce@momjian.us
In reply to: Bruce Momjian (#40)
Largeobject Access Controls and pg_migrator

Bruce Momjian wrote:

KaiGai Kohei wrote:

What happens when
there is no entry in pg_largeobject_metadata for a specific row?

In this case, these rows become orphans.
So, I think we need to create an empty large object with the same LOID in
pg_migrator. That makes an entry in pg_largeobject_metadata without
writing anything to pg_largeobject.
I guess the rest of the migration is no different. Correct?

Agreed. I have modified pg_migrator with the attached patch which
creates a script that adds default permissions for all large object
tables.

Oops, it seems like I have a problem getting pg_migrator to populate
pg_largeobject_metadata:

test=> select lo_import('/etc/profile');
lo_import
-----------
16385
(1 row)

test=> select lo_import('/etc/profile.env');
lo_import
-----------
16386
(1 row)

test=> select oid,* from pg_largeobject_metadata;
oid | lomowner | lomacl
-------+----------+--------
16385 | 10 |
16386 | 10 |
(2 rows)

You might remember that INSERT cannot set the oid of a row, so I don't
see how pg_migrator can migrate this. Is there a reason we used 'oid'
in pg_largeobject_metadata but 'loid' in pg_largeobject? Why didn't we
add the lomowner and lomacl columns to pg_largeobject?

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

#42KaiGai Kohei
kaigai@kaigai.gr.jp
In reply to: Bruce Momjian (#41)
Re: Largeobject Access Controls and pg_migrator

(2009/12/13 10:39), Bruce Momjian wrote:

Bruce Momjian wrote:

KaiGai Kohei wrote:

What happens when
there is no entry in pg_largeobject_metadata for a specific row?

In this case, these rows become orphans.
So, I think we need to create an empty large object with the same LOID in
pg_migrator. That makes an entry in pg_largeobject_metadata without
writing anything to pg_largeobject.
I guess the rest of the migration is no different. Correct?

Agreed. I have modified pg_migrator with the attached patch which
creates a script that adds default permissions for all large object
tables.

Oops, it seems like I have a problem getting pg_migrator to populate
pg_largeobject_metadata:

test=> select lo_import('/etc/profile');
lo_import
-----------
16385
(1 row)

test=> select lo_import('/etc/profile.env');
lo_import
-----------
16386
(1 row)

test=> select oid,* from pg_largeobject_metadata;
oid | lomowner | lomacl
-------+----------+--------
16385 | 10 |
16386 | 10 |
(2 rows)

lo_import() has another prototype, which takes a second argument to
specify the LOID. Isn't it usable to restore a large object with the
correct LOID? For example, lo_import('/etc/profile', 1234)

Or, if you intend to restore metadata in the second lo_import(),
ALTER LARGE OBJECT and GRANT LARGE OBJECT make it possible to set up
the metadata of a certain large object.

Or, am I missing the problem?
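
For the ALTER/GRANT route, a minimal sketch (the OID matches the lo_import
example above; the owner role name is hypothetical):

    ALTER LARGE OBJECT 16385 OWNER TO migrated_owner;
    GRANT SELECT, UPDATE ON LARGE OBJECT 16385 TO PUBLIC;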

You might remember that INSERT cannot set the oid of a row, so I don't
see how pg_migrator can migrate this. Is there a reason we used 'oid'
in pg_largeobject_metadata but 'loid' in pg_largeobject? Why didn't we
add the lomowner and lomacl columns to pg_largeobject?

A large object consists of multiple tuples within pg_largeobject.
If we added lomowner and lomacl to pg_largeobject, we would have to
update all the pages of a large object to keep them in a consistent state.
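
That layout is easy to see for an imported object (OID taken from the
session above); each page is a separate row:

    SELECT loid, pageno, length(data) FROM pg_largeobject
     WHERE loid = 16385 ORDER BY pageno;

so per-object fields would have to be duplicated, and kept in sync, across
every page row.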

--
KaiGai Kohei <kaigai@kaigai.gr.jp>

#43Bruce Momjian
bruce@momjian.us
In reply to: KaiGai Kohei (#42)
Re: Largeobject Access Controls and pg_migrator

KaiGai Kohei wrote:

lo_import() has another prototype, which takes a second argument to
specify the LOID. Isn't it usable to restore a large object with the
correct LOID? For example, lo_import('/etc/profile', 1234)

I can't use that because the migration has already brought over the
pg_largeobject file which has the data.

Or, if you intend to restore metadata in the second lo_import(),
ALTER LARGE OBJECT and GRANT LARGE OBJECT make it possible to set up
the metadata of a certain large object.

Yes, that will work cleanly. The file might be large because I need a
GRANT for every large object, but I suppose that is OK.

You might remember that INSERT cannot set the oid of a row, so I don't
see how pg_migrator can migrate this. Is there a reason we used 'oid'
in pg_largeobject_metadata but 'loid' in pg_largeobject? Why didn't we
add the lomowner and lomacl columns to pg_largeobject?

A large object consists of multiple tuples within pg_largeobject.
If we added lomowner and lomacl to pg_largeobject, we would have to
update all the pages of a large object to keep them in a consistent state.

Ah, good point.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

#44Bruce Momjian
bruce@momjian.us
In reply to: Bruce Momjian (#43)
Re: Largeobject Access Controls and pg_migrator

Bruce Momjian wrote:

KaiGai Kohei wrote:

lo_import() has another prototype, which takes a second argument to
specify the LOID. Isn't it usable to restore a large object with the
correct LOID? For example, lo_import('/etc/profile', 1234)

I can't use that because the migration has already brought over the
pg_largeobject file which has the data.

Or, if you intend to restore metadata in the second lo_import(),
ALTER LARGE OBJECT and GRANT LARGE OBJECT make it possible to set up
the metadata of a certain large object.

Yes, that will work cleanly. The file might be large because I need a
GRANT for every large object, but I suppose that is OK.

Uh, I tested pg_migrator and found a problem with this approach:

test=> select loid from pg_largeobject;
loid
-------
16385
16385
16386
(3 rows)

test=> grant all ON LARGE OBJECT 16385 to public;
ERROR: large object 16385 does not exist

I am wondering if the missing pg_largeobject_metadata row is causing
this, and again I have no way of creating one with the specified oid.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

#45KaiGai Kohei
kaigai@kaigai.gr.jp
In reply to: Bruce Momjian (#44)
Re: Largeobject Access Controls and pg_migrator

(2009/12/13 11:31), Bruce Momjian wrote:

Bruce Momjian wrote:

KaiGai Kohei wrote:

lo_import() has another prototype, which takes a second argument to
specify the LOID. Isn't it usable to restore a large object with the
correct LOID? For example, lo_import('/etc/profile', 1234)

I can't use that because the migration has already brought over the
pg_largeobject file which has the data.

Or, if you intend to restore metadata in the second lo_import(),
ALTER LARGE OBJECT and GRANT LARGE OBJECT make it possible to set up
the metadata of a certain large object.

Yes, that will work cleanly. The file might be large because I need a
GRANT for every large object, but I suppose that is OK.

Uh, I tested pg_migrator and found a problem with this approach:

test=> select loid from pg_largeobject;
loid
-------
16385
16385
16386
(3 rows)

test=> grant all ON LARGE OBJECT 16385 to public;
ERROR: large object 16385 does not exist

I am wondering if the missing pg_largeobject_metadata row is causing
this, and again I have no way of creating one with the specified oid.

Can SELECT lo_create(16385); help in this situation?

It creates an entry in pg_largeobject_metadata, but does not touch
pg_largeobject, because a large object is empty in its initial state.
But in this case, pg_migrator has already brought only the data chunks
over to pg_largeobject, so this operation can recombine the orphan
chunks with a metadata entry.

I'm not sure whether lo_create() should also check whether pg_largeobject
has chunks with the same LOID. In regular operation, that should never happen.

--
KaiGai Kohei <kaigai@kaigai.gr.jp>

#46Takahiro Itagaki
itagaki.takahiro@oss.ntt.co.jp
In reply to: KaiGai Kohei (#45)
Re: Largeobject Access Controls and pg_migrator

KaiGai Kohei <kaigai@kaigai.gr.jp> wrote:

Can SELECT lo_create(16385); help in this situation?

SELECT lo_create(loid) FROM (SELECT DISTINCT loid FROM pg_largeobject) AS t

would work for pg_migrator.

I'm not sure whether lo_create() should also check whether pg_largeobject
has chunks with the same LOID. In regular operation, that should never happen.

I think the omission is a reasonable optimization.

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center

#47Takahiro Itagaki
itagaki.takahiro@oss.ntt.co.jp
In reply to: KaiGai Kohei (#39)
Re: Largeobject Access Controls (r2460)

KaiGai Kohei <kaigai@kaigai.gr.jp> wrote:

We no longer have any reason to use CASE ... WHEN and a subquery for the
given LOID. Right?

Ah, I see. I used your suggestion.

I applied the bug fixes. Our tools and contrib modules will always use
pg_largeobject_metadata instead of pg_largeobject to enumerate large objects.

I removed "GRANT SELECT (loid) ON pg_largeobject TO PUBLIC" from initdb
because users must use pg_largeobject_metadata.oid when they want to check
the OIDs of large objects; if not, they could misjudge whether an object
exists. This is an unavoidable incompatibility unless we always keep
corresponding tuples in pg_largeobject even for zero-length large objects.
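
In other words, applications that enumerate large objects should switch
queries depending on the server version, roughly:

    -- 8.4 and earlier (on 8.5 this misses zero-length large objects):
    SELECT DISTINCT loid FROM pg_largeobject;
    -- 8.5 and later:
    SELECT oid FROM pg_largeobject_metadata;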

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center

#48Bruce Momjian
bruce@momjian.us
In reply to: Takahiro Itagaki (#46)
Re: Largeobject Access Controls and pg_migrator

Takahiro Itagaki wrote:

KaiGai Kohei <kaigai@kaigai.gr.jp> wrote:

Can SELECT lo_create(16385); help in this situation?

SELECT lo_create(loid) FROM (SELECT DISTINCT loid FROM pg_largeobject) AS t

would work for pg_migrator.

I'm not sure whether lo_create() should also check whether pg_largeobject
has chunks with the same LOID. In regular operation, that should never happen.

I think the omission is a reasonable optimization.

Thanks, I have updated pg_migrator to use your suggested method.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

#49Robert Haas
robertmhaas@gmail.com
In reply to: Takahiro Itagaki (#24)
Re: Largeobject Access Controls (r2460)

On Thu, Dec 10, 2009 at 10:41 PM, Takahiro Itagaki
<itagaki.takahiro@oss.ntt.co.jp> wrote:

KaiGai Kohei <kaigai@ak.jp.nec.com> wrote:

What's your opinion about:
  long desc: When turned on, privilege checks on large objects perform with
             backward compatibility as 8.4.x or earlier releases.

I updated the description as you suggested.

Applied with minor editorialization,
mainly around tab-completion support in psql.

The documentation in this patch needs work.

...Robert

#50KaiGai Kohei
kaigai@ak.jp.nec.com
In reply to: Robert Haas (#49)
Re: Largeobject Access Controls (r2460)

(2009/12/17 7:25), Robert Haas wrote:

On Thu, Dec 10, 2009 at 10:41 PM, Takahiro Itagaki
<itagaki.takahiro@oss.ntt.co.jp> wrote:

KaiGai Kohei<kaigai@ak.jp.nec.com> wrote:

What's your opinion about:
long desc: When turned on, privilege checks on large objects perform with
backward compatibility as 8.4.x or earlier releases.

I updated the description as you suggested.

Applied with minor editorialization,
mainly around tab-completion support in psql.

The documentation in this patch needs work.

Are you talking about English quality, or am I missing something that should
be documented?

--
OSS Platform Development Division, NEC
KaiGai Kohei <kaigai@ak.jp.nec.com>

#51Robert Haas
robertmhaas@gmail.com
In reply to: KaiGai Kohei (#50)
1 attachment(s)
Re: Largeobject Access Controls (r2460)

2009/12/16 KaiGai Kohei <kaigai@ak.jp.nec.com>:

(2009/12/17 7:25), Robert Haas wrote:

On Thu, Dec 10, 2009 at 10:41 PM, Takahiro Itagaki
<itagaki.takahiro@oss.ntt.co.jp>  wrote:

KaiGai Kohei<kaigai@ak.jp.nec.com>  wrote:

What's your opinion about:
   long desc: When turned on, privilege checks on large objects perform with
              backward compatibility as 8.4.x or earlier releases.

I updated the description as you suggested.

Applied with minor editorialization,
mainly around tab-completion support in psql.

The documentation in this patch needs work.

Are you talking about English quality, or am I missing something that should
be documented?

Mostly English quality, but there are some other issues too. Proposed
patch attached.

...Robert

Attachments:

lobj_doc.patchtext/x-patch; charset=US-ASCII; name=lobj_doc.patchDownload
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index fdff8b8..482aeac 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -3125,9 +3125,8 @@
 
   <para>
    The catalog <structname>pg_largeobject</structname> holds the data making up
-   <quote>large objects</quote>.  A large object is identified by an OID of
-   <link linkend="catalog-pg-largeobject-metadata"><structname>pg_largeobject_metadata</></link>
-   catalog, assigned when it is created.  Each large object is broken into
+   <quote>large objects</quote>.  A large object is identified by an OID
+   assigned when it is created.  Each large object is broken into
    segments or <quote>pages</> small enough to be conveniently stored as rows
    in <structname>pg_largeobject</structname>.
    The amount of data per page is defined to be <symbol>LOBLKSIZE</> (which is currently
@@ -3135,10 +3134,12 @@
   </para>
 
   <para>
-   <structname>pg_largeobject</structname> should not be readable by the
-   public, since the catalog contains data in large objects of all users.
-   <structname>pg_largeobject_metadata</> is a publicly readable catalog
-   that only contains identifiers of large objects.
+   Prior to <productname>PostgreSQL</> 8.5, there was no permission structure
+   associated with large objects.  As a result,
+   <structname>pg_largeobject</structname> was publicly readable and could be
+   used to obtain the OIDs (and contents) of all large objects in the system.
+   This is no longer the case; use <structname>pg_largeobject_metadata</> to
+   obtain a list of large object OIDs.
   </para>
 
   <table>
@@ -3202,9 +3203,8 @@
   </indexterm>
 
   <para>
-   The purpose of <structname>pg_largeobject_metadata</structname> is to
-   hold metadata of <quote>large objects</quote>, such as OID of its owner,
-   access permissions and OID of the large object itself.
+   The catalog <structname>pg_largeobject_metadata</structname>
+   holds metadata associated with large objects.
   </para>
 
   <table>
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 36d3a22..5e4b44a 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -4825,22 +4825,19 @@ dynamic_library_path = 'C:\tools\postgresql;H:\my_project\lib;$libdir'
       </indexterm>
       <listitem>
        <para>
-        This allows us to tuen on/off database privilege checks on large
-        objects. In the 8.4.x series and earlier release do not have
-        privilege checks on large object in most cases.
-
-        So, turning the <varname>lo_compat_privileges</varname> off means
-        the large object feature performs in compatible mode.
+        In <productname>PostgreSQL</> releases prior to 8.5, large objects
+        did not have access privileges and were, in effect, readable and
+        writable by all users.  Setting this variable to <literal>on</>
+        disables the new privilege checks, for compatibility with prior
+        releases.  The default is <literal>off</>.
        </para>
        <para>
-        Please note that it is not equivalent to disable all the security
-        checks corresponding to large objects.
-        For example, the <literal>lo_import()</literal> and
+        Setting this variable does not disable all security checks for
+        large objects - only those for which the default behavior has changed
+        in <productname>PostgreSQL</> 8.5.
+        For example, <literal>lo_import()</literal> and
         <literal>lo_export()</literal> need superuser privileges independent
-        from this setting as prior versions were doing.
-       </para>
-       <para>
-        It is <literal>off</literal> by default.
+        of this setting.
        </para>
       </listitem>
      </varlistentry>
diff --git a/doc/src/sgml/lobj.sgml b/doc/src/sgml/lobj.sgml
index e5a680a..de246b0 100644
--- a/doc/src/sgml/lobj.sgml
+++ b/doc/src/sgml/lobj.sgml
@@ -59,6 +59,21 @@
     searches for the correct chunk number when doing random
     access reads and writes.
    </para>
+
+   <para>
+    As of <productname>PostgreSQL</> 8.5, large objects have an owner
+    and a set of access permissions, which can be managed using
+    <xref linkend="sql-grant" endterm="sql-grant-title"> and
+    <xref linkend="sql-revoke" endterm="sql-revoke-title">.  
+    For compatibility with prior releases, see
+    <xref linkend="guc-lo-compat-privileges">.
+    <literal>SELECT</literal> privileges are required to read a large
+    object, and 
+    <literal>UPDATE</literal> privileges are required to write to or
+    truncate it.
+    Only the large object owner (or the database superuser) can unlink, comment
+    on, or change the owner of a large object.
+   </para>
   </sect1>
 
   <sect1 id="lo-interfaces">
@@ -438,60 +453,9 @@ SELECT lo_export(image.raster, '/tmp/motd') FROM image
     owning user.  Therefore, their use is restricted to superusers.  In
     contrast, the client-side import and export functions read and write files
     in the client's file system, using the permissions of the client program.
-    The client-side functions can be used by any
-    <productname>PostgreSQL</productname> user.
+    The client-side functions do not require superuser privilege.
   </para>
 
-  <sect2 id="lo-func-privilege">
-   <title>Large object and privileges</title>
-   <para>
-    Note that access control feature was not supported in the 8.4.x series
-    and earlier release.
-    Also see the <xref linkend="guc-lo-compat-privileges"> compatibility
-    option.
-   </para>
-   <para>
-    Now it supports access controls on large objects, and allows the owner
-    of large objects to set up access rights using
-    <xref linkend="sql-grant" endterm="sql-grant-title"> and
-    <xref linkend="sql-revoke" endterm="sql-revoke-title"> statement.
-   </para>
-   <para>
-    Two permissions are defined on the large object class.
-    These are checked only when <xref linkend="guc-lo-compat-privileges">
-    option is disabled.
-   </para>
-   <para>
-    The first is <literal>SELECT</literal>.
-    It is required on <function>loread()</function> function.
-    Note that when we open large object with read-only mode, we can see
-    a static image even if other concurrent transaction modified the
-    same large object.
-    This principle is also applied on the access rights of large objects.
-    Even if a transaction modified access rights and commit it, it is
-    not invisible from other transaction which already opened the large
-    object.
-   </para>
-   <para>
-    The second is <literal>UPDATE</literal>.
-    It is required on <function>lowrite()</function> function and
-    <function>lo_truncate()</function> function.
-   </para>
-   <para>
-    In addition, <function>lo_unlink()</function> function,
-    <command>COMMENT ON</command> and <command>ALTER LARGE OBJECT</command>
-    statements needs ownership of the large object to be accessed.
-   </para>
-   <para>
-    You may wonder why <literal>SELECT</literal> is not checked on the
-    <function>lo_export()</function> function or <literal>UPDATE</literal>
-    is not checked on the <function>lo_import</function> function.
-
-    These functions originally require database superuser privilege,
-    and it allows to bypass the default database privilege checks,
-    so we don't need to check an obvious test twice.
-   </para>
-  </sect2>
 </sect1>
 
 <sect1 id="lo-examplesect">
diff --git a/doc/src/sgml/ref/grant.sgml b/doc/src/sgml/ref/grant.sgml
index 8f61d72..2456a96 100644
--- a/doc/src/sgml/ref/grant.sgml
+++ b/doc/src/sgml/ref/grant.sgml
@@ -174,8 +174,7 @@ GRANT <replaceable class="PARAMETER">role_name</replaceable> [, ...] TO <replace
        <xref linkend="sql-delete" endterm="sql-delete-title">.
        For sequences, this privilege also allows the use of the
        <function>currval</function> function.
-       For large objects, this privilege also allows to read from
-       the target large object.
+       For large objects, this privilege allows the object to be read.
       </para>
      </listitem>
     </varlistentry>
@@ -209,8 +208,8 @@ GRANT <replaceable class="PARAMETER">role_name</replaceable> [, ...] TO <replace
        <literal>SELECT</literal> privilege.  For sequences, this
        privilege allows the use of the <function>nextval</function> and
        <function>setval</function> functions.
-       For large objects, this privilege also allows to write or truncate
-       on the target large object.
+       For large objects, this privilege allows writing or truncating the
+       object.
       </para>
      </listitem>
     </varlistentry>
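
To make the documented behavior concrete, here is a minimal psql sketch
of the new privilege model; the OID 12345 and the role alice are
hypothetical:

-- By default only the owner may read or write a large object.
GRANT SELECT ON LARGE OBJECT 12345 TO alice;      -- alice may now loread()
REVOKE UPDATE ON LARGE OBJECT 12345 FROM PUBLIC;  -- writes stay owner-only

-- The new catalog lists large object OIDs, owners, and ACLs.
SELECT oid, pg_get_userbyid(lomowner) AS owner, lomacl
  FROM pg_largeobject_metadata;

-- Superuser-only escape hatch for pre-8.5 applications.
SET lo_compat_privileges = on;
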
#52KaiGai Kohei
kaigai@ak.jp.nec.com
In reply to: Robert Haas (#51)
Re: Largeobject Access Controls (r2460)

(2009/12/17 13:20), Robert Haas wrote:

2009/12/16 KaiGai Kohei<kaigai@ak.jp.nec.com>:

(2009/12/17 7:25), Robert Haas wrote:

On Thu, Dec 10, 2009 at 10:41 PM, Takahiro Itagaki
<itagaki.takahiro@oss.ntt.co.jp> wrote:

KaiGai Kohei<kaigai@ak.jp.nec.com> wrote:

What's your opinion about:
long desc: When turned on, privilege checks on large objects perform with
backward compatibility as 8.4.x or earlier releases.

I updated the description as you suggested.

Applied with minor editorialization,
mainly around tab-completion support in psql.

The documentation in this patch needs work.

Are you talking about English quality? Or am I missing something that
needs to be documented?

Mostly English quality, but there are some other issues too. Proposed
patch attached.

I have no comments on the English quality....

Thanks for your fixups.
--
OSS Platform Development Division, NEC
KaiGai Kohei <kaigai@ak.jp.nec.com>

#53Takahiro Itagaki
itagaki.takahiro@oss.ntt.co.jp
In reply to: Robert Haas (#51)
Re: Largeobject Access Controls (r2460)

Robert Haas <robertmhaas@gmail.com> wrote:

2009/12/16 KaiGai Kohei <kaigai@ak.jp.nec.com>:

? ?long desc: When turned on, privilege checks on large objects perform with
? ? ? ? ? ? ? backward compatibility as 8.4.x or earlier releases.

Mostly English quality, but there are some other issues too. Proposed
patch attached.

I remember we had discussions about the version number, like
"Don't use '8.5' because it might be released as '9.0'", no?

Another comment is I'd like to keep <link linkend="catalog-pg-largeobject-metadata">
for the first <structname>pg_largeobject</structname> in each topic.

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center

#54Robert Haas
robertmhaas@gmail.com
In reply to: Takahiro Itagaki (#53)
Re: Largeobject Access Controls (r2460)

2009/12/17 Takahiro Itagaki <itagaki.takahiro@oss.ntt.co.jp>:

Robert Haas <robertmhaas@gmail.com> wrote:

2009/12/16 KaiGai Kohei <kaigai@ak.jp.nec.com>:

? ?long desc: When turned on, privilege checks on large objects perform with
? ? ? ? ? ? ? backward compatibility as 8.4.x or earlier releases.

Mostly English quality, but there are some other issues too.  Proposed
patch attached.

I remember we had discussions about the version number, like
"Don't use '8.5' because it might be released as '9.0'", no?

I chose to follow the style which Tom indicated that he preferred
here. We don't use the phrase "8.4.x series" anywhere else in the
documentation, so this doesn't seem like a good time to start. Tom or
I will go through and renumber everything if we end up renaming the
release to 9.0.

Another comment is I'd like to keep <link linkend="catalog-pg-largeobject-metadata">
for the first <structname>pg_largeobject</structname> in each topic.

Those two things aren't the same. Perhaps you meant <link
linkend="catalog-pg-largeobject">? I'll tweak the pg_largeobject and
pg_largeobject_metadata sections to make sure each has a link to the
other and commit this. I also found one more spelling mistake so I
will include that correction as well.

...Robert

#55Takahiro Itagaki
itagaki.takahiro@oss.ntt.co.jp
In reply to: Robert Haas (#54)
Re: Largeobject Access Controls (r2460)

Robert Haas <robertmhaas@gmail.com> wrote:

Another comment is I'd like to keep <link linkend="catalog-pg-largeobject-metadata">
for the first <structname>pg_largeobject</structname> in each topic.

Those two things aren't the same. Perhaps you meant <link
linkend="catalog-pg-largeobject">?

Oops, yes. Thank you for the correction.

We also have "8.4.x series" in the core code. Do you think we also
rewrite those messages? One of them is an use-visible message.

LargeObjectAlterOwner() : pg_largeobject.c
* The 'lo_compat_privileges' is not checked here, because we
* don't have any access control features in the 8.4.x series
* or earlier release.
* So, it is not a place we can define a compatible behavior.

guc.c
{"lo_compat_privileges", PGC_SUSET, COMPAT_OPTIONS_PREVIOUS,
gettext_noop("Enables backward compatibility in privilege checks on large objects"),
gettext_noop("When turned on, privilege checks on large objects perform "
"with backward compatibility as 8.4.x or earlier releases.")

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center

#56Robert Haas
robertmhaas@gmail.com
In reply to: Takahiro Itagaki (#55)
1 attachment(s)
Re: Largeobject Access Controls (r2460)

On Thu, Dec 17, 2009 at 7:27 PM, Takahiro Itagaki
<itagaki.takahiro@oss.ntt.co.jp> wrote:

Another comment is I'd like to keep <link linkend="catalog-pg-largeobject-metadata">
for the first <structname>pg_largeobject</structname> in each topic.

Those two things aren't the same.  Perhaps you meant <link
linkend="catalog-pg-largeobject">?

Oops, yes. Thank you for the correction.

We also have "8.4.x series" in the core code. Do you think we also
rewrite those messages? One of them is an use-visible message.

Yes. I started going through the comments tonight. Partial patch
attached. There were two comments that I was unable to understand and
therefore could not reword - the one at the top of
pg_largeobject_aclmask_snapshot(), and the second part of the comment
at the top of LargeObjectExists():

* Note that LargeObjectExists always scans the system catalog
* with SnapshotNow, so it is unavailable to use to check
* existence in read-only accesses.

In both cases, I'm lost. Help?

In acldefault(), there is this comment:

/* Grant SELECT,UPDATE by default, for now */

This doesn't seem to match what the code is doing, so I think we
should remove it.

I also notice that dumpBlobComments() is now misnamed, but it seems
we've chosen to add a comment
mentioning that fact rather than fixing it. That doesn't seem like
the right approach.

...Robert

Attachments:

lo_comments.patchtext/x-patch; charset=US-ASCII; name=lo_comments.patchDownload
diff --git a/src/backend/catalog/aclchk.c b/src/backend/catalog/aclchk.c
index 809df7a..b0aea41 100644
--- a/src/backend/catalog/aclchk.c
+++ b/src/backend/catalog/aclchk.c
@@ -4261,9 +4261,8 @@ pg_language_ownercheck(Oid lan_oid, Oid roleid)
 /*
  * Ownership check for a largeobject (specified by OID)
  *
- * Note that we have no candidate to call this routine with a certain
- * snapshot except for SnapshotNow, so we don't provide an interface
- * with _snapshot() version now.
+ * This is only used for operations like ALTER LARGE OBJECT that are always
+ * relative to SnapshotNow.
  */
 bool
 pg_largeobject_ownercheck(Oid lobj_oid, Oid roleid)
diff --git a/src/backend/catalog/pg_largeobject.c b/src/backend/catalog/pg_largeobject.c
index ada5b88..dfbf350 100644
--- a/src/backend/catalog/pg_largeobject.c
+++ b/src/backend/catalog/pg_largeobject.c
@@ -79,10 +79,8 @@ LargeObjectCreate(Oid loid)
 }
 
 /*
- * Drop a large object having the given LO identifier.
- *
- * When we drop a large object, it is necessary to drop both of metadata
- * and data pages in same time.
+ * Drop a large object having the given LO identifier.  Both the data pages
+ * and metadata must be dropped.
  */
 void
 LargeObjectDrop(Oid loid)
@@ -191,13 +189,12 @@ LargeObjectAlterOwner(Oid loid, Oid newOwnerId)
 		if (!superuser())
 		{
 			/*
-			 * The 'lo_compat_privileges' is not checked here, because we
-			 * don't have any access control features in the 8.4.x series
-			 * or earlier release.
-			 * So, it is not a place we can define a compatible behavior.
+			 * lo_compat_privileges is not checked here, because ALTER
+			 * LARGE OBJECT ... OWNER did not exist at all prior to
+			 * PostgreSQL 8.5.
+			 *
+			 * We must be the owner of the existing object.
 			 */
-
-			/* Otherwise, must be owner of the existing object */
 			if (!pg_largeobject_ownercheck(loid, GetUserId()))
 				ereport(ERROR,
 						(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
@@ -251,9 +248,8 @@ LargeObjectAlterOwner(Oid loid, Oid newOwnerId)
 /*
  * LargeObjectExists
  *
- * Currently, we don't use system cache to contain metadata of
- * large objects, because massive number of large objects can
- * consume not a small amount of process local memory.
+ * We don't use the system cache for large object metadata, for fear of
+ * using too much local memory.
  *
  * Note that LargeObjectExists always scans the system catalog
  * with SnapshotNow, so it is unavailable to use to check
diff --git a/src/backend/commands/comment.c b/src/backend/commands/comment.c
index 8f8ecc7..ece2a30 100644
--- a/src/backend/commands/comment.c
+++ b/src/backend/commands/comment.c
@@ -1449,7 +1449,7 @@ CommentLargeObject(List *qualname, char *comment)
 	 *
 	 * See the comment in the inv_create() which describes
 	 * the reason why LargeObjectRelationId is used instead
-	 * of the LargeObjectMetadataRelationId.
+	 * of LargeObjectMetadataRelationId.
 	 */
 	CreateComments(loid, LargeObjectRelationId, 0, comment);
 }
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index c799b13..22b8ea7 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -1229,9 +1229,9 @@ static struct config_bool ConfigureNamesBool[] =
 
 	{
 		{"lo_compat_privileges", PGC_SUSET, COMPAT_OPTIONS_PREVIOUS,
-			gettext_noop("Enables backward compatibility in privilege checks on large objects"),
-			gettext_noop("When turned on, privilege checks on large objects perform "
-						 "with backward compatibility as 8.4.x or earlier releases.")
+			gettext_noop("Enables backward compatibility mode for privilege checks on large objects"),
+			gettext_noop("Skips privilege checks when reading or modifying large objects, "
+						 "for compatibility with PostgreSQL releases prior to 8.5.")
 		},
 		&lo_compat_privileges,
 		false, NULL, NULL
#57Takahiro Itagaki
itagaki.takahiro@oss.ntt.co.jp
In reply to: Robert Haas (#56)
Re: Largeobject Access Controls (r2460)

Robert Haas <robertmhaas@gmail.com> wrote:

In both cases, I'm lost. Help?

They might be contrasted with the comments for myLargeObjectExists.
Since we use MVCC visibility in loread(), metadata for large objects
should also be visible under MVCC rules.

If I understand them, they say:
* pg_largeobject_aclmask_snapshot requires a snapshot which will be
used in loread().
* Don't use LargeObjectExists if you need MVCC visibility.

In acldefault(), there is this comment:
/* Grant SELECT,UPDATE by default, for now */
This doesn't seem to match what the code is doing, so I think we
should remove it.

Ah, ACL_NO_RIGHTS is the default.

I also notice that dumpBlobComments() is now misnamed, but it seems
we've chosen to add a comment mentioning that fact rather than fixing it.

Hmmm, it now dumps not only comments but also the ownership of large objects.
Should we rename it to dumpBlobMetadata() or something similar?

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center
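
For reference, the ACL_NO_RIGHTS default mentioned above can be seen
directly. A sketch: the returned OID is hypothetical, and it assumes
a NULL lomacl means the built-in defaults apply.

SELECT lo_create(0);                    -- suppose it returns 12345
SELECT lomacl
  FROM pg_largeobject_metadata
 WHERE oid = 12345;                     -- NULL: owner implicitly has
                                        -- SELECT and UPDATE, PUBLIC
                                        -- has no rights at all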

#58KaiGai Kohei
kaigai@ak.jp.nec.com
In reply to: Takahiro Itagaki (#57)
1 attachment(s)
Re: Largeobject Access Controls (r2460)

(2009/12/18 15:48), Takahiro Itagaki wrote:

Robert Haas<robertmhaas@gmail.com> wrote:

In both cases, I'm lost. Help?

They might be contrasted with the comments for myLargeObjectExists.
Since we use MVCC visibility in loread(), metadata for large objects
should also be visible under MVCC rules.

If I understand them, they say:
* pg_largeobject_aclmask_snapshot requires a snapshot which will be
used in loread().
* Don't use LargeObjectExists if you need MVCC visibility.

Yes, correct.

In acldefault(), there is this comment:
/* Grant SELECT,UPDATE by default, for now */
This doesn't seem to match what the code is doing, so I think we
should remove it.

Ah, ACL_NO_RIGHTS is the default.

Oops, that reflects a very early design, which was fixed later.

I also notice that dumpBlobComments() is now misnamed, but it seems
we've chosen to add a comment mentioning that fact rather than fixing it.

Hmmm, it now dumps not only comments but also the ownership of large objects.
Should we rename it to dumpBlobMetadata() or something similar?

That seems quite natural to me.

The attached patch fixes them.

Thanks,
--
OSS Platform Development Division, NEC
KaiGai Kohei <kaigai@ak.jp.nec.com>

Attachments:

pg_dump-blob-fix-comment.patchtext/x-patch; name=pg_dump-blob-fix-comment.patchDownload
*** base/src/backend/utils/adt/acl.c	(revision 2503)
--- base/src/backend/utils/adt/acl.c	(working copy)
***************
*** 765,771 ****
  			owner_default = ACL_ALL_RIGHTS_LANGUAGE;
  			break;
  		case ACL_OBJECT_LARGEOBJECT:
- 			/* Grant SELECT,UPDATE by default, for now */
  			world_default = ACL_NO_RIGHTS;
  			owner_default = ACL_ALL_RIGHTS_LARGEOBJECT;
  			break;
--- 765,770 ----
*** base/src/bin/pg_dump/pg_dump.h	(revision 2503)
--- base/src/bin/pg_dump/pg_dump.h	(working copy)
***************
*** 116,122 ****
  	DO_FOREIGN_SERVER,
  	DO_DEFAULT_ACL,
  	DO_BLOBS,
! 	DO_BLOB_COMMENTS
  } DumpableObjectType;
  
  typedef struct _dumpableObject
--- 116,122 ----
  	DO_FOREIGN_SERVER,
  	DO_DEFAULT_ACL,
  	DO_BLOBS,
! 	DO_BLOB_METADATA
  } DumpableObjectType;
  
  typedef struct _dumpableObject
*** base/src/bin/pg_dump/pg_dump_sort.c	(revision 2503)
--- base/src/bin/pg_dump/pg_dump_sort.c	(working copy)
***************
*** 56,62 ****
  	4,							/* DO_FOREIGN_SERVER */
  	17,							/* DO_DEFAULT_ACL */
  	10,							/* DO_BLOBS */
! 	11							/* DO_BLOB_COMMENTS */
  };
  
  /*
--- 56,62 ----
  	4,							/* DO_FOREIGN_SERVER */
  	17,							/* DO_DEFAULT_ACL */
  	10,							/* DO_BLOBS */
! 	11							/* DO_BLOB_METADATA */
  };
  
  /*
***************
*** 93,99 ****
  	15,							/* DO_FOREIGN_SERVER */
  	27,							/* DO_DEFAULT_ACL */
  	20,							/* DO_BLOBS */
! 	21							/* DO_BLOB_COMMENTS */
  };
  
  
--- 93,99 ----
  	15,							/* DO_FOREIGN_SERVER */
  	27,							/* DO_DEFAULT_ACL */
  	20,							/* DO_BLOBS */
! 	21							/* DO_BLOB_METADATA */
  };
  
  
***************
*** 1151,1159 ****
  					 "BLOBS  (ID %d)",
  					 obj->dumpId);
  			return;
! 		case DO_BLOB_COMMENTS:
  			snprintf(buf, bufsize,
! 					 "BLOB COMMENTS  (ID %d)",
  					 obj->dumpId);
  			return;
  	}
--- 1151,1159 ----
  					 "BLOBS  (ID %d)",
  					 obj->dumpId);
  			return;
! 		case DO_BLOB_METADATA:
  			snprintf(buf, bufsize,
! 					 "BLOB METADATA  (ID %d)",
  					 obj->dumpId);
  			return;
  	}
*** base/src/bin/pg_dump/pg_dump.c	(revision 2503)
--- base/src/bin/pg_dump/pg_dump.c	(working copy)
***************
*** 191,197 ****
  static const char *fmtQualifiedId(const char *schema, const char *id);
  static bool hasBlobs(Archive *AH);
  static int	dumpBlobs(Archive *AH, void *arg);
! static int	dumpBlobComments(Archive *AH, void *arg);
  static void dumpDatabase(Archive *AH);
  static void dumpEncoding(Archive *AH);
  static void dumpStdStrings(Archive *AH);
--- 191,197 ----
  static const char *fmtQualifiedId(const char *schema, const char *id);
  static bool hasBlobs(Archive *AH);
  static int	dumpBlobs(Archive *AH, void *arg);
! static int	dumpBlobMetadata(Archive *AH, void *arg);
  static void dumpDatabase(Archive *AH);
  static void dumpEncoding(Archive *AH);
  static void dumpStdStrings(Archive *AH);
***************
*** 707,716 ****
  		blobobj->name = strdup("BLOBS");
  
  		blobcobj = (DumpableObject *) malloc(sizeof(DumpableObject));
! 		blobcobj->objType = DO_BLOB_COMMENTS;
  		blobcobj->catId = nilCatalogId;
  		AssignDumpId(blobcobj);
! 		blobcobj->name = strdup("BLOB COMMENTS");
  		addObjectDependency(blobcobj, blobobj->dumpId);
  	}
  
--- 707,716 ----
  		blobobj->name = strdup("BLOBS");
  
  		blobcobj = (DumpableObject *) malloc(sizeof(DumpableObject));
! 		blobcobj->objType = DO_BLOB_METADATA;
  		blobcobj->catId = nilCatalogId;
  		AssignDumpId(blobcobj);
! 		blobcobj->name = strdup("BLOB METADATA");
  		addObjectDependency(blobcobj, blobobj->dumpId);
  	}
  
***************
*** 2048,2064 ****
  }
  
  /*
!  * dumpBlobComments
!  *	dump all blob properties.
!  *  It has "BLOB COMMENTS" tag due to the historical reason, but note
!  *  that it is the routine to dump all the properties of blobs.
   *
   * Since we don't provide any way to be selective about dumping blobs,
!  * there's no need to be selective about their comments either.  We put
!  * all the comments into one big TOC entry.
   */
  static int
! dumpBlobComments(Archive *AH, void *arg)
  {
  	const char *blobQry;
  	const char *blobFetchQry;
--- 2048,2062 ----
  }
  
  /*
!  * dumpBlobMetadata
!  *	dump all blob metadata.
   *
   * Since we don't provide any way to be selective about dumping blobs,
!  * there's no need to be selective about their metadata either.  We put
!  * all the metadata into one big TOC entry.
   */
  static int
! dumpBlobMetadata(Archive *AH, void *arg)
  {
  	const char *blobQry;
  	const char *blobFetchQry;
***************
*** 6294,6306 ****
  						 dobj->dependencies, dobj->nDeps,
  						 dumpBlobs, NULL);
  			break;
! 		case DO_BLOB_COMMENTS:
  			ArchiveEntry(fout, dobj->catId, dobj->dumpId,
  						 dobj->name, NULL, NULL, "",
! 						 false, "BLOB COMMENTS", SECTION_DATA,
  						 "", "", NULL,
  						 dobj->dependencies, dobj->nDeps,
! 						 dumpBlobComments, NULL);
  			break;
  	}
  }
--- 6292,6304 ----
  						 dobj->dependencies, dobj->nDeps,
  						 dumpBlobs, NULL);
  			break;
! 		case DO_BLOB_METADATA:
  			ArchiveEntry(fout, dobj->catId, dobj->dumpId,
  						 dobj->name, NULL, NULL, "",
! 						 false, "BLOB METADATA", SECTION_DATA,
  						 "", "", NULL,
  						 dobj->dependencies, dobj->nDeps,
! 						 dumpBlobMetadata, NULL);
  			break;
  	}
  }
#59Robert Haas
robertmhaas@gmail.com
In reply to: KaiGai Kohei (#58)
Re: Largeobject Access Controls (r2460)

2009/12/18 KaiGai Kohei <kaigai@ak.jp.nec.com>:

(2009/12/18 15:48), Takahiro Itagaki wrote:

Robert Haas<robertmhaas@gmail.com>  wrote:

In both cases, I'm lost.  Help?

They might be contrasted with the comments for myLargeObjectExists.
Since we use MVCC visibility in loread(), metadata for large objects
should also be visible under MVCC rules.

If I understand them, they say:
   * pg_largeobject_aclmask_snapshot requires a snapshot which will be
     used in loread().
   * Don't use LargeObjectExists if you need MVCC visibility.

Yes, correct.

In acldefault(), there is this comment:
   /* Grant SELECT,UPDATE by default, for now */
This doesn't seem to match what the code is doing, so I think we
should remove it.

Ah, ACL_NO_RIGHTS is the default.

Oops, that reflects a very early design, which was fixed later.

I also notice that dumpBlobComments() is now misnamed, but it seems
we've chosen to add a comment mentioning that fact rather than fixing it.

Hmmm, it now dumps not only comments but also the ownership of large objects.
Should we rename it to dumpBlobMetadata() or something similar?

That seems quite natural to me.

The attached patch fixes them.

I think we might want to go with dumpBlobProperties(), because
dumpBlobMetadata() might lead you to think that all of the properties
being dumped are stored in pg_largeobject_metadata, which is not the
case.

I do also wonder why we are calling these blobs in this code rather
than large objects, but that problem predates this patch and I think
we might as well leave it alone for now.

...Robert

#60Robert Haas
robertmhaas@gmail.com
In reply to: Robert Haas (#59)
Re: Largeobject Access Controls (r2460)

On Fri, Dec 18, 2009 at 9:00 AM, Robert Haas <robertmhaas@gmail.com> wrote:

2009/12/18 KaiGai Kohei <kaigai@ak.jp.nec.com>:

(2009/12/18 15:48), Takahiro Itagaki wrote:

Robert Haas<robertmhaas@gmail.com>  wrote:

In both cases, I'm lost.  Help?

They might be contrasted with the comments for myLargeObjectExists.
Since we use MVCC visibility in loread(), metadata for large objects
should also be visible under MVCC rules.

If I understand them, they say:
   * pg_largeobject_aclmask_snapshot requires a snapshot which will be
     used in loread().
   * Don't use LargeObjectExists if you need MVCC visibility.

Yes, correct.

In acldefault(), there is this comment:
   /* Grant SELECT,UPDATE by default, for now */
This doesn't seem to match what the code is doing, so I think we
should remove it.

Ah, ACL_NO_RIGHTS is the default.

Oops, that reflects a very early design, which was fixed later.

I also notice that dumpBlobComments() is now misnamed, but it seems
we've chosen to add a comment mentioning that fact rather than fixing it.

Hmmm, it now dumps not only comments but also the ownership of large objects.
Should we rename it to dumpBlobMetadata() or something similar?

That seems quite natural to me.

The attached patch fixes them.

I think we might want to go with dumpBlobProperties(), because
dumpBlobMetadata() might lead you to think that all of the properties
being dumped are stored in pg_largeobject_metadata, which is not the
case.

Oh. This is more complicated than it appeared on the surface. It
seems that the string "BLOB COMMENTS" actually gets inserted into
custom dumps somewhere, so I'm not sure whether we can just change it.
Was this issue discussed at some point before this was committed?
Changing it would seem to require inserting some backward
compatibility code here. Another option would be to add a separate
section for "BLOB METADATA", and leave "BLOB COMMENTS" alone. Can
anyone comment on what the Right Thing To Do is here?

...Robert

#61Robert Haas
robertmhaas@gmail.com
In reply to: Takahiro Itagaki (#57)
Re: Largeobject Access Controls (r2460)

On Fri, Dec 18, 2009 at 1:48 AM, Takahiro Itagaki
<itagaki.takahiro@oss.ntt.co.jp> wrote:

In both cases, I'm lost.  Help?

They might be contrasted with the comments for myLargeObjectExists.
Since we use MVCC visibility in loread(), metadata for large object
also should be visible in MVCC rule.

If I understand them, they say:
 * pg_largeobject_aclmask_snapshot requires a snapshot which will be
   used in loread().
 * Don't use LargeObjectExists if you need MVCC visibility.

Part of what I'm confused about (and what I think should be documented
in a comment somewhere) is why we're using MVCC visibility in some
places but not others. In particular, there seem to be some bits of
the comment that imply that we do this for read but not for write,
which seems really strange. It may or may not actually be strange,
but I don't understand it.

...Robert

#62Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#60)
Re: Largeobject Access Controls (r2460)

Robert Haas <robertmhaas@gmail.com> writes:

Oh. This is more complicated than it appeared on the surface. It
seems that the string "BLOB COMMENTS" actually gets inserted into
custom dumps somewhere, so I'm not sure whether we can just change it.
Was this issue discussed at some point before this was committed?
Changing it would seem to require inserting some backward
compatibility code here. Another option would be to add a separate
section for "BLOB METADATA", and leave "BLOB COMMENTS" alone. Can
anyone comment on what the Right Thing To Do is here?

The BLOB COMMENTS label is, or was, correct for what it contained.
If this patch has usurped it to contain other things I would argue
that that is seriously wrong. pg_dump already has a clear notion
of how to handle ACLs for objects. ACLs for blobs ought to be
made to fit into that structure, not dumped in some random place
because that saved a few lines of code.

regards, tom lane

#63Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#61)
Re: Largeobject Access Controls (r2460)

Robert Haas <robertmhaas@gmail.com> writes:

Part of what I'm confused about (and what I think should be documented
in a comment somewhere) is why we're using MVCC visibility in some
places but not others. In particular, there seem to be some bits of
the comment that imply that we do this for read but not for write,
which seems really strange. It may or may not actually be strange,
but I don't understand it.

It is supposed to depend on whether you opened the blob for read only
or for read write. Please do not tell me that this patch broke that;
because if it did it broke pg_dump.

This behavior is documented at least here:
http://www.postgresql.org/docs/8.4/static/lo-interfaces.html#AEN36338

regards, tom lane
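
A minimal two-session sketch of the behavior Tom points at, using the
server-side functions; the OID 12345 is hypothetical, and 262144 is
INV_READ (0x40000):

-- Session 1: a read-only descriptor is pinned to the query snapshot.
BEGIN;
SELECT lo_open(12345, 262144);          -- suppose it returns descriptor 0

-- Session 2, concurrently:
REVOKE SELECT ON LARGE OBJECT 12345 FROM PUBLIC;

-- Session 1: the data and, with this patch, the ACL check are both
-- evaluated against the snapshot of the opened descriptor, so the
-- read still behaves as it did at lo_open() time.
SELECT loread(0, 1024);
COMMIT;
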

#64Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#62)
Re: Largeobject Access Controls (r2460)

On Fri, Dec 18, 2009 at 9:48 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

Oh.  This is more complicated than it appeared on the surface.  It
seems that the string "BLOB COMMENTS" actually gets inserted into
custom dumps somewhere, so I'm not sure whether we can just change it.
 Was this issue discussed at some point before this was committed?
Changing it would seem to require inserting some backward
compatibility code here.  Another option would be to add a separate
section for "BLOB METADATA", and leave "BLOB COMMENTS" alone.  Can
anyone comment on what the Right Thing To Do is here?

The BLOB COMMENTS label is, or was, correct for what it contained.
If this patch has usurped it to contain other things

It has.

I would argue
that that is seriously wrong.  pg_dump already has a clear notion
of how to handle ACLs for objects.  ACLs for blobs ought to be
made to fit into that structure, not dumped in some random place
because that saved a few lines of code.

OK. Hopefully KaiGai or Takahiro can suggest a fix.

Thanks,

...Robert

#65Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#63)
Re: Largeobject Access Controls (r2460)

On Fri, Dec 18, 2009 at 9:51 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

Part of what I'm confused about (and what I think should be documented
in a comment somewhere) is why we're using MVCC visibility in some
places but not others.  In particular, there seem to be some bits of
the comment that imply that we do this for read but not for write,
which seems really strange.  It may or may not actually be strange,
but I don't understand it.

It is supposed to depend on whether you opened the blob for read only
or for read write.  Please do not tell me that this patch broke that;
because if it did it broke pg_dump.

This behavior is documented at least here:
http://www.postgresql.org/docs/8.4/static/lo-interfaces.html#AEN36338

Oh, I see. Thanks for the pointer. Having read that through, I can
now say that the comments in the patch seem to imply that it attempted
to preserve those semantics, but I can't swear that it did. I will
take another look at it, but it might bear closer examination by
someone with more MVCC-fu than myself.

...Robert

#66KaiGai Kohei
kaigai@ak.jp.nec.com
In reply to: Robert Haas (#64)
Re: Largeobject Access Controls (r2460)

(2009/12/19 12:05), Robert Haas wrote:

On Fri, Dec 18, 2009 at 9:48 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:

Robert Haas<robertmhaas@gmail.com> writes:

Oh. This is more complicated than it appeared on the surface. It
seems that the string "BLOB COMMENTS" actually gets inserted into
custom dumps somewhere, so I'm not sure whether we can just change it.
Was this issue discussed at some point before this was committed?
Changing it would seem to require inserting some backward
compatibility code here. Another option would be to add a separate
section for "BLOB METADATA", and leave "BLOB COMMENTS" alone. Can
anyone comment on what the Right Thing To Do is here?

The BLOB COMMENTS label is, or was, correct for what it contained.
If this patch has usurped it to contain other things

It has.

I would argue
that that is seriously wrong. pg_dump already has a clear notion
of how to handle ACLs for objects. ACLs for blobs ought to be
made to fit into that structure, not dumped in some random place
because that saved a few lines of code.

OK. Hopefully KaiGai or Takahiro can suggest a fix.

Currently, the BLOBS (and BLOB COMMENTS) sections do not record the
owners of the large objects, because doing so could strain pg_dump's
local memory when the database contains a massive number of large
objects.
I believe that is the reason why we dump all the large objects in
a single section. Correct?

I don't think it is reasonable to dump each large object as an
individual section.
However, we can categorize them per owner. In general, we can
assume the number of database users is smaller than the number
of large objects.
In other words, we can obtain the number of sections to be dumped
as the result of the following query:

SELECT DISTINCT lomowner FROM pg_largeobject_metadata;

Then, we can dump them per user.

For earlier server versions, all the large objects will be dumped in
a single section, without per-owner attribution.

What's your opinion?
--
OSS Platform Development Division, NEC
KaiGai Kohei <kaigai@ak.jp.nec.com>

#67KaiGai Kohei
kaigai@ak.jp.nec.com
In reply to: KaiGai Kohei (#66)
1 attachment(s)
Re: Largeobject Access Controls (r2460)

(2009/12/21 9:39), KaiGai Kohei wrote:

(2009/12/19 12:05), Robert Haas wrote:

On Fri, Dec 18, 2009 at 9:48 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:

Robert Haas<robertmhaas@gmail.com> writes:

Oh. This is more complicated than it appeared on the surface. It
seems that the string "BLOB COMMENTS" actually gets inserted into
custom dumps somewhere, so I'm not sure whether we can just change it.
Was this issue discussed at some point before this was committed?
Changing it would seem to require inserting some backward
compatibility code here. Another option would be to add a separate
section for "BLOB METADATA", and leave "BLOB COMMENTS" alone. Can
anyone comment on what the Right Thing To Do is here?

The BLOB COMMENTS label is, or was, correct for what it contained.
If this patch has usurped it to contain other things

It has.

I would argue
that that is seriously wrong. pg_dump already has a clear notion
of how to handle ACLs for objects. ACLs for blobs ought to be
made to fit into that structure, not dumped in some random place
because that saved a few lines of code.

OK. Hopefully KaiGai or Takahiro can suggest a fix.

The patch has grown larger than I expected, because large objects are
handled quite differently from every other object class.

Here are three points:

1) A new BLOB ACLS section was added.

It is a single-purpose section that describes GRANT/REVOKE statements
on large objects; the BLOB COMMENTS section was reverted to contain
only descriptions.

Because we have to assume the database may hold a massive number of
large objects, it is not reasonable to store them using dumpACL(),
which chains each ACL entry onto the list of TOC entries before they
are dumped. That would mean pg_dump might have to register a massive
number of large objects in local memory.

Currently, we also store GRANT/REVOKE statements in the BLOB COMMENTS
section, which is confusing: even if pg_restore is launched with the
--no-privileges option, it cannot skip the GRANT/REVOKE statements
on large objects. This fix makes it possible to distinguish ACLs on
large objects from their other properties, and to handle them correctly.

2) The BLOBS section was split into one section per database user.

Currently, the BLOBS section carries no information about the owners
of the large objects to be restored, so we tried to alter their
ownership in the BLOB COMMENTS section, which was incorrect.

The --use-set-session-authorization option requires restoring object
ownership without ALTER ... OWNER TO statements, so we need to record
the correct database username in the section properties.

This patch renames hasBlobs() to getBlobs() and changes its purpose.
It registers DO_BLOBS, DO_BLOB_COMMENTS and DO_BLOB_ACLS entries for
each large object owner, as necessary.
For example, if there are five users owning large objects, getBlobs()
registers five TOC entries of each type, and dumpBlobs(),
dumpBlobComments() and dumpBlobAcls() are each invoked five times,
once per username.

3) _LoadBlobs()

For regular database object classes, _printTocEntry() can inject an
"ALTER xxx OWNER TO ..." statement for the restored object based on
the ownership recorded in the section header.
However, we cannot use this infrastructure for large objects as-is,
because one BLOBS section can restore multiple large objects.

_LoadBlobs() is the routine that restores the large objects within a
given section. This patch modifies it to inject an "ALTER LARGE
OBJECT <loid> OWNER TO <user>" statement for each large object, based
on the ownership of the section (unless --use-set-session-authorization
is given).

$ diffstat pgsql-fix-pg_dump-blob-privs.patch
pg_backup_archiver.c | 4
pg_backup_custom.c | 11 !
pg_backup_files.c | 9 !
pg_backup_tar.c | 9 !
pg_dump.c | 312 +++++++----!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
pg_dump.h | 9 !
pg_dump_sort.c | 8 !
7 files changed, 68 insertions(+), 25 deletions(-), 269 modifications(!)

Thanks,
--
OSS Platform Development Division, NEC
KaiGai Kohei <kaigai@ak.jp.nec.com>

Attachments:

pgsql-fix-pg_dump-blob-privs.patchtext/x-patch; name=pgsql-fix-pg_dump-blob-privs.patchDownload
*** base/src/bin/pg_dump/pg_dump.h	(revision 2512)
--- base/src/bin/pg_dump/pg_dump.h	(working copy)
***************
*** 116,122 ****
  	DO_FOREIGN_SERVER,
  	DO_DEFAULT_ACL,
  	DO_BLOBS,
! 	DO_BLOB_COMMENTS
  } DumpableObjectType;
  
  typedef struct _dumpableObject
--- 116,123 ----
  	DO_FOREIGN_SERVER,
  	DO_DEFAULT_ACL,
  	DO_BLOBS,
! 	DO_BLOB_COMMENTS,
! 	DO_BLOB_ACLS,
  } DumpableObjectType;
  
  typedef struct _dumpableObject
***************
*** 442,447 ****
--- 443,454 ----
  	char	   *defaclacl;
  } DefaultACLInfo;
  
+ typedef struct _blobsInfo
+ {
+ 	DumpableObject dobj;
+ 	char	   *rolname;
+ } BlobsInfo;
+ 
  /* global decls */
  extern bool force_quotes;		/* double-quotes for identifiers flag */
  extern bool g_verbose;			/* verbose flag */
*** base/src/bin/pg_dump/pg_backup_tar.c	(revision 2512)
--- base/src/bin/pg_dump/pg_backup_tar.c	(working copy)
***************
*** 104,110 ****
  
  static const char *modulename = gettext_noop("tar archiver");
  
! static void _LoadBlobs(ArchiveHandle *AH, RestoreOptions *ropt);
  
  static TAR_MEMBER *tarOpen(ArchiveHandle *AH, const char *filename, char mode);
  static void tarClose(ArchiveHandle *AH, TAR_MEMBER *TH);
--- 104,110 ----
  
  static const char *modulename = gettext_noop("tar archiver");
  
! static void _LoadBlobs(ArchiveHandle *AH, TocEntry *te, RestoreOptions *ropt);
  
  static TAR_MEMBER *tarOpen(ArchiveHandle *AH, const char *filename, char mode);
  static void tarClose(ArchiveHandle *AH, TAR_MEMBER *TH);
***************
*** 700,712 ****
  	}
  
  	if (strcmp(te->desc, "BLOBS") == 0)
! 		_LoadBlobs(AH, ropt);
  	else
  		_PrintFileData(AH, tctx->filename, ropt);
  }
  
  static void
! _LoadBlobs(ArchiveHandle *AH, RestoreOptions *ropt)
  {
  	Oid			oid;
  	lclContext *ctx = (lclContext *) AH->formatData;
--- 700,712 ----
  	}
  
  	if (strcmp(te->desc, "BLOBS") == 0)
! 		_LoadBlobs(AH, te, ropt);
  	else
  		_PrintFileData(AH, tctx->filename, ropt);
  }
  
  static void
! _LoadBlobs(ArchiveHandle *AH, TocEntry *te, RestoreOptions *ropt)
  {
  	Oid			oid;
  	lclContext *ctx = (lclContext *) AH->formatData;
***************
*** 737,742 ****
--- 737,745 ----
  					ahwrite(buf, 1, cnt, AH);
  				}
  				EndRestoreBlob(AH, oid);
+ 				if (!ropt->use_setsessauth)
+ 					ahprintf(AH, "ALTER LARGE OBJECT %u OWNER TO %s;\n\n",
+ 							 oid, te->owner);
  				foundBlob = true;
  			}
  			tarClose(AH, th);
*** base/src/bin/pg_dump/pg_dump_sort.c	(revision 2512)
--- base/src/bin/pg_dump/pg_dump_sort.c	(working copy)
***************
*** 93,99 ****
  	15,							/* DO_FOREIGN_SERVER */
  	27,							/* DO_DEFAULT_ACL */
  	20,							/* DO_BLOBS */
! 	21							/* DO_BLOB_COMMENTS */
  };
  
  
--- 93,100 ----
  	15,							/* DO_FOREIGN_SERVER */
  	27,							/* DO_DEFAULT_ACL */
  	20,							/* DO_BLOBS */
! 	21,							/* DO_BLOB_COMMENTS */
! 	28,							/* DO_BLOB_ACLS */
  };
  
  
***************
*** 1156,1161 ****
--- 1157,1167 ----
  					 "BLOB COMMENTS  (ID %d)",
  					 obj->dumpId);
  			return;
+ 		case DO_BLOB_ACLS:
+ 			snprintf(buf, bufsize,
+ 					 "BLOB ACLS  (ID %d)",
+ 					 obj->dumpId);
+ 			return;
  	}
  	/* shouldn't get here */
  	snprintf(buf, bufsize,
*** base/src/bin/pg_dump/pg_backup_files.c	(revision 2512)
--- base/src/bin/pg_dump/pg_backup_files.c	(working copy)
***************
*** 66,72 ****
  } lclTocEntry;
  
  static const char *modulename = gettext_noop("file archiver");
! static void _LoadBlobs(ArchiveHandle *AH, RestoreOptions *ropt);
  static void _getBlobTocEntry(ArchiveHandle *AH, Oid *oid, char *fname);
  
  /*
--- 66,72 ----
  } lclTocEntry;
  
  static const char *modulename = gettext_noop("file archiver");
! static void _LoadBlobs(ArchiveHandle *AH, TocEntry *te, RestoreOptions *ropt);
  static void _getBlobTocEntry(ArchiveHandle *AH, Oid *oid, char *fname);
  
  /*
***************
*** 330,336 ****
  		return;
  
  	if (strcmp(te->desc, "BLOBS") == 0)
! 		_LoadBlobs(AH, ropt);
  	else
  		_PrintFileData(AH, tctx->filename, ropt);
  }
--- 330,336 ----
  		return;
  
  	if (strcmp(te->desc, "BLOBS") == 0)
! 		_LoadBlobs(AH, te, ropt);
  	else
  		_PrintFileData(AH, tctx->filename, ropt);
  }
***************
*** 365,371 ****
  }
  
  static void
! _LoadBlobs(ArchiveHandle *AH, RestoreOptions *ropt)
  {
  	Oid			oid;
  	lclContext *ctx = (lclContext *) AH->formatData;
--- 365,371 ----
  }
  
  static void
! _LoadBlobs(ArchiveHandle *AH, TocEntry *te, RestoreOptions *ropt)
  {
  	Oid			oid;
  	lclContext *ctx = (lclContext *) AH->formatData;
***************
*** 385,390 ****
--- 385,393 ----
  		StartRestoreBlob(AH, oid, ropt->dropSchema);
  		_PrintFileData(AH, fname, ropt);
  		EndRestoreBlob(AH, oid);
+ 		if (!ropt->use_setsessauth)
+ 			ahprintf(AH, "ALTER LARGE OBJECT %u OWNER TO %s;\n\n",
+ 					 oid, te->owner);
  		_getBlobTocEntry(AH, &oid, fname);
  	}
  
*** base/src/bin/pg_dump/pg_backup_archiver.c	(revision 2512)
--- base/src/bin/pg_dump/pg_backup_archiver.c	(working copy)
***************
*** 519,525 ****
  				_printTocEntry(AH, te, ropt, true, false);
  
  				if (strcmp(te->desc, "BLOBS") == 0 ||
! 					strcmp(te->desc, "BLOB COMMENTS") == 0)
  				{
  					ahlog(AH, 1, "restoring %s\n", te->desc);
  
--- 519,527 ----
  				_printTocEntry(AH, te, ropt, true, false);
  
  				if (strcmp(te->desc, "BLOBS") == 0 ||
! 					strcmp(te->desc, "BLOB COMMENTS") == 0 ||
! 					(!ropt->aclsSkip &&
! 					 strcmp(te->desc, "BLOB ACLS") == 0))
  				{
  					ahlog(AH, 1, "restoring %s\n", te->desc);
  
*** base/src/bin/pg_dump/pg_backup_custom.c	(revision 2512)
--- base/src/bin/pg_dump/pg_backup_custom.c	(working copy)
***************
*** 54,60 ****
  static void _StartBlob(ArchiveHandle *AH, TocEntry *te, Oid oid);
  static void _EndBlob(ArchiveHandle *AH, TocEntry *te, Oid oid);
  static void _EndBlobs(ArchiveHandle *AH, TocEntry *te);
! static void _LoadBlobs(ArchiveHandle *AH, bool drop);
  static void _Clone(ArchiveHandle *AH);
  static void _DeClone(ArchiveHandle *AH);
  
--- 54,60 ----
  static void _StartBlob(ArchiveHandle *AH, TocEntry *te, Oid oid);
  static void _EndBlob(ArchiveHandle *AH, TocEntry *te, Oid oid);
  static void _EndBlobs(ArchiveHandle *AH, TocEntry *te);
! static void _LoadBlobs(ArchiveHandle *AH, TocEntry *te, RestoreOptions *ropt);
  static void _Clone(ArchiveHandle *AH);
  static void _DeClone(ArchiveHandle *AH);
  
***************
*** 498,504 ****
  			break;
  
  		case BLK_BLOBS:
! 			_LoadBlobs(AH, ropt->dropSchema);
  			break;
  
  		default:				/* Always have a default */
--- 498,504 ----
  			break;
  
  		case BLK_BLOBS:
! 			_LoadBlobs(AH, te, ropt);
  			break;
  
  		default:				/* Always have a default */
***************
*** 619,625 ****
  }
  
  static void
! _LoadBlobs(ArchiveHandle *AH, bool drop)
  {
  	Oid			oid;
  
--- 619,625 ----
  }
  
  static void
! _LoadBlobs(ArchiveHandle *AH, TocEntry *te, RestoreOptions *ropt)
  {
  	Oid			oid;
  
***************
*** 628,636 ****
  	oid = ReadInt(AH);
  	while (oid != 0)
  	{
! 		StartRestoreBlob(AH, oid, drop);
  		_PrintData(AH);
  		EndRestoreBlob(AH, oid);
  		oid = ReadInt(AH);
  	}
  
--- 628,639 ----
  	oid = ReadInt(AH);
  	while (oid != 0)
  	{
! 		StartRestoreBlob(AH, oid, ropt->dropSchema);
  		_PrintData(AH);
  		EndRestoreBlob(AH, oid);
+ 		if (!ropt->use_setsessauth)
+ 			ahprintf(AH, "ALTER LARGE OBJECT %u OWNER TO %s;\n\n",
+ 					 oid, te->owner);
  		oid = ReadInt(AH);
  	}
  
*** base/src/bin/pg_dump/pg_dump.c	(revision 2513)
--- base/src/bin/pg_dump/pg_dump.c	(working copy)
***************
*** 190,198 ****
  static char *getFormattedTypeName(Oid oid, OidOptions opts);
  static char *myFormatType(const char *typname, int32 typmod);
  static const char *fmtQualifiedId(const char *schema, const char *id);
! static bool hasBlobs(Archive *AH);
  static int	dumpBlobs(Archive *AH, void *arg);
  static int	dumpBlobComments(Archive *AH, void *arg);
  static void dumpDatabase(Archive *AH);
  static void dumpEncoding(Archive *AH);
  static void dumpStdStrings(Archive *AH);
--- 190,199 ----
  static char *getFormattedTypeName(Oid oid, OidOptions opts);
  static char *myFormatType(const char *typname, int32 typmod);
  static const char *fmtQualifiedId(const char *schema, const char *id);
! static void getBlobs(Archive *AH);
  static int	dumpBlobs(Archive *AH, void *arg);
  static int	dumpBlobComments(Archive *AH, void *arg);
+ static int	dumpBlobAcls(Archive *AH, void *arg);
  static void dumpDatabase(Archive *AH);
  static void dumpEncoding(Archive *AH);
  static void dumpStdStrings(Archive *AH);
***************
*** 695,720 ****
  			getTableDataFKConstraints();
  	}
  
! 	if (outputBlobs && hasBlobs(g_fout))
! 	{
! 		/* Add placeholders to allow correct sorting of blobs */
! 		DumpableObject *blobobj;
! 		DumpableObject *blobcobj;
  
- 		blobobj = (DumpableObject *) malloc(sizeof(DumpableObject));
- 		blobobj->objType = DO_BLOBS;
- 		blobobj->catId = nilCatalogId;
- 		AssignDumpId(blobobj);
- 		blobobj->name = strdup("BLOBS");
- 
- 		blobcobj = (DumpableObject *) malloc(sizeof(DumpableObject));
- 		blobcobj->objType = DO_BLOB_COMMENTS;
- 		blobcobj->catId = nilCatalogId;
- 		AssignDumpId(blobcobj);
- 		blobcobj->name = strdup("BLOB COMMENTS");
- 		addObjectDependency(blobcobj, blobobj->dumpId);
- 	}
- 
  	/*
  	 * Collect dependency data to assist in ordering the objects.
  	 */
--- 696,704 ----
  			getTableDataFKConstraints();
  	}
  
! 	if (outputBlobs)
! 		getBlobs(g_fout);
  
  	/*
  	 * Collect dependency data to assist in ordering the objects.
  	 */
***************
*** 1932,1966 ****
  
  
  /*
!  * hasBlobs:
   *	Test whether database contains any large objects
   */
! static bool
! hasBlobs(Archive *AH)
  {
! 	bool		result;
  	const char *blobQry;
  	PGresult   *res;
  
  	/* Make sure we are in proper schema */
  	selectSourceSchema("pg_catalog");
  
  	/* Check for BLOB OIDs */
  	if (AH->remoteVersion >= 80500)
! 		blobQry = "SELECT oid FROM pg_largeobject_metadata LIMIT 1";
  	else if (AH->remoteVersion >= 70100)
! 		blobQry = "SELECT loid FROM pg_largeobject LIMIT 1";
  	else
! 		blobQry = "SELECT oid FROM pg_class WHERE relkind = 'l' LIMIT 1";
  
  	res = PQexec(g_conn, blobQry);
  	check_sql_result(res, g_conn, blobQry, PGRES_TUPLES_OK);
  
! 	result = PQntuples(res) > 0;
  
  	PQclear(res);
- 
- 	return result;
  }
  
  /*
--- 1916,1980 ----
  
  
  /*
!  * getBlobs:
   *	Test whether database contains any large objects
+  *  If any exist, it adds a BlobsInfo object for each owner
   */
! static void
! getBlobs(Archive *AH)
  {
! 	BlobsInfo  *blobobj;
! 	BlobsInfo  *blobcobj;
! 	BlobsInfo  *blobaobj;
  	const char *blobQry;
  	PGresult   *res;
+ 	int			i;
  
  	/* Make sure we are in proper schema */
  	selectSourceSchema("pg_catalog");
  
  	/* Check for BLOB OIDs */
  	if (AH->remoteVersion >= 80500)
! 		blobQry = "SELECT DISTINCT pg_get_userbyid(lomowner)"
! 			" FROM pg_largeobject_metadata";
  	else if (AH->remoteVersion >= 70100)
! 		blobQry = "SELECT NULL FROM pg_largeobject LIMIT 1";
  	else
! 		blobQry = "SELECT NULL FROM pg_class WHERE relkind = 'l' LIMIT 1";
  
  	res = PQexec(g_conn, blobQry);
  	check_sql_result(res, g_conn, blobQry, PGRES_TUPLES_OK);
  
! 	for (i = 0; i < PQntuples(res); i++)
! 	{
! 		blobobj = (BlobsInfo *) malloc(sizeof(BlobsInfo));
! 		blobobj->dobj.objType = DO_BLOBS;
! 		blobobj->dobj.catId = nilCatalogId;
! 		AssignDumpId(&blobobj->dobj);
! 		blobobj->dobj.name = strdup("BLOBS");
! 		blobobj->rolname = strdup(PQgetvalue(res, i, 0));
  
+ 		blobcobj = (BlobsInfo *) malloc(sizeof(BlobsInfo));
+ 		blobcobj->dobj.objType = DO_BLOB_COMMENTS;
+ 		blobcobj->dobj.catId = nilCatalogId;
+ 		AssignDumpId(&blobcobj->dobj);
+ 		blobcobj->dobj.name = strdup("BLOB COMMENTS");
+ 		blobcobj->rolname = strdup(PQgetvalue(res, i, 0));
+ 		addObjectDependency(&blobcobj->dobj, blobobj->dobj.dumpId);
+ 
+ 		if (AH->remoteVersion < 80500 || dataOnly || aclsSkip)
+ 			continue;
+ 
+ 		blobaobj = (BlobsInfo *) malloc(sizeof(BlobsInfo));
+ 		blobaobj->dobj.objType = DO_BLOB_ACLS;
+ 		blobaobj->dobj.catId = nilCatalogId;
+ 		AssignDumpId(&blobaobj->dobj);
+ 		blobaobj->dobj.name = strdup("BLOB ACLS");
+ 		blobaobj->rolname = strdup(PQgetvalue(res, i, 0));
+ 		addObjectDependency(&blobaobj->dobj, blobobj->dobj.dumpId);
+ 	}
+ 
  	PQclear(res);
  }
  
  /*
***************
*** 1970,1977 ****
  static int
  dumpBlobs(Archive *AH, void *arg)
  {
! 	const char *blobQry;
! 	const char *blobFetchQry;
  	PGresult   *res;
  	char		buf[LOBBUFSIZE];
  	int			i;
--- 1984,1993 ----
  static int
  dumpBlobs(Archive *AH, void *arg)
  {
! 	BlobsInfo  *binfo = (BlobsInfo *)arg;
! 	PQExpBuffer blobQry = createPQExpBuffer();
! 	const char *blobFetchQry = "FETCH 1000 IN bloboid";
! 	const char *blobCloseQry = "CLOSE bloboid";
  	PGresult   *res;
  	char		buf[LOBBUFSIZE];
  	int			i;
***************
*** 1985,2002 ****
  
  	/* Cursor to get all BLOB OIDs */
  	if (AH->remoteVersion >= 80500)
! 		blobQry = "DECLARE bloboid CURSOR FOR SELECT oid FROM pg_largeobject_metadata";
  	else if (AH->remoteVersion >= 70100)
! 		blobQry = "DECLARE bloboid CURSOR FOR SELECT DISTINCT loid FROM pg_largeobject";
  	else
! 		blobQry = "DECLARE bloboid CURSOR FOR SELECT oid FROM pg_class WHERE relkind = 'l'";
  
! 	res = PQexec(g_conn, blobQry);
! 	check_sql_result(res, g_conn, blobQry, PGRES_COMMAND_OK);
  
- 	/* Command to fetch from cursor */
- 	blobFetchQry = "FETCH 1000 IN bloboid";
- 
  	do
  	{
  		PQclear(res);
--- 2001,2020 ----
  
  	/* Cursor to get all BLOB OIDs */
  	if (AH->remoteVersion >= 80500)
! 		appendPQExpBuffer(blobQry, "DECLARE bloboid CURSOR FOR "
! 						  "SELECT oid, lomacl FROM pg_largeobject_metadata "
! 						  "WHERE pg_get_userbyid(lomowner) = '%s'",
! 						  binfo->rolname);
  	else if (AH->remoteVersion >= 70100)
! 		appendPQExpBuffer(blobQry, "DECLARE bloboid CURSOR FOR "
! 						  "SELECT DISTINCT loid, NULL FROM pg_largeobject");
  	else
! 		appendPQExpBuffer(blobQry, "DECLARE bloboid CURSOR FOR "
! 						  "SELECT oid, NULL FROM pg_class WHERE relkind = 'l'");
  
! 	res = PQexec(g_conn, blobQry->data);
! 	check_sql_result(res, g_conn, blobQry->data, PGRES_COMMAND_OK);
  
  	do
  	{
  		PQclear(res);
***************
*** 2045,2058 ****
  
  	PQclear(res);
  
  	return 1;
  }
  
  /*
   * dumpBlobComments
!  *	dump all blob properties.
!  *  It has "BLOB COMMENTS" tag due to the historical reason, but note
!  *  that it is the routine to dump all the properties of blobs.
   *
   * Since we don't provide any way to be selective about dumping blobs,
   * there's no need to be selective about their comments either.  We put
--- 2063,2078 ----
  
  	PQclear(res);
  
+ 	/* Cleanup cursor */
+ 	res = PQexec(g_conn, blobCloseQry);
+ 	check_sql_result(res, g_conn, blobCloseQry, PGRES_COMMAND_OK);
+ 
  	return 1;
  }
  
  /*
   * dumpBlobComments
!  *	dump all blob comments.
   *
   * Since we don't provide any way to be selective about dumping blobs,
   * there's no need to be selective about their comments either.  We put
***************
*** 2061,2069 ****
  static int
  dumpBlobComments(Archive *AH, void *arg)
  {
! 	const char *blobQry;
! 	const char *blobFetchQry;
  	PQExpBuffer cmdQry = createPQExpBuffer();
  	PGresult   *res;
  	int			i;
  
--- 2081,2091 ----
  static int
  dumpBlobComments(Archive *AH, void *arg)
  {
! 	char	   *rolname = ((BlobsInfo *)arg)->rolname;
  	PQExpBuffer cmdQry = createPQExpBuffer();
+ 	PQExpBuffer blobQry = createPQExpBuffer();
+ 	const char *blobFetchQry = "FETCH 100 IN blobcmt";
+ 	const char *blobCloseQry = "CLOSE blobcmt";
  	PGresult   *res;
  	int			i;
  
***************
*** 2075,2113 ****
  
  	/* Cursor to get all BLOB comments */
  	if (AH->remoteVersion >= 80500)
! 		blobQry = "DECLARE blobcmt CURSOR FOR SELECT oid, "
! 			"obj_description(oid, 'pg_largeobject'), "
! 			"pg_get_userbyid(lomowner), lomacl "
! 			"FROM pg_largeobject_metadata";
  	else if (AH->remoteVersion >= 70300)
! 		blobQry = "DECLARE blobcmt CURSOR FOR SELECT loid, "
! 			"obj_description(loid, 'pg_largeobject'), NULL, NULL "
! 			"FROM (SELECT DISTINCT loid FROM "
! 			"pg_description d JOIN pg_largeobject l ON (objoid = loid) "
! 			"WHERE classoid = 'pg_largeobject'::regclass) ss";
  	else if (AH->remoteVersion >= 70200)
! 		blobQry = "DECLARE blobcmt CURSOR FOR SELECT loid, "
! 			"obj_description(loid, 'pg_largeobject'), NULL, NULL "
! 			"FROM (SELECT DISTINCT loid FROM pg_largeobject) ss";
  	else if (AH->remoteVersion >= 70100)
! 		blobQry = "DECLARE blobcmt CURSOR FOR SELECT loid, "
! 			"obj_description(loid), NULL, NULL "
! 			"FROM (SELECT DISTINCT loid FROM pg_largeobject) ss";
  	else
! 		blobQry = "DECLARE blobcmt CURSOR FOR SELECT oid, "
! 			"	( "
! 			"		SELECT description "
! 			"		FROM pg_description pd "
! 			"		WHERE pd.objoid=pc.oid "
! 			"	), NULL, NULL "
! 			"FROM pg_class pc WHERE relkind = 'l'";
  
! 	res = PQexec(g_conn, blobQry);
! 	check_sql_result(res, g_conn, blobQry, PGRES_COMMAND_OK);
  
- 	/* Command to fetch from cursor */
- 	blobFetchQry = "FETCH 100 IN blobcmt";
- 
  	do
  	{
  		PQclear(res);
--- 2097,2137 ----
  
  	/* Cursor to get all BLOB comments */
  	if (AH->remoteVersion >= 80500)
! 		appendPQExpBuffer(blobQry,
! 						  "DECLARE blobcmt CURSOR FOR SELECT oid, "
! 						  "obj_description(oid, 'pg_largeobject') "
! 						  "FROM pg_largeobject_metadata "
! 						  "WHERE pg_get_userbyid(lomowner) = '%s'", rolname);
  	else if (AH->remoteVersion >= 70300)
! 		appendPQExpBuffer(blobQry,
! 						  "DECLARE blobcmt CURSOR FOR SELECT loid, "
! 						  "obj_description(loid, 'pg_largeobject') "
! 						  "FROM (SELECT DISTINCT loid FROM "
! 						  "pg_description d JOIN pg_largeobject l ON (objoid = loid) "
! 						  "WHERE classoid = 'pg_largeobject'::regclass) ss");
  	else if (AH->remoteVersion >= 70200)
! 		appendPQExpBuffer(blobQry,
! 						  "DECLARE blobcmt CURSOR FOR SELECT loid, "
! 						  "obj_description(loid, 'pg_largeobject') "
! 						  "FROM (SELECT DISTINCT loid FROM pg_largeobject) ss");
  	else if (AH->remoteVersion >= 70100)
! 		appendPQExpBuffer(blobQry,
! 						  "DECLARE blobcmt CURSOR FOR SELECT loid, "
! 						  "obj_description(loid) "
! 						  "FROM (SELECT DISTINCT loid FROM pg_largeobject) ss");
  	else
! 		appendPQExpBuffer(blobQry,
! 						  "DECLARE blobcmt CURSOR FOR SELECT oid, "
! 						  "	( "
! 						  "		SELECT description "
! 						  "		FROM pg_description pd "
! 						  "		WHERE pd.objoid=pc.oid "
! 						  "	) "
! 						  "FROM pg_class pc WHERE relkind = 'l'");
  
! 	res = PQexec(g_conn, blobQry->data);
! 	check_sql_result(res, g_conn, blobQry->data, PGRES_COMMAND_OK);
  
  	do
  	{
  		PQclear(res);
***************
*** 2121,2169 ****
  		{
  			Oid			blobOid = atooid(PQgetvalue(res, i, 0));
  			char	   *lo_comment = PQgetvalue(res, i, 1);
- 			char	   *lo_owner = PQgetvalue(res, i, 2);
- 			char	   *lo_acl = PQgetvalue(res, i, 3);
- 			char		lo_name[32];
  
  			resetPQExpBuffer(cmdQry);
  
! 			/* comment on the blob */
! 			if (!PQgetisnull(res, i, 1))
! 			{
! 				appendPQExpBuffer(cmdQry,
! 								  "COMMENT ON LARGE OBJECT %u IS ", blobOid);
! 				appendStringLiteralAH(cmdQry, lo_comment, AH);
! 				appendPQExpBuffer(cmdQry, ";\n");
! 			}
  
! 			/* dump blob ownership, if necessary */
! 			if (!PQgetisnull(res, i, 2))
! 			{
! 				appendPQExpBuffer(cmdQry,
! 								  "ALTER LARGE OBJECT %u OWNER TO %s;\n",
! 								  blobOid, lo_owner);
! 			}
  
! 			/* dump blob privileges, if necessary */
! 			if (!PQgetisnull(res, i, 3) &&
! 				!dataOnly && !aclsSkip)
! 			{
! 				snprintf(lo_name, sizeof(lo_name), "%u", blobOid);
! 				if (!buildACLCommands(lo_name, NULL, "LARGE OBJECT",
! 									  lo_acl, lo_owner, "",
! 									  AH->remoteVersion, cmdQry))
! 				{
! 					write_msg(NULL, "could not parse ACL (%s) for "
! 							  "large object %u", lo_acl, blobOid);
! 					exit_nicely();
! 				}
! 			}
  
! 			if (cmdQry->len > 0)
  			{
! 				appendPQExpBuffer(cmdQry, "\n");
! 				archputs(cmdQry->data, AH);
  			}
  		}
  	} while (PQntuples(res) > 0);
  
--- 2145,2244 ----
  		{
  			Oid			blobOid = atooid(PQgetvalue(res, i, 0));
  			char	   *lo_comment = PQgetvalue(res, i, 1);
  
+ 			/* comment on the blob, if necessary */
+ 			if (PQgetisnull(res, i, 1))
+ 				continue;
+ 
  			resetPQExpBuffer(cmdQry);
  
! 			appendPQExpBuffer(cmdQry,
! 							  "COMMENT ON LARGE OBJECT %u IS ", blobOid);
! 			appendStringLiteralAH(cmdQry, lo_comment, AH);
! 			appendPQExpBuffer(cmdQry, ";\n\n");
  
! 			archputs(cmdQry->data, AH);
! 		}
! 	} while (PQntuples(res) > 0);
  
! 	PQclear(res);
  
! 	archputs("\n", AH);
! 
! 	destroyPQExpBuffer(cmdQry);
! 
! 	/* Cleanup cursor */
! 	res = PQexec(g_conn, blobCloseQry);
! 	check_sql_result(res, g_conn, blobCloseQry, PGRES_COMMAND_OK);
! 
! 	return 1;
! }
! 
! /*
!  * dumpBlobAcls
!  *	dump all blob privileges.
!  *
!  * Since we don't provide any way to be selective about dumping blobs,
!  * there's no need to be selective about their privileges either.
!  * We put all the privileges into one big TOC entry.
!  */
! static int
! dumpBlobAcls(Archive *AH, void *arg)
! {
! 	char	   *rolname = ((BlobsInfo *)arg)->rolname;
! 	PQExpBuffer	cmdQry = createPQExpBuffer();
! 	PQExpBuffer blobQry = createPQExpBuffer();
! 	const char *blobFetchQry = "FETCH 100 IN blobacl";
! 	const char *blobCloseQry = "CLOSE blobacl";
! 	PGresult   *res;
! 	int			i;
! 
! 	if (g_verbose)
! 		write_msg(NULL, "saving large object permissions\n");
! 
! 	/* Cursor to get all BLOB permissions */
! 	if (AH->remoteVersion >= 80500)
! 		appendPQExpBuffer(blobQry, "DECLARE blobacl CURSOR FOR "
! 						  "SELECT oid, lomacl FROM pg_largeobject_metadata "
! 						  "WHERE pg_get_userbyid(lomowner) = '%s'", rolname);
! 	else
! 		return 1;	/* no need to dump anything < 8.5 */
! 
! 	/* Make sure we are in proper schema */
!     selectSourceSchema("pg_catalog");
! 
! 	/* Open cursor */
! 	res = PQexec(g_conn, blobQry->data);
!     check_sql_result(res, g_conn, blobQry->data, PGRES_COMMAND_OK);
! 
! 	do
! 	{
! 		PQclear(res);
! 
! 		/* Do a fetch */
! 		res = PQexec(g_conn, blobFetchQry);
! 		check_sql_result(res, g_conn, blobFetchQry, PGRES_TUPLES_OK);
! 
! 		/* Process the tuples, if any */
! 		for (i = 0; i < PQntuples(res); i++)
! 		{
! 			char   *lo_oid = PQgetvalue(res, i, 0);
! 			char   *lo_acl = PQgetvalue(res, i, 1);
! 
! 			if (PQgetisnull(res, i, 1))
! 				continue;
! 
! 			resetPQExpBuffer(cmdQry);
! 
! 			if (!buildACLCommands(lo_oid, NULL, "LARGE OBJECT",
! 								  lo_acl, rolname, "",
! 								  AH->remoteVersion, cmdQry))
  			{
! 				write_msg(NULL, "could not parse ACL (%s) for "
! 						  "large object %s", lo_acl, lo_oid);
! 				exit_nicely();
  			}
+ 			archprintf(AH, "%s\n", cmdQry->data);
  		}
  	} while (PQntuples(res) > 0);
  
***************
*** 2173,2178 ****
--- 2248,2257 ----
  
  	destroyPQExpBuffer(cmdQry);
  
+ 	/* Cleanup */
+ 	res = PQexec(g_conn, blobCloseQry);
+ 	check_sql_result(res, g_conn, blobCloseQry, PGRES_COMMAND_OK);
+ 
  	return 1;
  }
  
***************
*** 6292,6311 ****
  			break;
  		case DO_BLOBS:
  			ArchiveEntry(fout, dobj->catId, dobj->dumpId,
! 						 dobj->name, NULL, NULL, "",
  						 false, "BLOBS", SECTION_DATA,
  						 "", "", NULL,
  						 dobj->dependencies, dobj->nDeps,
! 						 dumpBlobs, NULL);
  			break;
  		case DO_BLOB_COMMENTS:
  			ArchiveEntry(fout, dobj->catId, dobj->dumpId,
! 						 dobj->name, NULL, NULL, "",
  						 false, "BLOB COMMENTS", SECTION_DATA,
  						 "", "", NULL,
  						 dobj->dependencies, dobj->nDeps,
! 						 dumpBlobComments, NULL);
  			break;
  	}
  }
  
--- 6371,6401 ----
  			break;
  		case DO_BLOBS:
  			ArchiveEntry(fout, dobj->catId, dobj->dumpId,
! 						 dobj->name, NULL, NULL,
! 						 ((BlobsInfo *) dobj)->rolname,
  						 false, "BLOBS", SECTION_DATA,
  						 "", "", NULL,
  						 dobj->dependencies, dobj->nDeps,
! 						 dumpBlobs, dobj);
  			break;
  		case DO_BLOB_COMMENTS:
  			ArchiveEntry(fout, dobj->catId, dobj->dumpId,
! 						 dobj->name, NULL, NULL,
! 						 ((BlobsInfo *) dobj)->rolname,
  						 false, "BLOB COMMENTS", SECTION_DATA,
  						 "", "", NULL,
  						 dobj->dependencies, dobj->nDeps,
! 						 dumpBlobComments, dobj);
  			break;
+ 		case DO_BLOB_ACLS:
+ 			ArchiveEntry(fout, dobj->catId, dobj->dumpId,
+ 						 dobj->name, NULL, NULL,
+ 						 ((BlobsInfo *) dobj)->rolname,
+ 						 false, "BLOB ACLS", SECTION_DATA,
+ 						 "", "", NULL,
+ 						 dobj->dependencies, dobj->nDeps,
+ 						 dumpBlobAcls, dobj);
+ 			break;
  	}
  }
  
#68Robert Haas
robertmhaas@gmail.com
In reply to: KaiGai Kohei (#67)
Re: Largeobject Access Controls (r2460)

2009/12/22 KaiGai Kohei <kaigai@ak.jp.nec.com>:

(2009/12/21 9:39), KaiGai Kohei wrote:

(2009/12/19 12:05), Robert Haas wrote:

On Fri, Dec 18, 2009 at 9:48 PM, Tom Lane<tgl@sss.pgh.pa.us>   wrote:

Robert Haas<robertmhaas@gmail.com>   writes:

Oh.  This is more complicated than it appeared on the surface.  It
seems that the string "BLOB COMMENTS" actually gets inserted into
custom dumps somewhere, so I'm not sure whether we can just change it.
   Was this issue discussed at some point before this was committed?
Changing it would seem to require inserting some backward
compatibility code here.  Another option would be to add a separate
section for "BLOB METADATA", and leave "BLOB COMMENTS" alone.  Can
anyone comment on what the Right Thing To Do is here?

The BLOB COMMENTS label is, or was, correct for what it contained.
If this patch has usurped it to contain other things

It has.

I would argue
that that is seriously wrong.  pg_dump already has a clear notion
of how to handle ACLs for objects.  ACLs for blobs ought to be
made to fit into that structure, not dumped in some random place
because that saved a few lines of code.

OK.  Hopefully KaiGai or Takahiro can suggest a fix.

The patch has grown larger than I expected, because large objects are
handled quite differently from every other object class.

Here are three points:

1) A new BLOB ACLS section was added.

It is a single-purpose section holding the GRANT/REVOKE statements
on large objects, and the BLOB COMMENTS section was reverted to
holding only descriptions.

Because we have to assume the database may hold a massive number of
large objects, it is not reasonable to store them using dumpACL(),
which chains each ACL entry onto the list of TOC entries before they
are dumped. That would force pg_dump to keep a massive number of
large objects in local memory.

Currently we also store GRANT/REVOKE statements in the BLOB COMMENTS
section, which is confusing: even if pg_restore is launched with the
--no-privileges option, it cannot skip the GRANT/REVOKE statements
on large objects. This fix distinguishes ACLs on large objects from
their other properties, so they can be handled correctly.

2) The BLOBS section is now split per database user.

Currently the BLOBS section carries no information about the owner
of the large objects to be restored, so we tried to fix up ownership
in the BLOB COMMENTS section, which was incorrect.

The --use-set-session-authorization option requires restoring the
ownership of objects without ALTER ... OWNER TO statements, so the
correct database username has to be recorded in the section
properties.

This patch renames hasBlobs() to getBlobs() and changes its purpose:
it registers DO_BLOBS, DO_BLOB_COMMENTS and DO_BLOB_ACLS entries for
each owner of large objects, as needed.
For example, if five users own large objects, getBlobs() registers
the TOC entries for each of those users, and dumpBlobs(),
dumpBlobComments() and dumpBlobAcls() are each invoked five times
with the corresponding username.

3) _LoadBlobs()

For regular object classes, _printTocEntry() can inject an
"ALTER xxx OWNER TO ..." statement for the restored object based on
the ownership recorded in the section header. We cannot use that
infrastructure for large objects as-is, because one BLOBS section
can restore multiple large objects.

_LoadBlobs() is the routine that restores the large objects within a
section. This patch modifies it to inject an "ALTER LARGE OBJECT
<loid> OWNER TO <user>" statement for each large object, based on
the ownership of the section (if --use-set-session-authorization is
not given).
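
For illustration only (the OID, role and data here are invented, not
taken from the patch), the text-mode output for one blob restored from
a BLOBS section owned by alice would then look roughly like this:

    SELECT pg_catalog.lo_open(pg_catalog.lo_create('16401'), 131072);
                                          -- 131072 is INV_WRITE
    SELECT pg_catalog.lowrite(0, 'abc');  -- 0 is the descriptor just opened
    SELECT pg_catalog.lo_close(0);
    ALTER LARGE OBJECT 16401 OWNER TO alice;  -- the injected statement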

$ diffstat pgsql-fix-pg_dump-blob-privs.patch
 pg_backup_archiver.c |    4
 pg_backup_custom.c   |   11 !
 pg_backup_files.c    |    9 !
 pg_backup_tar.c      |    9 !
 pg_dump.c            |  312 +++++++----!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
 pg_dump.h            |    9 !
 pg_dump_sort.c       |    8 !
 7 files changed, 68 insertions(+), 25 deletions(-), 269 modifications(!)

I will review this sooner if I have time, but please make sure it gets
added to the next CommitFest so we don't lose it. I think it also
needs to be added here, since AFAICS this is a must-fix for 8.5.

http://wiki.postgresql.org/wiki/PostgreSQL_8.5_Open_Items

Thanks,

...Robert

#69Takahiro Itagaki
itagaki.takahiro@oss.ntt.co.jp
In reply to: KaiGai Kohei (#67)
Re: Largeobject Access Controls (r2460)

KaiGai Kohei <kaigai@ak.jp.nec.com> wrote:

This patch renames hasBlobs() to getBlobs() and changes its purpose:
it registers DO_BLOBS, DO_BLOB_COMMENTS and DO_BLOB_ACLS entries for
each owner of large objects, as needed.

This patch adds the DumpableObjectType DO_BLOB_ACLS and the struct
BlobsInfo. We use three BlobsInfo objects, for DO_BLOBS,
DO_BLOB_COMMENTS, and DO_BLOB_ACLS, _for each distinct owner_ of
large objects. So even if the database holds many large objects, we
keep at most (3 * number-of-roles) BlobsInfo structs in memory. For
older server versions we assume all blobs are owned by a single user
with an empty name, and if there are no large objects there is no
BlobsInfo at all.

I'm not sure whether we need to group them by owner of the large
objects. If I remember right, the primary issue was separating the
routines that dump BLOB ACLS from those that dump BLOB COMMENTS,
right? Why did you make the change?

Another concern is the confusing identifier names -- getBlobs()
returns a BlobsInfo for each owner. Could we rename them to
something like getBlobOwners() and BlobOwnerInfo?

Also, DumpableObject.name is not used in BlobsInfo. We could reuse
DumpableObject.name instead of the "rolname" field in BlobsInfo.

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center

#70KaiGai Kohei
kaigai@ak.jp.nec.com
In reply to: Takahiro Itagaki (#69)
Re: Largeobject Access Controls (r2460)

(2010/01/21 16:52), Takahiro Itagaki wrote:

KaiGai Kohei<kaigai@ak.jp.nec.com> wrote:

This patch renames hasBlobs() to getBlobs() and changes its purpose:
it registers DO_BLOBS, DO_BLOB_COMMENTS and DO_BLOB_ACLS entries for
each owner of large objects, as needed.

This patch adds the DumpableObjectType DO_BLOB_ACLS and the struct
BlobsInfo. We use three BlobsInfo objects, for DO_BLOBS,
DO_BLOB_COMMENTS, and DO_BLOB_ACLS, _for each distinct owner_ of
large objects. So even if the database holds many large objects, we
keep at most (3 * number-of-roles) BlobsInfo structs in memory. For
older server versions we assume all blobs are owned by a single user
with an empty name, and if there are no large objects there is no
BlobsInfo at all.

I'm not sure whether we need to group them by owner of the large
objects. If I remember right, the primary issue was separating the
routines that dump BLOB ACLS from those that dump BLOB COMMENTS,
right? Why did you make the change?

When --use-set-session-authorization is specified, pg_restore
changes the database role of the current session just before
creating the objects to be restored.

The ownership of database objects is recorded in the section header,
which tells pg_restore who should own the objects restored in that
section.

pg_restore can then generate ALTER xxx OWNER TO after creation, or
SET SESSION AUTHORIZATION before creation, at run time. So we cannot
put the creation of large objects with different owners into the
same section.

That is why we have to group large objects by database user.
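
As a sketch with an invented OID and role, the two restore styles
come down to:

    -- default: create the object, then reassign its ownership
    SELECT pg_catalog.lo_create('16401');
    ALTER LARGE OBJECT 16401 OWNER TO alice;

    -- with --use-set-session-authorization: become the owner first
    SET SESSION AUTHORIZATION alice;
    SELECT pg_catalog.lo_create('16401');
    RESET SESSION AUTHORIZATION;

Because the section header records a single owner, a section that
mixed blobs of several owners could not be restored correctly in the
second style.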

Another concern is the confusing identifier names -- getBlobs()
returns a BlobsInfo for each owner. Could we rename them to
something like getBlobOwners() and BlobOwnerInfo?

OK, I'll do that.

Also, DumpableObject.name is not used in BlobsInfo. We could reuse
DumpableObject.name instead of the "rolname" field in BlobsInfo.

Isn't that more confusing for future hackers? Having the field here
follows the convention of the other database object classes.

Thanks,
--
OSS Platform Development Division, NEC
KaiGai Kohei <kaigai@ak.jp.nec.com>

#71Takahiro Itagaki
itagaki.takahiro@oss.ntt.co.jp
In reply to: KaiGai Kohei (#70)
Re: Largeobject Access Controls (r2460)

KaiGai Kohei <kaigai@ak.jp.nec.com> wrote:

I'm not sure whether we need to group them by owner of the large
objects. If I remember right, the primary issue was separating the
routines that dump BLOB ACLS from those that dump BLOB COMMENTS,
right? Why did you make the change?

When --use-set-session-authorization is specified, pg_restore
changes the database role of the current session just before
creating the objects to be restored.

The ownership of database objects is recorded in the section header,
which tells pg_restore who should own the objects restored in that
section.

pg_restore can then generate ALTER xxx OWNER TO after creation, or
SET SESSION AUTHORIZATION before creation, at run time. So we cannot
put the creation of large objects with different owners into the
same section.

That is why we have to group large objects by database user.

Ah, I see.

Then... what happens if roles owning large objects are dropped or
renamed while pg_dump is running? Does the patch still work? It uses
pg_get_userbyid().

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center

#72KaiGai Kohei
kaigai@ak.jp.nec.com
In reply to: Takahiro Itagaki (#71)
Re: Largeobject Access Controls (r2460)

(2010/01/21 19:42), Takahiro Itagaki wrote:

KaiGai Kohei<kaigai@ak.jp.nec.com> wrote:

I'm not sure whether we need to group them by owner of the large
objects. If I remember right, the primary issue was separating the
routines that dump BLOB ACLS from those that dump BLOB COMMENTS,
right? Why did you make the change?

When --use-set-session-authorization is specified, pg_restore
changes the database role of the current session just before
creating the objects to be restored.

The ownership of database objects is recorded in the section header,
which tells pg_restore who should own the objects restored in that
section.

pg_restore can then generate ALTER xxx OWNER TO after creation, or
SET SESSION AUTHORIZATION before creation, at run time. So we cannot
put the creation of large objects with different owners into the
same section.

That is why we have to group large objects by database user.

Ah, I see.

Then... what happens if roles owning large objects are dropped or
renamed while pg_dump is running? Does the patch still work? It uses
pg_get_userbyid().

Indeed, pg_get_userbyid() always uses SnapshotNow internally, so it
can return "unknown (OID=%u)" in that case.

We should use "username_subquery" here instead.

Thanks,
--
OSS Platform Development Division, NEC
KaiGai Kohei <kaigai@ak.jp.nec.com>

#73KaiGai Kohei
kaigai@ak.jp.nec.com
In reply to: Takahiro Itagaki (#71)
1 attachment(s)
Re: Largeobject Access Controls (r2460)

The attached patch is a revised version.

List of updates:
- cleanup: getBlobs() was renamed to getBlobOwners()
- cleanup: BlobsInfo was renamed to BlobOwnerInfo
- bugfix: pg_get_userbyid() in the SQL queries was replaced by
username_subquery, which contains the right subquery to obtain the
username for a given OID. This lets pg_dump run correctly under
concurrent role deletion.
- bugfix: even when -a (--data-only) or -O (--no-owner) was given,
or the archive contained no owner information, it tried to write out
an "ALTER LARGE OBJECT xxx OWNER TO ..." statement.
- bugfix: even when -a (--data-only) or -x (--no-privileges) was
given, it tried to write out the "BLOB ACLS" section.

The last two are problems I noticed only recently, so let me explain them.

The BLOBS section can contain multiple definitions of large objects,
unlike any other object class; that is also a reason why I had to
group large objects by database user.
The Owner tag of the BLOBS section identifies the owner of the large
objects to be restored, and is also used in
--use-set-session-authorization mode.
However, we need to inject an "ALTER LARGE OBJECT xxx OWNER TO ..."
statement for each lo_create() in _LoadBlobs(), because we cannot
know how many large objects are in the section before reading the
archive. But the last patch did not take the -a/-O options into
account, nor an archive that has no Owner: tag.

The BLOB ACLS section is categorized as SECTION_DATA, following the
BLOB COMMENTS behavior. For the same reason it has to handle the
-a/-x options by itself, but the last patch didn't handle that well.

BTW, here is a known issue. When we run pg_dump with -s
(--schema-only), it writes out the descriptions of regular object
classes, but not the descriptions of large objects.
It seems the descriptions of large objects are treated as part of
the data rather than as properties, which may be inconsistent.

The reason for this behavior is that all the BLOB dumps are
categorized as SECTION_DATA, so the -s option makes
pg_backup_archiver.c skip the routines related to large objects.

However, it may be time to split this code into two steps, as
sketched below.
In the schema section stage:
- It creates empty large objects with lo_create().
- It restores the descriptions of the large objects.
- It restores the ownership/privileges of the large objects.

In the data section stage:
- It loads the actual contents into the empty large objects with lowrite().
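
As a sketch, with an invented OID, roles, comment and data, the SQL
emitted for one large object could then look like this:

    -- schema section stage: create the empty blob with its properties
    SELECT pg_catalog.lo_create('16401');
    COMMENT ON LARGE OBJECT 16401 IS 'sample document';
    ALTER LARGE OBJECT 16401 OWNER TO alice;
    GRANT SELECT ON LARGE OBJECT 16401 TO bob;

    -- data section stage: load the contents
    SELECT pg_catalog.lo_open('16401', 131072);  -- 131072 is INV_WRITE
    SELECT pg_catalog.lowrite(0, 'abc');         -- 0: descriptor just opened
    SELECT pg_catalog.lo_close(0);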

Thanks,

(2010/01/21 19:42), Takahiro Itagaki wrote:

KaiGai Kohei<kaigai@ak.jp.nec.com> wrote:

I'm not sure whether we need to group them by owner of the large
objects. If I remember right, the primary issue was separating the
routines that dump BLOB ACLS from those that dump BLOB COMMENTS,
right? Why did you make the change?

When --use-set-session-authorization is specified, pg_restore
changes the database role of the current session just before
creating the objects to be restored.

The ownership of database objects is recorded in the section header,
which tells pg_restore who should own the objects restored in that
section.

pg_restore can then generate ALTER xxx OWNER TO after creation, or
SET SESSION AUTHORIZATION before creation, at run time. So we cannot
put the creation of large objects with different owners into the
same section.

That is why we have to group large objects by database user.

Ah, I see.

Then... what happens if roles owning large objects are dropped or
renamed while pg_dump is running? Does the patch still work? It uses
pg_get_userbyid().

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center

--
OSS Platform Development Division, NEC
KaiGai Kohei <kaigai@ak.jp.nec.com>

Attachments:

pgsql-fix-pg_dump-blob-privs.2.patch (text/x-patch)
*** a/src/bin/pg_dump/pg_backup_archiver.c
--- b/src/bin/pg_dump/pg_backup_archiver.c
***************
*** 517,527 **** restore_toc_entry(ArchiveHandle *AH, TocEntry *te,
  			 */
  			if (AH->PrintTocDataPtr !=NULL && (reqs & REQ_DATA) != 0)
  			{
- 				_printTocEntry(AH, te, ropt, true, false);
- 
  				if (strcmp(te->desc, "BLOBS") == 0 ||
! 					strcmp(te->desc, "BLOB COMMENTS") == 0)
  				{
  					ahlog(AH, 1, "restoring %s\n", te->desc);
  
  					_selectOutputSchema(AH, "pg_catalog");
--- 517,535 ----
  			 */
  			if (AH->PrintTocDataPtr !=NULL && (reqs & REQ_DATA) != 0)
  			{
  				if (strcmp(te->desc, "BLOBS") == 0 ||
! 					strcmp(te->desc, "BLOB COMMENTS") == 0 ||
! 					strcmp(te->desc, "BLOB ACLS") == 0)
  				{
+ 					/*
+ 					 * We don't need to dump ACLs, if -a or -x cases.
+ 					 */
+ 					if (strcmp(te->desc, "BLOB ACLS") == 0 &&
+ 						(ropt->aclsSkip || ropt->dataOnly))
+ 						return retval;
+ 
+ 					_printTocEntry(AH, te, ropt, true, false);
+ 
  					ahlog(AH, 1, "restoring %s\n", te->desc);
  
  					_selectOutputSchema(AH, "pg_catalog");
***************
*** 530,535 **** restore_toc_entry(ArchiveHandle *AH, TocEntry *te,
--- 538,545 ----
  				}
  				else
  				{
+ 					_printTocEntry(AH, te, ropt, true, false);
+ 
  					_disableTriggersIfNecessary(AH, te, ropt);
  
  					/* Select owner and schema as necessary */
*** a/src/bin/pg_dump/pg_backup_custom.c
--- b/src/bin/pg_dump/pg_backup_custom.c
***************
*** 54,60 **** static void _StartBlobs(ArchiveHandle *AH, TocEntry *te);
  static void _StartBlob(ArchiveHandle *AH, TocEntry *te, Oid oid);
  static void _EndBlob(ArchiveHandle *AH, TocEntry *te, Oid oid);
  static void _EndBlobs(ArchiveHandle *AH, TocEntry *te);
! static void _LoadBlobs(ArchiveHandle *AH, bool drop);
  static void _Clone(ArchiveHandle *AH);
  static void _DeClone(ArchiveHandle *AH);
  
--- 54,60 ----
  static void _StartBlob(ArchiveHandle *AH, TocEntry *te, Oid oid);
  static void _EndBlob(ArchiveHandle *AH, TocEntry *te, Oid oid);
  static void _EndBlobs(ArchiveHandle *AH, TocEntry *te);
! static void _LoadBlobs(ArchiveHandle *AH, TocEntry *te, RestoreOptions *ropt);
  static void _Clone(ArchiveHandle *AH);
  static void _DeClone(ArchiveHandle *AH);
  
***************
*** 498,504 **** _PrintTocData(ArchiveHandle *AH, TocEntry *te, RestoreOptions *ropt)
  			break;
  
  		case BLK_BLOBS:
! 			_LoadBlobs(AH, ropt->dropSchema);
  			break;
  
  		default:				/* Always have a default */
--- 498,504 ----
  			break;
  
  		case BLK_BLOBS:
! 			_LoadBlobs(AH, te, ropt);
  			break;
  
  		default:				/* Always have a default */
***************
*** 619,625 **** _PrintData(ArchiveHandle *AH)
  }
  
  static void
! _LoadBlobs(ArchiveHandle *AH, bool drop)
  {
  	Oid			oid;
  
--- 619,625 ----
  }
  
  static void
! _LoadBlobs(ArchiveHandle *AH, TocEntry *te, RestoreOptions *ropt)
  {
  	Oid			oid;
  
***************
*** 628,636 **** _LoadBlobs(ArchiveHandle *AH, bool drop)
  	oid = ReadInt(AH);
  	while (oid != 0)
  	{
! 		StartRestoreBlob(AH, oid, drop);
  		_PrintData(AH);
  		EndRestoreBlob(AH, oid);
  		oid = ReadInt(AH);
  	}
  
--- 628,640 ----
  	oid = ReadInt(AH);
  	while (oid != 0)
  	{
! 		StartRestoreBlob(AH, oid, ropt->dropSchema);
  		_PrintData(AH);
  		EndRestoreBlob(AH, oid);
+ 		if (!ropt->noOwner && !ropt->dataOnly &&
+ 			!ropt->use_setsessauth && strlen(te->owner) > 0)
+ 			ahprintf(AH, "ALTER LARGE OBJECT %u OWNER TO %s;\n\n",
+ 					 oid, te->owner);
  		oid = ReadInt(AH);
  	}
  
*** a/src/bin/pg_dump/pg_backup_files.c
--- b/src/bin/pg_dump/pg_backup_files.c
***************
*** 66,72 **** typedef struct
  } lclTocEntry;
  
  static const char *modulename = gettext_noop("file archiver");
! static void _LoadBlobs(ArchiveHandle *AH, RestoreOptions *ropt);
  static void _getBlobTocEntry(ArchiveHandle *AH, Oid *oid, char *fname);
  
  /*
--- 66,72 ----
  } lclTocEntry;
  
  static const char *modulename = gettext_noop("file archiver");
! static void _LoadBlobs(ArchiveHandle *AH, TocEntry *te, RestoreOptions *ropt);
  static void _getBlobTocEntry(ArchiveHandle *AH, Oid *oid, char *fname);
  
  /*
***************
*** 330,336 **** _PrintTocData(ArchiveHandle *AH, TocEntry *te, RestoreOptions *ropt)
  		return;
  
  	if (strcmp(te->desc, "BLOBS") == 0)
! 		_LoadBlobs(AH, ropt);
  	else
  		_PrintFileData(AH, tctx->filename, ropt);
  }
--- 330,336 ----
  		return;
  
  	if (strcmp(te->desc, "BLOBS") == 0)
! 		_LoadBlobs(AH, te, ropt);
  	else
  		_PrintFileData(AH, tctx->filename, ropt);
  }
***************
*** 365,371 **** _getBlobTocEntry(ArchiveHandle *AH, Oid *oid, char fname[K_STD_BUF_SIZE])
  }
  
  static void
! _LoadBlobs(ArchiveHandle *AH, RestoreOptions *ropt)
  {
  	Oid			oid;
  	lclContext *ctx = (lclContext *) AH->formatData;
--- 365,371 ----
  }
  
  static void
! _LoadBlobs(ArchiveHandle *AH, TocEntry *te, RestoreOptions *ropt)
  {
  	Oid			oid;
  	lclContext *ctx = (lclContext *) AH->formatData;
***************
*** 385,390 **** _LoadBlobs(ArchiveHandle *AH, RestoreOptions *ropt)
--- 385,394 ----
  		StartRestoreBlob(AH, oid, ropt->dropSchema);
  		_PrintFileData(AH, fname, ropt);
  		EndRestoreBlob(AH, oid);
+ 		if (!ropt->noOwner && !ropt->dataOnly &&
+ 			!ropt->use_setsessauth && strlen(te->owner) > 0)
+ 			ahprintf(AH, "ALTER LARGE OBJECT %u OWNER TO %s;\n\n",
+ 					 oid, te->owner);
  		_getBlobTocEntry(AH, &oid, fname);
  	}
  
*** a/src/bin/pg_dump/pg_backup_tar.c
--- b/src/bin/pg_dump/pg_backup_tar.c
***************
*** 100,106 **** typedef struct
  
  static const char *modulename = gettext_noop("tar archiver");
  
! static void _LoadBlobs(ArchiveHandle *AH, RestoreOptions *ropt);
  
  static TAR_MEMBER *tarOpen(ArchiveHandle *AH, const char *filename, char mode);
  static void tarClose(ArchiveHandle *AH, TAR_MEMBER *TH);
--- 100,106 ----
  
  static const char *modulename = gettext_noop("tar archiver");
  
! static void _LoadBlobs(ArchiveHandle *AH, TocEntry *te, RestoreOptions *ropt);
  
  static TAR_MEMBER *tarOpen(ArchiveHandle *AH, const char *filename, char mode);
  static void tarClose(ArchiveHandle *AH, TAR_MEMBER *TH);
***************
*** 696,708 **** _PrintTocData(ArchiveHandle *AH, TocEntry *te, RestoreOptions *ropt)
  	}
  
  	if (strcmp(te->desc, "BLOBS") == 0)
! 		_LoadBlobs(AH, ropt);
  	else
  		_PrintFileData(AH, tctx->filename, ropt);
  }
  
  static void
! _LoadBlobs(ArchiveHandle *AH, RestoreOptions *ropt)
  {
  	Oid			oid;
  	lclContext *ctx = (lclContext *) AH->formatData;
--- 696,708 ----
  	}
  
  	if (strcmp(te->desc, "BLOBS") == 0)
! 		_LoadBlobs(AH, te, ropt);
  	else
  		_PrintFileData(AH, tctx->filename, ropt);
  }
  
  static void
! _LoadBlobs(ArchiveHandle *AH, TocEntry *te, RestoreOptions *ropt)
  {
  	Oid			oid;
  	lclContext *ctx = (lclContext *) AH->formatData;
***************
*** 733,738 **** _LoadBlobs(ArchiveHandle *AH, RestoreOptions *ropt)
--- 733,742 ----
  					ahwrite(buf, 1, cnt, AH);
  				}
  				EndRestoreBlob(AH, oid);
+ 				if (!ropt->noOwner && !ropt->dataOnly &&
+ 					!ropt->use_setsessauth && strlen(te->owner) > 0)
+ 					ahprintf(AH, "ALTER LARGE OBJECT %u OWNER TO %s;\n\n",
+ 							 oid, te->owner);
  				foundBlob = true;
  			}
  			tarClose(AH, th);
*** a/src/bin/pg_dump/pg_dump.c
--- b/src/bin/pg_dump/pg_dump.c
***************
*** 190,198 **** static void selectSourceSchema(const char *schemaName);
  static char *getFormattedTypeName(Oid oid, OidOptions opts);
  static char *myFormatType(const char *typname, int32 typmod);
  static const char *fmtQualifiedId(const char *schema, const char *id);
! static bool hasBlobs(Archive *AH);
  static int	dumpBlobs(Archive *AH, void *arg);
  static int	dumpBlobComments(Archive *AH, void *arg);
  static void dumpDatabase(Archive *AH);
  static void dumpEncoding(Archive *AH);
  static void dumpStdStrings(Archive *AH);
--- 190,199 ----
  static char *getFormattedTypeName(Oid oid, OidOptions opts);
  static char *myFormatType(const char *typname, int32 typmod);
  static const char *fmtQualifiedId(const char *schema, const char *id);
! static void getBlobOwners(Archive *AH);
  static int	dumpBlobs(Archive *AH, void *arg);
  static int	dumpBlobComments(Archive *AH, void *arg);
+ static int	dumpBlobAcls(Archive *AH, void *arg);
  static void dumpDatabase(Archive *AH);
  static void dumpEncoding(Archive *AH);
  static void dumpStdStrings(Archive *AH);
***************
*** 701,725 **** main(int argc, char **argv)
  			getTableDataFKConstraints();
  	}
  
! 	if (outputBlobs && hasBlobs(g_fout))
! 	{
! 		/* Add placeholders to allow correct sorting of blobs */
! 		DumpableObject *blobobj;
! 		DumpableObject *blobcobj;
! 
! 		blobobj = (DumpableObject *) malloc(sizeof(DumpableObject));
! 		blobobj->objType = DO_BLOBS;
! 		blobobj->catId = nilCatalogId;
! 		AssignDumpId(blobobj);
! 		blobobj->name = strdup("BLOBS");
! 
! 		blobcobj = (DumpableObject *) malloc(sizeof(DumpableObject));
! 		blobcobj->objType = DO_BLOB_COMMENTS;
! 		blobcobj->catId = nilCatalogId;
! 		AssignDumpId(blobcobj);
! 		blobcobj->name = strdup("BLOB COMMENTS");
! 		addObjectDependency(blobcobj, blobobj->dumpId);
! 	}
  
  	/*
  	 * Collect dependency data to assist in ordering the objects.
--- 702,709 ----
  			getTableDataFKConstraints();
  	}
  
! 	if (outputBlobs)
! 		getBlobOwners(g_fout);
  
  	/*
  	 * Collect dependency data to assist in ordering the objects.
***************
*** 1938,1972 **** dumpStdStrings(Archive *AH)
  
  
  /*
!  * hasBlobs:
   *	Test whether database contains any large objects
   */
! static bool
! hasBlobs(Archive *AH)
  {
! 	bool		result;
! 	const char *blobQry;
! 	PGresult   *res;
  
  	/* Make sure we are in proper schema */
  	selectSourceSchema("pg_catalog");
  
  	/* Check for BLOB OIDs */
  	if (AH->remoteVersion >= 80500)
! 		blobQry = "SELECT oid FROM pg_largeobject_metadata LIMIT 1";
  	else if (AH->remoteVersion >= 70100)
! 		blobQry = "SELECT loid FROM pg_largeobject LIMIT 1";
  	else
! 		blobQry = "SELECT oid FROM pg_class WHERE relkind = 'l' LIMIT 1";
  
! 	res = PQexec(g_conn, blobQry);
! 	check_sql_result(res, g_conn, blobQry, PGRES_TUPLES_OK);
  
! 	result = PQntuples(res) > 0;
  
  	PQclear(res);
- 
- 	return result;
  }
  
  /*
--- 1922,1990 ----
  
  
  /*
!  * getBlobOwners:
   *	Test whether database contains any large objects
+  *  If exist, it adds BlobOwnerInfo objects for each owners
   */
! static void
! getBlobOwners(Archive *AH)
  {
! 	PQExpBuffer		blobQry = createPQExpBuffer();
! 	BlobOwnerInfo  *blobobj;
! 	BlobOwnerInfo  *blobcobj;
! 	BlobOwnerInfo  *blobaobj;
! 	PGresult	   *res;
! 	int				i;
  
  	/* Make sure we are in proper schema */
  	selectSourceSchema("pg_catalog");
  
  	/* Check for BLOB OIDs */
  	if (AH->remoteVersion >= 80500)
! 		appendPQExpBuffer(blobQry,
! 						  "SELECT DISTINCT (%s lomowner)"
! 						  " FROM pg_largeobject_metadata",
! 						  username_subquery);
  	else if (AH->remoteVersion >= 70100)
! 		appendPQExpBuffer(blobQry,
! 						  "SELECT NULL FROM pg_largeobject LIMIT 1");
  	else
! 		appendPQExpBuffer(blobQry,
! 						  "SELECT NULL FROM pg_class WHERE relkind = 'l' LIMIT 1");
  
! 	res = PQexec(g_conn, blobQry->data);
! 	check_sql_result(res, g_conn, blobQry->data, PGRES_TUPLES_OK);
  
! 	for (i = 0; i < PQntuples(res); i++)
! 	{
! 		blobobj = (BlobOwnerInfo *) malloc(sizeof(BlobOwnerInfo));
! 		blobobj->dobj.objType = DO_BLOBS;
! 		blobobj->dobj.catId = nilCatalogId;
! 		AssignDumpId(&blobobj->dobj);
! 		blobobj->dobj.name = strdup("BLOBS");
! 		blobobj->rolname = strdup(PQgetvalue(res, i, 0));
! 
! 		blobcobj = (BlobOwnerInfo *) malloc(sizeof(BlobOwnerInfo));
! 		blobcobj->dobj.objType = DO_BLOB_COMMENTS;
! 		blobcobj->dobj.catId = nilCatalogId;
! 		AssignDumpId(&blobcobj->dobj);
! 		blobcobj->dobj.name = strdup("BLOB COMMENTS");
! 		blobcobj->rolname = strdup(PQgetvalue(res, i, 0));
! 		addObjectDependency(&blobcobj->dobj, blobobj->dobj.dumpId);
! 
! 		if (AH->remoteVersion >= 80500 && !dataOnly && !aclsSkip)
! 		{
! 			blobaobj = (BlobOwnerInfo *) malloc(sizeof(BlobOwnerInfo));
! 			blobaobj->dobj.objType = DO_BLOB_ACLS;
! 			blobaobj->dobj.catId = nilCatalogId;
! 			AssignDumpId(&blobaobj->dobj);
! 			blobaobj->dobj.name = strdup("BLOB ACLS");
! 			blobaobj->rolname = strdup(PQgetvalue(res, i, 0));
! 			addObjectDependency(&blobaobj->dobj, blobobj->dobj.dumpId);
! 		}
! 	}
  
  	PQclear(res);
  }
  
  /*
***************
*** 1976,1983 **** hasBlobs(Archive *AH)
  static int
  dumpBlobs(Archive *AH, void *arg)
  {
! 	const char *blobQry;
! 	const char *blobFetchQry;
  	PGresult   *res;
  	char		buf[LOBBUFSIZE];
  	int			i;
--- 1994,2003 ----
  static int
  dumpBlobs(Archive *AH, void *arg)
  {
! 	char	   *rolname = ((BlobOwnerInfo *)arg)->rolname;
! 	PQExpBuffer	blobQry = createPQExpBuffer();
! 	const char *blobFetchQry = "FETCH 1000 IN bloboid";
! 	const char *blobCloseQry = "CLOSE bloboid";
  	PGresult   *res;
  	char		buf[LOBBUFSIZE];
  	int			i;
***************
*** 1991,2007 **** dumpBlobs(Archive *AH, void *arg)
  
  	/* Cursor to get all BLOB OIDs */
  	if (AH->remoteVersion >= 80500)
! 		blobQry = "DECLARE bloboid CURSOR FOR SELECT oid FROM pg_largeobject_metadata";
  	else if (AH->remoteVersion >= 70100)
! 		blobQry = "DECLARE bloboid CURSOR FOR SELECT DISTINCT loid FROM pg_largeobject";
  	else
! 		blobQry = "DECLARE bloboid CURSOR FOR SELECT oid FROM pg_class WHERE relkind = 'l'";
  
! 	res = PQexec(g_conn, blobQry);
! 	check_sql_result(res, g_conn, blobQry, PGRES_COMMAND_OK);
! 
! 	/* Command to fetch from cursor */
! 	blobFetchQry = "FETCH 1000 IN bloboid";
  
  	do
  	{
--- 2011,2029 ----
  
  	/* Cursor to get all BLOB OIDs */
  	if (AH->remoteVersion >= 80500)
! 		appendPQExpBuffer(blobQry, "DECLARE bloboid CURSOR FOR "
! 						  "SELECT oid, lomacl FROM pg_largeobject_metadata "
! 						  "WHERE '%s' in (%s lomowner)",
! 						  rolname, username_subquery);
  	else if (AH->remoteVersion >= 70100)
! 		appendPQExpBuffer(blobQry, "DECLARE bloboid CURSOR FOR "
! 						  "SELECT DISTINCT loid, NULL FROM pg_largeobject");
  	else
! 		appendPQExpBuffer(blobQry, "DECLARE bloboid CURSOR FOR "
! 						  "SELECT oid, NULL FROM pg_class WHERE relkind = 'l'");
  
! 	res = PQexec(g_conn, blobQry->data);
! 	check_sql_result(res, g_conn, blobQry->data, PGRES_COMMAND_OK);
  
  	do
  	{
***************
*** 2051,2064 **** dumpBlobs(Archive *AH, void *arg)
  
  	PQclear(res);
  
  	return 1;
  }
  
  /*
   * dumpBlobComments
!  *	dump all blob properties.
!  *  It has "BLOB COMMENTS" tag due to the historical reason, but note
!  *  that it is the routine to dump all the properties of blobs.
   *
   * Since we don't provide any way to be selective about dumping blobs,
   * there's no need to be selective about their comments either.  We put
--- 2073,2088 ----
  
  	PQclear(res);
  
+ 	/* Cleanup cursor */
+ 	res = PQexec(g_conn, blobCloseQry);
+ 	check_sql_result(res, g_conn, blobCloseQry, PGRES_COMMAND_OK);
+ 
  	return 1;
  }
  
  /*
   * dumpBlobComments
!  *	dump all blob comments.
   *
   * Since we don't provide any way to be selective about dumping blobs,
   * there's no need to be selective about their comments either.  We put
***************
*** 2067,2075 **** dumpBlobs(Archive *AH, void *arg)
  static int
  dumpBlobComments(Archive *AH, void *arg)
  {
! 	const char *blobQry;
! 	const char *blobFetchQry;
  	PQExpBuffer cmdQry = createPQExpBuffer();
  	PGresult   *res;
  	int			i;
  
--- 2091,2101 ----
  static int
  dumpBlobComments(Archive *AH, void *arg)
  {
! 	char	   *rolname = ((BlobOwnerInfo *)arg)->rolname;
  	PQExpBuffer cmdQry = createPQExpBuffer();
+ 	PQExpBuffer blobQry = createPQExpBuffer();
+ 	const char *blobFetchQry = "FETCH 100 IN blobcmt";
+ 	const char *blobCloseQry = "CLOSE blobcmt";
  	PGresult   *res;
  	int			i;
  
***************
*** 2081,2118 **** dumpBlobComments(Archive *AH, void *arg)
  
  	/* Cursor to get all BLOB comments */
  	if (AH->remoteVersion >= 80500)
! 		blobQry = "DECLARE blobcmt CURSOR FOR SELECT oid, "
! 			"obj_description(oid, 'pg_largeobject'), "
! 			"pg_get_userbyid(lomowner), lomacl "
! 			"FROM pg_largeobject_metadata";
  	else if (AH->remoteVersion >= 70300)
! 		blobQry = "DECLARE blobcmt CURSOR FOR SELECT loid, "
! 			"obj_description(loid, 'pg_largeobject'), NULL, NULL "
! 			"FROM (SELECT DISTINCT loid FROM "
! 			"pg_description d JOIN pg_largeobject l ON (objoid = loid) "
! 			"WHERE classoid = 'pg_largeobject'::regclass) ss";
  	else if (AH->remoteVersion >= 70200)
! 		blobQry = "DECLARE blobcmt CURSOR FOR SELECT loid, "
! 			"obj_description(loid, 'pg_largeobject'), NULL, NULL "
! 			"FROM (SELECT DISTINCT loid FROM pg_largeobject) ss";
  	else if (AH->remoteVersion >= 70100)
! 		blobQry = "DECLARE blobcmt CURSOR FOR SELECT loid, "
! 			"obj_description(loid), NULL, NULL "
! 			"FROM (SELECT DISTINCT loid FROM pg_largeobject) ss";
  	else
! 		blobQry = "DECLARE blobcmt CURSOR FOR SELECT oid, "
! 			"	( "
! 			"		SELECT description "
! 			"		FROM pg_description pd "
! 			"		WHERE pd.objoid=pc.oid "
! 			"	), NULL, NULL "
! 			"FROM pg_class pc WHERE relkind = 'l'";
! 
! 	res = PQexec(g_conn, blobQry);
! 	check_sql_result(res, g_conn, blobQry, PGRES_COMMAND_OK);
! 
! 	/* Command to fetch from cursor */
! 	blobFetchQry = "FETCH 100 IN blobcmt";
  
  	do
  	{
--- 2107,2147 ----
  
  	/* Cursor to get all BLOB comments */
  	if (AH->remoteVersion >= 80500)
! 		appendPQExpBuffer(blobQry,
! 						  "DECLARE blobcmt CURSOR FOR SELECT oid, "
! 						  "obj_description(oid, 'pg_largeobject') "
! 						  "FROM pg_largeobject_metadata "
! 						  "WHERE '%s' in (%s lomowner)",
! 						  rolname, username_subquery);
  	else if (AH->remoteVersion >= 70300)
! 		appendPQExpBuffer(blobQry,
! 						  "DECLARE blobcmt CURSOR FOR SELECT loid, "
! 						  "obj_description(loid, 'pg_largeobject') "
! 						  "FROM (SELECT DISTINCT loid FROM "
! 						  "pg_description d JOIN pg_largeobject l ON (objoid = loid) "
! 						  "WHERE classoid = 'pg_largeobject'::regclass) ss");
  	else if (AH->remoteVersion >= 70200)
! 		appendPQExpBuffer(blobQry,
! 						  "DECLARE blobcmt CURSOR FOR SELECT loid, "
! 						  "obj_description(loid, 'pg_largeobject') "
! 						  "FROM (SELECT DISTINCT loid FROM pg_largeobject) ss");
  	else if (AH->remoteVersion >= 70100)
! 		appendPQExpBuffer(blobQry,
! 						  "DECLARE blobcmt CURSOR FOR SELECT loid, "
! 						  "obj_description(loid) "
! 						  "FROM (SELECT DISTINCT loid FROM pg_largeobject) ss");
  	else
! 		appendPQExpBuffer(blobQry,
! 						  "DECLARE blobcmt CURSOR FOR SELECT oid, "
! 						  "	( "
! 						  "		SELECT description "
! 						  "		FROM pg_description pd "
! 						  "		WHERE pd.objoid=pc.oid "
! 						  "	) "
! 						  "FROM pg_class pc WHERE relkind = 'l'");
! 
! 	res = PQexec(g_conn, blobQry->data);
! 	check_sql_result(res, g_conn, blobQry->data, PGRES_COMMAND_OK);
  
  	do
  	{
***************
*** 2127,2175 **** dumpBlobComments(Archive *AH, void *arg)
  		{
  			Oid			blobOid = atooid(PQgetvalue(res, i, 0));
  			char	   *lo_comment = PQgetvalue(res, i, 1);
! 			char	   *lo_owner = PQgetvalue(res, i, 2);
! 			char	   *lo_acl = PQgetvalue(res, i, 3);
! 			char		lo_name[32];
  
  			resetPQExpBuffer(cmdQry);
  
! 			/* comment on the blob */
! 			if (!PQgetisnull(res, i, 1))
! 			{
! 				appendPQExpBuffer(cmdQry,
! 								  "COMMENT ON LARGE OBJECT %u IS ", blobOid);
! 				appendStringLiteralAH(cmdQry, lo_comment, AH);
! 				appendPQExpBuffer(cmdQry, ";\n");
! 			}
  
! 			/* dump blob ownership, if necessary */
! 			if (!PQgetisnull(res, i, 2))
! 			{
! 				appendPQExpBuffer(cmdQry,
! 								  "ALTER LARGE OBJECT %u OWNER TO %s;\n",
! 								  blobOid, lo_owner);
! 			}
  
! 			/* dump blob privileges, if necessary */
! 			if (!PQgetisnull(res, i, 3) &&
! 				!dataOnly && !aclsSkip)
! 			{
! 				snprintf(lo_name, sizeof(lo_name), "%u", blobOid);
! 				if (!buildACLCommands(lo_name, NULL, "LARGE OBJECT",
! 									  lo_acl, lo_owner, "",
! 									  AH->remoteVersion, cmdQry))
! 				{
! 					write_msg(NULL, "could not parse ACL (%s) for "
! 							  "large object %u", lo_acl, blobOid);
! 					exit_nicely();
! 				}
! 			}
  
! 			if (cmdQry->len > 0)
  			{
! 				appendPQExpBuffer(cmdQry, "\n");
! 				archputs(cmdQry->data, AH);
  			}
  		}
  	} while (PQntuples(res) > 0);
  
--- 2156,2256 ----
  		{
  			Oid			blobOid = atooid(PQgetvalue(res, i, 0));
  			char	   *lo_comment = PQgetvalue(res, i, 1);
! 
! 			/* comment on the blob, if necessary */
! 			if (PQgetisnull(res, i, 1))
! 				continue;
  
  			resetPQExpBuffer(cmdQry);
  
! 			appendPQExpBuffer(cmdQry,
! 							  "COMMENT ON LARGE OBJECT %u IS ", blobOid);
! 			appendStringLiteralAH(cmdQry, lo_comment, AH);
! 			appendPQExpBuffer(cmdQry, ";\n\n");
  
! 			archputs(cmdQry->data, AH);
! 		}
! 	} while (PQntuples(res) > 0);
  
! 	PQclear(res);
! 
! 	archputs("\n", AH);
! 
! 	destroyPQExpBuffer(cmdQry);
! 
! 	/* Cleanup cursor */
! 	res = PQexec(g_conn, blobCloseQry);
! 	check_sql_result(res, g_conn, blobCloseQry, PGRES_COMMAND_OK);
! 
! 	return 1;
! }
! 
! /*
!  * dumpBlobAcls
!  *	dump all blob privileges.
!  *
!  * Since we don't provide any way to be selective about dumping blobs,
!  * there's no need to be selective about their privileges either.
!  * We put all the privileges into one big TOC entry.
!  */
! static int
! dumpBlobAcls(Archive *AH, void *arg)
! {
! 	char	   *rolname = ((BlobOwnerInfo *)arg)->rolname;
! 	PQExpBuffer	cmdQry = createPQExpBuffer();
! 	PQExpBuffer blobQry = createPQExpBuffer();
! 	const char *blobFetchQry = "FETCH 100 IN blobacl";
! 	const char *blobCloseQry = "CLOSE blobacl";
! 	PGresult   *res;
! 	int			i;
! 
! 	if (g_verbose)
! 		write_msg(NULL, "saving large object permissions\n");
! 
! 	/* Cursor to get all BLOB permissions */
! 	if (AH->remoteVersion >= 80500)
! 		appendPQExpBuffer(blobQry, "DECLARE blobacl CURSOR FOR "
! 						  "SELECT oid, lomacl FROM pg_largeobject_metadata "
! 						  "WHERE '%s' in (%s lomowner)",
! 						  rolname, username_subquery);
! 	else
! 		return 1;	/* no need to dump anything < 8.5 */
! 
! 	/* Make sure we are in proper schema */
!     selectSourceSchema("pg_catalog");
! 
! 	/* Open cursor */
! 	res = PQexec(g_conn, blobQry->data);
!     check_sql_result(res, g_conn, blobQry->data, PGRES_COMMAND_OK);
! 
! 	do
! 	{
! 		PQclear(res);
! 
! 		/* Do a fetch */
! 		res = PQexec(g_conn, blobFetchQry);
! 		check_sql_result(res, g_conn, blobFetchQry, PGRES_TUPLES_OK);
! 
! 		/* Process the tuples, if any */
! 		for (i = 0; i < PQntuples(res); i++)
! 		{
! 			char   *lo_oid = PQgetvalue(res, i, 0);
! 			char   *lo_acl = PQgetvalue(res, i, 1);
  
! 			if (PQgetisnull(res, i, 1))
! 				continue;
! 
! 			resetPQExpBuffer(cmdQry);
! 
! 			if (!buildACLCommands(lo_oid, NULL, "LARGE OBJECT",
! 								  lo_acl, rolname, "",
! 								  AH->remoteVersion, cmdQry))
  			{
! 				write_msg(NULL, "could not parse ACL (%s) for "
! 						  "large object %s", lo_acl, lo_oid);
! 				exit_nicely();
  			}
+ 			archprintf(AH, "%s\n", cmdQry->data);
  		}
  	} while (PQntuples(res) > 0);
  
***************
*** 2179,2184 **** dumpBlobComments(Archive *AH, void *arg)
--- 2260,2269 ----
  
  	destroyPQExpBuffer(cmdQry);
  
+ 	/* Cleanup */
+ 	res = PQexec(g_conn, blobCloseQry);
+ 	check_sql_result(res, g_conn, blobCloseQry, PGRES_COMMAND_OK);
+ 
  	return 1;
  }
  
***************
*** 6480,6498 **** dumpDumpableObject(Archive *fout, DumpableObject *dobj)
  			break;
  		case DO_BLOBS:
  			ArchiveEntry(fout, dobj->catId, dobj->dumpId,
! 						 dobj->name, NULL, NULL, "",
  						 false, "BLOBS", SECTION_DATA,
  						 "", "", NULL,
  						 dobj->dependencies, dobj->nDeps,
! 						 dumpBlobs, NULL);
  			break;
  		case DO_BLOB_COMMENTS:
  			ArchiveEntry(fout, dobj->catId, dobj->dumpId,
! 						 dobj->name, NULL, NULL, "",
  						 false, "BLOB COMMENTS", SECTION_DATA,
  						 "", "", NULL,
  						 dobj->dependencies, dobj->nDeps,
! 						 dumpBlobComments, NULL);
  			break;
  	}
  }
--- 6565,6594 ----
  			break;
  		case DO_BLOBS:
  			ArchiveEntry(fout, dobj->catId, dobj->dumpId,
! 						 dobj->name, NULL, NULL,
! 						 ((BlobOwnerInfo *) dobj)->rolname,
  						 false, "BLOBS", SECTION_DATA,
  						 "", "", NULL,
  						 dobj->dependencies, dobj->nDeps,
! 						 dumpBlobs, dobj);
  			break;
  		case DO_BLOB_COMMENTS:
  			ArchiveEntry(fout, dobj->catId, dobj->dumpId,
! 						 dobj->name, NULL, NULL,
! 						 ((BlobOwnerInfo *) dobj)->rolname,
  						 false, "BLOB COMMENTS", SECTION_DATA,
  						 "", "", NULL,
  						 dobj->dependencies, dobj->nDeps,
! 						 dumpBlobComments, dobj);
! 			break;
! 		case DO_BLOB_ACLS:
! 			ArchiveEntry(fout, dobj->catId, dobj->dumpId,
! 						 dobj->name, NULL, NULL,
! 						 ((BlobOwnerInfo *) dobj)->rolname,
! 						 false, "BLOB ACLS", SECTION_DATA,
! 						 "", "", NULL,
! 						 dobj->dependencies, dobj->nDeps,
! 						 dumpBlobAcls, dobj);
  			break;
  	}
  }
*** a/src/bin/pg_dump/pg_dump.h
--- b/src/bin/pg_dump/pg_dump.h
***************
*** 116,122 **** typedef enum
  	DO_FOREIGN_SERVER,
  	DO_DEFAULT_ACL,
  	DO_BLOBS,
! 	DO_BLOB_COMMENTS
  } DumpableObjectType;
  
  typedef struct _dumpableObject
--- 116,123 ----
  	DO_FOREIGN_SERVER,
  	DO_DEFAULT_ACL,
  	DO_BLOBS,
! 	DO_BLOB_COMMENTS,
! 	DO_BLOB_ACLS,
  } DumpableObjectType;
  
  typedef struct _dumpableObject
***************
*** 442,447 **** typedef struct _defaultACLInfo
--- 443,454 ----
  	char	   *defaclacl;
  } DefaultACLInfo;
  
+ typedef struct _blobOwnerInfo
+ {
+ 	DumpableObject dobj;
+ 	char	   *rolname;
+ } BlobOwnerInfo;
+ 
  /* global decls */
  extern bool force_quotes;		/* double-quotes for identifiers flag */
  extern bool g_verbose;			/* verbose flag */
*** a/src/bin/pg_dump/pg_dump_sort.c
--- b/src/bin/pg_dump/pg_dump_sort.c
***************
*** 93,99 **** static const int newObjectTypePriority[] =
  	15,							/* DO_FOREIGN_SERVER */
  	27,							/* DO_DEFAULT_ACL */
  	20,							/* DO_BLOBS */
! 	21							/* DO_BLOB_COMMENTS */
  };
  
  
--- 93,100 ----
  	15,							/* DO_FOREIGN_SERVER */
  	27,							/* DO_DEFAULT_ACL */
  	20,							/* DO_BLOBS */
! 	21,							/* DO_BLOB_COMMENTS */
! 	28,							/* DO_BLOB_ACLS */
  };
  
  
***************
*** 1156,1161 **** describeDumpableObject(DumpableObject *obj, char *buf, int bufsize)
--- 1157,1167 ----
  					 "BLOB COMMENTS  (ID %d)",
  					 obj->dumpId);
  			return;
+ 		case DO_BLOB_ACLS:
+ 			snprintf(buf, bufsize,
+ 					 "BLOB ACLS  (ID %d)",
+ 					 obj->dumpId);
+ 			return;
  	}
  	/* shouldn't get here */
  	snprintf(buf, bufsize,
#74Tom Lane
tgl@sss.pgh.pa.us
In reply to: KaiGai Kohei (#73)
Re: Largeobject Access Controls (r2460)

KaiGai Kohei <kaigai@ak.jp.nec.com> writes:

The attached patch is a revised version.

I'm inclined to wonder whether this patch doesn't prove that we've
reached the end of the line for the current representation of blobs
in pg_dump archives. The alternative that I'm thinking about is to
treat each blob as an independent object (hence, with its own TOC
entry). If we did that, then the standard pg_dump mechanisms for
ownership, ACLs, and comments would apply, and we could get rid of
the messy hacks that this patch is just adding to. That would also
open the door to future improvements like being able to selectively
restore blobs. (Actually you could do it immediately if you didn't
mind editing a -l file...) And it would for instance allow loading
of blobs to be parallelized.

Now the argument against that is that it won't scale terribly well
to situations with very large numbers of blobs. However, I'm not
convinced that the current approach of cramming them all into one
TOC entry scales so well either. If your large objects are actually
large, there's not going to be an enormous number of them. We've
heard of people with many tens of thousands of tables, and pg_dump
speed didn't seem to be a huge bottleneck for them (at least not
in recent versions). So I'm feeling we should not dismiss the
idea of one TOC entry per blob.

Thoughts?

regards, tom lane

#75Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Tom Lane (#74)
Re: Largeobject Access Controls (r2460)

Tom Lane <tgl@sss.pgh.pa.us> wrote:

Now the argument against that is that it won't scale terribly well
to situations with very large numbers of blobs. However, I'm not
convinced that the current approach of cramming them all into one
TOC entry scales so well either. If your large objects are
actually large, there's not going to be an enormous number of
them. We've heard of people with many tens of thousands of
tables, and pg_dump speed didn't seem to be a huge bottleneck for
them (at least not in recent versions). So I'm feeling we should
not dismiss the idea of one TOC entry per blob.

Thoughts?

We've got a "DocImage" table with about 7 million rows storing PDF
documents in a bytea column, approaching 1 TB of data. (We don't
want to give up ACID guarantees, replication, etc. by storing them
on the file system with filenames in the database.) This works
pretty well, except that client software occasionally has a tendency
to run out of RAM. The interface could arguably be cleaner if we
used BLOBs, but the security issues have precluded that in
PostgreSQL.

I suspect that 7 million BLOBs (and growing fast) would be a problem
for this approach. Of course, if we're atypical, we could stay with
bytea if this changed. Just a data point.

-Kevin

cir=> select count(*) from "DocImage";
count
---------
6891626
(1 row)

cir=> select pg_size_pretty(pg_total_relation_size('"DocImage"'));
pg_size_pretty
----------------
956 GB
(1 row)

#76Tom Lane
tgl@sss.pgh.pa.us
In reply to: Kevin Grittner (#75)
Re: Largeobject Access Controls (r2460)

"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:

Tom Lane <tgl@sss.pgh.pa.us> wrote:

We've heard of people with many tens of thousands of
tables, and pg_dump speed didn't seem to be a huge bottleneck for
them (at least not in recent versions). So I'm feeling we should
not dismiss the idea of one TOC entry per blob.

Thoughts?

I suspect that 7 million BLOBs (and growing fast) would be a problem
for this approach. Of course, if we're atypical, we could stay with
bytea if this changed. Just a data point.

Do you have the opportunity to try an experiment on hardware similar to
what you're running that on? Create a database with 7 million tables
and see what the dump/restore times are like, and whether
pg_dump/pg_restore appear to be CPU-bound or memory-limited when doing
it. If they aren't, we could conclude that millions of TOC entries
isn't a problem.

A compromise we could consider is some sort of sub-TOC-entry scheme that
gets the per-BLOB entries out of the main speed bottlenecks, while still
letting us share most of the logic. For instance, I suspect that the
first bottleneck in pg_dump would be the dependency sorting, but we
don't really need to sort all the blobs individually for that.

regards, tom lane

#77Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Tom Lane (#76)
Re: Largeobject Access Controls (r2460)

Tom Lane <tgl@sss.pgh.pa.us> wrote:

Do you have the opportunity to try an experiment on hardware
similar to what you're running that on? Create a database with 7
million tables and see what the dump/restore times are like, and
whether pg_dump/pg_restore appear to be CPU-bound or
memory-limited when doing it.

If these can be empty (or nearly empty) tables, I can probably swing
it as a background task. You didn't need to match the current 1.3
TB database size I assume?

If they aren't, we could conclude that millions of TOC entries
isn't a problem.

I'd actually be rather more concerned about the effects on normal
query plan times, or are you confident that won't be an issue?

A compromise we could consider is some sort of sub-TOC-entry
scheme that gets the per-BLOB entries out of the main speed
bottlenecks, while still letting us share most of the logic. For
instance, I suspect that the first bottleneck in pg_dump would be
the dependency sorting, but we don't really need to sort all the
blobs individually for that.

That might also address the plan time issue, if it actually exists.

-Kevin

#78Tom Lane
tgl@sss.pgh.pa.us
In reply to: Kevin Grittner (#77)
Re: Largeobject Access Controls (r2460)

"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:

Tom Lane <tgl@sss.pgh.pa.us> wrote:

Do you have the opportunity to try an experiment on hardware
similar to what you're running that on? Create a database with 7
million tables and see what the dump/restore times are like, and
whether pg_dump/pg_restore appear to be CPU-bound or
memory-limited when doing it.

If these can be empty (or nearly empty) tables, I can probably swing
it as a background task. You didn't need to match the current 1.3
TB database size I assume?

Empty is fine.

If they aren't, we could conclude that millions of TOC entries
isn't a problem.

I'd actually be rather more concerned about the effects on normal
query plan times, or are you confident that won't be an issue?

This is only a question of what happens internally in pg_dump and
pg_restore --- I'm not suggesting we change anything on the database
side.

regards, tom lane

#79Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Tom Lane (#78)
Re: Largeobject Access Controls (r2460)

Tom Lane <tgl@sss.pgh.pa.us> wrote:

Empty is fine.

I'll get started.

-Kevin

#80Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Kevin Grittner (#79)
Re: Largeobject Access Controls (r2460)

"Kevin Grittner" <Kevin.Grittner@wicourts.gov> wrote:

I'll get started.

After a couple false starts, the creation of the millions of tables
is underway. At the rate it's going, it won't finish for 8.2 hours,
so I'll have to come in and test the dump tomorrow morning.

-Kevin

#81KaiGai Kohei
kaigai@kaigai.gr.jp
In reply to: Tom Lane (#74)
Re: Largeobject Access Controls (r2460)

(2010/01/23 5:12), Tom Lane wrote:

KaiGai Kohei<kaigai@ak.jp.nec.com> writes:

The attached patch is a revised version.

I'm inclined to wonder whether this patch doesn't prove that we've
reached the end of the line for the current representation of blobs
in pg_dump archives. The alternative that I'm thinking about is to
treat each blob as an independent object (hence, with its own TOC
entry). If we did that, then the standard pg_dump mechanisms for
ownership, ACLs, and comments would apply, and we could get rid of
the messy hacks that this patch is just adding to. That would also
open the door to future improvements like being able to selectively
restore blobs. (Actually you could do it immediately if you didn't
mind editing a -l file...) And it would for instance allow loading
of blobs to be parallelized.

I also think that is a better approach than the current blob representation.

Now the argument against that is that it won't scale terribly well
to situations with very large numbers of blobs. However, I'm not
convinced that the current approach of cramming them all into one
TOC entry scales so well either. If your large objects are actually
large, there's not going to be an enormous number of them. We've
heard of people with many tens of thousands of tables, and pg_dump
speed didn't seem to be a huge bottleneck for them (at least not
in recent versions). So I'm feeling we should not dismiss the
idea of one TOC entry per blob.

Even if the database contains a massive number of large objects, all
pg_dump has to manage in RAM is their metadata, not the data
contents. With one TOC entry per blob, the total amount of I/O
between server and pg_dump is no different from the current approach.

If we assume one TOC entry consumes 64 bytes of RAM, 7 million blobs
need about 450MB (64 bytes * 7,000,000 = ~448MB).
On recent machines, is that an unacceptable pain?
If we are dumping a TB-class database, I think we can assume the
machine running pg_dump has adequate storage and RAM.

Thanks,
--
KaiGai Kohei <kaigai@kaigai.gr.jp>

#82Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Tom Lane (#78)
Re: Largeobject Access Controls (r2460)

Tom Lane <tgl@sss.pgh.pa.us> wrote:

"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:

Tom Lane <tgl@sss.pgh.pa.us> wrote:

Do you have the opportunity to try an experiment on hardware
similar to what you're running that on? Create a database with
7 million tables and see what the dump/restore times are like,
and whether pg_dump/pg_restore appear to be CPU-bound or
memory-limited when doing it.

If these can be empty (or nearly empty) tables, I can probably
swing it as a background task. You didn't need to match the
current 1.3 TB database size I assume?

Empty is fine.

After about 15 hours of run time it was around 5.5 million tables;
the rate of creation had slowed rather dramatically. I did create
them with primary keys (out of habit), which was probably the wrong
thing. I canceled the table creation process and started a VACUUM
ANALYZE, figuring that we didn't want any hint-bit writing or bad
statistics confusing the results. That has been running for 30
minutes with 65 MB to 140 MB per second of disk activity, mixed read
and write. After a few minutes that left me curious just how big
the database was, so I tried:

select pg_size_pretty(pg_database_size('test'));

I did a Ctrl+C after about five minutes and got:

Cancel request sent

but it didn't return for 15 or 20 minutes. Any attempt to query
pg_locks stalls. Tab completion stalls. (By the way, this is not
related to the false alarm on that yesterday, which was a result of
my attempting tab completion from within a failed transaction, which
just found nothing rather than stalling.)

So I'm not sure whether I can get to a state suitable for starting
the desired test, but I'll stay with it for a while.

-Kevin

#83Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Kevin Grittner (#82)
Re: Largeobject Access Controls (r2460)

"Kevin Grittner" <Kevin.Grittner@wicourts.gov> wrote:

So I'm not sure whether I can get to a state suitable for starting
the desired test, but I'll stay with it for a while.

I have other commitments today, so I'm going to leave the VACUUM
ANALYZE running and come back tomorrow morning to try the pg_dump.

-Kevin

#84Tom Lane
tgl@sss.pgh.pa.us
In reply to: Kevin Grittner (#82)
Re: Largeobject Access Controls (r2460)

"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:

... After a few minutes that left me curious just how big
the database was, so I tried:

select pg_size_pretty(pg_database_size('test'));

I did a Ctrl+C after about five minutes and got:

Cancel request sent

but it didn't return for 15 or 20 minutes.

Hm, we probably are lacking CHECK_FOR_INTERRUPTS in the inner loops in
dbsize.c ...

regards, tom lane

#85Tom Lane
tgl@sss.pgh.pa.us
In reply to: KaiGai Kohei (#81)
Re: Largeobject Access Controls (r2460)

KaiGai Kohei <kaigai@kaigai.gr.jp> writes:

(2010/01/23 5:12), Tom Lane wrote:

Now the argument against that is that it won't scale terribly well
to situations with very large numbers of blobs.

Even if the database contains a massive number of large objects, all that
pg_dump has to manage in RAM is their metadata, not the data contents.

I'm not so worried about the amount of RAM needed as whether pg_dump's
internal algorithms will scale to large numbers of TOC entries. Any
O(N^2) behavior would be pretty painful, for example. No doubt we could
fix any such problems, but it might take more work than we want to do
right now.

regards, tom lane

#86Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Tom Lane (#85)
Re: Largeobject Access Controls (r2460)

Tom Lane <tgl@sss.pgh.pa.us> wrote:

I'm not so worried about the amount of RAM needed as whether
pg_dump's internal algorithms will scale to large numbers of TOC
entries. Any O(N^2) behavior would be pretty painful, for
example. No doubt we could fix any such problems, but it might
take more work than we want to do right now.

I'm afraid pg_dump didn't get very far with this before:

pg_dump: WARNING: out of shared memory
pg_dump: SQL command failed
pg_dump: Error message from server: ERROR: out of shared memory
HINT: You might need to increase max_locks_per_transaction.
pg_dump: The command was: LOCK TABLE public.test2672 IN ACCESS SHARE MODE

Given how fast it happened, I suspect that it was 2672 tables into
the dump (creation order), rather than 26% of the way through the 5.5
million tables (where "test2672" would fall in alphabetical order).

A sampling of the vmstat 1 output lines in "baseline" state --
before the dump started:

procs -----------memory---------- ---swap-- -----io---- -system-- -----cpu------
1 0 319804 583656 23372 124473248 0 0 17224 10 1742 18995 9 1 88 2 0
3 1 319804 595840 23368 124458856 0 0 17016 10 2014 22965 9 1 89 1 0
1 0 319804 586912 23376 124469128 0 0 16808 158 1807 19181 8 1 89 2 0
2 0 319804 576304 23368 124479416 0 0 16840 5 1764 19136 8 1 90 1 0

0 1 319804 590480 23364 124459888 0 0 1488 130 3449 13844 2 1 93 3 0
0 1 319804 589476 23364 124460912 0 0 1456 115 3328 11800 2 1 94 4 0
1 0 319804 588468 23364 124461944 0 0 1376 146 3156 11770 2 1 95 2 0
1 1 319804 587836 23364 124465024 0 0 1576 133 3599 14797 3 1 94 3 0

While it was running:

procs -----------memory---------- ---swap-- -----io---- -system-- -----cpu------
2 1 429080 886244 23308 111242464 0 0 25684 38 2920 18847 7 3 85 5 0
2 1 429080 798172 23308 111297976 0 0 40024 26 1342 16967 13 2 82 4 0
2 1 429080 707708 23308 111357600 0 0 42520 34 1588 19148 13 2 81 4 0
0 5 429080 620700 23308 111414144 0 0 40272 73863 1434 18077 12 2 80 6 0
1 5 429080 605616 23308 111425448 0 0 6920 131232 729 5187 3 1 66 31 0
0 6 429080 582852 23316 111442912 0 0 10840 131248 665 4987 3 1 66 30 0
2 4 429080 584976 23308 111433672 0 0 9776 139416 693 7890 4 1 66 29 0
0 5 429080 575752 23308 111436752 0 0 10776 131217 647 6157 3 1 66 30 0
1 3 429080 583768 23308 111420304 0 0 13616 90352 1043 13047 6 1 68 25 0
4 0 429080 578888 23300 111397696 0 0 40000 44 1347 25329 12 2 79 6 0
2 1 429080 582368 23292 111367896 0 0 40320 76 1517 28628 13 2 80 5 0
2 0 429080 584960 23276 111338096 0 0 40240 163 1374 26988 13 2 80 5 0
6 0 429080 576176 23268 111319600 0 0 40328 170 1465 27229 13 2 80 5 0
4 0 429080 583212 23212 111288816 0 0 39568 138 1418 27296 13 2 80 5 0

This box has 16 CPUs, so the jump from 3% user CPU to 13% with an
increase of I/O wait from 3% to 5% suggests that pg_dump was
primarily CPU bound in user code before the crash.

I can leave this database around for a while if there are other
things you would like me to try.

-Kevin

#87Tom Lane
tgl@sss.pgh.pa.us
In reply to: Kevin Grittner (#86)
Re: Largeobject Access Controls (r2460)

"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:

I'm afraid pg_dump didn't get very far with this before:

pg_dump: WARNING: out of shared memory
pg_dump: SQL command failed

Given how fast it happened, I suspect that it was 2672 tables into
the dump (creation order), rather than 26% of the way through the 5.5
million tables (where "test2672" would fall in alphabetical order).

Yeah, I didn't think about that. You'd have to bump
max_locks_per_transaction up awfully far to get to where pg_dump
could dump millions of tables, because it wants to lock each one.
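
Rough arithmetic, assuming default settings: the shared lock table has
room for about max_locks_per_transaction * (max_connections +
max_prepared_transactions) locks, i.e. 64 * (100 + 0) = 6400 by default,
which is nowhere near the millions of ACCESS SHARE locks a full dump of
that database would need.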

It might be better to try a test case with lighter-weight objects,
say 5 million simple functions.
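
A sketch of one way to generate those (hypothetical, adjust to taste),
spooling DDL to a file and replaying it through psql:

\t
\o /tmp/mkfuncs.sql
SELECT 'CREATE FUNCTION f' || g
    || '() RETURNS integer LANGUAGE sql AS ''SELECT ' || g || ''';'
  FROM generate_series(1, 5000000) g;
\o
\i /tmp/mkfuncs.sql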

regards, tom lane

#88Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Tom Lane (#87)
Re: Largeobject Access Controls (r2460)

Tom Lane <tgl@sss.pgh.pa.us> wrote:

It might be better to try a test case with lighter-weight objects,
say 5 million simple functions.

So the current database is expendable? I'd just as soon delete it
before creating the other one, if you're fairly confident the other
one will do it.

-Kevin

#89Tom Lane
tgl@sss.pgh.pa.us
In reply to: Kevin Grittner (#88)
Re: Largeobject Access Controls (r2460)

"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:

Tom Lane <tgl@sss.pgh.pa.us> wrote:

It might be better to try a test case with lighter-weight objects,
say 5 million simple functions.

So the current database is expendable?

Yeah, I think it was a bad experimental design anyway...

regards, tom lane

#90Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Tom Lane (#87)
Re: Largeobject Access Controls (r2460)

Tom Lane <tgl@sss.pgh.pa.us> wrote:

It might be better to try a test case with lighter-weight objects,
say 5 million simple functions.

A dump of that quickly settled into running a series of these:

SELECT proretset, prosrc, probin,
pg_catalog.pg_get_function_arguments(oid) AS funcargs,
pg_catalog.pg_get_function_identity_arguments(oid) AS funciargs,
pg_catalog.pg_get_function_result(oid) AS funcresult, proiswindow,
provolatile, proisstrict, prosecdef, proconfig, procost, prorows,
(SELECT lanname FROM pg_catalog.pg_language WHERE oid = prolang) AS lanname
FROM pg_catalog.pg_proc WHERE oid = '1404528'::pg_catalog.oid

(with different oid values, of course).

Is this before or after the point you were worried about? Anything
in particular for which I should be alert?

-Kevin

#91Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Tom Lane (#87)
Re: Largeobject Access Controls (r2460)

Tom Lane <tgl@sss.pgh.pa.us> wrote:

It might be better to try a test case with lighter-weight objects,
say 5 million simple functions.

Said dump ran in about 45 minutes with no obvious stalls or
problems. The 2.2 GB database dumped to a 1.1 GB text file, which
was a little bit of a surprise.

-Kevin

#92Tom Lane
tgl@sss.pgh.pa.us
In reply to: Kevin Grittner (#91)
Re: Largeobject Access Controls (r2460)

"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:

Tom Lane <tgl@sss.pgh.pa.us> wrote:

It might be better to try a test case with lighter-weight objects,
say 5 million simple functions.

Said dump ran in about 45 minutes with no obvious stalls or
problems. The 2.2 GB database dumped to a 1.1 GB text file, which
was a little bit of a surprise.

Did you happen to notice anything about pg_dump's memory consumption?
For an all-DDL case like this, I'd sort of expect the memory usage to
be comparable to the output file size.

Anyway this seems to suggest that we don't have any huge problem with
large numbers of archive TOC objects, so the next step probably is to
look at how big a code change it would be to switch over to
TOC-per-blob.

regards, tom lane

#93Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Tom Lane (#92)
Re: Largeobject Access Controls (r2460)

Tom Lane <tgl@sss.pgh.pa.us> wrote:

Did you happen to notice anything about pg_dump's memory
consumption?

Not directly, but I was running 'vmstat 1' throughout. Cache space
dropped about 2.1 GB while it was running and popped back up to the
previous level at the end.

-Kevin

#94Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Kevin Grittner (#93)
Re: Largeobject Access Controls (r2460)

"Kevin Grittner" <Kevin.Grittner@wicourts.gov> wrote:

Tom Lane <tgl@sss.pgh.pa.us> wrote:

Did you happen to notice anything about pg_dump's memory
consumption?

Not directly, but I was running 'vmstat 1' throughout. Cache
space dropped about 2.1 GB while it was running and popped back up
to the previous level at the end.

I took a closer look, and there's some bad news, I think. The above
numbers were from the ends of the range. I've gone back over and
found that while it dropped about 2.1 GB almost immediately, cache
usage slowly dropped throughout the dump, and bottomed at about 6.9
GB below baseline.

-Kevin

#95Tom Lane
tgl@sss.pgh.pa.us
In reply to: Kevin Grittner (#94)
Re: Largeobject Access Controls (r2460)

"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:

"Kevin Grittner" <Kevin.Grittner@wicourts.gov> wrote:

Tom Lane <tgl@sss.pgh.pa.us> wrote:

Did you happen to notice anything about pg_dump's memory
consumption?

I took a closer look, and there's some bad news, I think. The above
numbers were from the ends of the range. I've gone back over and
found that while it dropped about 2.1 GB almost immediately, cache
usage slowly dropped throughout the dump, and bottomed at about 6.9
GB below baseline.

OK, that's still not very scary --- it just means my off-the-cuff
estimate of 1:1 space usage was bad. 3:1 isn't that surprising either
given padding and other issues. The representation of ArchiveEntries
could probably be made a bit more compact ...

regards, tom lane

#96KaiGai Kohei
kaigai@ak.jp.nec.com
In reply to: KaiGai Kohei (#81)
1 attachment(s)
Re: Largeobject Access Controls (r2460)

The attached patch uses one TOC entry for each blob object.

It adds two new section types.

* "BLOB ITEM"

This section provides the properties of a certain large object.
It contains a query to create an empty large object and, if necessary,
to restore the ownership of the large object.

| --
| -- Name: 16406; Type: BLOB ITEM; Schema: -; Owner: ymj
| --
|
| SELECT lo_create(16406);
|
| ALTER LARGE OBJECT "16406" OWNER TO ymj;

The comment descriptions were moved to the COMMENT section, as with
other object classes.

| --
| -- Name: LARGE OBJECT 16406; Type: COMMENT; Schema: -; Owner: ymj
| --
|
| COMMENT ON LARGE OBJECT 16406 IS 'This is a small large object.';

Also, access privileges were moved to the ACL section.

| --
| -- Name: 16405; Type: ACL; Schema: -; Owner: kaigai
| --
|
| REVOKE ALL ON LARGE OBJECT 16405 FROM PUBLIC;
| REVOKE ALL ON LARGE OBJECT 16405 FROM kaigai;
| GRANT ALL ON LARGE OBJECT 16405 TO kaigai;
| GRANT ALL ON LARGE OBJECT 16405 TO PUBLIC;

* "BLOB DATA"

This section is the same as the existing "BLOBS" section, except that
_LoadBlobs() does not create a new large object before opening it with
INV_WRITE, and lo_truncate() is used instead of lo_unlink() when --clean
is given.
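
To illustrate with a hypothetical OID (131072 being INV_WRITE from
libpq-fs.h): where the old-style script emitted

  SELECT pg_catalog.lo_open(pg_catalog.lo_create('16405'), 131072);

the new-style script only emits

  SELECT pg_catalog.lo_open(16405, 131072);

because the "BLOB ITEM" entry has already created the empty large object.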

The legacy sections ("BLOBS" and "BLOB COMMENTS") remain readable for
compatibility, but newer pg_dump never creates them.

Internally, getBlobs() scans all the blobs, making a DO_BLOB_ITEM entry
for each blob, plus one DO_BLOB_DATA entry if the database contains at
least one large object.
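
On an 8.5 server, the catalog scan it issues has this shape (the
owner-name lookup is pg_dump's username_subquery, shown expanded here):

  SELECT oid,
         (SELECT rolname FROM pg_catalog.pg_roles WHERE oid = lomowner),
         lomacl
    FROM pg_largeobject_metadata;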

_PrintTocData() handles both the "BLOBS" and "BLOB DATA" sections.
If the given section is "BLOB DATA", it calls the format's _LoadBlobs()
with compat = false, because this section is new-style.

In this case, _LoadBlobs() does not create a large object before opening
it with INV_WRITE, because the "BLOB ITEM" section has already created an
empty large object.

Also, DropBlobIfExists() was renamed to CleanupBlobIfExists(), because it
is modified to apply lo_truncate() in the "BLOB DATA" case.
When --clean is given, SELECT lo_unlink(xxx) is injected at the head of
the restore queries, instead of in the middle of loading blobs.

One remaining issue is how to decide whether blobs are to be dumped.
Right now, --schema-only eliminates all blob dumps.
However, I think it should follow the convention of the other object classes:

-a, --data-only ... only "BLOB DATA" sections, not "BLOB ITEM"
-s, --schema-only ... only "BLOB ITEM" sections, not "BLOB DATA"
-b, --blobs ... both "BLOB ITEM" and "BLOB DATA", independently
of --data-only and --schema-only?

Thanks,
--
OSS Platform Development Division, NEC
KaiGai Kohei <kaigai@ak.jp.nec.com>

Attachments:

pgsql-fix-pg_dump-blob-privs.3.patchapplication/octect-stream; name=pgsql-fix-pg_dump-blob-privs.3.patchDownload
*** a/src/bin/pg_dump/pg_backup_archiver.c
--- b/src/bin/pg_dump/pg_backup_archiver.c
***************
*** 520,525 **** restore_toc_entry(ArchiveHandle *AH, TocEntry *te,
--- 520,526 ----
  				_printTocEntry(AH, te, ropt, true, false);
  
  				if (strcmp(te->desc, "BLOBS") == 0 ||
+ 					strcmp(te->desc, "BLOB DATA") == 0 ||
  					strcmp(te->desc, "BLOB COMMENTS") == 0)
  				{
  					ahlog(AH, 1, "restoring %s\n", te->desc);
***************
*** 903,909 **** EndRestoreBlobs(ArchiveHandle *AH)
   * Called by a format handler to initiate restoration of a blob
   */
  void
! StartRestoreBlob(ArchiveHandle *AH, Oid oid, bool drop)
  {
  	Oid			loOid;
  
--- 904,910 ----
   * Called by a format handler to initiate restoration of a blob
   */
  void
! StartRestoreBlob(ArchiveHandle *AH, Oid oid, bool cleanup, bool compat)
  {
  	Oid			loOid;
  
***************
*** 914,937 **** StartRestoreBlob(ArchiveHandle *AH, Oid oid, bool drop)
  
  	ahlog(AH, 2, "restoring large object with OID %u\n", oid);
  
! 	if (drop)
! 		DropBlobIfExists(AH, oid);
  
  	if (AH->connection)
  	{
! 		loOid = lo_create(AH->connection, oid);
! 		if (loOid == 0 || loOid != oid)
! 			die_horribly(AH, modulename, "could not create large object %u\n",
! 						 oid);
! 
  		AH->loFd = lo_open(AH->connection, oid, INV_WRITE);
  		if (AH->loFd == -1)
  			die_horribly(AH, modulename, "could not open large object\n");
  	}
  	else
  	{
! 		ahprintf(AH, "SELECT pg_catalog.lo_open(pg_catalog.lo_create('%u'), %d);\n",
! 				 oid, INV_WRITE);
  	}
  
  	AH->writingBlob = 1;
--- 915,942 ----
  
  	ahlog(AH, 2, "restoring large object with OID %u\n", oid);
  
! 	if (cleanup)
! 		CleanupBlobIfExists(AH, oid, compat);
  
  	if (AH->connection)
  	{
! 		if (compat)
! 		{
! 			loOid = lo_create(AH->connection, oid);
! 			if (loOid == 0 || loOid != oid)
! 				die_horribly(AH, modulename, "could not create large object %u\n",
! 							 oid);
! 		}
  		AH->loFd = lo_open(AH->connection, oid, INV_WRITE);
  		if (AH->loFd == -1)
  			die_horribly(AH, modulename, "could not open large object\n");
  	}
  	else
  	{
! 		if (compat)
! 			ahprintf(AH, "SELECT pg_catalog.lo_create('%u');\n", oid);
! 
! 		ahprintf(AH, "SELECT pg_catalog.lo_open(%u, %d);\n", oid, INV_WRITE);
  	}
  
  	AH->writingBlob = 1;
***************
*** 1940,1946 **** WriteDataChunks(ArchiveHandle *AH)
  			AH->currToc = te;
  			/* printf("Writing data for %d (%x)\n", te->id, te); */
  
! 			if (strcmp(te->desc, "BLOBS") == 0)
  			{
  				startPtr = AH->StartBlobsPtr;
  				endPtr = AH->EndBlobsPtr;
--- 1945,1952 ----
  			AH->currToc = te;
  			/* printf("Writing data for %d (%x)\n", te->id, te); */
  
! 			if (strcmp(te->desc, "BLOBS") == 0 ||
! 				strcmp(te->desc, "BLOB DATA") == 0)
  			{
  				startPtr = AH->StartBlobsPtr;
  				endPtr = AH->EndBlobsPtr;
***************
*** 2077,2082 **** ReadToc(ArchiveHandle *AH)
--- 2083,2089 ----
  				te->section = SECTION_NONE;
  			else if (strcmp(te->desc, "TABLE DATA") == 0 ||
  					 strcmp(te->desc, "BLOBS") == 0 ||
+ 					 strcmp(te->desc, "BLOB DATA") == 0 ||
  					 strcmp(te->desc, "BLOB COMMENTS") == 0)
  				te->section = SECTION_DATA;
  			else if (strcmp(te->desc, "CONSTRAINT") == 0 ||
***************
*** 2713,2718 **** _getObjectDescription(PQExpBuffer buf, TocEntry *te, ArchiveHandle *AH)
--- 2720,2732 ----
  		return;
  	}
  
+ 	/* Use ALTER LARGE OBJECT for BLOB ITEM */
+ 	if (strcmp(type, "BLOB ITEM") == 0)
+ 	{
+ 		appendPQExpBuffer(buf, "LARGE OBJECT %s", te->tag);
+ 		return;
+ 	}
+ 
  	write_msg(modulename, "WARNING: don't know how to set owner for object type %s\n",
  			  type);
  }
***************
*** 2824,2829 **** _printTocEntry(ArchiveHandle *AH, TocEntry *te, RestoreOptions *ropt, bool isDat
--- 2838,2844 ----
  		strlen(te->owner) > 0 && strlen(te->dropStmt) > 0)
  	{
  		if (strcmp(te->desc, "AGGREGATE") == 0 ||
+ 			strcmp(te->desc, "BLOB ITEM") == 0 ||
  			strcmp(te->desc, "CONVERSION") == 0 ||
  			strcmp(te->desc, "DATABASE") == 0 ||
  			strcmp(te->desc, "DOMAIN") == 0 ||
*** a/src/bin/pg_dump/pg_backup_archiver.h
--- b/src/bin/pg_dump/pg_backup_archiver.h
***************
*** 359,365 **** int			ReadOffset(ArchiveHandle *, pgoff_t *);
  size_t		WriteOffset(ArchiveHandle *, pgoff_t, int);
  
  extern void StartRestoreBlobs(ArchiveHandle *AH);
! extern void StartRestoreBlob(ArchiveHandle *AH, Oid oid, bool drop);
  extern void EndRestoreBlob(ArchiveHandle *AH, Oid oid);
  extern void EndRestoreBlobs(ArchiveHandle *AH);
  
--- 359,365 ----
  size_t		WriteOffset(ArchiveHandle *, pgoff_t, int);
  
  extern void StartRestoreBlobs(ArchiveHandle *AH);
! extern void StartRestoreBlob(ArchiveHandle *AH, Oid oid, bool cleanup, bool compat);
  extern void EndRestoreBlob(ArchiveHandle *AH, Oid oid);
  extern void EndRestoreBlobs(ArchiveHandle *AH);
  
***************
*** 371,377 **** extern void InitArchiveFmt_Tar(ArchiveHandle *AH);
  extern bool isValidTarHeader(char *header);
  
  extern int	ReconnectToServer(ArchiveHandle *AH, const char *dbname, const char *newUser);
! extern void	DropBlobIfExists(ArchiveHandle *AH, Oid oid);
  
  int			ahwrite(const void *ptr, size_t size, size_t nmemb, ArchiveHandle *AH);
  int			ahprintf(ArchiveHandle *AH, const char *fmt,...) __attribute__((format(printf, 2, 3)));
--- 371,377 ----
  extern bool isValidTarHeader(char *header);
  
  extern int	ReconnectToServer(ArchiveHandle *AH, const char *dbname, const char *newUser);
! extern void	CleanupBlobIfExists(ArchiveHandle *AH, Oid oid, bool compat);
  
  int			ahwrite(const void *ptr, size_t size, size_t nmemb, ArchiveHandle *AH);
  int			ahprintf(ArchiveHandle *AH, const char *fmt,...) __attribute__((format(printf, 2, 3)));
*** a/src/bin/pg_dump/pg_backup_custom.c
--- b/src/bin/pg_dump/pg_backup_custom.c
***************
*** 54,60 **** static void _StartBlobs(ArchiveHandle *AH, TocEntry *te);
  static void _StartBlob(ArchiveHandle *AH, TocEntry *te, Oid oid);
  static void _EndBlob(ArchiveHandle *AH, TocEntry *te, Oid oid);
  static void _EndBlobs(ArchiveHandle *AH, TocEntry *te);
! static void _LoadBlobs(ArchiveHandle *AH, bool drop);
  static void _Clone(ArchiveHandle *AH);
  static void _DeClone(ArchiveHandle *AH);
  
--- 54,60 ----
  static void _StartBlob(ArchiveHandle *AH, TocEntry *te, Oid oid);
  static void _EndBlob(ArchiveHandle *AH, TocEntry *te, Oid oid);
  static void _EndBlobs(ArchiveHandle *AH, TocEntry *te);
! static void _LoadBlobs(ArchiveHandle *AH, bool cleanup, bool compat);
  static void _Clone(ArchiveHandle *AH);
  static void _DeClone(ArchiveHandle *AH);
  
***************
*** 498,504 **** _PrintTocData(ArchiveHandle *AH, TocEntry *te, RestoreOptions *ropt)
  			break;
  
  		case BLK_BLOBS:
! 			_LoadBlobs(AH, ropt->dropSchema);
  			break;
  
  		default:				/* Always have a default */
--- 498,507 ----
  			break;
  
  		case BLK_BLOBS:
! 			if (strcmp(te->desc, "BLOBS") == 0)
! 				_LoadBlobs(AH, ropt->dropSchema, true);
! 			else
! 				_LoadBlobs(AH, ropt->dropSchema, false);
  			break;
  
  		default:				/* Always have a default */
***************
*** 619,625 **** _PrintData(ArchiveHandle *AH)
  }
  
  static void
! _LoadBlobs(ArchiveHandle *AH, bool drop)
  {
  	Oid			oid;
  
--- 622,628 ----
  }
  
  static void
! _LoadBlobs(ArchiveHandle *AH, bool cleanup, bool compat)
  {
  	Oid			oid;
  
***************
*** 628,634 **** _LoadBlobs(ArchiveHandle *AH, bool drop)
  	oid = ReadInt(AH);
  	while (oid != 0)
  	{
! 		StartRestoreBlob(AH, oid, drop);
  		_PrintData(AH);
  		EndRestoreBlob(AH, oid);
  		oid = ReadInt(AH);
--- 631,637 ----
  	oid = ReadInt(AH);
  	while (oid != 0)
  	{
! 		StartRestoreBlob(AH, oid, cleanup, compat);
  		_PrintData(AH);
  		EndRestoreBlob(AH, oid);
  		oid = ReadInt(AH);
*** a/src/bin/pg_dump/pg_backup_db.c
--- b/src/bin/pg_dump/pg_backup_db.c
***************
*** 12,17 ****
--- 12,18 ----
  
  #include "pg_backup_db.h"
  #include "dumputils.h"
+ #include "libpq/libpq-fs.h"
  
  #include <unistd.h>
  
***************
*** 653,671 **** CommitTransaction(ArchiveHandle *AH)
  }
  
  void
! DropBlobIfExists(ArchiveHandle *AH, Oid oid)
  {
  	/* Call lo_unlink only if exists to avoid not-found error. */
  	if (PQserverVersion(AH->connection) >= 80500)
  	{
! 		ahprintf(AH, "SELECT pg_catalog.lo_unlink(oid) "
  					 "FROM pg_catalog.pg_largeobject_metadata "
  					 "WHERE oid = %u;\n", oid);
  	}
  	else
  	{
! 		ahprintf(AH, "SELECT CASE WHEN EXISTS(SELECT 1 FROM pg_catalog.pg_largeobject WHERE loid = '%u') THEN pg_catalog.lo_unlink('%u') END;\n",
! 				 oid, oid);
  	}
  }
  
--- 654,685 ----
  }
  
  void
! CleanupBlobIfExists(ArchiveHandle *AH, Oid oid, bool compat)
  {
  	/* Call lo_unlink only if exists to avoid not-found error. */
  	if (PQserverVersion(AH->connection) >= 80500)
  	{
! 		if (compat)
! 			ahprintf(AH, "SELECT pg_catalog.lo_unlink(oid) "
  					 "FROM pg_catalog.pg_largeobject_metadata "
  					 "WHERE oid = %u;\n", oid);
+ 		else
+ 			ahprintf(AH, "SELECT pg_catalog.lo_truncate(pg_catalog.lo_open(oid, %d), 0) "
+ 					 "FROM pg_catalog.pg_largeobject_metadata "
+ 					 "WHERE oid = %u;\n", INV_READ, oid);
  	}
  	else
  	{
! 		if (compat)
! 			ahprintf(AH, "SELECT CASE WHEN EXISTS"
! 					 "(SELECT 1 FROM pg_catalog.pg_largeobject WHERE loid = '%u') THEN"
! 					 " pg_catalog.lo_unlink('%u') END;\n",
! 					 oid, oid);
! 		else
! 			ahprintf(AH, "SELECT CASE WHEN EXISTS"
! 					 "(SELECT 1 FROM pg_catalog.pg_largeobject WHERE loid = '%u') THEN"
! 					 " pg_catalog.lo_truncate(pg_catalog.lo_open('%u', %d), 0) END;\n",
! 					 oid, oid, INV_WRITE);
  	}
  }
  
*** a/src/bin/pg_dump/pg_backup_files.c
--- b/src/bin/pg_dump/pg_backup_files.c
***************
*** 66,72 **** typedef struct
  } lclTocEntry;
  
  static const char *modulename = gettext_noop("file archiver");
! static void _LoadBlobs(ArchiveHandle *AH, RestoreOptions *ropt);
  static void _getBlobTocEntry(ArchiveHandle *AH, Oid *oid, char *fname);
  
  /*
--- 66,72 ----
  } lclTocEntry;
  
  static const char *modulename = gettext_noop("file archiver");
! static void _LoadBlobs(ArchiveHandle *AH, RestoreOptions *ropt, bool compat);
  static void _getBlobTocEntry(ArchiveHandle *AH, Oid *oid, char *fname);
  
  /*
***************
*** 330,336 **** _PrintTocData(ArchiveHandle *AH, TocEntry *te, RestoreOptions *ropt)
  		return;
  
  	if (strcmp(te->desc, "BLOBS") == 0)
! 		_LoadBlobs(AH, ropt);
  	else
  		_PrintFileData(AH, tctx->filename, ropt);
  }
--- 330,338 ----
  		return;
  
  	if (strcmp(te->desc, "BLOBS") == 0)
! 		_LoadBlobs(AH, ropt, true);
! 	else if (strcmp(te->desc, "BLOB DATA") == 0)
! 		_LoadBlobs(AH, ropt, false);
  	else
  		_PrintFileData(AH, tctx->filename, ropt);
  }
***************
*** 365,371 **** _getBlobTocEntry(ArchiveHandle *AH, Oid *oid, char fname[K_STD_BUF_SIZE])
  }
  
  static void
! _LoadBlobs(ArchiveHandle *AH, RestoreOptions *ropt)
  {
  	Oid			oid;
  	lclContext *ctx = (lclContext *) AH->formatData;
--- 367,373 ----
  }
  
  static void
! _LoadBlobs(ArchiveHandle *AH, RestoreOptions *ropt, bool compat)
  {
  	Oid			oid;
  	lclContext *ctx = (lclContext *) AH->formatData;
***************
*** 382,388 **** _LoadBlobs(ArchiveHandle *AH, RestoreOptions *ropt)
  
  	while (oid != 0)
  	{
! 		StartRestoreBlob(AH, oid, ropt->dropSchema);
  		_PrintFileData(AH, fname, ropt);
  		EndRestoreBlob(AH, oid);
  		_getBlobTocEntry(AH, &oid, fname);
--- 384,390 ----
  
  	while (oid != 0)
  	{
! 		StartRestoreBlob(AH, oid, ropt->dropSchema, compat);
  		_PrintFileData(AH, fname, ropt);
  		EndRestoreBlob(AH, oid);
  		_getBlobTocEntry(AH, &oid, fname);
*** a/src/bin/pg_dump/pg_backup_null.c
--- b/src/bin/pg_dump/pg_backup_null.c
***************
*** 147,160 **** _StartBlobs(ArchiveHandle *AH, TocEntry *te)
  static void
  _StartBlob(ArchiveHandle *AH, TocEntry *te, Oid oid)
  {
  	if (oid == 0)
  		die_horribly(AH, NULL, "invalid OID for large object\n");
  
  	if (AH->ropt->dropSchema)
! 		DropBlobIfExists(AH, oid);
  
! 	ahprintf(AH, "SELECT pg_catalog.lo_open(pg_catalog.lo_create('%u'), %d);\n",
! 			 oid, INV_WRITE);
  
  	AH->WriteDataPtr = _WriteBlobData;
  }
--- 147,165 ----
  static void
  _StartBlob(ArchiveHandle *AH, TocEntry *te, Oid oid)
  {
+ 	bool	compat = (strcmp(te->desc, "BLOBS") == 0 ? true : false);
+ 
  	if (oid == 0)
  		die_horribly(AH, NULL, "invalid OID for large object\n");
  
  	if (AH->ropt->dropSchema)
! 		CleanupBlobIfExists(AH, oid, compat);
  
! 	if (compat)
! 		ahprintf(AH, "SELECT pg_catalog.lo_open(pg_catalog.lo_create('%u'), %d);\n",
! 				 oid, INV_WRITE);
! 	else
! 		ahprintf(AH, "SELECT pg_catalog.lo_open(%u, %d);\n", oid, INV_WRITE);
  
  	AH->WriteDataPtr = _WriteBlobData;
  }
***************
*** 195,206 **** _PrintTocData(ArchiveHandle *AH, TocEntry *te, RestoreOptions *ropt)
  	{
  		AH->currToc = te;
  
! 		if (strcmp(te->desc, "BLOBS") == 0)
  			_StartBlobs(AH, te);
  
  		(*te->dataDumper) ((Archive *) AH, te->dataDumperArg);
  
! 		if (strcmp(te->desc, "BLOBS") == 0)
  			_EndBlobs(AH, te);
  
  		AH->currToc = NULL;
--- 200,213 ----
  	{
  		AH->currToc = te;
  
! 		if (strcmp(te->desc, "BLOBS") == 0 ||
! 			strcmp(te->desc, "BLOB DATA") == 0)
  			_StartBlobs(AH, te);
  
  		(*te->dataDumper) ((Archive *) AH, te->dataDumperArg);
  
! 		if (strcmp(te->desc, "BLOBS") == 0 ||
! 			strcmp(te->desc, "BLOB DATA") == 0)
  			_EndBlobs(AH, te);
  
  		AH->currToc = NULL;
*** a/src/bin/pg_dump/pg_backup_tar.c
--- b/src/bin/pg_dump/pg_backup_tar.c
***************
*** 100,106 **** typedef struct
  
  static const char *modulename = gettext_noop("tar archiver");
  
! static void _LoadBlobs(ArchiveHandle *AH, RestoreOptions *ropt);
  
  static TAR_MEMBER *tarOpen(ArchiveHandle *AH, const char *filename, char mode);
  static void tarClose(ArchiveHandle *AH, TAR_MEMBER *TH);
--- 100,106 ----
  
  static const char *modulename = gettext_noop("tar archiver");
  
! static void _LoadBlobs(ArchiveHandle *AH, RestoreOptions *ropt, bool compat);
  
  static TAR_MEMBER *tarOpen(ArchiveHandle *AH, const char *filename, char mode);
  static void tarClose(ArchiveHandle *AH, TAR_MEMBER *TH);
***************
*** 696,708 **** _PrintTocData(ArchiveHandle *AH, TocEntry *te, RestoreOptions *ropt)
  	}
  
  	if (strcmp(te->desc, "BLOBS") == 0)
! 		_LoadBlobs(AH, ropt);
  	else
  		_PrintFileData(AH, tctx->filename, ropt);
  }
  
  static void
! _LoadBlobs(ArchiveHandle *AH, RestoreOptions *ropt)
  {
  	Oid			oid;
  	lclContext *ctx = (lclContext *) AH->formatData;
--- 696,710 ----
  	}
  
  	if (strcmp(te->desc, "BLOBS") == 0)
! 		_LoadBlobs(AH, ropt, true);
! 	else if (strcmp(te->desc, "BLOB DATA") == 0)
! 		_LoadBlobs(AH, ropt, false);
  	else
  		_PrintFileData(AH, tctx->filename, ropt);
  }
  
  static void
! _LoadBlobs(ArchiveHandle *AH, RestoreOptions *ropt, bool compat)
  {
  	Oid			oid;
  	lclContext *ctx = (lclContext *) AH->formatData;
***************
*** 725,731 **** _LoadBlobs(ArchiveHandle *AH, RestoreOptions *ropt)
  			{
  				ahlog(AH, 1, "restoring large object OID %u\n", oid);
  
! 				StartRestoreBlob(AH, oid, ropt->dropSchema);
  
  				while ((cnt = tarRead(buf, 4095, th)) > 0)
  				{
--- 727,733 ----
  			{
  				ahlog(AH, 1, "restoring large object OID %u\n", oid);
  
! 				StartRestoreBlob(AH, oid, ropt->dropSchema, compat);
  
  				while ((cnt = tarRead(buf, 4095, th)) > 0)
  				{
*** a/src/bin/pg_dump/pg_dump.c
--- b/src/bin/pg_dump/pg_dump.c
***************
*** 190,198 **** static void selectSourceSchema(const char *schemaName);
  static char *getFormattedTypeName(Oid oid, OidOptions opts);
  static char *myFormatType(const char *typname, int32 typmod);
  static const char *fmtQualifiedId(const char *schema, const char *id);
! static bool hasBlobs(Archive *AH);
! static int	dumpBlobs(Archive *AH, void *arg);
! static int	dumpBlobComments(Archive *AH, void *arg);
  static void dumpDatabase(Archive *AH);
  static void dumpEncoding(Archive *AH);
  static void dumpStdStrings(Archive *AH);
--- 190,198 ----
  static char *getFormattedTypeName(Oid oid, OidOptions opts);
  static char *myFormatType(const char *typname, int32 typmod);
  static const char *fmtQualifiedId(const char *schema, const char *id);
! static void getBlobs(Archive *AH);
! static void dumpBlobItem(Archive *AH, BlobInfo *binfo);
! static int  dumpBlobData(Archive *AH, void *arg);
  static void dumpDatabase(Archive *AH);
  static void dumpEncoding(Archive *AH);
  static void dumpStdStrings(Archive *AH);
***************
*** 701,725 **** main(int argc, char **argv)
  			getTableDataFKConstraints();
  	}
  
! 	if (outputBlobs && hasBlobs(g_fout))
! 	{
! 		/* Add placeholders to allow correct sorting of blobs */
! 		DumpableObject *blobobj;
! 		DumpableObject *blobcobj;
! 
! 		blobobj = (DumpableObject *) malloc(sizeof(DumpableObject));
! 		blobobj->objType = DO_BLOBS;
! 		blobobj->catId = nilCatalogId;
! 		AssignDumpId(blobobj);
! 		blobobj->name = strdup("BLOBS");
! 
! 		blobcobj = (DumpableObject *) malloc(sizeof(DumpableObject));
! 		blobcobj->objType = DO_BLOB_COMMENTS;
! 		blobcobj->catId = nilCatalogId;
! 		AssignDumpId(blobcobj);
! 		blobcobj->name = strdup("BLOB COMMENTS");
! 		addObjectDependency(blobcobj, blobobj->dumpId);
! 	}
  
  	/*
  	 * Collect dependency data to assist in ordering the objects.
--- 701,708 ----
  			getTableDataFKConstraints();
  	}
  
! 	if (outputBlobs)
! 		getBlobs(g_fout);
  
  	/*
  	 * Collect dependency data to assist in ordering the objects.
***************
*** 1938,1980 **** dumpStdStrings(Archive *AH)
  
  
  /*
!  * hasBlobs:
   *	Test whether database contains any large objects
   */
! static bool
! hasBlobs(Archive *AH)
  {
! 	bool		result;
! 	const char *blobQry;
! 	PGresult   *res;
  
  	/* Make sure we are in proper schema */
  	selectSourceSchema("pg_catalog");
  
  	/* Check for BLOB OIDs */
  	if (AH->remoteVersion >= 80500)
! 		blobQry = "SELECT oid FROM pg_largeobject_metadata LIMIT 1";
  	else if (AH->remoteVersion >= 70100)
! 		blobQry = "SELECT loid FROM pg_largeobject LIMIT 1";
  	else
! 		blobQry = "SELECT oid FROM pg_class WHERE relkind = 'l' LIMIT 1";
  
! 	res = PQexec(g_conn, blobQry);
! 	check_sql_result(res, g_conn, blobQry, PGRES_TUPLES_OK);
  
! 	result = PQntuples(res) > 0;
  
! 	PQclear(res);
  
! 	return result;
  }
  
  /*
!  * dumpBlobs:
!  *	dump all blobs
   */
  static int
! dumpBlobs(Archive *AH, void *arg)
  {
  	const char *blobQry;
  	const char *blobFetchQry;
--- 1921,2052 ----
  
  
  /*
!  * getBlobs:
   *	Test whether database contains any large objects
   */
! static void
! getBlobs(Archive *AH)
  {
! 	PQExpBuffer		blobQry = createPQExpBuffer();
! 	BlobInfo	   *blobobj;
! 	DumpableObject *blobdata;
! 	PGresult	   *res;
! 	int				i;
  
  	/* Make sure we are in proper schema */
  	selectSourceSchema("pg_catalog");
  
  	/* Check for BLOB OIDs */
  	if (AH->remoteVersion >= 80500)
! 		appendPQExpBuffer(blobQry,
! 						  "SELECT oid, (%s lomowner), lomacl"
! 						  " FROM pg_largeobject_metadata",
! 						  username_subquery);
  	else if (AH->remoteVersion >= 70100)
! 		appendPQExpBuffer(blobQry,
! 						  "SELECT DISTINCT loid, NULL, NULL"
! 						  " FROM pg_largeobject");
  	else
! 		appendPQExpBuffer(blobQry,
! 						  "SELECT DISTINCT oid, NULL, NULL"
! 						  " FROM pg_class WHERE relkind = 'l'");
  
! 	res = PQexec(g_conn, blobQry->data);
! 	check_sql_result(res, g_conn, blobQry->data, PGRES_TUPLES_OK);
  
! 	/*
! 	 * Now, a large object has its own "BLOB ITEM" section to
! 	 * declare itself.
! 	 */
! 	for (i = 0; i < PQntuples(res); i++)
! 	{
! 		blobobj = (BlobInfo *) malloc(sizeof(BlobInfo));
! 		blobobj->dobj.objType = DO_BLOB_ITEM;
! 		blobobj->dobj.catId.oid = atooid(PQgetvalue(res, i, 0));
! 		blobobj->dobj.catId.tableoid = LargeObjectRelationId;
! 		AssignDumpId(&blobobj->dobj);
  
! 		blobobj->dobj.name = strdup(PQgetvalue(res, i, 0));
! 		blobobj->rolname = strdup(PQgetvalue(res, i, 1));
! 		blobobj->blobacl = strdup(PQgetvalue(res, i, 2));
! 	}
  
! 	/*
! 	 * If we have a large object at least, "BLOB DATA" section
! 	 * is also necessary.
! 	 */
! 	if (PQntuples(res) > 0)
! 	{
! 		blobdata = (DumpableObject *) malloc(sizeof(DumpableObject));
! 		blobdata->objType = DO_BLOB_DATA;
! 		blobdata->catId = nilCatalogId;
! 		AssignDumpId(blobdata);
! 		blobdata->name = strdup("BLOB DATA");
! 	}
  }
  
  /*
!  * dumpBlobItem
!  *
!  * dump a definition of the given large object
!  */
! static void
! dumpBlobItem(Archive *AH, BlobInfo *binfo)
! {
! 	PQExpBuffer		bquery;
! 	PQExpBuffer		dquery;
! 	PQExpBuffer		temp;
! 
! 	/* Skip if not to be dumped */
! 	if (!binfo->dobj.dump || dataOnly)
! 		return;
! 
! 	bquery = createPQExpBuffer();
! 	dquery = createPQExpBuffer();
! 	temp = createPQExpBuffer();
! 
! 	/*
! 	 * Create an empty large object
! 	 */
! 	appendPQExpBuffer(bquery, "SELECT lo_create(%s);\n", binfo->dobj.name);
! 	appendPQExpBuffer(dquery, "SELECT lo_unlink(%s);\n", binfo->dobj.name);
! 
! 	ArchiveEntry(AH, binfo->dobj.catId, binfo->dobj.dumpId,
! 				 binfo->dobj.name,
! 				 NULL, NULL,
! 				 binfo->rolname, false,
! 				 "BLOB ITEM", SECTION_PRE_DATA,
! 				 bquery->data, dquery->data, NULL,
! 				 binfo->dobj.dependencies, binfo->dobj.nDeps,
! 				 NULL, NULL);
! 
! 	/*
! 	 * Create a comment on large object, if necessary
! 	 */
! 	appendPQExpBuffer(temp, "LARGE OBJECT %s", binfo->dobj.name);
! 	dumpComment(AH, temp->data, NULL, binfo->rolname,
! 				binfo->dobj.catId, 0, binfo->dobj.dumpId);
! 
! 	/*
! 	 * Dump access privileges, if necessary
! 	 */
! 	dumpACL(AH, binfo->dobj.catId, binfo->dobj.dumpId,
! 			"LARGE OBJECT",
! 			binfo->dobj.name, NULL,
! 			binfo->dobj.name, NULL,
! 			binfo->rolname, binfo->blobacl);
! 
! 	destroyPQExpBuffer(bquery);
! 	destroyPQExpBuffer(dquery);
! 	destroyPQExpBuffer(temp);
! }
! 
! /*
!  * dumpBlobData:
!  *	dump all the data contents of large object
   */
  static int
! dumpBlobData(Archive *AH, void *arg)
  {
  	const char *blobQry;
  	const char *blobFetchQry;
***************
*** 2022,2028 **** dumpBlobs(Archive *AH, void *arg)
  			loFd = lo_open(g_conn, blobOid, INV_READ);
  			if (loFd == -1)
  			{
! 				write_msg(NULL, "dumpBlobs(): could not open large object: %s",
  						  PQerrorMessage(g_conn));
  				exit_nicely();
  			}
--- 2094,2100 ----
  			loFd = lo_open(g_conn, blobOid, INV_READ);
  			if (loFd == -1)
  			{
! 				write_msg(NULL, "dumpBlobData(): could not open large object: %s",
  						  PQerrorMessage(g_conn));
  				exit_nicely();
  			}
***************
*** 2035,2041 **** dumpBlobs(Archive *AH, void *arg)
  				cnt = lo_read(g_conn, loFd, buf, LOBBUFSIZE);
  				if (cnt < 0)
  				{
! 					write_msg(NULL, "dumpBlobs(): error reading large object: %s",
  							  PQerrorMessage(g_conn));
  					exit_nicely();
  				}
--- 2107,2113 ----
  				cnt = lo_read(g_conn, loFd, buf, LOBBUFSIZE);
  				if (cnt < 0)
  				{
! 					write_msg(NULL, "dumpBlobData(): error reading large object: %s",
  							  PQerrorMessage(g_conn));
  					exit_nicely();
  				}
***************
*** 2054,2187 **** dumpBlobs(Archive *AH, void *arg)
  	return 1;
  }
  
- /*
-  * dumpBlobComments
-  *	dump all blob properties.
-  *  It has "BLOB COMMENTS" tag due to the historical reason, but note
-  *  that it is the routine to dump all the properties of blobs.
-  *
-  * Since we don't provide any way to be selective about dumping blobs,
-  * there's no need to be selective about their comments either.  We put
-  * all the comments into one big TOC entry.
-  */
- static int
- dumpBlobComments(Archive *AH, void *arg)
- {
- 	const char *blobQry;
- 	const char *blobFetchQry;
- 	PQExpBuffer cmdQry = createPQExpBuffer();
- 	PGresult   *res;
- 	int			i;
- 
- 	if (g_verbose)
- 		write_msg(NULL, "saving large object properties\n");
- 
- 	/* Make sure we are in proper schema */
- 	selectSourceSchema("pg_catalog");
- 
- 	/* Cursor to get all BLOB comments */
- 	if (AH->remoteVersion >= 80500)
- 		blobQry = "DECLARE blobcmt CURSOR FOR SELECT oid, "
- 			"obj_description(oid, 'pg_largeobject'), "
- 			"pg_get_userbyid(lomowner), lomacl "
- 			"FROM pg_largeobject_metadata";
- 	else if (AH->remoteVersion >= 70300)
- 		blobQry = "DECLARE blobcmt CURSOR FOR SELECT loid, "
- 			"obj_description(loid, 'pg_largeobject'), NULL, NULL "
- 			"FROM (SELECT DISTINCT loid FROM "
- 			"pg_description d JOIN pg_largeobject l ON (objoid = loid) "
- 			"WHERE classoid = 'pg_largeobject'::regclass) ss";
- 	else if (AH->remoteVersion >= 70200)
- 		blobQry = "DECLARE blobcmt CURSOR FOR SELECT loid, "
- 			"obj_description(loid, 'pg_largeobject'), NULL, NULL "
- 			"FROM (SELECT DISTINCT loid FROM pg_largeobject) ss";
- 	else if (AH->remoteVersion >= 70100)
- 		blobQry = "DECLARE blobcmt CURSOR FOR SELECT loid, "
- 			"obj_description(loid), NULL, NULL "
- 			"FROM (SELECT DISTINCT loid FROM pg_largeobject) ss";
- 	else
- 		blobQry = "DECLARE blobcmt CURSOR FOR SELECT oid, "
- 			"	( "
- 			"		SELECT description "
- 			"		FROM pg_description pd "
- 			"		WHERE pd.objoid=pc.oid "
- 			"	), NULL, NULL "
- 			"FROM pg_class pc WHERE relkind = 'l'";
- 
- 	res = PQexec(g_conn, blobQry);
- 	check_sql_result(res, g_conn, blobQry, PGRES_COMMAND_OK);
- 
- 	/* Command to fetch from cursor */
- 	blobFetchQry = "FETCH 100 IN blobcmt";
- 
- 	do
- 	{
- 		PQclear(res);
- 
- 		/* Do a fetch */
- 		res = PQexec(g_conn, blobFetchQry);
- 		check_sql_result(res, g_conn, blobFetchQry, PGRES_TUPLES_OK);
- 
- 		/* Process the tuples, if any */
- 		for (i = 0; i < PQntuples(res); i++)
- 		{
- 			Oid			blobOid = atooid(PQgetvalue(res, i, 0));
- 			char	   *lo_comment = PQgetvalue(res, i, 1);
- 			char	   *lo_owner = PQgetvalue(res, i, 2);
- 			char	   *lo_acl = PQgetvalue(res, i, 3);
- 			char		lo_name[32];
- 
- 			resetPQExpBuffer(cmdQry);
- 
- 			/* comment on the blob */
- 			if (!PQgetisnull(res, i, 1))
- 			{
- 				appendPQExpBuffer(cmdQry,
- 								  "COMMENT ON LARGE OBJECT %u IS ", blobOid);
- 				appendStringLiteralAH(cmdQry, lo_comment, AH);
- 				appendPQExpBuffer(cmdQry, ";\n");
- 			}
- 
- 			/* dump blob ownership, if necessary */
- 			if (!PQgetisnull(res, i, 2))
- 			{
- 				appendPQExpBuffer(cmdQry,
- 								  "ALTER LARGE OBJECT %u OWNER TO %s;\n",
- 								  blobOid, lo_owner);
- 			}
- 
- 			/* dump blob privileges, if necessary */
- 			if (!PQgetisnull(res, i, 3) &&
- 				!dataOnly && !aclsSkip)
- 			{
- 				snprintf(lo_name, sizeof(lo_name), "%u", blobOid);
- 				if (!buildACLCommands(lo_name, NULL, "LARGE OBJECT",
- 									  lo_acl, lo_owner, "",
- 									  AH->remoteVersion, cmdQry))
- 				{
- 					write_msg(NULL, "could not parse ACL (%s) for "
- 							  "large object %u", lo_acl, blobOid);
- 					exit_nicely();
- 				}
- 			}
- 
- 			if (cmdQry->len > 0)
- 			{
- 				appendPQExpBuffer(cmdQry, "\n");
- 				archputs(cmdQry->data, AH);
- 			}
- 		}
- 	} while (PQntuples(res) > 0);
- 
- 	PQclear(res);
- 
- 	archputs("\n", AH);
- 
- 	destroyPQExpBuffer(cmdQry);
- 
- 	return 1;
- }
- 
  static void
  binary_upgrade_set_type_oids_by_type_oid(PQExpBuffer upgrade_buffer,
  											   Oid pg_type_oid)
--- 2126,2131 ----
***************
*** 6478,6498 **** dumpDumpableObject(Archive *fout, DumpableObject *dobj)
  		case DO_DEFAULT_ACL:
  			dumpDefaultACL(fout, (DefaultACLInfo *) dobj);
  			break;
! 		case DO_BLOBS:
! 			ArchiveEntry(fout, dobj->catId, dobj->dumpId,
! 						 dobj->name, NULL, NULL, "",
! 						 false, "BLOBS", SECTION_DATA,
! 						 "", "", NULL,
! 						 dobj->dependencies, dobj->nDeps,
! 						 dumpBlobs, NULL);
  			break;
! 		case DO_BLOB_COMMENTS:
  			ArchiveEntry(fout, dobj->catId, dobj->dumpId,
  						 dobj->name, NULL, NULL, "",
! 						 false, "BLOB COMMENTS", SECTION_DATA,
  						 "", "", NULL,
  						 dobj->dependencies, dobj->nDeps,
! 						 dumpBlobComments, NULL);
  			break;
  	}
  }
--- 6422,6437 ----
  		case DO_DEFAULT_ACL:
  			dumpDefaultACL(fout, (DefaultACLInfo *) dobj);
  			break;
! 		case DO_BLOB_ITEM:
! 			dumpBlobItem(fout, (BlobInfo *) dobj);
  			break;
! 		case DO_BLOB_DATA:
  			ArchiveEntry(fout, dobj->catId, dobj->dumpId,
  						 dobj->name, NULL, NULL, "",
! 						 false, "BLOB DATA", SECTION_DATA,
  						 "", "", NULL,
  						 dobj->dependencies, dobj->nDeps,
! 						 dumpBlobData, NULL);
  			break;
  	}
  }
*** a/src/bin/pg_dump/pg_dump.h
--- b/src/bin/pg_dump/pg_dump.h
***************
*** 115,122 **** typedef enum
  	DO_FDW,
  	DO_FOREIGN_SERVER,
  	DO_DEFAULT_ACL,
! 	DO_BLOBS,
! 	DO_BLOB_COMMENTS
  } DumpableObjectType;
  
  typedef struct _dumpableObject
--- 115,122 ----
  	DO_FDW,
  	DO_FOREIGN_SERVER,
  	DO_DEFAULT_ACL,
! 	DO_BLOB_DATA,
! 	DO_BLOB_ITEM,
  } DumpableObjectType;
  
  typedef struct _dumpableObject
***************
*** 442,447 **** typedef struct _defaultACLInfo
--- 442,454 ----
  	char	   *defaclacl;
  } DefaultACLInfo;
  
+ typedef struct _blobInfo
+ {
+ 	DumpableObject	dobj;
+ 	char	   *rolname;
+ 	char	   *blobacl;
+ } BlobInfo;
+ 
  /* global decls */
  extern bool force_quotes;		/* double-quotes for identifiers flag */
  extern bool g_verbose;			/* verbose flag */
*** a/src/bin/pg_dump/pg_dump_sort.c
--- b/src/bin/pg_dump/pg_dump_sort.c
***************
*** 92,99 **** static const int newObjectTypePriority[] =
  	14,							/* DO_FDW */
  	15,							/* DO_FOREIGN_SERVER */
  	27,							/* DO_DEFAULT_ACL */
! 	20,							/* DO_BLOBS */
! 	21							/* DO_BLOB_COMMENTS */
  };
  
  
--- 92,99 ----
  	14,							/* DO_FDW */
  	15,							/* DO_FOREIGN_SERVER */
  	27,							/* DO_DEFAULT_ACL */
! 	21,							/* DO_BLOB_DATA */
! 	20,							/* DO_BLOB_ITEM */
  };
  
  
***************
*** 1146,1159 **** describeDumpableObject(DumpableObject *obj, char *buf, int bufsize)
  					 "DEFAULT ACL %s  (ID %d OID %u)",
  					 obj->name, obj->dumpId, obj->catId.oid);
  			return;
! 		case DO_BLOBS:
  			snprintf(buf, bufsize,
! 					 "BLOBS  (ID %d)",
  					 obj->dumpId);
  			return;
! 		case DO_BLOB_COMMENTS:
  			snprintf(buf, bufsize,
! 					 "BLOB COMMENTS  (ID %d)",
  					 obj->dumpId);
  			return;
  	}
--- 1146,1159 ----
  					 "DEFAULT ACL %s  (ID %d OID %u)",
  					 obj->name, obj->dumpId, obj->catId.oid);
  			return;
! 		case DO_BLOB_DATA:
  			snprintf(buf, bufsize,
! 					 "BLOB DATA  (ID %d)",
  					 obj->dumpId);
  			return;
! 		case DO_BLOB_ITEM:
  			snprintf(buf, bufsize,
! 					 "BLOB ITEM  (ID %d)",
  					 obj->dumpId);
  			return;
  	}
#97Takahiro Itagaki
itagaki.takahiro@oss.ntt.co.jp
In reply to: KaiGai Kohei (#96)
Re: Largeobject Access Controls (r2460)

KaiGai Kohei <kaigai@ak.jp.nec.com> wrote:

The attached patch uses one TOC entry for each blob object.

While testing the new patch, I found that the "ALTER LARGE OBJECT" command
returns an "ALTER LARGEOBJECT" tag. Should it be "ALTER LARGE(space)OBJECT"
instead? As I remember, we had decided not to use LARGEOBJECT
(without a space) in user-visible messages, right?
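
For example (with a hypothetical OID), psql currently shows:

  ALTER LARGE OBJECT 16405 OWNER TO itagaki;
  ALTER LARGEOBJECT

where one would expect the command tag "ALTER LARGE OBJECT".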

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center

#98Tom Lane
tgl@sss.pgh.pa.us
In reply to: Takahiro Itagaki (#97)
Re: Largeobject Access Controls (r2460)

Takahiro Itagaki <itagaki.takahiro@oss.ntt.co.jp> writes:

While testing the new patch, I found that the "ALTER LARGE OBJECT" command
returns an "ALTER LARGEOBJECT" tag. Should it be "ALTER LARGE(space)OBJECT"
instead? As I remember, we had decided not to use LARGEOBJECT
(without a space) in user-visible messages, right?

The command tag should match the actual command. If the command name
is "ALTER LARGE OBJECT", the command tag should be too. This is
independent of phraseology we might choose in error messages (though
I agree I don't like "largeobject" in those either).

regards, tom lane

#99KaiGai Kohei
kaigai@ak.jp.nec.com
In reply to: Takahiro Itagaki (#97)
1 attachment(s)
Re: Largeobject Access Controls (r2460)

(2010/01/28 18:21), Takahiro Itagaki wrote:

KaiGai Kohei<kaigai@ak.jp.nec.com> wrote:

The attached patch uses one TOC entry for each blob object.

While testing the new patch, I found that the "ALTER LARGE OBJECT" command
returns an "ALTER LARGEOBJECT" tag. Should it be "ALTER LARGE(space)OBJECT"
instead? As I remember, we had decided not to use LARGEOBJECT
(without a space) in user-visible messages, right?

Sorry, I forgot to fix this tag when it was pointed out that LARGEOBJECT
should be LARGE(space)OBJECT.

Thanks,
--
OSS Platform Development Division, NEC
KaiGai Kohei <kaigai@ak.jp.nec.com>

Attachments:

pgsql-fix-large_object-tag.patchapplication/octect-stream; name=pgsql-fix-large_object-tag.patchDownload
*** a/src/backend/tcop/utility.c
--- b/src/backend/tcop/utility.c
***************
*** 1687,1693 **** CreateCommandTag(Node *parsetree)
  					tag = "ALTER LANGUAGE";
  					break;
  				case OBJECT_LARGEOBJECT:
! 					tag = "ALTER LARGEOBJECT";
  					break;
  				case OBJECT_OPERATOR:
  					tag = "ALTER OPERATOR";
--- 1687,1693 ----
  					tag = "ALTER LANGUAGE";
  					break;
  				case OBJECT_LARGEOBJECT:
! 					tag = "ALTER LARGE OBJECT";
  					break;
  				case OBJECT_OPERATOR:
  					tag = "ALTER OPERATOR";
#100Takahiro Itagaki
itagaki.takahiro@oss.ntt.co.jp
In reply to: KaiGai Kohei (#99)
Re: Largeobject Access Controls (r2460)

KaiGai Kohei <kaigai@ak.jp.nec.com> wrote:

While testing the new patch, I found that the "ALTER LARGE OBJECT" command
returns an "ALTER LARGEOBJECT" tag. Should it be "ALTER LARGE(space)OBJECT"
instead?

Sorry, I forgot to fix this tag when it was pointed out that LARGEOBJECT
should be LARGE(space)OBJECT.

Committed. Thanks.

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center

#101Takahiro Itagaki
itagaki.takahiro@oss.ntt.co.jp
In reply to: KaiGai Kohei (#96)
Re: Largeobject Access Controls (r2460)

KaiGai Kohei <kaigai@ak.jp.nec.com> wrote:

The attached patch uses one TOC entry for each blob object.

This patch not only fixes the existing bugs, but also refactors
the dump format of large objects in pg_dump. The new format is
more similar to the format of tables:

Section    <Tables>       <New LO>       <Old LO>
-----------------------------------------------------
Schema     "TABLE"        "BLOB ITEM"    "BLOBS"
Data       "TABLE DATA"   "BLOB DATA"    "BLOBS"
Comments   "COMMENT"      "COMMENT"      "BLOB COMMENTS"

We will allocate a BlobInfo in memory for each large object. It might
consume much more memory than former versions if we have many large
objects, but we discussed this and agreed it is an acceptable change.

As far as I have read, the patch is almost ready to commit,
except for the following issue about backward compatibility:

* "BLOB DATA"
This section is the same as the existing "BLOBS" section, except that
_LoadBlobs() does not create a new large object before opening it with
INV_WRITE, and lo_truncate() is used instead of lo_unlink() when --clean
is given.

The legacy sections ("BLOBS" and "BLOB COMMENTS") remain readable for
compatibility, but newer pg_dump never creates them.

I wonder whether we need to support older servers in pg_restore. You add
a check "PQserverVersion >= 80500" in CleanupBlobIfExists(), but our
documentation says we cannot use pg_restore 9.0 with 8.4 or older servers:

http://developer.postgresql.org/pgdocs/postgres/app-pgdump.html
| it is not guaranteed that pg_dump's output can be loaded into
| a server of an older major version

Can we remove such a path and raise an error instead?
Also, even if we support the older servers in the routine,
the new bytea format will be another problem anyway.

One remaining issue is how to decide whether blobs are to be dumped.
Right now, --schema-only eliminates all blob dumps.
However, I think it should follow the convention of the other object classes:

-a, --data-only ... only "BLOB DATA" sections, not "BLOB ITEM"
-s, --schema-only ... only "BLOB ITEM" sections, not "BLOB DATA"
-b, --blobs ... both "BLOB ITEM" and "BLOB DATA", independently
of --data-only and --schema-only?

I cannot imagine situations that require data-only dumps -- for example,
a database being restored that has a filled pg_largeobject_metadata and an
empty or broken pg_largeobject -- it seems unnatural.

I'd prefer to keep the existing behavior:
* default or data-only : dump all attributes and data of blobs
* schema-only : don't dump any blobs
and have independent options to control blob dumps:
* -b, --blobs : dump all blobs even if schema-only
* -B, --no-blobs : [NEW] don't dump any blobs even if default or data-only

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center

#102KaiGai Kohei
kaigai@ak.jp.nec.com
In reply to: Takahiro Itagaki (#101)
Re: Largeobject Access Controls (r2460)

(2010/02/01 14:19), Takahiro Itagaki wrote:

As far as I have read, the patch is almost ready to commit,
except for the following issue about backward compatibility:

* "BLOB DATA"
This section is the same as the existing "BLOBS" section, except that
_LoadBlobs() does not create a new large object before opening it with
INV_WRITE, and lo_truncate() is used instead of lo_unlink() when --clean
is given.

The legacy sections ("BLOBS" and "BLOB COMMENTS") remain readable for
compatibility, but newer pg_dump never creates them.

I wonder whether we need to support older servers in pg_restore. You add
a check "PQserverVersion >= 80500" in CleanupBlobIfExists(), but our
documentation says we cannot use pg_restore 9.0 with 8.4 or older servers:

http://developer.postgresql.org/pgdocs/postgres/app-pgdump.html
| it is not guaranteed that pg_dump's output can be loaded into
| a server of an older major version

Can we remove such a path and raise an error instead?
Also, even if we support the older servers in the routine,
the new bytea format will be another problem anyway.

OK, I'll fix it.

One remaining issue is how to decide whether blobs are to be dumped.
Right now, --schema-only eliminates all blob dumps.
However, I think it should follow the convention of the other object classes:

-a, --data-only ... only "BLOB DATA" sections, not "BLOB ITEM"
-s, --schema-only ... only "BLOB ITEM" sections, not "BLOB DATA"
-b, --blobs ... both "BLOB ITEM" and "BLOB DATA", independently
of --data-only and --schema-only?

I cannot imagine situations that require data-only dumps -- for example,
a database being restored that has a filled pg_largeobject_metadata and an
empty or broken pg_largeobject -- it seems unnatural.

Indeed, it might not be a sane situation.

However, we can imagine the situation where a user wants to back up a
database into two separate files (one for schema definitions; one for
data contents). Just after restoring the schema definitions, all the
large objects are created as empty blobs. Then, we can restore their
data contents.
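
Schematically (OID hypothetical), the schema-definition file would carry

  SELECT lo_create(16406);
  ALTER LARGE OBJECT 16406 OWNER TO ymj;

while the data-contents file would only open the existing object and
write into it, roughly:

  SELECT pg_catalog.lo_open(16406, 131072);  -- 131072 = INV_WRITE
  SELECT pg_catalog.lowrite(0, '...');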

I wonder if the behavior is easily understandable for end users.
The "BLOB ITEM" section contains properties of a certain large object,
such as its identifier (loid), comment, ownership and access privileges.
These are categorized as schema definitions in other object classes,
but we still need special treatment for blobs.

The --schema-only with large objects might be unnatural, but the
--data-only with properties of large objects is also unnatural.
Which behavior is more unnatural?

Thanks,
--
OSS Platform Development Division, NEC
KaiGai Kohei <kaigai@ak.jp.nec.com>

#103Takahiro Itagaki
itagaki.takahiro@oss.ntt.co.jp
In reply to: KaiGai Kohei (#102)
Re: Largeobject Access Controls (r2460)

KaiGai Kohei <kaigai@ak.jp.nec.com> wrote:

Can we remove such a path and raise an error instead?
Also, even if we support the older servers in the routine,
the new bytea format will be another problem anyway.

OK, I'll fix it.

I think we might need to discuss explicit version checks in pg_restore.
It is not related to large objects, but to pg_restore's capability.
We've never supported restoring a dump to older servers, but we don't have
any version checks, right? Should we do the checks at the beginning of a
restore? If we did so, the LO patch could be simplified further.

The --schema-only with large objects might be unnatural, but the
--data-only with properties of large objects is also unnatural.
Which behavior is more unnatural?

I think large object metadata is a kind of row-based access control.
How will we dump and restore per-row ACLs when we support them for
normal tables? We should treat LO metadata the same as row-based ACLs.
In my opinion, I'd like to treat it as part of the data (not of the schema).

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center

#104KaiGai Kohei
kaigai@ak.jp.nec.com
In reply to: Takahiro Itagaki (#103)
Re: Largeobject Access Controls (r2460)

(2010/02/02 9:33), Takahiro Itagaki wrote:

KaiGai Kohei<kaigai@ak.jp.nec.com> wrote:

Can we remove such a path and raise an error instead?
Also, even if we support the older servers in the routine,
the new bytea format will be another problem anyway.

OK, I'll fix it.

I think we might need to discuss explicit version checks in pg_restore.
It is not related to large objects, but to pg_restore's capability.
We've never supported restoring a dump to older servers, but we don't have
any version checks, right? Should we do the checks at the beginning of a
restore? If we did so, the LO patch could be simplified further.

I agree it is a good idea.

The --schema-only with large objects might be unnatural, but the
--data-only with properties of large objects is also unnatural.
Which behavior is more unnatural?

I think large object metadata is a kind of row-based access control.
How will we dump and restore per-row ACLs when we support them for
normal tables? We should treat LO metadata the same as row-based ACLs.
In my opinion, I'd like to treat it as part of the data (not of the schema).

OK, I'll update the patch according to the behavior you suggested.
| I'd prefer to keep the existing behavior:
| * default or data-only : dump all attributes and data of blobs
| * schema-only : don't dump any blobs
| and have independent options to control blob dumps:
| * -b, --blobs : dump all blobs even if schema-only
| * -B, --no-blobs : [NEW] don't dump any blobs even if default or data-only

Please wait for a while. Thanks,
--
OSS Platform Development Division, NEC
KaiGai Kohei <kaigai@ak.jp.nec.com>

#105KaiGai Kohei
kaigai@ak.jp.nec.com
In reply to: KaiGai Kohei (#104)
Re: Largeobject Access Controls (r2460)

The --schema-only with large objects might be unnatural, but the
--data-only with properties of large objects is also unnatural.
Which behavior is more unnatural?

I think large object metadata is a kind of row-based access control.
How will we dump and restore per-row ACLs when we support them for
normal tables? We should treat LO metadata the same as row-based ACLs.
In my opinion, I'd like to treat it as part of the data (not of the schema).

OK, I'll update the patch according to the behavior you suggested.
| I'd prefer to keep the existing behavior:
| * default or data-only : dump all attributes and data of blobs
| * schema-only : don't dump any blobs
| and have independent options to control blob dumps:
| * -b, --blobs : dump all blobs even if schema-only
| * -B, --no-blobs : [NEW] don't dump any blobs even if default or data-only

I found out that it needs special treatment to dump the comments and access
privileges of blobs. See dumpACL() and dumpComment():

| static void
| dumpACL(Archive *fout, CatalogId objCatId, DumpId objDumpId,
| const char *type, const char *name, const char *subname,
| const char *tag, const char *nspname, const char *owner,
| const char *acls)
| {
| PQExpBuffer sql;
|
| /* Do nothing if ACL dump is not enabled */
| if (dataOnly || aclsSkip)
| return;
| :

| static void
| dumpComment(Archive *fout, const char *target,
| const char *namespace, const char *owner,
| CatalogId catalogId, int subid, DumpId dumpId)
| {
| CommentItem *comments;
| int ncomments;
|
| /* Comments are SCHEMA not data */
| if (dataOnly)
| return;
| :
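
For reference, the metadata statements in question look like the following
sketch (OID 16384, the role names and the comment text are hypothetical
examples):

  ALTER LARGE OBJECT 16384 OWNER TO alice;         -- owner
  GRANT SELECT ON LARGE OBJECT 16384 TO bob;       -- dumpACL()
  COMMENT ON LARGE OBJECT 16384 IS 'sample blob';  -- dumpComment()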

In addition, _printTocEntry() is not called with acl_pass = true
when --data-only is given.

I again wonder whether we are going in the right direction.

Originally, the reason we decided to use a per-blob TOC entry was that
the "BLOB ACLS" entry needed a few exceptional treatments in the code.
But if we treat the "BLOB ITEM" entry as data contents, it will also
need additional exceptional treatments.

Indeed, even if we had row-level ACLs, they would be dumped in the data
section without being separated into properties and data contents, because
of an implementation restriction, not a data-modeling reason.

Creating many empty large objects may not be a sane situation, but at
least it fits the existing manner of pg_dump/pg_restore.

Thanks,
--
OSS Platform Development Division, NEC
KaiGai Kohei <kaigai@ak.jp.nec.com>

#106Robert Haas
robertmhaas@gmail.com
In reply to: KaiGai Kohei (#105)
Re: Largeobject Access Controls (r2460)

2010/2/1 KaiGai Kohei <kaigai@ak.jp.nec.com>:

I again wonder whether we are going in the right direction.

I believe the proposed approach is to dump blob metadata if and only
if you are also dumping blob contents, and to do all of this for data
dumps but not schema dumps. That seems about right to me.

Originally, the reason we decided to use a per-blob TOC entry was that
the "BLOB ACLS" entry needed a few exceptional treatments in the code.
But if we treat the "BLOB ITEM" entry as data contents, it will also
need additional exceptional treatments.

But the new ones are less objectionable, maybe.

...Robert

#107KaiGai Kohei
kaigai@ak.jp.nec.com
In reply to: Robert Haas (#106)
Re: Largeobject Access Controls (r2460)

(2010/02/04 0:20), Robert Haas wrote:

2010/2/1 KaiGai Kohei<kaigai@ak.jp.nec.com>:

I again wonder whether we are going in the right direction.

I believe the proposed approach is to dump blob metadata if and only
if you are also dumping blob contents, and to do all of this for data
dumps but not schema dumps. That seems about right to me.

In other words:

<default> -> blob contents and metadata (owner, acl, comments) shall
be dumped
--data-only -> only blob contents shall be dumped
--schema-only -> neither blob contents nor metadata are dumped.

Do I understand correctly?

Originally, the reason we decided to use a per-blob TOC entry was that
the "BLOB ACLS" entry needed a few exceptional treatments in the code.
But if we treat the "BLOB ITEM" entry as data contents, it will also
need additional exceptional treatments.

But the new ones are less objectionable, maybe.

...Robert

--
OSS Platform Development Division, NEC
KaiGai Kohei <kaigai@ak.jp.nec.com>

#108KaiGai Kohei
kaigai@ak.jp.nec.com
In reply to: KaiGai Kohei (#107)
1 attachment(s)
Re: Largeobject Access Controls (r2460)

(2010/02/04 17:30), KaiGai Kohei wrote:

(2010/02/04 0:20), Robert Haas wrote:

2010/2/1 KaiGai Kohei<kaigai@ak.jp.nec.com>:

I again wonder whether we are going in the right direction.

I believe the proposed approach is to dump blob metadata if and only
if you are also dumping blob contents, and to do all of this for data
dumps but not schema dumps. That seems about right to me.

In other words:

<default> -> blob contents and metadata (owner, acl, comments) shall
be dumped
--data-only -> only blob contents shall be dumped
--schema-only -> neither blob contents nor metadata are dumped.

Do I understand correctly?

The attached patch stops dumping "BLOB ITEM" sections and the corresponding
metadata when --data-only is specified. In addition, it outputs neither
a "BLOB DATA" nor a "BLOB ITEM" section when --schema-only is specified.

When --data-only is given to pg_dump, it does not construct any DO_BLOB_ITEM
entries in getBlobs(), so none of the metadata (owner, ACLs, comment) is
dumped. It also writes the legacy "BLOBS" section instead of the new
"BLOB DATA" section, to inform pg_restore that this archive does not create
large objects in a "BLOB ITEM" section.
If --schema-only is given, getBlobs() is simply skipped.

When --data-only is given to pg_restore, it skips all the "BLOB ITEM"
sections. Large objects are created in _LoadBlobs() instead, as we have done
until now.
_LoadBlobs() takes a third argument which specifies whether we should create
the large object there or not. Its condition is a bit modified from the
previous patch:

if (strcmp(te->desc, "BLOBS") == 0 || ropt->dataOnly)
                                   ^^^^^^^^^^^^^^^^^
    _LoadBlobs(AH, ropt, true);
else if (strcmp(te->desc, "BLOB DATA") == 0)
    _LoadBlobs(AH, ropt, false);

When --data-only is given to pg_restore, the "BLOB ITEM" sections are
skipped, so we need to create the large objects at the _LoadBlobs() stage,
even if the archive has a "BLOB DATA" section.
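
To make the compat flag concrete, this is a sketch of the open statement
emitted on each path in script mode (OID 16384 is a hypothetical example;
131072 is the value of INV_WRITE):

  -- compat = true: legacy "BLOBS" path, create the object first
  SELECT pg_catalog.lo_open(pg_catalog.lo_create('16384'), 131072);
  -- compat = false: the "BLOB ITEM" entry already created the object
  SELECT pg_catalog.lo_open(16384, 131072);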

In addition, --schema-only kills all the "BLOB ITEM" sections using a
special condition that was added to _tocEntryRequired().

It might be a bit different from what Itagaki-san suggested, because the
"BLOB ITEM" entries are still placed in the SECTION_PRE_DATA section.
However, it minimizes special treatment in the code, and there is no
difference from the end-user's viewpoint.

Or is it necessary to pack them into the SECTION_DATA section anyway?

Thanks,
--
OSS Platform Development Division, NEC
KaiGai Kohei <kaigai@ak.jp.nec.com>

Attachments:

pgsql-fix-pg_dump-blob-privs.4.patchapplication/octect-stream; name=pgsql-fix-pg_dump-blob-privs.4.patchDownload
diff --git a/src/bin/pg_dump/pg_backup_archiver.c b/src/bin/pg_dump/pg_backup_archiver.c
index 24c0fd4..839ff42 100644
--- a/src/bin/pg_dump/pg_backup_archiver.c
+++ b/src/bin/pg_dump/pg_backup_archiver.c
@@ -520,6 +520,7 @@ restore_toc_entry(ArchiveHandle *AH, TocEntry *te,
 				_printTocEntry(AH, te, ropt, true, false);
 
 				if (strcmp(te->desc, "BLOBS") == 0 ||
+					strcmp(te->desc, "BLOB DATA") == 0 ||
 					strcmp(te->desc, "BLOB COMMENTS") == 0)
 				{
 					ahlog(AH, 1, "restoring %s\n", te->desc);
@@ -903,7 +904,7 @@ EndRestoreBlobs(ArchiveHandle *AH)
  * Called by a format handler to initiate restoration of a blob
  */
 void
-StartRestoreBlob(ArchiveHandle *AH, Oid oid, bool drop)
+StartRestoreBlob(ArchiveHandle *AH, Oid oid, bool cleanup, bool compat)
 {
 	Oid			loOid;
 
@@ -914,24 +915,29 @@ StartRestoreBlob(ArchiveHandle *AH, Oid oid, bool drop)
 
 	ahlog(AH, 2, "restoring large object with OID %u\n", oid);
 
-	if (drop)
-		DropBlobIfExists(AH, oid);
+	if (cleanup)
+		CleanupBlobIfExists(AH, oid, compat);
 
 	if (AH->connection)
 	{
-		loOid = lo_create(AH->connection, oid);
-		if (loOid == 0 || loOid != oid)
-			die_horribly(AH, modulename, "could not create large object %u\n",
-						 oid);
-
+		if (compat)
+		{
+			loOid = lo_create(AH->connection, oid);
+			if (loOid == 0 || loOid != oid)
+				die_horribly(AH, modulename, "could not create large object %u\n",
+							 oid);
+		}
 		AH->loFd = lo_open(AH->connection, oid, INV_WRITE);
 		if (AH->loFd == -1)
 			die_horribly(AH, modulename, "could not open large object\n");
 	}
 	else
 	{
-		ahprintf(AH, "SELECT pg_catalog.lo_open(pg_catalog.lo_create('%u'), %d);\n",
-				 oid, INV_WRITE);
+		if (compat)
+			ahprintf(AH, "SELECT pg_catalog.lo_open(pg_catalog.lo_create('%u'), %d);\n", oid, INV_WRITE);
+		else
+			ahprintf(AH, "SELECT pg_catalog.lo_open(%u, %d);\n",
+					 oid, INV_WRITE);
 	}
 
 	AH->writingBlob = 1;
@@ -1940,7 +1946,8 @@ WriteDataChunks(ArchiveHandle *AH)
 			AH->currToc = te;
 			/* printf("Writing data for %d (%x)\n", te->id, te); */
 
-			if (strcmp(te->desc, "BLOBS") == 0)
+			if (strcmp(te->desc, "BLOBS") == 0 ||
+				strcmp(te->desc, "BLOB DATA") == 0)
 			{
 				startPtr = AH->StartBlobsPtr;
 				endPtr = AH->EndBlobsPtr;
@@ -2077,6 +2084,7 @@ ReadToc(ArchiveHandle *AH)
 				te->section = SECTION_NONE;
 			else if (strcmp(te->desc, "TABLE DATA") == 0 ||
 					 strcmp(te->desc, "BLOBS") == 0 ||
+					 strcmp(te->desc, "BLOB DATA") == 0 ||
 					 strcmp(te->desc, "BLOB COMMENTS") == 0)
 				te->section = SECTION_DATA;
 			else if (strcmp(te->desc, "CONSTRAINT") == 0 ||
@@ -2235,6 +2243,10 @@ _tocEntryRequired(TocEntry *te, RestoreOptions *ropt, bool include_acls)
 	if (!ropt->create && strcmp(te->desc, "DATABASE") == 0)
 		return 0;
 
+	/* Do nothing for "BLOB ITEM" sections when --schema-only is given */
+	if (strcmp(te->desc, "BLOB ITEM") == 0 && ropt->schemaOnly)
+		return 0;
+
 	/* Check options for selective dump/restore */
 	if (ropt->schemaNames)
 	{
@@ -2713,6 +2725,13 @@ _getObjectDescription(PQExpBuffer buf, TocEntry *te, ArchiveHandle *AH)
 		return;
 	}
 
+	/* Use ALTER LARGE OBJECT for BLOB ITEM */
+	if (strcmp(type, "BLOB ITEM") == 0)
+	{
+		appendPQExpBuffer(buf, "LARGE OBJECT %s", te->tag);
+		return;
+	}
+
 	write_msg(modulename, "WARNING: don't know how to set owner for object type %s\n",
 			  type);
 }
@@ -2824,6 +2843,7 @@ _printTocEntry(ArchiveHandle *AH, TocEntry *te, RestoreOptions *ropt, bool isDat
 		strlen(te->owner) > 0 && strlen(te->dropStmt) > 0)
 	{
 		if (strcmp(te->desc, "AGGREGATE") == 0 ||
+			strcmp(te->desc, "BLOB ITEM") == 0 ||
 			strcmp(te->desc, "CONVERSION") == 0 ||
 			strcmp(te->desc, "DATABASE") == 0 ||
 			strcmp(te->desc, "DOMAIN") == 0 ||
diff --git a/src/bin/pg_dump/pg_backup_archiver.h b/src/bin/pg_dump/pg_backup_archiver.h
index c09cec5..acb1986 100644
--- a/src/bin/pg_dump/pg_backup_archiver.h
+++ b/src/bin/pg_dump/pg_backup_archiver.h
@@ -359,7 +359,7 @@ int			ReadOffset(ArchiveHandle *, pgoff_t *);
 size_t		WriteOffset(ArchiveHandle *, pgoff_t, int);
 
 extern void StartRestoreBlobs(ArchiveHandle *AH);
-extern void StartRestoreBlob(ArchiveHandle *AH, Oid oid, bool drop);
+extern void StartRestoreBlob(ArchiveHandle *AH, Oid oid, bool cleanup, bool compat);
 extern void EndRestoreBlob(ArchiveHandle *AH, Oid oid);
 extern void EndRestoreBlobs(ArchiveHandle *AH);
 
@@ -371,7 +371,7 @@ extern void InitArchiveFmt_Tar(ArchiveHandle *AH);
 extern bool isValidTarHeader(char *header);
 
 extern int	ReconnectToServer(ArchiveHandle *AH, const char *dbname, const char *newUser);
-extern void	DropBlobIfExists(ArchiveHandle *AH, Oid oid);
+extern void	CleanupBlobIfExists(ArchiveHandle *AH, Oid oid, bool compat);
 
 int			ahwrite(const void *ptr, size_t size, size_t nmemb, ArchiveHandle *AH);
 int			ahprintf(ArchiveHandle *AH, const char *fmt,...) __attribute__((format(printf, 2, 3)));
diff --git a/src/bin/pg_dump/pg_backup_custom.c b/src/bin/pg_dump/pg_backup_custom.c
index ea16c0b..c001cde 100644
--- a/src/bin/pg_dump/pg_backup_custom.c
+++ b/src/bin/pg_dump/pg_backup_custom.c
@@ -54,7 +54,7 @@ static void _StartBlobs(ArchiveHandle *AH, TocEntry *te);
 static void _StartBlob(ArchiveHandle *AH, TocEntry *te, Oid oid);
 static void _EndBlob(ArchiveHandle *AH, TocEntry *te, Oid oid);
 static void _EndBlobs(ArchiveHandle *AH, TocEntry *te);
-static void _LoadBlobs(ArchiveHandle *AH, bool drop);
+static void _LoadBlobs(ArchiveHandle *AH, bool cleanup, bool compat);
 static void _Clone(ArchiveHandle *AH);
 static void _DeClone(ArchiveHandle *AH);
 
@@ -498,7 +498,10 @@ _PrintTocData(ArchiveHandle *AH, TocEntry *te, RestoreOptions *ropt)
 			break;
 
 		case BLK_BLOBS:
-			_LoadBlobs(AH, ropt->dropSchema);
+			if (strcmp(te->desc, "BLOBS") == 0 || ropt->dataOnly)
+				_LoadBlobs(AH, ropt->dropSchema, true);
+			else
+				_LoadBlobs(AH, ropt->dropSchema, false);
 			break;
 
 		default:				/* Always have a default */
@@ -619,7 +622,7 @@ _PrintData(ArchiveHandle *AH)
 }
 
 static void
-_LoadBlobs(ArchiveHandle *AH, bool drop)
+_LoadBlobs(ArchiveHandle *AH, bool cleanup, bool compat)
 {
 	Oid			oid;
 
@@ -628,7 +631,7 @@ _LoadBlobs(ArchiveHandle *AH, bool drop)
 	oid = ReadInt(AH);
 	while (oid != 0)
 	{
-		StartRestoreBlob(AH, oid, drop);
+		StartRestoreBlob(AH, oid, cleanup, compat);
 		_PrintData(AH);
 		EndRestoreBlob(AH, oid);
 		oid = ReadInt(AH);
diff --git a/src/bin/pg_dump/pg_backup_db.c b/src/bin/pg_dump/pg_backup_db.c
index 6a195a9..b1c7e8d 100644
--- a/src/bin/pg_dump/pg_backup_db.c
+++ b/src/bin/pg_dump/pg_backup_db.c
@@ -12,6 +12,7 @@
 
 #include "pg_backup_db.h"
 #include "dumputils.h"
+#include "libpq/libpq-fs.h"
 
 #include <unistd.h>
 
@@ -653,20 +654,21 @@ CommitTransaction(ArchiveHandle *AH)
 }
 
 void
-DropBlobIfExists(ArchiveHandle *AH, Oid oid)
+CleanupBlobIfExists(ArchiveHandle *AH, Oid oid, bool compat)
 {
 	/* Call lo_unlink only if exists to avoid not-found error. */
-	if (PQserverVersion(AH->connection) >= 80500)
-	{
+	if (PQserverVersion(AH->connection) < 90000)
+		die_horribly(AH, NULL,
+					 "could not restore large object into older server");
+
+	if (compat)
 		ahprintf(AH, "SELECT pg_catalog.lo_unlink(oid) "
-					 "FROM pg_catalog.pg_largeobject_metadata "
-					 "WHERE oid = %u;\n", oid);
-	}
+				 "FROM pg_catalog.pg_largeobject_metadata "
+				 "WHERE oid = %u;\n", oid);
 	else
-	{
-		ahprintf(AH, "SELECT CASE WHEN EXISTS(SELECT 1 FROM pg_catalog.pg_largeobject WHERE loid = '%u') THEN pg_catalog.lo_unlink('%u') END;\n",
-				 oid, oid);
-	}
+		ahprintf(AH, "SELECT pg_catalog.lo_truncate(pg_catalog.lo_open(oid, %d), 0) "
+				 "FROM pg_catalog.pg_largeobject_metadata "
+				 "WHERE oid = %u;\n", INV_READ, oid);
 }
 
 static bool
diff --git a/src/bin/pg_dump/pg_backup_files.c b/src/bin/pg_dump/pg_backup_files.c
index 1faac0a..855d3b8 100644
--- a/src/bin/pg_dump/pg_backup_files.c
+++ b/src/bin/pg_dump/pg_backup_files.c
@@ -66,7 +66,7 @@ typedef struct
 } lclTocEntry;
 
 static const char *modulename = gettext_noop("file archiver");
-static void _LoadBlobs(ArchiveHandle *AH, RestoreOptions *ropt);
+static void _LoadBlobs(ArchiveHandle *AH, RestoreOptions *ropt, bool compat);
 static void _getBlobTocEntry(ArchiveHandle *AH, Oid *oid, char *fname);
 
 /*
@@ -329,8 +329,10 @@ _PrintTocData(ArchiveHandle *AH, TocEntry *te, RestoreOptions *ropt)
 	if (!tctx->filename)
 		return;
 
-	if (strcmp(te->desc, "BLOBS") == 0)
-		_LoadBlobs(AH, ropt);
+	if (strcmp(te->desc, "BLOBS") == 0 || ropt->dataOnly)
+		_LoadBlobs(AH, ropt, true);
+	else if (strcmp(te->desc, "BLOB DATA") == 0)
+		_LoadBlobs(AH, ropt, false);
 	else
 		_PrintFileData(AH, tctx->filename, ropt);
 }
@@ -365,7 +367,7 @@ _getBlobTocEntry(ArchiveHandle *AH, Oid *oid, char fname[K_STD_BUF_SIZE])
 }
 
 static void
-_LoadBlobs(ArchiveHandle *AH, RestoreOptions *ropt)
+_LoadBlobs(ArchiveHandle *AH, RestoreOptions *ropt, bool compat)
 {
 	Oid			oid;
 	lclContext *ctx = (lclContext *) AH->formatData;
@@ -382,7 +384,7 @@ _LoadBlobs(ArchiveHandle *AH, RestoreOptions *ropt)
 
 	while (oid != 0)
 	{
-		StartRestoreBlob(AH, oid, ropt->dropSchema);
+		StartRestoreBlob(AH, oid, ropt->dropSchema, compat);
 		_PrintFileData(AH, fname, ropt);
 		EndRestoreBlob(AH, oid);
 		_getBlobTocEntry(AH, &oid, fname);
diff --git a/src/bin/pg_dump/pg_backup_null.c b/src/bin/pg_dump/pg_backup_null.c
index 4217210..2570a84 100644
--- a/src/bin/pg_dump/pg_backup_null.c
+++ b/src/bin/pg_dump/pg_backup_null.c
@@ -147,14 +147,19 @@ _StartBlobs(ArchiveHandle *AH, TocEntry *te)
 static void
 _StartBlob(ArchiveHandle *AH, TocEntry *te, Oid oid)
 {
+	bool	compat = (strcmp(te->desc, "BLOBS") == 0 ? true : false);
+
 	if (oid == 0)
 		die_horribly(AH, NULL, "invalid OID for large object\n");
 
 	if (AH->ropt->dropSchema)
-		DropBlobIfExists(AH, oid);
+		CleanupBlobIfExists(AH, oid, compat);
 
-	ahprintf(AH, "SELECT pg_catalog.lo_open(pg_catalog.lo_create('%u'), %d);\n",
-			 oid, INV_WRITE);
+	if (compat)
+		ahprintf(AH, "SELECT pg_catalog.lo_open(pg_catalog.lo_create('%u'), %d);\n",
+				 oid, INV_WRITE);
+	else
+		ahprintf(AH, "SELECT pg_catalog.lo_open(%u, %d);\n", oid, INV_WRITE);
 
 	AH->WriteDataPtr = _WriteBlobData;
 }
@@ -195,12 +200,14 @@ _PrintTocData(ArchiveHandle *AH, TocEntry *te, RestoreOptions *ropt)
 	{
 		AH->currToc = te;
 
-		if (strcmp(te->desc, "BLOBS") == 0)
+		if (strcmp(te->desc, "BLOBS") == 0 ||
+			strcmp(te->desc, "BLOB DATA") == 0)
 			_StartBlobs(AH, te);
 
 		(*te->dataDumper) ((Archive *) AH, te->dataDumperArg);
 
-		if (strcmp(te->desc, "BLOBS") == 0)
+		if (strcmp(te->desc, "BLOBS") == 0 ||
+			strcmp(te->desc, "BLOB DATA") == 0)
 			_EndBlobs(AH, te);
 
 		AH->currToc = NULL;
diff --git a/src/bin/pg_dump/pg_backup_tar.c b/src/bin/pg_dump/pg_backup_tar.c
index 5cbc365..7f1a2c0 100644
--- a/src/bin/pg_dump/pg_backup_tar.c
+++ b/src/bin/pg_dump/pg_backup_tar.c
@@ -100,7 +100,7 @@ typedef struct
 
 static const char *modulename = gettext_noop("tar archiver");
 
-static void _LoadBlobs(ArchiveHandle *AH, RestoreOptions *ropt);
+static void _LoadBlobs(ArchiveHandle *AH, RestoreOptions *ropt, bool compat);
 
 static TAR_MEMBER *tarOpen(ArchiveHandle *AH, const char *filename, char mode);
 static void tarClose(ArchiveHandle *AH, TAR_MEMBER *TH);
@@ -695,14 +695,16 @@ _PrintTocData(ArchiveHandle *AH, TocEntry *te, RestoreOptions *ropt)
 		return;
 	}
 
-	if (strcmp(te->desc, "BLOBS") == 0)
-		_LoadBlobs(AH, ropt);
+	if (strcmp(te->desc, "BLOBS") == 0 || ropt->dataOnly)
+		_LoadBlobs(AH, ropt, true);
+	else if (strcmp(te->desc, "BLOB DATA") == 0)
+		_LoadBlobs(AH, ropt, false);
 	else
 		_PrintFileData(AH, tctx->filename, ropt);
 }
 
 static void
-_LoadBlobs(ArchiveHandle *AH, RestoreOptions *ropt)
+_LoadBlobs(ArchiveHandle *AH, RestoreOptions *ropt, bool compat)
 {
 	Oid			oid;
 	lclContext *ctx = (lclContext *) AH->formatData;
@@ -725,7 +727,7 @@ _LoadBlobs(ArchiveHandle *AH, RestoreOptions *ropt)
 			{
 				ahlog(AH, 1, "restoring large object OID %u\n", oid);
 
-				StartRestoreBlob(AH, oid, ropt->dropSchema);
+				StartRestoreBlob(AH, oid, ropt->dropSchema, compat);
 
 				while ((cnt = tarRead(buf, 4095, th)) > 0)
 				{
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 2db9e0f..513f110 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -190,9 +190,9 @@ static void selectSourceSchema(const char *schemaName);
 static char *getFormattedTypeName(Oid oid, OidOptions opts);
 static char *myFormatType(const char *typname, int32 typmod);
 static const char *fmtQualifiedId(const char *schema, const char *id);
-static bool hasBlobs(Archive *AH);
-static int	dumpBlobs(Archive *AH, void *arg);
-static int	dumpBlobComments(Archive *AH, void *arg);
+static void getBlobs(Archive *AH);
+static void dumpBlobItem(Archive *AH, BlobInfo *binfo);
+static int  dumpBlobData(Archive *AH, void *arg);
 static void dumpDatabase(Archive *AH);
 static void dumpEncoding(Archive *AH);
 static void dumpStdStrings(Archive *AH);
@@ -701,25 +701,8 @@ main(int argc, char **argv)
 			getTableDataFKConstraints();
 	}
 
-	if (outputBlobs && hasBlobs(g_fout))
-	{
-		/* Add placeholders to allow correct sorting of blobs */
-		DumpableObject *blobobj;
-		DumpableObject *blobcobj;
-
-		blobobj = (DumpableObject *) malloc(sizeof(DumpableObject));
-		blobobj->objType = DO_BLOBS;
-		blobobj->catId = nilCatalogId;
-		AssignDumpId(blobobj);
-		blobobj->name = strdup("BLOBS");
-
-		blobcobj = (DumpableObject *) malloc(sizeof(DumpableObject));
-		blobcobj->objType = DO_BLOB_COMMENTS;
-		blobcobj->catId = nilCatalogId;
-		AssignDumpId(blobcobj);
-		blobcobj->name = strdup("BLOB COMMENTS");
-		addObjectDependency(blobcobj, blobobj->dumpId);
-	}
+	if (outputBlobs)
+		getBlobs(g_fout);
 
 	/*
 	 * Collect dependency data to assist in ordering the objects.
@@ -1938,43 +1921,144 @@ dumpStdStrings(Archive *AH)
 
 
 /*
- * hasBlobs:
+ * getBlobs:
  *	Test whether database contains any large objects
  */
-static bool
-hasBlobs(Archive *AH)
+static void
+getBlobs(Archive *AH)
 {
-	bool		result;
-	const char *blobQry;
-	PGresult   *res;
+	PQExpBuffer		blobQry = createPQExpBuffer();
+	BlobInfo	   *blobobj;
+	DumpableObject *blobdata;
+	PGresult	   *res;
+	int				i;
 
 	/* Make sure we are in proper schema */
 	selectSourceSchema("pg_catalog");
 
 	/* Check for BLOB OIDs */
 	if (AH->remoteVersion >= 80500)
-		blobQry = "SELECT oid FROM pg_largeobject_metadata LIMIT 1";
+		appendPQExpBuffer(blobQry,
+						  "SELECT oid, (%s lomowner), lomacl"
+						  " FROM pg_largeobject_metadata",
+						  username_subquery);
 	else if (AH->remoteVersion >= 70100)
-		blobQry = "SELECT loid FROM pg_largeobject LIMIT 1";
+		appendPQExpBuffer(blobQry,
+						  "SELECT DISTINCT loid, NULL, NULL"
+						  " FROM pg_largeobject");
 	else
-		blobQry = "SELECT oid FROM pg_class WHERE relkind = 'l' LIMIT 1";
+		appendPQExpBuffer(blobQry,
+						  "SELECT DISTINCT oid, NULL, NULL"
+						  " FROM pg_class WHERE relkind = 'l'");
 
-	res = PQexec(g_conn, blobQry);
-	check_sql_result(res, g_conn, blobQry, PGRES_TUPLES_OK);
+	res = PQexec(g_conn, blobQry->data);
+	check_sql_result(res, g_conn, blobQry->data, PGRES_TUPLES_OK);
+
+	/*
+	 * If we have at least one large object, a "BLOB DATA" section
+	 * is also necessary.
+	 */
+	if (PQntuples(res) > 0)
+	{
+		blobdata = (DumpableObject *) malloc(sizeof(DumpableObject));
+		blobdata->objType = DO_BLOB_DATA;
+		blobdata->catId = nilCatalogId;
+		AssignDumpId(blobdata);
+		blobdata->name = strdup("BLOBS");
+	}
 
-	result = PQntuples(res) > 0;
+	/*
+	 * If we don't want to dump the metadata of large objects,
+	 * there is no need to create "BLOB ITEM" sections.
+	 */
+	if (dataOnly)
+	{
+		PQclear(res);
+		return;
+	}
+
+	/*
+	 * Now each large object has its own "BLOB ITEM" section to
+	 * declare itself.
+	 */
+	for (i = 0; i < PQntuples(res); i++)
+	{
+		blobobj = (BlobInfo *) malloc(sizeof(BlobInfo));
+		blobobj->dobj.objType = DO_BLOB_ITEM;
+		blobobj->dobj.catId.oid = atooid(PQgetvalue(res, i, 0));
+		blobobj->dobj.catId.tableoid = LargeObjectRelationId;
+		AssignDumpId(&blobobj->dobj);
+
+		blobobj->dobj.name = strdup(PQgetvalue(res, i, 0));
+		blobobj->rolname = strdup(PQgetvalue(res, i, 1));
+		blobobj->blobacl = strdup(PQgetvalue(res, i, 2));
+	}
 
 	PQclear(res);
+}
 
-	return result;
+/*
+ * dumpBlobItem
+ *
+ * dump a definition of the given large object
+ */
+static void
+dumpBlobItem(Archive *AH, BlobInfo *binfo)
+{
+	PQExpBuffer		bquery;
+	PQExpBuffer		dquery;
+	PQExpBuffer		temp;
+
+	/* Skip if not to be dumped */
+	if (!binfo->dobj.dump || dataOnly)
+		return;
+
+	bquery = createPQExpBuffer();
+	dquery = createPQExpBuffer();
+	temp = createPQExpBuffer();
+
+	/*
+	 * Create an empty large object
+	 */
+	appendPQExpBuffer(bquery, "SELECT lo_create(%s);\n", binfo->dobj.name);
+	appendPQExpBuffer(dquery, "SELECT lo_unlink(%s);\n", binfo->dobj.name);
+
+	ArchiveEntry(AH, binfo->dobj.catId, binfo->dobj.dumpId,
+				 binfo->dobj.name,
+				 NULL, NULL,
+				 binfo->rolname, false,
+				 "BLOB ITEM", SECTION_PRE_DATA,
+				 bquery->data, dquery->data, NULL,
+				 binfo->dobj.dependencies, binfo->dobj.nDeps,
+				 NULL, NULL);
+
+	/*
+	 * Create a comment on large object, if necessary
+	 */
+	appendPQExpBuffer(temp, "LARGE OBJECT %s", binfo->dobj.name);
+	dumpComment(AH, temp->data, NULL, binfo->rolname,
+				binfo->dobj.catId, 0, binfo->dobj.dumpId);
+
+	/*
+	 * Dump access privileges, if necessary
+	 */
+	dumpACL(AH, binfo->dobj.catId, binfo->dobj.dumpId,
+			"LARGE OBJECT",
+			binfo->dobj.name, NULL,
+			binfo->dobj.name, NULL,
+			binfo->rolname, binfo->blobacl);
+
+	destroyPQExpBuffer(bquery);
+	destroyPQExpBuffer(dquery);
+	destroyPQExpBuffer(temp);
 }
 
 /*
- * dumpBlobs:
- *	dump all blobs
+ * dumpBlobData:
+ *	dump all the data contents of large object
  */
 static int
-dumpBlobs(Archive *AH, void *arg)
+dumpBlobData(Archive *AH, void *arg)
 {
 	const char *blobQry;
 	const char *blobFetchQry;
@@ -2022,7 +2106,7 @@ dumpBlobs(Archive *AH, void *arg)
 			loFd = lo_open(g_conn, blobOid, INV_READ);
 			if (loFd == -1)
 			{
-				write_msg(NULL, "dumpBlobs(): could not open large object: %s",
+				write_msg(NULL, "dumpBlobData(): could not open large object: %s",
 						  PQerrorMessage(g_conn));
 				exit_nicely();
 			}
@@ -2035,7 +2119,7 @@ dumpBlobs(Archive *AH, void *arg)
 				cnt = lo_read(g_conn, loFd, buf, LOBBUFSIZE);
 				if (cnt < 0)
 				{
-					write_msg(NULL, "dumpBlobs(): error reading large object: %s",
+					write_msg(NULL, "dumpBlobData(): error reading large object: %s",
 							  PQerrorMessage(g_conn));
 					exit_nicely();
 				}
@@ -2054,134 +2138,6 @@ dumpBlobs(Archive *AH, void *arg)
 	return 1;
 }
 
-/*
- * dumpBlobComments
- *	dump all blob properties.
- *  It has "BLOB COMMENTS" tag due to the historical reason, but note
- *  that it is the routine to dump all the properties of blobs.
- *
- * Since we don't provide any way to be selective about dumping blobs,
- * there's no need to be selective about their comments either.  We put
- * all the comments into one big TOC entry.
- */
-static int
-dumpBlobComments(Archive *AH, void *arg)
-{
-	const char *blobQry;
-	const char *blobFetchQry;
-	PQExpBuffer cmdQry = createPQExpBuffer();
-	PGresult   *res;
-	int			i;
-
-	if (g_verbose)
-		write_msg(NULL, "saving large object properties\n");
-
-	/* Make sure we are in proper schema */
-	selectSourceSchema("pg_catalog");
-
-	/* Cursor to get all BLOB comments */
-	if (AH->remoteVersion >= 80500)
-		blobQry = "DECLARE blobcmt CURSOR FOR SELECT oid, "
-			"obj_description(oid, 'pg_largeobject'), "
-			"pg_get_userbyid(lomowner), lomacl "
-			"FROM pg_largeobject_metadata";
-	else if (AH->remoteVersion >= 70300)
-		blobQry = "DECLARE blobcmt CURSOR FOR SELECT loid, "
-			"obj_description(loid, 'pg_largeobject'), NULL, NULL "
-			"FROM (SELECT DISTINCT loid FROM "
-			"pg_description d JOIN pg_largeobject l ON (objoid = loid) "
-			"WHERE classoid = 'pg_largeobject'::regclass) ss";
-	else if (AH->remoteVersion >= 70200)
-		blobQry = "DECLARE blobcmt CURSOR FOR SELECT loid, "
-			"obj_description(loid, 'pg_largeobject'), NULL, NULL "
-			"FROM (SELECT DISTINCT loid FROM pg_largeobject) ss";
-	else if (AH->remoteVersion >= 70100)
-		blobQry = "DECLARE blobcmt CURSOR FOR SELECT loid, "
-			"obj_description(loid), NULL, NULL "
-			"FROM (SELECT DISTINCT loid FROM pg_largeobject) ss";
-	else
-		blobQry = "DECLARE blobcmt CURSOR FOR SELECT oid, "
-			"	( "
-			"		SELECT description "
-			"		FROM pg_description pd "
-			"		WHERE pd.objoid=pc.oid "
-			"	), NULL, NULL "
-			"FROM pg_class pc WHERE relkind = 'l'";
-
-	res = PQexec(g_conn, blobQry);
-	check_sql_result(res, g_conn, blobQry, PGRES_COMMAND_OK);
-
-	/* Command to fetch from cursor */
-	blobFetchQry = "FETCH 100 IN blobcmt";
-
-	do
-	{
-		PQclear(res);
-
-		/* Do a fetch */
-		res = PQexec(g_conn, blobFetchQry);
-		check_sql_result(res, g_conn, blobFetchQry, PGRES_TUPLES_OK);
-
-		/* Process the tuples, if any */
-		for (i = 0; i < PQntuples(res); i++)
-		{
-			Oid			blobOid = atooid(PQgetvalue(res, i, 0));
-			char	   *lo_comment = PQgetvalue(res, i, 1);
-			char	   *lo_owner = PQgetvalue(res, i, 2);
-			char	   *lo_acl = PQgetvalue(res, i, 3);
-			char		lo_name[32];
-
-			resetPQExpBuffer(cmdQry);
-
-			/* comment on the blob */
-			if (!PQgetisnull(res, i, 1))
-			{
-				appendPQExpBuffer(cmdQry,
-								  "COMMENT ON LARGE OBJECT %u IS ", blobOid);
-				appendStringLiteralAH(cmdQry, lo_comment, AH);
-				appendPQExpBuffer(cmdQry, ";\n");
-			}
-
-			/* dump blob ownership, if necessary */
-			if (!PQgetisnull(res, i, 2))
-			{
-				appendPQExpBuffer(cmdQry,
-								  "ALTER LARGE OBJECT %u OWNER TO %s;\n",
-								  blobOid, lo_owner);
-			}
-
-			/* dump blob privileges, if necessary */
-			if (!PQgetisnull(res, i, 3) &&
-				!dataOnly && !aclsSkip)
-			{
-				snprintf(lo_name, sizeof(lo_name), "%u", blobOid);
-				if (!buildACLCommands(lo_name, NULL, "LARGE OBJECT",
-									  lo_acl, lo_owner, "",
-									  AH->remoteVersion, cmdQry))
-				{
-					write_msg(NULL, "could not parse ACL (%s) for "
-							  "large object %u", lo_acl, blobOid);
-					exit_nicely();
-				}
-			}
-
-			if (cmdQry->len > 0)
-			{
-				appendPQExpBuffer(cmdQry, "\n");
-				archputs(cmdQry->data, AH);
-			}
-		}
-	} while (PQntuples(res) > 0);
-
-	PQclear(res);
-
-	archputs("\n", AH);
-
-	destroyPQExpBuffer(cmdQry);
-
-	return 1;
-}
-
 static void
 binary_upgrade_set_type_oids_by_type_oid(PQExpBuffer upgrade_buffer,
 											   Oid pg_type_oid)
@@ -6524,21 +6480,24 @@ dumpDumpableObject(Archive *fout, DumpableObject *dobj)
 		case DO_DEFAULT_ACL:
 			dumpDefaultACL(fout, (DefaultACLInfo *) dobj);
 			break;
-		case DO_BLOBS:
-			ArchiveEntry(fout, dobj->catId, dobj->dumpId,
-						 dobj->name, NULL, NULL, "",
-						 false, "BLOBS", SECTION_DATA,
-						 "", "", NULL,
-						 dobj->dependencies, dobj->nDeps,
-						 dumpBlobs, NULL);
+		case DO_BLOB_ITEM:
+			dumpBlobItem(fout, (BlobInfo *) dobj);
 			break;
-		case DO_BLOB_COMMENTS:
+		case DO_BLOB_DATA:
+			/*
+			 * If --data-only is given, pg_dump skips DO_BLOB_ITEM entries,
+			 * because those entries carry all the metadata of large objects
+			 * (owner, access privileges and comments).
+			 * In this case, we mark the DO_BLOB_DATA entry as a legacy "BLOBS"
+			 * section to ensure pg_restore injects large object creation
+			 * just before data loading.
+			 */
 			ArchiveEntry(fout, dobj->catId, dobj->dumpId,
-						 dobj->name, NULL, NULL, "",
-						 false, "BLOB COMMENTS", SECTION_DATA,
-						 "", "", NULL,
+						 dobj->name, NULL, NULL, "", false,
+						 !dataOnly ? "BLOB DATA" : "BLOBS",
+						 SECTION_DATA, "", "", NULL,
 						 dobj->dependencies, dobj->nDeps,
-						 dumpBlobComments, NULL);
+						 dumpBlobData, NULL);
 			break;
 	}
 }
diff --git a/src/bin/pg_dump/pg_dump.h b/src/bin/pg_dump/pg_dump.h
index 1e65fac..ccf7348 100644
--- a/src/bin/pg_dump/pg_dump.h
+++ b/src/bin/pg_dump/pg_dump.h
@@ -115,8 +115,8 @@ typedef enum
 	DO_FDW,
 	DO_FOREIGN_SERVER,
 	DO_DEFAULT_ACL,
-	DO_BLOBS,
-	DO_BLOB_COMMENTS
+	DO_BLOB_DATA,
+	DO_BLOB_ITEM
 } DumpableObjectType;
 
 typedef struct _dumpableObject
@@ -443,6 +443,13 @@ typedef struct _defaultACLInfo
 	char	   *defaclacl;
 } DefaultACLInfo;
 
+typedef struct _blobInfo
+{
+	DumpableObject	dobj;
+	char	   *rolname;
+	char	   *blobacl;
+} BlobInfo;
+
 /* global decls */
 extern bool force_quotes;		/* double-quotes for identifiers flag */
 extern bool g_verbose;			/* verbose flag */
diff --git a/src/bin/pg_dump/pg_dump_sort.c b/src/bin/pg_dump/pg_dump_sort.c
index 6676baf..be98c81 100644
--- a/src/bin/pg_dump/pg_dump_sort.c
+++ b/src/bin/pg_dump/pg_dump_sort.c
@@ -92,8 +92,8 @@ static const int newObjectTypePriority[] =
 	14,							/* DO_FDW */
 	15,							/* DO_FOREIGN_SERVER */
 	27,							/* DO_DEFAULT_ACL */
-	20,							/* DO_BLOBS */
-	21							/* DO_BLOB_COMMENTS */
+	21,							/* DO_BLOB_DATA */
+	20,							/* DO_BLOB_ITEM */
 };
 
 
@@ -1146,14 +1146,14 @@ describeDumpableObject(DumpableObject *obj, char *buf, int bufsize)
 					 "DEFAULT ACL %s  (ID %d OID %u)",
 					 obj->name, obj->dumpId, obj->catId.oid);
 			return;
-		case DO_BLOBS:
+		case DO_BLOB_DATA:
 			snprintf(buf, bufsize,
-					 "BLOBS  (ID %d)",
+					 "BLOB DATA  (ID %d)",
 					 obj->dumpId);
 			return;
-		case DO_BLOB_COMMENTS:
+		case DO_BLOB_ITEM:
 			snprintf(buf, bufsize,
-					 "BLOB COMMENTS  (ID %d)",
+					 "BLOB ITEM  (ID %d)",
 					 obj->dumpId);
 			return;
 	}
#109Robert Haas
robertmhaas@gmail.com
In reply to: KaiGai Kohei (#107)
Re: Largeobject Access Controls (r2460)

2010/2/4 KaiGai Kohei <kaigai@ak.jp.nec.com>:

(2010/02/04 0:20), Robert Haas wrote:

2010/2/1 KaiGai Kohei<kaigai@ak.jp.nec.com>:

I again wonder whether we are going in the right direction.

I believe the proposed approach is to dump blob metadata if and only
if you are also dumping blob contents, and to do all of this for data
dumps but not schema dumps.  That seems about right to me.

In other words:

 <default>     -> blob contents and metadata (owner, acl, comments) shall
                 be dumped
 --data-only   -> only blob contents shall be dumped
 --schema-only -> neither blob contents nor metadata are dumped.

Do I understand correctly?

No, that's not what I said. Please reread. I don't think you should
ever dump blob contents without the metadata, or the other way around.

...Robert

#110Alvaro Herrera
alvherre@commandprompt.com
In reply to: Robert Haas (#109)
Re: Largeobject Access Controls (r2460)

Robert Haas wrote:

2010/2/4 KaiGai Kohei <kaigai@ak.jp.nec.com>:

(2010/02/04 0:20), Robert Haas wrote:

2010/2/1 KaiGai Kohei<kaigai@ak.jp.nec.com>:

I again wonder whether we are going in the right direction.

I believe the proposed approach is to dump blob metadata if and only
if you are also dumping blob contents, and to do all of this for data
dumps but not schema dumps. That seems about right to me.

In other words:

 <default>     -> blob contents and metadata (owner, acl, comments) shall
                  be dumped
 --data-only   -> only blob contents shall be dumped
 --schema-only -> neither blob contents nor metadata are dumped.

Do I understand correctly?

No, that's not what I said. Please reread. I don't think you should
ever dump blob contents without the metadata, or the other way around.

So:
default: both contents and metadata
--data-only: same
--schema-only: neither

Seems reasonable.

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

#111KaiGai Kohei
kaigai@kaigai.gr.jp
In reply to: Alvaro Herrera (#110)
Re: Largeobject Access Controls (r2460)

(2010/02/05 3:27), Alvaro Herrera wrote:

Robert Haas wrote:

2010/2/4 KaiGai Kohei<kaigai@ak.jp.nec.com>:

(2010/02/04 0:20), Robert Haas wrote:

2010/2/1 KaiGai Kohei<kaigai@ak.jp.nec.com>:

I again wonder whether we are going in the right direction.

I believe the proposed approach is to dump blob metadata if and only
if you are also dumping blob contents, and to do all of this for data
dumps but not schema dumps. That seems about right to me.

In other words:

<default> -> blob contents and metadata (owner, acl, comments) shall
be dumped
--data-only -> only blob contents shall be dumped
--schema-only -> neither blob contents nor metadata are dumped.

Do I understand correctly?

No, that's not what I said. Please reread. I don't think you should
ever dump blob contents without the metadata, or the other way around.

So:
default: both contents and metadata
--data-only: same
--schema-only: neither

Seems reasonable.

OK... I'll try to update the patch, anyway.

However, it means the large object becomes an exceptional object class
that dumps its owner, ACL and comment even if --data-only is given.
Is that really what you suggested?

Thanks,
--
KaiGai Kohei <kaigai@kaigai.gr.jp>

#112Takahiro Itagaki
itagaki.takahiro@oss.ntt.co.jp
In reply to: KaiGai Kohei (#111)
Re: Largeobject Access Controls (r2460)

KaiGai Kohei <kaigai@kaigai.gr.jp> wrote:

default: both contents and metadata
--data-only: same
--schema-only: neither

However, it means the large object becomes an exceptional object class
that dumps its owner, ACL and comment even if --data-only is given.
Is that really what you suggested?

I wonder whether we still need both "BLOB ITEM" and "BLOB DATA"
even if we take the all-or-nothing behavior. Can we handle a BLOB's
owner, ACL, comment and data with one entry kind?

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center

#113KaiGai Kohei
kaigai@ak.jp.nec.com
In reply to: Takahiro Itagaki (#112)
Re: Largeobject Access Controls (r2460)

(2010/02/05 13:53), Takahiro Itagaki wrote:

KaiGai Kohei<kaigai@kaigai.gr.jp> wrote:

default: both contents and metadata
--data-only: same
--schema-only: neither

However, it means the large object becomes an exceptional object class
that dumps its owner, ACL and comment even if --data-only is given.
Is that really what you suggested?

I wonder whether we still need both "BLOB ITEM" and "BLOB DATA"
even if we take the all-or-nothing behavior. Can we handle a BLOB's
owner, ACL, comment and data with one entry kind?

Is it possible to fetch a particular blob from a tar/custom archive
when pg_restore finds a TOC entry for that blob?

Currently, when pg_restore finds a "BLOB DATA" or "BLOBS" entry,
it opens the archive and restores all the blob objects sequentially.
It seems to me we would also have to rework the custom format...

Thanks,
--
OSS Platform Development Division, NEC
KaiGai Kohei <kaigai@ak.jp.nec.com>

#114KaiGai Kohei
kaigai@ak.jp.nec.com
In reply to: Takahiro Itagaki (#112)
Re: Largeobject Access Controls (r2460)

(2010/02/05 13:53), Takahiro Itagaki wrote:

KaiGai Kohei<kaigai@kaigai.gr.jp> wrote:

default: both contents and metadata
--data-only: same
--schema-only: neither

However, it means the large object becomes an exceptional object class
that dumps its owner, ACL and comment even if --data-only is given.
Is that really what you suggested?

I wonder whether we still need both "BLOB ITEM" and "BLOB DATA"
even if we take the all-or-nothing behavior. Can we handle a BLOB's
owner, ACL, comment and data with one entry kind?

I looked at the corresponding code.

Currently, we have three _LoadBlobs() variations in pg_backup_tar.c,
pg_backup_files.c and pg_backup_custom.c.

In the _tar.c and _files.c cases, we can reasonably fetch the data contents
of the blob to be restored. All we need to do is provide an explicit
filename to the tarOpen() function, so a blob does not have to be
restored sequentially.
It means pg_restore can restore an arbitrary blob file when it finds a new
unified blob entry.

In the _custom.c case, _LoadBlobs() is called from _PrintTocData()
when the given TocEntry is "BLOBS", and it tries to load the multiple
blobs that follow. However, I could not find any restriction saying that the
custom format cannot have multiple "BLOBS" sections. In other words, we can
write out multiple such sections, one per new unified blob entry.

Right now, it seems to me that it is feasible to implement what you suggested.

The question is whether we should do it or not.
At least, from a design perspective, it seems better to me than some of the
exceptional treatments in pg_dump and pg_restore.

What is your opinion?

Thanks,
--
OSS Platform Development Division, NEC
KaiGai Kohei <kaigai@ak.jp.nec.com>

#115KaiGai Kohei
kaigai@ak.jp.nec.com
In reply to: Takahiro Itagaki (#112)
1 attachment(s)
Re: Largeobject Access Controls (r2460)

(2010/02/05 13:53), Takahiro Itagaki wrote:

KaiGai Kohei<kaigai@kaigai.gr.jp> wrote:

default: both contents and metadata
--data-only: same
--schema-only: neither

However, it means the large object becomes an exceptional object class
that dumps its owner, ACL and comment even if --data-only is given.
Is that really what you suggested?

I wonder whether we still need both "BLOB ITEM" and "BLOB DATA"
even if we take the all-or-nothing behavior. Can we handle a BLOB's
owner, ACL, comment and data with one entry kind?

The attached patch is a refactored version, following the suggestion.

In the default and --data-only modes, it dumps the data contents of large
objects and their properties (owner, comment and access privileges), but it
dumps nothing when --schema-only is given.

default: both contents and metadata
--data-only: same
--schema-only: neither

It replaces the existing "BLOBS" and "BLOB COMMENTS" sections with a new
"LARGE OBJECT" section, each instance of which is associated with a single
large object. The section header contains the OID of the large object to be
restored, so pg_restore tries to load that particular large object from the
given archive.

The _PrintTocData() handlers were modified to support the "LARGE OBJECT"
section, which loads only the specified large object rather than all of the
archived ones as "BLOBS" does. They still support reading "BLOBS" and
"BLOB COMMENTS" sections, but these legacy sections are never written out
any more.
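
As a rough sketch, the non-compat restore path in this patch would emit
something like the following in script mode (OID 16384 is a hypothetical
example; 262144 and 131072 are the values of INV_READ and INV_WRITE):

  -- cleanup path: truncate the existing large object rather than
  -- dropping and re-creating it
  SELECT pg_catalog.lo_truncate(pg_catalog.lo_open(oid, 262144), 0)
    FROM pg_catalog.pg_largeobject_metadata WHERE oid = 16384;
  -- then open it for writing; the data contents follow
  SELECT pg_catalog.lo_open('16384', 131072);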

The archive file never contains "blobs.toc" any more, because the OID of the
large object to be restored can be found in the section header, without any
special-purpose file. This also allows the _StartBlobs() and _EndBlobs()
methods to be omitted in the tar and files formats.

Basically, I like this approach more than the previous combination of
"BLOB ITEM" and "BLOB DATA".

However, we have a known issue here.
ACL sections are categorized as REQ_SCHEMA in _tocEntryRequired(), so we
cannot dump them when --data-only is given, even if the large object itself
is dumped. Of course, we could solve it by adding a few more exceptional
treatments, although that would not be graceful.
It seems to me the root of the matter is that _tocEntryRequired() can only
return a mask of REQ_SCHEMA and REQ_DATA; right now, it is not easy to
categorize the ACL/COMMENT sections into either of them.
I think we should consider adding REQ_ACL and REQ_COMMENT to inform the
caller whether the section at hand is to be dumped now, or not.

Any idea?

Thanks,
--
OSS Platform Development Division, NEC
KaiGai Kohei <kaigai@ak.jp.nec.com>

Attachments:

pgsql-fix-pg_dump-blob-privs.5.patchapplication/octect-stream; name=pgsql-fix-pg_dump-blob-privs.5.patchDownload
*** a/src/bin/pg_dump/pg_backup_archiver.c
--- b/src/bin/pg_dump/pg_backup_archiver.c
***************
*** 329,335 **** RestoreArchive(Archive *AHX, RestoreOptions *ropt)
  			AH->currentTE = te;
  
  			reqs = _tocEntryRequired(te, ropt, false /* needn't drop ACLs */ );
! 			if (((reqs & REQ_SCHEMA) != 0) && te->dropStmt)
  			{
  				/* We want the schema */
  				ahlog(AH, 1, "dropping %s %s\n", te->desc, te->tag);
--- 329,336 ----
  			AH->currentTE = te;
  
  			reqs = _tocEntryRequired(te, ropt, false /* needn't drop ACLs */ );
! 			if (((reqs & REQ_SCHEMA) != 0 ||
! 				 strcmp(te->desc, "LARGE OBJECT") == 0) && te->dropStmt)
  			{
  				/* We want the schema */
  				ahlog(AH, 1, "dropping %s %s\n", te->desc, te->tag);
***************
*** 448,454 **** restore_toc_entry(ArchiveHandle *AH, TocEntry *te,
  
  	defnDumped = false;
  
! 	if ((reqs & REQ_SCHEMA) != 0)		/* We want the schema */
  	{
  		ahlog(AH, 1, "creating %s %s\n", te->desc, te->tag);
  
--- 449,456 ----
  
  	defnDumped = false;
  
! 	if ((reqs & REQ_SCHEMA) != 0 &&				/* We want the schema */
! 		strcmp(te->desc, "LARGE OBJECT") != 0)	/* but not a large object */
  	{
  		ahlog(AH, 1, "creating %s %s\n", te->desc, te->tag);
  
***************
*** 519,525 **** restore_toc_entry(ArchiveHandle *AH, TocEntry *te,
  			{
  				_printTocEntry(AH, te, ropt, true, false);
  
! 				if (strcmp(te->desc, "BLOBS") == 0 ||
  					strcmp(te->desc, "BLOB COMMENTS") == 0)
  				{
  					ahlog(AH, 1, "restoring %s\n", te->desc);
--- 521,528 ----
  			{
  				_printTocEntry(AH, te, ropt, true, false);
  
! 				if (strcmp(te->desc, "LARGE OBJECT") == 0 ||
! 					strcmp(te->desc, "BLOBS") == 0 ||
  					strcmp(te->desc, "BLOB COMMENTS") == 0)
  				{
  					ahlog(AH, 1, "restoring %s\n", te->desc);
***************
*** 903,909 **** EndRestoreBlobs(ArchiveHandle *AH)
   * Called by a format handler to initiate restoration of a blob
   */
  void
! StartRestoreBlob(ArchiveHandle *AH, Oid oid, bool drop)
  {
  	Oid			loOid;
  
--- 906,912 ----
   * Called by a format handler to initiate restoration of a blob
   */
  void
! StartRestoreBlob(ArchiveHandle *AH, Oid oid, bool cleanup, bool compat)
  {
  	Oid			loOid;
  
***************
*** 914,938 **** StartRestoreBlob(ArchiveHandle *AH, Oid oid, bool drop)
  
  	ahlog(AH, 2, "restoring large object with OID %u\n", oid);
  
! 	if (drop)
! 		DropBlobIfExists(AH, oid);
  
  	if (AH->connection)
  	{
! 		loOid = lo_create(AH->connection, oid);
! 		if (loOid == 0 || loOid != oid)
! 			die_horribly(AH, modulename, "could not create large object %u\n",
! 						 oid);
! 
  		AH->loFd = lo_open(AH->connection, oid, INV_WRITE);
  		if (AH->loFd == -1)
  			die_horribly(AH, modulename, "could not open large object\n");
  	}
! 	else
  	{
  		ahprintf(AH, "SELECT pg_catalog.lo_open(pg_catalog.lo_create('%u'), %d);\n",
  				 oid, INV_WRITE);
  	}
  
  	AH->writingBlob = 1;
  }
--- 917,947 ----
  
  	ahlog(AH, 2, "restoring large object with OID %u\n", oid);
  
! 	if (cleanup)
! 		CleanupBlobIfExists(AH, oid, compat);
  
  	if (AH->connection)
  	{
! 		if (compat)
! 		{
! 			loOid = lo_create(AH->connection, oid);
! 			if (loOid == 0 || loOid != oid)
! 				die_horribly(AH, modulename, "could not create large object %u\n", oid);
! 		}
  		AH->loFd = lo_open(AH->connection, oid, INV_WRITE);
  		if (AH->loFd == -1)
  			die_horribly(AH, modulename, "could not open large object\n");
  	}
! 	else if (compat)
  	{
  		ahprintf(AH, "SELECT pg_catalog.lo_open(pg_catalog.lo_create('%u'), %d);\n",
  				 oid, INV_WRITE);
  	}
+ 	else
+ 	{
+ 		ahprintf(AH, "SELECT pg_catalog.lo_open('%u', %d);\n",
+ 				 oid, INV_WRITE);
+ 	}
  
  	AH->writingBlob = 1;
  }
***************
*** 1940,1946 **** WriteDataChunks(ArchiveHandle *AH)
  			AH->currToc = te;
  			/* printf("Writing data for %d (%x)\n", te->id, te); */
  
! 			if (strcmp(te->desc, "BLOBS") == 0)
  			{
  				startPtr = AH->StartBlobsPtr;
  				endPtr = AH->EndBlobsPtr;
--- 1949,1956 ----
  			AH->currToc = te;
  			/* printf("Writing data for %d (%x)\n", te->id, te); */
  
! 			if (strcmp(te->desc, "BLOBS") == 0 ||
! 				strcmp(te->desc, "LARGE OBJECT") == 0)
  			{
  				startPtr = AH->StartBlobsPtr;
  				endPtr = AH->EndBlobsPtr;
***************
*** 2685,2690 **** _getObjectDescription(PQExpBuffer buf, TocEntry *te, ArchiveHandle *AH)
--- 2695,2707 ----
  		return;
  	}
  
+ 	/* objects named by just an identifier */
+ 	if (strcmp(type, "LARGE OBJECT") == 0)
+ 	{
+ 		appendPQExpBuffer(buf, "%s %s", type, te->tag);
+ 		return;
+ 	}
+ 
  	/*
  	 * These object types require additional decoration.  Fortunately, the
  	 * information needed is exactly what's in the DROP command.
***************
*** 2828,2833 **** _printTocEntry(ArchiveHandle *AH, TocEntry *te, RestoreOptions *ropt, bool isDat
--- 2845,2851 ----
  			strcmp(te->desc, "DATABASE") == 0 ||
  			strcmp(te->desc, "DOMAIN") == 0 ||
  			strcmp(te->desc, "FUNCTION") == 0 ||
+ 			strcmp(te->desc, "LARGE OBJECT") == 0 ||
  			strcmp(te->desc, "OPERATOR") == 0 ||
  			strcmp(te->desc, "OPERATOR CLASS") == 0 ||
  			strcmp(te->desc, "OPERATOR FAMILY") == 0 ||
*** a/src/bin/pg_dump/pg_backup_archiver.h
--- b/src/bin/pg_dump/pg_backup_archiver.h
***************
*** 359,365 **** int			ReadOffset(ArchiveHandle *, pgoff_t *);
  size_t		WriteOffset(ArchiveHandle *, pgoff_t, int);
  
  extern void StartRestoreBlobs(ArchiveHandle *AH);
! extern void StartRestoreBlob(ArchiveHandle *AH, Oid oid, bool drop);
  extern void EndRestoreBlob(ArchiveHandle *AH, Oid oid);
  extern void EndRestoreBlobs(ArchiveHandle *AH);
  
--- 359,365 ----
  size_t		WriteOffset(ArchiveHandle *, pgoff_t, int);
  
  extern void StartRestoreBlobs(ArchiveHandle *AH);
! extern void StartRestoreBlob(ArchiveHandle *AH, Oid oid, bool cleanup, bool compat);
  extern void EndRestoreBlob(ArchiveHandle *AH, Oid oid);
  extern void EndRestoreBlobs(ArchiveHandle *AH);
  
***************
*** 371,377 **** extern void InitArchiveFmt_Tar(ArchiveHandle *AH);
  extern bool isValidTarHeader(char *header);
  
  extern int	ReconnectToServer(ArchiveHandle *AH, const char *dbname, const char *newUser);
! extern void	DropBlobIfExists(ArchiveHandle *AH, Oid oid);
  
  int			ahwrite(const void *ptr, size_t size, size_t nmemb, ArchiveHandle *AH);
  int			ahprintf(ArchiveHandle *AH, const char *fmt,...) __attribute__((format(printf, 2, 3)));
--- 371,377 ----
  extern bool isValidTarHeader(char *header);
  
  extern int	ReconnectToServer(ArchiveHandle *AH, const char *dbname, const char *newUser);
! extern void	CleanupBlobIfExists(ArchiveHandle *AH, Oid oid, bool compat);
  
  int			ahwrite(const void *ptr, size_t size, size_t nmemb, ArchiveHandle *AH);
  int			ahprintf(ArchiveHandle *AH, const char *fmt,...) __attribute__((format(printf, 2, 3)));
*** a/src/bin/pg_dump/pg_backup_custom.c
--- b/src/bin/pg_dump/pg_backup_custom.c
***************
*** 54,60 **** static void _StartBlobs(ArchiveHandle *AH, TocEntry *te);
  static void _StartBlob(ArchiveHandle *AH, TocEntry *te, Oid oid);
  static void _EndBlob(ArchiveHandle *AH, TocEntry *te, Oid oid);
  static void _EndBlobs(ArchiveHandle *AH, TocEntry *te);
! static void _LoadBlobs(ArchiveHandle *AH, bool drop);
  static void _Clone(ArchiveHandle *AH);
  static void _DeClone(ArchiveHandle *AH);
  
--- 54,60 ----
  static void _StartBlob(ArchiveHandle *AH, TocEntry *te, Oid oid);
  static void _EndBlob(ArchiveHandle *AH, TocEntry *te, Oid oid);
  static void _EndBlobs(ArchiveHandle *AH, TocEntry *te);
! static void _LoadBlobs(ArchiveHandle *AH, bool cleanup, bool compat);
  static void _Clone(ArchiveHandle *AH);
  static void _DeClone(ArchiveHandle *AH);
  
***************
*** 498,504 **** _PrintTocData(ArchiveHandle *AH, TocEntry *te, RestoreOptions *ropt)
  			break;
  
  		case BLK_BLOBS:
! 			_LoadBlobs(AH, ropt->dropSchema);
  			break;
  
  		default:				/* Always have a default */
--- 498,505 ----
  			break;
  
  		case BLK_BLOBS:
! 			_LoadBlobs(AH, ropt->dropSchema,
! 					   strcmp(te->desc, "BLOBS") == 0 ? true : false);
  			break;
  
  		default:				/* Always have a default */
***************
*** 619,625 **** _PrintData(ArchiveHandle *AH)
  }
  
  static void
! _LoadBlobs(ArchiveHandle *AH, bool drop)
  {
  	Oid			oid;
  
--- 620,626 ----
  }
  
  static void
! _LoadBlobs(ArchiveHandle *AH, bool cleanup, bool compat)
  {
  	Oid			oid;
  
***************
*** 628,634 **** _LoadBlobs(ArchiveHandle *AH, bool drop)
  	oid = ReadInt(AH);
  	while (oid != 0)
  	{
! 		StartRestoreBlob(AH, oid, drop);
  		_PrintData(AH);
  		EndRestoreBlob(AH, oid);
  		oid = ReadInt(AH);
--- 629,635 ----
  	oid = ReadInt(AH);
  	while (oid != 0)
  	{
! 		StartRestoreBlob(AH, oid, cleanup, compat);
  		_PrintData(AH);
  		EndRestoreBlob(AH, oid);
  		oid = ReadInt(AH);
*** a/src/bin/pg_dump/pg_backup_db.c
--- b/src/bin/pg_dump/pg_backup_db.c
***************
*** 12,17 ****
--- 12,18 ----
  
  #include "pg_backup_db.h"
  #include "dumputils.h"
+ #include "libpq/libpq-fs.h"
  
  #include <unistd.h>
  
***************
*** 653,672 **** CommitTransaction(ArchiveHandle *AH)
  }
  
  void
! DropBlobIfExists(ArchiveHandle *AH, Oid oid)
  {
! 	/* Call lo_unlink only if exists to avoid not-found error. */
! 	if (PQserverVersion(AH->connection) >= 80500)
! 	{
  		ahprintf(AH, "SELECT pg_catalog.lo_unlink(oid) "
! 					 "FROM pg_catalog.pg_largeobject_metadata "
! 					 "WHERE oid = %u;\n", oid);
! 	}
  	else
! 	{
! 		ahprintf(AH, "SELECT CASE WHEN EXISTS(SELECT 1 FROM pg_catalog.pg_largeobject WHERE loid = '%u') THEN pg_catalog.lo_unlink('%u') END;\n",
! 				 oid, oid);
! 	}
  }
  
  static bool
--- 654,674 ----
  }
  
  void
! CleanupBlobIfExists(ArchiveHandle *AH, Oid oid, bool compat)
  {
! 	if (AH->connection &&
! 		PQserverVersion(AH->connection) < 80500)
! 		die_horribly(AH, NULL,
! 					 "could not restore large object into older server");
! 
! 	if (compat)
  		ahprintf(AH, "SELECT pg_catalog.lo_unlink(oid) "
! 				 "FROM pg_catalog.pg_largeobject_metadata "
! 				 "WHERE oid = %u;\n", oid);
  	else
! 		ahprintf(AH, "SELECT pg_catalog.lo_truncate(pg_catalog.lo_open(oid, %d), 0) "
! 				 "FROM pg_catalog.pg_largeobject_metadata "
! 				 "WHERE oid = %u;\n", INV_READ, oid);
  }
  
  static bool
*** a/src/bin/pg_dump/pg_backup_files.c
--- b/src/bin/pg_dump/pg_backup_files.c
***************
*** 41,50 **** static void _WriteExtraToc(ArchiveHandle *AH, TocEntry *te);
  static void _ReadExtraToc(ArchiveHandle *AH, TocEntry *te);
  static void _PrintExtraToc(ArchiveHandle *AH, TocEntry *te);
  
- static void _StartBlobs(ArchiveHandle *AH, TocEntry *te);
  static void _StartBlob(ArchiveHandle *AH, TocEntry *te, Oid oid);
  static void _EndBlob(ArchiveHandle *AH, TocEntry *te, Oid oid);
- static void _EndBlobs(ArchiveHandle *AH, TocEntry *te);
  
  #define K_STD_BUF_SIZE 1024
  
--- 41,48 ----
***************
*** 67,72 **** typedef struct
--- 65,72 ----
  
  static const char *modulename = gettext_noop("file archiver");
  static void _LoadBlobs(ArchiveHandle *AH, RestoreOptions *ropt);
+ static void _LoadLargeObject(ArchiveHandle *AH, TocEntry *te,
+ 							 RestoreOptions *ropt);
  static void _getBlobTocEntry(ArchiveHandle *AH, Oid *oid, char *fname);
  
  /*
***************
*** 93,102 **** InitArchiveFmt_Files(ArchiveHandle *AH)
  	AH->WriteExtraTocPtr = _WriteExtraToc;
  	AH->PrintExtraTocPtr = _PrintExtraToc;
  
! 	AH->StartBlobsPtr = _StartBlobs;
  	AH->StartBlobPtr = _StartBlob;
  	AH->EndBlobPtr = _EndBlob;
! 	AH->EndBlobsPtr = _EndBlobs;
  	AH->ClonePtr = NULL;
  	AH->DeClonePtr = NULL;
  
--- 93,102 ----
  	AH->WriteExtraTocPtr = _WriteExtraToc;
  	AH->PrintExtraTocPtr = _PrintExtraToc;
  
! 	AH->StartBlobsPtr = NULL;
  	AH->StartBlobPtr = _StartBlob;
  	AH->EndBlobPtr = _EndBlob;
! 	AH->EndBlobsPtr = NULL;
  	AH->ClonePtr = NULL;
  	AH->DeClonePtr = NULL;
  
***************
*** 331,336 **** _PrintTocData(ArchiveHandle *AH, TocEntry *te, RestoreOptions *ropt)
--- 331,338 ----
  
  	if (strcmp(te->desc, "BLOBS") == 0)
  		_LoadBlobs(AH, ropt);
+ 	else if (strcmp(te->desc, "LARGE OBJECT") == 0)
+ 		_LoadLargeObject(AH, te, ropt);
  	else
  		_PrintFileData(AH, tctx->filename, ropt);
  }
***************
*** 382,388 **** _LoadBlobs(ArchiveHandle *AH, RestoreOptions *ropt)
  
  	while (oid != 0)
  	{
! 		StartRestoreBlob(AH, oid, ropt->dropSchema);
  		_PrintFileData(AH, fname, ropt);
  		EndRestoreBlob(AH, oid);
  		_getBlobTocEntry(AH, &oid, fname);
--- 384,390 ----
  
  	while (oid != 0)
  	{
! 		StartRestoreBlob(AH, oid, ropt->dropSchema, true);
  		_PrintFileData(AH, fname, ropt);
  		EndRestoreBlob(AH, oid);
  		_getBlobTocEntry(AH, &oid, fname);
***************
*** 394,399 **** _LoadBlobs(ArchiveHandle *AH, RestoreOptions *ropt)
--- 396,418 ----
  	EndRestoreBlobs(AH);
  }
  
+ static void
+ _LoadLargeObject(ArchiveHandle *AH, TocEntry *te, RestoreOptions *ropt)
+ {
+ 	Oid			oid = atooid(te->tag);
+ 	char		fname[K_STD_BUF_SIZE];
+ 
+ 	snprintf(fname, sizeof(fname), "blob_%u.dat%s",
+ 			 oid, (AH->compression ? ".gz" : ""));
+ 
+ 	StartRestoreBlobs(AH);
+ 
+ 	StartRestoreBlob(AH, oid, ropt->dropSchema, false);
+ 	_PrintFileData(AH, fname, ropt);
+ 	EndRestoreBlob(AH, oid);
+ 
+ 	EndRestoreBlobs(AH);
+ }
  
  static int
  _WriteByte(ArchiveHandle *AH, const int i)
***************
*** 468,496 **** _CloseArchive(ArchiveHandle *AH)
   */
  
  /*
-  * Called by the archiver when starting to save all BLOB DATA (not schema).
-  * This routine should save whatever format-specific information is needed
-  * to read the BLOBs back into memory.
-  *
-  * It is called just prior to the dumper's DataDumper routine.
-  *
-  * Optional, but strongly recommended.
-  */
- static void
- _StartBlobs(ArchiveHandle *AH, TocEntry *te)
- {
- 	lclContext *ctx = (lclContext *) AH->formatData;
- 	char		fname[K_STD_BUF_SIZE];
- 
- 	sprintf(fname, "blobs.toc");
- 	ctx->blobToc = fopen(fname, PG_BINARY_W);
- 
- 	if (ctx->blobToc == NULL)
- 		die_horribly(AH, modulename,
- 		"could not open large object TOC for output: %s\n", strerror(errno));
- }
- 
- /*
   * Called by the archiver when the dumper calls StartBlob.
   *
   * Mandatory.
--- 487,492 ----
***************
*** 517,524 **** _StartBlob(ArchiveHandle *AH, TocEntry *te, Oid oid)
  	sprintf(fmode, "wb%d", AH->compression);
  	sprintf(fname, "blob_%u.dat%s", oid, sfx);
  
- 	fprintf(ctx->blobToc, "%u %s\n", oid, fname);
- 
  #ifdef HAVE_LIBZ
  	tctx->FH = gzopen(fname, fmode);
  #else
--- 513,518 ----
***************
*** 543,562 **** _EndBlob(ArchiveHandle *AH, TocEntry *te, Oid oid)
  	if (GZCLOSE(tctx->FH) != 0)
  		die_horribly(AH, modulename, "could not close large object file\n");
  }
- 
- /*
-  * Called by the archiver when finishing saving all BLOB DATA.
-  *
-  * Optional.
-  */
- static void
- _EndBlobs(ArchiveHandle *AH, TocEntry *te)
- {
- 	lclContext *ctx = (lclContext *) AH->formatData;
- 
- 	/* Write out a fake zero OID to mark end-of-blobs. */
- 	/* WriteInt(AH, 0); */
- 
- 	if (fclose(ctx->blobToc) != 0)
- 		die_horribly(AH, modulename, "could not close large object TOC file: %s\n", strerror(errno));
- }
--- 537,539 ----
*** a/src/bin/pg_dump/pg_backup_null.c
--- b/src/bin/pg_dump/pg_backup_null.c
***************
*** 147,160 **** _StartBlobs(ArchiveHandle *AH, TocEntry *te)
  static void
  _StartBlob(ArchiveHandle *AH, TocEntry *te, Oid oid)
  {
  	if (oid == 0)
  		die_horribly(AH, NULL, "invalid OID for large object\n");
  
  	if (AH->ropt->dropSchema)
! 		DropBlobIfExists(AH, oid);
  
! 	ahprintf(AH, "SELECT pg_catalog.lo_open(pg_catalog.lo_create('%u'), %d);\n",
! 			 oid, INV_WRITE);
  
  	AH->WriteDataPtr = _WriteBlobData;
  }
--- 147,165 ----
  static void
  _StartBlob(ArchiveHandle *AH, TocEntry *te, Oid oid)
  {
+ 	bool	compat = (strcmp(te->desc, "BLOBS") == 0 ? true : false);
+ 
  	if (oid == 0)
  		die_horribly(AH, NULL, "invalid OID for large object\n");
  
  	if (AH->ropt->dropSchema)
! 		CleanupBlobIfExists(AH, oid, compat);
  
! 	if (compat)
! 		ahprintf(AH, "SELECT pg_catalog.lo_open(pg_catalog.lo_create('%u'), %d);\n",
! 				 oid, INV_WRITE);
! 	else
! 		ahprintf(AH, "SELECT pg_catalog.lo_open(%u, %d);\n", oid, INV_WRITE);
  
  	AH->WriteDataPtr = _WriteBlobData;
  }
*** a/src/bin/pg_dump/pg_backup_tar.c
--- b/src/bin/pg_dump/pg_backup_tar.c
***************
*** 44,53 **** static void _WriteExtraToc(ArchiveHandle *AH, TocEntry *te);
  static void _ReadExtraToc(ArchiveHandle *AH, TocEntry *te);
  static void _PrintExtraToc(ArchiveHandle *AH, TocEntry *te);
  
- static void _StartBlobs(ArchiveHandle *AH, TocEntry *te);
  static void _StartBlob(ArchiveHandle *AH, TocEntry *te, Oid oid);
  static void _EndBlob(ArchiveHandle *AH, TocEntry *te, Oid oid);
- static void _EndBlobs(ArchiveHandle *AH, TocEntry *te);
  
  #define K_STD_BUF_SIZE 1024
  
--- 44,51 ----
***************
*** 101,106 **** typedef struct
--- 99,106 ----
  static const char *modulename = gettext_noop("tar archiver");
  
  static void _LoadBlobs(ArchiveHandle *AH, RestoreOptions *ropt);
+ static void _LoadLargeObject(ArchiveHandle *AH, TocEntry *te,
+ 							 RestoreOptions *ropt);
  
  static TAR_MEMBER *tarOpen(ArchiveHandle *AH, const char *filename, char mode);
  static void tarClose(ArchiveHandle *AH, TAR_MEMBER *TH);
***************
*** 145,154 **** InitArchiveFmt_Tar(ArchiveHandle *AH)
  	AH->WriteExtraTocPtr = _WriteExtraToc;
  	AH->PrintExtraTocPtr = _PrintExtraToc;
  
! 	AH->StartBlobsPtr = _StartBlobs;
  	AH->StartBlobPtr = _StartBlob;
  	AH->EndBlobPtr = _EndBlob;
! 	AH->EndBlobsPtr = _EndBlobs;
  	AH->ClonePtr = NULL;
  	AH->DeClonePtr = NULL;
  
--- 145,154 ----
  	AH->WriteExtraTocPtr = _WriteExtraToc;
  	AH->PrintExtraTocPtr = _PrintExtraToc;
  
! 	AH->StartBlobsPtr = NULL;
  	AH->StartBlobPtr = _StartBlob;
  	AH->EndBlobPtr = _EndBlob;
! 	AH->EndBlobsPtr = NULL;
  	AH->ClonePtr = NULL;
  	AH->DeClonePtr = NULL;
  
***************
*** 697,702 **** _PrintTocData(ArchiveHandle *AH, TocEntry *te, RestoreOptions *ropt)
--- 697,704 ----
  
  	if (strcmp(te->desc, "BLOBS") == 0)
  		_LoadBlobs(AH, ropt);
+ 	else if (strcmp(te->desc, "LARGE OBJECT") == 0)
+ 		_LoadLargeObject(AH, te, ropt);
  	else
  		_PrintFileData(AH, tctx->filename, ropt);
  }
***************
*** 725,731 **** _LoadBlobs(ArchiveHandle *AH, RestoreOptions *ropt)
  			{
  				ahlog(AH, 1, "restoring large object OID %u\n", oid);
  
! 				StartRestoreBlob(AH, oid, ropt->dropSchema);
  
  				while ((cnt = tarRead(buf, 4095, th)) > 0)
  				{
--- 727,733 ----
  			{
  				ahlog(AH, 1, "restoring large object OID %u\n", oid);
  
! 				StartRestoreBlob(AH, oid, ropt->dropSchema, true);
  
  				while ((cnt = tarRead(buf, 4095, th)) > 0)
  				{
***************
*** 756,761 **** _LoadBlobs(ArchiveHandle *AH, RestoreOptions *ropt)
--- 758,796 ----
  	EndRestoreBlobs(AH);
  }
  
+ static void
+ _LoadLargeObject(ArchiveHandle *AH, TocEntry *te, RestoreOptions *ropt)
+ {
+ 	Oid			oid = atooid(te->tag);
+ 	char		fname[K_STD_BUF_SIZE];
+ 	char		buf[4096];
+ 	TAR_MEMBER *th;
+ 	size_t		cnt;
+ 
+ 	StartRestoreBlobs(AH);
+ 
+ 	snprintf(fname, sizeof(fname), "blob_%u.dat", oid);
+ 
+ 	th = tarOpen(AH, fname, 'r');
+ 	if (th == NULL)
+ 		die_horribly(AH, modulename, "could not open input file \"%s\": %s\n",
+ 					 fname, strerror(errno));
+ 
+ 	ahlog(AH, 1, "restoring large object OID %u\n", oid);
+ 
+ 	StartRestoreBlob(AH, oid, ropt->dropSchema, false);
+ 
+ 	while ((cnt = tarRead(buf, 4095, th)) > 0)
+ 	{
+ 		buf[cnt] = '\0';
+ 		ahwrite(buf, 1, cnt, AH);
+ 	}
+ 	EndRestoreBlob(AH, oid);
+ 
+ 	tarClose(AH, th);
+ 
+ 	EndRestoreBlobs(AH);
+ }
  
  static int
  _WriteByte(ArchiveHandle *AH, const int i)
***************
*** 894,919 **** _scriptOut(ArchiveHandle *AH, const void *buf, size_t len)
   */
  
  /*
-  * Called by the archiver when starting to save all BLOB DATA (not schema).
-  * This routine should save whatever format-specific information is needed
-  * to read the BLOBs back into memory.
-  *
-  * It is called just prior to the dumper's DataDumper routine.
-  *
-  * Optional, but strongly recommended.
-  *
-  */
- static void
- _StartBlobs(ArchiveHandle *AH, TocEntry *te)
- {
- 	lclContext *ctx = (lclContext *) AH->formatData;
- 	char		fname[K_STD_BUF_SIZE];
- 
- 	sprintf(fname, "blobs.toc");
- 	ctx->blobToc = tarOpen(AH, fname, 'w');
- }
- 
- /*
   * Called by the archiver when the dumper calls StartBlob.
   *
   * Mandatory.
--- 929,934 ----
***************
*** 923,929 **** _StartBlobs(ArchiveHandle *AH, TocEntry *te)
  static void
  _StartBlob(ArchiveHandle *AH, TocEntry *te, Oid oid)
  {
- 	lclContext *ctx = (lclContext *) AH->formatData;
  	lclTocEntry *tctx = (lclTocEntry *) te->formatData;
  	char		fname[255];
  	char	   *sfx;
--- 938,943 ----
***************
*** 938,945 **** _StartBlob(ArchiveHandle *AH, TocEntry *te, Oid oid)
  
  	sprintf(fname, "blob_%u.dat%s", oid, sfx);
  
- 	tarPrintf(AH, ctx->blobToc, "%u %s\n", oid, fname);
- 
  	tctx->TH = tarOpen(AH, fname, 'w');
  }
  
--- 952,957 ----
***************
*** 957,981 **** _EndBlob(ArchiveHandle *AH, TocEntry *te, Oid oid)
  	tarClose(AH, tctx->TH);
  }
  
- /*
-  * Called by the archiver when finishing saving all BLOB DATA.
-  *
-  * Optional.
-  *
-  */
- static void
- _EndBlobs(ArchiveHandle *AH, TocEntry *te)
- {
- 	lclContext *ctx = (lclContext *) AH->formatData;
- 
- 	/* Write out a fake zero OID to mark end-of-blobs. */
- 	/* WriteInt(AH, 0); */
- 
- 	tarClose(AH, ctx->blobToc);
- }
- 
- 
- 
  /*------------
   * TAR Support
   *------------
--- 969,974 ----
*** a/src/bin/pg_dump/pg_dump.c
--- b/src/bin/pg_dump/pg_dump.c
***************
*** 190,198 **** static void selectSourceSchema(const char *schemaName);
  static char *getFormattedTypeName(Oid oid, OidOptions opts);
  static char *myFormatType(const char *typname, int32 typmod);
  static const char *fmtQualifiedId(const char *schema, const char *id);
! static bool hasBlobs(Archive *AH);
! static int	dumpBlobs(Archive *AH, void *arg);
! static int	dumpBlobComments(Archive *AH, void *arg);
  static void dumpDatabase(Archive *AH);
  static void dumpEncoding(Archive *AH);
  static void dumpStdStrings(Archive *AH);
--- 190,198 ----
  static char *getFormattedTypeName(Oid oid, OidOptions opts);
  static char *myFormatType(const char *typname, int32 typmod);
  static const char *fmtQualifiedId(const char *schema, const char *id);
! static void getLargeObjects(Archive *AH);
! static void dumpLargeObject(Archive *AH, LargeObjectInfo *loinfo);
! static int	dumpLargeObjectData(Archive *AH, void *arg);
  static void dumpDatabase(Archive *AH);
  static void dumpEncoding(Archive *AH);
  static void dumpStdStrings(Archive *AH);
***************
*** 701,725 **** main(int argc, char **argv)
  			getTableDataFKConstraints();
  	}
  
! 	if (outputBlobs && hasBlobs(g_fout))
! 	{
! 		/* Add placeholders to allow correct sorting of blobs */
! 		DumpableObject *blobobj;
! 		DumpableObject *blobcobj;
! 
! 		blobobj = (DumpableObject *) malloc(sizeof(DumpableObject));
! 		blobobj->objType = DO_BLOBS;
! 		blobobj->catId = nilCatalogId;
! 		AssignDumpId(blobobj);
! 		blobobj->name = strdup("BLOBS");
! 
! 		blobcobj = (DumpableObject *) malloc(sizeof(DumpableObject));
! 		blobcobj->objType = DO_BLOB_COMMENTS;
! 		blobcobj->catId = nilCatalogId;
! 		AssignDumpId(blobcobj);
! 		blobcobj->name = strdup("BLOB COMMENTS");
! 		addObjectDependency(blobcobj, blobobj->dumpId);
! 	}
  
  	/*
  	 * Collect dependency data to assist in ordering the objects.
--- 701,708 ----
  			getTableDataFKConstraints();
  	}
  
! 	if (outputBlobs)
! 		getLargeObjects(g_fout);
  
  	/*
  	 * Collect dependency data to assist in ordering the objects.
***************
*** 1936,2183 **** dumpStdStrings(Archive *AH)
  	destroyPQExpBuffer(qry);
  }
  
- 
  /*
!  * hasBlobs:
!  *	Test whether database contains any large objects
   */
! static bool
! hasBlobs(Archive *AH)
  {
! 	bool		result;
! 	const char *blobQry;
! 	PGresult   *res;
  
  	/* Make sure we are in proper schema */
  	selectSourceSchema("pg_catalog");
  
! 	/* Check for BLOB OIDs */
  	if (AH->remoteVersion >= 80500)
! 		blobQry = "SELECT oid FROM pg_largeobject_metadata LIMIT 1";
  	else if (AH->remoteVersion >= 70100)
! 		blobQry = "SELECT loid FROM pg_largeobject LIMIT 1";
  	else
! 		blobQry = "SELECT oid FROM pg_class WHERE relkind = 'l' LIMIT 1";
  
! 	res = PQexec(g_conn, blobQry);
! 	check_sql_result(res, g_conn, blobQry, PGRES_TUPLES_OK);
  
! 	result = PQntuples(res) > 0;
  
  	PQclear(res);
  
! 	return result;
  }
  
  /*
!  * dumpBlobs:
!  *	dump all blobs
   */
! static int
! dumpBlobs(Archive *AH, void *arg)
  {
! 	const char *blobQry;
! 	const char *blobFetchQry;
! 	PGresult   *res;
! 	char		buf[LOBBUFSIZE];
! 	int			i;
! 	int			cnt;
! 
! 	if (g_verbose)
! 		write_msg(NULL, "saving large objects\n");
  
! 	/* Make sure we are in proper schema */
! 	selectSourceSchema("pg_catalog");
! 
! 	/* Cursor to get all BLOB OIDs */
! 	if (AH->remoteVersion >= 80500)
! 		blobQry = "DECLARE bloboid CURSOR FOR SELECT oid FROM pg_largeobject_metadata";
! 	else if (AH->remoteVersion >= 70100)
! 		blobQry = "DECLARE bloboid CURSOR FOR SELECT DISTINCT loid FROM pg_largeobject";
! 	else
! 		blobQry = "DECLARE bloboid CURSOR FOR SELECT oid FROM pg_class WHERE relkind = 'l'";
  
! 	res = PQexec(g_conn, blobQry);
! 	check_sql_result(res, g_conn, blobQry, PGRES_COMMAND_OK);
! 
! 	/* Command to fetch from cursor */
! 	blobFetchQry = "FETCH 1000 IN bloboid";
! 
! 	do
  	{
! 		PQclear(res);
! 
! 		/* Do a fetch */
! 		res = PQexec(g_conn, blobFetchQry);
! 		check_sql_result(res, g_conn, blobFetchQry, PGRES_TUPLES_OK);
! 
! 		/* Process the tuples, if any */
! 		for (i = 0; i < PQntuples(res); i++)
! 		{
! 			Oid			blobOid;
! 			int			loFd;
! 
! 			blobOid = atooid(PQgetvalue(res, i, 0));
! 			/* Open the BLOB */
! 			loFd = lo_open(g_conn, blobOid, INV_READ);
! 			if (loFd == -1)
! 			{
! 				write_msg(NULL, "dumpBlobs(): could not open large object: %s",
! 						  PQerrorMessage(g_conn));
! 				exit_nicely();
! 			}
! 
! 			StartBlob(AH, blobOid);
! 
! 			/* Now read it in chunks, sending data to archive */
! 			do
! 			{
! 				cnt = lo_read(g_conn, loFd, buf, LOBBUFSIZE);
! 				if (cnt < 0)
! 				{
! 					write_msg(NULL, "dumpBlobs(): error reading large object: %s",
! 							  PQerrorMessage(g_conn));
! 					exit_nicely();
! 				}
! 
! 				WriteData(AH, buf, cnt);
! 			} while (cnt > 0);
! 
! 			lo_close(g_conn, loFd);
! 
! 			EndBlob(AH, blobOid);
! 		}
! 	} while (PQntuples(res) > 0);
  
! 	PQclear(res);
  
! 	return 1;
  }
  
- /*
-  * dumpBlobComments
-  *	dump all blob properties.
-  *  It has "BLOB COMMENTS" tag due to the historical reason, but note
-  *  that it is the routine to dump all the properties of blobs.
-  *
-  * Since we don't provide any way to be selective about dumping blobs,
-  * there's no need to be selective about their comments either.  We put
-  * all the comments into one big TOC entry.
-  */
  static int
! dumpBlobComments(Archive *AH, void *arg)
  {
! 	const char *blobQry;
! 	const char *blobFetchQry;
! 	PQExpBuffer cmdQry = createPQExpBuffer();
! 	PGresult   *res;
! 	int			i;
! 
! 	if (g_verbose)
! 		write_msg(NULL, "saving large object properties\n");
! 
! 	/* Make sure we are in proper schema */
! 	selectSourceSchema("pg_catalog");
  
! 	/* Cursor to get all BLOB comments */
! 	if (AH->remoteVersion >= 80500)
! 		blobQry = "DECLARE blobcmt CURSOR FOR SELECT oid, "
! 			"obj_description(oid, 'pg_largeobject'), "
! 			"pg_get_userbyid(lomowner), lomacl "
! 			"FROM pg_largeobject_metadata";
! 	else if (AH->remoteVersion >= 70300)
! 		blobQry = "DECLARE blobcmt CURSOR FOR SELECT loid, "
! 			"obj_description(loid, 'pg_largeobject'), NULL, NULL "
! 			"FROM (SELECT DISTINCT loid FROM "
! 			"pg_description d JOIN pg_largeobject l ON (objoid = loid) "
! 			"WHERE classoid = 'pg_largeobject'::regclass) ss";
! 	else if (AH->remoteVersion >= 70200)
! 		blobQry = "DECLARE blobcmt CURSOR FOR SELECT loid, "
! 			"obj_description(loid, 'pg_largeobject'), NULL, NULL "
! 			"FROM (SELECT DISTINCT loid FROM pg_largeobject) ss";
! 	else if (AH->remoteVersion >= 70100)
! 		blobQry = "DECLARE blobcmt CURSOR FOR SELECT loid, "
! 			"obj_description(loid), NULL, NULL "
! 			"FROM (SELECT DISTINCT loid FROM pg_largeobject) ss";
! 	else
! 		blobQry = "DECLARE blobcmt CURSOR FOR SELECT oid, "
! 			"	( "
! 			"		SELECT description "
! 			"		FROM pg_description pd "
! 			"		WHERE pd.objoid=pc.oid "
! 			"	), NULL, NULL "
! 			"FROM pg_class pc WHERE relkind = 'l'";
! 
! 	res = PQexec(g_conn, blobQry);
! 	check_sql_result(res, g_conn, blobQry, PGRES_COMMAND_OK);
  
! 	/* Command to fetch from cursor */
! 	blobFetchQry = "FETCH 100 IN blobcmt";
  
  	do
  	{
! 		PQclear(res);
! 
! 		/* Do a fetch */
! 		res = PQexec(g_conn, blobFetchQry);
! 		check_sql_result(res, g_conn, blobFetchQry, PGRES_TUPLES_OK);
! 
! 		/* Process the tuples, if any */
! 		for (i = 0; i < PQntuples(res); i++)
  		{
! 			Oid			blobOid = atooid(PQgetvalue(res, i, 0));
! 			char	   *lo_comment = PQgetvalue(res, i, 1);
! 			char	   *lo_owner = PQgetvalue(res, i, 2);
! 			char	   *lo_acl = PQgetvalue(res, i, 3);
! 			char		lo_name[32];
! 
! 			resetPQExpBuffer(cmdQry);
! 
! 			/* comment on the blob */
! 			if (!PQgetisnull(res, i, 1))
! 			{
! 				appendPQExpBuffer(cmdQry,
! 								  "COMMENT ON LARGE OBJECT %u IS ", blobOid);
! 				appendStringLiteralAH(cmdQry, lo_comment, AH);
! 				appendPQExpBuffer(cmdQry, ";\n");
! 			}
! 
! 			/* dump blob ownership, if necessary */
! 			if (!PQgetisnull(res, i, 2))
! 			{
! 				appendPQExpBuffer(cmdQry,
! 								  "ALTER LARGE OBJECT %u OWNER TO %s;\n",
! 								  blobOid, lo_owner);
! 			}
! 
! 			/* dump blob privileges, if necessary */
! 			if (!PQgetisnull(res, i, 3) &&
! 				!dataOnly && !aclsSkip)
! 			{
! 				snprintf(lo_name, sizeof(lo_name), "%u", blobOid);
! 				if (!buildACLCommands(lo_name, NULL, "LARGE OBJECT",
! 									  lo_acl, lo_owner, "",
! 									  AH->remoteVersion, cmdQry))
! 				{
! 					write_msg(NULL, "could not parse ACL (%s) for "
! 							  "large object %u", lo_acl, blobOid);
! 					exit_nicely();
! 				}
! 			}
! 
! 			if (cmdQry->len > 0)
! 			{
! 				appendPQExpBuffer(cmdQry, "\n");
! 				archputs(cmdQry->data, AH);
! 			}
  		}
- 	} while (PQntuples(res) > 0);
  
! 	PQclear(res);
  
! 	archputs("\n", AH);
  
! 	destroyPQExpBuffer(cmdQry);
  
  	return 1;
  }
--- 1919,2077 ----
  	destroyPQExpBuffer(qry);
  }
  
  /*
!  * getLargeObjects
!  *	Gather information about large objects.
   */
! static void
! getLargeObjects(Archive *AH)
  {
! 	PQExpBuffer			loQry = createPQExpBuffer();
! 	LargeObjectInfo	   *loinfo;
! 	PGresult		   *res;
! 	int					i;
  
  	/* Make sure we are in proper schema */
  	selectSourceSchema("pg_catalog");
  
! 	/* Collect large object metadata */
  	if (AH->remoteVersion >= 80500)
! 		appendPQExpBuffer(loQry,
! 						  "SELECT oid, (%s lomowner), lomacl,"
! 						  " obj_description(oid, 'pg_largeobject')"
! 						  " FROM pg_largeobject_metadata",
! 						  username_subquery);
! 	else if (AH->remoteVersion >= 70200)
! 		appendPQExpBuffer(loQry,
! 						  "SELECT DISTINCT loid, NULL, NULL, "
! 						  " obj_description(loid, 'pg_largeobject')"
! 						  " FROM pg_largeobject");
  	else if (AH->remoteVersion >= 70100)
! 		appendPQExpBuffer(loQry,
! 						  "SELECT DISTINCT loid, NULL, NULL, "
! 						  " obj_description(loid)"
! 						  " FROM pg_largeobject");
  	else
! 		appendPQExpBuffer(loQry,
! 						  "SELECT DISTINCT oid, NULL, NULL, "
! 						  " obj_description(oid)"
! 						  " FROM pg_class WHERE relkind = 'l'");
  
! 	res = PQexec(g_conn, loQry->data);
! 	check_sql_result(res, g_conn, loQry->data, PGRES_TUPLES_OK);
  
! 	for (i = 0; i < PQntuples(res); i++)
! 	{
! 		loinfo = (LargeObjectInfo *) malloc(sizeof(LargeObjectInfo));
! 		loinfo->dobj.objType = DO_LARGE_OBJECT;
! 		loinfo->dobj.catId = nilCatalogId;
! 		AssignDumpId(&loinfo->dobj);
  
+ 		loinfo->dobj.name = strdup(PQgetvalue(res, i, 0));
+ 		loinfo->rolname = strdup(PQgetvalue(res, i, 1));
+ 		loinfo->loacl = strdup(PQgetvalue(res, i, 2));
+ 		loinfo->locomm = strdup(PQgetvalue(res, i, 3));
+ 	}
  	PQclear(res);
  
! 	destroyPQExpBuffer(loQry);
  }
  
  /*
!  * dumpLargeObject
!  *	dump the metadata of a large object
   */
! static void
! dumpLargeObject(Archive *AH, LargeObjectInfo *loinfo)
  {
! 	PQExpBuffer		cquery;
! 	PQExpBuffer		dquery;
  
! 	cquery = createPQExpBuffer();
! 	dquery = createPQExpBuffer();
  
! 	/*
! 	 * create an empty large object
! 	 */
! 	appendPQExpBuffer(cquery, "SELECT lo_create(%s);\n",
! 					  loinfo->dobj.name);
! 	/*
! 	 * COMMENT ON, if necessary. Note that we cannot use dumpComment()
! 	 * because it would be emitted in SECTION_NONE.
! 	 */
! 	if (loinfo->locomm && strlen(loinfo->locomm) > 0)
  	{
! 		appendPQExpBuffer(cquery,
! 						  "\nCOMMENT ON LARGE OBJECT %s IS ",
! 						  loinfo->dobj.name);
! 		appendStringLiteralAH(cquery, loinfo->locomm, AH);
! 		appendPQExpBuffer(cquery, ";\n");
! 	}
  
! 	/*
! 	 * clean up a large object
! 	 */
! 	appendPQExpBuffer(dquery, "SELECT lo_unlink(%s);\n",
! 					  loinfo->dobj.name);
  
! 	ArchiveEntry(AH, loinfo->dobj.catId, loinfo->dobj.dumpId,
! 				 loinfo->dobj.name,
! 				 NULL, NULL,
! 				 loinfo->rolname, false,
! 				 "LARGE OBJECT", SECTION_DATA,
! 				 cquery->data, dquery->data, NULL,
! 				 loinfo->dobj.dependencies, loinfo->dobj.nDeps,
! 				 dumpLargeObjectData, loinfo);
! 	/*
! 	 * Dump access privileges, if necessary
! 	 */
! 	dumpACL(AH, loinfo->dobj.catId, loinfo->dobj.dumpId,
! 			"LARGE OBJECT",
! 			loinfo->dobj.name, NULL,
! 			loinfo->dobj.name, NULL,
! 			loinfo->rolname, loinfo->loacl);
  }
  
  static int
! dumpLargeObjectData(Archive *AH, void *arg)
  {
! 	LargeObjectInfo	   *loinfo = (LargeObjectInfo *)arg;
! 	Oid		blobOid;
! 	int		blobFd;
! 	int		cnt;
! 	char	buf[LOBBUFSIZE];
  
! 	blobOid = atooid(loinfo->dobj.name);
  
! 	/* open the large object */
! 	blobFd = lo_open(g_conn, blobOid, INV_READ);
! 	if (blobFd == -1)
! 	{
! 		write_msg(NULL, "%s: could not open large object: %s",
! 				  __FUNCTION__,
! 				  PQerrorMessage(g_conn));
! 		exit_nicely();
! 	}
  
+ 	StartBlob(AH, blobOid);
+ 	/* Now read it in chunks, sending data to archive */
  	do
  	{
! 		cnt = lo_read(g_conn, blobFd, buf, LOBBUFSIZE);
! 		if (cnt < 0)
  		{
! 			write_msg(NULL, "%s(): error reading large object: %s",
! 					  __FUNCTION__,
! 					  PQerrorMessage(g_conn));
! 			exit_nicely();
  		}
  
! 		WriteData(AH, buf, cnt);
! 	} while (cnt > 0);
  
! 	lo_close(g_conn, blobFd);
  
! 	EndBlob(AH, blobOid);
  
  	return 1;
  }
***************
*** 6524,6544 **** dumpDumpableObject(Archive *fout, DumpableObject *dobj)
  		case DO_DEFAULT_ACL:
  			dumpDefaultACL(fout, (DefaultACLInfo *) dobj);
  			break;
! 		case DO_BLOBS:
! 			ArchiveEntry(fout, dobj->catId, dobj->dumpId,
! 						 dobj->name, NULL, NULL, "",
! 						 false, "BLOBS", SECTION_DATA,
! 						 "", "", NULL,
! 						 dobj->dependencies, dobj->nDeps,
! 						 dumpBlobs, NULL);
! 			break;
! 		case DO_BLOB_COMMENTS:
! 			ArchiveEntry(fout, dobj->catId, dobj->dumpId,
! 						 dobj->name, NULL, NULL, "",
! 						 false, "BLOB COMMENTS", SECTION_DATA,
! 						 "", "", NULL,
! 						 dobj->dependencies, dobj->nDeps,
! 						 dumpBlobComments, NULL);
  			break;
  	}
  }
--- 6418,6425 ----
  		case DO_DEFAULT_ACL:
  			dumpDefaultACL(fout, (DefaultACLInfo *) dobj);
  			break;
! 		case DO_LARGE_OBJECT:
! 			dumpLargeObject(fout, (LargeObjectInfo *) dobj);
  			break;
  	}
  }
***************
*** 10395,10401 **** dumpACL(Archive *fout, CatalogId objCatId, DumpId objDumpId,
  	PQExpBuffer sql;
  
  	/* Do nothing if ACL dump is not enabled */
! 	if (dataOnly || aclsSkip)
  		return;
  
  	sql = createPQExpBuffer();
--- 10276,10284 ----
  	PQExpBuffer sql;
  
  	/* Do nothing if ACL dump is not enabled */
! 	/* Large objects are an exception to --data-only */
! 	if (aclsSkip ||
! 		(dataOnly && strcmp(type, "LARGE OBJECT") != 0))
  		return;
  
  	sql = createPQExpBuffer();
*** a/src/bin/pg_dump/pg_dump.h
--- b/src/bin/pg_dump/pg_dump.h
***************
*** 115,122 **** typedef enum
  	DO_FDW,
  	DO_FOREIGN_SERVER,
  	DO_DEFAULT_ACL,
! 	DO_BLOBS,
! 	DO_BLOB_COMMENTS
  } DumpableObjectType;
  
  typedef struct _dumpableObject
--- 115,121 ----
  	DO_FDW,
  	DO_FOREIGN_SERVER,
  	DO_DEFAULT_ACL,
! 	DO_LARGE_OBJECT,
  } DumpableObjectType;
  
  typedef struct _dumpableObject
***************
*** 443,448 **** typedef struct _defaultACLInfo
--- 442,455 ----
  	char	   *defaclacl;
  } DefaultACLInfo;
  
+ typedef struct _largeObjectInfo
+ {
+ 	DumpableObject	dobj;
+ 	char	   *rolname;
+ 	char	   *loacl;
+ 	char	   *locomm;
+ } LargeObjectInfo;
+ 
  /* global decls */
  extern bool force_quotes;		/* double-quotes for identifiers flag */
  extern bool g_verbose;			/* verbose flag */
*** a/src/bin/pg_dump/pg_dump_sort.c
--- b/src/bin/pg_dump/pg_dump_sort.c
***************
*** 55,62 **** static const int oldObjectTypePriority[] =
  	3,							/* DO_FDW */
  	4,							/* DO_FOREIGN_SERVER */
  	17,							/* DO_DEFAULT_ACL */
! 	10,							/* DO_BLOBS */
! 	11							/* DO_BLOB_COMMENTS */
  };
  
  /*
--- 55,61 ----
  	3,							/* DO_FDW */
  	4,							/* DO_FOREIGN_SERVER */
  	17,							/* DO_DEFAULT_ACL */
! 	10,							/* DO_LARGE_OBJECT */
  };
  
  /*
***************
*** 92,99 **** static const int newObjectTypePriority[] =
  	14,							/* DO_FDW */
  	15,							/* DO_FOREIGN_SERVER */
  	27,							/* DO_DEFAULT_ACL */
! 	20,							/* DO_BLOBS */
! 	21							/* DO_BLOB_COMMENTS */
  };
  
  
--- 91,97 ----
  	14,							/* DO_FDW */
  	15,							/* DO_FOREIGN_SERVER */
  	27,							/* DO_DEFAULT_ACL */
! 	20,							/* DO_LARGE_OBJECT */
  };
  
  
***************
*** 1146,1160 **** describeDumpableObject(DumpableObject *obj, char *buf, int bufsize)
  					 "DEFAULT ACL %s  (ID %d OID %u)",
  					 obj->name, obj->dumpId, obj->catId.oid);
  			return;
! 		case DO_BLOBS:
  			snprintf(buf, bufsize,
! 					 "BLOBS  (ID %d)",
! 					 obj->dumpId);
! 			return;
! 		case DO_BLOB_COMMENTS:
! 			snprintf(buf, bufsize,
! 					 "BLOB COMMENTS  (ID %d)",
! 					 obj->dumpId);
  			return;
  	}
  	/* shouldn't get here */
--- 1144,1153 ----
  					 "DEFAULT ACL %s  (ID %d OID %u)",
  					 obj->name, obj->dumpId, obj->catId.oid);
  			return;
! 		case DO_LARGE_OBJECT:
  			snprintf(buf, bufsize,
! 					 "LARGE OBJECT (OID %s)",
! 					 obj->name);
  			return;
  	}
  	/* shouldn't get here */
#116Alvaro Herrera
alvherre@commandprompt.com
In reply to: Takahiro Itagaki (#112)
Re: Largeobject Access Controls (r2460)

Takahiro Itagaki escribió:

KaiGai Kohei <kaigai@kaigai.gr.jp> wrote:

default: both contents and metadata
--data-only: same
--schema-only: neither

However, that would make large objects the only object class that dumps
its owner, ACL, and comment even when --data-only is given.
Is that really what you suggested?

I wonder whether we still need both "BLOB ITEM" and "BLOB DATA"
even if we take the all-or-nothing behavior. Can we handle a
BLOB's owner, ACL, comment, and data with one entry kind?

I don't think this is necessarily a good idea. We might decide to treat
both things separately in the future, and having them represented
separately in the dump would prove useful.

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

#117KaiGai Kohei
kaigai@ak.jp.nec.com
In reply to: Alvaro Herrera (#116)
1 attachment(s)
Re: Largeobject Access Controls (r2460)

(2010/02/08 22:23), Alvaro Herrera wrote:

Takahiro Itagaki escribió:

KaiGai Kohei<kaigai@kaigai.gr.jp> wrote:

default: both contents and metadata
--data-only: same
--schema-only: neither

However, that would make large objects the only object class that dumps
its owner, ACL, and comment even when --data-only is given.
Is that really what you suggested?

I wonder whether we still need both "BLOB ITEM" and "BLOB DATA"
even if we take the all-or-nothing behavior. Can we handle a
BLOB's owner, ACL, comment, and data with one entry kind?

I don't think this is necessarily a good idea. We might decide to treat
both things separately in the future, and having them represented
separately in the dump would prove useful.

I agree. From a design perspective, the single-section approach is
simpler than the dual-section one, but its change set is larger.

The attached patch revises the previous implementation, which has
two types of sections, to handle the options correctly, as follows:

* default: both contents and metadata
* --data-only: same
* --schema-only: neither

Below are the main points of the change.

_tocEntryRequired() decides whether the given TocEntry should be dumped,
based on the given context. Previously, none of the entries that need
cleanup code and access privileges belonged to SECTION_DATA, so data
sections were skipped even when their cleanup code and access privileges
had to be restored.
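
As a condensed sketch of what this means for RestoreArchive() (simplified
from the hunks in the attached patch, not new logic), both the drop path
and the owner/privileges path now accept data entries as well:

    reqs = _tocEntryRequired(te, ropt, false);

    /* drop statements are now emitted for REQ_DATA entries too,
     * e.g. the dropStmt of a "BLOB ITEM" entry under --clean */
    if ((reqs & (REQ_SCHEMA | REQ_DATA)) != 0 && te->dropStmt)
        ahprintf(AH, "%s", te->dropStmt);

    /* ... and owner/privileges are likewise applied to data entries */
    reqs = _tocEntryRequired(te, ropt, true);
    if ((reqs & (REQ_SCHEMA | REQ_DATA)) != 0)
        ahlog(AH, 1, "setting owner and privileges for %s %s\n",
              te->desc, te->tag);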

The pg_backup_archiver.c:329 chunk checks whether the existing object
named by the TocEntry needs to be cleaned up. If the entry is "BLOB ITEM"
and --data-only is given, _tocEntryRequired() returns REQ_DATA, so the
old check skipped writing out the cleanup code.
(We have to unlink an existing large object before recreating it, so the
cleanup can no longer be done in _StartBlob().)
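
To make the ordering concrete, here is a condensed sketch of what
dumpBlobItem() attaches to each entry (mirroring the patch below).
Because the archiver emits an entry's drop statement before its
definition, the unlink naturally precedes the lo_create():

    /* dropStmt: runs first when --clean is given */
    appendPQExpBuffer(dquery, "SELECT lo_unlink(%s);\n", binfo->dobj.name);

    /* defn: recreate the empty large object; its contents follow
     * later via the separate "BLOB DATA" entry */
    appendPQExpBuffer(bquery, "SELECT lo_create(%s);\n", binfo->dobj.name);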

The pg_backup_archiver.c:381 chunk checks whether access privileges need
to be restored, based on the given "ACL" TocEntry. In the same way, the
old check meant access privileges were not restored when --data-only
was given.

_tocEntryRequired() was also modified to handle large objects correctly.
Previously, a TocEntry without its own dumper (except for the "SEQUENCE
SET" section) was treated as a schema section. If the 16th argument of
ArchiveEntry() is NULL, the entry has no dumper function even when its
section is SECTION_DATA. dumpBlobItem() calls ArchiveEntry() without a
dumper, so its entry was misidentified as a schema section, and the "ACL"
entries of large objects were misidentified in the same way. So I had to
add special treatment for these cases.
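
Condensed into a predicate, the new rule for entries without a dumper
looks roughly like this (a sketch only; the patch expresses it inline,
and the helper name is made up):

    /*
     * An entry without its own dumper still counts as data if it is
     * "SEQUENCE SET", "BLOB ITEM", or an "ACL" whose tag is a large
     * object OID -- relying on the assumption that only large object
     * tags are numeric.
     */
    static bool
    is_data_entry_without_dumper(TocEntry *te)
    {
        return strcmp(te->desc, "SEQUENCE SET") == 0 ||
               strcmp(te->desc, "BLOB ITEM") == 0 ||
               (strcmp(te->desc, "ACL") == 0 && atooid(te->tag) > 0);
    }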

dumpACL() is a utility function that writes out GRANT/REVOKE statements
for the given ACL string. When --data-only was given, it returned
immediately without doing anything, which prevented dumping the access
privileges of large objects. However, every caller already checks
"if (dataOnly)" before invoking it, so I removed this check from
dumpACL().
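
Concretely, the division of labor becomes the following (a sketch; the
actual hunks are in the attached patch):

    /* inside dumpACL(): only the global switch remains */
    if (aclsSkip)
        return;

    /* at the large object call site: invoked unconditionally, so the
     * ACL survives --data-only; other call sites test dataOnly first */
    dumpACL(AH, binfo->dobj.catId, binfo->dobj.dumpId, "LARGE OBJECT",
            binfo->dobj.name, NULL, binfo->dobj.name, NULL,
            binfo->rolname, binfo->blobacl);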

Thanks,
--
OSS Platform Development Division, NEC
KaiGai Kohei <kaigai@ak.jp.nec.com>

Attachments:

pgsql-fix-pg_dump-blob-privs.6.patch (application/octet-stream)
*** a/src/bin/pg_dump/pg_backup_archiver.c
--- b/src/bin/pg_dump/pg_backup_archiver.c
***************
*** 329,335 **** RestoreArchive(Archive *AHX, RestoreOptions *ropt)
  			AH->currentTE = te;
  
  			reqs = _tocEntryRequired(te, ropt, false /* needn't drop ACLs */ );
! 			if (((reqs & REQ_SCHEMA) != 0) && te->dropStmt)
  			{
  				/* We want the schema */
  				ahlog(AH, 1, "dropping %s %s\n", te->desc, te->tag);
--- 329,335 ----
  			AH->currentTE = te;
  
  			reqs = _tocEntryRequired(te, ropt, false /* needn't drop ACLs */ );
! 			if (((reqs & (REQ_SCHEMA|REQ_DATA)) != 0) && te->dropStmt)
  			{
  				/* We want the schema */
  				ahlog(AH, 1, "dropping %s %s\n", te->desc, te->tag);
***************
*** 381,387 **** RestoreArchive(Archive *AHX, RestoreOptions *ropt)
  		/* Work out what, if anything, we want from this entry */
  		reqs = _tocEntryRequired(te, ropt, true);
  
! 		if ((reqs & REQ_SCHEMA) != 0)	/* We want the schema */
  		{
  			ahlog(AH, 1, "setting owner and privileges for %s %s\n",
  				  te->desc, te->tag);
--- 381,388 ----
  		/* Work out what, if anything, we want from this entry */
  		reqs = _tocEntryRequired(te, ropt, true);
  
! 		/* Access privileges for both schema and data */
! 		if ((reqs & (REQ_SCHEMA|REQ_DATA)) != 0)
  		{
  			ahlog(AH, 1, "setting owner and privileges for %s %s\n",
  				  te->desc, te->tag);
***************
*** 520,525 **** restore_toc_entry(ArchiveHandle *AH, TocEntry *te,
--- 521,527 ----
  				_printTocEntry(AH, te, ropt, true, false);
  
  				if (strcmp(te->desc, "BLOBS") == 0 ||
+ 					strcmp(te->desc, "BLOB DATA") == 0 ||
  					strcmp(te->desc, "BLOB COMMENTS") == 0)
  				{
  					ahlog(AH, 1, "restoring %s\n", te->desc);
***************
*** 903,909 **** EndRestoreBlobs(ArchiveHandle *AH)
   * Called by a format handler to initiate restoration of a blob
   */
  void
! StartRestoreBlob(ArchiveHandle *AH, Oid oid, bool drop)
  {
  	Oid			loOid;
  
--- 905,911 ----
   * Called by a format handler to initiate restoration of a blob
   */
  void
! StartRestoreBlob(ArchiveHandle *AH, Oid oid, bool drop, bool compat)
  {
  	Oid			loOid;
  
***************
*** 919,937 **** StartRestoreBlob(ArchiveHandle *AH, Oid oid, bool drop)
  
  	if (AH->connection)
  	{
! 		loOid = lo_create(AH->connection, oid);
! 		if (loOid == 0 || loOid != oid)
! 			die_horribly(AH, modulename, "could not create large object %u\n",
! 						 oid);
! 
  		AH->loFd = lo_open(AH->connection, oid, INV_WRITE);
  		if (AH->loFd == -1)
  			die_horribly(AH, modulename, "could not open large object\n");
  	}
  	else
  	{
! 		ahprintf(AH, "SELECT pg_catalog.lo_open(pg_catalog.lo_create('%u'), %d);\n",
! 				 oid, INV_WRITE);
  	}
  
  	AH->writingBlob = 1;
--- 921,944 ----
  
  	if (AH->connection)
  	{
! 		if (compat)
! 		{
! 			loOid = lo_create(AH->connection, oid);
! 			if (loOid == 0 || loOid != oid)
! 				die_horribly(AH, modulename, "could not create large object %u\n",
! 							 oid);
! 		}
  		AH->loFd = lo_open(AH->connection, oid, INV_WRITE);
  		if (AH->loFd == -1)
  			die_horribly(AH, modulename, "could not open large object\n");
  	}
  	else
  	{
! 		if (compat)
! 			ahprintf(AH, "SELECT pg_catalog.lo_open(pg_catalog.lo_create('%u'), %d);\n", oid, INV_WRITE);
! 		else
! 			ahprintf(AH, "SELECT pg_catalog.lo_open(%u, %d);\n",
! 					 oid, INV_WRITE);
  	}
  
  	AH->writingBlob = 1;
***************
*** 1940,1946 **** WriteDataChunks(ArchiveHandle *AH)
  			AH->currToc = te;
  			/* printf("Writing data for %d (%x)\n", te->id, te); */
  
! 			if (strcmp(te->desc, "BLOBS") == 0)
  			{
  				startPtr = AH->StartBlobsPtr;
  				endPtr = AH->EndBlobsPtr;
--- 1947,1954 ----
  			AH->currToc = te;
  			/* printf("Writing data for %d (%x)\n", te->id, te); */
  
! 			if (strcmp(te->desc, "BLOBS") == 0 ||
! 				strcmp(te->desc, "BLOB DATA") == 0)
  			{
  				startPtr = AH->StartBlobsPtr;
  				endPtr = AH->EndBlobsPtr;
***************
*** 2077,2082 **** ReadToc(ArchiveHandle *AH)
--- 2085,2091 ----
  				te->section = SECTION_NONE;
  			else if (strcmp(te->desc, "TABLE DATA") == 0 ||
  					 strcmp(te->desc, "BLOBS") == 0 ||
+ 					 strcmp(te->desc, "BLOB DATA") == 0 ||
  					 strcmp(te->desc, "BLOB COMMENTS") == 0)
  				te->section = SECTION_DATA;
  			else if (strcmp(te->desc, "CONSTRAINT") == 0 ||
***************
*** 2286,2294 **** _tocEntryRequired(TocEntry *te, RestoreOptions *ropt, bool include_acls)
  	if (!te->hadDumper)
  	{
  		/*
! 		 * Special Case: If 'SEQUENCE SET' then it is considered a data entry
  		 */
! 		if (strcmp(te->desc, "SEQUENCE SET") == 0)
  			res = res & REQ_DATA;
  		else
  			res = res & ~REQ_DATA;
--- 2295,2308 ----
  	if (!te->hadDumper)
  	{
  		/*
! 		 * Special Case: If 'SEQUENCE SET', 'BLOB ITEM' or 'ACL' for large
! 		 * objects, then it is considered a data entry
! 		 *
! 		 * XXX - we assume te->tag is not numeric except for large objects.
  		 */
! 		if (strcmp(te->desc, "SEQUENCE SET") == 0 ||
! 			strcmp(te->desc, "BLOB ITEM") == 0 ||
! 			(strcmp(te->desc, "ACL") == 0 && atooid(te->tag) > 0))
  			res = res & REQ_DATA;
  		else
  			res = res & ~REQ_DATA;
***************
*** 2713,2718 **** _getObjectDescription(PQExpBuffer buf, TocEntry *te, ArchiveHandle *AH)
--- 2727,2739 ----
  		return;
  	}
  
+ 	/* Use ALTER LARGE OBJECT for BLOB ITEM */
+ 	if (strcmp(type, "BLOB ITEM") == 0)
+ 	{
+ 		appendPQExpBuffer(buf, "LARGE OBJECT %s", te->tag);
+ 		return;
+ 	}
+ 
  	write_msg(modulename, "WARNING: don't know how to set owner for object type %s\n",
  			  type);
  }
***************
*** 2824,2829 **** _printTocEntry(ArchiveHandle *AH, TocEntry *te, RestoreOptions *ropt, bool isDat
--- 2845,2851 ----
  		strlen(te->owner) > 0 && strlen(te->dropStmt) > 0)
  	{
  		if (strcmp(te->desc, "AGGREGATE") == 0 ||
+ 			strcmp(te->desc, "BLOB ITEM") == 0 ||
  			strcmp(te->desc, "CONVERSION") == 0 ||
  			strcmp(te->desc, "DATABASE") == 0 ||
  			strcmp(te->desc, "DOMAIN") == 0 ||
*** a/src/bin/pg_dump/pg_backup_archiver.h
--- b/src/bin/pg_dump/pg_backup_archiver.h
***************
*** 359,365 **** int			ReadOffset(ArchiveHandle *, pgoff_t *);
  size_t		WriteOffset(ArchiveHandle *, pgoff_t, int);
  
  extern void StartRestoreBlobs(ArchiveHandle *AH);
! extern void StartRestoreBlob(ArchiveHandle *AH, Oid oid, bool drop);
  extern void EndRestoreBlob(ArchiveHandle *AH, Oid oid);
  extern void EndRestoreBlobs(ArchiveHandle *AH);
  
--- 359,365 ----
  size_t		WriteOffset(ArchiveHandle *, pgoff_t, int);
  
  extern void StartRestoreBlobs(ArchiveHandle *AH);
! extern void StartRestoreBlob(ArchiveHandle *AH, Oid oid, bool drop, bool compat);
  extern void EndRestoreBlob(ArchiveHandle *AH, Oid oid);
  extern void EndRestoreBlobs(ArchiveHandle *AH);
  
*** a/src/bin/pg_dump/pg_backup_custom.c
--- b/src/bin/pg_dump/pg_backup_custom.c
***************
*** 54,60 **** static void _StartBlobs(ArchiveHandle *AH, TocEntry *te);
  static void _StartBlob(ArchiveHandle *AH, TocEntry *te, Oid oid);
  static void _EndBlob(ArchiveHandle *AH, TocEntry *te, Oid oid);
  static void _EndBlobs(ArchiveHandle *AH, TocEntry *te);
! static void _LoadBlobs(ArchiveHandle *AH, bool drop);
  static void _Clone(ArchiveHandle *AH);
  static void _DeClone(ArchiveHandle *AH);
  
--- 54,60 ----
  static void _StartBlob(ArchiveHandle *AH, TocEntry *te, Oid oid);
  static void _EndBlob(ArchiveHandle *AH, TocEntry *te, Oid oid);
  static void _EndBlobs(ArchiveHandle *AH, TocEntry *te);
! static void _LoadBlobs(ArchiveHandle *AH, bool drop, bool compat);
  static void _Clone(ArchiveHandle *AH);
  static void _DeClone(ArchiveHandle *AH);
  
***************
*** 498,504 **** _PrintTocData(ArchiveHandle *AH, TocEntry *te, RestoreOptions *ropt)
  			break;
  
  		case BLK_BLOBS:
! 			_LoadBlobs(AH, ropt->dropSchema);
  			break;
  
  		default:				/* Always have a default */
--- 498,507 ----
  			break;
  
  		case BLK_BLOBS:
! 			if (strcmp(te->desc, "BLOBS") == 0)
! 				_LoadBlobs(AH, ropt->dropSchema, true);
! 			else
! 				_LoadBlobs(AH, false, false);
  			break;
  
  		default:				/* Always have a default */
***************
*** 619,625 **** _PrintData(ArchiveHandle *AH)
  }
  
  static void
! _LoadBlobs(ArchiveHandle *AH, bool drop)
  {
  	Oid			oid;
  
--- 622,628 ----
  }
  
  static void
! _LoadBlobs(ArchiveHandle *AH, bool drop, bool compat)
  {
  	Oid			oid;
  
***************
*** 628,634 **** _LoadBlobs(ArchiveHandle *AH, bool drop)
  	oid = ReadInt(AH);
  	while (oid != 0)
  	{
! 		StartRestoreBlob(AH, oid, drop);
  		_PrintData(AH);
  		EndRestoreBlob(AH, oid);
  		oid = ReadInt(AH);
--- 631,637 ----
  	oid = ReadInt(AH);
  	while (oid != 0)
  	{
! 		StartRestoreBlob(AH, oid, drop, compat);
  		_PrintData(AH);
  		EndRestoreBlob(AH, oid);
  		oid = ReadInt(AH);
*** a/src/bin/pg_dump/pg_backup_db.c
--- b/src/bin/pg_dump/pg_backup_db.c
***************
*** 12,17 ****
--- 12,18 ----
  
  #include "pg_backup_db.h"
  #include "dumputils.h"
+ #include "libpq/libpq-fs.h"
  
  #include <unistd.h>
  
***************
*** 656,672 **** void
  DropBlobIfExists(ArchiveHandle *AH, Oid oid)
  {
  	/* Call lo_unlink only if exists to avoid not-found error. */
! 	if (PQserverVersion(AH->connection) >= 80500)
! 	{
! 		ahprintf(AH, "SELECT pg_catalog.lo_unlink(oid) "
! 					 "FROM pg_catalog.pg_largeobject_metadata "
! 					 "WHERE oid = %u;\n", oid);
! 	}
! 	else
! 	{
! 		ahprintf(AH, "SELECT CASE WHEN EXISTS(SELECT 1 FROM pg_catalog.pg_largeobject WHERE loid = '%u') THEN pg_catalog.lo_unlink('%u') END;\n",
! 				 oid, oid);
! 	}
  }
  
  static bool
--- 657,670 ----
  DropBlobIfExists(ArchiveHandle *AH, Oid oid)
  {
  	/* Call lo_unlink only if exists to avoid not-found error. */
! 	if (AH->connection &&
! 		PQserverVersion(AH->connection) < 80500)
! 		die_horribly(AH, NULL,
! 					 "could not restore large object into older server\n");
! 
! 	ahprintf(AH, "SELECT pg_catalog.lo_unlink(oid) "
! 			 "FROM pg_catalog.pg_largeobject_metadata "
! 			 "WHERE oid = %u;\n", oid);
  }
  
  static bool
*** a/src/bin/pg_dump/pg_backup_files.c
--- b/src/bin/pg_dump/pg_backup_files.c
***************
*** 66,72 **** typedef struct
  } lclTocEntry;
  
  static const char *modulename = gettext_noop("file archiver");
! static void _LoadBlobs(ArchiveHandle *AH, RestoreOptions *ropt);
  static void _getBlobTocEntry(ArchiveHandle *AH, Oid *oid, char *fname);
  
  /*
--- 66,72 ----
  } lclTocEntry;
  
  static const char *modulename = gettext_noop("file archiver");
! static void _LoadBlobs(ArchiveHandle *AH, RestoreOptions *ropt, bool compat);
  static void _getBlobTocEntry(ArchiveHandle *AH, Oid *oid, char *fname);
  
  /*
***************
*** 330,336 **** _PrintTocData(ArchiveHandle *AH, TocEntry *te, RestoreOptions *ropt)
  		return;
  
  	if (strcmp(te->desc, "BLOBS") == 0)
! 		_LoadBlobs(AH, ropt);
  	else
  		_PrintFileData(AH, tctx->filename, ropt);
  }
--- 330,338 ----
  		return;
  
  	if (strcmp(te->desc, "BLOBS") == 0)
! 		_LoadBlobs(AH, ropt, true);
! 	else if (strcmp(te->desc, "BLOB DATA") == 0)
! 		_LoadBlobs(AH, ropt, false);
  	else
  		_PrintFileData(AH, tctx->filename, ropt);
  }
***************
*** 365,374 **** _getBlobTocEntry(ArchiveHandle *AH, Oid *oid, char fname[K_STD_BUF_SIZE])
  }
  
  static void
! _LoadBlobs(ArchiveHandle *AH, RestoreOptions *ropt)
  {
  	Oid			oid;
  	lclContext *ctx = (lclContext *) AH->formatData;
  	char		fname[K_STD_BUF_SIZE];
  
  	StartRestoreBlobs(AH);
--- 367,377 ----
  }
  
  static void
! _LoadBlobs(ArchiveHandle *AH, RestoreOptions *ropt, bool compat)
  {
  	Oid			oid;
  	lclContext *ctx = (lclContext *) AH->formatData;
+ 	bool		drop = (compat ? ropt->dropSchema : false);
  	char		fname[K_STD_BUF_SIZE];
  
  	StartRestoreBlobs(AH);
***************
*** 382,388 **** _LoadBlobs(ArchiveHandle *AH, RestoreOptions *ropt)
  
  	while (oid != 0)
  	{
! 		StartRestoreBlob(AH, oid, ropt->dropSchema);
  		_PrintFileData(AH, fname, ropt);
  		EndRestoreBlob(AH, oid);
  		_getBlobTocEntry(AH, &oid, fname);
--- 385,391 ----
  
  	while (oid != 0)
  	{
! 		StartRestoreBlob(AH, oid, drop, compat);
  		_PrintFileData(AH, fname, ropt);
  		EndRestoreBlob(AH, oid);
  		_getBlobTocEntry(AH, &oid, fname);
*** a/src/bin/pg_dump/pg_backup_null.c
--- b/src/bin/pg_dump/pg_backup_null.c
***************
*** 147,160 **** _StartBlobs(ArchiveHandle *AH, TocEntry *te)
  static void
  _StartBlob(ArchiveHandle *AH, TocEntry *te, Oid oid)
  {
  	if (oid == 0)
  		die_horribly(AH, NULL, "invalid OID for large object\n");
  
! 	if (AH->ropt->dropSchema)
  		DropBlobIfExists(AH, oid);
  
! 	ahprintf(AH, "SELECT pg_catalog.lo_open(pg_catalog.lo_create('%u'), %d);\n",
! 			 oid, INV_WRITE);
  
  	AH->WriteDataPtr = _WriteBlobData;
  }
--- 147,165 ----
  static void
  _StartBlob(ArchiveHandle *AH, TocEntry *te, Oid oid)
  {
+ 	bool	compat = (strcmp(te->desc, "BLOBS") == 0);
+ 
  	if (oid == 0)
  		die_horribly(AH, NULL, "invalid OID for large object\n");
  
! 	if (compat && AH->ropt->dropSchema)
  		DropBlobIfExists(AH, oid);
  
! 	if (compat)
! 		ahprintf(AH, "SELECT pg_catalog.lo_open(pg_catalog.lo_create('%u'), %d);\n",
! 				 oid, INV_WRITE);
! 	else
! 		ahprintf(AH, "SELECT pg_catalog.lo_open(%u, %d);\n", oid, INV_WRITE);
  
  	AH->WriteDataPtr = _WriteBlobData;
  }
***************
*** 195,206 **** _PrintTocData(ArchiveHandle *AH, TocEntry *te, RestoreOptions *ropt)
  	{
  		AH->currToc = te;
  
! 		if (strcmp(te->desc, "BLOBS") == 0)
  			_StartBlobs(AH, te);
  
  		(*te->dataDumper) ((Archive *) AH, te->dataDumperArg);
  
! 		if (strcmp(te->desc, "BLOBS") == 0)
  			_EndBlobs(AH, te);
  
  		AH->currToc = NULL;
--- 200,213 ----
  	{
  		AH->currToc = te;
  
! 		if (strcmp(te->desc, "BLOBS") == 0 ||
! 			strcmp(te->desc, "BLOB DATA") == 0)
  			_StartBlobs(AH, te);
  
  		(*te->dataDumper) ((Archive *) AH, te->dataDumperArg);
  
! 		if (strcmp(te->desc, "BLOBS") == 0 ||
! 			strcmp(te->desc, "BLOB DATA") == 0)
  			_EndBlobs(AH, te);
  
  		AH->currToc = NULL;
*** a/src/bin/pg_dump/pg_backup_tar.c
--- b/src/bin/pg_dump/pg_backup_tar.c
***************
*** 100,106 **** typedef struct
  
  static const char *modulename = gettext_noop("tar archiver");
  
! static void _LoadBlobs(ArchiveHandle *AH, RestoreOptions *ropt);
  
  static TAR_MEMBER *tarOpen(ArchiveHandle *AH, const char *filename, char mode);
  static void tarClose(ArchiveHandle *AH, TAR_MEMBER *TH);
--- 100,106 ----
  
  static const char *modulename = gettext_noop("tar archiver");
  
! static void _LoadBlobs(ArchiveHandle *AH, RestoreOptions *ropt, bool compat);
  
  static TAR_MEMBER *tarOpen(ArchiveHandle *AH, const char *filename, char mode);
  static void tarClose(ArchiveHandle *AH, TAR_MEMBER *TH);
***************
*** 696,714 **** _PrintTocData(ArchiveHandle *AH, TocEntry *te, RestoreOptions *ropt)
  	}
  
  	if (strcmp(te->desc, "BLOBS") == 0)
! 		_LoadBlobs(AH, ropt);
  	else
  		_PrintFileData(AH, tctx->filename, ropt);
  }
  
  static void
! _LoadBlobs(ArchiveHandle *AH, RestoreOptions *ropt)
  {
  	Oid			oid;
  	lclContext *ctx = (lclContext *) AH->formatData;
  	TAR_MEMBER *th;
  	size_t		cnt;
  	bool		foundBlob = false;
  	char		buf[4096];
  
  	StartRestoreBlobs(AH);
--- 696,717 ----
  	}
  
  	if (strcmp(te->desc, "BLOBS") == 0)
! 		_LoadBlobs(AH, ropt, true);
! 	else if (strcmp(te->desc, "BLOB DATA") == 0)
! 		_LoadBlobs(AH, ropt, false);
  	else
  		_PrintFileData(AH, tctx->filename, ropt);
  }
  
  static void
! _LoadBlobs(ArchiveHandle *AH, RestoreOptions *ropt, bool compat)
  {
  	Oid			oid;
  	lclContext *ctx = (lclContext *) AH->formatData;
  	TAR_MEMBER *th;
  	size_t		cnt;
  	bool		foundBlob = false;
+ 	bool		drop = (compat ? ropt->dropSchema : false);
  	char		buf[4096];
  
  	StartRestoreBlobs(AH);
***************
*** 725,731 **** _LoadBlobs(ArchiveHandle *AH, RestoreOptions *ropt)
  			{
  				ahlog(AH, 1, "restoring large object OID %u\n", oid);
  
! 				StartRestoreBlob(AH, oid, ropt->dropSchema);
  
  				while ((cnt = tarRead(buf, 4095, th)) > 0)
  				{
--- 728,734 ----
  			{
  				ahlog(AH, 1, "restoring large object OID %u\n", oid);
  
! 				StartRestoreBlob(AH, oid, drop, compat);
  
  				while ((cnt = tarRead(buf, 4095, th)) > 0)
  				{
*** a/src/bin/pg_dump/pg_dump.c
--- b/src/bin/pg_dump/pg_dump.c
***************
*** 190,198 **** static void selectSourceSchema(const char *schemaName);
  static char *getFormattedTypeName(Oid oid, OidOptions opts);
  static char *myFormatType(const char *typname, int32 typmod);
  static const char *fmtQualifiedId(const char *schema, const char *id);
! static bool hasBlobs(Archive *AH);
! static int	dumpBlobs(Archive *AH, void *arg);
! static int	dumpBlobComments(Archive *AH, void *arg);
  static void dumpDatabase(Archive *AH);
  static void dumpEncoding(Archive *AH);
  static void dumpStdStrings(Archive *AH);
--- 190,198 ----
  static char *getFormattedTypeName(Oid oid, OidOptions opts);
  static char *myFormatType(const char *typname, int32 typmod);
  static const char *fmtQualifiedId(const char *schema, const char *id);
! static void getBlobs(Archive *AH);
! static void dumpBlobItem(Archive *AH, BlobInfo *binfo);
! static int  dumpBlobData(Archive *AH, void *arg);
  static void dumpDatabase(Archive *AH);
  static void dumpEncoding(Archive *AH);
  static void dumpStdStrings(Archive *AH);
***************
*** 701,725 **** main(int argc, char **argv)
  			getTableDataFKConstraints();
  	}
  
! 	if (outputBlobs && hasBlobs(g_fout))
! 	{
! 		/* Add placeholders to allow correct sorting of blobs */
! 		DumpableObject *blobobj;
! 		DumpableObject *blobcobj;
! 
! 		blobobj = (DumpableObject *) malloc(sizeof(DumpableObject));
! 		blobobj->objType = DO_BLOBS;
! 		blobobj->catId = nilCatalogId;
! 		AssignDumpId(blobobj);
! 		blobobj->name = strdup("BLOBS");
! 
! 		blobcobj = (DumpableObject *) malloc(sizeof(DumpableObject));
! 		blobcobj->objType = DO_BLOB_COMMENTS;
! 		blobcobj->catId = nilCatalogId;
! 		AssignDumpId(blobcobj);
! 		blobcobj->name = strdup("BLOB COMMENTS");
! 		addObjectDependency(blobcobj, blobobj->dumpId);
! 	}
  
  	/*
  	 * Collect dependency data to assist in ordering the objects.
--- 701,708 ----
  			getTableDataFKConstraints();
  	}
  
! 	if (outputBlobs)
! 		getBlobs(g_fout);
  
  	/*
  	 * Collect dependency data to assist in ordering the objects.
***************
*** 1938,1980 **** dumpStdStrings(Archive *AH)
  
  
  /*
!  * hasBlobs:
   *	Test whether database contains any large objects
   */
! static bool
! hasBlobs(Archive *AH)
  {
! 	bool		result;
! 	const char *blobQry;
! 	PGresult   *res;
  
  	/* Make sure we are in proper schema */
  	selectSourceSchema("pg_catalog");
  
  	/* Check for BLOB OIDs */
  	if (AH->remoteVersion >= 80500)
! 		blobQry = "SELECT oid FROM pg_largeobject_metadata LIMIT 1";
  	else if (AH->remoteVersion >= 70100)
! 		blobQry = "SELECT loid FROM pg_largeobject LIMIT 1";
  	else
! 		blobQry = "SELECT oid FROM pg_class WHERE relkind = 'l' LIMIT 1";
  
! 	res = PQexec(g_conn, blobQry);
! 	check_sql_result(res, g_conn, blobQry, PGRES_TUPLES_OK);
  
! 	result = PQntuples(res) > 0;
  
  	PQclear(res);
  
! 	return result;
  }
  
  /*
!  * dumpBlobs:
!  *	dump all blobs
   */
  static int
! dumpBlobs(Archive *AH, void *arg)
  {
  	const char *blobQry;
  	const char *blobFetchQry;
--- 1921,2066 ----
  
  
  /*
!  * getBlobs:
   *	Test whether database contains any large objects
   */
! static void
! getBlobs(Archive *AH)
  {
! 	PQExpBuffer		blobQry = createPQExpBuffer();
! 	BlobInfo	   *binfo;
! 	DumpableObject *bdata;
! 	PGresult	   *res;
! 	int				i;
! 
! 	/* Verbose message */
! 	if (g_verbose)
! 		write_msg(NULL, "reading binary large objects\n");
  
  	/* Make sure we are in proper schema */
  	selectSourceSchema("pg_catalog");
  
  	/* Check for BLOB OIDs */
  	if (AH->remoteVersion >= 80500)
! 		appendPQExpBuffer(blobQry,
! 						  "SELECT oid, (%s lomowner), lomacl,"
! 						  " obj_description(oid, 'pg_largeobject')"
! 						  " FROM pg_largeobject_metadata",
! 						  username_subquery);
! 	else if (AH->remoteVersion >= 70200)
! 		appendPQExpBuffer(blobQry,
! 						  "SELECT DISTINCT loid, NULL, NULL,"
! 						  " obj_description(loid, 'pg_largeobject')"
! 						  " FROM pg_largeobject");
  	else if (AH->remoteVersion >= 70100)
! 		appendPQExpBuffer(blobQry,
! 						  "SELECT DISTINCT loid, NULL, NULL,"
! 						  " obj_description(loid)"
! 						  " FROM pg_largeobject");
  	else
! 		appendPQExpBuffer(blobQry,
! 						  "SELECT DISTINCT oid, NULL, NULL,"
! 						  " obj_description(oid)"
! 						  " FROM pg_class WHERE relkind = 'l'");
  
! 	res = PQexec(g_conn, blobQry->data);
! 	check_sql_result(res, g_conn, blobQry->data, PGRES_TUPLES_OK);
! 
! 	/*
! 	 * Each large object now has its own "BLOB ITEM" entry to
! 	 * declare itself.
! 	 */
! 	for (i = 0; i < PQntuples(res); i++)
! 	{
! 		binfo = (BlobInfo *) malloc(sizeof(BlobInfo));
! 		binfo->dobj.objType = DO_BLOB_ITEM;
! 		binfo->dobj.catId = nilCatalogId;
! 		AssignDumpId(&binfo->dobj);
  
! 		binfo->dobj.name = strdup(PQgetvalue(res, i, 0));
! 		binfo->rolname = strdup(PQgetvalue(res, i, 1));
! 		binfo->blobacl = strdup(PQgetvalue(res, i, 2));
! 		binfo->blobdescr = strdup(PQgetvalue(res, i, 3));
! 	}
! 
! 	/*
! 	 * If there is at least one large object, a "BLOB DATA" entry
! 	 * is also necessary.
! 	 */
! 	if (PQntuples(res) > 0)
! 	{
! 		bdata = (DumpableObject *) malloc(sizeof(DumpableObject));
! 		bdata->objType = DO_BLOB_DATA;
! 		bdata->catId = nilCatalogId;
! 		AssignDumpId(bdata);
! 		bdata->name = strdup("BLOBS");
! 	}
  
  	PQclear(res);
+ }
  
! /*
!  * dumpBlobItem
!  *
!  * dump a definition of the given large object
!  */
! static void
! dumpBlobItem(Archive *AH, BlobInfo *binfo)
! {
! 	PQExpBuffer		aquery = createPQExpBuffer();
! 	PQExpBuffer		bquery = createPQExpBuffer();
! 	PQExpBuffer		dquery = createPQExpBuffer();
! 
! 	/*
! 	 * Cleanup a large object
! 	 */
! 	appendPQExpBuffer(dquery, "SELECT lo_unlink(%s);\n", binfo->dobj.name);
! 
! 	/*
! 	 * Create an empty large object
! 	 */
! 	appendPQExpBuffer(bquery, "SELECT lo_create(%s);\n", binfo->dobj.name);
! 
! 	/*
! 	 * Create a comment on large object, if necessary
! 	 */
! 	if (strlen(binfo->blobdescr) > 0)
! 	{
! 		appendPQExpBuffer(bquery, "\nCOMMENT ON LARGE OBJECT %s IS ",
! 						  binfo->dobj.name);
! 		appendStringLiteralAH(bquery, binfo->blobdescr, AH);
! 		appendPQExpBuffer(bquery, ";\n");
! 	}
! 
! 	ArchiveEntry(AH, binfo->dobj.catId, binfo->dobj.dumpId,
! 				 binfo->dobj.name,
! 				 NULL, NULL,
! 				 binfo->rolname, false,
! 				 "BLOB ITEM", SECTION_DATA,
! 				 bquery->data, dquery->data, NULL,
! 				 binfo->dobj.dependencies, binfo->dobj.nDeps,
! 				 NULL, NULL);
! 
! 	/*
! 	 * Dump access privileges, if necessary
! 	 */
! 	dumpACL(AH, binfo->dobj.catId, binfo->dobj.dumpId,
! 			"LARGE OBJECT",
! 			binfo->dobj.name, NULL,
! 			binfo->dobj.name, NULL,
! 			binfo->rolname, binfo->blobacl);
! 
! 	destroyPQExpBuffer(aquery);
! 	destroyPQExpBuffer(bquery);
! 	destroyPQExpBuffer(dquery);
  }
  
  /*
!  * dumpBlobData:
!  *	dump all the data contents of large object
   */
  static int
! dumpBlobData(Archive *AH, void *arg)
  {
  	const char *blobQry;
  	const char *blobFetchQry;
***************
*** 2022,2028 **** dumpBlobs(Archive *AH, void *arg)
  			loFd = lo_open(g_conn, blobOid, INV_READ);
  			if (loFd == -1)
  			{
! 				write_msg(NULL, "dumpBlobs(): could not open large object: %s",
  						  PQerrorMessage(g_conn));
  				exit_nicely();
  			}
--- 2108,2114 ----
  			loFd = lo_open(g_conn, blobOid, INV_READ);
  			if (loFd == -1)
  			{
! 				write_msg(NULL, "dumpBlobData(): could not open large object: %s",
  						  PQerrorMessage(g_conn));
  				exit_nicely();
  			}
***************
*** 2035,2041 **** dumpBlobs(Archive *AH, void *arg)
  				cnt = lo_read(g_conn, loFd, buf, LOBBUFSIZE);
  				if (cnt < 0)
  				{
! 					write_msg(NULL, "dumpBlobs(): error reading large object: %s",
  							  PQerrorMessage(g_conn));
  					exit_nicely();
  				}
--- 2121,2127 ----
  				cnt = lo_read(g_conn, loFd, buf, LOBBUFSIZE);
  				if (cnt < 0)
  				{
! 					write_msg(NULL, "dumpBlobData(): error reading large object: %s",
  							  PQerrorMessage(g_conn));
  					exit_nicely();
  				}
***************
*** 2054,2187 **** dumpBlobs(Archive *AH, void *arg)
  	return 1;
  }
  
- /*
-  * dumpBlobComments
-  *	dump all blob properties.
-  *  It has "BLOB COMMENTS" tag due to the historical reason, but note
-  *  that it is the routine to dump all the properties of blobs.
-  *
-  * Since we don't provide any way to be selective about dumping blobs,
-  * there's no need to be selective about their comments either.  We put
-  * all the comments into one big TOC entry.
-  */
- static int
- dumpBlobComments(Archive *AH, void *arg)
- {
- 	const char *blobQry;
- 	const char *blobFetchQry;
- 	PQExpBuffer cmdQry = createPQExpBuffer();
- 	PGresult   *res;
- 	int			i;
- 
- 	if (g_verbose)
- 		write_msg(NULL, "saving large object properties\n");
- 
- 	/* Make sure we are in proper schema */
- 	selectSourceSchema("pg_catalog");
- 
- 	/* Cursor to get all BLOB comments */
- 	if (AH->remoteVersion >= 80500)
- 		blobQry = "DECLARE blobcmt CURSOR FOR SELECT oid, "
- 			"obj_description(oid, 'pg_largeobject'), "
- 			"pg_get_userbyid(lomowner), lomacl "
- 			"FROM pg_largeobject_metadata";
- 	else if (AH->remoteVersion >= 70300)
- 		blobQry = "DECLARE blobcmt CURSOR FOR SELECT loid, "
- 			"obj_description(loid, 'pg_largeobject'), NULL, NULL "
- 			"FROM (SELECT DISTINCT loid FROM "
- 			"pg_description d JOIN pg_largeobject l ON (objoid = loid) "
- 			"WHERE classoid = 'pg_largeobject'::regclass) ss";
- 	else if (AH->remoteVersion >= 70200)
- 		blobQry = "DECLARE blobcmt CURSOR FOR SELECT loid, "
- 			"obj_description(loid, 'pg_largeobject'), NULL, NULL "
- 			"FROM (SELECT DISTINCT loid FROM pg_largeobject) ss";
- 	else if (AH->remoteVersion >= 70100)
- 		blobQry = "DECLARE blobcmt CURSOR FOR SELECT loid, "
- 			"obj_description(loid), NULL, NULL "
- 			"FROM (SELECT DISTINCT loid FROM pg_largeobject) ss";
- 	else
- 		blobQry = "DECLARE blobcmt CURSOR FOR SELECT oid, "
- 			"	( "
- 			"		SELECT description "
- 			"		FROM pg_description pd "
- 			"		WHERE pd.objoid=pc.oid "
- 			"	), NULL, NULL "
- 			"FROM pg_class pc WHERE relkind = 'l'";
- 
- 	res = PQexec(g_conn, blobQry);
- 	check_sql_result(res, g_conn, blobQry, PGRES_COMMAND_OK);
- 
- 	/* Command to fetch from cursor */
- 	blobFetchQry = "FETCH 100 IN blobcmt";
- 
- 	do
- 	{
- 		PQclear(res);
- 
- 		/* Do a fetch */
- 		res = PQexec(g_conn, blobFetchQry);
- 		check_sql_result(res, g_conn, blobFetchQry, PGRES_TUPLES_OK);
- 
- 		/* Process the tuples, if any */
- 		for (i = 0; i < PQntuples(res); i++)
- 		{
- 			Oid			blobOid = atooid(PQgetvalue(res, i, 0));
- 			char	   *lo_comment = PQgetvalue(res, i, 1);
- 			char	   *lo_owner = PQgetvalue(res, i, 2);
- 			char	   *lo_acl = PQgetvalue(res, i, 3);
- 			char		lo_name[32];
- 
- 			resetPQExpBuffer(cmdQry);
- 
- 			/* comment on the blob */
- 			if (!PQgetisnull(res, i, 1))
- 			{
- 				appendPQExpBuffer(cmdQry,
- 								  "COMMENT ON LARGE OBJECT %u IS ", blobOid);
- 				appendStringLiteralAH(cmdQry, lo_comment, AH);
- 				appendPQExpBuffer(cmdQry, ";\n");
- 			}
- 
- 			/* dump blob ownership, if necessary */
- 			if (!PQgetisnull(res, i, 2))
- 			{
- 				appendPQExpBuffer(cmdQry,
- 								  "ALTER LARGE OBJECT %u OWNER TO %s;\n",
- 								  blobOid, lo_owner);
- 			}
- 
- 			/* dump blob privileges, if necessary */
- 			if (!PQgetisnull(res, i, 3) &&
- 				!dataOnly && !aclsSkip)
- 			{
- 				snprintf(lo_name, sizeof(lo_name), "%u", blobOid);
- 				if (!buildACLCommands(lo_name, NULL, "LARGE OBJECT",
- 									  lo_acl, lo_owner, "",
- 									  AH->remoteVersion, cmdQry))
- 				{
- 					write_msg(NULL, "could not parse ACL (%s) for "
- 							  "large object %u", lo_acl, blobOid);
- 					exit_nicely();
- 				}
- 			}
- 
- 			if (cmdQry->len > 0)
- 			{
- 				appendPQExpBuffer(cmdQry, "\n");
- 				archputs(cmdQry->data, AH);
- 			}
- 		}
- 	} while (PQntuples(res) > 0);
- 
- 	PQclear(res);
- 
- 	archputs("\n", AH);
- 
- 	destroyPQExpBuffer(cmdQry);
- 
- 	return 1;
- }
- 
  static void
  binary_upgrade_set_type_oids_by_type_oid(PQExpBuffer upgrade_buffer,
  											   Oid pg_type_oid)
--- 2140,2145 ----
***************
*** 6524,6544 **** dumpDumpableObject(Archive *fout, DumpableObject *dobj)
  		case DO_DEFAULT_ACL:
  			dumpDefaultACL(fout, (DefaultACLInfo *) dobj);
  			break;
! 		case DO_BLOBS:
! 			ArchiveEntry(fout, dobj->catId, dobj->dumpId,
! 						 dobj->name, NULL, NULL, "",
! 						 false, "BLOBS", SECTION_DATA,
! 						 "", "", NULL,
! 						 dobj->dependencies, dobj->nDeps,
! 						 dumpBlobs, NULL);
  			break;
! 		case DO_BLOB_COMMENTS:
  			ArchiveEntry(fout, dobj->catId, dobj->dumpId,
  						 dobj->name, NULL, NULL, "",
! 						 false, "BLOB COMMENTS", SECTION_DATA,
  						 "", "", NULL,
  						 dobj->dependencies, dobj->nDeps,
! 						 dumpBlobComments, NULL);
  			break;
  	}
  }
--- 6482,6497 ----
  		case DO_DEFAULT_ACL:
  			dumpDefaultACL(fout, (DefaultACLInfo *) dobj);
  			break;
! 		case DO_BLOB_ITEM:
! 			dumpBlobItem(fout, (BlobInfo *) dobj);
  			break;
! 		case DO_BLOB_DATA:
  			ArchiveEntry(fout, dobj->catId, dobj->dumpId,
  						 dobj->name, NULL, NULL, "",
! 						 false, "BLOB DATA", SECTION_DATA,
  						 "", "", NULL,
  						 dobj->dependencies, dobj->nDeps,
! 						 dumpBlobData, NULL);
  			break;
  	}
  }
***************
*** 10394,10401 **** dumpACL(Archive *fout, CatalogId objCatId, DumpId objDumpId,
  {
  	PQExpBuffer sql;
  
! 	/* Do nothing if ACL dump is not enabled */
! 	if (dataOnly || aclsSkip)
  		return;
  
  	sql = createPQExpBuffer();
--- 10347,10359 ----
  {
  	PQExpBuffer sql;
  
! 	/*
! 	 * Do nothing if ACL dump is not enabled
! 	 *
! 	 * Note that the caller has to check whether ACLs need to be
! 	 * dumped, depending on --data-only / --schema-only.
! 	 */
! 	if (aclsSkip)
  		return;
  
  	sql = createPQExpBuffer();
*** a/src/bin/pg_dump/pg_dump.h
--- b/src/bin/pg_dump/pg_dump.h
***************
*** 115,122 **** typedef enum
  	DO_FDW,
  	DO_FOREIGN_SERVER,
  	DO_DEFAULT_ACL,
! 	DO_BLOBS,
! 	DO_BLOB_COMMENTS
  } DumpableObjectType;
  
  typedef struct _dumpableObject
--- 115,122 ----
  	DO_FDW,
  	DO_FOREIGN_SERVER,
  	DO_DEFAULT_ACL,
! 	DO_BLOB_DATA,
! 	DO_BLOB_ITEM
  } DumpableObjectType;
  
  typedef struct _dumpableObject
***************
*** 443,448 **** typedef struct _defaultACLInfo
--- 443,456 ----
  	char	   *defaclacl;
  } DefaultACLInfo;
  
+ typedef struct _blobInfo
+ {
+ 	DumpableObject	dobj;
+ 	char	   *rolname;
+ 	char	   *blobacl;
+ 	char	   *blobdescr;
+ } BlobInfo;
+ 
  /* global decls */
  extern bool force_quotes;		/* double-quotes for identifiers flag */
  extern bool g_verbose;			/* verbose flag */
*** a/src/bin/pg_dump/pg_dump_sort.c
--- b/src/bin/pg_dump/pg_dump_sort.c
***************
*** 92,99 **** static const int newObjectTypePriority[] =
  	14,							/* DO_FDW */
  	15,							/* DO_FOREIGN_SERVER */
  	27,							/* DO_DEFAULT_ACL */
! 	20,							/* DO_BLOBS */
! 	21							/* DO_BLOB_COMMENTS */
  };
  
  
--- 92,99 ----
  	14,							/* DO_FDW */
  	15,							/* DO_FOREIGN_SERVER */
  	27,							/* DO_DEFAULT_ACL */
! 	21,							/* DO_BLOB_DATA */
! 	20,							/* DO_BLOB_ITEM */
  };
  
  
***************
*** 1146,1159 **** describeDumpableObject(DumpableObject *obj, char *buf, int bufsize)
  					 "DEFAULT ACL %s  (ID %d OID %u)",
  					 obj->name, obj->dumpId, obj->catId.oid);
  			return;
! 		case DO_BLOBS:
  			snprintf(buf, bufsize,
! 					 "BLOBS  (ID %d)",
  					 obj->dumpId);
  			return;
! 		case DO_BLOB_COMMENTS:
  			snprintf(buf, bufsize,
! 					 "BLOB COMMENTS  (ID %d)",
  					 obj->dumpId);
  			return;
  	}
--- 1146,1159 ----
  					 "DEFAULT ACL %s  (ID %d OID %u)",
  					 obj->name, obj->dumpId, obj->catId.oid);
  			return;
! 		case DO_BLOB_DATA:
  			snprintf(buf, bufsize,
! 					 "BLOB DATA  (ID %d)",
  					 obj->dumpId);
  			return;
! 		case DO_BLOB_ITEM:
  			snprintf(buf, bufsize,
! 					 "BLOB ITEM  (ID %d)",
  					 obj->dumpId);
  			return;
  	}
#118Takahiro Itagaki
itagaki.takahiro@oss.ntt.co.jp
In reply to: KaiGai Kohei (#117)
Re: Largeobject Access Controls (r2460)

KaiGai Kohei <kaigai@ak.jp.nec.com> wrote:

I don't think this is necessarily a good idea. We might decide to treat
both things separately in the future, and having them represented
separately in the dump would prove useful.

I agree. From a design perspective, the single-section approach is
simpler than the dual-section one, but its change set is larger.

OK.

When I tested a custom dump with pg_restore, --clean & --single-transaction
will fail with the new dump format, because it always calls lo_unlink()
even if the large object doesn't exist. The call comes from dumpBlobItem:

! dumpBlobItem(Archive *AH, BlobInfo *binfo)
! appendPQExpBuffer(dquery, "SELECT lo_unlink(%s);\n", binfo->dobj.name);

The query in DropBlobIfExists() could avoid errors -- should we use it here?
| SELECT lo_unlink(oid) FROM pg_largeobject_metadata WHERE oid = %s;
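
For illustration (a sketch with a hypothetical OID 16384, not code from
the patch), the difference between the two forms is:

    -- raises "large object 16384 does not exist" when the object is
    -- absent, which aborts a --single-transaction restore
    SELECT pg_catalog.lo_unlink(16384);

    -- returns zero rows instead of raising an error when it is absent
    SELECT pg_catalog.lo_unlink(oid)
      FROM pg_catalog.pg_largeobject_metadata
     WHERE oid = 16384;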

BTW, the --clean option is ambiguous if combined with --data-only. Restoring
large objects fails for the above reason if the previous objects don't exist,
but table data are restored *without* truncating the existing data. Would
normal users expect TRUNCATE-before-load for the --clean & --data-only case?

The present behaviors are:
Table data - Appended (--clean is ignored).
Large objects - Ends with an error if the object doesn't exist.
IMO, the ideal behaviors are:
Table data - Truncate the existing data and load the new rows.
Large objects - Work like MERGE (or REPLACE, UPSERT).
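
As a rough sketch of those ideal behaviors (hypothetical table name and
OID; this is not what pg_restore emits today), a --clean & --data-only
script would look like:

    -- table data: truncate instead of appending
    TRUNCATE TABLE public.mytab;
    COPY public.mytab (id, val) FROM stdin;
    ...
    \.

    -- large object: unlink only if it already exists, then recreate it
    SELECT pg_catalog.lo_unlink(oid)
      FROM pg_catalog.pg_largeobject_metadata WHERE oid = 16384;
    SELECT pg_catalog.lo_create(16384);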

Comments?

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center

#119KaiGai Kohei
kaigai@kaigai.gr.jp
In reply to: Takahiro Itagaki (#118)
Re: Largeobject Access Controls (r2460)

(2010/02/09 20:16), Takahiro Itagaki wrote:

KaiGai Kohei<kaigai@ak.jp.nec.com> wrote:

I don't think this is necessarily a good idea. We might decide to treat
both things separately in the future, and having them represented
separately in the dump would prove useful.

I agree. From a design perspective, the single-section approach is
simpler than the dual-section one, but its change set is larger.

OK.

When I tested a custom dump with pg_restore, --clean & --single-transaction
will fail with the new dump format, because it always calls lo_unlink()
even if the large object doesn't exist. The call comes from dumpBlobItem:

! dumpBlobItem(Archive *AH, BlobInfo *binfo)
! appendPQExpBuffer(dquery, "SELECT lo_unlink(%s);\n", binfo->dobj.name);

The query in DropBlobIfExists() could avoid errors -- should we use it here?
| SELECT lo_unlink(oid) FROM pg_largeobject_metadata WHERE oid = %s;

Yes, we can use this query to handle the --clean option.
I'll fix it soon.

BTW, the --clean option is ambiguous if combined with --data-only. Restoring
large objects fails for the above reason if the previous objects don't exist,
but table data are restored *without* truncating the existing data. Would
normal users expect TRUNCATE-before-load for the --clean & --data-only case?

The present behaviors are:
Table data - Appended (--clean is ignored).
Large objects - Ends with an error if the object doesn't exist.
IMO, the ideal behaviors are:
Table data - Truncate the existing data and load the new rows.
Large objects - Work like MERGE (or REPLACE, UPSERT).

Comments?

The existing "BLOBS" section creates and restores large objects at the
same time. It also unlinks each existing large object (if any) just
before restoring it, when --clean is given.
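
Roughly, the script emitted by that path looks like this for a
hypothetical OID 16384 (131072 is INV_WRITE; the lo_unlink is present
only when --clean is given):

    SELECT pg_catalog.lo_unlink(oid)
      FROM pg_catalog.pg_largeobject_metadata WHERE oid = 16384;
    SELECT pg_catalog.lo_open(pg_catalog.lo_create('16384'), 131072);
    SELECT pg_catalog.lowrite(0, '...');  -- repeated for each data chunk
    SELECT pg_catalog.lo_close(0);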

In my opinion, when --clean is given, it should also truncate the table
before restoring, even if --data-only is given as well.

Thanks,
--
KaiGai Kohei <kaigai@kaigai.gr.jp>

#120KaiGai Kohei
kaigai@ak.jp.nec.com
In reply to: KaiGai Kohei (#119)
1 attachment(s)
Re: Largeobject Access Controls (r2460)

(2010/02/09 21:18), KaiGai Kohei wrote:

(2010/02/09 20:16), Takahiro Itagaki wrote:

KaiGai Kohei<kaigai@ak.jp.nec.com> wrote:

I don't think this is necessarily a good idea. We might decide to treat
both things separately in the future, and having them represented
separately in the dump would prove useful.

I agree. From a design perspective, the single-section approach is
simpler than the dual-section one, but its change set is larger.

OK.

When I tested a custom dump with pg_restore, --clean &
--single-transaction will fail with the new dump format, because it
always calls lo_unlink() even if the large object doesn't exist. The
call comes from dumpBlobItem:

! dumpBlobItem(Archive *AH, BlobInfo *binfo)
! appendPQExpBuffer(dquery, "SELECT lo_unlink(%s);\n", binfo->dobj.name);

The query in DropBlobIfExists() could avoid errors -- should we use it
here?
| SELECT lo_unlink(oid) FROM pg_largeobject_metadata WHERE oid = %s;

Yes, we can use this query to handle the --clean option.
I'll fix it soon.

The attached patch fixed up the cleanup query as follows:

+   appendPQExpBuffer(dquery,
+                     "SELECT pg_catalog.lo_unlink(oid) "
+                     "FROM pg_catalog.pg_largeobject_metadata "
+                     "WHERE oid = %s;\n", binfo->dobj.name);

I also noticed that lo_create() was not prefixed with "pg_catalog.",
so I added it as well.

The rest of the patch is unchanged.
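
With these fixes, the "BLOB ITEM" part of a plain-text dump should come
out roughly as follows (hypothetical OID 16384 carrying a comment; the
lo_unlink query is the entry's drop statement, so pg_restore emits it
only under --clean):

    SELECT pg_catalog.lo_unlink(oid)
      FROM pg_catalog.pg_largeobject_metadata
     WHERE oid = 16384;

    SELECT pg_catalog.lo_create(16384);

    COMMENT ON LARGE OBJECT 16384 IS 'sample comment';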

Thanks,

BTW, the --clean option is ambiguous if combined with --data-only. Restoring
large objects fails for the above reason if the previous objects don't exist,
but table data are restored *without* truncating the existing data. Would
normal users expect TRUNCATE-before-load for the --clean & --data-only case?

The present behaviors are:
Table data - Appended (--clean is ignored).
Large objects - Ends with an error if the object doesn't exist.
IMO, the ideal behaviors are:
Table data - Truncate the existing data and load the new rows.
Large objects - Work like MERGE (or REPLACE, UPSERT).

Comments?

The existing "BLOBS" section creates and restores large objects at the
same time. It also unlinks each existing large object (if any) just
before restoring it, when --clean is given.

In my opinion, when --clean is given, it should also truncate the table
before restoring, even if --data-only is given as well.

Thanks,

--
OSS Platform Development Division, NEC
KaiGai Kohei <kaigai@ak.jp.nec.com>

Attachments:

pgsql-fix-pg_dump-blob-privs.7.patchapplication/octet-stream; name=pgsql-fix-pg_dump-blob-privs.7.patchDownload
 src/bin/pg_dump/pg_backup_archiver.c |   48 ++++--
 src/bin/pg_dump/pg_backup_archiver.h |    2 +-
 src/bin/pg_dump/pg_backup_custom.c   |   11 +-
 src/bin/pg_dump/pg_backup_db.c       |   20 +--
 src/bin/pg_dump/pg_backup_files.c    |   11 +-
 src/bin/pg_dump/pg_backup_null.c     |   17 ++-
 src/bin/pg_dump/pg_backup_tar.c      |   11 +-
 src/bin/pg_dump/pg_dump.c            |  321 +++++++++++++++-------------------
 src/bin/pg_dump/pg_dump.h            |   12 +-
 src/bin/pg_dump/pg_dump_sort.c       |   12 +-
 10 files changed, 235 insertions(+), 230 deletions(-)

diff --git a/src/bin/pg_dump/pg_backup_archiver.c b/src/bin/pg_dump/pg_backup_archiver.c
index 24c0fd4..22fef19 100644
--- a/src/bin/pg_dump/pg_backup_archiver.c
+++ b/src/bin/pg_dump/pg_backup_archiver.c
@@ -329,7 +329,7 @@ RestoreArchive(Archive *AHX, RestoreOptions *ropt)
 			AH->currentTE = te;
 
 			reqs = _tocEntryRequired(te, ropt, false /* needn't drop ACLs */ );
-			if (((reqs & REQ_SCHEMA) != 0) && te->dropStmt)
+			if (((reqs & (REQ_SCHEMA|REQ_DATA)) != 0) && te->dropStmt)
 			{
 				/* We want the schema */
 				ahlog(AH, 1, "dropping %s %s\n", te->desc, te->tag);
@@ -381,7 +381,8 @@ RestoreArchive(Archive *AHX, RestoreOptions *ropt)
 		/* Work out what, if anything, we want from this entry */
 		reqs = _tocEntryRequired(te, ropt, true);
 
-		if ((reqs & REQ_SCHEMA) != 0)	/* We want the schema */
+		/* Access privileges for both schema and data */
+		if ((reqs & (REQ_SCHEMA|REQ_DATA)) != 0)
 		{
 			ahlog(AH, 1, "setting owner and privileges for %s %s\n",
 				  te->desc, te->tag);
@@ -520,6 +521,7 @@ restore_toc_entry(ArchiveHandle *AH, TocEntry *te,
 				_printTocEntry(AH, te, ropt, true, false);
 
 				if (strcmp(te->desc, "BLOBS") == 0 ||
+					strcmp(te->desc, "BLOB DATA") == 0 ||
 					strcmp(te->desc, "BLOB COMMENTS") == 0)
 				{
 					ahlog(AH, 1, "restoring %s\n", te->desc);
@@ -903,7 +905,7 @@ EndRestoreBlobs(ArchiveHandle *AH)
  * Called by a format handler to initiate restoration of a blob
  */
 void
-StartRestoreBlob(ArchiveHandle *AH, Oid oid, bool drop)
+StartRestoreBlob(ArchiveHandle *AH, Oid oid, bool drop, bool compat)
 {
 	Oid			loOid;
 
@@ -919,19 +921,24 @@ StartRestoreBlob(ArchiveHandle *AH, Oid oid, bool drop)
 
 	if (AH->connection)
 	{
-		loOid = lo_create(AH->connection, oid);
-		if (loOid == 0 || loOid != oid)
-			die_horribly(AH, modulename, "could not create large object %u\n",
-						 oid);
-
+		if (compat)
+		{
+			loOid = lo_create(AH->connection, oid);
+			if (loOid == 0 || loOid != oid)
+				die_horribly(AH, modulename, "could not create large object %u\n",
+							 oid);
+		}
 		AH->loFd = lo_open(AH->connection, oid, INV_WRITE);
 		if (AH->loFd == -1)
 			die_horribly(AH, modulename, "could not open large object\n");
 	}
 	else
 	{
-		ahprintf(AH, "SELECT pg_catalog.lo_open(pg_catalog.lo_create('%u'), %d);\n",
-				 oid, INV_WRITE);
+		if (compat)
+			ahprintf(AH, "SELECT pg_catalog.lo_open(pg_catalog.lo_create('%u'), %d);\n", oid, INV_WRITE);
+		else
+			ahprintf(AH, "SELECT pg_catalog.lo_open(%u, %d);\n",
+					 oid, INV_WRITE);
 	}
 
 	AH->writingBlob = 1;
@@ -1940,7 +1947,8 @@ WriteDataChunks(ArchiveHandle *AH)
 			AH->currToc = te;
 			/* printf("Writing data for %d (%x)\n", te->id, te); */
 
-			if (strcmp(te->desc, "BLOBS") == 0)
+			if (strcmp(te->desc, "BLOBS") == 0 ||
+				strcmp(te->desc, "BLOB DATA") == 0)
 			{
 				startPtr = AH->StartBlobsPtr;
 				endPtr = AH->EndBlobsPtr;
@@ -2077,6 +2085,7 @@ ReadToc(ArchiveHandle *AH)
 				te->section = SECTION_NONE;
 			else if (strcmp(te->desc, "TABLE DATA") == 0 ||
 					 strcmp(te->desc, "BLOBS") == 0 ||
+					 strcmp(te->desc, "BLOB DATA") == 0 ||
 					 strcmp(te->desc, "BLOB COMMENTS") == 0)
 				te->section = SECTION_DATA;
 			else if (strcmp(te->desc, "CONSTRAINT") == 0 ||
@@ -2286,9 +2295,14 @@ _tocEntryRequired(TocEntry *te, RestoreOptions *ropt, bool include_acls)
 	if (!te->hadDumper)
 	{
 		/*
-		 * Special Case: If 'SEQUENCE SET' then it is considered a data entry
+		 * Special Case: If 'SEQUENCE SET', 'BLOB ITEM' or 'ACL' for large
+		 * objects, then it is considered a data entry
+		 *
+		 * XXX - we assume te->tag is not numeric except for large objects.
 		 */
-		if (strcmp(te->desc, "SEQUENCE SET") == 0)
+		if (strcmp(te->desc, "SEQUENCE SET") == 0 ||
+			strcmp(te->desc, "BLOB ITEM") == 0 ||
+			(strcmp(te->desc, "ACL") == 0 && atooid(te->tag) > 0))
 			res = res & REQ_DATA;
 		else
 			res = res & ~REQ_DATA;
@@ -2713,6 +2727,13 @@ _getObjectDescription(PQExpBuffer buf, TocEntry *te, ArchiveHandle *AH)
 		return;
 	}
 
+	/* Use ALTER LARGE OBJECT for BLOB ITEM */
+	if (strcmp(type, "BLOB ITEM") == 0)
+	{
+		appendPQExpBuffer(buf, "LARGE OBJECT %s", te->tag);
+		return;
+	}
+
 	write_msg(modulename, "WARNING: don't know how to set owner for object type %s\n",
 			  type);
 }
@@ -2824,6 +2845,7 @@ _printTocEntry(ArchiveHandle *AH, TocEntry *te, RestoreOptions *ropt, bool isDat
 		strlen(te->owner) > 0 && strlen(te->dropStmt) > 0)
 	{
 		if (strcmp(te->desc, "AGGREGATE") == 0 ||
+			strcmp(te->desc, "BLOB ITEM") == 0 ||
 			strcmp(te->desc, "CONVERSION") == 0 ||
 			strcmp(te->desc, "DATABASE") == 0 ||
 			strcmp(te->desc, "DOMAIN") == 0 ||
diff --git a/src/bin/pg_dump/pg_backup_archiver.h b/src/bin/pg_dump/pg_backup_archiver.h
index c09cec5..6f70899 100644
--- a/src/bin/pg_dump/pg_backup_archiver.h
+++ b/src/bin/pg_dump/pg_backup_archiver.h
@@ -359,7 +359,7 @@ int			ReadOffset(ArchiveHandle *, pgoff_t *);
 size_t		WriteOffset(ArchiveHandle *, pgoff_t, int);
 
 extern void StartRestoreBlobs(ArchiveHandle *AH);
-extern void StartRestoreBlob(ArchiveHandle *AH, Oid oid, bool drop);
+extern void StartRestoreBlob(ArchiveHandle *AH, Oid oid, bool drop, bool compat);
 extern void EndRestoreBlob(ArchiveHandle *AH, Oid oid);
 extern void EndRestoreBlobs(ArchiveHandle *AH);
 
diff --git a/src/bin/pg_dump/pg_backup_custom.c b/src/bin/pg_dump/pg_backup_custom.c
index ea16c0b..fc815cf 100644
--- a/src/bin/pg_dump/pg_backup_custom.c
+++ b/src/bin/pg_dump/pg_backup_custom.c
@@ -54,7 +54,7 @@ static void _StartBlobs(ArchiveHandle *AH, TocEntry *te);
 static void _StartBlob(ArchiveHandle *AH, TocEntry *te, Oid oid);
 static void _EndBlob(ArchiveHandle *AH, TocEntry *te, Oid oid);
 static void _EndBlobs(ArchiveHandle *AH, TocEntry *te);
-static void _LoadBlobs(ArchiveHandle *AH, bool drop);
+static void _LoadBlobs(ArchiveHandle *AH, bool drop, bool compat);
 static void _Clone(ArchiveHandle *AH);
 static void _DeClone(ArchiveHandle *AH);
 
@@ -498,7 +498,10 @@ _PrintTocData(ArchiveHandle *AH, TocEntry *te, RestoreOptions *ropt)
 			break;
 
 		case BLK_BLOBS:
-			_LoadBlobs(AH, ropt->dropSchema);
+			if (strcmp(te->desc, "BLOBS") == 0)
+				_LoadBlobs(AH, ropt->dropSchema, true);
+			else
+				_LoadBlobs(AH, false, false);
 			break;
 
 		default:				/* Always have a default */
@@ -619,7 +622,7 @@ _PrintData(ArchiveHandle *AH)
 }
 
 static void
-_LoadBlobs(ArchiveHandle *AH, bool drop)
+_LoadBlobs(ArchiveHandle *AH, bool drop, bool compat)
 {
 	Oid			oid;
 
@@ -628,7 +631,7 @@ _LoadBlobs(ArchiveHandle *AH, bool drop)
 	oid = ReadInt(AH);
 	while (oid != 0)
 	{
-		StartRestoreBlob(AH, oid, drop);
+		StartRestoreBlob(AH, oid, drop, compat);
 		_PrintData(AH);
 		EndRestoreBlob(AH, oid);
 		oid = ReadInt(AH);
diff --git a/src/bin/pg_dump/pg_backup_db.c b/src/bin/pg_dump/pg_backup_db.c
index 6a195a9..61fa3b9 100644
--- a/src/bin/pg_dump/pg_backup_db.c
+++ b/src/bin/pg_dump/pg_backup_db.c
@@ -12,6 +12,7 @@
 
 #include "pg_backup_db.h"
 #include "dumputils.h"
+#include "libpq/libpq-fs.h"
 
 #include <unistd.h>
 
@@ -656,17 +657,14 @@ void
 DropBlobIfExists(ArchiveHandle *AH, Oid oid)
 {
 	/* Call lo_unlink only if exists to avoid not-found error. */
-	if (PQserverVersion(AH->connection) >= 80500)
-	{
-		ahprintf(AH, "SELECT pg_catalog.lo_unlink(oid) "
-					 "FROM pg_catalog.pg_largeobject_metadata "
-					 "WHERE oid = %u;\n", oid);
-	}
-	else
-	{
-		ahprintf(AH, "SELECT CASE WHEN EXISTS(SELECT 1 FROM pg_catalog.pg_largeobject WHERE loid = '%u') THEN pg_catalog.lo_unlink('%u') END;\n",
-				 oid, oid);
-	}
+	if (AH->connection &&
+		PQserverVersion(AH->connection) < 80500)
+		die_horribly(AH, NULL,
+					 "could not restore large object into an older server\n");
+
+	ahprintf(AH, "SELECT pg_catalog.lo_unlink(oid) "
+			 "FROM pg_catalog.pg_largeobject_metadata "
+			 "WHERE oid = %u;\n", oid);
 }
 
 static bool
diff --git a/src/bin/pg_dump/pg_backup_files.c b/src/bin/pg_dump/pg_backup_files.c
index 1faac0a..6406fcb 100644
--- a/src/bin/pg_dump/pg_backup_files.c
+++ b/src/bin/pg_dump/pg_backup_files.c
@@ -66,7 +66,7 @@ typedef struct
 } lclTocEntry;
 
 static const char *modulename = gettext_noop("file archiver");
-static void _LoadBlobs(ArchiveHandle *AH, RestoreOptions *ropt);
+static void _LoadBlobs(ArchiveHandle *AH, RestoreOptions *ropt, bool compat);
 static void _getBlobTocEntry(ArchiveHandle *AH, Oid *oid, char *fname);
 
 /*
@@ -330,7 +330,9 @@ _PrintTocData(ArchiveHandle *AH, TocEntry *te, RestoreOptions *ropt)
 		return;
 
 	if (strcmp(te->desc, "BLOBS") == 0)
-		_LoadBlobs(AH, ropt);
+		_LoadBlobs(AH, ropt, true);
+	else if (strcmp(te->desc, "BLOB DATA") == 0)
+		_LoadBlobs(AH, ropt, false);
 	else
 		_PrintFileData(AH, tctx->filename, ropt);
 }
@@ -365,10 +367,11 @@ _getBlobTocEntry(ArchiveHandle *AH, Oid *oid, char fname[K_STD_BUF_SIZE])
 }
 
 static void
-_LoadBlobs(ArchiveHandle *AH, RestoreOptions *ropt)
+_LoadBlobs(ArchiveHandle *AH, RestoreOptions *ropt, bool compat)
 {
 	Oid			oid;
 	lclContext *ctx = (lclContext *) AH->formatData;
+	char		drop = (compat ? ropt->dropSchema : false);
 	char		fname[K_STD_BUF_SIZE];
 
 	StartRestoreBlobs(AH);
@@ -382,7 +385,7 @@ _LoadBlobs(ArchiveHandle *AH, RestoreOptions *ropt)
 
 	while (oid != 0)
 	{
-		StartRestoreBlob(AH, oid, ropt->dropSchema);
+		StartRestoreBlob(AH, oid, drop, compat);
 		_PrintFileData(AH, fname, ropt);
 		EndRestoreBlob(AH, oid);
 		_getBlobTocEntry(AH, &oid, fname);
diff --git a/src/bin/pg_dump/pg_backup_null.c b/src/bin/pg_dump/pg_backup_null.c
index 4217210..0c1b693 100644
--- a/src/bin/pg_dump/pg_backup_null.c
+++ b/src/bin/pg_dump/pg_backup_null.c
@@ -147,14 +147,19 @@ _StartBlobs(ArchiveHandle *AH, TocEntry *te)
 static void
 _StartBlob(ArchiveHandle *AH, TocEntry *te, Oid oid)
 {
+	bool	compat = (strcmp(te->desc, "BLOBS") == 0 ? true : false);
+
 	if (oid == 0)
 		die_horribly(AH, NULL, "invalid OID for large object\n");
 
-	if (AH->ropt->dropSchema)
+	if (compat && AH->ropt->dropSchema)
 		DropBlobIfExists(AH, oid);
 
-	ahprintf(AH, "SELECT pg_catalog.lo_open(pg_catalog.lo_create('%u'), %d);\n",
-			 oid, INV_WRITE);
+	if (compat)
+		ahprintf(AH, "SELECT pg_catalog.lo_open(pg_catalog.lo_create('%u'), %d);\n",
+				 oid, INV_WRITE);
+	else
+		ahprintf(AH, "SELECT pg_catalog.lo_open(%u, %d);\n", oid, INV_WRITE);
 
 	AH->WriteDataPtr = _WriteBlobData;
 }
@@ -195,12 +200,14 @@ _PrintTocData(ArchiveHandle *AH, TocEntry *te, RestoreOptions *ropt)
 	{
 		AH->currToc = te;
 
-		if (strcmp(te->desc, "BLOBS") == 0)
+		if (strcmp(te->desc, "BLOBS") == 0 ||
+			strcmp(te->desc, "BLOB DATA") == 0)
 			_StartBlobs(AH, te);
 
 		(*te->dataDumper) ((Archive *) AH, te->dataDumperArg);
 
-		if (strcmp(te->desc, "BLOBS") == 0)
+		if (strcmp(te->desc, "BLOBS") == 0 ||
+			strcmp(te->desc, "BLOB DATA") == 0)
 			_EndBlobs(AH, te);
 
 		AH->currToc = NULL;
diff --git a/src/bin/pg_dump/pg_backup_tar.c b/src/bin/pg_dump/pg_backup_tar.c
index 5cbc365..8cda3e1 100644
--- a/src/bin/pg_dump/pg_backup_tar.c
+++ b/src/bin/pg_dump/pg_backup_tar.c
@@ -100,7 +100,7 @@ typedef struct
 
 static const char *modulename = gettext_noop("tar archiver");
 
-static void _LoadBlobs(ArchiveHandle *AH, RestoreOptions *ropt);
+static void _LoadBlobs(ArchiveHandle *AH, RestoreOptions *ropt, bool compat);
 
 static TAR_MEMBER *tarOpen(ArchiveHandle *AH, const char *filename, char mode);
 static void tarClose(ArchiveHandle *AH, TAR_MEMBER *TH);
@@ -696,19 +696,22 @@ _PrintTocData(ArchiveHandle *AH, TocEntry *te, RestoreOptions *ropt)
 	}
 
 	if (strcmp(te->desc, "BLOBS") == 0)
-		_LoadBlobs(AH, ropt);
+		_LoadBlobs(AH, ropt, true);
+	else if (strcmp(te->desc, "BLOB DATA") == 0)
+		_LoadBlobs(AH, ropt, false);
 	else
 		_PrintFileData(AH, tctx->filename, ropt);
 }
 
 static void
-_LoadBlobs(ArchiveHandle *AH, RestoreOptions *ropt)
+_LoadBlobs(ArchiveHandle *AH, RestoreOptions *ropt, bool compat)
 {
 	Oid			oid;
 	lclContext *ctx = (lclContext *) AH->formatData;
 	TAR_MEMBER *th;
 	size_t		cnt;
 	bool		foundBlob = false;
+	bool		drop = (compat ? ropt->dropSchema : false);
 	char		buf[4096];
 
 	StartRestoreBlobs(AH);
@@ -725,7 +728,7 @@ _LoadBlobs(ArchiveHandle *AH, RestoreOptions *ropt)
 			{
 				ahlog(AH, 1, "restoring large object OID %u\n", oid);
 
-				StartRestoreBlob(AH, oid, ropt->dropSchema);
+				StartRestoreBlob(AH, oid, drop, compat);
 
 				while ((cnt = tarRead(buf, 4095, th)) > 0)
 				{
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 2db9e0f..4c78d12 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -190,9 +190,9 @@ static void selectSourceSchema(const char *schemaName);
 static char *getFormattedTypeName(Oid oid, OidOptions opts);
 static char *myFormatType(const char *typname, int32 typmod);
 static const char *fmtQualifiedId(const char *schema, const char *id);
-static bool hasBlobs(Archive *AH);
-static int	dumpBlobs(Archive *AH, void *arg);
-static int	dumpBlobComments(Archive *AH, void *arg);
+static void getBlobs(Archive *AH);
+static void dumpBlobItem(Archive *AH, BlobInfo *binfo);
+static int  dumpBlobData(Archive *AH, void *arg);
 static void dumpDatabase(Archive *AH);
 static void dumpEncoding(Archive *AH);
 static void dumpStdStrings(Archive *AH);
@@ -701,25 +701,8 @@ main(int argc, char **argv)
 			getTableDataFKConstraints();
 	}
 
-	if (outputBlobs && hasBlobs(g_fout))
-	{
-		/* Add placeholders to allow correct sorting of blobs */
-		DumpableObject *blobobj;
-		DumpableObject *blobcobj;
-
-		blobobj = (DumpableObject *) malloc(sizeof(DumpableObject));
-		blobobj->objType = DO_BLOBS;
-		blobobj->catId = nilCatalogId;
-		AssignDumpId(blobobj);
-		blobobj->name = strdup("BLOBS");
-
-		blobcobj = (DumpableObject *) malloc(sizeof(DumpableObject));
-		blobcobj->objType = DO_BLOB_COMMENTS;
-		blobcobj->catId = nilCatalogId;
-		AssignDumpId(blobcobj);
-		blobcobj->name = strdup("BLOB COMMENTS");
-		addObjectDependency(blobcobj, blobobj->dumpId);
-	}
+	if (outputBlobs)
+		getBlobs(g_fout);
 
 	/*
 	 * Collect dependency data to assist in ordering the objects.
@@ -1938,43 +1921,149 @@ dumpStdStrings(Archive *AH)
 
 
 /*
- * hasBlobs:
- *	Test whether database contains any large objects
+ * getBlobs:
+ *	Collect metadata of all the large objects to be dumped
  */
-static bool
-hasBlobs(Archive *AH)
+static void
+getBlobs(Archive *AH)
 {
-	bool		result;
-	const char *blobQry;
-	PGresult   *res;
+	PQExpBuffer		blobQry = createPQExpBuffer();
+	BlobInfo	   *binfo;
+	DumpableObject *bdata;
+	PGresult	   *res;
+	int				i;
+
+	/* Verbose message */
+	if (g_verbose)
+		write_msg(NULL, "reading binary large objects\n");
 
 	/* Make sure we are in proper schema */
 	selectSourceSchema("pg_catalog");
 
 	/* Check for BLOB OIDs */
 	if (AH->remoteVersion >= 80500)
-		blobQry = "SELECT oid FROM pg_largeobject_metadata LIMIT 1";
+		appendPQExpBuffer(blobQry,
+						  "SELECT oid, (%s lomowner), lomacl,"
+						  " obj_description(oid, 'pg_largeobject')"
+						  " FROM pg_largeobject_metadata",
+						  username_subquery);
+	else if (AH->remoteVersion >= 70200)
+		appendPQExpBuffer(blobQry,
+						  "SELECT DISTINCT loid, NULL, NULL,"
+						  " obj_description(loid, 'pg_largeobject')"
+						  " FROM pg_largeobject");
 	else if (AH->remoteVersion >= 70100)
-		blobQry = "SELECT loid FROM pg_largeobject LIMIT 1";
+		appendPQExpBuffer(blobQry,
+						  "SELECT DISTINCT loid, NULL, NULL,"
+						  " obj_description(loid)"
+						  " FROM pg_largeobject");
 	else
-		blobQry = "SELECT oid FROM pg_class WHERE relkind = 'l' LIMIT 1";
+		appendPQExpBuffer(blobQry,
+						  "SELECT DISTINCT oid, NULL, NULL,"
+						  " obj_description(oid)"
+						  " FROM pg_class WHERE relkind = 'l'");
 
-	res = PQexec(g_conn, blobQry);
-	check_sql_result(res, g_conn, blobQry, PGRES_TUPLES_OK);
+	res = PQexec(g_conn, blobQry->data);
+	check_sql_result(res, g_conn, blobQry->data, PGRES_TUPLES_OK);
 
-	result = PQntuples(res) > 0;
+	/*
+	 * Each large object now has its own "BLOB ITEM" entry to
+	 * declare itself.
+	 */
+	for (i = 0; i < PQntuples(res); i++)
+	{
+		binfo = (BlobInfo *) malloc(sizeof(BlobInfo));
+		binfo->dobj.objType = DO_BLOB_ITEM;
+		binfo->dobj.catId = nilCatalogId;
+		AssignDumpId(&binfo->dobj);
+
+		binfo->dobj.name = strdup(PQgetvalue(res, i, 0));
+		binfo->rolname = strdup(PQgetvalue(res, i, 1));
+		binfo->blobacl = strdup(PQgetvalue(res, i, 2));
+		binfo->blobdescr = strdup(PQgetvalue(res, i, 3));
+	}
+
+	/*
+	 * If we have at least one large object, a "BLOB DATA"
+	 * section is also necessary.
+	 */
+	if (PQntuples(res) > 0)
+	{
+		bdata = (DumpableObject *) malloc(sizeof(DumpableObject));
+		bdata->objType = DO_BLOB_DATA;
+		bdata->catId = nilCatalogId;
+		AssignDumpId(bdata);
+		bdata->name = strdup("BLOBS");
+	}
 
 	PQclear(res);
+}
 
-	return result;
+/*
+ * dumpBlobItem
+ *
+ * dump the definition of the given large object
+ */
+static void
+dumpBlobItem(Archive *AH, BlobInfo *binfo)
+{
+	PQExpBuffer		aquery = createPQExpBuffer();
+	PQExpBuffer		bquery = createPQExpBuffer();
+	PQExpBuffer		dquery = createPQExpBuffer();
+
+	/*
+	 * Cleanup a large object
+	 */
+	appendPQExpBuffer(dquery,
+					  "SELECT pg_catalog.lo_unlink(oid) "
+					  "FROM pg_catalog.pg_largeobject_metadata "
+					  "WHERE oid = %s;\n", binfo->dobj.name);
+	/*
+	 * Create an empty large object
+	 */
+	appendPQExpBuffer(bquery,
+					  "SELECT pg_catalog.lo_create(%s);\n",
+					  binfo->dobj.name);
+	/*
+	 * Create a comment on large object, if necessary
+	 */
+	if (strlen(binfo->blobdescr) > 0)
+	{
+		appendPQExpBuffer(bquery, "\nCOMMENT ON LARGE OBJECT %s IS ",
+						  binfo->dobj.name);
+		appendStringLiteralAH(bquery, binfo->blobdescr, AH);
+		appendPQExpBuffer(bquery, ";\n");
+	}
+
+	ArchiveEntry(AH, binfo->dobj.catId, binfo->dobj.dumpId,
+				 binfo->dobj.name,
+				 NULL, NULL,
+				 binfo->rolname, false,
+				 "BLOB ITEM", SECTION_DATA,
+				 bquery->data, dquery->data, NULL,
+				 binfo->dobj.dependencies, binfo->dobj.nDeps,
+				 NULL, NULL);
+
+	/*
+	 * Dump access privileges, if necessary
+	 */
+	dumpACL(AH, binfo->dobj.catId, binfo->dobj.dumpId,
+			"LARGE OBJECT",
+			binfo->dobj.name, NULL,
+			binfo->dobj.name, NULL,
+			binfo->rolname, binfo->blobacl);
+
+	destroyPQExpBuffer(aquery);
+	destroyPQExpBuffer(bquery);
+	destroyPQExpBuffer(dquery);
 }
 
 /*
- * dumpBlobs:
- *	dump all blobs
+ * dumpBlobData:
+ *	dump the data contents of all large objects
  */
 static int
-dumpBlobs(Archive *AH, void *arg)
+dumpBlobData(Archive *AH, void *arg)
 {
 	const char *blobQry;
 	const char *blobFetchQry;
@@ -2022,7 +2111,7 @@ dumpBlobs(Archive *AH, void *arg)
 			loFd = lo_open(g_conn, blobOid, INV_READ);
 			if (loFd == -1)
 			{
-				write_msg(NULL, "dumpBlobs(): could not open large object: %s",
+				write_msg(NULL, "dumpBlobData(): could not open large object: %s",
 						  PQerrorMessage(g_conn));
 				exit_nicely();
 			}
@@ -2035,7 +2124,7 @@ dumpBlobs(Archive *AH, void *arg)
 				cnt = lo_read(g_conn, loFd, buf, LOBBUFSIZE);
 				if (cnt < 0)
 				{
-					write_msg(NULL, "dumpBlobs(): error reading large object: %s",
+					write_msg(NULL, "dumpBlobData(): error reading large object: %s",
 							  PQerrorMessage(g_conn));
 					exit_nicely();
 				}
@@ -2054,134 +2143,6 @@ dumpBlobs(Archive *AH, void *arg)
 	return 1;
 }
 
-/*
- * dumpBlobComments
- *	dump all blob properties.
- *  It has "BLOB COMMENTS" tag due to the historical reason, but note
- *  that it is the routine to dump all the properties of blobs.
- *
- * Since we don't provide any way to be selective about dumping blobs,
- * there's no need to be selective about their comments either.  We put
- * all the comments into one big TOC entry.
- */
-static int
-dumpBlobComments(Archive *AH, void *arg)
-{
-	const char *blobQry;
-	const char *blobFetchQry;
-	PQExpBuffer cmdQry = createPQExpBuffer();
-	PGresult   *res;
-	int			i;
-
-	if (g_verbose)
-		write_msg(NULL, "saving large object properties\n");
-
-	/* Make sure we are in proper schema */
-	selectSourceSchema("pg_catalog");
-
-	/* Cursor to get all BLOB comments */
-	if (AH->remoteVersion >= 80500)
-		blobQry = "DECLARE blobcmt CURSOR FOR SELECT oid, "
-			"obj_description(oid, 'pg_largeobject'), "
-			"pg_get_userbyid(lomowner), lomacl "
-			"FROM pg_largeobject_metadata";
-	else if (AH->remoteVersion >= 70300)
-		blobQry = "DECLARE blobcmt CURSOR FOR SELECT loid, "
-			"obj_description(loid, 'pg_largeobject'), NULL, NULL "
-			"FROM (SELECT DISTINCT loid FROM "
-			"pg_description d JOIN pg_largeobject l ON (objoid = loid) "
-			"WHERE classoid = 'pg_largeobject'::regclass) ss";
-	else if (AH->remoteVersion >= 70200)
-		blobQry = "DECLARE blobcmt CURSOR FOR SELECT loid, "
-			"obj_description(loid, 'pg_largeobject'), NULL, NULL "
-			"FROM (SELECT DISTINCT loid FROM pg_largeobject) ss";
-	else if (AH->remoteVersion >= 70100)
-		blobQry = "DECLARE blobcmt CURSOR FOR SELECT loid, "
-			"obj_description(loid), NULL, NULL "
-			"FROM (SELECT DISTINCT loid FROM pg_largeobject) ss";
-	else
-		blobQry = "DECLARE blobcmt CURSOR FOR SELECT oid, "
-			"	( "
-			"		SELECT description "
-			"		FROM pg_description pd "
-			"		WHERE pd.objoid=pc.oid "
-			"	), NULL, NULL "
-			"FROM pg_class pc WHERE relkind = 'l'";
-
-	res = PQexec(g_conn, blobQry);
-	check_sql_result(res, g_conn, blobQry, PGRES_COMMAND_OK);
-
-	/* Command to fetch from cursor */
-	blobFetchQry = "FETCH 100 IN blobcmt";
-
-	do
-	{
-		PQclear(res);
-
-		/* Do a fetch */
-		res = PQexec(g_conn, blobFetchQry);
-		check_sql_result(res, g_conn, blobFetchQry, PGRES_TUPLES_OK);
-
-		/* Process the tuples, if any */
-		for (i = 0; i < PQntuples(res); i++)
-		{
-			Oid			blobOid = atooid(PQgetvalue(res, i, 0));
-			char	   *lo_comment = PQgetvalue(res, i, 1);
-			char	   *lo_owner = PQgetvalue(res, i, 2);
-			char	   *lo_acl = PQgetvalue(res, i, 3);
-			char		lo_name[32];
-
-			resetPQExpBuffer(cmdQry);
-
-			/* comment on the blob */
-			if (!PQgetisnull(res, i, 1))
-			{
-				appendPQExpBuffer(cmdQry,
-								  "COMMENT ON LARGE OBJECT %u IS ", blobOid);
-				appendStringLiteralAH(cmdQry, lo_comment, AH);
-				appendPQExpBuffer(cmdQry, ";\n");
-			}
-
-			/* dump blob ownership, if necessary */
-			if (!PQgetisnull(res, i, 2))
-			{
-				appendPQExpBuffer(cmdQry,
-								  "ALTER LARGE OBJECT %u OWNER TO %s;\n",
-								  blobOid, lo_owner);
-			}
-
-			/* dump blob privileges, if necessary */
-			if (!PQgetisnull(res, i, 3) &&
-				!dataOnly && !aclsSkip)
-			{
-				snprintf(lo_name, sizeof(lo_name), "%u", blobOid);
-				if (!buildACLCommands(lo_name, NULL, "LARGE OBJECT",
-									  lo_acl, lo_owner, "",
-									  AH->remoteVersion, cmdQry))
-				{
-					write_msg(NULL, "could not parse ACL (%s) for "
-							  "large object %u", lo_acl, blobOid);
-					exit_nicely();
-				}
-			}
-
-			if (cmdQry->len > 0)
-			{
-				appendPQExpBuffer(cmdQry, "\n");
-				archputs(cmdQry->data, AH);
-			}
-		}
-	} while (PQntuples(res) > 0);
-
-	PQclear(res);
-
-	archputs("\n", AH);
-
-	destroyPQExpBuffer(cmdQry);
-
-	return 1;
-}
-
 static void
 binary_upgrade_set_type_oids_by_type_oid(PQExpBuffer upgrade_buffer,
 											   Oid pg_type_oid)
@@ -6524,21 +6485,16 @@ dumpDumpableObject(Archive *fout, DumpableObject *dobj)
 		case DO_DEFAULT_ACL:
 			dumpDefaultACL(fout, (DefaultACLInfo *) dobj);
 			break;
-		case DO_BLOBS:
-			ArchiveEntry(fout, dobj->catId, dobj->dumpId,
-						 dobj->name, NULL, NULL, "",
-						 false, "BLOBS", SECTION_DATA,
-						 "", "", NULL,
-						 dobj->dependencies, dobj->nDeps,
-						 dumpBlobs, NULL);
+		case DO_BLOB_ITEM:
+			dumpBlobItem(fout, (BlobInfo *) dobj);
 			break;
-		case DO_BLOB_COMMENTS:
+		case DO_BLOB_DATA:
 			ArchiveEntry(fout, dobj->catId, dobj->dumpId,
 						 dobj->name, NULL, NULL, "",
-						 false, "BLOB COMMENTS", SECTION_DATA,
+						 false, "BLOB DATA", SECTION_DATA,
 						 "", "", NULL,
 						 dobj->dependencies, dobj->nDeps,
-						 dumpBlobComments, NULL);
+						 dumpBlobData, NULL);
 			break;
 	}
 }
@@ -10394,8 +10350,13 @@ dumpACL(Archive *fout, CatalogId objCatId, DumpId objDumpId,
 {
 	PQExpBuffer sql;
 
-	/* Do nothing if ACL dump is not enabled */
-	if (dataOnly || aclsSkip)
+	/*
+	 * Do nothing if ACL dump is not enabled
+	 *
+	 * Note that the caller has to check whether ACLs need to be
+	 * dumped, depending on --data-only / --schema-only.
+	 */
+	if (aclsSkip)
 		return;
 
 	sql = createPQExpBuffer();
diff --git a/src/bin/pg_dump/pg_dump.h b/src/bin/pg_dump/pg_dump.h
index 1e65fac..3776d84 100644
--- a/src/bin/pg_dump/pg_dump.h
+++ b/src/bin/pg_dump/pg_dump.h
@@ -115,8 +115,8 @@ typedef enum
 	DO_FDW,
 	DO_FOREIGN_SERVER,
 	DO_DEFAULT_ACL,
-	DO_BLOBS,
-	DO_BLOB_COMMENTS
+	DO_BLOB_DATA,
+	DO_BLOB_ITEM
 } DumpableObjectType;
 
 typedef struct _dumpableObject
@@ -443,6 +443,14 @@ typedef struct _defaultACLInfo
 	char	   *defaclacl;
 } DefaultACLInfo;
 
+typedef struct _blobInfo
+{
+	DumpableObject	dobj;
+	char	   *rolname;
+	char	   *blobacl;
+	char	   *blobdescr;
+} BlobInfo;
+
 /* global decls */
 extern bool force_quotes;		/* double-quotes for identifiers flag */
 extern bool g_verbose;			/* verbose flag */
diff --git a/src/bin/pg_dump/pg_dump_sort.c b/src/bin/pg_dump/pg_dump_sort.c
index 6676baf..be98c81 100644
--- a/src/bin/pg_dump/pg_dump_sort.c
+++ b/src/bin/pg_dump/pg_dump_sort.c
@@ -92,8 +92,8 @@ static const int newObjectTypePriority[] =
 	14,							/* DO_FDW */
 	15,							/* DO_FOREIGN_SERVER */
 	27,							/* DO_DEFAULT_ACL */
-	20,							/* DO_BLOBS */
-	21							/* DO_BLOB_COMMENTS */
+	21,							/* DO_BLOB_DATA */
+	20,							/* DO_BLOB_ITEM */
 };
 
 
@@ -1146,14 +1146,14 @@ describeDumpableObject(DumpableObject *obj, char *buf, int bufsize)
 					 "DEFAULT ACL %s  (ID %d OID %u)",
 					 obj->name, obj->dumpId, obj->catId.oid);
 			return;
-		case DO_BLOBS:
+		case DO_BLOB_DATA:
 			snprintf(buf, bufsize,
-					 "BLOBS  (ID %d)",
+					 "BLOB DATA  (ID %d)",
 					 obj->dumpId);
 			return;
-		case DO_BLOB_COMMENTS:
+		case DO_BLOB_ITEM:
 			snprintf(buf, bufsize,
-					 "BLOB COMMENTS  (ID %d)",
+					 "BLOB ITEM  (ID %d)",
 					 obj->dumpId);
 			return;
 	}
#121Takahiro Itagaki
itagaki.takahiro@oss.ntt.co.jp
In reply to: KaiGai Kohei (#120)
Re: Largeobject Access Controls (r2460)

KaiGai Kohei <kaigai@ak.jp.nec.com> wrote:

The attached patch fixed up the cleanup query as follows:
+   appendPQExpBuffer(dquery,
+                     "SELECT pg_catalog.lo_unlink(oid) "
+                     "FROM pg_catalog.pg_largeobject_metadata "
+                     "WHERE oid = %s;\n", binfo->dobj.name);

I also noticed that lo_create() was not prefixed with "pg_catalog.",
so I added it as well.

Thanks. Now the patch is ready to commit.

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center