Priority table or Cache table

Started by Haribabu Kommi almost 12 years ago · 22 messages

#1 Haribabu Kommi
kommi.haribabu@gmail.com

Hi,

I want to propose a new feature called "priority table" or "cache table".
It is the same as a regular table, except that its pages have higher
priority than those of normal tables. Such tables are useful where faster
query processing on particular tables is expected.

The same speed-up can be achieved by placing the tables in a tablespace on
a RAM disk, but then the data is lost on a system shutdown; avoiding that
requires continuous backup of the tablespace and its WAL files. The
priority table feature provides similar functionality without these
problems.

Users need to decide carefully which tables require faster access and can
be declared as priority tables; these tables should also be small, in both
number of columns and size.

New syntax:

CREATE [PRIORITY] TABLE ...;

or

CREATE TABLE ... WITH (buffer_pool = { priority | default });

i.e., a new storage parameter, buffer_pool, specifies which type of buffer
pool the table uses.

The same can be extended to indexes as well.

Solution 1:

This may not be a proper solution, but it is simple. While placing pages of
these tables into the buffer pool, their usage count is set to double the
maximum buffer usage count, instead of 1 as for normal tables. Because of
this there is less chance of these pages being evicted from the buffer
pool, and queries that operate on these tables run faster because of the
reduced I/O. If such a table is not used for a long time, only the first
query on it is slower; the rest of the queries are fast again.

Just for testing, a new bool member can be added to the RelFileNode
structure to indicate whether the table is a priority table. Using this,
the usage count can be adjusted while loading a page.

The pg_buffercache output of a priority table:

postgres=# select * from pg_buffercache where relfilenode = 16385;
 bufferid | relfilenode | reltablespace | reldatabase | relforknumber | relblocknumber | isdirty | usagecount
----------+-------------+---------------+-------------+---------------+----------------+---------+------------
      270 |       16385 |          1663 |       12831 |             0 |              0 | t       |         10

Solution 2:

Keep an extra flag in the buffer header to record whether the buffer is
used for a priority table. Using this flag, some extra rules apply when
replacing a buffer that holds a priority page:
1. Only another priority-table page can normally replace a priority page.
2. A normal-table page can replace it only after at least two complete
clock-sweep cycles.

In this case the priority buffers stay in memory for a long time, similar
to solution 1, but it is not always guaranteed.

Solution 3:

Create another buffer pool, a "priority buffer pool", alongside the shared
buffer pool, to hold the priority table pages. A new GUC parameter,
"priority_buffers", can be added to get the priority buffer pool size from
the user. The maximum limit of these buffers can be kept small so that the
pool is used properly.

As an extra safeguard, whenever a page has to be evicted from the priority
buffer pool, a warning is issued, so that the user can check whether the
configured priority_buffers size is too small or the priority tables have
grown more than expected.

In this case all the pages are always in memory, so queries on these
tables get faster processing.

IBM DB2 has the facility to create additional buffer pools and fix
specific tables and indexes into them. Oracle also has a facility to
specify a buffer pool option of KEEP or RECYCLE.

I prefer syntax 2 and solution 3. Please provide your
suggestions/improvements.

Regards,
Hari Babu
Fujitsu Australia

#2 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Haribabu Kommi (#1)
Re: Priority table or Cache table

Haribabu Kommi <kommi.haribabu@gmail.com> writes:

> I want to propose a new feature called "priority table" or "cache table".
> It is the same as a regular table, except that its pages have higher
> priority than those of normal tables. Such tables are useful where faster
> query processing on particular tables is expected.

Why exactly does the existing LRU behavior of shared buffers not do
what you need?

I am really dubious that letting DBAs manage buffers is going to be
an improvement over automatic management.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#3 Haribabu Kommi
kommi.haribabu@gmail.com
In reply to: Tom Lane (#2)
Re: Priority table or Cache table

On Thu, Feb 20, 2014 at 11:38 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Haribabu Kommi <kommi.haribabu@gmail.com> writes:
>> I want to propose a new feature called "priority table" or "cache table".
>> It is the same as a regular table, except that its pages have higher
>> priority than those of normal tables. Such tables are useful where faster
>> query processing on particular tables is expected.
>
> Why exactly does the existing LRU behavior of shared buffers not do
> what you need?

Let's assume a database with three tables that are all accessed regularly,
and the user expects faster query results from one of them. Because of the
LRU behavior, that does not always happen. If we simply place that table's
pages in a separate buffer pool, all of its pages stay in memory and
queries on it get faster processing.

Regards,
Hari Babu
Fujitsu Australia

#4 Amit Kapila
amit.kapila16@gmail.com
In reply to: Haribabu Kommi (#3)
Re: Priority table or Cache table

On Thu, Feb 20, 2014 at 6:24 AM, Haribabu Kommi
<kommi.haribabu@gmail.com> wrote:

> On Thu, Feb 20, 2014 at 11:38 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> I want to propose a new feature called "priority table" or "cache table".
>>> It is the same as a regular table, except that its pages have higher
>>> priority than those of normal tables. Such tables are useful where
>>> faster query processing on particular tables is expected.
>>
>> Why exactly does the existing LRU behavior of shared buffers not do
>> what you need?
>
> Let's assume a database with three tables that are all accessed regularly,
> and the user expects faster query results from one of them. Because of
> the LRU behavior, that does not always happen.

I think this will not be a problem for regularly accessed tables (pages);
as per the current algorithm they get more priority before being flushed
out of the shared buffer cache.
Have you come across any case where regularly accessed pages
get lower priority than non-regularly accessed pages?

However it might be required for cases where the user wants to control
such behaviour and pass such hints through a table-level option or some
other way, to indicate that he wants more priority for certain tables
irrespective of their usage w.r.t. other tables.

Now I think the important thing to find out here is how helpful this is
for users, or why they would want to control such behaviour even when the
database already takes care of it based on the access pattern.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


#5 Haribabu Kommi
kommi.haribabu@gmail.com
In reply to: Amit Kapila (#4)
Re: Priority table or Cache table

On Thu, Feb 20, 2014 at 2:26 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Thu, Feb 20, 2014 at 6:24 AM, Haribabu Kommi
> <kommi.haribabu@gmail.com> wrote:
>> On Thu, Feb 20, 2014 at 11:38 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>>> I want to propose a new feature called "priority table" or "cache table".
>>>> It is the same as a regular table, except that its pages have higher
>>>> priority than those of normal tables. Such tables are useful where
>>>> faster query processing on particular tables is expected.
>>>
>>> Why exactly does the existing LRU behavior of shared buffers not do
>>> what you need?
>>
>> Let's assume a database with three tables that are all accessed regularly,
>> and the user expects faster query results from one of them. Because of
>> the LRU behavior, that does not always happen.
>
> I think this will not be a problem for regularly accessed tables (pages);
> as per the current algorithm they get more priority before being flushed
> out of the shared buffer cache.
> Have you come across any case where regularly accessed pages
> get lower priority than non-regularly accessed pages?

Because of the other regularly accessed tables, the table that is expected
to give faster results sometimes gets delayed.

> However it might be required for cases where the user wants to control
> such behaviour and pass such hints through a table-level option or some
> other way, to indicate that he wants more priority for certain tables
> irrespective of their usage w.r.t. other tables.
>
> Now I think the important thing to find out here is how helpful this is
> for users, or why they would want to control such behaviour even when
> the database already takes care of it based on the access pattern.

Yes, it is useful in cases where the application always expects faster
results, whether the table is used regularly or not.

Regards,
Hari Babu
Fujitsu Australia

#6 Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Haribabu Kommi (#5)
2 attachment(s)
Re: Priority table or Cache table

On Thu, Feb 20, 2014 at 10:23 AM, Haribabu Kommi
<kommi.haribabu@gmail.com> wrote:
> On Thu, Feb 20, 2014 at 2:26 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>> On Thu, Feb 20, 2014 at 6:24 AM, Haribabu Kommi
>> <kommi.haribabu@gmail.com> wrote:
>>> On Thu, Feb 20, 2014 at 11:38 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>>>> I want to propose a new feature called "priority table" or "cache table".
>>>>> It is the same as a regular table, except that its pages have higher
>>>>> priority than those of normal tables. Such tables are useful where
>>>>> faster query processing on particular tables is expected.
>>>>
>>>> Why exactly does the existing LRU behavior of shared buffers not do
>>>> what you need?
>>>
>>> Let's assume a database with three tables that are all accessed
>>> regularly, and the user expects faster query results from one of them.
>>> Because of the LRU behavior, that does not always happen.
>>
>> I think this will not be a problem for regularly accessed tables (pages);
>> as per the current algorithm they get more priority before being flushed
>> out of the shared buffer cache.
>> Have you come across any case where regularly accessed pages
>> get lower priority than non-regularly accessed pages?
>
> Because of the other regularly accessed tables, the table that is expected
> to give faster results sometimes gets delayed.

The solution involving buffer pools partitions the buffer cache into
separate pools explicitly. The way the PostgreSQL buffer manager works,
for a regular pattern of table accesses the buffer cache automatically
reaches a stable point where the number of buffers containing pages of a
particular table starts to stabilize. Thus at the equilibrium point for a
given access pattern, the buffer cache is automatically partitioned
between the tables, each using its share of buffers. So a solution using
separate buffer pools seems unnecessary.

PFA some scripts which I used to verify this behaviour. The first script
(buffer_usage_objects.sql) creates two tables, one large and the other
half its size. The other script contains a few queries that simulate a
simple table access pattern by running select count(*) on these tables N
times; it also queries the pg_buffercache view provided by the
pg_buffercache extension to count the number of buffers used by each of
these tables. So, if you run three sessions in parallel, two querying the
tables and the third taking snapshots of buffer usage per table, you will
be able to see this partitioning.

>> However it might be required for cases where the user wants to control
>> such behaviour and pass such hints through a table-level option or some
>> other way, to indicate that he wants more priority for certain tables
>> irrespective of their usage w.r.t. other tables.
>>
>> Now I think the important thing to find out here is how helpful this is
>> for users, or why they would want to control such behaviour even when
>> the database already takes care of it based on the access pattern.
>
> Yes, it is useful in cases where the application always expects faster
> results, whether the table is used regularly or not.

In such a case, it might be valuable to see whether we should play with
the maximum usage count, which is currently set to 5:

#define BM_MAX_USAGE_COUNT 5

> Regards,
> Hari Babu
> Fujitsu Australia

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Attachments:

buffer_usage_objects.sql (application/octet-stream)
buffer_usage_queries.sql (application/octet-stream)
#7 Haribabu Kommi
kommi.haribabu@gmail.com
In reply to: Ashutosh Bapat (#6)
1 attachment(s)
Re: Priority table or Cache table

On Thu, Feb 20, 2014 at 10:06 PM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:
> On Thu, Feb 20, 2014 at 10:23 AM, Haribabu Kommi
> <kommi.haribabu@gmail.com> wrote:
>> On Thu, Feb 20, 2014 at 2:26 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>>> On Thu, Feb 20, 2014 at 6:24 AM, Haribabu Kommi
>>> <kommi.haribabu@gmail.com> wrote:
>>>> On Thu, Feb 20, 2014 at 11:38 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>>>>> I want to propose a new feature called "priority table" or "cache table".
>>>>>> It is the same as a regular table, except that its pages have higher
>>>>>> priority than those of normal tables. Such tables are useful where
>>>>>> faster query processing on particular tables is expected.
>>>>>
>>>>> Why exactly does the existing LRU behavior of shared buffers not do
>>>>> what you need?
>>>>
>>>> Let's assume a database with three tables that are all accessed
>>>> regularly, and the user expects faster query results from one of them.
>>>> Because of the LRU behavior, that does not always happen.
>>>
>>> I think this will not be a problem for regularly accessed tables
>>> (pages); as per the current algorithm they get more priority before
>>> being flushed out of the shared buffer cache.
>>> Have you come across any case where regularly accessed pages
>>> get lower priority than non-regularly accessed pages?
>>
>> Because of the other regularly accessed tables, the table that is
>> expected to give faster results sometimes gets delayed.
>
> The solution involving buffer pools partitions the buffer cache into
> separate pools explicitly. The way the PostgreSQL buffer manager works,
> for a regular pattern of table accesses the buffer cache automatically
> reaches a stable point where the number of buffers containing pages of a
> particular table starts to stabilize. Thus at the equilibrium point for a
> given access pattern, the buffer cache is automatically partitioned
> between the tables, each using its share of buffers. So a solution using
> separate buffer pools seems unnecessary.

I checked some performance reports on the Oracle multiple-buffer-pool
concept, which showed an increase in cache hit ratio compared to a single
buffer pool. Only after that did I propose this split-pool solution. I
don't know how well it would really work for PostgreSQL. The Oracle
performance report is attached to this mail.

> PFA some scripts which I used to verify this behaviour. The first script
> (buffer_usage_objects.sql) creates two tables, one large and the other
> half its size. The other script contains a few queries that simulate a
> simple table access pattern by running select count(*) on these tables N
> times; it also queries the pg_buffercache view provided by the
> pg_buffercache extension to count the number of buffers used by each of
> these tables. So, if you run three sessions in parallel, two querying the
> tables and the third taking snapshots of buffer usage per table, you will
> be able to see this partitioning.

Thanks for the scripts. I will check them.

>>> However it might be required for cases where the user wants to control
>>> such behaviour and pass such hints through a table-level option or some
>>> other way, to indicate that he wants more priority for certain tables
>>> irrespective of their usage w.r.t. other tables.
>>>
>>> Now I think the important thing to find out here is how helpful this is
>>> for users, or why they would want to control such behaviour even when
>>> the database already takes care of it based on the access pattern.
>>
>> Yes, it is useful in cases where the application always expects faster
>> results, whether the table is used regularly or not.
>
> In such a case, it might be valuable to see whether we should play with
> the maximum usage count, which is currently set to 5:
>
> #define BM_MAX_USAGE_COUNT 5

This is the first solution I described in my first mail. Thanks, I will
look further into it.

Regards,
Hari Babu
Fujitsu Australia

Attachments:

oracle9i_buffer_pools.pdf (application/pdf)
�~���P��#�e��8
��h%�����*4W���b�eB.��b���{�C"��p{���#2���V�>
�\_k�h:�po\�tT|�����;���G�y����C�&~�i��\�w�6Q��`�T�,��m+pT����7�& ��>����r�3��
��=L��eZ/����g��u�US?��%y{��3�(�;��b��������f�H��{Gt��_��J������,j��B���V����5%�������KEZ(3�h�c_�D�i�d���x��mg-�3+�D�-Z����T�����d�P��vZ���2l*��a�l���)�I�:��+���0�_qO�1�s��z,��%�E�_)e���6(�W��.�g4���p%�N*���N�6[�j��,���x;���#I�f����[z���>D&rL�b��G���TD3�~��X��R1�>i9I�z'�����Y"���~'���\3��j�=e_3�����-���B�+���g|�W��]A:���/��a��1�A8
[a�~0��������\E�X��b-	�=��<��}h�}u�����Td��<z,g
z�V	���Xc����?��V(����{*�p3W���:�;��������k����o����TG�c���@	���P�^f�ai��Q�g�N0.�)�6	��������c8��NSF��]{�~��e�I�|w�=1
�
�C�,4f��x��Gv����3�4�jD8
�k��4���E�'���k����YP��+���4RXs��B������R��6����Ts�c���	��Z)6>ZF�a��d�����R�|!A�~$K�[��&Oz,��������3���
����L��:Py���������1`����<�ah���)?�(���l,~�cB��~H023�a@ �x�t�Q�'U�y'{�J��av;��I���j�9����JUfj;S�|��w;��a;�(5������=���;���s����������VW|R���C�������J6�C\p�n���^�6�Z(�U�0�>��2�q���1��P���Q���W��.���EG�fV�����X�`J���ro���������>/V�����(��^�]�QX�v�N�����T������Q�m��c]��4��� poQ-YB��z�����_�"��0A�A�kC,:���8�RgN\��]��������A��1:�����,����/�5�	��� ��
��Z�Xr���M���O`�B��G�I������_�/'��x����'�k"�&$e�/H������o��xw
���� ���������r�K�x�C1	n�"z��4By��]`I�6��c����H��)����Co������y�k�3����)�S`�`���������z$"w��X]{w���c�?]��t�� `#�]4�=:�F4�;�C&�s�0�����ee'����#���0����~�	��Rt�y:���\��K��|��H��a���������/������r	C��kx-��?G�z�)���n*�pZF�{�r��Y96S��C>�B_m��(S���E��1>����
��Efj�>x���j��p���
5@q���E�����0Xj�@4��rI�g)��b���,��m�zy�=��k�r��}�x_��r�����2In,a����M��W6����wjk
�;$5
�p���R@��������_�O�-�!�u��ngV��l�@=�&.�/�&���P�~�c�1�R�)Y�O�boLr�R|���u����k�mKy���y���K�?/�e+����kz�,m����[���9��t!6�gb����E�5�[|~[���6���s-o=�����D%m����EU{����(������L�2�������1���R9NkB��NO��:81��I��_H���
�����&�K6Y���Q�L��Fx�����,�������F���
��9�x�qH�y���+,������Dp��a�����c\�cN�{
�J�W��j�:�����d�wt�Lc�1m���m�C����8|@`���-$����
��yF5�&��(?*�	�k�"'�;�����g�~;���Z��w�2��+�HZ#G�kM�H��E�]����3:�����L�L�w�Z�7+l��y�r���|,�,�k�1���|�/�E�9�GB�d)ol���3��p5�jy2�������-@�����}l�9<$�x�g�0n�\Q`�XX-?OXO8���\-�K�($�u�;���%n��u���K [WEe3'�=-�L��m���Y�n$�H{��`��q?�`K2����D�4$�^��
�h7]/��dN�#\�[w&�s�V��8�L�F�UCI�6�~���;X���}I�
��2(�E�������3��O�O�C����]�l�"Y=��	PAT;���&JI$������!.�h�.���9����6�+���gfQ�(�=��N�J��yf��k>3C�Bz���/:���*�?��S�~4��7���O��9,U������(Y��Gj���Kx�\�<���A��{�dP��F�������O�������7��0�6���&s�p��R7f9�Xy���������fWF��%qHi��i��l��-��2Ql>����{��txc~��fO�K���g�X HI��C|��p�bSR����]��z�8�U����$�������S�����@��"5�u�aI���7���d>���2���~��9�A�����{C�c�	@?&��'A�+���%�t��.�K����(8����<��Zh���8��3���N����endstream
endobj
41 0 obj
3112
endobj
44 0 obj
<</Length 45 0 R/Filter /FlateDecode>>
stream
p"���<T���Z��W�mI���N6K�3�0W:���?wZy<A��R���^������Zk$e��� c�P�����s�IJl
����/�"��3�!�Z'�]���$zS���$|k�������<��R��T��J��V�)���A���q���g2sM����4��Va`��E:"I 1�����
T	rO?u7:#P}R[P��P��nx�T�TKT+�T".RF��_"�f�5�_�M���1���
;���UXXz����P���r5=&<�D$&^����9��!��i{�����Z��I����%s�]�-���v����l�y��2�6��"�\a~����HJ_L,��J���k���~��*)�Z��Aq�P����a���o���0�2������\nN>&���7�Z�������WC6@^���7O��9�}�e
��,��]w�J�=�������^���![��^y�UU����J�;�T�wzO)z�W�h�P�pa�e��J��LSd�y-�B���y_h~�j�E�xs��I���8S��B���*�B���d4Uo~������,��{`��Ms(�K��T�<G���dP���[9>g-&u���W����:O%���9�m����B1<p��'yC�����N�����w>}�)��d�=J��2��+���hG������c���.T$�S��k����A�$>i���2��Q�����\aF�	�a�d�!�T4�o����Z��\S��"�r���U���C���m*�~���R�d�����=5�b��E/�]1�����R����h��\�!��_X�{�����E��m�5hX��+n�G�������}|:�~��^�Z�iMVa\hg���'����Z�����sw$S���D��#��Tj��JNUP@��(�����2�2:5����`o�s�>R:���{����*��k�l,x����n~p�w�K��|��+�cT@�������^$����en����0�l.����������OeCb�'�V	=�a�B���-t;P����b4��"p�
/��i��oJWy1��=�������E�5c����0[����u�'Z�n�
���n�}�#����2����m+��b�JV���*����y����?���'Te��#<sQ���5��f�Q�%N_f�6��C}��P��s�J�!�{-]�<����]�D����{�V��0z`0,��6!%Ta����w�X��z�|��WO�{�]Y&��a�s?��QZ���h�AT���e�t����<;�g���l��%"("�;������Z
S�
�������u8g����E�,Qq��L�c[T�OH���H�p:j�w�EMX-���g��8�~���
���G� �����C��J�l�_bt��9O��j)�g$[��j���}�.����Um�U7���<C� ��6$�r?�&�!���p-��1F8(�:�qQl�4�T��|wuh����<�I�v��F�0��#��e��LC�`O�/���6Q��e�Qv�I*��c'�:Qt-��L�}�����
�y�P����\���@��u��|�!g���q�����cHx�P�s�4�LNC�=I����eV�FMT�S������/I�Zg�\r�Pe����,��{���{CI�|��$��v�9���7n}mX���"�*����-Q���>�i������_<������L�n���D��b���#�#�
rQ�EA=9�@9��{*�a-)���S��q�#2Q�������|����pG8MG	U��Y+����Q�������4C�vV�� ��5��bN����V������$��"�c�9it�2�,:�&X+�T�,3��=���f"%���|���|����@IK]��>���$�����
������F���qm=q�2�5�
�>Q!~SF��J_��*��Cb����R�����0Q
�4��y�*��
�����&�7(
	�u3��_��)!�?:�P��rQ������Iy�w+�(�wSm�#j��Wl;^���{fO �c��	���.�Ph�32�T���6AoT���?�d����.}�n��j��xF����5H��H\��_u���2����mp�����}9��}��R����o��
&���#�r�y\�C���[�����|e�o��[B�D�(^��t��
IU�����������c��OxK-<���M���D)^�3�b0~��\���o��� M�������QB��d����A1G��"~�:*�O�Y����Wx~�1�{m*�V�3��aWr%�T})�	N[�""Cp��@�QE�Wy�7�N������K+Ajf����~{��P�E6&U�S��?& ��c-�t�[�A!j;�^���&��0��z�����I��G���z�}�(����@7e�P��Iz=�g��T�uX�����Dn�G��]~������h<}jKv�
����������DN��8
V��`�����@$I�����u�5�W|�Ks�����vV�Q���"��d.���
wE��I�8�4H��3c��N����=���;��*��h����R���Q%���B6_���D��c�)�GA|U����������!���
�2�I������ J@Z���	�|��q����;�N��e^�|�(��� X�,T22�{��  ���aY����E�+�I�L �9l��5N{�-Fb>���!�!9�p��v�<)���o3E*s1N��U�:�f����m��@lt�^-+���� ��hW?d3�m��gS�j�Eh�H�
��Z��� cC�����5���X�
��V������ttY�7�����^�I����������&Y~���@1�C�1xF�9x�
?A$���I%t?��@��HH/Q��0o�����o�\U��h���}@�1s���p)�>���e���qKA�����a��H�U=>Zy�-�Vb�?gp���D��tP��^�}�W�����A
P^�(�^���OH�aI2�%j���J&w�����IN�K~�AX}��TN%��-J������� ��,�����=.����1dP�������������d��[<�{S��H>2�o�o�
�wi�y��-M����51�9g�4���o���4dh��'Aek��l:J�dL
FQ�����������Z�@�o��eO��nV[S�^Sv��� �X�q�%�R�	H���eSj���G�kQ&�*.L���%)������D�y#QS��U�!~[����Q��������j�z��"��9�:w�F(E6D���Z�zv���q� ��/��9^�J;)endstream
endobj
45 0 obj
3307
endobj
48 0 obj
<</Length 49 0 R/Filter /FlateDecode>>
stream
�{g�v�U`�6����[(�-N��������Z�Ca��#��m���������U��-�%�g0D���k���������Id�f�1�-J��O�h��
$�dr���$FE�Y���G���'
X��@��E?�<��GN�u5R<���n��������H�z|p����~�a��/��d��N1��D�k��� f� \.����h6pwb8U5&G�,�����p >U]�{EP�)���!J:xZ�*56l2�5�0�������l%�@�b�LB�b��	�<d���(��6�Q>{���Q�QO	V��C7_6�;�����`����g3�]��5xB���w�RUXW��S������l�T�e]�sN�	�{�p@�"%���6�T��V$~��b���@�(���h��������N����W��#^D�kPM�y�@r�{��m�]�V����*��6��py����,o��FS��8��� ��srA��QJFW�}D�/jE>yr���xRB�g�`GV����	]��"�f�QeA�/�,���E��=X�v=O�|%J� ��9��[���R��]{e.�$��41�MkL���t�TZ�����8fhx���z4��WE�Y�,x�3C���}_���xmVRuyN-��������@���p� �2�������
�Ds 
� �K�u��Oa\��������[I�?�;c����	188�y*]����C�*�P��D�)�B�S����r�T�>W�= ������^�����s�����;�z*zKf�'R����� �.�������n������YQ�����W�X�2�
�K��:���7
�*���Ymd��e}��
w0n4�����e���$Fw��-H�#*��~6�r����X������U�D�#��!�`��L����*>���tQ��b�)RoL���g���j�'-�����T�^�Q���a#5����1.��Yr�-K��R���0�JK���v5�e���3���;���o�e�9���{U����m����t����#!:�Fk����.�A�������S���V$t���W��Q���t�.����4��2�"�����������7���ujTE�fd�����y���I�OK`)��a��(�~^��!_C�JQ��lD�DV
�A��P	$�1����|�O�98�>"lX��4dg�%Q+�Q�o �/�M��YK�3�"I5�f=-�=>_�rh������W��&�`U�K4���R�W368W-�%<{����_�~7�fcmr�����R�P�9�`�q'��������Wu(-�>�����A;���a.�(kH���x��d[�U*$v==@7w��_�������02�'��c!�� ��bS���s��#�:�*�G��'7M1+����8�.��l��6�51�Q�����|��"Y��$Yp����d���o��nf�]8��D�v�\&���*x������$�k�*}�;t5T|`���'��qq�9�R��=v�n6�fQ�p�����}�#�]��7&]eZ��,��������|�o����P��:X�G��o��+�����c_>7���"�N���:��@�.v<k>�X}��A�G>�{P�
��A�����T:��j��fJ���q�����B�1��4
����H����T�PU'������r���|��)�@���	�;">���Cb�5�F�D�m�BW�h\�jm)v��'n�������4b�)���N��/!���������x�K3�����sy�����������d�V]T�4�K���o��7����%�vC>���=��r���wa�W�����*�zx��aS�����Xj���s"�[�O�(����L��{b�����`?���G�?����sU@�	j'��6a%w�h�G'�A����]��ml���Hr�@�������%+�~��h?���
�9�o�r��	���U��%Y�
5���@3���7��R��x��-��{�*(������^��t�J�MZ���w�-c����5�p�c����R�b�N/�|s�DNpEY��~M^�b�"�@�Jl|��9�2$wt�f�������,���eWy�i�,��2I0��f	���Hs6"Ckf�r.���0�i*�B��A��|�����q�jG0s?:�JY�$�_@�������o����p*��%�
��6��A�����������)�����:��	�1&)]�����S���oK�2�N3��
�����IB�N���>���0?0�h_���-
c�&��_�j��]G��[#J���w�����e��XP
��v'�as�'x���=�.
�rrh�h���#�y��N��p�Y��_')��q�[������Vqm������aH���M������+%�y��/,�B�*	4���2 ���^�`��_�V��S�����-E��N�����2(�1-N���{�uml���e��2���`����Z������%�Q��� ��W���5�P@'�i
>�S���~���E��y�y{�FN{��)���N���~s��4�b2�=��e���2N����
�8��/�"�f-��h/5ui�L�gM�
�e�IV�Z-1X�8�Ov&".>�������%��1}�y�LX�<��Z���>xO5�\�Q8a���{1iPjT'O���WN;��7�%�]���7�?LNR���^�II���}����;�:���d��r^S�
������Oe��#����`KL��^���\���:���*����@���,d�dVxS=�����x+�����&���g�h�H|�o�������T��
�S�|�������l�D�������f2�N�*�c�� ��*
�}�{D�2��:3Y����C8��	A�uj�^���7���Q�@�M�>��9���b�������Tn��"�y��������h����9H�k����$��cPO��Jo ��Y�@��|��^���w��{a��K���.�`�/9��l�cd���+e�f~��C�#gU~7�����?'�>?��br��LB�(P�3��+�Uq�t���Z�;.�{0��T�)�p��8�[����A����"�4Hj��^���N���3�p�N0���.L��� ���)u��dA%m�o���f��7����\�B�*�UMb�%N�t�|a���F����X����R���������Y$b�����s"���cWg\P����<0�����3�S�i�!���1�]������1�Z����"z���S�����+��rZ��q���+_���P�'�4�$:�[�v���a�3����$x�e�p
���&�2�9V+
!.����p��K:E�
���;�����
#�&�<Nf
d��W�"a�1b��
vV���Oxp�a�I���CA��,j�������[S�IM���u�o�?Y�2?�R�������L�m��z�"I�3������p/N{���Am1&��e����O�E��,�������_SmR����?T��/��|L���}�F���9-Hg)�M�H.���Z����:z��'K���6/����&U���L���z"���	)�3���_ZF|�������|�F�5F����T!jzplAS�V�n���V�:	T�p��#b��a���������l'g�n����� �
�3�2��>�)�VD�N)��O4fdn~-QVf/�]�����<���|��=4�����Lw__�������J+W�h��*���f�������yy��Ah��^���N���R[ )�D��%�:�����h�u�����~3A0R����K�(q�����T}����A/�_�%���5���� �����T��������>
*�
��!�^�O���B_��U1G+8�Tn����cc�>VW�vm�Zl!@;j�&�k+�+��p{UxP>��@		-����F�d_���yg�PTI����>��[��p���o��o2�������E ��PQ�Hz���\���$'G���v~G���P���
�9/5��"���WJ$
���/�h�7g��`�'^��������b�]U�yb��W
`yz����]�endstream
endobj
49 0 obj
4157
endobj
52 0 obj
<</Length 53 0 R/Filter /FlateDecode>>
stream
���B8�tPFT�3�N&�#'b
��B�8Ds�9�++5�8�]�5��A^�*���+_
��l$Ko����
xq55��8h��.�=�	>Z�
�v�][1��e�p�n�^���'��������*h����d�-n��7ZV�%�Z��w��$������3�1���:�8���>�B{���,�x���%H�;3sg����2���T���Vn�^'2*��_���A���g�� �:��?��	�z4Lck�y�R�������,e��.����0$��D���u���z�#"W�)l����z7��:���,�m~���=�XK`�+���W���,��O���NT����"#��x�@m�8h���ew)D����>2����x�����w��������M1���5����?�!Q�M�d�_�Lk���@OP��H��F�h=�T���T	����S����=2.�C"����T��_wJ�����RH��b���S��_�MO{^��->/�PK���,���l�����`9a�P27OW���������=����� XHk��~�����Ts�M���}���7|E�l��(���ZP�J#G|dt�����]EC+}�w�q��i���v���}������v�Az�<J�r/�"��|h��������Y|�p{4��5w7U��n���V�-���(��T��f�?�2\��a@���s��s��`�C����P~-4��Zc��z�V7�iUbc�;sD�������SW�W^���#������!���<�����]��qr��T����;����h�R��*f/6��^$�k�PGs��
	�W����ys�;��_��E�8�Q����:���f��
B��+n�������P����6P�g�Y�����A������������E�C��/������E%��������)��������!��f�g����
Tf�&�����<�:�)�t��������	
/�E�N��q ��Uh�.q��B��"������r�G�-�(��h��)���T���:|� �����x��2'�6�"=���`�et�<����������$��^����l��g]����0�sO�Z����f��}O�z#�l����-����U��J��yF��~}�d�N�+1�_��*(���;5X�e�Uf�����\�E]2��CJ-��
7�"������B&hvpQ���t�����L�W���B������:f=p���W���z}D������@�x��'�$�c�4�b|��"/��G�����[��e�Yd.�mr~���4��������7�0>��������!���J)u�N��80�}��S{�XMwp��2p��
�;�5���
Y��Mz��Pr�N���V'�se�)#�
T[��m�*w�����e�*���*�����e��X]-F>3?!��w-�%Y�����A����P>��l��)�SF@6��q������S��9�J����{�6)<�����;���6�3X�q~#�b��}����T�k I���
���ug�|�>����g���'�I����/-d��9�H�S�� �!�pjr��h����2�V��}��)�8��5R���}T>b�f
����.�������]�[e����va_�:�7�N-Y��cI�{����w'�u����E9#~���,F�J��m��X�j��0FJ�N����V��-�'����vJ3���
�I�m-e���T��]��,B
����Y(�C��-�'gy�O'�$�!�v�zi5$�R�\W��t����5���j������Wo`&��7�D����bv�ttV8�����o2�.�8���������
��
<y��g~!�>F�&������?�v2���ta�����������A��b��r��V���J�kB���T/N�t�������8{��4�!/��HJ��:|\�i��7��Z���U�;�hW�BCG'���[nv%��HL>bkR:,e��$�i����BT�lJo���6��A��-S@w�;�5��2���Rd�z{�O�'8�]z�����4
+�uMm�#Y�Z��T:���r�j�F��Q5��}�z]Ue[Kn�_)�-,��������>y������#��JmK�m.�
���?�~����`:%�5J�]sl�*�j���X9�D<��h}��'\�D�;B��'h�Z���doW-���]�����y��5^>���H���;l�����������hAg@��X�.y9�(��k�3:�<V=}���^F�:���`v���x����M���*zw>�����
0|���]��,���O��
t$'I�0�6Q��,N�0m�7�3���S��&���l�����]�V:�����������pW�~6�(3�+������IS����K�s�r%%�4*���
Nk��i����G|�Q�>�8���1V��*x?��&��Nq}Q^�q5O��������V-
�����[�y�%G
�����Q<d���v��g���:��{�m�����%��!X�c%�k��X ��QHo��3\���m�P&)Rf��il�X4�z
�=��f��k��_a�^����Vi��M���c���`��wqxT�K��O ��:��P�s����AD�J
_��H����2PgD���W���	�����i`P�&��*$�x��g�|�F	f?#���S��
�V[K�sOG6@�����2��@�~�N�IT�[W�9��L�S����TnD�@���/L�������1���+��=�{b�[E�}�]���R�����`��?��h/�q�^����
�]Uk�lpA;SX����	���PCx4
_����Fs�)�����N�]����k� ��0Q��Z������Sl
����_�I�k����B��E/�����
���(\�<Q�Q�h�#��,a+��9�.>�&�=�
�M���4�0�*8^�4�,{!.��kR�wD[�3��4	���'Y*��7�0�_���n��	���z_���Ah��?�%�i�_/9������~���:����,<z#�����F:��9M��SgJ�K�6��7X���1�bD�
�����4F0�K�+�v����R�ec��sAA��d��yj��p�E�`�HU��b��'"���4��\����c~ ��~[���S@l�1E���M^
�']?Vx�0.����/��p%W�?\9����Y/�
��v�LZ���]$T�+�f��@.��4�������7\�-�y?���8Q���)h�Z����o�\�P	r��tH�_��9y� N���M%o.�h�������:���&�Ch_��%4��<�'��mW�����+��)�
Ss���l<vsxg���L�����.p{����Q-�d�t���I��o����K�V�%�V�
��1�cL�6����-p4Awz�����m*���o�f����x^.��|�6]������N�&@])RF�(F������i!0��w�e���N,�EW�|*N\V^n�l�����![u"M)s�2~�fT��'+���#_&N��'j.��+�������_n�s�I,�X�U��6p�%��2j}��E����H+�Lwrl����,I���p��M9����������
(���O�%�W`XP������&��
fWL�M9*�7��K���N�0?�W�P{������7D���G��9���_F����t�]��cf��5eBU���"���DLk��R�`�����5���e��@*I�~!Y�1kU`t�72��?eFGUkY�*�k3����Z���M���0jg��+�����7�Ng�V2�4d.���O�l�����D��I��r���G�zB��
&�
2Z�F_�([���N�-�)NR�>I$��p��v,__�`�H����+�\�������Y��1��a��YT����0�I���~Mi����80��8yp��|�8�C�������rr�0�:<�p$���Ds\n����|�I���J`�f�.vD^Y�W����Y���DF���d9���#��`1S�
1�l]�|���BN L�=/�{�#c�e	x�
��K6��.q$@��=!�1:CP���M�!��+�C���"���HIzZ���+�,\���
n'������$�I���/�����4�:*�E�jO��"9Dq|����TP�b�^�,�$f$jJ;��;����F Gh="
/��zc�4l
>��m�v�V|O��DeT�f+W���B��B����B������B-��a�(�����D��<���<?"��L����	����o��Y�
��z
����XP���h���80����Q��S�������~Nv!��2���4I��1P�|�`�?1��R`�z��!�j��<��]����O�6X<5�����y��J4�4?��&v>&h�k�[@�I�*;S��T��#�w���"���z�=>������
m�3[��-j��s[�~�8��?S-5�����!#��v�z^�������g��
x^��owx�5/�$�T45I�^Mh���endstream
endobj
53 0 obj
4607
endobj
56 0 obj
<</Length 57 0 R/Filter /FlateDecode>>
stream
S���w�XY8]�V���z����t�.���d{��{��$�~��4���9U�x��h���H�������s������_��s�][ SO������@�5t�x���4�P<%h��ru���0��}n����������(�`*t�7sF5�]�9:����m=c�~�n��Y������[k��@�YJ���h������t�~�Kz��]�G��zY���Bm��l� ��c������3���y���f���o���Qk���-g�>]1��Nj���wq<?�����e���U0���x[����lI�����,��-P�?h�p�9�0��M�?
�0�p��G,����"�[7��t��K�V%F�A���}�\�w��,dR�U��e2
Fjw��,��A���0�����c��a�������/��������Q���^�T�* �<�@?�9@	Fv�����z)@�
���
����[����(#�l%�[�3��bCq{��zd-(���Hs�Z���d+j���pwR��Hx��1����=��E8���
.
D;ZJ������1oZ����o�� �",�k��%���.qW>v
���������vP,�{3� M��4��o�c�sdo��/��V>�[w��$/�������j���P��U6�+*����D��i3o!��K��V�f7MH)�xU�8����b�Z���1����Nc*`�a�����D�X���Q����������d=
~.�����eDohD����#pu�5rW+����`��E�+�i�o4�+A/V���5���$5����q��8w���y�z�k����a�����_�!���JZX�C5�SZz��}��w��� ��f����e�+Q�����^i��G�z=$byXm����!����u����d�K*L"��nY������?]�:��&R<����1�F�-��U��M�I�g ��qhd�G�RE\�
&��vE�?���R�v�	}��5��0P���������qo�i��U>�q��.d<,�c���ug��Wk���Gt�*�pH5Q#��7.U��~�����3=���B�����^�	����v�T��X�����*����##�!������t���W��|�Y9��'�U��R�w�7
�j#�,�R��k�K�`�N�rO��y�4�P�$�l��ic����;�^}������4�u*�vo���x�!��F������c�*Qp�%�C�Zxu��e�'�����J"uM�oI�n�#�$Y����/l_l�`�2%��������WMf�Ck7�L�E/^
#���hk�[��u�8��2�L�J{��C��Go��������bIa���T� C"6lL
)��Z>�r�34�y�o�Z<a���������-A����^W���R�4a���|.|tCU����*<{'�.��<6M|���FI��I������G��h�1x��9��pR9PSH�>��RK�m�o�G�W�.�r�Q>y���[�4����pSh%F�
���us5x%D;���7�� ������~'#6��NxU�������7N�!��?z��l�u������dr�#&���{��sS����I�3�l?���n,�L�n��H����l��W)p�S���Uf��_���P��tD�|��������C�}����ri���K� ��P�����+M�t������R����?����O����y��:�' ��.
�O)���w�bQ�u��uX�PhtM�Ayj�P^Ic����VT���l��8�	SX�m�|�i�����Dd�q�AJP���������=8������(��4�_Zvd��*x��;�F?�Od:={$&O�nD�b=�p������/�gZq������)��=*�;^I
�������2cm��*v���h�fW(�"PD��$, ����U�
���:H	���V<Z]���<���u�����|�e��~�Tz���H����'
U��	����F"x�
L����h�Q�x\���.�E���:����
�w^�J��U
8����uo��A��v��Q���<��)m����"��8��3����N3o0��(�}P��[Eh��N#R�@������;U�Z�
�O��2����EkCbyY�2.�������l)�/����F��"���*�:�3�hp"�t�����<�L���K��4|��]��>e������p�����.����D2��2����dv��m�})q��b��v?��5U�k���#�B��1������k��E����ReJ����?�> ��`�|��f�=�����\����&��pm��e�	S;K����-v�v
|w�����������Q���i[l?�9?g^V���
����a�]w�]Z�%��?3�UklK
���m.e�8a|?~�x����om�������
.���&�&U<_�w�v_;�|#��!�� ?U��Zhf$]aX�@�Vv)u��'������xW�����'�����0�������|�w��B�99
^?*�%=��k��gX��Z��v����@�E�T����K�;�{{`����^��U����l$@��n��,�j�'8�Kf6�V�*����l�n�{���u���
��HA�?d�����=��4F�����_y�l����z
a�����i�����`yj����$�����
:�����%u|���y�D:�\L�)x��:���\���70�H7��(F"�~ j,������K���������V����^�%J)M��);�A�����=�����v���"�(g�e��W������)��\}���f��cT���q|���g�����>@��Pw�8SJ���7�Q.�;"�����(@�B�����g��~�p����^S�\��r�\�'�3
i�
�2K��K�
Ez�,����h9�I�8,�;0N�������\?�+Ty���
�����{��I��'i�����?����S'����]��6�7����,8�J9,l���FBnl�^�����v�}O�!	T���c��Yf_����8������0��O��z��������O��I�o�`%���(�[�	��A#��#��/o�Qn�+�F�wW��+�DG���SY)��T�vd�"�3]��~�i3���%�\��C"|NIa�_���r����9{�E(���?b�h��e�]���>|0!�75�8���`KA9�g���O�f�5��+�8�x������m%�E����k8-�����[�q�� �� ��l��Wq �z��Y���������.�YDSQ2�m�Bz
��g������+��<����]2
g�"��#D9���z���t*��� ��"��{dendstream
endobj
57 0 obj
3376
endobj
61 0 obj
<</Length 62 0 R/Filter /FlateDecode>>
stream
�����,
u�z������,o���c��<�_@�{�y0e��!=	����b��m1~�x�����+k�TT/�k%v��Z���<`=d�Fd�C���h�����1#�I�+��2r�����5�^�6�
�o@UOa��.A�[Z����emLF�Icy���MP�P��\�X"�I{k��_�!U�|L:
�����.�Y��T��2r����8���������F_k=A,1��\���K��]��V�y�	��A�0
f���</��5z<c�}�������F������V���
��Ey���1�e
��6�C
�Vc�3��1U��qCYGbW�����<k�����V�q8���=,�Qo�k?S8����0�������m��<���dl>�:�sZR���U����td;�W���m�3?�=*I�l��:�a�MG\����*f��!Bc���:N~%�=�a���[���`�%��
�������l�dE~�{��\KO@,�$��W7�YjiXU��cg��+-���aK���\���&����G���$G>"�6��s��7�G�9]��=}�^:
����;s��|���i�xSJ/���iw������q5c�d'�f�_��-��2�N$�g���A������l�K�5�?�;waP-��m� ��
�h������D�:��g�H&�?�
%�AH��j��Q3�6������9h3�����{6V��o ��SqL����E� n�������b�w,����4��g���b�^E[���?���O]���h�B�8V��u�)����bSvwzo�^�E�/=n�e��],����;�D�!�h�e�^o���U�
���p�ylo����������k(�t�
K�Q��*Ya����3/qf g��o��jN�p�4Z��N�L������p�f�~��qj�<e��>l�ZvJ���J�G+��Y���S������j��7w�`����*����b�����L^Uc���7��gW[�����$���*�j�~=>�a��0�P��wvd����~lkW�NB������ ���������V�#!� �W�a
�L>���{:gdl�\��:!�}��&R�=����21?��/�v��d����$�vJ�/gX���O;��P�[���5��a��L������t�?�������Z��Y�y�6��m�-��]�B:v������I%:�7���)6dz���G��%P�IZ�!aZR���Un�o�E��g3t��2����Ww*6�Q{d���l^f����
���"�����9G�c�v�2v��|�_J����|t|��7����v��� xj� "F-lM8�>�������/��.���~�AeI��_��A�[Z���d�[����}��!�,"1n��e�7DL_�{)�sf)-�;��������@����?��4�g6���R6�;O]�TF��	���t�+�i����9'#��{6E������Y4/�\8�*CC�Yv��LV��d���������%�T U�y@������C�dv�9D���w0�gkMa`!��bk� U�.�����n?w����]>��nxab��|�~$A�����{�����/��h;��?�5m� ��Bh�Y��_�9#�eGH=u����oy���&GQ�nh�����1���4���+������	��_I��o
�����80������Z��{�V��F��
��3,!�B�'F�H���a8Z
�b�@�(�#N�k#N'��l�]����L�X�`�I��e��#���u�}�������:���R����1cn���&�U4�il���/;�����������@������"�K<�A\�!0���=k��W��R7���	*�G@w�h�����n]9fc[[?V�@�i��A������f�>�����@�W����9;���P��M�0w
�����h�P����j��?o�`���
��].y�#1��@�\t:E�}a_q>d�Q�x[C6g���|�����#:��� ��>�|�bv�xx@��>�?���e���$��M��(�������_&���$�Y�I�������X����lHb�1�z���
6�z����O�M�$���@j�b1���4����1�D��p9��!V����e���8tC;��}��&�h^���P�X�V~�_�bHGO>�0�;G���x�6;���.����7,��P��Jz�)��U_a�f�N�8w&�q�8`���	c���)I��HM������"�2�O��NV��{k��������������avU ��\;��t�!�"Cp&K��N��4P��>
:�����RUw;Q^ .������"���>;\���`yi�SM���� .��2���2hjq22�_x��p�)���:2"��g��I�������1��	���&�P�g����BG��
3�����G�+0�����1\I����c��|5����@��D���4qAG�[�b����3b�C��4Y�����N�?�LNfOE
���c&#2����wt��"�F������p�Z
Zl����\D>?�,�7"�Y�fM��+��/�fV���T��$h(v��b�t��O]\���9�F���'��S0���C��!�r�[�:����x?:m������@�(��|4!�J�T������9��*���@f��6�-*yD3TwB������J��Z�2�!@D�Ws�dQXs"��i�A�2�Nyr*����5���c���eL�F���s��?�����n�1������sF����Q�>�X������8>������|�5�Z��
>j&Y~G��k�)M��T�i�,�����{w���y
��o0�����<�5Re��"�(</<_]��9�Rmy���C���1�n�p��*����}�?	F;��@R&�6f��X6����s��e��	�s���1��6~�r��`��]6M�_�HG�����a
[����n)T$9'�&����v��<�F���K~�eS���1���SpB�9w�n�7��:�Nr��Fw����������E��D�]�^H��=��(!�C�#
�
��uYAg�N�a��A���o~
���7K������k?:?5�p���Mw���T��*����FU�P��BW���Uo}��~����a�1�X����� �Sv?�X���U�����g��<�z��C��_�X:����Q"�4�7����e*���[��^vtM	�2k+�� �������6�p9��A�j�G�c���juGly)u�5�7��Z*~Q�I$�&�_g���6�b�����"^�f~��"�`K�`]B#�;
�L�~�+����,�jU&A��o<v�
)A�3��45B
:�6��S�R��NR>U��7Q���0B��5+����!"z"O��w_[��/���!��}J����w�X:�P�8���^��@���QKi�8�1��A]��h�����������.����%�+���
"�"�%,������t��endstream
endobj
62 0 obj
3551
endobj
65 0 obj
<</Length 66 0 R/Filter /FlateDecode>>
stream
��!�?>r��$�}0�:u�!�`���5�P���*�!���&-��1�(�:�6�U��zd��H��v�8�'���"�|�$+�B��F%���QkW�(\0g���f�����h��j�R��"����o�2Mmm���6T����H��� ���Gv
\O�U�b��Iw��%$j��"V���E���

x� ��p�%����\^c��07�P=8�/�,�	���n.�V�d|Nc�Nml���	>N�d"�����3lYJ�����K����|��(�xO�H4��`������q6�=N~�e�wN��M�JSzb�[a�8�i�
	[�<����2��,t9��:�����4D��)��b���3`��Qe=ZrrY��$4)��&b���'���o�4n�i�2o6��W���a��z�veC����aI�u�e=���#�*������D\�h�&�g���j�n��d����XA,�`��>F=��#Y�|����$N�?n�&~dE^�FE��p�u:gWk/�qh����������6
cI�"���Pq^�^'�9�Q��EB�`X����=�r���T�0�=�L�����<�=
���OeV� ������x��ka�	�S��K�w��J�&~�gc�4��N�������<��K�1b�<�K9�~u����X���pS�:O�:�A`�����n�"�H�bZ�?��Q�Z��@������������i�?s���9���0��z��+%m�NE�M�=	A���y�b^k�l$����B'�{s��8!�����*V�yu�D�o�>65k9Y����8�"�Xp���������:��m���8`���:�.:�[���W0:���U�e-��Y�,�W��
y`J�(�����/*m)J$'���lf|����Ey*��=�Y�����HI������c��_(I�M
�F��������6���9����_�mX�2��\
��'��l8�����^tf3��O^=NV�NLN������_�~&�\Xw��c=���eBQ^��j���SH�
�L�����d���6n���u�u�`Z�*��u����v���o�u��%���gE�]�V�\�����v}�g��ap+�p��)����c���iX	�Fz��_�X3/w"���~N]�P�N�����2p5�<��I�yW��W,�38�Sn������ge��A�P��,���������h��d����������t.�L����V�+"{w���o����2A�no�����aa:��������m��an�������>:�H-
�"V;�T]�)�K��@c�����8
�����P��yQ��aR:t_�5~�`�;0��.�����L���2w
�5�E4v�V��|��5V_lm��G�����3H��j����T�s�����z+��i�q��j��2�����������m���YR�u
K�\v�����+�2u�F; �D�H:����������j��
������6���J09�/.�*LP�@�Olj��X7����7R��_QB`����
r!OSZk�y��rm�_W1�h6`(��F;2������8|]�u�����3� ��[}����A:�y���*����,�R$����U�2`��>�X��	��7�j�1����zpb���8���[�������ES��F�|E�u���2�DJ� G�Hoq�����L�x��������)*����b��w�tM�y��cL
�b���W�Z�pL���V�9k���S��x!n<K� )@�=�u��`��`H>��g��H���Ap}����;3UN��zd��;
6������[�k����}�A���NF��o�Es�.y�������B��B������^��x�b��`�;R?�eX~ES��k#4F9|�.����\��z\* 9.�w��Tkd�3�l<���1OW���(���~�`�/��[���o��BeL�����k
Us���w2%;����T;Y1�7/��HZ�B��F��Oh�[Lm%[�Y�8����
�I� &���F���r����}��4����F��KF�����51���5���WvC��<O��GG:7��:?R���0f"���������C���M>?���"��]��yZy@,�(D4�A`q 
^��?��R����.�A#�)bf|��/w����k7Z����IOo�J��<����g��L��\$f��P���s���7�K]-_C���8�MlU����cBv�?�
�F��a��R��M��.�������9
d�f����A�#j�������6������>`�z�c���#v���-XGR�_� ��vp�LY0�W��g�������}�����0K�h�J4R�]��<cS�Bw�~#�nub��L�YW-���������j�a|3�����)y���t�b�����)�&"h7H do�i�����X{F���'�d�\�z~MN�c#�%�$y��O����]��o~(���������Gd�c|9fpv� �6�m�4fR�2�b$�a1����(��F�6T�����+����t�Y��E�c��Z����"e�Z	����"`���	^nD�a��E3K�EP_`��1�����8��a���Ca4���J���*��8�l��-��X�@�|^���&~��4����#%��	�s5�Q��� ��=V���j���O����C-*�p�:?�3$m��9%���Uf��#�x`���R��6����&�U6�
o���V��x�������QFj��wz���j����0��Z���6��I��{/�8`�  ���*��b�����5��s��!KA�q�d0���pD^ ���^b��G��������-G�3{Uq�9N�����I���ZHgC�8��^��gI�i��3�AbK�H�]�H�*��l�������.~Z���5�E�m<)6m��3������:�XI�e�RT����#�F�'�M��XE�kW�wG�e1h��T�c�2�PF�
�LQ���)�X���&e>�F��Epd6M7y���y�����9endstream
endobj
66 0 obj
3095
endobj
4 0 obj
<</Type/Page/MediaBox [0 0 612 792]
/Rotate 0/Parent 3 0 R
/Resources<</ProcSet[/PDF /Text]
/Font 11 0 R
>>
/Contents 5 0 R
>>
endobj
12 0 obj
<</Type/Page/MediaBox [0 0 612 792]
/Rotate 0/Parent 3 0 R
/Resources<</ProcSet[/PDF /Text]
/Font 15 0 R
>>
/Contents 13 0 R
>>
endobj
16 0 obj
<</Type/Page/MediaBox [0 0 612 792]
/Rotate 0/Parent 3 0 R
/Resources<</ProcSet[/PDF /Text]
/Font 20 0 R
>>
/Contents 17 0 R
>>
endobj
21 0 obj
<</Type/Page/MediaBox [0 0 612 792]
/Rotate 0/Parent 3 0 R
/Resources<</ProcSet[/PDF /Text]
/Font 25 0 R
>>
/Contents 22 0 R
>>
endobj
26 0 obj
<</Type/Page/MediaBox [0 0 612 792]
/Rotate 0/Parent 3 0 R
/Resources<</ProcSet[/PDF /Text]
/Font 30 0 R
>>
/Contents 27 0 R
>>
endobj
31 0 obj
<</Type/Page/MediaBox [0 0 612 792]
/Rotate 0/Parent 3 0 R
/Resources<</ProcSet[/PDF /Text]
/Font 34 0 R
>>
/Contents 32 0 R
>>
endobj
35 0 obj
<</Type/Page/MediaBox [0 0 612 792]
/Rotate 0/Parent 3 0 R
/Resources<</ProcSet[/PDF /Text]
/Font 38 0 R
>>
/Contents 36 0 R
>>
endobj
39 0 obj
<</Type/Page/MediaBox [0 0 612 792]
/Rotate 0/Parent 3 0 R
/Resources<</ProcSet[/PDF /Text]
/Font 42 0 R
>>
/Contents 40 0 R
>>
endobj
43 0 obj
<</Type/Page/MediaBox [0 0 612 792]
/Rotate 0/Parent 3 0 R
/Resources<</ProcSet[/PDF /Text]
/Font 46 0 R
>>
/Contents 44 0 R
>>
endobj
47 0 obj
<</Type/Page/MediaBox [0 0 612 792]
/Rotate 0/Parent 3 0 R
/Resources<</ProcSet[/PDF /Text]
/Font 50 0 R
>>
/Contents 48 0 R
>>
endobj
51 0 obj
<</Type/Page/MediaBox [0 0 612 792]
/Rotate 0/Parent 3 0 R
/Resources<</ProcSet[/PDF /Text]
/Font 54 0 R
>>
/Contents 52 0 R
>>
endobj
55 0 obj
<</Type/Page/MediaBox [0 0 612 792]
/Rotate 0/Parent 3 0 R
/Resources<</ProcSet[/PDF /Text]
/Font 59 0 R
>>
/Contents 56 0 R
>>
endobj
60 0 obj
<</Type/Page/MediaBox [0 0 612 792]
/Rotate 0/Parent 3 0 R
/Resources<</ProcSet[/PDF /Text]
/Font 63 0 R
>>
/Contents 61 0 R
>>
endobj
64 0 obj
<</Type/Page/MediaBox [0 0 612 792]
/Rotate 0/Parent 3 0 R
/Resources<</ProcSet[/PDF /Text]
/Font 67 0 R
>>
/Contents 65 0 R
>>
endobj
3 0 obj
<< /Type /Pages /Kids [
4 0 R
12 0 R
16 0 R
21 0 R
26 0 R
31 0 R
35 0 R
39 0 R
43 0 R
47 0 R
51 0 R
55 0 R
60 0 R
64 0 R
] /Count 14
/Rotate 0>>
endobj
1 0 obj
<</Type /Catalog /Pages 3 0 R
/Metadata 69 0 R
>>
endobj
11 0 obj
<</R10
10 0 R/R7
7 0 R/R8
8 0 R/R9
9 0 R>>
endobj
15 0 obj
<</R10
10 0 R/R7
7 0 R/R8
8 0 R>>
endobj
20 0 obj
<</R10
10 0 R/R7
7 0 R/R8
8 0 R/R19
19 0 R/R9
9 0 R>>
endobj
25 0 obj
<</R24
24 0 R/R7
7 0 R/R8
8 0 R/R19
19 0 R/R9
9 0 R>>
endobj
30 0 obj
<</R29
29 0 R/R7
7 0 R/R8
8 0 R/R19
19 0 R/R9
9 0 R>>
endobj
34 0 obj
<</R24
24 0 R/R10
10 0 R/R7
7 0 R/R8
8 0 R/R19
19 0 R/R9
9 0 R>>
endobj
38 0 obj
<</R24
24 0 R/R10
10 0 R/R7
7 0 R/R8
8 0 R/R19
19 0 R/R9
9 0 R>>
endobj
42 0 obj
<</R10
10 0 R/R7
7 0 R/R8
8 0 R/R19
19 0 R/R9
9 0 R>>
endobj
46 0 obj
<</R29
29 0 R/R10
10 0 R/R7
7 0 R/R8
8 0 R/R19
19 0 R/R9
9 0 R>>
endobj
50 0 obj
<</R29
29 0 R/R7
7 0 R/R8
8 0 R/R19
19 0 R/R9
9 0 R>>
endobj
54 0 obj
<</R10
10 0 R/R7
7 0 R/R8
8 0 R/R19
19 0 R/R9
9 0 R>>
endobj
59 0 obj
<</R29
29 0 R/R10
10 0 R/R7
7 0 R/R8
8 0 R/R19
19 0 R/R9
9 0 R/R58
58 0 R>>
endobj
63 0 obj
<</R29
29 0 R/R7
7 0 R/R8
8 0 R/R9
9 0 R>>
endobj
67 0 obj
<</R10
10 0 R/R7
7 0 R/R8
8 0 R/R9
9 0 R>>
endobj
24 0 obj
<</BaseFont/Symbol/Type/Font
/Subtype/Type1>>
endobj
29 0 obj
<</BaseFont/Courier/Type/Font
/Subtype/Type1>>
endobj
10 0 obj
<</BaseFont/Times-Bold/Type/Font
/Subtype/Type1>>
endobj
7 0 obj
<</BaseFont/Helvetica/Type/Font
/Subtype/Type1>>
endobj
8 0 obj
<</BaseFont/Helvetica-Bold/Type/Font
/Subtype/Type1>>
endobj
19 0 obj
<</BaseFont/Times-Italic/Type/Font
/Subtype/Type1>>
endobj
9 0 obj
<</BaseFont/Times-Roman/Type/Font
/Encoding 68 0 R/Subtype/Type1>>
endobj
68 0 obj
<</Type/Encoding/Differences[
39/quotesingle
147/quotedblleft/quotedblright
150/endash]>>
endobj
58 0 obj
<</BaseFont/Courier-Bold/Type/Font
/Subtype/Type1>>
endobj
69 0 obj
<</Length 1431>>stream
<?xpacket begin='���' id='W5M0MpCehiHzreSzNTczkc9d'?>
<?adobe-xap-filters esc="CRLF"?>
<x:xmpmeta xmlns:x='adobe:ns:meta/' x:xmptk='XMP toolkit 2.9.1-13, framework 1.6'>
<rdf:RDF xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#' xmlns:iX='http://ns.adobe.com/iX/1.0/'>
<rdf:Description rdf:about='592720b6-d3fa-11db-0000-0e5d311da407' xmlns:pdf='http://ns.adobe.com/pdf/1.3/' pdf:Producer='AFPL Ghostscript 8.54'/>
<rdf:Description rdf:about='592720b6-d3fa-11db-0000-0e5d311da407' xmlns:xap='http://ns.adobe.com/xap/1.0/' xap:ModifyDate='2007-03-13' xap:CreateDate='2007-03-13'><xap:CreatorTool>AFPL Ghostscript 8.54 PDF Writer</xap:CreatorTool></rdf:Description>
<rdf:Description rdf:about='592720b6-d3fa-11db-0000-0e5d311da407' xmlns:xapMM='http://ns.adobe.com/xap/1.0/mm/' xapMM:DocumentID='592720b6-d3fa-11db-0000-0e5d311da407'/>
#8 Haribabu Kommi
kommi.haribabu@gmail.com
In reply to: Haribabu Kommi (#7)
2 attachment(s)
Re: Priority table or Cache table

On Fri, Feb 21, 2014 at 12:02 PM, Haribabu Kommi
<kommi.haribabu@gmail.com> wrote:

On Thu, Feb 20, 2014 at 10:06 PM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:

On Thu, Feb 20, 2014 at 10:23 AM, Haribabu Kommi
<kommi.haribabu@gmail.com> wrote:

On Thu, Feb 20, 2014 at 2:26 PM, Amit Kapila <amit.kapila16@gmail.com>
wrote:

On Thu, Feb 20, 2014 at 6:24 AM, Haribabu Kommi
<kommi.haribabu@gmail.com> wrote:

On Thu, Feb 20, 2014 at 11:38 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I want to propose a new feature called "priority table" or "cache
table".
This is same as regular table except the pages of these tables are
having
high priority than normal tables. These tables are very useful,
where a
faster query processing on some particular tables is expected.

Why exactly does the existing LRU behavior of shared buffers not do
what you need?

Let's assume a database with three tables that are accessed regularly, and
the user expects faster query results on one of them.
Because of the LRU behavior, that does not always happen.

I implemented a proof-of-concept patch to see whether splitting the buffer
pool can improve performance.

Summary of the changes:
1. The priority buffers are allocated contiguously with the shared buffers.
2. Added a new reloption called "buffer_pool" to specify which buffer pool
the user wants the table to use.
3. Two free lists are created, one holding the information for each buffer pool.
4. While allocating a buffer, it is taken from the pool that corresponds to
the table's type.
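
As a usage sketch, the reloption could be set at table creation time through
the usual WITH clause (the table names here are made up; the option name and
its accepted values come from the patch's validateBufferPoolOption):

```sql
-- Bind a table to the priority buffer pool. Per the validator in
-- reloptions.c, the accepted values are 'default' and 'priority'.
CREATE TABLE hot_accounts (
    aid     integer PRIMARY KEY,
    balance integer
) WITH (buffer_pool = 'priority');

-- A table using the normal shared buffer pool (the default):
CREATE TABLE cold_history (
    aid   integer,
    delta integer
) WITH (buffer_pool = 'default');
```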

The performance test was carried out as follows:
1. Create all the pgbench tables and indexes in the new buffer pool.
2. Initialize the pgbench test with a scale factor of 75, which equals a
size of 1GB.
3. Create another load-test table with a size of 1GB in the default
buffer pool.
4. In parallel with the performance test, select and update operations
are carried out on the load-test table (single thread).

Configuration changes:
Head: shared_buffers - 1536MB.  Patched: shared_buffers - 512MB,
priority_buffers - 1024MB.
Both: synchronous_commit - off, wal_buffers - 16MB, checkpoint_segments - 255,
checkpoint_timeout - 15min.

Threads   Head   Patched   Diff
      1     25        25     0%
      2     35        59    68%
      4     52        79    51%
      8     79       150    89%
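
If it helps when reading the table, the Diff column is the relative gain of
Patched over Head, truncated to a whole percent; a quick cross-check in plain
SQL, with the values copied from the table above:

```sql
-- Diff = (Patched - Head) * 100 / Head, using integer division.
SELECT threads, head, patched,
       (patched - head) * 100 / head AS diff_pct
FROM (VALUES (1, 25, 25), (2, 35, 59), (4, 52, 79), (8, 79, 150))
       AS t(threads, head, patched);
```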

In my testing it shows very good improvement in performance.

The POC patch and the test script used for the performance testing are
attached to this mail.
The modified pgbench.c code is also attached; it makes the test use the
newly created buffer pool instead of the default one.
Copy the test script to the installation folder and execute it as
./rub_bg.sh ./run_reading.sh 1 1

Please let me know your suggestions.

Regards,
Hari Babu
Fujitsu Australia

Attachments:

test_script.zip (application/zip)
cache_table_poc.patch (application/octet-stream)
*** a/contrib/pg_buffercache/pg_buffercache_pages.c
--- b/contrib/pg_buffercache/pg_buffercache_pages.c
***************
*** 100,109 **** pg_buffercache_pages(PG_FUNCTION_ARGS)
  		fctx->tupdesc = BlessTupleDesc(tupledesc);
  
  		/* Allocate NBuffers worth of BufferCachePagesRec records. */
! 		fctx->record = (BufferCachePagesRec *) palloc(sizeof(BufferCachePagesRec) * NBuffers);
  
  		/* Set max calls and remember the user function context. */
! 		funcctx->max_calls = NBuffers;
  		funcctx->user_fctx = fctx;
  
  		/* Return to original context when allocating transient memory */
--- 100,109 ----
  		fctx->tupdesc = BlessTupleDesc(tupledesc);
  
  		/* Allocate NBuffers worth of BufferCachePagesRec records. */
! 		fctx->record = (BufferCachePagesRec *) palloc(sizeof(BufferCachePagesRec) * NSharedBuffers);
  
  		/* Set max calls and remember the user function context. */
! 		funcctx->max_calls = NSharedBuffers;
  		funcctx->user_fctx = fctx;
  
  		/* Return to original context when allocating transient memory */
***************
*** 122,128 **** pg_buffercache_pages(PG_FUNCTION_ARGS)
  		 * Scan though all the buffers, saving the relevant fields in the
  		 * fctx->record structure.
  		 */
! 		for (i = 0, bufHdr = BufferDescriptors; i < NBuffers; i++, bufHdr++)
  		{
  			/* Lock each buffer header before inspecting. */
  			LockBufHdr(bufHdr);
--- 122,128 ----
  		 * Scan though all the buffers, saving the relevant fields in the
  		 * fctx->record structure.
  		 */
! 		for (i = 0, bufHdr = BufferDescriptors; i < NSharedBuffers; i++, bufHdr++)
  		{
  			/* Lock each buffer header before inspecting. */
  			LockBufHdr(bufHdr);
*** a/src/backend/access/common/reloptions.c
--- b/src/backend/access/common/reloptions.c
***************
*** 33,38 ****
--- 33,40 ----
  #include "utils/memutils.h"
  #include "utils/rel.h"
  
+ static void validateBufferPoolOption(char *value);
+ 
  /*
   * Contents of pg_class.reloptions
   *
***************
*** 292,297 **** static relopt_string stringRelOpts[] =
--- 294,310 ----
  		validateWithCheckOption,
  		NULL
  	},
+ 	{
+ 		{
+ 			"buffer_pool",
+ 			"Table with buffer_pool option defined (default or priority).",
+ 			RELOPT_KIND_HEAP | RELOPT_KIND_BTREE
+ 		},
+ 		7,
+ 		false,
+ 		validateBufferPoolOption,
+ 		"default"
+ 	},
  	/* list terminator */
  	{{NULL}}
  };
***************
*** 1174,1179 **** default_reloptions(Datum reloptions, bool validate, relopt_kind kind)
--- 1187,1194 ----
  	int			numoptions;
  	static const relopt_parse_elt tab[] = {
  		{"fillfactor", RELOPT_TYPE_INT, offsetof(StdRdOptions, fillfactor)},
+ 		{"buffer_pool", RELOPT_TYPE_STRING,
+ 		offsetof(StdRdOptions, bufferpool_offset)},
  		{"autovacuum_enabled", RELOPT_TYPE_BOOL,
  		offsetof(StdRdOptions, autovacuum) +offsetof(AutoVacOpts, enabled)},
  		{"autovacuum_vacuum_threshold", RELOPT_TYPE_INT,
***************
*** 1205,1211 **** default_reloptions(Datum reloptions, bool validate, relopt_kind kind)
  		{"check_option", RELOPT_TYPE_STRING,
  		offsetof(StdRdOptions, check_option_offset)},
  		{"user_catalog_table", RELOPT_TYPE_BOOL,
! 		 offsetof(StdRdOptions, user_catalog_table)}
  	};
  
  	options = parseRelOptions(reloptions, validate, kind, &numoptions);
--- 1220,1226 ----
  		{"check_option", RELOPT_TYPE_STRING,
  		offsetof(StdRdOptions, check_option_offset)},
  		{"user_catalog_table", RELOPT_TYPE_BOOL,
! 		offsetof(StdRdOptions, user_catalog_table)}
  	};
  
  	options = parseRelOptions(reloptions, validate, kind, &numoptions);
***************
*** 1356,1358 **** tablespace_reloptions(Datum reloptions, bool validate)
--- 1371,1391 ----
  
  	return (bytea *) tsopts;
  }
+ 
+ /*
+  * Validator for the "buffer_pool" reloption on tables and indexes.  The
+  * allowed values are "default" and "priority".
+  */
+ static void
+ validateBufferPoolOption(char *value)
+ {
+ 	if (value == NULL ||
+ 		(pg_strcasecmp(value, "default") != 0 &&
+ 		 pg_strcasecmp(value, "priority") != 0))
+ 	{
+ 		ereport(ERROR,
+ 				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ 				 errmsg("invalid value for \"buffer_pool\" option"),
+ 			  errdetail("Valid values are \"default\" and \"priority\".")));
+ 	}
+ }
*** a/src/backend/access/transam/clog.c
--- b/src/backend/access/transam/clog.c
***************
*** 437,443 **** TransactionIdGetStatus(TransactionId xid, XLogRecPtr *lsn)
  Size
  CLOGShmemBuffers(void)
  {
! 	return Min(32, Max(4, NBuffers / 512));
  }
  
  /*
--- 437,443 ----
  Size
  CLOGShmemBuffers(void)
  {
! 	return Min(32, Max(4, NSharedBuffers / 512));
  }
  
  /*
*** a/src/backend/access/transam/xlog.c
--- b/src/backend/access/transam/xlog.c
***************
*** 5048,5054 **** XLOGChooseNumBuffers(void)
  {
  	int			xbuffers;
  
! 	xbuffers = NBuffers / 32;
  	if (xbuffers > XLOG_SEG_SIZE / XLOG_BLCKSZ)
  		xbuffers = XLOG_SEG_SIZE / XLOG_BLCKSZ;
  	if (xbuffers < 8)
--- 5048,5054 ----
  {
  	int			xbuffers;
  
! 	xbuffers = NSharedBuffers / 32;
  	if (xbuffers > XLOG_SEG_SIZE / XLOG_BLCKSZ)
  		xbuffers = XLOG_SEG_SIZE / XLOG_BLCKSZ;
  	if (xbuffers < 8)
***************
*** 8194,8200 **** LogCheckpointEnd(bool restartpoint)
  			 "write=%ld.%03d s, sync=%ld.%03d s, total=%ld.%03d s; "
  			 "sync files=%d, longest=%ld.%03d s, average=%ld.%03d s",
  			 CheckpointStats.ckpt_bufs_written,
! 			 (double) CheckpointStats.ckpt_bufs_written * 100 / NBuffers,
  			 CheckpointStats.ckpt_segs_added,
  			 CheckpointStats.ckpt_segs_removed,
  			 CheckpointStats.ckpt_segs_recycled,
--- 8194,8200 ----
  			 "write=%ld.%03d s, sync=%ld.%03d s, total=%ld.%03d s; "
  			 "sync files=%d, longest=%ld.%03d s, average=%ld.%03d s",
  			 CheckpointStats.ckpt_bufs_written,
! 			 (double) CheckpointStats.ckpt_bufs_written * 100 / NSharedBuffers,
  			 CheckpointStats.ckpt_segs_added,
  			 CheckpointStats.ckpt_segs_removed,
  			 CheckpointStats.ckpt_segs_recycled,
***************
*** 8210,8216 **** LogCheckpointEnd(bool restartpoint)
  			 "write=%ld.%03d s, sync=%ld.%03d s, total=%ld.%03d s; "
  			 "sync files=%d, longest=%ld.%03d s, average=%ld.%03d s",
  			 CheckpointStats.ckpt_bufs_written,
! 			 (double) CheckpointStats.ckpt_bufs_written * 100 / NBuffers,
  			 CheckpointStats.ckpt_segs_added,
  			 CheckpointStats.ckpt_segs_removed,
  			 CheckpointStats.ckpt_segs_recycled,
--- 8210,8216 ----
  			 "write=%ld.%03d s, sync=%ld.%03d s, total=%ld.%03d s; "
  			 "sync files=%d, longest=%ld.%03d s, average=%ld.%03d s",
  			 CheckpointStats.ckpt_bufs_written,
! 			 (double) CheckpointStats.ckpt_bufs_written * 100 / NSharedBuffers,
  			 CheckpointStats.ckpt_segs_added,
  			 CheckpointStats.ckpt_segs_removed,
  			 CheckpointStats.ckpt_segs_recycled,
***************
*** 8651,8657 **** CreateCheckPoint(int flags)
  	LogCheckpointEnd(false);
  
  	TRACE_POSTGRESQL_CHECKPOINT_DONE(CheckpointStats.ckpt_bufs_written,
! 									 NBuffers,
  									 CheckpointStats.ckpt_segs_added,
  									 CheckpointStats.ckpt_segs_removed,
  									 CheckpointStats.ckpt_segs_recycled);
--- 8651,8657 ----
  	LogCheckpointEnd(false);
  
  	TRACE_POSTGRESQL_CHECKPOINT_DONE(CheckpointStats.ckpt_bufs_written,
! 									 NSharedBuffers,
  									 CheckpointStats.ckpt_segs_added,
  									 CheckpointStats.ckpt_segs_removed,
  									 CheckpointStats.ckpt_segs_recycled);
*** a/src/backend/optimizer/path/costsize.c
--- b/src/backend/optimizer/path/costsize.c
***************
*** 4118,4125 **** check_effective_cache_size(int *newval, void **extra, GucSource source)
  		/*
  		 * Otherwise, substitute the auto-tune value, being wary of overflow.
  		 */
! 		if (NBuffers < INT_MAX / 4)
! 			*newval = NBuffers * 4;
  		else
  			*newval = INT_MAX;
  	}
--- 4118,4125 ----
  		/*
  		 * Otherwise, substitute the auto-tune value, being wary of overflow.
  		 */
! 		if (NSharedBuffers < INT_MAX / 4)
! 			*newval = NSharedBuffers * 4;
  		else
  			*newval = INT_MAX;
  	}
*** a/src/backend/postmaster/checkpointer.c
--- b/src/backend/postmaster/checkpointer.c
***************
*** 903,912 **** CheckpointerShmemSize(void)
  
  	/*
  	 * Currently, the size of the requests[] array is arbitrarily set equal to
! 	 * NBuffers.  This may prove too large or small ...
  	 */
  	size = offsetof(CheckpointerShmemStruct, requests);
! 	size = add_size(size, mul_size(NBuffers, sizeof(CheckpointerRequest)));
  
  	return size;
  }
--- 903,912 ----
  
  	/*
  	 * Currently, the size of the requests[] array is arbitrarily set equal to
! 	 * NSharedBuffers.  This may prove too large or small ...
  	 */
  	size = offsetof(CheckpointerShmemStruct, requests);
! 	size = add_size(size, mul_size(NSharedBuffers, sizeof(CheckpointerRequest)));
  
  	return size;
  }
***************
*** 935,941 **** CheckpointerShmemInit(void)
  		 */
  		MemSet(CheckpointerShmem, 0, size);
  		SpinLockInit(&CheckpointerShmem->ckpt_lck);
! 		CheckpointerShmem->max_requests = NBuffers;
  	}
  }
  
--- 935,941 ----
  		 */
  		MemSet(CheckpointerShmem, 0, size);
  		SpinLockInit(&CheckpointerShmem->ckpt_lck);
! 		CheckpointerShmem->max_requests = NSharedBuffers;
  	}
  }
  
*** a/src/backend/storage/buffer/buf_init.c
--- b/src/backend/storage/buffer/buf_init.c
***************
*** 17,25 ****
  #include "storage/bufmgr.h"
  #include "storage/buf_internals.h"
  
- 
  BufferDesc *BufferDescriptors;
! char	   *BufferBlocks;
  int32	   *PrivateRefCount;
  
  
--- 17,24 ----
  #include "storage/bufmgr.h"
  #include "storage/buf_internals.h"
  
  BufferDesc *BufferDescriptors;
! char *BufferBlocks; 
  int32	   *PrivateRefCount;
  
  
***************
*** 77,87 **** InitBufferPool(void)
  
  	BufferDescriptors = (BufferDesc *)
  		ShmemInitStruct("Buffer Descriptors",
! 						NBuffers * sizeof(BufferDesc), &foundDescs);
  
  	BufferBlocks = (char *)
  		ShmemInitStruct("Buffer Blocks",
! 						NBuffers * (Size) BLCKSZ, &foundBufs);
  
  	if (foundDescs || foundBufs)
  	{
--- 76,86 ----
  
  	BufferDescriptors = (BufferDesc *)
  		ShmemInitStruct("Buffer Descriptors",
! 						NSharedBuffers * sizeof(BufferDesc), &foundDescs);
  
  	BufferBlocks = (char *)
  		ShmemInitStruct("Buffer Blocks",
! 						NSharedBuffers * (Size) BLCKSZ, &foundBufs);
  
  	if (foundDescs || foundBufs)
  	{
***************
*** 98,119 **** InitBufferPool(void)
  
  		/*
  		 * Initialize all the buffer headers.
! 		 */
! 		for (i = 0; i < NBuffers; buf++, i++)
  		{
  			CLEAR_BUFFERTAG(buf->tag);
  			buf->flags = 0;
  			buf->usage_count = 0;
  			buf->refcount = 0;
  			buf->wait_backend_pid = 0;
! 
  			SpinLockInit(&buf->buf_hdr_lock);
  
  			buf->buf_id = i;
  
  			/*
  			 * Initially link all the buffers together as unused. Subsequent
! 			 * management of this list is done by freelist.c.
  			 */
  			buf->freeNext = i + 1;
  
--- 97,118 ----
  
  		/*
  		 * Initialize all the buffer headers.
! 		*/
! 		for (i = 0; i < NSharedBuffers; buf++, i++)
  		{
  			CLEAR_BUFFERTAG(buf->tag);
  			buf->flags = 0;
  			buf->usage_count = 0;
  			buf->refcount = 0;
  			buf->wait_backend_pid = 0;
! 			
  			SpinLockInit(&buf->buf_hdr_lock);
  
  			buf->buf_id = i;
  
  			/*
  			 * Initially link all the buffers together as unused. Subsequent
!   			 * management of this list is done by freelist.c.
  			 */
  			buf->freeNext = i + 1;
  
***************
*** 122,134 **** InitBufferPool(void)
  		}
  
  		/* Correct last entry of linked list */
! 		BufferDescriptors[NBuffers - 1].freeNext = FREENEXT_END_OF_LIST;
  	}
  
  	/* Init other shared buffer-management stuff */
  	StrategyInitialize(!foundDescs);
  }
  
  /*
   * Initialize access to shared buffer pool
   *
--- 121,134 ----
  		}
  
  		/* Correct last entry of linked list */
! 		BufferDescriptors[NSharedBuffers - 1].freeNext = FREENEXT_END_OF_LIST;
  	}
  
  	/* Init other shared buffer-management stuff */
  	StrategyInitialize(!foundDescs);
  }
  
+ 
  /*
   * Initialize access to shared buffer pool
   *
***************
*** 147,153 **** InitBufferPoolAccess(void)
  	/*
  	 * Allocate and zero local arrays of per-buffer info.
  	 */
! 	PrivateRefCount = (int32 *) calloc(NBuffers, sizeof(int32));
  	if (!PrivateRefCount)
  		ereport(FATAL,
  				(errcode(ERRCODE_OUT_OF_MEMORY),
--- 147,153 ----
  	/*
  	 * Allocate and zero local arrays of per-buffer info.
  	 */
! 	PrivateRefCount = (int32 *) calloc(NSharedBuffers, sizeof(int32));
  	if (!PrivateRefCount)
  		ereport(FATAL,
  				(errcode(ERRCODE_OUT_OF_MEMORY),
***************
*** 164,178 **** Size
  BufferShmemSize(void)
  {
  	Size		size = 0;
  
  	/* size of buffer descriptors */
! 	size = add_size(size, mul_size(NBuffers, sizeof(BufferDesc)));
  
  	/* size of data pages */
! 	size = add_size(size, mul_size(NBuffers, BLCKSZ));
  
  	/* size of stuff controlled by freelist.c */
  	size = add_size(size, StrategyShmemSize());
! 
  	return size;
  }
--- 164,180 ----
  BufferShmemSize(void)
  {
  	Size		size = 0;
+ 	
+ 	NSharedBuffers = add_size(NBuffers, NPriorityBuffers);
  
  	/* size of buffer descriptors */
! 	size = add_size(size, mul_size(NSharedBuffers, sizeof(BufferDesc)));
  
  	/* size of data pages */
! 	size = add_size(size, mul_size(NSharedBuffers, BLCKSZ));
  
  	/* size of stuff controlled by freelist.c */
  	size = add_size(size, StrategyShmemSize());
! 	
  	return size;
  }
*** a/src/backend/storage/buffer/buf_table.c
--- b/src/backend/storage/buffer/buf_table.c
***************
*** 34,40 **** typedef struct
  
  static HTAB *SharedBufHash;
  
- 
  /*
   * Estimate space needed for mapping hashtable
   *		size is the desired hash table size (possibly more than NBuffers)
--- 34,39 ----
*** a/src/backend/storage/buffer/bufmgr.c
--- b/src/backend/storage/buffer/bufmgr.c
***************
*** 87,93 **** static bool IsForInput;
  static volatile BufferDesc *PinCountWaitBuf = NULL;
  
  
! static Buffer ReadBuffer_common(SMgrRelation reln, char relpersistence,
  				  ForkNumber forkNum, BlockNumber blockNum,
  				  ReadBufferMode mode, BufferAccessStrategy strategy,
  				  bool *hit);
--- 87,93 ----
  static volatile BufferDesc *PinCountWaitBuf = NULL;
  
  
! static Buffer ReadBuffer_common(int poolid, SMgrRelation reln, char relpersistence,
  				  ForkNumber forkNum, BlockNumber blockNum,
  				  ReadBufferMode mode, BufferAccessStrategy strategy,
  				  bool *hit);
***************
*** 102,108 **** static void TerminateBufferIO(volatile BufferDesc *buf, bool clear_dirty,
  				  int set_flag_bits);
  static void shared_buffer_write_error_callback(void *arg);
  static void local_buffer_write_error_callback(void *arg);
! static volatile BufferDesc *BufferAlloc(SMgrRelation smgr,
  			char relpersistence,
  			ForkNumber forkNum,
  			BlockNumber blockNum,
--- 102,108 ----
  				  int set_flag_bits);
  static void shared_buffer_write_error_callback(void *arg);
  static void local_buffer_write_error_callback(void *arg);
! static volatile BufferDesc *BufferAlloc(int poolid, SMgrRelation smgr,
  			char relpersistence,
  			ForkNumber forkNum,
  			BlockNumber blockNum,
***************
*** 232,237 **** ReadBufferExtended(Relation reln, ForkNumber forkNum, BlockNumber blockNum,
--- 232,238 ----
  				   ReadBufferMode mode, BufferAccessStrategy strategy)
  {
  	bool		hit;
+ 	int			poolid;
  	Buffer		buf;
  
  	/* Open it at the smgr level if not already done */
***************
*** 252,258 **** ReadBufferExtended(Relation reln, ForkNumber forkNum, BlockNumber blockNum,
  	 * miss.
  	 */
  	pgstat_count_buffer_read(reln);
! 	buf = ReadBuffer_common(reln->rd_smgr, reln->rd_rel->relpersistence,
  							forkNum, blockNum, mode, strategy, &hit);
  	if (hit)
  		pgstat_count_buffer_hit(reln);
--- 253,261 ----
  	 * miss.
  	 */
  	pgstat_count_buffer_read(reln);
! 	poolid = RelationIsInPriorityBufferPool(reln) ? PRIORITY_BUFFER_POOL : DEFAULT_BUFFER_POOL;
! 
! 	buf = ReadBuffer_common(poolid, reln->rd_smgr, reln->rd_rel->relpersistence,
  							forkNum, blockNum, mode, strategy, &hit);
  	if (hit)
  		pgstat_count_buffer_hit(reln);
***************
*** 279,286 **** ReadBufferWithoutRelcache(RelFileNode rnode, ForkNumber forkNum,
  	SMgrRelation smgr = smgropen(rnode, InvalidBackendId);
  
  	Assert(InRecovery);
! 
! 	return ReadBuffer_common(smgr, RELPERSISTENCE_PERMANENT, forkNum, blockNum,
  							 mode, strategy, &hit);
  }
  
--- 282,288 ----
  	SMgrRelation smgr = smgropen(rnode, InvalidBackendId);
  
  	Assert(InRecovery);
! 	return ReadBuffer_common(DEFAULT_BUFFER_POOL, smgr, RELPERSISTENCE_PERMANENT, forkNum, blockNum,
  							 mode, strategy, &hit);
  }
  
***************
*** 291,297 **** ReadBufferWithoutRelcache(RelFileNode rnode, ForkNumber forkNum,
   * *hit is set to true if the request was satisfied from shared buffer cache.
   */
  static Buffer
! ReadBuffer_common(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
  				  BlockNumber blockNum, ReadBufferMode mode,
  				  BufferAccessStrategy strategy, bool *hit)
  {
--- 293,299 ----
   * *hit is set to true if the request was satisfied from shared buffer cache.
   */
  static Buffer
! ReadBuffer_common(int poolid, SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
  				  BlockNumber blockNum, ReadBufferMode mode,
  				  BufferAccessStrategy strategy, bool *hit)
  {
***************
*** 333,339 **** ReadBuffer_common(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
  		 * lookup the buffer.  IO_IN_PROGRESS is set if the requested block is
  		 * not currently in memory.
  		 */
! 		bufHdr = BufferAlloc(smgr, relpersistence, forkNum, blockNum,
  							 strategy, &found);
  		if (found)
  			pgBufferUsage.shared_blks_hit++;
--- 335,341 ----
  		 * lookup the buffer.  IO_IN_PROGRESS is set if the requested block is
  		 * not currently in memory.
  		 */
! 		bufHdr = BufferAlloc(poolid, smgr, relpersistence, forkNum, blockNum,
  							 strategy, &found);
  		if (found)
  			pgBufferUsage.shared_blks_hit++;
***************
*** 532,538 **** ReadBuffer_common(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
   * No locks are held either at entry or exit.
   */
  static volatile BufferDesc *
! BufferAlloc(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
  			BlockNumber blockNum,
  			BufferAccessStrategy strategy,
  			bool *foundPtr)
--- 534,540 ----
   * No locks are held either at entry or exit.
   */
  static volatile BufferDesc *
! BufferAlloc(int poolid, SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
  			BlockNumber blockNum,
  			BufferAccessStrategy strategy,
  			bool *foundPtr)
***************
*** 613,619 **** BufferAlloc(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
  		 * still held, since it would be bad to hold the spinlock while
  		 * possibly waking up other processes.
  		 */
! 		buf = StrategyGetBuffer(strategy, &lock_held);
  
  		Assert(buf->refcount == 0);
  
--- 615,621 ----
  		 * still held, since it would be bad to hold the spinlock while
  		 * possibly waking up other processes.
  		 */
! 		buf = StrategyGetBuffer(poolid, strategy, &lock_held);
  
  		Assert(buf->refcount == 0);
  
***************
*** 625,631 **** BufferAlloc(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
  
  		/* Now it's safe to release the freelist lock */
  		if (lock_held)
! 			LWLockRelease(BufFreelistLock);
  
  		/*
  		 * If the buffer was dirty, try to write it out.  There is a race
--- 627,633 ----
  
  		/* Now it's safe to release the freelist lock */
  		if (lock_held)
! 			LWLockRelease(BufFreelistLock(poolid));
  
  		/*
  		 * If the buffer was dirty, try to write it out.  There is a race
***************
*** 1246,1252 **** BufferSync(int flags)
  	 * certainly need to be written for the next checkpoint attempt, too.
  	 */
  	num_to_write = 0;
! 	for (buf_id = 0; buf_id < NBuffers; buf_id++)
  	{
  		volatile BufferDesc *bufHdr = &BufferDescriptors[buf_id];
  
--- 1248,1254 ----
  	 * certainly need to be written for the next checkpoint attempt, too.
  	 */
  	num_to_write = 0;
! 	for (buf_id = 0; buf_id < NSharedBuffers; buf_id++)
  	{
  		volatile BufferDesc *bufHdr = &BufferDescriptors[buf_id];
  
***************
*** 1268,1274 **** BufferSync(int flags)
  	if (num_to_write == 0)
  		return;					/* nothing to do */
  
! 	TRACE_POSTGRESQL_BUFFER_SYNC_START(NBuffers, num_to_write);
  
  	/*
  	 * Loop over all buffers again, and write the ones (still) marked with
--- 1270,1276 ----
  	if (num_to_write == 0)
  		return;					/* nothing to do */
  
! 	TRACE_POSTGRESQL_BUFFER_SYNC_START(NSharedBuffers, num_to_write);
  
  	/*
  	 * Loop over all buffers again, and write the ones (still) marked with
***************
*** 1278,1285 **** BufferSync(int flags)
  	 * Note that we don't read the buffer alloc count here --- that should be
  	 * left untouched till the next BgBufferSync() call.
  	 */
! 	buf_id = StrategySyncStart(NULL, NULL);
! 	num_to_scan = NBuffers;
  	num_written = 0;
  	while (num_to_scan-- > 0)
  	{
--- 1280,1287 ----
  	 * Note that we don't read the buffer alloc count here --- that should be
  	 * left untouched till the next BgBufferSync() call.
  	 */
! 	buf_id = StrategySyncStart(0, NULL, NULL);
! 	num_to_scan = NSharedBuffers;
  	num_written = 0;
  	while (num_to_scan-- > 0)
  	{
***************
*** 1326,1342 **** BufferSync(int flags)
  			}
  		}
  
! 		if (++buf_id >= NBuffers)
  			buf_id = 0;
  	}
- 
  	/*
  	 * Update checkpoint statistics. As noted above, this doesn't include
  	 * buffers written by other backends or bgwriter scan.
  	 */
  	CheckpointStats.ckpt_bufs_written += num_written;
  
! 	TRACE_POSTGRESQL_BUFFER_SYNC_DONE(NBuffers, num_written, num_to_write);
  }
  
  /*
--- 1328,1343 ----
  			}
  		}
  
! 		if (++buf_id >= NSharedBuffers)
  			buf_id = 0;
  	}
  	/*
  	 * Update checkpoint statistics. As noted above, this doesn't include
  	 * buffers written by other backends or bgwriter scan.
  	 */
  	CheckpointStats.ckpt_bufs_written += num_written;
  
! 	TRACE_POSTGRESQL_BUFFER_SYNC_DONE(NSharedBuffers, num_written, num_to_write);
  }
  
  /*
***************
*** 1389,1635 **** BgBufferSync(void)
  	int			num_to_scan;
  	int			num_written;
  	int			reusable_buffers;
  
  	/* Variables for final smoothed_density update */
  	long		new_strategy_delta;
  	uint32		new_recent_alloc;
  
! 	/*
! 	 * Find out where the freelist clock sweep currently is, and how many
! 	 * buffer allocations have happened since our last call.
! 	 */
! 	strategy_buf_id = StrategySyncStart(&strategy_passes, &recent_alloc);
! 
! 	/* Report buffer alloc counts to pgstat */
! 	BgWriterStats.m_buf_alloc += recent_alloc;
! 
! 	/*
! 	 * If we're not running the LRU scan, just stop after doing the stats
! 	 * stuff.  We mark the saved state invalid so that we can recover sanely
! 	 * if LRU scan is turned back on later.
! 	 */
! 	if (bgwriter_lru_maxpages <= 0)
  	{
! 		saved_info_valid = false;
! 		return true;
! 	}
! 
! 	/*
! 	 * Compute strategy_delta = how many buffers have been scanned by the
! 	 * clock sweep since last time.  If first time through, assume none. Then
! 	 * see if we are still ahead of the clock sweep, and if so, how many
! 	 * buffers we could scan before we'd catch up with it and "lap" it. Note:
! 	 * weird-looking coding of xxx_passes comparisons are to avoid bogus
! 	 * behavior when the passes counts wrap around.
! 	 */
! 	if (saved_info_valid)
! 	{
! 		int32		passes_delta = strategy_passes - prev_strategy_passes;
! 
! 		strategy_delta = strategy_buf_id - prev_strategy_buf_id;
! 		strategy_delta += (long) passes_delta *NBuffers;
  
! 		Assert(strategy_delta >= 0);
  
! 		if ((int32) (next_passes - strategy_passes) > 0)
  		{
! 			/* we're one pass ahead of the strategy point */
! 			bufs_to_lap = strategy_buf_id - next_to_clean;
! #ifdef BGW_DEBUG
! 			elog(DEBUG2, "bgwriter ahead: bgw %u-%u strategy %u-%u delta=%ld lap=%d",
! 				 next_passes, next_to_clean,
! 				 strategy_passes, strategy_buf_id,
! 				 strategy_delta, bufs_to_lap);
! #endif
  		}
! 		else if (next_passes == strategy_passes &&
! 				 next_to_clean >= strategy_buf_id)
  		{
! 			/* on same pass, but ahead or at least not behind */
! 			bufs_to_lap = NBuffers - (next_to_clean - strategy_buf_id);
! #ifdef BGW_DEBUG
! 			elog(DEBUG2, "bgwriter ahead: bgw %u-%u strategy %u-%u delta=%ld lap=%d",
! 				 next_passes, next_to_clean,
! 				 strategy_passes, strategy_buf_id,
! 				 strategy_delta, bufs_to_lap);
! #endif
  		}
  		else
  		{
  			/*
! 			 * We're behind, so skip forward to the strategy point and start
! 			 * cleaning from there.
  			 */
! #ifdef BGW_DEBUG
! 			elog(DEBUG2, "bgwriter behind: bgw %u-%u strategy %u-%u delta=%ld",
! 				 next_passes, next_to_clean,
! 				 strategy_passes, strategy_buf_id,
! 				 strategy_delta);
! #endif
  			next_to_clean = strategy_buf_id;
  			next_passes = strategy_passes;
! 			bufs_to_lap = NBuffers;
  		}
- 	}
- 	else
- 	{
- 		/*
- 		 * Initializing at startup or after LRU scanning had been off. Always
- 		 * start at the strategy point.
- 		 */
- #ifdef BGW_DEBUG
- 		elog(DEBUG2, "bgwriter initializing: strategy %u-%u",
- 			 strategy_passes, strategy_buf_id);
- #endif
- 		strategy_delta = 0;
- 		next_to_clean = strategy_buf_id;
- 		next_passes = strategy_passes;
- 		bufs_to_lap = NBuffers;
- 	}
  
! 	/* Update saved info for next time */
! 	prev_strategy_buf_id = strategy_buf_id;
! 	prev_strategy_passes = strategy_passes;
! 	saved_info_valid = true;
  
! 	/*
! 	 * Compute how many buffers had to be scanned for each new allocation, ie,
! 	 * 1/density of reusable buffers, and track a moving average of that.
! 	 *
! 	 * If the strategy point didn't move, we don't update the density estimate
! 	 */
! 	if (strategy_delta > 0 && recent_alloc > 0)
! 	{
! 		scans_per_alloc = (float) strategy_delta / (float) recent_alloc;
! 		smoothed_density += (scans_per_alloc - smoothed_density) /
! 			smoothing_samples;
! 	}
! 
! 	/*
! 	 * Estimate how many reusable buffers there are between the current
! 	 * strategy point and where we've scanned ahead to, based on the smoothed
! 	 * density estimate.
! 	 */
! 	bufs_ahead = NBuffers - bufs_to_lap;
! 	reusable_buffers_est = (float) bufs_ahead / smoothed_density;
  
! 	/*
! 	 * Track a moving average of recent buffer allocations.  Here, rather than
! 	 * a true average we want a fast-attack, slow-decline behavior: we
! 	 * immediately follow any increase.
! 	 */
! 	if (smoothed_alloc <= (float) recent_alloc)
! 		smoothed_alloc = recent_alloc;
! 	else
! 		smoothed_alloc += ((float) recent_alloc - smoothed_alloc) /
! 			smoothing_samples;
  
! 	/* Scale the estimate by a GUC to allow more aggressive tuning. */
! 	upcoming_alloc_est = (int) (smoothed_alloc * bgwriter_lru_multiplier);
  
! 	/*
! 	 * If recent_alloc remains at zero for many cycles, smoothed_alloc will
! 	 * eventually underflow to zero, and the underflows produce annoying
! 	 * kernel warnings on some platforms.  Once upcoming_alloc_est has gone to
! 	 * zero, there's no point in tracking smaller and smaller values of
! 	 * smoothed_alloc, so just reset it to exactly zero to avoid this
! 	 * syndrome.  It will pop back up as soon as recent_alloc increases.
! 	 */
! 	if (upcoming_alloc_est == 0)
! 		smoothed_alloc = 0;
  
! 	/*
! 	 * Even in cases where there's been little or no buffer allocation
! 	 * activity, we want to make a small amount of progress through the buffer
! 	 * cache so that as many reusable buffers as possible are clean after an
! 	 * idle period.
! 	 *
! 	 * (scan_whole_pool_milliseconds / BgWriterDelay) computes how many times
! 	 * the BGW will be called during the scan_whole_pool time; slice the
! 	 * buffer pool into that many sections.
! 	 */
! 	min_scan_buffers = (int) (NBuffers / (scan_whole_pool_milliseconds / BgWriterDelay));
  
! 	if (upcoming_alloc_est < (min_scan_buffers + reusable_buffers_est))
! 	{
! #ifdef BGW_DEBUG
! 		elog(DEBUG2, "bgwriter: alloc_est=%d too small, using min=%d + reusable_est=%d",
! 			 upcoming_alloc_est, min_scan_buffers, reusable_buffers_est);
! #endif
! 		upcoming_alloc_est = min_scan_buffers + reusable_buffers_est;
! 	}
  
! 	/*
! 	 * Now write out dirty reusable buffers, working forward from the
! 	 * next_to_clean point, until we have lapped the strategy scan, or cleaned
! 	 * enough buffers to match our estimate of the next cycle's allocation
! 	 * requirements, or hit the bgwriter_lru_maxpages limit.
! 	 */
  
! 	/* Make sure we can handle the pin inside SyncOneBuffer */
! 	ResourceOwnerEnlargeBuffers(CurrentResourceOwner);
  
! 	num_to_scan = bufs_to_lap;
! 	num_written = 0;
! 	reusable_buffers = reusable_buffers_est;
  
! 	/* Execute the LRU scan */
! 	while (num_to_scan > 0 && reusable_buffers < upcoming_alloc_est)
! 	{
! 		int			buffer_state = SyncOneBuffer(next_to_clean, true);
  
! 		if (++next_to_clean >= NBuffers)
  		{
! 			next_to_clean = 0;
! 			next_passes++;
! 		}
! 		num_to_scan--;
  
! 		if (buffer_state & BUF_WRITTEN)
! 		{
! 			reusable_buffers++;
! 			if (++num_written >= bgwriter_lru_maxpages)
  			{
! 				BgWriterStats.m_maxwritten_clean++;
! 				break;
  			}
  		}
- 		else if (buffer_state & BUF_REUSABLE)
- 			reusable_buffers++;
- 	}
  
! 	BgWriterStats.m_buf_written_clean += num_written;
  
! #ifdef BGW_DEBUG
! 	elog(DEBUG1, "bgwriter: recent_alloc=%u smoothed=%.2f delta=%ld ahead=%d density=%.2f reusable_est=%d upcoming_est=%d scanned=%d wrote=%d reusable=%d",
! 		 recent_alloc, smoothed_alloc, strategy_delta, bufs_ahead,
! 		 smoothed_density, reusable_buffers_est, upcoming_alloc_est,
! 		 bufs_to_lap - num_to_scan,
! 		 num_written,
! 		 reusable_buffers - reusable_buffers_est);
! #endif
  
! 	/*
! 	 * Consider the above scan as being like a new allocation scan.
! 	 * Characterize its density and update the smoothed one based on it. This
! 	 * effectively halves the moving average period in cases where both the
! 	 * strategy and the background writer are doing some useful scanning,
! 	 * which is helpful because a long memory isn't as desirable on the
! 	 * density estimates.
! 	 */
! 	new_strategy_delta = bufs_to_lap - num_to_scan;
! 	new_recent_alloc = reusable_buffers - reusable_buffers_est;
! 	if (new_strategy_delta > 0 && new_recent_alloc > 0)
! 	{
! 		scans_per_alloc = (float) new_strategy_delta / (float) new_recent_alloc;
! 		smoothed_density += (scans_per_alloc - smoothed_density) /
! 			smoothing_samples;
! 
! #ifdef BGW_DEBUG
! 		elog(DEBUG2, "bgwriter: cleaner density alloc=%u scan=%ld density=%.2f new smoothed=%.2f",
! 			 new_recent_alloc, new_strategy_delta,
! 			 scans_per_alloc, smoothed_density);
! #endif
  	}
  
  	/* Return true if OK to hibernate */
--- 1390,1640 ----
  	int			num_to_scan;
  	int			num_written;
  	int			reusable_buffers;
+ 	int			poolid;
  
  	/* Variables for final smoothed_density update */
  	long		new_strategy_delta;
  	uint32		new_recent_alloc;
  
! 	for (poolid = 0; poolid < NUM_MAX_BUFFER_POOLS; poolid++)
  	{
! 		/*
! 		 * Find out where the freelist clock sweep currently is, and how many
! 		 * buffer allocations have happened since our last call.
! 		 */
! 		strategy_buf_id = StrategySyncStart(poolid, &strategy_passes, &recent_alloc);
  
! 		/* Report buffer alloc counts to pgstat */
! 		BgWriterStats.m_buf_alloc += recent_alloc;
  
! 		/*
! 		 * If we're not running the LRU scan, just stop after doing the stats
! 		 * stuff.  We mark the saved state invalid so that we can recover sanely
! 		 * if LRU scan is turned back on later.
! 		 */
! 		if (bgwriter_lru_maxpages <= 0)
  		{
! 			saved_info_valid = false;
! 			return true;
  		}
! 
! 		/*
! 		 * Compute strategy_delta = how many buffers have been scanned by the
! 		 * clock sweep since last time.  If first time through, assume none. Then
! 		 * see if we are still ahead of the clock sweep, and if so, how many
! 		 * buffers we could scan before we'd catch up with it and "lap" it. Note:
! 		 * weird-looking coding of xxx_passes comparisons are to avoid bogus
! 		 * behavior when the passes counts wrap around.
! 		 */
! 		if (saved_info_valid)
  		{
! 			int32		passes_delta = strategy_passes - prev_strategy_passes;
! 
! 			strategy_delta = strategy_buf_id - prev_strategy_buf_id;
! 		strategy_delta += (long) passes_delta * ((poolid == 0) ? NBuffers : NPriorityBuffers);
! 
! 			Assert(strategy_delta >= 0);
! 
! 			if ((int32) (next_passes - strategy_passes) > 0)
! 			{
! 				/* we're one pass ahead of the strategy point */
! 				bufs_to_lap = strategy_buf_id - next_to_clean;
! 	#ifdef BGW_DEBUG
! 				elog(DEBUG2, "bgwriter ahead: bgw %u-%u strategy %u-%u delta=%ld lap=%d",
! 					 next_passes, next_to_clean,
! 					 strategy_passes, strategy_buf_id,
! 					 strategy_delta, bufs_to_lap);
! 	#endif
! 			}
! 			else if (next_passes == strategy_passes &&
! 					 next_to_clean >= strategy_buf_id)
! 			{
! 				/* on same pass, but ahead or at least not behind */
! 				bufs_to_lap = ((poolid == 0) ? NBuffers : NPriorityBuffers) - (next_to_clean - strategy_buf_id);
! 	#ifdef BGW_DEBUG
! 				elog(DEBUG2, "bgwriter ahead: bgw %u-%u strategy %u-%u delta=%ld lap=%d",
! 					 next_passes, next_to_clean,
! 					 strategy_passes, strategy_buf_id,
! 					 strategy_delta, bufs_to_lap);
! 	#endif
! 			}
! 			else
! 			{
! 				/*
! 				 * We're behind, so skip forward to the strategy point and start
! 				 * cleaning from there.
! 				 */
! 	#ifdef BGW_DEBUG
! 				elog(DEBUG2, "bgwriter behind: bgw %u-%u strategy %u-%u delta=%ld",
! 					 next_passes, next_to_clean,
! 					 strategy_passes, strategy_buf_id,
! 					 strategy_delta);
! 	#endif
! 				next_to_clean = strategy_buf_id;
! 				next_passes = strategy_passes;
! 				bufs_to_lap = ((poolid == 0) ? NBuffers : NPriorityBuffers);
! 			}
  		}
  		else
  		{
  			/*
! 			 * Initializing at startup or after LRU scanning had been off. Always
! 			 * start at the strategy point.
  			 */
! 	#ifdef BGW_DEBUG
! 			elog(DEBUG2, "bgwriter initializing: strategy %u-%u",
! 				 strategy_passes, strategy_buf_id);
! 	#endif
! 			strategy_delta = 0;
  			next_to_clean = strategy_buf_id;
  			next_passes = strategy_passes;
! 			bufs_to_lap = ((poolid == 0) ? NBuffers : NPriorityBuffers);
  		}
  
! 		/* Update saved info for next time */
! 		prev_strategy_buf_id = strategy_buf_id;
! 		prev_strategy_passes = strategy_passes;
! 		saved_info_valid = true;
  
! 		/*
! 		 * Compute how many buffers had to be scanned for each new allocation, ie,
! 		 * 1/density of reusable buffers, and track a moving average of that.
! 		 *
! 		 * If the strategy point didn't move, we don't update the density estimate
! 		 */
! 		if (strategy_delta > 0 && recent_alloc > 0)
! 		{
! 			scans_per_alloc = (float) strategy_delta / (float) recent_alloc;
! 			smoothed_density += (scans_per_alloc - smoothed_density) /
! 				smoothing_samples;
! 		}
  
! 		/*
! 		 * Estimate how many reusable buffers there are between the current
! 		 * strategy point and where we've scanned ahead to, based on the smoothed
! 		 * density estimate.
! 		 */
! 		bufs_ahead = ((poolid == 0) ? NBuffers : NPriorityBuffers) - bufs_to_lap;
! 		reusable_buffers_est = (float) bufs_ahead / smoothed_density;
  
! 		/*
! 		 * Track a moving average of recent buffer allocations.  Here, rather than
! 		 * a true average we want a fast-attack, slow-decline behavior: we
! 		 * immediately follow any increase.
! 		 */
! 		if (smoothed_alloc <= (float) recent_alloc)
! 			smoothed_alloc = recent_alloc;
! 		else
! 			smoothed_alloc += ((float) recent_alloc - smoothed_alloc) /
! 				smoothing_samples;
  
! 		/* Scale the estimate by a GUC to allow more aggressive tuning. */
! 		upcoming_alloc_est = (int) (smoothed_alloc * bgwriter_lru_multiplier);
  
! 		/*
! 		 * If recent_alloc remains at zero for many cycles, smoothed_alloc will
! 		 * eventually underflow to zero, and the underflows produce annoying
! 		 * kernel warnings on some platforms.  Once upcoming_alloc_est has gone to
! 		 * zero, there's no point in tracking smaller and smaller values of
! 		 * smoothed_alloc, so just reset it to exactly zero to avoid this
! 		 * syndrome.  It will pop back up as soon as recent_alloc increases.
! 		 */
! 		if (upcoming_alloc_est == 0)
! 			smoothed_alloc = 0;
  
! 		/*
! 		 * Even in cases where there's been little or no buffer allocation
! 		 * activity, we want to make a small amount of progress through the buffer
! 		 * cache so that as many reusable buffers as possible are clean after an
! 		 * idle period.
! 		 *
! 		 * (scan_whole_pool_milliseconds / BgWriterDelay) computes how many times
! 		 * the BGW will be called during the scan_whole_pool time; slice the
! 		 * buffer pool into that many sections.
! 		 */
! 		min_scan_buffers = (int) (((poolid == 0) ? NBuffers : NPriorityBuffers) / (scan_whole_pool_milliseconds / BgWriterDelay));
  
! 		if (upcoming_alloc_est < (min_scan_buffers + reusable_buffers_est))
! 		{
! 	#ifdef BGW_DEBUG
! 			elog(DEBUG2, "bgwriter: alloc_est=%d too small, using min=%d + reusable_est=%d",
! 				 upcoming_alloc_est, min_scan_buffers, reusable_buffers_est);
! 	#endif
! 			upcoming_alloc_est = min_scan_buffers + reusable_buffers_est;
! 		}
  
! 		/*
! 		 * Now write out dirty reusable buffers, working forward from the
! 		 * next_to_clean point, until we have lapped the strategy scan, or cleaned
! 		 * enough buffers to match our estimate of the next cycle's allocation
! 		 * requirements, or hit the bgwriter_lru_maxpages limit.
! 		 */
  
! 		/* Make sure we can handle the pin inside SyncOneBuffer */
! 		ResourceOwnerEnlargeBuffers(CurrentResourceOwner);
  
! 		num_to_scan = bufs_to_lap;
! 		num_written = 0;
! 		reusable_buffers = reusable_buffers_est;
  
! 		/* Execute the LRU scan */
! 		while (num_to_scan > 0 && reusable_buffers < upcoming_alloc_est)
  		{
! 			int			buffer_state = SyncOneBuffer(next_to_clean, true);
  
! 			if (++next_to_clean >= ((poolid == 0) ? NBuffers : NSharedBuffers))
! 			{
! 				next_to_clean = (poolid == 0) ? 0 : NBuffers;
! 				next_passes++;
! 			}
! 			num_to_scan--;
! 
! 			if (buffer_state & BUF_WRITTEN)
  			{
! 				reusable_buffers++;
! 				if (++num_written >= bgwriter_lru_maxpages)
! 				{
! 					BgWriterStats.m_maxwritten_clean++;
! 					break;
! 				}
  			}
+ 			else if (buffer_state & BUF_REUSABLE)
+ 				reusable_buffers++;
  		}
  
! 		BgWriterStats.m_buf_written_clean += num_written;
  
! 	#ifdef BGW_DEBUG
! 		elog(DEBUG1, "bgwriter: recent_alloc=%u smoothed=%.2f delta=%ld ahead=%d density=%.2f reusable_est=%d upcoming_est=%d scanned=%d wrote=%d reusable=%d",
! 			 recent_alloc, smoothed_alloc, strategy_delta, bufs_ahead,
! 			 smoothed_density, reusable_buffers_est, upcoming_alloc_est,
! 			 bufs_to_lap - num_to_scan,
! 			 num_written,
! 			 reusable_buffers - reusable_buffers_est);
! 	#endif
  
! 		/*
! 		 * Consider the above scan as being like a new allocation scan.
! 		 * Characterize its density and update the smoothed one based on it. This
! 		 * effectively halves the moving average period in cases where both the
! 		 * strategy and the background writer are doing some useful scanning,
! 		 * which is helpful because a long memory isn't as desirable on the
! 		 * density estimates.
! 		 */
! 		new_strategy_delta = bufs_to_lap - num_to_scan;
! 		new_recent_alloc = reusable_buffers - reusable_buffers_est;
! 		if (new_strategy_delta > 0 && new_recent_alloc > 0)
! 		{
! 			scans_per_alloc = (float) new_strategy_delta / (float) new_recent_alloc;
! 			smoothed_density += (scans_per_alloc - smoothed_density) /
! 				smoothing_samples;
! 
! 	#ifdef BGW_DEBUG
! 			elog(DEBUG2, "bgwriter: cleaner density alloc=%u scan=%ld density=%.2f new smoothed=%.2f",
! 				 new_recent_alloc, new_strategy_delta,
! 				 scans_per_alloc, smoothed_density);
! 	#endif
! 		}
  	}
  
  	/* Return true if OK to hibernate */
***************
*** 1717,1723 **** AtEOXact_Buffers(bool isCommit)
  		int			RefCountErrors = 0;
  		Buffer		b;
  
! 		for (b = 1; b <= NBuffers; b++)
  		{
  			if (PrivateRefCount[b - 1] != 0)
  			{
--- 1722,1728 ----
  		int			RefCountErrors = 0;
  		Buffer		b;
  
! 		for (b = 1; b <= NSharedBuffers; b++)
  		{
  			if (PrivateRefCount[b - 1] != 0)
  			{
***************
*** 1763,1769 **** AtProcExit_Buffers(int code, Datum arg)
  		int			RefCountErrors = 0;
  		Buffer		b;
  
! 		for (b = 1; b <= NBuffers; b++)
  		{
  			if (PrivateRefCount[b - 1] != 0)
  			{
--- 1768,1774 ----
  		int			RefCountErrors = 0;
  		Buffer		b;
  
! 		for (b = 1; b <= NSharedBuffers; b++)
  		{
  			if (PrivateRefCount[b - 1] != 0)
  			{
***************
*** 2144,2150 **** DropRelFileNodeBuffers(RelFileNodeBackend rnode, ForkNumber forkNum,
  		return;
  	}
  
! 	for (i = 0; i < NBuffers; i++)
  	{
  		volatile BufferDesc *bufHdr = &BufferDescriptors[i];
  
--- 2149,2155 ----
  		return;
  	}
  
! 	for (i = 0; i < NSharedBuffers; i++)
  	{
  		volatile BufferDesc *bufHdr = &BufferDescriptors[i];
  
***************
*** 2233,2239 **** DropRelFileNodesAllBuffers(RelFileNodeBackend *rnodes, int nnodes)
  	if (use_bsearch)
  		pg_qsort(nodes, n, sizeof(RelFileNode), rnode_comparator);
  
! 	for (i = 0; i < NBuffers; i++)
  	{
  		RelFileNode *rnode = NULL;
  		volatile BufferDesc *bufHdr = &BufferDescriptors[i];
--- 2238,2244 ----
  	if (use_bsearch)
  		pg_qsort(nodes, n, sizeof(RelFileNode), rnode_comparator);
  
! 	for (i = 0; i < NSharedBuffers; i++)
  	{
  		RelFileNode *rnode = NULL;
  		volatile BufferDesc *bufHdr = &BufferDescriptors[i];
***************
*** 2298,2304 **** DropDatabaseBuffers(Oid dbid)
  	 * database isn't our own.
  	 */
  
! 	for (i = 0; i < NBuffers; i++)
  	{
  		volatile BufferDesc *bufHdr = &BufferDescriptors[i];
  
--- 2303,2309 ----
  	 * database isn't our own.
  	 */
  
! 	for (i = 0; i < NSharedBuffers; i++)
  	{
  		volatile BufferDesc *bufHdr = &BufferDescriptors[i];
  
***************
*** 2331,2337 **** PrintBufferDescs(void)
  	int			i;
  	volatile BufferDesc *buf = BufferDescriptors;
  
! 	for (i = 0; i < NBuffers; ++i, ++buf)
  	{
  		/* theoretically we should lock the bufhdr here */
  		elog(LOG,
--- 2336,2342 ----
  	int			i;
  	volatile BufferDesc *buf = BufferDescriptors;
  
! 	for (i = 0; i < NSharedBuffers; ++i, ++buf)
  	{
  		/* theoretically we should lock the bufhdr here */
  		elog(LOG,
***************
*** 2352,2358 **** PrintPinnedBufs(void)
  	int			i;
  	volatile BufferDesc *buf = BufferDescriptors;
  
! 	for (i = 0; i < NBuffers; ++i, ++buf)
  	{
  		if (PrivateRefCount[i] > 0)
  		{
--- 2357,2363 ----
  	int			i;
  	volatile BufferDesc *buf = BufferDescriptors;
  
! 	for (i = 0; i < NSharedBuffers; ++i, ++buf)
  	{
  		if (PrivateRefCount[i] > 0)
  		{
***************
*** 2437,2443 **** FlushRelationBuffers(Relation rel)
  	/* Make sure we can handle the pin inside the loop */
  	ResourceOwnerEnlargeBuffers(CurrentResourceOwner);
  
! 	for (i = 0; i < NBuffers; i++)
  	{
  		bufHdr = &BufferDescriptors[i];
  
--- 2442,2448 ----
  	/* Make sure we can handle the pin inside the loop */
  	ResourceOwnerEnlargeBuffers(CurrentResourceOwner);
  
! 	for (i = 0; i < NSharedBuffers; i++)
  	{
  		bufHdr = &BufferDescriptors[i];
  
***************
*** 2487,2493 **** FlushDatabaseBuffers(Oid dbid)
  	/* Make sure we can handle the pin inside the loop */
  	ResourceOwnerEnlargeBuffers(CurrentResourceOwner);
  
! 	for (i = 0; i < NBuffers; i++)
  	{
  		bufHdr = &BufferDescriptors[i];
  
--- 2492,2498 ----
  	/* Make sure we can handle the pin inside the loop */
  	ResourceOwnerEnlargeBuffers(CurrentResourceOwner);
  
! 	for (i = 0; i < NSharedBuffers; i++)
  	{
  		bufHdr = &BufferDescriptors[i];
  
*** a/src/backend/storage/buffer/freelist.c
--- b/src/backend/storage/buffer/freelist.c
***************
*** 49,55 **** typedef struct
  } BufferStrategyControl;
  
  /* Pointers to shared state */
! static BufferStrategyControl *StrategyControl = NULL;
  
  /*
   * Private (non-shared) state for managing a ring of shared buffers to re-use.
--- 49,66 ----
  } BufferStrategyControl;
  
  /* Pointers to shared state */
! static BufferStrategyControl *StrategyControl[NUM_MAX_BUFFER_POOLS];
! 
! struct bufferAccessStrategyStatus
! {
! 	char name[NAMEDATALEN];
! 	int  num_buffers;
! };
! 
! struct bufferAccessStrategyStatus BufferStrategyStatus[NUM_MAX_BUFFER_POOLS] = {
! 	{"Default Buffer Strategy status", 0},
! 	{"Priority Buffer Strategy status", 0}
! };
  
  /*
   * Private (non-shared) state for managing a ring of shared buffers to re-use.
***************
*** 109,115 **** static void AddBufferToRing(BufferAccessStrategy strategy,
   *	kernel calls while holding the buffer header spinlock.
   */
  volatile BufferDesc *
! StrategyGetBuffer(BufferAccessStrategy strategy, bool *lock_held)
  {
  	volatile BufferDesc *buf;
  	Latch	   *bgwriterLatch;
--- 120,126 ----
   *	kernel calls while holding the buffer header spinlock.
   */
  volatile BufferDesc *
! StrategyGetBuffer(int poolid, BufferAccessStrategy strategy, bool *lock_held)
  {
  	volatile BufferDesc *buf;
  	Latch	   *bgwriterLatch;
***************
*** 131,144 **** StrategyGetBuffer(BufferAccessStrategy strategy, bool *lock_held)
  
  	/* Nope, so lock the freelist */
  	*lock_held = true;
! 	LWLockAcquire(BufFreelistLock, LW_EXCLUSIVE);
  
  	/*
  	 * We count buffer allocation requests so that the bgwriter can estimate
  	 * the rate of buffer consumption.	Note that buffers recycled by a
  	 * strategy object are intentionally not counted here.
  	 */
! 	StrategyControl->numBufferAllocs++;
  
  	/*
  	 * If bgwriterLatch is set, we need to waken the bgwriter, but we should
--- 142,155 ----
  
  	/* Nope, so lock the freelist */
  	*lock_held = true;
! 	LWLockAcquire(BufFreelistLock(poolid), LW_EXCLUSIVE);
  
  	/*
  	 * We count buffer allocation requests so that the bgwriter can estimate
  	 * the rate of buffer consumption.	Note that buffers recycled by a
  	 * strategy object are intentionally not counted here.
  	 */
! 	StrategyControl[poolid]->numBufferAllocs++;
  
  	/*
  	 * If bgwriterLatch is set, we need to waken the bgwriter, but we should
***************
*** 146,158 **** StrategyGetBuffer(BufferAccessStrategy strategy, bool *lock_held)
  	 * is annoyingly tedious, but it happens at most once per bgwriter cycle,
  	 * so the performance hit is minimal.
  	 */
! 	bgwriterLatch = StrategyControl->bgwriterLatch;
  	if (bgwriterLatch)
  	{
! 		StrategyControl->bgwriterLatch = NULL;
! 		LWLockRelease(BufFreelistLock);
  		SetLatch(bgwriterLatch);
! 		LWLockAcquire(BufFreelistLock, LW_EXCLUSIVE);
  	}
  
  	/*
--- 157,169 ----
  	 * is annoyingly tedious, but it happens at most once per bgwriter cycle,
  	 * so the performance hit is minimal.
  	 */
! 	bgwriterLatch = StrategyControl[poolid]->bgwriterLatch;
  	if (bgwriterLatch)
  	{
! 		StrategyControl[poolid]->bgwriterLatch = NULL;
! 		LWLockRelease(BufFreelistLock(poolid));
  		SetLatch(bgwriterLatch);
! 		LWLockAcquire(BufFreelistLock(poolid), LW_EXCLUSIVE);
  	}
  
  	/*
***************
*** 161,173 **** StrategyGetBuffer(BufferAccessStrategy strategy, bool *lock_held)
  	 * individual buffer spinlocks, so it's OK to manipulate them without
  	 * holding the spinlock.
  	 */
! 	while (StrategyControl->firstFreeBuffer >= 0)
  	{
! 		buf = &BufferDescriptors[StrategyControl->firstFreeBuffer];
  		Assert(buf->freeNext != FREENEXT_NOT_IN_LIST);
  
  		/* Unconditionally remove buffer from freelist */
! 		StrategyControl->firstFreeBuffer = buf->freeNext;
  		buf->freeNext = FREENEXT_NOT_IN_LIST;
  
  		/*
--- 172,184 ----
  	 * individual buffer spinlocks, so it's OK to manipulate them without
  	 * holding the spinlock.
  	 */
! 	while (StrategyControl[poolid]->firstFreeBuffer >= 0)
  	{
! 		buf = &BufferDescriptors[StrategyControl[poolid]->firstFreeBuffer];
  		Assert(buf->freeNext != FREENEXT_NOT_IN_LIST);
  
  		/* Unconditionally remove buffer from freelist */
! 		StrategyControl[poolid]->firstFreeBuffer = buf->freeNext;
  		buf->freeNext = FREENEXT_NOT_IN_LIST;
  
  		/*
***************
*** 188,202 **** StrategyGetBuffer(BufferAccessStrategy strategy, bool *lock_held)
  	}
  
  	/* Nothing on the freelist, so run the "clock sweep" algorithm */
! 	trycounter = NBuffers;
  	for (;;)
  	{
! 		buf = &BufferDescriptors[StrategyControl->nextVictimBuffer];
  
! 		if (++StrategyControl->nextVictimBuffer >= NBuffers)
  		{
! 			StrategyControl->nextVictimBuffer = 0;
! 			StrategyControl->completePasses++;
  		}
  
  		/*
--- 199,213 ----
  	}
  
  	/* Nothing on the freelist, so run the "clock sweep" algorithm */
! 	trycounter = BufferStrategyStatus[poolid].num_buffers;
  	for (;;)
  	{
! 		buf = &BufferDescriptors[StrategyControl[poolid]->nextVictimBuffer];
  
! 		if (++StrategyControl[poolid]->nextVictimBuffer >= ((poolid == 0) ? BufferStrategyStatus[poolid].num_buffers : NSharedBuffers))
  		{
! 			StrategyControl[poolid]->nextVictimBuffer = (poolid == 0) ? 0 : BufferStrategyStatus[poolid - 1].num_buffers;
! 			StrategyControl[poolid]->completePasses++;
  		}
  
  		/*
***************
*** 209,215 **** StrategyGetBuffer(BufferAccessStrategy strategy, bool *lock_held)
  			if (buf->usage_count > 0)
  			{
  				buf->usage_count--;
! 				trycounter = NBuffers;
  			}
  			else
  			{
--- 220,226 ----
  			if (buf->usage_count > 0)
  			{
  				buf->usage_count--;
! 				trycounter = BufferStrategyStatus[poolid].num_buffers;
  			}
  			else
  			{
***************
*** 241,247 **** StrategyGetBuffer(BufferAccessStrategy strategy, bool *lock_held)
  void
  StrategyFreeBuffer(volatile BufferDesc *buf)
  {
! 	LWLockAcquire(BufFreelistLock, LW_EXCLUSIVE);
  
  	/*
  	 * It is possible that we are told to put something in the freelist that
--- 252,260 ----
  void
  StrategyFreeBuffer(volatile BufferDesc *buf)
  {
! 	int poolid = buf->buf_id < NBuffers ? DEFAULT_BUFFER_POOL : PRIORITY_BUFFER_POOL;
! 
! 	LWLockAcquire(BufFreelistLock(poolid), LW_EXCLUSIVE);
  
  	/*
  	 * It is possible that we are told to put something in the freelist that
***************
*** 249,261 **** StrategyFreeBuffer(volatile BufferDesc *buf)
  	 */
  	if (buf->freeNext == FREENEXT_NOT_IN_LIST)
  	{
! 		buf->freeNext = StrategyControl->firstFreeBuffer;
  		if (buf->freeNext < 0)
! 			StrategyControl->lastFreeBuffer = buf->buf_id;
! 		StrategyControl->firstFreeBuffer = buf->buf_id;
  	}
  
! 	LWLockRelease(BufFreelistLock);
  }
  
  /*
--- 262,274 ----
  	 */
  	if (buf->freeNext == FREENEXT_NOT_IN_LIST)
  	{
! 		buf->freeNext = StrategyControl[poolid]->firstFreeBuffer;
  		if (buf->freeNext < 0)
! 			StrategyControl[poolid]->lastFreeBuffer = buf->buf_id;
! 		StrategyControl[poolid]->firstFreeBuffer = buf->buf_id;
  	}
  
! 	LWLockRelease(BufFreelistLock(poolid));
  }
  
  /*
***************
*** 270,289 **** StrategyFreeBuffer(volatile BufferDesc *buf)
   * being read.
   */
  int
! StrategySyncStart(uint32 *complete_passes, uint32 *num_buf_alloc)
  {
  	int			result;
  
! 	LWLockAcquire(BufFreelistLock, LW_EXCLUSIVE);
! 	result = StrategyControl->nextVictimBuffer;
  	if (complete_passes)
! 		*complete_passes = StrategyControl->completePasses;
  	if (num_buf_alloc)
  	{
! 		*num_buf_alloc = StrategyControl->numBufferAllocs;
! 		StrategyControl->numBufferAllocs = 0;
  	}
! 	LWLockRelease(BufFreelistLock);
  	return result;
  }
  
--- 283,302 ----
   * being read.
   */
  int
! StrategySyncStart(int poolid, uint32 *complete_passes, uint32 *num_buf_alloc)
  {
  	int			result;
  
! 	LWLockAcquire(BufFreelistLock(poolid), LW_EXCLUSIVE);
! 	result = StrategyControl[poolid]->nextVictimBuffer;
  	if (complete_passes)
! 		*complete_passes = StrategyControl[poolid]->completePasses;
  	if (num_buf_alloc)
  	{
! 		*num_buf_alloc = StrategyControl[poolid]->numBufferAllocs;
! 		StrategyControl[poolid]->numBufferAllocs = 0;
  	}
! 	LWLockRelease(BufFreelistLock(poolid));
  	return result;
  }
  
***************
*** 298,311 **** StrategySyncStart(uint32 *complete_passes, uint32 *num_buf_alloc)
  void
  StrategyNotifyBgWriter(Latch *bgwriterLatch)
  {
! 	/*
! 	 * We acquire the BufFreelistLock just to ensure that the store appears
! 	 * atomic to StrategyGetBuffer.  The bgwriter should call this rather
! 	 * infrequently, so there's no performance penalty from being safe.
! 	 */
! 	LWLockAcquire(BufFreelistLock, LW_EXCLUSIVE);
! 	StrategyControl->bgwriterLatch = bgwriterLatch;
! 	LWLockRelease(BufFreelistLock);
  }
  
  
--- 311,329 ----
  void
  StrategyNotifyBgWriter(Latch *bgwriterLatch)
  {
! 	int poolid;
! 
! 	for (poolid = 0; poolid < NUM_MAX_BUFFER_POOLS; poolid++)
! 	{
! 		/*
! 		 * We acquire the BufFreelistLock just to ensure that the store appears
! 		 * atomic to StrategyGetBuffer.  The bgwriter should call this rather
! 		 * infrequently, so there's no performance penalty from being safe.
! 		 */
! 		LWLockAcquire(BufFreelistLock(poolid), LW_EXCLUSIVE);
! 		StrategyControl[poolid]->bgwriterLatch = bgwriterLatch;
! 		LWLockRelease(BufFreelistLock(poolid));
! 	}
  }
  
  
***************
*** 318,329 **** StrategyNotifyBgWriter(Latch *bgwriterLatch)
   * is also determined here.
   */
  Size
! StrategyShmemSize(void)
  {
  	Size		size = 0;
  
  	/* size of lookup hash table ... see comment in StrategyInitialize */
! 	size = add_size(size, BufTableShmemSize(NBuffers + NUM_BUFFER_PARTITIONS));
  
  	/* size of the shared replacement strategy control block */
  	size = add_size(size, MAXALIGN(sizeof(BufferStrategyControl)));
--- 336,347 ----
   * is also determined here.
   */
  Size
! StrategyShmemSize()
  {
  	Size		size = 0;
  
  	/* size of lookup hash table ... see comment in StrategyInitialize */
! 	size = add_size(size, BufTableShmemSize(NSharedBuffers + NUM_BUFFER_PARTITIONS));
  
  	/* size of the shared replacement strategy control block */
  	size = add_size(size, MAXALIGN(sizeof(BufferStrategyControl)));
***************
*** 342,394 **** void
  StrategyInitialize(bool init)
  {
  	bool		found;
  
  	/*
  	 * Initialize the shared buffer lookup hashtable.
  	 *
  	 * Since we can't tolerate running out of lookup table entries, we must be
  	 * sure to specify an adequate table size here.  The maximum steady-state
! 	 * usage is of course NBuffers entries, but BufferAlloc() tries to insert
  	 * a new entry before deleting the old.  In principle this could be
  	 * happening in each partition concurrently, so we could need as many as
! 	 * NBuffers + NUM_BUFFER_PARTITIONS entries.
  	 */
! 	InitBufTable(NBuffers + NUM_BUFFER_PARTITIONS);
  
! 	/*
! 	 * Get or create the shared strategy control block
! 	 */
! 	StrategyControl = (BufferStrategyControl *)
! 		ShmemInitStruct("Buffer Strategy Status",
! 						sizeof(BufferStrategyControl),
! 						&found);
  
! 	if (!found)
  	{
  		/*
! 		 * Only done once, usually in postmaster
  		 */
! 		Assert(init);
  
! 		/*
! 		 * Grab the whole linked list of free buffers for our strategy. We
! 		 * assume it was previously set up by InitBufferPool().
! 		 */
! 		StrategyControl->firstFreeBuffer = 0;
! 		StrategyControl->lastFreeBuffer = NBuffers - 1;
  
! 		/* Initialize the clock sweep pointer */
! 		StrategyControl->nextVictimBuffer = 0;
  
! 		/* Clear statistics */
! 		StrategyControl->completePasses = 0;
! 		StrategyControl->numBufferAllocs = 0;
  
! 		/* No pending notification */
! 		StrategyControl->bgwriterLatch = NULL;
  	}
- 	else
- 		Assert(!init);
  }
  
  
--- 360,419 ----
  StrategyInitialize(bool init)
  {
  	bool		found;
+ 	int			poolid;
  
  	/*
  	 * Initialize the shared buffer lookup hashtable.
  	 *
  	 * Since we can't tolerate running out of lookup table entries, we must be
  	 * sure to specify an adequate table size here.  The maximum steady-state
! 	 * usage is of course NSharedBuffers entries, but BufferAlloc() tries to insert
  	 * a new entry before deleting the old.  In principle this could be
  	 * happening in each partition concurrently, so we could need as many as
! 	 * NSharedBuffers + NUM_BUFFER_PARTITIONS entries.
  	 */
! 	InitBufTable(NSharedBuffers + NUM_BUFFER_PARTITIONS);
  
! 	BufferStrategyStatus[DEFAULT_BUFFER_POOL].num_buffers = NBuffers;
! 	BufferStrategyStatus[PRIORITY_BUFFER_POOL].num_buffers = NPriorityBuffers;
  
! 	for (poolid = 0; poolid < NUM_MAX_BUFFER_POOLS; poolid++)
  	{
  		/*
! 		 * Get or create the shared strategy control block
  		 */
! 		StrategyControl[poolid] = (BufferStrategyControl *)
! 			ShmemInitStruct(BufferStrategyStatus[poolid].name,
! 							sizeof(BufferStrategyControl),
! 							&found);
  
! 		if (!found)
! 		{
! 			/*
! 			 * Only done once, usually in postmaster
! 			 */
! 			Assert(init);
  
! 			/*
! 			 * Grab the whole linked list of free buffers for our strategy. We
! 			 * assume it was previously set up by InitBufferPool().
! 			 */
! 			StrategyControl[poolid]->firstFreeBuffer = (poolid == 0) ? 0 : BufferStrategyStatus[poolid - 1].num_buffers;
! 			StrategyControl[poolid]->lastFreeBuffer = ((poolid == 0) ? NBuffers : NSharedBuffers) - 1;
  
! 			/* Initialize the clock sweep pointer */
! 			StrategyControl[poolid]->nextVictimBuffer = (poolid == 0) ? 0 : BufferStrategyStatus[poolid - 1].num_buffers;
  
! 			/* Clear statistics */
! 			StrategyControl[poolid]->completePasses = 0;
! 			StrategyControl[poolid]->numBufferAllocs = 0;
! 
! 			/* No pending notification */
! 			StrategyControl[poolid]->bgwriterLatch = NULL;
! 		}
! 		else
! 			Assert(!init);
  	}
  }
  
  
***************
*** 438,444 **** GetAccessStrategy(BufferAccessStrategyType btype)
  	}
  
  	/* Make sure ring isn't an undue fraction of shared buffers */
! 	ring_size = Min(NBuffers / 8, ring_size);
  
  	/* Allocate the object and initialize all elements to zeroes */
  	strategy = (BufferAccessStrategy)
--- 463,469 ----
  	}
  
  	/* Make sure ring isn't an undue fraction of shared buffers */
! 	ring_size = Min(NSharedBuffers / 8, ring_size);
  
  	/* Allocate the object and initialize all elements to zeroes */
  	strategy = (BufferAccessStrategy)
*** a/src/backend/storage/lmgr/lwlock.c
--- b/src/backend/storage/lmgr/lwlock.c
***************
*** 219,225 **** NumLWLocks(void)
  	numLocks = NUM_FIXED_LWLOCKS;
  
  	/* bufmgr.c needs two for each shared buffer */
! 	numLocks += 2 * NBuffers;
  
  	/* proc.c needs one for each backend or auxiliary process */
  	numLocks += MaxBackends + NUM_AUXILIARY_PROCS;
--- 219,225 ----
  	numLocks = NUM_FIXED_LWLOCKS;
  
  	/* bufmgr.c needs two for each shared buffer */
! 	numLocks += 2 * NSharedBuffers;
  
  	/* proc.c needs one for each backend or auxiliary process */
  	numLocks += MaxBackends + NUM_AUXILIARY_PROCS;
*** a/src/backend/tcop/postgres.c
--- b/src/backend/tcop/postgres.c
***************
*** 3538,3544 **** PostgresMain(int argc, char *argv[],
  
  		MyStartTime = time(NULL);
  	}
! 
  	SetProcessingMode(InitProcessing);
  
  	/* Compute paths, if we didn't inherit them from postmaster */
--- 3538,3544 ----
  
  		MyStartTime = time(NULL);
  	}
! 	
  	SetProcessingMode(InitProcessing);
  
  	/* Compute paths, if we didn't inherit them from postmaster */
*** a/src/backend/utils/init/globals.c
--- b/src/backend/utils/init/globals.c
***************
*** 107,112 **** int			maintenance_work_mem = 16384;
--- 107,114 ----
   * register background workers.
   */
  int			NBuffers = 1000;
+ int			NPriorityBuffers = 0;
+ int			NSharedBuffers;
  int			MaxConnections = 90;
  int			max_worker_processes = 8;
  int			MaxBackends = 0;
*** a/src/backend/utils/misc/guc.c
--- b/src/backend/utils/misc/guc.c
***************
*** 1712,1717 **** static struct config_int ConfigureNamesInt[] =
--- 1712,1728 ----
  	},
  
  	{
+ 		{"priority_buffers", PGC_POSTMASTER, RESOURCES_MEM,
+ 			gettext_noop("Sets the number of priority buffers used by the server."),
+ 			NULL,
+ 			GUC_UNIT_BLOCKS
+ 		},
+ 		&NPriorityBuffers,
+ 		1024, 16, INT_MAX / 2,
+ 		NULL, NULL, NULL
+ 	},
+ 
+ 	{
  		{"temp_buffers", PGC_USERSET, RESOURCES_MEM,
  			gettext_noop("Sets the maximum number of temporary buffers used by each session."),
  			NULL,
*** a/src/backend/utils/misc/postgresql.conf.sample
--- b/src/backend/utils/misc/postgresql.conf.sample
***************
*** 114,119 ****
--- 114,121 ----
  
  #shared_buffers = 32MB			# min 128kB
  					# (change requires restart)
+ #priority_buffers = 32MB		# buffers used for priority tables
+ 					# (change requires restart)
  #huge_pages = try			# on, off, or try
  					# (change requires restart)
  #temp_buffers = 8MB			# min 800kB
*** a/src/include/miscadmin.h
--- b/src/include/miscadmin.h
***************
*** 126,132 **** do { \
  /*****************************************************************************
   *	  globals.h --															 *
   *****************************************************************************/
- 
  /*
   * from utils/init/globals.c
   */
--- 126,131 ----
***************
*** 140,146 **** extern bool ExitOnAnyError;
  
  extern PGDLLIMPORT char *DataDir;
  
! extern PGDLLIMPORT int NBuffers;
  extern int	MaxBackends;
  extern int	MaxConnections;
  extern int	max_worker_processes;
--- 139,145 ----
  
  extern PGDLLIMPORT char *DataDir;
  
! extern PGDLLIMPORT int NSharedBuffers;
  extern int	MaxBackends;
  extern int	MaxConnections;
  extern int	max_worker_processes;
*** a/src/include/storage/buf_internals.h
--- b/src/include/storage/buf_internals.h
***************
*** 23,28 ****
--- 23,31 ----
  #include "storage/spin.h"
  #include "utils/relcache.h"
  
+ #define NUM_MAX_BUFFER_POOLS 2
+ #define DEFAULT_BUFFER_POOL 0
+ #define PRIORITY_BUFFER_POOL 1
  
  /*
   * Flags for buffer descriptors
***************
*** 185,197 **** extern BufferDesc *LocalBufferDescriptors;
   */
  
  /* freelist.c */
! extern volatile BufferDesc *StrategyGetBuffer(BufferAccessStrategy strategy,
  				  bool *lock_held);
  extern void StrategyFreeBuffer(volatile BufferDesc *buf);
  extern bool StrategyRejectBuffer(BufferAccessStrategy strategy,
  					 volatile BufferDesc *buf);
  
! extern int	StrategySyncStart(uint32 *complete_passes, uint32 *num_buf_alloc);
  extern void StrategyNotifyBgWriter(Latch *bgwriterLatch);
  
  extern Size StrategyShmemSize(void);
--- 188,200 ----
   */
  
  /* freelist.c */
! extern volatile BufferDesc *StrategyGetBuffer(int poolid, BufferAccessStrategy strategy,
  				  bool *lock_held);
  extern void StrategyFreeBuffer(volatile BufferDesc *buf);
  extern bool StrategyRejectBuffer(BufferAccessStrategy strategy,
  					 volatile BufferDesc *buf);
  
! extern int	StrategySyncStart(int poolid, uint32 *complete_passes, uint32 *num_buf_alloc);
  extern void StrategyNotifyBgWriter(Latch *bgwriterLatch);
  
  extern Size StrategyShmemSize(void);
*** a/src/include/storage/bufmgr.h
--- b/src/include/storage/bufmgr.h
***************
*** 16,21 ****
--- 16,22 ----
  
  #include "storage/block.h"
  #include "storage/buf.h"
+ #include "storage/buf_internals.h"
  #include "storage/bufpage.h"
  #include "storage/relfilenode.h"
  #include "utils/relcache.h"
***************
*** 44,50 **** typedef enum
--- 45,53 ----
  } ReadBufferMode;
  
  /* in globals.c ... this duplicates miscadmin.h */
+ extern PGDLLIMPORT int NSharedBuffers;
  extern PGDLLIMPORT int NBuffers;
+ extern PGDLLIMPORT int NPriorityBuffers;
  
  /* in bufmgr.c */
  extern bool zero_damaged_pages;
***************
*** 97,103 **** extern PGDLLIMPORT int32 *LocalRefCount;
   */
  #define BufferIsValid(bufnum) \
  ( \
! 	AssertMacro((bufnum) <= NBuffers && (bufnum) >= -NLocBuffer), \
  	(bufnum) != InvalidBuffer  \
  )
  
--- 100,106 ----
   */
  #define BufferIsValid(bufnum) \
  ( \
! 	AssertMacro((bufnum) <= NSharedBuffers && (bufnum) >= -NLocBuffer), \
  	(bufnum) != InvalidBuffer  \
  )
  
*** a/src/include/storage/lwlock.h
--- b/src/include/storage/lwlock.h
***************
*** 89,133 **** extern PGDLLIMPORT LWLockPadded *MainLWLockArray;
   * if you remove a lock, consider leaving a gap in the numbering sequence for
   * the benefit of DTrace and other external debugging scripts.
   */
! #define BufFreelistLock				(&MainLWLockArray[0].lock)
! #define ShmemIndexLock				(&MainLWLockArray[1].lock)
! #define OidGenLock					(&MainLWLockArray[2].lock)
! #define XidGenLock					(&MainLWLockArray[3].lock)
! #define ProcArrayLock				(&MainLWLockArray[4].lock)
! #define SInvalReadLock				(&MainLWLockArray[5].lock)
! #define SInvalWriteLock				(&MainLWLockArray[6].lock)
! #define WALBufMappingLock			(&MainLWLockArray[7].lock)
! #define WALWriteLock				(&MainLWLockArray[8].lock)
! #define ControlFileLock				(&MainLWLockArray[9].lock)
! #define CheckpointLock				(&MainLWLockArray[10].lock)
! #define CLogControlLock				(&MainLWLockArray[11].lock)
! #define SubtransControlLock			(&MainLWLockArray[12].lock)
! #define MultiXactGenLock			(&MainLWLockArray[13].lock)
! #define MultiXactOffsetControlLock	(&MainLWLockArray[14].lock)
! #define MultiXactMemberControlLock	(&MainLWLockArray[15].lock)
! #define RelCacheInitLock			(&MainLWLockArray[16].lock)
! #define CheckpointerCommLock		(&MainLWLockArray[17].lock)
! #define TwoPhaseStateLock			(&MainLWLockArray[18].lock)
! #define TablespaceCreateLock		(&MainLWLockArray[19].lock)
! #define BtreeVacuumLock				(&MainLWLockArray[20].lock)
! #define AddinShmemInitLock			(&MainLWLockArray[21].lock)
! #define AutovacuumLock				(&MainLWLockArray[22].lock)
! #define AutovacuumScheduleLock		(&MainLWLockArray[23].lock)
! #define SyncScanLock				(&MainLWLockArray[24].lock)
! #define RelationMappingLock			(&MainLWLockArray[25].lock)
! #define AsyncCtlLock				(&MainLWLockArray[26].lock)
! #define AsyncQueueLock				(&MainLWLockArray[27].lock)
! #define SerializableXactHashLock	(&MainLWLockArray[28].lock)
! #define SerializableFinishedListLock		(&MainLWLockArray[29].lock)
! #define SerializablePredicateLockListLock	(&MainLWLockArray[30].lock)
! #define OldSerXidLock				(&MainLWLockArray[31].lock)
! #define SyncRepLock					(&MainLWLockArray[32].lock)
! #define BackgroundWorkerLock		(&MainLWLockArray[33].lock)
! #define DynamicSharedMemoryControlLock		(&MainLWLockArray[34].lock)
! #define AutoFileLock				(&MainLWLockArray[35].lock)
! #define ReplicationSlotAllocationLock	(&MainLWLockArray[36].lock)
! #define ReplicationSlotControlLock		(&MainLWLockArray[37].lock)
! #define NUM_INDIVIDUAL_LWLOCKS		38
  
  /*
   * It's a bit odd to declare NUM_BUFFER_PARTITIONS and NUM_LOCK_PARTITIONS
--- 89,135 ----
   * if you remove a lock, consider leaving a gap in the numbering sequence for
   * the benefit of DTrace and other external debugging scripts.
   */
! #define BufFreelistLock(poolid)				(&MainLWLockArray[poolid].lock)
! #define NUM_FREELIST_LOCKS	2
! 
! #define ShmemIndexLock				(&MainLWLockArray[NUM_FREELIST_LOCKS + 1].lock)
! #define OidGenLock					(&MainLWLockArray[NUM_FREELIST_LOCKS + 2].lock)
! #define XidGenLock					(&MainLWLockArray[NUM_FREELIST_LOCKS + 3].lock)
! #define ProcArrayLock				(&MainLWLockArray[NUM_FREELIST_LOCKS + 4].lock)
! #define SInvalReadLock				(&MainLWLockArray[NUM_FREELIST_LOCKS + 5].lock)
! #define SInvalWriteLock				(&MainLWLockArray[NUM_FREELIST_LOCKS + 6].lock)
! #define WALBufMappingLock			(&MainLWLockArray[NUM_FREELIST_LOCKS + 7].lock)
! #define WALWriteLock				(&MainLWLockArray[NUM_FREELIST_LOCKS + 8].lock)
! #define ControlFileLock				(&MainLWLockArray[NUM_FREELIST_LOCKS + 9].lock)
! #define CheckpointLock				(&MainLWLockArray[NUM_FREELIST_LOCKS + 10].lock)
! #define CLogControlLock				(&MainLWLockArray[NUM_FREELIST_LOCKS + 11].lock)
! #define SubtransControlLock			(&MainLWLockArray[NUM_FREELIST_LOCKS + 12].lock)
! #define MultiXactGenLock			(&MainLWLockArray[NUM_FREELIST_LOCKS + 13].lock)
! #define MultiXactOffsetControlLock	(&MainLWLockArray[NUM_FREELIST_LOCKS + 14].lock)
! #define MultiXactMemberControlLock	(&MainLWLockArray[NUM_FREELIST_LOCKS + 15].lock)
! #define RelCacheInitLock			(&MainLWLockArray[NUM_FREELIST_LOCKS + 16].lock)
! #define CheckpointerCommLock		(&MainLWLockArray[NUM_FREELIST_LOCKS + 17].lock)
! #define TwoPhaseStateLock			(&MainLWLockArray[NUM_FREELIST_LOCKS + 18].lock)
! #define TablespaceCreateLock		(&MainLWLockArray[NUM_FREELIST_LOCKS + 19].lock)
! #define BtreeVacuumLock				(&MainLWLockArray[NUM_FREELIST_LOCKS + 20].lock)
! #define AddinShmemInitLock			(&MainLWLockArray[NUM_FREELIST_LOCKS + 21].lock)
! #define AutovacuumLock				(&MainLWLockArray[NUM_FREELIST_LOCKS + 22].lock)
! #define AutovacuumScheduleLock		(&MainLWLockArray[NUM_FREELIST_LOCKS + 23].lock)
! #define SyncScanLock				(&MainLWLockArray[NUM_FREELIST_LOCKS + 24].lock)
! #define RelationMappingLock			(&MainLWLockArray[NUM_FREELIST_LOCKS + 25].lock)
! #define AsyncCtlLock				(&MainLWLockArray[NUM_FREELIST_LOCKS + 26].lock)
! #define AsyncQueueLock				(&MainLWLockArray[NUM_FREELIST_LOCKS + 27].lock)
! #define SerializableXactHashLock	(&MainLWLockArray[NUM_FREELIST_LOCKS + 28].lock)
! #define SerializableFinishedListLock		(&MainLWLockArray[NUM_FREELIST_LOCKS + 29].lock)
! #define SerializablePredicateLockListLock	(&MainLWLockArray[NUM_FREELIST_LOCKS + 30].lock)
! #define OldSerXidLock				(&MainLWLockArray[NUM_FREELIST_LOCKS + 31].lock)
! #define SyncRepLock					(&MainLWLockArray[NUM_FREELIST_LOCKS + 32].lock)
! #define BackgroundWorkerLock		(&MainLWLockArray[NUM_FREELIST_LOCKS + 33].lock)
! #define DynamicSharedMemoryControlLock		(&MainLWLockArray[NUM_FREELIST_LOCKS + 34].lock)
! #define AutoFileLock				(&MainLWLockArray[NUM_FREELIST_LOCKS + 35].lock)
! #define ReplicationSlotAllocationLock	(&MainLWLockArray[NUM_FREELIST_LOCKS + 36].lock)
! #define ReplicationSlotControlLock		(&MainLWLockArray[NUM_FREELIST_LOCKS + 37].lock)
! #define NUM_INDIVIDUAL_LWLOCKS		(NUM_FREELIST_LOCKS + 38)
  
  /*
   * It's a bit odd to declare NUM_BUFFER_PARTITIONS and NUM_LOCK_PARTITIONS
*** a/src/include/utils/rel.h
--- b/src/include/utils/rel.h
***************
*** 217,222 **** typedef struct StdRdOptions
--- 217,223 ----
  {
  	int32		vl_len_;		/* varlena header (do not touch directly!) */
  	int			fillfactor;		/* page fill factor in percent (0..100) */
+ 	int			bufferpool_offset;		/* Buffer Pool option */
  	AutoVacOpts autovacuum;		/* autovacuum-related options */
  	bool		security_barrier;		/* for views */
  	int			check_option_offset;	/* for views */
***************
*** 249,254 **** typedef struct StdRdOptions
--- 250,266 ----
  	(BLCKSZ * (100 - RelationGetFillFactor(relation, defaultff)) / 100)
  
  /*
+  * RelationIsInPriorityBufferPool
+  *		Returns whether the relation uses the priority buffer pool, or not
+  */
+ #define RelationIsInPriorityBufferPool(relation) \
+ 	((relation)->rd_options &&												\
+ 	 ((StdRdOptions *) (relation)->rd_options)->bufferpool_offset != 0 ?	\
+ 	 strcmp((char *) (relation)->rd_options +								\
+ 			((StdRdOptions *) (relation)->rd_options)->bufferpool_offset,	\
+ 			"priority") == 0 : false)
+ 
+ /*
   * RelationIsSecurityView
   *		Returns whether the relation is security view, or not
   */
#9Sameer Thakur
samthakur74@gmail.com
In reply to: Haribabu Kommi (#8)
Re: Priority table or Cache table

Hello,
I applied the patch to current HEAD. There was one failure (attached),
freelist.rej
<http://postgresql.1045698.n5.nabble.com/file/n5804200/freelist.rej>

Compiled the provided pgbench.c and added following in .conf
shared_buffers = 128MB # min 128kB
Shared_buffers=64MB
Priority_buffers=128MB

I was planning to performance test later hence different values.

But while executing pgbench the following assertion occurs

LOG: database system is ready to accept connections
LOG: autovacuum launcher started
TRAP: FailedAssertion("!(strategy_delta >= 0)", File: "bufmgr.c", Line:
1435)
LOG: background writer process (PID 10274) was terminated by signal 6:
Aborted
LOG: terminating any other active server processes
WARNING: terminating connection because of crash of another server process
DETAIL: The postmaster has commanded this server process to roll back the
current transaction and exit, because another server process exited
abnormally and possibly corrupted shared memory.

Is there a way to avoid it? Am I making some mistake?
regards
Sameer


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#10Hans-Jürgen Schönig
postgres@cybertec.at
In reply to: Tom Lane (#2)
Re: Priority table or Cache table

On 20 Feb 2014, at 01:38, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Haribabu Kommi <kommi.haribabu@gmail.com> writes:

I want to propose a new feature called "priority table" or "cache table".
This is same as regular table except the pages of these tables are having
high priority than normal tables. These tables are very useful, where a
faster query processing on some particular tables is expected.

Why exactly does the existing LRU behavior of shared buffers not do
what you need?

I am really dubious that letting DBAs manage buffers is going to be
an improvement over automatic management.

regards, tom lane

the reason for a feature like that is to define an area of the application which needs more predictable runtime behaviour.
not all tables are created equal in terms of importance.

example: user authentication should always be supersonic fast while some reporting tables might gladly be forgotten even if they happened to be in use recently.

i am not saying that we should have this feature.
however, there are definitely use cases which would justify some more control here.
otherwise people will fall back to dirty tricks such as “SELECT count(*)” or so to emulate what we got here.

many thanks,

hans

--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de


#11Fujii Masao
masao.fujii@gmail.com
In reply to: Haribabu Kommi (#8)
Re: Priority table or Cache table

On Mon, Mar 17, 2014 at 1:16 PM, Haribabu Kommi
<kommi.haribabu@gmail.com> wrote:

On Fri, Feb 21, 2014 at 12:02 PM, Haribabu Kommi
<kommi.haribabu@gmail.com> wrote:

On Thu, Feb 20, 2014 at 10:06 PM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:

On Thu, Feb 20, 2014 at 10:23 AM, Haribabu Kommi
<kommi.haribabu@gmail.com> wrote:

On Thu, Feb 20, 2014 at 2:26 PM, Amit Kapila <amit.kapila16@gmail.com>
wrote:

On Thu, Feb 20, 2014 at 6:24 AM, Haribabu Kommi
<kommi.haribabu@gmail.com> wrote:

On Thu, Feb 20, 2014 at 11:38 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I want to propose a new feature called "priority table" or "cache
table".
This is same as regular table except the pages of these tables are
having
high priority than normal tables. These tables are very useful,
where a
faster query processing on some particular tables is expected.

Why exactly does the existing LRU behavior of shared buffers not do
what you need?

Let's assume a database having 3 tables, which are accessed regularly. The
user is expecting faster query results on one of those tables.
Because of the LRU behavior, that is not happening some of the time.

I Implemented a proof of concept patch to see whether the buffer pool
split can improve the performance or not.

Summary of the changes:
1. The priority buffers are allocated contiguously after the shared buffers.
2. Added a new reloption parameter called "buffer_pool" to specify which
buffer pool the user wants the table to use.

I'm not sure that storing the "priority table" information in the
database is a good idea, because that means it's replicated to the
standby, and the same table will be treated as high priority even on the
standby server. I can imagine that some users want to set different
tables as high priority on the master and the standby.

Regards,

--
Fujii Masao


#12Jim Nasby
jim@nasby.net
In reply to: Hans-Jürgen Schönig (#10)
Re: Priority table or Cache table

On 5/16/14, 8:15 AM, Hans-Jürgen Schönig wrote:

On 20 Feb 2014, at 01:38, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I am really dubious that letting DBAs manage buffers is going to be
an improvement over automatic management.

the reason for a feature like that is to define an area of the application which needs more predictable runtime behaviour.
not all tables are created equals in term of importance.

example: user authentication should always be supersonic fast while some reporting tables might gladly be forgotten even if they happened to be in use recently.

i am not saying that we should have this feature.
however, there are definitely use cases which would justify some more control here.
otherwise people will fall back to dirty tricks such as “SELECT count(*)” or so to emulate what we got here.

Which is really just an extension of a larger problem: many applications do not care one iota about ideal performance; they care about *always* having some minimum level of performance. This frequently comes up with the issue of a query plan that is marginally faster 99% of the time but sucks horribly for the remaining 1%. Frequently it's far better to choose a less optimal plan that doesn't have a degenerate case.
--
Jim C. Nasby, Data Architect jim@nasby.net
512.569.9461 (cell) http://jim.nasby.net


#13Hannu Krosing
hannu@2ndQuadrant.com
In reply to: Fujii Masao (#11)
Re: Priority table or Cache table

On 05/20/2014 01:46 PM, Fujii Masao wrote:

On Mon, Mar 17, 2014 at 1:16 PM, Haribabu Kommi
<kommi.haribabu@gmail.com> wrote:

...
I Implemented a proof of concept patch to see whether the buffer pool
split can improve the performance or not.

Summary of the changes:
1. The priority buffers are allocated as continuous to the shared buffers.
2. Added new reloption parameter called "buffer_pool" to specify the
buffer_pool user wants the table to use.

I'm not sure if storing the information of "priority table" into
database is good
because this means that it's replicated to the standby and the same table
will be treated with high priority even in the standby server. I can imagine
some users want to set different tables as high priority ones in master and
standby.

There might be a possibility to override this in postgresql.conf to
optimise for the case you described, but for most uses the setting is
best kept in the database, at least to get started.

Cheers

--
Hannu Krosing
PostgreSQL Consultant
Performance, Scalability and High Availability
2ndQuadrant Nordic OÜ


#14Fujii Masao
masao.fujii@gmail.com
In reply to: Hannu Krosing (#13)
Re: Priority table or Cache table

On Sun, May 25, 2014 at 6:52 PM, Hannu Krosing <hannu@2ndquadrant.com> wrote:

On 05/20/2014 01:46 PM, Fujii Masao wrote:

On Mon, Mar 17, 2014 at 1:16 PM, Haribabu Kommi
<kommi.haribabu@gmail.com> wrote:

...
I Implemented a proof of concept patch to see whether the buffer pool
split can improve the performance or not.

Summary of the changes:
1. The priority buffers are allocated as continuous to the shared buffers.
2. Added new reloption parameter called "buffer_pool" to specify the
buffer_pool user wants the table to use.

I'm not sure if storing the information of "priority table" into
database is good
because this means that it's replicated to the standby and the same table
will be treated with high priority even in the standby server. I can imagine
some users want to set different tables as high priority ones in master and
standby.

There might be a possibility to override this in postgresql.conf for
optimising what you described but for most uses it is best to be in
the database, at least to get started.

Overriding the setting in postgresql.conf rather than the one in the
database might confuse users, because that is the opposite of the usual
precedence order for GUC settings.

Or, what about storing the setting in a flat file, as is done for replication slots?

Regards,

--
Fujii Masao


#15Hannu Krosing
hannu@2ndQuadrant.com
In reply to: Fujii Masao (#14)
Re: Priority table or Cache table

On 05/26/2014 04:16 PM, Fujii Masao wrote:

On Sun, May 25, 2014 at 6:52 PM, Hannu Krosing <hannu@2ndquadrant.com> wrote:

On 05/20/2014 01:46 PM, Fujii Masao wrote:

On Mon, Mar 17, 2014 at 1:16 PM, Haribabu Kommi
<kommi.haribabu@gmail.com> wrote:

...
I Implemented a proof of concept patch to see whether the buffer pool
split can improve the performance or not.

Summary of the changes:
1. The priority buffers are allocated as continuous to the shared buffers.
2. Added new reloption parameter called "buffer_pool" to specify the
buffer_pool user wants the table to use.

I'm not sure if storing the information of "priority table" into
database is good
because this means that it's replicated to the standby and the same table
will be treated with high priority even in the standby server. I can imagine
some users want to set different tables as high priority ones in master and
standby.

There might be a possibility to override this in postgresql.conf for
optimising what you described but for most uses it is best to be in
the database, at least to get started.

Overriding the setting in postgresql.conf rather than that in database might
confuse users because it's opposite order of the priority of the GUC setting.

Or, what about storing the setting in a flat file, as is done for replication slots?

seems like a good time to introduce a notion of non-replicated tables :)

should be a good fit with logical replication.

Cheers
Hannu

Regards,

--
Hannu Krosing
PostgreSQL Consultant
Performance, Scalability and High Availability
2ndQuadrant Nordic OÜ


#16Haribabu Kommi
kommi.haribabu@gmail.com
In reply to: Sameer Thakur (#9)
1 attachment(s)
Re: Priority table or Cache table

On Fri, May 16, 2014 at 8:29 PM, Sameer Thakur <samthakur74@gmail.com> wrote:

Hello,
I applied the patch to current HEAD. There was one failure (attached),
freelist.rej
<http://postgresql.1045698.n5.nabble.com/file/n5804200/freelist.rej>

Compiled the provided pgbench.c and added following in .conf
shared_buffers = 128MB # min 128kB
Shared_buffers=64MB
Priority_buffers=128MB

I was planning to performance test later hence different values.

But while executing pgbench the following assertion occurs

LOG: database system is ready to accept connections
LOG: autovacuum launcher started
TRAP: FailedAssertion("!(strategy_delta >= 0)", File: "bufmgr.c", Line:
1435)
LOG: background writer process (PID 10274) was terminated by signal 6:
Aborted
LOG: terminating any other active server processes
WARNING: terminating connection because of crash of another server process
DETAIL: The postmaster has commanded this server process to roll back the
current transaction and exit, because another server process exited
abnormally and possibly corrupted shared memory.

Is there a way to avoid it? Am I making some mistake?

Sorry for the late reply. Thanks for the test.
Please find the rebased patch with a temporary fix that corrects the problem.
I will submit a proper fix later.

Regards,
Hari Babu
Fujitsu Australia

Attachments:

cache_table_poc_v2.patchapplication/octet-stream; name=cache_table_poc_v2.patchDownload
*** a/contrib/pg_buffercache/pg_buffercache_pages.c
--- b/contrib/pg_buffercache/pg_buffercache_pages.c
***************
*** 98,107 **** pg_buffercache_pages(PG_FUNCTION_ARGS)
  		fctx->tupdesc = BlessTupleDesc(tupledesc);
  
  		/* Allocate NBuffers worth of BufferCachePagesRec records. */
! 		fctx->record = (BufferCachePagesRec *) palloc(sizeof(BufferCachePagesRec) * NBuffers);
  
  		/* Set max calls and remember the user function context. */
! 		funcctx->max_calls = NBuffers;
  		funcctx->user_fctx = fctx;
  
  		/* Return to original context when allocating transient memory */
--- 98,107 ----
  		fctx->tupdesc = BlessTupleDesc(tupledesc);
  
  		/* Allocate NBuffers worth of BufferCachePagesRec records. */
! 		fctx->record = (BufferCachePagesRec *) palloc(sizeof(BufferCachePagesRec) * NSharedBuffers);
  
  		/* Set max calls and remember the user function context. */
! 		funcctx->max_calls = NSharedBuffers;
  		funcctx->user_fctx = fctx;
  
  		/* Return to original context when allocating transient memory */
***************
*** 120,126 **** pg_buffercache_pages(PG_FUNCTION_ARGS)
  		 * Scan though all the buffers, saving the relevant fields in the
  		 * fctx->record structure.
  		 */
! 		for (i = 0, bufHdr = BufferDescriptors; i < NBuffers; i++, bufHdr++)
  		{
  			/* Lock each buffer header before inspecting. */
  			LockBufHdr(bufHdr);
--- 120,126 ----
  		 * Scan though all the buffers, saving the relevant fields in the
  		 * fctx->record structure.
  		 */
! 		for (i = 0, bufHdr = BufferDescriptors; i < NSharedBuffers; i++, bufHdr++)
  		{
  			/* Lock each buffer header before inspecting. */
  			LockBufHdr(bufHdr);
*** a/src/backend/access/common/reloptions.c
--- b/src/backend/access/common/reloptions.c
***************
*** 33,38 ****
--- 33,40 ----
  #include "utils/memutils.h"
  #include "utils/rel.h"
  
+ static void validateBufferPoolOption(char *value);
+ 
  /*
   * Contents of pg_class.reloptions
   *
***************
*** 292,297 **** static relopt_string stringRelOpts[] =
--- 294,310 ----
  		validateWithCheckOption,
  		NULL
  	},
+ 	{
+ 		{
+ 			"buffer_pool",
+ 			"Table with buffer_pool option defined (default or priority).",
+ 			RELOPT_KIND_HEAP | RELOPT_KIND_BTREE
+ 		},
+ 		7,
+ 		false,
+ 		validateBufferPoolOption,
+ 		"default"
+ 	},
  	/* list terminator */
  	{{NULL}}
  };
***************
*** 1174,1179 **** default_reloptions(Datum reloptions, bool validate, relopt_kind kind)
--- 1187,1194 ----
  	int			numoptions;
  	static const relopt_parse_elt tab[] = {
  		{"fillfactor", RELOPT_TYPE_INT, offsetof(StdRdOptions, fillfactor)},
+ 		{"buffer_pool", RELOPT_TYPE_STRING,
+ 		offsetof(StdRdOptions, bufferpool_offset)},
  		{"autovacuum_enabled", RELOPT_TYPE_BOOL,
  		offsetof(StdRdOptions, autovacuum) +offsetof(AutoVacOpts, enabled)},
  		{"autovacuum_vacuum_threshold", RELOPT_TYPE_INT,
***************
*** 1356,1358 **** tablespace_reloptions(Datum reloptions, bool validate)
--- 1371,1391 ----
  
  	return (bytea *) tsopts;
  }
+ 
+ /*
+  * Validator for the "buffer_pool" reloption on tables and btree indexes.
+  * The allowed values are "default" and "priority".
+  */
+ static void
+ validateBufferPoolOption(char *value)
+ {
+ 	if (value == NULL ||
+ 		(pg_strcasecmp(value, "default") != 0 &&
+ 		 pg_strcasecmp(value, "priority") != 0))
+ 	{
+ 		ereport(ERROR,
+ 				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ 				 errmsg("invalid value for \"buffer_pool\" option"),
+ 			  errdetail("Valid values are \"default\" and \"priority\".")));
+ 	}
+ }
*** a/src/backend/access/transam/clog.c
--- b/src/backend/access/transam/clog.c
***************
*** 437,443 **** TransactionIdGetStatus(TransactionId xid, XLogRecPtr *lsn)
  Size
  CLOGShmemBuffers(void)
  {
! 	return Min(32, Max(4, NBuffers / 512));
  }
  
  /*
--- 437,443 ----
  Size
  CLOGShmemBuffers(void)
  {
! 	return Min(32, Max(4, NSharedBuffers / 512));
  }
  
  /*
*** a/src/backend/access/transam/xlog.c
--- b/src/backend/access/transam/xlog.c
***************
*** 4702,4708 **** XLOGChooseNumBuffers(void)
  {
  	int			xbuffers;
  
! 	xbuffers = NBuffers / 32;
  	if (xbuffers > XLOG_SEG_SIZE / XLOG_BLCKSZ)
  		xbuffers = XLOG_SEG_SIZE / XLOG_BLCKSZ;
  	if (xbuffers < 8)
--- 4702,4708 ----
  {
  	int			xbuffers;
  
! 	xbuffers = NSharedBuffers / 32;
  	if (xbuffers > XLOG_SEG_SIZE / XLOG_BLCKSZ)
  		xbuffers = XLOG_SEG_SIZE / XLOG_BLCKSZ;
  	if (xbuffers < 8)
***************
*** 7845,7851 **** LogCheckpointEnd(bool restartpoint)
  			 "write=%ld.%03d s, sync=%ld.%03d s, total=%ld.%03d s; "
  			 "sync files=%d, longest=%ld.%03d s, average=%ld.%03d s",
  			 CheckpointStats.ckpt_bufs_written,
! 			 (double) CheckpointStats.ckpt_bufs_written * 100 / NBuffers,
  			 CheckpointStats.ckpt_segs_added,
  			 CheckpointStats.ckpt_segs_removed,
  			 CheckpointStats.ckpt_segs_recycled,
--- 7845,7851 ----
  			 "write=%ld.%03d s, sync=%ld.%03d s, total=%ld.%03d s; "
  			 "sync files=%d, longest=%ld.%03d s, average=%ld.%03d s",
  			 CheckpointStats.ckpt_bufs_written,
! 			 (double) CheckpointStats.ckpt_bufs_written * 100 / NSharedBuffers,
  			 CheckpointStats.ckpt_segs_added,
  			 CheckpointStats.ckpt_segs_removed,
  			 CheckpointStats.ckpt_segs_recycled,
***************
*** 7861,7867 **** LogCheckpointEnd(bool restartpoint)
  			 "write=%ld.%03d s, sync=%ld.%03d s, total=%ld.%03d s; "
  			 "sync files=%d, longest=%ld.%03d s, average=%ld.%03d s",
  			 CheckpointStats.ckpt_bufs_written,
! 			 (double) CheckpointStats.ckpt_bufs_written * 100 / NBuffers,
  			 CheckpointStats.ckpt_segs_added,
  			 CheckpointStats.ckpt_segs_removed,
  			 CheckpointStats.ckpt_segs_recycled,
--- 7861,7867 ----
  			 "write=%ld.%03d s, sync=%ld.%03d s, total=%ld.%03d s; "
  			 "sync files=%d, longest=%ld.%03d s, average=%ld.%03d s",
  			 CheckpointStats.ckpt_bufs_written,
! 			 (double) CheckpointStats.ckpt_bufs_written * 100 / NSharedBuffers,
  			 CheckpointStats.ckpt_segs_added,
  			 CheckpointStats.ckpt_segs_removed,
  			 CheckpointStats.ckpt_segs_recycled,
***************
*** 8301,8307 **** CreateCheckPoint(int flags)
  	LogCheckpointEnd(false);
  
  	TRACE_POSTGRESQL_CHECKPOINT_DONE(CheckpointStats.ckpt_bufs_written,
! 									 NBuffers,
  									 CheckpointStats.ckpt_segs_added,
  									 CheckpointStats.ckpt_segs_removed,
  									 CheckpointStats.ckpt_segs_recycled);
--- 8301,8307 ----
  	LogCheckpointEnd(false);
  
  	TRACE_POSTGRESQL_CHECKPOINT_DONE(CheckpointStats.ckpt_bufs_written,
! 									 NSharedBuffers,
  									 CheckpointStats.ckpt_segs_added,
  									 CheckpointStats.ckpt_segs_removed,
  									 CheckpointStats.ckpt_segs_recycled);
*** a/src/backend/postmaster/checkpointer.c
--- b/src/backend/postmaster/checkpointer.c
***************
*** 903,912 **** CheckpointerShmemSize(void)
  
  	/*
  	 * Currently, the size of the requests[] array is arbitrarily set equal to
! 	 * NBuffers.  This may prove too large or small ...
  	 */
  	size = offsetof(CheckpointerShmemStruct, requests);
! 	size = add_size(size, mul_size(NBuffers, sizeof(CheckpointerRequest)));
  
  	return size;
  }
--- 903,912 ----
  
  	/*
  	 * Currently, the size of the requests[] array is arbitrarily set equal to
! 	 * NSharedBuffers.  This may prove too large or small ...
  	 */
  	size = offsetof(CheckpointerShmemStruct, requests);
! 	size = add_size(size, mul_size(NSharedBuffers, sizeof(CheckpointerRequest)));
  
  	return size;
  }
***************
*** 935,941 **** CheckpointerShmemInit(void)
  		 */
  		MemSet(CheckpointerShmem, 0, size);
  		SpinLockInit(&CheckpointerShmem->ckpt_lck);
! 		CheckpointerShmem->max_requests = NBuffers;
  	}
  }
  
--- 935,941 ----
  		 */
  		MemSet(CheckpointerShmem, 0, size);
  		SpinLockInit(&CheckpointerShmem->ckpt_lck);
! 		CheckpointerShmem->max_requests = NSharedBuffers;
  	}
  }
  
*** a/src/backend/storage/buffer/buf_init.c
--- b/src/backend/storage/buffer/buf_init.c
***************
*** 17,25 ****
  #include "storage/bufmgr.h"
  #include "storage/buf_internals.h"
  
- 
  BufferDesc *BufferDescriptors;
! char	   *BufferBlocks;
  int32	   *PrivateRefCount;
  
  
--- 17,24 ----
  #include "storage/bufmgr.h"
  #include "storage/buf_internals.h"
  
  BufferDesc *BufferDescriptors;
! char *BufferBlocks; 
  int32	   *PrivateRefCount;
  
  
***************
*** 77,87 **** InitBufferPool(void)
  
  	BufferDescriptors = (BufferDesc *)
  		ShmemInitStruct("Buffer Descriptors",
! 						NBuffers * sizeof(BufferDesc), &foundDescs);
  
  	BufferBlocks = (char *)
  		ShmemInitStruct("Buffer Blocks",
! 						NBuffers * (Size) BLCKSZ, &foundBufs);
  
  	if (foundDescs || foundBufs)
  	{
--- 76,86 ----
  
  	BufferDescriptors = (BufferDesc *)
  		ShmemInitStruct("Buffer Descriptors",
! 						NSharedBuffers * sizeof(BufferDesc), &foundDescs);
  
  	BufferBlocks = (char *)
  		ShmemInitStruct("Buffer Blocks",
! 						NSharedBuffers * (Size) BLCKSZ, &foundBufs);
  
  	if (foundDescs || foundBufs)
  	{
***************
*** 98,119 **** InitBufferPool(void)
  
  		/*
  		 * Initialize all the buffer headers.
! 		 */
! 		for (i = 0; i < NBuffers; buf++, i++)
  		{
  			CLEAR_BUFFERTAG(buf->tag);
  			buf->flags = 0;
  			buf->usage_count = 0;
  			buf->refcount = 0;
  			buf->wait_backend_pid = 0;
! 
  			SpinLockInit(&buf->buf_hdr_lock);
  
  			buf->buf_id = i;
  
  			/*
  			 * Initially link all the buffers together as unused. Subsequent
! 			 * management of this list is done by freelist.c.
  			 */
  			buf->freeNext = i + 1;
  
--- 97,118 ----
  
  		/*
  		 * Initialize all the buffer headers.
! 		 */
! 		for (i = 0; i < NSharedBuffers; buf++, i++)
  		{
  			CLEAR_BUFFERTAG(buf->tag);
  			buf->flags = 0;
  			buf->usage_count = 0;
  			buf->refcount = 0;
  			buf->wait_backend_pid = 0;
! 
  			SpinLockInit(&buf->buf_hdr_lock);
  
  			buf->buf_id = i;
  
  			/*
  			 * Initially link all the buffers together as unused. Subsequent
! 			 * management of this list is done by freelist.c.
  			 */
  			buf->freeNext = i + 1;
  
***************
*** 122,134 **** InitBufferPool(void)
  		}
  
  		/* Correct last entry of linked list */
! 		BufferDescriptors[NBuffers - 1].freeNext = FREENEXT_END_OF_LIST;
  	}
  
  	/* Init other shared buffer-management stuff */
  	StrategyInitialize(!foundDescs);
  }
  
  /*
   * Initialize access to shared buffer pool
   *
--- 121,134 ----
  		}
  
  		/* Correct last entry of linked list */
! 		BufferDescriptors[NSharedBuffers - 1].freeNext = FREENEXT_END_OF_LIST;
  	}
  
  	/* Init other shared buffer-management stuff */
  	StrategyInitialize(!foundDescs);
  }
  
+ 
  /*
   * Initialize access to shared buffer pool
   *
***************
*** 147,153 **** InitBufferPoolAccess(void)
  	/*
  	 * Allocate and zero local arrays of per-buffer info.
  	 */
! 	PrivateRefCount = (int32 *) calloc(NBuffers, sizeof(int32));
  	if (!PrivateRefCount)
  		ereport(FATAL,
  				(errcode(ERRCODE_OUT_OF_MEMORY),
--- 147,153 ----
  	/*
  	 * Allocate and zero local arrays of per-buffer info.
  	 */
! 	PrivateRefCount = (int32 *) calloc(NSharedBuffers, sizeof(int32));
  	if (!PrivateRefCount)
  		ereport(FATAL,
  				(errcode(ERRCODE_OUT_OF_MEMORY),
***************
*** 164,178 **** Size
  BufferShmemSize(void)
  {
  	Size		size = 0;
  
  	/* size of buffer descriptors */
! 	size = add_size(size, mul_size(NBuffers, sizeof(BufferDesc)));
  
  	/* size of data pages */
! 	size = add_size(size, mul_size(NBuffers, BLCKSZ));
  
  	/* size of stuff controlled by freelist.c */
  	size = add_size(size, StrategyShmemSize());
! 
  	return size;
  }
--- 164,180 ----
  BufferShmemSize(void)
  {
  	Size		size = 0;
+ 
+ 	NSharedBuffers = add_size(NBuffers, NPriorityBuffers);
  
  	/* size of buffer descriptors */
! 	size = add_size(size, mul_size(NSharedBuffers, sizeof(BufferDesc)));
  
  	/* size of data pages */
! 	size = add_size(size, mul_size(NSharedBuffers, BLCKSZ));
  
  	/* size of stuff controlled by freelist.c */
  	size = add_size(size, StrategyShmemSize());
! 
  	return size;
  }
*** a/src/backend/storage/buffer/buf_table.c
--- b/src/backend/storage/buffer/buf_table.c
***************
*** 34,40 **** typedef struct
  
  static HTAB *SharedBufHash;
  
- 
  /*
   * Estimate space needed for mapping hashtable
   *		size is the desired hash table size (possibly more than NBuffers)
--- 34,39 ----
*** a/src/backend/storage/buffer/bufmgr.c
--- b/src/backend/storage/buffer/bufmgr.c
***************
*** 86,92 **** static bool IsForInput;
  static volatile BufferDesc *PinCountWaitBuf = NULL;
  
  
! static Buffer ReadBuffer_common(SMgrRelation reln, char relpersistence,
  				  ForkNumber forkNum, BlockNumber blockNum,
  				  ReadBufferMode mode, BufferAccessStrategy strategy,
  				  bool *hit);
--- 86,92 ----
  static volatile BufferDesc *PinCountWaitBuf = NULL;
  
  
! static Buffer ReadBuffer_common(int poolid, SMgrRelation reln, char relpersistence,
  				  ForkNumber forkNum, BlockNumber blockNum,
  				  ReadBufferMode mode, BufferAccessStrategy strategy,
  				  bool *hit);
***************
*** 101,107 **** static void TerminateBufferIO(volatile BufferDesc *buf, bool clear_dirty,
  				  int set_flag_bits);
  static void shared_buffer_write_error_callback(void *arg);
  static void local_buffer_write_error_callback(void *arg);
! static volatile BufferDesc *BufferAlloc(SMgrRelation smgr,
  			char relpersistence,
  			ForkNumber forkNum,
  			BlockNumber blockNum,
--- 101,107 ----
  				  int set_flag_bits);
  static void shared_buffer_write_error_callback(void *arg);
  static void local_buffer_write_error_callback(void *arg);
! static volatile BufferDesc *BufferAlloc(int poolid, SMgrRelation smgr,
  			char relpersistence,
  			ForkNumber forkNum,
  			BlockNumber blockNum,
***************
*** 231,236 **** ReadBufferExtended(Relation reln, ForkNumber forkNum, BlockNumber blockNum,
--- 231,237 ----
  				   ReadBufferMode mode, BufferAccessStrategy strategy)
  {
  	bool		hit;
+ 	int			poolid;
  	Buffer		buf;
  
  	/* Open it at the smgr level if not already done */
***************
*** 251,257 **** ReadBufferExtended(Relation reln, ForkNumber forkNum, BlockNumber blockNum,
  	 * miss.
  	 */
  	pgstat_count_buffer_read(reln);
! 	buf = ReadBuffer_common(reln->rd_smgr, reln->rd_rel->relpersistence,
  							forkNum, blockNum, mode, strategy, &hit);
  	if (hit)
  		pgstat_count_buffer_hit(reln);
--- 252,260 ----
  	 * miss.
  	 */
  	pgstat_count_buffer_read(reln);
! 	poolid = RelationIsInPriorityBufferPool(reln) ? PRIORITY_BUFFER_POOL : DEFAULT_BUFFER_POOL;
! 
! 	buf = ReadBuffer_common(poolid, reln->rd_smgr, reln->rd_rel->relpersistence,
  							forkNum, blockNum, mode, strategy, &hit);
  	if (hit)
  		pgstat_count_buffer_hit(reln);
***************
*** 278,285 **** ReadBufferWithoutRelcache(RelFileNode rnode, ForkNumber forkNum,
  	SMgrRelation smgr = smgropen(rnode, InvalidBackendId);
  
  	Assert(InRecovery);
! 
! 	return ReadBuffer_common(smgr, RELPERSISTENCE_PERMANENT, forkNum, blockNum,
  							 mode, strategy, &hit);
  }
  
--- 281,287 ----
  	SMgrRelation smgr = smgropen(rnode, InvalidBackendId);
  
  	Assert(InRecovery);
! 	return ReadBuffer_common(DEFAULT_BUFFER_POOL, smgr, RELPERSISTENCE_PERMANENT, forkNum, blockNum,
  							 mode, strategy, &hit);
  }
  
***************
*** 290,296 **** ReadBufferWithoutRelcache(RelFileNode rnode, ForkNumber forkNum,
   * *hit is set to true if the request was satisfied from shared buffer cache.
   */
  static Buffer
! ReadBuffer_common(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
  				  BlockNumber blockNum, ReadBufferMode mode,
  				  BufferAccessStrategy strategy, bool *hit)
  {
--- 292,298 ----
   * *hit is set to true if the request was satisfied from shared buffer cache.
   */
  static Buffer
! ReadBuffer_common(int poolid, SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
  				  BlockNumber blockNum, ReadBufferMode mode,
  				  BufferAccessStrategy strategy, bool *hit)
  {
***************
*** 332,338 **** ReadBuffer_common(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
  		 * lookup the buffer.  IO_IN_PROGRESS is set if the requested block is
  		 * not currently in memory.
  		 */
! 		bufHdr = BufferAlloc(smgr, relpersistence, forkNum, blockNum,
  							 strategy, &found);
  		if (found)
  			pgBufferUsage.shared_blks_hit++;
--- 334,340 ----
  		 * lookup the buffer.  IO_IN_PROGRESS is set if the requested block is
  		 * not currently in memory.
  		 */
! 		bufHdr = BufferAlloc(poolid, smgr, relpersistence, forkNum, blockNum,
  							 strategy, &found);
  		if (found)
  			pgBufferUsage.shared_blks_hit++;
***************
*** 531,537 **** ReadBuffer_common(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
   * No locks are held either at entry or exit.
   */
  static volatile BufferDesc *
! BufferAlloc(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
  			BlockNumber blockNum,
  			BufferAccessStrategy strategy,
  			bool *foundPtr)
--- 533,539 ----
   * No locks are held either at entry or exit.
   */
  static volatile BufferDesc *
! BufferAlloc(int poolid, SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
  			BlockNumber blockNum,
  			BufferAccessStrategy strategy,
  			bool *foundPtr)
***************
*** 612,618 **** BufferAlloc(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
  		 * still held, since it would be bad to hold the spinlock while
  		 * possibly waking up other processes.
  		 */
! 		buf = StrategyGetBuffer(strategy, &lock_held);
  
  		Assert(buf->refcount == 0);
  
--- 614,620 ----
  		 * still held, since it would be bad to hold the spinlock while
  		 * possibly waking up other processes.
  		 */
! 		buf = StrategyGetBuffer(poolid, strategy, &lock_held);
  
  		Assert(buf->refcount == 0);
  
***************
*** 624,630 **** BufferAlloc(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
  
  		/* Now it's safe to release the freelist lock */
  		if (lock_held)
! 			LWLockRelease(BufFreelistLock);
  
  		/*
  		 * If the buffer was dirty, try to write it out.  There is a race
--- 626,632 ----
  
  		/* Now it's safe to release the freelist lock */
  		if (lock_held)
! 			LWLockRelease(BufFreelistLock(poolid));
  
  		/*
  		 * If the buffer was dirty, try to write it out.  There is a race
***************
*** 1245,1251 **** BufferSync(int flags)
  	 * certainly need to be written for the next checkpoint attempt, too.
  	 */
  	num_to_write = 0;
! 	for (buf_id = 0; buf_id < NBuffers; buf_id++)
  	{
  		volatile BufferDesc *bufHdr = &BufferDescriptors[buf_id];
  
--- 1247,1253 ----
  	 * certainly need to be written for the next checkpoint attempt, too.
  	 */
  	num_to_write = 0;
! 	for (buf_id = 0; buf_id < NSharedBuffers; buf_id++)
  	{
  		volatile BufferDesc *bufHdr = &BufferDescriptors[buf_id];
  
***************
*** 1267,1273 **** BufferSync(int flags)
  	if (num_to_write == 0)
  		return;					/* nothing to do */
  
! 	TRACE_POSTGRESQL_BUFFER_SYNC_START(NBuffers, num_to_write);
  
  	/*
  	 * Loop over all buffers again, and write the ones (still) marked with
--- 1269,1275 ----
  	if (num_to_write == 0)
  		return;					/* nothing to do */
  
! 	TRACE_POSTGRESQL_BUFFER_SYNC_START(NSharedBuffers, num_to_write);
  
  	/*
  	 * Loop over all buffers again, and write the ones (still) marked with
***************
*** 1277,1284 **** BufferSync(int flags)
  	 * Note that we don't read the buffer alloc count here --- that should be
  	 * left untouched till the next BgBufferSync() call.
  	 */
! 	buf_id = StrategySyncStart(NULL, NULL);
! 	num_to_scan = NBuffers;
  	num_written = 0;
  	while (num_to_scan-- > 0)
  	{
--- 1279,1286 ----
  	 * Note that we don't read the buffer alloc count here --- that should be
  	 * left untouched till the next BgBufferSync() call.
  	 */
! 	buf_id = StrategySyncStart(DEFAULT_BUFFER_POOL, NULL, NULL);
! 	num_to_scan = NSharedBuffers;
  	num_written = 0;
  	while (num_to_scan-- > 0)
  	{
***************
*** 1325,1341 **** BufferSync(int flags)
  			}
  		}
  
! 		if (++buf_id >= NBuffers)
  			buf_id = 0;
  	}
- 
  	/*
  	 * Update checkpoint statistics. As noted above, this doesn't include
  	 * buffers written by other backends or bgwriter scan.
  	 */
  	CheckpointStats.ckpt_bufs_written += num_written;
  
! 	TRACE_POSTGRESQL_BUFFER_SYNC_DONE(NBuffers, num_written, num_to_write);
  }
  
  /*
--- 1327,1342 ----
  			}
  		}
  
! 		if (++buf_id >= NSharedBuffers)
  			buf_id = 0;
  	}
  	/*
  	 * Update checkpoint statistics. As noted above, this doesn't include
  	 * buffers written by other backends or bgwriter scan.
  	 */
  	CheckpointStats.ckpt_bufs_written += num_written;
  
! 	TRACE_POSTGRESQL_BUFFER_SYNC_DONE(NSharedBuffers, num_written, num_to_write);
  }
  
  /*
***************
*** 1388,1634 **** BgBufferSync(void)
  	int			num_to_scan;
  	int			num_written;
  	int			reusable_buffers;
  
  	/* Variables for final smoothed_density update */
  	long		new_strategy_delta;
  	uint32		new_recent_alloc;
  
! 	/*
! 	 * Find out where the freelist clock sweep currently is, and how many
! 	 * buffer allocations have happened since our last call.
! 	 */
! 	strategy_buf_id = StrategySyncStart(&strategy_passes, &recent_alloc);
! 
! 	/* Report buffer alloc counts to pgstat */
! 	BgWriterStats.m_buf_alloc += recent_alloc;
! 
! 	/*
! 	 * If we're not running the LRU scan, just stop after doing the stats
! 	 * stuff.  We mark the saved state invalid so that we can recover sanely
! 	 * if LRU scan is turned back on later.
! 	 */
! 	if (bgwriter_lru_maxpages <= 0)
  	{
! 		saved_info_valid = false;
! 		return true;
! 	}
! 
! 	/*
! 	 * Compute strategy_delta = how many buffers have been scanned by the
! 	 * clock sweep since last time.  If first time through, assume none. Then
! 	 * see if we are still ahead of the clock sweep, and if so, how many
! 	 * buffers we could scan before we'd catch up with it and "lap" it. Note:
! 	 * weird-looking coding of xxx_passes comparisons are to avoid bogus
! 	 * behavior when the passes counts wrap around.
! 	 */
! 	if (saved_info_valid)
! 	{
! 		int32		passes_delta = strategy_passes - prev_strategy_passes;
! 
! 		strategy_delta = strategy_buf_id - prev_strategy_buf_id;
! 		strategy_delta += (long) passes_delta *NBuffers;
  
! 		Assert(strategy_delta >= 0);
  
! 		if ((int32) (next_passes - strategy_passes) > 0)
  		{
! 			/* we're one pass ahead of the strategy point */
! 			bufs_to_lap = strategy_buf_id - next_to_clean;
! #ifdef BGW_DEBUG
! 			elog(DEBUG2, "bgwriter ahead: bgw %u-%u strategy %u-%u delta=%ld lap=%d",
! 				 next_passes, next_to_clean,
! 				 strategy_passes, strategy_buf_id,
! 				 strategy_delta, bufs_to_lap);
! #endif
  		}
! 		else if (next_passes == strategy_passes &&
! 				 next_to_clean >= strategy_buf_id)
  		{
! 			/* on same pass, but ahead or at least not behind */
! 			bufs_to_lap = NBuffers - (next_to_clean - strategy_buf_id);
! #ifdef BGW_DEBUG
! 			elog(DEBUG2, "bgwriter ahead: bgw %u-%u strategy %u-%u delta=%ld lap=%d",
! 				 next_passes, next_to_clean,
! 				 strategy_passes, strategy_buf_id,
! 				 strategy_delta, bufs_to_lap);
! #endif
  		}
  		else
  		{
  			/*
! 			 * We're behind, so skip forward to the strategy point and start
! 			 * cleaning from there.
  			 */
! #ifdef BGW_DEBUG
! 			elog(DEBUG2, "bgwriter behind: bgw %u-%u strategy %u-%u delta=%ld",
! 				 next_passes, next_to_clean,
! 				 strategy_passes, strategy_buf_id,
! 				 strategy_delta);
! #endif
  			next_to_clean = strategy_buf_id;
  			next_passes = strategy_passes;
! 			bufs_to_lap = NBuffers;
  		}
- 	}
- 	else
- 	{
- 		/*
- 		 * Initializing at startup or after LRU scanning had been off. Always
- 		 * start at the strategy point.
- 		 */
- #ifdef BGW_DEBUG
- 		elog(DEBUG2, "bgwriter initializing: strategy %u-%u",
- 			 strategy_passes, strategy_buf_id);
- #endif
- 		strategy_delta = 0;
- 		next_to_clean = strategy_buf_id;
- 		next_passes = strategy_passes;
- 		bufs_to_lap = NBuffers;
- 	}
  
! 	/* Update saved info for next time */
! 	prev_strategy_buf_id = strategy_buf_id;
! 	prev_strategy_passes = strategy_passes;
! 	saved_info_valid = true;
  
! 	/*
! 	 * Compute how many buffers had to be scanned for each new allocation, ie,
! 	 * 1/density of reusable buffers, and track a moving average of that.
! 	 *
! 	 * If the strategy point didn't move, we don't update the density estimate
! 	 */
! 	if (strategy_delta > 0 && recent_alloc > 0)
! 	{
! 		scans_per_alloc = (float) strategy_delta / (float) recent_alloc;
! 		smoothed_density += (scans_per_alloc - smoothed_density) /
! 			smoothing_samples;
! 	}
! 
! 	/*
! 	 * Estimate how many reusable buffers there are between the current
! 	 * strategy point and where we've scanned ahead to, based on the smoothed
! 	 * density estimate.
! 	 */
! 	bufs_ahead = NBuffers - bufs_to_lap;
! 	reusable_buffers_est = (float) bufs_ahead / smoothed_density;
  
! 	/*
! 	 * Track a moving average of recent buffer allocations.  Here, rather than
! 	 * a true average we want a fast-attack, slow-decline behavior: we
! 	 * immediately follow any increase.
! 	 */
! 	if (smoothed_alloc <= (float) recent_alloc)
! 		smoothed_alloc = recent_alloc;
! 	else
! 		smoothed_alloc += ((float) recent_alloc - smoothed_alloc) /
! 			smoothing_samples;
  
! 	/* Scale the estimate by a GUC to allow more aggressive tuning. */
! 	upcoming_alloc_est = (int) (smoothed_alloc * bgwriter_lru_multiplier);
  
! 	/*
! 	 * If recent_alloc remains at zero for many cycles, smoothed_alloc will
! 	 * eventually underflow to zero, and the underflows produce annoying
! 	 * kernel warnings on some platforms.  Once upcoming_alloc_est has gone to
! 	 * zero, there's no point in tracking smaller and smaller values of
! 	 * smoothed_alloc, so just reset it to exactly zero to avoid this
! 	 * syndrome.  It will pop back up as soon as recent_alloc increases.
! 	 */
! 	if (upcoming_alloc_est == 0)
! 		smoothed_alloc = 0;
  
! 	/*
! 	 * Even in cases where there's been little or no buffer allocation
! 	 * activity, we want to make a small amount of progress through the buffer
! 	 * cache so that as many reusable buffers as possible are clean after an
! 	 * idle period.
! 	 *
! 	 * (scan_whole_pool_milliseconds / BgWriterDelay) computes how many times
! 	 * the BGW will be called during the scan_whole_pool time; slice the
! 	 * buffer pool into that many sections.
! 	 */
! 	min_scan_buffers = (int) (NBuffers / (scan_whole_pool_milliseconds / BgWriterDelay));
  
! 	if (upcoming_alloc_est < (min_scan_buffers + reusable_buffers_est))
! 	{
! #ifdef BGW_DEBUG
! 		elog(DEBUG2, "bgwriter: alloc_est=%d too small, using min=%d + reusable_est=%d",
! 			 upcoming_alloc_est, min_scan_buffers, reusable_buffers_est);
! #endif
! 		upcoming_alloc_est = min_scan_buffers + reusable_buffers_est;
! 	}
  
! 	/*
! 	 * Now write out dirty reusable buffers, working forward from the
! 	 * next_to_clean point, until we have lapped the strategy scan, or cleaned
! 	 * enough buffers to match our estimate of the next cycle's allocation
! 	 * requirements, or hit the bgwriter_lru_maxpages limit.
! 	 */
  
! 	/* Make sure we can handle the pin inside SyncOneBuffer */
! 	ResourceOwnerEnlargeBuffers(CurrentResourceOwner);
  
! 	num_to_scan = bufs_to_lap;
! 	num_written = 0;
! 	reusable_buffers = reusable_buffers_est;
  
! 	/* Execute the LRU scan */
! 	while (num_to_scan > 0 && reusable_buffers < upcoming_alloc_est)
! 	{
! 		int			buffer_state = SyncOneBuffer(next_to_clean, true);
  
! 		if (++next_to_clean >= NBuffers)
  		{
! 			next_to_clean = 0;
! 			next_passes++;
! 		}
! 		num_to_scan--;
  
! 		if (buffer_state & BUF_WRITTEN)
! 		{
! 			reusable_buffers++;
! 			if (++num_written >= bgwriter_lru_maxpages)
  			{
! 				BgWriterStats.m_maxwritten_clean++;
! 				break;
  			}
  		}
- 		else if (buffer_state & BUF_REUSABLE)
- 			reusable_buffers++;
- 	}
  
! 	BgWriterStats.m_buf_written_clean += num_written;
  
! #ifdef BGW_DEBUG
! 	elog(DEBUG1, "bgwriter: recent_alloc=%u smoothed=%.2f delta=%ld ahead=%d density=%.2f reusable_est=%d upcoming_est=%d scanned=%d wrote=%d reusable=%d",
! 		 recent_alloc, smoothed_alloc, strategy_delta, bufs_ahead,
! 		 smoothed_density, reusable_buffers_est, upcoming_alloc_est,
! 		 bufs_to_lap - num_to_scan,
! 		 num_written,
! 		 reusable_buffers - reusable_buffers_est);
! #endif
  
! 	/*
! 	 * Consider the above scan as being like a new allocation scan.
! 	 * Characterize its density and update the smoothed one based on it. This
! 	 * effectively halves the moving average period in cases where both the
! 	 * strategy and the background writer are doing some useful scanning,
! 	 * which is helpful because a long memory isn't as desirable on the
! 	 * density estimates.
! 	 */
! 	new_strategy_delta = bufs_to_lap - num_to_scan;
! 	new_recent_alloc = reusable_buffers - reusable_buffers_est;
! 	if (new_strategy_delta > 0 && new_recent_alloc > 0)
! 	{
! 		scans_per_alloc = (float) new_strategy_delta / (float) new_recent_alloc;
! 		smoothed_density += (scans_per_alloc - smoothed_density) /
! 			smoothing_samples;
! 
! #ifdef BGW_DEBUG
! 		elog(DEBUG2, "bgwriter: cleaner density alloc=%u scan=%ld density=%.2f new smoothed=%.2f",
! 			 new_recent_alloc, new_strategy_delta,
! 			 scans_per_alloc, smoothed_density);
! #endif
  	}
  
  	/* Return true if OK to hibernate */
--- 1389,1639 ----
  	int			num_to_scan;
  	int			num_written;
  	int			reusable_buffers;
+ 	int			poolid;
  
  	/* Variables for final smoothed_density update */
  	long		new_strategy_delta;
  	uint32		new_recent_alloc;
  
! 	for (poolid = 0; poolid < NUM_MAX_BUFFER_POOLS; poolid++)
  	{
! 		/*
! 		 * Find out where the freelist clock sweep currently is, and how many
! 		 * buffer allocations have happened since our last call.
! 		 */
! 		strategy_buf_id = StrategySyncStart(poolid, &strategy_passes, &recent_alloc);
  
! 		/* Report buffer alloc counts to pgstat */
! 		BgWriterStats.m_buf_alloc += recent_alloc;
  
! 		/*
! 		 * If we're not running the LRU scan, just stop after doing the stats
! 		 * stuff.  We mark the saved state invalid so that we can recover sanely
! 		 * if LRU scan is turned back on later.
! 		 */
! 		if (bgwriter_lru_maxpages <= 0)
  		{
! 			saved_info_valid = false;
! 			return true;
  		}
! 
! 		/*
! 		 * Compute strategy_delta = how many buffers have been scanned by the
! 		 * clock sweep since last time.  If first time through, assume none. Then
! 		 * see if we are still ahead of the clock sweep, and if so, how many
! 		 * buffers we could scan before we'd catch up with it and "lap" it. Note:
! 		 * weird-looking coding of xxx_passes comparisons are to avoid bogus
! 		 * behavior when the passes counts wrap around.
! 		 */
! 		if (saved_info_valid)
  		{
! 			int32		passes_delta = strategy_passes - prev_strategy_passes;
! 
! 			strategy_delta = strategy_buf_id - prev_strategy_buf_id;
! 			strategy_delta += (long) passes_delta * ((poolid == 0) ? NBuffers : NSharedBuffers);
! 
! 			/* Assert(strategy_delta >= 0);	 XXX temporarily disabled to avoid a crash */
! 
! 			if ((int32) (next_passes - strategy_passes) > 0)
! 			{
! 				/* we're one pass ahead of the strategy point */
! 				bufs_to_lap = strategy_buf_id - next_to_clean;
! #ifdef BGW_DEBUG
! 				elog(DEBUG2, "bgwriter ahead: bgw %u-%u strategy %u-%u delta=%ld lap=%d",
! 					 next_passes, next_to_clean,
! 					 strategy_passes, strategy_buf_id,
! 					 strategy_delta, bufs_to_lap);
! #endif
! 			}
! 			else if (next_passes == strategy_passes &&
! 					 next_to_clean >= strategy_buf_id)
! 			{
! 				/* on same pass, but ahead or at least not behind */
! 				bufs_to_lap = ((poolid == 0) ? NBuffers : NSharedBuffers) - (next_to_clean - strategy_buf_id);
! #ifdef BGW_DEBUG
! 				elog(DEBUG2, "bgwriter ahead: bgw %u-%u strategy %u-%u delta=%ld lap=%d",
! 					 next_passes, next_to_clean,
! 					 strategy_passes, strategy_buf_id,
! 					 strategy_delta, bufs_to_lap);
! #endif
! 			}
! 			else
! 			{
! 				/*
! 				 * We're behind, so skip forward to the strategy point and start
! 				 * cleaning from there.
! 				 */
! #ifdef BGW_DEBUG
! 				elog(DEBUG2, "bgwriter behind: bgw %u-%u strategy %u-%u delta=%ld",
! 					 next_passes, next_to_clean,
! 					 strategy_passes, strategy_buf_id,
! 					 strategy_delta);
! #endif
! 				next_to_clean = strategy_buf_id;
! 				next_passes = strategy_passes;
! 				bufs_to_lap = ((poolid == 0) ? NBuffers : NSharedBuffers);
! 			}
  		}
  		else
  		{
  			/*
! 			 * Initializing at startup or after LRU scanning had been off. Always
! 			 * start at the strategy point.
  			 */
! #ifdef BGW_DEBUG
! 			elog(DEBUG2, "bgwriter initializing: strategy %u-%u",
! 				 strategy_passes, strategy_buf_id);
! #endif
! 			strategy_delta = 0;
  			next_to_clean = strategy_buf_id;
  			next_passes = strategy_passes;
! 			bufs_to_lap = ((poolid == 0) ? NBuffers : NSharedBuffers);
  		}
  
! 		/* Update saved info for next time */
! 		prev_strategy_buf_id = strategy_buf_id;
! 		prev_strategy_passes = strategy_passes;
! 		saved_info_valid = true;
  
! 		/*
! 		 * Compute how many buffers had to be scanned for each new allocation, ie,
! 		 * 1/density of reusable buffers, and track a moving average of that.
! 		 *
! 		 * If the strategy point didn't move, we don't update the density estimate
! 		 */
! 		if (strategy_delta > 0 && recent_alloc > 0)
! 		{
! 			scans_per_alloc = (float) strategy_delta / (float) recent_alloc;
! 			smoothed_density += (scans_per_alloc - smoothed_density) /
! 				smoothing_samples;
! 		}
  
! 		/*
! 		 * Estimate how many reusable buffers there are between the current
! 		 * strategy point and where we've scanned ahead to, based on the smoothed
! 		 * density estimate.
! 		 */
! 		bufs_ahead = ((poolid == 0) ? NBuffers : NSharedBuffers) - bufs_to_lap;
! 		reusable_buffers_est = (float) bufs_ahead / smoothed_density;
  
! 		/*
! 		 * Track a moving average of recent buffer allocations.  Here, rather than
! 		 * a true average we want a fast-attack, slow-decline behavior: we
! 		 * immediately follow any increase.
! 		 */
! 		if (smoothed_alloc <= (float) recent_alloc)
! 			smoothed_alloc = recent_alloc;
! 		else
! 			smoothed_alloc += ((float) recent_alloc - smoothed_alloc) /
! 				smoothing_samples;
  
! 		/* Scale the estimate by a GUC to allow more aggressive tuning. */
! 		upcoming_alloc_est = (int) (smoothed_alloc * bgwriter_lru_multiplier);
  
! 		/*
! 		 * If recent_alloc remains at zero for many cycles, smoothed_alloc will
! 		 * eventually underflow to zero, and the underflows produce annoying
! 		 * kernel warnings on some platforms.  Once upcoming_alloc_est has gone to
! 		 * zero, there's no point in tracking smaller and smaller values of
! 		 * smoothed_alloc, so just reset it to exactly zero to avoid this
! 		 * syndrome.  It will pop back up as soon as recent_alloc increases.
! 		 */
! 		if (upcoming_alloc_est == 0)
! 			smoothed_alloc = 0;
  
! 		/*
! 		 * Even in cases where there's been little or no buffer allocation
! 		 * activity, we want to make a small amount of progress through the buffer
! 		 * cache so that as many reusable buffers as possible are clean after an
! 		 * idle period.
! 		 *
! 		 * (scan_whole_pool_milliseconds / BgWriterDelay) computes how many times
! 		 * the BGW will be called during the scan_whole_pool time; slice the
! 		 * buffer pool into that many sections.
! 		 */
! 		min_scan_buffers = (int) (((poolid == 0) ? NBuffers : NPriorityBuffers) / (scan_whole_pool_milliseconds / BgWriterDelay));
  
! 		if (upcoming_alloc_est < (min_scan_buffers + reusable_buffers_est))
! 		{
! #ifdef BGW_DEBUG
! 			elog(DEBUG2, "bgwriter: alloc_est=%d too small, using min=%d + reusable_est=%d",
! 				 upcoming_alloc_est, min_scan_buffers, reusable_buffers_est);
! #endif
! 			upcoming_alloc_est = min_scan_buffers + reusable_buffers_est;
! 		}
  
! 		/*
! 		 * Now write out dirty reusable buffers, working forward from the
! 		 * next_to_clean point, until we have lapped the strategy scan, or cleaned
! 		 * enough buffers to match our estimate of the next cycle's allocation
! 		 * requirements, or hit the bgwriter_lru_maxpages limit.
! 		 */
  
! 		/* Make sure we can handle the pin inside SyncOneBuffer */
! 		ResourceOwnerEnlargeBuffers(CurrentResourceOwner);
  
! 		num_to_scan = bufs_to_lap;
! 		num_written = 0;
! 		reusable_buffers = reusable_buffers_est;
  
! 		/* Execute the LRU scan */
! 		while (num_to_scan > 0 && reusable_buffers < upcoming_alloc_est)
  		{
! 			int			buffer_state = SyncOneBuffer(next_to_clean, true);
  
! 			if (++next_to_clean >= ((poolid == 0) ? NBuffers : NSharedBuffers))
! 			{
! 				next_to_clean = (poolid == 0) ? 0 : NBuffers;
! 				next_passes++;
! 			}
! 			num_to_scan--;
! 
! 			if (buffer_state & BUF_WRITTEN)
  			{
! 				reusable_buffers++;
! 				if (++num_written >= bgwriter_lru_maxpages)
! 				{
! 					BgWriterStats.m_maxwritten_clean++;
! 					break;
! 				}
  			}
+ 			else if (buffer_state & BUF_REUSABLE)
+ 				reusable_buffers++;
  		}
  
! 		BgWriterStats.m_buf_written_clean += num_written;
  
! #ifdef BGW_DEBUG
! 		elog(DEBUG1, "bgwriter: recent_alloc=%u smoothed=%.2f delta=%ld ahead=%d density=%.2f reusable_est=%d upcoming_est=%d scanned=%d wrote=%d reusable=%d",
! 			 recent_alloc, smoothed_alloc, strategy_delta, bufs_ahead,
! 			 smoothed_density, reusable_buffers_est, upcoming_alloc_est,
! 			 bufs_to_lap - num_to_scan,
! 			 num_written,
! 			 reusable_buffers - reusable_buffers_est);
! #endif
  
! 		/*
! 		 * Consider the above scan as being like a new allocation scan.
! 		 * Characterize its density and update the smoothed one based on it. This
! 		 * effectively halves the moving average period in cases where both the
! 		 * strategy and the background writer are doing some useful scanning,
! 		 * which is helpful because a long memory isn't as desirable on the
! 		 * density estimates.
! 		 */
! 		new_strategy_delta = bufs_to_lap - num_to_scan;
! 		new_recent_alloc = reusable_buffers - reusable_buffers_est;
! 		if (new_strategy_delta > 0 && new_recent_alloc > 0)
! 		{
! 			scans_per_alloc = (float) new_strategy_delta / (float) new_recent_alloc;
! 			smoothed_density += (scans_per_alloc - smoothed_density) /
! 				smoothing_samples;
! 
! #ifdef BGW_DEBUG
! 			elog(DEBUG2, "bgwriter: cleaner density alloc=%u scan=%ld density=%.2f new smoothed=%.2f",
! 				 new_recent_alloc, new_strategy_delta,
! 				 scans_per_alloc, smoothed_density);
! #endif
! 		}
  	}
  
  	/* Return true if OK to hibernate */
***************
*** 1716,1722 **** AtEOXact_Buffers(bool isCommit)
  		int			RefCountErrors = 0;
  		Buffer		b;
  
! 		for (b = 1; b <= NBuffers; b++)
  		{
  			if (PrivateRefCount[b - 1] != 0)
  			{
--- 1721,1727 ----
  		int			RefCountErrors = 0;
  		Buffer		b;
  
! 		for (b = 1; b <= NSharedBuffers; b++)
  		{
  			if (PrivateRefCount[b - 1] != 0)
  			{
***************
*** 1762,1768 **** AtProcExit_Buffers(int code, Datum arg)
  		int			RefCountErrors = 0;
  		Buffer		b;
  
! 		for (b = 1; b <= NBuffers; b++)
  		{
  			if (PrivateRefCount[b - 1] != 0)
  			{
--- 1767,1773 ----
  		int			RefCountErrors = 0;
  		Buffer		b;
  
! 		for (b = 1; b <= NSharedBuffers; b++)
  		{
  			if (PrivateRefCount[b - 1] != 0)
  			{
***************
*** 2143,2149 **** DropRelFileNodeBuffers(RelFileNodeBackend rnode, ForkNumber forkNum,
  		return;
  	}
  
! 	for (i = 0; i < NBuffers; i++)
  	{
  		volatile BufferDesc *bufHdr = &BufferDescriptors[i];
  
--- 2148,2154 ----
  		return;
  	}
  
! 	for (i = 0; i < NSharedBuffers; i++)
  	{
  		volatile BufferDesc *bufHdr = &BufferDescriptors[i];
  
***************
*** 2232,2238 **** DropRelFileNodesAllBuffers(RelFileNodeBackend *rnodes, int nnodes)
  	if (use_bsearch)
  		pg_qsort(nodes, n, sizeof(RelFileNode), rnode_comparator);
  
! 	for (i = 0; i < NBuffers; i++)
  	{
  		RelFileNode *rnode = NULL;
  		volatile BufferDesc *bufHdr = &BufferDescriptors[i];
--- 2237,2243 ----
  	if (use_bsearch)
  		pg_qsort(nodes, n, sizeof(RelFileNode), rnode_comparator);
  
! 	for (i = 0; i < NSharedBuffers; i++)
  	{
  		RelFileNode *rnode = NULL;
  		volatile BufferDesc *bufHdr = &BufferDescriptors[i];
***************
*** 2297,2303 **** DropDatabaseBuffers(Oid dbid)
  	 * database isn't our own.
  	 */
  
! 	for (i = 0; i < NBuffers; i++)
  	{
  		volatile BufferDesc *bufHdr = &BufferDescriptors[i];
  
--- 2302,2308 ----
  	 * database isn't our own.
  	 */
  
! 	for (i = 0; i < NSharedBuffers; i++)
  	{
  		volatile BufferDesc *bufHdr = &BufferDescriptors[i];
  
***************
*** 2330,2336 **** PrintBufferDescs(void)
  	int			i;
  	volatile BufferDesc *buf = BufferDescriptors;
  
! 	for (i = 0; i < NBuffers; ++i, ++buf)
  	{
  		/* theoretically we should lock the bufhdr here */
  		elog(LOG,
--- 2335,2341 ----
  	int			i;
  	volatile BufferDesc *buf = BufferDescriptors;
  
! 	for (i = 0; i < NSharedBuffers; ++i, ++buf)
  	{
  		/* theoretically we should lock the bufhdr here */
  		elog(LOG,
***************
*** 2351,2357 **** PrintPinnedBufs(void)
  	int			i;
  	volatile BufferDesc *buf = BufferDescriptors;
  
! 	for (i = 0; i < NBuffers; ++i, ++buf)
  	{
  		if (PrivateRefCount[i] > 0)
  		{
--- 2356,2362 ----
  	int			i;
  	volatile BufferDesc *buf = BufferDescriptors;
  
! 	for (i = 0; i < NSharedBuffers; ++i, ++buf)
  	{
  		if (PrivateRefCount[i] > 0)
  		{
***************
*** 2436,2442 **** FlushRelationBuffers(Relation rel)
  	/* Make sure we can handle the pin inside the loop */
  	ResourceOwnerEnlargeBuffers(CurrentResourceOwner);
  
! 	for (i = 0; i < NBuffers; i++)
  	{
  		bufHdr = &BufferDescriptors[i];
  
--- 2441,2447 ----
  	/* Make sure we can handle the pin inside the loop */
  	ResourceOwnerEnlargeBuffers(CurrentResourceOwner);
  
! 	for (i = 0; i < NSharedBuffers; i++)
  	{
  		bufHdr = &BufferDescriptors[i];
  
***************
*** 2486,2492 **** FlushDatabaseBuffers(Oid dbid)
  	/* Make sure we can handle the pin inside the loop */
  	ResourceOwnerEnlargeBuffers(CurrentResourceOwner);
  
! 	for (i = 0; i < NBuffers; i++)
  	{
  		bufHdr = &BufferDescriptors[i];
  
--- 2491,2497 ----
  	/* Make sure we can handle the pin inside the loop */
  	ResourceOwnerEnlargeBuffers(CurrentResourceOwner);
  
! 	for (i = 0; i < NSharedBuffers; i++)
  	{
  		bufHdr = &BufferDescriptors[i];
  
*** a/src/backend/storage/buffer/freelist.c
--- b/src/backend/storage/buffer/freelist.c
***************
*** 49,55 **** typedef struct
  } BufferStrategyControl;
  
  /* Pointers to shared state */
! static BufferStrategyControl *StrategyControl = NULL;
  
  /*
   * Private (non-shared) state for managing a ring of shared buffers to re-use.
--- 49,66 ----
  } BufferStrategyControl;
  
  /* Pointers to shared state */
! static BufferStrategyControl *StrategyControl[NUM_MAX_BUFFER_POOLS];
! 
! struct bufferAccessStrategyStatus
! {
! 	char name[NAMEDATALEN];
! 	int  num_buffers;
! };
! 
! struct bufferAccessStrategyStatus BufferStrategyStatus[NUM_MAX_BUFFER_POOLS] = {
! 	{"Default Buffer Strategy status", 0},
! 	{"Priority Buffer Strategy status", 0}
! };
  
  /*
   * Private (non-shared) state for managing a ring of shared buffers to re-use.
***************
*** 109,115 **** static void AddBufferToRing(BufferAccessStrategy strategy,
   *	kernel calls while holding the buffer header spinlock.
   */
  volatile BufferDesc *
! StrategyGetBuffer(BufferAccessStrategy strategy, bool *lock_held)
  {
  	volatile BufferDesc *buf;
  	Latch	   *bgwriterLatch;
--- 120,126 ----
   *	kernel calls while holding the buffer header spinlock.
   */
  volatile BufferDesc *
! StrategyGetBuffer(int poolid, BufferAccessStrategy strategy, bool *lock_held)
  {
  	volatile BufferDesc *buf;
  	Latch	   *bgwriterLatch;
***************
*** 131,144 **** StrategyGetBuffer(BufferAccessStrategy strategy, bool *lock_held)
  
  	/* Nope, so lock the freelist */
  	*lock_held = true;
! 	LWLockAcquire(BufFreelistLock, LW_EXCLUSIVE);
  
  	/*
  	 * We count buffer allocation requests so that the bgwriter can estimate
  	 * the rate of buffer consumption.  Note that buffers recycled by a
  	 * strategy object are intentionally not counted here.
  	 */
! 	StrategyControl->numBufferAllocs++;
  
  	/*
  	 * If bgwriterLatch is set, we need to waken the bgwriter, but we should
--- 142,155 ----
  
  	/* Nope, so lock the freelist */
  	*lock_held = true;
! 	LWLockAcquire(BufFreelistLock(poolid), LW_EXCLUSIVE);
  
  	/*
  	 * We count buffer allocation requests so that the bgwriter can estimate
  	 * the rate of buffer consumption.  Note that buffers recycled by a
  	 * strategy object are intentionally not counted here.
  	 */
! 	StrategyControl[poolid]->numBufferAllocs++;
  
  	/*
  	 * If bgwriterLatch is set, we need to waken the bgwriter, but we should
***************
*** 146,158 **** StrategyGetBuffer(BufferAccessStrategy strategy, bool *lock_held)
  	 * is annoyingly tedious, but it happens at most once per bgwriter cycle,
  	 * so the performance hit is minimal.
  	 */
! 	bgwriterLatch = StrategyControl->bgwriterLatch;
  	if (bgwriterLatch)
  	{
! 		StrategyControl->bgwriterLatch = NULL;
! 		LWLockRelease(BufFreelistLock);
  		SetLatch(bgwriterLatch);
! 		LWLockAcquire(BufFreelistLock, LW_EXCLUSIVE);
  	}
  
  	/*
--- 157,169 ----
  	 * is annoyingly tedious, but it happens at most once per bgwriter cycle,
  	 * so the performance hit is minimal.
  	 */
! 	bgwriterLatch = StrategyControl[poolid]->bgwriterLatch;
  	if (bgwriterLatch)
  	{
! 		StrategyControl[poolid]->bgwriterLatch = NULL;
! 		LWLockRelease(BufFreelistLock(poolid));
  		SetLatch(bgwriterLatch);
! 		LWLockAcquire(BufFreelistLock(poolid), LW_EXCLUSIVE);
  	}
  
  	/*
***************
*** 161,173 **** StrategyGetBuffer(BufferAccessStrategy strategy, bool *lock_held)
  	 * individual buffer spinlocks, so it's OK to manipulate them without
  	 * holding the spinlock.
  	 */
! 	while (StrategyControl->firstFreeBuffer >= 0)
  	{
! 		buf = &BufferDescriptors[StrategyControl->firstFreeBuffer];
  		Assert(buf->freeNext != FREENEXT_NOT_IN_LIST);
  
  		/* Unconditionally remove buffer from freelist */
! 		StrategyControl->firstFreeBuffer = buf->freeNext;
  		buf->freeNext = FREENEXT_NOT_IN_LIST;
  
  		/*
--- 172,184 ----
  	 * individual buffer spinlocks, so it's OK to manipulate them without
  	 * holding the spinlock.
  	 */
! 	while (StrategyControl[poolid]->firstFreeBuffer >= 0)
  	{
! 		buf = &BufferDescriptors[StrategyControl[poolid]->firstFreeBuffer];
  		Assert(buf->freeNext != FREENEXT_NOT_IN_LIST);
  
  		/* Unconditionally remove buffer from freelist */
! 		StrategyControl[poolid]->firstFreeBuffer = buf->freeNext;
  		buf->freeNext = FREENEXT_NOT_IN_LIST;
  
  		/*
***************
*** 188,202 **** StrategyGetBuffer(BufferAccessStrategy strategy, bool *lock_held)
  	}
  
  	/* Nothing on the freelist, so run the "clock sweep" algorithm */
! 	trycounter = NBuffers;
  	for (;;)
  	{
! 		buf = &BufferDescriptors[StrategyControl->nextVictimBuffer];
  
! 		if (++StrategyControl->nextVictimBuffer >= NBuffers)
  		{
! 			StrategyControl->nextVictimBuffer = 0;
! 			StrategyControl->completePasses++;
  		}
  
  		/*
--- 199,213 ----
  	}
  
  	/* Nothing on the freelist, so run the "clock sweep" algorithm */
! 	trycounter = BufferStrategyStatus[poolid].num_buffers;
  	for (;;)
  	{
! 		buf = &BufferDescriptors[StrategyControl[poolid]->nextVictimBuffer];
  
! 		if (++StrategyControl[poolid]->nextVictimBuffer >= ((poolid == 0) ? BufferStrategyStatus[poolid].num_buffers : NSharedBuffers))
  		{
! 			StrategyControl[poolid]->nextVictimBuffer = (poolid == 0) ? 0 : BufferStrategyStatus[poolid - 1].num_buffers;
! 			StrategyControl[poolid]->completePasses++;
  		}
  
  		/*
***************
*** 209,215 **** StrategyGetBuffer(BufferAccessStrategy strategy, bool *lock_held)
  			if (buf->usage_count > 0)
  			{
  				buf->usage_count--;
! 				trycounter = NBuffers;
  			}
  			else
  			{
--- 220,226 ----
  			if (buf->usage_count > 0)
  			{
  				buf->usage_count--;
! 				trycounter = BufferStrategyStatus[poolid].num_buffers;
  			}
  			else
  			{
***************
*** 241,247 **** StrategyGetBuffer(BufferAccessStrategy strategy, bool *lock_held)
  void
  StrategyFreeBuffer(volatile BufferDesc *buf)
  {
! 	LWLockAcquire(BufFreelistLock, LW_EXCLUSIVE);
  
  	/*
  	 * It is possible that we are told to put something in the freelist that
--- 252,260 ----
  void
  StrategyFreeBuffer(volatile BufferDesc *buf)
  {
! 	int poolid = buf->buf_id < NBuffers ? DEFAULT_BUFFER_POOL : PRIORITY_BUFFER_POOL;
! 
! 	LWLockAcquire(BufFreelistLock(poolid), LW_EXCLUSIVE);
  
  	/*
  	 * It is possible that we are told to put something in the freelist that
***************
*** 249,261 **** StrategyFreeBuffer(volatile BufferDesc *buf)
  	 */
  	if (buf->freeNext == FREENEXT_NOT_IN_LIST)
  	{
! 		buf->freeNext = StrategyControl->firstFreeBuffer;
  		if (buf->freeNext < 0)
! 			StrategyControl->lastFreeBuffer = buf->buf_id;
! 		StrategyControl->firstFreeBuffer = buf->buf_id;
  	}
  
! 	LWLockRelease(BufFreelistLock);
  }
  
  /*
--- 262,274 ----
  	 */
  	if (buf->freeNext == FREENEXT_NOT_IN_LIST)
  	{
! 		buf->freeNext = StrategyControl[poolid]->firstFreeBuffer;
  		if (buf->freeNext < 0)
! 			StrategyControl[poolid]->lastFreeBuffer = buf->buf_id;
! 		StrategyControl[poolid]->firstFreeBuffer = buf->buf_id;
  	}
  
! 	LWLockRelease(BufFreelistLock(poolid));
  }
  
  /*
***************
*** 270,289 **** StrategyFreeBuffer(volatile BufferDesc *buf)
   * being read.
   */
  int
! StrategySyncStart(uint32 *complete_passes, uint32 *num_buf_alloc)
  {
  	int			result;
  
! 	LWLockAcquire(BufFreelistLock, LW_EXCLUSIVE);
! 	result = StrategyControl->nextVictimBuffer;
  	if (complete_passes)
! 		*complete_passes = StrategyControl->completePasses;
  	if (num_buf_alloc)
  	{
! 		*num_buf_alloc = StrategyControl->numBufferAllocs;
! 		StrategyControl->numBufferAllocs = 0;
  	}
! 	LWLockRelease(BufFreelistLock);
  	return result;
  }
  
--- 283,302 ----
   * being read.
   */
  int
! StrategySyncStart(int poolid, uint32 *complete_passes, uint32 *num_buf_alloc)
  {
  	int			result;
  
! 	LWLockAcquire(BufFreelistLock(poolid), LW_EXCLUSIVE);
! 	result = StrategyControl[poolid]->nextVictimBuffer;
  	if (complete_passes)
! 		*complete_passes = StrategyControl[poolid]->completePasses;
  	if (num_buf_alloc)
  	{
! 		*num_buf_alloc = StrategyControl[poolid]->numBufferAllocs;
! 		StrategyControl[poolid]->numBufferAllocs = 0;
  	}
! 	LWLockRelease(BufFreelistLock(poolid));
  	return result;
  }
  
***************
*** 298,311 **** StrategySyncStart(uint32 *complete_passes, uint32 *num_buf_alloc)
  void
  StrategyNotifyBgWriter(Latch *bgwriterLatch)
  {
! 	/*
! 	 * We acquire the BufFreelistLock just to ensure that the store appears
! 	 * atomic to StrategyGetBuffer.  The bgwriter should call this rather
! 	 * infrequently, so there's no performance penalty from being safe.
! 	 */
! 	LWLockAcquire(BufFreelistLock, LW_EXCLUSIVE);
! 	StrategyControl->bgwriterLatch = bgwriterLatch;
! 	LWLockRelease(BufFreelistLock);
  }
  
  
--- 311,329 ----
  void
  StrategyNotifyBgWriter(Latch *bgwriterLatch)
  {
! 	int poolid;
! 
! 	for (poolid = 0; poolid < NUM_MAX_BUFFER_POOLS; poolid++)
! 	{
! 		/*
! 		 * We acquire the BufFreelistLock just to ensure that the store appears
! 		 * atomic to StrategyGetBuffer.  The bgwriter should call this rather
! 		 * infrequently, so there's no performance penalty from being safe.
! 		 */
! 		LWLockAcquire(BufFreelistLock(poolid), LW_EXCLUSIVE);
! 		StrategyControl[poolid]->bgwriterLatch = bgwriterLatch;
! 		LWLockRelease(BufFreelistLock(poolid));
! 	}
  }
  
  
***************
*** 318,329 **** StrategyNotifyBgWriter(Latch *bgwriterLatch)
   * is also determined here.
   */
  Size
! StrategyShmemSize(void)
  {
  	Size		size = 0;
  
  	/* size of lookup hash table ... see comment in StrategyInitialize */
! 	size = add_size(size, BufTableShmemSize(NBuffers + NUM_BUFFER_PARTITIONS));
  
  	/* size of the shared replacement strategy control block */
  	size = add_size(size, MAXALIGN(sizeof(BufferStrategyControl)));
--- 336,347 ----
   * is also determined here.
   */
  Size
! StrategyShmemSize(void)
  {
  	Size		size = 0;
  
  	/* size of lookup hash table ... see comment in StrategyInitialize */
! 	size = add_size(size, BufTableShmemSize(NSharedBuffers + NUM_BUFFER_PARTITIONS));
  
  	/* size of the shared replacement strategy control block */
  	size = add_size(size, MAXALIGN(sizeof(BufferStrategyControl)));
***************
*** 342,394 **** void
  StrategyInitialize(bool init)
  {
  	bool		found;
  
  	/*
  	 * Initialize the shared buffer lookup hashtable.
  	 *
  	 * Since we can't tolerate running out of lookup table entries, we must be
  	 * sure to specify an adequate table size here.  The maximum steady-state
! 	 * usage is of course NBuffers entries, but BufferAlloc() tries to insert
  	 * a new entry before deleting the old.  In principle this could be
  	 * happening in each partition concurrently, so we could need as many as
! 	 * NBuffers + NUM_BUFFER_PARTITIONS entries.
  	 */
! 	InitBufTable(NBuffers + NUM_BUFFER_PARTITIONS);
  
! 	/*
! 	 * Get or create the shared strategy control block
! 	 */
! 	StrategyControl = (BufferStrategyControl *)
! 		ShmemInitStruct("Buffer Strategy Status",
! 						sizeof(BufferStrategyControl),
! 						&found);
  
! 	if (!found)
  	{
  		/*
! 		 * Only done once, usually in postmaster
  		 */
! 		Assert(init);
  
! 		/*
! 		 * Grab the whole linked list of free buffers for our strategy. We
! 		 * assume it was previously set up by InitBufferPool().
! 		 */
! 		StrategyControl->firstFreeBuffer = 0;
! 		StrategyControl->lastFreeBuffer = NBuffers - 1;
  
! 		/* Initialize the clock sweep pointer */
! 		StrategyControl->nextVictimBuffer = 0;
  
! 		/* Clear statistics */
! 		StrategyControl->completePasses = 0;
! 		StrategyControl->numBufferAllocs = 0;
  
! 		/* No pending notification */
! 		StrategyControl->bgwriterLatch = NULL;
  	}
- 	else
- 		Assert(!init);
  }
  
  
--- 360,419 ----
  StrategyInitialize(bool init)
  {
  	bool		found;
+ 	int			poolid;
  
  	/*
  	 * Initialize the shared buffer lookup hashtable.
  	 *
  	 * Since we can't tolerate running out of lookup table entries, we must be
  	 * sure to specify an adequate table size here.  The maximum steady-state
! 	 * usage is of course NSharedBuffers entries, but BufferAlloc() tries to insert
  	 * a new entry before deleting the old.  In principle this could be
  	 * happening in each partition concurrently, so we could need as many as
! 	 * NSharedBuffers + NUM_BUFFER_PARTITIONS entries.
  	 */
! 	InitBufTable(NSharedBuffers + NUM_BUFFER_PARTITIONS);
  
! 	BufferStrategyStatus[DEFAULT_BUFFER_POOL].num_buffers = NBuffers;
! 	BufferStrategyStatus[PRIORITY_BUFFER_POOL].num_buffers = NPriorityBuffers;
  
! 	for (poolid = 0; poolid < NUM_MAX_BUFFER_POOLS; poolid++)
  	{
  		/*
! 		 * Get or create the shared strategy control block
  		 */
! 		StrategyControl[poolid] = (BufferStrategyControl *)
! 			ShmemInitStruct(BufferStrategyStatus[poolid].name,
! 							sizeof(BufferStrategyControl),
! 							&found);
  
! 		if (!found)
! 		{
! 			/*
! 			 * Only done once, usually in postmaster
! 			 */
! 			Assert(init);
  
! 			/*
! 			 * Grab the whole linked list of free buffers for our strategy. We
! 			 * assume it was previously set up by InitBufferPool().
! 			 */
! 			StrategyControl[poolid]->firstFreeBuffer = (poolid == 0) ? 0 : BufferStrategyStatus[poolid - 1].num_buffers;
! 			StrategyControl[poolid]->lastFreeBuffer = BufferStrategyStatus[poolid].num_buffers - 1;
  
! 			/* Initialize the clock sweep pointer */
! 			StrategyControl[poolid]->nextVictimBuffer = (poolid == 0) ? 0 : BufferStrategyStatus[poolid - 1].num_buffers;
  
! 			/* Clear statistics */
! 			StrategyControl[poolid]->completePasses = 0;
! 			StrategyControl[poolid]->numBufferAllocs = 0;
! 
! 			/* No pending notification */
! 			StrategyControl[poolid]->bgwriterLatch = NULL;
! 		}
! 		else
! 			Assert(!init);
  	}
  }
  
  
***************
*** 438,444 **** GetAccessStrategy(BufferAccessStrategyType btype)
  	}
  
  	/* Make sure ring isn't an undue fraction of shared buffers */
! 	ring_size = Min(NBuffers / 8, ring_size);
  
  	/* Allocate the object and initialize all elements to zeroes */
  	strategy = (BufferAccessStrategy)
--- 463,469 ----
  	}
  
  	/* Make sure ring isn't an undue fraction of shared buffers */
! 	ring_size = Min(NSharedBuffers / 8, ring_size);
  
  	/* Allocate the object and initialize all elements to zeroes */
  	strategy = (BufferAccessStrategy)
*** a/src/backend/storage/lmgr/lwlock.c
--- b/src/backend/storage/lmgr/lwlock.c
***************
*** 229,235 **** NumLWLocks(void)
  	numLocks = NUM_FIXED_LWLOCKS;
  
  	/* bufmgr.c needs two for each shared buffer */
! 	numLocks += 2 * NBuffers;
  
  	/* proc.c needs one for each backend or auxiliary process */
  	numLocks += MaxBackends + NUM_AUXILIARY_PROCS;
--- 229,235 ----
  	numLocks = NUM_FIXED_LWLOCKS;
  
  	/* bufmgr.c needs two for each shared buffer */
! 	numLocks += 2 * NSharedBuffers;
  
  	/* proc.c needs one for each backend or auxiliary process */
  	numLocks += MaxBackends + NUM_AUXILIARY_PROCS;
*** a/src/backend/tcop/postgres.c
--- b/src/backend/tcop/postgres.c
***************
*** 3538,3544 **** PostgresMain(int argc, char *argv[],
  
  		MyStartTime = time(NULL);
  	}
! 
  	SetProcessingMode(InitProcessing);
  
  	/* Compute paths, if we didn't inherit them from postmaster */
--- 3538,3544 ----
  
  		MyStartTime = time(NULL);
  	}
! 	
  	SetProcessingMode(InitProcessing);
  
  	/* Compute paths, if we didn't inherit them from postmaster */
*** a/src/backend/utils/init/globals.c
--- b/src/backend/utils/init/globals.c
***************
*** 107,112 **** int			maintenance_work_mem = 16384;
--- 107,114 ----
   * register background workers.
   */
  int			NBuffers = 1000;
+ int			NPriorityBuffers = 0;
+ int			NSharedBuffers;
  int			MaxConnections = 90;
  int			max_worker_processes = 8;
  int			MaxBackends = 0;
*** a/src/backend/utils/misc/guc.c
--- b/src/backend/utils/misc/guc.c
***************
*** 1712,1717 **** static struct config_int ConfigureNamesInt[] =
--- 1712,1728 ----
  	},
  
  	{
+ 		{"priority_buffers", PGC_POSTMASTER, RESOURCES_MEM,
+ 			gettext_noop("Sets the number of priority buffers used by the server."),
+ 			NULL,
+ 			GUC_UNIT_BLOCKS
+ 		},
+ 		&NPriorityBuffers,
+ 		1024, 16, INT_MAX / 2,
+ 		NULL, NULL, NULL
+ 	},
+ 
+ 	{
  		{"temp_buffers", PGC_USERSET, RESOURCES_MEM,
  			gettext_noop("Sets the maximum number of temporary buffers used by each session."),
  			NULL,
*** a/src/backend/utils/misc/postgresql.conf.sample
--- b/src/backend/utils/misc/postgresql.conf.sample
***************
*** 114,119 ****
--- 114,121 ----
  
  #shared_buffers = 32MB			# min 128kB
  					# (change requires restart)
+ #priority_buffers = 32MB		# buffers used for priority tables
+ 					# (change requires restart)
  #huge_pages = try			# on, off, or try
  					# (change requires restart)
  #temp_buffers = 8MB			# min 800kB
*** a/src/include/miscadmin.h
--- b/src/include/miscadmin.h
***************
*** 126,132 **** do { \
  /*****************************************************************************
   *	  globals.h --															 *
   *****************************************************************************/
- 
  /*
   * from utils/init/globals.c
   */
--- 126,131 ----
***************
*** 140,146 **** extern bool ExitOnAnyError;
  
  extern PGDLLIMPORT char *DataDir;
  
! extern PGDLLIMPORT int NBuffers;
  extern int	MaxBackends;
  extern int	MaxConnections;
  extern int	max_worker_processes;
--- 139,145 ----
  
  extern PGDLLIMPORT char *DataDir;
  
! extern PGDLLIMPORT int NSharedBuffers;
  extern int	MaxBackends;
  extern int	MaxConnections;
  extern int	max_worker_processes;
*** a/src/include/storage/buf_internals.h
--- b/src/include/storage/buf_internals.h
***************
*** 23,28 ****
--- 23,31 ----
  #include "storage/spin.h"
  #include "utils/relcache.h"
  
+ #define NUM_MAX_BUFFER_POOLS 2
+ #define DEFAULT_BUFFER_POOL 0
+ #define PRIORITY_BUFFER_POOL 1
  
  /*
   * Flags for buffer descriptors
***************
*** 185,197 **** extern BufferDesc *LocalBufferDescriptors;
   */
  
  /* freelist.c */
! extern volatile BufferDesc *StrategyGetBuffer(BufferAccessStrategy strategy,
  				  bool *lock_held);
  extern void StrategyFreeBuffer(volatile BufferDesc *buf);
  extern bool StrategyRejectBuffer(BufferAccessStrategy strategy,
  					 volatile BufferDesc *buf);
  
! extern int	StrategySyncStart(uint32 *complete_passes, uint32 *num_buf_alloc);
  extern void StrategyNotifyBgWriter(Latch *bgwriterLatch);
  
  extern Size StrategyShmemSize(void);
--- 188,200 ----
   */
  
  /* freelist.c */
! extern volatile BufferDesc *StrategyGetBuffer(int poolid, BufferAccessStrategy strategy,
  				  bool *lock_held);
  extern void StrategyFreeBuffer(volatile BufferDesc *buf);
  extern bool StrategyRejectBuffer(BufferAccessStrategy strategy,
  					 volatile BufferDesc *buf);
  
! extern int	StrategySyncStart(int poolid, uint32 *complete_passes, uint32 *num_buf_alloc);
  extern void StrategyNotifyBgWriter(Latch *bgwriterLatch);
  
  extern Size StrategyShmemSize(void);
*** a/src/include/storage/bufmgr.h
--- b/src/include/storage/bufmgr.h
***************
*** 16,21 ****
--- 16,22 ----
  
  #include "storage/block.h"
  #include "storage/buf.h"
+ #include "storage/buf_internals.h"
  #include "storage/bufpage.h"
  #include "storage/relfilenode.h"
  #include "utils/relcache.h"
***************
*** 44,50 **** typedef enum
--- 45,53 ----
  } ReadBufferMode;
  
  /* in globals.c ... this duplicates miscadmin.h */
+ extern PGDLLIMPORT int NSharedBuffers;
  extern PGDLLIMPORT int NBuffers;
+ extern PGDLLIMPORT int NPriorityBuffers;
  
  /* in bufmgr.c */
  extern bool zero_damaged_pages;
***************
*** 97,103 **** extern PGDLLIMPORT int32 *LocalRefCount;
   */
  #define BufferIsValid(bufnum) \
  ( \
! 	AssertMacro((bufnum) <= NBuffers && (bufnum) >= -NLocBuffer), \
  	(bufnum) != InvalidBuffer  \
  )
  
--- 100,106 ----
   */
  #define BufferIsValid(bufnum) \
  ( \
! 	AssertMacro((bufnum) <= NSharedBuffers && (bufnum) >= -NLocBuffer), \
  	(bufnum) != InvalidBuffer  \
  )
  
*** a/src/include/storage/lwlock.h
--- b/src/include/storage/lwlock.h
***************
*** 89,133 **** extern PGDLLIMPORT LWLockPadded *MainLWLockArray;
   * if you remove a lock, consider leaving a gap in the numbering sequence for
   * the benefit of DTrace and other external debugging scripts.
   */
! #define BufFreelistLock				(&MainLWLockArray[0].lock)
! #define ShmemIndexLock				(&MainLWLockArray[1].lock)
! #define OidGenLock					(&MainLWLockArray[2].lock)
! #define XidGenLock					(&MainLWLockArray[3].lock)
! #define ProcArrayLock				(&MainLWLockArray[4].lock)
! #define SInvalReadLock				(&MainLWLockArray[5].lock)
! #define SInvalWriteLock				(&MainLWLockArray[6].lock)
! #define WALBufMappingLock			(&MainLWLockArray[7].lock)
! #define WALWriteLock				(&MainLWLockArray[8].lock)
! #define ControlFileLock				(&MainLWLockArray[9].lock)
! #define CheckpointLock				(&MainLWLockArray[10].lock)
! #define CLogControlLock				(&MainLWLockArray[11].lock)
! #define SubtransControlLock			(&MainLWLockArray[12].lock)
! #define MultiXactGenLock			(&MainLWLockArray[13].lock)
! #define MultiXactOffsetControlLock	(&MainLWLockArray[14].lock)
! #define MultiXactMemberControlLock	(&MainLWLockArray[15].lock)
! #define RelCacheInitLock			(&MainLWLockArray[16].lock)
! #define CheckpointerCommLock		(&MainLWLockArray[17].lock)
! #define TwoPhaseStateLock			(&MainLWLockArray[18].lock)
! #define TablespaceCreateLock		(&MainLWLockArray[19].lock)
! #define BtreeVacuumLock				(&MainLWLockArray[20].lock)
! #define AddinShmemInitLock			(&MainLWLockArray[21].lock)
! #define AutovacuumLock				(&MainLWLockArray[22].lock)
! #define AutovacuumScheduleLock		(&MainLWLockArray[23].lock)
! #define SyncScanLock				(&MainLWLockArray[24].lock)
! #define RelationMappingLock			(&MainLWLockArray[25].lock)
! #define AsyncCtlLock				(&MainLWLockArray[26].lock)
! #define AsyncQueueLock				(&MainLWLockArray[27].lock)
! #define SerializableXactHashLock	(&MainLWLockArray[28].lock)
! #define SerializableFinishedListLock		(&MainLWLockArray[29].lock)
! #define SerializablePredicateLockListLock	(&MainLWLockArray[30].lock)
! #define OldSerXidLock				(&MainLWLockArray[31].lock)
! #define SyncRepLock					(&MainLWLockArray[32].lock)
! #define BackgroundWorkerLock		(&MainLWLockArray[33].lock)
! #define DynamicSharedMemoryControlLock		(&MainLWLockArray[34].lock)
! #define AutoFileLock				(&MainLWLockArray[35].lock)
! #define ReplicationSlotAllocationLock	(&MainLWLockArray[36].lock)
! #define ReplicationSlotControlLock		(&MainLWLockArray[37].lock)
! #define NUM_INDIVIDUAL_LWLOCKS		38
  
  /*
   * It's a bit odd to declare NUM_BUFFER_PARTITIONS and NUM_LOCK_PARTITIONS
--- 89,135 ----
   * if you remove a lock, consider leaving a gap in the numbering sequence for
   * the benefit of DTrace and other external debugging scripts.
   */
! #define BufFreelistLock(poolid)				(&MainLWLockArray[poolid].lock)
! #define NUM_FREELIST_LOCKS	2
! 
! #define ShmemIndexLock				(&MainLWLockArray[NUM_FREELIST_LOCKS + 1].lock)
! #define OidGenLock					(&MainLWLockArray[NUM_FREELIST_LOCKS + 2].lock)
! #define XidGenLock					(&MainLWLockArray[NUM_FREELIST_LOCKS + 3].lock)
! #define ProcArrayLock				(&MainLWLockArray[NUM_FREELIST_LOCKS + 4].lock)
! #define SInvalReadLock				(&MainLWLockArray[NUM_FREELIST_LOCKS + 5].lock)
! #define SInvalWriteLock				(&MainLWLockArray[NUM_FREELIST_LOCKS + 6].lock)
! #define WALBufMappingLock			(&MainLWLockArray[NUM_FREELIST_LOCKS + 7].lock)
! #define WALWriteLock				(&MainLWLockArray[NUM_FREELIST_LOCKS + 8].lock)
! #define ControlFileLock				(&MainLWLockArray[NUM_FREELIST_LOCKS + 9].lock)
! #define CheckpointLock				(&MainLWLockArray[NUM_FREELIST_LOCKS + 10].lock)
! #define CLogControlLock				(&MainLWLockArray[NUM_FREELIST_LOCKS + 11].lock)
! #define SubtransControlLock			(&MainLWLockArray[NUM_FREELIST_LOCKS + 12].lock)
! #define MultiXactGenLock			(&MainLWLockArray[NUM_FREELIST_LOCKS + 13].lock)
! #define MultiXactOffsetControlLock	(&MainLWLockArray[NUM_FREELIST_LOCKS + 14].lock)
! #define MultiXactMemberControlLock	(&MainLWLockArray[NUM_FREELIST_LOCKS + 15].lock)
! #define RelCacheInitLock			(&MainLWLockArray[NUM_FREELIST_LOCKS + 16].lock)
! #define CheckpointerCommLock		(&MainLWLockArray[NUM_FREELIST_LOCKS + 17].lock)
! #define TwoPhaseStateLock			(&MainLWLockArray[NUM_FREELIST_LOCKS + 18].lock)
! #define TablespaceCreateLock		(&MainLWLockArray[NUM_FREELIST_LOCKS + 19].lock)
! #define BtreeVacuumLock				(&MainLWLockArray[NUM_FREELIST_LOCKS + 20].lock)
! #define AddinShmemInitLock			(&MainLWLockArray[NUM_FREELIST_LOCKS + 21].lock)
! #define AutovacuumLock				(&MainLWLockArray[NUM_FREELIST_LOCKS + 22].lock)
! #define AutovacuumScheduleLock		(&MainLWLockArray[NUM_FREELIST_LOCKS + 23].lock)
! #define SyncScanLock				(&MainLWLockArray[NUM_FREELIST_LOCKS + 24].lock)
! #define RelationMappingLock			(&MainLWLockArray[NUM_FREELIST_LOCKS + 25].lock)
! #define AsyncCtlLock				(&MainLWLockArray[NUM_FREELIST_LOCKS + 26].lock)
! #define AsyncQueueLock				(&MainLWLockArray[NUM_FREELIST_LOCKS + 27].lock)
! #define SerializableXactHashLock	(&MainLWLockArray[NUM_FREELIST_LOCKS + 28].lock)
! #define SerializableFinishedListLock		(&MainLWLockArray[NUM_FREELIST_LOCKS + 29].lock)
! #define SerializablePredicateLockListLock	(&MainLWLockArray[NUM_FREELIST_LOCKS + 30].lock)
! #define OldSerXidLock				(&MainLWLockArray[NUM_FREELIST_LOCKS + 31].lock)
! #define SyncRepLock					(&MainLWLockArray[NUM_FREELIST_LOCKS + 32].lock)
! #define BackgroundWorkerLock		(&MainLWLockArray[NUM_FREELIST_LOCKS + 33].lock)
! #define DynamicSharedMemoryControlLock		(&MainLWLockArray[NUM_FREELIST_LOCKS + 34].lock)
! #define AutoFileLock				(&MainLWLockArray[NUM_FREELIST_LOCKS + 35].lock)
! #define ReplicationSlotAllocationLock	(&MainLWLockArray[NUM_FREELIST_LOCKS + 36].lock)
! #define ReplicationSlotControlLock		(&MainLWLockArray[NUM_FREELIST_LOCKS + 37].lock)
! #define NUM_INDIVIDUAL_LWLOCKS		(NUM_FREELIST_LOCKS + 38)
  
  /*
   * It's a bit odd to declare NUM_BUFFER_PARTITIONS and NUM_LOCK_PARTITIONS
*** a/src/include/utils/rel.h
--- b/src/include/utils/rel.h
***************
*** 215,220 **** typedef struct StdRdOptions
--- 215,221 ----
  {
  	int32		vl_len_;		/* varlena header (do not touch directly!) */
  	int			fillfactor;		/* page fill factor in percent (0..100) */
+ 	int			bufferpool_offset;		/* Buffer Pool option */
  	AutoVacOpts autovacuum;		/* autovacuum-related options */
  	bool		security_barrier;		/* for views */
  	int			check_option_offset;	/* for views */
***************
*** 248,253 **** typedef struct StdRdOptions
--- 249,265 ----
  	(BLCKSZ * (100 - RelationGetFillFactor(relation, defaultff)) / 100)
  
  /*
+  * RelationIsInPriorityBufferPool
+  *		Returns whether the relation uses the priority buffer pool.
+  */
+ #define RelationIsInPriorityBufferPool(relation) \
+ 	((relation)->rd_options &&												\
+ 	 ((StdRdOptions *) (relation)->rd_options)->bufferpool_offset != 0 ?	\
+ 	 strcmp((char *) (relation)->rd_options +								\
+ 			((StdRdOptions *) (relation)->rd_options)->bufferpool_offset,	\
+ 			"priority") == 0 : false)
+ 
+ /*
   * RelationIsSecurityView
   *		Returns whether the relation is security view, or not
   */
#17 Beena Emerson
memissemerson@gmail.com
In reply to: Haribabu Kommi (#16)
Re: Priority table or Cache table

On Tue, Jun 3, 2014 at 9:50 AM, Haribabu Kommi <kommi.haribabu@gmail.com>
wrote:

> Sorry for the late reply. Thanks for the test.
> Please find the re-based patch with a temp fix for correcting the problem.
> I will submit a proper patch fix later.

Please note that the new patch still gives an assertion failure:

TRAP: FailedAssertion("!(buf->freeNext != (-2))", File: "freelist.c", Line:
178)
psql:load_test.sql:5: connection to server was lost

Hence, the patch was installed with assertions off.

I also ran the test script after making the same configuration changes that
you have specified. I found that I was not able to get the same performance
difference that you have reported.

The following table lists the tps in each scenario and the % increase in
performance.

Threads  Head  Patched  Diff
1        1669  1718      3%
2        2844  3195     12%
4        3909  4915     26%
8        7332  8329     14%

Kindly let me know if I am missing something.

--
Beena Emerson

#18 Haribabu Kommi
kommi.haribabu@gmail.com
In reply to: Beena Emerson (#17)
1 attachment(s)
Re: Priority table or Cache table

On Mon, Jun 30, 2014 at 11:08 PM, Beena Emerson <memissemerson@gmail.com> wrote:

> I also ran the test script after making the same configuration changes that
> you have specified. I found that I was not able to get the same performance
> difference that you have reported.
>
> Following table lists the tps in each scenario and the % increase in
> performance.
>
> Threads  Head  Patched  Diff
> 1        1669  1718      3%
> 2        2844  3195     12%
> 4        3909  4915     26%
> 8        7332  8329     14%

Coming back to this old thread.

I just tried a new approach for this priority table: instead of an
entirely separate buffer pool, it uses a portion of shared buffers for
priority tables, controlled by a GUC variable
"buffer_cache_ratio" (0-75) that specifies what percentage of shared
buffers may be used for them.

Syntax:

create table tbl(f1 int) with(buffer_cache=true);

Compared to the earlier approach, this one was easier to implement, but
during the performance run it didn't show much improvement.
Here are the test results.

Threads  Head   Patched  Diff
1        3123   3238     3.68%
2        5997   6261     4.40%
4        11102  11407    2.75%

I suspect this may be because of contention on the buffer locks, whereas
in the older approach of separate buffer pools each pool had its own
locks. I will try to collect the profile output and analyze it.

Any better ideas?

Here I attached a proof of concept patch.

Regards,
Hari Babu
Fujitsu Australia

Attachments:

cache_table_poc.patch (application/octet-stream)
*** a/src/backend/access/common/reloptions.c
--- b/src/backend/access/common/reloptions.c
***************
*** 85,90 **** static relopt_bool boolRelOpts[] =
--- 85,98 ----
  		},
  		false
  	},
+ 	{
+ 		{
+ 			"buffer_cache",
+ 			"Enables buffer_cache option Table/Index.",
+ 			RELOPT_KIND_HEAP | RELOPT_KIND_BTREE
+ 		},
+ 		false
+ 	},
  	/* list terminator */
  	{{NULL}}
  };
***************
*** 1230,1236 **** default_reloptions(Datum reloptions, bool validate, relopt_kind kind)
  		{"autovacuum_analyze_scale_factor", RELOPT_TYPE_REAL,
  		offsetof(StdRdOptions, autovacuum) +offsetof(AutoVacOpts, analyze_scale_factor)},
  		{"user_catalog_table", RELOPT_TYPE_BOOL,
! 		offsetof(StdRdOptions, user_catalog_table)}
  	};
  
  	options = parseRelOptions(reloptions, validate, kind, &numoptions);
--- 1238,1246 ----
  		{"autovacuum_analyze_scale_factor", RELOPT_TYPE_REAL,
  		offsetof(StdRdOptions, autovacuum) +offsetof(AutoVacOpts, analyze_scale_factor)},
  		{"user_catalog_table", RELOPT_TYPE_BOOL,
! 		offsetof(StdRdOptions, user_catalog_table)},
! 		{"buffer_cache", RELOPT_TYPE_BOOL,
! 		offsetof(StdRdOptions, buffer_cache) }
  	};
  
  	options = parseRelOptions(reloptions, validate, kind, &numoptions);
*** a/src/backend/storage/buffer/buf_init.c
--- b/src/backend/storage/buffer/buf_init.c
***************
*** 18,23 ****
--- 18,24 ----
  #include "storage/buf_internals.h"
  
  
+ BufferPoolHeader *BufferPool;
  BufferDescPadded *BufferDescriptors;
  char	   *BufferBlocks;
  
***************
*** 65,71 **** void
  InitBufferPool(void)
  {
  	bool		foundBufs,
! 				foundDescs;
  
  	/* Align descriptors to a cacheline boundary. */
  	BufferDescriptors = (BufferDescPadded *) CACHELINEALIGN(
--- 66,78 ----
  InitBufferPool(void)
  {
  	bool		foundBufs,
! 				foundDescs,
! 				foundPoolHeader;
! 
! 	/* Initialize the Buffer Pool Header data */
! 	BufferPool = (BufferPoolHeader *)
! 						ShmemInitStruct("Buffer pool",
! 						sizeof(BufferPoolHeader), &foundPoolHeader);
  
  	/* Align descriptors to a cacheline boundary. */
  	BufferDescriptors = (BufferDescPadded *) CACHELINEALIGN(
***************
*** 116,121 **** InitBufferPool(void)
--- 123,131 ----
  
  		/* Correct last entry of linked list */
  		GetBufferDescriptor(NBuffers - 1)->freeNext = FREENEXT_END_OF_LIST;
+ 
+ 		pg_atomic_init_u32(&BufferPool->current_buffer_cache_pages, 0);
+ 		pg_atomic_init_u32(&BufferPool->max_buffer_cache_pages,((NBuffers * buffer_cache_ratio) / 100));
  	}
  
  	/* Init other shared buffer-management stuff */
***************
*** 133,138 **** BufferShmemSize(void)
--- 143,151 ----
  {
  	Size		size = 0;
  
+ 	/* size of buffer Pool Header */
+ 	size = add_size(size, sizeof(BufferPoolHeader));
+ 
  	/* size of buffer descriptors */
  	size = add_size(size, mul_size(NBuffers, sizeof(BufferDescPadded)));
  	/* to allow aligning buffer descriptors */
*** a/src/backend/storage/buffer/bufmgr.c
--- b/src/backend/storage/buffer/bufmgr.c
***************
*** 1171,1182 **** BufferAlloc(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
  	 * 1 so that the buffer can survive one clock-sweep pass.)
  	 */
  	buf->tag = newTag;
! 	buf->flags &= ~(BM_VALID | BM_DIRTY | BM_JUST_DIRTIED | BM_CHECKPOINT_NEEDED | BM_IO_ERROR | BM_PERMANENT);
  	if (relpersistence == RELPERSISTENCE_PERMANENT)
  		buf->flags |= BM_TAG_VALID | BM_PERMANENT;
  	else
  		buf->flags |= BM_TAG_VALID;
  	buf->usage_count = 1;
  
  	UnlockBufHdr(buf);
  
--- 1171,1196 ----
  	 * 1 so that the buffer can survive one clock-sweep pass.)
  	 */
  	buf->tag = newTag;
! 	buf->flags &= ~(BM_VALID | BM_DIRTY | BM_JUST_DIRTIED | BM_CHECKPOINT_NEEDED | BM_IO_ERROR | BM_PERMANENT | BM_BUFFER_CACHE_PAGE);
  	if (relpersistence == RELPERSISTENCE_PERMANENT)
  		buf->flags |= BM_TAG_VALID | BM_PERMANENT;
  	else
  		buf->flags |= BM_TAG_VALID;
  	buf->usage_count = 1;
+ 	
+ 	if ((oldFlags & BM_BUFFER_CACHE_PAGE) && (smgr->is_a_buffer_cache_rel))
+ 	{
+ 		buf->flags |= BM_BUFFER_CACHE_PAGE;
+ 	}
+ 	else if (smgr->is_a_buffer_cache_rel)
+ 	{
+ 		buf->flags |= BM_BUFFER_CACHE_PAGE;
+ 		increment_buffer_cache_pages_counter();
+ 	}
+ 	else if (oldFlags & BM_BUFFER_CACHE_PAGE)
+ 	{
+ 		decrement_buffer_cache_pages_counter();
+ 	}
  
  	UnlockBufHdr(buf);
  
***************
*** 1288,1293 **** retry:
--- 1302,1310 ----
  	buf->flags = 0;
  	buf->usage_count = 0;
  
+ 	if (oldFlags & BM_BUFFER_CACHE_PAGE)
+ 		decrement_buffer_cache_pages_counter();
+ 
  	UnlockBufHdr(buf);
  
  	/*
*** a/src/backend/storage/buffer/freelist.c
--- b/src/backend/storage/buffer/freelist.c
***************
*** 101,106 **** typedef struct BufferAccessStrategyData
--- 101,129 ----
  static volatile BufferDesc *GetBufferFromRing(BufferAccessStrategy strategy);
  static void AddBufferToRing(BufferAccessStrategy strategy,
  				volatile BufferDesc *buf);
+ static bool is_buffer_cache_ratio_reached(void);
+ 
+ void
+ decrement_buffer_cache_pages_counter()
+ {
+ 	pg_atomic_fetch_sub_u32(&BufferPool->current_buffer_cache_pages, 1);
+ }
+ 
+ void
+ increment_buffer_cache_pages_counter()
+ {
+ 	pg_atomic_fetch_add_u32(&BufferPool->current_buffer_cache_pages, 1);
+ }
+ 
+ static bool
+ is_buffer_cache_ratio_reached()
+ {
+ 	uint32 current_pages = pg_atomic_read_u32(&BufferPool->current_buffer_cache_pages);
+ 	uint32 max_pages = pg_atomic_read_u32(&BufferPool->max_buffer_cache_pages);
+ 
+ 	return current_pages >= max_pages;
+ }
+ 
  
  /*
   * ClockSweepTick - Helper routine for StrategyGetBuffer()
***************
*** 305,313 **** StrategyGetBuffer(BufferAccessStrategy strategy)
  		LockBufHdr(buf);
  		if (buf->refcount == 0)
  		{
! 			if (buf->usage_count > 0)
  			{
  				buf->usage_count--;
  				trycounter = NBuffers;
  			}
  			else
--- 328,341 ----
  		LockBufHdr(buf);
  		if (buf->refcount == 0)
  		{
! 			if ((buf->flags & BM_BUFFER_CACHE_PAGE) && !is_buffer_cache_ratio_reached())
! 			{
! 				trycounter = NBuffers;
! 			}
! 			else if (buf->usage_count > 0)
  			{
  				buf->usage_count--;
+ 
  				trycounter = NBuffers;
  			}
  			else
*** a/src/backend/utils/init/globals.c
--- b/src/backend/utils/init/globals.c
***************
*** 116,121 **** int			maintenance_work_mem = 16384;
--- 116,122 ----
   * register background workers.
   */
  int			NBuffers = 1000;
+ int			buffer_cache_ratio = 0;
  int			MaxConnections = 90;
  int			max_worker_processes = 8;
  int			MaxBackends = 0;
*** a/src/backend/utils/misc/guc.c
--- b/src/backend/utils/misc/guc.c
***************
*** 1811,1816 **** static struct config_int ConfigureNamesInt[] =
--- 1811,1826 ----
  	},
  
  	{
+ 		{ "buffer_cache_ratio", PGC_POSTMASTER, RESOURCES_MEM,
+ 			gettext_noop("Sets the percentage of shared buffers that can be used for buffer cache relations."),
+ 			NULL
+ 		},
+ 		&buffer_cache_ratio,
+ 		0, 0, 75,
+ 		NULL, NULL, NULL
+ 	},
+ 
+ 	{
  		{"temp_buffers", PGC_USERSET, RESOURCES_MEM,
  			gettext_noop("Sets the maximum number of temporary buffers used by each session."),
  			NULL,
*** a/src/backend/utils/misc/postgresql.conf.sample
--- b/src/backend/utils/misc/postgresql.conf.sample
***************
*** 114,119 ****
--- 114,122 ----
  
  #shared_buffers = 32MB			# min 128kB
  					# (change requires restart)
+ #buffer_cache_ratio = 0		# specifies the ratio of shared buffers
+ 					# used for buffer cache relations
+ 					# (change requires restart)
  #huge_pages = try			# on, off, or try
  					# (change requires restart)
  #temp_buffers = 8MB			# min 800kB
*** a/src/bin/pgbench/pgbench.c
--- b/src/bin/pgbench/pgbench.c
***************
*** 1930,1938 **** init(bool is_no_vacuum)
  		}
  	};
  	static const char *const DDLINDEXes[] = {
! 		"alter table pgbench_branches add primary key (bid)",
! 		"alter table pgbench_tellers add primary key (tid)",
! 		"alter table pgbench_accounts add primary key (aid)"
  	};
  	static const char *const DDLKEYs[] = {
  		"alter table pgbench_tellers add foreign key (bid) references pgbench_branches",
--- 1930,1938 ----
  		}
  	};
  	static const char *const DDLINDEXes[] = {
! 		"alter table pgbench_branches add primary key (bid) with (buffer_cache=true)",
! 		"alter table pgbench_tellers add primary key (tid) with (buffer_cache=true)",
! 		"alter table pgbench_accounts add primary key (aid) with (buffer_cache=true)"
  	};
  	static const char *const DDLKEYs[] = {
  		"alter table pgbench_tellers add foreign key (bid) references pgbench_branches",
***************
*** 1973,1979 **** init(bool is_no_vacuum)
  		opts[0] = '\0';
  		if (ddl->declare_fillfactor)
  			snprintf(opts + strlen(opts), sizeof(opts) - strlen(opts),
! 					 " with (fillfactor=%d)", fillfactor);
  		if (tablespace != NULL)
  		{
  			char	   *escape_tablespace;
--- 1973,1983 ----
  		opts[0] = '\0';
  		if (ddl->declare_fillfactor)
  			snprintf(opts + strlen(opts), sizeof(opts) - strlen(opts),
! 					 " with (fillfactor=%d", fillfactor);
! 		
! 		snprintf(opts + strlen(opts), sizeof(opts) - strlen(opts),
! 			"%s buffer_cache=true)", ddl->declare_fillfactor ? ",":" with (");
! 		
  		if (tablespace != NULL)
  		{
  			char	   *escape_tablespace;
*** a/src/include/storage/buf_internals.h
--- b/src/include/storage/buf_internals.h
***************
*** 40,45 ****
--- 40,46 ----
  #define BM_CHECKPOINT_NEEDED	(1 << 7)		/* must write for checkpoint */
  #define BM_PERMANENT			(1 << 8)		/* permanent relation (not
  												 * unlogged) */
+ #define BM_BUFFER_CACHE_PAGE	(1 << 9)		/* Buffer used by a buffer cache rel */
  
  typedef bits16 BufFlags;
  
***************
*** 227,232 **** extern void StrategyNotifyBgWriter(int bgwprocno);
--- 228,236 ----
  extern Size StrategyShmemSize(void);
  extern void StrategyInitialize(bool init);
  
+ extern void decrement_buffer_cache_pages_counter(void);
+ extern void increment_buffer_cache_pages_counter(void);
+ 
  /* buf_table.c */
  extern Size BufTableShmemSize(int size);
  extern void InitBufTable(int size);
*** a/src/include/storage/bufmgr.h
--- b/src/include/storage/bufmgr.h
***************
*** 14,19 ****
--- 14,20 ----
  #ifndef BUFMGR_H
  #define BUFMGR_H
  
+ #include "port/atomics.h"
  #include "storage/block.h"
  #include "storage/buf.h"
  #include "storage/bufpage.h"
***************
*** 45,52 **** typedef enum
--- 46,61 ----
  								 * replay; otherwise same as RBM_NORMAL */
  } ReadBufferMode;
  
+ typedef struct BufferPoolHeader
+ {
+ 	pg_atomic_uint32	max_buffer_cache_pages;
+ 	pg_atomic_uint32	current_buffer_cache_pages;
+ } BufferPoolHeader;
+ 
  /* in globals.c ... this duplicates miscadmin.h */
  extern PGDLLIMPORT int NBuffers;
+ extern int buffer_cache_ratio;
+ extern BufferPoolHeader *BufferPool;
  
  /* in bufmgr.c */
  extern bool zero_damaged_pages;
*** a/src/include/storage/smgr.h
--- b/src/include/storage/smgr.h
***************
*** 56,61 **** typedef struct SMgrRelationData
--- 56,63 ----
  	BlockNumber smgr_fsm_nblocks;		/* last known size of fsm fork */
  	BlockNumber smgr_vm_nblocks;	/* last known size of vm fork */
  
+ 	bool		is_a_buffer_cache_rel;	/* Flag to indicate the relation buffer_cache */
+ 
  	/* additional public fields may someday exist here */
  
  	/*
*** a/src/include/utils/rel.h
--- b/src/include/utils/rel.h
***************
*** 220,225 **** typedef struct StdRdOptions
--- 220,227 ----
  	AutoVacOpts autovacuum;		/* autovacuum-related options */
  	bool		user_catalog_table;		/* use as an additional catalog
  										 * relation */
+ 	bool		buffer_cache;	/* Use buffer cache for relation 
+ 								 * if available */
  } StdRdOptions;
  
  #define HEAP_MIN_FILLFACTOR			10
***************
*** 256,261 **** typedef struct StdRdOptions
--- 258,270 ----
  	((relation)->rd_options ?				\
  	 ((StdRdOptions *) (relation)->rd_options)->user_catalog_table : false)
  
+ /*
+  * RelationUsesBufferCache
+  *		Returns the relation's buffer_cache option.
+  */
+ #define RelationUsesBufferCache(relation) \
+ 	((relation)->rd_options ?				\
+ 	 ((StdRdOptions *) (relation)->rd_options)->buffer_cache : false)
  
  /*
   * ViewOptions
***************
*** 390,395 **** typedef struct ViewOptions
--- 399,405 ----
  	do { \
  		if ((relation)->rd_smgr == NULL) \
  			smgrsetowner(&((relation)->rd_smgr), smgropen((relation)->rd_node, (relation)->rd_backend)); \
+ 		(relation)->rd_smgr->is_a_buffer_cache_rel = RelationUsesBufferCache(relation); \
  	} while (0)
  
  /*
#19Amit Kapila
amit.kapila16@gmail.com
In reply to: Haribabu Kommi (#18)
Re: Priority table or Cache table

On Thu, Aug 6, 2015 at 12:24 PM, Haribabu Kommi <kommi.haribabu@gmail.com>
wrote:

On Mon, Jun 30, 2014 at 11:08 PM, Beena Emerson <memissemerson@gmail.com>
wrote:

I also ran the test script after making the same configuration changes
that you have specified. I found that I was not able to get the same
performance difference that you have reported.

Following table lists the tps in each scenario and the % increase in
performance.

Threads Head Patched Diff
1 1669 1718 3%
2 2844 3195 12%
4 3909 4915 26%
8 7332 8329 14%

coming back to this old thread.

I just tried a new approach for this priority table: instead of an
entirely separate buffer pool, it uses some portion of shared buffers
for priority tables, controlled by a GUC variable
"buffer_cache_ratio" (0-75) that specifies what percentage of shared
buffers can be used for them.

Syntax:

create table tbl(f1 int) with(buffer_cache=true);

Compared with the earlier approach, I thought this approach would be
easier to implement. But during the performance run it didn't show much
improvement in performance.
Here are the test results.

What is the configuration for test (RAM of m/c, shared_buffers,
scale_factor, etc.)?

Threads Head Patched Diff
1 3123 3238 3.68%
2 5997 6261 4.40%
4 11102 11407 2.75%

I suspect this may be because of the buffer locks, whereas in the older
approach of separate buffer pools, each buffer pool had its own locks.
I will try to collect the profile output and analyze it.

Any better ideas?

I think you should try to find out during the test how many pages it
needs to perform the clock sweep for (add some new counter like
numBufferBackendClocksweep in BufferStrategyControl to find that out).
In theory your patch should reduce the number of times it needs to
perform the clock sweep.
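As a rough standalone illustration of that instrumentation idea (the
counter name and the whole model are hypothetical simplifications, not
actual bufmgr.c code), one could count how many buffer headers each
victim search inspects:

```c
#include <stddef.h>

#define N_BUFFERS 8

/* Toy model of a buffer descriptor for the simulation. */
typedef struct
{
	int		usage_count;	/* clock-sweep usage count */
	int		refcount;		/* pins held on the buffer */
} FakeBufferDesc;

FakeBufferDesc buffers[N_BUFFERS];		/* zero-initialized */
static int	next_victim = 0;
long		num_backend_clocksweep = 0;	/* the proposed counter */

/*
 * One simplified victim search: advance the clock hand, decrementing
 * usage counts, until an unpinned buffer with usage_count == 0 is
 * found.  Every header inspected bumps the counter.
 */
int
pick_victim(void)
{
	for (;;)
	{
		FakeBufferDesc *buf = &buffers[next_victim];

		next_victim = (next_victim + 1) % N_BUFFERS;
		num_backend_clocksweep++;

		if (buf->refcount == 0)
		{
			if (buf->usage_count > 0)
				buf->usage_count--;
			else
				return (int) (buf - buffers);
		}
	}
}
```

Comparing the counter's growth between head and the patched server over
the same pgbench run would show whether the reserved pages actually
shorten the sweeps.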

I think in this approach, even if you make some buffers non-replaceable
(buffers for which BM_BUFFER_CACHE_PAGE is set), the clock sweep still
needs to access all the buffers. I think we might want to find some way
to reduce that if this idea helps.

Another thing is that this idea looks somewhat similar (although not the
same) to the current ring buffer concept, where particular types of scan
use buffers from a ring. I think it is okay to prototype as you have done
in the patch, and we can consider doing something along those lines if
this patch's idea helps.
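A minimal model of the ring reuse pattern being referred to (simplified
far beyond PostgreSQL's actual BufferAccessStrategyData, and only for
illustration): a bulk scan keeps reusing the same small set of buffers
instead of flooding shared buffers.

```c
#define RING_SIZE 4

/* Tiny model of a buffer access strategy ring. */
typedef struct
{
	int		buffers[RING_SIZE];	/* buffer ids held by the ring */
	int		current;			/* next slot to hand out */
} StrategyRing;

void
ring_init(StrategyRing *ring)
{
	for (int i = 0; i < RING_SIZE; i++)
		ring->buffers[i] = -1;	/* slot not yet populated */
	ring->current = 0;
}

/*
 * Return the next buffer id from the ring.  An empty slot is filled
 * with a newly allocated id supplied by the caller; a populated slot
 * hands back the same buffer it held before, modeling reuse.
 */
int
ring_get_buffer(StrategyRing *ring, int new_buffer_id)
{
	int		slot = ring->current;

	ring->current = (ring->current + 1) % RING_SIZE;
	if (ring->buffers[slot] == -1)
		ring->buffers[slot] = new_buffer_id;
	return ring->buffers[slot];
}
```

The contrast with the patch is that a ring caps how much of shared
buffers a scan may use, while the patch caps how much the rest of the
system may take away from flagged relations.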

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

#20Haribabu Kommi
kommi.haribabu@gmail.com
In reply to: Amit Kapila (#19)
Re: Priority table or Cache table

On Mon, Aug 10, 2015 at 3:09 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:

On Thu, Aug 6, 2015 at 12:24 PM, Haribabu Kommi <kommi.haribabu@gmail.com>
wrote:

What is the configuration for test (RAM of m/c, shared_buffers,
scale_factor, etc.)?

Here are the details:

CPU - 16 core, RAM - 252 GB

shared_buffers - 1700MB, buffer_cache_ratio - 70
wal_buffers - 16MB, synchronous_commit - off
checkpoint_timeout - 15min, max_wal_size - 5GB.

pgbench scale factor - 75 (1GB)

Load test table size - 1GB

Threads Head Patched Diff
1 3123 3238 3.68%
2 5997 6261 4.40%
4 11102 11407 2.75%

I suspect this may be because of the buffer locks, whereas in the older
approach of separate buffer pools, each buffer pool had its own locks.
I will try to collect the profile output and analyze it.

Any better ideas?

I think you should try to find out during the test how many pages it
needs to perform the clock sweep for (add some new counter like
numBufferBackendClocksweep in BufferStrategyControl to find that out).
In theory your patch should reduce the number of times it needs to
perform the clock sweep.

I think in this approach, even if you make some buffers non-replaceable
(buffers for which BM_BUFFER_CACHE_PAGE is set), the clock sweep still
needs to access all the buffers. I think we might want to find some way
to reduce that if this idea helps.

Another thing is that this idea looks somewhat similar (although not the
same) to the current ring buffer concept, where particular types of scan
use buffers from a ring. I think it is okay to prototype as you have done
in the patch, and we can consider doing something along those lines if
this patch's idea helps.

Thanks for the details. I will try the same.

Regards,
Hari Babu
Fujitsu Australia

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#21Amit Kapila
amit.kapila16@gmail.com
In reply to: Haribabu Kommi (#20)
Re: Priority table or Cache table

On Tue, Aug 11, 2015 at 11:31 AM, Haribabu Kommi <kommi.haribabu@gmail.com>
wrote:

On Mon, Aug 10, 2015 at 3:09 PM, Amit Kapila <amit.kapila16@gmail.com>
wrote:

On Thu, Aug 6, 2015 at 12:24 PM, Haribabu Kommi <kommi.haribabu@gmail.com>
wrote:

What is the configuration for test (RAM of m/c, shared_buffers,
scale_factor, etc.)?

Here are the details:

CPU - 16 core, RAM - 252 GB

shared_buffers - 1700MB, buffer_cache_ratio - 70
wal_buffers - 16MB, synchronous_commit - off
checkpoint_timeout - 15min, max_wal_size - 5GB.

pgbench scale factor - 75 (1GB)

Load test table size - 1GB

It seems that the test table can fit easily in shared buffers; I am not
sure this patch will be of benefit in such cases. Why do you think it
can be beneficial here?

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

#22Haribabu Kommi
kommi.haribabu@gmail.com
In reply to: Amit Kapila (#21)
Re: Priority table or Cache table

On Tue, Aug 11, 2015 at 4:43 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Aug 11, 2015 at 11:31 AM, Haribabu Kommi <kommi.haribabu@gmail.com>
wrote:

On Mon, Aug 10, 2015 at 3:09 PM, Amit Kapila <amit.kapila16@gmail.com>
wrote:

On Thu, Aug 6, 2015 at 12:24 PM, Haribabu Kommi
<kommi.haribabu@gmail.com>
wrote:

What is the configuration for test (RAM of m/c, shared_buffers,
scale_factor, etc.)?

Here are the details:

CPU - 16 core, RAM - 252 GB

shared_buffers - 1700MB, buffer_cache_ratio - 70
wal_buffers - 16MB, synchronous_commit - off
checkpoint_timeout - 15min, max_wal_size - 5GB.

pgbench scale factor - 75 (1GB)

Load test table size - 1GB

It seems that the test table can fit easily in shared buffers; I am not
sure this patch will be of benefit in such cases. Why do you think it
can be beneficial here?

Yes, this configuration combination may not be the best for the test.

The idea behind these settings is to provide enough shared buffers for the
cache tables by tuning buffer_cache_ratio from 0 to 70% of shared buffers,
so that the cache tables have enough shared buffers and the rest of the
shared buffers can be used for normal tables, i.e. the load test table.
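For concreteness, the reservation implied by those settings follows the
patch's (NBuffers * buffer_cache_ratio) / 100 computation; a tiny sketch
of the arithmetic (assuming the default 8 kB page size, not part of the
patch itself):

```c
/*
 * Mirrors the patch's max_buffer_cache_pages computation:
 * (NBuffers * buffer_cache_ratio) / 100.
 */
int
max_buffer_cache_pages(int nbuffers, int ratio)
{
	return (nbuffers * ratio) / 100;
}

/*
 * shared_buffers = 1700 MB at 8 kB per page gives NBuffers = 217600.
 * With buffer_cache_ratio = 70 that reserves 152320 pages, i.e. about
 * 1190 MB for cache tables, leaving roughly 510 MB for normal tables.
 */
```

So under these settings the 1 GB load-test table competes for only the
remaining ~510 MB of shared buffers.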

I will try to evaluate some more performance tests with different shared
buffer settings and loads.

Regards,
Hari Babu
Fujitsu Australia
