Priority table or Cache table
Hi,
I want to propose a new feature called "priority table" or "cache table".
It is the same as a regular table, except that its pages have higher
priority than those of normal tables. These tables are useful when faster
query processing is expected on some particular tables.
The same faster query processing can be achieved by placing the tables in a
tablespace on a RAM disk, but that approach risks data loss on system
shutdown; avoiding the loss requires continuous backup of the tablespace
and of the WAL files. The priority table feature provides similar
functionality without these problems.
The user needs to decide carefully which tables require faster access and
declare those as priority tables; such tables should also be small, both in
number of columns and in size.
New syntax:
create [priority] Table ...;
or
Create Table .. [ buffer_pool = priority | default ];
The second form adds a new storage parameter, buffer_pool, to specify which
type of buffer pool the table uses.
The same can be extended for index also.
Solution 1:
This may not be the proper solution, but it is simple. When pages of these
tables are placed into the buffer pool, their usage count is set to double
the maximum buffer usage count, instead of 1 as for normal tables. Because
of this, there is less chance that these pages will be moved out of the
buffer pool. Queries that operate on these tables will be faster because of
less I/O. If the tables are not used for a long time, only the first query
on the table will be slower; the rest of the queries will be fast.
Just for testing, a new bool member can be added to the RelFileNode
structure to indicate whether the table is a priority table; using this,
the usage count can be modified while loading the page.
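To see how the boosted usage count would interact with the clock sweep, here is a standalone toy model (a sketch only: the structs, names, and pool size are invented for illustration and are not PostgreSQL's actual buffer manager code):

```c
#include <assert.h>

/* Toy model of solution 1: pages of a priority table enter the pool with
 * usage count 2 * BM_MAX_USAGE_COUNT instead of 1, so the clock sweep
 * takes much longer to evict them. Names and sizes are invented. */
#define NBUFFERS 8
#define BM_MAX_USAGE_COUNT 5

typedef struct
{
    int page_id;     /* page held by this buffer, -1 when empty */
    int usage_count; /* decremented as the clock hand passes */
    int is_priority; /* would come from the proposed RelFileNode flag */
} ToyBuffer;

static ToyBuffer buffers[NBUFFERS];
static int clock_hand = 0;

static void init_pool(void)
{
    int i;
    for (i = 0; i < NBUFFERS; i++)
        buffers[i].page_id = -1;
    clock_hand = 0;
}

/* Clock sweep: return a victim whose usage count reached zero,
 * decrementing counts as the hand passes over occupied buffers. */
static int find_victim(void)
{
    for (;;)
    {
        ToyBuffer *b = &buffers[clock_hand];
        int idx = clock_hand;
        clock_hand = (clock_hand + 1) % NBUFFERS;
        if (b->page_id == -1 || b->usage_count == 0)
            return idx;
        b->usage_count--;
    }
}

static void load_page(int page_id, int is_priority)
{
    int victim = find_victim();
    buffers[victim].page_id = page_id;
    buffers[victim].is_priority = is_priority;
    buffers[victim].usage_count = is_priority ? 2 * BM_MAX_USAGE_COUNT : 1;
}

static int page_in_pool(int page_id)
{
    int i;
    for (i = 0; i < NBUFFERS; i++)
        if (buffers[i].page_id == page_id)
            return 1;
    return 0;
}
```

Loading one priority page and then churning many normal pages through the small pool leaves the priority page resident, because its initial count of 10 survives several complete sweep cycles while normal pages start at 1.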
The pg_buffercache output of a priority table:
postgres=# select * from pg_buffercache where relfilenode=16385;
 bufferid | relfilenode | reltablespace | reldatabase | relforknumber | relblocknumber | isdirty | usagecount
----------+-------------+---------------+-------------+---------------+----------------+---------+------------
      270 |       16385 |          1663 |       12831 |             0 |              0 | t       |         10
Solution 2:
Keep an extra flag in the buffer header to indicate whether the buffer is
used by a priority table. Using this flag, some extra steps are taken while
replacing a buffer that belongs to a priority table:
1. Only another page of a priority table can replace this priority page.
2. A normal table page can replace it only after at least two complete
clock-sweep cycles.
In this case the priority buffers stay in memory for a long time, similar
to solution 1, but this is not always guaranteed.
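The two replacement rules above could be sketched as a small eligibility check (the struct and function names here are invented for illustration; the real change would live in the buffer replacement code):

```c
#include <assert.h>

/* Sketch of solution 2's replacement rules. Invented names, not PostgreSQL's. */
typedef struct
{
    int is_priority;          /* proposed extra flag in the buffer header */
    int sweeps_since_touched; /* complete clock-sweep cycles since last use */
} ToyBufferDesc;

static int can_replace(const ToyBufferDesc *victim, int newcomer_is_priority)
{
    if (!victim->is_priority)
        return 1; /* normal buffers follow the usual replacement rules */
    if (newcomer_is_priority)
        return 1; /* rule 1: a priority page may replace a priority page */
    /* rule 2: a normal page must wait at least two complete sweep cycles */
    return victim->sweeps_since_touched >= 2;
}
```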
Solution 3:
Create another buffer pool, called the "priority buffer pool", similar to
the shared buffer pool, to hold the priority table pages. A new GUC
parameter called "priority_buffers" can be added to get the priority buffer
pool size from the user. The maximum size of this pool can be kept small so
that it is used properly.
As an extra precaution, whenever any page has to be moved out of the
priority buffer pool, a warning is issued, so that the user can check
whether the configured priority_buffers size is too small or the priority
tables have grown more than expected.
In this case all the pages are always loaded in memory, so the queries get
faster processing.
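A minimal sketch of the overflow-warning behaviour (a toy FIFO standing in for a real replacement strategy; PRIORITY_BUFFERS stands in for the proposed priority_buffers GUC, and all names are invented):

```c
#include <assert.h>
#include <string.h>

#define PRIORITY_BUFFERS 4 /* stands in for the proposed priority_buffers GUC */

static int priority_pages[PRIORITY_BUFFERS];
static int priority_used = 0;
static int warnings_issued = 0; /* the proposal raises a warning per eviction */

/* Load a page into the priority pool; returns 1 if no eviction was needed,
 * 0 if the oldest page had to be evicted (and a warning counted). */
static int priority_pool_load(int page_id)
{
    if (priority_used < PRIORITY_BUFFERS)
    {
        priority_pages[priority_used++] = page_id;
        return 1;
    }
    /* Pool full: evict the oldest page and warn, so the DBA can tell that
     * priority_buffers is undersized or the priority tables grew too much. */
    memmove(priority_pages, priority_pages + 1,
            (PRIORITY_BUFFERS - 1) * sizeof(int));
    priority_pages[PRIORITY_BUFFERS - 1] = page_id;
    warnings_issued++;
    return 0;
}
```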
IBM DB2 has the facility of creating additional buffer pools and fixing
specific tables and indexes into them. Oracle also has a facility to
specify a buffer pool option of KEEP or RECYCLE.
I prefer syntax 2 and solution 3. Please provide your
suggestions/improvements.
Regards,
Hari Babu
Fujitsu Australia
Haribabu Kommi <kommi.haribabu@gmail.com> writes:
> I want to propose a new feature called "priority table" or "cache table".
> This is same as regular table except the pages of these tables are having
> high priority than normal tables. These tables are very useful, where a
> faster query processing on some particular tables is expected.
Why exactly does the existing LRU behavior of shared buffers not do
what you need?
I am really dubious that letting DBAs manage buffers is going to be
an improvement over automatic management.
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Thu, Feb 20, 2014 at 11:38 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Haribabu Kommi <kommi.haribabu@gmail.com> writes:
>> I want to propose a new feature called "priority table" or "cache table".
>> This is same as regular table except the pages of these tables are having
>> high priority than normal tables. These tables are very useful, where a
>> faster query processing on some particular tables is expected.
> Why exactly does the existing LRU behavior of shared buffers not do
> what you need?
Let's assume a database with 3 tables which are accessed regularly, where
the user expects faster query results on one of the tables. Because of the
LRU behavior, that does not always happen. If we separate that table's
pages into another buffer pool, all of its pages reside in memory and the
queries get faster processing.
Regards,
Hari Babu
Fujitsu Australia
On Thu, Feb 20, 2014 at 6:24 AM, Haribabu Kommi
<kommi.haribabu@gmail.com> wrote:
> On Thu, Feb 20, 2014 at 11:38 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> I want to propose a new feature called "priority table" or "cache table".
>>> This is same as regular table except the pages of these tables are having
>>> high priority than normal tables. These tables are very useful, where a
>>> faster query processing on some particular tables is expected.
>> Why exactly does the existing LRU behavior of shared buffers not do
>> what you need?
> Lets assume a database having 3 tables, which are accessed regularly. The
> user is expecting a faster query results on one table.
> Because of LRU behavior which is not happening some times.
I think this will not be a problem for regularly accessed tables (pages);
as per the current algorithm they will get more priority before getting
flushed out of the shared buffer cache.
Have you come across any case where regularly accessed pages get lower
priority than non-regularly accessed pages?
However, it might be required for cases where the user wants to control
such behaviour and pass such hints through a table-level option or some
other way, to indicate that he wants more priority for certain tables
irrespective of their usage w.r.t. other tables.
Now I think the important thing to find out here is how helpful this is for
users, or why they would want to control such behaviour when the database
already takes care of it based on the access pattern.
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Thu, Feb 20, 2014 at 2:26 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Thu, Feb 20, 2014 at 6:24 AM, Haribabu Kommi
> <kommi.haribabu@gmail.com> wrote:
>> On Thu, Feb 20, 2014 at 11:38 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>>> I want to propose a new feature called "priority table" or "cache
>>>> table".
>>>> This is same as regular table except the pages of these tables are
>>>> having
>>>> high priority than normal tables. These tables are very useful, where a
>>>> faster query processing on some particular tables is expected.
>>> Why exactly does the existing LRU behavior of shared buffers not do
>>> what you need?
>> Lets assume a database having 3 tables, which are accessed regularly. The
>> user is expecting a faster query results on one table.
>> Because of LRU behavior which is not happening some times.
> I think this will not be a problem for regularly accessed tables(pages),
> as per current algorithm they will get more priority before getting
> flushed out of shared buffer cache.
> Have you come across any such case where regularly accessed pages
> get lower priority than non-regularly accessed pages?
Because of the other regularly accessed tables, the table that expects
faster results sometimes gets delayed.
> However it might be required for cases where user wants to control
> such behaviour and pass such hints through table level option or some
> other way to indicate that he wants more priority for certain tables
> irrespective of their usage w.r.t other tables.
> Now I think here important thing to find out is how much helpful it is for
> users or why do they want to control such behaviour even when Database
> already takes care of such thing based on access pattern.
Yes, it is useful in cases where the application always expects faster
results, whether or not the table is used regularly.
Regards,
Hari Babu
Fujitsu Australia
On Thu, Feb 20, 2014 at 10:23 AM, Haribabu Kommi
<kommi.haribabu@gmail.com> wrote:
> On Thu, Feb 20, 2014 at 2:26 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>> On Thu, Feb 20, 2014 at 6:24 AM, Haribabu Kommi
>> <kommi.haribabu@gmail.com> wrote:
>>> On Thu, Feb 20, 2014 at 11:38 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>>>> I want to propose a new feature called "priority table" or "cache
>>>>> table".
>>>>> This is same as regular table except the pages of these tables are
>>>>> having
>>>>> high priority than normal tables. These tables are very useful, where a
>>>>> faster query processing on some particular tables is expected.
>>>> Why exactly does the existing LRU behavior of shared buffers not do
>>>> what you need?
>>> Lets assume a database having 3 tables, which are accessed regularly.
>>> The user is expecting a faster query results on one table.
>>> Because of LRU behavior which is not happening some times.
>> I think this will not be a problem for regularly accessed tables(pages),
>> as per current algorithm they will get more priority before getting
>> flushed out of shared buffer cache.
>> Have you come across any such case where regularly accessed pages
>> get lower priority than non-regularly accessed pages?
> Because of other regularly accessed tables, some times the table which
> expects faster results is getting delayed.
The solution involving buffer pools partitions the buffer cache into
separate pools explicitly. The way the PostgreSQL buffer manager works, for
a regular pattern of table accesses the buffer cache automatically reaches
a stable point where the number of buffers containing pages belonging to a
particular table stabilizes. Thus, at the equilibrium point for a given
access pattern, the buffer cache is automatically partitioned between the
tables, each using its share of buffers. So a solution using separate
buffer pools seems unnecessary.
PFA some scripts which I used to verify this behaviour. The scripts create
two tables, one large and the other half its size (buffer_usage_objects.sql).
The other script contains a few queries which simulate a simple table
access pattern by running select count(*) on these tables N times. The same
script contains a query on the pg_buffercache view provided by the
pg_buffercache extension; this query counts the number of buffers used by
each of these tables. So if you run three sessions in parallel, two
querying the tables and the third taking snapshots of buffer usage per
table, you will be able to see this partitioning.
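The equilibrium behaviour described here can also be demonstrated with a toy clock-sweep model (a standalone sketch with invented pool and table sizes, not the actual PostgreSQL buffer manager): two tables scanned repeatedly each end up holding a stable share of the pool with no explicit partitioning.

```c
#include <assert.h>

/* Toy clock-sweep cache: table A (pages 0..11) and table B (pages 100..105)
 * are scanned repeatedly; each settles into its own share of the buffers. */
#define POOL_SIZE 16
#define MAX_UC 5

typedef struct { int page; int uc; } ToyBuf;
static ToyBuf pool[POOL_SIZE];
static int hand = 0;

static void access_page(int page)
{
    int i;
    for (i = 0; i < POOL_SIZE; i++)
        if (pool[i].page == page)
        {
            if (pool[i].uc < MAX_UC)
                pool[i].uc++; /* cache hit: bump usage count */
            return;
        }
    for (;;) /* miss: clock sweep for a victim */
    {
        ToyBuf *b = &pool[hand];
        i = hand;
        hand = (hand + 1) % POOL_SIZE;
        if (b->page == -1 || b->uc == 0)
            break;
        b->uc--;
    }
    pool[i].page = page;
    pool[i].uc = 1;
}

/* Number of buffers currently holding pages in [lo, hi). */
static int share(int lo, int hi)
{
    int i, n = 0;
    for (i = 0; i < POOL_SIZE; i++)
        if (pool[i].page >= lo && pool[i].page < hi)
            n++;
    return n;
}

static void run_workload(void)
{
    int i, round, p;
    for (i = 0; i < POOL_SIZE; i++)
    {
        pool[i].page = -1;
        pool[i].uc = 0;
    }
    for (round = 0; round < 50; round++)
    {
        for (p = 0; p < 12; p++)    /* scan table A */
            access_page(p);
        for (p = 100; p < 106; p++) /* scan table B */
            access_page(p);
    }
}
```

After the workload, every buffer holds a page from one of the two tables and both tables retain a nonzero share, without any per-table configuration.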
>> However it might be required for cases where user wants to control
>> such behaviour and pass such hints through table level option or some
>> other way to indicate that he wants more priority for certain tables
>> irrespective of their usage w.r.t other tables.
>> Now I think here important thing to find out is how much helpful it is for
>> users or why do they want to control such behaviour even when Database
>> already takes care of such thing based on access pattern.
> Yes it is useful in cases where the application always expects the faster
> results whether the table is used regularly or not.
In such a case, it might be valuable to see whether we should play with the
maximum usage count, which is currently set to 5:
#define BM_MAX_USAGE_COUNT 5
Regards,
Hari Babu
Fujitsu Australia
--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
On Thu, Feb 20, 2014 at 10:06 PM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:
> On Thu, Feb 20, 2014 at 10:23 AM, Haribabu Kommi
> <kommi.haribabu@gmail.com> wrote:
>> On Thu, Feb 20, 2014 at 2:26 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>>> On Thu, Feb 20, 2014 at 6:24 AM, Haribabu Kommi
>>> <kommi.haribabu@gmail.com> wrote:
>>>> On Thu, Feb 20, 2014 at 11:38 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>>>>> I want to propose a new feature called "priority table" or "cache
>>>>>> table".
>>>>>> This is same as regular table except the pages of these tables are
>>>>>> having
>>>>>> high priority than normal tables. These tables are very useful, where a
>>>>>> faster query processing on some particular tables is expected.
>>>>> Why exactly does the existing LRU behavior of shared buffers not do
>>>>> what you need?
>>>> Lets assume a database having 3 tables, which are accessed regularly.
>>>> The user is expecting a faster query results on one table.
>>>> Because of LRU behavior which is not happening some times.
>>> I think this will not be a problem for regularly accessed tables(pages),
>>> as per current algorithm they will get more priority before getting
>>> flushed out of shared buffer cache.
>>> Have you come across any such case where regularly accessed pages
>>> get lower priority than non-regularly accessed pages?
>> Because of other regularly accessed tables, some times the table which
>> expects faster results is getting delayed.
> The solution involving buffer pools partitions the buffer cache in
> separate pools explicitly. The way PostgreSQL buffer manager works, for a
> regular pattern table accesses the buffer cache automatically reaches a
> stable point where the number of buffers containing pages belonging to a
> particular table starts to stabilize. Thus at an equilibrium point for
> given access pattern, the buffer cache automatically gets partitioned by
> the tables, each using its share of buffers. So, solution using buffer
> pools seems useless.
I checked some performance reports on the Oracle multiple buffer pool
concept; they show an increase in cache hit ratio compared to a single
buffer pool. Only after that did I propose this split-pool solution. I
don't know how well it really works for PostgreSQL. The performance report
on Oracle is attached to this mail.
> PFA some scripts, which I used to verify the behaviour. The scripts create
> two tables, one large and other half it's size (buffer_usage_objects.sql).
> The other script contains few queries which will simulate a simple table
> access pattern by running select count(*) on these tables N times. The same
> script contains query of pg_buffercache view provided by pg_buffercache
> extension. This query counts the number of buffers uses by either of these
> tables. So, if you run three session in parallel, two querying either of
> the tables and the third taking snapshot of buffer usage per table, you
> would be able to see this partitioning.
Thanks for the scripts. I will check them.
>>> However it might be required for cases where user wants to control
>>> such behaviour and pass such hints through table level option or some
>>> other way to indicate that he wants more priority for certain tables
>>> irrespective of their usage w.r.t other tables.
>>> Now I think here important thing to find out is how much helpful it is
>>> for users or why do they want to control such behaviour even when Database
>>> already takes care of such thing based on access pattern.
>> Yes it is useful in cases where the application always expects the faster
>> results whether the table is used regularly or not.
> In such case, it might be valuable to see if we should play with the
> maximum usage parameter, which is set to 5 currently.
> #define BM_MAX_USAGE_COUNT 5
This is the first solution I described in my first mail. Thanks, I will
look further into it.
Regards,
Hari Babu
Fujitsu Australia
Attachments:
oracle9i_buffer_pools.pdf (application/pdf)