Proposal for disk quota feature

Started by Hubert Zhangover 7 years ago16 messages
#1Hubert Zhang
hzhang@pivotal.io

Hi all,
We want to introduce disk quota feature into Postgres.

*Why disk quota*
*In a multi-tenant environment, there is a requirement to limit the disk
quota that database/schema/table can be written or a user can consume for
different organizations.*
*Meanwhile, other databases such as Oracle, Teradata, DB2 have already
supported disk quota feature.*

*Heikki has already implemented disk quota feature in Postgres as an
extension pg_quota <https://github.com/hlinnaka/pg_quota&gt;. We plan to
enhance disk quota feature based on Heikki's implementation.*

*Scope*

The scope of disk quota feature is to limit the disk usage of
database/schema/table objects and users.

Here table means heap table, ao table, index table, partition table and
associated table( toast table, visible table, large object etc.). Schema
disk quota is the disk quota of all the tables in this schema. Database
disk quota is the disk quota of all the tables in this database.

User's quota is the size of all the tables whose owner are this user.
Out of Scope: Note that spill files, xlogs, clogs and logs are not
considered for database object level disk quota at this stage.

*Design*
We propose disk quota with the following components:

1. Quota Setting Store is where the disk quota setting to be stored and
accessed. DBA or object owner uses SQL queries to configure the disk quota
for each database objects.

*2. Quota Change Detector is the monitor of size change of database objects
in Postgres. It will write change information to shared memory to notify
Quota Size Checker. The implementation of Change Detector could be hooks in
smgr_extend/smgr_unlink/smgr_truncate when modifying the size of a heap
table. The hooks will write to shared memory in a batched way to reduce the
impact on OLTP performance.3. Quota Size Checker is implemented as a worker
process. It maintains the current disk usage for each database objects and
users, and compare them with settings in Quota Setting Store. If it detects
the disk usage hit the quota redzone(either upon or below), it will notify
Quota Enforcement Operator. 4. Quota Enforcement Operator has two roles:
one is to check the disk quota permission before queries are
executed(QueryBeforeRun Check), the other is to cancel the running queries
when it reaches the disk quota limit dynamically(QueryRunning Check). Quota
Enforcement Operator uses shared memory to store the enforcement
information to guarantee a quick check. *

*To implement the right proof of concept, we want to receive feedback from
the community from the following aspects: *
Q1. User Interface: when setting a disk quota,
Option 1 is to use *insert into quota.config(user1, 10G)*
Option 2 is to use UDF *select set_quota("role","user1",10G)*
Option 3 is to use native SQL syntax *create disk quota on ROLE user1
10G, or create disk quota on SCHEMA s1 25G;*
Q2. Quota Setting Store using user table or catalog?
Option 1 is to create a schema called quota for each database and write
quota settings into quota.config table, only DBA could modify it. This
corresponds to Q1.option1
Option 2 is to store quota settings into the catalog. For Schema and
Table write them to database level catalog. For database or user, write
them to either database level or global catalog.

I personally prefer Q1's option3 and Q2's option2, since it makes disk
quota more like a native feature. We could support the quota worker process
implementation as an extension for now, but in the long term, disk quota is
like a fundamental feature of a database and should be a native feature
just like other databases. So store quota conf into catalog and supply
native syntax is better.

Open Problem
We prepare to implement Quota Size Checker as a worker process. Worker
process needs to connect to a database to build the disk usage map and
quota map(e.g. a user/shcema's disk usage on a given database) But one
worker process can only bind to one database(InitPostgres(dbname)) at the
initialization stage. It results in we need a separate worker process for
each database. The solution to this problem is not straightforward, here
are some ideas:
1 To make worker process could retrieve and cache information from all the
databases. As Tom Lane pointed out that it needs to flush all the database
specific thing, like relcache syscache etc.
2 Keep one worker process for each database. But using a parent/global
quota worker process to manage the lifecycle of database level worker
processes. It could handle the newly created database(avoid restart
database) and save resource when a database is not used. But this needs to
change worker process to be hierarchical. Postmaster becomes the grandfather
of database level worker processes in this case.

Any better ideas on it?

--
Thanks

Hubert Zhang

#2Chapman Flack
chap@anastigmatix.net
In reply to: Hubert Zhang (#1)
Re: Proposal for disk quota feature

On 08/30/2018 09:57 AM, Hubert Zhang wrote:

2 Keep one worker process for each database. But using a parent/global
quota worker process to manage the lifecycle of database level worker
processes. It could handle the newly created database(avoid restart
database) and save resource when a database is not used. But this needs to
change worker process to be hierarchical. Postmaster becomes the grandfather
of database level worker processes in this case.

I am using background workers this way in 9.5 at $work.

In my case, one worker lives forever, wakes up on a set period, and
starts a short-lived worker for every database, waiting for each
one before starting the next.

It was straightforward to implement. Looking back over the code,
I see the global worker assigns its own PID to worker.bgw_notify_pid
of each of its children, and also obtains a handle for each child
from RegisterDynamicBackgroundWorker().

I imagine the global quota worker would prefer to start workers
for every database and then just wait for notifications from any
of them, but that seems equally straightforward at first glance.

-Chap

#3Pavel Stehule
pavel.stehule@gmail.com
In reply to: Chapman Flack (#2)
Re: Proposal for disk quota feature

2018-08-30 16:22 GMT+02:00 Chapman Flack <chap@anastigmatix.net>:

On 08/30/2018 09:57 AM, Hubert Zhang wrote:

2 Keep one worker process for each database. But using a parent/global
quota worker process to manage the lifecycle of database level worker
processes. It could handle the newly created database(avoid restart
database) and save resource when a database is not used. But this needs

to

change worker process to be hierarchical. Postmaster becomes the

grandfather

of database level worker processes in this case.

I am using background workers this way in 9.5 at $work.

In my case, one worker lives forever, wakes up on a set period, and
starts a short-lived worker for every database, waiting for each
one before starting the next.

It was straightforward to implement. Looking back over the code,
I see the global worker assigns its own PID to worker.bgw_notify_pid
of each of its children, and also obtains a handle for each child
from RegisterDynamicBackgroundWorker().

I imagine the global quota worker would prefer to start workers
for every database and then just wait for notifications from any
of them, but that seems equally straightforward at first glance.

There are servers with thousands databases. Worker per database is not good
idea.

It should to share ideas, code with autovacuum process.

Not sure, how to effective implementation based on bg workers can be. On
servers with large set of databases, large set of tables it can identify
too big table too late.

Isn't better to implement some quotas on storage level?

Regards

Pavel

Show quoted text

-Chap

#4Hubert Zhang
hzhang@pivotal.io
In reply to: Pavel Stehule (#3)
Re: Proposal for disk quota feature

Thanks Chapman.
@Pavel, could you please explain more about your second suggestion "implement
some quotas on storage level?"
We will not keep the long-lived processes attach to all databases(just like
you mentioned servers with thousands of databases)
And you are right, we could share ideas with autovacuum process, fork
worker processes in need.
"autovacuum checks for tables that have had a large number of inserted,
updated or deleted tuples. These checks use the statistics collection
facility"
diskquota process is similar to autovacuum at caring about insert, but the
difference is that it also care about vucuum full, truncate and drop. While
update and delete may not be interested since no file change happens. So a
separate diskquota process is preferred.

So if we implemented disk quota as a full native feature, and in the first
initial version I prefer to implement the following features:
1 Fork diskquota launcher process under Postmaster serverloop, which is
long-lived.
2 Diskquota launcher process is responsible for creating diskquota worker
process for every database.
3 DIskquota setting is stored in a separate catalog table for each database.
4 Initialization stage, Diskquota launcher process creates diskquota worker
process for all the databases(traverse like autovacuum). Worker process
calculates disk usage of db objects and their diskquota setting. If any
db object exceeds its quota limit, put them into the blacklist in the
shared memory, which will later be used by enforcement operator. Worker
process exits when works are done.
5 Running stage, Diskquota launcher process creates diskquota worker
process for the database with a large number of insert, copy, truncate,
drop etc. or create disk quota statement. Worker process updates the file
size for db objects containing the result relation, and compare with the
diskquota setting. Again, if exceeds quota limit, put them into blacklist,
remove from blacklist vice versa. Worker process exits when works are done
and a GUC could control the frequency of worker process restart to a
specific database. As you know, this GUC also controls the delay when we do
enforcement.
6 Enforcement. When postgres backend executes queries, check the blacklist
in shared memory to determine whether the query is allowed(before execute)
or need rollback(is executing)?

If we implemented disk quota as an extension, we could just use background
worker to start diskquota launcher process and use
RegisterDynamicBackgroundWorker() to fork child diskquota worker processes
by the launcher process as suggested by @Chapman. Diskquota setting could
be stored in user table in a separate schema for each database(Schema and
table created by create extension statement) just like what Heikki has done
in pg_quota project. But in this case, we need to create extension for each
database before diskquota worker process can be set up for that database.

Any comments on the above design and which is preferred, native feature or
extension as the POC?

-- Hubert

On Fri, Aug 31, 2018 at 3:32 AM, Pavel Stehule <pavel.stehule@gmail.com>
wrote:

2018-08-30 16:22 GMT+02:00 Chapman Flack <chap@anastigmatix.net>:

On 08/30/2018 09:57 AM, Hubert Zhang wrote:

2 Keep one worker process for each database. But using a parent/global
quota worker process to manage the lifecycle of database level worker
processes. It could handle the newly created database(avoid restart
database) and save resource when a database is not used. But this needs

to

change worker process to be hierarchical. Postmaster becomes the

grandfather

of database level worker processes in this case.

I am using background workers this way in 9.5 at $work.

In my case, one worker lives forever, wakes up on a set period, and
starts a short-lived worker for every database, waiting for each
one before starting the next.

It was straightforward to implement. Looking back over the code,
I see the global worker assigns its own PID to worker.bgw_notify_pid
of each of its children, and also obtains a handle for each child
from RegisterDynamicBackgroundWorker().

I imagine the global quota worker would prefer to start workers
for every database and then just wait for notifications from any
of them, but that seems equally straightforward at first glance.

There are servers with thousands databases. Worker per database is not
good idea.

It should to share ideas, code with autovacuum process.

Not sure, how to effective implementation based on bg workers can be. On
servers with large set of databases, large set of tables it can identify
too big table too late.

Isn't better to implement some quotas on storage level?

Regards

Pavel

-Chap

--
Thanks

Hubert Zhang

#5Pavel Stehule
pavel.stehule@gmail.com
In reply to: Hubert Zhang (#4)
1 attachment(s)
Re: Proposal for disk quota feature

Hi

2018-09-02 14:18 GMT+02:00 Hubert Zhang <hzhang@pivotal.io>:

Thanks Chapman.
@Pavel, could you please explain more about your second suggestion "implement
some quotas on storage level?"

See attached patch - it is very simple - and good enough for our purposes.

Regards

Pavel

Show quoted text

We will not keep the long-lived processes attach to all databases(just
like you mentioned servers with thousands of databases)
And you are right, we could share ideas with autovacuum process, fork
worker processes in need.
"autovacuum checks for tables that have had a large number of inserted,
updated or deleted tuples. These checks use the statistics collection
facility"
diskquota process is similar to autovacuum at caring about insert, but the
difference is that it also care about vucuum full, truncate and drop. While
update and delete may not be interested since no file change happens. So a
separate diskquota process is preferred.

So if we implemented disk quota as a full native feature, and in the
first initial version I prefer to implement the following features:
1 Fork diskquota launcher process under Postmaster serverloop, which is
long-lived.
2 Diskquota launcher process is responsible for creating diskquota worker
process for every database.
3 DIskquota setting is stored in a separate catalog table for each
database.
4 Initialization stage, Diskquota launcher process creates diskquota worker
process for all the databases(traverse like autovacuum). Worker process
calculates disk usage of db objects and their diskquota setting. If any
db object exceeds its quota limit, put them into the blacklist in the
shared memory, which will later be used by enforcement operator. Worker
process exits when works are done.
5 Running stage, Diskquota launcher process creates diskquota worker
process for the database with a large number of insert, copy, truncate,
drop etc. or create disk quota statement. Worker process updates the file
size for db objects containing the result relation, and compare with the
diskquota setting. Again, if exceeds quota limit, put them into blacklist,
remove from blacklist vice versa. Worker process exits when works are
done and a GUC could control the frequency of worker process restart to a
specific database. As you know, this GUC also controls the delay when we do
enforcement.
6 Enforcement. When postgres backend executes queries, check the blacklist
in shared memory to determine whether the query is allowed(before execute)
or need rollback(is executing)?

If we implemented disk quota as an extension, we could just use background
worker to start diskquota launcher process and use
RegisterDynamicBackgroundWorker() to fork child diskquota worker
processes by the launcher process as suggested by @Chapman.
Diskquota setting could be stored in user table in a separate schema for
each database(Schema and table created by create extension statement) just
like what Heikki has done in pg_quota project. But in this case, we need to
create extension for each database before diskquota worker process can be
set up for that database.

Any comments on the above design and which is preferred, native feature or
extension as the POC?

-- Hubert

On Fri, Aug 31, 2018 at 3:32 AM, Pavel Stehule <pavel.stehule@gmail.com>
wrote:

2018-08-30 16:22 GMT+02:00 Chapman Flack <chap@anastigmatix.net>:

On 08/30/2018 09:57 AM, Hubert Zhang wrote:

2 Keep one worker process for each database. But using a parent/global
quota worker process to manage the lifecycle of database level worker
processes. It could handle the newly created database(avoid restart
database) and save resource when a database is not used. But this

needs to

change worker process to be hierarchical. Postmaster becomes the

grandfather

of database level worker processes in this case.

I am using background workers this way in 9.5 at $work.

In my case, one worker lives forever, wakes up on a set period, and
starts a short-lived worker for every database, waiting for each
one before starting the next.

It was straightforward to implement. Looking back over the code,
I see the global worker assigns its own PID to worker.bgw_notify_pid
of each of its children, and also obtains a handle for each child
from RegisterDynamicBackgroundWorker().

I imagine the global quota worker would prefer to start workers
for every database and then just wait for notifications from any
of them, but that seems equally straightforward at first glance.

There are servers with thousands databases. Worker per database is not
good idea.

It should to share ideas, code with autovacuum process.

Not sure, how to effective implementation based on bg workers can be. On
servers with large set of databases, large set of tables it can identify
too big table too late.

Isn't better to implement some quotas on storage level?

Regards

Pavel

-Chap

--
Thanks

Hubert Zhang

Attachments:

relation_limit.patchtext/x-patch; charset=US-ASCII; name=relation_limit.patchDownload
diff --git a/src/backend/storage/smgr/md.c b/src/backend/storage/smgr/md.c
index bb96881cad..619163e237 100644
--- a/src/backend/storage/smgr/md.c
+++ b/src/backend/storage/smgr/md.c
@@ -35,6 +35,7 @@
 #include "storage/bufmgr.h"
 #include "storage/relfilenode.h"
 #include "storage/smgr.h"
+#include "utils/guc.h"
 #include "utils/hsearch.h"
 #include "utils/memutils.h"
 #include "pg_trace.h"
@@ -198,6 +199,11 @@ static MdfdVec *_mdfd_getseg(SMgrRelation reln, ForkNumber forkno,
 static BlockNumber _mdnblocks(SMgrRelation reln, ForkNumber forknum,
 		   MdfdVec *seg);
 
+/*
+ *  limits for relations size
+ */
+int max_blocks;
+int warn_blocks;
 
 /*
  *	mdinit() -- Initialize private state for magnetic disk storage manager.
@@ -504,6 +510,14 @@ mdextend(SMgrRelation reln, ForkNumber forknum, BlockNumber blocknum,
 	Assert(blocknum >= mdnblocks(reln, forknum));
 #endif
 
+	if (max_blocks != -1 && blocknum > (BlockNumber) max_blocks)
+		ereport(ERROR,
+				(errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
+				 errmsg("cannot extend file beyond %u blocks",
+						max_blocks),
+				 errhint("Session file limit defined by \"hard_relation_limit\" (%s) is over.",
+						GetConfigOptionByName("hard_relation_limit", NULL, false))));
+
 	/*
 	 * If a relation manages to grow to 2^32-1 blocks, refuse to extend it any
 	 * more --- we mustn't create a block whose number actually is
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 9b533d52c8..ecc7906af2 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -161,6 +161,12 @@ static int	syslog_facility = LOG_LOCAL0;
 static int	syslog_facility = 0;
 #endif
 
+/*
+ *  limits for relations size
+ */
+extern int max_blocks;
+extern int warn_blocks;
+
 static void assign_syslog_facility(int newval, void *extra);
 static void assign_syslog_ident(const char *newval, void *extra);
 static void assign_session_replication_role(int newval, void *extra);
@@ -2910,6 +2916,17 @@ static struct config_int ConfigureNamesInt[] =
 		NULL, NULL, NULL
 	},
 
+	{
+		{"hard_relation_limit", PGC_USERSET, RESOURCES,
+			gettext_noop("Sets maximum number of blocks in relation."),
+			gettext_noop("The default is -1 (turning this feature off)."),
+			GUC_UNIT_BLOCKS | GUC_NOT_IN_SAMPLE
+		},
+		&max_blocks,
+		-1, -1, INT_MAX,
+		NULL, NULL, NULL
+	},
+
 	{
 		{"track_activity_query_size", PGC_POSTMASTER, RESOURCES_MEM,
 			gettext_noop("Sets the size reserved for pg_stat_activity.query, in bytes."),
diff --git a/src/test/regress/expected/guc.out b/src/test/regress/expected/guc.out
index 43ac5f5f11..6a3e090c9d 100644
--- a/src/test/regress/expected/guc.out
+++ b/src/test/regress/expected/guc.out
@@ -751,6 +751,12 @@ select current_setting('nosuch.setting', true);
  nada
 (1 row)
 
+-- set limits for tables
+set hard_relation_limit = '1MB';
+create table test_limit_hard as select 1 a, 2 b, 3 c, 4 d from generate_series(1,100000);
+ERROR:  cannot extend file beyond 128 blocks
+HINT:  Session file limit defined by "hard_relation_limit" (1MB) is over.
+set hard_relation_limit to default;
 -- Normally, CREATE FUNCTION should complain about invalid values in
 -- function SET options; but not if check_function_bodies is off,
 -- because that creates ordering hazards for pg_dump
diff --git a/src/test/regress/sql/guc.sql b/src/test/regress/sql/guc.sql
index 23e5029780..bee38c013d 100644
--- a/src/test/regress/sql/guc.sql
+++ b/src/test/regress/sql/guc.sql
@@ -271,6 +271,15 @@ select current_setting('nosuch.setting');
 select current_setting('nosuch.setting', false);
 select current_setting('nosuch.setting', true);
 
+
+-- set limits for tables
+set hard_relation_limit = '1MB';
+
+create table test_limit_hard as select 1 a, 2 b, 3 c, 4 d from generate_series(1,100000);
+
+set hard_relation_limit to default;
+
+
 -- Normally, CREATE FUNCTION should complain about invalid values in
 -- function SET options; but not if check_function_bodies is off,
 -- because that creates ordering hazards for pg_dump
#6Hubert Zhang
hzhang@pivotal.io
In reply to: Pavel Stehule (#5)
Re: Proposal for disk quota feature

Thanks Pavel.
Your patch did enforcement on storage level(md.c or we could also use
smgr_extend). It's straight forward.
But I prefer to implement disk_quota as a feature with following objectives:
1 set/alter disk quota setting on different database objects, e.g. user,
database, schema etc. not only a general GUC, but we could set separate
quota limit for a specific objects.
2 enforcement operator should work at two positions: before query is
running and when query is running. The latter one's implementation maybe
similar to your patch.

On Sun, Sep 2, 2018 at 8:44 PM, Pavel Stehule <pavel.stehule@gmail.com>
wrote:

Hi

2018-09-02 14:18 GMT+02:00 Hubert Zhang <hzhang@pivotal.io>:

Thanks Chapman.
@Pavel, could you please explain more about your second suggestion "implement
some quotas on storage level?"

See attached patch - it is very simple - and good enough for our purposes.

Regards

Pavel

We will not keep the long-lived processes attach to all databases(just
like you mentioned servers with thousands of databases)
And you are right, we could share ideas with autovacuum process, fork
worker processes in need.
"autovacuum checks for tables that have had a large number of inserted,
updated or deleted tuples. These checks use the statistics collection
facility"
diskquota process is similar to autovacuum at caring about insert, but
the difference is that it also care about vucuum full, truncate and drop.
While update and delete may not be interested since no file change happens.
So a separate diskquota process is preferred.

So if we implemented disk quota as a full native feature, and in the
first initial version I prefer to implement the following features:
1 Fork diskquota launcher process under Postmaster serverloop, which is
long-lived.
2 Diskquota launcher process is responsible for creating diskquota
worker process for every database.
3 DIskquota setting is stored in a separate catalog table for each
database.
4 Initialization stage, Diskquota launcher process creates diskquota worker
process for all the databases(traverse like autovacuum). Worker process
calculates disk usage of db objects and their diskquota setting. If any
db object exceeds its quota limit, put them into the blacklist in the
shared memory, which will later be used by enforcement operator. Worker
process exits when works are done.
5 Running stage, Diskquota launcher process creates diskquota worker
process for the database with a large number of insert, copy, truncate,
drop etc. or create disk quota statement. Worker process updates the file
size for db objects containing the result relation, and compare with the
diskquota setting. Again, if exceeds quota limit, put them into blacklist,
remove from blacklist vice versa. Worker process exits when works are
done and a GUC could control the frequency of worker process restart to a
specific database. As you know, this GUC also controls the delay when we do
enforcement.
6 Enforcement. When postgres backend executes queries, check the
blacklist in shared memory to determine whether the query is allowed(before
execute) or need rollback(is executing)?

If we implemented disk quota as an extension, we could just use
background worker to start diskquota launcher process and use
RegisterDynamicBackgroundWorker() to fork child diskquota worker
processes by the launcher process as suggested by @Chapman.
Diskquota setting could be stored in user table in a separate schema for
each database(Schema and table created by create extension statement) just
like what Heikki has done in pg_quota project. But in this case, we need to
create extension for each database before diskquota worker process can be
set up for that database.

Any comments on the above design and which is preferred, native feature
or extension as the POC?

-- Hubert

On Fri, Aug 31, 2018 at 3:32 AM, Pavel Stehule <pavel.stehule@gmail.com>
wrote:

2018-08-30 16:22 GMT+02:00 Chapman Flack <chap@anastigmatix.net>:

On 08/30/2018 09:57 AM, Hubert Zhang wrote:

2 Keep one worker process for each database. But using a parent/global
quota worker process to manage the lifecycle of database level worker
processes. It could handle the newly created database(avoid restart
database) and save resource when a database is not used. But this

needs to

change worker process to be hierarchical. Postmaster becomes the

grandfather

of database level worker processes in this case.

I am using background workers this way in 9.5 at $work.

In my case, one worker lives forever, wakes up on a set period, and
starts a short-lived worker for every database, waiting for each
one before starting the next.

It was straightforward to implement. Looking back over the code,
I see the global worker assigns its own PID to worker.bgw_notify_pid
of each of its children, and also obtains a handle for each child
from RegisterDynamicBackgroundWorker().

I imagine the global quota worker would prefer to start workers
for every database and then just wait for notifications from any
of them, but that seems equally straightforward at first glance.

There are servers with thousands databases. Worker per database is not
good idea.

It should to share ideas, code with autovacuum process.

Not sure, how to effective implementation based on bg workers can be. On
servers with large set of databases, large set of tables it can identify
too big table too late.

Isn't better to implement some quotas on storage level?

Regards

Pavel

-Chap

--
Thanks

Hubert Zhang

--
Thanks

Hubert Zhang

#7Pavel Stehule
pavel.stehule@gmail.com
In reply to: Hubert Zhang (#6)
Re: Proposal for disk quota feature

2018-09-03 3:49 GMT+02:00 Hubert Zhang <hzhang@pivotal.io>:

Thanks Pavel.
Your patch did enforcement on storage level(md.c or we could also use
smgr_extend). It's straight forward.
But I prefer to implement disk_quota as a feature with following
objectives:
1 set/alter disk quota setting on different database objects, e.g. user,
database, schema etc. not only a general GUC, but we could set separate
quota limit for a specific objects.
2 enforcement operator should work at two positions: before query is
running and when query is running. The latter one's implementation maybe
similar to your patch.

The patch was just example. The resource quotes should be more complex -
per partition, table, schema, database, user - so GUC are possible, but not
very user friendly.

Our case is specific, but not too much. The servers are used for
multidimensional analyses - and some tables can grow too fast (COPY, INSERT
SELECT). We need to solve limits immediately. The implementation is simple,
so I did it. Same implementation on database level, or schema level needs
some more locks, so it will not be too effective. The resource management
can be complex very complex, and I expect so it will be hard work.

Regards

Pavel

Show quoted text

On Sun, Sep 2, 2018 at 8:44 PM, Pavel Stehule <pavel.stehule@gmail.com>
wrote:

Hi

2018-09-02 14:18 GMT+02:00 Hubert Zhang <hzhang@pivotal.io>:

Thanks Chapman.
@Pavel, could you please explain more about your second suggestion "implement
some quotas on storage level?"

See attached patch - it is very simple - and good enough for our
purposes.

Regards

Pavel

We will not keep the long-lived processes attach to all databases(just
like you mentioned servers with thousands of databases)
And you are right, we could share ideas with autovacuum process, fork
worker processes in need.
"autovacuum checks for tables that have had a large number of inserted,
updated or deleted tuples. These checks use the statistics collection
facility"
diskquota process is similar to autovacuum at caring about insert, but
the difference is that it also care about vucuum full, truncate and drop.
While update and delete may not be interested since no file change happens.
So a separate diskquota process is preferred.

So if we implemented disk quota as a full native feature, and in the
first initial version I prefer to implement the following features:
1 Fork diskquota launcher process under Postmaster serverloop, which is
long-lived.
2 Diskquota launcher process is responsible for creating diskquota
worker process for every database.
3 DIskquota setting is stored in a separate catalog table for each
database.
4 Initialization stage, Diskquota launcher process creates diskquota worker
process for all the databases(traverse like autovacuum). Worker process
calculates disk usage of db objects and their diskquota setting. If any
db object exceeds its quota limit, put them into the blacklist in the
shared memory, which will later be used by enforcement operator. Worker
process exits when works are done.
5 Running stage, Diskquota launcher process creates diskquota worker
process for the database with a large number of insert, copy, truncate,
drop etc. or create disk quota statement. Worker process updates the file
size for db objects containing the result relation, and compare with the
diskquota setting. Again, if exceeds quota limit, put them into blacklist,
remove from blacklist vice versa. Worker process exits when works are
done and a GUC could control the frequency of worker process restart to a
specific database. As you know, this GUC also controls the delay when we do
enforcement.
6 Enforcement. When postgres backend executes queries, check the
blacklist in shared memory to determine whether the query is allowed(before
execute) or need rollback(is executing)?

If we implemented disk quota as an extension, we could just use
background worker to start diskquota launcher process and use
RegisterDynamicBackgroundWorker() to fork child diskquota worker
processes by the launcher process as suggested by @Chapman.
Diskquota setting could be stored in user table in a separate schema for
each database(Schema and table created by create extension statement) just
like what Heikki has done in pg_quota project. But in this case, we need to
create extension for each database before diskquota worker process can be
set up for that database.

Any comments on the above design and which is preferred, native feature
or extension as the POC?

-- Hubert

On Fri, Aug 31, 2018 at 3:32 AM, Pavel Stehule <pavel.stehule@gmail.com>
wrote:

2018-08-30 16:22 GMT+02:00 Chapman Flack <chap@anastigmatix.net>:

On 08/30/2018 09:57 AM, Hubert Zhang wrote:

2 Keep one worker process for each database. But using a

parent/global

quota worker process to manage the lifecycle of database level worker
processes. It could handle the newly created database(avoid restart
database) and save resource when a database is not used. But this

needs to

change worker process to be hierarchical. Postmaster becomes the

grandfather

of database level worker processes in this case.

I am using background workers this way in 9.5 at $work.

In my case, one worker lives forever, wakes up on a set period, and
starts a short-lived worker for every database, waiting for each
one before starting the next.

It was straightforward to implement. Looking back over the code,
I see the global worker assigns its own PID to worker.bgw_notify_pid
of each of its children, and also obtains a handle for each child
from RegisterDynamicBackgroundWorker().

I imagine the global quota worker would prefer to start workers
for every database and then just wait for notifications from any
of them, but that seems equally straightforward at first glance.

There are servers with thousands databases. Worker per database is not
good idea.

It should to share ideas, code with autovacuum process.

Not sure, how to effective implementation based on bg workers can be.
On servers with large set of databases, large set of tables it can identify
too big table too late.

Isn't better to implement some quotas on storage level?

Regards

Pavel

-Chap

--
Thanks

Hubert Zhang

--
Thanks

Hubert Zhang

#8Hubert Zhang
hzhang@pivotal.io
In reply to: Pavel Stehule (#7)
Re: Proposal for disk quota feature

*Hi all,We redesign disk quota feature based on the comments from Pavel
Stehule and Chapman Flack. Here are the new design.OverviewBasically, disk
quota feature is used to support multi-tenancy environment, different level
of database objects could be set a quota limit to avoid over use of disk
space. A common case could be as follows: DBA could enable disk quota on a
specified database list. DBA could set disk quota limit for
tables/schemas/roles in these databases. Separate disk quota worker process
will monitor the disk usage for these objects and detect the objects which
exceed their quota limit. Queries loading data into these “out of disk
quota” tables/schemas/roles will be cancelled.We are currently working at
init implementation stage. We would like to propose our idea firstly and
get feedbacks from community to do quick iteration.SQL Syntax (How to use
disk quota)1 Specify the databases with disk quota enabled in GUC
“diskquota_databases” in postgresql.conf and restart the database.2 DBA
could set disk quota limit for table/schema/role.CREATE DISK QUOTA tablea1
ON TABLE a1 with (quota = ‘1MB’);CREATE DISK QUOTA roleu1 ON USER u1 with
(quota = ‘1GB’);CREATE DISK QUOTA schemas1 ON SCHEMA s1 with (quota =
‘3MB’);3 Simulate a schema out of quota limit case: suppose table a1 and
table a2 are both under schema s1.INSERT INTO a1 SELECT
generate_series(1,1000);INSERT INTO a2 SELECT
generate_series(1,3000000);SELECT pg_sleep(5)INSERT INTO a1 SELECT
generate_series(1,1000);ERROR: schema's disk space quota exceededDROP
TABLE a2;SELECT pg_sleep(5)INSERT INTO a1 SELECT
generate_series(1,1000);INSERT 0 1000ArchitectureDisk quota has the
following components.1. Quota Setting Store is where the disk quota setting
to be stored and accessed. We plan to use catalog table pg_diskquota to
store these information. pg_diskquota is
like:CATALOG(pg_diskquota,6122,DiskQuotaRelationId){ NameData quotaname; /*
diskquota name */ int16 quotatype; /* diskquota type name */ Oid
quotatargetoid; /* diskquota target db object oid*/ int32 quotalimit; /*
diskquota size limit in MB*/ int32 quotaredzone; /* diskquota redzone in
MB*/} FormData_pg_diskquota;2. Quota Change Detector is the monitor of size
change of database objects. We plan to use stat collector to detect the
‘active’ table list at initial stage. But stat collector has some
limitation on finding the active table which is in a running transaction.
Details see TODO section.3. Quota Size Checker is where to calculate the
size and compare with quota limit for database objects. According to
Pavel’s comment, autovacuum launcher and worker process could be a good
reference to disk quota. So we plan to use a disk quota launcher daemon
process and several disk quota worker process to finish this work. Launcher
process is responsible for starting worker process based on a user defined
database list from GUC. Worker process will connect to its target database
and monitor the disk usage for objects in this database. In init stage of
worker process, it will call calculate_total_relation_size() to calculate
the size for each user table. After init stage, worker process will refresh
the disk model every N seconds. Refreshing will only recalculate the size
of tables in ‘active’ table list, which is generated by Quata Change
Detector to minimize the cost.4. Quota Enforcement Operator is where to
check for the quota limitation at postgres backend side. We will firstly
implement it in ExecCheckRTPerms() as pre-running enforcement. It will
check the disk quota of tables being inserted or updated, and report error
if table’s or table’s schema’s or table’s owner’s quota limit is exceeded.
As a native feature, we plan to add more checkpoint to do running query
enforcement. For example, if a disk quota lefts 10MB quota, a query could
insert 1GB data. This query could be allowed in pre-running enforcement
check, but will be cancelled in running query enforcement check. Therefore,
it can improve the accurate of disk quota usage. To achieve this, we plan
to add a checkpoint in lower API such as smgr_extened. Hence, the Quota
Enforcement Operator will check the disk quota usage when smgr_extened is
called. If the quota is over limited, current query will be cancelled.
Highlight1. Native feature.Support native Create/Drop Disk Quota SQL
statement.New catalog table pg_diskquota to store disk quota setting.2.
Auto DML/DDL detection. Table
create/update/insert/delete/vacuum/truncate/drop/schema_change/owner_change,
Schema create/drop and Role create/drop will be detected by disk quota
automatically. 3. Low cost disk quota checker.Worker process of disk quota
need to refresh the disk usage model every N seconds. Since recalculate the
file size using stat() system call is expensive for a large number of
files, we use an ‘active’ table list to reduce the real work at each
iteration. A basic experiment on our init stage implementation on database
with 20K tables shows that the refresh cost is 1% cpu usage and will be
finished within 50ms. Todo/LimitationBefore we propose our patch, we plan
to enhance it with the following ideas:1. Setting database list with disk
quota enabled dynamically without restart database. Since we have the disk
quota launcher process, it could detect the new ‘diskquota_databases’ list
and start/stop the corresponding disk quota worker process.2. Enforcement
when query is running. Considering the case when there is 10MB quota left,
but next query will insert 10GB data. Current enforcement design will allow
this query to be executed. This is limited by the ‘active’ table detection
is generated by stat collector. Postgres backend will only send table stat
information to collector only when the transaction ends. We need a new way
to detect the ‘active’ table even when this table is being modified inside
a running transaction.3. Monitor unlimited number of databases. Current we
set the max number of disk quota worker process to be 10 to reduce the
affection normal workload. But how about if we want to monitor the disk
quota of more than 10 databases? Our solution is to let disk quota launcher
to manage a queue of database need to be monitored. And disk quota worker
process consuming the queue and refresh the disk usage/quota for this
database. After some periods, worker will return the database to the queue,
and fetch the top database from queue to process. The period determine the
delay of detecting disk quota change. To implement this feature, we need to
support a subprocess of postmaster to rebind to another database instead of
the database binded in InitPostgres().4. Support active table detection on
vacuum full and vacuum analyze. Currently vacuum full and vacuum analyze
are not tracked by stat collector.Thanks to Heikki, Pavel Stehule,Chapman
Flack for the former comments on disk quota feature. Any comments on how to
improve disk quota feature are appreciated.*

On Mon, Sep 3, 2018 at 12:05 PM, Pavel Stehule <pavel.stehule@gmail.com>
wrote:

2018-09-03 3:49 GMT+02:00 Hubert Zhang <hzhang@pivotal.io>:

Thanks Pavel.
Your patch did enforcement on storage level(md.c or we could also use
smgr_extend). It's straight forward.
But I prefer to implement disk_quota as a feature with following
objectives:
1 set/alter disk quota setting on different database objects, e.g. user,
database, schema etc. not only a general GUC, but we could set separate
quota limit for a specific objects.
2 enforcement operator should work at two positions: before query is
running and when query is running. The latter one's implementation maybe
similar to your patch.

The patch was just example. The resource quotes should be more complex -
per partition, table, schema, database, user - so GUC are possible, but not
very user friendly.

Our case is specific, but not too much. The servers are used for
multidimensional analyses - and some tables can grow too fast (COPY, INSERT
SELECT). We need to solve limits immediately. The implementation is simple,
so I did it. Same implementation on database level, or schema level needs
some more locks, so it will not be too effective. The resource management
can be complex very complex, and I expect so it will be hard work.

Regards

Pavel

On Sun, Sep 2, 2018 at 8:44 PM, Pavel Stehule <pavel.stehule@gmail.com>
wrote:

Hi

2018-09-02 14:18 GMT+02:00 Hubert Zhang <hzhang@pivotal.io>:

Thanks Chapman.
@Pavel, could you please explain more about your second suggestion "implement
some quotas on storage level?"

See attached patch - it is very simple - and good enough for our
purposes.

Regards

Pavel

We will not keep the long-lived processes attach to all databases(just
like you mentioned servers with thousands of databases)
And you are right, we could share ideas with autovacuum process, fork
worker processes in need.
"autovacuum checks for tables that have had a large number of
inserted, updated or deleted tuples. These checks use the statistics
collection facility"
diskquota process is similar to autovacuum at caring about insert, but
the difference is that it also care about vucuum full, truncate and drop.
While update and delete may not be interested since no file change happens.
So a separate diskquota process is preferred.

So if we implemented disk quota as a full native feature, and in the
first initial version I prefer to implement the following features:
1 Fork diskquota launcher process under Postmaster serverloop, which is
long-lived.
2 Diskquota launcher process is responsible for creating diskquota
worker process for every database.
3 DIskquota setting is stored in a separate catalog table for each
database.
4 Initialization stage, Diskquota launcher process creates diskquota worker
process for all the databases(traverse like autovacuum). Worker process
calculates disk usage of db objects and their diskquota setting. If any
db object exceeds its quota limit, put them into the blacklist in the
shared memory, which will later be used by enforcement operator. Worker
process exits when works are done.
5 Running stage, Diskquota launcher process creates diskquota worker
process for the database with a large number of insert, copy, truncate,
drop etc. or create disk quota statement. Worker process updates the file
size for db objects containing the result relation, and compare with the
diskquota setting. Again, if exceeds quota limit, put them into blacklist,
remove from blacklist vice versa. Worker process exits when works are
done and a GUC could control the frequency of worker process restart to a
specific database. As you know, this GUC also controls the delay when we do
enforcement.
6 Enforcement. When postgres backend executes queries, check the
blacklist in shared memory to determine whether the query is allowed(before
execute) or need rollback(is executing)?

If we implemented disk quota as an extension, we could just use
background worker to start diskquota launcher process and use
RegisterDynamicBackgroundWorker() to fork child diskquota worker
processes by the launcher process as suggested by @Chapman.
Diskquota setting could be stored in user table in a separate schema for
each database(Schema and table created by create extension statement) just
like what Heikki has done in pg_quota project. But in this case, we need to
create extension for each database before diskquota worker process can be
set up for that database.

Any comments on the above design and which is preferred, native feature
or extension as the POC?

-- Hubert

On Fri, Aug 31, 2018 at 3:32 AM, Pavel Stehule <pavel.stehule@gmail.com

wrote:

2018-08-30 16:22 GMT+02:00 Chapman Flack <chap@anastigmatix.net>:

On 08/30/2018 09:57 AM, Hubert Zhang wrote:

2 Keep one worker process for each database. But using a

parent/global

quota worker process to manage the lifecycle of database level

worker

processes. It could handle the newly created database(avoid restart
database) and save resource when a database is not used. But this

needs to

change worker process to be hierarchical. Postmaster becomes the

grandfather

of database level worker processes in this case.

I am using background workers this way in 9.5 at $work.

In my case, one worker lives forever, wakes up on a set period, and
starts a short-lived worker for every database, waiting for each
one before starting the next.

It was straightforward to implement. Looking back over the code,
I see the global worker assigns its own PID to worker.bgw_notify_pid
of each of its children, and also obtains a handle for each child
from RegisterDynamicBackgroundWorker().

I imagine the global quota worker would prefer to start workers
for every database and then just wait for notifications from any
of them, but that seems equally straightforward at first glance.

There are servers with thousands databases. Worker per database is not
good idea.

It should to share ideas, code with autovacuum process.

Not sure, how to effective implementation based on bg workers can be.
On servers with large set of databases, large set of tables it can identify
too big table too late.

Isn't better to implement some quotas on storage level?

Regards

Pavel

-Chap

--
Thanks

Hubert Zhang

--
Thanks

Hubert Zhang

--
Thanks

Hubert Zhang

#9Pavel Stehule
pavel.stehule@gmail.com
In reply to: Hubert Zhang (#8)
Re: Proposal for disk quota feature

pá 21. 9. 2018 v 13:32 odesílatel Hubert Zhang <hzhang@pivotal.io> napsal:

*Hi all,We redesign disk quota feature based on the comments from Pavel
Stehule and Chapman Flack. Here are the new design.OverviewBasically, disk
quota feature is used to support multi-tenancy environment, different level
of database objects could be set a quota limit to avoid over use of disk
space. A common case could be as follows: DBA could enable disk quota on a
specified database list. DBA could set disk quota limit for
tables/schemas/roles in these databases. Separate disk quota worker process
will monitor the disk usage for these objects and detect the objects which
exceed their quota limit. Queries loading data into these “out of disk
quota” tables/schemas/roles will be cancelled.We are currently working at
init implementation stage. We would like to propose our idea firstly and
get feedbacks from community to do quick iteration.SQL Syntax (How to use
disk quota)1 Specify the databases with disk quota enabled in GUC
“diskquota_databases” in postgresql.conf and restart the database.2 DBA
could set disk quota limit for table/schema/role.CREATE DISK QUOTA tablea1
ON TABLE a1 with (quota = ‘1MB’);CREATE DISK QUOTA roleu1 ON USER u1 with
(quota = ‘1GB’);CREATE DISK QUOTA schemas1 ON SCHEMA s1 with (quota =
‘3MB’);*

just fast reaction - why QUOTA object?

Isn't ALTER SET enough?

Some like

ALTER TABLE a1 SET quote = 1MB;
ALTER USER ...
ALTER SCHEMA ..

New DDL commans looks like too hard hammer .

Show quoted text

*3 Simulate a schema out of quota limit case: suppose table a1 and table
a2 are both under schema s1.INSERT INTO a1 SELECT
generate_series(1,1000);INSERT INTO a2 SELECT
generate_series(1,3000000);SELECT pg_sleep(5)INSERT INTO a1 SELECT
generate_series(1,1000);ERROR: schema's disk space quota exceededDROP
TABLE a2;SELECT pg_sleep(5)INSERT INTO a1 SELECT
generate_series(1,1000);INSERT 0 1000ArchitectureDisk quota has the
following components.1. Quota Setting Store is where the disk quota setting
to be stored and accessed. We plan to use catalog table pg_diskquota to
store these information. pg_diskquota is
like:CATALOG(pg_diskquota,6122,DiskQuotaRelationId){ NameData quotaname; /*
diskquota name */ int16 quotatype; /* diskquota type name */ Oid
quotatargetoid; /* diskquota target db object oid*/ int32 quotalimit; /*
diskquota size limit in MB*/ int32 quotaredzone; /* diskquota redzone in
MB*/} FormData_pg_diskquota;2. Quota Change Detector is the monitor of size
change of database objects. We plan to use stat collector to detect the
‘active’ table list at initial stage. But stat collector has some
limitation on finding the active table which is in a running transaction.
Details see TODO section.3. Quota Size Checker is where to calculate the
size and compare with quota limit for database objects. According to
Pavel’s comment, autovacuum launcher and worker process could be a good
reference to disk quota. So we plan to use a disk quota launcher daemon
process and several disk quota worker process to finish this work. Launcher
process is responsible for starting worker process based on a user defined
database list from GUC. Worker process will connect to its target database
and monitor the disk usage for objects in this database. In init stage of
worker process, it will call calculate_total_relation_size() to calculate
the size for each user table. After init stage, worker process will refresh
the disk model every N seconds. Refreshing will only recalculate the size
of tables in ‘active’ table list, which is generated by Quata Change
Detector to minimize the cost.4. Quota Enforcement Operator is where to
check for the quota limitation at postgres backend side. We will firstly
implement it in ExecCheckRTPerms() as pre-running enforcement. It will
check the disk quota of tables being inserted or updated, and report error
if table’s or table’s schema’s or table’s owner’s quota limit is exceeded.
As a native feature, we plan to add more checkpoint to do running query
enforcement. For example, if a disk quota lefts 10MB quota, a query could
insert 1GB data. This query could be allowed in pre-running enforcement
check, but will be cancelled in running query enforcement check. Therefore,
it can improve the accurate of disk quota usage. To achieve this, we plan
to add a checkpoint in lower API such as smgr_extened. Hence, the Quota
Enforcement Operator will check the disk quota usage when smgr_extened is
called. If the quota is over limited, current query will be cancelled.
Highlight1. Native feature.Support native Create/Drop Disk Quota SQL
statement.New catalog table pg_diskquota to store disk quota setting.2.
Auto DML/DDL detection. Table
create/update/insert/delete/vacuum/truncate/drop/schema_change/owner_change,
Schema create/drop and Role create/drop will be detected by disk quota
automatically. 3. Low cost disk quota checker.Worker process of disk quota
need to refresh the disk usage model every N seconds. Since recalculate the
file size using stat() system call is expensive for a large number of
files, we use an ‘active’ table list to reduce the real work at each
iteration. A basic experiment on our init stage implementation on database
with 20K tables shows that the refresh cost is 1% cpu usage and will be
finished within 50ms. Todo/LimitationBefore we propose our patch, we plan
to enhance it with the following ideas:1. Setting database list with disk
quota enabled dynamically without restart database. Since we have the disk
quota launcher process, it could detect the new ‘diskquota_databases’ list
and start/stop the corresponding disk quota worker process.2. Enforcement
when query is running. Considering the case when there is 10MB quota left,
but next query will insert 10GB data. Current enforcement design will allow
this query to be executed. This is limited by the ‘active’ table detection
is generated by stat collector. Postgres backend will only send table stat
information to collector only when the transaction ends. We need a new way
to detect the ‘active’ table even when this table is being modified inside
a running transaction.3. Monitor unlimited number of databases. Current we
set the max number of disk quota worker process to be 10 to reduce the
affection normal workload. But how about if we want to monitor the disk
quota of more than 10 databases? Our solution is to let disk quota launcher
to manage a queue of database need to be monitored. And disk quota worker
process consuming the queue and refresh the disk usage/quota for this
database. After some periods, worker will return the database to the queue,
and fetch the top database from queue to process. The period determine the
delay of detecting disk quota change. To implement this feature, we need to
support a subprocess of postmaster to rebind to another database instead of
the database binded in InitPostgres().4. Support active table detection on
vacuum full and vacuum analyze. Currently vacuum full and vacuum analyze
are not tracked by stat collector.Thanks to Heikki, Pavel Stehule,Chapman
Flack for the former comments on disk quota feature. Any comments on how to
improve disk quota feature are appreciated.*

On Mon, Sep 3, 2018 at 12:05 PM, Pavel Stehule <pavel.stehule@gmail.com>
wrote:

2018-09-03 3:49 GMT+02:00 Hubert Zhang <hzhang@pivotal.io>:

Thanks Pavel.
Your patch did enforcement on storage level(md.c or we could also use
smgr_extend). It's straight forward.
But I prefer to implement disk_quota as a feature with following
objectives:
1 set/alter disk quota setting on different database objects, e.g. user,
database, schema etc. not only a general GUC, but we could set separate
quota limit for a specific objects.
2 enforcement operator should work at two positions: before query is
running and when query is running. The latter one's implementation maybe
similar to your patch.

The patch was just example. The resource quotes should be more complex -
per partition, table, schema, database, user - so GUC are possible, but not
very user friendly.

Our case is specific, but not too much. The servers are used for
multidimensional analyses - and some tables can grow too fast (COPY, INSERT
SELECT). We need to solve limits immediately. The implementation is simple,
so I did it. Same implementation on database level, or schema level needs
some more locks, so it will not be too effective. The resource management
can be complex very complex, and I expect so it will be hard work.

Regards

Pavel

On Sun, Sep 2, 2018 at 8:44 PM, Pavel Stehule <pavel.stehule@gmail.com>
wrote:

Hi

2018-09-02 14:18 GMT+02:00 Hubert Zhang <hzhang@pivotal.io>:

Thanks Chapman.
@Pavel, could you please explain more about your second suggestion "implement
some quotas on storage level?"

See attached patch - it is very simple - and good enough for our
purposes.

Regards

Pavel

We will not keep the long-lived processes attach to all databases(just
like you mentioned servers with thousands of databases)
And you are right, we could share ideas with autovacuum process, fork
worker processes in need.
"autovacuum checks for tables that have had a large number of
inserted, updated or deleted tuples. These checks use the statistics
collection facility"
diskquota process is similar to autovacuum at caring about insert, but
the difference is that it also care about vucuum full, truncate and drop.
While update and delete may not be interested since no file change happens.
So a separate diskquota process is preferred.

So if we implemented disk quota as a full native feature, and in the
first initial version I prefer to implement the following features:
1 Fork diskquota launcher process under Postmaster serverloop, which
is long-lived.
2 Diskquota launcher process is responsible for creating diskquota
worker process for every database.
3 DIskquota setting is stored in a separate catalog table for each
database.
4 Initialization stage, Diskquota launcher process creates diskquota worker
process for all the databases(traverse like autovacuum). Worker process
calculates disk usage of db objects and their diskquota setting. If any
db object exceeds its quota limit, put them into the blacklist in the
shared memory, which will later be used by enforcement operator. Worker
process exits when works are done.
5 Running stage, Diskquota launcher process creates diskquota worker
process for the database with a large number of insert, copy, truncate,
drop etc. or create disk quota statement. Worker process updates the file
size for db objects containing the result relation, and compare with the
diskquota setting. Again, if exceeds quota limit, put them into blacklist,
remove from blacklist vice versa. Worker process exits when works are
done and a GUC could control the frequency of worker process restart to a
specific database. As you know, this GUC also controls the delay when we do
enforcement.
6 Enforcement. When postgres backend executes queries, check the
blacklist in shared memory to determine whether the query is allowed(before
execute) or need rollback(is executing)?

If we implemented disk quota as an extension, we could just use
background worker to start diskquota launcher process and use
RegisterDynamicBackgroundWorker() to fork child diskquota worker
processes by the launcher process as suggested by @Chapman.
Diskquota setting could be stored in user table in a separate schema for
each database(Schema and table created by create extension statement) just
like what Heikki has done in pg_quota project. But in this case, we need to
create extension for each database before diskquota worker process can be
set up for that database.

Any comments on the above design and which is preferred, native
feature or extension as the POC?

-- Hubert

On Fri, Aug 31, 2018 at 3:32 AM, Pavel Stehule <
pavel.stehule@gmail.com> wrote:

2018-08-30 16:22 GMT+02:00 Chapman Flack <chap@anastigmatix.net>:

On 08/30/2018 09:57 AM, Hubert Zhang wrote:

2 Keep one worker process for each database. But using a

parent/global

quota worker process to manage the lifecycle of database level

worker

processes. It could handle the newly created database(avoid restart
database) and save resource when a database is not used. But this

needs to

change worker process to be hierarchical. Postmaster becomes the

grandfather

of database level worker processes in this case.

I am using background workers this way in 9.5 at $work.

In my case, one worker lives forever, wakes up on a set period, and
starts a short-lived worker for every database, waiting for each
one before starting the next.

It was straightforward to implement. Looking back over the code,
I see the global worker assigns its own PID to worker.bgw_notify_pid
of each of its children, and also obtains a handle for each child
from RegisterDynamicBackgroundWorker().

I imagine the global quota worker would prefer to start workers
for every database and then just wait for notifications from any
of them, but that seems equally straightforward at first glance.

There are servers with thousands databases. Worker per database is
not good idea.

It should to share ideas, code with autovacuum process.

Not sure, how to effective implementation based on bg workers can be.
On servers with large set of databases, large set of tables it can identify
too big table too late.

Isn't better to implement some quotas on storage level?

Regards

Pavel

-Chap

--
Thanks

Hubert Zhang

--
Thanks

Hubert Zhang

--
Thanks

Hubert Zhang

#10Hubert Zhang
hzhang@pivotal.io
In reply to: Pavel Stehule (#9)
Re: Proposal for disk quota feature

just fast reaction - why QUOTA object?
Isn't ALTER SET enough?
Some like
ALTER TABLE a1 SET quote = 1MB;
ALTER USER ...
ALTER SCHEMA ..
New DDL commans looks like too hard hammer .

It's an option. Prefer to consider quota setting store together:
CREATE DISK QUOTA way is more nature to store quota setting in a separate
pg_diskquota catalog
While ALTER SET way is more close to store quota setting in pg_class,
pg_role, pg_namespace. etc in an integrated way.
(Note that here I mean nature/close is not must, ALTER SET could also store
in pg_diskquota and vice versa.)

Here are some differences I can think of:
1 pg_role is a global catalog, not per database level. It's harder to
tracker the user's disk usage in the whole clusters(considering 1000+
databases). So the semantic of CREATE DISK QUOTA ON USER is limited: it
only tracks the user's disk usage inside the current database.
2 using separate pg_diskquota could add more field except for quota limit
without adding too many fields in pg_class, e.g. red zone to give the user
a warning or the current disk usage of the db objects.

On Fri, Sep 21, 2018 at 8:01 PM Pavel Stehule <pavel.stehule@gmail.com>
wrote:

pá 21. 9. 2018 v 13:32 odesílatel Hubert Zhang <hzhang@pivotal.io> napsal:

*Hi all,We redesign disk quota feature based on the comments from Pavel
Stehule and Chapman Flack. Here are the new design.OverviewBasically, disk
quota feature is used to support multi-tenancy environment, different level
of database objects could be set a quota limit to avoid over use of disk
space. A common case could be as follows: DBA could enable disk quota on a
specified database list. DBA could set disk quota limit for
tables/schemas/roles in these databases. Separate disk quota worker process
will monitor the disk usage for these objects and detect the objects which
exceed their quota limit. Queries loading data into these “out of disk
quota” tables/schemas/roles will be cancelled.We are currently working at
init implementation stage. We would like to propose our idea firstly and
get feedbacks from community to do quick iteration.SQL Syntax (How to use
disk quota)1 Specify the databases with disk quota enabled in GUC
“diskquota_databases” in postgresql.conf and restart the database.2 DBA
could set disk quota limit for table/schema/role.CREATE DISK QUOTA tablea1
ON TABLE a1 with (quota = ‘1MB’);CREATE DISK QUOTA roleu1 ON USER u1 with
(quota = ‘1GB’);CREATE DISK QUOTA schemas1 ON SCHEMA s1 with (quota =
‘3MB’);*

just fast reaction - why QUOTA object?

Isn't ALTER SET enough?

Some like

ALTER TABLE a1 SET quote = 1MB;
ALTER USER ...
ALTER SCHEMA ..

New DDL commans looks like too hard hammer .

*3 Simulate a schema out of quota limit case: suppose table a1 and table
a2 are both under schema s1.INSERT INTO a1 SELECT
generate_series(1,1000);INSERT INTO a2 SELECT
generate_series(1,3000000);SELECT pg_sleep(5)INSERT INTO a1 SELECT
generate_series(1,1000);ERROR: schema's disk space quota exceededDROP
TABLE a2;SELECT pg_sleep(5)INSERT INTO a1 SELECT
generate_series(1,1000);INSERT 0 1000ArchitectureDisk quota has the
following components.1. Quota Setting Store is where the disk quota setting
to be stored and accessed. We plan to use catalog table pg_diskquota to
store these information. pg_diskquota is
like:CATALOG(pg_diskquota,6122,DiskQuotaRelationId){ NameData quotaname; /*
diskquota name */ int16 quotatype; /* diskquota type name */ Oid
quotatargetoid; /* diskquota target db object oid*/ int32 quotalimit; /*
diskquota size limit in MB*/ int32 quotaredzone; /* diskquota redzone in
MB*/} FormData_pg_diskquota;2. Quota Change Detector is the monitor of size
change of database objects. We plan to use stat collector to detect the
‘active’ table list at initial stage. But stat collector has some
limitation on finding the active table which is in a running transaction.
Details see TODO section.3. Quota Size Checker is where to calculate the
size and compare with quota limit for database objects. According to
Pavel’s comment, autovacuum launcher and worker process could be a good
reference to disk quota. So we plan to use a disk quota launcher daemon
process and several disk quota worker process to finish this work. Launcher
process is responsible for starting worker process based on a user defined
database list from GUC. Worker process will connect to its target database
and monitor the disk usage for objects in this database. In init stage of
worker process, it will call calculate_total_relation_size() to calculate
the size for each user table. After init stage, worker process will refresh
the disk model every N seconds. Refreshing will only recalculate the size
of tables in ‘active’ table list, which is generated by Quata Change
Detector to minimize the cost.4. Quota Enforcement Operator is where to
check for the quota limitation at postgres backend side. We will firstly
implement it in ExecCheckRTPerms() as pre-running enforcement. It will
check the disk quota of tables being inserted or updated, and report error
if table’s or table’s schema’s or table’s owner’s quota limit is exceeded.
As a native feature, we plan to add more checkpoint to do running query
enforcement. For example, if a disk quota lefts 10MB quota, a query could
insert 1GB data. This query could be allowed in pre-running enforcement
check, but will be cancelled in running query enforcement check. Therefore,
it can improve the accurate of disk quota usage. To achieve this, we plan
to add a checkpoint in lower API such as smgr_extened. Hence, the Quota
Enforcement Operator will check the disk quota usage when smgr_extened is
called. If the quota is over limited, current query will be cancelled.
Highlight1. Native feature.Support native Create/Drop Disk Quota SQL
statement.New catalog table pg_diskquota to store disk quota setting.2.
Auto DML/DDL detection. Table
create/update/insert/delete/vacuum/truncate/drop/schema_change/owner_change,
Schema create/drop and Role create/drop will be detected by disk quota
automatically. 3. Low cost disk quota checker.Worker process of disk quota
need to refresh the disk usage model every N seconds. Since recalculate the
file size using stat() system call is expensive for a large number of
files, we use an ‘active’ table list to reduce the real work at each
iteration. A basic experiment on our init stage implementation on database
with 20K tables shows that the refresh cost is 1% cpu usage and will be
finished within 50ms. Todo/LimitationBefore we propose our patch, we plan
to enhance it with the following ideas:1. Setting database list with disk
quota enabled dynamically without restart database. Since we have the disk
quota launcher process, it could detect the new ‘diskquota_databases’ list
and start/stop the corresponding disk quota worker process.2. Enforcement
when query is running. Considering the case when there is 10MB quota left,
but next query will insert 10GB data. Current enforcement design will allow
this query to be executed. This is limited by the ‘active’ table detection
is generated by stat collector. Postgres backend will only send table stat
information to collector only when the transaction ends. We need a new way
to detect the ‘active’ table even when this table is being modified inside
a running transaction.3. Monitor unlimited number of databases. Current we
set the max number of disk quota worker process to be 10 to reduce the
affection normal workload. But how about if we want to monitor the disk
quota of more than 10 databases? Our solution is to let disk quota launcher
to manage a queue of database need to be monitored. And disk quota worker
process consuming the queue and refresh the disk usage/quota for this
database. After some periods, worker will return the database to the queue,
and fetch the top database from queue to process. The period determine the
delay of detecting disk quota change. To implement this feature, we need to
support a subprocess of postmaster to rebind to another database instead of
the database binded in InitPostgres().4. Support active table detection on
vacuum full and vacuum analyze. Currently vacuum full and vacuum analyze
are not tracked by stat collector.Thanks to Heikki, Pavel Stehule,Chapman
Flack for the former comments on disk quota feature. Any comments on how to
improve disk quota feature are appreciated.*

On Mon, Sep 3, 2018 at 12:05 PM, Pavel Stehule <pavel.stehule@gmail.com>
wrote:

2018-09-03 3:49 GMT+02:00 Hubert Zhang <hzhang@pivotal.io>:

Thanks Pavel.
Your patch did enforcement on storage level(md.c or we could also use
smgr_extend). It's straight forward.
But I prefer to implement disk_quota as a feature with following
objectives:
1 set/alter disk quota setting on different database objects, e.g.
user, database, schema etc. not only a general GUC, but we could set
separate quota limit for a specific objects.
2 enforcement operator should work at two positions: before query is
running and when query is running. The latter one's implementation maybe
similar to your patch.

The patch was just example. The resource quotes should be more complex -
per partition, table, schema, database, user - so GUC are possible, but not
very user friendly.

Our case is specific, but not too much. The servers are used for
multidimensional analyses - and some tables can grow too fast (COPY, INSERT
SELECT). We need to solve limits immediately. The implementation is simple,
so I did it. Same implementation on database level, or schema level needs
some more locks, so it will not be too effective. The resource management
can be complex very complex, and I expect so it will be hard work.

Regards

Pavel

On Sun, Sep 2, 2018 at 8:44 PM, Pavel Stehule <pavel.stehule@gmail.com>
wrote:

Hi

2018-09-02 14:18 GMT+02:00 Hubert Zhang <hzhang@pivotal.io>:

Thanks Chapman.
@Pavel, could you please explain more about your second suggestion "implement
some quotas on storage level?"

See attached patch - it is very simple - and good enough for our
purposes.

Regards

Pavel

We will not keep the long-lived processes attach to all
databases(just like you mentioned servers with thousands of databases
)
And you are right, we could share ideas with autovacuum process, fork
worker processes in need.
"autovacuum checks for tables that have had a large number of
inserted, updated or deleted tuples. These checks use the statistics
collection facility"
diskquota process is similar to autovacuum at caring about insert,
but the difference is that it also care about vucuum full, truncate and
drop. While update and delete may not be interested since no file change
happens. So a separate diskquota process is preferred.

So if we implemented disk quota as a full native feature, and in the
first initial version I prefer to implement the following features:
1 Fork diskquota launcher process under Postmaster serverloop, which
is long-lived.
2 Diskquota launcher process is responsible for creating diskquota
worker process for every database.
3 DIskquota setting is stored in a separate catalog table for each
database.
4 Initialization stage, Diskquota launcher process creates diskquota worker
process for all the databases(traverse like autovacuum). Worker process
calculates disk usage of db objects and their diskquota setting. If any
db object exceeds its quota limit, put them into the blacklist in the
shared memory, which will later be used by enforcement operator. Worker
process exits when works are done.
5 Running stage, Diskquota launcher process creates diskquota worker
process for the database with a large number of insert, copy, truncate,
drop etc. or create disk quota statement. Worker process updates the file
size for db objects containing the result relation, and compare with the
diskquota setting. Again, if exceeds quota limit, put them into blacklist,
remove from blacklist vice versa. Worker process exits when works
are done and a GUC could control the frequency of worker process restart to
a specific database. As you know, this GUC also controls the delay when we
do enforcement.
6 Enforcement. When postgres backend executes queries, check the
blacklist in shared memory to determine whether the query is allowed(before
execute) or need rollback(is executing)?

If we implemented disk quota as an extension, we could just use
background worker to start diskquota launcher process and use
RegisterDynamicBackgroundWorker() to fork child diskquota worker
processes by the launcher process as suggested by @Chapman.
Diskquota setting could be stored in user table in a separate schema for
each database(Schema and table created by create extension statement) just
like what Heikki has done in pg_quota project. But in this case, we need to
create extension for each database before diskquota worker process can be
set up for that database.

Any comments on the above design and which is preferred, native
feature or extension as the POC?

-- Hubert

On Fri, Aug 31, 2018 at 3:32 AM, Pavel Stehule <
pavel.stehule@gmail.com> wrote:

2018-08-30 16:22 GMT+02:00 Chapman Flack <chap@anastigmatix.net>:

On 08/30/2018 09:57 AM, Hubert Zhang wrote:

2 Keep one worker process for each database. But using a

parent/global

quota worker process to manage the lifecycle of database level

worker

processes. It could handle the newly created database(avoid

restart

database) and save resource when a database is not used. But this

needs to

change worker process to be hierarchical. Postmaster becomes the

grandfather

of database level worker processes in this case.

I am using background workers this way in 9.5 at $work.

In my case, one worker lives forever, wakes up on a set period, and
starts a short-lived worker for every database, waiting for each
one before starting the next.

It was straightforward to implement. Looking back over the code,
I see the global worker assigns its own PID to worker.bgw_notify_pid
of each of its children, and also obtains a handle for each child
from RegisterDynamicBackgroundWorker().

I imagine the global quota worker would prefer to start workers
for every database and then just wait for notifications from any
of them, but that seems equally straightforward at first glance.

There are servers with thousands databases. Worker per database is
not good idea.

It should to share ideas, code with autovacuum process.

Not sure, how to effective implementation based on bg workers can
be. On servers with large set of databases, large set of tables it can
identify too big table too late.

Isn't better to implement some quotas on storage level?

Regards

Pavel

-Chap

--
Thanks

Hubert Zhang

--
Thanks

Hubert Zhang

--
Thanks

Hubert Zhang

--
Thanks

Hubert Zhang

#11Jeremy Finzel
finzelj@gmail.com
In reply to: Hubert Zhang (#10)
Re: Proposal for disk quota feature

On Fri, Sep 21, 2018 at 9:21 AM Hubert Zhang <hzhang@pivotal.io> wrote:

just fast reaction - why QUOTA object?

Isn't ALTER SET enough?
Some like
ALTER TABLE a1 SET quote = 1MB;
ALTER USER ...
ALTER SCHEMA ..
New DDL commans looks like too hard hammer .

It's an option. Prefer to consider quota setting store together:
CREATE DISK QUOTA way is more nature to store quota setting in a separate
pg_diskquota catalog
While ALTER SET way is more close to store quota setting in pg_class,
pg_role, pg_namespace. etc in an integrated way.
(Note that here I mean nature/close is not must, ALTER SET could also
store in pg_diskquota and vice versa.)

Here are some differences I can think of:
1 pg_role is a global catalog, not per database level. It's harder to
tracker the user's disk usage in the whole clusters(considering 1000+
databases). So the semantic of CREATE DISK QUOTA ON USER is limited: it
only tracks the user's disk usage inside the current database.
2 using separate pg_diskquota could add more field except for quota limit
without adding too many fields in pg_class, e.g. red zone to give the user
a warning or the current disk usage of the db objects.

On Fri, Sep 21, 2018 at 8:01 PM Pavel Stehule <pavel.stehule@gmail.com>
wrote:

pá 21. 9. 2018 v 13:32 odesílatel Hubert Zhang <hzhang@pivotal.io>
napsal:

*Hi all,We redesign disk quota feature based on the comments from Pavel
Stehule and Chapman Flack. Here are the new design.OverviewBasically, disk
quota feature is used to support multi-tenancy environment, different level
of database objects could be set a quota limit to avoid over use of disk
space. A common case could be as follows: DBA could enable disk quota on a
specified database list. DBA could set disk quota limit for
tables/schemas/roles in these databases. Separate disk quota worker process
will monitor the disk usage for these objects and detect the objects which
exceed their quota limit. Queries loading data into these “out of disk
quota” tables/schemas/roles will be cancelled.We are currently working at
init implementation stage. We would like to propose our idea firstly and
get feedbacks from community to do quick iteration.SQL Syntax (How to use
disk quota)1 Specify the databases with disk quota enabled in GUC
“diskquota_databases” in postgresql.conf and restart the database.2 DBA
could set disk quota limit for table/schema/role.CREATE DISK QUOTA tablea1
ON TABLE a1 with (quota = ‘1MB’);CREATE DISK QUOTA roleu1 ON USER u1 with
(quota = ‘1GB’);CREATE DISK QUOTA schemas1 ON SCHEMA s1 with (quota =
‘3MB’);*

just fast reaction - why QUOTA object?

Isn't ALTER SET enough?

Some like

ALTER TABLE a1 SET quote = 1MB;
ALTER USER ...
ALTER SCHEMA ..

New DDL commans looks like too hard hammer .

*3 Simulate a schema out of quota limit case: suppose table a1 and table
a2 are both under schema s1.INSERT INTO a1 SELECT
generate_series(1,1000);INSERT INTO a2 SELECT
generate_series(1,3000000);SELECT pg_sleep(5)INSERT INTO a1 SELECT
generate_series(1,1000);ERROR: schema's disk space quota exceededDROP
TABLE a2;SELECT pg_sleep(5)INSERT INTO a1 SELECT
generate_series(1,1000);INSERT 0 1000ArchitectureDisk quota has the
following components.1. Quota Setting Store is where the disk quota setting
to be stored and accessed. We plan to use catalog table pg_diskquota to
store these information. pg_diskquota is
like:CATALOG(pg_diskquota,6122,DiskQuotaRelationId){ NameData quotaname; /*
diskquota name */ int16 quotatype; /* diskquota type name */ Oid
quotatargetoid; /* diskquota target db object oid*/ int32 quotalimit; /*
diskquota size limit in MB*/ int32 quotaredzone; /* diskquota redzone in
MB*/} FormData_pg_diskquota;2. Quota Change Detector is the monitor of size
change of database objects. We plan to use stat collector to detect the
‘active’ table list at initial stage. But stat collector has some
limitation on finding the active table which is in a running transaction.
Details see TODO section.3. Quota Size Checker is where to calculate the
size and compare with quota limit for database objects. According to
Pavel’s comment, autovacuum launcher and worker process could be a good
reference to disk quota. So we plan to use a disk quota launcher daemon
process and several disk quota worker process to finish this work. Launcher
process is responsible for starting worker process based on a user defined
database list from GUC. Worker process will connect to its target database
and monitor the disk usage for objects in this database. In init stage of
worker process, it will call calculate_total_relation_size() to calculate
the size for each user table. After init stage, worker process will refresh
the disk model every N seconds. Refreshing will only recalculate the size
of tables in ‘active’ table list, which is generated by Quata Change
Detector to minimize the cost.4. Quota Enforcement Operator is where to
check for the quota limitation at postgres backend side. We will firstly
implement it in ExecCheckRTPerms() as pre-running enforcement. It will
check the disk quota of tables being inserted or updated, and report error
if table’s or table’s schema’s or table’s owner’s quota limit is exceeded.
As a native feature, we plan to add more checkpoint to do running query
enforcement. For example, if a disk quota lefts 10MB quota, a query could
insert 1GB data. This query could be allowed in pre-running enforcement
check, but will be cancelled in running query enforcement check. Therefore,
it can improve the accurate of disk quota usage. To achieve this, we plan
to add a checkpoint in lower API such as smgr_extened. Hence, the Quota
Enforcement Operator will check the disk quota usage when smgr_extened is
called. If the quota is over limited, current query will be cancelled.
Highlight1. Native feature.Support native Create/Drop Disk Quota SQL
statement.New catalog table pg_diskquota to store disk quota setting.2.
Auto DML/DDL detection. Table
create/update/insert/delete/vacuum/truncate/drop/schema_change/owner_change,
Schema create/drop and Role create/drop will be detected by disk quota
automatically. 3. Low cost disk quota checker.Worker process of disk quota
need to refresh the disk usage model every N seconds. Since recalculate the
file size using stat() system call is expensive for a large number of
files, we use an ‘active’ table list to reduce the real work at each
iteration. A basic experiment on our init stage implementation on database
with 20K tables shows that the refresh cost is 1% cpu usage and will be
finished within 50ms. Todo/LimitationBefore we propose our patch, we plan
to enhance it with the following ideas:1. Setting database list with disk
quota enabled dynamically without restart database. Since we have the disk
quota launcher process, it could detect the new ‘diskquota_databases’ list
and start/stop the corresponding disk quota worker process.2. Enforcement
when query is running. Considering the case when there is 10MB quota left,
but next query will insert 10GB data. Current enforcement design will allow
this query to be executed. This is limited by the ‘active’ table detection
is generated by stat collector. Postgres backend will only send table stat
information to collector only when the transaction ends. We need a new way
to detect the ‘active’ table even when this table is being modified inside
a running transaction.3. Monitor unlimited number of databases. Current we
set the max number of disk quota worker process to be 10 to reduce the
affection normal workload. But how about if we want to monitor the disk
quota of more than 10 databases? Our solution is to let disk quota launcher
to manage a queue of database need to be monitored. And disk quota worker
process consuming the queue and refresh the disk usage/quota for this
database. After some periods, worker will return the database to the queue,
and fetch the top database from queue to process. The period determine the
delay of detecting disk quota change. To implement this feature, we need to
support a subprocess of postmaster to rebind to another database instead of
the database binded in InitPostgres().4. Support active table detection on
vacuum full and vacuum analyze. Currently vacuum full and vacuum analyze
are not tracked by stat collector.Thanks to Heikki, Pavel Stehule,Chapman
Flack for the former comments on disk quota feature. Any comments on how to
improve disk quota feature are appreciated.*

On Mon, Sep 3, 2018 at 12:05 PM, Pavel Stehule <pavel.stehule@gmail.com>
wrote:

2018-09-03 3:49 GMT+02:00 Hubert Zhang <hzhang@pivotal.io>:

Thanks Pavel.
Your patch did enforcement on storage level(md.c or we could also use
smgr_extend). It's straight forward.
But I prefer to implement disk_quota as a feature with following
objectives:
1 set/alter disk quota setting on different database objects, e.g.
user, database, schema etc. not only a general GUC, but we could set
separate quota limit for a specific objects.
2 enforcement operator should work at two positions: before query is
running and when query is running. The latter one's implementation maybe
similar to your patch.

The patch was just example. The resource quotes should be more complex
- per partition, table, schema, database, user - so GUC are possible, but
not very user friendly.

Our case is specific, but not too much. The servers are used for
multidimensional analyses - and some tables can grow too fast (COPY, INSERT
SELECT). We need to solve limits immediately. The implementation is simple,
so I did it. Same implementation on database level, or schema level needs
some more locks, so it will not be too effective. The resource management
can be complex very complex, and I expect so it will be hard work.

Regards

Pavel

On Sun, Sep 2, 2018 at 8:44 PM, Pavel Stehule <pavel.stehule@gmail.com

wrote:

Hi

2018-09-02 14:18 GMT+02:00 Hubert Zhang <hzhang@pivotal.io>:

Thanks Chapman.
@Pavel, could you please explain more about your second suggestion
"implement some quotas on storage level?"

See attached patch - it is very simple - and good enough for our
purposes.

Regards

Pavel

We will not keep the long-lived processes attach to all
databases(just like you mentioned servers with thousands of
databases)
And you are right, we could share ideas with autovacuum process,
fork worker processes in need.
"autovacuum checks for tables that have had a large number of
inserted, updated or deleted tuples. These checks use the statistics
collection facility"
diskquota process is similar to autovacuum at caring about insert,
but the difference is that it also care about vucuum full, truncate and
drop. While update and delete may not be interested since no file change
happens. So a separate diskquota process is preferred.

So if we implemented disk quota as a full native feature, and in
the first initial version I prefer to implement the following features:
1 Fork diskquota launcher process under Postmaster serverloop, which
is long-lived.
2 Diskquota launcher process is responsible for creating diskquota
worker process for every database.
3 DIskquota setting is stored in a separate catalog table for each
database.
4 Initialization stage, Diskquota launcher process creates diskquota
worker process for all the databases(traverse like autovacuum).
Worker process calculates disk usage of db objects and
their diskquota setting. If any db object exceeds its quota limit, put them
into the blacklist in the shared memory, which will later be used by
enforcement operator. Worker process exits when works are done.
5 Running stage, Diskquota launcher process creates diskquota worker
process for the database with a large number of insert, copy, truncate,
drop etc. or create disk quota statement. Worker process updates the file
size for db objects containing the result relation, and compare with the
diskquota setting. Again, if exceeds quota limit, put them into blacklist,
remove from blacklist vice versa. Worker process exits when works
are done and a GUC could control the frequency of worker process restart to
a specific database. As you know, this GUC also controls the delay when we
do enforcement.
6 Enforcement. When postgres backend executes queries, check the
blacklist in shared memory to determine whether the query is allowed(before
execute) or need rollback(is executing)?

If we implemented disk quota as an extension, we could just use
background worker to start diskquota launcher process and use
RegisterDynamicBackgroundWorker() to fork child diskquota worker
processes by the launcher process as suggested by @Chapman.
Diskquota setting could be stored in user table in a separate schema for
each database(Schema and table created by create extension statement) just
like what Heikki has done in pg_quota project. But in this case, we need to
create extension for each database before diskquota worker process can be
set up for that database.

Any comments on the above design and which is preferred, native
feature or extension as the POC?

-- Hubert

On Fri, Aug 31, 2018 at 3:32 AM, Pavel Stehule <
pavel.stehule@gmail.com> wrote:

2018-08-30 16:22 GMT+02:00 Chapman Flack <chap@anastigmatix.net>:

On 08/30/2018 09:57 AM, Hubert Zhang wrote:

2 Keep one worker process for each database. But using a

parent/global

quota worker process to manage the lifecycle of database level

worker

processes. It could handle the newly created database(avoid

restart

database) and save resource when a database is not used. But

this needs to

change worker process to be hierarchical. Postmaster becomes the

grandfather

of database level worker processes in this case.

I am using background workers this way in 9.5 at $work.

In my case, one worker lives forever, wakes up on a set period, and
starts a short-lived worker for every database, waiting for each
one before starting the next.

It was straightforward to implement. Looking back over the code,
I see the global worker assigns its own PID to
worker.bgw_notify_pid
of each of its children, and also obtains a handle for each child
from RegisterDynamicBackgroundWorker().

I imagine the global quota worker would prefer to start workers
for every database and then just wait for notifications from any
of them, but that seems equally straightforward at first glance.

There are servers with thousands databases. Worker per database is
not good idea.

It should to share ideas, code with autovacuum process.

Not sure, how to effective implementation based on bg workers can
be. On servers with large set of databases, large set of tables it can
identify too big table too late.

Isn't better to implement some quotas on storage level?

FWIW, we are super interested in this feature at a role level especially in
reporting systems where users can for example accidentally start a create
table statement overnight and sure enough after their Cartesian product
generated 2TB of data they run the system out of disk.

That problem is really hard to manage currently and this feature would
really solve that.

Thanks,
Jeremy

Show quoted text
#12Pavel Stehule
pavel.stehule@gmail.com
In reply to: Hubert Zhang (#10)
Re: Proposal for disk quota feature

pá 21. 9. 2018 v 16:21 odesílatel Hubert Zhang <hzhang@pivotal.io> napsal:

just fast reaction - why QUOTA object?

Isn't ALTER SET enough?
Some like
ALTER TABLE a1 SET quote = 1MB;
ALTER USER ...
ALTER SCHEMA ..
New DDL commans looks like too hard hammer .

It's an option. Prefer to consider quota setting store together:
CREATE DISK QUOTA way is more nature to store quota setting in a separate
pg_diskquota catalog
While ALTER SET way is more close to store quota setting in pg_class,
pg_role, pg_namespace. etc in an integrated way.
(Note that here I mean nature/close is not must, ALTER SET could also
store in pg_diskquota and vice versa.)

I have not a problem with new special table for storing this information.
But it looks like redundant to current GUC configuration and limits. Can be
messy do some work with ALTER ROLE, and some work via CREATE QUOTE.

Regards

Pavel

Show quoted text

Here are some differences I can think of:
1 pg_role is a global catalog, not per database level. It's harder to
tracker the user's disk usage in the whole clusters(considering 1000+
databases). So the semantic of CREATE DISK QUOTA ON USER is limited: it
only tracks the user's disk usage inside the current database.
2 using separate pg_diskquota could add more field except for quota limit
without adding too many fields in pg_class, e.g. red zone to give the user
a warning or the current disk usage of the db objects.

On Fri, Sep 21, 2018 at 8:01 PM Pavel Stehule <pavel.stehule@gmail.com>
wrote:

pá 21. 9. 2018 v 13:32 odesílatel Hubert Zhang <hzhang@pivotal.io>
napsal:

*Hi all,We redesign disk quota feature based on the comments from Pavel
Stehule and Chapman Flack. Here are the new design.OverviewBasically, disk
quota feature is used to support multi-tenancy environment, different level
of database objects could be set a quota limit to avoid over use of disk
space. A common case could be as follows: DBA could enable disk quota on a
specified database list. DBA could set disk quota limit for
tables/schemas/roles in these databases. Separate disk quota worker process
will monitor the disk usage for these objects and detect the objects which
exceed their quota limit. Queries loading data into these “out of disk
quota” tables/schemas/roles will be cancelled.We are currently working at
init implementation stage. We would like to propose our idea firstly and
get feedbacks from community to do quick iteration.SQL Syntax (How to use
disk quota)1 Specify the databases with disk quota enabled in GUC
“diskquota_databases” in postgresql.conf and restart the database.2 DBA
could set disk quota limit for table/schema/role.CREATE DISK QUOTA tablea1
ON TABLE a1 with (quota = ‘1MB’);CREATE DISK QUOTA roleu1 ON USER u1 with
(quota = ‘1GB’);CREATE DISK QUOTA schemas1 ON SCHEMA s1 with (quota =
‘3MB’);*

just fast reaction - why QUOTA object?

Isn't ALTER SET enough?

Some like

ALTER TABLE a1 SET quote = 1MB;
ALTER USER ...
ALTER SCHEMA ..

New DDL commans looks like too hard hammer .

*3 Simulate a schema out of quota limit case: suppose table a1 and table
a2 are both under schema s1.INSERT INTO a1 SELECT
generate_series(1,1000);INSERT INTO a2 SELECT
generate_series(1,3000000);SELECT pg_sleep(5)INSERT INTO a1 SELECT
generate_series(1,1000);ERROR: schema's disk space quota exceededDROP
TABLE a2;SELECT pg_sleep(5)INSERT INTO a1 SELECT
generate_series(1,1000);INSERT 0 1000ArchitectureDisk quota has the
following components.1. Quota Setting Store is where the disk quota setting
to be stored and accessed. We plan to use catalog table pg_diskquota to
store these information. pg_diskquota is
like:CATALOG(pg_diskquota,6122,DiskQuotaRelationId){ NameData quotaname; /*
diskquota name */ int16 quotatype; /* diskquota type name */ Oid
quotatargetoid; /* diskquota target db object oid*/ int32 quotalimit; /*
diskquota size limit in MB*/ int32 quotaredzone; /* diskquota redzone in
MB*/} FormData_pg_diskquota;2. Quota Change Detector is the monitor of size
change of database objects. We plan to use stat collector to detect the
‘active’ table list at initial stage. But stat collector has some
limitation on finding the active table which is in a running transaction.
Details see TODO section.3. Quota Size Checker is where to calculate the
size and compare with quota limit for database objects. According to
Pavel’s comment, autovacuum launcher and worker process could be a good
reference to disk quota. So we plan to use a disk quota launcher daemon
process and several disk quota worker process to finish this work. Launcher
process is responsible for starting worker process based on a user defined
database list from GUC. Worker process will connect to its target database
and monitor the disk usage for objects in this database. In init stage of
worker process, it will call calculate_total_relation_size() to calculate
the size for each user table. After init stage, worker process will refresh
the disk model every N seconds. Refreshing will only recalculate the size
of tables in ‘active’ table list, which is generated by Quata Change
Detector to minimize the cost.4. Quota Enforcement Operator is where to
check for the quota limitation at postgres backend side. We will firstly
implement it in ExecCheckRTPerms() as pre-running enforcement. It will
check the disk quota of tables being inserted or updated, and report error
if table’s or table’s schema’s or table’s owner’s quota limit is exceeded.
As a native feature, we plan to add more checkpoint to do running query
enforcement. For example, if a disk quota lefts 10MB quota, a query could
insert 1GB data. This query could be allowed in pre-running enforcement
check, but will be cancelled in running query enforcement check. Therefore,
it can improve the accurate of disk quota usage. To achieve this, we plan
to add a checkpoint in lower API such as smgr_extened. Hence, the Quota
Enforcement Operator will check the disk quota usage when smgr_extened is
called. If the quota is over limited, current query will be cancelled.
Highlight1. Native feature.Support native Create/Drop Disk Quota SQL
statement.New catalog table pg_diskquota to store disk quota setting.2.
Auto DML/DDL detection. Table
create/update/insert/delete/vacuum/truncate/drop/schema_change/owner_change,
Schema create/drop and Role create/drop will be detected by disk quota
automatically. 3. Low cost disk quota checker.Worker process of disk quota
need to refresh the disk usage model every N seconds. Since recalculate the
file size using stat() system call is expensive for a large number of
files, we use an ‘active’ table list to reduce the real work at each
iteration. A basic experiment on our init stage implementation on database
with 20K tables shows that the refresh cost is 1% cpu usage and will be
finished within 50ms. Todo/LimitationBefore we propose our patch, we plan
to enhance it with the following ideas:1. Setting database list with disk
quota enabled dynamically without restart database. Since we have the disk
quota launcher process, it could detect the new ‘diskquota_databases’ list
and start/stop the corresponding disk quota worker process.2. Enforcement
when query is running. Considering the case when there is 10MB quota left,
but next query will insert 10GB data. Current enforcement design will allow
this query to be executed. This is limited by the ‘active’ table detection
is generated by stat collector. Postgres backend will only send table stat
information to collector only when the transaction ends. We need a new way
to detect the ‘active’ table even when this table is being modified inside
a running transaction.3. Monitor unlimited number of databases. Current we
set the max number of disk quota worker process to be 10 to reduce the
affection normal workload. But how about if we want to monitor the disk
quota of more than 10 databases? Our solution is to let disk quota launcher
to manage a queue of database need to be monitored. And disk quota worker
process consuming the queue and refresh the disk usage/quota for this
database. After some periods, worker will return the database to the queue,
and fetch the top database from queue to process. The period determine the
delay of detecting disk quota change. To implement this feature, we need to
support a subprocess of postmaster to rebind to another database instead of
the database binded in InitPostgres().4. Support active table detection on
vacuum full and vacuum analyze. Currently vacuum full and vacuum analyze
are not tracked by stat collector.Thanks to Heikki, Pavel Stehule,Chapman
Flack for the former comments on disk quota feature. Any comments on how to
improve disk quota feature are appreciated.*

On Mon, Sep 3, 2018 at 12:05 PM, Pavel Stehule <pavel.stehule@gmail.com>
wrote:

2018-09-03 3:49 GMT+02:00 Hubert Zhang <hzhang@pivotal.io>:

Thanks Pavel.
Your patch did enforcement on storage level(md.c or we could also use
smgr_extend). It's straight forward.
But I prefer to implement disk_quota as a feature with following
objectives:
1 set/alter disk quota setting on different database objects, e.g.
user, database, schema etc. not only a general GUC, but we could set
separate quota limit for a specific objects.
2 enforcement operator should work at two positions: before query is
running and when query is running. The latter one's implementation maybe
similar to your patch.

The patch was just example. The resource quotes should be more complex
- per partition, table, schema, database, user - so GUC are possible, but
not very user friendly.

Our case is specific, but not too much. The servers are used for
multidimensional analyses - and some tables can grow too fast (COPY, INSERT
SELECT). We need to solve limits immediately. The implementation is simple,
so I did it. Same implementation on database level, or schema level needs
some more locks, so it will not be too effective. The resource management
can be complex very complex, and I expect so it will be hard work.

Regards

Pavel

On Sun, Sep 2, 2018 at 8:44 PM, Pavel Stehule <pavel.stehule@gmail.com

wrote:

Hi

2018-09-02 14:18 GMT+02:00 Hubert Zhang <hzhang@pivotal.io>:

Thanks Chapman.
@Pavel, could you please explain more about your second suggestion
"implement some quotas on storage level?"

See attached patch - it is very simple - and good enough for our
purposes.

Regards

Pavel

We will not keep the long-lived processes attach to all
databases(just like you mentioned servers with thousands of
databases)
And you are right, we could share ideas with autovacuum process,
fork worker processes in need.
"autovacuum checks for tables that have had a large number of
inserted, updated or deleted tuples. These checks use the statistics
collection facility"
diskquota process is similar to autovacuum at caring about insert,
but the difference is that it also care about vucuum full, truncate and
drop. While update and delete may not be interested since no file change
happens. So a separate diskquota process is preferred.

So if we implemented disk quota as a full native feature, and in
the first initial version I prefer to implement the following features:
1 Fork diskquota launcher process under Postmaster serverloop, which
is long-lived.
2 Diskquota launcher process is responsible for creating diskquota
worker process for every database.
3 DIskquota setting is stored in a separate catalog table for each
database.
4 Initialization stage, Diskquota launcher process creates diskquota
worker process for all the databases(traverse like autovacuum).
Worker process calculates disk usage of db objects and
their diskquota setting. If any db object exceeds its quota limit, put them
into the blacklist in the shared memory, which will later be used by
enforcement operator. Worker process exits when works are done.
5 Running stage, Diskquota launcher process creates diskquota worker
process for the database with a large number of insert, copy, truncate,
drop etc. or create disk quota statement. Worker process updates the file
size for db objects containing the result relation, and compare with the
diskquota setting. Again, if exceeds quota limit, put them into blacklist,
remove from blacklist vice versa. Worker process exits when works
are done and a GUC could control the frequency of worker process restart to
a specific database. As you know, this GUC also controls the delay when we
do enforcement.
6 Enforcement. When postgres backend executes queries, check the
blacklist in shared memory to determine whether the query is allowed(before
execute) or need rollback(is executing)?

If we implemented disk quota as an extension, we could just use
background worker to start diskquota launcher process and use
RegisterDynamicBackgroundWorker() to fork child diskquota worker
processes by the launcher process as suggested by @Chapman.
Diskquota setting could be stored in user table in a separate schema for
each database(Schema and table created by create extension statement) just
like what Heikki has done in pg_quota project. But in this case, we need to
create extension for each database before diskquota worker process can be
set up for that database.

Any comments on the above design and which is preferred, native
feature or extension as the POC?

-- Hubert

On Fri, Aug 31, 2018 at 3:32 AM, Pavel Stehule <
pavel.stehule@gmail.com> wrote:

2018-08-30 16:22 GMT+02:00 Chapman Flack <chap@anastigmatix.net>:

On 08/30/2018 09:57 AM, Hubert Zhang wrote:

2 Keep one worker process for each database. But using a

parent/global

quota worker process to manage the lifecycle of database level

worker

processes. It could handle the newly created database(avoid

restart

database) and save resource when a database is not used. But

this needs to

change worker process to be hierarchical. Postmaster becomes the

grandfather

of database level worker processes in this case.

I am using background workers this way in 9.5 at $work.

In my case, one worker lives forever, wakes up on a set period, and
starts a short-lived worker for every database, waiting for each
one before starting the next.

It was straightforward to implement. Looking back over the code,
I see the global worker assigns its own PID to
worker.bgw_notify_pid
of each of its children, and also obtains a handle for each child
from RegisterDynamicBackgroundWorker().

I imagine the global quota worker would prefer to start workers
for every database and then just wait for notifications from any
of them, but that seems equally straightforward at first glance.

There are servers with thousands databases. Worker per database is
not good idea.

It should to share ideas, code with autovacuum process.

Not sure, how to effective implementation based on bg workers can
be. On servers with large set of databases, large set of tables it can
identify too big table too late.

Isn't better to implement some quotas on storage level?

Regards

Pavel

-Chap

--
Thanks

Hubert Zhang

--
Thanks

Hubert Zhang

--
Thanks

Hubert Zhang

--
Thanks

Hubert Zhang

#13Hubert Zhang
hzhang@pivotal.io
In reply to: Pavel Stehule (#12)
Re: Proposal for disk quota feature

But it looks like redundant to current GUC configuration and limits

what do you mean by current GUC configuration? Is that the general block
number limit in your patch? If yes, the difference between GUC and
pg_diskquota catalog is that pg_diskquota will store different quota limit
for the different role, schema or table instead of a single GUC value.

On Sat, Sep 22, 2018 at 11:17 AM Pavel Stehule <pavel.stehule@gmail.com>
wrote:

pá 21. 9. 2018 v 16:21 odesílatel Hubert Zhang <hzhang@pivotal.io> napsal:

just fast reaction - why QUOTA object?

Isn't ALTER SET enough?
Some like
ALTER TABLE a1 SET quote = 1MB;
ALTER USER ...
ALTER SCHEMA ..
New DDL commans looks like too hard hammer .

It's an option. Prefer to consider quota setting store together:
CREATE DISK QUOTA way is more nature to store quota setting in a separate
pg_diskquota catalog
While ALTER SET way is more close to store quota setting in pg_class,
pg_role, pg_namespace. etc in an integrated way.
(Note that here I mean nature/close is not must, ALTER SET could also
store in pg_diskquota and vice versa.)

I have not a problem with new special table for storing this information.
But it looks like redundant to current GUC configuration and limits. Can be
messy do some work with ALTER ROLE, and some work via CREATE QUOTE.

Regards

Pavel

Here are some differences I can think of:
1 pg_role is a global catalog, not per database level. It's harder to
tracker the user's disk usage in the whole clusters(considering 1000+
databases). So the semantic of CREATE DISK QUOTA ON USER is limited: it
only tracks the user's disk usage inside the current database.
2 using separate pg_diskquota could add more field except for quota limit
without adding too many fields in pg_class, e.g. red zone to give the user
a warning or the current disk usage of the db objects.

On Fri, Sep 21, 2018 at 8:01 PM Pavel Stehule <pavel.stehule@gmail.com>
wrote:

pá 21. 9. 2018 v 13:32 odesílatel Hubert Zhang <hzhang@pivotal.io>
napsal:

*Hi all,We redesign disk quota feature based on the comments from Pavel
Stehule and Chapman Flack. Here are the new design.OverviewBasically, disk
quota feature is used to support multi-tenancy environment, different level
of database objects could be set a quota limit to avoid over use of disk
space. A common case could be as follows: DBA could enable disk quota on a
specified database list. DBA could set disk quota limit for
tables/schemas/roles in these databases. Separate disk quota worker process
will monitor the disk usage for these objects and detect the objects which
exceed their quota limit. Queries loading data into these “out of disk
quota” tables/schemas/roles will be cancelled.We are currently working at
init implementation stage. We would like to propose our idea firstly and
get feedbacks from community to do quick iteration.SQL Syntax (How to use
disk quota)1 Specify the databases with disk quota enabled in GUC
“diskquota_databases” in postgresql.conf and restart the database.2 DBA
could set disk quota limit for table/schema/role.CREATE DISK QUOTA tablea1
ON TABLE a1 with (quota = ‘1MB’);CREATE DISK QUOTA roleu1 ON USER u1 with
(quota = ‘1GB’);CREATE DISK QUOTA schemas1 ON SCHEMA s1 with (quota =
‘3MB’);*

just fast reaction - why QUOTA object?

Isn't ALTER SET enough?

Some like

ALTER TABLE a1 SET quote = 1MB;
ALTER USER ...
ALTER SCHEMA ..

New DDL commans looks like too hard hammer .

*3 Simulate a schema out of quota limit case: suppose table a1 and
table a2 are both under schema s1.INSERT INTO a1 SELECT
generate_series(1,1000);INSERT INTO a2 SELECT
generate_series(1,3000000);SELECT pg_sleep(5)INSERT INTO a1 SELECT
generate_series(1,1000);ERROR: schema's disk space quota exceededDROP
TABLE a2;SELECT pg_sleep(5)INSERT INTO a1 SELECT
generate_series(1,1000);INSERT 0 1000ArchitectureDisk quota has the
following components.1. Quota Setting Store is where the disk quota setting
to be stored and accessed. We plan to use catalog table pg_diskquota to
store these information. pg_diskquota is
like:CATALOG(pg_diskquota,6122,DiskQuotaRelationId){ NameData quotaname; /*
diskquota name */ int16 quotatype; /* diskquota type name */ Oid
quotatargetoid; /* diskquota target db object oid*/ int32 quotalimit; /*
diskquota size limit in MB*/ int32 quotaredzone; /* diskquota redzone in
MB*/} FormData_pg_diskquota;2. Quota Change Detector is the monitor of size
change of database objects. We plan to use stat collector to detect the
‘active’ table list at initial stage. But stat collector has some
limitation on finding the active table which is in a running transaction.
Details see TODO section.3. Quota Size Checker is where to calculate the
size and compare with quota limit for database objects. According to
Pavel’s comment, autovacuum launcher and worker process could be a good
reference to disk quota. So we plan to use a disk quota launcher daemon
process and several disk quota worker process to finish this work. Launcher
process is responsible for starting worker process based on a user defined
database list from GUC. Worker process will connect to its target database
and monitor the disk usage for objects in this database. In init stage of
worker process, it will call calculate_total_relation_size() to calculate
the size for each user table. After init stage, worker process will refresh
the disk model every N seconds. Refreshing will only recalculate the size
of tables in ‘active’ table list, which is generated by Quata Change
Detector to minimize the cost.4. Quota Enforcement Operator is where to
check for the quota limitation at postgres backend side. We will firstly
implement it in ExecCheckRTPerms() as pre-running enforcement. It will
check the disk quota of tables being inserted or updated, and report error
if table’s or table’s schema’s or table’s owner’s quota limit is exceeded.
As a native feature, we plan to add more checkpoint to do running query
enforcement. For example, if a disk quota lefts 10MB quota, a query could
insert 1GB data. This query could be allowed in pre-running enforcement
check, but will be cancelled in running query enforcement check. Therefore,
it can improve the accurate of disk quota usage. To achieve this, we plan
to add a checkpoint in lower API such as smgr_extened. Hence, the Quota
Enforcement Operator will check the disk quota usage when smgr_extened is
called. If the quota is over limited, current query will be cancelled.
Highlight1. Native feature.Support native Create/Drop Disk Quota SQL
statement.New catalog table pg_diskquota to store disk quota setting.2.
Auto DML/DDL detection. Table
create/update/insert/delete/vacuum/truncate/drop/schema_change/owner_change,
Schema create/drop and Role create/drop will be detected by disk quota
automatically. 3. Low cost disk quota checker.Worker process of disk quota
need to refresh the disk usage model every N seconds. Since recalculate the
file size using stat() system call is expensive for a large number of
files, we use an ‘active’ table list to reduce the real work at each
iteration. A basic experiment on our init stage implementation on database
with 20K tables shows that the refresh cost is 1% cpu usage and will be
finished within 50ms. Todo/LimitationBefore we propose our patch, we plan
to enhance it with the following ideas:1. Setting database list with disk
quota enabled dynamically without restart database. Since we have the disk
quota launcher process, it could detect the new ‘diskquota_databases’ list
and start/stop the corresponding disk quota worker process.2. Enforcement
when query is running. Considering the case when there is 10MB quota left,
but next query will insert 10GB data. Current enforcement design will allow
this query to be executed. This is limited by the ‘active’ table detection
is generated by stat collector. Postgres backend will only send table stat
information to collector only when the transaction ends. We need a new way
to detect the ‘active’ table even when this table is being modified inside
a running transaction.3. Monitor unlimited number of databases. Current we
set the max number of disk quota worker process to be 10 to reduce the
affection normal workload. But how about if we want to monitor the disk
quota of more than 10 databases? Our solution is to let disk quota launcher
to manage a queue of database need to be monitored. And disk quota worker
process consuming the queue and refresh the disk usage/quota for this
database. After some periods, worker will return the database to the queue,
and fetch the top database from queue to process. The period determine the
delay of detecting disk quota change. To implement this feature, we need to
support a subprocess of postmaster to rebind to another database instead of
the database binded in InitPostgres().4. Support active table detection on
vacuum full and vacuum analyze. Currently vacuum full and vacuum analyze
are not tracked by stat collector.Thanks to Heikki, Pavel Stehule,Chapman
Flack for the former comments on disk quota feature. Any comments on how to
improve disk quota feature are appreciated.*

On Mon, Sep 3, 2018 at 12:05 PM, Pavel Stehule <pavel.stehule@gmail.com

wrote:

2018-09-03 3:49 GMT+02:00 Hubert Zhang <hzhang@pivotal.io>:

Thanks Pavel.
Your patch did enforcement on storage level(md.c or we could also use
smgr_extend). It's straight forward.
But I prefer to implement disk_quota as a feature with following
objectives:
1 set/alter disk quota setting on different database objects, e.g.
user, database, schema etc. not only a general GUC, but we could set
separate quota limit for a specific objects.
2 enforcement operator should work at two positions: before query is
running and when query is running. The latter one's implementation maybe
similar to your patch.

The patch was just example. The resource quotes should be more complex
- per partition, table, schema, database, user - so GUC are possible, but
not very user friendly.

Our case is specific, but not too much. The servers are used for
multidimensional analyses - and some tables can grow too fast (COPY, INSERT
SELECT). We need to solve limits immediately. The implementation is simple,
so I did it. Same implementation on database level, or schema level needs
some more locks, so it will not be too effective. The resource management
can be complex very complex, and I expect so it will be hard work.

Regards

Pavel

On Sun, Sep 2, 2018 at 8:44 PM, Pavel Stehule <
pavel.stehule@gmail.com> wrote:

Hi

2018-09-02 14:18 GMT+02:00 Hubert Zhang <hzhang@pivotal.io>:

Thanks Chapman.
@Pavel, could you please explain more about your second
suggestion "implement some quotas on storage level?"

See attached patch - it is very simple - and good enough for our
purposes.

Regards

Pavel

We will not keep the long-lived processes attach to all
databases(just like you mentioned servers with thousands of
databases)
And you are right, we could share ideas with autovacuum process,
fork worker processes in need.
"autovacuum checks for tables that have had a large number of
inserted, updated or deleted tuples. These checks use the statistics
collection facility"
diskquota process is similar to autovacuum at caring about insert,
but the difference is that it also care about vucuum full, truncate and
drop. While update and delete may not be interested since no file change
happens. So a separate diskquota process is preferred.

So if we implemented disk quota as a full native feature, and in
the first initial version I prefer to implement the following features:
1 Fork diskquota launcher process under Postmaster serverloop,
which is long-lived.
2 Diskquota launcher process is responsible for creating diskquota
worker process for every database.
3 DIskquota setting is stored in a separate catalog table for each
database.
4 Initialization stage, Diskquota launcher process creates
diskquota worker process for all the databases(traverse like
autovacuum). Worker process calculates disk usage of db objects and
their diskquota setting. If any db object exceeds its quota limit, put them
into the blacklist in the shared memory, which will later be used by
enforcement operator. Worker process exits when works are done.
5 Running stage, Diskquota launcher process creates diskquota worker
process for the database with a large number of insert, copy, truncate,
drop etc. or create disk quota statement. Worker process updates the file
size for db objects containing the result relation, and compare with the
diskquota setting. Again, if exceeds quota limit, put them into blacklist,
remove from blacklist vice versa. Worker process exits when works
are done and a GUC could control the frequency of worker process restart to
a specific database. As you know, this GUC also controls the delay when we
do enforcement.
6 Enforcement. When postgres backend executes queries, check the
blacklist in shared memory to determine whether the query is allowed(before
execute) or need rollback(is executing)?

If we implemented disk quota as an extension, we could just use
background worker to start diskquota launcher process and use
RegisterDynamicBackgroundWorker() to fork child diskquota worker
processes by the launcher process as suggested by @Chapman.
Diskquota setting could be stored in user table in a separate schema for
each database(Schema and table created by create extension statement) just
like what Heikki has done in pg_quota project. But in this case, we need to
create extension for each database before diskquota worker process can be
set up for that database.

Any comments on the above design and which is preferred, native
feature or extension as the POC?

-- Hubert

On Fri, Aug 31, 2018 at 3:32 AM, Pavel Stehule <
pavel.stehule@gmail.com> wrote:

2018-08-30 16:22 GMT+02:00 Chapman Flack <chap@anastigmatix.net>:

On 08/30/2018 09:57 AM, Hubert Zhang wrote:

2 Keep one worker process for each database. But using a

parent/global

quota worker process to manage the lifecycle of database level

worker

processes. It could handle the newly created database(avoid

restart

database) and save resource when a database is not used. But

this needs to

change worker process to be hierarchical. Postmaster becomes

the grandfather

of database level worker processes in this case.

I am using background workers this way in 9.5 at $work.

In my case, one worker lives forever, wakes up on a set period,
and
starts a short-lived worker for every database, waiting for each
one before starting the next.

It was straightforward to implement. Looking back over the code,
I see the global worker assigns its own PID to
worker.bgw_notify_pid
of each of its children, and also obtains a handle for each child
from RegisterDynamicBackgroundWorker().

I imagine the global quota worker would prefer to start workers
for every database and then just wait for notifications from any
of them, but that seems equally straightforward at first glance.

There are servers with thousands databases. Worker per database is
not good idea.

It should to share ideas, code with autovacuum process.

Not sure, how to effective implementation based on bg workers can
be. On servers with large set of databases, large set of tables it can
identify too big table too late.

Isn't better to implement some quotas on storage level?

Regards

Pavel

-Chap

--
Thanks

Hubert Zhang

--
Thanks

Hubert Zhang

--
Thanks

Hubert Zhang

--
Thanks

Hubert Zhang

--
Thanks

Hubert Zhang

#14Pavel Stehule
pavel.stehule@gmail.com
In reply to: Hubert Zhang (#13)
Re: Proposal for disk quota feature

so 22. 9. 2018 v 8:48 odesílatel Hubert Zhang <hzhang@pivotal.io> napsal:

But it looks like redundant to current GUC configuration and limits

what do you mean by current GUC configuration? Is that the general block
number limit in your patch? If yes, the difference between GUC and
pg_diskquota catalog is that pg_diskquota will store different quota limit
for the different role, schema or table instead of a single GUC value.

storage is not relevant in this moment.

I don't see to consistent to sets some limits via SET command, or ALTER X
SET, and some other with CREATE QUOTA ON.

The quotas or object limits, resource limits are pretty useful and
necessary, but I don't see these like new type of objects, it is much more
some property of current objects. Because we have one syntax for this
purpose I prefer it. Because is not good to have two syntaxes for similar
purpose.

So instead CREATE DISC QUATA ON SCHEMA xxx some value I prefer

ALTER SCHEMA xxx SET disc_quota = xxx;

The functionality is +/- same. But ALTER XX SET was introduce first, and I
don't feel comfortable to have any new syntax for similar purpose

Regards

Pavel

Show quoted text

On Sat, Sep 22, 2018 at 11:17 AM Pavel Stehule <pavel.stehule@gmail.com>
wrote:

pá 21. 9. 2018 v 16:21 odesílatel Hubert Zhang <hzhang@pivotal.io>
napsal:

just fast reaction - why QUOTA object?

Isn't ALTER SET enough?
Some like
ALTER TABLE a1 SET quote = 1MB;
ALTER USER ...
ALTER SCHEMA ..
New DDL commans looks like too hard hammer .

It's an option. Prefer to consider quota setting store together:
CREATE DISK QUOTA way is more nature to store quota setting in a
separate pg_diskquota catalog
While ALTER SET way is more close to store quota setting in pg_class,
pg_role, pg_namespace. etc in an integrated way.
(Note that here I mean nature/close is not must, ALTER SET could also
store in pg_diskquota and vice versa.)

I have not a problem with new special table for storing this information.
But it looks like redundant to current GUC configuration and limits. Can be
messy do some work with ALTER ROLE, and some work via CREATE QUOTE.

Regards

Pavel

Here are some differences I can think of:
1 pg_role is a global catalog, not per database level. It's harder to
tracker the user's disk usage in the whole clusters(considering 1000+
databases). So the semantic of CREATE DISK QUOTA ON USER is limited: it
only tracks the user's disk usage inside the current database.
2 using separate pg_diskquota could add more field except for quota
limit without adding too many fields in pg_class, e.g. red zone to give the
user a warning or the current disk usage of the db objects.

On Fri, Sep 21, 2018 at 8:01 PM Pavel Stehule <pavel.stehule@gmail.com>
wrote:

pá 21. 9. 2018 v 13:32 odesílatel Hubert Zhang <hzhang@pivotal.io>
napsal:

*Hi all,We redesign disk quota feature based on the comments from
Pavel Stehule and Chapman Flack. Here are the new design.OverviewBasically,
disk quota feature is used to support multi-tenancy environment, different
level of database objects could be set a quota limit to avoid over use of
disk space. A common case could be as follows: DBA could enable disk quota
on a specified database list. DBA could set disk quota limit for
tables/schemas/roles in these databases. Separate disk quota worker process
will monitor the disk usage for these objects and detect the objects which
exceed their quota limit. Queries loading data into these “out of disk
quota” tables/schemas/roles will be cancelled.We are currently working at
init implementation stage. We would like to propose our idea firstly and
get feedbacks from community to do quick iteration.SQL Syntax (How to use
disk quota)1 Specify the databases with disk quota enabled in GUC
“diskquota_databases” in postgresql.conf and restart the database.2 DBA
could set disk quota limit for table/schema/role.CREATE DISK QUOTA tablea1
ON TABLE a1 with (quota = ‘1MB’);CREATE DISK QUOTA roleu1 ON USER u1 with
(quota = ‘1GB’);CREATE DISK QUOTA schemas1 ON SCHEMA s1 with (quota =
‘3MB’);*

just fast reaction - why QUOTA object?

Isn't ALTER SET enough?

Some like

ALTER TABLE a1 SET quote = 1MB;
ALTER USER ...
ALTER SCHEMA ..

New DDL commans looks like too hard hammer .

*3 Simulate a schema out of quota limit case: suppose table a1 and
table a2 are both under schema s1.INSERT INTO a1 SELECT
generate_series(1,1000);INSERT INTO a2 SELECT
generate_series(1,3000000);SELECT pg_sleep(5)INSERT INTO a1 SELECT
generate_series(1,1000);ERROR: schema's disk space quota exceededDROP
TABLE a2;SELECT pg_sleep(5)INSERT INTO a1 SELECT
generate_series(1,1000);INSERT 0 1000ArchitectureDisk quota has the
following components.1. Quota Setting Store is where the disk quota setting
to be stored and accessed. We plan to use catalog table pg_diskquota to
store these information. pg_diskquota is
like:CATALOG(pg_diskquota,6122,DiskQuotaRelationId){ NameData quotaname; /*
diskquota name */ int16 quotatype; /* diskquota type name */ Oid
quotatargetoid; /* diskquota target db object oid*/ int32 quotalimit; /*
diskquota size limit in MB*/ int32 quotaredzone; /* diskquota redzone in
MB*/} FormData_pg_diskquota;2. Quota Change Detector is the monitor of size
change of database objects. We plan to use stat collector to detect the
‘active’ table list at initial stage. But stat collector has some
limitation on finding the active table which is in a running transaction.
Details see TODO section.3. Quota Size Checker is where to calculate the
size and compare with quota limit for database objects. According to
Pavel’s comment, autovacuum launcher and worker process could be a good
reference to disk quota. So we plan to use a disk quota launcher daemon
process and several disk quota worker process to finish this work. Launcher
process is responsible for starting worker process based on a user defined
database list from GUC. Worker process will connect to its target database
and monitor the disk usage for objects in this database. In init stage of
worker process, it will call calculate_total_relation_size() to calculate
the size for each user table. After init stage, worker process will refresh
the disk model every N seconds. Refreshing will only recalculate the size
of tables in ‘active’ table list, which is generated by Quata Change
Detector to minimize the cost.4. Quota Enforcement Operator is where to
check for the quota limitation at postgres backend side. We will firstly
implement it in ExecCheckRTPerms() as pre-running enforcement. It will
check the disk quota of tables being inserted or updated, and report error
if table’s or table’s schema’s or table’s owner’s quota limit is exceeded.
As a native feature, we plan to add more checkpoint to do running query
enforcement. For example, if a disk quota lefts 10MB quota, a query could
insert 1GB data. This query could be allowed in pre-running enforcement
check, but will be cancelled in running query enforcement check. Therefore,
it can improve the accurate of disk quota usage. To achieve this, we plan
to add a checkpoint in lower API such as smgr_extened. Hence, the Quota
Enforcement Operator will check the disk quota usage when smgr_extened is
called. If the quota is over limited, current query will be cancelled.
Highlight1. Native feature.Support native Create/Drop Disk Quota SQL
statement.New catalog table pg_diskquota to store disk quota setting.2.
Auto DML/DDL detection. Table
create/update/insert/delete/vacuum/truncate/drop/schema_change/owner_change,
Schema create/drop and Role create/drop will be detected by disk quota
automatically. 3. Low cost disk quota checker.Worker process of disk quota
need to refresh the disk usage model every N seconds. Since recalculate the
file size using stat() system call is expensive for a large number of
files, we use an ‘active’ table list to reduce the real work at each
iteration. A basic experiment on our init stage implementation on database
with 20K tables shows that the refresh cost is 1% cpu usage and will be
finished within 50ms. Todo/LimitationBefore we propose our patch, we plan
to enhance it with the following ideas:1. Setting database list with disk
quota enabled dynamically without restart database. Since we have the disk
quota launcher process, it could detect the new ‘diskquota_databases’ list
and start/stop the corresponding disk quota worker process.2. Enforcement
when query is running. Considering the case when there is 10MB quota left,
but next query will insert 10GB data. Current enforcement design will allow
this query to be executed. This is limited by the ‘active’ table detection
is generated by stat collector. Postgres backend will only send table stat
information to collector only when the transaction ends. We need a new way
to detect the ‘active’ table even when this table is being modified inside
a running transaction.3. Monitor unlimited number of databases. Current we
set the max number of disk quota worker process to be 10 to reduce the
affection normal workload. But how about if we want to monitor the disk
quota of more than 10 databases? Our solution is to let disk quota launcher
to manage a queue of database need to be monitored. And disk quota worker
process consuming the queue and refresh the disk usage/quota for this
database. After some periods, worker will return the database to the queue,
and fetch the top database from queue to process. The period determine the
delay of detecting disk quota change. To implement this feature, we need to
support a subprocess of postmaster to rebind to another database instead of
the database binded in InitPostgres().4. Support active table detection on
vacuum full and vacuum analyze. Currently vacuum full and vacuum analyze
are not tracked by stat collector.Thanks to Heikki, Pavel Stehule,Chapman
Flack for the former comments on disk quota feature. Any comments on how to
improve disk quota feature are appreciated.*

On Mon, Sep 3, 2018 at 12:05 PM, Pavel Stehule <
pavel.stehule@gmail.com> wrote:

2018-09-03 3:49 GMT+02:00 Hubert Zhang <hzhang@pivotal.io>:

Thanks Pavel.
Your patch did enforcement on storage level(md.c or we could also
use smgr_extend). It's straight forward.
But I prefer to implement disk_quota as a feature with following
objectives:
1 set/alter disk quota setting on different database objects, e.g.
user, database, schema etc. not only a general GUC, but we could set
separate quota limit for a specific objects.
2 enforcement operator should work at two positions: before query is
running and when query is running. The latter one's implementation maybe
similar to your patch.

The patch was just example. The resource quotes should be more
complex - per partition, table, schema, database, user - so GUC are
possible, but not very user friendly.

Our case is specific, but not too much. The servers are used for
multidimensional analyses - and some tables can grow too fast (COPY, INSERT
SELECT). We need to solve limits immediately. The implementation is simple,
so I did it. Same implementation on database level, or schema level needs
some more locks, so it will not be too effective. The resource management
can be complex very complex, and I expect so it will be hard work.

Regards

Pavel

On Sun, Sep 2, 2018 at 8:44 PM, Pavel Stehule <
pavel.stehule@gmail.com> wrote:

Hi

2018-09-02 14:18 GMT+02:00 Hubert Zhang <hzhang@pivotal.io>:

Thanks Chapman.
@Pavel, could you please explain more about your second
suggestion "implement some quotas on storage level?"

See attached patch - it is very simple - and good enough for our
purposes.

Regards

Pavel

We will not keep the long-lived processes attach to all
databases(just like you mentioned servers with thousands of
databases)
And you are right, we could share ideas with autovacuum process,
fork worker processes in need.
"autovacuum checks for tables that have had a large number of
inserted, updated or deleted tuples. These checks use the statistics
collection facility"
diskquota process is similar to autovacuum at caring about insert,
but the difference is that it also care about vucuum full, truncate and
drop. While update and delete may not be interested since no file change
happens. So a separate diskquota process is preferred.

So if we implemented disk quota as a full native feature, and in
the first initial version I prefer to implement the following features:
1 Fork diskquota launcher process under Postmaster serverloop,
which is long-lived.
2 Diskquota launcher process is responsible for
creating diskquota worker process for every database.
3 DIskquota setting is stored in a separate catalog table for each
database.
4 Initialization stage, Diskquota launcher process creates
diskquota worker process for all the databases(traverse like
autovacuum). Worker process calculates disk usage of db objects and
their diskquota setting. If any db object exceeds its quota limit, put them
into the blacklist in the shared memory, which will later be used by
enforcement operator. Worker process exits when works are done.
5 Running stage, Diskquota launcher process creates diskquota worker
process for the database with a large number of insert, copy, truncate,
drop etc. or create disk quota statement. Worker process updates the file
size for db objects containing the result relation, and compare with the
diskquota setting. Again, if exceeds quota limit, put them into blacklist,
remove from blacklist vice versa. Worker process exits when works
are done and a GUC could control the frequency of worker process restart to
a specific database. As you know, this GUC also controls the delay when we
do enforcement.
6 Enforcement. When postgres backend executes queries, check the
blacklist in shared memory to determine whether the query is allowed(before
execute) or need rollback(is executing)?

If we implemented disk quota as an extension, we could just use
background worker to start diskquota launcher process and use
RegisterDynamicBackgroundWorker() to fork child diskquota worker
processes by the launcher process as suggested by @Chapman.
Diskquota setting could be stored in user table in a separate schema for
each database(Schema and table created by create extension statement) just
like what Heikki has done in pg_quota project. But in this case, we need to
create extension for each database before diskquota worker process can be
set up for that database.

Any comments on the above design and which is preferred, native
feature or extension as the POC?

-- Hubert

On Fri, Aug 31, 2018 at 3:32 AM, Pavel Stehule <
pavel.stehule@gmail.com> wrote:

2018-08-30 16:22 GMT+02:00 Chapman Flack <chap@anastigmatix.net>:

On 08/30/2018 09:57 AM, Hubert Zhang wrote:

2 Keep one worker process for each database. But using a

parent/global

quota worker process to manage the lifecycle of database level

worker

processes. It could handle the newly created database(avoid

restart

database) and save resource when a database is not used. But

this needs to

change worker process to be hierarchical. Postmaster becomes

the grandfather

of database level worker processes in this case.

I am using background workers this way in 9.5 at $work.

In my case, one worker lives forever, wakes up on a set period,
and
starts a short-lived worker for every database, waiting for each
one before starting the next.

It was straightforward to implement. Looking back over the code,
I see the global worker assigns its own PID to
worker.bgw_notify_pid
of each of its children, and also obtains a handle for each child
from RegisterDynamicBackgroundWorker().

I imagine the global quota worker would prefer to start workers
for every database and then just wait for notifications from any
of them, but that seems equally straightforward at first glance.

There are servers with thousands databases. Worker per database
is not good idea.

It should to share ideas, code with autovacuum process.

Not sure, how to effective implementation based on bg workers can
be. On servers with large set of databases, large set of tables it can
identify too big table too late.

Isn't better to implement some quotas on storage level?

Regards

Pavel

-Chap

--
Thanks

Hubert Zhang

--
Thanks

Hubert Zhang

--
Thanks

Hubert Zhang

--
Thanks

Hubert Zhang

--
Thanks

Hubert Zhang

#15Hubert Zhang
hzhang@pivotal.io
In reply to: Pavel Stehule (#14)
Re: Proposal for disk quota feature

The quotas or object limits, resource limits are pretty useful and
necessary, but I don't see these like new type of objects, it is much more
some property of current objects. Because we have one syntax for this
purpose I prefer it. Because is not good to have two syntaxes for similar
purpose.

SCHEMA and TABLE are OK for me, But as I mentioned before, ROLE is a
special case when using ALTER SET at this moment.
TABLE and SCHEMA are both database level, e.g. pg_class and pg_namespace
both residents in one database. But ROLE is cluster-level. They don't
belong to a database. ALTER ROLE XXX SET disk_quota = xxx means to set the
quota for the user on all the databases in the first glance. But in our
first stage design, ROLE's quota is bind to a specific database. E.g. Role
Jack could have 10GB quota on database A and 2GB quota on database B.

SQL syntax is not hard to modify, I don't think this should block the main
design of disk quota feature. Is there any comment on the design and
architecture? If no, we'll firstly submit our patch and involve more
discussion?

On Sat, Sep 22, 2018 at 3:03 PM Pavel Stehule <pavel.stehule@gmail.com>
wrote:

so 22. 9. 2018 v 8:48 odesílatel Hubert Zhang <hzhang@pivotal.io> napsal:

But it looks like redundant to current GUC configuration and limits

what do you mean by current GUC configuration? Is that the general block
number limit in your patch? If yes, the difference between GUC and
pg_diskquota catalog is that pg_diskquota will store different quota limit
for the different role, schema or table instead of a single GUC value.

storage is not relevant in this moment.

I don't see to consistent to sets some limits via SET command, or ALTER X
SET, and some other with CREATE QUOTA ON.

The quotas or object limits, resource limits are pretty useful and
necessary, but I don't see these like new type of objects, it is much more
some property of current objects. Because we have one syntax for this
purpose I prefer it. Because is not good to have two syntaxes for similar
purpose.

So instead CREATE DISC QUATA ON SCHEMA xxx some value I prefer

ALTER SCHEMA xxx SET disc_quota = xxx;

The functionality is +/- same. But ALTER XX SET was introduce first, and I
don't feel comfortable to have any new syntax for similar purpose

Regards

Pavel

On Sat, Sep 22, 2018 at 11:17 AM Pavel Stehule <pavel.stehule@gmail.com>
wrote:

pá 21. 9. 2018 v 16:21 odesílatel Hubert Zhang <hzhang@pivotal.io>
napsal:

just fast reaction - why QUOTA object?

Isn't ALTER SET enough?
Some like
ALTER TABLE a1 SET quote = 1MB;
ALTER USER ...
ALTER SCHEMA ..
New DDL commans looks like too hard hammer .

It's an option. Prefer to consider quota setting store together:
CREATE DISK QUOTA way is more nature to store quota setting in a
separate pg_diskquota catalog
While ALTER SET way is more close to store quota setting in pg_class,
pg_role, pg_namespace. etc in an integrated way.
(Note that here I mean nature/close is not must, ALTER SET could also
store in pg_diskquota and vice versa.)

I have not a problem with new special table for storing this
information. But it looks like redundant to current GUC configuration and
limits. Can be messy do some work with ALTER ROLE, and some work via CREATE
QUOTE.

Regards

Pavel

Here are some differences I can think of:
1 pg_role is a global catalog, not per database level. It's harder to
tracker the user's disk usage in the whole clusters(considering 1000+
databases). So the semantic of CREATE DISK QUOTA ON USER is limited: it
only tracks the user's disk usage inside the current database.
2 using separate pg_diskquota could add more field except for quota
limit without adding too many fields in pg_class, e.g. red zone to give the
user a warning or the current disk usage of the db objects.

On Fri, Sep 21, 2018 at 8:01 PM Pavel Stehule <pavel.stehule@gmail.com>
wrote:

pá 21. 9. 2018 v 13:32 odesílatel Hubert Zhang <hzhang@pivotal.io>
napsal:

*Hi all,We redesign disk quota feature based on the comments from
Pavel Stehule and Chapman Flack. Here are the new design.OverviewBasically,
disk quota feature is used to support multi-tenancy environment, different
level of database objects could be set a quota limit to avoid over use of
disk space. A common case could be as follows: DBA could enable disk quota
on a specified database list. DBA could set disk quota limit for
tables/schemas/roles in these databases. Separate disk quota worker process
will monitor the disk usage for these objects and detect the objects which
exceed their quota limit. Queries loading data into these “out of disk
quota” tables/schemas/roles will be cancelled.We are currently working at
init implementation stage. We would like to propose our idea firstly and
get feedbacks from community to do quick iteration.SQL Syntax (How to use
disk quota)1 Specify the databases with disk quota enabled in GUC
“diskquota_databases” in postgresql.conf and restart the database.2 DBA
could set disk quota limit for table/schema/role.CREATE DISK QUOTA tablea1
ON TABLE a1 with (quota = ‘1MB’);CREATE DISK QUOTA roleu1 ON USER u1 with
(quota = ‘1GB’);CREATE DISK QUOTA schemas1 ON SCHEMA s1 with (quota =
‘3MB’);*

just fast reaction - why QUOTA object?

Isn't ALTER SET enough?

Some like

ALTER TABLE a1 SET quote = 1MB;
ALTER USER ...
ALTER SCHEMA ..

New DDL commans looks like too hard hammer .

*3 Simulate a schema out of quota limit case: suppose table a1 and
table a2 are both under schema s1.INSERT INTO a1 SELECT
generate_series(1,1000);INSERT INTO a2 SELECT
generate_series(1,3000000);SELECT pg_sleep(5)INSERT INTO a1 SELECT
generate_series(1,1000);ERROR: schema's disk space quota exceededDROP
TABLE a2;SELECT pg_sleep(5)INSERT INTO a1 SELECT
generate_series(1,1000);INSERT 0 1000ArchitectureDisk quota has the
following components.1. Quota Setting Store is where the disk quota setting
to be stored and accessed. We plan to use catalog table pg_diskquota to
store these information. pg_diskquota is
like:CATALOG(pg_diskquota,6122,DiskQuotaRelationId){ NameData quotaname; /*
diskquota name */ int16 quotatype; /* diskquota type name */ Oid
quotatargetoid; /* diskquota target db object oid*/ int32 quotalimit; /*
diskquota size limit in MB*/ int32 quotaredzone; /* diskquota redzone in
MB*/} FormData_pg_diskquota;2. Quota Change Detector is the monitor of size
change of database objects. We plan to use stat collector to detect the
‘active’ table list at initial stage. But stat collector has some
limitation on finding the active table which is in a running transaction.
Details see TODO section.3. Quota Size Checker is where to calculate the
size and compare with quota limit for database objects. According to
Pavel’s comment, autovacuum launcher and worker process could be a good
reference to disk quota. So we plan to use a disk quota launcher daemon
process and several disk quota worker process to finish this work. Launcher
process is responsible for starting worker process based on a user defined
database list from GUC. Worker process will connect to its target database
and monitor the disk usage for objects in this database. In init stage of
worker process, it will call calculate_total_relation_size() to calculate
the size for each user table. After init stage, worker process will refresh
the disk model every N seconds. Refreshing will only recalculate the size
of tables in ‘active’ table list, which is generated by Quata Change
Detector to minimize the cost.4. Quota Enforcement Operator is where to
check for the quota limitation at postgres backend side. We will firstly
implement it in ExecCheckRTPerms() as pre-running enforcement. It will
check the disk quota of tables being inserted or updated, and report error
if table’s or table’s schema’s or table’s owner’s quota limit is exceeded.
As a native feature, we plan to add more checkpoint to do running query
enforcement. For example, if a disk quota lefts 10MB quota, a query could
insert 1GB data. This query could be allowed in pre-running enforcement
check, but will be cancelled in running query enforcement check. Therefore,
it can improve the accurate of disk quota usage. To achieve this, we plan
to add a checkpoint in lower API such as smgr_extened. Hence, the Quota
Enforcement Operator will check the disk quota usage when smgr_extened is
called. If the quota is over limited, current query will be cancelled.
Highlight1. Native feature.Support native Create/Drop Disk Quota SQL
statement.New catalog table pg_diskquota to store disk quota setting.2.
Auto DML/DDL detection. Table
create/update/insert/delete/vacuum/truncate/drop/schema_change/owner_change,
Schema create/drop and Role create/drop will be detected by disk quota
automatically. 3. Low cost disk quota checker.Worker process of disk quota
need to refresh the disk usage model every N seconds. Since recalculate the
file size using stat() system call is expensive for a large number of
files, we use an ‘active’ table list to reduce the real work at each
iteration. A basic experiment on our init stage implementation on database
with 20K tables shows that the refresh cost is 1% cpu usage and will be
finished within 50ms. Todo/LimitationBefore we propose our patch, we plan
to enhance it with the following ideas:1. Setting database list with disk
quota enabled dynamically without restart database. Since we have the disk
quota launcher process, it could detect the new ‘diskquota_databases’ list
and start/stop the corresponding disk quota worker process.2. Enforcement
when query is running. Considering the case when there is 10MB quota left,
but next query will insert 10GB data. Current enforcement design will allow
this query to be executed. This is limited by the ‘active’ table detection
is generated by stat collector. Postgres backend will only send table stat
information to collector only when the transaction ends. We need a new way
to detect the ‘active’ table even when this table is being modified inside
a running transaction.3. Monitor unlimited number of databases. Current we
set the max number of disk quota worker process to be 10 to reduce the
affection normal workload. But how about if we want to monitor the disk
quota of more than 10 databases? Our solution is to let disk quota launcher
to manage a queue of database need to be monitored. And disk quota worker
process consuming the queue and refresh the disk usage/quota for this
database. After some periods, worker will return the database to the queue,
and fetch the top database from queue to process. The period determine the
delay of detecting disk quota change. To implement this feature, we need to
support a subprocess of postmaster to rebind to another database instead of
the database binded in InitPostgres().4. Support active table detection on
vacuum full and vacuum analyze. Currently vacuum full and vacuum analyze
are not tracked by stat collector.Thanks to Heikki, Pavel Stehule,Chapman
Flack for the former comments on disk quota feature. Any comments on how to
improve disk quota feature are appreciated.*

On Mon, Sep 3, 2018 at 12:05 PM, Pavel Stehule <
pavel.stehule@gmail.com> wrote:

2018-09-03 3:49 GMT+02:00 Hubert Zhang <hzhang@pivotal.io>:

Thanks Pavel.
Your patch did enforcement on storage level(md.c or we could also
use smgr_extend). It's straight forward.
But I prefer to implement disk_quota as a feature with following
objectives:
1 set/alter disk quota setting on different database objects, e.g.
user, database, schema etc. not only a general GUC, but we could set
separate quota limit for a specific objects.
2 enforcement operator should work at two positions: before query
is running and when query is running. The latter one's implementation maybe
similar to your patch.

The patch was just example. The resource quotes should be more
complex - per partition, table, schema, database, user - so GUC are
possible, but not very user friendly.

Our case is specific, but not too much. The servers are used for
multidimensional analyses - and some tables can grow too fast (COPY, INSERT
SELECT). We need to solve limits immediately. The implementation is simple,
so I did it. Same implementation on database level, or schema level needs
some more locks, so it will not be too effective. The resource management
can be complex very complex, and I expect so it will be hard work.

Regards

Pavel

On Sun, Sep 2, 2018 at 8:44 PM, Pavel Stehule <
pavel.stehule@gmail.com> wrote:

Hi

2018-09-02 14:18 GMT+02:00 Hubert Zhang <hzhang@pivotal.io>:

Thanks Chapman.
@Pavel, could you please explain more about your second
suggestion "implement some quotas on storage level?"

See attached patch - it is very simple - and good enough for our
purposes.

Regards

Pavel

We will not keep the long-lived processes attach to all
databases(just like you mentioned servers with thousands of
databases)
And you are right, we could share ideas with autovacuum process,
fork worker processes in need.
"autovacuum checks for tables that have had a large number of
inserted, updated or deleted tuples. These checks use the statistics
collection facility"
diskquota process is similar to autovacuum at caring about
insert, but the difference is that it also care about vucuum full, truncate
and drop. While update and delete may not be interested since no file
change happens. So a separate diskquota process is preferred.

So if we implemented disk quota as a full native feature, and in
the first initial version I prefer to implement the following features:
1 Fork diskquota launcher process under Postmaster serverloop,
which is long-lived.
2 Diskquota launcher process is responsible for
creating diskquota worker process for every database.
3 DIskquota setting is stored in a separate catalog table for
each database.
4 Initialization stage, Diskquota launcher process creates
diskquota worker process for all the databases(traverse like
autovacuum). Worker process calculates disk usage of db objects and
their diskquota setting. If any db object exceeds its quota limit, put them
into the blacklist in the shared memory, which will later be used by
enforcement operator. Worker process exits when works are done.
5 Running stage, Diskquota launcher process creates diskquota worker
process for the database with a large number of insert, copy, truncate,
drop etc. or create disk quota statement. Worker process updates the file
size for db objects containing the result relation, and compare with the
diskquota setting. Again, if exceeds quota limit, put them into blacklist,
remove from blacklist vice versa. Worker process exits when
works are done and a GUC could control the frequency of worker process
restart to a specific database. As you know, this GUC also controls the
delay when we do enforcement.
6 Enforcement. When postgres backend executes queries, check the
blacklist in shared memory to determine whether the query is allowed(before
execute) or need rollback(is executing)?

If we implemented disk quota as an extension, we could just use
background worker to start diskquota launcher process and use
RegisterDynamicBackgroundWorker() to fork child diskquota worker
processes by the launcher process as suggested by @Chapman.
Diskquota setting could be stored in user table in a separate schema for
each database(Schema and table created by create extension statement) just
like what Heikki has done in pg_quota project. But in this case, we need to
create extension for each database before diskquota worker process can be
set up for that database.

Any comments on the above design and which is preferred, native
feature or extension as the POC?

-- Hubert

On Fri, Aug 31, 2018 at 3:32 AM, Pavel Stehule <
pavel.stehule@gmail.com> wrote:

2018-08-30 16:22 GMT+02:00 Chapman Flack <chap@anastigmatix.net>
:

On 08/30/2018 09:57 AM, Hubert Zhang wrote:

2 Keep one worker process for each database. But using a

parent/global

quota worker process to manage the lifecycle of database

level worker

processes. It could handle the newly created database(avoid

restart

database) and save resource when a database is not used. But

this needs to

change worker process to be hierarchical. Postmaster becomes

the grandfather

of database level worker processes in this case.

I am using background workers this way in 9.5 at $work.

In my case, one worker lives forever, wakes up on a set period,
and
starts a short-lived worker for every database, waiting for each
one before starting the next.

It was straightforward to implement. Looking back over the code,
I see the global worker assigns its own PID to
worker.bgw_notify_pid
of each of its children, and also obtains a handle for each
child
from RegisterDynamicBackgroundWorker().

I imagine the global quota worker would prefer to start workers
for every database and then just wait for notifications from any
of them, but that seems equally straightforward at first glance.

There are servers with thousands databases. Worker per database
is not good idea.

It should to share ideas, code with autovacuum process.

Not sure, how to effective implementation based on bg workers
can be. On servers with large set of databases, large set of tables it can
identify too big table too late.

Isn't better to implement some quotas on storage level?

Regards

Pavel

-Chap

--
Thanks

Hubert Zhang

--
Thanks

Hubert Zhang

--
Thanks

Hubert Zhang

--
Thanks

Hubert Zhang

--
Thanks

Hubert Zhang

--
Thanks

Hubert Zhang

#16Haozhou Wang
hawang@pivotal.io
In reply to: Hubert Zhang (#15)
1 attachment(s)
Re: Proposal for disk quota feature

Hi All,

We prepare a WIP patch which is implemented based on our redesigned
proposal. This patch is for demonstration our ideas and currently can
support create/delete disk quota for table/schema/user.

Here is the usage,
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
1 Specify the databases with disk quota enabled in GUC
“diskquota_databases” in postgresql.conf and restart the database.

2 DBA could set disk quota limit for table/schema/role.

CREATE DISK QUOTA tablea1 ON TABLE a1 with (quota = ‘1MB’);
CREATE DISK QUOTA roleu1 ON USER u1 with (quota = ‘1GB’);
CREATE DISK QUOTA schemas1 ON SCHEMA s1 with (quota = ‘3MB’);

3 Simulate a schema out of quota limit case: suppose table a1 and table a2
are both under schema s1.

INSERT INTO a1 SELECT generate_series(1,1000);
INSERT INTO a2 SELECT generate_series(1,3000000);
SELECT pg_sleep(5)

INSERT INTO a1 SELECT generate_series(1,1000);
ERROR: schema's disk space quota exceeded

DROP TABLE a2;
SELECT pg_sleep(5)

INSERT INTO a1 SELECT generate_series(1,1000);
INSERT 0 1000
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

This patch works well on small size of databases (tens of databases) as
each database in GUC diskquota_databases will have a background worker
process for monitoring purpose.

Thanks very much for Heikki's suggestion and comments.
All comments are very welcome!

On Mon, Sep 24, 2018 at 9:56 PM Hubert Zhang <hzhang@pivotal.io> wrote:

The quotas or object limits, resource limits are pretty useful and

necessary, but I don't see these like new type of objects, it is much more
some property of current objects. Because we have one syntax for this
purpose I prefer it. Because is not good to have two syntaxes for similar
purpose.

SCHEMA and TABLE are OK for me, But as I mentioned before, ROLE is a
special case when using ALTER SET at this moment.
TABLE and SCHEMA are both database level, e.g. pg_class and pg_namespace
both residents in one database. But ROLE is cluster-level. They don't
belong to a database. ALTER ROLE XXX SET disk_quota = xxx means to set the
quota for the user on all the databases in the first glance. But in our
first stage design, ROLE's quota is bind to a specific database. E.g. Role
Jack could have 10GB quota on database A and 2GB quota on database B.

SQL syntax is not hard to modify, I don't think this should block the
main design of disk quota feature. Is there any comment on the design and
architecture? If no, we'll firstly submit our patch and involve more
discussion?

On Sat, Sep 22, 2018 at 3:03 PM Pavel Stehule <pavel.stehule@gmail.com>
wrote:

so 22. 9. 2018 v 8:48 odesílatel Hubert Zhang <hzhang@pivotal.io> napsal:

But it looks like redundant to current GUC configuration and limits

what do you mean by current GUC configuration? Is that the general block
number limit in your patch? If yes, the difference between GUC and
pg_diskquota catalog is that pg_diskquota will store different quota limit
for the different role, schema or table instead of a single GUC value.

storage is not relevant in this moment.

I don't see to consistent to sets some limits via SET command, or ALTER X
SET, and some other with CREATE QUOTA ON.

The quotas or object limits, resource limits are pretty useful and
necessary, but I don't see these like new type of objects, it is much more
some property of current objects. Because we have one syntax for this
purpose I prefer it. Because is not good to have two syntaxes for similar
purpose.

So instead CREATE DISC QUATA ON SCHEMA xxx some value I prefer

ALTER SCHEMA xxx SET disc_quota = xxx;

The functionality is +/- same. But ALTER XX SET was introduce first, and
I don't feel comfortable to have any new syntax for similar purpose

Regards

Pavel

On Sat, Sep 22, 2018 at 11:17 AM Pavel Stehule <pavel.stehule@gmail.com>
wrote:

pá 21. 9. 2018 v 16:21 odesílatel Hubert Zhang <hzhang@pivotal.io>
napsal:

just fast reaction - why QUOTA object?

Isn't ALTER SET enough?
Some like
ALTER TABLE a1 SET quote = 1MB;
ALTER USER ...
ALTER SCHEMA ..
New DDL commans looks like too hard hammer .

It's an option. Prefer to consider quota setting store together:
CREATE DISK QUOTA way is more nature to store quota setting in a
separate pg_diskquota catalog
While ALTER SET way is more close to store quota setting in pg_class,
pg_role, pg_namespace. etc in an integrated way.
(Note that here I mean nature/close is not must, ALTER SET could also
store in pg_diskquota and vice versa.)

I have not a problem with new special table for storing this
information. But it looks like redundant to current GUC configuration and
limits. Can be messy do some work with ALTER ROLE, and some work via CREATE
QUOTE.

Regards

Pavel

Here are some differences I can think of:
1 pg_role is a global catalog, not per database level. It's harder to
tracker the user's disk usage in the whole clusters(considering 1000+
databases). So the semantic of CREATE DISK QUOTA ON USER is limited: it
only tracks the user's disk usage inside the current database.
2 using separate pg_diskquota could add more field except for quota
limit without adding too many fields in pg_class, e.g. red zone to give the
user a warning or the current disk usage of the db objects.

On Fri, Sep 21, 2018 at 8:01 PM Pavel Stehule <pavel.stehule@gmail.com>
wrote:

pá 21. 9. 2018 v 13:32 odesílatel Hubert Zhang <hzhang@pivotal.io>
napsal:

*Hi all,We redesign disk quota feature based on the comments from
Pavel Stehule and Chapman Flack. Here are the new design.OverviewBasically,
disk quota feature is used to support multi-tenancy environment, different
level of database objects could be set a quota limit to avoid over use of
disk space. A common case could be as follows: DBA could enable disk quota
on a specified database list. DBA could set disk quota limit for
tables/schemas/roles in these databases. Separate disk quota worker process
will monitor the disk usage for these objects and detect the objects which
exceed their quota limit. Queries loading data into these “out of disk
quota” tables/schemas/roles will be cancelled.We are currently working at
init implementation stage. We would like to propose our idea firstly and
get feedbacks from community to do quick iteration.SQL Syntax (How to use
disk quota)1 Specify the databases with disk quota enabled in GUC
“diskquota_databases” in postgresql.conf and restart the database.2 DBA
could set disk quota limit for table/schema/role.CREATE DISK QUOTA tablea1
ON TABLE a1 with (quota = ‘1MB’);CREATE DISK QUOTA roleu1 ON USER u1 with
(quota = ‘1GB’);CREATE DISK QUOTA schemas1 ON SCHEMA s1 with (quota =
‘3MB’);*

just fast reaction - why QUOTA object?

Isn't ALTER SET enough?

Some like

ALTER TABLE a1 SET quote = 1MB;
ALTER USER ...
ALTER SCHEMA ..

New DDL commans looks like too hard hammer .

*3 Simulate a schema out of quota limit case: suppose table a1 and
table a2 are both under schema s1.INSERT INTO a1 SELECT
generate_series(1,1000);INSERT INTO a2 SELECT
generate_series(1,3000000);SELECT pg_sleep(5)INSERT INTO a1 SELECT
generate_series(1,1000);ERROR: schema's disk space quota exceededDROP
TABLE a2;SELECT pg_sleep(5)INSERT INTO a1 SELECT
generate_series(1,1000);INSERT 0 1000ArchitectureDisk quota has the
following components.1. Quota Setting Store is where the disk quota setting
to be stored and accessed. We plan to use catalog table pg_diskquota to
store these information. pg_diskquota is
like:CATALOG(pg_diskquota,6122,DiskQuotaRelationId){ NameData quotaname; /*
diskquota name */ int16 quotatype; /* diskquota type name */ Oid
quotatargetoid; /* diskquota target db object oid*/ int32 quotalimit; /*
diskquota size limit in MB*/ int32 quotaredzone; /* diskquota redzone in
MB*/} FormData_pg_diskquota;2. Quota Change Detector is the monitor of size
change of database objects. We plan to use stat collector to detect the
‘active’ table list at initial stage. But stat collector has some
limitation on finding the active table which is in a running transaction.
Details see TODO section.3. Quota Size Checker is where to calculate the
size and compare with quota limit for database objects. According to
Pavel’s comment, autovacuum launcher and worker process could be a good
reference to disk quota. So we plan to use a disk quota launcher daemon
process and several disk quota worker process to finish this work. Launcher
process is responsible for starting worker process based on a user defined
database list from GUC. Worker process will connect to its target database
and monitor the disk usage for objects in this database. In init stage of
worker process, it will call calculate_total_relation_size() to calculate
the size for each user table. After init stage, worker process will refresh
the disk model every N seconds. Refreshing will only recalculate the size
of tables in ‘active’ table list, which is generated by Quata Change
Detector to minimize the cost.4. Quota Enforcement Operator is where to
check for the quota limitation at postgres backend side. We will firstly
implement it in ExecCheckRTPerms() as pre-running enforcement. It will
check the disk quota of tables being inserted or updated, and report error
if table’s or table’s schema’s or table’s owner’s quota limit is exceeded.
As a native feature, we plan to add more checkpoint to do running query
enforcement. For example, if a disk quota lefts 10MB quota, a query could
insert 1GB data. This query could be allowed in pre-running enforcement
check, but will be cancelled in running query enforcement check. Therefore,
it can improve the accurate of disk quota usage. To achieve this, we plan
to add a checkpoint in lower API such as smgr_extened. Hence, the Quota
Enforcement Operator will check the disk quota usage when smgr_extened is
called. If the quota is over limited, current query will be cancelled.
Highlight1. Native feature.Support native Create/Drop Disk Quota SQL
statement.New catalog table pg_diskquota to store disk quota setting.2.
Auto DML/DDL detection. Table
create/update/insert/delete/vacuum/truncate/drop/schema_change/owner_change,
Schema create/drop and Role create/drop will be detected by disk quota
automatically. 3. Low cost disk quota checker.Worker process of disk quota
need to refresh the disk usage model every N seconds. Since recalculate the
file size using stat() system call is expensive for a large number of
files, we use an ‘active’ table list to reduce the real work at each
iteration. A basic experiment on our init stage implementation on database
with 20K tables shows that the refresh cost is 1% cpu usage and will be
finished within 50ms. Todo/LimitationBefore we propose our patch, we plan
to enhance it with the following ideas:1. Setting database list with disk
quota enabled dynamically without restart database. Since we have the disk
quota launcher process, it could detect the new ‘diskquota_databases’ list
and start/stop the corresponding disk quota worker process.2. Enforcement
when query is running. Considering the case when there is 10MB quota left,
but next query will insert 10GB data. Current enforcement design will allow
this query to be executed. This is limited by the ‘active’ table detection
is generated by stat collector. Postgres backend will only send table stat
information to collector only when the transaction ends. We need a new way
to detect the ‘active’ table even when this table is being modified inside
a running transaction.3. Monitor unlimited number of databases. Current we
set the max number of disk quota worker process to be 10 to reduce the
affection normal workload. But how about if we want to monitor the disk
quota of more than 10 databases? Our solution is to let disk quota launcher
to manage a queue of database need to be monitored. And disk quota worker
process consuming the queue and refresh the disk usage/quota for this
database. After some periods, worker will return the database to the queue,
and fetch the top database from queue to process. The period determine the
delay of detecting disk quota change. To implement this feature, we need to
support a subprocess of postmaster to rebind to another database instead of
the database binded in InitPostgres().4. Support active table detection on
vacuum full and vacuum analyze. Currently vacuum full and vacuum analyze
are not tracked by stat collector.Thanks to Heikki, Pavel Stehule,Chapman
Flack for the former comments on disk quota feature. Any comments on how to
improve disk quota feature are appreciated.*

On Mon, Sep 3, 2018 at 12:05 PM, Pavel Stehule <
pavel.stehule@gmail.com> wrote:

2018-09-03 3:49 GMT+02:00 Hubert Zhang <hzhang@pivotal.io>:

Thanks Pavel.
Your patch did enforcement on storage level(md.c or we could also
use smgr_extend). It's straight forward.
But I prefer to implement disk_quota as a feature with following
objectives:
1 set/alter disk quota setting on different database objects, e.g.
user, database, schema etc. not only a general GUC, but we could set
separate quota limit for a specific objects.
2 enforcement operator should work at two positions: before query
is running and when query is running. The latter one's implementation maybe
similar to your patch.

The patch was just example. The resource quotes should be more
complex - per partition, table, schema, database, user - so GUC are
possible, but not very user friendly.

Our case is specific, but not too much. The servers are used for
multidimensional analyses - and some tables can grow too fast (COPY, INSERT
SELECT). We need to solve limits immediately. The implementation is simple,
so I did it. Same implementation on database level, or schema level needs
some more locks, so it will not be too effective. The resource management
can be complex very complex, and I expect so it will be hard work.

Regards

Pavel

On Sun, Sep 2, 2018 at 8:44 PM, Pavel Stehule <
pavel.stehule@gmail.com> wrote:

Hi

2018-09-02 14:18 GMT+02:00 Hubert Zhang <hzhang@pivotal.io>:

Thanks Chapman.
@Pavel, could you please explain more about your second
suggestion "implement some quotas on storage level?"

See attached patch - it is very simple - and good enough for our
purposes.

Regards

Pavel

We will not keep the long-lived processes attach to all
databases(just like you mentioned servers with thousands of
databases)
And you are right, we could share ideas with autovacuum process,
fork worker processes in need.
"autovacuum checks for tables that have had a large number of
inserted, updated or deleted tuples. These checks use the statistics
collection facility"
diskquota process is similar to autovacuum at caring about
insert, but the difference is that it also care about vucuum full, truncate
and drop. While update and delete may not be interested since no file
change happens. So a separate diskquota process is preferred.

So if we implemented disk quota as a full native feature, and
in the first initial version I prefer to implement the following features:
1 Fork diskquota launcher process under Postmaster serverloop,
which is long-lived.
2 Diskquota launcher process is responsible for
creating diskquota worker process for every database.
3 DIskquota setting is stored in a separate catalog table for
each database.
4 Initialization stage, Diskquota launcher process creates
diskquota worker process for all the databases(traverse like
autovacuum). Worker process calculates disk usage of db objects and
their diskquota setting. If any db object exceeds its quota limit, put them
into the blacklist in the shared memory, which will later be used by
enforcement operator. Worker process exits when works are done.
5 Running stage, Diskquota launcher process creates diskquota worker
process for the database with a large number of insert, copy, truncate,
drop etc. or create disk quota statement. Worker process updates the file
size for db objects containing the result relation, and compare with the
diskquota setting. Again, if exceeds quota limit, put them into blacklist,
remove from blacklist vice versa. Worker process exits when
works are done and a GUC could control the frequency of worker process
restart to a specific database. As you know, this GUC also controls the
delay when we do enforcement.
6 Enforcement. When postgres backend executes queries, check the
blacklist in shared memory to determine whether the query is allowed(before
execute) or need rollback(is executing)?

If we implemented disk quota as an extension, we could just use
background worker to start diskquota launcher process and use
RegisterDynamicBackgroundWorker() to fork child diskquota
worker processes by the launcher process as suggested by @Chapman.
Diskquota setting could be stored in user table in a separate schema for
each database(Schema and table created by create extension statement) just
like what Heikki has done in pg_quota project. But in this case, we need to
create extension for each database before diskquota worker process can be
set up for that database.

Any comments on the above design and which is preferred, native
feature or extension as the POC?

-- Hubert

On Fri, Aug 31, 2018 at 3:32 AM, Pavel Stehule <
pavel.stehule@gmail.com> wrote:

2018-08-30 16:22 GMT+02:00 Chapman Flack <chap@anastigmatix.net

:

On 08/30/2018 09:57 AM, Hubert Zhang wrote:

2 Keep one worker process for each database. But using a

parent/global

quota worker process to manage the lifecycle of database

level worker

processes. It could handle the newly created database(avoid

restart

database) and save resource when a database is not used. But

this needs to

change worker process to be hierarchical. Postmaster becomes

the grandfather

of database level worker processes in this case.

I am using background workers this way in 9.5 at $work.

In my case, one worker lives forever, wakes up on a set
period, and
starts a short-lived worker for every database, waiting for
each
one before starting the next.

It was straightforward to implement. Looking back over the
code,
I see the global worker assigns its own PID to
worker.bgw_notify_pid
of each of its children, and also obtains a handle for each
child
from RegisterDynamicBackgroundWorker().

I imagine the global quota worker would prefer to start workers
for every database and then just wait for notifications from
any
of them, but that seems equally straightforward at first
glance.

There are servers with thousands databases. Worker per database
is not good idea.

It should to share ideas, code with autovacuum process.

Not sure, how to effective implementation based on bg workers
can be. On servers with large set of databases, large set of tables it can
identify too big table too late.

Isn't better to implement some quotas on storage level?

Regards

Pavel

-Chap

--
Thanks

Hubert Zhang

--
Thanks

Hubert Zhang

--
Thanks

Hubert Zhang

--
Thanks

Hubert Zhang

--
Thanks

Hubert Zhang

--
Thanks

Hubert Zhang

--
Regards,
Haozhou

Attachments:

0001-WIP-Add-disk-quota-feature.patchapplication/octet-stream; name=0001-WIP-Add-disk-quota-feature.patchDownload
From a5f70125125a0e9f69c1bb8687e9e3bdc98dd489 Mon Sep 17 00:00:00 2001
From: Hubert Zhang <hzhang@pivotal.io>
Date: Mon, 3 Sep 2018 09:40:09 +0000
Subject: [PATCH] WIP: Add disk quota feature.

Support native SQL for create/drop disk quota setting
Add disk quota launcher and worker process to monitor
the disk usage for each table/schema/role
Add enforcement when quota limit is exceeded.
---
 src/backend/catalog/Makefile               |    2 +-
 src/backend/commands/Makefile              |    2 +-
 src/backend/commands/diskquotacmd.c        |  223 ++++
 src/backend/commands/event_trigger.c       |    1 +
 src/backend/executor/execMain.c            |    6 +
 src/backend/nodes/copyfuncs.c              |   31 +
 src/backend/nodes/equalfuncs.c             |   26 +
 src/backend/parser/gram.y                  |   91 +-
 src/backend/postmaster/Makefile            |    2 +-
 src/backend/postmaster/diskquota.c         | 1810 ++++++++++++++++++++++++++++
 src/backend/postmaster/pgstat.c            |   20 +
 src/backend/postmaster/postmaster.c        |  251 +++-
 src/backend/storage/ipc/ipci.c             |    3 +
 src/backend/storage/lmgr/lwlocknames.txt   |    1 +
 src/backend/storage/lmgr/proc.c            |    9 +-
 src/backend/tcop/postgres.c                |   12 +
 src/backend/tcop/utility.c                 |   26 +
 src/backend/utils/adt/dbsize.c             |   17 +
 src/backend/utils/cache/syscache.c         |   34 +
 src/backend/utils/init/miscinit.c          |    5 +-
 src/backend/utils/init/postinit.c          |   19 +-
 src/backend/utils/misc/guc.c               |   40 +
 src/bin/psql/tab-complete.c                |    1 +
 src/include/catalog/indexing.h             |    8 +
 src/include/catalog/pg_diskquota.h         |   60 +
 src/include/commands/diskquotacmd.h        |   30 +
 src/include/nodes/nodes.h                  |    2 +
 src/include/nodes/parsenodes.h             |   32 +
 src/include/parser/kwlist.h                |    2 +
 src/include/pgstat.h                       |    3 +
 src/include/postmaster/diskquota.h         |   72 ++
 src/include/storage/pmsignal.h             |    2 +
 src/include/utils/dbsize.h                 |   18 +
 src/include/utils/guc_tables.h             |    1 +
 src/include/utils/syscache.h               |    3 +
 src/test/regress/expected/disk_quota.out   |   47 +
 src/test/regress/expected/sanity_check.out |    2 +
 src/test/regress/parallel_schedule         |    2 +-
 src/test/regress/serial_schedule           |    1 +
 src/test/regress/sql/disk_quota.sql        |   54 +
 40 files changed, 2933 insertions(+), 38 deletions(-)
 create mode 100644 src/backend/commands/diskquotacmd.c
 create mode 100644 src/backend/postmaster/diskquota.c
 create mode 100644 src/include/catalog/pg_diskquota.h
 create mode 100644 src/include/commands/diskquotacmd.h
 create mode 100644 src/include/postmaster/diskquota.h
 create mode 100644 src/include/utils/dbsize.h
 create mode 100644 src/test/regress/expected/disk_quota.out
 create mode 100644 src/test/regress/sql/disk_quota.sql

diff --git a/src/backend/catalog/Makefile b/src/backend/catalog/Makefile
index 0865240..9e57fe5 100644
--- a/src/backend/catalog/Makefile
+++ b/src/backend/catalog/Makefile
@@ -46,7 +46,7 @@ CATALOG_HEADERS := \
 	pg_default_acl.h pg_init_privs.h pg_seclabel.h pg_shseclabel.h \
 	pg_collation.h pg_partitioned_table.h pg_range.h pg_transform.h \
 	pg_sequence.h pg_publication.h pg_publication_rel.h pg_subscription.h \
-	pg_subscription_rel.h
+	pg_subscription_rel.h pg_diskquota.h
 
 GENERATED_HEADERS := $(CATALOG_HEADERS:%.h=%_d.h) schemapg.h
 
diff --git a/src/backend/commands/Makefile b/src/backend/commands/Makefile
index 4a6c99e..d5a3ebb 100644
--- a/src/backend/commands/Makefile
+++ b/src/backend/commands/Makefile
@@ -14,7 +14,7 @@ include $(top_builddir)/src/Makefile.global
 
 OBJS = amcmds.o aggregatecmds.o alter.o analyze.o async.o cluster.o comment.o \
 	collationcmds.o constraint.o conversioncmds.o copy.o createas.o \
-	dbcommands.o define.o discard.o dropcmds.o \
+	dbcommands.o define.o discard.o diskquotacmd.o dropcmds.o \
 	event_trigger.o explain.o extension.o foreigncmds.o functioncmds.o \
 	indexcmds.o lockcmds.o matview.o operatorcmds.o opclasscmds.o \
 	policy.o portalcmds.o prepare.o proclang.o publicationcmds.o \
diff --git a/src/backend/commands/diskquotacmd.c b/src/backend/commands/diskquotacmd.c
new file mode 100644
index 0000000..b110e0e
--- /dev/null
+++ b/src/backend/commands/diskquotacmd.c
@@ -0,0 +1,223 @@
+/*-------------------------------------------------------------------------
+ *
+ * disk_quota.c
+ *	  support for disk quota in different level.
+ *
+ *
+ * Portions Copyright (c) 1996-2018, PostgreSQL Global Development Group
+ *
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "access/heapam.h"
+#include "access/htup_details.h"
+#include "catalog/dependency.h"
+#include "catalog/indexing.h"
+#include "catalog/namespace.h"
+#include "catalog/pg_diskquota.h"
+#include "commands/diskquotacmd.h"
+#include "miscadmin.h"
+#include "utils/acl.h"
+#include "utils/builtins.h"
+#include "utils/guc.h"
+#include "utils/lsyscache.h"
+#include "utils/rel.h"
+#include "utils/syscache.h"
+
+void CreateDiskQuota(CreateDiskQuotaStmt *stmt)
+{
+	Relation	disk_quota_rel;
+	Datum		quota_values[Natts_pg_diskquota];
+	bool		quota_nulls[Natts_pg_diskquota];
+	Oid			db_object_oid = InvalidOid;
+	Oid			disk_quota_oid = InvalidOid;
+	HeapTuple	tuple;
+	Oid			ownerId;
+	ListCell   *cell;
+	bool		quota_set = false;
+
+
+	disk_quota_rel = heap_open(DiskQuotaRelationId, RowExclusiveLock);
+
+
+	/* Must be super user */
+	if (!superuser())
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+				 errmsg("permission denied to create disk quota \"%s\"",
+						stmt->quotaname),
+				 errhint("Must be superuser to create a disk quota.")));
+
+	/* For dependency only */
+	ownerId = GetUserId();
+
+	/* Check that there is no disk quota entry with the same name */
+	if (OidIsValid(GetDiskQuotaOidByName(stmt->quotaname)))
+		ereport(ERROR,
+				(errcode(ERRCODE_DUPLICATE_OBJECT),
+				 errmsg("disk quota entry \"%s\" already exists",
+						stmt->quotaname)));
+
+	/*
+	 * Insert tuple into pg_diskquota
+	 */
+	memset(quota_values, 0, sizeof(quota_values));
+	memset(quota_nulls, false, sizeof(quota_nulls));
+
+	quota_values[Anum_pg_diskquota_quotaname - 1] =
+		DirectFunctionCall1(namein, CStringGetDatum(stmt->quotaname));
+
+	quota_values[Anum_pg_diskquota_quotatype - 1] = Int16GetDatum((int16) stmt->dbobjtype);
+
+	/*
+	 * Search for related database object OID
+	 */
+
+	switch (stmt->dbobjtype)
+	{
+		case DISK_QUOTA_TABLE:
+		{
+			db_object_oid = RangeVarGetRelidExtended(stmt->table, NoLock, RVR_MISSING_OK, NULL, NULL);
+
+			if (!OidIsValid(db_object_oid)){
+				if (stmt->table->schemaname)
+				{
+					ereport(ERROR, (errmsg("could not create disk quota, TABLE '%s.%s' does not exist",
+										stmt->table->schemaname, stmt->table->relname)));
+				}
+				else
+				{
+					ereport(ERROR, (errmsg("could not create disk quota, TABLE '%s' does not exist",
+										stmt->table->relname)));
+				}
+
+			}
+			break;
+		}
+		case DISK_QUOTA_SCHEMA:
+		{
+			db_object_oid = LookupNamespaceNoError(stmt->objname);
+
+			if (!OidIsValid(db_object_oid))
+			{
+
+				ereport(ERROR, (errmsg("could not create disk quota, SCHEMA '%s' does not exist",
+									stmt->objname)));
+
+			}
+			break;
+
+		}
+		case DISK_QUOTA_USER:
+		{
+			db_object_oid = get_role_oid(stmt->objname, true);
+
+			if (!OidIsValid(db_object_oid))
+			{
+				ereport(ERROR, (errmsg("could not create disk quota, USER '%s' does not exist",
+									stmt->objname)));
+
+			}
+			break;
+
+		}
+		default:
+		{
+			ereport(ERROR, (errmsg("could not create disk quota, Unknown DB OBJECT TYPE")));
+		}
+	}
+
+	quota_values[Anum_pg_diskquota_quotatargetoid - 1] = ObjectIdGetDatum(db_object_oid);
+
+	foreach(cell, stmt->options)
+	{
+		DefElem    *def = (DefElem *) lfirst(cell);
+
+		if (strcmp(def->defname, "quota") == 0 && quota_set == false)
+		{
+			int limitinMB;
+			const char *hintmsg;
+			if (!parse_int(strVal(def->arg), &limitinMB, GUC_UNIT_MB, &hintmsg))
+			{
+				ereport(ERROR,
+						(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+						 errmsg("invalid value for integer option \"%s\": %s",
+								def->defname, strVal(def->arg))));
+			}
+			quota_values[Anum_pg_diskquota_quotalimit- 1] = limitinMB;
+			quota_set = true;
+		}
+		else
+		{
+			if (quota_set)
+			{
+				ereport(ERROR,
+						(errmsg("duplicate quota settings")));
+
+			} else
+			{
+				ereport(ERROR,
+						(errcode(ERRCODE_UNDEFINED_OBJECT),
+						 errmsg("unknown disk quota option %s", def->defname)));
+			}
+
+		}
+	}
+
+
+	tuple = heap_form_tuple(disk_quota_rel->rd_att, quota_values, quota_nulls);
+
+	disk_quota_oid = CatalogTupleInsert(disk_quota_rel, tuple);
+
+	heap_freetuple(tuple);
+
+	recordDependencyOnOwner(DiskQuotaRelationId, disk_quota_oid, ownerId);
+
+	if (!quota_set) {
+		ereport(ERROR,
+			(errmsg("quota is not set in option"),
+			 errhint("Add quota='size' in option")));
+	}
+
+	heap_close(disk_quota_rel, RowExclusiveLock);
+}
+
+void DropDiskQuota(DropDiskQuotaStmt *stmt)
+{
+	Relation	disk_quota_rel;
+	HeapTuple	disk_quota_tuple;
+
+
+	disk_quota_rel = heap_open(DiskQuotaRelationId, RowExclusiveLock);
+
+	disk_quota_tuple = SearchSysCache1(DISKQUOTANAME, CStringGetDatum(stmt->quotaname));
+
+	if (!HeapTupleIsValid(disk_quota_tuple)) {
+		if (stmt->missing_ok)
+			return;
+		else
+		{
+			ereport(ERROR,
+				(errmsg("disk quota %s does not exist", stmt->quotaname)));
+		}
+	}
+
+	CatalogTupleDelete(disk_quota_rel, &disk_quota_tuple->t_self);
+
+	ReleaseSysCache(disk_quota_tuple);
+
+	heap_close(disk_quota_rel, RowExclusiveLock);
+}
+
+char *GetDiskQuotaName(Oid quotaid)
+{
+	return "none";
+}
+
+Oid GetDiskQuotaOidByName(const char *name)
+{
+	return GetSysCacheOid1(DISKQUOTANAME, CStringGetDatum(name));
+}
diff --git a/src/backend/commands/event_trigger.c b/src/backend/commands/event_trigger.c
index eecc85d..b2847f2 100644
--- a/src/backend/commands/event_trigger.c
+++ b/src/backend/commands/event_trigger.c
@@ -94,6 +94,7 @@ static event_trigger_support_data event_trigger_support[] = {
 	{"CONVERSION", true},
 	{"DATABASE", false},
 	{"DOMAIN", true},
+	{"DISK QUOTA", true},
 	{"EXTENSION", true},
 	{"EVENT TRIGGER", false},
 	{"FOREIGN DATA WRAPPER", true},
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 85d9803..6717292 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -52,6 +52,7 @@
 #include "miscadmin.h"
 #include "optimizer/clauses.h"
 #include "parser/parsetree.h"
+#include "postmaster/diskquota.h"
 #include "rewrite/rewriteManip.h"
 #include "storage/bufmgr.h"
 #include "storage/lmgr.h"
@@ -585,6 +586,11 @@ ExecCheckRTPerms(List *rangeTable, bool ereport_on_violation)
 							   get_rel_name(rte->relid));
 			return false;
 		}
+
+		if (!CheckTableQuota(rte))
+		{
+			return false;
+		}
 	}
 
 	if (ExecutorCheckPerms_hook)
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 7c8220c..25b91be 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -4072,6 +4072,31 @@ _copyAlterExtensionContentsStmt(const AlterExtensionContentsStmt *from)
 	return newnode;
 }
 
+static CreateDiskQuotaStmt *
+_copyCreateDiskQuotaStmt(const CreateDiskQuotaStmt *from)
+{
+	CreateDiskQuotaStmt *newnode = makeNode(CreateDiskQuotaStmt);
+
+	COPY_STRING_FIELD(quotaname);
+	COPY_SCALAR_FIELD(dbobjtype);
+	COPY_STRING_FIELD(objname);
+	COPY_NODE_FIELD(options);
+
+	return newnode;
+}
+
+static DropDiskQuotaStmt *
+_copyDropDiskQuotaStmt(const DropDiskQuotaStmt *from)
+{
+	DropDiskQuotaStmt *newnode = makeNode(DropDiskQuotaStmt);
+
+	COPY_STRING_FIELD(quotaname);
+	COPY_SCALAR_FIELD(missing_ok);
+
+	return newnode;
+}
+
+
 static CreateFdwStmt *
 _copyCreateFdwStmt(const CreateFdwStmt *from)
 {
@@ -5379,6 +5404,12 @@ copyObjectImpl(const void *from)
 		case T_AlterExtensionContentsStmt:
 			retval = _copyAlterExtensionContentsStmt(from);
 			break;
+		case T_CreateDiskQuotaStmt:
+			retval = _copyCreateDiskQuotaStmt(from);
+			break;
+		case T_DropDiskQuotaStmt:
+			retval = _copyDropDiskQuotaStmt(from);
+			break;
 		case T_CreateFdwStmt:
 			retval = _copyCreateFdwStmt(from);
 			break;
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index 378f2fa..30cdb14 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -1857,6 +1857,26 @@ _equalAlterExtensionContentsStmt(const AlterExtensionContentsStmt *a, const Alte
 }
 
 static bool
+_equalCreateDiskQuotaStmt(const CreateDiskQuotaStmt *a, const CreateDiskQuotaStmt *b)
+{
+	COMPARE_STRING_FIELD(quotaname);
+	COMPARE_SCALAR_FIELD(dbobjtype);
+	COMPARE_STRING_FIELD(objname);
+	COMPARE_NODE_FIELD(options);
+
+	return true;
+}
+
+static bool
+_equalDropDiskQuotaStmt(const DropDiskQuotaStmt *a, const DropDiskQuotaStmt *b)
+{
+	COMPARE_STRING_FIELD(quotaname);
+	COMPARE_SCALAR_FIELD(missing_ok);
+
+	return true;
+}
+
+static bool
 _equalCreateFdwStmt(const CreateFdwStmt *a, const CreateFdwStmt *b)
 {
 	COMPARE_STRING_FIELD(fdwname);
@@ -3438,6 +3458,12 @@ equal(const void *a, const void *b)
 		case T_AlterExtensionContentsStmt:
 			retval = _equalAlterExtensionContentsStmt(a, b);
 			break;
+		case T_CreateDiskQuotaStmt:
+			retval = _equalCreateDiskQuotaStmt(a, b);
+			break;
+		case T_DropDiskQuotaStmt:
+			retval = _equalDropDiskQuotaStmt(a, b);
+			break;
 		case T_CreateFdwStmt:
 			retval = _equalCreateFdwStmt(a, b);
 			break;
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 4bd2223..e6395c1 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -258,6 +258,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 		CreateDomainStmt CreateExtensionStmt CreateGroupStmt CreateOpClassStmt
 		CreateOpFamilyStmt AlterOpFamilyStmt CreatePLangStmt
 		CreateSchemaStmt CreateSeqStmt CreateStmt CreateStatsStmt CreateTableSpaceStmt
+		CreateDiskQuotaStmt DropDiskQuotaStmt
 		CreateFdwStmt CreateForeignServerStmt CreateForeignTableStmt
 		CreateAssertStmt CreateTransformStmt CreateTrigStmt CreateEventTrigStmt
 		CreateUserStmt CreateUserMappingStmt CreateRoleStmt CreatePolicyStmt
@@ -407,6 +408,10 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 %type <node>	opt_publication_for_tables publication_for_tables
 %type <value>	publication_name_item
 
+%type <list>    quota_option_list
+%type <str>     obj_name
+%type <defelt>  quota_option_elem
+
 %type <list>	opt_fdw_options fdw_options
 %type <defelt>	fdw_option
 
@@ -625,7 +630,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 
 	DATA_P DATABASE DAY_P DEALLOCATE DEC DECIMAL_P DECLARE DEFAULT DEFAULTS
 	DEFERRABLE DEFERRED DEFINER DELETE_P DELIMITER DELIMITERS DEPENDS DESC
-	DETACH DICTIONARY DISABLE_P DISCARD DISTINCT DO DOCUMENT_P DOMAIN_P
+	DETACH DICTIONARY DISABLE_P DISCARD DISK DISTINCT DO DOCUMENT_P DOMAIN_P
 	DOUBLE_P DROP
 
 	EACH ELSE ENABLE_P ENCODING ENCRYPTED END_P ENUM_P ESCAPE EVENT EXCEPT
@@ -666,7 +671,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 	POSITION PRECEDING PRECISION PRESERVE PREPARE PREPARED PRIMARY
 	PRIOR PRIVILEGES PROCEDURAL PROCEDURE PROCEDURES PROGRAM PUBLICATION
 
-	QUOTE
+	QUOTA QUOTE
 
 	RANGE READ REAL REASSIGN RECHECK RECURSIVE REF REFERENCES REFERENCING
 	REFRESH REINDEX RELATIVE_P RELEASE RENAME REPEATABLE REPLACE REPLICA
@@ -863,6 +868,7 @@ stmt :
 			| CreateAssertStmt
 			| CreateCastStmt
 			| CreateConversionStmt
+			| CreateDiskQuotaStmt
 			| CreateDomainStmt
 			| CreateExtensionStmt
 			| CreateFdwStmt
@@ -898,6 +904,7 @@ stmt :
 			| DoStmt
 			| DropAssertStmt
 			| DropCastStmt
+			| DropDiskQuotaStmt
 			| DropOpClassStmt
 			| DropOpFamilyStmt
 			| DropOwnedStmt
@@ -4772,6 +4779,84 @@ AlterExtensionContentsStmt:
 /*****************************************************************************
  *
  *		QUERY:
+ *             CREATE DISK QUOTA name ON obj_type obj_name WITH options_list
+ *
+ *****************************************************************************/
+
+CreateDiskQuotaStmt: CREATE DISK QUOTA name ON TABLE qualified_name opt_with '(' quota_option_list ')'
+					{
+						 CreateDiskQuotaStmt *n = makeNode(CreateDiskQuotaStmt);
+						 n->quotaname = $4;
+						 n->dbobjtype = DISK_QUOTA_TABLE;
+						 n->table = $7;
+						 n->objname = NULL;
+						 n->options = $10;
+						 $$ = (Node *)n;
+					}
+					|  CREATE DISK QUOTA name ON SCHEMA obj_name opt_with '(' quota_option_list ')'
+					{
+						CreateDiskQuotaStmt *n = makeNode(CreateDiskQuotaStmt);
+						n->quotaname = $4;
+						n->dbobjtype = DISK_QUOTA_SCHEMA;
+						n->table = NULL;
+						n->objname = $7;
+						n->options = $10;
+						$$ = (Node *)n;
+					}
+					|  CREATE DISK QUOTA name ON USER obj_name opt_with '(' quota_option_list ')'
+					{
+						CreateDiskQuotaStmt *n = makeNode(CreateDiskQuotaStmt);
+						n->quotaname = $4;
+						n->dbobjtype = DISK_QUOTA_USER;
+						n->table = NULL;
+						n->objname = $7;
+						n->options = $10;
+						$$ = (Node *)n;
+					}
+			;
+
+obj_name:
+		ColLabel                              { $$ = $1; }
+	;
+
+quota_option_elem:
+		ColLabel '=' Sconst
+		{
+			$$ = makeDefElem($1, (Node *)makeString($3), @1);
+		}
+	;
+
+quota_option_list:
+		quota_option_elem                               { $$ = list_make1($1); }
+		| quota_option_list ',' quota_option_elem       { $$ = lappend($1, $3); }
+	;
+
+/*****************************************************************************
+ *
+ *		QUERY:
+ *             DROP DISK QUOTA name
+ *
+ *****************************************************************************/
+
+DropDiskQuotaStmt: DROP DISK QUOTA name
+					{
+						DropDiskQuotaStmt *n = makeNode(DropDiskQuotaStmt);
+						n->quotaname = $4;
+						n->missing_ok = false;
+						$$ = (Node *) n;
+					}
+					|  DROP DISK QUOTA IF_P EXISTS name
+					{
+						DropDiskQuotaStmt *n = makeNode(DropDiskQuotaStmt);
+						n->quotaname = $6;
+						n->missing_ok = true;
+						$$ = (Node *) n;
+					}
+	;
+
+/*****************************************************************************
+ *
+ *		QUERY:
  *             CREATE FOREIGN DATA WRAPPER name options
  *
  *****************************************************************************/
@@ -15072,6 +15157,7 @@ unreserved_keyword:
 			| DICTIONARY
 			| DISABLE_P
 			| DISCARD
+			| DISK
 			| DOCUMENT_P
 			| DOMAIN_P
 			| DOUBLE_P
@@ -15192,6 +15278,7 @@ unreserved_keyword:
 			| PROCEDURES
 			| PROGRAM
 			| PUBLICATION
+			| QUOTA
 			| QUOTE
 			| RANGE
 			| READ
diff --git a/src/backend/postmaster/Makefile b/src/backend/postmaster/Makefile
index 71c2321..a589325 100644
--- a/src/backend/postmaster/Makefile
+++ b/src/backend/postmaster/Makefile
@@ -13,6 +13,6 @@ top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
 
 OBJS = autovacuum.o bgworker.o bgwriter.o checkpointer.o fork_process.o \
-	pgarch.o pgstat.o postmaster.o startup.o syslogger.o walwriter.o
+	pgarch.o pgstat.o postmaster.o startup.o syslogger.o walwriter.o diskquota.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/postmaster/diskquota.c b/src/backend/postmaster/diskquota.c
new file mode 100644
index 0000000..4c51ee0
--- /dev/null
+++ b/src/backend/postmaster/diskquota.c
@@ -0,0 +1,1810 @@
+/*-------------------------------------------------------------------------
+ *
+ * diskquota.c
+ *
+ * PostgreSQL Integrated disk quota Daemon
+ *
+ * The diskquota system is structured in two different kinds of processes: the
+ * diskquota launcher and the diskquota worker.  The launcher is an
+ * always-running process, started by the postmaster when the diskquota GUC
+ * parameter is set.  The launcher starts diskquota workers based on a given
+ * list of databases that disk quota is enabled.  The workers are the processes
+ * which calculate the disk usage of each monitored objects in the database.
+ *
+ * The diskquota launcher cannot start the worker processes by itself,
+ * because doing so would cause robustness issues (namely, failure to shut
+ * them down on exceptional conditions, and also, since the launcher is
+ * connected to shared memory and is thus subject to corruption there, it is
+ * not as robust as the postmaster).  So it leaves that task to the postmaster.
+ *
+ *
+ * Portions Copyright (c) 1996-2018, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ *	  src/backend/postmaster/diskquota.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include <signal.h>
+#include <sys/time.h>
+#include <unistd.h>
+
+#include "access/heapam.h"
+#include "access/htup_details.h"
+#include "access/multixact.h"
+#include "access/reloptions.h"
+#include "access/transam.h"
+#include "access/xact.h"
+#include "catalog/dependency.h"
+#include "catalog/indexing.h"
+#include "catalog/namespace.h"
+#include "catalog/pg_database.h"
+#include "catalog/pg_diskquota.h"
+#include "catalog/pg_tablespace.h"
+#include "commands/dbcommands.h"
+#include "executor/executor.h"
+#include "lib/ilist.h"
+#include "libpq/pqsignal.h"
+#include "miscadmin.h"
+#include "nodes/makefuncs.h"
+#include "pgstat.h"
+#include "postmaster/diskquota.h"
+#include "postmaster/fork_process.h"
+#include "postmaster/postmaster.h"
+#include "storage/bufmgr.h"
+#include "storage/ipc.h"
+#include "storage/latch.h"
+#include "storage/lmgr.h"
+#include "storage/pmsignal.h"
+#include "storage/proc.h"
+#include "storage/procsignal.h"
+#include "storage/sinvaladt.h"
+#include "storage/smgr.h"
+#include "tcop/tcopprot.h"
+#include "utils/dbsize.h"
+#include "utils/fmgroids.h"
+#include "utils/fmgrprotos.h"
+#include "utils/lsyscache.h"
+#include "utils/memutils.h"
+#include "utils/ps_status.h"
+#include "utils/rel.h"
+#include "utils/snapmgr.h"
+#include "utils/syscache.h"
+#include "utils/timeout.h"
+#include "utils/timestamp.h"
+#include "utils/tqual.h"
+#include "utils/varlena.h"
+
+
+/*
+ * GUC parameters
+ */
+char	   *guc_dq_database_list = NULL;
+bool		diskquota_start_daemon = false;
+int			diskquota_max_workers;
+
+/* cluster level max size of black list */
+#define MAX_DISK_QUOTA_BLACK_ENTRIES 8192 * 1024
+/* cluster level init size of black list */
+#define INIT_DISK_QUOTA_BLACK_ENTRIES 8192
+/* per database level max size of black list */
+#define MAX_LOCAL_DISK_QUOTA_BLACK_ENTRIES 8192
+/* max number of disk quota worker process */
+#define NUM_WORKITEMS			10
+/* initial active table size */
+#define INIT_ACTIVE_TABLE_SIZE	64
+
+/* Flags to tell if we are in an diskquota process */
+static bool am_diskquota_launcher = false;
+static bool am_diskquota_worker = false;
+
+/* Flags set by signal handlers */
+static volatile sig_atomic_t got_SIGHUP = false;
+static volatile sig_atomic_t got_SIGUSR2 = false;
+static volatile sig_atomic_t got_SIGTERM = false;
+
+/* Memory context for long-lived data */
+static MemoryContext diskquotaMemCxt;
+
+typedef struct TableSizeEntry TableSizeEntry;
+typedef struct NamespaceSizeEntry NamespaceSizeEntry;
+typedef struct RoleSizeEntry RoleSizeEntry;
+typedef struct BlackMapEntry BlackMapEntry;
+typedef struct LocalBlackMapEntry LocalBlackMapEntry;
+
+/* local cache of table disk size and corresponding schema and owner */
+struct TableSizeEntry
+{
+	Oid			reloid;
+	Oid			namespaceoid;
+	Oid			owneroid;
+	int64		totalsize;
+};
+
+/* local cache of namespace disk size */
+struct NamespaceSizeEntry
+{
+	Oid			namespaceoid;
+	int64		totalsize;
+};
+
+/* local cache of role disk size */
+struct RoleSizeEntry
+{
+	Oid			owneroid;
+	int64		totalsize;
+};
+
+/* global blacklist for which exceed their quota limit */
+struct BlackMapEntry
+{
+	Oid 		targetoid;
+	Oid			databaseoid;
+};
+
+/* local blacklist for which exceed their quota limit */
+struct LocalBlackMapEntry
+{
+	Oid 		targetoid;
+	bool		isexceeded;
+};
+
+/*
+ * Get active table list to check their size
+ */
+static HTAB *pgstat_table_map = NULL;
+static HTAB *pgstat_active_table_map = NULL;
+
+/* Cache to detect the active table list */
+typedef struct DiskQuotaLocalTableCache
+{
+	Oid			tableid;
+	PgStat_Counter tuples_inserted;
+	PgStat_Counter tuples_updated;
+	PgStat_Counter tuples_deleted;
+	PgStat_Counter vacuum_count;
+	PgStat_Counter autovac_vacuum_count;
+	PgStat_Counter tuples_living;
+} DiskQuotaLocalTableCache;
+
+/* struct to describe the active table */
+typedef struct DiskQuotaActiveHashEntry
+{
+	Oid			reloid;
+	PgStat_Counter t_refcount; /* TODO: using refcount for active queue */
+} DiskQuotaActiveHashEntry;
+
+
+/*
+ * disk_quota_table_stat entry: store the last checked results of table status
+ */
+typedef struct DiskQuotaStateHashEntry
+{
+	Oid			reloid;
+	DiskQuotaLocalTableCache t_entry;
+} DiskQuotaStatHashEntry;
+
+/* using hash table to support incremental update the table size entry.*/
+static HTAB *table_size_map = NULL;
+static HTAB *namespace_size_map = NULL;
+static HTAB *role_size_map = NULL;
+
+/* black list for database objects which exceed their quota limit */
+static HTAB *disk_quota_black_map = NULL;
+static HTAB *local_disk_quota_black_map = NULL;
+
+/* workitem state*/
+typedef enum
+{
+	WIS_INVALID = 0,
+	WIS_STARTING,
+	WIS_RUNNING,
+	WIS_STOPPING,
+}WIState;
+/*
+ * disk quota workitem array, stored in DiskQuotaShmem->dq_workItems.  This
+ * list is mostly protected by DiskQuotaLock, except that if an item is
+ * marked 'active' other processes must not modify the work-identifying
+ * members.
+ */
+typedef struct DiskQuotaWorkItem
+{
+	WIState		dqw_state;
+	Oid			dqw_database;
+	TimestampTz	dqw_last_active;
+	TimestampTz	dqw_launchtime;
+} DiskQuotaWorkItem;
+
+/*-------------
+ * The main diskquota shmem struct.  On shared memory we store this main
+ * struct and the array of WorkerInfo structs.  This struct keeps:
+ *
+ * dq_launcherpid	 the PID of the diskquota launcher
+ * dq_startingWorker pointer to WorkerInfo currently being started (cleared by
+ *					 the worker itself as soon as it's up and running)
+ * dq_workItems		 work item array
+ *
+ * This struct is protected by DiskQuotaLock, except for dq_signal and parts
+ * of the worker list (see above).
+ *-------------
+ */
+typedef struct
+{
+	pid_t				dq_launcherpid;
+	DiskQuotaWorkItem  *dq_startingWorker;
+	DiskQuotaWorkItem	dq_workItems[NUM_WORKITEMS];
+} DiskQuotaShmemStruct;
+
+static DiskQuotaShmemStruct *DiskQuotaShmem;
+static DiskQuotaWorkItem *myWorkItem;
+
+/* PID of launcher, valid only in worker while shutting down */
+int			DiskquotaLauncherPid = 0;
+
+#ifdef EXEC_BACKEND
+static pid_t dqlauncher_forkexec(void);
+static pid_t dqworker_forkexec(void);
+#endif
+NON_EXEC_STATIC void DiskQuotaWorkerMain(int argc, char *argv[]) pg_attribute_noreturn();
+NON_EXEC_STATIC void DiskQuotaLauncherMain(int argc, char *argv[]) pg_attribute_noreturn();
+
+static List *get_database_list(void);
+
+static void do_diskquota(void);
+static void FreeWorkerInfo(int code, Datum arg);
+
+static void dq_sighup_handler(SIGNAL_ARGS);
+static void dql_sigusr2_handler(SIGNAL_ARGS);
+static void dql_sigterm_handler(SIGNAL_ARGS);
+
+static void init_worker_parameters(void);
+static void launcher_init_disk_quota(void);
+static void launcher_monitor_disk_quota(void);
+static void do_start_worker(DiskQuotaWorkItem *item);
+
+static void init_disk_quota_model(void);
+static void refresh_disk_quota_model(void);
+static void calculate_table_disk_usage(void);
+static void calculate_schema_disk_usage(void);
+static void calculate_role_disk_usage(void);
+static void flush_local_black_map(void);
+static void reset_local_black_map(void);
+static void check_disk_quota_by_oid(Oid targetOid, int64 current_usage, int8 diskquota_type);
+static void get_rel_owner_schema(Oid relid, Oid *ownerOid, Oid *nsOid);
+static void update_namespace_map(Oid namespaceoid, int64 updatesize);
+static void update_role_map(Oid owneroid, int64 updatesize);
+static void remove_namespace_map(Oid namespaceoid);
+static void remove_role_map(Oid owneroid);
+static bool check_table_is_active(Oid reloid);
+static void build_active_table_map(void);
+/********************************************************************
+ *					  DISKQUOTA LAUNCHER CODE
+ ********************************************************************/
+
+#ifdef EXEC_BACKEND
+/*
+ * forkexec routine for the diskquota launcher process.
+ *
+ * Format up the arglist, then fork and exec.
+ */
+static pid_t
+dqlauncher_forkexec(void)
+{
+	char	   *av[10];
+	int			ac = 0;
+
+	av[ac++] = "postgres";
+	av[ac++] = "--forkdqlauncher";
+	av[ac++] = NULL;			/* filled in by postmaster_forkexec */
+	av[ac] = NULL;
+
+	Assert(ac < lengthof(av));
+
+	return postmaster_forkexec(ac, av);
+}
+
+/*
+ * We need this set from the outside, before InitProcess is called
+ */
+void
+DiskquotaLauncherIAm(void)
+{
+	am_diskquota_launcher = true;
+}
+#endif
+
+/*
+ * Main entry point for diskquota launcher process, to be called from the
+ * postmaster.
+ */
+int
+StartDiskQuotaLauncher(void)
+{
+	pid_t		DiskQuotaPID;
+
+#ifdef EXEC_BACKEND
+	switch ((DiskQuotaPID = dqlauncher_forkexec()))
+#else
+	switch ((DiskQuotaPID = fork_process()))
+#endif
+	{
+		case -1:
+			ereport(LOG,
+					(errmsg("could not fork diskquota launcher process: %m")));
+			return 0;
+
+#ifndef EXEC_BACKEND
+		case 0:
+			/* in postmaster child ... */
+			InitPostmasterChild();
+			/* Close the postmaster's sockets */
+			ClosePostmasterPorts(false);
+			DiskQuotaLauncherMain(0, NULL);
+			break;
+#endif
+		default:
+			return (int) DiskQuotaPID;
+	}
+
+	/* shouldn't get here */
+	return 0;
+}
+
+/*
+ * After init stage, launcher is responsible for monitor the worker process and
+ * active databases.
+ */
+static void
+launcher_monitor_disk_quota(void)
+{
+	/*TODO: monitor the state of worker process and active database */
+	while(!got_SIGTERM)
+	{
+		int rc;
+		rc = WaitLatch(MyLatch, WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
+					5000, WAIT_EVENT_DISKQUOTA_MAIN);
+		ResetLatch(MyLatch);
+		/*
+		 * Emergency bailout if postmaster has died.  This is to avoid the
+		 * necessity for manual cleanup of all postmaster children.
+		 */
+		if (rc & WL_POSTMASTER_DEATH)
+			proc_exit(1);
+
+		/* the normal shutdown case */
+		if (got_SIGTERM)
+			break;
+	}
+}
+
+static HeapTuple
+GetDatabaseTuple(const char *dbname)
+{
+	HeapTuple	tuple;
+	Relation	relation;
+	SysScanDesc	scan;
+	ScanKeyData	key[1];
+
+	/*
+	 * form a scan key
+	 */
+	ScanKeyInit(&key[0],
+				Anum_pg_database_datname,
+				BTEqualStrategyNumber, F_NAMEEQ,
+				CStringGetDatum(dbname));
+
+	/*
+	 * Open pg_database and fetch a tuple.  Force heap scan if we haven't yet
+	 * built the critical shared relcache entries (i.e., we're starting up
+	 * without a shared relcache cache file).
+	 */
+	relation = heap_open(DatabaseRelationId, AccessShareLock);
+	scan = systable_beginscan(relation, DatabaseNameIndexId,
+								criticalSharedRelcachesBuilt,
+								NULL,
+								1, key);
+
+	tuple = systable_getnext(scan);
+
+	/* Must copy tuple before releasing buffer */
+	if (HeapTupleIsValid(tuple))
+		tuple = heap_copytuple(tuple);
+
+	/* all done */
+	systable_endscan(scan);
+	heap_close(relation, AccessShareLock);
+
+	return tuple;
+}
+
+/* find oid given a database name */
+static Oid
+db_name_to_oid(const char *db_name)
+{
+	Oid oid = InvalidOid;
+	HeapTuple tuple;
+	StartTransactionCommand();
+	tuple = GetDatabaseTuple(db_name);
+	if (HeapTupleIsValid(tuple))
+	{
+		oid = HeapTupleGetOid(tuple);
+	}
+	CommitTransactionCommand();
+	return oid;
+}
+
+/*
+ * Main loop for the diskquota launcher process.
+ */
+NON_EXEC_STATIC void
+DiskQuotaLauncherMain(int argc, char *argv[])
+{
+	sigjmp_buf	local_sigjmp_buf;
+
+	am_diskquota_launcher = true;
+
+	/* Identify myself via ps */
+	init_ps_display(pgstat_get_backend_desc(B_DISKQUOTA_LAUNCHER), "", "", "");
+	elog(LOG, "disk quota enabled database list:'%s'\n", guc_dq_database_list);
+
+	if (PostAuthDelay)
+		pg_usleep(PostAuthDelay * 1000000L);
+
+	SetProcessingMode(InitProcessing);
+	/*
+	 * Set up signal handlers.  We operate on databases much like a regular
+	 * backend, so we use the same signal handling.  See equivalent code in
+	 * tcop/postgres.c.
+	 */
+	pqsignal(SIGHUP, dq_sighup_handler);
+	pqsignal(SIGINT, StatementCancelHandler);
+	pqsignal(SIGTERM, dql_sigterm_handler);
+
+	pqsignal(SIGQUIT, quickdie);
+	InitializeTimeouts();		/* establishes SIGALRM handler */
+
+	pqsignal(SIGPIPE, SIG_IGN);
+	pqsignal(SIGUSR1, procsignal_sigusr1_handler);
+	pqsignal(SIGUSR2, dql_sigusr2_handler);
+	pqsignal(SIGFPE, FloatExceptionHandler);
+	pqsignal(SIGCHLD, SIG_DFL);
+
+	/* Early initialization */
+	BaseInit();
+	/*
+	 * Create a per-backend PGPROC struct in shared memory, except in the
+	 * EXEC_BACKEND case where this was done in SubPostmasterMain. We must do
+	 * this before we can use LWLocks (and in the EXEC_BACKEND case we already
+	 * had to do some stuff with LWLocks).
+	 */
+#ifndef EXEC_BACKEND
+	InitProcess();
+#endif
+	InitPostgres(NULL, InvalidOid, NULL, InvalidOid, NULL, false);
+
+	SetProcessingMode(NormalProcessing);
+
+	/*
+	 * Create a memory context that we will do all our work in.  We do this so
+	 * that we can reset the context during error recovery and thereby avoid
+	 * possible memory leaks.
+	 */
+	diskquotaMemCxt = AllocSetContextCreate(TopMemoryContext,
+											"diskquota Launcher",
+											ALLOCSET_DEFAULT_SIZES);
+	MemoryContextSwitchTo(diskquotaMemCxt);
+
+	/*
+	 * If an exception is encountered, processing resumes here.
+	 *
+	 * This code is a stripped down version of PostgresMain error recovery.
+	 */
+	if (sigsetjmp(local_sigjmp_buf, 1) != 0)
+	{
+		/* since not using PG_TRY, must reset error stack by hand */
+		error_context_stack = NULL;
+
+		/* Prevents interrupts while cleaning up */
+		HOLD_INTERRUPTS();
+
+		/* Forget any pending QueryCancel or timeout request */
+		disable_all_timeouts(false);
+		QueryCancelPending = false; /* second to avoid race condition */
+
+		/* Report the error to the server log */
+		EmitErrorReport();
+
+		/* Abort the current transaction in order to recover */
+		AbortCurrentTransaction();
+
+		/*
+		 * Release any other resources, for the case where we were not in a
+		 * transaction.
+		 */
+		LWLockReleaseAll();
+		pgstat_report_wait_end();
+		AbortBufferIO();
+		UnlockBuffers();
+		if (CurrentResourceOwner)
+		{
+			ResourceOwnerRelease(CurrentResourceOwner,
+								 RESOURCE_RELEASE_BEFORE_LOCKS,
+								 false, true);
+			/* we needn't bother with the other ResourceOwnerRelease phases */
+		}
+		AtEOXact_Buffers(false);
+		AtEOXact_SMgr();
+		AtEOXact_Files(false);
+		AtEOXact_HashTables(false);
+
+		/*
+		 * Now return to normal top-level context and clear ErrorContext for
+		 * next time.
+		 */
+		MemoryContextSwitchTo(diskquotaMemCxt);
+		FlushErrorState();
+
+		/* Flush any leaked data in the top-level context */
+		MemoryContextResetAndDeleteChildren(diskquotaMemCxt);
+
+		/*
+		 * Make sure pgstat also considers our stat data as gone.  Note: we
+		 * mustn't use diskquota_refresh_stats here.
+		 */
+		pgstat_clear_snapshot();
+
+		/* Now we can allow interrupts again */
+		RESUME_INTERRUPTS();
+
+		/* if in shutdown mode, no need for anything further; just go away */
+		if (got_SIGTERM)
+			goto shutdown;
+
+		/*
+		 * Sleep at least 1 second after any error.  We don't want to be
+		 * filling the error logs as fast as we can.
+		 */
+		pg_usleep(1000000L);
+	}
+
+	/* We can now handle ereport(ERROR) */
+	PG_exception_stack = &local_sigjmp_buf;
+
+	/* must unblock signals before calling rebuild_database_list */
+	PG_SETMASK(&UnBlockSig);
+
+	/*
+	 * Set always-secure search path.  Launcher doesn't connect to a database,
+	 * so this has no effect.
+	 */
+	SetConfigOption("search_path", "", PGC_SUSET, PGC_S_OVERRIDE);
+
+	/*
+	 * Force zero_damaged_pages OFF in the diskquota process, even if it is set
+	 * in postgresql.conf.  We don't really want such a dangerous option being
+	 * applied non-interactively.
+	 */
+	SetConfigOption("zero_damaged_pages", "false", PGC_SUSET, PGC_S_OVERRIDE);
+
+	/*
+	 * Force settable timeouts off to avoid letting these settings prevent
+	 * regular maintenance from being executed.
+	 */
+	SetConfigOption("statement_timeout", "0", PGC_SUSET, PGC_S_OVERRIDE);
+	SetConfigOption("lock_timeout", "0", PGC_SUSET, PGC_S_OVERRIDE);
+	SetConfigOption("idle_in_transaction_session_timeout", "0",
+					PGC_SUSET, PGC_S_OVERRIDE);
+
+	/*
+	 * Force default_transaction_isolation to READ COMMITTED.  We don't want
+	 * to pay the overhead of serializable mode, nor add any risk of causing
+	 * deadlocks or delaying other transactions.
+	 */
+	SetConfigOption("default_transaction_isolation", "read committed",
+					PGC_SUSET, PGC_S_OVERRIDE);
+
+	DiskQuotaShmem->dq_launcherpid = MyProcPid;
+
+	/* init disk quota information */
+	launcher_init_disk_quota();
+
+	/* monitor disk quota change */
+	launcher_monitor_disk_quota();
+
+	/* Normal exit from the diskquota launcher is here */
+shutdown:
+	ereport(DEBUG1,
+			(errmsg("diskquota launcher shutting down")));
+	DiskQuotaShmem->dq_launcherpid = 0;
+
+	proc_exit(0);				/* done */
+}
+
+/* let postmaster to fork disk quota worker process */
+static void
+do_start_worker(DiskQuotaWorkItem *item)
+{
+	SendPostmasterSignal(PMSIGNAL_START_DISKQUOTA_WORKER);
+}
+
+static void
+launcher_init_disk_quota(void)
+{
+	MemoryContext		tmpcxt,
+						oldcxt;
+	int					idx;
+	DiskQuotaWorkItem  *item;
+	DiskQuotaWorkItem  *itemArray;
+
+	idx = 0;
+	tmpcxt = AllocSetContextCreate(CurrentMemoryContext,
+								   "Start worker tmp cxt",
+								   ALLOCSET_DEFAULT_SIZES);
+	oldcxt = MemoryContextSwitchTo(tmpcxt);
+
+	init_worker_parameters();
+	itemArray = DiskQuotaShmem->dq_workItems;
+	while (idx < diskquota_max_workers) {
+		LWLockAcquire(DiskQuotaLock, LW_EXCLUSIVE);
+		item = DiskQuotaShmem->dq_startingWorker;
+		if (item != NULL)
+		{ // TODO: something is wrong when creating a worker process
+			if (item->dqw_state != WIS_RUNNING)
+			{
+				elog(WARNING, "running failed, state=%d", (int)item->dqw_state);
+			}
+			LWLockRelease(DiskQuotaLock);
+			WaitLatch(MyLatch, WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
+				2000, WAIT_EVENT_DISKQUOTA_MAIN);
+			continue;
+		}
+		DiskQuotaShmem->dq_startingWorker = item =  &itemArray[idx];
+		if (!OidIsValid(item->dqw_database))
+		{
+			LWLockRelease(DiskQuotaLock);
+			break;
+		}
+		item->dqw_state = WIS_STARTING;
+		item->dqw_launchtime = GetCurrentTimestamp();
+		LWLockRelease(DiskQuotaLock);
+
+		do_start_worker(item);
+
+		WaitLatch(MyLatch, WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
+			2000, WAIT_EVENT_DISKQUOTA_MAIN);
+		ResetLatch(MyLatch);
+
+		idx++;
+	}
+
+	MemoryContextSwitchTo(oldcxt);
+	MemoryContextDelete(tmpcxt);
+}
+
+/* SIGHUP: set flag to re-read config file at next convenient time */
+static void
+dq_sighup_handler(SIGNAL_ARGS)
+{
+	int			save_errno = errno;
+
+	got_SIGHUP = true;
+	SetLatch(MyLatch);
+
+	errno = save_errno;
+}
+
+/* SIGUSR2: a worker is up and running, or just finished, or failed to fork */
+static void
+dql_sigusr2_handler(SIGNAL_ARGS)
+{
+	int			save_errno = errno;
+
+	got_SIGUSR2 = true;
+	SetLatch(MyLatch);
+
+	errno = save_errno;
+}
+
+/* SIGTERM: time to die */
+static void
+dql_sigterm_handler(SIGNAL_ARGS)
+{
+	int			save_errno = errno;
+
+	got_SIGTERM = true;
+	SetLatch(MyLatch);
+
+	errno = save_errno;
+}
+
+/*
+ * Assign the database with disk quota enabled into the
+ * DiskQuotaWorkItem struct.
+ */
+static void
+init_worker_parameters(void)
+{
+	List *dblist;
+	ListCell *cell;
+	int i = 0;
+	DiskQuotaWorkItem *worker;
+
+	dblist = get_database_list();
+	worker = DiskQuotaShmem->dq_workItems;
+
+	foreach(cell, dblist)
+	{
+		char *db_name;
+		Oid db_oid = InvalidOid;
+
+		db_name = (char *)lfirst(cell);
+		if (db_name == NULL || *db_name == '\0')
+		{
+			elog(WARNING, "invalid db name='%s'", db_name);
+			continue;
+		}
+		db_oid = db_name_to_oid(db_name);
+		if (db_oid == InvalidOid)
+		{
+			elog(WARNING, "cann't find oid for db='%s'", db_name);
+			continue;
+		}
+		if (i>=diskquota_max_workers)
+		{
+			elog(WARNING, "diskquota_max_workers<NUM_WORKITEMS: %d - %d\n", diskquota_max_workers, NUM_WORKITEMS);
+			break;
+		}
+		worker[i].dqw_database = db_oid;
+		elog(DEBUG1, "db_name[%d] = '%s' oid=%d", i, db_name, db_oid);
+		++i;
+	}
+}
+
+/*
+ * IsDiskQuota functions
+ * Return whether this is either a launcher diskquota process
+ */
+bool
+IsDiskQuotaLauncherProcess(void)
+{
+	return am_diskquota_launcher;
+}
+/*
+ * DiskQuotaShmemSize
+ *		Compute space needed for diskquota-related shared memory
+ */
+Size
+DiskQuotaShmemSize(void)
+{
+	Size		size;
+
+	size = sizeof(DiskQuotaShmemStruct);
+	size = MAXALIGN(size);
+	size = add_size(size, hash_estimate_size(MAX_DISK_QUOTA_BLACK_ENTRIES, sizeof(BlackMapEntry)));
+	return size;
+}
+
+/*
+ * DiskQuotaShmemInit
+ *		Allocate and initialize diskquota-related shared memory
+ */
+void
+DiskQuotaShmemInit(void)
+{
+	bool		found;
+	HASHCTL		hash_ctl;
+
+	DiskQuotaShmem = (DiskQuotaShmemStruct *)
+		ShmemInitStruct("DiskQuota Data",
+						DiskQuotaShmemSize(),
+						&found);
+
+	memset(&hash_ctl, 0, sizeof(hash_ctl));
+	hash_ctl.keysize = sizeof(Oid);
+	hash_ctl.entrysize = sizeof(BlackMapEntry);
+	hash_ctl.alloc = ShmemAllocNoError;
+	hash_ctl.dsize = hash_ctl.max_dsize = hash_select_dirsize(MAX_DISK_QUOTA_BLACK_ENTRIES);
+	disk_quota_black_map = ShmemInitHash("blackmap whose quota limitation is reached",
+									INIT_DISK_QUOTA_BLACK_ENTRIES,
+									MAX_DISK_QUOTA_BLACK_ENTRIES,
+									&hash_ctl,
+									HASH_DIRSIZE | HASH_SHARED_MEM | HASH_ALLOC | HASH_ELEM | HASH_BLOBS);
+
+
+	if (!IsUnderPostmaster)
+	{
+		Assert(!found);
+
+		DiskQuotaShmem->dq_launcherpid = 0;
+		DiskQuotaShmem->dq_startingWorker = NULL;
+		memset(DiskQuotaShmem->dq_workItems, 0,
+			   sizeof(DiskQuotaWorkItem) * NUM_WORKITEMS);
+	}
+	else
+		Assert(found);
+}
+
+/*
+ * database list found in guc
+ */
+static List *
+get_database_list(void)
+{
+	List	   *dblist = NULL;
+	if (!SplitIdentifierString(guc_dq_database_list, ',', &dblist))
+	{
+		elog(FATAL, "cann't get database list from guc:'%s'", guc_dq_database_list);
+		return NULL;
+	}
+	return dblist;
+}
+
+/********************************************************************
+ *					  DISKQUOTA WORKER CODE
+ ********************************************************************/
+
+#ifdef EXEC_BACKEND
+/*
+ * forkexec routines for the diskquota worker.
+ *
+ * Format up the arglist, then fork and exec.
+ */
+static pid_t
+dqworker_forkexec(void)
+{
+	char	   *av[10];
+	int			ac = 0;
+
+	av[ac++] = "postgres";
+	av[ac++] = "--forkdqworker";
+	av[ac++] = NULL;			/* filled in by postmaster_forkexec */
+	av[ac] = NULL;
+
+	Assert(ac < lengthof(av));
+
+	return postmaster_forkexec(ac, av);
+}
+
+/*
+ * We need this set from the outside, before InitProcess is called
+ */
+void
+DiskquotaWorkerIAm(void)
+{
+	am_diskquota_worker = true;
+}
+#endif
+
+/*
+ * Main entry point for diskquota worker process.
+ *
+ * This code is heavily based on pgarch.c, q.v.
+ */
+int
+StartDiskQuotaWorker(void)
+{
+	pid_t		worker_pid;
+
+#ifdef EXEC_BACKEND
+	switch ((worker_pid = dqworker_forkexec()))
+#else
+	switch ((worker_pid = fork_process()))
+#endif
+	{
+		case -1:
+			ereport(LOG,
+					(errmsg("could not fork diskquota worker process: %m")));
+			return 0;
+
+#ifndef EXEC_BACKEND
+		case 0:
+			/* in postmaster child ... */
+			InitPostmasterChild();
+
+			/* Close the postmaster's sockets */
+			ClosePostmasterPorts(false);
+
+			DiskQuotaWorkerMain(0, NULL);
+			break;
+#endif
+		default:
+			return (int) worker_pid;
+	}
+
+	/* shouldn't get here */
+	return 0;
+}
+
+/*
+ * DiskQuotaWorkerMain
+ */
+NON_EXEC_STATIC void
+DiskQuotaWorkerMain(int argc, char *argv[])
+{
+	sigjmp_buf	local_sigjmp_buf;
+	Oid			dbid;
+
+	am_diskquota_worker = true;
+
+	/* Identify myself via ps */
+	init_ps_display(pgstat_get_backend_desc(B_DISKQUOTA_WORKER), "", "", "");
+
+	SetProcessingMode(InitProcessing);
+
+	/*
+	 * Set up signal handlers.  We operate on databases much like a regular
+	 * backend, so we use the same signal handling.  See equivalent code in
+	 * tcop/postgres.c.
+	 */
+	pqsignal(SIGHUP, dq_sighup_handler);
+
+	/*
+	 * SIGINT is used to signal canceling the current disk quota check; SIGTERM
+	 * means abort and exit cleanly, and SIGQUIT means abandon ship.
+	 */
+	pqsignal(SIGINT, StatementCancelHandler);
+	pqsignal(SIGTERM, die);
+	pqsignal(SIGQUIT, quickdie);
+	InitializeTimeouts();		/* establishes SIGALRM handler */
+
+	pqsignal(SIGPIPE, SIG_IGN);
+	pqsignal(SIGUSR1, procsignal_sigusr1_handler);
+	pqsignal(SIGUSR2, SIG_IGN);
+	pqsignal(SIGFPE, FloatExceptionHandler);
+	pqsignal(SIGCHLD, SIG_DFL);
+
+	/* Early initialization */
+	BaseInit();
+	/*
+	 * Create a per-backend PGPROC struct in shared memory, except in the
+	 * EXEC_BACKEND case where this was done in SubPostmasterMain. We must do
+	 * this before we can use LWLocks (and in the EXEC_BACKEND case we already
+	 * had to do some stuff with LWLocks).
+	 */
+#ifndef EXEC_BACKEND
+	InitProcess();
+#endif
+
+	/*
+	 * If an exception is encountered, processing resumes here.
+	 *
+	 * See notes in postgres.c about the design of this coding.
+	 */
+	if (sigsetjmp(local_sigjmp_buf, 1) != 0)
+	{
+		/* Prevents interrupts while cleaning up */
+		HOLD_INTERRUPTS();
+
+		/* Report the error to the server log */
+		EmitErrorReport();
+
+		/*
+		 * We can now go away.  Note that because we called InitProcess, a
+		 * callback was registered to do ProcKill, which will clean up
+		 * necessary state.
+		 */
+		proc_exit(0);
+	}
+
+	/* We can now handle ereport(ERROR) */
+	PG_exception_stack = &local_sigjmp_buf;
+
+	PG_SETMASK(&UnBlockSig);
+
+	/*
+	 * Set always-secure search path, so malicious users can't redirect user
+	 * code (e.g. pg_index.indexprs).  (That code runs in a
+	 * SECURITY_RESTRICTED_OPERATION sandbox, so malicious users could not
+	 * take control of the entire diskquota worker in any case.)
+	 */
+	SetConfigOption("search_path", "", PGC_SUSET, PGC_S_OVERRIDE);
+
+	/*
+	 * Force zero_damaged_pages OFF in the diskquota process, even if it is set
+	 * in postgresql.conf.  We don't really want such a dangerous option being
+	 * applied non-interactively.
+	 */
+	SetConfigOption("zero_damaged_pages", "false", PGC_SUSET, PGC_S_OVERRIDE);
+
+	/*
+	 * Force settable timeouts off to avoid letting these settings prevent
+	 * regular maintenance from being executed.
+	 */
+	SetConfigOption("statement_timeout", "0", PGC_SUSET, PGC_S_OVERRIDE);
+	SetConfigOption("lock_timeout", "0", PGC_SUSET, PGC_S_OVERRIDE);
+	SetConfigOption("idle_in_transaction_session_timeout", "0",
+					PGC_SUSET, PGC_S_OVERRIDE);
+
+	/*
+	 * Force default_transaction_isolation to READ COMMITTED.  We don't want
+	 * to pay the overhead of serializable mode, nor add any risk of causing
+	 * deadlocks or delaying other transactions.
+	 */
+	SetConfigOption("default_transaction_isolation", "read committed",
+					PGC_SUSET, PGC_S_OVERRIDE);
+
+	/*
+	 * Force synchronous replication off to allow regular maintenance even if
+	 * we are waiting for standbys to connect. This is important to ensure we
+	 * aren't blocked from performing anti-wraparound tasks.
+	 */
+	if (synchronous_commit > SYNCHRONOUS_COMMIT_LOCAL_FLUSH)
+		SetConfigOption("synchronous_commit", "local",
+						PGC_SUSET, PGC_S_OVERRIDE);
+
+	/*
+	 * Get the info about the database we're going to work on.
+	 */
+	LWLockAcquire(DiskQuotaLock, LW_EXCLUSIVE);
+	/*
+	 * beware of startingWorker being INVALID; this should normally not
+	 * happen, but if a worker fails after forking and before this, the
+	 * launcher might have decided to remove it from the queue and start
+	 * again.
+	 */
+	if (DiskQuotaShmem->dq_startingWorker != NULL)
+	{
+		myWorkItem = DiskQuotaShmem->dq_startingWorker;
+		dbid = myWorkItem->dqw_database;
+		myWorkItem->dqw_state = WIS_RUNNING;
+
+		/*
+		 * remove from the "starting" pointer, so that the launcher can start
+		 * a new worker if required
+		 */
+		DiskQuotaShmem->dq_startingWorker = NULL;
+		LWLockRelease(DiskQuotaLock);
+
+		on_shmem_exit(FreeWorkerInfo, 0);
+
+		/* wake up the launcher */
+		if (DiskQuotaShmem->dq_launcherpid != 0)
+			kill(DiskQuotaShmem->dq_launcherpid, SIGUSR2);
+	}
+	else
+	{
+		dbid = InvalidOid;
+		LWLockRelease(DiskQuotaLock);
+	}
+
+	if (OidIsValid(dbid))
+	{
+		char		dbname[NAMEDATALEN];
+
+		/*
+		 * Connect to the selected database
+		 *
+		 * Note: if we have selected a just-deleted database (due to using
+		 * stale stats info), we'll fail and exit here.
+		 */
+		InitPostgres(NULL, dbid, NULL, InvalidOid, dbname, false);
+		SetProcessingMode(NormalProcessing);
+		set_ps_display(dbname, false);
+		ereport(DEBUG1,
+				(errmsg("diskquota: processing database \"%s\"", dbname)));
+
+		if (PostAuthDelay)
+			pg_usleep(PostAuthDelay * 1000000L);
+
+
+		do_diskquota();
+	}
+
+	/*
+	 * The launcher will be notified of my death in ProcKill, *if* we managed
+	 * to get a worker slot at all
+	 */
+
+	/* All done, go away */
+	proc_exit(0);
+}
+
+/*
+ * Return a WorkerInfo to the free list
+ */
+static void
+FreeWorkerInfo(int code, Datum arg)
+{
+	if (myWorkItem != NULL)
+	{
+		LWLockAcquire(DiskQuotaLock, LW_EXCLUSIVE);
+
+		/*
+		 * Wake the launcher up so that he can launch a new worker immediately
+		 * if required.  We only save the launcher's PID in local memory here;
+		 * the actual signal will be sent when the PGPROC is recycled.  Note
+		 * that we always do this, so that the launcher can rebalance the cost
+		 * limit setting of the remaining workers.
+		 *
+		 * We somewhat ignore the risk that the launcher changes its PID
+		 * between us reading it and the actual kill; we expect ProcKill to be
+		 * called shortly after us, and we assume that PIDs are not reused too
+		 * quickly after a process exits.
+		 */
+		DiskquotaLauncherPid = DiskQuotaShmem->dq_launcherpid;
+
+		myWorkItem->dqw_launchtime = 0;
+		myWorkItem->dqw_state = WIS_INVALID;
+
+		/* not mine anymore */
+		myWorkItem = NULL;
+
+		/*
+		 * now that we're inactive, cause a rebalancing of the surviving
+		 * workers
+		 */
+		LWLockRelease(DiskQuotaLock);
+	}
+}
+
+
+
+/*
+ * init disk quota model when the worker process firstly started.
+ */
+void
+init_disk_quota_model(void)
+{
+	HASHCTL		hash_ctl;
+	MemoryContext DSModelContext;
+	DSModelContext = AllocSetContextCreate(TopMemoryContext,
+										   "Disk quotas model context",
+										   ALLOCSET_DEFAULT_SIZES);
+
+	/* init hash table for table/schema/role etc.*/
+	memset(&hash_ctl, 0, sizeof(hash_ctl));
+	hash_ctl.keysize = sizeof(Oid);
+	hash_ctl.entrysize = sizeof(TableSizeEntry);
+	hash_ctl.hcxt = DSModelContext;
+
+	table_size_map = hash_create("TableSizeEntry map",
+								1024,
+								&hash_ctl,
+								HASH_ELEM | HASH_CONTEXT| HASH_BLOBS);
+
+	memset(&hash_ctl, 0, sizeof(hash_ctl));
+	hash_ctl.keysize = sizeof(Oid);
+	hash_ctl.entrysize = sizeof(NamespaceSizeEntry);
+	hash_ctl.hcxt = DSModelContext;
+
+	namespace_size_map = hash_create("NamespaceSizeEntry map",
+								1024,
+								&hash_ctl,
+								HASH_ELEM | HASH_CONTEXT| HASH_BLOBS);
+
+	memset(&hash_ctl, 0, sizeof(hash_ctl));
+	hash_ctl.keysize = sizeof(Oid);
+	hash_ctl.entrysize = sizeof(RoleSizeEntry);
+	hash_ctl.hcxt = DSModelContext;
+
+	role_size_map = hash_create("RoleSizeEntry map",
+								1024,
+								&hash_ctl,
+								HASH_ELEM | HASH_CONTEXT| HASH_BLOBS);
+
+	memset(&hash_ctl, 0, sizeof(hash_ctl));
+	hash_ctl.keysize = sizeof(Oid);
+	hash_ctl.entrysize = sizeof(LocalBlackMapEntry);
+	local_disk_quota_black_map = hash_create("local blackmap whose quota limitation is reached",
+									MAX_LOCAL_DISK_QUOTA_BLACK_ENTRIES,
+									&hash_ctl,
+									HASH_ELEM | HASH_CONTEXT | HASH_BLOBS);
+	if (pgstat_table_map == NULL)
+	{
+		HASHCTL ctl;
+
+		memset(&ctl, 0, sizeof(ctl));
+
+		ctl.keysize = sizeof(Oid);
+		ctl.entrysize = sizeof(DiskQuotaStatHashEntry);
+
+		pgstat_table_map = hash_create("disk quota Table State Entry lookup hash table",
+									NUM_WORKITEMS,
+									&ctl,
+									HASH_ELEM | HASH_BLOBS);
+	}
+
+	if (pgstat_active_table_map == NULL)
+	{
+		HASHCTL ctl;
+
+		memset(&ctl, 0, sizeof(ctl));
+
+		ctl.keysize = sizeof(Oid);
+		ctl.entrysize = sizeof(DiskQuotaActiveHashEntry);
+
+		pgstat_active_table_map = hash_create("disk quota Active Table Entry lookup hash table",
+									INIT_ACTIVE_TABLE_SIZE,
+									&ctl,
+									HASH_ELEM | HASH_BLOBS);
+	}
+
+	/* calcualte the disk usage for each database objects */
+	refresh_disk_quota_model();
+}
+
+/*
+ * generate the new shared blacklist from the localblack list which
+ * exceed the quota limit.
+ * */
+static void
+flush_local_black_map(void)
+{
+	HASH_SEQ_STATUS iter;
+	LocalBlackMapEntry* localblackentry;
+	BlackMapEntry* blackentry;
+	bool found;
+
+	LWLockAcquire(DiskQuotaLock, LW_EXCLUSIVE);
+
+	hash_seq_init(&iter, local_disk_quota_black_map);
+	while ((localblackentry = hash_seq_search(&iter)) != NULL)
+	{
+		if (localblackentry->isexceeded)
+		{
+			blackentry = (BlackMapEntry*) hash_search(disk_quota_black_map,
+							   (void *) &localblackentry->targetoid,
+							   HASH_ENTER_NULL, &found);
+			if (blackentry == NULL)
+			{
+				elog(WARNING, "shared disk quota black map size limit reached.");
+			}
+			else
+			{
+				/* new db objects which exceed quota limit */
+				if (!found)
+				{
+					blackentry->targetoid = blackentry->targetoid;
+					blackentry->databaseoid = MyDatabaseId;
+				}
+			}
+		}
+		else
+		{
+			/* db objects are removed or under quota limit in the new loop */
+			(void) hash_search(disk_quota_black_map,
+							   (void *) &localblackentry->targetoid,
+							   HASH_REMOVE, NULL);
+		}
+	}
+	LWLockRelease(DiskQuotaLock);
+}
+
+/* fetch the new blacklist from shared blacklist at each refresh iteration. */
+static void
+reset_local_black_map(void)
+{
+	HASH_SEQ_STATUS iter;
+	LocalBlackMapEntry* localblackentry;
+	BlackMapEntry* blackentry;
+	bool found;
+	/* clear entries in local black map*/
+	hash_seq_init(&iter, local_disk_quota_black_map);
+
+	while ((localblackentry = hash_seq_search(&iter)) != NULL)
+	{
+		(void) hash_search(local_disk_quota_black_map,
+				(void *) &localblackentry->targetoid,
+				HASH_REMOVE, NULL);
+	}
+
+	/* get black map copy from shared black map */
+	LWLockAcquire(DiskQuotaLock, LW_SHARED);
+	hash_seq_init(&iter, disk_quota_black_map);
+	while ((blackentry = hash_seq_search(&iter)) != NULL)
+	{
+		/* only reset entries for current db */
+		if (blackentry->databaseoid == MyDatabaseId)
+		{
+			localblackentry = (LocalBlackMapEntry*) hash_search(local_disk_quota_black_map,
+								(void *) &blackentry->targetoid,
+								HASH_ENTER, &found);
+			if (!found)
+			{
+				localblackentry->targetoid = blackentry->targetoid;
+				localblackentry->isexceeded = false;
+			}
+		}
+	}
+	LWLockRelease(DiskQuotaLock);
+
+}
+
+/*
+ * Compare the disk quota limit and current usage of a database object.
+ * Put them into local blacklist if quota limit is exceeded.
+ */
+static void check_disk_quota_by_oid(Oid targetOid, int64 current_usage, int8 diskquota_type)
+{
+	bool					found;
+	HeapTuple				tuple;
+	int32 					quota_limit_mb;
+	int32 					current_usage_mb;
+	LocalBlackMapEntry*	localblackentry;
+
+	tuple = SearchSysCache1(DISKQUOTATARGETOID, ObjectIdGetDatum(targetOid));
+	if (!HeapTupleIsValid(tuple))
+	{
+		/* default no limit */
+		return;
+	}
+	quota_limit_mb = ((Form_pg_diskquota) GETSTRUCT(tuple))->quotalimit;
+	ReleaseSysCache(tuple);
+
+	current_usage_mb = current_usage / (1024 *1024);
+	if(current_usage_mb > quota_limit_mb)
+	{
+		elog(DEBUG1,"Put object %u to blacklist with quota limit:%d, current usage:%d",
+				targetOid, quota_limit_mb, current_usage_mb);
+		localblackentry = (LocalBlackMapEntry*) hash_search(local_disk_quota_black_map,
+					&targetOid,
+					HASH_ENTER, &found);
+		localblackentry->isexceeded = true;
+	}
+}
+
+static void
+remove_namespace_map(Oid namespaceoid)
+{
+	hash_search(namespace_size_map,
+			&namespaceoid,
+			HASH_REMOVE, NULL);
+}
+
+static void
+update_namespace_map(Oid namespaceoid, int64 updatesize)
+{
+	bool found;
+	NamespaceSizeEntry* nsentry;
+	nsentry = (NamespaceSizeEntry *)hash_search(namespace_size_map,
+			&namespaceoid,
+			HASH_ENTER, &found);
+	if (!found)
+	{
+		nsentry->namespaceoid = namespaceoid;
+		nsentry->totalsize = updatesize;
+	}
+	else {
+		nsentry->totalsize += updatesize;
+	}
+
+}
+
+static void
+remove_role_map(Oid owneroid)
+{
+	hash_search(role_size_map,
+			&owneroid,
+			HASH_REMOVE, NULL);
+}
+
+static void
+update_role_map(Oid owneroid, int64 updatesize)
+{
+	bool found;
+	RoleSizeEntry* rolentry;
+	rolentry = (RoleSizeEntry *)hash_search(role_size_map,
+			&owneroid,
+			HASH_ENTER, &found);
+	if (!found)
+	{
+		rolentry->owneroid = owneroid;
+		rolentry->totalsize = updatesize;
+	}
+	else {
+		rolentry->totalsize += updatesize;
+	}
+
+}
+
+static void
+add_to_pgstat_map(Oid relOid)
+{
+	DiskQuotaStatHashEntry *entry;
+	bool found;
+
+	entry = hash_search(pgstat_table_map, &relOid, HASH_ENTER, &found);
+
+	if (!found)
+	{
+		memset(&entry->t_entry, 0, sizeof(entry->t_entry));
+	}
+}
+
+static void
+remove_pgstat_map(Oid relOid)
+{
+	hash_search(pgstat_table_map, &relOid, HASH_REMOVE, NULL);
+}
+
+/*
+ *  Incremental way to update the disk quota of every database objects
+ *  Recalculate the table's disk usage when it's a new table or be update.
+ *  Detect the removed table if it's nolonger in pg_class.
+ *  If change happens, no matter size change or owner change,
+ *  update schemasizemap and rolesizemap correspondingly.
+ *
+ */
+static void
+calculate_table_disk_usage(void)
+{
+	bool found;
+	Relation	classRel;
+	HeapTuple	tuple;
+	HeapScanDesc relScan;
+	TableSizeEntry *tsentry;
+	Oid			relOid;
+	HASH_SEQ_STATUS iter;
+
+	classRel = heap_open(RelationRelationId, AccessShareLock);
+	relScan = heap_beginscan_catalog(classRel, 0, NULL);
+
+	while ((tuple = heap_getnext(relScan, ForwardScanDirection)) != NULL)
+	{
+		Form_pg_class classForm = (Form_pg_class) GETSTRUCT(tuple);
+		found = false;
+		if (classForm->relkind != RELKIND_RELATION &&
+			classForm->relkind != RELKIND_MATVIEW)
+			continue;
+		relOid = HeapTupleGetOid(tuple);
+
+		/* ignore system table*/
+		if(relOid < FirstNormalObjectId)
+			continue;
+
+		/* skip to recalculate the tables which are not in active list.*/
+
+		tsentry = (TableSizeEntry *)hash_search(table_size_map,
+							 &relOid,
+							 HASH_ENTER, &found);
+		/* namespace and owner may be changed since last check*/
+		if (!found)
+		{
+			/* if it's a new table*/
+			tsentry->reloid = relOid;
+			tsentry->namespaceoid = classForm->relnamespace;
+			tsentry->owneroid = classForm->relowner;
+			tsentry->totalsize = calculate_total_relation_size_by_oid(relOid);
+			update_namespace_map(tsentry->namespaceoid, tsentry->totalsize);
+			update_role_map(tsentry->owneroid, tsentry->totalsize);
+			/* add to pgstat_table_map hash map */
+			add_to_pgstat_map(relOid);
+		}
+		else if (check_table_is_active(tsentry->reloid))
+		{
+			/* if table size is modified*/
+			int64 oldtotalsize = tsentry->totalsize;
+			tsentry->totalsize = calculate_total_relation_size_by_oid(relOid);
+			update_namespace_map(tsentry->namespaceoid, tsentry->totalsize - oldtotalsize);
+			update_role_map(tsentry->owneroid, tsentry->totalsize - oldtotalsize);
+		}
+		/* check the disk quota limit TODO only check the modified table */
+		check_disk_quota_by_oid(tsentry->reloid, tsentry->totalsize, DISKQUOTA_TYPE_TABLE);
+
+		/* if schema change */
+		if (tsentry->namespaceoid != classForm->relnamespace)
+		{
+			update_namespace_map(tsentry->namespaceoid, -1 * tsentry->totalsize);
+			tsentry->namespaceoid = classForm->relnamespace;
+			update_namespace_map(tsentry->namespaceoid, tsentry->totalsize);
+		}
+		/* if owner change*/
+		if(tsentry->owneroid != classForm->relowner)
+		{
+			update_role_map(tsentry->owneroid, -1 * tsentry->totalsize);
+			tsentry->owneroid = classForm->relowner;
+			update_role_map(tsentry->owneroid, tsentry->totalsize);
+		}
+	}
+
+	heap_endscan(relScan);
+	heap_close(classRel, AccessShareLock);
+
+	/* Process removed tables*/
+	hash_seq_init(&iter, table_size_map);
+
+	while ((tsentry = hash_seq_search(&iter)) != NULL)
+	{
+		/* check if namespace is already be deleted */
+		tuple = SearchSysCache1(RELOID, ObjectIdGetDatum(tsentry->reloid));
+		if (!HeapTupleIsValid(tuple))
+		{
+			update_role_map(tsentry->owneroid, -1 * tsentry->totalsize);
+			update_namespace_map(tsentry->namespaceoid, -1 * tsentry->totalsize);
+
+			hash_search(table_size_map,
+					&tsentry->reloid,
+					HASH_REMOVE, NULL);
+			remove_pgstat_map(tsentry->reloid);
+			continue;
+		}
+		ReleaseSysCache(tuple);
+	}
+}
+
+static void calculate_schema_disk_usage(void)
+{
+	HeapTuple	tuple;
+	HASH_SEQ_STATUS iter;
+	NamespaceSizeEntry* nsentry;
+	hash_seq_init(&iter, namespace_size_map);
+
+	while ((nsentry = hash_seq_search(&iter)) != NULL)
+	{
+		/* check if namespace is already be deleted */
+		tuple = SearchSysCache1(NAMESPACEOID, ObjectIdGetDatum(nsentry->namespaceoid));
+		if (!HeapTupleIsValid(tuple))
+		{
+			remove_namespace_map(nsentry->namespaceoid);
+			continue;
+		}
+		ReleaseSysCache(tuple);
+		elog(DEBUG1, "check namespace:%u with usage:%ld", nsentry->namespaceoid, nsentry->totalsize);
+		check_disk_quota_by_oid(nsentry->namespaceoid, nsentry->totalsize, DISKQUOTA_TYPE_SCHEMA);
+	}
+}
+
+static void calculate_role_disk_usage(void)
+{
+	HeapTuple	tuple;
+	HASH_SEQ_STATUS iter;
+	RoleSizeEntry* rolentry;
+	hash_seq_init(&iter, role_size_map);
+
+	while ((rolentry = hash_seq_search(&iter)) != NULL)
+	{
+		/* check if namespace is already be deleted */
+		tuple = SearchSysCache1(AUTHOID, ObjectIdGetDatum(rolentry->owneroid));
+		if (!HeapTupleIsValid(tuple))
+		{
+			remove_role_map(rolentry->owneroid);
+			continue;
+		}
+		ReleaseSysCache(tuple);
+		elog(DEBUG1, "check role:%u with usage:%ld", rolentry->owneroid, rolentry->totalsize);
+		check_disk_quota_by_oid(rolentry->owneroid, rolentry->totalsize, DISKQUOTA_TYPE_ROLE);
+	}
+}
+
+static bool check_table_is_active(Oid reloid)
+{
+	bool found = false;
+	hash_search(pgstat_active_table_map, &reloid, HASH_REMOVE, &found);
+	if (found)
+	{
+		elog(DEBUG1,"table is active with oid:%u", reloid);
+	}
+	return found;
+}
+
+static void build_active_table_map(void)
+{
+	DiskQuotaStatHashEntry *hash_entry;
+	HASH_SEQ_STATUS status;
+
+	hash_seq_init(&status, pgstat_table_map);
+
+	/* reset current pg_stat snapshot to get new data */
+	pgstat_clear_snapshot();
+
+	while ((hash_entry = (DiskQuotaStatHashEntry *) hash_seq_search(&status)) != NULL)
+	{
+
+		PgStat_StatTabEntry *stat_entry;
+
+		stat_entry = pgstat_fetch_stat_tabentry(hash_entry->reloid);
+		if (stat_entry == NULL) {
+			continue;
+		}
+
+		if (stat_entry->tuples_inserted != hash_entry->t_entry.tuples_inserted ||
+			stat_entry->tuples_updated != hash_entry->t_entry.tuples_updated ||
+			stat_entry->tuples_deleted != hash_entry->t_entry.tuples_deleted ||
+			stat_entry->autovac_vacuum_count !=  hash_entry->t_entry.autovac_vacuum_count ||
+			stat_entry->vacuum_count !=  hash_entry->t_entry.vacuum_count ||
+			stat_entry->n_live_tuples != hash_entry->t_entry.tuples_living)
+		{
+			/* Update the entry */
+			hash_entry->t_entry.tuples_inserted = stat_entry->tuples_inserted;
+			hash_entry->t_entry.tuples_updated = stat_entry->tuples_updated;
+			hash_entry->t_entry.tuples_deleted = stat_entry->tuples_deleted;
+			hash_entry->t_entry.autovac_vacuum_count = stat_entry->autovac_vacuum_count;
+			hash_entry->t_entry.vacuum_count = stat_entry->vacuum_count;
+			hash_entry->t_entry.tuples_living = stat_entry->n_live_tuples;
+
+			/* Add this entry to active hash table if not exist */
+			hash_search(pgstat_active_table_map, &hash_entry->reloid, HASH_ENTER, NULL);
+
+		}
+	}
+}
+
+
+/*
+ * Scan file system, to update the model with all files.
+ */
+static void
+refresh_disk_quota_model(void)
+{
+	reset_local_black_map();
+
+	/* recalculate the disk usage of table, schema and role */
+	StartTransactionCommand();
+	calculate_table_disk_usage();
+	calculate_schema_disk_usage();
+	calculate_role_disk_usage();
+	CommitTransactionCommand();
+
+	flush_local_black_map();
+}
+
+/*
+ * Process a database table-by-table
+ *
+ * Note that CHECK_FOR_INTERRUPTS is supposed to be used in certain spots in
+ * order not to ignore shutdown commands for too long.
+ */
+static void
+do_diskquota(void)
+{
+	init_disk_quota_model();
+
+	SetLatch(MyLatch);
+	while (!got_SIGTERM)
+	{
+		/* refresh interval is 2 seconds */
+		int rc;
+		rc = WaitLatch(MyLatch, WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
+					2000, WAIT_EVENT_DISKQUOTA_MAIN);
+		ResetLatch(MyLatch);
+
+		/*
+		 * Emergency bailout if postmaster has died.  This is to avoid the
+		 * necessity for manual cleanup of all postmaster children.
+		 */
+		if (rc & WL_POSTMASTER_DEATH)
+			proc_exit(1);
+
+		/* the normal shutdown case */
+		if (got_SIGTERM)
+			break;
+
+		elog(LOG,"refresh disk quota model begin");
+		build_active_table_map();
+		refresh_disk_quota_model();
+		elog(LOG,"refresh disk quota model end");
+	}
+}
+
+/*
+ * DiskQuotaingActive
+ *		Check GUC vars and report whether the diskquota process should be
+ *		running.
+ */
+bool
+DiskQuotaingActive(void)
+{
+	if (!diskquota_start_daemon || !pgstat_track_counts)
+		return false;
+	return true;
+}
+
+/*
+ * diskquota_init
+ *		This is called at postmaster initialization.
+ *
+ * All we do here is annoy the user if he got it wrong.
+ */
+void
+diskquota_init(void)
+{
+	if (diskquota_start_daemon && !pgstat_track_counts)
+		ereport(WARNING,
+				(errmsg("diskquota not started because of misconfiguration"),
+				 errhint("Enable the \"track_counts\" option.")));
+}
+
+
+/*
+ * IsDiskQuota functions
+ * Return whether this is either a worker diskquota process
+ */
+bool
+IsDiskQuotaWorkerProcess(void)
+{
+	return am_diskquota_worker;
+}
+
+
+
+static void
+get_rel_owner_schema(Oid relid, Oid *ownerOid, Oid *nsOid)
+{
+	HeapTuple	tp;
+
+	tp = SearchSysCache1(RELOID, ObjectIdGetDatum(relid));
+	if (HeapTupleIsValid(tp))
+	{
+		Form_pg_class reltup = (Form_pg_class) GETSTRUCT(tp);
+		*ownerOid = reltup->relowner;
+		*nsOid = reltup->relnamespace;
+		ReleaseSysCache(tp);
+		return ;
+	}
+	else
+	{
+		elog(DEBUG1, "could not find owner for relation %u", relid);
+		return;
+	}
+}
+
+/*
+ * Enforcement operator to check if the quota limit of a database
+ * object is reached.
+ */
+bool
+CheckTableQuota(RangeTblEntry *rte)
+{
+	bool found;
+	Oid ownerOid = InvalidOid;
+	Oid nsOid = InvalidOid;
+	Oid reloid;
+
+	/* check only for relation */
+	if (rte->rtekind != RTE_RELATION)
+		return true;
+
+	/* check for insert and update tables */
+	if ((rte->requiredPerms & ACL_INSERT) == 0 && (rte->requiredPerms & ACL_UPDATE) == 0)
+		return true;
+
+	reloid = rte->relid;
+	get_rel_owner_schema(reloid, &ownerOid, &nsOid);
+
+	LWLockAcquire(DiskQuotaLock, LW_SHARED);
+	hash_search(disk_quota_black_map,
+				&reloid,
+				HASH_FIND, &found);
+	if (found)
+	{
+		ereport(ERROR,
+				(errcode(ERRCODE_DISK_FULL),
+				 errmsg("table's disk space quota exceeded")));
+		return false;
+	}
+
+	if ( nsOid != InvalidOid)
+	{
+		hash_search(disk_quota_black_map,
+				&nsOid,
+				HASH_FIND, &found);
+		if (found)
+		{
+			ereport(ERROR,
+					(errcode(ERRCODE_DISK_FULL),
+					 errmsg("schema's disk space quota exceeded")));
+			return false;
+		}
+
+	}
+
+	if ( ownerOid != InvalidOid)
+	{
+		hash_search(disk_quota_black_map,
+				&ownerOid,
+				HASH_FIND, &found);
+		if (found)
+		{
+			ereport(ERROR,
+					(errcode(ERRCODE_DISK_FULL),
+					 errmsg("role's disk space quota exceeded")));
+			return false;
+		}
+	}
+	LWLockRelease(DiskQuotaLock);
+	return true;
+}
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 8a5b2b3..27b212a 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -48,6 +48,7 @@
 #include "miscadmin.h"
 #include "pg_trace.h"
 #include "postmaster/autovacuum.h"
+#include "postmaster/diskquota.h"
 #include "postmaster/fork_process.h"
 #include "postmaster/postmaster.h"
 #include "replication/walsender.h"
@@ -2833,6 +2834,16 @@ pgstat_bestart(void)
 			/* Autovacuum Worker */
 			beentry->st_backendType = B_AUTOVAC_WORKER;
 		}
+		else if (IsDiskQuotaLauncherProcess())
+		{
+			/* DiskQuota Launcher */
+			beentry->st_backendType = B_DISKQUOTA_LAUNCHER;
+		}
+		else if (IsDiskQuotaWorkerProcess())
+		{
+			/* DiskQuota Worker */
+			beentry->st_backendType = B_DISKQUOTA_WORKER;
+		}
 		else if (am_walsender)
 		{
 			/* Wal sender */
@@ -3483,6 +3494,9 @@ pgstat_get_wait_activity(WaitEventActivity w)
 		case WAIT_EVENT_AUTOVACUUM_MAIN:
 			event_name = "AutoVacuumMain";
 			break;
+		case WAIT_EVENT_DISKQUOTA_MAIN:
+			event_name = "DiskQuotaMain";
+			break;
 		case WAIT_EVENT_BGWRITER_HIBERNATE:
 			event_name = "BgWriterHibernate";
 			break;
@@ -4111,6 +4125,12 @@ pgstat_get_backend_desc(BackendType backendType)
 		case B_AUTOVAC_WORKER:
 			backendDesc = "autovacuum worker";
 			break;
+		case B_DISKQUOTA_LAUNCHER:
+			backendDesc = "diskquota launcher";
+			break;
+		case B_DISKQUOTA_WORKER:
+			backendDesc = "diskquota worker";
+			break;
 		case B_BACKEND:
 			backendDesc = "client backend";
 			break;
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 305ff36..4b323e7 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -109,6 +109,7 @@
 #include "pgstat.h"
 #include "port/pg_bswap.h"
 #include "postmaster/autovacuum.h"
+#include "postmaster/diskquota.h"
 #include "postmaster/bgworker_internals.h"
 #include "postmaster/fork_process.h"
 #include "postmaster/pgarch.h"
@@ -144,9 +145,10 @@
 #define BACKEND_TYPE_AUTOVAC	0x0002	/* autovacuum worker process */
 #define BACKEND_TYPE_WALSND		0x0004	/* walsender process */
 #define BACKEND_TYPE_BGWORKER	0x0008	/* bgworker process */
-#define BACKEND_TYPE_ALL		0x000F	/* OR of all the above */
+#define BACKEND_TYPE_DISKQUOTA  0x000F  /* diskquota process */
+#define BACKEND_TYPE_ALL		0x0010	/* OR of all the above */
 
-#define BACKEND_TYPE_WORKER		(BACKEND_TYPE_AUTOVAC | BACKEND_TYPE_BGWORKER)
+#define BACKEND_TYPE_WORKER		(BACKEND_TYPE_DISKQUOTA | BACKEND_TYPE_AUTOVAC | BACKEND_TYPE_BGWORKER)
 
 /*
  * List of active backends (or child processes anyway; we don't actually
@@ -253,7 +255,8 @@ static pid_t StartupPID = 0,
 			AutoVacPID = 0,
 			PgArchPID = 0,
 			PgStatPID = 0,
-			SysLoggerPID = 0;
+			SysLoggerPID = 0,
+			DiskQuotaPID = 0;
 
 /* Startup process's status */
 typedef enum
@@ -358,6 +361,12 @@ static volatile sig_atomic_t start_autovac_launcher = false;
 /* the launcher needs to be signalled to communicate some condition */
 static volatile bool avlauncher_needs_signal = false;
 
+/* received START_DISK_QUOTA_LAUNCHER signal */
+static volatile sig_atomic_t start_diskquota_launcher = false;
+
+/* the launcher needs to be signalled to communicate some condition */
+static volatile bool dqlauncher_needs_signal = false;
+
 /* received START_WALRECEIVER signal */
 static volatile sig_atomic_t WalReceiverRequested = false;
 
@@ -431,6 +440,7 @@ static void maybe_start_bgworkers(void);
 static bool CreateOptsFile(int argc, char *argv[], char *fullprogname);
 static pid_t StartChildProcess(AuxProcType type);
 static void StartAutovacuumWorker(void);
+static void StartDiskquotaWorker(void);
 static void MaybeStartWalReceiver(void);
 static void InitPostmasterDeathWatchHandle(void);
 
@@ -1313,6 +1323,11 @@ PostmasterMain(int argc, char *argv[])
 	autovac_init();
 
 	/*
+	 * Initialize the diskquota subsystem (again, no process start yet)
+	 */
+	diskquota_init();
+
+	/*
 	 * Load configuration files for client authentication.
 	 */
 	if (!load_hba())
@@ -1759,6 +1774,16 @@ ServerLoop(void)
 				start_autovac_launcher = false; /* signal processed */
 		}
 
+		/* If we have lost the disk quota launcher, try to start a new one */
+		if (DiskQuotaPID == 0 &&
+			(DiskQuotaingActive() || start_diskquota_launcher) &&
+			pmState == PM_RUN)
+		{
+			DiskQuotaPID = StartDiskQuotaLauncher();
+			if (DiskQuotaPID != 0)
+				start_diskquota_launcher = false; /* signal processed */
+		}
+
 		/* If we have lost the stats collector, try to start a new one */
 		if (PgStatPID == 0 &&
 			(pmState == PM_RUN || pmState == PM_HOT_STANDBY))
@@ -1776,6 +1801,14 @@ ServerLoop(void)
 				kill(AutoVacPID, SIGUSR2);
 		}
 
+		/* If we need to signal the diskquota launcher, do so now */
+		if (dqlauncher_needs_signal)
+		{
+			dqlauncher_needs_signal = false;
+			if (DiskQuotaPID != 0)
+				kill(DiskQuotaPID, SIGUSR2);
+		}
+
 		/* If we need to start a WAL receiver, try to do that now */
 		if (WalReceiverRequested)
 			MaybeStartWalReceiver();
@@ -2543,6 +2576,8 @@ SIGHUP_handler(SIGNAL_ARGS)
 			signal_child(WalReceiverPID, SIGHUP);
 		if (AutoVacPID != 0)
 			signal_child(AutoVacPID, SIGHUP);
+		if (DiskQuotaPID != 0)
+			signal_child(DiskQuotaPID, SIGHUP);
 		if (PgArchPID != 0)
 			signal_child(PgArchPID, SIGHUP);
 		if (SysLoggerPID != 0)
@@ -2629,10 +2664,13 @@ pmdie(SIGNAL_ARGS)
 				/* autovac workers are told to shut down immediately */
 				/* and bgworkers too; does this need tweaking? */
 				SignalSomeChildren(SIGTERM,
-								   BACKEND_TYPE_AUTOVAC | BACKEND_TYPE_BGWORKER);
+								   BACKEND_TYPE_DISKQUOTA | BACKEND_TYPE_AUTOVAC | BACKEND_TYPE_BGWORKER);
 				/* and the autovac launcher too */
 				if (AutoVacPID != 0)
 					signal_child(AutoVacPID, SIGTERM);
+				/* and the diskquota launcher too */
+				if (DiskQuotaPID != 0)
+					signal_child(DiskQuotaPID, SIGTERM);
 				/* and the bgwriter too */
 				if (BgWriterPID != 0)
 					signal_child(BgWriterPID, SIGTERM);
@@ -2710,10 +2748,13 @@ pmdie(SIGNAL_ARGS)
 				/* shut down all backends and workers */
 				SignalSomeChildren(SIGTERM,
 								   BACKEND_TYPE_NORMAL | BACKEND_TYPE_AUTOVAC |
-								   BACKEND_TYPE_BGWORKER);
+								   BACKEND_TYPE_DISKQUOTA | BACKEND_TYPE_BGWORKER);
 				/* and the autovac launcher too */
 				if (AutoVacPID != 0)
 					signal_child(AutoVacPID, SIGTERM);
+				/* and the diskquota launcher too */
+				if (DiskQuotaPID != 0)
+					signal_child(DiskQuotaPID, SIGTERM);
 				/* and the walwriter too */
 				if (WalWriterPID != 0)
 					signal_child(WalWriterPID, SIGTERM);
@@ -2877,6 +2918,8 @@ reaper(SIGNAL_ARGS)
 			 */
 			if (!IsBinaryUpgrade && AutoVacuumingActive() && AutoVacPID == 0)
 				AutoVacPID = StartAutoVacLauncher();
+			if (DiskQuotaingActive() && DiskQuotaPID == 0)
+				DiskQuotaPID = StartDiskQuotaLauncher();
 			if (PgArchStartupAllowed() && PgArchPID == 0)
 				PgArchPID = pgarch_start();
 			if (PgStatPID == 0)
@@ -3011,6 +3054,20 @@ reaper(SIGNAL_ARGS)
 								 _("autovacuum launcher process"));
 			continue;
 		}
+		/*
+		 * Was it the disk quota launcher?	Normal exit can be ignored; we'll
+		 * start a new one at the next iteration of the postmaster's main
+		 * loop, if necessary.  Any other exit condition is treated as a
+		 * crash.
+		 */
+		if (pid == DiskQuotaPID)
+		{
+			DiskQuotaPID = 0;
+			if (!EXIT_STATUS_0(exitstatus))
+				HandleChildCrash(pid, exitstatus,
+								 _("diskquota launcher process"));
+			continue;
+		}
 
 		/*
 		 * Was it the archiver?  If so, just try to start a new one; no need
@@ -3272,7 +3329,7 @@ CleanupBackend(int pid,
 
 /*
  * HandleChildCrash -- cleanup after failed backend, bgwriter, checkpointer,
- * walwriter, autovacuum, or background worker.
+ * walwriter, autovacuum, diskquota, or background worker.
  *
  * The objectives here are to clean up our local state about the child
  * process, and to signal all other remaining children to quickdie.
@@ -3477,6 +3534,18 @@ HandleChildCrash(int pid, int exitstatus, const char *procname)
 		signal_child(AutoVacPID, (SendStop ? SIGSTOP : SIGQUIT));
 	}
 
+	/* Take care of the disk quota launcher too */
+	if (pid == DiskQuotaPID)
+		DiskQuotaPID = 0;
+	else if (DiskQuotaPID != 0 && take_action)
+	{
+		ereport(DEBUG2,
+				(errmsg_internal("sending %s to process %d",
+								 (SendStop ? "SIGSTOP" : "SIGQUIT"),
+								 (int) DiskQuotaPID)));
+		signal_child(DiskQuotaPID, (SendStop ? SIGSTOP : SIGQUIT));
+	}
+
 	/*
 	 * Force a power-cycle of the pgarch process too.  (This isn't absolutely
 	 * necessary, but it seems like a good idea for robustness, and it
@@ -3646,15 +3715,15 @@ PostmasterStateMachine(void)
 	{
 		/*
 		 * PM_WAIT_BACKENDS state ends when we have no regular backends
-		 * (including autovac workers), no bgworkers (including unconnected
-		 * ones), and no walwriter, autovac launcher or bgwriter.  If we are
-		 * doing crash recovery or an immediate shutdown then we expect the
-		 * checkpointer to exit as well, otherwise not. The archiver, stats,
-		 * and syslogger processes are disregarded since they are not
-		 * connected to shared memory; we also disregard dead_end children
-		 * here. Walsenders are also disregarded, they will be terminated
-		 * later after writing the checkpoint record, like the archiver
-		 * process.
+		 * (including autovac workers, disk quota workers), no bgworkers
+		 * (including unconnected ones), and no walwriter, autovac launcher,
+		 * disk quota launcher or bgwriter.  If we are doing crash recovery
+		 * or an immediate shutdown then we expect the checkpointer to exit
+		 * as well, otherwise not. The archiver, stats, and syslogger processes
+		 * are disregarded since they are not connected to shared memory; we
+		 * also disregard dead_end children here. Walsenders are also
+		 * disregarded, they will be terminated later after writing the checkpoint
+		 * record, like the archiver process.
 		 */
 		if (CountChildren(BACKEND_TYPE_NORMAL | BACKEND_TYPE_WORKER) == 0 &&
 			StartupPID == 0 &&
@@ -3663,7 +3732,7 @@ PostmasterStateMachine(void)
 			(CheckpointerPID == 0 ||
 			 (!FatalError && Shutdown < ImmediateShutdown)) &&
 			WalWriterPID == 0 &&
-			AutoVacPID == 0)
+			AutoVacPID == 0 && DiskQuotaPID == 0)
 		{
 			if (Shutdown >= ImmediateShutdown || FatalError)
 			{
@@ -3761,6 +3830,7 @@ PostmasterStateMachine(void)
 			Assert(CheckpointerPID == 0);
 			Assert(WalWriterPID == 0);
 			Assert(AutoVacPID == 0);
+			Assert(DiskQuotaPID == 0);
 			/* syslogger is not considered here */
 			pmState = PM_NO_CHILDREN;
 		}
@@ -3950,6 +4020,8 @@ TerminateChildren(int signal)
 		signal_child(WalReceiverPID, signal);
 	if (AutoVacPID != 0)
 		signal_child(AutoVacPID, signal);
+	if (DiskQuotaPID != 0)
+		signal_child(DiskQuotaPID, signal);
 	if (PgArchPID != 0)
 		signal_child(PgArchPID, signal);
 	if (PgStatPID != 0)
@@ -4795,6 +4867,12 @@ SubPostmasterMain(int argc, char *argv[])
 	if (strcmp(argv[1], "--forkavworker") == 0)
 		AutovacuumWorkerIAm();
 
+	/* diskquota needs this set before calling InitProcess */
+	if (strcmp(argv[1], "--forkdqlauncher") == 0)
+		DiskquotaLauncherIAm();
+	if (strcmp(argv[1], "--forkdqworker") == 0)
+		DiskquotaWorkerIAm();
+
 	/*
 	 * Start our win32 signal implementation. This has to be done after we
 	 * read the backend variables, because we need to pick up the signal pipe
@@ -4923,6 +5001,32 @@ SubPostmasterMain(int argc, char *argv[])
 
 		AutoVacWorkerMain(argc - 2, argv + 2);	/* does not return */
 	}
+	if (strcmp(argv[1], "--forkdqlauncher") == 0)
+	{
+		/* Restore basic shared memory pointers */
+		InitShmemAccess(UsedShmemSegAddr);
+
+		/* Need a PGPROC to run CreateSharedMemoryAndSemaphores */
+		InitProcess();
+
+		/* Attach process to shared data structures */
+		CreateSharedMemoryAndSemaphores(false, 0);
+
+		DiskQuotaLauncherMain(argc - 2, argv + 2);	/* does not return */
+	}
+	if (strcmp(argv[1], "--forkdqworker") == 0)
+	{
+		/* Restore basic shared memory pointers */
+		InitShmemAccess(UsedShmemSegAddr);
+
+		/* Need a PGPROC to run CreateSharedMemoryAndSemaphores */
+		InitProcess();
+
+		/* Attach process to shared data structures */
+		CreateSharedMemoryAndSemaphores(false, 0);
+
+		DiskQuotaWorkerMain(argc - 2, argv + 2);	/* does not return */
+	}
 	if (strncmp(argv[1], "--forkbgworker=", 15) == 0)
 	{
 		int			shmem_slot;
@@ -5131,6 +5235,21 @@ sigusr1_handler(SIGNAL_ARGS)
 		start_autovac_launcher = true;
 	}
 
+	if (CheckPostmasterSignal(PMSIGNAL_START_DISKQUOTA_LAUNCHER) &&
+		Shutdown == NoShutdown)
+	{
+		/*
+		 * Start one iteration of the diskquota daemon, even if diskquota
+		 * is nominally not enabled.  This is so we can have an active defense
+		 * against transaction ID wraparound.  We set a flag for the main loop
+		 * to do it rather than trying to do it here --- this is because the
+		 * diskquota process itself may send the signal, and we want to handle
+		 * that by launching another iteration as soon as the current one
+		 * completes.
+		 */
+		start_diskquota_launcher = true;
+	}
+
 	if (CheckPostmasterSignal(PMSIGNAL_START_AUTOVAC_WORKER) &&
 		Shutdown == NoShutdown)
 	{
@@ -5138,6 +5257,13 @@ sigusr1_handler(SIGNAL_ARGS)
 		StartAutovacuumWorker();
 	}
 
+	if (CheckPostmasterSignal(PMSIGNAL_START_DISKQUOTA_WORKER) &&
+		Shutdown == NoShutdown)
+	{
+		/* The diskquota launcher wants us to start a worker process. */
+		StartDiskquotaWorker();
+	}
+
 	if (CheckPostmasterSignal(PMSIGNAL_START_WALRECEIVER))
 	{
 		/* Startup Process wants us to start the walreceiver process. */
@@ -5480,6 +5606,93 @@ StartAutovacuumWorker(void)
 }
 
 /*
+ * StartDiskquotaWorker
+ *		Start an diskquota worker process.
+ *
+ * This function is here because it enters the resulting PID into the
+ * postmaster's private backends list.
+ *
+ * NB -- this code very roughly matches BackendStartup.
+ */
+static void
+StartDiskquotaWorker(void)
+{
+	Backend    *bn;
+	/*
+	 * If not in condition to run a process, don't try, but handle it like a
+	 * fork failure.  This does not normally happen, since the signal is only
+	 * supposed to be sent by diskquota launcher when it's OK to do it, but
+	 * we have to check to avoid race-condition problems during DB state
+	 * changes.
+	 */
+	if (canAcceptConnections() == CAC_OK)
+	{
+		/*
+		 * Compute the cancel key that will be assigned to this session. We
+		 * probably don't need cancel keys for diskquota workers, but we'd
+		 * better have something random in the field to prevent unfriendly
+		 * people from sending cancels to them.
+		 */
+		if (!RandomCancelKey(&MyCancelKey))
+		{
+			ereport(LOG,
+					(errcode(ERRCODE_INTERNAL_ERROR),
+					 errmsg("could not generate random cancel key")));
+			return;
+		}
+
+		bn = (Backend *) malloc(sizeof(Backend));
+		if (bn)
+		{
+			bn->cancel_key = MyCancelKey;
+
+			/* Diskquota workers are not dead_end and need a child slot */
+			bn->dead_end = false;
+			bn->child_slot = MyPMChildSlot = AssignPostmasterChildSlot();
+			bn->bgworker_notify = false;
+			bn->pid = StartDiskQuotaWorker();
+			if (bn->pid > 0)
+			{
+				bn->bkend_type = BACKEND_TYPE_DISKQUOTA;
+				dlist_push_head(&BackendList, &bn->elem);
+#ifdef EXEC_BACKEND
+				ShmemBackendArrayAdd(bn);
+#endif
+				/* all OK */
+				return;
+			}
+
+			/*
+			 * fork failed, fall through to report -- actual error message was
+			 * logged by StartDiskQuotaWorker
+			 */
+			(void) ReleasePostmasterChildSlot(bn->child_slot);
+			free(bn);
+		}
+		else
+			ereport(LOG,
+					(errcode(ERRCODE_OUT_OF_MEMORY),
+					 errmsg("out of memory")));
+	}
+
+	/*
+	 * Report the failure to the launcher, if it's running.  (If it's not, we
+	 * might not even be connected to shared memory, so don't try to call
+	 * DiskQuotaWorkerFailed.)  Note that we also need to signal it so that it
+	 * responds to the condition, but we don't do that here, instead waiting
+	 * for ServerLoop to do it.  This way we avoid a ping-pong signalling in
+	 * quick succession between the diskquota launcher and postmaster in case
+	 * things get ugly.
+	 */
+	if (DiskQuotaPID != 0)
+	{
+		elog(WARNING, "wuhao 11");
+		//DiskQuotaWorkerFailed();
+		dqlauncher_needs_signal = true;
+	}
+}
+
+/*
  * MaybeStartWalReceiver
  *		Start the WAL receiver process, if not running and our state allows.
  */
@@ -5534,7 +5747,7 @@ CreateOptsFile(int argc, char *argv[], char *fullprogname)
  *
  * This reports the number of entries needed in per-child-process arrays
  * (the PMChildFlags array, and if EXEC_BACKEND the ShmemBackendArray).
- * These arrays include regular backends, autovac workers, walsenders
+ * These arrays include regular backends, autovac/diskquota workers, walsenders
  * and background workers, but not special children nor dead_end children.
  * This allows the arrays to have a fixed maximum size, to wit the same
  * too-many-children limit enforced by canAcceptConnections().  The exact value
@@ -5543,8 +5756,8 @@ CreateOptsFile(int argc, char *argv[], char *fullprogname)
 int
 MaxLivePostmasterChildren(void)
 {
-	return 2 * (MaxConnections + autovacuum_max_workers + 1 +
-				max_worker_processes);
+	return 2 * (MaxConnections + autovacuum_max_workers + diskquota_max_workers
+				+ 2 + max_worker_processes);
 }
 
 /*
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 0c86a58..10b2bda 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -27,6 +27,7 @@
 #include "postmaster/autovacuum.h"
 #include "postmaster/bgworker_internals.h"
 #include "postmaster/bgwriter.h"
+#include "postmaster/diskquota.h"
 #include "postmaster/postmaster.h"
 #include "replication/logicallauncher.h"
 #include "replication/slot.h"
@@ -140,6 +141,7 @@ CreateSharedMemoryAndSemaphores(bool makePrivate, int port)
 		size = add_size(size, ProcSignalShmemSize());
 		size = add_size(size, CheckpointerShmemSize());
 		size = add_size(size, AutoVacuumShmemSize());
+		size = add_size(size, DiskQuotaShmemSize());
 		size = add_size(size, ReplicationSlotsShmemSize());
 		size = add_size(size, ReplicationOriginShmemSize());
 		size = add_size(size, WalSndShmemSize());
@@ -256,6 +258,7 @@ CreateSharedMemoryAndSemaphores(bool makePrivate, int port)
 	ProcSignalShmemInit();
 	CheckpointerShmemInit();
 	AutoVacuumShmemInit();
+	DiskQuotaShmemInit();
 	ReplicationSlotsShmemInit();
 	ReplicationOriginShmemInit();
 	WalSndShmemInit();
diff --git a/src/backend/storage/lmgr/lwlocknames.txt b/src/backend/storage/lmgr/lwlocknames.txt
index e6025ec..7d2d2f8 100644
--- a/src/backend/storage/lmgr/lwlocknames.txt
+++ b/src/backend/storage/lmgr/lwlocknames.txt
@@ -50,3 +50,4 @@ OldSnapshotTimeMapLock				42
 BackendRandomLock					43
 LogicalRepWorkerLock				44
 CLogTruncationLock					45
+DiskQuotaLock                       46
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index 6f9aaa5..a8b1800 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -41,6 +41,7 @@
 #include "miscadmin.h"
 #include "pgstat.h"
 #include "postmaster/autovacuum.h"
+#include "postmaster/diskquota.h"
 #include "replication/slot.h"
 #include "replication/syncrep.h"
 #include "storage/condition_variable.h"
@@ -352,7 +353,7 @@ InitProcess(void)
 	 * cleaning up.  (XXX autovac launcher currently doesn't participate in
 	 * this; it probably should.)
 	 */
-	if (IsUnderPostmaster && !IsAutoVacuumLauncherProcess())
+	if (IsUnderPostmaster && !IsAutoVacuumLauncherProcess() && !IsDiskQuotaLauncherProcess())
 		MarkPostmasterChildActive();
 
 	/*
@@ -897,13 +898,17 @@ ProcKill(int code, Datum arg)
 	 * This process is no longer present in shared memory in any meaningful
 	 * way, so tell the postmaster we've cleaned up acceptably well. (XXX
 	 * autovac launcher should be included here someday)
+	 * disk quota do the same as autovac process
 	 */
-	if (IsUnderPostmaster && !IsAutoVacuumLauncherProcess())
+	if (IsUnderPostmaster && !IsAutoVacuumLauncherProcess() && !IsDiskQuotaLauncherProcess())
 		MarkPostmasterChildInactive();
 
 	/* wake autovac launcher if needed -- see comments in FreeWorkerInfo */
 	if (AutovacuumLauncherPid != 0)
 		kill(AutovacuumLauncherPid, SIGUSR2);
+	/* disk quota do the same as autovac process */
+	if (DiskquotaLauncherPid != 0)
+		kill(DiskquotaLauncherPid, SIGUSR2);
 }
 
 /*
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index e4c6e3d..aa933bd 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -56,6 +56,7 @@
 #include "parser/parser.h"
 #include "pg_getopt.h"
 #include "postmaster/autovacuum.h"
+#include "postmaster/diskquota.h"
 #include "postmaster/postmaster.h"
 #include "replication/logicallauncher.h"
 #include "replication/logicalworker.h"
@@ -2956,6 +2957,10 @@ ProcessInterrupts(void)
 			ereport(FATAL,
 					(errcode(ERRCODE_ADMIN_SHUTDOWN),
 					 errmsg("terminating autovacuum process due to administrator command")));
+		else if (IsDiskQuotaWorkerProcess())
+			ereport(FATAL,
+					(errcode(ERRCODE_ADMIN_SHUTDOWN),
+					 errmsg("terminating disk quota process due to administrator command")));
 		else if (IsLogicalWorker())
 			ereport(FATAL,
 					(errcode(ERRCODE_ADMIN_SHUTDOWN),
@@ -3084,6 +3089,13 @@ ProcessInterrupts(void)
 					(errcode(ERRCODE_QUERY_CANCELED),
 					 errmsg("canceling autovacuum task")));
 		}
+		if (IsDiskQuotaWorkerProcess())
+		{
+			LockErrorCleanup();
+			ereport(ERROR,
+					(errcode(ERRCODE_QUERY_CANCELED),
+					 errmsg("canceling diskquota task")));
+		}
 		if (RecoveryConflictPending)
 		{
 			RecoveryConflictPending = false;
diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c
index b5804f6..0d97700 100644
--- a/src/backend/tcop/utility.c
+++ b/src/backend/tcop/utility.c
@@ -34,6 +34,7 @@
 #include "commands/copy.h"
 #include "commands/createas.h"
 #include "commands/dbcommands.h"
+#include "commands/diskquotacmd.h"
 #include "commands/defrem.h"
 #include "commands/discard.h"
 #include "commands/event_trigger.h"
@@ -204,6 +205,8 @@ check_xact_readonly(Node *parsetree)
 		case T_CreateExtensionStmt:
 		case T_AlterExtensionStmt:
 		case T_AlterExtensionContentsStmt:
+		case T_CreateDiskQuotaStmt:
+		case T_DropDiskQuotaStmt:
 		case T_CreateFdwStmt:
 		case T_AlterFdwStmt:
 		case T_CreateForeignServerStmt:
@@ -1385,6 +1388,16 @@ ProcessUtilitySlow(ParseState *pstate,
 														 &secondaryObject);
 				break;
 
+			case T_CreateDiskQuotaStmt:
+				CreateDiskQuota((CreateDiskQuotaStmt *) parsetree);
+				commandCollected = true;
+				break;
+
+			case T_DropDiskQuotaStmt:
+				DropDiskQuota((DropDiskQuotaStmt *) parsetree);
+				commandCollected = true;
+				break;
+
 			case T_CreateFdwStmt:
 				address = CreateForeignDataWrapper((CreateFdwStmt *) parsetree);
 				break;
@@ -2214,6 +2227,14 @@ CreateCommandTag(Node *parsetree)
 			tag = "ALTER EXTENSION";
 			break;
 
+		case T_CreateDiskQuotaStmt:
+			tag = "CREATE DISK QUOTA";
+			break;
+
+		case T_DropDiskQuotaStmt:
+			tag = "DROP DISK QUOTA";
+			break;
+
 		case T_CreateFdwStmt:
 			tag = "CREATE FOREIGN DATA WRAPPER";
 			break;
@@ -2999,6 +3020,11 @@ GetCommandLogLevel(Node *parsetree)
 			lev = LOGSTMT_DDL;
 			break;
 
+		case T_CreateDiskQuotaStmt:
+		case T_DropDiskQuotaStmt:
+			lev = LOGSTMT_DDL;
+			break;
+
 		case T_CreateFdwStmt:
 		case T_AlterFdwStmt:
 		case T_CreateForeignServerStmt:
diff --git a/src/backend/utils/adt/dbsize.c b/src/backend/utils/adt/dbsize.c
index 07e5e78..bdd7646 100644
--- a/src/backend/utils/adt/dbsize.c
+++ b/src/backend/utils/adt/dbsize.c
@@ -25,6 +25,7 @@
 #include "storage/fd.h"
 #include "utils/acl.h"
 #include "utils/builtins.h"
+#include "utils/dbsize.h"
 #include "utils/numeric.h"
 #include "utils/rel.h"
 #include "utils/relfilenodemap.h"
@@ -1020,3 +1021,19 @@ pg_relation_filepath(PG_FUNCTION_ARGS)
 
 	PG_RETURN_TEXT_P(cstring_to_text(path));
 }
+
+int64 calculate_total_relation_size_by_oid(Oid relOid)
+{
+	Relation	rel;
+	int64		size;
+
+	rel = try_relation_open(relOid, AccessShareLock);
+
+	if (rel == NULL)
+		return 0;
+
+	size = calculate_total_relation_size(rel);
+
+	relation_close(rel, AccessShareLock);
+	return size;
+}
diff --git a/src/backend/utils/cache/syscache.c b/src/backend/utils/cache/syscache.c
index 2b38178..38cb643 100644
--- a/src/backend/utils/cache/syscache.c
+++ b/src/backend/utils/cache/syscache.c
@@ -38,6 +38,7 @@
 #include "catalog/pg_default_acl.h"
 #include "catalog/pg_depend.h"
 #include "catalog/pg_description.h"
+#include "catalog/pg_diskquota.h"
 #include "catalog/pg_enum.h"
 #include "catalog/pg_event_trigger.h"
 #include "catalog/pg_foreign_data_wrapper.h"
@@ -375,6 +376,39 @@ static const struct cachedesc cacheinfo[] = {
 		},
 		8
 	},
+	{DiskQuotaRelationId,		/* DISKQUOTANAME */
+		DiskQuotaNameIndexId,
+		1,
+		{
+			Anum_pg_diskquota_quotaname,
+			0,
+			0,
+			0
+		},
+		16
+	},
+	{DiskQuotaRelationId,		/* DISKQUOTAOID */
+		DiskQuotaOidIndexId,
+		1,
+		{
+			ObjectIdAttributeNumber,
+			0,
+			0,
+			0
+		},
+		16
+	},
+	{DiskQuotaRelationId,		/* DISKQUOTATARGETOID */
+		DiskQuotaTargetOidIndexId,
+		1,
+		{
+			Anum_pg_diskquota_quotatargetoid,
+			0,
+			0,
+			0
+		},
+		16
+	},
 	{EnumRelationId,			/* ENUMOID */
 		EnumOidIndexId,
 		1,
diff --git a/src/backend/utils/init/miscinit.c b/src/backend/utils/init/miscinit.c
index 865119d..45954e7 100644
--- a/src/backend/utils/init/miscinit.c
+++ b/src/backend/utils/init/miscinit.c
@@ -38,6 +38,7 @@
 #include "miscadmin.h"
 #include "pgstat.h"
 #include "postmaster/autovacuum.h"
+#include "postmaster/diskquota.h"
 #include "postmaster/postmaster.h"
 #include "storage/fd.h"
 #include "storage/ipc.h"
@@ -687,9 +688,9 @@ InitializeSessionUserIdStandalone(void)
 {
 	/*
 	 * This function should only be called in single-user mode, in autovacuum
-	 * workers, and in background workers.
+	 * workers, disk quota workers and in background workers.
 	 */
-	AssertState(!IsUnderPostmaster || IsAutoVacuumWorkerProcess() || IsBackgroundWorker);
+	AssertState(!IsUnderPostmaster || IsAutoVacuumWorkerProcess() || IsDiskQuotaWorkerProcess() || IsBackgroundWorker);
 
 	/* call only once */
 	AssertState(!OidIsValid(AuthenticatedUserId));
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index 5ef6315..46ffc05 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -38,6 +38,7 @@
 #include "miscadmin.h"
 #include "pgstat.h"
 #include "postmaster/autovacuum.h"
+#include "postmaster/diskquota.h"
 #include "postmaster/postmaster.h"
 #include "replication/walsender.h"
 #include "storage/bufmgr.h"
@@ -319,9 +320,9 @@ CheckMyDatabase(const char *name, bool am_superuser, bool override_allow_connect
 	 * a way to recover from disabling all access to all databases, for
 	 * example "UPDATE pg_database SET datallowconn = false;".
 	 *
-	 * We do not enforce them for autovacuum worker processes either.
+	 * We do not enforce them for autovacuum/diskquota worker processes either.
 	 */
-	if (IsUnderPostmaster && !IsAutoVacuumWorkerProcess())
+	if (IsUnderPostmaster && !IsAutoVacuumWorkerProcess() && !IsDiskQuotaWorkerProcess())
 	{
 		/*
 		 * Check that the database is currently allowing connections.
@@ -503,9 +504,9 @@ InitializeMaxBackends(void)
 {
 	Assert(MaxBackends == 0);
 
-	/* the extra unit accounts for the autovacuum launcher */
-	MaxBackends = MaxConnections + autovacuum_max_workers + 1 +
-		max_worker_processes;
+	/* the extra unit accounts for the autovacuum and diskquota launcher */
+	MaxBackends = MaxConnections + autovacuum_max_workers + diskquota_max_workers
+			+ 2 + max_worker_processes;
 
 	/* internal error because the values were all checked previously */
 	if (MaxBackends > MAX_BACKENDS)
@@ -681,8 +682,8 @@ InitPostgres(const char *in_dbname, Oid dboid, const char *username,
 	 */
 	before_shmem_exit(ShutdownPostgres, 0);
 
-	/* The autovacuum launcher is done here */
-	if (IsAutoVacuumLauncherProcess())
+	/* The autovacuum and diskquota launcher is done here */
+	if (IsAutoVacuumLauncherProcess()|| IsDiskQuotaLauncherProcess())
 	{
 		/* report this backend in the PgBackendStatus array */
 		pgstat_bestart();
@@ -719,10 +720,10 @@ InitPostgres(const char *in_dbname, Oid dboid, const char *username,
 	 * Perform client authentication if necessary, then figure out our
 	 * postgres user ID, and see if we are a superuser.
 	 *
-	 * In standalone mode and in autovacuum worker processes, we use a fixed
+	 * In standalone mode and in autovacuum/diskquota worker processes, we use a fixed
 	 * ID, otherwise we figure it out from the authenticated user name.
 	 */
-	if (bootstrap || IsAutoVacuumWorkerProcess())
+	if (bootstrap || IsAutoVacuumWorkerProcess() || IsDiskQuotaWorkerProcess())
 	{
 		InitializeSessionUserIdStandalone();
 		am_superuser = true;
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 77662af..5e07e39 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -59,6 +59,7 @@
 #include "postmaster/autovacuum.h"
 #include "postmaster/bgworker_internals.h"
 #include "postmaster/bgwriter.h"
+#include "postmaster/diskquota.h"
 #include "postmaster/postmaster.h"
 #include "postmaster/syslogger.h"
 #include "postmaster/walwriter.h"
@@ -184,6 +185,7 @@ static const char *show_tcp_keepalives_count(void);
 static bool check_maxconnections(int *newval, void **extra, GucSource source);
 static bool check_max_worker_processes(int *newval, void **extra, GucSource source);
 static bool check_autovacuum_max_workers(int *newval, void **extra, GucSource source);
+static bool check_diskquota_max_workers(int *newval, void **extra, GucSource source);
 static bool check_autovacuum_work_mem(int *newval, void **extra, GucSource source);
 static bool check_effective_io_concurrency(int *newval, void **extra, GucSource source);
 static void assign_effective_io_concurrency(int newval, void *extra);
@@ -1340,6 +1342,16 @@ static struct config_bool ConfigureNamesBool[] =
 	},
 
 	{
+		{"diskquota", PGC_SIGHUP, AUTOVACUUM,
+			gettext_noop("Starts the diskquota subprocess."),
+			NULL
+		},
+		&diskquota_start_daemon,
+		true,
+		NULL, NULL, NULL
+	},
+
+	{
 		{"trace_notify", PGC_USERSET, DEVELOPER_OPTIONS,
 			gettext_noop("Generates debugging output for LISTEN and NOTIFY."),
 			NULL,
@@ -2875,6 +2887,16 @@ static struct config_int ConfigureNamesInt[] =
 		3, 1, MAX_BACKENDS,
 		check_autovacuum_max_workers, NULL, NULL
 	},
+	{
+		/* see max_connections */
+		{"diskquota_max_workers", PGC_POSTMASTER, DISKQUOTA,
+			gettext_noop("Sets the maximum number of simultaneously running diskquota worker processes."),
+			NULL
+		},
+		&diskquota_max_workers,
+		3, 1, 10,
+		check_diskquota_max_workers, NULL, NULL
+	},
 
 	{
 		{"max_parallel_maintenance_workers", PGC_USERSET, RESOURCES_ASYNCHRONOUS,
@@ -3296,6 +3318,16 @@ static struct config_string ConfigureNamesString[] =
 	},
 
 	{
+		{"diskquota_databases", PGC_USERSET,DISKQUOTA,
+			gettext_noop("Database list for disk quota monitoring."),
+			NULL,
+		},
+		&guc_dq_database_list,
+		"postgres,test0,test1",
+		NULL, NULL, NULL
+	},
+
+	{
 		{"log_line_prefix", PGC_SIGHUP, LOGGING_WHAT,
 			gettext_noop("Controls information prefixed to each log line."),
 			gettext_noop("If blank, no prefix is used.")
@@ -10661,6 +10693,14 @@ check_autovacuum_max_workers(int *newval, void **extra, GucSource source)
 }
 
 static bool
+check_diskquota_max_workers(int *newval, void **extra, GucSource source)
+{
+	if (MaxConnections + *newval + 1 + max_worker_processes > MAX_BACKENDS)
+		return false;
+	return true;
+}
+
+static bool
 check_autovacuum_work_mem(int *newval, void **extra, GucSource source)
 {
 	/*
diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index 47a1a19..bb66078 100644
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -1010,6 +1010,7 @@ static const pgsql_thing_t words_after_create[] = {
 	{"DATABASE", Query_for_list_of_databases},
 	{"DEFAULT PRIVILEGES", NULL, NULL, NULL, THING_NO_CREATE | THING_NO_DROP},
 	{"DICTIONARY", Query_for_list_of_ts_dictionaries, NULL, NULL, THING_NO_SHOW},
+	{"DISK QUOTA", NULL, NULL, NULL},
 	{"DOMAIN", NULL, NULL, &Query_for_list_of_domains},
 	{"EVENT TRIGGER", NULL, NULL, NULL},
 	{"EXTENSION", Query_for_list_of_extensions},
diff --git a/src/include/catalog/indexing.h b/src/include/catalog/indexing.h
index 254fbef..e7065cc 100644
--- a/src/include/catalog/indexing.h
+++ b/src/include/catalog/indexing.h
@@ -360,4 +360,12 @@ DECLARE_UNIQUE_INDEX(pg_subscription_subname_index, 6115, on pg_subscription usi
 DECLARE_UNIQUE_INDEX(pg_subscription_rel_srrelid_srsubid_index, 6117, on pg_subscription_rel using btree(srrelid oid_ops, srsubid oid_ops));
 #define SubscriptionRelSrrelidSrsubidIndexId 6117
 
+DECLARE_UNIQUE_INDEX(pg_diskquota_oid_index, 6124, on pg_diskquota using btree(oid oid_ops));
+#define DiskQuotaOidIndexId 6124
+DECLARE_UNIQUE_INDEX(pg_diskquota_name_index, 6125, on pg_diskquota using btree(quotaname name_ops));
+#define DiskQuotaNameIndexId 6125
+
+DECLARE_UNIQUE_INDEX(pg_diskquota_target_oid_index, 6126, on pg_diskquota using btree(quotatargetoid oid_ops));
+#define DiskQuotaTargetOidIndexId 6126
+
 #endif							/* INDEXING_H */
diff --git a/src/include/catalog/pg_diskquota.h b/src/include/catalog/pg_diskquota.h
new file mode 100644
index 0000000..ff3f2df
--- /dev/null
+++ b/src/include/catalog/pg_diskquota.h
@@ -0,0 +1,60 @@
+/*-------------------------------------------------------------------------
+ *
+ * pg_diskquota.h
+ *	  definition of the "diskquota" system catalog (pg_diskquota)
+ *
+ *
+ * Portions Copyright (c) 1996-2018, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/catalog/pg_diskquota.h
+ *
+ * NOTES
+ *	  The Catalog.pm module reads this file and derives schema
+ *	  information.
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef PG_DISKQUOTA_H
+#define PG_DISKQUOTA_H
+
+#include "catalog/genbki.h"
+#include "catalog/pg_diskquota_d.h"
+
+/* ----------------
+ *	pg_diskquota definition.  cpp turns this into
+ *	typedef struct FormData_pg_diskquota
+ * ----------------
+ */
+
+typedef enum DiskQuotaType
+{
+	DISKQUOTA_TYPE_UNKNOWN = -1,
+
+	DISKQUOTA_TYPE_TABLE = 0,
+
+	DISKQUOTA_TYPE_SCHEMA = 1,
+
+	DISKQUOTA_TYPE_DATABASE = 3,
+
+	DISKQUOTA_TYPE_ROLE = 2,
+
+} DiskQuotaType;
+
+CATALOG(pg_diskquota,6122,DiskQuotaRelationId)
+{
+	NameData	quotaname;		/* diskquota name */
+	int16		quotatype;		/* diskquota type name */
+	Oid			quotatargetoid;	/* diskquota target db object oid*/
+	int32		quotalimit;		/* diskquota size limit in MB*/
+	int32		quotaredzone;	/* diskquota redzone in MB*/
+} FormData_pg_diskquota;
+
+/* ----------------
+ *	Form_pg_diskquota corresponds to a pointer to a tuple with
+ *	the format of pg_diskquota relation.
+ * ----------------
+ */
+typedef FormData_pg_diskquota *Form_pg_diskquota;
+
+#endif							/* PG_DISKQUOTA_H */
diff --git a/src/include/commands/diskquotacmd.h b/src/include/commands/diskquotacmd.h
new file mode 100644
index 0000000..26e62b5
--- /dev/null
+++ b/src/include/commands/diskquotacmd.h
@@ -0,0 +1,30 @@
+/*-------------------------------------------------------------------------
+ *
+ * diskquotacmd.h
+ *	  support for disk quota in different level.
+ *
+ *
+ * Portions Copyright (c) 1996-2018, PostgreSQL Global Development Group
+ *
+ * src/include/command/diskquotacmd.h
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#ifndef POSTGRES_DISKQUOTA_H
+#define POSTGRES_DISKQUOTA_H
+
+#include "postgres.h"
+#include "nodes/parsenodes.h"
+
+
+extern void CreateDiskQuota(CreateDiskQuotaStmt *stmt);
+extern void DropDiskQuota(DropDiskQuotaStmt *stmt);
+
+
+/* catalog access function */
+extern char *GetDiskQuotaName(Oid quotaid);
+extern Oid GetDiskQuotaOidByName(const char *name);
+
+
+#endif //POSTGRES_DISKQUOTA_H
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index cac6ff0..66a18f3 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -387,6 +387,8 @@ typedef enum NodeTag
 	T_AlterEnumStmt,
 	T_AlterTSDictionaryStmt,
 	T_AlterTSConfigurationStmt,
+	T_CreateDiskQuotaStmt,
+	T_DropDiskQuotaStmt,
 	T_CreateFdwStmt,
 	T_AlterFdwStmt,
 	T_CreateForeignServerStmt,
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index 62209a8..e19fc42 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -2207,6 +2207,38 @@ typedef struct AlterExtensionContentsStmt
 } AlterExtensionContentsStmt;
 
 /* ----------------------
+ *		Create/Alter/Drop Disk Quota Statements
+ * ----------------------
+ */
+
+
+typedef enum DiskQuotaDBObjectType
+{
+	DISK_QUOTA_TABLE = 0,		/* Quota in table level */
+	DISK_QUOTA_SCHEMA = 1,     /* Quota in schema level */
+	DISK_QUOTA_USER = 2	    /* Quota in user level */
+} DiskQuotaDBObjectType;
+
+typedef struct CreateDiskQuotaStmt
+{
+	NodeTag		type;
+	char	   *quotaname;		/* Disk Quota's name */
+	DiskQuotaDBObjectType       dbobjtype;			/* Type of Database object for disk quota */
+	RangeVar   *table;          /* The table relation if the DBObjectTyoe is DISK_QUOTA_TABLE*/
+	char       *objname;			/* Database object name */
+	List	   *options;			/* Disk quota options */
+} CreateDiskQuotaStmt;
+
+typedef struct DropDiskQuotaStmt
+{
+	NodeTag		type;
+	char	   *quotaname;		/* Disk Quota's name */
+	bool		missing_ok;
+} DropDiskQuotaStmt;
+
+
+
+/* ----------------------
  *		Create/Alter FOREIGN DATA WRAPPER Statements
  * ----------------------
  */
diff --git a/src/include/parser/kwlist.h b/src/include/parser/kwlist.h
index 23db401..be23eba 100644
--- a/src/include/parser/kwlist.h
+++ b/src/include/parser/kwlist.h
@@ -134,6 +134,7 @@ PG_KEYWORD("detach", DETACH, UNRESERVED_KEYWORD)
 PG_KEYWORD("dictionary", DICTIONARY, UNRESERVED_KEYWORD)
 PG_KEYWORD("disable", DISABLE_P, UNRESERVED_KEYWORD)
 PG_KEYWORD("discard", DISCARD, UNRESERVED_KEYWORD)
+PG_KEYWORD("disk", DISK, UNRESERVED_KEYWORD)
 PG_KEYWORD("distinct", DISTINCT, RESERVED_KEYWORD)
 PG_KEYWORD("do", DO, RESERVED_KEYWORD)
 PG_KEYWORD("document", DOCUMENT_P, UNRESERVED_KEYWORD)
@@ -317,6 +318,7 @@ PG_KEYWORD("procedure", PROCEDURE, UNRESERVED_KEYWORD)
 PG_KEYWORD("procedures", PROCEDURES, UNRESERVED_KEYWORD)
 PG_KEYWORD("program", PROGRAM, UNRESERVED_KEYWORD)
 PG_KEYWORD("publication", PUBLICATION, UNRESERVED_KEYWORD)
+PG_KEYWORD("quota", QUOTA, UNRESERVED_KEYWORD)
 PG_KEYWORD("quote", QUOTE, UNRESERVED_KEYWORD)
 PG_KEYWORD("range", RANGE, UNRESERVED_KEYWORD)
 PG_KEYWORD("read", READ, UNRESERVED_KEYWORD)
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index d59c24a..5caf0b9 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -703,6 +703,8 @@ typedef enum BackendType
 {
 	B_AUTOVAC_LAUNCHER,
 	B_AUTOVAC_WORKER,
+	B_DISKQUOTA_LAUNCHER,
+	B_DISKQUOTA_WORKER,
 	B_BACKEND,
 	B_BG_WORKER,
 	B_BG_WRITER,
@@ -756,6 +758,7 @@ typedef enum
 {
 	WAIT_EVENT_ARCHIVER_MAIN = PG_WAIT_ACTIVITY,
 	WAIT_EVENT_AUTOVACUUM_MAIN,
+	WAIT_EVENT_DISKQUOTA_MAIN,
 	WAIT_EVENT_BGWRITER_HIBERNATE,
 	WAIT_EVENT_BGWRITER_MAIN,
 	WAIT_EVENT_CHECKPOINTER_MAIN,
diff --git a/src/include/postmaster/diskquota.h b/src/include/postmaster/diskquota.h
new file mode 100644
index 0000000..7200c7f
--- /dev/null
+++ b/src/include/postmaster/diskquota.h
@@ -0,0 +1,72 @@
+/*-------------------------------------------------------------------------
+ *
+ * diskquota.h
+ *	  header file for integrated diskquota daemon
+ *
+ *
+ * Portions Copyright (c) 1996-2018, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/postmaster/diskquota.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef DISKQUOTA_H
+#define DISKQUOTA_H
+
+#include "nodes/parsenodes.h"
+#include "storage/block.h"
+
+/* GUC variables */
+extern char *guc_dq_database_list;
+extern bool diskquota_start_daemon;
+extern int	diskquota_max_workers;
+extern int	diskquota_work_mem;
+extern int	diskquota_naptime;
+extern int	diskquota_vac_thresh;
+extern double diskquota_vac_scale;
+extern int	diskquota_anl_thresh;
+extern double diskquota_anl_scale;
+extern int	diskquota_freeze_max_age;
+extern int	diskquota_multixact_freeze_max_age;
+extern int	diskquota_vac_cost_delay;
+extern int	diskquota_vac_cost_limit;
+
+/* diskquota launcher PID, only valid when worker is shutting down */
+extern int	DiskquotaLauncherPid;
+
+extern int	Log_diskquota_min_duration;
+
+/* Status inquiry functions */
+extern bool DiskQuotaingActive(void);
+extern bool IsDiskQuotaLauncherProcess(void);
+extern bool IsDiskQuotaWorkerProcess(void);
+
+#define IsAnyDiskQuotaProcess() \
+	(IsDiskQuotaLauncherProcess() || IsDiskQuotaWorkerProcess())
+
+/* Functions to start diskquota process, called from postmaster */
+extern void diskquota_init(void);
+extern int	StartDiskQuotaLauncher(void);
+extern int	StartDiskQuotaWorker(void);
+
+/* called from postmaster when a worker could not be forked */
+extern void DiskQuotaWorkerFailed(void);
+
+/* diskquota cost-delay balancer */
+extern void DiskQuotaUpdateDelay(void);
+
+#ifdef EXEC_BACKEND
+extern void DiskQuotaLauncherMain(int argc, char *argv[]) pg_attribute_noreturn();
+extern void DiskQuotaWorkerMain(int argc, char *argv[]) pg_attribute_noreturn();
+extern void DiskquotaWorkerIAm(void);
+extern void DiskquotaLauncherIAm(void);
+#endif
+
+
+/* shared memory stuff */
+extern Size DiskQuotaShmemSize(void);
+extern void DiskQuotaShmemInit(void);
+
+extern bool CheckTableQuota(RangeTblEntry *rte);
+#endif							/* DISKQUOTA_H */
diff --git a/src/include/storage/pmsignal.h b/src/include/storage/pmsignal.h
index 0747341..7749247 100644
--- a/src/include/storage/pmsignal.h
+++ b/src/include/storage/pmsignal.h
@@ -41,6 +41,8 @@ typedef enum
 	PMSIGNAL_BACKGROUND_WORKER_CHANGE,	/* background worker state change */
 	PMSIGNAL_START_WALRECEIVER, /* start a walreceiver */
 	PMSIGNAL_ADVANCE_STATE_MACHINE, /* advance postmaster's state machine */
+	PMSIGNAL_START_DISKQUOTA_LAUNCHER,    /* start an diskquota launcher */
+	PMSIGNAL_START_DISKQUOTA_WORKER,  /* start an diskquota worker */
 
 	NUM_PMSIGNALS				/* Must be last value of enum! */
 } PMSignalReason;
diff --git a/src/include/utils/dbsize.h b/src/include/utils/dbsize.h
new file mode 100644
index 0000000..e1286f7
--- /dev/null
+++ b/src/include/utils/dbsize.h
@@ -0,0 +1,18 @@
+/*-------------------------------------------------------------------------
+ *
+ * dbsize.h
+ *	  Definitions for dbsize functions.
+ *
+ *
+ * Portions Copyright (c) 1996-2018, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/utils/dbsize.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef DBSIZE_H
+#define DBSIZE_H
+
+int64 calculate_total_relation_size_by_oid(Oid relOid);
+#endif							/* DBSIZE_H */
diff --git a/src/include/utils/guc_tables.h b/src/include/utils/guc_tables.h
index 668d9ef..53b5aac 100644
--- a/src/include/utils/guc_tables.h
+++ b/src/include/utils/guc_tables.h
@@ -88,6 +88,7 @@ enum config_group
 	STATS_MONITORING,
 	STATS_COLLECTOR,
 	AUTOVACUUM,
+	DISKQUOTA,
 	CLIENT_CONN,
 	CLIENT_CONN_STATEMENT,
 	CLIENT_CONN_LOCALE,
diff --git a/src/include/utils/syscache.h b/src/include/utils/syscache.h
index 4f33358..9de408d 100644
--- a/src/include/utils/syscache.h
+++ b/src/include/utils/syscache.h
@@ -54,6 +54,9 @@ enum SysCacheIdentifier
 	CONVOID,
 	DATABASEOID,
 	DEFACLROLENSPOBJ,
+	DISKQUOTANAME,
+	DISKQUOTAOID,
+	DISKQUOTATARGETOID,
 	ENUMOID,
 	ENUMTYPOIDNAME,
 	EVENTTRIGGERNAME,
diff --git a/src/test/regress/expected/disk_quota.out b/src/test/regress/expected/disk_quota.out
new file mode 100644
index 0000000..2eed59a
--- /dev/null
+++ b/src/test/regress/expected/disk_quota.out
@@ -0,0 +1,47 @@
+--
+-- Regression tests for disk quota
+--
+CREATE SCHEMA test_ns_disk_quota
+       CREATE TABLE quota_1 (
+              a int,
+              b int
+       );
+CREATE TABLE quota_2 (
+        a                       int2,
+        b                       float4
+);
+CREATE ROLE test_user_quota_1;
+CREATE DISK QUOTA abc ON TABLE quota_2 WITH ();
+ERROR:  syntax error at or near ")"
+LINE 1: CREATE DISK QUOTA abc ON TABLE quota_2 WITH ();
+                                                     ^
+CREATE DISK QUOTA abc ON TABLE quota_2 WITH (abc='100MB');
+ERROR:  unknown disk quota option abc
+CREATE DISK QUOTA abc ON TABLE quota_2 WITH (quota='100MB', abc='100MB');
+ERROR:  duplicate quota settings
+CREATE DISK QUOTA abc ON TABLE quota_2 WITH (quota='100MB', quota='200MB');
+ERROR:  duplicate quota settings
+CREATE DISK QUOTA abc ON TABLE quota_2 WITH (quota='100MB');
+CREATE DISK QUOTA abc ON TABLE quota_2 WITH (quota='100MB');
+ERROR:  disk quota entry "abc" already exists
+CREATE DISK QUOTA abc_error ON TABLE quota_1 WITH (quota='100MB');
+ERROR:  could not create disk quota, TABLE 'quota_1' does not exist
+CREATE DISK QUOTA abc_error ON SCHEMA quota_1 WITH (quota='100MB');
+ERROR:  could not create disk quota, SCHEMA 'quota_1' does not exist
+CREATE DISK QUOTA abc_error ON USER quota_1 WITH (quota='100MB');
+ERROR:  could not create disk quota, USER 'quota_1' does not exist
+CREATE DISK QUOTA abc ON SCHEMA test_ns_disk_quota WITH (quota='100MB');
+ERROR:  disk quota entry "abc" already exists
+CREATE DISK QUOTA abc_1 ON SCHEMA test_ns_disk_quota WITH (quota='100MB');
+CREATE DISK QUOTA abc_2 ON TABLE test_ns_disk_quota.quota_1 WITH (quota='100MB');
+CREATE DISK QUOTA abc_3 ON USER test_user_quota_1 WITH (quota='100MB');
+DROP DISK QUOTA abc;
+DROP DISK QUOTA abc_1;
+DROP DISK QUOTA abc_2;
+DROP DISK QUOTA abc_3;
+DROP DISK QUOTA abc_no;
+ERROR:  disk quota abc_no does not exist
+DROP SCHEMA test_ns_disk_quota CASCADE;
+NOTICE:  drop cascades to table test_ns_disk_quota.quota_1
+DROP TABLE quota_2;
+DROP ROLE test_user_quota_1;
diff --git a/src/test/regress/expected/sanity_check.out b/src/test/regress/expected/sanity_check.out
index 48e0508..8f2d3e7 100644
--- a/src/test/regress/expected/sanity_check.out
+++ b/src/test/regress/expected/sanity_check.out
@@ -120,6 +120,8 @@ pg_db_role_setting|t
 pg_default_acl|t
 pg_depend|t
 pg_description|t
+pg_diskquota|t
+pg_diskquota_capability|t
 pg_enum|t
 pg_event_trigger|t
 pg_extension|t
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index 16f979c..e411453 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -116,7 +116,7 @@ test: plancache limit plpgsql copy2 temp domain rangefuncs prepare without_oid c
 # ----------
 # Another group of parallel tests
 # ----------
-test: identity partition_join partition_prune reloptions hash_part indexing partition_aggregate
+test: identity partition_join partition_prune reloptions hash_part indexing partition_aggregate disk_quota
 
 # event triggers cannot run concurrently with any test that runs DDL
 test: event_trigger
diff --git a/src/test/regress/serial_schedule b/src/test/regress/serial_schedule
index 42632be..e444fa0 100644
--- a/src/test/regress/serial_schedule
+++ b/src/test/regress/serial_schedule
@@ -191,3 +191,4 @@ test: partition_aggregate
 test: event_trigger
 test: fast_default
 test: stats
+test: disk_quota
diff --git a/src/test/regress/sql/disk_quota.sql b/src/test/regress/sql/disk_quota.sql
new file mode 100644
index 0000000..06b7db2
--- /dev/null
+++ b/src/test/regress/sql/disk_quota.sql
@@ -0,0 +1,54 @@
+--
+-- Regression tests for disk quota
+--
+
+CREATE SCHEMA test_ns_disk_quota
+
+       CREATE TABLE quota_1 (
+              a int,
+              b int
+       );
+
+CREATE TABLE quota_2 (
+        a                       int2,
+        b                       float4
+);
+
+CREATE ROLE test_user_quota_1;
+
+CREATE DISK QUOTA abc ON TABLE quota_2 WITH ();
+
+CREATE DISK QUOTA abc ON TABLE quota_2 WITH (abc='100MB');
+
+CREATE DISK QUOTA abc ON TABLE quota_2 WITH (quota='100MB', abc='100MB');
+
+CREATE DISK QUOTA abc ON TABLE quota_2 WITH (quota='100MB', quota='200MB');
+
+CREATE DISK QUOTA abc ON TABLE quota_2 WITH (quota='100MB');
+
+CREATE DISK QUOTA abc ON TABLE quota_2 WITH (quota='100MB');
+
+CREATE DISK QUOTA abc_error ON TABLE quota_1 WITH (quota='100MB');
+
+CREATE DISK QUOTA abc_error ON SCHEMA quota_1 WITH (quota='100MB');
+
+CREATE DISK QUOTA abc_error ON USER quota_1 WITH (quota='100MB');
+
+CREATE DISK QUOTA abc ON SCHEMA test_ns_disk_quota WITH (quota='100MB');
+
+CREATE DISK QUOTA abc_1 ON SCHEMA test_ns_disk_quota WITH (quota='100MB');
+
+CREATE DISK QUOTA abc_2 ON TABLE test_ns_disk_quota.quota_1 WITH (quota='100MB');
+
+CREATE DISK QUOTA abc_3 ON USER test_user_quota_1 WITH (quota='100MB');
+
+
+DROP DISK QUOTA abc;
+DROP DISK QUOTA abc_1;
+DROP DISK QUOTA abc_2;
+DROP DISK QUOTA abc_3;
+DROP DISK QUOTA abc_no;
+
+DROP SCHEMA test_ns_disk_quota CASCADE;
+DROP TABLE quota_2;
+DROP ROLE test_user_quota_1;
-- 
1.8.3.1