Questions about PostgreSQL implementation details
Hello PostgreSQL hackers,
I hope I am posting on the right mailing-list.
I am actually doing a PhD related to relational databases and software engineering.
I use PostgreSQL for my research.
I have a few questions about the internals of PostgreSQL and I think they require experts knowledge.
I could not find documentation about that in the nice PostgreSQL documentation but maybe I missed something? Tell me if it is the case.
My Questions:
Q1. Are PostgreSQL's meta-description tables (such as pg_class) the "reality" concerning the state of the DB or are they just a virtual representation ?
What I would like to know with this question is: would it be possible to implement DDL queries (e.g. CREATE TABLE, DROP TABLE, CREATE VIEW, ALTER TABLE, etc.) as DML queries that modify the meta-data stored in meta-description tables?
For example, something like:
INSERT INTO pg_class [...];
To create a new table (instead of the CREATE TABLE DDL query).
Q1.1 If it is possible, is what is done in reality? I have the feeling that it is not the case and that DDL queries are implemented in C directly.
Q1.2 If it is possible and not done, what is the reason?
--
Q2. Are PostgreSQL's "meta-constraints" (i.e. constraints related to database structure such as "a table can only have a single primary key") implemented in C code or via data constraints on PostgreSQL's meta-description tables?
I guess, again, they are implemented in C code, but I might be wrong.
Q2.1 If they are not implemented via data constraints on meta-description tables, why ?
Q2.2 Is there somewhere in the documentation a list of such "meta-constraints" implemented by PostgreSQL?
Thanks in advance for answering my questions, any help or pointers to existing documentation will be appreciated.
Cheers,
Julien Delplanque
On 12/9/19 7:35 AM, Julien Delplanque wrote:
Hello PostgreSQL hackers,
I hope I am posting on the right mailing-list.
I am actually doing a PhD related to relational databases and software
engineering.I use PostgreSQL for my research.
I have a few questions about the internals of PostgreSQL and I think
they require experts knowledge.I could not find documentation about that in the nice PostgreSQL
documentation but maybe I missed something? Tell me if it is the case.My Questions:
Q1. Are PostgreSQL's meta-description tables (such as pg_class) the
"reality" concerning the state of the DB or are they just a virtual
representation ?
Not all of them are real tables; some of the pg_catalog relations are
views over others of them. But many of them are real tables with C
structs that back them. Take a look in src/include/catalog/pg_class.h
and you'll see the C struct definition, somewhat obscured by some
macros that make it less obvious to people not familiar with the
postgresql sources.
On line 29:
CATALOG(pg_class,1259,RelationRelationId) BKI_BOOTSTRAP
BKI_ROWTYPE_OID(83,RelationRelation_Rowtype_Id) BKI_SCHEMA_MACRO
{
...
}
That's a typedef. See genbki.h where it defines the macro:
#define CATALOG(name,oid,oidmacro) typedef struct CppConcat(FormData_,name)
What I would like to know with this question is: would it be possible to
implement DDL queries (e.g. CREATE TABLE, DROP TABLE, CREATE VIEW, ALTER
TABLE, etc.) as DML queries that modify the meta-data stored in
meta-description tables?For example, something like:
INSERT INTO pg_class [...];
To create a new table (instead of the CREATE TABLE DDL query).
You are not allowed to insert into the pg_class table directly. There
are good reasons for that. Simply inserting a row into this table would
not cause all the infrastructure that backs a table to pop into
existence. So you have to use the DDL commands.
Q1.1 If it is possible, is what is done in reality? I have the feeling
that it is not the case and that DDL queries are implemented in C directly.
See src/backend/commands/tablecmds.c, function DefineRelation.
--
Mark Dilger
Julien Delplanque <julien.delplanque@inria.fr> writes:
I have a few questions about the internals of PostgreSQL and I think they require experts knowledge.
Q1. Are PostgreSQL's meta-description tables (such as pg_class) the "reality" concerning the state of the DB or are they just a virtual representation ?
The system catalogs are reality as far as the effects of DDL go. In the
particular case of pg_class, there is additional reality, which is that
(most) pg_class rows represent one or more data files on-disk. You could
in fact simulate many DDL operations by manual inserts/deletes/updates on
system catalogs; but that would not result in any actions on the data
files, so it falls down for the specific cases of CREATE/DROP TABLE,
CREATE/DROP INDEX, etc.
What I would like to know with this question is: would it be possible to implement DDL queries (e.g. CREATE TABLE, DROP TABLE, CREATE VIEW, ALTER TABLE, etc.) as DML queries that modify the meta-data stored in meta-description tables?
Underneath, many of those operations are just catalog manipulations,
so yes up to the point where you need to do something that impacts
user data storage.
(In practice, getting all the details right from a SQL client
would be a pretty painful thing, so I'm not sure I see the point.
Usually, modifying the PG C code or writing an extension would be
a saner approach to modifying the system's behavior.)
Q2. Are PostgreSQL's "meta-constraints" (i.e. constraints related to database structure such as "a table can only have a single primary key") implemented in C code or via data constraints on PostgreSQL's meta-description tables?
Mostly code. The only real constraints on the system catalogs, in the
sense of something that would reject an ill-advised low-level update,
are the unique indexes. There are also NOT NULL markers on many of
the catalogs' columns, but those are only enforced against manual
SQL updates not updates made by C code. This is all somewhat historical,
I suppose, but it's worked well enough for us.
Thanks in advance for answering my questions, any help or pointers to existing documentation will be appreciated.
There's no substitute for reading the source code. Unlike some other
systems, PG was developed as open source from the beginning, so there
is not much in the way of a "theory of operations manual" or other
separate internals documentation --- people with these sorts of
questions are expected to go read the code. You could perhaps start
by skimming
https://www.postgresql.org/docs/devel/internals.html
and then go look for README files in the portions of the source
tree that interest you.
regards, tom lane
Hi Julien!
On 09/12/2019 17:35, Julien Delplanque wrote:
Q1. Are PostgreSQL's meta-description tables (such as pg_class) the "reality" concerning the state of the DB or are they just a virtual representation ?
Yes, the catalog tables are the authoritative source. The system uses
those tables internally to get the information too.
Some of the pg_* relations are just views over other catalog tables, though.
What I would like to know with this question is: would it be possible to implement DDL queries (e.g. CREATE TABLE, DROP TABLE, CREATE VIEW, ALTER TABLE, etc.) as DML queries that modify the meta-data stored in meta-description tables?
For example, something like:
INSERT INTO pg_class [...];
To create a new table (instead of the CREATE TABLE DDL query).
Q1.1 If it is possible, is what is done in reality? I have the feeling that it is not the case and that DDL queries are implemented in C directly.
Q1.2 If it is possible and not done, what is the reason?
The C code for the DDL commands do some things in addition to modifying
the catalog tables. Notably for CREATE TABLE, it creates the relation
file in the data directory, where all the data is stored. It also
handles locking, invalidating various caches, firing event triggers etc.
Except for creating relation files, those other things happen just in
memory, though.
It is not supported, and please don't do it in production, but you could
try it out. Set "allow_system_table_mods=on", and insert to pg_class,
pg_attribute, etc. See how well it works. Beware that there are internal
caches, called "syscaches", in backends over the catalog tables, so if
you modify them directly, you may need to restart for the changes to
take effect.
Q2. Are PostgreSQL's "meta-constraints" (i.e. constraints related to database structure such as "a table can only have a single primary key") implemented in C code or via data constraints on PostgreSQL's meta-description tables?
I guess, again, they are implemented in C code, but I might be wrong.
In C code.
Q2.1 If they are not implemented via data constraints on meta-description tables, why ?
I think there are some restrictions that cannot easily be represented as
constraints. Also, we've never supported constraints on catalog tables,
so no one's given much thought to what it would look like if we did.
Q2.2 Is there somewhere in the documentation a list of such "meta-constraints" implemented by PostgreSQL?
Hmm, I don't think there is. Your best bet is to just look at the C
code, I'm afraid.
- Heikki
On 12/9/19 7:52 AM, Mark Dilger wrote:
Q1.1 If it is possible, is what is done in reality? I have the feeling
that it is not the case and that DDL queries are implemented in C
directly.See src/backend/commands/tablecmds.c, function DefineRelation.
I realize I could be a bit more helpful, here. For a SQL command
like "CREATE TABLE", you can first look in src/backend/parser/gram.y
for the grammar rule. In this case, searching for CREATE TABLE
leads you to a banner comment around line 3132. The rule for
CreateStmt creates a node of type CreateStmt. That leads you
to a struct of the same name in src/include/nodes/parsenodes.h.
You can see all the fields of that struct, and reconcile those
against what the code in gram.y is doing to populate those
fields. You can then look in src/backend/tcop/utility.c for
the T_CreateStmt nodeTag, and you'll find that it appears in
a few switch statements. One of those in ProcessUtilitySlow
calls DefineRelation along with a bunch of other stuff.
That's how you can trace these commands. Let's take DROP TABLE
as a second example. This one is harder, because "DROP TABLE"
doesn't exist exactly. It's part of a larger grammar production
for DropStmt that includes other DROP commands. All the same,
you can see there is a rule for DropStmt that creates a node of
type DropStmt. Looking in src/include/nodes/parsenodes.h you
can see the struct of the same name. Looking in tcop/utility.c
you see T_DropStmt is handled in a few switch statements, and
seem to run through ProcessUtilitySlow and ExecDropStmt. The
function ExecDropStmt has a switch over stmt->removeType, which
was populated back in gram.y to the value OBJECT_TABLE. That
now serves to select the RemoveRelations function.
This sort of analysis is fairly useful for getting from a SQL
statement to the code in src/backend/commands/ that implements
the guts of that statement.
I hope this helps.
--
Mark Dilger
Mark Dilger <hornschnorter@gmail.com> writes:
[ useful tips about finding the code that implements a SQL command ]
BTW, if it wasn't obvious already, you *really* want to have some kind
of tool that easily finds the definition of a particular C symbol.
You can fall back on "grep -r" or "git grep", but lots of people use
ctags or etags or some other C-aware indexing tool.
regards, tom lane
Thanks for your answer, I will dive into the C code then.
Le 9/12/19 à 16:52, Mark Dilger a écrit :
Not all of them are real tables; some of the pg_catalog relations are
views over others of them. But many of them are real tables with C
structs that back them. Take a look in src/include/catalog/pg_class.h
and you'll see the C struct definition, somewhat obscured by some
macros that make it less obvious to people not familiar with the
postgresql sources.
Indeed, I should have say it in my email, I saw in the documentation
that some of them are views. :-)
Cheers,
Julien
Le 9/12/19 à 16:52, Mark Dilger a écrit :
Show quoted text
On 12/9/19 7:35 AM, Julien Delplanque wrote:
Hello PostgreSQL hackers,
I hope I am posting on the right mailing-list.
I am actually doing a PhD related to relational databases and
software engineering.I use PostgreSQL for my research.
I have a few questions about the internals of PostgreSQL and I think
they require experts knowledge.I could not find documentation about that in the nice PostgreSQL
documentation but maybe I missed something? Tell me if it is the case.My Questions:
Q1. Are PostgreSQL's meta-description tables (such as pg_class) the
"reality" concerning the state of the DB or are they just a virtual
representation ?Not all of them are real tables; some of the pg_catalog relations are
views over others of them. But many of them are real tables with C
structs that back them. Take a look in src/include/catalog/pg_class.h
and you'll see the C struct definition, somewhat obscured by some
macros that make it less obvious to people not familiar with the
postgresql sources.On line 29:
CATALOG(pg_class,1259,RelationRelationId) BKI_BOOTSTRAP
BKI_ROWTYPE_OID(83,RelationRelation_Rowtype_Id) BKI_SCHEMA_MACRO
{
...
}That's a typedef. See genbki.h where it defines the macro:
#define CATALOG(name,oid,oidmacro) typedef struct
CppConcat(FormData_,name)What I would like to know with this question is: would it be possible
to implement DDL queries (e.g. CREATE TABLE, DROP TABLE, CREATE VIEW,
ALTER TABLE, etc.) as DML queries that modify the meta-data stored in
meta-description tables?For example, something like:
INSERT INTO pg_class [...];
To create a new table (instead of the CREATE TABLE DDL query).
You are not allowed to insert into the pg_class table directly. There
are good reasons for that. Simply inserting a row into this table would
not cause all the infrastructure that backs a table to pop into
existence. So you have to use the DDL commands.Q1.1 If it is possible, is what is done in reality? I have the
feeling that it is not the case and that DDL queries are implemented
in C directly.See src/backend/commands/tablecmds.c, function DefineRelation.
On Tue, 10 Dec 2019 at 01:21, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Mark Dilger <hornschnorter@gmail.com> writes:
[ useful tips about finding the code that implements a SQL command ]
BTW, if it wasn't obvious already, you *really* want to have some kind
of tool that easily finds the definition of a particular C symbol.
You can fall back on "grep -r" or "git grep", but lots of people use
ctags or etags or some other C-aware indexing tool.
I strongly recommend cscope with editor integration for your preferred
editor btw.
--
Craig Ringer http://www.2ndQuadrant.com/
2ndQuadrant - PostgreSQL Solutions for the Enterprise
On Mon, 9 Dec 2019 at 23:35, Julien Delplanque <julien.delplanque@inria.fr>
wrote:
Hello PostgreSQL hackers,
I hope I am posting on the right mailing-list.
I am actually doing a PhD related to relational databases and software
engineering.I use PostgreSQL for my research.
I have a few questions about the internals of PostgreSQL and I think they
require experts knowledge.I could not find documentation about that in the nice PostgreSQL
documentation but maybe I missed something? Tell me if it is the case.
There are a bunch of README files in the source tree that concern various
innards of PostgreSQL. They're not always referred to by any comments etc,
so you have to know they exist. They're usually well worth reading, though
it can take a while before you understand enough of PostgreSQL's
architecture for them to make sense...
Try
find src/ -name README\*
Q1. Are PostgreSQL's meta-description tables (such as pg_class) the
"reality" concerning the state of the DB or are they just a virtual
representation ?
That's been largely answered. But I want to point out an important caveat
that isn't obvious to new people: The oid of a relation (pg_class.oid) is
not the same thing as the pg_class.relfilenode, which is usually the base
of the filename of the on-disk storage for the relation. On an idle or new
database most relations are created with an equal oid and relfilename, so
it's easy to think the oid maps to the on-disk name of a relation, but it
doesn't. The relation oid will not change so long as the relation exists,
but the relfilenode may change if the table contents are rewritten, etc.
Additionally, there are special tables that are "relmapped" such that they
don't have a normal relfilenode at all, instead access is indirected via a
separate mapping. IIRC that's mainly necessary so we can bootstrap access
to the catalog tables that tell us how to read the catalogs.
What I would like to know with this question is: would it be possible to
implement DDL queries (e.g. CREATE TABLE, DROP TABLE, CREATE VIEW, ALTER
TABLE, etc.) as DML queries that modify the meta-data stored in
meta-description tables?
Not really.
PostgreSQL has a caching layer - sycache, relcache, catcache - and
invalidation scheme that it relies on. It doesn't execute regular queries
on the system catalogs. It also has simplifying rules around how they are
updated and accessed. See the logic in genam.c etc. Catalogs may also
represent things that aren't just other DB rows - for example, pg_class
entries are associated with files on disk for individual database tables.
You can't just insert into pg_class, pg_attribute, etc and expect that to
safely create a table. Though it's surprising how much you can get away
with by hacking the catalogs if you're very careful and you trick
PostgreSQL into firing appropriate invalidations. I'd quite like to have a
SQL-exposed way to do a forced global cache flush and invalidation for use
in emergency scary catalog hacking situations.
So you can do quite a bit with direct catalog surgery, but it's dangerous
and if you break the database, you get to keep the pieces.
Q1.1 If it is possible, is what is done in reality? I have the feeling that
it is not the case and that DDL queries are implemented in C directly.
Right. See standard_ProcessUtility() and friends.
Q1.2 If it is possible and not done, what is the reason?
Speed - no need to run the full executor. Simplification of catalog access.
Caching and invalidations. Chicken/egg problems: how do you "CREATE TABLE
pg_class"? . Lots more.
Q2. Are PostgreSQL's "meta-constraints" (i.e. constraints related to
database structure such as "a table can only have a single primary key")
implemented in C code or via data constraints on PostgreSQL's
meta-description tables?
System catalogs are not permitted to have CONSTRAINTs (CHECK constraints,
UNIQUE constraints, PRIMARY KEY constraints, FOREIGN KEY constraints, etc).
All such management is done in C level logic with the assistance of the
pg_depend catalog and the relationships it tracks.
Q2.1 If they are not implemented via data constraints on meta-description
tables, why ?
Same as above.
Q2.2 Is there somewhere in the documentation a list of such
"meta-constraints" implemented by PostgreSQL?
Not AFAIK.
Why?
--
Craig Ringer http://www.2ndQuadrant.com/
2ndQuadrant - PostgreSQL Solutions for the Enterprise