Some newbie questions

Started by M2Y over 17 years ago, 6 messages
#1 M2Y
mailtoyahoo@gmail.com

Hello,

Could you please answer the following newbie questions:

What is a good way to start understanding the backend (postgres) code? Is
there any documentation available specifically for developers?

What is the commit log and why is it needed?

Why does a replication solution need log shipping, and why can't we just
ship the transaction statements to a standby node?

to be continued ... ;)

Thanks,
Srinivas

#2 Shane Ambler
pgsql@Sheeky.Biz
In reply to: M2Y (#1)
Re: Some newbie questions

M2Y wrote:

Hello,

Could you please answer the following newbie questions:

What is a good way to start understanding the backend (postgres) code? Is
there any documentation available specifically for developers?

Most of the developer info is within comments in the code itself.
Another place to start is http://www.postgresql.org/developer/coding

What is the commit log and why is it needed?

To achieve ACID (Atomicity, Consistency, Isolation, Durability).
The changes needed to complete a transaction are saved to the commit log
and flushed to disk, then the data files are changed. If the power goes
out during the data file modifications, the commit log can be used to
complete the changes without losing any data.
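
To make that ordering concrete, here is a minimal Python sketch of the
log-first idea. It is a toy, not PostgreSQL's implementation: the TinyStore
class, its method names, and the JSON-lines log format are all invented for
illustration.

```python
import json
import os
import tempfile


class TinyStore:
    """Toy key-value store that logs every change before applying it."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.data = {}  # stands in for the data files

    def log_change(self, key, value):
        # Step 1: append the change to the log and force it to disk.
        with open(self.log_path, "a") as log:
            log.write(json.dumps({"key": key, "value": value}) + "\n")
            log.flush()
            os.fsync(log.fileno())

    def apply_change(self, key, value):
        # Step 2: only now modify the "data files".
        self.data[key] = value

    def recover(self):
        # After a crash, replay the log to redo any change that was
        # logged but never reached the data files.
        if not os.path.exists(self.log_path):
            return
        with open(self.log_path) as log:
            for line in log:
                record = json.loads(line)
                self.data[record["key"]] = record["value"]


log_path = os.path.join(tempfile.mkdtemp(), "toy.log")

store = TinyStore(log_path)
store.log_change("balance", 100)
store.apply_change("balance", 100)
store.log_change("balance", 70)
# Simulated power loss: apply_change("balance", 70) never runs.

restarted = TinyStore(log_path)
restarted.data = {"balance": 100}  # data files as they were at the crash
restarted.recover()
print(restarted.data)  # {'balance': 70}
```

Because the second change reached the log before the simulated crash,
replaying the log after restart brings the data back to the state the
transaction intended.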

Why does a replication solution need log shipping, and why can't we
just ship the transaction statements to a standby node?

It depends on what you wish to achieve. They are two routes to a similar
result.
Log shipping is part of the core code, with plans to let the duplicate
server satisfy select queries.
Statement-based replication is offered by other options such as Slony.

Each has advantages and disadvantages. Transaction logs are part of
normal operation and can be copied to another server in the background
without adding load or delays to the master server.

Statement-based replication has the added complexity of waiting for the
slaves to duplicate the transaction and of handling errors from a slave
applying it. Such solutions also tend to have restrictions when it comes
to replicating DDL changes, because they are implemented as triggers that
fire on INSERT/UPDATE, not on CREATE/ALTER TABLE.

--

Shane Ambler
pgSQL (at) Sheeky (dot) Biz

Get Sheeky @ http://Sheeky.Biz

#3 M2Y
mailtoyahoo@gmail.com
In reply to: Shane Ambler (#2)
Re: Some newbie questions

Thanks Shane for your response...

On Sep 7, 11:52 pm, pgsql@Sheeky.Biz (Shane Ambler) wrote:

What is a good way to start understanding the backend (postgres) code? Is
there any documentation available specifically for developers?

Most of the developer info is within comments in the code itself.
Another place to start is http://www.postgresql.org/developer/coding

I have seen this link. But I am looking (or hoping) for a design doc
or technical doc that details what is happening under the hood, as it
would save a lot of time in catching up with the mainstream.

What is the commit log and why is it needed?

To achieve ACID (Atomicity, Consistency, Isolation, Durability).
The changes needed to complete a transaction are saved to the commit log
and flushed to disk, then the data files are changed. If the power goes
out during the data file modifications, the commit log can be used to
complete the changes without losing any data.

This, I think, is the transaction log, or XLog. My question is about CLog,
in which there are two bits per transaction denoting the status of that
transaction. Since there is the XLog, from which we can determine what
changes we have to redo and undo, what is the need for this CLog?

Why does a replication solution need log shipping, and why can't we
just ship the transaction statements to a standby node?

It depends on what you wish to achieve. They are two routes to a similar
result.
Log shipping is part of the core code, with plans to let the duplicate
server satisfy select queries.
Statement-based replication is offered by other options such as Slony.

Each has advantages and disadvantages. Transaction logs are part of
normal operation and can be copied to another server in the background
without adding load or delays to the master server.

Statement-based replication has the added complexity of waiting for the
slaves to duplicate the transaction and of handling errors from a slave
applying it. Such solutions also tend to have restrictions when it comes
to replicating DDL changes, because they are implemented as triggers that
fire on INSERT/UPDATE, not on CREATE/ALTER TABLE.

I agree. Assuming that both master and backup are running the same
version of the server and both are in sync, why can't we just send the
command statements to the standby in the main backend loop (before
parsing) and let the standby ignore SELECT-type statements?

I am a beginner ... please forgive my ignorance and please provide some
clarity so that I can understand the system better.

Thanks,
Srinivas

#4 Tom Lane
tgl@sss.pgh.pa.us
In reply to: M2Y (#3)
Re: Some newbie questions

M2Y <mailtoyahoo@gmail.com> writes:

On Sep 7, 11:52 pm, pgsql@Sheeky.Biz (Shane Ambler) wrote:

Most of the developer info is within comments in the code itself.
Another place to start is http://www.postgresql.org/developer/coding

I have seen this link. But I am looking (or hoping) for a design doc
or technical doc that details what is happening under the hood, as it
would save a lot of time in catching up with the mainstream.

Well, you should certainly not neglect
http://developer.postgresql.org/pgdocs/postgres/internals.html

Also note that many subtrees of the source code contain README files
with assorted overview material.

regards, tom lane

#5 Alvaro Herrera
alvherre@commandprompt.com
In reply to: M2Y (#3)
Re: Some newbie questions

M2Y wrote:

On Sep 7, 11:52 pm, pgsql@Sheeky.Biz (Shane Ambler) wrote:

What is a good way to start understanding the backend (postgres) code? Is
there any documentation available specifically for developers?

What is the commit log and why is it needed?

To achieve ACID (Atomicity, Consistency, Isolation, Durability).
The changes needed to complete a transaction are saved to the commit log
and flushed to disk, then the data files are changed. If the power goes
out during the data file modifications, the commit log can be used to
complete the changes without losing any data.

This, I think, is the transaction log, or XLog. My question is about CLog,
in which there are two bits per transaction denoting the status of that
transaction. Since there is the XLog, from which we can determine what
changes we have to redo and undo, what is the need for this CLog?

That's correct -- what Shane is describing is the transaction log
(usually known here as WAL). However, this xlog is write-only (except in
the case of a crash); clog is read-write, and must be fast to query
since it's used very frequently to determine visibility of each tuple.
Perhaps what you need to read is the chapter on our MVCC implementation,
which relies heavily on clog.
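
As a rough illustration of why such a structure is cheap to query, the
Python sketch below packs two status bits per transaction into a byte
array, four transactions per byte. ToyClog and its methods are hypothetical
names; the real clog is managed in pages and is considerably more involved.
The numeric status codes are meant to mirror the constants in PostgreSQL's
clog.h, but treat them as an assumption.

```python
# Two-bit status codes (assumed to match PostgreSQL's clog.h constants).
IN_PROGRESS = 0x00
COMMITTED = 0x01
ABORTED = 0x02
SUB_COMMITTED = 0x03

BITS_PER_XACT = 2
XACTS_PER_BYTE = 4


class ToyClog:
    """Toy commit log: two status bits per transaction id (xid)."""

    def __init__(self, max_xid):
        # One byte holds four transactions; round up to cover 0..max_xid.
        self.bits = bytearray((max_xid + XACTS_PER_BYTE) // XACTS_PER_BYTE)

    def set_status(self, xid, status):
        byte_no, slot = divmod(xid, XACTS_PER_BYTE)
        shift = slot * BITS_PER_XACT
        # Clear the two bits for this xid, then write the new status.
        cleared = self.bits[byte_no] & ~(0x03 << shift) & 0xFF
        self.bits[byte_no] = cleared | (status << shift)

    def get_status(self, xid):
        byte_no, slot = divmod(xid, XACTS_PER_BYTE)
        return (self.bits[byte_no] >> (slot * BITS_PER_XACT)) & 0x03


clog = ToyClog(max_xid=7)
clog.set_status(5, COMMITTED)
clog.set_status(6, ABORTED)

# A (heavily simplified) visibility test only needs a constant-time lookup
# of the status of the transaction that wrote the tuple.
print(clog.get_status(5) == COMMITTED)  # True
print(clog.get_status(6) == COMMITTED)  # False
```

Answering "did the transaction that wrote this tuple commit?" from the
sequential xlog would mean scanning log records, whereas here it is a
single array lookup.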

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

#6 Greg Smith
gsmith@gregsmith.com
In reply to: M2Y (#1)
Re: Some newbie questions

On Sun, 7 Sep 2008, M2Y wrote:

Why does a replication solution need log shipping, and why can't we just
ship the transaction statements to a standby node?

Here's one of the classic examples of why that doesn't work:

create table x (d decimal);
insert into x values (random());

If you execute those same statements on two different nodes, they will end
up with different values for the random number and therefore the nodes
won't match anymore. A similar issue shows up if you use functions that
check the current system time, which will be slightly different between the
two: even if the clocks are perfectly synced, by the time the standby
receives the transaction it will be later than the original.
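
The same divergence can be simulated outside the database. In the Python
sketch below, the Node class and its methods are hypothetical stand-ins for
two servers, each with its own random state. Replaying the statement gives
each node a different value, while shipping the value the master actually
stored keeps them identical.

```python
import random


class Node:
    """Toy server holding the rows of table x and its own random state."""

    def __init__(self, seed):
        self.rng = random.Random(seed)
        self.rows = []

    def execute_insert_random(self):
        # Stands in for: insert into x values (random());
        value = self.rng.random()
        self.rows.append(value)
        return value

    def apply_row(self, value):
        # Stands in for replaying a log record that carries the stored value.
        self.rows.append(value)


# Statement shipping: both nodes run the statement themselves.
master, standby = Node(seed=1), Node(seed=2)
master.execute_insert_random()
standby.execute_insert_random()
statement_match = master.rows == standby.rows

# Log shipping: the standby applies the value the master produced.
master2, standby2 = Node(seed=1), Node(seed=2)
standby2.apply_row(master2.execute_insert_random())
log_match = master2.rows == standby2.rows

print(statement_match, log_match)  # False True
```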

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD