Fault injection framework

Started by Asim R Palmost 7 years ago4 messageshackers

apraveen@pivotal.io

almost 7 years ago

Hello

Fault injection was discussed a few months ago at PGCon in Ottawa. At
least a few folks showed interest and so I would like to present what we
have been using in Greenplum.

The attached patch set contains the fault injector framework ported to
PostgreSQL master. It provides ability to define points of interest in
backend code and then inject faults at those points from SQL. Also
included is an isolation test to simulate a speculative insert conflict
scenario that was found to be rather cumbersome to implement using advisory
locks, see [1]CAAKRu_a7hbyrk=wveHYhr4LbcRnRCG=yPUVoQYB9YO1CdUBE9Q@mail.gmail.com </messages/by-id/CAAKRu_a7hbyrk=wveHYhr4LbcRnRCG=yPUVoQYB9YO1CdUBE9Q@mail.gmail.com>. The alternative isolation spec using fault injectors seems
much simpler to understand.

Asim

[1]: CAAKRu_a7hbyrk=wveHYhr4LbcRnRCG=yPUVoQYB9YO1CdUBE9Q@mail.gmail.com </messages/by-id/CAAKRu_a7hbyrk=wveHYhr4LbcRnRCG=yPUVoQYB9YO1CdUBE9Q@mail.gmail.com>
</messages/by-id/CAAKRu_a7hbyrk=wveHYhr4LbcRnRCG=yPUVoQYB9YO1CdUBE9Q@mail.gmail.com>

Michael Paquier

michael@paquier.xyz

almost 7 years ago

In reply to: Asim R P (#1)

Re: Fault injection framework

On Thu, Aug 22, 2019 at 07:45:09PM +0530, Asim R P wrote:

Fault injection was discussed a few months ago at PGCon in Ottawa. At
least a few folks showed interest and so I would like to present what we
have been using in Greenplum.

The attached patch set contains the fault injector framework ported to
PostgreSQL master. It provides ability to define points of interest in
backend code and then inject faults at those points from SQL. Also
included is an isolation test to simulate a speculative insert conflict
scenario that was found to be rather cumbersome to implement using advisory
locks, see [1]. The alternative isolation spec using fault injectors seems
much simpler to understand.

You may want to double-check whitespaces in your patch set, and 0002
does not apply because of conflicts in isolationtester.h (my fault!).

0002 is an independent feature, so I would keep it out of the fault
framework for integration. There has been a argument from Alvaro
more convincing than mine about the use of a separate keyword, hence
removing a dependency with steps:
/messages/by-id/20190823153825.GA11405@alvherre.pgsql

It would be good also to have a test case which exercises it, without
the need of the fault framework or its dedicated schedule.

Patches 0003, 0004 and 0005 could just be grouped together, they deal
about the same thing.

My first impressions about this patch is that it is very intrusive.
Could you explain the purpose of am_faultinjector? That's a specific
connection string parameter which can be used similarly to replication
for WAL senders? Couldn't there be an equivalent with a SUSET GUC?
It may be interesting to see which parts of this framework could be
moved into an extension loaded with shared_preload_libraries, one
thing being the shared memory initialization part. At the end it
would be interesting to not have a dependency with a compile-time
flag.

Things like exec_fault_injector_command() need to be more documented.
It is hard to guess what it is being used for.
--
Michael

Asim R P

apraveen@pivotal.io

almost 7 years ago

In reply to: Michael Paquier (#2)

Re: Fault injection framework

On Tue, Aug 27, 2019 at 12:35 PM Michael Paquier <michael@paquier.xyz>
wrote:

You may want to double-check whitespaces in your patch set, and 0002
does not apply because of conflicts in isolationtester.h (my fault!).

I've rebased the patch set against the latest master and tried to resolve
whitespace issues. Apologies for the whitespace conflicts, I tried
resolving them but there is some trailing whitespace in the answer file of
the regress test in v1-0001 that cannot be removed, else the test will fail.

0002 is an independent feature, so I would keep it out of the fault
framework for integration. There has been a argument from Alvaro
more convincing than mine about the use of a separate keyword, hence
removing a dependency with steps:

/messages/by-id/20190823153825.GA11405@alvherre.pgsql

That is a valid point, thank you Alvaro for the feedback. I've changed
0002 so that a step within a permutation can be declared as blocking,
revised patch set is attached.

It would be good also to have a test case which exercises it, without
the need of the fault framework or its dedicated schedule.

It is for this reason that I have not separated patch 0002 out from
faultinjector patch set because the test to demonstrate the blocking
feature uses faults. I need to give more thought to find a test having a
session that needs to block for reasons other than locking. Any pointers
will be very helpful.

My first impressions about this patch is that it is very intrusive.
Could you explain the purpose of am_faultinjector? That's a specific
connection string parameter which can be used similarly to replication
for WAL senders? Couldn't there be an equivalent with a SUSET GUC?

Thank you for the review. Admittedly, the patch set doesn't include a test
to demonstrate am_faultinjector. That is used when a fault needs to be
injected into a remote server, say a standby. And that standby may be
accepting connections or not, depending on if it's operating in hot-standby
mode. Therefore, the am_faultinjector and the connection parameter is used
to identify fault injection requests and allow those to be handled even
when normal user connections are not allowed. Also, such connections do
not need to be associated with a database, they simply need to set the
fault in the shared memory hash table. In that sense, fault injection
connections are treated similar to replication connections.

I was looking into tests under src/test/recovery/t/. Let me write a test
to demonstrate what I'm trying to explain above.

It may be interesting to see which parts of this framework could be
moved into an extension loaded with shared_preload_libraries, one
thing being the shared memory initialization part. At the end it
would be interesting to not have a dependency with a compile-time
flag.

Patch 0001 includes an extension that provides a SQL UDF as a wrapper over
the fault injection interface in backend. Moving the backend part of the
patch also into an extension seems difficult to me. Getting rid of the
compile time dependency is a strong enough advantage to spend more efforts
on this.

Things like exec_fault_injector_command() need to be more documented.
It is hard to guess what it is being used for.

Added a comment to explain things a bit. Hope that helps. And as
mentioned above, I'm working on a test case to demonstrate this feature.

Asim

Asim R P

apraveen@pivotal.io

over 6 years ago

In reply to: Asim R P (#3)

Re: Fault injection framework

On Tue, Aug 27, 2019 at 6:57 PM Asim R P <apraveen@pivotal.io> wrote:

On Tue, Aug 27, 2019 at 12:35 PM Michael Paquier <michael@paquier.xyz>

wrote:

Things like exec_fault_injector_command() need to be more documented.
It is hard to guess what it is being used for.

Added a comment to explain things a bit. Hope that helps. And as

mentioned above, I'm working on a test case to demonstrate this feature.

After learning a bit of Perl, I have a TAP test to share. The test
validates that a commit on master waits until a synchronous standby has
flushed WAL up to or greater than the commit LSN. The test demonstrates
remote faultinjector interface to inject a fault on standby. That's where
exec_fault_injector_command() and related code is exercised.

Patch summary:
0001 - the original fault injector patch up thread with remote fault
injection capability
0006 - TAP test that makes use of the remote fault injector API
Patches 0002-0005 are not included because they are not changed.

Asim

Fault injection framework

Attachments:

Attachments:

Attachments: