Monitoring for long running transactions
We had a customer complaining of random data loss for the last 6 months
or so. We eventually tracked it down to a combination of bad coding and
a couple of bugs with the ORM. Basically, 'BEGIN' was being emitted by
the web app when viewing a certain page and 'COMMIT' was never emitted
after that. So once the app would get restarted, all data changes would
be lost. Definitely worst case scenario.
So the question is, what is the best way to monitor for this scenario
going forward? Are there any plugins or community recommended scripts
already made?
Regards,
On 6/4/20 10:00 AM, Samuel Smith wrote:
We had a customer complaining of random data loss for the last 6 months
or so. We eventually tracked it down to a combination of bad coding and
a couple of bugs with the ORM. Basically, 'BEGIN' was being emitted by
the web app when viewing a certain page and 'COMMIT' was never emitted
after that. So once the app would get restarted, all data changes would
be lost. Definitely worst case scenario.So the question is, what is the best way to monitor for this scenario
going forward? Are there any plugins or community recommended scripts
already made?
https://www.postgresql.org/docs/12/monitoring-stats.html#PG-STAT-ACTIVITY-VIEW
"
state text Current overall state of this backend. Possible values are:
active: The backend is executing a query.
idle: The backend is waiting for a new client command.
idle in transaction: The backend is in a transaction, but is not
currently executing a query.
idle in transaction (aborted): This state is similar to idle in
transaction, except one of the statements in the transaction caused an
error.
fastpath function call: The backend is executing a fast-path function.
disabled: This state is reported if track_activities is disabled in
this backend.
"
Regards,
--
Adrian Klaver
adrian.klaver@aklaver.com
On 6/4/20 2:29 PM, Adrian Klaver wrote:
On 6/4/20 10:00 AM, Samuel Smith wrote:
We had a customer complaining of random data loss for the last 6
months or so. We eventually tracked it down to a combination of bad
coding and a couple of bugs with the ORM. Basically, 'BEGIN' was being
emitted by the web app when viewing a certain page and 'COMMIT' was
never emitted after that. So once the app would get restarted, all
data changes would be lost. Definitely worst case scenario.So the question is, what is the best way to monitor for this scenario
going forward? Are there any plugins or community recommended scripts
already made?https://www.postgresql.org/docs/12/monitoring-stats.html#PG-STAT-ACTIVITY-VIEW
"
state text Current overall state of this backend. Possible
values are:active: The backend is executing a query.
idle: The backend is waiting for a new client command.
idle in transaction: The backend is in a transaction, but is not
currently executing a query.idle in transaction (aborted): This state is similar to idle in
transaction, except one of the statements in the transaction caused an
error.fastpath function call: The backend is executing a fast-path function.
disabled: This state is reported if track_activities is disabled in
this backend.
"Regards,
Sorry, I should have clarified that I was aware of the pg_stat_activity
table. That is how we found the problem in the first place. And yes I
could just write a bash script and run it in cron. I just didn't know if
there was a more "official" way to go about this since it is probably a
common monitoring point and/or if something like this was already made.
Regards,
On 6/4/20 12:59 PM, Samuel Smith wrote:
On 6/4/20 2:29 PM, Adrian Klaver wrote:
Sorry, I should have clarified that I was aware of the pg_stat_activity
table. That is how we found the problem in the first place. And yes I
could just write a bash script and run it in cron. I just didn't know if
there was a more "official" way to go about this since it is probably a
common monitoring point and/or if something like this was already made.
I guess it comes down to how much info you want. A list of monitoring tools:
https://wiki.postgresql.org/wiki/Monitoring#pgcenter
Haven't used it but this one:
looks interesting. From here:
Active backends: transaction running too long, idling in transaction,
waiting for locks
Information on scripting above:
https://pgmetrics.io/docs/scripting.html
Regards,
--
Adrian Klaver
adrian.klaver@aklaver.com
## Samuel Smith (pgsql@net153.net):
Sorry, I should have clarified that I was aware of the pg_stat_activity
table. That is how we found the problem in the first place. And yes I
could just write a bash script and run it in cron. I just didn't know if
there was a more "official" way to go about this since it is probably a
common monitoring point and/or if something like this was already made.
This works fine for everything nagios-like:
https://bucardo.org/check_postgres/check_postgres.pl.html#txn_time
Other solutions may be found via the PostgreSQL wiki - as you noted,
this is (should be) monitored, so almost everything which can monitor
PostgreSQL covers this. (which raises the obvious question - did you
monitor your server or did you just use the common "shouting user"
approach? ;))
Regards,
Christoph
--
Spare Space
Samuel Smith schrieb am 04.06.2020 um 21:59:
Sorry, I should have clarified that I was aware of the
pg_stat_activity table. That is how we found the problem in the first
place. And yes I could just write a bash script and run it in cron. I
just didn't know if there was a more "official" way to go about this
since it is probably a common monitoring point and/or if something
like this was already made.
What about setting idle_in_transaction_session_timeout to some reasonably high value (15 minutes?)
Then you would get an error in your application because the connection was terminated and you would learn early about the problem.
Even with monitoring enabled, you would probably still kill those sessions manually as the application most probably can't commit them properly any more.
Thomas