How different is AWS-RDS postgres?

Started by Rob Sargent · almost 5 years ago · 19 messages · general
#1Rob Sargent
robjsargent@gmail.com

I have what purports to be Postgres 12 (PostgreSQL 12.5 on
x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat
4.8.5-11), 64-bit) running on RDS.  I'm testing a new release of my
service with an admittedly large data set (which may be swamping my AWS
bandwidth).  But my app logs tell me a table was created and loaded (and
later, read).

From my tomcat log

00:09:58.567 [https-jsse-nio-10.0.2.28-15002-exec-3] DEBUG edu.utah.camplab.jx.PayloadFromMux - STAGING TABLE CREATED: bulk."rjs_GEV15_15_ff3ba73c_d927_431c_bd29_9687a47f1594"
00:09:58.585 [https-jsse-nio-10.0.2.28-15002-exec-3] INFO edu.utah.camplab.jx.PayloadFromMux - ff3ba73c-d927-431c-bd29-9687a47f1594: started COPY work at 1621987798585
00:09:58.586 [ff3ba73c-d927-431c-bd29-9687a47f1594] INFO edu.utah.camplab.jx.PayloadWriterThread - bulk."rjs_GEV15_15_ff3ba73c_d927_431c_bd29_9687a47f1594": Begin bulk copy segment
00:10:01.597 [https-jsse-nio-10.0.2.28-15002-exec-11] ERROR edu.utah.camplab.servlet.PayloadSaveServlet - received payload
00:10:01.598 [https-jsse-nio-10.0.2.28-15002-exec-11] DEBUG org.jooq.impl.DefaultConnectionProvider - setting auto commit : false
00:10:01.599 [https-jsse-nio-10.0.2.28-15002-exec-11] DEBUG edu.utah.camplab.jx.PayloadFromMux - 074d449b-c3ba-499f-83e3-f48427fe0156: start transaction at 1621987801599
00:10:01.599 [https-jsse-nio-10.0.2.28-15002-exec-11] DEBUG org.jooq.impl.DefaultConnectionProvider - set savepoint
00:10:07.511 [ff3ba73c-d927-431c-bd29-9687a47f1594] INFO edu.utah.camplab.jx.PayloadWriterThread - bulk transfer of 2528447 took 8.925s
00:10:07.511 [ff3ba73c-d927-431c-bd29-9687a47f1594] DEBUG edu.utah.camplab.jx.PayloadWriterThread - staged in 8925 ms
00:10:07.567 [https-jsse-nio-10.0.2.28-15002-exec-3] INFO edu.utah.camplab.jx.PayloadFromMux - ff3ba73c-d927-431c-bd29-9687a47f1594: Total segment save took 9486 ms
00:10:07.567 [https-jsse-nio-10.0.2.28-15002-exec-3] INFO edu.utah.camplab.jx.AbstractPayload - ff3ba73c-d927-431c-bd29-9687a47f1594: closing process ff3ba73c-d927-431c-bd29-9687a47f1594
00:10:07.608 [https-jsse-nio-10.0.2.28-15002-exec-3] DEBUG org.jooq.impl.DefaultConnectionProvider - release savepoint
00:10:07.609 [https-jsse-nio-10.0.2.28-15002-exec-3] DEBUG edu.utah.camplab.jx.PayloadFromMux - ff3ba73c-d927-431c-bd29-9687a47f1594: end transaction at 1621987807609

Which claims to have written 2,528,447 records in roughly 9 seconds into
the newly created table
bulk."rjs_GEV15_15_ff3ba73c_d927_431c_bd29_9687a47f1594". Nice.

However, no such table exists, though later processing renames it by
appending "_done" to the name (being careful to remain under 64 chars).

My middleware does receive an exception notice

00:10:55.101 [https-jsse-nio-10.0.2.28-15002-exec-3] ERROR edu.utah.camplab.jx.AbstractPayload - run ff3ba73c-d927-431c-bd29-9687a47f1594: Exception from db write: SQL [insert into segment select * from bulk."rjs_GEV15_15_ff3ba73c_d927_431c_bd29_9687a47f1594" as s where s.probandset_id >= 'a0000000-0000-0000-0000-000000000000' and s.probandset_id < 'b0000000-0000-0000-0000-000000000000' ]; An I/O error occurred while sending to the backend.: {}

which confirms(?) that the table /was/ there and had been read from 10
times prior, since I copy from this temporary, bulk-loaded table to the
actual target in 16 chunks by diddling the first hex digit from 0
through f.  Here zero through nine apparently didn't have a problem.
These 16 inserts are in a single transaction, separate from the bulk
copy.  (There are a dozen more of these files that were processed and
disappeared.)
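
For illustration, the chunking scheme described above as a minimal plain-JDBC sketch. The table and column names are taken from the logs in this thread; the method shape and commit handling are assumptions, not the app's actual code:

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.util.UUID;

    public class ChunkedCopy {
        // Copy the bulk-loaded staging table into "segment" in 16 chunks,
        // partitioned on the first hex digit of probandset_id.  Assumes a
        // connection with autocommit off, so all 16 inserts share one
        // transaction, separate from the earlier bulk COPY.
        static void copyInChunks(Connection conn, String stagingTable) throws Exception {
            String sql = "insert into segment select * from bulk.\"" + stagingTable
                       + "\" as s where s.probandset_id >= ? and s.probandset_id < ?";
            String tail = "0000000-0000-0000-0000-000000000000";
            String hex = "0123456789abcdef";
            try (PreparedStatement ps = conn.prepareStatement(sql)) {
                for (int i = 0; i < 16; i++) {
                    ps.setObject(1, UUID.fromString(hex.charAt(i) + tail));
                    // Upper bound is the next hex digit; the last chunk is capped at
                    // the all-f uuid (which the strict '<' excludes -- close enough
                    // for a sketch).
                    ps.setObject(2, i < 15
                            ? UUID.fromString(hex.charAt(i + 1) + tail)
                            : UUID.fromString("ffffffff-ffff-ffff-ffff-ffffffffffff"));
                    ps.executeUpdate();
                }
            }
            conn.commit();
        }
    }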

My question is:
Should I be surprised that the initial bulk-loaded table is nowhere to
be seen, given the thumbs-up from the logs?  Is this frailty inherent in
AWS/RDS infrastructure?

Since this is an academic exercise, I have minimal AWS support, which
has yet to chime in on this matter. My access to the logs is um, er,
constrained.

#2Ron
ronljohnsonjr@gmail.com
In reply to: Rob Sargent (#1)
Re: How different is AWS-RDS postgres?

The big differences I notice are:

1. "postgres" is not a superuser,
2. viewing logs is a hassle.

Otherwise, they're really similar.  We've pumped about 6TB of data into an
instance, and it's been rock solid.  JBoss is quite happy, and there haven't
been any problems.

--
Angular momentum makes the world go 'round.

#3Rob Sargent
robjsargent@gmail.com
In reply to: Ron (#2)
Re: How different is AWS-RDS postgres?

Good to hear.
I pushed this same data set into our previous server (version 10), so I
was surprised it blew up this time, though in that case these separate
files were more spread out in arrival time.  Nothing in the (available)
logs other than lots of auto-vacuuming and the suggestion to increase
WAL size (which I thought a little surprising).  I really would like to
know just what went south, but that isn't obvious yet - OOM? Network?
Disk full? I can't find any smoking gun on the AWS/RDS pages.

rjs

#4Ron
ronljohnsonjr@gmail.com
In reply to: Rob Sargent (#3)
Re: How different is AWS-RDS postgres?

The difficulty in viewing logs needs to be emphasized.

--
Angular momentum makes the world go 'round.

#5Ian Harding
harding.ian@gmail.com
In reply to: Ron (#4)
Re: How different is AWS-RDS postgres?

There is an option to send the logs to CloudWatch, which makes it less
awful to look at them.

#6Rob Sargent
robjsargent@gmail.com
In reply to: Ian Harding (#5)
Re: How different is AWS-RDS postgres?

I have that, but precious little of interest there. Lots of autovac, a smattering of hints to increase WAL size!? I have yet to spot anything which corresponds to the "I/O failure" which the middleware gets.

I don't have query logging on, but I do see the errors from my psql-session fat-fingering.

As to the logs UI, the search is pretty feeble; I don't understand why there are four channels of logs; and the graphs are wearing the same rose-coloured glasses as the logs. And 24 hours without a peep from AWS support. (I don't call mailing me what I sent them "contact".)

My guess right now is that the entire tomcat connection pool is in a single transaction? That's the only way the tables could disappear. I am making separate calls to JDBC getConnection() for each doPost.
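
For reference, a minimal sketch of the per-doPost pattern described here, with an explicit commit so a pool reclaim can't silently undo finished work. The names and structure are assumptions, not the actual servlet code:

    import java.sql.Connection;
    import javax.sql.DataSource;

    public class PerRequest {
        // Hypothetical per-request shape: each doPost borrows its own pooled
        // connection and commits its own work.  If the pool decides a borrowed
        // connection is "abandoned" before commit, everything rolls back.
        static void handlePayload(DataSource pool) throws Exception {
            try (Connection conn = pool.getConnection()) {
                conn.setAutoCommit(false);
                try {
                    // ... create staging table, bulk COPY, chunked inserts ...
                    conn.commit();
                } catch (Exception e) {
                    conn.rollback();
                    throw e;
                }
            }
        }
    }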

#7Philip Semanchuk
philip@americanefficient.com
In reply to: Rob Sargent (#6)
Re: How different is AWS-RDS postgres?

We used Aurora (AWS hosted Postgres) and I agree that Cloudwatch search is pretty limited. I wrote a Python script to download cloudwatch logs to my laptop where I can use proper tools like grep to search them. It’s attached to this email. It’s hacky but not too terrible. I hope you find it useful.

Cheers
Philip

Attachments:

get_aws_logs.py (text/x-python-script)
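
Philip's script itself is attached above rather than inlined. For a rough idea of the same approach in Java terms, a sketch against the AWS SDK v2 CloudWatch Logs client might look like the following; the log group name is an assumption based on the /aws/rds/instance/<name>/postgresql groups RDS uses when log export is enabled:

    import software.amazon.awssdk.services.cloudwatchlogs.CloudWatchLogsClient;
    import software.amazon.awssdk.services.cloudwatchlogs.model.GetLogEventsRequest;
    import software.amazon.awssdk.services.cloudwatchlogs.model.GetLogEventsResponse;
    import software.amazon.awssdk.services.cloudwatchlogs.model.OutputLogEvent;

    // Sketch only: dump one log stream to stdout so it can be grepped locally.
    public class DumpLogs {
        public static void main(String[] args) {
            try (CloudWatchLogsClient logs = CloudWatchLogsClient.create()) {
                String token = null;
                while (true) {
                    GetLogEventsRequest.Builder req = GetLogEventsRequest.builder()
                            .logGroupName("/aws/rds/instance/mydb/postgresql") // assumption
                            .logStreamName(args[0])
                            .startFromHead(true);
                    if (token != null) {
                        req = req.nextToken(token);
                    }
                    GetLogEventsResponse resp = logs.getLogEvents(req.build());
                    for (OutputLogEvent e : resp.events()) {
                        System.out.println(e.message());
                    }
                    // getLogEvents pages forward forever; stop once the token stops moving.
                    if (resp.nextForwardToken().equals(token)) {
                        break;
                    }
                    token = resp.nextForwardToken();
                }
            }
        }
    }
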
#8Rob Sargent
robjsargent@gmail.com
In reply to: Philip Semanchuk (#7)
Re: How different is AWS-RDS postgres?

Thank you.


#9Rob Sargent
robjsargent@gmail.com
In reply to: Philip Semanchuk (#7)
Re: How different is AWS-RDS postgres?

I may have found another difference: JDBC connections are not logged?!
I just reproduced my report, and the CloudWatch view of the logs shows
some psql interaction from before and after the test, but no mention of
losing 7.5M records.
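
Whether connections show up in the Postgres log at all is governed by log_connections, which on RDS is changed through the instance's DB parameter group rather than postgresql.conf. A minimal probe of the relevant settings (connection URL left as an argument):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class ShowLogging {
        public static void main(String[] args) throws Exception {
            // args[0]: a jdbc:postgresql:// URL for the RDS instance
            try (Connection c = DriverManager.getConnection(args[0]);
                 Statement st = c.createStatement();
                 ResultSet rs = st.executeQuery(
                         "select name, setting from pg_settings where name in "
                         + "('log_connections','log_disconnections','log_statement')")) {
                while (rs.next()) {
                    System.out.println(rs.getString(1) + " = " + rs.getString(2));
                }
            }
        }
    }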

#10Sam Gendler
sgendler@ideasculptor.com
In reply to: Rob Sargent (#9)
Re: How different is AWS-RDS postgres?

The same JDBC connection that is resulting in lost data? Sounds to me like
you aren't connecting to the DB you think you are connecting to.


#11Rob Sargent
robjsargent@gmail.com
In reply to: Sam Gendler (#10)
Re: How different is AWS-RDS postgres?

I almost wish that were true.

However, looking at AWS "Performance Insights" I see the SQL statements
generated by my app being executed on the server.  Not coincidentally,
this is from the "Top SQL (10)/Load by waits" view. Now that view does
NOT, in point of fact, name the database in which the SQL is running,
but the rest of my environment pretty much rules out silently switching
tracks on myself.  I have to read from the correct database, using a
UUID, to get data to analyze, then save the results of the analysis
back. I'm using my wonderful webapp to run this and I've successfully
analyzed and stored results for small starting data sets.

I just noticed the "Top database" tab on that screen:  there is only one
and it's the correct one.

I've reproduced the behaviour.  I'm pretty convinced it's a size thing,
but which part of the system is being maxed out is not clear.  Not CPU,
but that's the only resource "Performance Insights" mentions
(suggesting this UI wasn't designed by a database person).

The loss of the staging table is most spectacular.  It was filled from a
file which has 7.5M records. It's clear in the tomcat logs that it was
created and written to, one record read.  Then the "top sql" happens:

insert into segment select * from bulk."rjs_GEV15_01_c9e224ca_85d2_40b4_ad46_327cfb9f0ac6" as s where s.probandset_id >= ? and s.probandset_id < ?

the "bulk" table is the staging table. The params are filled in with a
quasi-uuid which grab roughly 1/16th of the data.  In the stack trace on
my tomcat server I get

Caused by: org.jooq.exception.DataAccessException: SQL [insert into segment select * from bulk."rjs_GEV15_01_c9e224ca_85d2_40b4_ad46_327cfb9f0ac6" as s where s.probandset_id >= '30000000-0000-0000-0000-000000000000' and s.probandset_id < '40000000-0000-0000-0000-000000000000' ]; An I/O error occurred while sending to the backend.

So this would have been the fourth such insert statement read from that
staging table.

That table is not deleted by my code.  It is renamed after the last
insert into segment, by appending "_done" to the name.  But that table,
by either name, is nowhere to be seen on the server.

For those scoring at home, the trouble in the tomcat log starts with

20:02:11.452 [https-jsse-nio-10.0.2.28-15002-exec-7] INFO edu.utah.camplab.jx.PayloadFromMux - bulk."rjs_GEV15_01_c9e224ca_85d2_40b4_ad46_327cfb9f0ac6": Begin transfer from bulk to segment
27-May-2021 20:02:50.338 FINE [Catalina-utility-2] org.apache.catalina.session.ManagerBase.processExpires Start expire sessions StandardManager at 1622145770338 sessioncount 0
27-May-2021 20:02:50.338 FINE [Catalina-utility-2] org.apache.catalina.session.ManagerBase.processExpires End expire sessions StandardManager processingTime 0 expired sessions: 0
27-May-2021 20:02:50.476 FINE [Catalina-utility-2] org.apache.catalina.session.ManagerBase.processExpires Start expire sessions StandardManager at 1622145770476 sessioncount 1
27-May-2021 20:02:50.476 FINE [Catalina-utility-2] org.apache.catalina.session.ManagerBase.processExpires End expire sessions StandardManager processingTime 0 expired sessions: 0
27-May-2021 20:02:51.847 WARNING [Tomcat JDBC Pool Cleaner[1731185718:1622133381802]] org.apache.tomcat.jdbc.pool.ConnectionPool.abandon Connection has been abandoned PooledConnection[org.postgresql.jdbc.PgConnection@1622ead9]:java.lang.Exception
        at org.apache.tomcat.jdbc.pool.ConnectionPool.getThreadDump(ConnectionPool.java:1163)
        at org.apache.tomcat.jdbc.pool.ConnectionPool.borrowConnection(ConnectionPool.java:816)
        at org.apache.tomcat.jdbc.pool.ConnectionPool.borrowConnection(ConnectionPool.java:660)
        at org.apache.tomcat.jdbc.pool.ConnectionPool.getConnection(ConnectionPool.java:198)
        at org.apache.tomcat.jdbc.pool.DataSourceProxy.getConnection(DataSourceProxy.java:132)
        at org.apache.tomcat.jdbc.pool.DataSourceProxy.getConnection(DataSourceProxy.java:90)
        at edu.utah.camplab.servlet.AbstractSGSServlet.getDbConn(AbstractSGSServlet.java:123)
        at edu.utah.camplab.servlet.AbstractSGSServlet.getDbConn(AbstractSGSServlet.java:105)
        at edu.utah.camplab.servlet.PayloadSaveServlet.doPost(PayloadSaveServlet.java:50)
....
20:02:51.854 [https-jsse-nio-10.0.2.28-15002-exec-7] DEBUG org.jooq.impl.DefaultConnectionProvider - rollback to savepoint
20:02:51.855 [https-jsse-nio-10.0.2.28-15002-exec-7] DEBUG org.jooq.impl.DefaultConnectionProvider - rollback
20:02:51.855 [https-jsse-nio-10.0.2.28-15002-exec-7] DEBUG org.jooq.impl.DefaultConnectionProvider - setting auto commit : true
20:02:51.855 [https-jsse-nio-10.0.2.28-15002-exec-7] ERROR edu.utah.camplab.jx.AbstractPayload - run c9e224ca-85d2-40b4-ad46-327cfb9f0ac6: Exception from db write: SQL [insert into segment select * from bulk."rjs_GEV15_01_c9e224ca_85d2_40b4_ad46_327cfb9f0ac6" as s where s.probandset_id >= '30000000-0000-0000-0000-000000000000' and s.probandset_id < '40000000-0000-0000-0000-000000000000' ]; An I/O error occurred while sending to the backend.: {}
org.jooq.exception.DataAccessException: SQL [insert into segment select * from bulk."rjs_GEV15_01_c9e224ca_85d2_40b4_ad46_327cfb9f0ac6" as s where s.probandset_id >= '30000000-0000-0000-0000-000000000000' and s.probandset_id < '40000000-0000-0000-0000-000000000000' ]; An I/O error occurred while sending to the backend.
        at org.jooq_3.14.7.POSTGRES.debug(Unknown Source)
        at org.jooq.impl.Tools.translate(Tools.java:2880)
        at org.jooq.impl.DefaultExecuteContext.sqlException(DefaultExecuteContext.java:757)
        at org.jooq.impl.AbstractQuery.execute(AbstractQuery.java:389)
        at edu.utah.camplab.jx.PayloadFromMux.insertAllSegments(PayloadFromMux.java:177)
        at edu.utah.camplab.jx.PayloadFromMux.lambda$completeSegmentSave$3(PayloadFromMux.java:165)
        at org.jooq.impl.DefaultDSLContext$3.run(DefaultDSLContext.java:615)
        at org.jooq.impl.DefaultDSLContext$3.run(DefaultDSLContext.java:612)
        at org.jooq.impl.DefaultDSLContext.lambda$transactionResult0$0(DefaultDSLContext.java:543)
        at org.jooq.impl.Tools$35$1.block(Tools.java:5203)
        at java.base/java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3137)
        at org.jooq.impl.Tools$35.get(Tools.java:5200)
        at org.jooq.impl.DefaultDSLContext.transactionResult0(DefaultDSLContext.java:595)
        at org.jooq.impl.DefaultDSLContext.transactionResult(DefaultDSLContext.java:512)
        at org.jooq.impl.DefaultDSLContext.transaction(DefaultDSLContext.java:612)
        at edu.utah.camplab.jx.PayloadFromMux.completeSegmentSave(PayloadFromMux.java:164)
        at edu.utah.camplab.jx.PayloadFromMux.writedb(PayloadFromMux.java:45)
        at edu.utah.camplab.jx.AbstractPayload.lambda$write$0(AbstractPayload.java:77)
        at org.jooq.impl.DefaultDSLContext$3.run(DefaultDSLContext.java:615)

...
        Suppressed: org.jooq.exception.DataAccessException: Cannot rollback transaction
                at org.jooq.impl.DefaultConnectionProvider.rollback(DefaultConnectionProvider.java:142)
                at org.jooq.impl.DefaultTransactionProvider.rollback(DefaultTransactionProvider.java:223)
                at org.jooq.impl.DefaultDSLContext.lambda$transactionResult0$0(DefaultDSLContext.java:567)
                ... 45 common frames omitted
        Caused by: java.sql.SQLException: Connection has already been closed.
                at org.apache.tomcat.jdbc.pool.ProxyConnection.invoke(ProxyConnection.java:117)
                at org.apache.tomcat.jdbc.pool.JdbcInterceptor.invoke(JdbcInterceptor.java:108)
                at org.apache.tomcat.jdbc.pool.interceptor.AbstractCreateStatementInterceptor.invoke(AbstractCreateStatementInterceptor.java:79)
                at org.apache.tomcat.jdbc.pool.JdbcInterceptor.invoke(JdbcInterceptor.java:108)
                at org.apache.tomcat.jdbc.pool.DisposableConnectionFacade.invoke(DisposableConnectionFacade.java:81)
                at com.sun.proxy.$Proxy4.rollback(Unknown Source)
                at org.jooq.impl.DefaultConnectionProvider.rollback(DefaultConnectionProvider.java:139)
                ... 47 common frames omitted
        Suppressed: org.jooq.exception.DataAccessException: Cannot set autoCommit
                at org.jooq.impl.DefaultConnectionProvider.setAutoCommit(DefaultConnectionProvider.java:222)
                at org.jooq.impl.DefaultTransactionProvider.brace(DefaultTransactionProvider.java:246)
                at org.jooq.impl.DefaultTransactionProvider.rollback(DefaultTransactionProvider.java:229)
                at org.jooq.impl.DefaultDSLContext.lambda$transactionResult0$0(DefaultDSLContext.java:567)
                ... 33 common frames omitted
        Caused by: java.sql.SQLException: Connection has already been closed.
                at org.apache.tomcat.jdbc.pool.ProxyConnection.invoke(ProxyConnection.java:117)
                at org.apache.tomcat.jdbc.pool.JdbcInterceptor.invoke(JdbcInterceptor.java:108)
                at org.apache.tomcat.jdbc.pool.interceptor.AbstractCreateStatementInterceptor.invoke(AbstractCreateStatementInterceptor.java:79)
                at org.apache.tomcat.jdbc.pool.JdbcInterceptor.invoke(JdbcInterceptor.java:108)
                at org.apache.tomcat.jdbc.pool.DisposableConnectionFacade.invoke(DisposableConnectionFacade.java:81)
                at com.sun.proxy.$Proxy4.setAutoCommit(Unknown Source)
                at org.jooq.impl.DefaultConnectionProvider.setAutoCommit(DefaultConnectionProvider.java:219)
                ... 36 common frames omitted
Caused by: org.postgresql.util.PSQLException: An I/O error occurred while sending to the backend.
        at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:350)
        at org.postgresql.jdbc.PgStatement.executeInternal(PgStatement.java:481)
        at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:401)
        at org.postgresql.jdbc.PgPreparedStatement.executeWithFlags(PgPreparedStatement.java:164)
        at org.postgresql.jdbc.PgPreparedStatement.execute(PgPreparedStatement.java:153)
        at jdk.internal.reflect.GeneratedMethodAccessor25.invoke(Unknown Source)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:564)
        at org.apache.tomcat.jdbc.pool.StatementFacade$StatementProxy.invoke(StatementFacade.java:114)
        at com.sun.proxy.$Proxy14.execute(Unknown Source)
        at org.jooq.tools.jdbc.DefaultPreparedStatement.execute(DefaultPreparedStatement.java:214)
        at org.jooq.impl.AbstractQuery.execute(AbstractQuery.java:458)
        at org.jooq.impl.AbstractQuery.execute(AbstractQuery.java:375)
        ... 50 common frames omitted
Caused by: javax.net.ssl.SSLException: Socket closed
        at java.base/sun.security.ssl.Alert.createSSLException(Alert.java:127)
        at java.base/sun.security.ssl.TransportContext.fatal(TransportContext.java:369)
        at java.base/sun.security.ssl.TransportContext.fatal(TransportContext.java:312)
        at java.base/sun.security.ssl.TransportContext.fatal(TransportContext.java:307)
        at java.base/sun.security.ssl.SSLSocketImpl.handleException(SSLSocketImpl.java:1680)
        at java.base/sun.security.ssl.SSLSocketImpl$AppInputStream.read(SSLSocketImpl.java:1054)
        at org.postgresql.core.VisibleBufferedInputStream.readMore(VisibleBufferedInputStream.java:161)
        at org.postgresql.core.VisibleBufferedInputStream.ensureBytes(VisibleBufferedInputStream.java:128)

#12Ron
ronljohnsonjr@gmail.com
In reply to: Rob Sargent (#11)
Re: How different is AWS-RDS postgres?

Is there any way to replicate this in psql?

--
Angular momentum makes the world go 'round.

#13Rob Sargent
robjsargent@gmail.com
In reply to: Ron (#12)
Re: How different is AWS-RDS postgres?

I have the json file which feeds the staging table, and I have code
which is designed to load such files.  I suppose an sql file with those
16 insert statements would approximate the app's calls.  Let me give
that a shot.


#14Sam Gendler
sgendler@ideasculptor.com
In reply to: Rob Sargent (#11)
Re: How different is AWS-RDS postgres?

That sure looks like something is causing your connection to have a
transaction rollback. I haven't worked in Java in far too long, but it
seems like your connection pool is under the impression your connection
was abandoned, so it reclaims it and rolls back the transaction, which
would explain why you aren't seeing the table when all is said and
done - all of the work is being undone at the end.

One possibility, based on the catalina log you provided - if you have
either end of the connection set up to automatically close idle
connections after a period of time, then you might receive a closed
connection from the pool, which will just error out when you attempt to
run a query. In that case, you need to set up your connection pool to
test a connection before it returns it to the requester. Usually
something as simple as "select 2" will be sufficient to determine if
the database connection is open. I can just about guarantee that your
connection pool has a parameter which allows you to specify a query to
execute when a connection is requested.
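
Sam's suggestion in concrete form: a minimal sketch of the relevant tomcat-jdbc settings (URL and values are placeholders, not the app's actual configuration):

    import org.apache.tomcat.jdbc.pool.DataSource;
    import org.apache.tomcat.jdbc.pool.PoolProperties;

    public class PoolConfig {
        static DataSource configure() {
            PoolProperties p = new PoolProperties();
            p.setUrl("jdbc:postgresql://host:5432/db");   // placeholder
            p.setDriverClassName("org.postgresql.Driver");
            p.setValidationQuery("SELECT 1");  // run on borrow when testOnBorrow is set
            p.setTestOnBorrow(true);
            p.setRemoveAbandoned(true);        // the setting that reclaims "abandoned" connections
            p.setRemoveAbandonedTimeout(60);   // seconds a connection may be held before reclaim
            return new DataSource(p);
        }
    }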


#15Sam Gendler
sgendler@ideasculptor.com
In reply to: Rob Sargent (#13)
Re: How different is AWS-RDS postgres?

The docs for the connection pool you are using are at
https://tomcat.apache.org/tomcat-7.0-doc/jdbc-pool.html, and a
screenshot of the relevant configuration params is attached.

Attachments:

Screen Shot 2021-05-27 at 3.29.47 PM.png (image/png)
#16Rob Sargent
robjsargent@gmail.com
In reply to: Sam Gendler (#14)
Re: How different is AWS-RDS postgres?

Well I /was/ doing

contextResource.setProperty("validationQuery", "SELECT 1");

but I see that I lost that when I switched to using a properties file.
Thanks for pointing me there.

The loop of 16 insert statements is in a single transaction, on a single
connection, so I'm not sure who's choking first.  Is the connection idle
after the I/O error, or is the I/O error from a dead connection?  (Small
disclaimer:  there is no catalina involved here, just an embedded tomcat
instance.)

#17Sam Gendler
sgendler@ideasculptor.com
In reply to: Rob Sargent (#16)
Re: How different is AWS-RDS postgres?

Unless something has changed in recent years, the core servlet engine of
tomcat IS catalina. Embedded tomcat is embedded catalina. It looks like
the I/O error is a result of attempting to send a query on an already dead
connection. I'd look for something that is limiting how long a connection
can be open - either an explicit or default value for a timeout in the
connection pool or on the server side. If you don't get the same behaviour
when running against a database locally, I'd definitely look at the default
settings in RDS. It may be automatically closing connections if they are
idle for even a brief period.


#18Rob Sargent
robjsargent@gmail.com
In reply to: Sam Gendler (#17)
Re: How different is AWS-RDS postgres?

Not important who does what with whom re Catalina/Tomcat ;)

I will indeed re-examine timeouts and such, as inserting hundreds of thousands of records is not instantaneous by any stretch. Tomcat is the new kid on my block: prior to this release I managed a naked port with a Selector, and that had no trouble with this same test data. My biggest concern is the lack of server-side indications. I also need to confirm that the table create/bulk-copy and the update to the target table (segment) are in separate transactions. Doesn't look like it, but they're supposed to be. I want the data written to the db in bulk, then come back round and load to the final table in chunks.


#19Rob Sargent
robjsargent@gmail.com
In reply to: Sam Gendler (#17)
Re: How different is AWS-RDS postgres?

Found it.  Though there are indeed separate transactions for the bulk
copy v. the target update, I had missed that there is an over-arching
transaction in play, and of course when things go south, /all/ things go
south.

I get away with this, for now, by bumping the tomcat idle timeout to an
hour, which is serious overkill.
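
A sketch of that separation in jOOQ terms, for illustration only (method names are hypothetical). By default jOOQ turns nested transaction() calls into savepoints on the enclosing transaction, so keeping the two steps as top-level transactions lets the staging table survive a failure in the chunked inserts:

    import org.jooq.DSLContext;

    public class TwoPhaseSave {
        static void saveBulkThenChunks(DSLContext ctx) {
            // First top-level transaction: create the staging table and COPY
            // the payload into it.  Commits independently.
            ctx.transaction(cfg -> {
                // ... create bulk."rjs_..." and bulk-copy into it ...
            });
            // Second top-level transaction: the 16 chunked inserts.  If this
            // fails, the committed staging table is still there to inspect.
            ctx.transaction(cfg -> {
                // ... insert into segment select * from bulk."rjs_..." ...
            });
        }
    }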