BUG #15299: relation does not exist errors

Started by PG Bug reporting form · 8 messages · pgsql-bugs
#1PG Bug reporting form
noreply@postgresql.org

The following bug has been logged on the website:

Bug reference: 15299
Logged by: Jeff Frost
Email address: jeff@pgexperts.com
PostgreSQL version: 9.5.13
Operating system: Ubuntu 14.04
Description:

We recently upgraded some production servers to 9.5.13 on Saturday
(7/21/2018) and they ran fine for 2 days. On Wednesday morning, we started
seeing some strange issues where Postgres was reporting tons of "relation
does not exist" errors. Here are some snippets from the logs:

2018-07-25 14:33:53.243
GMT,"app_db","webapp",34887,"127.0.0.1:56740",5b587d3b.8847,90363,"PARSE",2018-07-25
13:38:03 GMT,287/3021104,0,ERROR,42P01,"relation ""login_informations"" does
not exist",,,
,,,"SELECT ""login_informations"".* FROM ""login_informations"" WHERE
""login_informations"".""id"" = 735211 LIMIT 1",37,,"sidekiq 4.1.2 procore
[0 of 5 busy]"
2018-07-25 14:33:53.251
GMT,"app_db","webapp",34887,"127.0.0.1:56740",5b587d3b.8847,90364,"PARSE",2018-07-25
13:38:03 GMT,287/3021107,0,ERROR,42P01,"relation ""login_informations"" does
not exist",,,
,,,"SELECT ""login_informations"".* FROM ""login_informations"" WHERE
""login_informations"".""id"" = 735211 LIMIT 1 ",37,,"sidekiq
4.1.2 procore [0 of 5 busy]"
2018-07-25 14:33:53.252
GMT,"app_db","webapp",34887,"127.0.0.1:56740",5b587d3b.8847,90365,"PARSE",2018-07-25
13:38:03 GMT,287/3021109,0,ERROR,42P01,"relation ""login_informations"" does
not exist",,,
,,,"SELECT ""login_informations"".* FROM ""login_informations"" WHERE
""login_informations"".""id"" = 735211 LIMIT 1 ",37,,"sidekiq
4.1.2 procore [0 of 5 busy]"
2018-07-25 14:33:53.258
GMT,"app_db","webapp",34887,"127.0.0.1:56740",5b587d3b.8847,90366,"PARSE",2018-07-25
13:38:03 GMT,287/3021110,0,ERROR,42P01,"relation ""login_informations"" does
not exist",,,
,,,"SELECT ""login_informations"".* FROM ""login_informations"" WHERE
""login_informations"".""id"" = 735211 LIMIT 1",37,,"sidekiq 4.1.2 procore
[0 of 5 busy]"
2018-07-25 14:33:53.295
GMT,"app_db","webapp",34887,"127.0.0.1:56740",5b587d3b.8847,90367,"PARSE",2018-07-25
13:38:03 GMT,287/3021111,0,ERROR,42P01,"relation ""login_informations"" does
not exist",,,
,,,"SELECT ""login_informations"".* FROM ""login_informations"" WHERE
""login_informations"".""id"" = 735211 LIMIT 1 ",37,,"sidekiq
4.1.2 procore [0 of 5 busy]"
2018-07-25 14:33:53.296
GMT,"app_db","webapp",34887,"127.0.0.1:56740",5b587d3b.8847,90368,"PARSE",2018-07-25
13:38:03 GMT,287/3021112,0,ERROR,42P01,"relation ""permission_templates""
does not exist",
,,,,,"SELECT ""permission_templates"".* FROM ""permission_templates"" WHERE
""permission_templates"".""id"" = 137880 LIMIT 1 ",39,,
"sidekiq 4.1.2 procore [0 of 5 busy]"

Connecting to the DB via psql and issuing a \d login_informations showed
that the table was there. Running one of the queries from the logs by hand
worked fine.

Since we use pgbouncer as a connection pooler, we then connected using psql
through pgbouncer on one of the affected hosts and were able to reproduce
the issue, so the theory was that some PostgreSQL backends had lost track of
relations. We stopped/started pgbouncer to allow those backends to exit and
the issue was resolved.
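
The reproduction was roughly the following (a hypothetical sketch, not
quoted from our session; table name and id are from the log snippets
above):

```sql
-- In a psql session opened *through pgbouncer* on an affected host:
SHOW search_path;
-- A non-default value here (e.g. not "$user", public) would explain
-- why unqualified names stopped resolving.

SELECT * FROM login_informations WHERE id = 735211 LIMIT 1;
-- ERROR:  relation "login_informations" does not exist

-- Schema-qualified, the same query works, confirming the table exists:
SELECT * FROM public.login_informations WHERE id = 735211 LIMIT 1;
```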

A little about the DB in case this is helpful:

DB size: 12TB
DB load: 25k xacts/sec
Characterization: 80% read / 20% write
Replication role: primary with 1 direct replica and a few cascaded
replicas

So far we haven't been able to reproduce this synthetically, but let me know if
there is any more info we can pull from the server if this happens again.

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: PG Bug reporting form (#1)
Re: BUG #15299: relation does not exist errors

PG Bug reporting form <noreply@postgresql.org> writes:

We recently upgraded some production servers to 9.5.13 on Saturday
(7/21/2018) and they ran fine for 2 days. On Wednesday morning, we started
seeing some strange issues where Postgres was reporting tons of "relation
does not exist" errors. Here are some snippets from the logs:

2018-07-25 14:33:53.243
GMT,"app_db","webapp",34887,"127.0.0.1:56740",5b587d3b.8847,90363,"PARSE",2018-07-25
13:38:03 GMT,287/3021104,0,ERROR,42P01,"relation ""login_informations"" does
not exist",,,
,,,"SELECT ""login_informations"".* FROM ""login_informations"" WHERE
""login_informations"".""id"" = 735211 LIMIT 1",37,,"sidekiq 4.1.2 procore
[0 of 5 busy]"

Connecting to the DB via psql and issuing a \d login_informations showed
that the table was there. Running one of the queries from the logs by hand
worked fine.

My first thought would be something to do with a restrictive search_path.
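
A minimal sketch of that failure mode (illustrative, not from the report;
the table name is taken from the logs):

```sql
-- With everything in public, unqualified names resolve normally:
SHOW search_path;                              -- "$user", public
SELECT count(*) FROM login_informations;       -- works

-- Restrict the path to pg_catalog only and the same query fails with
-- exactly the error class in the logs (SQLSTATE 42P01):
SET search_path = pg_catalog;
SELECT count(*) FROM login_informations;
-- ERROR:  relation "login_informations" does not exist

-- Schema-qualifying the name still works:
SELECT count(*) FROM public.login_informations;
```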

regards, tom lane

#3Jeff Frost
jeff@pgexperts.com
In reply to: Tom Lane (#2)
Re: BUG #15299: relation does not exist errors

On Jul 26, 2018, at 9:50 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

PG Bug reporting form <noreply@postgresql.org> writes:

We recently upgraded some production servers to 9.5.13 on Saturday
(7/21/2018) and they ran fine for 2 days. On Wednesday morning, we started
seeing some strange issues where Postgres was reporting tons of "relation
does not exist" errors. Here are some snippets from the logs:

2018-07-25 14:33:53.243
GMT,"app_db","webapp",34887,"127.0.0.1:56740",5b587d3b.8847,90363,"PARSE",2018-07-25
13:38:03 GMT,287/3021104,0,ERROR,42P01,"relation ""login_informations"" does
not exist",,,
,,,"SELECT ""login_informations"".* FROM ""login_informations"" WHERE
""login_informations"".""id"" = 735211 LIMIT 1",37,,"sidekiq 4.1.2 procore
[0 of 5 busy]"

Connecting to the DB via psql and issuing a \d login_informations showed
that the table was there. Running one of the queries from the logs by hand
worked fine.

My first thought would be something to do with a restrictive search_path.

Good thought. As far as I can tell we never change it from the default and everything in that DB is in the public schema.

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jeff Frost (#3)
Re: BUG #15299: relation does not exist errors

Jeff Frost <jeff@pgexperts.com> writes:

On Jul 26, 2018, at 9:50 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

My first thought would be something to do with a restrictive search_path.

Good thought. As far as I can tell we never change it from the default and everything in that DB is in the public schema.

Right, but what I'm half-theorizing is that the patches for CVE-2018-1058
affected you somehow. Maybe something "transiently" installed pg_catalog
(only) as search_path and forgot to remove it again. It would be easy
to explain it that way if the errors had been coming out of autovacuum
worker processes, for instance, though your log didn't match that exactly.

If you see it again it would be a good idea to look very closely at what
happened right before the errors started.

regards, tom lane

#5Jeff Frost
jeff@pgexperts.com
In reply to: Tom Lane (#4)
Re: BUG #15299: relation does not exist errors

On Jul 26, 2018, at 12:14 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Jeff Frost <jeff@pgexperts.com> writes:

On Jul 26, 2018, at 9:50 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

My first thought would be something to do with a restrictive search_path.

Good thought. As far as I can tell we never change it from the default and everything in that DB is in the public schema.

Right, but what I'm half-theorizing is that the patches for CVE-2018-1058
affected you somehow. Maybe something "transiently" installed pg_catalog
(only) as search_path and forgot to remove it again. It would be easy
to explain it that way if the errors had been coming out of autovacuum
worker processes, for instance, though your log didn't match that exactly.

If you see it again it would be a good idea to look very closely at what
happened right before the errors started.

Oh, I see what you're saying. Don't the autovacuum processes connect to their own backends which then exit after they're complete though?

I have the logs around, so if you know what I should look for, I'd be happy to dig.

#6Andres Freund
andres@anarazel.de
In reply to: Jeff Frost (#5)
Re: BUG #15299: relation does not exist errors

On 2018-07-26 13:18:33 -0700, Jeff Frost wrote:

Oh, I see what you're saying. Don't the autovacuum processes connect to their own backends which then exit after they're complete though?

Yes, that's right. Creating an extension or calling a function that
internally does a SET search_path seems like a more likely cause.
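
A sketch of the second case (hypothetical function names): a session-level
search_path change made from inside a function body leaks out of the call,
whereas a per-function SET clause does not:

```sql
-- Leaky: set_config with is_local = false survives the function call
-- (and the transaction), poisoning the rest of the session:
CREATE FUNCTION touch_catalog() RETURNS void LANGUAGE plpgsql AS $$
BEGIN
  PERFORM set_config('search_path', 'pg_catalog', false);
END $$;

SELECT touch_catalog();
-- Subsequent unqualified references in this session now fail.

-- Safe pattern: scope the setting to the function itself, so it is
-- restored automatically when the function returns:
CREATE FUNCTION safe_touch() RETURNS void
  LANGUAGE plpgsql SET search_path = pg_catalog AS $$
BEGIN
  NULL;
END $$;
```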

- Andres

#7Jeff Frost
jeff@pgexperts.com
In reply to: Andres Freund (#6)
Re: BUG #15299: relation does not exist errors

On Jul 26, 2018, at 1:23 PM, Andres Freund <andres@anarazel.de> wrote:

On 2018-07-26 13:18:33 -0700, Jeff Frost wrote:

Oh, I see what you're saying. Don't the autovacuum processes connect to their own backends which then exit after they're complete though?

Yes, that's right. Creating an extension or calling a function that
internally does a SET search_path seems like a more likely cause.

I grepped our code and didn't find anywhere that we set search_path, though it's possible some ruby gem we use could be doing it. I'll go spelunking there next!

#8Jeff Frost
jeff@pgexperts.com
In reply to: Jeff Frost (#7)
Re: BUG #15299: relation does not exist errors

On Jul 26, 2018, at 1:53 PM, Jeff Frost <jeff@pgexperts.com> wrote:

On Jul 26, 2018, at 1:23 PM, Andres Freund <andres@anarazel.de> wrote:

On 2018-07-26 13:18:33 -0700, Jeff Frost wrote:

Oh, I see what you're saying. Don't the autovacuum processes connect to their own backends which then exit after they're complete though?

Yes, that's right. Creating an extension or calling a function that
internally does a SET search_path seems like a more likely cause.

I grepped our code and didn't find anywhere that we set search_path, though it's possible some ruby gem we use could be doing it. I'll go spelunking there next!

Wanted to close the loop on this. As usual, Tom was right.

We have a rake task in the application that pulls a production copy of structure.sql, using pg_dump under the covers. Because it uses the same conninfo as the app, it happily goes through pgbouncer. So when that rake task is run, pg_dump connects through pgbouncer and happily sets that restrictive search_path. It then creates the schema-only dump and, upon completion, the connection is returned to the pool as a big fat booby trap for the next application connection.
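
For the archives, the mechanism in sketch form. Since the CVE-2018-1058
fixes, pg_dump empties search_path at session level when it connects, and
under connection pooling that setting outlives pg_dump itself:

```sql
-- What the patched pg_dump runs at session start; with is_local = false
-- the change persists for the whole server session:
SELECT pg_catalog.set_config('search_path', '', false);

-- pgbouncer later hands that same server connection to the app, where
-- an unqualified reference (query taken from the logs above) now fails:
SELECT * FROM login_informations WHERE id = 735211 LIMIT 1;
-- ERROR:  relation "login_informations" does not exist
```

One common mitigation (an assumption about your pgbouncer config, not
something discussed in the thread) is to have pgbouncer reset session
state between clients, e.g. server_reset_query = DISCARD ALL in session
pooling mode, which restores search_path to its default.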