Re: Possible issue with expanded object infrastructure on Postgres 9.6.1

Started by Justin Workman almost 9 years ago, 2 messages

justin@photolynx.com

almost 9 years ago

> It would help to know the data types of the columns involved in this
> query; but just eyeballing it, it doesn't look like it involves any
> array operations, so it's pretty hard to believe that the expanded-object
> code could have gotten invoked intentionally. (The mere presence of
> an array column somewhere in the vicinity would not do that; you'd
> need to invoke an array-ish operation, or at least pass the array into
> a plpgsql function.)
>
> If I had to bet on the basis of this much info, I would bet that the
> parallel-query infrastructure is dropping the ball somewhere and
> transmitting a corrupted datum that accidentally looks like it is
> an expanded-object reference.
>
> If $customer wants a quick fix, I'd suggest seeing whether disabling
> parallel query makes the problem go away. That might be a good first
> step anyway, just to narrow down where the problem lies.
>
> regards, tom lane
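To make the corrupted-datum hypothesis concrete, here is a minimal Python sketch of how the leading bytes of a varlena datum could be classified as an expanded-object reference. It assumes the little-endian layout from PostgreSQL's postgres.h in the 9.6 era: a header byte of exactly 0x01 marks an external pointer, and the next byte is a tag, where 2 (VARTAG_EXPANDED_RO) and 3 (VARTAG_EXPANDED_RW) mean an expanded object and 18 (VARTAG_ONDISK) means an ordinary TOAST pointer. The function names `is_external` and `is_expanded_ref` are hypothetical stand-ins for the C macros, not PostgreSQL APIs. The point is that the check inspects only two bytes, so garbage that happens to start with 0x01 followed by 2 or 3 passes it.

```python
# Hypothetical mirror of the external varlena tags in postgres.h (9.6 era).
VARTAG_INDIRECT = 1
VARTAG_EXPANDED_RO = 2
VARTAG_EXPANDED_RW = 3
VARTAG_ONDISK = 18


def is_external(datum: bytes) -> bool:
    """Assumes little-endian layout: a header byte of exactly 0x01 marks an
    external pointer. The length guard is added for this sketch only."""
    return len(datum) >= 2 and datum[0] == 0x01


def is_expanded_ref(datum: bytes) -> bool:
    """True if these bytes would be read as an expanded-object reference.

    EXPANDED_RO (2) and EXPANDED_RW (3) differ only in the low bit, so
    clearing that bit and comparing with EXPANDED_RO accepts both tags.
    """
    return is_external(datum) and (datum[1] & ~1) == VARTAG_EXPANDED_RO


toast_pointer = bytes([0x01, VARTAG_ONDISK])   # ordinary on-disk TOAST pointer
corrupted = bytes([0x01, 0x03, 0xDE, 0xAD])    # garbage that happens to match
short_inline = bytes([0x07]) + b"hi"           # 1-byte-header inline value "hi"

for name, datum in [("toast pointer", toast_pointer),
                    ("corrupted", corrupted),
                    ("short inline", short_inline)]:
    print(f"{name}: expanded={is_expanded_ref(datum)}")
```

Only the corrupted sample is classified as an expanded reference. In the server, code that accepted such a datum would go on to follow the pointer stored after the tag, which for garbage bytes is one plausible route to a segfault; this is an illustration of the hypothesis, not a confirmed diagnosis.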

Hi Tom,

My name is Justin, and I am $customer as it were. As Peter explained, we
haven't seen the segfaults anymore since disabling parallel queries. This
works as a quick fix and is much appreciated! If you would still like to
get to the bottom of this, I am willing to help out with more information
as needed. My knowledge of PG internals is extremely limited so I don't
know how much help I can be, but we'd like to see this resolved beyond the
quick fix, or at least understand why it happened.

album_photo_assignments.id, album_photo_assignments.album_id,
album_photo_assignments.photo_id and albums.id are all UUID columns.
albums.deleted_at is a timestamp.

Thanks so much for your time,

Justin Workman

Amit Kapila

amit.kapila16@gmail.com

almost 9 years ago

In reply to: Justin Workman (#1)

On Thu, Feb 16, 2017 at 11:45 AM, Justin Workman <justin@photolynx.com> wrote:

> As Peter explained, we haven't seen the segfaults anymore since
> disabling parallel queries.

Thanks, Justin, for the confirmation. Is it possible for you to get a
standalone test case with which we can reproduce the problem?

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)