initdb issue on 64-bit Windows - (Was: [pgsql-packagers] PG 9.6beta2 tarballs are ready)
On Fri, Jun 24, 2016 at 2:14 AM, Umair Shahid <umair.shahid@2ndquadrant.com>
wrote:
---------- Forwarded message ----------
From: Tom Lane <tgl@sss.pgh.pa.us>
Date: Thu, Jun 23, 2016 at 9:32 PM
Subject: Re: [pgsql-packagers] PG 9.6beta2 tarballs are ready
To: Magnus Hagander <magnus@hagander.net>
Cc: Umair Shahid <umair.shahid@2ndquadrant.com>, Dave Page <dpage@postgresql.org>,
PostgreSQL Packagers <pgsql-packagers@postgresql.org>
Magnus Hagander <magnus@hagander.net> writes:
That makes more sense as the joinrel stuff *has* been changed between the
two betas. I'm sure someone who's touched that code (Tom?) can comment on
that part..
It still makes little sense to me, as the previous reports say that the
problem happened during bootstrap, and the planner does not run
during bootstrap.
Could we get a look at debug_query_string in the coredump, to possibly
narrow down where the crash is really happening?
Moving thread to -hackers ...
debug_query_string is
"INSERT INTO pg_description SELECT t.objoid, c.oid, t.objsubid,
t.description FROM tmp_pg_description t, pg_class c WHERE c.relname =
t.classname;"
Happening in "setup_description"
It's still strange that it doesn't affect woodlouse.
Or any of the other Windows critters...
regards, tom lane
--
Umair Shahid
2ndQuadrant - The PostgreSQL Support Company
http://www.2ndQuadrant.com/
On 24 June 2016 at 05:17, Umair Shahid <umair.shahid@gmail.com> wrote:
On Fri, Jun 24, 2016 at 2:14 AM, Umair Shahid <umair.shahid@2ndquadrant.com> wrote:
---------- Forwarded message ----------
From: Tom Lane <tgl@sss.pgh.pa.us>
Date: Thu, Jun 23, 2016 at 9:32 PM
Subject: Re: [pgsql-packagers] PG 9.6beta2 tarballs are ready
To: Magnus Hagander <magnus@hagander.net>
Cc: Umair Shahid <umair.shahid@2ndquadrant.com>, Dave Page <dpage@postgresql.org>,
PostgreSQL Packagers <pgsql-packagers@postgresql.org>
Magnus Hagander <magnus@hagander.net> writes:
That makes more sense as the joinrel stuff *has* been changed between the
two betas. I'm sure someone who's touched that code (Tom?) can comment on
that part..
It still makes little sense to me, as the previous reports say that the
problem happened during bootstrap, and the planner does not run
during bootstrap.
Could we get a look at debug_query_string in the coredump, to possibly
narrow down where the crash is really happening?
Moving thread to -hackers ...
debug_query_string is
"INSERT INTO pg_description SELECT t.objoid, c.oid, t.objsubid,
t.description FROM tmp_pg_description t, pg_class c WHERE c.relname =
t.classname;"
Happening in "setup_description"
I was helping Haroon with this last night. I don't have access to the
original thread and he's not around so I don't know how much he said. I'll
repeat our findings here.
During debugging I found that:
* A VS 2013 build (performed by Haroon and copied to the test host) crashes
consistently with the reported symptoms - "performing post-bootstrap
initialization ... child process was terminated by exception 0xC0000005"
* The issue doesn't happen in a VS 2015 build done on the test host
* I couldn't use just-in-time debugging because the restricted execution
token setup isolated the process. For the same reason, breakpoints stop
working in initdb.c after line 3557.
* To get a backtrace, I had to:
  * Launch a VS x86 command prompt
  * devenv /debugexe bin\initdb.exe -D test
  * Set a breakpoint in initdb.c:3557 and initdb.c:3307
  * Run
  * When it traps at get_restricted_token(), manually move the execution
    pointer over the setup of the restricted execution token by dragging &
    dropping the yellow instruction pointer arrow. Yes, really. Or, y'know,
    comment it out and rebuild, but I was working with a supplied binary.
  * Continue until next breakpoint
  * Launch process explorer and find the pid of the postgres child process
  * Debug->attach to process, attach to the child postgres. This doesn't
    detach the parent, VS does multiprocess debugging.
  * Continue execution
  * VS will trap on the child when it crashes
* It is an access violation (segfault) in postgres.exe when attempting to
read memory at 0xFFFFFFFFFFFFFFFF in calc_joinrel_size_estimate() at
costsize.c:3940
    fkselec = get_foreign_key_join_selectivity(root,
                                               outer_rel->relids,
                                               inner_rel->relids,
                                               sjinfo,
                                               &restrictlist);
with debug_query_string:
0x0000000009bf6140 "INSERT INTO pg_description SELECT t.objoid, c.oid,
t.objsubid, t.description FROM tmp_pg_description t, pg_class c WHERE
c.relname = t.classname;\n"
Backtrace:
Exception thrown at 0x00000001401A5A81 in postgres.exe: 0xC0000005: Access violation reading location 0xFFFFFFFFFFFFFFFF.
postgres.exe!calc_joinrel_size_estimate(PlannerInfo * root, RelOptInfo * outer_rel, RelOptInfo * inner_rel, double outer_rows, double inner_rows, SpecialJoinInfo * sjinfo, List * restrictlist) Line 3944 C
postgres.exe!set_joinrel_size_estimates(PlannerInfo * root, RelOptInfo * rel, RelOptInfo * outer_rel, RelOptInfo * inner_rel, SpecialJoinInfo * sjinfo, List * restrictlist) Line 3852 C
postgres.exe!build_join_rel(PlannerInfo * root, Bitmapset * joinrelids, RelOptInfo * outer_rel, RelOptInfo * inner_rel, SpecialJoinInfo * sjinfo, List * * restrictlist_ptr) Line 521 C
postgres.exe!make_join_rel(PlannerInfo * root, RelOptInfo * rel1, RelOptInfo * rel2) Line 721 C
postgres.exe!make_rels_by_clause_joins(PlannerInfo * root, RelOptInfo * old_rel, ListCell * other_rels) Line 266 C
postgres.exe!join_search_one_level(PlannerInfo * root, int level) Line 69 C
postgres.exe!standard_join_search(PlannerInfo * root, int levels_needed, List * initial_rels) Line 2172 C
postgres.exe!query_planner(PlannerInfo * root, List * tlist, void(*)(PlannerInfo *, void *) qp_callback, void * qp_extra) Line 255 C
postgres.exe!grouping_planner(PlannerInfo * root, char inheritance_update, double tuple_fraction) Line 1695 C
postgres.exe!subquery_planner(PlannerGlobal * glob, Query * parse, PlannerInfo * parent_root, char hasRecursion, double tuple_fraction) Line 775 C
postgres.exe!standard_planner(Query * parse, int cursorOptions, ParamListInfoData * boundParams) Line 312 C
postgres.exe!pg_plan_query(Query * querytree, int cursorOptions, ParamListInfoData * boundParams) Line 800 C
postgres.exe!exec_simple_query(const char * query_string) Line 1023 C
postgres.exe!PostgresMain(int argc, char * * argv, const char * dbname, const char * username) Line 4076 C
postgres.exe!main(int argc, char * * argv) Line 227 C
Local vars:
+ inner_rel 0x0000000009dfd170 {type=T_EquivalenceClass (537) reloptkind=RELOPT_BASEREL (0) relids=0x0000000009d6d718 {...} ...} RelOptInfo *
inner_rows 270.00000000000000 double
+ outer_rel 0x00000001401ded48 {postgres.exe!build_joinrel_tlist(PlannerInfo * root, RelOptInfo * joinrel, RelOptInfo * input_rel), Line 646} {...} RelOptInfo *
outer_rows 2.653352065130e-314#DEN double
+ restrictlist 0x0000000009d6f7f8 {type=T_List (656) length=1 head=0x0000000009d6f7d8 {data={ptr_value=0x0000000009d6e980 ...} ...} ...} List *
+ root 0x0000000009dfd800 {type=1 parse=0x000000000067d220 {type=T_AllocSetContext (601) commandType=CMD_UNKNOWN (0) ...} ...} PlannerInfo *
+ sjinfo 0x000000000043f870 {type=T_SpecialJoinInfo (543) min_lefthand=0x0000000009dfcfd8 {nwords=1 words=0x0000000009dfcfdc {...} } ...} SpecialJoinInfo *
--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Fri, Jun 24, 2016 at 11:21 AM, Craig Ringer <craig@2ndquadrant.com> wrote:
* Launch a VS x86 command prompt
* devenv /debugexe bin\initdb.exe -D test
* Set a breakpoint in initdb.c:3557 and initdb.c:3307
* Run
* When it traps at get_restricted_token(), manually move the execution
pointer over the setup of the restricted execution token by dragging &
dropping the yellow instruction pointer arrow. Yes, really. Or, y'know,
comment it out and rebuild, but I was working with a supplied binary.
* Continue until next breakpoint
* Launch process explorer and find the pid of the postgres child process
* Debug->attach to process, attach to the child postgres. This doesn't
detach the parent, VS does multiprocess debugging.
* Continue execution
* vs will trap on the child when it crashes
Do you think a crash dump could have been created by creating
crashdumps/ in PGDATA as part of initdb before this query is run?
--
Michael
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 24 June 2016 at 10:21, Craig Ringer <craig@2ndquadrant.com> wrote:
* To get a backtrace, I had to:
* Launch a VS x86 command prompt
* devenv /debugexe bin\initdb.exe -D test
* Set a breakpoint in initdb.c:3557 and initdb.c:3307
* Run
* When it traps at get_restricted_token(), manually move the execution
pointer over the setup of the restricted execution token by dragging &
dropping the yellow instruction pointer arrow. Yes, really. Or, y'know,
comment it out and rebuild, but I was working with a supplied binary.
* Continue until next breakpoint
* Launch process explorer and find the pid of the postgres child process
* Debug->attach to process, attach to the child postgres. This doesn't
detach the parent, VS does multiprocess debugging.
* Continue execution
* vs will trap on the child when it crashes
Also, to save anyone else this hassle, I have saved a process dump (windows
core file) and the debug symbols to gdrive. You can get them at:
Note that you will need a Visual Studio version installed. VS Community
2015 works fine. You only need to install the C++ devenv and C++ headers,
you don't need MFC or any of the rest. The default install is fine if you
don't mind a bigger download. Once installed, open postgres.dmp, then go
to debug->options, symbols. There, enable the Microsoft Symbol Server, and
also add a new entry for the absolute path to the symbols directory for the
archive you unpacked. You should enable the symbol cache directory too,
make a directory in your user dir and put it there.
If Haroon shared some gdrive links earlier on the thread I don't have
access to, this is the same data just efficiently compressed (32MB instead
of 180MB) and packaged up in a single convenient archive with the matching
sources and a full working install. You'll need 7zip to unpack it, but that
should be on your "install as soon as you install Windows" list anyway.
https://drive.google.com/open?id=0B7JKjZdzBUo1aE5DQnZ5VEpBUEk
--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 24 June 2016 at 10:28, Michael Paquier <michael.paquier@gmail.com> wrote:
On Fri, Jun 24, 2016 at 11:21 AM, Craig Ringer <craig@2ndquadrant.com> wrote:
* Launch a VS x86 command prompt
* devenv /debugexe bin\initdb.exe -D test
* Set a breakpoint in initdb.c:3557 and initdb.c:3307
* Run
* When it traps at get_restricted_token(), manually move the execution
pointer over the setup of the restricted execution token by dragging &
dropping the yellow instruction pointer arrow. Yes, really. Or, y'know,
comment it out and rebuild, but I was working with a supplied binary.
* Continue until next breakpoint
* Launch process explorer and find the pid of the postgres child process
* Debug->attach to process, attach to the child postgres. This doesn't
detach the parent, VS does multiprocess debugging.
* Continue execution
* vs will trap on the child when it crashes
Do you think a crash dump could have been created by creating
crashdumps/ in PGDATA as part of initdb before this query is run?
I see what you did there ;)
Yes, quite possibly, actually. I should've just got Haroon to build me a
new initdb without the priv setting and with creation of crashdumps/ .
It might be worth testing that out and adding an initdb startup flag to
create the directory, since initdb is such a PITA to debug.
--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Fri, Jun 24, 2016 at 11:33 AM, Craig Ringer <craig@2ndquadrant.com> wrote:
Yes, quite possibly, actually. I should've just got Haroon to build me a new
initdb without the priv setting and with creation of crashdumps/ .
It might be worth testing that out and adding an initdb startup flag to
create the directory, since initdb is such a PITA to debug.
I was more thinking about putting that under -DDEBUG for example.
--
Michael
From: pgsql-hackers-owner@postgresql.org
[mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Michael Paquier
Sent: Friday, June 24, 2016 11:37 AM
On Fri, Jun 24, 2016 at 11:33 AM, Craig Ringer <craig@2ndquadrant.com>
wrote:
It might be worth testing that out and adding an initdb startup flag
to create the directory, since initdb is such a PITA to debug.
I was more thinking about putting that under -DDEBUG for example.
I think just the existing option -d (--debug) and/or -n (--no-clean) would be OK.
Regards
Takayuki Tsunakawa
On Fri, Jun 24, 2016 at 11:51 AM, Tsunakawa, Takayuki
<tsunakawa.takay@jp.fujitsu.com> wrote:
From: pgsql-hackers-owner@postgresql.org
[mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Michael Paquier
Sent: Friday, June 24, 2016 11:37 AM
On Fri, Jun 24, 2016 at 11:33 AM, Craig Ringer <craig@2ndquadrant.com>
wrote:
It might be worth testing that out and adding an initdb startup flag
to create the directory, since initdb is such a PITA to debug.
I was more thinking about putting that under -DDEBUG for example.
I think just the existing option -d (--debug) and/or -n (--no-clean) would be OK.
If the majority thinks that an option switch is more appropriate, I won't
fight it strongly. Just please let's not mess with the behavior of
the existing options.
--
Michael
On 24 June 2016 at 05:17, Umair Shahid <umair.shahid@gmail.com> wrote:
It's still strange that it doesn't affect woodlouse.
Or any of the other Windows critters...
Given that it's only been seen in VS 2013, it's particularly odd that it's
not biting woodlouse.
I'd like more details from those whose installs are crashing. What exact
vcvars env did you run under, with which exact cl.exe version?
--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Fri, Jun 24, 2016 at 1:28 PM, Craig Ringer <craig@2ndquadrant.com> wrote:
Given that it's only been seen in VS 2013, it's particularly odd that it's
not biting woodlouse.
I'd like more details from those whose installs are crashing. What exact
vcvars env did you run under, with which exact cl.exe version?
Which OS did you use for the compilation? I don't think that this
matters much but woodlouse is using Win7.
--
Michael
On 24 June 2016 at 12:31, Michael Paquier <michael.paquier@gmail.com> wrote:
On Fri, Jun 24, 2016 at 1:28 PM, Craig Ringer <craig@2ndquadrant.com> wrote:
Given that it's only been seen in VS 2013, it's particularly odd that it's
not biting woodlouse.
I'd like more details from those whose installs are crashing. What exact
vcvars env did you run under, with which exact cl.exe version?
Which OS did you use for the compilation? I don't think that this
matters much but woodlouse is using Win7.
I'll have to wait for Haroon for that info for the crashing builds he did,
but I've now reproduced it with:
Windows Server 2012 R2, VS 2013 Community Update 5, cross compile tools for
x86 to amd64. cl 18.00.40629 for x64, env:
%comspec% /k ""C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\vcvarsall.bat" x86_amd64"
"where cl" reports
C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\bin\x86_amd64\cl.exe
Note that cross compilation is a typical configuration on Windows, where
you routinely use 32bit x86 compilers to build 64bit code, except in the
newest SDKs.
I see the same symptoms, with the segfault.
This host is a clean install, an AWS instance created for the purpose.
It looks like woodlouse probably runs an older VS2013 and uses the native
x64 toolchain; its env includes:
C:\\Program Files (x86)\\Microsoft Visual Studio 12.0\\VC\\BIN\\amd64
and does not have x86_amd64 in it.
BTW, I suggested to Haroon that he clone beta2 from git, then do a
git-bisect between beta1 (works) and beta2 (fails) to see if he can
identify the commit that causes things to start failing. I don't know how
far he got with that yesterday.
By comparison, I had no problems on the same host with VS Community 2015,
cl 19.00.23918, env "VS2015 x64 Native Tools Command Prompt":
%comspec% /k ""C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\vcvarsall.bat"" amd64
On a side note, I'm unable to build with the VS 2013 Community U5 native tools
for some reason. Link errors, unresolved external symbol _ischartype_l . cl
18.00.42629 for x64, env:
%comspec% /k ""C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\vcvarsall.bat" amd64"
"where cl" reports:
C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\bin\amd64\cl.exe
--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 24 June 2016 at 13:00, Craig Ringer <craig@2ndquadrant.com> wrote:
I've now reproduced it with:
I can also confirm that it _doesn't_ crash with the same SDK using a 32-bit
build (running under WoW on x64). cl 18.00.40629 for x86, env:
%comspec% /k ""C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\vcvarsall.bat" x86"
--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 24 June 2016 at 10:28, Michael Paquier <michael.paquier@gmail.com> wrote:
On Fri, Jun 24, 2016 at 11:21 AM, Craig Ringer <craig@2ndquadrant.com> wrote:
* Launch a VS x86 command prompt
* devenv /debugexe bin\initdb.exe -D test
* Set a breakpoint in initdb.c:3557 and initdb.c:3307
* Run
* When it traps at get_restricted_token(), manually move the execution
pointer over the setup of the restricted execution token by dragging &
dropping the yellow instruction pointer arrow. Yes, really. Or, y'know,
comment it out and rebuild, but I was working with a supplied binary.
* Continue until next breakpoint
* Launch process explorer and find the pid of the postgres child process
* Debug->attach to process, attach to the child postgres. This doesn't
detach the parent, VS does multiprocess debugging.
* Continue execution
* vs will trap on the child when it crashes
Do you think a crash dump could have been created by creating
crashdumps/ in PGDATA as part of initdb before this query is run?
The answer is "yes" btw. Add "crashdumps" to the static array of
directories created by initdb and it works great.
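As a rough illustration of that change, here is a minimal, self-contained sketch. The array name `subdirs`, its abbreviated contents, and the helper `subdir_path` are assumptions made for this example; the real list in initdb.c is longer and the real code goes on to create each directory.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, abbreviated list; the real array in initdb.c has more entries. */
static const char *const subdirs[] = {
    "global",
    "base",
    "crashdumps"    /* extra entry, so a crash dump has somewhere to land */
};

#define N_SUBDIRS (sizeof(subdirs) / sizeof(subdirs[0]))

/*
 * Build "<pgdata>/<subdir>" into buf.
 * Returns 0 on success, -1 if the result would not fit.
 */
static int
subdir_path(char *buf, size_t buflen, const char *pgdata, size_t idx)
{
    int n;

    assert(buf != NULL && pgdata != NULL && idx < N_SUBDIRS);
    n = snprintf(buf, buflen, "%s/%s", pgdata, subdirs[idx]);
    return (n < 0 || (size_t) n >= buflen) ? -1 : 0;
}
```

A caller would loop over all N_SUBDIRS entries and create each resulting path under the data directory; this sketch only builds the paths so that it has no filesystem side effects.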
Sigh. It'd be less annoying if I hadn't written most of the original patch.
For convenience I also commented out the check_root call in
src/backend/main.c and the get_restricted_token(progname) call in initdb.c,
so I could run it easily under an admin account where I can also install
tools etc without hassle. Not recommended on a non-throwaway machine of
course.
The generated crashdump shows the same crash in the same location.
I have absolutely no idea why it's trying to access memory at what looks
like (uint64)(-1) though. Nothing in the auto vars list:
+ &restrictlist 0x000000000043f7b0 {0x0000000009e32600 {type=T_List (656) length=1 head=0x0000000009e325e0 {data={ptr_value=...} ...} ...}} List * *
+ inner_rel 0x0000000009e7ad68 {type=T_EquivalenceClass (537) reloptkind=RELOPT_BASEREL (0) relids=0x0000000009e30520 {...} ...} RelOptInfo *
+ inner_rel->relids 0x0000000009e30520 {nwords=658 words=0x0000000009e30524 {...} } Bitmapset *
+ outer_rel 0x00000001401dec98 {postgres.exe!build_joinrel_tlist(PlannerInfo * root, RelOptInfo * joinrel, RelOptInfo * input_rel), Line 646} {...} RelOptInfo *
+ outer_rel->relids 0xe808498b48d78b48 {nwords=??? words=0xe808498b48d78b4c {...} } Bitmapset *
+ sjinfo 0x000000000043f870 {type=T_SpecialJoinInfo (543) min_lefthand=0x0000000009e7abd0 {nwords=1 words=0x0000000009e7abd4 {...} } ...} SpecialJoinInfo *
or locals:
+ inner_rel 0x0000000009e7ad68 {type=T_EquivalenceClass (537) reloptkind=RELOPT_BASEREL (0) relids=0x0000000009e30520 {...} ...} RelOptInfo *
inner_rows 270.00000000000000 double
+ outer_rel 0x00000001401dec98 {postgres.exe!build_joinrel_tlist(PlannerInfo * root, RelOptInfo * joinrel, RelOptInfo * input_rel), Line 646} {...} RelOptInfo *
outer_rows 2.653351978175e-314#DEN double
+ restrictlist 0x0000000009e32600 {type=T_List (656) length=1 head=0x0000000009e325e0 {data={ptr_value=0x0000000009e31788 ...} ...} ...} List *
+ root 0x0000000009e7b3f8 {type=1 parse=0x0000000000504ad0 {type=T_AllocSetContext (601) commandType=CMD_UNKNOWN (0) ...} ...} PlannerInfo *
+ sjinfo 0x000000000043f870 {type=T_SpecialJoinInfo (543) min_lefthand=0x0000000009e7abd0 {nwords=1 words=0x0000000009e7abd4 {...} } ...} SpecialJoinInfo *
seems to fit. Though outer_rel->relids is a pretty weird address -
0xe808498b48d78b48? Really?
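One quick sanity check on a suspicious pointer value like that one: on x86-64 with 48-bit virtual addresses, a usable ("canonical") address has bits 63..47 all zeros or all ones. The sketch below is only an illustration under that 48-bit assumption; the function name `is_canonical_48` is made up for this example and is not part of PostgreSQL or any debugger.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Assumes 48-bit virtual addressing: bits 63..47 must all be equal.
 * Returns true if addr could be a valid (canonical) x86-64 address.
 */
static bool
is_canonical_48(uint64_t addr)
{
    uint64_t top = addr >> 47;      /* the upper 17 bits */

    return top == 0 || top == 0x1FFFF;
}
```

Under this check 0xe808498b48d78b48 is not canonical, so it cannot be a real Bitmapset pointer, whereas 0x0000000009e30520 (the inner_rel->relids value) passes. That says nothing about why the debugger shows the value, only that it should not be trusted as a pointer.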
I'd point DrMemory at it, but unfortunately it only supports 32-bit
applications so far. I don't have access to any of the commercial tools
like Purify. Maybe someone at EDB can help out with that, if you guys do?
Register states are:
RAX = 000000000043F7B0 RBX = 0000000009E32218 RCX = 0000000009E78510
RDX = 0000000009E7ABD0 RSI = 0000000009E78510 RDI = 0000000009E32218
R8 = 0000000009E7B3F8 R9 = 0000000009E7B1E8 R10 = 0000000009E7A9C0
R11 = 0000000000000001 R12 = 0000000009E32200 R13 = 0000000000000000
R14 = 0000000009E7B1E8 R15 = 0000000000000000 RIP = 00000001401A59D1
RSP = 000000000043F6E0 RBP = 0000000009E7A9C0 EFL = 00010202
and the exact crash site is
    fkselec = get_foreign_key_join_selectivity(root,
                                               outer_rel->relids,
                                               inner_rel->relids,
                                               sjinfo,
                                               &restrictlist);
00000001401A59AB mov r8,qword ptr [r8+8]
00000001401A59AF mov rdx,qword ptr [rdx+8]
00000001401A59B3 movaps xmmword ptr [rax-28h],xmm6
00000001401A59B7 movaps xmmword ptr [rax-38h],xmm7
00000001401A59BB movaps xmmword ptr [rax-48h],xmm8
00000001401A59C0 movaps xmmword ptr [rax-58h],xmm9
00000001401A59C5 lea rax,[rax+38h]
00000001401A59C9 movaps xmm7,xmm3
00000001401A59CC mov qword ptr [rsp+20h],rax
00000001401A59D1 movaps xmmword ptr [rax-68h],xmm10 <---- here
00000001401A59D6 mov qword ptr [rax-48h],r14
00000001401A59DA mov r14,qword ptr [sjinfo]
00000001401A59E2 mov ebp,dword ptr [r14+28h]
00000001401A59E6 mov qword ptr [rax-50h],r15
00000001401A59EA mov r9,r14
00000001401A59ED mov r15,rcx
00000001401A59F0 call get_foreign_key_join_selectivity (01401A5C30h)
with
XMM3 000000000000000040A5720000000000
RAX 000000000043F7B0
XMM7 000000000000000040A5720000000000
RSP 000000000043F6E0
XMM10 00000000000000000000000000000000
I'm about 100% ignorant of x64 asm, but hopefully someone can interpret
this usefully. I can tell it's doing an SSE "Move Aligned Packed
Single-Precision Floating-Point Values" (from memory into an SSE register?)
but that's about it.
rax-68h is 0x000000000043F748. The memory at that location is
00 00 00 00 00 00 00 00 00 00 00 00 00 00 f0 bf 00 00 00 00 00 00 00 00 c0
a9 e7 09 00 00 00 00 f8 b3 e7 09 00 00
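The address arithmetic above can be double-checked with a small sketch. Two facts it relies on: in this (Intel-syntax) disassembly the destination operand comes first, so the flagged instruction stores xmm10 to memory; and movaps requires its memory operand to be 16-byte aligned, raising a general-protection fault otherwise. The helper names below are made up for this example, and whether misalignment is the actual cause of the crash is not established at this point in the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if addr is a multiple of alignment; alignment must be a power of two. */
static bool
is_aligned(uint64_t addr, uint64_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (addr & (alignment - 1)) == 0;
}

/* Effective address of a "[base - disp]" memory operand. */
static uint64_t
effective_address(uint64_t base, uint64_t disp)
{
    return base - disp;
}
```

With RAX = 0x43F7B0 from the register dump, effective_address(0x43F7B0, 0x68) is 0x43F748, matching the value quoted above, and is_aligned(0x43F748, 16) is false because the address ends in 8. That would be worth keeping in mind when reading the fault address of 0xFFFFFFFFFFFFFFFF, which does not correspond to any pointer in the variable lists.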
So there you go, a whole bunch of data and I, at least, am still none the
wiser.
--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Fri, Jun 24, 2016 at 3:22 PM, Craig Ringer <craig@2ndquadrant.com> wrote:
On 24 June 2016 at 10:28, Michael Paquier <michael.paquier@gmail.com> wrote:
On Fri, Jun 24, 2016 at 11:21 AM, Craig Ringer <craig@2ndquadrant.com> wrote:
* Launch a VS x86 command prompt
* devenv /debugexe bin\initdb.exe -D test
* Set a breakpoint in initdb.c:3557 and initdb.c:3307
* Run
* When it traps at get_restricted_token(), manually move the execution
pointer over the setup of the restricted execution token by dragging &
dropping the yellow instruction pointer arrow. Yes, really. Or, y'know,
comment it out and rebuild, but I was working with a supplied binary.
* Continue until next breakpoint
* Launch process explorer and find the pid of the postgres child process
* Debug->attach to process, attach to the child postgres. This doesn't
detach the parent, VS does multiprocess debugging.
* Continue execution
* vs will trap on the child when it crashes
Do you think a crash dump could have been created by creating
crashdumps/ in PGDATA as part of initdb before this query is run?
The answer is "yes" btw. Add "crashdumps" to the static array of directories
created by initdb and it works great.
As simple as attached..
Sigh. It'd be less annoying if I hadn't written most of the original patch.
You mean the patch that created the crashdumps/ trick? It saved me a
couple of months back when I had to analyze a problem, TBH.
--
Michael
Attachments:
dbg-initdb.patch (invalid/octet-stream; +3 -0)
On Fri, Jun 24, 2016 at 11:21 AM, Craig Ringer
<craig(at)2ndquadrant(dot)com> wrote:
I was helping Haroon with this last night. I don't have access to the
original thread and he's not around so I don't know how much he said. I'll
repeat our findings here.
Craig, I am around now looking into this. I'll update the list as I get
more info.
- Haroon
--
Haroon http://www.2ndQuadrant.com/
<http://www.2ndquadrant.com/>
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Fri, Jun 24, 2016 at 11:21 AM, Craig Ringer
<craig(at)2ndquadrant(dot)com> wrote:
I was helping Haroon with this last night. I don't have access to the
original thread and he's not around so I don't know how much he said. I'll
repeat our findings here.
Craig, I am around now looking into this. I'll update the list as I get
more info.
Apparently my previous message (this same text ) didn't make it through ...
-- Haroon
I have been running bisect, it breaks at this commit:
commit 100340e2dcd05d6505082a8fe343fb2ef2fa5b2a
Author: Tom Lane <tgl@sss.pgh.pa.us>
Date: Sat Jun 18 15:22:34 2016 -0400
Restore foreign-key-aware estimation of join relation sizes.
This patch provides a new implementation of the logic added by commit
137805f89 and later removed by 77ba61080. It differs from the original
primarily in expending much less effort per joinrel in large queries,
which it accomplishes by doing most of the matching work once per query not
once per joinrel. Hopefully, it's also less buggy and better commented.
The never-documented enable_fkey_estimates GUC remains gone.
There remains work to be done to make the selectivity estimates account
for nulls in FK referencing columns; but that was true of the original
patch as well. We may be able to address this point later in beta.
In the meantime, any error should be in the direction of overestimating
rather than underestimating joinrel sizes, which seems like the direction
we want to err in.
Tomas Vondra and Tom Lane
Discussion: <31041.1465069446@sss.pgh.pa.us>
--
Haroon http://www.2ndQuadrant.com/
<http://www.2ndquadrant.com/>
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Fri, Jun 24, 2016 at 12:19 PM, Haroon Muhammad <contact.mharoon@gmail.com> wrote:
On Fri, Jun 24, 2016 at 11:21 AM, Craig Ringer
<craig(at)2ndquadrant(dot)com> wrote:
I was helping Haroon with this last night. I don't have access to the
original thread and he's not around so I don't know how much he said. I'll
repeat our findings here.
Craig, I am around now looking into this. I'll update the list as I get
more info.
Apparently my previous message (this same text ) didn't make it through ...
-- Haroon
Craig Ringer <craig@2ndquadrant.com> writes:
I have absolutely no idea why it's trying to access memory at what looks
like (uint64)(-1) though. Nothing in the auto vars list:
+ &restrictlist 0x000000000043f7b0 {0x0000000009e32600 {type=T_List (656) length=1 head=0x0000000009e325e0 {data={ptr_value=...} ...} ...}} List * *
+ inner_rel 0x0000000009e7ad68 {type=T_EquivalenceClass (537) reloptkind=RELOPT_BASEREL (0) relids=0x0000000009e30520 {...} ...} RelOptInfo *
+ inner_rel->relids 0x0000000009e30520 {nwords=658 words=0x0000000009e30524 {...} } Bitmapset *
+ outer_rel 0x00000001401dec98 {postgres.exe!build_joinrel_tlist(PlannerInfo * root, RelOptInfo * joinrel, RelOptInfo * input_rel), Line 646} {...} RelOptInfo *
+ outer_rel->relids 0xe808498b48d78b48 {nwords=??? words=0xe808498b48d78b4c {...} } Bitmapset *
+ sjinfo 0x000000000043f870 {type=T_SpecialJoinInfo (543) min_lefthand=0x0000000009e7abd0 {nwords=1 words=0x0000000009e7abd4 {...} } ...} SpecialJoinInfo *
inner_rel seems to be pointing at garbage, or at least why is the
referenced object tag T_EquivalenceClass not T_RelOptInfo? And
why aren't we being given anything for outer_rel? The value for
outer_rel->relids isn't inspiring any confidence either, and
for that matter inner_rel->relids couldn't possibly have more than
nwords==1 given how simple the query is. In short, either the
debugger is totally confused or the code is, because most of these
pointers aren't pointing at anything sane.
TBH, this looks more like a compiler bug than anything else. I wonder
whether it's getting confused by taking the address of a parameter
(although surely we do that elsewhere).
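To make "taking the address of a parameter" concrete, here is a minimal sketch of the pattern and of one variant that avoids it by copying the parameter into a local first. All names (`trim_list`, `estimate_with_param_address`, `estimate_with_local_copy`) and the one-field List are invented for this example. In standard C the two forms behave identically; the variant is only a way to test the compiler-bug hypothesis, not something the thread has confirmed as a fix.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the planner's List type. */
typedef struct List
{
    int length;
} List;

/* Hypothetical helper that receives the list by address and may modify it. */
static double
trim_list(List **list_ptr)
{
    assert(list_ptr != NULL && *list_ptr != NULL);
    if ((*list_ptr)->length > 0)
        (*list_ptr)->length -= 1;
    return 0.5;
}

/* Pattern at the crash site: the address of a formal parameter is passed on. */
static int
estimate_with_param_address(List *restrictlist)
{
    (void) trim_list(&restrictlist);
    return restrictlist->length;
}

/* Variant: copy the parameter to a local and take the local's address. */
static int
estimate_with_local_copy(List *restrictlist_in)
{
    List *restrictlist = restrictlist_in;

    (void) trim_list(&restrictlist);
    return restrictlist->length;
}
```

If a build using the second form stops crashing while the first still does, that would point toward the compiler rather than the planner code.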
It would be worth recompiling at -O0, or whatever the local equivalent
of that is, to see if (1) the crash goes away or (2) the debugger's
printouts get any more reliable.
regards, tom lane
On Fri, Jun 24, 2016 at 1:28 PM, Craig Ringer
<craig(at)2ndquadrant(dot)com> wrote:
I'd like more details from those whose installs are crashing. What exact
vcvars env did you run under, with which exact cl.exe version?
This is a Windows Server 2012 R2 Standard.
Devenv: Microsoft Visual Studio 2013 Community Version 12.0.31101.0.
Env:
%comspec% /k ""C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\vcvarsall.bat"" x86_amd64
'where cl.exe'
C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\bin\x86_amd64\cl.exe
C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\bin\cl.exe
I have also been able to reproduce it on Windows 7 Professional (Service
Pack 1) with Microsoft Visual Studio 2013 Community Version 12.0.40629.0.
Env:
%comspec% /k ""C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\vcvarsall.bat"" x86_amd64
'where cl.exe'
C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\bin\x86_amd64\cl.exe
C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\bin\cl.exe
I started bisecting between beta2 (bad) and beta1 (good), given that
beta1 works fine. The crash first occurs at the following commit:
commit 100340e2dcd05d6505082a8fe343fb2ef2fa5b2a
Author: Tom Lane <tgl@sss.pgh.pa.us>
Date: Sat Jun 18 15:22:34 2016 -0400
Restore foreign-key-aware estimation of join relation sizes.
This patch provides a new implementation of the logic added by commit
137805f89 and later removed by 77ba61080. It differs from the original
primarily in expending much less effort per joinrel in large queries,
which it accomplishes by doing most of the matching work once per query not
once per joinrel. Hopefully, it's also less buggy and better commented.
The never-documented enable_fkey_estimates GUC remains gone.
There remains work to be done to make the selectivity estimates account
for nulls in FK referencing columns; but that was true of the original
patch as well. We may be able to address this point later in beta.
In the meantime, any error should be in the direction of overestimating
rather than underestimating joinrel sizes, which seems like the direction
we want to err in.
Tomas Vondra and Tom Lane
Discussion: <31041.1465069446@sss.pgh.pa.us>
This appears consistent with the crash in the planner suggested by the
crash dump Craig shared.
Tom, any ideas on what could be going wrong here?
Given that it fails in 'setup_description', I tried bypassing that step by
commenting it out. It then crashes in 'setup_privileges' and
'setup_schema' as well.
debug_query_string for setup_privileges:
INSERT INTO pg_init_privs (objoid, classoid, objsubid, initprivs,
privtype) SELECT oid, (SELECT oid FROM pg_class WHERE
relname = 'pg_class'), 0, relacl, 'i' FROM
pg_class WHERE relacl IS NOT NULL AND relkind IN ('r',
'v', 'm', 'S');INSERT INTO pg_init_privs (objoid, classoid, objsubid,
initprivs, privtype) SELECT pg_class.oid, (SELECT oid FROM
pg_class WHERE relname = 'pg_class'), pg_attribute.attnum,
pg_attribute.attacl, 'i' FROM pg_class JOIN
pg_attribute ON (pg_class.oid = pg_attribute.attrelid) WHERE
pg_attribute.attacl IS NOT NULL AND pg_class.relkind IN ('r', 'v',
'm', 'S');INSERT INTO pg_init_privs (objoid, classoid, objsubid,
initprivs, privtype) SELECT oid, (SELECT oid FROM pg_class
WHERE relname = 'pg_proc'), 0, proacl, 'i' FROM
pg_proc WHERE proacl IS NOT NULL;INSERT INTO pg_init_privs
(objoid, classoid, objsubid, initprivs, privtype) SELECT oid,
(SELECT oid FROM pg_class WHERE relname = 'pg_type'), 0,
typacl, 'i' FROM pg_type WHERE typacl IS NOT
NULL;INSERT INTO pg_init_privs (objoid, classoid, objsubid, initprivs,
privtype) SELECT oid, (SELECT oid FROM pg_class WHERE
relname = 'pg_language'), 0, lanacl, 'i' FROM
pg_language WHERE lanacl IS NOT NULL;INSERT INTO pg_init_privs
(objoid, classoid, objsubid, initprivs, privtype) SELECT oid,
(SELECT oid FROM pg_class WHERE relname = 'pg_largeobject_metadata'),
0, lomacl, 'i' FROM pg_largeobject_metadata
WHERE lomacl IS NOT NULL;INSERT INTO pg_init_privs (objoid,
classoid, objsubid, initprivs, privtype) SELECT oid,
(SELECT oid FROM pg_class WHERE relname = 'pg_namespace'), 0,
nspacl, 'i' FROM pg_namespace WHERE nspacl IS
NOT NULL;INSERT INTO pg_init_privs (objoid, classoid, objsubid,
initprivs, privtype) SELECT oid, (SELECT oid FROM pg_class
WHERE relname = 'pg_database'), 0, datacl, 'i' FROM
pg_database WHERE datacl IS NOT NULL;INSERT INTO
pg_init_privs (objoid, classoid, objsubid, initprivs, privtype) SELECT
oid, (SELECT oid FROM pg_class WHERE relname =
'pg_tablespace'), 0, spcacl, 'i' FROM
pg_tablespace WHERE spcacl IS NOT NULL;INSERT INTO pg_init_privs
(objoid, classoid, objsubid, initprivs, privtype) SELECT oid,
(SELECT oid FROM pg_class WHERE relname = 'pg_foreign_data_wrapper'),
0, fdwacl, 'i' FROM pg_foreign_data_wrapper
WHERE fdwacl IS NOT NULL;INSERT INTO pg_init_privs (objoid,
classoid, objsubid, initprivs, privtype) SELECT oid,
(SELECT oid FROM pg_class WHERE relname = 'pg_foreign_server'), 0,
srvacl, 'i' FROM pg_foreign_server WHERE
srvacl IS NOT NULL;
/*
 * SQL Information Schema
 * as defined in ISO/IEC 9075-11:2011
 *
 * Copyright (c) 2003-2016, PostgreSQL Global Development Group
 *
 * src/backend/catalog/information_schema.sql
 *
 * Note: this file is read in single-user -j mode, which means that the
 * command terminator is semicolon-newline-newline; whenever the backend
 * sees that, it stops and executes what it's got. If you write a lot of
 * statements without empty lines between, they'll all get quoted to you
 * in any error message about one of them, so don't do that. Also, you
 * cannot write a semicolon immediately followed by an empty line in a
 * string literal (including a function body!) or a multiline comment.
 */
/*
 * Note: Generally, the definitions in this file should be ordered
 * according to the clause numbers in the SQL standard, which is also the
 * alphabetical order. In some cases it is convenient or necessary to
 * define one information schema view by using another one; in that case,
 * put the referencing view at the very end and leave a note where it
 * should have been put.
 */
/*
 * 5.1
 * INFORMATION_SCHEMA schema
 */
CREATE SCHEMA information_schema;
GRANT USAGE ON SCHEMA information_schema TO PUBLIC;
SET search_path TO information_schema;
debug_query_string for setup_schema:
INSERT INTO sql_implementation_info VALUES ('10003', 'CATALOG NAME', NULL, 'Y', NULL);
INSERT INTO sql_implementation_info VALUES ('10004', 'COLLATING SEQUENCE', NULL, (SELECT default_collate_name FROM character_sets), NULL);
INSERT INTO sql_implementation_info VALUES ('23', 'CURSOR COMMIT BEHAVIOR', 1, NULL, 'close cursors and retain prepared statements');
INSERT INTO sql_implementation_info VALUES ('2', 'DATA SOURCE NAME', NULL, '', NULL);
INSERT INTO sql_implementation_info VALUES ('17', 'DBMS NAME', NULL, (select trim(trailing ' ' from substring(version() from '^[^0-9]*'))), NULL);
INSERT INTO sql_implementation_info VALUES ('18', 'DBMS VERSION', NULL, '???', NULL); -- filled by initdb
INSERT INTO sql_implementation_info VALUES ('26', 'DEFAULT TRANSACTION ISOLATION', 2, NULL, 'READ COMMITTED; user-settable');
INSERT INTO sql_implementation_info VALUES ('28', 'IDENTIFIER CASE', 3, NULL, 'stored in mixed case - case sensitive');
INSERT INTO sql_implementation_info VALUES ('85', 'NULL COLLATION', 0, NULL, 'nulls higher than non-nulls');
INSERT INTO sql_implementation_info VALUES ('13', 'SERVER NAME', NULL, '', NULL);
INSERT INTO sql_implementation_info VALUES ('94', 'SPECIAL CHARACTERS', NULL, '', 'all non-ASCII characters allowed');
INSERT INTO sql_implementation_info VALUES ('46', 'TRANSACTION CAPABLE', 2, NULL, 'both DML and DDL');
If I comment out all three (setup_description, setup_privileges and
setup_schema), initdb seems to proceed without any errors or crashes.
Regards,
Haroon
--
Haroon http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 24 June 2016 at 21:34, Tom Lane <tgl@sss.pgh.pa.us> wrote:
TBH, this looks more like a compiler bug than anything else.
I tend to agree. Especially since valgrind has no complaints on x64 linux,
and neither does DrMemory for 32-bit builds with the same toolchain on the
same Windows and same SDK.
I don't see any particular reason we can't proceed with 9.6beta2 and build
x64 Pg with MS VS 2015. There's no evidence turning up of a Pg bug here,
and compiling with a different toolchain gets us working binaries for the
target platform in question.
It would be worth recompiling at -O0, or whatever the local equivalent
of that is, to see if (1) the crash goes away or (2) the debugger's
printouts get any more reliable
Yeah, it probably is. I'll see if I can find time this w/e.
--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services