Undesirable entries in typedefs list

Started by Tom Laneabout 8 years ago11 messageshackers
Jump to latest
#1Tom Lane
tgl@sss.pgh.pa.us

I noticed that doing pgindent with the current typedefs list available
from the buildfarm caused a lot of havoc in what had been stable code.
Looking into the reasons, it seems that:

(1) "bool" is no longer listed as a typedef name (probably because
stdbool.h makes it a macro instead);

(2) "abs", "boolean", "iterator", "other", "pointer", "reference",
"string", and "type" all now are listed as typedef names.

It's probably okay to treat "boolean" as a typedef, but all those others
are complete disasters. Anyone know where they're coming from?

As for "bool", we could probably deal with that most reliably by
having pgindent add it as a special case. Maybe we could get it
back in there by having some trailing-edge buildfarm member
contribute typedefs, but that seems like a solution with a rather
limited half-life.

regards, tom lane

#2Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#1)
Re: Undesirable entries in typedefs list

Hi,

On 2018-03-24 13:25:34 -0400, Tom Lane wrote:

(2) "abs", "boolean", "iterator", "other", "pointer", "reference",
"string", and "type" all now are listed as typedef names.

It's probably okay to treat "boolean" as a typedef, but all those others
are complete disasters. Anyone know where they're coming from?

Semi informed theory: LLVM? I think I'd configured one of the LLVM
animals to collect typedefs, but that might have been a bad idea...

As for "bool", we could probably deal with that most reliably by
having pgindent add it as a special case. Maybe we could get it
back in there by having some trailing-edge buildfarm member
contribute typedefs, but that seems like a solution with a rather
limited half-life.

Could we combine the list of typedefs with one manually maintained
in-tree?

Greetings,

Andres Freund

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#2)
Re: Undesirable entries in typedefs list

Andres Freund <andres@anarazel.de> writes:

On 2018-03-24 13:25:34 -0400, Tom Lane wrote:

(2) "abs", "boolean", "iterator", "other", "pointer", "reference",
"string", and "type" all now are listed as typedef names.

It's probably okay to treat "boolean" as a typedef, but all those others
are complete disasters. Anyone know where they're coming from?

Semi informed theory: LLVM? I think I'd configured one of the LLVM
animals to collect typedefs, but that might have been a bad idea...

That was my first thought as well, since this seems to have changed
quite recently. I don't think it's a bad idea to be collecting typedefs
from something that compiles that code, because otherwise our own
typedefs in that area won't be known.

As for "bool", we could probably deal with that most reliably by
having pgindent add it as a special case. Maybe we could get it
back in there by having some trailing-edge buildfarm member
contribute typedefs, but that seems like a solution with a rather
limited half-life.

Could we combine the list of typedefs with one manually maintained
in-tree?

Perhaps. We might need a manually maintained blacklist, as well.

regards, tom lane

#4Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#3)
Re: Undesirable entries in typedefs list

Hi,

On 2018-03-24 13:34:45 -0400, Tom Lane wrote:

That was my first thought as well, since this seems to have changed
quite recently. I don't think it's a bad idea to be collecting typedefs
from something that compiles that code, because otherwise our own
typedefs in that area won't be known.

The number of typedefs we actually use is fairly small (~5-7), so we
could realistically maintain the them manually.

I'm about to head out, but afterwards I'm going to check what the
typedefs collected actually are when LLVM is enabled.

Greetings,

Andres Freund

#5Andres Freund
andres@anarazel.de
In reply to: Andres Freund (#4)
Re: Undesirable entries in typedefs list

On 2018-03-24 10:41:40 -0700, Andres Freund wrote:

I'm about to head out, but afterwards I'm going to check what the
typedefs collected actually are when LLVM is enabled.

It's indeed from llvm:

<3><5201>: Abbrev Number: 4 (DW_TAG_typedef)
<5202> DW_AT_name : (indirect string, offset: 0x8a0a4): string
<5206> DW_AT_decl_file : 13
<5207> DW_AT_decl_line : 74
<5208> DW_AT_type : <0x3896>

which basically references std::string

<3><3896>: Abbrev Number: 24 (DW_TAG_class_type)
<3897> DW_AT_name : (indirect string, offset: 0x2db87): basic_string<char, std::char_traits<char>, std::allocator<char> >
<389b> DW_AT_byte_size : 32
<389c> DW_AT_decl_file : 12
<389d> DW_AT_decl_line : 77
<389e> DW_AT_sibling : <0x51fc>

Given that the PG code shouldn't refer to this, I think we might be able
to limit things by specifying --dwarf-depth=3 to the objdump output.

I've attached the difference between a objdump typedefs list roughly
equivalent to what the buildfarm uses. There's no difference when not
using llvm.

I'm a bit uncomfortable relying --dwarf-depth=3, with 3 being determined
purely experimentally though.

Greetings,

Andres Freund

Attachments:

typelist_diff_llvm.txttext/plain; charset=us-asciiDownload+0-141
In reply to: Andres Freund (#5)
Re: Undesirable entries in typedefs list

On Sat, Mar 24, 2018 at 1:36 PM, Andres Freund <andres@anarazel.de> wrote:

I've attached the difference between a objdump typedefs list roughly
equivalent to what the buildfarm uses. There's no difference when not
using llvm.

I'm a bit uncomfortable relying --dwarf-depth=3, with 3 being determined
purely experimentally though.

A quick look at the DWARF4 standard [1]http://dwarfstd.org/doc/DWARF4.pdf -- Peter Geoghegan suggests that this refers to
lexical depth. So, I agree that that doesn't seem like a great idea.

[1]: http://dwarfstd.org/doc/DWARF4.pdf -- Peter Geoghegan
--
Peter Geoghegan

#7Andres Freund
andres@anarazel.de
In reply to: Peter Geoghegan (#6)
Re: Undesirable entries in typedefs list

On 2018-03-24 14:51:32 -0700, Peter Geoghegan wrote:

On Sat, Mar 24, 2018 at 1:36 PM, Andres Freund <andres@anarazel.de> wrote:

I've attached the difference between a objdump typedefs list roughly
equivalent to what the buildfarm uses. There's no difference when not
using llvm.

I'm a bit uncomfortable relying --dwarf-depth=3, with 3 being determined
purely experimentally though.

A quick look at the DWARF4 standard [1] suggests that this refers to
lexical depth. So, I agree that that doesn't seem like a great idea.

[1] http://dwarfstd.org/doc/DWARF4.pdf

Are you referring to Section 3.4? That's something different afaict. Or
which bit are you thinking of?

See e.g. https://sourceware.org/bugzilla/show_bug.cgi?id=22250

"
Another addition "depth" starts counting from zero. Zero usually is the
first DIE (compilation_unit or partial_unit). Specifying --dwarf-depth=0
will only print the Compilation Unit headers, followed by a blank line
and no ... markers. Unless --dwarf-depth is given with a non-zero
argument, then only a blank line is printed.
"

My understanding is that it controls through how many levels of nesting
in type definitions objdump recurses through. The type of typedefs we
really care about are going to be at the global level. Which should make
them available at dwarf-depth=2. Then we have a few typedefs which are
solely done inside functions, which is why we need dwarf-depth=3.

Greetings,

Andres Freund

#8Andrew Dunstan
andrew@dunslane.net
In reply to: Tom Lane (#1)
Re: Undesirable entries in typedefs list

On Sun, Mar 25, 2018 at 3:55 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I noticed that doing pgindent with the current typedefs list available
from the buildfarm caused a lot of havoc in what had been stable code.
Looking into the reasons, it seems that:

(1) "bool" is no longer listed as a typedef name (probably because
stdbool.h makes it a macro instead);

(2) "abs", "boolean", "iterator", "other", "pointer", "reference",
"string", and "type" all now are listed as typedef names.

It's probably okay to treat "boolean" as a typedef, but all those others
are complete disasters. Anyone know where they're coming from?

As for "bool", we could probably deal with that most reliably by
having pgindent add it as a special case. Maybe we could get it
back in there by having some trailing-edge buildfarm member
contribute typedefs, but that seems like a solution with a rather
limited half-life.

pgindent already has a list of blacklisted typedefs (see lines 121 to 123)

cheers

andrew

--
Andrew Dunstan https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#9Andrew Dunstan
andrew@dunslane.net
In reply to: Andrew Dunstan (#8)
Re: Undesirable entries in typedefs list

On Sun, Mar 25, 2018 at 11:32 AM, Andrew Dunstan
<andrew.dunstan@2ndquadrant.com> wrote:

On Sun, Mar 25, 2018 at 3:55 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I noticed that doing pgindent with the current typedefs list available
from the buildfarm caused a lot of havoc in what had been stable code.
Looking into the reasons, it seems that:

(1) "bool" is no longer listed as a typedef name (probably because
stdbool.h makes it a macro instead);

(2) "abs", "boolean", "iterator", "other", "pointer", "reference",
"string", and "type" all now are listed as typedef names.

It's probably okay to treat "boolean" as a typedef, but all those others
are complete disasters. Anyone know where they're coming from?

As for "bool", we could probably deal with that most reliably by
having pgindent add it as a special case. Maybe we could get it
back in there by having some trailing-edge buildfarm member
contribute typedefs, but that seems like a solution with a rather
limited half-life.

pgindent already has a list of blacklisted typedefs (see lines 121 to 123)

Here's a patch to pgindent which I think will do what you want fairly
simply, assuming you have identified all the offending tokens.

cheers

andrew

--
Andrew Dunstan https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachments:

pgindent-whiteblacklist.patchapplication/octet-stream; name=pgindent-whiteblacklist.patchDownload+12-2
#10Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Dunstan (#9)
Re: Undesirable entries in typedefs list

Andrew Dunstan <andrew.dunstan@2ndquadrant.com> writes:

pgindent already has a list of blacklisted typedefs (see lines 121 to 123)

Oh, so it does.

Here's a patch to pgindent which I think will do what you want fairly
simply, assuming you have identified all the offending tokens.

Hm, this does not work for me (with perl 5.10.1), I get syntax errors like

Not enough arguments for map at src/tools/pgindent/pgindent line 64, near "abs allocfunc iterator other pointer printfunc reference string type )"
(Might be a runaway multi-line () string starting on line 63)
syntax error at src/tools/pgindent/pgindent line 64, near "abs allocfunc iterator other pointer printfunc reference string type )"
Execution of src/tools/pgindent/pgindent aborted due to compilation errors.

After reading the perlfunc man page I changed

-my %blacklist = map { "$_\n" => 1 }
+my %blacklist = map { +"$_\n" => 1 }

and then it seems to work, but man this is an ugly language :-(

regards, tom lane

#11Andrew Dunstan
andrew@dunslane.net
In reply to: Tom Lane (#10)
Re: Undesirable entries in typedefs list

On Wed, Mar 28, 2018 at 8:32 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Andrew Dunstan <andrew.dunstan@2ndquadrant.com> writes:

pgindent already has a list of blacklisted typedefs (see lines 121 to 123)

Oh, so it does.

Here's a patch to pgindent which I think will do what you want fairly
simply, assuming you have identified all the offending tokens.

Hm, this does not work for me (with perl 5.10.1), I get syntax errors like

Not enough arguments for map at src/tools/pgindent/pgindent line 64, near "abs allocfunc iterator other pointer printfunc reference string type )"
(Might be a runaway multi-line () string starting on line 63)
syntax error at src/tools/pgindent/pgindent line 64, near "abs allocfunc iterator other pointer printfunc reference string type )"
Execution of src/tools/pgindent/pgindent aborted due to compilation errors.

After reading the perlfunc man page I changed

-my %blacklist = map { "$_\n" => 1 }
+my %blacklist = map { +"$_\n" => 1 }

and then it seems to work, but man this is an ugly language :-(

Yes, there are lots of quirks that can make it fun to deal with.

Thanks for fixing and committing.

cheers

andrew

--
Andrew Dunstan https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services