Re: sgml cleanup: unescaped '>' characters

Started by Bruce Momjian, over 14 years ago (9 messages, docs)
#1 Bruce Momjian
bruce@momjian.us

Peter Eisentraut wrote:

as well as seemingly-invalid SGML, such as using '>' unescaped inside
normal SGML entries.

Unescaped > is valid, AFAIK.

Oh, that's interesting. I took a quick look at "The SGML FAQ book",
page 73 [1], which supports this claim.

But I notice we've been fixing such issues in the recent past (e.g.
commit d420ba2a2d4ea4831f89a3fd7ce86b05eff932ff). Don't we want to
continue doing so? Not to mention the fact that we have
./src/tools/find_gt_lt, which while somewhat broken, has the
ostensible goal of finding such problems in the SGML. Or do we want to
stop worrying about '>' entirely, and rename find_gt_lt to find_lt,
instead?

[1] http://books.google.com/books?id=OyJHFJsnh10C&lpg=PA229&ots=DGkYDdvbhE&pg=PA73#v=onepage&q&f=false

I don't know what the rationale for this tool is. I have never used it.
Clearly, the reference shows, and the tools we use confirm, that it is
not necessary to use it.

I have updated the scripts and instructions accordingly.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

#2 Peter Eisentraut
peter_e@gmx.net
In reply to: Bruce Momjian (#1)

On Thu, 2011-09-01 at 10:17 -0400, Bruce Momjian wrote:

I don't know what the rationale for this tool is. I have never used it.
Clearly, the reference shows, and the tools we use confirm, that it is
not necessary to use it.

I have updated the scripts and instructions accordingly.

That still leaves open why we bother about escaping <.

#3 Bruce Momjian
bruce@momjian.us
In reply to: Peter Eisentraut (#2)

Peter Eisentraut wrote:

That still leaves open why we bother about escaping <.

The problem is that I often add SGML that has:

if (1 < 0) ...

I need something to warn me about those, especially in the release
notes.

#4 Peter Eisentraut
peter_e@gmx.net
In reply to: Bruce Momjian (#3)

On Thu, 2011-09-01 at 14:17 -0400, Bruce Momjian wrote:

That still leaves open why we bother about escaping <.

The problem is that I often add SGML that has:

if (1 < 0) ...

I need something to warn me about those, especially in the release
notes.

Why do you need to be warned about that?

#5 Bruce Momjian
bruce@momjian.us
In reply to: Peter Eisentraut (#4)

Peter Eisentraut wrote:

Why do you need to be warned about that?

If I have:

if (1 < fred)

it will think "fred" is a SGML tag, no?

#6 Peter Eisentraut
peter_e@gmx.net
In reply to: Bruce Momjian (#5)

On Thu, 2011-09-01 at 17:31 -0400, Bruce Momjian wrote:

If I have:

if (1 < fred)

it will think "fred" is a SGML tag, no?

No, a < followed by a space is not a tag, it's character data. If it
thought it were a tag, it would complain.
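
To make the rule above concrete, here is a rough Python sketch of when a "<" in content is read as markup rather than as character data. It is a simplified model assumed for illustration, not the parser's actual logic and not a proposed replacement for find_gt_lt; the function name and the pattern are made up for this example.

```python
import re

# Simplified, assumed model of SGML delimiter recognition in content:
# "<" opens markup only when the characters right after it make it a
# tag, a declaration, or a processing instruction.
_MARKUP_AFTER_LT = re.compile(
    r"""
    [A-Za-z]        # start tag: "<" + name start character
    | /[A-Za-z]     # end tag: "</" + name start character
    | ![A-Za-z]     # markup declaration such as <!ENTITY ...>
    | !--           # comment declaration
    | \?            # processing instruction
    """,
    re.VERBOSE,
)


def markup_like_lt_positions(text):
    """Return the offsets of every "<" that this model treats as markup."""
    return [
        m.start()
        for m in re.finditer("<", text)
        if _MARKUP_AFTER_LT.match(text, m.end())
    ]


print(markup_like_lt_positions("if (1 < fred)"))       # [] : "<" + space is data
print(markup_like_lt_positions("if (1 <fred)"))        # [6] : looks like a start tag
print(markup_like_lt_positions("'<' and '>'"))         # [] : quoted "<" is data
print(markup_like_lt_positions("<para>a > b</para>"))  # [0, 11] : only the real tags
```

Under this model a bare ">" never matters, and "<" only matters when a letter, "/", "!" or "?" follows it, which is the case where the parser itself would complain about an unknown tag.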

#7 Bruce Momjian
bruce@momjian.us
In reply to: Peter Eisentraut (#6)

Peter Eisentraut wrote:

No, a < followed by a space is not a tag, it's character data. If it
thought it were a tag, it would complain.

Sometimes it is '<' (in single quotes), which I thought would be a
problem.

#8 Peter Eisentraut
peter_e@gmx.net
In reply to: Bruce Momjian (#7)

On Sat, 2011-09-03 at 16:47 -0400, Bruce Momjian wrote:

Sometimes it is '<' (in single quotes), which I thought would be a
problem.

The bottom line is, the SGML parser can figure that out itself, and if
it has a problem, it will complain. We don't need to second guess it
with regular expressions that are handcrafted out of thin air.

I was hoping you would remember whether you initially put this in
because of some tool problem. But if we are not finding any supporting
evidence, I would suggest that we just scrap this thing entirely.

#9 Bruce Momjian
bruce@momjian.us
In reply to: Peter Eisentraut (#8)

Peter Eisentraut wrote:

The bottom line is, the SGML parser can figure that out itself, and if
it has a problem, it will complain. We don't need to second guess it
with regular expressions that are handcrafted out of thin air.

I was hoping you would remember whether you initially put this in
because of some tool problem. But if we are not finding any supporting
evidence, I would suggest that we just scrap this thing entirely.

I put it in to warn about release.sgml markup problems, so I properly
escaped all non-tag '>' and '<' characters.

I have removed the tool. We can always re-add it if we find it is
needed.
