Doc: typo in config.sgml
I think there's an unnecessary underscore in config.sgml.
Attached patch fixes it.
Best regards,
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp
Attachments:
fix_config.patch (text/x-patch; charset=iso-8859-1)
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 0aec11f443..08173ecb5c 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -9380,7 +9380,7 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
<para>
If <varname>transaction_timeout</varname> is shorter or equal to
<varname>idle_in_transaction_session_timeout</varname> or <varname>statement_timeout</varname>
- then the longer timeout is ignored.
+ then the longer timeout is ignored.
</para>
<para>
On Mon, 30 Sep 2024 15:34:04 +0900 (JST)
Tatsuo Ishii <ishii@postgresql.org> wrote:
I think there's an unnecessary underscore in config.sgml.
Attached patch fixes it.
I could not apply the patch with an error.
error: patch failed: doc/src/sgml/config.sgml:9380
error: doc/src/sgml/config.sgml: patch does not apply
I found your patch contains an odd character (ASCII Code 240?)
by performing `od -c` command on the file. See the attached file.
Regards,
Yugo Nagata
Best regards,
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp
--
Yugo Nagata <nagata@sraoss.co.jp>
Attachments:
I think there's an unnecessary underscore in config.sgml.
Attached patch fixes it.
I could not apply the patch with an error.
error: patch failed: doc/src/sgml/config.sgml:9380
error: doc/src/sgml/config.sgml: patch does not apply
Strange. I have no problem applying the patch here.
I found your patch contains an odd character (ASCII Code 240?)
by performing `od -c` command on the file. See the attached file.
Yes, 240 in octal (== 0xc2) is in the patch but it's because current
config.sgml includes the character. You can check it by looking at
line 9383 of config.sgml.
I think it was introduced by 28e858c0f95.
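For reference, a minimal way to confirm the byte sequence at that spot (run in doc/src/sgml; GNU sed and od assumed, and the line number is the one given above):

  sed -n '9383p' config.sgml | od -An -tx1

A non-breaking space shows up as the byte pair c2 a0, whereas an ordinary space is just 20.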
Best regards,
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp
On Mon, 30 Sep 2024 17:23:24 +0900 (JST)
Tatsuo Ishii <ishii@postgresql.org> wrote:
I think there's an unnecessary underscore in config.sgml.
Attached patch fixes it.
I could not apply the patch with an error.
error: patch failed: doc/src/sgml/config.sgml:9380
error: doc/src/sgml/config.sgml: patch does not apply
Strange. I have no problem applying the patch here.
I found your patch contains an odd character (ASCII Code 240?)
by performing `od -c` command on the file. See the attached file.
Yes, 240 in octal (== 0xc2) is in the patch but it's because current
config.sgml includes the character. You can check it by looking at
line 9383 of config.sgml.
Yes, you are right, I can find the 0xc2 char in config.sgml using od -c,
although I still could not apply the patch.
I think this is a non-breaking space (C2 A0) in UTF-8. I guess my
terminal normally regards this as a space, so applying the patch fails.
I also found one in line 85 of ref/drop_extension.sgml.
I think it was introduced by 28e858c0f95.
Best regards,
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp
--
Yugo NAGATA <nagata@sraoss.co.jp>
I think there's an unnecessary underscore in config.sgml.
I was wrong. The particular byte sequence just looked like an underscore
in my editor, but it is actually 0xc2a0, which must be a
"non-breaking space" encoded in UTF-8. I guess someone mistakenly
inserted a non-breaking space while editing config.sgml.
However, the mistake does not affect the patch.
Best regards,
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp
On Mon, 30 Sep 2024 18:03:44 +0900 (JST)
Tatsuo Ishii <ishii@postgresql.org> wrote:
I think there's an unnecessary underscore in config.sgml.
I was wrong. The particular byte sequences just looked an underscore
on my editor but the byte sequence is actually 0xc2a0, which must be a
"non breaking space" encoded in UTF-8. I guess someone mistakenly
insert a non breaking space while editing config.sgml.
However the mistake does not affect the patch.
It looks like our emails have crossed.
Anyway, I agree with removing non-breaking spaces, including the
one found in line 85 of ref/drop_extension.sgml.
Regards,
Yugo Nagata
Best regards,
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp
--
Yugo NAGATA <nagata@sraoss.co.jp>
On 30 Sep 2024, at 11:03, Tatsuo Ishii <ishii@postgresql.org> wrote:
I think there's an unnecessary underscore in config.sgml.
I was wrong. The particular byte sequences just looked an underscore
on my editor but the byte sequence is actually 0xc2a0, which must be a
"non breaking space" encoded in UTF-8. I guess someone mistakenly
insert a non breaking space while editing config.sgml.
I wonder if it would be worth adding a check for this like we have for tabs?
The attached adds a rule to "make -C doc/src/sgml check" for trapping nbsp
(doing so made me realize we don't have an equivalent meson target).
--
Daniel Gustafsson
Attachments:
check_nbsp.diff (application/octet-stream)
diff --git a/doc/src/sgml/Makefile b/doc/src/sgml/Makefile
index 9c9bbfe375..f6d2c85226 100644
--- a/doc/src/sgml/Makefile
+++ b/doc/src/sgml/Makefile
@@ -194,7 +194,7 @@ MAKEINFO = makeinfo
##
# Quick syntax check without style processing
-check: postgres.sgml $(ALLSGML) check-tabs
+check: postgres.sgml $(ALLSGML) check-tabs check-nbsp
$(XMLLINT) $(XMLINCLUDE) --noout --valid $<
@@ -259,6 +259,9 @@ endif # sqlmansectnum != 7
check-tabs:
@( ! grep ' ' $(wildcard $(srcdir)/*.sgml $(srcdir)/ref/*.sgml $(srcdir)/*.xsl) ) || (echo "Tabs appear in SGML/XML files" 1>&2; exit 1)
+check-nbsp:
+ @( ! grep -e "\xA0" $(wildcard $(srcdir)/*.sgml $(srcdir)/ref/*.sgml $(srcdir)/*.xsl) ) || (echo "Non-breaking space appear in SGML/XML files" 1>&2; exit 1)
+
##
## Clean
##
On Mon, 30 Sep 2024 11:59:48 +0200
Daniel Gustafsson <daniel@yesql.se> wrote:
On 30 Sep 2024, at 11:03, Tatsuo Ishii <ishii@postgresql.org> wrote:
I think there's an unnecessary underscore in config.sgml.
I was wrong. The particular byte sequences just looked an underscore
on my editor but the byte sequence is actually 0xc2a0, which must be a
"non breaking space" encoded in UTF-8. I guess someone mistakenly
insert a non breaking space while editing config.sgml.
I wonder if it would be worth to add a check for this like we have to tabs?
The attached adds a rule to "make -C doc/src/sgml check" for trapping nbsp
(doing so made me realize we don't have an equivalent meson target).
Your patch couldn't detect 0xA0 in config.sgml on my machine, but it works
when I use `grep -P "[\xA0]"` instead of `grep -e "\xA0"`.
However, it also detects the following line in charset.sgml.
(https://www.postgresql.org/docs/current/collation.html)
For example, locale und-u-kb sorts 'àe' before 'aé'.
This is not a non-breaking space, so it should not be detected as an error.
Regards,
Yugo Nagata
--
Daniel Gustafsson
--
Yugo Nagata <nagata@sraoss.co.jp>
I wonder if it would be worth to add a check for this like we have to tabs?
+1.
The attached adds a rule to "make -C doc/src/sgml check" for trapping nbsp
(doing so made me realize we don't have an equivalent meson target).
Your patch couldn't detect 0xA0 in config.sgml in my machine, but it works
when I use `grep -P "[\xA0]"` instead of `grep -e "\xA0"`.
However, it also detects the following line in charset.sgml.
(https://www.postgresql.org/docs/current/collation.html)
For example, locale und-u-kb sorts 'àe' before 'aé'.
This is not non-breaking space, so should not be detected as an error.
That's because a non-breaking space (nbsp) is not encoded as 0xa0 in
UTF-8. nbsp in UTF-8 is "0xc2 0xa0" (2 bytes); 0xa0 is nbsp's code
point in Unicode, i.e. U+00A0.
So grep -P "[\xC2\xA0]" should work to detect nbsp.
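The byte sequence is easy to confirm with printf and od (octal 302 240 == hex c2 a0):

  printf '\302\240' | od -An -tx1
   c2 a0

As the follow-up below notes, the brackets are not actually needed: [\xC2\xA0] is a character class matching either single byte, so it can also hit other UTF-8 sequences that merely contain the 0xA0 byte.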
Best regards,
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp
On Mon, 30 Sep 2024 20:07:31 +0900 (JST)
Tatsuo Ishii <ishii@postgresql.org> wrote:
I wonder if it would be worth to add a check for this like we have to tabs?
+1.
The attached adds a rule to "make -C doc/src/sgml check" for trapping nbsp
(doing so made me realize we don't have an equivalent meson target).
Your patch couldn't detect 0xA0 in config.sgml in my machine, but it works
when I use `grep -P "[\xA0]"` instead of `grep -e "\xA0"`.
However, it also detects the following line in charset.sgml.
(https://www.postgresql.org/docs/current/collation.html)
For example, locale und-u-kb sorts 'àe' before 'aé'.
This is not non-breaking space, so should not be detected as an error.
That's because non-breaking space (nbsp) is not encoded as 0xa0 in
UTF-8. nbsp in UTF-8 is "0xc2 0xa0" (2 bytes) (A 0xa0 is a nbsp's code
point in Unicode. i.e. U+00A0).
So grep -P "[\xC2\xA0]" should work to detect nbsp.
`LC_ALL=C grep -P "\xC2\xA0"` works for my environment.
([ and ] were not necessary.)
When LC_ALL is null, `grep -P "\xA0"` could not detect any characters in charset.sgml,
but I think it is better to specify both LC_ALL=C and "\xC2\xA0" to make sure
nbsp is detected.
One problem is that the -P option is available only in GNU grep; the grep on macOS doesn't support it.
On bash, we can also use `grep $'\xc2\xa0'`, but I am not sure we can assume the shell is bash.
Maybe a better way is to use perl itself rather than grep, as follows:
`perl -ne '/\xC2\xA0/ and print' `
I attached a patch fixed in this way.
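A quick sanity check of the perl approach, using a throwaway file (the file name here is made up for the demo):

  printf 'foo\302\240bar\n' > /tmp/nbsp-demo.sgml
  perl -ne '/\xC2\xA0/ and print "$ARGV:$_"' /tmp/nbsp-demo.sgml
  /tmp/nbsp-demo.sgml:foo bar

Perl handles the \xHH escapes in the regex itself, so this behaves the same regardless of the locale or which grep/sed is installed.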
Regards,
Yugo Nagata
Best regards,
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp
--
Yugo NAGATA <nagata@sraoss.co.jp>
Attachments:
v2_check_nbsp.diff (text/x-diff)
diff --git a/doc/src/sgml/Makefile b/doc/src/sgml/Makefile
index 9c9bbfe375..2081ba1ffc 100644
--- a/doc/src/sgml/Makefile
+++ b/doc/src/sgml/Makefile
@@ -194,7 +194,7 @@ MAKEINFO = makeinfo
##
# Quick syntax check without style processing
-check: postgres.sgml $(ALLSGML) check-tabs
+check: postgres.sgml $(ALLSGML) check-tabs check-nbsp
$(XMLLINT) $(XMLINCLUDE) --noout --valid $<
@@ -259,6 +259,9 @@ endif # sqlmansectnum != 7
check-tabs:
@( ! grep ' ' $(wildcard $(srcdir)/*.sgml $(srcdir)/ref/*.sgml $(srcdir)/*.xsl) ) || (echo "Tabs appear in SGML/XML files" 1>&2; exit 1)
+check-nbsp:
+ @( ! $(PERL) -ne '/\xC2\xA0/ and print "$$ARGV $$_"' $(wildcard $(srcdir)/*.sgml $(srcdir)/ref/*.sgml $(srcdir)/*.xsl) ) || (echo "Non-breaking space appear in SGML/XML files" 1>&2; exit 1)
+
##
## Clean
##
That's because non-breaking space (nbsp) is not encoded as 0xa0 in
UTF-8. nbsp in UTF-8 is "0xc2 0xa0" (2 bytes) (A 0xa0 is a nbsp's code
point in Unicode. i.e. U+00A0).
So grep -P "[\xC2\xA0]" should work to detect nbsp.
`LC_ALL=C grep -P "\xC2\xA0"` works for my environment.
([ and ] were not necessary.)
When LC_ALL is null, `grep -P "\xA0"` could not detect any characters in charset.sgml,
but I think it is better to specify both LC_ALL=C and "\xC2\xA0" for making sure detecting
nbsp.
One problem is that -P option can be used in only GNU grep, and grep in mac doesn't support it.
On bash, we can also use `grep $'\xc2\xa0'`, but I am not sure we can assume the shell is bash.
Maybe, better way is use perl itself rather than grep as following.
`perl -ne '/\xC2\xA0/ and print' `
I attached a patch fixed in this way.
GNU sed can also be used without setting LC_ALL:
sed -n /"\xC2\xA0"/p
However I am not sure if non-GNU sed can do this too...
Best regards,
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp
On Mon, 30 Sep 2024 17:23:24 +0900 (JST)
Tatsuo Ishii <ishii@postgresql.org> wrote:
I think there's an unnecessary underscore in config.sgml.
Attached patch fixes it.
I could not apply the patch with an error.
error: patch failed: doc/src/sgml/config.sgml:9380
error: doc/src/sgml/config.sgml: patch does not apply
Strange. I have no problem applying the patch here.
I found your patch contains an odd character (ASCII Code 240?)
by performing `od -c` command on the file. See the attached file.
Yes, 240 in octal (== 0xc2) is in the patch but it's because current
config.sgml includes the character. You can check it by looking at
line 9383 of config.sgml.
Yes, you are right, I can find the 0xc2 char in config.sgml using od -c,
although I still could not apply the patch.
I think this is non-breaking space of (C2A0) of utf-8. I guess my
terminal normally regards this as a space, so applying patch fails.
I found it also in line 85 of ref/drop_extension.sgml.
Thanks. I have pushed the fix for ref/drop_extension.sgml along with
config.sgml.
Best regards,
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp
On Tue, 01 Oct 2024 10:33:50 +0900 (JST)
Tatsuo Ishii <ishii@postgresql.org> wrote:
That's because non-breaking space (nbsp) is not encoded as 0xa0 in
UTF-8. nbsp in UTF-8 is "0xc2 0xa0" (2 bytes) (A 0xa0 is a nbsp's code
point in Unicode. i.e. U+00A0).
So grep -P "[\xC2\xA0]" should work to detect nbsp.
`LC_ALL=C grep -P "\xC2\xA0"` works for my environment.
([ and ] were not necessary.)
When LC_ALL is null, `grep -P "\xA0"` could not detect any characters in charset.sgml,
but I think it is better to specify both LC_ALL=C and "\xC2\xA0" for making sure detecting
nbsp.
One problem is that -P option can be used in only GNU grep, and grep in mac doesn't support it.
On bash, we can also use `grep $'\xc2\xa0'`, but I am not sure we can assume the shell is bash.
Maybe, better way is use perl itself rather than grep as following.
`perl -ne '/\xC2\xA0/ and print' `
I attached a patch fixed in this way.
GNU sed can also be used without setting LC_ALL:
sed -n /"\xC2\xA0"/p
However I am not sure if non-GNU sed can do this too...
Although I haven't checked it myself, BSD sed doesn't support the \x escape, according to
[1] https://stackoverflow.com/questions/24275070/sed-not-giving-me-correct-substitute-operation-for-newline-with-mac-difference
By the way, I've attached a patch modified a bit to use the plural form of the message,
the same as check-tabs:
Non-breaking **spaces** appear in SGML/XML files
Regards,
Yugo Nagata
Best regards,
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp
--
Yugo NAGATA <nagata@sraoss.co.jp>
Attachments:
v3_check_nbsp.diff (text/x-diff)
diff --git a/doc/src/sgml/Makefile b/doc/src/sgml/Makefile
index 9c9bbfe375..17feae9ed0 100644
--- a/doc/src/sgml/Makefile
+++ b/doc/src/sgml/Makefile
@@ -194,7 +194,7 @@ MAKEINFO = makeinfo
##
# Quick syntax check without style processing
-check: postgres.sgml $(ALLSGML) check-tabs
+check: postgres.sgml $(ALLSGML) check-tabs check-nbsp
$(XMLLINT) $(XMLINCLUDE) --noout --valid $<
@@ -259,6 +259,9 @@ endif # sqlmansectnum != 7
check-tabs:
@( ! grep ' ' $(wildcard $(srcdir)/*.sgml $(srcdir)/ref/*.sgml $(srcdir)/*.xsl) ) || (echo "Tabs appear in SGML/XML files" 1>&2; exit 1)
+check-nbsp:
+ @( ! $(PERL) -ne '/\xC2\xA0/ and print "$$ARGV $$_"' $(wildcard $(srcdir)/*.sgml $(srcdir)/ref/*.sgml $(srcdir)/*.xsl) ) || (echo "Non-breaking spaces appear in SGML/XML files" 1>&2; exit 1)
+
##
## Clean
##
On Tue, 1 Oct 2024 15:16:52 +0900
Yugo NAGATA <nagata@sraoss.co.jp> wrote:
On Tue, 01 Oct 2024 10:33:50 +0900 (JST)
Tatsuo Ishii <ishii@postgresql.org> wrote:
That's because non-breaking space (nbsp) is not encoded as 0xa0 in
UTF-8. nbsp in UTF-8 is "0xc2 0xa0" (2 bytes) (A 0xa0 is a nbsp's code
point in Unicode. i.e. U+00A0).
So grep -P "[\xC2\xA0]" should work to detect nbsp.
`LC_ALL=C grep -P "\xC2\xA0"` works for my environment.
([ and ] were not necessary.)
When LC_ALL is null, `grep -P "\xA0"` could not detect any characters in charset.sgml,
but I think it is better to specify both LC_ALL=C and "\xC2\xA0" for making sure detecting
nbsp.
One problem is that -P option can be used in only GNU grep, and grep in mac doesn't support it.
On bash, we can also use `grep $'\xc2\xa0'`, but I am not sure we can assume the shell is bash.
Maybe, better way is use perl itself rather than grep as following.
`perl -ne '/\xC2\xA0/ and print' `
I attached a patch fixed in this way.
GNU sed can also be used without setting LC_ALL:
sed -n /"\xC2\xA0"/p
However I am not sure if non-GNU sed can do this too...
Although I've not check it myself, BSD sed doesn't support \x escape according to [1].
By the way, I've attached a patch a bit modified to use the plural form statement
as same as check-tabs.
Non-breaking **spaces** appear in SGML/XML files
The previous patch was broken because the perl command failed to return the correct result.
I've attached an updated patch to fix the return value. In passing, I added line breaks
for long lines.
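For context, perl exits with status 0 whether or not the pattern matched, so the earlier "! ( perl ... )" construct could not distinguish a clean tree from one containing nbsp. Counting matches and exiting from an END block gives grep-like semantics; a quick check against a throwaway file (names made up for the demo):

  printf 'foo\302\240bar\n' > /tmp/nbsp-demo.sgml
  perl -ne '/\xC2\xA0/ and print("$ARGV:$_"),$n++; END {exit($n>0)}' /tmp/nbsp-demo.sgml; echo $?
  /tmp/nbsp-demo.sgml:foo bar
  1
  perl -ne '/\xC2\xA0/ and print("$ARGV:$_"),$n++; END {exit($n>0)}' /dev/null; echo $?
  0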
Regards,
Yugo Nagata
--
Yugo Nagata <nagata@sraoss.co.jp>
Attachments:
v4_check_nbsp.diff (text/x-diff)
diff --git a/doc/src/sgml/Makefile b/doc/src/sgml/Makefile
index 9c9bbfe375..e5607585af 100644
--- a/doc/src/sgml/Makefile
+++ b/doc/src/sgml/Makefile
@@ -194,7 +194,7 @@ MAKEINFO = makeinfo
##
# Quick syntax check without style processing
-check: postgres.sgml $(ALLSGML) check-tabs
+check: postgres.sgml $(ALLSGML) check-tabs check-nbsp
$(XMLLINT) $(XMLINCLUDE) --noout --valid $<
@@ -255,9 +255,15 @@ clean-man:
endif # sqlmansectnum != 7
-# tabs are harmless, but it is best to avoid them in SGML files
+# tabs and non-breaking spaces are harmless, but it is best to avoid them in SGML files
check-tabs:
- @( ! grep ' ' $(wildcard $(srcdir)/*.sgml $(srcdir)/ref/*.sgml $(srcdir)/*.xsl) ) || (echo "Tabs appear in SGML/XML files" 1>&2; exit 1)
+ @( ! grep ' ' $(wildcard $(srcdir)/*.sgml $(srcdir)/ref/*.sgml $(srcdir)/*.xsl) ) || \
+ (echo "Tabs appear in SGML/XML files" 1>&2; exit 1)
+
+check-nbsp:
+ @ ( $(PERL) -ne '/\xC2\xA0/ and print("$$ARGV:$$_"),$$n++; END {exit($$n>0)}' \
+ $(wildcard $(srcdir)/*.sgml $(srcdir)/ref/*.sgml $(srcdir)/*.xsl) ) || \
+ (echo "Non-breaking spaces appear in SGML/XML files" 1>&2; exit 1)
##
## Clean
On Tue, 1 Oct 2024 22:20:55 +0900
Yugo Nagata <nagata@sraoss.co.jp> wrote:
On Tue, 1 Oct 2024 15:16:52 +0900
Yugo NAGATA <nagata@sraoss.co.jp> wrote:
On Tue, 01 Oct 2024 10:33:50 +0900 (JST)
Tatsuo Ishii <ishii@postgresql.org> wrote:
That's because non-breaking space (nbsp) is not encoded as 0xa0 in
UTF-8. nbsp in UTF-8 is "0xc2 0xa0" (2 bytes) (A 0xa0 is a nbsp's code
point in Unicode. i.e. U+00A0).
So grep -P "[\xC2\xA0]" should work to detect nbsp.
`LC_ALL=C grep -P "\xC2\xA0"` works for my environment.
([ and ] were not necessary.)
When LC_ALL is null, `grep -P "\xA0"` could not detect any characters in charset.sgml,
but I think it is better to specify both LC_ALL=C and "\xC2\xA0" for making sure detecting
nbsp.
One problem is that -P option can be used in only GNU grep, and grep in mac doesn't support it.
On bash, we can also use `grep $'\xc2\xa0'`, but I am not sure we can assume the shell is bash.
Maybe, better way is use perl itself rather than grep as following.
`perl -ne '/\xC2\xA0/ and print' `
I attached a patch fixed in this way.
GNU sed can also be used without setting LC_ALL:
sed -n /"\xC2\xA0"/p
However I am not sure if non-GNU sed can do this too...
Although I've not check it myself, BSD sed doesn't support \x escape according to [1].
By the way, I've attached a patch a bit modified to use the plural form statement
as same as check-tabs.
Non-breaking **spaces** appear in SGML/XML files
The previous patch was broken because the perl command failed to return the correct result.
I've attached an updated patch to fix the return value. In passing, I added line breaks
for long lines.
I've attached an updated patch.
I added the comment to explain why Perl is used instead of grep or sed.
Regards,
Yugo Nagata
--
Yugo NAGATA <nagata@sraoss.co.jp>
Attachments:
v5_check_nbsp.diff (text/x-diff)
diff --git a/doc/src/sgml/Makefile b/doc/src/sgml/Makefile
index 9c9bbfe375..65ed32cd0a 100644
--- a/doc/src/sgml/Makefile
+++ b/doc/src/sgml/Makefile
@@ -194,7 +194,7 @@ MAKEINFO = makeinfo
##
# Quick syntax check without style processing
-check: postgres.sgml $(ALLSGML) check-tabs
+check: postgres.sgml $(ALLSGML) check-tabs check-nbsp
$(XMLLINT) $(XMLINCLUDE) --noout --valid $<
@@ -257,7 +257,15 @@ endif # sqlmansectnum != 7
# tabs are harmless, but it is best to avoid them in SGML files
check-tabs:
- @( ! grep ' ' $(wildcard $(srcdir)/*.sgml $(srcdir)/ref/*.sgml $(srcdir)/*.xsl) ) || (echo "Tabs appear in SGML/XML files" 1>&2; exit 1)
+ @( ! grep ' ' $(wildcard $(srcdir)/*.sgml $(srcdir)/ref/*.sgml $(srcdir)/*.xsl) ) || \
+ (echo "Tabs appear in SGML/XML files" 1>&2; exit 1)
+
+# Non-breaking spaces are harmless, but it is best to avoid them in SGML files.
+# Use perl command because non-GNU grep or sed could not have hex escape sequence.
+check-nbsp:
+ @ ( $(PERL) -ne '/\xC2\xA0/ and print("$$ARGV:$$_"),$$n++; END {exit($$n>0)}' \
+ $(wildcard $(srcdir)/*.sgml $(srcdir)/ref/*.sgml $(srcdir)/*.xsl) ) || \
+ (echo "Non-breaking spaces appear in SGML/XML files" 1>&2; exit 1)
##
## Clean
On Mon, Sep 30, 2024 at 11:59:48AM +0200, Daniel Gustafsson wrote:
On 30 Sep 2024, at 11:03, Tatsuo Ishii <ishii@postgresql.org> wrote:
I think there's an unnecessary underscore in config.sgml.
I was wrong. The particular byte sequences just looked an underscore
on my editor but the byte sequence is actually 0xc2a0, which must be a
"non breaking space" encoded in UTF-8. I guess someone mistakenly
insert a non breaking space while editing config.sgml.
I wonder if it would be worth to add a check for this like we have to tabs?
The attached adds a rule to "make -C doc/src/sgml check" for trapping nbsp
(doing so made me realize we don't have an equivalent meson target).
Can we check for any character outside the support range of SGML?
--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com
When a patient asks the doctor, "Am I going to die?", he means
"Am I going to die soon?"
On Tue, 1 Oct 2024 22:20:55 +0900
Yugo Nagata <nagata@sraoss.co.jp> wrote:
On Tue, 1 Oct 2024 15:16:52 +0900
Yugo NAGATA <nagata@sraoss.co.jp> wrote:
On Tue, 01 Oct 2024 10:33:50 +0900 (JST)
Tatsuo Ishii <ishii@postgresql.org> wrote:
That's because non-breaking space (nbsp) is not encoded as 0xa0 in
UTF-8. nbsp in UTF-8 is "0xc2 0xa0" (2 bytes) (A 0xa0 is a nbsp's code
point in Unicode. i.e. U+00A0).
So grep -P "[\xC2\xA0]" should work to detect nbsp.
`LC_ALL=C grep -P "\xC2\xA0"` works for my environment.
([ and ] were not necessary.)
When LC_ALL is null, `grep -P "\xA0"` could not detect any characters in charset.sgml,
but I think it is better to specify both LC_ALL=C and "\xC2\xA0" for making sure detecting
nbsp.
One problem is that -P option can be used in only GNU grep, and grep in mac doesn't support it.
On bash, we can also use `grep $'\xc2\xa0'`, but I am not sure we can assume the shell is bash.
Maybe, better way is use perl itself rather than grep as following.
`perl -ne '/\xC2\xA0/ and print' `
I attached a patch fixed in this way.
GNU sed can also be used without setting LC_ALL:
sed -n /"\xC2\xA0"/p
However I am not sure if non-GNU sed can do this too...
Although I've not check it myself, BSD sed doesn't support \x escape according to [1].
By the way, I've attached a patch a bit modified to use the plural form statement
as same as check-tabs.
Non-breaking **spaces** appear in SGML/XML files
The previous patch was broken because the perl command failed to return the correct result.
I've attached an updated patch to fix the return value. In passing, I added line breaks
for long lines.
I've attached a updated patch.
I added the comment to explain why Perl is used instead of grep or sed.
Looks good to me. If there are no objections, I will commit this to the
master branch.
Best regards,
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp
On 8 Oct 2024, at 02:03, Tatsuo Ishii <ishii@postgresql.org> wrote:
On Tue, 1 Oct 2024 22:20:55 +0900
Yugo Nagata <nagata@sraoss.co.jp> wrote:
I've attached a updated patch.
I added the comment to explain why Perl is used instead of grep or sed.
Looks good to me. If there's no objection, I will commit this to
master branch.
No objections, LGTM.
--
Daniel Gustafsson
Hi Daniel, Yugo,
On 8 Oct 2024, at 02:03, Tatsuo Ishii <ishii@postgresql.org> wrote:
On Tue, 1 Oct 2024 22:20:55 +0900
Yugo Nagata <nagata@sraoss.co.jp> wrote:
I've attached a updated patch.
I added the comment to explain why Perl is used instead of grep or sed.
Looks good to me. If there's no objection, I will commit this to
master branch.
No objections, LGTM.
Thank you for the patch and review! I have pushed the patch.
Best regards,
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp
On Mon, 7 Oct 2024 15:45:54 -0400
Bruce Momjian <bruce@momjian.us> wrote:
On Mon, Sep 30, 2024 at 11:59:48AM +0200, Daniel Gustafsson wrote:
On 30 Sep 2024, at 11:03, Tatsuo Ishii <ishii@postgresql.org> wrote:
I think there's an unnecessary underscore in config.sgml.
I was wrong. The particular byte sequences just looked an underscore
on my editor but the byte sequence is actually 0xc2a0, which must be a
"non breaking space" encoded in UTF-8. I guess someone mistakenly
insert a non breaking space while editing config.sgml.
I wonder if it would be worth to add a check for this like we have to tabs?
The attached adds a rule to "make -C doc/src/sgml check" for trapping nbsp
(doing so made me realize we don't have an equivalent meson target).
Can we check for any character outside the support range of SGML?
How can we define the range of allowed characters in SGML?
We can detect non-ASCII characters by using the regexp /\P{ascii}/ or /[^\x00-\x7f]/,
but such characters are used in some places in charset.sgml and in some names in release-*.sgml.
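Either spelling works on raw bytes; a quick check (the input string is just an example):

  printf 'caf\303\251\n' | perl -ne 'print "non-ASCII found\n" if /\P{ascii}/'
  non-ASCII found
  printf 'caf\303\251\n' | perl -ne 'print "non-ASCII found\n" if /[^\x00-\x7f]/'
  non-ASCII found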
Regards,
Yugo Nagata
--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com
When a patient asks the doctor, "Am I going to die?", he means
"Am I going to die soon?"
--
Yugo Nagata <nagata@sraoss.co.jp>
On Mon, 7 Oct 2024 15:45:54 -0400
Bruce Momjian <bruce@momjian.us> wrote:
On Mon, Sep 30, 2024 at 11:59:48AM +0200, Daniel Gustafsson wrote:
On 30 Sep 2024, at 11:03, Tatsuo Ishii <ishii@postgresql.org> wrote:
I think there's an unnecessary underscore in config.sgml.
I was wrong. The particular byte sequences just looked an underscore
on my editor but the byte sequence is actually 0xc2a0, which must be a
"non breaking space" encoded in UTF-8. I guess someone mistakenly
insert a non breaking space while editing config.sgml.
I wonder if it would be worth to add a check for this like we have to tabs?
The attached adds a rule to "make -C doc/src/sgml check" for trapping nbsp
(doing so made me realize we don't have an equivalent meson target).
Can we check for any character outside the support range of SGML?
What we can define the range of allowed characters range in SGML?
We can detect non-ASCII characters by using regexp /\P{ascii}/ or /[^\x00-\x7f]/,
but they are used in some places in charset.sgml and some names in release-*.sgml.
I failed to find any standard regarding what characters are allowed in
SGML/XML. Assuming that any valid Unicode characters are allowed in
our *sgml files, I am afraid the best we can do is grepping for non-ASCII
characters in the files and checking the results by visual
inspection. Besides nbsp, there are tons of confusing Unicode
characters out there. For example, there are many "hyphen-like
characters":
https://www.compart.com/en/unicode/category/Pd
If one of them is used in the sgml files, it is possible that it
was accidentally inserted.
Best regards,
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp
On Wed, Oct 9, 2024 at 11:49:29AM +0900, Tatsuo Ishii wrote:
On Mon, Sep 30, 2024 at 11:59:48AM +0200, Daniel Gustafsson wrote:
On 30 Sep 2024, at 11:03, Tatsuo Ishii <ishii@postgresql.org> wrote:
I think there's an unnecessary underscore in config.sgml.
I was wrong. The particular byte sequences just looked an underscore
on my editor but the byte sequence is actually 0xc2a0, which must be a
"non breaking space" encoded in UTF-8. I guess someone mistakenly
insert a non breaking space while editing config.sgml.
I wonder if it would be worth to add a check for this like we have to tabs?
The attached adds a rule to "make -C doc/src/sgml check" for trapping nbsp
(doing so made me realize we don't have an equivalent meson target).
Can we check for any character outside the support range of SGML?
What we can define the range of allowed characters range in SGML?
We can detect non-ASCII characters by using regexp /\P{ascii}/ or /[^\x00-\x7f]/,
but they are used in some places in charset.sgml and some names in release-*.sgml.
I failed to find any standard regarding what characters are allowed in
SGML/XML. Assuming that any valid Unicode characters are allowed in
our *sgml files, I am afraid the best we can do is grepping non-ASCII
characters against the files and checking the results by a visual
inspection. Besides nbsp, there are tons of confusing Unicode
characters out there. For example there are many "hyphen like
characters".https://www.compart.com/en/unicode/category/Pd
If one of them is used in the sgml files, it may be possible that it
was accidentally inserted.
Can we use Unicode in the SGML files?
--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com
When a patient asks the doctor, "Am I going to die?", he means
"Am I going to die soon?"
Bruce Momjian <bruce@momjian.us> writes:
Can we use Unicode in the SGML files?
I believe we've been doing it for contributors' names that require
non-ASCII letters, but not in any other places.
regards, tom lane
Bruce Momjian <bruce@momjian.us> writes:
Can we use Unicode in the SGML files?
I believe we've been doing it for contributors' names that require
non-ASCII letters, but not in any other places.
We have non-ASCII letters in charset.sgml too, to show some examples
of collation.
Best regards,
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp
On 9 Oct 2024, at 04:49, Tatsuo Ishii <ishii@postgresql.org> wrote:
Besides nbsp, there are tons of confusing Unicode
characters out there. For example there are many "hyphen like
characters".
Using characters which look alike is in the field of internet security known as
homograph attacks, where for example a url visually passes for postgresql.org
but in fact leads to an attacker. That sort of attack clearly doesn't apply to
our docs though. However, what might cause similar problems is if we use a
unicode character in example code which the reader could be expected to
copy/paste into psql and run, which then (at best) causes a syntax error. We
could probably build tooling to catch this (most likely not too hard in XSLT)
but the ROI for that might be unfavourable. Even with tooling, committer
caution is needed to ensure we don't publish examples that might cause
unintended side effects when executed by copy/paste.
What separates nbsp is that it may affect the rendering in an unintuitive way
by forcing two words to not break even if the viewport is too narrow to fit.
Catching such characters seems worthwhile since it's also quite doable with a
trivial grep.
--
Daniel Gustafsson
On Thu, 10 Oct 2024 16:00:41 +0900 (JST)
Tatsuo Ishii <ishii@postgresql.org> wrote:
Bruce Momjian <bruce@momjian.us> writes:
Can we use Unicode in the SGML files?
I believe we've been doing it for contributors' names that require
non-ASCII letters, but not in any other places.
We have non-ASCII letters in charset.sgml too, to show some examples
of collation.
We can check SGML/XML files for non-ASCII letters by preparing an "allowlist"
that contains the lines which are allowed to have non-ASCII characters,
although this list will need to be maintained when the lines in it are modified.
I've attached a patch to add a simple Perl script to do this.
While testing this script, I found that "stylesheet-man.xsl" also has non-ASCII
characters. I don't know whether these characters are really necessary, though, since
I don't understand this file well.
Regards,
Yugo Nagata
--
Yugo Nagata <nagata@sraoss.co.jp>
Attachments:
0001-Doc-Add-check-to-detect-non-ASCII-characters.patch (text/x-diff)
From c5a16f1f7c515294cb600554fe1bbe045d25ec26 Mon Sep 17 00:00:00 2001
From: Yugo Nagata <nagata@sraoss.co.jp>
Date: Thu, 10 Oct 2024 23:35:19 +0900
Subject: [PATCH] Doc: Add check to detect non-ASCII characters
---
doc/src/sgml/Makefile | 11 ++++----
doc/src/sgml/check_non_ascii.pl | 47 +++++++++++++++++++++++++++++++++
2 files changed, 52 insertions(+), 6 deletions(-)
create mode 100644 doc/src/sgml/check_non_ascii.pl
diff --git a/doc/src/sgml/Makefile b/doc/src/sgml/Makefile
index 65ed32cd0a..90cbeed542 100644
--- a/doc/src/sgml/Makefile
+++ b/doc/src/sgml/Makefile
@@ -194,7 +194,7 @@ MAKEINFO = makeinfo
##
# Quick syntax check without style processing
-check: postgres.sgml $(ALLSGML) check-tabs check-nbsp
+check: postgres.sgml $(ALLSGML) check-tabs check-non-ascii
$(XMLLINT) $(XMLINCLUDE) --noout --valid $<
@@ -260,12 +260,11 @@ check-tabs:
@( ! grep ' ' $(wildcard $(srcdir)/*.sgml $(srcdir)/ref/*.sgml $(srcdir)/*.xsl) ) || \
(echo "Tabs appear in SGML/XML files" 1>&2; exit 1)
-# Non-breaking spaces are harmless, but it is best to avoid them in SGML files.
+# Non-ASCII characters are harmless, but it is best to avoid them in SGML files.
# Use perl command because non-GNU grep or sed could not have hex escape sequence.
-check-nbsp:
- @ ( $(PERL) -ne '/\xC2\xA0/ and print("$$ARGV:$$_"),$$n++; END {exit($$n>0)}' \
- $(wildcard $(srcdir)/*.sgml $(srcdir)/ref/*.sgml $(srcdir)/*.xsl) ) || \
- (echo "Non-breaking spaces appear in SGML/XML files" 1>&2; exit 1)
+check-non-ascii:
+ @ ( $(PERL) $(srcdir)/check_non_ascii.pl $(wildcard $(srcdir)/*.sgml $(srcdir)/ref/*.sgml $(srcdir)/*.xsl) ) || \
+ (echo "Non-ASCII characters appear in SGML/XML files" 1>&2; exit 1)
##
## Clean
diff --git a/doc/src/sgml/check_non_ascii.pl b/doc/src/sgml/check_non_ascii.pl
new file mode 100644
index 0000000000..1d7ae405b5
--- /dev/null
+++ b/doc/src/sgml/check_non_ascii.pl
@@ -0,0 +1,47 @@
+#!/usr/bin/perl
+#
+# Check if non-ASCII characters appear in SGML/XML files
+# Copyright (c) 2000-2024, PostgreSQL Global Development Group
+
+use strict;
+use warnings FATAL => 'all';
+
+# list of lines where non-ascii characters are allowed
+my %allowlist = (
+'./charset.sgml' => [
+"SELECT 'Å' = 'A' COLLATE ignore_accent_case; -- true",
+" <entry><literal>'n' = 'ñ'</literal></entry>",
+" performed. For example, <literal>'á'</literal> may be composed of the",
+" locale <literal>und-u-kb</literal> sorts <literal>'àe'</literal>",
+" before <literal>'aé'</literal>."
+],
+'./stylesheet-man.xsl' => [
+'<l:template name="sect.*" text="Section %n, “%t”, in the documentation"/>'
+]
+);
+
+# begin of the acknowledgements for contributors in the release-note
+my $release_ack='<sect2 id="release-.*-acknowledgements">';
+
+my $n = 0;
+foreach my $file (@ARGV)
+{
+ open my $fh, '<', $file or die;
+ while (my $line = <$fh>)
+ {
+ # skip lines in allowlist
+ next if exists($allowlist{$file}) and (grep {$line =~ $_} @{$allowlist{$file}});
+
+ # skip contributor names in the acknowledgements
+ last if ($line =~ /$release_ack/);
+
+ # check non-ascii characters
+ if ($line =~ /[^\x00-\x7f]/)
+ {
+ print "$file:$line";
+ $n++;
+ }
+ }
+ close $fh;
+}
+exit($n>0);
--
2.34.1
We can check non-ASCII letters SGML/XML files by preparing "allowlist"
that contains lines which are allowed to have non-ascii characters,
although this list will need to be maintained when lines in it are modified.
I've attached a patch to add a simple Perl script to do this.
I doubt it really works. For example, nbsp can be used for formatting
(that's the purpose of the character in the first place). Whenever a
developer decides to use or not to use nbsp, the "allowlist" needs to be
maintained. It's too annoying.
I think it's better to add the non-ASCII character check to the
committing checklist and let committers check for non-ASCII characters in
the patch. Non-ASCII characters are rarely used, so it would not become a
burden.
https://wiki.postgresql.org/wiki/Committing_checklist
Maybe we can add to the wiki page something like this?
git diff origin/master | grep -P '[^\x00-\x7f]'
During testing this script, I found "stylesheet-man.xsl" also has non-ascii
characters. I don't know these characters are really necessary though, since
I don't understand this file well.
They are U+201C (double turned comma quotation mark) and U+201D
(double comma quotation mark).
<l:template name="sect3" text="Section %n, “%t”, in the documentation"/>
I would like to know why they are necessary too.
Best regards,
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp
On Fri, 11 Oct 2024 12:16:50 +0900 (JST)
Tatsuo Ishii <ishii@postgresql.org> wrote:
We can check non-ASCII letters SGML/XML files by preparing "allowlist"
that contains lines which are allowed to have non-ascii characters,
although this list will need to be maintained when lines in it are modified.
I've attached a patch to add a simple Perl script to do this.
I doubt it really works. For example, nbsp can be used formatting
(that's the purpose of the character in the first place). Whenever a
developer decides to or not to use nbsp, "allowlist" needs to be
maintained. It's too annoying.
I suppose non-ASCII characters, including nbsp, are basically disallowed,
so the allowlist will not grow unless there is some special reason.
However, it is true that there is more or less a cost to maintaining the list,
so if people don't think this check is worth adding,
I will withdraw this proposal.
I think it's better to add the non-ASCII character checking to the
comitting check list and let committers check non-ASCII character in
the patch. Non-ASCII characters rarely used and it would not become a
burden.
https://wiki.postgresql.org/wiki/Committing_checklist
Maybe we can add to the wiki page something like this?
git diff origin/master | grep -P '[^\x00-\x7f]'
During testing this script, I found "stylesheet-man.xsl" also has non-ascii
characters. I don't know these characters are really necessary though, since
I don't understand this file well.
They are U+201C (double turned comma quotation mark) and U+201D
(double comma quotation mark).
<l:template name="sect3" text="Section %n, “%t”, in the documentation"/>
I would like to know why they are necessary too.
+1
Regards,
Yugo Nagata
--
Yugo NAGATA <nagata@sraoss.co.jp>
On Fri, Oct 11, 2024 at 12:36:53PM +0900, Yugo NAGATA wrote:
On Fri, 11 Oct 2024 12:16:50 +0900 (JST)
Tatsuo Ishii <ishii@postgresql.org> wrote:
We can check non-ASCII letters SGML/XML files by preparing "allowlist"
that contains lines which are allowed to have non-ascii characters,
although this list will need to be maintained when lines in it are modified.
I've attached a patch to add a simple Perl script to do this.
I doubt it really works. For example, nbsp can be used formatting
(that's the purpose of the character in the first place). Whenever a
developer decides to or not to use nbsp, "allowlist" needs to be
maintained. It's too annoying.
I suppose non-ascii characters including nbsp are basically disallowed,
so the allowlist will not increase unless there is some special reason.
However, it is true that there might be a cost for maintaining the list
more or less, so if people don't think it is worth adding this check,
I will withdraw this proposal.
I did some more research and we were able to clarify our behavior in
release.sgml:
We can only use Latin1 characters, not all UTF8 characters,
because rendering engines must support the referenced characters,
and they currently only support Latin1. In the SGML files we
encode non-ASCII Latin1 characters as HTML entities, e.g.,
Álvaro Herrera. Oddly, it is possible to add Latin1
characters as UTF8, but we currently prevent this via the
Makefile "check-non-ascii" check.
We used to use UTF8 characters in SGML files, but only UTF8 characters
that had Latin1 equivalents, and I think the toolchain would convert
UTF8 to Latin1 for us.
What I ended up doing was to change the UTF8 encoded characters to HTML
entities, and then modify the Makefile to check for any non-ASCII
characters. This will catch any other UTF8 characters.
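For example (taking one of the charset.sgml lines touched by the patch below), the conversion would look like this, with &ntilde; being the standard HTML entity name for that Latin1 character:

  raw UTF-8:     <entry><literal>'n' = 'ñ'</literal></entry>
  HTML entity:   <entry><literal>'n' = '&ntilde;'</literal></entry>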
I also added a dummy 'pdf' target that is the same as the postgres.pdf
dummy target; we already had an "html" target, so I thought a "pdf" one
made sense.
Patch attached. I plan to apply this in a few days to master.
--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com
When a patient asks the doctor, "Am I going to die?", he means
"Am I going to die soon?"
Attachments:
latin1.diff (text/x-diff; charset=utf-8)
diff --git a/doc/src/sgml/Makefile b/doc/src/sgml/Makefile
index 65ed32cd0ab..87d21783e52 100644
--- a/doc/src/sgml/Makefile
+++ b/doc/src/sgml/Makefile
@@ -143,7 +143,7 @@ postgres.txt: postgres.html
## Print
##
-postgres.pdf:
+postgres.pdf pdf:
$(error Invalid target; use postgres-A4.pdf or postgres-US.pdf as targets)
XSLTPROC_FO_FLAGS += --stringparam img.src.path '$(srcdir)/'
@@ -194,7 +194,7 @@ MAKEINFO = makeinfo
##
# Quick syntax check without style processing
-check: postgres.sgml $(ALLSGML) check-tabs check-nbsp
+check: postgres.sgml $(ALLSGML) check-tabs check-non-ascii
$(XMLLINT) $(XMLINCLUDE) --noout --valid $<
@@ -262,10 +262,9 @@ check-tabs:
# Non-breaking spaces are harmless, but it is best to avoid them in SGML files.
# Use perl command because non-GNU grep or sed could not have hex escape sequence.
-check-nbsp:
- @ ( $(PERL) -ne '/\xC2\xA0/ and print("$$ARGV:$$_"),$$n++; END {exit($$n>0)}' \
- $(wildcard $(srcdir)/*.sgml $(srcdir)/ref/*.sgml $(srcdir)/*.xsl) ) || \
- (echo "Non-breaking spaces appear in SGML/XML files" 1>&2; exit 1)
+check-non-ascii:
+ @( ! grep -P '[^\x00-\x7f]' $(wildcard $(srcdir)/*.sgml $(srcdir)/ref/*.sgml $(srcdir)/*.xsl) ) || \
+ (echo "Non-ASCII characters appear in SGML/XML files; use HTML entities for Latin1 characters" 1>&2; exit 1)
##
## Clean
diff --git a/doc/src/sgml/charset.sgml b/doc/src/sgml/charset.sgml
index 1ef5322b912..f5e115e8d6e 100644
--- a/doc/src/sgml/charset.sgml
+++ b/doc/src/sgml/charset.sgml
@@ -1225,7 +1225,7 @@ CREATE COLLATION ignore_accents (provider = icu, locale = 'und-u-ks-level1-kc-tr
<programlisting>
-- ignore differences in accents and case
CREATE COLLATION ignore_accent_case (provider = icu, deterministic = false, locale = 'und-u-ks-level1');
-SELECT 'Å' = 'A' COLLATE ignore_accent_case; -- true
+SELECT 'Å' = 'A' COLLATE ignore_accent_case; -- true
SELECT 'z' = 'Z' COLLATE ignore_accent_case; -- true
-- upper case letters sort before lower case.
@@ -1282,7 +1282,7 @@ SELECT 'w;x*y-z' = 'wxyz' COLLATE num_ignore_punct; -- true
<entry><literal>'ab' = U&'a\2063b'</literal></entry>
<entry><literal>'x-y' = 'x_y'</literal></entry>
<entry><literal>'g' = 'G'</literal></entry>
- <entry><literal>'n' = 'ñ'</literal></entry>
+ <entry><literal>'n' = 'ñ'</literal></entry>
<entry><literal>'y' = 'z'</literal></entry>
</row>
</thead>
@@ -1346,7 +1346,7 @@ SELECT 'w;x*y-z' = 'wxyz' COLLATE num_ignore_punct; -- true
<para>
At every level, even with full normalization off, basic normalization is
- performed. For example, <literal>'á'</literal> may be composed of the
+ performed. For example, <literal>'á'</literal> may be composed of the
code points <literal>U&'\0061\0301'</literal> or the single code
point <literal>U&'\00E1'</literal>, and those sequences will be
considered equal even at the <literal>identic</literal> level. To treat
@@ -1430,8 +1430,8 @@ SELECT 'x-y' = 'x_y' COLLATE level4; -- false
<entry><literal>false</literal></entry>
<entry>
Backwards comparison for the level 2 differences. For example,
- locale <literal>und-u-kb</literal> sorts <literal>'àe'</literal>
- before <literal>'aé'</literal>.
+ locale <literal>und-u-kb</literal> sorts <literal>'àe'</literal>
+ before <literal>'aé'</literal>.
</entry>
</row>
diff --git a/doc/src/sgml/images/genetic-algorithm.svg b/doc/src/sgml/images/genetic-algorithm.svg
index fb9fdd1ba78..2ce5f1b2712 100644
--- a/doc/src/sgml/images/genetic-algorithm.svg
+++ b/doc/src/sgml/images/genetic-algorithm.svg
@@ -72,7 +72,7 @@
<title>a4->end</title>
<path fill="none" stroke="#000000" d="M259,-312.5834C259,-312.5834 259,-54.659 259,-54.659"/>
<polygon fill="#000000" stroke="#000000" points="262.5001,-54.659 259,-44.659 255.5001,-54.6591 262.5001,-54.659"/>
-<text text-anchor="middle" x="246" y="-186.6212" font-family="sans-serif" font-size="10.00" fill="#000000">true  </text>
+<text text-anchor="middle" x="246" y="-186.6212" font-family="sans-serif" font-size="10.00" fill="#000000">true</text>
</g>
<!-- a5 -->
<g id="node7" class="node">
@@ -85,7 +85,7 @@
<title>a4->a5</title>
<path fill="none" stroke="#000000" d="M144,-298.269C144,-298.269 144,-286.5248 144,-286.5248"/>
<polygon fill="#000000" stroke="#000000" points="147.5001,-286.5248 144,-276.5248 140.5001,-286.5249 147.5001,-286.5248"/>
-<text text-anchor="middle" x="127" y="-284.3969" font-family="sans-serif" font-size="10.00" fill="#000000">false   </text>
+<text text-anchor="middle" x="127" y="-284.3969" font-family="sans-serif" font-size="10.00" fill="#000000">false</text>
</g>
<!-- a6 -->
<g id="node8" class="node">
diff --git a/doc/src/sgml/release.sgml b/doc/src/sgml/release.sgml
index 8433690dead..65c86f54c0e 100644
--- a/doc/src/sgml/release.sgml
+++ b/doc/src/sgml/release.sgml
@@ -26,13 +26,15 @@ non-ASCII characters find using grep -P '[\x80-\xFF]' or
http://www.zipcon.net/~swhite/docs/computers/browsers/entities_page.html
https://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references
- We cannot use UTF8 because rendering engines have to
- support the referenced characters.
-
- Do not use numeric _UTF_ numeric character escapes (&#nnn;),
- we can only use Latin1.
-
- Example: Alvaro Herrera is Álvaro Herrera
+ We can only use Latin1 characters, not all UTF8 characters,
+ because rendering engines must support the referenced characters,
+ and they currently only support Latin1. In the SGML files we
+ encode non-ASCII Latin1 characters as HTML entities, e.g.,
+ Álvaro Herrera. Oddly, it is possible to add Latin1
+ characters as UTF8, but we we currently prevent this via the
+ Makefile "check-non-ascii" check.
+
+ Do not use numeric _UTF_ numeric character escapes (&#nnn;).
wrap long lines
diff --git a/doc/src/sgml/stylesheet-man.xsl b/doc/src/sgml/stylesheet-man.xsl
index fcb485c2931..2e2564da683 100644
--- a/doc/src/sgml/stylesheet-man.xsl
+++ b/doc/src/sgml/stylesheet-man.xsl
@@ -213,12 +213,12 @@
<!-- Slight rephrasing to indicate that missing sections are found
in the documentation. -->
<l:context name="xref-number-and-title">
- <l:template name="chapter" text="Chapter %n, %t, in the documentation"/>
- <l:template name="sect1" text="Section %n, â%tâ, in the documentation"/>
- <l:template name="sect2" text="Section %n, â%tâ, in the documentation"/>
- <l:template name="sect3" text="Section %n, â%tâ, in the documentation"/>
- <l:template name="sect4" text="Section %n, â%tâ, in the documentation"/>
- <l:template name="sect5" text="Section %n, â%tâ, in the documentation"/>
+ <l:template name="chapter" text="Chapter %n, "%t", in the documentation"/>
+ <l:template name="sect1" text="Section %n, "%t", in the documentation"/>
+ <l:template name="sect2" text="Section %n, "%t", in the documentation"/>
+ <l:template name="sect3" text="Section %n, "%t", in the documentation"/>
+ <l:template name="sect4" text="Section %n, "%t", in the documentation"/>
+ <l:template name="sect5" text="Section %n, "%t", in the documentation"/>
</l:context>
</l:l10n>
</l:i18n>
On Mon, Oct 14, 2024 at 03:05:35PM -0400, Bruce Momjian wrote:
I did some more research and we able to clarify our behavior in
release.sgml:
I have specified some more details in my patched version:
We can only use Latin1 characters, not all UTF8 characters,
because some rendering engines do not support non-Latin1 UTF8
characters. Specifically, the HTML rendering engine can display
all UTF8 characters, but the PDF rendering engine can only display
Latin1 characters. In PDF files, non-Latin1 UTF8 characters are
displayed as "###".
In the SGML files we encode non-ASCII Latin1 characters as HTML
entities, e.g., Álvaro. Oddly, it is possible to safely
represent Latin1 characters in SGML files as UTF8 for HTML and
PDF output, but we currently disallow this via the Makefile
"check-non-ascii" rule.
--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com
When a patient asks the doctor, "Am I going to die?", he means
"Am I going to die soon?"
Hi Bruce,
On Mon, 14 Oct 2024 16:31:11 -0400
Bruce Momjian <bruce@momjian.us> wrote:
On Mon, Oct 14, 2024 at 03:05:35PM -0400, Bruce Momjian wrote:
I did some more research and we able to clarify our behavior in
release.sgml:
I have specified some more details in my patched version:
We can only use Latin1 characters, not all UTF8 characters,
because some rendering engines do not support non-Latin1 UTF8
characters. Specifically, the HTML rendering engine can display
all UTF8 characters, but the PDF rendering engine can only display
Latin1 characters. In PDF files, non-Latin1 UTF8 characters are
displayed as "###".In the SGML files we encode non-ASCII Latin1 characters as HTML
entities, e.g., Álvaro. Oddly, it is possible to safely
represent Latin1 characters in SGML files as UTF8 for HTML and
PDF output, but we we currently disallow this via the Makefile
"check-non-ascii" rule.
I agree with encoding non-Latin1 characters and disallowing non-ASCII
characters totally.
I found your patch includes fixes in *.svg files, so how about also checking
them in check-non-ascii? Also, I think it is better to use perl instead
of grep because non-GNU grep doesn't support hex escape sequences. I've attached
an updated patch for the Makefile. The changes to release.sgml above are not applied
yet, though.
Regards,
Yugo Nagata
--
Yugo NAGATA <nagata@sraoss.co.jp>
Attachments:
v2_latin1.diff (text/x-diff)
diff --git a/doc/src/sgml/Makefile b/doc/src/sgml/Makefile
index 65ed32cd0a..3d992ebd84 100644
--- a/doc/src/sgml/Makefile
+++ b/doc/src/sgml/Makefile
@@ -143,7 +143,7 @@ postgres.txt: postgres.html
## Print
##
-postgres.pdf:
+postgres.pdf pdf:
$(error Invalid target; use postgres-A4.pdf or postgres-US.pdf as targets)
XSLTPROC_FO_FLAGS += --stringparam img.src.path '$(srcdir)/'
@@ -194,7 +194,7 @@ MAKEINFO = makeinfo
##
# Quick syntax check without style processing
-check: postgres.sgml $(ALLSGML) check-tabs check-nbsp
+check: postgres.sgml $(ALLSGML) check-tabs check-non-ascii
$(XMLLINT) $(XMLINCLUDE) --noout --valid $<
@@ -257,15 +257,16 @@ endif # sqlmansectnum != 7
# tabs are harmless, but it is best to avoid them in SGML files
check-tabs:
- @( ! grep ' ' $(wildcard $(srcdir)/*.sgml $(srcdir)/ref/*.sgml $(srcdir)/*.xsl) ) || \
+ @( ! grep ' ' $(wildcard $(srcdir)/*.sgml $(srcdir)/ref/*.sgml $(srcdir)/*.xsl $(srcdir)/images/*.svg) ) || \
(echo "Tabs appear in SGML/XML files" 1>&2; exit 1)
-# Non-breaking spaces are harmless, but it is best to avoid them in SGML files.
-# Use perl command because non-GNU grep or sed could not have hex escape sequence.
-check-nbsp:
- @ ( $(PERL) -ne '/\xC2\xA0/ and print("$$ARGV:$$_"),$$n++; END {exit($$n>0)}' \
- $(wildcard $(srcdir)/*.sgml $(srcdir)/ref/*.sgml $(srcdir)/*.xsl) ) || \
- (echo "Non-breaking spaces appear in SGML/XML files" 1>&2; exit 1)
+# Disallow non-ASCII characters because some rendering engines do not
+# support non-Latin1 UTF8 characters. Use perl command because non-GNU grep
+# or sed could not have hex escape sequence.
+check-non-ascii:
+ @ ( $(PERL) -ne '/[^\x00-\x7f]/ and print("$$ARGV:$$_"),$$n++; END {exit($$n>0)}' \
+ $(wildcard $(srcdir)/*.sgml $(srcdir)/ref/*.sgml $(srcdir)/*.xsl $(srcdir)/images/*.svg) ) || \
+ (echo "Non-ASCII characters appear in SGML/XML files; use HTML entities for Latin1 characters" 1>&2; exit 1)
##
## Clean
diff --git a/doc/src/sgml/charset.sgml b/doc/src/sgml/charset.sgml
index 1ef5322b91..f5e115e8d6 100644
--- a/doc/src/sgml/charset.sgml
+++ b/doc/src/sgml/charset.sgml
@@ -1225,7 +1225,7 @@ CREATE COLLATION ignore_accents (provider = icu, locale = 'und-u-ks-level1-kc-tr
<programlisting>
-- ignore differences in accents and case
CREATE COLLATION ignore_accent_case (provider = icu, deterministic = false, locale = 'und-u-ks-level1');
-SELECT 'Å' = 'A' COLLATE ignore_accent_case; -- true
+SELECT 'Å' = 'A' COLLATE ignore_accent_case; -- true
SELECT 'z' = 'Z' COLLATE ignore_accent_case; -- true
-- upper case letters sort before lower case.
@@ -1282,7 +1282,7 @@ SELECT 'w;x*y-z' = 'wxyz' COLLATE num_ignore_punct; -- true
<entry><literal>'ab' = U&'a\2063b'</literal></entry>
<entry><literal>'x-y' = 'x_y'</literal></entry>
<entry><literal>'g' = 'G'</literal></entry>
- <entry><literal>'n' = 'ñ'</literal></entry>
+ <entry><literal>'n' = 'ñ'</literal></entry>
<entry><literal>'y' = 'z'</literal></entry>
</row>
</thead>
@@ -1346,7 +1346,7 @@ SELECT 'w;x*y-z' = 'wxyz' COLLATE num_ignore_punct; -- true
<para>
At every level, even with full normalization off, basic normalization is
- performed. For example, <literal>'á'</literal> may be composed of the
+ performed. For example, <literal>'á'</literal> may be composed of the
code points <literal>U&'\0061\0301'</literal> or the single code
point <literal>U&'\00E1'</literal>, and those sequences will be
considered equal even at the <literal>identic</literal> level. To treat
@@ -1430,8 +1430,8 @@ SELECT 'x-y' = 'x_y' COLLATE level4; -- false
<entry><literal>false</literal></entry>
<entry>
Backwards comparison for the level 2 differences. For example,
- locale <literal>und-u-kb</literal> sorts <literal>'àe'</literal>
- before <literal>'aé'</literal>.
+ locale <literal>und-u-kb</literal> sorts <literal>'àe'</literal>
+ before <literal>'aé'</literal>.
</entry>
</row>
diff --git a/doc/src/sgml/images/genetic-algorithm.svg b/doc/src/sgml/images/genetic-algorithm.svg
index fb9fdd1ba7..2ce5f1b271 100644
--- a/doc/src/sgml/images/genetic-algorithm.svg
+++ b/doc/src/sgml/images/genetic-algorithm.svg
@@ -72,7 +72,7 @@
<title>a4->end</title>
<path fill="none" stroke="#000000" d="M259,-312.5834C259,-312.5834 259,-54.659 259,-54.659"/>
<polygon fill="#000000" stroke="#000000" points="262.5001,-54.659 259,-44.659 255.5001,-54.6591 262.5001,-54.659"/>
-<text text-anchor="middle" x="246" y="-186.6212" font-family="sans-serif" font-size="10.00" fill="#000000">true </text>
+<text text-anchor="middle" x="246" y="-186.6212" font-family="sans-serif" font-size="10.00" fill="#000000">true</text>
</g>
<!-- a5 -->
<g id="node7" class="node">
@@ -85,7 +85,7 @@
<title>a4->a5</title>
<path fill="none" stroke="#000000" d="M144,-298.269C144,-298.269 144,-286.5248 144,-286.5248"/>
<polygon fill="#000000" stroke="#000000" points="147.5001,-286.5248 144,-276.5248 140.5001,-286.5249 147.5001,-286.5248"/>
-<text text-anchor="middle" x="127" y="-284.3969" font-family="sans-serif" font-size="10.00" fill="#000000">false </text>
+<text text-anchor="middle" x="127" y="-284.3969" font-family="sans-serif" font-size="10.00" fill="#000000">false</text>
</g>
<!-- a6 -->
<g id="node8" class="node">
diff --git a/doc/src/sgml/release.sgml b/doc/src/sgml/release.sgml
index 8433690dea..65c86f54c0 100644
--- a/doc/src/sgml/release.sgml
+++ b/doc/src/sgml/release.sgml
@@ -26,13 +26,15 @@ non-ASCII characters find using grep -P '[\x80-\xFF]' or
http://www.zipcon.net/~swhite/docs/computers/browsers/entities_page.html
https://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references
- We cannot use UTF8 because rendering engines have to
- support the referenced characters.
-
- Do not use numeric _UTF_ numeric character escapes (&#nnn;),
- we can only use Latin1.
-
- Example: Alvaro Herrera is Álvaro Herrera
+ We can only use Latin1 characters, not all UTF8 characters,
+ because rendering engines must support the referenced characters,
+ and they currently only support Latin1. In the SGML files we
+ encode non-ASCII Latin1 characters as HTML entities, e.g.,
+ Álvaro Herrera. Oddly, it is possible to add Latin1
+ characters as UTF8, but we we currently prevent this via the
+ Makefile "check-non-ascii" check.
+
+ Do not use numeric _UTF_ numeric character escapes (&#nnn;).
wrap long lines
diff --git a/doc/src/sgml/stylesheet-man.xsl b/doc/src/sgml/stylesheet-man.xsl
index fcb485c293..2e2564da68 100644
--- a/doc/src/sgml/stylesheet-man.xsl
+++ b/doc/src/sgml/stylesheet-man.xsl
@@ -213,12 +213,12 @@
<!-- Slight rephrasing to indicate that missing sections are found
in the documentation. -->
<l:context name="xref-number-and-title">
- <l:template name="chapter" text="Chapter %n, %t, in the documentation"/>
- <l:template name="sect1" text="Section %n, “%t”, in the documentation"/>
- <l:template name="sect2" text="Section %n, “%t”, in the documentation"/>
- <l:template name="sect3" text="Section %n, “%t”, in the documentation"/>
- <l:template name="sect4" text="Section %n, “%t”, in the documentation"/>
- <l:template name="sect5" text="Section %n, “%t”, in the documentation"/>
+ <l:template name="chapter" text="Chapter %n, "%t", in the documentation"/>
+ <l:template name="sect1" text="Section %n, "%t", in the documentation"/>
+ <l:template name="sect2" text="Section %n, "%t", in the documentation"/>
+ <l:template name="sect3" text="Section %n, "%t", in the documentation"/>
+ <l:template name="sect4" text="Section %n, "%t", in the documentation"/>
+ <l:template name="sect5" text="Section %n, "%t", in the documentation"/>
</l:context>
</l:l10n>
</l:i18n>
On Tue, Oct 15, 2024 at 10:10:36AM +0900, Yugo NAGATA wrote:
Hi Bruce,
On Mon, 14 Oct 2024 16:31:11 -0400
Bruce Momjian <bruce@momjian.us> wrote:On Mon, Oct 14, 2024 at 03:05:35PM -0400, Bruce Momjian wrote:
I did some more research and we were able to clarify our behavior in
release.sgml:I have specified some more details in my patched version:
We can only use Latin1 characters, not all UTF8 characters,
because some rendering engines do not support non-Latin1 UTF8
characters. Specifically, the HTML rendering engine can display
all UTF8 characters, but the PDF rendering engine can only display
Latin1 characters. In PDF files, non-Latin1 UTF8 characters are
displayed as "###".In the SGML files we encode non-ASCII Latin1 characters as HTML
entities, e.g., Álvaro. Oddly, it is possible to safely
represent Latin1 characters in SGML files as UTF8 for HTML and
PDF output, but we currently disallow this via the Makefile
"check-non-ascii" rule.I agree with encoding non-Latin1 characters and disallowing non-ASCII
characters totally.I found your patch includes fixes in *.svg files, so how about checking
also them by check-non-ascii? Also, I think it is better to use perl instead
of grep because non-GNU grep doesn't support hex escape sequences. I've attached
a updated patch for Makefile. The changes in release.sgml above is not applied
yet, though.
Yes, good idea on using Perl and checking svg files --- I have used your
Makefile rule.
Attached is an updated patch. I realized that the new rules apply to
all SGML files, not just the release notes, so I have created
README.non-ASCII and moved the description there.
--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com
When a patient asks the doctor, "Am I going to die?", he means
"Am I going to die soon?"
Attachments:
latin1.difftext/x-diff; charset=utf-8Download
diff --git a/doc/src/sgml/Makefile b/doc/src/sgml/Makefile
index 65ed32cd0ab..a3ff1168729 100644
--- a/doc/src/sgml/Makefile
+++ b/doc/src/sgml/Makefile
@@ -143,11 +143,12 @@ postgres.txt: postgres.html
## Print
##
-postgres.pdf:
+postgres.pdf pdf:
$(error Invalid target; use postgres-A4.pdf or postgres-US.pdf as targets)
XSLTPROC_FO_FLAGS += --stringparam img.src.path '$(srcdir)/'
+# XSL Formatting Objects (FO), https://en.wikipedia.org/wiki/XSL_Formatting_Objects
%-A4.fo: stylesheet-fo.xsl %-full.xml
$(XSLTPROC) $(XMLINCLUDE) $(XSLTPROCFLAGS) $(XSLTPROC_FO_FLAGS) --stringparam paper.type A4 -o $@ $^
@@ -194,7 +195,7 @@ MAKEINFO = makeinfo
##
# Quick syntax check without style processing
-check: postgres.sgml $(ALLSGML) check-tabs check-nbsp
+check: postgres.sgml $(ALLSGML) check-tabs check-non-ascii
$(XMLLINT) $(XMLINCLUDE) --noout --valid $<
@@ -262,10 +263,10 @@ check-tabs:
# Non-breaking spaces are harmless, but it is best to avoid them in SGML files.
# Use perl command because non-GNU grep or sed could not have hex escape sequence.
-check-nbsp:
- @ ( $(PERL) -ne '/\xC2\xA0/ and print("$$ARGV:$$_"),$$n++; END {exit($$n>0)}' \
- $(wildcard $(srcdir)/*.sgml $(srcdir)/ref/*.sgml $(srcdir)/*.xsl) ) || \
- (echo "Non-breaking spaces appear in SGML/XML files" 1>&2; exit 1)
+check-non-ascii:
+ @ ( $(PERL) -ne '/[^\x00-\x7f]/ and print("$$ARGV: $$_"), $$n++; END { exit($$n > 0) }' \
+ $(wildcard $(srcdir)/*.sgml $(srcdir)/ref/*.sgml $(srcdir)/*.xsl $(srcdir)/images/*.svg) ) || \
+ (echo "Non-ASCII characters appear in SGML/XML files; use HTML entities for Latin1 characters" 1>&2; exit 1)
##
## Clean
diff --git a/doc/src/sgml/README.non-ASCII b/doc/src/sgml/README.non-ASCII
new file mode 100644
index 00000000000..a7300bcb2d3
--- /dev/null
+++ b/doc/src/sgml/README.non-ASCII
@@ -0,0 +1,38 @@
+<!-- doc/src/sgml/README.non-ASCII -->
+
+Representation of non-ASCII characters
+--------------------------------------
+
+Find non-ASCII characters using:
+
+ grep --color='auto' -P "[\x80-\xFF]"
+
+Convert to HTML4 named entity (&) escapes
+-----------------------------------------
+
+We support several output formats:
+
+* html (supports all Unicode characters)
+* man (supports all Unicode characters)
+* pdf (supports only Latin-1 characters)
+* info
+
+While some output formatting tools support all Unicode characters,
+others only support Latin-1 characters. Specifically, the PDF rendering
+engine can only display Latin-1 characters; non-Latin-1 Unicode
+characters are displayed as "###".
+
+Therefore, in the SGML files, we only use Latin-1 characters. We encode
+these characters as HTML entities, e.g., Álvaro. Oddly, in SGML
+files it is possible to safely represent Latin-1 characters in UTF8
+encoding for all output formats, but we currently disallow this via
+the Makefile rule "check-non-ascii".
+
+Do not use UTF numeric character escapes (&#nnn;).
+
+HTML entities
+ official: http://www.w3.org/TR/html4/sgml/entities.html
+ one page: http://www.zipcon.net/~swhite/docs/computers/browsers/entities_page.html
+ other lists: http://www.zipcon.net/~swhite/docs/computers/browsers/entities.html
+ http://www.zipcon.net/~swhite/docs/computers/browsers/entities_page.html
+ https://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references
diff --git a/doc/src/sgml/charset.sgml b/doc/src/sgml/charset.sgml
index 1ef5322b912..f5e115e8d6e 100644
--- a/doc/src/sgml/charset.sgml
+++ b/doc/src/sgml/charset.sgml
@@ -1225,7 +1225,7 @@ CREATE COLLATION ignore_accents (provider = icu, locale = 'und-u-ks-level1-kc-tr
<programlisting>
-- ignore differences in accents and case
CREATE COLLATION ignore_accent_case (provider = icu, deterministic = false, locale = 'und-u-ks-level1');
-SELECT 'Ã' = 'A' COLLATE ignore_accent_case; -- true
+SELECT 'Å' = 'A' COLLATE ignore_accent_case; -- true
SELECT 'z' = 'Z' COLLATE ignore_accent_case; -- true
-- upper case letters sort before lower case.
@@ -1282,7 +1282,7 @@ SELECT 'w;x*y-z' = 'wxyz' COLLATE num_ignore_punct; -- true
<entry><literal>'ab' = U&'a\2063b'</literal></entry>
<entry><literal>'x-y' = 'x_y'</literal></entry>
<entry><literal>'g' = 'G'</literal></entry>
- <entry><literal>'n' = 'ñ'</literal></entry>
+ <entry><literal>'n' = 'ñ'</literal></entry>
<entry><literal>'y' = 'z'</literal></entry>
</row>
</thead>
@@ -1346,7 +1346,7 @@ SELECT 'w;x*y-z' = 'wxyz' COLLATE num_ignore_punct; -- true
<para>
At every level, even with full normalization off, basic normalization is
- performed. For example, <literal>'á'</literal> may be composed of the
+ performed. For example, <literal>'á'</literal> may be composed of the
code points <literal>U&'\0061\0301'</literal> or the single code
point <literal>U&'\00E1'</literal>, and those sequences will be
considered equal even at the <literal>identic</literal> level. To treat
@@ -1430,8 +1430,8 @@ SELECT 'x-y' = 'x_y' COLLATE level4; -- false
<entry><literal>false</literal></entry>
<entry>
Backwards comparison for the level 2 differences. For example,
- locale <literal>und-u-kb</literal> sorts <literal>'Ã e'</literal>
- before <literal>'aé'</literal>.
+ locale <literal>und-u-kb</literal> sorts <literal>'àe'</literal>
+ before <literal>'aé'</literal>.
</entry>
</row>
diff --git a/doc/src/sgml/images/genetic-algorithm.svg b/doc/src/sgml/images/genetic-algorithm.svg
index fb9fdd1ba78..2ce5f1b2712 100644
--- a/doc/src/sgml/images/genetic-algorithm.svg
+++ b/doc/src/sgml/images/genetic-algorithm.svg
@@ -72,7 +72,7 @@
<title>a4->end</title>
<path fill="none" stroke="#000000" d="M259,-312.5834C259,-312.5834 259,-54.659 259,-54.659"/>
<polygon fill="#000000" stroke="#000000" points="262.5001,-54.659 259,-44.659 255.5001,-54.6591 262.5001,-54.659"/>
-<text text-anchor="middle" x="246" y="-186.6212" font-family="sans-serif" font-size="10.00" fill="#000000">true  </text>
+<text text-anchor="middle" x="246" y="-186.6212" font-family="sans-serif" font-size="10.00" fill="#000000">true</text>
</g>
<!-- a5 -->
<g id="node7" class="node">
@@ -85,7 +85,7 @@
<title>a4->a5</title>
<path fill="none" stroke="#000000" d="M144,-298.269C144,-298.269 144,-286.5248 144,-286.5248"/>
<polygon fill="#000000" stroke="#000000" points="147.5001,-286.5248 144,-276.5248 140.5001,-286.5249 147.5001,-286.5248"/>
-<text text-anchor="middle" x="127" y="-284.3969" font-family="sans-serif" font-size="10.00" fill="#000000">false   </text>
+<text text-anchor="middle" x="127" y="-284.3969" font-family="sans-serif" font-size="10.00" fill="#000000">false</text>
</g>
<!-- a6 -->
<g id="node8" class="node">
diff --git a/doc/src/sgml/release.sgml b/doc/src/sgml/release.sgml
index 8433690dead..cee577ff8d3 100644
--- a/doc/src/sgml/release.sgml
+++ b/doc/src/sgml/release.sgml
@@ -16,24 +16,6 @@ pg_[A-Za-z0-9_]+ <application>, <structname>
\<[a-z]+_[a-z_]+\> <varname>, <structfield>
<systemitem class="osname">
-non-ASCII characters find using grep -P '[\x80-\xFF]' or
- (remove 'X') grep -X-color='auto' -P -n "[\x80-\xFF]"
- convert to HTML4 named entity (&) escapes
-
- official: http://www.w3.org/TR/html4/sgml/entities.html
- one page: http://www.zipcon.net/~swhite/docs/computers/browsers/entities_page.html
- other lists: http://www.zipcon.net/~swhite/docs/computers/browsers/entities.html
- http://www.zipcon.net/~swhite/docs/computers/browsers/entities_page.html
- https://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references
-
- We cannot use UTF8 because rendering engines have to
- support the referenced characters.
-
- Do not use numeric _UTF_ numeric character escapes (&#nnn;),
- we can only use Latin1.
-
- Example: Alvaro Herrera is Álvaro Herrera
-
wrap long lines
For new features, add links to the documentation sections.
diff --git a/doc/src/sgml/stylesheet-man.xsl b/doc/src/sgml/stylesheet-man.xsl
index fcb485c2931..2e2564da683 100644
--- a/doc/src/sgml/stylesheet-man.xsl
+++ b/doc/src/sgml/stylesheet-man.xsl
@@ -213,12 +213,12 @@
<!-- Slight rephrasing to indicate that missing sections are found
in the documentation. -->
<l:context name="xref-number-and-title">
- <l:template name="chapter" text="Chapter %n, %t, in the documentation"/>
- <l:template name="sect1" text="Section %n, â%tâ, in the documentation"/>
- <l:template name="sect2" text="Section %n, â%tâ, in the documentation"/>
- <l:template name="sect3" text="Section %n, â%tâ, in the documentation"/>
- <l:template name="sect4" text="Section %n, â%tâ, in the documentation"/>
- <l:template name="sect5" text="Section %n, â%tâ, in the documentation"/>
+ <l:template name="chapter" text="Chapter %n, "%t", in the documentation"/>
+ <l:template name="sect1" text="Section %n, "%t", in the documentation"/>
+ <l:template name="sect2" text="Section %n, "%t", in the documentation"/>
+ <l:template name="sect3" text="Section %n, "%t", in the documentation"/>
+ <l:template name="sect4" text="Section %n, "%t", in the documentation"/>
+ <l:template name="sect5" text="Section %n, "%t", in the documentation"/>
</l:context>
</l:l10n>
</l:i18n>
On 15.10.24 18:54, Bruce Momjian wrote:
I agree with encoding non-Latin1 characters and disallowing non-ASCII
characters totally.I found your patch includes fixes in *.svg files, so how about checking
also them by check-non-ascii? Also, I think it is better to use perl instead
of grep because non-GNU grep doesn't support hex escape sequences. I've attached
a updated patch for Makefile. The changes in release.sgml above is not applied
yet, though.Yes, good idea on using Perl and checking svg files --- I have used your
Makefile rule.Attached is an updated patch. I realized that the new rules apply to
all SGML files, not just the release notes, so I have created
README.non-ASCII and moved the description there.
I don't understand the point of this. Maybe it's okay to try to detect
certain "hidden" whitespace characters, like in the case that started
this thread. But I don't see the value in prohibiting all non-ASCII
characters, as is being proposed here.
On Tue, Oct 15, 2024 at 10:34:16PM +0200, Peter Eisentraut wrote:
On 15.10.24 18:54, Bruce Momjian wrote:
I agree with encoding non-Latin1 characters and disallowing non-ASCII
characters totally.I found your patch includes fixes in *.svg files, so how about checking
also them by check-non-ascii? Also, I think it is better to use perl instead
of grep because non-GNU grep doesn't support hex escape sequences. I've attached
a updated patch for Makefile. The changes in release.sgml above is not applied
yet, though.Yes, good idea on using Perl and checking svg files --- I have used your
Makefile rule.Attached is an updated patch. I realized that the new rules apply to
all SGML files, not just the release notes, so I have created
README.non-ASCII and moved the description there.I don't understand the point of this. Maybe it's okay to try to detect
certain "hidden" whitespace characters, like in the case that started this
thread. But I don't see the value in prohibiting all non-ASCII characters,
as is being proposed here.
Well, we can only use Latin-1, so the idea is that we will be explicit
about specifying Latin-1 only as HTML entities, rather than letting
non-Latin-1 creep in as UTF8. We can exclude certain UTF8 or SGML files
if desired.
--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com
When a patient asks the doctor, "Am I going to die?", he means
"Am I going to die soon?"
On 15.10.24 22:37, Bruce Momjian wrote:
I don't understand the point of this. Maybe it's okay to try to detect
certain "hidden" whitespace characters, like in the case that started this
thread. But I don't see the value in prohibiting all non-ASCII characters,
as is being proposed here.Well, we can only use Latin-1, so the idea is that we will be explicit
about specifying Latin-1 only as HTML entities, rather than letting
non-Latin-1 creep in as UTF8.
But your patch prohibits even otherwise allowed Latin-1 characters.
I don't see why we need to enforce this at this level. Whatever
downstream toolchain has requirements about which characters are allowed
will complain if it encounters a character it doesn't like.
Bruce Momjian <bruce@momjian.us> writes:
Well, we can only use Latin-1, so the idea is that we will be explicit
about specifying Latin-1 only as HTML entities, rather than letting
non-Latin-1 creep in as UTF8. We can exclude certain UTF8 or SGML files
if desired.
That policy would cause substantial problems with contributor names
in the release notes. I agree with Peter that we don't need this.
Catching otherwise-invisible characters seems sufficient.
regards, tom lane
On Tue, Oct 15, 2024 at 11:08:15PM +0200, Peter Eisentraut wrote:
On 15.10.24 22:37, Bruce Momjian wrote:
I don't understand the point of this. Maybe it's okay to try to detect
certain "hidden" whitespace characters, like in the case that started this
thread. But I don't see the value in prohibiting all non-ASCII characters,
as is being proposed here.Well, we can only use Latin-1, so the idea is that we will be explicit
about specifying Latin-1 only as HTML entities, rather than letting
non-Latin-1 creep in as UTF8.But your patch prohibits even otherwise allowed Latin-1 characters.
Well, yes, they are Latin-1 characters encoded as UTF-8.
I don't see why we need to enforce this at this level. Whatever downstream
toolchain has requirements about which characters are allowed will complain
if it encounters a character it doesn't like.
Uh, the PDF build does not complain if you pass it non-Latin-1 UTF8
characters. To test this, I added some Russian characters (non-Latin-1)
to release.sgml:
(⟨б⟩, ⟨в⟩, ⟨г⟩, ⟨д⟩, ⟨ж⟩, ⟨з⟩, ⟨к⟩, ⟨л⟩, ⟨м⟩, ⟨н⟩, ⟨п⟩, ⟨р⟩, ⟨с⟩, ⟨т⟩,
⟨ф⟩, ⟨х⟩, ⟨ц⟩, ⟨ч⟩, ⟨ш⟩, ⟨щ⟩), ten vowels (⟨а⟩, ⟨е⟩, ⟨ё⟩, ⟨и⟩, ⟨о⟩, ⟨у⟩,
⟨ы⟩, ⟨э⟩, ⟨ю⟩, ⟨я⟩), a semivowel / consonant (⟨й⟩), and two modifier
letters or "signs" (⟨ъ⟩, ⟨ь⟩)
and I ran 'make postgres-US.pdf', and then removed the Russian
characters and ran the same command again. The output, including stderr,
was identical. The PDFs, of course, were not, with the Russian
characters showing as "####". Makefile output attached.
So, in summary, the PDF build is allowed to complain, but it does not.
Even if it did complain, odds are most people are only going to test an
HTML build of their patch, if at all, rather than a PDF build.
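For reference, a rough sketch of that test, run from doc/src/sgml (the
log file names are only illustrative):
    make postgres-US.pdf > build-with-utf8.log 2>&1      # test characters present in release.sgml
    git checkout -- release.sgml                         # drop the uncommitted test characters
    make postgres-US.pdf > build-without-utf8.log 2>&1
    diff build-with-utf8.log build-without-utf8.log      # identical here, i.e. no warning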
--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com
When a patient asks the doctor, "Am I going to die?", he means
"Am I going to die soon?"
Attachments:
On Tue, Oct 15, 2024 at 05:27:49PM -0400, Tom Lane wrote:
Bruce Momjian <bruce@momjian.us> writes:
Well, we can only use Latin-1, so the idea is that we will be explicit
about specifying Latin-1 only as HTML entities, rather than letting
non-Latin-1 creep in as UTF8. We can exclude certain UTF8 or SGML files
if desired.That policy would cause substantial problems with contributor names
in the release notes. I agree with Peter that we don't need this.
Catching otherwise-invisible characters seems sufficient.
Uh, why can't we use HTML entities going forward? Is that harder? Can
we just exclude the release notes from this check?
--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com
When a patient asks the doctor, "Am I going to die?", he means
"Am I going to die soon?"
Bruce Momjian <bruce@momjian.us> writes:
On Tue, Oct 15, 2024 at 05:27:49PM -0400, Tom Lane wrote:
That policy would cause substantial problems with contributor names
in the release notes. I agree with Peter that we don't need this.
Catching otherwise-invisible characters seems sufficient.
Uh, why can't we use HTML entities going forward? Is that harder?
Yes: it requires looking up the entities. The mail you are probably
consulting to make a release note or commit message is most likely
just going to contain the person's name as normally spelled.
Plus (as you pointed out earlier today) there aren't HTML entities for
all characters.
Can we just exclude the release notes from this check?
What is the point of a check we can only enforce against part of the
documentation?
regards, tom lane
On Tue, Oct 15, 2024 at 05:59:05PM -0400, Tom Lane wrote:
Bruce Momjian <bruce@momjian.us> writes:
On Tue, Oct 15, 2024 at 05:27:49PM -0400, Tom Lane wrote:
That policy would cause substantial problems with contributor names
in the release notes. I agree with Peter that we don't need this.
Catching otherwise-invisible characters seems sufficient.Uh, why can't we use HTML entities going forward? Is that harder?
Yes: it requires looking up the entities. The mail you are probably
consulting to make a release note or commit message is most likely
just going to contain the person's name as normally spelled.Plus (as you pointed out earlier today) there aren't HTML entities for
all characters.Can we just exclude the release notes from this check?
What is the point of a check we can only enforce against part of the
documentation?
If people are uncomfortable with a hard requirement, we can convert the
Latin-1 we have now to HTML entities, and then just give people a
command in README.non-ASCII to check for UTF8 if they wish. Patch
attached.
--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com
When a patient asks the doctor, "Am I going to die?", he means
"Am I going to die soon?"
Attachments:
latin1.difftext/x-diff; charset=utf-8Download
diff --git a/doc/src/sgml/Makefile b/doc/src/sgml/Makefile
index 65ed32cd0ab..ad5796819b9 100644
--- a/doc/src/sgml/Makefile
+++ b/doc/src/sgml/Makefile
@@ -143,11 +143,12 @@ postgres.txt: postgres.html
## Print
##
-postgres.pdf:
+postgres.pdf pdf:
$(error Invalid target; use postgres-A4.pdf or postgres-US.pdf as targets)
XSLTPROC_FO_FLAGS += --stringparam img.src.path '$(srcdir)/'
+# XSL Formatting Objects (FO), https://en.wikipedia.org/wiki/XSL_Formatting_Objects
%-A4.fo: stylesheet-fo.xsl %-full.xml
$(XSLTPROC) $(XMLINCLUDE) $(XSLTPROCFLAGS) $(XSLTPROC_FO_FLAGS) --stringparam paper.type A4 -o $@ $^
diff --git a/doc/src/sgml/README.non-ASCII b/doc/src/sgml/README.non-ASCII
new file mode 100644
index 00000000000..9c21e02e8f2
--- /dev/null
+++ b/doc/src/sgml/README.non-ASCII
@@ -0,0 +1,37 @@
+<!-- doc/src/sgml/README.non-ASCII -->
+
+Representation of non-ASCII characters
+--------------------------------------
+
+Find non-ASCII characters using:
+
+ grep --recursive --color='auto' -P '[\x80-\xFF]' .
+
+Convert to HTML4 named entity (&) escapes
+-----------------------------------------
+
+We support several output formats:
+
+* html (supports all Unicode characters)
+* man (supports all Unicode characters)
+* pdf (supports only Latin-1 characters)
+* info
+
+While some output formatting tools support all Unicode characters,
+others only support Latin-1 characters. Specifically, the PDF rendering
+engine can only display Latin-1 characters; non-Latin-1 Unicode
+characters are displayed as "###".
+
+Therefore, in the SGML files, we only use Latin-1 characters. We
+typically encode these characters as HTML entities, e.g., Álvaro.
+It is also possible to safely represent Latin-1 characters in UTF8
+encoding for all output formats.
+
+Do not use UTF numeric character escapes (&#nnn;).
+
+HTML entities
+ official: http://www.w3.org/TR/html4/sgml/entities.html
+ one page: http://www.zipcon.net/~swhite/docs/computers/browsers/entities_page.html
+ other lists: http://www.zipcon.net/~swhite/docs/computers/browsers/entities.html
+ http://www.zipcon.net/~swhite/docs/computers/browsers/entities_page.html
+ https://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references
diff --git a/doc/src/sgml/charset.sgml b/doc/src/sgml/charset.sgml
index 1ef5322b912..f5e115e8d6e 100644
--- a/doc/src/sgml/charset.sgml
+++ b/doc/src/sgml/charset.sgml
@@ -1225,7 +1225,7 @@ CREATE COLLATION ignore_accents (provider = icu, locale = 'und-u-ks-level1-kc-tr
<programlisting>
-- ignore differences in accents and case
CREATE COLLATION ignore_accent_case (provider = icu, deterministic = false, locale = 'und-u-ks-level1');
-SELECT 'Ã' = 'A' COLLATE ignore_accent_case; -- true
+SELECT 'Å' = 'A' COLLATE ignore_accent_case; -- true
SELECT 'z' = 'Z' COLLATE ignore_accent_case; -- true
-- upper case letters sort before lower case.
@@ -1282,7 +1282,7 @@ SELECT 'w;x*y-z' = 'wxyz' COLLATE num_ignore_punct; -- true
<entry><literal>'ab' = U&'a\2063b'</literal></entry>
<entry><literal>'x-y' = 'x_y'</literal></entry>
<entry><literal>'g' = 'G'</literal></entry>
- <entry><literal>'n' = 'ñ'</literal></entry>
+ <entry><literal>'n' = 'ñ'</literal></entry>
<entry><literal>'y' = 'z'</literal></entry>
</row>
</thead>
@@ -1346,7 +1346,7 @@ SELECT 'w;x*y-z' = 'wxyz' COLLATE num_ignore_punct; -- true
<para>
At every level, even with full normalization off, basic normalization is
- performed. For example, <literal>'á'</literal> may be composed of the
+ performed. For example, <literal>'á'</literal> may be composed of the
code points <literal>U&'\0061\0301'</literal> or the single code
point <literal>U&'\00E1'</literal>, and those sequences will be
considered equal even at the <literal>identic</literal> level. To treat
@@ -1430,8 +1430,8 @@ SELECT 'x-y' = 'x_y' COLLATE level4; -- false
<entry><literal>false</literal></entry>
<entry>
Backwards comparison for the level 2 differences. For example,
- locale <literal>und-u-kb</literal> sorts <literal>'Ã e'</literal>
- before <literal>'aé'</literal>.
+ locale <literal>und-u-kb</literal> sorts <literal>'àe'</literal>
+ before <literal>'aé'</literal>.
</entry>
</row>
diff --git a/doc/src/sgml/images/genetic-algorithm.svg b/doc/src/sgml/images/genetic-algorithm.svg
index fb9fdd1ba78..2ce5f1b2712 100644
--- a/doc/src/sgml/images/genetic-algorithm.svg
+++ b/doc/src/sgml/images/genetic-algorithm.svg
@@ -72,7 +72,7 @@
<title>a4->end</title>
<path fill="none" stroke="#000000" d="M259,-312.5834C259,-312.5834 259,-54.659 259,-54.659"/>
<polygon fill="#000000" stroke="#000000" points="262.5001,-54.659 259,-44.659 255.5001,-54.6591 262.5001,-54.659"/>
-<text text-anchor="middle" x="246" y="-186.6212" font-family="sans-serif" font-size="10.00" fill="#000000">true  </text>
+<text text-anchor="middle" x="246" y="-186.6212" font-family="sans-serif" font-size="10.00" fill="#000000">true</text>
</g>
<!-- a5 -->
<g id="node7" class="node">
@@ -85,7 +85,7 @@
<title>a4->a5</title>
<path fill="none" stroke="#000000" d="M144,-298.269C144,-298.269 144,-286.5248 144,-286.5248"/>
<polygon fill="#000000" stroke="#000000" points="147.5001,-286.5248 144,-276.5248 140.5001,-286.5249 147.5001,-286.5248"/>
-<text text-anchor="middle" x="127" y="-284.3969" font-family="sans-serif" font-size="10.00" fill="#000000">false   </text>
+<text text-anchor="middle" x="127" y="-284.3969" font-family="sans-serif" font-size="10.00" fill="#000000">false</text>
</g>
<!-- a6 -->
<g id="node8" class="node">
diff --git a/doc/src/sgml/release.sgml b/doc/src/sgml/release.sgml
index 8433690dead..cee577ff8d3 100644
--- a/doc/src/sgml/release.sgml
+++ b/doc/src/sgml/release.sgml
@@ -16,24 +16,6 @@ pg_[A-Za-z0-9_]+ <application>, <structname>
\<[a-z]+_[a-z_]+\> <varname>, <structfield>
<systemitem class="osname">
-non-ASCII characters find using grep -P '[\x80-\xFF]' or
- (remove 'X') grep -X-color='auto' -P -n "[\x80-\xFF]"
- convert to HTML4 named entity (&) escapes
-
- official: http://www.w3.org/TR/html4/sgml/entities.html
- one page: http://www.zipcon.net/~swhite/docs/computers/browsers/entities_page.html
- other lists: http://www.zipcon.net/~swhite/docs/computers/browsers/entities.html
- http://www.zipcon.net/~swhite/docs/computers/browsers/entities_page.html
- https://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references
-
- We cannot use UTF8 because rendering engines have to
- support the referenced characters.
-
- Do not use numeric _UTF_ numeric character escapes (&#nnn;),
- we can only use Latin1.
-
- Example: Alvaro Herrera is Álvaro Herrera
-
wrap long lines
For new features, add links to the documentation sections.
diff --git a/doc/src/sgml/stylesheet-man.xsl b/doc/src/sgml/stylesheet-man.xsl
index fcb485c2931..2e2564da683 100644
--- a/doc/src/sgml/stylesheet-man.xsl
+++ b/doc/src/sgml/stylesheet-man.xsl
@@ -213,12 +213,12 @@
<!-- Slight rephrasing to indicate that missing sections are found
in the documentation. -->
<l:context name="xref-number-and-title">
- <l:template name="chapter" text="Chapter %n, %t, in the documentation"/>
- <l:template name="sect1" text="Section %n, â%tâ, in the documentation"/>
- <l:template name="sect2" text="Section %n, â%tâ, in the documentation"/>
- <l:template name="sect3" text="Section %n, â%tâ, in the documentation"/>
- <l:template name="sect4" text="Section %n, â%tâ, in the documentation"/>
- <l:template name="sect5" text="Section %n, â%tâ, in the documentation"/>
+ <l:template name="chapter" text="Chapter %n, "%t", in the documentation"/>
+ <l:template name="sect1" text="Section %n, "%t", in the documentation"/>
+ <l:template name="sect2" text="Section %n, "%t", in the documentation"/>
+ <l:template name="sect3" text="Section %n, "%t", in the documentation"/>
+ <l:template name="sect4" text="Section %n, "%t", in the documentation"/>
+ <l:template name="sect5" text="Section %n, "%t", in the documentation"/>
</l:context>
</l:l10n>
</l:i18n>
On 15.10.24 23:51, Bruce Momjian wrote:
I don't see why we need to enforce this at this level. Whatever downstream
toolchain has requirements about which characters are allowed will complain
if it encounters a character it doesn't like.Uh, the PDF build does not complain if you pass it a non-Latin-1 UTF8
characters. To test this I added some Russian characters (non-Latin-1)
to release.sgml:(⟨б⟩, ⟨в⟩, ⟨г⟩, ⟨д⟩, ⟨ж⟩, ⟨з⟩, ⟨к⟩, ⟨л⟩, ⟨м⟩, ⟨н⟩, ⟨п⟩, ⟨р⟩, ⟨с⟩, ⟨т⟩,
⟨ф⟩, ⟨х⟩, ⟨ц⟩, ⟨ч⟩, ⟨ш⟩, ⟨щ⟩), ten vowels (⟨а⟩, ⟨е⟩, ⟨ё⟩, ⟨и⟩, ⟨о⟩, ⟨у⟩,
⟨ы⟩, ⟨э⟩, ⟨ю⟩, ⟨я⟩), a semivowel / consonant (⟨й⟩), and two modifier
letters or "signs" (⟨ъ⟩, ⟨ь⟩)and I ran 'make postgres-US.pdf', and then removed the Russian
characters and ran the same command again. The output, including stderr
was identical. The PDFs, of course, were not, with the Russian
characters showing as "####". Makefile output attached.
Hmm, mine complains:
/opt/homebrew/bin/fop -fo postgres-A4.fo -pdf postgres-A4.pdf
Picked up JAVA_TOOL_OPTIONS: -Djava.awt.headless=true
[WARN] FOUserAgent - Font "Symbol,normal,700" not found. Substituting
with "Symbol,normal,400".
[WARN] FOUserAgent - Font "ZapfDingbats,normal,700" not found.
Substituting with "ZapfDingbats,normal,400".
[WARN] FOUserAgent - Glyph "⟨" (0x27e8) not available in font "Times-Roman".
[WARN] FOUserAgent - Glyph "б" (0x431, afii10066) not available in font
"Times-Roman".
[WARN] FOUserAgent - Glyph "⟩" (0x27e9) not available in font "Times-Roman".
[WARN] FOUserAgent - Glyph "в" (0x432, afii10067) not available in font
"Times-Roman".
[WARN] FOUserAgent - Glyph "г" (0x433, afii10068) not available in font
"Times-Roman".
[WARN] FOUserAgent - Glyph "д" (0x434, afii10069) not available in font
"Times-Roman".
[WARN] FOUserAgent - Glyph "ж" (0x436, afii10072) not available in font
"Times-Roman".
[WARN] FOUserAgent - Glyph "з" (0x437, afii10073) not available in font
"Times-Roman".
[WARN] PropertyMaker - span="inherit" on fo:block, but no explicit value
found on the parent FO.
On 15.10.24 23:51, Bruce Momjian wrote:
On Tue, Oct 15, 2024 at 05:27:49PM -0400, Tom Lane wrote:
Bruce Momjian <bruce@momjian.us> writes:
Well, we can only use Latin-1, so the idea is that we will be explicit
about specifying Latin-1 only as HTML entities, rather than letting
non-Latin-1 creep in as UTF8. We can exclude certain UTF8 or SGML files
if desired.That policy would cause substantial problems with contributor names
in the release notes. I agree with Peter that we don't need this.
Catching otherwise-invisible characters seems sufficient.Uh, why can't we use HTML entities going forward? Is that harder?
I think the question should be the other way around. The entities are a
historical workaround for when encoding support and rendering support
was poor. Now you can just type in the characters you want as is, which
seems nicer.
On Wed, Oct 16, 2024 at 10:00:15AM +0200, Peter Eisentraut wrote:
On 15.10.24 23:51, Bruce Momjian wrote:
On Tue, Oct 15, 2024 at 05:27:49PM -0400, Tom Lane wrote:
Bruce Momjian <bruce@momjian.us> writes:
Well, we can only use Latin-1, so the idea is that we will be explicit
about specifying Latin-1 only as HTML entities, rather than letting
non-Latin-1 creep in as UTF8. We can exclude certain UTF8 or SGML files
if desired.That policy would cause substantial problems with contributor names
in the release notes. I agree with Peter that we don't need this.
Catching otherwise-invisible characters seems sufficient.Uh, why can't we use HTML entities going forward? Is that harder?
I think the question should be the other way around. The entities are a
historical workaround for when encoding support and rendering support was
poor. Now you can just type in the characters you want as is, which seems
nicer.
Yes, that does make sense, and if we fully supported Unicode, we could
ignore all of this.
--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com
When a patient asks the doctor, "Am I going to die?", he means
"Am I going to die soon?"
On Wed, Oct 16, 2024 at 09:58:23AM +0200, Peter Eisentraut wrote:
On 15.10.24 23:51, Bruce Momjian wrote:
I don't see why we need to enforce this at this level. Whatever downstream
toolchain has requirements about which characters are allowed will complain
if it encounters a character it doesn't like.Uh, the PDF build does not complain if you pass it a non-Latin-1 UTF8
characters. To test this I added some Russian characters (non-Latin-1)
to release.sgml:(⟨б⟩, ⟨в⟩, ⟨г⟩, ⟨д⟩, ⟨ж⟩, ⟨з⟩, ⟨к⟩, ⟨л⟩, ⟨м⟩, ⟨н⟩, ⟨п⟩, ⟨р⟩, ⟨с⟩, ⟨т⟩,
⟨ф⟩, ⟨х⟩, ⟨ц⟩, ⟨ч⟩, ⟨ш⟩, ⟨щ⟩), ten vowels (⟨а⟩, ⟨е⟩, ⟨ё⟩, ⟨и⟩, ⟨о⟩, ⟨у⟩,
⟨ы⟩, ⟨э⟩, ⟨ю⟩, ⟨я⟩), a semivowel / consonant (⟨й⟩), and two modifier
letters or "signs" (⟨ъ⟩, ⟨ь⟩)and I ran 'make postgres-US.pdf', and then removed the Russian
characters and ran the same command again. The output, including stderr
was identical. The PDFs, of course, were not, with the Russian
characters showing as "####". Makefile output attached.Hmm, mine complains:
My Debian 12 toolchain must be older.
--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com
When a patient asks the doctor, "Am I going to die?", he means
"Am I going to die soon?"
On Wed, Oct 16, 2024 at 09:54:57AM -0400, Bruce Momjian wrote:
On Wed, Oct 16, 2024 at 10:00:15AM +0200, Peter Eisentraut wrote:
On 15.10.24 23:51, Bruce Momjian wrote:
On Tue, Oct 15, 2024 at 05:27:49PM -0400, Tom Lane wrote:
Bruce Momjian <bruce@momjian.us> writes:
Well, we can only use Latin-1, so the idea is that we will be explicit
about specifying Latin-1 only as HTML entities, rather than letting
non-Latin-1 creep in as UTF8. We can exclude certain UTF8 or SGML files
if desired.That policy would cause substantial problems with contributor names
in the release notes. I agree with Peter that we don't need this.
Catching otherwise-invisible characters seems sufficient.Uh, why can't we use HTML entities going forward? Is that harder?
I think the question should be the other way around. The entities are a
historical workaround for when encoding support and rendering support was
poor. Now you can just type in the characters you want as is, which seems
nicer.Yes, that does make sense, and if we fully supported Unicode, we could
ignore all of this.
Patch applied to master --- no new UTF8 restrictions.
--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com
When a patient asks the doctor, "Am I going to die?", he means
"Am I going to die soon?"
Hi Bruce,
On Wed, Oct 16, 2024 at 09:54:57AM -0400, Bruce Momjian wrote:
On Wed, Oct 16, 2024 at 10:00:15AM +0200, Peter Eisentraut wrote:
On 15.10.24 23:51, Bruce Momjian wrote:
On Tue, Oct 15, 2024 at 05:27:49PM -0400, Tom Lane wrote:
Bruce Momjian <bruce@momjian.us> writes:
Well, we can only use Latin-1, so the idea is that we will be explicit
about specifying Latin-1 only as HTML entities, rather than letting
non-Latin-1 creep in as UTF8. We can exclude certain UTF8 or SGML files
if desired.That policy would cause substantial problems with contributor names
in the release notes. I agree with Peter that we don't need this.
Catching otherwise-invisible characters seems sufficient.Uh, why can't we use HTML entities going forward? Is that harder?
I think the question should be the other way around. The entities are a
historical workaround for when encoding support and rendering support was
poor. Now you can just type in the characters you want as is, which seems
nicer.Yes, that does make sense, and if we fully supported Unicode, we could
ignore all of this.Patch applied to master --- no new UTF8 restrictions.
I thought the conclusion of the discussion was to allow the use of LATIN1
(or UTF-8 encoded LATIN1) characters in SGML files without converting
them to HTML entities. Your patch seems to do the opposite.
Best regards,
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp
On Sat, Nov 2, 2024 at 07:27:00AM +0900, Tatsuo Ishii wrote:
On Wed, Oct 16, 2024 at 09:54:57AM -0400, Bruce Momjian wrote:
On Wed, Oct 16, 2024 at 10:00:15AM +0200, Peter Eisentraut wrote:
On 15.10.24 23:51, Bruce Momjian wrote:
On Tue, Oct 15, 2024 at 05:27:49PM -0400, Tom Lane wrote:
Bruce Momjian <bruce@momjian.us> writes:
Well, we can only use Latin-1, so the idea is that we will be explicit
about specifying Latin-1 only as HTML entities, rather than letting
non-Latin-1 creep in as UTF8. We can exclude certain UTF8 or SGML files
if desired.That policy would cause substantial problems with contributor names
in the release notes. I agree with Peter that we don't need this.
Catching otherwise-invisible characters seems sufficient.Uh, why can't we use HTML entities going forward? Is that harder?
I think the question should be the other way around. The entities are a
historical workaround for when encoding support and rendering support was
poor. Now you can just type in the characters you want as is, which seems
nicer.Yes, that does make sense, and if we fully supported Unicode, we could
ignore all of this.Patch applied to master --- no new UTF8 restrictions.
I thought the conclusion of the discussion was allowing to use LATIN1
(or UTF-8 encoded LATIN1) characters in SGML files without converting
them to HTML entities. Your patch seems to do opposite.
Yes, we _allow_ LATIN1 characters in the SGML docs, but I replaced the
LATIN1 characters we had with HTML entities, so there are none
currently.
I think it is too easy for non-Latin1 UTF8 to creep into our SGML docs
so I added a cron job on my server to alert me when non-ASCII characters
appear.
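A minimal sketch of such a cron check (the checkout path and mail
recipient are hypothetical):
    #!/bin/sh
    # nightly-doc-check.sh: report any non-ASCII bytes in the SGML sources
    cd /path/to/postgresql/doc/src/sgml || exit 1
    git pull -q
    if grep -r -l -P '[\x80-\xFF]' . > /tmp/doc-non-ascii.txt
    then
        mail -s "non-ASCII characters in SGML docs" you@example.org < /tmp/doc-non-ascii.txt
    fi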
--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com
When a patient asks the doctor, "Am I going to die?", he means
"Am I going to die soon?"
Yes, we _allow_ LATIN1 characters in the SGML docs, but I replaced the
LATIN1 characters we had with HTML entities, so there are none
currently.I think it is too easy for non-Latin1 UTF8 to creep into our SGML docs
so I added a cron job on my server to alert me when non-ASCII characters
appear.
So you convert LATIN1 characters to HTML entities so that it's easier
to detect non-LATIN1 characters in the SGML docs? If my
understanding is correct, it can also be achieved by using some tools
like:
iconv -t ISO-8859-1 -f UTF-8 release-17.sgml
If there are some non-LATIN1 characters in release-17.sgml,
it will complain like:
iconv: illegal input sequence at position 175
An advantage of this is that we don't need to convert each LATIN1
character to HTML entities, which makes the sgml file authors' lives a
little bit easier.
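A minimal sketch of such a check, assuming GNU iconv and a POSIX shell
(the file globs are only illustrative):
    for f in doc/src/sgml/*.sgml doc/src/sgml/ref/*.sgml
    do
        # iconv exits non-zero when it hits a byte sequence with no Latin-1 mapping
        if ! iconv -f UTF-8 -t ISO-8859-1 "$f" >/dev/null 2>&1
        then
            echo "non-Latin-1 character in $f" 1>&2
            status=1
        fi
    done
    exit ${status:-0}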
Best regards,
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp
On Sat, Nov 2, 2024 at 12:02:12PM +0900, Tatsuo Ishii wrote:
Yes, we _allow_ LATIN1 characters in the SGML docs, but I replaced the
LATIN1 characters we had with HTML entities, so there are none
currently.I think it is too easy for non-Latin1 UTF8 to creep into our SGML docs
so I added a cron job on my server to alert me when non-ASCII characters
appear.So you convert LATIN1 characters to HTML entities so that it's easier
to detect non-LATIN1 characters is in the SGML docs? If my
understanding is correct, it can be also achieved by using some tools
like:iconv -t ISO-8859-1 -f UTF-8 release-17.sgml
If there are some non-LATIN1 characters in release-17.sgml,
it will complain like:iconv: illegal input sequence at position 175
An advantage of this is, we don't need to covert each LATIN1
characters to HTML entities and make the sgml file authors life a
little bit easier.
I might have misread the feedback. I know people didn't want a Makefile
rule to prevent it, but I thought converting the few UTF8 characters we had
was acceptable. Let me think some more and come up with a patch.
--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com
When a patient asks the doctor, "Am I going to die?", he means
"Am I going to die soon?"
On 02.11.24 14:18, Bruce Momjian wrote:
On Sat, Nov 2, 2024 at 12:02:12PM +0900, Tatsuo Ishii wrote:
Yes, we _allow_ LATIN1 characters in the SGML docs, but I replaced the
LATIN1 characters we had with HTML entities, so there are none
currently.I think it is too easy for non-Latin1 UTF8 to creep into our SGML docs
so I added a cron job on my server to alert me when non-ASCII characters
appear.So you convert LATIN1 characters to HTML entities so that it's easier
to detect non-LATIN1 characters is in the SGML docs? If my
understanding is correct, it can be also achieved by using some tools
like:iconv -t ISO-8859-1 -f UTF-8 release-17.sgml
If there are some non-LATIN1 characters in release-17.sgml,
it will complain like:iconv: illegal input sequence at position 175
An advantage of this is, we don't need to covert each LATIN1
characters to HTML entities and make the sgml file authors life a
little bit easier.I might have misread the feedback. I know people didn't want a Makfile
rule to prevent it, but I though converting few UTF8's we had was
acceptable. Let me think some more and come up with a patch.
The question of encoding characters as entities is orthogonal to the
issue of only allowing Unicode characters that have a mapping to Latin
1. This patch seems to confuse these two issues, and I don't think it
actually fixed the second one, which is the one that was complained
about. I don't think anyone actually complained about the first one,
which is the one that was actually patched.
I think the iconv approach is an idea worth checking out.
It's also not necessarily true that the set of characters provided by
the built-in PDF fonts is exactly the set of characters in Latin 1. It
appears to be close enough, but I'm not sure, and I haven't found any
authoritative information on that. Another approach for a fix would be
to get FOP to produce the required warnings or errors more reliably. I
know it has a bunch of logging settings (ultimately via log4j), so there
might be some possibilities.
On Tue, 5 Nov 2024 10:08:17 +0100
Peter Eisentraut <peter@eisentraut.org> wrote:
So you convert LATIN1 characters to HTML entities so that it's easier
to detect non-LATIN1 characters is in the SGML docs? If my
understanding is correct, it can be also achieved by using some tools
like:iconv -t ISO-8859-1 -f UTF-8 release-17.sgml
If there are some non-LATIN1 characters in release-17.sgml,
it will complain like:iconv: illegal input sequence at position 175
An advantage of this is, we don't need to covert each LATIN1
characters to HTML entities and make the sgml file authors life a
little bit easier.
I think the iconv approach is an idea worth checking out.
It's also not necessarily true that the set of characters provided by
the built-in PDF fonts is exactly the set of characters in Latin 1. It
appears to be close enough, but I'm not sure, and I haven't found any
authoritative information on that.
I found a description in the FAQ on Apache FOP [1] that explains that some glyphs for
the Latin1 character set are not contained in the standard text fonts.
The standard text fonts supplied with Acrobat Reader have mostly glyphs for
characters from the ISO Latin 1 character set. For a variety of reasons, even
those are not completely guaranteed to work, for example you can't use the fi
ligature from the standard serif font.
[1]: https://xmlgraphics.apache.org/fop/faq.html#pdf-characters
However, it seems that using iconv to detect non-Latin1 characters may still be
useful because such characters are likely not displayed in PDF. For example, we can do this
in make check, as in the attached patch 0002. It cannot show the filename where one
is found, though.
Another approach for a fix would be
to get FOP produce the required warnings or errors more reliably. I
know it has a bunch of logging settings (ultimately via log4j), so there
might be some possibilities.
When a character that cannot be displayed in PDF is found, a warning
"Glyph ... not available in font ..." is output in fop's log. We can
prevent such characters from being included in the PDF by checking for
that message, as in the attached patch 0001. However, this is checked after
the PDF is generated, since I could not find a way to terminate the
generation immediately when such a character is detected.
Regards,
Yugo Nagata
--
Yugo Nagata <nagata@sraoss.co.jp>
Attachments:
0002-Check-non-latin1-characters-in-make-check.patchtext/x-diff; name=0002-Check-non-latin1-characters-in-make-check.patchDownload
From b6bed0089fa510480dc410969ecff42a55ea7442 Mon Sep 17 00:00:00 2001
From: Yugo Nagata <nagata@sraoss.co.jp>
Date: Mon, 11 Nov 2024 19:45:18 +0900
Subject: [PATCH 2/2] Check non-latin1 characters in make check
---
doc/src/sgml/Makefile | 10 +++++++---
1 file changed, 7 insertions(+), 3 deletions(-)
diff --git a/doc/src/sgml/Makefile b/doc/src/sgml/Makefile
index edc3725e5a..39822082c8 100644
--- a/doc/src/sgml/Makefile
+++ b/doc/src/sgml/Makefile
@@ -157,10 +157,9 @@ XSLTPROC_FO_FLAGS += --stringparam img.src.path '$(srcdir)/'
%.pdf: %.fo $(ALL_IMAGES)
$(FOP) -fo $< -pdf $@ 2>&1 | \
- awk 'BEGIN{err=0}{print}/not available in font/{err=1}END{exit err}' 1>&2 || \
+ awk 'BEGIN{err=0}{print}/not available in font/{err=1}END{exit err}' 1>&2 || \
(echo "Found characters that cannot be displayed in PDF" 1>&2; exit 1)
-
##
## EPUB
##
@@ -197,7 +196,7 @@ MAKEINFO = makeinfo
##
# Quick syntax check without style processing
-check: postgres.sgml $(ALL_SGML) check-tabs check-nbsp
+check: postgres.sgml $(ALL_SGML) check-tabs check-nbsp check-non-latin1
$(XMLLINT) $(XMLINCLUDE) --noout --valid $<
@@ -270,6 +269,11 @@ check-nbsp:
$(wildcard $(srcdir)/*.sgml $(srcdir)/ref/*.sgml $(srcdir)/*.xsl $(srcdir)/images/*.xsl) ) || \
(echo "Non-breaking spaces appear in SGML/XML files" 1>&2; exit 1)
+# Non-Latin1 characters cannot be displayed in PDF.
+check-non-latin1:
+ @ (iconv -t ISO-8859-1 -f UTF-8 $(wildcard $(srcdir)/*.sgml $(srcdir)/ref/*.sgml $(srcdir)/*.xsl) >/dev/null 2>&1) || \
+ (echo "Non-Latin1 characters appear in SGML/XML files" 1>&2; exit 1)
+
##
## Clean
##
--
2.34.1
0001-Disallow-characters-that-cannot-be-displayed-in-PDF.patchtext/x-diff; name=0001-Disallow-characters-that-cannot-be-displayed-in-PDF.patchDownload
From 7e6a612c15bf65169e31906371218cdf13fcacdb Mon Sep 17 00:00:00 2001
From: Yugo Nagata <nagata@sraoss.co.jp>
Date: Mon, 11 Nov 2024 19:22:02 +0900
Subject: [PATCH 1/2] Disallow characters that cannot be displayed in PDF
---
doc/src/sgml/Makefile | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/doc/src/sgml/Makefile b/doc/src/sgml/Makefile
index a04c532b53..edc3725e5a 100644
--- a/doc/src/sgml/Makefile
+++ b/doc/src/sgml/Makefile
@@ -156,7 +156,9 @@ XSLTPROC_FO_FLAGS += --stringparam img.src.path '$(srcdir)/'
$(XSLTPROC) $(XMLINCLUDE) $(XSLTPROCFLAGS) $(XSLTPROC_FO_FLAGS) --stringparam paper.type USletter -o $@ $^
%.pdf: %.fo $(ALL_IMAGES)
- $(FOP) -fo $< -pdf $@
+ $(FOP) -fo $< -pdf $@ 2>&1 | \
+ awk 'BEGIN{err=0}{print}/not available in font/{err=1}END{exit err}' 1>&2 || \
+ (echo "Found characters that cannot be displayed in PDF" 1>&2; exit 1)
##
--
2.34.1
On Mon, Nov 11, 2024 at 10:02:15PM +0900, Yugo Nagata wrote:
On Tue, 5 Nov 2024 10:08:17 +0100
Peter Eisentraut <peter@eisentraut.org> wrote:So you convert LATIN1 characters to HTML entities so that it's easier
to detect non-LATIN1 characters is in the SGML docs? If my
understanding is correct, it can be also achieved by using some tools
like:iconv -t ISO-8859-1 -f UTF-8 release-17.sgml
If there are some non-LATIN1 characters in release-17.sgml,
it will complain like:iconv: illegal input sequence at position 175
An advantage of this is, we don't need to covert each LATIN1
characters to HTML entities and make the sgml file authors life a
little bit easier.I think the iconv approach is an idea worth checking out.
It's also not necessarily true that the set of characters provided by
the built-in PDF fonts is exactly the set of characters in Latin 1. It
appears to be close enough, but I'm not sure, and I haven't found any
authoritative information on that.I found a description in FAQ on Apache FOP [1] that explains some glyphs for
Latin1 character set are not contained in the standard text fonts.The standard text fonts supplied with Acrobat Reader have mostly glyphs for
characters from the ISO Latin 1 character set. For a variety of reasons, even
those are not completely guaranteed to work, for example you can't use the fi
ligature from the standard serif font.
So, the failure of ligatures is usually caused by not using the right
Adobe Font Metric (AFM) file, I think. I have seen faulty ligature
rendering in PDFs but was always able to fix it by using the right AFM
file. Odds are, the failure is caused by using a standard Latin1 AFM file
and not the AFM file that matches the font being used.
[1] https://xmlgraphics.apache.org/fop/faq.html#pdf-characters
However, it seems that using iconv to detect non-Latin1 characters may be still
useful because these are likely not displayed in PDF. For example, we can do this
in make check as the attached patch 0002. It cannot show the filname where one
is found, though.
I was thinking something like:
grep -l --recursive -P '[\x80-\xFF]' . |
while read FILE
do iconv -f UTF-8 -t ISO-8859-1 "$FILE" || exit 1
done
This only checks files with non-ASCII characters.
Another approach for a fix would be
to get FOP produce the required warnings or errors more reliably. I
know it has a bunch of logging settings (ultimately via log4j), so there
might be some possibilities.When a character that cannot be displayed in PDF is found, a warning
"Glyph ... not available in font ...." is output in fop's log. We can
prevent such characters from being contained in PDF by checking
the message as the attached patch 0001. However, this is checked after
the pdf is generated since I could not have an idea how to terminate the
generation immediately when such character is detected.
So, are we sure this will be the message even for non-English users? I
thought checking for warning message text was too fragile.
--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com
When a patient asks the doctor, "Am I going to die?", he means
"Am I going to die soon?"
On Mon, 18 Nov 2024 16:04:20 -0500
Bruce Momjian <bruce@momjian.us> wrote:
On Mon, Nov 11, 2024 at 10:02:15PM +0900, Yugo Nagata wrote:
On Tue, 5 Nov 2024 10:08:17 +0100
Peter Eisentraut <peter@eisentraut.org> wrote:So you convert LATIN1 characters to HTML entities so that it's easier
to detect non-LATIN1 characters is in the SGML docs? If my
understanding is correct, it can be also achieved by using some tools
like:iconv -t ISO-8859-1 -f UTF-8 release-17.sgml
If there are some non-LATIN1 characters in release-17.sgml,
it will complain like:iconv: illegal input sequence at position 175
An advantage of this is, we don't need to covert each LATIN1
characters to HTML entities and make the sgml file authors life a
little bit easier.I think the iconv approach is an idea worth checking out.
It's also not necessarily true that the set of characters provided by
the built-in PDF fonts is exactly the set of characters in Latin 1. It
appears to be close enough, but I'm not sure, and I haven't found any
authoritative information on that.I found a description in FAQ on Apache FOP [1] that explains some glyphs for
Latin1 character set are not contained in the standard text fonts.The standard text fonts supplied with Acrobat Reader have mostly glyphs for
characters from the ISO Latin 1 character set. For a variety of reasons, even
those are not completely guaranteed to work, for example you can't use the fi
ligature from the standard serif font.So, the failure of ligatures is caused usually by not using the right
Adobe Font Metric (AFM) file, I think. I have seen faulty ligature
rendering in PDFs but was alway able to fix it by using the right AFM
file. Odds are, failure is caused by using a standard Latin1 AFM file
and not the AFM file that matches the font being used.[1] https://xmlgraphics.apache.org/fop/faq.html#pdf-characters
However, it seems that using iconv to detect non-Latin1 characters may be still
useful because these are likely not displayed in PDF. For example, we can do this
in make check as the attached patch 0002. It cannot show the filname where one
is found, though.I was thinking something like:
grep -l --recursive -P '[\x80-\xFF]' . |
while read FILE
do iconv -f UTF-8 -t ISO-8859-1 "$FILE" || exit 1
doneThis only checks files with non-ASCII characters.
Checking for non-latin1 after non-ASCII characters seems like a good idea.
I attached an updated patch (0002) that uses perl instead of grep
because non-GNU grep does not support hex escape sequences.
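For example, a standalone form of the same check (the globs are only
illustrative) works with any perl, regardless of which grep is installed:
    perl -ne 'print "$ARGV:$_" if /[\x80-\xFF]/' doc/src/sgml/*.sgml doc/src/sgml/ref/*.sgml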
Another approach for a fix would be
to get FOP produce the required warnings or errors more reliably. I
know it has a bunch of logging settings (ultimately via log4j), so there
might be some possibilities.When a character that cannot be displayed in PDF is found, a warning
"Glyph ... not available in font ...." is output in fop's log. We can
prevent such characters from being contained in PDF by checking
the message as the attached patch 0001. However, this is checked after
the pdf is generated since I could not have an idea how to terminate the
generation immediately when such character is detected.So, are we sure this will be the message even for non-English users? I
thought checking for warning message text was too fragile.
I am not sure whether fop has messages in languages other than English,
although I've never seen Japanese messages in its output.
I wonder if we can get consistent results when it is executed with LANG=C.
The updated patch 0001 takes this approach.
Regards,
--
Yugo NAGATA <nagata@sraoss.co.jp>
Attachments:
v2-0002-Check-non-latin1-characters-in-make-check.patchtext/x-diff; name=v2-0002-Check-non-latin1-characters-in-make-check.patchDownload
From d73024303b4bbac3d6a7e861f7b3b91b0541a5ba Mon Sep 17 00:00:00 2001
From: Yugo Nagata <nagata@sraoss.co.jp>
Date: Mon, 11 Nov 2024 19:45:18 +0900
Subject: [PATCH v2 2/2] Check non-latin1 characters in make check
---
doc/src/sgml/Makefile | 13 +++++++++++--
1 file changed, 11 insertions(+), 2 deletions(-)
diff --git a/doc/src/sgml/Makefile b/doc/src/sgml/Makefile
index 18bf87d031..55dd2da299 100644
--- a/doc/src/sgml/Makefile
+++ b/doc/src/sgml/Makefile
@@ -36,6 +36,10 @@ ifndef FOP
FOP = $(missing) fop
endif
+ifndef ICONV
+ICONV = $(missing) iconv
+endif
+
PANDOC = pandoc
XMLINCLUDE = --path . --path $(srcdir)
@@ -160,7 +164,6 @@ XSLTPROC_FO_FLAGS += --stringparam img.src.path '$(srcdir)/'
awk 'BEGIN{err=0}{print}/not available in font/{err=1}END{exit err}' 1>&2 || \
(echo "Found characters that cannot be displayed in PDF" 1>&2; exit 1)
-
##
## EPUB
##
@@ -197,7 +200,7 @@ MAKEINFO = makeinfo
##
# Quick syntax check without style processing
-check: postgres.sgml $(ALL_SGML) check-tabs check-nbsp
+check: postgres.sgml $(ALL_SGML) check-tabs check-nbsp check-non-latin1
$(XMLLINT) $(XMLINCLUDE) --noout --valid $<
@@ -270,6 +273,12 @@ check-nbsp:
$(wildcard $(srcdir)/*.sgml $(srcdir)/ref/*.sgml $(srcdir)/*.xsl $(srcdir)/images/*.xsl) ) || \
(echo "Non-breaking spaces appear in SGML/XML files" 1>&2; exit 1)
+# Non-Latin1 characters cannot be displayed in PDF.
+check-non-latin1:
+ @ ( $(PERL) -ne '/[\x80-\xFF]/ and `${ICONV} -t ISO-8859-1 -f UTF-8 "$$ARGV" 2>/dev/null` and print("$$ARGV:$$_"),$$n++; END {exit($$n>0)}' \
+ $(wildcard $(srcdir)/*.sgml $(srcdir)/ref/*.sgml $(srcdir)/*.xsl $(srcdir)/images/*.xsl) ) || \
+ (echo "Non-Latin1 characters appear in SGML/XML files" 1>&2; exit 1)
+
##
## Clean
##
--
2.34.1
v2-0001-Disallow-characters-that-cannot-be-displayed-in-P.patchtext/x-diff; name=v2-0001-Disallow-characters-that-cannot-be-displayed-in-P.patchDownload
From 3abf606f693776410dd667bd59b0d33b9b6a75f3 Mon Sep 17 00:00:00 2001
From: Yugo Nagata <nagata@sraoss.co.jp>
Date: Mon, 11 Nov 2024 19:22:02 +0900
Subject: [PATCH v2 1/2] Disallow characters that cannot be displayed in PDF
---
doc/src/sgml/Makefile | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/doc/src/sgml/Makefile b/doc/src/sgml/Makefile
index a04c532b53..18bf87d031 100644
--- a/doc/src/sgml/Makefile
+++ b/doc/src/sgml/Makefile
@@ -156,7 +156,9 @@ XSLTPROC_FO_FLAGS += --stringparam img.src.path '$(srcdir)/'
$(XSLTPROC) $(XMLINCLUDE) $(XSLTPROCFLAGS) $(XSLTPROC_FO_FLAGS) --stringparam paper.type USletter -o $@ $^
%.pdf: %.fo $(ALL_IMAGES)
- $(FOP) -fo $< -pdf $@
+ LANG=C $(FOP) -fo $< -pdf $@ 2>&1 | \
+ awk 'BEGIN{err=0}{print}/not available in font/{err=1}END{exit err}' 1>&2 || \
+ (echo "Found characters that cannot be displayed in PDF" 1>&2; exit 1)
##
--
2.34.1
On Tue, Nov 19, 2024 at 11:29:07AM +0900, Yugo NAGATA wrote:
On Mon, 18 Nov 2024 16:04:20 -0500
So, the failure of ligatures is usually caused by not using the right
Adobe Font Metric (AFM) file, I think. I have seen faulty ligature
rendering in PDFs but was always able to fix it by using the right AFM
file. Odds are, the failure is caused by using a standard Latin1 AFM file
and not the AFM file that matches the font being used. [1]
[1] https://xmlgraphics.apache.org/fop/faq.html#pdf-characters
However, it seems that using iconv to detect non-Latin1 characters may still be
useful because these are likely not displayed in PDF. For example, we can do this
in make check as in the attached patch 0002. It cannot show the filename where one
is found, though.
I was thinking something like:
grep -l --recursive -P '[\x80-\xFF]' . |
while read FILE
do iconv -f UTF-8 -t ISO-8859-1 "$FILE" || exit 1
done
This only checks files with non-ASCII characters.
Checking non-Latin1 after non-ASCII characters seems like a good idea.
I attached an updated patch (0002) that uses perl instead of grep
because non-GNU grep may not support hex escape sequences.
Yes, good point.
So, are we sure this will be the message even for non-English users? I
thought checking for warning message text was too fragile.
I am not sure whether fop has messages in non-English, although I've never
seen Japanese messages output.
I wonder if we can get unified results if it is executed with LANG=C.
The updated patch 0001 is revised in this direction.
Yes, good idea.
+ @ ( $(PERL) -ne '/[\x80-\xFF]/ and `${ICONV} -t ISO-8859-1 -f UTF-8 "$$ARGV" 2>/dev/null` and print("$$ARGV:$$_"),$$n++; END {exit($$n>0)}' \
I am thinking we should have -f before -t because it is from/to.
I like this approach.
--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com
When a patient asks the doctor, "Am I going to die?", he means
"Am I going to die soon?"
On Mon, 18 Nov 2024 22:07:40 -0500
Bruce Momjian <bruce@momjian.us> wrote:
On Tue, Nov 19, 2024 at 11:29:07AM +0900, Yugo NAGATA wrote:
On Mon, 18 Nov 2024 16:04:20 -0500
So, the failure of ligatures is usually caused by not using the right
Adobe Font Metric (AFM) file, I think. I have seen faulty ligature
rendering in PDFs but was always able to fix it by using the right AFM
file. Odds are, the failure is caused by using a standard Latin1 AFM file
and not the AFM file that matches the font being used. [1]
[1] https://xmlgraphics.apache.org/fop/faq.html#pdf-characters
However, it seems that using iconv to detect non-Latin1 characters may still be
useful because these are likely not displayed in PDF. For example, we can do this
in make check as in the attached patch 0002. It cannot show the filename where one
is found, though.
I was thinking something like:
grep -l --recursive -P '[\x80-\xFF]' . |
while read FILE
do iconv -f UTF-8 -t ISO-8859-1 "$FILE" || exit 1
done
This only checks files with non-ASCII characters.
Checking non-Latin1 after non-ASCII characters seems like a good idea.
I attached an updated patch (0002) that uses perl instead of grep
because non-GNU grep may not support hex escape sequences.
Yes, good point.
So, are we sure this will be the message even for non-English users? I
thought checking for warning message text was too fragile.
I am not sure whether fop has messages in non-English, although I've never
seen Japanese messages output.
I wonder if we can get unified results if it is executed with LANG=C.
The updated patch 0001 is revised in this direction.
Yes, good idea.
+ @ ( $(PERL) -ne '/[\x80-\xFF]/ and `${ICONV} -t ISO-8859-1 -f UTF-8 "$$ARGV" 2>/dev/null` and print("$$ARGV:$$_"),$$n++; END {exit($$n>0)}' \
I am thinking we should have -f before -t because it is from/to.
I've updated the patch 0002 to move -f before -t.
Also, I added a new patch 0003 that updates the configure script to check
whether iconv exists. When it does not exist, the message
"ERROR: `iconv' is missing on your system." will be raised.
However, this change may be unnecessary since iconv is a POSIX standard
and most UNIX-like systems would have it.
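For reference, with these patches applied the new rule would run as part of the
existing documentation checks, for example:

cd doc/src/sgml
make check              # runs check-tabs, check-nbsp and the new check-non-latin1
make check-non-latin1   # or run only the new rule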
Regards,
Yugo Nagata
--
Yugo NAGATA <nagata@sraoss.co.jp>
Attachments:
v3-0003-Check-whether-iconv-exists-for-detecting-non-lati.patchtext/x-diff; name=v3-0003-Check-whether-iconv-exists-for-detecting-non-lati.patchDownload
From 93adc51c0135d274cea75f2de2b328480c72a94c Mon Sep 17 00:00:00 2001
From: Yugo Nagata <nagata@sraoss.co.jp>
Date: Tue, 19 Nov 2024 19:19:14 +0900
Subject: [PATCH v3 3/3] Check whether iconv exists for detecting non-latin1
characters
---
configure | 65 ++++++++++++++++++++++++++++++++++++++----
configure.ac | 1 +
doc/src/sgml/Makefile | 6 +++-
src/Makefile.global.in | 1 +
4 files changed, 67 insertions(+), 6 deletions(-)
diff --git a/configure b/configure
index f58eae1baa..eaf02c5660 100755
--- a/configure
+++ b/configure
@@ -632,6 +632,7 @@ PG_VERSION_NUM
LDFLAGS_EX_BE
PROVE
DBTOEPUB
+ICONV
FOP
XSLTPROC
XMLLINT
@@ -14728,7 +14729,7 @@ else
We can't simply define LARGE_OFF_T to be 9223372036854775807,
since some C++ compilers masquerading as C compilers
incorrectly reject 9223372036854775807. */
-#define LARGE_OFF_T (((off_t) 1 << 62) - 1 + ((off_t) 1 << 62))
+#define LARGE_OFF_T ((((off_t) 1 << 31) << 31) - 1 + (((off_t) 1 << 31) << 31))
int off_t_is_large[(LARGE_OFF_T % 2147483629 == 721
&& LARGE_OFF_T % 2147483647 == 1)
? 1 : -1];
@@ -14774,7 +14775,7 @@ else
We can't simply define LARGE_OFF_T to be 9223372036854775807,
since some C++ compilers masquerading as C compilers
incorrectly reject 9223372036854775807. */
-#define LARGE_OFF_T (((off_t) 1 << 62) - 1 + ((off_t) 1 << 62))
+#define LARGE_OFF_T ((((off_t) 1 << 31) << 31) - 1 + (((off_t) 1 << 31) << 31))
int off_t_is_large[(LARGE_OFF_T % 2147483629 == 721
&& LARGE_OFF_T % 2147483647 == 1)
? 1 : -1];
@@ -14798,7 +14799,7 @@ rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext
We can't simply define LARGE_OFF_T to be 9223372036854775807,
since some C++ compilers masquerading as C compilers
incorrectly reject 9223372036854775807. */
-#define LARGE_OFF_T (((off_t) 1 << 62) - 1 + ((off_t) 1 << 62))
+#define LARGE_OFF_T ((((off_t) 1 << 31) << 31) - 1 + (((off_t) 1 << 31) << 31))
int off_t_is_large[(LARGE_OFF_T % 2147483629 == 721
&& LARGE_OFF_T % 2147483647 == 1)
? 1 : -1];
@@ -14843,7 +14844,7 @@ else
We can't simply define LARGE_OFF_T to be 9223372036854775807,
since some C++ compilers masquerading as C compilers
incorrectly reject 9223372036854775807. */
-#define LARGE_OFF_T (((off_t) 1 << 62) - 1 + ((off_t) 1 << 62))
+#define LARGE_OFF_T ((((off_t) 1 << 31) << 31) - 1 + (((off_t) 1 << 31) << 31))
int off_t_is_large[(LARGE_OFF_T % 2147483629 == 721
&& LARGE_OFF_T % 2147483647 == 1)
? 1 : -1];
@@ -14867,7 +14868,7 @@ rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext
We can't simply define LARGE_OFF_T to be 9223372036854775807,
since some C++ compilers masquerading as C compilers
incorrectly reject 9223372036854775807. */
-#define LARGE_OFF_T (((off_t) 1 << 62) - 1 + ((off_t) 1 << 62))
+#define LARGE_OFF_T ((((off_t) 1 << 31) << 31) - 1 + (((off_t) 1 << 31) << 31))
int off_t_is_large[(LARGE_OFF_T % 2147483629 == 721
&& LARGE_OFF_T % 2147483647 == 1)
? 1 : -1];
@@ -18535,6 +18536,60 @@ $as_echo_n "checking for FOP... " >&6; }
$as_echo "$FOP" >&6; }
fi
+if test -z "$ICONV"; then
+ for ac_prog in iconv
+do
+ # Extract the first word of "$ac_prog", so it can be a program name with args.
+set dummy $ac_prog; ac_word=$2
+{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5
+$as_echo_n "checking for $ac_word... " >&6; }
+if ${ac_cv_path_ICONV+:} false; then :
+ $as_echo_n "(cached) " >&6
+else
+ case $ICONV in
+ [\\/]* | ?:[\\/]*)
+ ac_cv_path_ICONV="$ICONV" # Let the user override the test with a path.
+ ;;
+ *)
+ as_save_IFS=$IFS; IFS=$PATH_SEPARATOR
+for as_dir in $PATH
+do
+ IFS=$as_save_IFS
+ test -z "$as_dir" && as_dir=.
+ for ac_exec_ext in '' $ac_executable_extensions; do
+ if as_fn_executable_p "$as_dir/$ac_word$ac_exec_ext"; then
+ ac_cv_path_ICONV="$as_dir/$ac_word$ac_exec_ext"
+ $as_echo "$as_me:${as_lineno-$LINENO}: found $as_dir/$ac_word$ac_exec_ext" >&5
+ break 2
+ fi
+done
+ done
+IFS=$as_save_IFS
+
+ ;;
+esac
+fi
+ICONV=$ac_cv_path_ICONV
+if test -n "$ICONV"; then
+ { $as_echo "$as_me:${as_lineno-$LINENO}: result: $ICONV" >&5
+$as_echo "$ICONV" >&6; }
+else
+ { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5
+$as_echo "no" >&6; }
+fi
+
+
+ test -n "$ICONV" && break
+done
+
+else
+ # Report the value of ICONV in configure's output in all cases.
+ { $as_echo "$as_me:${as_lineno-$LINENO}: checking for ICONV" >&5
+$as_echo_n "checking for ICONV... " >&6; }
+ { $as_echo "$as_me:${as_lineno-$LINENO}: result: $ICONV" >&5
+$as_echo "$ICONV" >&6; }
+fi
+
if test -z "$DBTOEPUB"; then
for ac_prog in dbtoepub
do
diff --git a/configure.ac b/configure.ac
index 82c5009e3e..1196f857cf 100644
--- a/configure.ac
+++ b/configure.ac
@@ -2321,6 +2321,7 @@ fi
PGAC_PATH_PROGS(XMLLINT, xmllint)
PGAC_PATH_PROGS(XSLTPROC, xsltproc)
PGAC_PATH_PROGS(FOP, fop)
+PGAC_PATH_PROGS(ICONV, iconv)
PGAC_PATH_PROGS(DBTOEPUB, dbtoepub)
#
diff --git a/doc/src/sgml/Makefile b/doc/src/sgml/Makefile
index 820ae7c456..416dfc6c89 100644
--- a/doc/src/sgml/Makefile
+++ b/doc/src/sgml/Makefile
@@ -36,6 +36,10 @@ ifndef FOP
FOP = $(missing) fop
endif
+ifndef ICONV
+ICONV = $(missing) iconv
+endif
+
PANDOC = pandoc
XMLINCLUDE = --path . --path $(srcdir)
@@ -271,7 +275,7 @@ check-nbsp:
# Non-Latin1 characters cannot be displayed in PDF.
check-non-latin1:
- @ ( $(PERL) -ne '/[\x80-\xFF]/ and `iconv -f UTF-8 -t ISO-8859-1 "$$ARGV" 2>/dev/null` and print("$$ARGV:$$_"),$$n++; END {exit($$n>0)}' \
+ @ ( $(PERL) -ne '/[\x80-\xFF]/ and `LANG=C ${ICONV} -f UTF-8 -t ISO-8859-1 "$$ARGV"` and print("$$ARGV:$$_"),$$n++; END {exit($$n>0)}' \
$(wildcard $(srcdir)/*.sgml $(srcdir)/ref/*.sgml $(srcdir)/*.xsl $(srcdir)/images/*.xsl) ) || \
(echo "Non-Latin1 characters appear in SGML/XML files" 1>&2; exit 1)
diff --git a/src/Makefile.global.in b/src/Makefile.global.in
index 0f38d712d1..f3bd700664 100644
--- a/src/Makefile.global.in
+++ b/src/Makefile.global.in
@@ -517,6 +517,7 @@ STRIP_SHARED_LIB = @STRIP_SHARED_LIB@
DBTOEPUB = @DBTOEPUB@
FOP = @FOP@
+ICONV = @ICONV@
XMLLINT = @XMLLINT@
XSLTPROC = @XSLTPROC@
--
2.34.1
v3-0002-Check-non-latin1-characters-in-make-check.patchtext/x-diff; name=v3-0002-Check-non-latin1-characters-in-make-check.patchDownload
From d07e2646a0a27852e169686fcce6c5647840abf3 Mon Sep 17 00:00:00 2001
From: Yugo Nagata <nagata@sraoss.co.jp>
Date: Mon, 11 Nov 2024 19:45:18 +0900
Subject: [PATCH v3 2/3] Check non-latin1 characters in make check
---
doc/src/sgml/Makefile | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/doc/src/sgml/Makefile b/doc/src/sgml/Makefile
index 18bf87d031..820ae7c456 100644
--- a/doc/src/sgml/Makefile
+++ b/doc/src/sgml/Makefile
@@ -160,7 +160,6 @@ XSLTPROC_FO_FLAGS += --stringparam img.src.path '$(srcdir)/'
awk 'BEGIN{err=0}{print}/not available in font/{err=1}END{exit err}' 1>&2 || \
(echo "Found characters that cannot be displayed in PDF" 1>&2; exit 1)
-
##
## EPUB
##
@@ -197,7 +196,7 @@ MAKEINFO = makeinfo
##
# Quick syntax check without style processing
-check: postgres.sgml $(ALL_SGML) check-tabs check-nbsp
+check: postgres.sgml $(ALL_SGML) check-tabs check-nbsp check-non-latin1
$(XMLLINT) $(XMLINCLUDE) --noout --valid $<
@@ -270,6 +269,12 @@ check-nbsp:
$(wildcard $(srcdir)/*.sgml $(srcdir)/ref/*.sgml $(srcdir)/*.xsl $(srcdir)/images/*.xsl) ) || \
(echo "Non-breaking spaces appear in SGML/XML files" 1>&2; exit 1)
+# Non-Latin1 characters cannot be displayed in PDF.
+check-non-latin1:
+ @ ( $(PERL) -ne '/[\x80-\xFF]/ and `iconv -f UTF-8 -t ISO-8859-1 "$$ARGV" 2>/dev/null` and print("$$ARGV:$$_"),$$n++; END {exit($$n>0)}' \
+ $(wildcard $(srcdir)/*.sgml $(srcdir)/ref/*.sgml $(srcdir)/*.xsl $(srcdir)/images/*.xsl) ) || \
+ (echo "Non-Latin1 characters appear in SGML/XML files" 1>&2; exit 1)
+
##
## Clean
##
--
2.34.1
v3-0001-Disallow-characters-that-cannot-be-displayed-in-P.patchtext/x-diff; name=v3-0001-Disallow-characters-that-cannot-be-displayed-in-P.patchDownload
From 3abf606f693776410dd667bd59b0d33b9b6a75f3 Mon Sep 17 00:00:00 2001
From: Yugo Nagata <nagata@sraoss.co.jp>
Date: Mon, 11 Nov 2024 19:22:02 +0900
Subject: [PATCH v3 1/3] Disallow characters that cannot be displayed in PDF
---
doc/src/sgml/Makefile | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/doc/src/sgml/Makefile b/doc/src/sgml/Makefile
index a04c532b53..18bf87d031 100644
--- a/doc/src/sgml/Makefile
+++ b/doc/src/sgml/Makefile
@@ -156,7 +156,9 @@ XSLTPROC_FO_FLAGS += --stringparam img.src.path '$(srcdir)/'
$(XSLTPROC) $(XMLINCLUDE) $(XSLTPROCFLAGS) $(XSLTPROC_FO_FLAGS) --stringparam paper.type USletter -o $@ $^
%.pdf: %.fo $(ALL_IMAGES)
- $(FOP) -fo $< -pdf $@
+ CLANG=C $(FOP) -fo $< -pdf $@ 2>&1 | \
+ awk 'BEGIN{err=0}{print}/not available in font/{err=1}END{exit err}' 1>&2 || \
+ (echo "Found characters that cannot be displayed in PDF" 1>&2; exit 1)
##
--
2.34.1
I have looked into the patches.
Subject: [PATCH v3 1/3] Disallow characters that cannot be displayed in PDF
---
doc/src/sgml/Makefile | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/doc/src/sgml/Makefile b/doc/src/sgml/Makefile
index a04c532b53..18bf87d031 100644
--- a/doc/src/sgml/Makefile
+++ b/doc/src/sgml/Makefile
@@ -156,7 +156,9 @@ XSLTPROC_FO_FLAGS += --stringparam img.src.path '$(srcdir)/'
$(XSLTPROC) $(XMLINCLUDE) $(XSLTPROCFLAGS) $(XSLTPROC_FO_FLAGS) --stringparam paper.type USletter -o $@ $^
%.pdf: %.fo $(ALL_IMAGES)
- $(FOP) -fo $< -pdf $@
+ CLANG=C $(FOP) -fo $< -pdf $@ 2>&1 | \
Shouldn't "CLANG" be "LANG"?
+ awk 'BEGIN{err=0}{print}/not available in font/{err=1}END{exit err}' 1>&2 || \
+ (echo "Found characters that cannot be displayed in PDF" 1>&2; exit 1)
Currently "make postgres*.pdf" generates the pdf file even if there's
a "not available in font" error while generating it. With the patch
the pdf file is removed in this case. I'm not sure if this is an
improvement because there's no way to generate such a pdf file if
there's such a warning. Printing "Found characters that cannot be
displayed in PDF" is good, but I'd prefer let users decide whether
they retain or remove the pdf file.
Subject: [PATCH v3 3/3] Check whether iconv exists for detecting non-latin1
characters
---
configure | 65 ++++++++++++++++++++++++++++++++++++++----
configure.ac | 1 +
doc/src/sgml/Makefile | 6 +++-
src/Makefile.global.in | 1 +
You don't need to include the patch for configure. The committer will
generate configure when it gets committed. See the discussion:
/messages/by-id/20241126.102906.1020285543012274306.ishii@postgresql.org
Best regards,
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp
On Tue, Nov 26, 2024 at 06:25:13PM +0900, Tatsuo Ishii wrote:
I have looked into the patches.
%.pdf: %.fo $(ALL_IMAGES)
- $(FOP) -fo $< -pdf $@
+ CLANG=C $(FOP) -fo $< -pdf $@ 2>&1 | \
Shouldn't "CLANG" be "LANG"?
Yes, probably.
+ awk 'BEGIN{err=0}{print}/not available in font/{err=1}END{exit err}' 1>&2 || \
+ (echo "Found characters that cannot be displayed in PDF" 1>&2; exit 1)
Currently "make postgres*.pdf" generates the pdf file even if there's
a "not available in font" error while generating it. With the patch
the pdf file is removed in this case. I'm not sure if this is an
improvement because there's no way to generate such a pdf file if
there's such a warning. Printing "Found characters that cannot be
displayed in PDF" is good, but I'd prefer let users decide whether
they retain or remove the pdf file.
Looking at the patch:
%.pdf: %.fo $(ALL_IMAGES)
- $(FOP) -fo $< -pdf $@
+ CLANG=C $(FOP) -fo $< -pdf $@ 2>&1 | \
+ awk 'BEGIN{err=0}{print}/not available in font/{err=1}END{exit err}' 1>&2 || \
+ (echo "Found characters that cannot be displayed in PDF" 1>&2; exit 1)
it returns an error if it sees a "not available in font" warning, and
since src/Makefile.global has .DELETE_ON_ERROR, and this is included in
doc/src/sgml/Makefile, the file is deleted on the awk 'exit' error.
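A minimal standalone illustration of that .DELETE_ON_ERROR behavior, with
made-up target names:

.DELETE_ON_ERROR:
broken.pdf:
	echo "partial output" > $@
	exit 1

Running "make broken.pdf" writes the file but then removes it again because the
recipe exited non-zero, which is what happens here when awk exits with the
error status.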
If there are invalid characters in the PDF, shouldn't the PDF be
considered invalid and removed from the build? To allow such builds to
keep those PDF files, we would probably need to override
.DELETE_ON_ERROR, but it would have to be done in a way that an error
exit from FOP would still remove the PDF file. I think we would have to
have FOP write to a temporary file, and then override
.DELETE_ON_ERROR just for the check for the "not available in
font" text in the temporary file.
Do we want to add this complexity?
--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com
When a patient asks the doctor, "Am I going to die?", he means
"Am I going to die soon?"
Bruce Momjian <bruce@momjian.us> writes:
Do we want to add this complexity?
I don't think this patch is doing anything I want at all.
regards, tom lane
On Tue, Nov 26, 2024 at 11:43:02AM -0500, Tom Lane wrote:
Bruce Momjian <bruce@momjian.us> writes:
Do we want to add this complexity?
I don't think this patch is doing anything I want at all.
Gee, I kind of liked the patch, but maybe you didn't like the additional
complexity to check the PDF output twice, once on input (complex) and
once on output. The attached patch only does the output check.
--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com
When a patient asks the doctor, "Am I going to die?", he means
"Am I going to die soon?"
Attachments:
latin1.difftext/x-diff; charset=us-asciiDownload
diff --git a/doc/src/sgml/Makefile b/doc/src/sgml/Makefile
index a04c532b536..feba0698605 100644
--- a/doc/src/sgml/Makefile
+++ b/doc/src/sgml/Makefile
@@ -156,7 +156,9 @@ XSLTPROC_FO_FLAGS += --stringparam img.src.path '$(srcdir)/'
$(XSLTPROC) $(XMLINCLUDE) $(XSLTPROCFLAGS) $(XSLTPROC_FO_FLAGS) --stringparam paper.type USletter -o $@ $^
%.pdf: %.fo $(ALL_IMAGES)
- $(FOP) -fo $< -pdf $@
+ LANG=C $(FOP) -fo $< -pdf $@ 2>&1 | \
+ awk 'BEGIN { err = 0 } { print } /not available in font/ { err = 1 } END { exit err }' 1>&2 || \
+ (echo "Found characters that cannot be displayed in the PDF document" 1>&2; exit 1)
##
Bruce Momjian <bruce@momjian.us> writes:
On Tue, Nov 26, 2024 at 11:43:02AM -0500, Tom Lane wrote:
I don't think this patch is doing anything I want at all.
Gee, I kind of liked the patch, but maybe you didn't like the additional
complexity to check the PDF output twice, once on input (complex) and
once on output. The attached patch only does the output check.
It's still not doing anything I want at all. I'm with Tatsuo
on this: I do not want the makefiles deciding for me which
warnings are acceptable.
regards, tom lane
On Tue, Nov 26, 2024 at 12:41:37PM -0500, Tom Lane wrote:
Bruce Momjian <bruce@momjian.us> writes:
On Tue, Nov 26, 2024 at 11:43:02AM -0500, Tom Lane wrote:
I don't think this patch is doing anything I want at all.
Gee, I kind of liked the patch, but maybe you didn't like the additional
complexity to check the PDF output twice, once on input (complex) and
once on output. The attached patch only does the output check.
It's still not doing anything I want at all. I'm with Tatsuo
on this: I do not want the makefiles deciding for me which
warnings are acceptable.
Okay, how about the attached patch that just prints the message at the
bottom, with no error. We could do this for all warnings, but I think
there are some we expect.
--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com
When a patient asks the doctor, "Am I going to die?", he means
"Am I going to die soon?"
Attachments:
latin1.difftext/x-diff; charset=us-asciiDownload
diff --git a/doc/src/sgml/Makefile b/doc/src/sgml/Makefile
index a04c532b536..cffb06317f9 100644
--- a/doc/src/sgml/Makefile
+++ b/doc/src/sgml/Makefile
@@ -156,7 +156,9 @@ XSLTPROC_FO_FLAGS += --stringparam img.src.path '$(srcdir)/'
$(XSLTPROC) $(XMLINCLUDE) $(XSLTPROCFLAGS) $(XSLTPROC_FO_FLAGS) --stringparam paper.type USletter -o $@ $^
%.pdf: %.fo $(ALL_IMAGES)
- $(FOP) -fo $< -pdf $@
+ LANG=C $(FOP) -fo $< -pdf $@ 2>&1 | \
+ awk 'BEGIN { warn = 0 } { print } /not available in font/ { warn = 1 } \
+ END { if (warn != 0) print("\nFound characters that cannot be displayed in the PDF document") }' 1>&2
##
On Tue, Nov 26, 2024 at 02:04:15PM -0500, Bruce Momjian wrote:
On Tue, Nov 26, 2024 at 12:41:37PM -0500, Tom Lane wrote:
Bruce Momjian <bruce@momjian.us> writes:
On Tue, Nov 26, 2024 at 11:43:02AM -0500, Tom Lane wrote:
I don't think this patch is doing anything I want at all.
Gee, I kind of liked the patch, but maybe you didn't like the additional
complexity to check the PDF output twice, once on input (complex) and
once on output. The attached patch only does the output check.
It's still not doing anything I want at all. I'm with Tatsuo
on this: I do not want the makefiles deciding for me which
warnings are acceptable.
Okay, how about the attached patch that just prints the message at the
bottom, with no error. We could do this for all warnings, but I think
there are some we expect.
Patch applied. I added a mention of README.non-ASCII.
--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com
When a patient asks the doctor, "Am I going to die?", he means
"Am I going to die soon?"
On Tue, Nov 5, 2024 at 10:08:17AM +0100, Peter Eisentraut wrote:
On 02.11.24 14:18, Bruce Momjian wrote:
On Sat, Nov 2, 2024 at 12:02:12PM +0900, Tatsuo Ishii wrote:
Yes, we _allow_ LATIN1 characters in the SGML docs, but I replaced the
LATIN1 characters we had with HTML entities, so there are none
currently.
I think it is too easy for non-Latin1 UTF8 to creep into our SGML docs,
so I added a cron job on my server to alert me when non-ASCII characters
appear.
So you convert LATIN1 characters to HTML entities so that it's easier
to detect non-LATIN1 characters in the SGML docs? If my
understanding is correct, it can also be achieved by using some tools
like:
iconv -t ISO-8859-1 -f UTF-8 release-17.sgml
If there are some non-LATIN1 characters in release-17.sgml,
it will complain like:
iconv: illegal input sequence at position 175
An advantage of this is that we don't need to convert each LATIN1
character to HTML entities, and it makes the sgml file authors' life a
little bit easier.
I might have misread the feedback. I know people didn't want a Makefile
rule to prevent it, but I thought converting the few UTF8 characters we had was
acceptable. Let me think some more and come up with a patch.
The question of encoding characters as entities is orthogonal to the issue
of only allowing Unicode characters that have a mapping to Latin 1. This
patch seems to confuse these two issues, and I don't think it actually fixed
the second one, which is the one that was complained about. I don't think
anyone actually complained about the first one, which is the one that was
actually patched.
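To illustrate the distinction with a made-up example: "é" can be written in the
SGML either as a raw UTF-8 character or as the entity &eacute;, and either way
it maps to Latin 1, so the PDF fonts can show it; a character such as "漢" has
no Latin-1 mapping, so the PDF build will warn about it whether it is written
raw or as a numeric entity like &#x6F22;.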
Now that we have a warning about non-emittable characters in the PDF
build, do you want me to put back the Latin1 characters in the SGML
files or leave them as HTML entities?
--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com
When a patient asks the doctor, "Am I going to die?", he means
"Am I going to die soon?"
Bruce Momjian <bruce@momjian.us> writes:
Now that we have a warning about non-emittable characters in the PDF
build, do you want me to put back the Latin1 characters in the SGML
files or leave them as HTML entities?
I think going forward we're going to be putting in people's names
in UTF8 --- I was certainly planning to start doing that. It doesn't
matter that much what we do with existing cases, though.
regards, tom lane
On Mon, Dec 2, 2024 at 09:33:39PM -0500, Tom Lane wrote:
Bruce Momjian <bruce@momjian.us> writes:
Now that we have a warning about non-emittable characters in the PDF
build, do you want me to put back the Latin1 characters in the SGML
files or leave them as HTML entities?
I think going forward we're going to be putting in people's names
in UTF8 --- I was certainly planning to start doing that. It doesn't
Yes, I expected that, and added an item to my release checklist to make
a PDF file and check for the warning. I don't normally do that.
matter that much what we do with existing cases, though.
Okay, I think Peter had an opinion but I wasn't sure what it was.
--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com
When a patient asks the doctor, "Am I going to die?", he means
"Am I going to die soon?"
On 03.12.24 04:13, Bruce Momjian wrote:
On Mon, Dec 2, 2024 at 09:33:39PM -0500, Tom Lane wrote:
Bruce Momjian <bruce@momjian.us> writes:
Now that we have a warning about non-emittable characters in the PDF
build, do you want me to put back the Latin1 characters in the SGML
files or leave them as HTML entities?
I think going forward we're going to be putting in people's names
in UTF8 --- I was certainly planning to start doing that. It doesn't
Yes, I expected that, and added an item to my release checklist to make
a PDF file and check for the warning. I don't normally do that.
matter that much what we do with existing cases, though.
Okay, I think Peter had an opinion but I wasn't sure what it was.
I would prefer that the parts of commit 641a5b7a144 that replace
non-ASCII characters with entities are reverted.
On 26.11.24 20:04, Bruce Momjian wrote:
%.pdf: %.fo $(ALL_IMAGES)
- $(FOP) -fo $< -pdf $@
+ LANG=C $(FOP) -fo $< -pdf $@ 2>&1 | \
+ awk 'BEGIN { warn = 0 } { print } /not available in font/ { warn = 1 } \
+ END { if (warn != 0) print("\nFound characters that cannot be displayed in the PDF document") }' 1>&2
Wouldn't that lose the exit code from the fop execution?
On Tue, Dec 3, 2024 at 09:05:45PM +0100, Peter Eisentraut wrote:
On 26.11.24 20:04, Bruce Momjian wrote:
%.pdf: %.fo $(ALL_IMAGES)
- $(FOP) -fo $< -pdf $@
+ LANG=C $(FOP) -fo $< -pdf $@ 2>&1 | \
+ awk 'BEGIN { warn = 0 } { print } /not available in font/ { warn = 1 } \
+ END { if (warn != 0) print("\nFound characters that cannot be displayed in the PDF document") }' 1>&2
Wouldn't that lose the exit code from the fop execution?
Yikes, I think it would. Let me work on a fix now.
--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com
Do not let urgent matters crowd out time for investment in the future.
On Tue, Dec 3, 2024 at 09:03:37PM +0100, Peter Eisentraut wrote:
On 03.12.24 04:13, Bruce Momjian wrote:
On Mon, Dec 2, 2024 at 09:33:39PM -0500, Tom Lane wrote:
Bruce Momjian <bruce@momjian.us> writes:
Now that we have a warning about non-emittable characters in the PDF
build, do you want me to put back the Latin1 characters in the SGML
files or leave them as HTML entities?
I think going forward we're going to be putting in people's names
in UTF8 --- I was certainly planning to start doing that. It doesn't
Yes, I expected that, and added an item to my release checklist to make
a PDF file and check for the warning. I don't normally do that.
matter that much what we do with existing cases, though.
Okay, I think Peter had an opinion but I wasn't sure what it was.
I would prefer that the parts of commit 641a5b7a144 that replace non-ASCII
characters with entities are reverted.
Done.
--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com
Do not let urgent matters crowd out time for investment in the future.
On Tue, Dec 3, 2024 at 03:58:20PM -0500, Bruce Momjian wrote:
On Tue, Dec 3, 2024 at 09:05:45PM +0100, Peter Eisentraut wrote:
On 26.11.24 20:04, Bruce Momjian wrote:
%.pdf: %.fo $(ALL_IMAGES)
- $(FOP) -fo $< -pdf $@
+ LANG=C $(FOP) -fo $< -pdf $@ 2>&1 | \
+ awk 'BEGIN { warn = 0 } { print } /not available in font/ { warn = 1 } \
+ END { if (warn != 0) print("\nFound characters that cannot be displayed in the PDF document") }' 1>&2
Wouldn't that lose the exit code from the fop execution?
Yikes, I think it would. Let me work on a fix now.
Fixed in the attached applied patch. Glad you saw this mistake.
--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com
Do not let urgent matters crowd out time for investment in the future.
Attachments:
master.difftext/x-diff; charset=us-asciiDownload
diff --git a/doc/src/sgml/Makefile b/doc/src/sgml/Makefile
index 4a08b6f433e..9d52715ff4b 100644
--- a/doc/src/sgml/Makefile
+++ b/doc/src/sgml/Makefile
@@ -156,9 +156,11 @@ XSLTPROC_FO_FLAGS += --stringparam img.src.path '$(srcdir)/'
$(XSLTPROC) $(XMLINCLUDE) $(XSLTPROCFLAGS) $(XSLTPROC_FO_FLAGS) --stringparam paper.type USletter -o $@ $^
%.pdf: %.fo $(ALL_IMAGES)
- LANG=C $(FOP) -fo $< -pdf $@ 2>&1 | \
- awk 'BEGIN { warn = 0 } { print } /not available in font/ { warn = 1 } \
- END { if (warn != 0) print("\nFound characters that cannot be output in the PDF document; see README.non-ASCII") }' 1>&2
+ @# There is no easy way to pipe output and capture its return code, so output a special string on failure.
+ { LANG=C $(FOP) -fo $< -pdf $@ 2>&1; [ "$$?" -ne 0 ] && echo "FOP_ERROR"; } | \
+ awk 'BEGIN { warn = 0 } ! /^FOP_ERROR$$/ { print } /not available in font/ { warn = 1 } \
+ END { if (warn != 0) print("\nFound characters that cannot be output in the PDF document; see README.non-ASCII"); \
+ if ($$0 ~ /^FOP_ERROR$$/) { exit 1} }' 1>&2
##
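To see why the FOP_ERROR sentinel is needed: a pipeline's exit status is that of
its last command, so without it a fop failure would be hidden behind the awk
stage. A stripped-down shell illustration (not part of the patch):

false | awk '{ print }'; echo "exit status: $?"
{ false; [ "$?" -ne 0 ] && echo "FOP_ERROR"; } | \
  awk '! /^FOP_ERROR$/ { print } END { if ($0 ~ /^FOP_ERROR$/) exit 1 }'; echo "exit status: $?"

The first line reports 0 even though "false" failed; the second reports 1
because the sentinel line makes awk exit non-zero, mirroring what the applied
rule does with the fop exit code.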