Doc: typo in config.sgml
I think there's an unnecessary underscore in config.sgml.
Attached patch fixes it.
Best regards,
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp
Attachments:
fix_config.patch (text/x-patch; charset=iso-8859-1)
On Mon, 30 Sep 2024 15:34:04 +0900 (JST)
Tatsuo Ishii <ishii@postgresql.org> wrote:
> I think there's an unnecessary underscore in config.sgml.
> Attached patch fixes it.
I could not apply the patch; it failed with the following errors:
error: patch failed: doc/src/sgml/config.sgml:9380
error: doc/src/sgml/config.sgml: patch does not apply
I found that your patch contains an odd character (ASCII code 240?)
by running `od -c` on the file. See the attached file.
Regards,
Yugo Nagata
--
Yugo Nagata <nagata@sraoss.co.jp>
Attachments:
> I could not apply the patch; it failed with the following errors:
> error: patch failed: doc/src/sgml/config.sgml:9380
> error: doc/src/sgml/config.sgml: patch does not apply
Strange. I have no problem applying the patch here.
> I found that your patch contains an odd character (ASCII code 240?)
> by running `od -c` on the file. See the attached file.
Yes, 240 in octal (== 0xa0) is in the patch, but that is because the current
config.sgml includes the character. You can check it by looking at
line 9383 of config.sgml.
I think it was introduced by 28e858c0f95.
Best regards,
--
Tatsuo Ishii
On Mon, 30 Sep 2024 17:23:24 +0900 (JST)
Tatsuo Ishii <ishii@postgresql.org> wrote:
> [...] that is because the current
> config.sgml includes the character. You can check it by looking at
> line 9383 of config.sgml.
Yes, you are right. I can find the 0xc2 character in config.sgml using od -c,
although I still could not apply the patch.
I think this is a non-breaking space (C2 A0) in UTF-8. I guess my
terminal normally treats it as a regular space, so applying the patch fails.
I also found one in line 85 of ref/drop_extension.sgml.
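A quick way to confirm which bytes are involved is to dump a small sample. The sketch below is only an illustration: the file name nbsp_sample.txt is hypothetical, and the sample line is built with printf rather than taken from config.sgml.

```shell
# Hypothetical sample: "a", a UTF-8 non-breaking space (octal 302 240), "b".
printf 'a\302\240b\n' > nbsp_sample.txt

# od -c shows bytes it cannot print as three-digit octal numbers.
# LC_ALL=C keeps locale-aware od implementations from decoding
# the two bytes as a single character.
LC_ALL=C od -c nbsp_sample.txt

# The same bytes in hexadecimal, joined into one string.
nbsp_hex=$(od -An -tx1 nbsp_sample.txt | tr -d ' \n')
echo "$nbsp_hex"
```

In the octal dump the non-breaking space shows up as the pair 302 240; in the hexadecimal string the same pair reads c2a0.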
--
Yugo NAGATA <nagata@sraoss.co.jp>
> I think there's an unnecessary underscore in config.sgml.
I was wrong. The byte sequence just looked like an underscore
in my editor, but it is actually 0xc2a0, which must be a
"non-breaking space" encoded in UTF-8. I guess someone mistakenly
inserted a non-breaking space while editing config.sgml.
However, the mistake does not affect the patch.
Best regards,
--
Tatsuo Ishii
On Mon, 30 Sep 2024 18:03:44 +0900 (JST)
Tatsuo Ishii <ishii@postgresql.org> wrote:
> I was wrong. The byte sequence just looked like an underscore
> in my editor, but it is actually 0xc2a0, which must be a
> "non-breaking space" encoded in UTF-8. I guess someone mistakenly
> inserted a non-breaking space while editing config.sgml.
> However, the mistake does not affect the patch.
It looks like our mails crossed.
Anyway, I agree with removing the non-breaking spaces, including the
one found in line 85 of ref/drop_extension.sgml.
Regards,
Yugo Nagata
--
Yugo NAGATA <nagata@sraoss.co.jp>
On 30 Sep 2024, at 11:03, Tatsuo Ishii <ishii@postgresql.org> wrote:
> I was wrong. The byte sequence just looked like an underscore
> in my editor, but it is actually 0xc2a0, which must be a
> "non-breaking space" encoded in UTF-8. I guess someone mistakenly
> inserted a non-breaking space while editing config.sgml.
I wonder if it would be worth adding a check for this, like the one we have for tabs?
The attached adds a rule to "make -C doc/src/sgml check" for trapping nbsp
(doing so made me realize we don't have an equivalent meson target).
--
Daniel Gustafsson
Attachments:
check_nbsp.diff (application/octet-stream)
On Mon, 30 Sep 2024 11:59:48 +0200
Daniel Gustafsson <daniel@yesql.se> wrote:
> I wonder if it would be worth adding a check for this, like the one we have for tabs?
> The attached adds a rule to "make -C doc/src/sgml check" for trapping nbsp
> (doing so made me realize we don't have an equivalent meson target).
Your patch could not detect 0xA0 in config.sgml on my machine, but it works
when I use `grep -P "[\xA0]"` instead of `grep -e "\xA0"`.
However, it also detects the following line in charset.sgml
(https://www.postgresql.org/docs/current/collation.html):
For example, locale und-u-kb sorts 'àe' before 'aé'.
This is not a non-breaking space, so it should not be reported as an error.
Regards,
Yugo Nagata
--
Yugo Nagata <nagata@sraoss.co.jp>
> I wonder if it would be worth adding a check for this, like the one we have for tabs?
+1.
> Your patch could not detect 0xA0 in config.sgml on my machine, but it works
> when I use `grep -P "[\xA0]"` instead of `grep -e "\xA0"`.
> However, it also detects the following line in charset.sgml
> (https://www.postgresql.org/docs/current/collation.html):
> For example, locale und-u-kb sorts 'àe' before 'aé'.
> This is not a non-breaking space, so it should not be reported as an error.
That's because a non-breaking space (nbsp) is not encoded as 0xa0 in
UTF-8. In UTF-8, nbsp is the two-byte sequence "0xc2 0xa0" (0xa0 is the nbsp's code
point in Unicode, i.e. U+00A0).
So grep -P "[\xC2\xA0]" should work to detect nbsp.
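The two-byte encoding can be worked out by hand. The following is a minimal sketch, assuming only POSIX shell arithmetic; the function name utf8_two_bytes is hypothetical and it handles only code points in the range U+0080..U+07FF.

```shell
# Print the two UTF-8 bytes for a code point in the range U+0080..U+07FF.
# Two-byte UTF-8 has the bit layout 110xxxxx 10xxxxxx.
utf8_two_bytes() {
    cp=$(( $1 ))
    lead=$(( 0xC0 | (cp >> 6) ))      # 110xxxxx: upper bits of the code point
    cont=$(( 0x80 | (cp & 0x3F) ))    # 10xxxxxx: low 6 bits of the code point
    printf '%02x %02x\n' "$lead" "$cont"
}

utf8_two_bytes 0xA0   # U+00A0 NO-BREAK SPACE, prints "c2 a0"
utf8_two_bytes 0xE0   # U+00E0, the letter à, prints "c3 a0"
```

Both encodings end in the byte 0xa0, which would explain why a pattern that looks for 0xa0 alone also matches the charset.sgml line containing 'à'.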
Best regards,
--
Tatsuo Ishii
On Mon, 30 Sep 2024 20:07:31 +0900 (JST)
Tatsuo Ishii <ishii@postgresql.org> wrote:
> That's because a non-breaking space (nbsp) is not encoded as 0xa0 in
> UTF-8. In UTF-8, nbsp is the two-byte sequence "0xc2 0xa0" (0xa0 is the nbsp's code
> point in Unicode, i.e. U+00A0).
> So grep -P "[\xC2\xA0]" should work to detect nbsp.
`LC_ALL=C grep -P "\xC2\xA0"` works in my environment
([ and ] were not necessary).
When LC_ALL is not set, `grep -P "\xA0"` could not detect any characters in charset.sgml,
but I think it is better to specify both LC_ALL=C and "\xC2\xA0" to make sure that
nbsp is detected.
One problem is that the -P option is available only in GNU grep; grep on macOS does not support it.
In bash, we can also use `grep $'\xc2\xa0'`, but I am not sure we can assume the shell is bash.
Maybe a better way is to use Perl itself rather than grep, as follows:
`perl -ne '/\xC2\xA0/ and print' `
I attached a patch fixed in this way.
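For comparison, here is a sketch of one more way to build the byte pair without grep -P or bash's $'...' quoting. This is not what the attached patch does (it uses Perl); the function name find_nbsp and the sample file are hypothetical, and it assumes a grep that treats all bytes as text in the C locale.

```shell
# Print every line (with its line number) that contains the byte pair
# 0xC2 0xA0. printf with octal escapes is POSIX, so the pattern can be
# produced in any POSIX shell.
find_nbsp() {
    nbsp=$(printf '\302\240')
    LC_ALL=C grep -n "$nbsp" "$@"
}

# Hypothetical three-line sample: plain ASCII, a line with a
# non-breaking space, and a line starting with 'à' (bytes 0xC3 0xA0).
printf 'plain\na\302\240b\n\303\240e\n' > find_nbsp_sample.txt
find_nbsp find_nbsp_sample.txt
```

Only the second line of the sample is reported, because the pattern requires 0xC2 immediately before 0xA0.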
Regards,
Yugo Nagata
--
Yugo NAGATA <nagata@sraoss.co.jp>
Attachments:
v2_check_nbsp.diff (text/x-diff)
> Maybe a better way is to use Perl itself rather than grep, as follows:
> `perl -ne '/\xC2\xA0/ and print' `
> I attached a patch fixed in this way.
GNU sed can also be used without setting LC_ALL:
sed -n /"\xC2\xA0"/p
However, I am not sure whether non-GNU sed can do this too...
Best regards,
--
Tatsuo Ishii
> I also found one in line 85 of ref/drop_extension.sgml.
Thanks. I have pushed the fix for ref/drop_extension.sgml along with
config.sgml.
Best regards,
--
Tatsuo Ishii
On Tue, 01 Oct 2024 10:33:50 +0900 (JST)
Tatsuo Ishii <ishii@postgresql.org> wrote:
> GNU sed can also be used without setting LC_ALL:
> sed -n /"\xC2\xA0"/p
> However, I am not sure whether non-GNU sed can do this too...
Although I have not checked it myself, BSD sed does not support the \x escape according to [1].
By the way, I've attached a slightly modified patch that uses the plural form in the message,
the same as check-tabs does:
Non-breaking **spaces** appear in SGML/XML files
[1] https://stackoverflow.com/questions/24275070/sed-not-giving-me-correct-substitute-operation-for-newline-with-mac-difference
Regards,
Yugo Nagata
--
Yugo NAGATA <nagata@sraoss.co.jp>
Attachments:
v3_check_nbsp.diff (text/x-diff)
On Tue, 1 Oct 2024 15:16:52 +0900
Yugo NAGATA <nagata@sraoss.co.jp> wrote:
> By the way, I've attached a slightly modified patch that uses the plural form in the message,
> the same as check-tabs does:
> Non-breaking **spaces** appear in SGML/XML files
The previous patch was broken because the perl command failed to return the correct result.
I've attached an updated patch that fixes the return value. In passing, I added line breaks
to long lines.
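The return-value issue is easy to hit with any search tool, because "found a match" normally means success while a check has to fail in that case. The following sketch illustrates the inversion with grep; it is not the content of the attached patch, and the function name check_nbsp and the sample files are hypothetical.

```shell
# Succeed (return 0) only when no non-breaking space is present.
# grep's exit status is the opposite of what a check needs:
# 0 means "found", 1 means "not found", anything else is an error.
check_nbsp() {
    nbsp=$(printf '\302\240')
    status=0
    LC_ALL=C grep -n "$nbsp" "$@" || status=$?
    case $status in
        0) echo "Non-breaking spaces appear in SGML/XML files" >&2
           return 1 ;;
        1) return 0 ;;
        *) return 2 ;;
    esac
}

printf 'no special bytes here\n' > check_clean.txt
printf 'one\302\240here\n' > check_dirty.txt
check_nbsp check_clean.txt && echo "clean: ok"
check_nbsp check_dirty.txt || echo "dirty: check failed as intended"
```

Mapping grep's error status to a separate non-zero value keeps an unreadable file from being mistaken for a clean one.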
Regards,
Yugo Nagata
--
Yugo Nagata <nagata@sraoss.co.jp>
Attachments:
v4_check_nbsp.diff (text/x-diff)
On Tue, 1 Oct 2024 22:20:55 +0900
Yugo Nagata <nagata@sraoss.co.jp> wrote:
> The previous patch was broken because the perl command failed to return the correct result.
> I've attached an updated patch that fixes the return value. In passing, I added line breaks
> to long lines.
I've attached an updated patch.
I added a comment to explain why Perl is used instead of grep or sed.
Regards,
Yugo Nagata
--
Yugo NAGATA <nagata@sraoss.co.jp>
Attachments:
v5_check_nbsp.diff (text/x-diff)
On Mon, Sep 30, 2024 at 11:59:48AM +0200, Daniel Gustafsson wrote:
> I wonder if it would be worth adding a check for this, like the one we have for tabs?
> The attached adds a rule to "make -C doc/src/sgml check" for trapping nbsp
> (doing so made me realize we don't have an equivalent meson target).
Can we check for any character outside the supported range of SGML?
--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com
When a patient asks the doctor, "Am I going to die?", he means
"Am I going to die soon?"
> I've attached an updated patch.
> I added a comment to explain why Perl is used instead of grep or sed.
Looks good to me. If there's no objection, I will commit this to
the master branch.
Best regards,
--
Tatsuo Ishii
On 8 Oct 2024, at 02:03, Tatsuo Ishii <ishii@postgresql.org> wrote:
> Looks good to me. If there's no objection, I will commit this to
> the master branch.
No objections, LGTM.
--
Daniel Gustafsson
Hi Daniel, Yugo,
> No objections, LGTM.
Thank you for the patch and review! I have pushed the patch.
Best regards,
--
Tatsuo Ishii
On Mon, 7 Oct 2024 15:45:54 -0400
Bruce Momjian <bruce@momjian.us> wrote:
> Can we check for any character outside the supported range of SGML?
How can we define the range of characters allowed in SGML?
We can detect non-ASCII characters by using a regexp such as /\P{ascii}/ or /[^\x00-\x7f]/,
but such characters are used in some places in charset.sgml and in some names in release-*.sgml.
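To illustrate why a blanket non-ASCII check would produce false positives, the sketch below counts bytes outside the ASCII range in the charset.sgml sentence quoted earlier in this thread. The function name count_non_ascii is hypothetical, and the sentence is reconstructed with printf rather than read from the file.

```shell
# Count the bytes outside the ASCII range (0x80 and above) on standard input.
count_non_ascii() {
    LC_ALL=C tr -d '\000-\177' | wc -c | tr -d ' '
}

# Part of the sentence from charset.sgml, with à and é written as
# their UTF-8 bytes (0xC3 0xA0 and 0xC3 0xA9).
printf "sorts '\303\240e' before 'a\303\251'\n" | count_non_ascii
```

The sample contains no non-breaking space, yet it yields a count of 4, so a check based on "any non-ASCII byte" would flag this legitimate line.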
Regards,
Yugo Nagata
--
Yugo Nagata <nagata@sraoss.co.jp>