[PATCH] Fix docs to use canonical links
Hello hackers,
During work in the separate thread [1], I discovered more cases
where the link in docs wasn't the canonical link [2].
[1]: /messages/by-id/CAKFQuwYEX9Pj9G0ZHJeWSmSbnqUyGH+FYcW-66eZjfVG4KOjiQ@mail.gmail.com
[2]: https://en.wikipedia.org/wiki/Canonical_link_element
The below script, e.g., doesn't parse SGML, and is broken in some other ways
also, but is probably good enough to suggest changes that can then be manually
carefully verified.
```
#!/bin/bash
# Suggest canonical replacements for https:// URLs found in *.sgml files.
output_file="changes.log"
# Start with an empty log file.
> "$output_file"

# Print the canonical URL for $1, or $1 itself if none is found.
extract_canonical() {
    local url=$1
    local canonical
    canonical=$(curl -s "$url" | sed -n 's/.*<link rel="canonical" href="\([^"]*\)".*/\1/p')
    if [[ -n "$canonical" && "$canonical" != "$url" ]]; then
        echo "-$url" >> "$output_file"
        echo "+$canonical" >> "$output_file"
        echo "$canonical"
    else
        echo "$url"
    fi
}

find . -type f -name '*.sgml' | while read -r file; do
    # Crude extraction: at most one URL per line, no SGML parsing.
    urls=$(sed -n 's/.*\(https:\/\/[^"]*\).*/\1/p' "$file")
    for url in $urls; do
        canonical_url=$(extract_canonical "$url")
        if [[ "$canonical_url" != "$url" ]]; then
            # Replace the original URL with the canonical URL in the file.
            # Note: "-i ''" is the BSD/macOS form; GNU sed takes plain "-i".
            sed -i '' "s|$url|$canonical_url|g" "$file"
        fi
    done
done
```
Most of what it found was indeed correct, but I had to undo some mistakes it made.
All the changes in the attached patch have been manually verified, by clicking
the original link, and observing the URL seen in the browser.
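For anyone who wants to review suggestions before any file is touched, here is a minimal sketch of the extraction step as a dry run. The function names `canonical_from_html` and `suggest_change` are hypothetical and not part of the script above, and the sketch assumes that `rel` comes directly before `href` inside the `<link>` tag, which may not hold for every site.

```bash
#!/bin/bash
# Print the href of the first <link rel="canonical" ...> tag read from stdin.
# Assumption: rel="canonical" is immediately followed by href="..." in the tag.
canonical_from_html() {
    grep -o '<link rel="canonical" href="[^"]*"' \
        | head -n 1 \
        | sed 's/.*href="\([^"]*\)".*/\1/'
}

# Print a diff-like pair for a suggested change; print nothing when the
# canonical URL is empty or identical to the original. No file is edited.
suggest_change() {
    local url=$1
    local canonical=$2
    if [[ -n "$canonical" && "$canonical" != "$url" ]]; then
        printf -- '-%s\n+%s\n' "$url" "$canonical"
    fi
}

# Example use (needs network access, so it is left commented out):
#   url="https://en.wikipedia.org/wiki/Ascii"
#   suggest_change "$url" "$(curl -s "$url" | canonical_from_html)"
```

Keeping the parsing separate from the fetching means the extraction can be tried on a saved HTML file first, and every suggested pair can still be checked by hand as described above.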
/Joel
Attachments:
0001-Fix-docs-to-use-canonical-links.patch (application/octet-stream)
From b75af4afa73381ff3a152d2de3cbadd6be03756e Mon Sep 17 00:00:00 2001
From: Joel Jakobsson <github@compiler.org>
Date: Thu, 27 Jun 2024 11:24:12 +0200
Subject: [PATCH] Fix docs to use canonical links
---
doc/src/sgml/acronyms.sgml | 34 +++++++++++++++++-----------------
doc/src/sgml/isn.sgml | 2 +-
doc/src/sgml/sepgsql.sgml | 2 +-
3 files changed, 19 insertions(+), 19 deletions(-)
diff --git a/doc/src/sgml/acronyms.sgml b/doc/src/sgml/acronyms.sgml
index 6e64b190ea..4b8a4f4a93 100644
--- a/doc/src/sgml/acronyms.sgml
+++ b/doc/src/sgml/acronyms.sgml
@@ -41,7 +41,7 @@
<term><acronym>ASCII</acronym></term>
<listitem>
<para>
- <ulink url="https://en.wikipedia.org/wiki/Ascii">American Standard
+ <ulink url="https://en.wikipedia.org/wiki/ASCII">American Standard
Code for Information Interchange</ulink>
</para>
</listitem>
@@ -149,7 +149,7 @@
<term><acronym>DBMS</acronym></term>
<listitem>
<para>
- <ulink url="https://en.wikipedia.org/wiki/Dbms">Database Management
+ <ulink url="https://en.wikipedia.org/wiki/Database#Database_management_system">Database Management
System</ulink>
</para>
</listitem>
@@ -160,7 +160,7 @@
<listitem>
<para>
<ulink
- url="https://en.wikipedia.org/wiki/Data_Definition_Language">Data
+ url="https://en.wikipedia.org/wiki/Data_definition_language">Data
Definition Language</ulink>, SQL commands such as <command>CREATE
TABLE</command>, <command>ALTER USER</command>
</para>
@@ -172,7 +172,7 @@
<listitem>
<para>
<ulink
- url="https://en.wikipedia.org/wiki/Data_Manipulation_Language">Data
+ url="https://en.wikipedia.org/wiki/Data_manipulation_language">Data
Manipulation Language</ulink>, SQL commands such as <command>INSERT</command>,
<command>UPDATE</command>, <command>DELETE</command>
</para>
@@ -260,7 +260,7 @@
<listitem>
<para>
<ulink
- url="https://en.wikipedia.org/wiki/Git_(software)">Git</ulink>
+ url="https://en.wikipedia.org/wiki/Git">Git</ulink>
</para>
</listitem>
</varlistentry>
@@ -269,7 +269,7 @@
<term><acronym>GMT</acronym></term>
<listitem>
<para>
- <ulink url="https://en.wikipedia.org/wiki/GMT">Greenwich Mean Time</ulink>
+ <ulink url="https://en.wikipedia.org/wiki/Greenwich_Mean_Time">Greenwich Mean Time</ulink>
</para>
</listitem>
</varlistentry>
@@ -359,7 +359,7 @@
<term><acronym>ISSN</acronym></term>
<listitem>
<para>
- <ulink url="https://en.wikipedia.org/wiki/Issn">International Standard
+ <ulink url="https://en.wikipedia.org/wiki/ISSN">International Standard
Serial Number</ulink>
</para>
</listitem>
@@ -452,7 +452,7 @@
<listitem>
<para>
<ulink
- url="https://en.wikipedia.org/wiki/Visual_C++"><productname>Microsoft
+ url="https://en.wikipedia.org/wiki/Microsoft_Visual_C%2B%2B"><productname>Microsoft
Visual C</productname></ulink>
</para>
</listitem>
@@ -502,7 +502,7 @@
<term><acronym>OLAP</acronym></term>
<listitem>
<para>
- <ulink url="https://en.wikipedia.org/wiki/Olap">Online Analytical
+ <ulink url="https://en.wikipedia.org/wiki/Online_analytical_processing">Online Analytical
Processing</ulink>
</para>
</listitem>
@@ -512,7 +512,7 @@
<term><acronym>OLTP</acronym></term>
<listitem>
<para>
- <ulink url="https://en.wikipedia.org/wiki/OLTP">Online Transaction
+ <ulink url="https://en.wikipedia.org/wiki/Online_transaction_processing">Online Transaction
Processing</ulink>
</para>
</listitem>
@@ -522,7 +522,7 @@
<term><acronym>ORDBMS</acronym></term>
<listitem>
<para>
- <ulink url="https://en.wikipedia.org/wiki/ORDBMS">Object-Relational
+ <ulink url="https://en.wikipedia.org/wiki/Object%E2%80%93relational_database">Object-Relational
Database Management System</ulink>
</para>
</listitem>
@@ -533,7 +533,7 @@
<listitem>
<para>
<ulink
- url="https://en.wikipedia.org/wiki/Pluggable_Authentication_Modules">Pluggable
+ url="https://en.wikipedia.org/wiki/Pluggable_authentication_module">Pluggable
Authentication Modules</ulink>
</para>
</listitem>
@@ -600,7 +600,7 @@
<listitem>
<para>
<ulink
- url="https://en.wikipedia.org/wiki/Relational_database_management_system">Relational
+ url="https://en.wikipedia.org/wiki/Relational_database#RDBMS">Relational
Database Management System</ulink>
</para>
</listitem>
@@ -621,7 +621,7 @@
<term><acronym>SGML</acronym></term>
<listitem>
<para>
- <ulink url="https://en.wikipedia.org/wiki/SGML">Standard Generalized
+ <ulink url="https://en.wikipedia.org/wiki/Standard_Generalized_Markup_Language">Standard Generalized
Markup Language</ulink>
</para>
</listitem>
@@ -689,7 +689,7 @@
<term><acronym>SSL</acronym></term>
<listitem>
<para>
- <ulink url="https://en.wikipedia.org/wiki/Secure_Sockets_Layer">Secure Sockets Layer</ulink>
+ <ulink url="https://en.wikipedia.org/wiki/Transport_Layer_Security#SSL_1.0,_2.0,_and_3.0">Secure Sockets Layer</ulink>
</para>
</listitem>
</varlistentry>
@@ -708,7 +708,7 @@
<term><acronym>SYSV</acronym></term>
<listitem>
<para>
- <ulink url="https://en.wikipedia.org/wiki/System_V">Unix System V</ulink>
+ <ulink url="https://en.wikipedia.org/wiki/UNIX_System_V">Unix System V</ulink>
</para>
</listitem>
</varlistentry>
@@ -797,7 +797,7 @@
<term><acronym>UTF8</acronym></term>
<listitem>
<para>
- <ulink url="https://en.wikipedia.org/wiki/Utf8">Eight-Bit Unicode
+ <ulink url="https://en.wikipedia.org/wiki/UTF-8">Eight-Bit Unicode
Transformation Format</ulink>
</para>
</listitem>
diff --git a/doc/src/sgml/isn.sgml b/doc/src/sgml/isn.sgml
index ea2aabc87d..bd7a221f73 100644
--- a/doc/src/sgml/isn.sgml
+++ b/doc/src/sgml/isn.sgml
@@ -397,7 +397,7 @@ SELECT isbn13(id) FROM test;
The prefixes used for hyphenation were also compiled from:
<itemizedlist>
<listitem><para><ulink url="https://www.gs1.org/standards/id-keys"></ulink></para></listitem>
- <listitem><para><ulink url="https://en.wikipedia.org/wiki/List_of_ISBN_identifier_groups"></ulink></para></listitem>
+ <listitem><para><ulink url="https://en.wikipedia.org/wiki/List_of_ISBN_registration_groups"></ulink></para></listitem>
<listitem><para><ulink url="https://www.isbn-international.org/content/isbn-users-manual"></ulink></para></listitem>
<listitem><para><ulink url="https://en.wikipedia.org/wiki/International_Standard_Music_Number"></ulink></para></listitem>
<listitem><para><ulink url="https://www.ismn-international.org/ranges.html"></ulink></para></listitem>
diff --git a/doc/src/sgml/sepgsql.sgml b/doc/src/sgml/sepgsql.sgml
index 1b848f1977..bc308e3142 100644
--- a/doc/src/sgml/sepgsql.sgml
+++ b/doc/src/sgml/sepgsql.sgml
@@ -794,7 +794,7 @@ ERROR: SELinux: security policy violation
<title>External Resources</title>
<variablelist>
<varlistentry>
- <term><ulink url="https://wiki.postgresql.org/wiki/SEPostgreSQL">SE-PostgreSQL Introduction</ulink></term>
+ <term><ulink url="https://wiki.postgresql.org/wiki/SEPostgreSQL_Introduction">SE-PostgreSQL Introduction</ulink></term>
<listitem>
<para>
This wiki page provides a brief overview, security design, architecture,
--
2.45.1
On Thu, Jun 27, 2024 at 11:27:45AM +0200, Joel Jacobson wrote:
> During work in the separate thread [1], I discovered more cases
> where the link in docs wasn't the canonical link [2].
> [1] /messages/by-id/CAKFQuwYEX9Pj9G0ZHJeWSmSbnqUyGH+FYcW-66eZjfVG4KOjiQ@mail.gmail.com
> [2] https://en.wikipedia.org/wiki/Canonical_link_element
> The below script, e.g., doesn't parse SGML, and is broken in some other ways
> also, but is probably good enough to suggest changes that can then be manually
> carefully verified.
The 19 links you are updating here avoid redirections in Wikipedia and
the Postgres wiki. It's always a bit of a chicken-and-egg game in
this area, because links always change, still I don't mind the change.
Any opinions from others?
--
Michael
On 1 Jul 2024, at 08:06, Michael Paquier <michael@paquier.xyz> wrote:
> On Thu, Jun 27, 2024 at 11:27:45AM +0200, Joel Jacobson wrote:
>> During work in the separate thread [1], I discovered more cases
>> where the link in docs wasn't the canonical link [2].
>> [1] /messages/by-id/CAKFQuwYEX9Pj9G0ZHJeWSmSbnqUyGH+FYcW-66eZjfVG4KOjiQ@mail.gmail.com
>> [2] https://en.wikipedia.org/wiki/Canonical_link_element
>> The below script, e.g., doesn't parse SGML, and is broken in some other ways
>> also, but is probably good enough to suggest changes that can then be manually
>> carefully verified.
> The 19 links you are updating here avoid redirections in Wikipedia and
> the Postgres wiki. It's always a bit of a chicken-and-egg game in
> this area, because links always change, still I don't mind the change.
Avoiding redirects is generally a good thing, not everyone is on lightning fast
internet. Wikipedia is however not doing any 30X redirects so it's not really
an issue for those links, it's all 200 requests.
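The distinction between a 30X redirect and a plain 200 response can be checked from the command line. The following is a rough sketch, not something used in this thread: the helper names `http_status` and `is_http_redirect` are made up for illustration, and it relies on curl's `-w '%{http_code}'` output variable.

```bash
#!/bin/bash
# Succeed (exit status 0) when the given HTTP status code is a 3xx redirect.
is_http_redirect() {
    [[ $1 =~ ^3[0-9][0-9]$ ]]
}

# Print only the HTTP status code for a URL. curl does not follow
# redirects unless -L is given, so a 30X answer is reported as such.
http_status() {
    curl -s -o /dev/null -w '%{http_code}' "$1"
}

# Example use (needs network access, so it is left commented out):
#   status=$(http_status "https://en.wikipedia.org/wiki/Ascii")
#   if is_http_redirect "$status"; then
#       echo "HTTP redirect ($status)"
#   else
#       echo "no HTTP redirect ($status)"
#   fi
```

A 200 result for a non-canonical title would match the observation above that the page is served directly, with the canonical name only visible in the page itself or in the browser's address bar.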
--
Daniel Gustafsson
On Mon, Jul 1, 2024, at 09:35, Daniel Gustafsson wrote:
> Avoiding redirects is generally a good thing, not everyone is on lightning fast
> internet. Wikipedia is however not doing any 30X redirects so it's not really
> an issue for those links, it's all 200 requests.
Yes, I noticed that too when observing the HTTPS traffic, so no issue there,
except that it's a bit annoying that the address bar suddenly changes.
However, I think David J had another good argument:
"If we are making wikipedia our authority we might as well use their standard for naming."
/Joel
On 1 Jul 2024, at 13:09, Joel Jacobson <joel@compiler.org> wrote:
> On Mon, Jul 1, 2024, at 09:35, Daniel Gustafsson wrote:
>> Avoiding redirects is generally a good thing, not everyone is on lightning fast
>> internet. Wikipedia is however not doing any 30X redirects so it's not really
>> an issue for those links, it's all 200 requests.
> Yes, I noticed that too when observing the HTTPS traffic, so no issue there,
> except that it's a bit annoying that the address bar suddenly changes.
Right, I was unclear, I'm not advocating against changing. It won't move the
needle compared to 30X redirects but it also won't hurt.
> However, I think David J had another good argument:
> "If we are making wikipedia our authority we might as well use their standard for naming."
It's a moving target, but so are most, if not all, links.
--
Daniel Gustafsson
Daniel Gustafsson <daniel@yesql.se> writes:
> On 1 Jul 2024, at 13:09, Joel Jacobson <joel@compiler.org> wrote:
>> However, I think David J had another good argument:
>> "If we are making wikipedia our authority we might as well use their standard for naming."
> It's a moving target, but so are most, if not all, links.
I see nothing wrong with this patch, so pushed.
regards, tom lane