Converting README documentation to Markdown

Started by Daniel Gustafsson almost 2 years ago, 25 messages, hackers
#1 Daniel Gustafsson
daniel@yesql.se

Over in [0] I asked whether it would be worthwhile converting all our README
files to Markdown, and since it wasn't met with pitchforks I figured it would
be an interesting exercise to see what it would take (my honest gut feeling
was that it would be way too intrusive). Markdown does bring a few key
features however so IMHO it's worth attempting to see:

* New developers are very used to reading/writing it
* Using a defined format ensures some level of consistency
* Many users and contributors new *as well as* old like reading documentation
nicely formatted in a browser
* The documentation now prints really well
* pandoc et al. can be used to render nice looking PDFs
* All the same benefits as discussed in [0]

The plan was to follow Gruber's original motivation for Markdown closely:

"The idea is that a Markdown-formatted document should be publishable
as-is, as plain text, without looking like it’s been marked up with
tags or formatting instructions."

This translates to making the least amount of changes to achieve a) retained
plain text readability at today's level, b) proper Markdown rendering, not
looking like text files in an HTML window, and c) absolutely no reflows and
minimal impact on git blame.

Turns out we've been writing Markdown for quite some time, so it really didn't
take much at all. I renamed all the files to .md and, changing almost nothing but
whitespace, achieved what I think are pretty decent results. The rendered
versions can be seen by browsing the tree below:

https://github.com/danielgustafsson/postgres/tree/markdown

The whitespace changes are mostly making sure that code (anything which is to
be rendered without styling really) is indented from column 0 with tab or 4
spaces (depending on what was already used in the file) and has a blank line
before and after. This is the bulk of the changes. The non-whitespace changes
introduced are:

* Section/subsection markers: Basically all our files underline the main
section with ==== and subsections with ----. This renders perfectly well with
Markdown so add these to the few that didn't have them.

* The SSL readme starts a sentence with ">" which renders as quote, removing
that fixes rendering and makes the plain text version better IMHO.

* In the regex README there are two file references using * as a wildcard, but
the combination of the two makes Markdown render the text between them in
italics. Wrapping these in backticks solves it, but I'm not a fan since we
don't do that elsewhere. A solution which avoids backticks would be nice.

* Some bullet list characters are changed to match the syntax, which also makes
them more consistent with all the other README files in the tree. In one
case (SSL test readme) there were no bullets at all which is both
inconsistent and renders poorly.

* Anything inside <> is rendered as a link if it matches, so in cases where <X>
is used to indicate "replace with X" I added whitespace like "< X >" which
might be a bit ugly, but works. When referencing header files with <time.h>
the <> are removed to just say the header name, which seemed like the least bad
option there.

* Text quoted with backticks, like `foo' is replaced with 'foo' to keep it from
rendering like code.

* Rather than indenting the whole original README for bsd_indent I added ``` to
make it a code block, ie render without formatting.
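
As a rough illustration (not part of the patch), the three plain-text habits from the list above that trip up Markdown renderers could be flagged with a small lint pass. This is a minimal sketch: the function name `find_hazards` and the patterns are assumptions made for illustration, the heuristics are deliberately crude, and they will produce false positives (intentional `*emphasis*` is flagged too).

```python
import re

# All names and patterns below are hypothetical heuristics, not taken from the patch.
CODE_SPAN = re.compile(r"`[^`]*`")                # `...` spans are safe, ignore them
BULLET = re.compile(r"^\s*\*\s+")                 # a leading "* " is a list marker
BARE_STAR = re.compile(r"(?<!\\)\*")              # '*' not escaped with a backslash
ANGLE = re.compile(r"(?<!\\)<[A-Za-z][^<>\s]*>")  # <X> or <time.h>, but not "< X >"
GNU_QUOTE = re.compile(r"`[^`'\s]+'")             # `foo' style quoting


def find_hazards(line):
    """Return the names of the Markdown hazards found on one README line."""
    hazards = []
    if GNU_QUOTE.search(line):
        hazards.append("backtick quote")
    # Drop the bullet marker and any code spans before looking further.
    text = CODE_SPAN.sub("", BULLET.sub("", line))
    if len(BARE_STAR.findall(text)) >= 2:
        # Two bare asterisks on a line can pair up and italicize the text between.
        hazards.append("paired asterisks")
    if ANGLE.search(text):
        hazards.append("angle brackets")
    return hazards
```

A line with two unescaped wildcards is reported as "paired asterisks", while the same line with the names wrapped in backticks, or with the asterisks escaped as `\*`, comes back clean.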

The README files in doc/ are left untouched as they contain lots of <foo> XML
tags which all would need to be wrapped in backticks at the cost of plain text
readability. Might not be controversial and in that case they can be done too,
but I left them for now since they deviated from the least-changes-possible
plan for the patchset. It can probably be argued that lots of other READMEs
can be skipped as well, like all the ones in test modules which have 4 lines
saying the directory contains a test for the thing which the name of the
directory already gave away. For completeness I left those in though, they for
the most part go untouched.

It's not perfect by any stretch, there are still for example cases where a * in
the text turns on italic rendering which wasn't the intention of the author.
Resisting the temptation to go overboard with changes is however a design goal,
these are after all work documents and should be functional and practical.

In order to make review a bit easier I've split the patch into two, one for the
file renaming and one for the changes. Inspecting the 0002 diff by skipping
whitespace shows the above discussed changes.

Thoughts?

--
Daniel Gustafsson

[0]: 20240405000935.2zujjc5t5e2jai4k@awork3.anarazel.de
[1]: CAG6XLEmGE95DdKqjk+Dd9vC8mfN7BnV2WFgYk_9ovW6ikN0YSg@mail.gmail.com
[2]: https://daringfireball.net/projects/markdown/

Attachments:

v1-0002-Convert-internal-documentation-to-markdown-Conver.patch (application/octet-stream, +897 -795)
v1-0001-Convert-internal-documentation-to-Markdown-rename.patch (application/octet-stream, +0 -1)

#2 Erik Wienhold
ewie@ewie.name
In reply to: Daniel Gustafsson (#1)
Re: Converting README documentation to Markdown

On 2024-04-08 21:29 +0200, Daniel Gustafsson wrote:

Over in [0] I asked whether it would be worthwhile converting all our README
files to Markdown, and since it wasn't met with pitchforks I figured it would
be an interesting exercise to see what it would take (my honest gut feeling
was that it would be way too intrusive). Markdown does bring a few key
features however so IMHO it's worth attempting to see:

* New developers are very used to reading/writing it
* Using a defined format ensures some level of consistency
* Many users and contributors new *as well as* old like reading documentation
nicely formatted in a browser
* The documentation now prints really well
* pandoc et al. can be used to render nice looking PDFs
* All the same benefits as discussed in [0]

The plan was to follow Gruber's original motivation for Markdown closely:

"The idea is that a Markdown-formatted document should be publishable
as-is, as plain text, without looking like it’s been marked up with
tags or formatting instructions."

+1 for keeping the plaintext readable.

This translates to making the least amount of changes to achieve a) retained
plain text readability at today's level, b) proper Markdown rendering, not
looking like text files in an HTML window, and c) absolutely no reflows and
minimal impact on git blame.

Turns out we've been writing Markdown for quite some time, so it really didn't
take much at all. I renamed all the files to .md and, changing almost nothing but
whitespace, achieved what I think are pretty decent results. The rendered
versions can be seen by browsing the tree below:

https://github.com/danielgustafsson/postgres/tree/markdown

The whitespace changes are mostly making sure that code (anything which is to
be rendered without styling really) is indented from column 0 with tab or 4
spaces (depending on what was already used in the file) and has a blank line
before and after. This is the bulk of the changes.

I've only peeked at a couple of those READMEs, but they look alright so
far (at least on GitHub). Should we settle on a specific Markdown
flavor[1]? Because I'm never sure if some markups only work on
specific code-hosting sites. Maybe also a guide on writing Markdown
that renders properly, especially with regard to escaping that may be
necessary (see below).

The non-whitespace changes introduced are:

[...]

* In the regex README there are two file references using * as a wildcard, but
the combination of the two makes Markdown render the text between them in
italics. Wrapping these in backticks solves it, but I'm not a fan since we
don't do that elsewhere. A solution which avoids backticks would be nice.

Escaping does the trick: regc_\*.c

[...]

* Anything inside <> is rendered as a link if it matches, so in cases where <X>
is used to indicate "replace with X" I added whitespace like "< X >" which
might be a bit ugly, but works. When referencing header files with <time.h>
the <> are removed to just say the header name, which seemed like the least bad
option there.

Can be escaped as well: \<X>

[1]: https://markdownguide.offshoot.io/extended-syntax/#lightweight-markup-languages

--
Erik

#3 Daniel Gustafsson
daniel@yesql.se
In reply to: Erik Wienhold (#2)
Re: Converting README documentation to Markdown

On 8 Apr 2024, at 22:30, Erik Wienhold <ewie@ewie.name> wrote:
On 2024-04-08 21:29 +0200, Daniel Gustafsson wrote:

I've only peeked at a couple of those READMEs, but they look alright so
far (at least on GitHub). Should we settle on a specific Markdown
flavor[1]? Because I'm never sure if some markups only work on
specific code-hosting sites.

Probably, but if we strive for maintained textual readability while avoiding
most of the creative markup then we're probably close to the original version.
But I agree, it should be evaluated.

Maybe also a guide on writing Markdown
that renders properly, especially with regard to escaping that may be
necessary (see below).

That's a good point, if we opt for an actual format there should be some form
of documentation about that format, especially if we settle for using a
fraction of the capabilities of the format.

* In the regex README there are two file references using * as a wildcard, but
the combination of the two makes Markdown render the text between them in
italics. Wrapping these in backticks solves it, but I'm not a fan since we
don't do that elsewhere. A solution which avoids backticks would be nice.

Escaping does the trick: regc_\*.c

Right, but that makes the plaintext version less readable than the backticks I
think.

Can be escaped as well: \<X>

..and same with this one. It's all very subjective though.

--
Daniel Gustafsson

#4 Peter Eisentraut
peter_e@gmx.net
In reply to: Daniel Gustafsson (#1)
Re: Converting README documentation to Markdown

On 08.04.24 21:29, Daniel Gustafsson wrote:

Over in [0] I asked whether it would be worthwhile converting all our README
files to Markdown, and since it wasn't met with pitchforks I figured it would
be an interesting exercise to see what it would take (my honest gut feeling
was that it would be way too intrusive). Markdown does bring a few key
features however so IMHO it's worth attempting to see:

* New developers are very used to reading/writing it
* Using a defined format ensures some level of consistency
* Many users and contributors new *as well as* old like reading documentation
nicely formatted in a browser
* The documentation now prints really well
* pandoc et al. can be used to render nice looking PDFs
* All the same benefits as discussed in [0]

The plan was to follow Gruber's original motivation for Markdown closely:

"The idea is that a Markdown-formatted document should be publishable
as-is, as plain text, without looking like it’s been marked up with
tags or formatting instructions."

This translates to making the least amount of changes to achieve a) retained
plain text readability at today's level, b) proper Markdown rendering, not
looking like text files in an HTML window, and c) absolutely no reflows and
minimal impact on git blame.

I started looking through this and immediately found a bunch of tiny
problems. (This is probably in part because the READMEs under
src/backend/access/ are some of the more complicated ones, but then they
are also the ones that might benefit most from better rendering.)

One general problem is that original Markdown and GitHub-flavored
Markdown (GFM) are incompatible in some interesting aspects. For
example, the line

A split initially marks the left page with the F_FOLLOW_RIGHT flag.

is rendered by GFM as you'd expect. But original Markdown converts it to

A split initially marks the left page with the F<em>FOLLOW</em>RIGHT
flag.

This kind of problem is pervasive, as you'd expect.
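
To make the difference concrete, here is a minimal sketch of the two emphasis behaviours. The first pattern is modelled on the emphasis substitution in the original Markdown.pl; the second is only a rough stand-in for GFM's handling of intraword underscores, not its actual algorithm. The function names are made up for this illustration.

```python
import re

# Modelled on original Markdown.pl: any '*' or '_' pair with non-space text
# directly inside it becomes <em>, regardless of word boundaries.
ORIGINAL_EM = re.compile(r"(\*|_)(?=\S)(.+?)(?<=\S)\1")

# Rough stand-in for GFM (an assumption, not the real flanking rules):
# an underscore touching a letter or digit on its outer side is left alone.
INTRAWORD_SAFE_EM = re.compile(
    r"(?<![A-Za-z0-9])_(?=\S)(.+?)(?<=\S)_(?![A-Za-z0-9])"
)


def render_original(text):
    """Apply the original-Markdown style emphasis rule."""
    return ORIGINAL_EM.sub(r"<em>\2</em>", text)


def render_gfm_like(text):
    """Apply the intraword-safe approximation of the GFM rule."""
    return INTRAWORD_SAFE_EM.sub(r"<em>\1</em>", text)
```

With these definitions, `render_original` turns `F_FOLLOW_RIGHT` into `F<em>FOLLOW</em>RIGHT`, while `render_gfm_like` leaves it untouched and still converts a standalone `_word_`.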

Another incompatibility is that GFM accepts "1)" as a list marker (which
appears to be used often in the READMEs), but original Markdown does
not. This then also affects surrounding formatting.
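
The list-marker difference can be sketched the same way. The patterns below are simplified assumptions: the first follows the ordered-list marker of the original Markdown.pl (digits followed by a period), the second follows the CommonMark rule that GFM builds on (one to nine digits followed by a period or a closing parenthesis). Nesting and continuation lines are ignored.

```python
import re

# Simplified marker patterns, assumed for illustration only.
ORIGINAL_OL = re.compile(r"^ {0,3}\d+\.[ \t]+")
COMMONMARK_OL = re.compile(r"^ {0,3}\d{1,9}[.)][ \t]+")


def is_ordered_item(line, flavor):
    """Return True if the line starts an ordered list item in the given flavor."""
    pattern = ORIGINAL_OL if flavor == "original" else COMMONMARK_OL
    return pattern.match(line) is not None
```

A line starting with `1) ` is a list item only under the second pattern, which is why the surrounding paragraphs end up formatted differently between the two flavors.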

Also, the READMEs often do not indent lists in a non-ambiguous way. For
example, if you look into src/backend/optimizer/README, section "Join
Tree Construction", there are two list items, but it's not immediately
clear which paragraphs belong to the list and which ones follow the
list. This also interacts with the previous point. The resulting
formatting in GFM is quite misleading.

src/port/README.md is a similar case.

There are also various places where whitespace is used for ad-hoc
formatting. Consider for example in src/backend/access/gin/README

the "category" of the null entry. These are the possible categories:

1 = ordinary null key value extracted from an indexable item
2 = placeholder for zero-key indexable item
3 = placeholder for null indexable item

Placeholder null entries are inserted into the index because otherwise

But this does not preserve the list-like formatting, it just flows it
together.

There is a similar case with the authors list at the end of
src/backend/access/gist/README.md.

src/test/README.md wasn't touched by your patch, but it also needs
adjustments for list formatting.

In summary, I think before we could accept this, we'd need to go through
this with a fine-toothed comb line by line and page by page to make sure
the formatting is still sound. And we'd need to figure out which
Markdown flavor to target.

#5 Daniel Gustafsson
daniel@yesql.se
In reply to: Peter Eisentraut (#4)
Re: Converting README documentation to Markdown

On 13 May 2024, at 09:20, Peter Eisentraut <peter@eisentraut.org> wrote:

I started looking through this and immediately found a bunch of tiny problems. (This is probably in part because the READMEs under src/backend/access/ are some of the more complicated ones, but then they are also the ones that might benefit most from better rendering.)

Thanks for looking!

One general problem is that original Markdown and GitHub-flavored Markdown (GFM) are incompatible in some interesting aspects.

That's true, but virtually every implementation of Markdown in practical use
today is incompatible with Original Markdown.

Reading my email I realize I failed to mention the markdown platforms I was
targeting (and thus flavours), and citing Gruber made it even more confusing.
For online reading I verified with Github and VS Code since they have a huge
market presence. For offline work I targeted rendering with pandoc since we
already have a dependency on it in the tree. I don't think targeting the
original Markdown implementation is useful, or even realistic.

Another aspect of platform/flavour was to make the markdown version easy to
maintain for hackers writing content. Requiring the minimum amount of markup
seems like the developer-friendly way here to keep productivity as well as
document quality high.

Most importantly though, I targeted reading the files as plain text without any
rendering. We keep these files in text format close to the code for a reason,
and maintaining readability as text was a north star.

For example, the line

A split initially marks the left page with the F_FOLLOW_RIGHT flag.

is rendered by GFM as you'd expect. But original Markdown converts it to

A split initially marks the left page with the F<em>FOLLOW</em>RIGHT
flag.

This kind of problem is pervasive, as you'd expect.

Correct, but I can't imagine that we'd like to wrap every instance of a name
with underscores in backticks like `F_FOLLOW_RIGHT`. There are very few
Markdown implementations which don't support underscores like this (testing
just now on the top online editors and sites providing markdown editing I
failed to find a single one).

Also, the READMEs often do not indent lists in a non-ambiguous way. For example, if you look into src/backend/optimizer/README, section "Join Tree Construction", there are two list items, but it's not immediately clear which paragraphs belong to the list and which ones follow the list. This also interacts with the previous point. The resulting formatting in GFM is quite misleading.

I agree that the rendered version exacerbates this problem. Writing a bullet
point list where each item spans multiple paragraphs indented the same way as
the paragraphs following the list is not helpful to the reader. In these cases
both the markdown and the text version will be improved by indentation.

There are also various places where whitespace is used for ad-hoc formatting. Consider for example in src/backend/access/gin/README

the "category" of the null entry. These are the possible categories:

1 = ordinary null key value extracted from an indexable item
2 = placeholder for zero-key indexable item
3 = placeholder for null indexable item

Placeholder null entries are inserted into the index because otherwise

But this does not preserve the list-like formatting, it just flows it together.

That's the kind of sublists which need to be found as part of this work, and
the items prefixed with a list identifier. In this case, prefixing each row in
the sublist with '-' yields the correct result.
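
A minimal sketch of that fix, assuming the rows always have the shape "N = description" as in the gin README excerpt above. The helper name `bullet_rows` is hypothetical, and the indentation in the example is assumed since the archive does not preserve it.

```python
import re

# Matches rows such as "1 = ordinary null key value ...", keeping the indent.
ROW = re.compile(r"^(\s*)(\d+ = .*)$")


def bullet_rows(lines):
    """Prefix each 'N = text' row with '- ' so Markdown renders it as a list."""
    return [ROW.sub(r"\1- \2", line) for line in lines]
```

Lines that do not match the pattern pass through unchanged, so the surrounding paragraphs are not touched and the plain-text version stays readable.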

src/test/README.md wasn't touched by your patch, but it also needs adjustments for list formatting.

I didn't re-indent that one in order to keep the changes to the absolute
minimum, since I considered the rendered version passable even if not
particularly good. Re-indenting files like this will for sure make the end
result better, as long as the changes keep the text version readability.

In summary, I think before we could accept this, we'd need to go through this with a fine-toothed comb line by line and page by page to make sure the formatting is still sound.

Absolutely. I've been over every file to ensure they aren't blatantly wrong,
but I didn't want to spend the time if this was immediately shot down as
something the community don't want to maintain.

And we'd need to figure out which Markdown flavor to target.

Absolutely, and as I mentioned above, we need to pick based on both the final
result (text and rendered) as well as the developer experience for maintaining
this.

--
Daniel Gustafsson

#6 Peter Eisentraut
peter_e@gmx.net
In reply to: Daniel Gustafsson (#5)
Re: Converting README documentation to Markdown

On 15.05.24 14:26, Daniel Gustafsson wrote:

Another aspect of platform/flavour was to make the markdown version easy to
maintain for hackers writing content. Requiring the minimum amount of markup
seems like the developer-friendly way here to keep productivity as well as
document quality high.

Most importantly though, I targeted reading the files as plain text without any
rendering. We keep these files in text format close to the code for a reason,
and maintaining readability as text was a north star.

I've been thinking about this some more. I think the most value here
would be to just improve the plain-text formatting, so that there are
consistent list styles, header styles, indentation, some of the
ambiguities cleared up -- much of which your 0001 patch does. You might
as well be targeting markdown-like conventions with this; they are
mostly reasonable.

I tend to think that actually converting all the README files to
README.md could be a net negative for maintainability. Because now you
are requiring everyone who potentially wants to edit those to be aware
of Markdown syntax and manually check the rendering. With things like
DocBook, if you make a mess, you get error messages from the build step.
If you make a mess in Markdown, you have to visually find it yourself.
There are many READMEs that contain nested lists and code snippets and
diagrams and such all mixed together. Getting that right in Markdown
can be quite tricky. I'm also foreseeing related messes of trailing
whitespace, spaces-vs-tab confusion, gitattributes violations, etc. It
can be a lot of effort. It's okay to do this for prominent files like
the top-level one, but I suggest that for the rest we can keep it simple
and just use plain text.

#7 Jelte Fennema-Nio
postgres@jeltef.nl
In reply to: Peter Eisentraut (#6)
Re: Converting README documentation to Markdown

On Fri, 28 Jun 2024 at 09:38, Peter Eisentraut <peter@eisentraut.org> wrote:

Getting that right in Markdown can be quite tricky.

I agree that in some cases it's tricky. But what's the worst case that
can happen when you get it wrong? It renders weird on github.com.
Luckily there's a "code" button to go to the plain text format[1]. In
all other cases (which I expect will be most) the doc will be easier
to read. Forcing plaintext, just because sometimes we might make a
mistake in the syntax seems like an overcorrection imho. Especially
because these docs are (hopefully) read more often than written.

[1]: https://github.com/postgres/postgres/blob/master/README.md?plain=1

#8 Tatsuo Ishii
t-ishii@sra.co.jp
In reply to: Peter Eisentraut (#4)
Re: Converting README documentation to Markdown

I've been thinking about this some more. I think the most value here
would be to just improve the plain-text formatting, so that there are
consistent list styles, header styles, indentation, some of the
ambiguities cleared up -- much of which your 0001 patch does. You
might as well be targeting markdown-like conventions with this; they
are mostly reasonable.

I tend to think that actually converting all the README files to
README.md could be a net negative for maintainability. Because now
you are requiring everyone who potentially wants to edit those to be
aware of Markdown syntax and manually check the rendering. With
things like DocBook, if you make a mess, you get error messages from
the build step. If you make a mess in Markdown, you have to visually
find it yourself. There are many READMEs that contain nested lists
and code snippets and diagrams and such all mixed together. Getting
that right in Markdown can be quite tricky. I'm also foreseeing
related messes of trailing whitespace, spaces-vs-tab confusion,
gitattributes violations, etc. It can be a lot of effort. It's okay
to do this for prominent files like the top-level one, but I suggest
that for the rest we can keep it simple and just use plain text.

Agreed.

Best regards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp

#9 Peter Eisentraut
peter_e@gmx.net
In reply to: Jelte Fennema-Nio (#7)
Re: Converting README documentation to Markdown

On 28.06.24 11:56, Jelte Fennema-Nio wrote:

On Fri, 28 Jun 2024 at 09:38, Peter Eisentraut <peter@eisentraut.org> wrote:

Getting that right in Markdown can be quite tricky.

I agree that in some cases it's tricky. But what's the worst case that
can happen when you get it wrong? It renders weird on github.com.

I have my "less" set up so that "less somefile.md" automatically renders
the markdown. That's been pretty useful. But if that now keeps making
a mess out of PostgreSQL's README files, then I'm going to have to keep
fixing things, and I might get really mad. That's the worst that could
happen. ;-)

So I don't agree with "aspirational markdown". If we're going to do it,
then I expect that the files are marked up correctly at all times.

Conversely, what's the best that could happen?

#10 Jelte Fennema-Nio
postgres@jeltef.nl
In reply to: Peter Eisentraut (#9)
Re: Converting README documentation to Markdown

On Fri, 28 Jun 2024 at 20:40, Peter Eisentraut <peter@eisentraut.org> wrote:

I have my "less" set up so that "less somefile.md" automatically renders
the markdown. That's been pretty useful. But if that now keeps making
a mess out of PostgreSQL's README files, then I'm going to have to keep
fixing things, and I might get really mad. That's the worst that could
happen. ;-)

Do you have reason to think that this is going to be a bigger issue
for Postgres READMEs than for any other markdown files you encounter?
Because this sounds like a generic problem you'd run into with your
"less" set up, which so far apparently has been small enough that it's
worth the benefit of automatically rendering markdown files.

So I don't agree with "aspirational markdown". If we're going to do it,
then I expect that the files are marked up correctly at all times.

I think for at least ~90% of our README files this shouldn't be a
problem. If you have specific ones in mind that contain difficult
markup/diagrams, then maybe we shouldn't convert those.

Conversely, what's the best that could happen?

That your "less" would automatically render Postgres READMEs nicely.
Which you say has been pretty useful ;-) And maybe even show syntax
highlighting for codeblocks.

P.S. Now I'm wondering what your "less" is.

#11 Daniel Gustafsson
daniel@yesql.se
In reply to: Peter Eisentraut (#9)
Re: Converting README documentation to Markdown

On 28 Jun 2024, at 20:40, Peter Eisentraut <peter@eisentraut.org> wrote:

If we're going to do it, then I expect that the files are marked up correctly at all times.

I agree with that. I don't think it will be a terribly high bar though since we
were pretty much already writing markdown. We already have pandoc in the meson
toolchain, adding a target to check syntax should be doable.

Conversely, what's the best that could happen?

One of the main goals of this work was to make sure the documentation renders
nicely on platforms which potential new contributors consider part of the
fabric of writing code. We might not be on Github (and I'm not advocating that
we should) but any new contributor we want to attract is pretty likely to be
using it. The best that can happen is that new contributors find the postgres
code more approachable and get excited about contributing to postgres.

--
Daniel Gustafsson

#12 Daniel Gustafsson
daniel@yesql.se
In reply to: Peter Eisentraut (#6)
Re: Converting README documentation to Markdown

On 28 Jun 2024, at 09:38, Peter Eisentraut <peter@eisentraut.org> wrote:

I've been thinking about this some more. I think the most value here would be to just improve the plain-text formatting, so that there are consistent list styles, header styles, indentation, some of the ambiguities cleared up -- much of which your 0001 patch does. You might as well be targeting markdown-like conventions with this; they are mostly reasonable.

(I assume you mean 0002). I agree that the increased consistency is worthwhile
even if we don't officially convert to Markdown (ie only do 0002 and not 0001).

I tend to think that actually converting all the README files to README.md could be a net negative for maintainability. Because now you are requiring everyone who potentially wants to edit those to be aware of Markdown syntax

Fair enough, but we currently expect those editing to be aware of our syntax
which isn't defined at all (leading to the variations this patchset fixes).
I'm not sure what's best for maintainability but I don't think the net change is
all that big.

and manually check the rendering.

That however would be a new requirement, and I can see that being a deal-
breaker for introducing this.

Attached is a v2 which fixes a conflict, if there is no interest in Markdown
I'll drop 0001 and the markdown-specifics from 0002 to instead target increased
consistency.

--
Daniel Gustafsson

Attachments:

v2-0001-Convert-internal-documentation-to-Markdown-rename.patch (application/octet-stream, +0 -1)
v2-0002-Convert-internal-documentation-to-markdown-Conver.patch (application/octet-stream, +897 -795)

#13 Daniel Gustafsson
daniel@yesql.se
In reply to: Daniel Gustafsson (#12)
Re: Converting README documentation to Markdown

On 1 Jul 2024, at 12:22, Daniel Gustafsson <daniel@yesql.se> wrote:

Attached is a v2 which fixes a conflict, if there is no interest in Markdown
I'll drop 0001 and the markdown-specifics from 0002 to instead target increased
consistency.

Since there doesn't seem to be much interest in going all the way to Markdown,
the attached 0001 is just the formatting changes for achieving (to some degree)
consistency among the README's. This mostly boils down to using a consistent
amount of whitespace around code, using the same indentation on bullet lists
and starting sections the same way. Inspecting the patch with git diff -w
reveals that it's not much left once whitespace is ignored. There might be a
few markdown hunks left which I'll hunt down in case anyone is interested in
this.
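
For readers without the patch at hand, the idea behind inspecting with git diff -w can be approximated in a few lines: compare the old and new text line by line with all whitespace removed, and report only what still differs. This is a sketch of the idea, not a reimplementation of git, and the function names are invented for the example.

```python
import difflib
import re


def squash(line):
    """Remove every whitespace character from a line before comparing."""
    return re.sub(r"\s+", "", line)


def non_whitespace_changes(old, new):
    """Return (tag, old_lines, new_lines) for hunks that differ beyond whitespace."""
    old_lines = old.splitlines()
    new_lines = new.splitlines()
    matcher = difflib.SequenceMatcher(
        a=[squash(line) for line in old_lines],
        b=[squash(line) for line in new_lines],
        autojunk=False,
    )
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            changes.append((tag, old_lines[i1:i2], new_lines[j1:j2]))
    return changes
```

Re-indenting a code sample or a bullet list yields an empty result, whereas an added blank line or a changed quote character still shows up as a change.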

As an added bonus this still makes most READMEs render nicely as Markdown, just
not automatically on Github as it doesn't know the filetype.

--
Daniel Gustafsson

Attachments:

v3-0001-Standardize-syntax-in-internal-documentation.patch (application/octet-stream, +1134 -1032)

#14 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Daniel Gustafsson (#13)
Re: Converting README documentation to Markdown

Daniel Gustafsson <daniel@yesql.se> writes:

Since there doesn't seem to be much interest in going all the way to Markdown,
the attached 0001 is just the formatting changes for achieving (to some degree)
consistency among the README's. This mostly boils down to using a consistent
amount of whitespace around code, using the same indentation on bullet lists
and starting sections the same way. Inspecting the patch with git diff -w
reveals that it's not much left once whitespace is ignored. There might be a
few markdown hunks left which I'll hunt down in case anyone is interested in
this.

As an added bonus this still makes most READMEs render nicely as Markdown, just
not automatically on Github as it doesn't know the filetype.

I did not inspect the patch in detail, but this approach seems
like a reasonable compromise. However, if we're not officially
going to Markdown, how likely is it that these files will
stay valid in future edits? I suspect most of us don't have
those syntax rules wired into our fingers (I sure don't).

regards, tom lane

#15 Daniel Gustafsson
daniel@yesql.se
In reply to: Tom Lane (#14)
Re: Converting README documentation to Markdown

On 10 Sep 2024, at 17:37, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Daniel Gustafsson <daniel@yesql.se> writes:

Since there doesn't seem to be much interest in going all the way to Markdown,
the attached 0001 is just the formatting changes for achieving (to some degree)
consistency among the README's. This mostly boils down to using a consistent
amount of whitespace around code, using the same indentation on bullet lists
and starting sections the same way. Inspecting the patch with git diff -w
reveals that it's not much left once whitespace is ignored. There might be a
few markdown hunks left which I'll hunt down in case anyone is interested in
this.

As an added bonus this still makes most READMEs render nicely as Markdown, just
not automatically on Github as it doesn't know the filetype.

I did not inspect the patch in detail, but this approach seems
like a reasonable compromise. However, if we're not officially
going to Markdown, how likely is it that these files will
stay valid in future edits? I suspect most of us don't have
those syntax rules wired into our fingers (I sure don't).

I'm not too worried, especially since we're not making any guarantees about
conforming to a set syntax. We had written more or less correct Markdown
already, if we continue to create new content in the style of the surrounding
existing content then I'm confident they'll stay very close to markdown.

--
Daniel Gustafsson

#16Robert Haas
robertmhaas@gmail.com
In reply to: Daniel Gustafsson (#13)
Re: Converting README documentation to Markdown

On Tue, Sep 10, 2024 at 8:51 AM Daniel Gustafsson <daniel@yesql.se> wrote:

Since there doesn't seem to be much interest in going all the way to Markdown,

Just for the record, I suspect going to Markdown is actually the right
thing to do. I am personally unenthusiastic about it because I need
one more thing to worry about when committing like I need a hole in my
head, but a chronic complaint about the PostgreSQL project is that we
insist on doing everything our own way instead of admitting that there
is significant value in conforming to, or at least being compatible
with, widely-adopted development practices, and using Markdown files
to document stuff in git repos seems to be one of those. No single
change that we make is going to make the difference between us
attracting the next generation of developers and not, but if we always
prioritize what feels good to people who learned to code in the 1970s
or 1980s (like me!) over what feels good to people who learned to code
in the 2010s or 2020s, we will definitely run out of developers at
some point.

--
Robert Haas
EDB: http://www.enterprisedb.com

#17Peter Eisentraut
peter_e@gmx.net
In reply to: Daniel Gustafsson (#13)
Re: Converting README documentation to Markdown

On 10.09.24 14:50, Daniel Gustafsson wrote:

On 1 Jul 2024, at 12:22, Daniel Gustafsson <daniel@yesql.se> wrote:

Attached is a v2 which fixes a conflict, if there is no interest in Markdown
I'll drop 0001 and the markdown-specifics from 0002 to instead target increased
consistency.

Since there doesn't seem to be much interest in going all the way to Markdown,
the attached 0001 is just the formatting changes for achieving (to some degree)
consistency among the README's. This mostly boils down to using a consistent
amount of whitespace around code, using the same indentation on bullet lists
and starting sections the same way. Inspecting the patch with git diff -w
reveals that it's not much left once whitespace is ignored. There might be a
few markdown hunks left which I'll hunt down in case anyone is interested in
this.

I went through this file by file and checked the results of a
markdown-to-HTML conversion using cmark and looking at the raw output
source files.

A lot of the changes are obvious and make sense. But there are a number
of cases of code within lists or nested lists or both that need further
careful investigation. I'm attaching a fixup patch where I tried to
improve some of this (and a few other things I found along the way).
Some of the more complicated ones, such as
src/backend/storage/lmgr/README-SSI, will need to be checked again and
even more carefully to make sure that the meaning is not altered by
these patches.

One underlying problem that I see is that markdown assumes four-space
tabs, but a standard editor configuration (and apparently your editor)
uses 8-space tabs. But then, if you have a common situation like

```
1. Run this code

<tab>$ sudo kill
```

then that's incorrect (the code line will not be inside the list),
because it should be

```
1. Run this code

<tab><tab>$ sudo kill
```

or

```
1. Run this code

<8 spaces>$ sudo kill
```

So we need to think about a way to make this more robust for future
people editing. Maybe something in .gitattributes or some editor
settings. Otherwise, it will be all over the place after a while.
(There are also a couple of places where apparently you changed
whitespace that didn't need to be changed.)
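
The indentation arithmetic behind the three examples above can be sketched in Python. This is only an illustration, assuming CommonMark-style handling: a tab advances to the next multiple of four columns, and an indented code block inside a list item needs four columns beyond where the item's text starts. The helper names `indent_width` and `is_code_in_list_item` are made up for this sketch.

```python
TAB_STOP = 4  # assumption: tab stops every four columns, as in CommonMark


def indent_width(line: str, tab_stop: int = TAB_STOP) -> int:
    """Return the column reached by the leading whitespace of `line`."""
    col = 0
    for ch in line:
        if ch == "\t":
            # A tab jumps to the next tab stop rather than adding a fixed width.
            col += tab_stop - (col % tab_stop)
        elif ch == " ":
            col += 1
        else:
            break
    return col


def is_code_in_list_item(marker: str, line: str) -> bool:
    """True if `line` is indented far enough to be code inside the list item.

    `marker` is the list marker including its trailing space, e.g. "1. ".
    """
    content_start = len(marker)
    return indent_width(line) >= content_start + 4


print(is_code_in_list_item("1. ", "\t$ sudo kill"))          # False: one tab
print(is_code_in_list_item("1. ", "\t\t$ sudo kill"))        # True: two tabs
print(is_code_in_list_item("1. ", " " * 8 + "$ sudo kill"))  # True: 8 spaces
```

With a "1. " marker the item's text starts at column 3, so code needs at least 7 columns of indentation; a single tab only reaches column 4.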

Apart from this, I don't like changing the placeholders like <foo> to < foo >.
In some cases, this really decreases readability. Maybe we should
look for different approaches there.

Maybe there are some easy changes that could be extracted from this
patch, but the whitespace and list issue needs more consideration.

Attachments:

0001-fixup-Standardize-syntax-in-internal-documentation.patch (text/plain; charset=UTF-8, +88 -90)

#18Daniel Gustafsson
daniel@yesql.se
In reply to: Peter Eisentraut (#17)
Re: Converting README documentation to Markdown

On 23 Sep 2024, at 13:58, Peter Eisentraut <peter@eisentraut.org> wrote:

I went through this file by file and checked the results of a markdown-to-HTML conversion using cmark and looking at the raw output source files.

Thanks for reviewing!

I thought the consensus of the thread was to skip Markdown compatibility and
only go for more consistency in the current format, so I didn't check any such
results when proposing that version. I've checked with the Markdown rendering
preview in VSCode for this.

Placing the goalposts in the right place seems useful: should we aim for
Markdown, or for a consistent look-and-feel regardless of Markdown compatibility?

A lot of the changes are obvious and make sense. But there are a number of cases of code within lists or nested lists or both that need further careful investigation. I'm attaching a fixup patch where I tried to improve some of this (and a few other things I found along the way). Some of the more complicated ones, such as src/backend/storage/lmgr/README-SSI, will need to be checked again and even more carefully to make sure that the meaning is not altered by these patches.

Agreed. Checking it I don't see any such cases but more careful looking is
needed.

One underlying problem that I see is that markdown assumes four-space tabs, but a standard editor configuration (and apparently your editor) uses 8-space tabs. But then, if you have a common situation like

```
1. Run this code

<tab>$ sudo kill
```

then that's incorrect (the code line will not be inside the list), because it should be

```
1. Run this code

<tab><tab>$ sudo kill
```

or

```
1. Run this code

<8 spaces>$ sudo kill
```

So we need to think about a way to make this more robust for future people editing. Maybe something in .gitattributes or some editor settings. Otherwise, it will be all over the place after a while.

Maybe we can add some form of pandoc target for rendering as a way to test
locally before pushing? (For those with pandoc installed, that is, but we already
have the infrastructure in meson to use pandoc so it could be convenient perhaps.)

(There are also a couple of places where apparently you changed whitespace that wasn't necessary to be changed.)

It was to make the files look and feel consistent, I've tried to reduce that in
the attached.

Apart from this, I don't like changing the placeholders like <foo> to < foo >. In some cases, this really decreases readability. Maybe we should look for different approaches there.

Agreed. I took a stab at some of them in the attached. The usage in
src/test/isolation/README is seemingly the hardest to replace and I'm not sure
how we should proceed there.

Maybe there are some easy changes that could be extracted from this patch, but the whitespace and list issue needs more consideration.

If we want to reduce the size of this I think the changes which add a line of
whitespace could be broken out and committed separately. One random example
from the diff being:

***
Outer join identity 3 (discussed above) complicates this picture
a bit. In the form
+
A leftjoin (B leftjoin C on (Pbc)) on (Pab)
+
all of the Vars in clauses Pbc and Pab will have empty varnullingrels,
but if we start with
***

Personally I consider those changes a win for readability on their own
regardless of any progress towards Markdown.

The attached has your changes rolled into 0001 and any new changes in 0002 for
ease of skimming the diffs.

--
Daniel Gustafsson

Attachments:

v4-0002-Review-comments.patch (application/octet-stream, +212 -218)
v4-0001-Standardize-syntax-in-internal-documentation.patch (application/octet-stream, +1172 -1071)

#19Jelte Fennema-Nio
postgres@jeltef.nl
In reply to: Daniel Gustafsson (#18)
Re: Converting README documentation to Markdown

On Tue, 1 Oct 2024 at 15:52, Daniel Gustafsson <daniel@yesql.se> wrote:

So we need to think about a way to make this more robust for future people editing. Maybe something in .gitattributes or some editor settings. Otherwise, it will be all over the place after a while.

Maybe we can add some form of pandoc target for rendering as a way to test
locally before pushing?

I think a gitattributes rule to disallow hard tabs would work fine,
especially when combined with this patch of mine which keeps the
.editorconfig file in sync with the .gitattributes file:
https://commitfest.postgresql.org/49/4829/
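
To illustrate what such a rule would catch, here is a small Python sketch that reports lines whose leading whitespace contains a hard tab. It is not the gitattributes rule itself (git's `tab-in-indent` whitespace check is one existing option, though which rule would actually be used is not settled here); the function name `lines_with_tab_indent` and the sample text are made up for the example.

```python
def lines_with_tab_indent(text: str) -> list[int]:
    """Return 1-based numbers of lines whose leading whitespace contains a tab."""
    flagged = []
    for number, line in enumerate(text.splitlines(), start=1):
        # Slice off everything after the leading run of spaces and tabs.
        indent = line[: len(line) - len(line.lstrip(" \t"))]
        if "\t" in indent:
            flagged.append(number)
    return flagged


sample = "1. Run this code\n\n\t$ sudo kill\n\n        $ sudo kill\n"
print(lines_with_tab_indent(sample))  # [3]
```

Only the tab-indented line is flagged; the variant indented with eight spaces passes.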

Apart from this, I don't like changing the placeholders like <foo> to < foo >. In some cases, this really decreases readability. Maybe we should look for different approaches there.

Agreed. I took a stab at some of them in the attached. The usage in
src/test/isolation/README is seemingly the hardest to replace and I'm not sure
how we should proceed there.

One way to improve the isolation/README situation is by:
1. indenting the standalone lines by four spaces to make it a code block
2. for the inline cases, replace <foo> with `<foo>` or `foo`
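
A rough Python sketch of those two steps, to show the intended result. The regular expression is an assumption about what a placeholder looks like (a short name in angle brackets), and the helper names and sample sentences are invented, not taken from src/test/isolation/README.

```python
import re

# Assumption: a placeholder is a short name in angle brackets, e.g. <foo>,
# that is not already directly wrapped in backticks.
PLACEHOLDER = re.compile(r"(?<!`)<([A-Za-z_][A-Za-z0-9_-]*)>(?!`)")


def quote_placeholders(line: str) -> str:
    """Wrap bare <name> placeholders in backticks; leave quoted ones alone."""
    return PLACEHOLDER.sub(r"`<\1>`", line)


def as_code_block(line: str) -> str:
    """Indent a standalone line by four spaces so it renders as code."""
    return "    " + line


print(quote_placeholders("Run <foo> first; `<bar>` stays as is."))
print(as_code_block("example <foo>"))
```

This only recognizes a placeholder that is immediately surrounded by backticks as already quoted, so a placeholder inside a longer inline code span would still need manual review.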

#20Daniel Gustafsson
daniel@yesql.se
In reply to: Jelte Fennema-Nio (#19)
Re: Converting README documentation to Markdown

On 1 Oct 2024, at 16:53, Jelte Fennema-Nio <postgres@jeltef.nl> wrote:
On Tue, 1 Oct 2024 at 15:52, Daniel Gustafsson <daniel@yesql.se> wrote:

Apart from this, I don't like changing the placeholders like <foo> to < foo >. In some cases, this really decreases readability. Maybe we should look for different approaches there.

Agreed. I took a stab at some of them in the attached. The usage in
src/test/isolation/README is seemingly the hardest to replace and I'm not sure
how we should proceed there.

One way to improve the isolation/README situation is by:
1. indenting the standalone lines by four spaces to make it a code block
2. for the inline cases, replace <foo> with `<foo>` or `foo`

If we go for following Markdown syntax then for sure; if not, it will seem a bit
off I think.

--
Daniel Gustafsson

#21Tristan Partin
tristan@partin.io
In reply to: Jelte Fennema-Nio (#10)
#22Peter Eisentraut
peter_e@gmx.net
In reply to: Daniel Gustafsson (#20)
#23Daniel Gustafsson
daniel@yesql.se
In reply to: Peter Eisentraut (#22)
#24Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Robert Haas (#16)
#25Junwang Zhao
zhjwpku@gmail.com
In reply to: Alvaro Herrera (#24)