Optimizing the documentation
-hackers,
The community has spent a lot of time optimizing features over the years.
Excellent examples include parallel query and partitioning, which have been
multi-year efforts to improve the quality, performance, and features
of the original commit. We should consider the documentation in a
similar manner. Just like code, documentation can sometimes use a bug fix,
optimization, and/or new features added to the original implementation.
Technical documentation should only be as verbose as needed to illustrate
the concept or task that we are explaining. It should not be redundant, nor
should it use 50-cent words when a 10-cent word would suffice. I would
like to put effort into optimizing the documentation and am requesting
general consensus that this would be a worthwhile effort before I begin to
dust off my Docbook skills.
I have provided an example below:
Original text (79 words):
This book is the official documentation of PostgreSQL. It has been written
by the PostgreSQL developers and other volunteers in parallel to the
development of the PostgreSQL software. It describes all the functionality
that the current version of PostgreSQL officially supports.
To make the large amount of information about PostgreSQL manageable, this
book has been organized in several parts. Each part is targeted at a
different class of users, or at users in different stages of their
PostgreSQL experience:
Optimized text (35 words):
This is the official PostgreSQL documentation. It is written by the
PostgreSQL community in parallel with the development of the software. We
have organized it by the type of user and their stages of experience:
Issues that are resolved with the optimized text:
- Succinct text is more likely to be read than skimmed
- Removal of extraneous mentions of PostgreSQL
- Removal of unneeded justifications
- Joining of two paragraphs into one that provides only the needed
  information to the user
- Word count decreased by over 50%. As changes such as these are adopted
  it would make the documentation more consumable.
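The word counts quoted above can be checked with a short script. This is only a sketch: it assumes a "word" is any whitespace-separated token, and the helper names `word_count` and `reduction_percent` are made up for illustration.

```python
# Sketch: verify the quoted word counts, treating every
# whitespace-separated token as one word (an assumption).

original = (
    "This book is the official documentation of PostgreSQL. It has been written "
    "by the PostgreSQL developers and other volunteers in parallel to the "
    "development of the PostgreSQL software. It describes all the functionality "
    "that the current version of PostgreSQL officially supports. "
    "To make the large amount of information about PostgreSQL manageable, this "
    "book has been organized in several parts. Each part is targeted at a "
    "different class of users, or at users in different stages of their "
    "PostgreSQL experience:"
)

optimized = (
    "This is the official PostgreSQL documentation. It is written by the "
    "PostgreSQL community in parallel with the development of the software. We "
    "have organized it by the type of user and their stages of experience:"
)


def word_count(text: str) -> int:
    """Count whitespace-separated tokens."""
    return len(text.split())


def reduction_percent(before: int, after: int) -> float:
    """Percentage by which `after` is smaller than `before`."""
    return 100.0 * (before - after) / before


n_original = word_count(original)
n_optimized = word_count(optimized)
reduction = reduction_percent(n_original, n_optimized)

# prints "79 -> 35 words (55.7% shorter)"
print(f"{n_original} -> {n_optimized} words ({reduction:.1f}% shorter)")
```

Under that counting rule, the figures match the 79 and 35 words stated above, a reduction of a little under 56%.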
Thanks,
JD
--
Founder - https://commandprompt.com/ - 24x7x365 Postgres since 1997
Co-Chair - https://postgresconf.org/ - Postgres Education at its finest
People, Postgres, Data
On Mon, Dec 14, 2020 at 12:50 PM Joshua Drake <jd@commandprompt.com> wrote:
-hackers,
The community has spent a lot of time optimizing features over the years.
Excellent examples include parallel query and partitioning, which have been
multi-year efforts to improve the quality, performance, and features
of the original commit. We should consider the documentation in a
similar manner. Just like code, documentation can sometimes use a bug fix,
optimization, and/or new features added to the original implementation.
Technical documentation should only be as verbose as needed to illustrate
the concept or task that we are explaining. It should not be redundant, nor
should it use 50-cent words when a 10-cent word would suffice. I would
like to put effort into optimizing the documentation and am requesting
general consensus that this would be a worthwhile effort before I begin to
dust off my Docbook skills.
As a quick observation, it would be more immediately helpful to add to the
existing proposal to add more details about architecture and get that
committed before embarking on a new documentation project.
https://commitfest.postgresql.org/31/2541/
I have provided an example below:
Original text (79 words):
This book is the official documentation of PostgreSQL. It has been written
by the PostgreSQL developers and other volunteers in parallel to the
development of the PostgreSQL software. It describes all the functionality
that the current version of PostgreSQL officially supports.
To make the large amount of information about PostgreSQL manageable, this
book has been organized in several parts. Each part is targeted at a
different class of users, or at users in different stages of their
PostgreSQL experience:
Optimized text (35 words):
This is the official PostgreSQL documentation. It is written by the
PostgreSQL community in parallel with the development of the software. We
have organized it by the type of user and their stages of experience:
Issues that are resolved with the optimized text:
- Succinct text is more likely to be read than skimmed
- Removal of extraneous mentions of PostgreSQL
- Removal of unneeded justifications
- Joining of two paragraphs into one that provides only the needed
  information to the user
- Word count decreased by over 50%. As changes such as these are adopted
  it would make the documentation more consumable.
That actually exists in our documentation? I suspect changing it isn't
all that worthwhile as the typical user isn't reading the documentation
like a book and with the entry point being the table of contents most of
that material is simply gleaned from observing the presented structure
without words needed to describe it.
I don't think making readability changes is a bad thing, and maybe my
perspective is a bit biased and negative right now, but the attention given
to the existing documentation patches in the commitfest isn't that great -
so adding another mass of patches fixing up items that haven't provoked
complaints seems likely to just make the list longer.
In short, I don't think optimization should be a goal in its own right; but
rather changes should mostly be driven by questions asked by our users. I
don't think reading random chapters of the documentation to find
non-optimal exposition is going to be a good use of time.
David J.
On 14/12/2020 21:50, Joshua Drake wrote:
The community has spent a lot of time optimizing features over the
years. Excellent examples include parallel query and partitioning, which
have been multi-year efforts to improve the quality, performance, and
features of the original commit. We should consider the
documentation in a similar manner. Just like code, documentation can
sometimes use a bug fix, optimization, and/or new features added to the
original implementation.
Technical documentation should only be as verbose as needed to
illustrate the concept or task that we are explaining. It should not be
redundant, nor should it use 50-cent words when a 10-cent word would
suffice. I would like to put effort into optimizing the documentation
and am requesting general consensus that this would be a worthwhile
effort before I begin to dust off my Docbook skills.
Hard to argue with "let's make the doc better" :-).
I expect that there will be a lot of bikeshedding over the exact
phrases. That's OK. Every improvement that actually gets committed
helps, even if we don't make progress on other parts.
I have provided an example below:
Original text (79 words):
This book is the official documentation of PostgreSQL. It has been
written by the PostgreSQL developers and other volunteers in parallel to
the development of the PostgreSQL software. It describes all the
functionality that the current version of PostgreSQL officially supports.
To make the large amount of information about PostgreSQL manageable,
this book has been organized in several parts. Each part is targeted at
a different class of users, or at users in different stages of their
PostgreSQL experience:
Optimized text (35 words):
This is the official PostgreSQL documentation. It is written by the
PostgreSQL community in parallel with the development of the software.
We have organized it by the type of user and their stages of experience:
Some thoughts on this example:
- Changing "has been" to "is" changes the tone here. "Is" implies that
it is being written continuously, whereas "has been" implies that it's
finished. We do update the docs continuously, but the point of the sentence
is that the docs were developed together with the features, so "has
been" seems more accurate.
- I like "PostgreSQL developers and other volunteers" better than the
"PostgreSQL community". This is the very first introduction to
PostgreSQL, so we can't expect the reader to know what the "PostgreSQL
community" is. I like the "volunteers" word here a lot.
- I think a little bit of ceremony is actually OK in this particular
paragraph, since it's the very first one in the docs.
- I agree with dropping the "to make the large amount of information
manageable".
So I would largely keep this example unchanged, changing it into:
---
This book is the official documentation of PostgreSQL. It has been
written by the PostgreSQL developers and other volunteers in parallel to
the development of the PostgreSQL software. It describes all the
functionality that the current version of PostgreSQL officially supports.
This book has been organized in several parts. Each part is targeted at
a different class of users, or at users in different stages of their
PostgreSQL experience:
---
Issues that are resolved with the optimized text:
* Succinct text is more likely to be read than skimmed
* Removal of extraneous mentions of PostgreSQL
* Removal of unneeded justifications
* Joining of two paragraphs into one that provides only the needed
information to the user
* Word count decreased by over 50%. As changes such as these are
adopted it would make the documentation more consumable.
I agree with these goals in general. I like to refer to
http://www.plainenglish.co.uk/how-to-write-in-plain-english.html when
writing documentation. Or anything else, really.
- Heikki
Technical documentation should only be as verbose as needed to illustrate
the concept or task that we are explaining. It should not be redundant, nor
should it use 50-cent words when a 10-cent word would suffice. I would
like to put effort into optimizing the documentation and am requesting
general consensus that this would be a worthwhile effort before I begin to
dust off my Docbook skills.
As a quick observation, it would be more immediately helpful to add to the
existing proposal to add more details about architecture and get that
committed before embarking on a new documentation project.
I considered just starting to review patches as such but even with that,
doesn't it make sense that if I am going to be putting a particular thought
process into my efforts that there is a general consensus? For example,
what would be exceedingly helpful would be a documentation style guide that is
canonical and we can review documentation against. Currently our
documentation is all over the place. It isn't that it is not technically
accurate or comprehensive
Optimized text (35 words):
This is the official PostgreSQL documentation. It is written by the
PostgreSQL community in parallel with the development of the software. We
have organized it by the type of user and their stages of experience:
Issues that are resolved with the optimized text:
- Succinct text is more likely to be read than skimmed
- Removal of extraneous mentions of PostgreSQL
- Removal of unneeded justifications
- Joining of two paragraphs into one that provides only the needed
  information to the user
- Word count decreased by over 50%. As changes such as these are
  adopted it would make the documentation more consumable.
That actually exists in our documentation?
Yes. https://www.postgresql.org/docs/13/preface.html
I suspect changing it isn't all that worthwhile as the typical user isn't
reading the documentation like a book and with the entry point being the
table of contents most of that material is simply gleaned from observing
the presented structure without words needed to describe it.
It is a matter of consistency.
While I don't think making readability changes is a bad thing, and maybe
my perspective is a bit biased and negative right now, but the attention
given to the existing documentation patches in the commitfest isn't that
great - so adding another mass of patches fixing up items that haven't
provoked complaints seems likely to just make the list longer.
One of the issues is that editing documentation with patches is a pain. It
is simpler and a lower barrier of effort to pull up an existing section of
Docbook and edit that (just like code) than it is to break out specific
text within a patch. Though I would be happy to take a swipe at reviewing a
specific documentation patch (as you linked).
In short, I don't think optimization should be a goal in its own right;
but rather changes should mostly be driven by questions asked by our
users. I don't think reading random chapters of the documentation to find
non-optimal exposition is going to be a good use of time.
I wasn't planning on reading random chapters. I was planning on walking
through the documentation as it is written and hopefully others would join.
This is a monumental effort to perform completely. Also consider the
overall benefit, not just one specific piece. Would you not consider it a
net win if certain questions were being answered in a succinct way as to
allow users to use the documentation instead of asking the most novice of
questions on various channels?
JD
This is the official PostgreSQL documentation. It is written by the
PostgreSQL community in parallel with the development of the software.
We have organized it by the type of user and their stages of experience:
Some thoughts on this example:
- Changing "has been" to "is" changes the tone here. "Is" implies that
it is being written continuously, whereas "has been" implies that it's
finished. We do update the docs continuously, but the point of the sentence
is that the docs were developed together with the features, so "has
been" seems more accurate.
No argument.
- I like "PostgreSQL developers and other volunteers" better than the
"PostgreSQL community". This is the very first introduction to
PostgreSQL, so we can't expect the reader to know what the "PostgreSQL
community" is. I like the "volunteers" word here a lot.
There is a huge community for PostgreSQL, the developers are only a
small (albeit critical) part of it. By using the term "PostgreSQL
community" we are providing equity to all those who participate in the
success of the project. I could definitely see saying "PostgreSQL
volunteers".
- I think a little bit of ceremony is actually OK in this particular
paragraph, since it's the very first one in the docs.
- I agree with dropping the "to make the large amount of information
manageable".
So I would largely keep this example unchanged, changing it into:
---
This book is the official documentation of PostgreSQL. It has been
written by the PostgreSQL developers and other volunteers in parallel to
the development of the PostgreSQL software. It describes all the
functionality that the current version of PostgreSQL officially supports.
This book has been organized in several parts. Each part is targeted at
a different class of users, or at users in different stages of their
PostgreSQL experience:
---
I appreciate the feedback and before we get too far down the rabbit hole, I
would like to note that I am not tied to an exact wording as my post was
more about the general goal and results based on that goal.
I agree with these goals in general. I like to refer to
http://www.plainenglish.co.uk/how-to-write-in-plain-english.html when
writing documentation. Or anything else, really.
Great resource!
JD
- Heikki
Heikki Linnakangas <hlinnaka@iki.fi> writes:
On 14/12/2020 21:50, Joshua Drake wrote:
Issues that are resolved with the optimized text:
* Succinct text is more likely to be read than skimmed
* Removal of extraneous mentions of PostgreSQL
* Removal of unneeded justifications
* Joining of two paragraphs into one that provides only the needed
information to the user
* Word count decreased by over 50%. As changes such as these are
adopted it would make the documentation more consumable.
I agree with these goals in general. I like to refer to
http://www.plainenglish.co.uk/how-to-write-in-plain-english.html when
writing documentation. Or anything else, really.
I think this particular chunk of text is an outlier. (Not unreasonably
so; as Heikki notes, it's customary for the very beginning of a book to
be a bit more formal.) Most of the docs contain pretty dense technical
material that's not going to be improved by making it even denser.
Also, to the extent that there's duplication, it's often deliberate.
For example, if a given bit of info appears in the tutorial and the
main docs and the reference pages, that doesn't mean we should rip
out two of the three appearances.
There certainly are sections that are crying out for reorganization,
but that's going to be very topic-specific and not something that
just going into it with a copy-editing mindset will help.
In short, the devil's in the details. Maybe there are lots of
places where this type of approach would help, but I think it's
going to be a case-by-case discussion not something where there's
a clear win overall.
regards, tom lane
On Mon, Dec 14, 2020 at 1:40 PM Joshua Drake <jd@commandprompt.com> wrote:
For example, what would be exceedingly helpful would be a documentation style
guide that is canonical and we can review documentation against.
I do agree with that premise, with the goal of getting more people to
contribute to writing and reviewing documentation and having more than
vague ideas about what is or isn't considered minor items to just leave
alone or points of interest to debate. But as much as I would love
perfectly written English documentation I try to consciously make an effort
to accept things that maybe aren't perfect but are good enough in the
interest of having a larger set of contributors with more varied abilities
in this area. "It is clear enough" is a valid trade-off to take.
Thanks, though it was meant to be a bit rhetorical.
While I don't think making readability changes is a bad thing, and maybe
my perspective is a bit biased and negative right now, but the attention
given to the existing documentation patches in the commitfest isn't that
great - so adding another mass of patches fixing up items that haven't
provoked complaints seems likely to just make the list longer.
One of the issues is that editing documentation with patches is a pain. It
is simpler and a lower barrier of effort to pull up an existing section of
Docbook and edit that (just like code) than it is to break out specific
text within a patch. Though I would be happy to take a swipe at reviewing a
specific documentation patch (as you linked).
I'm not following this line of reasoning.
In short, I don't think optimization should be a goal in its own right;
but rather changes should mostly be driven by questions asked by our
users. I don't think reading random chapters of the documentation to find
non-optimal exposition is going to be a good use of time.
I wasn't planning on reading random chapters. I was planning on walking
through the documentation as it is written and hopefully others would join.
This is a monumental effort to perform completely. Also consider the
overall benefit, not just one specific piece. Would you not consider it a
net win if certain questions were being answered in a succinct way as to
allow users to use the documentation instead of asking the most novice of
questions on various channels?
I suspect over half of the questions asked are due to not reading the
documentation at all - I tend to get good results when I point someone to
the correct terminology and section, and if there are follow-up questions
then I know where to look for improvements and have a concrete question or
two in hand to ensure that the revised documentation answers.
I'm fairly well plugged into user questions and have recently made an
attempt to respond to those with specific patches to improve the
documentation involved in those questions. I have also been working to
help other documentation patches get pushed through. Based upon those
experiences I think this monumental community effort is going to stall out
pretty quickly - regardless of its merits - though if the effort results in
a new guidelines document then I would say it was worth the effort
regardless of how many paragraphs are optimized away.
My $0.02
David J.
In short, the devil's in the details. Maybe there are lots of
places where this type of approach would help, but I think it's
going to be a case-by-case discussion not something where there's
a clear win overall.
Certainly and I didn't want to just start dumping patches. Part of this is
just style, for example:
Thus far, our queries have only accessed one table at a time. Queries can
access multiple tables at once, or access the same table in such a way that
multiple rows of the table are being processed at the same time. A query
that accesses multiple rows of the same or different tables at one time is
called a join query. As an example, say you wish to list all the weather
records together with the location of the associated city. To do that, we
need to compare the city column of each row of the weather table with the
name column of all rows in the cities table, and select the pairs of rows
where these values match.
It isn't "terrible" but can definitely be optimized. In a quick review, I
would put it something like this:
Queries can also access multiple tables at once, or access the same table
in a way that multiple rows are processed. A query that accesses multiple
rows of the same or different tables at one time is a join. For example, if
you wish to list all of the weather records with the location of the
associated city, we would compare the city column of each row of the weather
table with the name column of all rows in the cities table, and select the
rows *WHERE* the values match.
The reason I bolded and capitalized WHERE was to provide a visual signal to
the example that is on the page. I could also argue that we could remove
"For example," though I understand its purpose here.
Again, this was just a quick review.
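For readers without the tutorial page open, the join that both versions of the paragraph describe can be sketched as below. This is an illustration only: it uses Python's built-in sqlite3 module rather than PostgreSQL so that it runs standalone. Only the weather table's city column and the cities table's name column come from the text above; the other column names and all of the sample rows are assumptions.

```python
# Sketch of the join described above, run against an in-memory SQLite
# database. Columns other than weather.city and cities.name, and all
# sample rows, are invented for illustration.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE cities (name TEXT, location TEXT);
    CREATE TABLE weather (city TEXT, temp_lo INTEGER, temp_hi INTEGER, obs_date TEXT);

    INSERT INTO cities VALUES ('San Francisco', '(-194.0, 53.0)');

    INSERT INTO weather VALUES ('San Francisco', 46, 50, '1994-11-27');
    INSERT INTO weather VALUES ('San Francisco', 43, 57, '1994-11-29');
    INSERT INTO weather VALUES ('Hayward', 37, 54, '1994-11-29');
    """
)

# Compare the city column of each weather row with the name column of
# every cities row, and keep only the pairs where the values match.
rows = conn.execute(
    """
    SELECT weather.city, weather.temp_lo, weather.temp_hi,
           weather.obs_date, cities.location
    FROM weather, cities
    WHERE weather.city = cities.name
    ORDER BY weather.obs_date
    """
).fetchall()

for row in rows:
    print(row)

conn.close()
```

In this sample data the Hayward record has no matching row in cities, so it does not appear in the result. That is the row-matching behavior the paragraph is trying to convey.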
JD
On Mon, Dec 14, 2020 at 12:50 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Most of the docs contain pretty dense technical
material that's not going to be improved by making it even denser.
It's always hard to write dense technical prose, for a variety of
reasons. I often struggle with framing. For example I seem to write
sentences that sound indecisive. But is that necessarily a bad thing?
It seems wise to hedge a little bit when talking about (say) some kind
of complex system with many moving parts. Ernest Hemingway never had
to describe how VACUUM works.
I agree with Heikki to some degree; there is value in trying to follow
a style guide. But let's not forget about the other problem with the
docs, which is that there aren't enough low-level technical details of
the kind that advanced users value. There is a clear unmet demand for
that IME. If we're going to push in the direction of simplification,
it should not make this other important task harder.
--
Peter Geoghegan
Joshua Drake <jd@commandprompt.com> writes:
Certainly and I didn't want to just start dumping patches. Part of this is
just style, for example:
Thus far, our queries have only accessed one table at a time. Queries can
access multiple tables at once, or access the same table in such a way that
multiple rows of the table are being processed at the same time. A query
that accesses multiple rows of the same or different tables at one time is
called a join query. As an example, say you wish to list all the weather
records together with the location of the associated city. To do that, we
need to compare the city column of each row of the weather table with the
name column of all rows in the cities table, and select the pairs of rows
where these values match.
It isn't "terrible" but can definitely be optimized. In a quick review, I
would put it something like this:
Queries can also access multiple tables at once, or access the same table
in a way that multiple rows are processed. A query that accesses multiple
rows of the same or different tables at one time is a join. For example, if
you wish to list all of the weather records with the location of the
associated city, we would compare the city column of each row of the weather
table with the name column of all rows in the cities table, and select the
rows *WHERE* the values match.
TBH, I'm not sure that that is an improvement at all. I'm constantly
reminded that for many of our users, English is not their first language.
A little bit of redundancy in wording is often helpful for them.
The places where I think the docs need help tend to be places where
assorted people have added information over time, such that there's
not a consistent style throughout a section; or maybe the information
could be presented in a better order. We don't need to be taking a
hacksaw to text that's perfectly clear as it stands.
(If I were thinking of rewriting this text, I'd probably think of
removing the references to self-joins and covering that topic
in a separate para. But that's because self-joins aren't basic
usage, not because I think the text is unreadable.)
The reason I bolded and capitalized WHERE was to provide a visual signal to
the example that is on the page.
IMO, typographical tricks are not something to lean on heavily.
regards, tom lane
Queries can also access multiple tables at once, or access the same table
in a way that multiple rows are processed. A query that accesses multiple
rows of the same or different tables at one time is a join. For example, if
you wish to list all of the weather records with the location of the
associated city, we would compare the city column of each row of the weather
table with the name column of all rows in the cities table, and select the
rows *WHERE* the values match.
TBH, I'm not sure that that is an improvement at all. I'm constantly
reminded that for many of our users, English is not their first language.
A little bit of redundancy in wording is often helpful for them.
Interesting point, it is certainly true that many of our users are ESL
folks. I would expect a succinct version to be easier to understand but I
have no idea.
The places where I think the docs need help tend to be places where
assorted people have added information over time, such that there's
not a consistent style throughout a section; or maybe the information
could be presented in a better order. We don't need to be taking a
hacksaw to text that's perfectly clear as it stands.
The term perfectly clear is part of the problem I am trying to address. I
can pick and pull at the documentation all day long and show things that
are not perfectly clear. They are clear to you, to me, and I imagine to most
of the readers on this list. Generally speaking we are not the target of
the documentation and we may easily get pulled into the "good enough" when
in reality it could be so much better. I have gotten so used to our
documentation that I literally skip over unneeded words to get to the
answer I am looking for. I don't think that is the target we want to hit.
Wouldn't we want the reader to spend as little mental energy as possible to
understand the concept? Every extra word that isn't needed,
every extra adjective, repeated term or "very unique" that exists is extra
energy spent to understand what the writer is trying to say. That mental
energy can be exhausted quickly, especially when considering dense
technical topics.
(If I were thinking of rewriting this text, I'd probably think of
removing the references to self-joins and covering that topic
in a separate para. But that's because self-joins aren't basic
usage, not because I think the text is unreadable.)
That makes sense. I was just taking the direct approach of making existing
content better as an example. I would agree with your assessment if it were
to be submitted as a patch.
The reason I bolded and capitalized WHERE was to provide a visual signal to
the example that is on the page.
IMO, typographical tricks are not something to lean on heavily.
Fair enough.
JD
On Mon, Dec 14, 2020 at 01:38:05PM -0800, Peter Geoghegan wrote:
On Mon, Dec 14, 2020 at 12:50 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Most of the docs contain pretty dense technical
material that's not going to be improved by making it even denser.
It's always hard to write dense technical prose, for a variety of
reasons. I often struggle with framing. For example I seem to write
sentences that sound indecisive. But is that necessarily a bad thing?
It seems wise to hedge a little bit when talking about (say) some kind
of complex system with many moving parts. Ernest Hemingway never had
to describe how VACUUM works.
I agree with Heikki to some degree; there is value in trying to follow
a style guide. But let's not forget about the other problem with the
docs, which is that there aren't enough low-level technical details of
the kind that advanced users value. There is a clear unmet demand for
that IME. If we're going to push in the direction of simplification,
it should not make this other important task harder.
I agree a holistic review of the docs can yield great benefits. No one
usually complains about overly verbose text, but making it clearer is
always a win. Anyway, of course, it is going to be very specific for
each case. As an extreme example, in 2007 when I did a full review of
the docs, I clarified may/can/might in our docs, and it probably helped.
Here is one of several commits:
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=e81c138e18
--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EnterpriseDB https://enterprisedb.com
The usefulness of a cup is in its emptiness, Bruce Lee
On Thu, Dec 17, 2020 at 7:42 AM Bruce Momjian <bruce@momjian.us> wrote:
I agree a holistic review of the docs can yield great benefits. No one
usually complains about overly verbose text, but making it clearer is
always a win. Anyway, of course, it is going to be very specific for
each case. As an extreme example, in 2007 when I did a full review of
the docs, I clarified may/can/might in our docs, and it probably helped.
I think that the "may/can/might" rule is a very good one. It
standardizes something that would otherwise just be left to chance,
and AFAICT has no possible downside. Even still, I think that adding
new rules is subject to sharp diminishing returns. There just aren't
that many things that work like that.
--
Peter Geoghegan