Planet Postgres and the curse of AI

Started by Greg Sabino Mullane · over 1 year ago · 18 messages · general
#1Greg Sabino Mullane
greg@turnstep.com

I've been noticing a growing trend of blog posts written mostly, if not
entirely, with AI (aka LLMs, ChatGPT, etc.). I'm not sure where to raise
this issue. I considered a blog post, but this mailing list seemed a better
forum to generate a discussion.

The problem is two-fold as I see it.

First, there is the issue of people trying to game the system by churning
out content that is not theirs, but was written by an LLM. I'm not going to
name specific posts, but after a while it gets easy to recognize things
that are written mostly by AI.

These blog posts are usually generic, describing some part of Postgres
in an impersonal, mid-level way. Most of the time the facts are not
wrong, per se, but they lack nuances that a real DBA would bring to the
discussion, and often leave important things out. Code examples are often
wrong in subtle ways. Places where you might expect a deeper discussion are
glossed over.

So this first problem is that it is polluting the Postgres blogs with
overly bland, moderately helpful posts that are not written by a human, and
do not really bring anything interesting to the table. There is a place for
posts that describe basic Postgres features, but the ones written by humans
are much better. (yeah, yeah, "for now" and all hail our AI overlords in
the future).

The second problem is worse, in that LLMs are not merely gathering
information, but have the ability to synthesize new conclusions and facts.
In short, they can lie. Or hallucinate. Whatever you want to call it, it's a
side effect of the way LLMs work. In a technical field like Postgres, this
can be a very bad thing. I don't know how widespread this is, but I was
tipped off about this over a year ago when I came across a blog suggesting
using the "max_toast_size configuration parameter". For those not
familiar, I can assure you that Postgres does not have, nor will likely
ever have, a GUC with that name.
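A quick sanity check for any blog-recommended parameter (using the hallucinated max_toast_size above purely as the example): ask the server itself, since the pg_settings view lists every real GUC:

```sql
-- Returns zero rows: Postgres has no such configuration parameter.
SELECT name, setting, short_desc
FROM pg_settings
WHERE name = 'max_toast_size';

-- SHOW fails outright on a made-up name:
-- SHOW max_toast_size;
-- ERROR:  unrecognized configuration parameter "max_toast_size"
```

Thirty seconds in psql beats taking a generated post at its word.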

As anyone who has spoken with ChatGPT knows, getting small important
details correct is not its forte. I love ChatGPT and actually use it daily.
It is amazing at doing certain tasks. But writing blog posts should not be
one of them.

Do we need a policy or a guideline for Planet Postgres? I don't know. It
can be a gray line. Obviously spelling and grammar checking is quite
okay, and making up random GUCs is not, but the middle bit is very hazy.
(Human) thoughts welcome.

Cheers,
Greg

#2Pavel Stehule
pavel.stehule@gmail.com
In reply to: Greg Sabino Mullane (#1)
Re: Planet Postgres and the curse of AI

On Wed, Jul 17, 2024 at 7:22 PM Greg Sabino Mullane <htamfids@gmail.com>
wrote:

I've been noticing a growing trend of blog posts written mostly, if not
entirely, with AI (aka LLMs, ChatGPT, etc.). I'm not sure where to raise
this issue. I considered a blog post, but this mailing list seemed a better
forum to generate a discussion.

The problem is two-fold as I see it.

First, there is the issue of people trying to game the system by churning
out content that is not theirs, but was written by a LLM. I'm not going to
name specific posts, but after a while it gets easy to recognize things
that are written mostly by AI.

These blog posts are usually generic, describing some part of Postgres
in an impersonal, mid-level way. Most of the time the facts are not
wrong, per se, but they lack nuances that a real DBA would bring to the
discussion, and often leave important things out. Code examples are often
wrong in subtle ways. Places where you might expect a deeper discussion are
glossed over.

So this first problem is that it is polluting the Postgres blogs with
overly bland, moderately helpful posts that are not written by a human, and
do not really bring anything interesting to the table. There is a place for
posts that describe basic Postgres features, but the ones written by humans
are much better. (yeah, yeah, "for now" and all hail our AI overlords in
the future).

The second problem is worse, in that LLMs are not merely gathering
information, but have the ability to synthesize new conclusions and facts.
In short, they can lie. Or hallucinate. However you want to call it, it's a
side effect of the way LLMs work. In a technical field like Postgres, this
can be a very bad thing. I don't know how widespread this is, but I was
tipped off about this over a year ago when I came across a blog suggesting
using the "max_toast_size configuration parameter". For those not
familiar, I can assure you that Postgres does not have, nor will likely
ever have, a GUC with that name.

As anyone who has spoken with ChatGPT knows, getting small important
details correct is not its forte. I love ChatGPT and actually use it daily.
It is amazing at doing certain tasks. But writing blog posts should not be
one of them.

Do we need a policy or a guideline for Planet Postgres? I don't know. It
can be a gray line. Obviously spelling and grammar checking is quite
okay, and making up random GUCs is not, but the middle bit is very hazy.
(Human) thoughts welcome.

It is very unpleasant to read a long article only to realize at the end that
it contains zero valuable information. The situation was terrible on Planet
MariaDB https://mariadb.org/planet/, but it has since been cleaned up. I am for
some form of moderation - and for gently nudging authors who write articles
that add no value over the documentation.

Regards

Pavel



#3Kashif Zeeshan
kashi.zeeshan@gmail.com
In reply to: Greg Sabino Mullane (#1)
Re: Planet Postgres and the curse of AI

Hi Greg

I agree with you on the misuse of AI-based tools; in my experience with
Postgres, the solutions they suggest sometimes won't work.
It's not bad to get help from these tools, but taking every solution from
them is counterproductive.
I think people should take care when using these tools to suggest
solutions for real-world problems.

Regards
Kashif Zeeshan

On Wed, Jul 17, 2024 at 10:22 PM Greg Sabino Mullane <htamfids@gmail.com>
wrote:


I've been noticing a growing trend of blog posts written mostly, if not
entirely, with AI (aka LLMs, ChatGPT, etc.). I'm not sure where to raise
this issue. I considered a blog post, but this mailing list seemed a better
forum to generate a discussion.

The problem is two-fold as I see it.

First, there is the issue of people trying to game the system by churning
out content that is not theirs, but was written by a LLM. I'm not going to
name specific posts, but after a while it gets easy to recognize things
that are written mostly by AI.

These blog posts are usually generic, describing some part of Postgres
in an impersonal, mid-level way. Most of the time the facts are not
wrong, per se, but they lack nuances that a real DBA would bring to the
discussion, and often leave important things out. Code examples are often
wrong in subtle ways. Places where you might expect a deeper discussion are
glossed over.

So this first problem is that it is polluting the Postgres blogs with
overly bland, moderately helpful posts that are not written by a human, and
do not really bring anything interesting to the table. There is a place for
posts that describe basic Postgres features, but the ones written by humans
are much better. (yeah, yeah, "for now" and all hail our AI overlords in
the future).

The second problem is worse, in that LLMs are not merely gathering
information, but have the ability to synthesize new conclusions and facts.
In short, they can lie. Or hallucinate. However you want to call it, it's a
side effect of the way LLMs work. In a technical field like Postgres, this
can be a very bad thing. I don't know how widespread this is, but I was
tipped off about this over a year ago when I came across a blog suggesting
using the "max_toast_size configuration parameter". For those not
familiar, I can assure you that Postgres does not have, nor will likely
ever have, a GUC with that name.

As anyone who has spoken with ChatGPT knows, getting small important
details correct is not its forte. I love ChatGPT and actually use it daily.
It is amazing at doing certain tasks. But writing blog posts should not be
one of them.

Do we need a policy or a guideline for Planet Postgres? I don't know. It
can be a gray line. Obviously spelling and grammar checking is quite
okay, and making up random GUCs is not, but the middle bit is very hazy.
(Human) thoughts welcome.

Cheers,
Greg

#4Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Greg Sabino Mullane (#1)
Re: Planet Postgres and the curse of AI

On 7/17/24 10:21, Greg Sabino Mullane wrote:

I've been noticing a growing trend of blog posts written mostly, if not
entirely, with AI (aka LLMs, ChatGPT, etc.). I'm not sure where to raise
this issue. I considered a blog post, but this mailing list seemed a
better forum to generate a discussion.

Do we need a policy or a guideline for Planet Postgres? I don't know. It
can be a gray line. Obviously spelling and grammar checking is quite
okay, and making up random GUCs is not, but the middle bit is very hazy.
(Human) thoughts welcome.

A policy would be nice, just not sure how enforceable it would be. How
do you differentiate between the parrot that is AI and one that is
human? I run across all manner of blog posts where folks have lifted
content from the documentation or other sources without attribution,
which is basically what AI-generated content is. AI does like to
embellish and make things up (ask the NYC lawyer who sued an airline about
that), though that is a human trait as well.


--
Adrian Klaver
adrian.klaver@aklaver.com

#5Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Greg Sabino Mullane (#1)
Re: Planet Postgres and the curse of AI

On Wed, 2024-07-17 at 13:21 -0400, Greg Sabino Mullane wrote:

I've been noticing a growing trend of blog posts written mostly, if not entirely, with AI
(aka LLMs, ChatGPT, etc.). I'm not sure where to raise this issue. I considered a blog post,
but this mailing list seemed a better forum to generate a discussion.

The problem is two-fold as I see it.

First, there is the issue of people trying to game the system by churning out content that is not theirs [...]

So this first problem is that it is polluting the Postgres blogs [...]

The second problem is worse, in that LLMs are not merely gathering information, but have
the ability to synthesize new conclusions and facts. In short, they can lie.

Do we need a policy or a guideline for Planet Postgres? I don't know. It can be a gray line.
Obviously spelling and grammar checking is quite okay, and making up random GUCs is not,
but the middle bit is very hazy. (Human) thoughts welcome.

As someone who writes blogs and occasionally browses Planet Postgres, this has not
struck me as a major problem. I just scrolled through it and nothing stood out to
me - perhaps I am too naïve.

There certainly are people who publish random short utterances, perhaps with the
intention to hit the "top posters" list, but I don't think we need strong measures.

If anything, I am most annoyed by articles that are just thinly veiled advertising,
but there is already a policy controlling that.

As long as there is not a flood of AI generated babble (and I cannot see one), I'd
say that this will regulate itself: spewing empty content and lies is not going to
reflect well on the author and his/her organization.

PostgreSQL has excellent documentation. Anybody who blindly follows advice from a
blog without checking with the documentation only has himself/herself to blame.

Yours,
Laurenz Albe

#6Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Laurenz Albe (#5)
Re: Planet Postgres and the curse of AI

I wrote:

On Wed, 2024-07-17 at 13:21 -0400, Greg Sabino Mullane wrote:

I've been noticing a growing trend of blog posts written mostly, if not entirely, with AI
(aka LLMs, ChatGPT, etc.). I'm not sure where to raise this issue. I considered a blog post,
but this mailing list seemed a better forum to generate a discussion.

[...]

Do we need a policy or a guideline for Planet Postgres? I don't know. It can be a gray line.
Obviously spelling and grammar checking is quite okay, and making up random GUCs is not,
but the middle bit is very hazy. (Human) thoughts welcome.

As someone who writes blogs and occasionally browses Planet Postgres, this has not
struck me as a major problem. I just scrolled through it and nothing stood out to
me - perhaps I am too naïve.

Seems like I *was* naïve - Álvaro has pointed me to a juicy example off-list.

Still, I wouldn't make a policy specifically against AI generated content. That is
hard to prove, and it misses the core of the problem. The real problem is low-level,
counterfactual content, be it generated by an AI or not.

Perhaps there could be a way to report misleading, bad content and a policy that says
that you can be banned if you repeatedly write grossly misleading and counterfactual
content. Stuff like "to improve performance, set fast_mode = on and restart the database".
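(Laurenz's fast_mode is, of course, a deliberately invented example; for illustration, the server rejects such a parameter immediately:)

```sql
SET fast_mode = on;
-- ERROR:  unrecognized configuration parameter "fast_mode"
```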

Yours,
Laurenz Albe

#7David Rowley
dgrowleyml@gmail.com
In reply to: Laurenz Albe (#6)
Re: Planet Postgres and the curse of AI

On Fri, 19 Jul 2024 at 00:31, Laurenz Albe <laurenz.albe@cybertec.at> wrote:

Perhaps there could be a way to report misleading, bad content and a policy that says
that you can be banned if you repeatedly write grossly misleading and counterfactual
content. Stuff like "to improve performance, set fast_mode = on and restart the database".

As a first step, maybe it's worth just privately writing to the
offenders telling them what's been seen, giving them a chance to
improve and letting them know what they're doing isn't going
unnoticed. If I was doing this and someone pointed out lots of silly
mistakes with something I'd published, I'd be very embarrassed and I'd
reconsider my blog writing approach.

It might also be worth considering if we want to have a policy on LLM
usage in https://www.postgresql.org/about/policies/planet-postgresql/
. If we want to disallow blogs written by LLMs then we'd need to be
careful about how we define that as doing something like using an
LLM-based spell checker does not seem like it should be disallowed.
But to what degree exactly should that be allowed?

David

#8Greg Sabino Mullane
greg@turnstep.com
In reply to: David Rowley (#7)
Re: Planet Postgres and the curse of AI

But to what degree exactly should that be allowed?

Somewhat ironically, here's a distinction chatgpt and I came up with:

LLM-generated content: Content where the substantial part of the text is
directly created by LLMs without significant human alteration or editing.

Human-edited or reviewed content: Content that has been substantially
revised, corrected, or enhanced by a human after initial generation by
LLMs. This includes using spell and grammar checking, manual edits for
clarity or style, and content that reflects significant human input beyond
the original LLM output.

#9Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Greg Sabino Mullane (#8)
Re: Planet Postgres and the curse of AI

On Thu, 2024-07-18 at 10:25 -0400, Greg Sabino Mullane wrote:

But to what degree exactly should that be allowed?

Somewhat ironically, here's a distinction chatgpt and I came up with:

LLM-generated content: Content where the substantial part of the text is directly
created by LLMs without significant human alteration or editing.

I have no problem with that definition, but it is useless as a policy:
Even in a blog with glaring AI nonsense in it, how can you prove that the
author did not actually edit and improve other significant parts of the text?

Why not say that authors who repeatedly post grossly counterfactual or
misleading content can be banned?

Yours,
Laurenz Albe

#10Greg Sabino Mullane
greg@turnstep.com
In reply to: Laurenz Albe (#9)
Re: Planet Postgres and the curse of AI

On Fri, Jul 19, 2024 at 3:22 AM Laurenz Albe <laurenz.albe@cybertec.at>
wrote:

I have no problem with that definition, but it is useless as a policy:
Even in a blog with glaring AI nonsense in it, how can you prove that the
author did not actually edit and improve other significant parts of the
text?

Well, we can't 100% prove it, but we can have ethical guidelines. We
already have other guidelines that are open to interpretation (and plenty
of planet posts bend the rules quite often, IMO, but that's another post).

Why not say that authors who repeatedly post grossly counterfactual or
misleading content can be banned?

Banned is a strong word, but certainly they can have the posts removed, and
receive warnings from the planet admins. If the admins can point to a
policy, that helps. Perhaps as you hint at, we need a policy to not just
discourage AI-generated things, but also wrong/misleading things in general
(which was not much of a problem before LLMs arrived, to be honest).

Cheers,
Greg

#11Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Greg Sabino Mullane (#10)
Re: Planet Postgres and the curse of AI

On Tue, 2024-07-23 at 10:38 -0400, Greg Sabino Mullane wrote:

Why not say that authors who repeatedly post grossly counterfactual or
misleading content can be banned?

Perhaps as you hint at, we need a policy to not just discourage AI-generated
things, but also wrong/misleading things in general

I have been known to make mistakes in my blogs...
We shouldn't discourage people who happen to blog something wrong.
That's why I used strong verbiage like "grossly counterfactual".

Yours,
Laurenz Albe

#12Avinash Kumar
avinash.vallarapu@gmail.com
In reply to: Laurenz Albe (#11)
Re: Planet Postgres and the curse of AI

Hi,

As someone who has taken days to publish each blog post, after many rounds
of reviews, corrections, and edits to make it as informative and polished as
possible, I find it slightly frustrating to see AI-generated content,
especially when it misleads readers.

However, I agree with Laurenz that it is impossible to prove whether a post
was written by AI or by a human. Detection tools make mistakes themselves and
might wrongly flag a human-written blog as AI-generated (and I know such
detection is difficult to implement).

I see moderators spending time reviewing content and accepting it, or
warning the author if it is not related to Postgres. AI could be adopted to
help score whether an article is related to Postgres and to decline unrelated
submissions from the blog feed. But it is nearly impossible to use AI, or any
other strategy, to reliably identify whether a post was written by AI or by a
human.

People may also use AI-generated images in their blogs, and those may be
meaningful for the article. Is the policy only about the content, or also
about the images? It could get too complicated to implement such rules.

Ultimately, humans do make mistakes, and we shouldn't discourage people by
assuming that a given mistake was made by AI.

On Tue, Jul 23, 2024 at 11:51 AM Laurenz Albe <laurenz.albe@cybertec.at>
wrote:

On Tue, 2024-07-23 at 10:38 -0400, Greg Sabino Mullane wrote:

Why not say that authors who repeatedly post grossly counterfactual or
misleading content can be banned?

Perhaps as you hint at, we need a policy to not just discourage

AI-generated

things, but also wrong/misleading things in general

I have been known to make mistakes in my blogs...
We shouldn't discourage people who happen to blog something wrong.
That's why I used strong verbiage like "grossly counterfactual".

Yours,
Laurenz Albe

--
Regards,
Avinash Vallarapu

#13Greg Sabino Mullane
greg@turnstep.com
In reply to: Avinash Kumar (#12)
Re: Planet Postgres and the curse of AI

On Tue, Jul 23, 2024 at 12:45 PM Avinash Vallarapu <
avinash.vallarapu@gmail.com> wrote:

However, I do agree with Laurenz that it is impossible to prove whether
it is written by AI or a human.
AI can make mistakes and it might mistakenly point out that a blog is
written by AI (which I know is difficult to implement).

Right - I am not interested in "proving" things, but I think a policy to
discourage overuse of AI is warranted.

People may also use AI generated Images in their blogs, and they may be

meaningful for their article.
Is it only the content or also the images ? It might get too complicated
while implementing some rules.

Only the content; the images are perfectly fine. Even expected, these days.

Ultimately, Humans do make mistakes and we shouldn't discourage people
assuming it is AI that made that mistake.

Humans make mistakes. AI confidently hallucinates.

#14Greg Sabino Mullane
greg@turnstep.com
In reply to: Laurenz Albe (#9)
Re: Planet Postgres and the curse of AI

On Fri, Jul 19, 2024 at 3:22 AM Laurenz Albe <laurenz.albe@cybertec.at>
wrote:

Why not say that authors who repeatedly post grossly counterfactual or
misleading content can be banned?

I like this, and feel we are getting closer. How about:

"Posts should be technically and factually correct. AI should be used for
minor editing, not primary generation"

(wordsmithing needed)

Cheers,
Greg

#15Justin Clift
justin@postgresql.org
In reply to: Greg Sabino Mullane (#14)
Re: Planet Postgres and the curse of AI

On 2024-08-20 22:44, Greg Sabino Mullane wrote:

On Fri, Jul 19, 2024 at 3:22 AM Laurenz Albe <laurenz.albe@cybertec.at>
wrote:

Why not say that authors who repeatedly post grossly counterfactual or
misleading content can be banned?

I like this, and feel we are getting closer. How about:

"Posts should be technically and factually correct. AI should be used for
minor editing, not primary generation"

Sounds pretty sensible. :)

+ Justin

#16Bruce Momjian
bruce@momjian.us
In reply to: Justin Clift (#15)
Re: Planet Postgres and the curse of AI

On Wed, Aug 21, 2024 at 02:19:22AM +1000, Justin Clift wrote:

On 2024-08-20 22:44, Greg Sabino Mullane wrote:

On Fri, Jul 19, 2024 at 3:22 AM Laurenz Albe <laurenz.albe@cybertec.at>
wrote:

Why not say that authors who repeatedly post grossly counterfactual or
misleading content can be banned?

I like this, and feel we are getting closer. How about:

"Posts should be technically and factually correct. AI should be used for
minor editing, not primary generation"

Sounds pretty sensible. :)

Agreed. Honestly, some of the AI-generated content is so bad that if you see
something you suspect is AI-generated, you can just ask the author what they
meant by that paragraph, and they will not be able to answer.

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com

Only you can decide what is important to you.

#17Robert Treat
xzilla@users.sourceforge.net
In reply to: Greg Sabino Mullane (#13)
Re: Planet Postgres and the curse of AI

On Tue, Aug 20, 2024 at 8:33 AM Greg Sabino Mullane <htamfids@gmail.com> wrote:

On Tue, Jul 23, 2024 at 12:45 PM Avinash Vallarapu <avinash.vallarapu@gmail.com> wrote:

However, I do agree with Laurenz that it is impossible to prove whether it is written by AI or a human.
AI can make mistakes and it might mistakenly point out that a blog is written by AI (which I know is difficult to implement).

Right - I am not interested in "proving" things, but I think a policy to discourage overuse of AI is warranted.

People may also use AI generated Images in their blogs, and they may be meaningful for their article.
Is it only the content or also the images ? It might get too complicated while implementing some rules.

Only the content, the images are perfectly fine. Even expected, these days.

Ultimately, Humans do make mistakes and we shouldn't discourage people assuming it is AI that made that mistake.

Humans make mistakes. AI confidently hallucinates.

I think this is a key point, and one that we could focus on for
purposes of discouragement. I.e., "Blogs that are found to repeatedly
post incorrect information and/or AI-style hallucinations may be
restricted from contributing to the Planet PostgreSQL feed. This will be
determined on a case-by-case basis." While it is likely impossible to
come up with a set of rules that will satisfy some of the more
legalistic folks among us, this would be a simple warning that would
at least encourage folks to make sure they aren't posting bad
information, and it leaves a door open for enforcement if needed. And yes,
this assumes that the folks running Planet will enforce if needed,
though I don't think it requires heavy policing at this point.

Robert Treat
https://xzilla.net

#18John the Scott
jmscott@setspace.com
In reply to: Robert Treat (#17)
Re: Planet Postgres and the curse of AI

Posts should be technically and factually correct

agreed, period. no need to qualify how the nonsense was created.

-john

On Thu, Aug 22, 2024 at 4:13 PM Robert Treat <rob@xzilla.net> wrote:

On Tue, Aug 20, 2024 at 8:33 AM Greg Sabino Mullane <htamfids@gmail.com> wrote:

On Tue, Jul 23, 2024 at 12:45 PM Avinash Vallarapu <avinash.vallarapu@gmail.com> wrote:

However, I do agree with Laurenz that it is impossible to prove whether it is written by AI or a human.
AI can make mistakes and it might mistakenly point out that a blog is written by AI (which I know is difficult to implement).

Right - I am not interested in "proving" things, but I think a policy to discourage overuse of AI is warranted.

People may also use AI generated Images in their blogs, and they may be meaningful for their article.
Is it only the content or also the images ? It might get too complicated while implementing some rules.

Only the content, the images are perfectly fine. Even expected, these days.

Ultimately, Humans do make mistakes and we shouldn't discourage people assuming it is AI that made that mistake.

Humans make mistakes. AI confidently hallucinates.

I think this is a key point, and one that we could focus on for
purposes of discouragement. I.e., "Blogs that are found to repeatedly
post incorrect information and/or AI style hallucinations may be
restricted from contributing to the planet postgres feed. This will be
determined on a case by case basis." While it is likely impossible to
come up with a set of rules that will satisfy some of the more
legalistic folks among us, this would be a simple warning that would
at least encourage folks to make sure they aren't posting bad
information and leave a door open for enforcement if needed. And yes,
this assumes that the folks running planet will enforce if needed,
though I don't think it requires heavy policing at this point.

Robert Treat
https://xzilla.net

--
Fast is fine, But accuracy is final.
You must learn to be slow in a hurry.
- Wyatt Earp