[PATCH][DOC][MINOR] Fix incorrect lexeme limit in textsearch docs

Started by Dharin Shah, 15 days ago, 3 messages
#1 Dharin Shah
dharinshah95@gmail.com
1 attachment(s)

Hello,

Here is a minor doc patch for this page:
https://www.postgresql.org/docs/current/textsearch-limitations.html
specifically this line:

*- The number of lexemes must be less than 2^64*

The docs claim that the number of lexemes must be less than 2^64, but no
2^64 check exists in the code. The actual constraint is the 1 MB total
storage limit (MAXSTRPOS).

From src/include/tsearch/ts_type.h:

#define MAXSTRPOS ( (1<<20) - 1)   /* 1,048,575 bytes */

typedef struct {
    int32 size;   /* number of lexemes */
    ...
} TSVectorData;

The attached patch:
- Removes the incorrect 2^64 claim
- Clarifies this means "distinct lexemes in a single tsvector value"

Thanks,
Dharin

Attachments:

0001-docs-Fix-incorrect-tsvector-lexeme-limit-in-textsear.patch (application/octet-stream)
From 0e48eda155f5e80b28c06e2a27c7efb884a6ccee Mon Sep 17 00:00:00 2001
From: Dharin Shah <8616130+Dharin-shah@users.noreply.github.com>
Date: Sat, 27 Dec 2025 17:10:38 +0100
Subject: [PATCH] docs: Fix incorrect tsvector lexeme limit in textsearch.sgml

The documentation incorrectly stated that the number of lexemes must be
less than 2^64. The actual constraint is the 1 MB storage limit for the
total tsvector data. This clarifies that the limit applies to distinct
lexemes in a single tsvector value.
---
 doc/src/sgml/textsearch.sgml | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/doc/src/sgml/textsearch.sgml b/doc/src/sgml/textsearch.sgml
index d20484cb232..762f7024664 100644
--- a/doc/src/sgml/textsearch.sgml
+++ b/doc/src/sgml/textsearch.sgml
@@ -3998,9 +3998,8 @@ Parser: "pg_catalog.default"
      less than 1 megabyte</para>
     </listitem>
     <listitem>
-     <!-- TODO: number of lexemes in what?  This is unclear -->
-     <para>The number of lexemes must be less than
-     2<superscript>64</superscript></para>
+     <para>The number of distinct lexemes in a single <type>tsvector</type>
+     value is constrained by the 1 megabyte total storage limit (see above)</para>
     </listitem>
     <listitem>
      <para>Position values in <type>tsvector</type> must be greater than 0 and
-- 
2.39.3 (Apple Git-146)

#2 Dharin Shah
dharinshah95@gmail.com
In reply to: Dharin Shah (#1)
Re: [PATCH][DOC][MINOR] Fix incorrect lexeme limit in textsearch docs

Hello,

Gentle ping on the textsearch docs patch. Happy to address any feedback.

Thanks,
Dharin

On Sat, Dec 27, 2025 at 10:09 PM Dharin Shah <dharinshah95@gmail.com> wrote:


#3 surya poondla
suryapoondla4@gmail.com
In reply to: Dharin Shah (#1)
Re: [PATCH][DOC][MINOR] Fix incorrect lexeme limit in textsearch docs

Hi Dharin,

I looked at your patch, and it looks good.

In the code, I couldn’t find any 2^64 bound on the lexeme count, so
removing that makes sense.
The added sentence about distinct lexeme count seems to overlap with the
existing description of tsvector limits, so I’m not sure it adds much new
information.

-Surya Poondla
