pgvector as standard PostgreSQL feature?
Hello,
I am looking at the pgvector, pgvectorscale, and pgai extensions.
Other DB engines support built-in vector types.
Is there a plan to get pgvector's types (vector, halfvec, sparsevec, bit) implemented as native built-in data types like json/jsonb ?
Side note: I have some doubts about these type names, especially "bit" ... why not "bitvec"?
Seb
On Mar 19, 2025, at 07:47, Sebastien Flaesch <sebastien.flaesch@4js.com> wrote:
Is there a plan to get pgvector's types (vector, halfvec, sparsevec, bit) implemented as native built-in data types like json/jsonb ?
(I'm speaking just for myself here.) I would not base any plans on this functionality being available in the PostgreSQL core in the near future (and by "near future," I mean the next five years).
1. You list three different extensions with overlapping functionality, and that's a good sign that there isn't consensus on what the features that would be offered in core should be.
2. Adding a type to the core distribution (or even to contrib/) creates a maintenance burden on the core developers, and that's not something assumed lightly. Once a type is in core, it (almost) never can be removed, and the more specialized the type and detailed the implementation, the greater the risk that the developers who know and care about it won't be available in the future. Search the archives for a discussion of the "money" type for what happens when a type added to core starts becoming ill-supported... and "money" isn't anywhere near as complex as vector functionality.
3. PostgreSQL is designed to have a rich ecosystem of extensions. The ability to add this kind of functionality in an extension is exactly what distinguishes PostgreSQL from many other RDBMS systems. There's no burning need to add functionality like this to core.
It is true that hosted environments take time to adopt new extensions (although AWS RDS has supported pgvector for nearly two years now), but that's not in itself a reason to move things into core.
Side note: I have some doubts about these type names, especially "bit" ... why not "bitvec"?
BIT and BIT VARYING are the SQL standard names for these types.
Got it, makes total sense.
So pgvector etc. will probably remain an extension for a while.
Thanks for the note about the BIT type.
I had missed that it's a standard built-in type.
Seb
________________________________
From: Christophe Pettus <xof@thebuild.com>
Sent: Wednesday, March 19, 2025 9:19 AM
To: Sebastien Flaesch <sebastien.flaesch@4js.com>
Cc: pgsql-general@postgresql.org <pgsql-general@postgresql.org>
Subject: Re: pgvector as standard PostgreSQL feature?
On Wed, Mar 19, 2025 at 3:20 AM Christophe Pettus <xof@thebuild.com> wrote:
On Mar 19, 2025, at 07:47, Sebastien Flaesch <sebastien.flaesch@4js.com>
wrote:
2. Adding a type to the core distribution (or even to contrib/) creates a
maintenance burden on the core developers, and that's not something assumed
lightly. Once a type is in core, it (almost) never can be removed, and the
more specialized the type and detailed the implementation, the greater the
risk that the developers who know and care about it won't be available in
the future. Search the archives for a discussion of the "money" type for
what happens when a type added to core starts becoming ill-supported... and
"money" isn't anywhere near as complex as vector functionality.
3. PostgreSQL is designed to have a rich ecosystem of extensions. The
ability to add this kind of functionality in an extension is exactly what
distinguishes PostgreSQL from many other RDBMS systems. There's no burning
need to add functionality like this to core.
It is true that hosted environments take time to adopt new extensions
(although AWS RDS has supported pgvector for nearly two years now), but
that's not in itself a reason to move things into core.
Managed offerings are the norm now, so there really is a big
difference between core and a non-core extension, unless your extension is
popular enough (as pgvector is) to be supported across the major providers
or only requires SQL to deploy. Your point about core is valid, but
there is definitely a squeeze that is preventing ecosystem expansion.
I don't know what the solution is, though.
merlin