Performance of JSON type in postgres
Hi,
It's Postgres 15+, and I need guidance on the JSON types (for both an on-premise
vanilla Postgres installation and AWS cloud-based RDS/Aurora Postgres
installations).
I have never worked with the JSON data types before, but now in one of our
scenarios the team wants to use them, so I want to understand the pros/cons
of JSONB vs JSON in Postgres. Is there a specific scenario in which one
should be used over the other? Are there any storage or performance
thresholds or limitations? Some teammates even suggest simply storing it in
a string data type. So I want to understand the experts' opinion on this,
which I believe will be crucial during design itself.
Regards
Veem
On 7/14/25 12:51, veem v wrote:
It is spelled out here:
https://www.postgresql.org/docs/current/datatype-json.html
--
Adrian Klaver
adrian.klaver@aklaver.com
On Mon, Jul 14, 2025 at 12:54 PM Adrian Klaver <adrian.klaver@aklaver.com>
wrote:
On 7/14/25 12:51, veem v wrote:
So I want to
understand the experts' opinion on this which I believe will be
crucial during design itself.
It is spelled out here:
I've taken to heart the main takeaway from that page:
"In general, most applications should prefer to store JSON data as jsonb,
unless there are quite specialized needs, such as legacy assumptions about
ordering of object keys."
David J.
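The key-ordering caveat in that quote is easy to see directly in psql; a quick illustrative sketch (the literal here is made up):

```sql
-- json stores the input text verbatim; jsonb normalizes it on input.
SELECT '{"b": 1, "a": 2, "a": 3}'::json  AS as_json,
       '{"b": 1, "a": 2, "a": 3}'::jsonb AS as_jsonb;
-- as_json  : {"b": 1, "a": 2, "a": 3}   (text preserved, duplicate key kept)
-- as_jsonb : {"a": 3, "b": 1}           (keys reordered, last duplicate wins)
```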
On Mon, Jul 14, 2025 at 2:01 PM David G. Johnston <
david.g.johnston@gmail.com> wrote:
I've taken to heart the main takeaway from that page:
"In general, most applications should prefer to store JSON data as jsonb,
unless there are quite specialized needs, such as legacy assumptions about
ordering of object keys."
I don't think the documentation is accurate at all, unless one of those
specialized needs is to 'be faster'. json serialization is more than 2x
faster based on simple testing (see below). This is absolutely not a
trivial difference.
I would say, use json for serialization, use jsonb for data storage, unless
the precise structure of the input document is important.
merlin
leaselock_iam@leaselock_prod=> explain analyze select json_agg(l) from (
select l from llcore.lease l limit 10000) q;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=405.52..405.53 rows=1 width=32) (actual
time=69.043..69.048 rows=1 loops=1)
-> Limit (cost=0.00..380.52 rows=10000 width=247) (actual
time=0.017..9.764 rows=10000 loops=1)
-> Seq Scan on lease l (cost=0.00..100383.89 rows=2638089
width=247) (actual time=0.016..8.831 rows=10000 loops=1)
Planning Time: 0.109 ms
Execution Time: 69.088 ms
(5 rows)
Time: 160.560 ms
leaselock_iam@leaselock_prod=> explain analyze select jsonb_agg(l) from (
select l from llcore.lease l limit 10000) q;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=405.52..405.53 rows=1 width=32) (actual
time=146.139..146.141 rows=1 loops=1)
-> Limit (cost=0.00..380.52 rows=10000 width=247) (actual
time=0.017..20.837 rows=10000 loops=1)
-> Seq Scan on lease l (cost=0.00..100383.89 rows=2638089
width=247) (actual time=0.016..19.975 rows=10000 loops=1)
Planning Time: 0.108 ms
Execution Time: 152.277 ms
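These numbers come from Merlin's private llcore.lease table; a self-contained variant of the same comparison, using a synthetic table built with generate_series (all names here are invented), is:

```sql
-- Throwaway 10,000-row table with a few mixed-type columns.
CREATE TEMP TABLE bench AS
SELECT g AS id, md5(g::text) AS payload, now() AS created_at
FROM generate_series(1, 10000) g;

-- Compare text (json) vs binary (jsonb) serialization of the same rows.
EXPLAIN ANALYZE SELECT json_agg(b)  FROM bench b;
EXPLAIN ANALYZE SELECT jsonb_agg(b) FROM bench b;
```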
On Tue, 15 Jul 2025 at 23:02, Merlin Moncure <mmoncure@gmail.com> wrote:
I would say, use json for serialization, use jsonb for data storage, unless
the precise structure of the input document is important.
Thank you.
I tested below for sample data: https://dbfiddle.uk/6P7sjL22
I see loading or serialization is a lot slower (roughly twice as slow) with
JSONB as compared to JSON, whereas storage looks more efficient with JSONB,
and read performance on nested fields is 7-8 times slower with JSON as
compared to JSONB (and of course index support makes JSONB a better choice
there). Hope I am testing it correctly.
So I am a bit confused here. Also, one of our use cases is, along with
persisting this data and querying it in the Postgres database, we are also
going to move it from Postgres (the upstream OLTP system) to a downstream
OLAP system in Snowflake, which has data types like Variant and Varchar. So,
will it make a significant difference whether we store it as JSON or JSONB
in Postgres, i.e. the source/upstream database?
On 7/19/25 13:39, veem v wrote:
I tested below for sample data: https://dbfiddle.uk/6P7sjL22
I thought you answered that with your tests above? At least for the
Postgres end. As to the Snowflake end you will need to do comparable
tests for fetching the data from Postgres and transforming it.
--
Adrian Klaver
adrian.klaver@aklaver.com
On Sun, 20 Jul 2025 at 02:29, Adrian Klaver <adrian.klaver@aklaver.com>
wrote:
Thank you Adrian.
Yes, I will try to test the load from Postgres to Snowflake to see if either
format makes a difference in that situation (mainly considering that JSONB
seems to be Postgres-native only).
Additionally, I am unable to test these upfront, but a few teammates are
claiming the points below. Are they really true? I would like to hear from
the experts here:
1) The lack of detailed statistics on data distribution within JSONB columns
can hinder the query planner from making optimal choices, sometimes leading
to slower execution or a reliance on sequential scans even when indexes
exist. Storing extensive or deeply nested structures within a single JSONB
document can lead to document bloat.
2) Loss of Formatting and Order: The binary format of JSONB doesn't preserve
the original order of keys, whitespace, or duplicate keys in the JSON input.
3) Lack of Type Safety and Schema Enforcement: JSONB provides no inherent
schema or type validation. This means you can easily insert inconsistent
data types for the same key across different rows, making data management
and querying challenging.
4) No Native Foreign Key Support: You cannot directly define foreign key
constraints within a JSONB column to enforce referential integrity with
other tables.
Regards
Veem
On Sat, Jul 19, 2025 at 5:19 PM veem v <veema0000@gmail.com> wrote:
Additionally I am unable to test upfront, but few teammates are saying
below. Are these really true?
The answer to all of these questions is: normalize your data structures.
--
Death to <Redacted>, and butter sauce.
Don't boil me, I'm still alive.
<Redacted> lobster!
On 7/19/25 14:19, veem v wrote:
Additionally I am unable to test upfront, but few teammates are saying
below. Are these really true?
Again as I mentioned at the beginning of this thread many of the answers
can be found here:
https://www.postgresql.org/docs/current/datatype-json.html
1)The lack of detailed statistics on data distribution within JSONB
columns can hinder the query planner from making optimal choices,
sometimes leading to slower execution or a reliance on sequential scans
even when indexes exist. Storing extensive or deeply nested structures
within a single JSONB document can lead to document bloat.
8.14.4. jsonb Indexing
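For point 1 specifically, the indexing that section covers looks roughly like this in practice (table and key names are invented for the sketch):

```sql
CREATE TABLE events (id bigserial PRIMARY KEY, doc jsonb NOT NULL);

-- A GIN index supports containment (@>) and key-existence (?) lookups.
CREATE INDEX events_doc_gin ON events USING GIN (doc);
SELECT count(*) FROM events WHERE doc @> '{"status": "active"}';

-- An expression index on one extracted key also makes ANALYZE gather
-- real per-key statistics, addressing the "no statistics" concern.
CREATE INDEX events_status_idx ON events ((doc->>'status'));
```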
2)Loss of Formatting and Order: The binary format of JSONB doesn't
preserve the original order of keys, whitespace, or duplicate keys in
the JSON input.
"By contrast, jsonb does not preserve white space, does not preserve the
order of object keys, and does not keep duplicate object keys. If
duplicate keys are specified in the input, only the last value is kept."
3)Lack of Type Safety and Schema Enforcement: JSONB provides no inherent
schema or type validation. This means you can easily insert inconsistent
data types for the same key across different rows, making data
management and querying challenging.
"JSON data types are for storing JSON (JavaScript Object Notation) data,
as specified in RFC 7159. Such data can also be stored as text, but the
JSON data types have the advantage of enforcing that each stored value
is valid according to the JSON rules. "
There is no schema validation.
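True, but an ordinary CHECK constraint over extracted values can recover some of that validation where it matters; a minimal sketch, with hypothetical names:

```sql
CREATE TABLE products (
    id  bigserial PRIMARY KEY,
    doc jsonb NOT NULL,
    -- CASE guarantees the cast only runs when "price" really is a number.
    CONSTRAINT price_is_positive_number CHECK (
        CASE WHEN jsonb_typeof(doc->'price') = 'number'
             THEN (doc->>'price')::numeric > 0
             ELSE false
        END)
);
-- Fails the constraint: "price" is a string, not a number.
INSERT INTO products (doc) VALUES ('{"price": "cheap"}');
```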
4)No Native Foreign Key Support: You cannot directly define foreign key
constraints within a JSONB column to enforce referential integrity with
other tables.
Table 8.23. JSON Primitive Types and Corresponding PostgreSQL Types
If you look at that you will see that the JSON(B) types do not cover the
range that Postgres types do, making it difficult to map directly to
Postgres tables even if FKs were allowed.
To me it looks like you really need to store the data in defined
Postgres tables.
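A common middle ground, consistent with that advice, is to promote the keys that need relational guarantees into real columns and keep only the loosely structured remainder as jsonb; a sketch with invented names:

```sql
CREATE TABLE customers (id bigint PRIMARY KEY);

CREATE TABLE orders (
    id          bigserial PRIMARY KEY,
    customer_id bigint NOT NULL REFERENCES customers(id),  -- real FK (point 4)
    placed_at   timestamptz NOT NULL,                      -- real type (point 3)
    extras      jsonb                                      -- free-form remainder
);
```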
--
Adrian Klaver
adrian.klaver@aklaver.com