JSONB Array of Strings (with GIN index) versus Split Rows (B-Tree Index)

Started by Syed Jafriabout 7 years ago3 messagesgeneral

syed.jafri2@ucalgary.ca

about 7 years ago

I have a database which stores receiver to indicate which account the data relates to. This has led to tons of duplication of data, as one set of data may create 3 separate rows, where the only difference is the receiver column.

|---------------------|------------------|---------------------|------------------|
| Receiver | Event | Date | Location |
|---------------------|------------------|---------------------|------------------|
| Alpha | 3 | 12 | USA |
|---------------------|------------------|---------------------|------------------|
| Bravo | 3 | 12 | USA |
|---------------------|------------------|---------------------|------------------|
| Charlie | 3 | 12 | USA |
|---------------------|------------------|---------------------|------------------|

While redesigning the database, I have considered using an array with a GIN index instead of the current B-Tree index on receiver. My proposed new table would look like this:
|-------------------------------|--------------|------------|-------------------|
| Receivers | Event | Date | Location |
|-------------------------------|--------------|------------|-------------------|
| ["Alpha", "Bravo", "Charlie"] | 3 | 12 | USA |
|-------------------------------|--------------|------------|-------------------|

More Information:
· Receiver names are of the type (a-z, 1-5, .)
· 95% of all queries currently look like this: SELECT * FROM table WHERE Receiver = Alpha, with the new format this would be SELECT * FROM table WHERE receivers @> '"Alpha"'::jsonb;
· The table currently contains over 4 billion rows (with duplication) and the new proposed schema would cut it down to under 2 billion rows.
·
Question:
1. Does it make more sense to use Postgres Native Text Array?
2. Would a jsonb_path_ops GIN index on receivers make sense here?
3. Which option is more efficient? Which is faster?

Ian Zimmerman

itz@very.loosely.org

about 7 years ago

In reply to: Syed Jafri (#1)

Re: JSONB Array of Strings (with GIN index) versus Split Rows (B-Tree Index)

On 2019-02-04 05:34, Syed Jafri wrote:

|---------------------|------------------|---------------------|------------------|
| Receiver | Event | Date | Location |
|---------------------|------------------|---------------------|------------------|
| Alpha | 3 | 12 | USA |
|---------------------|------------------|---------------------|------------------|
| Bravo | 3 | 12 | USA |
|---------------------|------------------|---------------------|------------------|
| Charlie | 3 | 12 | USA |
|---------------------|------------------|---------------------|------------------|

Please forgive a naive question, but doesn't this simply cry out for a
normalization step?

EventID Date Location
3 12 USA

Receiver EventID
Alpha 3
Bravo 3
Charlie 3

If you want to use JSON by any means necessary, perhaps a simpler
key/value store is more appropriate than a relational DB like Postgres?

--
Please don't Cc: me privately on mailing lists and Usenet,
if you also post the followup to the list or newsgroup.
To reply privately _only_ on Usenet and on broken lists
which rewrite From, fetch the TXT record for no-use.mooo.com.

David G. Johnston

david.g.johnston@gmail.com

about 7 years ago

In reply to: Syed Jafri (#1)

Re: JSONB Array of Strings (with GIN index) versus Split Rows (B-Tree Index)

On Sun, Feb 3, 2019 at 10:35 PM Syed Jafri <syed.jafri2@ucalgary.ca> wrote:

· Receiver names are of the type (a-z, 1-5, .)

· 95% of all queries currently look like this: SELECT * FROM table WHERE Receiver = Alpha, with the new format this would be SELECT * FROM table WHERE receivers @> '"Alpha"'::jsonb;
3. Which option is more efficient? Which is faster?

I'd probably develop a performance test for the model and planned
queries and experiment with:

1. text arrays
2. jsonb arrays
3. partitioning

David J.