making a pg store of 'multiple checkboxes' efficient

Started by Jonathan Vanascoalmost 19 years ago3 messagesgeneral
Jump to latest
#1Jonathan Vanasco
postgres@2xlp.com

I have a large table (5M items current) that is projected to grow at
the rate of 2.5M a month looking at the current usage trends.

the table represents some core standardized user account attributes ,
while text heavy / unstandardized info lies in other tables.

my issue is this: i'm adding in a 'multiple checkboxes' style field,
and trying to weigh the options for db representation against one
another.

my main concern is speed - this is read heavy , but I am worried to
some degree about disk space (not sure where disk space fits in with
pg, when I used to use mysql the simplest schema change could
drastically effect the disk size though ).

that said , these are my current choices:

option a
bitwise operations
and/or operations to condense checkboxes into searchable field
pro:
super small
fits in 1 table
con:
could not find any docs on the speed of bitwise searches in pg

option b
secondary table with bools
create table extends( account_id , option_1_bool , option_2_bool )
pro:
1 join , fast search on bools
con:
PITA to maintain/extend

option c
mapping table
create table mapping ( account_id , option_id )
pro:
extensible
con:
slow speed - needs multiple joins , records all over

I'd personally lean towards option a or b . anyone have suggestions ?

thanks.

// Jonathan Vanasco

| - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
- - - - - - - - - - - - - - - - - - -
| SyndiClick.com
| - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
- - - - - - - - - - - - - - - - - - -
| FindMeOn.com - The cure for Multiple Web Personality Disorder
| Web Identity Management and 3D Social Networking
| - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
- - - - - - - - - - - - - - - - - - -
| RoadSound.com - Tools For Bands, Stuff For Fans
| Collaborative Online Management And Syndication Tools
| - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
- - - - - - - - - - - - - - - - - - -

#2Alban Hertroys
alban@magproductions.nl
In reply to: Jonathan Vanasco (#1)
Re: making a pg store of 'multiple checkboxes' efficient

Jonathan Vanasco wrote:

I have a large table (5M items current) that is projected to grow at the
rate of 2.5M a month looking at the current usage trends.

the table represents some core standardized user account attributes ,
while text heavy / unstandardized info lies in other tables.

my issue is this: i'm adding in a 'multiple checkboxes' style field, and
trying to weigh the options for db representation against one another.

my main concern is speed - this is read heavy , but I am worried to some
degree about disk space (not sure where disk space fits in with pg, when
I used to use mysql the simplest schema change could drastically effect
the disk size though ).

that said , these are my current choices:

option a
bitwise operations
and/or operations to condense checkboxes into searchable field
pro:
super small
fits in 1 table
con:
could not find any docs on the speed of bitwise searches
in pg

I got some good results using bitwise operations on an integer column.
An index on such a column helps a great deal of course. What type of
integer you need depends on how many booleans you (expect to) have.

My operations were like "WHERE (value & 80) = 80" to match against the
7th and 5th bits, with value an int4 column. Query times are in the tens
of milliseconds range.

Admittedly I only got about that many records through joins with other
tables I needed (meaning that my result set may at some point in time
have been around as big as your data set), so the planner may have
reduced the number of bitwise operations significantly for me.

The actual number of bitwise values to compare was around 40,000
integers * 25 mask values, but that got joined with over 1M records from
another result set.

A bitwise operator on a single column should (theoretically) have less
overhead than integer/boolean operators on multiple columns. Computers
are good at bitwise operations, after all.

--
Alban Hertroys
alban@magproductions.nl

magproductions b.v.

T: ++31(0)534346874
F: ++31(0)534346876
M:
I: www.magproductions.nl
A: Postbus 416
7500 AK Enschede

// Integrate Your World //

#3Jonathan Vanasco
postgres@2xlp.com
In reply to: Alban Hertroys (#2)
Re: making a pg store of 'multiple checkboxes' efficient

On Apr 18, 2007, at 4:28 AM, Alban Hertroys wrote:

I got some good results using bitwise operations on an integer column.
An index on such a column helps a great deal of course. What type of
integer you need depends on how many booleans you (expect to) have.

My operations were like "WHERE (value & 80) = 80" to match against the
7th and 5th bits, with value an int4 column. Query times are in the
tens
of milliseconds range.

Admittedly I only got about that many records through joins with other
tables I needed (meaning that my result set may at some point in time
have been around as big as your data set), so the planner may have
reduced the number of bitwise operations significantly for me.

The actual number of bitwise values to compare was around 40,000
integers * 25 mask values, but that got joined with over 1M records
from
another result set.

A bitwise operator on a single column should (theoretically) have less
overhead than integer/boolean operators on multiple columns. Computers
are good at bitwise operations, after all.

Computers are good at bitwise operations, but software often has
scary implementations :)

thanks for the input. I'll definitely go this route. It was my
first thought, but there is almost no documentation out there for
this type of storage.

// Jonathan Vanasco

| - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
- - - - - - - - - - - - - - - - - - -
| SyndiClick.com
| - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
- - - - - - - - - - - - - - - - - - -
| FindMeOn.com - The cure for Multiple Web Personality Disorder
| Web Identity Management and 3D Social Networking
| - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
- - - - - - - - - - - - - - - - - - -
| RoadSound.com - Tools For Bands, Stuff For Fans
| Collaborative Online Management And Syndication Tools
| - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
- - - - - - - - - - - - - - - - - - -