Hot to model data in DB (PostgreSQL) for SNMP-like multiple configurations

Started by Alexandru Lazarevabout 7 years ago4 messagesgeneral

alexandru.lazarev@gmail.com

about 7 years ago

I am working on product managing and monitoring Network (NMS-like products).

Product manages configuration of network devices, for now each device has
stored its configuration in simple table - this was the original design.

CREATE TABLE public.configuration(
id integer NOT NULL,
config json NOT NULL,
CONSTRAINT configuration_pkey PRIMARY KEY (id),)

A config looks like:

{
"_id": 20818132,
"type": "Modem",
"data": [{
"oid": "1.3.6.1.4.1.9999.3.10.2.2.25.4.1.40",
"instance": "24",
"value": "null"
},
{
"oid": "1.3.6.1.4.1.9999.3.5.10.1.86",
"instance": "0",
"value": "562"
},
{
"oid": "1.3.6.1.4.1.9999.3.5.10.3.92.4.1",
"instance": "0",
"value": "0"
},
{
"oid": "1.3.6.1.4.1.9999.3.10.2.2.25.4.1.43",
"instance": "24",
"value": "vlan24"
},
{
"oid": "1.3.6.1.4.1.9999.3.10.2.2.25.4.1.43",
"instance": "25",
"value": "vlan25"
}
]}

And there are many plv8 (java script procedural language extension for
PostgreSQL) stored procedures working on bulks of such config, reading some
OIDs, changing them conditionally, removing some of them and adding others,
especially in use cases like: There are some upper-level META-configuration
of different level, which during change have to update all their updated
parameters to all affected leaves configs. An simple test-example (but
without touching 'data' node)

CREATE OR REPLACE FUNCTION public.process_jsonb_plv8()
RETURNS void AS$BODY$
var CFG_TABLE_NAME = "configurations";
var selPlan = plv8.prepare( "select c.config from " + CFG_TABLE_NAME +
" c where c.id = $1", ['int'] );
var updPlan = plv8.prepare( 'update ' + CFG_TABLE_NAME + ' set config
= $1 where id = $2', ['jsonb','int'] );

try {

var ids = plv8.execute('select id from devices');

for (var i = 0; i < ids.length; i++) {
var db_cfg = selPlan.execute(ids[i].id); //Get current json
config from DB
var cfg = db_cfg[0].config;
cfg["key0"] = 'plv8_json'; //-add some dummy key
updPlan.execute(cfg, ids[i].id); //put uopdated JSON config in DB
plv8.elog(NOTICE, "UPDATED = " + ids[i].id);

}} finally {
selPlan.free();
updPlan.free();}
return;$BODY$
LANGUAGE plv8 VOLATILE
COST 100;

For real use-cases plv8 SPs are more complicated, doing FOR-LOOP through
ALL OIDs object of 'data' array, checking if it is looking for and update
value an/or remove it and/or add newer if necessary.

Since number of devices in DB increased from several hundreds to 40K or
even 70K, and number of OID+Instance combinations also increased from
several hundred to ~1K and sometimes up to 10K within a config, we start
facing slowness in bulk (especially global -> update to ALL Devices)
updates/searches.

In order to get rid off FOR LOOP step for each configuration I've converted
data-node from array to object (key-value model), something like :

{
"_id": 20818132,
"type": "Modem",
"data": {
"1.3.6.1.4.1.9999.3.10.2.2.25.4.1.40": {
"24": "null"
},
"1.3.6.1.4.1.9999.3.5.10.1.86": {
"0": "562"
},
"1.3.6.1.4.1.9999.3.5.10.3.92.4.1": {
"0": "0"
},
"1.3.6.1.4.1.9999.3.10.2.2.25.4.1.43": {
"24": "vlan24",
"25": "vlan25"
}
}}

Now in order to get a concrete OID (e.g.
"1.3.6.1.4.1.9999.3.10.2.2.25.4.1.43") and/or its instance I do 1-2 *O(1)*
operations instead *O(n)*. And it become a bit faster. After I've changed
column type from json to jsonb - I've got a lot of memory issues with plv8
stored procedures, so now ideas is:

*What are the best practices to store such data and use cases in DB?*
taking in considerations following: - Bulk and global updates are often
enough (user-done operation) - several times per week and it takes long
time - several minutes, annoying user experience. - Consulting some OIDs
only from concrete config is medium frequency use case - Consulting ALL
devices have some specific OID (SNMP Parameter) settled to a specific value
- medium frequency cases. - Consult (read) a configuration for a specific
device as a whole document - often use case (it is send to device as json
or as converted CSV, it is send in modified json format to other utilities,
etc)

One of suggestion from other oppinions is to move ALL configurations to
simple plain relational table

CREATE TABLE public.configuration_plain(
device_id integer,
oid text,
instance text,
value text)

Looking like

*id*

*oid*

*instance*

*value*

20818132

1.3.6.1.4.1.9999.2.13

VSAT

20818132

1.3.6.1.4.1.9999.3.10.2.2.10.15

20818132

1.3.6.1.4.1.9999.3.10.2.2.10.17

20818132

1.3.6.1.4.1.9999.3.10.2.2.10.18

20818132

1.3.6.1.4.1.9999.3.10.2.2.10.19

20818132

1.3.6.1.4.1.9999.3.10.2.2.10.8.1.1

20818132

1.3.6.1.4.1.9999.3.10.2.2.10.8.1.1

20818132

1.3.6.1.4.1.9999.3.10.2.2.10.8.1.2

vlan24

20818132

1.3.6.1.4.1.9999.3.10.2.2.10.8.1.2

VLAN_25

And now I end with a table of ~33 M rows for 40K devices * (700-900
OID+Instance combinations). Some simple selects and updates (especially if
I add simple indexes on id, oid columns) works faster than JSON (less than
1 sec updating one OID for ALL devices), but on some stored procedures
where I need to do some checks and business logic before manipulating
concrete parameters in configuration - performance decrease again from 10
to 25 seconds in below example with each nee added operation:

CREATE OR REPLACE FUNCTION public.test_update_bulk_configuration_plain_plpg(
sql_condition text, -- something like 'select id from devices'
new_elements text, --collection of OIDs to be Added or Update,
could be JSON Array or comma separated list, containing 1 or more
(100) OIDs
oids_to_delete text --collection of OIDs to Delete
)
RETURNS void AS$BODY$DECLARE
r integer;
cnt integer;
ids int[];
lid int;BEGIN
RAISE NOTICE 'start';
EXECUTE 'SELECT ARRAY(' || sql_condition || ')' into ids;
FOREACH lid IN ARRAY ids
LOOP
-- DELETE
-- Some business logic
-- FOR .. IF .. BEGIN
delete from configuration_plain c where c.oid =
'1.3.6.1.4.1.9999.3.5.10.3.201.1.1' and instance = '10' and
c.device_id = lid;
delete from configuration_plain c where c.oid = 'Other
OID' and instance = 'Index' and c.device_id = lid;
-- other eventual deletes
--END

-- UPDATE
-- Some business logic
-- FOR .. IF .. BEGIN
update configuration_plain c set value = '2' where c.oid =
'1.3.6.1.4.1.9999.3.5.10.3.87' and c.device_id = lid;
update configuration_plain c set value = '2' where c.oid =
'1.3.6.1.4.1.9999.3.5.10.3.201.1.1' and instance = '1' and c.device_id
= lid;
-- other eventual updates
-- END

--INSERT
insert into configuration_plain (id, oid, instance, value)
values (lid,'1.3.6.1.4.1.9999.3.5.10.3.201.1.1', '11', '11');
-- OTHER eventually....
insert into configuration_plain (id, oid, instance, value)
values (lid,'OTHER_OID', 'Idx', 'Value of OID');
END LOOP;
RAISE NOTICE 'end';
RETURN;END$BODY$
LANGUAGE plpgsql VOLATILE
COST 100;

So any best practices and advice on such data and use cases modeling in DB?

Regards,

AlexL

Alban Hertroys

haramrae@gmail.com

about 7 years ago

In reply to: Alexandru Lazarev (#1)

Re: Hot to model data in DB (PostgreSQL) for SNMP-like multiple configurations

Is there a reason not to use a relational model instead of json(b) here? I
think that is in fact considered best practice.

On Fri, 8 Mar 2019 at 15:40, Alexandru Lazarev <alexandru.lazarev@gmail.com>
wrote:

I am working on product managing and monitoring Network (NMS-like
products).

Product manages configuration of network devices, for now each device has
stored its configuration in simple table - this was the original design.

CREATE TABLE public.configuration(
id integer NOT NULL,
config json NOT NULL,
CONSTRAINT configuration_pkey PRIMARY KEY (id),)

A config looks like:

{
"_id": 20818132,
"type": "Modem",
"data": [{
"oid": "1.3.6.1.4.1.9999.3.10.2.2.25.4.1.40",
"instance": "24",
"value": "null"
},
{
"oid": "1.3.6.1.4.1.9999.3.5.10.1.86",
"instance": "0",
"value": "562"
},
{
"oid": "1.3.6.1.4.1.9999.3.5.10.3.92.4.1",
"instance": "0",
"value": "0"
},
{
"oid": "1.3.6.1.4.1.9999.3.10.2.2.25.4.1.43",
"instance": "24",
"value": "vlan24"
},
{
"oid": "1.3.6.1.4.1.9999.3.10.2.2.25.4.1.43",
"instance": "25",
"value": "vlan25"
}
]}

And there are many plv8 (java script procedural language extension for
PostgreSQL) stored procedures working on bulks of such config, reading some
OIDs, changing them conditionally, removing some of them and adding others,
especially in use cases like: There are some upper-level META-configuration
of different level, which during change have to update all their updated
parameters to all affected leaves configs. An simple test-example (but
without touching 'data' node)

CREATE OR REPLACE FUNCTION public.process_jsonb_plv8()
RETURNS void AS$BODY$
var CFG_TABLE_NAME = "configurations";
var selPlan = plv8.prepare( "select c.config from " + CFG_TABLE_NAME + " c where c.id = $1", ['int'] );
var updPlan = plv8.prepare( 'update ' + CFG_TABLE_NAME + ' set config = $1 where id = $2', ['jsonb','int'] );

try {

var ids = plv8.execute('select id from devices');

for (var i = 0; i < ids.length; i++) {
var db_cfg = selPlan.execute(ids[i].id); //Get current json config from DB
var cfg = db_cfg[0].config;
cfg["key0"] = 'plv8_json'; //-add some dummy key
updPlan.execute(cfg, ids[i].id); //put uopdated JSON config in DB
plv8.elog(NOTICE, "UPDATED = " + ids[i].id);

}} finally {
selPlan.free();
updPlan.free();}
return;$BODY$
LANGUAGE plv8 VOLATILE
COST 100;

For real use-cases plv8 SPs are more complicated, doing FOR-LOOP through
ALL OIDs object of 'data' array, checking if it is looking for and update
value an/or remove it and/or add newer if necessary.

Since number of devices in DB increased from several hundreds to 40K or
even 70K, and number of OID+Instance combinations also increased from
several hundred to ~1K and sometimes up to 10K within a config, we start
facing slowness in bulk (especially global -> update to ALL Devices)
updates/searches.

In order to get rid off FOR LOOP step for each configuration I've
converted data-node from array to object (key-value model), something like
:

{
"_id": 20818132,
"type": "Modem",
"data": {
"1.3.6.1.4.1.9999.3.10.2.2.25.4.1.40": {
"24": "null"
},
"1.3.6.1.4.1.9999.3.5.10.1.86": {
"0": "562"
},
"1.3.6.1.4.1.9999.3.5.10.3.92.4.1": {
"0": "0"
},
"1.3.6.1.4.1.9999.3.10.2.2.25.4.1.43": {
"24": "vlan24",
"25": "vlan25"
}
}}

Now in order to get a concrete OID (e.g.
"1.3.6.1.4.1.9999.3.10.2.2.25.4.1.43") and/or its instance I do 1-2 *O(1)*
operations instead *O(n)*. And it become a bit faster. After I've changed
column type from json to jsonb - I've got a lot of memory issues with
plv8 stored procedures, so now ideas is:

*What are the best practices to store such data and use cases in DB?*
taking in considerations following: - Bulk and global updates are often
enough (user-done operation) - several times per week and it takes long
time - several minutes, annoying user experience. - Consulting some OIDs
only from concrete config is medium frequency use case - Consulting ALL
devices have some specific OID (SNMP Parameter) settled to a specific value
- medium frequency cases. - Consult (read) a configuration for a specific
device as a whole document - often use case (it is send to device as json
or as converted CSV, it is send in modified json format to other utilities,
etc)

One of suggestion from other oppinions is to move ALL configurations to
simple plain relational table

CREATE TABLE public.configuration_plain(
device_id integer,
oid text,
instance text,
value text)

Looking like

*id*

*oid*

*instance*

*value*

20818132

1.3.6.1.4.1.9999.2.13

0

VSAT

20818132

1.3.6.1.4.1.9999.3.10.2.2.10.15

0

0

20818132

1.3.6.1.4.1.9999.3.10.2.2.10.17

0

0

20818132

1.3.6.1.4.1.9999.3.10.2.2.10.18

0

1

20818132

1.3.6.1.4.1.9999.3.10.2.2.10.19

0

2

20818132

1.3.6.1.4.1.9999.3.10.2.2.10.8.1.1

24

24

20818132

1.3.6.1.4.1.9999.3.10.2.2.10.8.1.1

25

25

20818132

1.3.6.1.4.1.9999.3.10.2.2.10.8.1.2

24

vlan24

20818132

1.3.6.1.4.1.9999.3.10.2.2.10.8.1.2

25

VLAN_25

And now I end with a table of ~33 M rows for 40K devices * (700-900
OID+Instance combinations). Some simple selects and updates (especially if
I add simple indexes on id, oid columns) works faster than JSON (less than
1 sec updating one OID for ALL devices), but on some stored procedures
where I need to do some checks and business logic before manipulating
concrete parameters in configuration - performance decrease again from 10
to 25 seconds in below example with each nee added operation:

CREATE OR REPLACE FUNCTION public.test_update_bulk_configuration_plain_plpg(
sql_condition text, -- something like 'select id from devices'
new_elements text, --collection of OIDs to be Added or Update, could be JSON Array or comma separated list, containing 1 or more (100) OIDs
oids_to_delete text --collection of OIDs to Delete
)
RETURNS void AS$BODY$DECLARE
r integer;
cnt integer;
ids int[];
lid int;BEGIN
RAISE NOTICE 'start';
EXECUTE 'SELECT ARRAY(' || sql_condition || ')' into ids;
FOREACH lid IN ARRAY ids
LOOP
-- DELETE
-- Some business logic
-- FOR .. IF .. BEGIN
delete from configuration_plain c where c.oid = '1.3.6.1.4.1.9999.3.5.10.3.201.1.1' and instance = '10' and c.device_id = lid;
delete from configuration_plain c where c.oid = 'Other OID' and instance = 'Index' and c.device_id = lid;
-- other eventual deletes
--END

-- UPDATE
-- Some business logic
-- FOR .. IF .. BEGIN
update configuration_plain c set value = '2' where c.oid = '1.3.6.1.4.1.9999.3.5.10.3.87' and c.device_id = lid;
update configuration_plain c set value = '2' where c.oid = '1.3.6.1.4.1.9999.3.5.10.3.201.1.1' and instance = '1' and c.device_id = lid;
-- other eventual updates
-- END

--INSERT
insert into configuration_plain (id, oid, instance, value) values (lid,'1.3.6.1.4.1.9999.3.5.10.3.201.1.1', '11', '11');
-- OTHER eventually....
insert into configuration_plain (id, oid, instance, value) values (lid,'OTHER_OID', 'Idx', 'Value of OID');
END LOOP;
RAISE NOTICE 'end';
RETURN;END$BODY$
LANGUAGE plpgsql VOLATILE
COST 100;

So any best practices and advice on such data and use cases modeling in DB?

Regards,

AlexL

--
If you can't see the forest for the trees,
Cut the trees and you'll see there is no forest.

Alexandru Lazarev

alexandru.lazarev@gmail.com

about 7 years ago

In reply to: Alban Hertroys (#2)

Re: Hot to model data in DB (PostgreSQL) for SNMP-like multiple configurations

For now I do not see the strong reason, but i inherited this project from
other developers,
Originally there was MongoDB and structure was more complex, having SNMP
like nested tables with OID.Instance1.Instance2.instance3 and in JSON it
looked like:
{
"1.3.6.1.4.1.9999.3.10.2.2.25.4.1.43.1": {
"1": {
"24": "vlan24",
"25": "vlan25"
},
"2": {
"24": "127.0.0.1",
"25": "8.8.8.8"
}
}
}

Here we have table in table - How to model this in relational - with
separate tables and JOINs only?
I am not excluding in future I'll have such requirement

the other reason is that devices request their config and some other tools
requests devices configs as a single document/file - this a bit create
overhead for composing document in JSON or XML or CSV format from
relational table (I understand it is doable, but...)

BTW in PG documentation:
"
*8.14.2. Designing JSON documents effectively*

*Representing data as JSON can be considerably more flexible than the
traditional relational data model, which is compelling in environments
where requirements are fluid. It is quite possible for both approaches to
co-exist and complement each other within the same application. However,
even for applications where maximal flexibility is desired, it is still
recommended that JSON documents have a somewhat fixed structure. The
structure is typically unenforced (though enforcing some business rules
declaratively is possible), but having a predictable structure makes it
easier to write queries that usefully summarize a set of "documents"
(datums) in a table.JSON data is subject to the same concurrency-control
considerations as any other data type when stored in a table. Although
storing large documents is practicable, keep in mind that any update
acquires a row-level lock on the whole row. Consider limiting JSON
documents to a manageable size in order to decrease lock contention among
updating transactions. Ideally, JSON documents should each represent an
atomic datum that business rules dictate cannot reasonably be further
subdivided into smaller datums that could be modified independently."
https://www.postgresql.org/docs/9.6/datatype-json.html
<https://www.postgresql.org/docs/9.6/datatype-json.html>*

On Fri, Mar 8, 2019 at 5:15 PM Alban Hertroys <haramrae@gmail.com> wrote:

Show quoted text

Is there a reason not to use a relational model instead of json(b) here? I
think that is in fact considered best practice.

On Fri, 8 Mar 2019 at 15:40, Alexandru Lazarev <
alexandru.lazarev@gmail.com> wrote:

I am working on product managing and monitoring Network (NMS-like
products).

Product manages configuration of network devices, for now each device has
stored its configuration in simple table - this was the original design.

CREATE TABLE public.configuration(
id integer NOT NULL,
config json NOT NULL,
CONSTRAINT configuration_pkey PRIMARY KEY (id),)

A config looks like:

{
"_id": 20818132,
"type": "Modem",
"data": [{
"oid": "1.3.6.1.4.1.9999.3.10.2.2.25.4.1.40",
"instance": "24",
"value": "null"
},
{
"oid": "1.3.6.1.4.1.9999.3.5.10.1.86",
"instance": "0",
"value": "562"
},
{
"oid": "1.3.6.1.4.1.9999.3.5.10.3.92.4.1",
"instance": "0",
"value": "0"
},
{
"oid": "1.3.6.1.4.1.9999.3.10.2.2.25.4.1.43",
"instance": "24",
"value": "vlan24"
},
{
"oid": "1.3.6.1.4.1.9999.3.10.2.2.25.4.1.43",
"instance": "25",
"value": "vlan25"
}
]}

And there are many plv8 (java script procedural language extension for
PostgreSQL) stored procedures working on bulks of such config, reading some
OIDs, changing them conditionally, removing some of them and adding others,
especially in use cases like: There are some upper-level META-configuration
of different level, which during change have to update all their updated
parameters to all affected leaves configs. An simple test-example (but
without touching 'data' node)

CREATE OR REPLACE FUNCTION public.process_jsonb_plv8()
RETURNS void AS$BODY$
var CFG_TABLE_NAME = "configurations";
var selPlan = plv8.prepare( "select c.config from " + CFG_TABLE_NAME + " c where c.id = $1", ['int'] );
var updPlan = plv8.prepare( 'update ' + CFG_TABLE_NAME + ' set config = $1 where id = $2', ['jsonb','int'] );

try {

var ids = plv8.execute('select id from devices');

for (var i = 0; i < ids.length; i++) {
var db_cfg = selPlan.execute(ids[i].id); //Get current json config from DB
var cfg = db_cfg[0].config;
cfg["key0"] = 'plv8_json'; //-add some dummy key
updPlan.execute(cfg, ids[i].id); //put uopdated JSON config in DB
plv8.elog(NOTICE, "UPDATED = " + ids[i].id);

}} finally {
selPlan.free();
updPlan.free();}
return;$BODY$
LANGUAGE plv8 VOLATILE
COST 100;

For real use-cases plv8 SPs are more complicated, doing FOR-LOOP through
ALL OIDs object of 'data' array, checking if it is looking for and update
value an/or remove it and/or add newer if necessary.

Since number of devices in DB increased from several hundreds to 40K or
even 70K, and number of OID+Instance combinations also increased from
several hundred to ~1K and sometimes up to 10K within a config, we start
facing slowness in bulk (especially global -> update to ALL Devices)
updates/searches.

In order to get rid off FOR LOOP step for each configuration I've
converted data-node from array to object (key-value model), something like
:

{
"_id": 20818132,
"type": "Modem",
"data": {
"1.3.6.1.4.1.9999.3.10.2.2.25.4.1.40": {
"24": "null"
},
"1.3.6.1.4.1.9999.3.5.10.1.86": {
"0": "562"
},
"1.3.6.1.4.1.9999.3.5.10.3.92.4.1": {
"0": "0"
},
"1.3.6.1.4.1.9999.3.10.2.2.25.4.1.43": {
"24": "vlan24",
"25": "vlan25"
}
}}

Now in order to get a concrete OID (e.g.
"1.3.6.1.4.1.9999.3.10.2.2.25.4.1.43") and/or its instance I do 1-2
*O(1)* operations instead *O(n)*. And it become a bit faster. After I've
changed column type from json to jsonb - I've got a lot of memory issues
with plv8 stored procedures, so now ideas is:

*What are the best practices to store such data and use cases in DB?*
taking in considerations following: - Bulk and global updates are often
enough (user-done operation) - several times per week and it takes long
time - several minutes, annoying user experience. - Consulting some OIDs
only from concrete config is medium frequency use case - Consulting ALL
devices have some specific OID (SNMP Parameter) settled to a specific value
- medium frequency cases. - Consult (read) a configuration for a specific
device as a whole document - often use case (it is send to device as json
or as converted CSV, it is send in modified json format to other utilities,
etc)

One of suggestion from other oppinions is to move ALL configurations to
simple plain relational table

CREATE TABLE public.configuration_plain(
device_id integer,
oid text,
instance text,
value text)

Looking like

*id*

*oid*

*instance*

*value*

20818132

1.3.6.1.4.1.9999.2.13

0

VSAT

20818132

1.3.6.1.4.1.9999.3.10.2.2.10.15

0

0

20818132

1.3.6.1.4.1.9999.3.10.2.2.10.17

0

0

20818132

1.3.6.1.4.1.9999.3.10.2.2.10.18

0

1

20818132

1.3.6.1.4.1.9999.3.10.2.2.10.19

0

2

20818132

1.3.6.1.4.1.9999.3.10.2.2.10.8.1.1

24

24

20818132

1.3.6.1.4.1.9999.3.10.2.2.10.8.1.1

25

25

20818132

1.3.6.1.4.1.9999.3.10.2.2.10.8.1.2

24

vlan24

20818132

1.3.6.1.4.1.9999.3.10.2.2.10.8.1.2

25

VLAN_25

And now I end with a table of ~33 M rows for 40K devices * (700-900
OID+Instance combinations). Some simple selects and updates (especially if
I add simple indexes on id, oid columns) works faster than JSON (less than
1 sec updating one OID for ALL devices), but on some stored procedures
where I need to do some checks and business logic before manipulating
concrete parameters in configuration - performance decrease again from 10
to 25 seconds in below example with each nee added operation:

CREATE OR REPLACE FUNCTION public.test_update_bulk_configuration_plain_plpg(
sql_condition text, -- something like 'select id from devices'
new_elements text, --collection of OIDs to be Added or Update, could be JSON Array or comma separated list, containing 1 or more (100) OIDs
oids_to_delete text --collection of OIDs to Delete
)
RETURNS void AS$BODY$DECLARE
r integer;
cnt integer;
ids int[];
lid int;BEGIN
RAISE NOTICE 'start';
EXECUTE 'SELECT ARRAY(' || sql_condition || ')' into ids;
FOREACH lid IN ARRAY ids
LOOP
-- DELETE
-- Some business logic
-- FOR .. IF .. BEGIN
delete from configuration_plain c where c.oid = '1.3.6.1.4.1.9999.3.5.10.3.201.1.1' and instance = '10' and c.device_id = lid;
delete from configuration_plain c where c.oid = 'Other OID' and instance = 'Index' and c.device_id = lid;
-- other eventual deletes
--END

-- UPDATE
-- Some business logic
-- FOR .. IF .. BEGIN
update configuration_plain c set value = '2' where c.oid = '1.3.6.1.4.1.9999.3.5.10.3.87' and c.device_id = lid;
update configuration_plain c set value = '2' where c.oid = '1.3.6.1.4.1.9999.3.5.10.3.201.1.1' and instance = '1' and c.device_id = lid;
-- other eventual updates
-- END

--INSERT
insert into configuration_plain (id, oid, instance, value) values (lid,'1.3.6.1.4.1.9999.3.5.10.3.201.1.1', '11', '11');
-- OTHER eventually....
insert into configuration_plain (id, oid, instance, value) values (lid,'OTHER_OID', 'Idx', 'Value of OID');
END LOOP;
RAISE NOTICE 'end';
RETURN;END$BODY$
LANGUAGE plpgsql VOLATILE
COST 100;

So any best practices and advice on such data and use cases modeling in
DB?

Regards,

AlexL

--
If you can't see the forest for the trees,
Cut the trees and you'll see there is no forest.

Rene Romero Benavides

rene.romero.b@gmail.com

about 7 years ago

In reply to: Alexandru Lazarev (#3)

Re: Hot to model data in DB (PostgreSQL) for SNMP-like multiple configurations

Try partitioning your table based on your device_id, that will give you a
considerable boost for queries which the where clause includes it. for 9.6
(that's the one your using right?) there's pg_partman for that kind of
thing, in this case you would partition by ranges, if the id's are
sequential it's pretty straightforward. Any chance of upgrading to a newer
PG version? partitioning becomes native from PG10 onwards, so you don't
have to rely on particular plugins, and there are always significant
performance improvements for several use cases in newer versions (like
improved parallelism).

On Fri, Mar 8, 2019 at 10:40 AM Alexandru Lazarev <
alexandru.lazarev@gmail.com> wrote:

For now I do not see the strong reason, but i inherited this project from
other developers,
Originally there was MongoDB and structure was more complex, having SNMP
like nested tables with OID.Instance1.Instance2.instance3 and in JSON it
looked like:
{
"1.3.6.1.4.1.9999.3.10.2.2.25.4.1.43.1": {
"1": {
"24": "vlan24",
"25": "vlan25"
},
"2": {
"24": "127.0.0.1",
"25": "8.8.8.8"
}
}
}

Here we have table in table - How to model this in relational - with
separate tables and JOINs only?
I am not excluding in future I'll have such requirement

the other reason is that devices request their config and some other tools
requests devices configs as a single document/file - this a bit create
overhead for composing document in JSON or XML or CSV format from
relational table (I understand it is doable, but...)

BTW in PG documentation:
"
*8.14.2. Designing JSON documents effectively*

*Representing data as JSON can be considerably more flexible than the
traditional relational data model, which is compelling in environments
where requirements are fluid. It is quite possible for both approaches to
co-exist and complement each other within the same application. However,
even for applications where maximal flexibility is desired, it is still
recommended that JSON documents have a somewhat fixed structure. The
structure is typically unenforced (though enforcing some business rules
declaratively is possible), but having a predictable structure makes it
easier to write queries that usefully summarize a set of "documents"
(datums) in a table.JSON data is subject to the same concurrency-control
considerations as any other data type when stored in a table. Although
storing large documents is practicable, keep in mind that any update
acquires a row-level lock on the whole row. Consider limiting JSON
documents to a manageable size in order to decrease lock contention among
updating transactions. Ideally, JSON documents should each represent an
atomic datum that business rules dictate cannot reasonably be further
subdivided into smaller datums that could be modified independently."
https://www.postgresql.org/docs/9.6/datatype-json.html
<https://www.postgresql.org/docs/9.6/datatype-json.html>*

On Fri, Mar 8, 2019 at 5:15 PM Alban Hertroys <haramrae@gmail.com> wrote:

Is there a reason not to use a relational model instead of json(b) here?
I think that is in fact considered best practice.

On Fri, 8 Mar 2019 at 15:40, Alexandru Lazarev <
alexandru.lazarev@gmail.com> wrote:

I am working on product managing and monitoring Network (NMS-like
products).

Product manages configuration of network devices, for now each device
has stored its configuration in simple table - this was the original design.

CREATE TABLE public.configuration(
id integer NOT NULL,
config json NOT NULL,
CONSTRAINT configuration_pkey PRIMARY KEY (id),)

A config looks like:

{
"_id": 20818132,
"type": "Modem",
"data": [{
"oid": "1.3.6.1.4.1.9999.3.10.2.2.25.4.1.40",
"instance": "24",
"value": "null"
},
{
"oid": "1.3.6.1.4.1.9999.3.5.10.1.86",
"instance": "0",
"value": "562"
},
{
"oid": "1.3.6.1.4.1.9999.3.5.10.3.92.4.1",
"instance": "0",
"value": "0"
},
{
"oid": "1.3.6.1.4.1.9999.3.10.2.2.25.4.1.43",
"instance": "24",
"value": "vlan24"
},
{
"oid": "1.3.6.1.4.1.9999.3.10.2.2.25.4.1.43",
"instance": "25",
"value": "vlan25"
}
]}

And there are many plv8 (java script procedural language extension for
PostgreSQL) stored procedures working on bulks of such config, reading some
OIDs, changing them conditionally, removing some of them and adding others,
especially in use cases like: There are some upper-level META-configuration
of different level, which during change have to update all their updated
parameters to all affected leaves configs. An simple test-example (but
without touching 'data' node)

CREATE OR REPLACE FUNCTION public.process_jsonb_plv8()
RETURNS void AS$BODY$
var CFG_TABLE_NAME = "configurations";
var selPlan = plv8.prepare( "select c.config from " + CFG_TABLE_NAME + " c where c.id = $1", ['int'] );
var updPlan = plv8.prepare( 'update ' + CFG_TABLE_NAME + ' set config = $1 where id = $2', ['jsonb','int'] );

try {

var ids = plv8.execute('select id from devices');

for (var i = 0; i < ids.length; i++) {
var db_cfg = selPlan.execute(ids[i].id); //Get current json config from DB
var cfg = db_cfg[0].config;
cfg["key0"] = 'plv8_json'; //-add some dummy key
updPlan.execute(cfg, ids[i].id); //put uopdated JSON config in DB
plv8.elog(NOTICE, "UPDATED = " + ids[i].id);

}} finally {
selPlan.free();
updPlan.free();}
return;$BODY$
LANGUAGE plv8 VOLATILE
COST 100;

For real use-cases plv8 SPs are more complicated, doing FOR-LOOP through
ALL OIDs object of 'data' array, checking if it is looking for and update
value an/or remove it and/or add newer if necessary.

Since number of devices in DB increased from several hundreds to 40K or
even 70K, and number of OID+Instance combinations also increased from
several hundred to ~1K and sometimes up to 10K within a config, we start
facing slowness in bulk (especially global -> update to ALL Devices)
updates/searches.

In order to get rid off FOR LOOP step for each configuration I've
converted data-node from array to object (key-value model), something like
:

{
"_id": 20818132,
"type": "Modem",
"data": {
"1.3.6.1.4.1.9999.3.10.2.2.25.4.1.40": {
"24": "null"
},
"1.3.6.1.4.1.9999.3.5.10.1.86": {
"0": "562"
},
"1.3.6.1.4.1.9999.3.5.10.3.92.4.1": {
"0": "0"
},
"1.3.6.1.4.1.9999.3.10.2.2.25.4.1.43": {
"24": "vlan24",
"25": "vlan25"
}
}}

Now in order to get a concrete OID (e.g.
"1.3.6.1.4.1.9999.3.10.2.2.25.4.1.43") and/or its instance I do 1-2
*O(1)* operations instead *O(n)*. And it become a bit faster. After
I've changed column type from json to jsonb - I've got a lot of memory
issues with plv8 stored procedures, so now ideas is:

*What are the best practices to store such data and use cases in DB?*
taking in considerations following: - Bulk and global updates are often
enough (user-done operation) - several times per week and it takes long
time - several minutes, annoying user experience. - Consulting some OIDs
only from concrete config is medium frequency use case - Consulting ALL
devices have some specific OID (SNMP Parameter) settled to a specific value
- medium frequency cases. - Consult (read) a configuration for a specific
device as a whole document - often use case (it is send to device as json
or as converted CSV, it is send in modified json format to other utilities,
etc)

One of suggestion from other oppinions is to move ALL configurations to
simple plain relational table

CREATE TABLE public.configuration_plain(
device_id integer,
oid text,
instance text,
value text)

Looking like

*id*

*oid*

*instance*

*value*

20818132

1.3.6.1.4.1.9999.2.13

0

VSAT

20818132

1.3.6.1.4.1.9999.3.10.2.2.10.15

0

0

20818132

1.3.6.1.4.1.9999.3.10.2.2.10.17

0

0

20818132

1.3.6.1.4.1.9999.3.10.2.2.10.18

0

1

20818132

1.3.6.1.4.1.9999.3.10.2.2.10.19

0

2

20818132

1.3.6.1.4.1.9999.3.10.2.2.10.8.1.1

24

24

20818132

1.3.6.1.4.1.9999.3.10.2.2.10.8.1.1

25

25

20818132

1.3.6.1.4.1.9999.3.10.2.2.10.8.1.2

24

vlan24

20818132

1.3.6.1.4.1.9999.3.10.2.2.10.8.1.2

25

VLAN_25

And now I end with a table of ~33 M rows for 40K devices * (700-900
OID+Instance combinations). Some simple selects and updates (especially if
I add simple indexes on id, oid columns) works faster than JSON (less than
1 sec updating one OID for ALL devices), but on some stored procedures
where I need to do some checks and business logic before manipulating
concrete parameters in configuration - performance decrease again from 10
to 25 seconds in below example with each nee added operation:

CREATE OR REPLACE FUNCTION public.test_update_bulk_configuration_plain_plpg(
sql_condition text, -- something like 'select id from devices'
new_elements text, --collection of OIDs to be Added or Update, could be JSON Array or comma separated list, containing 1 or more (100) OIDs
oids_to_delete text --collection of OIDs to Delete
)
RETURNS void AS$BODY$DECLARE
r integer;
cnt integer;
ids int[];
lid int;BEGIN
RAISE NOTICE 'start';
EXECUTE 'SELECT ARRAY(' || sql_condition || ')' into ids;
FOREACH lid IN ARRAY ids
LOOP
-- DELETE
-- Some business logic
-- FOR .. IF .. BEGIN
delete from configuration_plain c where c.oid = '1.3.6.1.4.1.9999.3.5.10.3.201.1.1' and instance = '10' and c.device_id = lid;
delete from configuration_plain c where c.oid = 'Other OID' and instance = 'Index' and c.device_id = lid;
-- other eventual deletes
--END

-- UPDATE
-- Some business logic
-- FOR .. IF .. BEGIN
update configuration_plain c set value = '2' where c.oid = '1.3.6.1.4.1.9999.3.5.10.3.87' and c.device_id = lid;
update configuration_plain c set value = '2' where c.oid = '1.3.6.1.4.1.9999.3.5.10.3.201.1.1' and instance = '1' and c.device_id = lid;
-- other eventual updates
-- END

--INSERT
insert into configuration_plain (id, oid, instance, value) values (lid,'1.3.6.1.4.1.9999.3.5.10.3.201.1.1', '11', '11');
-- OTHER eventually....
insert into configuration_plain (id, oid, instance, value) values (lid,'OTHER_OID', 'Idx', 'Value of OID');
END LOOP;
RAISE NOTICE 'end';
RETURN;END$BODY$
LANGUAGE plpgsql VOLATILE
COST 100;

So any best practices and advice on such data and use cases modeling in
DB?

Regards,

AlexL

--
If you can't see the forest for the trees,
Cut the trees and you'll see there is no forest.

--
El genio es 1% inspiración y 99% transpiración.
Thomas Alva Edison
http://pglearn.blogspot.mx/