kind of a bag of attributes in a DB . . .

Started by Albretch Mueller over 6 years ago · 11 messages · general
#1 Albretch Mueller
lbrtchx@gmail.com

Say you get lots of data and their corresponding metadata, which in
some cases may be undefined or undeclared (left as an empty string).
Think of YouTube JSON files or the output of the "file" command.

I need to be able to search that metadata "instantly" and get some
metrics out of it, and I think databases are best for such jobs.

I know this is not exactly a kosher way to deal with data that can't
be represented in a nice tabular form, but I don't find the idea that
far off either.

What is the pattern, anti-pattern, or whatever, for such a design?

Do you know of any implementations that handle such data?

lbrtchx

#2 Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Albretch Mueller (#1)
Re: kind of a bag of attributes in a DB . . .

On 9/7/19 5:45 AM, Albretch Mueller wrote:

> Say you get lots of data and their corresponding metadata, which in
> some cases may be undefined or undeclared (left as an empty string).
> Think of YouTube JSON files or the output of the "file" command.
>
> I need to be able to search that metadata "instantly" and get some
> metrics out of it, and I think databases are best for such jobs.

Is the metadata uniform or are you dealing with a variety of different data?

--
Adrian Klaver
adrian.klaver@aklaver.com

#3 Chris Travers
chris.travers@gmail.com
In reply to: Albretch Mueller (#1)
Re: kind of a bag of attributes in a DB . . .

On Sat, Sep 7, 2019 at 5:17 PM Albretch Mueller <lbrtchx@gmail.com> wrote:

> What is the pattern, anti-pattern, or whatever, for such a design?
>
> Do you know of any implementations that handle such data?

We store debug logs as JSONB with some indexing. It works in some
limited cases, but you need a good sense of the indexing options and
of how the indexes actually work.
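To give a rough feel for why indexed searches over a bag of attributes can avoid a full scan, here is a minimal Python sketch of an inverted index that maps each (key, value) pair to the documents containing it. It is only an analogy for what a GIN index on a jsonb column enables for the containment operator `@>`, not how GIN is implemented; the `AttributeIndex` class is hypothetical and assumes flat objects with hashable scalar values.

```python
from collections import defaultdict


class AttributeIndex:
    """Hypothetical inverted index: (key, value) pair -> ids of documents holding it."""

    def __init__(self):
        self.docs = {}                    # doc id -> flat attribute dict
        self.postings = defaultdict(set)  # (key, value) -> set of doc ids

    def add(self, doc_id, attrs):
        """Store a flat attribute dict and index each top-level key/value pair."""
        self.docs[doc_id] = attrs
        for key, value in attrs.items():
            self.postings[(key, value)].add(doc_id)

    def contains(self, query):
        """Ids of documents whose attributes include every pair in `query`.

        Loosely analogous to `meta @> '{...}'` on a flat jsonb object:
        posting sets are intersected instead of scanning every document.
        """
        result = None
        for pair in query.items():
            ids = self.postings.get(pair, set())
            result = set(ids) if result is None else result & ids
            if not result:
                break  # no document can match any more
        # An empty query matches everything, as '{}' is contained in any object.
        return set(self.docs) if result is None else result


idx = AttributeIndex()
idx.add(1, {"format": "mp4", "colour": "red"})
idx.add(2, {"format": "webm"})
idx.add(3, {"format": "mp4", "colour": "blue"})
print(sorted(idx.contains({"format": "mp4", "colour": "red"})))
```

A real GIN index on jsonb also covers nested objects and arrays, and which operators it can serve depends on the operator class: the default `jsonb_ops` supports key existence (`?`) as well as containment (`@>`), while `jsonb_path_ops` does not support `?`.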

--
Best Wishes,
Chris Travers

Efficito: Hosted Accounting and ERP. Robust and Flexible. No vendor
lock-in.
http://www.efficito.com/learn_more

#4 Albretch Mueller
lbrtchx@gmail.com
In reply to: Adrian Klaver (#2)
Re: kind of a bag of attributes in a DB . . .

On 9/7/19, Adrian Klaver <adrian.klaver@aklaver.com> wrote:

> Is the metadata uniform or are you dealing with a variety of different
> data?

You can expect all files to have a filename and a size, but their
kinds (the metadata describing them) can be really colorful and wild
when it comes to formatting.
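One common way to model this is a hybrid: the fields every file has go into ordinary columns, and everything else goes into a single attribute bag (in PostgreSQL, typically a jsonb column). Below is a minimal Python sketch, assuming each record arrives as a flat dict and that empty strings mean "undefined" and should be dropped rather than stored; `FileRecord` and `split_record` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class FileRecord:
    """Hypothetical row shape: fixed columns plus a free-form attribute bag."""
    filename: str
    size: int
    attrs: Dict[str, Any] = field(default_factory=dict)


def split_record(raw: Dict[str, Any]) -> FileRecord:
    """Move the always-present fields into columns and the rest into the bag.

    Empty strings are treated as "undefined" and left out of the bag, so a
    missing key and an empty value look the same to later queries.
    """
    attrs = {k: v for k, v in raw.items()
             if k not in ("filename", "size") and v != ""}
    return FileRecord(filename=raw["filename"], size=int(raw["size"]), attrs=attrs)


rec = split_record({"filename": "clip.mp4", "size": "1048576",
                    "codec": "h264", "title": "", "duration": 212})
print(rec)
```

Keeping filename and size as real columns lets them have proper types and ordinary B-tree indexes, while the bag absorbs whatever else shows up.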

lbrtchx

#5 Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Albretch Mueller (#4)
Re: kind of a bag of attributes in a DB . . .

On 9/10/19 9:59 AM, Albretch Mueller wrote:

> On 9/7/19, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
>
>> Is the metadata uniform or are you dealing with a variety of different
>> data?
>
> You can expect all files to have a filename and a size, but their
> kinds (the metadata describing them) can be really colorful and wild
> when it comes to formatting.

If there is no rhyme or reason to the metadata, I am not sure how you
could come up with an efficient search strategy. It seems it would be a
brute-force search over everything.

--
Adrian Klaver
adrian.klaver@aklaver.com

#6 Albretch Mueller
lbrtchx@gmail.com
In reply to: Adrian Klaver (#5)
Re: kind of a bag of attributes in a DB . . .

On 9/10/19, Adrian Klaver <adrian.klaver@aklaver.com> wrote:

> If there is no rhyme or reason to the metadata, I am not sure how you
> could come up with an efficient search strategy. It seems it would be a
> brute-force search over everything.

Not exactly. Say some things have colours but no weight. You could
still group the ones that do have a weight as "weighty" and then tell
how heavy they are; for the colourful ones you could specify the
colours, and then see if there is some correlation between weights and
colours...
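The grouping described here can be sketched in plain Python over a list of attribute bags: keep only the items that declare both attributes, then average weight per colour as a first step toward spotting a correlation. The attribute names `weight` and `colour` and the sample data are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def mean_weight_by_colour(items):
    """Average the "weight" of items that also declare a "colour".

    Items lacking either key are skipped rather than treated as zero, so the
    result only reflects things for which both attributes are known.
    """
    groups = defaultdict(list)
    for attrs in items:
        if "weight" in attrs and "colour" in attrs:
            groups[attrs["colour"]].append(float(attrs["weight"]))
    return {colour: mean(ws) for colour, ws in groups.items()}


items = [
    {"colour": "red", "weight": 2.0},
    {"colour": "red", "weight": 4.0},
    {"colour": "blue", "weight": 1.5},
    {"colour": "green"},   # colourful but not "weighty": skipped
    {"weight": 10.0},      # weighty but no colour: skipped
]
print(mean_weight_by_colour(items))
```

In PostgreSQL the same idea maps onto the jsonb key-existence operator `?` for the filter and `->>` to extract values for `GROUP BY` and `avg()`, e.g. on a hypothetical jsonb column named `meta`.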

lbrtchx

#7 Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Albretch Mueller (#6)
Re: kind of a bag of attributes in a DB . . .

On 9/11/19 9:46 AM, Albretch Mueller wrote:

> On 9/10/19, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
>
>> If there is no rhyme or reason to the metadata, I am not sure how you
>> could come up with an efficient search strategy. It seems it would be a
>> brute-force search over everything.
>
> Not exactly. Say some things have colours but no weight. You could
> still group the ones that do have a weight as "weighty" and then tell
> how heavy they are; for the colourful ones you could specify the
> colours, and then see if there is some correlation between weights and
> colours...

It would help to see some sample data, otherwise any answer would be
pure speculation.

--
Adrian Klaver
adrian.klaver@aklaver.com

#8 Albretch Mueller
lbrtchx@gmail.com
In reply to: Adrian Klaver (#7)
Re: kind of a bag of attributes in a DB . . .

Just download a bunch of JSON info files from YouTube data feeds.

Actually, does PostgreSQL have a JSON driver or import feature?

The metadata contained in JSON files would require more than one
small database, but such an import feature should be trivial.

C

#9 Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Albretch Mueller (#8)
Re: kind of a bag of attributes in a DB . . .

On 9/14/19 2:06 AM, Albretch Mueller wrote:

> Just download a bunch of JSON info files from YouTube data feeds.
>
> Actually, does PostgreSQL have a JSON driver or import feature?

Not sure what you mean by the above.

Postgres has json(b) data types that you can import JSON into:

https://www.postgresql.org/docs/11/datatype-json.html
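As a rough illustration of the import side, here is a minimal Python sketch that walks a directory of `.json` files (for example, downloaded info files) and yields one `(filename, size, document)` tuple per file, ready to be inserted into a table with ordinary filename/size columns and a jsonb column for the document. The function name and the one-document-per-file layout are assumptions, and the actual INSERT is left out.

```python
import json
from pathlib import Path


def load_json_files(directory):
    """Yield (filename, size_in_bytes, parsed_document) for each *.json file.

    Files that are not valid JSON (or not valid UTF-8) are skipped and
    reported, instead of aborting the whole import.
    """
    for path in sorted(Path(directory).glob("*.json")):
        try:
            with path.open(encoding="utf-8") as fh:
                doc = json.load(fh)
        except (json.JSONDecodeError, UnicodeDecodeError) as exc:
            print(f"skipping {path.name}: {exc}")
            continue
        yield path.name, path.stat().st_size, doc
```

Each tuple could then be passed as parameters to an INSERT through whatever driver is in use.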

> The metadata contained in JSON files would require more than one
> small database, but such an import feature should be trivial.

Again, I'm not sure I understand why small databases would be required.

--
Adrian Klaver
adrian.klaver@aklaver.com

#10 Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Albretch Mueller (#8)
Re: kind of a bag of attributes in a DB . . .

On 9/14/19 2:06 AM, Albretch Mueller wrote:

> Just download a bunch of JSON info files from YouTube data feeds.
>
> Actually, does PostgreSQL have a JSON driver or import feature?

I'm working without a net (coffee), so I forgot to mention that for
Python there is:

http://initd.org/psycopg/docs/extras.html?highlight=json

Not sure if this is what you are looking for or not.

--
Adrian Klaver
adrian.klaver@aklaver.com

#11 Chris Travers
chris.travers@gmail.com
In reply to: Albretch Mueller (#8)
Re: kind of a bag of attributes in a DB . . .

On Sat, Sep 14, 2019 at 5:11 PM Albretch Mueller <lbrtchx@gmail.com> wrote:

> Just download a bunch of JSON info files from YouTube data feeds.
>
> Actually, does PostgreSQL have a JSON driver or import feature?

Sort of... There are a bunch of features around the JSON and JSONB data
types that could be useful.

> The metadata contained in JSON files would require more than one
> small database, but such an import feature should be trivial.

It is not at all trivial, for a bunch of reasons inherent to the JSON
specification: how to handle duplicate keys, for example.
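To make the duplicate-key point concrete: RFC 8259 only says that names within an object SHOULD be unique, so parsers differ. Python's `json` module silently keeps the last value, and PostgreSQL's `jsonb` type likewise keeps only the last value, while the plain `json` type stores the input text as given. A minimal sketch of an importer that refuses such objects instead, using the standard `object_pairs_hook` parameter; `reject_duplicate_keys` is a hypothetical name.

```python
import json


def reject_duplicate_keys(pairs):
    """object_pairs_hook that raises instead of silently keeping the last value."""
    seen = {}
    for key, value in pairs:
        if key in seen:
            raise ValueError(f"duplicate key: {key!r}")
        seen[key] = value
    return seen


text = '{"title": "first", "title": "second"}'

print(json.loads(text))  # default behaviour: the last value wins

try:
    json.loads(text, object_pairs_hook=reject_duplicate_keys)
except ValueError as exc:
    print(exc)
```

Whether to reject, keep the first, or keep the last value is a policy decision each import has to make, which is part of why a general-purpose import is harder than it looks.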

However, writing an import of JSON objects into a particular database
is indeed trivial.

--
Best Wishes,
Chris Travers
