match an IP address
hello all,
I'm unable to build a LIKE or SIMILAR TO expression for matching and ip
address
192.168.90.3
10.3.2.1
any help please...
thanks
joao
On Mon, Sep 22, 2008 at 05:59:25PM +0100, Joao Ferreira gmail wrote:
I'm unable to build a LIKE or SIMILAR TO expression for matching and ip
address
192.168.90.3
10.3.2.1
any help please...
any reason not to use standard inet datatype? which does the validation.
Best regards,
depesz
--
Linkedin: http://www.linkedin.com/in/depesz / blog: http://www.depesz.com/
jid/gtalk: depesz@depesz.com / aim:depeszhdl / skype:depesz_hdl / gg:6749007
On 22/09/2008 17:59, Joao Ferreira gmail wrote:
I'm unable to build a LIKE or SIMILAR TO expression for matching and ip
address
There are built-in types in PG for handling IP addresses - are they any
use to you?
If not, there's a useful site here which may get you started:
http://regexlib.com/default.aspx
Ray.
------------------------------------------------------------------
Raymond O'Donnell, Director of Music, Galway Cathedral, Ireland
rod@iol.ie
Galway Cathedral Recitals: http://www.galwaycathedral.org/recitals
------------------------------------------------------------------
Import Notes
Resolved by subject fallback
On Mon, Sep 22, 2008 at 10:59 AM, Joao Ferreira gmail
<joao.miguel.c.ferreira@gmail.com> wrote:
hello all,
I'm unable to build a LIKE or SIMILAR TO expression for matching and ip
address192.168.90.3
10.3.2.1
As already mentioned inet / cidr types should work. Example:
postgres=# create table inettest (a inet);
CREATE TABLE
postgres=# insert into inettest values
('192.168.0.1'),('192.168.1.1'),('10.0.0.1');
INSERT 0 3
postgres=# select a from inettest where '192.168.0.1/16' >> a;
a
-------------
192.168.0.1
192.168.1.1
(2 rows)
postgres=# select a from inettest where '192.168.0.1/24' >> a;
a
-------------
192.168.0.1
(1 row)
postgres=# select a from inettest where '192.168.0.1/0' >> a;
a
-------------
192.168.0.1
192.168.1.1
10.0.0.1
(3 rows)
well...
my IP addresses are stored in a TEXT type field. that field can actually
contain usernames like 'joao' or 'scott' and it can contain IP
addresses....
:(
joao
Show quoted text
On Mon, 2008-09-22 at 11:13 -0600, Scott Marlowe wrote:
On Mon, Sep 22, 2008 at 10:59 AM, Joao Ferreira gmail
<joao.miguel.c.ferreira@gmail.com> wrote:hello all,
I'm unable to build a LIKE or SIMILAR TO expression for matching and ip
address192.168.90.3
10.3.2.1As already mentioned inet / cidr types should work. Example:
postgres=# create table inettest (a inet);
CREATE TABLE
postgres=# insert into inettest values
('192.168.0.1'),('192.168.1.1'),('10.0.0.1');
INSERT 0 3
postgres=# select a from inettest where '192.168.0.1/16' >> a;
a
-------------
192.168.0.1
192.168.1.1
(2 rows)
postgres=# select a from inettest where '192.168.0.1/24' >> a;
a
-------------
192.168.0.1
(1 row)
postgres=# select a from inettest where '192.168.0.1/0' >> a;
a
-------------
192.168.0.1
192.168.1.1
10.0.0.1
(3 rows)
On Mon, Sep 22, 2008 at 11:16 AM, Joao Ferreira gmail
<joao.miguel.c.ferreira@gmail.com> wrote:
well...
my IP addresses are stored in a TEXT type field. that field can actually
contain usernames like 'joao' or 'scott' and it can contain IP
addresses....
Then cast them to inet and use the method I showed above:
postgres=# create table b as select a::text from inettest ;
SELECT
postgres=# select * from b;
a
----------------
192.168.0.1/32
192.168.1.1/32
10.0.0.1/32
(3 rows)
postgres=# select a from b where '192.168.0.1/0' >> a::inet;
a
----------------
192.168.0.1/32
192.168.1.1/32
10.0.0.1/32
(3 rows)
postgres=# select a from b where '192.168.0.1/24' >> a::inet;
a
----------------
192.168.0.1/32
Hi,
Joao Ferreira gmail wrote:
well...
my IP addresses are stored in a TEXT type field. that field can actually
contain usernames like 'joao' or 'scott' and it can contain IP
addresses....
Any reason not to change this in the first place?
For a quick fix you could use regex to find the records
likely containing an ip-address notation.
Tino
my IP addresses are stored in a TEXT type field. that field can actually
contain usernames like 'joao' or 'scott' and it can contain IP
addresses....
I think this is common DB design on many websites that have registered
user IDs.
My humble suggestion would be to make another column in the table
called "user_registered" or something. Make this an int(1). If the
user is registered, value is 1, otherwise 0.
To update your existing data onetime, run an SQL update query looking
for IP pattern with 3 dots (which is likely IP address and not user
id) or by the INET conditions in previous suggestions.
Next, include the columns "user_id" and "user_registered" in the same
index. This won't increase the size of the index too much, as the
user_registered field is a small INT number. But from now on your SQL
queries:
...WHERE user_id = 'testuser' AND user_registered = 1
will return much faster. I have found this to the most convenient and
fastest solution in PGSQL instead of haivng INET in the WHERE clauses.
Phoenix Kiula wrote:
my IP addresses are stored in a TEXT type field. that field can actually
contain usernames like 'joao' or 'scott' and it can contain IP
addresses....I think this is common DB design on many websites that have registered
user IDs.
Is it? Name one! Sounds like crappy design to me.
My humble suggestion would be to make another column in the table
called "user_registered" or something. Make this an int(1). If the
user is registered, value is 1, otherwise 0.
Maybe "registration" (aka authentication) is mixed up with simple
session handling?
To update your existing data onetime, run an SQL update query looking
for IP pattern with 3 dots (which is likely IP address and not user
id) or by the INET conditions in previous suggestions.
It could also just be a fdn (hostname) - still I'm a bit puzzled
how a username and an IP-address can get into the same field.
Next, include the columns "user_id" and "user_registered" in the same
index. This won't increase the size of the index too much, as the
user_registered field is a small INT number. But from now on your SQL
queries:...WHERE user_id = 'testuser' AND user_registered = 1
will return much faster. I have found this to the most convenient and
fastest solution in PGSQL instead of haivng INET in the WHERE clauses.
Can you give example on where the inet entry is going to be used?
Cheers
Tino
I think this is common DB design on many websites that have registered
user IDs.Is it? Name one! Sounds like crappy design to me.
It might sound crappy design to you, but for websites that allow users
to do something while they are registered OR unregistered, will choose
to have this for query speed. (Registered user goes in as "testuser"
while an unregistered one goes as his IP address--some websites also
use cookies but they're less reliable.)
We can make this very relationally sound and whatnot, but having one
field to go to, whether you are registered or not, makes it much
simpler.
Most websites have to allow for dots in their user ID these days as
people prefer to have their email address as user ID. Which means that
the dot checking of an IP address may not work to distinguish IP
addresses (unregistered user) from registered user IDs.
In this scenario, for query speed, again, if there is a column that
tells us whether this user is registered or not it helps a great deal.
The INET match condition is not good enough for speed for most modern
websites with any sizeable traffic. I even wrote a function that
converts IP to INET integer and vice versa, but no great boost in
query speed that could compare to an indexed query on user_id and
user_reg.
Welcome your thoughts on how you would do it.
Import Notes
Reply to msg id not found: e373d31e0809221725r44b9a2acq9232915a1d0cebf0@mail.gmail.com
Tino Wildenhain wrote:
Phoenix Kiula wrote:
my IP addresses are stored in a TEXT type field. that field can
actually
contain usernames like 'joao' or 'scott' and it can contain IP
addresses....I think this is common DB design on many websites that have registered
user IDs.Is it? Name one! Sounds like crappy design to me.
My humble suggestion would be to make another column in the table
called "user_registered" or something. Make this an int(1). If the
user is registered, value is 1, otherwise 0.Maybe "registration" (aka authentication) is mixed up with simple
session handling?To update your existing data onetime, run an SQL update query looking
for IP pattern with 3 dots (which is likely IP address and not user
id) or by the INET conditions in previous suggestions.It could also just be a fdn (hostname) - still I'm a bit puzzled
how a username and an IP-address can get into the same field.
Dodgy forum software. Lots of it uses an IP address as a fake username
for unregistered users, rather than doing the sensible thing and
tracking both IP address and (if defined) username.
How I'd want to do this if I was designing the setup from scratch would
probably be:
--
-- Track user identies
--
CREATE TABLE user (
user_id SERIAL PRIMARY KEY,
user_name VARCHAR NOT NULL,
user_retired BOOLEAN
-- and whatever else you want to keep track of about them
);
-- This index enforces unique user names across active users,
-- permitting re-use of usernames for since deleted users without
-- losing information about the previous user of that name's activity
-- or losing the inability to differentiate between the old and new
-- owners of that name.
--
-- I'd revoke DELETE rights on `user' or use a rule that rewrote DELETE
-- to an UPDATE user SET user_retired = 't'.
--
CREATE UNIQUE INDEX user_active_name
ON user(user_name)
WHERE (NOT user_retired)
-- Now, say your problem described in your post is with user activity
-- logging:
CREATE TABLE access_log (
-- blah blah
user_id INTEGER REFERENCES user(user_id),
access_ip cidr NOT NULL
);
CREATE INDEX access_log_ip ON access_log(access_ip);
CREATE INDEX access_user
ON access_log(user_id)
WHERE (user_id NOT NULL);
In other words: always store the IP address, and if the user involved
was a registered user store a reference to their user ID as well. Store
a reference to a globally unique user identity number rather than the
user name, and permit reuse of user names without losing information
about distinct username owners.
If you wanted you could use string user IDs and do away with the
synthetic "user_id" key I've used above, but I suspect you'd regret it
down the track.
If you wanted to look up activity that might be identifed by IP address
or by username, a query like this would do the trick and would simulate
the behaviour your forum software is used to, including the ability of a
user to create a username that's an IP address to throw the whole thing
into chaos:
SELECT * FROM access_user
WHERE COALESCE(user_id, to_char(access_ip)) = matchstring;
--
Craig Ringer
My post at the bottom.
On 9/23/08, Craig Ringer <craig@postnewspapers.com.au> wrote:
Dodgy forum software. Lots of it uses an IP address as a fake username for
unregistered users, rather than doing the sensible thing and tracking both
IP address and (if defined) username.How I'd want to do this if I was designing the setup from scratch would
probably be:--
-- Track user identies
--
CREATE TABLE user (
user_id SERIAL PRIMARY KEY,
user_name VARCHAR NOT NULL,
user_retired BOOLEAN
-- and whatever else you want to keep track of about them
);-- This index enforces unique user names across active users,
-- permitting re-use of usernames for since deleted users without
-- losing information about the previous user of that name's activity
-- or losing the inability to differentiate between the old and new
-- owners of that name.
--
-- I'd revoke DELETE rights on `user' or use a rule that rewrote DELETE
-- to an UPDATE user SET user_retired = 't'.
--CREATE UNIQUE INDEX user_active_name
ON user(user_name)
WHERE (NOT user_retired)-- Now, say your problem described in your post is with user activity
-- logging:CREATE TABLE access_log (
-- blah blah
user_id INTEGER REFERENCES user(user_id),
access_ip cidr NOT NULL
);CREATE INDEX access_log_ip ON access_log(access_ip);
CREATE INDEX access_user
ON access_log(user_id)
WHERE (user_id NOT NULL);In other words: always store the IP address, and if the user involved was a
registered user store a reference to their user ID as well. Store a
reference to a globally unique user identity number rather than the user
name, and permit reuse of user names without losing information about
distinct username owners.If you wanted you could use string user IDs and do away with the synthetic
"user_id" key I've used above, but I suspect you'd regret it down the track.If you wanted to look up activity that might be identifed by IP address or
by username, a query like this would do the trick and would simulate the
behaviour your forum software is used to, including the ability of a user to
create a username that's an IP address to throw the whole thing into chaos:SELECT * FROM access_user
WHERE COALESCE(user_id, to_char(access_ip)) = matchstring;--
Craig Ringer
No. You have no idea what the design is for. Not forum crap.
What happens when you need to store in a table the activity log?
ACTIVITY_ID
USER_STAMP (currently user_id or ip for registered and unregistered resp.)
The idea of storing IP for users is already being done. So what?
Everytime they "do" something, you do not store their IP. Why would
you? Just store their user id. For unregistered ones however, we store
the IP because there is nothing else. There is no user ID for them.
What's your logic for getting a user ID for unregistered guys --
invent one automagically?
Finally, this SQL:
WHERE COALESCE(user_id, to_char(access_ip)) = matchstring;
Ever tried this crap on a table of 10 million records on a live
website, where this query is happening at 3000 times per second? No
such function schtick will match the raw speed of a simpler indexed
query. Or did you mean my index should contain the COALESCE already?
Tino, I wasn't talking about sessions. This is not about session IDs.
A session ID is useless the next time a certain IP address "does
something" on the site. You want a commonality for non-registered
users across many different sessions. (Apart from the fact that
session IDs are usually long hashes which take up space in the table
and in the index)
Import Notes
Reply to msg id not found: e373d31e0809222343m430a7fecr3539af8136a320fd@mail.gmail.com
Phoenix Kiula wrote:
My post at the bottom.
...
No. You have no idea what the design is for. Not forum crap.
What happens when you need to store in a table the activity log?
ACTIVITY_ID
USER_STAMP (currently user_id or ip for registered and unregistered resp.)
And here it gets wrong. Obviously you would store the session id
or if you have a lot of relations, use a sequence generated
key for session_id (compare with my design in the other post,
in this case session_id would be serial and you'd have a field
session_key text with the index for the cookies in the sessions
table instead)
The idea of storing IP for users is already being done. So what?
Abandon this idea I'd say. Its based on the wrong asumption IP
addresses map to users in 1:1 relation.
Everytime they "do" something, you do not store their IP. Why would
you? Just store their user id. For unregistered ones however, we store
the IP because there is nothing else. There is no user ID for them.
What's your logic for getting a user ID for unregistered guys --
invent one automagically?Finally, this SQL:
WHERE COALESCE(user_id, to_char(access_ip)) = matchstring;
Ever tried this crap on a table of 10 million records on a live
website, where this query is happening at 3000 times per second? No
You have 10 million people active the same time in your site?
such function schtick will match the raw speed of a simpler indexed
query. Or did you mean my index should contain the COALESCE already?Tino, I wasn't talking about sessions. This is not about session IDs.
Well actually this is. You are just naming it differently.
A session ID is useless the next time a certain IP address "does
something" on the site. You want a commonality for non-registered
users across many different sessions. (Apart from the fact that
session IDs are usually long hashes which take up space in the table
and in the index)
Yes but only active ones.
btw, given IP is in every request, where is your username coming from?
Apart from basic auth, there is no way of having a userid tied to
the request directly, so how are you doing this?
Tino
On Mon, Sep 22, 2008 at 05:59:25PM +0100, Joao Ferreira gmail wrote:
I'm unable to build a LIKE or SIMILAR TO expression for matching and ip
address
192.168.90.3
10.3.2.1
any help please...
use this regular expression:
'^[0-9]{1,3}(.[0-9]{1,3}){3}$'
warning: do not use "like" or "similar to".
proper way to use it:
select * from table where field ~ '^[0-9]{1,3}(.[0-9]{1,3}){3}$';
this regexp is not 100% fault proof - it will happily return rows like:
'300.999.998.7'
but for most of the cases it should be enough. if you need it to match
only ips, and filter out things like '300.999.999.999' - let me know.
Best regards,
depesz
--
Linkedin: http://www.linkedin.com/in/depesz / blog: http://www.depesz.com/
jid/gtalk: depesz@depesz.com / aim:depeszhdl / skype:depesz_hdl / gg:6749007
Phoenix Kiula wrote:
Ever tried this crap on a table of 10 million records on a live
website, where this query is happening at 3000 times per second? No
such function schtick will match the raw speed of a simpler indexed
query. Or did you mean my index should contain the COALESCE already?
Please forgive my attempt to help you based on a woefully insufficient
description of your problem and situation. I will not make any attempt
to do so again.
--
Craig Ringer
thank you depesz
it seems a pretty good fix for my problem. Actually yestreday I came up
with something similar but your's is better.
cheers
joao
Show quoted text
On Tue, 2008-09-23 at 09:26 +0200, hubert depesz lubaczewski wrote:
On Mon, Sep 22, 2008 at 05:59:25PM +0100, Joao Ferreira gmail wrote:
I'm unable to build a LIKE or SIMILAR TO expression for matching and ip
address
192.168.90.3
10.3.2.1
any help please...use this regular expression:
'^[0-9]{1,3}(.[0-9]{1,3}){3}$'
warning: do not use "like" or "similar to".
proper way to use it:select * from table where field ~ '^[0-9]{1,3}(.[0-9]{1,3}){3}$';
this regexp is not 100% fault proof - it will happily return rows like:
'300.999.998.7'but for most of the cases it should be enough. if you need it to match
only ips, and filter out things like '300.999.999.999' - let me know.Best regards,
depesz
Please forgive my attempt to help you based on a woefully insufficient
description of your problem and situation. I will not make any attempt to do
so again.
Actually it was not my problem, this is a thread started by some one
else. I use Gmail so I see the entire thread as a "conversation" and
the context is maintained. You should try it. Anyway, sorry that you
feel bad.
To others: thanks for your suggestions, but this issue is not one of
session IDs, nor is it solved by storing IP addresses separately
(which does not assume 1:1 correlation between user and IP). We'll let
that be.
Let's just say that in *many* online situations it is vital for
querying speed to have the same column that stores users -- both
registered and unregistered. A query in SQL that matches against an IP
address regexp to identify the unregistered ones may work for some
with smaller databases, which is great, and if it doesn't (the "~"
match is simply not practical for large busy websites), then consider
a small separate column that stores the registration status as a flag.
Thanks.
Phoenix Kiula wrote:
Ever tried this crap on a table of 10 million records on a live
website, where this query is happening at 3000 times per second? No
such function schtick will match the raw speed of a simpler indexed
query. Or did you mean my index should contain the COALESCE already?
Hmm. My previous response may have been overly grumpy.
The point I was *trying* to make is that shoving a username/id and an IP
address into a single field is probably not ideal. At least in my
experience you pay for this sort of optimisation (if it even works out
as an optimisation in the first place) down the track. I have the
misfortunate to have to administrate a system full of such multi-use
fields, and have developed a real loathing for the approach.
If you don't want to store IPs for registered users, I'd use:
user_id INTEGER,
ip cidr,
CONSTRAINT must_have_userstamp
CHECK ( user_id IS NOT NULL OR ip IS NOT NULL)
... and yes, I'd use a functional index to look it up, or even a
trigger-maintained cache of the text representation if I had to. Then
again, I guess I'm lucky enough to work in environments where data
integrity and correctness is a priority and the resources available are
a good fit to the tasks the database needs to do.
--
Craig Ringe
If you don't want to store IPs for registered users, I'd use:
user_id INTEGER,
ip cidr,
CONSTRAINT must_have_userstamp
CHECK ( user_id IS NOT NULL OR ip IS NOT NULL)... and yes, I'd use a functional index to look it up, or even a
trigger-maintained cache of the text representation if I had to. Then
Ok, this is an idea. And I do agree that multifunction fields are a
potential pain in the distant future.
My questions:
1. What extra tax will this constraint levy on an INSERT or UPDATE on
this table? There are about 100,000 inserts a day, and over three
times as many UPDATES. The concurrency is pretty high -- I mean
sometimes 1,000 users at the same time but no more than that. If the
additional cost of insertion/updating is not too heavy, I suppose this
could be a nice approach.
2. Why not have an INET field...why a CIDR? What's the benefit? It
stores those pesky ".../8" type additional data which one has to mask
with functions. Would INET work just as well?
3. Storage wise does this add significantly? How much space does an
INET field take as opposed to, say, a VARCHAR field?
4. Most importantly, how would you structure the index for this? I
would much rather have a fast "=" in my sql's WHERE clause. No "OR"
etc. Any thoughts?
Thanks
Hi,
Phoenix Kiula wrote:
Please forgive my attempt to help you based on a woefully insufficient
description of your problem and situation. I will not make any attempt to do
so again.To others: thanks for your suggestions, but this issue is not one of
session IDs, nor is it solved by storing IP addresses separately
(which does not assume 1:1 correlation between user and IP). We'll let
that be.Let's just say that in *many* online situations it is vital for
querying speed to have the same column that stores users -- both
registered and unregistered. A query in SQL that matches against an IP
if not registered, where is the user coming from? The IP is clearly not
an identifier for a user. You (and the OP) should disregard that idea.
address regexp to identify the unregistered ones may work for some
with smaller databases, which is great, and if it doesn't (the "~"
match is simply not practical for large busy websites), then consider
a small separate column that stores the registration status as a flag.
The user id itself would serve as that flag. If non NULL -> user known,
otherwise unknown. Sounds easy, no? No regex at all! :)
Thanks.
Thx ;)
Tino