DUPS in tables columns ERROR: column ". . . " does not exist
Hi,
~
I am trying to get dups from some data from files which md5sums I
previously calculated
~
Here is my mere mortal SQL
~
SELECT md5, COUNT(md5) AS md5cnt
FROM jdk1_6_0_07_txtfls_md5
WHERE (md5cnt > 1)
GROUP BY md5
ORDER BY md5cnt DESC;
~
and this is what I get:
~
jpk=# SELECT md5, COUNT(md5) AS md5cnt
FROM jdk1_6_0_07_txtfls_md5
WHERE (md5cnt > 1)
GROUP BY md5
ORDER BY md5cnt DESC;
jpk-# jpk-# jpk-# jpk-# ERROR: column "md5cnt" does not exist
LINE 3: WHERE (md5cnt > 1)
~
I think I know what that one means based on the clear error message,
namely md5cntis not a table column itself, but I still think there
should be a way to formulate a simple query like this because PG does
take "ORDER BY md5cnt DESC" even if md5cnt is not a table column, why
on earth then it does not swallow and digest the "WHERE (md5cnt > 1)"
part?
~
You could go the monkey way running a query like:
~
SELECT md5, COUNT(md5) AS md5cnt FROM jdk1_6_0_07_txtfls_md5 GROUP BY
md5 ORDER BY md5cnt DESC;
~
and then use code to jump of the loop when md5cnt becomes 1 or you
could use nested SQL statements
~
How can you find duplicate records in a table?
~
Thanks
lbrtchx
Also I know there is a DISTINCT keyword, but I also need to know how
many times the particular data in the column is repeated if it is,
that is why I need to go:
~
SELECT md5, COUNT(md5) AS md5cnt
FROM jdk1_6_0_07_txtfls_md5
WHERE (md5cnt > 1)
GROUP BY md5
ORDER BY md5cnt DESC;
~
Thanks
lbrtchx
Albretch Mueller wrote:
Hi,
~
I am trying to get dups from some data from files which md5sums I
previously calculated
~
Here is my mere mortal SQL
~
SELECT md5, COUNT(md5) AS md5cnt
FROM jdk1_6_0_07_txtfls_md5
WHERE (md5cnt > 1)
GROUP BY md5
ORDER BY md5cnt DESC;
I think you are looking for HAVING as in:
SELECT md5, COUNT(md5)
FROM jdk1_6_0_07_txtfls_md5
GROUP BY md5
HAVING count(md5) > 1
Stefan
thank you Stefan your SQL worked, but still; I am just asking and my
programming bias will certainly show, but aren't you effectivly
"calling" count on the table three times if you go:
~
SELECT md5, COUNT(md5)
FROM jdk1_6_0_07_txtfls_md5
GROUP BY md5
HAVING COUNT(md5) > 1
ORDER BY COUNT(md5) DESC;
~
Shouldn't
~
SELECT md5, COUNT(md5) AS CNT
FROM jdk1_6_0_07_txtfls_md5
GROUP BY md5
HAVING CNT > 1
ORDER BY CNT DESC;
~
work?
~
jpk=# SELECT md5, COUNT(md5) AS CNT
FROM jdk1_6_0_07_txtfls_md5
GROUP BY md5
HAVING CNT > 1
ORDER BY CNT DESC;
jpk-# jpk-# jpk-# jpk-# ERROR: column "cnt" does not exist
LINE 4: HAVING CNT > 1
~
Thanks
lbrtchx
"Albretch Mueller" <lbrtchx@gmail.com> writes:
thank you Stefan your SQL worked, but still; I am just asking and my
programming bias will certainly show, but aren't you effectivly
"calling" count on the table three times if you go:
The system is smart enough to only do the count() once.
regards, tom lane
The system is smart enough to only do the count() once.
~
But not smart enough to make a variable you declare point to that
internal variable so that things are clearer/ easier ;-)
~
Thanks
lbrtchx
On Sat, Aug 30, 2008 at 01:36:25PM -0400, Albretch Mueller wrote:
The system is smart enough to only do the count() once.
~
But not smart enough to make a variable you declare point to that
internal variable so that things are clearer/ easier ;-)
The SQL standard has pretty clear rules about what variables can be
referenced from where, and this is one of those places (the rationale
is probably in there too).
Have a nice day
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
Show quoted text
Please line up in a tree and maintain the heap invariant while
boarding. Thank you for flying nlogn airlines.
Albretch Mueller wrote:
Also I know there is a DISTINCT keyword, but I also need to know how
many times the particular data in the column is repeated if it is,
that is why I need to go:
~
SELECT md5, COUNT(md5) AS md5cnt
FROM jdk1_6_0_07_txtfls_md5
WHERE (md5cnt > 1)
GROUP BY md5
ORDER BY md5cnt DESC;
Use HAVING instead of WHERE.
--
Lew