I want to search my project source code
I have a lot of code -- millions of lines at this point, written
over the last 5 years. Everything is in a bunch of nested folders.
At least once a week, I want to find some code that uses a few modules,
so I have to launch a find + grep at the top of the tree and then wait
for it to finish.
I wonder if I could store our source code in a postgresql table and
then use full text searching to index. Then I hope I could run a query
where I ask for all files that use modules X, Y, and Z.
I'm looking for something sort of like the locate utility, except that
instead of building a quickly-searchable list of file names, I want to
be able to search file contents also.
Matt
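For illustration only, the "locate, but for file contents" idea can be sketched in a few lines of Python. This is a rough sketch under stated assumptions, not PostgreSQL full text search and not any existing tool: the names build_index and files_using are hypothetical, and a "module" is assumed to show up in a file as a plain identifier-like token.

```python
import os
import re
from collections import defaultdict

# Assumption: a module name appears in the source as an identifier-like token.
TOKEN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def build_index(root):
    """Walk the tree once and map each token to the set of files containing it."""
    index = defaultdict(set)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8", errors="ignore") as fh:
                    text = fh.read()
            except OSError:
                continue  # unreadable file: skip it
            for token in set(TOKEN.findall(text)):
                index[token].add(path)
    return index


def files_using(index, *names):
    """Return the files that contain every one of the given names."""
    if not names:
        return set()
    # index.get avoids creating empty entries for unknown names.
    return set.intersection(*(index.get(n, set()) for n in names))
```

The slow walk over the tree happens once in build_index; after that, files_using(index, "X", "Y", "Z") is a set intersection. A real setup would also need to persist the index and refresh it when files change, which this sketch leaves out.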
Matthew Wilson <matt@tplus1.com> writes:
At least once a week, I want to find some code that uses a few modules,
so I have to launch a find + grep at the top of the tree and then wait
for it to finish.
Personally I use glimpse for this. It's a bit old and creaky but it
performs wonders. There might be something better out there by now.
I wouldn't recommend trying to use a standard FTS to index code:
code is not a natural language and the kinds of searches you usually
want to perform are a lot different. As an example, I glimpse for
"foo" when looking for references to a function foo, but "^foo"
when seeking its definition (this relies on the coding conventions
about function layout, of course). An FTS doesn't think start-of-line
is significant so it can't do that.
regards, tom lane
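The "foo" versus "^foo" distinction above can be shown with a small Python sketch. It uses plain regular expressions rather than glimpse itself, and the sample source text and the helper name matching_lines are made up for illustration. It assumes the layout convention mentioned above, where a function name starts the line at its definition.

```python
import re

# Hypothetical sample source, laid out so that a function name
# begins the line only where the function is defined.
SOURCE = """\
static int
foo(int x)
{
    return x + 1;
}

int
bar(void)
{
    return foo(41);
}
"""


def matching_lines(pattern, text):
    """Return (line_number, line) pairs for lines matching the regex."""
    regex = re.compile(pattern)
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), 1)
        if regex.search(line)
    ]


# Every line mentioning foo: the definition and the call site.
references = matching_lines(r"foo", SOURCE)

# Only lines that start with foo: the definition, under this layout.
definitions = matching_lines(r"^foo", SOURCE)
```

Here references picks up lines 2 and 10, while definitions picks up only line 2. A word-based full text index drops the position of a token within its line, which is why it cannot tell these two queries apart.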
openfts.sf.net is the tool for you. It even has example scripts for
indexing/searching a file system.
Oleg
On Sat, 27 Oct 2007, Matthew Wilson wrote:
I have a lot of code -- millions of lines at this point, written
over the last 5 years. Everything is in a bunch of nested folders.
At least once a week, I want to find some code that uses a few modules,
so I have to launch a find + grep at the top of the tree and then wait
for it to finish.
I wonder if I could store our source code in a postgresql table and
then use full text searching to index. Then I hope I could run a query
where I ask for all files that use modules X, Y, and Z.
I'm looking for something sort of like the locate utility, except that
instead of building a quickly-searchable list of file names, I want to
be able to search file contents also.
Matt
Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83
Matthew Wilson wrote:
I have a lot of code -- millions of lines at this point, written
over the last 5 years. Everything is in a bunch of nested folders.
At least once a week, I want to find some code that uses a few modules,
so I have to launch a find + grep at the top of the tree and then wait
for it to finish.
I wonder if I could store our source code in a postgresql table and
then use full text searching to index. Then I hope I could run a query
where I ask for all files that use modules X, Y, and Z.
DBMSs are great tools for the right job, but IMO this is not the right
job. I can't see how a database engine, with all its transactional
overhead and many other layers, will ever beat a simple grep
performance-wise. I've used Eclipse for refactoring, but having done it
once, I'm sticking with grep.
--
Guy Rouillier
On Oct 28, 2007, at 12:59 AM, Guy Rouillier wrote:
Matthew Wilson wrote:
I have a lot of code -- millions of lines at this point, written
over the last 5 years. Everything is in a bunch of nested folders.
At least once a week, I want to find some code that uses a few modules,
so I have to launch a find + grep at the top of the tree and then wait
for it to finish.
I wonder if I could store our source code in a postgresql table and
then use full text searching to index. Then I hope I could run a query
where I ask for all files that use modules X, Y, and Z.
DBMSs are great tools for the right job, but IMO this is not the
right job. I can't see how a database engine, with all its
transactional overhead and many other layers, will ever beat a
simple grep performance-wise. I've used Eclipse for refactoring,
but having done it once, I'm sticking with grep.
This is exactly what cscope is good for.
http://cscope.sourceforge.net/
I've used it since the early 90's. I do level 3 support for really
big companies. If you are an emacs fan, it's hooked into it as well.
You want to use the -q option. If it is a million lines of code, it's
going to take a while. It pseudo-parses the code (some tricky
constructs will confuse it) and builds a very simple database file.
I think it uses Berkeley's DB file. After that, finding all the
occurrences of foo is a few seconds.
If you want to find just definitions (like where is foo defined),
then use ctags or etags. There is exuberant ctags here:
Perry Smith ( pedz@easesoftware.com )
Ease Software, Inc. ( http://www.easesoftware.com )
Low cost SATA Disk Systems for IBMs p5, pSeries, and RS/6000 AIX systems
Tom Lane wrote:
I wouldn't recommend trying to use a standard FTS to index code:
code is not a natural language and the kinds of searches you usually
want to perform are a lot different. As an example, I glimpse for
"foo" when looking for references to a function foo, but "^foo"
when seeking its definition (this relies on the coding conventions
about function layout, of course). An FTS doesn't think start-of-line
is significant so it can't do that.
+1. The nice thing about a tool that understands code is that you can
query it in ways that make sense to code. For example I can search for
"all files that include foo.h" or "all callers of function bar" or "all
occurrences of the symbol baz". I use cscope for this, which integrates
nicely into my text editor (vim), and others have told me they use
kscope which puts it inside a nice GUI window, if you care about such
things.
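As a rough illustration of what a code-aware query like "all files that include foo.h" involves, here is a Python sketch. It is only an approximation built on a regular expression, not how cscope works internally. The function name files_including is hypothetical, and the sketch assumes C sources with .c and .h extensions and an #include that names the header exactly, with no directory prefix.

```python
import os
import re


def files_including(root, header):
    """Return sorted paths of .c/.h files under root that #include the header."""
    # Matches lines such as:  #include "foo.h"   or   # include <foo.h>
    pattern = re.compile(
        r'^\s*#\s*include\s*[<"]' + re.escape(header) + r'[>"]',
        re.MULTILINE,
    )
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith((".c", ".h")):
                continue  # assumption: only C sources and headers matter
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8", errors="ignore") as fh:
                    text = fh.read()
            except OSError:
                continue  # unreadable file: skip it
            if pattern.search(text):
                hits.append(path)
    return sorted(hits)
```

Unlike a tool that parses the code, this still rescans every file on each call and knows nothing about include paths or conditional compilation, so it is best read as a picture of the query rather than a replacement for cscope.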
--
Alvaro Herrera http://www.amazon.com/gp/registry/5ZYLFMCVHXC
"I would rather have GNU than GNOT." (ccchips, lwn.net/Articles/37595/)