I want to search my project source code

Started by Matthew Wilson over 18 years ago, 6 messages, general
#1Matthew Wilson
matt@tplus1.com

I have a lot of code -- millions of lines at this point, written
over the last 5 years. Everything is in a bunch of nested folders.

At least once a week, I want to find some code that uses a few modules,
so I have to launch a find + grep at the top of the tree and then wait
for it to finish.

I wonder if I could store our source code in a postgresql table and
then use full text searching to index. Then I hope I could run a query
where I ask for all files that use modules X, Y, and Z.

I'm looking for something sort of like the locate utility, except that
instead of building a quickly-searchable list of file names, I want to
be able to search file contents also.
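
A minimal sketch of the "locate, but for file contents" idea, assuming a plain in-memory Python index rather than PostgreSQL full text search. The names build_index and files_using_all are hypothetical, and the tokenizer only recognizes identifier-like words:

```python
import os
import re
from collections import defaultdict

# Identifier-like tokens; an assumption that suits most programming languages.
TOKEN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def build_index(root):
    """Map each identifier-like token to the set of files that contain it."""
    index = defaultdict(set)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8", errors="ignore") as handle:
                    text = handle.read()
            except OSError:
                continue  # unreadable file: skip it
            for token in set(TOKEN.findall(text)):
                index[token].add(path)
    return index


def files_using_all(index, *names):
    """Return the sorted list of files that mention every given name."""
    if not names:
        return []
    sets = [index.get(name, set()) for name in names]
    return sorted(set.intersection(*sets))
```

Building the index is the slow step, done once; files_using_all(index, "X", "Y", "Z") is then a set intersection instead of a walk over the whole tree.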

Matt

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Matthew Wilson (#1)
Re: I want to search my project source code

Matthew Wilson <matt@tplus1.com> writes:

At least once a week, I want to find some code that uses a few modules,
so I have to launch a find + grep at the top of the tree and then wait
for it to finish.

Personally I use glimpse for this. It's a bit old and creaky but it
performs wonders. There might be something better out there by now.

I wouldn't recommend trying to use a standard FTS to index code:
code is not a natural language and the kinds of searches you usually
want to perform are a lot different. As an example, I glimpse for
"foo" when looking for references to a function foo, but "^foo"
when seeking its definition (this relies on the coding conventions
about function layout, of course). An FTS doesn't think start-of-line
is significant so it can't do that.
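
To illustrate the difference, here is a small sketch in Python rather than glimpse. The sample source and the matching_lines helper are hypothetical, and the layout assumes the convention where a function's name starts the line in its definition:

```python
import re

# Hypothetical C source laid out with the function name at start of line.
SOURCE = """\
static int
foo(int x)
{
    return x + 1;
}

int
bar(void)
{
    return foo(41);
}
"""


def matching_lines(pattern, text):
    """Return (line_number, line) pairs for lines matching the pattern."""
    regex = re.compile(pattern)
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), 1)
        if regex.search(line)
    ]


references = matching_lines(r"\bfoo\b", SOURCE)   # every mention of foo
definitions = matching_lines(r"^foo\b", SOURCE)   # only lines starting with foo
```

The first pattern finds both the definition and the call inside bar; the anchored pattern finds only the definition. A word-based full text index has discarded line position, so it cannot tell the two apart.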

regards, tom lane

#3Oleg Bartunov
oleg@sai.msu.su
In reply to: Matthew Wilson (#1)
Re: I want to search my project source code

openfts.sf.net is the tool for you. It even has example scripts for
indexing and searching a file system.

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

#4Guy Rouillier
guyr-ml1@burntmail.com
In reply to: Matthew Wilson (#1)
Re: I want to search my project source code

Matthew Wilson wrote:

I have a lot of code -- millions of lines at this point, written
over the last 5 years. Everything is in a bunch of nested folders.

At least once a week, I want to find some code that uses a few modules,
so I have to launch a find + grep at the top of the tree and then wait
for it to finish.

I wonder if I could store our source code in a postgresql table and
then use full text searching to index. Then I hope I could run a query
where I ask for all files that use modules X, Y, and Z.

DBMSs are great tools for the right job, but IMO this is not the right
job. I can't see how a database engine, with all its transactional
overhead and many other layers, will ever beat a simple grep
performance-wise. I've used Eclipse for refactoring, but having done it
once, I'm sticking with grep.

--
Guy Rouillier

#5Perry Smith
pedz@easesoftware.com
In reply to: Guy Rouillier (#4)
Re: I want to search my project source code

On Oct 28, 2007, at 12:59 AM, Guy Rouillier wrote:

Matthew Wilson wrote:

I have a lot of code -- millions of lines at this point, written
over the last 5 years. Everything is in a bunch of nested folders.

At least once a week, I want to find some code that uses a few modules,
so I have to launch a find + grep at the top of the tree and then wait
for it to finish.

I wonder if I could store our source code in a postgresql table and
then use full text searching to index. Then I hope I could run a query
where I ask for all files that use modules X, Y, and Z.

DBMSs are great tools for the right job, but IMO this is not the
right job. I can't see how a database engine, with all its
transactional overhead and many other layers, will ever beat a
simple grep performance-wise. I've used Eclipse for refactoring,
but having done it once, I'm sticking with grep.

This is exactly what cscope is good for.

http://cscope.sourceforge.net/

I've used it since the early 90's. I do level 3 support for really
big companies. If you are an Emacs fan, it's hooked into that as well.

You want to use the -q option. If it is a million lines of code,
building the index is going to take a while. It pseudo-parses the code
(some tricky constructs will confuse it) and builds a very simple
database file. I think it uses Berkeley's DB file. After that, finding
all the occurrences of foo takes a few seconds.

If you want to find just definitions (like where is foo defined),
then use ctags or etags. There is exuberant ctags here:

http://ctags.sourceforge.net/
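
As a rough sketch of how a definition lookup works once ctags has run: the tags file is plain text, one tab-separated line per tag (name, file, address). The parser below is a simplified Python illustration, and the sample lines and the parse_tags name are hypothetical:

```python
# Hypothetical excerpt of a tags file; real files are written by ctags.
SAMPLE_TAGS = [
    '!_TAG_FILE_FORMAT\t2\t/extended format/',
    'bar\tsrc/bar.c\t42',
    'foo\tsrc/foo.c\t/^foo(int x)$/;"\tf',
]


def parse_tags(lines):
    """Map each tag name to a list of (file, address) pairs."""
    tags = {}
    for line in lines:
        if line.startswith("!_TAG_"):
            continue  # metadata header lines, not real tags
        parts = line.rstrip("\n").split("\t", 2)
        if len(parts) < 3:
            continue  # malformed line: skip it
        name, path, rest = parts
        # Extension fields, if any, follow the ;" marker; drop them here.
        address = rest.split(';"\t', 1)[0]
        if address.endswith(';"'):
            address = address[:-2]
        tags.setdefault(name, []).append((path, address))
    return tags


tags = parse_tags(SAMPLE_TAGS)
```

Looking up tags["foo"] then gives the file and the search pattern (or line number) an editor would use to jump to the definition.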

Perry Smith ( pedz@easesoftware.com )
Ease Software, Inc. ( http://www.easesoftware.com )

Low cost SATA Disk Systems for IBMs p5, pSeries, and RS/6000 AIX systems

#6Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Tom Lane (#2)
Re: I want to search my project source code

Tom Lane wrote:

I wouldn't recommend trying to use a standard FTS to index code:
code is not a natural language and the kinds of searches you usually
want to perform are a lot different. As an example, I glimpse for
"foo" when looking for references to a function foo, but "^foo"
when seeking its definition (this relies on the coding conventions
about function layout, of course). An FTS doesn't think start-of-line
is significant so it can't do that.

+1. The nice thing about a tool that understands code is that you can
query it in ways that make sense to code. For example I can search for
"all files that include foo.h" or "all callers of function bar" or "all
occurrences of the symbol baz". I use cscope for this, which integrates
nicely into my text editor (vim), and others have told me they use
kscope which puts it inside a nice GUI window, if you care about such
things.
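
For anyone who wants to script such queries instead of using an editor, here is a hedged Python sketch around cscope's line-oriented interface. The helper names are hypothetical; the flags (-b, -q, -R, -d, -L and the numbered search fields) are cscope's own, and the parser assumes file names without spaces:

```python
import shutil
import subprocess

# Search field numbers used with cscope's line-oriented (-L) mode.
QUERY_FIELDS = {
    "symbol": 0,      # all occurrences of a symbol
    "definition": 1,  # global definition of a symbol
    "callers": 3,     # functions calling this function
    "includers": 8,   # files #including this file
}


def build_command():
    """Build the cross-reference (-b) with an inverted index (-q), recursively (-R)."""
    return ["cscope", "-b", "-q", "-R"]


def query_command(kind, pattern):
    """One line-oriented search against the existing database (-d: do not rebuild)."""
    return ["cscope", "-d", "-q", "-L", f"-{QUERY_FIELDS[kind]}", pattern]


def parse_result(line):
    """Split one output line into (file, scope, line_number, text)."""
    path, scope, lineno, text = line.rstrip("\n").split(" ", 3)
    return path, scope, int(lineno), text


def run_query(kind, pattern):
    """Run a query in the current directory and return the parsed results."""
    if shutil.which("cscope") is None:
        raise RuntimeError("cscope is not installed")
    result = subprocess.run(
        query_command(kind, pattern), capture_output=True, text=True, check=True
    )
    return [parse_result(line) for line in result.stdout.splitlines() if line]
```

With a database built once via build_command(), run_query("includers", "foo.h"), run_query("callers", "bar") and run_query("symbol", "baz") correspond to the three searches described above.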

--
Alvaro Herrera http://www.amazon.com/gp/registry/5ZYLFMCVHXC
"I would rather have GNU than GNOT." (ccchips, lwn.net/Articles/37595/)