
The web is a great source of information, especially if you know how to use one of the better search engines such as Google or Altavista. But sometimes, the value of the information is time-dependent. You may want to be able to monitor the web for new information, and be notified when it is available.
Netgrep was written to help ferret out new references to information by polling search engines. Given a search engine and a search phrase, Netgrep will check for updates at a specified frequency, and notify you of any new references.
In writing Netgrep, however, it became clear that another useful function could easily be added which would track the position of a specified site in the search results for a given search engine. This tracking function can also be configured to notify you of changes in position.
A Monitor maintains a list of resultant URLs from a given search. Any time a new URL is found, it will send a notification to the console, via email, to a log file, or via a combination of these methods.
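The Monitor idea described above can be sketched in a few lines of Java. This is only an illustration of the "remember what you've seen, report what's new" logic, not the actual Netgrep code: the class name MonitorSketch and the update method are made up for this example, the fetching of search results is left out entirely, and it assumes a modern JDK (9 or later) for List.of.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch; NOT the actual Netgrep Monitor class.
class MonitorSketch {
    // Every URL seen in any previous poll, in first-seen order.
    private final Set<String> seen = new LinkedHashSet<>();

    // Records the latest search results and returns only the URLs
    // that have not appeared in any earlier poll.
    List<String> update(List<String> results) {
        List<String> fresh = new ArrayList<>();
        for (String url : results) {
            if (seen.add(url)) { // add() returns true only for a new element
                fresh.add(url);
            }
        }
        return fresh;
    }

    public static void main(String[] args) {
        MonitorSketch monitor = new MonitorSketch();
        // First poll: everything counts as new, since nothing has been seen yet.
        monitor.update(List.of("http://a.example/", "http://b.example/"));
        // Second poll: only the URL that was not in the first poll is reported.
        for (String url : monitor.update(List.of("http://b.example/", "http://c.example/"))) {
            System.out.println("New reference: " + url);
        }
    }
}
```

In a real Monitor, the returned list would be handed to whichever notification methods are configured (console, email, log file).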
The web interface supports a number of commands:
A Tracker maintains a list of resultant URLs from a given search. Any time the specified URL changes position in the results, it will send a notification to the console, via email, to a log file, or via a combination of these methods.
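The Tracker logic described above could look roughly like the following. Again, this is a sketch under assumptions rather than the real implementation: TrackerSketch, positionOf, and update are hypothetical names, matching a result to the site by simple substring is my guess at a reasonable rule, and positions are counted from 1.

```java
import java.util.List;

// Hypothetical sketch; NOT the actual Netgrep Tracker class.
class TrackerSketch {
    static final int NOT_FOUND = -1;

    private final String site;  // e.g. "www.example.com"
    private final int depth;    // how many results to examine
    private int lastPosition = NOT_FOUND;

    TrackerSketch(String site, int depth) {
        this.site = site;
        this.depth = depth;
    }

    // Returns the 1-based position of the first result containing the site
    // within the first `depth` results, or NOT_FOUND if it is absent.
    int positionOf(List<String> results) {
        int limit = Math.min(depth, results.size());
        for (int i = 0; i < limit; i++) {
            if (results.get(i).contains(site)) { // assumption: substring match
                return i + 1;
            }
        }
        return NOT_FOUND;
    }

    // Stores the current position and returns true if it differs from the
    // position stored by the previous poll (the first sighting counts as a change).
    boolean update(List<String> results) {
        int position = positionOf(results);
        boolean changed = position != lastPosition;
        lastPosition = position;
        return changed;
    }

    int lastPosition() {
        return lastPosition;
    }

    public static void main(String[] args) {
        TrackerSketch tracker = new TrackerSketch("www.example.com", 55);
        List<String> results = List.of("http://other.example/", "http://www.example.com/page");
        if (tracker.update(results)) {
            System.out.println("Position changed to " + tracker.lastPosition());
        }
    }
}
```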
The web interface supports a number of commands:
There will be a configuration file for the GHEAR framework (see GHEAR config for more details).
The NetGrep configuration file is fairly self-explanatory. An example is included here:
#
# Configuration file for NetGrep
# NetGrep allows you to monitor search engines for new results
# for a given search, or to track the position of a site for
# given search terms.
# For more information, please visit
# http://www.andthehorseyourodeinon.com/tech/netgrep/
#
# 04-25-2000 SjG
#
# Send Email Notifications?
send = yes
# Email address to send notifications. Change this, or I'll
# get your notifications (and irritated)
notifyemail = samuelg@andthehorseyourodeinon.com
#
# SMTP server to use for sending Email notifications
#
mail.smtp.host = localhost
#
# Email address to put in the "From" field of the notification email.
#
from-address = netgrep@localhost
# inter-service start delay
# If you want services to start in a staggered fashion,
# set this delay to non-zero (otherwise, it's milliseconds)
interservice-delay = 10000
# Refresh Frequency. How often should Netgrep
# check for updates? In milliseconds. (Anything less than
# hourly is pretty much a waste of CPU and bandwidth).
#
# daily
refresh = 86400000
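The key = value lines and # comments in this file happen to be compatible with the standard java.util.Properties format, so a file like it can be read as sketched below. Whether Netgrep itself loads its configuration this way is an assumption on my part; the class ConfigSketch and its helper methods are hypothetical, and only the key names (send, refresh, interservice-delay) come from the example file.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical sketch; the real Netgrep config loader may work differently.
class ConfigSketch {
    // Parses configuration text in java.util.Properties format.
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    // Refresh interval in milliseconds; falls back to daily if the key is absent.
    static long refreshMillis(Properties props) {
        return Long.parseLong(props.getProperty("refresh", "86400000").trim());
    }

    // True when the "send" key is set to "yes" (case-insensitive).
    static boolean sendEmail(Properties props) {
        return "yes".equalsIgnoreCase(props.getProperty("send", "no").trim());
    }

    public static void main(String[] args) {
        Properties props = parse(
            "# Send Email Notifications?\n"
            + "send = yes\n"
            + "interservice-delay = 10000\n"
            + "# daily\n"
            + "refresh = 86400000\n");
        System.out.println("send email: " + sendEmail(props));
        System.out.println("refresh (ms): " + refreshMillis(props));
    }
}
```

As a sanity check on the numbers: 86400000 milliseconds is 24 x 60 x 60 x 1000, which is one day, and an hourly refresh would be 3600000.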
The searchterms file is organized with one Tracker or Monitor per line. Lines consist of items, delimited by the "pipe" symbol. Each line contains the following items:
Examples:
engine=google|term="horse you rode in on"|site=www.andthehorseyourodeinon.com|depth=55|name=Example
This will create a Tracker named "Example" for the site http://www.andthehorseyourodeinon.com for the search terms "horse you rode in on" using the Google search engine. Any time the position of this site changes (within the first 55 results returned), notification will be triggered.
engine=altavista|terms=mmmm donut|depth=50
This will create a Monitor on Altavista for sites referencing the words "mmmm" and/or "donut". Any time a new site shows up in the first 50 results, notification will be triggered.
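To make the line format concrete, here is a rough sketch of how a searchterms line could be split into its items. It is not Netgrep's own parser: SearchTermLineSketch and its methods are hypothetical, and the rule that a line with a site item yields a Tracker while a line without one yields a Monitor is inferred from the two examples above. The sketch also assumes that search terms never contain a pipe symbol.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch; NOT Netgrep's actual searchterms parser.
class SearchTermLineSketch {
    // Splits a line on "|" and returns its key=value items in order.
    static Map<String, String> parse(String line) {
        Map<String, String> items = new LinkedHashMap<>();
        for (String item : line.split("\\|")) {
            int eq = item.indexOf('=');
            if (eq < 0) {
                continue; // ignore anything that is not key=value
            }
            String key = item.substring(0, eq).trim();
            String value = item.substring(eq + 1).trim();
            // Strip one pair of surrounding double quotes, if present.
            if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                value = value.substring(1, value.length() - 1);
            }
            items.put(key, value);
        }
        return items;
    }

    // Assumption drawn from the examples: a "site" item means Tracker.
    static boolean isTracker(Map<String, String> items) {
        return items.containsKey("site");
    }

    public static void main(String[] args) {
        Map<String, String> items = parse("engine=altavista|terms=mmmm donut|depth=50");
        System.out.println((isTracker(items) ? "Tracker" : "Monitor") + " " + items);
    }
}
```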
With release 1.5.1, adding search engines is easy for Monitors. Use one of the existing Monitors as an example. Note that HTTPUtils contains code that removes search engine references; you may want to modify that as well. This will be cleaned up in the next release, along with support for adding Trackers.
To support the email feature, you'll also need Sun's JavaMail 1.1.x jar file on your classpath. JavaMail requires the JavaBeans Activation Framework (JAF) as well. You can download JavaMail and the JAF from Sun.
Care must be used when setting up Netgrep searches so as not to create too many simultaneous Monitors or Trackers. Depending on system memory and CPU power, too many could saturate the system. Additionally, searches are not shared among Trackers or Monitors (even for the same keywords), so a Netgrep session could potentially overwhelm the local network, effectively denying service to other network operations.
The following other considerations are copied from the GHEAR Security considerations:
As far as I know, there aren't any major buffer overrun-type problems with Java's I/O libraries, but if there are, this code could be at risk. A compromised Virtual Machine could be used to cause a lot of problems, so it seems prudent to run GHEAR applications under accounts that have limited permissions. Don't run 'em as root!
Obviously, GHEAR yields some information about one or more running processes. If this information is sensitive, don't use GHEAR for those processes!
GHEAR's listener thread is itself vulnerable to certain denial of service attacks, and could be used to induce others (like overloading the CPU by overwhelming the VM with requests).
(None)
7 June 2000. Version 1.5.1 includes a lot of object model reorganization, the ability to re-load the searchterm file, sorted output, the new Infoseek Tracker and Monitor, and the Multi Monitor.
23 May 2000. Version 1.2. Initial release.
You can grab a tarball.
You can access the current CVS tree using:
setenv CVSROOT :pserver:anonymous@www.andthehorseyourodeinon.com:/home/cvsroot
Password for anonymous access to the CVS repository is "anonymous".
Right this way...
The whole bundle (including utilities, and so on) lives at http://www.andthehorseyourodeinon.com/tech/javadoc/packages.html
Q: But it ain't really grep-like!
A: True, but all the good names like "web weasel" and "net ferret" were already taken.
Initial version by Samuel Goldstein (samuelg@andthehorseyourodeinon.com)
Latest Web Page Update: 7 June 2000