NAME
ignore.list - websec url monitoring configuration
DESCRIPTION
IGNORE KEYWORDS
When determining which parts of a particular web page have changed, you may want to skip paragraphs that contain certain predefined words. For example, pages like InfoWorld, PC Magazine and PC Week often contain the current date/time regardless of whether there is new or changed content. In such cases, you can use IGNORE KEYWORDS to skip the paragraphs that contain date/time information.
Ignore keywords are stored in a file called ignore.list in the same directory as websec. Like the URL list, the ignore keywords are partitioned into different sections. Each section has a user-defined name. An example is shown below:
[General]
all rights reserved
an error occurred
click here
comments
copyright

[Date_Time]
January\s+\d{1,2}
February\s+\d{1,2}
March\s+\d{1,2}
April\s+\d{1,2}
May\s+\d{1,2}
In the example above, there are two sections: General and Date_Time. You can use them in the URL list as follows:
Ignore = General
You can also list several sections at once:
Ignore = General,Date_Time
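To make the relationship between section names and the Ignore setting concrete, here is a minimal Python sketch. It is not websec's own code. It assumes that each section header sits on its own line in square brackets, that each following non-blank line is one entry, and that the Ignore value is a comma-separated list of section names. The function names parse_ignore_list and select_patterns are made up for this illustration.

```python
def parse_ignore_list(text):
    """Return {section_name: [entry, ...]} from ignore.list-style text.

    Assumption: one entry per line, section headers written as [Name].
    """
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # blank lines carry no entry
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1]
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return sections


def select_patterns(sections, ignore_value):
    """Collect the entries for a value such as 'General,Date_Time'."""
    patterns = []
    for name in ignore_value.split(","):
        # Unknown section names are silently skipped in this sketch.
        patterns.extend(sections.get(name.strip(), []))
    return patterns


SAMPLE = r"""
[General]
all rights reserved
click here

[Date_Time]
January\s+\d{1,2}
"""

sections = parse_ignore_list(SAMPLE)
print(select_patterns(sections, "General,Date_Time"))
```

With this layout, naming two sections in Ignore simply concatenates their entries in the order given.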
If you use certain ignore keywords regularly, you might want to add them to a defaults section in the URL list.
Ignore keywords can contain regular expressions. For example, the ignore keyword January\s+\d{1,2} tells websec to look for the string January, followed by one or more whitespace characters, followed by one or two digits.
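The behaviour of that pattern can be tried out with a short Python sketch. This only illustrates the regular expression itself and uses Python's re module rather than websec's own matching code. The sample paragraphs are invented, and whether websec matches case-sensitively is not stated here, so the sketch assumes a plain case-sensitive search.

```python
import re

# The Date_Time entry from the example above.
pattern = re.compile(r"January\s+\d{1,2}")

paragraphs = [
    "Last updated January 15 at noon.",
    "January sales figures are in.",   # no digits follow, so no match
    "Posted January   7",              # several spaces still match \s+
]

# Keep only the paragraphs that do not contain the ignore keyword.
kept = [p for p in paragraphs if not pattern.search(p)]
print(kept)
```

Because the pattern is not anchored at its end, a string such as "January 2024" matches too: the expression consumes "January 20" and stops there.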
Two sections of ignore keywords are supplied in this distribution. General contains some general ignore keywords which you may want to use. Date_Time contains date/time detectors coded using regular expressions. Feel free to add your own!
IGNORE URLS
Most advertisements in webpages are of the following form:
<A HREF="http://page.url.com/advert/cgi-bin/" ...>
<IMG SRC="advert.animated.gif" ...>
Click here for free beer!
</A>
You can use ignore URLs to skip such advertisements when running webdiff.
Ignore URLs are also stored in ignore.list. They contain all or part of the URL referred to by the <A HREF> tag which you want to ignore. An example is shown below:
[Adverts]
page.url.com/advert/cgi-bin/
Use the Adverts section in the URL list as follows:
IgnoreURL = Adverts
You can also list several sections at once:
IgnoreURL = Adverts1,Adverts2
If you use certain ignore URLs regularly, you might want to add them to a defaults section in the URL list.
Like ignore keywords, ignore URLs can contain regular expressions.
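As a rough illustration of matching part of a URL, the Python sketch below tests link targets against the Adverts entry. It is an assumption-laden example and not websec's implementation: the helper name is_ignored is hypothetical, and the sketch assumes an entry may match anywhere inside the HREF value.

```python
import re

# Entry taken from the [Adverts] example above.
ignore_urls = [r"page.url.com/advert/cgi-bin/"]


def is_ignored(href, patterns):
    """Return True if any pattern matches somewhere in the link target."""
    return any(re.search(p, href) for p in patterns)


print(is_ignored("http://page.url.com/advert/cgi-bin/", ignore_urls))
print(is_ignored("http://page.url.com/news/today.html", ignore_urls))
```

Note that an unescaped dot in a regular expression matches any character, so the entry above also matches a literal dot. Write \. if only a real dot should match.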
An Adverts section is supplied in this distribution. Feel free to add your own!
SEE ALSO
AUTHOR
Baruch Even <websec@ev-en.org> is maintaining this program.