man log_analysis (Commands) - Analyze various system logs

NAME

log_analysis - Analyze various system logs

SYNOPSIS

log_analysis [-h] [-r] [-g] [-f config_file] [-o file] [-O] [-n nodename] [-U] [-u unknownsdir] [-D var1,var2=value,...] [-d days_ago] [-a] [-F] [-i] [-m mail_address] [-M mail_prog] [-s] [-S] [-t forced_type] [required_files...]
log_analysis -I info_type

DESCRIPTION

log_analysis analyzes and summarizes system log files. It also runs some other commands (e.g. w, df -k) to show the system state. It is intended to be run daily from cron.

log_analysis supports several major modes. The default mode is report mode, which scans through your logs, produces a text report, and exits. There is also real mode, which lets you monitor your logs continuously; gui mode, which is a gui sitting on top of real mode; and daemon mode, which is a daemonized variant of real mode.

OPTIONS

-a all
Show all logs, not just the ones from yesterday.
-A daemon mode
Start in daemon mode. Daemon mode is like real mode, except that the process daemonizes, and there is no regular output, just actions. Daemon mode is useful if you want to start log_analysis at system boot time to run actions. It is also useful if you have actions configured, you have multiple copies of log_analysis running in real/gui mode, and you only want the actions to happen once. See -r for more info on real mode. In general, anything that applies to real mode applies to daemon mode unless it explicitly says otherwise. The variables specific to daemon mode are daemon_mode and daemon_mode_pid_file. One variable that is not specific to daemon mode but is really useful with it is real_mode_no_actions_unless_is_daemon.
-b real mode backlogs
By default, real mode and gui mode ignore all existing log messages and only show new logs. With this option, real mode shows logs as indicated by days_ago. See -r for more info.
-d days_ago
Show logs from days_ago days ago. Defaults to 1 (i.e. show yesterday's logs). In -a mode, this option only affects the heading, and it defaults to 0. You can also provide an absolute date in the form YYYY_MM_DD, e.g. 2001_03_02, or one of the symbolic names today (equivalent to 0) and yesterday (equivalent to 1). You can also provide a date range in the form YYYY_MM_DD-YYYY_MM_DD or ago1-ago2 to get output for a range of days. Each day is output individually, so if you use the -o option, you get a separate file for each day, and if you use the -m option, you get a separate mail for each day. You can also set this in the config with the days_ago variable. See -r for how days_ago is handled under real mode and gui mode.
-D var1,var2=value,var3,...
This option lets you define preprocessor constants. Its argument is a comma-separated list of constants to define. To set a constant to a particular value, say constant=value.
-f config_file
Read config_file in addition to the internal config and the internal config files. See CONFIG FILE for details.
-F
Instead of loading the whole internal config, just use a minimal subset.
-g
gui mode, i.e. monitor log files continuously. Currently conflicts with many other modes and options. It has built-in support for log file rollover. This is basically real mode (see -r) with a GUI; variables that apply to real mode also apply to gui mode, but not vice versa. See variables gui_mode, gui_mode_modifier, and window_command for gui mode specifics. See -r for many things that also apply to gui mode.
-h help
Show command summary and exit.
-i includes suppress
Don't include the standard include files, i.e. /etc/log_analysis.conf, /usr/etc/log_analysis.conf, and the others listed in FILES. Note that this option does not stop the inclusion of $HOME/.log_analysis.conf in gui mode.
-I info
This option is used for obtaining internal information about log_analysis. log_analysis exits immediately after outputting the information. If info is help, log_analysis outputs the list of things you can use for info. If info is categories, all categories (those mentioned in the various configs and implicit categories) will be listed. If info is colors, all colors that work for real_mode and gui_mode will be listed. If info is config_versions, all config files will be listed with their config_version (if defined) and file_version (if defined). If info is evals, the evals built from the config (internal and local) are output. If info is internal_config, the internal config is output. If info is log_files, the log files that would have been read are output. If info is log_types, the known log types are output. If info is nothing, log_analysis just exits. Useful for testing configs. If info is pats, the known subpatterns will be listed. If info is patterns, the various patterns defined for the log types are output.
-m mail_address
Mail output to mail_address. This can also be specified in the config; see mail_address in VARIABLES.
-M mail_command
Use mail_command to send the mail. This can also be specified in the config; see mail_command in VARIABLES for more info, including the default.
-n nodename
Use nodename as the nodename (AKA hostname) instead of the output of uname -n. This is more than just cosmetic: entries in syslogged files will be processed differently if they didn't come from this nodename. This can also be specified in the config file; see nodename in VARIABLES.
-N process all nodenames
If the logs contain entries for nodes other than nodename (e.g. if the host is a syslog server), analyze them anyway.
-o file
Output to file instead of to standard output. Works with -m, so you can save to a file and send mail with a single command.
-O
With -o file, causes the output to go both to the file and to standard output. NB: this does not currently work with -m, so you can't output to a file, standard output, and to email.
-p pgp_type
Encrypts the mail output. Uses pgp_type to determine the encryption command. For use with -m or mail_address. See pgp_type in the list of global variables for info on encryption types.
-r
Real mode, i.e. monitor log files continuously. Currently conflicts with many other modes and options. It has built-in support for log file rollover. See -g for a GUI that can sit on top of this mode, and -A to run real mode as a daemon. See variables real_mode, real_mode_output_format, real_mode_sleep_interval, real_mode_check_interval, real_mode_backlogs (or the -b option), and keep_all_raw_logs in the list of global variables for more configurables. WARNING: in real mode and gui mode, only the most recent file per glob in optional_log_files is monitored. This means that you should set it to something like /var/log/messages* and /var/log/syslog* rather than /var/log/*. WARNING: in real mode and in gui mode, log_analysis treats days_ago differently; if it is a simple number, it is treated as the number of days ago to start looking at logs. So, if days_ago is 7, log_analysis looks through the past 7 days' worth of logs. HOWEVER, even if -d is set, log_analysis doesn't actually show these logs unless -b is specified or the corresponding variable real_mode_backlogs is set. NOTE: The primary feature of log_analysis is its reporting capability. Using it for continuous monitoring makes sense if you want a single config for reporting and for continuous monitoring. If you just want continuous monitoring, then you may be better off with some of the other software out there, such as swatch(1).
-s suppress other commands
Usually, log_analysis runs assorted commands that show system state (e.g. w, df -k). With this option, those commands are not run. See commands_to_run in VARIABLES for the list of extra commands.
-S suppress output footer
Usually, log_analysis will include its version number, the time it spent running, and its arguments at the end of the output. This option suppresses that output.
-t forced_type
log_analysis usually determines the type of logfiles by looking at the per-type log_filenames extension. This option and the type_force variable let you bypass that check.
-U unknowns-only
Output logfile unknowns to stdout and exit. If unknownsdir exists, also wipe it and then write out the raw unknown lines to files in unknownsdir. This exists to make writing custom rules easier.
-u unknownsdir
Use unknownsdir as the unknownsdir. If unknownsdir already exists, and contains files, its files will be used as the input for log_analysis regardless of any other command line options. If -U is also specified, after all processing unknownsdir will be wiped out and its files rewritten with the current unknowns. This is useful for writing your own configs.
-v version
Output version and exit.
required-files
If files are specified on the command line, log_analysis ignores its built-in list of optional and required log files, and processes the files on the command line. If one of the files doesn't exist, it's a fatal error.
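
The value forms accepted by -d (a number of days, today, yesterday, an absolute YYYY_MM_DD date, or a range of either) can be illustrated with a small Python sketch. This is not the tool's own code: the function names are hypothetical, and the assumption that a range may be given in either order is ours, not the man page's.

```python
from datetime import date, datetime, timedelta


def _one_day(token, today):
    """Resolve a single days_ago token to a calendar date."""
    if token == "today":
        return today
    if token == "yesterday":
        return today - timedelta(days=1)
    if token.isdigit():
        # A plain number counts days back from today.
        return today - timedelta(days=int(token))
    # Otherwise expect an absolute date such as 2001_03_02.
    return datetime.strptime(token, "%Y_%m_%d").date()


def resolve_days_ago(value, today):
    """Return the list of days selected by a days_ago value.

    Absolute dates use underscores, so a '-' can only be a range separator.
    """
    if "-" in value:
        first, last = (_one_day(t, today) for t in value.split("-", 1))
    else:
        first = last = _one_day(value, today)
    if first > last:  # assumption: accept ranges in either order
        first, last = last, first
    span = (last - first).days
    return [first + timedelta(days=i) for i in range(span + 1)]
```

Each returned day would correspond to one separate report (and one separate file or mail when -o or -m is in use).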

CONFIG FILE

The script has an embedded config file. It will also read various external config files if they exist; see FILES for a list. Later directives (from later in the file or from a file read later) override earlier directives.

You can make comments with '#' at the beginning of a line. If you want a '#' or '=' at the beginning of a line, you usually need to quote it with backslash.

Some directives take a block as argument. A block is a collection of lines that ends with a line that is empty or only contains whitespace. '#' at the beginning of a line still comments out the line. Leading whitespace on a line is ignored.
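
The block rules above (a block ends at an empty or whitespace-only line, '#' lines are skipped, leading whitespace is dropped) can be sketched in Python as follows. The function name is hypothetical, and two details are assumptions rather than documented behavior: the comment check is done after leading whitespace is removed, and a leading backslash before '#' or '=' is simply stripped.

```python
def read_block(lines):
    """Collect the lines of one block from an iterable of config lines."""
    block = []
    for raw in lines:
        if not raw.strip():
            # An empty or whitespace-only line terminates the block.
            break
        line = raw.lstrip().rstrip("\n")
        if line.startswith("#"):
            # Comment lines are dropped but do not end the block.
            continue
        if line.startswith("\\#") or line.startswith("\\="):
            # Assumption: a backslash-quoted '#' or '=' loses its backslash.
            line = line[1:]
        block.append(line)
    return block
```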

Before the config is parsed, it is passed through a preprocessor inspired by the aide(1) preprocessor.

Pattern directives

These directives describe your logs, and are the main point of this program. The basic idea here is that you first declare what logtype you are working with, and then you specify a bunch of perl patterns that describe different kinds of log messages, and that save parts of the message. For each perl pattern, you specify one or more destinations that describe what you want done with it.

logtype: type
Future patterns should be applied to this logtype (e.g. sulog, syslog, wtmp). Example: logtype: syslog
pattern: pattern
pattern is a perl regex (see perlre(1)) that implicitly starts with ^ (beginning of the line) and implicitly ends with \s*$ (optional whitespace and the end of the line). This should only be issued after a logtype: has been issued in the same config file. Wildcard parts of the pattern should be surrounded with parentheses, to save these parts for later use in the format: directive. Note that there are some tokens with special meanings that can be used here in the form $pat{something}, e.g. $pat{ip}, $pat{file}, etc. (see pat for details, and run log_analysis -I pats for the current list). Examples:
    pattern: popper: Stats: ($pat{mail_user}) (\d+) (\d+) (\d+) (\d+)
    pattern: login: LOGIN ON ($pat{file}) BY ($pat{user})
The order of precedence for patterns is undefined, except that user-defined patterns always have precedence over the patterns of the internal config.
format: format
format is treated as a string that contains the useful information from a pattern. Note that it should not actually be quoted. A format is mandatory for category destinations, but should not be used with SKIP or LAST destinations. For example, if we had a pattern that was login: LOGIN ON ($pat{file}) BY ($pat{user}), we would probably just want $2, so we might say:
    format: $2
Similarly, if we had a pattern that was kernel: deny (\d+) packets from ($pat{ip}) to ($pat{ip}), we might want to say:
    format: $2 => $3
use_sprintf
use_sprintf is optional. If this directive is present for a given format, then instead of the format being treated as a string, it is treated as the arguments for sprintf(3). For example, if you have a source IP address in $2 and a destination IP address in $3, you could just use the format $2 => $3, but things would line up better if you did this:
    format: "%-15s => $3", $2
    use_sprintf
count: count
count is optional. By default, a log line that matches a pattern causes the category to increment by 1. But sometimes a single log line corresponds to multiple events, e.g. if you have a log message of the form 5 packets denied by firewall or last message repeated 3 times; in that case you can extract the event count with count. For example, if you're using the pattern kernel: deny (\d+) packets from ($pat{ip}) to ($pat{ip}), you might say:
    count: $1
color: colors
A space-separated list of colors to display this message in when in real mode or gui mode. For a list of colors that will work in both modes, run log_analysis -I colors. Note that bell is among the available colors, because it didn't fit anywhere else. See the colors entry for more info. NOTE: if multiple dest configs with conflicting color settings result in delivery to the same line in gui mode, the result is currently undefined. There is only one line to be displayed, after all.
description: description_text
This is a simple text description of the event, to explain the problem to your operators. It can be accessed via gui mode. The note under color above also applies here.
do_action: action
Run action (described elsewhere in the config with the action: keyword) if this event is seen in real mode or gui mode.
priority: priority
Assign priority priority to action. Currently, the only priority that does anything is IGNORE. It can be used to ignore events.
dest: dest
This describes what you want done with the data in a pattern. If dest is the special token SKIP, the data is discarded. If dest is the special token LAST, the data is assumed to be of the form last message repeated N times, and we pretend as though the last message we saw occurred, using count as a multiplier. If dest starts with the special token UNIQUE, we do special unique handling, which is covered in UNIQUE DESTINATION. If dest starts with the special token CATEGORY or is any other string, it is treated as a category that the pattern data should be saved to. For example, if the pattern was login: LOGIN ON ($pat{file}) BY ($pat{user}), and the format was $2, then one might set dest to login: successful local login. You must have a format defined before the dest. You can have multiple dest directives for a single pattern, if all of the dests are category destinations. Each one needs its own format. Similarly, if you set count or use_sprintf, they are tied to the particular dest you set them with. Note that dest closes the description of a destination, so you need to have any other related directives (e.g. format, count, use_sprintf) before the dest directive. This ordering is necessary to avoid ambiguity in the multiple-destination case.
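
To make the interplay of pattern, format, count, and dest concrete, here is a rough Python model of how one rule could turn a log line into a category entry. It is only a sketch: log_analysis itself uses perl regexes, the subpattern regexes in PATS are invented stand-ins for the real ones (see log_analysis -I pats), and the category name "firewall denials" is hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical subpatterns; the real definitions live in the pat variable.
PATS = {
    "ip": r"\d{1,3}(?:\.\d{1,3}){3}",
    "user": r"[\w.-]+",
    "file": r"[\w./-]+",
}


def compile_pattern(pattern, pats=PATS):
    """Expand $pat{name} tokens and add the implicit ^ and \\s*$ anchors."""
    expanded = re.sub(
        r"\$pat\{(\w+)\}",
        lambda m: "(?:%s)" % pats[m.group(1)],
        pattern,
    )
    return re.compile("^" + expanded + r"\s*$")


def expand(template, match):
    """Replace $1, $2, ... in a format or count with the saved groups."""
    return re.sub(r"\$(\d+)", lambda m: match.group(int(m.group(1))), template)


def apply_rule(line, rule, categories):
    """Apply one pattern/format/count/dest rule to a line; True on a match."""
    match = compile_pattern(rule["pattern"]).match(line)
    if match is None:
        return False
    # Without a count directive, each matching line counts as one event.
    count = int(expand(rule.get("count", "1"), match))
    categories[rule["dest"]][expand(rule["format"], match)] += count
    return True


rule = {
    "pattern": r"kernel: deny (\d+) packets from ($pat{ip}) to ($pat{ip})",
    "format": "$2 => $3",
    "count": "$1",
    "dest": "firewall denials",  # hypothetical category name
}
categories = defaultdict(lambda: defaultdict(int))
apply_rule("kernel: deny 5 packets from 10.0.0.1 to 10.0.0.2  ", rule, categories)
```

After the call, the "firewall denials" category holds the item "10.0.0.1 => 10.0.0.2" with a count of 5, because count pulled the multiplier out of the first group.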

Event directives

You can configure what happens for incoming events based on certain criteria. Currently, those criteria are a simple string match of one or more of the category, data, or hostname. So, for example, you can ignore all messages from roguehost, or color user logged in messages for a certain user in bright red. Here are the useful directives:

event:
Starts a new event config.
match category: value
match data: value
match hostname: value
This event config applies when the category is value, or the data is value, or the hostname is value. If multiple match lines are supplied, they are ANDed together.
color: color
description: description_text
do_action: action
priority: priority
color, description, do_action, and priority work the same way as they do in a dest config or in a category config. If event, dest, and category configs all apply to a given event, then event has highest precedence, followed by dest, followed by category.
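
The matching and precedence rules for event configs can be modeled in a few lines of Python. This is a sketch under assumptions: the function names are hypothetical, configs are represented as plain dicts, and "simple string match" is read here as exact equality.

```python
def event_matches(matches, event):
    """True if every 'match field: value' line agrees with the event (AND)."""
    return all(event.get(field) == value for field, value in matches.items())


def effective_setting(name, event_cfg=None, dest_cfg=None, category_cfg=None):
    """Pick a setting such as color, honoring event > dest > category."""
    for cfg in (event_cfg, dest_cfg, category_cfg):
        if cfg and name in cfg:
            return cfg[name]
    return None
```

For instance, an event config that sets a color wins over a dest config and a category config that also set one, while a setting present only in the category config is still picked up.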

Category directives

Several patterns can lead to the same category, so category-specific directives are associated with the category, not with a pattern. Here are the category directives:

category: category
Specifies which category subsequent directives will define.
filter: filter commands
By default, log_analysis will output all the data it finds in a category. Filters let you specify, say, that only the top 10 items should be output, or that only the items that occurred fewer than 5 times should be output. If a category has data, but none of the data meet the filter rules, then the category will be completely skipped. See FILTERS for more info.
sort: sorting keywords
Specifies how this category should be sorted in the output. Examples are funky, string, value, reverse value, etc. The default is funky. See SORTING for more info.
derive: derive commands
The usual way to populate categories is via the pattern config. But sometimes, you want to combine two or more elemental categories to make a new category. Any categories derived in this manner may not be a destination for simple patterns. There are currently three subcommands for this, add, subtract, and remove (the quotes are literal):
    add, subtract: These do what you expect: take the values for the items in category2 and add or subtract them from the values for the items in category1. Any item defined in either category will be in the new category. Subtract can cause the values in the new category to be negative or 0.
    remove: The new category will contain the items in category1 that are not in category2. This is very different from subtract.
Example: if category1 contains A with a value of 2 and B with a value of 2, while category2 contains A with a value of 1 and C with a value of 1, 'category1 subtract category2' will contain A with a value of 1, B with a value of 2, and C with a value of -1, while 'category1 remove category2' will only contain B with a value of 2.
color: color
description: description_text
do_action: action
priority: priority
color, description, do_action, and priority work the same way as they do in a dest config or in an event config. If event, dest, and category configs all apply to a given event, then event has highest precedence, followed by dest, followed by category.
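
The difference between the subtract and remove forms of derive is easy to get wrong, so here is a small Python model that reproduces the example given under derive. Categories are represented as dicts from item to value; the function names are hypothetical and this is not how log_analysis stores its data internally.

```python
def derive_add(category1, category2):
    """Items from either category, with their values summed."""
    keys = category1.keys() | category2.keys()
    return {k: category1.get(k, 0) + category2.get(k, 0) for k in keys}


def derive_subtract(category1, category2):
    """Items from either category; category2 values are subtracted."""
    keys = category1.keys() | category2.keys()
    return {k: category1.get(k, 0) - category2.get(k, 0) for k in keys}


def derive_remove(category1, category2):
    """Only the category1 items that do not appear in category2 at all."""
    return {k: v for k, v in category1.items() if k not in category2}


category1 = {"A": 2, "B": 2}
category2 = {"A": 1, "C": 1}
subtracted = derive_subtract(category1, category2)  # A: 1, B: 2, C: -1
removed = derive_remove(category1, category2)       # B: 2
```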

Action directives

In real mode and in gui mode, sometimes you want an action (like paging someone) to automatically happen when a particular message is seen. And in gui mode, you might want to run a command on a message interactively (e.g. to telnet or ssh into the host it came from). The directives to do that (inspired by swatch(1)) are:

action: action_name
Starts defining a new action named action_name.
command: command
The command to run for the current action. command uses the same tags as real_mode_output_format. WARNING: you can potentially shoot yourself in the foot by passing data that has not been sanitized to a command on your system. Be careful!
window: title
Performing the action will require creating a window using title as the title. The title will be passed to window_command as the %t tag. title itself uses the same tags as real_mode_output_format. This only makes sense for gui mode. WARNING: you can potentially shoot yourself in the foot by passing data that has not been sanitized to a command on your system. Be careful!
use_pipe:
The data in the event will be sent to the command via standard input. The format used will be that specified by the default_action_format variable, unless overridden locally by the action_format: directive. These formats allow the same tags as real_mode_output_format.
action_format: format
See use_pipe above.
throttle: throttle_time
Automatically-triggered actions can potentially result in a slew of events. The throttle option lets you specify a minimum amount of time before the action should recur with this event. The time can be specified as seconds, as minutes:seconds, or as hours:minutes:seconds. Throttles do not apply to actions and logins that are explicitly invoked via the GUI. By default, the throttle is triggered on unique category and data. That is, if the event was category user logged in and the data was morty, then the throttle will keep user logged in, morty events from causing the action to run again, but won't stop user logged in, esther or no such user, morty events from triggering the action. This default is set with the default_throttle_format variable, which defaults to %c\n%d. It can be overridden on a per-action basis with the throttle_format: directive, which takes the same tags as real_mode_output_format. If you want the throttle to be global to the action (say, a pager action), set throttle_format to a simple scalar value (like 1).
throttle_format: format
See throttle: above.
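
The throttle behavior described under throttle: can be sketched in Python as below. The class and function names are hypothetical, timestamps are plain seconds supplied by the caller, and only the %c and %d tags are handled; their meanings (category and data) are inferred from the default throttle format described above.

```python
def parse_throttle(text):
    """Convert seconds, minutes:seconds, or hours:minutes:seconds to seconds."""
    seconds = 0
    for part in text.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


class Throttle:
    """Suppress repeats of an action for events that share a throttle key."""

    def __init__(self, throttle_time, throttle_format="%c\n%d"):
        self.interval = parse_throttle(throttle_time)
        self.format = throttle_format
        self.last_run = {}  # throttle key -> time the action last ran

    def key(self, category, data):
        # Assumption: %c is the category and %d is the data.
        return self.format.replace("%c", category).replace("%d", data)

    def should_run(self, category, data, now):
        k = self.key(category, data)
        last = self.last_run.get(k)
        if last is not None and now - last < self.interval:
            return False
        self.last_run[k] = now
        return True
```

With the default format, "user logged in" for morty and for esther are throttled independently. With a constant format such as "1", every event maps to the same key, so the throttle becomes global to the action.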

Other directives

config_version version-number
Declare that the config is compatible with version version-number. This is for version-control purposes. Every config file should have one of these. You can scan your config files' config versions with -I config_versions.
file_version revision-information
Your own version control information. revision-information can be arbitrary text. You can scan your config files' file versions with -I config_versions.
include file
Read in configuration from file. Dies if file doesn't exist. file is subject to usual tag substitutions; see TAG SUBSTITUTION.
include_if_exists file
Just like include, but doesn't die if the file doesn't exist.
include_dir dir
Read in all files in dir, and include them. Die if the directory doesn't exist, or if a file in the directory isn't readable. dir is subject to the usual tag substitutions; see TAG SUBSTITUTION. Any filenames that match a pattern in filename_ignore_patterns will be skipped.
include_dir_if_exists dir
Just like include_dir, but doesn't die if the directory doesn't exist. Does still die if any of the files in dir isn't readable.
block_comment
Throws out the block immediately after it.
set var varname =value
Set scalar variable varname to value value. If the variable already exists, this will overwrite it. See VARIABLES for the list of variables you can play with.
add var varname =value
If scalar variable varname already exists, append value to the end of its current value. If it doesn't yet exist, create it and set it to value. See VARIABLES for the list of variables you can play with.
set arr arrname =
Read in the block that follows this declaration, make the lines into an array, and set the array variable arrname to that array. See VARIABLES for the list of variables you can play with.
add arr arrname =
Read in the block that follows this declaration, make the lines into an array, and append that array to the array named arrname. See VARIABLES for the list of variables you can play with.
remove arr arrname =
Read in the block that follows this declaration, and for each line, look for and delete that line from array arrname. If one of these lines cannot be found, the result is a warning, not death. See VARIABLES for the list of variables you can play with.
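
The semantics of the five variable directives can be summarized with a small Python model. It does not parse config syntax; the class and method names are hypothetical, and removing only the first matching array entry is an assumption.

```python
import warnings


class ConfigVars:
    """Toy model of scalar and array variables set from a config."""

    def __init__(self):
        self.scalars = {}
        self.arrays = {}

    def set_var(self, name, value):
        # set var: overwrite any existing value.
        self.scalars[name] = value

    def add_var(self, name, value):
        # add var: append to the current value, or create the variable.
        self.scalars[name] = self.scalars.get(name, "") + value

    def set_arr(self, name, block):
        # set arr: the block's lines become the whole array.
        self.arrays[name] = list(block)

    def add_arr(self, name, block):
        # add arr: the block's lines are appended to the array.
        self.arrays.setdefault(name, []).extend(block)

    def remove_arr(self, name, block):
        # remove arr: a missing line is a warning, not a fatal error.
        arr = self.arrays.setdefault(name, [])
        for line in block:
            if line in arr:
                arr.remove(line)  # assumption: first occurrence only
            else:
                warnings.warn("%r not found in %s" % (line, name))
```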

VARIABLES

Some variables are scalar, which means they are strings or numbers. Some variables are arrays, which are lists of scalars.

Some variables are mandatory, which means they must be defined somewhere in one of the config files, while some variables are optional.

Some variables are global, while some are per-log-type extensions. Some examples of per-log-type extensions are date_pattern and filenames. Extensions should actually appear in the format TYPE_EXTENSION, i.e. date_pattern would actually appear as syslog_date_pattern for the syslog log-type and sulog_date_pattern for sulog.

To see examples of many of the possibilities, as well as the default values, run log_analysis -I internal_config.

PER-LOG-TYPE VARIABLE EXTENSIONS

filenames
This mandatory extension is an array of file basenames that apply to the log type. For example, if you wanted /var/adm/messages.1 to be processed by the syslog rules, you might add messages to syslog_filenames.
open_command
Some log files (e.g. wtmp log types) are in a binary format that needs to be interpreted by external commands. This optional scalar extension specifies a command to be run to interpret the file. The command is subject to the usual tag substitutions (see TAG SUBSTITUTIONS), plus the %f tag maps to the file. For example, the wtmp log type defines wtmp_open_command as "last -f %f". If both decompression_rules and open_command apply to a given file, the intermediate data will be stored in a temp file unless pipe_decompress_to_open is used. See pipe_decompress_to_open for more info.
pipe_decompress_to_open
If both decompression_rules and open_command apply to a given file, the intermediate data will be stored in a temporary file by default to avoid problems with some commands that can't handle input from a pipe. If this optional scalar extension is set to 1 (or any true value), then instead, the output of the decompression rule will be piped to the open command, and the open command's %f tag will be mapped to -.
open_command_is_continuous
If an open_command has been specified and the command is the sort that never exits (e.g. tcpdump or the like), you should set this to let log_analysis know what to expect. Such commands should only ever be used in real mode or gui mode.
pre_date_hook
This optional extension is an array of arbitrary perl commands that are run for each log line, before the date processing (or any other processing) is done.
date_pattern
This mandatory extension is a scalar that contains a pattern with at least one parenthesized subpattern. Before any rules are applied to a log line, the engine strips off the date pattern. If the engine is only looking at one day (ie. the default), it takes the part of the string that matched the parenthesized subpattern, and if it isn't equal to the right date, it skips the line. The date_format extension (next) describes what the date should look like.
date_format
This mandatory extension is a scalar that describes the date using the same format as strftime(3). For example, syslog_date_format is %b %e.
nodename_pattern
This optional extension is a pattern with at least one parenthesized subpattern. If it exists, then after the date_pattern is stripped from the line, this pattern is stripped, and the part that matched the subpattern is compared to the nodename. If they're not equal, then the relevant counter for the category named by the other_host_message variable is incremented. Note that all nodenames are subject to having the local domain stripped from them; see domain and leave_FQDNs_alone for details.
pre_skip_list_hook
This optional extension is an array of perl commands to be run after the nodename check, just before the skip_list check.
skip_list
This optional extension is obsolete and deprecated, but still works for backwards compatibility.
raw_rules
This optional extension is obsolete and deprecated, but still works for backwards compatibility.
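
The per-line flow implied by date_pattern, date_format, and nodename_pattern (strip the date, skip lines for the wrong day, then strip and compare the nodename) can be sketched in Python. The two regexes below are invented syslog-like examples, not the tool's actual defaults (see log_analysis -I internal_config for those), and skipping lines whose date cannot be parsed is an assumption.

```python
import re
from datetime import date

# Hypothetical syslog-style settings, for illustration only.
DATE_PATTERN = r"(\w{3} [ \d]\d) \d\d:\d\d:\d\d "
NODENAME_PATTERN = r"(\S+) "


def format_b_e(day):
    """Format a date like strftime's '%b %e' (day padded with a space)."""
    return "%s %2d" % (day.strftime("%b"), day.day)


def strip_prefix(pattern, line):
    """Strip a leading pattern; return (first group, rest of line)."""
    m = re.match(pattern, line)
    if m is None:
        return None, line
    return m.group(1), line[m.end():]


def process_line(line, wanted_day, nodename):
    """Return (from_this_host, message), or None if the line is skipped."""
    stamp, rest = strip_prefix(DATE_PATTERN, line)
    if stamp is None or stamp != format_b_e(wanted_day):
        return None  # no date found (assumption) or not the day of interest
    host, rest = strip_prefix(NODENAME_PATTERN, rest)
    return host == nodename, rest
```

What remains in the second element of the result is the text that the logtype's patterns would then be matched against.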

GLOBAL VARIABLES

These variables are all globals.

log_type_list
This variable is a mandatory global array that contains the list of all known log-types, e.g. syslog, sulog, wtmpx, etc.
pat
This variable is a mandatory global array that contains a list of subpattern names followed by a comma, optional whitespace, and a perl regex that represents that subpattern. Some of the predefined patterns include ip, zone, user, mail_user, etc. Run log_analysis -I pats for a list.
host_pat
file_pat
ip_pat
mail_user_pat
user_pat
word_pat
zone_pat
Legacy variables. Please don't use them.
other_host_message
output_message_one_day
output_message_all_days
output_message_all_days_in_range
Assorted mandatory scalars that are used for human-readable output. other_host_message defaults to "Other hosts syslogging to us", output_message_one_day defaults to "Logs for %n on %d", output_message_all_days defaults to "All logs for %n as of %d", and output_message_all_days_in_range defaults to "All logs for %n for %s through %e".
date_format
This variable is a mandatory global scalar that describes how you want the date printed in the output. Uses the format of strftime(3). Note that you probably shouldn't use characters that you wouldn't want in a filename (e.g. whitespace or '/') if you want to use the %d tag for output_file.
output_file
Equivalent to -o file. This variable is an optional global scalar that names a file that will be output to instead of standard output. Works with mail_address (if specified). Note that this variable is subject to the usual tag substitutions (see TAG SUBSTITUTIONS), plus you can use the %d tag for the date, so you can set it to something like /var/log_analysis/archive/%n-%d. See output_file_and_stdout.
output_file_and_stdout
Equivalent to -O. This variable is an optional global scalar that changes the behavior of -o or output_file. By default, -o or output_file causes output to go only to the named file. With this variable, output also goes to standard output. Note: this does not currently work with -m.
nodename
This variable is an optional global scalar that is used in a bunch of places: in checking to see whether a message from syslog (or other log type that defines nodename_pattern) originated on this host; in reading in various default config files; etc. If left unset in the config, its value is set from the output of uname -n. Its value is used to set the n tag. Note that unless leave_FQDNs_alone is set, log_analysis will try to strip the local domain name from nodename.
osname
osrelease
These two optional global scalars default to the output of uname -s and uname -r, respectively. They are only used for reading in default config files. Their values set the s and r tags, respectively.
domain
This variable is an optional global scalar. If you don't set it, log_analysis will try to set it by looking for a domain line in /etc/resolv.conf. If log_analysis has domain set, it will attempt to strip away the local domain name from all nodenames it encounters, unless leave_FQDNs_alone is set. See leave_FQDNs_alone for details.
leave_FQDNs_alone
This variable is an optional global scalar. By default, if log_analysis has domain set (either explicitly or implicitly), it will attempt to strip away the domain name in domain from all nodenames it encounters. If you set this to 1, or to some other true value, log_analysis will not attempt to strip the domain name in domain.
PATH
This variable is an optional global scalar that sets the PATH environment variable. This doesn't help the initial setting of nodename, osname, or osrelease, which are set by running uname.
umask
This variable is an optional global scalar that sets the umask. See umask(2).
priority
This variable is an optional global scalar that sets the priority, or niceness. See nice(1). Setting this to zero means run unchanged from the current niceness. Setting this negative is a bad idea unless you really know what you're doing, and is forbidden to non-root users.
decompression_rules
This variable is an optional global array of rules to decompress compressed files, in the format: compression-extension, comma, space, command to decompress to stdout. The command is subject to the usual tag substitutions (see TAG SUBSTITUTIONS), plus %f stands for the filename. For example, the rule for gzipped files is "gz, gzip -dc %f". If both decompression_rules and open_command apply to a given file, the default is to use a temp file for the intermediate results unless pipe_decompress_to_open is used. See pipe_decompress_to_open for more info.
pgp_rules
This variable is an optional global array of rules for PGP encrypting messages, in the format: PGP type (user defined), comma, space, command to PGP encrypt stdin to stdout. The command is subject to the usual tag substitutions, plus %m stands for the email address. For use with the -p and -m options. For example, the rule for gnupg is "g, gpg -aer %m 2>&1". Internally defined rules are g for gnupg, 2 for PGP 2.x, and 5 for PGP 5.x. WARNING: The user who runs log_analysis must have already imported the mail destination's key for this to work. Make sure to test this before you put it in a cronjob.
filename_ignore_patterns
This variable is an optional global array of patterns that describe filenames to be skipped in an include_dir/include_dir_if_exists context, such as emacs backup files (.*~) or vim backup files (\..*\.swp). Only the file component of the path is examined, not the directory component. Patterns implicitly begin with ^ and implicitly end with $.
mail_address
This variable is an optional global scalar that can consist of an email address. If set, the output of the script will be mailed to the address it is set to. The -m option does the same thing, and overrides this.
mail_command
This variable is an optional global scalar that is the command used to send mail if -m is used or mail_address is set. The -M option does the same thing, and overrides this. This variable is subject to the usual tag substitutions, plus %m stands for mail_address and %o stands for the relevant output message. The default is: Mail -s '%o' %m
optional_log_files
This variable is an optional array of file globs that are to be processed. Note that, unlike required_log_files, these are globs rather than literal filenames, although literal filenames will also work. [Globs are filenames with wildcards, ie. /var/adm/messages*.] See -r for an issue specific to real mode and gui mode.
commands_to_run
This variable is an optional array of commands that are also supposed to be run to give a snapshot of the system state. These are currently: w, df -k, and cat /etc/dumpdates.
ignore_categories
This variable is an optional array of categories that you don't want to see. Rather than try to remove all the rules for these categories, you can just list them here.
priority_categories
This variable is an optional array of categories that will be listed first in the output.
days_ago
This optional scalar variable is the config equivalent of the -d option.
process_all_nodenames
This optional scalar variable is the config equivalent of the -N option.
type_force
This optional scalar is the config equivalent of the -t option.
allow_nodenames
This variable is an optional array of nodenames that can log to this host. Usually, logs labelled as being from another host will not be analyzed, and each such line will be listed in a special category; if you choose to allow some nodenames (or if you choose to process all nodenames by setting -N or setting process_all_nodenames) then these log messages will also be processed.
real_mode
This variable is the config equivalent of the -r option; see the -r option for more details.
real_mode_output_format
This is a required global scalar. It describes the per-output format for real mode and gui mode. It is subject to normal tag substitution (see TAG SUBSTITUTIONS); in addition to the normal tags, %c is replaced with the category, %# is replaced with the count, %d is replaced with the formatted data, %h is replaced with the nodename of the message, and %R is replaced with the raw, original log line without the trailing newline. If keep_all_raw_logs is set, you also get %A for all the raw log lines. WARNING: you usually want %h (nodename of the message), not %n (nodename of the host you're running on, which is one of the default tag substitutions). The default is: %c: (loghost %n, from host %h)\n%-10# %d\n\n
real_mode_sleep_interval
This optional global scalar is for use with real mode and gui mode. In these modes, log_analysis reads log files for more data, sleeps for a little while, and then reads again. The sleep interval controls how long log_analysis sleeps (in seconds). It defaults to 1.
real_mode_check_interval
This optional global scalar is for use with real mode and gui mode. In these modes, log_analysis sits in a loop reading from the log files. Periodically, it checks whether the log files have rolled over or newer log files have appeared. If at least this long (in seconds) has gone by since the last check, it checks again.
keep_all_raw_logs
This optional global scalar is a boolean for use with real mode and gui mode. It enables a %A tag that contains all the raw logs for a given entry. That is, if you have multiple log lines that contain essentially the same data, only the first line shows up in %R, and the rest are thrown out. This variable lets you keep them all. It can eat up a lot of memory, so it's disabled by default.
real_mode_backlogs
This optional global scalar is equivalent to -b.
colors
This variable is an optional global array for use with real mode and gui mode. It defines the colors available on console, using name, string pairs. The usual tag substitution rules apply to the string, plus the special tag %a stands for octal character 007 (ASCII BEL) and %e stands for octal character 033 (ASCII ESC). Some of the colors are actually mode changes (ie. normal, inverse, reverse, blink, etc.) If you define any colors, you should also define a normal color. Note that bell is among the colors; it didn't belong anywhere else. You can list colors with log_analysis -I colors.
gui_mode
This variable is the config equivalent of the -g option; see the -g option for more details. It is an optional scalar.
gui_mode_modifier
In gui mode, the default modifier to do things with the keyboard is alt, ie. alt-q to exit. This lets you change it. It is an optional scalar.
window_command
In gui mode, if we need a window to run a command, say an action, this will be the command that is used. The tags are the same as real_mode_output_format, plus we have %t as the title and %C as the command. It is an optional scalar.
login_action
This optional array lets you specify what action should be used to log in to a given host in gui mode, overriding default_login_action. Lines are in the format host, login_action.
default_login_action
This optional scalar specifies which login action should be used to log in to hosts by default in gui mode.
default_throttle_format
See the throttle: directive in the action group.
default_action_format
See the use_pipe directive in the action group.
print_command
print_format
save_format
gui_mode_config_autosave
gui_mode_config_file
These are for GUI use.
default_sort
This variable is an optional global scalar that describes how certain things will be sorted. See SORTING for info on what this can be set to. Defaults to funky.
default_filter
This variable is an optional global scalar that describes the default category filter. See FILTERS for info on what this can be set to.

PREPROCESSOR DIRECTIVES

NB: these get completely processed before all other directives, so they don't care about other syntax elements. Except as noted, these should appear at the beginning of the line after optional whitespace.

@@end
End of config file.
@@define var val
Define var as value val. var should contain only alphanumerics and underscores, and start with an alphanumeric. val may contain no whitespace.
@@undef var
Undo any previous definition of var.
@@ifdef var
@@ifndef var
@@else
@@endif
If variable var is defined, even defined as a false value, the lines after the @@ifdef are used, otherwise the lines are effectively commented out. @@ifndef is the logical reverse. @@ifdef and @@ifndef must be terminated by an @@endif. They may contain an @@else section that works in the usual way.
@@ifhost name
@@ifnhost name
These are just like @@ifdef and @@ifndef above, except that they test if the variable nodename is equal to the value supplied for name.
@@{var}
If this string appears anywhere on any line, then if var is a defined variable, its value is substituted. If var is not a defined variable, the string is left literally. Note that this behaviour is different from that of aide(1).
@@warn message
Print out message as soon as the config is read.
@@error message
Print out message and exit as soon as the config is read.
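To make the directive semantics concrete, here is a rough Python sketch of a preprocessor pass. It is an illustration written for this page, not the program's code. The function name and the sample config lines are made up, @@warn and @@error are skipped rather than modeled, and malformed input (for example an @@else without an @@ifdef) is not checked.

```python
import re

def preprocess(lines, defines=None, nodename="localhost"):
    """Apply a subset of the @@ directives and return the surviving lines."""
    defines = dict(defines or {})
    stack = []   # one boolean per open @@if*: is that branch currently taken?
    output = []
    for line in lines:
        words = line.split()  # also drops optional leading whitespace
        directive = words[0] if words and words[0].startswith("@@") else None
        active = all(stack)
        if directive in ("@@ifdef", "@@ifndef", "@@ifhost", "@@ifnhost"):
            if directive in ("@@ifdef", "@@ifndef"):
                result = words[1] in defines  # defined, even if false
            else:
                result = words[1] == nodename
            if directive in ("@@ifndef", "@@ifnhost"):
                result = not result
            stack.append(result)
        elif directive == "@@else":
            stack[-1] = not stack[-1]
        elif directive == "@@endif":
            stack.pop()
        elif not active:
            continue  # effectively commented out
        elif directive == "@@end":
            break
        elif directive == "@@define":
            defines[words[1]] = words[2] if len(words) > 2 else ""
        elif directive == "@@undef":
            defines.pop(words[1], None)
        elif directive in ("@@warn", "@@error"):
            continue  # not modeled in this sketch
        else:
            # Substitute @@{var} if var is defined; otherwise leave it literal.
            output.append(re.sub(r"@@\{(\w+)\}",
                                 lambda m: defines.get(m.group(1), m.group(0)),
                                 line))
    return output

sample = [
    "@@define site lab",
    "@@ifdef site",
    "first @@{site} @@{missing}",
    "@@else",
    "skipped",
    "@@endif",
    "@@ifhost otherhost",
    "also skipped",
    "@@endif",
    "@@end",
    "never read",
]
print(preprocess(sample))
```

Only the line inside the taken @@ifdef branch survives, with @@{site} replaced and the undefined @@{missing} left as written.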

SORTING

You can sort category items using several different criteria. You can set the default_sort, and then on a per-category basis, you can use the sort: keyword for finer control. If you don't override it, default_sort defaults to funky. Sorts stack, so you can use reverse string or reverse value. In theory, you can stack all of them, ie. reverse value reverse funky, but there is no guarantee that sorts are stable.

The available sorts are:

string
Simple string lexicographical sort. Does not handle numbers well.
numeric
Sorts numbers, including decimal numbers, correctly, but cannot handle non-numeric characters, and cannot handle IPs correctly.
funky
Tries to do the right thing with mixed integers and strings. Handles IP addresses correctly. It does not handle decimal numbers correctly.
reverse
Reverses the current order. Can be used in conjunction with another sort, ie. reverse string.
value
Sorts by count (ascending) instead of by item.
none
Does no additional sorting.
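The difference between string and funky is easiest to see on IP addresses and numbered hostnames. The Python sketch below approximates funky with a common "natural order" key (digit runs compared as integers). How log_analysis actually implements funky is not described here, so treat the key function as an assumption. The sample items and counts are made up.

```python
import re

def funky_key(item):
    """Split item into digit and non-digit runs; digit runs compare as integers."""
    key = []
    for digits, text in re.findall(r"(\d+)|(\D+)", item, flags=re.ASCII):
        # Tag each part so integers and strings are never compared directly.
        key.append((0, int(digits), "") if digits else (1, 0, text))
    return key

items = ["10.0.0.9", "10.0.0.10", "10.0.0.100", "host2", "host10"]
string_order = sorted(items)                # like the string sort
funky_order = sorted(items, key=funky_key)  # approximation of funky

counts = {"alice": 3, "bob": 12, "carol": 7}
value_order = sorted(counts, key=counts.get)    # like value: ascending count
reverse_value_order = value_order[::-1]         # like reverse value

print(string_order)
print(funky_order)
print(value_order, reverse_value_order)
```

The string order puts 10.0.0.10 before 10.0.0.9 and host10 before host2. The natural-order key restores the numeric order in both cases.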

FILTERS

Sometimes, you don't want to see all the information in a category, just the top few items, or whatever. Filters let you do this. You can set a default filter using default_filter (defaults to none) or you can set filters on a per-category basis using the filter: keyword.

Some commands you can use:

>= N
Only show items whose count is greater than or equal to N.
<= N
> N
< N
= N, == N
!= N, <> N, >< N
These are analogous to >=.
top N
top N%
top_strict N
top_strict N%
Only show those items whose count is in the top N or top N%. The difference between top and top_strict is what happens when there's a tie to be in the top N. top will include all the items that tie, even if this means there will be more than N. top_strict always cuts off after N.
bottom N
bottom N%
bottom_strict N
bottom_strict N%
Analogous to top.
subfilter and subfilter
subfilter or subfilter
Lets you combine two or more subfilters with and or or (ie. "top 10 and >= 4").
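The tie handling of top versus top_strict, and the combination of subfilters with and, can be sketched in Python as below. This is only a model of the behaviour described above: the function names and sample counts are made up, the N% forms are left out, and which of several tied items top_strict keeps is an assumption (here, whichever comes first).

```python
def top(counts, n, strict=False):
    """Items with the n highest counts, highest first."""
    if n <= 0:
        return []
    ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    if strict or len(ranked) <= n:
        return ranked[:n]          # top_strict: always cut off after n
    cutoff = ranked[n - 1][1]      # count of the n-th item
    return [kv for kv in ranked if kv[1] >= cutoff]  # top: keep all ties

def at_least(counts, n):
    """Model of the '>= N' filter."""
    return [kv for kv in counts.items() if kv[1] >= n]

def both(first, second):
    """Model of 'subfilter and subfilter': keep items that pass both."""
    return [kv for kv in first if kv in second]

counts = {"a": 9, "b": 5, "c": 5, "d": 1}
print(top(counts, 2))                                # b and c tie for second
print(top(counts, 2, strict=True))
print(both(top(counts, 10), at_least(counts, 4)))    # "top 10 and >= 4"
```

With these counts, top 2 returns three items because b and c tie, while top_strict 2 returns exactly two.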

UNIQUE DESTINATIONS

log_analysis has a relatively simple counting mechanism that is usually effective. One exception is when you want to track how often one value occurs in your log uniquely with another value. For example, suppose you're watching firewall logs, $1 is the source IP, $2 is the destination IP, and you want to know if you're being scanned. Tracking counts of $1 $2 requires you to manually count how many times $1 occurs. Tracking just $1 doesn't really tell you what you want, because you don't know if the source IP is really scanning a bunch of different hosts, or just has a renegade process that's banging away at a single destination. What you want to track is how many times $1 occurs with a unique $2.

To do this sort of thing in a pattern config, set format: to value1, value2 and set dest: to "UNIQUE category-name". In our example, we might say:

  format: $1, $2
  dest:   UNIQUE scans

The fields in format are not evaluated in a string context, and only the last comma acts as a separator. So, if CW$3 contains the protocol information, you might say this:

  format: sprintf("%-15s %s", $1, $3), $2
  dest:   UNIQUE scans
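The counting that UNIQUE performs can be modeled in a few lines of Python. This sketch only illustrates the idea (count distinct second values per first value). The function name and the sample addresses are made up, and it says nothing about how log_analysis stores its data internally.

```python
from collections import defaultdict

def unique_counts(pairs):
    """For each first value, count how many distinct second values it occurs with."""
    seen = defaultdict(set)
    for value1, value2 in pairs:
        seen[value1].add(value2)
    return {value1: len(values) for value1, values in seen.items()}

events = [
    ("192.0.2.7", "10.0.0.1"),     # one source touching three destinations
    ("192.0.2.7", "10.0.0.2"),
    ("192.0.2.7", "10.0.0.3"),
    ("198.51.100.4", "10.0.0.1"),  # one source hammering a single destination
    ("198.51.100.4", "10.0.0.1"),
    ("198.51.100.4", "10.0.0.1"),
]
print(unique_counts(events))
```

Both sources appear three times in the sample, but only the first one reaches three distinct destinations, which is the distinction a plain count of $1 cannot make.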

TAG SUBSTITUTIONS

A few items are subject to tag substitutions. These are kind of like printf's % sequences: a sequence like %n gets replaced with the nodename. You can optionally specify field widths, which default to right-justified (ie. %10n) or can be preceded with a - to make them left-justified (ie. %-10n). Also, a few of the basic C-style backslash sequences are understood (ie. \n for newline, \t for tab, \\ for backslash). Anything subject to tag substitutions will be listed as such.

Here are the standard tag sequences:

%% literal %
%n nodename (ie. the output of uname -n.)
%r OS release (ie. the output of uname -r.)
%s OS name (ie. the output of uname -s.)

There are also other tag sequences that apply in special situations. They are listed where they apply.

If you try to use an undefined sequence (ie. %Z or something else), you'll get an error.
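The rules above (optional width, - for left justification, %% for a literal percent, backslash sequences, and an error on unknown tags) can be sketched in Python as follows. This is an illustrative model, not the program's parser: the function names are made up, and the order in which backslash sequences and tags are processed is an assumption. The second format string is modeled on the style used by real_mode_output_format, with made-up values.

```python
import re

TAG = re.compile(r"%(-?\d+)?(.)")
ESCAPES = {"n": "\n", "t": "\t", "\\": "\\"}

def unescape(text):
    """Expand the basic backslash sequences; leave unknown ones alone."""
    return re.sub(r"\\(.)", lambda m: ESCAPES.get(m.group(1), m.group(0)), text)

def substitute(template, tags):
    """Replace %x sequences using the tags mapping (single character -> value)."""
    def expand(match):
        width, tag = match.group(1), match.group(2)
        if tag == "%":
            value = "%"
        elif tag in tags:
            value = str(tags[tag])
        else:
            raise ValueError("undefined tag sequence %" + tag)
        if width is None:
            return value
        n = int(width)
        # Negative width means left-justified, positive means right-justified.
        return value.ljust(-n) if n < 0 else value.rjust(n)
    return TAG.sub(expand, unescape(template))

print(substitute("[%10n] [%-10n] 100%%", {"n": "web1"}))
print(substitute(r"%c: (loghost %n, from host %h)\n%-10# %d\n\n",
                 {"c": "scans", "n": "loghost1", "h": "web1",
                  "#": 3, "d": "192.0.2.7"}))
```

Calling substitute with a tag that is not in the mapping, such as %Z, raises ValueError, mirroring the error described above.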

EXAMPLES

log_analysis -m root@whatever

Analyze yesterday's logs and mail the results to root@whatever. You might want to put this in a cronjob.

log_analysis -p5 -m root@whatever

Same as the last one, but PGP encrypt the logs using PGP 5 before mailing.

log_analysis -a

Look at all the logs, not just yesterday's.

log_analysis -sa /var/adm/sulog

Analyze all the contents of sulog. Don't run the local state commands.

log_analysis -san otherhost syslog-file

Analyze all the contents of syslog-file, which was created on otherhost. Don't run the local state commands.

log_analysis -sd1 -f foo.conf -U

Use foo.conf as a config file in addition to the internal config. Output only the unknowns.

This style of command is useful while developing local configs to handle log messages unknown to the internal config.

COMPATIBILITY

Written for Solaris and Linux. May work for other OSs.

Written for perl 5.00503. May work with some earlier perl versions.

NOTES

You often need to be root to read interesting log files.

It is customary to regularly roll over log files. Many log file formats don't include year information; among other benefits, rollover makes the dates in such logfiles unambiguous. log_analysis by default looks for log lines that match a particular day of the year, but does not even try to guess the year. If the OS you're using doesn't roll over some logfiles by default (ie. Solaris doesn't roll over /var/adm/wtmpx, /var/adm/wtmp, or /var/adm/sulog), you will need to roll over these files yourself to get valid output from this program.

On some OSes, '%' (ie. the percent symbol) has a special meaning in crontabs, and needs to be escaped (typically with a backslash). See crontab(1).

When there are a lot of unknowns, log_analysis can take a lot longer to run. This is particularly a problem when you're first running it, before you customize for your site. To get around this problem, if you send log_analysis a SIGINT (ie. if you hit control-C), it will stop going through your logs and immediately output the results.

FILES

/etc/log_analysis.conf
/etc/log_analysis.conf-%n
/etc/log_analysis.conf-%s-%r
/etc/log_analysis.conf-%s
/usr/etc/log_analysis.conf
/usr/etc/log_analysis.conf-%n
/usr/etc/log_analysis.conf-%s-%r
/usr/etc/log_analysis.conf-%s
Config files, in order of precedence. %n, %s, and %r have the usual tag substitution meanings; see TAG SUBSTITUTIONS.
/etc/log_analysis.d
/usr/etc/log_analysis.d
Plug-in directories. All files in these directories will be treated as config files and include'd.
$HOME/.log_analysis.conf
If you start log_analysis with the -g option, this file will be loaded as a config file after all other config files, except those specified by -f. This is also the default file for the save config menu option.
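To see which system-wide config file names apply on a given host, the %n, %s, and %r tags in the list above can be expanded by hand or with a short script. The Python sketch below builds the eight candidate names in the listed order. The function name is made up, and the use of platform.uname() as a stand-in for uname -n, -s, and -r is an assumption. It does not check which files exist or model how log_analysis applies precedence.

```python
import platform

def config_candidates(nodename, osname, osrelease):
    """Expand the config file names from the FILES list, in the listed order."""
    suffixes = [
        "",                              # log_analysis.conf
        "-" + nodename,                  # log_analysis.conf-%n
        "-" + osname + "-" + osrelease,  # log_analysis.conf-%s-%r
        "-" + osname,                    # log_analysis.conf-%s
    ]
    return [directory + "/log_analysis.conf" + suffix
            for directory in ("/etc", "/usr/etc")
            for suffix in suffixes]

info = platform.uname()
for path in config_candidates(info.node, info.system, info.release):
    print(path)
```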

AUTHOR

Mordechai T. Abzug <morty@frakir.org>

See Also