man lurkftp (Commands) - monitor and/or mirror FTP sites

NAME

lurkftp v1.00 - monitor and/or mirror FTP sites

SYNOPSIS

lurkftp [options] [site [dir] .. ] ..

DESCRIPTION

Lurkftp is the ultimate FTP site lurker and mirroring program. It monitors source directories for changes and either simply reports them or mirrors them into a destination directory.

Lurkftp in its most basic mode takes site/directory-list "pairs", as follows:

site1 /pub/dir1
site2 /pub/dir2a /pub/dir2b
site3 dir3

Pairs are separated by newlines (in an option file), by the end of an option file, or by the -+ option (e.g. "lurkftp site1 /pub/dir1 -+ site2 /pub/dir2").

Once all options are parsed, processing begins with the first pair and (by default) continues with subsequent pairs until all have been processed.

Default processing operates as follows:

•
The directory is read recursively from the source site.

•
If no directory was readable from the source site, the line `*** <site> <dirs>: <error> ***' is printed, and processing continues with the next site.

•
The results are compared with the `destination directory', which by default is the contents of a placeholder file normally named .chkls.<site><dirs>.gz, where each `/' in the directory list is replaced with `.' and each space with `_'. Thus, the default placeholder file for the "pair" `site1 /pub/dir1 /pub/dir2' is `.chkls.site1.pub.dir1_.pub.dir2.gz'.

•
If any changes occur, the line `--- <site> <dirs> ---' is printed, along with a list of changes, sorted by date, then name. Each change is preceded by a single character indicating the type of change. Additions and removals are preceded by `+' and `-', respectively. Other prefixes are described under the options that generate them.

•
If any changes occurred, the destination placeholder file is updated.

•
Processing then continues with the next site.
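The comparison step can be pictured with a short Python sketch. It is not lurkftp's code. A listing is assumed to be a dict mapping each path to a (date, size) tuple. A changed entry is treated as a removal of the old version plus an addition of the new one; this is an assumption, since this page only documents the `+' and `-' prefixes. The placeholder file would be rewritten only when the returned list is non-empty.

```python
import os


def diff_listings(source, dest):
    """Return report lines for entries that differ between two listings.

    source, dest: dicts mapping path -> (date "YYYYMMDD", size).
    Lines are sorted by date, then by file name (without path).
    """
    changes = []
    for path, info in source.items():
        if dest.get(path) != info:
            changes.append(("+", path, info))  # new (or changed) on the source site
    for path, info in dest.items():
        if source.get(path) != info:
            changes.append(("-", path, info))  # gone (or changed) on the source site
    changes.sort(key=lambda c: (c[2][0], os.path.basename(c[1])))
    return [f"{op} {path}" for op, path, _ in changes]
```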

OPTIONS

The entire pair list and/or each pair can be preceded by a list of options. Any options that precede a pair's -+ separator (explicit or implied) apply to that pair.

Note: for argument-less options, the `+' prefix reverses the normal meaning (e.g. +N ends a dependency group started by -N). Options that take an argument also accept `+', but their meaning does not change. The -+ (++), -h (+h), and -- (+-) options likewise mean the same with either prefix.

Percent Substitution

There are two types of %-substitution: outside and inside. Outside substitution is for site-wide items, such as file names and report headers. Inside substitution is for file-specific items, such as mirror pipes and report lines.

The following outside-substitutions are done:

%s
Site name

%d
Underscore-separated list of directories, with each `/' replaced by `.' and each space by `_'.

%p
Space-separated list of directories

%S
Alternate site

%D
Alternate %d-style list of directories

%P
Alternate %p-style list of directories

%t
Extra text; for headers/footers this is "totals" and for error messages this is the actual error message. Otherwise this is an empty string.

%%
The `%' character
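Here is a minimal Python sketch of outside substitution. The function name and parameters are hypothetical, not lurkftp's interface. The tests also show how the default placeholder name .chkls.%s%d.gz expands for the pair used earlier.

```python
def _dir_list(dirs):
    # %d/%D style: each `/' becomes `.', and the separating spaces become `_'.
    return " ".join(dirs).replace("/", ".").replace(" ", "_")


def outside_subst(fmt, site, dirs, alt_site="", alt_dirs=(), text=""):
    """Expand the outside %-escapes in fmt (a sketch; names are hypothetical)."""
    table = {
        "s": site, "d": _dir_list(dirs), "p": " ".join(dirs),
        "S": alt_site, "D": _dir_list(alt_dirs), "P": " ".join(alt_dirs),
        "t": text, "%": "%",
    }
    out, i = [], 0
    while i < len(fmt):
        if fmt[i] == "%" and i + 1 < len(fmt):
            out.append(table[fmt[i + 1]])  # unknown escapes raise KeyError
            i += 2
        else:
            out.append(fmt[i])  # ordinary character (or a trailing lone `%')
            i += 1
    return "".join(out)
```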

The following inside-substitutions are done:

%f
File name without path

%L
The link name (if appropriate)

%r
Full directory path to remote file (without file name; with trailing `/')

%l
Full directory path to local file (without file name; with trailing `/')

%s
Site name

%b
Size (in bytes)

%m
Mode (full mode, including file type)

%d
Modification date (YYYYMMDD)

%t
Modification time (HHMM)

%{<format>}
Modification date, formatted by strftime(3) using <format>

%D
The device major number (device nodes only)

%M
The device minor number (device nodes only)

%T
The type of operation ('+', '-', etc.)

%[conditional_text]
Add %-substituted text conditionally. The format of the conditional text is: <condition> [ ? <true_text> ] [ : <false_text> ]. Note that either the ? or the : or both must be present. The <condition> is evaluated, and either the <true_text> or the <false_text> is evaluated and inserted as appropriate. The following conditions are available:
B[=|>|<][size]
Check byte-size of file. If no comparison operator is given, > is implied. If no size is given, 0 is implied. The size may be given in bytes (no suffix), or in Kilobytes, Blocks, or Megabytes by appending the appropriate capital-letter suffix.

l
True if directory entry is a soft link.

f
True if directory entry is a regular file.

d
True if directory entry is a directory.

s
True if directory entry is a socket.

b
True if directory entry is a block-special file.

c
True if directory entry is a character-special file.

p
True if directory entry is a named pipe.

t
True if sticky bit is set for directory entry.

[ugo]rwxS
Check permissions as specified by the given pattern, where S stands for setuid/setgid. The permissions are ANDed with the given mask (if no ugo is given, all three are implied), and true is returned if any bit remains set.

T<type>
True if type of operation is equal to given <type> character.

%%
The `%' character
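The following rough Python sketch of inside substitution shows how the simple escapes and the %[...] conditionals interact. Everything in it is illustrative:

- The entry is a hypothetical dict with the keys name, link, rdir, ldir, site, size, mode (the full st_mode), mtime and op.
- Dates are rendered in UTC, and %m is printed in octal.
- K and M are taken as 1024 and 1024*1024 bytes. The Blocks suffix is left out because its size is not stated here.
- %D, %M, and field widths such as the %12b in the default report line are not handled.

```python
import re
import stat
import time

# Hypothetical entry fields: name, link, rdir, ldir, site, size, mode, mtime, op.
TYPE_TESTS = {"l": stat.S_ISLNK, "f": stat.S_ISREG, "d": stat.S_ISDIR,
              "s": stat.S_ISSOCK, "b": stat.S_ISBLK, "c": stat.S_ISCHR,
              "p": stat.S_ISFIFO}
UNITS = {"": 1, "K": 1024, "M": 1024 * 1024}  # assumed multipliers; Blocks omitted


def _stamp(e, pattern):
    # Dates are rendered in UTC here; that choice is an assumption.
    return time.strftime(pattern, time.gmtime(e["mtime"]))


SIMPLE = {
    "f": lambda e: e["name"], "L": lambda e: e.get("link", ""),
    "r": lambda e: e["rdir"], "l": lambda e: e["ldir"],
    "s": lambda e: e["site"], "b": lambda e: str(e["size"]),
    "m": lambda e: format(e["mode"], "o"),  # octal output is an assumption
    "d": lambda e: _stamp(e, "%Y%m%d"), "t": lambda e: _stamp(e, "%H%M"),
    "T": lambda e: e["op"], "%": lambda e: "%",
}


def _perm_bit(who, perm):
    if perm == "S":  # setuid for u, setgid for g, nothing for o
        return {"u": stat.S_ISUID, "g": stat.S_ISGID}.get(who, 0)
    return {"r": 4, "w": 2, "x": 1}[perm] << {"u": 6, "g": 3, "o": 0}[who]


def eval_condition(cond, e):
    if cond == "t":
        return bool(e["mode"] & stat.S_ISVTX)
    if cond in TYPE_TESTS:
        return TYPE_TESTS[cond](e["mode"])
    if cond.startswith("T"):
        return e["op"] == cond[1:]
    m = re.fullmatch(r"B([=<>]?)(\d*)([KM]?)", cond)
    if m:  # B[=|>|<][size]: `>' and 0 are the defaults
        limit = int(m.group(2) or 0) * UNITS[m.group(3)]
        size = e["size"]
        return {"=": size == limit, "<": size < limit,
                ">": size > limit}[m.group(1) or ">"]
    m = re.fullmatch(r"([ugo]*)([rwxS]+)", cond)
    if m:  # permission pattern; no ugo means all three
        mask = 0
        for who in m.group(1) or "ugo":
            for perm in m.group(2):
                mask |= _perm_bit(who, perm)
        return bool(e["mode"] & mask)
    raise ValueError(f"unknown condition {cond!r}")


def _split_top(s, seps):
    """Split s at the first separator that is not inside a nested [...]."""
    depth = 0
    for i, ch in enumerate(s):
        if ch in seps and depth == 0:
            return s[:i], ch, s[i + 1:]
        depth += {"[": 1, "]": -1}.get(ch, 0)
    return s, "", ""


def expand(fmt, e):
    out, i = [], 0
    while i < len(fmt):
        if fmt[i] != "%" or i + 1 == len(fmt):
            out.append(fmt[i])
            i += 1
            continue
        key = fmt[i + 1]
        if key == "[":  # %[cond ? true : false]
            body, close, _ = _split_top(fmt[i + 2:], "]")
            if not close:
                raise ValueError("unterminated %[")
            cond, sep, rest = _split_top(body, "?:")
            true_text, _, false_text = (_split_top(rest, ":") if sep == "?"
                                        else ("", "", rest))
            out.append(expand(true_text if eval_condition(cond, e) else false_text, e))
            i += 2 + len(body) + 1
        elif key == "{":  # %{strftime format}
            end = fmt.index("}", i + 2)
            out.append(_stamp(e, fmt[i + 2:end]))
            i = end + 1
        else:
            out.append(SIMPLE[key](e))
            i += 2
    return "".join(out)
```

For example, for a regular file the template `%T %r%f%[l? -> %L]' expands to `+ /pub/Linux/foo.lsm'. For a symbolic link, ` -> ' and the link target are appended as well.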

Generic Directory Specification

All FTP sites may be specified as follows: [ [ <user> ] [ ,<acct> ] [ :<pass> ] @ ] [ <host> ] [ ,<port> ] [ :<dir> ]. The -o and -O options take generic directory specs as arguments; each spec is a one-letter type followed by its argument, as follows:

l<ls-lR file>
This specifies a local file with a format parsable by the listing parser. The file name is processed by outside %-substitution.

c<command>
This specifies a command to run which will generate output parsable by the listing parser. This is usually an ls(1) or a find(1) command. The command is processed by outside %-substitution.

m<lsfile>
This specifies a lurkftp-generated placeholder file. The file name is processed by outside %-substitution.

d<localdir>
This specifies a local directory to recursively read for directory entries. Multiple directories may be specified.

f<ftpsite>
This specifies a site+dir to recursively read for directory entries. Multiple directories may be specified.

L<ftpsite>
This specifies a remote file in a format parsable by the listing parser to retrieve and use for the directory.
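The following Python sketch splits a generic directory spec into its type letter and argument. For the f and L types, it also parses the FTP site syntax given above. The names are hypothetical.

The login part is split at the last `@'. This assumes the password may itself contain an `@', as the default <myusername>@ does. This page does not describe how multiple directories are written, so only a single :<dir> is handled.

```python
from typing import NamedTuple, Optional


class FtpSite(NamedTuple):
    user: Optional[str]
    acct: Optional[str]
    password: Optional[str]
    host: Optional[str]
    port: Optional[int]
    directory: Optional[str]


def parse_ftpsite(spec):
    """[[user][,acct][:pass]@][host][,port][:dir] -> FtpSite (missing parts are None)."""
    # Split at the LAST `@', since a password such as "me@" may contain one.
    login, at, rest = spec.rpartition("@")
    user = acct = password = None
    if at:
        userpart, colon, pw = login.partition(":")
        name, comma, ac = userpart.partition(",")
        user = name or None
        acct = ac if comma else None
        password = pw if colon else None
    hostport, colon, directory = rest.partition(":")
    host, comma, port = hostport.partition(",")
    return FtpSite(user, acct, password, host or None,
                   int(port) if comma else None,
                   directory if colon else None)


SPEC_TYPES = "lcmdfL"  # the one-letter generic directory spec types


def parse_dirspec(spec):
    """Split a -o/-O argument into (type, argument), parsing FTP sites for f and L."""
    kind, arg = spec[:1], spec[1:]
    if not kind or kind not in SPEC_TYPES:
        raise ValueError(f"unknown dirspec type in {spec!r}")
    return kind, (parse_ftpsite(arg) if kind in "fL" else arg)
```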

General Options

-B
Run in background: close stdin/stdout/stderr, fork, and dissociate from parent process group. Lurkftp should return immediately to the invoking process.

-F <filename>
Read an option file (immediately). In option files, blank lines and anything on a line after a `#' are ignored. An implicit `-+' option (i.e. site/dir pair separator) is generated at the end of any line containing a site and/or directory name. Single and double quotes, the `\' character, and the `~' character in option files are handled as in csh(1). Environment variables ($<name> or ${<name>}) are also expanded when not escaped by single quotes or backslash.

-P
Process in parallel by calling fork(2) before processing each site.

-N
Indicate that subsequent operations depend on their predecessor. That is, forks will not separate these operations, and failure in one operation will terminate all subsequent dependent operations. There may be multiple dependency groups.

-z <prog>
Program to filter all ls files through when writing (default: gzip). Setting this to an empty string disables output filtering.

-Z <prog>
Program through which ls files or remote listings are filtered if the first character of the file in question is non-printable according to isprint(3) (default: gunzip). Setting this to an empty string resets it to the default.

-v <mask>
Set debug mask to <mask>. Masks greater than 0 produce some lurkftp trace messages.

--
Next argument is literal. Note that this differs from getopt(3) in that it literalizes only the next argument, not all remaining arguments.

-+
Separate multiple site/dir groups.

-h
Print help message and exit.
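The option-file rules given for -F can be sketched in Python. The sketch uses the standard shlex module for csh-like quoting and `#' comments, and appends the implicit -+ after any line that names a site or directory. It is not lurkftp's parser:

- The set of argument-taking options is read off this page.
- Each option is assumed to be a separate word.
- $variable and ~ expansion are not performed.

```python
import shlex

# Options that take an argument, read off the option list in this page.
ARG_OPTS = set("FzZvRroOpbLdelftgiIxXT")


def parse_option_file(text):
    """Turn option-file text into an argv-style list (sketch only)."""
    argv = []
    for line in text.splitlines():
        # shlex gives csh-like quoting/backslashes and drops '#' comments.
        tokens = shlex.split(line, comments=True)
        has_operand = False
        i = 0
        while i < len(tokens):
            tok = tokens[i]
            if tok in ("--", "+-") and i + 1 < len(tokens):
                # `--' makes the following word literal (a site or dir).
                argv += tokens[i:i + 2]
                has_operand = True
                i += 2
            elif len(tok) == 2 and tok[0] in "-+":
                # An option, plus its argument if it takes one.
                width = 2 if tok[1] in ARG_OPTS else 1
                argv += tokens[i:i + width]
                i += width
            else:
                argv.append(tok)  # a site or directory name
                has_operand = True
                i += 1
        if has_operand:
            argv.append("-+")  # implicit pair separator at end of line
    return argv
```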

Reporting Options

-q
Suppress change report

-R <command>
If a report is generated, pipe it to <command>; otherwise, <command> is not invoked.

-r <type><string>
This option sets various report-related strings.
Type
Function

t
Sets the report's title string. Outside %-substitution is performed on this string. The default is `--- %s %d ---'.

d
Sets the report's directory-entry line. Inside %-substitution is performed on this string. The default is `%T %d %12b %r%f%[l? -> %L]' if mirroring is turned on, and is the same, but surrounded by the conditional `%[T<T>: ... ]' when mirroring is disabled so that moves are not reported.

f
Sets the report's footer string. Outside %-substitution is performed on this string. The default is `%t'.

s
Sets the report's sort string. The sort string consists of at most 8 comparison specifiers; two entries are compared using each specifier in turn, in the order given, until one of them finds a difference. The default is `fdpnlst'. The following comparison specifiers can be used, as well as their reverse-order versions (the same letter, capitalized):
f
Sort numerically by file type.

m
Sort numerically by mode (other than file type).

p
Sort alphabetically by file's path.

n
Sort alphabetically by file's name.

l
Sort alphabetically by link name, if present.

d
Sort by date (ymd) if entry is a file.

t
Sort by time (hm) if entry is a file.

s
Sort by size (in bytes) if entry is a file.

T
Sets the error report's title string. Outside %-substitution is performed on this string. The default is `\n*** ERRORS IN %S %P -> %s %p MIRROR ***'.

D
Sets the error report's directory-entry line. Inside %-substitution is performed on this string.

F
Sets the error report's footer string. Outside %-substitution is performed on this string.

S
Sets the error report's sort string. The sort string is in the same format as that used by the standard report. The default is `PNLFDST'.

e
Sets the format for general error messages. Outside %-substitution is performed on this string.
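The sort-string semantics can be sketched in Python with functools.cmp_to_key. Each specifier is tried in turn, and a capital letter negates that comparison. Entries are hypothetical dicts whose mode holds the full st_mode. The d, t, and s specifiers are skipped unless both entries are regular files, which is one reading of "if entry is a file".

```python
import stat
from functools import cmp_to_key

# Field extractors for each (lowercase) comparison specifier.
FIELDS = {
    "f": lambda e: stat.S_IFMT(e["mode"]),   # file type, numerically
    "m": lambda e: stat.S_IMODE(e["mode"]),  # mode other than file type
    "p": lambda e: e["path"],
    "n": lambda e: e["name"],
    "l": lambda e: e.get("link", ""),
    "d": lambda e: e["date"],                # "YYYYMMDD" sorts correctly as text
    "t": lambda e: e["time"],                # "HHMM"
    "s": lambda e: e["size"],
}
FILE_ONLY = set("dts")


def make_comparator(sort_string):
    specs = sort_string[:8]  # at most 8 specifiers

    def compare(a, b):
        both_files = stat.S_ISREG(a["mode"]) and stat.S_ISREG(b["mode"])
        for spec in specs:
            key = spec.lower()
            if key in FILE_ONLY and not both_files:
                continue  # assumption: date/time/size only compared between files
            x, y = FIELDS[key](a), FIELDS[key](b)
            result = (x > y) - (x < y)
            if spec.isupper():
                result = -result  # capital letter: reverse order
            if result:
                return result
        return 0

    return compare


def sort_entries(entries, sort_string="fdpnlst"):
    return sorted(entries, key=cmp_to_key(make_comparator(sort_string)))
```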

Site/Directory Specification Options

-o <dirspec>
Set generic source directory.

-O <dirspec>
Set generic destination directory.

-p <password>
Set default FTP login password (default: <myusername>@)

-b <base>
Set the default name (formerly just the base name) of placeholder files (default: .chkls.%s%d.gz). Outside %-substitution is performed on the name.

-L <rname>
Use file <rname> on remote site instead of performing remote directory listing. Note: this option overrides the -f option below. This option only affects the next site/dir pair.

-U
Detect unchanged (i.e. moved) files. If two regular files have the same date, size, and name, but are located in different directories, then they are processed as moved. When no mirror directory or pipe is defined, moved files are not reported; otherwise they are reported with `<' and `>' for the original and new location, respectively, and the file is not retrieved from the remote site, but either ignored (if pipes are enabled) or moved as if by mv(1) if mirroring to a directory is enabled. An error in moving will be reported by the characters `(' and `)'.

-M
Force "manual" recursion when retrieving remote listing by using LIST -la or LIST (depending on which works) on each directory and issuing a CWD command to enter subdirectories. This mode is invoked automatically if the default LIST -lRa command fails for any reason (usually because the -lRa options aren't supported by the remote FTP daemon). This is especially useful if specific directories are to be filtered out, as the recursion routine will match the name of the directory to be entered (with a trailing `/') against the exclude filter before recursing.
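The move detection performed by -U can be sketched in Python. It matches removed and added entries on (date, size, file name) and requires their directories to differ. The details are assumptions:

- The dict shape is hypothetical.
- Both lists are assumed to hold regular files only.
- Among several identical candidates, the choice of which one gets paired is arbitrary here.

```python
import os


def detect_moves(added, removed):
    """Pair added and removed regular files that look like moves.

    added/removed: lists of dicts with "path", "date" and "size" keys
    (a hypothetical shape). Returns (moves, still_added, still_removed).
    """
    def key(e):
        return (e["date"], e["size"], os.path.basename(e["path"]))

    pool = {}
    for e in removed:
        pool.setdefault(key(e), []).append(e)

    moves, still_added = [], []
    for e in added:
        candidates = [old for old in pool.get(key(e), [])
                      if os.path.dirname(old["path"]) != os.path.dirname(e["path"])]
        if candidates:
            old = candidates[0]
            pool[key(e)].remove(old)
            moves.append((old["path"], e["path"]))  # reported as `<' old, `>' new
        else:
            still_added.append(e)

    still_removed = [e for group in pool.values() for e in group]
    return moves, still_added, still_removed
```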

Mirroring Options

-m
Perform mirroring when applicable; requires the -d, -e, and/or -t options. When this option is off, reports are still made, so omitting it shows what a mirroring operation would do. Beware, however: placeholder (list) files are still updated, so pseudo-directory tricks that mirror-pipe specific files will be recorded as complete successes (e.g. the sunsite .lsm trick used in the example cannot be harmlessly tested before running it for real). Any failure to download a file (and, with the -e option, to complete the pipe successfully) is reported by an entry preceded by `*', and any failure to remove a file by an entry preceded by `#'.

-d <ldir>
Set local directory to <ldir> and read it instead of a placeholder file. This option only applies to the next site/dir pair.

-e <cmd>
Don't update the local directory when mirroring; instead, pipe each new file into <cmd>. This option only applies to the next site/dir pair. The -l and -f options are usually useful together with this option. The local directory (-d) is only needed if the %l %-escape is used. Inside %-substitution is performed on <cmd>.

-l <file>
Read and update placeholder <file> instead of using contents of local directory. This option only affects the next site/dir pair. The same %-substitutions are done as for the -b option.

-f <file>
Read placeholder <file> instead of retrieving remote directory. This option only affects the next site/dir pair. The same %-substitutions are done as for the -b option. This option overrides the -L option above.

-E
Make an "exact" comparison: fix modes to match the remote site. Entries whose only change is their mode are marked with `M' in the report. Failure to perform the change is reported by preceding the entry with `$'.

-n
Make no file transfers, moves, or deletions; just update date stamps [and modes if the -E option is active].

-A
Attempt to append to files which increase in size instead of downloading the entire file. This is useful in cases where a directory of log files which always increase in size is to be mirrored.

-t <site>
Mirror source files to remote directory.

-c
Force source files to be from local directory.

-g <pipe>
Get source files by executing <pipe>. Inside %-substitution is done on <pipe>.
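The decision -A makes can be sketched in Python. plan_transfer() is a hypothetical helper that resumes at the local file's size when the remote copy has grown. fetch() shows how the resume might be done with Python's ftplib, whose retrbinary() accepts a rest offset that it sends as an FTP REST command. As the -A description implies, the sketch assumes the existing local bytes are an unchanged prefix of the remote file.

```python
import ftplib


def plan_transfer(local_size, remote_size, append=True):
    """Return ("append", offset) or ("full", 0) for a file to be fetched."""
    # Append only when -A is on and the remote file has grown past a
    # non-empty local copy; anything else is fetched from the start.
    if append and local_size and local_size < remote_size:
        return "append", local_size
    return "full", 0


def fetch(ftp: ftplib.FTP, name, local_path, local_size, remote_size, append=True):
    """Download name into local_path, resuming when plan_transfer() allows."""
    mode, offset = plan_transfer(local_size, remote_size, append)
    # ftplib sends REST <offset> before RETR when rest is not None.
    with open(local_path, "ab" if mode == "append" else "wb") as out:
        ftp.retrbinary(f"RETR {name}", out.write, rest=offset if offset else None)
```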

Filtering Options

Note: Only one include filter and/or one exclude filter can be specified. The include filter is run first, and then the exclude filter. Passing the null string to the -i or -x options removes the associated filter.

-i <regex>
Include only files that match the extended regular expression <regex>.

-I <file>
Include only files that match the extended regular expression contained in <file>. Newlines in <file> are converted to `|'.

-x <regex>
Exclude any files that match the extended regular expression <regex>.

-X <file>
Exclude any files that match the extended regular expression contained in <file>. Newlines in <file> are converted to `|'.

-D
Filter out directories. Note that in order to handle automatic directory processing properly, mirrors that use -f to read placeholder files that were generated with this option should also have this option in effect.

-s
Don't filter out specials (device nodes, pipes, and sockets); normally they are filtered out. Note that when mirroring, device nodes and pipes are created, but sockets aren't.
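A Python sketch of the include-then-exclude order follows. It makes these assumptions:

- Python's re module stands in for regexec(3) extended regular expressions; the two dialects are close but not identical.
- Patterns are matched against the full path, as the examples suggest.
- Empty lines are dropped when joining a -I/-X file, so that a trailing newline does not add an empty alternative that matches everything.

The descend() helper mirrors the -M rule of matching a directory name, with a trailing `/', against the exclude filter only.

```python
import re


def regex_from_file(text):
    """-I/-X: join the file's lines with `|' (empty lines dropped, an assumption)."""
    return "|".join(line for line in text.splitlines() if line)


def make_filter(include="", exclude=""):
    """Return (keep, descend) predicates; "" means no filter, as with -i ""/-x ""."""
    inc = re.compile(include) if include else None
    exc = re.compile(exclude) if exclude else None

    def keep(path):
        # The include filter runs first, then the exclude filter.
        if inc and not inc.search(path):
            return False
        return not (exc and exc.search(path))

    def descend(dirpath):
        # -M: a directory is entered only if "<dir>/" escapes the exclude filter.
        return not (exc and exc.search(dirpath + "/"))

    return keep, descend
```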

Timeout Options

Note that all timeout options use the same base option, -T. All timeout options can be specified with the same parameter string by concatenating desired timeouts. Also, any timeout set to zero is disabled completely.

-T c<seconds>
Initial connection and login timeout (default: 20)

-T t<seconds per K>
Timeout for file and directory transfers (default: 10)

-T o<seconds>
Timeout for simple commands (cd, pwd, etc.)

-T q<seconds>
Timeout for quit command and logout (default: 5)

-T r<count>
Number of times to retry list and/or file retrievals before giving up (default: 10)

-T d<seconds>
Amount of time to wait between retries (default: 10)
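Several timeouts can be packed into one -T argument, so parsing amounts to splitting the string into letter/number pairs. This Python sketch uses the defaults listed above; no default is given for o, so it is left as None. It rejects anything it does not recognize. The function name is hypothetical.

```python
import re

# Defaults from the list above; no default is documented for `o'.
DEFAULTS = {"c": 20, "t": 10, "o": None, "q": 5, "r": 10, "d": 10}


def parse_timeouts(arg, current=None):
    """Parse a -T argument such as "c30t20r5" into a settings dict."""
    settings = dict(DEFAULTS if current is None else current)
    pos = 0
    for m in re.finditer(r"([ctoqrd])(\d+)", arg):
        if m.start() != pos:  # something unrecognized before this pair
            break
        settings[m.group(1)] = int(m.group(2))  # per the note above, 0 disables it
        pos = m.end()
    if pos != len(arg):
        raise ValueError(f"bad -T argument near {arg[pos:]!r}")
    return settings
```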

EXAMPLES

Command lines

# Look for new versions of X for Linux & mail report to me
lurkftp -i Linux ftp.xfree86.org /pub/XFree86 -F .mailme

# .mailme is a file containing: -R 'mail -s "lurkftp output" dark'

# Mirror a single directory with reschedule;
# at will mail me the report.
atcron "2:00 tomorrow" lurkftp -m -d /net/ftp/rplay \
ftp.sdsu.edu /pub/rplay

# Mirror slackware disk set via sz into /usr/local/sw
# Not recommended if no auto-download in local comm program
lurkftp -d /usr/local/sw -l .sw.gz -e "ONAME=%l%f sz -" \
ftp.cdrom.com /pub/linux/slackware/slakware -F .mailme

# Do main lurking; see config files below
lurkftp -F .chksites

Contents of .chksites

# An extract from my command file
# no multiple entries from same site, so simplify name
-b .chkls.%s.gz
-R 'mail -s "LurkFTP Output" dark' # mail reports to me
-D # Don't care about changes in directories
-U # ignore moves
-P # fork away!
-X .chkfilt.sunsite # special filter for sunsite
sunsite.unc.edu /pub/Linux # fetch master list
-N # .lsm stuff depends on sunsite
# mail new .lsm's to me
-i '.*\.lsm$' -x /Incoming/ # include lsm's not in Incoming dir
-f .chkls.%s.gz # Read remote site from previously generated listing
#Note: the following file was primed so that old .lsms wouldn't
#be sent.  This was done by *not* using -m.  It could've also
# been primed by using the command:
# zgrep .lsm .chkls.sunsite.gz | gzip >.chkls.lsms.gz
-l .chkls.lsms.gz # Keep track of sent .lsm's in this file
-m -e 'mail -s "lurkftp: %f" dark' # mirror through pipe
sunsite.unc.edu /pub/Linux # same site/dir as above
-i "" # reset include filter
+N # No more dependencies
-X .chkfilt # filter for everyone else
tsx-11.mit.edu /pub/linux/680x0 /pub/linux/packages/GCC
ftp.kernel.org /pub/linux/kernel
# etc.

Contents of .chkfilt

INDEX.whole
INDEX.short
ls-lR
/INDEX(|.html|.old)$
00-find-ls(|.gz)$

Contents of .chkfilt.sunsite

/README$
/distributions/
/!INDEX
/archive-current/
linux-announce.archive
INDEX.whole
INDEX.short
ls-lR
/INDEX(|.html|.old)$
00-find-ls(|.gz)$

SEE ALSO

regexec(3), gzip(1), ftp(1), mail(1), at(1), mirror(1L).

BUGS

[+: may want to fix; *: definitely want to fix; -: may never fix]

* Doesn't handle non-UNIX remote sites [I know of none any more]
+ Some fixed-size buffers may overflow
- Groups & user names aren't mirrored
- Sockets aren't mirrored
- Exact time isn't used for comparison (only accurate to what ls gives)
- All options in external program option group are obsolete
+ Few options are really range-checked
* Probably plenty of nasty hidden bugs

DIAGNOSTICS

Failed transfers are marked in the report. Specific errors are printed to stderr. Debugging messages and some error messages are only printed when the debug level (as set by the -v option) is greater than 0.

AUTHOR

Thomas J. Moore, dark@mama.indstate.edu