NAME
Getopt::Declare - Declaratively Expressed Command-Line Arguments via Regular Expressions
VERSION
This document describes version 1.08 of Getopt::Declare, released May 21, 1999.
SYNOPSIS
use Getopt::Declare;
$args = new Getopt::Declare ($specification_string, $optional_source);
DESCRIPTION
Overview
Getopt::Declare is yet another command-line argument parser, one which is specifically designed to be powerful but exceptionally easy to use.
To parse the command-line in @ARGV, one simply creates a Getopt::Declare object, by passing Getopt::Declare::new() a specification of the various parameters that may be encountered:
$args = new Getopt::Declare($specification);
The specification is a single string such as this:
$specification = q(
    -a                Process all data

    -b <N:n>          Set mean byte length threshold to <N>
                          { $::bytelen = $N; }

    +c <FILE>         Create new file <FILE>

    --del             Delete old file
                          { delold() }

    delete            [ditto]

    e <H:i>x<W:i>     Expand image to height <H> and width <W>
                          { expand($H,$W); }

    -F <file>...      Process named file(s)
                          { defer {for (@file) {process()}} }

    =getrand [<N>]    Get a random number
                      (or, optionally, <N> of them)
                          { $N = 1 unless defined $N; }

    --                Traditionally indicates end of arguments
                          { finish }
);
in which the syntax of each parameter is declared, along with a description and (optionally) one or more actions to be performed when the parameter is encountered. The specification string may also include other usage formatting information (such as group headings or separators) as well as standard Perl comments (which are ignored).
Calling Getopt::Declare::new() parses the contents of the array @ARGV, extracting any arguments which match the parameters defined in the specification string, and storing the parsed values as hash elements within the new Getopt::Declare object being created.
Other features of the Getopt::Declare package include:
- The use of full Perl regular expressions to constrain matching of parameter components.
- Automatic generation of error, usage and version information.
- Optional conditional execution of embedded actions (i.e. only on successful parsing of the entire command-line).
- Strict or non-strict parsing (unrecognized command-line elements may either trigger an error or may simply be left in @ARGV).
- Declarative specification of various inter-parameter relationships (for example, two parameters may be declared mutually exclusive and this relationship will then be automatically enforced).
- Intelligent clustering of adjacent flags (for example: the command-line sequence "-a -b -c" may be abbreviated to "-abc", unless there is also a -abc flag declared).
- Selective or global case-insensitivity of parameters.
- The ability to parse files (especially configuration files) instead of the command-line.
Terminology
The terminology of command-line processing is often confusing, with various terms (such as argument, parameter, option, flag, etc.) frequently being used interchangeably and inconsistently in the various Getopt:: packages available. In this documentation, the following terms are used consistently:
command-line
    The space-separated concatenation of the elements of the array @ARGV at the time a Getopt::Declare object is created.
parameter
    A specification of a single entity which may appear in the command-line. It always includes at least one syntax for the entity, and may optionally include other (variant) syntaxes, one or more descriptions of the entity, and/or actions to be performed when the entity is encountered. For example, the following is a single parameter specification (with two variants):

        --window <height> x <width>   Set window to <height> by <width>
                                          { setwin($width,$height); }

        --window <h>x<w>@<x>,<y>      Set window size and centroid
                                          { setwin($w,$h,$x,$y); }

argument
    A substring of the command-line which matches a single parameter variant. Unlike some other Getopt:: packages, in Getopt::Declare an argument may be a single element of @ARGV, or part of a single @ARGV element, or the concatenation of several adjacent @ARGV elements.
parameter definition
    A specification of one actual syntax variant matched by a parameter. It always consists of a leading parameter flag or parameter variable, optionally followed by one or more parameter components (that is, other parameter variables or punctuators). In the above example, --window <height> x <width> is a parameter definition.
parameter flag
    A sequence of non-space characters which introduces a parameter. Traditionally a parameter flag begins with - or --, but Getopt::Declare allows almost any sequence of characters to be used as a flag. In the above example, --window is the parameter flag.
parameter variable
    A place-holder (within a parameter specification) for a value that will appear in any argument matching that parameter. In the above example, <height>, <width>, <h>, <w>, <x>, and <y> are all parameter variables.
parameter punctuator
    A literal sequence of characters (within a parameter specification) which will appear in any argument matching that parameter. In the above example, the literals x and @ are punctuators.
parameter description
    A textual description of the purpose and/or use of a particular variant of a parameter. In the above example, the string:

        Set window to <height> by <width>

    is a parameter description.
parameter action
    A block of Perl code to be executed in response to encountering a specific parameter. In the above example:

        { setwin($width,$height); }

    is a parameter action.
parameter variant
    One or more different syntaxes for a single parameter, all sharing the same leading flag, but having different trailing parameter variables and/or punctuators. Getopt::Declare considers all parameter definitions with the same leading flag to be merely variant forms of a single underlying parameter. The above example shows two parameter variants for the --window parameter.
Parameter definitions
As indicated above, a parameter specification consists of three parts: the parameter definition, a textual description, and any actions to be performed when the parameter is matched.
The parameter definition consists of a leading flag or parameter variable, followed by any number of parameter variables or punctuators, optionally separated by spaces. The parameter definition is terminated by one or more tabs (at least one trailing tab must be present).
For example, all of the following are valid Getopt::Declare parameter definitions:
    -v
    in=<infile>
    +range <from>..<to>
    --lines <start> - <stop>
    ignore bad lines
    <outfile>
Note that each of the above examples has at least one trailing tab (even if you can't see them). Note too that this hodge-podge of parameter styles is certainly not recommended within a single program, but is shown so as to illustrate some of the range of parameter syntax conventions Getopt::Declare supports.
The spaces between components of the parameter definition are optional but significant, both in the definition itself and in the arguments that the definition may match. If there is no space between components in the specification, then no space is allowed between corresponding arguments on the command-line. If there is space between components of the specification, then space between those components is optional on the command-line.
For example, the --lines parameter above matches:
    --lines1-10
    --lines 1-10
    --lines 1 -10
    --lines 1 - 10
    --lines1- 10
If it were instead specified as:
--lines <start>-<stop>
then it would match only:
    --lines1-10
    --lines 1-10
Note that the optional nature of spaces in parameter specification implies that flags and punctuators cannot contain the character '<' (which is taken as the delimiter for a parameter variable) nor the character '[' (which introduces an optional parameter component - see Optional parameter components).
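The spacing rule can be illustrated with a small regex translation. This is a simplified sketch, not Getopt::Declare's parser. The helper `definition_to_regex` is hypothetical, every `<var>` is assumed to match an integer, and each string is matched as a single argument rather than against the whole command line. In the generated regex, a space in the definition becomes `\s*`, so the space is optional. Components written side by side stay adjacent, so no space is allowed between them.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT Getopt::Declare's real code: translate a
# simplified parameter definition into an anchored regex. A run of spaces
# in the definition becomes \s* (space optional on the command line);
# components written without a space stay directly adjacent (no space
# allowed). For brevity every <var> is assumed to match an integer.
sub definition_to_regex {
    my ($def) = @_;
    my $re = '';
    for my $tok ($def =~ /(<\w+>|\s+|[^<\s]+)/g) {
        if    ($tok =~ /\A\s+\z/)     { $re .= '\s*' }
        elsif ($tok =~ /\A<(\w+)>\z/) { $re .= "(?<$1>\\d+)" }
        else                          { $re .= quotemeta $tok }
    }
    return qr/\A$re\z/;
}

my $spaced = definition_to_regex('--lines <start> - <stop>');
my $tight  = definition_to_regex('--lines <start>-<stop>');
for my $arg ('--lines1-10', '--lines 1-10', '--lines 1 -10', '--lines 1 - 10') {
    printf "%-15s spaced: %-3s tight: %s\n", $arg,
        ($arg =~ $spaced ? 'yes' : 'no'), ($arg =~ $tight ? 'yes' : 'no');
}
```

The spaced definition accepts all four arguments. The tight definition accepts only the first two.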
Types of parameter variables
By default, a parameter variable will match a single blank-terminated or comma-delimited string. For example, the parameter:
-val <value>
would match any of the following arguments:
    -value              # <value> <- "ue"
    -val abcd           # <value> <- "abcd"
    -val 1234           # <value> <- "1234"
    -val "a value"      # <value> <- "a value"
It is also possible to restrict the types of values which may be matched by a given parameter variable. For example:
    -limit <threshold:n>    Set threshold to some (real) value
    -count <N:i>            Set count to <N> (must be an integer)
If a parameter variable is suffixed with :n, it will match any reasonable numeric value, whilst the :i suffix restricts a parameter variable to only matching integer values. These two type specifiers are the simplest examples of a much more powerful mechanism, which allows parameter variables to be restricted to matching any specific regular expression. See Defining new parameter variable types.
Parameter variables are treated as scalars by default, but this too can be altered. Any parameter variable immediately followed by an ellipsis (...) is treated as a list variable, and matches its specified type sequentially as many times as possible. For example, the parameter specification:
-pages <page:i>...
would match either of the following arguments:
    -pages 1
    -pages 1 2 7 20
Note that both scalar and list parameter variables are respectful of the flags of other parameters as well as their own trailing punctuators. For example, given the specifications:
    -a
    -b <b_list>...
    -c <c_list>... ;
The following arguments will be parsed as indicated:
    -b -d -e -a     # <b_list> <- ("-d", "-e")
    -b -d ;         # <b_list> <- ("-d", ";")
    -c -d ;         # <c_list> <- ("-d")
List parameter variables are also respectful of the needs of subsequent parameter variables. That is, a parameter specification like:
-copy <files>... <dir>
will behave as expected, putting all but the last string after the -copy flag into the parameter variable <files>, whilst the very last string is assigned to <dir>.
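Ordinary regex backtracking is enough to produce this "list, then one final scalar" behaviour. The sketch below is illustrative only and assumes whitespace-separated words. It hand-writes a pattern for `-copy <files>... <dir>`, and `parse_copy` is a hypothetical name.

```perl
use strict;
use warnings;

# Illustrative sketch, not Getopt::Declare's implementation: a list that
# is followed by a scalar. The greedy (?:\s+\S+)+ first swallows every
# word, then the regex engine backtracks one word so that \s+(\S+) can
# still match, leaving the last word for <dir>.
sub parse_copy {
    my ($cmdline) = @_;
    return unless $cmdline =~ /\A-copy((?:\s+\S+)+)\s+(\S+)\s*\z/;
    my ($list, $dir) = ($1, $2);
    return ([ split ' ', $list ], $dir);
}

my ($files, $dir) = parse_copy('-copy a.txt b.txt c.txt /tmp');
print "files: @$files\n";   # files: a.txt b.txt c.txt
print "dir:   $dir\n";      # dir:   /tmp
```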
Optional parameter components
Except for the leading flag, any part of a parameter definition may be made optional by placing it in square brackets. For example:
+range <from> [..] [<to>]
which matches any of:
    +range 1..10
    +range 1..
    +range 1 10
    +range 1
List parameter variables may also be made optional (the ellipsis must follow the parameter variable name immediately, so it goes inside the square brackets):
-list [<page>...]
Two or more parameter components may be made jointly optional, by specifying them in the same pair of brackets. Optional components may also be nested. For example:
-range <from> [.. [<to>] ]
Scalar optional parameter variables (such as [<to>]) are given undefined values if they are skipped during a successful parameter match. List optional parameter variables (such as [<page>...]) are assigned an empty list if unmatched.
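Square brackets map naturally onto optional regex groups, and nesting the brackets nests the groups. The two hand-written regexes below are a sketch for the definitions above, not the patterns Getopt::Declare generates. They assume numeric endpoints, and `parse_flat` and `parse_nested` are hypothetical names. A skipped `<to>` comes back undefined, matching the rule just stated.

```perl
use strict;
use warnings;

# Hand-written regexes (illustration only) for two definitions above.
# +range <from> [..] [<to>] : ".." and <to> are independently optional.
sub parse_flat {
    my ($arg) = @_;
    return $arg =~ /\A\+range\s*(\d+)\s*(?:\.\.)?\s*(\d+)?\s*\z/ ? ($1, $2) : ();
}

# -range <from> [.. [<to>] ] : <to> is only allowed after "..".
sub parse_nested {
    my ($arg) = @_;
    return $arg =~ /\A-range\s*(\d+)\s*(?:\.\.\s*(\d+)?)?\s*\z/ ? ($1, $2) : ();
}

for my $arg ('+range 1..10', '+range 1..', '+range 1 10', '+range 1') {
    my ($from, $to) = parse_flat($arg);
    printf "%-13s from=%s to=%s\n", $arg, $from, $to // 'undef';
}
my @nested = parse_nested('-range 1 10');
print 'nested: ', (@nested ? 'matched' : 'no match'), "\n";   # nested: no match
```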
One important use for optional punctuators is to provide abbreviated versions of specific flags. For example:
    -num[eric]             # Match "-num" or "-numeric"
    -lexic[ographic]al     # Match "-lexical" or "-lexicographical"
    -b[ells+]w[histles]    # Match "-bw" or "-bells+whistles"
Note that the actual flags for these three parameters are -num, -lexic and -b, respectively.
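Each bracketed punctuator behaves like an optional non-capturing group. The following regexes are hand-translated for illustration and are not taken from the module. They show which spellings each of the three abbreviation examples accepts.

```perl
use strict;
use warnings;

# Each [..] optional punctuator becomes an optional (?:...) group.
my %abbrev = (
    '-num'   => qr/\A-num(?:eric)?\z/,
    '-lexic' => qr/\A-lexic(?:ographic)?al\z/,
    '-b'     => qr/\A-b(?:ells\+)?w(?:histles)?\z/,
);

for my $arg (qw(-num -numeric -lexical -lexicographical -bw -bells+whistles -nume)) {
    my ($flag) = grep { $arg =~ $abbrev{$_} } sort keys %abbrev;
    printf "%-17s => %s\n", $arg, $flag // 'no match';
}
```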
Parameter descriptions
Providing a textual description for each parameter (or parameter variant) is optional, but strongly recommended. Apart from providing internal documentation, parameter descriptions are used in the automatically-generated usage information provided by Getopt::Declare.
Descriptions may be placed after the tab(s) following the parameter definition and may be continued on subsequent lines, provided those lines do not contain any tabs after the first non-whitespace character (because any such line will instead be treated as a new parameter specification). The description is terminated by a blank line, an action specification (see below) or another parameter specification.
For example:
    -v                          Verbose mode
    in=<infile>                 Specify input file
                                (will fail if file does not exist)

    +range <from>..<to>         Specify range of columns to consider
    --lines <start> - <stop>    Specify range of lines to process

    ignore bad lines            Ignore bad lines :-)

    <outfile>                   Specify an output file
The parameter description may also contain special directives which alter the way in which the parameter is parsed. See the various subsections of ADVANCED FEATURES for more information.
Actions
Each parameter specification may also include one or more blocks of Perl code, specified in a pair of curly brackets (which must start on a new line).
Each action is executed as soon as the corresponding parameter is successfully matched in the command-line (but see Deferred actions for a means of delaying this response).
For example:
    -v    Verbose mode
              { $::verbose = 1; }

    -q    Quiet mode
              { $::verbose = 0; }
Actions are executed (as do blocks) in the package in which the Getopt::Declare object containing them was created. Hence they have access to all package variables and functions in that namespace.
In addition, each parameter variable belonging to the corresponding parameter is made available as a (block-scoped) Perl variable with the same name. For example:
    +range <from>..<to>    Set range
                               { setrange($from, $to); }

    -list <page:i>...      Specify pages to list
                               { foreach (@page) { list($_) if $_ > 0; } }
Note that scalar parameter variables become scalar Perl variables, and list parameter variables become Perl arrays.
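One way to picture this is an action's source text compiled after a prelude that declares one lexical per parameter variable. In the sketch below, `run_action` is a hypothetical helper, and `setrange` and `list` are stand-ins that only record their arguments. This is not how Getopt::Declare is built internally. It shows scalars becoming `$name` and list values becoming `@name`, with the action run as a `do` block in package `main`.

```perl
use strict;
use warnings;

# Stand-ins for the helper functions named in the examples above
# (hypothetical; they just record their arguments).
our (@ranges, @listed);
sub setrange { push @ranges, [@_] }
sub list     { push @listed, @_ }

# Hypothetical helper, not Getopt::Declare's internals: run an action's
# source text as a do block in package main, after declaring one lexical
# per parameter variable. Array refs become arrays, anything else a scalar.
sub run_action {
    my ($code, %vars) = @_;
    my $prelude = '';
    for my $name (sort keys %vars) {
        $prelude .= ref $vars{$name} eq 'ARRAY'
            ? "my \@${name} = \@{ \$vars{${name}} };\n"
            : "my \$${name} = \$vars{${name}};\n";
    }
    eval "package main;\n${prelude}do { $code };\n1" or die $@;
    return;
}

run_action('setrange($from, $to);', from => 3, to => 9);
run_action('foreach (@page) { list($_) if $_ > 0; }', page => [0, 2, 5]);
print "ranges: @{ $ranges[0] }\nlisted: @listed\n";
```

The string `eval` can see `%vars` because a string eval is compiled inside the lexical scope of the code that calls it.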
Predefined variables available in actions
Within an action the following variables are also available:
$_PARAM_
    Stores the identifier of the current parameter: either the leading flag or, if there is no leading flag, the name of the first parameter variable.
%_PUNCT_
    Stores the substring matched by each punctuator in the current parameter. The hash is indexed by the punctuator itself. The main purpose of this variable is to allow actions to check whether optional punctuators were in fact matched. For example:

        -v[erbose]    Set verbose mode
                      (doubly verbose if full word used)
                          { if ($_PUNCT_{"erbose"}) { $verbose = 2; }
                            else                    { $verbose = 1; } }

%_FOUND_
    This hash stores boolean values indicating whether or not a given parameter has already been found. The hash keys are the leading flags or parameter variables of each parameter. For instance, the following specification makes the -q and -v parameters mutually exclusive (but see Parameter dependencies for a much easier way to achieve this effect):

        -v    Set verbose mode
                  { die "Can't be verbose *and* quiet!\n" if $_FOUND_{"-q"}; }

        -q    Set quiet mode
                  { die "Can't be quiet *and* verbose!\n" if $_FOUND_{"-v"}; }

    For reasons that will be explained in Termination and rejection, a given parameter is not marked as found until after its associated actions are executed. That is, $_FOUND_{$_PARAM_} will not (usually) be true during a parameter action.
Note that, although numerous other internal variables on which the generated parser relies are also visible within parameter actions, accessing any of them may have Dire Consequences. Moreover, these other variables may no longer be accessible (or even present) in future versions of Getopt::Declare. All such internal variables have names beginning with an underscore. Avoiding such variable names will ensure there are no conflicts between actions and the parser itself.
The command-line parsing process
Whenever a Getopt::Declare object is created, the current command-line is parsed sequentially, by attempting to match each parameter in the object's specification string against the current elements in the @ARGV array (but see Parsing from other sources). The order in which parameters are tried against @ARGV is determined by three rules:
1. Parameters with longer flags are tried first. Hence the command-line argument -quiet would be parsed as matching the parameter -quiet rather than the parameter -q <string>, even if the -q parameter was defined first.
2. Parameter variants with the most components are matched first. Hence the argument -rand 12345 would be parsed as matching the parameter variant -rand <seed>, rather than the variant -rand, even if the shorter -rand variant was defined first.
3. Otherwise, parameters are matched in the order they are defined.
Elements of @ARGV which do not match any defined parameter are collected during the parse and are eventually put back into @ARGV (see Strict and non-strict command-line parsing).
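The three ordering rules amount to a single three-key sort. The candidate list below is made up from the examples in the rules, and `components` (the number of parts after the flag) is an assumed field. This is a model of the rules, not code from the module.

```perl
use strict;
use warnings;

# The three ordering rules as one sort. The candidates are illustrative
# and listed in definition order; 'components' counts parts after the flag.
my @params = (
    { spec => '-q <string>',  flag => '-q',     components => 1 },
    { spec => '-rand',        flag => '-rand',  components => 0 },
    { spec => '-quiet',       flag => '-quiet', components => 0 },
    { spec => '-rand <seed>', flag => '-rand',  components => 1 },
);
$params[$_]{order} = $_ for 0 .. $#params;

my @tried = sort {
    length($b->{flag}) <=> length($a->{flag})      # 1. longer flags first
        or $b->{components} <=> $a->{components}   # 2. more components first
        or $a->{order} <=> $b->{order}             # 3. then definition order
} @params;

print join("\n", map { $_->{spec} } @tried), "\n";
```

This prints `-quiet`, `-rand <seed>`, `-rand` and `-q <string>`, in that order.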
ADVANCED FEATURES
Case-insensitive parameter matching
By default, a Getopt::Declare object parses the command-line in a case-sensitive manner. The [nocase] directive enables a specific parameter (or, alternatively, all parameters) to be matched case-insensitively.
If a [nocase] directive is included in the description of a specific parameter variant, then that variant (only) will be matched without regard for case. For example, the specification:
-q Quiet mode [nocase]
-v Verbose mode
means that the arguments "-q" and "-Q" will both match the -q parameter, but that only "-v" (and not "-V") will match the -v parameter.
If a [nocase] directive appears anywhere outside a parameter description, then the entire specification is declared case-insensitive and all parameters defined in that specification are matched without regard to case.
Termination and rejection
It is sometimes useful to be able to terminate command-line processing before all arguments have been parsed. To this end, Getopt::Declare provides a special local operator (finish) which may be used within actions. The finish operator takes a single optional argument. If the argument is true (or omitted), command-line processing is terminated at once (although the current parameter is still marked as having been successfully matched). For example:
    --        Traditional argument list terminator
                  { finish }

    -no--     Use non-traditional terminator instead
                  { $nontrad = 1; }

    ##        Non-traditional terminator (only valid if -no-- flag seen)
                  { finish($nontrad); }
It is also possible to reject a single parameter match from within an action (and then continue trying other candidates). This allows actions to be used to perform more sophisticated tests on the type of a parameter variable, or to implement complicated parameter interdependencies.
To reject a parameter match, the reject operator is used. The reject operator takes an optional argument. If the argument is true (or was omitted), the current parameter match is immediately rejected. For example:
    -ar <R:n>    Set aspect ratio (must be in the range (0..1])
                     { $::sawaspect++;
                       reject $R <= 0 || $R > 1 ;
                       setaspect($R);
                     }

    -q           Quiet option (not available on Wednesdays)
                     { reject((localtime)[6] == 3);
                       $::verbose = 0;
                     }
Note that any actions performed before the call to reject will still have effect (for example, the variable $::sawaspect remains incremented even if the aspect ratio parameter is subsequently rejected).
The reject operator may also take a second argument, which is used as an error message if the rejected argument subsequently fails to match any other parameter. For example:
    -q           Quiet option (not available on Wednesdays)
                     { reject((localtime)[6] == 3 => "Not today!");
                       $::verbose = 0;
                     }
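Rejection behaves like an exception that the matcher catches before it moves on to the next candidate. The model below is hypothetical. `reject` throws a marker object of an invented `Rejection` class, and `match_argument` tries `[name, action]` pairs in turn. The Wednesday test is replaced by an unconditional `reject 1 => 'Not today!'` so that the output is deterministic. The model reproduces three behaviours described above: side effects before `reject` survive, a rejected match falls through to other candidates, and the optional message is reported if nothing else matches.

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed);

# Hypothetical model of reject(), not Getopt::Declare's code: reject()
# throws a marker object, and the matcher catches it and moves on to the
# next candidate parameter.
{
    package Rejection;
    sub new { my ($class, $msg) = @_; return bless { msg => $msg }, $class }
}

sub reject {
    my ($cond, $msg) = @_ ? @_ : (1);    # no argument means "reject"
    die Rejection->new($msg) if $cond;
    return;
}

our $sawaspect = 0;

# Try each [name, action] candidate in turn. Return the name of the first
# whose action does not reject; otherwise die with the last custom message.
sub match_argument {
    my ($value, @candidates) = @_;
    my $message;
    for my $candidate (@candidates) {
        my ($name, $action) = @$candidate;
        return $name if eval { $action->($value); 1 };
        my $err = $@;
        die $err unless blessed($err) && $err->isa('Rejection');
        $message = $err->{msg} if defined $err->{msg};
    }
    die 'Error: ', ($message // "unrecognizable argument '$value'"), "\n";
}

# Mirrors the -ar example: the counter is bumped even when reject() fires.
my @candidates = (
    [ '-ar' => sub { my ($R) = @_; $sawaspect++; reject $R <= 0 || $R > 1; } ],
);

print match_argument(0.5, @candidates), "\n";    # -ar
eval { match_argument(2, @candidates) };
print $@;                                         # Error: unrecognizable argument '2'
print "sawaspect = $sawaspect\n";                 # sawaspect = 2
```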
Specifying other parameter variable types
As was mentioned in Types of parameter variables, parameter variables can be restricted to matching only numbers or only integers by using the type specifiers :n and :i. Getopt::Declare provides seven other inbuilt type specifiers, as well as two mechanisms for defining new restrictions on parameter variables.
The other inbuilt type specifiers are:
:+i
    which restricts a parameter variable to matching positive, non-zero integers (that is: 1, 2, 3, etc.)
:+n
    which restricts a parameter variable to matching positive, non-zero numbers (that is, floating point numbers strictly greater than zero).
:0+i
    which restricts a parameter variable to matching non-negative integers (that is: 0, 1, 2, 3, etc.)
:0+n
    which restricts a parameter variable to matching non-negative numbers (that is, floating point numbers greater than or equal to zero).
:s
    which allows a parameter variable to match any quote-delimited or whitespace-terminated string. Note that this specifier simply makes explicit the default behaviour.
:if
    which is used to match input file names. Like type ':s', type ':if' matches any quote-delimited or whitespace-terminated string. However, this type does not respect other command-line flags and also requires that the matched string is either - (indicating standard input) or the name of a readable file.
:of
    which is used to match output file names. It is exactly like type ':if' except that it requires that the string is either - (indicating standard output) or the name of a file that is either writable or non-existent.
For example:
-repeat <count:+i> Repeat <count> times (must be > 0)
-scale <factor:0+n> Set scaling factor (cannot be negative)
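The numeric type specifiers can be approximated with a pattern test followed by a sign test. The patterns below are assumptions written for illustration and are not the regexes Getopt::Declare actually uses. `matches_type` is a hypothetical helper. The :s, :if and :of types are left out.

```perl
use strict;
use warnings;

# Approximate patterns (assumptions, not Getopt::Declare's real ones).
my $INT = qr/[+-]?\d+/;
my $NUM = qr/[+-]?(?:\d+\.?\d*|\.\d+)(?:[eE][+-]?\d+)?/;

# Each check runs the pattern first, so the numeric comparison only ever
# sees strings that already look like numbers.
my %check = (
    'i'   => sub { $_[0] =~ /\A$INT\z/ },
    'n'   => sub { $_[0] =~ /\A$NUM\z/ },
    '+i'  => sub { $_[0] =~ /\A$INT\z/ && $_[0] > 0 },
    '+n'  => sub { $_[0] =~ /\A$NUM\z/ && $_[0] > 0 },
    '0+i' => sub { $_[0] =~ /\A$INT\z/ && $_[0] >= 0 },
    '0+n' => sub { $_[0] =~ /\A$NUM\z/ && $_[0] >= 0 },
);

sub matches_type {
    my ($type, $value) = @_;
    my $test = $check{$type} or die "unknown type :$type\n";
    return $test->($value) ? 1 : 0;
}

for my $v (qw(3 0 -2 1.5)) {
    printf "%-4s :+i=%d :0+i=%d :n=%d\n", $v, map { matches_type($_, $v) } '+i', '0+i', 'n';
}
```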
Alternatively, parameter variables can be restricted to matching a specific regular expression, by providing the required pattern explicitly (in matched / delimiters after the :). For example:
-parity <p:/even|odd|both/> Set parity (<p> must be "even", "odd" or "both")
-file <name:/\w*\.[A-Z]{3}/> File name must have a three-capital-letter extension
If an explicit regular expression is used, there are three convenience extensions available:
%T
    If the sequence %T appears in a pattern, it is translated to a negative lookahead containing the parameter variable's trailing context. Hence the parameter definition:

        -find <what:/(%T.)+/> ;

    ensures that the command-line argument -find abcd; causes <what> to match abcd, not abcd;.
%D
    If the sequence %D appears in a pattern, it is translated into a subpattern which matches any single digit (like a \d), but only if that digit would not match the parameter variable's trailing context. Hence %D is just a convenient short-hand for (?:%T\d) (and is actually implemented that way).
%F
    By default, any explicit pattern is modified by Getopt::Declare so that it fails if the argument being matched represents some defined parameter flag. If however the sequence %F appears anywhere in a pattern, it causes the pattern not to reject strings which would otherwise match another flag. For example, the inbuilt types ':if' and ':of' use %F to enable them to match filenames which happen to be identical to parameter flags.
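The %T and %D expansions can be reproduced with Perl's negative lookahead `(?!...)`. The function below was written from the description above, not taken from the module's source. `expand_pattern` is a hypothetical name, and the trailing context is passed in explicitly. %F is not modelled.

```perl
use strict;
use warnings;

# Sketch of the %T / %D expansions described above, written from the
# description (not taken from Getopt::Declare's source). $trailing is the
# literal text that follows the parameter variable in its definition.
sub expand_pattern {
    my ($pattern, $trailing) = @_;
    my $T = '(?!' . quotemeta($trailing) . ')';      # negative lookahead
    (my $expanded = $pattern) =~ s/%D/(?:%T\\d)/g;   # %D is shorthand for (?:%T\d)
    $expanded =~ s/%T/$T/g;
    return $expanded;
}

# -find <what:/(%T.)+/> ;   (trailing context is ";")
my $what = expand_pattern('(%T.)+', ';');
print "pattern: $what\n";                        # pattern: ((?!\;).)+
print "what = $1\n" if 'abcd;' =~ /\A($what)/;   # what = abcd
```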
Defining new parameter variable types
Explicit regular expressions are very powerful, but also cumbersome to use (or reuse) in some situations. Getopt::Declare provides a general parameter variable type definition mechanism to simplify such cases.
To declare a new parameter variable type, the [pvtype:...] directive is used. A [pvtype:...] directive specifies the name, matching pattern, and action for the new parameter variable type (though both the pattern and action are optional).
The name string may be any whitespace-terminated sequence of characters which does not include a >. The name may also be specified within a pair of quotation marks (single or double) or within any Perl quotelike operation. For example:
    [pvtype: num     ]    # Makes this valid: -count <N:num>
    [pvtype: 'a num' ]    # Makes this valid: -count <N:a num>
    [pvtype: q{nbr}  ]    # Makes this valid: -count <N:nbr>
The pattern is used in initial matching of the parameter variable. Patterns are normally specified as a /-delimited Perl regular expression:
    [pvtype: num      /\d+/         ]
    [pvtype: 'a num'  /\d+(\.\d*)/  ]
    [pvtype: q{nbr}   /[+-]?\d+/    ]
Alternatively the pattern associated with a new type may be specified as a : followed by the name of another parameter variable type (in quotes if necessary). In this case the new type matches the same pattern (and action! - see below) as the named type. For example:
    [pvtype: num      :+i      ]    # <X:num> is the same as <X:+i>
    [pvtype: 'a num'  :n       ]    # <X:a num> is the same as <X:n>
    [pvtype: q{nbr}   :'a num' ]    # <X:nbr> is also the same as <X:n>
As a third alternative, the pattern may be omitted altogether, in which case the new type matches whatever the inbuilt pattern :s matches.
The optional action which may be included in any [pvtype:...] directive is executed after the corresponding parameter variable matches the command line but before any actions belonging to the enclosing parameter are executed. Typically, such type actions will call the reject operator (see Termination and rejection) to test extra conditions, but any valid Perl code is acceptable. For example:
    [pvtype: num      /\d+/     { reject if (localtime)[6]==3 } ]
    [pvtype: 'a num'  :num      { print "a num!" }              ]
    [pvtype: q{nbr}   :'a num'  { reject $::no_nbr }            ]
If a new type is defined in terms of another (for example, :a num and :nbr above), any action specified by that new type is prepended to the action of that other type. Hence:
- the new type :num matches any string of digits, but then rejects the match if it's Wednesday.
- the new type :a num matches any string of digits (like its parent type :num), then prints out a num!, and then rejects the match if it's Wednesday (like its parent type :num).
- the new type :nbr matches any string of digits (like its parent type :a num), but then rejects the match if the global $::no_nbr variable is true. Otherwise it next prints out a num! (like its parent type :a num), and finally rejects the match if it's Wednesday (like its grandparent type :num).
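The inheritance rule (reuse the parent's pattern and prepend the child's own action) can be modelled with a small type table. Everything in this sketch is hypothetical: `define_type`, `run_type`, the `\S+` fallback standing in for :s, and trace labels that replace the real Wednesday and `$::no_nbr` tests. It shows only the order in which inherited actions run.

```perl
use strict;
use warnings;

# Hypothetical model of [pvtype:...] inheritance, not the module's code.
# A type built from a parent reuses the parent's pattern and action list,
# with its own action (if any) prepended so that it runs first.
my %types;

sub define_type {
    my ($name, %opt) = @_;
    my ($pattern, @actions);
    if (defined $opt{parent}) {
        my $parent = $types{ $opt{parent} } or die "unknown type :$opt{parent}\n";
        $pattern = $parent->{pattern};
        @actions = @{ $parent->{actions} };
    }
    $pattern = $opt{pattern} if defined $opt{pattern};
    unshift @actions, $opt{action} if defined $opt{action};
    # With no pattern at all, fall back to something like :s (an assumption).
    $types{$name} = { pattern => $pattern // qr/\S+/, actions => \@actions };
    return;
}

# Match a value against a type, returning the labels its actions recorded.
sub run_type {
    my ($name, $value) = @_;
    my $type = $types{$name} or die "unknown type :$name\n";
    my $re = $type->{pattern};
    return unless $value =~ /\A$re\z/;
    my @trace;
    $_->(\@trace) for @{ $type->{actions} };
    return @trace;
}

define_type('num',   pattern => qr/\d+/, action => sub { push @{ $_[0] }, 'num check' });
define_type('a num', parent  => 'num',   action => sub { push @{ $_[0] }, 'print a num!' });
define_type('nbr',   parent  => 'a num', action => sub { push @{ $_[0] }, 'nbr check' });

print join(' -> ', run_type('nbr', '42')), "\n";   # nbr check -> print a num! -> num check
```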
When a type action is executed (as part of a particular parameter match), three local variables are available:
$_VAL_
    which contains the value matched by the type's pattern. It is this value which is ultimately assigned to the local Perl variable which is available to parameter actions. Hence if the type action changes the value of $_VAL_, that changed value becomes the real value of the corresponding parameter variable (see the Roman numeral example below).
$_VAR_
    which contains the name of the parameter variable being matched.
$_PARAM_
    which contains the name of the parameter currently being matched.
Here is an example of the use of these variables:
$args = new Getopt::Declare <<'EOPARAM';

    [pvtype: type  /AB|[OAB]/  ]
    [pvtype: Rh?   /Rh[+-]/    ]
    [pvtype: days  :+i  { reject $_VAL_ < 14 => "$_PARAM_ (too soon!)" } ]

    -donated <D:days>              Days since last donation
    -applied <A:days>              Days since applied to donate

    -blood <type:type> [<rh:Rh?>]  Specify blood type
                                   and (optionally) rhesus factor
EOPARAM
In the above example, the :days parameter variable type is defined to match whatever the :+i type matches (that is, positive, non-zero integers), with the proviso that the matching value ($_VAL_) must be at least 14. If a shorter value is specified for the <D> or <A> parameter variables, then Getopt::Declare would issue the following (respective) error messages:
    Error: -donated (too soon!)
    Error: -applied (too soon!)
Note that the inbuilt parameter variable types (i, n, etc.) are really just predefined type names, and hence can be altered if necessary:
$args = new Getopt::Declare <<'EOPARAM';

    [pvtype: 'n'  /[MDCLXVI]+/  { reject !($_VAL_=to_roman $_VAL_) } ]

    -index <number:n>    Index number
                             { print $data[$number]; }
EOPARAM
The above [pvtype:...] directive means that all parameter variables specified with a type :n henceforth only match valid Roman numerals, but that any such numerals are automatically converted to ordinary numbers (by passing $_VAL_ through the to_roman function).
Hence the requirement that all :n numbers must now be Roman can be imposed transparently, at least as far as the actual parameter variables which use the :n type are concerned. Thus $number can still be used to index the array @data despite the new restrictions placed upon it by the redefinition of type :n.
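The specification above depends on a `to_roman` helper whose body is not shown. Below is one hypothetical implementation. Despite the name, it converts a Roman numeral to an ordinary integer, which is what the type action needs. It returns 0 for non-Roman input, and that false value makes `reject !(...)` fire. It does not check that a numeral is well-formed, so "IIII" is accepted as 4.

```perl
use strict;
use warnings;

# Hypothetical implementation of the to_roman() helper used in the
# specification above. Despite its name, it converts a Roman numeral to an
# ordinary number, returning 0 (false, so the type action rejects) when
# the string is not made of Roman digits. It does not check that the
# numeral is well-formed: "IIII" is accepted as 4.
sub to_roman {
    my ($roman) = @_;
    return 0 unless defined $roman && $roman =~ /\A[MDCLXVI]+\z/;
    my %value = (I => 1, V => 5, X => 10, L => 50, C => 100, D => 500, M => 1000);
    my @digits = map { $value{$_} } split //, $roman;
    my $total = 0;
    for my $i (0 .. $#digits) {
        # Subtractive notation: a digit smaller than its successor is subtracted.
        if ($i < $#digits && $digits[$i] < $digits[$i + 1]) { $total -= $digits[$i] }
        else                                                 { $total += $digits[$i] }
    }
    return $total;
}

print to_roman($_), "\n" for qw(XIV MCMXCIX);   # 14, then 1999
```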
Note too that, because the :+n and :0+n types are implicitly defined in terms of the original :n type (as if the directives:
    [pvtype: '+n'   :n  { reject if $_VAL_ <= 0 } ]
    [pvtype: '0+n'  :n  { reject if $_VAL_ < 0  } ]
were included in every specification), the above redefinition of :n affects those types as well. In such cases the format conversion is performed before the sign tests (in other words, the inherited actions are performed after any newly defined ones).
Parameter variable type definitions may appear anywhere in a Getopt::Declare specification and are effective for the entire scope of the specification. In particular, new parameter variable types may be defined after they are used.
Undocumented parameters
If a parameter description is omitted, or consists entirely of whitespace, or contains the special directive [undocumented], then the parameter is still parsed as normal, but will not appear in the automatically generated usage information (see Usage information).
Apart from allowing for secret parameters (a dubious benefit), this feature enables the programmer to specify some undocumented action which is to be taken on encountering an otherwise unknown argument. For example:
    <unknown>
        { handle_unknown($unknown); }
Duplicated parameters
Sometimes it is desirable to provide two or more alternate flags for the same behaviour (typically, a short form and a long form). To reduce the burden of specifying such pairs, the special directive [ditto] is provided. If the description of a parameter begins with a [ditto] directive, that directive is replaced with the description for the immediately preceding parameter (including any other directives). For example:
    -v           Verbose mode
    --verbose    [ditto] (long form)
In the automatically generated usage information this would be displayed as:
    -v           Verbose mode
    --verbose    "    " (long form)
Furthermore, if the dittoed parameter has no action(s) specified, the action(s) of the preceding parameter are reused. For example, the specification:
    -v           Verbose mode
                     { $::verbose = 1; }
    --verbose    [ditto]
would result in the --verbose option setting $::verbose just like the -v option. On the other hand, the specification:
    -v           Verbose mode
                     { $::verbose = 1; }
    --verbose    [ditto]
                     { $::verbose = 2; }
would give separate actions to each flag.
Deferred actions
It is often desirable or necessary to defer actions taken in response to particular flags until the entire command-line has been parsed. The most obvious case is where modifier flags must be able to be specified after the command-line arguments they modify.
To support this, Getopt::Declare provides a local operator (defer) which delays the execution of a particular action until the command-line processing is finished. The defer operator takes a single block, the execution of which is deferred until the command-line is fully and successfully parsed. If command-line processing fails for some reason (see DIAGNOSTICS), the deferred blocks are never executed.
For example:
    <files>...    Files to be processed
                      { defer { foreach (@files) { proc($_); } } }

    -rev[erse]    Process in reverse order
                      { $::ordered = -1; }

    -rand[om]     Process in random order
                      { $::ordered = 0; }
With the above specification, the -rev and/or -rand flags can be specified after the list of files, but still affect the processing of those files. Moreover, if the command-line parsing fails for some reason (perhaps due to an unrecognized argument), the deferred processing will not be performed.
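Deferral is essentially a queue of closures that runs only after a successful parse. In the toy scanner below, `simulate_parse` is hypothetical and recognises only `-rev`, treating any other dash-prefixed argument as a failure. The `(&)` prototype is what lets a user-defined `defer` accept a bare block. The queued closure reads `$ordered` when it runs, so a `-rev` that appears after the file names still takes effect.

```perl
use strict;
use warnings;

# Hypothetical model of defer(), not Getopt::Declare's code: deferred blocks
# are queued while the arguments are scanned and run only if the scan
# succeeds. The (&) prototype lets callers write defer { ... }.
my @deferred;
sub defer(&) { push @deferred, $_[0]; return }

our $ordered;
my @processed;

sub simulate_parse {
    my @args = @_;
    $ordered   = 1;
    @processed = ();
    @deferred  = ();
    my @files;
    for my $arg (@args) {
        if    ($arg eq '-rev') { $ordered = -1 }
        elsif ($arg =~ /\A-/)  { return 0 }    # unknown flag: fail, queue is never run
        else {
            push @files, $arg;
            # Like the <files>... action: register the deferred work once.
            if (@files == 1) {
                defer { @processed = $ordered < 0 ? reverse(@files) : @files };
            }
        }
    }
    $_->() for @deferred;    # whole command line parsed: run deferred blocks
    return 1;
}

simulate_parse('a.txt', 'b.txt', '-rev');
print "processed: @processed\n";    # processed: b.txt a.txt
```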
Flag clustering
Like some other Getopt:: packages, Getopt::Declare allows parameter flags to be clustered. That is, if two or more flags have the same flag prefix (one or more leading non-whitespace, non-alphanumeric characters), those flags may be concatenated behind a single copy of that flag prefix. For example, given the parameter specifications:
    -+            Swap signs
    -a            Append mode
    -b            Bitwise compare
    -c <FILE>     Create new file
    +del          Delete old file
    +e <NICE:i>   Execute (at specified nice level) when complete
The following command-lines (amongst others) are all exactly equivalent:
    -a -b -c newfile +e20 +del
    -abc newfile +dele20
    -abcnewfile+dele20
    -abcnewfile +e 20del
The last two alternatives are correctly parsed because Getopt::Declare allows flag clustering at any point where the remainder of the command-line being processed starts with a non-whitespace character and where the remaining substring would not otherwise immediately match a parameter flag.
Hence the trailing +dele20 in the third command-line example is parsed as +del +e20 and not -+ del +e20. This is because the previous - prefix is not propagated (since the leading +del is a valid flag).
In contrast, the trailing +e 20del in the fourth example is parsed as +e 20 +del because, after the 20 is parsed (as the integer parameter variable <NICE>), the next characters are del, which do not form a flag themselves unless prefixed with the controlling +.
In some circumstances a clustered sequence of flags on the command-line might also match a single (multicharacter) parameter flag. For example, given the specifications:
-a              Blood type is A
-b              Blood type is B
-ab             Blood type is AB
-ba             Donor has a Bachelor of Arts
A command-line argument -aba might be parsed as -a -b -a or -a -ba or -ab -a. In all such cases, Getopt::Declare prefers the longest unmatched flag first. Hence the previous example would be parsed as -ab -a, unless the -ab flag had already appeared in the command-line (in which case, it would be parsed as -a -ba).
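The longest-unmatched-flag rule can be illustrated with a small, self-contained Perl function. This is a simplified model rather than the module's algorithm: split_cluster is a hypothetical name, it handles only pure flags inside a single argument (no parameter variables such as <NICE>), and it treats any leading run of non-alphanumeric characters as the flag prefix.

```perl
use strict;
use warnings;

# Split one clustered argument into known pure flags, taking the longest
# not-yet-used flag at each step and re-attaching the cluster's prefix
# when the remainder does not start with a flag of its own.
sub split_cluster {
    my ($arg, $flags, $used) = @_;        # $flags: arrayref, $used: hashref
    my ($prefix) = $arg =~ /^([^\w\s]+)/ or return;
    my @known = sort { length($b) <=> length($a) } @$flags;
    my ($rest, @result) = ($arg);
    while (length $rest) {
        my $match;
        my @tries = ($rest, $prefix . $rest);
        TRY: for my $text (@tries) {
            for my $flag (@known) {
                next if $used->{$flag};   # each flag may match only once
                next unless index($text, $flag) == 0;
                $match = $flag;
                $rest  = substr($text, length $flag);
                last TRY;
            }
        }
        return unless defined $match;     # remainder is not a known flag
        $used->{$match} = 1;
        push @result, $match;
    }
    return @result;
}

my @flags = qw(-a -b -ab -ba);
print join(' ', split_cluster('-aba', \@flags, {})), "\n";              # -ab -a
print join(' ', split_cluster('-aba', \@flags, { '-ab' => 1 })), "\n";  # -a -ba
```

Passing a hash of already-matched flags reproduces the second case described above, where an earlier -ab forces the -a -ba reading.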
These rules are designed to produce consistency and least surprise, but (as the above example illustrates) may not always do so. If the idea of unconstrained flag clustering is too libertarian for a particular application, the feature may be restricted (or removed completely) by including a [cluster:...] directive anywhere in the specification string.
The options are:
- [cluster: any]
- This version of the directive allows any flag to be clustered (that is, it merely makes explicit the default behaviour).
- [cluster: flags]
- This version of the directive restricts clustering to parameters which are pure flags (that is, those which have no parameter variables or punctuators).
- [cluster: singles]
- This version of the directive restricts clustering to parameters which are pure flags, and which consist of a flag prefix followed by a single alphanumeric character.
- [cluster: none]
- This version of the directive turns off clustering completely.
For example:
$args = new Getopt::Declare <<'EOSPEC';
-a              Append mode
-b              Back-up mode
-bu             [ditto]
-c <file>       Copy mode
-d [<file>]     Delete mode
-e[xec]         Execute mode

[cluster:singles]
EOSPEC
In the above example, only the -a and -b parameters may be clustered. The -bu parameter is excluded because it consists of more than one letter, whilst the -c and -d parameters are excluded because they take (or may take, in -d's case) a variable. The -e[xec] parameter is excluded because it may take a trailing punctuator ([xec]).
By comparison, if the directive had been [cluster: flags], then -bu could be clustered, though -c, -d and -e[xec] would still be excluded since they are not pure flags.
Strict and non-strict command-line parsing
Strictness in Getopt::Declare refers to the way in which unrecognized command-line arguments are handled. By default, Getopt::Declare is non-strict, in that it simply skips silently over any unrecognized command-line arguments, leaving them in @ARGV at the conclusion of command-line processing (but only if they were originally parsed from @ARGV).
No matter where they came from, the remaining arguments are also available by calling the unused method on the Getopt::Declare object after parsing. In a list context, this method returns a list of the unprocessed arguments; in a scalar context, it returns a single string with the unused arguments concatenated.
Likewise, there is a used method that returns the arguments that were successfully processed by the parser.
However, if a new Getopt::Declare object is created with a specification string containing the [strict] directive (at any point in the specification):
$args = new Getopt::Declare <<'EOSPEC';
[strict]
-a              Append mode
-b              Back-up mode
-c              Copy mode
EOSPEC
then the command-line is parsed strictly. In this case, any unrecognized argument causes an error message (see DIAGNOSTICS) to be written to STDERR, and command-line processing to (eventually) fail. On such a failure, the call to Getopt::Declare::new() returns undef instead of the usual hash reference.
The only concession that strict mode makes to the unknown is that, if command-line processing is prematurely terminated via the finish operator, any command-line arguments which have not yet been examined are left in @ARGV and do not cause the parse to fail (of course, if any unknown arguments were encountered before the finish was executed, those earlier arguments will cause command-line processing to fail).
The strict option is useful when all possible parameters can be specified in a single Getopt::Declare object, whereas the non-strict approach is needed when unrecognized arguments are either to be quietly tolerated, or processed at a later point (possibly in a second Getopt::Declare object).
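The difference between the two modes, and the concession made for finish, can be captured in a toy model. Nothing here is Getopt::Declare's API: parse_model is hypothetical, recognized flags are given as a hash, and the argument -- stands in for a parameter whose action calls finish.

```perl
use strict;
use warnings;

# Toy model (not the module's code) of strict vs non-strict handling.
sub parse_model {
    my ($known, $strict, @args) = @_;
    my (@used, @unused);
    my $failed = 0;
    while (@args) {
        my $arg = shift @args;
        if ($arg eq '--') {               # like finish: stop examining
            push @used, $arg;
            last;
        }
        if ($known->{$arg}) {
            push @used, $arg;
        }
        else {
            push @unused, $arg;           # unrecognized argument...
            $failed = 1 if $strict;       # ...which is fatal in strict mode
        }
    }
    push @unused, @args;                  # unexamined arguments are left alone
    return if $failed;                    # mirrors new() returning undef
    return { used => \@used, unused => \@unused };
}

my %known = map { $_ => 1 } qw(-a -b -c);
my $lax = parse_model(\%known, 0, '-a', 'extra', '-c');
print "used: @{ $lax->{used} }; unused: @{ $lax->{unused} }\n";
# used: -a -c; unused: extra
```

In strict mode the same arguments would make parse_model return undef, while '-a', '--', 'extra' would still succeed, leaving extra unexamined.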
Parameter dependencies
Getopt::Declare provides five other directives which modify the behaviour of the command-line parser in some way. One or more of these directives may be included in any parameter description. In addition, the [mutex:...] directive may also appear in any usage decoration (see Usage information).
Each directive specifies a particular set of conditions that a command-line must fulfil (for example, that certain parameters may not appear on the same command-line). If any such condition is violated, an appropriate error message is printed (see DIAGNOSTICS). Furthermore, once the command-line is completely parsed, if any condition was violated, the program terminates (whilst still inside Getopt::Declare::new()).
The directives are:
- [required]
- Specifies that an argument matching at least one variant of the corresponding parameter must be specified somewhere in the command-line. That is, if two or more required parameters share the same flag, it suffices that any one of them matches an argument (recall that Getopt::Declare considers all parameter specifications with the same flag merely to be variant forms of a single underlying parameter). If an argument matching a required flag is not found in the command-line, an error message to that effect is issued, command-line processing fails, and Getopt::Declare::new() returns undef.
- [repeatable]
- By default, Getopt::Declare objects allow each of their parameters to be matched only once (that is, once any variant of a particular parameter matches an argument, all variants of that same parameter are subsequently excluded from further consideration when parsing the rest of the command-line). However, it is sometimes useful to allow a particular parameter to match more than once. Any parameter whose description includes the directive [repeatable] is never excluded as a potential argument match, no matter how many times it has matched previously:
-nice Increase nice value (linearly if repeated) [repeatable] { set_nice( get_nice()+1 ); }
-w Toggle warnings [repeatable] for the rest of the command-line { $warn = !$warn; }
As a more general mechanism, if a [repeatable] directive appears in a specification anywhere other than a flag's description, then all parameters are marked repeatable:
[repeatable]
-nice Increase nice value (linearly if repeated) { set_nice( get_nice()+1 ); }
-w Toggle warnings for the rest of the command-line { $warn = !$warn; }
- [mutex: ...]
- The [mutex:...] directive specifies that the parameters whose flags it lists are mutually exclusive. That is, no two or more of them may appear in the same command-line. For example:
-case           set to all lower case
-CASE           SET TO ALL UPPER CASE
-Case           Set to sentence case
-CaSe           SeT tO "RAnSom nOTe" CasE

[mutex: -case -CASE -Case -CaSe]
The interaction of the [mutex:...] and [required] directives is potentially awkward in the case where two required arguments are also mutually exclusive (since the [required] directives insist that both parameters must appear in the command-line, whilst the [mutex:...] directive expressly forbids this). Getopt::Declare resolves such contradictory constraints by relaxing the meaning of required slightly. If a flag is marked required, it is considered found for the purposes of error checking if it or any other flag with which it is mutually exclusive appears on the command-line. Hence the specifications:
-case           set to all lower case [required]
-CASE           SET TO ALL UPPER CASE [required]
-Case           Set to sentence case [required]
-CaSe           SeT tO "RAnSom nOTe" CasE [required]

[mutex: -case -CASE -Case -CaSe]
mean that exactly one of these four flags must appear on the command-line, but that the presence of any one of them will suffice to satisfy the requiredness of all four. It should also be noted that mutual exclusion is only tested for after a parameter has been completely matched (that is, after the execution of its actions, if any). This prevents rejected parameters (see Termination and rejection) from incorrectly generating mutual exclusion errors. However, it also sometimes makes it necessary to defer the actions of a pair of mutually exclusive parameters (for example, if those actions are expensive or irreversible).
- [excludes: ...]
- The [excludes:...] directive provides a pairwise version of mutual exclusion, specifying that the current parameter is mutually exclusive with all the other parameters listed, but those other parameters are not mutually exclusive with each other. That is, whereas the specification:
-left           Justify to left margin
-right          Justify to right margin
-centre         Centre each line

[mutex: -left -right -centre]
means that only one of these three justification alternatives can ever be used at once, the specification:
-left           Justify to left margin
-right          Justify to right margin
-centre         Centre each line [excludes: -left -right]
means that -left and -right can still be used together (probably to indicate "left and right" justification), but that neither can be used with -centre. Note that the [excludes:...] directive also differs from [mutex:...] in that it is always connected with a particular parameter, implicitly using the flag of that parameter as the target of exclusion.
- [requires: ...]
- The [requires:...] directive specifies a set of flags which must also appear in order for a particular flag to be permitted in the command-line. The condition is a boolean expression, in which the terms are the flags of various parameters, and the operations are &&, ||, !, and bracketing. For example, the specifications:
-num            Use numeric sort order
-lex            Use "dictionary" sort order
-len            Sort on length of line (or field)

-field <N:+i>   Sort on value of field <N>

-rev            Reverse sort order [requires: -num || -lex || !(-len && -field)]
means that the -rev flag is allowed only if either the -num or the -lex parameter has been used, or if it is not true that both the -len and the -field parameters have been used. Note that the operators &&, ||, and ! retain their normal Perl precedences.
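Once the set of matched flags is known, the [mutex:...], [excludes:...] and [requires:...] conditions reduce to simple set and boolean tests. The following is a sketch of those tests in plain Perl, not the module's implementation; the helper names are hypothetical, and requires_ok assumes every flag in a condition starts with - or --. It evaluates the condition by substituting 1 or 0 for each flag and letting Perl apply its normal precedences, as described above.

```perl
use strict;
use warnings;

# [mutex: ...]: seeing more than one flag from the group is an error.
sub mutex_violation {
    my ($group, $seen) = @_;
    my @hits = grep { $seen->{$_} } @$group;
    return @hits > 1 ? @hits : ();
}

# [excludes: ...]: only the owning flag conflicts with the listed ones.
sub excludes_violation {
    my ($flag, $others, $seen) = @_;
    return () unless $seen->{$flag};
    return grep { $seen->{$_} } @$others;
}

# [requires: ...]: replace each flag with 1 or 0, then let Perl evaluate
# &&, || and ! with their usual precedences.
sub requires_ok {
    my ($cond, $seen) = @_;
    (my $expr = $cond) =~ s/(?<![\w-])(--?\w+)/$seen->{$1} ? 1 : 0/ge;
    die "unexpected text in condition: $cond\n" if $expr =~ /[^01\s&|!()]/;
    my $ok = eval $expr;
    die "malformed condition: $cond\n" if $@;
    return $ok ? 1 : 0;
}

my $cond = '-num || -lex || !(-len && -field)';
print requires_ok($cond, { '-len' => 1, '-field' => 1 }), "\n";   # 0
print requires_ok($cond, { '-len' => 1 }), "\n";                  # 1
```

Because excludes_violation only fires when its owning flag was seen, -left and -right can coexist while either one still conflicts with -centre.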
Parsing from other sources
Getopt::Declare normally parses the contents of @ARGV, but can be made to parse specified files instead. To accommodate this, Getopt::Declare::new() takes an optional second parameter, which specifies a file to be parsed. The parameter may be any of the following:
- A filehandle
- in which case Getopt::Declare::new() reads the corresponding handle until end-of-file, and parses the resulting text (even if it is an empty string).
- -CONFIG
- in which case Getopt::Declare::new() looks for the files $ENV{HOME}/.${progname}rc and $ENV{PWD}/.${progname}rc, concatenates their contents, and parses that. If neither file is found (or if both are inaccessible), Getopt::Declare::new() immediately returns zero. If a file is found but the parse subsequently fails, undef is returned.
- -BUILD
- in which case Getopt::Declare::new() builds a parser from the supplied grammar and returns a reference to it, but does not parse anything. See The Getopt::Declare::code() method and The Getopt::Declare::parse() method.
- -SKIP (or undef)
- in which case Getopt::Declare::new() immediately returns zero. This alternative is useful when using a FileHandle:
my $args = new Getopt::Declare($grammar, new FileHandle ($filename) || -SKIP);
because it makes explicit what happens if FileHandle::new() fails. Of course, if the -SKIP alternative were omitted, Getopt::Declare::new() would still return immediately, having found undef as its second argument.
- Any other ARRAY reference
- in which case Getopt::Declare::new() treats the array elements as a list of filenames, concatenates the contents of those files, and parses that. If the list does not denote any accessible file(s), Getopt::Declare::new() immediately returns zero. If matching files are found, but not successfully parsed, undef is returned.
- A string
- in which case Getopt::Declare::new() parses that string directly. Note that when Getopt::Declare::new() parses from a source other than @ARGV, unrecognized arguments are not placed back in @ARGV.
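The -CONFIG lookup described above can be sketched as follows. This is an illustration, not the module's code: config_text is a hypothetical helper, and the caller passes $progname explicitly because the text does not say how the program name is determined. Like Getopt::Declare::new(), it returns zero when neither file can be read.

```perl
use strict;
use warnings;

# Read $HOME/.<progname>rc and $PWD/.<progname>rc (whichever exist) and
# concatenate them; return 0 if no file could be opened.
sub config_text {
    my ($progname) = @_;
    my ($text, $found) = ('', 0);
    for my $file ("$ENV{HOME}/.${progname}rc", "$ENV{PWD}/.${progname}rc") {
        open my $fh, '<', $file or next;  # missing or unreadable: skip it
        local $/;                         # slurp mode: read the whole file
        my $content = <$fh>;
        $text .= $content if defined $content;
        $found = 1;
        close $fh;
    }
    return $found ? $text : 0;
}
```

The returned text would then be handed to the parser as if it had been a string source.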
Using Getopt::Declare objects after command-line processing
After command-line processing is completed, the object returned by Getopt::Declare::new() will have the following features:
- Parameter data
-
For each successfully matched parameter, the Getopt::Declare object
will contain a hash element. The key of that element will be the leading flag
or parameter variable name of the parameter.
The value of the element will be a reference to another hash which contains
the names and values of each distinct parameter variable and/or
punctuator which was matched by the parameter. Punctuators generate
string values containing the actual text matched. Scalar parameter
variables generate scalar values. List parameter variables
generate array references.
As a special case, if a parameter consists of a single component
(either a single flag or a single parameter variable), then the value for the
corresponding hash key is not a hash reference, but the actual value matched.
The following example illustrates the various possibilities:
$args = new Getopt::Declare (q{
-v <value> [etc]        One or more values
<infile>                Input file [required]
-o <outfiles>...        Output files
});

if ( $args->{'-v'} ) {
    print "Using value: ", $args->{'-v'}{'<value>'};
    print " (et cetera)" if $args->{'-v'}{'etc'};
    print "\n";
}

open INFILE, $args->{'<infile>'} or die;
@data = <INFILE>;

foreach $outfile ( @{$args->{'-o'}{'<outfiles>'}} ) {
    open OUTFILE, ">$outfile" or die;
    print OUTFILE process(@data);
    close OUTFILE;
}
The values which are assigned to the various hash elements are copied from the corresponding block-scoped variables which are available within actions. In particular, if the value of any of those block-scoped variables is changed within an action, that changed value is saved in the hash. For example, given the specification:
$args = new Getopt::Declare (q{
-ar <R:n>       Set aspect ratio (will be clipped to [0..1])
                { $R = 0 if $R < 0; $R = 1 if $R > 1; }
});
then the value of $args->{'-ar'}{'<R>'} will always be between zero and one.
- The @ARGV array
- In its non-strict mode, once a Getopt::Declare object has completed its command-line processing, it pushes any unrecognized arguments back into the emptied command-line array @ARGV (whereas all recognized arguments will have been removed). Note that these remaining arguments will be in sequential elements (starting at $ARGV[0]), not in their original positions in @ARGV.
- The usage() method
- Once a Getopt::Declare object is created, its usage() method may be called to explicitly print out usage information corresponding to the specification with which it was built. See Usage information for more details. If the usage() method is called with an argument, that argument is passed to exit after the usage information is printed (the no-argument version of usage() simply returns at that point).
- The version() method
- Another useful method of a Getopt::Declare object is version(), which prints out the name of the enclosing program, the last time it was modified, and the value of $::VERSION, if it is defined. Note that this implies that all Getopt::Declare objects in a single program will print out identical version information. Like the usage() method, if version() is passed an argument, it will exit with that value after printing.
- The Getopt::Declare::parse() method
- It is possible to separate the construction of a Getopt::Declare parser from the actual parsing it performs. If Getopt::Declare::new() is called with a second parameter '-BUILD' (see Parsing from other sources), it constructs and returns a parser, without parsing anything. The resulting parser object can then be used to parse multiple sources, by calling its parse() method. Getopt::Declare::parse() takes an optional parameter which specifies the source of the text to be parsed (it parses @ARGV if the parameter is omitted). This parameter takes the same set of values as the optional second parameter of Getopt::Declare::new() (see Parsing from other sources).
Getopt::Declare::parse() returns true if the source is located and parsed successfully. It returns a defined false (zero) if the source is not located. An undef is returned if the source is located, but not successfully parsed. Thus, the following code first constructs parsers for a series of alternate configuration files and the command line, and then parses them:
use Getopt::Declare;
use FileHandle;

# BUILD PARSERS
my $config  = new Getopt::Declare ($config_grammar, -BUILD);
my $cmdline = new Getopt::Declare ($cmdline_grammar, -BUILD);

# TRY STANDARD CONFIG FILES
$config->parse(-CONFIG)

# OTHERWISE, TRY GLOBAL CONFIG
or $config->parse(['/usr/local/config/.demo_rc'])

# OTHERWISE, TRY OPENING A FILEHANDLE (OR JUST GIVE UP)
or $config->parse(new FileHandle (".config") || -SKIP);
# NOW PARSE THE COMMAND LINE
$cmdline->parse() or die;
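Because a defined false (zero) and undef are both false, the or chain above moves on to the next source both when a configuration file is missing and when it exists but fails to parse. If a program should instead stop at a malformed file, it has to test definedness explicitly. The sketch below shows one way to do that; first_good and the stand-in parser passed to it are hypothetical, and only the true / zero / undef convention comes from the description above.

```perl
use strict;
use warnings;

# Try each source in order with $parse, which follows the parse()
# convention: true = parsed, 0 = source not found, undef = parse failed.
sub first_good {
    my ($parse, @sources) = @_;
    for my $src (@sources) {
        my $result = $parse->($src);
        return "failed: $src" unless defined $result;   # found but bad: stop
        return "parsed: $src" if $result;               # success: stop
        # defined but false: source not found, so try the next one
    }
    return 'no source found';
}

# Hypothetical outcomes standing in for real parse() calls.
my %outcome = (home => 0, global => undef, local => 1);
my $stub = sub { $outcome{ $_[0] } };
print first_good($stub, qw(home global local)), "\n";   # failed: global
print first_good($stub, qw(home local)), "\n";          # parsed: local
```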
- The Getopt::Declare::code() method
- It is also possible to retrieve the command-line parsing code generated internally by Getopt::Declare::new(). The Getopt::Declare::code() method returns a string containing the complete command-line processing code, as a single do block plus a leading package declaration. Getopt::Declare::code() takes as its sole argument a string containing the complete name of the package (for the leading package declaration in the generated code). If this string is empty or undefined, the package name defaults to main. Since the default behaviour of Getopt::Declare::new() is to execute the command-line parsing code it generates, if the goal is only to generate the parser code, the optional second '-BUILD' parameter (see Parsing from other sources) should be specified when calling Getopt::Declare::new(). For example, the following program inlines a Getopt::Declare specification, by extracting it from between the first =for Getopt::Declare and the next =cut appearing on STDIN:
use Getopt::Declare;

# Build a parser without running it, and return its generated source
sub encode { return Getopt::Declare->new(shift, -BUILD)->code() || die }

undef $/;                               # slurp all of the input at once
if (defined($_ = <>)) {
    s{^=for\s+Getopt::Declare\s*\n(.*?)\n=cut}
     {'my ($self,$source) = ({});' . encode($1) . ' or die "\n";'}esm;
}
print;
Note that the generated inlined version expects to find a lexical variable named $source, which tells it what to parse (this variable is normally set by the optional parameters of Getopt::Declare::new() or Getopt::Declare::parse()). The inlined code leaves all extracted parameters in the lexical variable $self and does not autogenerate help or version flags (since there is no actual Getopt::Declare object in the inlined code through which to generate them).
AUTOGENERATED FEATURES
Usage information
The specification passed to Getopt::Declare::new is used (almost verbatim) as a usage display whenever usage information is requested. Such requests may be made either by specifying an argument matching the help parameter (see Help parameters) or by explicitly calling the Getopt::Declare::usage() method (through an action or after command-line processing):
$args = new Getopt::Declare (q{
-usage          Show usage information and exit
                { $self->usage(0); }
+usage          Show usage information at end of program
});
# PROGRAM HERE
$args->usage() if $args->{'+usage'};
The following changes are made to the original specification before it is displayed:
- *
- All actions and comments are deleted,
- *
- any CW[ditto] directive is converted to an appropriate set of ditto marks,
- *
- any text in matching square brackets (including any directive) is deleted,
- *
-
any parameter variable type specifier (:i, :n, :/pat/, etc.) is deleted.
Otherwise, the usage information displayed retains all the formatting
present in the original specification.
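Most of the clean-up steps listed above are simple text substitutions. The sketch below applies a simplified version of them to a single specification line of the form parameter, tab, description; usage_line is a hypothetical helper, [ditto] expansion and comment removal are not modelled, and the type-specifier pattern assumes the type itself contains no > character.

```perl
use strict;
use warnings;

# Simplified usage clean-up for one "parameter<TAB>description" line.
sub usage_line {
    my ($line) = @_;
    my ($param, $desc) = split /\t/, $line, 2;
    $param =~ s/<(\w+):[^>]*>/<$1>/g;            # <N:n> becomes <N>
    return $param unless defined $desc;
    $desc =~ s/\s*(\{(?:[^{}]++|(?1))*\})//g;    # drop actions (nested braces allowed)
    $desc =~ s/\s*\[[^\]]*\]//g;                 # drop [directives] and other bracketed text
    return "$param\t$desc";
}

# Prints "-b <N>", a tab, then "Set threshold to <N>".
print usage_line("-b <N:n>\tSet threshold to <N> [required] { bytelen = \$N; }"), "\n";
```

The recursive (?1) pattern lets an action contain its own braced blocks, such as an if statement, and still be removed as a unit.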
In addition to this information, if the input source is @ARGV,
Getopt::Declare displays three sample command-lines: one indicating
the normal usage (including any required parameter variables), one
indicating how to invoke help (see Help parameters), and one
indicating how to determine the current version of the program (see
Version parameters).
The usage information is printed to STDOUT and (since Getopt::Declare
tends to encourage longer and better-documented parameter lists) if
the IO::Pager package is available, an IO::Pager object is used to
page out the entire usage documentation.
It is sometimes convenient to add other decorative features to a
program's usage information, such as subheadings, general notes,
separators, etc. Getopt::Declare accommodates this need by ignoring
such items when interpreting a specification string, but printing them
when asked for usage information.
Any line which cannot be interpreted as either a parameter
definition, a parameter description, or a parameter action, is treated
as a decorator line, and is printed verbatim (after any square
bracketed substrings have been removed from it).
The key to successfully decorating Getopt::Declare usage
information is to ensure that decorator lines are separated from
any preceding parameter specification, either by an action or by an
empty line. In addition, like a parameter description, a decorator
line cannot contain a tab character after the first non-whitespace
character (because it would then be treated as a parameter
specification).
The following specification demonstrates various forms of usage
decoration. In fact, there are only four actual parameters (-in,
-r, -p, and -out) specified. Note in particular that leading tabs
are perfectly acceptable in decorator lines.
$args = new Getopt::Declare (<<'EOPARAM');
============================================================
Required parameter:
-in <infile> Input file [required]
------------------------------------------------------------
Optional parameters:
(The first two are mutually exclusive) [mutex: -r -p]
-r[and[om]] Output in random order -p[erm[ute]] Output all permutations
---------------------------------------------------
-out <outfile> Optional output file
------------------------------------------------------------
Note: this program is known to run very slowly on files with long individual lines.
============================================================
EOPARAM
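The tab rule for decorator lines is easy to state as a pattern. The following hypothetical helper reports whether a line has a tab after its first non-whitespace character (and so would be read as a parameter specification); it checks only that one rule, not whether the line is separated from a preceding specification.

```perl
use strict;
use warnings;

# 1 if a tab follows the first non-whitespace character, else 0.
# Leading tabs alone do not count, so indented decorators are safe.
sub has_spec_tab {
    my ($line) = @_;
    return $line =~ /^\s*\S[^\t]*\t/ ? 1 : 0;
}

print has_spec_tab("-in <infile>\tInput file [required]"), "\n";   # 1
print has_spec_tab("\tOptional parameters:"), "\n";               # 0
```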
Help parameters
By default, Getopt::Declare automatically defines all of the following parameters:
-help Show usage information [undocumented]
{ $self->usage(0); }
-Help [ditto]
-HELP [ditto]
--help [ditto]
--Help [ditto]
--HELP [ditto]
-h [ditto]
-H [ditto]
Hence, most attempts by the user to get help will automatically work
successfully.
Note however that, if a parameter with any of these flags is
explicitly specified in the string passed to Getopt::Declare::new(),
that flag (only) is removed from the list of possible help flags. For
example:
-w <pixels:+i>  Specify width in pixels
-h <pixels:+i>  Specify height in pixels
would cause the -h help parameter to be removed (although help would still be accessible through the other seven alternatives).
Version parameters
Getopt::Declare also automatically creates a set of parameters which can be used to retrieve program version information:
-version Show version information [undocumented]
{ $self->version(0); }
-Version [ditto]
-VERSION [ditto]
--version [ditto]
--Version [ditto]
--VERSION [ditto]
-v [ditto]
-V [ditto]
As with the various help commands, explicitly specifying a parameter
with any of the above flags removes that flag from the list of version
flags.
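The rule that an explicitly specified flag removes only itself from the automatic help and version flags is a set difference. The sketch below uses the two flag lists given above; remaining_flags is a hypothetical helper, not part of Getopt::Declare.

```perl
use strict;
use warnings;

# Default help and version flags, as listed above.
my @help_flags    = qw(-help -Help -HELP --help --Help --HELP -h -H);
my @version_flags = qw(-version -Version -VERSION --version --Version --VERSION -v -V);

# Keep every default flag that the user's specification does not define.
sub remaining_flags {
    my ($defaults, @user_flags) = @_;
    my %taken = map { $_ => 1 } @user_flags;
    return grep { !$taken{$_} } @$defaults;
}

# The -w/-h specification above removes only -h from the help flags.
my @left = remaining_flags(\@help_flags, '-w', '-h');
print "@left\n";    # -help -Help -HELP --help --Help --HELP -H
```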
DIAGNOSTICS
Getopt::Declare may issue the following diagnostics whilst parsing a command-line. All of them are fatal (the first five, instantly so):
- A matching pair of angle brackets was specified as part of a parameter definition, but did not form a valid parameter variable specification (that is, it wasn't in the form <name> or <name:type>).
- An unknown type specifier was used in a parameter variable type suffix.
- A Perl syntax error was detected in the indicated action.
- An action was found for which there was no preceding parameter specification. This usually occurs because the trailing tab was omitted from the preceding parameter specification.
- An action was found, but it was missing one or more closing '}'s.
- The condition specified as part of the indicated [requires:...] directive was not a well-formed boolean expression. Common problems include: omitting a &&/|| operator between two flags, mismatched brackets, or using and/or instead of &&/||.
- Either there was a Perl syntax error in some action (which was not caught by the previous diagnostic), or (less likely) there is a bug in the code generator inside Getopt::Declare.
- The flag for the indicated parameter was found, but the argument did not then match any of that parameter's variant syntaxes.
- Two mutually exclusive flags were specified together.
- No argument matching the specified required parameter was found during command-line processing.
- The indicated parameter has a [requires:...] directive, which was not satisfied.
- A command-line argument was encountered which did not match any specified parameter. This diagnostic can only appear if the strict option is in effect.
- A parameter variable in the indicated parameter was declared with the type :+i (or a type derived from it), but the corresponding argument was not a positive, non-zero integer.
- A parameter variable in the indicated parameter was declared with the type :+n (or a type derived from it), but the corresponding argument was not a positive, non-zero number.
- A parameter variable in the indicated parameter was declared with the type :0+i (or a type derived from it), but the corresponding argument was not a positive integer (or zero).
- A parameter variable in the indicated parameter was declared with the type :0+n (or a type derived from it), but the corresponding argument was not a positive number (or zero).
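The four numeric checks behind the last diagnostics can be approximated with a few predicates. These are assumptions for illustration only: the %valid table is hypothetical, and the patterns Getopt::Declare itself uses to recognize integers and numbers may differ (for example in how they treat exponents or leading signs).

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# Hypothetical validators for the :+i, :+n, :0+i and :0+n types.
my %valid = (
    '+i'  => sub { $_[0] =~ /^\+?\d+\z/ && $_[0] > 0 },        # integer > 0
    '+n'  => sub { looks_like_number($_[0]) && $_[0] > 0 },    # number > 0
    '0+i' => sub { $_[0] =~ /^\+?\d+\z/ },                     # integer >= 0
    '0+n' => sub { looks_like_number($_[0]) && $_[0] >= 0 },   # number >= 0
);

# Show which types would accept each sample argument.
for my $arg ('0', '3', '2.5', '-1') {
    my @ok = grep { $valid{$_}->($arg) } sort keys %valid;
    print "$arg: @ok\n";
}
```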
AUTHOR
Damian Conway (damian@conway.org)
BUGS AND ANNOYANCES
There are undoubtedly serious bugs lurking somewhere in this code. If nothing else, it shouldn't take 1500 lines to explain a package that was designed for intuitive ease of use! Bug reports and other feedback are most welcome.
COPYRIGHT
Copyright (c) 1997-2000, Damian Conway. All Rights Reserved. This module is free software. It may be used, redistributed and/or modified under the terms of the Perl Artistic License (see http://www.perl.com/perl/misc/Artistic.html)