man VCP::Filter::map () - rewrite name, branch_id or delete revisions

NAME

VCP::Filter::map - rewrite name, branch_id or delete revisions

SYNOPSIS

  ## In a .vcp file:

    Map:
            name_glob_1<branch_1> name_out_1<branch_result_1>
            name_glob_2<branch_2> name_out_2<branch_result_2>
            # ... etc ...

  ## From the command line:
   vcp <source> map: name_glob_1<branch_1> name_out_1<branch_result_1> -- <dest>

  ## You may have one or more ( pattern, result ) pairs on the command
  ## line, ending with --

  ## the <branch> part of the maps is optional.

DESCRIPTION

Maps source files, revisions, and branches to destination files and branches while copying a repository. This is done by rewriting the name and branch_id of revisions according to a list of rules.

Rules

A rule is a pair of expressions specifying a pattern to match against each incoming revision's name and branch_id and a replacement expression specifying the revision's new name and branch_id.

The list of rules is evaluated top down; the first rule in the list that matches is used to generate the new name and branch_id. If no rule matches, the implicit default rule copies the file as is.

Patterns and Replacement Expressions

Patterns and replacements are each composed of two subexpressions, the name_expr and the branch_id_expr, like so:

    name_expr<branch_id_expr>

The <branch_id_expr> (including angle brackets) is optional and may be forbidden by some sources or destinations that embed the concept of a branch in the name_expr. (See VCP::Dest::p4 for an example, though this may be changed in the future.)

For now, the symbols # and @ are reserved for future use in all expressions and must be escaped using \, and various shell-like wildcards are implemented in pattern expressions.

Pattern Expressions

Both the name_expr and branch_id_expr specify patterns using shell-style wildcard syntax, with the extension that parentheses are used to extract portions of the match into numbered variables which may be used in constructing the result, as in Perl regular expressions:

   ?      Matches one character other than "/"
   *      Matches zero or more characters other than "/"
   ...    Matches zero or more characters, including "/"
   (foo)  Matches "foo" and stores it in the next numbered variable ($1, $2, etc.)

Some example pattern name_exprs are:

   Pattern
   name_expr  Matches
   =========  =======
   foo        the top level file "foo"
   foo/bar    the file "foo/bar"
   ...        all files (like a missing name_expr)
   foo/...    all files under "foo/"
   .../bar    all files named "bar" anywhere
   */bar      all files named "bar" one dir down
   ....pm     all files ending in ".pm"
   ?.pm       all top level 4 char files ending in ".pm"
   \?.pm      the top level file "?.pm"
   (*)/...    all files in subdirs, puts the top level dirname in $1
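The wildcard table and the examples above can be read as a small translation into ordinary regular expressions. The following is only an illustrative sketch in Python, not how VCP itself compiles patterns; the function name name_pattern_to_regex is hypothetical, and edge cases such as rev_root handling and reserved characters are ignored.

```python
import re

def name_pattern_to_regex(pattern):
    """Translate a map-style name pattern into a compiled Python regex.

    Hypothetical helper for illustration only; use .fullmatch() on the
    result so the whole name has to match.
    """
    out = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("...", i):
            out.append(".*")            # "..." crosses "/" boundaries
            i += 3
        elif pattern[i] == "\\" and i + 1 < len(pattern):
            out.append(re.escape(pattern[i + 1]))  # escaped char is literal
            i += 2
        elif pattern[i] == "*":
            out.append("[^/]*")         # "*" stops at "/"
            i += 1
        elif pattern[i] == "?":
            out.append("[^/]")          # exactly one non-"/" character
            i += 1
        elif pattern[i] in "()":
            out.append(pattern[i])      # parentheses become capture groups
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(out))

m = name_pattern_to_regex("(*)/...").fullmatch("src/lib/Foo.pm")
print(m.group(1))  # the top level directory name, as $1 would hold
```

Note that "....pm" is read here as the "..." wildcard followed by a literal ".pm", which agrees with the table entry for files ending in ".pm".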

Unix-style slashes are used, even on operating systems where that may not be the preferred local custom. A pattern consisting of the empty string is legal and matches everything. (NOTE: currently there is no way to take advantage of this; quoting is not implemented in the forms parser yet. Use ... instead.)

Relative paths are taken relative to the rev_root indicated in the source specification for pattern name_exprs (or in the destination specification for result name_exprs). For now, a relative path is a path that does not begin with the character /, so be aware that the pattern (/) is relative. This is a limitation of the implementation and may change. Until it does, don't rely on a leading ( making a path relative; use multiple rules to match multiple absolute paths.

If no name_expr is provided, ... is assumed and the pattern will match on all filenames.

Some example pattern branch_id_exprs are:

    Pattern
    branch_id_expr  Matches files on
    =============   ================
    <>              no branch label
    <...>           all branches (like a missing <branch_id_expr>)
    <foo>           branch "foo"
    <R...>          branches beginning with "R"
    <R(...)>        branches beginning with "R", the other chars in $1

If no branch_id_expr is provided, files on all branches are matched. * and ... still match differently in pattern branch_id_exprs, as in name_expr patterns, but this is likely to make no difference, as I've not yet seen a branch label with a / in it. Still, it is wise to avoid * in branch_id_expr patterns.

Some example composite patterns are (any $ variables set are given in parentheses):

    Pattern            Matches
    =======            =======
    foo<>              top level files named "foo" not on a branch
    (...)<>            all files not on a branch ($1)
    (...)/(...)<>      all files not on a branch ($1,$2)
    ...<R1>            all files on branch "R1"
    .../foo<R...>      all files "foo" on branches beginning with "R"
    (...)/foo<R(...)>  all files "foo" on branches beginning with "R" ($1, $2)
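To make the name_expr<branch_id_expr> layout of these composite patterns concrete, here is a minimal Python sketch that splits an expression into its two parts. The function split_expr is hypothetical and simplified: it assumes that an unescaped < starts the branch part and that the expression then ends with >, and it does not validate escapes inside the branch part.

```python
def split_expr(expr):
    """Split 'name_expr<branch_id_expr>' into (name_expr, branch_id_expr).

    Hypothetical helper for illustration. branch_id_expr is None when the
    <...> part is missing altogether, and "" for an explicit "<>".
    """
    i = 0
    while i < len(expr):
        if expr[i] == "\\":
            i += 2                      # skip the escaped character
            continue
        if expr[i] == "<":
            if not expr.endswith(">"):
                raise ValueError("unterminated <branch_id_expr> in %r" % expr)
            return expr[:i], expr[i + 1:-1]
        i += 1
    return expr, None

print(split_expr("(...)/foo<R(...)>"))
```

Keeping None distinct from the empty string preserves the difference between "match all branches" (no branch part) and "match files not on a branch" (an explicit <>).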

Escaping

Null characters and newlines are forbidden in all expressions.

The characters #, @, [, ], {, }, >, < and $ must be escaped using a \, as must any wildcard characters meant to be taken literally.

In result expressions, the wildcard characters *, ?, the wildcard trigraph ... and parentheses must each be escaped with a single \ as well.

No other characters are to be escaped.

Case sensitivity

By default, all patterns are case sensitive. There is no way to override this at present; one will be added.

Result Expressions

Result expressions look a lot like pattern expressions except that wildcards are not allowed and $1 and ${1} style variable interpolation is.
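As a rough illustration of the interpolation, the following Python sketch expands $1 and ${1} style variables from a list of captured strings and treats a backslash as making the next character literal. The name expand_result is hypothetical, and the sketch does not reproduce VCP's own error checking.

```python
import re

# Either a backslash escape, or a $N / ${N} variable reference.
_VAR = re.compile(r"\\(.)|\$(?:\{(\d+)\}|(\d+))")

def expand_result(result, captures):
    """Expand $N and ${N} in `result` from `captures` (where $1 is captures[0]).

    Hypothetical helper for illustration only.
    """
    def repl(m):
        if m.group(1) is not None:
            return m.group(1)           # "\x" yields a literal "x"
        index = int(m.group(2) or m.group(3))
        return captures[index - 1]
    return _VAR.sub(repl, result)

print(expand_result("main/$1", ["foo/bar"]))
```

The ${1} form is useful when the variable is followed directly by a digit, since $10 would otherwise be read as variable number 10.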

To explore result expressions, let's look at converting a set of example files between cvs and p4 repositories. The difficulty here is that cvs and p4 have differing branching implementations.

Let's assume our CVS repository has a module named flibble with a file named foo/bar in it. Here is a branch diagram, with the main development trunk shown down the left (1.1 through 1.6, etc.) and a single branch, tagged in CVS with a branch tag of beta_1, shown forking off version 1.5:

     flibble/foo/bar:

         1.1
          |
         ...
          |
         1.5
          | \
          |  \ beta_1
          |   \
         1.6   \
          |    1.5.2.1
         ...    |
                |
               1.5.2.2
                |
               ...

    NOTE 1: You can use vcp to extract graphical branch diagrams by
    installing AT&T's GraphViz package and the Perl CPAN module
    GraphViz.pm.  Then you can use a command like:

        $ vcp cvs:/var/cvsroot:flibble/foo/bar \
            branch_diagram:foo_bar.png

    to generate a .png file showing something like the above diagram.

On the other hand, p4 users typically branch files using directory names. Here's file foo/bar again, with the main trunk held in the main depot's //depot/main directory, again with a branch after the 5th version of the file, but this time the branch is represented by taking a copy of the file under a different directory:

    //depot/main/foo/bar

         #1
          |
         ...
          |
         #5
          |\
          | \ //depot/beta_1/foo/bar
          |  \
         #6   \
          |   #1
         ...   |
               |
              #2
               |
              ...

    NOTE 2: the p4 command allows users to branch in very crafty and
    creative ways; it does not enforce the semantic of 1 branch per
    directory, and this gives p4 users a lot of power and flexibility.
    It also means that you might need some pretty crafty and creative
    branch maps when converting from p4 to other repositories.

    NOTE 3: that branch looks like a copy, but is actually just a
    metadata entry in the perforce repository, so it's very low
    overhead in terms of server effort and disk space, usually
    even more so than CVS branches.

    NOTE 4: Using GraphViz (as described in NOTE 1 above), you can
    build a diagram like this using vcp:

        $ vcp p4:perforce.our.com:1666://depot/flibble/foo/bar \
            branch_diagram:foo_bar.png

A user may or may not choose to label a branch in p4 with something called a branch specification (see p4 help branch for details). For this discussion, we'll assume they didn't.

First, let's look at cvs -> p4 conversion. To do this, we need to match the branch tags in the CVS repository and use them to map branched files into a p4 subdirectory. Here's a .vcp file for this:

   ## cvs2p4.vcp

   Source:
   # get all files in the flibble module from cvs
       cvs:/var/cvsroot:flibble/...

   Destination:
   # Put the files in the flibble directory in the main depot of p4
       p4:perforce.our.com:1666://depot/flibble/...

   Map:
   #   Pattern       Result
   #   ============  =======
       (...)<>       main/$1   # main trunk => //depot/flibble/main/...
       (...)<(...)>  $2/$1     # branches   => //depot/flibble/$branch/...

The Source: and Destination: fields are just pieces of a normal vcp command line moved into cvs2p4.vcp. The Map: field is a list of rules composed of ( pattern, result expression ) pairs.

In this example, all of the map expressions are relative paths. The patterns are relative to the Source: cvs repository's "flibble" module. The results are relative to the Destination: p4 repository's "//depot/flibble/" directory.

The first rule maps all files that have no branch tag into the p4 directory //depot/flibble/main/. The (...)<> pattern has two parts: a name part and a branch_id part. The name part, (...), matches all path names and copies them to the $1 variable. The branch_id part, <>, matches empty / missing branch_ids (vcp's name for the CVS branch tag associated with a file on a branch). The main/$1 result retrieves the name part stored in $1 and prefixes it with "main/" to build the final name value.

The second rule maps all files on branches to an appropriately named subdirectory in the p4 destination. The pattern is a lot like the first rule's, but has a branch_id part that matches all branch_ids and copies them into $2. The rule merely uses this branch_id from $2 instead of the hardcoded "main/" string to place the branches in appropriate subdirectories.

Here's how the versions of our flibble/foo/bar file fare when passed through this mapping:

    CVS flibble/...              p4 //depot/flibble/...
    ========================     ======================

    foo/bar#1.1                  main/foo/bar#1
    foo/bar#1.2                  main/foo/bar#2
    ...                          ...
    foo/bar#1.5.2.1              beta_1/foo/bar#1
    foo/bar#1.5.2.2              beta_1/foo/bar#2
    ...                          ...

It's up to you to be sure there are no branches tagged "main" in the CVS repository. Also, no branch specification will be created in the target p4 repository (this is a limitation that should be fixed).
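The two rules from cvs2p4.vcp can be traced by hand with a small Python sketch. This is an assumption-laden illustration rather than VCP's implementation: the RULES table is a hand translation of the two patterns into Python regexes, an empty string stands in for "no branch tag", and map_revision is a hypothetical name.

```python
import re

# Hand-translated stand-ins for the two Map: rules (hypothetical encoding):
#   (name regex, branch_id regex, result template)
# {0} plays the role of $1 and {1} the role of $2.
RULES = [
    (r"(.*)", r"",     "main/{0}"),   # (...)<>       main/$1
    (r"(.*)", r"(.*)", "{1}/{0}"),    # (...)<(...)>  $2/$1
]

def map_revision(name, branch_id):
    """Return the destination name for a revision, using the first matching rule."""
    for name_re, branch_re, template in RULES:
        n = re.fullmatch(name_re, name)
        b = re.fullmatch(branch_re, branch_id)
        if n and b:
            # Captures are numbered across both parts: name first, then branch.
            return template.format(*(n.groups() + b.groups()))
    return name                        # nothing matched: pass through unchanged

print(map_revision("foo/bar", ""))
print(map_revision("foo/bar", "beta_1"))
```

Because the trunk rule only accepts an empty branch_id, a branched revision falls through to the second rule, which is the behaviour shown in the table above.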

Result Actions: <<delete>> and <<keep>>

The result expression <<delete>> deletes the revision, while the result expression <<keep>> passes it through unchanged:

    Map:
    #   Pattern            Result
    #   =================  ==========
        old_stuff/.../*.c  <<keep>>    # Keep these files
        old_stuff/...      <<delete>>  # Delete everything else in old_stuff/

<<delete>> and <<keep>> may not be embedded in a larger result expression; they are standalone tokens and must be the entire result.

The default rule

There is a default rule

    ...  <<keep>>  ## Default rule: passes everything through as-is

that is evaluated after all the other rules. Thus, if no other rule matches a revision, it is passed through unchanged.
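The interaction between first-match evaluation, the <<delete>> and <<keep>> actions, and the default rule can be sketched as follows. This is an illustrative Python model with hypothetical names (apply_rules and the scratch/ paths are invented for the example); patterns are given as already-compiled regexes to keep the sketch short.

```python
import re

DELETE, KEEP = "<<delete>>", "<<keep>>"

def apply_rules(rules, name):
    """Return the new name, the unchanged name for <<keep>>, or None for <<delete>>.

    `rules` is a list of (compiled regex, result) pairs tried top down; the
    implicit "... <<keep>>" default rule is appended last.
    """
    for regex, result in rules + [(re.compile(r".*"), KEEP)]:
        m = regex.fullmatch(name)
        if m is None:
            continue
        if result == DELETE:
            return None
        if result == KEEP:
            return name
        return result.format(*m.groups())   # {0} stands in for $1
    return name

rules = [
    (re.compile(r"scratch/keep_me\.txt"), KEEP),    # narrower rule first
    (re.compile(r"scratch/.*"), DELETE),
    (re.compile(r"src/(.*)"), "lib/{0}"),
]
for name in ("scratch/keep_me.txt", "scratch/junk", "src/a.c", "README"):
    print(name, "->", apply_rules(rules, name))
```

Since the first matching rule wins, the narrower <<keep>> rule is listed ahead of the broader <<delete>> rule in this sketch; in the opposite order the <<keep>> rule would never be reached.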

Command Line Parsing

For large maps or repeated use, the map is best specified in a .vcp file. For quick one-offs or scripted situations, however, the map: scheme may be used on the command line. In this case, each parameter is a word (separated by whitespace) and every pair of words is a ( pattern, result ) pair.

Because vcp command line parsing is performed incrementally and the next filter or destination specifications can look exactly like a pattern or result, the special token -- is used to terminate the list of patterns provided on the command line. This may also be the last word in the CWMap: section of a .vcp file, but that is superfluous. It is an error to use -- before the last word in a .vcp file.
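The pairing of words and the role of the -- terminator can be illustrated with a short Python sketch. It is a simplified model, not vcp's parser: parse_map_args is a hypothetical name, and treating a missing -- or an odd number of words as errors is an assumption made for the example.

```python
def parse_map_args(argv):
    """Consume ( pattern, result ) word pairs from argv up to the "--" terminator.

    Returns (rules, remaining_args). Hypothetical helper for illustration.
    """
    if "--" not in argv:
        raise ValueError("map: rule list must be terminated by '--'")
    end = argv.index("--")
    words = argv[:end]
    if len(words) % 2:
        raise ValueError("map: expected an even number of words (pattern, result pairs)")
    rules = list(zip(words[0::2], words[1::2]))
    return rules, argv[end + 1:]

rules, rest = parse_map_args(["(...)<>", "main/$1", "--", "p4:..."])
print(rules, rest)
```

Everything after the terminator is handed back untouched, so the next filter or destination specification can be parsed by whatever comes next.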

LIMITATIONS

There is no way (yet) of telling the mapper to continue processing the rules list. We could implement labels like <label> to be allowed before pattern expressions (but not between pattern and result), and we could then implement <goto label>. And a <next> could be used to fall through to the next label. All of which is wonderful, but I want to gain some real world experience with the current system and find a use case for gotos and fallthroughs before I implement them. This comment is here to solicit feedback :).

AUTHOR

Barrie Slaymaker <barries@slaysys.com>

COPYRIGHT

Copyright (c) 2000, 2001, 2002 Perforce Software, Inc. All rights reserved.

See VCP::License (vcp help license) for the terms of use.