man Text::Diff () - Perform diffs on files and record sets

NAME

Text::Diff - Perform diffs on files and record sets

SYNOPSIS

    use Text::Diff;

    ## Mix and match filenames, strings, file handles, producer subs,
    ## or arrays of records; returns diff in a string.
    ## WARNING: can return B<large> diffs for large files.
    my $diff = diff "file1.txt", "file2.txt", { STYLE => "Context" };
    my $diff = diff \$string1,   \$string2,   \%options;
    my $diff = diff \*FH1,       \*FH2;
    my $diff = diff \&reader1,   \&reader2;
    my $diff = diff \@records1,  \@records2;

    ## May also mix input types:
    my $diff = diff \@records1,  "file_B.txt";

DESCRIPTION

CWdiff() provides a basic set of services akin to the GNU CWdiff utility. It is not anywhere near as feature complete as GNU CWdiff, but it is better integrated with Perl and available on all platforms. It is often faster than shelling out to a system's CWdiff executable for small files, and generally slower on larger files.

Relies on Algorithm::Diff for, well, the algorithm. This may not produce the same exact diff as a system's local CWdiff executable, but it will be a valid diff and comprehensible by CWpatch. We haven't seen any differences between Algorithm::Diff's logic and GNU diff's, but we have not examined them to make sure they are indeed identical.

Note: If you don't want to import the CWdiff function, do one of the following:

   use Text::Diff ();

   require Text::Diff;

That's a pretty rare occurence, so CWdiff() is exported by default.

OPTIONS

diff() takes two parameters from which to draw input and a set of options to control it's output. The options are:

FILENAME_A, MTIME_A, FILENAME_B, MTIME_B
The name of the file and the modification time files These are filled in automatically for each file when diff() is passed a filename, unless a defined value is passed in. If a filename is not passed in and FILENAME_A and FILENAME_B are not provided or CWundef, the header will not be printed. Unused on CWOldStyle diffs.
OFFSET_A, OFFSET_B
The index of the first line / element. These default to 1 for all parameter types except ARRAY references, for which the default is 0. This is because ARRAY references are presumed to be data structures, while the others are line oriented text.
STYLE
Unified, Context, OldStyle, or an object or class reference for a class providing CWfile_header(), CWhunk_header(), CWhunk(), CWhunk_footer() and CWfile_footer() methods. The two footer() methods are provided for overloading only; none of the formats provide them. Defaults to Unified (unlike standard CWdiff, but Unified is what's most often used in submitting patches and is the most human readable of the three. If the package indicated by the STYLE has no hunk() method, c<diff()> will load it automatically (lazy loading). Since all such packages should inherit from Text::Diff::Base, this should be marvy. Styles may be specified as class names (CWSTYLE = Foo), in which case they will be CWnew()ed with no parameters, or as objects (CWSTYLE = Foo->new>).
CONTEXT
How many lines before and after each diff to display. Ignored on old-style diffs. Defaults to 3.
OUTPUT
Examples and their equivalent subroutines:
    OUTPUT   => \*FOOHANDLE,   # like: sub { print FOOHANDLE shift() }
    OUTPUT   => \$output,      # like: sub { $output .= shift }
    OUTPUT   => \@output,      # like: sub { push @output, shift }
    OUTPUT   => sub { $output .= shift },
If no CWOUTPUT is supplied, returns the diffs in a string. If CWOUTPUT is a CWCODE ref, it will be called once with the (optional) file header, and once for each hunk body with the text to emit. If CWOUTPUT is an IO::Handle, output will be emitted to that handle.
FILENAME_PREFIX_A, FILENAME_PREFIX_B
The string to print before the filename in the header. Unused on CWOldStyle diffs. Defaults are CW"---", CW"+++" for Unified and CW"***", CW"+++" for Context.
KEYGEN, KEYGEN_ARGS
These are passed to traverse_sequences in Algorithm::Diff.

Note: if neither CWFILENAME_ option is defined, the header will not be printed. If at one is present, the other and both MTIME_ options must be present or Use of undefined variable warnings will be generated (except on CWOldStyle diffs, which ignores these options).

Formatting Classes

These functions implement the output formats. They are grouped in to classes so diff() can use class names to call the correct set of output routines and so that you may inherit from them easily. There are no constructors or instance methods for these classes, though subclasses may provide them if need be.

Each class has file_header(), hunk_header(), hunk(), and footer() methods identical to those documented in the Text::Diff::Unified section. header() is called before the hunk() is first called, footer() afterwards. The default footer function is an empty method provided for overloading:

    sub footer { return "End of patch\n" }

Some output formats are provided by external modules (which are loaded automatically), such as Text::Diff::Table. These are are documented here to keep the documentation simple.

Text::Diff::Base

Returns "" for all methods (other than CWnew()).

Text::Diff::Unified

    --- A   Mon Nov 12 23:49:30 2001
    +++ B   Mon Nov 12 23:49:30 2001
    @@ -2,13 +2,13 @@
     2
     3
     4
    -5d
    +5a
     6
     7
     8
     9
    +9a
     10
     11
    -11d
     12
     13
file_header
    $s = Text::Diff::Unified->file_header( $options );
Returns a string containing a unified header. The sole parameter is the options hash passed in to diff(), containing at least:
    FILENAME_A  => $fn1,
    MTIME_A     => $mtime1,
    FILENAME_B  => $fn2,
    MTIME_B     => $mtime2
May also contain
    FILENAME_PREFIX_A    => "---",
    FILENAME_PREFIX_B    => "+++",
to override the default prefixes (default values shown).
hunk_header
    Text::Diff::Unified->hunk_header( \@ops, $options );
Returns a string containing the output of one hunk of unified diff.
Text::Diff::Unified::hunk
    Text::Diff::Unified->hunk( \@seq_a, \@seq_b, \@ops, $options );
Returns a string containing the output of one hunk of unified diff.

Text::Diff::Table

 +--+----------------------------------+--+------------------------------+
 |  |../Test-Differences-0.2/MANIFEST  |  |../Test-Differences/MANIFEST  |
 |  |Thu Dec 13 15:38:49 2001          |  |Sat Dec 15 02:09:44 2001      |
 +--+----------------------------------+--+------------------------------+
 |  |                                  * 1|Changes                       *
 | 1|Differences.pm                    | 2|Differences.pm                |
 | 2|MANIFEST                          | 3|MANIFEST                      |
 |  |                                  * 4|MANIFEST.SKIP                 *
 | 3|Makefile.PL                       | 5|Makefile.PL                   |
 |  |                                  * 6|t/00escape.t                  *
 | 4|t/00flatten.t                     | 7|t/00flatten.t                 |
 | 5|t/01text_vs_data.t                | 8|t/01text_vs_data.t            |
 | 6|t/10test.t                        | 9|t/10test.t                    |
 +--+----------------------------------+--+------------------------------+

This format also goes to some pains to highlight invisible characters on differing elements by selectively escaping whitespace:

 +--+--------------------------+--------------------------+
 |  |demo_ws_A.txt             |demo_ws_B.txt             |
 |  |Fri Dec 21 08:36:32 2001  |Fri Dec 21 08:36:50 2001  |
 +--+--------------------------+--------------------------+
 | 1|identical                 |identical                 |
 * 2|        spaced in         |        also spaced in    *
 * 3|embedded space            |embedded        tab       *
 | 4|identical                 |identical                 |
 * 5|        spaced in         |\ttabbed in               *
 * 6|trailing spaces\s\s\n     |trailing tabs\t\t\n       *
 | 7|identical                 |identical                 |
 * 8|lf line\n                 |crlf line\r\n             *
 * 9|embedded ws               |embedded\tws              *
 +--+--------------------------+--------------------------+
See Text::Diff::Table for more details, including how the whitespace escaping works.

Text::Diff::Context

    *** A   Mon Nov 12 23:49:30 2001
    --- B   Mon Nov 12 23:49:30 2001
    ***************
    *** 2,14 ****
      2
      3
      4
    ! 5d
      6
      7
      8
      9
      10
      11
    - 11d
      12
      13
    --- 2,14 ----
      2
      3
      4
    ! 5a
      6
      7
      8
      9
    + 9a
      10
      11
      12
      13

Note: hunk_header() returns only ***************\n.

LIMITATIONS

Must suck both input files entirely in to memory and store them with a normal amount of Perlish overhead (one array location) per record. This is implied by the implementation of Algorithm::Diff, which takes two arrays. If Algorithm::Diff ever offers an incremental mode, this can be changed (contact the maintainers of Algorithm::Diff and Text::Diff if you need this; it shouldn't be too terribly hard to tie arrays in this fashion). Does not provide most of the more refined GNU diff options: recursive directory tree scanning, ignoring blank lines / whitespace, etc., etc. These can all be added as time permits and need arises, many are rather easy; patches quite welcome. Uses closures internally, this may lead to leaks on CWperl versions 5.6.1 and prior if used many times over a process' life time.

AUTHOR

Barrie Slaymaker <barries@slaysys.com>.

COPYRIGHT & LICENSE

Copyright 2001, Barrie Slaymaker. All Rights Reserved. You may use this under the terms of either the Artistic License or GNU Public License v 2.0 or greater.