man SWISH::API::Common () - SWISH Document Indexing Made Easy

NAME

SWISH::API::Common - SWISH Document Indexing Made Easy

SYNOPSIS

    use SWISH::API::Common;

    my $swish = SWISH::API::Common->new();

        # Index all files in a directory and its subdirectories
    $swish->index("/usr/local/share/doc");

        # After indexing once (it's persistent), fire up as many
        # queries as you like:

        # Search documents containing both "swish" and "install"
    for my $hit ($swish->search("swish AND install")) {
        print $hit->path(), "\n";
    }

DESCRIPTION

CWSWISH::API::Common offers an easy interface to the Swish index engine. While SWISH::API offers a complete API, CWSWISH::API::Common focusses on ease of use.

THIS MODULE IS CURRENTLY UNDER DEVELOPMENT. THE API MIGHT CHANGE AT ANY TIME.

Currently, CWSWISH::API::Common just allows for indexing documents in a single directory and any of its subdirectories. Also, don't run index() and search() in parallel yet.

INSTALLATION

CWSWISH::API::Common requires CWSWISH::API and the swish engine to be installed. Please download the latest release from

    http://swish-e.org/distribution/swish-e-2.4.3.tar.gz

and untar it, type

    ./configure
    make
    make install

and then install SWISH::API which is contained in the distribution:

    cd perl
    perl Makefile.PL
    make 
    make install

METHODS

$sw = SWISH::API::Common->new()
Constructor. Takes many options, but the defaults are usually fine. Available options and their defaults:
        # Where SWISH::API::Common stores index files etc.
    swish_adm_dir   "$ENV{HOME}/.swish-common"
        # The path to swish-e, relative is OK
    swish_exe       "swish-e"
        # Swish index file
    swish_idx_file  "$self->{swish_adm_dir}/default.idx"
        # Swish configuration file
    swish_cnf_file  "$self->{swish_adm_dir}/default.cnf"
        # SWISH Stemming
    swish_fuzzy_indexing_mode => "Stemming_en"
        # Maximum amount of data (in bytes) extracted
        # from a single file
    file_len_max 100_000
        # Preserve every indexed file's atime
    atime_preserve
$sw->index($dir, ...)
Generate a new index of all text documents under directory CW$dir. One or more directories can be specified. Searches the index, using the given search expression. Returns a list hits, which can be asked for their path:
        # Search documents containing 
        # both "foo" and "bar"
    for my $hit ($swish->search("foo AND bar")) {
        print $hit->path(), "\n";
    }
index_remove
Permanently delete the current index.

TODO List

    * More than one index directory
    * Remove documents from index
    * Iterator for search hits

LEGALESE

Copyright 2005 by Mike Schilli, all rights reserved. This program is free software, you can redistribute it and/or modify it under the same terms as Perl itself.

AUTHOR

2005, Mike Schilli <cpan@perlmeister.com>