man estcmd (Commands) - command line interface of the core API

NAME

estcmd - command line interface of the core API

SYNOPSIS

estcmd put [-cl] db [file]

estcmd out [-cl] [-pc enc] db expr

estcmd edit [-cl] [-pc enc] db expr name [value]

estcmd get [-pc enc] db expr [attr]

estcmd list [-lp] db

estcmd uriid [-pc enc] db expr

estcmd meta db [name [value]]

estcmd inform db

estcmd optimize [-onp] [-ond] db

estcmd search [-ic enc] [-vu|-va|-vf|-vs|-vh|-vx|-dd] [-kn num] [-ec] [-gs|-gf|-ga] [-cd] [-ni] [-sf] [-hs] [-attr expr] [-ord expr] [-max num] [-sk num] [-sim id] db [phrase]

estcmd gather [-cl] [-no] [-fe|-ft|-fh|-fm] [-fx sufs cmd] [-fz] [-fo] [-rm sufs] [-ic enc] [-il lang] [-bc] [-pc enc] [-px name] [-apn] [-sd] [-cm] [-cs num] db [file|dir]

estcmd purge [-cl] [-no] [-fc] [-pc enc] [-attr expr] db [prefix]

estcmd extkeys [-no] [-fc] [-dfdb file] [-ni] [-kn num] [-attr expr] db [prefix]

estcmd words [-dfdb file] db

estcmd draft [-ft|-fh|-fm] [-ic enc] [-il lang] [-bc] [-kn num] [file]

estcmd break [-ic enc] [-il lang] [-apn] [-wt] [file]

estcmd iconv [-ic enc] [-il lang] [-oc enc] [file]

estcmd randput [-ren|-rla|-reu|-ror|-rjp|-rch] [-cs num] db dnum

estcmd wicked db dnum

estcmd regression db

estcmd version

DESCRIPTION

estcmd is a collection of sub commands. The name of a sub command is specified by the first argument; the remaining arguments are parsed according to each sub command. The argument db specifies the path of an index.

estcmd put [-cl] db [file]
Register a document in document draft format into an index. file specifies a target file. If it is omitted, the standard input is read. If -cl is specified, regions of an overwritten document are cleaned up.
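The draft format itself is not described on this page. The sketch below assumes the layout given in the Hyper Estraier user's guide: attribute lines of the form name=value (system attributes such as @uri begin with "@"), an empty line, then the body text. The helper name make_draft is hypothetical.

```python
def make_draft(attrs: dict, body: str) -> str:
    """Build a document draft string (assumed layout: attributes, blank line, body)."""
    lines = []
    for name, value in attrs.items():
        # An attribute line cannot hold '=' in the name or a newline anywhere.
        if "=" in name or "\n" in name or "\n" in value:
            raise ValueError(f"invalid attribute: {name!r}")
        lines.append(f"{name}={value}")
    lines.append("")  # the empty line separates attributes from the text
    lines.append(body)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    draft = make_draft({"@uri": "http://example.com/doc1", "@title": "Example"},
                       "Hello, world.")
    print(draft, end="")
```

Saved to a file, such a draft could then be registered with "estcmd put casket draft.txt", or piped to "estcmd put casket" on the standard input.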
estcmd out [-pc enc] [-cl] db expr
Remove information of a document from an index. expr specifies the ID number, the URI, or the local path of a document. If -cl is specified, regions of the document are cleaned up. -pc specifies the encoding of file paths. By default, it is ISO-8859-1.
estcmd edit [-pc enc] db expr name [value]
Edit an attribute of a document in an index. expr specifies the ID number, the URI, or the local path of a document. name specifies the name of an attribute. value specifies the value of the attribute. If it is omitted, the attribute is removed. -pc specifies the encoding of the file path and the attribute value. By default, it is ISO-8859-1.
estcmd get [-pc enc] db expr [attr]
Output the document draft of a document in an index. expr specifies the ID number, the URI, or the local path of a document. If attr is specified, only the value of the attribute is output. -pc specifies the encoding of file paths. By default, it is ISO-8859-1.
estcmd list [-lp] db
Output a list of all documents in an index. If -lp is specified, URIs of the "file://" scheme are output as the equivalent local paths.
estcmd uriid [-pc enc] db expr
Output the ID number of a document specified by URI. expr specifies the URI or the local path of a document. -pc specifies the encoding of file paths. By default, it is ISO-8859-1.
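Several sub commands accept a local path in place of a URI, and list -lp maps "file://" URIs back to local paths. The sketch below shows one plausible mapping on a POSIX system, percent-encoding the path in the -pc encoding (ISO-8859-1 by default). It is an assumption for illustration, not estcmd's actual conversion; path_to_uri and uri_to_path are hypothetical names.

```python
import os
from urllib.parse import quote, unquote


def path_to_uri(path: str, encoding: str = "iso-8859-1") -> str:
    """Convert a local path to a file:// URI (hypothetical helper)."""
    absolute = os.path.abspath(path)
    # Percent-encode everything except '/', using the file-path encoding.
    return "file://" + quote(absolute, safe="/", encoding=encoding)


def uri_to_path(uri: str, encoding: str = "iso-8859-1") -> str:
    """Inverse of path_to_uri; rejects URIs of other schemes."""
    prefix = "file://"
    if not uri.startswith(prefix):
        raise ValueError(f"not a file URI: {uri!r}")
    return unquote(uri[len(prefix):], encoding=encoding)
```

Using the same encoding in both directions is what keeps a non-ASCII path stable across a round trip; mixing encodings would yield a different byte sequence and therefore a different URI.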
estcmd meta db [name [value]]
Handle meta data. name specifies the name of a piece of meta data. If it is omitted, a list of all names is output. value specifies the value of the meta data to be recorded. If it is omitted, the current value is output. If it is an empty string, the meta data is removed.
estcmd inform db
Output the number of documents and the number of unique words in an index.
estcmd optimize [-onp] [-ond] db
Optimize an index and clean up dispensable regions. If -onp is specified, cleaning up dispensable regions is skipped. If -ond is specified, optimizing the database files is skipped.
estcmd search [-ic enc] [-vu|-va|-vf|-vs|-vh|-vx|-dd] [-kn num] [-ec] [-gs|-gf|-ga] [-cd] [-ni] [-sf] [-hs] [-attr expr] [-ord expr] [-max num] [-sk num] [-sim id] db [phrase]
Search an index for documents. phrase specifies the search phrase. -ic specifies the input encoding. By default, it is UTF-8. If -vu is specified, TSV of ID numbers and URIs is output. If -va is specified, multipart format including attributes is output. If -vf is specified, multipart format including document drafts is output. If -vs is specified, multipart format including attributes and snippets is output. If -vh is specified, human-readable format including attributes and snippets is output. If -vx is specified, XML including attributes and snippets is output. If -dd is specified, document draft data are dumped and saved into separate files. -kn specifies the number of keywords to be extracted. By default, no keyword is extracted. -ec specifies the lower limit of similarity for eclipse. If -gs is specified, every key of N-gram is checked. By default, keys are checked alternately (every second key). If -gf is specified, every third key of N-gram is checked. If -ga is specified, every fourth key of N-gram is checked. If -cd is specified, each document is checked to confirm that it really matches the search phrase. If -ni is specified, TF-IDF tuning is omitted. If -sf is specified, the phrase is treated as a simplified form. If -hs is specified, score information is output as an attribute. -attr specifies an attribute search condition. This option can be specified multiple times. -ord specifies the order expression. By default, documents are sorted in descending order of score. -max specifies the maximum number of documents shown. A negative value means unlimited. By default, it is 10. -sk specifies the number of documents to be skipped. By default, it is 0. -sim specifies the ID number of the seed document for similarity search.
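When search is driven from a script, the options map directly onto an argument vector. The sketch builds one for a paginated query (page n of size m skips (n-1)*m documents via -sk) and parses -vu output. It assumes that each result line of -vu output has the shape ID<TAB>URI and skips every other line, since the surrounding layout is not specified here. The function names are hypothetical.

```python
import subprocess


def search_argv(db: str, phrase: str, page: int = 1, per_page: int = 10,
                attrs: tuple = ()) -> list:
    """Build an argv for 'estcmd search -vu' returning one page of results."""
    if page < 1 or per_page < 1:
        raise ValueError("page and per_page must be positive")
    argv = ["estcmd", "search", "-vu",
            "-max", str(per_page),
            "-sk", str((page - 1) * per_page)]
    for expr in attrs:  # -attr may be given multiple times
        argv += ["-attr", expr]
    argv += [db, phrase]
    return argv


def parse_vu(output: str) -> list:
    """Keep lines shaped like 'ID<TAB>URI' (assumed -vu result layout)."""
    results = []
    for line in output.splitlines():
        fields = line.split("\t")
        if len(fields) == 2 and fields[0].isdigit():
            results.append((int(fields[0]), fields[1]))
    return results


def run_search(db: str, phrase: str, **kwargs) -> list:
    """Run estcmd (must be on PATH) and return [(id, uri), ...]."""
    proc = subprocess.run(search_argv(db, phrase, **kwargs),
                          capture_output=True, text=True)
    if proc.returncode != 0:  # estcmd returns 1 on failure
        raise RuntimeError(proc.stderr.strip())
    return parse_vu(proc.stdout)
```

For example, run_search("casket", "hyper", page=2, attrs=("@title STRINC estraier",)) would request results 11 to 20; the attribute expression there is only an illustration of the syntax described in the user's guide.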
estcmd gather [-cl] [-no] [-fe|-ft|-fh|-fm] [-fx sufs cmd] [-fz] [-fo] [-rm sufs] [-ic enc] [-il lang] [-bc] [-pc enc] [-px name] [-apn] [-sd] [-cm] [-cs num] db [file|dir]
Scan the local file system and register documents into an index. If the third argument is the name of a file, a list of paths of target documents is read from it. If it is "-", the list is read from the standard input. If the third argument is the name of a directory, all files under the directory are treated as target documents. If -cl is specified, regions of overwritten documents are cleaned up. If -no is specified, operations are printed but not actually executed. If -fe is specified, target files are treated as document draft. By default, the format is detected by the suffix of each document. If -ft is specified, target files are treated as plain text. If -fh is specified, target files are treated as HTML. If -fm is specified, target files are treated as MIME. If -fx is specified, target files with the specified suffixes are processed by the specified outer command. If the command is prefixed with "T@", the output of the command is treated as plain text. If it is prefixed with "H@", the output is treated as HTML. If it is prefixed with "M@", the output is treated as MIME. Otherwise, the output is treated as document draft. This option can be specified multiple times. If -fz is specified, documents that do not match the condition of -fx are ignored. If -fo is specified, target files are not read; this is useful for efficient processing by the outer command. If -rm is specified, target files with the specified suffixes are removed. "*" matches any file. This option can be specified multiple times. -ic specifies the input encoding. By default, it is detected automatically. -il specifies the preferred input language. By default, English is preferred. If -bc is specified, binary files are detected and ignored. -pc specifies the encoding of file paths. By default, it is ISO-8859-1. -px specifies the name of an attribute read from the list of paths.
The list of paths can be in TSV format: the first field is treated as the path of a target document, and the second and following fields are definitions of attribute values. -px specifies the name of each value in the second and following fields. This option can be specified multiple times. If -apn is specified, N-gram analysis is also performed on European text. If -sd is specified, the modification date of each file is recorded as an attribute. If -cm is specified, documents whose modification date has not changed are ignored. -cs specifies the size of cache memory in megabytes. By default, it is 64 MB.
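In the TSV form of the path list, each path is followed by attribute values whose names are given, one per column, by repeated -px options. The sketch writes such a list and builds a matching command that reads it from the standard input ("-"). The attribute names "@author" and "category" and both helper names are hypothetical examples.

```python
import io


def write_path_list(out, rows, attr_names):
    """Write 'path<TAB>value1<TAB>value2...' lines for 'estcmd gather'."""
    for path, *values in rows:
        if len(values) != len(attr_names):
            raise ValueError(f"{path}: expected {len(attr_names)} values")
        fields = [path, *values]
        if any("\t" in f or "\n" in f for f in fields):
            raise ValueError(f"{path}: tabs and newlines are not allowed")
        out.write("\t".join(fields) + "\n")


def gather_argv(db, attr_names, list_file="-"):
    """Build argv; -px is repeated once per attribute, in TSV column order."""
    argv = ["estcmd", "gather"]
    for name in attr_names:
        argv += ["-px", name]
    return argv + [db, list_file]


if __name__ == "__main__":
    names = ["@author", "category"]
    buf = io.StringIO()
    write_path_list(buf, [("/docs/a.txt", "alice", "memo")], names)
    print(buf.getvalue(), end="")
    print(" ".join(gather_argv("casket", names)))
```

Keeping the column count and the number of -px options in step is the main thing to check; the writer rejects rows that would misalign them.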
estcmd purge [-cl] [-no] [-fc] [-pc enc] [-attr expr] db [prefix]
Purge information of documents that do not exist on the file system. If prefix is specified, only documents whose URIs begin with it are processed. The prefix can also be given as the local path of a directory. If -cl is specified, regions of the deleted documents are cleaned up. If -no is specified, operations are printed but not actually executed. If -fc is specified, information of all target documents is deleted. -pc specifies the encoding of file paths. By default, it is ISO-8859-1. -attr specifies an attribute search condition. This option can be specified multiple times.
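The selection rule of purge can be modelled in a few lines: among documents whose URI begins with the prefix, those whose local file no longer exists are candidates, and -fc drops the existence check. This is a simplified model for illustration, assuming "file://" URIs with percent-encoded ISO-8859-1 paths and a URI (not directory path) prefix; it is not estcmd's implementation, and purge_candidates is a hypothetical name.

```python
import os
from urllib.parse import unquote


def purge_candidates(docs, prefix="", force=False, encoding="iso-8859-1"):
    """Return IDs of documents a purge would delete (simplified model).

    docs is an iterable of (id, uri) pairs; force mirrors -fc.
    """
    selected = []
    for doc_id, uri in docs:
        if not uri.startswith(prefix):
            continue  # outside the requested prefix
        if not uri.startswith("file://"):
            continue  # only local files can be checked for existence
        path = unquote(uri[len("file://"):], encoding=encoding)
        if force or not os.path.exists(path):
            selected.append(doc_id)
    return selected
```

Running purge with -no first is the command-line counterpart of inspecting this candidate list before anything is deleted.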
estcmd extkeys [-no] [-fc] [-dfdb file] [-ni] [-kn num] [-attr expr] db [prefix]
Create a database of keywords extracted from documents. If prefix is specified, only documents whose URIs begin with it are processed. If -no is specified, operations are printed but not actually executed. If -fc is specified, all target documents are processed whether or not they already have records. -dfdb specifies an outer database of document frequency. By default, document frequency is calculated dynamically from the index. If -ni is specified, TF-IDF tuning is omitted. -kn specifies the number of keywords to be extracted. -attr specifies an attribute search condition. This option can be specified multiple times.
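Keyword extraction weighs how often a word occurs in a document against how many documents contain it, which is why document frequency, computed from the index or read from -dfdb, matters. The exact weighting used by Hyper Estraier is not given here; the sketch uses the textbook TF-IDF form tf * log(N / df) and keeps the top kn words, mirroring -kn. All names are hypothetical.

```python
import math
from collections import Counter


def extract_keywords(words, df, num_docs, kn=5):
    """Return the kn highest-scoring words by tf * log(num_docs / df)."""
    tf = Counter(words)
    scores = {}
    for word, count in tf.items():
        # Words missing from the df table are treated as occurring in one document.
        freq = max(df.get(word, 1), 1)
        scores[word] = count * math.log(num_docs / freq)
    # Highest score first; ties broken alphabetically for stable output.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:kn]]
```

A word present in every document gets log(1) = 0, so frequent function words sink to the bottom no matter how often they repeat.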
estcmd words [-dfdb file] db
Output a list of all unique words, each with the size of its record, which is treated as its document frequency. -dfdb specifies an outer database where the result is stored. By default, the result is output to the standard output as TSV. If the outer database already exists, the value of each record is incremented.
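The TSV output can be post-processed directly. The sketch assumes each line has the shape word<TAB>frequency, and shows how counts accumulate when results from several indexes are merged, the same effect as running words repeatedly into one -dfdb database. The helper names are hypothetical.

```python
from collections import Counter


def parse_words_tsv(text):
    """Parse 'word<TAB>frequency' lines (assumed layout of the TSV output)."""
    df = Counter()
    for line in text.splitlines():
        if not line:
            continue
        word, sep, freq = line.rpartition("\t")
        if not sep:
            raise ValueError(f"malformed line: {line!r}")
        df[word] += int(freq)
    return df


def merge_df(existing, new):
    """Add counts together, as records of an existing -dfdb are incremented."""
    merged = Counter(existing)
    merged.update(new)  # Counter.update adds counts instead of replacing them
    return merged
```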
estcmd draft [-ft|-fh|-fm] [-ic enc] [-il lang] [-bc] [-kn num] [file]
For testing and debugging.
estcmd break [-ic enc] [-il lang] [-apn] [-wt] [file]
For testing and debugging.
estcmd iconv [-ic enc] [-il lang] [-oc enc] [file]
For testing and debugging.
estcmd randput [-ren|-rla|-reu|-ror|-rjp|-rch] [-cs num] db dnum
For testing and debugging.
estcmd wicked db dnum
For testing and debugging.
estcmd regression db
For testing and debugging.
estcmd version
Show the version information.

All sub commands return 0 if the operation succeeds, and 1 otherwise. The put, out, gather, purge, randput, wicked, and regression sub commands close the database before exiting when they catch signal 1 (SIGHUP), 2 (SIGINT), 3 (SIGQUIT), 13 (SIGPIPE), or 15 (SIGTERM).
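The behaviour described above, where a long-running command closes its database cleanly instead of dying mid-write, is a common pattern. The sketch shows it in Python on a POSIX system with a stand-in Database class (hypothetical): the handler only records the signal, and the processing loop stops at a safe point, closes the database, and returns exit status 1, mirroring the 0/1 convention of estcmd.

```python
import signal
import sys

HANDLED = (signal.SIGHUP, signal.SIGINT, signal.SIGQUIT,
           signal.SIGPIPE, signal.SIGTERM)


class Database:
    """Hypothetical stand-in for an index handle."""

    def __init__(self):
        self.closed = False

    def put(self, item):
        pass  # a real index would register the item here

    def close(self):
        self.closed = True


caught = []


def on_signal(signum, frame):
    # Only record the signal; the loop decides when it is safe to stop.
    caught.append(signum)


def install_handlers():
    for sig in HANDLED:
        signal.signal(sig, on_signal)


def run(db, items):
    """Process items, stopping early if a signal arrived; return the exit status."""
    try:
        for item in items:
            if caught:
                return 1
            db.put(item)
        return 0
    finally:
        db.close()  # always close so the index is left consistent


if __name__ == "__main__":
    install_handlers()
    sys.exit(run(Database(), range(1000)))
```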

The encoding name specified by the -ic option should be a name registered with the IETF, such as UTF-8 or ISO-8859-1. The language name specified by the -il option should be one of "en" (English), "ja" (Japanese), "zh" (Chinese), or "ko" (Korean).
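A wrapper script can reject bad -ic and -il values before invoking estcmd. The sketch approximates the encoding check with Python's codec registry, which knows the common IETF names but is not the same converter estcmd uses, so it is a pre-flight check only. The function name is hypothetical.

```python
import codecs

LANGUAGES = {"en": "English", "ja": "Japanese", "zh": "Chinese", "ko": "Korean"}


def check_options(encoding, lang):
    """Validate -ic/-il values and return them as arguments (approximate check)."""
    try:
        codecs.lookup(encoding)  # raises LookupError for unknown names
    except LookupError:
        raise ValueError(f"unknown encoding: {encoding}") from None
    if lang not in LANGUAGES:
        raise ValueError(f"unsupported language: {lang}")
    return ["-ic", encoding, "-il", lang]
```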

The outer command specified by the -fx option of gather receives the path of the target document as its first argument and the path for its output as its second argument. The original path of the target document is given as the value of the environment variable "ESTORIGFILE".
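An outer command only has to honour this calling convention. The sketch is a hypothetical filter written in Python: it reads the file named by the first argument, writes plain text to the path named by the second, and reads ESTORIGFILE only to report which document it handled. The script name dat2txt.py, the ".dat" suffix, and the toy conversion (keeping printable ASCII) are illustrative assumptions.

```python
#!/usr/bin/env python3
"""Hypothetical -fx filter, called by gather as: FILTER INPUT OUTPUT."""
import os
import sys


def convert(src, dst):
    """Write a plain-text rendering of src to dst (toy conversion)."""
    with open(src, "rb") as f:
        data = f.read()
    # Keep printable ASCII, tabs, and newlines; drop every other byte.
    text = bytes(b for b in data if b in (9, 10) or 32 <= b < 127).decode("ascii")
    with open(dst, "w", encoding="utf-8") as f:
        f.write(text)


def main(argv):
    if len(argv) != 3:
        print(f"usage: {argv[0]} input output", file=sys.stderr)
        return 1
    # ESTORIGFILE holds the original path, which may differ from argv[1].
    original = os.environ.get("ESTORIGFILE", argv[1])
    convert(argv[1], argv[2])
    print(f"converted {original}", file=sys.stderr)
    return 0


if __name__ == "__main__":
    sys.exit(main(sys.argv))
```

Because the output is plain text, the command would be registered with the "T@" prefix, for example: estcmd gather -fx ".dat" "T@/usr/local/bin/dat2txt.py" casket /data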

Note that similarity search is very slow by default. To improve its performance, running "estcmd extkeys" beforehand is strongly recommended.
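Similarity search compares the keywords of the seed document with those of candidate documents, so having keywords precomputed saves re-deriving them at query time. The sketch assumes cosine similarity over weighted keyword vectors, a standard choice; it illustrates the idea and is not claimed to be Hyper Estraier's exact scoring. The function names are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two sparse keyword vectors {word: weight}."""
    dot = sum(w * b[k] for k, w in a.items() if k in b)
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    if na == 0 or nb == 0:
        return 0.0
    return dot / (na * nb)


def most_similar(seed, others, top=3):
    """Rank candidate documents {id: vector} by similarity to the seed vector."""
    ranked = sorted(others.items(), key=lambda kv: (-cosine(seed, kv[1]), kv[0]))
    return [doc_id for doc_id, _ in ranked[:top]]
```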

SEE ALSO

estconfig(1), estcall(1), estmaster(1), estraier(3)

Please see http://hyperestraier.sourceforge.net/uguide-en.html for details.