man hspell (Commands) - Hebrew spellchecker

NAME

hspell - Hebrew spellchecker

SYNOPSIS

hspell [ -achHlnsvV ] [file...]

DESCRIPTION

hspell tries to find incorrectly spelled Hebrew words in its input files.

Like the traditional Unix spell(1), hspell outputs the sorted list of incorrect words, and does not (yet) have a more friendly interface for making corrections for you. However, unlike spell(1), hspell can suggest possible corrections for some spelling errors - such suggestions are enabled with the -c (correct) and -n (notes) options.

Hspell currently expects ISO-8859-8-encoded input files. Non-Hebrew characters in the input files are ignored, allowing the easy spellchecking of Hebrew-English texts, as well as HTML or TeX files. Files that use a different encoding (e.g., UTF-8) must first be converted to ISO-8859-8 (see, e.g., iconv(1) or recode(1)).

The output will also be in ISO-8859-8 encoding, in so-called "logical order", so it is normally useful to pipe it to bidiv(1) before viewing, as in:

hspell -c filename | bidiv | less

If no input file is given, hspell reads from its standard input.
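
For programs that hold their text as Unicode, the encoding round trip described above can be sketched in Python. This is an illustration only, not part of hspell: the helper names are invented here, replacing unencodable characters with '?' is a choice made for this sketch, and run_hspell assumes an hspell binary is installed and on PATH.

```python
import subprocess

# hspell reads and writes ISO-8859-8 (Hebrew letters occupy bytes 0xE0-0xFA).
HSPELL_ENCODING = "iso-8859-8"


def to_hspell_bytes(text: str) -> bytes:
    """Encode text for hspell's input.

    Characters with no ISO-8859-8 equivalent (niqqud marks, for example) are
    replaced by '?'. That is a choice made for this sketch; a '?' in the
    middle of a word may split it.
    """
    return text.encode(HSPELL_ENCODING, errors="replace")


def from_hspell_bytes(data: bytes) -> str:
    """Decode hspell's output back into a Python string (logical order)."""
    return data.decode(HSPELL_ENCODING, errors="replace")


def run_hspell(text: str, options=("-c",)) -> str:
    """Feed text to hspell on standard input and return its decoded output.

    Assumes an hspell binary is installed and on PATH.
    """
    result = subprocess.run(
        ["hspell", *options],
        input=to_hspell_bytes(text),
        stdout=subprocess.PIPE,
        check=True,
    )
    return from_hspell_bytes(result.stdout)
```

The two encoding helpers work without hspell being installed; only run_hspell needs the binary. Its return value is still in logical order, so it may need bidi-aware display just as the piped output does.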

OPTIONS

-v
If the -v option is given, hspell prints emacs-oriented version information and exits.
-V
With the -V option, hspell prints true and human-oriented version information and exits.
-c
If the -c option is given, hspell will suggest corrections for misspelled words, whenever it can find such corrections. The correction mechanism in this release is especially good at finding corrections for incorrect niqqud-less spellings, with missing or extra 'immot-qri'a.
-n
The -n option will give some longer "notes" about certain spelling errors, explaining why these are indeed errors (or in what cases using this word is in fact correct). It is recommended to combine the two options, as -cn, for maximal correction help from hspell.
-l
The -l (linguistic information) option will explain for each correct word why it was recognized (show the basic noun, verb, etc., that this inflection relates to, and its tense, gender, associated Kinnuy, or other relevant information).

If Hspell was built without morphological analysis support, this option will only show the correct splits of the given word into prefix + word, as the full information incurs a 4-fold increase in the installation size.

Giving the -c option in addition to -l results in special behavior. In that case hspell suggests "corrections" for every word (regardless of whether it is in the dictionary or not), and shows the linguistic information for all those words. This can be useful for a reader application, which may also want to understand misspellings and their possible meanings.

-s
Normally, the words deemed spelling mistakes are shown in alphabetical order. The -s option orders them by severity, i.e., the errors that most frequently appear in the document are shown first. This option is most useful for people who are helping to build hspell's word list and are looking for common correct words that hspell does not know yet.
-a
With the -a option, hspell tries to emulate (as little as possible of) ispell's pipe interface. This allows LyX, Emacs, Geresh and KDE to use hspell as an external spell-checker.
-H
By default, Hspell does not allow the He Ha-sh'ela prefix. This is because this prefix is not normally used in modern Hebrew, and allowing it generates many false negatives (errors, like He followed by a possessed noun, are thought to be correct). The -H option nevertheless tells Hspell to allow this prefix.
-D base
Load the word lists from the given base pathname, rather than from the compiled-in default path. This is mostly used for testing Hspell, when the dictionaries have been compiled in the current directory and hspell is run as "hspell -Dhebrew.wgz".
-d, -B, -m, -T, -C, -S, -P, -p, -w, and -W
These options are passed to hspell by LyX or other applications, and are cordially ignored.
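
The difference between the default ordering and the -s ordering can be sketched in Python as follows. This is an illustration only, not hspell's own code: the function names are invented here, and breaking ties alphabetically is an assumption, since this page does not say how equally frequent errors are ordered.

```python
from collections import Counter


def alphabetical(misspelled):
    """Default ordering: each distinct misspelled word once, sorted."""
    return sorted(set(misspelled))


def by_frequency(misspelled):
    """Ordering in the spirit of -s: most frequent errors first.

    Ties are broken alphabetically, which is an assumption of this sketch.
    """
    counts = Counter(misspelled)
    return sorted(counts, key=lambda word: (-counts[word], word))
```

Given the occurrences b, a, b, c, c, b, the first function yields a, b, c, while the second yields b, c, a.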

SPELLING STANDARD

Hspell was designed to be 100% and strictly compliant with the official niqqud-less spelling rules ("Ha-ktiv Khasar Ha-niqqud", colloquially known as "Ktiv Male") published by the Academy of the Hebrew Language.

This is both an advantage and a disadvantage, depending on your viewpoint. It is an advantage because it encourages a correct and consistent spelling style throughout your writing. It is a disadvantage because a few of the Academy's official spelling decisions are relatively unknown to the general public.

Users of Hspell (and all Hebrew writers, for that matter) are encouraged to read the Academy's official niqqud-less spelling rules (which are printed at the end of most modern Hebrew dictionaries; an abridged version is available at http://hebrew-academy.huji.ac.il/decision4.html). Users are also encouraged to refer to Hebrew dictionaries which use the niqqud-less spelling (such as Millon Ha-hove or Rav Milim).

Future releases might include an option for alternative spelling standards.

BEHIND THE SCENES

The hspell program itself is mostly a simple (but efficient) program that checks input words against a long list of valid words. The real "brains" behind it are the word lists (dictionary) provided by the Hspell project.

In order for this dictionary to be completely free of other people's copyright restrictions, the Hspell project is a clean-room implementation, not based on other companies' word lists, on other companies' spell checkers, or on copying of printed dictionaries.

The word list is also not based on automatic scanning of available Hebrew documents (such as online newspapers), because there is no way to guarantee that such a list will be correct (not contain misspellings, useless proper names, slang, and so on), complete (certain inflections might not appear in the chosen samples) or consistent (especially when it comes to niqqud-less spelling rules).

Instead, our idea was to write programs which know how to correctly inflect Hebrew nouns and conjugate Hebrew verbs. The input to these programs is a list of noun stems and verb roots, plus hints needed for the correct inflection when these cannot be figured out automatically. These input files are obviously an important part of the Hspell project. The "word list generators" (written in Perl, and also part of the Hspell project) then create the complete word list for use by the spellchecking program, hspell. This generation process is done only once, when installing hspell.

The generated lists are useful for much more than spellchecking - see the Hspell project's README file for more ideas for the future.

FILES

~/.hspell_words, ./hspell_words
These files, if they exist, should contain a list of Hebrew words that hspell will also accept as correct words.

Note that only these exact words are added: they are not inflected, and prefixes are not automatically allowed.

/usr/local/share/hspell/*
The standard Hebrew word lists used by hspell.
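
The exact-match rule for the personal word files can be illustrated with a short Python sketch. It models only the personal list, not the main dictionary; the function names are invented here, and the one-word-per-line layout is an assumption, since this page only says the files contain "a list of Hebrew words".

```python
def load_personal_words(lines):
    """Collect personal words, assuming one word per line (an assumption)."""
    return {line.strip() for line in lines if line.strip()}


def accepted_by_personal_list(word, personal_words):
    """Exact match only: no inflection and no prefix stripping is attempted."""
    return word in personal_words


# Example: the list holds "bayit" (U+05D1 U+05D9 U+05EA) only.
personal = load_personal_words(["\u05d1\u05d9\u05ea\n", "\n"])
bayit = "\u05d1\u05d9\u05ea"
ve_bayit = "\u05d5" + bayit  # the same word with the prefix letter vav
```

Here accepted_by_personal_list(bayit, personal) is true, while the prefixed form ve_bayit is rejected unless it is listed separately.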

EXIT STATUS

Currently always 0.

VERSION

The version of hspell described by this manual page is 0.9 (January 13, 2005).

COPYRIGHT

Copyright (C) 2000-2005, Nadav Har'El <nyh@math.technion.ac.il> and Dan Kenigsberg <danken@cs.technion.ac.il>.

Hspell is free software, released under the GNU General Public License (GPL). Note that not only the programs in the distribution, but also the dictionary files and the generated word lists, are licensed under the GPL. There is no warranty of any kind.

See the LICENSE file for more information and the exact license terms.

The latest version of this software can be found at http://www.ivrix.org.il/projects/spell-checker

ACKNOWLEDGMENTS

The hspell utility and the linguistic databases behind it (collectively called "the Hspell project") were created by Nadav Har'El <nyh@math.technion.ac.il> and by Dan Kenigsberg <danken@cs.technion.ac.il>.

Although we wrote all of Hspell's code ourselves, we are truly indebted to the old-style "open source" pioneers - people who wrote books instead of hiding their knowledge in proprietary software. For the correct noun inflections, Dr. Shaul Barkali's "The Complete Noun Book" has been a great help. Prof. Uzzi Ornan's booklet "Verb Conjugation in Flow Charts" has been instrumental in the implementation of verb conjugation, and Barkali's "The Complete Verb Book" was used too.

During our work we have extensively used a number of Hebrew dictionaries, including Even Shoshan, Millon Ha-hove and Rav-Milim, to ensure the correctness of certain words. Various Hebrew newspapers and books, both printed and online, were used for inspiration and for finding words we still do not recognize.

We wish to thank Cilla Tuviana and Dr. Zvi Har'El for their assistance with some grammatical questions.

Several other people helped us in various releases, with suggestions, fixes or patches - they are listed in the WHATSNEW file in the distribution.

SEE ALSO

spell(1), ispell(1), bidiv(1), iconv(1), recode(1)

BUGS

This manual page is in English.

The hspell spellchecker depends on word lists created by the Hspell project. At this stage, these word lists still do not cover all of the Hebrew vocabulary, and so hspell will often list correct words (that it doesn't know) as being wrong. This is being worked on, and hspell's vocabulary will grow from release to release.

Versions 0.6 and above feature a redesigned front-end, which is unfortunately missing a few features that existed in version 0.5. For more details, see the WHATSNEW file in the distribution.

For GUI-lovers, hspell's user interface is an abomination. As more and more applications learn to interface with hspell, this will no longer be an issue. See http://www.ivrix.org.il/projects/spell-checker/Hspell-HOWTO.html for instructions on how to use Hspell in a variety of applications.