man gocr (Commands) - command line OCR tool
NAME
gocr - command line OCR tool
SYNOPSIS
gocr [OPTION] [-i] pnm-file
DESCRIPTION
gocr is an optical character recognition (OCR) program that can be used from the command line. It takes input in PNM, PGM, PBM, PPM, or PCX format and writes the recognized text to stdout. If pnm-file is a single dash, PNM data is read from stdin. If gzip, bzip2 and netpbm-progs are installed and your system supports popen(3), then pnm.gz, pnm.bz2, png, jpg, jpeg, tiff, gif, bmp, ps (single pages only) and eps are also supported as input files (but not as an input stream); here pnm can be replaced by any of ppm, pgm and pbm.
OPTIONS
- -h
- show usage information
- -i file
- read input from file (or stdin if file is a single dash)
- -o file
- send output to file instead of stdout
- -e file
- send errors to file instead of stderr, or to stdout if file is a dash
- -x file
- write progress output to file (file can be a file name, a fifo name or a file descriptor 1...255)
- -p path
- database path (including final slash, default is ./db/)
- -f format
- output format (ISO8859_1 TeX HTML UTF8 ASCII)
- -l level
- set grey level to level (0 < level <= 255, e.g. 160; default: 0 for autodetect)
- -d size
- set dust size in pixels (clusters smaller than this are removed); 0 means no clusters are removed; default: -1 for autodetect
- -s num
- set spacewidth/dots (default: 0 for autodetect)
- -v verbosity
- be verbose; verbosity is a bitfield
- -c string
- limit verbose output to the characters in string
- -C string
- only recognise characters from string
- -m modes
- set operation modes; modes is a bitfield
- -n bool
- if bool is non-zero, only recognise numbers (this is now obsolete, use -C "0123456789")
The verbosity is specified as a bitfield:
- 1
- print more info
- 2
- list shapes of boxes (see -c)
- 4
- list pattern of boxes (see -c)
- 8
- print pattern after recognition
- 16
- print line information
- 32
- create outXX.pgm
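Because the verbosity is a bitfield, several of the values above can be combined by OR-ing (or simply adding) them. The following is a minimal sketch in POSIX shell; the variable names are hypothetical and chosen only for illustration, and the input file name is taken from the EXAMPLES section.

```shell
# Hypothetical names for some of the verbosity bits listed above.
V_INFO=1      # print more info
V_SHAPES=2    # list shapes of boxes (see -c)
V_PATTERN=4   # list pattern of boxes (see -c)
V_IMAGES=32   # create outXX.pgm

# Combine "print more info" with "create outXX.pgm".
verbosity=$((V_INFO | V_IMAGES))

# Show the resulting command line instead of running it.
echo "gocr -v $verbosity text1.pbm"
```

The combined value here is 33, which is the value passed to -v in the first entry of the EXAMPLES section.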
The operation modes are:
- 2
- use database (early development)
- 4
- layout analysis, zoning (development)
- 8
- don't compare unrecognized characters
- 16
- don't divide overlapping characters
- 32
- don't do context correction
- 64
- character packing (development)
- 130
- extend database, prompts user (128+2, early development)
- 256
- switch off the OCR engine (makes sense together with -m 2)
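Since modes is a bitfield, a combined value such as 130 can be split back into its individual bits. The sketch below assumes a POSIX shell; mode_bits is a hypothetical helper written for this illustration and is not part of gocr.

```shell
# Hypothetical helper: print the individual mode bits set in a -m value.
mode_bits() {
    value=$1
    result=""
    for bit in 2 4 8 16 32 64 128 256; do
        if [ $((value & bit)) -ne 0 ]; then
            # Append the bit, separated by a space after the first entry.
            result="${result:+$result }$bit"
        fi
    done
    echo "$result"
}

mode_bits 130   # the "extend database" mode, documented above as 128+2
mode_bits 40    # 8 (don't compare unrecognized characters) + 32 (no context correction)
```

Going the other way, modes are combined by adding their values, so disabling both character division and context correction would be requested with -m 48 (16+32).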
AUTHOR
Joerg Schulenburg <jschulen@gmx.de>
First version of man page by Tim Waugh <twaugh@redhat.com>
VERSION INFORMATION
This man page documents gocr, version 0.37.
REPORTING BUGS
Report bugs to <jschulen@gmx.de>
SEE ALSO
More details can be found at /usr/share/doc/gocr-X.XX/gocr.html.
EXAMPLES
- gocr -v 33 text1.pbm
- output verbose information; out30.bmp is created so that details of the recognition process can be inspected
- gocr -v 7 -c _YV text1.pbm
- verbose output for unknown characters (_) and for the characters Y and V
- djpeg -pnm -gray text.jpg | gocr -
- convert a JPEG file to PNM format and pipe it into gocr
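Several of the options described in OPTIONS can be combined in one call. The sketch below, assuming a POSIX shell, restricts recognition to digits (-C), requests UTF8 output (-f), and sends the text and the error messages to separate files (-o, -e). The file names scan.pnm, scan.txt and scan.err are placeholders, and the script only prints the command when gocr or the input file is not available.

```shell
# Placeholder file names; replace them with real paths.
input=scan.pnm
output=scan.txt
errors=scan.err

# Digits only (-C), UTF8 output (-f), text to a file (-o), errors to a file (-e).
set -- -C "0123456789" -f UTF8 -o "$output" -e "$errors" -i "$input"

if command -v gocr >/dev/null 2>&1 && [ -f "$input" ]; then
    gocr "$@"
else
    # Dry run: show the command that would be executed.
    echo "gocr $*"
fi
```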