man Jcode () - Japanese Charset Handler


Jcode - Japanese Charset Handler


 use Jcode;
 # traditional
 Jcode::convert(\$str, $ocode, $icode, "z");
 # or OOP!
 print Jcode->new($str)->h2z->tr($from, $to)->utf8;


A Japanese version of this document is available as Jcode::Nihongo.

Jcode.pm supports both an object-oriented and a traditional approach. With the object approach, you can write:

  $iso_2022_jp = Jcode->new($str)->h2z->jis;

Which is more elegant than:

  $iso_2022_jp = $str;
  &jcode::convert(\$iso_2022_jp, 'jis', &jcode::getcode(\$str), "z");

For those unfamiliar with objects, Jcode.pm still supports getcode() and convert().

If the perl version is 5.8.1 or later, Jcode acts as a wrapper to Encode, the standard charset handler module for Perl 5.8 or later.
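To make the wrapper relationship concrete, here is a minimal sketch of the same kind of conversion done with Encode directly. It only illustrates the decode-then-encode model that Encode provides and is not a description of Jcode's internals; the sample bytes are an assumption chosen for the example.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Sample input (an assumption): EUC-JP bytes for the hiragana "a" and "i".
my $euc = "\xA4\xA2\xA4\xA4";

# Decode the octets into a Perl character string, then encode that
# string into whichever target encodings are needed.
my $chars = decode('euc-jp', $euc);
my $sjis  = encode('shiftjis', $chars);
my $utf8  = encode('UTF-8', $chars);

printf "%d characters, %d Shift_JIS bytes, %d UTF-8 bytes\n",
    length($chars), length($sjis), length($utf8);
```

With Jcode, the roughly corresponding call would be Jcode->new($euc, 'euc')->sjis.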


All methods described here return the Jcode object unless otherwise noted.


$j = Jcode->new($str [, $icode])

Creates a Jcode object $j from $str. The input encoding is detected automatically unless you explicitly set $icode. For the available charsets, see getcode() below. For perl 5.8.1 or better, $icode can be any encoding name that Encode understands.

  $j = Jcode->new($european, 'iso-latin1');

When the object is stringified, it returns the EUC-converted string, so you can write "print $j" instead of "print $j->euc".

Passing Reference

Instead of a scalar value, you can pass a reference, as in Jcode->new(\$str). This saves a little time, in exchange for the value of $str being converted in place. (In a way, $str is now tied to the Jcode object.)

$j->set($str [, $icode])

Sets $j's internal string to $str. Handy when you use a Jcode object repeatedly, since it saves the time and memory needed to create a new object.

 # converts mailbox to SJIS format
 my $jconv = Jcode->new;
 $/ = 00;
 while (<>) {
     print $jconv->set(\$_)->mime_decode->sjis;
 }

$j->append($str)

Appends $str to $j's internal string.

$j = jcode($str [, $icode])

A shortcut for Jcode->new(), so you can write jcode($str)->sjis, for example.

Encoded Strings

In general, you can retrieve the string in a given encoding as $j->encoded, where "encoded" is the name of that encoding.

$euc = $j->euc
$jis = $j->jis
$sjis = $j->sjis
$ucs2 = $j->ucs2
$utf8 = $j->utf8

What you code is what you get :) For example, $sjis = jcode($str)->sjis.

$iso_2022_jp = $j->iso_2022_jp

Same as $j->h2z->jis. Hankaku kana are forcibly converted to Zenkaku.

For perl 5.8.1 and better, you can also use any encoding name or alias that Encode supports. For example:

  $european = $j->iso_latin1; # replace '-' with '_' for names.

FYI: Encode::Encoder uses a similar trick.

$j->fallback($fallback)

For perl 5.8.1 or better, Jcode stores the internal string in UTF-8. Any character that does not map to the target encoding is replaced with a '?', which is the Encode standard.
  my $unistr = "\x{262f}"; # YIN YANG
  my $j = jcode($unistr);  # $j->euc is '?'
You can change this behavior by specifying a fallback, as in Encode. The values are the same as those of Encode. Jcode::FB_PERLQQ, Jcode::FB_XMLCREF, and Jcode::FB_HTMLCREF are aliased to those of Encode for convenience.
  print $j->fallback(Jcode::FB_PERLQQ)->euc;   # '\x{262f}'
  print $j->fallback(Jcode::FB_XMLCREF)->euc;  # '&#x262f;'
  print $j->fallback(Jcode::FB_HTMLCREF)->euc; # '&#9775;'
The global variable $Jcode::FALLBACK stores the default fallback, so you can override it by assigning a value.
  $Jcode::FALLBACK = Jcode::FB_PERLQQ; # set default fallback scheme
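Since the fallback values are the same as those of Encode, the behavior can also be seen with Encode alone. The following is a minimal sketch under that assumption; it calls Encode::encode() directly rather than going through Jcode, and the variable names are illustrative only.

```perl
use strict;
use warnings;
use Encode qw(encode);

my $yinyang = "\x{262f}";   # YIN YANG, the unmappable character used above

# No CHECK argument: the unmappable character becomes a substitution character.
my $default  = encode('euc-jp', $yinyang);

# Each fallback constant selects a different escape notation.
my $perlqq   = encode('euc-jp', $yinyang, Encode::FB_PERLQQ());
my $xmlcref  = encode('euc-jp', $yinyang, Encode::FB_XMLCREF());
my $htmlcref = encode('euc-jp', $yinyang, Encode::FB_HTMLCREF());

print "$_\n" for $default, $perlqq, $xmlcref, $htmlcref;
```

The three fallback constants include Encode's LEAVE_SRC bit, so $yinyang itself is left unmodified by these calls.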
$j->jfold([$width, $newline_str])

Folds lines in the Jcode string every $width columns (default: 72), where $width is counted in halfwidth characters; fullwidth characters count as two. Lines are joined with the newline string specified by $newline_str (default: "\n"). Rudimentary kinsoku support is available for Perl 5.8.1 and better.

$length = $j->jlength()

Returns the length in characters, rather than in bytes.

Methods that use MIME::Base64

To use the methods below, you need MIME::Base64. To install it, simply run:

   perl -MCPAN -e 'CPAN::Shell->install("MIME::Base64")'

If your perl is 5.6 or better, there is no need, since MIME::Base64 is bundled.

$mime_header = $j->mime_encode([$lf, $bpl])

Converts $str to a MIME header as documented in RFC1522. When $lf is specified, it is used to fold lines (default: \n). When $bpl is specified, it is used as the number of bytes per line (default: 76; this number must not exceed 76). For Perl 5.8.1 or better, you can also encode a MIME header as:

  $mime_header = $j->MIME_Header;
In which case the resulting $mime_header is MIME-B-encoded UTF-8, whereas $j->mime_encode() returns MIME-B-encoded ISO-2022-JP. Most modern MUAs support both.

$j->mime_decode

Decodes the MIME header in the Jcode object. For perl 5.8.1 or better, you can also do the same with:
  Jcode->new($str, 'MIME-Header')
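The difference between the two header flavors can be sketched with Encode's own MIME-Header encodings, which are what the 'MIME-Header' name above refers to. This is an illustration under assumptions: the sample subject text is made up, and the code calls Encode directly rather than Jcode.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Sample header text (an assumption): the three kanji of "nihongo".
my $subject = "\x{65e5}\x{672c}\x{8a9e}";

# MIME-B-encoded UTF-8, comparable to $j->MIME_Header.
my $utf8_hdr = encode('MIME-Header', $subject);

# MIME-B-encoded ISO-2022-JP, comparable to $j->mime_encode.
my $jis_hdr  = encode('MIME-Header-ISO_2022_JP', $subject);

# Decoding returns the original character string.
my $decoded  = decode('MIME-Header', $utf8_hdr);

print "$utf8_hdr\n$jis_hdr\n";
print "round trip ok\n" if $decoded eq $subject;
```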

Hankaku vs. Zenkaku

$j->h2z([$keep_dakuten])

Converts X201 kana (Hankaku) to X208 kana (Zenkaku). When $keep_dakuten is set, dakuten are left as is (that is, "ka + dakuten" is left as is instead of being converted to "ga"). You can retrieve the number of matches via $j->nmatch.

$j->z2h

Converts X208 kana (Zenkaku) to X201 kana (Hankaku). You can retrieve the number of matches via $j->nmatch.

Regexp emulators

To use ->m() and ->s(), you need perl 5.8.1 or better.

$j->tr($from, $to [, $opt])

Applies tr/$from/$to/ to the Jcode object, where $from and $to are EUC-JP strings. On perl 5.8.1 or better, $from and $to can also be flagged UTF-8 strings. If $opt is set, tr/$from/$to/$opt is applied. $opt must be 'c', 'd', or a combination thereof. You can retrieve the number of matches via $j->nmatch.

The following methods are available only for perl 5.8.1 or better.

$j->s($pattern, $replace [, $opt])

Applies s/$pattern/$replace/$opt. $pattern and $replace must be in EUC-JP or flagged UTF-8. $opt takes the same values as regexp options; see perlre for details. Like $j->tr(), $j->s() returns the object itself, so you can chain operations as follows:

  $j->tr("a-z", "A-Z")->s("foo", "bar");
$j->m($pattern [, $opt])

Applies m/$pattern/$opt. Note that this method DOES NOT RETURN AN OBJECT, so you can't chain it the way you can chain $j->s().

Instance Variables

If you need to access the instance variables of a Jcode object, use the access methods below instead of accessing them directly. (That's what OOP is all about.)

FYI, Jcode uses a ref to an array instead of a ref to a hash (the common way) to optimize speed. (Actually, you don't have to know this as long as you use the access methods; once again, that's OOP.)

 $j->r_str   Reference to the EUC-coded string.
 $j->icode   Input character code in the most recent operation.
 $j->nmatch  Number of matches (used in $j->tr, etc.)


($code, [$nmatch]) = getcode($str)
Returns the character code of $str. Return codes are as follows:
 ascii   Ascii (Contains no Japanese Code)
 binary  Binary (Not Text File)
 euc     EUC-JP
 sjis    SHIFT_JIS
 jis     JIS (ISO-2022-JP)
 ucs2    UCS2 (Raw Unicode)
 utf8    UTF8
When array context is used instead of scalar, it also returns how many character codes were found. As mentioned above, $str can be \$str instead.

jcode.pl users: This function is 100% upper-compatible with jcode::getcode(). Well, almost:

 * When its return value is an array, the order is the opposite;
   jcode::getcode() returns $nmatch first.
 * jcode::getcode() returns 'undef' when the number of EUC characters
   is equal to that of SJIS. Jcode::getcode() returns EUC, for there
   are no in-betweens.

Jcode::convert($str, $ocode, $icode, $opt)

Converts $str to the character code specified by $ocode. When $icode is also specified, it is taken as the encoding of the input string instead of the one detected by getcode(). As mentioned above, $str can be \$str instead.

jcode.pl users: This function is 100% upper-compatible with jcode::convert()!
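For comparison, in-place conversion of this kind is also what Encode's from_to() function offers. The sketch below uses Encode only, with made-up sample bytes; unlike convert() without $icode, from_to() does not guess the input encoding, so the source encoding is named explicitly.

```perl
use strict;
use warnings;
use Encode qw(from_to);

# Sample input (an assumption): EUC-JP bytes for the hiragana "a" and "i".
my $str = "\xA4\xA2\xA4\xA4";

# from_to() overwrites $str with the converted octets and returns
# the length of the result in bytes (undef on failure).
my $len = from_to($str, 'euc-jp', 'shiftjis');

printf "%d bytes: %s\n", $len, join ' ', map { sprintf '%02X', ord } split //, $str;
```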


For perl 5.8.1 or later, Jcode acts as a wrapper to Encode, meaning Jcode is subject to any bugs therein.


This package owes a lot in motivation, design, and code to jcode.pl for Perl4 by Kazumasa Utashiro.

Hiroki Ohzaki has helped me polish regexp from the very first stage of development.

The author of JEncode inspired me to integrate Encode into Jcode. He has also contributed the Japanese POD.

And folks at the Jcode mailing list. Without them, I couldn't have coded this far.






Copyright 1999-2005 Dan Kogai

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.