man Regexp::Common::comment () - Regexp::Common::comment -- provide regexes for comments.
NAME
Regexp::Common::comment -- provide regexes for comments.
SYNOPSIS
use Regexp::Common qw /comment/;
while (<>) {
    /$RE{comment}{C}/    and print "Contains a C comment\n";
    /$RE{comment}{C++}/  and print "Contains a C++ comment\n";
    /$RE{comment}{PHP}/  and print "Contains a PHP comment\n";
    /$RE{comment}{Java}/ and print "Contains a Java comment\n";
    /$RE{comment}{Perl}/ and print "Contains a Perl comment\n";
    /$RE{comment}{awk}/  and print "Contains an awk comment\n";
    /$RE{comment}{HTML}/ and print "Contains an HTML comment\n";
}
use Regexp::Common qw /comment RE_comment_HTML/;
while (<>) {
    $_ =~ RE_comment_HTML() and print "Contains an HTML comment\n";
}
DESCRIPTION
Please consult the manual of Regexp::Common for a general description of how this interface works.
Do not use this module directly, but load it via Regexp::Common.
This module gives you regular expressions for comments in various languages.
THE LANGUAGES
Below, the comments of each language are described. The patterns are available as $RE{comment}{LANG}, for each language LANG. Some languages have variants; the entry for each such language describes how to get the patterns for its variants. Unless mentioned otherwise, {-keep} sets $1, $2, $3 and $4 to the entire comment, the opening marker, the content of the comment, and the closing marker (for many languages, the latter is a newline) respectively.
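As a rough illustration of the {-keep} behaviour described above, the following sketch matches a C comment and prints the four captures. The sample string and the variable names are made up for this example; only $RE{comment}{C} and {-keep} come from the module.

```perl
use strict;
use warnings;
use Regexp::Common qw /comment/;

# Hypothetical sample input: one C comment embedded in a line of code.
my $source = "int x = 1; /* set x */\n";

# Ask for the capturing form of the C comment pattern.
my $c_keep = $RE{comment}{C}{-keep};

# In list context the match returns the captures $1 .. $4.
my ($whole, $open, $body, $close) = $source =~ /$c_keep/;
defined $whole or die "no comment found\n";

print "whole: [$whole]\n";   # the entire comment
print "open:  [$open]\n";    # the opening marker
print "body:  [$body]\n";    # the content of the comment
print "close: [$close]\n";   # the closing marker
```

For languages whose comments run to the end of the line, the closing marker captured in $4 is the newline itself, so the input has to contain that newline for the pattern to match.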
- ABC
- Comments in ABC start with a backslash (\), and last till the end of the line. See <http://homepages.cwi.nl/%7Esteven/abc/>.
- Ada
- Comments in Ada start with --, and last till the end of the line.
- Advisor
- Advisor is a language used by the HP product glance. Comments for this language start with either # or //, and last till the end of the line.
- Advsys
- Comments for the Advsys language start with a semicolon (;) and last till the end of the line. See also <http://www.wurb.com/if/devsys/12>.
- Alan
- Alan comments start with --, and last till the end of the line. See also <http://w1.132.telia.com/~u13207378/alan/manual/alanTOC.html>.
- Algol 60
- Comments in the Algol 60 language start with the keyword comment, and end with a semicolon (;). See <http://www.masswerk.at/algol60/report.htm>.
- Algol 68
- In Algol 68, comments are either delimited by #, or by one of the keywords co or comment. The keywords should not be part of another word. See <http://westein.arb-phys.uni-dortmund.de/~wb/a68s.txt>. With {-keep}, only $1 will be set, returning the entire comment.
- ALPACA
- The ALPACA language has comments starting with /* and ending with */.
- awk
- The awk programming language uses comments that start with # and end at the end of the line.
- B
- The B language has comments starting with /* and ending with */.
- BASIC
- There are various forms of BASIC around. Currently, we only support the variant supported by mvEnterprise, whose pattern is available as $RE{comment}{BASIC}{mvEnterprise}. Comments in this language start with a !, a * or the keyword REM, and last till the end of the line. See <http://www.rainingdata.com/products/beta/docs/mve/50/ReferenceManual/Basic.pdf>.
- Beatnik
- The esoteric language Beatnik only uses words consisting of letters. Words are scored according to the rules of Scrabble. Words scoring less than 5 points, or 18 points or more, are considered comments (although the compiler might mock you if you score less than 5 points). Regardless of whether {-keep} is used, $1 will be set to the entire comment. This pattern requires perl 5.8.0 or newer.
- beta-Juliet
- The beta-Juliet programming language has comments that start with // and that continue till the end of the line. See also <http://www.catseye.mb.ca/esoteric/b-juliet/index.html>.
- Befunge-98
- The esoteric language Befunge-98 uses comments that start and end with a semicolon (;). See <http://www.catseye.mb.ca/esoteric/befunge/98/spec98.html>.
- BML
- BML, or Better Markup Language, is an HTML templating language that uses comments starting with <?_c, and ending with _c?>. See <http://www.livejournal.com/doc/server/bml.index.html>.
- Brainfuck
- The minimal language Brainfuck uses only eight characters: <, >, [, ], +, -, . (period) and , (comma). Any other characters are considered comments. With {-keep}, $1 is set to the entire comment.
- C
- The C language has comments starting with /* and ending with */.
- C--
- The C-- language has comments starting with /* and ending with */. See <http://cs.uas.arizona.edu/classes/453/programs/CSpec.html>.
- C++
- The C++ language has two forms of comments: comments that start with // and last till the end of the line, and comments that start with /* and end with */. If {-keep} is used, only $1 will be set, and set to the entire comment.
- C#
- The C# language has two forms of comments: comments that start with // and last till the end of the line, and comments that start with /* and end with */. If {-keep} is used, only $1 will be set, and set to the entire comment. See <http://msdn.microsoft.com/library/default.asp?url=/library/en-us/csspec/html/vclrfcsharpspec_C.asp>.
- Caml
- Comments in Caml start with (*, end with *), and can be nested. See <http://www.cs.caltech.edu/courses/cs134/cs134b/book.pdf> and <http://pauillac.inria.fr/caml/index-eng.html>.
- Cg
- The Cg language has two forms of comments: comments that start with // and last till the end of the line, and comments that start with /* and end with */. If {-keep} is used, only $1 will be set, and set to the entire comment. See <http://developer.nvidia.com/attach/3722>.
- CLU
- In CLU, a comment starts with a percent sign (%), and ends with the next newline. See <ftp://ftp.lcs.mit.edu:/pub/pclu/CLU-syntax.ps> and <http://www.pmg.lcs.mit.edu/CLU.html>.
- COBOL
- Traditionally, comments in COBOL are indicated by an asterisk in the seventh column. This is what the pattern matches. Modern compilers may be more lenient though. See <http://www.csis.ul.ie/cobol/Course/COBOLIntro.htm>, and <http://www.csis.ul.ie/cobol/default.htm>. Due to a bug in the regexp engine of perl 5.6.x, this regexp is only available in version 5.8.0 and up.
- CQL
- Comments in the chess query language (CQL) start with a semicolon (;) and last till the end of the line. See <http://www.rbnn.com/cql/>.
- Crystal Report
- The formula editor in Crystal Reports uses comments that start with //, and end with the end of the line.
- Dylan
- There are two types of comments in Dylan. They either start with //, or are nested comments, delimited with /* and */. Under {-keep}, only $1 will be set, returning the entire comment. This pattern requires perl 5.6.0 or newer.
- ECMAScript
- The ECMAScript language has two forms of comments: comments that start with // and last till the end of the line, and comments that start with /* and end with */. If {-keep} is used, only $1 will be set, and set to the entire comment. JavaScript is Netscape's implementation of ECMAScript. See <http://www.ecma-international.org/publications/files/ecma-st/Ecma-262.pdf>, and <http://www.ecma-international.org/publications/standards/Ecma-262.htm>.
- Eiffel
- Eiffel comments start with --, and last till the end of the line.
- False
- In False, comments start with { and end with }. See <http://wouter.fov120.com/false/false.txt>.
- FPL
- The FPL language has two forms of comments: comments that start with // and last till the end of the line, and comments that start with /* and end with */. If {-keep} is used, only $1 will be set, and set to the entire comment.
- Forth
- Comments in Forth start with a backslash (\), and end with the end of the line. See also <http://docs.sun.com/sb/doc/806-1377-10>.
- Fortran
- There are two forms of Fortran. Free form Fortran has comments that start with !, and end at the end of the line. The pattern for this is given by $RE{comment}{Fortran}. Fixed form Fortran, which has been obsoleted, has comments that start with C, c or * in the first column, or with ! anywhere but the sixth column. The pattern for this is given by $RE{comment}{Fortran}{fixed}. See also <http://www.cray.com/craydoc/manuals/007-3692-005/html-007-3692-005/>.
- Funge-98
- The esoteric language Funge-98 uses comments that start and end with a semicolon (;).
- fvwm2
- Configuration files for fvwm2 have comments starting with a # and lasting the rest of the line.
- Haifu
- Haifu, an esoteric language using haikus, has comments starting and ending with a comma (,). See <http://www.dangermouse.net/esoteric/haifu.html>.
- Haskell
- There are two types of comments in Haskell. They either start with at least two dashes, or are nested comments, delimited with {- and -}. Under {-keep}, only $1 will be set, returning the entire comment. This pattern requires perl 5.6.0 or newer.
- HTML
-
In HTML, comments only appear inside a comment declaration.
A comment declaration starts with a <!, and ends with a
>. Inside this declaration, we have zero or more comments.
Comments start with -- and end with --, and are optionally
followed by whitespace. The pattern $RE{comment}{HTML} recognizes
those comment declarations (and hence more than a comment).
Note that this is not the same as something that starts with
<!-- and ends with -->, because the following will
be matched completely:
<!-- First Comment -- --> Second Comment <!-- -- Third Comment -->
Do not be fooled by what your favourite browser thinks is an HTML comment. If {-keep} is used, the following are returned:
- $1
- captures the entire comment declaration.
- $2
- captures the MDO (markup declaration open), <!.
- $3
- captures the content between the MDO and the MDC.
- $4
- captures the (last) comment, without the surrounding dashes.
- $5
- captures the MDC (markup declaration close), >.
- Hugo
- There are two types of comments in Hugo. They either start with ! (which cannot be followed by a \), or are nested comments, delimited with !\ and \!. Under {-keep}, only $1 will be set, returning the entire comment. This pattern requires perl 5.6.0 or newer.
- Icon
- Icon has comments that start with # and end at the next newline. See <http://www.toolsofcomputing.com/IconHandbook/IconHandbook.pdf>, <http://www.cs.arizona.edu/icon/index.htm>, and <http://burks.bton.ac.uk/burks/language/icon/index.htm>.
- ILLGOL
- The esoteric language ILLGOL uses comments starting with NB and lasting till the end of the line. See <http://www.catseye.mb.ca/esoteric/illgol/index.html>.
- INTERCAL
- Comments in INTERCAL are single line comments. They start with one of the keywords NOT or N'T, and can optionally be preceded by the keywords DO and PLEASE. If both keywords are used, PLEASE precedes DO. Keywords are separated by whitespace.
- J
- The language J uses comments that start with NB. (including the period), and that last till the end of the line. See <http://www.jsoftware.com/books/help/primer/contents.htm>, and <http://www.jsoftware.com/>.
- Java
- The Java language has two forms of comments: comments that start with // and last till the end of the line, and comments that start with /* and end with */. If {-keep} is used, only $1 will be set, and set to the entire comment.
- JavaScript
- The JavaScript language has two forms of comments: comments that start with // and last till the end of the line, and comments that start with /* and end with */. If {-keep} is used, only $1 will be set, and set to the entire comment. JavaScript is Netscape's implementation of ECMAScript. See <http://www.mozilla.org/js/language/E262-3.pdf>, and <http://www.mozilla.org/js/language/>.
- LaTeX
- The documentation language LaTeX uses comments starting with % and ending at the end of the line.
- Lisp
- Comments in Lisp start with a semicolon (;) and last till the end of the line.
- LPC
- The LPC language has comments starting with /* and ending with */.
- LOGO
- Comments for the language LOGO start with a semicolon (;), and last till the end of the line.
- lua
- Comments for the lua language start with --, and last till the end of the line. See also <http://www.lua.org/manual/manual.html>.
- M, MUMPS
- In M (aka MUMPS), comments start with a semicolon, and last till the end of a line. The language specification requires the semicolon to be preceded by one or more linestart characters. Those characters default to a space, but that's configurable. This requirement of preceding the comment with linestart characters is not tested for. See <ftp://ftp.intersys.com/pub/openm/ism/ism64docs.zip>, <http://mtechnology.intersys.com/mproducts/openm/index.html>, and <http://mcenter.com/mtrc/index.html>.
- mutt
- Configuration files for mutt have comments starting with a # and lasting the rest of the line.
- Nickle
- The Nickle language has one line comments starting with # (like Perl), or multiline comments delimited by /* and */ (like C). Under {-keep}, only $1 will be set. See also <http://www.nickle.org>.
- Oberon
- Comments in Oberon start with (* and end with *). See <http://www.oberon.ethz.ch/oreport.html>.
- Pascal
- There are many implementations of Pascal. This module provides patterns for comments of several implementations.
The pattern $RE{comment}{Pascal} recognizes comments according to the Pascal ISO standard. This standard says that comments start with either { or (*, and end with } or *). This means that {*) and (*} are considered to be comments. Many Pascal applications don't allow this. See <http://www.pascal-central.com/docs/iso10206.txt>.
The Alice Pascal compiler accepts comments that start with { and end with }. Comments are not allowed to contain newlines. The pattern for this is given by $RE{comment}{Pascal}{Alice}. See <http://www.templetons.com/brad/alice/language/>.
The Delphi Pascal, Free Pascal and the Gnu Pascal Compiler implementations of Pascal all have comments that either start with // and last till the end of the line, are delimited with { and }, or are delimited with (* and *). Patterns for those comments are given by $RE{comment}{Pascal}{Delphi}, $RE{comment}{Pascal}{Free} and $RE{comment}{Pascal}{GPC} respectively. These patterns only set $1 when {-keep} is used, which will then include the entire comment. See <http://info.borland.com/techpubs/delphi5/oplg/>, <http://www.freepascal.org/docs-html/ref/ref.html> and <http://www.gnu-pascal.de/gpc/>.
The Workshop Pascal compiler, from SUN Microsystems, allows comments that are delimited with { and }, delimited with (* and *), delimited with /* and */, or starting and ending with a double quote ("). The pattern for this is given by $RE{comment}{Pascal}{Workshop}. When {-keep} is used, only $1 is set, and returns the entire comment. See <http://docs.sun.com/db/doc/802-5762>.
- PEARL
- Comments in PEARL start with a ! and last till the end of the line, or start with /* and end with */. With {-keep}, $1 will be set to the entire comment.
- PHP
- Comments in PHP start with either # or // and last till the end of the line, or are delimited by /* and */. With {-keep}, $1 will be set to the entire comment.
- PL/B
- In PL/B, comments start with either a period (.) or a semicolon (;), and end with the next newline. See <http://www.mmcctech.com/pl-b/plb-0010.htm>.
- PL/I
- The PL/I language has comments starting with /* and ending with */.
- PL/SQL
- In PL/SQL, comments either start with -- and run till the end of the line, or start with /* and end with */.
- Perl
- Perl uses comments that start with a #, and continue till the end of the line.
- Portia
- The Portia programming language has comments that start with //, and last till the end of the line.
- Python
- Python uses comments that start with a #, and continue till the end of the line.
- Q-BAL
- Comments in the Q-BAL language start with ` (a backtick), and continue till the end of the line.
- QML
- In QML, comments start with # and last till the end of the line. See <http://www.questionmark.com/uk/qml/overview.doc>.
- R
- The statistical language R uses comments that start with a # and end with the following newline. See <http://www.r-project.org/>.
- REBOL
- Comments for the REBOL language start with a semicolon (;) and last till the end of the line.
- Ruby
- Comments in Ruby start with # and last till the end of the line.
- Scheme
- Scheme comments start with a semicolon (;), and last till the end of the line. See <http://schemers.org/>.
- shell
- Comments in various shells start with a # and end at the end of the line.
- Shelta
- The esoteric language Shelta uses comments that start and end with a semicolon (;). See <http://www.catseye.mb.ca/esoteric/shelta/index.html>.
- SLIDE
- The SLIDE language has two forms of comments. First there is the line comment, which starts with a # and includes the rest of the line (just like Perl). Second, there is the multiline, nested comment, which is delimited by (* and *). Under {-keep}, only $1 is set, and is set to the entire comment. This pattern needs at least Perl version 5.6.0. See <http://www.cs.berkeley.edu/~ug/slide/docs/slide/spec/spec_frame_intro.shtml>.
- slrn
- Configuration files for slrn have comments starting with a % and lasting the rest of the line.
- Smalltalk
- Smalltalk uses comments that start and end with a double quote (").
- SMITH
- Comments in the SMITH language start with a semicolon (;), and last till the end of the line.
- Squeak
- In the Smalltalk variant Squeak, comments start and end with a double quote ("). Double quotes can appear inside comments by doubling them.
- SQL
- Standard SQL uses comments starting with two or more dashes, and ending at the end of the line. MySQL does not follow the standard. Instead, it allows comments that start with a #, or with two dashes followed by a space, and end with the following newline, as well as comments starting with /* and ending with the next ; or */ that isn't inside single or double quotes. A pattern for this is returned by $RE{comment}{SQL}{MySQL}. With {-keep}, only $1 will be set, and it returns the entire comment.
- Tcl
- In Tcl, comments start with # and continue till the end of the line.
- TeX
- The documentation language TeX uses comments starting with % and ending at the end of the line.
- troff
- The document formatting language troff uses comments starting with \", and continuing till the end of the line.
- vi
- In configuration files for the editor vi, one can use comments starting with a double quote ("), and ending at the end of the line.
- *W
- In the language *W, comments start with ||, and end with !!.
- zonefile
- Comments in DNS zonefiles start with a semicolon (;), and continue till the end of the line.
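The variant patterns and the five HTML captures described in the list above can be tried out with a short sketch like the one below. The sample strings and variable names are invented for illustration; the pattern names $RE{comment}{Pascal}{Delphi} and $RE{comment}{HTML} are the ones given in the entries above.

```perl
use strict;
use warnings;
use Regexp::Common qw /comment/;

# A variant is selected by adding one more key after the language name.
my $delphi = $RE{comment}{Pascal}{Delphi};

# Hypothetical sample lines; the last one contains no comment.
my @samples = (
    "x := 1; // line comment\n",
    "x := 1; { brace comment }",
    "x := 1; (* paren comment *)",
    "x := 1;",
);
my @with_comment = grep { /$delphi/ } @samples;
printf "%d of %d samples contain a Delphi comment\n",
       scalar @with_comment, scalar @samples;

# HTML comment declarations set five captures under {-keep}.
my $html_keep = $RE{comment}{HTML}{-keep};
my @html = '<p><!-- note --></p>' =~ /$html_keep/;
@html or die "no comment declaration found\n";

print "declaration:  [$html[0]]\n";   # $1: the entire comment declaration
print "MDO:          [$html[1]]\n";   # $2: markup declaration open
print "content:      [$html[2]]\n";   # $3: between the MDO and the MDC
print "last comment: [$html[3]]\n";   # $4: last comment, without the dashes
print "MDC:          [$html[4]]\n";   # $5: markup declaration close
```

The Delphi line comment needs its terminating newline in the input, since the pattern matches up to and including the end of the line.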
REFERENCES
- [Go 90]
- Charles F. Goldfarb: The SGML Handbook. Oxford: Oxford University Press. 1990. ISBN 0-19-853737-9. Ch. 10.3, pp 390-391.
HISTORY
$Log: comment.pm,v $ Revision 2.116 2005/03/16 00:00:02 abigail CQL, INTERCAL, R
Revision 2.115 2005/01/09 23:12:03 abigail BML comments
Revision 2.114 2004/12/18 11:43:06 abigail POD: HTML comments end in >, not <
Revision 2.113 2004/12/15 22:06:51 abigail Fixed regex for J comments
Revision 2.112 2004/06/09 21:44:48 abigail New languages
Revision 2.111 2003/09/24 08:39:35 abigail Stupid "syntax" warning issues false positives
Revision 2.110 2003/08/19 21:27:55 abigail Nickle language
Revision 2.109 2003/08/13 10:07:39 abigail Added patterns for C--, C#, Cg and SLIDE comments
Revision 2.108 2003/08/01 11:30:25 abigail Comments for 'QML' and 'PL/SQL'
Revision 2.107 2003/05/25 21:33:48 abigail POD nits from Bryan C. Warnock
Revision 2.106 2003/03/12 22:25:42 abigail - More generic setup to define comments for various languages. - Expanded and redid the documentation for comment.pm. - Comments for Advisor, Advsys, Alan, Algol 60, Algol 68, B, BASIC (mvEnterprise), Forth, Fortran (both fixed and free form), fvwm2, mutt, Oberon, 6 versions of Pascal, PEARL (one of the at least four...), PL/B, PL/I, slrn, Squeak.
Revision 2.105 2003/03/09 19:04:42 abigail - More generic setup to define comments for various languages. - Expanded and redid the documentation for comment.pm. Now every language has its own paragraph, describing its comment, and pointers to webpages. - Comments for Advisor, Advsys, Alan, Algol 60, Algol 68, B, BASIC (mvEnterprise), Forth, Fortran (both fixed and free form), fvwm2, mutt, Oberon, 6 versions of Pascal, PEARL (one of the at least four...), PL/B, PL/I, slrn, Squeak.
Revision 2.104 2003/02/21 14:48:06 abigail Crystal Reports
Revision 2.103 2003/02/11 09:39:08 abigail Added
Revision 2.102 2003/02/07 15:23:54 abigail Lua and FPL
Revision 2.101 2003/02/01 22:55:31 abigail Changed Copyright years
Revision 2.100 2003/01/21 23:19:40 abigail The whole world understands RCS/CVS version numbers, that 1.9 is an older version than 1.10. Except CPAN. Curse the idiot(s) who think that version numbers are floats (in which universe do floats have more than one decimal dot?). Everything is bumped to version 2.100 because CPAN couldn't deal with the fact one file had version 1.10.
Revision 1.19 2002/11/06 13:51:34 abigail Minor POD changes.
Revision 1.18 2002/09/18 18:13:01 abigail Fixes for 5.005
Revision 1.17 2002/09/04 17:04:24 abigail Q-BAL
Revision 1.16 2002/08/27 16:50:50 abigail Patterns for Beatnik, Befunge-98, Funge-98 and W*.
Revision 1.15 2002/08/22 17:04:03 abigail SMITH added
Revision 1.14 2002/08/22 16:41:25 abigail + Added function 'id' and 'from_to' with associated data. + Added function 'combine' for languages having multiple syntaxes. + Added 'Shelta'
Revision 1.13 2002/08/21 16:00:32 abigail beta-Juliet, Portia, ILLGOL and Brainfuck.
Revision 1.12 2002/08/20 17:40:37 abigail - Created a 'nested' function (simplified version from Regexp::Common::balanced). - Comments that use 'from' to eol or balanced (nested) delimiters are now generated from a data array. - Added Hugo and Haifu.
Revision 1.11 2002/08/05 12:16:58 abigail Fixed 'Regex::' and 'Rexexp::' typos to 'Regexp::' (Found my Mike Castle).
Revision 1.10 2002/07/31 23:33:16 abigail Documented that Haskell and Dylan comments need at least 5.6.0.
Revision 1.9 2002/07/31 23:12:29 abigail Dylan and Haskell comments can be nested, hence version 5.6.0 of Perl is needed to be able to make a regex matching them.
Revision 1.8 2002/07/31 14:48:16 abigail Added LOGO (to please petdance)
Revision 1.7 2002/07/31 13:06:41 abigail Dealt with -keep for Haskell and Dylan.
Revision 1.6 2002/07/31 00:54:00 abigail Added comments for Haskell, Dylan, Smalltalk and MySQL.
Revision 1.5 2002/07/30 16:38:23 abigail Added support for the languages: LaTeX, Tcl, TeX and troff.
Revision 1.4 2002/07/26 16:48:12 abigail Simplied datastructure for the languages that use single line comments.
Revision 1.3 2002/07/26 16:37:20 abigail Added new languages: Ada, awk, Eiffel, Java, LPC, PHP, Python, REBOL, Ruby, vi and zonefile.
Revision 1.2 2002/07/25 22:37:44 abigail Added 'use strict'. Added 'no_defaults' to 'use Regex::Common' to prevent loaded of all defaults.
Revision 1.1 2002/07/25 19:56:07 abigail Modularizing Regexp::Common.
SEE ALSO
Regexp::Common for a general description of how to use this interface.
AUTHOR
Damian Conway (damian@conway.org)
MAINTENANCE
This package is maintained by Abigail (regexp-common@abigail.nl).
BUGS AND IRRITATIONS
Bound to be plenty.
For a start, there are many common regexes missing. Send them in to regexp-common@abigail.nl.
COPYRIGHT
Copyright (c) 2001 - 2003, Damian Conway. All Rights Reserved. This module is free software. It may be used, redistributed and/or modified under the terms of the Perl Artistic License (see http://www.perl.com/perl/misc/Artistic.html)