man Language::INTERCAL () - INTERCAL compiler for Perl

NAME

Language::INTERCAL - INTERCAL compiler for Perl

SYNOPSIS

    use Language::INTERCAL 'program', '(program text)';

    program();

DESCRIPTION

Language::INTERCAL compiles an INTERCAL program and creates a subroutine in the caller's package containing the object code. The Perl syntax is shown in SYNOPSIS; for the INTERCAL program text, refer to the INTERCAL-72 documentation (or the on-line reference at http://www.assurdo.com/INTERCAL/reference/). Please note that, for compatibility with INTERCAL-72, the program text is assumed to be in EBCDIC, unless there is evidence that the text is ASCII. The functions described in Charset::EBCDIC might be found useful.

If one does not want to compile the INTERCAL programs at Perl's compile-time, the following syntax is also allowed:

    use Language::INTERCAL;

    compile Language::INTERCAL 'program', '(program text)';

Both compile and use also accept a third parameter, a filehandle. If this is given, a program listing (in approximate ASCII, with every constant, variable number, and label translated into Roman numerals) is produced.

To run the program, just call it as a normal subroutine. It accepts up to two optional parameters, which specify input and output redirection. If the two parameters are omitted, it uses STDIN and STDOUT; if just the second is omitted, it uses the input provided and STDOUT. The redirection arguments can be filehandles, which are used in the obvious way, or code references. If a code reference is used for input, it is expected to return one line each time it is called without parameters, and one block when it is called with one parameter (the block size). If a code reference is used for output, it is expected to just send all its arguments to the appropriate output, just like Perl's print would do.

There is one minor restriction to the language, and several extensions. The construct DO (label) NEXT is not supported. Too many people have been tempted into using it as a replacement for GO TO. The parser recognises it, but you get a runtime error (see DIAGNOSTIC). The constructs DO RESUME and DO FORGET also cause an error, as they are meaningless without DO NEXT.

SEPARATE COMPILATION AND DIFFERENT BACK ENDS

The compile subroutine and the import mechanism via use simply call the various parts of the compiler in the correct sequence. As of version 0.02, the program is allowed finer control of the compilation process, and different back ends are supported. First, the program must be parsed:

    use Language::INTERCAL;

    my $parse_tree = parse Language::INTERCAL '(program text)';

The $parse_tree returned is a Perl object with methods to continue the compilation process. One method runs the optimiser on the parse tree. Another joins different parse trees together. A third runs the compiler back end identified by TYPE; LIST is passed to the back end together with the parse tree. If the optimiser has been called on the parse tree, a second-stage optimiser is automatically called at this point.

The compiler comes with three back ends. Two of them produce Perl code, one produces INTERCAL source code (it is used by sick). The default back end, Perl, requires one string as argument and produces a subroutine with that name in the caller's name space. Calling that subroutine will run the program. The other Perl back end provided, PerlText, requires one string as argument and produces a new file with the Perl code. To run the code, simply use:

    perl progname [input_file [output_file]]

The last back end produces an INTERCAL program. It can be used to disassemble compiler objects, to pretty-print programs, or just as a very expensive no-op. If you optimise your program, and look at the differences between the source code and the output of this back end, you can learn something about the way the optimiser works. This back end requires one string as argument, the file name.

The procedure to create different back ends is currently undocumented, but nobody stops you looking at the two back ends provided to see how they work. Note that the interface (and most of the compiler's internals) has changed between 0.04 and 0.05.

There is a program oo, ick (note the embedded space and comma in the name), which will perform some or all parts of the compilation process, and can save parse trees to a file and reread them in a subsequent invocation.

Another program, sick, disassembles compiler objects (as saved by oo, ick -c) and produces INTERCAL source. This is similar to calling the INTERCAL back end, except that sick is able to read compiler objects produced by older versions of CLC-INTERCAL - if you have such objects and the source code is not available, this is the only way to convert them for use with this version of CLC-INTERCAL.

INPUT/OUTPUT

The first extension is in the input/output mechanism. There are three types of input/output supported, as compared to just one in INTERCAL-72:

Numeric I/O
This is the standard INTERCAL-72 input/output. When writing in from the input, it reads one card which must contain a number, spelled out in full in English (the input must be in EBCDIC, of course). When reading out to the output, the number is transformed into Roman numerals, with a slight modification: a lower case numeral is considered to be multiplied by 1000, and an underlined one by 1,000,000 (combining the two, a lower case underlined numeral is multiplied by 1,000,000,000). For example, IV is 4, iv is 4000, underlined IV is 4,000,000 and underlined iv is 4,000,000,000. There is an exception in that numbers between 1000 and 3999 are represented with M for 1000, not i, so MMMII is 3002 and MIMI with its first two numerals underlined is 1,001,001,001. This is almost, but not quite, identical to INTERCAL-72 and just slightly different from C-INTERCAL. Romans used overline to multiply by 1000, double overline to multiply by 1,000,000 and so on. INTERCAL-72 used lower case to multiply by 1000 and overline to multiply by 1,000,000. We think our own notation will be self-explanatory (which is precisely why we have spent so many words on it).
Alphanumeric I/O
This is the kind of input/output the compiler itself uses. The input is assumed to consist of a sequence of 80-character records, encoded in EBCDIC (these records are also known as punched cards, or simply cards). It is converted to Baudot code while reading, if possible. Characters which cannot be represented in Baudot will cause a runtime error. Because Baudot cannot properly represent upper- and lower-case letters, we use a slightly non-standard version, in which requesting to shift to the letters set from itself will cause a case change. Also, requesting to shift to the figures set from itself will shift to a set of special symbols. In the output, the opposite conversion is applied, and the product is one line intended to be sent to an ASCII line printer. Please see Charset::Baudot for a description of the extended Baudot code. The alphanumeric i/o has the same syntax as the numeric i/o, but uses 16 bit arrays instead of numeric variables.
Binary I/O
This is the simplest, yet most powerful, form of input/output. Its syntax is identical to the numeric i/o, but uses 32 bit arrays instead of numeric variables. The input is assumed to be a stream of bytes, which is not interpreted in any way. The number of bytes written in from the input is the same as the size of the array specified. A value #172 is inserted just before the first value, then a simple algorithm is applied to each pair of consecutive bytes as follows: first the left-hand side byte is used to select the right-hand side; this value is extended to 16 bits by padding with a random value; then the complement of the left-hand side byte selects the complement of the right-hand side byte; again, the value is random-extended to 16 bits; the two values are then interleaved. It is clear that all the bits of the right-hand side byte are present in the result, and their order is changed in a predictable way, so no information is lost. This is the value which gets written in at the appropriate place in the array. For output, the reverse algorithm is applied, and the resulting value is chopped to 8 bits, to obtain a sequence of bytes. Note that a formerly undocumented compiler directive allows the standard initial value of 172 to be changed. Programs have been known to stop working if you use this compiler directive.
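
The claim that no information is lost in binary I/O can be checked with a small model. The Python sketch below is not taken from the compiler; it assumes that "the left-hand side byte is used to select the right-hand side" means the left byte acts as the mask of an INTERCAL-style select, and it leaves out the random padding and the final interleave, which only add bits. The function names are hypothetical.

```python
def select(value, mask, width=8):
    """INTERCAL-style select: keep the bits of value where mask has a 1,
    packed towards the least significant end, order preserved."""
    packed = 0
    out_pos = 0
    for bit in range(width):
        if (mask >> bit) & 1:
            packed |= ((value >> bit) & 1) << out_pos
            out_pos += 1
    return packed


def encode_pair(left, right):
    """Assumed reading of the text: left selects right, then the complement
    of left selects the complement of right."""
    first = select(right, left)
    second = select(~right & 0xFF, ~left & 0xFF)
    return first, second


def decode_pair(left, first, second):
    """Rebuild the right-hand byte from the left-hand byte and both halves."""
    right = 0
    for bit in range(8):
        if (left >> bit) & 1:
            # This position was kept by the first select, unchanged.
            right |= (first & 1) << bit
            first >>= 1
        else:
            # This position was kept by the second select, complemented.
            right |= ((second & 1) ^ 1) << bit
            second >>= 1
    return right


# The first byte of a stream is paired with the initial value 172.
halves = encode_pair(172, 0xFF)
```

Under these assumptions every right-hand byte can be recovered once the left-hand byte is known, which is why the first byte needs the fixed initial value.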

COMPUTED COME FROM

CLC-INTERCAL allows an expression instead of a label in the COME FROM statement. This will be resolved at run-time. It is an error for a program to have two COME FROM statements pointing at the same label. However, this condition is only checked when a statement with a label is executed, and an error is only reported if there are two COME FROMs pointing at that label. This might simplify programs.

The computed COME FROM can be used as a simple way to invoke subroutines, for example:

            DO REINSTATE (11)
    (10)    DO .1 <- #10
    (11)    DON'T COME FROM (1001)
            PLEASE ABSTAIN FROM (11)

    ...

    (1000)  DO COME FROM .1
    ...
    (1001)  DO .1 <- #0

It is clear that this will call subroutine (1000) as soon as it executes statement (10), because of the computed COME FROM. It is also clear that (1001) acts as a RETURN statement. Because statement (11) is normally ABSTAINed, it is not an error to have more than one call to this subroutine, as there won't be more than one active COME FROM statement pointing at (1001). However, the statement must be REINSTATEd just before the call, and ABSTAINed immediately after, to avoid runtime errors.
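
The run-time check on COME FROM can be pictured with a small Python model. This is only a sketch of the rules as described here, not the runtime's code: each COME FROM is a hypothetical record holding the label of the statement it sits on, a function computing its current target, and an abstained flag. The check runs only when a labelled statement has just been executed.

```python
class ExcessiveAttraction(Exception):
    """Models runtime error 555 (EXCESSIVE ATTRACTION)."""


def attracted_by(label, come_froms):
    """Return the label of the COME FROM that takes over after `label`,
    or None if no active COME FROM currently points at it."""
    active = [
        cf["at"]
        for cf in come_froms
        if not cf["abstained"] and cf["target"]() == label
    ]
    if len(active) > 1:
        raise ExcessiveAttraction(f"label ({label}) attracted by {active}")
    return active[0] if active else None


# The subroutine example above: (1000) is a computed COME FROM .1,
# (11) is a COME FROM (1001) which is normally ABSTAINed.
registers = {".1": 0}
come_froms = [
    {"at": 1000, "target": lambda: registers[".1"], "abstained": False},
    {"at": 11, "target": lambda: 1001, "abstained": True},
]

registers[".1"] = 10                    # statement (10): DO .1 <- #10
call = attracted_by(10, come_froms)     # control moves to (1000)
idle = attracted_by(1001, come_froms)   # (11) is abstained: nothing happens
come_froms[1]["abstained"] = False      # DO REINSTATE (11)
ret = attracted_by(1001, come_froms)    # control returns to (11)
```

A second, un-abstained call site pointing at (1001) would make the last lookup raise, which is the situation the REINSTATE and ABSTAIN pair is there to avoid.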

Note that recursive calls are a bit tricky, but nothing so difficult that an experienced programmer can't come up with an elegant solution in five or six months.

See the description of object-oriented features below for a better way to encode subroutines, which also allows recursion in a simple way.

GRAPH DATA STRUCTURES

It is possible to impose a graph structure between registers of the same type. This can be used to build complex data structures, or simply to make the program less readable. These data structures are implicitly represented by defining the BELONGS TO relationship between registers. In the simplest case of a tree structure, every leaf belongs to some other node, which in turn belongs to another node, and eventually some node will belong to the root. For more complex structures the idea is similar. To define that (for example) .1 belongs to .2, .1 must stop being a free register and .2 becomes its owner:

    PLEASE ENSLAVE .1 TO .2

If the BELONGS TO relationship breaks, perhaps because the item escaped, we need to inform the compiler:

    DO FREE ,1 FROM ,3

It is not an error if a register BELONGS TO more than one register, or even if it BELONGS TO itself. It is confusing, but is not an error.

If you know a register and want its owner, just prefix the register with the big-money ($) symbol. If the register has several owners, the second owner can be found by prefixing with 2, the third owner with 3, and so on until the ninth owner. These prefixes can be repeated. For example, $23,1 means the third owner of whatever register is the second owner of ,1's owner. Thus, if ,1 is enslaved to ,9, then $23,1 is equivalent to 23,9.

An example might make the last point clearer. Suppose we have:

    PLEASE ENSLAVE ,1 TO ,9
    DO ENSLAVE ,1 TO ,7
    DO ENSLAVE ,1 TO ,5
    DO ENSLAVE ,9 TO ,2
    DO ENSLAVE ,9 TO ,3
    DO ENSLAVE ,9 TO ,1
    PLEASE ENSLAVE ,3 TO ,9
    DO ENSLAVE ,3 TO ,7
    DO ENSLAVE ,3 TO ,2

Then $23,1 is equivalent to ,2 (because ,1's owner is ,9, so $23,1 is the same as 23,9; now ,9's owners are ,2, ,3, and ,1, so the second owner is ,3; hence 23,9 is the same as 3,3; finally, ,3's third owner is ,2).
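
The same walk can be written as a short Python model. It is a sketch only: it assumes, as the example above suggests, that owners are numbered in the order of the ENSLAVE statements and that prefixes are consumed from left to right. The names enslave and resolve are hypothetical.

```python
owners = {}  # register name -> list of owners, in ENSLAVE order


def enslave(slave, owner):
    """Model of ENSLAVE slave TO owner."""
    owners.setdefault(slave, []).append(owner)


def resolve(prefixes, register):
    """Follow a string of owner prefixes ('$' is the first owner,
    '2' to '9' the second to ninth) starting from register."""
    for prefix in prefixes:
        index = 1 if prefix == "$" else int(prefix)
        known = owners.get(register, [])
        if index > len(known):
            raise LookupError(f"{register} has no owner number {index}")
        register = known[index - 1]
    return register


# The ENSLAVE statements from the example above.
for slave, owner in [
    (",1", ",9"), (",1", ",7"), (",1", ",5"),
    (",9", ",2"), (",9", ",3"), (",9", ",1"),
    (",3", ",9"), (",3", ",7"), (",3", ",2"),
]:
    enslave(slave, owner)

result = resolve("$23", ",1")
```

Here result is ",2", matching the step-by-step explanation.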

A future version will allow the ownership relation to be followed in a dynamic way. However, we haven't decided on a syntax for it yet.

OBJECT-ORIENTED FEATURES

After days (or perhaps minutes) of hard thought, we have created an object model for INTERCAL which is consistent with the philosophy of the language, and requires only a minimal change in the syntax. Older programs will continue to run unchanged if they do not use any object-oriented features.

To understand the object model, we need to consider the concept of class. A class is a place where one goes to learn something. More precisely, a class is a place where one can learn about a number of subjects. To define a class, we simply need to state which subjects one can learn and where in the program one can find the lectures. A lecture looks like a normal subroutine, and starts at a label (which must be at least 1000; we do not allow lectures before 1000 because people can be asleep). We shall describe the lectures after we have described the students; for now it suffices to say that if lecture (1000) teaches about subject #12 then we can associate this teaching with a particular class, say @36, with:

    DO STUDY #12 AT (1000) IN CLASS @36

Here @ (whirlpool) is used to identify a new type of register, the class register. These cannot be used as normal registers, but they can participate in the graph structure defined by BELONGS TO. This is important for lectures.

Suppose now that register :7 is a student of class @36. This register can decide to go to a lecture about subject #12 with the statement:

    DO :7 LEARNS #12

This causes the runtime to look up which class(es) the student :7 attends, and, if one of them has a lecture for #12, the corresponding subroutine is called. During the lecture, the class register (for example @36) is enslaved to the student (:7), so it is easy to refer to the student during the lecture, for example using $@36. The lecture ends when the statement:

    DO FINISH LECTURE

is executed. At this point, program execution resumes after the corresponding LEARNS, and the class register's BELONGS TO relation is restored to its value before the start of the lecture. Note that this means that the student's BELONGS TO state cannot be changed during a lecture. This restriction might be removed in a future release.

To complete the discussion of the object model, we need to see how a register becomes a student of some class. This is done by asking which class can teach a subject:

    DO ENROL :7 TO LEARN #12

It is possible to demand more than one subject, for example:

    DO ENROL :7 TO LEARN #12 + #42 + #1536

If the runtime can find a class which teaches all the required subjects, the register is marked as a student of that class. If no such class can be found, the runtime complains that THIS MUST BE A HOLIDAY. If there is more than one candidate class, you get the error message CLASS WAR BETWEEN @2 and @36. Specify more subjects to identify the class uniquely.
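
The class lookup performed by ENROL can be modelled in a few lines of Python. This is a sketch of the rules as stated here, not of the runtime's actual code; the class table, the lecture labels (1010), (1020) and (2000), and the names enrol, Holiday and ClassWar are made up for illustration.

```python
class Holiday(Exception):
    """No class teaches all the requested subjects."""


class ClassWar(Exception):
    """More than one class teaches all the requested subjects."""


def enrol(classes, subjects):
    """Return the single class teaching every subject in subjects.

    classes maps a class register name to a dict of subject -> lecture label,
    as built up by statements like DO STUDY #12 AT (1000) IN CLASS @36.
    """
    wanted = set(subjects)
    candidates = sorted(
        name for name, lectures in classes.items() if wanted <= set(lectures)
    )
    if not candidates:
        raise Holiday("THIS MUST BE A HOLIDAY")
    if len(candidates) > 1:
        raise ClassWar("CLASS WAR BETWEEN " + " AND ".join(candidates))
    return candidates[0]


# Hypothetical class table: @36 teaches three subjects, @2 only one of them.
CLASSES = {
    "@36": {12: 1000, 42: 1010, 1536: 1020},
    "@2": {12: 2000},
}

chosen = enrol(CLASSES, [12, 42, 1536])
```

Asking for #12 alone is ambiguous between @2 and @36 in this table, so the model raises ClassWar; adding #42 to the request settles it, as the text recommends.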

Finally, it is possible to remove any association between a student and its classes, so it is a normal register again. This is done with:

    DO :7 GRADUATES

We can't resist pointing out that the statement:

    PLEASE ABSTAIN FROM STUDYING + GRADUATING

is syntactically valid, but might not prove very popular in universities. It also makes it impossible to define new classes until the corresponding REINSTATE.

QUANTUM INTERCAL

Versions 0.04 and later of CLC-INTERCAL provide support for quantum computers. They allow you to create quantum bits (both true and false at the same time) with the statements ABSTAIN, REINSTATE, IGNORE and REMEMBER, by allowing these statements and their opposites to be executed simultaneously. For example:

    PLEASE ABSTAIN FROM (1) WHILE REINSTATING IT
    PLEASE REINSTATE COMING FROM + NEXTING WHILE ABSTAINING FROM THEM
    DO IGNORE .1 + .2 WHILE REMEMBERING THEM
    DO REMEMBER :1 WHILE IGNORING IT

See the reference manual for more information.

OPERAND OVERLOADING

Starting with CLC-INTERCAL 0.05, operand overloading is supported. Any expression can contain an overloading subexpression, which alters the meaning of a register (or a range of registers) while returning the register's original value. For example:

    PLEASE :1 <- :2 / .3 ¢ .4

This assigns to :1 the original value of :2, while at the same time asking that, from now on, any reference to :2 be replaced with the expression .3 ¢ .4.

As a result of this overloading, the statements:

    DO READ OUT :2
    DO :1 <- :2

will respectively read out, and assign to :1, the result of calculating .3 ¢ .4.

The replacement applies everywhere, except within the expression itself (this is just to avoid loops). For example:

    DO :2 <- #123 ¢ #456

will result in:

    DO .3 <- #123
    DO .4 <- #456

Within the overloaded expression, @0 will be enslaved to the original register (:2 in all these examples).

The range overloading takes a 32 bit number and an expression; the number is considered the interleave of two 16 bit numbers, and the overloading is applied to all registers with numbers between the two. For example:

    DO .1 <- '#1 ¢ #4' \ #0

will overload .1 to .4 and :1 to :4 with the constant 0.
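
To see how one 32 bit number can carry both ends of the range, here is a Python sketch of the interleave and its inverse. It assumes the usual INTERCAL-72 convention that the bits of the first operand land in the more significant position of each pair, and that the range is inclusive at both ends, as the example suggests. The function names are hypothetical.

```python
def mingle(first, second):
    """Interleave two 16 bit values into one 32 bit value."""
    result = 0
    for bit in range(16):
        result |= ((first >> bit) & 1) << (2 * bit + 1)
        result |= ((second >> bit) & 1) << (2 * bit)
    return result


def unmingle(value):
    """Split a 32 bit value back into the two interleaved 16 bit values."""
    first = second = 0
    for bit in range(16):
        first |= ((value >> (2 * bit + 1)) & 1) << bit
        second |= ((value >> (2 * bit)) & 1) << bit
    return first, second


def overloaded_numbers(value):
    """Register numbers covered by a range overload (assumed inclusive)."""
    low, high = unmingle(value)
    return list(range(low, high + 1))


encoded = mingle(1, 4)                 # the '#1 ¢ #4' of the example
covered = overloaded_numbers(encoded)  # register numbers 1 to 4
```

In this model '#1 ¢ #4' evaluates to 18, and splitting 18 again yields the pair 1 and 4, so the overload covers register numbers 1, 2, 3 and 4.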

See the reference manual for more information.

OTHER EXTENSIONS

Version 0.05 and subsequent of CLC-INTERCAL support a C-unlike postprocessor (that's the opposite of a C-like preprocessor, well not quite), loop constructs, event-based programming, and exception handling.

FORMERLY UNDOCUMENTED COMPILER AND RUNTIME DIRECTIVES

There are several directives which can be used to alter the behaviour of the compiler or the runtime library. These only apply to the runtime if the Perl backend is used. If you use a different backend, see the documentation which comes with it.

All compiler and runtime directives are selected by calling the subroutines fudge, fiddle, or toggle. These are all equivalent to each other, and are not exported or exportable. Each element in the argument list is processed separately; the unrecognised ones are silently ignored.

mingle
Switches between two representations for the interleave operator in the program listings produced by the parser. Default is to represent it with c-overstrike-/, the other representation uses a change symbol but only works if the output font is ISO-8859-1.
xor
Switches between two representations for the xor operator in the program listings produced by the parser. Default is to represent it with V-overstrike-worm, the other representation uses a yen symbol which is the closest approximation we could find to the xor symbol in ISO 8859-1.
next
Toggles the acceptance of obsolete statements. The parser always accepts them, and by default they cause a runtime error. By toggling this, the runtime error is removed. The runtime error can also be avoided with a command-line option (--obsolete).
roman
Switches between two representations for Roman numerals greater than or equal to 1,000,000. Default is to prefix them with a backslash; the other representation underlines the numerals (prefixes them with flatworm-overstrike). This applies to program listings as well as any numeric output produced by the Perl runtime.
width=\d+
Selects the width of the program listing produced by the parser. Default is width=79.
io=\d+
Selects the initial value used for binary input/output. Default is io=172. The default is recorded in the compiled program. When programs are linked, the first program's default applies.
bug=\d+
Alters the probability of the random compiler error. The compiler error is introduced inside a double-oh-seven with the corresponding probability. If the probability is zero, there is a very small probability of it occurring anyway: if this happens, it is called an unexplainable compiler error.
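
The Roman numeral notation touched by the roman directive, and described under Numeric I/O, can be sketched in Python as follows. This is an illustration of the rules as written in this page, not the runtime's code. It assumes the default backslash representation puts one backslash in front of every numeral worth 1,000,000 or more, and that the M exception applies separately below and above one million; the text does not say what is printed for zero, so the sketch simply returns an empty string.

```python
ROMAN = [
    (1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
    (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
    (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I"),
]


def roman_below_4000(n):
    """Classic upper case Roman numeral for 0 <= n <= 3999 ('' for 0)."""
    out = []
    for value, symbol in ROMAN:
        while n >= value:
            out.append(symbol)
            n -= value
    return "".join(out)


def roman_below_million(n):
    """0 <= n < 1,000,000: lower case means 'times 1000', except that
    numbers up to 3999 keep the traditional M for 1000."""
    if n < 4000:
        return roman_below_4000(n)
    thousands, rest = divmod(n, 1000)
    return roman_below_4000(thousands).lower() + roman_below_4000(rest)


def read_out(n, million_prefix="\\"):
    """Numeral for a 32 bit value; numerals multiplied by 1,000,000 are
    marked with million_prefix (an assumption about the listing format)."""
    if not 0 <= n <= 0xFFFFFFFF:
        raise ValueError("value does not fit in 32 bits")
    millions, rest = divmod(n, 1_000_000)
    high = "".join(million_prefix + ch for ch in roman_below_million(millions))
    return high + roman_below_million(rest)
```

With these assumptions read_out(3002) gives MMMII and read_out(4000) gives iv, while 4,000,000 and 4,000,000,000 come out as the same two numerals, each preceded by a backslash.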

DIAGNOSTIC

In the likely case of an error, the compiler can, at its own discretion, decide to continue compiling the program and introduce a runtime error, or abort the compilation process with a self-explanatory error message, which usually starts with a 3 digit error code. When introducing a runtime error, the statement is marked with a splat (*) followed by the 3 digit error code, but no message is produced by the compiler. The back end and the runtime can decide what to do in this case.

COMPILER INDUCED RUNTIME ERRORS

The following errors are marked with a splat in the listing and transformed into runtime errors:

000 (UNIDENTIFIED STATEMENT)
Any error for which a more specific code has not been dreamed up.
072 (INVALID LABEL)
A label number is 0 or greater than 65535, or contains things which cannot be imagined to be numbers.
081 (MISSING PLEASE/DO)
A statement starts with something which we can't quite pretend looks like a PLEASE or a DO.
084 (INVALID GERUND)
Something like PLEASE ABSTAIN FROM WRITHING would give this splat. A statement starts with a register identifier and possibly SUB and a few expressions, but there is no "<-" after that.
100 (ILLEGAL NUMBER AFTER 007)
The number after % (double-oh-seven) is zero or greater than 100. If a number is not given, an error code 0 is used instead, until we get a better idea. One of the required parts of an ENSLAVE is damaged. The register is left free. Some paperwork is missing. The register is still a slave.
199 (INVALID CONSTANT)
A constant is greater than 65535, or does not start with "#". Check the Oxford English Dictionary or any other dictionary produced in the UK. The university refuses the application.
218 (POSTPROCESSOR ERROR)
Invalid template, or other problem, in a postprocessor directive.
241 (INVALID DIMENSIONS)
Using an array with too many or not enough subscripts, or with a subscript of zero or too large.
242 (INCOMPLETE WORM)
There is a +, but there isn't a - I wouldn't know, I never do it.
299 (INVALID REGISTER)
A register number is 0 or greater than 65535.
398 (SPARK UNDERFLOW)
A subexpression starts with a spark and ends with none.
399 (MISSING RABBIT)
A subexpression starts with a set of rabbit ears but ends with none.
458 (REDEFINED LABEL)
A label and its twin brother have been found together in your program.
499 (POLITENESS ERROR)
Stop saying PLEASE that much. Or maybe you are not saying it enough.
999 (SORRY, WE DO NOT ALLOW LECTURES BEFORE 1000)
A STUDY statement uses a label less than (1000). This is not acceptable.

FATAL COMPILER ERRORS

The compiler commits suicide when it sees any of the following:

012 (I/O ERROR)
The back end cannot READ OUT the program to a file.
013 (SYNTAX ERROR)
The call from Perl to the compiler wasn't quite acceptable.
110 (O/I ERROR)
The program could not be WRITTEN INto the compiler.
111 (CHARACTER SET ERROR)
Compiler cannot guess character set. Or maybe it can, but doesn't support it.
666 (ERROR IN BLACK MAGIC)
All compiler objects contain some black magic. In this case, it must have worn out. Magic was not found while reading a compiler object from an .ipt file.

In addition, a 458 error (label redefined) might cause a fatal compiler error if detected during linking.

RUNTIME ERRORS

The following errors are generated by the runtime:

003 (TOO MANY COMMAND-LINE ARGUMENTS)
The compiled program is called with too many parameters. The maximum is usually 2, which is why the error code is 3.
012 (I/O ERROR)
You asked for output to a file, but something didn't work.
013 (SYNTAX ERROR)
The call from Perl to the compiler wasn't quite acceptable.
110 (O/I ERROR)
You asked for input from a file, but something didn't work.
111 (CHARACTER SET ERROR)
You asked to use a character set which is not supported.
129 (NO SUCH LABEL)
You can't DO that label NEXT, or claim that there is a lecture there.
241 (REGISTER USAGE ERROR)
Using an array or a class as a value, or some such nonsense. Or maybe doing something wrong with the subscripts.
275 (INCOMPATIBLE NUMBER OF SPOTS)
A 32 bit value has been assigned to a 16 bit register.
401 (OBSOLETE PROGRAM)
You attempted to use a statement we don't like.
436 (STASH HIDDEN TOO WELL)
An attempt has been made to RETRIEVE a register which wasn't STASHed.
456 (NO SPLAT)
You attempted to use *, but the program had not yet died of an error. It has now.
512 (OWNERSHIP MISMATCH)
Attempting to FREE somebody, you failed to correctly identify one of his owners. Also, attempting to find out owner information from a free register.
533 (33 BIT VALUE)
An INTERLEAVE operator produced a value which cannot fit in a two spot register.
555 (EXCESSIVE ATTRACTION)
More than one COME FROM attract the same label. If possible, this will be produced by the compiler. However, if there are computed COME FROMs or static ones which can be ABSTAINed FROM, it becomes a runtime error.
603 (CLASS WAR)
More than one class teaches the subjects listed. Be more precise.
621 (POINTLESS RESUME/FORGET)
The expression in a RESUME or FORGET evaluated to 0.
623 (FORGETTING TOO MUCH)
The program terminated via RESUME instead of GIVE UP.
633 (FALLING OFF THE EDGE OF THE PROGRAM)
The program executed past the last statement.
774 (COMPILER ERROR)
The Random Bug opcode has been executed. See method fiddle.
799 (NO SUCH LECTURE)
Cannot find a place to LEARN that subject.
801 (NOT IN A LECTURE)
Cannot FINISH a LECTURE if it hasn't started.
822 (NOT A STUDENT)
Need to ENROL before LEARNING and long before GRADUATING.
823 (COURSE MISMATCH)
You try to LEARN something but you didn't ENROL for the right class.
997 (SO MUCH TO SEE, SO LITTLE TIME)
An alphanumeric WRITE IN received more data than it can store. Note that the conversion to Baudot increases the data size because of the shift codes.

In addition, using non-constant labels can cause the runtime to generate errors 072 and 999 when it finds out.

NOTES

The C-INTERCAL character input/output is not supported. Use of bases different from 2 is not yet allowed. There is a formerly undocumented flag which will cause the runtime to accept DO NEXT, FORGET, and RESUME.

The two random compiler bugs are not currently implemented. These will be added in a future version.

There is currently no simple way to debug programs. This will also be addressed in a future release.

The expression syntax is extended to allow unary operators before the spot, two-spot or mesh. Thus #&12 is equivalent to &#12.

This module reimplements some of the functionality of some other modules (see Exporter, Carp, Reinventing the Wheel). This is intentional, as it will leave larger scope for obfuscation in a future release.

It is syntactically valid to say:

    no Language::INTERCAL;

in your program. Currently, this causes a fatal error because you can't get rid of INTERCAL. Tough.

BUGS

There are none. See next paragraph for evidence.

There might be minor modifications to the accepted language syntax, and minor perceived malfunctions. These modifications and perceived malfunctions come in two classes: compiler bugs and intentional restrictions. Any modification or perceived malfunction which is documented is by definition intentional. We consider the Perl source code part of the documentation, and, since the whole compiler is written in Perl, every modification and perceived malfunction is documented. Hence there are no bugs.

COPYRIGHT

This module is part of CLC-INTERCAL.

Copyright (c) 1999 by Claudio Calvelli <lunatic@assurdo.com>, all (f)rights reserved.

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.

SEE ALSO

A qualified psychiatrist.