The first time I combined the ideas I had been developing into a single entity, I was working on an IBM 1130, a "third-generation" computer. The result seemed so powerful that I considered it a "fourth generation computer language." I would have called it Fourth, except that the 1130 permitted only five-character identifiers. So Fourth became Forth, a nicer play on words anyway.
The first program to be called Forth was written in about 1970. The first complete implementation was used in 1971 on a DEC PDP-11 for the National Radio Astronomy Observatory's 11-meter radio telescope in Arizona. This system was responsible for pointing and tracking the telescope, collecting data and recording it on magnetic tape, and supporting an interactive graphics terminal on which an astronomer could analyse previously recorded data. The multi-tasking nature of the system allowed all these functions to be performed concurrently, without timing conflicts or other interference.
The system was so useful that astronomers from all over the world began asking for copies. Its use spread rapidly and in 1976 Forth was adopted as a standard language by the International Astronomical Union.
The success of this application enabled Moore and Elizabeth ("Bess") Rather to form "FORTH, Inc." in 1973, to explore commercial uses of the language. FORTH, Inc., developed multi-user versions of Forth on minicomputers for diverse projects ranging from data-bases to scientific applications such as image processing. Like the first application, these often required a mixture of various facilities.
In 1977 a version called "microFORTH" was developed for the newly introduced 8-bit microprocessors. This was complemented by their "miniFORTH" product for minicomputers. Later (in 1979) these systems were replaced by the "PolyForth" product, which has since become one of the largest-selling Forth systems on the market.
"microFORTH" was successfully used in embedded microprocessor applications in the United States, Europe, and Japan. The success of microFORTH led to the formation of the European Forth Users Group (EFUG). Later, in 1978, a group of computer hobbyists in Northern California formed the Forth Interest Group (FIG).
The members of FIG obtained a Forth system from an observatory. From this they developed a simple model which they implemented on several systems and (with permission from FORTH, Inc.) published listings and disks at very low cost. This model later became known as the FIG-Forth model. This action helped the rapid spread of interest in Forth. FIG now has 60 "chapters" in 15 countries.
Forth is often spoken of as a language because that is its most visible aspect. However, Forth is more than a conventional programming language in that all the capabilities normally associated with a large portfolio of separate programs (compilers, editors, assemblers, etc.) are included within its range. It is also less than a conventional programming language in its deliberate lack of the complex syntax characteristic of most high-level languages.
The original implementations of Forth were stand-alone systems that included functions normally performed by separate operating systems, editors, compilers, assemblers, debuggers and other utilities. A single, simple, consistent set of rules governed this range of capabilities. Today, although very fast stand-alone versions are still marketed for many processors, there are also many versions that run co-resident with conventional operating systems, such as MS-DOS and Unix.
Forth was not derived from any other language. As a result, its appearance and internal characteristics may seem unfamiliar to new users. But Forth's simplicity, extreme modularity, and interactive nature offset the initial strangeness, making it easy to learn and use. A new Forth programmer must invest some time mastering its large command repertoire. After a month or so of full-time use, the programmer could understand more of its internal workings than is possible with conventional operating systems and compilers.
The most unconventional feature of Forth is its extensibility. The programming process in Forth consists of defining new words, actually new commands in the language. These may be defined in terms of previously defined words, much as one teaches a child concepts by explaining them in terms of previously understood concepts. Such words are called "high level definitions." Alternatively, new words may also be defined in assembly code, since most Forth implementations include an assembler for the host processor.
As a result of this extensibility, developing an application has the collateral result of developing a special "application-oriented language" for that type of application which may be applied to a similar application or used to modify this one.
Forth's extensibility goes beyond just adding new commands to the language. With equivalent ease, one can also add new classes of words. That is, one may create a word which itself will define words. In creating such a defining word the programmer may specify a specialised behavior for the words it will create which will be effective at compile time, at run-time, or both. This capability allows one to define specialised data types, with complete control over both structure and behavior. Since the run-time behavior of such words may be defined either in high-level or in assembler, the words created by this new defining word are equivalent to all other kinds of Forth words in performance. The system will also allow one to add new "compiler directives" to implement special kinds of loops or other control structures, such as a CASE structure.
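As a minimal sketch of such a defining word, assuming a standard (ANS-style) Forth system: the names ARRAY and READINGS below are illustrative only and do not come from the original text.

```forth
\ ARRAY is a hypothetical defining word that creates named arrays of cells.
: ARRAY ( n "name" -- )
   CREATE CELLS ALLOT           \ when ARRAY runs: build a header, reserve n cells
   DOES> ( i -- addr )          \ when the new word runs: index -> element address
      SWAP CELLS + ;

10 ARRAY READINGS               \ define a new word, READINGS
42 3 READINGS !                 \ store 42 in element 3
3 READINGS @ .                  \ fetch element 3 and print it (42)
```

Every word created by ARRAY shares the run-time behavior given after DOES>, which is what makes it a new class of words rather than a single new command.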
Forth words are similar to subroutines in other languages. They are also equivalent to commands in other languages. Forth allows one to type a function name at the keyboard, and the function will then be executed. However, placing the function name in a definition will cause a reference to the function to be compiled.
High-level words are defined as a collection of other words. This can be thought of as a macro in other languages or as an English definition as given in a dictionary. The new word is then added to the store of words that can be used. Its definition is added to the dictionary of words. There are few characters that cannot be included in a word's name. Many programming groups adopt naming conventions, using punctuation characters, to improve readability.
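A small illustration of building high-level words from previously defined ones, assuming a standard Forth system. SQUARE and SUM-OF-SQUARES are example names chosen here, not words from the original text.

```forth
: SQUARE ( n -- n*n )  DUP * ;                 \ defined from the existing words DUP and *
: SUM-OF-SQUARES ( a b -- a*a+b*b )            \ defined from the new word SQUARE
   SQUARE SWAP SQUARE + ;

3 4 SUM-OF-SQUARES .                           \ typed at the keyboard: executes at once, prints 25
```

Typing SQUARE at the keyboard executes it, while using SQUARE inside SUM-OF-SQUARES compiles a reference to it.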
When a word is encountered, the dictionary is searched to discover the word's definition. The function associated with the word is either executed, or a reference is compiled into a new definition. If, however, the word cannot be found in the dictionary, the system will attempt to convert the word into a number. If it succeeds, the number is placed onto a parameter stack. If it fails to convert the word into a number, it will display the word and an error message, indicating that the word is unknown to the system.
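The lookup-then-convert sequence described above can be sketched in Forth itself. This is a simplified model rather than any system's actual interpreter: TOY-INTERPRET is a hypothetical name, it handles only the interpreting (not compiling) case, it accepts only unsigned numbers, and it assumes a system providing the standard words WORD, FIND, >NUMBER, D>S and THROW.

```forth
\ Interpret the rest of the current input line, one word at a time.
: TOY-INTERPRET ( i*x "rest of line" -- j*x )
   BEGIN
      BL WORD DUP C@                \ next space-delimited word; length 0 means end of line
   WHILE
      FIND IF
         EXECUTE                    \ found in the dictionary: execute it
      ELSE
         COUNT 0 0 2SWAP >NUMBER    \ not found: try to convert it to a number
         IF                         \ unconverted characters remain
            DROP 2DROP  -13 THROW   \ give up: report an undefined word
         THEN
         DROP D>S                   \ success: leave the number on the stack
      THEN
   REPEAT
   DROP ;

TOY-INTERPRET 2 3 + 4 * .
```

Here TOY-INTERPRET consumes the remainder of its own line, so the final line prints 20.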
Forth adheres to the principles of "structured programming":
Forth is characterised by five major elements:
According to a recent interview with the developers of the famous Infocom adventure games (Hitchhiker's Guide to the Galaxy, and others), their game interpreters were written in Forth.
Unison World produced over a dozen games for CP/M machines, all written in fig-Forth. According to Marc de Groot, their technical director, porting the Z80-based games to the 6502 and 6809 typically took less than three months.
Words are added to the dictionary by "defining words", the most common of which is : (colon). When : is executed, it constructs a dictionary entry for the word that follows it and enters "compilation" mode. There are many different compilation methods, the most common of which is "Threaded Code", where the definition consists of a list of addresses referencing previously defined words. The definition is terminated by ; (semicolon). Figure 1 shows the dictionary entry for the definition:
: NETWORK ( -- ) OPEN LINK TRXT. ECHO CLOSE LINK ;
NETWORK"). Finally a pointer to a routine called "(:)" is compiled into the dictionary as the first part of the definition. This is a pointer to some code that will perform the action necessary to interpret the body of the definition. This is not the only compilation technique, but it is the most popular. This technique is known as indirect threaded code, as the first entry in the definition is a reference to some code that knows how to interpret the rest of the definition.
The remainder of the definition is referred to as its body. In compilation mode the system will search for the head of each of the words in turn. The address of the head is placed into the body of the definition, thus producing a list of addresses. Finally, when the ; is reached, the address of a routine called "EXIT" is compiled into the definition. The EXIT routine is designed to return control to the invoking word, thus acting as a subroutine return.
Although the structure of both stacks is the same, they have very different uses. The user/programmer interacts most directly with the data stack, which holds the arguments being passed between words. This replaces parameter lists used by conventional languages. It is an efficient internal mechanism which makes definitions intrinsically re-entrant. The second stack is known as the return stack and is used to hold return addresses for nested definitions, although other kinds of data are occasionally held there temporarily.
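As a small illustration of both stacks in use, assuming a standard Forth system: arguments arrive and results leave on the data stack, while the return stack briefly holds a value via >R and R>. SCALE-PAIR is an example name chosen here.

```forth
\ Multiply both members of a pair by k, parking k on the return stack meanwhile.
: SCALE-PAIR ( a b k -- a*k b*k )
   DUP >R         \ keep one copy of k on the return stack
   *              \ a b*k
   SWAP R> *      \ b*k a*k   (k retrieved from the return stack)
   SWAP ;         \ a*k b*k

2 3 10 SCALE-PAIR . .     \ prints 30 20 (top of stack first)
```

Anything placed on the return stack inside a definition must be removed before the definition ends, since the return address lies beneath it.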
The use of the Data Stack (often called just "the stack") leads to a notation in which operands precede operators. This is a postfix notation often called RPN or "Reverse Polish Notation". The notation is based on the "Sentential Calculus" as developed by Professor Jan Lukasiewicz in the 1920s whilst working at Warsaw University (Lukasiewicz, 1963).
For an example let us take the word BLANK. This expects an address and count on the stack, placing the specified number of ASCII blanks into the region of memory starting at the given address. Thus:
PAD 25 BLANK
will fill the scratch region, whose address is placed on the stack by the word PAD, with 25 blanks. Application words are usually defined to work similarly. For example, a word might be defined to take the count 100 from the stack and record 100 measurements in a data array. Arithmetic operators also expect values and leave their results on the stack. For example, + adds the top two numbers on the stack, replacing them both by their sum. Since the results of operations are left on the stack, operations may be strung together without the need to define temporary storage variables. For example, the expression:
tempn + ((reading mod interval) / interval) * (tempn+1 - tempn)
becomes:
reading interval mod interval / tempn+1 tempn - * tempn +
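A runnable variant of this interpolation, assuming a standard Forth with integer arithmetic. With integers, (reading mod interval) / interval truncates to zero, so this sketch multiplies before dividing using the standard word */. The constant names and sample values are hypothetical, chosen only for illustration.

```forth
10 CONSTANT INTERVAL      \ spacing between table entries (sample value)
20 CONSTANT TEMP-N        \ table entry n (sample value)
30 CONSTANT TEMP-N+1      \ table entry n+1 (sample value)

: INTERPOLATE ( reading -- temp )
   INTERVAL MOD           \ offset of the reading within its interval
   TEMP-N+1 TEMP-N -      \ difference between the adjacent entries
   INTERVAL */            \ offset * difference / interval
   TEMP-N + ;             \ add the base entry

37 INTERPOLATE .          \ prints 27
```

No temporary variables are needed: each intermediate result simply waits on the stack for the next operator.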
The first is the text interpreter, which parses strings from the terminal (or mass storage) and looks each word up in the dictionary. When a word is found it is executed by invoking the second level, the address interpreter.
The second level is the "address interpreter." Although not all Forth systems are implemented in this way, it was the first and is still the primary implementation method. The address interpreter processes strings of addresses (or tokens) compiled in definitions created by : (colon), by executing the definition pointed to by each.
QUIT. This is the Forth interpreter or keyboard interpreter.
: QUIT ( -- ) BEGIN RESET QUERY INTERPRET AGAIN ;
RESET clears the stacks. QUERY waits for user entry from the keyboard (or reads a line from a mass storage device), supplying the commands that INTERPRET looks up in the dictionary and then executes. BEGIN and AGAIN are the program-control words that make it an infinite loop.
The QUIT loop provides the "interactive" nature of the language, as it carries out commands as soon as they have been entered. The results can be inspected by the same keyboard interpreter. A new definition can be tested and the trial/error cycle repeated in a fraction of the time required by other edit-compile-link-test languages.
:definition) this will be the address of the address interpreter. That is the word
The address interpreter has a register I that contains the address of the next entry in the list to be executed. This entry is the address of the cfa of the word called by the higher-level word currently being executed. It is the cfa that determines the nature (or type) of word. In the example given in figure 2 the word A calls the word B, which in turn calls the word X, etc. The cfa is kept in a register, W. Because of these two levels of indirect addressing and the dictionary-list structure, Forth is also known as "Indirect Threaded Code".
The address interpreter reads I and loads W with the address next in the list. It then reads from W and executes the code indicated by the code pointer. I is automatically incremented for the next entry in the list. As the first entry was the cfa, I now points to the body of the definition. This is useful when defining data-type structures such as look-up tables, where the body contains the elements of the table.
In the case of high-level commands, the code field address points to (:), a routine that saves the current I on the return stack, then loads W and repeats the process. At the end of the high-level definition, the word EXIT is executed. This will restore the old value of I and program execution continues as before. These three actions, read I pointer, save I pointer, and restore I pointer, are the basic mechanism by which the address interpreter works.
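The read, save and restore actions can be modelled in high-level Forth. This is only a sketch of the idea, not how any real system implements its inner interpreter: here a "thread" is a zero-terminated list of execution tokens, the variable IP stands in for the I register, and the host system's return stack holds the saved pointer. All names are hypothetical.

```forth
VARIABLE IP   0 IP !                  \ plays the role of the I register

: RUN-THREAD ( thread -- )
   IP @ >R                            \ save the caller's I, as (:) does
   IP !                               \ I now points at the first entry
   BEGIN
      IP @ @ ?DUP                     \ read the entry that I points to
   WHILE
      1 CELLS IP +!                   \ advance I to the next entry
      EXECUTE                         \ execute the entry just read
   REPEAT
   R> IP ! ;                          \ restore the caller's I, as EXIT does

CREATE SQUARE-THREAD  ' DUP ,  ' * ,  0 ,       \ a thread equivalent to DUP *
: SQUARED ( n -- n*n )  SQUARE-THREAD RUN-THREAD ;
CREATE QUAD-THREAD  ' SQUARED ,  ' SQUARED ,  0 ,

3 QUAD-THREAD RUN-THREAD .            \ nested threads: prints 81
```

Because each nested call saves and restores IP, the outer thread resumes at the correct entry after the inner thread finishes.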
The address interpreter has three important properties.
JSR instruction and address.
: (colon) and interpreted by the address interpreter. Most of Forth itself is defined in this manner.
CODE the programmer can create a definition whose behavior will consist of executing actual machine instructions.
CODE definitions may be used for I/O, arithmetic primitives, and other machine-dependent (or time-critical) processing. When using CODE the programmer has full control over the CPU, as with any other assembler. CODE definitions run at full machine speed.
This is an important feature of Forth. It permits explicit computer-dependent code in manageable pieces with specific interfacing conventions that are machine-independent. To move an application to a different processor one is only required to recode the CODE words, which will interact with other Forth words in exactly the same manner.
Forth assemblers are sufficiently compact (typically a KByte) that they can be resident in the system (as are the compiler, editor, and other programming tools). This means that the programmer can type in short CODE definitions and execute them immediately. This capability is especially valuable in testing custom hardware (see an example of testing hardware).
Such block orientated disk handling is efficient and easy for native Forth systems to implement. As a result, blocks provide a completely transportable mechanism for handling program source and data across both native and co-resident implementations.
Definitions in program source blocks are compiled into memory by the word LOAD. Most systems include an editor, which formats a block for display into 16 lines of 64 characters each and provides commands to
modify the source. An example of a Forth source block is given in
Source blocks have historically been an important element in Forth style. Just as Forth definitions may be considered the linguistic equivalent of sentences in natural languages, a block is analogous to a paragraph. A block normally contains definitions related to a common theme, such as "vector arithmetic". A comment on the top line of the block identifies this theme. An application may selectively load the blocks it needs.
Blocks are also used to store data. Small records can be combined into a block, or large records spread over several blocks. The programmer may allocate blocks in whatever way suits the application, and native systems can increase performance by organising data to minimise disk head motion. Several vendors have developed sophisticated file and data base systems based on Forth blocks.
Versions of Forth that run co-resident with a host operating system often implement blocks using files, in addition to providing a more common file-based environment.
The "link" field contains the address of the head of the previous command in the dictionary list. This is required when searching the dictionary.
The "name length" field contains the number of characters in the full name of the word, followed by those characters. This is required to match the name of a command to that presented during interpretation. Most Forth systems store only the first 3 characters of the name, to avoid name clashes while minimising the space used (most systems allow one to select between 3 and 31 characters).
The process of interpretation starts with the last entry in the dictionary and follows the linked list backwards until a match is found. Many systems now provide a hashing algorithm that splits the dictionary into separate lists, only one of which is searched, considerably reducing the search time.
The "code pointer" is the cfa of the first instruction to be executed for the word; this forms the first entry in the body of the definition. It is the address of the code that will interpret the rest of the definition, and thus points to different code for different data types.
The body of a word varies for different data types (see figure 4):
In low-level (CODE) definitions, the body contains a list of op-codes which defines the behavior of the command. The cfa therefore contains the address of the body.
For data words such as CONSTANT, the body contains the actual data. The cfa contains the address of a routine that manipulates this data as required. An example of a user defined data-structure is given in figure 5.
In : (colon) definitions, the body contains a list of addresses for all the previously defined words that make up its definition. The cfa is the address of the address interpreter.
The LEDs are interfaced through a single 8-bit port mapped to the address 40H. This location is defined as a CONSTANT on Line 1, so that it may be referred to by name; should the address change, one need only adjust the value of this constant. The word LIGHTS returns this address on the stack. The definition LIGHT takes a value on the stack and sends it to the device. The nature of this value is a bit mask, whose bits correspond directly to the individual lights. Thus, the command:
LIGHTS to suit the new hardware, while the rest of the application code need not be changed.
Lines 4-6 contain a simple diagnostic of the sort one might type in from the terminal to confirm that everything is working. The variable DELAY contains a delay time in milliseconds; execution of the word DELAY returns the address of this variable. Two values of DELAY are set by the definitions SLOW and FAST, using the Forth word ! (pronounced "store") which takes a value and an address, storing the value in the address. The definition COUNTS runs a loop from 0 through 255 (the loop is started by the word DO and ended by the word LOOP). The word I places the current loop index on the stack; this is then sent to the lights. The system will then wait for the period specified by DELAY. The word @ (pronounced "fetch") fetches a value from an address, in this case the address supplied by DELAY. This value is passed to a word which waits the specified number of milliseconds. The result of executing COUNTS is that the lights will count from 0 to 255 at the desired rate. To run this one would type:
SLOW COUNTS or FAST COUNTS
LAMP is a defining word which takes a bit mask (representing a particular lamp) as an argument, and compiles it as a named entity. The word CREATE compiles a header into the dictionary, while the word , (comma) places the mask into the body of the definition. The word DOES> places the address of the following code into the cfa of the new word. Therefore when the new word is executed its action is to fetch the contents of the first item in the body of the new word. Lines 9 and 10 contain five uses of LAMP to name particular indicators. When one of these words, such as POWER, is executed, the mask is returned on the stack. In fact, this behavior is identical to the behavior of a Forth CONSTANT. The LAMP definition is included here as an example. The ability to define such a "defining word" with the use of DOES> is one of the most powerful features of the language, allowing one to define "intelligent" application based data structures.
Finally, on lines 13-15, we have the words that will control the lights. LAMPS is a variable that contains the current state of the lamps. The word LAMP-ON takes a mask (supplied by one of the LAMP words) and turns the lamp on. It changes the state of that particular lamp, saving the result in LAMPS. LAMP-OFF will turn the given lamp off, also changing the state held in LAMPS. In the remainder of the application, the lamp names, LAMP-ON and LAMP-OFF are probably the only words that will be executed directly. The usage will be, for example, POWER LAMP-ON to turn a lamp on, with LAMP-OFF used in the same way.
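The lamp words described above might look something like the following. This is a reconstruction under stated assumptions, not the original listing: OUTPUT is simulated here by a variable (PORT-IMAGE) so the sketch runs without hardware, the lamp names READY and FAULT are hypothetical (only POWER appears in the text), and the use of OR and AND to update the state is one plausible choice.

```forth
HEX 40 CONSTANT LIGHTS DECIMAL              \ port address, as given in the text
VARIABLE PORT-IMAGE   0 PORT-IMAGE !        \ assumption: stands in for the real port
: OUTPUT ( value port -- )  DROP PORT-IMAGE ! ;   \ simulated port write
: LIGHT ( mask -- )  LIGHTS OUTPUT ;        \ send a bit mask to the lights

: LAMP ( mask "name" -- )                   \ defining word for named lamps
   CREATE ,                                 \ store the mask in the body
   DOES> ( -- mask )  @ ;                   \ each lamp word returns its mask

1 LAMP POWER   2 LAMP READY   4 LAMP FAULT  \ READY and FAULT are made-up names

VARIABLE LAMPS   0 LAMPS !                  \ current state of all lamps
: LAMP-ON  ( mask -- )  LAMPS @ OR          DUP LAMPS !  LIGHT ;
: LAMP-OFF ( mask -- )  INVERT LAMPS @ AND  DUP LAMPS !  LIGHT ;

POWER LAMP-ON   FAULT LAMP-ON   POWER LAMP-OFF
PORT-IMAGE @ .                              \ only FAULT remains lit: prints 4
```

On real hardware only OUTPUT would change; the lamp names, LAMP-ON and LAMP-OFF would be used exactly as shown.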
The time to compile this block of code on the system was about half a second, including time to fetch it from disk. So it is quite practical (and normal practice) for a programmer to simply type in a definition and try it immediately. In addition, one always has the capability of communicating with external devices directly. The first thing one would do when told about the lamps would be to type:
HEX FF 40 OUTPUT
A "target-compiler" allows the use of a host CPU, such as an IBM PC, for developing systems. Programs can be edited and interactively tested at the keyboard. These programs can then be compiled for the target environment and the appropriate ROM code produced. One function of the target compiler is to strip out any code not required at run time, such as the compiler, editor and assembler. By doing this, run time ROM overhead can be reduced from the 8 KBytes of development system to a minimum of approximately 600 Bytes.
The "cross-compiler" acts in much the same way as the target-compiler, allowing the user to develop (and test) code on a host system. The cross-compiler will then compile the Forth system, in much the same manner as the target-compiler, except that the target of the cross-compiler is a different CPU, with a different machine language. This facility allows us to develop systems for new CPUs very quickly. As the majority of Forth systems are written in Forth, we need only write an assembler for the new processor and cross-compile the Forth system for the new processor. Most Forth systems are developed using this method. As a result, Forth is usually one of the first languages to be implemented on a new processor.
The process of writing one Forth compiler in another is referred to as "meta-compilation".
The Forth operating environment is fast and, as a result, Forth-based systems can support both multi-tasking and multi-user operation even on computers whose hardware is usually thought incapable of such operation. For example, one producer of telephone switchboards is running over 50 tasks on a Z80. There are several multiprogrammed products for the IBM PC, some of which even support multiple users. Even on computers that are commonly used in multi-user operations, the number of users that can be supported may be much larger than expected. One large data-base application running on a single 68000 processor has over 100 terminals updating and querying its data-base, with no significant degradation.
Multi-user systems may also support multiple programmers, each of whom has a private dictionary, stacks and a set of variables controlling that task. The private dictionary is linked to a shared, re-entrant dictionary containing all the standard Forth functions. The private dictionary can be used to develop application code which may later be integrated into the shared dictionary.
Figure 6 shows the classical method of multi-tasking in a Forth system. This is a "voluntary round robin" scheduler. It is the most common implementation of multi-tasking found in Forth systems. However, some implementations use time slicing or priority scheduling and other preemptive algorithms. In this system each task has a user area where its control variables, private dictionary, and stacks are kept. The first field of this user area is the STATUS variable. A task has two possible values for this variable, awake or asleep.
For a task to be selected for execution it must be awake. When the task has been selected its status is reset to asleep. The task executes until it voluntarily re-enters the scheduler by executing the word PAUSE. This will reset the task's status back to awake prior to re-entering the scheduler. When the scheduler comes around again the task will continue execution from after the PAUSE.
In addition to the PAUSE word, a task could also re-enter the scheduler by executing the STOP word. This is similar to PAUSE except that it does not reset the task's status to awake, thus it will re-enter the scheduler with the task's status set to asleep. This means that the task will not be executed again until such time as its status is set to awake by another task, or an interrupt.
The system is programmed in this manner so as to allow for interrupts. When an interrupt occurs, some machine dependent code will set a given task's status to awake, thus when the scheduler next comes to that task it will be executed. The STATUS variable is set to asleep when the task is executed, to allow for an interrupt occurring while the task is executing. Hence, if a task executes a STOP its status is not changed, thus if an interrupt has set the task's status to awake whilst the task was executing it will re-enter the scheduler with its status set to awake. This allows us to buffer an interrupt. However, this means that when a task is giving up the processor voluntarily and wishes to continue execution next time the scheduler comes around, it must set its status to awake.
The round robin scheduler takes the address stored in the LINK user variable as the address of the next task. If that task's STATUS is set to awake, it is executed; otherwise the scheduler will take its LINK address and move on to the next task. This can be seen in figure 6. There are two main problems with this method:
PAUSE in the underlying system and the speed of the Forth system overcome these problems in the majority of applications.
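The STATUS protocol can be simulated in high-level Forth. This is a deliberately simplified model, not a real multi-tasker: each "task" is an ordinary word that runs one step and returns (no stacks are switched), the task table replaces the LINK chain, and PAUSE and STOP are redefined locally, which may produce a harmless redefinition warning on systems that already provide them. All other names are hypothetical.

```forth
2 CONSTANT #TASKS
CREATE STATUSES  #TASKS CELLS ALLOT       \ one flag per task: true = awake
CREATE ACTIONS   #TASKS CELLS ALLOT       \ execution token of each task's step
VARIABLE RUNNING                          \ index of the task now executing
VARIABLE A-RUNS  0 A-RUNS !
VARIABLE B-RUNS  0 B-RUNS !

: STATUS ( task -- addr )  CELLS STATUSES + ;
: WAKE   ( task -- )  TRUE SWAP STATUS ! ;     \ what an interrupt or another task does
: PAUSE  ( -- )  RUNNING @ WAKE ;              \ yield, but stay runnable
: STOP   ( -- ) ;                              \ yield, leaving the status untouched

: TASK-A ( -- )  1 A-RUNS +!  PAUSE ;     \ wants to run on every pass
: TASK-B ( -- )  1 B-RUNS +!  STOP ;      \ runs once, then waits to be woken
' TASK-A ACTIONS !
' TASK-B ACTIONS CELL+ !

: ROUND ( -- )                            \ one pass of the round robin
   #TASKS 0 DO
      I STATUS @ IF                       \ only awake tasks are selected
         I RUNNING !
         FALSE I STATUS !                 \ marked asleep while it executes
         I CELLS ACTIONS + @ EXECUTE
      THEN
   LOOP ;

0 WAKE  1 WAKE
ROUND ROUND ROUND
A-RUNS @ .  B-RUNS @ .                    \ prints 3 1
```

TASK-B stays asleep after its first step until something executes 1 WAKE, which mirrors how an interrupt would make a stopped task runnable again.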
In 1981, Chuck Moore undertook to design a chip-level implementation of the Forth virtual machine. Working first at FORTH, Inc. and subsequently with a start-up company formed to develop the chip, Moore completed the design in 1984 and the first prototypes were produced in early 1985. More recently, Forth processors have been developed by Harris Semiconductor Corp., Johns Hopkins University and others. Forth-based chips offer extremely high performance, generally comparable with RISC chips but without the programming difficulties that accompany conventional RISC processors.
However, the first major effort to standardise Forth was a meeting in Utrecht in 1977. The attendees produced a preliminary standard and agreed to meet in the following year. The 1978 meeting was also attended by members of the California based FIG. Over the next couple of years a series of meetings, attended by both users and vendors, produced a more comprehensive standard called Forth-79.
Although Forth-79 was very influential, many Forth users and vendors found serious flaws in it. In 1983 two further meetings were held to produce the refined Forth-83 standard.
Encouraged by the widespread acceptance of Forth-83, a group of users and vendors met in 1986 to investigate the feasibility of an American National Standard. The Technical Committee for American National Standard Forth, the ANS ASC X3/X3J14 committee, held its first meeting in 1987 with the objective "to achieve an acceptable standard which will result in broad compliance among all major vendors of Forth language products, with minimum adverse impact upon transportability from existing systems in use." In 1994, some seven years later, the new standard was finally produced. This is the most far reaching of all the standards. It was open to public review throughout its development, with comments coming from 5 different countries. The International Standards Organisation accepted this as an international standard some two years later (although it was not published until the following year, 1997).
This provides us with an interactive debugging environment where we can add new macros (high-level definitions) and new instructions (low-level (CODE) definitions). It even allows us to extend the macro system by defining new data types (defining words).
As this interpreter can also act as a fully integrated operating system, the programmer need only learn the one tool.
Forth has four primitive virtues: (a) Intimacy, (b) Immediacy, (c) Extensibility, and (d) Economy. It has two derived virtues: Total Comprehension, and Symbiosis.
Forth is not just a language; it is more of a philosophy for solving problems. This can be summarised with the acronym K.I.S.S. (Keep It Simple and Stupid). Jerry Boutelle (owner of Nautilus Systems in Santa Cruz, California), when asked "How does using Forth affect your thinking?", replied:
Forth has changed my thinking in many ways. Since learning Forth I've coded in other languages, including assembler, Basic and Fortran. I've found that I use the same kind of decomposition we do in Forth, in the sense of creating words and grouping them together. For example, in handling strings I would define subroutines analogous to Forth's FILL, etc. More fundamentally, Forth has reaffirmed my faith in simplicity. Most people go out and attack problems with complicated tools. But simpler tools are available and more useful. I try to simplify all the aspects of my life. There's a quote I like from Tao Te Ching by the Chinese philosopher Lao Tzu: "To attain knowledge, add things every day; to obtain wisdom, remove things every day".
Or to quote Antoine Lavoisier (1789):
It is impossible to disassociate language from science or science from language, because every natural science always involves three things: the sequence of phenomena on which the science is based, the abstract concepts which call these phenomena to mind, and the words in which the concepts are expressed. To call forth a concept, a word is needed; to portray a phenomenon, a concept is needed. All three mirror one and the same reality.
This embodies the philosophy behind Forth.