Before C

The venerable BCPL procedural structured programming language is fast to compile, is reliable and efficient, offers a wide range of software libraries and system functions, and is available on several platforms, including the Raspberry Pi.

In the 1960s, the main high-level programming languages were Fortran, Basic, Algol 60, and COBOL. To optimize code or to provide low-level operations, assembler programming offered the only means to access registers and execute specific machine instructions. BCPL, which was used as a teaching language in many universities, provided a language with a rich syntax, addressed the scoping limitations of the other languages, and had low-level operations such as bit manipulation and computation of variable addresses.

Where BCPL differs from the other languages is that it is typeless; all variables are considered to be a word, typically 16 or 32 bits. Programmers can access individual bits and bytes of a word, perform both arithmetic and logical operations on words, compute the address of a word, or use a word as a pointer to another word. One further novel aspect of BCPL is that the compiler is small and written in BCPL, producing intermediate code for a virtual machine and simplifying the development of the compiler for a wide range of computers. BCPL was used on mainframe computers and minicomputers in the 1970s and microprocessors in the 1980s.

The early developers of Unix were influenced by, and many aspects of C were adopted directly from, BCPL. Although BCPL also supported characters and bytes, the lack of richer types was addressed in C, which became the programming language of choice for Unix (and subsequently Linux), leaving BCPL mostly for academic applications. Several groups developed compilers, operating systems, software utilities, commercial packages, and even flight simulation software in BCPL, but for the most part, BCPL has been forgotten.

The demise of BCPL in both academia and industry is disappointing, particularly because it is a powerful teaching language, introducing students to algorithms, software design, and compiler design. Later, languages such as Pascal and Modula-2 became popular languages to introduce concepts in computer science but have been superseded by Java, Python, and C++. Whereas the learning curve for BCPL is small, enabling students to become productive in a short time, the complexity of languages such as C++ can be a barrier to students learning their first programming language. 

The BCPL Language

The example in Listing 1 of a small BCPL program computes factorial values from 1! to 5!. Because C was developed from BCPL, the syntax of both languages is similar. The include directive in C is a GET directive in BCPL, the assignment operator = in C is := in BCPL, and the fences (curly brackets) { and } are identical. In C the address of a variable a is denoted by &a, whereas in BCPL it is given by @a. Indirection, or the use of pointers, is given by *a in C or !a in BCPL. Arrays are organized so that a!b in BCPL corresponds to a[b] in C.

Listing 1: 1! to 5! in BCPL

01 GET "libhdr"
02 
03 LET start() = VALOF
04 {
05     FOR i = 1 TO 5 DO
06       writef("fact(%n) = %i4*n", i, fact(i))
07     RESULTIS 0
08 }
09 
10 AND fact(n) = n=0 ‑> 1, n*fact(n‑1)

The GET directive includes the common procedures and definitions needed in the compilation of a program. The procedure start is similar to main in C, where the VALOF keyword denotes that start is a function with the result returned by the RESULTIS keyword. The variable i, a local variable of the procedure start, is implicitly defined at the start of the FOR loop, which is executed five times. The writef function is similar to printf in C. The recursive function fact tests whether n is zero and returns either 1 or n*(n-1)!, where the parameter n is a local variable of the procedure fact.

In BCPL, a variable is defined as a word that can represent an integer, a bit pattern, a character, a pointer to a string of characters, a floating-point number, or an address. A programmer can apply arithmetic operators, logical operators, shift operators, an address operator, or indirection to a variable – the compiler assumes that the programmer knows what they are doing and, subject to syntactic and sematic compilation checks, places very few constraints on programming constructions. Arguably, C and BCPL fall into the category of languages that provide almost unlimited power for a programmer with very few checks on their intention.

Both C and BCPL allow sections of a program to be compiled separately (e.g., to provide a library of functions). Global variables and procedures in BCPL, which are similar to external variables and functions in C, can be accessed by all sections of a program, whereas static variables are only accessible from the section in which they are declared. The other category of variables is local or dynamic variables, which are declared and used in the same way as C. When a local variable is declared, space is allocated on a stack, which grows and shrinks dynamically, typically on entry to and exit from a procedure, respectively, enabling procedures to be called recursively. 

Portability

BCPL was developed by Martin Richards in the Computer Laboratory at the University of Cambridge. His more recent Cintcode implementation is extensive and provides numerous examples of coding, mathematical algorithms, and even operating system functions. The advantages of this implementation are considerable: It is fast to compile, is reliable and efficient, and offers a wide range of software libraries and system functions. It is also available on several platforms, including the PC and the Raspberry Pi. The only drawback is the loss of speed from interpreting the compiled code.

I refer you to Martin Richard’s textbook , and his website which includes a version of Cintcode, that is straightforward to download and implement on an RPi. Also, a guide directed at young people programming a Raspberry Pi provides an extensive description of BCPL and the Cintcode implementation and numerous examples of BCPL programs.

For the programmer intending to write applications in BCPL that exploit the processing power of the ARM cores of a Raspberry Pi, a BCPL compiler generating ARM instructions directly is likely to produce code which runs considerably faster than interpreted code. For other users less concerned with processing speed, the tools and support provided by the Cintcode implementation of BCPL offer a stable and reliable platform.

BCPL for the Raspberry Pi

The arrival of the Raspberry Pi with its ARM cores, network connection, sound and video outputs, USB ports, and I/O interface running under the Linux operating system has encouraged the development of a range of programming languages for this platform. A code generator for BCPL that I developed compiles BCPL directly to ARM machine code, which can be linked with the standard Linux gcc toolset. The compiler (7,000 lines) compiles itself in less than 0.2 seconds on a Raspberry Pi 4B.

This 32-bit implementation of BCPL compiles a BCPL program prog.b to prog.o, where prog.o is a Linux object module linked with two libraries – blib.o and alib.o – by the gcc linker to produce an executable ELF module, prog. The library blib.b is written in BCPL and contains the common BCPL library functions. A small library alib.s is written in Linux assembler and contains low-level functions to access the Linux runtime environment.

Although the gcc linker builds the executable program, the object code produced by the compiler contains only blocks of position-independent code, requiring no relocation. At runtime, alib initializes the BCPL environment, setting up the workspace for the stack and global and static variables. Strictly, gcc is only used to generate a Linux-compatible module that can be loaded, whereas the linking of a BCPL program and libraries is performed by alib.

 Notes for Developers

The compiler uses registers r0 to r9 for arithmetic operations, logic operations, and procedure calls. The code generator attempts to optimize the code by keeping variables in registers, minimizing the number of memory accesses.

Register rg points to the global vector, and register rp is the BCPL stack pointer or frame pointer. Procedure linkage, procedure arguments, and local variables are allocated space in the current frame. Stack space is claimed on entry to a procedure and released on return from a procedure. The link register lr holds the return address on entry to a procedure and can also be used as a temporary register within a procedure. The system stack pointer sp is not used by the BCPL compiler, so it can be used to push and pop temporary variables. The compiler uses the BCPL stack for procedure linkage and the storage of local variables. It should be noted that the ARM core is a pipelined processor and reference to pc during an instruction implies the address of the current instruction+8 for most instructions. The program counter pc is used in the code generation of relative addresses used for procedure calls and branches and also in switchon expressions in BCPL.

Although Linux libraries are not explicitly linked, the libc library is available to BCPL programs. Fortunately, the register calling mechanisms of the GNU gcc tool chain and BCPL are distinct and independent. The BCPL stack grows upward, with no access or modification to the system stack. In C, the stack grows downward, and local variables are stored relative to the system stack pointer sp. Consequently, it is possible to call C functions from BCPL.

In the ARM Procedure Call Standard (APCS), the first four arguments are loaded into registers r0, r1, r2, and r3, respectively, and a result is returned in register r0. The address of the procedure is computed, and the procedure is called by an appropriate branch and link (bl) instruction or a branch, link, and exchange instruction (blx).

However, C and BCPL have two important differences: (1) BCPL strings are defined by the string size in the first byte followed by the 8-bit characters of the string, whereas strings in C are arrays of 8-bit characters terminated with a zero byte. BCPL strings must be converted to C strings, if calling C. (2) Addresses of variables, vectors, and strings in BCPL are word addresses, whereas they are machine addresses in C. Passing an address from BCPL to C requires a logical left shift of two places, and passing an address from C to BCPL requires a logical right shift of two places. Care is needed with strings in C because they are not necessary aligned on 32-bit word boundaries.

In both C and BCPL, the registers r0-r9 are not preserved across procedure calls. Additionally, the BCPL registers rp, rg, and lr cannot be guaranteed to be preserved in C, and it is advisable to store these registers before calling a C procedure. In practice, they can be pushed onto the system stack and popped on return by:

push {rg, rp, lr}
pop {rg, rp, lr}

The code produced by the code generator for the factorial example is shown in Listing 2 with comments to explain specific instructions. Note that register r0 is reloaded at location 0x38 because it is reached by code from locations 0x34 and 0x74; consequently, the content of register r0 is not assured. Additionally, the reference to the string

"fact(%n) = %i4*n"

is not known at location 0x4C when the instruction is generated; therefore, a full static reference is generated with the offset 0x00000028 stored at location 0x90.

Table 1: BCPL Registers

Register

Name

Function

0

r0

Data register 0

1

r1

Data register 1

2

r2

Data register 2

3

r3

Data register 3

4

r4

Data register 4

5

r5

Data register 5

6

r6

Data register 6

7

r7

Data register 7

8

r8

Data register 8

9

r9

Data register 9

10

rg

Global vector

11

rp

BCPL stack

12

ip

Unused

13

lr

Link register

14

sp

System stack pointer

15

pc

Program counter

Listing 2: Code Generator Output

   0:	0000003c 	data		Section size (words)
   4:	0000fddf 	data		Section identifier
   8:	6361660b 	data		Section name “fact”
   c:	20202074 	data		
  10:	20202020 	data		
  14:	0000dfdf 	data		Entry identifier
  18:	6174730b 	data		Procedure name “start”
  1c:	20207472 	data		
  20:	20202020 	data		
  24:	e8a4c800 	stmia	r4!,{fp,lr,pc}	Standard procedure entry
  28:	e884000f 	stm	r4,{r0,r1,r2,r3}	
  2c:	e244b00c 	sub	fp,r4,#12	
  30:	e3a00001 	mov	r0,#1	Initial value i
  34:	e58b000c 	str	r0,[fp,#12]	Save i
  38:	e59b000c 	ldr	r0,[fp,#12]	Load i
  3c:	e28b4024 	add	r4,fp,#36	Set new stack frame
  40:	eb000017 	bl	0xa4	Call f(i)
  44:	e1a02000 	mov	r2,r0	Arg 3 = f(i)
  48:	e59b100c 	ldr	r1,[fp,#12]	Arg 2 = i
  4c:	e59fe03c 	ldr	lr,[pc,#60]	Arg 1 = “fact(%n) = %i4*n”
  50:	e08f000e 	add	r0,pc,lr	pc offset
  54:	e1a00120 	lsr	r0,r0,#2	BCPL address
  58:	e28b4010 	add	r4,fp,#16	Set new stack frame
  5c:	e59ae178 	ldr	lr,[sl,#376]	Global writef
  60:	e12fff3e 	blx	lr	Call writef()
  64:	e59b000c 	ldr	r0,[fp,#12]	Load i
  68:	e2800001 	add	r0,r0,#1	Increment by 1
  6c:	e58b000c 	str	r0,[fp,#12]	Store i
  70:	e3500005 	cmp	r0,#5	Check end of for-loop
  74:	daffffef 	ble	0x38	Continue for-loop
  78:	e3a00000 	mov	r0,#0	Return 0
  7c:	e89b8800 	ldm	fp,{fp,pc}	Standard procedure return
  80:	6361660f 	data		String “fact(%n) = %i4*n”
  84:	6e252874 	data		
  88:	203d2029 	data		
  8c:	0a346925 	data
  90:	00000028 	data		
  94:	0000dfdf 	data		Entry identifier
  98:	6361660b 	data		String “fact”
  9c:	20202074 	data		
  a0:	20202020 	data		
  a4:	e8a4c800 	stmia	r4!,{fp,lr,pc}	Standard procedure entry
  a8:	e884000f 	stm	r4,{r0,r1,r2,r3}	
  ac:	e244b00c 	sub	fp,r4,#12	
  b0:	e3500000 	cmp	r0,#0	Test n=0
  b4:	1a000001 	bne	0xc0	Skip if not
  b8:	e3a00001 	mov	r0,#1	Return 1
  bc:	e89b8800 	ldm	fp,{fp,pc}	Standard procedure return
  c0:	e59b000c 	ldr	r0,[fp,#12]	Load n
  c4:	e2400001 	sub	r0,r0,#1	Decrement n
  c8:	e28b4010 	add	r4,fp,#16	Set new stack frame
  cc:	ebfffff4 	bl	0xa4	Call f(n-1)
  d0:	e59b100c 	ldr	r1,[fp,#12]	Get n
  d4:	e0000190 	mul	r0,r0,r1	Return n*(n-1)
  d8:	e89b8800 	ldm	fp,{fp,pc}	Standard procedure return
  dc:	00000000	data		No statics
  e0:	00000000	data		Start of global vector
  e4:	00000001 	data		Global 1 (start)
  e8:	00000024 	data		Offset to global 1
  ec:	0000005e 	data		Maximum global of the section

Installation

The file bcpl_distribution contains the files shown in Table 2. The object files bcpl.o and blib.o each contain a block of position-independent code. The assembler module leader.s provides a means of identifying the start of a BCPL program. The runtime library alib.s is written in assembler code and includes data regions for the global variables and static variables and is linked to the GNU C runtime library libc. Note that the files bcpl.b and bcplfecg.h are only needed to rebuild the compiler and are not required for user applications.

Table 2: bcpl_distribution

File Name

Function

alib.s

A runtime library written in GNU ARM assembler

blib.b

The BCPL runtime library, written in BCPL

blib.o

A precompiled version of the BCPL runtime library blib.b

bcpl.b

The BCPL compiler and code generator to run under Linux

bcpl.o

A precompiled version of the BCPL compiler and code generator

bcplcg.b

The code generator used by the BCPL compiler for the ARM processor

bcplfecg.h

A header file used by the code generator

leader.s

A small assembler program only used to locate the start of a BCPL program

libhdr.h

The standard BCPL header

The distribution also includes several BCPL examples and a user guide (Table 3). The programs queens.b and primes.b are described in Martin Richard’s excellent notes to young people interested in programming the Raspberry Pi [3].

Table 3: BCPL Examples and User Guide

File

Content

bench.b

A small program to time the execution of a small fragment of BCPL

fact.b

A small program to print the factorial numbers from 1! to 5!

primes.b

A small program to print the prime numbers less than 1,000

queens.b

An implementation of the “Queens” problem for 1 to 16 pieces

guide.pdf

A guide to BCPL for the Raspberry Pi, including installation notes

To install BCPL on Raspberry Pi Model 3 or 4, create a new directory and copy the distribution files in bcpl-distribution to this directory. Alternatively, to install BCPL on a Raspberry Pi Model 2, copy the distribution files in bcpl-distribution-rpi2. In a terminal shell, enter the commands

>unzip bcpl-distribution.zip
>as leader.s -o leader.o
>as alib.s -o alib.o
>gcc leader.o bcpl.o blib.o alib.o -o bcpl

to build and test the compiler (> denotes the Linux prompt).

For a first compiler test, compile and run the program fact.b, which prints the factorial numbers from 1! to 5!:

>./bcpl fact.b -o fact
>./fact

Further confidence tests rebuild the BCPL compiler bcpl.b with the BCPL compiler and build the library blib.b:

>./bcpl bcpl.b -o bcpl
>./bcpl -c blib.b

The BCPL library files and the compiler can then be copied to the appropriate Linux shared directories:

>sudo mkdir /usr/include/BCPL
>sudo cp libhdr.h /usr/include/BCPL/
>sudo cp bcpl /usr/bin/
>sudo cp leader.o /usr/lib/
>sudo cp blib.o /usr/lib/
>sudo cp alib.o /usr/lib/

The remaining BCPL programs can now be compiled and run with the command bcpl rather than ./bcpl. The compiler searches for library files in the working directory before searching the directories /usr/include/BCPL and /usr/lib.

Nostalgia

The influence of BCPL on the development of C and its later variants cannot be overstated. The availability of BCPL for the Raspberry Pi allows old computer science students to dust off copies of their programs, which should run directly on the Raspberry Pi. BCPL was used extensively in many UK university computer science departments. The portable multi-tasking operating system Tripos was written entirely in BCPL in the Computer Laboratory at the University of Cambridge and used in early versions of the Commodore Amiga, in the automotive industry, and in financial applications.

The logic simulator HILO-2 (the forerunner of Verilog) was developed in BCPL. Numerous utilities, including the early word processor roff were written in BCPL. Before the availability of floating-point hardware, I adapted BCPL compilers for the Motorola 6809 and 68000 processors to use scaled fixed-point arithmetic in real-time flight simulation. nnn