Building Bigger Programs

So far we have been writing little programs and testing them interactively in OCaml. To manage the complexity of writing larger programs, however, we need tools to split them into well-defined modules, each with a given set of types and functions. We can then build big systems without worrying that some internal change to a single module will affect the whole program. Hiding the internal details of each module in this way is known as abstraction, and it is fundamental to writing large programs, a discipline sometimes called software engineering.

In this chapter, you will have to create text files and type commands into the command prompt of your computer. If you are not sure how to do this, or the examples in this chapter do not work for you, ask a friend or teacher. In particular, if you are using Microsoft Windows, some of the commands may have different names.

Making a module

We will be building a modular version of our text statistics program from Chapter 13. First, write the text file shown below and save it as textstat.ml (OCaml programs live in files with lowercase names ending in .ml).
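A minimal sketch of textstat.ml at this stage might look like the following. It assumes the statistics are held as a tuple of four integers, in the order lines, characters, words, sentences, which matches the results printed later in this chapter.

```ocaml
(* Text statistics *)
type stats = int * int * int * int

(* Read statistics from a channel. For now, just return zeros. *)
let stats_from_channel (_ : in_channel) = (0, 0, 0, 0)

(* Read statistics from a file name. Exceptions are not handled. *)
let stats_from_file filename =
  let channel = open_in filename in
  let result = stats_from_channel channel in
  close_in channel;
  result
```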

The first line is a comment. Comments in OCaml are written between (* and *). We use comments in large programs to help the reader (who might be someone else, or ourselves some time later) to understand the program.

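We have then introduced a type for our statistics. This will hold the number of lines, characters, words, and sentences. We have then written a function stats_from_channel, which for now just returns zeros for all the statistics, and a function stats_from_file, which reads the statistics from a named file by calling stats_from_channel.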

Now we can issue a command to turn this program into a pre-compiled OCaml module. This compiles the program to bytecode, producing a module which can then be loaded into interactive OCaml, or used to build standalone programs. Execute the following command:

ocamlc textstat.ml

You can see that the name of the OCaml compiler is ocamlc. If there are errors in textstat.ml they will be printed out, including the line and character number of the problem. You must fix these, and try the command again. If compilation succeeds, you will see the file textstat.cmo in the current directory. There will be other files, but we are not worried about those yet. Let us load our pre-compiled module into OCaml:

OCaml

# #load "textstat.cmo";;
# Textstat.stats_from_file "gregor.txt";;
- : int * int * int * int = (0, 0, 0, 0)

Note that #load is different from our earlier #use command: #use just reads a file as if it had been cut and pasted into OCaml, whereas here we are loading the compiled module.

Filling out the module

Let us add a real stats_from_channel function, to produce a working text statistics module. We will also add utility functions for retrieving individual statistics from the stats type:
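Here is one way the filled-out textstat.ml might look. It is a sketch: it reads the channel one line at a time with input_line until End_of_file is raised, and it approximates words by counting spaces, which is a simplifying assumption that may differ from your Chapter 13 version. The return type annotation stats on stats_from_channel is what makes interactive OCaml report results as Textstat.stats.

```ocaml
(* Text statistics *)
type stats = int * int * int * int

(* Utility functions to retrieve parts of a stats value *)
let lines ((l, _, _, _) : stats) = l
let characters ((_, c, _, _) : stats) = c
let words ((_, _, w, _) : stats) = w
let sentences ((_, _, _, s) : stats) = s

(* Read statistics from a channel, one line at a time. input_line
   raises End_of_file when the input runs out, which ends the loop. *)
let stats_from_channel in_channel : stats =
  let lines = ref 0 in
  let characters = ref 0 in
  let words = ref 0 in
  let sentences = ref 0 in
  try
    while true do
      let line = input_line in_channel in
      lines := !lines + 1;
      characters := !characters + String.length line;
      (* Words are approximated by counting spaces *)
      String.iter
        (fun c ->
          match c with
          | '.' | '?' | '!' -> sentences := !sentences + 1
          | ' ' -> words := !words + 1
          | _ -> ())
        line
    done;
    (0, 0, 0, 0) (* never reached; only here to make the types agree *)
  with End_of_file -> (!lines, !characters, !words, !sentences)

(* Read statistics from a file name. Exceptions are not handled. *)
let stats_from_file filename =
  let channel = open_in filename in
  let result = stats_from_channel channel in
  close_in channel;
  result
```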

We can compile it in the same way, and try it with our example file:

OCaml

# #load "textstat.cmo";;
# let s = Textstat.stats_from_file "gregor.txt";;
val s : Textstat.stats = (8, 464, 80, 4)
# Textstat.lines s;;
- : int = 8
# Textstat.characters s;;
- : int = 464
# Textstat.words s;;
- : int = 80
# Textstat.sentences s;;
- : int = 4

You might ask why we need the functions lines, characters, and so on, when the information is already returned in the tuple. Let us discuss that now.

Making an interface

We said that modules were for creating abstractions, so that the implementation of an individual module could be altered without changing the rest of the program. However, we have not achieved that yet – the details of the internal type are visible to the program using the module, and that program would break if we changed the type of stats to hold an additional statistic. In addition, the internal count_words function is available, even though the user of the module is not expected to use it.

What we would like to do is to restrict the module so that only the types and functions we want to be used directly are available. For this, we use an interface. Interfaces are held in files ending in .mli, and we can write one for our module. Here is our interface:
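Assuming the module contents sketched so far (the tuple type stats, the four accessor functions, stats_from_channel, and stats_from_file), a textstat.mli which exposes everything might read as follows. An interface file contains only declarations, so it is not run on its own.

```ocaml
(* Textstat module interface *)
type stats = int * int * int * int

val lines : stats -> int
val characters : stats -> int
val words : stats -> int
val sentences : stats -> int

val stats_from_channel : in_channel -> stats
val stats_from_file : string -> stats
```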

In this interface, we have exposed every type and function. Types are written in the same way as in the .ml file. Functions are written with val, followed by the name, a colon, and the type of the function. We can compile this by giving the .mli file together with the .ml file when using ocamlc:

ocamlc textstat.mli textstat.ml

The ocamlc compiler has created at least two files: textstat.cmo as before and textstat.cmi (the compiled interface). You should find this operates exactly as before when loaded into OCaml. Now, let us remove the definition of the type from the interface, to make sure that the stats type is hidden, and its parts can only be accessed using the lines, characters, words, and sentences functions. We will also remove the declaration for stats_from_channel to demonstrate that functions we do not need can be hidden too:
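With the definition of the type and the declaration of stats_from_channel removed, a sketch of the restricted textstat.mli might read as follows. Writing just type stats, with no "= ...", is what makes the type abstract.

```ocaml
(* Textstat module interface: the stats type is now abstract *)
type stats

val lines : stats -> int
val characters : stats -> int
val words : stats -> int
val sentences : stats -> int

val stats_from_file : string -> stats
```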

Now, if we compile the program again with ocamlc textstat.mli textstat.ml, we see that the stats_from_channel function is no longer accessible, and the type stats is now hidden, or abstract.

OCaml

# #load "textstat.cmo";;
# let s = Textstat.stats_from_file "gregor.txt";;
val s : Textstat.stats = <abstr>
# Textstat.lines s;;
- : int = 8
# Textstat.characters s;;
- : int = 464
# Textstat.words s;;
- : int = 80
# Textstat.sentences s;;
- : int = 4
# Textstat.stats_from_channel;;
Error: Unbound value Textstat.stats_from_channel

We have successfully separated the implementation of our module from its interface – we can now change the stats type internally to hold extra statistics without invalidating existing programs. This is abstraction in a nutshell.
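To see the benefit, here is a hypothetical reimplementation in which stats becomes a record with an extra field, longest_line, which is our own illustrative addition. So that the example fits in one file, it uses OCaml's module Name : sig ... end = struct ... end syntax: the part between sig and end plays the role of textstat.mli, and the part between struct and end plays the role of textstat.ml. The helper count_line is likewise our own invention.

```ocaml
(* A hypothetical reimplementation of Textstat, written in one file.
   The signature (sig ... end) plays the role of textstat.mli. *)
module Textstat : sig
  type stats
  val lines : stats -> int
  val characters : stats -> int
  val words : stats -> int
  val sentences : stats -> int
  val longest_line : stats -> int (* new, illustrative statistic *)
  val stats_from_file : string -> stats
end = struct
  (* stats is now a record; longest_line is our illustrative extra field *)
  type stats = {
    lines : int;
    characters : int;
    words : int;
    sentences : int;
    longest_line : int;
  }

  let lines s = s.lines
  let characters s = s.characters
  let words s = s.words
  let sentences s = s.sentences
  let longest_line s = s.longest_line

  (* Count spaces (a rough word count) and sentence-ending
     punctuation in a single line *)
  let count_line line =
    let words = ref 0 and sentences = ref 0 in
    String.iter
      (fun c ->
        match c with
        | '.' | '?' | '!' -> incr sentences
        | ' ' -> incr words
        | _ -> ())
      line;
    (!words, !sentences)

  (* Accumulate statistics line by line until End_of_file *)
  let stats_from_channel in_channel =
    let rec loop s =
      match input_line in_channel with
      | exception End_of_file -> s
      | line ->
          let w, sn = count_line line in
          let len = String.length line in
          loop
            { lines = s.lines + 1;
              characters = s.characters + len;
              words = s.words + w;
              sentences = s.sentences + sn;
              longest_line = max s.longest_line len }
    in
    loop { lines = 0; characters = 0; words = 0; sentences = 0; longest_line = 0 }

  let stats_from_file filename =
    let channel = open_in filename in
    let result = stats_from_channel channel in
    close_in channel;
    result
end
```

Code which only calls lines, characters, words, sentences, and stats_from_file compiles unchanged against this version, because the record is invisible outside the module.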

Building standalone programs

Now it is time to cut ourselves free from interactive OCaml, and build standalone programs which can be executed directly. Let us add another file stats.ml which will use functions from the Textstat module to create a program which, when given a file name, prints some statistics about it:
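A sketch of stats.ml consistent with the sample run later in this section might look like this. It uses only stats_from_file and the four accessor functions, so it works with the restricted interface. Because it refers to the Textstat module, it must be compiled together with textstat.mli and textstat.ml rather than run on its own.

```ocaml
(* stats.ml: print statistics for the file named on the command line *)
let print_stats filename =
  try
    let s = Textstat.stats_from_file filename in
    print_string "Words: ";
    print_int (Textstat.words s);
    print_newline ();
    print_string "Characters: ";
    print_int (Textstat.characters s);
    print_newline ();
    print_string "Sentences: ";
    print_int (Textstat.sentences s);
    print_newline ();
    print_string "Lines: ";
    print_int (Textstat.lines s);
    print_newline ()
  with e ->
    print_string "An error occurred: ";
    print_string (Printexc.to_string e);
    print_newline ();
    exit 1

(* Sys.argv.(0) is the program name; we expect exactly one more argument *)
let () =
  match Sys.argv with
  | [| _; filename |] -> print_stats filename
  | _ ->
      print_string "Usage: stats <filename>";
      print_newline ()
```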

There are some new things here:

The built-in array Sys.argv lists the arguments given to a command written at the command line. The first is the name of our program, so we ignore that. The second will be the name of the file the user wants our program to inspect. So, we match against that array. If there is any other array size, we print out a usage message.
The function Printexc.to_string from the OCaml Standard Library converts an exception into a string – we use this to print out the error.
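When an error occurs, we call exit 1 to end the program. By convention, an exit code of 0 means success and any other value (here 1) signals failure, which scripts and other programs can check. Do not worry too much about this.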
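To experiment with the first two ideas in isolation, here is a small sketch. The functions file_of_args and describe_error are hypothetical helpers of our own: the first matches an argument array in the same way as stats.ml matches Sys.argv, and the second uses Printexc.to_string to describe any exception raised by a function. Returning strings and options, rather than printing, makes the behaviour easy to check.

```ocaml
(* Pick out the file name from an argument array. Element 0 is the
   program name, so we want an array of exactly two elements. *)
let file_of_args (args : string array) : string option =
  match args with
  | [| _; filename |] -> Some filename
  | _ -> None

(* Run f, and describe any exception it raises as a string *)
let describe_error (f : unit -> 'a) : string =
  match f () with
  | _ -> "no error"
  | exception e -> Printexc.to_string e
```

For example, describe_error (fun () -> raise Not_found) gives "Not_found", in the same style as the Sys_error message in the sample run below.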

Let us compile this standalone program using ocamlc, giving a name for the executable program using the -o option:

ocamlc textstat.mli textstat.ml stats.ml -o stats

Now, we can run the program:

$ ./stats gregor.txt
Words: 80
Characters: 464
Sentences: 4
Lines: 8
$ ./stats not_there.txt
An error occurred: Sys_error("not_there.txt: No such file or directory")
$ ./stats
Usage: stats <filename>

This output might look different on your computer, depending on your operating system. On most computers, the ocamlopt compiler is also available. If we type

ocamlopt textstat.mli textstat.ml stats.ml -o stats

we obtain an executable which typically runs considerably faster than before, and which no longer needs OCaml to be installed: it can run on any computer which has the same processor and operating system (such as Windows or Mac OS X) as yours. On the other hand, the advantage of ocamlc is that the bytecode it produces is portable: it can run on any computer, whatever its processor or operating system, so long as a compatible version of OCaml is installed to run it.
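If you build the same program with both compilers, you may want to check which one produced a given executable. This sketch uses Sys.backend_type from the Standard Library, which is available from OCaml 4.04 onwards; older versions do not have it.

```ocaml
(* Report which kind of code the running program was compiled to *)
let backend_name () =
  match Sys.backend_type with
  | Sys.Native -> "native code (ocamlopt)"
  | Sys.Bytecode -> "bytecode (ocamlc)"
  | Sys.Other s -> "other backend: " ^ s

let () =
  print_string ("This program was compiled to " ^ backend_name ());
  print_newline ()
```

Compiling this file once with ocamlc and once with ocamlopt should give two executables which print different messages.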

Questions

Extend our example to print the character histogram data as we did in Chapter 13.
Write and compile a standalone program to reverse the lines in a text file, writing to another file.
Write a program which takes sufficiently long to run to allow you to compare the speed of programs compiled with ocamlc and ocamlopt.
Write a standalone program to search for a given string in a file. Lines where the string is found should be printed to the screen.