So far we have been writing little programs and testing them interactively in OCaml. However, to conquer the complexity of the task of writing larger programs, tools are needed to split them into well-defined modules, each with a given set of types and functions. We can then build big systems without worrying that some internal change to a single module will affect the whole program. This process of modularization is known as abstraction, and is fundamental to writing large programs, a discipline sometimes called software engineering.
In this chapter, you will have to create text files and type commands into the command prompt of your computer. If you are not sure how to do this, or the examples in this chapter do not work for you, ask a friend or teacher. In particular, if using Microsoft Windows, some of the commands may have different names.
We will be building a modular version of our text statistics program from Chapter 13. First, write the text file shown below (but not the italic annotations) and save it as textstat.ml (OCaml programs live in files with lowercase names ending in .ml).
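Here is a minimal sketch of what textstat.ml might contain at this stage, consistent with the description that follows. The stats_from_file function is an assumption based on its use in the session below. In this sketch it opens the named file, passes the channel to stats_from_channel, and closes it again.

```ocaml
(* Text statistics *)
type stats = int * int * int * int

(* Read statistics from a channel: for now, just zeros *)
let stats_from_channel (_ : in_channel) = (0, 0, 0, 0)

(* Read statistics from the named file (assumed helper) *)
let stats_from_file filename =
  let channel = open_in filename in
  let result = stats_from_channel channel in
  close_in channel;
  result
```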
The first line is a comment. Comments in OCaml are written between (* and *). We use comments in large programs to help the reader (who might be someone else, or ourselves some time later) to understand the program.
We have then introduced a type for our statistics. This will hold the number of lines, characters, words, and sentences. Next, we have written a function stats_from_channel which for now just returns zeros for all the statistics, together with stats_from_file, which does the same job given a file name instead of a channel.
Now, we can issue a command to turn this program into a pre-compiled OCaml module. This compiles the program into OCaml bytecode. The module can then be loaded into interactive OCaml, or used to build standalone programs. Execute the following command:
ocamlc textstat.ml
You can see that the name of the OCaml compiler is ocamlc. If there are errors in textstat.ml, they will be printed out, including the line and character number of the problem. You must fix these, and try the command again. If compilation succeeds, you will see the file textstat.cmo in the current directory. There will be other files, but we are not worried about those yet. Let us load our pre-compiled module into OCaml:
OCaml
# #load "textstat.cmo";;
# Textstat.stats_from_file "gregor.txt";;
- : int * int * int * int = (0, 0, 0, 0)
Note that #load is different from our earlier #use command. That was just reading a file as if it had been cut and pasted; here, we are really loading the compiled module.
Let us add a real stats_from_channel function to produce a working text statistics module. We will also add utility functions for retrieving individual statistics from the stats type:
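Here is one possible version of the full textstat.ml. It is a sketch that rests on several assumptions. The four statistics are stored as the tuple (lines, characters, words, sentences). Characters do not include newlines. A word is any run of characters other than spaces and tabs. Each '.', '?' or '!' ends a sentence. With different counting rules, the numbers reported for gregor.txt would differ from those shown below. The local reference names n_lines and so on are our own choice, made so that they do not shadow the accessor functions.

```ocaml
(* Text statistics *)
type stats = int * int * int * int

(* Utility functions to retrieve parts of a stats value *)
let lines ((l, _, _, _) : stats) = l
let characters ((_, c, _, _) : stats) = c
let words ((_, _, w, _) : stats) = w
let sentences ((_, _, _, s) : stats) = s

(* Count lines, characters, words and sentences on a channel.
   Assumed rules: characters exclude newlines, a word is a run of
   characters other than space or tab, and each '.', '?' or '!'
   ends a sentence. *)
let stats_from_channel (in_channel : in_channel) : stats =
  let n_lines = ref 0 in
  let n_characters = ref 0 in
  let n_words = ref 0 in
  let n_sentences = ref 0 in
  try
    while true do
      let line = input_line in_channel in
      let in_word = ref false in
      incr n_lines;
      n_characters := !n_characters + String.length line;
      String.iter
        (fun c ->
          (match c with
           | '.' | '?' | '!' -> incr n_sentences
           | _ -> ());
          match c with
          | ' ' | '\t' -> in_word := false
          | _ ->
              if not !in_word then begin
                incr n_words;
                in_word := true
              end)
        line
    done;
    (0, 0, 0, 0) (* unreachable: the loop only ends with End_of_file *)
  with End_of_file -> (!n_lines, !n_characters, !n_words, !n_sentences)

(* Read statistics from the named file; exceptions such as Sys_error
   raised by open_in are not handled here *)
let stats_from_file filename =
  let channel = open_in filename in
  let result = stats_from_channel channel in
  close_in channel;
  result
```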
We can compile it in the same way, and try it with our example file:
OCaml
# #load "textstat.cmo";;
# let s = Textstat.stats_from_file "gregor.txt";;
val s : Textstat.stats = (8, 464, 80, 4)
# Textstat.lines s;;
- : int = 8
# Textstat.characters s;;
- : int = 464
# Textstat.words s;;
- : int = 80
# Textstat.sentences s;;
- : int = 4
You might ask why we need the functions lines, characters, and so on, when the information is already returned in the tuple. Let us discuss that now.
We said that modules were for creating abstractions, so that the implementation of an individual module could be altered without changing the rest of the program. However, we have not achieved that yet. The details of the internal type are visible to the program using the module, and that program would break if we changed the type of stats to hold an additional statistic. In addition, the internal stats_from_channel function is available, even though the user of the module is not expected to use it.
What we would like to do is to restrict the module so that only the types and functions we want to be used directly are available. For this, we use an interface. Interfaces are held in files ending in .mli, and the interface for textstat.ml must be called textstat.mli. Here is our interface:
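Here is a sketch of textstat.mli that exposes everything. It assumes the implementation defines the four accessor functions together with stats_from_channel and stats_from_file:

```ocaml
(* Textstat module interface: every type and function is exposed *)
type stats = int * int * int * int

val lines : stats -> int
val characters : stats -> int
val words : stats -> int
val sentences : stats -> int

val stats_from_channel : in_channel -> stats
val stats_from_file : string -> stats
```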
In this interface, we have exposed every type and function. Types are written in the same way as in the .ml file. Functions are written with val, followed by the name, a colon, and the type of the function. We can compile this by giving the .mli file together with the .ml file when using ocamlc. The .mli file must come first, so that the compiled interface exists when the implementation is checked against it:
ocamlc textstat.mli textstat.ml
The ocamlc compiler has created at least two files: textstat.cmo as before, and textstat.cmi (the compiled interface). You should find this operates exactly as before when loaded into OCaml. Now, let us remove the definition of the type from the interface, leaving only the bare declaration type stats. This makes sure that the stats type is hidden, and that its parts can only be accessed using the lines, characters, words, and sentences functions. We will also remove the declaration for stats_from_channel to demonstrate that functions we do not need can be hidden too:
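With those two removals, textstat.mli might read as follows. This is a sketch, assuming the same function names as before. The line type stats, with no = part, is what makes the type abstract: other modules know that such a type exists, but cannot see that it is a tuple.

```ocaml
(* Textstat module interface: stats is abstract, and
   stats_from_channel is no longer exported *)
type stats

val lines : stats -> int
val characters : stats -> int
val words : stats -> int
val sentences : stats -> int

val stats_from_file : string -> stats
```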
Now, if we compile the program again with ocamlc textstat.mli textstat.ml, we see that the stats_from_channel function is no longer accessible, and the type of stats is now hidden, or abstract.
OCaml
# #load "textstat.cmo";;
# let s = Textstat.stats_from_file "gregor.txt";;
val s : Textstat.stats = <abstr>
# Textstat.lines s;;
- : int = 8
# Textstat.characters s;;
- : int = 464
# Textstat.words s;;
- : int = 80
# Textstat.sentences s;;
- : int = 4
# Textstat.stats_from_channel;;
Error: Unbound value Textstat.stats_from_channel
We have successfully separated the implementation of our module from its interface. We can now change the stats type internally to hold extra statistics without invalidating existing programs. This is abstraction in a nutshell.
Now it is time to cut ourselves free from interactive OCaml, and build standalone programs which can be executed directly. Let us add another file stats.ml which will use functions from the Textstat module to create a program which, when given a file name, prints some statistics about it:
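Here is a sketch of stats.ml that matches the description and sample runs in this section. The helper name print_stats is our own. The program refers to Textstat, so it cannot be compiled on its own. It must be given to ocamlc after textstat.mli and textstat.ml, as in the command used to build the program.

```ocaml
(* stats.ml: print statistics for the file named on the command line,
   using only the functions exported by Textstat *)
let print_stats filename =
  try
    let s = Textstat.stats_from_file filename in
    print_string "Words: ";
    print_int (Textstat.words s);
    print_newline ();
    print_string "Characters: ";
    print_int (Textstat.characters s);
    print_newline ();
    print_string "Sentences: ";
    print_int (Textstat.sentences s);
    print_newline ();
    print_string "Lines: ";
    print_int (Textstat.lines s);
    print_newline ()
  with e ->
    (* Report any exception, then exit with code 1 to signal failure *)
    print_string "An error occurred: ";
    print_string (Printexc.to_string e);
    print_newline ();
    exit 1

(* Sys.argv.(0) is the program name; we expect exactly one more argument *)
let () =
  match Sys.argv with
  | [| _; filename |] -> print_stats filename
  | _ ->
      print_string "Usage: stats <filename>";
      print_newline ()
```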
There are some new things here:
The built-in array Sys.argv lists the arguments given to a command written at the command line. The first is the name of our program, so we ignore that. The second will be the name of the file the user wants our program to inspect. So, we match against an array of exactly two elements. If the array has any other size, we print out a usage message.
The function Printexc.to_string from the OCaml Standard Library converts an exception into a string. We use this to print out the error.
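The Standard Library function exit ends the program immediately with the given exit code. By convention, an exit code of 0 means success and any other value means failure, so since there was an error we use 1. You need not worry about exit codes for now.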
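To see these pieces in isolation, the argument check and the error message can be written as small functions and tested without running a program. The names filename_of_args and error_message are hypothetical, not part of the Standard Library. The array passed in has the same shape as Sys.argv, with the program name first.

```ocaml
(* Hypothetical helper: pick out the file name from an array shaped
   like Sys.argv, whose element 0 is the program name *)
let filename_of_args (args : string array) : string option =
  match args with
  | [| _; filename |] -> Some filename
  | _ -> None

(* Hypothetical helper: the message printed when an exception escapes *)
let error_message (e : exn) : string =
  "An error occurred: " ^ Printexc.to_string e
```

A program could then match on filename_of_args Sys.argv instead of on Sys.argv directly, which keeps the decision about what to do separate from the printing.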
Let us compile this standalone program using ocamlc, giving a name for the executable program using the -o option:
ocamlc textstat.mli textstat.ml stats.ml -o stats
Now, we can run the program:
$ ./stats gregor.txt
Words: 80
Characters: 464
Sentences: 4
Lines: 8
$ ./stats not_there.txt
An error occurred: Sys_error("not_there.txt: No such file or directory")
$ ./stats
Usage: stats <filename>
This output might look different on your computer, depending on your operating system. On most computers, the native-code compiler ocamlopt is also available. If we type
ocamlopt textstat.mli textstat.ml stats.ml -o stats
we obtain an executable which is typically much faster than before, and completely independent of OCaml: it can run on any computer which has the same processor and operating system (such as Windows or Mac OS X) as yours, with no need for an OCaml installation. On the other hand, the advantage of ocamlc is that it produces a program which can run on any computer, whatever its processor, so long as the OCaml runtime system (the bytecode interpreter ocamlrun) is installed.
Extend our example to print the character histogram data as we did in Chapter 13.
Write and compile a standalone program to reverse the lines in a text file, writing to another file.
Write a program which takes sufficiently long to run to allow you to compare the speed of programs compiled with ocamlc and ocamlopt.
Write a standalone program to search for a given string in a file. Lines where the string is found should be printed to the screen.