The expression 17
is of type int and is a value already.
The expression 1 + 2 * 3 + 4
is of type int and evaluates to the
value 11
, since the multiplication is done first. The
expression 800 / 80 / 8
has type int. It is the same as
(800 / 80) / 8
rather than 800 / (80 / 8)
and
evaluates to 1
.
The expression 400 > 200
has type bool because this is the type
of the result of the comparison operator >
. It evaluates
to true
. Similarly, 1 <> 1
has type
bool and evaluates to
false
. The expression true || false
is of type
bool and evaluates to
true
since one of the operands is true. Similarly,
true && false
evaluates to false
since
one of the operands is false. The expression
if true then false else true
evaluates to
false
since the first (then
) part of the
conditional expression is chosen, and takes the place of the entire
expression.
The expression ’%’
is of type char and is already a value. The
expression ’a’ + ’b’
has no type – it gives a type error
because the +
operator does not operate on characters.
The mod
operator is of higher precedence than the
+
operator. So 1 + 2 mod 3
and
1 + (2 mod 3)
are the same expression, evaluating to
1 + 2
which is 3
, but
(1 + 2) mod 3
is the same as 3 mod 3
, which is
0
.
The expression evaluates to 11
. The programmer seems to
be under the impression that spacing affects evaluation order. It does
not, and so this use of space is misleading.
The expression max_int + 1
evaluates to a number equal
to min_int
. Likewise, min_int - 1
evaluates to
a number equal to max_int
. The number line “wraps around”.
This leads to the odd situation that
max_int + 1 < max_int
evaluates to true
. It
follows that when writing programs, we must be careful about what
happens when numbers may be very large or very small.
OCaml accepts the program, but complains when it is run:
OCaml
# 1 / 0;;
Exception: Division_by_zero.
We will talk about such exceptions later in the book. They are used for program errors which cannot necessarily be found just by looking at the program text, but are only discovered during evaluation.
For x mod y
:
when y = 0, OCaml
prints Exception: Division_by_zero
when y < > 0, x < 0, the result will be negative
when y < > 0, x = 0, the result will be zero
This illustrates how even simple mathematical operators require careful specification when programming – we must be explicit about the rules.
It prevents unexpected values: what would happen if an integer other
than 1
and 0
was calculated in the program –
what would it mean? It is better just to use a different type. We can
then show more easily that a program is correct.
The lowercase characters are in alphabetical order, for example
’p’ < ’q’
evaluates to true
. The uppercase
characters are similarly ordered. The uppercase letters are all
“smaller” than the lowercase characters, so for example
’A’ < ’a’
evaluates to true
. For type bool, false
is
considered “less than” true
.
Just take in an integer and return the number multiplied by ten. The function takes and returns an integer, so the type is int → int.
OCaml
# let times_ten x = x * 10;;
val times_ten : int -> int = <fun>
We must take two integer arguments, and use the
&&
and <>
operators to test if
they are both non-zero. So the result will be of type bool. The whole type will
therefore be int → int →
bool.
OCaml
# let both_non_zero x y =
x <> 0 && y <> 0;;
val both_non_zero : int -> int -> bool = <fun>
Our function should take an integer, and return another one (the sum). So, it is type will be int → int. The base case is when the number is equal to 1. Then, the sum of all numbers from 1…1 is just 1. If not, we add the argument to the sum of all the numbers from 1…(n−1).
OCaml
# let rec sum n =
if n = 1 then 1 else n + sum (n - 1);;
val sum : int -> int = <fun>
The function is recursive, so we used
let rec
instead of
let
. What happens if the argument given is
zero or negative?
The function will have type int → int → int. A number to the power of 0 is 1. A number to the power of 1 is itself. Otherwise, the answer is the current n multiplied by nx − 1.
OCaml
# let rec power x n =
if n = 0 then 1 else
(if n = 1 then x else
x * power x (n - 1));;
val power : int -> int -> int = <fun>
Notice that we had to put one if
…
then
…
else
inside the
else
part of another to cope with the
three different cases. The parentheses are not actually required,
though, so we may write it like this:
OCaml
# let rec power x n =
if n = 0 then 1 else
if n = 1 then x else
x * power x (n - 1);;
val power : int -> int -> int = <fun>
In fact, we can remove the case for n = 1
since
power x 1
will reduce to x * power x 0
which
is just x
.
The function isconsonant
will have type char →
bool. If a lower case character in the range
’a’
…’z’
is not a vowel, it must be a
consonant. So we can reuse the isvowel
function we wrote
earlier, and negate its result using the not
function:
OCaml
# let isconsonant c = not (isvowel c);;
val isconsonant : char -> bool = <fun>
The expression is the same as
let
x = 1
in
(
let
x = 2
in
x + x)
,
and so the result is 4
. Both instances of x
in
x + x
evaluate to 2
since this is the value
assigned to the name x
in the nearest enclosing
let
expression.
We could simply return 0 for a negative argument. The factorial of 0 is 1, so we can change that too, and say our new function finds the factorial of any non-negative number:
OCaml
# let rec factorial x =
if x < 0 then 0 else
if x = 0 then 1 else
x * factorial (x - 1);;
val factorial : int -> int = <fun>
Note that factorial
can fail in other ways too – if the
number gets too big and “wraps around”. For example, on the author’s
computer, factorial 40
yields
-2188836759280812032
.
We can just pattern match on the boolean. It does not matter, in this instance, which order the two cases are in.
Recall our solution from the previous chapter:
Modifying it to use pattern matching:
Again, modifying our solution from the previous chapter:
This is the same as
(A match case belongs to its nearest enclosing
match
). So the expression evaluates to
5
.
We write two functions of type char → bool like this:
Alternatively, we might write:
These two solutions have differing behaviour upon erroneous arguments (such as punctuation). Can you see why?
This is similar to odd_elements
:
But we can perform the same trick as before, by reversing the cases, to reduce their number:
This is like counting the length of a list, but we only count if the
current element is true
.
We can use an accumulating argument in an auxiliary function to make a tail recursive version:
To make a palindrome from any list, we can append it to its reverse. To check if a list is a palindrome, we can compare it for equality with its reverse (the comparison operators work over almost all types).
We pattern match with three cases. The empty list, where we have reached the last element, and where we have yet to reach it.
We can build a tail recursive version by adding an accumulating list,
and reversing it when finished (assuming a tail recursive
rev
, of course!)
The empty list cannot contain the element; if there is a non-empty list, either the head is equal to the element we are looking for, or if not, the result of our function is just the same as the result of recursing on the tail.
Note that we are using the property that the ||
operator
only evaluates its right hand side if the left hand side is false to
limit the recursion – it really does stop as soon as it finds the
element.
If a list is empty, it is already a set. If not, either the head exists somewhere in the tail or it does not; if it does exist in the tail, we can discard it, since it will be included later. If not, we must include it.
For example, consider the evaluation of
make_set [4; 5; 6; 5; 4]
:
The first part of the evaluation of rev
takes time
proportional to the length of the list, processing each element once.
However, when the lists are appended together, the order of the
operations is such that the first argument becomes longer each time. The
@
operator, as we know, also takes time proportional to the
length of its first argument. And so, this accumulating of the lists
takes time proportional to the square of the length of the list.
By using an additional accumulating argument, we can write a version which operates in time proportional to the length of the list.
For the same list:
Simply add an extra let
to define a
name representing the number we will take or drop:
The argument to take
or drop
is
length l / 2
which is clearly less than or equal to
length l
for all possible values of l
. Thus,
take
and drop
always succeed. In our case,
take
and drop
are only called with
length l
more than 1, due to the pattern matching.
We may simply replace the <=
operator with the
>=
operator in the insert
function.
The sort
function is unaltered.
We require a function of type α list → bool. List of length zero and one are, by definition, sorted. If the list is longer, check that its first two elements are in sorted order. If this is true, also check that the rest of the list is sorted, starting with the second element.
We can reverse the cases to simplify:
Lists are compared starting with their first elements. If the elements differ, they are compared, and that is the result of the comparison. If both have the same first element, the second elements are considered, and so on. If the end of one list is reached before the other, the shorter list is considered smaller. For example:
[1] < [2] < [2; 1] < [2; 2]
These are the same principles you use to look up a word in a dictionary: compare the first letters – if same, compare the second etc. So, when applied to the example in the question, it has the effect of sorting the words into alphabetical order.
The let rec
construct can be nested
just like the let
construct:
We have renamed the second argument of the insert
function to avoid confusion.
Our function will have type char list → char list. We just match on the argument list: if it is empty, we are done. If it starts with an exclamation mark, we output a period, and carry on. If not, we output the character unchanged, and carry on:
To use map
instead, we write a simple function
calm_char
to process a single character. We can then use
map
to build our main function:
This avoids the explicit recursion of the original, and so it is easier to see what is going on.
The clip
function is of type int →
int and is easy to write:
Now we can use map for the cliplist
function:
Just put the body of the clip function inside an anonymous function:
We require a function apply f n x
which applies function
f
a total of n
times to the initial value
x
. The base case is when n
is zero.
Consider the type:
The function f
must take and return the same type α, since its result in one iteration
is fed back in as its argument in the next. Therefore, the argument
x
and the final result must also have type α. For example, for α = int, we might
have a power function:
So power a b
calculates ab.
We can add an extra argument to the insert
function, and
use that instead of the comparison operator:
Now we just need to rewrite the sort
function.
We cannot use map here, because the result list will not necessarily be the same length as the argument list. The function will have type (α → bool) → α list → α list.
For example,
filter (
fun
x -> x mod 2 = 0) [1; 2; 4; 5]
evaluates to [2; 4]
.
The function will have type (α → bool) → α list → bool.
For example, we can see if all elements of a list are positive:
for_all (
fun
x -> x > 0) [1; 2; -1]
evaluates to false
. Notice that we are relying on the fact
that &&
only evaluates its right hand side when the
left hand side is true to limit the recursion.
The function will have type (α → β) → α
list list → β list list. We use
map
on each element of the list.
We have used explicit recursion to handle the outer list, and
map
to handle each inner list.
The function smallest_inner
takes a currently smallest
found integer, a boolean value found
indicating if we have
found any suitable value or not, and the list of integers. It is started
with max_int
as the current value, so that any number is
smaller than it, and false
for found
because
nothing has been found yet.
Thus, the function raises an exception in the case of an empty list, or one which is non-empty but contains no positive integer, and otherwise returns the smallest positive integer in the list.
We just surround the call to smallest
with an exception
handler for Not_found
.
We write a function sqrt_inner
which, given a test
number x
and a target number n
squares
x
and tests if it is more than n
. If it is,
the answer is x - 1
. The test number will be initialized at
1
. The function sqrt
raises our exception if
the argument is less than zero, and otherwise begins the testing
process.
We wrap up the function, handle the Complex
exception
and return.
Since the keys must be unique, the number of different keys is simply
the length of the list representing the dictionary – so we can just use
the usual length
function.
The type is the same as for the add
function. However,
if we reach the end of the list, we raise an exception, since we did not
manage to find the entry to replace.
The function takes a list of keys and a list of values and returns a dictionary. So it will have type α list → β list → (α × β) list.
This will have the type (α × β) list → α list × β list. For the first time, we need to return a pair, building up both result lists element by element. This is rather awkward, since we will need the tails of both of the eventual results, so we can attach the new heads. We can do this by pattern matching.
Here’s a sample evaluation (we cannot really show it in the conventional way, so you must work through it whilst looking at the function definition):
Since the inner pattern match has only one form, and is complete, we
can use let
instead:
We can use our member
function which determines whether
an element is a member of a list, building up a list of the keys we have
already seen, and adding to the result list of key-value pairs only
those with new keys.
How long does this take to run? Consider how long member
takes.
We pattern match on the first list – if it is empty, the result is
simply b
. Otherwise, we add the first element of the first
list to the union of the rest of its elements and the second list.
We can verify that the elements of dictionary a
have
precedence over the elements of dictionary b
by noting that
add
replaces a value if the key already exists.
The function g a b c
has type α →
β → γ
→ δ which can also be written α →
(β
→ (γ →
δ)). Thus, it takes an argument of type α and returns a function of type
β → (γ →
δ)
which, when you give it an argument of type β returns a function of type γ →
δ which, when you give it an
argument of type γ returns
something of type δ. And so,
we can apply just one or two arguments to the function g
(which is called partial application), or apply all three at once. When
we write let
g a b c = …
this
is just shorthand for
let
g =
fun
a ->
fun
b ->
fun
c -> …
The type of member
is α → α
list → bool, so if we
partially apply the first argument, the type of member x
must be α list → bool. We can use the
partially-applied member
function and map
to
produce a list of boolean values, one for each list in the argument,
indicating whether or not that list contains the element. Then, we can
use member
again to make sure there are no
false
booleans in the list.
We could also write:
Which do you think is clearer? Why do we check for the absence of
false
rather than the presence of true
?
The function ( / ) 2
resulting from the partial
application of the /
operator is the function which divides
two by a given number, not the function which divides a given number by
two. We can define a reverse divide function…
let
rdiv x y = y / x
…which, when partially applied, does what we want.
The function map
has type (α →
β)
→ α list → β
list. The function mapl
we wrote has type
(α →
β)
→ α list list → β
list list. So the function mapll
will have
type (α →
β)
→ α list list list → β
list list list. It may be defined thus:
But, as discussed, we may remove the l
s too:
It is not possible to write a function which would map a function
f
over a list, or list of lists, or list of lists of lists
depending upon its argument, because every function in OCaml must have a
single type. If a function could map f
over an α
list list it must inspect its argument enough to know it
is a list of lists, thus it could not be used on a β list unless β = α list.
We can write a function to truncate a single list using our
take
function, being careful to deal with the case where
there is not enough to take, and then use this and map
to
build truncate
itself.
Here we have used partial application of truncate
to
build a suitable function for map
. Note that we could use
exception handling instead of calling length, saving time:
You might, however, reflect on whether or not this is good style.
First, define a function which takes the given number and a list,
returning the first element (or the number if none). We can then build
the main function, using partial application to make a suitable function
to give to map
:
We need two constructors – one for squares, which needs just a single integer (the length of a side), and one for rectangles which needs two integers (the width and height, in that order):
The name of our new type is rect. A
rect is either a Square
or a Rectangle
. For
example,
We pattern match on the argument:
This will be a function of type rect → rect. Squares remain unaltered, but if we have a rectangle with a bigger width than height, we rotate it by ninety degrees.
We will use map
to perform our rotation on any rects in the argument list which need it. We
will then use the sorting function from the previous chapter which takes
a custom comparison function so as to just compare the widths.
For example, packing the list of rects
[Square 6; Rectangle (4, 3); Rectangle (5, 6); Square 2]
will give
[Square 2; Rectangle (3, 4); Rectangle (5, 6); Square 6]
We follow the same pattern as for the list type, being careful to deal with exceptional circumstances:
We can use our power
function from earlier:
We can just wrap up the previous function:
Our function will have type α →
α tree → bool. It takes a
element to look for, a tree holding that kind of element, and returns
true
if the element is found, or false
otherwise.
Note that we have placed the test x = y
first of the
three to ensure earliest termination upon finding an appropriate
element.
Our function will have type α tree → α tree. A leaf flips to a leaf. A branch has its left and right swapped, and we must recursively flip its left and right sub-trees too.
We can check each part of both trees together. Leaves are considered equal, branches are equal if their left and right sub-trees are equal.
We can use the tree insertion operation repeatedly:
There will be no key clashes, because the argument should already be
a dictionary. If it is not, earlier keys are preferred since
insert
replaces existing keys.
We can make list dictionaries from both tree dictionaries, append them, and build a new tree from the resultant list.
The combined list may not be a dictionary (because it may have
repeated keys), but tree_of_list
will prefer keys
encountered earlier. So, we put entries from t’
after
those from t
.
We will use a list for the sub-trees of each branch, with the empty list signifying there are no more i.e. that this is the bottom of the tree. Thus, we only need a single constructor.
type
’a mtree = Branch
of
’a * ’a mtree list
So, now we can define size
, total
, and
map
.
In fact, when there is only one pattern to match, we can put it directly in place of the function’s argument, simplifying these definitions:
A first attempt might be:
However, there are two problems:
OCaml
# [1; 2; 3];;
- : int list = [1; 2; 3]
# print_integers [1; 2; 3];;
[1; 2; 3; ]- : unit = ()
There is an extra space after the last element, and a semicolon too. We can fix this, at the cost of a longer program:
Now, the result is correct:
OCaml
# [1; 2; 3];;
- : int list = [1; 2; 3]
# print_integers [1; 2; 3];;
[1; 2; 3]- : unit = ()
We must deal with the exception raised when read_int
attempts to read something which is not an integer, as before. When that
exception is caught, we try again, by recursively calling ourselves. The
function ends when three integers are input correctly, returning them as
a tuple.
You may wonder why we used nested
let
…
in
…
structures rather than
just writing (read_int (), read_int (), read_int ())
– the
evaluation order of a tuple is not specified and OCaml is free to do
what it wants.
We ask the user how many dictionary entries will be entered, eliminating the need for a special “I have finished” code. First, a function to read a given number of integer–string pairs, dealing with the usual problem of malformed integers:
And now, asking the user how many entries there will be, and calling our first function:
Notice that we defined, raised, and handled our own exception
BadNumber
to deal with the user asking to read a negative
number of dictionary entries – this would cause
read_dict_number
to fail to return.
If we write a function to build the list of integers from 1 to n (or the empty list if n is zero):
We can then write a function to output a table of a given size to an output channel.
Look at this carefully. We are using nested calls to
iter
to build the two-dimensional table from
one-dimensional lists. Can you separate this into more than one
function? Which approach do you think is more readable?
We can test write_table_channel
most easily by using the
built-in output channel stdout
which just writes to the
screen:
OCaml
# write_table_channel stdout 5;;
1 2 3 4 5
2 4 6 8 10
3 6 9 12 15
4 8 12 16 20
5 10 15 20 25
- : unit = ()
Now we just need to wrap it in a function to open an output file, write the table, and close the output, dealing with any errors which may arise.
In addition to raising Invalid_argument
in the case of a
negative number, we handle all possible exceptions to do with opening,
writing to, and closing the file, re-raising them as our own, predefined
one. Is this good style?
We write a simple function to count the lines in a channel by taking
a line, ignoring it, and adding one to the result of taking another
line; our recursion ends when an End_of_file
exception is
raised – it is caught and 0 ends the
summation.
The main function countlines
just opens the file, calls
the first function, and closes the file. Any errors are caught and
re-raised using the built-in Failure
exception.
As usual, let us write a function to deal with channels, and then
deal with opening and closing files afterward. Our function takes an
input channel and an output channel, adds the line read from the input
to the output, follows it with a newline character, and continues. It
only ends when the End_of_file
exception is raised inside
input_line
and caught.
Now we wrap it up, remembering to open and close both files and deal with the many different errors which might occur.
Two references, x
and y
, of type
int ref have been
created. Their initial values are 1 and
2. Their final values are 2 and 4. The
type of the expression is int because this is the type
of !x + !y
, and the result is 6.
The expression [ref 5; ref 5]
is of type int ref list. It contains two
references each containing the integer 5. Changing the contents of one
reference will not change the contents of the other. The expression
let
x = ref 5
in
[x; x]
is also of type int ref
list and also contains two references to the integer 5.
However, altering one will alter the other:
OCaml
# let r = let x = ref 5 in [x; x];;
val r : int ref list = [{contents = 5}; {contents = 5}]
#
match r with h::_ -> h := 6;;
Warning 8 [partial-match]: this pattern-matching is not exhaustive.
Here is an example of a case that is not matched:
[]
- : unit = ()
# r;;
- : int ref list = [{contents = 6}; {contents = 6}]
We can write a function forloop
which takes a function
of type int → α (where alpha would
normally be unit),
together with the start and end numbers:
For example:
OCaml
# forloop print_int 2 10;;
2345678910- : unit = ()
# forloop print_int 2 2;;
2- : unit = ()
[|1; 2; 3|]
: int
array
[|true; false; true|]
: bool array
[|[|1|]|]
: (int array)
array which is int
array array
[|[1; 2; 3]; [4; 5; 6]|]
: int list array
[|1; 2; 3|].(2)
: int, has value 2
[|1; 2; 3|].(2) <- 4
: unit, updates the array to
[|1; 2; 4|]
We use a for
construct:
Note that this works for the empty array, because a
for
construct where the second number is
less than the first never executes its expression.
Since we wish to reverse the array in place, our function will have type α array → unit. Our method is to proceed from the first element to the half-way point, swapping elements from either end of the array. If the array has odd length, the middle element will not be altered.
We will represent the int array
array as an array of columns so that
a.(x).(y)
is the element in column x
and row
y
.
Note that the result is correct for table 0
.
The difference between the codes for ’a’
and
’A’
, or ’z’
and ’Z’
is 32, so we
add or subtract as appropriate. Codes not in those ranges are
unaltered.
Periods, exclamation marks and question marks may appear in multiples, leading to a wrong answer. The number of characters does not include newlines. It is not clear how quotations would be handled. Counting the words by counting spaces is inaccurate – a line with ten words will count only nine.
We calculate the ceiling and floor, and return the closer one, being careful to make sure that a point equally far from the ceiling and floor is rounded up.
The behaviour with regard to values such as infinity
and
nan
is fine, since it always returns the result of either
floor
or ceil
.
The function returns another point, and is simple arithmetic.
The whole part is calculated using the built-in floor
function. We return a tuple, the first number being the whole part, the
second being the original number minus the whole part. In the case of a
negative number, we must be careful – floor
always rounds
downward, not toward zero!
Notice that we are using the unary operator -.
to make
the number positive.
We need to determine at which column the asterisk will be printed. It is important to make sure that the range 0…1 is split into fifty equal sized parts, which requires some careful thought. Then, we just print enough spaces to pad the line, add the asterisk, and a newline character.
No allowance has been made here for bad arguments (for example,
b
smaller than a
). Can you extend our program
to move the zero-point to the middle of the screen, so that the sine
function can be graphed even when its result is less than zero?
A non-tail-recursive one is simple:
To make a tail-recursive one, we can use an accumulator, reversing
each list as we append it, and reversing the result.
List.rev
is tail-recursive already.
We can use List.mem
, partially applied, to map over the
list of lists. We then make sure that false
is not in the
resultant list, again with List.mem
.
The String.iter
function calls a user-supplied function
of type char → unit on each character of
the string. We can use this to increment a counter when an exclamation
mark is found.
The contents of the counter is then the result of the function.
We can use the String.map
function, which takes a
user-supplied function of type char
→ char and returns a
new string, where each character is the result of the mapping function
on the character in the same place in the old string.
Notice that we have taken advantage of partial application to erase the last argument as usual.
Looking at the documentation for the String
module we
find the following:
So, by using the empty string as a separator, we have what we want:
We can use the functions create
,
add_string
, and contents
from the
Buffer
module together with the usual list iterator
List.iter
:
The initial size of the buffer, 100, is arbitrary.
We repeatedly check if the string we are looking for is right at the beginning of the string to be searched. If not, we chop one character off the string to be searched, and try again. Every time we find a match, we increment a counter.
You might consider that writing this function with lists of characters rather than strings would be easier. Unfortunately, it would be slow, and these kinds of searching tasks are often required to be very fast.
First, we extend the Textstat
module to allow
frequencies to be counted and expose it through the interface. Here is
the interface q1.mli
:
Here is its implementation q1.ml
:
Here is the main program:
We can write two little functions – one to read all the lines from a file, and one to write them. The main function, then, reads the command line to find the input and output file names, reads the lines from the input, reverses the list of lines, and writes them out. If a problem occurs, the exception is printed out. If the command line is badly formed, we print a usage message and exit.
Note that there is a problem if the file has no final newline – it will end up with one. How might you solve that?
We can simply do something (or nothing) a huge number of times using
a for
loop.
On many systems, typing time
followed by a space and the
usual command will print out on the screen how long the program took to
run. For example, on the author’s computer:
$ ocamlc bigloop.ml -o bigloop
$ time ./bigloop
real 0m1.896s
user 0m1.885s
sys 0m0.005s
$ ocamlopt bigloop.ml -o bigloop
$ time ./bigloop
real 0m0.022s
user 0m0.014s
sys 0m0.003s
You can see that, when compiled with ocamlc
, it takes
1.9s to run, but when compiled with ocamlopt
just
0.022s.
We can get all the lines in the file using our getlines
function from question two. The main function simply calls
string_in_line
on each line, printing it if
true
is returned.
The interesting function is string_in_line
. To see if
term
is in line
we start at position 0. The
condition for the term having been found is a combination of boolean
expressions. The first ensures that we are not so far through the string
that the expression could not possibly fit at the current position. The
second checks to see if the term is found at the current position by
using the function String.sub
from the OCaml Standard
Library. If not, we carry on.