Course

"An Erlang Course"

From the documentation, a short course in erlang with examples and problems. The sequential programming constructs are similar to other functional languages notably in the lack of side-effects when programming and the use of tail recursion to stop processes from growing in space and taking over all of the resources. List comprehensions are nice, ie defining a list from another list by [Y || Y <- T, Y =< X] (all elements of list T that are less than X). The concurrent programming features are what gives erlang its power - the ability to create many light-weight processes and to structure applications so that supervisor modules deal with crashes of their child processes to achieve robustness.

Programming exercises:

Simple sequential programs

%%% Code: temp.erl

-module(temp).
-export([convert/1]).

convert({c,T}) -> {f, c2f(T)};
convert({f,T}) -> {c, f2c(T)}.

c2f(T) -> 9/5*T+32.

f2c(T) -> 5/9*(T-32).

%%% End Code: temp.erl

Simple recursive programs

List1 module to calculate maximum and minimum of a list of numbers

%%% Code: list1.erl

-module(list1).
-export([min/1,max/1,min_max/1]).

min([H | T]) -> min(T, H).

min([], CurrentMin) -> CurrentMin; 
min([H | T], CurrentMin) -> 
  if   
       H < CurrentMin -> min(T, H); 
       true -> min(T,CurrentMin) 
    end.          

max([H | T]) -> max(T, H).

max([], CurrentMax) -> CurrentMax; 
max([H | T], CurrentMax) -> 
  if 
      H > CurrentMax -> max(T, H); 
        true -> max(T,CurrentMax) 
    end.          

min_max(L) -> {min(L), max(L)}.

%%% End Code: list1.erl

This works, but is a bit inelegant since the code for min and max are very similar and could be replaced by fold functions. min_max also traverses the list twice while it could instead do the task in one traversal. The list2 module makes these improvements:

%%% Code: list2.erl

-module(list2).
-export([min/1,max/1,min_max/1]).

min([H | T]) -> lists:foldl(
  fun(X,Acc)->if X<Acc -> X; true->Acc end end, H, T).

max([H | T]) -> lists:foldl(
  fun(X,Acc)->if X>Acc -> X; true->Acc end end, H, T).

min_max([H | T]) -> lists:foldl(
  fun(X,{MinAcc,MaxAcc}) -> 
    if 
      X < MinAcc, X<MaxAcc -> {X, MaxAcc};
      X > MinAcc, X<MaxAcc -> {MinAcc, MaxAcc};
      X > MinAcc, X>MaxAcc -> {MinAcc, X}
    end
  end, {H,H}, T).

%%% End Code: list2.erl

Note also that for both versions of the program an empty list causes the program to crash - this is a feature rather than a bug. A key point when programming fault-tolerant systems is that processes should crash entirely if they don't know how to deal with an input, and then their supervisor sees this and copes with the error.

Interacting processes, concurrency

Now things get interesting - creating multiple processes and getting them to talk to each other. The first task is to create two processes that then send a message between each other M times, terminating gracefully after that. This problem is basically the ping-pong program considered in tut15 of the "Getting Started with Erlang" document (p25). Instead of going through the code I'll go through the logic of what is happening:

  1. The start process spawns two processes, pong and ping. The pong process starts executing the pong function with no argument while the ping process starts executing the ping function with two arguments, the number of bounces and the PID of the pong process. IMPORTANT: pong is created first.
  2. Ping sends a packet consisting of {ping, Ping_PID} to pong then waits to receive the atom "pong".
  3. Pong receives the packet from ping and replies with a pong back to the ping process.
  4. Ping receives the pong from Pong and calls itself with (number of bounces - 1) (along with Pong_PID). Goto 2 (unless number of bounces remaining is 0)
  5. If no more bounces, Ping sends the atom "finished" to Pong and terminates.
  6. Pong receives "finished" and terminates.

To track what is going on a message is output from each process at each bounce.

Ring

The next problem is to start N processes in a ring an send a message around the ring M times. The message passes around the ring in only one direction so each process only needs to know the identity of the next process in the ring.

Approach: when the program is started with start(N,M) the first process is spawned and starts with the function start(N-1,M,Process0_PID). The extra argument is required so that the final start function knows to which node to send the start signal. The second process is spawned and goes into a function waiting for a message to start sending M messages around the ring. The arguments of this function are the number of messages to send (which decrements as they are sent) and the PID of the next process in the ring.

IMPORTANT: need to make a slight alteration, in that the intial call to start should spawn a child process that acts as the first node in the ring, rather than the interactive shell acting as the first node itself. Otherwise the next call to start gets confused... hang on a sec, the only way that it would get confused is if there were still some messages from the last invocation hanging around, but all of these should have been processed - unless I have an off-by-one error in the code for receiveing messges (ie the nodes don't receive the last message sent to them). ... Check ... No, that isn't the error, the problem is that the first node in the ring has one extra message to process due to the asymmetry in starting. Fix this by adding an extra message type so that the first node is sent "start_ring" to start and "start" to send the message.

%%% Code: ring.erl

-module(ring).  % Ring of N processes passing M messages around ring
-export([start/2]).  % Functions exported for use by user
-export([start/3,ring/2]).

ring(0,_) ->
      io:format("~w: Process finishing~n",[self()]);

ring(M,Child_PID) ->
  receive
    start_ring ->
      io:format("~w: Beginning the ring~n",[self()]),
      Child_PID ! {start, self()},
      ring(M,Child_PID);
    {start, Parent_PID} ->
      io:format("~w: Received from ~w - ~w messages to go~n",
                [self(),Parent_PID,M-1]),
      Child_PID ! {start, self()},
      ring(M-1,Child_PID)
  end.

start(N,M) ->
  spawn(ring,start,[N,M,setup]).

start(N,M,setup) ->
  io:format("~w: Process starting~n",[self()]),
  Child_PID = spawn(ring,start,[N-1,M,self()]),
  ring(M,Child_PID);

start(1,M,Ring0_PID) ->
  io:format("~w: Process starting~n",[self()]),
  Ring0_PID ! start_ring,
  ring(M,Ring0_PID);

start(N,M,Ring0_PID) ->
  io:format("~w: Process starting~n",[self()]),
  Child_PID = spawn(ring,start,[N-1,M,Ring0_PID]),
  ring(M,Child_PID).

%%% End Code: ring.erl

Star

Similar to the last problem, except now the network topology is a star, ie a central node connected bi-directionally to the other nodes in the network (which aren't connected to each other). The problem description specifies N nodes in a star, I will take this to mean there are N nodes in total, including the central node. Again M messages are sent to the nodes (and sent back to the central node? Assume that this is part of the task).

Approach: start function spawns N-1 nodes and returns the list of PIDs. It then spawns the central node with the number of messages to be sent and the list of PIDs. The central node has two states:

Once all M messages are sent a final "finished" message is sent to all of the edge nodes.

%%% Code: star.erl

-module(star).
-export([start/0,start/2,edge_node/0,central_node/2]).

send_all(Node_List, Message) ->
  lists:foreach(fun(Elem)->Elem ! Message end, Node_List).

central_node(0, Edge_PIDs) ->
  send_all(Edge_PIDs, finished), % Send the edge process: finished
  io:format("~w: Central node finished~n",[self()]);

central_node(M, Edge_PIDs) ->
  send_all(Edge_PIDs, {message, self()}),
  central_node(M, Edge_PIDs, length(Edge_PIDs)).

central_node(M, Edge_PIDs, K) ->
  receive
    {response, Edge_PID} ->
      io:format("~w: received response~n",[self()])
  end,
  if
    K > 1 ->
      central_node(M, Edge_PIDs, K-1);
    true ->
      central_node(M-1, Edge_PIDs)
  end.

edge_node() ->
  receive
    finished ->
      io:format("~w: Edge node finished~n",[self()]);
    {message, From_PID} ->
      io:format("~w: received message~n",[self()]),
      From_PID ! {response,self()},
      edge_node()
  end.

spawn_nodes(0,Acc) -> Acc;
spawn_nodes(N,Acc) ->
  Child_PID = spawn(star,edge_node,[]),
  io:format("Started edge node ~w~n",[Child_PID]),
  spawn_nodes(N-1, [Child_PID | Acc]).

start() -> start(2,3).

start(N,M) ->
  Edge_PIDs = spawn_nodes(N-1,[]),
  Cent_PID = spawn(star, central_node, [M, Edge_PIDs]),
  io:format("~w: Central node starting~n",[Cent_PID]).

%%% End Code: star.erl

Masters and Slaves, Error handling

The advantage of treating each task as an independent process is that it becomes possible to decouple processes from each other so that when one process crashes it doesn't destroy the entire system. When a process crashes it sends and EXIT signal to all the processes that it is linked to. Links between processes can be created when the process is spawned (to link the parent and child) or afterwards (to link the current process to another process - could do this is response to a message to create an arbitrary linkage topology).
Error trapping is the ability to process EXIT signals as if they were messages and react accordingly (eg by respawning the process that crashed). If exit signals are not caught they propagate up through the chain of processes that are linked to the crashing process.
The exit signal is of the form {'EXIT',PID,Why} so the trap exits clause should be programmed to match this form.

Layering is what enables the construction of robust systems. The higher level processes trap errors from lower level processes, so the lowest level processes (the actual "worker" or application processes) do not have to contain any error handling code.

Primitives for error handling:

Problem task: write a ms (for master-slave) module with the interface

When a slave dies the master should detect this, restart the slave and print a message saying that it has done this.

Approach: The shell spawns the master process and registers it as master. Master process calls itself recursively N times, each time spawn_linking a new process and adding the PID of the child to its list of children. At the end of the recursion the master sets the trap_exit process flag and goes to the master(SlavePIDList) state. While it is in this state it receives either the message stop (which causes it to send the stop message to all of its children and then exit), {N, message} where N is the number of a slave (from the shell calling the to_slave(N, message) function or an 'EXIT' signal from a slave that has received the die message. The master process sends messages to slave N by looking up the PID of that slave in its list of child PIDs. When a slave is terminated a new slave process is spawn_linked and the list of child PIDs is updated.

%%% Code: ms.erl

-module(ms).
-export([start/1,to_slave/2,stop/0]).
-export([master/1, slave/1, start_slaves/2]).

master(Slaves) ->
  receive
    stop -> stop(Slaves);
    {'EXIT',_ , N} ->
      Slave_PID = spawn_link(ms, slave, [N]),
      io:format("master restarting dead slave ~w.~n",[N]),
      master(lists:keyreplace(N,1,Slaves,{N,Slave_PID}));
    {N, Message} -> 
      lookup_PID(Slaves, N) ! Message,
      master(Slaves)
  end.

to_slave(N, Message) -> master ! {N, Message}.

lookup_PID([{M, PID} | T], N) ->
  if M == N -> PID; true -> lookup_PID(T, N) end.

slave(N) ->
  receive
    die -> 
      exit(N); % return slave number on exit
    Message -> 
      io:format("Slave ~w got message ~w.~n",[N, Message]),
      slave(N)
  end.

start_slaves(0, Slaves) ->
  process_flag(trap_exit, true),
  master(Slaves);
start_slaves(N, Slaves) ->
  Slave_PID = spawn_link(ms, slave, [N]),
  start_slaves(N-1, [{N,Slave_PID} | Slaves]).

start(N) ->
  Master_PID = spawn(ms, start_slaves, [N,[]]), 
  register(master, Master_PID).

stop() -> 
  master ! stop.

stop(Slaves) ->
  lists:foreach(fun({_, PID}) -> PID ! die end, Slaves).

%%% End Code: ms.erl

Robustness and the use of the graphics package

This task looks at adding robustness to a hiearchical set of interacting processes. Each process has a window associated with it, and each window has three buttons: Quit, Spawn and Error.
Spawn creates a new process and links it to the parent process.
Quit exits the process and any child processes.
Error simulates an error by exiting the process and and children, with the error being caught by the parent process and the process being restarted.

Approach: each new process that is spwaned creates a window with the three buttons and then goes into an interactive loop having parameters the handles of the buttons and the PID of the parent process. The PID of the parent process is used when the process responds to error signals - if the quit signal is received by a process that process quits if the message is from its parent and ignores the message otherwise. Similarly an error signal causes a process to exit if it receives the signal from its parent, or spawn a new process if the signal is not from its parent.
[Initially I tried to do this the other way around, ie each node kept a list of its children and searched the list for the PID of any message it received, responding accordingly if the process was in the list of children or not. This approach is less efficient and more complicated since the list must be updated each time a new process is spawned or a child process exits].

%%% Code: robust.erl

-module(robust).
-export([start/0]).
-export([loop/4,start_win/1]).

loop(Quit, Spawn, Error, Parent_PID) ->
  receive
    {gs,Quit,click,_,_} ->
      io:format("~w exiting normally~n",[self()]),
      exit(quit);
    {gs,Spawn,click,_,_} ->
      Child_PID =spawn_link(robust,start_win,[self()]),
      io:format("~w spawning child ~w~n",[self(),Child_PID]),
      loop(Quit,Spawn,Error, Parent_PID);
    {gs,Error,click,_,_} ->
      exit(die);
    {'EXIT',PID,quit} ->
      if PID == Parent_PID -> 
           io:format("~w received quit from parent ~w.~n",
                     [self(),Parent_PID]),
           exit(quit);
         true ->
           loop(Quit,Spawn,Error, Parent_PID)
      end;
    {'EXIT',PID,die} ->
      if PID == Parent_PID -> 
           io:format("~w received error from parent ~w.~n",
                     [self(),Parent_PID]),
           exit(die);
         true ->
           spawn_link(robust,start_win,[self()]),
           loop(Quit,Spawn,Error, Parent_PID)
      end
  end.

start() ->
  spawn(robust,start_win,[self()]).

start_win(Parent_PID) ->
  GS=gs:start(),
  WH = [{width,200},{height,40}],
  Win = gs:create(window,GS,WH),
  gs:frame(packer,Win,{packer_x,[{stretch,1},{stretch,1},{stretch,1}]}),
  Quit  = gs:button(packer,[{label,{text,"Quit"}}  ,{pack_x,1}]),
  Spawn = gs:button(packer,[{label,{text,"Spawn"}} ,{pack_x,2}]),
  Error = gs:button(packer,[{label,{text,"Error"}} ,{pack_x,3}]),
  gs:config(packer,WH),
  process_flag(trap_exit,true),
  gs:config(Win,{map,true}),
  loop(Quit, Spawn, Error, Parent_PID).

%%% End Code: robust.erl