Matt's Blog


Sat Dec 9 14:31:58 EST 2006

Writing a computer program at the most basic level simply involves taking a thought, translating that thought into a programming language and then running the final program on some machine. However, the downside to programming is that it requires a rigid structure so that the computer can interpret the programming language correctly. This is not entirely a downside, as it encourages logical thought about a problem, but many elements of a program consist of "boilerplate" code that is reused from problem to problem with no thought involved: a boring problem rather than an interesting one.

Programmers like interesting problems. Turning a boring problem into an interesting problem means finding a general solution to the boring problem so the programmer does not have to think about the boring problem ever again. This means building tools.

The Ocaml distribution has a good selection of tools, similar to those available for other mature languages. These are the tools that are language-specific:

  • A compiler. This converts the program written in the programming language (the source code) into a form that can be run by a machine (the object code). Ocaml actually has two compilers: ocamlc produces machine-independent bytecode that runs on the Ocaml virtual machine, and ocamlopt produces native machine code for particular hardware, eg x86.
  • An interactive interpreter (the Ocaml "toplevel"). This is a tool common to many high level programming languages, eg Ocaml, Lisp, Prolog and Matlab, but less common for lower level languages such as C and Pascal.
  • A runtime interpreter, or virtual machine (ocamlrun). This interprets (runs) the bytecode produced by the bytecode compiler. The advantage is that to distribute a program to a number of different computer types it is sufficient to port the virtual machine to each type of computer and then distribute the program in bytecode form. The alternative is to compile the program natively for each architecture.
  • Lexer and parser generators (ocamllex and ocamlyacc). A lexer translates a stream of input characters into a stream of lexical tokens, eg words. A parser checks whether a sequence of lexical tokens forms a syntactically valid structure according to some grammar definition, eg a sentence with subject, predicate and object. These tools are useful for processing user input, or for writing a domain specific language that expresses a given problem at a level closer to human language than computer code, and is then translated into computer code.
    • Sidenote: It is possible to think of the next higher level, ie semantic analysis, which checks whether the stream of syntactically valid constructs fits into some knowledge ontology. At present humans can do this but machines find it difficult. This is an ongoing field of research; see for example the semantic web. Thinking about the level above that, ie a device to process ontologies according to some higher level specification, makes my head hurt.
  • Dependency generator (ocamldep). For reasons of modularity it is usually a good idea in a large program to split separate concepts into separate files and compile these separately. This means that only a subset of the files needs to be recompiled when a file is modified. The dependency generator works out which files depend on which other files and writes this out as dependency rules that can be included in a Makefile, so that make knows which files need to be recompiled, and in which order.
  • Browser (ocamlbrowser). This lets you browse through module interface definitions to see how different modules fit together.
  • Documentation generator (ocamldoc), which works by processing specially marked comments.
  • Debugger (ocamldebug, for bytecode programs). If a program doesn't work it is sometimes not obvious what caused it to fail. One of the advantages of Ocaml is its strong static type system, so many errors are caught at compile time. If they are not, it is possible to probe the code manually by wrapping suspect sections in exception handlers, or even just adding print statements to see which sections of the code are reached. The debugger allows more controlled exploration of the program by setting breakpoints in the code. The debugger executes the program until it reaches a breakpoint, then stops execution and waits for user input in an interactive shell. The user can check and modify the program environment at this point and see if there are any unexpected values. The user can also get the debugger to single-step through the program, keeping an eye on which variables change at each step.
  • Profiler. This keeps track of where the program spends most of its effort, ie which functions or variables are accessed most often, or which function takes most of the time. The point is to identify bottlenecks in the system and rewrite the offending functions (while still making sure they behave correctly!).
  • Foreign function interface. This lets a program written in one language use functions defined in another language, typically available in a library. Ocaml has a number of advantages over lower level languages such as C or C++, but many existing libraries are written in those languages, covering areas such as hardware access, numerical calculations and graphical user interfaces (three areas for which Ocaml does not have good standard libraries). The foreign function interface lets the files in each language be compiled separately and then linked together in the final stage of program compilation. This means that each component part of the overall program can be written in the language most suitable for it.
  • Unit test framework to write tests for functions and objects.
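
To make the lexer and parser bullet more concrete, here is a minimal sketch in Ocaml of both stages for a tiny arithmetic language. It is hand-written rather than generated by ocamllex and ocamlyacc, and the token type, the grammar and the function names (tokenize, parse, eval) are my own assumptions for illustration. The lexer turns characters into tokens, the recursive-descent parser turns tokens into a syntax tree (rejecting invalid input), and eval stands in for the step of translating that tree into something the computer acts on.

```ocaml
(* Illustrative hand-written lexer and parser for integer arithmetic
   such as "2 * (3 + 4) - 5". Not ocamllex/ocamlyacc output. *)

type token = INT of int | PLUS | MINUS | TIMES | LPAREN | RPAREN

(* Lexer: turn a string of characters into a list of tokens. *)
let tokenize (s : string) : token list =
  let n = String.length s in
  let rec go i acc =
    if i >= n then List.rev acc
    else
      match s.[i] with
      | ' ' | '\t' | '\n' -> go (i + 1) acc
      | '+' -> go (i + 1) (PLUS :: acc)
      | '-' -> go (i + 1) (MINUS :: acc)
      | '*' -> go (i + 1) (TIMES :: acc)
      | '(' -> go (i + 1) (LPAREN :: acc)
      | ')' -> go (i + 1) (RPAREN :: acc)
      | '0' .. '9' ->
          (* Consume a run of digits as a single integer token. *)
          let j = ref i in
          while !j < n && s.[!j] >= '0' && s.[!j] <= '9' do incr j done;
          go !j (INT (int_of_string (String.sub s i (!j - i))) :: acc)
      | c -> failwith (Printf.sprintf "unexpected character %c" c)
  in
  go 0 []

(* Abstract syntax tree built by the parser. *)
type expr =
  | Num of int
  | Add of expr * expr
  | Sub of expr * expr
  | Mul of expr * expr

(* Parser for the grammar
     expr   ::= term (('+' | '-') term)*
     term   ::= factor ('*' factor)*
     factor ::= INT | '(' expr ')'
   Each function returns the parsed tree and the remaining tokens. *)
let rec parse_expr toks =
  let lhs, rest = parse_term toks in
  let rec loop lhs = function
    | PLUS :: rest ->
        let rhs, rest = parse_term rest in
        loop (Add (lhs, rhs)) rest
    | MINUS :: rest ->
        let rhs, rest = parse_term rest in
        loop (Sub (lhs, rhs)) rest
    | rest -> (lhs, rest)
  in
  loop lhs rest

and parse_term toks =
  let lhs, rest = parse_factor toks in
  let rec loop lhs = function
    | TIMES :: rest ->
        let rhs, rest = parse_factor rest in
        loop (Mul (lhs, rhs)) rest
    | rest -> (lhs, rest)
  in
  loop lhs rest

and parse_factor = function
  | INT n :: rest -> (Num n, rest)
  | LPAREN :: rest -> (
      match parse_expr rest with
      | e, RPAREN :: rest' -> (e, rest')
      | _ -> failwith "expected )")
  | _ -> failwith "syntax error"

(* Parse a whole string, rejecting any leftover tokens. *)
let parse s =
  match parse_expr (tokenize s) with
  | e, [] -> e
  | _ -> failwith "unexpected trailing input"

(* Evaluate the tree. *)
let rec eval = function
  | Num n -> n
  | Add (a, b) -> eval a + eval b
  | Sub (a, b) -> eval a - eval b
  | Mul (a, b) -> eval a * eval b

let () = Printf.printf "%d\n" (eval (parse "2 * (3 + 4) - 5"))
```

Running it prints 9. With ocamlyacc the grammar would live in a separate specification file instead of being encoded by hand as mutually recursive functions, which starts to pay off once a grammar grows beyond a handful of rules.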
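
The dependency generator only reports which file depends on which; working out an order in which to compile them amounts to a topological sort of that dependency graph, which make effectively does for you. A minimal sketch of that ordering step, assuming a made-up project of four files (the file names and the compile_order function are hypothetical, and a real Ocaml build also tracks .mli interfaces and compiled object files separately, which this ignores):

```ocaml
(* Hypothetical dependency information of the kind ocamldep extracts:
   each file paired with the files it depends on. *)
let deps = [
  ("main.ml", ["parser.ml"; "lexer.ml"]);
  ("parser.ml", ["ast.ml"]);
  ("lexer.ml", ["parser.ml"]);
  ("ast.ml", []);
]

(* Depth-first topological sort: a file is added to the order only after
   every file it depends on. Raises Failure on a dependency cycle. *)
let compile_order (graph : (string * string list) list) : string list =
  let visited = Hashtbl.create 16 in      (* files already placed *)
  let in_progress = Hashtbl.create 16 in  (* files on the current path *)
  let order = ref [] in
  let rec visit file =
    if Hashtbl.mem in_progress file then
      failwith ("dependency cycle through " ^ file)
    else if not (Hashtbl.mem visited file) then begin
      Hashtbl.add in_progress file ();
      let ds = try List.assoc file graph with Not_found -> [] in
      List.iter visit ds;
      Hashtbl.remove in_progress file;
      Hashtbl.add visited file ();
      order := file :: !order
    end
  in
  List.iter (fun (file, _) -> visit file) graph;
  List.rev !order

let () = List.iter print_endline (compile_order deps)
```

This prints ast.ml, parser.ml, lexer.ml and main.ml, one per line, so each file appears only after everything it depends on. A cycle between two files raises Failure, which fits Ocaml's own rule that compilation units may not depend on each other circularly.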

In addition to the above tools there are tools that can be used with a number of different languages:

  • Editor. This allows source code files to be edited, and usually provides several useful features to prevent mistakes:
    • Syntax highlighting of keywords, variables, strings and comments, and matching of parentheses.
    • Code template generation, eg to insert skeleton "if...then...else" statements.
  • Contract checkers and other formal verification tools. The programmer writes a formal description of what the program should do in a specification language such as Z or CSP. The contract checker then acts as another step in the compilation process, stopping compilation if a contract is not fulfilled.
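
Ocaml itself has no built-in contract checker, but a much weaker version of the idea can be sketched with assert: state a precondition and a postcondition inside the function and have them checked at run time. This is a swapped-in technique, not formal verification in the Z or CSP sense: a violation is only caught when the offending code actually runs, and the checks are removed entirely when compiling with the -noassert flag. The isqrt function is a hypothetical example.

```ocaml
(* Runtime approximation of a contract using assert.
   Hypothetical example: integer square root.
   Contract: requires n >= 0;
             ensures r * r <= n < (r + 1) * (r + 1). *)
let isqrt n =
  assert (n >= 0);  (* precondition *)
  (* Simple linear search: fine for illustration, slow for large n. *)
  let r = ref 0 in
  while (!r + 1) * (!r + 1) <= n do incr r done;
  let r = !r in
  assert (r * r <= n && n < (r + 1) * (r + 1));  (* postcondition *)
  r

let () = Printf.printf "isqrt 10 = %d\n" (isqrt 10)
```

Calling isqrt (-1) raises Assert_failure instead of returning a nonsense answer. A real contract checker would aim to prove that the postcondition holds for every valid n, rather than testing it on each call.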

... and some tools that can be used to enable multiple programmers to work on a program at once:

  • Version control system. Think of this as a very powerful "Undo" function that can deal with multiple file versions. Additionally, multiple programmers can work on a single program, and merge changes.
  • Bug tracker. Keeps a database of program errors so that users can report errors and programmers can see where the errors occurred.
  • Mailing list software to help programmers communicate with each other.
  • Community websites (eg Drupal on MySQL, Apache and PHP).
  • The internet.
