Blog - code

[Me]

Everything is Lisp if you use Emacs

Sun Dec 10 08:42:07 EST 2006

A simple editor or word processor lets you type in characters on the keyboard and save them to a file so that you can access the file later. A more advanced editor allows formatting markup of the text, eg headings, lists, paragraphs, justification of text. An extremely advanced editor has an embedded scripting language to allow the user to automate common tasks. The type of tasks that can be automated depends on how powerful the scripting language is, and what interfaces there are between the editor and the outside world.

Emacs is an extremely advanced editor. The scripting language is a dialect of common Lisp, so lack of power is not a concern. The editor can be tied into external programs such as news readers, mail agents, web browsers, compilers, contract checker, ... with the scripting language. Programmers love this because it allows automation of common compilation tasks. (...)

Tools

Sat Dec 9 14:31:58 EST 2006

Writing a computer program at the most basic level simply involves taking a thought, translating that thought into a programming language and then running the final program on some machine. However, the downside to programming is that it requires a rigid structure in order to allow the computer to interpret the programming language correctly. This is not completely a downside as it encourages logical thought about a problem, but there are many elements of a program that consist of "boilerplate" code that is reused from problem to problem with no thought involved - a boring problem rather than an intersting one.

Programmers like interesting problems. Turning a boring problem into an interesting problem means finding a general solution to the boring problem so the programmer does not have to think about the boring problem ever again. This means building tools.

The Ocaml distribution has a good selection of tools, which are similar to those available to other developed languages. (...)

Programming as an expressive medium

Sat Dec 9 14:10:58 EST 2006

Programming is a way of translating thought into form. It is as expressive a medium as a novel or poem. In the poetical form of communication the writer is relying on the depth and breadth of the reader's experience to convey the poets message. In the programming form of communication the programmer relies on the capabilities of a computational engine, including its interface with the user. This user interface can be optimized for a given purpose, minimising the impedance mismatch between the programmer and his audience.

Here is the idea: any program is a message from the programmer saying one or more of:

This is a problem that I find interesting, and here is a method to investigate or solve that problem.
This problem is boring, here is a method to solve this problem automatically so I can think about more interesting things
I don't care much one way or the other about this problem, but writing this program pays the bills and lets me do more interesting things. (...)

Cryptography

Sun Dec 3 12:23:24 EST 2006

A stream cipher can be thought of as a machine that has internal state and generates a stream of random values as an input stream is fed into the machine, with a stream of random values coming out of the output when the input stream and generated stream are combined. The advantage of this viewpoint is that a general control function can be constructed that can take a number of different encryption machines as arguments eg a Caeser shift cipher, Vignere cipher or modern stream cipher such as RC4. (...)

Labview dataflow programming in Erlang

Tue Nov 28 21:09:38 EST 2006

Labview is a commonly used language for setting up experiments. It has a number of nice features:

programs are written in a graphical programming language, ie by moving icons around a diagram to create a schematic representation of the experiment
program execution is data-flow oriented, ie each icon has a number of inputs and outputs, and only outputs data once all of its inputs have received data. This seems a natural way of modelling signal processing tasks.

Erlang is a language where concurrency is the main consideration. The Erlang virtual machine can create independent lightweight processes at very little cost, with the processes communicating via message passing to unique process identifiers (PIDs).
Here I am going to consider the problem of programming Labview-like data-flow oriented programming in Erlang. (...)

[code] [erlang]

More playing with blogging software

Tue Nov 28 20:33:30 EST 2006

My homebrew blogging software starts with a plaintext file and uses a lexer and parser to split this into a list of entries. It then iterates over the list, printing each entry to a category page (eg code or ocaml), the main page and a permalink page.

One problem with this as it stands is that each permalink gets regenerated each time the blogging program is run. This is usually unnecessary since previous blog entries rarely change. It is even more annoying because the permalink entries are the full versions, not restricted to the first 1000 characters of the entry.

Solution: store in a file a record of which entries have been processed, and ideally be able to recognize whether an entry has changed. Do this by creating a hashtable of permalink string (YYYYMMDD_HHMMSS) and MD5 digest of the actual output string. The relevant bit of the main program is shown below:

read_permhash "permhash.txt";
List.iter 
  (fun entry -> 
    let tlst = Entry. (...)

Tries, dictionaries and directories

Sat Nov 25 09:02:43 EST 2006

A trie, or prefix tree, is a tree structure that maps keys to values stored in the tree. The tree itself consists of a set of labelled edges and nodes containing the values. The key corresponding to the value at a particular node is the concatenation of the path labels leading from the root of the tree to the node.

A standard application is to use a trie to represent a dictionary (see the exercises at the end of chapter 2 of the ORA Ocaml book). Here the edges are labelled by the characters making up a word, and the associated value is a boolean stating whether the node is the end of a word.

A trie can also be used to represent the layout of a website consisting of a root index page followed by a hierarchy of subpages (ie the layout of my website). Here the node values are wrapper functions that wrap content depending on its position in the tree, for example producing a navigation menu that depends on the position of the element in the tree. (...)

[code] [ideas] [ocaml]

Wrapping content in containers

Sun Nov 19 11:24:44 EST 2006

Everyone is pretty familiar with the concept of wrapping content in tags describing how to display that content by now. A basic example is wrapping section heading in a <h1>Section name</h1> environment in HTML.

What I want to do is wrap an entire output format around my content. Consider my website as the example. As I write this my website consists of a hierarchical collection of linked pages. The pages at the top level of the hierarchy ("Home", "Me", "Links", etc) are used to construct a top level navigation bar that is placed on the side of each page. The lower level pages have a smaller navigation bar at the top of the page that links to children of that page and the parent node (as long as the parent is not one of the top level pages).

At the moment my website generation program is quite low level in that it treats the pages as strings rather than abstract datatypes for example (this is due to my lack of understanding of the module system when I wrote the program). (...)

Content

Sat Nov 18 14:02:33 EST 2006

The important thing about communication is content not style - true or false? In my opinion, both. Communication is the act of constructing a mental model of the world and conveying this model to another person to internalise and integrate into their mental model. This is accomplished through use of some communications channel, such as plain text, speech, demonstration or interactive dialog. Which method is most effective at conveying your meaning to the receiver depends on the receiver, not the transmitter. Using a physical analogy, there needs to be the correct impedence matching for maximum transmission of the signal.

Adding to the problem is the fact that people may be unreceptive to the idea that you are trying to convey. The entire advertising industry is based around the idea of convincing people to buy products by making them want to buy the product ie by altering their mental model to one that includes the product as a necessary requirement for happiness. (...)

A calculator for uncertain quantities using Ocaml

Sun Nov 12 10:36:43 EST 2006

Error propagation is an important tool for measuring the uncertainty of a final result based on the uncertainties of the underlying measurements. It is also the bane of many first year science students' lives. A calculator that worked with uncertain quantities and did the error propagation automatically would be a useful tool, eg 22.0,5.0% + 10.0,1.0 to add 22 with an uncertainty of 5% to 10 with an uncertainty of 1.

Here I show how to accomplish this using Ocaml. Ocaml is a good choice for this task because the program can be broken down easily into its independent sub-tasks using the language facilities:

use the module system to define an UncertainNumber abstract datatype, with addition, subtraction, multiplication and division operations defined for this datatype.
use ocamllex and ocamlyacc to construct a calculator that deals with this datatype. (...)

The Game of Life, Metacircular interpreters and Hashlife

Sun Nov 12 09:33:31 EST 2006

J.H. Conway describe the "Life" cellular automata in 1970. This cellular automata consists of cells that may be alive or dead, and a transition rule that says a cell stays the in the same state if it has two neighbours, becomes alive if it has three neighbours and dies otherwise. The cells are arranged in a regular array of square cells, with the neighbourhood of a given cell being the surrounding eight cells.

This simple setup is Turing complete
- see "The Recursive Universe" by William Poundstone or "Winning Ways (For Your Mathematical Plays)" by J.H. Conway.
- This means that it is possible to construct a Life pattern that acts as an interpreted ie it takes other programs translated (compiled) to a Life pattern and then evaluates the program described by this input pattern. In particular, the pattern of the evaluator itself can be run by the evaluator - this is the key of a language being Turing complete. (...)

Nouns and Verbs

Sun Nov 5 20:04:26 EST 2006

Recently I have been reading "The Structure and Interpretation of Computer Programs" by Abelson and Sussman, (or "the wizard book" as it is also known). This describes the fundamentals of computer programming, using the Lisp dialect Scheme as the example language due to its ability to treat functions as data and vice versa. The ability to treat functions and data on the same footing leads to a whole new way of programming, since programs can be designed that work by functional composition rather than operations on data structures.

Why is this fundamental? It boils down to how we as humans use language. We use language to describe the world around us to ourselves and other people. While it is a fallacy that tribes having no word for the colour blue are unable to see that colour (for example) there is a grain of truth - the ability to translate perception into different sensory channels reinforces events and brings different processing mechanisms into play. (...)

Ocaml Functors

Sun Nov 5 15:42:46 EST 2006

Following on from the previous entry, now consider implementations of a signature that are themselves functions of a specific module type - these are called Functors. In the previous example the Interval datatype was defined for some endpoint elements, eg ints or floats. In Ocaml ints and floats have different addition operators (+ versus +. respectively) so the structure definitions for each interval required a lot of duplication. It would be better to define an Elt (element) datatype with addition, subtraction and comparison operators defined for each type, and then pass this module as a parameter for the functor implementing the interval datatype. This is shown below, where I have introduced the convention that the abstract datatype is called t, while the element from which it is constructed is called elt. (...)

Ocaml modules

Sat Nov 4 19:59:37 EST 2006

The ability to construct abstract data types is vital for producing large programs since it allows implementation details to be hidden from other parts of the program. For example in mathematical programming languages the underlying representation of complex numbers is not relevant to the scientist who must construct formulae using the complex numbers.

Ocaml has a module system that allows the programmer to define an interface specification separately to the implementation of that interface. Take for an example a module that describes intervals. An interval is defined by the end points of the interval (including the end points - this is a design decision here not a fundametal property of intervals). (...)

Content management systems

Sun Oct 15 19:39:42 EST 2006

It looks like it may be time to bid a fond farewell to my homebrew blogging software, at least in its current incarnation. It was a good programming exercise to write a program that read a plain text file and constructed a set of interlinked webpages with the dated entries separated into the appropriate categories. This turned out to be fairly simple in the functional programming style, and I'm tempted to go a bit further in introducing a higher degree of data abstraction (now that I have a better idea of how the Ocaml module system works).

But recently I have been playing around with the Drupal content management system (somehow volunteering to update the group website became constructing the group interactive online community. This mission creep tends to happen when I get a dull computery job to do :) ). The nice thing about this system compared to other content management systems is its extensibility. (...)

[code] [ideas] [me]

Encrypted blog entries again

Sat Oct 14 09:06:15 EST 2006

One of my friends read the blog entry from a couple of weeks ago about encryption/decryption in client side javascript, and pointed out that There already exist libraries for encryption between the client and the server eg using public key cryptography or similar.
This is of course true, and proabably what you would use in a secure web application where bi-directional communication is allowed.

His question clarified the idea in my mind further. The crux of the idea is unidirectional broadcast from a server to a set of clients. The messages broadcast contain plain text data, encrypted data, and code for the encryption machines. Keys to make the machines decrypt encrypted content are distributed through other channels, for example buying a decryption key in tamper-proof hardware token form from a vendor.

Is this starting to sound familiar?

Looks like I've just reinvented encyption for digital or satellite television. (...)

Encrypted blog entries

Tue Oct 3 21:48:44 EST 2006

Most social networking sites have a mechanism whereby only the friends of a person can see their blog entries. The standard way to do this is to have a back end database that keeps track of the users logged in to the system and allows them access to the entries only if they are in the list of friends of a given user. This is a server side approach that has a couple of disadvantages:

increased server load to keep track of the state of the system
lack of security against eavesdroppers as it can be seen when access to a restricted page is attempted

The second point deserves further discussion. Consider a high traffic website that contains some entries that are restricted to those people meeting a certain criteria eg a news site that restricts access to full articles to subscribers, a blog with personal entries restricted to close friends, www.al-queda.ter restricting details of plans to the true believers, etc. (...)

Looping without loops

Sun Sep 17 18:09:04 EST 2006

A standard exercise when first learning to program is to draw a triangle of stars using for loops, i.e.

*
**
***
****

How do you do this in a language that doesn't allow variable assignment? By (tail) recursion with an extra accumulator argument and loop index. Below is the Ocaml code for drawing a number of different triangles. Note the use of the higher order function string_gen.

let make_line c n = String.make n c

let make_tri c n = 
  let rec aux k acc = 
    if k=0 then acc else aux (k-1) ((make_line c k) :: acc)
  in aux n []

let string_gen shift_fun ind_update lst = 
  snd (List.fold_left 
    (fun (ind,acc) x -> (ind_update ind,
                         acc ^ (shift_fun ind) ^ x ^ "\n"))
    (0,"") lst)

let odd_els lst = 
  List.rev (snd (List. (...)

Planar trees

Sun Sep 17 17:42:01 EST 2006

A planar tree consists of a list of nodes, where each node is a pair containing the element present at that node and a list of child nodes - or in other words the subtree that is the child of that node. This data structure is very important, being able to model Lisp S-expressions, XML, and other hierarchical data structures.

As a concrete example consider a bullet point list with nested levels:

Point 1
- Point 1a
- Point 1b
- Point 1b1
Point 2
Point 3
- Point 3a

We want to represent this is a tree of the form

[Node(1, [Node(1a,[]); Node(1b,[Node(1b1,[])])]);
 Node(2,[]); Node(3,[Node(3a,[])])]

where "[a;b;c]" is the Ocaml notation for constructing a (linked) list. This can be translated trivially into other functional programming languages.

The key insight that I had today was how to represent the tree as a list of pairs where each pair contains (level in hierarchy, element). (...)

Commercial Users of Functional Programming

Thu Aug 17 10:21:42 BST 2006

The [Commercial Users of Functional Programming (CUFP)] conference is about functional programming as a means rather than an end. This ties in very well with the physicist mindset of using the best tools available for the job at hand, to minimize effort. The conference website has abstracts of the current conference and reports from the preceeding conferences.

Ideas for Erlang server software

Thu Aug 17 08:42:35 BST 2006

From the Erlang mailing list, D�rio Abdulrehman wrote:

Erlang is well suited for writing scalable server software and I am currently looking for interesting server software projects that can be done.

Ideas proposed in that message and the following thread:

VOD Video On Demand http://vodka.lfcia.org/
Instant Messaging (ejabberd)
Joel Reymont is working on trading software
[A cometd implementation:]
Radiosity based tracer.
An AJAX/Comet-based online chess game.
web based chat with yaws, dojo 0.3, suffering, and endurance
an RTP implementation in Erlang

[code] [erlang]

Why Should I Care What Color the Bikeshed Is?

Wed Aug 16 09:55:10 BST 2006

[Why Should I Care What Color the Bikeshed Is?] is a classic post describing the bikeshed principle to Open Source developers. The basic concept comes from the book "Parkinson's Law", the point being that people will spend a disproportionate amount of time dealing with small problems if those problems are within their skill set.

[code] [physics]

Labview in Erlang

Sun Aug 13 21:44:30 BST 2006

Introduction

Labview is the common programming language used for data acquisition in the experimental sciences (although Matlab is also very popular). The main selling point of Labview is the ability to write programs in a graphical programming language where functions are "virtual instruments" with defined inputs and outputs joined together by wires. This allows for "data flow" based programming where the output of a virtual instrument is calculated as soon as all of the inputs are available.

Initial thoughts

In Erlang the input and output messages can be represented by messages, and the virtual instruments by processes. The tricky thing is to represent the data flow of the input messages so that the process calculates the outputs only when all of the input messages have arraived. Take a simple example of a process that requires three data inputs to calculate its output. (...)

[code] [erlang] [ideas]

Blogging from the web

Sun Aug 13 12:18:42 BST 2006

Introduction

Now that I have my blogging software working to process a single text file into a set of related blog webpages it is time to think of extending it to make it more useful.

Some possibilities:

Create a webpage so that new entries can be added to the blog, rather than the existing method of writing to a text file and updating the blog by running my program on the text file. Use a .htaccess file to restrict access to the "Create blog entry" webpage.
Make it possible to update the blog by sending email messages to a certain address on my domain (or even just my email account with a special token to trigger a procmail rule). Restrict access by a password token in the email, encrypted email, one-time passwords, etc.

Both of the above ideas boil down to writing a program to add an individual entry to the blog. (...)

Algorithms for the ages

Wed Aug 9 16:58:49 BST 2006

From Random Samples, Science 287 p799, February 4, 2000:

[1946: Metropolis Algorithm]
[1947: Simplex method for linear programming]
[1950: Krylov Subspace Iteration method]
1951: Decompositional approach to matrix computations such as the Cholesky decomposition, the pivoted LU decomposition, the QR decomposition, the spectral decomposition, the Schur decomposition, and the singular value decomposition.
1957: Fortran Optimizing Compiler
[1959: QR Algorithm for computing eigenvalues]
[1962: Quicksort algorithms for sorting]
[1965: Fast Fourier Transform]
[1977: Integer Relation Detection

Learning Lisp (1)

Wed Aug 2 21:01:16 BST 2006

Learning Lisp has been on my to-do list ever since I read "The Art of Unix Programming" by ESR. Probably even before that, when I realized that most of the power of Mathematica comes from the functional programming aspects. The ability to define higher order functions that act on complex objects rather than just different types of data is empowering. Once I became aware of functional programming I started with Ocaml, a dialect of ML (Meta Language - originally used to write proof proving systems). Ocaml is great because it is strongly typed and the language itself checks that the program obeys the typing. This means that many errors are detected at compile time rather than run time, speeding up the debugging process (see the SPARK programming language for the logical extension of this process to a language that is very robust due to the enforcement of information flow constraints at compile time, but this is another entry). (...)

Lexing and Parsing again

Wed Jul 26 18:11:18 BST 2006

[Ocaml Parser Tutorial Setup]

This tutorial describes how to construct a parser in Ocaml. The main thing that I found useful was the idea of compiling a separate .ml file that defines types that the parser deals with - rather than a parser token being a simple int or string it can be a more complex data type. So for my blog software define a "key" of type (string * string * string * string) to contain (year, month, day, full date string), a "tagstr" type of string list to contain the list of category tags, etc. This lets me use the lexer to do more of the string processing, rather than doing further string processing in the main blogmain.ml program. This is turn makes the program as a whole more modular, since the blogmain.ml program can be written to treat an incoming list of verified tokens from the parser and combine them with some glue functions into the final output. (...)

Program types

Sun Jul 16 11:38:03 BST 2006

Eric Raymond's book "The Art of Unix Programming" is a great explanation of the Unix programming philosophy (and Zen), as well as a guide for writing good programs. In Chapter 11 he discusses a set of common Unix interface Design Patterns, ie how the program interacts with the user. The key points about Unix are that everything is a file, and ideally all data is stored in plain text files. This means that data can be processed using a number of different programs, rather than the specific program that was originally used to produce the data.

The design patterns are:

Filter. The program receives data on standard input, does something to it and returns the result to standard output. The user can use shell redirection to read from or write to a file instead of stdin/stdout. Filters can be chained together.
Cantrip. Invoke the command and it does something, with no input or output required by the user. (...)

Making links tags work

Fri Jul 14 10:54:28 BST 2006

Quick modification to the last version of the software, to go through the list of tags for an entry and create a correspoding link to that blog category. This doesn't exist as yet since the entry does not currently get piped both to the main page and the tag category, I need to relearn how to open different output streams in Ocaml - from memory it was just open_out channel_name. Should be simple...

... time passes ...

In the end it took about an hour to do, longer than I had hoped. Still, seems pretty functional now. The code for the main ocaml program is below:

blogmain.ml

(* Convert character list to string *)
let explode str = 
  let rec aux n acc = match n with
    | 0 -> acc
    | _ -> aux (n-1) (str.[n-1] :: acc)
  in aux (String.length str) [];;

(* Convert character list to string *)
let implode clst =
  let str = String.create (List. (...)

Blogging using Ocaml

Thu Jul 13 09:58:41 BST 2006

Write an Ocamllex/Ocamlyacc program to parse text from my log files into a blog format. Get the lexer to recognise dates in the standard Unix date format, and take this to represent the start of a new blog entry. Immediately following the date can come an optional tags field, stating the categories that the entry should be filed under. This is terminated by two newlines and then the blog entry proper appears, until the next date field at the start of a line by itself.

As a first attempt just use a modified key/value parser, replacing the delimiter character with a date string, and adding entries to the list as tuples of (date string, tags string, entry string). This is fairly trivial to accomplish, the parser only needs to recognise five tokens: DATE, TAGS, ENTRY, EOL (end of line) and EOF (end of file).

Once the text file is parsed into a list of entry tuples the entries themselves can be filtered through Markdown to convert the formatted text into HTML. (...)

How to insert timestamps in VIM?

Tue Apr 18 09:04:46 BST 2006

  - r! date (or r! date/T for time only)
  - "=strftime("%c")<enter>p

code (31)

ideas (24)

lisp (1)

me (16)

qo (7)

unix (8)

vim (4)