More playing with blogging software
Tue Nov 28 20:33:30 EST 2006
My homebrew blogging software starts with a plaintext file and uses a lexer and parser to split it into a list of entries. It then iterates over the list, printing each entry to a category page (e.g. code or ocaml), the main page and a permalink page.
As it stands, one problem is that every permalink page gets regenerated each time the blogging program is run. This is usually unnecessary, since previous blog entries rarely change. It is all the more wasteful because the permalink pages hold the full entries, not just the first 1000 characters.
Solution: store in a file a record of which entries have been processed, and ideally be able to recognize whether an entry has changed. Do this by creating a hashtable that maps each permalink string (YYYYMMDD_HHMMSS) to the MD5 digest of the actual output string. The relevant bit of the main program is shown below:
read_permhash "permhash.txt";
List.iter
  (fun entry ->
    let tlst = Entry.tags entry in
    let permlink = make_permlink entry in
    let perm_outstr = (outstr "full" entry) in
    let perm_md5 =
      Digest.to_hex (Digest.string perm_outstr) in
    let old_md5 =
      try Hashtbl.find ht permlink
      with Not_found -> "" in
    let outlist =
      List.fold_left
        (fun acc x -> (tag_to_chan x) :: acc)
        [] tlst in
    begin
      List.iter (writeout (outstr "short" entry)) outlist;
      (if perm_md5 <> old_md5 then
        let permchan = permoutchan permlink in
        begin
          writeout perm_outstr permchan;
          Hashtbl.replace ht permlink perm_md5;
          close_out permchan
        end
      else ());
    end)
  (List.rev entries);
write_permhash "permhash.txt";
The read_permhash and write_permhash functions are simple: the first opens the file "permhash.txt" and adds the key-value pairs to an empty hashtable, while the second iterates over the hashtable at the end to write out the new hashes.
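For illustration, here is a minimal sketch of what those two functions might look like. The on-disk format (one "permalink md5hex" pair per line, separated by a space) and the definition of the global table ht are assumptions made for this example, not necessarily what the real program uses.

```ocaml
(* Assumed global table: permalink string -> hex MD5 of the full output. *)
let ht : (string, string) Hashtbl.t = Hashtbl.create 64

(* Load saved digests into ht. Assumed format: "permalink md5hex" per line.
   A missing file (e.g. on the first run) simply leaves ht empty. *)
let read_permhash fname =
  if Sys.file_exists fname then begin
    let ic = open_in fname in
    (try
       while true do
         let line = input_line ic in
         match String.index_opt line ' ' with
         | Some i ->
             let key = String.sub line 0 i in
             let md5 = String.sub line (i + 1) (String.length line - i - 1) in
             Hashtbl.replace ht key md5
         | None -> () (* skip lines without a separator *)
       done
     with End_of_file -> ());
    close_in ic
  end

(* Overwrite the file with the current contents of ht. *)
let write_permhash fname =
  let oc = open_out fname in
  Hashtbl.iter (fun key md5 -> Printf.fprintf oc "%s %s\n" key md5) ht;
  close_out oc
```

Because a permalink string of the form YYYYMMDD_HHMMSS contains no spaces, splitting each line on the first space is enough to recover the key and the digest.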
This works for the permalink files. It might also be worth doing for the category files, so that only the new entries are added to the front of each file. However, this would be a bit tricky if an entry in the middle of a category is updated, and so it may not be worth it.