A Dictionary package for Lua 4.0.1

  — Wim Couwenberg, July 15, 2003

1. Tables & dictionaries

Lua's central data structure is the table. A Lua table is a heterogenous unordered associative array disallowing duplicate keys. It is a versatile and extremely powerful structure, not in the least because of its highly optimized implementation.

The dict package for Lua 4 introduces a dictionary data type. A dictionary keeps an ordered collection of records that can be retrieved by key lookups. There are no restrictions on the actual form of either the records stored or the keys used for lookup.

2. The dict package

2.1 Creating dictionaries

A dictionary is created by dict.create([record order[, key record order]]). If the key record order is absent then record order is used in its place. This is useful in situations where records and keys are one and the same (e.g. a dictionary using plain strings as records.) If both record order and key record order are omitted then the default lua order operators are used.

The record order is used to decide where each new record should be stored. If ord is a record order then for each pair of records a, b the call ord(a, b) should return a number < 0 if a < b, 0 if a == b and a number > 0 if a > b. (Technically it would suffice if a record order was just a predicate "a < b" but this is not true for a key record order.)

The key record order is used to find records that match a given key. If ord is a key record order then for each key/record pair k, r the following rules apply. If key k matches record r then ord(k, r) must return 0. If key k can only match records smaller (greater) than r in the record order then it must return a number < 0 (> 0). (A single predicate "k < r" would not suffice.)

For convenience, dict.cmp can be used to define an order. It is defined by
function dict.cmp(a, b) return a < b and -1 or (b < a and 1 or 0) end

2.2 Inserting records

If d is a dictionary then you can insert a record r with d:insert(r). The record will always be inserted, even if it equals another record in the dictionary by record order. It is stored such that no equal record follows it in the dictionary (i.e. it comes last in a possible sequence of equal records.) The insert method returns an iterator that represents the location of the inserted record in the dictionary. Iterators must be treated as opaque structures. They are used for iterating over and manipulating the records in a dictionary.

2.3 Retrieving records

Records are located by key. If d is a dictionary and k is a key then d:find(k) locates the first record that matches the key. If no matching record is found then find returns nil. If a match is found then find returns a pair record, iterator. Here record is the first record that matches the key and iterator represents the location of this record in the dictionary.

The call d:lwb(k) returns the first record, iterator pair in the dictionary such that ord(k, record) <= 0 in the key record order, or nil if no such record exists. Similarly d:upb(k) returns the first record, iterator pair such that ord(k, record) < 0, or nil if no such record exists in d.

2.4 Removing records

Because a single key k can match any number of (contiguous) records in the database, a record can not be removed by specifying a key. Instead, an iterator is used that represents an exact location in the dictionary. Such an iterator must originate from an earlier call to a dictionary method. If it is an iterator on a dictionary d then d:erase(it) removes the record pointed at by it from the dictionary. The iterator it is no longer valid after the call to erase.

2.5 Number of records

The number of records in the dictionary d can be obtained as its property d.n. Every record is counted as one even if it is equal to another record in the dictionary with respect to the record order.

2.6 Iterating records

Positions in the dictionary are represented by iterators. Iterators are more specific than keys, because a key does not necessarily specify a unique record in the dictionary. The record pointed at by an iterator it can be obtained by the call dict.record(it).

If d is a dictionary then d:first() and d:last() return a record, iterator pair for the first respectively the last position in the dictionary, or nil if the dictionary is empty.

If it is an iterator on d then d:next(it) and d:prev(it) return the record, iterator pair that is directly in front of respectively directly following the given iterator, or nil if the iterator points at the last respectively first position in the dictionary. The iterator it is itself not affected by these calls. For convenience, d:next() and d:prev() are equivalent to d:first() and d:last() respectively.

2.7 Validity of iterators

An iterator can not be constructed but can only be obtained as a result of a dictionary method (find, lwb, upb, first, last, prev or next.) An iterator remains valid until the record that it points to is removed from the dictionary by an erase call. In particular, erase does not invalidate iterators that do not point to the removed record. Inserting a new record does not invalidate any iterators.

3 Implementation issues

The dictionary implementation of the dict package is based on AVL-trees. This means that all operations on a dictionary or an iterator are guaranteed O(log n) where n is the number of records in the dictionary. A full ordered iteration over all records in a dictionary is guaranteed O(n).

4 Some examples

Here are two examples that dump an ordered list of all words occuring in a file. Note that these examples may be instructive but they are certainly not very practical as the same results could have been easily obtained using plain lua tables.

require "dict"

local file, err = openfile(arg[1], "r")
if not file then
	print("can not open file " .. arg[1] .. ": " .. err)
	return
end

-- default order, keys equal records
local d = dict.create()

-- build up word list
local line = read(file, "*l")
while line do
	gsub(line, "(%a+)", function(word)
		%d:insert(word)
	end)
	line = read(file, "*l")
end

closefile(file)

-- dump words, 
-- recurring words are printed multiple times
print(d.n .. " words total")
local word, it = d:first()
while word do
	print(word)
	word, it = d:next(it) -- replace by d:upb(word) to print unique words
end
The following example builds up a word count dictionary and prints all frequencies for words in the range e - k.
require "dict"

local file, err = openfile(arg[1], "r")
if not file then
	print("can not open file " .. arg[1] .. ": " .. err)
	return
end

-- record and key record order, case insensitive
local ro = function(a, b)
	return dict.cmp(strlower(a.word), strlower(b.word))
end

local kro = function(k, r)
	return dict.cmp(strlower(k), strlower(r.word))
end

-- create a dictionary using these orders
local d = dict.create(ro, kro)

-- build up word frequency list
local line = read(file, "*l")
while line do
	gsub(line, "(%a+)", function(word)
		local rec = %d:find(word)
		if rec then
			rec.count = rec.count + 1
		else
			%d:insert {word = word, count = 1}
		end
	end)
	line = read(file, "*l")
end

closefile(file)

-- dump words in range e - k 
-- words are printed with their frequency
print(d.n .. " different words total, showing range e-k")
local rec, it = d:lwb "e"
local _, last = d:lwb "l"
while it ~= last do
	print(rec.word, rec.count)
	rec, it = d:next(it)
end