Spelling Normalization

Seventeenth-century writing exhibits a large degree of spelling variation. This degrades the performance of analysis techniques such as topic modeling and keyword analysis, because variant spellings of the same word are counted as different words. The CKCC project decided to use the spelling normalization application VARD 2, developed by Alistair Baron in Paul Rayson's group at Lancaster University, UK. The basic philosophy of VARD 2 is to normalize text to modern spelling, so that existing linguistic tools can be used unmodified.

VARD was developed and trained to deal with spelling variation in Early Modern English, but its open structure allows it to handle spelling variation in other languages as well. In the CKCC project, VARD 2 is trained for Dutch. Spelling variation in French and Latin is handled with a rule-based approach. If time permits, we will also use VARD for French.
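
To give an idea of what a rule-based approach can look like, here is a minimal sketch in Python. The rules are hypothetical examples of common Latin spelling conventions (long s, ligatures, j for i); they are not the rule set used in the CKCC project, and the names LATIN_RULES and normalize are chosen for this illustration only.

```python
import re

# Hypothetical rules for illustration only; not the CKCC rule set.
# Each rule is a (pattern, replacement) pair, applied in order.
LATIN_RULES = [
    (re.compile("ſ"), "s"),   # long s -> s
    (re.compile("æ"), "ae"),  # ae ligature -> ae
    (re.compile("œ"), "oe"),  # oe ligature -> oe
    (re.compile("j"), "i"),   # lowercase j -> i
]


def normalize(text, rules=LATIN_RULES):
    """Apply every rewrite rule to the text, one after the other."""
    for pattern, replacement in rules:
        text = pattern.sub(replacement, text)
    return text


print(normalize("ejus ſcientiæ"))
```

This sketch only covers lowercase forms and rewrites every match regardless of context. A realistic rule set would probably need context-sensitive patterns and a list of exceptions.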


User Interface

Spelling normalization is used only as a preprocessing step for the analysis techniques. The user interface shows the original, unnormalized letter texts.
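
One way to picture this separation is to store both versions of each letter side by side. The following Python sketch assumes a hypothetical Letter record and a stand-in normalizer function; it does not describe the actual CKCC data model.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Letter:
    """Hypothetical record holding both versions of one letter text."""
    original: str    # shown in the user interface
    normalized: str  # passed to the analysis techniques only


def prepare(text: str, normalizer: Callable[[str], str]) -> Letter:
    """Keep the original text untouched and store the normalized copy next to it."""
    return Letter(original=text, normalized=normalizer(text))


def stand_in_normalizer(text: str) -> str:
    """Placeholder for a real normalizer: it only replaces the long s."""
    return text.replace("ſ", "s")


letter = prepare("ſcientia", stand_in_normalizer)
print(letter.original)    # text for display
print(letter.normalized)  # text for analysis
```

Because the original text is never overwritten, the normalizer can be retrained or replaced without affecting what readers see.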