The CKCC project has been made possible by grants from NWO (the Netherlands Organisation for Scientific Research) and CLARIN-NL, and with the support of CLARIN-EU.
The grant of 495,000 euros from the Netherlands Organisation for Scientific Research (NWO) will be used to build web-based tools to analyze and visualize 17th-century intellectual networks and their themes of interest, and to enrich the corpus with annotations. The CKCC cooperation partners will bear the costs of digitizing the letters themselves; the NWO grant will not be used for that purpose.
CLARIN-EU and CLARIN-NL
With advice from CLARIN-EU and financing from CLARIN-NL, a demonstrator that implements topic modeling techniques was developed in the first stage of the project. The demonstrator relies on tokenization, language recognition, stemming and LDA (Latent Dirichlet Allocation, in the MALLET implementation). This stage was followed by an evaluation of further topic modeling methods, namely Latent Semantic Analysis and Random Indexing. These methods were combined with Natural Language Processing techniques for spelling normalization, language identification and named entity recognition. Results are presented in a faceted search interface, which allows the researcher to create selections of letters filtered by topic, correspondent, named persons, location and language. In addition to the list of selected letters, the results are shown on a geographical map and on a timeline. The personal network of a correspondent and the people named in letters can also be visualized. Furthermore, the system suggests alternative searches and similar letters.
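The faceted selection described above can be illustrated with a minimal Python sketch. It is not the CKCC implementation: the `Letter` record, its field names, the `select` and `facet_counts` helpers and the three toy records are all hypothetical, invented here only to show how filtering on several facets at once and counting the remaining facet values might work.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Letter:
    # Hypothetical record layout; the actual CKCC data model is not described here.
    letter_id: str
    correspondent: str
    language: str
    location: str
    topics: frozenset
    named_persons: frozenset


def select(letters, **facets):
    """Return the letters that match every requested facet value."""
    selected = []
    for letter in letters:
        matches = True
        for name, wanted in facets.items():
            value = getattr(letter, name)
            if isinstance(value, frozenset):
                # Multi-valued facet (topics, named persons): membership test.
                matches = wanted in value
            else:
                # Single-valued facet (correspondent, language, location).
                matches = value == wanted
            if not matches:
                break
        if matches:
            selected.append(letter)
    return selected


def facet_counts(letters, facet):
    """Count how often each value of one facet occurs in a selection."""
    counts = Counter()
    for letter in letters:
        value = getattr(letter, facet)
        if isinstance(value, frozenset):
            counts.update(value)
        else:
            counts[value] += 1
    return counts


# Invented toy records, not taken from the corpus.
LETTERS = [
    Letter("L001", "Correspondent A", "la", "Leiden",
           frozenset({"optics"}), frozenset({"Person X"})),
    Letter("L002", "Correspondent A", "fr", "Paris",
           frozenset({"optics", "astronomy"}), frozenset()),
    Letter("L003", "Correspondent B", "la", "Leiden",
           frozenset({"astronomy"}), frozenset({"Person X", "Person Y"})),
]

latin_from_leiden = select(LETTERS, language="la", location="Leiden")
print([letter.letter_id for letter in latin_from_leiden])
print(facet_counts(latin_from_leiden, "topics"))
```

Counting facet values over the current selection is what would let an interface of this kind show how many letters remain for each further refinement, for example per topic or per named person.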