This application searches for terms in ALTO files, which are XML files that store the output of OCR. A term can consist of several words, and multiple terms can be searched at the same time. The output is in XML format and contains the coordinates of the words that were found as well as textual context around the hits. The search is case-insensitive: all text is converted to lowercase before comparison. In addition, any punctuation marks (or brackets etc.) at the beginning or end of terms are ignored for comparison purposes. The detection of punctuation marks and the conversion to lowercase are based on Unicode 5.2 data.
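The comparison rule described above can be illustrated with a minimal Python sketch. It is not the application's actual code. The function names (normalize, find_term) are hypothetical, and treating every character in a Unicode "P" (punctuation) general category as ignorable is an assumption about what "punctuation marks (or brackets etc.)" covers. The sketch also strips punctuation from the edges of each word of a multi-word term, which may differ from the real behaviour, and Python's unicodedata module uses the Unicode version bundled with the interpreter rather than Unicode 5.2.

```python
import unicodedata


def is_edge_char(ch):
    """Assumption: a character is ignorable at the edges if its Unicode
    general category is a punctuation category (Pc, Pd, Ps, Pe, Pi, Pf, Po)."""
    return unicodedata.category(ch).startswith("P")


def normalize(token):
    """Strip leading and trailing punctuation, then convert to lowercase."""
    start, end = 0, len(token)
    while start < end and is_edge_char(token[start]):
        start += 1
    while end > start and is_edge_char(token[end - 1]):
        end -= 1
    return token[start:end].lower()


def find_term(term, words):
    """Return the start indices in `words` where the (possibly multi-word)
    term matches after normalization."""
    needle = [w for w in (normalize(t) for t in term.split()) if w]
    if not needle:
        return []
    hay = [normalize(w) for w in words]
    n = len(needle)
    return [i for i in range(len(hay) - n + 1) if hay[i:i + n] == needle]


if __name__ == "__main__":
    ocr_words = ["Le", "(Centre", "Informatique)", "de", "l'Etat."]
    print(find_term("centre informatique", ocr_words))
```

Punctuation inside a word, such as the apostrophe in "l'Etat", is kept; only the edges are stripped.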
Created by: Yves Maurer, Centre Informatique de l'Etat, 03 Dec, 2009
Last updated by: Yves Maurer, Centre Informatique de l'Etat, 23 Nov, 2011