Tools for the Efficient Generation of Hand-Drawn Corpora Based on Context-Free Grammars
Scott MacLean, David Tausky, George Labahn, Edward Lank, Mirette Marzouk
Sketch Based Interfaces and Modeling, 2009, pp. 125--132.
Abstract: In sketch recognition systems, ground-truth data sets serve to both train and test recognition algorithms. Unfortunately, generating data sets that are sufficiently large and varied is frequently a costly and time-consuming endeavour. In this paper, we present a novel technique for creating a large and varied ground-truthed corpus for hand-drawn math recognition. Candidate math expressions for the corpus are generated via random walks through a context-free grammar, the expressions are transcribed by human writers, and an algorithm automatically generates ground-truth data for individual symbols and inter-symbol relationships within the math expressions. While the techniques we develop in this paper are illustrated through the creation of a ground-truthed corpus of mathematical expressions, they are applicable to any sketching domain that can be described by a formal grammar.
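Illustration (not from the paper): the abstract's first step, generating candidate expressions via random walks through a context-free grammar, might look roughly like the Python sketch below. The toy grammar, the names GRAMMAR, random_walk and generate, and the depth cap used to force termination are all assumptions made for this example; the authors' actual grammar and sampling strategy are not given in this record.

```python
import random

# Hypothetical toy grammar (an assumption, not the paper's grammar).
# Each nonterminal maps to a list of alternative productions.
# The FIRST production of every nonterminal leads toward a terminal,
# which the depth cap below relies on to guarantee termination.
GRAMMAR = {
    "EXPR": [["TERM"], ["EXPR", "+", "TERM"], ["EXPR", "-", "TERM"]],
    "TERM": [["ATOM"], ["TERM", "*", "ATOM"], ["ATOM", "^", "ATOM"]],
    "ATOM": [["x"], ["y"], ["2"], ["(", "EXPR", ")"]],
}


def random_walk(symbol, rng, depth=0, max_depth=6):
    """Expand `symbol` into a list of terminal tokens by random choices."""
    if symbol not in GRAMMAR:
        # Anything that is not a nonterminal is emitted as-is.
        return [symbol]
    productions = GRAMMAR[symbol]
    if depth >= max_depth:
        # Past the depth cap, only the shortest (first) production is
        # allowed, so the walk cannot recurse forever.
        productions = productions[:1]
    tokens = []
    for child in rng.choice(productions):
        tokens.extend(random_walk(child, rng, depth + 1, max_depth))
    return tokens


def generate(seed):
    """Return one candidate expression as a space-separated string."""
    rng = random.Random(seed)  # seeded so a corpus can be reproduced
    return " ".join(random_walk("EXPR", rng))


if __name__ == "__main__":
    for seed in range(5):
        print(generate(seed))
```

In a pipeline like the one the abstract describes, each generated string would then be handed to a human writer for transcription; the derivation that produced it is what would make automatic ground-truth labelling of symbols and relationships possible.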
@inproceedings{MacLean:2009:TFT,
author = {Scott MacLean and David Tausky and George Labahn and Edward Lank and Mirette Marzouk},
title = {Tools for the Efficient Generation of Hand-Drawn Corpora Based on Context-Free Grammars},
booktitle = {Sketch Based Interfaces and Modeling},
pages = {125--132},
year = {2009},
}