Abstract
Digital ink (online handwriting) generation has a number of potential applications for creating user-visible content, such as handwriting autocompletion, spelling correction, and beautification. Writing is personal, and processing is therefore usually done on-device. Ink generative models thus need to produce high-quality content quickly, in a resource-constrained environment.
In this work, we study ways to maximize the quality of the output of a trained digital ink generative model while staying within an inference-time budget. We apply and compare multiple sampling and ranking techniques in the first ablation study of its kind in the digital ink domain.
We confirm our findings on multiple datasets, covering writing in English and Vietnamese as well as mathematical formulas, using two model types and two common ink data representations. In all combinations, we report a meaningful improvement in the recognizability of the synthetic inks, in some cases more than halving the character error rate, and we describe a way to select the optimal combination of sampling and ranking techniques for any given computational budget.
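To make the procedure concrete, the sketch below (not taken from the paper) illustrates the sample-then-rank idea the abstract describes: draw candidate inks from a trained generative model until a wall-clock budget is exhausted, score each candidate with a handwriting recognizer, and keep the one with the lowest character error rate (CER). The `model.sample` and `recognizer.recognize` interfaces are hypothetical placeholders standing in for whatever generator and recognizer are available.

```python
import time

def character_error_rate(hyp: str, ref: str) -> float:
    """Levenshtein edit distance between hypothesis and reference,
    normalized by the reference length."""
    d = list(range(len(ref) + 1))          # row for the empty hypothesis
    for i, h in enumerate(hyp, 1):
        prev, d[0] = d[0], i               # prev holds d_old[j-1]
        for j, r in enumerate(ref, 1):
            prev, d[j] = d[j], min(d[j] + 1,         # deletion
                                   d[j - 1] + 1,     # insertion
                                   prev + (h != r))  # substitution / match
    return d[len(ref)] / max(len(ref), 1)

def generate_best_ink(model, recognizer, label, budget_s=0.5, max_candidates=8):
    """Sample candidate inks until the time budget runs out; return the
    candidate whose recognition result is closest to the target label."""
    deadline = time.monotonic() + budget_s
    best_ink, best_cer = None, float("inf")
    for _ in range(max_candidates):
        if time.monotonic() >= deadline:
            break
        ink = model.sample(label)          # one stochastic decode
        cer = character_error_rate(recognizer.recognize(ink), label)
        if cer < best_cer:
            best_ink, best_cer = ink, cer
    return best_ink
```

The trade-off the paper studies is precisely how the sampling scheme that produces the candidates, the ranking signal that scores them, and the number of candidates should be chosen for a given computational budget.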
A. Afonin—Work done as a student researcher at Google Research, Zürich, Switzerland.
A. Afonin and A. Maksai—These authors contributed equally to this work and share first authorship.
Notes
1. A notebook accompanying this submission that can run inference on example models for each dataset, data representation, and model type, and that includes test label sets, is available here: https://colab.research.google.com/drive/1AkwmDOkEIkifbOYEBdcB9PrR_Ll-fcmz.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Afonin, A., Maksai, A., Timofeev, A., Musat, C. (2023). Sampling and Ranking for Digital Ink Generation on a Tight Computational Budget. In: Fink, G.A., Jain, R., Kise, K., Zanibbi, R. (eds.) Document Analysis and Recognition - ICDAR 2023. Lecture Notes in Computer Science, vol. 14190. Springer, Cham. https://doi.org/10.1007/978-3-031-41685-9_9
DOI: https://doi.org/10.1007/978-3-031-41685-9_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-41684-2
Online ISBN: 978-3-031-41685-9