Abstract
Digital ink (online handwriting) generation has a number of potential applications for creating user-visible content, such as handwriting autocompletion, spelling correction, and beautification. Writing is personal, and processing is therefore usually done on-device. Ink generative models thus need to produce high-quality content quickly, in a resource-constrained environment.
In this work, we study ways to maximize the quality of the output of a trained digital ink generative model while staying within an inference-time budget. We apply and compare multiple sampling and ranking techniques in the first ablation study of its kind in the digital ink domain.
We confirm our findings on multiple datasets, covering writing in English and Vietnamese as well as mathematical formulas, using two model types and two common ink data representations. In all combinations, we report a meaningful improvement in the recognizability of the synthetic inks, in some cases more than halving the character error rate, and we describe a way to select the optimal combination of sampling and ranking techniques for any given computational budget.
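To make the procedure concrete, the sketch below (not taken from the paper) illustrates the sample-then-rank idea the abstract describes: draw candidate inks from a trained generative model until a wall-clock budget is exhausted, score each candidate with a handwriting recognizer, and keep the one with the lowest character error rate (CER). The `model.sample` and `recognizer.recognize` interfaces are hypothetical placeholders standing in for whatever generator and recognizer are available.

```python
import time

def character_error_rate(hyp: str, ref: str) -> float:
    """Levenshtein edit distance between hypothesis and reference,
    normalized by the reference length."""
    d = list(range(len(ref) + 1))          # row for the empty hypothesis
    for i, h in enumerate(hyp, 1):
        prev, d[0] = d[0], i               # prev holds d_old[j-1]
        for j, r in enumerate(ref, 1):
            prev, d[j] = d[j], min(d[j] + 1,         # deletion
                                   d[j - 1] + 1,     # insertion
                                   prev + (h != r))  # substitution / match
    return d[len(ref)] / max(len(ref), 1)

def generate_best_ink(model, recognizer, label, budget_s=0.5, max_candidates=8):
    """Sample candidate inks until the time budget runs out; return the
    candidate whose recognition result is closest to the target label."""
    deadline = time.monotonic() + budget_s
    best_ink, best_cer = None, float("inf")
    for _ in range(max_candidates):
        if time.monotonic() >= deadline:
            break
        ink = model.sample(label)          # one stochastic decode
        cer = character_error_rate(recognizer.recognize(ink), label)
        if cer < best_cer:
            best_ink, best_cer = ink, cer
    return best_ink
```

The trade-off the paper studies is precisely how the sampling scheme that produces the candidates, the ranking signal that scores them, and the number of candidates should be chosen for a given computational budget.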
A. Afonin—Work done as a student researcher at Google Research, Zürich, Switzerland.
A. Afonin and A. Maksai—These authors contributed equally to this work and share first authorship.
Notes
1. A notebook accompanying this submission that can run inference on example models for each dataset, data representation, and model type, and that includes test label sets, is available here: https://colab.research.google.com/drive/1AkwmDOkEIkifbOYEBdcB9PrR_Ll-fcmz.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Afonin, A., Maksai, A., Timofeev, A., Musat, C. (2023). Sampling and Ranking for Digital Ink Generation on a Tight Computational Budget. In: Fink, G.A., Jain, R., Kise, K., Zanibbi, R. (eds.) Document Analysis and Recognition - ICDAR 2023. Lecture Notes in Computer Science, vol. 14190. Springer, Cham. https://doi.org/10.1007/978-3-031-41685-9_9
DOI: https://doi.org/10.1007/978-3-031-41685-9_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-41684-2
Online ISBN: 978-3-031-41685-9