Avoiding Plagiarism in Markov Sequence Generation
DOI:
https://doi.org/10.1609/aaai.v28i1.9126Keywords:
markov chains, constraint programming, music generation, automatic content generation, plagiarism, max orderAbstract
Markov processes are widely used to generate sequences that imitate a given style, using random walk. Random walk generates sequences by iteratively concatenating states to prefixes of length equal or less than the given Markov order}. However, at higher orders, Markov chains tend to replicate chunks of the corpus with a size possibly higher than the order, a primary form of plagiarism. The Markov order defines a maximum length for training but not for generation. In the framework of constraint satisfaction (CSP), we introduce MaxOrder. This global constraint ensures that generated sequences do not include chunks larger than a given maximum order. We exhibit an automaton that recognises the solution set, with a size linear in the size of the corpus. We propose a linear-time procedure to generate this automaton from a corpus and a given max order. We then use this automaton to achieve generalised arc consistency for the MaxOrder constraint, holding on a sequence of size n, in O(n.T) time, where T is the size of the automaton. We illustrate our approach by generating text sequences from text corpora with a maximum order guarantee, effectively controlling plagiarism.