Figure 1: The full automated story generation pipeline, illustrating an example where the event-to-event module generates only a single following event.

ensemble-based system for the event-to-sentence problem that balances between retaining the event's original semantic meaning and being an interesting continuation of the story. We demonstrate that our system for guided language generation outperforms a baseline sequence-to-sequence approach. Additionally, we present the results of a full end-to-end story generation pipeline (Figure 1), showing how all of the sub-systems can be integrated.

2 Related Work and Background

2.1 Story Generation via Machine Learning
Machine learning approaches to story and plot generation attempt to learn domain information from a corpus of story examples (Swanson and Gordon 2012; Li et al. 2013). Recent work has looked at using recurrent neural networks (RNNs) for story and plot generation. Roemmele and Gordon (2018) use LSTMs with skip-thought vector embeddings (Kiros et al. 2015) to generate stories. Khalifa, Barros, and Togelius (2017) train an RNN on a highly-specialized corpus, such as work from a single author. Fan, Lewis, and Dauphin (2018) introduce a form of hierarchical story generation in which a premise is first generated by the model and then transformed into a passage. This last example is a form of guided generation wherein a single sentence provides guidance. Similarly, Yao et al. (2019) decompose story generation into planning out a storyline and then generating a story from it. Our work differs in that we use the event-to-event process to provide guidance to event-to-sentence. Ammanabrolu et al. (2019) look at narrative generation as a form of quest generation in interactive fiction and use a knowledge graph to ground their generative models.

2.2 Event Representation and Generation
Martin et al. (2018) showed that performance on both the event-to-event and event-to-sentence problems improves when using an abstraction—known as an event—instead of natural language sentences. We use a variation of this event structure. In our work, events are defined as a 5-tuple of ⟨s, v, o, p, m⟩, as opposed to the 4-tuples used in Martin et al. (2018). Here v is a verb, s is the subject of the verb, o is the object, p is the corresponding preposition, and m can be a modifier, prepositional object, or indirect object. Any of these elements can be ∅, denoting the absence of the element. All elements are stemmed and generalized, with the exception of the preposition.

The generalization process involves finding the VerbNet (Schuler and Kipper-Schuler 2005) v3.3 class of the verb and finding the WordNet (Miller 1995) v3.1 Synset that is two levels higher in the hypernym tree for all of the nouns in the event. This process also includes the identification of named entities in the event tuple, extracting people, organizations, locations, etc. through named entity recognition (NER) and numbering them as the story goes on. For example, "PERSON" names are replaced by the tag <PERSON>n, where n indicates the n-th "PERSON" in the story. Similarly, the other NER categories are replaced with tags that indicate their category and their number within the story. This maintains consistency in named entities for a given story in the corpora.

We further process the corpus by "splitting" sentences, akin to the "split-and-prune" methodology of Martin et al. (2018). This is done to decrease the number of events generated from a single sentence—reducing the number of mappings of a single sentence to multiple events. The splitting process starts with extracting the parse trees of each sentence using the Stanford Parser. Sentences are then split on S's (SBARs) and conjunctions before nested sentences. This process can result in incomplete sentences where the S-bar phrase is nested inside of a sentence, acting as the direct object. For example, a sentence like "She says that he is upset." becomes "She says. He is upset." Then the split sentences are sorted to reflect the original ordering of subjects or phrases as closely as possible.

For this paper, the event-to-event system is the policy gradient deep reinforcement learner from Tambwekar et al. (2019). This system has been tested to ensure that the resulting events are of high quality, minimizing error in that portion of the pipeline. Our event-to-sentence system is agnostic to the choice of event-to-event system; all it requires is a sequence of events to turn into sentences. The event-to-event network is placed into the pipeline as the "Event2Event" module, seen in Figure 1, and its output is fed into the event-to-sentence models during testing.

3 Event-to-Sentence
We define event-to-sentence to be the problem of selecting a sequence of words s_t = s_{t0}, s_{t1}, ..., s_{tk}—that form a sentence—given the current input event e_t; i.e., the current sentence is generated by maximizing Pr(s_t | e_t; θ), where θ refers to the parameters of the generative system. The eventification in Section 2.2 is a lossy process in which some of the information from the original sentence is dropped. Thus, the task of event-to-sentence involves filling in this missing information. There is also no guarantee that the event-to-event process will produce an event that is
part of the event-to-sentence training corpus, simply due to the fact that the space of potentially-generated events is very large; the correct mapping from the generated event to a natural language sentence would be unknown.

In prior work, Martin et al. (2018) use a sequence-to-sequence LSTM neural network to translate events into sentences. We observe that "vanilla" sequence-to-sequence networks end up operating as simple language models, often ignoring the input event when generating a sentence. The generated sentence is usually grammatically correct but retains little of the semantic meaning given by the event.

We thus look for other forms of guided neural language generation, with the goals of preserving the semantic meaning from the event in addition to keeping the generated sentences interesting. We propose four different models—each optimized towards a different point on the spectrum between the two objectives—and a baseline fifth model that is used as a fallthrough. The task of each model is to translate events into "generalized" sentences, wherein nouns are replaced by WordNet Synsets. If a model does not pass a specific threshold (determined individually for each model), the system continues on to the next model in the ensemble. In order, the models are: (1) a retrieve-and-edit model based on Hashimoto et al. (2018); (2) template filling; (3) sequence-to-sequence with Monte Carlo beam decoding; (4) sequence-to-sequence with a finite state machine decoder; and (5) vanilla (beam-decoding) sequence-to-sequence. We find that none of these models by themselves can successfully find a balance between the goals of retaining all of the event tokens and generating interesting output. However, each of the models possesses its own strengths and weaknesses. We combine these models into an ensemble in an attempt to minimize the weaknesses of each individual model and to achieve a balance.

3.1 Retrieve-and-Edit
The first model is based on the retrieve-and-edit (RetEdit) framework for predicting structured outputs (Hashimoto et al. 2018). We first learn a task-specific similarity between event tuples by training an encoder-decoder to map each event onto an embedding that can reconstruct the output sentence; this is our retriever model. Next, we train an editor model that maximizes the likelihood of generating the target sentence given both the input event and a retrieved event-sentence example pair. We used a standard sequence-to-sequence model with attention and copying (Gu et al. 2016) to stand in as our editor architecture. Although this framework was initially applied to the generation of GitHub Python code and Hearthstone cards, we extend this technique to generate sentences from our event tuples. Specifically, we first initialize a new set of GloVe word embeddings (Pennington, Socher, and Manning 2014), using random initialization for out-of-vocabulary words. We use our training set to learn weights for the retriever and editor models, set confidence thresholds for the model with the validation set, and evaluate performance using the test set.

In order to generate a sentence from a given input event, there are two key phases: a "retrieve" phase and an "edit" phase. With respect to the input event, we first retrieve the nearest-neighbor event and its corresponding sentence in the training set using the retriever model. Passing both the retrieved event-sentence pair and the input event as inputs, we use the editor model to generate a sentence using beam search.

Many of the successes produced by the model stem from its ability to retain the complex sentence structures that appear in our training corpus; it thus attempts to balance between maintaining coherence and being interesting. However, this interaction with the training data can also prove to be a major drawback of the method; target events that are distant in the embedding space from training examples typically result in poor sentence quality. Since RetEdit relies heavily on having good examples, we set the confidence of the retrieve-and-edit model to be proportional to 1 − retrieval distance when generating sentences, as a lower retrieval distance implies greater confidence. However, the mapping from event to sentence is not a one-to-one function. There are occasionally multiple sentences that map to a single event, resulting in a retrieval distance of 0, in which case the example sentence is returned without modifications.

3.2 Sentence Templating
As mentioned earlier, the baseline sequence-to-sequence network operates as a simple language model and can often ignore the input event when generating a sentence. However, we know that our inputs—event tuples—will have known parts of speech. We created a simplified grammar for the syntax of sentences generated from events:

S → NP v (NP) (PP)
NP → d n
PP → p NP

where d is a determiner that will be added and the rest of the terminal symbols correspond to an argument in the event, with n being s, o, or m, depending on its position in the sentence. The resulting sentence would be [_ s] {v [_ o] [p _ m]}, where blanks indicate where words should be added to make a complete sentence.

First, our algorithm predicts the most likely VerbNet frame based on the contents of the input event (how many and which arguments are filled). VerbNet provides a number of syntactic structures for different verb classes based on how the verb is being used. For example, if the input event contains 2 nouns and a verb without a preposition, we assume that the output sentence takes the form [NP V NP], but if it has 2 nouns, a verb, and a preposition, then it should be [NP V PP].

Second, we apply a bidirectional LSTM language model trained on the generalized sentences in our training corpus. Given a word, we can generate words before and after it, within a particular phrase as given by some of the rules above, and concatenate the generated sentence fragments together. Specifically, we use the AWD-LSTM (Merity, Keskar, and Socher 2018) architecture as our language model since it is currently state-of-the-art.

At decode time, we continue to generate words in each phrase until we reach a stopping condition: (1) reaching a
maximum length (to prevent run-on sentences); or (2) generating a token that is indicative of an element in the next phrase, for example seeing a verb generated in a noun phrase. When picking words from the language model, we noticed that the words "the" and "and" were extremely common. To increase the variety of the sentences, we sample from the top k most-likely next words and enforce a number of grammar-related rules in order to keep the coherence of the sentence. For example, we do not allow two determiners nor two nouns to be generated next to each other.

One can expect that many of the results will look structurally similar. However, we can guarantee that the provided tokens in the event will appear in the generated sentence—this model is optimized towards maintaining coherence. To determine the confidence of the model for each sentence, we sum the loss after each generated token, normalize to sentence length, and subtract from 1, as higher loss translates to lower confidence.

3.3 Monte Carlo Beam Search
Our third method is an adaptation of Monte Carlo beam search (Cazenave 2012) for event-to-sentence. We train a sequence-to-sequence model on pairs of events and generalized sentences and run Monte Carlo beam search at decode time. This method differs from traditional beam search in that it introduces another scoring term that is used to re-weight all the beams at each timestep.

After the top-scoring words are output by the model at each timestep, playouts are done from each word, or node. A node is the final token of the partially-generated sequences currently on the beam and the start of a new playout. During each playout, one word is sampled from the current step's softmax over all words in the vocabulary. The decoder network is unrolled until it reaches the "end-of-story" tag. Then, the previously-generated sequence and the sequence generated from the current playout are concatenated together and passed into a scoring function that computes the current playout's score.

The scoring function is a combination of (1) BLEU scores up to 4-grams between the input event and the generated sentence, as well as (2) a weighted 1-gram BLEU score between each item in the input event and the generated sentence. The weights combining the 1-gram BLEU scores are learned at validation time, where the weight for each word in the event that does not appear in the final generated sequence gets bumped up. Multiple playouts are done from each word, and the score s_t for the current word is computed as:

s_t = α · s_{t−1} + (1 − α) · AVG(playout_t)    (1)

where α is a constant.

In the end, the k partial sequences with the highest playout scores are kept as the current beam. For the ensemble, this model's confidence score is the final score of the highest-scoring end node. Monte Carlo beam search excels at creating diverse output—i.e., it skews towards generating interesting sentences. Since the score for each word is based on playouts that sample based on weights at each timestep, it is possible for the output to be different across runs. The Monte Carlo beam decoder has been shown to generate better, more grammatically correct sentences than the other techniques in our ensemble, while sticking closer to the input than a traditional beam decoder. However, there is no guarantee that all input event tokens will be included in the final output sentence.

3.4 Finite State Machine Constrained Beams
Various forms of beam search, including Monte Carlo playouts, cannot ensure that the tokens from an input event appear in the output sentence. As such, we adapted the algorithm to fit such lexical constraints, similar to Anderson et al. (2017), who adapted beam search to fit captions for images, with the lexical constraints coming from sets of image tags. Their constrained beam search uses finite state machines to guide the beam search toward generating the desired tokens. Their approach, which we have co-opted for event-to-sentence, attempts to achieve a balance between the flexibility and sentence quality typical of a beam search approach, while also adhering to the context and story encoded in the input events, which more direct approaches (e.g., Section 3.2) would achieve.

The algorithm works on a per-event basis, beginning by generating a finite state machine. This finite state machine consists of states that enforce the presence of input tokens in the generated sentence. As an example, assume we have an n-token input event, {t_1, t_2, t_3, ..., t_n}. The corresponding machine consists of 2^n states. Each state maintains a search beam with at most b output sequences, corresponding to the configured beam size. At each time step, every state (barring the initial state) receives from predecessor states those output sequences whose last generated token matches an input event token. The state then adds to its beam the b most likely output sequences from those received. Generating token t_1 moves the current state from the initial state to the state corresponding to t_1; generating t_3 moves it to a state for t_3; and so on. The states t_1 and t_3 then, after generating tokens t_3 and t_1 respectively, transmit the resulting sequences to the state t_{1,3}. The states and transitions proceed as such until reaching the final state, wherein they have matched every token in the input event. Completed sequences in the final state contain all input event tokens, thus providing us with the ability to retain the semantic meaning of the event.

As much as the algorithm is based around balancing the generation of good sentences with the satisfaction of lexical constraints, it does not perform particularly well at either. It is entirely possible, and not infrequent, for generated sentences to contain all input tokens but lose proper grammar and syntax, or even to fail to reach the final state within a fixed time horizon. This is exacerbated by larger tuples of tokens, seen even at just five tokens per tuple. To compensate, we relax our constraint to permit output sequences that have matched at least three out of five tokens from the input event.

3.5 Ensemble
The entire event-to-sentence ensemble is designed as a cascading sequence of models: (1) retrieve-and-edit, (2) sentence templating, (3) Monte Carlo beam search, (4) finite state constrained beam search, and (5) standard beam search.
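The cascade described in Section 3.5 can be sketched as a simple fallthrough loop. This is an illustrative sketch, not the authors' code: the model objects and threshold values are toy stand-ins, and each model is assumed to return a (sentence, confidence) pair for an input event.

```python
def cascade(event, models, thresholds):
    """Query models in order; return the first output whose confidence
    clears that model's threshold. The last model is an unconditional
    fallthrough (the baseline beam-decoded seq2seq)."""
    for model, threshold in zip(models[:-1], thresholds):
        sentence, confidence = model(event)
        if confidence >= threshold:
            return sentence
    sentence, _ = models[-1](event)  # baseline always produces an answer
    return sentence

# Toy stand-ins for the five models.
retedit = lambda e: ("retedit sentence", 0.2)      # low: no close neighbor
template = lambda e: ("template sentence", 0.9)    # high: all tokens kept
montecarlo = lambda e: ("mc sentence", 0.5)
fsm = lambda e: ("fsm sentence", 0.0)              # failed to finish
seq2seq = lambda e: ("baseline sentence", 0.0)

models = [retedit, template, montecarlo, fsm, seq2seq]
thresholds = [0.8, 0.6, 0.4, 0.5]  # illustrative; tuned on validation data

print(cascade(("<PRP>", "act-114-1-1", "to", None, "event.n.01"),
              models, thresholds))
```

Here RetEdit's confidence (0.2) misses its threshold, so the cascade falls through to the templating model, whose confidence (0.9) clears its threshold; the loop then terminates early without querying the remaining models, which is where the computational savings come from.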
We use the confidence scores generated by each of the models in order to re-rank the outputs of the individual models. This is done by setting a confidence threshold for each of the models such that if a confidence threshold fails, the next model in the ensemble is tried. The thresholds are tuned on the confidence scores generated from the individual models on the validation set of the corpus. This ensemble saves on computation as it sequentially queries each model, terminating early and returning an output sentence if the confidence threshold for any of the individual models is met.

An event first goes through the retrieve-and-edit framework, which generates a sentence and a corresponding confidence score. This framework performs well when it is able to retrieve a sample from the training set that is relatively close in terms of retrieval distance to the input. Given the sparsity of the dataset, this happens with a relatively low probability, and so we place this model first in the sequence.

The next two models are each optimized towards one of our two main goals. The sentence templating approach retains all of the tokens within the event and so loses none of its semantic meaning, at the expense of generating a more interesting sentence. The Monte Carlo approach, on the other hand, makes no guarantees regarding retaining the original tokens within the event but is capable of generating a diverse set of sentences. We thus cascade first to the sentence templating model and then to the Monte Carlo approach, implicitly placing greater importance on the goal of retaining the semantic meaning of the event.

The final model queried is the finite-state-machine–constrained beam search. This model has no confidence score; either the model is successful in producing a sentence within the given length with the event tokens or it is not. In the case that the finite state machine based model is unsuccessful in producing a sentence, the final fallthrough model—the baseline sequence-to-sequence model with standard beam search decoding—is used.

4 Dataset
To aid the performance of our story generation, we select a single genre: science fiction. We scraped long-running science fiction TV show plot summaries from the fandom wiki service wikia.com. This dataset contains longer and more detailed plot summaries than the dataset used in Martin et al. (2018) and Tambwekar et al. (2019), which we believe to be important for the overall story generation process. The corpus contains 2,276 stories in total, each story being an episode of a TV show. The average story length is 89.23 sentences. There are stories from 11 shows, with an average of 207 stories per show, from shows like Doctor Who, Futurama, and The X-Files. The data was pre-processed to simplify alien names in order to aid the parser. Then the sentences were split, partially following the "split-and-prune" methodology of Martin et al. (2018) as described in Section 2.2.

Once the sentences were split, they were "eventified" as described in Section 2.2. One benefit of having split sentences is that there is a higher chance of having a 1:1 correspondence between a sentence and an event, instead of a single sentence becoming multiple events. After the data is fully prepared, it is split in an 8:1:1 ratio to create the training, validation, and testing sets, respectively.

5 Experiments
We perform two sets of experiments, one set evaluating our models on the event-to-sentence problem by itself, and another set intended to evaluate the full storytelling pipeline. Each of the models in the event-to-sentence ensemble is trained on the training set of the sci-fi corpus. The training details for each of the models are as described above. All of the models in the ensemble slot-fill the verb automatically—filling a VerbNet class with a verb of appropriate conjugation—except for the sentence templating model, which does verb slot-filling during post-processing.

After the models are trained, we pick the cascading thresholds for the ensemble by running the validation set through each of the models and generating confidence scores. This is done by running a grid search through a limited set of thresholds such that the overall BLEU-4 score (Papineni et al. 2002) of the generated sentences in the validation set is maximized. These thresholds are then frozen when running the final set of evaluations on the test set. For the baseline sequence-to-sequence method, we decode our output with a beam size of 5. We report perplexity, BLEU-4, and ROUGE-4 scores, comparing against the gold standard from the test set.

Perplexity = 2^(−Σ_x p(x) log_2 p(x))    (2)

where x is a token in the text, and

p(x) = count(x) / Σ_{v∈V} count(v)    (3)

where V is the vocabulary. Our BLEU-4 scores are naturally low (where higher is better) because of the creative nature of the task—good sentences may not use any of the ground-truth n-grams. Even though we frame Event2Sentence as a translation task, BLEU-4 and ROUGE-4 are not reliable metrics for creative generation tasks.

The first experiment takes the plots in the test set, eventifies them, and then uses our event-to-sentence ensemble to convert them back into sentences. In addition to using the full ensemble, we further experiment with different combinations of models along the spectrum between maintaining coherence and being interesting. We then evaluate the generated sentences, using the original sentences from the test set as a gold standard.

The second experiment uses event sequences generated by an event-to-event system such as Tambwekar et al. (2019) and is designed to demonstrate how our system integrates into the larger pipeline described in Figure 1. We then transform these generated event sequences into generalized sentences using both the ensemble and the baseline sequence-to-sequence approach. As the last step, the generalized sentences are passed into the "slot filler" (see Figure 1) such that the categories are filled. As the story goes on, the "memory" maintains a dynamic graph that keeps track of which entities (e.g. people, items) are mentioned at which event and what their tag was (e.g. <PERSON>5,
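The perplexity metric in Equations 2 and 3 can be computed directly from token counts: estimate the unigram distribution, take its entropy, and exponentiate. A minimal sketch:

```python
import math
from collections import Counter

def unigram_perplexity(tokens):
    """Perplexity as defined in Equations 2 and 3: 2 raised to the
    entropy of the unigram distribution estimated from token counts."""
    counts = Counter(tokens)
    total = sum(counts.values())
    entropy = -sum((c / total) * math.log2(c / total)
                   for c in counts.values())
    return 2 ** entropy

# A uniform distribution over 4 distinct tokens has entropy 2 bits,
# so its perplexity is 4.
print(unigram_perplexity(["a", "b", "c", "d"]))
```

Perplexity equals the vocabulary size for a uniform distribution and shrinks toward 1 as the distribution becomes more peaked, which is why it is reported alongside the overlap-based BLEU-4 and ROUGE-4 scores.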
Table 1: Event-to-sentence examples for each model. ∅ represents an empty parameter; <PRP> is a pronoun.

Example 1
  Input Event: <PRP>, act-114-1-1, to, ∅, event.n.01
  RetEdit: <PRP> and <PERSON>0 move to the event.n.01 of the natural object.n.01.
  Templates: <PRP> act-114-1-1 to event.n.01.
  Monte Carlo: <PRP> moves to the nearest natural object.n.01.
  FSM: physical entity.n.01 move back to the phenomenon.n.01 of the craft.n.02...
  Gold Standard: <PRP> move to the event.n.01.

Example 2
  Input Event: <PERSON>2, send-11.1, through, <PERSON>6, <LOCATION>1
  RetEdit: <PERSON>2 sends <PERSON>6 through the <LOCATION>1.
  Templates: The <PERSON>2 send-11.1 the <PERSON>6 through <LOCATION>1.
  Monte Carlo: <PERSON>2 passes this undercover in the body part.n.01 and collapses.
  FSM: ∅
  Gold Standard: In activity.n.01 to avoid <PRP> out.n.01 <PERSON>2 would transport <PERSON>6 through the <LOCATION>1.

Table 2: End-to-end pipeline examples on previously-unseen input data. The event-to-sentence model used is the full ensemble. Sentences are generated using both the extracted and generated events.

Example 1
  Input Sentence: On Tatooine, Jabba the Hutt inspects the drone barge recently delivered to him.
  Extracted Event: <ORG>0, assessment-34.1, ∅, vessel.n.02, ∅
  Generated Events (Event-to-Event): <PERSON>1, settle-36.1.2, indicator.n.03; music.n.01, escape-51.1-1, from, ∅, ∅; <PRP>, discover-84, to, run-51.3.2, progenitor.n.01
  Generated Sentences (Event-to-Sentence): The <ORG>0 can not scan the vessel.n.02 indicator.n.03 of the <VESSEL>0. <PERSON>1 decides to be a little person.n.01 at the structure.n.01. the music.n.01 arrives. <PRP> finds a lonely person.n.01 on the upper one of the craft.n.02 which is not a personal letter.n.01 but does not respond to hails.
  Slot-filled Sentences: The Jabba the Hutt can not scan the bareboat of the Uss Lakota. O Yani decides to be a little mailer at the airdock. The Music arrives. She finds a lonely mailer on the upper one of the bareboat which is not a love letter but does not respond to hails.

Example 2
  Input Sentence: Boba Fett has just chased down another bounty, a Rodian art dealer who sold fake works to Gebbu the Hutt.
  Extracted Event: <PERSON>0, chase-51.6, ∅, bounty.n.04, ∅
  Generated Events (Event-to-Event): <PERSON>0, chase-51.6, to, magnitude.n.01, ∅; magnitude.n.01, comprehend-87.2, off, craft.n.02, magnitude.n.01; <PERSON>2, amuse-31.1, off, ∅, ∅; <PERSON>2, discover-84, off, change of integrity.n.01, ∅
  Generated Sentences (Event-to-Sentence): <PERSON>0 enters the bounty.n.04 and tells <PRP>. <PERSON>0 attaches the explosive.a.01 to the person.n.01 who is trying to fix the device.n.01. the magnitude.n.01 doesn't know the craft.n.02 off the craft.n.02. <PERSON>2 is surprised when <PRP> learns that the person.n.01 is actually <PERSON>7. <PERSON>2 sees the change of integrity.n.01 and tells <PRP>.
  Slot-filled Sentences: Boba Fett enters the bounty and tells it. Boba Fett attaches the explosive to the peer who is trying to fix the toy. The multiplicity doesn't know the bounty off the bounty. Dark Jedi Lomi Plo is surprised when it learns that the peer is actually Mrs Conners. Dark Jedi Lomi Plo sees the combination off the Orbs and tells them.
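The slot-filling and memory behavior illustrated in Table 2 can be sketched as follows. The class and function names here are hypothetical, and the candidate-name lookup is a stand-in for the paper's slot filler; the point illustrated is that the memory keeps each entity tag bound to one name for the whole story.

```python
# Illustrative sketch (not the authors' code) of memory-consistent
# slot filling: <PERSON>n tags are bound to concrete names once and
# reused for the rest of the story.

class Memory:
    """Maps entity tags like <PERSON>0 to concrete names, so the same
    tag is always filled with the same name across a story."""
    def __init__(self, candidates):
        self.candidates = candidates  # e.g. {"<PERSON>": ["Boba Fett", ...]}
        self.assigned = {}

    def fill(self, tag, category):
        if tag not in self.assigned:
            used = len([t for t in self.assigned if t.startswith(category)])
            self.assigned[tag] = self.candidates[category][used]
        return self.assigned[tag]

def slot_fill(sentence, memory, category="<PERSON>"):
    # Replace every tag of the given category with its remembered name.
    out = []
    for token in sentence.split():
        if token.startswith(category):
            out.append(memory.fill(token, category))
        else:
            out.append(token)
    return " ".join(out)

memory = Memory({"<PERSON>": ["Boba Fett", "Lomi Plo"]})
print(slot_fill("<PERSON>0 enters the bounty and tells it .", memory))
print(slot_fill("<PERSON>0 sees <PERSON>2 .", memory))
```

Because the binding lives in the memory object rather than in any single sentence, <PERSON>0 resolves to the same name in both calls, mirroring the consistency shown in the slot-filled column of Table 2.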
Table 4: Utilization percentages for each model combination on both events from the test set (Test) and from the full pipeline (Pipeline).

                       RetEdit          Templates        Monte Carlo      FSM              Seq2seq
                       Test   Pipeline  Test   Pipeline  Test   Pipeline  Test   Pipeline  Test   Pipeline
RetEdit+MC             82.58  31.74     -      -         9.95   48.40     -      -         7.46   19.86
Templates+MC           -      -         6.14   5.48      65.70  66.67     -      -         28.16  27.85
Templates+FSM          -      -         6.14   5.48      -      -         56.77  32.65     37.09  61.87
RetEdit+Templates+MC   82.58  31.74     1.49   3.88      9.10   45.21     -      -         6.82   19.18
Full Ensemble          94.91  55.71     0.22   0.91      4.29   41.10     0.15   0.68      0.43   1.60
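The validation-time threshold tuning described in Section 5 amounts to a grid search over threshold combinations, scored on the sentences each combination produces. A sketch under stated assumptions: the cascade and scoring functions below are toy stand-ins (the paper maximizes corpus-level BLEU-4 over the validation set).

```python
from itertools import product

def tune_thresholds(validation_events, cascade, score, grid):
    """Try every combination of per-model thresholds drawn from `grid`
    and keep the one whose generated sentences score highest."""
    best, best_score = None, float("-inf")
    for thresholds in product(grid, repeat=4):  # 4 thresholded models
        outputs = [cascade(e, thresholds) for e in validation_events]
        s = score(outputs)
        if s > best_score:
            best, best_score = thresholds, s
    return best

# Toy example: a "cascade" that just averages its thresholds, and a
# scorer that prefers outputs close to 0.5.
fake_cascade = lambda e, th: sum(th) / len(th)
fake_score = lambda outs: -abs(outs[0] - 0.5)
print(tune_thresholds([None], fake_cascade, fake_score, [0.0, 0.5, 1.0]))
```

With a real system, `validation_events` would be the eventified validation set, `cascade` the ensemble of Section 3.5, and `score` corpus-level BLEU-4 against the gold sentences; the winning thresholds are then frozen for the test-set evaluation.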
search. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, 936–945.

Cazenave, T. 2012. Monte Carlo Beam Search. IEEE Transactions on Computational Intelligence and AI in Games 4(1):68–72.

Clark, E.; Ji, Y.; and Smith, N. A. 2018. Neural Text Generation in Stories Using Entity Representations as Context. In NAACL-HLT.

Fan, A.; Lewis, M.; and Dauphin, Y. 2018. Hierarchical Neural Story Generation. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, 889–898.

Farrell, R.; Ware, S. G.; and Baker, L. J. 2019. Manipulating narrative salience in interactive stories using Indexter's pairwise event salience hypothesis. IEEE Transactions on Games.

Gervás, P.; Díaz-Agudo, B.; Peinado, F.; and Hervás, R. 2005. Story plot generation based on CBR. Knowledge-Based Systems 18(4-5):235–242.

Gu, J.; Lu, Z.; Li, H.; and Li, V. O. K. 2016. Incorporating copying mechanism in sequence-to-sequence learning. In Association for Computational Linguistics (ACL).

Hashimoto, T. B.; Guu, K.; Oren, Y.; and Liang, P. 2018. A Retrieve-and-Edit Framework for Predicting Structured Outputs. In 32nd Conference on Neural Information Processing Systems (NeurIPS 2018).

Jain, P.; Agrawal, P.; Mishra, A.; Sukhwani, M.; Laha, A.; and Sankaranarayanan, K. 2017. Story generation from sequence of independent short descriptions. In SIGKDD Workshop on Machine Learning for Creativity (ML4Creativity).

Khalifa, A.; Barros, G. A. B.; and Togelius, J. 2017. DeepTingle. In International Conference on Computational Creativity.

Kiros, R.; Zhu, Y.; Salakhutdinov, R. R.; Zemel, R.; Urtasun, R.; Torralba, A.; and Fidler, S. 2015. Skip-thought vectors. In Advances in Neural Information Processing Systems, 3294–3302.

Lebowitz, M. 1987. Planning Stories. In Proceedings of the 9th Annual Conference of the Cognitive Science Society, 234–242.

Li, B.; Lee-Urban, S.; Johnston, G.; and Riedl, M. O. 2013. Story generation with crowdsourced plot graphs. In Proceedings of the Twenty-Seventh AAAI Conference on Artificial Intelligence, AAAI'13, 598–604. AAAI Press.

Martin, L. J.; Ammanabrolu, P.; Wang, X.; Singh, S.; Harrison, B.; Dhuliawala, M.; Tambwekar, P.; Mehta, A.; Arora, R.; Dass, N.; Purdy, C.; and Riedl, M. O. 2017. Improvisational Storytelling Agents. In Workshop on Machine Learning for Creativity and Design (NeurIPS 2017).

Martin, L. J.; Ammanabrolu, P.; Wang, X.; Hancock, W.; Singh, S.; Harrison, B.; and Riedl, M. O. 2018. Event Representations for Automated Story Generation with Deep Neural Nets. In Thirty-Second AAAI Conference on Artificial Intelligence (AAAI-18), 868–875.

Meehan, J. R. 1977. TALE-SPIN, an interactive program that writes stories. In Proceedings of the 5th International Joint Conference on Artificial Intelligence, 1:91–98.

Merity, S.; Keskar, N. S.; and Socher, R. 2018. Regularizing and Optimizing LSTM Language Models. In 6th International Conference on Learning Representations, ICLR 2018.

Miller, G. A. 1995. WordNet: A Lexical Database for English. Communications of the ACM 38(11):39–41.

Papineni, K.; Roukos, S.; Ward, T.; and Zhu, W.-J. 2002. BLEU: a method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, 311–318.

Peng, N.; Ghazvininejad, M.; May, J.; and Knight, K. 2018. Towards Controllable Story Generation. In Proceedings of the First Workshop on Storytelling, 43–49. New Orleans, Louisiana: Association for Computational Linguistics.

Pennington, J.; Socher, R.; and Manning, C. 2014. GloVe: Global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), 1532–1543.

Pérez y Pérez, R., and Sharples, M. 2001. MEXICA: A computer model of a cognitive account of creative writing. Journal of Experimental & Theoretical Artificial Intelligence 13(2):119–139.

Porteous, J., and Cavazza, M. 2009. Controlling narrative generation with planning trajectories: The role of constraints. In Joint International Conference on Interactive Digital Storytelling, volume 5915 LNCS, 234–245. Springer.

Purdy, C.; Wang, X.; He, L.; and Riedl, M. 2018. Predicting generated story quality with quantitative measures. In Fourteenth Artificial Intelligence and Interactive Digital Entertainment Conference.

Riedl, M. O., and Young, R. M. 2010. Narrative Planning: Balancing Plot and Character. Journal of Artificial Intelligence Research 39:217–267.

Roemmele, M., and Gordon, A. S. 2018. An Encoder-decoder Approach to Predicting Causal Relations in Stories. In Proceedings of the First Workshop on Storytelling, 50–59. New Orleans, Louisiana: Association for Computational Linguistics.

Roemmele, M. 2018. Neural Networks for Narrative Continuation. Ph.D. Dissertation, University of Southern California.

Schuler, K. K., and Kipper-Schuler, K. 2005. VerbNet: A Broad-Coverage, Comprehensive Verb Lexicon. Ph.D. Dissertation, University of Pennsylvania.

Sutskever, I.; Vinyals, O.; and Le, Q. V. 2014. Sequence to Sequence Learning with Neural Networks. In Advances in Neural Information Processing Systems, 3104–3112.

Swanson, R., and Gordon, A. 2012. Say Anything: Using textual case-based reasoning to enable open-domain interactive storytelling. ACM Transactions on Interactive Intelligent Systems 2(3):16:1–16:35.

Tambwekar, P.; Dhuliawala, M.; Martin, L. J.; Mehta, A.; Harrison, B.; and Riedl, M. O. 2019. Controllable Neural Story Plot Generation via Reward Shaping. In Proceedings of the 28th International Joint Conference on Artificial Intelligence.

Turner, S. R., and Dyer, M. G. 1986. Thematic knowledge, episodic memory and analogy in MINSTREL, a story invention system. University of California, Computer Science Department.

Ware, S., and Young, R. M. 2011. CPOCL: A narrative planner supporting conflict. In Proceedings of the 7th AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment.

Wiseman, S.; Shieber, S. M.; and Rush, A. M. 2017. Challenges in data-to-document generation. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing (EMNLP).

Yao, L.; Peng, N.; Weischedel, R.; Knight, K.; Zhao, D.; and Yan, R. 2019. Plan-And-Write: Towards Better Automatic Storytelling. In Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence (AAAI-19).