Hongfei Lin
Min Zhang
Liang Pang (Eds.)
LNCS 13026

Information Retrieval
27th China Conference, CCIR 2021
Dalian, China, October 29–31, 2021
Proceedings
Lecture Notes in Computer Science 13026

Founding Editors
Gerhard Goos
Karlsruhe Institute of Technology, Karlsruhe, Germany
Juris Hartmanis
Cornell University, Ithaca, NY, USA

Editorial Board Members


Elisa Bertino
Purdue University, West Lafayette, IN, USA
Wen Gao
Peking University, Beijing, China
Bernhard Steffen
TU Dortmund University, Dortmund, Germany
Gerhard Woeginger
RWTH Aachen, Aachen, Germany
Moti Yung
Columbia University, New York, NY, USA
More information about this subseries at http://www.springer.com/series/7407
Hongfei Lin · Min Zhang · Liang Pang (Eds.)

Information Retrieval
27th China Conference, CCIR 2021
Dalian, China, October 29–31, 2021
Proceedings

Editors

Hongfei Lin
Dalian University of Technology
Dalian, China

Min Zhang
Department of Computer Science, Tsinghua University
Beijing, China

Liang Pang
Institute of Computing Technology, Chinese Academy of Sciences
Beijing, China

ISSN 0302-9743 ISSN 1611-3349 (electronic)


Lecture Notes in Computer Science
ISBN 978-3-030-88188-7 ISBN 978-3-030-88189-4 (eBook)
https://doi.org/10.1007/978-3-030-88189-4
LNCS Sublibrary: SL1 – Theoretical Computer Science and General Issues

© Springer Nature Switzerland AG 2021


This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the
material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation,
broadcasting, reproduction on microfilms or in any other physical way, and transmission or information
storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now
known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this book are
believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors
give a warranty, expressed or implied, with respect to the material contained herein or for any errors or
omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in
published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Preface

The 2021 China Conference on Information Retrieval (CCIR 2021), co-organized by the Chinese Information Processing Society of China (CIPS) and the China Computer Federation (CCF), was the 27th installment of the conference series. The conference was hosted by Dalian University of Foreign Languages in Dalian, Liaoning, China, during October 29–31, 2021.
The annual CCIR conference serves as the major forum for researchers and practitioners from both China and other Asian countries/regions to share their ideas, present new research results, and demonstrate new systems and techniques in the broad field of information retrieval (IR). Since CCIR 2017, the conference has enjoyed contributions spanning the theory and application of IR, both in English and Chinese.
This year we received a total of 124 submissions from both China and other Asian countries. Each submission was carefully reviewed by at least three domain experts, and the Program Committee (PC) chairs made the final decisions. We accepted 72 papers, of which 15 were in English and 57 in Chinese. The final English program of CCIR 2021 thus featured 15 papers.
CCIR 2021 included abundant academic activities. Besides keynote speeches delivered by world-renowned scientists from China and abroad, and the traditional paper presentation and poster sessions, we also hosted a young scientist forum, an evaluation workshop, and tutorials on frontier research topics. We also invited authors from related international conferences (such as SIGIR and CIKM) to share their research results. CCIR 2021 featured four keynote speeches, by James Allan (University of Massachusetts Amherst), Si Wu (Peking University), Jun Wang (UCL), and Zhongyuan Wang (Meituan Inc.).
The conference and program chairs of CCIR 2021 extend their sincere gratitude to
all authors and contributors to this year’s conference. We are also grateful to the PC
members for their reviewing effort, which guaranteed that CCIR 2021 could feature a
quality program of original and innovative research in IR. Special thanks go to our
sponsors for their generosity: Meituan Inc., Huawei Inc., and Baidu Inc.

October 2021 Hong Liu


Jiafeng Guo
Hongfei Lin
Min Zhang
Liang Pang
Organization

General Chairs
Hong Liu Dalian University of Foreign Languages, China
Jiafeng Guo Institute of Computing Technology, Chinese Academy
of Sciences, China

Program Committee Chairs


Hongfei Lin Dalian University of Technology, China
Min Zhang Tsinghua University, China

Proceedings Chairs
Ruihua Qi Dalian University of Foreign Languages, China
Liang Yang Dalian University of Technology, China

Publicity Chair
Zhumin Chen Shandong University, China

Publication Chair
Liang Pang Institute of Computing Technology, Chinese Academy
of Sciences, China

Webmaster
Yuan Lin Dalian University of Technology, China

Youth Forum Chairs


Xin Zhao Renmin University of China, China
Chenliang Li Wuhan University, China

CCIR Cup Chairs


Jianming Lv South China University of Technology, China
Weiran Xu Beijing University of Posts and Telecommunications,
China

Sponsorship Chairs
Qi Zhang Fudan University, China
Zhongyuan Wang Kwai Inc., China

Treasurers
Kan Xu Dalian University of Technology, China
Song Yang Dalian University of Foreign Languages, China

Award Chair
Ru Li Shanxi University, China

Program Committee
Ting Bai Beijing University of Posts and Telecommunications,
China
Fei Cai National University of Defense Technology, China
Jiawei Chen University of Science and Technology of China, China
Xiaoliang Chen Xihua University, China
Yubo Chen Institute of Automation, Chinese Academy of Sciences,
China
Zhumin Chen Shandong University, China
Zhicheng Dou Renmin University of China, China
Yajun Du Xihua University, China
Yixing Fan Institute of Computing Technology, Chinese Academy
of Sciences, China
Shengxiang Gao Kunming University of Science and Technology, China
Zhongyuan Han Foshan University, China
Yanbin Hao University of Science and Technology of China, China
Ben He University of Chinese Academy of Sciences, China
Xiangnan He University of Science and Technology of China, China
Yu Hong Soochow University, China
Zhuoren Jiang Zhejiang University, China
Ting Jin Hainan University, China
Yanyan Lan Tsinghua University, China
Chenliang Li Wuhan University, China
Lishuang Li Dalian University of Technology, China
Ru Li Shanxi University, China
Shangsong Liang Sun Yat-sen University, China
Xiangwen Liao Fuzhou University, China
Hongfei Lin Dalian University of Technology, China
Yuan Lin Dalian University of Technology, China
Chang Liu Peking University, China
Peiyu Liu Shandong Normal University, China

Yue Liu Institute of Computing Technology, Chinese Academy


of Sciences, China
Cheng Luo Tsinghua University, China
Zhunchen Luo PLA Academy of Military Science, China
Jianming Lv South China University of Technology, China
Ziyu Lyu Shenzhen Institute of Advanced Technology,
Chinese Academy of Sciences, China
Weizhi Ma Tsinghua University, China
Jiaxin Mao Tsinghua University, China
Xianling Mao Beijing Institute of Technology, China
Liqiang Nie Shandong University, China
Liang Pang Institute of Computing Technology, Chinese Academy
of Sciences, China
Zhaochun Ren Shandong University, China
Tong Ruan East China University of Science and Technology,
China
Huawei Shen Institute of Computing Technology, Chinese Academy
of Sciences, China
Dawei Song Beijing Institute of Technology, China
Ruihua Song Renmin University of China, China
Xuemeng Song Shandong University, China
Hongye Tan Shanxi University, China
Songbo Tan Beijing Hengchang Litong Investment Management
Co., Ltd., China
Liang Tang Information Engineering University, China
Hongjun Wang TRS Information Technology Co., Ltd., China
Pengfei Wang Beijing Institute of Technology, China
Suge Wang Shanxi University, China
Ting Wang National University of Defense Technology, China
Zhiqiang Wang Shanxi University, China
Zhongyuan Wang Meituan, China
Dan Wu Wuhan University, China
Le Wu Hefei University of Technology, China
Yueyue Wu Tsinghua University, China
Jun Xu Renmin University of China, China
Tong Xu University of Science and Technology of China, China
Weiran Xu Beijing Institute of Technology, China
Hongbo Xu Institute of Computing Technology, Chinese Academy
of Sciences, China
Kan Xu Dalian University of Technology, China
Hongfei Yan Peking University, China
Xiaohui Yan Huawei, China
Muyun Yang Harbin Institute of Technology, China
Zhihao Yang Dalian University of Technology, China
Dongyu Zhang Dalian University of Technology, China
Hu Zhang Shanxi University, China

Min Zhang Tsinghua University, China


Peng Zhang Tianjin University, China
Qi Zhang Fudan University, China
Ruqing Zhang Institute of Computing Technology, Chinese Academy
of Sciences, China
Weinan Zhang Harbin Institute of Technology, China
Ying Zhang Nankai University, China
Yu Zhang Harbin Institute of Technology, China
Chengzhi Zhang Nanjing University of Science and Technology, China
Xin Zhao Renmin University of China, China
Jianxing Zheng Shanxi University, China
Qingqing Zhou Nanjing Normal University, China
Jianke Zhu Zhejiang University, China
Xiaofei Zhu Chongqing University of Technology, China
Zhenfang Zhu Shandong Jiaotong University, China
Jiali Zuo Jiangxi Normal University, China
Contents

Search and Recommendation

Interaction-Based Document Matching for Implicit Search Result Diversification . . . 3
Xubo Qin, Zhicheng Dou, Yutao Zhu, and Ji-Rong Wen

Various Legal Factors Extraction Based on Machine Reading Comprehension . . . 16
Beichen Wang, Ziyue Wang, Baoxin Wang, Dayong Wu, Zhigang Chen, Shijin Wang, and Guoping Hu

Meta-learned ID Embeddings for Online Inductive Recommendation . . . 32
Jingyu Peng, Le Wu, Peijie Sun, and Meng Wang

Modelling Dynamic Item Complementarity with Graph Neural Network for Recommendation . . . 45
Yingwai Shiu, Weizhi Ma, Min Zhang, Yiqun Liu, and Shaoping Ma

NLP for IR

LDA-Transformer Model in Chinese Poetry Authorship Attribution . . . 59
Zhou Ai, Zhang Yijia, Wei Hao, and Lu Mingyu

Aspect Fusion Graph Convolutional Networks for Aspect-Based Sentiment Analysis . . . 74
Fuyao Zhang, Yijia Zhang, Shuo Hou, Fei Chen, and Mingyu Lu

Iterative Strict Density-Based Clustering for News Stream . . . 88
Kaijie Shi, Jiaxin Shi, Yu Zhou, Lei Hou, and Juanzi Li

A Pre-LN Transformer Network Model with Lexical Features for Fine-Grained Sentiment Classification . . . 100
Kaixin Wang, Xiujuan Xu, Yu Liu, and Zhehuan Zhao

Adversarial Context-Aware Representation Learning of Multiword Expressions . . . 112
Bo An

IR in Education

Research on the Evaluation Words Recognition in Scholarly Papers' Peer Review Texts . . . 129
Kun Ding, Xinhang Zhao, Liang Yang, Kaiqiao Wang, and Yuan Lin

Evaluation of Learning Effect Based on Online Data . . . 141
Zhaohui Liu, Hongfei Yan, Chong Chen, and Qi Su

Self-training vs Pre-trained Embeddings for Automatic Essay Scoring . . . 155
Xianbing Zhou, Liang Yang, Xiaochao Fan, Ge Ren, Yong Yang, and Hongfei Lin

Enhanced Hierarchical Structure Features for Automated Essay Scoring . . . 168
Junteng Ma, Xia Li, Minping Chen, and Weigeng Yang

IR in Biomedicine

A Drug Repositioning Method Based on Heterogeneous Graph Neural Network . . . 183
Yu Wang, Shaowu Zhang, Yijia Zhang, Liang Yang, and Hongfei Lin

Auto-learning Convolution-Based Graph Convolutional Network for Medical Relation Extraction . . . 195
Mengyuan Qian, Jian Wang, Hongfei Lin, Di Zhao, Yijia Zhang, Wentai Tang, and Zhihao Yang

Author Index . . . 209


Search and Recommendation
Interaction-Based Document Matching
for Implicit Search Result Diversification

Xubo Qin¹, Zhicheng Dou²(B), Yutao Zhu³, and Ji-Rong Wen²

¹ School of Information, Renmin University of China, Beijing, China
² Gaoling School of Artificial Intelligence, Renmin University of China, Beijing, China
dou@ruc.edu.cn
³ Université de Montréal, Québec, Canada

Abstract. To satisfy the different intents behind the queries issued by users, search engines need to re-rank the search result documents for diversification. Most previous approaches to search result diversification use pre-trained embeddings to represent the candidate documents. These representation-based approaches lose fine-grained matching signals. In this paper, we propose a new supervised framework leveraging interaction-based neural matching signals for implicit search result diversification. Compared with previous works, our proposed framework can capture and aggregate fine-grained matching signals between each candidate document and the selected document sequence, and improve the performance of implicit search result diversification. Experimental results show that our proposed framework significantly outperforms previous state-of-the-art implicit and explicit diversification approaches, and even slightly outperforms ensemble diversification approaches. Moreover, with our proposed strategies, the online ranking latency of our framework is moderate and affordable.

Keywords: Search result diversification · Neural IR · Matching

1 Introduction

Users tend to issue short queries in search engines. These short queries are usually ambiguous or vague [12,16,26,27]. Taking the query "apple" as an example, the actual user intent behind the query can be either the fruit "apple" or "Apple Company". Besides, a user intent can also cover multiple aspects (such as "how to learn JAVA" or "download JAVA IDE" for the intent "JAVA programming language"). To satisfy these diversified user intents, search result diversification is a necessary technology for search engines. Ranking models for search result diversification aim at re-ranking the result documents to satisfy diversified user intents at the top ranking positions. Depending on whether they model the user intent coverage explicitly, previous studies can be categorized
into implicit and explicit diversification methods.

© Springer Nature Switzerland AG 2021
H. Lin et al. (Eds.): CCIR 2021, LNCS 13026, pp. 3–15, 2021.
https://doi.org/10.1007/978-3-030-88189-4_1

The implicit diverse ranking approaches [4,30,31,33,36] focus on capturing the interaction signals between documents and modeling document novelty by the dissimilarity of the documents. On the contrary, the explicit approaches [1,9,14,17,24] tend to explicitly model the coverage of different subtopics. Recently, a group of studies [21,22] has been proposed that models both document interactions and subtopic coverage; these can be treated as ensemble methods. As subtopic mining is itself a very challenging task, in this work we focus on implicit diversification approaches.
Although many implicit methods have been proposed, most of them measure a document's novelty based on the dissimilarity between the candidate document and the selected documents. For example, NTN [31] is a typical implicit method that automatically learns a novelty function based on pre-trained representations (e.g., doc2vec or PLSA) of documents. A main drawback of these methods is that merely computing a document's novelty from pre-trained representations is inaccurate, because unsupervised pre-training methods cannot provide reliable representations, while a document's content usually contains abundant information. Indeed, some studies in ad-hoc ranking [13] have reported that representation-based methods (i.e., directly computing a ranking score based on the representations of queries and documents) often perform worse than interaction-based methods (i.e., constructing term-level matching signals from queries and documents and aggregating them to calculate ranking scores). This result indicates that merely using pre-trained document representations to compute document similarity is suboptimal for search result diversification.
To tackle this problem, in this work we propose conducting term-level interaction between documents to measure their similarity, and we design a new model called MatchingDIV. Our model follows the greedy document selection process widely used in search result diversification. Given a document list, the model iteratively selects a novel document from the list and adds it to the re-ranked list. After all documents are selected, the obtained list is diverse. Specifically, MatchingDIV first encodes the candidate document and all selected documents with a pre-trained language model (e.g., BERT [11]). Then, each selected document interacts with the candidate document at the term level, and the representation of each term in the selected document is updated. By this means, the fine-grained matching information is integrated into the term representations. Next, MatchingDIV applies a recurrent neural network (RNN) to aggregate the term-level representations and calculate a document-level representation. Finally, all document representations are aggregated by another RNN, and the ranking score of the candidate document is computed based on the final representation. To the best of our knowledge, we are the first to consider fine-grained interaction between documents in the search result diversification task. Experimental results show that our proposed framework significantly outperforms state-of-the-art implicit and explicit diversification approaches based on pre-trained document representations.

2 Related Work
We briefly review related work on search result diversification and neural matching in different tasks.
Search Result Diversification. One of the earliest typical diversification models is Maximal Marginal Relevance (MMR) [4]. It compares each candidate document with the selected document sequence, greedily selects the document with the best ranking score, and appends it to the selected sequence. The ranking score of a document is computed based on its relevance to the query and its novelty compared with the selected documents. The "novelty" here is measured by the dissimilarity between documents. The original MMR uses handcrafted features and scoring functions to calculate the similarity, which limits its application. Many approaches extend MMR by using supervised learning methods to learn the features and functions automatically (e.g., SVM-DIV [33], R-LTR [36], PAMM [30], and PAMM-NTN [31]). These methods are called implicit diversification approaches. On the contrary, explicit diversification approaches model the coverage of different user intents (represented as subtopics) by each document. A novel document is expected to cover new user intents that have not been covered by the selected sequence. Several unsupervised and supervised explicit diversification approaches have been proposed, e.g., xQuAD [24], PM2 [9], HxQuAD, HPM2 [14], and DSSA [17]. Recently, a group of new approaches (e.g., DVGAN [21] and DESA [22]) has been proposed as ensemble approaches. They use both implicit inter-document features and explicit subtopic coverage features.
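As an illustration of the greedy selection behind MMR (this sketch is not from the paper; the names `relevance`, `similarity`, and the trade-off `lam` are ours, and the scoring functions are placeholders):

```python
def mmr_select(candidates, relevance, similarity, lam=0.5):
    """Greedy Maximal Marginal Relevance re-ranking.

    candidates: list of document ids
    relevance:  dict doc -> relevance score w.r.t. the query
    similarity: function (doc_a, doc_b) -> similarity in [0, 1]
    lam:        trade-off between relevance and novelty
    """
    selected = []
    remaining = list(candidates)
    while remaining:
        def mmr_score(d):
            # Novelty penalty: similarity to the most similar selected doc.
            penalty = max((similarity(d, s) for s in selected), default=0.0)
            return lam * relevance[d] - (1 - lam) * penalty
        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

With λ = 0.5, a near-duplicate of an already-selected document is pushed down the ranking even if it is highly relevant, which is exactly the novelty effect described above.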
Note that most of these approaches use document representations pre-trained by unsupervised tools, such as doc2vec [20] and LDA [3]. The document similarity is computed as the cosine similarity of two document embeddings. Different from these methods, we represent documents at the term level, based on which an interaction is conducted. Therefore, our method can capture fine-grained matching signals, which are more accurate for computing document similarity.
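For reference, the representation-based similarity used by the prior methods reduces to a cosine over two fixed embedding vectors (a minimal sketch, not the paper's code):

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two document embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)
```

Whatever term-level information the embedding step discarded is invisible to this score, which is the loss of fine-grained signals the paper argues against.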
Neural Matching in Different Tasks. In recent years, researchers have proposed a group of deep learning based relevance matching models for multiple IR tasks. Compared with traditional approaches, these neural matching methods can better measure the semantic similarity between queries and documents. In general, these methods can be divided into two categories: representation-based methods [15,25] and interaction-based methods [8,32]. The representation-based methods use neural networks to generate dense vector representations of the queries and documents and compute their similarity based on the representations. In contrast, the interaction-based methods first capture term-level interaction signals between queries and documents and then aggregate them to compute the similarity. From the perspective of neural matching, the previous diversification approaches can be seen as representation-based approaches. In addition to the ranking task, there is also a group of studies [28,35] leveraging neural models to measure the similarity between a dialogue context and a response candidate, achieving great performance in retrieval-based chatbots. Intuitively, the relationship among context-response sentences is similar to that among selected-candidate documents in implicit search result diversification. Inspired by previous work on multi-turn response selection, we propose an interaction-based method for search result diversification.
[Figure: the MatchingDIV pipeline. The candidate document d and each selected document d_i in [d_1, ..., d_m] are encoded by BERT (Representation); cross-attention between E_d and E_di produces I_di (Interaction); RNNs aggregate terms into v_i and the sequence into v_d (Aggregation); an MLP over the relevance features x_d and v_d outputs the ranking score s (Scoring).]

Fig. 1. The structure of MatchingDIV.

3 Methodology

In this section, we first formulate the search result diversification problem. Then, we introduce the overall structure of our framework and describe the details of each component. Finally, we describe the training and inference process.

3.1 Problem Formulation


The implicit search result diversification task can be described as follows: given a query q and a list of candidate documents D, the diverse ranking task aims to return a new ranked document list R. Here, D is an initial relevance ranking list without diversification. For the diversified list R, both the relevance and the diversity of the documents should be considered. As a greedy selection approach, our framework compares each candidate document d with the selected document sequence C and returns a ranking score s. The document with the highest score is selected and appended to C. Our target is to design a model f that computes the ranking score s for the candidate document d by considering its relevance to the query q and its novelty with respect to C. This process can be formulated as:

s = f(q, d, C).    (1)
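The greedy selection process around Eq. (1) can be sketched as follows (an illustration only; `f` stands in for the learned scoring model, and the function name is ours):

```python
def greedy_diversify(query, candidates, f):
    """Greedy re-ranking: repeatedly score each remaining candidate
    against the already-selected sequence C via s = f(q, d, C) and
    append the highest-scoring document, as in Eq. (1)."""
    C = []
    remaining = list(candidates)
    while remaining:
        best = max(remaining, key=lambda d: f(query, d, C))
        C.append(best)
        remaining.remove(best)
    return C
```

Because f is re-evaluated after every selection, a document's score can drop once similar documents enter C, which is what produces the diversified ordering.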



3.2 MatchingDIV

In implicit search result diversification methods, a document's novelty is measured by its dissimilarity with other documents. Therefore, how to calculate the similarity is crucial. Existing methods usually compute cosine similarity based on pre-trained document representations, but it is difficult to capture accurate matching signals merely from these representations. In this work, we propose an interaction-based document matching framework, called MatchingDIV. As shown in Fig. 1, our framework first represents each term of a document with a pre-trained language model. Then, we design a cross-attention mechanism to model the interaction based on the two documents' representations. Since this operation is conducted on term-level representations, our method can capture fine-grained matching signals. The details of our framework are introduced as follows.
Document Representation. Given the recent progress of contextualized language models, we use BERT [11] to generate the term representations of a document:

E_d = Linear(Norm(BERT([D]))),    (2)

where E_d ∈ R^{l_d × h}, and l_d is the length of the document. [D] denotes the word pieces of the document after tokenization, and Norm(·) denotes the normalization operation. "BERT" denotes a BERT-like encoder, which can be replaced by other pre-trained models such as DistilBERT [23] or ELECTRA [6]. Following previous work [18], we apply a linear projection layer to compress the term representations into h dimensions to reduce the storage cost.
Interaction via Cross-Attention. To capture fine-grained matching signals, we use cross-attention as the interaction function to let each document in the selected document sequence interact with the candidate document. Similar to the self-attention widely used in Transformer-based models [11,29], the cross-attention operation is based on multi-head attention (MHA):

Attn(q, K, V) = Softmax(qK^T / √d)V,    (3)
MHA(q, K, V) = [a_1; . . . ; a_h],    (4)
a_i = Attn(qW_i^Q, KW_i^K, VW_i^V), i ∈ [1, h].    (5)

Due to space limitations, we omit the details of multi-head attention, which can be found in [29]. For a selected document d_i in the sequence C and the candidate document d, the interacted representation of d_i is defined as:

I_di = MHA(E_di, E_d, E_d).    (6)

With the cross-attention mechanism, the representation of each term in the selected document d_i is enhanced by a weighted sum of the representations of d. The similarity information is thus injected into the representation, so that the interacted representation I_di captures the term-level matching signals between d and d_i.
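A single attention head of Eqs. (3)-(6) can be sketched in NumPy as follows (an illustrative reimplementation, not the paper's code; Eq. (4) would concatenate h such heads, and the projection matrices here are randomly named stand-ins for W_i^Q, W_i^K, W_i^V):

```python
import numpy as np

def cross_attention_head(E_di, E_d, W_q, W_k, W_v):
    """One head of cross-attention: terms of the selected document d_i
    (queries) attend over terms of the candidate document d (keys/values).
    Shapes: E_di is (l_i, h), E_d is (l_d, h), each W_* is (h, h_head)."""
    q = E_di @ W_q                               # (l_i, h_head)
    k = E_d @ W_k                                # (l_d, h_head)
    v = E_d @ W_v                                # (l_d, h_head)
    scores = q @ k.T / np.sqrt(q.shape[-1])      # scaled dot product, Eq. (3)
    scores -= scores.max(axis=-1, keepdims=True) # numerically stable softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v                           # (l_i, h_head)
```

Each row of the output is a weighted sum of candidate-document term vectors, so every term of d_i carries a term-level matching signal against d, as described above.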
Matching Signals Aggregation. After obtaining the enhanced representation of each term in the selected document d_i, the next question is how to aggregate them into an integrated representation of d_i. Here, we apply an RNN. Considering I_di = {T_{i,1}, . . . , T_{i,l_di}}, where T_{i,j} is the enhanced representation of the j-th term in d_i, the hidden state h_t of the RNN is computed as:

h_{i,t} = tanh(W_i [h_{i,t−1}; T_{i,t}] + b_i), t ∈ [1, l_di].    (7)

The last hidden state h_{i,l_di} is used as the integrated representation of the document d_i. To simplify the notation, we write v_i = h_{i,l_di}; it contains the matching signals between the selected document d_i and the candidate document d.

Afterwards, having obtained the integrated representations of all selected documents, we employ another RNN to aggregate the information of the whole selected document sequence:

h_{d,k} = tanh(W_d [h_{d,k−1}; v_{k−1}] + b_d), k ∈ [1, |C|].    (8)

We use the last hidden state h_{d,|C|} to represent the selected document sequence. This vector contains the matching information between the candidate document and the selected document sequence, and is denoted as v_d for simplicity. Note that in practice, we use GRU cells for all RNNs.
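The aggregation of Eq. (7) (and, with v_k inputs, Eq. (8)) is a vanilla tanh RNN whose last hidden state serves as the summary vector. A minimal NumPy sketch, written in the vanilla-RNN form of the equations rather than with the GRU cells used in practice:

```python
import numpy as np

def rnn_aggregate(T, W, b):
    """Vanilla tanh RNN of Eq. (7): h_t = tanh(W [h_{t-1}; T_t] + b).
    T: (seq_len, input_dim) sequence of (term or document) vectors.
    W: (hidden_dim, hidden_dim + input_dim) weight matrix.
    b: (hidden_dim,) bias.
    Returns the last hidden state, used as the summary vector (v_i or v_d)."""
    hidden_dim = W.shape[0]
    h = np.zeros(hidden_dim)
    for t in range(T.shape[0]):
        h = np.tanh(W @ np.concatenate([h, T[t]]) + b)
    return h
```

Applying `rnn_aggregate` first over the terms of each I_di, and then over the resulting document vectors v_1, ..., v_m, mirrors the two-level aggregation described above.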
Ranking Score. Inheriting the spirit of MMR [4], the final ranking score is cal-
culated based on both the relevance and the novelty. For a candidate document
d, its ranking score s is calculated as:

s = MLP(ReLU(MLP([xd ; vd ]))), (9)

where MLP(·) is a multi-layer perceptron, ReLU(·) is the ReLU activation function,
and [· ; ·] is the concatenation operation. x_d is a group of relevance features of d regarding
the query q. Following previous studies [17,21,22], we use some traditional IR
features, such as BM25 and TF-IDF, to measure the relevance. For each ranking
position, our model greedily selects the best document with the highest score s.
When a document is selected, it will be added to C. This process will be repeated
until all the documents are selected.

3.3 Model Training and Inference

Loss Function. In the process of training, we use the sum of all the documents’
ranking score si as the score sr of a given ranking sequence r. Following previous
work [17,22], we apply a list-pairwise sampling approach to generate training
samples in limited datasets. With the positive and negative ranking pair (r1 , r2 ),

the loss function for list-pairwise samples is defined as a binary classification log-loss:

L = − Σ_{q∈Q} Σ_{s∈S_q} |ΔM| [y_s log(P(r_1, r_2)) + (1 − y_s) log(1 − P(r_1, r_2))].        (10)

Here |ΔM| = |M(r_1) − M(r_2)|, and P(r_1, r_2) = σ(s_r1 − s_r2), where σ(·) is
the sigmoid function. Due to space limitation, we omit the detailed introduction
of the list-pairwise sampling method; more details can be found in [17].
MatchingDIV is optimized in an end-to-end manner, where the BERT encoder
is fine-tuned, and the other components are trained from scratch.
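A sketch of the loss of Eq. (10) for a single sample pair, assuming y_s = 1 when r1 has the higher metric value M; the explicit negation makes it a standard log-loss to be minimized:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def list_pairwise_loss(s_r1, s_r2, M_r1, M_r2):
    """Per-sample sketch of Eq. (10): a binary log-loss weighted by the
    metric gap |ΔM|, where y = 1 iff r1 has the higher metric value."""
    y = 1.0 if M_r1 > M_r2 else 0.0
    p = sigmoid(s_r1 - s_r2)        # P(r1, r2)
    delta = abs(M_r1 - M_r2)        # |ΔM|
    eps = 1e-12                     # numerical guard for log
    return delta * -(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))

# r1 is the better ranking (higher metric) and the model scores agree,
# so the loss is small; metric values here are illustrative.
print(float(list_pairwise_loss(2.0, 0.5, 0.46, 0.40)))
```

The |ΔM| weight means pairs whose rankings differ more in metric value contribute more to the gradient, while near-ties are down-weighted.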
Reducing Online Inference Latency. Since MatchingDIV leverages BERT,
a large model, to encode the documents, we propose two strategies to reduce
the ranking latency in online ranking tasks.

(1) Late-interaction Strategy. The original design of BERT suggests concatenating
    two documents into a long sequence and modeling their relationship through
    the first special token. However, in our task, it is impractical to concatenate
    each selected document with the candidate document in an online scenario due
    to the high computation cost. Therefore, following the previous work [18],
    we apply the late-interaction strategy to decouple the encoding and interaction
    of the documents, so that the document representations can be pre-computed
    and stored offline. As a result, the online inference latency of computing the
    document representations can be eliminated.
(2) Ranking-top Strategy. In MatchingDIV, the computational cost increases
    with the length of the selected document sequence C. Hence, we propose
    a ranking-top strategy to reduce the computational cost of online document
    interactions. For an initial ranking list D with m documents, MatchingDIV
    takes all the m candidate documents as input and returns m ranking scores.
    Then, MatchingDIV greedily selects the best document and iteratively adds
    it to the ranking sequence R. When |R| reaches the maximum number n
    (n < m), the ranking process stops early, and all remaining candidate
    documents in D are directly appended to R. In other words, with
    the ranking-top strategy, only the top n documents in R are re-ranked for
    diversity. The computational cost of document interactions between selected
    and candidate documents can thus be reduced. In practical applications,
    search result diversification aims to satisfy user intents at the top ranking
    positions, so it is unnecessary to spend much time on the lower positions.
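The ranking-top strategy amounts to a greedy loop with an early stop after n positions; the score function below is a hypothetical stand-in for the model's output, and the initial list is illustrative:

```python
def rerank_top_n(D, score_fn, n):
    """Ranking-top sketch: greedily diversify only the first n positions,
    then append the remaining candidates in their initial order."""
    R, remaining = [], list(D)
    while remaining and len(R) < n:
        scores = [score_fn(d, R) for d in remaining]
        R.append(remaining.pop(scores.index(max(scores))))
    return R + remaining   # the tail keeps the initial ranking order

D = ["a", "b", "c", "d", "e"]
# Toy score that simply prefers the initial order.
score = lambda d, R: -D.index(d)
print(rerank_top_n(D, score, n=2))  # ['a', 'b', 'c', 'd', 'e']
```

Only the first n iterations pay the cost of document-document interactions; the rest of the list is emitted unchanged, matching the observation that diversity matters most at the top ranks.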

4 Experiment Settings
4.1 Datasets and Metrics
We use the Web Track dataset from TREC 2009 to 2012 with 198 queries in total.
Queries #95 and #100, which have no diversity judgments, are excluded. Each query
includes 3 to 8 user intent annotations, and the relevance rating is marked as

relevant or irrelevant at intent level. We use the preprocessed relevance feature


data provided by Jiang et al. [17] on GitHub1 . The data include 18 relevance
features for each query and subquery generated by traditional IR models. More
details about those features can be found in [17]. The title (if available) and
content are concatenated together for tokenization, and we only use the first
l_d = 80 terms of each document, since full documents are usually too long for
document interaction.
The evaluation metrics in our experiments are the official Web Track diversity
metrics, including α-nDCG [7], ERR-IA [5], and NRBP [2]. Similar to previous
work [17,30,31,36], we also apply the metrics of Precision-IA [1] (denoted as Pre-
IA) and Subtopic Recall [34] (denoted as S-rec). We use the top-50 documents
of Indri initial rankings as inputs, and all those metrics are computed on the
top 20 results of the diversified ranking lists. Two-tailed paired t-tests are used
to conduct significance testing with p-value < 0.05. In the significance testing,
MatchingDIV is compared with PAMM-NTN as the state-of-the-art supervised
implicit model and DSSA as the best explicit model.
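For reference, the α-DCG part of the main metric can be sketched as below, with α commonly set to 0.5; normalization by the ideal ranking, which yields α-nDCG, is omitted for brevity, and the toy judgments are illustrative:

```python
import math

def alpha_dcg(ranking, judgments, alpha=0.5, k=20):
    """α-DCG sketch [7]: a document's gain for each intent it covers is
    discounted by (1 - α) for every earlier document already covering
    that intent, then position-discounted by log2(rank + 1)."""
    seen = {}   # intent -> number of times covered so far
    dcg = 0.0
    for rank, doc in enumerate(ranking[:k], start=1):
        intents = judgments.get(doc, ())
        gain = sum((1 - alpha) ** seen.get(i, 0) for i in intents)
        for i in intents:
            seen[i] = seen.get(i, 0) + 1
        dcg += gain / math.log2(rank + 1)
    return dcg

# Toy judgments: d1 covers intents {1, 2}; d2 covers {1}; d3 covers nothing.
J = {"d1": {1, 2}, "d2": {1}}
print(alpha_dcg(["d1", "d2", "d3"], J))
```

The (1 − α) discount is what makes the metric reward novelty: d2 earns only half the gain for intent 1 because d1 already covered it.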

4.2 Model Settings

In the training phase, we use 5-fold cross validation to tune the parameters in
all experiments, with the widely used α-nDCG@20 as the target metric. In each
fold, there are 160 queries for training and 40 queries for testing. In our
experiments, the hidden size of the GRU is 128, and the BERT-based embeddings
are compressed into 128 dimensions. The batch size is 32. We use the Adam [19]
optimizer. The learning rate of the BERT encoder is 3e−5, while that of other
network components is 1e−3.
We compare MatchingDIV with baselines including:

(1) Non-diversified approaches: Lemur, ListMLE. These two ad-hoc ranking
    methods do not consider diversity.
(2) Explicit diversification methods: xQuAD [24], PM2 [9], TxQuAD,
TPM2 [10], HxQuAD, HPM2 [14]. These are representative unsupervised
explicit methods. DSSA [17] is a supervised method, which models the diver-
sity of the documents with subtopic attention using RNNs. This is the state-
of-the-art explicit diversification method. Note that our method uses BERT
as the document encoder, and we also equip DSSA with BERT and denote
this variant as DSSA (BERT) for a fair comparison.
(3) Implicit diversification methods: R-LTR [36], PAMM [30], NTN [31]. They
are representative supervised implicit methods. The neural tensor network
(NTN) is used on both R-LTR and PAMM, denoted as R-LTR-NTN and
PAMM-NTN, respectively.
(4) Ensemble methods: DESA [22] and DVGAN [21]. They are two ensemble
methods that use both explicit (subtopic) features and implicit (document
similarity) features.

1 https://github.com/jzbjyb/DSSA.

5 Experimental Results
5.1 Overall Results and Discussion

Table 1 shows the results of all models. We can observe: (1) MatchingDIV out-
performs all the implicit and explicit baseline models, and the improvement is
statistically significant (with p-value < 0.05) on all the metrics except for Pre-IA.
These results clearly demonstrate the effectiveness of our proposed MatchingDIV.
(2) Compared with those approaches based on pre-trained document embed-
dings, our framework can capture and aggregate the fine-grained matching sig-
nals between selected and candidate documents, thus improving the performance
of search result diversification. (3) Intriguingly, MatchingDIV, as an implicit
method without using subtopic coverage, can perform slightly better than the
ensemble approaches (DVGAN and DESA). This reflects that our proposed
interaction-based document matching is very effective. Besides, this result also
implies the advantage of enhancing the relevance matching component for diver-
sification. (4) Pre-trained language models (such as BERT) are reported to have
great capability of representation. By integrating it into the baseline DSSA, we
see a slight performance improvement. However, there is still a large gap between

Table 1. Performance of all approaches. The baselines include: (1) non-diversified meth-
ods; (2) explicit methods; (3) implicit methods; and (4) ensemble methods. The best
results are in bold. † indicates that our model significantly outperforms all implicit and
explicit approaches (p-value <0.05 in two-tailed paired t-test).

Methods ERR-IA α-nDCG NRBP Pre-IA S-rec


(1) Lemur .271 .369 .232 .153 .621
(1) ListMLE .287 .387 .249 .157 .619
(2) xQuAD .317 .413 .284 .161 .622
(2) TxQuAD .308 .410 .272 .155 .634
(2) HxQuAD .326 .421 .294 .158 .629
(2) PM2 .306 .411 .267 .169 .643
(2) TPM2 .291 .399 .250 .161 .639
(2) HPM2 .317 .420 .279 .172 .645
(2) DSSA (doc2vec) .350 .452 .318 .184 .645
(2) DSSA (BERT) .352 .457 .319 .181 .656
(3) R-LTR .303 .403 .267 .164 .631
(3) PAMM .309 .411 .271 .168 .643
(3) R-LTR-NTN .312 .415 .272 .166 .644
(3) PAMM-NTN .311 .417 .272 .170 .648
(3) MatchingDIV (Ours) .366† .467† .334† .185 .659†
(4) DVGAN .367 .465 .334 .175 .660
(4) DESA .363 .464 .332 .184 .653

the performance of DSSA (BERT) and that of our proposed MatchingDIV. This
indicates that the better performance we obtained is not merely due to BERT
embeddings but also to the proposed interaction-based document matching.

Table 2. Effects of different pretrained models

Settings ERR-IA α-nDCG NRBP Pre-IA S-rec


Electra-base-discriminator .366 .467 .334 .185 .659
Distilbert-base-uncased .360 .463 .329 .182 .655
Bert-base-uncased .363 .465 .332 .184 .659

5.2 Effect of Different Encoders


We further investigate the effect of different model settings in MatchingDIV.
Specifically, we try other pretrained models provided by Huggingface2 as the
document encoder. The following models are tested: the basic BERT model “bert-
base-uncased”, the DistilBERT [23] model “distilbert-base-uncased”, and the
ELECTRA [6] model “electra-base-discriminator”. Results are shown in Table 2.
The results show that the BERT-base model performs slightly better than
DistilBERT, while ELECTRA achieves the best performance. It is worth
noting that all these variants achieve better performance than existing baseline
methods. This further validates the effectiveness of our proposed interaction-
based document matching method.

Table 3. Results of average online ranking time per query.

Setting Average time online (ms) α-nDCG@20


n=5 19 .458
n = 10 45 .464
n = 20 113 .466
n = 50 295 .466

5.3 Inference Latency for Online Ranking

As we introduced in Sect. 3.3, the inference latency is very important when apply-
ing a diversification model in practice. On the one hand, MatchingDIV employs

2 https://github.com/huggingface/transformers.

the late-interaction mechanism, allowing the documents to be encoded offline.
Therefore, the computational time of encoding the documents into term-level
embeddings can be omitted. On the other hand, with the ranking-top strategy, Match-
ingDIV only generates the top n documents of the diversified ranking list R.
To investigate the effect of this strategy, we set n = {5, 10, 20, 50} and test the
model’s inference time and corresponding performance in terms of α-nDCG@20.
Experimental results are shown in Table 3.
From these results, we find that the average online inference time grows
approximately linearly with n. The performance also improves from n = 5 to
n = 20. After that, at n = 50, the average inference time keeps increasing, but
the performance no longer improves. This is because the evaluation metric
α-nDCG@20 only considers the diversification of the top 20 documents. Indeed,
this result is consistent with the goal of search result diversification: a diverse
ranking model aims at satisfying user intents at the top ranking positions rather
than spending time on the lower ones. When n = 20, the performance is good,
and the online ranking latency is
113 ms, demonstrating that our framework is effective and efficient.

6 Conclusion and Future Work

In this work, we proposed a supervised framework, MatchingDIV, for integrating
interaction-based document matching into implicit search result diversification.
Based on BERT-based term embeddings of each document, MatchingDIV used
cross-attention and GRUs to capture and aggregate low-level matching signals
between selected documents and candidate documents. Compared with previous
work, we are among the first to capture fine-grained term-level matching
signals for document selection in search result diversification. Experimental results
showed that our framework significantly outperforms the previous state-of-the-art
explicit and implicit diversification methods, and even outperforms ensemble
diversification frameworks. Furthermore, with late-interaction and our proposed
ranking-top strategy, the online ranking latency is affordable for actual search
engines. These results demonstrate the advantage of employing interaction-based
document matching for diversification tasks. As future work, we plan to also
integrate query-document interactions, which may bring further improvement
for diverse ranking tasks.

Acknowledgments. This work was supported by National Natural Science Founda-


tion of China No. 61872370 and No. 61832017, and Beijing Outstanding Young Scientist
Program No. BJJWZYJH012019100020098. We thank all the anonymous reviewers for
their insightful comments.

References
1. Agrawal, R., Gollapudi, S., Halverson, A., Ieong, S.: Diversifying search results.
In: WSDM (2009)

2. Baeza-Yates, R., Hurtado, C., Mendoza, M.: Query recommendation using query
logs in search engines. In: Lindner, W., Mesiti, M., Türker, C., Tzitzikas, Y., Vakali,
A.I. (eds.) EDBT 2004. LNCS, vol. 3268, pp. 588–596. Springer, Heidelberg (2004).
https://doi.org/10.1007/978-3-540-30192-9 58
3. Blei, D.M., Ng, A.Y., Jordan, M.I.: Latent Dirichlet allocation. J. Mach. Learn.
Res. 3, 993–1022 (2003)
4. Carbonell, J.G., Goldstein, J.: The use of MMR, diversity-based reranking for
reordering documents and producing summaries. In: SIGIR (1998)
5. Chapelle, O., Metlzer, D., Zhang, Y., Grinspan, P.: Expected reciprocal rank for
graded relevance. In: CIKM. ACM (2009)
6. Clark, K., Luong, M., Le, Q.V., Manning, C.D.: ELECTRA: pre-training text
encoders as discriminators rather than generators. In: ICLR. OpenReview.net
(2020)
7. Clarke, C.L.A., et al.: Novelty and diversity in information retrieval evaluation. In:
SIGIR. ACM (2008)
8. Dai, Z., Xiong, C., Callan, J., Liu, Z.: Convolutional neural networks for soft-
matching n-grams in ad-hoc search. In: WSDM. ACM (2018)
9. Dang, V., Croft, W.B.: Diversity by proportionality: an election-based approach
to search result diversification. In: SIGIR (2012)
10. Dang, V., Croft, W.B.: Term level search result diversification. In: SIGIR (2013)
11. Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidi-
rectional transformers for language understanding. In: NAACL-HLT. Association
for Computational Linguistics (2019)
12. Dou, Z., Song, R., Wen, J.: A large-scale evaluation and analysis of personalized
search strategies. In: WWW. ACM (2007)
13. Guo, J., et al.: A deep look into neural ranking models for information retrieval.
Inf. Process. Manag. 57(6), 102067 (2020)
14. Hu, S., Dou, Z., Wang, X., Sakai, T., Wen, J.: Search result diversification based
on hierarchical intents. In: CIKM (2015)
15. Huang, P., He, X., Gao, J., Deng, L., Acero, A., Heck, L.P.: Learning deep struc-
tured semantic models for web search using clickthrough data. In: CIKM. ACM
(2013)
16. Jansen, B.J., Spink, A., Saracevic, T.: Real life, real users, and real needs: a study
and analysis of user queries on the web. Inf. Process. Manag. 36(2), 207–227 (2000)
17. Jiang, Z., Wen, J., Dou, Z., Zhao, W.X., Nie, J., Yue, M.: Learning to diversify
search results via subtopic attention. In: SIGIR (2017)
18. Khattab, O., Zaharia, M.: ColBERT: efficient and effective passage search via
contextualized late interaction over BERT. In: SIGIR. ACM (2020)
19. Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. In: Bengio, Y.,
LeCun, Y. (eds.) ICLR (2015)
20. Le, Q., Mikolov, T.: Distributed representations of sentences and documents. In:
ICML 2014 (2014)
21. Liu, J., Dou, Z., Wang, X., Lu, S., Wen, J.: DVGAN: a minimax game for search
result diversification combining explicit and implicit features. In: SIGIR (2020)
22. Qin, X., Dou, Z., Wen, J.: Diversifying search results using self-attention network.
In: CIKM. ACM (2020)
23. Sanh, V., Debut, L., Chaumond, J., Wolf, T.: DistilBERT, a distilled version of
    BERT: smaller, faster, cheaper and lighter. CoRR abs/1910.01108 (2019)
24. Santos, R.L.T., Macdonald, C., Ounis, I.: Exploiting query reformulations for web
search result diversification. In: WWW (2010)

25. Shen, Y., He, X., Gao, J., Deng, L., Mesnil, G.: Learning semantic representations
using convolutional neural networks for web search. In: WWW. ACM (2014)
26. Silverstein, C., Henzinger, M.R., Marais, H., Moricz, M.: Analysis of a very large
web search engine query log. In: SIGIR Forum, vol. 33, no. 1 (1999)
27. Song, R., Luo, Z., Wen, J., Yu, Y., Hon, H.: Identifying ambiguous queries in web
search. In: WWW. ACM (2007)
28. Tao, C., Wu, W., Xu, C., Hu, W., Zhao, D., Yan, R.: Multi-representation fusion
network for multi-turn response selection in retrieval-based chatbots. In: WSDM
(2019)
29. Vaswani, A., et al.: Attention is all you need. In: NeurIPS (2017)
30. Xia, L., Xu, J., Lan, Y., Guo, J., Cheng, X.: Learning maximal marginal relevance
model via directly optimizing diversity evaluation measures. In: SIGIR (2015)
31. Xia, L., Xu, J., Lan, Y., Guo, J., Cheng, X.: Modeling document novelty with
neural tensor network for search result diversification. In: SIGIR (2016)
32. Xiong, C., Dai, Z., Callan, J., Liu, Z., Power, R.: End-to-end neural ad-hoc ranking
with kernel pooling. In: SIGIR. ACM (2017)
33. Yue, Y., Joachims, T.: Predicting diverse subsets using structural SVMs. In: ICML.
ACM International Conference Proceeding Series, vol. 307 (2008)
34. Zhai, C., Cohen, W.W., Lafferty, J.D.: Beyond independent relevance: methods
and evaluation metrics for subtopic retrieval. In: SIGIR (2003)
35. Zhou, X., et al.: Multi-turn response selection for chatbots with deep attention
matching network. In: ACL (2018)
36. Zhu, Y., Lan, Y., Guo, J., Cheng, X., Niu, S.: Learning for search result diversifi-
cation. In: SIGIR (2014)
Various Legal Factors Extraction Based
on Machine Reading Comprehension

Beichen Wang1, Ziyue Wang1(B), Baoxin Wang1, Dayong Wu1,
Zhigang Chen1, Shijin Wang1,2, and Guoping Hu1
1 State Key Laboratory of Cognitive Intelligence, iFLYTEK Research, Beijing, China
{zywang27,bxwang2,dywu2,zgchen,sjwang3,gphu}@iflytek.com
2 iFLYTEK AI Research (Hebei), Langfang, China

Abstract. With the rapid growth of legal cases, professionals are under
pressure to go through lengthy documents and grasp informative pieces
of text in limited time. Most of the existing techniques focus on
simple legal information retrieval tasks, such as the name or address of the
prosecutor or the defendant, which can be easily accomplished with the
help of handcrafted patterns or sequence labeling methods. Yet compli-
cated texts challenge such pattern-based methods and sequence
labeling approaches. These texts state the same facts or describe the
same events, but they do not share common or similar patterns. In
this paper, we design a unified framework to extract legal information
in various formats, including directly extracted information (a span of
text) and information that needs to be deduced. The framework follows
(MRC) tasks. We treat the extraction fact labels as the counterpart of
questions in MRC task and propose several strategies to represent them.
We construct several datasets regarding different cases for training and
testing. Our best strategy achieves up to 4% enhancement in F1 score
on each dataset compared to the MRC baseline.

Keywords: Information extraction · Machine reading comprehension ·
Legal information

1 Introduction

Nowadays, driven by increasingly complicated legal provisions and cases, both


ordinary parties and legal workers are eager to use technical means to assist in
analysis. In the process of assisting judicial work, it is an indispensable ability to
extract various forms of required information efficiently and correctly. The infor-
mation is sometimes a piece of text in the document, such as event descriptions,
actions, and entity names, or a conclusion that is not directly stated and needs
to be deduced from the original text. Such information is usually crucial to the final
sentence, and these pieces are called legal factors in the legal industry. Legal factors are
closely cohered with legal case types and each type has a fixed factor list. Given
the case type, judges or other legal workers are clear of what information to seek
© Springer Nature Switzerland AG 2021
H. Lin et al. (Eds.): CCIR 2021, LNCS 13026, pp. 16–31, 2021.
https://doi.org/10.1007/978-3-030-88189-4_2

from the document. However, different forms of legal factors cannot be easily
extracted by a single model simultaneously using existing techniques. Thus, we
hope to build a unified framework to extract all factors regarding the case type.
Generally, information extraction task [6] extracts entities (person, organi-
zation, etc.) and facts (relations, events, etc.) from given texts [16], helping to
acquire the desired information and reconstruct massive contents. This is usually
done by sequence labeling models, such as Lattice LSTM [18] and transformer-
based models [9,17]. They conduct experiments on the Chinese named entity recog-
nition (NER) task and achieve high performance on benchmarks such as Chinese
NER MSRA [8], OntoNotes 4.0, and Resume NER [18]. However, they fail to
maintain competitive results in our task. Our extraction targets are of multiple
forms, including spans and deductions. Spans are often entity names and event
descriptions, while deductions include answers to predefined legal conditions,
for example, whether the document agrees (answers “Yes”) or opposes (answers
“No”) to a given condition. These challenge the sequence labeling models since
the answers are not spans from the context. Inspired by MRC tasks [7,12,13],
whose question types are similar to ours, for example, asking for a named entity
(span) and deducing a piece of opinion (“Yes” or “No”), we can employ such
ideas to solve the legal factor extraction task. The difference lies in that the
questions of MRC tasks are arbitrary, but the extraction fact labels (the coun-
terpart to questions in MRC) are fixed along with the case type. We pre-define
these fixed legal factors to a given case as a set of extraction fact labels.
In this paper, we design a unified framework, particularly for multi-formed
legal factor extraction. It follows the MRC methodology to encode and interact
with the document and the extraction fact labels. Some researchers proved that
more specified and informative queries could improve the extraction precision
[10]. As our extraction fact labels are fixed, to achieve better performance, we
expand the labels to obtain more specified expressions with query expansion
components before the extraction. Our framework solves the problem discussed
above and our best strategy achieves 4% improvement compared to the baseline.
Our contributions are as follows: i. We achieve automatic extraction of various
forms of legal factors with a single model. ii. We propose a strategy to represent
legal factors that outperforms the others.

2 Related Works
Some existing works investigate the feasibility of using MRC approaches to
solve information extraction tasks. Li et al. [10] use an MRC model to
solve the named entity recognition task and achieve good results in nested
entity recognition. For each additional nested entity, one more question needs to
be answered, and the reading comprehension model is designed to handle such
question-answering tasks. They suppose that the reading comprehension model is
a natural solution to the nested entity problem. Similarly, in our task, informa-
tion on factors is often nested or overlapped, so we can exploit the MRC framework
for legal factor extraction.
Query expansion is comprehensively used because of its simplicity and prac-
ticality. There are several ways to achieve query expansion. The IBM algorithm in

the machine translation model has been migrated to rewrite queries directly [5].
Later work finds that directly rewriting queries is too coarse, and instead uses a
Seq2Seq model to incorporate richer semantic information, with reinforcement
learning applied to fine-tune the rewriting model [1]. There are also other ideas,
such as mining various concept words from a large amount of query click data,
and further associating the concept words with a knowledge graph in order
to replace the simple query clustering scheme [11].

3 The Legal Factor Extraction Framework

3.1 Architecture of the Framework

Primarily, we introduce two important concepts: legal factors and the legal fac-
tor extraction task. As mentioned in Sect. 1, a legal factor is a jurisprudence
concept referring to the key information that affects the final sentencing, which
includes the descriptions of a judicial event, such as persons, actions, causes,
and consequences, and the conclusions to predefined judicial conditions. We
define the legal factor extraction task as the procedure of finding the text stat-
ing such information in the original paragraphs. Our framework consists of three
parts: embedding of input documents and extraction fact labels, interactive
encoding of documents and labels, and answer prediction. The overall structure
is shown in Fig. 1, and Fig. 2 gives an example of how our framework actually
works.

Fig. 1. Architecture of the framework. All the four forms of extraction fact labels can
be fed into the encoder independently for extraction. The dotted arrows refer to the
semantic enhancement.

In the embedding part, the inputs are the case descriptions and extrac-
tion targets. Note that the case type is an inherent attribute of a case
description, and the targets of extraction are determined by the case
type. We design different representations of the extraction fact labels and
gradually enrich their semantics. First, we employ special tokens ([Token1],
[Token2], etc.) as the targets. These tokens function as signals prompting the
model to extract different legal factors. Then, we represent the labels by
the text of the legal factors, such as “the injury level” and “course/means”.
Further, the legal factors are expanded into more specific ques-
tions, for instance, “What is the victim's injury level?” and
“Did the defendant use tools?”. Finally, the questions are
reinforced to form longer questions via query expansion methods. We will discuss
the different extraction targets in the following subsection. The remaining parts interac-
tively encode documents and extraction fact labels with multi-head self-attention
[15], and predict the results according to the types of extraction fact labels. If a
question is unanswerable, an empty string is returned. Otherwise, the model answers “Yes” or “No”
to a yes/no question, or returns the extraction result through the start
and end indexes in the input passage.
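The prediction step can be sketched as follows; the four-way answer-type head and the logit shapes are illustrative assumptions, not the exact architecture of the paper:

```python
import numpy as np

def decode_answer(type_logits, start_logits, end_logits, tokens):
    """Prediction sketch: first choose among unanswerable / yes / no / span,
    then decode the span from start and end indexes if needed."""
    kinds = ["unanswerable", "yes", "no", "span"]
    kind = kinds[int(np.argmax(type_logits))]
    if kind == "unanswerable":
        return ""                    # empty string for unanswerable factors
    if kind in ("yes", "no"):
        return kind.upper()          # deduced answer to a yes/no condition
    start = int(np.argmax(start_logits))
    end = start + int(np.argmax(end_logits[start:]))  # constrain end >= start
    return " ".join(tokens[start:end + 1])

tokens = ["the", "injury", "was", "minor"]
ans = decode_answer([0.1, 0.2, 0.1, 0.9],
                    [0.1, 0.9, 0.0, 0.0],
                    [0.0, 0.1, 0.0, 0.8],
                    tokens)
print(ans)  # "injury was minor"
```

This dispatch is what lets a single model cover both span-type factors and deduced yes/no factors.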

Fig. 2. Examples of the workflow (Left: Answer as span. Right: Answer as “YES”).
We do not translate sentences after data augmentation (DA) because DA is usually a
paraphrase of the original question, and the English version could remain unchanged.

3.2 Strategies for Representing Extraction Fact Labels

The Special Tokens. To verify our hypothesis that the different forms of legal
factors can be extracted by a unified framework, we represent each extraction
fact label as a different [Token] rather than using the actual text of the factor. They

act as signals guiding the model to extract designated content. Specifically, we set a
series of special tokens according to the legal factors that need to be extracted,
where the factors are one-to-one mapped to the tokens. This is equivalent to a cipher
or semaphore: when the model encounters a special token, it learns to extract
a specific kind of information. Starting from the special tokens, we gradually
enrich the semantics in the remaining representations of extraction fact labels.

From Factor to Question. Our produced datasets are dedicated to specific types
of cases, and the information that needs to be extracted from each dataset is
summarized according to the experience of the judiciary (legal factors). Tak-
ing the dataset of traffic accidents as an example, the factors are “time of the
accident”, “site of the accident”, “division of responsibilities”, “the loss”, etc.
These factors are the counterpart of questions in typical question-answering
datasets such as SQuAD [13], SQuAD 2.0 [12], and CJRC [4]. Although the
factors are not in the form of interrogative sentences, they contain all the core
information needed for extraction. Hence, we use the factors as another form of
extraction fact labels.
Nevertheless, the factors are not written in a general question format; they
are just short key phrases indicating what content to look into. We move for-
ward to obtain question-like text to see if expanding a factor into question format
benefits the extraction. We convert the factors into question format according
to linguistic rules such as wh-question patterns. For instance, “time of the
accident” is a factor in the dataset of traffic accidents. We apply linguistic rules
to switch it to its corresponding question, “What was the time of the accident?”.
Similarly, “site of the accident” is switched to “Where did the accident take
place?”, and “the loss” turns into “What losses does the plaintiff suffer?”. Some
factors are expanded into more than one question because they refer to multiple
concrete facets. For example, “division of responsibilities” becomes “What are
the responsibilities of the defendant?” and “What are the responsibilities of the
plaintiff?”. Such expansion literally enriches the semantics compared to the vanilla
factors, and is highly likely to enhance the performance in the question-answering
experiment.

From Question to Questions. Following the outcome (the converted questions) of
the last section, we use query expansion to further enrich the semantic information
of the extraction fact labels. In this section, each extraction fact label is
represented by longer or multiple questions. The expansion is quite simple and
straightforward, namely synonym expansion. Specifically, there are two ways to
expand a sentence with its synonyms:
– The first is word-wise synonym expansion. A sentence is first segmented into
  words using LTP [2] as the word segmentation module. Then the correspond-
  ing synonyms of each word are directly concatenated following the original
  word order. We design two strategies to find the synonyms of each word,
  “M1” and “M2”. “M1” looks up an existing dictionary for synonyms, such
  as BigCilin [19], while “M2” builds a special dictionary concerning both con-
  texts and questions in each dataset. The embeddings of words in the created
  dictionary are generated by BERT [3]. We use the embedding vectors to
  calculate the cosine similarity between the input word and every other word
  in the dictionary, rank the candidates, and retain the top K words as retrieved
  synonyms. Then, the synonyms are directly piled up to form a new sentence.
  For example, in the dataset of recourse for labor remuneration, supposing
  K = 3, the question “Whether the party signs the labor contract?” is
  expanded by concatenating the top-3 synonyms of each of its words.
– The second is sentence-wise synonym expansion, which we name data aug-
mentation (DA). The sentence is treated as a whole when searching for syn-
onyms. To augment a whole sentence, we need to obtain its general
semantic representation. Hence, we rely on the corpus of Baidu Zhidao,
a knowledge encyclopedia written in Chinese. This corpus is used to pre-
train a language model that better understands the correlations between Chinese
words and phrases. We refer readers to SimBERT [14] for training details. We
use this model to obtain a ranked list of K similar sentences. After acquiring
the sentence-wise synonyms, we propose several strategies to join them. In
addition to direct concatenation, we utilize bi-LSTM and bi-GRU to encode
the semantics of these synonyms. Compared to direct concatenation, this strat-
egy extracts hidden features and controls the dimension of the output in order
to satisfy BERT’s input length limit; direct concatenation easily exceeds
this limit, causing the rest of the text to be ignored. We report selected aug-
mentation results of K = 5 in Table 1 and Table 5 (in the appendix).
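The “M2” retrieval step above can be illustrated with a minimal sketch. This is not our exact implementation: the real dictionary embeddings come from BERT, whereas here any fixed word vectors (the toy vocabulary in the example below is hypothetical) stand in for them.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def top_k_synonyms(word, vocab, k=3):
    """Rank every other dictionary word by cosine similarity to `word`
    and retain the top k as retrieved synonyms."""
    query = vocab[word]
    ranked = sorted(
        (cand for cand in vocab if cand != word),
        key=lambda cand: cosine(query, vocab[cand]),
        reverse=True,
    )
    return ranked[:k]

def expand_word_wise(words, vocab, k=3):
    """Word-wise expansion: after each word, pile up its top-k synonyms,
    keeping the original word order."""
    out = []
    for w in words:
        out.append(w)
        out.extend(top_k_synonyms(w, vocab, k))
    return out
```

For example, with toy 2-dimensional vectors `{"salary": (1.0, 0.0), "wage": (0.9, 0.1), "pay": (0.8, 0.2), "cat": (0.0, 1.0)}`, expanding `["salary"]` with k = 2 yields `["salary", "wage", "pay"]`.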

Table 1. Examples of data augmentation of recourse for labor remuneration (RLR)
(see Appendix for other datasets). We report two typical questions for each dataset.
Those without translations share the same English version as the original question.
22 B. Wang et al.

4 Experiments
4.1 Datasets and Experimental Settings
We construct three datasets covering the following case types: intentional injury
(II), recourse for labor remuneration (RLR), and refusal to execute judgments
or rulings (REJR). The statistics are shown in Table 2. We select 200 question-
answer pairs from each dataset as the test set. Note that the number of questions
answered by “No” is markedly low. This is a common feature of judicial data:
officers tend to ask about facts for which they already hold evidence, leading to
few negative answers. We do not manually alter this, because we believe it is
better to preserve the actual data distribution.
As for experimental settings, we employ BERT-base to encode the documents
and extraction fact labels, and refer readers to BERT [3] for detailed model
descriptions and hyper-parameter settings. Since the input length of BERT is
limited to 512 tokens, the length (LF) allocated to the extraction fact label must
also be constrained and well designed. Overall, the input extraction fact label
of our model is no longer than 60 tokens, and the rest are reserved for the document
input. Specifically, LF = 1 when using [Token] as the representation; LF ≤ 10 for
factors and questions; and LF ≤ 60 for expansions. In practice, most of the expanded
questions are longer than 10 tokens, so any K > 5 results in input truncation.
We seek the best effect by varying K.
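The length budget can be made concrete with a small sketch, assuming the usual [CLS]/[SEP] pair layout for BERT inputs (the token lists here are placeholders, not our actual tokenizer output):

```python
def build_input(label_tokens, doc_tokens, max_len=512, label_budget=60):
    """Cap the extraction fact label at `label_budget` tokens and let the
    document fill the remainder of BERT's `max_len` input, reserving
    three positions for [CLS] and the two [SEP] separators."""
    label = label_tokens[:label_budget]
    doc = doc_tokens[: max_len - len(label) - 3]
    return ["[CLS]"] + label + ["[SEP]"] + doc + ["[SEP]"]
```

Under this layout, a 100-token expanded question is truncated to 60 tokens, leaving 449 positions for the document, so the full input is exactly 512 tokens.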

Table 2. Statistics of datasets. # refers to the number. NG refers to “Not Given”.

Cases  #Factors  #Document  #Questions  Yes   No   Span  NG
RLR    17        992        5628        1233  189  8422  946
II     13        974        9840        695   246  7840  1059
REJR   13        960        8320        2017  25   4142  2158

4.2 Experiments of Different Methods for Legal Factor Extraction

Before carrying out experiments with MRC models, we try sequence labeling
methods on this task. We convert our legal factor extraction datasets into the
sequence labeling format, where the case descriptions are labeled with BIOES
(B-begin, I-inside, O-outside, E-end, S-single) tags. Further, we reproduce two
entity recognition algorithms, Lattice LSTM [18] and Chinese BERT-base [3].
Because these models need to find span text in the given passage, only questions
answered by spans are preserved, and the corresponding questions become the
tags. The number of such tags is limited, so we can number them as Q1, Q2,
etc. Specifically, for question Q1, we find the corresponding answer span in the
context and label the tokens within the span as “B-Q1 I-Q1 I-Q1 ... E-Q1”. The
sequence labeling experiments are compared to our proposed question-answering
approaches in Table 3. Unfortunately, the results do not live up to expectations,
which we discuss in Sect. 5.
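The conversion from a question's answer span to BIOES tags can be sketched as follows. This is a simplified illustration of the preprocessing described above; token indices are assumed to be inclusive:

```python
def span_to_bioes(n_tokens, start, end, qid):
    """Tag an n-token passage for one question's answer span [start, end]
    (inclusive): S- for a single-token span, otherwise B-/I-/E-;
    every token outside the span stays "O"."""
    tags = ["O"] * n_tokens
    if start == end:
        tags[start] = "S-" + qid  # single-token answer
    else:
        tags[start] = "B-" + qid
        for i in range(start + 1, end):
            tags[i] = "I-" + qid
        tags[end] = "E-" + qid
    return tags
```

For a six-token passage whose answer to Q1 spans tokens 1–3, this yields `["O", "B-Q1", "I-Q1", "E-Q1", "O", "O"]`.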

4.3 The Structure of Questions and the Design of Query Expansion


We investigate different methods to determine which query expansion module
works best, and design a series of ablation studies. In this section, we introduce
these methods in detail:

– “Qori ”: To directly generate questions from the factors. The result of the
question-answering experiment with “Qori ” serves as the baseline for all
experiments.
– “modif y”: We manually modify some questions, mainly to fix mis-
takes introduced by the generation process and to make them closer to
the contexts. The modification includes correction of typos and examina-
tion of punctuation. In the II dataset, we modify the ques-
tion “ ” (What happened to the victim?) to
“ ”. Although the English translation of both
“ ” and “ ” is victim, only “ ” appears in the contexts of
the dataset. Therefore, we modify this word to be con-
sistent with the context.
– “+F ”: To directly concatenate the factors with original questions as a whole.
– “QEsyn -M1 ”: Word-wise expansion using method M1 described in Sect. 3.2.
– “QEsyn -M2 ”: Word-wise expansion using method M2 described in Sect. 3.2.
– “QEda -random”: To replace the original question with a random question
obtained by data augmentation.
– “QEda -lstm”: To add a bi-LSTM layer after data augmentation to extract
semantic information from augmented questions.
– “QEda -gru”: To add a bi-GRU layer after data augmentation to extract
semantic information from augmented questions.
– “QEda -topK”: To directly concatenate the first K augmented sentences with
the original question. We experiment with K = 1, 3, 5.
– “QEda -mix”: To only expand questions whose answers are spans, but use the
original question for those answered by “Yes” or “No”, and those with no
answers (NG).
– “QEda -yn”: Do not expand questions answered by “Yes” or “No”, but
perform expansion on other kinds of questions.
– “QEda -last5”: To concatenate the last five out of ten augmented sentences
with the original question. We hypothesize that taking the last five sentences
increases semantic diversity, because they are less similar to the original
question than the first five sentences.

Among the above strategies, QEda -topK, QEda -mix, QEda -yn and QEda -
last5 are four ablation studies that reveal what indeed leads to the enhancement
of extraction results.
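The answer-type-conditional strategies can be summarized in one place. The dispatcher below is a hypothetical rendering of QEda-mix, QEda-yn and QEda-topK, assuming an `augment(question, k)` callable (e.g. a SimBERT wrapper) that returns k paraphrases:

```python
def expanded_query(question, answer_type, augment, strategy="mix", k=5):
    """Decide per answer type whether to expand a question.
    "mix":  expand only span questions;
    "yn":   expand everything except yes/no questions;
    "topk": expand unconditionally."""
    if strategy == "topk":
        expand = True
    elif strategy == "mix":
        expand = answer_type == "span"
    elif strategy == "yn":
        expand = answer_type not in ("yes", "no")
    else:
        raise ValueError("unknown strategy: " + strategy)
    if not expand:
        return question  # keep the original question unchanged
    return question + "".join(augment(question, k))
```

Under "mix", yes/no and NG questions pass through untouched while span questions are concatenated with their k paraphrases.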

4.4 Results
The results of different methods of extracting legal information and the perfor-
mances of different forms of extraction fact labels are reported in Table 3. Contrary
to expectations, the sequence labeling methods substantially underperform the other
methods. The F1 score of Lattice only reaches 11.17%. Similarly, Seq2SeqBERT
does not perform much better, with an F1 score of 22.8%. We can conclude that the
sequence labeling methods are incapable of handling our task. Among the forms of
extraction fact labels, the one with the query expansion module achieves the best
F1 results on the RLR (88.51), II (86.21) and REJR (81.47) datasets. This method
also contributes the highest EM scores on the RLR (77.04) and REJR (67.34)
datasets, while on the II dataset, representing fact labels as questions peaks at an
EM score of 74.88. Table 3 shows that as the semantic information increases from
tokens to expanded questions, the results broadly improve, by up to 20% in F1
and 15% in EM. This supports the idea that richer semantic features boost the
model’s performance.
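The EM and F1 numbers quoted throughout follow the standard extractive-QA definitions; a minimal sketch (the whitespace tokenization here is our simplification, not necessarily the paper's exact scoring script):

```python
from collections import Counter

def exact_match(pred, gold):
    """1 if the predicted string equals the gold answer exactly, else 0."""
    return int(pred.strip() == gold.strip())

def token_f1(pred, gold):
    """Token-overlap F1 between prediction and gold answer strings."""
    p, g = pred.split(), gold.split()
    overlap = sum((Counter(p) & Counter(g)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(p)
    recall = overlap / len(g)
    return 2 * precision * recall / (precision + recall)
```

Corpus-level scores are then the averages of these per-question values.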
Table 4 and Table 6 (in the appendix) report the results of the query expansion
strategies and the ablation experiments. Taking the RLR dataset as an example, we
apply all the methods and strategies described in Sect. 4.3 to test their effective-
ness. Among these strategies, the data augmentation-based ones outperform
the others. The synonym-based experiments on the other two datasets show a
similar pattern to RLR, so we omit the synonym-based results due to the
page limitation and focus on the data augmentation-based experiments. On the RLR
dataset, “QEda -top5” has a positive effect: its EM and F1 improve by about
1.5% and 2%, respectively, compared to “QEda -lstm” and “QEda -gru”. After
these experiments, we realize that query expansion methods do not work equally
well for all kinds of questions. Some work well on questions whose answers are spans,
but poorly on “Yes”, “No”, or “NG” questions. Hence, apart from expanding
questions of all answer types (QEda -topK), we design additional experiments, QEda -
mix, QEda -yn, and QEda -last5, as ablations to test the influence on
different types of extractions and to find the best strategy. QEda -mix
achieves an overall good result and is the strategy recorded in Table 3, but it
affects the datasets differently.

Table 3. Exact Match (%) and F1 (%) of different forms of extraction fact labels
(token, factor, question and expanded question) and the sequence labeling methods.
Specifically, F1Seq1 : the F1 of Lattice [18]; F1Seq2 : the F1 of Seq2SeqBERT ; Expansion: QEda -mix.

Cases  Seq. labeling     Token          Factor         Question       Expansion
       F1Seq1   F1Seq2   EM     F1      EM     F1      EM     F1      EM     F1
RLR    11.17    22.80    69.39  82.04   72.96  85.49   73.98  85.13   77.04  88.51
II     –        –        69.35  81.30   71.36  82.92   74.88  86.10   72.86  86.21
REJR   –        –        52.26  60.66   58.29  66.24   65.83  80.76   67.34  81.47

5 Analyses
5.1 Extract Legal Factor as a Human Judge
Table 3 examines the feasibility of sequence labeling approaches for the
proposed legal factor extraction task and confirms that they are unsuitable for it.
The “entities” in our datasets are nearly ten times as long as the entities

Table 4. Results of special questions, reported in Exact Match (%), F1 score (%),
Precision (%) and Recall (%). Details of these methods are given in Sect. 4.3.

Methods        Total          Span           Not given
               EM     F1      EM     F1      R       P       F1

Recourse for Labor Remuneration (RLR)
Qori           72.96  84.40   63.21  84.36   80.00   82.35   81.16
Qori + F       70.92  83.98   57.55  81.70   82.86   80.56   81.69
Qori -modify   73.98  85.38   61.32  82.40   85.71   83.33   84.50
QEsyn -M1      70.92  82.88   62.26  84.38   68.57   85.71   76.19
QEsyn -M2      71.94  84.36   59.43  82.39   80.00   80.00   80.00
QEsyn -M2 + F  72.96  84.01   63.21  83.64   77.14   79.41   78.26
QEda -random   70.41  82.72   59.43  82.21   74.29   81.25   77.61
QEda -top5     75.51  86.25   66.98  86.84   80.00   87.50   83.58
QEda -lstm     71.43  84.39   58.49  82.46   82.86   78.38   80.56
QEda -gru      71.43  83.59   58.49  80.97   82.86   82.86   82.86
QEda -mix      77.04  88.51   63.21  84.41   100.00  92.11   95.89
QEda -yn       71.43  83.38   58.49  80.59   82.86   76.32   79.46
QEda -top3     71.43  82.91   59.43  80.66   82.86   74.36   78.38
QEda -top1     73.47  84.70   63.21  83.97   80.00   80.00   80.00
QEda -last5    72.96  84.73   62.26  84.04   80.00   82.35   81.16

Intentional Injury (II)
Qori           72.36  84.78   68.26  83.06   90.00   85.71   87.80
Qori -modify   72.36  85.08   68.26  83.42   90.00   85.71   87.80
QEda -top5     71.86  83.56   68.26  82.21   85.00   89.47   87.18
QEda -lstm     70.85  83.70   67.66  82.98   80.00   84.21   82.05
QEda -gru      70.35  82.68   65.87  80.56   90.00   85.71   87.80
QEda -mix      72.86  86.21   69.46  85.36   85.00   89.47   87.18

Refusal to Execute Judgments or Rulings (REJR)
Qori           66.33  80.23   50.00  77.11   75.00   82.98   78.79
Qori -modify   67.34  81.22   50.00  77.09   78.85   87.23   82.83
QEda -top5     67.34  81.19   50.98  78.00   76.92   86.96   81.63
QEda -lstm     65.83  80.01   49.02  76.69   78.85   82.00   80.39
QEda -gru      67.34  80.67   51.96  77.96   75.00   88.64   81.25
QEda -mix      66.83  81.47   49.02  77.57   78.85   91.11   84.54

in standard NER datasets such as Resume NER [18]. Our “entities” are
essentially the answer spans to pre-defined questions, and are uneven in length,
whilst in the other datasets most entities span just a few tokens and the
number of entity tags is far smaller than ours. Therefore, the sequence labeling
method is unqualified for an information extraction task of this magnitude. This
urges us to introduce MRC-based methods as better solutions.
In the general MRC task, the model knows what answer to search for because
the questions are written in plain text. In the “Token” experiment, however, we
only feed the token symbols to the model, without any specific description of the
extraction target. By doing so, we imitate the procedure of a judge reading
through the documents. As we discussed in Sect. 1, these legal factors are fixed
for each case type, and a judge can quickly decide what to look for when

given a case description. This experiment proves that the model can learn the
semantic meanings through this special token without being told the exact target.
To a certain extent, it accomplishes by itself what would otherwise require expert
knowledge and manual configuration. This distinguishes our method from general
MRC approaches, and is instructive for the follow-up experiments.

5.2 The Richer Semantic Information, the Higher Score


We can conclude from Table 3 that as the semantic information increases from
tokens to expanded questions, performance improves remarkably. Moving from
the Token representation to Factor and Question, and eventually to Expansion,
the semantic meaning of the extraction fact labels becomes more specific and
enriched, and the corresponding results improve gradually. In the most extreme
example, the REJR dataset, the F1 score increases from about 60 to over 81, and
the EM score grows by about 15 points from Token to Expansion. As shown in
Table 3, there is clear information growth from Token to Factor and from Question
to Expansion. However, in the RLR dataset, the trend from Factor to Question
runs contrary to the others. A possible explanation is that the Factor description
is already concrete enough, and converting it into question format introduces
undesired noise. Overall, the expressiveness of the extraction fact labels is
enhanced all the way from Token to Expansion, which consequently yields
higher scores.

5.3 Strategies of Query Expansion


Following the results, we further discuss the different types of query expan-
sion and the corresponding detailed strategies. Table 4 shows that data
augmentation-based approaches outperform synonym-based ones. Unexpect-
edly, synonym-based approaches sometimes impair performance. Dif-
ferent from synonym conversion, augmentation adds extra meaningful
words to the questions. For instance, in the RLR dataset, the question
“ ” (What is the basic salary of workers?) becomes
“ ”, “ ” and
“ ”, where K = 3. Although the subject of the sen-
tence, “ ” (workers), is not changed to a synonym, the object
“ ” (the amount of basic salary) becomes “ ” (the
basic amount of salary), “ ” (basic wage) and “ ” (wage base).
In the third augmented sentence, a new word “ ” (monthly) is added as
the attributive of “ ” (wage base). Obviously, data augmentation brings
richer semantic features to the original questions, including paraphrases of the
original words and introduction of extra information. These make the questions
more accurate and specific.
Nevertheless, more is not always better. We discover that different
augmentations have inconsistent effects on different datasets and different types
of questions. Expanding the question for all answer types (span, YES, NO
and NG) does not always give the best results (see the results of QEda -topK in
Table 4 and Appendix Table 6). In the ablation experiments, some questions are
preserved and the others are expanded. We first expand all the questions to find
out which types of questions yield poor results, then leave those questions
unexpanded while expanding the rest, which perform well.
The final strategy is QEda -mix, whose experimental results on all three datasets
improve by up to 4 points in both F1 and EM.
For different datasets, the same type of question does not necessarily show
similar results before and after query expansion. For example, in the RLR and
REJR datasets, query expansion works very well on questions whose answers are
spans, and poorly on questions whose answers are “YES”, “NO” and “NG”; in
the II dataset, the results are exactly the opposite. In practice, it is difficult
for us to fully predict the experimental results through prior speculation. For
instance, we intuitively thought it unnecessary to expand yes/no questions
because their answers are simple and their question format is uniform.
However, as far as the experimental results are concerned, our speculation proved
at best partially correct. The performance of different types of questions
before and after query expansion can be inconsistent, and cannot be predicted
regardless of the dataset they belong to.

6 Conclusion
In this paper, we design a unified framework to extract different types of legal fac-
tors. It is an MRC-based framework, but with extraction fact labels pre-determined
according to the datasets, and it achieves automatic extraction without manually
feeding the questions. To verify the effectiveness of our approach, we construct
three datasets covering intentional injury (II), recourse for labor remuneration
(RLR), and refusal to execute judgments or rulings (REJR) cases. Experiments
show that such a model suits our information extraction task well compared
to sequence labeling models. To improve model performance, we design
different strategies and conduct extensive experiments that examine the impact
of adding semantic information to the extraction fact labels. The strategies fall
into two main categories, synonym-based and data augmentation-based. Among
all the strategies in these two categories, QEda -mix performs generally well,
improving performance by up to 4% on each dataset.

7 Future Works
Our proposed framework aims at extracting different types of legal factors. So far,
it is able to extract span-like factors, deduce yes/no factors and judge whether a
factor is included in the document or not. However, several types of
legal factors are not yet supported, such as multiple-choice factors, numerical
derivation factors, and summarisation factors. Multiple-choice factors refer to
facts that involve a fixed number of parties. For instance, some disputes over
rental contracts often involve three parties: the landlord, the tenant, and the
intermediary. Numerical derivation factors usually ask for a total amount
of money, a weight, or the number of criminal activities. These pieces of information are
not directly and clearly written, nor can they be derived through straightforward
extraction. In the future, we will expand the framework’s capability
to handle more types of factors.
Experiments show that although query expansion has a positive effect on
each dataset, different datasets respond to it differently.
For the RLR and REJR datasets, query expansion works well on questions whose
answers are spans, while for the II dataset it yields good results on the other types
of instances. We believe this is due to differences in the structure
and content of different case types, but we have not quantified these differences
or looked for common patterns. In the future, we will try to find a universal
query expansion strategy to streamline the current solution, as well as to take
new factor types into consideration.

Acknowledgments. The authors would like to thank all the reviewers for their
insightful reviews. This paper is funded by the National Key R&D Program of China
(No. 2018YFC0807701).

Appendix

Table 5. Data augmentation of the other two datasets. We select two representative
questions from each dataset. We do not translate the sentences after data augmentation
(DA), because DA is usually a paraphrase of the original question and the English
version remains unchanged.

Table 6. Results of general questions, reported in Exact Match (%), F1 score (%),
Precision (%) and Recall (%). The meaning of each experiment in this table is the same
as in Table 4. Note that the extraction results for questions answered by “No” in the
recourse for labor remuneration and refusal to execute judgments or rulings datasets
are poor; many experiments even score 0. As shown in Table 2, questions with negative
answers are originally very rare, and we did not build the datasets specifically for this
situation, which makes such answers difficult to extract, or they simply do not appear
in the test sets.

Methods        Yes                      No
               R       P       F1       R       P       F1

Recourse for Labor Remuneration (RLR)
Qori           97.96   92.31   95.05    0.00    0.00    0.00
Qori + F       100.00  90.74   95.15    0.00    0.00    0.00
Qori -modify   100.00  94.23   97.03    16.67   100.00  28.57
QEsyn -M1      100.00  83.05   90.74    0.00    0.00    0.00
QEsyn -M2      97.96   94.12   96.00    33.33   50.00   40.00
QEsyn -M2 + F  100.00  87.50   93.33    0.00    0.00    0.00
QEda -random   100.00  89.09   94.23    0.00    0.00    0.00
QEda -top5     100.00  90.74   95.15    0.00    0.00    0.00
QEda -lstm     100.00  92.45   96.08    0.00    0.00    0.00
QEda -gru      100.00  92.45   96.08    0.00    0.00    0.00
QEda -mix      100.00  92.45   96.08    0.00    0.00    0.00
QEda -yn       100.00  92.45   96.08    0.00    0.00    0.00
QEda -top3     97.96   92.31   95.05    0.00    0.00    0.00
QEda -top1     100.00  90.74   95.15    0.00    0.00    0.00
QEda -last5    100.00  92.45   96.08    0.00    0.00    0.00

Intentional Injury (II)
Qori           100.00  90.00   94.74    100.00  100.00  100.00
Qori -modify   100.00  90.00   94.74    100.00  100.00  100.00
QEda -top5     100.00  90.00   94.74    100.00  100.00  100.00
QEda -lstm     100.00  90.00   94.74    100.00  100.00  100.00
QEda -gru      100.00  100.00  100.00   100.00  100.00  100.00
QEda -mix      100.00  90.00   94.74    100.00  100.00  100.00

Refusal to Execute Judgments or Rulings (REJR)
Qori           95.45   76.36   84.84    0.00    0.00    0.00
Qori -modify   95.45   77.78   85.71    0.00    0.00    0.00
QEda -top5     95.45   77.78   85.71    0.00    0.00    0.00
QEda -lstm     90.91   80.00   85.11    0.00    0.00    0.00
QEda -gru      95.45   77.78   85.71    0.00    0.00    0.00
QEda -mix      95.45   77.78   85.71    0.00    0.00    0.00

References
1. Buck, C., et al.: Ask the right questions: active question reformulation with rein-
forcement learning. In: 6th International Conference on Learning Representations,
ICLR 2018, Vancouver, BC, Canada, 30 April–3 May 2018, Conference Track Pro-
ceedings. OpenReview.net (2018). https://openreview.net/forum?id=S1CChZ-CZ
politics. One of the ways he was wont to remove troublesome rivals
in the days of the Young Turk Revolution was to go out and shoot
them with his own hand. This “impulsiveness” got him into grave
trouble with the Soviets in spite of all his sensible utterances to the
contrary. When he was “shifted” to Bokhara so that he would not be
in the way of either Kemal or the Russians, he got bored and started
a war of his own. One night he fled into the hills of Afghanistan and
soon began to gather recruits around him. A few nights after that one
of the principal officials of the Bokharan Republic also fled to join
Enver. This performance was repeated until over half the Bokharan
cabinet had fled. Then fighting began and we got vague rumors of
battles, but only through the Soviet press.
In August, 1922, we heard that Enver had been killed, that his body
had been found on the field of battle. There was even romance
surrounding him and his supposed death. Stories were circulated
that when the body was picked up and examined, the letters of an
American girl were found next his heart. I went to see Jacob Peters,
who has charge now of all the Eastern Territory and who was then in
Moscow. He laughed heartily and said he would show me the
“information.” It consisted of three very hazy telegrams which had
been three weeks on the way. The men who sent the telegrams and
discovered the body had never seen Enver. There was no mention of
letters. And Peters’ opinion of the whole affair was that there was
nothing at all authentic in the story or else it was “a trick of Enver’s to
sham being dead.” Peters’ theory proved true. Within a few days
fighting began again and Enver began to win.
He had conceived the notion of uniting all Turkestan and Bokhara
and Khiva to the Angora government. It places the Soviets in a
strange position. They may have to give in to him, though he will not
actually be an “enemy,” because neither the Turks nor the Russians
can afford to break their treaty. Therefore, his private war in the
south embarrasses the Soviets much more than it does Kemal, who
needs only to disavow any connection with it, as do all the Turkish
officials. If Enver wins he will add a nice slice to Turkish territory; if
he loses, Turkey will be in the same position as before.
Enver, while he will always maintain a great prestige in the
Mussulman world, will never oust Kemal. Mustapha Kemal Pasha is
the great popular hero of a victorious Turkey, which but for him might
never have even survived. There were times in the past when Enver
was more important than Kemal, but that can never happen again in
Kemal’s life. Both men rose from the ranks and both are the sons of
peasants. And Kemal at this moment is more important than the
Sultan. Greater than that no man can be under the banners of
Mohammed.
One can hardly over-estimate the importance of the new
Mohammedan unity, that new patriotic energy which has taken the
place of the former lethargy and which already reaches out far
beyond the borderland of the Faithful. The Mussulman world is
reviving after a long sleep. And not only Mohammedans are uniting
but the entire East and Middle East. Aside from Japan, a significant
harmony is rapidly taking place, a harmony which evolves itself into
a tremendous power. This power may decide the world’s destiny
before another generation.
Enver and Kemal Pasha, being aware of the purport of beginning
that great concord by interwoven treaties with Russia, read the stars
well. There must come a day also when that great sleeping giant,
China, will be part of this alliance. And the seeds of that friendship
have also been planted. The Chinese official delegations which
came to Moscow were not only well received by the Russians, but
they hob-nobbed with the Mohammedans like brothers.
TIKON AND THE RUSSIAN CHURCH
There are two points of importance in regard to the Greek Orthodox
Church and the Russian revolution. First, that the church has
maintained itself and second, that it has issued no frantic appeals for
outside help. While certain priests have allied themselves with
counter-revolutionists, officially the church has never taken sides.
Even at the present moment when a bitter conflict is on, the quarrel
remains a family quarrel.
Tikon, the Patriarch, by remaining unruffled through the barricade
and blockade days, proved himself a strong leader in a time when
only strong leaders could survive. If he had been frightened or
hostile in the Denikin or Kolchak days he might have shared the fate
of the Romanoffs; if he had taken part in counter-revolutions, the
church itself might have been badly shattered. But until recently,
Tikon has been as placid as his ikons and as interested in the great
change going on about him as a scientist. And therein lay his
strength. “Don’t let any one pity me,” he said last winter when I
talked to him, “I am having the most interesting time of my life.”
Much nonsense has been written about anti-church propaganda in
Soviet Russia. Dozens of writers have discussed a certain rather
obscure sign in Moscow which reads: “Religion is the opiate of the
people.” This sign, about three feet across, is painted high up on the
north side of the Historical Museum building near the entrance to the
Red Square. No one in Russia seems to be much interested in it and
certainly it attracts less attention than any one of our million billboard
advertisements. I tried for a year to find out who had put it up and
what group it represented, but could never discover. It was a cab
driver who said the wisest thing concerning it. “If somebody took it
down,” he said dryly, when I asked him what he thought of it, “no one
would notice.”
The anti-clerical posters gotten out at the beginning of the revolution,
however, had a much more far-reaching influence. They were
usually to the effect that the priests were hoarding the church lands
and at the same time expecting the peasants to support them. Any
idea which sanctions giving the land to the peasants is popular in
Russia. It was not long before the peasants had seized the church
lands and divided them through their land committees. But this did
not make them atheists.
I remember meeting an old peasant leader from Siberia who had led
a successful revolt against Kolchak. He was received as a hero
when he arrived at Moscow for an All Russian Congress of Soviets.
He told me a story about a priest in his community who was a
counter-revolutionist. He said, “It usually is this way with me and with
many of the peasants, we love God and we are religious but we hate
the priests.” I asked him if it was not possible to find good priests,
and he began to tell me about one priest who had been very noble
and self-sacrificing. But this was the only one he could think of. “The
others disgrace God,” he said.
And that is just what one must understand in order to comprehend
the Russian church and its present position. The Soviets did not
destroy the church or ruin it in any way—no outside pressure could
do that. It was Rasputin and other “disgracers” who at last outraged
even the credulous and easy-going peasants.
A revolution had to take place in the church as well as outside of it to
save it at all. The church, at the time of the revolution, was as corrupt
as the Tsardom. Nothing is better evidence of this than the way it
was deserted by hundreds of priests as soon as the life in the
monasteries ceased to be easy. Long before the upheaval the
priesthood had grown dissolute. All that the revolution did was to
give the church a pruning which saved its soul. By shearing it of its
old luxuries it cut off the parasitic priests and by severing it from the
state it took the church out of politics. It was forced to stand or fall by
its own merits.
And when the wealth of the church was reduced to a certain point it
became necessary for a priest to be such a good priest and so well
loved and appreciated by his flock that his flock was willing to
support him, in spite of the hard life and the terrible conditions. Thus
a new and better clergy came into being.
The final test of the priesthood, however, came with the famine. All
that was left of the church wealth, outside of the churches
themselves, were the jewels in the ikons and the silver and gold
ornaments which glitter in the shrines throughout Russia. The
government decided to requisition these treasures. The priests who
had been shriven in the revolutionary fire were glad and willing to
part with these things, but there were many who resisted. The
outcome was a split in the church ranks, as well as riots, intrigue,
and bad feeling. There probably was a good deal of mismanagement
on the part of a few arbitrary Soviet officials like Zinoviev, who do not
seem to comprehend the sensitiveness of religious people and how
easily outraged they are by outside intrusion. There is little doubt that
this heightened a delicate and unfortunate situation. If a Church
Committee had been allowed to select and turn over the jewels and
precious metal, Tikon and other churchmen would probably never
have been brought to trial.
However, the trials themselves are intensely interesting and mark an
epoch in the life of the revolution. They actually mark the real
beginning of public opinion in Russia and that, in any case, is a
healthy development. It is like letting fresh air into a long-closed
room. Discussions of the government and the church have for five
years been going on in whispers behind closed doors. It now comes
down to this: if the government is wrong and is unjustly stripping the
church of wealth, the government will suffer by lack of support or
even open hostility on the part of the peasants, who have so much
power now that they can no longer be ignored on any question; and
if the priests are wrong and prove themselves selfish in this time of
need, the priests will be deposed. But the church itself will go on
because the peasants are religious; they will continue to “love God”
in the traditional manner.
About a week ago I met a Russian priest in New York and I asked
him at once how he felt about the requisitioning of the jewels. He
raised his hands devoutly. “What man could pray to God and hoard
jewels at such a time?” he exclaimed. Then he showed me a very
old and precious carved wooden cross. “There was a ruby in this
cross,” he said. “It was the only valuable thing I possessed. I can’t
tell you how happy I was when it was sold and the money used for
relief. This is not a stone you see in it now; it is a piece of red glass,
but it is somehow more precious to me than the ruby.” Here is the
expression of a really devout man and the only sort of priest that
people will follow in such a crisis.
It is perfectly true that the leaders of the Communist movement are
not religious. All students, in fact the entire “intelligentsia” or
educated classes of Russia, were never religious. Before the
revolution all groups of revolutionaries and literary folk prided
themselves on their lack of religion. So anti-religion is not confined
strictly to the leaders of the Communist movement. Any other party
except the Monarchist Party would be equally devoid of interest in
religion.
The Monarchists necessarily support the church because the Tsar
was really head of the church. This has been true since the time of
Peter the Great, who, while not actually abolishing the office of
Patriarch, never allowed another Patriarch to be elected. One of the
curious and interesting sidelights of the revolution was that a few
weeks after the church was separated from the state, a Patriarch
was elected for the first time in two hundred years, so that while in
one way the church lost its power, in another way it really came into
its own.
Freedom of religion, as we know it in the United States, was a
surprise and a shock to the members of the Russian church, for up
until 1917 no other sects but the Greek Orthodox were permitted by
law in Russia. Naturally, when other religious orders began to send
in missionaries the old church protested, and when the Soviets
answered that freedom of religion was now an established fact they
did not understand it as “freedom” and called it discrimination. And it
seemed like discrimination, because, while the Orthodox Church was
losing its former possessions, other religions were gaining
concessions.
Tikon, whose official title is Patriarch of Moscow and All the Russias,
and who is called, with a sharp flavor of French revolutionary days,
by the Supreme Revolutionary Tribunal, “Citizen Basil Ivanovitch
Baliavin,” was born in Pskoff in 1860. He was educated in Petrograd
Theological Academy and became a monk upon the completion of
his studies. He later held several important posts as a professor in
theological institutions. He was consecrated Bishop of the Aleutian
Isles and North America in 1897 and then came to America. In 1905
he was made Archbishop and moved the cathedral residence from
San Francisco to New York. He returned to Russia in February,
1907, having been appointed Archbishop of Jaroslav. In 1913 he
became Archbishop of Vilna. Early in 1917 he was elected
Metropolitan of Moscow and in November of that same year, just
when the Bolsheviks came into power, he became Patriarch.
Just what influenced Tikon and made him so much more democratic
than most of his colleagues, I do not know. My own opinion, after a
conversation with him, is that he is somewhat of a student of history
and a philosopher, as well as a priest. It is the opinion of many
people, inside and outside of Russia, that it was his long residence in
America which made him so liberal. Of one thing I feel sure. He
would have resisted the Soviet Government if he had believed that it
was better for the future of the church. I do not think he refrained
because of any personal fears, but because he actually saw a real
revival of religion in the fire through which the church was passing.
No one could have expected the church to embrace the revolution.
The nobility and the clergy had walked too many centuries hand in
hand. The nobility perished in the course of events and the church
survived, as it did in France. And the church will continue to survive
—merely the poorer by a few jewels or a few thousand acres of land.
But it will never wield the same power that it once did or that it could
wield if there was a return to Tsardom. It cannot be as strong, for
example, as the Church of Rome is in Italy.
The real menace to the power of the Russian church lies in its own
medieval outlook on life. It has scarcely anything to do with anti-
church propagandists or with opposition by force or by requisition.
The youth of Russia is interested in reconstruction and the
government for the first time. The young people have learned to read
and to think. They are no longer content with the old forms; they are
repelled by dissolute or un-Christlike priests. If the church wishes to
be strong and to have an influence in the life of the nation it cannot
gain that influence by haggling over a pile of rubies and diamonds
and emeralds while thousands of children are dying of hunger. The
old peasants might follow Tikon when he says that the famine is the
business of God, but the young people will not. It is almost
inconceivable that a man can follow the lowly Christ in such a proud
way. Certainly, the young Russians, who have so passionately
defended the revolution, will never be satisfied with such a
conception.
It seems very sad, from the religious point of view, that Tikon, who
steered his church through the long period of fighting and
destruction, should lose his equilibrium in the period of adjustment.
He was able to smile through all the worst days of terror and
suspicion. He could joke about the Cheka guard outside his door, he
could calm his agitated congregations, but he could not sacrifice
form. When I interviewed him he wore a gorgeous robe and jewels.
Tikon is sincere. Even in his clinging to the splendor of gold and
jewels, he is sincere. It is his particular mystical way of loving God,
which is difficult to understand in our age of materialism. Tikon, in a
lesser degree, has many of those qualities of Lenin which make him
a leader of men. If he had been as great a man as Lenin he would
have thoroughly purified the church and led a great religious revival
in Russia.
TCHICHERIN, COMMISSAR FOR FOREIGN AFFAIRS, AND HIS SUBORDINATES
GREGORY VASSILIEVITCH TCHICHERIN
MAXIM LITVINOV, ASSISTANT COMMISSAR
LEONID KRASSIN
DAVID ROTHSTEIN
GREGORY WEINSTEIN
MICHAEL KARAKHAN
MR. FLORINSKY
MR. AXIONOV

GREGORY VASSILIEVITCH TCHICHERIN
My first interview with Tchicherin was at midnight and my last
interview was at five in the morning. This happens to cover a fairly
complete rotation of the official hours of the Soviet Foreign Office.
One evening at a box party in the Bolshoi Theatre, Enver Pasha
remarked: “I have to kill time somehow for three hours after the play.
Halil Pasha and I have an appointment with Mr. Tchicherin at two
o’clock.” In spite of his smiling Oriental inscrutability and a palpable
diplomatic duty to conform to everything Russian, one could feel an
amused disapproval of such official unconventionality.
This eccentric habit of turning night into day, with every floor of the
Foreign Office blazing like a lighthouse in a city which by municipal
decree is put to bed before midnight in order to save fuel, naturally
creates an almost fantastic air of whimsicality. Mr. Tchicherin makes
no excuse for this “vice,” as one of his secretaries very cleverly
phrased it; he simply finds night more harmonious for his tasks than
day and with that lack of consideration which dreamers always
consciously or unconsciously assume, he forces his whole staff to
follow his example. The result is that his clerks make a mad
scramble to get transferred into another government department.
Everything about Tchicherin is as consistently contrary to an ordered
life as his inversion of working hours. Born an aristocrat, trained
under the Tsar for the diplomatic service, delicate, cultured, aloof,
with a fine gesture of Quixotic generosity, he has thrown his life and
his fortunes in with the cause of the proletariat with all the abandon
of religious fervor.
His aloofness is so evident that one can hardly find any concordance
about the astounding decision of such an obvious æsthete to
become an active part of revolution—which is sweat and blood and
violence. Perhaps that explains why he wraps his vision round him
like a cloak and shuts out the sun in order not to be disturbed and
disillusioned by reality. We were all brought up on stories about kings
who were gay-fellows-well-met and could outdance and outdrink
their soldiers; on nobles who turned out to be Robin Hoods. But,
alas, who can imagine Tchicherin rollicking at a workers’ picnic or
smoking a friendly pipe with a Red soldier?
No simple person will ever feel intimate or at home with his super-
class indifference to material surroundings. A scrubwoman is just as
uncomfortable in his presence as was the intrepid Mrs. Sheridan,
who was able to rub such gay elbows with the other commissars. Mr.
Tchicherin’s way of arching an eyebrow at life upsets the best brand
of poise.
Living alone in a barren room on the top floor of the Foreign Office,
he is as far removed socially and physically from the lower as the
upper crust. Perhaps only an aristocrat is able to attain this dizzy
height of indifference to human contact with one’s fellows. And I
can’t help feeling that there is something rather splendid about such
complete isolation.
Outside of politics, the telephone and the cable, all up-to-dateness
offends him. He abhors new clothes, does not like to ride in
automobiles, refuses to have modern office paraphernalia about him,
does every little task for himself, like sharpening his own pencils and
running all over the building on office-boy errands. This attitude
produces the same effect as if he distrusted all his subordinates. His
secretaries stand helpless and ill at ease while he searches for a lost
telegram or answers the telephone.
Last winter they told an amusing story of how Karakhan, who is
Commissar of Eastern Affairs, lured Tchicherin into donning a new
suit. Tchicherin’s one suit was literally in rags when the Turkish treaty
and the Afghan treaty and the Persian treaty and all the other
Oriental treaties were about to be signed. These affairs had to be
arranged with more or less bourgeois pomp, since the Orientals are
rather keen on ceremony. So Mr. Karakhan, taking a long chance,
went ahead and ordered a new suit for Mr. Tchicherin from a Soviet
tailor, then one morning while Tchicherin slept, he changed the suits.
In a few minutes he came rushing back again and exclaimed with
emotion, “There’s a new note from Lord Curzon!” Tchicherin was up
in one bound and struggling into the new trousers. Whatever he
thought privately of Mr. Karakhan’s presumption, they continued in
an apparently pleasant relationship.
In appearance Mr. Tchicherin is tall, with the bent shoulders of the
man who stoops to go through doors. His eyes, not through any
evasiveness, but because of an extreme shyness, continually seek
other places than the face of his interviewer. Yet when one meets his
quick, occasional glance, one is startled by the intelligence and
gentleness of his expression.
Diplomacy is an inseparable part of Mr. Tchicherin’s existence. He
eats, drinks and sleeps with the affairs of state, looks at life as a
chess game and is continually checkmating, even in ordinary
conversation. Lenin approves of him and feels for him a warm
personal affection in spite of the fact that the Premier so dislikes
eccentricities. He knows that Tchicherin can be trusted, that he has
an invaluable knowledge of international affairs and more important
than all that, that he will never make any real decision without
consulting Lenin.
Mr. Bullitt told me that during his negotiations he found Tchicherin so
brilliant that it was difficult to get anywhere. The Foreign Minister was
always quite justified from the Soviet angle but the Soviets were
being forced to make hard concessions. Invariably when they came
to a deadlock, he telephoned Lenin and Lenin gave in.
During our first talk, when we discussed the campaign of lies about
Russia which has so long flooded English, French and American
papers, I said that I thought it was partly due to the fact that no
reporters were permitted at that time to go in and investigate actual
conditions. It was characteristic of Tchicherin to interrupt very
suddenly and ask, “Will you tell me why American reporters come
over here and claim they are impartial observers, even profess
friendliness towards us, and then go home and write such
astounding lies?”
I thought it wasn’t fair to generalize. The most unfair stories have
always been manufactured at Riga and Reval or at Paris by
interested political groups or by disappointed reporters who never
got inside. As for the reporters who actually witnessed the revolution,
certainly the majority remained fair and sympathetic, in spite of the
fact that it grew particularly difficult, especially in America, even to
maintain one’s equilibrium about Russia after Brest-Litovsk. To my
mind came back unhappy recollections of Overman and Lusk
investigations, raids, deportations and general war-hysteria. Perhaps
some such thought came also to Tchicherin because he said, “Yes,
yes, I suppose in the main, you are right, but how do you account for
a man like——?”
Tchicherin is full of old-fashioned honor. The idea that foreign papers
sanctioned false reports in order to justify intervention or the
blockade seemed so outrageous to him that he could never realize
that this sort of propaganda has become as much a part of modern
warfare as liquid fire or submarines.
Very late one night I saw Tchicherin running up the stairs to his office
in a high state of excitement because a New York evening
newspaper carried on its front page a fake interview with Lenin in
which he discussed everything from the Irish situation to the Russian
Ballet. Tchicherin saw no humor in this. His comment was, “How can
a reputable American paper allow such a thing? After all, Comrade
Lenin is the Premier of a great country.”
Men who give themselves completely to an ideal quite naturally
become supersensitive and unreasonable. At least that is the rule,
and Tchicherin is no exception. The deliberate misinterpretation
abroad, during long hard years, of every effort of the Soviet
Government at peace or reconstruction or defense or negotiations,
has got under his skin. So while he insisted on the strictest
adherence to the truth in all reports sent over the government wire,
at the same time he permitted himself a mild dissipation in
extravagant adjectives by way of retaliation, in his too long and too
complicated “notes.” He allowed even more unrestrained language in
Vestnik. Vestnik is the official bulletin of the Soviet Government—
very much like the bulletin issued by the Bureau of Public
Information during the war. The young man who edited this sheet
was a talented and educated Russian but his idea of an unemotional
government report was very much like that of our own George Creel.
I used to tease him about his passion for such words as “scurrilous”
in reference to capitalists or White Guards. But it never made any
impression. He confessed that he found my cables flat and
uninteresting.
Besides my radios to American papers, which were transmitted by
way of Berlin, and the government bulletin which was sent out to the
whole world and rarely used by anybody, there was also a wire to
London for the Daily Herald. Every one of these telegrams had to be
