Abstract
This study aims to visualize financial documents in such a way that even nonexperts can understand the sentiments contained therein. To achieve this, we propose a novel text visualization method using an interpretable neural network (NN) architecture, called a gradient interpretable NN (GINN). A GINN can visualize a market sentiment score from an entire financial document and the sentiment gradient scores in both word and concept units. Moreover, the GINN can visualize important concepts given in various sentence contexts. Such visualization helps nonexperts easily understand financial documents. We theoretically analyze the validity of the GINN and experimentally demonstrate the validity of its text visualization using real financial texts.
References
Ravi, K., Ravi, V.: A survey on opinion mining and sentiment analysis: tasks, approaches and applications. Knowl. Based Syst. 89(C), 14–46 (2015)
Hechtlinger, Y.: Interpretation of prediction models using the input gradient. In: NIPS 2016 Workshop on Interpretable Machine Learning in Complex Systems (2016)
Bach, S., Binder, A., Montavon, G., Klauschen, F., Müller, K.R., Samek, W.: On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation. PLOS ONE 10(7), 1–46 (2015)
Mikolov, T., Sutskever, I., Chen, K., Corrado, G., Dean, J.: Distributed representations of words and phrases and their compositionality. In: NIPS 2013, pp. 3111–3119 (2013)
Hornik, K., Feinerer, I., Kober, M., Buchta, C.: Spherical k-means clustering. J. Stat. Softw. 50(10), 1–22 (2012)
Yuan, Y., He, L., Peng, L., Huang, Z.: A new study based on word2vec and cluster for document categorization. J. Comput. Inf. Syst. 10(21), 9301–9308 (2014)
Zhao, P., Zhang, T.: Accelerating minibatch stochastic gradient descent using stratified sampling. arXiv:1405.3080 (2014)
Kingma, D.P., Ba, J.L.: Adam: a method for stochastic optimization. In: ICLR (2015)
Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., Salakhutdinov, R.: Dropout: a simple way to prevent neural networks from overfitting. JMLR 15(1), 1929–1958 (2014)
Kudo, T., Yamamoto, K., Matsumoto, Y.: Applying conditional random fields to Japanese morphological analysis. In: EMNLP 2004 (2004)
Fang, A., Macdonald, C., Ounis, I., Habel, P.: Using word embedding to evaluate the coherence of topics from twitter data. In: SIGIR 2016 (2016)
Řehůřek, R., Sojka, P.: Software framework for topic modelling with large corpora. In: LREC 2010 Workshop (2010)
Shrikumar, A., Greenside, P., Kundaje, A.: Learning important features through propagating activation differences. In: ICML 2017 (2017)
Xu, Q., Zhao, Q., Pei, W., Yang, L., He, Z.: Design interpretable neural network trees through self-organized learning of features. In: IJCNN 2004 (2004)
Zhang, Q., Wu, Y.N., Zhu, S.: Interpretable convolutional neural networks. In: CVPR 2018 (2018)
Mnih, V., Heess, N., Graves, A., Kavukcuoglu, K.: Recurrent models of visual attention. In: NIPS 2014, pp. 2204–2212 (2014)
Xu, K., Ba, J., Kiros, R., Cho, K., Courville, A., Salakhutdinov, R., Zemel, R., Bengio, Y.: Show, attend and tell: neural image caption generation with visual attention. In: ICML 2015, pp. 77–81 (2015)
Dong, Y., Su, H., Zhu, J., Zhang, B.: Improving interpretability of deep neural networks with semantic information. In: CVPR 2017 (2017)
Patrik, E.K., Liu, Y.: A survey on interactivity in topic models. IJACSA 7(4), 456–461 (2016)
Lund, J., Cook, C., Seppi, K., Boyd-Graber, J.: Tandem anchoring: a multiword anchor approach for interactive topic modeling. In: ACL 2017, pp. 896–905 (2017)
Hu, L., Jian, S., Cao, L., Chen, Q.: Interpretable recommendation via attraction modeling: learning multilevel attractiveness over multimodal movie contents. In: IJCAI 2018 (2018)
Yang, Z., Yang, D., Dyer, C., He, X., Smola, A., Hovy, E.: Hierarchical attention networks for document classification. In: NAACL 2016 (2016)
Rahman, M.K.M., Chow, T.W.S.: Content-based hierarchical document organization using multi-layer hybrid network and tree-structured features. Expert Syst. Appl. 37(4), 2874–2881 (2010)
Zhao, H., Du, L., Buntine, W., Zhou, M.: Inter and intra topic structure learning with word embeddings. In: ICML 2018 (2018)
Hasan, M., Rundensteiner, E., Agu, E.: Automatic emotion detection in text streams by analyzing Twitter data. Int. J. Data Sci. Anal. (2018). https://doi.org/10.1007/s41060-018-0096-z
Barranco, R.C., Boedihardjo, A.P., Hossain, M.S.: Analyzing evolving stories in news articles. Int. J. Data Sci. Anal. (2017). https://doi.org/10.1007/s41060-017-0091-9
Ito, T., Sakaji, H., Tsubouchi, K., Izumi, K., Yamashita, T.: Text-visualizing neural network model: understanding online financial textual data. In: PAKDD 2018 (2018)
Acknowledgements
This work was supported in part by JSPS KAKENHI Grant No. JP17J04768.
This paper is an extended version of the PAKDD 2018 long presentation paper "Text-Visualizing Neural Network Model: Understanding Online Financial Textual Data" [27].
A Appendix
A.1 Theoretical analysis of the II algorithm
This section theoretically explains the validity of the II algorithm. Let \(\varOmega _{dw}^{(k)}\) be a set of words in the polarity dictionary included in the kth cluster. Then, Propositions 1–3 are established.
Proposition 1
If Update is used for the parameter updates, then
Proposition 1 indicates that if Cond 1: the values of \({t^+}\) and \({t^-}\) are sufficiently large and Cond 2: for every word \(w_{k,i^{+}} \in \varOmega _{dw}^{(k)} \cap \varOmega _{pw}^{(k)}\) and \(w_{k,i^{-}} \in \varOmega _{dw}^{(k)} \cap \varOmega _{nw}^{(k)}\), the initial values of \(w^{(2)}_{k,i^{+}}\) and \(w^{(2)}_{k,i^{-}}\) given by Init are positive and sufficiently large, and negative and sufficiently small, respectively, are met for every k, then the II algorithm is expected to award each positive word \(\in \varOmega _{pw}^{(k)}\) (negative word \(\in \varOmega _{nw}^{(k)}\)) a positive (negative) sentiment score. Let \({\varvec{H}^{d}}^{(j, t)}\) be \(\varvec{H}^{(j, t)} - {\varvec{H}^{*}}^{(j, t)}\). Then, the following propositions, which are important for explaining the market mood predictability of the GINN, are established.
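Informally, the sign pattern that Proposition 1 asserts can be written out as follows (our paraphrase of the statement above, reading \(w^{(2)}_{k,i}\) as the sentiment score of word \(w_{k,i}\)):

\[ w^{(2)}_{k,i^{+}} > 0 \quad \left( w_{k,i^{+}} \in \varOmega _{pw}^{(k)} \right), \qquad w^{(2)}_{k,i^{-}} < 0 \quad \left( w_{k,i^{-}} \in \varOmega _{nw}^{(k)} \right) \]

for sufficiently large \(t^{+}\) and \(t^{-}\).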
Proposition 2
If the initial values of \(|\varvec{W}^{(3)}|\) and \(|\varvec{W}^{(4)}|\) are sufficiently small (Cond 3), and for every \(j \in \varOmega ^{(t)}_m\), the values of \(\varvec{z}^{(2)}_{j}\) are \( \left\{ \begin{array}{ll} \mathrm{positive} & (j \in D^{(p)}) \\ \mathrm{negative} & (j \in D^{(n)}) \end{array} \right. \), then the first and second row vector values of \(\partial \varvec{H}^{(j, t)}\) are positive and negative, respectively, and
Proposition 3
If, for every k, Conds. 1–3 are established, and the values \(|\varOmega _{pw}^{(k, t^+)}|\), \(|\varOmega _{nw}^{(k, t^-)}|\), and \(|\varOmega _m|\) are sufficiently large, then \(\lim _{t \rightarrow \infty } \frac{\sum _{j \in \varOmega ^{(t)}_m} \Vert {\varvec{H}^{d}}^{(j, t)} \Vert _{1} }{\sum _{j \in \varOmega ^{(t)}_m} \Vert \varvec{H}^{(j, t)}\Vert _{1}} = 0\).
Propositions 2 and 3 indicate that we can obtain a locally optimal solution using the II algorithm in an ideal case because the influence of Update disappears over time. From these propositions, we can also see that Init maintains model predictability because Init helps satisfy Cond 2.
Proposition 1 explains the interpretability of the GINN, and Propositions 2 and 3 confirm the predictability of the GINN in an ideal case.
A.1.1 Proof of Proposition 1
Proof
Here, for every \(k (\le K)\), if \(j \in D^{(p)}\), then \({\varDelta }^{(2)*}_{k, j} \le 0\), and if \(j \in D^{(n)}\), then \({\varDelta }^{(2)*}_{k, j} \ge 0\). Thus,
Therefore, Proposition 1 can be established. \(\square \)
A.1.2 Proof of Proposition 2
Proof
Let us denote \(\varvec{Z}^{(2)} := [\varvec{v}^{(CS)}_{m(1)}, \ldots , \varvec{v}^{(CS)}_{m(N)}] (\in {\mathbb {R}}^{K \times N})\), \(\varvec{U}^{(2)} := \tanh ^{-1}(\varvec{Z}^{(2)})\), and \(\varvec{U}^{(3)} := \varvec{W}^{(3)}\varvec{Z}^{(2)}\), and let \(\varvec{u}^{(l)}_j\) be the jth column of \(\varvec{U}^{(l)}\) (\(l = 2, 3\)), and \(\varvec{z}^{(l)}_j\) and \(z^{(l)}_{i,j}\) be the jth column and the (i, j) component of \(\varvec{Z}^{(l)}\) (\(l = 2\)), respectively. We approximate \(\partial {\varvec{H}^{(j, t)}}\) as follows:
First, we confirm that if, for every \(j \in \varOmega ^{(t)}_m\), the values of \(\varvec{z}^{(2)}_{j}\) are \( \left\{ \begin{array}{ll} \mathrm{positive} & (j \in D^{(p)}) \\ \mathrm{negative} & (j \in D^{(n)}) \end{array} \right. \), then the following three lemmas are established.
Lemma 1
The first and second row vector values of \({\varDelta ^{(4)}_j}{\varvec{z}^{(2)}_j}^\mathrm{T}\) are positive and negative, respectively.
Lemma 2
The first and second rows of
are positive and negative, respectively.
Lemma 3
The first and second rows of
are positive and negative, respectively.
A.1.3 Proof of Lemma 1
From the condition,
Moreover, Eq. (4) is established.
Thus, from Eqs. (2) and (4), Lemma 1 is established.
A.1.4 Proof of Lemma 2
Here, \( \partial \varvec{w}^{(4)}_1 = - \partial \varvec{w}^{(4)}_2 \) because \(\partial \varvec{W}^{(4)} = {\varDelta }^{(4)}{\varvec{Z}^{(3)}}^\mathrm{T}\) and \({\varDelta }^{(4)}_{1, j} = - {\varDelta }^{(4)}_{2, j}\) for every j. Considering that \(\varvec{W}^{(4)}\) is the sum of the values of \(\partial \varvec{W}^{(4)}\) in the previous updates, if the initial value of \(|\varvec{W}^{(4)}|\) is sufficiently small, then we can approximate it as
Let us denote \(A^{l}\) as
We define \({v}^{{\varDelta }^{(4)}{\varvec{Z}^{(3)}}}_{1, i}\) as the ith component of \(\varvec{w}^{(4)}_1\) and \(F_{i,j}\) as the (i, i) component of \(\mathrm{diag} \left( f_3'(\varvec{u}^{(3)}_j)\right) \mathrm{diag} \left( f_3'(\varvec{u}^{(3)}_i)\right) \). Then, \( A^{l} = \left( \begin{array}{ll} \sum _{i = 1}^{K2} F_{i,j} |{{v}^{{\varDelta }^{(4)}{\varvec{Z}^{(3)}}}_{1, i}}|^2 & - \sum _{i = 1}^{K2} F_{i,j} |{{v}^{{\varDelta }^{(4)}{\varvec{Z}^{(3)}}}_{1, i}}|^2 \\ - \sum _{i = 1}^{K2} F_{i,j} |{{v}^{{\varDelta }^{(4)}{\varvec{Z}^{(3)}}}_{1, i}}|^2 & \sum _{i = 1}^{K2} F_{i,j} |{{v}^{{\varDelta }^{(4)}{\varvec{Z}^{(3)}}}_{1, i}}|^2 \end{array}\right) . \)
Thus, from Lemma 1, if the initial value of \(|\varvec{W}^{(4)}|\) is sufficiently small, then Lemma 2 is established.
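As a sanity check on Lemma 2, the 2 × 2 sign structure of \(A^{l}\) derived above can be verified numerically. The following sketch is ours, not part of the original proof; the toy sizes and random draws are assumptions, so it is a minimal illustration rather than a definitive implementation:

import numpy as np

rng = np.random.default_rng(0)
K2, K = 8, 5                                  # toy hidden width and number of concepts
F = rng.uniform(0.1, 1.0, size=K2)            # positive factors F_{i,j} from the proof
v = rng.normal(size=K2)                       # components v_{1,i} of w^(4)_1
s = np.sum(F * v ** 2)                        # the common (positive) entry of A^l
A_l = np.array([[s, -s], [-s, s]])            # the 2 x 2 structure derived above
delta = rng.uniform(0.1, 1.0)                 # Delta^(4)_{1,j} > 0 for j in D^(p)
Delta4_j = np.array([[delta], [-delta]])      # Delta^(4)_{2,j} = -Delta^(4)_{1,j}
z2_j = rng.uniform(0.1, 1.0, size=(1, K))     # z^(2)_j positive for j in D^(p)
G = A_l @ Delta4_j @ z2_j                     # the A^l Delta^(4)_j z^(2)_j^T term
assert (G[0] > 0).all() and (G[1] < 0).all()  # first row positive, second negative

An analogous check applies for \(j \in D^{(n)}\).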
A.1.5 Proof of Lemma 3
Let us define the matrix \(\varvec{M}^{i}\) as \( \varvec{M}^{i} := \mathrm{diag} \left( \frac{f_3(u_i)}{u_i}\right) \) and \(\varvec{A}^{r}\) as \( \varvec{A}^{r} := {\varvec{W}^{(3)}}^\mathrm{T} \varvec{M}^{i} \mathrm{diag} \left( f_3'(\varvec{u}^{(3)}_j)\right) {\varvec{W}^{(3)}}. \) Here, \( \partial \varvec{W}^{(3)} = \frac{1}{N} \sum _{i} \mathrm{diag}\left( f_3'(\varvec{u}^{(3)}_i)\right) {\varvec{W}^{(4)}}^\mathrm{T} {\varDelta ^{(4)}_i}{\varvec{z}^{(2)}_i}^\mathrm{T}. \)
Thus,
where we denote
as \(D^{r}_{i, j, l, m}\).
Considering that \( \varvec{w}^{(4)}_1 \approx - \varvec{w}^{(4)}_2\) (Eq. (5)),
where \(k^{(4)} := {\varvec{w}^{(4)}_1} D^{r}_{i, j, l, m} {\varvec{w}^{(4)}_1}^\mathrm{T} > 0\) because \(D^{r}_{i, j, l, m}\) is a diagonal matrix and each of its diagonal elements is positive.
Moreover, from Eq. (4),
Therefore, from Eq. (2), each element value of
is positive. Thus, each element value of
\( \partial {\varvec{W}^{(3)}}^\mathrm{T} \varvec{M}^{i} \mathrm{diag} \left( f_3'(\varvec{u}^{(3)}_j)\right) \partial {\varvec{W}^{(3)}} \) is positive. Considering that \(\varvec{W}^{(3)}\) is the sum of the values of \(\partial \varvec{W}^{(3)}\) in the previous updates, if the initial value of \(\varvec{W}^{(3)}\) is sufficiently small and N is sufficiently large, then each element of \(\varvec{A}^{r}\) is positive. Thus, from Lemma 1 and the above, the first and second row values of \({\varDelta ^{(4)}_i}{\varvec{z}^{(2)}_i}^\mathrm{T}A^{r}\) are positive and negative, respectively. Thus, if N is sufficiently large, then Lemma 3 is established.
A.1.6 Summary
From \( \partial {\varvec{H}^{(j, t)}} = \frac{1}{N} \sum _{i = 1}^{N} \left( A^{l} {\varDelta ^{(4)}_i}{\varvec{z}^{(2)}_i}^\mathrm{T} + {\varDelta ^{(4)}_i}{\varvec{z}^{(2)}_i}^\mathrm{T} A^{r} \right) \) and Lemmas 2 and 3, the first and second row values of \(E[\partial {\varvec{H}^{(j, t)}}]\) are positive and negative, respectively, for every j. Thus, Proposition 2 is established.
A.1.7 Proof of Proposition 3
Proof
If the following conditions are met for every k:
Cond 1: the values of \({t^+}\) and \({t^-}\) are sufficiently large,
Cond 2: for every word \(w_{k,i^{+}} \in \varOmega _{dw}^{(k)} \cap \varOmega _{pw}^{(k)}\) and \(w_{k,i^{-}} \in \varOmega _{dw}^{(k)} \cap \varOmega _{nw}^{(k)}\), the initial values of \(w^{(2)}_{k,i^{+}}\) and \(w^{(2)}_{k,i^{-}}\) given by Init are positive and sufficiently large, and negative and sufficiently small, respectively,
Cond 3: the initial values of \(|\varvec{W^{(3)}}|\) and \(|\varvec{W^{(4)}}|\) are sufficiently small, and
Cond 4: the values \(|\varOmega _{pw}^{(k, t^+)}|\), \(|\varOmega _{nw}^{(k, t^-)}|\), and \(|\varOmega _m|\) are sufficiently large,
then, from Cond 1, Cond 2, Cond 4, and Proposition 1, Eq. (2) is established. Thus, from Proposition 2 and Cond 3, Proposition 3 is established. \(\square \)
A.1.8 Experimental examples of the influence of Update
Figure 8 shows examples of the mean value of
in the fivefold cross-validation using real datasets. The upper part of Fig. 8 shows the result for the Yahoo dataset, where \(T = 0.02\) and \(K2 = K = 500\), and the lower part shows the result for the News article dataset, where \(K2 = K = 500\). The results demonstrate that the influence of Update converges to zero in accordance with Proposition 3, even when real datasets are used.
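For reference, the plotted quantity, the relative influence of Update in Proposition 3, could be computed per epoch roughly as follows (a minimal sketch of ours, assuming the matrices \(\varvec{H}^{(j, t)}\) and \({\varvec{H}^{*}}^{(j, t)}\) are available as NumPy arrays and \(\Vert \cdot \Vert _{1}\) is taken entrywise):

import numpy as np

def update_influence_ratio(H_list, H_star_list):
    # Proposition 3 ratio at epoch t:
    #   sum_j ||H^(j,t) - H*^(j,t)||_1 / sum_j ||H^(j,t)||_1
    num = sum(np.abs(H - Hs).sum() for H, Hs in zip(H_list, H_star_list))
    den = sum(np.abs(H).sum() for H in H_list)
    return num / den

Plotting this ratio over training epochs yields curves like those in Fig. 8, which should decay toward zero when Proposition 3 applies.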
A.2 Gradient method for assigning polarity scores to terms using a fully connected MLP
We assign sentiment scores to words using the gradient method [2] and a fully connected MLP as follows. Let the output value of the MLP, \(\varvec{y}^{mlp}_{j}\), be \( f^{MLP}(\varvec{v}^{\mathrm{(BOW)}}_j) \in {\mathbb {R}}^{2} \), and let \(D_\mathrm{train}\) be the set of training documents. The sentiment value of word \(w_{k, i}\), \(Gr(w_{k, i})\), is calculated as
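A minimal sketch of this kind of input-gradient scoring (ours, written with PyTorch autograd; the function name, the positive-minus-negative output convention, and the averaging over \(D_\mathrm{train}\) are assumptions rather than the paper's exact formula):

import torch

def word_polarity_scores(mlp, bow_docs):
    # bow_docs: (N, V) float tensor of bag-of-words vectors for documents in D_train.
    # Returns a (V,) tensor: the input gradient of the positive-minus-negative
    # output, averaged over the training documents, in the spirit of [2].
    x = bow_docs.clone().requires_grad_(True)
    y = mlp(x)                            # (N, 2): positive and negative outputs
    (y[:, 0] - y[:, 1]).sum().backward()  # fills x.grad with per-document gradients
    return x.grad.mean(dim=0)             # average sentiment gradient per word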
A.3 Experimental result details
A.3.1 Interpretability evaluation
Table 7 summarizes the interpretability evaluation results for different parameter settings: the mean \(F_1\) scores and their standard deviations.
A.3.2 Market mood predictability evaluation
Tables 8 and 9 show the market mood predictability results of the fivefold cross-validation.
Table 10 shows the mean and standard deviation scores for different parameter settings, where market mood predictability is evaluated in terms of the mean score over the fivefold cross-validation.
A.4 Detailed text visualization results for other initialization settings in Init
Tables 11, 12, and 13 present the detailed text visualization results for other initialization settings in Init.
Cite this article
Ito, T., Sakaji, H., Izumi, K. et al. GINN: gradient interpretable neural networks for visualizing financial texts. Int J Data Sci Anal 9, 431–445 (2020). https://doi.org/10.1007/s41060-018-0160-8