
GINN: gradient interpretable neural networks for visualizing financial texts

  • Regular Paper
  • International Journal of Data Science and Analytics

Abstract

This study aims to visualize financial documents in such a way that even nonexperts can understand the sentiments contained therein. To achieve this, we propose a novel text visualization method using an interpretable neural network (NN) architecture, called a gradient interpretable NN (GINN). A GINN can visualize a market sentiment score from an entire financial document and the sentiment gradient scores in both word and concept units. Moreover, the GINN can visualize important concepts given in various sentence contexts. Such visualization helps nonexperts easily understand financial documents. We theoretically analyze the validity of the GINN and experimentally demonstrate the effectiveness of the text visualization it produces using real financial texts.



Notes

  1. http://textream.yahoo.co.jp/category/1834773.

References

  1. Ravi, K., Ravi, V.: A survey on opinion mining and sentiment analysis: tasks, approaches and applications. Knowl. Based Syst. 89(C), 14–46 (2015)


  2. Hechtlinger, Y.: Interpretation of prediction models using the input gradient. In: NIPS 2016 Workshop on Interpretable Machine Learning in Complex Systems (2016)

  3. Bach, S., Binder, A., Montavon, G., Klauschen, F., Müller, K.R., Samek, W.: On pixel-wise explanations for nonlinear classifier decisions by layer-wise relevance propagation. PLOS ONE 10(7), 1–46 (2015)


  4. Mikolov, T., Chen, K., Sutskever, I., Corrado, G., Dean, J.: Distributed representations of words and phrases and their compositionality. NIPS 2013, 3111–3119 (2013)


  5. Hornik, K., Feinerer, I., Kober, M., Buchta, C.: Spherical k-means clustering. J. Stat. Softw. 50(10), 1–22 (2012)


  6. Yuan, Y., He, L., Peng, L., Huang, Z.: A new study based on word2vec and cluster for document categorization. J. Comput. Inf. Syst. 10(21), 9301–9308 (2014)


  7. Zhao, P., Zhang, T.: Accelerating Minibatch stochastic gradient descent using stratified sampling. arXiv:1405.3080v1 (2014)

  8. Kingma, D.P., Ba, J.L.: Adam: a method for stochastic optimization. In: ICLR (2015)

  9. Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., Salakhutdinov, R.: Dropout: a simple way to prevent neural networks from overfitting. JMLR 15(1), 1929–1958 (2014)


  10. Kudo, T., Yamamoto, K., Matsumoto, Y.: Applying conditional random fields to Japanese morphological analysis. In: EMNLP 2004 (2004)

  11. Fang, A., Macdonald, C., Ounis, I., Habel, P.: Using word embedding to evaluate the coherence of topics from Twitter data. In: SIGIR 2016 (2016)

  12. Řehůřek, R., Sojka, P.: Software framework for topic modelling with large corpora. In: LREC 2010 Workshop (2010)

  13. Shrikumar, A., Greenside, P., Kundaje, A.: Learning important features through propagating activation differences. In: ICML 2017 (2017)

  14. Xu, Q., Zhao, Q., Pei, W., Yang, L., He, Z.: Design interpretable neural network trees through self-organized learning of features. In: IJCNN 2004 (2004)

  15. Zhang, Q., Wu, Y.N., Zhu, S.: Interpretable convolutional neural networks. In: CVPR 2018 (2018)

  16. Mnih, V., Heess, N., Graves, A., Kavukcuoglu, K.: Recurrent models of visual attention. NIPS 2014, 2204–2212 (2014)


  17. Xu, K., Ba, J., Kiros, R., Cho, K., Courville, A., Salakhutdinov, R., Zemel, R., Bengio, Y.: Show, attend and tell: neural image caption generation with visual attention. ICML 2015, 77–81 (2015)


  18. Dong, Y., Su, H., Zhu, J., Zhang, B.: Improving interpretability of deep neural networks with semantic information. In: CVPR 2017 (2017)

  19. Patrik, E.K., Liu, Y.: A survey on interactivity in topic models. IJACSA 7(4), 456–461 (2016)


  20. Lund, J., Cook, C., Seppi, K., Boyd-Graber, J.: Tandem anchoring: a multiword anchor approach for interactive topic modeling. In: ACL 2017, pp. 896–905 (2017)

  21. Hu, L., Jian, S., Cao, L., Chen, Q.: Interpretable recommendation via attraction modeling: learning multilevel attractiveness over multimodal movie contents. In: IJCAI 2018 (2018)

  22. Yang, Z., Yang, D., Dyer, C., He, X., Smola, A., Hovy, E.: Hierarchical attention networks for document classification. In: NAACL 2016 (2016)

  23. Rahman, M.K.M., Chow, W.S.C.: Content-based hierarchical document organization using multi-layer hybrid network and tree-structured features. Expert Syst. Appl. 37(4), 2874–2881 (2010)


  24. Zhao, H., Du, L., Buntine, W., Zhou, M.: Inter and intra topic structure learning with word embeddings. In: ICML 2018 (2018)

  25. Hasan, M., Rundensteiner, E., Agu, E.: Automatic emotion detection in text streams by analyzing Twitter data. Int. J. Data Sci. Anal. (2018). https://doi.org/10.1007/s41060-018-0096-z

  26. Barranco, R.C., Boedihardjo, A.P., Hossain, M.S.: Analyzing evolving stories in news articles. Int. J. Data Sci. Anal. (2017). https://doi.org/10.1007/s41060-017-0091-9

  27. Ito, T., Sakaji, H., Tsubouchi, K., Izumi, K., Yamashita, T.: Text-visualizing neural network model: understanding online financial textual data. In: PAKDD 2018 (2018)


Acknowledgements

This work was supported in part by JSPS KAKENHI Grant No. JP17J04768.

Author information

Correspondence to Tomoki Ito.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This paper is an extended version of the PAKDD 2018 long presentation paper "Text-visualizing Neural Network Model: Understanding Online Financial Textual Data" [27].

A Appendix

A.1 Theoretical analysis of the II algorithm

This section theoretically explains the validity of the II algorithm. Let \(\varOmega _{dw}^{(k)}\) be a set of words in the polarity dictionary included in the kth cluster. Then, Propositions 1–3 are established.

Proposition 1

If Update is utilized for the parameter updates, then

$$\begin{aligned} \left\{ \begin{array}{ll} E[\partial {w}_{k, i}^{(2)}]< 0 &{} \left( \frac{p^{+}(w_{k,i})}{p^{-}(w_{k,i})}> \frac{E[|\varDelta ^{(2)*}_{j, k}|| z_{j, i}^{(1, k)} = 1 \cap j \in D^{(n)}]}{E[|\varDelta ^{(2)*}_{j, k}| | z_{j ,i}^{(1, k)} = 1 \cap j \in D^{(p)}]}\right) \\ E[\partial {w}_{k, i}^{(2)}] > 0 &{} \left( \frac{p^{+}(w_{k,i})}{p^{-}(w_{k,i})} < \frac{E[|\varDelta ^{(2)*}_{j, k}|| z_{j, i}^{(1, k)} = 1 \cap j \in D^{(n)}]}{E[|\varDelta ^{(2)*}_{j, k}| | z_{j ,i}^{(1, k)} = 1 \cap j \in D^{(p)}]}\right) \end{array} \right. . \end{aligned}$$
(1)

Proposition 1 indicates that if Cond 1: the values of \({t^+}\) and \({t^-}\) are sufficiently large, and Cond 2: for every word \(w_{k,i^{+}} \in \varOmega _{dw}^{(k)} \cap \varOmega _{pw}^{(k)}\) and \(w_{k,i^{-}} \in \varOmega _{dw}^{(k)} \cap \varOmega _{nw}^{(k)}\), the initial values of \(w^{(2)}_{k,i^{+}}\) and \(w^{(2)}_{k,i^{-}}\) given by Init are positive and sufficiently large, and negative and sufficiently small, respectively, are met for every k, then the II algorithm is expected to award each positive word \(\in \varOmega _{pw}^{(k)}\) (negative word \(\in \varOmega _{nw}^{(k)}\)) a positive (negative) sentiment score. Let \({\varvec{H}^{d}}^{(j, t)}\) be \(\varvec{H}^{(j, t)} - {\varvec{H}^{*}}^{(j, t)}\). Then, the following propositions, which are important for explaining the market mood predictability of the GINN, are established.

Proposition 2

If the initial values of \(|\varvec{W^{(3)}}|\) and \(|\varvec{W^{(4)}}|\) are sufficiently small (Cond 3), and for every \(j \in \varOmega ^{(t)}_m\), the values of \(\varvec{z}^{(2)}_{j}\) are \( \left\{ \begin{array}{ll} \mathrm{positive} &{} (j \in D^{(p)}) \\ \mathrm{negative} &{} (j \in D^{(n)}) \end{array} \right. \), then the first and second row vector values of \(\partial \varvec{H}^{(j, t)}\) are positive and negative, respectively, and

$$\begin{aligned} \frac{\sum _{j \in \varOmega ^{(t+1)}_m} \Vert {\varvec{H}^{d}}^{(j, t+1)} \Vert _{1} }{\sum _{j \in \varOmega ^{(t+1)}_m} \Vert \varvec{H}^{(j, t+1)}\Vert _{1}} \le \frac{\sum _{j \in \varOmega ^{(t+1)}_m} \Vert {\varvec{H}^{d}}^{(j, t)} \Vert _{1} }{\sum _{j \in \varOmega ^{(t+1)}_m} \Vert \varvec{H}^{(j, t)}\Vert _{1}}. \end{aligned}$$

Proposition 3

If, for every k, Conds. 1–3 are established, and the values \(|\varOmega _{pw}^{(k, t^+)}|\), \(|\varOmega _{nw}^{(k, t^-)}|\), and \(|\varOmega _m|\) are sufficiently large, then \(\lim _{t \rightarrow \infty } \frac{\sum _{j \in \varOmega ^{(t)}_m} \Vert {\varvec{H}^{d}}^{(j, t)} \Vert _{1} }{\sum _{j \in \varOmega ^{(t)}_m} \Vert \varvec{H}^{(j, t)}\Vert _{1}} = 0\).

Propositions 2 and 3 indicate that we can obtain a locally optimal solution using the II algorithm in an ideal case because the influence of Update disappears over time. These propositions also show that Init maintains model predictability because Init is useful for satisfying Cond 2.

Proposition 1 explains the interpretability of the GINN, and Propositions 2 and 3 confirm the predictability of the GINN in an ideal case.

A.1.1 Proof of Proposition 1

Proof

Here, for every \(k (\le K)\), if \(j \in D^{(p)}\), then \({\varDelta }^{(2)*}_{k, j} \le 0\), and if \(j \in D^{(n)}\), then \({\varDelta }^{(2)*}_{k, j} \ge 0\). Thus,

$$\begin{aligned} E[\partial {w}_{k, i}^{(2)}]= & {} E\left[ \frac{1}{N} \sum _{j \in \varOmega _m} {\varDelta }^{(2)*}_{k, j}{z}_{j, i}^{(1, k)}\right] \\= & {} \mathrm{freq}({w}_{k, i})\left( p^{-}({w}_{k, i})E\left[ {\varDelta }^{(2)*}_{k, j} \left| z_{j, i}^{(1, k)} = 1 \cap j \in D^{(n)} \right. \right] \right. \\&\left. +\, p^{+}({w}_{k, i})E\left[ {\varDelta }^{(2)*}_{k, j} \left| z_{j, i}^{(1, k)} = 1 \cap j \in D^{(p)}\right. \right] \right) . \end{aligned}$$

Comparing the magnitudes of these two terms, whose signs are opposite, yields the two cases of Eq. (1). Therefore, Proposition 1 can be established. \(\square \)
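
To make the sign condition concrete, the following toy Monte Carlo sketch (not from the paper; the occurrence probabilities and the \(\varDelta \) distributions are invented stand-ins) estimates \(E[\partial w^{(2)}_{k,i}]\) and compares it against the ratio test of Eq. (1):

```python
import numpy as np

# Toy Monte Carlo sketch of Proposition 1 (all distributions invented).
# With equal class priors, p_pos / p_neg below plays the role of
# p^+(w) / p^-(w) in Eq. (1).
rng = np.random.default_rng(0)

p_pos, p_neg = 0.8, 0.2        # occurrence prob. of the word in pos./neg. docs
mean_abs_delta_pos = 0.3       # E[|Delta^(2)*| | j in D^(p)]; Delta <= 0 there
mean_abs_delta_neg = 0.3       # E[|Delta^(2)*| | j in D^(n)]; Delta >= 0 there
n_docs = 200_000

is_positive = rng.random(n_docs) < 0.5
occurs = np.where(is_positive,
                  rng.random(n_docs) < p_pos,
                  rng.random(n_docs) < p_neg)
delta = np.where(is_positive,
                 -rng.exponential(mean_abs_delta_pos, n_docs),
                 rng.exponential(mean_abs_delta_neg, n_docs))

grad = (delta * occurs).mean()     # estimate of E[partial w^(2)_{k,i}]
print(f"E[grad] ~ {grad:+.4f}")    # negative here, since p+/p- = 4 > 0.3/0.3
# A negative expected gradient means gradient descent increases the word's
# weight, i.e., the positive word receives a positive sentiment score.
```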

A.1.2 Proof of Proposition 2

Proof

Let us denote \(\varvec{Z}^{(2)} := [\varvec{v}^{(CS)}_{m(1)}, \ldots , \varvec{v}^{(CS)}_{m(N)}] (\in {\mathbb {R}}^{K \times N})\), \(\varvec{U}^{(2)} := \tanh ^{-1}(\varvec{Z}^{(2)})\), and \(\varvec{U}^{(3)} := \varvec{W}^{(3)}\varvec{Z}^{(2)}\); let \(\varvec{u}^{(l)}_j\) be the jth column of \(\varvec{U}^{(l)}\) (\(l = 2, 3\)), and let \(\varvec{z}^{(l)}_j\) and \(z^{(l)}_{i,j}\) be the jth column and the \((i, j)\) component of \(\varvec{Z}^{(l)}\) (\(l = 2\)), respectively. We approximate \(\partial {\varvec{H}^{(j, t)}}\) as follows:

$$\begin{aligned} \partial {\varvec{H}^{(j, t)}}= & {} \partial \left( {\varvec{W}^{(4)}}\mathrm{diag} \left( f_3'(\varvec{u}^{(3)}_j)\right) {\varvec{W}^{(3)}}\right) \\\approx & {} \partial ({\varvec{W}^{(4)}})\mathrm{diag} \left( f_3'(\varvec{u}^{(3)}_j)\right) {\varvec{W}^{(3)}} \\&+\, {\varvec{W}^{(4)}}\mathrm{diag} \left( f_3'(\varvec{u}^{(3)}_j)\right) \partial {\varvec{W}^{(3)}}. \end{aligned}$$

First, we confirm that if for every \(j \in \varOmega ^{(t)}_m\), the values of \(\varvec{z}^{(2)}_{j}\) are \( \left\{ \begin{array}{ll} \mathrm{positive} &{} (j \in D^{(p)}) \\ \mathrm{negative} &{} (j \in D^{(n)}) \end{array} \right. \) , then the following three lemmas are established. \(\square \)

Lemma 1

The first and second row vector values of \({\varDelta ^{(4)}_j}{\varvec{z}^{(2)}_j}^\mathrm{T}\) are positive and negative, respectively.

Lemma 2

The first and second rows of

$$\begin{aligned} {\varvec{W}^{(4)}}\mathrm{diag} \left( f_3'(\varvec{u}^{(3)}_j)\right) \partial {\varvec{W}^{(3)}} \end{aligned}$$

are positive and negative, respectively.

Lemma 3

The first and second rows of

$$\begin{aligned} \partial ({\varvec{W}^{(4)}})\mathrm{diag} \left( f_3'(\varvec{u}^{(3)}_j)\right) {\varvec{W}^{(3)}} \end{aligned}$$

are positive and negative, respectively.

A.1.3 Proof of Lemma 1

From the condition,

$$\begin{aligned} {z}^{(2)}_{k, j} \left\{ \begin{array}{ll} > 0 &{} (j \in D^{(p)}) \\ < 0 &{} (j \in D^{(n)}). \end{array} \right. \end{aligned}$$
(2)

Moreover, from the definition of \(\varvec{d}_j\) in Eq. (3), Eq. (4) is established.

$$\begin{aligned}&\left\{ \begin{array}{ll} \varvec{d_j} = (0,1)^\mathrm{T} &{} ( j \in D^{(p)}), \\ \varvec{d_j} = (1,0)^\mathrm{T} &{} ( j \in D^{(n)}) \end{array} \right. , \end{aligned}$$
(3)
$$\begin{aligned}&\varDelta ^{(4)}_j := \varvec{y}_j - \varvec{d}_j = \left\{ \begin{array}{ll} \left( |\varDelta ^{(4)}_{1, j}|, -|\varDelta ^{(4)}_{1, j}|\right) ^\mathrm{T} &{} (j \in D^{(p)})\\ \left( -|\varDelta ^{(4)}_{1, j}|, |\varDelta ^{(4)}_{1, j}|\right) ^\mathrm{T} &{} (j \in D^{(n)}). \end{array} \right. \end{aligned}$$
(4)

Thus, from Eqs. (2) and (4), Lemma 1 is established.
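
As a quick numerical sanity check (assuming, purely for illustration, that \(\varvec{y}_j\) comes from a two-class softmax so that its components sum to one), the antisymmetric structure of Eq. (4) can be verified directly:

```python
import numpy as np

# Verify Delta^(4)_j = y_j - d_j has the (+a, -a) / (-a, +a) structure of
# Eq. (4) when y_j is a two-class softmax output (components sum to one)
# and d_j is the one-hot target of Eq. (3). Logit values are arbitrary.
def softmax(u):
    e = np.exp(u - u.max())
    return e / e.sum()

y = softmax(np.array([1.3, -0.4]))          # model output, y[0] + y[1] == 1
for d in (np.array([0.0, 1.0]),             # d_j for j in D^(p), per Eq. (3)
          np.array([1.0, 0.0])):            # d_j for j in D^(n)
    delta = y - d
    assert np.isclose(delta[0], -delta[1])  # Delta_1 = -Delta_2
    print(d, "->", delta)
```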

A.1.4 Proof of Lemma 2

$$\begin{aligned}&{\varvec{W}^{(4)}}\mathrm{diag} \left( f_3'(\varvec{u}^{(3)}_j)\right) \partial {\varvec{W}^{(3)}} \\&\quad =\frac{1}{N}\sum _{i = 1}^{N} {\varvec{W}^{(4)}}\mathrm{diag} \left( f_3'(\varvec{u}^{(3)}_j)\right) \mathrm{diag} \left( f_3'(\varvec{u}^{(3)}_i)\right) {\varvec{W}^{(4)}}^\mathrm{T} {\varDelta ^{(4)}_i}{\varvec{z}^{(2)}_i}^\mathrm{T}. \end{aligned}$$

Here, \( \partial \varvec{w}^{(4)}_1 = - \partial \varvec{w}^{(4)}_2 \) because \(\partial \varvec{W}^{(4)} = {\varDelta }^{(4)}{\varvec{Z}^{(3)}}^\mathrm{T}\) and \({\varDelta }^{(4)}_{1, j} = - {\varDelta }^{(4)}_{2, j}\) for every j. Considering that \(\varvec{W}^{(4)}\) is the sum of the values of \(\partial \varvec{W}^{(4)}\) in the previous updates, if the initial value of \(|\varvec{W}^{(4)}|\) is sufficiently small, then we can approximate it as

$$\begin{aligned} \varvec{w}^{(4)}_1 \approx - \varvec{w}^{(4)}_2. \end{aligned}$$
(5)

Let us denote \(A^{l}\) as

$$\begin{aligned} A^{l} :={\varvec{W}^{(4)}}\mathrm{diag} \left( f_3'(\varvec{u}^{(3)}_j)\right) \mathrm{diag} (f_3'(\varvec{u}^{(3)}_i)){\varvec{W}^{(4)}}^\mathrm{T}. \end{aligned}$$

We define \({v}^{{\varDelta }^{(4)}{\varvec{Z}^{(3)}}}_{1, i}\) as the ith component of \(\varvec{w}^{(4)}_1\) and \(F_{i,j}\) as the \((i, i)\) component of \(\mathrm{diag} \left( f_3'(\varvec{u}^{(3)}_j)\right) \mathrm{diag} \left( f_3'(\varvec{u}^{(3)}_i)\right) \). Then, \( A^{l} = \left( \begin{array}{ll} \sum _{i = 1}^{K2} F_{i,j} |{{v}^{{\varDelta }^{(4)}{\varvec{Z}^{(3)}}}_{1, i}}|^2 &{} - \sum _{i = 1}^{K2} F_{i,j} |{{v}^{{\varDelta }^{(4)}{\varvec{Z}^{(3)}}}_{1, i}}|^2 \\ - \sum _{i = 1}^{K2} F_{i,j} |{{v}^{{\varDelta }^{(4)}{\varvec{Z}^{(3)}}}_{1, i}}|^2 &{} \sum _{i = 1}^{K2} F_{i,j} |{{v}^{{\varDelta }^{(4)}{\varvec{Z}^{(3)}}}_{1, i}}|^2 \end{array}\right) . \)

Thus, from Lemma 1, if the initial value of \(|\varvec{W}^{(4)}|\) is sufficiently small, then Lemma 2 is established.
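
The rank-one sign pattern of \(A^{l}\) under Eq. (5) can likewise be checked with stand-in values (the weights and diagonal factors below are arbitrary, not taken from a trained model):

```python
import numpy as np

# Stand-in check of the 2x2 pattern of A^l = W4 F W4^T when the rows of W4
# are negatives of each other (Eq. (5)); F stands for the product of the
# two diag(f_3') factors, whose diagonal is positive for tanh-like f_3.
rng = np.random.default_rng(1)
K2 = 5
w1 = rng.normal(size=K2)
W4 = np.vstack([w1, -w1])                  # w^(4)_2 = -w^(4)_1
F = np.diag(rng.uniform(0.1, 1.0, K2))     # positive diagonal entries

A_l = W4 @ F @ W4.T
print(A_l)   # pattern [[ s, -s], [-s, s]] with s = sum_i F_ii * w1_i^2 > 0
```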

A.1.5 Proof of Lemma 3

$$\begin{aligned}&\partial ({\varvec{W}^{(4)}})\mathrm{diag} \left( f_3'(\varvec{u}^{(3)}_j)\right) {\varvec{W}^{(3)}} \\&\quad = \frac{1}{N}{\varDelta }^{(4)}{f_3({\varvec{Z}^{(2)}}^\mathrm{T}{\varvec{W}^{(3)}}^\mathrm{T})} \mathrm{diag} \left( f_3'(\varvec{u}^{(3)}_j)\right) {\varvec{W}^{(3)}} \\&\quad = \frac{1}{N} \sum _{i = 1}^{N} {\varDelta }^{(4)}_i{f_3\left( {\varvec{z}^{(2)}_i}^\mathrm{T}{\varvec{W}^{(3)}}^\mathrm{T}\right) } \mathrm{diag} \left( f_3'(\varvec{u}^{(3)}_j)\right) {\varvec{W}^{(3)}} \\&\quad = \frac{1}{N} \sum _{i = 1}^{N}{\varDelta }^{(4)}_i {\varvec{z}^{(2)}_i}^\mathrm{T}{\varvec{W}^{(3)}}^\mathrm{T} \varvec{M}^{i} \mathrm{diag} \left( f_3'(\varvec{u}^{(3)}_j)\right) {\varvec{W}^{(3)}}. \end{aligned}$$

Let us define the matrix \(\varvec{M}^{i}\) as \( \varvec{M}^{i} := \mathrm{diag} \left( \frac{f_3(\varvec{u}^{(3)}_i)}{\varvec{u}^{(3)}_i}\right) \) and \(\varvec{A}^{r}\) as \( \varvec{A}^{r} := {\varvec{W}^{(3)}}^\mathrm{T} \varvec{M}^{i} \mathrm{diag} \left( f_3'(\varvec{u}^{(3)}_j)\right) {\varvec{W}^{(3)}}. \) Here, \( \partial \varvec{W}^{(3)} = \frac{1}{N} \sum _{i} \mathrm{diag}\left( f_3'(\varvec{u}^{(3)}_i)\right) {\varvec{W}^{(4)}}^\mathrm{T} {\varDelta ^{(4)}_i}{\varvec{z}^{(2)}_i}^\mathrm{T}. \)

Thus,

$$\begin{aligned}&\partial {\varvec{W}^{(3)}}^\mathrm{T} \varvec{M}^{i} \mathrm{diag} \left( f_3'(\varvec{u}^{(3)}_j)\right) \partial {\varvec{W}^{(3)}} \\&\quad = \frac{1}{N^2} \sum _{l \in \varOmega _m} \sum _{m \in \varOmega _m} {\varvec{z}^{(2)}_l}{\varDelta ^{(4)}_l}^\mathrm{T} {\varvec{W}^{(4)}} D^{r}_{i, j, l, m} {\varvec{W}^{(4)}}^\mathrm{T} {\varDelta ^{(4)}_m}{\varvec{z}^{(2)}_m}^\mathrm{T}, \end{aligned}$$

where we denote

$$\begin{aligned} \mathrm{diag}\left( f_3'(\varvec{u}^{(3)}_l)\right) \varvec{M}^{i} \mathrm{diag} \left( f_3'(\varvec{u}^{(3)}_j)\right) \mathrm{diag}\left( f_3'(\varvec{u}^{(3)}_m)\right) \end{aligned}$$

as \(D^{r}_{i, j, l, m}\).

Considering that \( \varvec{w}^{(4)}_1 \approx - \varvec{w}^{(4)}_2\) (Eq. (5)),

$$\begin{aligned} {\varvec{W}^{(4)}} D^{r}_{i, j, l, m} {\varvec{W}^{(4)}}^\mathrm{T} \approx \left( \begin{array}{cc} k^{(4)} &{} -k^{(4)} \\ -k^{(4)} &{} k^{(4)} \end{array} \right) , \end{aligned}$$

where \(k^{(4)} := {\varvec{w}^{(4)}_1} D^{r}_{i, j, l, m} {\varvec{w}^{(4)}_1}^\mathrm{T} > 0\) because \(D^{r}_{i, j, l, m}\) is a diagonal matrix whose diagonal elements are all positive.

Moreover, from Eq. (4),

$$\begin{aligned} {\varDelta ^{(4)}_l}^\mathrm{T} \left( \begin{array}{cc} k^{(4)} &{} -k^{(4)} \\ -k^{(4)} &{} k^{(4)} \end{array} \right) {\varDelta ^{(4)}_m} \left\{ \begin{array}{cc} > 0 &{} (\varvec{d}_{l} = \varvec{d}_{m} )\\ < 0 &{} (\varvec{d}_{l} \ne \varvec{d}_{m}) \end{array} \right. . \end{aligned}$$

Therefore, from Eq. (2), each element value of

$$\begin{aligned} {\varvec{z}^{(2)}_l} {\varDelta ^{(4)}_l}^\mathrm{T} \left( \begin{array}{cc} k^{(4)} &{} -k^{(4)} \\ -k^{(4)} &{} k^{(4)} \end{array} \right) {\varDelta ^{(4)}_m} {\varvec{z}^{(2)}_m}^\mathrm{T} \end{aligned}$$

is positive. Thus, each element value of

\( \partial {\varvec{W}^{(3)}}^\mathrm{T} \varvec{M}^{i} \mathrm{diag} \left( f_3'(\varvec{u}^{(3)}_j)\right) \partial {\varvec{W}^{(3)}} \) is positive. Considering that \(\varvec{W}^{(3)}\) is the sum of the values of \(\partial \varvec{W}^{(3)}\) in the previous updates, if the initial value of \(|\varvec{W}^{(3)}|\) is sufficiently small and N is sufficiently large, then each element value of \(\varvec{A}^{r}\) is positive. Thus, from Lemma 1 and the above, the first and second row values of \({\varDelta ^{(4)}_i}{\varvec{z}^{(2)}_i}^\mathrm{T}\varvec{A}^{r}\) are positive and negative, respectively. Thus, if N is sufficiently large, then Lemma 3 is established.
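
The sign rule below Eq. (5), positive when documents l and m share a label and negative otherwise, can also be checked numerically with arbitrary stand-in magnitudes:

```python
import numpy as np

# Arbitrary-magnitude check that Delta_l^T [[k, -k], [-k, k]] Delta_m is
# positive when documents l and m share a label and negative otherwise.
k4 = 0.7
K = np.array([[k4, -k4], [-k4, k4]])

def delta(a, positive):
    # Delta^(4) from Eq. (4) with |Delta_1| = a
    return np.array([a, -a]) if positive else np.array([-a, a])

d_l = delta(0.4, True)
for same_label in (True, False):
    d_m = delta(0.6, same_label)
    sign = d_l @ K @ d_m
    print("same label" if same_label else "different labels", "->", sign)
```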

A.1.6 Summary

From \( \partial {\varvec{H}^{(j, t)}} = \frac{1}{N} \sum _{i = 1}^{N} \left( A^{l} {\varDelta ^{(4)}_i}{\varvec{z}^{(2)}_i}^\mathrm{T} + {\varDelta ^{(4)}_i}{\varvec{z}^{(2)}_i}^\mathrm{T} \varvec{A}^{r} \right) \) and Lemmas 2 and 3, the first and second row values of \(E[\partial {\varvec{H}^{(j, t)}}]\) are positive and negative, respectively, for every j. Thus, Proposition 2 is established.

A.1.7 Explanation of Proposition 3

Proof

If the following conditions are met for every k:

Cond 1: the values of \({t^+}\) and \({t^-}\) are sufficiently large,

Cond 2: for every word \(w_{k,i^{+}} \in \varOmega _{dw}^{(k)} \cap \varOmega _{pw}^{(k)}\) and \(w_{k,i^{-}} \in \varOmega _{dw}^{(k)} \cap \varOmega _{nw}^{(k)}\), the initial values of \(w^{(2)}_{k,i^{+}}\) and \(w^{(2)}_{k,i^{-}}\) given by Init are positive and sufficiently large, and negative and sufficiently small, respectively,

Cond 3: the initial values of \(|\varvec{W^{(3)}}|\) and \(|\varvec{W^{(4)}}|\) are sufficiently small, and

Cond 4: the values \(|\varOmega _{pw}^{(k, t^+)}|\), \(|\varOmega _{nw}^{(k, t^-)}|\), and \(|\varOmega _m|\) are sufficiently large,

then, from Cond 1, Cond 2, Cond 4, and Proposition 1, Eq. (2) is established. Thus, from Proposition 2 and Cond 3, Proposition 3 is established. \(\square \)

A.1.8 Experimental examples of the influence of Update

Figure 8 shows examples of the mean value of

$$\begin{aligned} R_t := \frac{\sum _{j \in \varOmega _m} \Vert {\varvec{H}^{d}}^{(j, t)} \Vert _{1}}{\sum _{j \in \varOmega _m} \Vert \varvec{H}^{(j, t)}\Vert _{1}} \end{aligned}$$

in the fivefold cross-validation using real datasets. The upper part of Fig. 8 shows the result for the Yahoo dataset, where \(T = 0.02\) and \(K2 = K = 500\), and the lower part shows the result for the News article dataset, where \(K2 = K = 500\). The results demonstrate that the influence of Update converges to zero in accordance with Proposition 3, even when real datasets are used.

Fig. 8 Experimental example. Influence of Update, \(R_t\), for iteration t. The upper part shows the result for the Yahoo dataset, and the lower part shows that for the News article dataset
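
For reference, \(R_t\) is simply a ratio of 1-norms accumulated over the minibatch; a minimal helper might look like the sketch below, where the entrywise reading of \(\Vert \cdot \Vert _1\) and the argument names are assumptions, not taken from the paper's code:

```python
import numpy as np

def update_influence_ratio(H_list, H_star_list):
    """R_t as in the formula above: the entrywise 1-norm of H^d = H - H*,
    summed over the minibatch Omega_m, divided by the summed entrywise
    1-norm of H. H_list and H_star_list are hypothetical stand-ins for the
    per-document gradient matrices at iteration t."""
    num = sum(np.abs(H - H_star).sum() for H, H_star in zip(H_list, H_star_list))
    den = sum(np.abs(H).sum() for H in H_list)
    return num / den

# Toy usage with random stand-in matrices for one iteration t:
rng = np.random.default_rng(0)
H = [rng.normal(size=(2, 5)) for _ in range(4)]
H_star = [h + rng.normal(scale=0.1, size=h.shape) for h in H]
print(update_influence_ratio(H, H_star))  # small ratio -> weak Update influence
```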

A.2 Gradient method for assigning polarity scores to terms using a fully connected MLP

We assign sentiment scores to words using the gradient method [2] and the fully connected MLP as follows. Let the output value of the fully connected MLP, \(\varvec{y}^{mlp}_{j} \), be \( f^{MLP}(\varvec{v}^{\mathrm{(BOW)}}_j) \in {\mathbb {R}}^{2} \), and let \(D_\mathrm{train}\) be the set of training documents. The sentiment score of word \(w_{k, i}\), \(\mathrm{Gr}(w_{k, i})\), is calculated as

$$\begin{aligned}&\varvec{y}^{mlp+}_{j} := \varvec{y}^{mlp}_{j} \odot (1,0)^\mathrm{T}, \quad \varvec{y}^{mlp-}_{j} := \varvec{y}^{mlp}_{j} \odot (0,1)^\mathrm{T}, \\&\mathrm{Gr}(w_{k, i}) := \frac{1}{|D_{\mathrm{train}}|} \sum _{j \in D_{\mathrm{train}}} \left( \frac{\partial \varvec{y}^{mlp+}_{j}}{\partial {z}^{(1, k)}_{j, i}} - \frac{\partial \varvec{y}^{mlp-}_{j}}{\partial {z}^{(1, k)}_{j, i}} \right) . \end{aligned}$$
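
A compact sketch of this input-gradient scoring with a toy fully connected MLP (one tanh hidden layer and a linear two-unit output; the architecture, weights, and bag-of-words inputs are illustrative stand-ins, not the paper's trained model):

```python
import numpy as np

rng = np.random.default_rng(2)
V, H = 30, 8                          # vocabulary size, hidden units (toy sizes)
W1 = rng.normal(0, 0.3, size=(H, V))
W2 = rng.normal(0, 0.3, size=(2, H))  # row 0: positive unit, row 1: negative unit

def input_gradient(x):
    """d(y_pos - y_neg)/dx for the toy network y = W2 @ tanh(W1 @ x)."""
    h = np.tanh(W1 @ x)
    jac = (W2 * (1.0 - h ** 2)) @ W1   # Jacobian dy/dx, shape (2, V)
    return jac[0] - jac[1]             # gradient of y^{mlp+} minus y^{mlp-}

# Toy bag-of-words documents standing in for v^(BOW)_j, j in D_train.
X = (rng.random((100, V)) < 0.1).astype(float)
Gr = np.mean([input_gradient(x) for x in X], axis=0)   # per-word scores
print("most positive word id:", int(Gr.argmax()),
      "most negative word id:", int(Gr.argmin()))
```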

A.3 Experimental result details

A.3.1 Interpretability evaluation

Table 7 summarizes the interpretability evaluation results for different parameter settings: the mean \(F_1\) scores and the standard deviations.

Table 7 Interpretability evaluation results from different parameter settings (the mean value ± the standard deviation results)

A.3.2 Market mood predictability evaluation

Tables 8 and 9 show the market mood predictability results of the fivefold cross-validation.

Table 8 Market mood predictability evaluation results
Table 9 Details of market mood predictability results

Table 10 shows the mean scores and standard deviation values from different parameter settings when the market mood predictability is evaluated in terms of the mean score of the fivefold cross-validation.

Table 10 Market mood predictability evaluation results (the mean value ± the standard deviation results from different parameter settings)

A.4 Detailed text visualization results for other initialization settings in Init

Tables 11, 12, and 13 present the detailed text visualization results for other initialization settings in Init.

Table 11 Market mood predictability results for the GINN model when the number of words in Init decreases
Table 12 Interpretability evaluation results for the GINN model when the number of words in Init decreases
Table 13 Human interpretability evaluation results for the GINN model when the number of words in Init decreases


About this article


Cite this article

Ito, T., Sakaji, H., Izumi, K. et al. GINN: gradient interpretable neural networks for visualizing financial texts. Int J Data Sci Anal 9, 431–445 (2020). https://doi.org/10.1007/s41060-018-0160-8

