Abstract
This paper demonstrates the application of Natural Language Processing (NLP) tools to explore large libraries of documents and to establish heuristic associations between text descriptions in figure captions and interpretations of the corresponding images and figures. Visualization tools based on NLP methods allow one to quickly assess the extent of the research on a specific topic described in the literature. The authors demonstrate that applying NLP methods to the figure captions alone, without navigating the entire text of each document, can accelerate the assessment of the literature in a given domain.
Acknowledgments
We gratefully acknowledge support from the National Science Foundation (NSF) DIBBs program, award number 1640867. The authors would also like to acknowledge support from the Toyota Research Institute Accelerated Materials Design and Discovery program. K.R. acknowledges the Erich Bloch Endowed Chair at the University at Buffalo.
Cite this article
Venugopal, V., Broderick, S.R. & Rajan, K. A picture is worth a thousand words: applying natural language processing tools for creating a quantum materials database map. MRS Communications 9, 1134–1141 (2019). https://doi.org/10.1557/mrc.2019.136
DOI: https://doi.org/10.1557/mrc.2019.136