Abstract
In a relational table, core columns represent the primary subject entities that other columns in the table depend on. While discovering core columns is crucial for understanding a table’s semantic column types, column relations, and entities, it is often overlooked. Previous methods typically rely on heuristic rules or contextual information, which can fail to accurately capture the dependencies between columns and make it difficult to preserve their relationships. To address these challenges, we introduce Dependency-aware Core Column Discovery (DaCo), an iterative method that uses a novel rough matching strategy to identify both inter-column dependencies and core columns. Unlike other methods, DaCo does not require labeled data or contextual information, making it suitable for practical scenarios. Additionally, it can identify multiple core columns within a table, which is common in real-world tables. Our experimental results demonstrate that DaCo outperforms existing core column discovery methods, substantially improving the efficiency of table understanding tasks.
Notes
1. By dependency, we mean that column y is an attribute of column x if y depends on x [28].
2. Our solution can be applied to table understanding tasks, since research on table understanding assumes an overlap between the table and the KG.
3. https://github.com/barrel-0314/daco.
4. It includes tough tables generated from SemTab for the tabular data to KG matching problem.
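The notion of dependency in Note 1 can be illustrated with a minimal sketch. It assumes the simplest possible reading, an exact functional dependency in which every value of column x maps to exactly one value of column y. This is an illustrative assumption only: it is not DaCo's rough matching strategy, and the function names `depends_on` and `dependent_columns` and the sample table are hypothetical.

```python
def depends_on(rows, x, y):
    """Return True if every value in column x maps to a single value in column y."""
    seen = {}
    for row in rows:
        key, value = row[x], row[y]
        if key in seen and seen[key] != value:
            # The same x value occurs with two different y values,
            # so y is not determined by x.
            return False
        seen[key] = value
    return True


def dependent_columns(rows, x):
    """List the columns (other than x) that depend on column x."""
    columns = list(rows[0].keys()) if rows else []
    return [c for c in columns if c != x and depends_on(rows, x, c)]


# Hypothetical toy table, not taken from the paper's datasets.
rows = [
    {"country": "France", "capital": "Paris", "continent": "Europe"},
    {"country": "Japan", "capital": "Tokyo", "continent": "Asia"},
    {"country": "Italy", "capital": "Rome", "continent": "Europe"},
]

print(dependent_columns(rows, "country"))    # ['capital', 'continent']
print(dependent_columns(rows, "continent"))  # []
```

In this toy table, `country` determines both other columns while `continent` determines none, which matches the intuition that a core column is one that other columns depend on. Real tables are noisy, so an exact check like this would likely be too strict in practice.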
References
T2D gold standard for matching web tables to DBpedia (2015). http://webdatacommons.org/webtables/goldstandard.html
GitTables benchmark - column type detection (2021). https://zenodo.org/record/5706316#.YxAVU9NBw2x
SemTab 2021: Semantic web challenge on tabular data to knowledge graph matching (2021). http://www.cs.ox.ac.uk/isg/challenges/sem-tab/2021/
Bhagavatula, C.S., Noraset, T., Downey, D.: TabEL: entity linking in web tables. In: Arenas, M., et al. (eds.) ISWC 2015. LNCS, vol. 9366, pp. 425–441. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-25007-6_25
Birnick, J., Blasius, T., Friedrich, T., Naumann, F., Papenbrock, T., Schirneck, M.: Hitting set enumeration with partial information for unique column combination discovery. In: Proceedings of the VLDB Endowment, vol. 13, pp. 2070–2083 (2020)
Bornemann, L., Bleifuß, T., Kalashnikov, D.V., Naumann, F., Srivastava, D.: Natural key discovery in Wikipedia tables. In: Proceedings of The Web Conference 2020, pp. 2789–2795 (2020)
Cafarella, M.J., Halevy, A., Wang, D.: WebTables: exploring the power of tables on the web. In: Proceedings of the VLDB Endowment, pp. 538–549 (2008)
Cafarella, M.J., Halevy, A., Wang, D., Wu, E., Zhang, Y.: Uncovering the relational web. In: Proceedings of the 11th International Workshop on Web and Databases (2008)
Chen, J., Jiménez-Ruiz, E., Horrocks, I., Sutton, C.: ColNet: embedding the semantics of web tables for column type prediction. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, pp. 29–36 (2019)
Chen, Z., Trabelsi, M., Heflin, J., Xu, Y., Davison, B.D.: Table search using a deep contextualized language model. In: Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 589–598 (2020)
Chirigati, F., Liu, J., Korn, F., Wu, Y., Yu, C., Zhang, H.: Knowledge exploration using tables on the web. In: Proceedings of the VLDB Endowment, vol. 10, pp. 193–204 (2016)
Deng, X., Sun, H., Lees, A., Wu, Y., Yu, C.: TURL: table understanding through representation learning. In: Proceedings of the 2022 ACM SIGMOD International Conference on Management of Data, vol. 14, pp. 33–40 (2022)
Efthymiou, V., Hassanzadeh, O., Rodriguez-Muro, M., Christophides, V.: Matching web tables with knowledge base entities: from entity lookups to entity embeddings. In: Proceedings of the International Semantic Web Conference, pp. 260–277 (2017)
Ermilov, I., Ngomo, A.-C.N.: TAIPAN: automatic property mapping for tabular data. In: Blomqvist, E., Ciancarini, P., Poggi, F., Vitali, F. (eds.) EKAW 2016. LNCS (LNAI), vol. 10024, pp. 163–179. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-49004-5_11
Fan, W., Wu, Y., Xu, J.: Functional dependencies for graphs. In: Proceedings of the 2016 ACM SIGMOD International Conference on Management of Data, pp. 1843–1857 (2016)
Gentile, A.L., Ristoski, P., Eckel, S., Ritze, D., Paulheim, H.: Entity matching on web tables: a table embeddings approach for blocking. In: Proceedings of the 20th International Conference on Extending Database Technology, pp. 510–513 (2017)
Harmouch, H., Papenbrock, T., Naumann, F.: Relational header discovery using similarity search in a table corpus. In: 2021 IEEE 37th International Conference on Data Engineering, pp. 444–455. IEEE (2021)
Ho, V.T., Pal, K., Razniewski, S., Berberich, K., Weikum, G.: Extracting contextualized quantity facts from web tables. In: Proceedings of the Web Conference 2021, pp. 4033–4042 (2021)
Ibrahim, Y., Riedewald, M., Weikum, G., Zeinalipour-Yazti, D.: Bridging quantities in tables and text. In: Proceedings of IEEE 35th International Conference on Data Engineering, pp. 1010–1021 (2019)
Khatiwada, A., et al.: Santos: relationship-based semantic table union search. CoRR abs/2209.13589 (2022)
Korini, K., Peeters, R., Bizer, C.: SOTAB: the WDC schema.org table annotation benchmark. In: Proceedings of the Semantic Web Challenge on Tabular Data to Knowledge Graph Matching co-located with the 21st International Semantic Web Conference, vol. 3320, pp. 14–19 (2022)
Kruit, B., Boncz, P., Urbani, J.: Extracting N-ary facts from Wikipedia table clusters. In: Proceedings of the 29th ACM International Conference on Information & Knowledge Management, pp. 655–664 (2020)
Kruit, B., Boncz, P., Urbani, J.: TAKCO: a platform for extracting novel facts from tables. In: Companion Proceedings of the Web Conference, pp. 705–707 (2021)
Kruse, S., Naumann, F.: Efficient discovery of approximate dependencies. In: Proceedings of the VLDB Endowment, vol. 11, pp. 759–772 (2018)
Lehmann, J., et al.: DBpedia - a large-scale, multilingual knowledge base extracted from Wikipedia. Semantic Web 6(2), 167–195 (2014)
Lehmberg, O., Bizer, C.: Web table column categorisation and profiling. In: Proceedings of the 19th International Workshop on Web and Databases, pp. 1–7 (2016)
Lehmberg, O., Bizer, C.: Stitching web tables for improving matching quality. In: Proceedings of the VLDB Endowment, vol. 10, pp. 1502–1513 (2017)
Lehmberg, O., Bizer, C.: Profiling the semantics of N-ary web table data. In: Proceedings of the International Workshop on Semantic Big Data, vol. 5, pp. 1–6 (2019)
Lehmberg, O., Bizer, C.: Synthesizing N-ary relations from web tables. In: Proceedings of the 9th International Conference on Web Intelligence, Mining and Semantics, vol. 17, pp. 1–12 (2019)
Li, Z.: Cauchy convergence topologies on the space of continuous functions. Topol. Appl. 161, 321–329 (2014)
Luzuriaga, J., Munoz, E., Rosales-Mendez, H., Hogan, A.: Merging web tables for relation extraction with knowledge graphs. IEEE Trans. Knowl. Data Eng. 35(2), 1803–1816 (2023)
Marzocchi, M., Cremaschi, M., Pozzi, R., Avogadro, R., Palmonari, M.: MammoTab: a giant and comprehensive dataset for semantic table interpretation. In: Proceedings of the Semantic Web Challenge on Tabular Data to Knowledge Graph Matching co-located with the 21st International Semantic Web Conference, vol. 3320, pp. 28–33 (2022)
Mohri, M., Rostamizadeh, A., Talwalkar, A.: Foundations of Machine Learning. The MIT Press (2018)
Nargesian, F., Zhu, E., Pu, K.Q., Miller, R.J.: Table union search on open data. In: Proceedings of the VLDB Endowment, vol. 11, pp. 813–825 (2018)
Neumaier, S., Umbrich, J., Parreira, J.X., Polleres, A.: Multi-level semantic labelling of numerical values. In: Groth, P., et al. (eds.) Proceedings of the 15th International Semantic Web Conference, pp. 428–445 (2016)
Nguyen, P., Kertkeidkachorn, N., Ichise, R., Takeda, H.: TabEAno: table to knowledge graph entity annotation. CoRR abs/2010.01829 (2020)
Pham, M., Alse, S., Knoblock, C.A., Szekely, P.: Semantic labeling: a domain-independent approach. In: Groth, P., et al. (eds.) Proceedings of the 15th International Semantic Web Conference, pp. 446–462 (2016)
Ritze, D., Lehmberg, O., Bizer, C.: Matching HTML tables to DBpedia. In: Proceedings of the 5th International Conference on Web Intelligence, Mining and Semantics, pp. 1–6 (2015)
Shyu, S.J., Yin, P., Lin, B.M.T.: An ant colony optimization algorithm for the minimum weight vertex cover problem. Ann. Oper. Res. 131, 283–304 (2004)
Sismanis, Y., Brown, P., Haas, P.J., Reinwald, B.: GORDIAN: efficient and scalable discovery of composite keys. In: Proceedings of the VLDB Endowment, pp. 691–702 (2006)
Sun, H., Ma, H., Yih, W.T., Yan, X.: Table cell search for question answering. In: Proceedings of the 25th International Conference on World Wide Web, pp. 771–782 (2016)
Takeoka, K., Oyamada, M., Nakadai, S., Okadome, T.: Meimei: an efficient probabilistic approach for semantically annotating tables. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, pp. 281–288 (2019)
Tan, Z., Ran, A., Ma, S., Qin, S.: Fast incremental discovery of pointwise order dependencies. In: Proceedings of the VLDB Endowment, vol. 13, pp. 1669–1681 (2020)
Trabelsi, M., Chen, Z., Zhang, S., Davison, B.D., Heflin, J.: StruBERT: structure-aware BERT for table search and matching. In: Proceedings of the Web Conference 2022, pp. 442–451 (2022)
Venetis, P., et al.: Recovering semantics of tables on the web. In: Proceedings of the VLDB Endowment, vol. 4, pp. 528–538 (2011)
Wang, N., Ren, X.: Identifying multiple entity columns in web tables. Int. J. Softw. Eng. Knowl. Eng. 28(3), 287–309 (2018)
Wei, Z., Hartmann, S., Link, S.: Discovery algorithms for embedded functional dependencies. In: Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data, pp. 833–843 (2020)
Yin, P., Neubig, G., Yih, W.T., Riedel, S.: TaBERT: pretraining for joint understanding of textual and tabular data. In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, ACL 2020, pp. 8413–8426 (2020)
Zhang, M., Chakrabarti, K.: InfoGather+ semantic matching and annotation of numeric and time-varying attributes in web tables. In: Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data, pp. 145–156 (2013)
Zhang, S., Balog, K.: Ad hoc table retrieval using semantic similarity. In: Proceedings of the World Wide Web Conference, pp. 1553–1562 (2018)
Zhang, S., Balog, K.: On-the-fly table generation. In: Proceedings of the 41st International ACM SIGIR Conference on Research & Development in Information Retrieval, pp. 595–604 (2018)
Zhang, S., Balog, K.: Auto-completion for data cells in relational tables. In: Proceedings of the 28th ACM International Conference on Information and Knowledge Management, pp. 761–770 (2019)
Zhang, S., Balog, K.: Web table extraction, retrieval, and augmentation: a survey. ACM Trans. Intell. Syst. Technol. 11, 13:1-13:35 (2020)
Zhang, S., Meij, E., Balog, K., Reinanda, R.: Novel entity discovery from web tables. In: Proceedings of International World Wide Web Conference, pp. 1298–1308 (2020)
Zhang, X., Chen, Y., Chen, J., Du, X., Zou, L.: Mapping entity-attribute web tables to web-scale knowledge bases. In: Meng, W., Feng, L., Bressan, S., Winiwarter, W., Song, W. (eds.) DASFAA 2013. LNCS, vol. 7826, pp. 108–122. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-37450-0_8
Zhang, Z.: Towards efficient and effective semantic table interpretation. In: Mika, P., et al. (eds.) ISWC 2014. LNCS, vol. 8796, pp. 487–502. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-11964-9_31
Zhang, Z.: Effective and efficient semantic table interpretation using TableMiner+. Semantic Web 8(6), 921–957 (2017)
Zhu, G., Iglesias, C.A.: Computing semantic similarity of concepts in knowledge graphs. IEEE Trans. Knowl. Data Eng. 29(1), 72–89 (2017)
Acknowledgements
We would like to thank Jiaoyan Chen for his useful comments on this paper. This work is supported by the State Grid Technology Project “research and application of key technologies for automatic graphic construction of power grid control system driven by model and data”, the National Natural Science Foundation of China under the grant numbers [62061146001, 62072099, 62232004], the “Zhishan” Scholars Programs of Southeast University, and the Fundamental Research Funds for the Central Universities.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Qiu, J. et al. (2023). Dependency-Aware Core Column Discovery for Table Understanding. In: Payne, T.R., et al. The Semantic Web – ISWC 2023. ISWC 2023. Lecture Notes in Computer Science, vol 14265. Springer, Cham. https://doi.org/10.1007/978-3-031-47240-4_9
DOI: https://doi.org/10.1007/978-3-031-47240-4_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-47239-8
Online ISBN: 978-3-031-47240-4
eBook Packages: Computer Science, Computer Science (R0)