Abstract
To improve the quality of questions asked in community-based question answering forums, we create a new dataset from the website Stack Overflow. The dataset contains three components: (1) context: the text features of questions, (2) treatment: categories of revision suggestions, and (3) outcome: the measure of question quality (e.g., the number of questions, upvotes or clicks). This dataset helps researchers develop causal inference models for two problems: (i) estimating the causal effects of the aforementioned treatments on the outcome and (ii) finding the optimal treatment for each question. Empirically, we performed experiments with three state-of-the-art causal effect estimation methods on the contributed dataset. In particular, we evaluated the optimal treatments recommended by these approaches by comparing them with the ground truth labels, that is, the treatments (suggestions) provided by experts.
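To make the two problems concrete, the following is a minimal sketch, not one of the three methods evaluated in the paper. It assumes a single numeric context feature, fits one least-squares outcome model per treatment, and then (i) estimates an effect as the difference between two predicted outcomes and (ii) recommends the treatment with the highest predicted outcome. The treatment names, the toy records, and all function names are hypothetical, and the sketch ignores confounding, which a real causal estimator must address.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    mx, my = mean(xs), mean(ys)
    var = sum((x - mx) ** 2 for x in xs)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = cov / var if var else 0.0
    return my - b * mx, b


def fit_outcome_models(records):
    """Fit one outcome model per treatment.

    records: iterable of (context_feature, treatment, outcome) triples.
    """
    grouped = {}
    for x, t, y in records:
        xs, ys = grouped.setdefault(t, ([], []))
        xs.append(x)
        ys.append(y)
    return {t: fit_line(xs, ys) for t, (xs, ys) in grouped.items()}


def predict(models, x, t):
    """Predicted outcome for context x under treatment t."""
    a, b = models[t]
    return a + b * x


def effect(models, x, t, baseline):
    """Problem (i): estimated effect of t relative to a baseline treatment."""
    return predict(models, x, t) - predict(models, x, baseline)


def best_treatment(models, x):
    """Problem (ii): treatment with the highest predicted outcome for x."""
    return max(models, key=lambda t: predict(models, x, t))


# Toy, noise-free records (hypothetical): (context, treatment, outcome).
records = [
    (1.0, "none", 1.0), (3.0, "none", 1.0), (5.0, "none", 1.0),
    (1.0, "add_code", 1.0), (3.0, "add_code", 3.0), (5.0, "add_code", 5.0),
    (1.0, "clarify_title", 3.5), (3.0, "clarify_title", 2.5),
    (5.0, "clarify_title", 1.5),
]
models = fit_outcome_models(records)

print(best_treatment(models, 1.0))
print(best_treatment(models, 5.0))
print(effect(models, 5.0, "add_code", "none"))
```

In the evaluation the abstract describes, a recommendation like the one returned by `best_treatment` would be compared against the expert-provided suggestion for the same question.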
Acknowledgement
This material is based upon work supported by ARO/ARL and the National Science Foundation (NSF) Grants #1610282 and #1909555.
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Li, Y., Guo, R., Wang, W., Liu, H. (2020). Causal Learning in Question Quality Improvement. In: Gao, W., Zhan, J., Fox, G., Lu, X., Stanzione, D. (eds) Benchmarking, Measuring, and Optimizing. Bench 2019. Lecture Notes in Computer Science(), vol 12093. Springer, Cham. https://doi.org/10.1007/978-3-030-49556-5_20
DOI: https://doi.org/10.1007/978-3-030-49556-5_20
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-49555-8
Online ISBN: 978-3-030-49556-5
eBook Packages: Computer Science (R0)