DOI: 10.1109/CCGrid.2015.16

Towards provenance-based anomaly detection in MapReduce

Published: 04 May 2015

Abstract

MapReduce enables parallel and distributed processing of vast amounts of data on a cluster of machines. However, this computing paradigm is exposed to threats from malicious or cheating nodes and from compromised user-submitted code, either of which could tamper with data and computation, because users retain little control once the computation is carried out in a distributed fashion. In this paper, we focus on the analysis and detection of anomalies during MapReduce computation. Accordingly, we develop a computational provenance system that captures provenance data related to MapReduce computation within the MapReduce framework in Hadoop. In particular, we identify a set of invariants against aggregated provenance information, which are later analyzed to uncover anomalies indicating possible tampering with data and computation. We conduct a series of experiments to show the efficiency and effectiveness of our proposed provenance system.
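To make the idea of checking invariants against aggregated provenance more concrete, the following is a minimal Python sketch. It is not the system described in the paper: the record schema (`TaskProvenance`), the function `check_invariants`, and the two invariants themselves are assumptions chosen for illustration, and the second invariant only holds for jobs that run without a combiner.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class TaskProvenance:
    """One provenance record for a finished task (hypothetical schema)."""
    task_id: str
    phase: str          # "map" or "reduce"
    node: str           # host that executed the task
    records_in: int     # records consumed by the task
    records_out: int    # records emitted by the task
    input_bytes: int = 0  # bytes read from the job input (map tasks only)


def check_invariants(records: List[TaskProvenance], job_input_bytes: int) -> List[str]:
    """Aggregate per-task provenance and return one message per violated invariant."""
    maps = [r for r in records if r.phase == "map"]
    reduces = [r for r in records if r.phase == "reduce"]
    violations = []

    # Assumed invariant 1: together, the map tasks read exactly the job input.
    bytes_read = sum(r.input_bytes for r in maps)
    if bytes_read != job_input_bytes:
        violations.append(
            f"map tasks read {bytes_read} bytes, job input is {job_input_bytes}"
        )

    # Assumed invariant 2 (no combiner): every record emitted by a mapper
    # is consumed by some reducer.
    emitted = sum(r.records_out for r in maps)
    consumed = sum(r.records_in for r in reduces)
    if emitted != consumed:
        violations.append(
            f"mappers emitted {emitted} records, reducers consumed {consumed}"
        )

    return violations


# Provenance of an untampered run (made-up numbers).
honest = [
    TaskProvenance("m1", "map", "node-a", 100, 250, input_bytes=4096),
    TaskProvenance("m2", "map", "node-b", 100, 230, input_bytes=4096),
    TaskProvenance("r1", "reduce", "node-a", 300, 40),
    TaskProvenance("r2", "reduce", "node-c", 180, 25),
]

# The same run, except that reducer r2 silently drops 30 of its input records.
tampered = [replace(r, records_in=150) if r.task_id == "r2" else r for r in honest]

print(check_invariants(honest, job_input_bytes=8192))
print(check_invariants(tampered, job_input_bytes=8192))
```

The honest run yields no violations, while the tampered run breaks the second invariant because the aggregated mapper output (480 records) no longer matches the aggregated reducer input (450 records). A check like this only signals that something is inconsistent at the job level; attributing the discrepancy to a specific node would need finer-grained provenance than this sketch records.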

Information

Published In

CCGRID '15: Proceedings of the 15th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing
May 2015
1277 pages
ISBN: 9781479980062

Publisher

IEEE Press

Author Tags

  1. MapReduce
  2. computation integrity
  3. logging
  4. provenance

Qualifiers

  • Research-article

Conference

CCGrid '15
