Distributed Hayabusa: Scalable Syslog Search Engine Optimized for Time-Dimensional Search

Published: 07 August 2019
DOI: 10.1145/3340422.3343636

Abstract

Network administrators usually collect and store logs generated by servers, networks, and security appliances so that, when network trouble and/or security incidents occur, they can identify the source of the problem by investigating the contents of the logs. The size of the system needed to store and search the log messages tends to grow with the size of the managed network. Hayabusa, a fast log storage and search system proposed previously, optimizes time-dimensional search operations. In this paper, we propose a simple distributed system that adds scalability to the existing Hayabusa system. The evaluation results show that a Distributed Hayabusa system consisting of 10 servers (with multiple worker processes on each server) is 36 times faster than a standalone Hayabusa system. A full-text search over 14.4 billion data records takes only about 7 s, which is sufficiently fast for the daily operations of administrators managing a very-large-scale network.
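
The abstract does not describe the system's internals, so the following is only a minimal sketch of what a time-dimensional, parallel search could look like. It assumes that log records are grouped into per-minute partitions and that each partition is scanned by an independent worker. The names `minute_key`, `build_partitions`, `scan_partition`, and `search` are hypothetical, a thread pool stands in for worker processes spread over several servers, and a substring match stands in for a real full-text index.

```python
from concurrent.futures import ThreadPoolExecutor
from datetime import datetime, timedelta


def minute_key(ts: datetime) -> datetime:
    """Truncate a timestamp to its minute; the assumed partition key."""
    return ts.replace(second=0, microsecond=0)


def build_partitions(records):
    """Group (timestamp, message) records into per-minute partitions."""
    partitions = {}
    for ts, message in records:
        partitions.setdefault(minute_key(ts), []).append((ts, message))
    return partitions


def scan_partition(partition, keyword):
    """Scan one partition for a keyword; stands in for one worker's job."""
    return [(ts, msg) for ts, msg in partition if keyword in msg]


def search(partitions, start, end, keyword, workers=4):
    """Scan only the partitions whose minute lies in [start, end), in parallel."""
    # Time-dimensional pruning: partitions outside the window are never read.
    selected = [key for key in sorted(partitions) if start <= key < end]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        chunks = pool.map(
            lambda key: scan_partition(partitions[key], keyword), selected
        )
        # pool.map preserves input order, so hits come back in time order.
        return [hit for chunk in chunks for hit in chunk]


if __name__ == "__main__":
    base = datetime(2019, 8, 7, 12, 0, 0)
    records = [
        (base + timedelta(seconds=10), "sshd: login failed for root"),
        (base + timedelta(minutes=1, seconds=5), "kernel: link up"),
        (base + timedelta(minutes=5), "sshd: login failed for guest"),
    ]
    parts = build_partitions(records)
    for ts, msg in search(parts, base, base + timedelta(minutes=3), "login failed"):
        print(ts.isoformat(), msg)
```

Because each partition is scanned independently, the same fan-out could in principle be spread over more workers or more machines without changing the query logic, which is the kind of scalability the abstract refers to.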

Cited By

  • (2021) amulog: A general log analysis framework for comparison and combination of diverse template generation methods. International Journal of Network Management 32:4. DOI: 10.1002/nem.2195. Online publication date: 19-Dec-2021
  • (2020) amulog: A General Log Analysis Framework for Diverse Template Generation Methods. 2020 16th International Conference on Network and Service Management (CNSM), pp. 1-5. DOI: 10.23919/CNSM50824.2020.9269049. Online publication date: 2-Nov-2020
  • (2019) Ontology-Based System for Dynamic Risk Management in Administrative Domains. Applied Sciences 9:21 (4547). DOI: 10.3390/app9214547. Online publication date: 26-Oct-2019

Published In

AINTEC '19: Proceedings of the 15th Asian Internet Engineering Conference
August 2019
60 pages
ISBN: 9781450368490
DOI: 10.1145/3340422
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org

Sponsors

In-Cooperation

  • AIOT: Asian Institute of Technology

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Data Processing
  2. Parallel Processing
  3. Parallel System
  4. SQL

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

AINTEC '19: Asian Internet Engineering Conference
August 7 - 9, 2019
Phuket, Thailand

Acceptance Rates

Overall Acceptance Rate 15 of 38 submissions, 39%

Article Metrics

  • Downloads (last 12 months): 16
  • Downloads (last 6 weeks): 2
Reflects downloads up to 21 Nov 2024
