Distributed Hayabusa: Scalable Syslog Search Engine Optimized for Time-Dimensional Search

Authors: Daisuke Miyamoto, Tomohiro Ishihara, Satoshi Matsuura

AINTEC '19: Proceedings of the 15th Asian Internet Engineering Conference

Pages 9 - 16

https://doi.org/10.1145/3340422.3343636

Published: 07 August 2019

Abstract

Network administrators usually collect and store logs generated by servers, networks, and security appliances so that when network trouble and/or security incidents occur, they can identify the source of the problem by investigating the contents of the logs. The size of the system needed to store and search the log messages tends to increase as the size of the managed network becomes large. A fast log storage and search system called Hayabusa was previously proposed that optimizes a time-dimensional search operation. In this paper, we propose a simple distributed system that adds scalability to the existing Hayabusa system. The evaluation results show that the Distributed Hayabusa system consisting of 10 servers (with multiple worker processes on each server) is 36 times faster than a standalone Hayabusa system. The time required to perform a full-text search over 14.4 billion data records is only about 7 s, which is sufficiently low for the daily operations of administrators managing a very-large-scale network.
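
The abstract does not spell out how the search is organized internally. As a rough illustration of what a time-dimensional search can look like, here is a minimal sketch under stated assumptions: log messages are stored in one SQLite file per minute, and a query is fanned out only over the files whose minute falls inside the requested range. The earlier Hayabusa paper describes using SQLite with GNU Parallel; everything else here is hypothetical and not taken from the paper, including the function names `write_partition`, `search_partition`, and `search_range`, the `syslog` table layout, the `YYYYMMDDHHMM` file naming, and the use of `LIKE` matching. The sketch runs on a single machine and uses a thread pool in place of the multiple worker processes and multiple servers that the abstract mentions.

```python
import os
import sqlite3
from concurrent.futures import ThreadPoolExecutor


def write_partition(root, minute, messages):
    """Store the messages for one minute in their own SQLite file.

    `minute` is a hypothetical fixed-width key such as "201908070900",
    so plain string comparison orders partitions by time.
    """
    path = os.path.join(root, f"{minute}.db")
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS syslog (message TEXT)")
    conn.executemany("INSERT INTO syslog VALUES (?)", [(m,) for m in messages])
    conn.commit()
    conn.close()
    return path


def search_partition(path, keyword):
    """Scan a single partition; each call opens its own connection."""
    conn = sqlite3.connect(path)
    try:
        rows = conn.execute(
            "SELECT message FROM syslog WHERE message LIKE ?",
            (f"%{keyword}%",),
        ).fetchall()
    finally:
        conn.close()
    return [row[0] for row in rows]


def search_range(root, start, end, keyword, workers=4):
    """Query every partition whose minute lies in [start, end], in parallel."""
    paths = [
        os.path.join(root, name)
        for name in sorted(os.listdir(root))
        if name.endswith(".db") and start <= name[:-3] <= end
    ]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        per_partition = pool.map(search_partition, paths, [keyword] * len(paths))
    # Results come back in partition order, i.e. in time order.
    return [hit for hits in per_partition for hit in hits]
```

Calling `search_range(root, "201908070900", "201908070901", "Failed")` opens only the two partition files inside that window, so the cost of a query depends on the length of the time range rather than on the total volume of stored logs. In a distributed setting, each server would presumably hold a subset of the partitions and return its hits to a coordinator, but that part is not shown here.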

Published In

AINTEC '19: Proceedings of the 15th Asian Internet Engineering Conference

August 2019

60 pages

ISBN: 9781450368490

DOI: 10.1145/3340422

Copyright © 2019 ACM.

Sponsors

SIGCOMM: ACM Special Interest Group on Data Communication

In-Cooperation

AIOT: Asian Institute of Technology

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Conference

AINTEC '19

Sponsor:

SIGCOMM

AINTEC '19: Asian Internet Engineering Conference

August 7 - 9, 2019

Phuket, Thailand

Acceptance Rates

Overall Acceptance Rate 15 of 38 submissions, 39%
