Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/2661829.2661882acmconferencesArticle/Chapter ViewAbstractPublication PagescikmConference Proceedingsconference-collections
research-article

Fast Heuristics for Near-Optimal Task Allocation in Data Stream Processing over Clusters

Published: 03 November 2014 Publication History

Abstract

We study provisioning and job reconfiguration techniques for adapting to execution environment changes when processing data streams on cluster-based deployments. By monitoring the performance of an executing job, we identify computation and communication bottlenecks. In such cases we reconfigure the job by reallocating its tasks to minimize the communication cost. Our work targets data-intensive applications where the inter-node transfer latency is significant. We aim to minimize the transfer latency while keeping the nodes below some computational load threshold. We propose a scalable centralized scheme that employs fast allocation heuristics. Our techniques are based on a general group-based job representation that is commonly found in many distributed data stream processing frameworks. Using this representation we devise linear-time task allocation algorithms that improve existing quadratic-time solutions in practical cases. We have implemented and evaluated our proposals using both synthetic and real-world scenarios. Our results show that our algorithms: (a) exhibit significant allocation throughput while producing near-optimal allocations, and (b) significantly improve existing task-level approaches.

References

[1]
D. J. Abadi et al. Aurora: a new model and architecture for data stream management. The VLDB Journal, 2003.
[2]
D. J. Abadi et al. The design of the Borealis stream processing engine. In CIDR, 2005.
[3]
L. Amini et al. SPC: A distributed, scalable platform for data mining. In DMSSP, 2006.
[4]
L. Aniello et al. Adaptive online scheduling in Storm. In DEBS, 2013.
[5]
B. Babcock et al. Operator scheduling in data stream systems. The VLDB Journal, 13(4):333--353, 2004.
[6]
S. Bokhari. A shortest tree algorithm for optimal assignments across space and time in a distributed processor system. Software Engineering, IEEE Transactions on, SE-7(6), 1981.
[7]
S. H. Bokhari. Dual processor scheduling with dynamic reassignment. IEEE Trans. Softw. Eng., 5(4), 1979.
[8]
A. Burns. Scheduling hard real-time systems: a review. Softw. Eng. J., 6(3), 1991.
[9]
D. Carney et al. Operator scheduling in a data stream manager. In VLDB, 2003.
[10]
M.-S. Chern et al. An lc branch-and-bound algorithm for the module assignment problem. Inform. Process. Lett., 1989.
[11]
M. Cherniack et al. Scalable distributed stream processing. In CIDR, 2003.
[12]
T. C. K. Chou and J. A. Abraham. Load balancing in distributed systems. IEEE Trans. Softw. Eng., 1982.
[13]
W. Chu. Optimal file allocation in a multiple computer system. Computers, IEEE Transactions on, C-18(10), 1969.
[14]
W. Chu et al. Task allocation in distributed data processing. Computer, 13(11), 1980.
[15]
W. W. Chu et al. Task allocation and precedence relations for distributed real-time systems. IEEE Trans. Comput., 1987.
[16]
J. Dean and S. Ghemawat. MapReduce: simplified data processing on large clusters. In OSDI, 2004.
[17]
K. Efe. Heuristic models of task assignment scheduling in distributed systems. Computer, 15(6), 1982.
[18]
O. I. El-Dessouki et al. Distributed enumeration on between computers. Computers, IEEE Transactions on, 1980.
[19]
V. B. Gylys and J. A. Edwards. Optimal partitioning of workload for distributed systems. In COMPCON, 1976.
[20]
K. Haessig and C. J. Jenny. Partitioning and allocation computational objects in distributed computing systems. In Proc. IFIP Congr., 1980.
[21]
C. J. Jenny. Process partitioning in distributed systems. In Proc. NTC, 1977.
[22]
M. Kafil et al. Optimal task assignment in heterogeneous distributed computing systems. Concurrency, IEEE, 1998.
[23]
J. Kreps et al. Kafka: A distributed messaging system for log processing. In NetDB, 2011.
[24]
P.-Y. R. Ma et al. A task allocation model for distributed computing systems. Comp., IEEE Trans. on, 1982.
[25]
L. Neumeyer et al. S4: Distributed stream computing platform. In ICDM Workshops, 2010.
[26]
P. Pietzuch et al. Network-aware operator placement for stream-processing systems. In ICDE '06, 2006.
[27]
G. Rao et al. Assignment of tasks in a distributed processor system with limited memory. Comp., IEEE Trans. on, 1979.
[28]
A. Sarje and G. Sagar. Heuristic model for task allocation in distributed computer systems. Computers and Digital Techniques, IEEE Proceedings, 138(5), 1991.
[29]
M. A. Shah et al. Flux: An adaptive partitioning operator for continuous query systems. In ICDE, 2002.
[30]
J. Sheild. Partitioning concurrent vlsi simulation programs onto a multiprocessor by simulated annealing. Computers and Digital Techniques, IEEE Proceedings, 134(1), 1987.
[31]
C.-C. Shen et al. A graph matching approach to optimal task assignment in distributed computing systems using a minimax criterion. Computers, IEEE Transactions on, 1985.
[32]
H. S. Stone. Multiprocessor scheduling with the aid of network flow algorithms. IEEE Trans. Softw. Eng., 1977.
[33]
Storm. http://github.com/nathanmarz/storm.
[34]
J. Wolf et al. SODA: An optimizing scheduler for large-scale stream-based distributed computer systems. Lecture Notes in Computer Science, 2008.
[35]
Y. Xing. Dynamic load distribution in the Borealis stream processor. In ICDE, 2005.
[36]
Y. Xing. Load distribution for distributed stream processing. In EDBT Workshops, 2005.
[37]
Y. Xing et al. Providing resiliency to load variations in distributed stream processing. In VLDB, 2006.
[38]
X. Yang and X. Zhang. A general heuristic algorithm of task allocation in distributed systems. In Comp. and Appl., 1987.
[39]
Apache Zookeeper. http://zookeeper.apache.org/.

Cited By

View all
  • (2024)Fault Tolerance Placement in the Internet of ThingsProceedings of the ACM on Management of Data10.1145/36549412:3(1-29)Online publication date: 30-May-2024
  • (2024)Online Nonstop Task Management for Storm-Based Distributed Stream Processing EnginesJournal of Computer Science and Technology10.1007/s11390-021-1629-939:1(116-138)Online publication date: 30-Jan-2024
  • (2022)Risk-aware Collection Strategies for Multirobot Foraging in Hazardous EnvironmentsACM Transactions on Autonomous and Adaptive Systems10.1145/351425116:3-4(1-38)Online publication date: 6-Jul-2022
  • Show More Cited By

Index Terms

  1. Fast Heuristics for Near-Optimal Task Allocation in Data Stream Processing over Clusters

    Recommendations

    Comments

    Please enable JavaScript to view thecomments powered by Disqus.

    Information & Contributors

    Information

    Published In

    cover image ACM Conferences
    CIKM '14: Proceedings of the 23rd ACM International Conference on Conference on Information and Knowledge Management
    November 2014
    2152 pages
    ISBN:9781450325981
    DOI:10.1145/2661829
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Sponsors

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 03 November 2014

    Permissions

    Request permissions for this article.

    Check for updates

    Author Tags

    1. data stream processing
    2. distributed and real-time computation
    3. resource utilization
    4. task allocation

    Qualifiers

    • Research-article

    Conference

    CIKM '14
    Sponsor:

    Acceptance Rates

    CIKM '14 Paper Acceptance Rate 175 of 838 submissions, 21%;
    Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

    Upcoming Conference

    CIKM '25

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)14
    • Downloads (Last 6 weeks)1
    Reflects downloads up to 18 Nov 2024

    Other Metrics

    Citations

    Cited By

    View all
    • (2024)Fault Tolerance Placement in the Internet of ThingsProceedings of the ACM on Management of Data10.1145/36549412:3(1-29)Online publication date: 30-May-2024
    • (2024)Online Nonstop Task Management for Storm-Based Distributed Stream Processing EnginesJournal of Computer Science and Technology10.1007/s11390-021-1629-939:1(116-138)Online publication date: 30-Jan-2024
    • (2022)Risk-aware Collection Strategies for Multirobot Foraging in Hazardous EnvironmentsACM Transactions on Autonomous and Adaptive Systems10.1145/351425116:3-4(1-38)Online publication date: 6-Jul-2022
    • (2022)Two-stage Scheduling of Stream Computing for Industrial Cloud-edge Collaboration2022 IEEE International Conference on Joint Cloud Computing (JCC)10.1109/JCC56315.2022.00016(57-64)Online publication date: Aug-2022
    • (2022)Dynamic energy-efficient scheduling for streaming applications in stormComputing10.1007/s00607-021-00961-7104:2(413-432)Online publication date: 1-Feb-2022
    • (2022)Dynamic Resource-Efficient Scheduling in Data Stream Management Systems Deployed on Computing CloudsNew Frontiers in Cloud Computing and Internet of Things10.1007/978-3-031-05528-7_5(133-163)Online publication date: 28-Apr-2022
    • (2021)TATA: Throughput-Aware TAsk Placement in Heterogeneous Stream Processing with Deep Reinforcement Learning2021 IEEE Intl Conf on Parallel & Distributed Processing with Applications, Big Data & Cloud Computing, Sustainable Computing & Communications, Social Computing & Networking (ISPA/BDCloud/SocialCom/SustainCom)10.1109/ISPA-BDCloud-SocialCom-SustainCom52081.2021.00021(44-54)Online publication date: Sep-2021
    • (2021)Task allocation for distributed stream processingIEEE INFOCOM 2016 - The 35th Annual IEEE International Conference on Computer Communications10.1109/INFOCOM.2016.7524433(1-9)Online publication date: 10-Mar-2021
    • (2021)Towards a Security-Aware Deployment of Data Streaming Applications in Fog ComputingFog/Edge Computing For Security, Privacy, and Applications10.1007/978-3-030-57328-7_14(355-385)Online publication date: 5-Jan-2021
    • (2021)Self‐adaptation on parallel stream processing: A systematic reviewConcurrency and Computation: Practice and Experience10.1002/cpe.675934:6Online publication date: 7-Dec-2021
    • Show More Cited By

    View Options

    Login options

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Media

    Figures

    Other

    Tables

    Share

    Share

    Share this Publication link

    Share on social media