Abstract
Fault detection in modern networks is done with a set of specially instrumented nodes which send probes to find faults. These probes generate additional traffic in network and compete with other regular traffic for bandwidth. In this paper we consider the problem of dynamically adapting the probes based on traffic dynamics experienced by nodes. We propose to profile the links and nodes to get aggregate I/O statistics in a time window and use it as an instantaneous measure of congestion. We consider the network with I/O statistics to generate a weighted graph and formulate an optimization problem to find a set of probes covering whole network with minimum weight. By showing that finding minimum weight probes maps to a known NP complete problem, we propose three greedy algorithms for selecting probes. With both simulation and real graphs of Internet Service Provider (ISP) networks, we perform five sets of experiments and show that proposed algorithms can dynamically adapt to changes in traffic dynamics and also can select probes in large networks in reasonable time.
Similar content being viewed by others
Change history
26 March 2020
The original version of the article unfortunately contained a typo error in the first author name in the author group.
Notes
A preliminary version of this paper appeared in COMSNETS 2018 [2].
Congestion can be measured with other parameters also.
This inherently limit usage of our probe selection algorithms within an autonomous system.
Assuming the graph is connected.
We further assume that the situation where K needs to be \(n-1\) does not occur in practical networks with atleast two probe-stations.
Although Dijkstra’s algorithm is used with link state routing algorithms, in practice some policy based decisions may also be involved. Our model can be tweaked to work with other routing algorithms to identify Candidate Probe Set.
This is a random choice among the two equal weight probes \(CP_6= \{(5,1,4,7),399\}\) and \(CP_{12}= \{(7,4,1,5),399\}\)
The threshold for our experiments is set to 50% which is a heuristic selection, however a system administrator can change this to any value.
We can use other metrics like number of bytes transmitted, etc.
Dikjstra’s algorithm is used here to reduce the number of candidate probes available for selection and to be consistent with other two approaches.
References
Dusia, A., Sethi, A.S.: Probe generation for active probing. Int. J. Netw. Manag. 20, 1–21 (2018)
Tayal, A., Hubballi, N., Natu, M., Sadaphal, V.: Congestion-aware probe selection for fault detection in networks. In: COMSNETS’18: Proceedings of the 10th international conference on communication systems and networks, pp. 407–409 (2018)
Guan, X., Qin, T., Li, W., Wang, P.: Dynamic feature analysis and measurement for large-scale network traffic monitoring. IEEE Trans. Inform. Forens. Secur. 5, 905–919 (2010)
Li, K.Y., Chen, C.W., Lee, S.S.W.: Dynamic load balanced routing in IP networks. In: ICM’16: Proceedings of the 6th international conference on information communication and management, pp. 216–221 (2016)
Zhang, J., Yu, F.R., Wang, S., Huang, T., Liu, Z., Liu, Y.: Load balancing in data center networks: a survey. IEEE Commun. Surv. Tutor. 20, 2324–2352 (2018)
Steinder, M., Sethi, A.S.: A survey of fault localization techniques in computer networks. Sci. Comput. Program. 53, 165–194 (2004)
Natu, M., Sethi, A.S.: Active probing approach for fault localization in computer networks. In: E2EMON ’06: Proceedings of 4th IEEE/IFIP workshop on end-to-end monitoring techniques and services, pp. 1–9 (2006)
Kavulya, S.P., Joshi, K., Giandomenico, F.D., Narasimhan, P.: Failure diagnosis of complex systems. Springer, New York (2012)
Coates, A., Hero III, A.O., Nowak, R., Yu, B.: Internet Tomography. IEEE Signal Processing Magazine 19, 47–65 (2002)
Lawrence, E., Michailidis, G., Nair, V.N., Xi, B.: Network tomography: a review and recent developments. In: Frontiers in statistics, pp. 345–364 (2006)
Remote Network Monitoring MIB Protocol Identifier Macros. https://tools.ietf.org/html/rfc2896
Cisco IOS NetFlow. https://www.cisco.com/c/en/us/products/ios-nx-os-software/ios-netflow
Tivoli Business Systems Manager. http://www.tivoli.com
Groenendijk, J., Huang, Y., Fallon, L.: Adaptive terminal reporting for Scalable Service Quality Monitoring in large networks. In: CNSM’11: Proceedings of the 7th international conference on network and service management, pp. 1–5 (2011)
Ahmed, T., Oreshkin, B., Coates, M.: Machine learning approaches to network anomaly detection. In: USENIX ’07: Proceedings of the 2nd USENIX workshop on tackling computer systems problems with machine learning techniques, pp. 1–6 (2007)
Hu, Z., Zhu, L., Ardi, C., Katz-Bassett, E., Madhyastha, H.V., Heidemann, J., Yu, M.: The need for end-to-end evaluation of cloud availability. In: Passive and active measurement, pp. 119–130 (2014)
Jaggard, A.D., Kopparty, S., Ramachandran, V., Wright, R.N.: The design space of probing algorithms for network-performance measurement. In: SIGMETRICS ’13: Proceedings of the 40th international conference on measurement and modeling of computer systems, pp. 105–116 (2013)
Quan, L., Heidemann, J., Pradkin, Y.: Detecting internet outages with precise active probing (extended). Tech. rep. USC/Information Sciences Institute, Sacramento (2012)
Quan, L., Heidemann, J., Pradkin, Y.: Trinocular: understanding internet reliability through adaptive probing. In: SIGCOMM ’13: Proceedings of the ACM SIGCOMM 2013 conference, pp. 255–266 (2013)
Jeswani, D., Korde, N., Patil, D., Natu, M., Augustine, J.: Probe station selection algorithms for fault management in computer networks. In: COMSNETS ’10: Proceedings of 2nd international conference on communication systems and networks, pp. 1–9 (2010)
Jeswani, D., Natu, M., Ghosh, R.K.: Adaptive monitoring: application of probing to adapt passive monitoring. J. Netw. Syst. Manag. 23, 950–977 (2015)
Jeswani, D., Natu, M., Ghosh, R.K.: Adaptive monitoring: a framework to adapt passive monitoring using probing. In: CNSM ’12: Proceedings of the 8th international conference on network and service management, pp. 350–356 (2012)
Brodie, M., Rish, I., Ma, S., Odintsova, N.: Active probing strategies for problem diagnosis in distributed systems. In: IJCAI’03: Proceedings of the 18th international joint conference on artificial intelligence, pp. 1337–1338 (2003)
Guan, L., Wang, Y., Li, W., Yan, C.: Efficient probing method for active diagnosis in large scale network. In: CNSM ’13: Proceedings of the 9th international conference on network and service management, pp. 198–202 (2013)
Cheng, L., Qiu, X., Meng, L., Qiao, Y., Boutaba, R.: Efficient active probing for fault diagnosis in large scale and noisy networks. In: INFOCOM’ 10: Proceedings of the 30th IEEE international conference on computer communications, pp. 1–9 (2010)
Lu, L., Xu, Z., Wang, W., Sun, Y.: A new fault detection method for computer networks. Reliab. Eng. Syst. Saf. 114, 45–51 (2013)
Ma, L., He, T., Leung, K.K., Towsley, D., Swami, A.: Efficient identification of additive link metrics via network tomography. In: ICDCS ’13: Proceedings of the 33rd international conference on distributed computing systems, pp. 581–590 (2013)
Li, H., Gao, Y., Dong, W., Chen, C.: Taming both predictable and unpredictable link failures for network tomography. IEEE/ACM transactions on networking pp. 1–14 (2018)
Duffield, N.: Network tomography of binary network performance characteristics. IEEE Trans. Inform. Theory 52, 5373–5388 (2006)
Sommers, J., Barford, P., Duffield, N., Ron, A.: Accurate and efficient SLA compliance monitoring. ACM SIGCOMM: Comput Commun Rev 37, 109–120 (2007)
Mardani, M., Giannakis, G.B.: Estimating traffic and anomaly maps via network tomography. IEEE/ACM Trans. Netw 24, 1533–1547 (2016)
Dong, W., Gao, Y., Wu, W., Bu, J., Chen, C., Li, X.Y.: Optimal monitor assignment for preferential link tomography in communication networks. IEEE/ACM Trans Netw 25, 210–223 (2017)
Wang, P., Xu, H., Niu, Z., Han, D., Xiong, Y.: Expeditus: Congestion-aware load balancing in clos data center networks. In: SoCC ’16: Proceedings of the seventh ACM symposium on cloud computing, pp. 442–455. ACM (2016)
Battre, D., Hovestadt, M., Lohrmann, B., Stanik, A., Warneke, D.: Detecting bottlenecks in parallel DAG-based data flow programs. In: 3rd Workshop on many-task computing on grids and supercomputers, pp. 1–10 (2010)
Barroso, L.A., Clidaras, J., Hoelzle, U.: The datacenter as a computer:An introduction to the design of warehouse-scale machines, pp. 1–154. Morgan & Claypool (2013)
Hammadi, A., Mhamdi, L.: A survey on architectures and energy efficiency in data center networks. Comput. Commun. 40, 1–21 (2014)
Gill, P., Jain, N., Nagappan, N.: Understanding network failures in data centers: measurement, analysis, and implications. ACM SIGCOMM: Comput. Commun. Rev. 41, 350–361 (2011)
Liu, Y., Lin, D., Muppala, J., Hamdi, M.: A study of fault-tolerance characteristics of data center networks. In: DSN 12: Proceedings of the 42nd annual IEEE/IFIP international conference on dependable systems and networks, pp. 1–6 (2012)
Liu, Y., Muppala, J.: Fault-tolerance characteristics of data center network topologies using fault regions. In: DSN 13: Proceedings of the 43rd annual IEEE/IFIP international conference on dependable systems and networks, pp. 1–6 (2013)
NS-3. https://www.nsnam.org/
Rocketfuel: An ISP topology mapping engine. http://research.cs.washington.edu/networking/rocketfuel/
Mahajan, R., Spring, N., Wetherall, D., Anderson, T.: Inferring link weights using end-to-end measurements. In: IMW ’02: Proceedings of the 2nd ACM SIGCOMM workshop on internet measurement, pp. 231–236 (2002)
Cygan, M., Kowalik, A., Wykurz, M.: Exponential-time approximation of weighted set cover. Inform. Process. Lett. 109, 957–961 (2009)
Acknowledgements
First author would like to thank Vaishali Sadaphal for the discussions she had when she was interning at TRDDC Pune. She would also like to thank for the financial assistance given to her by Department of IT under Visvesvaraya Ph.D. Scheme.
Author information
Authors and Affiliations
Corresponding author
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
The original version of this article was revised: The first author name was incorrectly published as ‘Anjua Tayal’ and the corrected name is ‘Anuja Tayal’.
Probe Selection is a NP–Complete Problem
Probe Selection is a NP–Complete Problem
In this Appendix, we show that probe-selection optimization problem of selecting minimum weight probes from the set of candidate probes is a known NP-Complete problem as it can be mapped to weighted set cover problem. Weighted set cover problem is defined as follows [43]
Definition 2
Given a finite universe \(U = \{e_{1,} e_{2}, ..., e_{n}\}\) of n members, \(S = \{s_{1}, s_{2}, ..., s_{m}\} \bigwedge , \forall i \; s_{i} \subseteq U\) a collection of subsets of U and a weight function \(w : s \rightarrow {\mathbb {R}}^{+}\) that assigns a positive real weight to each subset of U, the goal is to find the minimum weight subcollection of S whose union is U or a minimum weight set cover.
We define the problem of selecting probes from given set of candidate probes as follows
Definition 3
Given set of nodes to be covered in a graph \(V = \{v_{1}, v_{2}, ..., v_{n}\}\) of n nodes and Candidate Probe Set \(CP = \{CP_{1}, CP_{2}, ..., CP_{cp}\} \bigwedge , \forall i \; CP_{i} \subseteq V\) a collection of subsets of V and a weight function \(W : CP \rightarrow {\mathbb {R}}^{+}\) that assigns a positive real weight to each subset of V, the goal is to find the minimum weight subcollection of CP whose union is V or a minimum weight probes.
The set of finite universe U in weighted set cover problem is analogous to the set of nodes V to be covered in a graph G, while set S is equivalent to set of candidate probes. Each candidate probe \(CP_{i}\) consists of set of nodes \(\subseteq V\) covered by that probe. There exists a weight function W which assigns weight to each candidate probe which is the cost of selecting that probe. Our goal is to find minimum weight probes \(PS \subseteq CP\) which covers all the nodes of the graph or the union is V.
Rights and permissions
About this article
Cite this article
Tayal, A., Sharma, N., Hubballi, N. et al. Traffic Dynamics-Aware Probe Selection for Fault Detection in Networks. J Netw Syst Manage 28, 1055–1084 (2020). https://doi.org/10.1007/s10922-020-09514-3
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10922-020-09514-3