We describe an emerging application of data mining in computer networks: predicting the size of a flow and detecting elephant flows (very large flows). Flow size is an important statistic that can be used to improve routing, load balancing, and scheduling in computer networks. Flow size prediction is particularly challenging because flow patterns change continuously and predictions must be made in real time (within milliseconds) to avoid delays. We describe how to formulate the problem as an online machine learning task so that predictors continuously adjust to changes in flow traffic. We evaluate the predictive power of a set of features and the accuracy of three online predictors, based on neural networks, Gaussian process regression, and online Bayesian Moment Matching, on three datasets of real traffic. We also demonstrate how to use such online predictors to improve routing (i.e., reduce flow completion time) in a network simulation.
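
To make the online formulation concrete, the following is a minimal sketch of a predict-then-update loop, not the predictors evaluated in this work. It substitutes a plain linear model trained by stochastic gradient descent for the neural network, Gaussian process, and Bayesian Moment Matching predictors. The two input features, the synthetic traffic generator, the learning rate, and the 10 MB elephant threshold are all illustrative assumptions. The target is the logarithm of the flow size, a common choice for heavy-tailed quantities.

```python
import math
import random


class OnlineLinearPredictor:
    """Hypothetical stand-in predictor: linear model on log(flow size), one SGD step per flow."""

    def __init__(self, n_features, lr=0.05):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr  # assumed learning rate, not taken from the source

    def predict_log_size(self, x):
        return self.b + sum(wi * xi for wi, xi in zip(self.w, x))

    def update(self, x, true_size_bytes):
        # Gradient step on the squared error between predicted and true log size.
        err = self.predict_log_size(x) - math.log(true_size_bytes)
        self.w = [wi - self.lr * err * xi for wi, xi in zip(self.w, x)]
        self.b -= self.lr * err


def is_elephant(pred_log_size, threshold_bytes=10_000_000):
    # The 10 MB threshold is an assumption made for illustration only.
    return pred_log_size >= math.log(threshold_bytes)


def run_stream(flows, model):
    """Predict each flow at its start, then update once its true size is known."""
    errors = []
    for x, size in flows:
        pred = model.predict_log_size(x)       # prediction made when the flow starts
        errors.append(abs(pred - math.log(size)))
        model.update(x, size)                  # update made when the flow completes
    return errors


def synthetic_flows(n, seed=0):
    """Synthetic traffic: log size is a noisy linear function of two made-up features."""
    rng = random.Random(seed)
    flows = []
    for _ in range(n):
        x = [rng.random(), rng.random()]
        log_size = 8.0 + 4.0 * x[0] - 2.0 * x[1] + rng.gauss(0, 0.1)
        flows.append((x, math.exp(log_size)))
    return flows


if __name__ == "__main__":
    model = OnlineLinearPredictor(n_features=2)
    errors = run_stream(synthetic_flows(2000), model)
    print("mean abs log error, first 100 flows:", sum(errors[:100]) / 100)
    print("mean abs log error, last 100 flows:", sum(errors[-100:]) / 100)
```

Because each prediction is issued before the corresponding update, the recorded errors measure performance on flows the model has not yet seen, which is the quantity of interest in an online setting. Each prediction and update costs time linear in the number of features, in keeping with the real-time requirement.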