Group for Research in Decision Analysis

Optimizing Data Placement for Distributed Computation

Lukasz Golab University of Waterloo, Canada

I will discuss the following problem: given a set of data items, a set of tasks that reference these data items, and a set of servers with finite storage and processing capacities, allocate the data items to the servers in a way that minimizes the amount of data that needs to be transferred among servers during task execution. This problem arises in many practical scenarios including cloud databases. I will show that this problem can be reduced to the well-studied graph partitioning problem, which is NP-hard, but for which efficient approximation algorithms exist. I will also discuss how to handle load balancing and data replication. This is joint work with Marios Hadjieleftheriou (AT&T), Howard Karloff (Yahoo!) and Barna Saha (AT&T).

This seminar is organized jointly with the Montreal section of CORS and financially supported by the CORS Traveling Speakers Program.