Robust Resource Allocation for Heterogeneous Parallel and Distributed Computing Systems
Prof. H. J. Siegel
Abell Endowed Chair Distinguished Professor of Electrical and Computer Engineering and Professor of Computer Science
Director, CSU Information Science and Technology Center (ISTeC)
Colorado State University, Fort Collins, Colorado, USA
Date: Monday - 6:00-9:00 PM
In heterogeneous parallel and distributed computing environments, a network of different machines is interconnected and provides a variety of computational capabilities. These capabilities can be used to execute a collection of different types of applications, each of which may consist of multiple tasks, where the tasks have diverse computational requirements. The execution times of a task may vary from one machine to the next, and tasks must share the computing and communication resources of the system. Furthermore, there can be inter-task data dependencies.
The resources in parallel and distributed computing systems should be allocated to the computational tasks in a way that maximizes some system performance measure. However, allocation decisions and associated performance prediction are often based on estimated values of task and system parameters. The actual values of these parameters may differ from the estimates; for example, the estimates may represent only average values, the models used to generate the estimates may have limited accuracy, or there may be changes in the environment. Thus, an important research problem is the development of resource management strategies that strive to meet particular system performance requirements given such uncertainties. To address this problem, we have designed two models for deriving the degree of robustness of a resource allocation. One model is based on having deterministic estimates of the parameters whose exact values are uncertain, and in this case the degree of robustness of a resource allocation is quantified as the maximum amount of collective uncertainty in these system parameters within which a user-specified level of system performance (QoS) can be guaranteed. The second model assumes that stochastic information is available about the values of these parameters whose actual values are uncertain, and with this model the degree of robustness is quantified as the probability that a user-specified level of system performance can be met. Both robustness models, and the robustness metric associated with each, will be presented, and it will be shown how they can be used to evaluate the robustness of resource allocations. In addition, it will be demonstrated how these models can be incorporated into resource management heuristics that produce robust allocations that attempt to optimize some user-specified performance criterion. 
This will be done for both static heuristics, which are executed off-line for production environments, and dynamic heuristics, which are executed on-line for environments where tasks must be assigned resources as they arrive in the system.
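As a rough illustration of the two robustness models described above, the following Python sketch computes one possible metric for each. It is not taken from the tutorial material and rests on several simplifying assumptions: tasks are independent (no inter-task data dependencies), each machine runs its assigned tasks one after another, the performance requirement is that every machine finishes by a deadline tau, collective uncertainty is measured with the Euclidean norm of the errors in the execution time estimates, and, for the stochastic model, execution times are independent and normally distributed with positive standard deviations. All function names, task names, machine names, and numbers are hypothetical.

```python
import math
from collections import defaultdict


def group_by_machine(allocation, values):
    """Collect the per-task values assigned to each machine.

    allocation: dict mapping task -> machine
    values:     dict mapping (task, machine) -> number
    """
    grouped = defaultdict(list)
    for task, machine in allocation.items():
        grouped[machine].append(values[(task, machine)])
    return grouped


def robustness_radius(allocation, est_time, tau):
    """Deterministic model (assumed form): the smallest Euclidean-norm error
    in the execution time estimates that can push some machine past tau.

    For a machine with n tasks and estimated finishing time F, the nearest
    violating point lies at distance (tau - F) / sqrt(n). The allocation's
    robustness is the minimum of that distance over all machines. A negative
    result means the estimates already violate the deadline.
    """
    loads = group_by_machine(allocation, est_time)
    return min(
        (tau - sum(times)) / math.sqrt(len(times))
        for times in loads.values()
    )


def normal_cdf(z):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def stochastic_robustness(allocation, mean_time, std_time, tau):
    """Stochastic model (assumed form): probability that every machine
    finishes by tau, with independent normal execution times.

    Each machine's finishing time is then normal with the summed mean and
    summed variance of its tasks, and machines are independent of each other.
    """
    means = group_by_machine(allocation, mean_time)
    stds = group_by_machine(allocation, std_time)
    prob = 1.0
    for machine, mu_list in means.items():
        mu = sum(mu_list)
        sigma = math.sqrt(sum(s * s for s in stds[machine]))
        prob *= normal_cdf((tau - mu) / sigma)
    return prob


# Hypothetical example: three tasks mapped onto two machines.
allocation = {"t1": "m1", "t2": "m1", "t3": "m2"}
est_time = {("t1", "m1"): 10.0, ("t2", "m1"): 20.0, ("t3", "m2"): 25.0}
std_time = {("t1", "m1"): 3.0, ("t2", "m1"): 4.0, ("t3", "m2"): 5.0}
tau = 40.0

print("robustness radius:", robustness_radius(allocation, est_time, tau))
print("P(all machines meet tau):",
      stochastic_robustness(allocation, est_time, std_time, tau))
```

A heuristic, whether static or dynamic, could use either function as a scoring rule: among candidate allocations that satisfy the performance requirement on the estimated values, prefer the one with the larger radius or the higher probability. This is one possible way to incorporate the metrics, not a description of the specific heuristics covered in the tutorial.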
The tutorial material is applicable to various types of heterogeneous computing and communication environments, including parallel, distributed, cluster, grid, Internet, embedded, and wireless. Furthermore, the robustness models, concepts, and metrics presented are generally applicable to design problems throughout various scientific and engineering fields.
This course will enable you to:
- Understand the problem of robust resource allocation in heterogeneous parallel and distributed computing systems
- Ask the “three robustness questions” that must be answered whenever anyone makes robustness claims
- Apply the appropriate model of robustness depending on the information available about the system uncertainties
- Design and use robustness metrics to quantify the robustness of a particular resource allocation for a given computational environment
- Incorporate robustness into the design of both static (off-line) and dynamic (on-line) resource allocation heuristics
This course is intended for faculty, engineers, scientists, and graduate students who want to learn how to define, model, and quantify robustness when designing and using heterogeneous suites of computers (including clusters and certain types of grids) to execute applications in a way that optimizes some performance criterion.
H. J. Siegel is the George T. Abell Endowed Chair Distinguished Professor of Electrical and Computer Engineering at Colorado State University (CSU), where he is also a Professor of Computer Science. He is the Director of the CSU Information Science and Technology Center (ISTeC), a university-wide organization for promoting, facilitating, and enhancing CSU’s research, education, and outreach activities pertaining to the design and innovative application of computer, communication, and information systems. From 1976 to 2001, he was a professor in the School of Electrical and Computer Engineering at Purdue University. He received two B.S. degrees from the Massachusetts Institute of Technology (MIT), and the M.A., M.S.E., and Ph.D. degrees from Princeton University. He is a Fellow of the IEEE and a Fellow of the ACM. Prof. Siegel has co-authored over 330 published technical papers in the areas of parallel and distributed computing and communications. He was a Coeditor-in-Chief of the Journal of Parallel and Distributed Computing, and was on the Editorial Boards of the IEEE Transactions on Parallel and Distributed Systems and the IEEE Transactions on Computers. For more information, please see www.engr.colostate.edu/~hj.
Professor H. J. Siegel
Department of Electrical and Computer Engineering and Department of Computer Science
Colorado State University
Fort Collins, CO 80523-1373
Office: (970) 491-7982
Fax: (970) 491-2249