RACC Seminar 4/23/2015 - Challenges of Managing Scientific Workflows in High-Throughput and High-Performance Computing Environments with Dr. Ewa Deelman



Abstract:
Scientific workflows allow researchers to declaratively describe potentially complex applications that are composed of individual computational components. Workflows also include a description of the data and control dependencies between the components. This talk will describe example workflows in various science domains, including astronomy, bioinformatics, earthquake science, gravitational-wave physics, and others. It will examine the challenges faced by workflow management systems when executing complex workflows in distributed and high-performance computing environments. In particular, the talk will describe the Pegasus Workflow Management System developed at USC/ISI (https://pegasus.isi.edu). Pegasus bridges the scientific domain and the execution environment by automatically mapping high-level workflow descriptions onto distributed resources. As part of this process, Pegasus may transform the workflow based on the workflow's properties and the target architecture. The talk will describe the optimizations and techniques developed and used within Pegasus to efficiently manage data and computations across heterogeneous computing environments. Pegasus can execute workflows on a laptop, a campus cluster, grids, and clouds. It can handle workflows ranging from a single task to millions of tasks, and it has been used to manage workflows that access and generate terabytes of data. The talk will also look at the challenges and opportunities that upcoming extreme-scale machines bring to workflow management systems.

Dr. Ewa Deelman is a Research Associate Professor in the USC Computer Science Department and the Assistant Director of Science Automation Technologies at the USC Information Sciences Institute. Dr. Deelman's research interests include the design and exploration of collaborative, distributed scientific environments, with particular emphasis on workflow management and on the management of large amounts of data and metadata. In 2007, Dr. Deelman co-edited the book "Workflows for e-Science: Scientific Workflows for Grids", published by Springer. She is also the founder of the annual Workshop on Workflows in Support of Large-Scale Science, which is held in conjunction with the Supercomputing conference. In 1997, Dr. Deelman received her PhD in Computer Science from Rensselaer Polytechnic Institute.