IPDRM 2017

Second Annual Workshop on Emerging Parallel and Distributed Runtime Systems and Middleware

Orlando, Florida, USA.

Held in conjunction with the 31st IEEE International Parallel and Distributed Processing Symposium (IEEE IPDPS 2017), May 29-June 2, 2017, Orlando, Florida, USA.

Submission deadline: January 29th, 2017, EXTENDED to February 5th, 2017


Node architectures of extreme-scale systems are rapidly increasing in complexity. Emerging homogeneous and heterogeneous designs provide massive multi-level parallelism, but developing efficient runtime systems and middleware that allow applications to efficiently and productively exploit these architectures is extremely challenging. Moreover, current state-of-the-art approaches may become unworkable once energy consumption, resilience, and data movement constraints are added. The goal of this workshop is to attract the international research community to share new and bold ideas that will address the challenges of design, implementation, deployment, and evaluation of future runtime systems and middleware.


This workshop will emphasize novel, disruptive research ideas over incremental advances. We will solicit papers on topics including, but not limited to, the following areas:

Runtime System/Middleware Design, Evaluation and Usage

  • Runtime/Middleware for emerging HPC and cloud computing platforms
  • Runtime/Middleware for Big Data Computing
  • Modeling and Performance Analysis of Runtime Systems
  • Comparison studies of different runtime systems and middleware
  • Tuning and Optimization studies
  • Interactions between Runtime Systems and Middleware
  • Runtime-architecture co-design

Constraints and Issues for Runtime Systems and Middleware

  • Energy- and Power-aware schemes
  • Fault Tolerance and Reliability
  • Scalable high-performance I/O and access to Big Data
  • Memory management
  • Runtime data analysis (e.g., in-situ analysis)
  • Real-time solutions and QoS
  • Virtualization, provisioning, and scheduling
  • Scalability of novel runtime systems and applications using them

Design Principles and Programming Support

  • High-level programming models (e.g., thread and task based models, data parallel models, and stream programming) and domain-specific languages
  • Programming frameworks, parallel programming, and design methodologies
  • Methodologies and tools for runtime and middleware design, implementation, verification, and evaluation
  • Wild and crazy ideas on future Runtime Systems and Middleware


We invite two kinds of submissions to this workshop: (1) full-length research papers (8-page limit); (2) short papers (4-page limit), which can take the form of position papers, experience reports, or surveys/comparisons of runtime systems and middleware. Papers should not exceed eight (or four) single-spaced pages (including figures, tables, and references) using 10-point font on 8.5x11-inch pages. Submissions will be judged on correctness, originality, technical strength, significance, presentation quality, and appropriateness. Submitted papers should not have appeared in, or be under consideration for, another venue. A full peer-review process will be followed, with each paper reviewed by at least three members of the program committee.

Submissions should follow the IEEE Conference Proceedings templates found at (http://www.ieee.org/conferences_events/conferences/publishing/templates.html). Camera-ready copy will need to conform to IPDPS guidelines; these will be announced during author notification.

Papers can be submitted at (https://easychair.org/conferences/?conf=ipdrm17).


Important Dates

  • Paper Submission: January 29th, EXTENDED to February 5th
  • Paper Notification: February 28th
  • Final Paper Due: March 7th

Organizing Committees

General Chairs

  • Shuaiwen Leon Song, Pacific Northwest National Laboratory
  • Torsten Hoefler, ETH Zurich

Proceedings Chair

  • Joseph Manzano, Pacific Northwest National Laboratory

Program Committee

  • Prasanna Balaprakash, Argonne National Laboratory, USA
  • Suren Byna, Lawrence Berkeley National Laboratory, USA
  • Marc Casas, Barcelona Supercomputing Center, Spain
  • Holger Fröning, Heidelberg University, Germany
  • Todd Gamblin, Lawrence Livermore National Laboratory, USA
  • Xiong Jin, Institute of Computing Technology, Chinese Academy of Sciences, China
  • Changhee Jung, Virginia Tech, USA
  • Ang Li, Pacific Northwest National Laboratory, USA
  • Chao Li, Qualcomm, USA
  • Xu Liu, College of William and Mary, USA
  • Benoit Meister, Reservoir Labs, USA
  • Bin Ren, College of William and Mary, USA
  • Siva Kumar Sastry Hari, NVIDIA, USA
  • Dipanjan Sengupta, Intel Labs, USA
  • Devesh Tiwari, Northeastern University, USA
  • Ananta Tiwari, San Diego Supercomputing Center, USA
  • Bo Wu, Colorado School of Mines, USA
  • Yi Yang, NEC Laboratories America, Inc, USA
  • Jeffrey Young, Georgia Tech, USA
  • Zhijia Zhao, University of California, Riverside, USA

Workshop Program

  • 08:30 AM to 08:45 AM: Introduction to the Workshop
  • 08:45 AM to 09:45 AM: Keynote: Xian-He Sun, Illinois Institute of Technology, “Memory Centric Optimization: Keep the memory hierarchy but nothing else”
  • 09:45 AM to 10:00 AM: Break 1
  • 10:00 AM to 10:45 AM: Invited Talk 1: Marc Casas; Reducing Data Movements on Large Shared Memory Systems by Exploiting Computation Dependencies
  • 10:45 AM to 12:15 PM: Session 1
    • Characterizing and Improving the Performance of Many-Core Task-Based Parallel Programming Runtimes; Jaume Bosch Pons, Xubin Tan, Carlos Alvarez, Daniel Jimenez, Xavier Martorell, Eduard Ayguade (Barcelona Supercomputing Center (BSC) - Universitat Politecnica de Catalunya)
    • A Memory Heterogeneity-Aware Runtime System for bandwidth-sensitive HPC applications; Kavitha Chandrasekar, Laxmikant V. Kale (University of Illinois Urbana Champaign), Xiang Ni (University of Illinois Urbana Champaign and IBM T.J. Watson Center)
    • SmartBlock: An Approach to Standardizing In Situ Workflow Components; Alexis Champsaur, Jai Dayal, Matthew Wolf, Ada Gavrilovska (School of Computer Science, Georgia Tech), Greg Eisenhauer, Jay Lofstead, Patrick Widener (Sandia National Laboratories)
  • 12:15 PM to 01:30 PM: Lunch
  • 01:30 PM to 02:15 PM: Invited Talk 2: Torsten Hoefler, “dCUDA: Hardware Supported Overlap of Computation and Communication”
  • 02:15 PM to 03:15 PM: Session 2
    • A Case Study in Computational Caching Microservices for HPC; John Jenkins, Philip Carns, Robert Ross (Argonne National Laboratory), Galen Shipman, Jamaludin Mohd-Yusof, Kipton Barros (Los Alamos National Laboratory)
    • A Load-Balanced Parallel and Distributed Sorting Algorithm Implemented with PGX.D; Sungpack Hong, Jinsoo Lee, Siegfried Depner, Hassan Chafi (Oracle Labs), J. Ramanujam, Hartmut Kaiser (Center for Computation and Technology, Louisiana State University), Zahra Khatami (Oracle Labs and Center for Computation and Technology, Louisiana State University)
  • 03:15 PM to 03:30 PM: Break
  • 03:30 PM to 04:30 PM: Session 3
    • Performance Prediction of HPC Applications on Intel Processors; Carlos Rosales, Antonio Gomez-Iglesias, Si Liu, Feng Chen, Lei Huang, Hang Liu, Antia Lamas-Linares, John Cazes (Texas Advanced Computing Center, The University of Texas at Austin)
    • vPHI: Enabling Xeon Phi Capabilities in Virtual Machines; Stefanos Gerangelos, Nectarios Koziris (Computing Systems Laboratory, National Technical University of Athens)
  • 04:30 PM to 04:40 PM: Closing statements



Memory Centric Optimization: Keep the memory hierarchy but nothing else


Computing has shifted from compute-centric to data-centric. Many new architectures, such as GPUs, FPGAs, and ASICs, have been introduced to match computer systems to applications’ data requirements and thereby improve overall performance. In this talk we introduce a series of fundamental results, and their associated mechanisms, for conducting this matching automatically through both hardware and software optimizations. We first present the Concurrent-AMAT (C-AMAT) data access model, which unifies the impact of data locality, concurrency, and overlapping. Then we introduce the pace-matching data-transfer design methodology for optimizing memory system performance. Based on the pace-matching design, a memory hierarchy is built to mask the performance gap between the CPU and memory devices. C-AMAT is used to calculate the data-transfer request/supply ratio at each memory layer, and a global control algorithm, named layered performance matching (LPM), is developed to match the data transfer at each memory layer and thus match the overall performance of the CPU and the underlying memory system. This holistic pace-matching optimization is very different from conventional locality-based system optimization. Analytic results show that the pace-matching approach can minimize memory-wall effects, and experimental testing confirms the theoretical findings, with a 150x reduction in memory stall time. We will present the concept of pace-matching data transfer, the design of C-AMAT and LPM, and some experimental case studies. We will also discuss optimization and research issues related to pace-matching data transfer and to memory systems in general.
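For readers unfamiliar with the model, the published C-AMAT formulation (Sun and Wang) extends the classic average memory access time (AMAT) metric by dividing each cost term by the concurrency available to hide it. The following is a sketch of that published form; the notation here is the authors' standard one and may differ slightly from what is used in the talk:

```latex
% Classic AMAT: hit time plus miss rate times average miss penalty,
% with no account of overlapped (concurrent) accesses.
\[
\mathrm{AMAT} = H + \mathrm{MR} \times \mathrm{AMP}
\]

% C-AMAT: H is hit time and C_H is hit concurrency; pMR and pAMP are
% the "pure" miss rate and "pure" average miss penalty (only misses
% whose latency cannot be hidden behind concurrent accesses), and
% C_M is the pure-miss concurrency.
\[
\text{C-AMAT} = \frac{H}{C_H} + \mathrm{pMR} \times \frac{\mathrm{pAMP}}{C_M}
\]
```

Intuitively, higher hit or miss concurrency (larger \(C_H\) or \(C_M\)) reduces the effective per-access cost, which is what allows a pace-matching design to trade concurrency against locality at each memory layer.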

Bio (short version)

Dr. Xian-He Sun is a University Distinguished Professor of Computer Science in the Department of Computer Science at the Illinois Institute of Technology (IIT). He is the director of the Scalable Computing Software Laboratory at IIT and a guest faculty member in the Mathematics and Computer Science Division at Argonne National Laboratory. Before joining IIT, he worked at the DoE Ames National Laboratory; at ICASE, NASA Langley Research Center; and at Louisiana State University, Baton Rouge; and he was an ASEE fellow at the Naval Research Laboratory. Dr. Sun is an IEEE Fellow and is known for his memory-bounded speedup model, also called Sun-Ni's Law, for scalable computing. His research interests include data-intensive high-performance computing, memory and I/O systems, software systems for big data applications, and performance evaluation and optimization. He has over 250 publications and 5 patents in these areas. He is a former IEEE CS distinguished speaker, a former vice chair of the IEEE Technical Committee on Scalable Computing, and a past chair of the Computer Science Department at IIT, and he serves or has served on the editorial boards of leading professional journals in the field of parallel processing. More information about Dr. Sun can be found at his web site www.cs.iit.edu/~sun/.