Tutorials

We are pleased to announce the following tutorials, which will be delivered during the conference:

Monday, December 17:

  • HPC Meets Cloud: Building Efficient Clouds for HPC, Big Data, and Deep Learning Middleware and Applications. (D. K. Panda and Xiaoyi Lu) (more information)
  • Unikernels for Dummies with Unikraft. (Felipe Huici) (more information)
  • APIs for Data Center Automation, Analytics, and Control. (Alan Sill) (more information)

Thursday, December 20:

  • A simple path towards testing cloud applications. (Francisco Gortázar) (more information)

Instructor and Bio / Topic and Abstract
Dhabaleswar K. (DK) Panda and Xiaoyi Lu, Ohio State University.

Dr. Dhabaleswar K. (DK) Panda is a Professor of Computer Science at the Ohio State University. His research interests include parallel computer architecture, high performance computing, communication protocols, file systems, network-based computing, and Quality of Service. He has published over 400 papers in major journals and international conferences related to these research areas. Dr. Panda and his research group members have been doing extensive research on modern networking technologies including InfiniBand, HSE and RDMA over Converged Enhanced Ethernet (RoCE). His research group is currently collaborating with National Laboratories and leading InfiniBand and 10GigE/iWARP companies on designing various subsystems of next generation high-end systems. The MVAPICH2 (High Performance MPI over InfiniBand, iWARP and RoCE) open-source software libraries, developed by his research group, are currently being used by more than 2,925 organizations worldwide (in 86 countries). This software has enabled several InfiniBand clusters (including the 2nd one) to get into the latest TOP500 ranking during the last decade. More than 482,000 downloads of these libraries have taken place from the project’s site. The new RDMA-enabled Apache Hadoop, Spark, HBase, and Memcached packages, consisting of acceleration for HDFS, MapReduce, RPC, Spark, HBase, and Memcached, and OSU HiBD micro-benchmarks are publicly available from http://hibd.cse.ohio-state.edu. These libraries are currently being used by more than 285 organizations from 34 countries. More than 27,100 downloads of these libraries have taken place from the project site. He is an IEEE Fellow and a member of ACM. More details about Dr. Panda are available at http://www.cse.ohio-state.edu/~panda.

Dr. Xiaoyi Lu is a Research Scientist in the Department of Computer Science and Engineering at the Ohio State University, USA. His current research interests include high performance interconnects and protocols, Big Data Processing, Parallel Computing Models (MPI/PGAS), Virtualization and Cloud Computing. He has published over 100 papers in international journals and conferences related to these research areas. He has been actively involved in various professional activities (PC Co-Chair, PC Member, and Reviewer) in academic journals and conferences. Dr. Lu currently leads the research and development of RDMA-based accelerations for Apache Hadoop, Spark, HBase, and Memcached, and OSU HiBD micro-benchmarks, which are publicly available from http://hibd.cse.ohio-state.edu. These libraries are currently being used by more than 285 organizations from 34 countries. More than 27,100 downloads of these libraries have taken place from the project site. He is a core member of the MVAPICH2 project and leads the research and development of MVAPICH2-Virt (high-performance and scalable MPI for hypervisor and container based HPC cloud). He is a member of IEEE and ACM. More details about Dr. Lu are available at http://www.cse.ohio-state.edu/~luxi.

HPC Meets Cloud: Building Efficient Clouds for HPC, Big Data, and Deep Learning Middleware and Applications.

The recently introduced Single Root I/O Virtualization (SR-IOV) technology for InfiniBand and high-speed Ethernet provides native I/O virtualization capabilities and is changing the landscape of HPC virtualization. However, SR-IOV lacks support for locality-aware communication and live migration, which limits its use for efficiently running HPC, Big Data, and Deep Learning workloads. This tutorial first provides an overview of popular virtualization system software, high-performance interconnects, and communication mechanisms on HPC clouds, such as InfiniBand, RDMA, SR-IOV, and IVShmem. We further discuss the opportunities and technical challenges of designing a high-performance MPI runtime over SR-IOV enabled InfiniBand clusters with both virtual machines and containers. We also discuss how to integrate these designs into cloud management systems like OpenStack and HPC cluster resource managers like Slurm. In addition, we will demonstrate how high-performance solutions can be designed to run Big Data and Deep Learning workloads (like Hadoop, Spark, TensorFlow) in HPC cloud environments. Finally, we will show demos of running these designs on the NSF-supported Chameleon Cloud platform. This tutorial has evolved from previous well-attended offerings (e.g., UCC’17, MUG’17, ICDCS’18). We will enhance the materials for UCC’18 with emerging technologies and the feedback received from previous offerings.

Simon Kuenzer and Felipe Huici, NEC Labs.

Simon Kuenzer is a senior systems researcher passionate about virtualization and unikernels. Simon has been at NEC Labs for the past 6 years and has expertise in NFV and fast packet frameworks like Intel DPDK and Netmap. He is the main maintainer of Unikraft and is currently doing a Ph.D. at the University of Liege, having received a diploma degree in computer science with a focus on robotics and operating systems at the Karlsruhe Institute of Technology (KIT) in Germany.

Felipe Huici is a chief researcher at NEC Europe Labs in Heidelberg. He received his undergraduate degree with honors from the University of Virginia, and his Masters in Data Communications, Networks and Distributed Systems from University College London, graduating top of the class; he received his Ph.D. from that same institution under the supervision of Prof. Mark Handley (UCL). Felipe regularly publishes in top-tier conferences and journals such as SOSP, SIGCOMM, NSDI, CoNEXT, SoCC and SIGCOMM CCR, regularly acts as a TPC member for conferences and journals such as CoNEXT, INFOCOM and SIGCOMM CCR, and is one of the maintainers of the Unikraft project.

Unikernels for Dummies with Unikraft.

A large body of research has shown that specialization of applications can yield large performance gains. In particular, unikernels (specialized operating systems tailored to specific applications) can provide very fast boot times (a few milliseconds), tiny memory footprints (hundreds of KBs or a few MBs) and high network throughput (e.g., 40 Gb/s), all with strong isolation. The painful downside to specializing OSes is the significant amount of expert time needed to port applications to the underlying minimalistic OSes. In this tutorial we will introduce Unikraft, a Linux Foundation open source project aimed at providing a menu-based, automated tool for creating unikernels targeting specific applications. Unikraft breaks down an OS’s functionality into a set of fine-grained libraries (e.g., memory allocators, schedulers, filesystems, drivers, network stacks), along with the ability to leverage standard, existing libraries (e.g., libc, openssl). Users can then pick and choose functionality through the menu to build specialized OSes and to tweak their performance.

Alan Sill, Texas Tech University.

Alan Sill is the Managing Director of the High Performance Computing Center at Texas Tech University, where he is also adjunct professor of physics, and Visiting Professor of Distributed Computing at the University of Derby. He also co-directs the multi-university US National Science Foundation Industry/University Cooperative Research Center for Cloud and Autonomic Computing (CAC). Dr. Sill holds a PhD in particle physics from American University and has an extensive track record of work in particle and nuclear physics and scientific computing, including cloud and grid computing. He serves as President of the Open Grid Forum, is an active member of many computing standards working groups and roadmap committees, and has served as principal organizer, program committee chair, or general chair for a number of large-scale conferences and workshops.

APIs for Data Center Automation, Analytics, and Control.

The tutorial will present both the conceptual background (briefly) and some hands-on exercises with code that can be used in practice for data center and server management. It will introduce Redfish, which improves on the too late, too chaotic standardisation of IPMI with a RESTful and actually useful interface. The implications of deploying Redfish, which is included in the backplanes of all recent servers, are manifold. First, future large-scale and exascale installations require full hardware control for administrators so that users cannot accidentally lock up or break parts of the system. The talk will highlight the advantages of standardised APIs using OpenAPI, derived through transformation from Redfish’s JSON schemas. The speaker will furthermore touch on emerging Metal-as-a-Service offerings such as Chameleon and Cloudlab, which allow for rapid and reproducible instantiation of machines including pre-installed software.

Francisco Gortázar (Patxi), Universidad Rey Juan Carlos.

Dr. Francisco Gortázar is a tenured professor at Universidad Rey Juan Carlos with more than 12 years of experience in teaching distributed systems and concurrent programming. He has published more than 20 papers in high-impact journals and conferences. He has a strong connection with industry, specifically providing consultancy on cloud technologies and on improving testing activities. He currently coordinates the H2020 project ElasTest, where he is researching novel ways of testing cloud infrastructures and applications, including 5G, IoT and real-time communication systems.

A simple path towards testing cloud applications.

Integration and end-to-end (e2e) testing of distributed systems, especially those deployed on cloud infrastructure, is a much more complex task than testing a monolithic application. Distributed systems require several services to be started, even for the simplest integration tests, and several tools need to be in place, such as automated browsers for e2e testing. The tutorial will show how ElasTest can reduce the effort required for 1) testing such systems, and 2) doing root cause analysis in the presence of failures. We will start by explaining how to perform e2e testing of a Docker-based application using Jenkins, a popular CI server. We will go through the life-cycle of the application: starting, testing, gathering logs and metrics, stopping, and analyzing results. Then we will turn to ElasTest and show how these tasks can be easily performed on this new testing platform, which is specifically tailored to distributed systems testing. Two different approaches with ElasTest will be showcased: the first is tightly integrated with Jenkins and requires minimal effort and changes to the existing Jenkins configuration; the second leaves the full application lifecycle to the ElasTest platform, so the team can focus exclusively on testing.
