We are pleased to announce the following tutorials which will be delivered during the conference:
Instructor and Bio | Topic and Abstract
Dhabaleswar K. (DK) Panda and Xiaoyi Lu, Ohio State University.
Dr. Dhabaleswar K. (DK) Panda is a Professor of Computer Science at the Ohio
State University. His research interests include parallel computer
architecture, high-performance computing, communication protocols, file
systems, network-based computing, and Quality of Service. He has published over
400 papers in major journals and international conferences in these research
areas. Dr. Panda and his research group members have been doing extensive
research on modern networking technologies including InfiniBand, HSE, and RDMA
over Converged Enhanced Ethernet (RoCE). His research group is currently
collaborating with National Laboratories and leading InfiniBand and
10GigE/iWARP companies on designing various subsystems of next-generation
high-end systems. The MVAPICH2 (High-Performance MPI over InfiniBand, iWARP and
RoCE) open-source software libraries, developed by his research group, are
currently being used by more than 2,925 organizations worldwide (in 86
countries). This software has enabled several InfiniBand clusters (including
the 2nd-ranked one) to get into the latest TOP500 ranking during the last
decade. More than 482,000 downloads of these libraries have taken place from
the project's site. The new RDMA-enabled Apache Hadoop, Spark, HBase, and
Memcached packages, consisting of acceleration for HDFS, MapReduce, RPC, Spark,
HBase, and Memcached, together with the OSU HiBD micro-benchmarks, are publicly
available from http://hibd.cse.ohio-state.edu. These libraries are currently
being used by more than 285 organizations in 34 countries, with more than
27,100 downloads from the project site. He is an IEEE Fellow and a member of
ACM. More details about Dr. Panda are available at
http://www.cse.ohio-state.edu/~panda.
Dr. Xiaoyi Lu is a Research Scientist in the Department of Computer Science and
Engineering at the Ohio State University, USA. His current research interests
include high-performance interconnects and protocols, Big Data processing,
parallel computing models (MPI/PGAS), virtualization, and cloud computing. He
has published over 100 papers in international journals and conferences in
these research areas. He has been actively involved in various professional
activities (PC Co-Chair, PC Member, and Reviewer) for academic journals and
conferences. Dr. Lu is currently leading the research and development of
RDMA-based accelerations for Apache Hadoop, Spark, HBase, and Memcached, and
the OSU HiBD micro-benchmarks, which are publicly available from
http://hibd.cse.ohio-state.edu. These libraries are currently being used by
more than 285 organizations in 34 countries, with more than 27,100 downloads
from the project site. He is a core member of the MVAPICH2 project and leads
the research and development of MVAPICH2-Virt (high-performance and scalable
MPI for hypervisor- and container-based HPC clouds). He is a member of IEEE and
ACM. More details about Dr. Lu are available at
http://www.cse.ohio-state.edu/~luxi.
|
HPC Meets Cloud: Building Efficient Clouds for HPC, Big Data, and Deep Learning Middleware and Applications.
The recently introduced Single Root I/O Virtualization (SR-IOV) technology for
InfiniBand and high-speed Ethernet provides native I/O virtualization
capabilities and is changing the landscape of HPC virtualization. However,
SR-IOV lacks support for locality-aware communication and live migration, which
limits its use for efficiently running HPC, Big Data, and Deep Learning
workloads. This tutorial first provides an overview of popular virtualization
system software, high-performance interconnects and communication mechanisms on
HPC clouds, such as InfiniBand, RDMA, SR-IOV, IVShmem, etc. We further discuss
the opportunities and technical challenges of designing high-performance MPI
runtime over SR-IOV enabled InfiniBand clusters with both virtual machines and
containers. We also discuss how to integrate these designs into cloud management
systems like OpenStack and HPC cluster resource managers like Slurm. In
addition, we will demonstrate how high-performance solutions can be designed to
run Big Data and Deep Learning workloads (like Hadoop, Spark, TensorFlow) in HPC
cloud environments. Finally, we will show demos of running these designs on the
NSF-supported Chameleon Cloud platform.
The proposed tutorial has evolved from previous well-attended offerings
(e.g., UCC'17, MUG'17, ICDCS'18). We will enhance the materials for UCC'18 with
emerging new technologies and the feedback received from previous offerings.
|
Simon Kuenzer and Felipe Huici, NEC Labs.
Simon Kuenzer is a senior systems researcher passionate about
virtualization and unikernels. Simon has been at NEC Labs for the past
6 years and has expertise in NFV and fast packet frameworks like Intel
DPDK and Netmap. He is the main maintainer of Unikraft and is
currently pursuing a Ph.D. at the University of Liège, having received a
diploma degree in computer science with a focus on robotics and
operating systems from the Karlsruhe Institute of Technology (KIT) in
Germany.
Felipe Huici is a chief researcher at NEC Europe Labs in
Heidelberg. He received his undergraduate degree with honors from the
University of Virginia, and his Masters in Data Communications,
Networks and Distributed Systems from University College London,
graduating top of the class; he received his Ph.D. from that same
institution under the supervision of Prof. Mark Handley (UCL). Felipe
regularly publishes in top-tier conferences and journals such
as SOSP, SIGCOMM, NSDI, CoNEXT, SoCC and SIGCOMM CCR, regularly acts
as TPC member of conferences and journals such as CoNEXT, INFOCOM and
SIGCOMM CCR, and is one of the maintainers of the Unikraft project.
|
Unikernels for Dummies with Unikraft.
A large body of research has shown that specialization of applications
can yield large performance gains. In particular, unikernels
(specialized operating systems tailored to specific applications) can
provide incredibly fast boot times (a few milliseconds), tiny memory
footprints (hundreds of KBs or a few MBs) and high network throughput
(e.g., 40 Gb/s) all with great isolation. The painful downside to
specializing OSes is the significant amount of expert time needed to
port applications to the underlying minimalistic OSes.
In this tutorial we will introduce Unikraft, a Linux Foundation open
source project aimed at providing a menu-based, automated tool for
creating unikernels targeting specific applications. Unikraft breaks
down an OS's functionality into a set of fine-grained libraries
(e.g., memory allocators, schedulers, filesystems, drivers, network
stacks, etc.), along with the ability to leverage standard, existing
libraries (e.g., libc, openssl, etc.). Users can then pick and choose
functionality through the menu to build specialized OSes and to tweak
their performance.
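To give a flavor of the pick-and-choose idea, the toy model below (in Python, purely illustrative; Unikraft itself uses a Kconfig-style menu and Makefiles, and the library names and dependency graph here are made-up assumptions, not the real Unikraft ones) shows how selecting only the libraries an application needs, plus their transitive dependencies, leaves everything else out of the image:

```python
# Toy model of composing a unikernel from fine-grained OS libraries.
# The names and dependency edges below are illustrative assumptions,
# not Unikraft's actual library set.
LIBRARIES = {
    "app-httpreply":    {"network-stack", "scheduler"},
    "network-stack":    {"memory-allocator"},
    "scheduler":        {"memory-allocator"},
    "filesystem":       {"memory-allocator"},
    "memory-allocator": set(),
}

def resolve(selected):
    """Close the user's menu selection over the dependency graph."""
    needed = set()
    stack = list(selected)
    while stack:
        lib = stack.pop()
        if lib not in needed:
            needed.add(lib)
            stack.extend(LIBRARIES[lib])
    return needed

# The user picks only the application; its dependencies are pulled in,
# and the unused filesystem library stays out of the image entirely,
# which is where the small footprints mentioned above come from.
image = resolve({"app-httpreply"})
print(sorted(image))
```

Excluding whole subsystems at build time, rather than disabling them at run time, is what lets unikernels reach the boot times and memory footprints quoted above.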
|
Alan Sill, Texas Tech University.
Alan Sill is the Managing Director of the High Performance Computing Center at Texas Tech University, where he is also adjunct professor of physics, and Visiting Professor of Distributed Computing at the University of Derby. He also co-directs the multi-university US National Science Foundation Industry/University Cooperative Research Center for Cloud and Autonomic Computing (CAC). Dr. Sill holds a PhD in particle physics from American University and has an extensive track record of work in particle and nuclear physics and scientific computing including cloud and grid computing. He serves as President of the Open Grid Forum, is an active member of many computing standards working groups and roadmap committees, and has served as principal organizer, program committee chair, or general chair for a number of large-scale conferences and workshops.
|
APIs for Data Center Automation, Analytics, and Control.
The tutorial will present both the conceptual background (brief) and some hands-on exercises for code that can be used in practice for data center and server management.
It will introduce Redfish, which improves on the late and fragmented standardisation of IPMI by offering a RESTful and genuinely usable interface.
The implications of deploying Redfish, which ships in the management controllers of all recent servers, are manifold. For instance, future large-scale and exascale installations require full hardware control for administrators so that users cannot accidentally lock up or break parts of the system.
The talk will also highlight the advantages of standardised APIs, using OpenAPI definitions derived by transformation from Redfish's JSON schemas.
The speaker will furthermore touch on emerging Metal-as-a-Service offerings such as Chameleon and CloudLab, which allow for rapid and reproducible instantiation of machines including pre-installed software.
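As a taste of the kind of code covered, the sketch below parses an excerpt of a Redfish ComputerSystem resource, as a BMC would return it from a GET on a Systems member. The field names follow the DMTF Redfish schema, but the concrete values (and the resource path) are illustrative assumptions:

```python
import json

# Illustrative excerpt of a Redfish ComputerSystem resource (the values
# are made up; the field names follow the DMTF Redfish schema).
payload = """
{
  "@odata.id": "/redfish/v1/Systems/1",
  "Id": "1",
  "PowerState": "On",
  "MemorySummary": {"TotalSystemMemoryGiB": 128},
  "ProcessorSummary": {"Count": 2, "Model": "x86-64"},
  "Status": {"State": "Enabled", "Health": "OK"}
}
"""

system = json.loads(payload)

def summarize(system):
    """Return a one-line summary of a Redfish ComputerSystem resource."""
    return "{}: power={}, {} CPUs, {} GiB RAM, health={}".format(
        system["@odata.id"],
        system["PowerState"],
        system["ProcessorSummary"]["Count"],
        system["MemorySummary"]["TotalSystemMemoryGiB"],
        system["Status"]["Health"],
    )

print(summarize(system))
```

Against a real server, the same document shape would come back from an authenticated HTTPS GET to the BMC; the tutorial's hands-on exercises build on exactly this style of request-and-parse code.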
|
Francisco Gortázar (Patxi), Universidad Rey Juan Carlos.
Dr. Francisco Gortázar is a Tenure Professor at Universidad Rey Juan Carlos with more than 12 years of experience in teaching distributed systems and concurrent programming. He has published more than 20 papers on high impact journals and conferences. He has a strong connection with the industry, specifically providing consultancy about cloud technologies and improving testing activities. Currently he is coordinating the H2020 project ElasTest, where he is researching novel ways of testing cloud infrastructures and applications, including 5G, IoT and real-time communication systems.
|
A simple path towards testing cloud applications.
Integration and end-to-end (e2e) testing of distributed systems, especially those deployed on cloud infrastructure, is a much more complex task than testing a monolithic application. Distributed systems require several services to be started, even for the simplest integration tests, and several tools need to be in place, such as automated browsers for e2e testing. The tutorial will show how the effort required to 1) test such systems and 2) perform root cause analysis in the presence of failures can be reduced by using ElasTest. We will start by explaining how to perform e2e testing of a Docker-based application using Jenkins, a popular CI server, going through the application's life-cycle: starting, testing, gathering logs and metrics, stopping, and analyzing results. We will then turn to ElasTest and show how these tasks can be performed easily on this new testing platform, specifically tailored for distributed systems testing. Two different approaches with ElasTest will be showcased: the first is tightly integrated with Jenkins and requires minimal effort and changes to the existing Jenkins configuration; the second leaves the full application life-cycle to the ElasTest platform, so the team can focus exclusively on testing.
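The life-cycle above (start, test, gather logs and metrics, stop, analyze) can be sketched in miniature with nothing but the Python standard library. This is not ElasTest or Jenkins code; the local HTTP server simply stands in for the Docker-based application under test:

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# A stand-in "service under test"; in the tutorial this would be a
# Docker-based application, but a local HTTP server is enough to show
# the start / test / gather-logs / stop / analyze cycle.
logs = []

class EchoHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"ok")

    def log_message(self, fmt, *args):
        logs.append(fmt % args)  # gather logs instead of printing them

# 1) start the service (ephemeral port, background thread)
server = HTTPServer(("127.0.0.1", 0), EchoHandler)
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

# 2) run an end-to-end check against it
body = urllib.request.urlopen("http://127.0.0.1:%d/health" % port).read()

# 3) stop the service, then 4) analyze the gathered results
server.shutdown()
assert body == b"ok"
assert any("/health" in line for line in logs)
```

Platforms like ElasTest automate exactly these steps, plus log and metric collection, across many cooperating services rather than a single process.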
|