Inside HPC, GPU, AI: Must-see sessions at the Berlin Summit

Join the people building and operating open infrastructure at the OpenStack Summit Berlin in November. The Summit schedule features over 200 sessions organized by use cases including: artificial intelligence and machine learning, high performance computing, edge computing, network functions virtualization, container infrastructure and public, private and multi-cloud strategies.

Here we’re highlighting some of the sessions you’ll want to add to your schedule about HPC, GPU and AI. Check out all the sessions, workshops and lightning talks focusing on these three topics here.

The AI Thunderdome: Using OpenStack to accelerate AI training with Sahara, Spark and Swift

OpenStack lends itself well to big data problems, says Red Hat’s Sean Pryor. He’ll talk about how Swift and Ceph make data storage easier than ever. One of the most consequential problems in the big data space is using AI to make sense of ever-increasing data volumes. OpenStack makes this a solvable problem: data stored in Swift can be accessed by a Sahara cluster, which can use GPU instances to accelerate parallel AI hyperparameter tuning. This lets users spin huge AI training farms up and down with a fraction of the manual effort, and in the end, isn’t that what the cloud is all about? Details here.
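To make "parallel hyperparameter tuning" concrete, here is a minimal sketch in plain Python. It is not taken from the talk and does not use Sahara, Spark or Swift: the `train_and_score` function and its toy objective are hypothetical stand-ins for a real training run, and a thread pool stands in for a pool of GPU instances. It only illustrates why the workload parallelizes so well: every combination of settings can be evaluated independently.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def train_and_score(params):
    """Hypothetical stand-in for one training run.

    A real job would read training data (for example from object storage)
    and train a model. This toy objective simply peaks at
    lr=0.1, batch_size=64.
    """
    lr, batch_size = params["lr"], params["batch_size"]
    score = -((lr - 0.1) ** 2) - ((batch_size - 64) / 1000) ** 2
    return score, params


def grid(space):
    """Yield every combination of hyperparameter values in the space."""
    keys = sorted(space)
    for values in product(*(space[k] for k in keys)):
        yield dict(zip(keys, values))


def tune(space, workers=4):
    """Evaluate all combinations concurrently and return the best one."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(train_and_score, grid(space)))
    return max(results, key=lambda r: r[0])


# Assumed search space, chosen only for illustration.
SEARCH_SPACE = {"lr": [0.01, 0.1, 1.0], "batch_size": [32, 64, 128]}

best_score, best_params = tune(SEARCH_SPACE)
print(best_params)
```

Because the runs share nothing but their input data, adding workers shortens the search almost linearly, which is the property that makes on-demand training farms attractive.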

NASA Goddard Private Cloud: Genesis and lessons learned

In fall 2016, NASA Goddard’s NASA Center for Climate Simulation (NCCS) and the Information Technology and Communications Directorate (ITCD) began a collaboration to provide an on-premises private cloud to the entire Goddard community using hardware reclaimed from Discover, the NCCS’ traditional HPC cluster.

The Goddard Private Cloud (GPC) is on track for production availability in October 2018 running Queens. In the meantime, more than 30 projects (and growing!) are running in the prototype environment on Mitaka. This talk from NASA’s Mike Moore will describe the challenges encountered and the innovative solutions devised on this journey, including telemetry/billing, data protection/DR, security, “cloudifying” workloads, containers and guiding HPC users through the paradigm shift to cloud computing. Details here.

Monitoring-as-a-Service in the HPC Cloud

When applications move to the cloud, the first move is to recreate the same platform on software defined infrastructure. This falls short of the true potential of cloud. OpenStack infrastructure can offer so much more – once cloud users become aware of the powerful APIs and services available to them.

In this talk, Stig Telfer of StackHPC Ltd. and Darryl Weaver of Verne Global will describe how to take HPC cloud migration to the next level. They’ll demonstrate the integration of Monasca services for monitoring and logging for performance-focussed deployments. They’ll show how this unlocks best-of-breed performance telemetry for all users, and how this opens new opportunities for users and admins to understand and optimize their applications. Details here.

Cyborg: Accelerate your cloud

As data center workloads evolve to become increasingly compute-intensive, there is a growing need for accelerators. There are a wide variety of accelerators, spanning GPUs, FPGAs, ASICs, and workload-specific ones such as TPUs. The Cyborg project in OpenStack aims to ease the adoption and lifecycle management of these diverse accelerator types.

Cyborg and Nova developers have put together an architecture to enable offload to various accelerators, says Intel’s Sundar Nadathur. The architecture includes FPGAs, which have unique needs for programming and bitstream management. In this presentation, he’ll look at use cases for offloads to devices in general, programming models for FPGAs, and the representation of devices (including FPGAs) in Placement. Nadathur will take a close look at the scheduling of instances that need accelerators. He’ll detail the architecture of os-acc, a library for Nova compute to interact with Cyborg. Finally, he’ll present the current status of Cyborg development. Details here.
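As a rough illustration of what "representing devices in Placement" means for scheduling, the toy model below treats each host as a provider with an inventory of resource classes and a set of traits, then filters for hosts that can satisfy a request. This is an assumption-laden sketch, not Cyborg's or Placement's actual API: the `Provider` class, the `candidates` function, and names such as `CUSTOM_ACCELERATOR_FPGA` and `CUSTOM_FPGA_EXAMPLE_BITSTREAM` are all invented for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Provider:
    """Hypothetical in-memory stand-in for a resource provider."""
    name: str
    inventory: dict  # resource class name -> free units
    traits: set = field(default_factory=set)


def candidates(providers, resources, required_traits=frozenset()):
    """Return names of providers that can satisfy the whole request.

    A provider qualifies only if it has enough free units of every
    requested resource class and exposes every required trait.
    """
    return [
        p.name
        for p in providers
        if all(p.inventory.get(rc, 0) >= n for rc, n in resources.items())
        and set(required_traits) <= p.traits
    ]


# Invented example hosts; resource class and trait names are illustrative.
PROVIDERS = [
    Provider(
        "host-a",
        {"VCPU": 16, "CUSTOM_ACCELERATOR_FPGA": 1},
        {"CUSTOM_FPGA_EXAMPLE_BITSTREAM"},
    ),
    Provider("host-b", {"VCPU": 32}),
    Provider("host-c", {"VCPU": 4, "CUSTOM_ACCELERATOR_FPGA": 2}),
]

# An instance asking for 8 vCPUs and one FPGA.
request = {"VCPU": 8, "CUSTOM_ACCELERATOR_FPGA": 1}
print(candidates(PROVIDERS, request))
```

In this toy setup, host-b is excluded because it has no FPGA and host-c because it has too few vCPUs. Modeling accelerators as countable resources plus descriptive traits is what lets a scheduler treat them like any other capacity constraint.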

See you at the OSF Summit in Berlin, November 15-18 2018! Register here.

Franck V.

Tags: AI, Berlin Summit, GPU, HPC

Author

Superuser

Superuser Magazine is the Open Infrastructure Foundation's official online publication. It covers open infrastructure ecosystem news, case studies, event recaps, product updates and announcements, project releases, and more.