April 2021 event

A charitable, service-based non-profit 501(c)(3) organization (NPO) educating and connecting the High Performance Computing (HPC) user community to state-of-the-art technology for the purpose of optimizing business processes and advancing the workforce.

Our technology focus includes AI/Machine Learning, Data Science, Cloud Computing, and Visualization utilized in applications in Energy, Life Sciences, Manufacturing & Engineering, Financial Services, Academia, and Government.

The Society of HPC Professionals lunch and learn event

Lunch & Learn – April 2021

Constructing a Computationally Efficient Distributed GPU appliance in a PCI Express Based Network

Thursday, 22 April 2021
12:00pm – 1:00pm CDT
Live stream due to COVID-19

About the Event

In this presentation we construct a computationally efficient distributed GPU appliance utilizing GPUDirect RDMA and NVMe-oF in a PCI Express-based network. To be computationally efficient, a solution must address the issue of underutilized compute and storage elements, also known as stranded resources. A distributed GPU compute appliance needs the ability to balance CPU-to-GPU compute ratios, optimal GPU-to-GPU and GPU-to-storage communications, and ideally the ability to scale solutions across multiple GPU appliances. Modern feature-rich PCI Express switch chips provide the features required to solve this problem via virtual switch partitioning and PCI non-transparent bridging (NTB). Until now, what has been missing is the software to unlock these capabilities. GigaIO’s FabreX is a PCIe standards-based network that addresses these challenges and integrates easily into existing infrastructure.

GigaIO FabreX software and hardware are used to construct this solution. Measured test data will be presented to demonstrate the performance and efficiency benefits of FabreX. The paper also discusses upcoming software functionality that unlocks the full hardware capabilities of these modern PCI Express switch chips, including GPUDirect Storage, PCI multicast, and remote persistent memory. We end by covering the PCI Express roadmap and its relationship to CXL and Gen-Z.

About the Speaker

Scott Taylor
Fellow, GigaIO

Scott has an extensive background in high-speed networking, accelerators, and security from working at companies like Cray Research and Sun Microsystems. Leveraging this background, he created the FabreX software architecture supporting Redfish Composability Service, NVMe-oF, GPUDirect RDMA, accelerators, MPI, and TCP/IP, all with a single PCI-compliant interconnect. He has built the engineering team at GigaIO from the ground up to implement a singular vision of FabreX as an open-source, standards-based ecosystem.

Scott’s previous experience includes Prisa Networks, a Fibre Channel startup, where he helped drive the shift from arbitrated-loop to switch-based topologies. His many years working as an expert consultant help him drive key intellectual property development at GigaIO. Scott holds a BS in computer science from UC Santa Barbara.

REGISTRATION IS CLOSED

Members
FREE for Members and U of H Student Members

Non-Members
$15 for Non-Members
This is a good time to consider joining SHPCP: members attend all monthly events for free, get copies of presentations and videos (when approved by the presenter), and can participate in our annual Technology Meeting in December.
Plus, when we return to in-person events, members get a free lunch and the ability to live stream events.
Membership more than pays for itself if you attend just 4 events per year.
Click here to join now and attend for free

LOCATION:

Due to COVID-19, this month’s Lunch & Learn is a live stream event only.

For members, the link is on the Registration and Member Resources pages when you log in.

For non-members, the link will be provided after you register.