Chih-Chieh Yang is a research staff member in the Hybrid Cloud Infrastructure Research Department at the IBM T.J. Watson Research Center. He received both his B.S. and M.S. from National Tsing Hua University (NTHU) in Taiwan. He gained several years of industry experience as a software engineer at MediaTek, a leading Taiwanese IC design house, before pursuing and earning his Ph.D. from the University of Illinois at Urbana-Champaign (UIUC). He joined IBM after completing his Ph.D. His past research includes designing software components that facilitate the development of distributed applications, high-level parallel programming abstractions, scaling distributed machine learning applications to both state-of-the-art supercomputers and future extreme-scale HPC systems, and optimizing cloud control plane software design. He currently focuses on performance optimization and automation of large-scale Foundation Model deployments on HPC and cloud-based systems.
Computers have never been more important to the world. At IBM Research, we’re designing new systems that provide flexible, secure computing environments — from bits to neurons and qubits. We’re working on innovations in hybrid cloud infrastructure, operating systems, and software. Our goal is to create technologies that improve performance, security, and ease of use across hybrid and multi-cloud computing. We want to enable clients to dynamically compose best-of-breed services and applications freely and frictionlessly across distributed computing environments and accelerate data-driven innovations.
More: https://research.ibm.com/hybrid-cloud
The intern will participate in research projects related to system infrastructure for Foundation Models, with a focus on optimizing inference for extremely large (billions of parameters) models. These models are deployed in multi-node, multi-GPU environments, and the optimization goals are to increase throughput, decrease latency, and lower power consumption while maintaining the quality of inference results. The intern will work specifically on model quantization to understand its benefits (increased speed, reduced memory footprint, reduced power consumption) and how it interacts with other optimizations.
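As a rough illustration of what model quantization looks like in practice, the sketch below applies post-training dynamic quantization to a toy PyTorch model and compares the serialized sizes of the fp32 and int8 versions. The toy network, the layer sizes, and the use of PyTorch's dynamic quantization API are assumptions chosen for this example only; they are not the project's actual models or tooling.

# Illustrative sketch only: post-training dynamic quantization of a toy model.
# The network below is a placeholder; the project targets billion-parameter
# Foundation Models served on multi-node, multi-GPU systems.
import io

import torch
import torch.nn as nn

# A small stand-in network.
model = nn.Sequential(
    nn.Linear(4096, 4096),
    nn.ReLU(),
    nn.Linear(4096, 4096),
)
model.eval()

# Convert the weights of all Linear layers from fp32 to int8;
# activations are quantized dynamically at inference time.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

def serialized_size_mib(module: nn.Module) -> float:
    """Approximate a module's memory footprint by its serialized state_dict size."""
    buffer = io.BytesIO()
    torch.save(module.state_dict(), buffer)
    return buffer.tell() / 2**20

print(f"fp32 model: {serialized_size_mib(model):.1f} MiB")
print(f"int8 model: {serialized_size_mib(quantized):.1f} MiB")

# Inference with the quantized model (dynamic quantization runs on CPU).
with torch.no_grad():
    output = quantized(torch.randn(1, 4096))
print(output.shape)  # torch.Size([1, 4096])

The size comparison highlights the reduced memory footprint; measuring throughput, latency, power, and output quality on real deployments, and studying how quantization composes with other optimizations, is the substance of the internship project.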
From June 10 to September 1, 2024 (adjustable at the discretion of the organization)