Abstract

A new era of computing has begun with the development of high-performance computing (HPC), artificial intelligence (AI), machine learning (ML), and cognitive systems. Dramatic increases in the power density of electronic components have driven the design of efficient thermal management technologies for these systems. In 2018, IBM designed and delivered Summit and Sierra, the most powerful and fastest supercomputers in the world, with a peak computing performance of 200 petaflops as measured by the LINPACK benchmark. These systems, known as IBM POWER AC922, are both air and liquid cooled; the liquid-cooled systems use water to cool the high-power electronic components, including IBM POWER9 processors and NVIDIA graphics processing units (GPUs). In this paper, we give an overview of the thermal and mechanical design strategies applied to these systems. Testing and experimental analysis are presented and compared with computational modeling. Thermal control strategies are investigated to optimize overall system efficiency. For the air-cooled systems, we discuss the fan and heat sink designs, as well as the preheating effect on the PCIe section. For the liquid-cooled systems, which use a unique cold plate design to cool the processors and the GPUs with water, we examine the water flow path design for the central processing units (CPUs) and the GPUs, and the thermal performance of the cold plate. Cooling assembly components such as thermal interface materials (TIMs) and air baffles are also discussed. Unit and rack manifolds and the rear door heat exchanger (RDHx) are investigated. Water flow and pressure distributions at the node and rack levels are provided.
