## Abstract

Machine learning (ML)-based predictive techniques are used in conjunction with a game-theoretic approach to predict the thermal behavior of a power electronics package, and study the relative influence of encapsulation material properties and thermal management techniques in influencing hotspot temperatures. Parametric steady-state and transient thermal simulations are conducted for a commercially available 1.2 kV/444 A silicon carbide (SiC) half-bridge module. An extensive databank of 2592 (steady-state) and 1200 (transient) data points generated via numerical simulations is used to train and evaluate the performance of three ML algorithms (random forest, support vector regression, and neural network) in modeling the thermal behavior. The parameter space includes the thermal conductivities of the encapsulant, baseplate, heat sink, and cooling conditions deployed at the sink and covers a variety of materials and cooling scenarios. Excellent prediction accuracies with *R*^{2} values > 99.5% are obtained for the algorithms. Shapley additive explanations (SHAP) dependence plots are used to quantify the relative impact of device and heat sink parameters on junction temperatures. We observe that while heatsink cooling conditions significantly influence the steady-state junction temperature, their contribution in determining the junction temperature in dynamic mode is diminished. Using ML-SHAP models, we quantify the impact of emerging polymeric nanocomposites (with high conductivities and diffusivities) on hotspot temperature reduction, with the device operating in steady-state and transient modes. Overall, this study highlights the attractiveness of ML-based approaches for thermal design, and provides a framework for setting targets for future encapsulation materials.

## 1 Introduction and Motivation

The transition from silicon to wide bandgap-based semiconductors such as SiC and GaN has resulted in advanced power electronic devices with the capability to operate at faster switching speeds as well as higher temperatures and breakdown voltages. Alongside the replacement of Si-based packages with wide bandgap-based packages, devices have adopted smaller form factors over time, resulting in ever-increasing heat densities [1–4]. While heat fluxes >100 W/cm^{2} are common in microprocessors, semiconductor devices such as insulated gate bipolar transistors (IGBTs) and laser diode arrays can generate heat fluxes >1 kW/cm^{2}, resulting in localized hot spots [5,6]. Effective functioning of such devices then becomes strongly contingent on effective thermal management and packaging techniques. This has led to significant research and advances in the development of high-heat-flux thermal management solutions and new packaging materials.

The standard method for removing heat generated by a package involves attaching a heat sink to the package in a single or a dual-sided configuration [7,8], with various options for cooling media (air, water, refrigerants, etc.) and technique (single-phase or two-phase). Designing an effective thermal management solution requires a detailed understanding of the temperature distribution in a package. It is challenging to obtain analytical solutions to the three-dimensional Poisson's equation to arrive at the temperature map for a complex commercial package. Therefore, numerous studies have analyzed the influence of cooling technologies and package designs on the thermal performance using either a thermal resistance network analogy (Foster or Cauer network) or via 3D simulations [9–11]. Thermal resistance circuits can capture heat spreading within different layers, and can be used to model slightly complicated geometries without resorting to full-scale three-dimensional (3D) simulations. However, the prediction accuracies are often limited due to the underlying assumptions [12,13]. It is noted that since the thermal behavior cannot be modeled explicitly using analytical methods and without resorting to time-consuming numerical simulations, understanding the dependence of hotspot temperature or temperature gradients in the package on material properties and heat sink parameters presents significant challenges. While most studies focus on steady-state thermal performance, power electronics devices such as IGBTs, microprocessors used in portable devices, and high-power semiconductor laser diode arrays more commonly generate time-varying heat fluxes.

Although experiments and numerical simulations have been key in designing thermal management techniques, machine learning algorithms have also been used to predict thermal performance for 3D integrated circuits, single and multicore processors and cloud computing data centers [14–17]. Such algorithms serve as an alternative to first-order modeling techniques to predict the influence of various parameters on the thermal performance of the package. Notably, machine learning has been used to model boiling and condensation heat transfer performance and identify various boiling regimes [18–22]. This enhanced understanding has important applications for high heat flux thermal management of power electronics.

Advances in active cooling have been coupled with passive thermal solutions, which involve using advanced packaging materials to create thermally conductive pathways away from hotspots and enable heat spreading. Such materials need to satisfy multifunctional requirements depending upon the mode of operation of the package. In the static mode, where the power applied to the device doesn't change with time, effective functioning depends primarily on adequate heat dissipation and, therefore, on the thermal conductivity of the package materials. In contrast, dynamic-mode operations which involve temperature swings result in thermo-mechanical stresses at the wire-bond junction and solder-chip interface, which necessitate using materials that satisfy multifunctional electrical-thermal-mechanical requirements.

One specific area for recent research is the synthesis of high thermal conductivity polymeric nanocomposites for use as encapsulants and heat sink materials in power electronics packages. Such composites have a polymeric base matrix interspersed with high thermal conductivity filler materials such as boron nitride, silicon carbide, aluminum nitride etc. [23–25]. While traditional encapsulation materials have a thermal conductivity of ∼0.2 W/mK, many newly developed nanocomposites have reported thermal conductivity exceeding 20 W/mK, i.e., an enhancement by a factor of 100 [26]. The thermal conductivity of such nanocomposites depends strongly on the filler loading fraction, on the type, size, shape, orientation of the filler particles and on filler-matrix interfacial interactions etc. Many studies report nanocomposites with highly anisotropic properties [26]. This is due to two-dimensional shaped fillers such as sheets and rods that aid heat flow in the in-plane direction of the nanocomposite via thermally conductive pathways. Researchers have utilized active orientation techniques (using magnetic or electric fields) to align particles in the out-of-plane (OOP) direction to build thermal pathways. Such nanocomposites can be used as heat sink materials, when focused more on directing heat toward a cooler surface. Such polymer nanocomposite-based heat sinks have multiple advantages over their metal counterparts owing to their compatibility with well-developed fabrication processes (e.g., 3D printing), better heat dissipation to weight ratios and higher corrosion resistance [27].

While there are numerous studies on synthesis and thermal conductivity measurements of novel nanocomposites, it is crucial to quantify the improvement in thermal performance resulting from the use of these materials. Additionally, such a study can establish targets for future material development efforts. Consequently, we conducted a detailed analysis of the influence of using such high thermal conductivity polymeric nanocomposites as encapsulants and heat sink materials in conjunction with various cooling techniques. To focus the analysis toward real-life conditions, a commercially available SiC half-bridge power electronics module was simulated. This study utilized 3D steady-state and transient thermal simulations to generate an extensive databank (2592 data points for steady-state modeling and 1200 points for transient modeling) of junction temperatures obtained from parametric simulations in ansys. This data bank was then used to train and compare the predictive utility of three machine learning models: random forest (RF), support vector regression (SVR) and artificial neural networks (ANN) in estimating the maximum junction temperature of the package (which is the key parameter of interest).

Furthermore, the random forest algorithm was coupled with a game-theoretic approach called Shapley additive explanation (SHAP) model to quantify the relative contribution of input features toward increasing/decreasing hotspot junction temperatures of the device. An important objective of this effort was to estimate the target values of thermal conductivity and diffusivity to produce a meaningful reduction in hotspot temperatures. The obtained results were used to assess ongoing material development efforts and their implications on the thermal performance of power electronic (PE) packages. It is noted that a part of the present study (specifically, steady-state analysis) was discussed in another recent study [28]. Presently, we build up on the methodology developed in our past work [28] to provide a holistic assessment of the influence of various parameters on the steady-state as well as the transient performance of the power electronics package.

## 2 Half-Bridge Module Analyzed in this Study

Figure 1 shows a commercially available 1.2 kV/444 A SiC half-bridge module employed in our study. This module consists of seven MOSFETs and six diodes (per switch), is designed for ultrafast switching operations, and is rated to a maximum junction temperature of 175 °C. Such modules are used in high-efficiency converters/inverters, motor and traction drives, and smart-grid/grid-tied distributed generation applications. The module consists of an AlSiC baseplate and a Si_{3}N_{4} active metal-brazed substrate. The structural and material characteristics for this module (detailed in Ref. [29]) were extracted for all the layers starting from the SiC chip through the insulated power substrate to the AlSiC baseplate (detailed in Supplemental Material on the ASME Digital Collection). A computer-aided design model for the entire module was subsequently rendered in SolidWorks (Waltham, MA).

The module was covered with a 4 mm thick polymer encapsulant covering all wires and the MOSFETs/diodes. This encapsulant serves multiple purposes, such as environmental protection (moisture, contaminants), preventing chemical corrosion and mechanical damage. However, it adversely affects heat removal from the package owing to its low thermal conductivity (0.2 W/mK).

## 3 Description of Machine Learning Models Used in this Study

Machine learning (ML) algorithms are used to identify hidden patterns in vast amounts of data to predict performance of complex, nonlinear systems. Due to the ease in implementation and availability of vast amounts of data, ML algorithms are being used in fields such as biochemistry, finance, social media, transportation, geology, and sensors [30–33]. ML has also been recently employed to predict the thermal behavior of complex systems that are challenging to model from first principles [20,21,34–36].

Machine learning models are commonly characterized by the tradeoff between prediction accuracy and model interpretability. Prediction accuracy is determined by the model's flexibility, which determines the restrictiveness in the range of shapes that the model can produce to fit a particular dataset. Statistical regression-based techniques, being restricted by the range of equations that can be used to define complex nonlinear hyperplanes, exhibit very low prediction accuracies for highly nonlinear data sets. However, since such techniques yield tangible relationships between the input and output variables in the form of an equation, it is easy to visualize and map the fundamental relationships or dependencies, making such models highly interpretable. This facilitates understanding of the relative contributions of various input features to the target variable. Nonlinear methods, such as support vector machines or bagged regression trees, have very high prediction accuracies, albeit at the expense of reduced model interpretability, leading to their characterization as black-box methods.

Presently, we use three different predictive models: (a) random forest, (b) support vector regression, and (c) artificial neural networks to estimate the junction temperature of the half-bridge module as a function of material properties and heat sink parameters of the package. Additionally, we couple the random forest model with SHAP feature importance methods to map the relative influence of various parameters on the junction temperature. Our present approach makes the ML models interpretable while maintaining a high level of accuracy.

### 3.1 Random Forest.

Ensemble-based techniques combine multiple weak learning models to produce one robust model to improve performance by decreasing variance and bias using bagging (bootstrap aggregating) and boosting techniques. In the present study, we use decision tree-based ensemble methods owing to their ease of implementation and versatility. Figure 2, a typical decision tree divides a predictor space into several regions by employing certain rules at every split designed to optimize a specific objective function. After a decision tree is trained on a dataset, predictions can be obtained by following the branches of the tree while following the splitting rules at every node. The value of a prediction for a particular instance would then be the mean or mode of the trained observations in the final region/branch where it belongs (usually denoted as the terminal nodes or leaves of the tree).

When a single decision tree trained on a particular dataset is used for prediction, it can exhibit a very high variance, wherein the model performs poorly for data sets other than the ones used for training the model. In order to reduce variance and increase the robustness of the model, two powerful ML techniques—bagging or bootstrap aggregation, [37] and random feature selection [38] are implemented in a decision tree-based method. The fundamental idea is to fit multiple decision trees on several training data sets and average the resulting observations to decrease the overall variance of the machine learning model. To provide multiple training data sets from the available data, bootstrapping is carried out wherein the dataset is resampled multiple times with replacements to create many simulated samples that have the same size as the original dataset. Subsequently, *N* different trees are trained over *N* bootstrapped samples of data, and the results obtained from these trees are then averaged to obtain the final output. In random feature selection, only a tiny subset (*m*) of predictors is chosen out of all the predictors (*p*) every time a split decision is made in the internal node of a tree (*m* is typically chosen to be equal to $p$) , which decorrelates individual trees in the forest. Random forest combines both these techniques (bagging and random feature selection), to provide a very strong generalized prediction model.

### 3.2 Support Vector Regression.

*ε*-insensitive approach [39] using structure risk minimization principle to form a symmetrical flexible tube around an estimated regression function (illustrated in Fig. 3) in such a way that points inside the tube receive no penalty, whereas those lying outside the tube are penalized. To do so, given a set of training vectors {(

*x*

_{1},

*y*

_{1}), (

*x*

_{2},

*y*

_{2}) … (

*x*,

_{n}*y*)}, SVR algorithm generates a regression function of the following form:

_{n}*x*) = [

*ϕ*

_{1}(

*x*

_{1}),

*ϕ*

_{2}(

*x*

_{2}),…,

*ϕ*(

_{n}*x*)]

_{n}^{T}are the kernel functions used to map the input vectors

**x**to a higher-dimensional space,

*w*= [

*w*

_{1},

*w*

_{2},…,

*w*]

_{n}^{T}are the weight factors for each support vector and b is the function bias. The coefficients, w and b can be estimated by minimizing a regularized risk function of the following form:

*C*being a penalty parameter that determines the flexibility of the model and the tradeoff between the model smoothness and prediction error. The second term in the risk function is the

*ε*-insensitive loss function $L\epsilon (yi,f(xi))$ that imposes a penalty for points lying outside the

*ε*-tube. In some cases, where the prediction error for the training samples is greater than

*ε*, positive slack variables (

*ξ*and

*ξ**) are introduced to deal with these data points, and the above equation is transformed into the primal function

*R*(

*w*,

*ξ*,

*ξ**) with the optimization problem taking the following form:

where $\alpha i$ and $\alpha i\u0302$ are the Lagrange multipliers and *K* is the kernel function. The value of *K*(*x _{i}, x_{j}*) is equal to the inner product of two vectors

*x*and

_{i}*x*in the feature space

_{j}*ϕ*(

*x*) and

_{i}*ϕ*(

*x*). In the present analysis, radial basis function kernel, $K(xi,xj)=exp(\u2212\gamma \Vert xi\u2212xj\Vert 2)$, where the kernel parameter,

_{j}*γ*> 0 has been used.

### 3.3 Artificial Neural Networks.

Neural networks (inspired by the functioning of biological neural networks) are composed of tiny units called neurons arranged in a series of interconnected layers. Figure 4 outlines the basic architecture of a neural network. The first layer is called an input layer which receives the input data and introduces it into the network, whereas the last layer is called the output layer, which receives the processed information from all the layers to provide the final prediction. The layers between the input and output layers constitute the hidden layers that carry out most of the processing wherein the data received through the input layer is propagated through a series of weighted interconnections and activation functions at every neural junction. Every computational neuron in a neural network receives the sum of weighted values and a bias passed on from computational neurons of the previous layer. This value, upon passing through an activation function (depending upon the type of function: linear or nonlinear), becomes the output for that particular neuron to be passed on as a weighted value for a neuron in the subsequent layer. The output for a particular neural network can be estimated if the hyperparameters such as weights, biases and activation function are specified.

where *u _{j}* is the weighted sum at the

*j*th neuron,

*w*is the weight between the

_{ij}*i*th input signal and the

*j*th neuron,

*I*is the input variable and

_{i}*b*represents the bias.

_{j}*O*is the output and

_{j}*f*is the activation function. The information from this hidden layer then passes onto the subsequent layers until it reaches the output layer. Once the predictions are computed for the data points, the error values are determined between the predicted and the actual output values, following which the error is passed backward to the input layer to update the weights and biases using an optimization algorithm to minimize the prediction error. This is referred to as the backpropagation phase. Presently, we use neural networks with a rectified linear unit activation function and backpropagation network architecture, which is very commonly used when developing a multilayer perceptron to learn complex relationships in data.

## 4 Description of Simulation Methodology

A 3D model for the SiC half-bridge module was rendered in Solidworks (Fig. 5), which captured all the minute details related to dimensions and placement of the wire-bonds, MOSFETs, diodes, printed circuit board, and different layers comprising the PE package. This model was then imported to ansys to carry out parametric 3D steady-state and transient thermal simulations. Since there were a considerable number of test cases to evaluate (2592 and 1200 data points for steady-state and transient modeling, respectively) and the simulations were computationally expensive, they were carried out using a supercomputing cluster at the Texas Advanced Computing Center (TACC).

To investigate the influence of various active cooling technologies such as air and liquid (single and two-phase) cooling, a plate-fin type heat sink consisting of rectangular channels with cross-sectional dimensions: 2.55 × 21.88 mm and 1.1 mm thick fins was attached to the bottom of the baseplate; this heat sink is identical to the one chosen in a recent study [11]. To reduce the computational cost, the entire domain was divided into smaller sets of self-replicating, symmetrical entities. The simulations were conducted for one such sliced domain, as illustrated in Figs. 5 and 6.

Once the solid model was imported into ansys, convergence analysis was conducted to determine a suitable mesh resolution, wherein the maximum junction temperature was evaluated for varying mesh sizes. Subsequently, a high-resolution mesh consisting of 2.31 million elements (as illustrated in Fig. 6) was employed for carrying out the ansys simulations.

Variable parameters in the steady-state simulations included the thermal conductivity of the polymeric encapsulant (anisotropic), heat sink, and baseplate, as well as the heat transfer coefficient (HTC) and the temperature of the heat sink. Since the focus of this study was to evaluate the relative contribution of every input parameter value on junction temperatures, an extensive databank consisting of 2592 simulation test cases was developed. This databank includes all possible combinations of discrete parameter values as listed in Table 1.

T_{sink} (°C) | HTC (W/m^{2}K) | k_{sink} (W/mK) | k_{encxy}, k_{encz} (W/mK) | k_{baseplate} (W/mK) |
---|---|---|---|---|

10 | 500 | 20 | 0.2, 0.2 | 150 |

15 | 1000 | 50 | 2, 0.2 | 300 |

20 | 2000 | 100 | 20, 0.2 | |

3000 | 200 | 40, 0.2 | ||

4000 | 300 | 0.2, 2 | ||

5000 | 400 | 2, 2 | ||

20, 2 | ||||

40, 2 | ||||

0.2, 10 | ||||

2, 10 | ||||

20, 10 | ||||

40, 10 |

T_{sink} (°C) | HTC (W/m^{2}K) | k_{sink} (W/mK) | k_{encxy}, k_{encz} (W/mK) | k_{baseplate} (W/mK) |
---|---|---|---|---|

10 | 500 | 20 | 0.2, 0.2 | 150 |

15 | 1000 | 50 | 2, 0.2 | 300 |

20 | 2000 | 100 | 20, 0.2 | |

3000 | 200 | 40, 0.2 | ||

4000 | 300 | 0.2, 2 | ||

5000 | 400 | 2, 2 | ||

20, 2 | ||||

40, 2 | ||||

0.2, 10 | ||||

2, 10 | ||||

20, 10 | ||||

40, 10 |

The parameter values were varied to cover a broad spectrum of conditions. Thermal conductivities of 50, 200, and 400 W/mK represent stainless steel, aluminum, and copper heat sinks. Other values of heat sink thermal conductivities in Table 1 represent novel nanocomposite high thermal conductivity polymers. The range of HTCs covers air, single- and two-phase liquid cooling. Thermal conductivity of nanocomposite encapsulants can be highly anisotropic due to the dependence of thermal conductivity on the orientation and spacing of the filler materials embedded in the matrix. Therefore, this study considers anisotropic thermal conductivities for encapsulants, the range of values for which is listed in Table 1, and is based on a survey [26] of more than 250 studies (spanning the past 60 years) reporting in-plane (*k*_{encxy}) and OOP (*k*_{encz}) thermal conductivities. Based on the observations reported in this study [26], OOP thermal conductivities tend to be an order of magnitude lower than in-plane (IP) thermal conductivities. A maximum OOP and IP thermal conductivity of 11.1 W/mK [41] and 212.8 W/mK [42] have been reported in literature, both employing boron nitride-based fillers. However, commonly reported IP thermal conductivities lie in the range of 2–40 W/mK [26], which is the range we have considered in the present analysis.

Table 2 shows the parameter space for transient simulations. Thermal diffusivity is a key input variable as diffusivity is a more meaningful parameter than conductivity under transient conditions. The first three diffusivity values for heatsink materials in Table 2 represent reported values for polymer nanocomposites [26]. Diffusivities ranging between 0.1 and 0.5 × 10^{−5} m^{2}/s are commonly reported, whereas a diffusivity of 1 × 10^{−5} m^{2}/s is representative of best-in-class values [26]. The last two heatsink material diffusivities represent AlSiC and copper metal, respectively. Other variables (Table 2) include the thermal diffusivity of the baseplate, encapsulant, pulse width duration of an applied square power waveform and HTC. Pulse widths range from 0.05 s to 1 s, and these values are smaller than the time constant of the PE package. A databank of 1200 simulation test cases was developed for the transient simulations by considering all possible combinations of parameters outlined in Table 2.

α_{sink} × 10^{−5} (m^{2}/s) | α_{baseplate} × 10^{−5} (m^{2}/s) | α_{encapsulant} × 10^{−5} (m^{2}/s) | Pulse width (s) | HTC (W/m^{2}K) | T_{sink} (°C) |
---|---|---|---|---|---|

0.1 | 7.5 | 0.01 | 0.05 | 500 | 20 |

0.5 | 11.6 | 0.1 | 0.1 | 1000 | |

1 | 0.5 | 0.25 | 2000 | ||

7.5 | 1 | 0.5 | 3000 | ||

11.6 | 1 | 4000 | |||

5000 |

α_{sink} × 10^{−5} (m^{2}/s) | α_{baseplate} × 10^{−5} (m^{2}/s) | α_{encapsulant} × 10^{−5} (m^{2}/s) | Pulse width (s) | HTC (W/m^{2}K) | T_{sink} (°C) |
---|---|---|---|---|---|

0.1 | 7.5 | 0.01 | 0.05 | 500 | 20 |

0.5 | 11.6 | 0.1 | 0.1 | 1000 | |

1 | 0.5 | 0.25 | 2000 | ||

7.5 | 1 | 0.5 | 3000 | ||

11.6 | 1 | 4000 | |||

5000 |

The boundary conditions (for steady-state and transient simulations, outlined in Fig. SII available in the Supplemental Materials on the ASME Digital Collection) applied to the package are as follows: (i) natural convection (5 W/m^{2}K) at 25 °C applied at the external surfaces of the package, (ii) adiabatic condition at the side faces of the package, and (iii) forced convection HTC applied at *T*_{sink} at the internal surfaces of the heat sink. A steady total power (*P _{D}*) of 1500 W (corresponding to a heat flux of ∼400 W/cm

^{2}) and a square power waveform with a maximum value of 1500 W and a 50% duty cycle was dissipated equally in all 14 MOSFETs for the steady-state and transient simulations, respectively. It is noted that this is the maximum power rating specified in the module datasheet. This approach also enables calculation of junction to ambient thermal resistance $\Theta ja=Tj\u2212TambPD$, which is an important performance parameter for any electronics package.

## 5 Development of Machine Learning Models

The junction temperature values obtained from the simulations were used to train and evaluate the performance of three ML models: random forest, support vector regression and artificial neural networks. In order to carry out the ML analysis, the dataset was split into training and test dataset using train-test split function from scikit-learn library [43], wherein 85% of the dataset was employed for training the model and the remaining 15% was used for evaluating the accuracy and performance of the trained ML model. The performance of an ML model depends significantly on its hyperparameters, which can be used to tune the model's structure and complexity, and, therefore, its capacity to learn and infer trends from any data. In tree-based models, typical hyperparameters include number of trees to grow in the forest, number of features to be considered for each split, minimum number of data points to be allowed in a leaf node and depth of each tree. Hyperparameters for support vector regression include the type of kernel used and regularization (*C*) and gamma (*γ*) parameters. Hyperparameters for neural networks include the number of nodes and hidden layers and the weights and biases to be employed for every interconnection.

To tune the hyperparameters, an exhaustive grid search cross-validation with a two-step approach was used to carry out optimization. The first step consists of evaluating a suitable range of hyperparameter values by sampling the combinations at random, and the second step consists of carrying out a concentrated search in the narrowed-down parameter space to arrive at the combination yielding the best results [28]. For neural network architectures, the weights and biases were optimized using adaptive moment estimation (Adam) optimization algorithm, which utilizes momentum and adaptive learning rates to converge faster. To carry out grid search cross-validation, optimization and to implement the ML algorithms in Python, open-source ML libraries such as scikit-learn, tensorflow, and keras were used [43–45].

*R*

^{2 }%) and average absolute relative deviation percent (AARD %), as defined below:

## 6 Results and Discussions

### 6.1 Model Validation Using Derating Curve in Manufacturer's Datasheet.

In order to validate the ansys model, the derating curve provided in the manufacturer's datasheet was used. Derating curve provides the maximum power dissipation (*P _{D}*) that can be achieved in the module while maintaining a constant junction temperature of 175 °C for differing case temperatures (

*T*). To carry out this validation, derating curve points were generated using the developed ansys model by obtaining the power dissipation values that lead to a junction temperature of 175 °C for varying case temperatures. Figure 7 outlines the values obtained via the simulations along with the derating curve from the manufacturer's datasheet. An excellent agreement was observed with the maximum difference being less than 0.6% relative to the datasheet value.

_{C}### 6.2 Results of Steady-State Analysis

#### 6.2.1 Temperature Profile in Package.

Figure 8(a) shows a temperature profile in the half-bridge module (with the encapsulant hidden) obtained from the simulations. This particular case corresponds to a heat transfer coefficient of 5000 W/m^{2}K (liquid cooling), *T*_{sink} = 10 °C, a copper heat sink, an AlSiC baseplate (*k*_{bp} = 150 W/mK) and an encapsulant with in-plane and OOP thermal conductivity of 40 and 10 W/mK, respectively. Under these conditions, a maximum junction temperature of 98 °C is obtained (equivalent to a junction to ambient thermal resistance (*Θ _{ja}*) of 0.0587 °C/W), which is much below the maximum rated junction temperature of 175 °C. Figures 8(b)–8(d) depict the temperature profiles in successive layers of the package. It is seen that the maximum temperature in every layer decreases due to in-plane heat spreading as heat flows toward the heat sink, where it is absorbed by the flowing coolant. It is noted that since the MOSFETs are placed closer to one end of the package, the heat sources are not equidistant from the centerline of the device, which results in an asymmetrical temperature distribution along the longitudinal direction of the package.

#### 6.2.2 Results From Machine Learning-Based Modeling.

The data obtained from ANSYS simulations were first subjected to data analysis using pair plots, as displayed in Fig. 9. Pair plots are powerful visualization tools that display the distribution of single variables and the interrelationship between any two variables in the dataset. The scatter plot tiles in the upper triangle represent the correlation between variables corresponding to the vertical and horizontal axes. Histograms on diagonal plots represent the frequency distribution of a single variable, whereas plot tiles on the lower triangle represent Pearson correlation coefficients for the variables in the corresponding axes. For instance, the plot corresponding to tile number 1 × 2 shows the dependence of the maximum temperature (*y*-axis) on the thermal conductivity of the heat sink (*x*-axis), whereas the plot corresponding to tile number 2 × 1 displays the correlation coefficient for these variables. A negative correlation coefficient (−0.62 in this case) indicates negative dependence, wherein the maximum temperature decreases with increasing heat sink thermal conductivity. Based on the correlation coefficients, heat sink conductivity and the heat transfer coefficient have a higher impact on the maximum temperature than other variables. While such correlation coefficients provide information about the magnitude of the correlation as well as the direction of the association, it only provides a single number for the entire dataset. To provide a detailed overall picture, SHAP plots have been used (in Sec. 6.2.3) to highlight the marginal dependency of all individual points in a dataset.

Table 3 summarizes the *R*^{2} and AARD performance metrics for all three algorithms (ANN, RF and SVR) employed in this study. The performance of the databank on polynomial regression was also evaluated; however, it was seen to perform poorly (*R*^{2} = 92.14, as detailed in Supplemental Material on the ASME Digital Collection) owing to the highly nonlinear nature of the data. SVR and ANN models exhibit better prediction performance than RF with overall *R*^{2} values of 99.98% and 99.97%, respectively. The AARD values are the lowest for support vector regression-based predictions with training, test and overall values of 0.16, 0.32, and 0.18%, respectively.

Neural network | Random forest | Support vector regression | |||||||
---|---|---|---|---|---|---|---|---|---|

Train | Test | Total | Train | Test | Total | Train | Test | Total | |

R^{2} % | 99.96 | 99.98 | 99.97 | 99.67 | 99.43 | 99.60 | 99.99 | 99.96 | 99.98 |

AARD % | 0.46 | 0.44 | 0.46 | 1.82 | 2.51 | 1.92 | 0.16 | 0.32 | 0.18 |

Neural network | Random forest | Support vector regression | |||||||
---|---|---|---|---|---|---|---|---|---|

Train | Test | Total | Train | Test | Total | Train | Test | Total | |

R^{2} % | 99.96 | 99.98 | 99.97 | 99.67 | 99.43 | 99.60 | 99.99 | 99.96 | 99.98 |

AARD % | 0.46 | 0.44 | 0.46 | 1.82 | 2.51 | 1.92 | 0.16 | 0.32 | 0.18 |

Figure 10 shows the junction temperatures predicted by the ML algorithms versus those obtained via ansys simulations, with the dashed line being the identity line (*y* = *x*). The inlet graphs represent the percentile distribution of the AARD values for the corresponding algorithms. It is evident from Figs. 10(a) and 10(b) (in line with the metrics in Table 3) that SVR and ANN result in robust predictions, which is visually indicated by data points closely clustered around the identity line. Random forest algorithm tends to underpredict temperature values (Fig. 10(c)) at higher temperatures. This can be explained by a lack of data points in regions of higher temperature (as evidenced by the pair plot in Fig. 9) that lead to poor training and thereby poor prediction performance in those regions. While SVR provides better performance in terms of AARD metrics, it is interesting to note that for higher temperature regions, data points are more closely clustered to the identity line for neural network as compared to SVR.

It is known that as the prediction flexibility of a statistical model increases, its interpretability correspondingly decreases (also known as the flexibility-interpretability tradeoff). All the models using in the present analysis provide very good prediction performances, albeit at the cost of decreased model interpretabilities. Since the underlying relationship between the input and target variable can be difficult to discern using such approaches, they fall under the category of being black-box-based approaches. Therefore, it becomes difficult to delineate the relative contribution of various features toward the thermal performance using highly flexible models (such as the ones employed in our analysis) when compared to more interpretable statistical approaches such as polynomial regression. Supplement To overcome the drawback of interpretability of ML-based approaches, we couple the RF algorithm with SHAP tree explainer [46,47] to highlight the contribution of individual predictors on the junction temperature. SHAP has its origins in game theory and helps quantify the marginal contribution of every feature instance toward the prediction output of the model. The fundamental difference between SHAP feature importance and traditional permutation feature importance is that the latter is based on evaluating the increase in prediction error as feature values are permuted and provides a single value for the overall feature importance for all the instances. SHAP, on the other hand, is based on evaluating the marginal contributions and helps provide a bird's eye view on how every instance value contributes to the final prediction.

#### 6.2.3 Results From SHAP Analysis.

Figure 11 outlines the SHAP summary plot, denoting the importance of input features (in descending order of importance from top to bottom) in terms of their impact on junction temperature. Each data point on the plot represents a Shapley value for a particular instance and a feature used in the ML model. Datapoints having the same Shapley values for a feature have been jittered across the *y*-axis to highlight the density of points having the same values. Accordingly, the horizontal location or their SHAP values quantify the influence of individual instance values on junction temperatures, with values toward the right or left being associated with higher or lower marginal temperature prediction values. The color bar denotes the magnitude of the feature with red or blue dots, indicating higher or lower feature magnitudes (higher or lower thermal conductivity or heat transfer coefficient), respectively. Such SHAP plots provide much-needed information for thermal engineers on the variables or design parameters that could be tuned to create the maximum impact on the thermal performance of the device.

Shapley additive explanations values corresponding to different features for a particular instance are the numbers that, on adding sequentially to the average of all the temperature predictions, provide the temperature prediction for that set of feature instance values. It is, therefore, representative of marginal contribution (positive or negative) values, denoting how much the average of all temperature predictions would shift (to higher or lower regions), thereby indicating the magnitude as well as the direction of influence of every feature value. Based on the plot outlined in Fig. 11, thermal conductivities of the heat sink and the heat transfer coefficient (type of cooling technology employed) have the highest variation in SHAP values, and therefore have maximum impact on the junction temperature, in line with their Pearson coefficient values (outlined in Fig. 9). The range of SHAP values are relatively higher in the positive direction because the average temperature value for the steady-state databank is very low when compared to the overall range, as is evident in the histogram (tile 1 × 1 of the pair plot in Fig. 9). Lower sink thermal conductivities, corresponding to higher junction temperature values, would therefore need higher SHAP values to push the average temperature predictions to a higher magnitude. Additionally, high values for thermal conductivities of the heat sink, baseplate, encapsulant and heat transfer coefficient are associated with negative SHAP values, which implies that they negatively correlate with junction temperatures. This is expected as materials with higher thermal conductivity and two-phase cooling technologies with higher HTCs reduce the junction temperature by lowering the thermal resistance.

Thermal conductivity of the encapsulant exhibits the least spread in SHAP values and therefore has the least impact on junction temperatures. Thermal conductivities in the *x–y* (in-plane) direction have slightly more impact on the junction temperature variation when compared to thermal conductivity in the *z*-direction (normal to package surface). This can be attributed to the range of thermal conductivity values being higher in the in-plane direction (0.2–40 W/mK) when compared to the out-of-plane direction (0.2–10 W/mK), which is in line with the reported trends for polymeric nanocomposite encapsulants [26]. This implies that heat spreading plays a more dominant role than natural convection (occurring on top of the package) in dissipating heat away from the junction for the range of encapsulant conductivities considered in this study.

Based on the SHAP values for the dataset employed, a heat sink conductivity of at least 200 W/mK would be required for a SHAP value of about -20, representing a 20 °C reduction from the average temperature of 157 °C. Furthermore, keeping all the other parameters fixed (air-cooled copper heat sink at 10 °C with an AlSiC baseplate and *k*_{encz} = 0.2 W/mK), encapsulants with an in-plane thermal conductivity of 40 W/mK lead to a temperature reduction of 4.2 °C whereas those with an in-plane thermal conductivity of 0.2 W/mK (baseline case) lead to a temperature increase of 3.2 °C from the average temperature. This implies that using best-in-class nanocomposites as encapsulants lead to a 7.4 °C reduction in junction temperature when compared to baseline case (currently used) encapsulants. However, if we consider single-phase and two-phase liquid cooling, the improvement over the baseline case value reduces to 6.5 and 6.3 °C, respectively.

While single-digit temperature reductions with advanced encapsulants might appear trivial, it is essential to note that such reductions can still significantly increase the lifetime of these packages. An empirical and very approximate rule of thumb in the packaging industry is that the lifetimes are reduced by half for every 10 °C increase in temperature. Even more important are the benefits of high thermal conductivity encapsulants in reducing temperature gradients in the package. While this study did not evaluate temperature gradients, high thermal conductivities (especially in the in-plane direction) will minimize the severity of hot spots by lateral heat spreading. We also note that most emerging nanocomposites are still in their early stage of development [14]. Further R&D is needed in areas such as achieving a multifunctional suite of properties, scalability, and durability. Nevertheless, this analysis does show that encapsulants with a thermal conductivity enhancement > 100X of baseline materials (*k* = 0.2 W/mK) could provide meaningful benefits from a thermal management and lifetime standpoint. These benefits can be augmented if dual-sided cooling were incorporated to further reduce the thermal resistance from the top of the package. Importantly, the influence of better thermal properties of the encapsulant also depends on the package design. Alternate package designs could take greater advantage of nanoparticle-enhanced thermal properties of emerging encapsulants than existing packages.

### 6.3 Results of Transient Analysis

#### 6.3.1 Temperature Profile in Package.

Figure 12 provides snapshots (of the top of the package) of an illustrative temperature profile in the half-bridge module (with the encapsulant hidden) at different time intervals as obtained from ansys transient simulations. This particular case corresponds to a heat transfer coefficient of 5000 W/m^{2}K (liquid cooling), *T*_{sink} = 10 °C, a copper heat sink, an AlSiC baseplate, an encapsulant with diffusivity of 1 × 10^{−5} m^{2}/s, and a square wave input power profile with a pulse width of 1 s. When compared to the steady-state case, the hotspots are concentrated in a narrower region around the MOSFETs, which could be ascribed to the pulse width being lesser than the time constant of the package. Temperatures increase until the pulse powers down, as can be observed from succeeding snapshots at progressive time intervals. Since the device is operating in a transient state, pulse width is an important parameter that determines the maximum temperature and temperature swings in the package.

#### 6.3.2 Results From Machine-Learning-Based Analysis.

Figure 13 is a pair plot for the databank (1200 points) obtained from transient ansys simulations. While the relationship of *T*_{max} with the pulse width can be very clearly inferred from the scatter plot corresponding to tile no 1 × 6, understanding the dependence on material and cooling properties requires further analysis. The Pearson correlation coefficient values suggest that material properties and pulse width have a more significant influence on hotspot temperatures than the cooling technology employed. This has been investigated in detail using SHAP plots. For transient analysis, machine learning algorithms (RF, SVR, and ANN) were once again trained after deciding the optimal hyperparameters using grid search cross-validation, and their performance was tested using the procedure outlined in Sec. 5. Since the magnitudes of input features exhibited a significant orders of magnitude variation across their values (diffusivity values are seven to nine orders of magnitude smaller than HTC), they were normalized before conducting the analysis.

Table 4 summarizes the performance metrics for the ML algorithms trained and tested using the data obtained from transient ansys simulations. The prediction performance metrics of the ML algorithms for the transient database are slightly lower than that for steady-state data, which can be attributed either to a lower number of data points (1200 versus 2592) or a highly nonlinear nature of the transient data. ANN exhibits the best overall prediction accuracy with an overall *R*^{2 }% of 99.72, followed by RF and SVR with overall *R*^{2 }% values of 99.63 and 98.51, respectively.

Neural network | Random forest | Support vector regression | |||||||
---|---|---|---|---|---|---|---|---|---|

Train | Test | Total | Train | Test | Total | Train | Test | Total | |

R^{2} % | 99.9 | 98.61 | 99.72 | 99.76 | 98.8 | 99.63 | 98.54 | 98.39 | 98.51 |

AARD % | 0.44 | 1.46 | 0.59 | 0.66 | 1.45 | 0.78 | 1.29 | 1.64 | 1.35 |

Neural network | Random forest | Support vector regression | |||||||
---|---|---|---|---|---|---|---|---|---|

Train | Test | Total | Train | Test | Total | Train | Test | Total | |

R^{2} % | 99.9 | 98.61 | 99.72 | 99.76 | 98.8 | 99.63 | 98.54 | 98.39 | 98.51 |

AARD % | 0.44 | 1.46 | 0.59 | 0.66 | 1.45 | 0.78 | 1.29 | 1.64 | 1.35 |

Figure 14 compares the predicted (obtained via ML analysis) versus simulated temperatures (obtained via simulations) for the three algorithms trained on the transient data. A considerable deviation of the training data points from the identity line for SVR algorithm can be observed, especially at higher temperatures, which indicates inadequate training, thereby leading to lower performance metrics than the other two algorithms. However, the difference in test and training accuracies is the lowest for SVR, which implies that while the training accuracies are low, the variance is low as well when compared to ANN and RF, where there is a difference of at least 1% between the training and test accuracies. This implies that while ANN and RF models lead to better overall accuracy, they might be slightly overfitting the training data points, indicating a relatively higher variance. It is noted that we have selected the hyperparameters to reduce the variance and achieve an optimal bias-variance tradeoff to obtain accurate and generalized ANN and RF prediction models.

#### 6.3.3 Results From SHAP Analysis.

Figure 15 shows the SHAP plot for the variables considered in the transient analysis. While heatsink parameters such as HTC and thermal conductivity primarily dictated steady-state performance, material properties of the package have a relatively higher impact on transient thermal performance when compared to the cooling conditions at the heat sink (for the range of parameters considered in this study). Higher values of pulse width and diffusivities increase and decrease the maximum junction temperature, respectively, as evident from the SHAP plot. Pulse width has the maximum impact, which could be ascribed to its values being lesser than the thermal time constant of the PE package. If the pulse width exceeded the time constant, the device would approach steady-state, and the maximum temperature, in that case, would then be determined by a combined interplay between the steady-state parameters as outlined in the SHAP plot in Fig. 11. It is important to note that steady-state and transient performances are governed by different properties (thermal conductivity for steady-state versus diffusivity for transient state), and the dependence on this set of parameters is dictated by the pulse width or the operating frequency of the package. In short, thermal conductivity (representing thermal resistance) of the materials determines the maximum temperature (*T*_{max}), whereas diffusivities (representing thermal capacitances) would determine how quickly it asymptotes to *T*_{max}. Therefore, it is vital to determine the operating frequency and the thermal time constant of a PE package before tuning the material properties to maximize thermal performance.

While this analysis investigates the influence of pulse width on maximum temperature, the influence of pulse width on temperature swings and the resulting thermomechanical stress deserves a separate in-depth investigation. A holistic approach to designing power electronic packages requires identifying the influence of device parameters on hotspot temperature, temperature variations, thermomechanical stresses, and the resulting predicted lifetime of the package. SHAP plots provide a powerful tool to conduct a detailed analysis of the impact of various parameters on overall device performance.

In this particular case, using an encapsulant with the best in class diffusivity of 1 × 10^{−5} m^{2}/s (considering other device parameters as an air-cooled copper heat sink with an AlSiC base plate for a pulse width of 1 s) corresponds to a 4.6 °C marginal decrease in the maximum temperature whereas a diffusivity of 1 × 10^{−7} m^{2}/s (baseline case) corresponds to a 4.3 °C increase in the maximum temperature. This implies that using an encapsulant with a best-in-class diffusivity will lead to an 8.9 °C decrease in the maximum junction temperature of the package when compared to the baseline diffusivity. The diffusivity of conventional polymers (baseline case) are of the order of 1 × 10^{−7} m^{2}/s, and only a handful of nanocomposites having values exceeding 1 × 10^{−5} m^{2}/s [26] have been synthesized, which highlights opportunities for materials developers. In order to further enhance the thermal impact of these encapsulants, two approaches are suggested: (1) Develop materials with thermal diffusivity > 1 × 10^{−4} m^{2}/s; this is challenging and subject to material-related considerations. (2) Introduce double-sided cooling, which is a more practical approach.

## 7 Conclusions

The SHAP analysis shows that heat sink parameters (thermal conductivity of heat sink material and HTC) and encapsulant thermal conductivity have the maximum and minimum impact, respectively, on steady-state hotspot junction temperature. In contrast, cooling conditions deployed at the heat sink have the least impact on junction temperature under transient conditions, with the pulse width (operating frequency) significantly influencing the transient performance. Using SHAP analysis, we estimate that encapsulants with 100× the thermal conductivity and diffusivity of currently used encapsulants will reduce the maximum hotspot temperatures by 7.4 °C and 8.9 °C, respectively, for the device operating in static and dynamic modes. These numbers highlight the need for significant improvements in thermal properties by the materials development community.

In addition to the above findings, this study highlighted the utility of machine learning for design and material selection for a single package geometry. Natural extensions of the present work include considering geometry variations (of the package and microchannels in the heat sink) and including variable heat transfer coefficients (developing flow) in a single analysis. While these considerations add more complexities, the utility of machine learning increases with the complexity of the problem. While this study focuses on the utility of using ML+SHAP for investigating the benefits of improved encapsulants, this approach/tool can also be used for a holistic assessment to quantify the impact of a change in other parameters (cooling technology, thermal properties and the pulse width) on hotspot junction temperature. This would allow thermal designers to identify specific parameters that yield the largest temperature reduction, with the least amount of modification to existing parameters.

Another useful extension of this work would be to assess the influence of material properties on temperature swings and on the resulting thermo-mechanical stresses at the wire-bond and solder-chip junctions; such analyses can predict the expected life of the package. Such analyses would consider a more expansive material property space, which includes the coefficient of thermal expansion, Young's modulus (E), and Poisson's ratio (υ). Furthermore, while the present analysis considers uniform power generation in all MOSFETs, the presently developed methodology could be extended for nonuniform power generation scenarios by tagging the MOSFET locations and individual powers and considering them as input variables in the ML algorithm. Finally, the present study analyzes three algorithms, which are commonly used for predictive analysis. However, other ML algorithms such as gradient boosted regression trees, XGBoost (eXtreme Gradient Boosting) and extremely randomized trees, all of which are variants of ensemble tree-based algorithms, could also be used for similar predictions.

## Acknowledgment

The authors would like to acknowledge discussions and feedback from Bruce Geil, Aderinto Ogunniyi and Morris Berman from Army Research Laboratory. This work was supported by a cooperative agreement between Army Research Laboratory and UT Austin. Palash V. Acharya would like to acknowledge the University Graduate Continuing Fellowship from UT Austin. The authors also acknowledge the Texas Advanced Computing Center (TACC) at The University of Texas at Austin for providing high performance computing resources that have contributed to the research results reported within this paper.

## Funding Data

U.S. Army Research Laboratory (ARL) (Cooperative Agreement; Funder ID: 10.13039/100006754).

## Nomenclature

*b*=_{j}bias

*C*=penalty parameter

*f*=activation function

*I*=_{i}input variable

*k*=thermal conductivity

*O*=_{j}predicted output

*P*=_{D}power dissipation at MOSFET

*u*=_{j}weighted sum at the

*j*th neuron*w*=weight factor

*w*=_{ij}weight between the

*i*th input signal and the*j*th neuron*α*=thermal diffusivity

*α*=_{i}Lagrange multiplier

*γ*=kernel parameter

*ξ*=slack variable

*ϕ*=kernel function

## References

**144**(1), p. 010801.10.1115/1.4050002