2012 Freeman Scholar Lecture: Computational Fluid Dynamics on Graphics Processing Units

Vanka, S. P.

doi:10.1115/1.4023858

This paper discusses the issues involved in using graphics processing units (GPUs) to compute fluid flows. GPUs, used primarily for processing graphics functions in a computer, are massively parallel multicore processors that can also perform scientific computations in a data-parallel mode. In the past ten years, GPUs have become quite powerful and now challenge central processing units (CPUs) in price and performance. However, to benefit fully from the GPUs' performance, the numerical algorithms must be data parallel and must converge rapidly. In addition, the hardware features of GPUs require that memory access be managed carefully to avoid the penalty of high memory latency. Fully explicit algorithms for the Euler and Navier–Stokes equations and the lattice Boltzmann method for mesoscopic flows have been widely implemented on GPUs, with significant speed-up over a scalar algorithm. However, more complex algorithms with implicit formulations and unstructured grids require innovative thinking in data access and management. This article reviews the literature on linear solvers and computational fluid dynamics (CFD) algorithms on GPUs, including the author's own research on simulations of fluid flows using GPUs.

References

1.

Nickolls

,

J.

, and

Dally

,

W. J.

,

2010

, “

The GPU Computing Era

,”

IEEE MICRO

,

30

(

2

), pp.

56

–

69

.10.1109/MM.2010.41

Google Scholar

Crossref

2.

Fatahalian

,

K.

, and

Houston

,

M.

,

2008

, “

A Closer Look at GPUs

,”

Commun. ACM

,

51

(

10

), pp.

50

–

57

.10.1145/1400181.1400197

Google Scholar

Crossref

3.

Boyd

,

C.

,

2008

, “

Data Parallel Computing

,”

ACM Queue

,

6

(

2

), pp.

31

–

39

.10.1145/1365490.1365499

Google Scholar

Crossref

4.

Lindholm

,

E.

,

Nickolls

,

J.

,

Oberman

,

S.

, and

Montrym

,

J.

,

2008

, “

NVIDIA Tesla: A Unified Graphics and Computing Architecture

,”

IEEE MICRO

,

28

(

2

), pp.

39

–

55

.10.1109/MM.2008.31

Google Scholar

Crossref

5.

NVIDIA Corporation, http://www.nvidia.com/page/home.htm

6.

“

Products & Technologies

,” AMD, http:/www.amd.com/us/products

7.

Patankar

,

S. V.

,

1980

,

Numerical Heat Transfer and Fluid Flow

,

McGraw Hill

,

New York

.

8.

Fletcher

,

C. A. J.

,

1991

,

Computational Techniques for Fluid Dynamics

,

Springer

,

Berlin

.

9.

Anderson

,

D. A.

,

Tannehill

,

J. C.

, and

Pletcher

,

R. H.

,

1984

,

Computational Fluid Mechanics and Heat Transfer

,

Hemisphere

,

New York

.

10.

Ferziger

,

J. H.

, and

Peric

,

M.

,

2002

,

Computational Methods for Fluid Dynamics

, 3rd ed.,

Springer Verlag

,

Berlin

.

11.

ANSYS Fluent

, http://www.ansys.com

12.

“

CFD and CAE Products – CD-adapco

,” CD-adapco, http://www.cd-adapco.com/products/

13.

“

COMSOL Multiphysics Engineering Simulation Software

,” COMSOL, http://www.comsol.com/products/multiphysics/

14.

“

ESI Group – Fluid Dynamics

,” ESI, http://www.esi-group.com/products/Fluid-Dynamics

15.

Metacomp Technologies

, http://www.metacomptech.com/

16.

Pope

,

S. B.

,

2000

,

Turbulent Flows

,

Cambridge University

,

Cambridge, England

.

17.

Gorder

,

P. F.

,

2007

, “

Multicore Processors for Science and Engineering

,”

Comput. Sci. Eng.

,

9

(

2

), pp.

3

–

7

10.1109/MCSE.2007.35.

Google Scholar

Crossref

18.

Geer

,

D.

,

2005

, “

Chip Makers Turn to Multicore Processors

,”

Computer

,

38

(

5

), pp.

11

–

13

.10.1109/MC.2005.160

Google Scholar

Crossref

19.

Owens

,

J. D.

,

Houston

,

M.

,

Luebke

,

D.

,

Green

,

S.

,

Stone

,

J. E.

, and

Phillips

,

J. C.

,

2008

, “

GPU Computing

,”

Proc. IEEE

,

96

(

5

), pp.

879

–

899

.10.1109/JPROC.2008.917757

Google Scholar

Crossref

20.

Kirk

,

D. B.

, and

Hwu

,

W. W.

,

2010

,

Programming Massively Parallel Processors: A Hands-On Approach

(Applications of GPU Computing Series),

Morgan Kaufman, Burlington, MA

.

21.

Liu

,

G. R.

, and

Liu

,

M. B.

,

2003

,

Smoothed Particle Hydrodynamics: A Meshfree Particle Method

,

World Scientific

,

Singapore

.

22.

Succi

,

S.

,

2001

,

The Lattice Boltzmann Equation for Fluid Dynamics and Beyond

,

Oxford University

,

New York

.

23.

Bird

,

G. A.

,

1994

,

Molecular Gas Dynamics and the Direct Simulation of Gas Flows

,

Oxford University

,

New York

.

24.

“

Parallel Programming and Computing Platform: CUDA

,” NVIDIA, http://www.nvidia.com/object/cuda_home_new.html

25.

Nickolls

,

J.

,

Buck

,

I.

,

Garland

,

M.

, and

Skadron

,

K.

,

2008

, “

Scalable Parallel Programming With CUDA

,”

ACM Queue

,

6

(

2

), pp.

41

–

53

.10.1145/1365490.1365500

Google Scholar

Crossref

26.

Halfhill

,

T. R.

,

2008

, “

Parallel Processing With CUDA

,”

Microprocessor Rep.

,

Jan. 28

,

2008

.

27.

Sanders

,

J.

, and

Kandrot

,

E.

,

2011

,

CUDA by Example: An Introduction to General-Purpose GPU Programming

,

Addison-Wesley

,

New Jersey

.

28.

Cook

,

S.

,

2011

,

CUDA Programming: A Developer's Guide to Parallel Computing With GPUs

,

Morgan Kaufmann, Burlington, MA

.

29.

Farber

,

R.

,

2011

,

CUDA Application Design and Development

,

Elsevier

,

New York

.

30.

Tsuchiyama

,

R.

,

Nakamura

,

T.

,

Iizuka

,

T.

,

Asahara

,

A.

,

Son

,

J.

, and

Miki

,

S.

,

2012

,

The OpenCL Programming Book

,

Fixstars Corporation, Japan

.

31.

PGI CUDA FORTRAN Compiler

, The Portland Group, http://www.pgroup.com/resources/accel_files/index.htm

32.

Harlow

,

F. H.

, and

Welch

,

J. E.

,

1965

, “

Numerical Calculation of Time-Dependent Viscous Incompressible Flow of Fluid With a Free Surface

,”

Phys. Fluids

,

8

(

12

), pp.

2182

–

2189

.10.1063/1.1761178

Google Scholar

Crossref

33.

Hockney

,

R. W.

, and

Jesshope

,

C. R.

,

1981

,

Parallel Computers

,

Adam Hilger

,

Bristol

, UK.

34.

Greenbaum

,

A.

,

1997

,

Iterative Methods for Solving Linear Systems

,

SIAM

,

Philadelphia

.

35.

Saad

,

Y.

,

2003

,

Iterative Methods for Sparse Linear Systems

,

SIAM

,

Philadelphia

.

36.

Hockney

,

R. W.

,

1965

, “

A Fast Direct Solution of Poisson's Equation Using Fourier's Analysis

,”

J. ACM

,

12

(

1

), pp.

95

–

113

.10.1145/321250.321259

Google Scholar

Crossref

37.

Allmann

,

S.

,

Rauber

,

T.

, and

Runger

,

G.

,

2001

, “

Cyclic Reduction on Distributed Shared Memory Machines

,” Euromicro Conference on Parallel Distributed and Networked-Based Processing,

IEEE

Computer Society, pp.

290

–

297

.10.1109/EMPDP.2001.905055

38.

Lambiotte

,

J. J.

, and

Voigt

,

R. G.

,

1975

, “

The Solution of Tridiagonal Linear Systems on the CDC STAR-100 Computer

,”

ACM Trans. Math. Softw.

,

1

(

4

), pp.

308

–

329

.10.1145/355656.355658

Google Scholar

Crossref

39.

Muller

,

S. M.

, and

Sheerer

,

D.

,

1991

, “

A Method to Parallelize Tridiagonal Solvers

,”

Parallel Comput.

,

17

, pp.

181

–

188

.10.1016/S0167-8191(05)80104-8

Google Scholar

Crossref

40.

Stone

,

H. S.

,

1973

, “

An Efficient Parallel Algorithm for the Solution of a Tridiagonal Linear System of Equations

,”

J. ACM

,

20

(

1

), pp.

27

–

38

.10.1145/321738.321741

Google Scholar

Crossref

41.

Ho

,

C. T.

, and

Johnson

,

S. L.

,

1990

, “

Optimizing Tridiagonal Solvers for Alternating Direction Methods on Boolean Cube Multiprocessors

,”

SIAM (Soc. Ind. Appl. Math.) J. Sci. Stat. Comput.

,

11

(

3

), pp.

563

–

592

.10.1137/0911032

Google Scholar

Crossref

42.

Egecioglu

,

O.

,

Koc

,

C. K.

, and

Laub

,

A. J.

,

1989

, “

A Recursive Doubling Algorithm for Solution of Tridiagonal Systems on Hypercube Multiprocessors

,”

J. Comput. Appl. Math.

,

27

, pp.

95

–

108

.10.1016/0377-0427(89)90362-2

Google Scholar

Crossref

43.

Zhang

,

Y.

,

Cohen

,

J.

, and

Owens

,

J. D.

,

2010

, “

Fast Tridiagonal Solvers on the GPU

,” Proceedings of the 15th

ACM

SIGPLAN Symposium on the Principles and Practice of Parallel Programming, pp.

127

–

136

.10.1145/1693453.1693472

44.

Davidson

,

A.

,

Zhang

,

Y.

, and

Owens

,

J. D.

,

2011

, “

An Auto-Tuned Method for Solving Large Tridiagonal Systems on the GPU

,” Proceedings of the 2011

IEEE

International Parallel & Distributed Processing Symposium, pp.

956

–

965

.10.1109/IPDPS.2011.92

45.

Egloff

,

D.

,

2010

, “

High Performance Finite Difference PDE Solvers on GPUs

,” QuantAlea GmbH Technical Report.

46.

Sakharmykh

,

N.

,

2010

, “

Efficient Tridiagonal Solvers for ADI Methods and Fluid Simulation

,”

NVIDIA GPU Technology Conference

.

47.

Goddeke

,

D.

, and

Strzodka

,

R.

,

2011

, “

Cyclic Reduction Tridiagonal Solvers on GPUs Applied to Mixed Precision Multigrid

,”

IEEE Trans. Parallel Distrib. Syst.

,

22

(

1

), pp.

22

–

32

.10.1109/TPDS.2010.61

Google Scholar

Crossref

48.

Bolz

,

J.

,

Farmer

,

I.

,

Grinspun

,

E.

, and

Schroder

,

P.

,

2003

, “

Sparse Matrix Solvers on the GPU: Conjugate Gradients and Multigrid

,”

ACM Trans. Graphics

,

22

(

3

), pp.

917

–

924

.10.1145/882262.882364

Google Scholar

Crossref

49.

Goodnight

,

N.

,

Woolley

,

C.

,

Lewin

,

G.

,

Luebke

,

D.

, and

Humphreys

,

G.

,

2003

, “

A Multigrid Solver for Boundary Value Problems Using Programmable Graphics Hardware

,”

SIGGRAPH/EUROGRAPHICS Conference on Graphics Hardware

, pp.

1

–

11

.

50.

Bahi

,

J. M.

,

Couturier

,

R.

, and

Khodja

,

L. Z.

,

2011

, “

Parallel Sparse Linear Solver GMRES for GPU Clusters With Compression of Exchanged Data

,”

Lect. Notes Comput. Sci.

,

7155

, pp.

471

–

480

.10.1007/978-3-642-29737-3

Google Scholar

Crossref

51.

Amador

,

G.

, and

Gomes

,

A.

,

2009

, “

Linear Solvers for Stable Fluids: GPU vs CPU

,”

Proceedings of the 17th Encontro Português de Computação Gráfica (EPCG’09)

, pp.

145

–

153

.

52.

Gaikwad

,

A.

, and

Toke

,

I. M.

,

2010

, “

Parallel Iterative Linear Solvers on GPU: A Financial Engineering Case

,” Proceedings of the 2010 18th Euromicro Conference on Parallel, Distributed and Network-Based Processing

(PDP)

, pp.

607

–

614

10.1109/PDP.2010.55.

53.

Li

,

R.

, and

Saad

,

Y.

,

2013

, “

GPU-Accelerated Preconditioned Iterative Linear Solvers

,”

J. Supercomput.

,

63

, pp.

443

–

466

.10.1007/s11227-012-0825-3

Google Scholar

Crossref

54.

Jost

,

T.

,

Contassot-Vivier

,

S.

, and

Vialle

,

S.

,

2010

, “

An Efficient Multi-Algorithms Sparse Linear Solver for GPUs

,” Parallel Computing: From Multicores and GPU's to Petascale, Vol.

19

,

IOS Press

, Amsterdam, The Netherlands, pp.

546

–

553

10.3233/978-1-60750-530-3-546.

55.

Haase

,

G.

,

Liebmann

,

M.

,

Douglas

,

C. C.

, and

Plank

,

G.

,

2010

, “

A Parallel Algebraic Multigrid Solver on Graphics Processing Units

,”

Lect. Notes Comput. Sci.

,

5938

, pp.

38

–

47

.10.1007/978-3-642-11842-5

Google Scholar

Crossref

56.

Wiggers

,

W. A.

,

Bakker

,

V.

,

Kokkeler

,

A. B. J.

, and

Smit

,

G. J. M.

,

2007

, “

Implementing the Conjugate Gradient Algorithm on Multi-Core Systems

,” International Symposium on System-on-Chip

(ISSOC)

, Tampere, Finland, Nov. 19–21, pp.

1

–

4

10.1109/ISSOC.2007.4427436.

57.

Cevahir

,

A.

,

Nukada

,

A.

, and

Matsuoka

,

S.

,

2009

, “

Fast Conjugate Gradients With Multiple GPUs

,” International Conference on Computational Sciences (

ICCS

), Vol. 5544,

Springer

,

New York

, pp.

893

–

903

10.1007/978-3-642-01970-8_90.

58.

Liu

,

X.

,

Liu

,

Z.

,

Tan

,

S. X.-D.

, and

Gordon

,

J.

,

2012

, “

Full-Chip Thermal Analysis of 3D ICs With Liquid Cooling by GPU-Accelerated GMRES Method

,”

ISQED

(2012), pp.

123

–

128

10.1109/ISQED.2012.6187484.

59.

Heuveline

,

V.

,

Lukarski

,

D.

, and

Weiss

,

J. P.

,

2012

, “

Fine-Grained Parallel Preconditioners for Fast GPU-Based Solvers

,”

NVIDIA GPU Technology Conference

,

San Jose, CA

,

May

.

60.

Kruger

,

J.

, and

Westermann

,

R.

,

2003

, “

Linear Algebra Operators for GPU Implementation of Numerical Algorithms,”

ACM Trans. Graphics

,

22

(

3

), pp.

908

–

913

.10.1145/882262.882363

Google Scholar

Crossref

61.

Williams

,

S.

,

Vuduc

,

R.

,

Oliker

,

L.

,

Shalf

,

J.

,

Yelick

,

K.

, and

Demmel

,

J.

,

2009

, “

Optimizing Sparse Matrix-Vector Multiply on Emerging Multicore Platforms

,”

Parallel Comput.

,

35

(

3

), pp.

178

–

194

.10.1016/j.parco.2008.12.006

Google Scholar

Crossref

62.

Williams

,

S.

,

Bell

,

N.

,

Choi

,

J.

,

Garland

,

M.

,

Oliker

,

L.

, and

Vu

,

R.

,

2010

, “

Sparse Matrix-Vector Multiplication on Multicore and Accelerators

,” Scientific Computing With Multicore and Accelerators,

CRC Press

, Boca Raton, FL.10.1201/b10376-8

63.

Bell

,

N.

, and

Garland

,

M.

,

2008

, “

Efficient Sparse Matrix-Vector Multiplication on CUDA

,” NVIDIA Technical Report No. NVR 2008-004.

64.

Baskaran

,

M.

, and

Bordawekar

,

R.

,

2008

, “

Optimizing Sparse Matrix-Vector Multiplications on GPUs

,” IBM Technical Report No. RC 24704.

65.

Buatois

,

L.

,

Caumon

,

G.

, and

Levy

,

B.

,

2009

, “

Concurrent Number Cruncher – GPU Implementation of a General Sparse Linear Solver

,”

Int. J. Parallel, Emergent, Distrib. Syst.

,

24

(

3

), pp.

205

–

223

.10.1080/17445760802337010

Google Scholar

Crossref

66.

Tomov

,

S.

,

Nath

,

R.

,

Ltaief

,

H.

, and

Dongarra

,

J.

,

2010

, “

Dense Linear Algebra Solvers for Multicore With GPU Accelerators

,”

IEEE

International Symposium on Parallel & Distributed Processing, pp.

1

–

8

.10.1109/IPDPSW.2010.5470941

67.

Weber

,

P.

,

Du

,

R.

,

Luszczek

,

P.

,

Tomov

,

S.

,

Peterson

,

G.

, and

Dongarra

,

J.

,

2012

, “

From CUDA to OpenCL: Towards a Performance-Portable Solution for Multi-Platform GPU Programming

,”

Parallel Comput.

,

38

(

8

), pp.

391

–

407

.10.1016/j.parco.2011.10.002

Google Scholar

Crossref

68.

Buttari

,

A.

,

Langon

,

J.

,

Kurzak

,

J.

, and

Dongarra

,

J.

,

2009

, “

A Class of Parallel Tiled Linear Algebra Algorithms for Multicore Architectures

,”

Parallel Comput.

,

35

(

1

), pp.

38

–

53

.10.1016/j.parco.2008.10.002

Google Scholar

Crossref

69.

“

GPGPU.org: General-Purpose Computation on Graphics Processing Units

,” GPGPU, http://www.gpgpu.org

70.

Humphrey

,

J. R.

,

Price

,

D. K.

,

Spagnoli

,

K. E.

,

Paolini

,

A. L.

, and

Kelmelis

,

E. J.

,

2010

, “

CULA: Hybrid GPU Accelerated Linear Algebra Routines

,”

Proc. SPIE

,

7705

, p.

770502

.10.1117/12.850538

71.

Volkov

,

V.

, and

Demmel

,

J. W.

,

2008

, “

Benchmarking GPUs to Tune Dense Linear Algebra

,”

Proc. 2008 ACM/IEEE Conference on Supercomputing

, pp.

31

–

41

.

72.

Vuduc

,

R.

,

Chandramowlishwaran

,

A.

,

Choi

,

J.

,

Guney

,

M.

, and

Shringarpure

,

A.

,

2010

, “

On the Limits of GPU Acceleration

,”

Proc. USENIX Wkshp. Hot Topics in Parallelism (HotPar)

,

Berkeley, CA

,

June

.

73.

Agarwal

,

R. K.

,

1989

, “

Development of a Navier-Stokes Code on a Connection Machine

,” Proc. of the 9th AIAA Computational Fluid Dynamics Conference, Buffalo, NY, June,

AIAA

, Paper No. 89-1938, pp.

103

–

108

.10.2514/6.1989-1938

74.

Agarwal

,

R. K.

, and

Lewis

,

J. C.

,

1992

, “

Computational Fluid Dynamics on Parallel Processors

,”

Comput. Syst. Eng.

,

3

(

1–4

), pp.

251

–

259

.10.1016/0956-0521(92)90110-5

Google Scholar

Crossref

75.

Levit

,

C.

, and

Jespersen

,

D.

,

1988

, “

Explicit and Implicit Solution of Navier-Stokes Equations on a Massively Parallel Computer

,”

Comput. Struct.

,

30

(

1–2

), pp.

385

–

393

.10.1016/0045-7949(88)90244-1

Google Scholar

Crossref

76.

Robichaux

,

J.

,

Tafti

,

D. K.

, and

Vanka

,

S. P.

,

1992

, “

Large-Eddy Simulations of Turbulence on the CM-2

,”

Numer. Heat Transfer, Part B

,

21

(

3

), pp.

367

–

388

.10.1080/10407799208944910

Google Scholar

Crossref

77.

Wang

,

G.

,

1996

, “

Large Eddy Simulations of Bluff-Body Wakes on Parallel Computers

,” Ph.D. thesis,

University of Illinois at Urbana

,

Champaign, IL

.

78.

Kass

,

M.

, and

Miller

,

G.

,

1990

, “

Rapid, Stable Fluid Dynamics for Computer Graphics

,” Computer Graphics (Proc. of

SIGGRAPH

90), pp.

49

–

57

.10.1145/97880.97884

79.

Stam

,

J.

,

1999

, “

Stable Fluids

,” Proc. 26th Annual Conference on Computer Graphics and Interactive Techniques (

SIGGRAPH

), pp.

121

–

128

.10.1145/311535.311548

80.

Stam

,

J.

,

2001

, “

A Simple Fluid Solver Based on FFT

,”

J. Graph Tools

,

6

(

2

), pp.

43

–

52

.10.1080/10867651.2001.10487540

Google Scholar

Crossref

81.

Harris

,

M.

,

2004

, “

Fast Fluid Dynamics Simulation on the GPU

,”

GPU Gems

,

Pearson Education

,

Boston

, MA, pp.

637

–

665

.

82.

Amador

,

G.

, and

Gomes

,

A.

,

2010

, “

CUDA-Based Linear Solvers for Stable Fluids

,” International Conference on Top of Form Information Science and Applications (

ICISA

),

Apr. 21–23

.10.1109/ICISA.2010.5480268

83.

Crane

,

K.

,

Llamas

,

I.

, and

Tariq

,

S.

,

2007

, “

Real-Time Simulation and Rendering of 3D Fluids

,”

GPU Gems

, Vol.

3

,

Pearson Education

,

Boston

, MA, pp.

633

–

675

.

84.

Scheidegger

,

C. E.

,

Comba

,

J. L. D.

, and

da Cunha

,

R. D.

,

2005

, “

Practical CFD Simulations on Programmable Graphics Hardware Using SMAC

,”

Comput. Graph. Forum

,

24

, pp.

715, 728

.10.1111/j.1467-8659.2005.00897.x

Google Scholar

Crossref

85.

Comba

,

J. L. D.

,

Dietrich

,

C.

,

Pagot

,

C.

, and

Scheidegger

,

C. E.

,

2003

, “

Computations on GPUs: From a Programmable Pipeline to an Efficient Stream Processor

,”

Rev. Inf. Teór. Appl.

,

10

, pp.

41

–

70

.

86.

Goddeke

,

D.

,

Strzodka

,

R.

, and

Turek

,

S.

,

2007

, “

Performance and Accuracy of Hardware-Oriented Native Emulated and Mixed-Precision Solvers in FEM Simulations

,”

Int. J. Parallel Emergent Distrib. Syst.

,

22

, pp.

221

–

256

.10.1080/17445760601122076

Google Scholar

Crossref

87.

Goddeke

,

D.

,

Strzodka

,

R.

,

Mohd-Yusof

,

J.

,

McCormick

,

P.

,

Wobker

,

H.

,

Becker

,

C.

, and

Turek

,

S.

,

2008

, “

Using GPUs to Improve Multigrid Solver Performance on a Cluster

,”

Int. J. CSE

,

4

(

1

), pp.

36

–

55

.10.1504/IJCSE.2008.021111

88.

Hagen

,

T.

,

Lie

,

K.

, and

Natvig

,

J.

,

2006

, “

Solving the Euler Equations on Graphics Processing Units

,”

Comput. Sci. (ICCS)

,

3994

, pp.

220

–

227

.10.1007/11758549_34

89.

Hagen

,

T. R.

,

Hjelmervik

,

J. M.

,

Lie

,

K. A.

,

Natvig

,

J. R.

, and

Henriksen

,

M. O.

,

2005

, “

Visual Simulation of Shallow Water Waves

,”

Simul. Model Pract. Theory

,

13

, pp.

716

–

726

.10.1016/j.simpat.2005.08.006

Google Scholar

Crossref

90.

Brodtkorb

,

A.

,

Hagen

,

T. R.

,

Lie

,

K. A.

, and

Natvig

,

J. R.

,

2010

, “

Simulation and Visualization of the Saint-Venant System Using GPUs

,”

Comput. Visualization Sci.

,

13

, pp.

341

–

353

.10.1007/s00791-010-0149-x

Google Scholar

Crossref

91.

Brodtkorb

,

A.

, and

Hagen

,

T. R.

,

2010

, “

A Comparison of Three Commodity-Level Parallel Architectures: Multi-Core CPU, Cell BE and GPU

,”

MMCS

2008, Vol. 5862, pp.

70

–

80

.10.1007/978-3-642-11620-9_6

92.

Elsen

,

E.

,

LeGresley

,

P.

, and

Darve

,

E.

,

2008

, “

Large Calculation of the Flow Over a Hypersonic Vehicle Using a GPU

,”

J. Comput. Phys.

,

227

(

24

), pp.

10148

–

10161

.10.1016/j.jcp.2008.08.023

Google Scholar

Crossref

93.

Buck

,

I.

,

Foley

,

T.

,

Horn

,

D.

,

Sugerman

,

J.

,

Fatahalian

,

K.

,

Houston

,

M.

, and

Hanrahan

,

P.

,

2003

, “

Brook for GPUs: Stream Computing on Graphics Hardware

,”

ACM Trans.

,

23

(

3

), pp.

777

–

786

.10.1145/1015706.1015800

Google Scholar

Crossref

94.

Brandvik

,

T.

, and

Pullan

,

G.

,

2008

, “

Acceleration of a 3D Euler Solver Using Commodity Graphics Hardware

,”

46th AIAA Aerospace Sciences Meeting and Exhibit

,

Reno, NV

,

Jan. 7–10

, AIAA Paper No. 2008-607.

95.

Brandvik

,

T.

, and

Pullan

,

G.

,

2007

, “

Acceleration of a Two-Dimensional Euler Solver Using Commodity Graphics Hardware

,”

J. Mech. Eng. Sci.

,

221

(

12

), pp.

1745

–

1748

.10.1243/09544062JMES813FT

Google Scholar

Crossref

96.

Brandvik

,

T.

, and

Pullan

,

G.

,

2009

, “

An Accelerated 3D Navier-Stokes Solver for Flows in Turbomachines

,”

ASME

Turbo Expo 2009,

Orlando

, FL,

June 8–12

, Paper No. GT2009-60052.10.1115/GT2009-60052

97.

Corrigan

,

A.

,

Camelli

,

F.

,

Löhner

,

R.

, and

Wallin

,

J.

,

2009

, “

Running Unstructured Grid CFD Solvers on Modern Graphics Hardware

,”

19th AIAA Computational Fluid Dynamics Conference

,

July

, Paper No. AIAA-2009-4001.

98.

Corrigan

,

A.

,

Camelli

,

F.

,

Löhner

,

R.

, and

Mut

,

F.

,

2012

, “

Semi-Automatic Porting of a Large-Scale FORTRAN CFD Code to GPUs

,”

Int. J. Numer. Methods Fluids

,

69

, pp.

314

–

331

.10.1002/fld.2560

Google Scholar

Crossref

99.

Antoniou

,

A. S.

,

Karantasis

,

K. I.

,

Polychronopoulos

,

E. D.

, and

Ekaterinaris

,

J. A.

,

2010

, “

Acceleration of a Finite-Difference WENO Scheme for Large-Scale Simulations on Many-Core Architectures

,”

48th AIAA Aerospace Sciences Meeting Including the New Horizons Forum and Aerospace Exposition

,

Orlando, FL

,

Jan. 4–7

.

100.

Cohen

,

J. M.

, and

Molemaker

,

M. J.

,

2009

, “

A Fast Double Precision CFD Code Using CUDA

,”

21st International Conference on Parallel Computational Fluid Dynamics

.

101.

Thibault

,

J.

, and

Senocak

,

I.

,

2009

, “

CUDA Implementation of a Navier-Stokes Solver on Multi-GPU Desktop Platforms for Incompressible Flows

,”

47th AIAA Aerospace Sciences Meeting

,

Jan. 5–8

, Paper No. AIAA 2009-758.

102.

Jacobsen

,

D.

,

Thibault

,

J.

, and

Senocak

,

I.

,

2010

, “

An MPI-CUDA Implementation for Massively Parallel Incompressible Flow Computation on Multi-GPU Clusters

,”

AIAA Aerospace Sciences Meeting

,

Reno, NV

,

January

.

103.

DeLeon

,

R.

,

Jacobsen

,

D.

, and

Senocak

,

I.

,

2012

, “

Large Eddy Simulations of Turbulent Incompressible Flows on GPU Clusters

,”

50th AIAA Aerospace Sciences Meeting Including the New Horizons Forum and Aerospace Exposition

, pp.

1

–

13

.

104.

Griebel

,

M.

, and

Zaspel

,

P.

,

2010

, “

A Multi-GPU Accelerated Solver for the Three-Dimensional Two-Phase Incompressible Navier-Stokes Equations

,”

Comput. Sci. Res. Dev.

,

25

, pp.

65

–

73

.10.1007/s00450-010-0111-7

Google Scholar

Crossref

105.

Kelly

,

J.

,

2009

, “

GPU-Accelerated Simulation of Two-Phase Incompressible Fluid Flow Using a Level-Set Method for Interface Capturing

,” ASME 2009 International Mechanical Engineering Congress and Exposition

(IMECE

2009), Lake Buena Vista, FL, Nov. 13–19, Paper No. IMECE2009-13330, pp.

2221

–

2228

.10.1115/IMECE2009-13330

106.

Jespersen

,

D. C.

,

2009

, “

Acceleration of a CFD Code With a GPU

,” NASA Technical Report No. NAS-09-003.

107.

Buning

,

P. G.

,

Jesperson

,

D. E.

,

Pulliam

,

T. H.

,

Chan

,

W. M.

,

Slotnick

,

J. P.

,

Krist

,

S. E.

, and

Renze

,

K. J.

,

1998

,

OVERFLOW User's Manual- version 1.8

,

NASA Langley Research Center

, Hampton, VA.

108.

Phillips

,

E. H.

,

Zhang

,

Y.

,

Davis

,

R. L.

, and

Owens

,

J. D.

,

2009

, “

Rapid Aerodynamic Performance Prediction on a Cluster of Graphics Processing Units

,”

47th AIAA Aerospace Sciences Meeting

,

Reno, NV

,

January

.

109.

Phillips

,

E. H.

,

Davis

,

R. L.

, and

Owens

,

J. D.

,

2010

, “

Unsteady Turbulent Simulations on a Cluster of Graphics Processors

,”

40th AIAA Fluid Dynamics Conference

,

June

, Paper No. AIAA 2010-5036.

110.

Asouti

,

V. G.

,

Trompoukis

,

X. S.

,

Kampolis

,

J. C.

, and

Giannakoglou

,

K. C.

,

2011

, “

Unsteady CFD Computations Using Vertex-Centered Finite Volumes for Unstructured Grids on Graphics Processing Units

,”

Int. J. Numer. Methods Fluids

,

67

, pp.

232

–

246

.10.1002/fld.2352

Google Scholar

Crossref

111.

Kanpolis

,

J. C.

,

Trompoukis

,

X. S.

,

Asouti

,

V. G.

, and

Giannakoglou

,

K. C.

,

2010

, “

CFD Based Analysis and Two-Level Aerodynamic Optimization on Graphics Processing Units

,”

Comput. Methods Appl. Mech. Eng.

,

199

, pp.

712

–

722

.10.1016/j.cma.2009.11.001

Google Scholar

Crossref

112.

Turek

,

S.

,

Becker

,

C.

, and

Kilian

,

S.

,

2003

, “

Hardware-Oriented Numeric and Concepts for PDE Software

,”

FGCS, Future Gener. Comput. Syst.

,

22

, pp.

217

–

238

.10.1016/j.future.2003.09.007

Google Scholar

Crossref

113.

Strzodka

,

R.

,

Doggett

,

M.

, and

Kolb

,

A.

,

2005

, “

Scientific Computation for Simulations of Programmable Graphics Hardware

,”

Simul. Model. Pract. Theory

,

13

, pp.

667

–

680

.10.1016/j.simpat.2005.08.001

Google Scholar

Crossref

114.

Patnaik

,

G.

, and

Obenschain

,

K. S.

,

2010

, “

Using GPU on HPC Applications to Satisfy Low-Power Computational Requirements

,” 48th

AIAA

Aerospace Sciences Meeting,

Orlando, FL

,

January

, Paper No. AIAA-2010-524.10.2514/6.2010-524

115.

Corrigan

,

A.

, and

Lohner

,

R.

,

2011

, “

Porting of FEFLO to Multi-GPU Clusters

,” 49th

AIAA

Aerospace Sciences Conference,

Orlando, FL

, Paper No. 2011-0948.10.2514/6.2011-948

116.

Klockner

,

A.

,

Warburton

,

T.

,

Bridge

,

J.

, and

Hesthaven

,

J. S.

,

2009

, “

Nodal Discretization Galerkin Methods on Graphics Processors

,”

J. Comput. Phys.

,

228

, pp.

7863

–

7882

.10.1016/j.jcp.2009.06.041

Google Scholar

Crossref

117.

Fatica

,

M.

,

Jameson

,

A.

, and

Alonso

,

J.

,

2004

, “

Stream-FLO: An Euler Solver for Streaming Architectures

,” AIAA Paper No. AIAA 2004-1090.

118.

Wang

,

P.

,

Abel

,

T.

, and

Kaehler

,

R.

,

2010

, “

Adaptive Mesh Fluid Simulations on GPU

,”

New Astron.

,

15

(

7

), pp.

581

–

589

.10.1016/j.newast.2009.10.002

Google Scholar

Crossref

119.

Liang

,

W. Y.

,

Hsieh

,

T. J.

,

Satria

,

M.

,

Chang

,

Y. L.

,

Fang

,

J. P.

,

Chen

,

C. C.

, and

Han

,

C. C.

,

2009

, “

A GPU-Based Simulation of Tsunami Propagation and Inundation

,”

Lect. Notes Comput. Sci.

,

5574

, pp.

593

–

603

.10.1007/978-3-642-03095-6

Google Scholar

Crossref

120.

Mossaiby

,

F.

,

Rossi

,

R.

,

Dadvand

,

P.

, and

Idelsohn

,

S.

,

2012

, “

OpenCL-Based Implementation of an Unstructured Edge-Based Finite Element Convection-Diffusion Solver on Graphics Hardware

,”

Int. J. Numer. Methods Eng.

,

89

, pp.

1635

–

1651

.10.1002/nme.3302

Google Scholar

Crossref

121.

Che

,

S.

,

Boyer

,

M.

,

Meng

,

J.

,

Tarjan

,

D.

,

Sheaffer

,

J.

, and

Skadron

,

K.

,

2008

, “

A Performance Study of General-Purpose Applications on Graphics Processors Using Cuda

,”

J. Parallel Distrib. Comput.

,

68

(

10

), pp.

1370

–

1380

.10.1016/j.jpdc.2008.05.014

Google Scholar

Crossref

122.

Li

,

W.

,

Wei

,

X.

, and

Kaufman

,

A.

,

2003

, “

Implementing Lattice Boltzmann Computation on Graphics Hardware

,”

Visual Comput.

,

19

, pp.

444

–

456

.10.1007/s00371-003-0210-6

Google Scholar

Crossref

123.

Kaufman

,

A.

,

Fan

,

Z.

, and

Petkov

,

K.

,

2009

, “

Implementing the Lattice Boltzmann Model on Commodity Graphics Hardware

,”

J. Stat. Mech.

,

2009

, p.

P06016

.10.1088/1742-5468/2009/06/P06016

124.

Fan

,

Z.

,

Kuo

,

Y.

,

Zhao

,

Y.

,

Qiu

,

F.

,

Kaufman

,

A.

, and

Arcieri

,

W.

,

2009

, “

Visual Simulation of Thermal Fluid Dynamics in a Pressurized Water Reactor

,”

Visual Comput.

,

25

(

11

), pp.

985

–

996

.10.1007/s00371-008-0309-x

Google Scholar

Crossref

125.

Tolke

,

J.

,

2010

, “

Implementation of a Lattice Boltzmann Kernel Using the Compute Unified Device Architecture Developed by NVIDIA

,”

Comput. Visualization Sci.

,

13

, pp.

29

–

39

.10.1007/s00791-008-0120-2

Google Scholar

Crossref

126.

Tolke

,

J.

, and

Krafczyk

,

M.

,

2008

, “

Teraflop Computing on a Desktop PC With GPUs for 3D CFD

,”

Int. J. Comput. Fluid Dyn.

,

22

(

7

), pp.

443

–

456

.10.1080/10618560802238275

Google Scholar

Crossref

127.

Bailey

,

P.

,

Myre

,

J.

,

Walsh

,

S. D. C.

,

Lilja

,

D. J.

, and

Saar

,

M. O.

,

2009

, “

Accelerating Lattice Boltzmann Fluid Flow Simulations Using Graphics Processors

,”

International Conference on Parallel Processing

,

Vienna Austria

.

128.

Feichtinger

,

C.

,

Habich

,

J.

,

Kostler

,

H.

,

Hager

,

G.

,

Rude

,

U.

, and

Wellein

,

G.

,

2011

, “

A Flexible Patch-Based Lattice Boltzmann Parallelization Approach for Heterogeneous GPU–CPU Clusters

,”

Parallel Comput.

,

37

(

9

), pp.

536

–

549

.10.1016/j.parco.2011.03.005

Google Scholar

Crossref

129.

Obrecht

,

C.

,

Kuznik

,

F.

,

Tourancheau

,

B.

, and

Roux

,

J. J.

,

2011

, “

A New Approach to the Lattice Boltzmann Method for Graphics Processing Units

,”

Comput. Math. Appl.

,

61

(

12

), pp.

3628

–

3638

.10.1016/j.camwa.2010.01.054

Google Scholar

Crossref

130.

Peng

,

L.

,

Nomura

,

K.

,

Oyakawa

,

T.

,

Kalia

,

R.

,

Nakano

,

A.

, and

Vashishta

,

P.

,

2008

, “

Parallel Lattice Boltzmann Flow Simulation on Emerging Multi-Core Platforms

,”

Lect. Notes Comput. Sci.

,

5168

, pp.

763

–

777

.10.1007/978-3-540-85451-7

Google Scholar

Crossref

131.

Alam

,

M. S.

, and

Cheng

,

L.

,

2011

, “

Parallelization of LBM Code Using CUDA Capable GPU Platform for 3D Single and Two-Sided Non-Facing Lid-Driven Cavity Flow

,” Proceedings of the ASME 2011 30th International Conference on Ocean, Offshore and Arctic Engineering (

OMAE

2011),

Rotterdam, The Netherlands

,

June 19–24

, pp.

745

–

753

.10.1115/OMAE2011-50332

132.

“

Sailfish Reference Manual

,” Sailfish, http://sailfish.us.edu.pl/index.html

133.

Rustico

,

E.

,

Bilotta

,

G.

,

Gallo

,

G.

,

Herault

,

A.

, and

Del Negro

,

C.

,

2012

, “

Smoothed Particle Hydrodynamics Simulations on Multi-GPU Systems

,” 20th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (

PDP

).10.1109/PDP.2012.21

134.

Anderson

,

J. A.

,

Lorenz

,

C. D.

, and

Travesset

,

A.

,

2008

, “

General Purpose Molecular Dynamics Simulations Fully Implemented on Graphics Processing Units

,”

J. Comput. Phys.

,

227

, pp.

5342

–

5359

.10.1016/j.jcp.2008.01.047

Google Scholar

Crossref

135.

Marsh

,

D.

,

2010

, “

Molecular Dynamics-Lattice Boltzmann Hybrid Method on Graphics Processors

,” M.S. thesis,

University of Illinois at Urbana-Champaign

,

Champaign, IL

.

136.

Sahu

,

K.

, and

Vanka

,

S. P.

,

2011

, “

A Multiphase Lattice Boltzmann Study of Buoyancy-Induced Mixing in a Tilted Channel

,”

Comput. Fluids

,

50

(

1

), pp.

199

–

215

.10.1016/j.compfluid.2011.07.012

Google Scholar

Crossref

137. He, X., Zhang, R., Chen, S., and Doolen, G. D., 1999, “On the Three-Dimensional Rayleigh-Taylor Instability,” Phys. Fluids, 11(5), pp. 1143–1152. doi:10.1063/1.869984

138. Redapangu, P., Vanka, S. P., and Sahu, K., 2012, “Multiphase Lattice Boltzmann Simulations of Buoyancy Induced Flow of Two Immiscible Fluids With Different Viscosities,” Eur. J. Mech., B/Fluids, 34, pp. 105–114. doi:10.1016/j.euromechflu.2012.01.006

139. Redapangu, P., Sahu, K. C., and Vanka, S. P., 2012, “A Study of Pressure-Driven Displacement Flow of Two Immiscible Liquids Using a Multiphase Lattice Boltzmann Approach,” Phys. Fluids, 24(10), p. 102110. doi:10.1063/1.4760257

140. Wang, G., Cope, W. K., and Vanka, S. P., 1994, Multigrid Calculations of Twin Jet Impingement With Crossflow: Comparison of Segregated and Coupled Relaxation Strategies, Vol. 196, American Society of Mechanical Engineers, Fluids Engineering Division (Publication) FED, New York, pp. 233–244.

141. Shinn, A. F., and Vanka, S. P., 2009, “Implementation of a Semi-Implicit Pressure-Based Multigrid Fluid Flow Algorithm on a Graphics Processing Unit,” Proceedings of the ASME (IMECE2009), Lake Buena Vista, FL, pp. 125–133. doi:10.1115/IMECE2009-11587

142. Shinn, A. F., Vanka, S. P., and Hwu, W. W., 2010, “Direct Numerical Simulation of Turbulent Flow in a Square Duct Using a Graphics Processing Unit (GPU),” 40th AIAA Fluid Dynamics Conference. doi:10.2514/6.2010-5029

143. Shinn, A. F., and Vanka, S. P., 2013, “Large Eddy Simulations of Film-Cooling Flows With a Micro-Ramp Vortex Generator,” ASME J. Turbomach., 135(1), p. 011004. doi:10.1115/1.4006329

144. Chaudhary, R., Vanka, S. P., and Thomas, B. G., 2010, “Direct Numerical Simulations of Magnetic Field Effects on Turbulent Flow in a Square Duct,” Phys. Fluids, 22(7), p. 075102. doi:10.1063/1.3456724

145. Chaudhary, R., Thomas, B. G., and Vanka, S. P., 2012, “Effect of Electromagnetic Ruler Braking (EMBr) on Transient Turbulent Flow in Continuous Slab Casting Using Large Eddy Simulations,” Metall. Mater. Trans. B, 43(3), pp. 532–553. doi:10.1007/s11663-012-9634-6

146. Chaudhary, R., Vanka, S. P., and Thomas, B. G., 2011, “Direct Numerical Simulations of Transverse and Spanwise Magnetic Field Effects on Turbulent Flow in a 2:1 Aspect Ratio Rectangular Duct,” Comput. Fluids, 51(1), pp. 100–114. doi:10.1016/j.compfluid.2011.08.002

147. Vanka, S. P., Shinn, A. F., and Sahu, K. C., 2011, “Computational Fluid Dynamics Using Graphics Processing Units: Challenges and Opportunities,” Proceedings of the ASME 2011 IMECE Conference, Denver, CO, pp. 429–437. doi:10.1115/IMECE2011-65260

148. Nicoud, F., and Ducros, F., 1999, “Subgrid-Scale Stress Modelling Based on the Square of the Velocity Gradient Tensor,” Flow, Turbul. Combust., 62(3), pp. 183–200. doi:10.1023/A:1009995426001

149. Shinn, A. F., 2011, “Large Eddy Simulations of Turbulent Flows on Graphics Processing Units: Application to Film-Cooling Flows,” Ph.D. thesis, University of Illinois at Urbana-Champaign, Champaign, IL.

150. Chaudhary, R., 2011, “Studies of Turbulent Flows in Continuous Casting of Steel With and Without Magnetic Field,” Ph.D. thesis, University of Illinois at Urbana-Champaign, Champaign, IL.

151. Zaman, K. B. M. Q., Rigby, D. L., and Heidman, J. D., 2010, “Inclined Jet in Crossflow Interacting With a Vortex Generator,” J. Propul. Power, 26(5), pp. 947–954. doi:10.2514/1.49742

152. Timmel, K., Eckert, S., and Gerbeth, G., 2011, “Experimental Investigation of the Flow in a Continuous-Casting Mold Under the Influence of a Transverse Direct Current Magnetic Field,” Metall. Mater. Trans. B, 42(1), pp. 68–80. doi:10.1007/s11663-010-9458-1

153. Timmel, K., Miao, X., Eckert, S., Lucas, D., and Gerbeth, G., 2010, “Experimental and Numerical Modeling of the Steel Flow in a Continuous Casting Mould Under the Influence of a Transverse DC Magnetic Field,” Magnetohydrodynamics, 46(4), pp. 337–448.

154. Lee, V., Kim, C., Chuggani, J., Deisher, M., Kim, D., Nguyen, A., Satish, N., Smelyansky, M., Chennupaty, S., Hammarlund, P., Singhal, R., and Dubey, P., 2010, “Debunking the 100X GPU vs. CPU Myth: An Evaluation of Throughput Computing on CPU and GPU,” ISCA 10, Saint-Malo, France, June 19–23. doi:10.1145/1815961.1816021
