
Parallel Processing with Multiple CPU Cores in Ansys Fluent

Hello everyone, this is Mohsen Seraj from Ozen Engineering, Inc. Today, I want to discuss how to run Ansys Fluent using parallel processing, how to set the number of CPU cores (solver processes), and how to manage parallel processing effectively.

Initial Setup

We begin with the Fluent Launcher in the Home tab, focusing on the Machine and Simulation sections. It's crucial to understand that your Ansys license limits the number of cores available for running Ansys Fluent. Make sure you know the dimension of your simulation (2D or 3D) and whether it requires double precision.

Key Points:

The GPU solver option is disabled if no GPU card is available.
You can choose the number of CPU cores for the solver processor.
Ensure the working directory is correct as all information is saved there.
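The launcher choices above (dimension, precision, number of solver processes) can also be expressed as a Fluent command line for batch runs. The sketch below assumes the standard `fluent` launch syntax with a version argument such as `3ddp`, `-t<n>` for the number of processes, `-g` for no GUI, and `-i` for a journal file; the `licensed_cores` check and the `run.jou` file name are hypothetical helpers, not part of Fluent.

```python
import shlex
from typing import List, Optional


def fluent_version(dimension: int, double_precision: bool) -> str:
    """Map the launcher's Dimension and Double Precision choices to a version argument."""
    if dimension not in (2, 3):
        raise ValueError("dimension must be 2 or 3")
    return f"{dimension}d" + ("dp" if double_precision else "")


def fluent_command(cores: int, licensed_cores: int, dimension: int = 3,
                   double_precision: bool = True,
                   journal: Optional[str] = None) -> List[str]:
    """Build a batch-mode Fluent command line for a given number of solver processes."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    if cores > licensed_cores:
        # Hypothetical check: the license caps how many solver processes you can start.
        raise ValueError(f"{cores} cores requested, but the license allows {licensed_cores}")
    # -t<n>: number of solver processes; -g: run without the GUI.
    cmd = ["fluent", fluent_version(dimension, double_precision), f"-t{cores}", "-g"]
    if journal is not None:
        cmd += ["-i", journal]  # read commands from a journal file
    return cmd


print(shlex.join(fluent_command(4, licensed_cores=4, journal="run.jou")))
# prints: fluent 3ddp -t4 -g -i run.jou
```

Keeping the license limit as an explicit parameter makes the failure obvious before Fluent starts, rather than at license checkout.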

Parallel Processing Settings

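When using multiple CPU cores, parallel processing is employed. For the interconnect between nodes, usually only the default option is offered. For the MPI type, which handles messaging between the computing cores, the default is typically used, but Intel MPI and MS-MPI are also available.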
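If you script your runs, the MPI choice becomes one extra launch argument. This sketch assumes Fluent's `-mpi=` option with the values `intel` and `msmpi`; which values are accepted depends on your Fluent version and operating system, so treat the mapping as an assumption to verify.

```python
from typing import Dict, List, Optional

# Assumed mapping from the launcher's MPI Types choices to Fluent's -mpi= option.
MPI_FLAGS: Dict[str, Optional[str]] = {
    "default": None,        # let Fluent choose its default MPI, no flag added
    "intel": "-mpi=intel",  # Intel MPI
    "msmpi": "-mpi=msmpi",  # Microsoft MPI (Windows)
}


def mpi_args(mpi_type: str = "default") -> List[str]:
    """Return the extra launch arguments for the selected MPI implementation."""
    key = mpi_type.strip().lower()
    if key not in MPI_FLAGS:
        raise ValueError(f"unknown MPI type {mpi_type!r}; expected one of {sorted(MPI_FLAGS)}")
    flag = MPI_FLAGS[key]
    return [] if flag is None else [flag]


print(mpi_args())         # []
print(mpi_args("Intel"))  # ['-mpi=intel']
```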

Local vs. Mainframe Cluster:

On a local machine (such as a desktop), the cores use shared memory.
On a mainframe cluster, memory is distributed across the CPUs of different nodes, so the distributed-memory option must be selected.
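On a cluster, each solver process owns its part of the memory and processes on different nodes exchange data over the interconnect. A minimal sketch of placing processes on nodes, filling each node before the next, is shown below. It assumes a host list with one `hostname:count` line per node, as might be passed to Fluent's `-cnf=` host-file option; the line format and the node names are hypothetical, so check the host-file syntax for your MPI and Fluent version.

```python
from typing import Dict, List


def host_lines(node_cores: Dict[str, int], processes: int) -> List[str]:
    """Place solver processes on nodes in order, filling each node before the next.

    Returns one 'hostname:count' line per node used (assumed host-file format).
    """
    if processes < 1:
        raise ValueError("processes must be at least 1")
    if processes > sum(node_cores.values()):
        raise ValueError("not enough cores on the listed nodes")
    lines: List[str] = []
    remaining = processes
    for host, cores in node_cores.items():
        if remaining == 0:
            break
        if cores <= 0:
            continue  # skip nodes with no usable cores
        used = min(cores, remaining)
        lines.append(f"{host}:{used}")
        remaining -= used
    return lines


# Hypothetical two-node cluster with 16 cores per node.
print(host_lines({"node01": 16, "node02": 16}, 24))  # ['node01:16', 'node02:8']
```

Filling nodes first keeps as many processes as possible on shared memory, which reduces traffic over the slower inter-node link.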

Running the Simulation

We start Fluent with four cores, the number included with a standard Ansys license (no HPC packs). The machine has an AMD Ryzen processor with 32 cores available.

Simulation Details:

3D model, double precision, pressure-based solver, steady state.
Turbulence model: k-omega SST.
Energy equation enabled; water coolant flows between two solid cylinders.
Mesh: Approximately 4 million cells.

Mesh and Boundary Conditions

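Check the mesh display for cylinders, inlets, outlets, and walls. Then run the mesh check and confirm that the minimum cell volume is positive (a negative value indicates badly distorted cells), and review the mesh quality report; here the minimum orthogonal quality is above 0.15.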
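The two checks above can be written down as a small pass/fail helper, with the numbers copied by hand from Fluent's mesh check and quality report. The function name and the inputs are illustrative, and the 0.15 default is simply the value this mesh achieved, not an Ansys recommendation.

```python
from typing import List


def mesh_issues(min_volume: float, min_orthogonal_quality: float,
                ortho_threshold: float = 0.15) -> List[str]:
    """Flag the two checks discussed above; an empty list means both pass."""
    issues: List[str] = []
    if min_volume <= 0:
        # Zero or negative volume means inverted or badly distorted cells.
        issues.append(f"minimum cell volume is {min_volume:g}, expected > 0")
    if min_orthogonal_quality < ortho_threshold:
        issues.append(
            f"minimum orthogonal quality {min_orthogonal_quality:g} is below {ortho_threshold:g}"
        )
    return issues


# Illustrative inputs, not taken from this model.
print(mesh_issues(min_volume=2.1e-12, min_orthogonal_quality=0.18))  # []
```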

Boundary Conditions:

Inlet: Mass flow inlet with specified flow rate and temperature.
Outlet: Pressure outlet with zero gauge pressure.
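Walls: Thermal boundary conditions; the inner wall has a specified heat flux, and the outer wall has a convection condition defined by a heat transfer coefficient and a free-stream temperature.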
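For the outer wall's convection condition, the wall heat flux follows Newton's law of cooling, where $h$ is the heat transfer coefficient, $T_w$ the wall temperature, and $T_\infty$ the free-stream temperature. The sign convention here (positive flux leaving the wall toward the surroundings) is an assumption:

```latex
q_w = h \, \left( T_w - T_\infty \right)
```

The inner wall, by contrast, fixes $q_w$ directly to the specified value, so only the outer wall needs $h$ and $T_\infty$.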

Solution and Iterations

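Initialize the model using hybrid initialization. Before calculating, set an appropriate number of iterations for the steady-state run (100 in this example). Monitor the solution progress and check the residuals. After the run, type parallel timer usage in the console to print the timing statistics.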
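To repeat the same run for each core count without clicking through the GUI, the steps can be collected in a journal file and passed with `-i`. The sketch assumes the Fluent text commands `/file/read-case`, `/solve/initialize/hyb-initialization`, `/solve/iterate`, `/parallel/timer/usage`, and `/exit`; the case file name is hypothetical, and the exit confirmation may behave differently across versions.

```python
def run_journal(case_file: str, iterations: int = 100) -> str:
    """Return Fluent text commands that mirror the GUI steps for one timing run."""
    if iterations < 1:
        raise ValueError("iterations must be at least 1")
    commands = [
        f'/file/read-case "{case_file}"',        # read the case (mesh and setup)
        "/solve/initialize/hyb-initialization",  # hybrid initialization
        f"/solve/iterate {iterations}",          # run the steady iterations
        "/parallel/timer/usage",                 # print the wall clock timings
        "/exit yes",                             # quit and confirm
    ]
    return "\n".join(commands) + "\n"


# Hypothetical case file name.
print(run_journal("cylinders.cas.h5"), end="")
```

Because only the `-t<n>` launch argument changes between runs, one journal can serve the 4-, 8-, 12-, 16-, 24-, and 32-core tests.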

Performance Metrics:

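Average wall clock time per iteration.
Total wall clock time: time spent on the solution only.
Simulation wall clock time: everything, including console output, plot updates, and saving animation frames; it is usually larger than the total wall clock time.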
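The gap between the two totals is the time spent outside the solver. As a worked example, the sketch uses the 16-core run from the transcript (total about 95 to 96 s, taken here as 95.5 s, and simulation 101 s); the helper name is hypothetical.

```python
from typing import Tuple


def overhead(total_wall: float, simulation_wall: float) -> Tuple[float, float]:
    """Return (seconds, fraction of simulation time) spent outside the solver."""
    if simulation_wall < total_wall:
        raise ValueError("simulation wall clock time should not be smaller than total")
    extra = simulation_wall - total_wall
    return extra, extra / simulation_wall


extra, frac = overhead(95.5, 101.0)
print(f"{extra:.1f} s outside the solver ({frac:.1%} of simulation wall clock time)")
# prints: 5.5 s outside the solver (5.4% of simulation wall clock time)
```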

Increasing CPU Cores

We repeat the simulation with 8, 12, 16, 24, and finally 32 cores. Wall clock time decreases as cores are added up to 24 cores, but 32 cores give essentially the same time as 24, so the benefit of parallel processing levels off.

Core Usage:

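4 cores: Baseline run (standard license).
8 cores: Reduced wall clock time.
12 cores: Further reduction in time.
16 cores: About 95 to 96 s total wall clock time; 101 s simulation wall clock time.
24 cores: About 83 s total; about 89 to 90 s simulation. Best choice for this mesh size.
32 cores: About 82 to 83 s total; about 89 to 90 s simulation. Maximum available on this machine, with no real gain over 24 cores.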
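Speedup and parallel efficiency make the levelling-off concrete. Only the 16-, 24-, and 32-core total wall clock times are quoted in the transcript, so 16 cores is used as the baseline here; speedup is $T_{16}/T_n$ and efficiency is speedup divided by the ideal ratio $n/16$. The times are approximate readings.

```python
from typing import Dict, Tuple


def scaling(times: Dict[int, float]) -> Dict[int, Tuple[float, float]]:
    """Speedup and parallel efficiency relative to the smallest core count given."""
    base_cores = min(times)
    base_time = times[base_cores]
    result = {}
    for cores in sorted(times):
        speedup = base_time / times[cores]
        efficiency = speedup * base_cores / cores  # 1.0 means perfect scaling
        result[cores] = (speedup, efficiency)
    return result


# Approximate total wall clock times (s) read from the transcript.
measured = {16: 95.5, 24: 83.0, 32: 82.5}
for cores, (s, e) in scaling(measured).items():
    print(f"{cores:>2} cores: speedup {s:.2f}x (ideal {cores / min(measured):.2f}x), efficiency {e:.0%}")
# prints:
# 16 cores: speedup 1.00x (ideal 1.00x), efficiency 100%
# 24 cores: speedup 1.15x (ideal 1.50x), efficiency 77%
# 32 cores: speedup 1.16x (ideal 2.00x), efficiency 58%
```

Doubling from 16 to 32 cores buys only about a 1.16x speedup here, which is why 24 cores is the practical choice for this mesh.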

Conclusion

In this video, we demonstrated how to adjust the number of cores in Ansys Fluent and its impact on simulation time. Beyond about 24 cores, adding cores gave no further speedup for this roughly 4-million-cell mesh: each core has less work to do, while the time spent messaging between cores becomes a larger share of the run.

Key Takeaways:

The number of cores is limited by your Ansys license.
Optimal core usage depends on mesh size and simulation requirements.
Consider communication overhead when using many cores.
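A quick way to relate core count to mesh size is the average number of cells each solver process handles. The arithmetic below uses the approximate 4 million cells of this model; the helper name is illustrative, and the sketch does not claim any particular cells-per-core threshold.

```python
def cells_per_core(cells: int, cores: int) -> int:
    """Average number of mesh cells each solver process handles (rounded)."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return round(cells / cores)


MESH_CELLS = 4_000_000  # approximate cell count of the model in this video
for n in (4, 8, 12, 16, 24, 32):
    print(f"{n:>2} cores: ~{cells_per_core(MESH_CELLS, n):,} cells per core")
# prints 1,000,000 / 500,000 / 333,333 / 250,000 / 166,667 / 125,000 cells per core
```

As the share per core shrinks, the fixed cost of messaging between processes matters more, which is consistent with the flat timings between 24 and 32 cores.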

Thank you for watching. I hope you found this video informative.

Transcript

[This was auto-generated. There may be misspellings.]

Hello everyone, this is Mohsen Seraj from the Ozen Engineering team. Today I want to talk about how to run Ansys Fluent with parallel processing and how to set the number of CPU cores used for parallel processing when running Ansys Fluent.

This is the Fluent Launcher. We are in the Home tab, which has Machine and Simulation. For simulation, we have to use this one. The license is something that you need to know, because it also limits the number of cores available to run Ansys Fluent.

You have to be sure about the dimension of your simulation, either 2D or 3D, and also whether it is double precision or not. The option for the GPU solver is grayed out here; it is not activated, because we don't have a GPU card available on this machine.

And here you can choose the number of CPU cores under Solver Processes, so you can increase it or type it in. Let's enter 4, as you can see here. If I start Fluent, we are going to use four CPU cores. Be sure that the working directory is correct, because all information is saved there.

There are other tabs as well: General Options, Parallel Settings, Remote, Scheduler, and Environment. Parallel Settings matters here, because when we use multiple CPU cores, we are using parallel processing.

For the interconnect between the nodes, there is usually only a default option. For the MPI type, which handles messaging between the nodes, I mean the computing CPU cores, we usually use the default again, but Intel MPI and MS-MPI are other options available to you.

When we are running on a local machine, like the desktop I am using right now, the cores use shared memory; but on a mainframe cluster, you have to choose this option, where memory is distributed across different CPUs.

So, back to the Home tab: we are starting with four cores, which is the number of cores available with a standard Ansys license. Hit the Start button. Fluent is launching.

As you can see, we are using four cores out of the 32 cores available, and the processor is an AMD Ryzen. Let's read the model: this is the case file, open it, and Fluent reads and builds the mesh. These are the messages printed in the console.

Wait until the model is fully read and ready to work on. Okay, reading the case is done, as shown here. As you can see on top, it is a 3D model, double precision, with a pressure-based solver. The turbulence model is k-omega SST, and we are using four processes.

Let's check the mesh display. As you can see, we have cylinders, one inlet, and one outlet. If I sort by surface type (inlets, outlets, and walls), we have the inlets on this side and the outlets on the other side.

If you want to see the mesh cells, here is the mesh element display. Let's pull up the information about the mesh. We don't have a very fine mesh here: it's about 4 million cells, along with the number of faces and nodes.

If we want to check the mesh, we can come here and perform the mesh check; it shows the domain extents. One important check here is the minimum volume, that is, the minimum volume of a cell.

It shouldn't be negative; when the mesh quality is poor and a cell is very distorted, this number may be negative.

So, we have to be sure that it is positive. If you want to check the mesh quality, you can see that the minimum orthogonal quality is above 0.15, and you can also see the aspect ratio. It is a steady simulation. The fluid is the coolant between the two solid cylinders, and it is water.

These are the material properties: constant properties for the water, and default materials for the solids. We already activated the energy equation, so heat transfer is also modeled.

Be sure that the fluid zone is water, not air. By default it is air, so check that the correct material is assigned to the fluid zone and to the solid zones. Let's check the boundary conditions. The inlet is a mass flow inlet with a given mass flow rate and a given temperature.

The outlet is a pressure outlet at zero gauge pressure, with a given backflow total temperature.

For the walls, we also have thermal boundary conditions. Say this is the inner wall: it has a heat flux boundary condition with a given value. For the outer wall, there is a convection boundary condition, so I need the heat transfer coefficient and the free-stream temperature.

So, we're done with the setup. I already set up some monitors from the solution reports for temperature and velocity. Let's initialize the model with hybrid initialization. The model is initializing. Go to Run Calculation.

Before hitting the Calculate button, be sure you have an appropriate number of iterations for this steady case. Always check the case, don't forget. Now we can start the calculation, and the model is being solved.

The solution runs on four cores, and the bottom left shows the number of iterations remaining. The iterations have not started yet; Fluent is still distributing the data to the nodes. This is the first iteration. These are the plots for the solution reports.

Let's look at the plot for the residuals. I will be back when the solution is done. 20 iterations done. The residuals are decreasing, which is good. So, the solution is done. We can check the time spent on this simulation. In the console, type parallel timer usage.

Here you have some information about the time. First, there is the average wall clock time per iteration. Wall clock time is real, physical elapsed time, the same as what we would measure with a clock. The first item is per iteration, here.

Of the last two items in the list, the first is the total wall clock time, which is the time spent only on the solution. The last item is the simulation wall clock time.

It includes everything: not only the solution time, but also the time for printing information to the console and updating the plots here, for example.

And if, for example, we save animation frames, that time is also counted in the simulation wall clock time, whereas the total wall clock time covers only the solution.

So far, I showed you how to set up the model using only four cores, run it, and find out the time spent on the solution. Let's move on to the next step, which is increasing the number of CPU cores. This is the Fluent Launcher, now set to 8 CPU cores. Start Fluent. Read the file.

It is the same file that we already used for 4 cores. Now, we have 8 cores listed here. Okay, the model is ready. It is the same setup as before, so we just go and initialize the model, and check the solution time. Go for running the calculation. 100 iterations. Start calculation.

These are the monitors from the solution reports. Let's look at the plot for the residuals.

I'll be back when the solution is done. 20 iterations done, you can see the bottom left, 80 iterations out of a hundred remaining, so you are seeing the number of iterations done and the countdown of the time here.

These are the residuals for continuity, the x, y, and z velocity components, energy, and the turbulence quantities. Okay, the solution is completed; type the same command, parallel timer usage.

Okay, as we expected, we see shorter times for the wall clock time and the total wall clock time, which is the time spent only on the solution.

The simulation wall clock time covers everything else as well: printing the residuals, updating the plots here, and saving animation frames if there are any. So, as you can see, the simulation wall clock time is usually larger than the total wall clock time.

So, we're done for 8 cores, that is here on top, and now we're going to test 12 cores. Set the number of CPU cores to 12. So, the setup is as before, start Fluent.

These are the cores available to run this simulation. Read the file, the same one we used for the 4- and 8-core simulations; now it is 12 cores, as you can see on top. Okay, reading the model is done, so we can go and initialize the model.

And start the calculation, the solution for 100 iterations. Always check the case, don't forget. The solution is starting. I'll be back when the solution is done. 20 iterations are done. Solution is completed after 23 iterations. Let's check the time, solution time.

And punch the same command, parallel timer usage, in the console. As we expected, by using 12 cores, we are getting a shorter time for the wall clock time and for the simulation wall clock time. So, as the number of cores increases, we have faster running of the simulation.

Let's go for more cores for running this model. Let's start with 16 cores, double precision, 3D, and click the Start button. Let's read the case file. The model is already set up, so just initialize it.

Go for the solution, 100 iterations, you always have to check the case, start calculation. Solution completed, let's check the time. Punch this command in the console, parallel timer usage.

And you can see the total wall clock time is 95 to 96 seconds, and the simulation wall clock time for everything is 101 seconds. Now I set the number of cores (solver processes) to 24 for this new simulation of the same model. As you can see, we are using 24 cores out of the 32 cores available on this machine.

Read the model case file. The model is ready; it is the same model that we used for the previous simulations, so initialize it. As you can see on top, we are using 24 cores. Start the simulation.

About 20 iterations done, and now the solution is completed. Check the solution time by entering this command in the console. As we expected, we see shorter and shorter times, both for the total wall clock time for the solution and for the simulation wall clock time for everything.

So, almost 83 seconds for the total wall clock time and almost 89 to 90 seconds for the simulation wall clock time. We can go for one more simulation that uses all the cores available on this machine. Let's run Fluent with 32 cores. Hit Start.

The cores available for running Fluent are listed here, 32 out of 32 cores. This is the maximum number of cores we can run Fluent on with this machine. So: 32 cores, k-omega SST turbulence model, pressure-based, steady, double precision, and three-dimensional. Initialize the model.

And start the solution. Residuals are printing in the console. 85 iterations remaining out of 100, so if you have 2 HPC pack licenses, we can run up to 32 cores. Solution completed, let's check the time. Parallel timer usage. As you can see, the total wall clock time is 82-83 seconds.

And the simulation wall clock time is almost 89 or 90 seconds. So, in this video, we showed you how to change the number of cores when running Fluent. Now I am going to show you a comparison of the wall clock times for this model as the number of cores increases.

So, I put all the wall clock times together. I already showed you the simulations with different numbers of cores, starting from 4 cores.

If we compare 32 cores with 24 cores, we get essentially the same time. The reason is that beyond a certain number of cores, the decrease in wall clock time is no longer proportional to the increase in the number of cores for the CFD simulation in Ansys Fluent.

Remember that the mesh I used here has only about 4 million cells, so 32 cores, or even 24 cores, may be too many for such a small case. So, how many cores we need depends on several parameters.

One of them is the mesh size, that is, the number of mesh cells.

Also remember that the data is distributed among these different nodes, so when we go to tens of CPU cores, on a local machine or on a cluster, the time lag for messaging between the cores can become a limiting factor.

And again, the mesh cell count is something you have to think about. Another point is that the number of CPU cores available to you also depends on the Ansys license that you have. With the standard license, you start with 4 cores.

If you have the first HPC pack, then you're going to have 12 more cores, and the total will be 16 cores. If you want to run the parallel processing for Fluent on 32 cores, then you need 2 HPC packs and an Ansys license.

There are other parameters that you have to think about when you choose the number of cores in the Fluent Launcher. I hope you enjoyed this video. Thank you very much.
