I am trying to improve the performance of a large scientific simulation code written in Fortran. The process I am following is this: I identified the most time-consuming kernel, extracted that kernel from the main code, and optimized it separately using loop transformations such as loop unrolling and loop permutation. Now, when I put the optimized kernel back into the main body of code, I fail to see any performance improvement in the main code. In fact, sometimes the optimized version takes more time than the original implementation.
I am guessing this is due to differences in cache usage. When I was running the small kernel on its own, it had the whole cache at its disposal, but when the main code was run, the optimized kernel, as part of the main code, could perhaps use only a small fraction of the cache.
So, my question is: is it possible to limit the amount of cache that is available to a program? If so, I could use it to model the behavior of the full code while running the small kernel in isolation. I am currently running on an AMD Opteron platform, but I also have access to other platforms, such as the IBM BG/Q. Thank you.
Limiting cache accessibility