Memory Management for Many-Core Processors with Software Configurable Locality Policies
J. Zhou and B. Demsky. Proceedings of the 2012 international symposium on Memory Management, pages 3--14. New York, NY, USA, ACM, (2012)
DOI: 10.1145/2258996.2259000
Abstract
As processors evolve towards higher core counts, architects will develop more sophisticated memory systems to satisfy the cores' increasing thirst for memory bandwidth. Early many-core processor designs suggest that future memory systems will likely include multiple controllers and distributed cache coherence protocols. Many-core processors that expose memory locality policies to the software system provide opportunities for automatic tuning that can achieve significant performance benefits.
Managed languages typically provide a simple heap abstraction. This paper presents techniques that bridge the gap between the simple heap abstraction of modern languages and the complicated memory systems of future processors. We present a NUMA-aware approach to garbage collection that balances the competing concerns of data locality and heap utilization to improve performance. We combine a lightweight approach for measuring an application's memory behavior with an online, adaptive algorithm for tuning the cache to optimize it for the specific application's behaviors.
We have implemented our garbage collector and cache tuning algorithm and present results on a 64-core TILEPro64 processor.
Description
Memory management for many-core processors with software configurable locality policies
%0 Conference Paper
%1 Zhou:2012:MMM:2258996.2259000
%A Zhou, Jin
%A Demsky, Brian
%B Proceedings of the 2012 international symposium on Memory Management
%C New York, NY, USA
%D 2012
%I ACM
%K GC TILEPro64 Tilera
%P 3--14
%R 10.1145/2258996.2259000
%T Memory Management for Many-Core Processors with Software Configurable Locality Policies
%U http://doi.acm.org/10.1145/2258996.2259000
%X As processors evolve towards higher core counts, architects will develop more sophisticated memory systems to satisfy the cores' increasing thirst for memory bandwidth. Early many-core processor designs suggest that future memory systems will likely include multiple controllers and distributed cache coherence protocols. Many-core processors that expose memory locality policies to the software system provide opportunities for automatic tuning that can achieve significant performance benefits. Managed languages typically provide a simple heap abstraction. This paper presents techniques that bridge the gap between the simple heap abstraction of modern languages and the complicated memory systems of future processors. We present a NUMA-aware approach to garbage collection that balances the competing concerns of data locality and heap utilization to improve performance. We combine a lightweight approach for measuring an application's memory behavior with an online, adaptive algorithm for tuning the cache to optimize it for the specific application's behaviors. We have implemented our garbage collector and cache tuning algorithm and present results on a 64-core TILEPro64 processor.
%@ 978-1-4503-1350-6
@inproceedings{Zhou:2012:MMM:2258996.2259000,
abstract = {As processors evolve towards higher core counts, architects will develop more sophisticated memory systems to satisfy the cores' increasing thirst for memory bandwidth. Early many-core processor designs suggest that future memory systems will likely include multiple controllers and distributed cache coherence protocols. Many-core processors that expose memory locality policies to the software system provide opportunities for automatic tuning that can achieve significant performance benefits. Managed languages typically provide a simple heap abstraction. This paper presents techniques that bridge the gap between the simple heap abstraction of modern languages and the complicated memory systems of future processors. We present a NUMA-aware approach to garbage collection that balances the competing concerns of data locality and heap utilization to improve performance. We combine a lightweight approach for measuring an application's memory behavior with an online, adaptive algorithm for tuning the cache to optimize it for the specific application's behaviors. We have implemented our garbage collector and cache tuning algorithm and present results on a 64-core TILEPro64 processor.},
acmid = {2259000},
added-at = {2012-09-16T16:00:20.000+0200},
address = {New York, NY, USA},
author = {Zhou, Jin and Demsky, Brian},
biburl = {https://www.bibsonomy.org/bibtex/207a35244430c208eba3e283a5a846eb2/gron},
booktitle = {Proceedings of the 2012 international symposium on Memory Management},
description = {Memory management for many-core processors with software configurable locality policies},
doi = {10.1145/2258996.2259000},
interhash = {8dd72f8aea789497f85fa51b2fc59ff4},
intrahash = {07a35244430c208eba3e283a5a846eb2},
isbn = {978-1-4503-1350-6},
keywords = {GC TILEPro64 Tilera},
location = {Beijing, China},
numpages = {12},
pages = {3--14},
publisher = {ACM},
series = {ISMM '12},
timestamp = {2012-09-16T16:00:20.000+0200},
title = {Memory Management for Many-Core Processors with Software Configurable Locality Policies},
url = {http://doi.acm.org/10.1145/2258996.2259000},
year = 2012
}