Algorithms for Scalable Synchronization on Shared-Memory Multiprocessors

J. Mellor-Crummey and M. Scott. ACM Trans. Comput. Syst., 9 (1): 21--65 (1991)
DOI: http://doi.acm.org/10.1145/103727.103729

Abstract

Busy-wait techniques are heavily used for mutual exclusion and barrier synchronization in shared-memory parallel programs. Unfortunately, typical implementations of busy-waiting tend to produce large amounts of memory and interconnect contention, introducing performance bottlenecks that become markedly more pronounced as applications scale. We argue that this problem is not fundamental, and that one can in fact construct busy-wait synchronization algorithms that induce no memory or interconnect contention. The key to these algorithms is for every processor to spin on separate locally-accessible flag variables, and for some other processor to terminate the spin with a single remote write operation at an appropriate time. Flag variables may be locally-accessible as a result of coherent caching, or by virtue of allocation in the local portion of physically distributed shared memory. We present a new scalable algorithm for spin locks that generates O(1) remote references per lock acquisition, independent of the number of processors attempting to acquire the lock. Our algorithm provides reasonable latency in the absence of contention, requires only a constant amount of space per lock, and requires no hardware support other than a swap-with-memory instruction. We also present a new scalable barrier algorithm that generates O(1) remote references per processor reaching the barrier, and observe that two previously-known barriers can likewise be cast in a form that spins only on locally-accessible flag variables. None of these barrier algorithms requires hardware support beyond the usual atomicity of memory reads and writes. We compare the performance of our scalable algorithms with other software approaches to busy-wait synchronization on both a Sequent Symmetry and a BBN Butterfly.
Our principal conclusion is that contention due to synchronization need not be a problem in large-scale shared-memory multiprocessors. The existence of scalable algorithms greatly weakens the case for costly special-purpose hardware support for synchronization, and provides a case against so-called “dance hall” architectures, in which shared memory locations are equally far from all processors. (From the Authors' Abstract)
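
The spin lock summarized above is commonly called the MCS lock. What follows is a minimal C sketch of the idea. It is an illustration, not the authors' code, and it rests on several assumptions:

- It uses C11 atomics and POSIX threads as stand-ins for the processors and the swap-with-memory instruction, with atomic_exchange playing the role of swap.
- The names mcs_node, mcs_lock, mcs_acquire, mcs_release, mcs_worker and mcs_demo are hypothetical.

Each waiting thread appends its own queue node to the lock's tail pointer with one swap. It then spins only on the locked flag inside that node. The releasing thread ends the spin with a single write to its successor's flag, so the lock itself occupies one pointer.

The release path here uses compare-and-swap to detect an empty queue, as the commonly reproduced form of the algorithm does. A release using swap alone is possible but more intricate, and it is not shown.

The sched_yield() calls in the spin loops are a demo convenience. They let the test finish on machines with fewer cores than threads.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* One queue node per waiting thread; each thread spins only on its own node. */
typedef struct mcs_node {
    _Atomic(struct mcs_node *) next; /* successor in the queue, written by the successor */
    atomic_bool locked;              /* true while this thread must keep waiting */
} mcs_node;

/* The lock itself is a single pointer to the queue tail (constant space per lock). */
typedef struct {
    _Atomic(mcs_node *) tail;
} mcs_lock;

static void mcs_init(mcs_lock *l) { atomic_init(&l->tail, NULL); }

static void mcs_acquire(mcs_lock *l, mcs_node *me) {
    atomic_store_explicit(&me->next, NULL, memory_order_relaxed);
    atomic_store_explicit(&me->locked, true, memory_order_relaxed);
    /* Swap-with-memory: append this node at the tail and learn the predecessor. */
    mcs_node *pred = atomic_exchange_explicit(&l->tail, me, memory_order_acq_rel);
    if (pred == NULL)
        return; /* queue was empty: lock acquired without waiting */
    /* Link ourselves behind the predecessor with one remote write... */
    atomic_store_explicit(&pred->next, me, memory_order_release);
    /* ...then spin on our own flag until the predecessor hands the lock over. */
    while (atomic_load_explicit(&me->locked, memory_order_acquire))
        sched_yield(); /* demo convenience, not part of the algorithm */
}

static void mcs_release(mcs_lock *l, mcs_node *me) {
    mcs_node *succ = atomic_load_explicit(&me->next, memory_order_acquire);
    if (succ == NULL) {
        /* No visible successor: try to reset the queue to empty. */
        mcs_node *expected = me;
        if (atomic_compare_exchange_strong_explicit(&l->tail, &expected, NULL,
                                                    memory_order_acq_rel,
                                                    memory_order_acquire))
            return;
        /* A successor swapped itself in but has not linked yet; wait for it. */
        while ((succ = atomic_load_explicit(&me->next, memory_order_acquire)) == NULL)
            sched_yield();
    }
    /* A single remote write ends the successor's spin. */
    atomic_store_explicit(&succ->locked, false, memory_order_release);
}

/* Hypothetical demo harness: threads increment a shared counter under the lock. */
typedef struct { mcs_lock *lock; long *counter; int iters; } mcs_arg;

static void *mcs_worker(void *p) {
    mcs_arg *a = p;
    mcs_node node; /* this thread's queue node, reused for every acquisition */
    atomic_init(&node.next, NULL);
    atomic_init(&node.locked, false);
    for (int i = 0; i < a->iters; i++) {
        mcs_acquire(a->lock, &node);
        ++*a->counter; /* critical section */
        mcs_release(a->lock, &node);
    }
    return NULL;
}

/* Returns the final counter value: nthreads * iters if mutual exclusion holds. */
static long mcs_demo(int nthreads, int iters) {
    mcs_lock lock;
    long counter = 0;
    pthread_t tids[64];
    assert(nthreads > 0 && nthreads <= 64);
    mcs_init(&lock);
    mcs_arg arg = { &lock, &counter, iters };
    for (int t = 0; t < nthreads; t++)
        pthread_create(&tids[t], NULL, mcs_worker, &arg);
    for (int t = 0; t < nthreads; t++)
        pthread_join(tids[t], NULL);
    return counter;
}
```

For example, mcs_demo(4, 5000) has four threads each increment a shared counter 5000 times under the lock. It should return 20000.

Waiters are served in the order their swaps reached the tail pointer, so this sketch grants the lock first-come, first-served. Each acquisition touches that shared word once, rather than polling it.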
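
The abstract claims that threads can wait for one another at a barrier while spinning only on locally-accessible flags, using nothing stronger than atomic reads and writes. A tree barrier can illustrate this. The sketch below reports arrival up a 4-ary tree and propagates wakeup down a binary tree, with sense reversal.

We believe this matches the shape of the paper's new barrier, but several parts are our assumptions:

- the node layout;
- the names tree_node, tree_barrier_init, tree_barrier_wait and tree_barrier_demo;
- the C11/POSIX setting.

Every thread spins only on fields of its own node. The barrier itself uses only atomic loads and stores; the demo harness uses fetch-and-add purely for bookkeeping. Per episode, each thread issues one remote write to report arrival and at most two to wake its children. This is the constant number of remote references the abstract describes.

The sched_yield() calls in the spin loops are a demo convenience. They let the test finish on machines with fewer cores than threads.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Per-thread node: 4-ary arrival tree plus binary wakeup tree (assumed layout). */
typedef struct {
    atomic_bool parentsense;        /* written by my wakeup-tree parent */
    atomic_bool *parentpointer;     /* my slot in my arrival-tree parent's childnotready */
    atomic_bool *childpointers[2];  /* parentsense fields of my wakeup-tree children */
    bool havechild[4];              /* which arrival-tree children exist */
    atomic_bool childnotready[4];   /* cleared by my arrival-tree children */
    atomic_bool dummy;              /* sink for pointers that have no real target */
} tree_node;

typedef struct { int nprocs; tree_node *nodes; } tree_barrier;

static void tree_barrier_init(tree_barrier *b, int p) {
    b->nprocs = p;
    b->nodes = calloc((size_t)p, sizeof(tree_node));
    assert(b->nodes != NULL);
    for (int i = 0; i < p; i++) {
        tree_node *n = &b->nodes[i];
        atomic_init(&n->parentsense, false);
        atomic_init(&n->dummy, false);
        for (int j = 0; j < 4; j++) {
            n->havechild[j] = (4 * i + j + 1 < p);
            atomic_init(&n->childnotready[j], n->havechild[j]);
        }
        n->parentpointer = (i == 0) ? &n->dummy
                                    : &b->nodes[(i - 1) / 4].childnotready[(i - 1) % 4];
        for (int k = 0; k < 2; k++) {
            int c = 2 * i + k + 1;
            n->childpointers[k] = (c < p) ? &b->nodes[c].parentsense : &n->dummy;
        }
    }
}

/* 'sense' is private to the calling thread and must start out true. */
static void tree_barrier_wait(tree_barrier *b, int vpid, bool *sense) {
    tree_node *n = &b->nodes[vpid];
    for (int j = 0; j < 4; j++) /* spin locally until arrival-tree children check in */
        while (atomic_load_explicit(&n->childnotready[j], memory_order_acquire))
            sched_yield();      /* demo convenience, not part of the algorithm */
    for (int j = 0; j < 4; j++) /* re-arm for the next barrier episode */
        atomic_store_explicit(&n->childnotready[j], n->havechild[j], memory_order_relaxed);
    atomic_store_explicit(n->parentpointer, false, memory_order_release); /* report arrival */
    if (vpid != 0)              /* non-root: spin locally until woken */
        while (atomic_load_explicit(&n->parentsense, memory_order_acquire) != *sense)
            sched_yield();
    atomic_store_explicit(n->childpointers[0], *sense, memory_order_release); /* wake */
    atomic_store_explicit(n->childpointers[1], *sense, memory_order_release);
    *sense = !*sense;           /* sense reversal: next episode waits for the other value */
}

/* Hypothetical harness: each thread counts itself into round r, then checks
   after the barrier that every thread has done so. */
typedef struct {
    tree_barrier *b; int vpid, rounds; atomic_int *arrived, *errors;
} barrier_arg;

static void *barrier_worker(void *p) {
    barrier_arg *a = p;
    bool sense = true;
    for (int r = 0; r < a->rounds; r++) {
        atomic_fetch_add(&a->arrived[r], 1);
        tree_barrier_wait(a->b, a->vpid, &sense);
        if (atomic_load(&a->arrived[r]) != a->b->nprocs)
            atomic_fetch_add(a->errors, 1); /* a thread left the barrier too early */
    }
    return NULL;
}

/* Returns how many times a thread saw an incomplete round (0 if the barrier works). */
static int tree_barrier_demo(int nthreads, int rounds) {
    tree_barrier b;
    atomic_int errors;
    atomic_int *arrived = malloc((size_t)rounds * sizeof *arrived);
    pthread_t *tids = malloc((size_t)nthreads * sizeof *tids);
    barrier_arg *args = malloc((size_t)nthreads * sizeof *args);
    assert(arrived != NULL && tids != NULL && args != NULL);
    tree_barrier_init(&b, nthreads);
    atomic_init(&errors, 0);
    for (int r = 0; r < rounds; r++)
        atomic_init(&arrived[r], 0);
    for (int t = 0; t < nthreads; t++) {
        args[t] = (barrier_arg){ &b, t, rounds, arrived, &errors };
        pthread_create(&tids[t], NULL, barrier_worker, &args[t]);
    }
    for (int t = 0; t < nthreads; t++)
        pthread_join(tids[t], NULL);
    free(args); free(tids); free(arrived); free(b.nodes);
    return atomic_load(&errors);
}
```

For example, tree_barrier_demo(7, 200) runs seven threads through 200 consecutive barrier episodes. It returns the number of times a thread left a barrier before all threads had arrived, so it should return 0.

Sense reversal lets the parentsense flags be reused from one episode to the next without ever being cleared.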


Cite this publication

%0 Journal Article
%1 103729
%A Mellor-Crummey, John M.
%A Scott, Michael L.
%C New York, NY, USA
%D 1991
%I ACM
%J ACM Trans. Comput. Syst.
%K Algorithms Barrier Benchmarks Evaluation ManyCore MultiCore Survey Synchronization comparison
%N 1
%P 21--65
%R http://doi.acm.org/10.1145/103727.103729
%T Algorithms for Scalable Synchronization on Shared-Memory Multiprocessors
%U http://portal.acm.org/citation.cfm?id=103727.103729
%V 9
%X Busy-wait techniques are heavily used for mutual exclusion and barrier synchronization in shared-memory parallel programs. Unfortunately, typical implementations of busy-waiting tend to produce large amounts of memory and interconnect contention, introducing performance bottlenecks that become markedly more pronounced as applications scale. We argue that this problem is not fundamental, and that one can in fact construct busy-wait synchronization algorithms that induce no memory or interconnect contention. The key to these algorithms is for every processor to spin on separate locally-accessible flag variables, and for some other processor to terminate the spin with a single remote write operation at an appropriate time. Flag variables may be locally-accessible as a result of coherent caching, or by virtue of allocation in the local portion of physically distributed shared memory. We present a new scalable algorithm for spin locks that generates O(1) remote references per lock acquisition, independent of the number of processors attempting to acquire the lock. Our algorithm provides reasonable latency in the absence of contention, requires only a constant amount of space per lock, and requires no hardware support other than a swap-with-memory instruction. We also present a new scalable barrier algorithm that generates O(1) remote references per processor reaching the barrier, and observe that two previously-known barriers can likewise be cast in a form that spins only on locally-accessible flag variables. None of these barrier algorithms requires hardware support beyond the usual atomicity of memory reads and writes. We compare the performance of our scalable algorithms with other software approaches to busy-wait synchronization on both a Sequent Symmetry and a BBN Butterfly. Our principal conclusion is that contention due to synchronization need not be a problem in large-scale shared-memory multiprocessors. The existence of scalable algorithms greatly weakens the case for costly special-purpose hardware support for synchronization, and provides a case against so-called “dance hall” architectures, in which shared memory locations are equally far from all processors. (From the Authors' Abstract)

@article{103729,
  abstract = {Busy-wait techniques are heavily used for mutual exclusion and barrier synchronization in shared-memory parallel programs. Unfortunately, typical implementations of busy-waiting tend to produce large amounts of memory and interconnect contention, introducing performance bottlenecks that become markedly more pronounced as applications scale. We argue that this problem is not fundamental, and that one can in fact construct busy-wait synchronization algorithms that induce no memory or interconnect contention. The key to these algorithms is for every processor to spin on separate \emph{locally-accessible} flag variables, and for some other processor to terminate the spin with a single remote write operation at an appropriate time. Flag variables may be locally-accessible as a result of coherent caching, or by virtue of allocation in the local portion of physically distributed shared memory. We present a new scalable algorithm for spin locks that generates $O(1)$ remote references per lock acquisition, independent of the number of processors attempting to acquire the lock. Our algorithm provides reasonable latency in the absence of contention, requires only a constant amount of space per lock, and requires no hardware support other than a swap-with-memory instruction. We also present a new scalable barrier algorithm that generates $O(1)$ remote references per processor reaching the barrier, and observe that two previously-known barriers can likewise be cast in a form that spins only on locally-accessible flag variables. None of these barrier algorithms requires hardware support beyond the usual atomicity of memory reads and writes. We compare the performance of our scalable algorithms with other software approaches to busy-wait synchronization on both a Sequent Symmetry and a BBN Butterfly. Our principal conclusion is that \emph{contention due to synchronization need not be a problem in large-scale shared-memory multiprocessors.} The existence of scalable algorithms greatly weakens the case for costly special-purpose hardware support for synchronization, and provides a case against so-called ``dance hall'' architectures, in which shared memory locations are equally far from all processors. (From the Authors' Abstract)},
  added-at = {2010-01-15T11:13:47.000+0100},
  address = {New York, NY, USA},
  author = {Mellor-Crummey, John M. and Scott, Michael L.},
  biburl = {https://www.bibsonomy.org/bibtex/2cbf711851a668f0a995e2ce052ba94b0/gron},
  description = {Algorithms for scalable synchronization on shared-memory multiprocessors},
  doi = {http://doi.acm.org/10.1145/103727.103729},
  interhash = {55dc7fe12cef8fb54b35457545562362},
  intrahash = {cbf711851a668f0a995e2ce052ba94b0},
  issn = {0734-2071},
  journal = {ACM Trans. Comput. Syst.},
  keywords = {Algorithms Barrier Benchmarks Evaluation ManyCore MultiCore Survey Synchronization comparison},
  number = 1,
  pages = {21--65},
  publisher = {ACM},
  timestamp = {2010-01-15T11:13:47.000+0100},
  title = {Algorithms for Scalable Synchronization on Shared-Memory Multiprocessors},
  url = {http://portal.acm.org/citation.cfm?id=103727.103729},
  volume = 9,
  year = 1991
}

BibSonomy
