I recently diagnosed the root cause of a concurrency bug, CR 6822370,
and thought it sufficiently interesting to share the details. (CR 6822370 actually represents a
cluster of bugs that are now thought to be related by a common underlying issue.)
Briefly, we have a lost wakeup bug in the native C++ Parker::park() platform-specific
infrastructure code that implements java.util.concurrent.LockSupport.park().
The lost wakeup arises from a race, the race arises from architectural
reordering, and the reordering occurs because of missing memory barrier instructions.
The lost wakeup may manifest as various 'hangs' or instances of progress failure.
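
To make the shape of the problem concrete, here is a minimal sketch of a park/unpark pair built around a one-shot permit. It is not the HotSpot code: the class name SketchParker, the field permit_, and the helper run_handoff_demo are all hypothetical, chosen only for illustration. The sketch uses sequentially consistent std::atomic operations, which supply the kind of ordering whose absence leads to a lost wakeup. If the permit were a plain int with no barrier around the lock-free check, the hardware would be free to reorder the accesses, and a parker could block after the wakeup intended for it had already been issued.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical, simplified parker. Names are illustrative, not HotSpot's.
class SketchParker {
public:
    // Block until a permit is available, then consume it.
    void park() {
        // Fast path: consume a pending permit without taking the lock.
        // exchange() is a sequentially consistent read-modify-write, so it
        // cannot be reordered with the surrounding memory accesses.
        if (permit_.exchange(0) > 0) return;

        std::unique_lock<std::mutex> lock(mutex_);
        // Re-check under the lock; the predicate also absorbs spurious wakeups.
        cond_.wait(lock, [this] { return permit_.load() > 0; });
        permit_.store(0);  // consume the permit before returning
    }

    // Make a permit available and wake the parked thread, if any.
    void unpark() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            permit_.store(1);
        }
        cond_.notify_one();
    }

private:
    std::atomic<int> permit_{0};
    std::mutex mutex_;
    std::condition_variable cond_;
};

// Ping-pong between two threads. Every unpark() must be observed by the
// matching park(), so a single lost wakeup would hang this function.
int run_handoff_demo(int rounds) {
    assert(rounds >= 0);
    SketchParker ping, pong;
    int handoffs = 0;  // written only by the worker, read after join()
    std::thread worker([&] {
        for (int i = 0; i < rounds; ++i) {
            pong.park();
            ++handoffs;
            ping.unpark();
        }
    });
    for (int i = 0; i < rounds; ++i) {
        pong.unpark();
        ping.park();
    }
    worker.join();
    return handoffs;
}
```

Calling run_handoff_demo(1000) should return 1000. With a broken parker the same call would be expected to stall intermittently rather than return a wrong count, which matches the 'hang' symptom described above.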