I recently diagnosed the root cause of a concurrency bug, CR 6822370,
and thought it sufficiently interesting to share the details. (CR 6822370 actually represents a
cluster of bugs that are now thought to be related by a common underlying issue.)
Briefly, we have a lost wakeup bug in Parker::park(), the native, platform-specific C++
infrastructure code that implements java.util.concurrent.LockSupport.park().
The lost wakeup arises from a race; the race arises from architectural
reordering; and that reordering occurs because of missing memory barrier instructions.
The lost wakeup may manifest as various "hangs" or other forms of progress failure.
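To make the chain of causes concrete, here is a minimal C++ sketch of a single-permit parker with LockSupport-style park()/unpark() semantics. It is not HotSpot's Parker: the class `SimpleParker`, its `permit_` field, and the `handoff_completes()` helper are hypothetical, and the sketch uses standard C++ primitives. The lock-free fast path is an assumption chosen to show where a barrier matters. If a permit were consumed with a plain store of 0 and no barrier, hardware that permits store-load reordering (x86 and SPARC TSO both do) could let that store drain after the caller has already re-read its wait condition. The late store could then erase a permit that unpark() posted in the meantime, and the wakeup would be lost.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical single-permit parker with LockSupport-style park()/unpark()
// semantics. Illustrative only; this is not HotSpot's Parker implementation.
// Only the owning thread calls park(); any thread may call unpark().
class SimpleParker {
 public:
  // Consume the permit, blocking until one is available.
  void park() {
    // Fast path: consume an available permit without taking the lock.
    // exchange() is a read-modify-write with memory_order_seq_cst (the
    // default), so the write of 0 is ordered before the caller's later
    // seq_cst loads, such as ready.load() in handoff_completes() below.
    // A plain "if (permit > 0) permit = 0;" with no barrier could instead
    // leave the store of 0 in a store buffer while the caller re-reads its
    // condition, and the late store could then erase a permit from unpark().
    if (permit_.exchange(0) > 0) return;

    // Slow path: wait under the lock for unpark() to post a permit.
    std::unique_lock<std::mutex> lock(mu_);
    cv_.wait(lock, [this] { return permit_.load() > 0; });
    assert(permit_.load() == 1);  // unpark() only ever stores 1
    permit_.store(0);             // consume it while still holding the lock
  }

  // Make a permit available (permits never exceed one) and wake the parker.
  void unpark() {
    {
      std::lock_guard<std::mutex> lock(mu_);
      permit_.store(1);
    }
    cv_.notify_one();
  }

 private:
  std::atomic<int> permit_{0};
  std::mutex mu_;
  std::condition_variable cv_;
};

// Hypothetical usage helper: a producer publishes a flag, then unparks the
// consumer, which re-checks the flag after every return from park().
bool handoff_completes() {
  SimpleParker parker;
  std::atomic<bool> ready{false};
  std::thread producer([&] {
    ready.store(true);  // publish the condition first...
    parker.unpark();    // ...then post the permit
  });
  while (!ready.load()) parker.park();
  producer.join();
  return ready.load();
}
```

The consumer re-checks its condition in a loop rather than trusting a single return from park(), which matches how LockSupport.park() is meant to be used, since it is permitted to return spuriously. The mutex and condition variable make the blocking slow path safe; the sketch only hinges on the ordering of the lock-free fast path.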