Toward Operating System Support for Scalable Multithreaded Message Passing. EuroMPI, pages 1:1-1:10. ACM (2015)
Revisiting rendezvous protocols in the context of RDMA-capable host channel adapters and many-core processors. EuroMPI, pages 85-90. ACM (2013)
Dynamic Adaptable Asynchronous Progress Model for MPI RMA Multiphase Applications. IEEE Trans. Parallel Distributed Syst., 29 (9): 1975-1989 (2018)
Exploiting Hidden Non-uniformity of Uniform Memory Access on Manycore CPUs. Euro-Par Workshops (2), volume 8806 of Lecture Notes in Computer Science, pages 242-253. Springer (2014)
Inter-reference gap distribution replacement: an improved replacement algorithm for set-associative caches. ICS, pages 20-30. ACM (2004)
Interface for heterogeneous kernels: A framework to enable hybrid OS designs targeting high performance computing on manycore architectures. HiPC, pages 1-10. IEEE Computer Society (2014)
Direct MPI Library for Intel Xeon Phi Co-Processors. IPDPS Workshops, pages 816-824. IEEE (2013)
On the Scalability, Performance Isolation and Device Driver Transparency of the IHK/McKernel Hybrid Lightweight Kernel. IPDPS, pages 1041-1050. IEEE Computer Society (2016)
Casper: An Asynchronous Progress Model for MPI RMA on Many-Core Architectures. IPDPS, pages 665-676. IEEE Computer Society (2015)
Pinot: Speculative Multi-threading Processor Architecture Exploiting Parallelism over a Wide Range of Granularities. MICRO, pages 81-92. IEEE Computer Society (2005)