Hi, I'm Chingyi Lin. I'm a first-year Ph.D. student in the System Design Lab, advised by Radu Marculescu. I am from Taiwan and am now in the Electrical and Computer Engineering Department at Carnegie Mellon University.
Most cache coherence protocols track only global state. With a G-flag, which indicates the reduced coherence domain, and a modified hierarchical interconnect, we reduce the scalability cost of cache miss latency by 22% in PARSEC-3.0's blackscholes program. This group-aware technique alleviates the effect of coherence communication overhead by describing the sharing status in more detail.
Paper link: Download here
Course presentation: slides
Triple-DES Optimization on a CPU
Course: 18-645 How to Write Fast Code
This project optimizes a decryption algorithm (triple DES) for the parameters of our target processor, Intel Broadwell, including cache access time, cache size, and the number of functional units. With table merging and operation parallelization, this work improves execution time by 10%. (It is still a single-threaded program.)
Paper link: Here
Course presentation: 6-page slides
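For readers unfamiliar with table merging, here is a minimal sketch of the general idea in Python. It is not the code from this project: the 4-bit substitution table, the bit permutation, and all function names are toy assumptions chosen for illustration, and they are not the real DES tables. The point is only that a substitution followed by a fixed bit permutation can be precomputed into a single table, so each input needs one lookup instead of a lookup plus bit shuffling.

```python
# Toy illustration of table merging. SBOX and PERM are made-up examples,
# NOT the real DES S-boxes or P permutation.

# Hypothetical 4-bit substitution table (a bijection, since gcd(7, 16) == 1).
SBOX = [(7 * x + 3) % 16 for x in range(16)]

# Hypothetical bit permutation: output bit i is taken from input bit PERM[i].
PERM = [2, 0, 3, 1]


def permute(value: int) -> int:
    """Rearrange the 4 bits of `value` according to PERM."""
    out = 0
    for out_bit, src_bit in enumerate(PERM):
        out |= ((value >> src_bit) & 1) << out_bit
    return out


def two_step(value: int) -> int:
    """Unmerged path: table lookup followed by bit-by-bit permutation."""
    return permute(SBOX[value])


# Merged table: the permutation is applied once, ahead of time, to every entry.
MERGED = [permute(SBOX[x]) for x in range(16)]


def merged_lookup(value: int) -> int:
    """Merged path: a single table lookup gives the same result."""
    return MERGED[value]


if __name__ == "__main__":
    for x in range(16):
        assert two_step(x) == merged_lookup(x)
    print("merged table matches the two-step computation for all 16 inputs")
```

The trade-off is memory for time: the merged table must still fit comfortably in cache, which is why cache size and access time matter when choosing how many tables to merge.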