¡Hola! I'm Shaizeen Aga.

I am a PhD student at the University of Michigan, Ann Arbor. I work with Prof. Satish Narayanasamy and am part of the Computer Engineering Lab. My undergraduate alma mater is the College of Engineering, Pune, India.

My current research focuses on near-data computing, where I design energy-efficient architectures that exploit this paradigm. I am also passionate about parallel computer architecture and parallel computing, and about making multi-core systems more accessible to programmers. My past projects in this area have focused on enabling more intuitive memory models and on building efficient runtimes for multi-core systems.


This work introduces a novel way to perform near-data computing by exposing processor caches as viable compute units. I harness bitline computing in SRAM to enable in-place computation in processor caches, and I demonstrate how computation in caches unlocks massive data parallelism and reduces data movement costs. Using this technique, I demonstrate considerable performance and energy savings for a suite of data-intensive workloads.
This project was part of my internship with the High Performance Computing group at Pacific Northwest National Labs, Richland, WA, where I was mentored by Sriram Krishnamoorthy.
I worked on improving the efficiency of the Cilk multi-core runtime system. The Cilk programming language makes it easy for programmers to express parallelism, but for certain classes of algorithms, synchronization in Cilk programs tends to be over-constrained, leading to poor performance. By employing optimistic concurrency, I improved the performance of the Cilk multithreaded runtime system by up to 1.9X.
A significant fraction of fence overhead is caused by stores that are waiting for data from memory. In this project, by introducing the capability to grant coherence permission for a store much earlier than its data is serviced from memory, I could complete stores, and therefore fences, faster. I showed that this simple optimization eliminates fence overhead in a majority of scenarios and helps bridge the performance gap between the SC and TSO memory models at a low design cost.
This was my course project for Parallel Computer Architecture (EECS 570) at the University of Michigan, Ann Arbor. Using dynamic classification of cache blocks, we relaxed memory consistency model constraints to improve the performance of sequentially consistent hardware.
This project earned the top grade in the Winter 2012 class of EECS 570!
Project Report
This was my course project for Computer Architecture (EECS 470) at the University of Michigan, Ann Arbor.
I implemented the memory interface of the core (load queue, store queue, post-retirement store buffer) and a host of other components, such as the reorder buffer and the instruction buffer. I also designed and implemented an adaptive instruction prefetcher, which gave us significant performance benefits.
This processor design earned the top grade in the Fall 2011 class of EECS 470!
Project Report
Here I worked on CUDA, NVIDIA’s parallel computing platform, and ported a true motion estimation algorithm to it. The challenges were understanding true motion estimation and the CUDA architecture, and surveying the literature to pick an algorithm that could be ported efficiently to the CUDA platform.

Conference Papers

Compute Caches

Shaizeen Aga, Supreet Jeloka, Arun Subramaniyan, Satish Narayanasamy, David Blaauw, and Reetuparna Das.
To appear in the 23rd IEEE Symposium on High Performance Computer Architecture (HPCA'17), February 2017. [link]

Efficiently Enforcing Strong Memory Ordering in GPUs

Abhayendra Singh, Shaizeen Aga, Satish Narayanasamy.
In the 48th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'15), December 2015. [link]

CilkSpec: optimistic concurrency for Cilk

Shaizeen Aga, Sriram Krishnamoorthy, Satish Narayanasamy.
In the International Conference for High Performance Computing, Networking, Storage and Analysis (SC'15), Austin, TX, November 2015. [link]

zFence: Data-less Coherence for Efficient Fences

Shaizeen Aga, Abhayendra Singh, Satish Narayanasamy.
In the 29th International Conference on Supercomputing (ICS'15), June 2015. [link]


Ordering constraint management within coherent memory systems

Shaizeen Aga, Abhayendra Singh, Satish Narayanasamy.
US Patent 9367461, 2014 [link]

Method for exploiting parallelism in task-based systems using an iteration space splitter

Behnam Robatmili, Shaizeen Aga, Dario Suarez Gracia, Arun Raman, Aravind Natarajan, Gheorghe Calin Cascaval, Pablo Montesinos Ortego, Han Zhao.
US Patent 9501328, 2016 [link]

Contact Details


shaizeen [at] umich [dot] edu


4844 BBB, 2260 Hayward,
University of Michigan, Ann Arbor,
MI, USA 48109-2121.

Find me on

Google Scholar