Siva Kumar Sastry Hari

Sr. Research Scientist
Architecture Research Group
NVIDIA
Email: shari [at] nvidia [dot] com
Google Scholar page
My NVIDIA page

Siva Hari is a Senior Research Scientist in the Computer Architecture Research Group at NVIDIA. His research interests are in the fields of computer architecture, compilers, GPUs, and reliable systems. His current research focuses on making GPU and accelerator-based systems resilient through software- and architecture-level solutions. He obtained his Ph.D. and M.S. from the Computer Science Department at the University of Illinois at Urbana-Champaign and his Bachelor's degree from the Computer Science and Engineering Department at IIT Madras.

He received the 2014 David J. Kuck Outstanding Ph.D. Thesis Award from the Computer Science Department at the University of Illinois. He also received the W.J. Poppelbaum Memorial Award from the Computer Science Department at the University of Illinois at Urbana-Champaign in 2012 for academic merit and creativity in computer hardware or architecture. One of the papers he co-authored was selected for IEEE Micro's Top Picks 2013, and another was the Best Paper Award Runner-up at DSN 2018. He has interned at Intel and Sun Microsystems.




Publications

Talks/Posters

Theses

Software


Publications:

  1. GPU Snapshot: Checkpoint Offloading for GPU-Dense Systems
    K. Lee, M. B. Sullivan, S. K. S. Hari, T. Tsai, S. W. Keckler, M. Erez
    ICS'19: International Conference on Supercomputing, 2019


  2. ML-based Fault Injection for Autonomous Vehicles: A Case for Bayesian Fault Injection
    S. Jha, S. S. Banerjee, T. Tsai, S. K. S. Hari, M. B. Sullivan, Z. T. Kalbarczyk, S. W. Keckler, R. K. Iyer
    DSN'19: IEEE/IFIP International Conference on Dependable Systems and Networks, 2019


  3. Towards analytically evaluating the error resilience of GPU Programs
    A. R. Anwer, G. Li, K. Pattabiraman, S. K. S. Hari, M. B. Sullivan, T. Tsai
    SELSE'19: IEEE Workshop on Silicon Errors in Logic - System Effects, 2019


  4. On the Trend of Resilience for GPU-Dense Systems
    K. Lee, M. B. Sullivan, S. K. S. Hari, T. Tsai, S. W. Keckler, M. Erez
    SELSE'19: IEEE Workshop on Silicon Errors in Logic - System Effects, 2019


  5. Optimizing Software-Directed Instruction Replication for GPU Error Detection
    A. Mahmoud, S. K. S. Hari, M. Sullivan, T. Tsai, and S. Keckler
    SC'18: International Conference for High-Performance Computing, Networking, Storage and Analysis, 2018 (acceptance rate: ~19%)

  6. Kayotee: A Fault Injection-based System to Assess the Safety and Reliability of Autonomous Vehicles to Faults and Errors
    S. Jha, T. Tsai, S. K. S. Hari, M. Sullivan, Z. Kalbarczyk, S. W. Keckler, and R. Iyer
    ART'18: IEEE International Workshop on Automotive Reliability & Test, 2018

  7. SwapCodes: Error Codes for Hardware-Software Cooperative GPU Pipeline Error Detection
    M. Sullivan, S. K. S. Hari, B. Zimmer, T. Tsai, and S. Keckler
    MICRO'18: IEEE/ACM International Symposium on Microarchitecture, 2018 (acceptance rate: ~21%)

  8. Modeling Soft Error Propagation in Programs
    G. Li, K. Pattabiraman, S. K. S. Hari, M. Sullivan, and T. Tsai
    DSN'18: IEEE/IFIP International Conference on Dependable Systems and Networks, 2018 (acceptance rate: ~25%)
    Best Paper Award Runner-Up

  9. Understanding Error Propagation in Deep-Learning Neural Networks (DNN) Accelerators and Applications
    G. Li, S. K. S. Hari, M. Sullivan, T. Tsai, K. Pattabiraman, J. Emer, S. Keckler
    SC'17: International Conference for High-Performance Computing, Networking, Storage and Analysis, 2017 (acceptance rate: ~19%)

  10. SASSIFI: An Architecture-level Fault Injection Tool for GPU Application Resilience Evaluation
    S. K. S. Hari, T. Tsai, M. Stephenson, S. Keckler, J. Emer
    ISPASS'17: IEEE International Symposium on Performance Analysis of Systems and Software, 2017 (acceptance rate: ~30%)

  11. Approxilyzer: Towards A Systematic Framework for Instruction-Level Approximate Computing and its Application to Hardware Resiliency
    R. Venkatagiri, A. Mahmoud, S. K. S. Hari, S. Adve
    MICRO'16: IEEE/ACM International Symposium on Microarchitecture, 2016 (acceptance rate: ~21%)

  12. An Analytical Model for Hardened Latch Selection and Exploration
    M. Sullivan, B. Zimmer, S. K. S. Hari, T. Tsai, S. Keckler
    SELSE'16: IEEE Workshop on Silicon Errors in Logic - System Effects, 2016

  13. Flexible Software Profiling of GPU Architectures
    M. Stephenson, S. K. S. Hari, Y. Lee, E. Ebrahimi, D. Johnson, D. Nellans, M. O’Connor, S. W. Keckler
    ISCA'15: International Symposium on Computer Architecture, 2015 (acceptance rate: ~19%)

  14. Locality-Driven Dynamic GPU Cache Bypassing
    C. Li, S. L. Song, H. Dai, A. Sidelnik, S. K. S. Hari, and H. Zhou
    ICS'15: International Conference on Supercomputing, 2015 (acceptance rate: ~25%)

  15. SASSIFI: Evaluating Resilience of GPU Applications
    S. K. S. Hari, T. Tsai, M. Stephenson, S. W. Keckler, and J. Emer
    SELSE'15: IEEE Workshop on Silicon Errors in Logic - System Effects, 2015

  16. Hardware Fault Recovery for I/O Intensive Applications
    P. Ramachandran, S. K. S. Hari, M. Li, and S. V. Adve
    TACO'14: Transactions on Architecture and Code Optimization, 2014

  17. GangES: Gang Error Simulation for Hardware Resiliency Evaluation
    S. K. S. Hari, R. Venkatagiri, S. V. Adve, and H. Naeimi
    ISCA'14: International Symposium on Computer Architecture, 2014

  18. Measuring the Radiation Reliability of SRAM Structures in GPUs Designed for HPC
    P. Rech, L. Carro, N. Wang, T. Tsai, S. K. S. Hari, and S. W. Keckler
    SELSE'14: IEEE Workshop on Silicon Errors in Logic - System Effects, 2014

  19. Relyzer: Application Resiliency Analyzer for Transient Faults
    S. K. S. Hari, S. V. Adve, H. Naeimi, and P. Ramachandran
    TopPicks'13: IEEE Micro, special issue on the Top Picks from the 2012 Computer Architecture Conferences, May - June 2013

  20. Low-cost Program-level Detectors for Reducing Silent Data Corruptions
    S. K. S. Hari, S. V. Adve, and H. Naeimi
    DSN'12: Proceedings of the IEEE/IFIP International Conference on Dependable Systems and Networks, 2012 (acceptance rate: ~17%)

  21. Relyzer: Exploiting Application-level Fault Equivalence to Analyze Application Resiliency to Transient Faults
    S. K. S. Hari, S. V. Adve, H. Naeimi, and P. Ramachandran
    ASPLOS'12: Proceedings of the Seventeenth International Conference on Architectural Support for Programming Languages and Operating Systems, 2012 (acceptance rate: ~21%)

  22. CrashTest'ing SWAT: Accurate, Gate-Level Evaluation of Symptom-Based Resiliency Solutions
    A. Pellegrini, R. Smolinski, L. Chen, X. Fu, S. K. S. Hari, J. Jiang, S. V. Adve, T. Austin, V. Bertacco
    DATE'12: Proceedings of the Design, Automation and Test in Europe, 2012

  23. Relyzer: Application Resiliency Analyzer for Transient Faults
    S. K. S. Hari, H. Naeimi, P. Ramachandran, S. V. Adve
    SELSE'11: Proceedings of the IEEE Workshop on Silicon Errors in Logic - System Effects, 2011

  24. Understanding When Symptom Detectors Work by Studying Data-Only Application Values
    P. Ramachandran, S. K. S. Hari, S. V. Adve, H. Naeimi
    SELSE'11: Proceedings of the IEEE Workshop on Silicon Errors in Logic - System Effects, 2011

  25. CrashTest'ing SWAT: Accurate, Gate-Level Evaluation of Symptom-Based Resiliency Solutions
    A. Pellegrini, R. Smolinski, X. Fu, L. Chen, S. K. S. Hari, J. Jiang, S. V. Adve, T. Austin, V. Bertacco
    SELSE'11: Proceedings of the IEEE Workshop on Silicon Errors in Logic - System Effects, 2011

  26. Architectures for Online Error Detection and Recovery in Multicore Processors
    D. Gizopoulos, M. Psarakis, S. V. Adve, P. Ramachandran, S. K. S. Hari, D. Sorin, A. Meixner, A. Biswas, X. Vera
    DATE'11: Proceedings of the Design, Automation and Test in Europe, 2011 (acceptance rate: ~25%)

  27. mSWAT: Low-Cost Hardware Fault Detection and Diagnosis for Multicore Systems
    S. K. S. Hari, M. Li, P. Ramachandran, B. Choi, S. V. Adve
    MICRO'09: Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture, 2009 (acceptance rate: ~24%)

  28. Accurate Microarchitecture-level Fault Modeling for Studying Wear-out Faults
    M. Li, P. Ramachandran, U. Karpuzcu, S. K. S. Hari, S. V. Adve
    HPCA'09: Proceedings of the International Conference on High-Performance Computer Architecture, 2009 (acceptance rate: ~19%)

  29. Automatic Constraint Based Test Generation for Behavioral HDL Models
    S. K. S. Hari, V. V. Konda, V. Kamakoti, V. Vedula, K. S. Maneperambil
    TVLSI'08: IEEE Transactions on VLSI Systems in the special section on Design Verification and Validation: Theory and Techniques, 2008

  30. Power Virus Generation Using Behavioural Models of Circuits
    K. Najeeb, V. V. Konda, S. K. S. Hari, V. Kamakoti, V. Vedula
    VTS'07: Proceedings of the 25th IEEE VLSI Test Symposium, 2007 (acceptance rate: ~35%)

  31. Constructing Online Testable Circuits using Reversible Logic
    N. Mahammad, S. K. S. Hari, S. Shroff, V. Kamakoti
    VDAT'06: Proceedings of the 10th IEEE VLSI Design and Test Symposium, 2006

  32. Efficient Building Blocks for Reversible Sequential Circuit Design
    S. K. S. Hari, S. Shroff, N. Mahammad, V. Kamakoti
    MWSCAS'06: IEEE International Midwest Symposium on Circuits and Systems, 2006


Conference/Workshop Talks/Posters:

  1. SASSIFI: An Architecture-level Fault Injection Tool for GPU Application Resilience Evaluation, at the International Symposium on Performance Analysis of Systems and Software (ISPASS), 2017. I also presented an older, different version of this talk at SELSE 2015.

  2. Preserving Application Reliability on Unreliable Hardware, at NVIDIA Research, April 2013, Santa Clara, CA

  3. Analyzing and Reducing Silent Data Corruptions caused by Soft-Errors, at the Doctoral Showcase program at the International Conference for High Performance Computing, Networking, Storage, and Analysis (SC), Nov 2012, Salt Lake City, UT (Acceptance rate: 25.5%)

  4. Look Ma, No SDCs!
    S. K. S. Hari, R. Venkatagiri, S. Adve, and H. Naeimi
    Poster at the Giga-Scale Research Center (GSRC) Annual Symposium, 2012
    [Margarida Jacome GSRC Best Poster Award, awarded to two posters among 50+ showcased projects from 10+ Universities]

  5. Low-cost Program-level Detectors for Reducing Silent Data Corruptions at the IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), June 2012, Boston, MA (pdf)

  6. Relyzer: Exploiting Application-Level Fault Equivalence to Analyze Application Resiliency to Transient Faults at the ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), March 2012, London, UK (pdf)

  7. Relyzer: Application Resiliency Analyzer for Transient Faults
    S. K. S. Hari, S. V. Adve, H. Naeimi, and P. Ramachandran
    Poster at the Giga-Scale Research Center (GSRC) Annual Symposium, 2011

  8. Relyzer: Application Resiliency Analyzer for Transient Faults at the IEEE Workshop on Silicon Errors in Logic - System Effects (SELSE), March 2011, Champaign, IL

  9. Relyzer: Application Resiliency Analyzer for Transient Faults at the MDG ArchFest at Intel Corporation, Jan 2011, Portland, OR

  10. SWAT: The Whole Enchilada for In-Core Fault Resiliency
    P. Ramachandran, S. K. S. Hari, M. Li, S. K. Sahoo, R. Smolinski, X. Fu, L. Chen, S. V. Adve
    Giga-Scale Research Center (GSRC) Annual Symposium, 2010

  11. mSWAT: Low-Cost Hardware Fault Detection and Diagnosis for Multicore Systems at the 42nd International Symposium on Microarchitecture (MICRO), Dec 2009, New York, NY

  12. mSWAT: Low-Cost Hardware Fault Detection and Diagnosis for Multicores
    S. K. S. Hari, M. Li, P. Ramachandran, B. Choi, S. V. Adve
    Poster at the Spring CS Grad Expo, UIUC, 2010
    I also presented a slightly different version of this poster at the Gigascale Research Center (GSRC) Annual Symposium, 2009 (pdf)

  13. Application Aware SoftWare Anomaly Treatment
    P. Ramachandran, S. K. S. Hari, M. Li, S. V. Adve, S. Vasudevan
    Poster at the Giga-Scale Research Center (GSRC) Annual Symposium, 2009

  14. Software Managed Resiliency at the Gigascale Systems Research Center (GSRC) Kickoff meeting, Nov 2009, Princeton, NJ

  15. MSWAT: Hardware Fault Detection and Diagnosis for Multicore Systems at the Resilient Theme Summer Workshop of Gigascale Systems Research Center (GSRC), July 2009, Urbana, IL

  16. SWAT: Hardware Reliability through Software Anomaly Treatment
    M. Li, P. Ramachandran, S. K. Sahoo, S. K. S. Hari, S. V. Adve, V. Adve, Y. Y. Zhou
    Poster at the Giga-Scale Research Center (GSRC) Annual Symposium, 2008

  17. SWAT-Sim: Accurate Microarchitecture-level Fault Models
    M. Li, P. Ramachandran, U. Karpuzcu, S. K. S. Hari, S. V. Adve
    Poster at the Giga-Scale Research Center (GSRC) Annual Symposium, 2008

  18. Automatic Constraint Based Test Generation for Behavioral HDL Models
    S. K. S. Hari, V. V. Konda, V. Kamakoti, V. Vedula, K. S. Maneperambil
    University Booth at Design, Automation and Test in Europe (DATE), 2007


Theses:


Software:
