Siva Hari is a Principal Research Scientist in the Architecture
Research Group at NVIDIA. His research interests are in
computer architecture, artificial intelligence, and systems,
with a current focus on autonomous and high-performance computing systems.
He obtained his Ph.D. and M.S. in Computer Science at the University of Illinois at
Urbana-Champaign and his B.Tech. in Computer Science and Engineering
at the Indian Institute of Technology (IIT) Madras.
He received the 2023 Rising Star in Dependability Award at the DSN conference.
He received the 2014 David J. Kuck Outstanding Ph.D. Thesis Award from the Computer Science Department at the University of Illinois.
He received the
W.J. Poppelbaum Memorial Award from the Computer Science Department at the University of Illinois
at Urbana-Champaign in 2012 for academic merit and creativity in computer hardware or architecture.
His work received the following recognitions:
two papers selected for IEEE Top Picks in Test and Reliability in 2023,
one paper selected as an IEEE Micro Top Pick in 2022,
Best Research Paper Award at ISSRE 2020,
Best Paper Award Runner-up at DSN 2018,
one paper selected as an IEEE Micro Top Pick in 2013, and
Margarida Jacome Best Poster Award at GSRC Annual Symposium, 2012.
Conference and Journal Publications
-
ALBERTA: ALgorithm-Based Error Resilience in Transformer Architectures
H. Liu, V. Singh, M. Filipiuk, S. K. S. Hari
IEEE Open Journal of the Computer Society, 2024
-
Safety-Critical Scenario Generation Via Reinforcement Learning Based Editing
H. Liu, L. Zhang, S. K. S. Hari, J. Zhao
ICRA'24: IEEE International Conference on Robotics and Automation, 2024
-
VaPr: Variable-Precision Tensors to Accelerate Robot Motion Planning
Y. S. Hsiao, S. K. S. Hari, B. Sundaralingam, J. Yik, T. Tambe, C. Sakr, S. W. Keckler, V. J. Reddi
IROS'23: IEEE/RSJ International Conference on Intelligent Robots and Systems, 2023
-
CuRobo: Parallelized Collision-Free Robot Motion Generation
B. Sundaralingam, S. K. S. Hari, A. Fishman, C. Garrett, K. Van Wyk, A. Millane, H. Oleynikova, A. Handa, F. Ramos, N. Ratliff, and D. Fox
ICRA'23: IEEE International Conference on Robotics and Automation, 2023
-
Zhuyi: Perception Processing Rate Estimation for Safety of Autonomous Vehicles
Y. S. Hsiao, S. K. S. Hari, M. Filipiuk, T. Tsai, M. B. Sullivan, V. J. Reddi, V. Singh, and S. W. Keckler
DAC'22: Design Automation Conference, 2022
-
Exploiting Temporal Data Diversity for Detecting Safety-critical Faults in AV Compute Systems
S. Jha, S. Cui, T. Tsai, S. K. S. Hari, M. B. Sullivan, Z. T. Kalbarczyk, S. W. Keckler, R. K. Iyer
DSN'22: IEEE/IFIP International Conference on Dependable Systems and Networks, 2022
-
Characterizing and Mitigating Soft Errors in GPU DRAM
M. B. Sullivan, M. O’Connor, D. Lee, P. Racunas, S. Hukerikar, N. Saxena, T. Tsai, S. K. S. Hari, and S. W. Keckler
TopPicks'22: IEEE Micro, Special Issue on Top Picks from the 2021 Computer Architecture Conferences, 2022
-
Suraksha: A Framework to Analyze the Safety Implications of Perception Design Choices in AVs
H. Zhao, S. K. S. Hari, T. Tsai, M. B. Sullivan, S. W. Keckler, and J. Zhao
ISSRE'21: IEEE International Conference on Software Reliability Engineering, 2021
-
Optimizing Selective Protection for CNN Resilience
A. Mahmoud, S. K. S. Hari, C. Fletcher, S. Adve, C. Sakr, N. Shanbhag, P. Molchanov, M. B. Sullivan, T. Tsai, and S. W. Keckler
ISSRE'21: IEEE International Conference on Software Reliability Engineering, 2021
-
Characterizing and Mitigating Soft Errors in GPU DRAM
M. B. Sullivan, M. O’Connor, D. Lee, P. Racunas, S. Hukerikar, N. Saxena, T. Tsai, S. K. S. Hari, and S. W. Keckler
MICRO'21: IEEE/ACM International Symposium on Microarchitecture, 2021
Selected as an IEEE Top Pick in Test and Reliability, 2023
-
Generating and Characterizing Scenarios for Safety Testing of Autonomous Vehicles
Z. Ghodsi, S. K. S. Hari, I. Frosio, T. Tsai, A. Troccoli, S. W. Keckler, S. Garg, and A. Anandkumar
IEEE IV'21: IEEE Intelligent Vehicles Symposium, 2021
-
NVBitFI: Dynamic Fault Injection for GPUs
T. Tsai, S. K. S. Hari, M. B. Sullivan, O. Villa, and S. W. Keckler
DSN'21: IEEE/IFIP International Conference on Dependable Systems and Networks, 2021
-
Making Convolutions Resilient via Algorithm-Based Error Detection Techniques
S. K. S. Hari, M. B. Sullivan, T. Tsai, and S. W. Keckler
TDSC'21: IEEE Transactions on Dependable and Secure Computing, 2021
-
Demystifying GPU Reliability: Comparing and Combining Beam Experiments, Fault Simulation, and Profiling
F. F. do Santos, S. K. S. Hari, P. M. Basso, L. Carro, and P. Rech
IPDPS'21: IEEE International Parallel & Distributed Processing Symposium, 2021
-
AV-FUZZER: Finding Safety Violations in Autonomous Driving Systems
G. Li, Y. Li, S. Jha, T. Tsai, M. B. Sullivan, S. K. S. Hari, Z. T. Kalbarczyk, and R. K. Iyer
ISSRE'20: IEEE International Conference on Software Reliability Engineering, 2020
Best Research Paper Award
-
GPU-TRIDENT: Efficient Modeling of Error Propagation in GPU Programs
A. R. Anwer, G. Li, K. Pattabiraman, M. B. Sullivan, T. Tsai, and S. K. S. Hari
SC'20: International Conference for High-Performance Computing, Networking, Storage and Analysis, 2020
-
GPU Snapshot: Checkpoint Offloading for GPU-Dense Systems
K. Lee, M. B. Sullivan, S. K. S. Hari, T. Tsai, S. W. Keckler, and M. Erez
ICS'19: International Conference on Supercomputing, 2019
-
ML-based Fault Injection for Autonomous Vehicles: A Case for Bayesian Fault Injection
S. Jha, S. S. Banerjee, T. Tsai, S. K. S. Hari, M. B. Sullivan, Z. T. Kalbarczyk, S. W. Keckler, R. K. Iyer
DSN'19: IEEE/IFIP International Conference on Dependable Systems and Networks, 2019
-
Optimizing Software-Directed Instruction Replication for GPU Error Detection
A. Mahmoud, S. K. S. Hari, M. Sullivan, T. Tsai, and S. Keckler
SC'18: International Conference for High-Performance Computing, Networking, Storage and Analysis, 2018 (acceptance rate: ~19%)
-
SwapCodes: Error Codes for Hardware-Software Cooperative GPU Pipeline Error Detection
M. Sullivan, S. K. S. Hari, B. Zimmer, T. Tsai, and S. Keckler
MICRO'18: IEEE/ACM International Symposium on Microarchitecture, 2018 (acceptance rate: ~21%)
-
Modeling Soft Error Propagation in Programs
G. Li, K. Pattabiraman, S. K. S. Hari, M. Sullivan, and T. Tsai
DSN'18: IEEE/IFIP International Conference on Dependable Systems and Networks, 2018 (acceptance rate: ~25%)
Best Paper Award Runner-Up
-
Understanding Error Propagation in Deep-Learning Neural Networks (DNN) Accelerators and Applications
G. Li, S. K. S. Hari, M. Sullivan, T. Tsai, K. Pattabiraman, J. Emer, S. Keckler
SC'17: International Conference for High-Performance Computing, Networking, Storage and Analysis, 2017 (acceptance rate: ~19%)
Selected as an IEEE Top Pick in Test and Reliability, 2023
-
SASSIFI: An Architecture-level Fault Injection Tool for GPU Application Resilience Evaluation
S. K. S. Hari, T. Tsai, M. Stephenson, S. Keckler, J. Emer
ISPASS'17: IEEE International Symposium on Performance Analysis of Systems and Software, 2017 (acceptance rate: ~30%)
-
Approxilyzer: Towards A Systematic Framework for Instruction-Level Approximate Computing and its Application to Hardware Resiliency
R. Venkatagiri, A. Mahmoud, S. K. S. Hari, S. Adve
MICRO'16: IEEE/ACM International Symposium on Microarchitecture, 2016 (acceptance rate: ~21%)
-
Flexible Software Profiling of GPU Architectures
M. Stephenson, S. K. S. Hari, Y. Lee, E. Ebrahimi, D. Johnson, D. Nellans, M. O’Connor, S. W. Keckler
ISCA'15: International Symposium on Computer Architecture, 2015 (acceptance rate: ~19%)
-
Locality-Driven Dynamic GPU Cache Bypassing
C. Li, S. L. Song, H. Dai, A. Sidelnik, S. K. S. Hari, and H. Zhou
ICS'15: International Conference on Supercomputing, 2015 (acceptance rate: ~25%)
-
Hardware Fault Recovery for I/O Intensive Applications
P. Ramachandran, S. K. S. Hari, M. Li, and S. V. Adve
TACO'14: Transactions on Architecture and Code Optimization, 2014
-
GangES: Gang Error Simulation for Hardware Resiliency Evaluation
S. K. S. Hari, R. Venkatagiri, S. V. Adve, and H. Naeimi
ISCA'14: International Symposium on Computer Architecture, 2014
-
Relyzer: Application Resiliency Analyzer for Transient Faults
S. K. S. Hari, S. V. Adve, H. Naeimi, and P. Ramachandran
TopPicks'13: IEEE Micro, Special Issue on Top Picks from the 2012 Computer Architecture Conferences, 2013
-
Low-cost Program-level Detectors for Reducing Silent Data Corruptions
S. K. S. Hari, S. V. Adve, and H. Naeimi
DSN'12: IEEE/IFIP International Conference on Dependable Systems and Networks, 2012 (acceptance rate: ~17%)
-
Relyzer: Exploiting Application-level Fault Equivalence to Analyze Application Resiliency to Transient Faults
S. K. S. Hari, S. V. Adve, H. Naeimi, and P. Ramachandran
ASPLOS '12: International Conference on Architectural Support for Programming Languages and Operating Systems, 2012 (acceptance rate: ~21%)
-
CrashTest'ing SWAT: Accurate, Gate-Level Evaluation of Symptom-Based Resiliency Solutions
A. Pellegrini, R. Smolinski, L. Chen, X. Fu, S. K. S. Hari, J. Jiang, S. V. Adve, T. Austin, V. Bertacco
DATE'12: Design, Automation and Test in Europe, 2012
-
Architectures for Online Error Detection and Recovery in Multicore Processors
D. Gizopoulos, M. Psarakis, S. V. Adve, P. Ramachandran, S. K. S. Hari, D. Sorin, A. Meixner, A. Biswas, X. Vera
DATE'11: Design, Automation and Test in Europe, 2011 (acceptance rate: ~25%)
-
mSWAT: Low-Cost Hardware Fault Detection and Diagnosis for Multicore Systems
S. K. S. Hari, M. Li, P. Ramachandran, B. Choi, S. V. Adve
MICRO'09: IEEE/ACM International Symposium on Microarchitecture, 2009 (acceptance rate: ~24%)
-
Accurate Microarchitecture-level Fault modeling for Studying Wear-out Faults
M. Li, P. Ramachandran, U. Karpuzcu, S. K. S. Hari, S. V. Adve
HPCA'09: Proceedings of the International Conference on High-Performance Computer Architecture, 2009 (acceptance rate: ~19%)
-
Automatic Constraint Based Test Generation for Behavioral HDL Models
S. K. S. Hari, V. V. Konda, V. Kamakoti, V. Vedula, K. S. Maneperambil
TVLSI'08: IEEE Transactions on VLSI Systems in the special section on Design Verification and Validation: Theory and Techniques, 2008
-
Power Virus Generation Using Behavioural Models of Circuits
K. Najeeb, V. V. Konda, S. K. S. Hari, V. Kamakoti, V. Vedula
VTS'07: IEEE VLSI Test Symposium, 2007 (acceptance rate: ~35%)
-
Constructing Online Testable Circuits using Reversible Logic
N. Mahammad, S. K. S. Hari, S. Shroff, V. Kamakoti
VDAT'06: IEEE VLSI Design and Test Symposium, 2006
-
Efficient Building Blocks for Reversible Sequential Circuit Design
S. K. S. Hari, S. Shroff, N. Mahammad, V. Kamakoti
MWSCAS'06: IEEE International Midwest Symposium on Circuits and Systems, 2006
arXiv and Workshop Publications
-
Towards Precision-Aware Fault Tolerance Approaches for Mixed-Precision Applications
B. Fang, S. K. S. Hari, T. Tsai, X. Li, G. Gopalakrishnan, I. Laguna, K. Barker, and A. Li
FTXS'22: Workshop on Fault-Tolerance for HPC at Extreme Scale, 2022
-
Suraksha: A Quantitative AV Safety Evaluation Framework to Analyze Safety Implications of Perception Design Choices
H. Zhao, S. K. S. Hari, T. Tsai, M. B. Sullivan, S. W. Keckler, and J. Zhao
SSIV'21: Workshop on Safety and Security of Intelligent Vehicles, 2021
-
Simulation Driven Design and Test for Safety of AI Based Autonomous Vehicles
V. Singh, S. K. S. Hari, T. Tsai, M. Pitale
SAIAD'21: Workshop on Safe Artificial Intelligence for Automated Driving, 2021
-
Generating and Characterizing Scenarios for Safety Testing of Autonomous Vehicles
Z. Ghodsi, S. K. S. Hari, I. Frosio, T. Tsai, A. Troccoli, S. W. Keckler, S. Garg, and A. Anandkumar
arXiv'21
ART'20: Shorter version accepted at the IEEE International Workshop on Automotive Reliability and Test, 2020
-
Making Convolutions Resilient via Algorithm-Based Error Detection Techniques
S. K. S. Hari, M. B. Sullivan, T. Tsai, S. W. Keckler
arXiv'20
-
PyTorchFI: A Runtime Perturbation Tool for DNNs
A. Mahmoud, N. Aggarwal, A. Nobbe, J. R. S. Vicarte, S. V. Adve, C.W. Fletcher, I. Frosio, and S. K. S. Hari
DSN-S'20: IEEE/IFIP International Conference on Dependable Systems and Networks – Supplemental Volume, 2020,
presented at the Workshop on Dependable and Secure Machine Learning (DSML), 2020
-
Estimating Silent Data Corruption Rates Using a Two-Level Model
S. K. S. Hari, P. Rech, T. Tsai, M. Stephenson, A. Zulfiqar, M. B. Sullivan, P. Shirvani, P. Racunas, J. Emer, and S. W. Keckler
arXiv'20
-
Feature Map Vulnerability Evaluation in CNNs
A. Mahmoud, S. K. S. Hari, C. Fletcher, S. Adve, C. Sakr, N. Shanbhag, P. Molchanov, M. B. Sullivan, T. Tsai, and S. W. Keckler
SARA'20: Workshop on Secure and Resilient Autonomy (SARA), 2020
-
HarDNN: Feature Map Vulnerability Evaluation in CNNs
A. Mahmoud, S. K. S. Hari, C. Fletcher, S. Adve, C. Sakr, N. Shanbhag, P. Molchanov, M. B. Sullivan, T. Tsai, and S. W. Keckler
arXiv'20
An updated version appeared in SRC TECHCON'20 with the title "HarDNN: Fine-Grained Vulnerability Evaluation and Protection for Convolutional Neural Networks"
-
Towards analytically evaluating the error resilience of GPU Programs
A. R. Anwer, G. Li, K. Pattabiraman, S. K. S. Hari, M. B. Sullivan, T. Tsai
SELSE'19: IEEE Workshop on Silicon Errors in Logic - System Effects, 2019
-
On the Trend of Resilience for GPU-Dense Systems
K. Lee, M. B. Sullivan, S. K. S. Hari, T. Tsai, S. W. Keckler, M. Erez
DSN-S'19: IEEE/IFIP International Conference on Dependable Systems and Networks – Supplemental Volume, 2019, also presented at
the IEEE Workshop on Silicon Errors in Logic - System Effects, 2019 and received Best of SELSE Award
-
Kayotee: A Fault Injection-based System to Assess the Safety and Reliability of Autonomous Vehicles to Faults and Errors
S. Jha, T. Tsai, S. K. S. Hari, M. Sullivan, Z. Kalbarczyk, S. W. Keckler, and R. Iyer
ART'18: IEEE International Workshop on Automotive Reliability & Test, 2018
-
An Analytical Model for Hardened Latch Selection and Exploration
M. Sullivan, B. Zimmer, S. K. S. Hari, T. Tsai, S. Keckler
SELSE'16: IEEE Workshop on Silicon Errors in Logic - System Effects, 2016
-
SASSIFI: Evaluating Resilience of GPU Applications
S. K. S. Hari, T. Tsai, M. Stephenson, S. W. Keckler, and J. Emer
SELSE'15: IEEE Workshop on Silicon Errors in Logic - System Effects, 2015
-
Measuring the Radiation Reliability of SRAM Structures in GPUs Designed for HPC
P. Rech, L. Carro, N. Wang, T. Tsai, S. K. S. Hari, and S. W. Keckler
SELSE'14: IEEE Workshop on Silicon Errors in Logic - System Effects, 2014
-
Relyzer: Application Resiliency Analyzer for Transient Faults
S. K. S. Hari, H. Naeimi, P. Ramachandran, S. V. Adve
SELSE'11: IEEE Workshop on Silicon Errors in Logic - System Effects, 2011
-
Understanding When Symptom Detectors Work by Studying Data-Only Application Values
P. Ramachandran, S. K. S. Hari, S. V. Adve, H. Naeimi
SELSE'11: IEEE Workshop on Silicon Errors in Logic - System Effects, 2011
-
CrashTest'ing SWAT: Accurate, Gate-Level Evaluation of Symptom-Based Resiliency Solutions
A. Pellegrini, R. Smolinski, X. Fu, L. Chen, S. K. S. Hari, J. Jiang, S. V. Adve, T. Austin, V. Bertacco
SELSE'11: IEEE Workshop on Silicon Errors in Logic - System Effects, 2011
- Collision-free motion generation
B. Sundaralingam, S. K. S. Hari, A. H. Fishman, C. R. Garrett, A. J. Millane, E. Oleynikova, A. Handa, F. T. Ramos, N. D. Ratliff, K. V. Wyk, D. Fox - US Patent App. 18/200,347, 2024
- Techniques for identifying occluded objects using a neural network
S. K. S. Hari, J. L. Clemons, T. K. Tsai - US Patent App. 17/943,576, 2024
- Method to estimate processing rate requirement for safe AV driving to prioritize resource usage
S. K. S. Hari, Y. Hsiao, T. Tsai, V. Singh - US Patent App. 17/963,531, 2023
- Adversarial scenarios for safety testing of autonomous vehicles
S. K. S. Hari, I. Frosio, Z. Ghodsi, A. Anandkumar, T. Tsai, S. W. Keckler, A. Troccoli - US Patent 11,550,325, 2023
- Hardware fault detection for feedback control systems in autonomous machine applications
T. Tsai, S. Jha, S. K. S. Hari, M. B. Sullivan - US Patent App. 16/994,382, 2022
- Packed error correction code (ECC) for compressed data protection
M. B. Sullivan, J. M. Pool, Y. Huang, T. K. Tsai, S. K. S. Hari, S. W. Keckler - US Patent 11,522,565, 2022
- Tensor-based driving scenario characterization
S. K. S. Hari, I. Frosio, Z. Ghodsi, A. Anandkumar, T. Tsai, S. W. Keckler - US Patent 11,390,301, 2022
- System and Methods for Hardware-Software Cooperative Pipeline Error Detection
M. B. Sullivan, S. K. S. Hari, B. Zimmer, T. Tsai, S. W. Keckler - US Patent 11,409,597, 2022
- Optimizing Software-Directed Instruction Replication for GPU Error Detection
S. K. S. Hari, M. B. Sullivan, T. Tsai, S. W. Keckler, A. Mahmoud - US Patent 10,817,289, 2020
- NVBitFI:
This open-source tool provides an automated framework to
perform error injection campaigns for GPU application
resilience evaluation. NVBitFI builds on top of NVBit, which
is a low-level assembly-language instrumentation tool for GPUs.
NVBitFI is a much improved version of SASSIFI. It runs on newer
GPUs (including Turing and Volta GPUs), works with pre-compiled libraries,
and is expected to be significantly faster than SASSIFI.
- PyTorchFI:
PyTorchFI is a runtime fault injection tool for PyTorch, an
open-source deep learning platform developed by Facebook.
It allows users to simulate errors within a Convolutional Neural Network (CNN)
during inference and develop insights into the robustness of
different models and understand why some models are more resilient.
- SASSIFI:
This open-source tool provides an automated framework to
perform error injection campaigns for GPU application
resilience evaluation. SASSIFI builds on top of SASSI, which
is a low-level assembly-language instrumentation tool that
provides the ability to instrument instructions in the
low-level GPU assembly language (SASS). SASSIFI can be used to
perform many types of resilience evaluation studies. Our ISPASS
2017 paper explains the tool in detail and presents a few case
studies.
- Approxilyzer:
This is an open-source framework for instruction-level
approximation and resiliency software. Approxilyzer provides a
systematic way to identify instructions that exhibit first-order
approximation potential. It can also identify silent data corruption
(SDC) causing instructions in the presence of single-bit errors.
Approxilyzer employs static and dynamic analysis, in addition to
heuristics, to reduce the run-time of finding approximate instructions
and SDC-causing instructions by 3-6 orders of magnitude compared to a
naive error injection approach.
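
The single-bit error model and the idea of an error injection campaign mentioned in the tool descriptions above can be illustrated with a toy sketch. This is not code from NVBitFI, PyTorchFI, SASSIFI, or Approxilyzer, and it does not use their APIs. The names `flip_bit`, `dot`, `classify`, and `campaign` are hypothetical, the workload is a plain dot product, and the outcome labels are simplified to "masked" (output unchanged) and "sdc" (output silently differs).

```python
import random
import struct


def flip_bit(value: float, bit: int) -> float:
    """Return `value` with one bit of its IEEE-754 float32 encoding flipped."""
    if not 0 <= bit < 32:
        raise ValueError("bit must be in [0, 32)")
    # Reinterpret the float32 bytes as an unsigned 32-bit integer.
    (encoded,) = struct.unpack("<I", struct.pack("<f", value))
    # XOR toggles exactly one bit, then reinterpret back as float32.
    (corrupted,) = struct.unpack("<f", struct.pack("<I", encoded ^ (1 << bit)))
    return corrupted


def dot(xs, ws):
    """Toy workload (an assumption): stands in for a real application."""
    return sum(x * w for x, w in zip(xs, ws))


def classify(golden: float, faulty: float, tolerance: float = 0.0) -> str:
    """Label a run 'masked' if it matches the fault-free output, else 'sdc'."""
    if faulty != faulty:  # NaN never equals itself
        return "sdc"
    return "masked" if abs(faulty - golden) <= tolerance else "sdc"


def campaign(xs, ws, trials: int = 1000, seed: int = 0) -> dict:
    """Naive campaign: one random single-bit flip in one input per trial."""
    rng = random.Random(seed)
    golden = dot(xs, ws)  # fault-free reference output
    counts = {"masked": 0, "sdc": 0}
    for _ in range(trials):
        idx = rng.randrange(len(xs))
        bit = rng.randrange(32)
        faulty_xs = list(xs)
        faulty_xs[idx] = flip_bit(xs[idx], bit)
        counts[classify(golden, dot(faulty_xs, ws))] += 1
    return counts


if __name__ == "__main__":
    # Zero weights mask most errors injected into the matching inputs.
    print(campaign([1.0, 2.0, 3.0, 4.0], [0.5, 0.0, 0.25, 0.0]))
```

For example, `flip_bit(1.0, 31)` flips the sign bit and returns `-1.0`. A naive campaign like this one samples injection sites at random and re-runs the workload for each; the tools described above target real GPU instructions or network layers and use analysis to avoid much of that cost.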