A soft approach to a hard problem in autonomous vehicles


December 11, 2019

One guaranteed part of dealing with computer systems is that hardware problems happen.

In a smartphone, hardware errors are inconvenient, but not life-threatening. However, if an autonomous vehicle suffers from a hardware failure, it could be highly dangerous.

Photo caption: ASU Associate Professor Aviral Shrivastava holds a miniature autonomous vehicle. Shrivastava, an associate professor of computer science and engineering in the Ira A. Fulton Schools of Engineering at Arizona State University, developed software techniques that can help resolve errors in hardware caused by stray cosmic particles. When particularly hard-to-detect hardware errors, called faults, happen in cyber-physical systems such as autonomous vehicles, it’s important to detect and resolve them correctly to prevent property damage and loss of life. Photo by Erika Gronek/ASU

As autonomous vehicles are “essentially computers on wheels,” it’s important to know when a hardware error might cause accidents, said Aviral Shrivastava, an associate professor of computer science and engineering in the Ira A. Fulton Schools of Engineering at Arizona State University.

While he’s dealing with hardware problems, Shrivastava says new software strategies are actually the key to powerful and efficient methods to prevent hardware errors in critical systems.

A fault in our cars

Unbeknownst to many, outer space poses a problem for computing systems here on Earth, including the ones that enable autonomous driving.

Cosmic particles are increasingly affecting integrated circuits — the fundamental components of electronics — as circuits shrink in size and the technology advances. An impact from an invasive particle in just the right spot can cause components and software to behave differently. These are known as “faults” and can vary in severity.

Approximately 90% of faults don’t cause any problems and are known as “masked” faults because another part of the system compensates for the error. These faults are negligible; for example, some might corrupt something in the computer memory that is never accessed again.

Other hardware faults are much easier to spot, as they can cause the software running on the hardware systems to start behaving in an obviously different way or to stop working entirely.

One type of fault called silent data corruption, or SDC, is the “sneakiest” kind of fault because the system appears to be behaving normally, but its results, or output, are slightly wrong. Shrivastava says they’re the most important type of fault to study and understand because they are “hard to even detect.”
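The three outcomes described above (masked, obvious failure, silent data corruption) can be illustrated with a toy fault-injection experiment. This is only a minimal sketch: the memory layout, the `compute` routine and the `classify` helper are hypothetical names invented for illustration, and a real study would inject faults into actual hardware state rather than a Python list.

```python
# Toy memory image (hypothetical layout):
#   slots 0-2: speed readings, slot 3: index of the first reading,
#   slot 4: scratch data that the program never reads again.
GOLDEN_MEMORY = [60, 62, 64, 0, 999]


def flip_bit(value: int, bit: int) -> int:
    """Invert a single bit, imitating a particle strike on a memory cell."""
    return value ^ (1 << bit)


def compute(memory: list[int]) -> int:
    """Average the three readings that start at the stored index."""
    start = memory[3]
    return (memory[start] + memory[start + 1] + memory[start + 2]) // 3


def classify(slot: int, bit: int) -> str:
    """Inject one bit flip and report how the fault shows up."""
    golden = compute(GOLDEN_MEMORY)
    memory = list(GOLDEN_MEMORY)
    memory[slot] = flip_bit(memory[slot], bit)
    try:
        result = compute(memory)
    except IndexError:
        return "detected"  # the program visibly fails
    if result == golden:
        return "masked"    # the fault never reaches the output
    return "sdc"           # runs normally, output is wrong


if __name__ == "__main__":
    for slot, bit in [(4, 0), (3, 3), (1, 1)]:
        print(slot, bit, classify(slot, bit))
```

In this toy setup, a flip in the unused scratch slot is masked, a flip that sends the index out of range fails visibly, and a flip in one of the readings produces a plausible but wrong average, which is the silent case.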

SDCs are especially troubling for safety-critical cyber-physical systems, which are computing systems, such as autonomous vehicles, that interact with our physical world. It’s important to be able to work around hardware faults, or at the very least be able to detect that something is wrong.

Because they’re hard to detect, it can be difficult to say what errors are caused by SDCs. Some experts believe an SDC potentially caused by cosmic particles may have led to the unintended acceleration problem in Toyota vehicles that resulted in a massive recall in 2009.

“Reducing the number of SDCs is a meaningful metric for evaluating the effectiveness of a protection technique,” Shrivastava said.

Back in 2015, the common wisdom was that only hardware protection techniques are “strong enough” to protect against hardware errors, and that software techniques are not effective.

But Shrivastava did not buy this argument. He reasoned that if a fault does not rise up to the software level, then it is not important. So any fault that matters (i.e., faults that change the program output) should come to the domain of software for resolution; and, once it’s there, software techniques should be able to detect it and fix it.

“There is no fundamental reason as to why software techniques cannot be effective,” Shrivastava said.

He saw merit in using software techniques for protection, reasoning that even though hardware techniques may be effective, they only work when the hardware is protected.

On the other hand, software techniques are universally applicable. Anyone can use them on any past, present or future processors. They can even be applied in a piecemeal approach to reduce their overhead in energy usage and cost. For example, you can selectively apply software fixes, such as using them only on safety-critical applications, or even on the specific safety-critical parts of an application.
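The piecemeal idea can be sketched as an opt-in wrapper that adds redundancy only where it is requested. This is an illustrative assumption, not the research team's implementation: the `protected` decorator, the `calls` counter and both example functions are hypothetical.

```python
import functools

# Counts how many times each function body actually runs, to make the
# overhead of protection visible.
calls = {"steering_angle": 0, "update_display": 0}


def protected(func):
    """Run the function twice and compare; only marked code pays this cost."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        first = func(*args, **kwargs)
        second = func(*args, **kwargs)
        if first != second:
            raise RuntimeError(f"mismatch detected in {func.__name__}")
        return first
    return wrapper


@protected
def steering_angle(offset_m: float, gain: float) -> float:
    """Safety-critical: protected, so it executes twice per call."""
    calls["steering_angle"] += 1
    return offset_m * gain


def update_display(speed_kmh: int) -> str:
    """Not safety-critical: left unprotected, so it executes once per call."""
    calls["update_display"] += 1
    return f"{speed_kmh} km/h"


if __name__ == "__main__":
    steering_angle(0.5, 2.0)
    update_display(60)
    print(calls)
```

The protected function costs roughly twice the work, while the unprotected one costs nothing extra, which is the trade-off that always-on hardware protection cannot offer.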

“This ‘flexibility of application’ is not possible for techniques that are already implemented in the hardware,” Shrivastava said. “Once implemented, they always cause overhead.”

Shrivastava made this his research goal — to develop effective software protection techniques — for his National Science Foundation CAREER Award project.

A software touch fixes hardware problems

Shrivastava and his doctoral students, Moslem Didehban and Reiley Jeyapaul, then started evaluating the existing software protection techniques, and soon found they were already able to detect 90% of SDC faults. In general, 90% is good, but when human lives are on the line, 90% just isn’t good enough.

After carefully analyzing the weak points of existing techniques, they started to systematically fix the holes.

Over the course of a six-year NSF CAREER Award project, the research team developed a set of software techniques that are as effective as hardware techniques. They also produced a large body of repeatable evidence to demonstrate their method’s reliability.

“In this project, we were able to develop software techniques that are very effective, and are able to achieve protection comparable to hardware protection techniques,” Shrivastava said.

One method, called near-zero silent data corruption, or nZDC, was published in the 2016 Design Automation Conference proceedings. Shrivastava and Didehban proposed a technique that duplicates program instructions and compares results intermittently to check for errors. nZDC was demonstrated to detect more than 99.9% of the SDCs.
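The general duplicate-and-compare idea can be sketched at the source level as follows. This is not the nZDC transformation itself, which works on program instructions; the `checked` helper, the `fault` injection hook and the braking-distance example are all hypothetical names chosen for illustration.

```python
class SilentCorruptionDetected(Exception):
    """Raised when the two redundant copies of a computation disagree."""


def checked(op, *args, fault=None):
    """Run op twice and compare the results before trusting either.

    fault is a hypothetical hook that corrupts the primary result so the
    detection path can be exercised; it stands in for a hardware fault.
    """
    primary = op(*args)
    shadow = op(*args)
    if fault is not None:
        primary = fault(primary)
    if primary != shadow:
        raise SilentCorruptionDetected(f"{primary!r} != {shadow!r}")
    return primary


def braking_distance_m(speed_mps: float, decel_mps2: float) -> float:
    """Distance needed to stop from a given speed at constant deceleration."""
    return speed_mps * speed_mps / (2.0 * decel_mps2)


if __name__ == "__main__":
    print(checked(braking_distance_m, 20.0, 5.0))
    try:
        checked(braking_distance_m, 20.0, 5.0, fault=lambda v: v + 1.0)
    except SilentCorruptionDetected as err:
        print("detected:", err)
```

Without the comparison, the corrupted value would flow into later decisions unnoticed. With it, a silent error becomes a detected one.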

When an error is found, their other technique, Nemesis (described in a 2017 International Conference on Computer Aided Design paper), runs an underlying cause analysis of the error and finds whether it is even possible to fix the error or not. Nemesis demonstrated an ability to recover from 96% of SDCs. For the remaining 4%, it declared its inability to recover.

While replication is a well-known technique to protect programs, the devil is in the details; most previous works have gotten the details wrong. And even a small change can render the protection ineffective. Shrivastava observed that many previous techniques were sometimes recovering incorrectly. It is also not possible to recover from all errors, and a wrong recovery defeats everything. That is why he inserted a special routine to determine if it is possible to recover correctly.
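The point that a wrong recovery is worse than an honest failure can be sketched with a recovery routine that first checks whether a correct recovery is possible. The check used here (do both copies of the inputs still agree?) is an assumption made for illustration and is not a description of how Nemesis works; `run_with_recovery`, `Unrecoverable` and the `corrupt` hook are hypothetical.

```python
class Unrecoverable(Exception):
    """Raised when the routine cannot guarantee a correct recovery."""


def run_with_recovery(op, inputs, shadow_inputs, corrupt=None):
    """Compute op on two input copies; recover only when it is safe to.

    corrupt is a hypothetical hook that damages the primary result to
    imitate a transient fault during the computation.
    """
    primary = op(*inputs)
    if corrupt is not None:
        primary = corrupt(primary)
    shadow = op(*shadow_inputs)
    if primary == shadow:
        return primary  # no error observed

    # An error was detected. Recovery by re-execution is only trustworthy
    # if the inputs themselves are intact, so check that first.
    if tuple(inputs) != tuple(shadow_inputs):
        raise Unrecoverable("input copies disagree; cannot tell which is right")

    retry_a = op(*inputs)
    retry_b = op(*shadow_inputs)
    if retry_a != retry_b:
        raise Unrecoverable("re-execution still disagrees")
    return retry_a


def add(a: int, b: int) -> int:
    return a + b


if __name__ == "__main__":
    # Transient fault in the result: safe to re-execute.
    print(run_with_recovery(add, (2, 3), (2, 3), corrupt=lambda v: v ^ 4))
    # Corrupted input copy: the routine declares its inability to recover.
    try:
        run_with_recovery(add, (2, 3), (2, 7))
    except Unrecoverable as err:
        print("unrecoverable:", err)
```

Declaring an error unrecoverable lets the rest of the system fall back to a safe state instead of acting on a value that only looks repaired.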

Shrivastava and his team have a strong publishing record of more than 20 conference papers and 12 journal articles on the topic of software recovery techniques, including Didehban and Jeyapaul’s doctoral dissertations and six other students’ master’s theses.

Their results have the potential to impact how autonomous vehicle systems are certified for reliability. A hardware-based fault monitoring technique is currently the only way to get certified.

“I am of the opinion that effective software techniques should also be allowed,” Shrivastava said.

The meticulous nature of Shrivastava’s six-year software research to find the elusive computing errors with new software techniques has emboldened him to urge the research community to value commitment and perseverance over quick results and ROIs.

“It is because of this impatience that so many ‘ineffective’ protection techniques have been proposed,” Shrivastava said. “Research takes a long time, but carefully considering how faults occur and how to best address them can save lives.”

Monique Clement

Communications specialist, Ira A. Fulton Schools of Engineering

480-727-1958

Dual graduate shares how experiences in The College helped her grow


December 11, 2019

Editor’s note: This is part of a series of profiles for fall 2019 commencement.

Sarah Winkelman says her passion for political science took off in her sophomore year of college.

Photo caption: ASU student in Tzfat, Israel. Photo by Emily Bonner

“I started off as a political science major,” said Winkelman, a soon-to-be dual graduate of the School of Politics and Global Studies in The College of Liberal Arts and Sciences. “I wasn’t really sure what I was interested in yet, besides my U.S. government class in high school.”

But after taking classes on national security and terrorism, Winkelman says she was hooked.

“After that, I decided to add a major in global studies and a focus on national security in the Middle East,” she said. “I haven't looked back since.”

Winkelman said she encountered challenges while pursuing her degrees, the most difficult of which was finding herself.

“It was hard moving across the country (from Illinois) and into a room with a complete stranger,” she said. “I didn't really find anything I was interested in the first semester of my freshman year, and I spent more of my time in my room studying and not having fun.”

Soon, however, Winkelman said she began to participate more actively in university life by joining clubs and attending events.  

“I thought being busy would make my grades suffer, but it did the opposite,” she said. “By being busy it actually helped me improve my time management skills, which caused my academics to flourish.”

Winkelman answered some questions about her time at Arizona State University.

Question: Why were ASU and The College of Liberal Arts and Sciences the right choice for you?

Answer: I cannot imagine where I would be if I didn't choose ASU. ASU has so many different opportunities to get involved, and offers so many resources. I am so thankful I was able to study abroad through ASU by going to Spain. I also had the opportunity to take three other international trips with clubs, twice to Israel and once to Poland. I found friends through my three on-campus jobs throughout the years, through both clubs and classes. Every year, more opportunities become available for students, and I cannot wait to see what comes next for the younger students here. 

Q: What experiences have you gained from your time in The College that will help you achieve your future goals in life?

A: The experiences that I gained while pursuing two degrees in The College helped me with time management and confidence. Throughout my time at ASU, I grew up and became a stronger, more independent person. I am no longer afraid to ask for help when I need it, like going to professors’ office hours, taking a break from studying and having dinner with friends instead, or traveling the world alone. Every experience I had, whether big or small, made me into the young adult that I am today. 

Q: What's the best piece of advice you'd like to give to those still in school?

A: Just keep chugging along, whether it takes you 4, 5, or even 10 years to graduate. Everything adds up, and in the end, it will all pay off. 

Q: What was your favorite spot on campus, whether for studying, meeting friends or just thinking about life? 

A: I really like studying in the Palo Verde East hub. I am a north campus person because that's where I lived for three years. I work best in busy spaces, so when I see other people working hard it drives me to buckle down and finish the assignment. I used to have coffee meetings there for my internship, too.

Q: What’s your plan for after graduation?

A: I have already committed to Teach for America, where I will be teaching secondary math in a Title I school in Phoenix for at least two years. Long term, I want to work in the government as part of a counterterrorism force, or in national security.

Christopher Clements

Marketing Assistant, The College of Liberal Arts and Sciences