Historical Aerospace Software Errors Categorized to Influence Fault Tolerance (English)

Lorraine E Prokop

Conference paper / No indication

https://ntrs.nasa.gov/citations/20230012909

Software errors have occurred since computers were first used in space and in aircraft. Their consequences range from loss of life to far less catastrophic outcomes. As the demand for automation increases, software in mission- or safety-critical systems should be designed to tolerate the most likely software faults. This paper categorizes a set of 55 historic aerospace software error incidents from 1962 to 2023 to determine trends in how and where automation is most likely to fail or behave unexpectedly. A distinction is introduced between software producing unexpected (erroneous) output and software producing no output (fail-silent). Of the historical incidents analyzed, 85% involved software producing wrong output rather than simply stopping. Rebooting was found to be ineffective at clearing erroneous behavior and unreliable for recovering from silent failures. The error originated in the code/logic itself in 58% of cases, in configurable data in 16%, in unexpected sensor input in 15%, and in command/operator input in 11%. A substantial 40% of unexpected software behavior stemmed from the absence of code, arising from unanticipated situations and missing requirements, and 16% of incidents were subjectively deemed “unknown-unknowns”. No incidents were found to result from the programming language, compiler, tool, or operating system, and only 16% of all incidents were considered traditional computer science/programming errors in nature. These findings indicate that, for fault tolerance, erroneous automation behavior must be a primary consideration, especially at critical moments, and that recovery by reboot may not be viable. Special care should be taken to validate configurable data and commands prior to use. “Test-like-you-fly” practices, including hardware-in-the-loop testing combined with robust off-nominal testing, should be used to uncover missing logic arising from unanticipated situations not covered by requirements alone.
This study uniquely focuses on manifestations of unexpected flight software behavior, independent of ultimate root cause. We characterize software error behavior and origin to improve software design, test, and operations for resilience to the most common manifestations, and we provide a rich dataset for further study.
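
As an illustration of two points in the abstract (validating configurable data before use, and treating erroneous output separately from fail-silent behavior), the following is a minimal Python sketch. It is not taken from the paper: the parameter names, ranges, and the functions `validate_config` and `classify_output` are hypothetical and chosen only for illustration. A range check of this kind can flag only implausible values; an in-range but wrong value would still pass.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Limit:
    """Allowed closed range for one configurable parameter (hypothetical)."""
    low: float
    high: float


# Hypothetical parameter table; names and ranges are illustrative only.
LIMITS = {
    "thruster_burn_s": Limit(0.0, 120.0),
    "target_altitude_m": Limit(0.0, 500_000.0),
}


def validate_config(config: dict) -> list:
    """Return a list of problems; an empty list means the data may be used."""
    problems = []
    for name, limit in LIMITS.items():
        if name not in config:
            problems.append(f"{name}: missing")
            continue
        value = config[name]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            problems.append(f"{name}: not a number")
        # NaN fails every comparison, so it is also reported as out of range.
        elif not (limit.low <= value <= limit.high):
            problems.append(f"{name}: out of range")
    for name in config:
        if name not in LIMITS:
            problems.append(f"{name}: unknown parameter")
    return problems


def classify_output(value: Optional[float], limit: Limit) -> str:
    """Classify one cycle of output as fail-silent, erroneous, or plausible."""
    if value is None:
        # No output at all: the case a watchdog or timeout can detect.
        return "fail-silent"
    if not (limit.low <= value <= limit.high):
        # Output was produced but is implausible: needs a content check.
        return "erroneous"
    return "plausible"
```

In this sketch a caller would refuse to load a configuration whenever `validate_config` returns a non-empty list, and would handle the `"erroneous"` and `"fail-silent"` classifications separately, since the abstract reports that a reboot is not a reliable remedy for either.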

Title: Historical Aerospace Software Errors Categorized to Influence Fault Tolerance

Contributors: Lorraine E Prokop (author)

Conference: 45th International IEEE Aerospace Conference; 2024; Big Sky, MT, US

Publisher: Institute of Electrical and Electronics Engineers

Publication date: 2024-03-02

Type of media: Conference paper

Type of material: No indication

Language: English

Contract Number: 20230012909

Keywords: Errors, Aerospace, Software, Space Transportation and Safety, Computer Programming and Software

Source: NTRS
