I recommend a very interesting paper. The authors studied many failures in complex distributed systems, such as Cassandra, and found that the majority were caused by trivial errors (some of which could have been detected even by unit tests). Here is a very interesting quote:
"almost all (92%) of the catastrophic system failures are the result of incorrect handling of non-fatal errors explicitly signaled in software."
In my opinion, the causes of errors identified in the study may apply to any kind of software.
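
To make the quote a bit more concrete, here is a minimal Python sketch of what incorrect handling of an explicitly signaled, non-fatal error might look like. It is my own illustration, not code from the paper: the names `FlushError`, `flush_to_disk`, `save_swallowing`, and `save_reporting` are all hypothetical, and the "disk" is simulated with a flag.

```python
class FlushError(Exception):
    """Hypothetical non-fatal error signaled by a storage layer."""


def flush_to_disk(data, fail=False):
    # Hypothetical operation; `fail` simulates a temporary fault so the
    # error path can be exercised without real I/O.
    if fail:
        raise FlushError("disk temporarily unavailable")
    return len(data)


def save_swallowing(data, fail=False):
    # Incorrect handling: the error is caught and silently ignored,
    # so the caller is told the data was saved when it was not.
    try:
        flush_to_disk(data, fail)
    except FlushError:
        pass
    return True


def save_reporting(data, fail=False):
    # More careful handling: the failure is reported to the caller,
    # who can then retry or abort.
    try:
        flush_to_disk(data, fail)
    except FlushError:
        return False
    return True


# A unit-test-sized check that exercises the error path.
print(save_swallowing(b"row", fail=True))
print(save_reporting(b"row", fail=True))
```

A check this small is enough to expose the problem: with the simulated fault, `save_swallowing` still reports success while `save_reporting` reports failure.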