Error handling

To make GROMACS behave like a proper library, we need to change the way errors and similar conditions are handled. Basically, the library should not print anything to stdout/stderr unless that is part of the API specification, and even then there should be a way for the user to suppress the output. Also, the library should normally not terminate the program without the user having control over this. There are different types of errors, and the type affects how an error is handled. The different cases are discussed separately below, split by the way they are handled. These guidelines are starting to take their final form, although details may still change.

  • For programming errors, i.e., errors that should never occur if the program is correctly written, it’s acceptable to assert and terminate the program. This applies to both errors in the library and errors in user code or user input that calls the library. Older code tends to still use assert() calls, but new code should prefer more expressive functionality such as GMX_RELEASE_ASSERT(). This version of the macro results in asserts that are still present when the build type is Release, which is what we want by default. In performance-sensitive parts of the code, it is acceptable to use GMX_ASSERT() instead, to avoid the performance penalty of a branch when the code is compiled for production use. By default, Jenkins builds the RelWithAssert build type.
  • For some errors it might be feasible to recover gracefully and continue execution. In this case, your APIs should be defined so that the API user/programmer does not have to check separately whether the problem was due to a programming error; it is better to use, e.g., exceptions for recoverable errors and asserts for programming errors.
  • Exceptions should only be used for unexpected errors, e.g., out of memory or file system I/O errors. As a general guideline, incorrect user input should not produce an untrapped exception that terminates execution and merely tells the user that an exception occurred. Instead, you should catch exceptions in an earlier stack frame, make a suitable decision about diagnostic messages, and then decide how execution should be terminated.
  • There is a global list of possible exceptions in src/gromacs/utility/exceptions.h, and the library should throw one of these when it fails, possibly providing a more detailed description of the reason for the failure. The types of exceptions can be extended, and currently include:
    • Out of memory (e.g. std::bad_alloc)
    • File I/O error (e.g. not found)
    • Invalid user input (could not be understood)
    • Inconsistent user input (parsed correctly, but has internal conflicts)
    • Simulation instability
    • Invalid API call/value/internal error (an assertion might also be used in such cases)
  • In the internals of a module called from code that is not exception safe, you can use exceptions for error handling, but avoid propagating them to caller code.
  • Avoid using exceptions to propagate errors across regions that start or join threads with OpenMP, since OpenMP cannot make guarantees about whether exceptions are caught or if the program will crash. Currently we catch all exceptions before we leave an OpenMP threaded region. If you throw an exception, make sure that it is caught and handled appropriately in the same thread/OpenMP section.
  • There are also cases where a library routine wants to report a warning or a non-fatal error, but is still able to continue processing. In this case you should try to collect all issues and report them together (similar to what grompp does with notes, warnings and errors) instead of just returning the first error. It is irritating to users if they fix the reported error, but then keep getting a new error message every time they rerun the program.
  • A function should not fail as part of its normal operation. However, doing nothing can be considered normal operation. A function accessing data should typically also be callable when no such data is available, but still return through normal means. If the failure is not normal, it is OK to throw an exception instead.
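
The split between programming errors, normal operation, and recoverable errors described in the list above can be sketched as follows. This is only an illustration under stated assumptions: FileReadError, scaleInPlace(), readFirstLine() and firstLineOrDefault() are hypothetical names, the standard assert() stands in for the GROMACS assertion macros, and FileReadError stands in for one of the exception types in src/gromacs/utility/exceptions.h.

```cpp
#include <cassert>
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for a file I/O exception type; not a GROMACS class.
class FileReadError : public std::runtime_error
{
public:
    using std::runtime_error::runtime_error;
};

// Programming error: a null pointer can only come from a bug in the caller,
// so it is asserted. An empty vector is normal operation: the function does
// nothing and returns through normal means.
void scaleInPlace(std::vector<double>* values, double factor)
{
    assert(values != nullptr && "values must not be null");
    for (double& value : *values)
    {
        value *= factor;
    }
}

// Unexpected but recoverable error: a file that cannot be opened is reported
// with an exception that carries a description of the failure.
std::string readFirstLine(const std::string& path)
{
    std::ifstream stream(path);
    if (!stream)
    {
        throw FileReadError("Could not open file '" + path + "'");
    }
    std::string line;
    std::getline(stream, line);
    return line;
}

// Caller in an earlier stack frame: it catches the exception and decides how
// to continue, so the user never sees an untrapped exception.
std::string firstLineOrDefault(const std::string& path, const std::string& fallback)
{
    try
    {
        return readFirstLine(path);
    }
    catch (const FileReadError&)
    {
        return fallback;
    }
}
```

With this split, a caller of readFirstLine() never has to check whether a failure was caused by a programming error: those are handled by the assertion, and everything that arrives as an exception is a candidate for recovery.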

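Collecting all issues before reporting them, as recommended above for warnings and non-fatal errors, could look roughly like the following sketch. All names here (Severity, Issue, IssueCollector, RunSettings, checkSettings()) and the individual checks are hypothetical and chosen only for illustration; this is not how grompp is implemented.

```cpp
#include <string>
#include <utility>
#include <vector>

// All names below are hypothetical and only illustrate the pattern.
enum class Severity
{
    Note,
    Warning,
    Error
};

struct Issue
{
    Severity    severity;
    std::string message;
};

// Accumulates issues so that the caller can report all of them at once.
class IssueCollector
{
public:
    void add(Severity severity, std::string message)
    {
        issues_.push_back({ severity, std::move(message) });
    }

    // Number of collected issues with the given severity.
    int count(Severity severity) const
    {
        int result = 0;
        for (const Issue& issue : issues_)
        {
            if (issue.severity == severity)
            {
                ++result;
            }
        }
        return result;
    }

    const std::vector<Issue>& issues() const { return issues_; }

private:
    std::vector<Issue> issues_;
};

// Hypothetical input to validate.
struct RunSettings
{
    double timeStep;
    int    outputInterval;
    double temperature;
};

// Checks every setting and records each problem instead of stopping at the
// first one. Nothing is printed; the caller decides how to report.
IssueCollector checkSettings(const RunSettings& settings)
{
    IssueCollector collector;
    if (settings.timeStep <= 0.0)
    {
        collector.add(Severity::Error, "The time step must be positive");
    }
    if (settings.outputInterval < 0)
    {
        collector.add(Severity::Error, "The output interval must not be negative");
    }
    if (settings.temperature > 1000.0)
    {
        collector.add(Severity::Warning, "The temperature is unusually high");
    }
    return collector;
}
```

The caller can then print or log everything in issues() in one pass and stop only if count(Severity::Error) is nonzero, so the user sees every problem in a single run. Because the collector itself prints nothing, this also keeps output under the control of the calling code.
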
For coding guidelines to make this all work, see Implementing exceptions for error handling.
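
The OpenMP guideline above (catch every exception in the thread that threw it, before the threaded region ends) can be illustrated with a minimal sketch. It assumes a generic approach based on std::exception_ptr, not the mechanism GROMACS itself uses; processElement() and doubleAll() are hypothetical names. Without OpenMP enabled, the pragmas are ignored and the code runs serially with the same result.

```cpp
#include <exception>
#include <stdexcept>
#include <vector>

// Hypothetical per-element work that may throw.
static double processElement(double value)
{
    if (value < 0.0)
    {
        throw std::invalid_argument("negative value encountered");
    }
    return 2.0 * value;
}

// Doubles every element in place. An exception thrown inside the loop is
// caught in the same thread and loop iteration that threw it; the first one
// is stored and rethrown only after the parallel region has been left.
void doubleAll(std::vector<double>* values)
{
    std::exception_ptr firstError;
    const int          count = static_cast<int>(values->size());

#pragma omp parallel for
    for (int i = 0; i < count; ++i)
    {
        try
        {
            (*values)[i] = processElement((*values)[i]);
        }
        catch (...)
        {
            // Serialize access to the shared exception_ptr.
#pragma omp critical
            {
                if (!firstError)
                {
                    firstError = std::current_exception();
                }
            }
        }
    }

    // Back in serial code, where throwing is safe again.
    if (firstError)
    {
        std::rethrow_exception(firstError);
    }
}
```

No exception ever crosses the boundary of the parallel region in this sketch, so the behavior does not depend on what a particular OpenMP runtime does with escaping exceptions. Note that elements processed before or alongside the failing one may already have been modified when the exception is rethrown.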