Debugging a production Orthanc instance

Hello everybody. I have been hunting down a bug that’s been causing my servers to crash every couple of days.
My fastest fix is to automate restarting the server, but I still want to hunt down the root cause of the bug. Here is what I’ve implemented so far; please let me know if there are other steps you would consider:

  • Use the --verbose option to log errors.
  • Log the errors to a file, or otherwise record the stderr output (see the example after this list).
  • Download the latest version of Orthanc.
  • Record and download the name of the instance that caused the process to crash (this can be tricky).
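
For the first two points, here is roughly what I have in mind (the configuration and log paths are just placeholders for my setup):

  $ Orthanc --verbose /etc/orthanc/ 2>> /var/log/orthanc/stderr.log

or, alternatively, letting Orthanc manage its own log files:

  $ Orthanc --verbose --logdir=/var/log/orthanc /etc/orthanc/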

So right now I’m installing the latest version of Orthanc and will wait to see whether the crash still happens; at that point, I feel I’m ready to report the bug. Is there any other step I should take before reporting it?

Thanks!

Hello Tomas

In addition to the steps you’ve already mentioned, if you can reproduce the crash, running Orthanc inside gdb would help a lot:

https://book.orthanc-server.com/faq/debugging.html
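
For instance, something along these lines (adapt the path of the configuration file to your installation, this one is only an illustration):

  $ gdb --args Orthanc /etc/orthanc/orthanc.json
  (gdb) run
  ... reproduce the crash ...
  (gdb) bt full

The full backtrace printed by "bt full" is typically the most useful piece of information to attach to a bug report.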

Thanks Benjamin, I will take this into account.

I upgraded Orthanc to the latest version and the issue appeared to cease. If something happens again, I now have the tools to report and start tracking the bug.

I still have an issue with recording the instance that might cause a crash, given that the faulty instance itself may never be stored on disk. One solution would be to log the incoming (and, while at it, why not the outgoing) raw data.
I’m not sure how many OSI layers I should strip, but a pcap capture should at the very least contain the data that caused the crash, even if in an inconvenient format. To avoid doubling (or worse) the storage requirements, I would expire the captures after a certain period and keep them permanently only after a crash.
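
Concretely, I was thinking of a tcpdump ring buffer along these lines (the network interface and the DICOM port 4242 are assumptions about my setup):

  $ tcpdump -i eth0 -w /var/capture/orthanc.pcap -C 100 -W 20 'port 4242'

Here -C 100 rotates the capture roughly every 100 MB and -W 20 keeps at most 20 files, so old traffic expires automatically; after a crash I would copy the current files somewhere permanent.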

Is this a sensible thing to run in production? How would you otherwise ensure that you can access the data sent to a server prior to a crash?

Thank you very much.

Glad to read that upgrading Orthanc has solved your issue.

Regarding your question about how to access the data that was sent to the server prior to a crash:

In theory, the “core dump” file should contain the full content of the process memory at the time of the crash, including the faulty data:

https://en.wikipedia.org/wiki/Core_dump

So, you should have access to all the data (e.g. a problematic DICOM file) by analyzing the resulting “core” file using gdb:
https://book.orthanc-server.com/faq/crash.html
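
For instance, on a typical GNU/Linux system (the exact location and name of the core file depend on your distribution, and systemd-coredump might intercept it):

  $ ulimit -c unlimited                  # in the shell that starts Orthanc, allow core files to be written
  $ cat /proc/sys/kernel/core_pattern    # tells where and under which name core files are written
  $ gdb ./Orthanc /path/to/core          # open the core dump together with the matching binary
  (gdb) bt

Make sure the Orthanc binary has debugging symbols, otherwise the backtrace will not be very informative.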

For reference, here is a sample gdb instruction to write the content of a “std::string” C++ variable (whose name is “dicomFile”) that comes from some “core” file, into a separate file “/tmp/faulty.dcm” for further analysis:

(gdb) dump binary memory /tmp/faulty.dcm dicomFile._M_dataplus._M_p (dicomFile._M_dataplus._M_p + dicomFile._M_string_length)
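
You can then quickly check that the extracted file is a valid DICOM instance, for instance with DCMTK (if it is installed on your system):

  $ dcmdump /tmp/faulty.dcm | head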

HTH,

Sébastien-