Orthanc 1.5.6 maxes out CPU and stops responding

Dear all,

We’re investigating an incident on Orthanc 1.5.6 running in Docker on Ubuntu 18.04 on an AWS t2.micro instance. The server has been running fine for several months, and an identical server just went through performance testing without any major incidents, so what we are seeing here is a bit puzzling.

At the time of the crash, roughly the following activity would have been ongoing: a single study transfer was incoming over the internet using the transfer accelerator, low-intensity web app traffic was querying Orthanc for updates, and possibly a DICOM viewer was requesting a download of a study via the native API. This level of activity has never previously been a problem for the server, so I don’t think it’s an “overload” situation.

As you can see in the attached dashboard screenshots, after a bump in incoming internet traffic (the incoming study) hit the server, CPU suddenly spiked to 100% for 5-10 minutes, until the point where the server stopped responding. In fact, the whole operating system stopped responding. The only way to restore operation, after 20 minutes, was to stop the server using AWS tools (a reboot command to the OS did nothing). You will notice it coming back online after an outage of about 20 minutes.

The orthanc log meanwhile reports nothing. Not a single line.

Any suggestions on where I should look next for clues about what happened to this instance?

thank you all,
Pär
www.cmrad.com

orthanc1.5.6dies.png

Nothing in the Orthanc verbose logs?
What about scripts? Do you have any post-processing Lua scripts?
If you send without the transfer accelerator (i.e. as a plain peer transfer), do you see the same issue?
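
To compare the two transfer paths side by side, something like the following Python sketch could be used. It assumes the standard Orthanc REST routes `POST /peers/{peer}/store` (plain peer transfer) and `POST /transfers/send` (transfer accelerator plugin). The base URL, the peer name `remote` and the study ID are placeholders, and the function names are made up for illustration.

```python
import json
import urllib.request


def build_peer_store_request(base_url, peer, study_id):
    """Plain peer transfer: the request body is the Orthanc ID of the study."""
    return urllib.request.Request(
        f"{base_url}/peers/{peer}/store",
        data=study_id.encode("utf-8"),
        method="POST",
    )


def build_accelerated_request(base_url, peer, study_id):
    """Transfer accelerator: a JSON body naming the resources and the peer."""
    body = {
        "Resources": [{"Level": "Study", "ID": study_id}],
        "Compression": "gzip",
        "Peer": peer,
    }
    return urllib.request.Request(
        f"{base_url}/transfers/send",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=60):
    """Send a prepared request and return (HTTP status, response text)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status, response.read().decode("utf-8")
```

Sending the same study once through each request on a test server, while watching CPU, would show whether the spike is specific to one of the two paths. Authentication is left out here and would need to be added if the server requires it.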

Hi Bryan, thanks for your comment. I don’t have any verbose logs, as this is a production system and we can’t have verbose logging switched on all the time. I’m trying to figure out how to reproduce the error in a non-production system so that I can capture logs, but the issue has only been seen once.

We have been sending a lot of data both using the peer and accelerator transfer methods. Much bigger volumes than what was going on at the time of the crash. The system has typically responded well to high volumes (within reason for the server size).

regards,
Pär

Hi Pär,

I know Thibault is currently working on a Docker image in which we’ll include a build of Orthanc that generates a core dump that can be analyzed in case of such a crash. That’s probably the only way to investigate this kind of issue.

We’ll keep you up to date.

BR

Alain

As a complement to Alain’s answer, I’ve just added a section in the Orthanc Book about how to analyse a crash:
http://book.orthanc-server.com/faq/crash.html
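
A core dump is only written if the process is allowed to produce one, so before waiting for the next crash it may be worth checking the limits inside the container. Below is a minimal, Linux-only Python sketch of such a check. The function name `core_dump_settings` is made up for illustration, and it reports the limits of the process that runs it, so it would have to be run in the same environment as Orthanc.

```python
import resource
from pathlib import Path


def core_dump_settings():
    """Report whether the current process may write core dumps, and where."""
    # RLIMIT_CORE caps the size of a core file; a soft limit of 0 disables dumps.
    soft, hard = resource.getrlimit(resource.RLIMIT_CORE)

    # On Linux, core_pattern decides the file name (or the pipe handler).
    try:
        pattern = Path("/proc/sys/kernel/core_pattern").read_text().strip()
    except OSError:
        pattern = None  # not Linux, or /proc is not readable

    return {
        "soft_limit": soft,
        "hard_limit": hard,
        "unlimited": soft == resource.RLIM_INFINITY,
        "core_pattern": pattern,
    }


if __name__ == "__main__":
    for key, value in core_dump_settings().items():
        print(f"{key}: {value}")
```

If the soft limit turns out to be 0, it can be raised when the container is started, for instance with Docker's `--ulimit` option. Note that `core_pattern` is a kernel-wide setting shared with the host.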