OnChange Callbacks and CPU loads

I ran another test where I switched Alain’s docker compose file to version 2.1. This let me add cpuset and cpucount limits to the Orthancs and force each to use only one CPU. Other discussions about futex problems suggested that limiting Orthanc to one CPU helped.

In this case, assuming cpuset is properly limiting the Orthanc to one CPU, the results were the same as without the limitation. Post anonymization, CPU falls to 0.5%. Deleting all data causes it to jump to 1.5-2% and remain there.
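
For anyone reproducing these idle figures, the percentage can be computed from two samples of the process CPU counters. This is a minimal Python sketch, assuming a Linux host where the Orthanc process is visible under /proc; the function names and the 10 second window are illustrative and are not taken from the tests above.

```python
import os
import time


def parse_cpu_ticks(stat_line: str) -> int:
    """Return utime + stime (in clock ticks) from a /proc/<pid>/stat line."""
    # The command name (field 2) is wrapped in parentheses and may contain
    # spaces, so cut at the last ')' before splitting on whitespace.
    fields = stat_line.rsplit(")", 1)[1].split()
    # fields[0] is field 3 (state); utime and stime are fields 14 and 15.
    return int(fields[11]) + int(fields[12])


def cpu_percent(ticks_before: int, ticks_after: int,
                elapsed_seconds: float, ticks_per_second: int) -> float:
    """CPU usage over the window, as a percentage of one CPU."""
    used_seconds = (ticks_after - ticks_before) / ticks_per_second
    return 100.0 * used_seconds / elapsed_seconds


def read_cpu_ticks(pid: int) -> int:
    with open(f"/proc/{pid}/stat") as stat_file:
        return parse_cpu_ticks(stat_file.read())


def sample_cpu_percent(pid: int, interval_seconds: float = 10.0) -> float:
    """Sample the process twice and return its average CPU usage in between."""
    ticks_per_second = os.sysconf("SC_CLK_TCK")
    before = read_cpu_ticks(pid)
    started = time.monotonic()
    time.sleep(interval_seconds)
    after = read_cpu_ticks(pid)
    elapsed = time.monotonic() - started
    return cpu_percent(before, after, elapsed, ticks_per_second)
```

Calling `sample_cpu_percent(pid)` with the host PID of the Orthanc process averages over the whole window, which smooths out the short periodic peaks that a tool like top shows.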

John.

FYI, all my tests of Alain’s setup above were using the first Orthanc in the docker compose file:

orthanc-a : osimis/orthanc:23.5.1

Just now I ran a test on my Fedora system with the second orthanc:

orthanc-b: osimis/orthanc:22.12.2

Surprise! The 22.12.2 Orthanc CPU does not idle hot after anonymization and deletion. After anonymization, it settles back down to <0.5%. And again, after deleting all files, it settles back down to <0.5%. I ran the test twice with the same results.

My apologies, I am guessing this is what you wanted me to test from the beginning, but I was focusing on finding a system where Orthanc-A would behave as expected.

I tested Orthanc-b on both my CentOS hosts with similar results: CPU usage returns to < 0.5% when idle.

Hi John,

Thanks a lot for all your investigations. I’m adding this to our todo list for later, since it might take some time to investigate and I do not consider it a very high priority: the CPU load is acceptable on most systems.

I’ll keep you informed when I make any progress.

Best regards,

Alain.

Thanks, Alain.

For those who don’t need the latest version, I think I will roll back to the earlier version.

Though we’re only seeing low idling CPU in the couple percent range for a small study, I have observed that anonymizing our large studies (30k-80k instances) can leave the Orthanc idling up to 25-50% of a CPU. Luckily, my users in that case don’t need the latest version and I can roll them back.

Cheers,
John.

Hi,

I made some quick investigations. Here are my current notes for future follow-up.

I started Orthanc under a debugger with this minimal config to reduce the number of threads as much as possible:

{
    "Name": "Orthanc debug",
    "HttpPort": 8043,
    "DicomServerEnabled" : false,
    "StorageDirectory": "OrthancStorageDebugBis",

    "RemoteAccessAllowed": true,
    "AuthenticationEnabled": false,

    "JobsEngineThreadsCount" : {
      "ResourceModification": 4     // for /anonymize, /modify
    },
  
    "HttpThreadsCount": 1,
    "OverwriteInstances": true
}

After a fresh Orthanc start, it is at 0.3% CPU while idle, with small peaks to 5% at regular intervals.

I then upload a series with 1.2k instances and restart → 0.3% CPU
Then I anonymize the study. Once it is done, the CPU is around 0.7%
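
The upload and anonymization steps can be scripted against the Orthanc REST API. This is a minimal sketch, assuming the config above (port 8043, authentication disabled); `POST /instances` and `POST /studies/{id}/anonymize` are Orthanc routes, while the helper names and the study ID are hypothetical placeholders.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8043"  # HttpPort from the debug config


def upload_request(dicom_bytes: bytes, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build the request that stores one DICOM file in Orthanc."""
    return urllib.request.Request(
        f"{base_url}/instances",
        data=dicom_bytes,
        method="POST",
        headers={"Content-Type": "application/dicom"},
    )


def anonymize_request(study_id: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build the request that anonymizes a study with default settings."""
    return urllib.request.Request(
        f"{base_url}/studies/{study_id}/anonymize",
        data=json.dumps({}).encode("utf-8"),  # empty body: default options
        method="POST",
        headers={"Content-Type": "application/json"},
    )


def send(request: urllib.request.Request):
    """Send a request and decode the JSON answer returned by Orthanc."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())


# Usage (needs a running Orthanc; the study ID below is a placeholder):
#   with open("instance.dcm", "rb") as dicom_file:
#       send(upload_request(dicom_file.read()))
#   send(anonymize_request("<orthanc-study-id>"))
```

The request builders are kept separate from `send` so the URLs can be checked without a running server.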

I broke the execution and went through all the threads before and after the anonymization. Here’s what they are doing:

HttpServer: waitForExit -> ServerBarrier sleep(100us)
DB Flushing thread -> sleep(500ms)
Unstable resource monitor thread -> sleep(500ms)
ChangeThread: waiting on pthread_cond_timedwait(100ms)
MemoryTrimmingThread: sleep(100ms)
JobsEngine::RetryHandler: sleep(200ms)
JobsEngine::Worker: waiting on pthread_cond_timedwait(200ms)
JobsEngine::Worker: waiting on pthread_cond_timedwait(200ms)
ServerContext::SaveJobsThread: sleep(100ms)
CivetWeb-worker consume_socket: waiting on mutex
CivetWeb-worker accept loop: waiting poll (200ms)
OrthancWebDav::UploadWorker waiting on pthread_cond_timedwait(100ms)
LuaScripting::EventThread waiting on pthread_cond_timedwait(100ms)

Most of these threads are doing nothing once they wake up since there’s nothing to do.
However, when I disabled the MemoryTrimmingThread, the CPU seemed to stay down at 0.3% even after the anonymization.

My assumption right now is that malloc_trim takes more time after anonymization because the memory is ‘fragmented’ or because more memory has been consumed. The subsequent calls still take the same time because the memory cannot be trimmed…

I’ll dig into it a bit deeper. Maybe we should just execute it every 10 seconds instead of every 100ms …

I have just implemented this change: malloc_trim is now called every 30s instead of every 100ms.
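
The idea can be illustrated with a small Python stand-in. This is only a sketch of a periodic trimming thread with a configurable interval, not Orthanc’s actual C++ code; the class name `PeriodicTrimmer` is hypothetical, and the `ctypes` call to `malloc_trim` assumes Linux with glibc.

```python
import ctypes
import threading


def glibc_malloc_trim() -> None:
    """Ask glibc to return free heap memory to the OS (Linux/glibc only)."""
    ctypes.CDLL("libc.so.6").malloc_trim(0)


class PeriodicTrimmer:
    """Call `trim` every `interval_seconds` from a background thread."""

    def __init__(self, trim, interval_seconds: float = 30.0):
        self._trim = trim
        self._interval = interval_seconds
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self.calls = 0  # number of completed trim calls

    def _run(self) -> None:
        # Event.wait() returns False on timeout and True once stop() is
        # called, so the thread sleeps for the whole interval but still
        # exits promptly on shutdown.
        while not self._stop.wait(self._interval):
            self._trim()
            self.calls += 1

    def start(self) -> None:
        self._thread.start()

    def stop(self) -> None:
        self._stop.set()
        self._thread.join()


# Usage on a glibc system: trim every 30 seconds instead of every 100 ms.
#   trimmer = PeriodicTrimmer(glibc_malloc_trim, interval_seconds=30.0)
#   trimmer.start()
#   ...
#   trimmer.stop()
```

With a 30 second interval the cost of a slow trim is paid roughly 300 times less often than with a 100 ms interval, which is the whole point of the change.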

@HDCoder, if you could validate that it solves your issue with the latest osimis/orthanc:master-unstable image, that would be great !!!

Thanks

Alain

Thanks so much, Alain. I’ll try that out and report back. I want to test with several different sizes of studies and leave it idling for a while to see if the effect remains, scales with the size of the study or lingers with time.

John.

So far, so good. I ran a small (750 image) and a slightly larger (2500 image) study through the anonymization.

The CPU spikes as expected during our anonymization process (our Lua/Python scripts) and then drops back down to < 1%, on par with all our other idling Orthancs. Those other Orthancs are running the older version that predates the problem.

So it looks like we’re back to normal!

Once your change works its way into the stable release, I’ll start rolling it out to our existing Orthancs.

Thanks,
John.
