Docker images leaking memory?

Dominic_Bou-Samra1 · April 15, 2020, 5:47am

Hi all.

We have started to run Orthanc in the cloud using GKE. We noticed the memory usage is very odd. This is after uploading just a few studies (maybe 200mb worth of DICOM files). These spikes are the container crashing.
image (4).png

I tried running it locally and observing the memory using this test script:

while :
do
  echo "Deleting"
  http -a orthanc:orthanc delete http://localhost:8042/instances/e12b2655-f71cf5db-4ece6d73-1b8931d3-492c1dc1
  sleep 1
  echo "Adding"
  http -a orthanc:orthanc post http://localhost:8042/instances < US000000.dcm
done

All this script does is repeatedly add and delete a DICOM instance of about 17mb in size. If you want this file, let me know and I can upload it somewhere.

When I run this script against the OSX binary, the RSS of the process never exceeds 50 MB. Under Docker, however, it grows to 4 GB after 20 minutes. I have yet to test the binary on a non-Dockerized Ubuntu install.

Can anyone shed some light on this behaviour? Should Orthanc hold on to memory like this? Am I misreading the RSS memory values?

Dom

jodogne · April 15, 2020, 8:38am

Hello,

Thanks for the report. Yes, please provide your DICOM file so that we can work on reproducing your issue.

Kind Regards,
Sébastien-

Dominic_Bou-Samra1 · April 15, 2020, 8:43am

Here is the file: https://drive.google.com/open?id=1j7SvS1pCyrDOnw8w5ZXTsAF2_We-o0xy

But I am pretty sure it will work on any file actually. Let me know what else you need or want to try

Dominic_Bou-Samra · April 15, 2020, 9:18am

I’ve just fired up my Ubuntu partition on my home PC and tried the Debian Orthanc package. It displays the same behavior as it was displaying inside Docker. The OSX binary I used from the Osimis site (same version) did not display it.

I have observed some curious behavior however: If I run that script, the Orthanc process appears to top out at 1.319gb without going over, after about 1 minute. It will stay there for minutes without going over. However… If I restart my script it will grow again, this time to 2.601gb

I tried compiling from source as per the LinuxCompilation.txt, and things appear to work fine. No leaks displayed, and memory is constant at 35mb used.

jodogne · April 15, 2020, 3:27pm

Hello again,

Are you using Orthanc 1.6.0 with HTTPS? There is a known and fixed problem:
https://groups.google.com/d/msg/orthanc-users/LpSb0JZDj7w/cbFel-OoCwAJ

Please could you give a try with Orthanc 1.5.8?

If this does not solve your issue, please could you try and reduce the “HttpThreadsCount” configuration option from 50 to 10? Your virtual machine is maybe just too slow to process the requests, making them queue in the memory, which results in high memory usage.

HTH,
Sébastien-

Dominic_Bou-Samra1 · April 15, 2020, 11:16pm

Hi Sebastian.

I am not using HTTPS in my tests.

I tried reducing the HttpThreadsCount config value to 10, and this reduced the memory used. This is good news, but I am now somewhat confused.

Why does memory stay very low (35mb) when running with an HttpThreadsCount of 40 with the OSX binary, or with the binary I compiled myself from source on Ubuntu, while the Debian apt package binary and the Docker image cause it to keep growing? What is the difference between these setups?

Dom

jodogne · April 16, 2020, 8:51am

Hello,

Regarding your question about the difference between all these setups:

Dynamic compilation on Ubuntu will use the latest C library for your system.

This contrasts with the Linux Standard Base pre-compiled binaries (available at http://lsb.orthanc-server.com/orthanc/) that are compiled using the LSB SDK and that are statically linked against their dependencies, in order to make them fully portable across distributions. The LSB SDK is a kind of “compatibility mode” for C/C++ applications, with an older libc and an older compiler. This compatibility mode might induce side effects because of the older libc, such as a possibly sub-optimal use of heap memory in the case of multithreading.

In situations where performance is important on a GNU/Linux system, it might be interesting to compile from sources, and dynamically link against the third-party dependencies. This does not reflect the situation of the average user who just wants a solution that works out-of-the-box.

Regarding a mitigation:

In order to mitigate such an issue for the average user, I will reduce the default value of “HttpThreadsCount” from 50 to 10 in forthcoming maintenance release 1.6.1 of Orthanc. Advanced users can increase this value if need be.

Regarding memory usage:

I swear I triple-checked that there is no memory leak in your scenario. Using your script, your DICOM file and Orthanc 1.6.0 LSB (Linux Standard Base), the “massif” tool from valgrind reports maximum heap usage of 80MB.

Note that the “massif” tool reports a more accurate value of the actual memory consumption of the application than the “VmRSS” (resident set size) metric on GNU/Linux that is typically used by memory monitoring tools. Indeed, the “resident set size” takes into account memory blocks that have been allocated by “malloc()”, then released by “free()”, but that are still cached by glibc for future reuse. The technical name for such a cache is “arena”:
https://sourceware.org/glibc/wiki/MallocInternals

The “massif” tool turns off this mechanism of arenas.
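As a minimal sketch of how the “VmRSS” figure discussed above can be read directly on GNU/Linux: the kernel exposes it in the file /proc/<pid>/status. The helper name `rss_kb_from_status` is hypothetical, and the process name “Orthanc” in the usage comment is an assumption about how the binary was started.

```shell
# Hypothetical helper: print the VmRSS value (in kB) found in a
# /proc/<pid>/status-style file given as first argument.
rss_kb_from_status() {
  awk '/^VmRSS:/ { print $2 }' "$1"
}

# Example on the current shell, only if /proc is available (Linux).
if [ -r "/proc/$$/status" ]; then
  rss_kb_from_status "/proc/$$/status"
fi

# Assumed usage against a running Orthanc process:
#   rss_kb_from_status "/proc/$(pidof Orthanc)/status"
```

Keep in mind that this value includes freed blocks still cached by glibc, so it can be much larger than the heap usage that “massif” reports.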

By default, using the LSB binaries in the absence of “massif”, it looks as if each thread were associated with one separate “memory pool / arena” (check out section “Thread Local Cache - tcache”). As a consequence, if each one of the 50 threads in the HTTP server of Orthanc allocates at some point, say, 50MB (which roughly corresponds to 3 copies of your DICOM file of 17MB), the total memory usage reported can grow up to 50 threads x 50MB = 2.5G, even if the Orthanc threads properly free all the buffers.
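The arithmetic above can be sketched as a rough upper bound, assuming that each arena ends up caching about the peak allocation of one request. The helper name `estimate_retained_mb` is hypothetical, and the model ignores everything else the process allocates.

```shell
# Hypothetical helper: rough upper bound (in MB) of memory retained by glibc
# if each of N arenas caches about PEAK_MB of freed blocks.
estimate_retained_mb() {
  arenas=$1
  peak_mb=$2
  echo $(( arenas * peak_mb ))
}

estimate_retained_mb 50 50   # one arena per HTTP thread: prints 2500
estimate_retained_mb 5 50    # at most 5 arenas: prints 250
```

Under this simplified model, the bound shrinks in proportion to the number of arenas rather than to the number of threads.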

A possible solution to reducing this memory usage is to ask glibc to limit the number of “memory pools / arenas”. On GNU/Linux, this can be done by setting the environment variable “MALLOC_ARENA_MAX”. For instance, the following bash command-line would use one single “memory pool /arena” that is shared by all the threads:

$ MALLOC_ARENA_MAX=1 ./Orthanc

On my system, with such a configuration, the “VmRSS” stays at about 100MB. Obviously, this restrictive setting will use minimal memory, but will result in contention among the threads. A good compromise might be to use 5 arenas (this results in RAM usage of 300MB on my system):

$ MALLOC_ARENA_MAX=5 ./Orthanc

I guess that Orthanc binaries that use a more recent version of glibc than the one in LSB might do a better job of sizing the arenas, which could explain why you don’t have problems with binaries you built yourself.

Memory allocation on GNU/Linux is a complex topic. There are many other options available as environment variables that could reduce memory consumption (for instance, “MALLOC_MMAP_THRESHOLD_” would bypass arenas for large blocks). Check out the manpage of “mallopt()”:

http://man7.org/linux/man-pages/man3/mallopt.3.html
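A minimal sketch of a launcher that applies both environment variables mentioned above, unless the caller has already set them. The function name `run_with_malloc_tuning` is hypothetical, and the threshold of 1048576 bytes (1 MiB) is only an illustrative value, not a recommendation from this thread; it should be tuned and measured on the target system.

```shell
# Hypothetical wrapper: run a command with glibc malloc tuning applied.
# Values already present in the environment take precedence.
run_with_malloc_tuning() {
  MALLOC_ARENA_MAX="${MALLOC_ARENA_MAX:-5}" \
  MALLOC_MMAP_THRESHOLD_="${MALLOC_MMAP_THRESHOLD_:-1048576}" \
  "$@"
}

# Assumed usage:
#   run_with_malloc_tuning ./Orthanc
```

Because the variables are set only for the launched command, the rest of the shell session keeps its default allocator behavior.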

HTH,
Sébastien-

Alain_Mazy3 · April 16, 2020, 9:03am

Note that we’re currently working on building new osimis/orthanc Docker images in which Orthanc will be linked dynamically. So hopefully, this problem will disappear with these images.

jodogne · April 16, 2020, 9:32am

As a complement to my previous message, the Docker images “jodogne/orthanc”, “jodogne/orthanc-plugins”, and “jodogne/orthanc-python” have just been updated so that the environment variable MALLOC_ARENA_MAX is set to 5 by default:
https://book.orthanc-server.com/users/docker.html
https://github.com/jodogne/OrthancDocker/commit/3c7e7174acc1509033ab651ff54a15da133d99f8

One can override this default setting as follows (setting it to “0” will resume the old behavior):

$ docker run -p 4242:4242 -p 8042:8042 -e MALLOC_ARENA_MAX=1 --rm jodogne/orthanc-python:1.6.0

Dominic_Bou-Samra1 · April 18, 2020, 1:36am

Thanks all for the thorough explanations and investigations.

@Sébastien, I have deployed the updated 1.6.0 image that has the MALLOC_ARENA_MAX set at 5, and capped my thread count at 10. This has fixed all of our out of memory issues, and memory remains constant at ~ 300mb.

Thanks again

jodogne · April 20, 2020, 4:18pm

For reference, I have finally decided to revert the default value of “HttpThreadsCount” to 50:
https://hg.orthanc-server.com/orthanc/rev/ee0a1211419f

I have indeed noticed that setting this parameter to a lower value strongly reduces the speed of the integration tests.

Dominic_Bou-Samra1 · April 23, 2020, 4:20am

Hmm.

So I am running into memory issues again, this time to do with making HTTP requests. I don’t yet have a repeatable test case (not for lack of trying mind you), or methodology for reproducing it.

I am seeing it when I make many concurrent HTTP requests against Orthanc via an app, getting images, getting tags etc. It seems somewhat related to when requests queue up and can’t be served immediately. The memory will spike to 5+ gb very quickly and promptly crash (similarly to the original issue), and never come back down.

That’s about as much detail as I can get. I have tried using Jmeter and apache bench to simulate what we are doing in our web app, but I can’t get it to do the same thing.
We are running the 1.6.0 orthanc-plugins image with MALLOC_ARENA_MAX=5, HttpThreadsCount=8, Keep-Alive: true

Sorry for the vague details - perhaps I will have more soon.

jodogne · April 23, 2020, 5:39am

Unfortunately, I can’t provide much support without a reproducible scenario…

My first advice would be to try and increase “HttpThreadsCount” back to the default value of 50.

Then, I would run Orthanc in “--verbose” mode so as to see whether a specific REST route causes this growth:
https://book.orthanc-server.com/faq/log.html

Also, I would give a try tuning other environment variables related to malloc(), notably “MALLOC_MMAP_THRESHOLD_”:
https://linux.die.net/man/3/mallopt

Dominic_Bou-Samra1 · April 23, 2020, 5:41am

Thanks for your patience and help. I’ll give those suggestions a go, and let you know what I find.

I have a way to reproduce it repeatably now, but it involves basically replaying every request from my app (hundreds of requests). I’ll try and get a more minimal example going.

Michel_Rozpendowski1 · December 4, 2020, 10:02am

Hi,

For your information, I have encountered the same issue with Orthanc running in a Docker container, using osimis/orthanc:20.10.2
The journal logs are quite explicit:
2020-12-03T15:49:55Z null: Out of memory: Kill process 21400 (Orthanc) score 712 or sacrifice child
2020-12-03T15:49:55Z null: Killed process 21400 (Orthanc) total-vm:23069876kB, anon-rss:20404168kB, file-rss:0kB, shmem-rss:0kB
2020-12-03T15:49:55Z null: oom_reaper: reaped process 21400 (Orthanc), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
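To put the kernel figures above in perspective, here is a minimal sketch that extracts the first “anon-rss” value from such a log and converts it from kB to GiB. The helper name `oom_anon_rss_gib` is hypothetical, and the sketch assumes the log format shown above.

```shell
# Hypothetical helper: read kernel OOM log text on stdin and print the first
# anon-rss value converted from kB to GiB (one decimal).
oom_anon_rss_gib() {
  grep -o 'anon-rss:[0-9]*kB' | head -n 1 | sed 's/[^0-9]//g' |
    awk '{ printf "%.1f\n", $1 / (1024 * 1024) }'
}

echo 'Killed process 21400 (Orthanc) total-vm:23069876kB, anon-rss:20404168kB, file-rss:0kB, shmem-rss:0kB' |
  oom_anon_rss_gib   # prints 19.5
```

In other words, the process held roughly 19.5 GiB of anonymous memory when it was killed.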

Some interesting reading in Docker documentation:

Out Of Memory Exceptions (OOME)
If your services or containers attempt to use more memory than the system has available, you may experience an Out Of Memory Exception (OOME) and a container, or the Docker daemon, might be killed by the kernel OOM killer. To prevent this from happening, ensure that your application runs on hosts with adequate memory and see Understand the risks of running out of memory.

And you can read further about it in the page Understand the risks of running out of memory:

It is important not to allow a running container to consume too much of the host machine’s memory. On Linux hosts, if the kernel detects that there is not enough memory to perform important system functions, it throws an OOME, or Out Of Memory Exception, and starts killing processes to free up memory. Any process is subject to killing, including Docker and other important applications. This can effectively bring the entire system down if the wrong process is killed.

Docker attempts to mitigate these risks by adjusting the OOM priority on the Docker daemon so that it is less likely to be killed than other processes on the system. The OOM priority on containers is not adjusted. This makes it more likely for an individual container to be killed than for the Docker daemon or other system processes to be killed. You should not try to circumvent these safeguards by manually setting --oom-score-adj to an extreme negative number on the daemon or a container, or by setting --oom-kill-disable on a container.

For more information about the Linux kernel’s OOM management, see Out of Memory Management.

You can mitigate the risk of system instability due to OOME by:

Perform tests to understand the memory requirements of your application before placing it into production.
Ensure that your application runs only on hosts with adequate resources.
Limit the amount of memory your container can use, as described below.
Be mindful when configuring swap on your Docker hosts. Swap is slower and less performant than memory but can provide a buffer against running out of system memory.
Consider converting your container to a service, and using service-level constraints and node labels to ensure that the application runs only on hosts with enough memory

My short-term fix was therefore to set the following in my docker-compose.yml file (my host has more than 20G memory available):
version: "2"
services:
  orthanc:
    mem_limit: 10g

Most certainly Orthanc needs less memory than that. The behaviour might be due to the Debian OS setup the Orthanc Docker image is using, and how it is reserving memory, and not due to any memory leak.
I’ll do further investigation and testing and report back to you here later.

Kind regards,

Michel

Michel_Rozpendowski1 · December 4, 2020, 3:49pm

For the record, the mem_limit trick did not prevent the kernel from killing Orthanc once the mem_limit was reached…
I have not been able to reproduce this memory growth locally, though.

I’m now trying with MALLOC_ARENA_MAX=1 instead of the default MALLOC_ARENA_MAX=5.

I will continue to investigate and will report back here in case I find anything valuable.

Michel_Rozpendowski1 · December 4, 2020, 5:01pm

Please note we have discovered that the Docker images built with osimis/orthanc-builder, contrary to the jodogne/orthanc ones, were not setting any value for MALLOC_ARENA_MAX. As a consequence, glibc was using its default number of memory pools, which on 64-bit systems is 8 times the number of CPU cores. In my case, that was 8 times 8 = 64 memory pools, contrary to the 5 advised in the Orthanc Book.

This has been fixed with the following commit https://bitbucket.org/osimis/orthanc-builder/commits/eef05cd27b1ea4571d723abb298f5f7cd7c9ec68 and published as of Osimis Orthanc Docker image 20.12.2.
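A minimal sketch of how one might check for this situation on a running system, assuming GNU/Linux: the environment of a process is exposed as NUL-separated entries in /proc/<pid>/environ. Both helper names are hypothetical, and the 8-per-core figure is the 64-bit default described above.

```shell
# Hypothetical helper: default arena cap on 64-bit systems when
# MALLOC_ARENA_MAX is not set, i.e. 8 times the number of CPU cores.
default_arena_cap() {
  echo $(( 8 * $1 ))
}

# Hypothetical helper: print the MALLOC_ARENA_MAX value found in a
# NUL-separated environ file, or "unset" if it is absent or empty.
arena_max_from_environ() {
  tr '\0' '\n' < "$1" | sed -n 's/^MALLOC_ARENA_MAX=//p' | grep . || echo "unset"
}

default_arena_cap 8   # prints 64

# Assumed usage against a running Orthanc process:
#   arena_max_from_environ "/proc/$(pidof Orthanc)/environ"
```

If the second helper prints “unset”, the process is running with the glibc default rather than an explicit cap.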

I hope this helps!

Cheers,

Michel