I have been using 1.12.4 for many months without problems.
Recently, Orthanc has been crashing on random studies stored in S3.
These are the last verbose log messages before the restart:
I1218 11:49:05.456958 HTTP-22 AWS S3 Storage:/StoragePlugin.cpp:259] AWS S3 Storage: read whole attachment 2013cb3e-2539-45ca-a58f-8ec98c2ba267 (36.46MB in 4.44s = 68.96Mbps)
I1218 11:49:05.567547 HTTP-35 StorageCache.cpp:128] Read attachment "4ab33e95-12d3-4d1f-af77-4b88871dbd61" with content type 1 from cache
W1218 11:49:08.857127 MAIN main.cpp:2059] Orthanc version: 1.12.4
Any suggestions on where to start troubleshooting?
You should check the memory usage, e.g. with docker stats, at the time Orthanc is crashing. What is the RAM size of your host?
Do not hesitate to look further back in the logs than the last 3 lines before the crash. Maybe you actually have 50 HTTP clients each requesting a 35 MB file from S3.
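If it helps, here is a minimal sketch of how memory usage could be watched over time with docker stats; the container name orthanc is only a placeholder for whatever your setup uses:

```sh
# Print a memory snapshot of the "orthanc" container every 5 seconds
# ("orthanc" is a placeholder container name; adjust to your setup).
while true; do
  docker stats --no-stream --format "{{.Name}}: mem {{.MemUsage}} ({{.MemPerc}})" orthanc
  sleep 5
done
```

Correlating the last sample before the crash with the total RAM of the host should already tell you a lot.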
In this scenario, there is only 1 user requesting a single study. I thought the same study could generate the crash, but it turns out that's not the case (I posted a Google link to the study earlier in this thread). The logs before the last 3 lines are just more calls to S3 for instances of the same study.
Using docker stats: is it normal for Orthanc memory usage to grow and not shrink when a user views one study after another and then closes the viewer window?
There are caches, and the memory is not always reclaimed directly.
Are you under the impression that your container could be OOM-killed?
You might want to give it a try on a dev machine with more memory, to see if that helps. There are also settings that help lower Orthanc memory usage, by the way.
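As a rough sketch only (the option names below are from memory, so please double-check them against the configuration documentation for your Orthanc version, and the values are arbitrary examples rather than recommendations):

```json
{
  // Fewer HTTP worker threads means fewer attachments being loaded concurrently
  "HttpThreadsCount" : 10,

  // Cap the in-memory storage cache (value in MB)
  "MaximumStorageCacheSize" : 64
}
```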
Orthanc memory usage strongly depends on your use case. Orthanc can happily run on a Raspberry Pi with 4 GB of RAM, or it can use a huge PostgreSQL database and manage terabytes of DICOM data without issues, depending on what you're asking it to do.
Setting MALLOC_ARENA_MAX is a good compromise between thread contention and memory usage, but it is more of a second-level setting: it will not help much if your setup is too constrained in the first place.
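For reference, MALLOC_ARENA_MAX is a glibc environment variable, so it can simply be passed to the container; a minimal sketch, with your_orthanc_image as a placeholder:

```sh
# Cap the number of glibc malloc arenas to reduce memory fragmentation
# ("your_orthanc_image" is a placeholder image name).
docker run -e MALLOC_ARENA_MAX=2 your_orthanc_image
```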
You can use a few different methods to find out whether your container is killed because of memory usage (a small consolidated check is also sketched after this list):
You can check the Docker container exit code (docker ps -a --no-trunc): 137 is a telltale sign of an OOM-killed container.
You can check the kernel ring buffer for OOM messages: dmesg | grep -i "oom"
You can use docker stats while your container is running and try to correlate the container exiting with a memory metric reaching a certain point.
You can limit the memory used by your container with the --memory and --memory-swap flags and check whether this makes the container crash faster. For instance: docker run -it --memory=1g --memory-swap=2g your_orthanc_image
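As the consolidated check mentioned above, something like this covers the first two points in one go (orthanc is a placeholder container name; docker inspect also exposes an OOMKilled flag in the container state):

```sh
# Exit code and OOMKilled flag of the stopped container
# ("orthanc" is a placeholder container name; adjust to your setup).
docker inspect --format 'exit={{.State.ExitCode}} oom_killed={{.State.OOMKilled}}' orthanc

# Kernel ring buffer entries mentioning the OOM killer
dmesg | grep -i "oom"
```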
Please let us know if this is still an issue and whether you need help or advice.
In the meantime, the crash happened again. It appears to happen while Orthanc is retrieving a file from S3.
I0103 12:59:59.379233 7f9fcc7c06c0 AWS S3 Storage:/StoragePlugin.cpp:159] AWS S3 Storage: reading range of attachment 868d300d-fc46-41b9-b27a-b057d4135dc3 of type 1
is the last entry before Orthanc stops responding.
The earlier entries for the same attachment are:
I0103 12:58:20.548129 DICOM-2 AWS S3 Storage:/StoragePlugin.cpp:110] AWS S3 Storage: creating attachment 868d300d-fc46-41b9-b27a-b057d4135dc3 of type 1
I0103 12:58:20.684867 DICOM-2 AWS S3 Storage:/StoragePlugin.cpp:134] AWS S3 Storage: created attachment 868d300d-fc46-41b9-b27a-b057d4135dc3 (8.03MB in 136.75ms = 492.86Mbps)
Have you tried enabling all relevant logs so that we can really see what the very last operation is before this crash?
You might want to enable the S3 logs with the EnableAwsSdkLogs option (in the S3 plugin configuration block), as well as all the other Orthanc logs.
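A minimal sketch of what that could look like, assuming the usual AwsS3Storage section name from the sample configuration (keep your existing bucket and credentials settings; this only adds the logging flag, and Orthanc configuration files accept comments):

```json
{
  "AwsS3Storage" : {
    // ... your existing BucketName / Region / credentials settings ...
    "EnableAwsSdkLogs" : true
  }
}
```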
Could you check what the exit code is when Orthanc is stopped like that?
Also, you might want to perform the same kind of operations with smaller studies, then bigger ones, and check whether the size is somehow correlated with how soon this crash happens.
Also, if it’s not too cumbersome based on your setup, you might want to compare the behavior of Orthanc inside a container with it running directly on your host (for instance, the LSB binaries).
The problem is intermittent, so I'm hesitant to turn on too much logging.
Logging is currently set to verbose.
If the S3 logs aren't too heavy, I will turn those on too.
S3 log volume depends on the number of studies you are handling and on the specific modalities you are using. For instance, some fMRI studies contain tens of thousands of very small instances; in that case, logging every object stored in S3 (1 object == 1 instance) will be very verbose. But many modalities have a much smaller number of instances per study. In your example, the instances seem to be on the bigger side (8 MB), so S3 log volume should be less of a problem (don't trust me on that and make some tests!).
In general, it would take a significant amount of logging to actually slow your system down, at least if you switch to stdout logging as suggested (it's quite common for containers to be verbose). I don't know how well writing logs to a shared folder is optimized by the container runtime.
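If the logs do go to stdout, following them and keeping only the S3-related lines is straightforward; a small sketch, with orthanc again as a placeholder container name:

```sh
# Follow the container's stdout/stderr and keep only S3-related lines
# ("orthanc" is a placeholder container name).
docker logs -f orthanc 2>&1 | grep -i "S3 Storage"
```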