Hi,
We have 2 systems that run Orthanc with the following configuration:
orthancteam/orthanc:latest (1.12.6)
PostgreSQL DB (one uses Postgres v13, the other Postgres v17), configured with PostgreSQL plugin 7.1, for indexing only
Application script that loops and checks every instance in Orthanc for processing
These 2 servers appear to have a memory leak. Our application script loops through every instance to check a variety of different things. It uses the REST API and checks the data in instances/:instance_id/metadata one instance at a time. Once finished, it waits and then starts again (it’s brute force, but it works).
We’re having reliability problems with these 2 servers: they regularly (1+ times a day) run out of memory and require Orthanc to be restarted to clear the memory.
From observing the servers through htop, it appears Orthanc’s memory usage grows by approximately 10 MB every ~10 minutes. Without restarting Orthanc, memory usage on our servers can exhaust all page file space in 12-24 hours.
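To put numbers on that growth without watching htop by hand, here is a minimal sketch that polls the process RSS from inside the container. It assumes a Linux host and that you know Orthanc’s PID (e.g. from `pidof Orthanc`); the function names are my own, not part of any tool.

```python
import re
import time

def parse_vmrss_kb(status_text):
    """Extract the resident set size (VmRSS, in kB) from /proc/<pid>/status text."""
    match = re.search(r'^VmRSS:\s+(\d+)\s+kB', status_text, re.MULTILINE)
    return int(match.group(1)) if match else None

def log_rss(pid, interval=60):
    """Print the process RSS periodically, to correlate growth with client activity."""
    while True:
        with open(f'/proc/{pid}/status') as f:
            rss_kb = parse_vmrss_kb(f.read())
        print(f'{time.strftime("%H:%M:%S")} RSS: {rss_kb} kB')
        time.sleep(interval)
```

Logging this alongside the application script makes it easy to see whether the growth tracks the polling loop.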
If I stop our application script, the memory leak either slows to a manageable level or stops.
I’m wondering if there is any troubleshooting/debugging advice I can use to identify where this memory leak comes from?
It’s quite difficult to investigate this kind of leak in a container.
If you could provide us with the simplest script calling the API that generates this leak, I should hopefully then be able to quickly identify it with valgrind on my system.
Here is a simple Python script that is the core of our application. I hadn’t verified that this script causes the same memory leak but will try it now (edited: I just verified that it does cause the swap usage to keep growing).
import time

import requests

url = 'http://localhost:8042'

def get_instance_metadata(instance_id):
    return requests.get(f'{url}/instances/{instance_id}', params={'expand': True}).json()

def get_all_instances():
    # Page through /instances with limit/since until a short page signals the end
    all_instances = []
    limit = 100
    count = 0
    while True:
        params = {'limit': limit, 'since': count * limit}
        instance_list = requests.get(f'{url}/instances', params=params).json()
        all_instances.extend(instance_list)
        if len(instance_list) < limit:
            break
        count += 1
    return all_instances

def loop():
    while True:
        instances = get_all_instances()
        for instance_id in instances:
            metadata = get_instance_metadata(instance_id)
        time.sleep(60)

loop()
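As a side note on the client, each `requests.get` above opens a fresh HTTP connection; a `requests.Session` reuses connections, which reduces churn on both ends while keeping the same request pattern. A sketch of that variant (not a fix for the leak itself; `paginate` and `fetch_instances` are my own helper names):

```python
import requests

url = 'http://localhost:8042'
session = requests.Session()  # one Session reuses TCP connections across all requests

def paginate(fetch_page, limit=100):
    """Collect all items from a paged endpoint; fetch_page(since, limit) returns one page."""
    items = []
    count = 0
    while True:
        page = fetch_page(count * limit, limit)
        items.extend(page)
        if len(page) < limit:
            return items
        count += 1

def fetch_instances(since, limit):
    return session.get(f'{url}/instances', params={'limit': limit, 'since': since}).json()

# all_instances = paginate(fetch_instances)
```

Separating the pagination logic from the HTTP call also makes the loop testable without a running Orthanc.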
Here is a minimal docker-compose.yml file that describes our server setup.
That was a tough one, because it was actually not detected by Valgrind or other memory-checking tools!
We were actually storing one SQL prepared statement for each combination of since & limit, so it only happened if you had a lot of instances. I was finally able to trigger it quite rapidly by storing 10,000 instances and requesting them in pages of 2 → this created 5,000 different prepared SQL statements → and, after the fix, only 1!
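That failure mode can be sketched in pure Python: if the since/limit values are baked into the SQL text, a cache keyed by SQL text gains one entry per distinct page, whereas placeholders keep it at a single entry. This is a toy model of the bug, not the actual plugin code, and the SQL text is illustrative:

```python
def fetch_page(cache, since, limit, parameterized):
    """Toy model of a prepared-statement cache keyed by the SQL text."""
    if parameterized:
        # Fixed behaviour: one statement with placeholders, values bound at execute time
        sql = 'SELECT publicId FROM Resources LIMIT $1 OFFSET $2'
    else:
        # Buggy behaviour: values embedded in the SQL, so every page is a new statement
        sql = f'SELECT publicId FROM Resources LIMIT {limit} OFFSET {since}'
    cache.setdefault(sql, object())  # "prepare" once per distinct SQL text

def run(parameterized, instances=10_000, limit=2):
    cache = {}
    for since in range(0, instances, limit):
        fetch_page(cache, since, limit, parameterized)
    return len(cache)

# run(False) -> 5000 cached statements; run(True) -> 1
```

With 10,000 instances read in pages of 2, the unparameterized version ends up with 5,000 cached statements, matching the numbers above.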
Morning Alain,
I wasn’t able to test the build, as I’m not sure it was successful. Looking at the GitHub Actions, I think the unstable build failed.
James