While tracking down performance bottlenecks in our Orthanc deployment on fly.io using Postgres for index and Tigris as an S3 storage backend, I found one issue:
In ServerContext.cpp in the DicomCacheLocker constructor, there is logic that is supposed to limit cache-locking to (currently hard-coded) at most one cache-entry that exceeds 50 MB (currently hard-coded as well). This uses a semaphore locker.
The problem is that the semaphore lock is acquired ahead of time (to be sure) and only released once the DICOM data is completely retrieved (ServerContext:ReadDicom()has finished).
ReadDicom() in our context has two stages:
- lookup of the storage attachment from Postgres
- retrieval of the payload from S3
Both involve network interactions and can be slow depending on network condition, service load, etc.
In fact, after step 1, the uncompressed size of the payload is already known and the semaphore lock could typically be released before the payload is retrieved. This would allow for better parallelism when serving (“small”) image retrievals.
We actually found this issue when looking at verbose logs and saw that HTTP processing would stop after permissions checks (advanced authorization plugin) and before doing Postgres index lookups unexpectedly until a different thread had finished S3 transfers.
I am attaching a patch that passes in the semaphore locker to ReadDicom() such that the big/small check and lock release can be done there once the index data has been retrieved.
As I am not yet allowed to upload attachments, please see here: Orthanc patch: Release large-dicom semaphore-lock directly after reading index · GitHub
I’d be glad if this (or a similar amendment) could find its way into Orthanc.
We still have performance bottlenecks but they are somewhere else. Our observation is that Orthanc serves image data a lot slower (plus ~ 300ms) when using a Postgres index instead of the local sqlite index even though raw Postgres latency is at about 3-4 ms.