I don’t think the DB connection count should have any real impact on this.
Yes, it didn’t change anything.
However, I still see very poor performance on the /metadata API, with a waiting time of 1+ minute for 400 instances.
I don’t see where the bottleneck is: the database responds in 1 ms, and downloading DICOM with threading performs well, so I guess the object storage is also responding fast.
I’m using ORTHANC__EXTRA_MAIN_DICOM_TAGS__INSTANCE / series; could it interfere with metadata retrieval?
Best regards,
Salim
Another thing: when I run concurrent requests, the response delay seems to increase.
Viewers like OHIF don’t issue a single series metadata call; they issue one call per series, concurrently.
So /metadata access was in a sense already threaded, since you had n concurrent series/metadata requests; an MRI study usually has 20+ series, so OHIF will emit 20 concurrent requests, each of which would itself be 4-threaded.
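For what it’s worth, this kind of parallel load can be reproduced from the command line with something like the sketch below (the study UID is a placeholder and `jq` is assumed to be available):

```bash
# Rough sketch: fetch the series list of a study (QIDO-RS), then request every
# /series/{uid}/metadata with up to 20 parallel connections, like OHIF does.
BASE=http://localhost:8042/dicom-web
STUDY=1.2.840.99999.1.2.3            # placeholder StudyInstanceUID

time (curl -s "$BASE/studies/$STUDY/series" \
        | jq -r '.[]["0020000E"].Value[0]' \
        | xargs -P 20 -I {} curl -s -o /dev/null \
            "$BASE/studies/$STUDY/series/{}/metadata")
```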
Could we be creating a bottleneck in the storage backend this way, if we have too many parallel threaded requests?
Salim
Dear all,
I have spotted a problem brought by the following changeset: orthanc-dicomweb: 39b7ccaa6dfc
This changeset disables the calls to the optimized OrthancPluginLoadDicomInstance()
function that was introduced in Orthanc 1.12.1, at least as soon as multiple threads are used. I guess @alainmazy will dig into this. Because of this, I feel like there should be almost no difference in the speed of DICOMweb 1.13 vs 1.14.
On the other hand, the methodology currently used in this thread will be unable to identify the cause of the bottleneck. We need to agree on a common dataset, a common benchmark, and a common setup. Furthermore, the configurations and hardware setups reported so far are highly complex and varied, so the thread essentially boils down to comparing apples and oranges, without identifying the core problem.
As far as I’m concerned, I have just made a test using the Radiogenomics.7z
archive corresponding to the “TCIA, Siemens, PET study” from Aliza Medical Imaging. This archive contains 810 DICOM instances with an uncompressed transfer syntax, so it is representative of an average clinical DICOM study.
I’ve compiled Orthanc 1.12.1 and DICOMweb 1.13 + 1.14 from source (no Docker), inside a fresh build directory and using -DCMAKE_BUILD_TYPE=Release, on Ubuntu 18.04 (my development machine). No script and no other plugin was used. My Orthanc storage was on a basic HDD, and I used the built-in SQLite database. Then, I executed the following DICOMweb WADO-RS request:
```bash
$ time curl http://localhost:8042/dicom-web/studies/1.3.6.1.4.1.14519.5.2.1.4334.1501.757929841898426427124434115918/metadata > /tmp/wado.json
```
Both setups run in the same amount of time (about 800 ms), which I don’t find excessive for 810 DICOM instances. We are thus extremely far away from the figures reported by Salim. Note however that DICOMweb 1.14 brought no improvement over DICOMweb 1.13, presumably because of the cancelling of OrthancPluginLoadDicomInstance(), but it has more unstable runtimes (ranging between 600 ms and 1 second), presumably because of the use of threads.
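For instance, this run-to-run variability can be observed by simply repeating the same request and printing the total time reported by curl:

```bash
# Repeat the WADO-RS request from above a few times and print the elapsed time.
for i in $(seq 1 5); do
  curl -s -o /dev/null -w "run $i: %{time_total}s\n" \
    http://localhost:8042/dicom-web/studies/1.3.6.1.4.1.14519.5.2.1.4334.1501.757929841898426427124434115918/metadata
done
```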
Summarizing, before investigating complex scenarios involving OHIF, multiple PostgreSQL connections, Docker, and Azure, let’s rule out such complexity and start by validating simple scenarios.
Kind Regards,
Sébastien
My bad. This was an erroneous statement: OrthancPluginLoadDicomInstance()
is called from another path I hadn’t spotted in the source code. Sorry for the noise @alainmazy.
That being said, it doesn’t negate the need to define a benchmarking scenario.
Yes absolutely,
Let me check on a more standard deployment using Docker and a conventional VM in Azure, to start to see whether the problem is due to Azure Container Instances (I have doubts about the quality of the container instances).
On my side, I have been doing some tests with Sebastien’s dataset.
First series is the one ending with 63 (324 instances)
Second series is the one ending with 75 (486 instances)
| | Docker 23.6.1 (PG) | Docker 23.7.0 (PG) | Native 1.12.1-1.14 debug (SQLite) | Native 1.12.1-1.14 LSB Release (SQLite) |
|---|---|---|---|---|
| **Azure storage from my dev PC (very slow!!!)** | | | | |
| In OHIF (both /series/metadata loaded together) | NA | 29s & 36s (total time = 36s) | | |
| In Stone (both /series/metadata loaded one after the other) | 16s + 57s (total time = 73s) | 14s + 21s (total time = 35s) | | |
| **Local storage SSD on my dev PC** | | | | |
| In OHIF (both /series/metadata loaded together) | NA | 2.33s & 2.94s (total time = 2.94s) | 0.49s + 0.56s (total time = 0.56s) | 0.33s + 0.45s (total time = 0.45s) |
| In Stone (both /series/metadata loaded one after the other) | 1.98s + 1.33s (total time = 3.31s) | 1.80s + 1.70s (total time = 3.50s) | 1.88s + 0.38s (total time = 2.26s) | 1.81s + 0.36s (total time = 2.17s) |
| With curl | 0.76s + 1.19s (total time = 1.95s) | 1.05s + 1.60s (total time = 2.65s) | 0.26s + 0.31s (total time = 0.57s) | 0.23s + 0.28s (total time = 0.51s) |
Continued testing only with curl:

| | Native 1.12.0-1.13 LSB Release (SQLite) | Docker with 1.12.1-1.14 LSB Release (PG) | Docker with 1.12.1-1.14 LSB Release (SQLite) | Docker with 1.12.0-1.13 LSB Release (SQLite) |
|---|---|---|---|---|
| With curl | 0.32s + 0.45s (total time = 0.77s) | 1.24s + 1.74s (total time = 2.98s) | 0.29s + 0.34s (total time = 0.63s) | 0.34s + 0.98s (total time = 1.32s) |

| | Docker with 1.12.1-1.14 LSB Release (PG + IndexConnectionsCount=5 instead of 1 in all other tests) |
|---|---|
| With curl | 0.94s + 1.45s (total time = 2.39s) |
To summarize:
- 23.7.0 (1.12.1+1.14) brings some improvements with Azure storage (2x faster than 23.6.1)
- OHIF is usually faster than Stone to get the metadata of 2 series because OHIF queries them in parallel while Stone queries them one after the other
- Stone starts reading the frames as soon as it gets its first response, which slows down the second query because the drive/DB are busy doing other things
- Given that we are using a very small DB, SQLite has a huge positive impact on performance compared to PostgreSQL (5x faster!!! on my dev machine!)
- With SQLite, there is roughly a 30-40% performance gain between 1.12.0+1.13 and 1.12.1+1.14. With PostgreSQL, this gain is not visible at all
- Docker does not slow down the system compared to native executables
- LSB binaries and dynamically linked binaries have roughly the same performance
- With PostgreSQL, increasing IndexConnectionsCount from 1 to 5 slightly improves performance (20%); see the configuration sketch after this list
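For reference, here is the shape of the relevant part of the PostgreSQL plugin configuration used for that last test (connection parameters are placeholders; only IndexConnectionsCount is the setting under discussion):

```bash
# Fragment of the Orthanc configuration (JSON) for the PostgreSQL index plugin.
cat > postgresql.json << 'EOF'
{
  "PostgreSQL": {
    "EnableIndex": true,
    "Host": "db",
    "Database": "orthanc",
    "Username": "orthanc",
    "Password": "orthanc",
    "IndexConnectionsCount": 5
  }
}
EOF
```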
It seems that the next step is to improve the Orthanc core and the PostgreSQL & DicomWeb plugins to reduce the number of DB queries, since this seems to have quite a big impact.
Best regards,
Alain.
Interesting. On my production server I have a PostgreSQL database (an Azure flexible PostgreSQL server).
I checked the metrics of my database: I see spikes of requests while using the DICOMweb API, but nothing close to overloading the database in terms of IOPS/CPU (5-10% usage).
But even without overloading the database server, if Orthanc issues a lot of sequential SQL requests, that could delay the response.
Do you have an idea of how many SQL queries a series /metadata call can generate? By what factor could you reduce it?
Do we have a similar bottleneck in the /archive API? When requesting a lot of instances, there is a significant waiting time before the ZIP streaming starts. I guess this is due to Orthanc collecting the instance references to put into the ZIP; is there room to aggregate SQL queries in this API too?
Best regards,
Salim
As a rough estimate, I would say that there are 5 small SQL queries per instance and we hope to reduce it to 5 large SQL queries whatever the number of instances.
This could also impact the /archive route.
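To put rough, purely illustrative numbers on it: with 5 queries per instance, a 400-instance series triggers about 2000 database round trips; even at only 2-3 ms of network latency per round trip to a managed PostgreSQL server, that is already 4-6 seconds spent on latency alone, which is why batching the queries should make such a difference.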
Wow, that’s a huge difference!
If you need funding, please estimate the number of hours you need and we can discuss how to use the funding you have pending for our projects.
Best regards
Salim
Hi @salimkanoun
To get back to these series/{uid}/metadata speed issues: I have just implemented some caching for this route, as you suggested a while ago.
On the StableSeries event, the output of series/{uid}/metadata is saved as a compressed attachment to the series, which means that it can be read as is when accessing this URL. This dramatically improves the loading time, since only one file must be read instead of one file per instance as in the previous version.
This might of course slow down the ingest speed a bit but, if the Orthanc cache is large enough, all files are still cached when the StableSeries event occurs.
Of course, for content that was ingested before this version of the DicomWeb plugin, the cache has not been filled; I’m currently updating the Housekeeper plugin to handle this.
I have made a test Docker image named osimis/orthanc:dw. Could you possibly give it a try before I release the plugin?
Note: This change applies to the “Full” mode.
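A rough way to try it locally (ports, volume and security settings below are just an example to adapt to your setup):

```bash
# Quick local trial of the test image; authentication is disabled here only to
# simplify the test, do not do this on a production deployment.
docker run --rm -d -p 8042:8042 -p 4242:4242 \
  -e ORTHANC__AUTHENTICATION_ENABLED=false \
  -e ORTHANC__DICOM_WEB__SERIES_METADATA=Full \
  -v orthanc-storage:/var/lib/orthanc/db \
  osimis/orthanc:dw

# After a series has become stable, the cached metadata should show up as an
# additional attachment of the series (placeholder series ID):
curl http://localhost:8042/series/<series-id>/attachments
```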
Thanks,
Alain.
Dear Alain,
Thanks! Very good news, I will give it a try this week.
Best regards,
Salim
Dear Alain,
Yes, it’s working perfectly: the /series/metadata API has dropped from 6-7 seconds for the first call to 0.2-0.3 seconds for every subsequent call, so the caching is working very well, with a huge performance improvement (as well as lowering the pressure on the DB/storage backend).
I have only one question: your caching is triggered either by the first call to the /metadata API or by the StableSeries event.
The risk I see is that if someone calls the metadata API before the StableSeries event, while Orthanc is still receiving instances, you will build a partial cache: the cached data will not represent all the instances that are available a few minutes later.
I think the StableSeries event should have priority and force the cache to be renewed even if a cache is already available (since StableSeries was triggered, something has changed, so the existing cache should be invalidated in this case).
Do you have this kind of priority?
In any case, this looks like a very cool improvement for the DICOMweb API; thanks for your work!
Best regards,
Salim
Hi Salim,
Thanks for testing!
Yes, the StableSeries event does not check if the data has already been cached and always overwrites the cache.
However, I have just added versioning and instance counting in the cache to handle future fixes and missed StableSeries events that can occur in case of restart before the StableAge is reached.
Best regards,
Alain
Perfect, thanks for this very nice improvement !
Hi Alain,
That’s a nice solution. I had also implemented an external system which caches series metadata once the StableSeries event occurs. I saw that you are catching the event inside this plugin:
```cpp
static OrthancPluginErrorCode OnChangeCallback(OrthancPluginChangeType changeType,
                                               OrthancPluginResourceType resourceType,
                                               const char *resourceId)
{
  try
  {
    switch (changeType)
    {
      case OrthancPluginChangeType_OrthancStarted:
        OrthancPlugins::Configuration::LoadDicomWebServers();
        break;

      case OrthancPluginChangeType_StableSeries:
        CacheSeriesMetadata(resourceId);
        break;

      default:
        break;
    }
  }
  catch (Orthanc::OrthancException& e)
  {
    LOG(ERROR) << "Exception: " << e.What();
  }
  catch (...)
  {
    LOG(ERROR) << "Uncatched native exception";
  }

  return OrthancPluginErrorCode_Success;
}
```
I think it would be nice to have a configuration option that enables/disables the cache, so the user can decide whether to use it or not; this way it does not affect the performance of currently running systems. Also, I think this cache is only effective for series which have a lot of instances; other modalities like DR, US, etc. have few instances, so caching is not necessary for them.
Secondly, it would be nice if you exposed an API that deletes and creates the cache, so the user can proactively do whatever they want in case of events like an Orthanc restart, an incomplete cache, or whatever else.
Thirdly, since attachments take up a certain amount of storage, it would be even better if the Orthanc core could count this amount into the statistics (the Total* fields below):
```
curl -i localhost:8042/statistics
{
   "CountInstances" : 94570,
   "CountPatients" : 400,
   "CountSeries" : 1906,
   "CountStudies" : 482,
   "TotalDiskSize" : "45097173060",
   "TotalDiskSizeMB" : 43008,
   "TotalUncompressedSize" : "45097173060",
   "TotalUncompressedSizeMB" : 43008
}
```
Hi Christophe,
Thanks for your feedback and suggestions.
- the overhead in terms of storage is minimal: e.g. for a CT series of 600 instances of 1.2 MB each, the cache consumes only an additional 52 KB out of 720 MB
- the size of the cache is already counted in the /statistics route, which counts the size of all attachments
- to keep the API simple, I do not consider it necessary to offer a dedicated route to clear the cache; this can be done by calling DELETE http://localhost:8042/series/../attachments/4031
- there is already a route to force an update of the cache, by posting an empty payload to http://localhost:8042/studies/../update-dicomweb-cache; this route is used by the Housekeeper plugin (see the sketch below)
I have just added a new "EnableMetadataCache" configuration option to enable/disable the cache.
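For the record, here is roughly how these pieces fit together (resource IDs are placeholders):

```bash
# Configuration fragment (JSON) to turn the metadata cache on or off:
cat > dicomweb-cache.json << 'EOF'
{
  "DicomWeb": {
    "EnableMetadataCache": true
  }
}
EOF

# Drop the cached metadata of one series (attachment 4031, as mentioned above):
curl -X DELETE http://localhost:8042/series/<series-id>/attachments/4031

# Force a rebuild of the cache for a whole study by posting an empty payload
# (this is the route used by the Housekeeper plugin):
curl -X POST -d '' http://localhost:8042/studies/<study-id>/update-dicomweb-cache
```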
Best regards,
Alain
This is a great improvement. Just a question: is the metadata stored already gzipped, or does gzipping happen on the fly?
For new studies, the metadata is stored in gzip format when they are received (actually when they become stable).
For older studies, it is computed on the fly the first time you try to access them.
HTH,
Alain