Accelerating the archive API by making instance collection asynchronous

Hi there,

I have a small question on an Orthanc Plugin we want to build.

Historically, Orthanc had a slow API for getting the ZIP archive, which has been clearly improved by the streaming of the ZIP response.

Now the ZIP streaming starts almost immediately, but the transfer speed depends on the ability of Orthanc to retrieve instances from the storage system (for instance, my testing server delivers 5 MB/s, which is good but does not saturate the 12 MB/s of bandwidth that I was reaching when the file was prepared on the server side).

In a discussion I had a while ago with Alain, he told me that this API retrieves instances synchronously, with a single thread.

We are about to work on a Python plugin for Orthanc that will try to retrieve these instances asynchronously, with 3 parallel threads.
I think this could lead to a performance improvement, especially if the DICOM files are stored in buckets, as a bucket should not slow down under parallel requests.
Additionally, it will be a first exercise for us in writing an Orthanc plugin and climbing the learning curve (GPL-licensed, of course).

So my questions would be:

  • Am I correct in my assumption that asynchronous file reading can improve the archive speed? (Assuming we have a cloud database and a cloud bucket that can handle multithreading.)

  • In this plugin, to identify the instances to retrieve, we will call the /studies|series/{orthancID}/instances route through an HTTP call to Orthanc itself, in order to build the array of instances to collect. Is it fine to use HTTP calls from a plugin, or should we always prefer the internal methods?

  • I see that a plugin can return a buffer response to Orthanc. So I guess that if we build the ZIP as a stream in Python, we will be able to transform it into a buffer for Orthanc, and thus reproduce the native streaming output to the client. Is that correct?

With this advice, we hope to be able to contribute a plugin (at least a first one, because we won't stop there).

Best regards,

Salim

Hi Salim,

Here are my answers to your 3 questions:

- Am I correct in my assumption that asynchronous file reading can improve the archive speed? (Assuming we have a cloud database and a cloud bucket that can handle multithreading.)

Yes: if you use a storage plugin like S3, it should benefit from reading from multiple threads at the same time.
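
As a minimal sketch of the pattern (the helper names and the worker count are only illustrative, and it assumes an Orthanc server on http://localhost:8042 without authentication):

```python
# Sketch: download the instances of a study with 3 parallel workers.
from concurrent.futures import ThreadPoolExecutor
import requests

ORTHANC = 'http://localhost:8042'
MAX_WORKERS = 3  # the number of parallel retrieval threads proposed above

def fetch_instance(instance_id):
    # Each call hits the storage backend (e.g. an S3 bucket) independently
    r = requests.get(f'{ORTHANC}/instances/{instance_id}/file')
    r.raise_for_status()
    return instance_id, r.content

def fetch_study(study_id):
    # List the instances of the study, then fetch them in parallel
    instances = requests.get(f'{ORTHANC}/studies/{study_id}/instances').json()
    ids = [i['ID'] for i in instances]
    with ThreadPoolExecutor(max_workers=MAX_WORKERS) as pool:
        return dict(pool.map(fetch_instance, ids))
```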

- In this plugin, to identify the instances to retrieve, we will call the /studies|series/{orthancID}/instances route through an HTTP call to Orthanc itself, in order to build the array of instances to collect. Is it fine to use HTTP calls from a plugin, or should we always prefer the internal methods?

It's a bit more efficient to use the internal methods (but the difference should be negligible). Using an HTTP call (e.g. with the requests library) is nice because it lets you develop and test your plugin outside Orthanc.
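
For illustration, the two variants could look like this (the function names are mine, and the HTTP variant assumes Orthanc on http://localhost:8042):

```python
import json

def list_instances_internal(study_id):
    # Inside the plugin: the internal call bypasses the HTTP stack.
    # The 'orthanc' module is injected by the Python plugin at runtime.
    import orthanc
    return json.loads(orthanc.RestApiGet(f'/studies/{study_id}/instances'))

def list_instances_http(study_id):
    # Outside Orthanc (development and tests): plain HTTP to the REST API.
    import requests
    r = requests.get(f'http://localhost:8042/studies/{study_id}/instances')
    r.raise_for_status()
    return r.json()
```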

- I see that a plugin can return a buffer response to Orthanc. So I guess that if we build the ZIP as a stream in Python, we will be able to transform it into a buffer for Orthanc, and thus reproduce the native streaming output to the client. Is that correct?

The Python SDK does not currently expose the primitive to reply with a chunked response (the equivalent of this C++ method). So I'm afraid that it is currently not possible to develop such a plugin in Python.
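
What remains possible today is to build the whole ZIP in memory and answer it as a single buffer, i.e. without streaming. A minimal sketch, using a hypothetical /studies/{id}/my-archive route:

```python
import io
import json
import zipfile
import orthanc  # module injected by the Orthanc Python plugin

def on_archive(output, uri, **request):
    study_id = request['groups'][0]
    instances = json.loads(
        orthanc.RestApiGet('/studies/%s/instances' % study_id))

    # Build the complete ZIP in memory before answering
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, 'w', zipfile.ZIP_DEFLATED) as archive:
        for instance in instances:
            dicom = orthanc.RestApiGet('/instances/%s/file' % instance['ID'])
            archive.writestr(instance['ID'] + '.dcm', dicom)

    # The archive is sent at once: a single buffer, not a chunked reply
    output.AnswerBuffer(buffer.getvalue(), 'application/zip')

orthanc.RegisterRestCallback('/studies/(.*)/my-archive', on_archive)
```

Note that for a large study this buffers the entire archive in RAM, which is exactly the behavior that the native streaming avoids.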

So, at this point, I can imagine 3 options:

  • modify the core of Orthanc such that the ArchiveJob reads DICOM files from multiple threads
  • extend the Python SDK to allow replying with chunked responses
  • write the plugin in C++

Best regards,

Alain.

Dear Alain,

Thanks for your answer.

We are going to try to propose an enhancement to the core of Orthanc by editing the ArchiveJob class (we looked at the code and we have an idea of how to do it).
No idea of our chances of success, but we are going to try.

We will probably have additional questions, which we will ask in this thread ^^

Thanks!

Salim

Hi Alain,

First of all, thank you for answering our questions.

I took the liberty of forking your OrthancMirror Git repository and committing the source code I wrote in ArchiveJob.cpp and ArchiveJob.h to add multithreading to the retrieval of DICOM instances.

Here is the commit link:

https://github.com/exploff/OrthancMirror/commit/f80cbcbd450486d5011296446148b14ce3444368

If you ever have time to take a look: this is the first time I have touched C++, so there must surely be ways to optimize it.

Since it is the retrieval of an instance that slows down the process, I have separated the retrieval from the writing to the archive.
(Unless I am mistaken, it was impossible for me to perform both the retrieval and the writing in one and the same thread.)

I defined:

  • a constant equal to the maximum number of active threads (here 3)

  • a modification of the Apply() method of the Command class to separate retrieval and writing

  • a method to perform the last step of creating the archive

  • a structure gathering the information necessary for a retrieval performed by a thread

  • a method, executed by a thread, that retrieves an instance and stores its content

  • a method to terminate a thread and update the related variables

  • a modification of the Step() method of the main class to manage the threads (a rough sketch of the overall pattern is given below)
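
To give an idea of the overall pattern, here is a rough sketch written in Python for readability (the actual contribution is in C++, so this is not the committed code; 'read_instance' and 'write_to_archive' are hypothetical callables standing for the storage read and the ZIP write):

```python
import queue
import threading

MAX_ACTIVE_THREADS = 3  # the constant mentioned above

def retrieval_worker(todo, done, read_instance):
    # Executed by each thread: retrieve instances and store their content
    while True:
        instance_id = todo.get()
        if instance_id is None:  # sentinel: no more work for this thread
            break
        done.put((instance_id, read_instance(instance_id)))

def build_archive(instance_ids, read_instance, write_to_archive):
    todo, done = queue.Queue(), queue.Queue()
    workers = [
        threading.Thread(target=retrieval_worker,
                         args=(todo, done, read_instance))
        for _ in range(MAX_ACTIVE_THREADS)
    ]
    for w in workers:
        w.start()
    for instance_id in instance_ids:
        todo.put(instance_id)
    for _ in workers:
        todo.put(None)  # one sentinel per worker

    # Equivalent of the modified Step(): write instances as they arrive.
    # Note that instances are written in completion order, not input order.
    for _ in instance_ids:
        instance_id, content = done.get()
        write_to_archive(instance_id, content)
    for w in workers:
        w.join()
```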

Thank you!

Hi Julien,

Thanks for this contribution. We’ll look into it.

Note that I'm on vacation right now, so it will probably take “some” time before I can look into it.

Best regards,

Alain

Hi, has anyone found a solution for accelerating the archive when downloading a study from the Orthanc server? If yes, how can it be implemented using the orthanc/osimis Docker container?

Thank you in advance.