Slow POSTs to /instances under load

When uploading DICOM files to /instances, the HTTP request sometimes takes exactly a multiple of 1 minute to complete (usually exactly 1 minute, but I have seen 2 minutes as well).
This happens especially when uploading many files at once (tested with a dataset of ~300 CTs).

The issue

  • occurs when using the Orthanc web app and when using the API directly.
  • occurs less frequently when all uploads are executed sequentially (i.e. no concurrent HTTP requests)
  • occurs more frequently when the dataset is large (never occurred for single files)

Environment:

  • Orthanc version 1.10.0.1.
  • Everything is installed on the same machine (i.e. all connections are “localhost”)
  • OS: Windows Server 2019

Have you read the section “Performance issues” in our FAQ?

https://book.orthanc-server.com/faq/troubleshooting.html#performance-issues

Also, if you upload the DICOM files one by one, do you set the “Expect” HTTP header, as explained in the Orthanc Book?
https://book.orthanc-server.com/users/rest.html#sending-dicom-images

Finally, if you are uploading a single ZIP file containing your hundreds of CTs, could your computer be swapping because of insufficient RAM?

Sébastien-

Hi Thorsten,

You should probably check your logs in verbose mode to see how each individual file is being ingested and whether you observe a 1-minute ‘pause’ somewhere.

It might sound weird, but you could also try using 127.0.0.1 instead of localhost. In some Windows versions, I’ve observed a 1-second IPv6 DNS timeout before switching to IPv4 DNS to resolve ‘localhost’… but I have never heard of a 1-minute delay…

HTH

Alain.

Hello Sébastien,

I will try both suggestions, much appreciated!
All the files taken together (they are not zipped, but uploaded one by one) amount to only a few hundred MB (each CT is below 1MB), so swapping should not be an issue.

Best regards
Thorsten

Hello Sébastien, Alain,

after some more investigation (thank you for your hints!), it turned out the culprit was one of our .lua scripts running on each new file upload.
Disabling that immediately solved the issue, so apologies for the false alarm here.

Thank you both and best regards
Thorsten

Hello Sébastien, Alain,

I might have been too optimistic with the previous statement. After more investigation, it turns out that POSTs to Orthanc/instances are significantly slowed down under certain circumstances.
As a setup, I am using the Orthanc Explorer to upload DICOM files. We have one .lua script, which (for testing purposes) I have trimmed down to the following:

function OnStoredInstance(instanceId, tags, metadata, origin)
  -- Match instances sharing the same modality, frame of reference, series, study and patient
  local query = {}
  query['Modality'] = tags['Modality']
  query['FrameOfReferenceUID'] = tags['FrameOfReferenceUID']
  query['SeriesInstanceUID'] = tags['SeriesInstanceUID']
  query['StudyInstanceUID'] = tags['StudyInstanceUID']
  query['PatientID'] = tags['PatientID']

  local tools = {}
  tools['Level'] = "Instance"
  tools['Query'] = query

  -- Ask Orthanc for all stored instances matching the query
  local answer = RestApiPost("/tools/find", JSON:encode(tools))

end

The issue seems to be with the RestApiPost invocation. Without it (i.e. commenting out that line), POSTs to /instances take about 50ms and the script executes in <10ms typically.
With the RestApiPost, the time for POSTs to /instances jumps to around 500ms - 1000ms and the script execution takes around 500ms on average as well.

Is there any advice on how to mitigate this slowdown?

P.S.: Some additional observations:

  • The runtime of the script starts quite fast at around 100ms, but seems to get slower the more instances have been uploaded; I have seen up to 700ms (at around 350 DICOM files in Orthanc).
  • The script execution seems to be asynchronous: the log files show script executions long after the Orthanc Explorer has finished uploading all the files.

Best regards
Thorsten

Hello. Is there a particular reason to have a call to /tools/find/ in the stored instance event? You could use other events like NEW_STUDY or STUDY_STABLE, but since I don’t use Lua I can’t provide an example.

On Thursday, 28 April 2022 at 12:19:03 UTC+1, thorsten.r...@gmail.com wrote:

Hi Thorsten,

What exactly are you trying to achieve by checking whether the instance is already there? The instance has already been stored when your callback is called.

And yes, OnStoredInstance callbacks are queued and executed asynchronously.

Best regards,

Alain.

Hello Alain,

the calls to /tools/find tell us how many instances of that series have already been uploaded. This lets us determine when a series has been uploaded in its entirety to Orthanc (to signal other services).

As mentioned above, there is the OnStudyStable callback, presumably for cases like this. However, the documentation is very scarce on the exact conditions triggering it, and, if I understand it correctly, there could be some issues-by-design: too many false signals (callback is triggered after very short time), too much latency (callback triggered after very long time), issues with temporary network outages or user errors. Hence the current version seemed simpler.

Best regards
Thorsten

I’ve been using the STABLE_STUDY event for a while and it works pretty well. Here are some remarks:

You need to properly set the StableAge variable to a reasonable time.
When studies are resubmitted, you must decide what to do with the STABLE_STUDY event triggered again. However, Orthanc behavior changes slightly depending on the value of OverwriteInstances.

When you have OverwriteInstances = true and a study is resubmitted, the only possibility is:

  • All instances are identified/rewritten: does not trigger NEW_STUDY, triggers STABLE_STUDY

When you have OverwriteInstances = false and a study is resubmitted, there are two possibilities:

  • New instances were identified: does not trigger NEW_STUDY, triggers STABLE_STUDY
  • All resubmitted instances were already stored: does not trigger NEW_STUDY, does not trigger STABLE_STUDY

Notice that the NEW_STUDY event is not triggered in any of these cases, which makes sense. The only problem is the last scenario (OverwriteInstances = false, with all resubmitted instances already stored). Let me explain:

Since the NEW_STUDY event is not triggered for resubmitted studies, you don’t have an entry point to react to them, so you resort to the OnStoredInstance event.
However, already-stored instances trigger OnStoredInstance without any variable indicating that they were already stored.

And since the STABLE_STUDY event is not triggered in this case either, you have no way of handling the incoming study.
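
For the resubmission cases in which STABLE_STUDY does fire again, one possible way to "decide what to do" is to remember which studies have already been signalled. The following is only a sketch: it assumes the script runs inside Orthanc's Lua engine (which calls OnStableStudy once a study has received no new instance for StableAge seconds) and that global Lua variables persist between callbacks. SignalledStudies and MarkStable are hypothetical names, not part of Orthanc.

```lua
-- Study IDs already signalled, with the number of stable events seen for each.
-- Assumption: global Lua state survives between callbacks; it is lost on restart.
SignalledStudies = SignalledStudies or {}

-- Records one stable event for the study and reports whether it is the first one.
function MarkStable(studyId)
  local first = (SignalledStudies[studyId] == nil)
  SignalledStudies[studyId] = (SignalledStudies[studyId] or 0) + 1
  return first
end

-- Orthanc callback: distinguishes a first stable event from a repeated one.
function OnStableStudy(studyId, tags, metadata)
  if MarkStable(studyId) then
    print('Study ' .. studyId .. ' became stable for the first time')
  else
    print('Study ' .. studyId .. ' became stable again (resubmission)')
  end
end
```

This does not help with the scenario where STABLE_STUDY is not triggered at all, since the callback is never invoked in that case.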

I think we need more generic events regarding the lifecycle of an incoming study in Orthanc, something like INCOMING_STUDY / FINISHED_STUDY (think of the finally clause of a try/catch block). With this, we would be able to handle specific events once without relying on the instance-level events.

On Friday, 29 April 2022 at 16:45:52 UTC+1, thorsten.r...@gmail.com wrote:

Dear Thorsten,

[…] As mentioned above, there is the OnStudyStable callback, presumably for cases like this. However, the documentation is very scarce on the exact conditions triggering it, and, if I understand it correctly, there could be some issues-by-design: too many false signals (callback is triggered after very short time), too much latency (callback triggered after very long time), issues with temporary network outages or user errors. […]

The “OnStable” events are documented at the following place in the Orthanc Book:

https://book.orthanc-server.com/faq/features.html#stable-resources

If you feel this documentation is insufficient, feel free to contribute by improving this text:
https://hg.orthanc-server.com/orthanc-book/file/default/Sphinx/source/faq/features.rst
https://book.orthanc-server.com/developers/repositories.html#simple-patch-import-export

Regards,

Sébastien-

Hello Sébastien-

If you feel this documentation is insufficient, feel free to contribute by improving this text:
https://hg.orthanc-server.com/orthanc-book/file/default/Sphinx/source/faq/features.rst
https://book.orthanc-server.com/developers/repositories.html#simple-patch-import-export

Like the attached file? I am unsure how exactly to submit that as a patch to this group.
It changes the explanation of OnStablePatient, OnStableSeries and OnStableStudy to include a link to ‘Stable Resources’. Also, OnStableSeries and OnStableStudy now have the same explanation as OnStablePatient (this is the point where I had assumed that StableAge only applied to OnStablePatient).

Still, since it is now clear that all three callbacks are triggered from a timing-based heuristic, they would produce false positives under various conditions described above.
Given that, I’d prefer the exact counting method from the script above (if the performance degradation of uploads/script execution can be solved, of course).

Thank you and best regards
Thorsten

contribution.patch (4.93 KB)

Hi Thorsten,

2 remarks:

  • your contribution.patch is a valid patch, but its content is not what you expected (it cancels one of Seb’s latest commits). If you prefer, feel free to send me the files you have modified and I’ll review them and update the repo.

  • I understand you are trying to find the number of sibling instances in the series. /tools/find is probably not best suited for that in terms of performance (you might have noticed :-) !). I would recommend using GET on http://localhost:8042/instances/…/series.
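
A minimal Lua sketch of this suggestion, assuming the script runs inside Orthanc so that the built-in RestApiGet and ParseJson functions are available. CountInstances is a hypothetical helper name, not part of Orthanc; it relies on the series resource listing its members in an "Instances" array.

```lua
-- Hypothetical helper: number of instances listed in a decoded series resource.
-- `series` is the Lua table obtained by decoding the JSON answer of
-- GET /instances/{id}/series; anything else counts as zero instances.
function CountInstances(series)
  if type(series) ~= 'table' or type(series['Instances']) ~= 'table' then
    return 0
  end
  return #series['Instances']
end

-- Orthanc callback. RestApiGet and ParseJson are Orthanc built-ins, so this
-- part only works inside Orthanc's Lua engine.
function OnStoredInstance(instanceId, tags, metadata, origin)
  local series = ParseJson(RestApiGet('/instances/' .. instanceId .. '/series'))
  local count = CountInstances(series)
  print('The series of instance ' .. instanceId .. ' now holds ' .. count .. ' instance(s)')
end
```

Compared with /tools/find, this asks Orthanc for a single resource (the parent series) instead of running a search over the stored instances.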

BTW, how do you know how many instances to expect in each series? I don’t know of any standard way to detect that a series is complete, and it turns out that the StableStudy/StableSeries events are not that bad in real life.

HTH,

Alain

Hello Alain,

  • your contribution.patch is a valid patch, but its content is not what you expected (it cancels one of Seb’s latest commits). If you prefer, feel free to send me the files you have modified and I’ll review them and update the repo.

The only modified file (lua.rst) is attached. Thank you for your help!
Any hints what I might have done wrong? Thought I was following the guide here (but then again, first time using Mercurial).

  • I understand you are trying to find the number of sibling instances in the series. /tools/find is probably not best suited for that in terms of performance (you might have noticed :-) !). I would recommend using GET on http://localhost:8042/instances/…/series.

Indeed, after some preliminary testing, I can confirm that /instances/…/series is much faster, thank you for the suggestion!

BTW, how do you know how many instances to expect in each series? I don’t know of any standard way to detect that a series is complete, and it turns out that the StableStudy/StableSeries events are not that bad in real life.

Using some non-standard DICOM tags. Each CT instance records how many instances there are in total.
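
As a rough sketch of how such a check could look in Lua: the tag name 'ExpectedInstanceCount' below is purely hypothetical (the actual non-standard tag is not named in this thread), and IsSeriesComplete is an illustrative helper, not an Orthanc function. RestApiGet and ParseJson are Orthanc built-ins, so the callback only runs inside Orthanc.

```lua
-- Hypothetical key of the non-standard tag carrying the expected total.
-- Replace it with the key under which your tag actually appears in `tags`.
EXPECTED_COUNT_TAG = 'ExpectedInstanceCount'

-- Returns true once the stored count reaches the expected total.
-- Returns false when the expected total is missing or not a number.
function IsSeriesComplete(storedCount, expectedRaw)
  if expectedRaw == nil then
    return false
  end
  local expected = tonumber(expectedRaw)
  if expected == nil then
    return false
  end
  return storedCount >= expected
end

function OnStoredInstance(instanceId, tags, metadata, origin)
  local series = ParseJson(RestApiGet('/instances/' .. instanceId .. '/series'))
  if type(series) ~= 'table' or type(series['Instances']) ~= 'table' then
    return
  end
  if IsSeriesComplete(#series['Instances'], tags[EXPECTED_COUNT_TAG]) then
    -- Signal other services here
    print('The series of instance ' .. instanceId .. ' is complete')
  end
end
```

Because OnStoredInstance callbacks are queued and executed asynchronously, several callbacks may see the full count, so whatever is signalled downstream should tolerate being triggered more than once.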

Thank you and best regards
Thorsten

lua.rst (23.6 KB)

Hi Thorsten,

Thanks, I’ve reintegrated your changes. I don’t know what could have gone wrong with the patch process, as I’m not an expert!

Best regards,

Alain