Concurrent retrieve requests failed

I ran into a problem when making concurrent C-MOVE requests using Orthanc. To replicate the issue, try the following:
1. Make an instance-level query. For example:
POST modalities/ClearCanvas/query
Payload: {"Level":"Instance","Query":{"StudyInstanceUID": "1.2.410.200028.479.2015128.135245", "SeriesInstanceUID": "1.2.410.200028.479.2015128.135245.1"}}

2. Get the answers from the query above. For example:
GET queries/b2e7c2b9-9ff8-452a-afab-736ef73f6587/answers
This returns the answers corresponding to our instances.

3. Issue concurrent retrieve requests to download those instances. For example:
POST queries/b2e7c2b9-9ff8-452a-afab-736ef73f6587/answers/0/retrieve
POST queries/b2e7c2b9-9ff8-452a-afab-736ef73f6587/answers/1/retrieve
I wrote a C# client to make sure both requests are sent simultaneously.
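Reproducing the race reliably requires both POSTs to leave the client at nearly the same moment. The original client was written in C#. The following is a minimal C++ sketch of the same idea, assuming a hypothetical `PostFn` callback that performs the actual HTTP POST against Orthanc's REST API (no HTTP library is shown). All worker threads block on a shared start gate and are released together.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <future>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical transport hook: a real client would issue "POST <uri>"
// against Orthanc's REST API and return true on success.
using PostFn = std::function<bool(const std::string& uri)>;

// Launches one POST per URI on its own thread. Every thread first waits on a
// shared start gate, so all requests are released at the same moment.
// Returns one success flag per URI, in the same order as `uris`.
std::vector<bool> PostConcurrently(const std::vector<std::string>& uris,
                                   const PostFn& post) {
  assert(post);  // a transport callback is required

  std::mutex gateMutex;
  std::condition_variable gate;
  bool startSignaled = false;

  std::vector<std::future<bool>> pending;
  for (const std::string& uri : uris) {
    pending.push_back(std::async(std::launch::async, [&, uri] {
      {
        std::unique_lock<std::mutex> lock(gateMutex);
        gate.wait(lock, [&] { return startSignaled; });
      }  // gate mutex released here, before the request is sent
      return post(uri);
    }));
  }

  // Open the gate: all waiting threads wake up together.
  {
    std::lock_guard<std::mutex> lock(gateMutex);
    startSignaled = true;
  }
  gate.notify_all();

  std::vector<bool> results;
  for (auto& f : pending) results.push_back(f.get());
  return results;
}
```

For the reproduction above, `uris` would hold `queries/b2e7c2b9-9ff8-452a-afab-736ef73f6587/answers/0/retrieve` and `.../answers/1/retrieve`; with the bug present, one of the returned flags (the second one, per the report) would be expected to be `false`.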

The second "retrieve" request fails with the error "Peer aborted Association (or never connected)".

Any thoughts would be sincerely appreciated! I looked into Orthanc's source code, and it seems that mutex locks are applied correctly. Did I miss something?

Thanks!
Ishaan

[Attachment: concurrent retrieve.jpg]

I dived into the code and might have found the bug (not the fix yet). Here is what I discovered:

When I make C-MOVE calls, the following code is executed:
ReusableDicomUserConnection::Locker locker(context_.GetReusableDicomUserConnection()...);
locker.GetConnection().Move(target, map);

The locker is meant to protect the Move() call with a mutex. Unfortunately, after the Move() call completes, CommandDispatcher::Step() is triggered again and receives a DUL_PEERREQUESTEDRELEASE message. As a result, the T_ASC_Association object is released.

However, if a second Move() call starts AFTER the first Move() call returns but BEFORE CommandDispatcher::Step() runs, the second call tries to reuse the original DicomUserConnection object for further communication. When the aforementioned DUL_PEERREQUESTEDRELEASE message then arrives, the second Move() call fails because the T_ASC_Association object it reused has been destroyed.
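The interleaving described above can be replayed deterministically on a single thread. This is a toy model under stated assumptions: `Association` stands in for DCMTK's `T_ASC_Association`, `ToyReusableConnection` is a hypothetical simplification of the reuse pattern (not Orthanc's `ReusableDicomUserConnection`), and `HandlePeerRelease()` plays the role of `CommandDispatcher::Step()` processing `DUL_PEERREQUESTEDRELEASE`.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical stand-in for DCMTK's T_ASC_Association.
struct Association {
  bool released = false;
};

// Hypothetical toy model of a connection that caches one association and
// reuses it across Move() calls. Not Orthanc's actual class.
class ToyReusableConnection {
 public:
  // Reuses the cached association, or opens a new one if none is cached.
  std::shared_ptr<Association> Acquire() {
    if (!assoc_) assoc_ = std::make_shared<Association>();
    return assoc_;
  }

  // Models the peer's release request being processed: the cached
  // association is torn down and dropped from the cache.
  void HandlePeerRelease() {
    if (assoc_) {
      assoc_->released = true;
      assoc_.reset();
    }
  }

 private:
  std::shared_ptr<Association> assoc_;
};

// Models a C-MOVE over an association: it fails once the association is gone.
bool Move(const std::shared_ptr<Association>& assoc) {
  assert(assoc != nullptr);
  return !assoc->released;
}

// Replays the reported order of events and returns
// {first move succeeded, second move succeeded}.
std::pair<bool, bool> ReplayRace() {
  ToyReusableConnection conn;
  bool first = Move(conn.Acquire());  // per the report, only this runs under the mutex
  auto reused = conn.Acquire();       // second move picks up the same association
  conn.HandlePeerRelease();           // Step() tears it down mid-operation
  bool second = Move(reused);         // fails: the association was released
  return {first, second};
}
```

In this model, `ReplayRace()` yields `{true, false}`: the first move succeeds and the second fails, matching the symptom in the report.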

So the root cause of the bug is that the mutex held by locker around locker.GetConnection().Move(target, map) does not cover all the operations related to the move: whenever a second Move() is triggered while the first one is still cleaning up the association, we see a crash every time.
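One possible direction, offered here as an assumption rather than as Orthanc's actual fix, is to widen the critical section so that the move and the handling of the peer's release happen under the same lock. In this hypothetical toy model (`Association`, `GuardedConnection` and `RunConcurrentMoves` are illustrative names only), a caller either finds no cached association or a live one, so concurrent moves never observe a half-released association.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for DCMTK's T_ASC_Association.
struct Association {
  bool released = false;
};

// Hypothetical toy connection; the mutex guards the cached association.
class GuardedConnection {
 public:
  // Performs a whole move: acquire (or reuse) the association, send the
  // C-MOVE, then handle the peer's release request, all under one lock.
  bool MoveWholeOperation() {
    std::lock_guard<std::mutex> lock(mutex_);
    if (!assoc_) {
      assoc_ = std::make_shared<Association>();
      ++opened_;
    }
    bool ok = !assoc_->released;  // models the C-MOVE itself
    assoc_->released = true;      // models handling DUL_PEERREQUESTEDRELEASE
    assoc_.reset();               // the next caller opens a fresh association
    return ok;
  }

  // Number of associations opened so far.
  int opened() {
    std::lock_guard<std::mutex> lock(mutex_);
    return opened_;
  }

 private:
  std::mutex mutex_;
  std::shared_ptr<Association> assoc_;
  int opened_ = 0;
};

// Runs n moves from n threads at once and returns how many succeeded.
// Uses std::vector<int> (not vector<bool>) so each thread writes its own element.
int RunConcurrentMoves(GuardedConnection& conn, int n) {
  assert(n >= 0);
  std::vector<int> ok(n, 0);
  std::vector<std::thread> threads;
  for (int i = 0; i < n; ++i) {
    threads.emplace_back([&conn, &ok, i] { ok[i] = conn.MoveWholeOperation() ? 1 : 0; });
  }
  for (auto& t : threads) t.join();
  int total = 0;
  for (int v : ok) total += v;
  return total;
}
```

The trade-off is that the lock is now held until the release has been processed, so moves to the same modality are fully serialized; and because the peer releases the association after every move in this scenario, the association is effectively never reused.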

Thanks in advance for any thoughts on how to fix the problem!

Ishaan

Dear Ishaan,

Many thanks for finding and investigating this issue!

Could you kindly create an issue report in our bugtracker, so as to ensure the proper quality workflow?
https://bitbucket.org/sjodogne/orthanc/issues

Regards,
Sébastien

Hi Sébastien,

I have created an issue in the bugtracker. Thanks in advance for looking into it!

Ishaan