Frozen Jobs and REST API not responding after stuck C-MOVE

Hello,

We have encountered an issue and I am hoping to get some help looking into it. Unfortunately, I am not yet able to reproduce it locally, and my access to the instance where the issue occurred is limited. Here is what we know:

Set-up:

  1. We initiated a C-MOVE SCU request via the REST API (Orthanc configured for Synchronous C-MOVEs). The C-MOVE is to move a study from a remote PACS to Orthanc.
  2. Eventually the application that initiated the C-MOVE through the API times out and closes the connection.
  3. The C-MOVE job remains in a ‘running’ state indefinitely after this.
  4. We never see any corresponding incoming C-STORE requests. (I am unsure yet if we are getting any sort of C-MOVE-RSP from the PACS).

I suspect the lack of incoming C-STORE requests is due to a networking or configuration issue on the PACS end. However, here is where we are having an issue with Orthanc:

  1. The job remains indefinitely.
  2. Attempting to cancel or pause the job via the API returns a 200 OK, but the job remains in a running state. (This persists across reboots.) I did find another description of this issue: Unable to Pause or Cancel running jobs (DICOM MOVE SCU)
  3. Attempting to use the /reset endpoint to restart Orthanc hangs up. Orthanc stops but freezes up when it is trying to shut down.
  4. Certain other REST API requests never respond. Initiating a C-ECHO through the API seems to work. However, a C-FIND does not. Crucially, though, the trace-level logs show that the DICOM operations succeed. So the C-FIND occurs but the HTTP request never returns a response.
  5. The metrics orthanc_jobs_running and orthanc_rest_api_active_requests keep increasing as I make more requests. The jobs and API requests seem to be stuck running.
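
For anyone who wants to watch those two counters over time, here is a minimal Python sketch. It assumes the metrics text has already been fetched (for example from Orthanc's /tools/metrics-prometheus route) and that each metric appears as a plain "name value [timestamp]" line without labels. The helper names parse_metrics and stuck_counters are hypothetical, and any numbers you feed it in a test are illustrative only.

```python
WATCHED = ("orthanc_jobs_running", "orthanc_rest_api_active_requests")


def parse_metrics(text):
    """Parse simple 'name value [timestamp]' lines of Prometheus-style text."""
    metrics = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        parts = line.split()
        if len(parts) < 2:
            continue
        try:
            metrics[parts[0]] = float(parts[1])
        except ValueError:
            continue  # labelled or malformed line: ignored in this sketch
    return metrics


def stuck_counters(before, after, names=WATCHED):
    """Return the watched metrics whose value grew between two snapshots."""
    grown = {}
    for name in names:
        old, new = before.get(name, 0.0), after.get(name, 0.0)
        if new > old:
            grown[name] = (old, new)
    return grown
```

Taking one snapshot before and one after a batch of requests, a non-empty result from stuck_counters suggests that jobs or HTTP handlers are accumulating rather than completing.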

Orthanc Version 1.12.7

Here is the relevant config:

{
  "DicomAet": "${DICOM_AET}",
  "Plugins": ["/usr/share/orthanc/plugins", "/usr/local/share/orthanc/plugins"],
  "UserMetadata": {
    "DeletionDate": 3030,
  },

  "StorageDirectory": "/var/lib/orthanc/db",
  "MaximumStorageSize": 51200,
  "MaximumStorageCacheSize": 6144,
  "MediaArchiveSize": 10,

  "DicomThreadsCount": 16,
  "ConcurrentJobs": 0, // (Unlimited)
  "StorageAccessOnFind": "Never", // Fastest setting - uses DB Index whenever possible

  "RemoteAccessAllowed": true,
  "AuthenticationEnabled": false,

  "DatabaseServerIdentifier": "Orthanc1",
  "DicomModalitiesInDatabase": true,
  "PostgreSQL": {
    "EnableIndex": true,
    "EnableStorage": false,
    "Host": "${HOST}",
    "Port": 5432,
    "Database": "orthanc",
    "Username": "${POSTGRES_USER}",
    "Password": "${POSTGRES_PASSWORD}",
    "EnableSsl": false,
    "MaximumConnectionRetries": 10,
    "ConnectionRetryInterval": 5,
    "IndexConnectionsCount": 50,
    "EnableVerboseLogs": false
  }
}

I will update if able to provide a minimal reproducible example. In the meantime any insight into how to mitigate or debug this issue would be appreciated.

Hi,

First, note that "ConcurrentJobs": 0 does not mean "unlimited". The documentation states that a value of "0" indicates to use all the available CPU logical cores.

Since you are sharing only the relevant configuration, are you sure you have not configured DICOM SCU/SCP timeouts elsewhere?

Imagine you have only 1 concurrent job and very large DICOM timeouts: a DICOM job could then block the job engine for a very long time while waiting for DICOM messages.
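
To make that scenario concrete, here is a small self-contained Python simulation. It is only an analogy using a generic thread pool, not Orthanc's actual job engine: the "stuck" task stands in for a DICOM job waiting on a remote peer with no effective timeout, and the function name simulate_blocked_worker is hypothetical.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def simulate_blocked_worker(workers=1):
    """Return (starved, result): whether a quick job had to wait behind a stuck one."""
    release = threading.Event()  # stands in for "the remote peer finally answers"
    pool = ThreadPoolExecutor(max_workers=workers)
    try:
        pool.submit(release.wait)            # stuck job: blocks until released
        quick = pool.submit(lambda: "done")  # a second, trivial job
        try:
            quick.result(timeout=0.2)
            starved = False                  # a free worker picked it up
        except FutureTimeout:
            starved = True                   # every worker is busy waiting
        release.set()                        # unblock the stuck job
        return starved, quick.result(timeout=5)
    finally:
        release.set()
        pool.shutdown(wait=True)
```

With workers=1 the quick job is starved until the stuck one is released; with workers=2 it runs immediately. The same reasoning applies to any fixed number of workers once enough jobs are blocked at the same time.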

/reset probably waits for the current jobs to complete or, at least, to yield after a step.

BTW, it would be very helpful to have a reproducible setup (or at least full logs).

Best regards,

Alain.

Thank you for your input. The config above is almost complete (we have it split into several files). The other file does not contain any timeout config, but I will double check to make sure that we are not setting that anywhere else.

I am still working to reproduce this, but so far I haven’t been able to mimic the production PACS locally, so I haven’t been able to trigger the same behavior. However, your input has helped me imagine a way I might be able to do so. I’ll report back when I have more information.

Hello @alainmazy, I have some more information for you.

Timeout Configuration

I confirmed that we are not setting DicomScuTimeout. However, it appears that the Timeout property was being set to 0 for the modality. We did not explicitly set this when setting up a modality, but it seems that perhaps it is set to a default value from the database? We did try to set it to 30 but did not see a change in behavior. However, we already had stuck jobs at that time, so it’s possible that we were encountering a different issue. Will try again today to set it to 30 once we have no stuck jobs.

Can you confirm what the 0 value means in this context? I know that for DicomScuTimeout a value of 0 indicates that there is no timeout. However, in this case does 0 mean no timeout or does it mean “do not override DicomScuTimeout”? If it does not mean no timeout, but it is also set to 0 if no value is passed when creating a modality, I would consider that unexpected behavior. I would expect that not passing a value would leave it as the default.

Stuck C-MOVE

We were able to reproduce a stuck C-MOVE. Here are the steps that got there:

  1. Issue a C-FIND (success)
  2. Use answers/$id/retrieve to start a C-MOVE
  3. We see the successful association and then a C-MOVE-RQ. There is no response. The API request never returns and I don’t see anything more in the logs on the Orthanc side.
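
For reference, the two REST calls in the steps above can be sketched in Python as follows. This is a hedged sketch, not our actual client code: the base URL is an assumption, the body fields ("Level", "Query", "Timeout", "TargetAet", "Synchronous") are what I understand the Orthanc REST API to accept and should be checked against the API reference for your version, and the helper names are hypothetical. The requests are built separately from being sent, so the client-side HTTP timeout is explicit and the caller cannot hang forever.

```python
import json
import urllib.request

ORTHANC = "http://localhost:8042"  # assumed base URL of the Orthanc REST API


def post_json(path, payload):
    """Build (but do not send) a JSON POST request to Orthanc."""
    return urllib.request.Request(
        ORTHANC + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def find_request(modality_id, accession_number, timeout_s=30):
    """Step 1: C-FIND at study level, with a per-request DICOM timeout."""
    return post_json(f"/modalities/{modality_id}/query", {
        "Level": "Study",
        "Query": {"AccessionNumber": accession_number},
        "Timeout": timeout_s,
    })


def retrieve_request(query_id, target_aet, timeout_s=30):
    """Step 2: C-MOVE of the first answer, submitted as an asynchronous job."""
    return post_json(f"/queries/{query_id}/answers/0/retrieve", {
        "TargetAet": target_aet,
        "Synchronous": False,  # assumption: return a job ID instead of blocking
        "Timeout": timeout_s,
    })


def send(request, http_timeout_s=60):
    """Send a request with a client-side HTTP timeout, returning parsed JSON."""
    with urllib.request.urlopen(request, timeout=http_timeout_s) as response:
        return json.loads(response.read())
```

Submitting the retrieve asynchronously would not explain or fix the missing C-MOVE response, but it should keep the HTTP request from being tied to the lifetime of the DICOM operation.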

REST API and Jobs Frozen

After this, we begin seeing the “stuck” behavior from Orthanc. Some API requests return successfully. Others trigger DICOM operations but never return. Others don’t even trigger DICOM operations.

Here is a summary of the behavior we see:

  • /modalities/$id/query shows a successful DICOM request and response, but the REST API request never returns a response
  • subsequent retrieves (C-MOVEs) do not even send a DICOM request
  • unable to cancel, pause, or delete jobs – they remain in a ‘Running’ state
  • using the /reset endpoint to restart Orthanc does not work. It logs Orthanc is stopping and a few services stop, but then it gets hung up. We resorted to restarting the Docker container.
  • upon restarting the jobs are still in a Running state. However, we are able to try another C-MOVE and see the association and request one time. Then subsequent requests revert back to the fully “stuck” behavior.

I am in the process of gathering and cleaning up the logs. We still have not been able to reproduce locally and I don’t have direct access to the server where this is occurring. I will post relevant logs ASAP. Please let me know if there are logs that would be most helpful.

I still have not been able to reproduce the issue locally, but we were able to run a very controlled test on the system where we are seeing the issue.

Starting Configuration

  • We do not explicitly set the DicomScuTimeout so it should be the default value.
  • All other global configuration is shown above.
  • I confirmed that the number of logical cores is 8
  • Prior to this test we dropped the database and deleted the storage volume
  • We are also running with "StoreJobs": false to make resetting Orthanc easier.

Modality Config

{
   "bbb4a53c-53c2-486a-9670-084fca988cde" : {
      "AET" : "STENTOR_QRP",
      "AllowEcho" : true,
      "AllowEventReport" : true,
      "AllowFind" : true,
      "AllowFindWorklist" : true,
      "AllowGet" : true,
      "AllowMove" : true,
      "AllowNAction" : true,
      "AllowStore" : true,
      "AllowTranscoding" : true,
      "Host" : "10.18.50.204",
      "LocalAet" : "",
      "Manufacturer" : "Generic",
      "Port" : 107,
      "RetrieveMethod" : "SystemDefault",
      "Timeout" : 10,
      "UseDicomTls" : false
   }
}

C-ECHO and C-FIND initially work as expected

C-ECHO

I1017 18:02:18.837356           HTTP-1 DicomAssociation.cpp:272] (dicom) Opening a DICOM SCU connection without DICOM TLS from AET "HFHS_ABCDEF3D" to AET "STENTOR_QRP" on host 10.18.50.204:107 (manufacturer: Generic, timeout: 10s)
T1017 18:02:18.837426           HTTP-1 DicomAssociation.cpp:370] (dicom) Request Parameters:
====================== BEGIN A-ASSOCIATE-RQ =====================
Our Implementation Class UID:      1.2.276.0.7230010.3.0.3.6.9
Our Implementation Version Name:   OFFIS_DCMTK_369
Their Implementation Class UID:
Their Implementation Version Name:
Application Context Name:    1.2.840.10008.3.1.1.1
Calling Application Name:    HFHS_ABCDEF3D
Called Application Name:     STENTOR_QRP
Responding Application Name:
Our Max PDU Receive Size:    16384
Their Max PDU Receive Size:  0
Presentation Contexts:
  Context ID:        1 (Proposed)
    Abstract Syntax: =VerificationSOPClass
    Proposed SCP/SCU Role: Default
    Proposed Transfer Syntax(es):
      =LittleEndianExplicit
      =LittleEndianImplicit
      =BigEndianExplicit
Requested Extended Negotiation: none
Accepted Extended Negotiation:  none
Requested User Identity Negotiation: none
User Identity Negotiation Response:  none
======================= END A-ASSOCIATE-RQ ======================
D: setting network send timeout to 60 seconds
D: setting network receive timeout to 60 seconds
D: DULFSM: disabling Nagle algorithm as defined at compilation time (DISABLE_NAGLE_ALGORITHM)
D: Constructing Associate RQ PDU
D: PDU Type: Associate Accept, PDU Length: 179 + 6 bytes PDU header
D:   02  00  00  00  00  b3  00  01  00  00  53  54  45  4e  54  4f
D:   52  5f  51  52  50  20  20  20  20  20  48  46  48  53  5f  49
D:   4e  54  55  49  54  33  44  20  20  20  00  00  00  00  00  00
D:   00  00  00  00  00  00  00  00  00  00  00  00  00  00  00  00
D:   00  00  00  00  00  00  00  00  00  00  10  00  00  15  31  2e
D:   32  2e  38  34  30  2e  31  30  30  30  38  2e  33  2e  31  2e
D:   31  2e  31  21  00  00  1b  01  00  00  00  40  00  00  13  31
D:   2e  32  2e  38  34  30  2e  31  30  30  30  38  2e  31  2e  32
D:   2e  31  50  00  00  33  51  00  00  04  00  00  fa  ea  52  00
D:   00  18  31  2e  33  2e  34  36  2e  36  37  30  35  38  39  2e
D:   34  32  2e  31  2e  34  2e  34  2e  35  55  00  00  0b  50  48
D:   49  53  55  44  4d  33  31  30  30
D: Parsing an A-ASSOCIATE PDU
T1017 18:02:18.840789           HTTP-1 DicomAssociation.cpp:380] (dicom) Connection Parameters: Transport connection: TCP/IP, unencrypted.
T1017 18:02:18.840821           HTTP-1 DicomAssociation.cpp:382] (dicom) Association Parameters Negotiated:
====================== BEGIN A-ASSOCIATE-AC =====================
Our Implementation Class UID:      1.2.276.0.7230010.3.0.3.6.9
Our Implementation Version Name:   OFFIS_DCMTK_369
Their Implementation Class UID:    1.3.46.670589.42.1.4.4.5
Their Implementation Version Name: PHISUDM3100
Application Context Name:    1.2.840.10008.3.1.1.1
Calling Application Name:    HFHS_ABCDEF3D
Called Application Name:     STENTOR_QRP
Responding Application Name: STENTOR_QRP
Our Max PDU Receive Size:    16384
Their Max PDU Receive Size:  64234
Presentation Contexts:
  Context ID:        1 (Accepted)
    Abstract Syntax: =VerificationSOPClass
    Proposed SCP/SCU Role: Default
    Accepted SCP/SCU Role: Default
    Accepted Transfer Syntax: =LittleEndianExplicit
Requested Extended Negotiation: none
Accepted Extended Negotiation:  none
Requested User Identity Negotiation: none
User Identity Negotiation Response:  none
======================= END A-ASSOCIATE-AC ======================
T1017 18:02:18.840834           HTTP-1 DicomAssociation.cpp:397] (dicom) DicomAssociation::Open, adding SOPClassUID 1.2.840.10008.1.1 - TS 1.2.840.10008.1.2.1 - PC ID 1
I1017 18:02:18.842130           HTTP-1 DicomAssociation.cpp:112] (dicom) Closing DICOM association

C-FIND

FYI, the C-FIND was issued with "Timeout": 30 in the message body sent to /modalities/$id/query.

I1017 18:14:09.909006          HTTP-29 HttpServer.cpp:1263] (http) POST /modalities/bbb4a53c-53c2-486a-9670-084fca988cde/query
I1017 18:14:09.909497          HTTP-29 DicomAssociation.cpp:272] (dicom) Opening a DICOM SCU connection without DICOM TLS from AET "HFHS_ABCDEF3D" to AET "STENTOR_QRP" on host 10.18.50.204:107 (manufacturer: Generic, timeout: 30s)
T1017 18:14:09.909586          HTTP-29 DicomAssociation.cpp:370] (dicom) Request Parameters:
====================== BEGIN A-ASSOCIATE-RQ =====================
Our Implementation Class UID:      1.2.276.0.7230010.3.0.3.6.9
Our Implementation Version Name:   OFFIS_DCMTK_369
Their Implementation Class UID:
Their Implementation Version Name:
Application Context Name:    1.2.840.10008.3.1.1.1
Calling Application Name:    HFHS_ABCDEF3D
Called Application Name:     STENTOR_QRP
Responding Application Name:
Our Max PDU Receive Size:    16384
Their Max PDU Receive Size:  0
Presentation Contexts:
  Context ID:        1 (Proposed)
    Abstract Syntax: =FINDPatientRootQueryRetrieveInformationModel
    Proposed SCP/SCU Role: Default
    Proposed Transfer Syntax(es):
      =LittleEndianExplicit
      =LittleEndianImplicit
      =BigEndianExplicit
  Context ID:        3 (Proposed)
    Abstract Syntax: =FINDStudyRootQueryRetrieveInformationModel
    Proposed SCP/SCU Role: Default
    Proposed Transfer Syntax(es):
      =LittleEndianExplicit
      =LittleEndianImplicit
      =BigEndianExplicit
  Context ID:        5 (Proposed)
    Abstract Syntax: =FINDModalityWorklistInformationModel
    Proposed SCP/SCU Role: Default
    Proposed Transfer Syntax(es):
      =LittleEndianExplicit
      =LittleEndianImplicit
      =BigEndianExplicit
Requested Extended Negotiation: none
Accepted Extended Negotiation:  none
Requested User Identity Negotiation: none
User Identity Negotiation Response:  none
======================= END A-ASSOCIATE-RQ ======================
D: setting network send timeout to 60 seconds
D: setting network receive timeout to 60 seconds
D: DULFSM: disabling Nagle algorithm as defined at compilation time (DISABLE_NAGLE_ALGORITHM)
D: Constructing Associate RQ PDU
D: PDU Type: Associate Accept, PDU Length: 222 + 6 bytes PDU header
D:   02  00  00  00  00  de  00  01  00  00  53  54  45  4e  54  4f
D:   52  5f  51  52  50  20  20  20  20  20  48  46  48  53  5f  49
D:   4e  54  55  49  54  33  44  20  20  20  00  00  00  00  00  00
D:   00  00  00  00  00  00  00  00  00  00  00  00  00  00  00  00
D:   00  00  00  00  00  00  00  00  00  00  10  00  00  15  31  2e
D:   32  2e  38  34  30  2e  31  30  30  30  38  2e  33  2e  31  2e
D:   31  2e  31  21  00  00  1b  01  00  00  00  40  00  00  13  31
D:   2e  32  2e  38  34  30  2e  31  30  30  30  38  2e  31  2e  32
D:   2e  31  21  00  00  1b  03  00  00  00  40  00  00  13  31  2e
D:   32  2e  38  34  30  2e  31  30  30  30  38  2e  31  2e  32  2e
D:   31  21  00  00  08  05  00  03  00  40  00  00  00  50  00  00
D:   33  51  00  00  04  00  00  fa  ea  52  00  00  18  31  2e  33
D:   2e  34  36  2e  36  37  30  35  38  39  2e  34  32  2e  31  2e
D:   34  2e  34  2e  35  55  00  00  0b  50  48  49  53  55  44  4d
D:   33  31  30  30
D: Parsing an A-ASSOCIATE PDU
T1017 18:14:09.914895          HTTP-29 DicomAssociation.cpp:380] (dicom) Connection Parameters: Transport connection: TCP/IP, unencrypted.
T1017 18:14:09.914926          HTTP-29 DicomAssociation.cpp:382] (dicom) Association Parameters Negotiated:
====================== BEGIN A-ASSOCIATE-AC =====================
Our Implementation Class UID:      1.2.276.0.7230010.3.0.3.6.9
Our Implementation Version Name:   OFFIS_DCMTK_369
Their Implementation Class UID:    1.3.46.670589.42.1.4.4.5
Their Implementation Version Name: PHISUDM3100
Application Context Name:    1.2.840.10008.3.1.1.1
Calling Application Name:    HFHS_ABCDEF3D
Called Application Name:     STENTOR_QRP
Responding Application Name: STENTOR_QRP
Our Max PDU Receive Size:    16384
Their Max PDU Receive Size:  64234
Presentation Contexts:
  Context ID:        1 (Accepted)
    Abstract Syntax: =FINDPatientRootQueryRetrieveInformationModel
    Proposed SCP/SCU Role: Default
    Accepted SCP/SCU Role: Default
    Accepted Transfer Syntax: =LittleEndianExplicit
  Context ID:        3 (Accepted)
    Abstract Syntax: =FINDStudyRootQueryRetrieveInformationModel
    Proposed SCP/SCU Role: Default
    Accepted SCP/SCU Role: Default
    Accepted Transfer Syntax: =LittleEndianExplicit
  Context ID:        5 (Abstract Syntax Not Supported)
    Abstract Syntax: =FINDModalityWorklistInformationModel
    Proposed SCP/SCU Role: Default
    Accepted SCP/SCU Role: Default
Requested Extended Negotiation: none
Accepted Extended Negotiation:  none
Requested User Identity Negotiation: none
User Identity Negotiation Response:  none
======================= END A-ASSOCIATE-AC ======================
T1017 18:14:09.914939          HTTP-29 DicomAssociation.cpp:397] (dicom) DicomAssociation::Open, adding SOPClassUID 1.2.840.10008.5.1.4.1.2.1.1 - TS 1.2.840.10008.1.2.1 - PC ID 1
T1017 18:14:09.914953          HTTP-29 DicomAssociation.cpp:397] (dicom) DicomAssociation::Open, adding SOPClassUID 1.2.840.10008.5.1.4.1.2.2.1 - TS 1.2.840.10008.1.2.1 - PC ID 3
T1017 18:14:09.914986          HTTP-29 DicomControlUserConnection.cpp:337] (dicom) Sending Find Request:
===================== OUTGOING DIMSE MESSAGE ====================
Message Type                  : C-FIND RQ
Presentation Context ID       : 3
Message ID                    : 1
Affected SOP Class UID        : FINDStudyRootQueryRetrieveInformationModel
Data Set                      : present
Priority                      : medium
======================= END DIMSE MESSAGE =======================
14:14:09.000 orthanc            |
# Dicom-Data-Set
# Used TransferSyntax: Little Endian Explicit
(0008,0005) CS [ISO_IR 100]                             #  10, 1 SpecificCharacterSet
(0008,0050) SH [ANON7305565]                            #  12, 1 AccessionNumber
(0008,0052) CS [STUDY]                                  #   6, 1 QueryRetrieveLevel
(0010,0020) LO (no value available)                     #   0, 0 PatientID
(0020,000d) UI (no value available)                     #   0, 0 StudyInstanceUID
14:14:09.000 orthanc            |
I1017 18:14:16.872034          HTTP-29 DicomAssociation.cpp:112] (dicom) Closing DICOM association
T1017 18:14:16.871764          HTTP-29 DicomControlUserConnection.cpp:78] (dicom) Received Find Response 32668:
===================== INCOMING DIMSE MESSAGE ====================
Message Type                  : C-FIND RSP
Message ID Being Responded To : 1
Affected SOP Class UID        : FINDStudyRootQueryRetrieveInformationModel
Data Set                      : present
DIMSE Status                  : 0xff00: Pending: Matches are continuing
======================= END DIMSE MESSAGE =======================
T1017 18:14:16.871824          HTTP-29 DicomControlUserConnection.cpp:86] (dicom) Response Identifiers 32668:
14:14:16.000 orthanc            |
# Dicom-Data-Set
# Used TransferSyntax: Little Endian Explicit
(0008,0005) CS [ISO_IR 100]                             #  10, 1 SpecificCharacterSet
(0008,0020) DA [20250324]                               #   8, 1 StudyDate
(0008,0030) TM [142334]                                 #   6, 1 StudyTime
(0008,0050) SH [ANON7305565]                            #  12, 1 AccessionNumber
(0008,0052) CS [STUDY]                                  #   6, 1 QueryRetrieveLevel
(0008,0054) AE [STENTOR_QRP]                            #  12, 1 RetrieveAETitle
(0008,0056) CS [ONLINE]                                 #   6, 1 InstanceAvailability
(0008,0061) CS [MR]                                     #   2, 1 ModalitiesInStudy
(0008,0090) PN [REFERRING]                              #  10, 1 ReferringPhysicianName
(0010,0010) PN [ABCDEF3D^HFHS^^^]                       #  16, 1 PatientName
(0010,0020) LO [ABCDEF5809587]                          #  14, 1 PatientID
(0010,0030) DA [19670327]                               #   8, 1 PatientBirthDate
(0010,0040) CS [O]                                      #   2, 1 PatientSex
(0020,000d) UI [2.16.840.1.114151.3.1.20566.9479912.4452.1759939163.7] #  54, 1 StudyInstanceUID
14:14:16.000 orthanc            |
T1017 18:14:16.872014          HTTP-29 DicomControlUserConnection.cpp:361] (dicom) Received Final Find Response:
===================== INCOMING DIMSE MESSAGE ====================
Message Type                  : C-FIND RSP
Message ID Being Responded To : 1
Affected SOP Class UID        : FINDStudyRootQueryRetrieveInformationModel
Data Set                      : none
DIMSE Status                  : 0x0000: Success: Matching is complete
======================= END DIMSE MESSAGE =======================

C-MOVE Stalls and never times out

When we call the queries/$id/answers/0/retrieve endpoint, the API request never returns.

W1017 18:16:01.368331          HTTP-33 OrthancRestModalities.cpp:974] Driving C-Move SCU on remote modality STENTOR_QRP to target modality HFHS_ABCDEF3D
I1017 18:16:01.368520          HTTP-33 JobsRegistry.cpp:805] New DicomMoveScu job submitted with priority 0: cb622ef3-8543-4b63-bc9a-870a1ef1d572
I1017 18:16:01.368561    JOBS-WORKER-4 JobsEngine.cpp:138] (jobs) Executing DicomMoveScu job with priority 0 in worker thread 4: cb622ef3-8543-4b63-bc9a-870a1ef1d572
I1017 18:16:01.368612    JOBS-WORKER-4 DicomAssociation.cpp:272] (dicom) Opening a DICOM SCU connection without DICOM TLS from AET "HFHS_ABCDEF3D" to AET "STENTOR_QRP" on host 10.18.50.204:107 (manufacturer: Generic, timeout: 30s)
T1017 18:16:01.368679    JOBS-WORKER-4 DicomAssociation.cpp:370] (dicom) Request Parameters:
====================== BEGIN A-ASSOCIATE-RQ =====================
Our Implementation Class UID:      1.2.276.0.7230010.3.0.3.6.9
Our Implementation Version Name:   OFFIS_DCMTK_369
Their Implementation Class UID:
Their Implementation Version Name:
Application Context Name:    1.2.840.10008.3.1.1.1
Calling Application Name:    HFHS_ABCDEF3D
Called Application Name:     STENTOR_QRP
Responding Application Name:
Our Max PDU Receive Size:    16384
Their Max PDU Receive Size:  0
Presentation Contexts:
  Context ID:        1 (Proposed)
    Abstract Syntax: =MOVEPatientRootQueryRetrieveInformationModel
    Proposed SCP/SCU Role: Default
    Proposed Transfer Syntax(es):
      =LittleEndianExplicit
      =LittleEndianImplicit
      =BigEndianExplicit
  Context ID:        3 (Proposed)
    Abstract Syntax: =MOVEStudyRootQueryRetrieveInformationModel
    Proposed SCP/SCU Role: Default
    Proposed Transfer Syntax(es):
      =LittleEndianExplicit
      =LittleEndianImplicit
      =BigEndianExplicit
Requested Extended Negotiation: none
Accepted Extended Negotiation:  none
Requested User Identity Negotiation: none
User Identity Negotiation Response:  none
======================= END A-ASSOCIATE-RQ ======================
D: setting network send timeout to 60 seconds
D: setting network receive timeout to 60 seconds
D: DULFSM: disabling Nagle algorithm as defined at compilation time (DISABLE_NAGLE_ALGORITHM)
D: Constructing Associate RQ PDU
D: PDU Type: Associate Accept, PDU Length: 210 + 6 bytes PDU header
D:   02  00  00  00  00  d2  00  01  00  00  53  54  45  4e  54  4f
D:   52  5f  51  52  50  20  20  20  20  20  48  46  48  53  5f  49
D:   4e  54  55  49  54  33  44  20  20  20  00  00  00  00  00  00
D:   00  00  00  00  00  00  00  00  00  00  00  00  00  00  00  00
D:   00  00  00  00  00  00  00  00  00  00  10  00  00  15  31  2e
D:   32  2e  38  34  30  2e  31  30  30  30  38  2e  33  2e  31  2e
D:   31  2e  31  21  00  00  1b  01  00  00  00  40  00  00  13  31
D:   2e  32  2e  38  34  30  2e  31  30  30  30  38  2e  31  2e  32
D:   2e  31  21  00  00  1b  03  00  00  00  40  00  00  13  31  2e
D:   32  2e  38  34  30  2e  31  30  30  30  38  2e  31  2e  32  2e
D:   31  50  00  00  33  51  00  00  04  00  00  fa  ea  52  00  00
D:   18  31  2e  33  2e  34  36  2e  36  37  30  35  38  39  2e  34
D:   32  2e  31  2e  34  2e  34  2e  35  55  00  00  0b  50  48  49
D:   53  55  44  4d  33  31  30  30
D: Parsing an A-ASSOCIATE PDU
T1017 18:16:01.371955    JOBS-WORKER-4 DicomAssociation.cpp:380] (dicom) Connection Parameters: Transport connection: TCP/IP, unencrypted.
T1017 18:16:01.371980    JOBS-WORKER-4 DicomAssociation.cpp:382] (dicom) Association Parameters Negotiated:
====================== BEGIN A-ASSOCIATE-AC =====================
Our Implementation Class UID:      1.2.276.0.7230010.3.0.3.6.9
Our Implementation Version Name:   OFFIS_DCMTK_369
Their Implementation Class UID:    1.3.46.670589.42.1.4.4.5
Their Implementation Version Name: PHISUDM3100
Application Context Name:    1.2.840.10008.3.1.1.1
Calling Application Name:    HFHS_ABCDEF3D
Called Application Name:     STENTOR_QRP
Responding Application Name: STENTOR_QRP
Our Max PDU Receive Size:    16384
Their Max PDU Receive Size:  64234
Presentation Contexts:
  Context ID:        1 (Accepted)
    Abstract Syntax: =MOVEPatientRootQueryRetrieveInformationModel
    Proposed SCP/SCU Role: Default
    Accepted SCP/SCU Role: Default
    Accepted Transfer Syntax: =LittleEndianExplicit
  Context ID:        3 (Accepted)
    Abstract Syntax: =MOVEStudyRootQueryRetrieveInformationModel
    Proposed SCP/SCU Role: Default
    Accepted SCP/SCU Role: Default
    Accepted Transfer Syntax: =LittleEndianExplicit
Requested Extended Negotiation: none
Accepted Extended Negotiation:  none
Requested User Identity Negotiation: none
User Identity Negotiation Response:  none
======================= END A-ASSOCIATE-AC ======================
T1017 18:16:01.371990    JOBS-WORKER-4 DicomAssociation.cpp:397] (dicom) DicomAssociation::Open, adding SOPClassUID 1.2.840.10008.5.1.4.1.2.1.2 - TS 1.2.840.10008.1.2.1 - PC ID 1
T1017 18:16:01.372003    JOBS-WORKER-4 DicomAssociation.cpp:397] (dicom) DicomAssociation::Open, adding SOPClassUID 1.2.840.10008.5.1.4.1.2.2.2 - TS 1.2.840.10008.1.2.1 - PC ID 3
T1017 18:16:01.372050    JOBS-WORKER-4 DicomControlUserConnection.cpp:444] (dicom) Sending Move Request:
===================== OUTGOING DIMSE MESSAGE ====================
Message Type                  : C-MOVE RQ
Presentation Context ID       : 3
Message ID                    : 1
Affected SOP Class UID        : MOVEStudyRootQueryRetrieveInformationModel
Data Set                      : present
Priority                      : medium
Move Destination              : HFHS_ABCDEF3D
======================= END DIMSE MESSAGE =======================

After this, we don’t see any more DICOM logs. The association seems to stay open and no C-MOVE responses are logged.

Same Request with DCMTK:

We tried the same request with movescu. Here we saw repeated C-MOVE-RSP messages. In fact they went on indefinitely, never progressing on sub-operations and always with a status of Pending.
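
The pattern described here (repeated Pending responses whose sub-operation counters never change) can be checked mechanically against saved movescu -d output. The following Python sketch assumes the debug log format shown in this post; the function names and the min_responses threshold are hypothetical choices, not part of DCMTK.

```python
import re

_COUNTER = re.compile(
    r"(Remaining|Completed|Failed|Warning) Suboperations\s*:\s*(\d+)"
)


def parse_move_responses(log_text):
    """Return one dict of sub-operation counters per C-MOVE RSP block in the log."""
    responses = []
    current = None
    for line in log_text.splitlines():
        if "Message Type" in line and "C-MOVE RSP" in line:
            current = {}
            responses.append(current)
            continue
        match = _COUNTER.search(line)
        if match and current is not None:
            current[match.group(1)] = int(match.group(2))
    return responses


def looks_stalled(responses, min_responses=3):
    """True if at least min_responses responses all carry identical counters."""
    if len(responses) < min_responses:
        return False
    first = responses[0]
    return all(r == first for r in responses)
```

Run against the output below, every parsed response would carry the same counters (1453 remaining, 0 completed), which is what "never progressing" looks like in the log.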

$ movescu -d -S -aet HFHS_ABCDEF3D -aec STENTOR_QRP 10.18.50.204 107 -k QueryRetrieveLevel=STUDY -k StudyInstanceUID=2.16.840.1.114151.3.1.20566.9479912.4452.1759939163.7
D: DcmDataDictionary: Loading file: /home/ubuntu/dcmtk-3.6.9-linux-x86_64-static/share/dcmtk-3.6.9/dicom.dic
D: $dcmtk: movescu v3.6.9 2024-12-11 $
D:
D: Request Parameters:
D: ====================== BEGIN A-ASSOCIATE-RQ =====================
D: Our Implementation Class UID:      1.2.276.0.7230010.3.0.3.6.9
D: Our Implementation Version Name:   OFFIS_DCMTK_369
D: Their Implementation Class UID:
D: Their Implementation Version Name:
D: Application Context Name:    1.2.840.10008.3.1.1.1
D: Calling Application Name:    HFHS_ABCDEF3D
D: Called Application Name:     STENTOR_QRP
D: Responding Application Name:
D: Our Max PDU Receive Size:    16384
D: Their Max PDU Receive Size:  0
D: Presentation Contexts:
D:   Context ID:        1 (Proposed)
D:     Abstract Syntax: =FINDStudyRootQueryRetrieveInformationModel
D:     Proposed SCP/SCU Role: Default
D:     Proposed Transfer Syntax(es):
D:       =LittleEndianExplicit
D:       =BigEndianExplicit
D:       =LittleEndianImplicit
D:   Context ID:        3 (Proposed)
D:     Abstract Syntax: =MOVEStudyRootQueryRetrieveInformationModel
D:     Proposed SCP/SCU Role: Default
D:     Proposed Transfer Syntax(es):
D:       =LittleEndianExplicit
D:       =BigEndianExplicit
D:       =LittleEndianImplicit
D: Requested Extended Negotiation: none
D: Accepted Extended Negotiation:  none
D: Requested User Identity Negotiation: none
D: User Identity Negotiation Response:  none
D: ======================= END A-ASSOCIATE-RQ ======================
I: Requesting Association
D: setting network send timeout to 60 seconds
D: setting network receive timeout to 60 seconds
D: Constructing Associate RQ PDU
D: PDU Type: Associate Accept, PDU Length: 210 + 6 bytes PDU header
D:   02  00  00  00  00  d2  00  01  00  00  53  54  45  4e  54  4f
D:   52  5f  51  52  50  20  20  20  20  20  48  46  48  53  5f  49
D:   4e  54  55  49  54  33  44  20  20  20  00  00  00  00  00  00
D:   00  00  00  00  00  00  00  00  00  00  00  00  00  00  00  00
D:   00  00  00  00  00  00  00  00  00  00  10  00  00  15  31  2e
D:   32  2e  38  34  30  2e  31  30  30  30  38  2e  33  2e  31  2e
D:   31  2e  31  21  00  00  1b  01  00  00  00  40  00  00  13  31
D:   2e  32  2e  38  34  30  2e  31  30  30  30  38  2e  31  2e  32
D:   2e  31  21  00  00  1b  03  00  00  00  40  00  00  13  31  2e
D:   32  2e  38  34  30  2e  31  30  30  30  38  2e  31  2e  32  2e
D:   31  50  00  00  33  51  00  00  04  00  00  fa  ea  52  00  00
D:   18  31  2e  33  2e  34  36  2e  36  37  30  35  38  39  2e  34
D:   32  2e  31  2e  34  2e  34  2e  35  55  00  00  0b  50  48  49
D:   53  55  44  4d  33  31  30  30
D: Parsing an A-ASSOCIATE PDU
D: Association Parameters Negotiated:
D: ====================== BEGIN A-ASSOCIATE-AC =====================
D: Our Implementation Class UID:      1.2.276.0.7230010.3.0.3.6.9
D: Our Implementation Version Name:   OFFIS_DCMTK_369
D: Their Implementation Class UID:    1.3.46.670589.42.1.4.4.5
D: Their Implementation Version Name: PHISUDM3100
D: Application Context Name:    1.2.840.10008.3.1.1.1
D: Calling Application Name:    HFHS_ABCDEF3D
D: Called Application Name:     STENTOR_QRP
D: Responding Application Name: STENTOR_QRP
D: Our Max PDU Receive Size:    16384
D: Their Max PDU Receive Size:  64234
D: Presentation Contexts:
D:   Context ID:        1 (Accepted)
D:     Abstract Syntax: =FINDStudyRootQueryRetrieveInformationModel
D:     Proposed SCP/SCU Role: Default
D:     Accepted SCP/SCU Role: Default
D:     Accepted Transfer Syntax: =LittleEndianExplicit
D:   Context ID:        3 (Accepted)
D:     Abstract Syntax: =MOVEStudyRootQueryRetrieveInformationModel
D:     Proposed SCP/SCU Role: Default
D:     Accepted SCP/SCU Role: Default
D:     Accepted Transfer Syntax: =LittleEndianExplicit
D: Requested Extended Negotiation: none
D: Accepted Extended Negotiation:  none
D: Requested User Identity Negotiation: none
D: User Identity Negotiation Response:  none
D: ======================= END A-ASSOCIATE-AC ======================
I: Association Accepted (Max Send PDV: 64222)
I: Sending Move Request
D: ===================== OUTGOING DIMSE MESSAGE ====================
D: Message Type                  : C-MOVE RQ
D: Presentation Context ID       : 3
D: Message ID                    : 1
D: Affected SOP Class UID        : MOVEStudyRootQueryRetrieveInformationModel
D: Data Set                      : present
D: Priority                      : medium
D: Move Destination              : HFHS_ABCDEF3D
D: ======================= END DIMSE MESSAGE =======================
I: Request Identifiers:
I:
I: # Dicom-Data-Set
I: # Used TransferSyntax: Little Endian Explicit
I: (0008,0052) CS [STUDY]                                  #   6, 1 QueryRetrieveLevel
I: (0020,000d) UI [2.16.840.1.114151.3.1.20566.9479912.4452.1759939163.7] #  54, 1 StudyInstanceUID
I:
D: DcmDataset::read() TransferSyntax="Little Endian Implicit"
I: Received Move Response 1
D: ===================== INCOMING DIMSE MESSAGE ====================
D: Message Type                  : C-MOVE RSP
D: Message ID Being Responded To : 1
D: Affected SOP Class UID        : MOVEStudyRootQueryRetrieveInformationModel
D: Remaining Suboperations       : 1453
D: Completed Suboperations       : 0
D: Failed Suboperations          : 0
D: Warning Suboperations         : 0
D: Data Set                      : none
D: DIMSE Status                  : 0xff00: Pending: Sub-operations are continuing
D: ======================= END DIMSE MESSAGE =======================
D: DcmDataset::read() TransferSyntax="Little Endian Implicit"
I: Received Move Response 2
D: ===================== INCOMING DIMSE MESSAGE ====================
D: Message Type                  : C-MOVE RSP
D: Message ID Being Responded To : 1
D: Affected SOP Class UID        : MOVEStudyRootQueryRetrieveInformationModel
D: Remaining Suboperations       : 1453
D: Completed Suboperations       : 0
D: Failed Suboperations          : 0
D: Warning Suboperations         : 0
D: Data Set                      : none
D: DIMSE Status                  : 0xff00: Pending: Sub-operations are continuing
D: ======================= END DIMSE MESSAGE =======================
D: DcmDataset::read() TransferSyntax="Little Endian Implicit"
I: Received Move Response 3
D: ===================== INCOMING DIMSE MESSAGE ====================
D: Message Type                  : C-MOVE RSP
D: Message ID Being Responded To : 1
D: Affected SOP Class UID        : MOVEStudyRootQueryRetrieveInformationModel
D: Remaining Suboperations       : 1453
D: Completed Suboperations       : 0
D: Failed Suboperations          : 0
D: Warning Suboperations         : 0
D: Data Set                      : none
D: DIMSE Status                  : 0xff00: Pending: Sub-operations are continuing
D: ======================= END DIMSE MESSAGE =======================
D: DcmDataset::read() TransferSyntax="Little Endian Implicit"
I: Received Move Response 4
D: ===================== INCOMING DIMSE MESSAGE ====================
D: Message Type                  : C-MOVE RSP
D: Message ID Being Responded To : 1
D: Affected SOP Class UID        : MOVEStudyRootQueryRetrieveInformationModel
D: Remaining Suboperations       : 1453
D: Completed Suboperations       : 0
D: Failed Suboperations          : 0
D: Warning Suboperations         : 0
D: Data Set                      : none
D: DIMSE Status                  : 0xff00: Pending: Sub-operations are continuing
D: ======================= END DIMSE MESSAGE =======================
...

We know that this likely indicates a misconfiguration at the networking level or on the remote modality. However, this seems to put Orthanc into a broken/frozen state.
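To make the symptom in the log above easier to spot, here is a minimal sketch that scans DCMTK-style log lines and flags a C-MOVE whose sub-operation counters never change. The function names and the threshold of three identical responses are assumptions for illustration, not part of Orthanc or DCMTK.

```python
import re
from typing import Dict, Iterable, List, Tuple

# Matches the counter lines printed for each C-MOVE RSP, e.g.
# "D: Remaining Suboperations       : 1453"
_COUNTER = re.compile(
    r"(Remaining|Completed|Failed|Warning) Suboperations\s*:\s*(\d+)"
)

Snapshot = Tuple[int, int, int, int]  # (remaining, completed, failed, warning)


def move_progress(log_lines: Iterable[str]) -> List[Snapshot]:
    """Return one counter snapshot per C-MOVE RSP found in the log."""
    snapshots: List[Snapshot] = []
    current: Dict[str, int] = {}
    for line in log_lines:
        if "Message Type" in line:
            current = {}  # a new DIMSE message starts: drop partial counters
            continue
        match = _COUNTER.search(line)
        if not match:
            continue
        current[match.group(1)] = int(match.group(2))
        if len(current) == 4:  # all four counters of one response were seen
            snapshots.append((current["Remaining"], current["Completed"],
                              current["Failed"], current["Warning"]))
            current = {}
    return snapshots


def is_stalled(snapshots: List[Snapshot], min_responses: int = 3) -> bool:
    """True if the last `min_responses` responses are identical and work remains."""
    if len(snapshots) < min_responses:
        return False
    tail = snapshots[-min_responses:]
    return tail[0][0] > 0 and all(s == tail[0] for s in tail)
```

Applied to the four responses shown above (1453 remaining, 0 completed each time), `is_stalled(move_progress(lines))` would report a stall, whereas a healthy transfer with a decreasing "Remaining" counter would not.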

Jobs Engine Frozen?

After this, Orthanc seems to stop responding to many operations:

  • jobs cannot be cancelled or paused - the API request succeeds, but the job is not cancelled
  • trying to use /tools/reset hangs up after DICOM server stops
W1017 18:37:53.047286             MAIN main.cpp:1004] Reset request received, restarting Orthanc
W1017 18:37:53.047310             MAIN main.cpp:1008] Orthanc is stopping
W1017 18:37:53.146148             MAIN main.cpp:1246]     HTTP server has stopped
W1017 18:37:53.345523             MAIN main.cpp:1362]     DICOM server has stopped

Normally the next log line reports that the jobs engine has stopped (it looks like the line below), so I suspect the shutdown hangs because there are running jobs.

W1017 18:37:53.638636             MAIN JobsEngine.cpp:317] The jobs engine has stopped
  • Running a query on the modality shows that the C-FIND operation succeeds, however the API request never responds

Please let me know if I can provide any other information that would be helpful. I wanted to be thorough in the logs above, but not overwhelm you with everything.

I am still working on a minimal working example, but I have been unable to replicate the behavior of the PACS (sending C-MOVE-RSPs but no C-STORE) so for now I’m not able to do this. If you have any suggestions on how I might replicate that with a script or other tool, please let me know. Thank you again for your help.

Hi @josh.keller ,

Thanks for all the detailed information !

I made a setup with 2 Orthanc instances running with a debugger. This allows me to pause the second Orthanc where your modality seems to stop (after the C-MOVE RQ has been received but before the C-MOVE RSP is received).

Notes for myself: here’s where I stop the second Orthanc:

Here are my logs in this case:

I1020 10:19:24.021393           HTTP-2 Toolbox.cpp:2782] (http) POST /queries/07ecc89f-9ce9-4d06-b446-b29043955b12/answers/0/retrieve
W1020 10:19:24.021685           HTTP-2 OrthancRestModalities.cpp:974] Driving C-Move SCU on remote modality ORTHANC-BIS to target modality ORTHANC4243
I1020 10:19:24.022099           HTTP-2 JobsRegistry.cpp:832] New DicomMoveScu job submitted with priority 0: 6a2aaa6c-5e2c-4869-ad93-48f912f2b06e
I1020 10:19:24.022269    JOBS-WORKER-0 JobsEngine.cpp:145] (jobs) Executing DicomMoveScu job with priority 0 in worker thread 0: 6a2aaa6c-5e2c-4869-ad93-48f912f2b06e
I1020 10:19:24.022354    JOBS-WORKER-0 DicomAssociation.cpp:272] (dicom) Opening a DICOM SCU connection without DICOM TLS from AET "ORTHANC4243" to AET "ORTHANC-BIS" on host 127.0.0.1:4245 (manufacturer: Generic, timeout: 10s)
T1020 10:19:24.022452    JOBS-WORKER-0 DicomAssociation.cpp:370] (dicom) Request Parameters:
====================== BEGIN A-ASSOCIATE-RQ =====================
Our Implementation Class UID:      1.2.276.0.7230010.3.0.3.6.9
Our Implementation Version Name:   OFFIS_DCMTK_369
Their Implementation Class UID:    
Their Implementation Version Name: 
Application Context Name:    1.2.840.10008.3.1.1.1
Calling Application Name:    ORTHANC4243
Called Application Name:     ORTHANC-BIS
Responding Application Name: 
Our Max PDU Receive Size:    16384
Their Max PDU Receive Size:  0
Presentation Contexts:
  Context ID:        1 (Proposed)
    Abstract Syntax: =MOVEPatientRootQueryRetrieveInformationModel
    Proposed SCP/SCU Role: Default
    Proposed Transfer Syntax(es):
      =LittleEndianExplicit
      =LittleEndianImplicit
      =BigEndianExplicit
  Context ID:        3 (Proposed)
    Abstract Syntax: =MOVEStudyRootQueryRetrieveInformationModel
    Proposed SCP/SCU Role: Default
    Proposed Transfer Syntax(es):
      =LittleEndianExplicit
      =LittleEndianImplicit
      =BigEndianExplicit
Requested Extended Negotiation: none
Accepted Extended Negotiation:  none
Requested User Identity Negotiation: none
User Identity Negotiation Response:  none
======================= END A-ASSOCIATE-RQ ======================
D: setting network send timeout to 60 seconds
D: setting network receive timeout to 60 seconds
D: DULFSM: disabling Nagle algorithm as defined at compilation time (DISABLE_NAGLE_ALGORITHM)
D: Constructing Associate RQ PDU
D: PDU Type: Associate Accept, PDU Length: 217 + 6 bytes PDU header
D:   02  00  00  00  00  d9  00  01  00  00  4f  52  54  48  41  4e
D:   43  2d  42  49  53  20  20  20  20  20  4f  52  54  48  41  4e
D:   43  34  32  34  33  20  20  20  20  20  00  00  00  00  00  00
D:   00  00  00  00  00  00  00  00  00  00  00  00  00  00  00  00
D:   00  00  00  00  00  00  00  00  00  00  10  00  00  15  31  2e
D:   32  2e  38  34  30  2e  31  30  30  30  38  2e  33  2e  31  2e
D:   31  2e  31  21  00  00  1b  01  00  00  00  40  00  00  13  31
D:   2e  32  2e  38  34  30  2e  31  30  30  30  38  2e  31  2e  32
D:   2e  31  21  00  00  1b  03  00  00  00  40  00  00  13  31  2e
D:   32  2e  38  34  30  2e  31  30  30  30  38  2e  31  2e  32  2e
D:   31  50  00  00  3a  51  00  00  04  00  00  40  00  52  00  00
D:   1b  31  2e  32  2e  32  37  36  2e  30  2e  37  32  33  30  30
D:   31  30  2e  33  2e  30  2e  33  2e  36  2e  39  55  00  00  0f
D:   4f  46  46  49  53  5f  44  43  4d  54  4b  5f  33  36  39
D: Parsing an A-ASSOCIATE PDU
T1020 10:19:24.023911    JOBS-WORKER-0 DicomAssociation.cpp:380] (dicom) Connection Parameters: Transport connection: TCP/IP, unencrypted.
T1020 10:19:24.023954    JOBS-WORKER-0 DicomAssociation.cpp:382] (dicom) Association Parameters Negotiated:
====================== BEGIN A-ASSOCIATE-AC =====================
Our Implementation Class UID:      1.2.276.0.7230010.3.0.3.6.9
Our Implementation Version Name:   OFFIS_DCMTK_369
Their Implementation Class UID:    1.2.276.0.7230010.3.0.3.6.9
Their Implementation Version Name: OFFIS_DCMTK_369
Application Context Name:    1.2.840.10008.3.1.1.1
Calling Application Name:    ORTHANC4243
Called Application Name:     ORTHANC-BIS
Responding Application Name: ORTHANC-BIS
Our Max PDU Receive Size:    16384
Their Max PDU Receive Size:  16384
Presentation Contexts:
  Context ID:        1 (Accepted)
    Abstract Syntax: =MOVEPatientRootQueryRetrieveInformationModel
    Proposed SCP/SCU Role: Default
    Accepted SCP/SCU Role: Default
    Accepted Transfer Syntax: =LittleEndianExplicit
  Context ID:        3 (Accepted)
    Abstract Syntax: =MOVEStudyRootQueryRetrieveInformationModel
    Proposed SCP/SCU Role: Default
    Accepted SCP/SCU Role: Default
    Accepted Transfer Syntax: =LittleEndianExplicit
Requested Extended Negotiation: none
Accepted Extended Negotiation:  none
Requested User Identity Negotiation: none
User Identity Negotiation Response:  none
======================= END A-ASSOCIATE-AC ======================
T1020 10:19:24.024021    JOBS-WORKER-0 DicomAssociation.cpp:397] (dicom) DicomAssociation::Open, adding SOPClassUID 1.2.840.10008.5.1.4.1.2.1.2 - TS 1.2.840.10008.1.2.1 - PC ID 1
T1020 10:19:24.024046    JOBS-WORKER-0 DicomAssociation.cpp:397] (dicom) DicomAssociation::Open, adding SOPClassUID 1.2.840.10008.5.1.4.1.2.2.2 - TS 1.2.840.10008.1.2.1 - PC ID 3
T1020 10:19:24.024153    JOBS-WORKER-0 DicomControlUserConnection.cpp:455] (dicom) Sending Move Request:
===================== OUTGOING DIMSE MESSAGE ====================
Message Type                  : C-MOVE RQ
Presentation Context ID       : 3
Message ID                    : 1
Affected SOP Class UID        : MOVEStudyRootQueryRetrieveInformationModel
Data Set                      : present
Priority                      : medium
Move Destination              : ORTHANC4243
======================= END DIMSE MESSAGE =======================
D: timeout of 10 seconds elapsed while waiting for C-MOVE Responses
E1020 10:19:34.415069    JOBS-WORKER-0 OrthancException.cpp:68] Error in the network protocol: DicomAssociation - C-MOVE to AET "ORTHANC-BIS": DIMSE No data available (timeout in non-blocking mode)
I1020 10:19:34.415314    JOBS-WORKER-0 DicomAssociation.cpp:112] (dicom) Closing DICOM association
I1020 10:19:43.779799    JOBS-WORKER-0 JobsRegistry.cpp:538] Job has completed with failure: 6a2aaa6c-5e2c-4869-ad93-48f912f2b06e
E1020 10:19:43.780109           HTTP-2 OrthancException.cpp:68] Error in the network protocol: DicomAssociation - C-MOVE to AET "ORTHANC-BIS": DIMSE No data available (timeout in non-blocking mode)
I1020 10:19:43.780503           HTTP-2 Toolbox.cpp:2787] (http) POST /queries/07ecc89f-9ce9-4d06-b446-b29043955b12/answers/0/retrieve (elapsed: 19759194 us)

In this case, I have set DicomScuTimeout to 10 and, as expected, we do see a timeout after 10 seconds while waiting for the C-MOVE RSP.

So, in your case, it seems that something keeps the connection alive… The DICOM protocol does not have a keep-alive mechanism, but this could happen at the TCP level? In Orthanc, this timeout detection is handled by DCMTK. At this point, I don’t really want to dig into that code :wink:

On your side, it might be worth capturing the TCP traffic to check whether something happens at the TCP level that would explain why the TCP connection remains active.

Best,

Alain.

Thank you @alainmazy. From our tests with DCMTK, I suspect that we are receiving repeated C-MOVE-RSP messages with a Pending status that, for some reason, are not being logged when running within Orthanc. If responses keep arriving at regular intervals, I believe that would keep the connection open.
Obviously there is an issue either on the remote PACS or with the networking configuration. However, I would like to figure out how to prevent this behavior in Orthanc even if we encounter a misbehaving remote modality. I will keep digging and do a TCP capture if necessary.

I have finally been able to reproduce the behavior locally using a Python script and patching pynetdicom to send repeated C-MOVE-RSP messages with a Pending status. I will put this together in a gist and post the link here.

Hello @alainmazy, here is a repo with a minimal working example.
A Python script is provided to mimic the PACS behavior. A bash script is provided to trigger the behavior we observed. Everything is provided in a Docker Compose stack, but you should be able to connect to the PACS Python script from a local instance of Orthanc using the port forwarding.
Please let me know if you have any questions or need any additional information.

Hi @josh.keller

First of all, thanks for the minimal working example. Top class ! :+1:

I have made this patch to display the C-MOVE RSP messages in the logs. This way, at least, we realize there is something weird going on.

I understand that you’d like Orthanc to time out when it receives progress messages that show no actual progress, but I’m afraid this would overcomplicate Orthanc only to “hide” a dysfunctional modality. So I’m not in favor of adding such a feature to Orthanc.

At worst, the right approach to handle that in Orthanc would be to add a few callbacks such that you could write a plugin that would analyze the C-Move RSP and decide to close the association but that’s another story.

I’d be happy to get your feedback on this.

Alain.

Hi @alainmazy,
Thank you for taking a look at this. I think that patch will certainly help diagnose in the case that there is a misbehaving PACS. I also agree that this isn’t something that Orthanc should handle automatically. If a PACS is misbehaving Orthanc should provide the tools to understand what is happening and why. I think the patch goes a long way toward that.

I think the main concern I have is that this misbehaving PACS puts Orthanc in a state that is very difficult to recover from. I have not found a way to cancel the jobs without force restarting the Docker container and not loading the jobs from the database. In addition, a single “stuck” C-MOVE seems to cause some other API requests to hang, even though there should be additional logical CPUs available for the jobs engine to utilize.

For instance, issuing a request to /modalities/{id}/query while a “stuck” C-MOVE is ongoing causes the following behavior:

  1. The HTTP request is logged
  2. The C-FIND operations complete successfully (as shown in the trace logs)
  3. No HTTP response is returned
  4. orthanc_rest_api_active_requests and orthanc_jobs_running begin to pile up, even when HTTP requests are closed from the client side.
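The pile-up described in the last step can be watched from outside Orthanc. The following is a minimal sketch, assuming the metrics are available as Prometheus text exposition (plain `name value [timestamp]` lines without labels). `parse_metrics` and `growing` are hypothetical helper names, and the values in the usage note are made up.

```python
from typing import Dict, Iterable, List

# The two metrics mentioned in this thread.
WATCHED = ("orthanc_jobs_running", "orthanc_rest_api_active_requests")


def parse_metrics(text: str) -> Dict[str, float]:
    """Parse '<name> <value> [timestamp]' lines, skipping comments and blanks."""
    metrics: Dict[str, float] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # '# HELP' and '# TYPE' lines carry no sample
        parts = line.split()
        if len(parts) < 2:
            continue
        try:
            metrics[parts[0]] = float(parts[1])
        except ValueError:
            continue  # not a numeric sample: ignore it
    return metrics


def growing(previous: Dict[str, float], current: Dict[str, float],
            names: Iterable[str] = WATCHED) -> List[str]:
    """Return the watched metrics whose value increased between two scrapes."""
    return [name for name in names
            if name in previous and name in current
            and current[name] > previous[name]]
```

Scraping twice, a minute apart, while no new work is being submitted and calling `growing(parse_metrics(first), parse_metrics(second))` would list the counters that keep climbing, which matches the stuck state described above.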

Here are some ideas for how Orthanc could either better handle or provide tools for the user to handle these situations:

  1. To begin, I would not expect other requests to stop working due to a “stuck” C-MOVE. At the very least, assuming there are CPU cores still available, other operations should go on as normal.
  2. For a C-MOVE in synchronous mode, closing the HTTP connection should cancel the C-MOVE.
  3. In either synchronous or asynchronous mode, cancelling the job should cancel the C-MOVE. (If not the default behavior, perhaps it could be enabled via a ?force=true flag.)
  4. Provide some sort of timeout (per request?) that would force cancel an operation after a certain amount of time, even in the case where it appears progress is being made. I understand this is different from a “no data” timeout, but I can see a reasonable use-case for setting a maximum amount of time you want to allow an operation to take.
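To illustrate the difference between a “no data” timeout and the overall limit suggested in the last item, here is a minimal sketch. It is not how Orthanc or DCMTK handle responses: the response dictionaries, `drain_with_deadline` and `OverallDeadlineExceeded` are hypothetical, and only the Pending status value 0xFF00 comes from the logs above.

```python
import time
from typing import Callable, Dict, Iterator

PENDING = 0xFF00  # "Pending: Sub-operations are continuing"


class OverallDeadlineExceeded(Exception):
    """Raised when the whole operation takes longer than allowed."""


def drain_with_deadline(responses: Iterator[Dict[str, int]],
                        max_seconds: float,
                        clock: Callable[[], float] = time.monotonic) -> Dict[str, int]:
    """Consume responses until a final one arrives or the deadline passes.

    The deadline is only checked when a response arrives. Silence on the
    wire is assumed to be covered by a separate per-message timeout, so
    the two mechanisms complement each other.
    """
    started = clock()
    for response in responses:
        if response["status"] != PENDING:
            return response  # final status (success, failure, cancel...)
        if clock() - started > max_seconds:
            raise OverallDeadlineExceeded(
                f"still pending after {max_seconds} seconds")
    raise RuntimeError("response stream ended without a final status")
```

With a peer that sends Pending responses forever, the per-message timeout never fires because data keeps arriving, while this overall deadline does fire. The caller would then be expected to abort the association.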

These are just ideas and I’m not sure how reasonable or feasible they would be to implement. I think the important point to me is that there is a reasonably straightforward way to cancel a stuck request like this without having to completely reboot and clear the jobs database.
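Because the cancel request currently returns 200 OK even when the job keeps running, a client may want to verify the outcome rather than trust the status code. Below is a minimal client-side sketch using the /jobs/{id}/cancel and /jobs/{id} routes of the REST API. The base URL, retry counts and helper names are assumptions, and authentication is omitted.

```python
import json
import time
import urllib.request
from typing import Any, Callable


def http_json(method: str, url: str, timeout: float = 5.0) -> Any:
    """Tiny JSON helper; the client-side timeout avoids hanging forever."""
    data = b"" if method == "POST" else None
    request = urllib.request.Request(url, data=data, method=method)
    with urllib.request.urlopen(request, timeout=timeout) as reply:
        body = reply.read()
    return json.loads(body) if body else None


def cancel_and_verify(base_url: str, job_id: str,
                      call: Callable[[str, str], Any] = http_json,
                      attempts: int = 5, delay: float = 1.0,
                      sleep: Callable[[float], None] = time.sleep) -> bool:
    """Ask for cancellation, then poll until the job leaves the Running state.

    Returns True if the job stopped running, False if it still reports
    "Running" after `attempts` polls (the situation described in this thread).
    """
    call("POST", f"{base_url}/jobs/{job_id}/cancel")
    for _ in range(attempts):
        job = call("GET", f"{base_url}/jobs/{job_id}") or {}
        if job.get("State") != "Running":
            return True
        sleep(delay)
    return False
```

A `False` result tells the caller that the cancel was accepted but had no effect, which is the cue to escalate (for example, by alerting an operator) instead of silently assuming the job is gone.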

I’m not a C++ programmer but I would be happy to help investigate if pointed in the right direction.

Hi @josh.keller

Thanks for your feedback. I get your points and they all make sense.
I’ll look into it but it might take time to cover all topics.

BTW, if you are using Orthanc in the scope of a commercial application, do not hesitate to help fund the development/debugging of such topics either by donating to Open Collective or by purchasing a dedicated support contract. This is of course not mandatory but this might sometimes help prioritize our work :wink:

Best regards,

Alain.

Thank you @alainmazy. I have reached out to get details about the support contract.

For the time being, we are working to see if we can rectify the behavior on the PACS, but currently our only way to recover seems to be to set "StoreJobs": false and restart the Docker container. So my last question regarding this issue: do you have any other suggestions for how to recover from this state?

Right now, I’m afraid I don’t have any recipe to get out of this situation gracefully. I’ll look into it …