Orthanc server locking up and restart locks SQLite

Hi,

First time posting here.
We are using Orthanc as part of an AI solution.

From the company supporting us, we received the setup and the Lua scripts that perform the tasks we need.
However we notice that the server sometimes locks up and refuses to accept any images.

In the logs we don’t see anything showing an issue.
When we restart the service, our SQLite DB becomes locked, causing the server to get stuck.

The only solution then is to reboot the server.
However, we are looking for the reason it locks up in the first place,
as well as a procedure for restarting the Orthanc server without SQLite locking up.

Does anyone have any ideas where I could start?

We are running it on Windows Server 2016 with sufficient memory.
The server stays up for about 2-3 weeks and then locks up.

Thanks

I forgot to mention the purpose of the Lua scripts.
They just anonymize the DICOM files and then transfer them to an external source that analyzes the images.

But it already fails when receiving the images, since they don't show up in the log when it locks up.

> From the company supporting us, we received the setup and the Lua
> scripts that perform the tasks we need.

Given the time frames and the nature of the Lua procedures (which you
explain in a later message), I'm assuming you cannot easily "try it
without them for a few weeks". Still, if you happen to have a testing
environment where you can reproduce the problem, I encourage you to try
without the Lua procedures as we attempt to find solutions (if it can
be isolated to the Lua code it would help the troubleshooting process).

Also don't hesitate to post the code here if it's not sensitive or
proprietary.
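If you do have such a testing environment, one way to run it without the Lua procedures is to start it with a copy of the configuration whose "LuaScripts" list is empty. The sketch below assumes the configuration is a single plain JSON file; Orthanc itself accepts C-style comments in its configuration, which Python's json module rejects, so strip them first if yours has any. The function name is hypothetical.

```python
import json
from pathlib import Path


def write_config_without_lua(src: Path, dst: Path) -> list:
    """Copy an Orthanc JSON configuration, emptying its "LuaScripts" list.

    Assumes `src` is plain JSON without comments. Returns the script
    paths that were removed, so they can be restored later.
    """
    config = json.loads(src.read_text(encoding="utf-8"))
    removed = list(config.get("LuaScripts", []))
    config["LuaScripts"] = []  # no Lua procedures are loaded at startup
    dst.write_text(json.dumps(config, indent=2), encoding="utf-8")
    return removed
```

For example, `write_config_without_lua(Path("orthanc.json"), Path("orthanc-nolua.json"))` (hypothetical file names), then point the test instance at the new file. All other settings are copied unchanged, so the only difference between the two runs is the absence of the Lua code.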

> However we notice that the server sometimes locks up and refuses to
> accept any images.

I would encourage you to run a "debug build" of Orthanc until the next
lock-up. Although such builds are available for the LSB (Linux Standard
Base) binaries[1], I must admit I'm not sure they exist for Windows.
Maybe somebody else here can quickly confirm; otherwise, don't hesitate
to ask more explicitly for help to figure that out, and possibly file a
request for such builds if they're not available. Also note that since
Orthanc is open source, any company (like the one you mention) may be
able to build the program with debug information included (and the
Orthanc project would welcome the contribution of the build procedures!).

Once Orthanc locks up, you may open the Windows Task Manager, right-click
Orthanc, and select "Create dump file". You may then compress the
resulting file and send it to someone who can extract useful information
from it (it shows "where" Orthanc is stuck), or analyze it yourself with
a debugger if you have the skills.
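Since the lock-up only appears after weeks, it may help to detect it automatically and capture the dump at that moment. The sketch below polls the REST API with a timeout and, after a few consecutive failures, can run Sysinternals ProcDump (`procdump -ma`, a full memory dump) against the Orthanc process. Several assumptions apply: the URL (Orthanc's default HTTP port 8042, no authentication, since a 401 answer would count as a failure), the thresholds, `procdump.exe` being on the PATH, and above all that the REST API also stops answering when DICOM reception is stuck, which this thread has not confirmed. The function names are hypothetical.

```python
import http.client
import subprocess
import time
import urllib.request
from typing import Callable, List

# Assumption: Orthanc's default HTTP port, with no authentication required.
ORTHANC_URL = "http://localhost:8042/system"


def orthanc_responds(url: str = ORTHANC_URL, timeout: float = 10.0) -> bool:
    """Return True if the REST API answers with HTTP 200 within `timeout` seconds."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (OSError, http.client.HTTPException):
        # URLError, HTTPError and socket timeouts are all OSError subclasses.
        return False


def wait_for_hang(probe: Callable[[], bool], max_failures: int = 3,
                  interval: float = 60.0,
                  sleep: Callable[[float], None] = time.sleep) -> int:
    """Call `probe` every `interval` seconds until it fails `max_failures`
    times in a row; return the total number of probes made."""
    failures = 0
    probes = 0
    while failures < max_failures:
        probes += 1
        failures = 0 if probe() else failures + 1
        if failures < max_failures:
            sleep(interval)
    return probes


def procdump_command(pid: int, dump_path: str) -> List[str]:
    """Sysinternals ProcDump command line for a full (-ma) memory dump."""
    return ["procdump", "-accepteula", "-ma", str(pid), dump_path]


def capture_dump(pid: int, dump_path: str) -> int:
    """Run ProcDump and return its exit code.

    ProcDump's exit-code conventions are not relied on here; check that
    the dump file was actually written.
    """
    return subprocess.run(procdump_command(pid, dump_path), check=False).returncode
```

For example, call `wait_for_hang(orthanc_responds)` and then `capture_dump(pid, r"C:\dumps\orthanc-hang.dmp")`, where `pid` is the Orthanc process ID shown in Task Manager and the dump folder is hypothetical. Injecting `probe` and `sleep` keeps the polling logic testable without a running server.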

Attention: Since a "dump" is a snapshot of the Orthanc process memory,
it may contain sensitive data (e.g. patient information from the DICOM
files being processed). If you must request the assistance of an
untrusted third party for analysis (or publish the dump on the Orthanc
issue tracker or this mailing list), make sure all sensitive data has
been stripped (perhaps with the help of a trusted party like the company
you mention).

FYI: You can create a dump file of a process running the normal Orthanc
program with no debug information, but as-is it would be almost useless
to anybody trying to assist you. Still, although these don't exist yet
as far as I know, you can also request or contribute procedures (even
via a third-party developer) to generate separate "debug symbol files"
(PDB files on Windows) that can be combined with your non-debug process
dump file. For a few reasons this may not be as accurate as a full debug
build, but it would still yield very good information.

> In the logs we don't see anything showing an issue.

Noted.

> When we restart the service, our SQLite DB becomes locked, causing
> the server to get stuck.

Try removing the "index-shm" file in the Orthanc index directory (the
"IndexDirectory" configuration option). This is merely an index to the
write-ahead log and to my knowledge can be safely deleted while Orthanc
is stopped (but do back it up / move it just in case).
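A minimal sketch of that step, assuming you pass in the directory named by "IndexDirectory", that the file is named exactly "index-shm", and that the Orthanc service is already stopped (the function does not check this). The timestamped backup name and the function name are arbitrary choices. The "index-wal" file is deliberately left alone, because unlike "-shm" it may hold committed transactions.

```python
import shutil
import time
from pathlib import Path
from typing import Optional


def move_aside_shm(index_dir: Path) -> Optional[Path]:
    """Move the SQLite "index-shm" file out of the way, keeping a backup.

    Run this only while Orthanc is stopped. "index-wal" is not touched.
    Returns the backup path, or None if there was no index-shm file.
    """
    shm = index_dir / "index-shm"
    if not shm.exists():
        return None
    backup = index_dir / f"index-shm.bak-{time.strftime('%Y%m%d-%H%M%S')}"
    shutil.move(str(shm), str(backup))
    return backup
```

Moving rather than deleting keeps the original file around in case it turns out to matter for the investigation.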

Rationale: In at least one previous instance I personally heard of, a
crash on Windows induced a condition where Orthanc couldn't be started
until this file was removed, likely because the embedded SQLite library
uses it to synchronize processes (i.e. locking) and it wasn't updated to
signal that no process was using the database anymore (because of the
crash), or because it was corrupted. (But see below.)

> The only solution then is to reboot the server.

It is rather odd that rebooting the entire machine would "fix" the
SQLite lock condition. This may invalidate the "index-shm theory"
above, but I encourage you to try removing the file anyway just in
case.

I have no explanation for this at this time. Maybe double-check that
there is definitely no other Orthanc process alive. You may also use
various methods[2] to verify that no other processes (e.g. malware
scanners) are locking the file as well.
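To check for leftover Orthanc processes from a script rather than Task Manager, one option is the built-in Windows `tasklist` command with CSV output, parsed as in the sketch below. The image name `Orthanc.exe` is an assumption (check the executable name of your installation), the function names are hypothetical, and this only finds Orthanc processes; identifying other programs holding the file open still needs the tools described in [2].

```python
import csv
import io
import subprocess
from typing import List, Tuple


def parse_tasklist_csv(output: str) -> List[Tuple[str, int]]:
    """Parse `tasklist /FO CSV /NH` output into (image name, PID) pairs.

    Lines that are not CSV rows, such as the "INFO: No tasks are
    running..." message printed when nothing matches, are skipped.
    """
    processes = []
    for row in csv.reader(io.StringIO(output)):
        if len(row) >= 2 and row[1].isdigit():
            processes.append((row[0], int(row[1])))
    return processes


def find_processes(image_name: str = "Orthanc.exe") -> List[Tuple[str, int]]:
    """List running processes with the given image name (Windows only)."""
    # Only stdout is parsed; the exit code is not relied upon.
    result = subprocess.run(
        ["tasklist", "/FO", "CSV", "/NH", "/FI", f"IMAGENAME eq {image_name}"],
        capture_output=True, text=True, check=False)
    return parse_tasklist_csv(result.stdout)
```

An empty result from `find_processes()` after stopping the service suggests that no Orthanc process is still holding the database.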

> Does anyone have any ideas where I could start?

For the initial lock up, you can try:

* Look for potential similarities between the data sets sent to Orthanc
when it locks up. I gather the problem is not fully deterministic, but
it may still be partially determined by certain properties of the input
data (i.e. "it sometimes locks up when X is true in the data, but never
otherwise"), which may hint at the source of the problem. It's a long
shot which I wouldn't personally spend much time on.

* Publish your Lua scripts for review as suggested above; maybe
somebody will spot something that explains the behavior.

* Publish your list of plugins and plugin versions (e.g. via the /plugins
and /plugins/{id} resources). Also confirm your Orthanc version (e.g.
via the /system resource). Maybe somebody will point out that the
problem could originate in a plugin that hooks into the handling of
incoming data.

* Gather a "stack trace" for inspection (and submit it for third-party
review if necessary) by dumping the state of the Orthanc process running
a debug variant of the program, as suggested above. To me this is the
only methodical approach that is guaranteed to help you progress
steadily (everything else is hit or miss).

* Wait for more answers here; it's very likely I'm missing something
obvious.
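The version information mentioned in the list above can be collected in one go with a short script like the one below, whose output can be pasted into a reply. It is a sketch assuming the default HTTP port 8042 and no authentication; it also skips the "explorer.js" entry that, to my knowledge, appears in the /plugins list but serves JavaScript for Orthanc Explorer rather than a JSON plugin description. The function names are hypothetical.

```python
import json
import urllib.request
from typing import Callable, Dict

BASE_URL = "http://localhost:8042"  # assumed default port; add auth if enabled


def fetch_json(path: str, base_url: str = BASE_URL, timeout: float = 10.0):
    """GET a REST resource from Orthanc and decode its JSON body."""
    with urllib.request.urlopen(base_url + path, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def version_report(fetch: Callable[[str], object] = fetch_json) -> Dict[str, object]:
    """Collect /system plus /plugins/{id} for every plugin listed by /plugins."""
    plugins = {
        plugin_id: fetch(f"/plugins/{plugin_id}")
        for plugin_id in fetch("/plugins")
        if plugin_id != "explorer.js"  # not a plugin description (see above)
    }
    return {"system": fetch("/system"), "plugins": plugins}
```

Running `print(json.dumps(version_report(), indent=2))` prints the full answers from Orthanc rather than selected fields, so nothing depends on guessing which keys each resource returns.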

Good luck!

[1] Orthanc downloads
[2] "Find out which process is locking a file or folder in Windows", Super User