Hi,
I’m experimenting with developing an Orthanc plugin at the moment. Because the server is expected to receive a heavy load over time, I’m trying to keep everything as simple as possible: HTTP is disabled, and I’m not using the Python or Lua plugins, just plain C++.
The plugin hooks into the OrthancPluginRegisterOnStoredInstanceCallback() function, and within the callback I need access to the FileUuid of each instance. Is it possible to get the FileUuid directly from a C++ plugin without running a REST API call first?
I managed to get access to the DICOM header of the instance via OrthancPluginGetInstanceJson(context, instance), but I am unable to figure out how to obtain the FileUuid from the instanceId parameter of the callback.
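For reference, this is roughly the skeleton I have so far (a trimmed-down sketch written from memory against the plain C SDK, OrthancCPlugin.h; the boilerplate follows the standard sample plugins and error handling is omitted):

#include <orthanc/OrthancCPlugin.h>

static OrthancPluginContext* context_ = NULL;

// Called by Orthanc for every stored instance
static OrthancPluginErrorCode OnStoredInstance(const OrthancPluginDicomInstance* instance,
                                               const char* instanceId)
{
  // The DICOM header is available as JSON...
  char* json = OrthancPluginGetInstanceJson(context_, instance);
  if (json != NULL)
  {
    // ...but nothing in here exposes the FileUuid of the attachment on disk
    OrthancPluginLogInfo(context_, instanceId);
    OrthancPluginFreeString(context_, json);
  }

  return OrthancPluginErrorCode_Success;
}

extern "C"
{
  ORTHANC_PLUGINS_API int32_t OrthancPluginInitialize(OrthancPluginContext* context)
  {
    context_ = context;
    OrthancPluginRegisterOnStoredInstanceCallback(context_, OnStoredInstance);
    return 0;
  }

  ORTHANC_PLUGINS_API void OrthancPluginFinalize()
  {
  }

  ORTHANC_PLUGINS_API const char* OrthancPluginGetName()
  {
    return "storage-link-sketch";
  }

  ORTHANC_PLUGINS_API const char* OrthancPluginGetVersion()
  {
    return "0.0";
  }
}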
best wishes
Martin
Hi Martin,
No, this is designed to be completely hidden, also from the plugin point of view.
The only plugins that have access to the file UUID are the storage plugins. But there you won’t know which instance it relates to: the storage plugin only receives a file UUID plus a binary blob and is asked to store/read it to/from disk.
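To make that concrete, a storage area plugin only implements callbacks along these lines (signatures recalled from OrthancCPlugin.h and slightly abridged; they are registered through OrthancPluginRegisterStorageArea()):

// The storage callbacks only ever see an attachment UUID and raw bytes;
// there is no way to know which patient/study/series/instance a blob belongs to.
static OrthancPluginErrorCode StorageCreate(const char* uuid,
                                            const void* content,
                                            int64_t size,
                                            OrthancPluginContentType type)
{
  // Write "size" bytes of "content" under a name derived from "uuid"
  return OrthancPluginErrorCode_Success;
}

static OrthancPluginErrorCode StorageRead(void** content,
                                          int64_t* size,
                                          const char* uuid,
                                          OrthancPluginContentType type)
{
  // Read back the blob previously stored for "uuid"
  return OrthancPluginErrorCode_Success;
}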
What do you have in mind exactly?
Best regards,
Alain.
Hi Alain,
I want to use Orthanc as a larger long-term PACS that will be 80% archive with only little regular read/query access. It will be connected to about 15 larger modalities (CT, MRI, Angio), so I’m expecting quite a large amount of data coming through (roughly estimated 10TB to 20TB per year). I’m quite confident that Orthanc can handle this job, and I have already gone through all the tips and guides in the Orthanc Book for setting up and optimizing Orthanc for performance. That’s also the reason for the minimal approach without Lua or Python plugins and without HTTP for the receiving instance of Orthanc.
However, because of the long-term nature of the storage (5+ years), I would very much like the data to be stored in a PatientID/StudyUID/SeriesUID/ directory structure as a fail-safe: in case of database corruption or simply a failure of the database server, the data can then still be easily and quickly used without having to first index 50TB+ of DICOMs. Even indexing and recreating the database will be easier with this structure, because one can index by PatientID and can already use data before indexing is complete.
At first I looked into creating a custom storage plugin, but quickly realized that this would create considerable overhead in StorageRead(), because every read event would first need to figure out the PatientID, StudyUID and SeriesUID of each instance, probably via a direct database query. For performance reasons I would rather avoid this, so I was thinking that maybe it would be possible to simply create symbolic links from the original Orthanc storage area to the PatientID/StudyUID/SeriesUID/ structure, i.e. override StorageCreate() to write to the original Orthanc storage layout and additionally create a symbolic link based on PatientID/StudyUID/SeriesUID/ in a separate directory. But this would mean completely reimplementing the original Orthanc storage layout in a custom plugin simply to extend it with the symbolic link creation. Recreating already existing storage functionality is something I would like to avoid, especially for future compatibility with core Orthanc changes.
So my next idea was to create a simple plugin that hooks into OnStoredInstanceCallback() and creates symbolic links from the original Orthanc storage area to the PatientID/StudyUID/SeriesUID/ structure. This way Orthanc could keep using its original and efficient UUID-based storage area… and that’s how I ended up unable to find a way to get the FileUuid from a plugin.
I’m aware that this could probably be done by running a REST API call in OnStoredInstanceCallback() for each instance to get the FileUuid, but I’m very worried about performance in situations where thousands of instances could be transferred per minute, so I would prefer to avoid this.
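To make the idea a bit more concrete, the symlink step would conceptually look like the sketch below. CreateLink() is a hypothetical helper, the tag values would come from OrthancPluginGetInstanceJson(), and the FileUuid is assumed to be known, which is exactly the missing piece; it also assumes I understand the default filesystem layout correctly (StorageDirectory/xx/yy/uuid).

#include <string>
#include <sys/stat.h>   // mkdir()
#include <unistd.h>     // symlink()

static void CreateLink(const std::string& storageRoot,   // Orthanc "StorageDirectory"
                       const std::string& linkRoot,      // root of the mirror tree
                       const std::string& fileUuid,      // attachment UUID (the unknown part!)
                       const std::string& patientId,
                       const std::string& studyUid,
                       const std::string& seriesUid,
                       const std::string& sopUid)
{
  // Default Orthanc filesystem storage: StorageDirectory/xx/yy/<uuid>,
  // where xx and yy are the first two pairs of characters of the UUID
  const std::string target = storageRoot + "/" + fileUuid.substr(0, 2) + "/" +
                             fileUuid.substr(2, 2) + "/" + fileUuid;

  // Mirror tree: PatientID/StudyUID/SeriesUID/SOPInstanceUID.dcm
  std::string dir = linkRoot;
  for (const std::string& part : { patientId, studyUid, seriesUid })
  {
    dir += "/" + part;
    mkdir(dir.c_str(), 0755);   // EEXIST is silently ignored here for brevity
  }

  symlink(target.c_str(), (dir + "/" + sopUid + ".dcm").c_str());
}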
best wishes
Martin
Hello,
If you want to preserve an explicit “PatientID/StudyUID/SeriesUID/ directory structure” without duplication, have you considered using the new “Folder Indexer” plugin created by UCLouvain?
https://book.orthanc-server.com/plugins/indexer.html
Regards,
Sébastien-
Hi Martin,
So far, we’ve always been reluctant to expose attachment UUIDs (no, it was not possible to get the file UUID from the API).
Your symlink idea makes a lot of sense and, given the volume of data you are expecting, will likely be more scalable than the “Folder Indexer” plugin.
Knowing the attachment UUID has also been requested by users performing large migrations from HDD to S3, who wanted to copy files patient by patient and not “randomly”, a bit like your “recovery” scenario.
So I’ve just added a new route to the REST API to get all attachment info, including the UUID: https://hg.orthanc-server.com/orthanc/rev/504624b0a062
A call to http://localhost:8042/instances/…/attachments/dicom/info will then return something like:
{
  "CompressedMD5" : "b16eb30c1e14e0f3b5cfed381865d52d",
  "CompressedSize" : 201714,
  "ContentType" : 1,
  "UncompressedMD5" : "b16eb30c1e14e0f3b5cfed381865d52d",
  "UncompressedSize" : 201714,
  "Uuid" : "9b02da7e-68f0-479b-bd6b-2056c3fa9120"
}
It still requires a call to the REST API from the plugin. However, this call bypasses the HTTP server entirely and, in terms of performance, is almost like direct access to the DB.
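For instance, from a C++ plugin, something along these lines should do the trick (an untested sketch: OrthancPluginRestApiGet() is the built-in call that skips the HTTP server, and the naive string search for "Uuid" is only for illustration, a real plugin should use a proper JSON parser):

#include <orthanc/OrthancCPlugin.h>
#include <string>

static std::string GetFileUuid(OrthancPluginContext* context,
                               const std::string& instanceId)
{
  OrthancPluginMemoryBuffer answer;
  const std::string uri = "/instances/" + instanceId + "/attachments/dicom/info";

  // Internal REST call: no HTTP server involved
  if (OrthancPluginRestApiGet(context, &answer, uri.c_str()) != OrthancPluginErrorCode_Success)
  {
    return "";
  }

  std::string json(reinterpret_cast<const char*>(answer.data), answer.size);
  OrthancPluginFreeMemoryBuffer(context, &answer);

  // Naive extraction of the "Uuid" field, just to keep the sketch short
  const size_t key = json.find("\"Uuid\"");
  if (key == std::string::npos)
  {
    return "";
  }

  const size_t start = json.find('"', json.find(':', key) + 1) + 1;
  const size_t end = json.find('"', start);
  return json.substr(start, end - start);
}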
Hope this helps,
Alain.
Hi Martin,
I am also interested in your topic, and I am curious about your point about database corruption:
However, because of the long-term nature of the storage (5+ years), I would very much like the data to be stored in a PatientID/StudyUID/SeriesUID/ directory structure as a fail-safe: in case of database corruption or simply a failure of the database server, the data can then still be easily and quickly used without having to first index 50TB+ of DICOMs. Even indexing and recreating the database will be easier with this structure, because one can index by PatientID and can already use data before indexing is complete.
Suppose you have the DICOM structure PatientID/StudyUID/SeriesUID: how would you start the server without having to first index 50TB+ of DICOM? Orthanc needs to know the UUID of each level (Study/Series/Instance) from the database before calling the StorageArea. So in case of database corruption, how do you let the Orthanc server run without recreating the database?
By the way, your symlink idea is very useful. The process of backing up DICOM data will be much easier (e.g. using rsync).
Thanks,
Chris
Hi Chris,
In case of a complete database failure years in the future, I can’t say exactly what the next steps would be; that always depends on the situation. For my long-term archive solution (using normal hard drives in a RAID 6 to cost-effectively create such a big storage) I assume that a complete re-indexing of 50TB+ of DICOMs on normal HDDs would take at least a week, probably more. With only the default Orthanc storage, the only option in such a case is to reindex everything, and the data can only be used once this process is complete for all files.
With the alternative PatientID/StudyUID/SeriesUID links available, I see a couple of advantages in terms of what one could do in case of a complete failure:
- Throw together a quick pynetdicom-based server that indexes only down to the series level, i.e. it will not index SOPInstanceUID and will basically just read one DICOM per series. That should be much faster and sufficient to serve most queries (those that request complete series anyway).
- Start the Orthanc server with a clean database and run the re-indexing in the background on the patient level; this means that once a patient is indexed, it will be complete and already available for queries/usage. One could even run the re-indexing sorted by the most recently changed patients, effectively re-indexing the most recent patients first.
- It makes the storage human-readable: even with 200TB of DICOMs you could, if required, just do things manually on the file system. Say a request for a research project that needs data of specific patients comes in while the re-indexing is running; you can just grab the files directly. (It’s time-consuming manual labor, but you could find the data easily!)
I’m basically just concerned about long-term compatibility and readability, because handling such large DICOM storage is one of the biggest pains one can imagine.
@Alain, thank you very much!
I tried to implement your suggestions in a plugin; the source is available in a GitHub repo: https://github.com/MartinK84/orthanc_storage_link_plugin/blob/main/StorageLink_ChangeCallback/Plugin.cpp
There is probably still a lot of testing and some minor improvement necessary, but so far it seems to work.
I’m also planning to test implementing this as a custom storage area plugin in the future (I already created a plugin stub in the same repo, I just haven’t had time to implement it yet; the Folder Indexer plugin Sébastien linked could be a good base for this).
Thanks for your work and support!
Martin