Handling cases where my DICOM database "breaks the DICOM model of the real world"

Hello Orthanc users,

I would like your advice on the problem I’m currently facing.
I am batch-importing a large quantity of studies into an Orthanc instance. Those studies are sometimes duplicated or anonymized twice, and I have no control over this bad-quality data.
As a result, I end up with multiple studies in Orthanc sharing the same StudyInstanceUID, multiple series with the same SeriesInstanceUID, and multiple instances with the same SOPInstanceUID.

My goal is to centralize all my old data so that it stays easily accessible from Orthanc. But such bad data is problematic: Orthanc issues multiple warnings saying my data “breaks the DICOM model of the real world”, and it would in any case be a bad idea to keep duplicated objects when they are supposed to be unique in the real world (all your scripts/programs build their logic on that uniqueness).

Has anyone ever set up strategies or scripts to handle this?
I see two strategies:

  • either I clean my data before injecting it into Orthanc
  • or I clean it after it has been stored in Orthanc

I was leaning toward the second solution: a script I would run every day that would delete all but one study among those whose StudyInstanceUID appears multiple times.
Cleaning my data before sending it to Orthanc seems quite complicated to me.
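For what it's worth, here is a minimal sketch of the detection half of that daily script, using Orthanc's REST API (`GET /studies` and `GET /studies/{id}`). The server URL is an assumption for a default local Orthanc with authentication disabled; adjust it for your setup.

```python
# Sketch: list StudyInstanceUIDs that map to more than one Orthanc study.
# ORTHANC is an assumption (default local instance, no authentication).
import json
from collections import defaultdict
from urllib.request import urlopen

ORTHANC = "http://localhost:8042"

def find_duplicates(studies):
    """studies: list of (orthanc_id, study_instance_uid) pairs.
    Return {uid: [orthanc_ids]} for every UID seen more than once."""
    by_uid = defaultdict(list)
    for orthanc_id, uid in studies:
        by_uid[uid].append(orthanc_id)
    return {uid: ids for uid, ids in by_uid.items() if len(ids) > 1}

if __name__ == "__main__":
    pairs = []
    for sid in json.load(urlopen(f"{ORTHANC}/studies")):
        info = json.load(urlopen(f"{ORTHANC}/studies/{sid}"))
        pairs.append((sid, info["MainDicomTags"]["StudyInstanceUID"]))
    for uid, ids in find_duplicates(pairs).items():
        print(uid, "->", ids)
```

The grouping logic is kept in a pure function so it can be tested without a running Orthanc.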

What do you think? Do you have tools or advice?

Regards,
Francois

Hi Francois,

I have encountered such a case when centralizing DICOM data in one Orthanc. At that time I had to clean up my DICOM data before sending it to Orthanc. I think there are two options in Orthanc now:
1/ Use the merge API in Orthanc to merge two studies.
2/ Write a plugin (C++/Python) that merges an incoming study into a study with the same StudyInstanceUID that already exists in Orthanc.
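For option 1/, a hedged sketch of calling Orthanc's `POST /studies/{id}/merge` endpoint from Python. The endpoint and its `Resources`/`KeepSource` body fields come from the Orthanc REST API; the server URL is an assumption for a default local instance with authentication disabled.

```python
# Sketch: merge a duplicate study into the one already stored.
# ORTHANC is an assumption (default local instance, no authentication).
import json
from urllib.request import Request, urlopen

ORTHANC = "http://localhost:8042"

def merge_body(duplicate_ids, keep_source=False):
    """Build the JSON body for POST /studies/{id}/merge.
    KeepSource=False asks Orthanc to delete the source study after the merge."""
    return {"Resources": list(duplicate_ids), "KeepSource": keep_source}

def merge_studies(target_id, duplicate_id):
    """Fold all resources of `duplicate_id` into `target_id`."""
    req = Request(f"{ORTHANC}/studies/{target_id}/merge",
                  data=json.dumps(merge_body([duplicate_id])).encode(),
                  headers={"Content-Type": "application/json"})
    return json.load(urlopen(req))
```

With `KeepSource` left at `False`, the duplicate disappears once its contents have been moved into the target study.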

FYI, here’s a basic sample of a “sanitizer” Python plugin. You should of course modify it to implement your own logic.
Note that the sanitizer can query the clean Orthanc to check whether the data is already there, to avoid duplicates.

oops sorry, the link was stuck in my clipboard … https://bitbucket.org/osimis/orthanc-setup-samples/src/master/docker/sanitize-middleman-python/

Hello all, and thanks for your responses.
After a bit of thinking, I will simply send the data to Orthanc, which seems to store it fine, and then run a periodic process that finds the StudyInstanceUIDs matching multiple studies in Orthanc. I will then delete the studies containing the fewest series.
At least that’s my plan for now.
One reason is that the data is very large and I do not have all of it yet, so this process is better tailored to my needs.
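In case it helps anyone, a minimal sketch of that periodic cleanup: for each StudyInstanceUID, keep the Orthanc study with the most series and `DELETE` the others. The server URL is again an assumption for a default local Orthanc with authentication disabled.

```python
# Sketch: among Orthanc studies sharing a StudyInstanceUID, keep the one
# with the most series and delete the rest.
# ORTHANC is an assumption (default local instance, no authentication).
import json
from collections import defaultdict
from urllib.request import Request, urlopen

ORTHANC = "http://localhost:8042"

def pick_deletions(studies):
    """studies: list of (orthanc_id, study_instance_uid, n_series).
    Return the Orthanc IDs to delete, keeping per UID the study with
    the most series (ties broken arbitrarily)."""
    by_uid = defaultdict(list)
    for oid, uid, n_series in studies:
        by_uid[uid].append((n_series, oid))
    doomed = []
    for entries in by_uid.values():
        entries.sort(reverse=True)  # largest series count first
        doomed.extend(oid for _, oid in entries[1:])
    return doomed

if __name__ == "__main__":
    rows = []
    for sid in json.load(urlopen(f"{ORTHANC}/studies")):
        info = json.load(urlopen(f"{ORTHANC}/studies/{sid}"))
        rows.append((sid, info["MainDicomTags"]["StudyInstanceUID"],
                     len(info["Series"])))
    for sid in pick_deletions(rows):
        urlopen(Request(f"{ORTHANC}/studies/{sid}", method="DELETE"))
```

The selection logic is a pure function, so it can be dry-run against the study list (print the doomed IDs instead of deleting) before letting it actually delete anything.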

Thanks