Instance anonymization with consistent PatientID

Hi all,

I’ve seen this discussed in the forum, but I have never seen anyone post a clear answer. I’m working on a Lua anonymization script because we’re building a solution around Orthanc that will forward studies from multiple hospitals to a single cloud-backend Orthanc. At each hospital, an Orthanc server will act as a DICOM proxy and anonymizer and then autoroute the studies to the central cloud backend.

I’d like to use the OnStoredInstance callback to achieve the fastest possible routing. However, I need a way to keep the patient, study and series relationships intact. The default anonymization replaces every PatientID passing through the DICOM proxy with a hashed timestamp, so it will not work for us. Hence a couple of questions for the forum:

  1. Has anyone built an anonymization script that somehow keeps the patient, study and series relationships intact (several options have been discussed in other threads)? Feel free to share your solution if you have one.

  2. When I ask my own anonymization script to simply keep PatientID intact, Orthanc complains that I need to use the Force option. Could someone help me sort out the syntax for the Force option? What I’ve written (attached) seems to be invalid.

  3. If I were to give up on immediate autorouting and switch to OnStablePatient, I assume that the PatientID would remain constant within one Lua script execution. Would that be the case?
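For question 2, here is a minimal Python sketch of an anonymization request that keeps PatientID and sets the `Force` flag. The route `/instances/{id}/anonymize` and the `Keep`/`Force` fields belong to Orthanc's REST API. The server URL, the lack of authentication and the helper names are assumptions for illustration. A Lua script would send the same JSON body.

```python
import json
import urllib.request

# Assumption: a local Orthanc on the default HTTP port, without authentication.
ORTHANC_URL = "http://localhost:8042"


def build_anonymize_body(keep_tags):
    """Build the JSON body for POST /instances/{id}/anonymize.

    As reported above, Orthanc asks for "Force" when PatientID is kept,
    so the flag is set explicitly alongside the "Keep" list.
    """
    return {
        "Keep": list(keep_tags),
        "Force": True,
    }


def anonymize_instance(instance_id, keep_tags=("PatientID",)):
    """Send the anonymization request and return the raw response body."""
    body = json.dumps(build_anonymize_body(keep_tags)).encode("utf-8")
    request = urllib.request.Request(
        f"{ORTHANC_URL}/instances/{instance_id}/anonymize",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.read()
```

Keeping PatientID alone does not preserve the study and series relationships. As far as we understand, each per-instance call is a separate anonymization, so the UIDs it generates are not shared across calls.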

Thank you for a great product and a fantastic forum,
Pär

anonymizeSendToPeerCMRADCORE.lua (1.51 KB)

Hi Par,

I’m one of the users who has asked about this feature in the past. We’ve developed a Lua script that runs on our Orthanc and maintains consistent anonymization across a variety of fields.

We trigger our anonymization using the OnStableStudy trigger. The script anonymizes complete studies under the following constraints:

  • An incoming non-anonymized study must always produce the same outgoing anonymized Study/Patient.

      • Repeating anonymization produces exactly the same anonymous study.

      • This requires maintaining a lookup table of incoming/outgoing StudyInstanceUID and PatientID.

  • Internal FrameOfReferenceUID links must be maintained.

      • We do a lot of MR image anonymization, and those images often need these FrameOfReferenceUID links to connect the coordinate frames of separate series within the study.

      • In our case, the original FrameOfReferenceUIDs contain patient info, so they first have to be examined to determine the links before they are replaced with new anonymized UIDs.

  • All Date/Time tags are shifted by a random offset, as per our HIPAA requirements.

      • The shift is determined once per patient, upon receipt of the first study.

      • Subsequent studies are shifted by the same value.

To maintain the lookup table between incoming and outgoing data, and to store things like the date shifts, we run our Orthanc with the PostgreSQL setup, all within Docker containers on a Linux host. I had to add Lua SQL libraries to the Orthanc Docker image so that the Lua script inside Orthanc could call the PostgreSQL server to store my lookup information.
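The lookup-table and date-shift bookkeeping described above can be sketched in Python. The post stores this data in PostgreSQL through Lua SQL libraries. Here the standard-library `sqlite3` stands in for it, and the table layout, the 365-day bound on the shift and the helper names are all hypothetical. Only DA (date) values are shifted in this sketch; DT and TM tags would need similar handling.

```python
import datetime
import secrets
import sqlite3
import uuid

# Assumption: sqlite3 stands in for the PostgreSQL database described above.
SCHEMA = """
CREATE TABLE IF NOT EXISTS patient_map (
    original_patient_id TEXT PRIMARY KEY,
    anon_patient_id     TEXT NOT NULL,
    shift_days          INTEGER NOT NULL
);
CREATE TABLE IF NOT EXISTS study_map (
    original_study_uid TEXT PRIMARY KEY,
    anon_study_uid     TEXT NOT NULL
);
"""

MAX_SHIFT_DAYS = 365  # hypothetical bound on the random date shift


def open_lookup(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def patient_entry(conn, original_patient_id):
    """Return (anon_patient_id, shift_days), creating them on the first study only."""
    row = conn.execute(
        "SELECT anon_patient_id, shift_days FROM patient_map WHERE original_patient_id = ?",
        (original_patient_id,),
    ).fetchone()
    if row is None:
        # First study for this patient: choose the pseudonym and a backward
        # shift of 1..MAX_SHIFT_DAYS days once, then reuse them forever.
        row = (uuid.uuid4().hex, -1 - secrets.randbelow(MAX_SHIFT_DAYS))
        conn.execute("INSERT INTO patient_map VALUES (?, ?, ?)", (original_patient_id, *row))
        conn.commit()
    return row


def study_uid(conn, original_study_uid):
    """Return the stored anonymized StudyInstanceUID, creating a UUID-based one if new."""
    row = conn.execute(
        "SELECT anon_study_uid FROM study_map WHERE original_study_uid = ?",
        (original_study_uid,),
    ).fetchone()
    if row is None:
        row = (f"2.25.{uuid.uuid4().int}",)
        conn.execute("INSERT INTO study_map VALUES (?, ?)", (original_study_uid, row[0]))
        conn.commit()
    return row[0]


def shift_da(value, shift_days):
    """Shift a DICOM DA value (YYYYMMDD) by a whole number of days."""
    day = datetime.datetime.strptime(value, "%Y%m%d").date()
    return (day + datetime.timedelta(days=shift_days)).strftime("%Y%m%d")
```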

I’ve been working on streamlining my Docker setup so that I can share it, but so far it has been difficult. We have a lot of security requirements that make it hard to encapsulate the whole setup in a simple docker-compose.yml or Dockerfile. At the moment, I bring up my Orthanc/Postgres containers with a series of bash scripts and Dockerfiles. It’s not quite in a form I can post publicly yet, but I could send a zip if we can work out how to email each other without publishing our addresses here for spam bots to find.

John.

I should clarify that repeating our anonymization process produces anonymized studies with identical anonymized StudyInstanceUID/PatientID pairs. However, since we don’t track UIDs below the study level, the SeriesInstanceUIDs and SOPInstanceUIDs are all “new” every time the process is repeated. Our users know not to push such repeated anonymizations to the same destination PACS, because of the data duplication that would result there at the series and instance level.

Keeping the “consistent anonymization” at the Study level was a good compromise for us in terms of the amount of coding and record keeping that would be necessary compared to tracking images down to the series or even instance level.

John.

Hi John,

Thank you for getting back to me with such a detailed reply. If possible, would you be able to share your scripts by private email? You can contact me on at cmrad dot com. In our case, too, keeping consistency at the study level globally, and below that only per anonymization run, should be absolutely OK. Basically, studies should only be forwarded from the clinics once, and even if that happened twice, it would not be a big deal either.

thanks,
Pär

Hi Par,

I sent some zip files of our setup to you, along with some explanations.

At the moment, I’m using a rather old Osimis Orthanc Docker image and haven’t gotten around to updating our setup for the latest version. That mostly affects the plugins we use with our Orthanc and shouldn’t change the PostgreSQL Lua connection all that much.

John.

Hi, John.

I’d like to receive a copy of these zip files.
Could you send them to at gmail dot com?

Thanks in advance.

Regards,

Everton

Hi Everton,

I will be traveling for a few days, but can get to this next week.

I have now moved to a docker-compose recipe with one or two helper scripts to get things up and going.

The first thing you might look into, however, is downloading the Osimis Docker files for building their Docker image. I start with their Dockerfile and change its Ubuntu 16.04 base to Ubuntu 18.04, since a few Lua libraries that I need are only available on 18.04.

My typical process is to pull the Osimis orthanc-builder from git: https://bitbucket.org/osimis/orthanc-builder/src/master/

Edit the Dockerfile to make the 18.04 change, and build a local 18.04 version of the Osimis Orthanc image.

After that, I have the new docker-compose setup that I can get to you next week.

John.

John,

Thank you very much.
I’ll look into these Docker image resources in the meantime.

Best Regards,

Everton

I emailed a zip archive to your address.

John.

Hi,

Please note that Orthanc release 1.5.8 has fixed the lost relationships between CT and RT-STRUCT during anonymization!

At Osimis, we use Orthanc to anonymise studies with consistent PatientIDs by forcing a PatientID value computed with the HMAC-SHA256 algorithm, using a key stored on the Orthanc that issues the study.
Then the PatientID is unique and constant per institution and patient.
The StudyInstanceUID is randomly generated, so you get a different one for the same study each time you anonymise it.
It does not, however, make it easy to re-identify the original PatientID, as an encryption scheme following the DICOM standard would (see http://dicom.nema.org/medical/dicom/current/output/html/part15.html §E).
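Here is a minimal Python sketch of the keyed pseudonym Michel describes. Only the idea of "HMAC-SHA256 with a key local to the issuing Orthanc, forced into PatientID" comes from the post. The function names, the hex encoding and the `/studies/{id}/anonymize` body are assumptions, not the Osimis code.

```python
import hashlib
import hmac


def pseudonymize_patient_id(patient_id: str, institution_key: bytes) -> str:
    """Return a keyed, deterministic pseudonym for a PatientID.

    HMAC-SHA256 yields 32 bytes, i.e. 64 hex characters, which still fits the
    64-character limit of the DICOM LO value representation used by PatientID.
    """
    mac = hmac.new(institution_key, patient_id.encode("utf-8"), hashlib.sha256)
    return mac.hexdigest()


def study_anonymize_body(patient_id: str, institution_key: bytes) -> dict:
    """Hypothetical body for POST /studies/{id}/anonymize forcing the hashed PatientID."""
    return {
        "Replace": {"PatientID": pseudonymize_patient_id(patient_id, institution_key)},
        "Force": True,
    }
```

Because the key never leaves the hospital's Orthanc, the pseudonym cannot be recomputed elsewhere. Two institutions that happen to issue the same PatientID also get different pseudonyms. In this sketch, the key would be a long random secret, for example from `secrets.token_bytes(32)`, generated once and kept on the issuing Orthanc.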

Cheers,

Michel

John,

Thank you and sorry for the late response.
I think something went wrong, because I didn’t find your e-mail.
Could you send it again?

Thanks in advance and best regards

Everton Portela

Perhaps gmail rejects emails with zip attachments? Did it end up in your junk folder?

We’re having email trouble with our systems at the moment, so I’m writing this from our web email. I won’t be able to send the zip until I have access from my normal machine.

John.

Hi Everton, Par, Robert, Michel and all –

I only realized this thread was stale after writing a long-ish reply, but I will post it anyway in case it is still of some use to the OP or others.

I have also created scripts that do some parts of what Par and Robert described. My technique is to derive consistent new “random” Study, Series, and Instance UIDs without a LUT, using the same method others have discussed for hashing consistent “random” patient IDs and times. Thus, the anonymization can be done at the instance level but is still consistent across the series, study, and patient levels. Moreover, repeated anonymizations of the same source data result in identical output tags.
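Deriving UIDs without a lookup table can be sketched in Python as follows. This is not the diana2 code. It swaps in HMAC-SHA256, in line with the later comments about SHA-1 and MD5. Reading the first 128 digest bits as the integer after the `2.25.` root is an assumption, since that root is formally meant for UUID-derived values.

```python
import hashlib
import hmac

UID_ROOT = "2.25."  # DICOM root for UUID-derived UIDs (PS3.5 Annex B.2)


def derive_uid(original_uid: str, key: bytes) -> str:
    """Map an original UID to a new one deterministically, without a lookup table.

    The first 128 bits of an HMAC-SHA256 digest are read as an unsigned integer,
    mimicking the 2.25.<128-bit integer> form. The result is at most 44 characters,
    well under the 64-character UID limit, and str() never emits leading zeros.
    """
    digest = hmac.new(key, original_uid.encode("ascii"), hashlib.sha256).digest()
    return UID_ROOT + str(int.from_bytes(digest[:16], "big"))
```

Each instance can be processed on its own. Any instance of the same series receives the same new SeriesInstanceUID, and rerunning the anonymization reproduces identical output.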

Take a look at the project scripts at https://github.com/derekmerck/diana2. There are a bunch of somewhat documented Python libraries and some examples of anonymizing DICOM objects with sham info as data move in and out of Orthanc. For a while I worked on a Lua script that called my Python anonymizer whenever an instance arrived, but I ended up implementing it through an Orthanc manager process instead, because that was easier for me to debug.

This may be something that @John Roberts and @Michel Rozpendowski might want to look at as well, since I think it would overcome their repeated anonymization side-effects, if I understand correctly.

Best regards,

Derek–

Thanks for the link. I’ll take a look. The “hash in place” approach is certainly a far simpler solution. This is essentially what RSNA’s CTP anonymizer does behind the scenes. Unfortunately, it uses an MD5 hash, which is no longer considered secure for patient privacy within the US regulatory framework.

My plan is to use a stronger hash algorithm than MD5 and “hash in place” as you are doing now, once I finally have time to convert my Lua scripts into a standalone Python package that uses the Orthanc REST API.

John.

I can see Diana is hashing with SHA-1, which is also not secure.
You should at least consider using SHA-2 to avoid collisions.
Moreover, at Osimis we hash with HMAC-SHA256, which is even more secure because it uses a key local to the origin institution holding the identified studies. This has the added advantage of producing different patient ID hashes for two patients from different institutions who happen by chance to share the same patient ID.

Good point about adding a key per institution.

I was anticipating adding an Orthanc-instance-dependent salt to the hashing algorithm. We tend to anonymize per research project, with a dedicated Orthanc per project. Hence, we don’t generally need the anonymization to agree across institutions.

John.

Out of curiosity, for those using hashes to anonymize the UIDs, are you also updating the metadata links that Orthanc maintains (“Anonymized From X”)?

My own anonymization in Lua breaks the “Anonymized From” metadata below the patient level. I’ve never figured out how to update the metadata to reflect the connections between my anonymized output and its parent. That seems to be something only the built-in REST API anonymize function handles.

I’ve asked here before about the meta information and gather that it’s not easily updatable.

But then, with your hash algorithm, you must also be breaking any “anonymized from” links. Do you just accept that outcome? Or do you update the metadata to maintain the linkage within Orthanc?

John.

At Osimis, we use hashes only for the Patient ID, and rely on Orthanc for the generation of the other UIDs when anonymizing at study level. We are therefore not losing any meta-connection in Orthanc.

Interesting. One of the issues I ran into with the standard REST API anonymizer was that it did not account for UIDs buried in sequence tags. MR is one modality stuffed with these buried UIDs, which link one series with another. To maintain the cross-referencing, I first have to build a graph of the existing connections among all the original Study/Series/Image UIDs. I then use the REST API to create new UIDs in a consistent manner that retains the interconnections in the anonymized output.

Employing a hash would make that a lot easier, since I could feed the existing UIDs to the hash to generate the new UIDs. This is what RSNA’s Java CTP anonymizer does, though with the insecure old MD5 hash, as mentioned. A hash approach is simpler than a complicated graph/map solution because I don’t have to capture the interconnections explicitly: the hash simply preserves them.
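To show how a hash preserves cross-references without an explicit graph, here is a Python sketch that walks a nested dataset and hashes every UID, including those inside sequences. It assumes the JSON shape of Orthanc's simplified tags, with tag names as keys and sequences as lists of objects. It uses HMAC-SHA256 rather than MD5. The set of class UIDs left untouched (`PRESERVED_UID_TAGS`) is a hypothetical starting point, not a complete list.

```python
import hashlib
import hmac

# Hypothetical, incomplete list: these UIDs identify *types* (SOP classes,
# transfer syntaxes, implementations), not patient data, so they are kept.
PRESERVED_UID_TAGS = {
    "SOPClassUID",
    "ReferencedSOPClassUID",
    "MediaStorageSOPClassUID",
    "TransferSyntaxUID",
    "ImplementationClassUID",
}


def derive_uid(original_uid, key):
    """Deterministic UID: first 128 bits of HMAC-SHA256 under the 2.25 root."""
    digest = hmac.new(key, original_uid.encode("ascii"), hashlib.sha256).digest()
    return "2.25." + str(int.from_bytes(digest[:16], "big"))


def remap_value(value, key):
    # Multi-valued UIDs are backslash-separated in DICOM; hash each component.
    return "\\".join(derive_uid(part, key) for part in value.split("\\"))


def remap_uids(node, key):
    """Return a copy of a simplified-tags dataset with instance-specific UIDs hashed.

    Recurses into sequences (lists of dicts), so a UID buried in, e.g.,
    ReferencedSeriesSequence maps to the same value as the UID it points to.
    """
    if isinstance(node, list):
        return [remap_uids(item, key) for item in node]
    if not isinstance(node, dict):
        return node
    result = {}
    for name, value in node.items():
        if name.endswith("UID") and name not in PRESERVED_UID_TAGS and isinstance(value, str):
            result[name] = remap_value(value, key)
        else:
            result[name] = remap_uids(value, key)
    return result
```

The same input UID always hashes to the same output. A FrameOfReferenceUID shared by two series, or a SeriesInstanceUID referenced from another series, therefore stays linked after anonymization, and no graph needs to be stored.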

If Lua had a hash class in its standard distribution, I’d probably consider switching to that now, but as far as I can tell the Lua hashing libraries are mostly third-party. I’d rather use code that has been reviewed by more people than just its original author.

For now, switching to a hashing approach for UIDs will have to wait until I rewrite my Lua code in Python.

John.