questions about private tags

Hello,

I use private tags in the RadioLogic DICOM teaching files stored on Orthanc Servers (recent mainline version) and retrieved with a web app based on Cornerstone.
This works as expected and I hope to finalize and present the project in the coming weeks.

The internal handling of private tags in Orthanc has no influence on my project, but I am confused about the related logic and I wonder whether it works as designed.
I have read all the relevant documentation and mailing list posts about this topic, but some features are still not clear to me.

I attach a test DICOM file which contains three private tags, one with non-ASCII values:

(4321,0010) LO Private Creator RadioLogic
(4321,1011) LO RadioButton1 ABCDEfgh
(4321,1012) LO RadioButton2 éöàäèü£iop

My findings about the handling of these tags in Orthanc are documented in the attached assembled screenshot-image and described hereafter.

I configured the Orthanc server to add the private tags to the dictionary. The message displayed when the server starts (screenshot 1) shows that the
configuration is correct. The Orthanc Explorer DICOM tags page shows the correct values for my private tags (screenshot 2), however with an unknown name.
I expect to see the names RadioButton1 and RadioButton2 for these tags.

When I send the GET request “http://localhost:8042/instances/{ID}/tags”, the names are also unknown and the value for the non-ASCII string
has the wrong encoding (screenshot 4). When I send a request for simple tags
(“http://localhost:8042/instances/{ID}/simplified-tags” or “http://localhost:8042/instances/{ID}/tags/?simplify”), my private tags are not listed at all,
but private tags with PrivateCreator GEISS are included in the alphabetic list.

The request “http://localhost:8042/instances/{ID}/content” provides a list including my three tags (screenshot 3). When I add the group-element number
for a standard or a private tag, I receive a file with the MIME application/octet-stream. I expect to receive a JSON file.

Requesting for example “http://localhost:8042/instances/{ID}/content/4321-1012”, a file (application/octet-stream) with 10 octets is downloaded.
When I open this file, the first six characters are wrongly encoded, the last four are correct (screenshot 5).

I am happy to run additional tests if necessary.

best regards,

Marco Barnig

test_marco_barnig.dcm (524 KB)

Additional information for my post:

I tried different character encodings in the Orthanc server configuration and in the DICOM file (tag 0008,0005), without seeing
a difference in the results.

The screenshots refer to a “DefaultEncoding” Latin1 in Orthanc.
The test DICOM file has its SpecificCharacterSet set to ISO_IR 100.

Marco Barnig

Hello,

All issues are now solved. I found the answers to all my questions myself. It was not easy to track down the multiple reasons for the unexpected behavior of Orthanc that I reported when using UTF-8 characters. It is time to provide some feedback. More details are available on my blog http://www.web3.lu/evolution-of-character-encoding.html/

  1. NFD instead of NFC

The é in Sébastien can be represented in two Unicode forms:
eAcute = "\u{E9}" (NFC = Normalization Form Canonical Composition)
combinedEAcute = "\u{65}\u{301}" // e followed by ´ (NFD = Normalization Form Canonical Decomposition)

The Mac OS X Swift class NSTask (now Process), which I use to run Unix executables in my OS X apps, encodes characters using decomposed Unicode forms (NFD), which can cause problems in certain libraries. This was the main problem.
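
The difference between the two forms can be reproduced outside of Swift. The following is only a minimal Python sketch (the helper name same_text is made up for this illustration): it shows that the NFC and NFD spellings of the same name differ byte for byte, and that normalizing both sides before comparing removes the difference.

```python
import unicodedata

nfc = "S\u00e9bastien"     # é as one precomposed code point (NFC)
nfd = "Se\u0301bastien"    # e followed by a combining acute accent (NFD)

def same_text(a: str, b: str) -> bool:
    """Hypothetical helper: compare two strings after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

# The two strings look identical on screen but are different sequences.
print(nfc == nfd)                                            # raw comparison
print(len(nfc.encode("utf-8")), len(nfd.encode("utf-8")))    # byte lengths in UTF-8
print(same_text(nfc, nfd))                                   # comparison after normalization
```

Normalizing to NFC before writing a value into a DICOM file is one possible way to avoid the mismatch, assuming the consuming library expects composed characters.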

  2. Specific Character Set (0008,0005)

The correct DICOM defined term for UTF-8, which the standard lists among the multi-byte character sets without code extensions, is ISO_IR 192.

  3. Orthanc dictionary

There was a bug in the Orthanc dictionary which was fixed on August 16, 2016.

see post https://groups.google.com/forum/#!msg/orthanc-users/wj_n_ZcQB-A/0mW_51qlDQAJ

  4. Content type for individual tags

The content type for the request orthanc:8042/instances/{ID}/content/group-element-number/ is application/octet-stream. I now think this is intended.

  5. No content type defined in HTTP headers for JSON files

To show the correct characters in the JSON files returned by the requests “orthanc:8042/instances/{ID}/content/” and “orthanc:8042/instances/{ID}/content/?simplify”, you must set the default character encoding in the browser to UTF-8, because Orthanc is not sending a content type or charset in the HTTP response. I checked the headers with Web-Sniffer. I think it would be easy to add such a header to the Orthanc server.
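
To illustrate why the browser setting matters, here is a small Python sketch of what happens when a UTF-8 JSON body is decoded with the wrong charset. The JSON body is made up for the example, and Latin-1 stands in for whatever fallback encoding a browser might pick when no charset is announced.

```python
# A made-up JSON body, encoded as UTF-8 the way a server would send it.
body = '{"Value": "éöàäèü£iop"}'.encode("utf-8")

as_utf8 = body.decode("utf-8")       # client assumes UTF-8: text is intact
as_latin1 = body.decode("latin-1")   # client assumes Latin-1: every accented
                                     # character turns into two wrong characters

print(as_utf8)
print(as_latin1)
```

The ASCII characters survive in both cases, which is why only the accented part of a value looks damaged.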

The following figure shows that Orthanc now handles all the UTF-8 characters correctly:

The display of the UTF-8 characters (notably the emoji) depends on the browser used. The figure shows, from top to bottom, the browsers Safari, Firefox and Microsoft Edge.

I attach 2 DICOM files with NFD and NFC encoded UTF-8 characters, if someone wants to do some tests. There remain some small issues with the names when downloading ZIP or DICOMDIR files.

regards,

Marco Barnig

NFC-UTF8-characters.dcm (32.7 KB)

NFD-UTF8-characters.dcm (32.7 KB)

The correct link to the blog is
http://www.web3.lu/evolution-of-character-encoding/

Dear Marco,

Sorry for the very late answer to your questions, as they were numerous and quite complex… :-)

I will first answer this part of your original post:

I attach a test DICOM file which contains three private tags, one with non-ASCII values:

(4321,0010) LO Private Creator RadioLogic
(4321,1011) LO RadioButton1 ABCDEfgh
(4321,1012) LO RadioButton2 éöàäèü£iop

I configured the Orthanc server to add the private tags to the dictionary. The message displayed when the server starts (screenshot 1) shows that the
configuration is correct. The Orthanc Explorer DICOM tags page shows the correct values for my private tags (screenshot 2), however with an unknown name.
I expect to see the names RadioButton1 and RadioButton2 for these tags.

This was indeed a limitation of Orthanc. Up to and including Orthanc 1.1.0, the “Dictionary” configuration option was limited to the definition of public tags. Support for private tags was introduced by the following changeset, which is now part of the Orthanc mainline and will be part of the release that follows Orthanc 1.1.0:

https://bitbucket.org/sjodogne/orthanc/commits/a657f7772e6915017c4088023283fc9fc0268139

In your example, you would update the configuration of Orthanc as follows:

"Dictionary" : {
  "4321,1011" : [ "LO", "RadioButton1", 1, 1, "RadioLogic" ],
  "4321,1012" : [ "LO", "RadioButton2", 1, 1, "RadioLogic" ]
}

Note how you have to add the Private Creator of the private tags (in this case, “RadioLogic”) as the last item of their declaration. After uploading your sample DICOM file to Orthanc, you can now see the following when querying the “…/tags” REST URI:

curl http://localhost:8042/instances/b8a4a406-d78e9e36-52010ace-02eb3d37-adeea726/tags

[…]
"4321,0010" : {
  "Name" : "PrivateCreator",
  "Type" : "String",
  "Value" : "RadioLogic"
},
"4321,1011" : {
  "Name" : "RadioButton1",
  "PrivateCreator" : "RadioLogic",
  "Type" : "String",
  "Value" : "ABCDEfgh"
},
"4321,1012" : {
  "Name" : "RadioButton2",
  "PrivateCreator" : "RadioLogic",
  "Type" : "String",
  "Value" : "éöàäèü£iop"
},
[…]

As this example shows, the “RadioButton1” and “RadioButton2” names are now working as you expect (up to Orthanc 1.1.0, the name would indeed read as “Unknown Tag & Data”).

Similarly, this information can be accessed in a more JavaScript-friendly form by adding a “?simplify” argument (the URL is quoted so that the shell does not interpret the “?”):

curl -s 'http://localhost:8042/instances/b8a4a406-d78e9e36-52010ace-02eb3d37-adeea726/tags?simplify' | grep Radio

"RadioButton1" : "ABCDEfgh",
"RadioButton2" : "éöàäèü£iop",

I will now continue digging into your questions regarding encodings.

HTH,
Sébastien-

I continue my answers.

When I send a request for simple tags
(“http://localhost:8042/instances/{ID}/simplified-tags” or “http://localhost:8042/instances/{ID}/tags/?simplify”), my private tags are not listed at all,
but private tags with PrivateCreator GEISS are included in the alphabetic list.

This was because the private tags were all mapped to the same name “Unknown Tag & Data”.

As a consequence, only the last private tag encountered was taken into consideration by the “simplified-tags” or “tags?simplify” URI (the values of all the previous private tags having been overwritten).

Carefully defining the private tags in the “Dictionary” configuration option will prevent this shortcoming in forthcoming releases of Orthanc.
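
The overwriting effect can be sketched in a few lines of Python. This is not Orthanc's actual implementation, only an illustration under the assumption that the simplified view is a flat name-to-value map; the function name simplify and the sample dictionaries are made up for the example.

```python
def simplify(tags: dict) -> dict:
    """Hypothetical sketch: flatten {tag: {"Name", "Value"}} into {name: value}."""
    simplified = {}
    for tag in sorted(tags):
        entry = tags[tag]
        # Two tags sharing one name collide here: the later one wins.
        simplified[entry["Name"]] = entry["Value"]
    return simplified

# Private tags without a dictionary entry all share the same placeholder name.
unnamed = {
    "4321,0010": {"Name": "PrivateCreator", "Value": "RadioLogic"},
    "4321,1011": {"Name": "Unknown Tag & Data", "Value": "ABCDEfgh"},
    "4321,1012": {"Name": "Unknown Tag & Data", "Value": "éöàäèü£iop"},
}
# With distinct names declared in the dictionary, nothing is lost.
named = {
    "4321,0010": {"Name": "PrivateCreator", "Value": "RadioLogic"},
    "4321,1011": {"Name": "RadioButton1", "Value": "ABCDEfgh"},
    "4321,1012": {"Name": "RadioButton2", "Value": "éöàäèü£iop"},
}

print(simplify(unnamed))
print(simplify(named))
```

With the placeholder name, three tags collapse into two entries; with distinct names, all three survive.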

The request “http://localhost:8042/instances/{ID}/content” provides a list including my three tags (screenshot 3). When I add the group-element number
for a standard or a private tag, I receive a file with the MIME application/octet-stream. I expect to receive a JSON file.

This is not a correct expectation, as this “/content” URI is designed to provide plain, raw, low-level access to the DICOM dataset.

Orthanc does not try to transcode anything (neither the encoding, nor the file format) through this “/content” URI: It returns exactly what is stored inside the DICOM tag. Hence the “application/octet-stream” MIME type, as the tag might contain just anything.

This URI is really low-level, and is notably useful to access the PixelData (without decompressing the image), or fields that are too long to be included by the “tags” URI (fields longer than 256 characters).

Requesting for example “http://localhost:8042/instances/{ID}/content/4321-1012”, a file (application/octet-stream) with 10 octets is downloaded.
When I open this file, the six first characters have a wrong code, the last four are correct (screenshot 5).

I have not read your subsequent messages nor your blog post yet, but I assume you didn’t know at the time of writing your original post that DICOM files might have various encodings. You can create DICOM files whose tags are encoded using Latin-1, UTF-8, Arabic… The encoding of one DICOM file is provided by the value of its “SpecificCharacterSet” (0008,0005) tag:
http://dicom.nema.org/MEDICAL/dicom/2014b/output/chtml/part03/sect_C.12.html#sect_C.12.1.1.2

In your sample DICOM file, I see that the “SpecificCharacterSet” is set to “ISO_IR 100”, indicating a Latin-1 encoding. As a consequence, the value returned by “/content/4321-1012” is a string encoded using Latin-1. Here is how to correctly reencode the “/content” of your “RadioButton2” field under Linux:

curl -s http://localhost:8042/instances/b8a4a406-d78e9e36-52010ace-02eb3d37-adeea726/content/4321,1012 | iconv -f latin1

éöàäèü£iop

Note that this call also works for Orthanc <= 1.1.0.
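
For readers who prefer to do the conversion in a script rather than with iconv, here is a minimal Python sketch. The ten bytes are written out by hand to mimic what the “/content” URI would return for this Latin-1 file; no request is sent to Orthanc.

```python
# The 10 raw bytes of "éöàäèü£iop" in Latin-1 (ISO_IR 100), written out by hand.
raw = bytes([0xE9, 0xF6, 0xE0, 0xE4, 0xE8, 0xFC, 0xA3, 0x69, 0x6F, 0x70])

text = raw.decode("latin-1")     # interpret the bytes using the file's encoding
utf8 = text.encode("utf-8")      # re-encode for a UTF-8 terminal or web page

print(text)
print(len(raw), len(utf8))       # the UTF-8 form is longer than the Latin-1 form

# Decoding the same bytes as UTF-8 fails, because 0xE9 0xF6 is not valid UTF-8.
try:
    raw.decode("utf-8")
    decodes_as_utf8 = True
except UnicodeDecodeError:
    decodes_as_utf8 = False
print(decodes_as_utf8)
```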

When Orthanc returns a JSON file, the file will always be encoded using UTF-8, which is the default encoding according to the JSON file format. This means that the “/tags” URI will automatically detect the encoding of your DICOM file (Latin-1 in your case), and will transparently convert it to UTF-8. This explains why the following call works as expected:

curl -s http://localhost:8042/instances/b8a4a406-d78e9e36-52010ace-02eb3d37-adeea726/tags | grep "4321,1012" -A5

"4321,1012" : {
  "Name" : "RadioButton2",
  "PrivateCreator" : "RadioLogic",
  "Type" : "String",
  "Value" : "éöàäèü£iop"
},

HTH,

Sébastien-

I tried different character encodings in the Orthanc server configuration and in the DICOM file (tag 0008,0005), without seeing
a difference in the results.

The screenshots refer to a “DefaultEncoding” Latin1 in Orthanc.
The test DICOM file has its SpecificCharacterSet set to ISO_IR 100.

The “DefaultEncoding” configuration option of Orthanc is only taken into consideration when it parses some DICOM file that has no “SpecificCharacterSet” (0008,0005) tag.

Indeed, some modalities do not properly set this tag, even if it is mandatory when extended characters are present in the DICOM file.

In your case, as your test DICOM file does have a proper value for this tag, it is normal that changing this configuration option does not make any difference.
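
The fallback rule can be summarized with a short Python sketch. It is not taken from the Orthanc source code: the function name pick_codec and the two-entry mapping are assumptions made for the illustration, and only the two character sets mentioned in this thread are covered.

```python
# Partial, illustrative mapping from DICOM defined terms to Python codec names.
DICOM_TO_PYTHON = {
    "ISO_IR 100": "latin-1",   # Latin-1
    "ISO_IR 192": "utf-8",     # Unicode in UTF-8
}

def pick_codec(specific_character_set, default_codec="latin-1"):
    """Hypothetical helper: the default only applies when (0008,0005) is absent."""
    if not specific_character_set:
        return default_codec   # tag missing or empty: fall back to the default
    # Tag present: its value decides, whatever the default is.
    # Terms outside the small mapping above raise KeyError in this sketch.
    return DICOM_TO_PYTHON[specific_character_set.strip()]

print(pick_codec("ISO_IR 100", default_codec="utf-8"))   # the tag wins over the default
print(pick_codec(None, default_codec="utf-8"))           # no tag: the default is used
```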

Sébastien-

This is my last answer… I see that you have studied the encoding problem in depth, most probably far deeper than me :-)

  1. NFD instead of NFC
    The Mac OS X Swift class NSTask (now Process), which I use to run Unix executables in my OS X apps, encodes characters using decomposed Unicode forms (NFD), which can cause problems in certain libraries. This was the main problem.

Wow, nice to know!

  3. Orthanc dictionary

There was a bug in the Orthanc dictionary which was fixed on August 16, 2016.

see post https://groups.google.com/forum/#!msg/orthanc-users/wj_n_ZcQB-A/0mW_51qlDQAJ

There was actually another issue, as reported earlier in this thread:
https://groups.google.com/d/msg/orthanc-users/v60O7x3uYF0/fpXk7qcTAwAJ

  4. Content type for individual tags

The content type for the request orthanc:8042/instances/{ID}/content/group-element-number/ is application/octet-stream. I now think this is intended.

Yes, this is indeed intended:
https://groups.google.com/d/msg/orthanc-users/v60O7x3uYF0/s9NEn4QVAwAJ

  5. No content type defined in HTTP headers for JSON files

To show the correct characters in the JSON files returned by the requests “orthanc:8042/instances/{ID}/content/” and “orthanc:8042/instances/{ID}/content/?simplify”, you must set the default character encoding in the browser to UTF-8, because Orthanc is not sending a content type or charset in the HTTP response. I checked the headers with Web-Sniffer. I think it would be easy to add such a header to the Orthanc server.

Nice catch! This is now also fixed in the mainline of Orthanc:
https://bitbucket.org/sjodogne/orthanc/commits/9cf176bc21ad6ea0a8f246a4f5446c8c7f19b5e3

The “Content-Type” HTTP header associated with JSON files is now “application/json; charset=utf-8” instead of “application/json”.
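
On the client side, the charset announced in this header can be read with the Python standard library. The following is only a sketch (the helper name charset_of is made up); it falls back to UTF-8 when the header carries no charset parameter.

```python
from email.message import Message

def charset_of(content_type: str, default: str = "utf-8") -> str:
    """Hypothetical helper: extract the charset parameter of a Content-Type value."""
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() returns None when no charset parameter is present.
    return msg.get_content_charset() or default

print(charset_of("application/json; charset=utf-8"))   # charset taken from the header
print(charset_of("application/json"))                  # no charset: default is used
```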

In theory, this modification is not mandatory (UTF-8 should be implied by default when dealing with JSON), but introducing it visibly solves interoperability issues. The following page might give additional interesting information:

To conclude this long series of questions/answers, I am extremely grateful to Marco for this great discussion and for all the investigations he made to improve the understanding of encodings in DICOM and Orthanc.

Kind Regards,
Sébastien-

Dear Sébastien,

thank you very much for your detailed and in-depth answer to my numerous questions.
Your response cleared my (small) remaining doubts about the DICOM encoding in Orthanc.

The development of the RadioLogic project (a radiology teaching tool with self-assessment) is harder than I expected. The main reasons are the complexity, the incomplete documentation and the frequent version changes of Apple OS X and iOS. I use Swift 3 for the Mac apps and HTML5/JavaScript for the iOS web app related to the RadioLogic project. Another difficulty is that medical and engineering minds work differently!

I am however confident that I will finish and present the first release of the project before the end of this year. Orthanc is used as the private archive for the DICOM teaching files. I think I now have a good understanding of the features and workings of Orthanc, and I enjoy using this great software package.

Thank you for making this server available for free to the medical open-source community, and thank you for your guidance and excellent support of the users in this forum.

best regards,
Marco Barnig

Dear Marco,

This is really excellent news! It is so motivating for a free and open-source service such as Orthanc to know that it proves useful in larger-scale workflows such as yours.

We are obviously looking forward to learning more about your RadioLogic radiology teaching project in the next few weeks/months :-)

Finally, I am very happy to read that the guidance provided within this forum is useful to the Orthanc community, and that Orthanc is itself helpful to medical imaging in its global nature.

Cheers,
Sébastien-