REST API Content Type for String

Hello,

I am trying to retrieve a value that is a UID and is marked as a string via the REST api of Orthanc.

While navigating the REST interface, I expected all values from ‘/instances/{id}/content/*’ to have a Content-Type of ‘application/json’. However, when you retrieve a single value such as ‘instances/{id}/content/0020-000d/’, it is returned as ‘application/octet-stream’.

I can handle that; however, I was expecting it to be a JSON string, especially since the type in the tags states “String”.

The actual issue I wanted to bring up is that, since it is returning a non-JSON string, there is an extra space (or space-like character) at the end of the string when parsing it with any HTTP client.

I have tried this with a few different patients. I am running Orthanc 1.2.0 on Windows from Osimis.

It looks like this in Python3 as a raw string: b'1.2.840.113619.2.55.3.1678393098.910.1486472352.534\x00'

Do you know why the ‘\x00’ is being appended?

Also when running this via curl and python, the content-length is stated as 52, but if you check the length of the string it is only 51 characters.

Thanks for any help you can provide.

Adit

Hello,

I am trying to retrieve a value that is a UID and is marked as a
string via the REST api of Orthanc.

In navigating the REST interface, I expected all values from
'/instances/{id}/content/*' to have a Content-Type of
'application/json'. However when you retrieve a single value such as
'instances/{id}/content/0020-000d/' it returns it as
'application/octet-stream'.

I can handle that however, I was expecting it to be a JSON string
especially if the type in the tags states "String".

Actually I believe the data model here is the one described in DICOM
(DICOM Data Dictionary), not JSON. If there were an
"application/dicom-attribute" media type or similar it would probably
be appropriate, but failing that application/octet-stream is the most
sensible one IMO.

The actual issue I wanted to bring up, is that since it is returning
a non-JSON string, when parsing it with any http client, there is an
extra space (or space-like character) returned at the end of the
string.

I have tried this with a few different patients. I am running Orthanc
1.2.0 on Windows from Osimis.

It looks like this in Python3 as a raw
string: b'1.2.840.113619.2.55.3.1678393098.910.1486472352.534\x00'

Do you know why the '\x00' is being appended?

Here is the DICOM Data Dictionary:
http://dicom.nema.org/medical/dicom/current/output/html/part06.html

Lookup the attribute you're observing: 0020,000D (Study Instance UID).
You'll see that its DICOM Value Representation is "UI".

Here is the list of DICOM Value Representations:
http://dicom.nema.org/medical/dicom/current/output/html/part05.html#table_6.2-1

Lookup "UI".
You'll have all the information you need to parse it correctly (as you
can see it is definitely not a JSON string, no quotes). This includes
the trailing null byte and a small rationale. Please note that this is
padding, it may not always appear (but AFAICT there can be only one if
it's there).
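
As a rough illustration of the padding rule described above, here is a minimal Python sketch (Python being the client language mentioned in the original post). The helper name `strip_ui_padding` is made up for this example and is not part of Orthanc or of any DICOM library; it assumes the body is an ASCII UI value with, at most, NUL padding at the end.

```python
def strip_ui_padding(raw: bytes) -> str:
    """Decode a raw DICOM UI value, dropping any trailing NUL padding."""
    # UI values are padded to an even length with a NUL byte, so the
    # padding only shows up when the UID itself has an odd length.
    return raw.rstrip(b"\x00").decode("ascii")


# Sample value taken from the original post.
raw_uid = b"1.2.840.113619.2.55.3.1678393098.910.1486472352.534\x00"
uid = strip_ui_padding(raw_uid)
print(len(raw_uid), len(uid))  # 52 51
```

An unpadded (even-length) value passes through unchanged, so the helper can be applied unconditionally to UI values.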

Also when running this via curl and python, the content-length is
stated as 52, but if you check the length of the string it is only 51
characters.

I cannot reproduce that unfortunately (see attached log, both the
Content-Length and the actual entity body are 42 bytes long in my
example). Can you provide the exact method you're using to get that
reading?

Hope this helps!

instance-content (1.16 KB)

> I am trying to retrieve a value that is a UID and is marked as a
> string via the REST api of Orthanc.
>
> In navigating the REST interface, I expected all values from
> '/instances/{id}/content/*' to have a Content-Type of
> 'application/json'. However when you retrieve a single value such as
> 'instances/{id}/content/0020-000d/' it returns it as
> 'application/octet-stream'.
>
> I can handle that however, I was expecting it to be a JSON string
> especially if the type in the tags states "String".

Actually I believe the data model here is the one described in DICOM
(DICOM Data Dictionary), not JSON. If there were an
"application/dicom-attribute" media type or similar it would probably
be appropriate, but failing that application/octet-stream is the most
sensible one IMO.

That is fair. I just assumed that, since the REST API was returning
JSON for the content endpoint, it would continue to return JSON for
attributes as well. I am able to detect this, but it takes more
workarounds than I would prefer.

Would it be possible to document, in the REST API documentation on
Google Drive, which endpoints serve JSON and which serve
application/octet-stream?

The use case that I have is a proxy for Orthanc, and I expected to
just proxy JSON for the "/instances/{id}/content/*" endpoint.

> The actual issue I wanted to bring up, is that since it is returning
> a non-JSON string, when parsing it with any http client, there is an
> extra space (or space-like character) returned at the end of the
> string.
>
> I have tried this with a few different patients. I am running Orthanc
> 1.2.0 on Windows from Osimis.
>
> It looks like this in Python3 as a raw
> string: b'1.2.840.113619.2.55.3.1678393098.910.1486472352.534\x00'
>
> Do you know why the '\x00' is being appended?

Here is the DICOM Data Dictionary:
http://dicom.nema.org/medical/dicom/current/output/html/part06.html

Lookup the attribute you're observing: 0020,000D (Study Instance UID).
You'll see that its DICOM Value Representation is "UI".

Here is the list of DICOM Value Representations:
http://dicom.nema.org/medical/dicom/current/output/html/part05.html#table_6.2-1

Lookup "UI".
You'll have all the information you need to parse it correctly (as you
can see it is definitely not a JSON string, no quotes). This includes
the trailing null byte and a small rationale. Please note that this is
padding, it may not always appear (but AFAICT there can be only one if
it's there).

I agree that it is possible for a trailing null byte to be present. If
you look at the "/tags" on an instance and look at the
StudyInstanceUID value (or any tag that returns a string) it doesn't
contain the null byte. However, when you access it via
/content/0020-000d/ it does contain the null byte. That to me isn't
consistent.

I suppose what I am looking for is the same result as parsing the
value from "/tags" when accessing it via content.

I am actually trying to do this for a specific tag for RT Structure
Set, so the tags are quite nested, so using the
"/instances/{id}/content/*" endpoint is preferred, otherwise the
request takes a long time to return.

> Also when running this via curl and python, the content-length is
> stated as 52, but if you check the length of the string it is only 51
> characters.

I cannot reproduce that unfortunately (see attached log, both the
Content-Length and the actual entity body are 42 bytes long in my
example). Can you provide the exact method you're using to get that
reading?

So using the RT Struct from the dicompyler-core test data
(https://github.com/dicompyler/dicompyler-core/tree/master/tests/testdata/example_data),
following your test methods, I got a value of 44 for the
content-length when accessing the attribute directly (see attachment).

If you execute the following AJAX request in the console of the
Orthanc Web viewer (a surrogate to access the JSON content), the value
is also 44:

$.get('/instances/53f436e0-9c63bce0-20118a60-eb66e036-4e64188b/content/0020-000d/',
function(data) {
    console.log(data.length);
});
44

However, when querying the JSON string in "/tags" you will see the
length of the string returned is only 43 characters.

$.getJSON('/instances/53f436e0-9c63bce0-20118a60-eb66e036-4e64188b/tags',
function(data) {
    console.log(data["0020,000d"]["Value"].length);
});
43

This mismatch is what I am concerned about, and have to work around.
The JSON content is stripping the null byte, and I have to do the same
in my user code when processing the attribute directly.

Furthermore, what is interesting is that Orthanc will take both
versions (with and without the null byte) when you query
"/tools/lookup" with the UID.

In my opinion, and for consistency, I think the value returned by the
"/instances/{id}/content/*" should match the JSON string, especially
since "/tools/lookup" does not differentiate. This would simplify user
code, as it would not need to parse the data, and it seems to already
be occurring for the Orthanc JSON output.

Hope this makes sense.

Adit

length-test.txt (1.11 KB)

That is fair. I just assumed that since the REST API was returning
JSON for the content endpoint, that it would continue to return JSON
for attributes as well. I am able to detect for this, but it is more
workarounds than I would prefer.

That's a fair assumption as well, though definitely not a bug IMO.
Being able to access raw DICOM data in such a structured way can be
invaluable.

You shouldn't need any workarounds to detect the format, the standard
Content-Type header serves this purpose.
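
To make that concrete, here is a minimal Python sketch of a client (or proxy) branching on the Content-Type header. The function name `decode_body` is invented for this example; it takes the header value and the body as plain arguments so that it does not depend on any particular HTTP library, and it deliberately leaves non-JSON bodies untouched as raw bytes.

```python
import json


def decode_body(content_type: str, body: bytes):
    """Return parsed JSON for JSON responses, raw bytes for anything else."""
    # Ignore media type parameters such as "; charset=utf-8".
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type == "application/json":
        return json.loads(body.decode("utf-8"))
    # application/octet-stream and friends: hand back the exact bytes,
    # including any DICOM padding, and let the caller interpret them.
    return body


print(decode_body("application/json", b'{"ID": "abc"}'))
print(decode_body("application/octet-stream", b"1.2.3\x00"))
```

With requests, this would presumably be called as `decode_body(r.headers["Content-Type"], r.content)`.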

Would it be possible to document, in the REST API documentation on
Google Drive, which endpoints serve JSON and which serve
application/octet-stream?

That's definitely an area that needs to be improved. We have a little
bit of experience with Swagger, I think it (or something like it) would
be a good way to provide reference documentation for this kind of thing
(the spreadsheet approach is less than ideal). Don't hesitate to
contribute if you can.

I agree that it is possible for a trailing null byte to be present. If
you look at the "/tags" on an instance and look at the
StudyInstanceUID value (or any tag that returns a string) it doesn't
contain the null byte. However, when you access it via
/content/0020-000d/ it does contain the null byte. That to me isn't
consistent.

I have to disagree as I feel /tags and /content serve different
purposes, and I wouldn't want the behavior of /content to disappear
(again I think it is very useful whenever we want the *exact* data
present in the DICOM tag). /tags does parse DICOM data and formats it
in JSON which is often what you want but not always. As soon as you
have something like a JSON string, things like padding null bytes
become irrelevant (this is just a value representation concern, not
part of the semantics of the value), so it makes a lot of sense to me
that it isn't present there. A JSON string, even alone in a JSON
document, also still requires double quotes (and as such has its own
idiosyncrasies that might be surprising to others).

A quick search will also yield some confusion about control characters
in JSON strings; I wouldn't take this into consideration if we really
needed to do this, but since this is debatable, I think it's another
argument in favor of the current behavior of /tags.

This definitely should be documented properly however, as you noted
above.

I suppose what I am looking for is the same result as parsing the
value from "/tags" when accessing it via content.

That's very reasonable. I see two ways of approaching this:

- Have /content/{tag} serve application/json as a non-default
representation. Clients can then use the standard Accept header to
request it.
- Have /tags/{tag} resources (instead of just the /tags list) with a
default representation in JSON. You could then use that instead.

Either way we get the best of both worlds without regressing and
breaking compatibility.
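
From the client side, the first option might look like the sketch below. To be clear, this illustrates the proposal only: nothing in this thread says Orthanc honours an Accept header on /content/{tag} today. The base URL is an assumed local installation, the instance ID is the one used elsewhere in this thread, and the request is only built, never sent.

```python
import urllib.request

# Assumptions for illustration: a local Orthanc and a known instance ID.
BASE_URL = "http://localhost:8042"
INSTANCE_ID = "53f436e0-9c63bce0-20118a60-eb66e036-4e64188b"


def build_tag_request(tag: str, want_json: bool) -> urllib.request.Request:
    """Prepare a GET on /content/{tag}, optionally asking for JSON."""
    url = f"{BASE_URL}/instances/{INSTANCE_ID}/content/{tag}"
    # Hypothetical behaviour: the server would pick the JSON
    # representation only when the client asks for it explicitly.
    headers = {"Accept": "application/json"} if want_json else {}
    return urllib.request.Request(url, headers=headers)


req = build_tag_request("0020-000d", want_json=True)
print(req.full_url)
print(req.get_header("Accept"))
```

Clients that send no Accept header would keep getting the raw representation, which is what preserves backward compatibility.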

I encourage you to open an issue request to track this feature and even
contribute the code if you can.

I am actually trying to do this for a specific tag for RT Structure
Set, so the tags are quite nested, so using the
"/instances/{id}/content/*" endpoint is preferred, otherwise the
request takes a long time to return.

Yes, it's definitely better.

You might find JS libraries for parsing DICOM into native JS data
structures that will handle all the edge cases; padding for UI value
representations might not be the only surprising thing in the specs.

So using the RT Struct from the dicompyler-core test data
(https://github.com/dicompyler/dicompyler-core/tree/master/tests/testdata/example_data),
following your test methods, I got a value of 44 for the
content-length when accessing the attribute directly (see attachment).

If you execute the following AJAX request in the console of the
Orthanc Web viewer (a surrogate to access the JSON content), the value
is also 44:

$.get('/instances/53f436e0-9c63bce0-20118a60-eb66e036-4e64188b/content/0020-000d/',
function(data) {
    console.log(data.length);
});
44

However, when querying the JSON string in "/tags" you will see the
length of the string returned is only 43 characters.

$.getJSON('/instances/53f436e0-9c63bce0-20118a60-eb66e036-4e64188b/tags',
function(data) {
    console.log(data["0020,000d"]["Value"].length);
});
43

This mismatch is what I am concerned about, and have to work around.
The JSON content is stripping the null byte, and I have to do the same
in my user code when processing the attribute directly.

I hope my explanation above was satisfactory; I really believe this is
the correct behavior and that the changes I suggest would help in your
scenario (leverage Accept in /content/{tag} or expose /tags/{tag}
resources).

Furthermore, what is interesting is that Orthanc will take both
versions (with and without the null byte) when you query
"/tools/lookup" with the UID.

That's interesting indeed, I didn't know exactly what format was
expected either. Looking at the test suite, you can see an odd-length
Python string representing a UID getting passed as-is without padding
in the body of the lookup request (and without specifying a
Content-Type header explicitly).

I would thus *not* rely on the fact that a null byte is ignored here,
and recommend for now that you make sure it is not sent to the lookup
resource.
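
A minimal Python sketch of that precaution follows. The helper name `build_lookup_request` and the base URL are placeholders for illustration; the use of POST is an assumption based on the UID being passed in the request body, and the request is only prepared here, not sent.

```python
import urllib.request


def build_lookup_request(base_url: str, value: bytes) -> urllib.request.Request:
    """Prepare a /tools/lookup request whose body carries no NUL padding."""
    # Drop the DICOM padding so that only the bare UID is sent.
    body = value.rstrip(b"\x00")
    return urllib.request.Request(
        f"{base_url}/tools/lookup", data=body, method="POST"
    )


raw_uid = b"1.2.840.113619.2.55.3.1678393098.910.1486472352.534\x00"
req = build_lookup_request("http://localhost:8042", raw_uid)
print(req.get_method(), req.full_url)
print(req.data)
```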

Still, the format should definitely be specified explicitly somewhere.

Hope this helps,

Yes, I appreciate the thorough and complete discussion.

I believe either would be satisfactory, although I would lean towards
reusing /content/{tag} with an Accept header of application/json. Does
Orthanc really need another endpoint such as /tags? I am not quite
sure. Most HTTP libraries (e.g. requests in Python, which I am using)
can default to application/json, in which case the problem would solve
itself.

There is another possibility that if Orthanc did implement
/tags/{tag}, it could work like pydicom, where you could use the Tag
Description to access the element instead of the tag itself. Although
some may feel it is more verbose, having the actual Tag Description as
an accessor could be easier to understand, instead of remembering what
the tag means.

Also, I agree that Swagger would be quite helpful for documentation.
What would be the best way to help with this?

Let me know what you think.

Adit

I believe either would be satisfactory, although I would lean towards
reusing /content/{tag} with an Accept header of application/json. Does
Orthanc really need another endpoint such as /tags? I am not quite
sure. Most HTTP libraries (e.g. requests in Python, which I am using)
can default to application/json, in which case the problem would solve
itself.

I'd support that; I always prefer leveraging HTTP fully. I think one
useful addition would be the registration of a new media type for the
raw representations (e.g. application/vnd.orthanc.dicom-value or
something) instead of application/octet-stream, mainly to avoid any
confusion. Not strictly necessary though (and application/octet-stream
needs to keep the same representation for backward-compatibility
anyway).

That being said, I don't see /tags being removed (again backward
compatibility concerns). Deprecating it seems reasonable however. (Note
that it can also be the other way around, adding DICOM VRs to /tags.)

There is another possibility that if Orthanc did implement
/tags/{tag}, it could work like pydicom, where you could use the Tag
Description to access the element instead of the tag itself. Although
some may feel it is more verbose, having the actual Tag Description as
an accessor could be easier to understand, instead of remembering what
the tag means.

I agree. Regardless of the concerns above I think the remark is also
valid for /content. DICOM actually standardizes these descriptions as
"Keywords" (see PS3.6).
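
Until something like that exists server-side, a client could keep a small keyword table of its own. The sketch below is purely illustrative: `KEYWORD_TO_TAG` is a hand-written one-entry excerpt (a real table would come from PS3.6), and `content_path` is a made-up helper, not an Orthanc or pydicom function.

```python
# Hand-written excerpt for illustration; keys are DICOM keywords and
# values use the "group,element" form seen in the /tags output.
KEYWORD_TO_TAG = {
    "StudyInstanceUID": "0020,000d",
}


def content_path(instance_id: str, keyword: str) -> str:
    """Build the /content path for a keyword listed in KEYWORD_TO_TAG."""
    group, element = KEYWORD_TO_TAG[keyword].split(",")
    # The /content resource uses "group-element" in the path.
    return f"/instances/{instance_id}/content/{group}-{element}"


print(content_path("53f436e0-9c63bce0-20118a60-eb66e036-4e64188b",
                   "StudyInstanceUID"))
```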

Also, I agree that Swagger would be quite helpful for documentation.
What would be the best way to help with this?

Honestly I think just a prototype would be great at this point, mostly
so that it can be used as a basis for discussion. The community can
organize around proper source control and publishing later.

Hello,

I am a bit lost in this very long thread.

I am unsure whether all the questions have been properly addressed, so I will go back to the original question by giving telegraphic-style answers.

In navigating the REST interface, I expected all values from ‘/instances/{id}/content/*’ to have a Content-Type of ‘application/json’. However when you retrieve a single value such as ‘instances/{id}/content/0020-000d/’ it returns it as ‘application/octet-stream’.

I can handle that however, I was expecting it to be a JSON string especially if the type in the tags states “String”.

The “/instances/{id}/content/*” gives partial, raw access to the binary DICOM file, bypassing any interpretation of the content of the data. The “application/octet-stream” is the expected behavior for this reason.

I suggest you use the “/instances/{id}/tags” or “/instances/{id}/tags?simplify” URIs if you expect to retrieve a JSON file containing the DICOM tags.

The actual issue I wanted to bring up, is that since it is returning a non-JSON string, when parsing it with any http client, there is an extra space (or space-like character) returned at the end of the string. […] Do you know why the ‘\x00’ is being appended?

This is due to the padding of the DICOM file format:
http://dicom.nema.org/dicom/2013/output/chtml/part05/sect_6.2.html

Because you access the raw DICOM stream, Orthanc keeps the “\x00” padding character.

Also when running this via curl and python, the content-length is stated as 52, but if you check the length of the string it is only 51 characters.

How do you measure the length of the string? I guess the padding “\x00” is interpreted as an end-of-string character. Here is a sample command-line session accessing a padded content:

As you will notice, the “Content-Length” is properly set, even in the presence of padding.
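
The end-of-string guess can be illustrated with a small Python sketch that involves no HTTP at all. It uses the sample value from the original post and compares the byte count a server would report with what a C-style, NUL-terminated string API sees; it is only an illustration of the suspected cause, not a reproduction of the original measurement.

```python
import ctypes

# Sample body from the original post: 51 UID characters plus one NUL.
body = b"1.2.840.113619.2.55.3.1678393098.910.1486472352.534\x00"

# What the HTTP layer counts: every byte of the entity body.
wire_length = len(body)

# What a C-style string API reports: it stops at the first NUL.
c_string_length = len(ctypes.create_string_buffer(body).value)

print(wire_length, c_string_length)  # 52 51
```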

HTH,
Sébastien-

Hello Sébastien,

Sorry, have been busy with a new baby at home. :)

I think where I was going with this thread was looking to find a
specific tag for an RT Structure Set. Since the tags are quite nested,
using the "/instances/{id}/content/*" endpoint is preferred. Compare
this to fetching "/instances/{id}/tags?simplify" and extracting the
specific tag, which takes much longer.

That being said, I have just resorted to monitoring the Content
Encoding and removing the padding characters if present.

Thanks,

Adit