SeriesDescription characters '+' and '-' removed during OE2 ZIP export

Hi All,

I have a question regarding character handling during export from OE2.

We run Orthanc in a research environment. During MR experiments, researchers frequently use + and - characters in the SeriesDescription (0008,103E) to distinguish between small variations in experimental variables (e.g. increases or decreases).

When exporting a study via OE2 > Export > ZIP, the generated series directories are named based on the SeriesDescription. However, the + and - characters are removed from the folder names in the exported ZIP. I’ve included an example below to better illustrate this.

[Image: SeriesDescription values as shown in OE2, compared with the folder names in the OE2 export]

This creates confusion for researchers because:

  • It becomes difficult to distinguish between related series

  • In some cases, folders appear duplicated as the only difference was this symbol

Importantly:

  • The original DICOM files received by Orthanc contain the + and -

  • The characters are present in the SeriesDescription on OE2

  • The exported DICOM metadata still correctly contains the characters in (0008,103E)

  • The issue appears to affect only the generated folder names in the ZIP export

I assume this may be intentional filename sanitisation, but I’m not aware of any filesystem that doesn’t support + or -.

Is anyone able to clarify:

  1. Why are these characters removed from the generated folder names?

  2. Is this behaviour configurable?

Environment:

  • Orthanc version: 25.6.4

  • Orthanc Explorer 2 version: 1.8.5

  • Export method: OE2 > Export ZIP > extracted on Windows 11

Thanks in advance.

Hi Jacob,

You are right: file and folder names in a ZIP should support all characters but, right now, Orthanc keeps only the alphanumeric characters.

I have added a TODO in Orthanc to handle the full UTF-8 character set.
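To illustrate why the folders can appear duplicated, here is a minimal Python sketch of an "alphanumeric only" filter. It is not Orthanc's code: the function name `keep_alphanumeric` and the exact character class are assumptions made for illustration, and Orthanc's real rules (for instance regarding spaces) may differ.

```python
import re


def keep_alphanumeric(name: str) -> str:
    """Hypothetical filter: drop every character that is not A-Z, a-z or 0-9.

    This only mimics the behaviour described in the thread; it is not
    taken from the Orthanc source code.
    """
    return re.sub(r"[^A-Za-z0-9]", "", name)


if __name__ == "__main__":
    # Two series that differ only by the sign collapse to the same name.
    for description in ("T1+5", "T1-5"):
        print(description, "->", keep_alphanumeric(description))
```

Under this assumption, two descriptions that differ only by + or - map to the same folder name, which matches the duplication reported above.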

Best,

Alain.

Hello,

Caution is needed here. Indeed, the original ZIP format doesn’t use UTF-8 but the CP437 encoding. Support for UTF-8 was only introduced in 2006, apparently by adding a new “Unicode Path Extra Field”. I have had a brief look at the minizip 1.1 implementation that is internally used by Orthanc, and nothing indicates that this field is properly set. This is why Orthanc currently avoids non-ASCII characters in the generated ZIP archives. The first step would thus be to update minizip, which is not trivial.
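As a small illustration of the encoding concern, the Python sketch below shows what happens when a UTF-8 encoded name is read back as CP437, and that pure ASCII names (including + and -) produce identical bytes in both encodings. The sample names are made up for the example.

```python
def misread_as_cp437(name: str) -> str:
    """Encode a name as UTF-8, then decode the bytes as CP437.

    This mimics an unzip tool that ignores UTF-8 and assumes the
    original ZIP encoding.
    """
    return name.encode("utf-8").decode("cp437")


def same_bytes_in_both_encodings(name: str) -> bool:
    """Return True if the name is encoded identically in UTF-8 and CP437."""
    try:
        return name.encode("utf-8") == name.encode("cp437")
    except UnicodeEncodeError:
        # The name contains characters that CP437 cannot represent.
        return False


if __name__ == "__main__":
    print(misread_as_cp437("Séquence+1"))          # accented letter is garbled
    print(same_bytes_in_both_encodings("T1+5-2"))  # ASCII only
    print(same_bytes_in_both_encodings("Séquence+1"))
```

The takeaway is that the risk applies to non-ASCII characters only; + and - are plain ASCII and are unaffected by the CP437 versus UTF-8 question.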

Regards,
Sébastien-

Hi Alain and Sébastien,

Thank you both for your replies.

I appreciate that this isn’t a straightforward change, especially given the considerations Sébastien outlined.

Thanks for adding it to the TODO list. This is more of a quality-of-life improvement for our research workflows than a critical issue. In the meantime, we can work around it by avoiding these characters in the SeriesDescription.

Many thanks again for the clarification,

Jacob

Hello,

UTF-8 support in ZIP archives has just been implemented in Orthanc by this changeset.

The new global configuration option ZipUseUtf8 can be set to true to generate ZIP archives containing UTF-8 filenames. If the option is set to false (the default), only ASCII characters will be included, which largely corresponds to the behavior of Orthanc <= 1.12.10. Note that + and - are now also allowed, which was not the case in earlier versions of Orthanc.

This global option can also be overridden on a per-archive basis by providing the Utf8 argument to the {...}/archive and /tools/create-archive routes.
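For readers who want to try the per-archive override, here is a rough Python sketch using only the standard library. Several details are assumptions rather than facts from this thread: the server URL `http://localhost:8042`, the absence of authentication, the placeholder resource identifier, and the idea that `Utf8` is passed as a boolean in the JSON body next to the `Resources` list. Please check the REST API documentation of your Orthanc version before relying on it.

```python
import json
import urllib.request

ORTHANC_URL = "http://localhost:8042"  # assumption: local server, no authentication


def build_archive_request(resource_ids, use_utf8=True, base_url=ORTHANC_URL):
    """Prepare a POST to /tools/create-archive.

    Assumption: the "Utf8" argument is given as a boolean field of the
    JSON body, alongside the "Resources" list of Orthanc identifiers.
    """
    body = {"Resources": list(resource_ids), "Utf8": use_utf8}
    return urllib.request.Request(
        f"{base_url}/tools/create-archive",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def download_archive(resource_ids, target_path, use_utf8=True):
    """Send the request and store the returned ZIP on disk."""
    request = build_archive_request(resource_ids, use_utf8)
    with urllib.request.urlopen(request) as response, open(target_path, "wb") as out:
        out.write(response.read())


if __name__ == "__main__":
    # "some-orthanc-study-id" is a placeholder, not a real identifier.
    download_archive(["some-orthanc-study-id"], "study.zip", use_utf8=True)
```

Building the request separately from sending it makes it easy to inspect the body before contacting the server.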

This development was made possible by the very recent release 1.3.2 of minizip (released on 2026-02-17), which is the first version of minizip that sets bit 11 of the ZIP header flag when the path is encoded in UTF-8 (see their release notes).
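To check whether a given archive actually has bit 11 set, something like the following Python sketch may help. It relies on Python's own `zipfile` module (not minizip), which exposes the general purpose flags of each entry as `ZipInfo.flag_bits`. The demo archive and its entry names are invented for the example.

```python
import io
import zipfile

UTF8_FLAG = 0x800  # bit 11 of the ZIP general purpose bit flag


def utf8_flags(zip_bytes: bytes) -> dict:
    """Map each entry name to True if bit 11 (UTF-8 names) is set."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        return {
            info.filename: bool(info.flag_bits & UTF8_FLAG)
            for info in archive.infolist()
        }


def make_demo_archive() -> bytes:
    """Build a small in-memory ZIP with one ASCII and one non-ASCII path."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("T1 +5deg/IM0.dcm", b"")
        archive.writestr("T1 Séquence/IM0.dcm", b"")
    return buffer.getvalue()


if __name__ == "__main__":
    for name, flagged in utf8_flags(make_demo_archive()).items():
        print(f"{name}: UTF-8 flag set = {flagged}")
```

To inspect an archive exported from Orthanc, read the file as bytes and pass it to `utf8_flags` instead of the demo archive. Note that Python's writer only sets the flag for names that are not pure ASCII, so the result for ASCII-only entries says nothing about the tool that produced the archive.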

The reason ZipUseUtf8 is set to false by default is that UTF-8 filenames are still not well supported by many software tools. My tests were done on GNU/Linux using the 7z command-line tool. Feedback from testing this new feature is welcome.

Kind regards,
Sébastien-
