A Tale of Two Vector File Formats

Since version 7.5, the DecoNetwork Designer has performed all of its rendering client-side to greatly improve the user experience. To help achieve this, we use SVG as our primary display format. SVG enjoys wide compatibility across browsers, and most of the user manipulations that our designer requires can be done in real time. If we ever run into problems, the format's human readability makes debugging easier.

However, SVG is no panacea in the world of printing, where PDF is often preferable as a preflight format. SVG and PDF are, at their core, very different formats with very different target uses. Here we discuss the challenges that DecoNetwork has had to overcome to reliably offer PDF as a production file format for printing, whilst using SVG for front-end display.

Color Space

One of the main limitations of SVG is its exclusive use of the sRGB color space. While it is true that SVG 1.1 does actually support embedded ICC color profiles, and SVG 2 supports an even wider range of managed and unmanaged colors, these features generally have no browser support. In practice, we are dealing with an unmanaged, RGB-only format. However, PDFs often contain colors defined in non-RGB color spaces (for example, PANTONE spot colors, or CMYK) and we should strive to preserve this information in the upload-designer-production workflow.

With this in mind we can summarize the following objectives when dealing with PDF files:

Support PDF as an input file format (e.g. designer uploads, stock library images and so on).
Allow manipulation of imported PDF objects in the designer in real time, using an SVG representation of those objects.
Preserve colors, including non-RGB colors, that were in the original PDF, unless they are user-modified or palette-matched in the designer.
Support PDF as an output production file format, rendering colors true to the original document or to those picked/matched in the designer.

When a vector file (e.g. a PDF) is uploaded into a DecoNetwork library or the front-end designer, it needs to be converted to SVG for our designer to be able to manipulate it. At a glance, it would be tempting to simply convert it with one of the many tools available, and manipulate only the resulting SVG from that point forward. Later, when generating production files, we could just convert our SVG back to PDF with the same tool. This would be a relatively simple route, and a little experimentation leads us to believe that this is what some of our competitors are doing.

But alas, as mentioned earlier, when the uploaded PDF contains, for example, PANTONE® spot colors or CMYK colors, the convert-and-forget approach would lose this important information, and we’d be left with an RGB-only document even when we later convert back to PDF for production files. In practice, this may lead to a fulfilment center printing a design on a garment where the colors look “off” compared to the file that was originally uploaded by the customer.

With this in mind, imagine we have a PDF file with the following structure:

As we can see it uses two CMYK colors and a color from an imaginary palette of spot colors. After a rudimentary conversion to SVG, we might end up with the following:

If we were to load this SVG into the designer and only consider the information it contained, we’d lose the original colors that the artist had intended to use.

Our solution to this lossy process is a hybrid approach. Any uploaded vector is converted to an intermediate file format (DNT, short for DecoNetwork Template) whose objects are easily converted to SVG for display purposes, but which also stores the “real” color of each object alongside the RGB representation that gets used in the SVG. This way we can build a “mapping” that we can refer to if we convert back to PDF at a later stage.

Since we didn’t want to write an entire PDF converter/parser from scratch, we build the DNT in two passes. First, we still convert the PDF to SVG using one of a handful of existing solutions. We then interrogate the PDF object structure for used colors to create the mapping which might look something like this:

Finally, we have an SVG-to-DNT converter that takes both the SVG and the above mapping to create the complete object. The DNT is converted to SVG in the designer, and any manipulations made by the user will modify both the SVG on screen and the underlying DNT.
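
To make the idea of storing the “real” color alongside its RGB stand-in more concrete, here is a minimal Python sketch. It is purely illustrative and is not the real DNT schema: the names SourceColor, TemplateObject, build_objects and recolor, and every color value a caller might pass in, are hypothetical. It assumes the mapping is keyed by the sRGB hex value that the off-the-shelf converter wrote into the SVG.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class SourceColor:
    """The color as it was defined in the uploaded PDF (hypothetical layout)."""
    space: str                      # e.g. "DeviceCMYK" or "Separation"
    components: Tuple[float, ...]   # values in 0..1; a single tint for a spot color
    spot_name: Optional[str] = None # only set for spot (Separation) colors


@dataclass(frozen=True)
class TemplateObject:
    """One drawable object: what the screen shows, plus what the artist meant."""
    svg_fill: str                   # sRGB hex string used in the on-screen SVG
    source: Optional[SourceColor]   # None when no original color is known


def build_objects(svg_fills: List[str],
                  color_map: Dict[str, SourceColor]) -> List[TemplateObject]:
    """Second pass: attach the original color (if any) to each converted fill."""
    objects = []
    for fill in svg_fills:
        key = fill.lower()
        objects.append(TemplateObject(key, color_map.get(key)))
    return objects


def recolor(obj: TemplateObject, new_fill: str) -> TemplateObject:
    """A user-picked color replaces the original, so the stored source is dropped."""
    return TemplateObject(new_fill.lower(), None)
```

Keeping the original color next to the display color, rather than in a separate lookup applied at export time, means an edit in the designer can invalidate the stored color in the same step that changes what is on screen.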

After a product is decorated in the designer and an order is created, we enter the production phase. If the production file format is PDF, the steps are effectively the reverse of the above process. Armed with the DNT generated from the designer, we convert it to an SVG and use off-the-shelf tools to convert the SVG to PDF. But now this PDF has only RGB colors. We have written another tool that takes the color mappings from the DNT and substitutes the original colors for their RGB stand-ins directly in the PDF.
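
As a rough illustration of what such a substitution step involves, the Python sketch below rewrites RGB color operators in a decoded PDF content stream. It is an assumption-laden sketch, not the tool described above. It relies only on the standard PDF operators: `rg`/`RG` set the fill/stroke color in DeviceRGB, and `k`/`K` do the same in DeviceCMYK. The mapping values and the function name restore_cmyk are hypothetical, the stream is assumed to be already decompressed, and spot colors (which need a Separation color space resource plus the `cs`/`scn` operators) are deliberately left out.

```python
import re

# Hypothetical mapping: RGB triple as it appears in the stream -> original CMYK.
RGB_TO_CMYK = {
    (1.0, 0.0, 0.0): (0.0, 0.95, 1.0, 0.0),
}

_NUM = r"([0-9]*\.?[0-9]+)"
# Three numbers followed by the fill (rg) or stroke (RG) RGB operator.
_RGB_OP = re.compile(rf"(?<![\w.-]){_NUM}\s+{_NUM}\s+{_NUM}\s+(rg|RG)\b")


def restore_cmyk(content: str, mapping=RGB_TO_CMYK) -> str:
    """Replace mapped 'r g b rg' operators with 'c m y k k' (or 'K' for strokes)."""

    def swap(match):
        rgb = tuple(float(match.group(i)) for i in (1, 2, 3))
        cmyk = mapping.get(rgb)
        if cmyk is None:
            return match.group(0)  # unmapped color: leave the operator untouched
        op = "k" if match.group(4) == "rg" else "K"
        return " ".join(f"{v:g}" for v in cmyk) + f" {op}"

    return _RGB_OP.sub(swap, content)
```

Exact float matching on the RGB triple is fragile if the converter rounds its output, so a real tool would more likely compare with a tolerance or carry an explicit identifier per object.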

“That seems too complicated,” I hear you say. “You can just convert RGB back to CMYK with a simple formula, here’s one I found with some googling…” Unfortunately it is not so simple. Whilst various formulae exist for rudimentary RGB->CMYK conversions, they are approximations at best, since RGB->CMYK is essentially a subjective conversion. The CMYK and RGB color spaces have different gamuts, and many different CMYK values can produce the same RGB result, so given only an RGB value we can’t know for sure the original CMYK color from which it was generated (i.e. the gamut mapping is missing). Not to mention that a simple formula conversion doesn’t solve the case for spot colors. We need to, and can, do better than this.
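
The ambiguity is easy to demonstrate with the kind of profile-free formula that turns up in a quick search. The Python sketch below uses that naive formula only; it is not how a color-managed (ICC) conversion works, and the function names are ours for illustration. Two quite different CMYK recipes for mid-gray land on the same RGB value, and the naive inverse can only ever give back one of them.

```python
def cmyk_to_rgb(c, m, y, k):
    """Naive, profile-free CMYK -> RGB. Inputs in 0..1, output in 0..255."""
    return tuple(round(255 * (1 - ink) * (1 - k)) for ink in (c, m, y))


def rgb_to_cmyk(r, g, b):
    """Naive inverse that always pushes as much as possible into black (K)."""
    r, g, b = (v / 255 for v in (r, g, b))
    k = 1 - max(r, g, b)
    if k == 1:
        return (0.0, 0.0, 0.0, 1.0)  # pure black, avoid dividing by zero
    cmy = tuple(round((1 - v - k) / (1 - k), 3) for v in (r, g, b))
    return cmy + (round(k, 3),)


black_only = cmyk_to_rgb(0, 0, 0, 0.5)       # gray built from black ink alone
three_color = cmyk_to_rgb(0.5, 0.5, 0.5, 0)  # gray built from C, M and Y

# Both recipes collapse to the same RGB value, so the original is unrecoverable.
print(black_only == three_color)
print(rgb_to_cmyk(*three_color))
```

Whichever recipe the artist chose, the round trip returns the black-only version, which may print quite differently from a three-color gray.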

We can visualize the PDF life-cycle as follows:

When PDFs behave badly

Occasionally we see issues with PDFs that leave us scratching our heads. In the production phase, DecoNetwork automatically captures “fatal” errors (those where a production file fails to be produced). These errors are sent immediately to our team of engineers to investigate and resolve, meaning that problems are often seamlessly rectified before the user even knows anything went wrong.

Other times, files are produced that just don’t work right. Recently a DecoNetwork client reported that a production PDF was causing Adobe Illustrator (AI) to crash when opening it. Leaving aside the general recommendation not to use AI for PDFs, sometimes you’ve got to work with the tools you have. Unfortunately AI didn’t leave any traces in its logs as to what might have gone wrong.

At first glance, there appeared to be nothing wrong with the PDF. Certainly not visually, and its internal structure appeared okay according to the various tools we had at our disposal. It would open fine in Adobe Reader, Acrobat Pro and various open-source readers.

We were left with simply trying to track down what traits were unique to the failing PDF that didn’t exist in PDFs that worked fine. Upon inspecting the PDF in a text editor (the internal structure of a PDF is quasi-human readable), one object of interest was noted:

...
6 0 obj
<< /Length 568
 /Filter /FlateDecode
 /Type /XObject
 /Subtype /Image
 /Width 1800
 /Height 2000
 /ColorSpace /DeviceGray
 /Interpolate false
 /BitsPerComponent 1
 /SMask 7 0 R
>>
stream
....

Describing what each of these keys does is beyond the scope of this article, but in short, this defines a grayscale raster image using 1 bit of data per component. Since grayscale has only one component, the image is effectively 1 bit per pixel, i.e. black and white. These 1-bit rasters repeatedly showed up in the AI-crashing PDFs and were consistently absent from the PDFs that opened without trouble. I suspected we had our culprit.
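
Spotting these objects by eye does not scale, so a quick scan helps when comparing many files. The Python sketch below is a crude heuristic, not a PDF parser: it assumes the image dictionaries are stored uncompressed in the file body (as in the excerpt above) and will miss anything inside object streams. The function name find_one_bit_images is hypothetical.

```python
import re
from typing import List

# "<num> <gen> obj << ... >> stream", without running past an "endobj".
_STREAM_OBJ = re.compile(
    rb"(\d+)\s+(\d+)\s+obj\s*<<((?:(?!endobj).)*?)>>\s*stream",
    re.DOTALL,
)


def find_one_bit_images(pdf_bytes: bytes) -> List[int]:
    """Return object numbers of image XObjects declared with 1 bit per component."""
    hits = []
    for match in _STREAM_OBJ.finditer(pdf_bytes):
        dictionary = match.group(3)
        is_image = re.search(rb"/Subtype\s*/Image\b", dictionary)
        one_bit = re.search(rb"/BitsPerComponent\s+1\b", dictionary)
        if is_image and one_bit:
            hits.append(int(match.group(1)))
    return hits
```

Running it over a known-good and a known-bad file and comparing the two lists is a fast way to confirm or rule out the pattern described above.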

It is worth noting that this is a perfectly valid PDF object. However, given the size of the PDF specification, it comes as no real surprise to us that not every combination of PDF features is supported in every PDF tool (though not supporting 1-bit images, the absolute simplest type of raster, is a tad disappointing). Nonetheless, for maximum compatibility, the goal here is to ensure that we do not produce PDFs that contain these 1-bit objects. We instead replace them with RGB objects (that happen to contain only the values [0,0,0] and [255,255,255]). Theoretically one could “tap into” the PDF pipeline (described in the previous section) at any point to swap such an object for a 24-bit RGB object.
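
The pixel-level part of that substitution is simple enough to sketch in Python. This is an illustration under stated assumptions, not our production code: it takes the already-decompressed sample data of a 1-bit DeviceGray image with the default decode (0 is black, 1 is white), where each row is padded to a whole byte and the most significant bit comes first. The function name is hypothetical, and rewriting the image dictionary (/ColorSpace, /BitsPerComponent, /Length) and recompressing the stream are left out.

```python
def expand_1bit_gray_to_rgb(packed: bytes, width: int, height: int) -> bytes:
    """Expand packed 1-bit gray samples into 8-bit-per-channel RGB samples."""
    row_bytes = (width + 7) // 8  # each row starts on a byte boundary
    if len(packed) < row_bytes * height:
        raise ValueError("sample data is shorter than width/height imply")

    out = bytearray()
    for y in range(height):
        row = packed[y * row_bytes:(y + 1) * row_bytes]
        for x in range(width):
            # Most significant bit of each byte is the leftmost pixel.
            bit = (row[x >> 3] >> (7 - (x & 7))) & 1
            out += b"\xff\xff\xff" if bit else b"\x00\x00\x00"
    return bytes(out)
```

The output contains only [0,0,0] and [255,255,255] triples, so the image looks identical while no longer being a 1-bit object.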

Digging deeper, we find that the Cairo graphics library, upon which our SVG->PDF conversion partly depends, is “smart” enough to write out a monochrome stream even when it is given an RGB stream as input, provided that input contains only black and white pixels. Presumably this is to keep the file as small as possible (the above black-and-white stream is ~9 KB uncompressed; the same content encoded as an RGB stream would be around 216 KB). What this does mean, though, is that our stream substitution can only be performed at the very last step, before the final production PDF is saved.
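
The 24-fold difference between those two figures follows directly from the bit depths: 1 bit per pixel versus 3 bytes per pixel. A small Python sketch of the arithmetic is below. The function name is ours, and the dimensions in the example are hypothetical, chosen only because they reproduce the sizes quoted above.

```python
from typing import Tuple


def raw_stream_sizes(width: int, height: int) -> Tuple[int, int]:
    """Uncompressed byte sizes of an image as (1-bit gray, 24-bit RGB)."""
    one_bit = ((width + 7) // 8) * height  # rows are padded to whole bytes
    rgb = width * height * 3               # one byte each for R, G and B
    return one_bit, rgb


print(raw_stream_sizes(320, 225))  # (9000, 216000)
```

Both streams are normally Flate-compressed in the file, so the on-disk gap is usually much smaller than the raw numbers suggest.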

We could, alternatively, inject a single colored pixel into the otherwise black-and-white stream to guarantee the stream remains RGB all the way through. This is simple and tempting. However, given the use case, we can’t help but feel that introducing a new color, even if visually minuscule, could potentially spoil a production workflow.

Final thoughts

Hopefully by now you have more insight into, and a greater appreciation of, the background processes that happen in DecoNetwork’s production file processing pipeline. At DecoNetwork we’re always adding features, and we make those that are particularly experimental available to our beta testers. We encourage you to push our system to its limits and let us know the results.

from DecoNetwork Blog https://www.deconetwork.com/blog/a-tale-of-two-vector-file-formats/

Lamurdi's Official Blog

Sunday, August 6, 2017