Twitter: @dbrucegrant
Beyond Nuxeo 5.6

Looks like the 5.6 release of Nuxeo has some promising UI/usability improvements. No doubt there will be improvements in other areas as well. Hopefully I will have some time in the not-too-distant future to investigate.

I'm looking beyond 5.6, specifically with my new DAM goggles on. There is demand for industrial strength DAM across most verticals. In some verticals, digital asset management underpins the business itself, and in these situations industrial strength DAM is critical. I would like to see a stronger focus on improving the DAM product (and underlying CAP/imaging components).

I think it's time for Nuxeo to step up efforts to improve DAM over and under the covers - to get it on a level playing field with commercially available DAM solutions.

What do I think are the most important areas on which to focus?

1. Format agnosticism.

Whether it's a raw image from a Nikon camera, a multi-layer PSD, or one of a myriad of other formats, the DAM solution needs to be able to handle it.

2. Size agnosticism.

Industrial strength DAM products must handle 1.5GB TIFF files just as elegantly as they handle a 5MB JPEG. While processors, RAM, and disk speed will determine overall performance, the solution should have workarounds for any reasonably sized hardware or virtual machine. This also means finding and eradicating all software-based limitations for handling files > 2GB in size.

3. Metadata - all of it.

Filtering and finding images is one of the most important functions (in my opinion) of a digital asset system. Using image metadata for searching and filtering requires efficient automated extraction. And, it means that every type of common metadata must be easily extracted from supported image types. This includes, but is not limited to, XMP, IPTC, and EXIF.
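For what it's worth, ImageMagick's identify can already surface much of this at the command line; a sketch (photo.jpg is a placeholder, and %[EXIF:*] requires a reasonably recent ImageMagick):

```shell
# photo.jpg is a placeholder; point IMG at any image with embedded metadata
IMG=photo.jpg
# Dump all EXIF tags, the IPTC caption (record 2:120), and the list of
# embedded profiles (xmp, icc, iptc, ...); the "|| true" fallbacks keep
# this sketch runnable even where ImageMagick or the file is missing
identify -format '%[EXIF:*]'     "$IMG" 2>/dev/null || true
identify -format '%[IPTC:2:120]' "$IMG" 2>/dev/null || true
identify -format '%[profiles]'   "$IMG" 2>/dev/null || true
```

Automating extraction is then a matter of running these (or a library like metadata-extractor) at import time and indexing the results.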

4. Performance Profiling data and Sizing Charts

The challenges with moving and processing multi-GB files are much different from those of smaller files (where transaction times - even aggregated - are not all that significant). It would be extremely useful if Nuxeo published recommended configurations for all Nuxeo applications, but for DAM specifically. The recommendations could include JVM sizing, database configuration, OS configuration, imaging component tweaks, RAM, disk usage, etc. This is a big, complex area, but some base level of recommendations would be helpful as a starting point.

5. Speed, Speed, Speed

With big images, and lots of them, processing speed is critical. I have seen a number of opportunities in the core Nuxeo imaging code to improve overall transaction speed. One of the biggest areas for possible improvement is the imaging library - especially when spinning off multiple JPEG versions of the original image. Lots of opportunity here.

6. Third party Library Updates - More Frequent/Alternates/Plugins

Libraries that support third-party image processing (ImageMagick, metadata-extractor, etc.) have to be kept up to date, preferably as part of point releases. This ensures that newer image formats can be handled with minimal fuss. Maybe it's also time to look at alternatives to some of the image processing components. Is OpenImageIO (OIIO) a possible alternative to ImageMagick? Is there a way to abstract the third-party image library in such a way that OS X users could take advantage of the Core Image framework for image processing (which is far faster than the ImageMagick equivalents)?

[Added May 28 2012:

7. Parameter driven functionality

An example... When an asset is ingested into the DAM repository three images are generated - an original-size JPEG, a medium-size JPEG, and a thumbnail. The number of images generated and their sizes are hard-coded in the application. It would be nice if the configuration of this functionality were exposed in a configuration document, allowing additional renditions, lower resolution renditions, different formats, etc. In my opinion this would greatly improve the flexibility and applicability of DAM.]
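Purely as a sketch of what I mean - the extension point and element names below are hypothetical, not an existing Nuxeo API:

```xml
<component name="com.example.dam.renditions">
  <!-- Hypothetical extension point: illustrates the idea, not a real Nuxeo contribution -->
  <extension target="com.example.dam.picture.RenditionDefinitions">
    <rendition name="Original"  format="jpeg"/>
    <rendition name="Medium"    format="jpeg" maxWidth="1200" maxHeight="1200"/>
    <rendition name="Thumbnail" format="jpeg" maxWidth="100"  maxHeight="100"/>
  </extension>
</component>
```

Adding or changing a rendition would then be a matter of contributing another element rather than changing code.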

What else can Nuxeo improve in the current DAM product to make it more industrial strength?



Nuxeo, DAM, ImageMagick and really big files

Dam frustrating is a good way to describe my last three days. I have been working to tune the Nuxeo DAM import for very large images … frustrating because there are many pieces, moving big images takes time, and debugging/documenting the whole process introduces a natural latency.

Nuxeo import relies on ImageMagick for image-related tasks (e.g. resizing and cropping) - during import ImageMagick is used heavily and is by far the most time-consuming element of the import process. If ImageMagick slows then so does your import.

Nuxeo DAM, and for that matter the core imaging code, works really well with most images. "Works really well" extends to the integration with ImageMagick. But take a step into the world of very large/complex image files and the works-well fantasy takes a halting step into reality.

What do I mean by very large image (or complex) files? Three possible interpretations:

In absolute terms, any image file over 2GB is very large (well, technically anything over 2^31-1 bytes, which is the maximum value of a Java int). This limit is important because any code that uses int values for file sizes or related processing will be limited to 2GB files. There are still some limitations within Nuxeo when it comes to uploading very large files (not just images) - e.g., in DAM I can import a 3.5GB video file through the importer, but the same import fails when attempted through the UI. I do know that Nuxeo developers have been actively working to remove these barriers.

In relative terms, very large can be any image that is big enough (or the requested operation complex enough) to require more physical memory during image processing than your OS can make available. For example, images in the 1GB range can be a problem on a Windows box with 4GB of RAM (assuming Nuxeo server is also running). ImageMagick runs fastest when the entire pixel map can be loaded into RAM.  If the entire pixel map can't be loaded into memory then image processing slows dramatically – that’s because ImageMagick will page parts of the image in/out of memory to complete the requested task.
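One concrete lever here, assuming a reasonably recent ImageMagick: its pixel-cache limits can be capped via environment variables, so a huge image starts paging predictably instead of exhausting the box (the values below are illustrative, not recommendations):

```shell
# Illustrative caps on ImageMagick's pixel cache - tune per machine
export MAGICK_MEMORY_LIMIT=2GiB   # RAM used for the pixel cache
export MAGICK_MAP_LIMIT=4GiB      # memory-mapped pixel cache ceiling
export MAGICK_DISK_LIMIT=16GiB    # disk spill ceiling before IM gives up
# Show the effective limits where ImageMagick is installed
command -v identify >/dev/null && identify -list resource || true
```

Because Nuxeo invokes the ImageMagick tools as separate processes, these variables have to be set in the environment the Nuxeo server runs under.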

Complexity of the image file itself can exacerbate the situation. For example, multi-layered PSD images require far more processing time than a single-layer file of the same size. To the ImageMagick identify command each layer is effectively an image of its own, with its own metadata. So it takes ImageMagick much longer to trawl through multiple images to extract the required information.
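This is easy to see from the command line, assuming ImageMagick is installed (layered.psd is a placeholder):

```shell
# A plain identify prints one line per layer/frame in the PSD, while
# the [0] index restricts it to the first frame and avoids trawling
# the whole file; "|| true" keeps the sketch runnable without the file
identify 'layered.psd'    2>/dev/null || true   # one line per layer
identify 'layered.psd[0]' 2>/dev/null || true   # first frame only
```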

Add to this the differences in memory allocation/reservation across operating systems, and tuning Nuxeo DAM and ImageMagick becomes genuinely challenging.

I have spent the last three days running through numerous configurations, file sizes, limits, memory allocations, etc. trying to better understand the issues in order to move forward. What I found is...

  • So long as you're running VCS then Postgres tuning is not really important for image processing (of course it is very important for other reasons)
  • Technically, the amount of RAM you set aside for Nuxeo will not limit the size of images you can import, but practically speaking, if you don't allocate enough RAM to reflect the size and volume of images imported, displayed, etc. in your system then your DAM solution will be less than useful. The RAM required is dependent on numerous other factors so it's difficult to be entirely prescriptive
  • The amount of physical RAM available to ImageMagick is absolutely critical to timely image processing. Not enough RAM and ImageMagick will page to disk, run times will increase dramatically, and it's likely your Nuxeo transaction will time out. The end result … nothing will be imported (the transaction will be rolled back). You might want to set upper limits on the resources available to ImageMagick (using environment variables) so all available RAM isn't gobbled up
  • The speed of your disk is also important. Slow disk = slower image processing. Longer to read, longer to write, longer to create any interim temp files. If your disk is on the network then make sure it's on a high speed connection. You might want to consider a solid state drive for ImageMagick temporary files.
  • CPU speed, number of cores, and number of threads allocated (for ImageMagick) must also be considered - especially from a global perspective, since multiple image commands can be running simultaneously.
  • The location of ImageMagick temp files is important. ImageMagick commands produce temp files, the location of which can be controlled with the MAGICK_TEMPORARY_PATH environment variable. Set the path variable to a disk with lots of free space, preferably not the same disk as your OS and Nuxeo are on.
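As a sketch of that last point (the path below is illustrative - in production it should be a fast disk with plenty of free space, not the OS/Nuxeo disk):

```shell
# /tmp is used here only so the sketch runs anywhere; in production
# point this at a dedicated fast disk (SSD, ideally)
export MAGICK_TEMPORARY_PATH=/tmp/im-scratch
mkdir -p "$MAGICK_TEMPORARY_PATH"
# The variable is read per process, so it must be set in the
# environment that launches the Nuxeo server, not just in an
# interactive shell
```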

Each case is different, but the end result for Nuxeo-ImageMagick import tuning: expect to spend extra time understanding your requirements, get ready to buy more memory, and be prepared to put in some effort to get everything running quickly!

Nuxeo Community Survey - Any Interest?

I have been kicking about the idea of running a community-driven survey on Nuxeo (the product suite and the company). Is anyone else interested in understanding how Nuxeo is being used, which versions are in use, what software has been integrated with Nuxeo, what people like/don't like about Nuxeo, suggestions to improve, etc.?

My thinking is as follows:

1. Gather some basic demographics (products, versions, number of users, geography, etc.)

2. Gather feedback on how Nuxeo is doing as a company (quality, responsiveness, etc.).

3. Gather feedback on individual products (likes, dislikes, ideas to improve, etc.)

4. Gather information on how products are being used (purpose, integrations, etc.)

5. And free format feedback.

Once collected, I will generate a summary report. The report will be shared with Nuxeo as formal community feedback, and a copy will be emailed to all who participated in the survey. I will also post the results on my site for those who don't provide contact information. Published results will be completely anonymous.

This will take some effort on my part, but I think it will help the community and those considering Nuxeo, and it can help guide the future of the product.

Am I crazy (I probably am) or would this have some value? Also, let me know if you have suggestions for specific questions or areas of interest to include in the survey.


Nuxeo Hotfixes - An Idea

Today on Nuxeo Answers a user asked about the availability of the latest hotfix package for 5.5 (link to Nuxeo Answers question). The reply, as I expected, was that packaged hotfixes are for paying customers only. Absolutely understandable, for although Nuxeo is open source, Nuxeo the company is a business and needs to make money to stay in business and flourish. Hopefully everyone who has invested time and energy in Nuxeo can appreciate this.

But this got me thinking: maybe there is room for an additional subscription model for small businesses and individuals, where no additional support is provided but hotfixes are sold as-is for a fixed price (via credit card or PayPal). To Nuxeo this becomes a transaction requiring no additional labor or cost. Incremental revenue, happier users, and a chance at growing the customer base when users reach a size where they feel a more formal support arrangement is warranted.

In my mind, a reasonably priced hotfix would cost far less than having to build from source and then deploy on one or more servers (and having to deal with any resulting issues). Would $150 per hotfix be reasonable?

I can see the downside to this of course - create more hotfixes to generate more revenue... that would be evil :-)

OK, so what about an annual hotfix-only subscription? Look at the last few releases and the average number of hotfixes per year, and let's say $1500 per year for hotfixes only - but no technical support (other than that provided in the forums and Answers).


Cheers, Bruce.

Leveraging ImageMagick with Nuxeo

Until recently I never really appreciated the value and power of ImageMagick. This is one slick piece of software (or collection of image transformation tools).

The base integration with Nuxeo is good, but there is still plenty of opportunity to improve. The need for speed (and flexibility) becomes very evident when dealing with 250MB images, TIF images, IPTC/EXIF/XMP metadata, and with images that have a CMYK color space.

Quick Fix...

ImageMagick is integrated with Nuxeo - or, more accurately, loosely bound to it - via a series of command line calls that perform the image-related magic. Nuxeo expects to find the ImageMagick tools on the system path, while the loose binding comes by way of a contribution that associates Nuxeo command names with ImageMagick commands and parameters. Part of the default Nuxeo contribution is shown below…

<?xml version="1.0"?>
<component name="org.nuxeo.ecm.platform.picture.commandline.imagemagick">
    <extension target="org.nuxeo.ecm.platform.commandline.executor.service.CommandLineExecutorComponent">

        <command name="identify" enabled="true">
            <commandLine>identify</commandLine>
            <parameterString> -ping -format '%m %w %h %z' #{inputFilePath}</parameterString>
            <winParameterString> -ping -format "%m %w %h %z" #{inputFilePath}</winParameterString>
            <installationDirective>You need to install ImageMagick.</installationDirective>
        </command>

        <command name="resizer" enabled="true">
            <commandLine>convert</commandLine>
            <parameterString> -flatten -resize #{targetWidth}x#{targetHeight} -depth #{targetDepth} #{inputFilePath}[0] #{outputFilePath}</parameterString>
            <installationDirective>You need to install ImageMagick.</installationDirective>
        </command>

    </extension>
</component>
I included this chunk of configuration because it demonstrates both an issue with ImageMagick and the flexibility of Nuxeo (and ImageMagick) to address it.

In the example, the Nuxeo “resizer” command gets tied to the ImageMagick “convert” command. If the input picture is a TIF file then resizer will convert the image to a JPG and resize as required. This causes a problem when the TIF image has layers/masks. Unfortunately, converting from TIF to JPEG and resizing at the same time results in the loss of layers/masks in the resulting JPEG image – not so good.

There is a workaround, and it involves splitting the resizer command into two parts. Luckily both Nuxeo and ImageMagick are fine with pipelined commands. So I split the single default resizer command into a conversion followed by a resize, and overrode the existing contribution with the following….

<command name="resizer" enabled="true">
    <commandLine>convert</commandLine>
    <parameterString> #{inputFilePath}[0] jpg:- | convert - -resize #{targetWidth}x#{targetHeight} -depth #{targetDepth} #{outputFilePath}</parameterString>
    <installationDirective>You need to install ImageMagick.</installationDirective>
</command>
In the new “resizer” the input file first gets converted to a JPEG (represented by the -), which is then piped into a convert command to resize the JPEG as required. This does slightly more work, but TIF files now get converted and resized as expected.

Resolving the TIF conversion issue this way - a simple overriding contribution - is a fairly painless fix.

One or the Other...

Unfortunately, not all fixes are so straightforward. The “cropAndResize” command is used by the tiling service to facilitate annotations. It issues a pipelined ImageMagick command (stream | convert … see below) that takes an input stream and converts the image to a specified size using a given color map. The problem: the color map is hardcoded as RGB, which is an issue for any image in a CMYK color space.

<command name="cropAndResize" enabled="true">
    <commandLine>stream</commandLine>
    <parameterString> -map rgb -storage-type char -extract #{tileWidth}x#{tileHeight}+#{offsetX}+#{offsetY} #{inputFilePath} - | convert -depth 8 -size #{tileWidth}x#{tileHeight} -resize #{targetWidth}x#{targetHeight}! rgb:- #{outputFilePath}</parameterString>
    <installationDirective>You need to install ImageMagick.</installationDirective>
</command>

The -map rgb in the cropAndResize command can be changed to -map cmyk (and the piped output changed to cmyk:-) … and CMYK images will now appear properly for annotations. However, RGB images are then no longer tiled correctly. In this case programmatic changes are required to determine the color space of the image being processed and then pass the appropriate color map value to cropAndResize as a parameter. Getting the color space could be part of an existing identify command (where the image's attributes are already picked up) or it could be done in a separate call using "identify -format %[colorspace] filename.tif" (although this will incur extra processing cost).
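At the shell level the decision is trivial; a sketch of the logic the code change would need (input.tif is a placeholder, and the sRGB fallback only keeps the sketch runnable without ImageMagick or the file):

```shell
# Ask the image for its colorspace, then pick the matching map value
CS=$(identify -format '%[colorspace]' input.tif 2>/dev/null || echo sRGB)
case "$CS" in
  CMYK) MAP=cmyk ;;
  *)    MAP=rgb  ;;
esac
# MAP would then be passed to cropAndResize as a parameter, replacing
# the hardcoded "rgb" in both -map and the piped rgb:- output
echo "color map: $MAP"
```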

To make this work the image tiling package and imaging core package need to be modified.

Big Image Efficiencies... next time

Finally, when it comes to dealing with big files in Nuxeo (100MB+) there is room for improvement. I haven’t yet walked through the whole process but I have watched the number of temp files that get created during the ingestion process. At first blush it appears that there is an opportunity to make this more efficient – which is really important for large files! However, I will have to dig into this another time.
