Dam frustrating is a good way to describe my last three days. I have been working to tune the Nuxeo DAM import for very large images … frustrating because there are many pieces, moving big images takes time, and debugging/document the whole process introduces a natural latency.
Nuxeo import relies on ImageMagick for image-related tasks (e.g. resizing and cropping) - during import ImageMagick is used heavily and is by far the most time-consuming element of the import process. If ImageMagick slows then so does your import.
Nuxeo DAM, for that matter the core imaging code, works really well with most images. "Works really well" extends to the integration with ImageMagick. But take a step into the world of very large/complex image files and the works well fantasy world takes a halting step into reality.
What do I mean by very large image (or complex) files? Three possible interpretations:
In absolute terms, any image file over 2GB is very large (well, technically anything over 2^31-1, which is the limit of an int value in Java). This limit is important because any code that uses int values for file size or related processing will be limited to 2GB files. There are still some limitations within Nuxeo when it comes to uploading very large files (not just images) - e.g., in DAM I can import a 3.5GB video file through the importer but the same import fails when attempted through the UI. I do know that Nuxeo developers have been working actively to remove these barriers.
In relative terms, very large can be any image that is big enough (or the requested operation complex enough) to require more physical memory during image processing than your OS can make available. For example, images in the 1GB range can be a problem on a Windows box with 4GB of RAM (assuming Nuxeo server is also running). ImageMagick runs fastest when the entire pixel map can be loaded into RAM. If the entire pixel map can't be loaded into memory then image processing slows dramatically – that’s because ImageMagick will page parts of the image in/out of memory to complete the requested task.
Complexity of the image file itself can exacerbate the situation. For example, multi-layered PSD images require far more processing time than a single layer file of the same size. To the ImageMagick identify command each layer is effectively an image of its own, with its own metadata. So it takes ImageMagick much longer to troll through multiple images to extract the required information.
Add to this complexity the differences in memory allocation/reservation of different operating systems and it makes it challenging to tune Nuxeo DAM and ImageMagick.
I have spent the last three days running through numerous configurations, file sizes, limits, memory allocations, etc. trying to better understand the issues in order to move forward. What I found is...
- So long as you're running VCS then Postgres tuning is not really important for image processing (of course it is very important for other reasons)
- Technically, the amount of RAM you set aside for Nuxeo will not limit the size of images you can import, but practically speaking, if you don't allocate enough RAM to reflect the size and volume of images imported, displayed, etc. in your system then your DAM solution will be less than useful. The RAM required is dependent on numerous other factors so it's difficult to be entirely prescriptive
- The amount of physical RAM available to ImageMagick is absolutely critical to timely image processing. Not enough RAM and ImageMagick will page to disk, run times will increase dramatically and it's likely your Nuxeo transaction will timeout. The end result … nothing will be imported (the transaction will be rolled back). You might want to set upper limits on the resources available to ImageMagick (using environment variable) so all available RAM isn't gobbled up
- The speed of your disk is also important. Slow disk = slower image processing. Longer to read, longer to write, longer to create any interim temp files. If your disk is on the network then make sure it's on a high speed connection. You might want to consider a solid state drive for ImageMagick temporary files.
- CPU speed, number of cores, and number of threads allocated (for ImageMagick) must also be considered - especially from a global perspective, since multiple image commands can be running simultaneously.
- The location of ImageMagick temp files is important. ImageMagick commands produce temp files, the location of which can be controlled with the MAGICK_TEMPORARY_PATH environment variable. Set the path variable to a disk with lots of free space, preferably not the same disk as your OS and Nuxeo are on.
Each case is difference, but the end result for Nuxeo-ImageMagick import tuning: expect to spend extra time understanding your requirements, get ready to buy more memory, and be prepared to put in some effort to get everything running quickly!