02 Sep

I came up with a neat way to retarget images using a mesh that is transformed by rotating it and applying an orthographic (non-perspective) projection. This is generally quite interesting since it relies only on a mesh and simple transformations, so it can be done almost entirely on the GPU. Even the mesh can be avoided if one uses a height map, à la parallax mapping, to alter the texture coordinates, so just one quad needs to be drawn (with a suitable fragment shader, of course).

The idea is simply to put areas of the image on a slope, depending on how much each area should be resized when retargeting. The slope angle depends on the angle from which the source image is viewed to get the retargeting effect, since the idea is to use the slope to cancel out the viewing angle.

Here’s a more detailed explanation:

1. Create an energy map of the source image; areas of interest have high energy

2. Traverse the energy map horizontally, adding the energy value of the current pixel to the accumulated sum from the previous pixel

3. Repeat the previous step vertically using the accumulated map from the previous step. The accumulated energy map now “grows” from the upper left corner to the lower right corner. You may need a lot of precision for the map

4. Create a mesh with the x and y coordinates of each vertex encoding the coordinates of the source image (and thus also the texture coordinates) and the z coordinate encoding the accumulated energy. The idea is to have all areas of interest at a steep slope and other areas with little or no slope

5. Draw the mesh with orthographic projection, using depth testing and textured with the source image

6. Rotate the mesh around the Y axis to retarget the image horizontally and around the X axis to retarget it vertically
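The accumulation and mesh-building steps (1–4) can be sketched roughly like this. This is a NumPy sketch under my own assumptions: the gradient-magnitude energy function and the `z_scale` parameter are placeholder choices, since the text leaves the energy function open.

```python
# Sketch of steps 1-4, assuming a grayscale image as a NumPy array.
# Gradient magnitude stands in for the (unspecified) energy function.
import numpy as np

def build_mesh(img, z_scale=1.0):
    # Step 1: energy map -- here simply the gradient magnitude.
    gy, gx = np.gradient(img.astype(float))
    energy = np.hypot(gx, gy)

    # Steps 2-3: accumulate horizontally, then vertically. The result
    # grows from the upper-left corner toward the lower-right corner,
    # so float64 is used to keep enough precision.
    acc = np.cumsum(np.cumsum(energy, axis=1), axis=0)

    # Step 4: one vertex per pixel; x and y double as the texture
    # coordinates, z encodes the (normalized) accumulated energy.
    h, w = img.shape
    ys, xs = np.mgrid[0:h, 0:w]
    vertices = np.stack([xs, ys, z_scale * acc / acc.max()], axis=-1)
    return vertices

img = np.zeros((4, 4))
img[:, 2] = 1.0                      # a vertical edge = area of interest
verts = build_mesh(img)
```

Steps 5–6 would then hand `vertices` to the GPU as a regular grid mesh; only the rotation matrix changes per frame.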

Here is a one-dimensional example (sorry for the awful images):

Source image

The red dots represent areas of interest, such as sharp edges that we don’t want to resize as much as we want to resize the areas between the details. We then elevate our line for every red dot:

Elevated mesh

Imagine the above example as something you would do for every row and column of a two-dimensional image. Now, when the viewer views the mesh (which is drawn without perspective) he or she sees the original image:

Viewing the mesh from zero angle

However, if the viewing angle is changed, the red dots don’t move in relation to each other as much as the unelevated areas do when they are projected on the view plane. Consider the example below:

Viewing the mesh from an angle (gray line is the projected mesh)

Note how the unelevated line segments seem shorter from the viewer’s perspective, while the distance between the red dots stays closer to the original. The blue dots in the image above show how areas that have little energy, and so are not on a slope, move more than the red dots.
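The foreshortening argument can be checked numerically. Assuming a rotation around the Y axis followed by orthographic projection, only the rotated x coordinate survives; the sign convention below is my own choice, picked so that the slope counteracts the shrink:

```python
# Toy check of the 1-D argument: a flat segment shrinks by cos(angle),
# while a sloped segment's projected length stays closer to the original.
import math

def project_x(x, z, angle):
    # Orthographic projection keeps only the rotated x coordinate.
    return x * math.cos(angle) + z * math.sin(angle)

angle = math.radians(20)

# Flat segment (no slope): both endpoints at z = 0, span of 10.
flat = project_x(10, 0, angle) - project_x(0, 0, angle)

# Steep segment ("red dots"): z rises by 3 over the same span of 10.
steep = project_x(10, 3, angle) - project_x(0, 0, angle)
```

Here `flat` comes out around 9.4 while `steep` lands nearer to the original length of 10, which is exactly the effect the gray line in the figure shows.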

04 Sep

I saw this video of a SIGGRAPH paper about image retargeting (high res version here, read the paper here), that is, rescaling an image so that the algorithm keeps the interesting areas intact instead of squashing everything. It’s called seam carving in the paper.

The video made it look amazingly simple (and actually explained the whole idea much better than most papers manage to do), so obviously I had to try my hand at it. After about three hours’ worth of coding I came up with my version (you can find the source code below!).

 Original image Retargeted image Retargeted image

Notice how the guy’s face and the cloud stay the same even if everything else is stuffed in the smaller image area.

 Original image Retargeted image

Again, the higher contrast areas (i.e. the man and the dogs, black on white) are kept the same while the snowy area is made narrower.

 It’s a small world… ;)

I didn’t read the SIGGRAPH paper, so I don’t know what makes their algorithm work that well (or maybe they just chose the right images for the video). My program works as follows (when shrinking the image horizontally):

1. For each column, traverse from top to bottom picking any of the three (or more) neighboring pixels below the current pixel

2. Calculate the “penalty” or error, i.e. try to pick the neighboring pixel that is colored as similarly as possible compared to the one next to it (in the direction we want to shrink the image)

3. From these paths, pick the one that has the lowest penalty and crop the pixels along it, shifting the remaining pixels in each row to the left, as you would when deleting characters from a line of text

4. Repeat until the image width is what was requested

In all, this is very slow, but it could be made faster (as in the video, which shows real-time scaling) if the penalty or error values were precalculated for each pixel. The algorithm should also try to pick paths that are further apart, so it would remove pixels more evenly, and it should backtrack when trying to find the optimal path. Right now it just follows a “wall”, i.e. a high-contrast area, when it finds one; it should backtrack and try a path further away. Finally, there should be a feature that allows the user to mark faces and other areas that should never be scaled.
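For comparison, the precalculation and backtracking described above come out naturally from dynamic programming: compute, row by row, the minimum cumulative penalty of any path ending at each pixel, then walk back up from the cheapest bottom pixel. This is a sketch of that idea, not the paper’s exact formulation:

```python
# Dynamic-programming seam search over a precomputed penalty map.
import numpy as np

def optimal_seam(penalty):
    h, w = penalty.shape
    cost = penalty.astype(float).copy()
    for y in range(1, h):
        # Cheapest of the three reachable pixels in the row above;
        # inf pads the borders so the window never wraps around.
        left = np.concatenate(([np.inf], cost[y - 1, :-1]))
        right = np.concatenate((cost[y - 1, 1:], [np.inf]))
        cost[y] += np.minimum(np.minimum(left, cost[y - 1]), right)

    # Backtrack from the cheapest bottom pixel.
    seam = [int(np.argmin(cost[-1]))]
    for y in range(h - 2, -1, -1):
        x = seam[-1]
        lo, hi = max(0, x - 1), min(w, x + 2)
        seam.append(lo + int(np.argmin(cost[y, lo:hi])))
    return seam[::-1]               # top-to-bottom column indices
```

Because the whole cost map is filled in before any path is chosen, the search never gets stuck following a high-contrast “wall”: the globally cheapest seam wins.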

To use the program, you need to run it from the command line or drag a 24-bit BMP image onto the icon. Resize the window to scale images. If you want to save the image, simply answer “yes” when exiting the program.

### New version

When using the new version, you can resize to a specific size (as requested) by running the program as follows:

`retarget image.bmp 800 600`

This will try to resize the image to 800×600 resolution. The new version is able to load JPEG, PNG, BMP and probably some other formats too (thanks to the SDL_image library). Note that it will still save as BMP, even if the extension is .jpg or similar.

Use the left mouse button to mark areas such as faces, eyes and so on, and the right mouse button to mark areas that you want to remove. The middle mouse button erases the marks. To tweak the blur amount (less is better for cartoon-style images and maps, the opposite for photos), run it like this:

`retarget image.bmp 800 600 4`

This applies twice as much blur as usual (the default is 2).

retarget3.zip – the program with the source code (you need SDL and SDL_image)

Here’s the original version, it is still useful:

retarget2.zip – the program with the source code (you need SDL)