Image retargetingSep 04, 2007 |
Popularity: 5% |
Note: See below for the updated version.
The video made it look amazingly simple (and actually explained the whole idea much better than most papers manage to do), so obviously I had to try my hands at it. After about three hours worth of coding I came up with my version (you can find the source code below!).
| Original image | Retargeted image | Retargeted image |
Notice how the guy’s face and the cloud stay the same even if everything else is stuffed in the smaller image area.
| Original image | Retargeted image |
Again, the higher contrast areas (i.e. the man and the dogs, black on white) are kept the same while the snowy area is made narrower.
| It’s a small world… |
|
I didn’t read the SIGGRAPH paper, so I don’t know what makes their algorithm work that well (or maybe they just chose the right images for the video). My program works as follows (when shrinking the image horizontally):
- For each column, traverse from top to bottom picking any of the three (or more) neighboring pixels below the current pixel
- Calculate the “penalty” or error, i.e. try to pick the neighboring pixel that is colored as similarly as possible compared to the one next to it (in the direction we want to shrink the image)
- From these paths, pick the path that has the lowest penalty and crop the pixels along the path, while moving the rows to the left, as you would delete characters in a text
- Repeat until the image width is what was requested
In all, this is very slow but it could be made faster (as in the video that shows realtime scaling) if the penalty or error values were precalculated for each pixel. The algorithm should also try to pick paths that are further apart, so it would remove pixels more evenly and it should backtrack when trying to find the optimal path. Now it just goes along a “wall”, i.e. a high-contrast area when it finds one - it should backtrack and try to find a path further away. Finally, there should be a feature that allowed the user to mark faces and other areas that should never be scaled.
To use the program, you need to run it from command line or drag a 24-bit BMP image on the icon. Resize the window to scale images. If you want to save the image, simply answer “yes” when exiting the program.
New version
retarget image.bmp 800 600
This will try to resize the image to 800×600 resolution. The new version is able to load JPEG, PNG, BMP and probably some other formats too (thanks to the SDL_image library). Note that it still will save as BMP, even if the extension is JPG or so.
retarget image.bmp 800 600 4
Now there will be twice as much blur as usually (default is 2).
retarget3.zip - the program with the source code (you need SDL and SDL_image)
Here’s the original version, it is still useful:
retarget2.zip - the program with the source code (you need SDL)










James says:
Sep 07, 2007 at 1:41 am
Good work. I was planning on coding this into a retargeting function for the PHP GD library this weekend when I had some time.
According to the video, if I recall correctly, they calculate the gradient of the image for their penalty function rather than picking a close color like yours. Though it appears to have similar results.
A first-order image gradient can be calculated by grayscaling the image, running a guassian smoothing filter ( http://en.wikipedia.org/wiki/Gaussian_blur ), then a sobel filter ( http://en.wikipedia.org/wiki/Sobel ) in both the vertical and horizontal directions. There are other schemes than this.
The edges in the images are said to be important so the optimization scheme takes a big hit if it has to pass through a high contrast edge.
What is not clear in that video is how they derived the subsequent energy function.
kometbomb says:
Sep 07, 2007 at 10:41 am
Thanks for the info! I think if someone in fact added the precalclation, this would indeed be the resulting method. It is quite clear from my example that we need a smoothing filter because even normal noise on a photo makes the pathfinding work much worse. Now it works best on maps and cartoons with very similar colors.
phi says:
Sep 07, 2007 at 8:37 pm
I bow my head before you - Congratulations !
I planned on doing it in java this weekend, and i will definetly take a look at your code.
Good Luck for your future
kometbomb says:
Sep 07, 2007 at 8:43 pm
Thanks, phi
Actually, you should try what James described above. I tried it today (again only a few more functions, we’re still in the “quite simple” territory), and it does work better with photos. Though it’s still far from the magic in the original video. Keep me posted on your code, I want to link to it.
codex says:
Sep 10, 2007 at 8:00 pm
Does your retarget program handle enlarging images? i.e. 640×480 -> 800×600?
kometbomb says:
Sep 10, 2007 at 8:15 pm
It only shrinks the image (removes rows/columns/paths of pixels) so far, but the reverse operation should be just as simple (just add pixels). In fact, I don’t know why I didn’t try it already. Even if it didn’t work as expected, it should look pretty interesting.
James says:
Sep 11, 2007 at 5:26 am
I never got around to coding it this weekend. It’s funny because I just got an email today from one of my instructors that stated:
…”I’ll start talking about dynamic programming next Tuesday. There is a really cool project I think we can do for that, based on a recent SIGGRAPH paper from last month.”
So I guess I ought to start on it.
Here’s the high quality video link that explains all of this (without a lot of technical details):
http://www.faculty.idc.ac.il/arik/IMRet-All.mov
And here is the paper:
http://www.faculty.idc.ac.il/arik/imret.pdf
In the paper, it explains how he derives the error map (after calculating the image gradient):
“The optimal seam can be found using dynamic programming. The first step is to traverse the image from the second row to the last row and compute the cumulative minimum energy M for all possible connected seams for each entry (i, j):
M(i, j) = e(i, j)+ min(M(i−1, j−1),M(i−1, j),M(i−1, j+1))
At the end of this process, the minimum value of the last row in M will indicate the end of the minimal connected vertical seam. Hence, in the second step we backtrack from this minimum entry on M to find the path of the optimal seam (see Figure 1). The definition of M for horizontal seams is similar.”
kometbomb says:
Sep 11, 2007 at 5:20 pm
Thanks for the links.
OK, in the second version of my algorithm there’s a fundamental difference: I use only one gradient map and the algorithm described by the paper uses two gradient maps depending of along which axis we want to shrink the image. A quick trial shows that simply adding that feature to my algorithm makes it work much better. However, I noticed it started to evenly squash images much more.
Will says:
Sep 29, 2007 at 8:29 am
Check out rsizr.com for a free Flash-based implementation of seam carving that lets you resize your own images, both in height and width simultaneously, in real time. (You can rescale and crop images too!)
http://rsizr.com/about/gallery/ for example images
Gabe Rudy says:
Nov 04, 2007 at 9:51 pm
I like how many implementations have sprung up around this. I did a Qt4 GUI based on Andy Owen’s library, which now uses CAIR (a multi-threaded C++ retargeting library).
So if you looking for a fast desktop version (for Mac,Windows or Linux) you might want to take a look at http://code.google.com/p/seam-carving-gui/
kometbomb says:
Nov 04, 2007 at 10:09 pm
I agree this one is a pretty cool programming puzzle for anyone interested, because it’s relatively simple to solve and also doesn’t have just one correct solution. And best of all, it produces interesting if not usable results even if the algorithm doesn’t work that well.
Craigb says:
Feb 28, 2008 at 10:07 pm
Hey I have a question. Could this be helpful for procedurally generating mipmapping textures for openGL? I know its very process intensive to create the images, but if it were done at loading time that would fix that problem. The question is do you think it would be worth it at all?
kometbomb says:
Feb 28, 2008 at 10:23 pm
Hmm, I don’t think it would be useful at all, considering a mipmap pixel essentially is an average of four pixels from the larger image. E.g. when applying trilinear texturing (i.e. you crossfade between mipmaps), the two pixels from the two different mipmap levels would not match at all.
I have to say I can’t imagine what it would look like in motion. Maybe it would look cool (especially, if you need to keep important details visible even if a polygon is very small on the screen). But it definitely wouldn’t look right.
If you decide to do this, it’s worth of telling how it went.
kometbomb says:
Feb 28, 2008 at 10:34 pm
Actually, now that I thought of this again, you could try this: When drawing a polygon, figure out the polygon dimensions on the screen. Retarget the texture to those dimensions, send the new texture to the hardware and draw the polygon using the new texture. The result should be something between perspective and retargeting.
My routine is not an implementation of the algorithm described in the paper. The original routine should be more than fast enough for real-time use.