15 Sep

C masochism, part 1

I like C. I write tools in C that sane people write in Ruby or whatever the trendy loosely typed scripting language of the week is. But this is not about which language to choose. This is about doing stuff in C just because a lot of people think it’s 1) limited, 2) hard and 3) too old to be used in the 2000s.

I like C. I like how a compiled executable is usually in the 10 kilobyte range for simple programs. I like how you don’t need anything besides the executable. I like how I can’t blame anyone but myself if the program is too slow.

But a significant part of the reason I decide to use C over and over again is the challenge. I recently read a blog entry about how programming can ruin your life, which I was ready to dismiss as another lame blog post about how programming is wizardry and how programmers see everything differently (you know, a bit like how at the end of The Matrix (1999) Neo sees that green Matrix code everywhere). However, I have to admit the post has a good point: I too tend to make even purely practical things challenging, e.g. by using vanilla C.

Back to the point. I was writing this tool for downloading files using curl (the command-line tool, not the library). I also use FlashGot for Firefox, which saves all the selected URLs in a text file and runs the program with the text file given on the command line. Parsing the text file is simple enough in C:

#include <stdio.h>

void download_url_with_curl(const char *url); /* wrapper around the curl command, defined elsewhere */

int main(int argc,char **argv) {
  FILE *f=fopen(argv[1],"r");
  char line[100];

  while (fgets(line,99,f)) {
    download_url_with_curl(line);
  }

  fclose(f);
  return 0;
}

However, I don’t like how the program is dependent on the text file (note how at this point the additional challenges start to sneak in). The text file stays open as long as the program keeps downloading the files, and that can be a long while if the files are huge. Of course, the program should read the list into memory and close the file.

#include <stdio.h>

void download_url_with_curl(const char *url);

int main(int argc,char **argv) {
  FILE *f=fopen(argv[1],"r");
  char line[100][100];
  int num_lines=0,i;

  while (fgets(line[num_lines],99,f)) {
    num_lines++; /* count only the lines that were actually read */
  }

  fclose(f);

  for(i=0;i<num_lines;i++) download_url_with_curl(line[i]);

  return 0;
}

Still quite simple, but there are two problems: what if there are more than 100 lines? What if the lines are longer than 100 characters (including the null terminator)? At the very least the array should be dynamic, i.e. it should be able to resize itself as lines are read. This is still quite simple, but it starts to get annoying:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

void download_url_with_curl(const char *url);

int main(int argc,char **argv) {
  FILE *f=fopen(argv[1],"r");
  char **lines=NULL,line[100];
  int num_lines=0,i;

  while (fgets(line,99,f)) {
    lines=realloc(lines,(num_lines+1)*sizeof(char*));
    lines[num_lines++]=strdup(line);
  }

  fclose(f);

  for(i=0;i<num_lines;i++) {
    download_url_with_curl(lines[i]);
  }

  for(i=0;i<num_lines;i++) {
    free(lines[i]); // free the allocated strings
  }

  free(lines); // free the array
  return 0;
}

You have to reallocate memory a lot and you have to remember to free all of it. Not perverted enough. It seems I have run out of real improvements to make and have to find something else challenging.
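To be fair, you could at least cut down the number of realloc() calls by growing the array geometrically instead of one element per line, but the freeing and the rest of the bookkeeping would not go anywhere. A sketch of just the reading loop, with the rest of the program unchanged:

char **lines=NULL,line[100];
int num_lines=0,capacity=0;

while (fgets(line,99,f)) {
  if (num_lines==capacity) {
    capacity=capacity?capacity*2:16; /* double the capacity when full */
    lines=realloc(lines,capacity*sizeof(char*));
  }
  lines[num_lines++]=strdup(line);
}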

I figured I could use the stack as a kind of dynamic array by using recursion: if every recursive call allocates a bit of the stack to store one array item, the memory gets allocated and freed automatically as the functions return:

#include <stdio.h>
#include <stdlib.h>

void download_url_with_curl(const char *url);

void parse_line(FILE *f) {
  char *line=malloc(sizeof(char)*100);

  if (fgets(line,99,f)) {
    parse_line(f); /* recurse until the whole file has been read... */
    download_url_with_curl(line); /* ...then download on the way back up */
  } else {
    fclose(f);
  }

  free(line);
}

int main(int argc,char **argv) {
  parse_line(fopen(argv[1],"r"));
  return 0;
}

This basically works as follows: parse_line() recurses until all of the file has been read, closes the file, and then calls download_url_with_curl() as each invocation returns (note that it processes everything backwards, which doesn’t matter when downloading files).

The stack will overflow at some point, but it should handle ~50,000 or whatever levels of recursion – each array item costs only the saved registers and a single pointer to the string on the stack. An acceptable tradeoff when you consider that the code is about as complex as the first incarnation with the fixed-size array. Obviously, the people who bitch and moan about pointers being evil and how C doesn’t have any dynamic conveniences have never considered this elegant method.
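You could even go one step further and put the string data itself into the stack frame, so the malloc()/free() pair disappears too. Each level of recursion then eats roughly a hundred bytes of stack instead of a single pointer, so the recursion limit drops accordingly – just a sketch of the variation, not the version above:

void parse_line(FILE *f) {
  char line[100]; /* the buffer itself lives in this stack frame */

  if (fgets(line,99,f)) {
    parse_line(f);
    download_url_with_curl(line);
  } else {
    fclose(f);
  }
}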

07 Sep

Viewer2 Build 2840

A quick release for those who use Viewer2, with a lot of half-ready stuff this time:

  • Support for network paths (i.e. path starts with \\)
  • Option to flag an image as a favorite (the heart icon), search with is:favorite
  • Shows basic information about images with little icons (e.g. image has tags, image is favorite)
  • Better date: search option, see the manual
  • Should behave a little better on dual-screen boxes
  • Option for ordering images in a square instead of a spiral
  • BETABETABETA: The browser can display subdirectories (to check it out, set ShowDirectories to a non-zero value in the registry)

Coming up (hopefully in the next release) are IPTC tag export, comprehensive directory navigation via the main browser (instead of the file list on the left), path and search bookmarks (i.e. “show all of my favorite images from this month”) and a config tool.

Do send more suggestions.

Also, if someone who is a fan of this project could help with the config tool, it would be nice – it needs just a dialog that changes a few keys in the registry, and the main problem for me is that I don’t use Visual Studio or other IDEs that provide a dialog editor, so it’s kind of annoying to do by hand. It seems Viewer2 has hit a niche, so it would be a shame if the potential went to waste.
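For whoever wants to pick it up: writing the values themselves takes only a couple of Win32 calls, the dialog really is the only tedious part. Something along these lines – note that the Software\Viewer2 key path here is just a placeholder for illustration, check where the program actually keeps its settings:

#include <windows.h>

/* Sets a DWORD registry value under HKEY_CURRENT_USER. The key path is a
   placeholder; link against advapi32. */
int set_config_dword(const char *name, DWORD value) {
  HKEY key;
  if (RegCreateKeyExA(HKEY_CURRENT_USER, "Software\\Viewer2", 0, NULL, 0,
                      KEY_SET_VALUE, NULL, &key, NULL) != ERROR_SUCCESS)
    return 0;
  RegSetValueExA(key, name, 0, REG_DWORD, (const BYTE*)&value, sizeof(value));
  RegCloseKey(key);
  return 1;
}

/* e.g. set_config_dword("ShowDirectories", 1); */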

viewer2-installer-2840.exe

04 Sep

Image retargeting

Note: See below for the updated version.

I saw this video of a SIGGRAPH paper about image retargeting (high-res version here, read the paper here), that is, rescaling an image so that the algorithm keeps the interesting areas intact and doesn’t squash everything. It’s called seam carving in the paper.

The video made it look amazingly simple (and actually explained the whole idea much better than most papers manage to), so obviously I had to try my hand at it. After about three hours’ worth of coding I came up with my version (you can find the source code below!).

[Images: the original image and two retargeted versions (original, retargeted, e3.PNG)]

Notice how the guy’s face and the cloud stay the same even though everything else is squeezed into the smaller image area.

[Images: the original image (f1.JPG) and the retargeted image (f2.PNG)]

Again, the higher-contrast areas (i.e. the man and the dogs, black on white) are kept the same while the snowy area is made narrower.

[Images: map-world.gif and the retargeted w2.png – it’s a small world… ;)]

I didn’t read the SIGGRAPH paper, so I don’t know what makes their algorithm work that well (or maybe they just chose the right images for the video). My program works as follows (when shrinking the image horizontally):

  1. For each starting column, traverse from top to bottom, at each step picking one of the three (or more) neighboring pixels below the current pixel

  2. Calculate the “penalty” or error, i.e. try to pick the neighboring pixel whose color is as similar as possible to the pixel next to it (in the direction we want to shrink the image)

  3. From these paths, pick the one with the lowest total penalty and crop the pixels along it, shifting each row to the left as you would delete characters in a text

  4. Repeat until the image width is what was requested

All in all, this is very slow, but it could be made faster (as in the video, which shows realtime scaling) if the penalty or error values were precalculated for each pixel. The algorithm should also try to pick paths that are further apart, so it would remove pixels more evenly, and it should backtrack when trying to find the optimal path. Right now it just follows a “wall”, i.e. a high-contrast area, when it finds one – it should backtrack and try a path further away. Finally, there should be a feature that lets the user mark faces and other areas that should never be scaled.
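If the step-by-step description sounds vague, here is a minimal sketch of the greedy scheme for a single-channel (grayscale) image. The names and the flat img[y*w+x] layout are made up for illustration – this is not the exact code in the download below, which works on color SDL surfaces and adds a blur step:

/* Greedy removal of one vertical seam from a grayscale image stored as
   img[y*w + x]. Illustration only. */
#include <stdlib.h>

/* Penalty of a pixel: how different it is from its right-hand neighbor,
   i.e. the direction we are shrinking in. */
static int penalty(const unsigned char *img, int w, int x, int y) {
  int right = (x+1 < w) ? x+1 : x;
  return abs((int)img[y*w + x] - (int)img[y*w + right]);
}

/* Removes one seam by shifting the rows left; returns the new width. */
static int remove_one_seam(unsigned char *img, int w, int h) {
  int *seam = malloc(h*sizeof(int));
  int *best_seam = malloc(h*sizeof(int));
  int best_cost = -1, start, x, y;

  /* steps 1-2: walk down from every starting column, always stepping to the
     cheapest of the three pixels below, and sum the penalty along the path */
  for (start = 0; start < w; start++) {
    int cost = 0;
    x = start;
    for (y = 0; y < h; y++) {
      seam[y] = x;
      cost += penalty(img, w, x, y);
      if (y+1 < h) {
        int bestx = x, dx;
        for (dx = -1; dx <= 1; dx++) {
          int nx = x+dx;
          if (nx >= 0 && nx < w &&
              penalty(img, w, nx, y+1) < penalty(img, w, bestx, y+1))
            bestx = nx;
        }
        x = bestx;
      }
    }
    /* step 3: remember the cheapest path */
    if (best_cost < 0 || cost < best_cost) {
      best_cost = cost;
      for (y = 0; y < h; y++) best_seam[y] = seam[y];
    }
  }

  /* crop the pixels along the path, like deleting characters from a line */
  for (y = 0; y < h; y++)
    for (x = best_seam[y]; x < w-1; x++)
      img[y*w + x] = img[y*w + x + 1];

  free(seam);
  free(best_seam);
  return w-1;
}

Calling remove_one_seam() in a loop until the width matches the target is step 4; running the same thing on the transposed image shrinks the height instead.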

To use the program, run it from the command line or drag a 24-bit BMP image onto the icon. Resize the window to scale the image. If you want to save the image, simply answer “yes” when exiting the program.

New version

When using the new version, you can resize to a specific size (as requested) by running the program as follows:

retarget image.bmp 800 600

This will try to resize the image to 800×600 resolution. The new version can load JPEG, PNG, BMP and probably some other formats too (thanks to the SDL_image library). Note that it will still save the result as BMP, even if the extension says JPG or similar.
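In case you wonder why that is: SDL_image handles the reading, but plain SDL only knows how to write BMPs. The load/save plumbing looks roughly like this (a sketch, not the exact code in the zip):

#include <stdio.h>
#include "SDL.h"
#include "SDL_image.h"

int main(int argc, char **argv) {
  SDL_Surface *img;

  if (argc < 2) return 1;
  SDL_Init(SDL_INIT_VIDEO);

  img = IMG_Load(argv[1]);        /* JPEG, PNG, BMP, ... picked automatically */
  if (!img) {
    fprintf(stderr, "IMG_Load: %s\n", IMG_GetError());
    return 1;
  }

  /* ... retargeting would happen here ... */

  SDL_SaveBMP(img, "out.bmp");    /* SDL itself only writes BMP */
  SDL_FreeSurface(img);
  SDL_Quit();
  return 0;
}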

Use the left mouse button to mark areas such as faces, eyes and so on, and the right mouse button to mark areas that you want to remove. The middle mouse button erases the marks. To tweak the blur amount (less is better for cartoon-style images and maps, more for photos), run it like this:

retarget image.bmp 800 600 4

Now there will be twice as much blur as usual (the default is 2).

retarget3.zip – the program with the source code (you need SDL and SDL_image)

Here’s the original version; it is still useful:

retarget2.zip – the program with the source code (you need SDL)

03 Sep

StumbleUpon on StumbleUpon and other stuff off the top of my head

I’m quite sure everyone using StumbleUpon has noticed this and feels the same, but it really annoys me how you get tons and tons of stumbles about SU itself. I don’t think anyone who already uses SU needs a basic introduction, and I can’t figure out any good reason for people to keep thumbing those pages up. Which reminds me of a few other things about StumbleUpon that I think are not that great.

The whole concept of thumbing sites up or down is a bit of a confusing affair: often I want to see more pages that I disagree with (I like to amuse myself with that kind of material). So, do I thumb the page up and look like a creationist-racist-whatever to other people, or do I thumb it down and get fewer unintentionally hilarious pages? They should have two ratings per page, based on whether you agree with it and whether you took the time to read it (because, like, this may shock you, but SU is mainly for wasting your time).

One thing I think would be nice is if you could explicitly tell SU to send you pages with specified tags. Even more valuable would be the ability to tell it to never send you any pages with a certain tag. I’m subscribed to something that sends me tons of SEO tips (i.e. search engine optimization, the art of creating pages that get a lot of visits thanks to dubious tricks for getting the page to the top of Google results, instead of because the page kicks ass) that I don’t necessarily agree with or want to see more of. However, it is hard to stop receiving those pages and still get the other pages in the category. It would be very convenient to have a filter that blocks pages with a given tag. Maybe the filter should work a bit like searching the Web: you would have a search box on the SU toolbar, type in a query, and each click on the “Stumble!” button would give the next search result.

Which brings me to my last complaint: more tags, please. Even automated tagging would help a lot. After all, in the above case, pages about SEO would be easy to spot because they tend to contain that very term a lot. Or the term tower defense. God, I hate all those TD games. Also, if you are just submitting a new web proxy site, please die. Those can’t be that profitable.