15 Sep

C masochism, part 1

I like C. I write tools in C that sane people write in Ruby or whatever the trendy script-like loosely typed language of the week is. But this is not about which language to choose. This is about doing stuff in C just because a lot of people think it’s 1.) limited, 2.) hard and 3.) unnecessarily old to be used in the 2000s.

I like C. I like how a compiled executable is usually in the 10 kilobyte range for simple programs. I like how you don’t need anything besides the executable. I like how I can’t blame but myself if the program is too slow.

But a significant amount of the reasoning why I decide to use C over and over again is the challenge. I recently read a blog entry about how programming can ruin your life, which I was ready to judge as another lame blog post about how programming is wizardry and how programmers see everything differently (you know, a bit how in the end of Matrix (1999) Neo sees that green Matrix code everywhere). Though, I have to agree the post has a good point: I too tend to make even purely practical things challenging, e.g. by using vanilla C.

Back to the point. I was writing this tool for downloading files using curl (the command line tool, not the library). I also use FlashGot for Firefox which saves all the selected URLs in a text file and runs the program specifying the text file on the command line. Parsing the text file is simple enough in C:

int main(int argc,char **argv) {
  FILE *f=fopen(argv[1],"r");
  char line[100];

  while (fgets(line,99,f)) {
    download_url_with_curl(line);
  }

  fclose(f);
  return 0;
}

However, I don’t like how the program is dependent of the text file (note how at this point the additional challenges start to sneak in). The text file is open as long as the program keeps downloading the files and that can be a long while if the files are huge. Of course, the program should read the list in memory and close the file.

int main(int argc,char **argv) {
  FILE *f=fopen(argv[1],"r");
  char line[100][100];
  int num_lines=0,i;

  while (fgets(line[num_lines++],99,f)) {
    /* line read */
  }

  fclose(f);

  for(i=0;i<num_lines;i++) download_url_with_curl(line[i]);

  return 0;
}

Still quite simple, but there are two problems: What if there are more than 100 lines? What if the lines are longer than 100 characters (including the null terminator)? The array at least should be dynamic, i.e. it should be able to resize itself when lines are read. This is still quite simple but it starts to get annoying:

int main(int argc,char **argv) {
  FILE *f=fopen(argv[1],"r");
  char **lines=NULL,line[100];
  int num_lines=0,i;

  while (fgets(line,99,f)) {
    lines=realloc(lines,(num_lines+1)*sizeof(char*));
    lines[num_lines++]=strdup(line);
  }

  fclose(f);

  for(i=0;i<num_lines;i++) {
    download_url_with_curl(line[i]);
  }

  for(i=0;i<num_lines;i++) {
    free(lines[i]); // free the allocated strings
  }

  free(lines); // free the array
  return 0;
}

You have to reallocate memory a lot and you have to free the memory. Not perverted enough. It seems I have run out of real improvements to make and have to find something else challenging.

I figured I could use the stack to have a kind of dynamic array by using recursion: if every recursion allocates a bit of the stack to store one array item, it would automatically allocate and free the memory when the functions return:

void parse_line(FILE *f) {
  char *line=malloc(sizeof(char)*100);

  if (fgets(line,99,f)) {
    parse_line(f);
    download_url_with_curl(line);
  } else {
    fclose(f);
  }

  free(line);
}

int main(int argc,char **argv) {
  parse_line(fopen(argv[1],"r"));
  return 0;
}

This basically works as follows: the parse_file() function will recurse until all of the file is read, closes the file and then it starts to call the download_url_with_curl() function as every function returns (take note it will do everything backwards, which doesn’t matter when downloading files).

The stack will overflow at some point but it should handle ~50000 or whatever amount of recursions – there are only the saved registers and the single pointer to the string in the stack per array item. An acceptable tradeoff when you think of the code being actually about as complex as the first incarnation with the fixed array size. Obviously, the people who bitch and moan about pointers being evil and how C doesn’t have any dynamic conveniences haven’t ever considered this elegant method.