22 Oct

There’s Plenty of Room at the Bottom

This is the first part of an epic (two-part) series of articles about tiny intros; the next one will be about an actual intro.

I love the tiny graphical presentations called “intros” in the demoscene, and by tiny I mean 256 bytes tiny. Usually, at this point people mention that a paragraph of text is well over 256 bytes’ worth of data. However, you should not think that a computer needs the same amount of data to store information such as “multiply variable X by variable Y” as we do. That textual representation needs 34 bytes of storage, less if you write it using shorter words or simply as an equation. What a processor needs is something like three instructions, 2-4 bytes each.
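
Just to make that concrete, here is a rough sketch of my own (not taken from any particular compiler, and the byte counts are approximate) of what that sentence looks like once it’s down at the machine level on a 16-bit x86:

/* the whole "multiply variable X by variable Y" boils down to roughly this */
short x, y;

void multiply(void) {
  x *= y;   /* a 16-bit x86 compiler might emit something like:
               mov  ax, [x]    ; 3 bytes
               imul word [y]   ; 4 bytes
               mov  [x], ax    ; 3 bytes
               ...about 10 bytes in total */
}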

However, my point is that 256 bytes sounds ridiculously small to most people (even though you can hardly impress a mouth breather playing Halo 3 by stuffing amazing things in that space). At those sizes the source code is many times bigger than the resulting program. On the other hand, 256 bytes is plenty of space for those who are crazy enough to write such things. The reason executables tend to be much, much bigger than that is that there is a lot of stuff in there that the programmer didn’t put there himself or herself.

The Code

An obvious source of bloat in an executable is the fact that programs are written in high-level languages such as C++ and then compiled into code that the processor can run. Compilers generally do a good job of producing working executables with fast code, but they don’t even try to save space. Add tons of modern features – object polymorphism, runtime error checking et cetera – into that and you have your normal 100 kilobytes of code for the smallest program, which doesn’t necessarily even do anything but exit back to the desktop. On top of that, the executable file also houses information such as which libraries to use, the program icon and so on.

One way around the problem above – and it is a problem only if you for some reason have decided to spend your time doing something very few people find nice – is to write the programs in more condensed languages. For example, C produces smaller code than C++ (thanks to the fact that there is minimal extra fluff in C), and writing everything in assembly language produces pretty much the smallest code it is possible to write, because assembly basically is the human-readable version of machine language. A reminder: those modern languages (and C isn’t even modern, as it was conceived in the 1970s) were created because writing complex things in assembly is not everyone’s cup of tea – which makes writing very tiny stuff in assembly not-everyone’s-cup-of-tea squared.

“rubba_b” by kometbomb…

… and the same for Windows – 254 bytes vs. 9278 bytes.

Another layer of arcane computing that makes a 256-byte executable possible is the use of old operating systems such as DOS. As you read above, the executable contains information about libraries and such, but back in the 80s hardly any shared libraries were used at all anyway. So an executable pretty much was only the compiled program code and that was it. No extra fluff. Using this old executable format and assembly language, the programmer essentially is the only factor that has any effect on the size of the executable.

Ironically, while old operating systems with no libraries do not provide any modern conveniences like fast 3D graphics or a mentionable interface to anything at all, this is a blessing. You can access the hardware directly, which means a lot less code to do things like drawing a pixel on the screen. You don’t even have to set up those libraries, because there aren’t any. As an example, the same intro (albeit converted from assembly to C code) is about 10 kilobytes in size on Windows XP, while the original version for DOS is less than 256 bytes. And it would be even larger on Windows if it used the hardware to do the things it does in software.
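
To give an idea of what accessing the hardware directly looks like, here’s a minimal sketch of my own (in Turbo C-style 16-bit C rather than assembly, so the far pointer and the dos.h calls are compiler-specific): it sets the classic VGA mode 13h through a BIOS interrupt and plots a pixel by writing straight into video memory, with no libraries to set up at all.

#include <dos.h>   /* int86(), MK_FP() -- Borland/Turbo C specific */
#include <conio.h> /* kbhit() */

int main(void) {
  /* the 320x200, 256-colour screen lives at segment 0xA000 */
  unsigned char far *vga=(unsigned char far *)MK_FP(0xA000,0);
  union REGS r;

  r.x.ax=0x0013;            /* BIOS interrupt 0x10: set video mode 13h */
  int86(0x10,&r,&r);

  vga[100*320+160]=15;      /* plot a white pixel in the middle of the screen */

  while (!kbhit());         /* wait for a key so the pixel can be admired */
  return 0;
}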

The Data

Yet another reason why things like computer games or even larger intros are huge compared to the tiniest intros is that they come with a lot of precreated data like graphics and music. For 256-byte intros even a precreated flight path for the camera is pretty much a no-go. Or a 3D object with, say, more than four faces. That means we have to use procedurally generated data, which is in vogue right now even outside size restrictions, with things like Spore.

Actually, in many new-school 256-byte intros the data is not even precalculated but simply calculated while the intro draws it on the screen. This saves the space you would need if you had both a drawing loop and a precalculation loop (remember, even the loop structure uses precious bytes) – and it works thanks to the fact that even slow things are fast on a relatively new computer.
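
As a sketch of that idea (my own illustration, not lifted from any real intro), the loop below computes a plasma-style pattern straight inside the drawing loop, every frame, instead of reading it from a precalculated table – wasteful in CPU time, but there is only one loop to pay for in bytes:

#include <math.h>

/* fill a 320x200, one-byte-per-pixel buffer for frame number t */
void draw_frame(unsigned char *screen,int t) {
  int x,y;

  for(y=0;y<200;y++)
    for(x=0;x<320;x++)
      /* the "data" is just math evaluated on the spot, pixel by pixel */
      screen[y*320+x]=(unsigned char)
        (128+127*sin(x*0.06+t*0.1)*cos(y*0.08+t*0.07));
}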

Speaking of slow things, one of them is raytracing, which is purely mathematical and therefore needs very little precreated material. Raytracing also is one way to produce realistic three-dimensional images, which, as a rule of thumb, look nice. So it’s no surprise that a modern 256-byte intro uses raytracing, or raycasting, to simulate nature, look amazing and still fit in the specified size limit. If it’s 256 bytes and 3D, it probably uses some blend of raycasting. Older intros usually were variations of familiar 2D effects such as the rotozoomer, the fire effect, the Mandelbrot set and other well-explored routines.
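
For a taste of why raycasting and tiny sizes go together so well, here is a heavily simplified sketch of my own (not from any real 256-byte intro): the entire “scene” is a single sphere described by an equation, so every pixel costs a few lines of math and no stored data whatsoever.

#include <math.h>

/* brightness (0..255) of the pixel at (x,y) on a 320x200 screen */
unsigned char trace_pixel(int x,int y) {
  /* ray from the origin through the pixel, looking down the z axis */
  double dx=(x-160)/160.0,dy=(y-100)/100.0,dz=1.0;
  double len=sqrt(dx*dx+dy*dy+dz*dz);

  /* unit sphere centred at (0,0,3): solve |t*d - c|^2 = 1 for t */
  double b=3.0*dz/len;      /* d.c for the normalized ray direction */
  double disc=b*b-8.0;      /* b^2 - (|c|^2 - r^2) = b^2 - (9 - 1) */

  if (disc<0.0)
    return 0;               /* the ray misses the sphere: black background */

  /* shade by the distance to the nearest hit so the sphere looks round */
  return (unsigned char)(255.0*(1.0-(b-sqrt(disc)-2.0)));
}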

The Art

Even among demo aficionados there is some controversy over whether the tiniest intros have any merit as art or whether they simply are a proof of concept of how small you can make something. There is no hidden vein of gold in there, as there is in nanotechnology. While most tiny intros are for the most part coder porn, nobody can deny their status as art installations. And if you stay in the 128-256 byte range, most of the popular tiny intros are quite pretty too, especially considering all the compromises that had to be made to make them fit.

The ultimate irony in downplaying the merits of the intros is that the demoscene pretty much was born from people pushing the machine to its limits. Nowadays, to create a great demo (think of it as a large intro with many parts) you have to push your artistic talent more than the limits of the machine. It’s more like creating a music video, though with your own tools. It seems the final limit is the man-made one: the smallest size you can fit something in and still call it art with a clean conscience.

17 Oct

Viewer2 Build 3664

  • PNG support (i.e. I rewrote the loader code so it’s easier to add more formats – contact me if you are willing to write a plug-in)
  • Image details when hovering over a thumbnail
  • Progress dialogs for various actions
  • Added a white border around thumbnails, seems to be in vogue (you can disable it in the config)
  • When viewing images full-screen, the background will fade into black

03 Oct

Viewer2 Build 3321

  • First of all, more eye candy (see the video). I hope it also makes things easier to understand with respect to where images go when sorting
  • The searching has been split in two: Fast Search and Deep Search. Fast Search uses indexing to find files very fast, and Deep Search is the old search, which is slow. In addition, Fast Search can do fuzzy searching thanks to the individual words it indexes, and it also looks in the IPTC metadata, which Deep Search doesn’t do (because Deep Search doesn’t use pre-indexed data). Note: you need to build the indexes to benefit from Fast Search; the installer will do this for you. You need to do this only once, as long as you have AutoExtractWords enabled in the config.
  • Use Ctrl + Delete to physically delete images (another requested feature, keep them coming)
  • The thumbnails now have a tag that changes color according to whether they have tags and whether the tags make the image stand out from other images. Useful for obsessive tagging
  • The image view now zooms into the selected area; use left drag to mark the area
  • Tons of little fixes I forgot as soon as I implemented them, including a program icon

viewer2-installer-3322.exe (bugfix to 3321)

15 Sep

C masochism, part 1

I like C. I write tools in C that sane people write in Ruby or whatever the trendy script-like, loosely typed language of the week is. But this is not about which language to choose. This is about doing stuff in C just because a lot of people think it’s 1) limited, 2) hard and 3) too old to be used in the 2000s.

I like C. I like how a compiled executable is usually in the 10 kilobyte range for simple programs. I like how you don’t need anything besides the executable. I like how I can’t blame anyone but myself if the program is too slow.

But a significant part of the reason why I decide to use C over and over again is the challenge. I recently read a blog entry about how programming can ruin your life, which I was ready to dismiss as another lame blog post about how programming is wizardry and how programmers see everything differently (you know, a bit like how at the end of The Matrix (1999) Neo sees that green Matrix code everywhere). However, I have to agree the post has a good point: I too tend to make even purely practical things challenging, e.g. by using vanilla C.

Back to the point. I was writing a tool for downloading files using curl (the command line tool, not the library). I also use FlashGot for Firefox, which saves all the selected URLs in a text file and runs the program, passing the text file on the command line. Parsing the text file is simple enough in C:

#include <stdio.h>

/* defined elsewhere: runs curl to download a single URL */
void download_url_with_curl(const char *url);

int main(int argc,char **argv) {
  FILE *f=fopen(argv[1],"r");
  char line[100];

  /* read the file line by line and download each URL as it is read */
  while (fgets(line,99,f)) {
    download_url_with_curl(line);
  }

  fclose(f);
  return 0;
}

However, I don’t like how the program is dependent on the text file (note how at this point the additional challenges start to sneak in). The text file stays open as long as the program keeps downloading the files, and that can be a long while if the files are huge. Of course, the program should read the list into memory and close the file.

#include <stdio.h>

int main(int argc,char **argv) {
  FILE *f=fopen(argv[1],"r");
  char line[100][100];
  int num_lines=0,i;

  /* read the whole list into memory first... */
  while (fgets(line[num_lines],99,f)) {
    num_lines++; /* count a line only after a successful read */
  }

  fclose(f);

  /* ...then download, with the file already closed */
  for(i=0;i<num_lines;i++) download_url_with_curl(line[i]);

  return 0;
}

Still quite simple, but there are two problems: what if there are more than 100 lines? What if the lines are longer than 100 characters (including the null terminator)? At the very least, the array should be dynamic, i.e. it should be able to resize itself as lines are read. This is still quite simple, but it starts to get annoying:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc,char **argv) {
  FILE *f=fopen(argv[1],"r");
  char **lines=NULL,line[100];
  int num_lines=0,i;

  while (fgets(line,99,f)) {
    /* grow the pointer array by one and copy the line into fresh memory */
    lines=realloc(lines,(num_lines+1)*sizeof(char*));
    lines[num_lines++]=strdup(line);
  }

  fclose(f);

  for(i=0;i<num_lines;i++) {
    download_url_with_curl(lines[i]);
  }

  for(i=0;i<num_lines;i++) {
    free(lines[i]); // free the allocated strings
  }

  free(lines); // free the array
  return 0;
}

You have to reallocate memory a lot and you have to free the memory. Not perverted enough. It seems I have run out of real improvements to make and have to find something else challenging.

I figured I could use the stack as a kind of dynamic array by using recursion: if every level of recursion allocates a bit of the stack to store one array item, the memory is automatically allocated and freed as the functions return:

#include <stdio.h>
#include <stdlib.h>

void parse_line(FILE *f) {
  char *line=malloc(sizeof(char)*100);

  if (fgets(line,99,f)) {
    parse_line(f);               /* recurse first, download on the way back */
    download_url_with_curl(line);
  } else {
    fclose(f);                   /* the deepest call hits EOF and closes the file */
  }

  free(line);
}

int main(int argc,char **argv) {
  parse_line(fopen(argv[1],"r"));
  return 0;
}

This basically works as follows: the parse_line() function recurses until all of the file is read and the file is closed, and then it starts to call the download_url_with_curl() function as each call returns (note that it does everything backwards, which doesn’t matter when downloading files).

The stack will overflow at some point, but it should handle ~50000 or however many recursions – there are only the saved registers and a single pointer to the string on the stack per array item. An acceptable tradeoff when you consider that the code is actually about as complex as the first incarnation with the fixed array size. Obviously, the people who bitch and moan about pointers being evil and how C doesn’t have any dynamic conveniences haven’t ever considered this elegant method.