Look smart on the Internet in three easy steps
This is the first part of an epic (two-part) series of articles about tiny intros; the next one will be about an actual intro.
I love the tiny graphical presentations called “intros” in the demoscene, and by tiny I mean 256 bytes tiny. Usually at this point people mention that a paragraph of text is well over 256 bytes’ worth of data. However, you should not think a computer needs the same amount of data as we do to store information such as “multiply variable X by variable Y”. That textual representation needs 34 bytes of storage, less if you write it using shorter words or simply as an equation. What a processor needs is something like 3 instructions, 2-4 bytes each.
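To make that concrete, here is a hedged sketch (the exact instructions and byte counts depend entirely on the compiler and the architecture):

/* this whole function compiles down to a handful of bytes of machine code;
   on x86-64, gcc -O2 typically emits something like
   "mov eax, edi / imul eax, esi / ret" */
int multiply(int x, int y) {
    return x * y;
}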
However, my point is that 256 bytes is ridiculously little for most people (even though you can hardly impress a mouth breather playing Halo 3 by stuffing amazing things into that space). At those sizes the source code is many times bigger than the resulting program. On the other hand, 256 bytes is plenty of space for those who are crazy enough to write such things. The reason executables tend to be much, much bigger than that is that there is a lot of stuff in them the programmer didn’t put there himself or herself.
An obvious source of bloat in an executable is the fact that programs are written in high-level languages such as C++ and then compiled into code the processor can run. Compilers generally do a good job of producing working executables with fast code, and they don’t even try to save space. Add tons of modern features – object polymorphism, runtime error checking et cetera – and you have your normal 100 kilobytes of code for the smallest program, one that doesn’t necessarily do anything but exit back to the desktop. On top of that, the executable file also houses information such as which libraries to use, the program icon and so on.
One way around the problem above – and it is a problem only if you for some reason have decided to spend your time doing something very few find nice – is to write the programs in more condensed languages. For example, C produces smaller code than C++ (thanks to the fact that there is minimal extra fluff in C), and writing everything in assembly language produces pretty much the smallest code it is possible to write, because assembly basically is the human-readable version of machine language. A reminder: those modern languages (and C isn’t even modern, as it was conceived in the 1970s) were created because writing complex things in assembly is not everyone’s cup of tea – which makes writing very tiny stuff in assembly not-everyone’s-cup-of-tea squared.
“rubba_b” by kometbomb…
… and the same for Windows – 254 bytes vs. 9278 bytes.
Another layer of arcane computing that makes a 256-byte executable possible is the use of old operating systems such as DOS. As you read above, the executable contains information about libraries and such, but back in the 80s hardly any shared libraries were used anyway. So an executable was pretty much just the compiled program code and that was it. No extra fluff. Using this old executable format and assembly language, the programmer essentially is the only factor that has any effect on the size of the executable.
Ironically, while old operating systems with no libraries provide no modern conveniences like fast 3D graphics – or a mentionable interface to anything at all – this is a blessing. You can access the hardware directly, which means a lot less code to do things like drawing a pixel on the screen. You don’t even have to set up those libraries, because there aren’t any. As an example, the same intro (albeit converted from assembly to C) is about 10 kilobytes in size on Windows XP, while the original DOS version is less than 256 bytes. And the Windows version would be even larger if it used the hardware to do the things it now does in software.
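To give a taste of what “access the hardware directly” means, here is a hedged sketch for a 16-bit DOS compiler such as Turbo C (MK_FP, int86 and getch come from its dos.h and conio.h; I am assuming mode 13h, the classic 320x200 256-color mode):

#include <dos.h>
#include <conio.h>

int main(void) {
    unsigned char far *vga = (unsigned char far *)MK_FP(0xA000, 0);
    union REGS r;
    /* ask the BIOS for video mode 13h: 320x200 in 256 colors */
    r.x.ax = 0x0013;
    int86(0x10, &r, &r);
    /* the frame buffer is plain memory at segment A000, so drawing
       a pixel is a single write -- no library calls involved */
    vga[160 + 100 * 320] = 15;
    getch(); /* wait for a key before exiting */
    return 0;
}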
Yet another reason why things like computer games, or even larger intros, are huge compared to the tiniest intros is that they come with a lot of precreated data like graphics and music. For 256-byte intros, even a precreated flight path for the camera is pretty much a no-go. Or a 3D object with, say, more than four faces. That means we have to use procedurally generated data, which is in vogue right now even outside size restrictions, with things like Spore.
Actually, in many new-school 256-byte intros the data is not even precalculated but simply calculated while the intro draws it on the screen. This saves the space you would need if you had both a drawing loop and a precalculation loop (remember, even the loop structure costs precious bytes) – and it works thanks to the fact that even slow things are fast on a relatively new computer.
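As a hedged sketch of the idea (the plasma-style formula and its constants are made up for illustration):

#include <math.h>

/* new school: each pixel value is computed the moment it is drawn, so a
   separate precalculation loop (and its lookup table) simply does not exist */
void draw_frame(unsigned char *screen, double t) {
    int x, y;
    for (y = 0; y < 200; y++)
        for (x = 0; x < 320; x++)
            screen[x + y * 320] = (unsigned char)
                (128.0 + 63.0 * sin(x * 0.08 + t) * sin(y * 0.11 + t));
}

int main(void) {
    static unsigned char screen[320 * 200];
    draw_frame(screen, 0.0); /* one frame; a real intro would loop and blit */
    return 0;
}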
Speaking of slow things, one of them is raytracing, which is purely mathematical, which means very little precreated material. Raytracing is also one way to produce realistic three-dimensional images, which, as a rule of thumb, look nice. So it’s no surprise that a modern 256-byte intro uses raytracing, or raycasting, to simulate nature, look amazing and still fit within the size limit. If it’s 256 bytes and 3D, it probably uses some variant of raycasting. Older intros were usually versions of familiar 2D effects such as the rotozoomer, the fire effect, the Mandelbrot set and other well-explored routines.
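For the curious, here is a hedged and decidedly non-256-byte sketch of the principle: one ray per pixel, a sphere defined by an equation instead of stored data, output as a plain PGM image on stdout.

#include <math.h>
#include <stdio.h>

int main(void) {
    int x, y;
    printf("P2\n64 64\n255\n"); /* ASCII grayscale PGM header */
    for (y = 0; y < 64; y++) {
        for (x = 0; x < 64; x++) {
            /* ray from the origin through this pixel, normalized */
            double dx = (x - 32) / 48.0, dy = (y - 32) / 48.0, dz = 1.0;
            double len = sqrt(dx * dx + dy * dy + dz * dz);
            double b, disc, shade = 0.0;
            dx /= len; dy /= len; dz /= len;
            /* unit sphere at (0,0,3): the ray/sphere intersection is
               t^2 - 2*b*t + 8 = 0 with b = 3*dz */
            b = 3.0 * dz;
            disc = b * b - 8.0;
            if (disc >= 0.0) {
                double t = b - sqrt(disc); /* nearest hit */
                /* surface normal of a unit sphere is (hit - center) */
                double nx = t * dx, ny = t * dy, nz = t * dz - 3.0;
                /* diffuse shading with a light from the upper left */
                shade = -0.5 * nx - 0.5 * ny - 0.7 * nz;
                if (shade < 0.0) shade = 0.0;
            }
            printf("%d ", (int)(shade * 255.0));
        }
        printf("\n");
    }
    return 0;
}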
Even among demo aficionados there is some controversy about whether the tiniest intros have any merit as art or whether they are simply a proof of concept of how small you can make something. There is no hidden vein of gold in there, as there is in nanotechnology. While most tiny intros are for the most part coder porn, nobody can deny their status as art installations. And if you stay in the 128-256 byte range, most of the popular tiny intros are quite pretty too, especially considering all the compromises that had to be made to make them fit.
The ultimate irony in downplaying the merits of the intros is that the demoscene was pretty much born from people pushing the machine to its limits. Nowadays, to create a great demo (think of it as a large intro with many parts), you have to push your artistic talent more than the limits of the machine; it’s more like creating a music video, albeit with your own tools. It seems the final limit is the man-made limit of the smallest size you can fit something into and still call it art with a clean conscience.
I like C. I write tools in C that sane people write in Ruby or whatever the trendy script-like loosely typed language of the week is. But this is not about which language to choose. This is about doing stuff in C just because a lot of people think it’s 1.) limited, 2.) hard and 3.) unnecessarily old to be used in the 2000s.
I like C. I like how a compiled executable is usually in the 10-kilobyte range for simple programs. I like how you don’t need anything besides the executable. I like how I can’t blame anyone but myself if the program is too slow.
But a significant part of the reason I decide to use C over and over again is the challenge. I recently read a blog entry about how programming can ruin your life, which I was ready to dismiss as another lame blog post about how programming is wizardry and how programmers see everything differently (you know, a bit like how at the end of The Matrix (1999) Neo sees the green Matrix code everywhere). Still, I have to agree the post has a good point: I too tend to make even purely practical things challenging, e.g. by using vanilla C.
Back to the point. I was writing this tool for downloading files using curl (the command-line tool, not the library). I use FlashGot for Firefox, which saves all the selected URLs in a text file and runs the program with the text file specified on the command line. Parsing the text file is simple enough in C:
#include <stdio.h>

void download_url_with_curl(const char *url); /* the part that shells out to curl */

int main(int argc,char **argv) {
    FILE *f=fopen(argv[1],"r");
    char line[100];
    while (fgets(line,sizeof line,f)) {
        download_url_with_curl(line);
    }
    fclose(f);
    return 0;
}
However, I don’t like how the program is dependent on the text file (note how at this point the additional challenges start to sneak in). The text file stays open as long as the program keeps downloading the files, and that can be a long while if the files are huge. Of course, the program should read the list into memory and close the file.
#include <stdio.h>

void download_url_with_curl(const char *url); /* the part that shells out to curl */

int main(int argc,char **argv) {
    FILE *f=fopen(argv[1],"r");
    char line[100][100];
    int num_lines=0,i;
    while (num_lines<100 && fgets(line[num_lines],sizeof line[num_lines],f)) {
        num_lines++; /* count the line only after it has actually been read */
    }
    fclose(f);
    for(i=0;i<num_lines;i++) download_url_with_curl(line[i]);
    return 0;
}
Still quite simple, but there are two problems: What if there are more than 100 lines? What if the lines are longer than 100 characters (including the null terminator)? The array at least should be dynamic, i.e. it should be able to resize itself when lines are read. This is still quite simple but it starts to get annoying:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

void download_url_with_curl(const char *url); /* the part that shells out to curl */

int main(int argc,char **argv) {
    FILE *f=fopen(argv[1],"r");
    char **lines=NULL,line[100];
    int num_lines=0,i;
    while (fgets(line,sizeof line,f)) {
        /* grow the pointer array by one and store a copy of the line */
        lines=realloc(lines,(num_lines+1)*sizeof(char*));
        lines[num_lines++]=strdup(line);
    }
    fclose(f);
    for(i=0;i<num_lines;i++) {
        download_url_with_curl(lines[i]);
    }
    for(i=0;i<num_lines;i++) {
        free(lines[i]); /* free the duplicated strings */
    }
    free(lines); /* free the pointer array itself */
    return 0;
}
You have to reallocate memory a lot and you have to free the memory. Not perverted enough. It seems I have run out of real improvements to make and have to find something else challenging.
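For the record, the reallocation churn itself has a standard cure: grow the array geometrically so realloc runs a logarithmic number of times instead of once per line. A hedged sketch of the bookkeeping, with the append_line helper being my own invention:

#include <stdlib.h>
#include <string.h>

/* append a copy of s to a growing array, doubling the capacity as needed */
char **append_line(char **lines, int *num, int *cap, const char *s) {
    if (*num == *cap) {
        *cap = *cap ? *cap * 2 : 16;
        lines = realloc(lines, *cap * sizeof(char *));
    }
    lines[(*num)++] = strdup(s);
    return lines;
}

It’s faster, but it’s still the same old malloc-and-free bookkeeping, so it doesn’t make the problem any more interesting.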
I figured I could use the stack to have a kind of dynamic array by using recursion: if every recursion allocates a bit of the stack to store one array item, it would automatically allocate and free the memory when the functions return:
#include <stdio.h>
#include <stdlib.h>

void download_url_with_curl(const char *url); /* the part that shells out to curl */

void parse_line(FILE *f) {
    char *line=malloc(100);
    if (fgets(line,100,f)) {
        parse_line(f);                /* recurse until the whole file is read */
        download_url_with_curl(line); /* then download on the way back up */
    } else {
        fclose(f);
    }
    free(line);
}

int main(int argc,char **argv) {
    parse_line(fopen(argv[1],"r"));
    return 0;
}
This basically works as follows: the parse_line() function recurses until the whole file is read, closes the file, and then calls download_url_with_curl() as each invocation returns (take note it does everything backwards, which doesn’t matter when downloading files).
The stack will overflow at some point, but it should handle ~50000 or so recursions – there are only the saved registers and a single pointer to the string on the stack per array item. An acceptable tradeoff when you consider that the code is actually about as complex as the first incarnation with the fixed array size. Obviously, the people who bitch and moan about pointers being evil and about how C doesn’t have any dynamic conveniences have never considered this elegant method.
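If you’d rather measure than trust my estimate, a throwaway probe like this (compile without optimization, or the compiler may turn the recursion into a plain loop) shows how deep your stack actually goes:

#include <stdio.h>

/* recurses until the stack runs out -- crashing is the whole point here */
void probe(int depth) {
    char local;
    if (depth % 10000 == 0)
        printf("%d frames, stack near %p\n", depth, (void *)&local);
    probe(depth + 1);
    local = 0; /* touch a local after the call so it is not a tail call */
}

int main(void) {
    probe(0);
    return 0;
}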
I’m quite sure everyone using StumbleUpon has noticed this and feels the same, but it really annoys me how you get tons and tons of stumbles about SU itself. I don’t think anyone who uses SU needs a basic introduction to it, and I can’t figure out any good reason for people to keep thumbing those pages up. Which reminds me of a few things about StumbleUpon that I think are not that great.
The whole concept of thumbing sites up or down is a bit of a confusing affair: often I want to see more pages that I disagree with (I like to amuse myself with that kind of material). So, do I thumb the page up and look like a creationist-racist-whatever to other people, or do I thumb it down and get fewer unintentionally hilarious pages? There should be two ratings for a page: whether you agree with it, and whether you took the time to read it (because, like, this may shock you, but SU is mainly for wasting your time).
One thing I think would be nice is if you could explicitly tell SU to send you pages with specified tags. Even more valuable would be being able to tell it to never send you any pages with some tag. I’m subscribed to something that sends me tons of SEO tips (search engine optimization, i.e. the art of getting a page lots of visits through dubious means of ranking first on Google instead of the page actually kicking ass) that I don’t necessarily agree with, nor want to see more of. However, it is hard to stop receiving those pages and still get the other pages in the category; a filter that weeded out pages with a certain tag would be very convenient. Maybe the filter should work a bit like searching the Web: you would type a query into a search box on the SU toolbar, and each click on the “Stumble!” button would give the next search result.
Which brings me to my last complaint: more tags, please. Even automated tagging would help a lot. After all, in the above case pages about SEO would be easy to spot because they tend to contain that very term a lot. Or the term “tower defense”. God, I hate all those TD games. Also, if you are just submitting a new web proxy site, please die. Those can’t be that profitable.