Pretty Pictures via Genetic Algorithms

Deadlines

Tuesday, 4 AprilProject out
Tuesday, 11 April, 11:59pmSpecs due
Tuesday, 18 April, 11:59pmPrototype due
Friday, 28 April, 11:59pmProject due (last day of classes)

How you'll work

Unlike previous projects, this one relaxes the constraints. You can choose your own partner. (You must have a partner, and we will allow one group of three, if necessary.) You can do pair programming. Or not. You can do unit testing. Or not. As part of your spec, we expect you to describe how you plan to go about designing, implementing, and testing your project. Of course, we encourage you to do it the way we've required you to do it in the past. Note: just because we're relaxing the requirements, doesn't mean we're allowing you to be lax in the work you do. We're just giving those of you who think you have better ideas the chance to give them a test drive. You'll still be graded on the same scoring sheet as before. (So, where we require "unit testing", you're now required to have a good testing strategy. Likewise, your specs and implementation must still stay in sync.)

Partners and Graders

Update: Groups are here. Grading teams are still TBD.

The Project

The purpose of the assignment is to automatically create pretty pictures by using genetic algorithms. This may sound a little strange, but the results can be quite spectacular. Here's the general idea: a picture is really a function that maps points (x,y) to colors (r,g,b). You evaluate this function once for each pixel on the screen and store the results. So far so good, but unlike traditional computer graphics, we will be breeding these functions with each other, and allowing the user to select which pictures from the current generation should be bred into the next generation.

This assignment is based on a paper written in 1991 by Karl Sims, Artificial Evolution for Computer Graphics. Scroll down to the bottom of his Web page and check out the pretty pictures. For a whizzy Web implementation of the same thing, check out The Gallery of Random Art. You should consider the Sims paper to be a part of your assignment specification.

Breeding functions?

It's not as crazy as it sounds. While there are a number of techniques available, you will use the same technique as Sims does in section 4.3 ("Mating Symbolic Expressions"). The idea is that any function can be represented as a tree. The function is the genotype, whereas the final picture produced is the phenotype of that gene. Sims describes two forms of breeding:

With standard GA systems, you need a fitness function. You evaluate the fitness of every gene you generate and then crossbreed the genes with the highest fitness. You have a fair amount of flexibility in how you do this. One simple way is, for each child you wish to produce, you randomly choose the future child's parents (biased toward parents with a higher fitness value) and then crossbreed them to product the child.

In the case of your assignment, the fitness function is "prettiness" - the user selects which pictures he or she prefers (say, four of the twenty five presented) and those are then crossbred for the next generation.

Furthermore, in addition to evolution through breeding, you should also support evolution through mutation. Mutation means replacing an existing node in the tree with something generated at random (or tweaking some existing constant within the tree).

Feature set

Your program will support a number of features:

Your program will support the Web browser paradigm. That means you will have "Back" and "Forward" buttons, as well as a "Go" menu with a history of where you've been (as used in Firefox, among other browsers). You will have a "Restart" button that flushes what you've got and generates new random images from scratch. You will have a "Stop" button which causes all computation to cease.

You will support incremental display. Each image should be partially rendered as you are computing it, and you should be computing all the images concurrently (note: not necessarily with multiple threads, although that's a perfectly reasonable way to do it). The effect should be similar to visiting a Web page and watching the various images filling in slowly.

NEW About PPM: PPM (Portable PixMap) is an extremely simple image format designed by Jef Poskanzer. There is a Java implementation of a Decoder and Encoder for PPM written by Poskanzer himself (local mirror, in case these files disappear). You're welcome to try using it. You will also be able to view your PPM images later on using simple Unix tools like xv or by converting them using convert (part of ImageMagick, available on Owlnet).

NEW About PNG: The PNG (Portable Network Graphics) format [Wikipedia] is a much newer and more capable image format. A PNG is organized into "chunks"; the simplest PNG is a header chunk, a gzipped (!) image data chunk, and a footer chunk. You can support PNG in one of two ways:

  1. You can write the PNG format yourself, which is pretty easy if you poke around the specmuch easier than gzip, since you only need to write "type 2" (8-bit-per-channel RGB) images.
  2. You can use the javax.imageio classes built into Java. (This is the easy way out.)

Intrinsic functions in your genome

Your genome will have a rich set of possible functions. Each function is defined as taking zero or more floating-point arguments and returning a floating-point value. These functions can thus be composed together into an arbitrary tree. This function will then be evaluated for each pixel (i.e., looping over X and Y) and for each color channel (R,G,B). You'll probably find it easier to think of the colors (R,G,B) as a 3-vector, and design all your functions to operate piecewise on these vectors (more on this below, when we talk about software engineering).

We will define an image over X and Y from -1 to 1. The upper left corner of the image will be (-1, 1), which is to say, it will be the usual Cartesian coordinates you do on paper rather than the backward geometry you see in many computer graphics libraries.

You must support everything below (and feel free to add additional stuff). You should scale things such that the minimum (R,G,B) value is (-1, -1, -1) and the maximum value is (1, 1, 1), mapping to pure black and pure white, respectively.

Now, not all of these functions are defined continuously. You should have appropriate error handling (i.e., just define divide-by-zero to return zero and continue onward).

Color spaces

Your program will support conversion from the usual RGB color space to the slightly less usual Y/Cr/Cb space. This is accomplished with the following system of equations:

   Y  =  0.2989 R + 0.5866 G + 0.1145 B
   Cb = -0.1687 R - 0.3312 G + 0.5000 B
   Cr =  0.5000 R - 0.4183 G - 0.0816 B

And, of course, it can be inverted:

   R = Y             + 1.4022 Cr
   G = Y - 0.3456 Cb - 0.7145 Cr
   B = Y + 1.7710 Cb

This particular color space is the basis of JPEG and MPEG encoding. It's also the same as the "Y/Pr/Pb" component video jacks you find on the back of modern DVD players. The "Y" channel represents the "black&white" or luminance portion of the image, and the other channels represent the changes necessary to get the colors or chrominance. You can think of this color space in similar terms to how radio is broadcast, where they transmit one channel with L+R and a second channel with L-R. If you have an older radio that can only get L+R, at least you get mono sound. Likewise, the YCrCb color space originated in systems to add color onto black&white TV broadcast standards.

When your genetic images start combining color space changes with piecewise multiplication, you'll start getting some very pretty visual effects.

Of course, there are many other color spaces out there. If you're going for cool points, you might want to look at other matrix multiplication operators on the RGB 3-vectors.

Animations

You will implement genetic cross dissolves as described in section 4.5 of the Sims paper. The idea is that the user selects two images that are (hopefully) genetically similar to each other and then asks for them to be cross-dissolved. To interpolate to a genotype between the two source genotypes, you recursive walk down from the top. While the source genotypes are the same, you continue recursively. Where they differ, you evaluate both sides and linearly interpolate between the values they return.

Your GUI will allow you to specify how many frames will be computed and what resolution they will be. The results, after being generated, will be displayed on-screen as a double-buffered animation. Your program will also have an option to write the images out to disk. (We have further instructions to help you with the MPEG generation process.)

The stop button should still work during the computation of these frames as well as playing back the video. Likewise, since this computation might take a while to complete, you should provide a thermometer-style bar gauge to indicate how far along your program has gotten in its computation.

Please don't stress out about generating the MPEG videos. The only "real" requirement here is to be able to write out a bunch of sequentially numbered PPM files. Save the in-program video playback for a last-minute feature if you've got the time.

Final submission

In the end, you will have created a program that lets you explore cool images. As part of your final submission, you will create a Web page for all the world to see, that has three of your favorite images (rendered at 600x600 pixel resolution) as well as a screen-dump of your GUI as it's running with the user selecting images and also a screen-dump of the animation functionality. For each of the three images, your Web page should also show the genotype of the image.

To help your T.A.'s (and yourself), you should also generate a Web page with a bunch of test images. You should have one image per intrinsic in your genome to prove to yourself (and to us) that you are implementing that intrinsic function properly. For example, color Perlin noise should look different from black&white Perlin noise. To demonstrate your RGB-To-YCrCb and reverse functions, you should probably choose suitable inputs, such as sweeps from left to right with black to white or red to green. You may find you want to add other intrinsic functions to help you with this testing. That's fine, but consider that you've already got some fairly powerful intrinsic functions already in there.

You will also have an MPEG video or two of your favorite genetic animations. While all you really need to do is to link to the .mpg file, you may find things work better if you explicitly use HTML code to run the QuickTime viewer. Unfortunately, there's no simple magic way to make embedded animations work perfectly on every browser.

As soon as you've got a URL, even if there's nothing there yet, e-mail your grading group. We'll set up a big web page so you can look at each others' pretty pictures.

Image quality notes

Normally, computer graphics images are defined to be a two-dimensional array of (R,G,B) tuples, where each value is a single byte (usually from zero through 255). The functions described above, however, are described "continuously" from -1.0 to 1.0. When you covert from an integer pixel coordinate to a floating-point phenotype coordinate, and then look up a value in an embedded image (converting back to integer pixels again), it's possible for the image coordinates to map somewhere in between the discrete pixel points. The solution you need to use is called bilinear interpolation. In one dimension, interpolation is easy. If x is a value between zero and one, and you have P0 and P1 at each point, the linear interpolation is (1-x)*P0 + x*P1. In two dimensions, your sample point is usually inside a box with four samples on each edge. The way to think of the solution is that you first compute the linear interpolation in the x-axis for the bottom points and the top points, then you interpolate those intermediate results on the y-axis. Assuming points P00, P01, P10, and P11, the final equation would be something like:

If you don't do bilinear interpolation, but instead round-off the coordinates, you'll get some fairly unpleasant image artifacts if you have to zoom in on a image (big, blocky pixels). The interpolation looks much nicer, but it's slower. You might consider including a "quality" switch that uses round-off instead of bilinear interpolation.

If you're feeling really gonzo (not required for this assignment), you could investigate trilinear mipmapping. The above case takes care of zooming in to an image, but could generate aliasing artifacts when you zoom out (e.g., moire patterns). Modern texture mapping hardware does this and it looks great, but the software speeds would be even slower.

Performance notes

If you just evaluate the function at every pixel, you may find it goes very slowly. A simple performance trick is to do some loop reordering. Where you might normally write:

Instead, you can say something like this:

This means you're only interpreting your genome representation once per row (and carrying one intermediate value per column), so the cost of interpretation becomes much more reasonable.

One last trick: on the RGB-To-YCrCB transformations, notice that the coefficients all have exactly four decimal places. If you multiplied every coefficient by 10000, you could perform the whole computation with integers, then divide by 10000 when you were done. If you were even more clever, you could multiply those constants by 2^16, do the multiply, then right shift by 16, avoiding the need to do the more expensive integer divide operation.

Software engineering notes

We're requiring you to implement a user interface from scratch and we want it to "feel" like using a web browser. In additional to the technical challenges (e.g., making the stop button work), you're going to have some usability challenges. You should perform some basic usability testing on your friends, roommates, and so forth. Even before your program is anywhere near complete, you can observe your friends trouble when using your system. Keep your hands in your pockets and keep your mouth shut. See if they can figure it out without your help. When you make changes based on this, be sure to describe your experimental results in your updated specs.

You need a plug-in architecture that supports all the different intrinsic functions of your genome. I've already broken them down based on the number of inputs, but you've got some important design decisions that you need to make. Should you really compute the red, green, and blue images entirely separately, or can you compute them all at the same time? If you want to do everything all at once, then you may find it helpful to define every intrinsic operation to work over 3-vector colors. So, for a single-argument function like sine or cosine, you might have:

Things get more complex for two-argument functions. Consider external-image. If those arguments are X and Y, then you get exactly the original image. If they're something else, then your image gets warped in some hopefully pretty fashion. Again, you have to decide how you want to handle it. Maybe it's easier to make them be a single-argument function that treats a 3-vector as if it were an (X,Y) point (i.e., ignoring the third input). Or, maybe you want to take two 3-vectors, which gives you, in effect, a separate warping function for each of the three color channels. You'll need to come up with a reasonable solution to this problem that can be applied for images, Perlin noise, and other intrinsics.

Variables like X and Y raise another interesting challenge that you need to address. You could, for example, insert X as (X,X,X), or you could perhaps create vectors like (X,Y,0). It's important that your system have enough flexibility to be able to generate (Y,X,0) much less images like (3*X+2*Y,X*X,cos(Y)). One way you might address this is having a handful of simple primitives (e.g., (X,X,X), (Y,Y,Y), (X,Y,0), (constant1, constant2, constant3)) and then some kind of mixing function, comparable to RGB-To-YCrCb, that's just another intrinsic in your genome. In your spec, you should describe how your design is general enough to create a suitable variety of different genotypes.

You're required to support Sims' tree-branching genetic crossbreeding, but you should be sure that you can support other styles of crossbreeding as well. Make sure you've engineered something analogous to a visitor pattern that can separate out the selection of parents to breed (the fitness function) and the method with which you generate their children (crossbreeding, mutation, etc.). You might even consider having swappable fitness functions. Right now, the required fitness function is user-preference, but it's just as conceivable to have something off-the-wall like trying to breed pictures that look as close as possible to a picture of your face.

There are many different ways you can divide the labor for this project. One programmer might do the GUI work, where the other does the GA work. You could even look at using an embedded interpreter like Rhino to drive the low-level system by hand before you've gotten the GUI finished. With simple calls to the interpreter, you could build up functions to be evaluated, run them with output to a file, and then look at the results with your favorite external tool. You can also build test scripts that exercise more code than traditional unit tests. This could make your life much easier with respect to testing and integration.

Optional stuff

If any of you miraculously finishes this project early, here are some ideas you might pursue:

Last modified: Tue Apr 4 10:18:46 CDT 2006