Pretty Pictures via Genetic Algorithms
Deadlines
- Tuesday, 4 April: Project out
- Tuesday, 11 April, 11:59pm: Specs due
- Tuesday, 18 April, 11:59pm: Prototype due
- Friday, 28 April, 11:59pm: Project due (last day of classes)
How you'll work
Unlike previous projects, this one relaxes the constraints. You can choose your own partner. (You must have a partner, and we will allow one group of three, if necessary.) You can do pair programming. Or not. You can do unit testing. Or not. As part of your spec, we expect you to describe how you plan to go about designing, implementing, and testing your project. Of course, we encourage you to do it the way we've required you to do it in the past. Note: just because we're relaxing the requirements doesn't mean we're allowing you to be lax in the work you do. We're just giving those of you who think you have better ideas the chance to give them a test drive. You'll still be graded on the same scoring sheet as before. (So, where we require "unit testing", you're now required to have a good testing strategy. Likewise, your specs and implementation must still stay in sync.)
Partners and Graders
Update: Groups are here. Grading teams are still TBD.
The Project
The purpose of the assignment is to automatically create pretty pictures by using genetic algorithms. This may sound a little strange, but the results can be quite spectacular. Here's the general idea: a picture is really a function that maps points (x,y) to colors (r,g,b). You evaluate this function once for each pixel on the screen and store the results. So far so good, but unlike traditional computer graphics, we will be breeding these functions with each other, and allowing the user to select which pictures from the current generation should be bred into the next generation.
This assignment is based on a paper written in 1991 by Karl Sims, Artificial Evolution for Computer Graphics. Scroll down to the bottom of his Web page and check out the pretty pictures. For a whizzy Web implementation of the same thing, check out The Gallery of Random Art. You should consider the Sims paper to be a part of your assignment specification.
Breeding functions?
It's not as crazy as it sounds. While there are a number of techniques available, you will use the same technique as Sims does in section 4.3 ("Mating Symbolic Expressions"). The idea is that any function can be represented as a tree. The function is the genotype, whereas the final picture produced is the phenotype of that gene. Sims describes two forms of breeding:
- When you wish to create a child, from two parents which are similar in their genotype, you recursively traverse both parents. So long as the parents are the same, the child will be the same as the parent. Once you reach some kind of difference, you roll the dice and choose one parent or the other.
- If the two parents have unrelated genotypes, they won't be able to breed together as above. Instead, you take a random clipping from one parent and splice it onto a copy of the other parent at a random location, replacing whatever was there before.
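To make the two breeding operations concrete, here's a minimal sketch in Java. The `Node` class, method names, and coin-flip policy are our assumptions for illustration, not a required design; your own tree representation will almost certainly differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// A sketch of a genotype tree: each node is an operator plus children.
class Node {
    String op;                      // e.g. "sin", "add", "x", "const"
    List<Node> children = new ArrayList<>();

    Node(String op, Node... kids) {
        this.op = op;
        for (Node k : kids) children.add(k);
    }

    Node copy() {
        Node n = new Node(op);
        for (Node c : children) n.children.add(c.copy());
        return n;
    }
}

class Breeder {
    static final Random rng = new Random();

    // Similar genotypes: walk both trees in parallel; wherever they
    // diverge, flip a coin and take that whole subtree from one parent.
    static Node crossSimilar(Node a, Node b) {
        if (!a.op.equals(b.op) || a.children.size() != b.children.size())
            return rng.nextBoolean() ? a.copy() : b.copy();
        Node child = new Node(a.op);
        for (int i = 0; i < a.children.size(); i++)
            child.children.add(crossSimilar(a.children.get(i), b.children.get(i)));
        return child;
    }

    // Unrelated genotypes: splice a random clipping of one parent into
    // a copy of the other at a random location.
    static Node crossUnrelated(Node a, Node b) {
        Node child = a.copy();
        Node graft = randomSubtree(b).copy();
        Node target = randomSubtree(child);
        target.op = graft.op;
        target.children = graft.children;
        return child;
    }

    static Node randomSubtree(Node n) {
        List<Node> all = new ArrayList<>();
        collect(n, all);
        return all.get(rng.nextInt(all.size()));
    }

    static void collect(Node n, List<Node> out) {
        out.add(n);
        for (Node c : n.children) collect(c, out);
    }
}
```

Note that both operations copy rather than share subtrees, so parents survive unmodified into the next generation.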
With standard GA systems, you need a fitness function. You evaluate the fitness of every gene you generate and then crossbreed the genes with the highest fitness. You have a fair amount of flexibility in how you do this. One simple way is, for each child you wish to produce, you randomly choose the future child's parents (biased toward parents with a higher fitness value) and then crossbreed them to produce the child.
In the case of your assignment, the fitness function is "prettiness" - the user selects which pictures he or she prefers (say, four of the twenty-five presented) and those are then crossbred for the next generation.
Furthermore, in addition to evolution through breeding, you should also support evolution through mutation. Mutation means replacing an existing node in the tree with something generated at random (or tweaking some existing constant within the tree).
Feature set
Your program will support a number of features:
- A 5x5 grid containing twenty-five pictures - the current generation. Feel free to make this scrollable or do whatever is necessary for it to look good. I recommend you render the small images at 125x125 pixel resolution, although you could scale this to fit the size of the window if you prefer.
- The ability to select any subset of these images for breeding into the subsequent generation.
- The ability to set the mutation rate (i.e., the probability that a function in the next generation will be randomly mutated).
- The ability to blow up an image to "full size" (rather than the 1/25 grid size)
- The ability to save an image to a file at any desired pixel resolution. NEW You must support at least one of these two required formats:
- The PPM "rawbits" format ("man ppm" and use the "P6" format; see below for more info).
- The PNG format (only color type 2 is required; see below for more info).
- The ability to view, save, and re-load the genotype for any given image (these are often represented as LISP-like S-expressions, but you can use any human-readable format you like, perhaps XML or even JavaScript).
- The ability to load a new picture from scratch (again, either PPM "rawbits" format or PNG type 2 is required; other formats are optional) and use that as input to future generations.
- The ability to start over from scratch with 25 randomly generated image functions.
- The ability to produce genetic animations between two different images (see Animations, below).
Your program will support the Web browser paradigm. That means you will have "Back" and "Forward" buttons, as well as a "Go" menu with a history of where you've been (as used in Firefox, among other browsers). You will have a "Restart" button that flushes what you've got and generates new random images from scratch. You will have a "Stop" button which causes all computation to cease.
You will support incremental display. Each image should be partially rendered as you are computing it, and you should be computing all the images concurrently (note: not necessarily with multiple threads, although that's a perfectly reasonable way to do it). The effect should be similar to visiting a Web page and watching the various images filling in slowly.
NEW About PPM: PPM (Portable PixMap) is an extremely simple image format designed by Jef Poskanzer. There is a Java implementation of a Decoder and Encoder for PPM written by Poskanzer himself (local mirror, in case these files disappear). You're welcome to try using it. You will also be able to view your PPM images later on using simple Unix tools like xv or by converting them using convert (part of ImageMagick, available on Owlnet).
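Writing the P6 "rawbits" format by hand takes only a few lines: an ASCII header, then raw RGB bytes, one byte per channel per pixel, row by row. A minimal sketch (class and method names are our own; the pixel array is assumed packed row-major):

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

class PpmWriter {
    // P6 header: magic number, dimensions, then the maximum channel value.
    static byte[] header(int width, int height) {
        return ("P6\n" + width + " " + height + "\n255\n")
                .getBytes(StandardCharsets.US_ASCII);
    }

    static void writeP6(String path, int width, int height, byte[] rgb) {
        try (FileOutputStream out = new FileOutputStream(path)) {
            out.write(header(width, height));
            out.write(rgb);  // length must be width * height * 3
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }
}
```

For example, `PpmWriter.writeP6("out.ppm", 600, 600, pixels)` would cover the "save an image" requirement at one fixed resolution.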
NEW About PNG: The PNG (Portable Network Graphics) format [Wikipedia] is a much newer and more capable image format. A PNG is organized into "chunks"; the simplest PNG is a header chunk, a gzipped (!) image data chunk, and a footer chunk. You can support PNG in one of two ways:
- You can write the PNG format yourself, which is pretty easy if you poke around the spec—much easier than gzip, since you only need to write "type 2" (8-bit-per-channel RGB) images.
- You can use the javax.imageio classes built into Java. (This is the easy way out.)
Intrinsic functions in your genome
Your genome will have a rich set of possible functions. Each function is defined as taking zero or more floating-point arguments and returning a floating-point value. These functions can thus be composed together into an arbitrary tree. This function will then be evaluated for each pixel (i.e., looping over X and Y) and for each color channel (R,G,B). You'll probably find it easier to think of the colors (R,G,B) as a 3-vector, and design all your functions to operate piecewise on these vectors (more on this below, when we talk about software engineering).
We will define an image over X and Y from -1 to 1. The upper left corner of the image will be (-1, 1), which is to say, it will be the usual Cartesian coordinates you use on paper rather than the backward geometry you see in many computer graphics libraries.
You must support everything below (and feel free to add additional stuff). You should scale things such that the minimum (R,G,B) value is (-1, -1, -1) and the maximum value is (1, 1, 1), mapping to pure black and pure white, respectively.
- Zero arguments
- Constants (initially generated randomly between -1 and 1)
- The variables X and Y (the target pixel you're evaluating)
- One argument
- Negate
- Round-down and round-up
- Sin, Cos, ATan (trigonometric functions)
- Exponentiate, Log
- Absolute value
- Clip (result is clipped to [-1,1])
- Clip(0.5) = 0.5
- Clip(1.5) = 1.0
- Clip(-1.5) = -1.0
- Clip(10) = 1.0
- Wrap (result is wrapped to [-1,1])
- Wrap(0.5) = 0.5
- Wrap(1.5) = -0.5
- Wrap(-1.5) = 0.5
- Wrap(10) = 0
- Two arguments
- Add, subtract, multiply, divide
- External image (the X and Y inputs to this function return the colors of the image - see below for notes on image quality)
- Clipping version (points outside the box are clipped back to the unit box)
- Tiling version (points outside the box are brought back inside the box using a modulo-like operation)
- Perlin noise (Java source code available)
- Black&white noise (same value for R, G, and B)
- Color noise (different values for R, G, and B)
- Special one-argument
- RGB-To-YCrCb - this converts from the RGB color-space that you know to a luminance / chrominance space commonly used in digital television and other applications (see color spaces, below).
- YCrCb-To-RGB - this is the inverse operation to the above.
Now, not all of these functions are defined for all inputs. You should have appropriate error handling (e.g., just define divide-by-zero to return zero and continue onward).
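As a concrete sketch of the clipping, wrapping, and error-handling rules above (the class and method names are our own, and everything operates on a single channel value at a time):

```java
class Intrinsics {
    // Clip: values outside [-1, 1] are pinned to the nearest endpoint.
    static double clip(double v) {
        return Math.max(-1.0, Math.min(1.0, v));
    }

    // Wrap: shift into [0, 2), take the remainder, then shift back,
    // matching the worked examples (Wrap(1.5) = -0.5, Wrap(-1.5) = 0.5).
    static double wrap(double v) {
        double t = (v + 1.0) % 2.0;
        if (t < 0) t += 2.0;      // Java's % keeps the sign of the dividend
        return t - 1.0;
    }

    // Partial functions return a safe default instead of failing.
    static double div(double a, double b) {
        return b == 0.0 ? 0.0 : a / b;
    }
}
```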
Color spaces
Your program will support conversion from the usual RGB color space to the slightly less usual Y/Cr/Cb space. This is accomplished with the following system of equations:
Y  =  0.2989 R + 0.5866 G + 0.1145 B
Cb = -0.1687 R - 0.3312 G + 0.5000 B
Cr =  0.5000 R - 0.4183 G - 0.0816 B
And, of course, it can be inverted:
R = Y + 1.4022 Cr
G = Y - 0.3456 Cb - 0.7145 Cr
B = Y + 1.7710 Cb
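The two conversions translate directly from the equations; representing a color as a `double[3]` is just one possible choice for your design:

```java
class ColorSpace {
    static double[] rgbToYCrCb(double[] rgb) {
        double r = rgb[0], g = rgb[1], b = rgb[2];
        double y  =  0.2989 * r + 0.5866 * g + 0.1145 * b;
        double cb = -0.1687 * r - 0.3312 * g + 0.5000 * b;
        double cr =  0.5000 * r - 0.4183 * g - 0.0816 * b;
        return new double[]{y, cb, cr};
    }

    static double[] yCrCbToRgb(double[] ycc) {
        double y = ycc[0], cb = ycc[1], cr = ycc[2];
        double r = y + 1.4022 * cr;
        double g = y - 0.3456 * cb - 0.7145 * cr;
        double b = y + 1.7710 * cb;
        return new double[]{r, g, b};
    }
}
```

Because the published coefficients are truncated to four decimal places, a round trip RGB to YCrCb and back is only accurate to about 1e-3, which is plenty for this assignment.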
This particular color space is the basis of JPEG and MPEG encoding. It's also the same as the "Y/Pr/Pb" component video jacks you find on the back of modern DVD players. The "Y" channel represents the "black&white" or luminance portion of the image, and the other channels represent the changes necessary to get the colors or chrominance. You can think of this color space in similar terms to how radio is broadcast, where they transmit one channel with L+R and a second channel with L-R. If you have an older radio that can only get L+R, at least you get mono sound. Likewise, the YCrCb color space originated in systems to add color onto black&white TV broadcast standards.
When your genetic images start combining color space changes with piecewise multiplication, you'll start getting some very pretty visual effects.
Of course, there are many other color spaces out there. If you're going for cool points, you might want to look at other matrix multiplication operators on the RGB 3-vectors.
Animations
You will implement genetic cross dissolves as described in section 4.5 of the Sims paper. The idea is that the user selects two images that are (hopefully) genetically similar to each other and then asks for them to be cross-dissolved. To interpolate to a genotype between the two source genotypes, you recursively walk down from the top. While the source genotypes are the same, you continue recursively. Where they differ, you evaluate both sides and linearly interpolate between the values they return.
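Here is a sketch of that recursive interpolation, using a deliberately tiny genome (just x, y, add, and mul) so the example stays self-contained; your interpreter will have the full intrinsic set, and the class and method names here are our assumptions:

```java
// A stripped-down genotype node for the sake of the example.
class GNode {
    String op;
    GNode l, r;

    GNode(String op) { this.op = op; }
    GNode(String op, GNode l, GNode r) { this.op = op; this.l = l; this.r = r; }

    double eval(double x, double y) {
        switch (op) {
            case "x":   return x;
            case "y":   return y;
            case "add": return l.eval(x, y) + r.eval(x, y);
            case "mul": return l.eval(x, y) * r.eval(x, y);
            default: throw new IllegalArgumentException(op);
        }
    }
}

class Dissolve {
    // Blend parameter t runs from 0 (all genotype a) to 1 (all genotype b).
    // While the trees agree, keep recursing; where they differ, evaluate
    // both subtrees and linearly interpolate the values they return.
    static double evalBlend(GNode a, GNode b, double t, double x, double y) {
        if (!a.op.equals(b.op))
            return (1 - t) * a.eval(x, y) + t * b.eval(x, y);
        switch (a.op) {
            case "x":   return x;
            case "y":   return y;
            case "add": return evalBlend(a.l, b.l, t, x, y)
                             + evalBlend(a.r, b.r, t, x, y);
            case "mul": return evalBlend(a.l, b.l, t, x, y)
                             * evalBlend(a.r, b.r, t, x, y);
            default: throw new IllegalArgumentException(a.op);
        }
    }
}
```

Stepping t across your frame count and rendering each resulting image gives you the animation.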
Your GUI will allow you to specify how many frames will be computed and what resolution they will be. The results, after being generated, will be displayed on-screen as a double-buffered animation. Your program will also have an option to write the images out to disk. (We have further instructions to help you with the MPEG generation process.)
The stop button should still work during the computation of these frames as well as playing back the video. Likewise, since this computation might take a while to complete, you should provide a thermometer-style bar gauge to indicate how far along your program has gotten in its computation.
Please don't stress out about generating the MPEG videos. The only "real" requirement here is to be able to write out a bunch of sequentially numbered PPM files. Save the in-program video playback for a last-minute feature if you've got the time.
Final submission
In the end, you will have created a program that lets you explore cool images. As part of your final submission, you will create a Web page for all the world to see, that has three of your favorite images (rendered at 600x600 pixel resolution) as well as a screen-dump of your GUI as it's running with the user selecting images and also a screen-dump of the animation functionality. For each of the three images, your Web page should also show the genotype of the image.
To help your T.A.'s (and yourself), you should also generate a Web page with a bunch of test images. You should have one image per intrinsic in your genome to prove to yourself (and to us) that you are implementing that intrinsic function properly. For example, color Perlin noise should look different from black&white Perlin noise. To demonstrate your RGB-To-YCrCb and reverse functions, you should probably choose suitable inputs, such as sweeps from left to right with black to white or red to green. You may find you want to add other intrinsic functions to help you with this testing. That's fine, but consider that you've already got some fairly powerful intrinsic functions in there.
You will also have an MPEG video or two of your favorite genetic animations. While all you really need to do is to link to the .mpg file, you may find things work better if you explicitly use HTML code to run the QuickTime viewer. Unfortunately, there's no simple magic way to make embedded animations work perfectly on every browser.
As soon as you've got a URL, even if there's nothing there yet, e-mail your grading group. We'll set up a big web page so you can look at each others' pretty pictures.
Image quality notes
Normally, computer graphics images are defined to be a two-dimensional array of (R,G,B) tuples, where each value is a single byte (usually from zero through 255). The functions described above, however, are described "continuously" from -1.0 to 1.0. When you convert from an integer pixel coordinate to a floating-point phenotype coordinate, and then look up a value in an embedded image (converting back to integer pixels again), it's possible for the image coordinates to map somewhere in between the discrete pixel points. The solution you need to use is called bilinear interpolation. In one dimension, interpolation is easy. If x is a value between zero and one, and you have P0 and P1 at the endpoints, the linear interpolation is (1-x)*P0 + x*P1. In two dimensions, your sample point is usually inside a box with a sample at each of its four corners. The way to think of the solution is that you first compute the linear interpolation in the x-axis for the bottom points and the top points, then you interpolate those intermediate results on the y-axis. Assuming points P00, P01, P10, and P11, the final equation would be something like:
- t0 = (1-x)*P00 + x*P10.
- t1 = (1-x)*P01 + x*P11.
- result = (1-y)*t0 + y*t1.
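The three steps above translate directly into code (names are ours; (x, y) is the fractional position within the cell, and p00 through p11 are the four corner samples):

```java
class Bilerp {
    static double lerp(double t, double a, double b) {
        return (1 - t) * a + t * b;
    }

    static double bilerp(double x, double y,
                         double p00, double p10, double p01, double p11) {
        double t0 = lerp(x, p00, p10);  // interpolate along the bottom edge
        double t1 = lerp(x, p01, p11);  // interpolate along the top edge
        return lerp(y, t0, t1);         // then between the two results
    }
}
```

You'd call this once per color channel when sampling an external image at a non-integer coordinate.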
If you don't do bilinear interpolation, but instead round off the coordinates, you'll get some fairly unpleasant image artifacts if you have to zoom in on an image (big, blocky pixels). The interpolation looks much nicer, but it's slower. You might consider including a "quality" switch that uses round-off instead of bilinear interpolation.
If you're feeling really gonzo (not required for this assignment), you could investigate trilinear mipmapping. The above case takes care of zooming in to an image, but could generate aliasing artifacts when you zoom out (e.g., moire patterns). Modern texture mapping hardware does this and it looks great, but the software speeds would be even slower.
Performance notes
If you just evaluate the function at every pixel, you may find it goes very slowly. A simple performance trick is to do some loop reordering. Where you might normally write:
- Loop Y (-1 to 1)
  - Loop X (-1 to 1)
    - Evaluate(X, Y) --> returns one pixel
Instead, you can say something like this:
- Loop Y (-1 to 1)
  - Evaluate(Y) --> returns an entire row of pixels
This means you're only interpreting your genome representation once per row (and carrying one intermediate value per column), so the cost of interpretation becomes much more reasonable.
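One way to realize this, assuming a design where each intrinsic consumes and produces whole rows of channel values (all names here are illustrative, not required):

```java
class RowEval {
    // A one-argument intrinsic applied to an entire row at once: the
    // genome tree is walked once per row, not once per pixel.
    static double[] sinRow(double[] in) {
        double[] out = new double[in.length];
        for (int i = 0; i < in.length; i++) out[i] = Math.sin(in[i]);
        return out;
    }

    // The "X" variable as a row: evenly spaced samples from -1 to 1.
    // Assumes width >= 2.
    static double[] xRow(int width) {
        double[] out = new double[width];
        for (int i = 0; i < width; i++)
            out[i] = -1.0 + 2.0 * i / (width - 1);
        return out;
    }
}
```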
One last trick: on the RGB-To-YCrCb transformations, notice that the coefficients all have exactly four decimal places. If you multiplied every coefficient by 10000, you could perform the whole computation with integers, then divide by 10000 when you were done. If you were even more clever, you could multiply those constants by 2^16, do the multiply, then right-shift by 16, avoiding the need to do the more expensive integer divide operation.
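For instance, here is a sketch of the 2^16 variant for the luminance (Y) channel alone, on 0..255 byte values; the added 32768 rounds to nearest instead of truncating (that rounding constant is our addition, not part of the text above):

```java
class FixedPoint {
    // Y coefficients scaled by 2^16 = 65536 and truncated to integers.
    static final int CY_R = (int) (0.2989 * 65536);
    static final int CY_G = (int) (0.5866 * 65536);
    static final int CY_B = (int) (0.1145 * 65536);

    // r, g, b are 0..255; the shift replaces an integer divide by 65536.
    static int luma(int r, int g, int b) {
        return (CY_R * r + CY_G * g + CY_B * b + 32768) >> 16;
    }
}
```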
Software engineering notes
We're requiring you to implement a user interface from scratch, and we want it to "feel" like using a web browser. In addition to the technical challenges (e.g., making the stop button work), you're going to have some usability challenges. You should perform some basic usability testing on your friends, roommates, and so forth. Even before your program is anywhere near complete, you can observe the trouble your friends have when using your system. Keep your hands in your pockets and keep your mouth shut. See if they can figure it out without your help. When you make changes based on this, be sure to describe your experimental results in your updated specs.
You need a plug-in architecture that supports all the different intrinsic functions of your genome. I've already broken them down based on the number of inputs, but you've got some important design decisions that you need to make. Should you really compute the red, green, and blue images entirely separately, or can you compute them all at the same time? If you want to do everything all at once, then you may find it helpful to define every intrinsic operation to work over 3-vector colors. So, for a single-argument function like sine or cosine, you might have:
- Original: (R',G',B') = sin((R,G,B))
- Implementation: R' = sin(R), G' = sin(G), B' = sin(B)
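A generic helper makes this piecewise convention easy to apply to every one-argument intrinsic. This is a sketch; the `double[3]` representation and the class name are assumptions about your design:

```java
import java.util.function.DoubleUnaryOperator;

class VecOps {
    // Apply a scalar function to each channel of an (R,G,B) 3-vector.
    static double[] map(DoubleUnaryOperator f, double[] v) {
        return new double[]{
            f.applyAsDouble(v[0]),
            f.applyAsDouble(v[1]),
            f.applyAsDouble(v[2])
        };
    }
}
```

With this, `VecOps.map(Math::sin, rgb)` is the whole implementation of the sine intrinsic, and likewise for cosine, negate, absolute value, and so on.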
Things get more complex for two-argument functions. Consider external-image. If those arguments are X and Y, then you get exactly the original image. If they're something else, then your image gets warped in some hopefully pretty fashion. Again, you have to decide how you want to handle it. Maybe it's easier to make them be a single-argument function that treats a 3-vector as if it were an (X,Y) point (i.e., ignoring the third input). Or, maybe you want to take two 3-vectors, which gives you, in effect, a separate warping function for each of the three color channels. You'll need to come up with a reasonable solution to this problem that can be applied for images, Perlin noise, and other intrinsics.
Variables like X and Y raise another interesting challenge that you need to address. You could, for example, insert X as (X,X,X), or you could perhaps create vectors like (X,Y,0). It's important that your system have enough flexibility to be able to generate (Y,X,0), much less images like (3*X+2*Y,X*X,cos(Y)). One way you might address this is having a handful of simple primitives (e.g., (X,X,X), (Y,Y,Y), (X,Y,0), (constant1, constant2, constant3)) and then some kind of mixing function, comparable to RGB-To-YCrCb, that's just another intrinsic in your genome. In your spec, you should describe how your design is general enough to create a suitable variety of different genotypes.
You're required to support Sims' tree-branching genetic crossbreeding, but you should be sure that you can support other styles of crossbreeding as well. Make sure you've engineered something analogous to a visitor pattern that can separate out the selection of parents to breed (the fitness function) and the method with which you generate their children (crossbreeding, mutation, etc.). You might even consider having swappable fitness functions. Right now, the required fitness function is user-preference, but it's just as conceivable to have something off-the-wall like trying to breed pictures that look as close as possible to a picture of your face.
There are many different ways you can divide the labor for this project. One programmer might do the GUI work, where the other does the GA work. You could even look at using an embedded interpreter like Rhino to drive the low-level system by hand before you've gotten the GUI finished. With simple calls to the interpreter, you could build up functions to be evaluated, run them with output to a file, and then look at the results with your favorite external tool. You can also build test scripts that exercise more code than traditional unit tests. This could make your life much easier with respect to testing and integration.
Optional stuff
If any of you miraculously finishes this project early, here are some ideas you might pursue:
- A time parameter (t) that varies from zero to one (or any other range) turns any expression into an animation. If it goes zero to one and back to zero, then it makes for a loop that can repeat infinitely. These may not be quite as cool as the genetic cross-dissolves, but they're easy to implement.
- Different statistical biases on the random selection of different intrinsic gene functions can help you tune what kinds of images you'll get. For example, once you load a picture of your face, you can make it more likely to appear.
- When you crossbreed, particularly with only a handful of parents, you may get many children with identical genotypes. You only need to send one of them on to the next generation, and otherwise you might as well try to generate another child.
- You might want to save bookmarks. Whenever you find an image that you like, you could bookmark it and assign it some probability of occurring in future generations just like any other intrinsic function.
- Once you've found a clever way to warp one image, you're going to want to be able to apply it to lots of other images. You could take a particular genotype, turn the embedded image name into an external variable, and make a command-line utility (or Photoshop plug-in!) suitable for general use.
- (Hard) Once you've bookmarked a lot of favorites, you may be able to automatically identify common subexpressions that are popular among your bookmarks. Those would be even better candidates for keeping around for future generations. (Hint: if you had the right hash function, you could detect common subexpressions by looking for hash collisions.)
- (Hard) While most interesting image functions will have dependencies on both x and y, many of their subexpressions will be invariant with respect to those variables. You may be able to optimize away a non-trivial amount of computation through partial evaluation of the expressions.