Thursday, December 13, 2012

Conway's Game of Life in SQL, Part 2

I showed my implementation of Conway's Game of Life in SQL to a few people, a co-worker pondered if it could be done without the stored procedure.  I gave it some thought, and came up with an implementation that relies on views. I used views for the simple reason that it made things easier to see and understand.  If a query can be written using views, the it can be written as straight SQL, but the results are generally uglier, messier, and can often be unreadable.  Since I'm using MySQL, I realize how bad views can be when it comes to performance, but since this is a proof of concept, and a toy anyway, I'm not too worried.  I honestly doubt any real-life application would ever require me to implement the Game of Life in SQL!

I started as I had earlier, by defining the table as holding all the live cells for each generation.
create table cell (
  generation int,
  x int, 
  y int,
  primary key (generation, x, y));
Then, I created a view to represent all the candidate position on the grids where we might find live cells on the next generation.  We can only find cells where there are cells for the current generation, and in all 8 neighbour positions of those cells:
create view candidate (generation, x, y) as 
  select generation, x-1, y-1 from cell union
  select generation, x,   y-1 from cell union
  select generation, x+1, y-1 from cell union
  select generation, x-1, y   from cell union
  select generation, x,   y   from cell union
  select generation, x+1, y   from cell union
  select generation, x-1, y+1 from cell union
  select generation, x,   y+1 from cell union
  select generation, x+1, y+1 from cell;
Next, I counted the number of live cells in the 3x3 square around each candidate.  This isn't quite the same as counting the neighbours, since it includes the current cell in the count.  We will need to take that into account in a bit.
create view neighbour_count (generation, x, y, count) as 
  select generation, x, y, 
         (select count(*) 
          from cell 
         where cell.x in (candidate.x-1, candidate.x, candidate.x+1)
           and cell.y in (candidate.y-1, candidate.y, candidate.y+1)
           and not (cell.x = candidate.x and cell.y = candidate.y)
           and cell.generation = candidate.generation)
    from candidate;
From the previous post, the rules of the game can be reduced to: a cell is alive in the next generation if

  1. it has 3 neighbours
  2. it has 2 neighbours and was alive in the current generation
Let's write this as a truth table:
Live? Neighbours Alive in the next generation
yes less than 2 no
yes 2 YES
yes 3 YES
yes more than 3 no
no less than 3 no
no 3 YES
no more than 3 no
If we count the cell in the neighbour count, we change our truth table:
Live? # cells in 3x3 square Alive in the next generation
yes less than 3 no
yes 3 YES
yes 4 YES
yes more than 4 no
no less than 3 no
no 3 YES
no more than 3 no
A cell will be alive in the next generation if it's alive and the number of live cells in the square is 3 or 4, or if the cell is dead and there are 3 cells in the square.  Let's rewrite the rules... again.

A cell will be alive in the next generation if
  1. The 3x3 square has exactly 3 cells
  2. The 3x3 square has exactly 4 cells and the current cell is alive.
So, we'll include a cell in the next generation according to these rules.  We can tell if a cell is alive by doing a left outer join on the cell table:
insert into cell (
  select neighbour_count.generation+1, neighbour_count.x, neighbour_count.y 
  from neighbour_count 
         left outer join cell 
              on neighbour_count.generation = cell.generation
             and neighbour_count.x = cell.x
             and neighbour_count.y = cell.y
  where neighbour_count.generation = (select max(generation) from cell)
    and (neighbour_count.count = 3 
         or (neighbour_count.count = 4 and cell.x is not null)));
Repeat the insert statement for each following generation.

Monday, December 10, 2012

The Code Retreat

On December 8th, I attended the Montréal edition of the 2012 Code Retreat.  It was organized by Mathieu Bérubé, to whom I'm very thankful.  He asked the attendees for comments on the event, so here I go.

The format of the code retreat is that during the day, there are 6 coding session, using pairs programming.  In each session, we try to implement Conways' Game of Life, using TDD principles.  We can use any language or framework we want.  Sometimes, only one of the partners knows the language used, and the other learns as we go.  For some of the session, the leader also adds a challenge or a new guideline.  After the session, we delete the code, and we all get together to discuss the session and the challenge.  Mathieu had some questions for us to guide the discussion and to make us think.

The goal of the even, however, is not to implement the game of life, but to learn something in the process.  I think we all learned a lot, technically.  Some of us learned new languages, some learned new ways to approach the problem.  I learned that the Game of Life is easier to solve in functional or functional-type languages, rather than with object oriented or procedural languages.  I learned the problem cannot be solved by storing and infinite 2D array.  I learned that MySQL has special optimisation for "IN" that it doesn't have for "BETWEEN".  I learned a smattering of Haskell.

But that's the sort of things one can learn in a book.  What else did I learn, that's less likely to be in technology books?  In all fairness, some of the following, I already knew, and is covered in management or psychology books, but those book tend to be less popular with developers.

The clock is an evil but useful master

By having only 45 minutes to complete the task, we can't afford to explore various algorithms and data structures to solve the problem - as soon as we see something that looks promising, we grab it, and run with it, until we notice a problem and try down a different path.  In some cases, we even started coding without knowing where we were headed.  I would normally think of this as a problem, because creativity is a good thing, and stifling it must be bad.  However, having too much time usually means I'll over-design, think of all possible ways, but in the end, I may still rush through the implementation.  Moreover, I assume that I'll have thought about all the possible snags that can come along, but some problems only rear their ugly heads after some code is written and the next step proves impossible.  Sometimes, it's better to start moving down the wrong path, than to just stay in one spot and do nothing at all.

This is somewhat related the the Lean Startup methodology - test your assumptions before you make too many.

At one point, I asked the organizer to put up, on the projector, how much time remained in the session.  He pointed out that it was a bad idea, because when we look at the clock, we're more likely to cut corners just to get something working, rather than focus on the quality of the code.  This was a hard habit to ignore!

Let go, and smell the flowers

Since my first partner and I solved the problem in the first session, I was really eager to solve it the second (and third, ...) time around, in part to prove that he didn't do all the work, and I was just there to fix the syntax.  But from the second session on, Mathieu gave us particular challenges to include in our coding, from "think about change-proofing your solution" to "naive ping-pong (one partner writes the test, the other does the least amount of work possible to pass the test)".  As it turns out, it was really hard to completely implement the solution, writing the tests first, and factoring in the challenge.  Something had to give.  It was really hard to let go "complete the solution" in favor of TDD and the challenges.  Recognizing that this is what I was doing certainly helped me let go, but the drive to finish the job kept nagging me.  So much so that when I got home, I just HAD to implement the Game of Life in Scala, because I was so upset I didn't finish during the event.

Another aspect of this is how I interacted with my partners.  When I thought I had the solution and they didn't, I just pushed on with my idea, so that we'd get the job done as fast as possible.  Rushing means I didn't listen as well as I could have to what the other person had to say.  You can't speak and listen at the same time.  The point of Code Retreat is not to write the Game of Life, that's been done thousands of times.  The point was to play with writing code, try different things, and  learn something in the process.

The tools make a difference on productivity

Session after session, it became clear that for this particular problem, a functional approach makes more sense than a procedural or object-oriented one.  While we can use functional programming concepts in most languages, functional languages make the task much simpler.  Even though we struggled setting up the test environment for Scala, we got further than with procedural language, because once started, we just added a test, added an implementation - everything seemed to work almost on the first try.

Another tool that affected productivity is the editor.  I'm not advocating for any particular tool, but in all the sessions, we used one participant's set up.  Whoever's setup that was would edit so much faster.  When using someone else's computer, I quickly got frustrated because I would type some editor command, only to realize that it's the wrong way to do what I want on that editor, or I had to slow down to avoid such mistakes.  This happened even though I knew how to use the other person's editor.  It may have worked better if we had used something like Dropbox so we could each use our own computer, but given that IDEs tend to organize the source/project differently, it may not have worked either.

I change depending on the personality of my partner

Each team I was on had a different dynamic.  This was due in part to the challenge proposed - ping-pong tends to do that, and in part to the personality of my partner.  It is likely that they also took cues from my personality as well, particularly since I was generally outspoken between the sessions, so I did not necessarily get to know them well in the process.  I tend to avoid being the leader of a group, but I'm also impatient.  At the beginning of a session, the my partner didn't suggest something immediately, an approach, a language, etc, I'd propose something.  This means that more often than not, I imposed my will on the other.  I should have listened more to my partners, and I might have, if I hadn't been so intent on finishing the problem in the 45 minutes!

Deleting the code

One of the instructions in the Code Retreat is that after each session, you delete the code.  This was hardest to do, the closer we got to finishing the solution.  It was particularly difficult in the session when we wrote in Scala, because after the initial fumbling with the test environment, the progress was steady and fast.  If only I had another 10 minutes, I'm positive we could finish!  Surprisingly, the working code in MySQL was very easy to delete, possibly because it felt complete, done, over with.

All in all, it was a useful, interesting and fun experience.  I highly recommend it to any programmer!

Sunday, December 9, 2012

Conway's Game of Life in... SQL?

Yesterday, I attended the Code Retreat hosted at Notman House and organized by Mathieu Bérubé.  It was a most enjoyable experience.  On the first session, I paired with Christian Lavoie, and we tried to implement Conway's Game of Life using... mysql.  Some people thought this was crazy, and I can't disagree.  At the same time, we were able to complete the solution in under 45 minutes, not a small feat.

Part of the rules of the Code Retreat is that you delete your code after each session.  We did, but I thought the solution was so easy to implement, I thought I'd try to recreate it at home.  I made some changes to the implementation, but the core of the idea is the same as what we wrote.

The first thing to consider is that in the Game of Life, we cannot have a fixed-length 2D array that holds the cells.  Indeed, the best way is to store only the coordinates of the live cells.  Any cell that's not live isn't store anywhere, since that's the state of all the cells in the infinite grid, and it's just impossible to store an infinite grid.  We assume a finite number of live cells to start with (which means we'll always have a finite number of cells).

These cells will be located in a rectangular area that's bounded by the minimum x, minimum y, maximum x and maximum y coordinates of the cells.  This means that we can restrict our processing to the positions within a rectangle that's just a little bigger than the bounding rectangle: we need to add a border one-position wide around the whole rectangle.

Then, for each cell within that rectangle, we decide if it should be alive in the next generation.  We simply ignore cells that die, by not adding them to the next generation.  The 4 rules of the game can be rewritten as two simpler rules:

  1. If the current cell is alive, it stays alive if it as exactly 2 or 3 neighbours who are also alive
  2. If the current cell is dead, it becomes alive if it has exactly 3 neighbours.
Using a little logic processing, we can further rewrite this as the following 2 conditions to be alive in the next generation:
  1. If a cells has 3 neighbours
  2. If a cell is alive and has 2 neighbours
So, we only add the cell to the next generation (insert it in the table) if it fits either of those rules.

Counting neighbours is also fairly easy.  We find how many cells exist that have their x and y coordinates one less, equal, or one more than the current x.  We have to include the current x and y to include cells exactly to the left, exactly to right, exactly above or exactly below the current cell.  We must, however, exclude the current cell, so if we know it's set, we subtract 1 from the total count.

Here is code close to what we came up with, with some modifications that came to me after the fact.

drop table if exists cells;
create table cells (
  generation int,
  x int,
  y int,
  primary key (generation, x, y));

drop procedure if exists iterate_generation;
delimiter $$
create procedure iterate_generation ()
  declare current_gen int;
  declare current_x int;
  declare current_y int;
  declare min_x int;
  declare max_x int;
  declare min_y int;
  declare max_y int;
  declare live int;
  declare neighbour_count int;

  select max(generation) into current_gen from cells;
  select min(x)-1, min(y)-1, max(x)+1, max(y)+1
    into min_x, min_y, max_x, max_y from cells
   where generation = current_gen;

  set current_x := min_x;
  while current_x <= max_x do
    set current_y := min_y;
    while current_y <= max_y do
      select count(1) into live from cells
       where generation = current_gen
         and x = current_x
         and y = current_y;
      select count(1) - live into neighbour_count from cells
       where generation = current_gen
         and x in (current_x - 1, current_x, current_x + 1)
         and y in (current_y - 1, current_y, current_y + 1);

      if neighbour_count = 3 || live = 1 and neighbour_count = 2
        insert into cells (generation, x, y)
                   values (current_gen+1, current_x, current_y);
      end if;
      set current_y := current_y + 1;
    end while;
    set current_x := current_x + 1;
  end while;
delimiter ;

Crazy? well, yes. And yet, surprisingly easy. And absolutely a whole lot of fun.