Monday, August 8, 2011

Bayes Theorem and Predicting the Future

I think Bayes theorem is one of the most beautiful things I have come across - it's so simple and insanely effective. In this post, I am going to present an idea about how Bayes theorem can be used for predicting the future ... Yes! You heard it right - for predicting the future.

The idea came to me while attending a talk by Peter Norvig. If you haven't already, you must, and I insist, take a look at the spell checker written by him which uses Bayes theorem. (Search Google!)

Before the idea, what is Bayes theorem: Well, it relates two events A and B in the following way:
Pr(A | B) · Pr(B) = Pr(B | A) · Pr(A)

How, you may ask? And I must explain that intuitively, upholding the spirit of this blog.
So here it is:
Pr(A | B) means the probability of A happening given that event B has already taken place. So imagine two intersecting circles representing A and B. Stretch your imagination and think that both the circles are invisible for now. What happens next?
B.
Event B happens next.
At this point of time, you should imagine a circle representing event B emerge out. What happens next?
A.
Event A happens next.
At this point of time, you should be careful. The whole of the circle representing event A should not emerge. Since B has happened, we are restricted to play around only inside B. So what should emerge is the intersecting area between A and B - which is already a part of B.

Now at this point of time, either you are confused enough and about to leave, or you have in your imagination a circle representing B with a shaded area that represents A and B happening together. Voila! What have we got?
Pr(A | B) = Pr(A intersecting B) / Pr(B)  -- see the circles in your head - this is simple maths.

Swap A and B and what have you got:
Pr(B | A) = Pr(B intersecting A) / Pr(A)

Well, Pr(A intersecting B) = Pr(B intersecting A), so if you use that, you get the form which I originally mentioned:

Pr(A | B) = Pr(B | A) · Pr(A) / Pr(B)
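To see the identity in action, here is a tiny numeric check. The sample space and counts below are made up purely for illustration:

```python
from fractions import Fraction

# A made-up sample space of 20 equally likely outcomes (illustrative only):
# 8 outcomes lie in A, 10 lie in B, and 4 lie in both.
total, in_a, in_b, in_both = 20, 8, 10, 4

p_a = Fraction(in_a, total)
p_b = Fraction(in_b, total)
p_a_and_b = Fraction(in_both, total)

p_a_given_b = p_a_and_b / p_b  # Pr(A | B) = Pr(A intersecting B) / Pr(B)
p_b_given_a = p_a_and_b / p_a  # Pr(B | A) = Pr(B intersecting A) / Pr(A)

# Both arrangements of Bayes theorem agree with the intersection:
assert p_a_given_b * p_b == p_b_given_a * p_a == p_a_and_b
print(p_a_given_b)  # prints: 2/5
```

Swap in any counts you like; as long as Pr(A intersecting B) = Pr(B intersecting A), the identity holds.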



Predicting the future - how?
This is the interesting part. Let B represent your present (or past) and A represent one possible state of your future.

What you ought to do is:

Pr(Your future state | (Given) Your present state) =

Pr(Your present state | (Given) Your future state) · Pr(Your future state) / Pr(Your present state)

Now note that given your present state (pathetic as it is - reading this blog), your future state could be one of many. So don't forget to apply a max function and take the future state with the maximum probability.

Also note that Pr(Your present state) is the common denominator across all your future states - so it can safely be ignored in the comparison. Or, if you are a stickler for pure maths - can you still ignore it (humble request)?

But the million $ question would be: WTH does this term on the RHS mean:

Pr(Your present state | (Given) Your future state)

What it means is what it actually means! Given your future state, what's the probability of your present state? Does that make any sense? If you knew your future state, would you even be reading this blog?

True.

Now the answer is: resolving this term depends on the context of the problem and how we can model it in the Bayes theorem setting. However, since I won't let you go intuitionally starved, here's a way: why don't you observe your past and evaluate this term? So basically, you spread out your past as a set of discrete events that have already happened. You can easily divide those events into the states that were similar to your present state, along with the states you eventually went into. Having done that, you can get a fair idea of the term that invoked a WTH from you a while ago.
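Here is a minimal sketch of that recipe in Python. Everything in it - the state names, the history, the helper predict - is my own illustration, not something from the post: tally the (present, future) transitions in your past, then take the future state that maximizes Pr(present | future) · Pr(future).

```python
from collections import Counter

# A made-up past, spread out as discrete (state_then, state_next) events.
past = [
    ('tired', 'sleep'), ('tired', 'sleep'), ('tired', 'coffee'),
    ('bored', 'blog'), ('bored', 'sleep'), ('tired', 'sleep'),
]

def predict(present):
  '''Return the future state maximizing Pr(present | future) * Pr(future).
     Pr(present) is the same denominator for every candidate, so we drop it.'''
  future_counts = Counter(nxt for _, nxt in past)  # how often each future occurred
  pair_counts = Counter(past)                      # joint (present, future) counts
  def score(future):
    p_present_given_future = pair_counts[(present, future)] / future_counts[future]
    p_future = future_counts[future] / len(past)
    return p_present_given_future * p_future
  return max(future_counts, key=score)

print(predict('tired'))  # prints: sleep
```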

See, you can model your future given your present - isn't Bayes theorem insanely beautiful? :)

Tuesday, May 3, 2011

Expecto Patronum - writing a sudoku solver (Part 2)

Qn: So how much have we progressed in writing a Sudoku solver?
Ans: 33.33 %

Let's finish the other two parts as well. I'll start with the easy one first: getting all valid choices for a cell in the sudoku map.

Getting valid choices for a cell in sudoku map
The sudoku is partially filled and I am analysing a cell (x, y). What I'd want is to get a list of all possible valid choices for (x, y). Simple! Eliminate all numbers (between 1-9) from row x, column y and the 3 X 3 square containing (x, y). Whatever remains, however improbable, are the valid choices, my friend. Here is the swash-buckling code:


def get_choices(s, x, y):
  '''Get possible choices for cell (x, y)
     Args:
       s: sudoku map
       x: row
       y: column
     Returns:
       list of possible choices
  '''
  l = 9
  all_choices = set(range(1, l+1))
  invalid_choices = set()
  # Eliminate everything already present in row x and column y
  for i in range(1, l+1):
    r_ele = s[10*x + i]
    if r_ele:
      invalid_choices.add(r_ele)
    c_ele = s[10*i + y]
    if c_ele:
      invalid_choices.add(c_ele)

  # Top-left cell of the 3 X 3 square containing (x, y) - rows and columns are 1-indexed
  sq_x = 3 * ((x - 1) // 3) + 1
  sq_y = 3 * ((y - 1) // 3) + 1

  # Eliminate everything already present in that 3 X 3 square
  for i in range(3):
    for j in range(3):
      key = 10 * (sq_x + i) + (sq_y + j)
      if s[key]:
        invalid_choices.add(s[key])

  return list(all_choices - invalid_choices)

And now comes the grand finale!!!


Main Algorithm
                                      "To iterate is human, to recurse divine"

And since we all want to write divine code, we would use recursion to solve this problem. How exactly? 

# Termination condition
If all cells in sudoku map are filled:
  if sudoku is solved:
     return the sudoku map

# Main recursion 
As long as there are unfilled cells in sudoku map:
1. Go to an unfilled cell
2. Iterate over all possible valid values for that cell.
3. Fill that cell with each iterating valid value (thus modifying the sudoku map)
4. Recursively solve this new modified sudoku map.
5. If recursion returns because we found a solution (recursion will return sudoku map object):
      return the currently filled sudoku map as it is

    else (recursion returned None because no solution is found along the iterated value we filled):
      restore the sudoku map before recursion
      repeat from step 2 by moving on to the next value in the iteration

Heuristic
I also included a heuristic in the actual program for the order of visiting the unfilled cells. You could visit them sequentially, but it turns out that is not a good strategy. A simple heuristic is to start from the cell which has the least number of possible valid choices. Using this simple heuristic, you cut your search space exponentially. (Think, think!)


All of this (including the heuristic) translates to the following code:

def solve_sudoku(s):
  '''Solves sudoku
     Args:
       s: sudoku map
     Returns:
       Solved sudoku map, or None if there is no solution
  '''
  l = 9

  # Termination condition: every cell is filled
  if all(val for val in s.values()):
    return s if check_sudoku(s) else None

  # Heuristic: pick the unfilled cell with the fewest valid choices
  min_choices = None
  min_choice_key = None
  for i in range(1, l+1):
    for j in range(1, l+1):
      key = 10*i + j
      if s[key]:
        continue
      choices = get_choices(s, i, j)
      if min_choices is None or len(min_choices) > len(choices):
        min_choices = choices
        min_choice_key = key

  for k in min_choices:
    s[min_choice_key] = k
    if solve_sudoku(s):
      return s
    s[min_choice_key] = 0  # backtrack: restore the map
  return None

So this is the end, beautiful friend! 
Here was a simple, recursive sudoku solver. And it solved the world's hardest sudoku in around 3-4 sec. (http://www.usatoday.com/news/offbeat/2006-11-06-sudoku_x.htm)

There are more optimal implementations out there than the one I have given. But my aim was just to explain the process of writing a simple sudoku solver, and hopefully, you have got a feel for it now!

Acknowledgement:
I figured out the logic myself, except for the heuristic, which is taken from http://norvig.com/sudoku.html and which made a really big performance difference.

Expecto Patronum - writing a sudoku solver (Part 1)

Writing a sudoku solver is not a dark art. It only appears so. And appearances can be deceptive.

So this post is about how to write a sudoku solver. And I will show you how to write it in Python - though the choice of programming language really does not matter.

This post is about writing a sudoku solver for a 9 X 9 sudoku puzzle. At this point, I assume you all know what a sudoku puzzle is and what are its rules - if you do not, stop reading and go back to sleep (sigh! yawn!)

Input
First of all, how do we represent the input? The answer comes naturally - read it from a file. Cool! And what would be the file format to represent a 9 X 9 sudoku puzzle? That's your choice. Without being too fancy about the file format, I will keep it simple: there will be 9 lines in the file, with each line having 9 numbers, from 0 - 9, separated by a single space. An example of the content of the input file is:

#############

0 0 3 0 7 5 0 0 0
0 0 0 0 2 0 0 8 6
9 0 2 0 0 1 0 0 3
0 4 0 1 8 0 0 0 0
0 0 7 0 0 0 0 0 0
2 0 0 0 0 0 6 0 0
0 0 0 7 0 0 3 0 1
0 0 4 0 0 0 0 0 7
0 3 0 4 5 0 2 0 0

#############


**0 represents the unfilled entries in the sudoku which we have to find out.


Representing Sudoku in the program
2D Array? Grid? Well, well, well, hold your horses. I am writing this in Python, and Python gives me the list as an awesome data structure. Since the natural way to represent the sudoku puzzle in the program would be some sort of 2D matrix, I would have to use a collection of lists. Now wait! What I want is to be able to glide down any row or column or a 3 X 3 square, and with a collection of lists, that's a bit tedious. So I am going to use Python's hero of all time - a dictionary - to represent the sudoku.

What about the keys in the dictionary? Easy one, mate! Let's churn a key out of (row, column) for each sudoku cell. Well, you can make any key; it does not matter as long as you can manufacture 81 different keys with the same hash function. (It might vary in efficiency, but that would not be a bottleneck unless the extent of one's stupidity is proportional to the size of the universe.)

I would use the first function that came to my mind:
                                                   f(row, column) -> 10 * row + column

Now using this hash function, just read from the file and construct this sudoku map (sudoku hash, sudoku dictionary - anything is fine as long as you understand it to be the same thing I am talking about).
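The reading step could look like the sketch below. The function name load_sudoku is my own invention; the post leaves this part to you, and any equivalent loop works.

```python
def load_sudoku(path):
  '''Read the 9-line, space-separated puzzle file into the sudoku map,
     keyed by 10 * row + column (rows and columns are 1-indexed).'''
  s = {}
  with open(path) as f:
    for row, line in enumerate(f, start=1):
      for col, token in enumerate(line.split(), start=1):
        s[10 * row + col] = int(token)
  return s
```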

Checking if sudoku is solved
This post is getting too long and I'd like to continue in another. But before that, let's add one real function to our code which checks whether a currently filled sudoku map has solved the sudoku or not. The algorithm is pretty easy - every row should have all numbers from 1 - 9 (no repetitions), and the same goes for every column and every 3 X 3 square (rows and columns alone are not enough - a grid can have valid rows and columns and still repeat a number inside a square). So here is the first real function code in this blog:

def check_sudoku(s):
  '''Checks if sudoku is solved
     Args:
       s: sudoku map
     Returns:
       True if sudoku is solved
  '''
  l = 9
  # Check each row and column for repetitions
  for i in range(1, l+1):
    r = []
    c = []
    for j in range(1, l+1):
      r_ele = s[10*i + j]
      c_ele = s[10*j + i]
      if r_ele in r or c_ele in c:
        return False
      r.append(r_ele)
      c.append(c_ele)
  # Check each 3 X 3 square too - valid rows and columns alone do not guarantee valid squares
  for sq_x in (1, 4, 7):
    for sq_y in (1, 4, 7):
      square = [s[10*(sq_x + i) + (sq_y + j)] for i in range(3) for j in range(3)]
      if len(set(square)) != l:
        return False
  return True

I got really bored now (Yaaaawn!). So, more on this in part 2.

Saturday, March 5, 2011

Dynamic Programming - This, That and the Universe

Divide and Conquer versus Dynamic Programming

Let me outline two very important truths here:

  • D&C is a top-down approach while DP is a bottom-up approach to solve problems.
  • D&C can be used to solve a problem instance of DP.
What then, is the difference, you may ask? Why DP? Why not D&C? The answer, my friend, is blowing in the wind. :)

I will give it a shot to explain this as simply as I can:

In D&C, you divide the main problem, say P, into smaller parts. Let's consider, for the sake of simplicity, that P is broken down into two smaller sub-problems S1 and S2. Simply think of P as being: P = S1 + S2. Now, in a pure D&C setting, you would solve S1 and S2 separately, combine their solutions in a certain way and get the answer to P. Perfect!

In DP, the first realization you should have is that S1 and S2 overlap. Let's consider that S2 depends on S1 (overlap!). This means that S1 has to be solved before S2 can be solved. Now you may ask, why? Why isn't there any other way to solve S2? Because if P is a DP problem, then P has an optimal sub-structure, and its sub-problems also have an optimal sub-structure. And if S1 and S2 have an optimal sub-structure and S2 is dependent on S1, you should optimally solve S1 first, which would lead you to an optimal solution to S2.

And how does all this say that you cannot use D&C? As a matter of fact, you can. You could solve S1 in one part, S2 in another ... in a D&C setting. But that would involve more work, because you are going to solve S1 twice: once by itself and once while you are solving S2. That loosely-said more-work grows exponentially as the overlapping sub-problems increase.

All that said and done, what's a simple, intuitive way to solve such a problem P, which includes the notion of overlapping sub-problems and optimal sub-structure? It's simple, isn't it? Solve S1 first and remember the result. Then solve S2, utilizing the remembered result for S1 (S2 depends on S1). Then solve P using the solutions of S2 and S1. All you had to do was start from the bottom (S1) and move gradually (through S2) towards the top (P). That makes your computation easier, gives you a solution, and a chance to boast about your DP prowess.
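If you want to watch the S1/S2 story in code, here is a toy of my own choosing (it is not from this post): Fibonacci numbers, where the problem fib(n) splits into the overlapping sub-problems fib(n-1) and fib(n-2).

```python
def fib_dc(n, calls):
  '''Pure divide and conquer: re-solves the overlapping sub-problems.'''
  calls[0] += 1
  if n < 2:
    return n
  return fib_dc(n - 1, calls) + fib_dc(n - 2, calls)

def fib_dp(n):
  '''Bottom-up DP: solve each sub-problem once, remember it, build upward.'''
  table = [0, 1]
  for i in range(2, n + 1):
    table.append(table[i - 1] + table[i - 2])  # reuse remembered results
  return table[n]

calls = [0]
print(fib_dc(20, calls), calls[0])  # prints: 6765 21891
print(fib_dp(20))                   # prints: 6765
```

Same answer both ways, but the D&C version makes 21891 recursive calls while the bottom-up version does 19 additions - that is the loosely-said more-work growing exponentially.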

So long, DP!

Wednesday, March 2, 2011

Dynamic Programming - Truth II

A Dynamic programming problem has an optimal substructure.

The second most important truth of DP!

What does this mean? It simply means that you cannot apply DP unless you are sure that the optimal solutions of the sub-problems will help you in getting the optimal solution of the bigger problem. And how would any sane person know if a problem has an optimal sub-structure? Simple, by putting your grey cells to work.

Let's think about it by taking an example.

- Shortest Path
The problem is to find the shortest path from a vertex u to a vertex v in a graph G (let's forget about the nature of the graph for a while ... I am trying to lay out a concept).
Let's say we have i vertices (labelled 1, 2, 3, ..., i) between u and v on the actual shortest path. So the shortest path looks like this:

u -> 1 -> 2 -> 3 -> ... -> i -> v

Now to reach vertex v through a shortest path, you must reach vertex i through the shortest path. Why? Because if you do not do so, and give me a solution that includes a path from u to v through i such that the path from u to i is not the shortest one, then I will simply replace the path from u to i with the shortest path and get a better solution than the one you gave to me! 

Whoa! Do you see now the meaning of optimal sub-structure? You cannot solve the shortest path from u to v optimally unless you solve the shortest path from u to i optimally. And you cannot solve the shortest path from u to i optimally unless you solve the shortest path from u to j (where j is an intermediate vertex on the shortest path from u to i). And you cannot solve the shortest path from u to j unless ... the story continues.

At this juncture, I hope you might be a little sensitive to DP and its truths. But a few of you might be stricken by the feeling that this all seems so dreadfully familiar to Divide and Conquer technique. Yes, indeed. This is similar, but yet different. The next part is the grand finale for the DP treatise where I will outline a few more insights into it. Till then, an interesting fact to ponder about:

Solving single source shortest path to one destination vertex in a graph is as hard as solving single source shortest path to all the vertices. Think about it: have you seen a polynomial time algorithm that calculates the single-source, single-destination shortest path without calculating all the other shortest paths from the source?
Voila! This is your chance to fame. :) 
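As a sketch of that fact, here is a minimal Bellman-Ford implementation (my choice of algorithm, not something the post names; the graph below is made up). Notice that computing the distance to one destination hands you the distances to all vertices for free.

```python
def bellman_ford(n, edges, src):
  '''Single-source shortest paths.
     n: number of vertices (0 .. n-1); edges: list of (u, v, weight) tuples.
     Returns the list of distances from src to EVERY vertex.'''
  INF = float('inf')
  dist = [INF] * n
  dist[src] = 0
  for _ in range(n - 1):            # n-1 rounds of edge relaxation suffice
    for u, v, w in edges:
      if dist[u] + w < dist[v]:     # optimal sub-structure at work:
        dist[v] = dist[u] + w       # a shortest path to u extends to v
  return dist

edges = [(0, 1, 4), (0, 2, 1), (2, 1, 2), (1, 3, 1), (2, 3, 5)]
print(bellman_ford(4, edges, 0))  # prints: [0, 3, 1, 4]
```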



Dynamic Programming - Truth I

I read it in undergrad, thought about it and did not understand why it works !

I read it during my grad, thought about it and got some understanding!

I think about it often and to this day, I am gaining an insight into it!

My insight

What is dynamic programming?

It's a bottom-up approach to solve problems. And now, what's the notion of a 'bottom-up approach'? In simple terms, you break the problem into small parts. Then you start with those smaller constituents of the problem, combine them one after the other and finally, after combining all the constituents, your solution is ready.

Wait! I thought that was divide and conquer.
Or err ... that is recursion dude, you ain't know nothing.

True indeed. This matches the notion of the divide and conquer technique. And divide and conquer can be formulated by recursion. If you have understood this, it's time for you to know about another great truth.

In dynamic programming, the smaller constituents or sub-problems OVERLAP!

[I am going off your blog. Thinking intuitively , huh and all you could give me was bookish gibberish which I cannot comprehend.]

Wait! This is not gibberish, and it is one of the two most important truths of DP. Let me explain it through an example. Do you know binary search and how we split the array into half each time during binary search? Remember that innocuous-looking algorithm of binary search that is considered to be one of the most beautiful things on this planet, existing within a dimension of log n time.

I hope you do remember that algorithm. If you don't, please go, read it and then come back. Now for those of you who are waiting for me to go further, here it is:

In binary search, you split the array into two equal halves, say L and R, during one search step. The number that you are searching for is either in L or in R. It cannot be in both (let's consider arrays of distinct elements only for now). If you take a pause and think now, you broke down your search problem into smaller parts: search in part L or search in part R. But the search will succeed in only one of the parts. What does that imply? Your smaller parts do not overlap. Basically, either you go to part L and search further (never coming back to R again) or you go to R, search, and discard part L. There is no notion of inter-dependence between the two smaller parts for finding the solution (in this case, searching for the element).

Now do you see the notion of sub-problems overlapping! If the sub-problems overlap, we cannot get the final solution unless we solve and combine them in a certain way (bottom-up way).

Not to confuse you further, I will put out my remaining truth of DP in the next post. Do give it a look.

Tuesday, March 1, 2011

Intuition

In my opinion, there is a huge difference between conscious logic and intuition. Intuition is that sudden leap of faith which conscious logic would take hours to process and understand. Intuition is that feeling you have, always. Intuition is your sensitiveness towards specific things. And Intuition is instinctive.

Why is it important?

To understand the essence! :)

That said and done, the aim of this blog is to present an intuitive understanding of things I find interesting. Hope you would like it.