Thursday, 18 August 2016

Imposing a set of behavioral state-change rules on random arrays using Python


I was challenged by my mentor to implement the following condition in Python:
Impose a set of behavioral state-change rules on a random 100 x 100 array.
Whenever a square is surrounded by 4 other squares (either adjacent or diagonal), it retains its value for all future iterations from that point forward.

All squares that do not become bound in a given iteration continue changing until they are bound by 4 squares at some future time.

The goal is to initialize a random array, populate it with an arbitrary set of numbers drawn from a uniform random distribution [0, 1], and apply the rules above to each square, evaluated on every iteration.


While working on the implementation, I found that this condition is somewhat similar to Conway's Game of Life. I read some details about the Game of Life and found it very interesting. Here are some of those details:


The universe of the Game of Life is an infinite two-dimensional orthogonal grid of square cells, each of which is in one of two possible states, alive or dead, or "populated" or "unpopulated" (the difference may seem minor, except when viewing it as an early model of human/urban behavior simulation or how one views a blank space on a grid). Every cell interacts with its eight neighbours, which are the cells that are horizontally, vertically, or diagonally adjacent.
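
As a quick aside, the classic Game of Life step can be written compactly with NumPy. This is just a minimal sketch of the standard survival/birth rules (not the rules from my problem above), on a wrapping grid of 0s and 1s:

import numpy as np

def life_step(grid):
    # count the eight neighbors of every cell, wrapping around at the edges
    neighbors = sum(np.roll(np.roll(grid, di, axis=0), dj, axis=1)
                    for di in (-1, 0, 1) for dj in (-1, 0, 1)
                    if (di, dj) != (0, 0))
    # a live cell survives with 2 or 3 neighbors; a dead cell with exactly
    # 3 live neighbors becomes alive
    return ((neighbors == 3) | ((grid == 1) & (neighbors == 2))).astype(int)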


I implemented the behavioral state-change rules in Python, taking help from some online posts along the way. I tried the problem in the two ways below: first with only two states, and then with the rules applied to a random array. Here are the two approaches:

1) Allowed each cell only two states, live or dead, and applied the behavioral state-change rule.
2) Applied the behavioral state-change rules to a random 100 x 100 array.

Here is the code for both approaches:

Approach 1: Allowed each cell only two states, live or dead, and applied the behavioral state-change rule.

################################################################################
# conway-On-Off.py
#
# Author: Ravi Vijay
# Description:
#
# A simple Python/matplotlib implementation of the following condition:
#   Whenever a square is surrounded by 4 other squares (either adjacent or
#   diagonal), it retains its value for all future iterations from that
#   point forward.
#   All squares that do not become bound in a given iteration continue
#   changing until they are bound by 4 squares at some future time.
################################################################################

import numpy as np
import matplotlib.pyplot as plt 
import matplotlib.animation as animation

N = 100
ON = 255
OFF = 0
vals = [ON, OFF]

# populate grid with random on/off - more off than on
grid = np.random.choice(vals, N*N, p=[0.2, 0.8]).reshape(N, N)

def update(data):
  global grid
  # work on a copy so that every cell is updated from the same snapshot
  newGrid = grid.copy()
  for i in range(N):
    for j in range(N):
      # compute the 4-neighbor sums, adjacent and diagonal separately,
      # wrapping around at the boundaries

      # adjacent total
      total1 = (grid[i, (j-1)%N] + grid[i, (j+1)%N] +
                grid[(i-1)%N, j] + grid[(i+1)%N, j])/255

      # diagonal total
      total2 = (grid[(i-1)%N, (j-1)%N] + grid[(i-1)%N, (j+1)%N] +
                grid[(i+1)%N, (j-1)%N] + grid[(i+1)%N, (j+1)%N])/255

      # apply the rules: an ON cell is bound (and so keeps its value) when
      # its 4 adjacent or its 4 diagonal neighbors are all ON; every other
      # cell keeps changing
      if grid[i, j] == ON:
        if (total1 != 4) and (total2 != 4):
          newGrid[i, j] = OFF
      else:
        newGrid[i, j] = ON
  # push the new frame to the plot
  mat.set_data(newGrid)
  grid = newGrid
  return [mat]

# set up animation
fig, ax = plt.subplots()
mat = ax.matshow(grid)
ani = animation.FuncAnimation(fig, update, interval=200,
                              save_count=200)
plt.show()


Approach 2: Applied the behavioral state-change rules to a random 100 x 100 array.

################################################################################
# conway.py
#
# Author: Ravi Vijay
# Description:
#
# A simple Python/matplotlib implementation of the following condition:
#   Whenever a square is surrounded by 4 other squares (either adjacent or
#   diagonal), it retains its value for all future iterations from that
#   point forward.
#   All squares that do not become bound in a given iteration continue
#   changing until they are bound by 4 squares at some future time.
################################################################################

import numpy as np
import matplotlib.pyplot as plt
import matplotlib.animation as animation

N = 100  # matrix size
c = 5    # number of colors (states)

# populate the grid with random states drawn uniformly from 0..c-1
grid = np.random.randint(c, size=(N, N))
# once a square becomes bound it retains its value for all future iterations
frozen = np.zeros((N, N), dtype=bool)

def update(data):
    global grid
    # work on a copy so that every cell is updated from the same snapshot
    newGrid = grid.copy()
    for i in range(N):
        for j in range(N):
            if frozen[i, j]:
                continue  # bound squares never change again
            # apply the rules: a square becomes bound when its 4 adjacent
            # neighbors all match, or its 4 diagonal neighbors all match,
            # wrapping around at the boundaries
            if ((grid[i, (j-1)%N] == grid[i, (j+1)%N] == grid[(i-1)%N, j] == grid[(i+1)%N, j])
                or (grid[(i-1)%N, (j-1)%N] == grid[(i-1)%N, (j+1)%N] ==
                    grid[(i+1)%N, (j-1)%N] == grid[(i+1)%N, (j+1)%N])):
                frozen[i, j] = True
            else:
                # unbound squares keep changing to a new random state
                newGrid[i, j] = np.random.randint(c)

    # push the new frame to the plot
    mat.set_data(newGrid)
    grid = newGrid
    return [mat]

# set up animation
fig, ax = plt.subplots()
mat = ax.matshow(grid)
ani = animation.FuncAnimation(fig, update, interval=50,
                              save_count=50)
plt.show()



Thursday, 11 August 2016

Paper Review - Machine learning: a review of classification and combining techniques



I got the chance to go through this paper on machine learning; it was recommended to me by my mentor. It is a very good paper for getting the whole perspective on machine learning. The author summarizes the supervised learning methods very well, along with when you should consider using each method.

It took me some time to go through the whole paper, and I will need to come back to various topics in the article for a better understanding. I highly recommend this paper to anyone who is starting their machine learning journey and would like to see the importance of each supervised learning algorithm.


These are some of the points that I would like to summarize:


Supervised learning: instances are given with known labels.
Unsupervised learning: instances are unlabeled.


The main aim of going through the various ML algorithms is to understand the strengths and limitations of each. When solving a problem, we should consider using a combination of algorithms for better accuracy.

Just using one algorithm and trying to increase its accuracy may not always help.


Issues with supervised learning:

The first step towards using ML is to collect the dataset.
While preparing the dataset, try to find the fields that are most informative. If that information is not available, we can use the brute-force method, i.e. measure everything in the hope that the right features can be isolated. The challenge with brute force is that it requires preprocessing.

Some of the methods the author describes for preprocessing and cleansing the dataset are:

  • Variable-by-variable data cleansing
  • Use of visualization tools
  • Instance selection to handle noise: this is more of an optimization problem; sampling methods such as random sampling and stratified sampling can be used
  • Handling incomplete feature values: methods such as ignoring features with unknown values, selecting the most common feature value, mean substitution, regression and classification, or treating missing values as special values can be used
  • Feature subset selection
  • Feature construction/transformation



Algorithm selection:
The author talks about the importance of choosing the right algorithm. Prediction accuracy is the main criterion for classifier evaluation. The techniques generally used to estimate classifier accuracy are: splitting the training set, cross-validation, and the leave-one-out method. Leave-one-out is a special case of cross-validation. Its disadvantage is that it requires a lot of computation, but it gives the most accurate estimate of the classifier's error rate.
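
As a quick illustration, leave-one-out is easy to try with scikit-learn; the dataset and classifier below are placeholders I picked, not from the paper:

from sklearn.datasets import load_iris
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.tree import DecisionTreeClassifier

data = load_iris()
X, y = data.data, data.target
# leave-one-out: train on n-1 instances and test on the held-out one, n times
scores = cross_val_score(DecisionTreeClassifier(), X, y, cv=LeaveOneOut())
print("estimated accuracy:", scores.mean())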

Supervised Machine Learning techniques -

1) Logic-Based Algorithms - Symbolic Learning Methods

a) Decision Trees: These are trees that classify instances based on feature values. Decision trees are easy to comprehend.

b) Rule-based classifiers: Here the author talks about the difference between decision trees and rule-based classifiers. The main difference emphasized by the author is that a decision tree evaluates the average quality of a number of disjoint sets, whereas a rule classifier only evaluates the quality of the set of instances covered by the candidate rule.

Another difference is that decision trees use the divide-and-conquer approach while rule-based classifiers use the separate-and-conquer method. For smaller datasets it is usually advantageous to use divide and conquer.


c) Inductive Logic Programming: Some of the characteristics the author emphasizes for inductive logic programming are an expressive representation, the ability to make use of logically encoded background knowledge, and the extensive use of inductive logic programming in relational data mining.


2) Perceptron-Based Techniques:

A perceptron is described as follows: if X1 to Xn are the input features and W1 to Wn are the prediction weights, then the perceptron computes the sum of weighted inputs, Σi XiWi, and applies a threshold: if the sum is above the threshold, the output is 1, else 0.
The author describes the advantage of perceptron-based techniques when the number of features is high.
Some of the disadvantages are that they cannot deal well with numerical problems and that perceptron-like methods are binary.
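
To make the prediction rule concrete, here is a minimal sketch of a single perceptron's output; the weights and threshold are made-up values just for illustration:

import numpy as np

def perceptron_predict(x, w, threshold=0.0):
    # weighted sum of the inputs: the sum over i of Xi * Wi
    return 1 if np.dot(x, w) > threshold else 0

# example with two features and hand-picked weights
print(perceptron_predict(np.array([0.5, 0.8]), np.array([1.0, -0.4])))  # -> 1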

Neural Networks:

If a straight line can be drawn to separate the input instances into their correct categories, a perceptron can find the solution. If such a straight line cannot be drawn, the perceptron cannot reach a point where all instances are classified properly.
Artificial neural networks help solve this problem.

A neural network consists of 3 types of units: input units, output units, and, in between the input and output units, hidden units.

An ANN depends on 3 fundamental aspects: the input and activation functions of the units, the network architecture, and the weight of each input connection.

The most common method used to find the values of the weights is the back-propagation algorithm.
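
As a rough sketch of the idea (my own notation, not the paper's), back-propagation for a single sigmoid output unit with squared error reduces to a gradient-descent weight update; the inputs, target, and learning rate below are made up:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.2, 0.7, 0.1])  # inputs for one training example
t = 1.0                        # target output
w = np.zeros(3)                # current weights
lr = 0.5                       # learning rate (arbitrary choice)

y = sigmoid(np.dot(w, x))         # forward pass
grad = (y - t) * y * (1 - y) * x  # dE/dw for E = 0.5*(y - t)**2
w -= lr * grad                    # gradient-descent update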

The major disadvantage of ANNs is their lack of ability to reason about their output in a way that can be effectively communicated.



3) Statistical Learning Algorithms:
The author describes that, compared to ANNs, statistical approaches define a probability model for whether an instance belongs to a particular class. Bayesian networks and instance-based methods belong to this category.

Bayesian Network:

Bayesian Network is a probabilistic graphical model that represents a set of random variables and their conditional dependencies via a directed acyclic graph (DAG). For example, a Bayesian network could represent the probabilistic relationships between diseases and symptoms. Given symptoms, the network can be used to compute the probabilities of the presence of various diseases.

The author explains that the computational difficulty of exploring a previously unknown network is one of the limitations of Bayesian networks.

The author also emphasizes a feature of BNs compared to decision trees or neural networks: the possibility of taking into account prior information about a given problem.

It is also important to note that BNs are not suitable for datasets with many features.


Naive Bayes Classifier:

NB classifiers belong to the family of Bayesian networks, with strong independence assumptions between the features. They are composed of DAGs with a single parent and many children, with the assumption of independence among the child nodes.
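
To see what that assumption buys us, here is a toy sketch of the Naive Bayes decision rule: pick the class that maximizes P(class) times the product of P(feature | class). All the probabilities below are invented for illustration:

priors = {"spam": 0.4, "ham": 0.6}
likelihoods = {
    "spam": {"offer": 0.7, "meeting": 0.1},
    "ham":  {"offer": 0.2, "meeting": 0.8},
}

def classify(words):
    scores = {}
    for c in priors:
        score = priors[c]
        for word in words:
            # independence assumption: multiply the per-feature likelihoods
            score *= likelihoods[c].get(word, 1e-6)  # small floor for unseen words
        scores[c] = score
    return max(scores, key=scores.get)

print(classify(["offer"]))  # -> "spam", since 0.4*0.7 > 0.6*0.2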

The author also emphasizes that the assumption of independence among child nodes is always wrong, and that is the reason NB classifiers are less accurate than more sophisticated algorithms like ANNs.

Some of the methods used in research to overcome the independence assumption and improve classifier accuracy are the CR method (where the network has the limitation that each feature can be related to only one other feature) and the selective Bayesian classifier (which includes a feature selection stage, improving classification performance by removing irrelevant features).

The author emphasizes that the major advantage of the NB classifier is its short computational time for training.

Instance-based learning:

These are also called lazy-learning algorithms because they delay the generalization process until the classification phase. For this reason they require less computational time during the training phase and more during the classification phase.
One example the author talks about is k-nearest neighbors. Some of the characteristics of k-nearest neighbors are a large storage requirement, sensitivity to the choice of similarity function, and the lack of a well-defined method for choosing k, i.e. the number of nearest neighbors.
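
For reference, the k-nearest-neighbor decision itself fits in a few lines. This toy version of my own (Euclidean distance, majority vote) also shows why every training instance must be stored:

import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=3):
    # distance from x to every stored training instance
    dists = np.linalg.norm(X_train - x, axis=1)
    # majority vote among the k closest instances
    nearest = np.argsort(dists)[:k]
    return Counter(y_train[i] for i in nearest).most_common(1)[0][0]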


4) Support Vector Machines

SVMs revolve around the concept of the margin. The author describes that, if we consider the hyperplane and the margin separating two data classes, maximizing the margin helps reduce the upper bound on the expected generalization error.
SVMs are very useful when the number of features is large compared to the number of training instances. The author describes how Sequential Minimal Optimization (SMO) can solve the SVM QP problem relatively quickly.
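
In practice, a margin-maximizing SVM is a few lines with scikit-learn, whose SVC is backed by libsvm (an SMO-style solver); the tiny dataset here is just a placeholder:

import numpy as np
from sklearn.svm import SVC

X = np.array([[0, 0], [1, 1], [2, 2], [3, 3]])
y = [0, 0, 1, 1]
clf = SVC(kernel='linear')  # maximizes the margin between the two classes
clf.fit(X, y)
print(clf.predict([[2.5, 2.5]]))  # -> [1]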


5) Characteristics of Each Algorithm

The author describes the characteristics of each algorithm and gives a point of view on where and why to use each one. I have summarized the highlights below:

SVM & neural networks: multi-dimensional and continuous features.
Logic-based methods: discrete/categorical features.
SVM: requires a large sample size.

KNN: very sensitive to irrelevant features, needs a lot of storage space, requires a complete dataset with missing values eliminated.
Neural networks: the presence of irrelevant features can make training inefficient; they also require a complete dataset with missing values eliminated.

Naive Bayes: high bias, little storage, robust to missing values.
Decision trees, neural networks, SVM: high variance.



At the end, the author summarizes how these machine learning algorithms can be combined to get a better outcome.
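
One simple way to experiment with such combinations is a majority vote over several different classifiers. Here is a sketch with scikit-learn's VotingClassifier; the choice of base learners is arbitrary:

from sklearn.ensemble import VotingClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# hard voting: each classifier votes and the majority class wins
combined = VotingClassifier(estimators=[
    ("tree", DecisionTreeClassifier()),
    ("nb", GaussianNB()),
    ("svm", SVC()),
])
# usage: combined.fit(X_train, y_train); combined.predict(X_test)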


Thoroughly enjoyed reading and summarizing the article here in this blog!