Sunday, December 25, 2016

Verifying Convolution Theorem on 2D Images (MATLAB Code)

The objective of this post is to verify the convolution theorem on 2D images through a practical, experiment-based verification.

In mathematics, the convolution theorem states that under suitable conditions the Fourier transform of a convolution is the pointwise product of Fourier transforms. In other words, convolution in one domain (e.g., time domain) equals point-wise multiplication in the other domain (e.g., frequency domain).
Figure 1. Convolution Theorem

The Fourier transform (FT) computes the frequency domain representation of a spatial domain signal, while the inverse Fourier transform (IFT) does the opposite: given the frequency domain representation of a signal, it computes its spatial domain representation.

Figure 2. Fourier Transform and Inverse Fourier Transform

The convolution theorem is valid not only for 1D signals but also for 2D signals, which makes it very useful for many image processing and computer vision applications. Figure 3 shows two example images in the spatial domain (the domain we are used to seeing images in :) ) and their corresponding representations in the frequency domain.
Figure 3. Examples of 2 images in the spatial and frequency domains

In our verification experiment, we will apply it to three images with different levels of detail: high, medium, and low. The reason will become clear at the end of this post.
Firstly, let's create a black image W of the same size as the input image I, with a small white rectangle in the middle.
For the image I, we will first compute the result of multiplying its frequency domain representation by W:
  1. Apply Fourier transform on I, let the result be F
  2. Calculate the point-wise multiplication of F and W, let the result be F×W
  3. Get the inverse Fourier transform of F×W, let this result be R1
Now, let's do the reverse process: for the same image I, we will compute the convolution of I in the spatial domain with the inverse Fourier transform of W.
  1. Apply inverse Fourier transform on W, let the result be M
  2. Calculate the convolution of I and M, let the result be R2
As described earlier, the convolution theorem establishes that the two processes above, producing R1 and R2, are equivalent; so R1 and R2 should be the same image. If we observe that, we have verified the convolution theorem on 2D images.
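The two processes above can be sketched end to end. The following is a hedged Python/NumPy stand-in for the linked MATLAB experiment (the random image and the 30% rectangle size are illustrative assumptions, not the actual test images):

```python
import numpy as np

# Hedged sketch of the two processes above, using NumPy instead of the
# post's MATLAB code; image content and mask size are assumptions.
rng = np.random.default_rng(0)
I = rng.random((64, 64))                  # stand-in for the input image I

# W: black image with a centered white rectangle (30% of each dimension)
W = np.zeros_like(I)
h, w = I.shape
rh, rw = int(0.3 * h) // 2, int(0.3 * w) // 2
W[h//2 - rh:h//2 + rh, w//2 - rw:w//2 + rw] = 1.0

# Process 1: point-wise multiply the (centered) spectrum of I by W.
F = np.fft.fftshift(np.fft.fft2(I))
R1 = np.real(np.fft.ifft2(np.fft.ifftshift(F * W)))

# Process 2: convolve I with M, the inverse transform of W.
# Circular convolution via the DFT matches process 1 exactly.
M = np.fft.ifft2(np.fft.ifftshift(W))
R2 = np.real(np.fft.ifft2(np.fft.fft2(I) * np.fft.fft2(M)))

print(np.allclose(R1, R2))  # True: the two results are the same image
```

Note that the DFT product corresponds to circular convolution; with a spatial-domain `conv2` one would need matching boundary handling to get a pixel-exact match.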
In the first process, we point-wise multiply the frequency domain representation of the input image by a black image with a small white rectangle in the middle. This eliminates high frequencies and keeps low frequencies. So, in the spatial domain, the result should be a blurred version of the input image, because its high frequencies have been eliminated.
The smaller the white rectangle in the middle of W, the more frequencies we remove, and hence the more blurred the result. Figures 4 and 5 below show the R1 and R2 results for a high detail input image. (It's from the Microsoft PhD Summer School 2015 that I was invited to at the University of Cambridge.) As is clear from the figures, R1 and R2 are equivalent, which verifies the convolution theorem in 2D. In both results, the dimensions of the white rectangle of W are 30% of the image dimensions.

Figure 4. R1 for a high detail image; the dimensions of the white rectangle of W are 30% of the image dimensions

Figure 5. R2 for a high detail image; the dimensions of the white rectangle of W are 30% of the image dimensions

For the same high detail image, let's visualize R1 and R2 when the white rectangle of W is shrunk to 10% of the image dimensions instead of 30%. The result becomes more blurred, as expected, as shown in figures 6 and 7. Conversely, the larger the white rectangle of W, the more detail is preserved in R1 and R2, i.e., the more high frequency components of I survive.

Figure 6. R1 for a high detail image; the dimensions of the white rectangle of W are 10% of the image dimensions

Figure 7. R2 for a high detail image; the dimensions of the white rectangle of W are 10% of the image dimensions

Keeping the white rectangle of W at 30% of the image dimensions, let's now use an input image I of low detail instead of the high detail image used above. The result is shown in Figure 8.

Figure 8. R1 for a low detail image; the dimensions of the white rectangle of W are 30% of the image dimensions

As we can see, using the same W, the result in figure 8 looks less blurred than the result in figure 4, because in the figure 8 experiment the input image contains less detail to begin with. For the same white rectangle size, a high detail image loses more information than a low detail one (i.e., it looks more blurred). The high frequency components of a highly detailed image carry more information than those of a less detailed image; this is why the highly detailed image loses more information (and looks more blurred) when the same low pass filter is applied to both images.

Please note that the obtained spatial domain representation of W is the well-known low pass image filter that everyone uses. Also, my MATLAB code on the MathWorks File Exchange website conducts more experiments than I've presented here. Figure 9 shows a last example: the result for an input image with a medium level of detail.

Figure 9. R2 for a medium detail image, the white rectangle of W dimensions are 10% of image dimensions

Sunday, August 28, 2016

Ranked in the Top 5% in the Kaggle Distracted Driver Competition

The Kaggle State Farm Distracted Driver Detection competition has just ended, and I ranked within the top 5% (64th out of 1,450 participating teams; the winner got $65,000). My approach is mainly based on deep learning (I trained 20 very deep models) but still applies computer vision strategies to reduce neural network distraction. A brief description of the system is in the image below:
References:
[1] Rajen Bhatt, Abhinav Dhall, "Skin Segmentation Dataset", UCI Machine Learning Repository.
[2] X. Zhu, D. Ramanan, "Face Detection, Pose Estimation and Landmark Localization in the Wild", Computer Vision and Pattern Recognition (CVPR), Providence, Rhode Island, June 2012.
[3] Simonyan, Karen, and Andrew Zisserman, "Very Deep Convolutional Networks for Large-Scale Image Recognition", arXiv preprint arXiv:1409.1556 (2014).

  • The competition was very challenging. We neither performed costly annotations nor used the test data in any form of learning (even semi-supervised) or annotation.
    We treat images independently, while some participants learned from the test data and took advantage of the fact that the test images were originally sampled from recorded videos. They did some sort of test video reconstruction, so their image classification could make use of temporal context.
    Also, some participants annotated their training data, and some crowdsourced the annotation, which is either too much work or costs money.
    Our system is more general than such systems, even though they do better on the leaderboard; they are indirectly overfitting the competition test data.
  • The average loss metric used for the competition leaderboard ranking doesn't directly reflect system accuracy. I believe the classification accuracies of all top 100 systems are higher than 99%, but the loss metric reflects how confident you were in your classifications, which is harder. A single misclassification with high confidence yields a very bad average loss.
  • Face detection for such a problem is hard; the well-known Haar cascades surely fail. The example in the image above is easy, but normally the driver's face is seen from the side.
  • I tried many strategies for upper body human pose estimation (Calvin) and scene/driver segmentation (this and this) but didn't achieve good results.
  • I used data augmentation, because the training data set is not large.
  • The camera is not calibrated and changes orientation; the system should be intelligent enough to handle this.
  • I know many interesting details and results (like the following image, visualizing the areas that most affected the network's decision) are missing here, but I will be happy to answer any questions about the details.
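The point above about the average loss metric can be illustrated numerically. This is a hedged sketch (the probabilities and class counts are made up for illustration, not competition data):

```python
import numpy as np

# Illustrative sketch: how a single confident mistake dominates the
# average log loss used for leaderboard ranking.
def avg_log_loss(probs, labels):
    # probs: (n, k) predicted class probabilities; labels: true class indices
    eps = 1e-15
    p = np.clip(probs[np.arange(len(labels)), labels], eps, 1 - eps)
    return -np.mean(np.log(p))

probs = np.full((100, 2), [0.99, 0.01])  # 100 confident predictions for class 0
all_correct = np.zeros(100, dtype=int)   # every true label is class 0
one_wrong = all_correct.copy()
one_wrong[0] = 1                         # a single confident mistake

print(avg_log_loss(probs, all_correct))  # ~0.010
print(avg_log_loss(probs, one_wrong))    # ~0.056: one sample dominates the loss
```

So a system that is 99% accurate but occasionally confidently wrong can rank far below one with identical accuracy and better-calibrated confidence.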

Thursday, July 28, 2016

Regression for Classification

In this article, I describe how to use regression to tackle a classification problem. Regression and classification are fundamental topics in machine learning. To remind you: in regression, the output variable takes continuous values, while in classification, the output variable takes class labels. To demonstrate and prove the concept, I wrote configurable MATLAB code that you can download from the link below (no MATLAB toolboxes are used):

In the link above, I provide source code for least squares regression along with two data sets to run the code on. Each set consists of sample data points representing two classes. One of the sets represents a linearly separable classification problem, and the other a non-linearly separable problem. You can easily configure the code to train a model on either of the two sets, or on any custom data set you have created, by setting the variable dataFileName.

To use least squares regression to solve a classification problem, a simple trick is used. The data points of the first and second classes are extended by adding a new extra dimension. This produces an augmented cloud of points in (n+1)-dimensional space, where n is the dimensionality of the original data space. In that extra dimension, the data points belonging to the first and second classes take values of -1 and +1 respectively.
Then, the augmented data samples (with the extra dimension) are fitted using least squares regression. In my code, the function to be fitted is chosen to be a polynomial. The regression objective is to estimate the parameters of that polynomial such that it best fits the training data in a least-squares sense. You can easily change the order of the polynomial by setting the variable polynomial_order. If it's set to 1, then for the 2D data points used as an example with my code, the fitted polynomial represents a plane in 3D. If it's set to more than 1, it allows curvature and hence more complex data fitting.
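The fitting trick described above can be sketched as follows. This is a hedged Python/NumPy illustration with an order-1 polynomial (the two Gaussian point clouds are synthetic, not the data sets shipped with my MATLAB code):

```python
import numpy as np

# Hedged sketch of the +/-1 least-squares classification trick on
# synthetic, linearly separable data.
rng = np.random.default_rng(1)
X1 = rng.normal([-1.0, -1.0], 0.5, (50, 2))   # class 1, target -1
X2 = rng.normal([+1.0, +1.0], 0.5, (50, 2))   # class 2, target +1
X = np.vstack([X1, X2])
t = np.hstack([-np.ones(50), np.ones(50)])    # the extra dimension

# Order-1 polynomial design matrix [1, x, y]: the fit is a plane in 3D.
A = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(A, t, rcond=None)  # least-squares fit

# Classify by which side of the z = 0 midplane the fitted plane
# places each point.
pred = np.sign(A @ coef)
accuracy = np.mean(pred == t)
print(accuracy)
```

For higher polynomial orders one would augment `A` with monomials such as x², xy, y², which is what allows curved decision boundaries.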

To achieve classification, the decision boundary is simply the intersection of the fitted polynomial surface with the surface where the extra dimension is constant at the value midway between -1 and +1 (the targets we assigned to the two classes in the extra dimension; with different target values, the midpoint changes accordingly). Figure 1 shows the decision boundary classifying the linear data from different points of view, and figure 2 shows the same for the wave-like data, where misclassified samples are circled in red. In this 2D case, the decision boundary is the intersection of the fitted polynomial with the horizontal plane z = 0 (z being the extra dimension).
Figure 1. Least squares models to classify the linear data from different points of view

Figure 2. Least squares models to classify the wave-like data from different points of view; misclassified samples are circled in red.

For classification accuracy, I use the Minimum Correct Classification Rate (MCCR), defined as the minimum of CCR1 and CCR2, where CCRn is the number of correctly classified test points in class n divided by the total number of test points in class n. Using a polynomial of order 3, the linear data set is classified perfectly, while for the wave-like data the MCCR = 0.94. The reason we do not achieve a perfect MCCR = 1 for the wave-like data is that classification with least squares regression is sensitive to outliers: it tries to fit a function such that all training points give small squared errors. Figure 3 is taken from Chapter 4 of "Pattern Recognition and Machine Learning" by Bishop. Both images in the figure show, in purple, the classification decision boundary obtained from least squares regression as detailed above. The decision boundary is good until some outlier data points are added to the blue class, as in the image on the right. The resulting classifier penalizes these outliers even though they are 'too correct' data points. In green are the decision boundaries obtained by logistic regression, whose advantage is that it is robust to outliers and does not penalize the 'too correct' data points.
Figure 3. Classification with least squares regression is sensitive to outliers, whereas logistic regression does not penalize 'too correct' data points
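For completeness, the MCCR computation described above can be sketched as follows (a hedged Python illustration on synthetic labels, not output from the actual experiment):

```python
import numpy as np

# Minimal sketch of the MCCR metric: the minimum per-class correct
# classification rate over all classes.
def mccr(pred, true):
    # CCRn: fraction of class-n test points classified correctly
    ccrs = [np.mean(pred[true == c] == c) for c in np.unique(true)]
    return min(ccrs)

true = np.array([1] * 10 + [2] * 10)
pred = true.copy()
pred[0] = 2                  # one class-1 sample misclassified
print(mccr(pred, true))      # min(CCR1, CCR2) = min(0.9, 1.0) = 0.9
```

Unlike overall accuracy, MCCR cannot be inflated by doing well only on the larger or easier class.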

Thursday, June 30, 2016

Pretty Printing of a Tree Data Structure [AVL Tree Example]

To debug a tree data structure, one usually constructs a set of test cases and uses pen and paper to draw and trace the tree's content at different steps of its operations (insertion, deletion, ...). Everyone who has done this knows it is a tedious task, especially when the tree is big and the data structure operations are complex.

This is why I'm introducing a nice tree printing (visualization) member function in C++. The function is generic, and you can easily modify it to work with any other tree data structure. Its output is shown in the two examples below (enlarge the images to see them better; the first is for an AVL tree and the second for a 2-3-like tree):

The function name is "printTree". I've integrated it with an AVL tree implementation adapted from Prof. Florin Balasa's graduate class on advanced algorithms. There are three source code files: "AvlTree.h", "", and "". To compile with the GNU compiler, simply run: "g++".

File AvlTree.h :
#ifndef  AVLTREE_H
#define  AVLTREE_H

#include <cstdlib>
#include <iostream>
using namespace std;

#define KEY_SPACE 10 //The maximum spaces needed to draw a key value like: 6,8,10,20,... (Including all the node braces)

//   Template classes for a node and for an AVL tree

template <class Obj>  class AvlTree;
template <class Obj>
class AvlNode {
    // Data members: the key, two pointers, and the height
    Obj key;
    AvlNode *left, *right;
    int height;
    AvlNode(const Obj& ob = Obj(), AvlNode *l = NULL, AvlNode *r = NULL, int h = 0)
        : key(ob), left(l), right(r), height(h) { }
    friend class AvlTree<Obj>;
};

template <class Obj>
class AvlTree {
        AvlNode<Obj>*  root;
        void makeEmpty(AvlNode<Obj>*& t) const;
        void rotateLeft(AvlNode<Obj>* & k1);
        void rotateRight(AvlNode<Obj>* & k2);
        void rotateLeftRight(AvlNode<Obj>* & k3);
        void rotateRightLeft(AvlNode<Obj>* & k1);
        AvlNode<Obj>* copyTree(AvlNode<Obj>* t) const;  
        void insert(const Obj& x, AvlNode<Obj>* & t);
        int height(AvlNode<Obj>* t) const { return (t == NULL) ? -1 : t->height; }
        int max(int a, int b) const { return a >= b ? a : b; }
    public:
        AvlTree() { root = NULL; }
        AvlTree(const AvlTree& rhs); // The copy constructor
        ~AvlTree() { makeEmpty(root); }

      const AvlTree<Obj>& operator=(const  AvlTree& rhs);
      AvlTree<Obj>& insert(const  Obj& x) { insert(x, root);  return *this; }
      AvlNode<Obj>* find(const  Obj& x) const;
      AvlNode<Obj>* findMin() const;
      AvlNode<Obj>* findMax() const;
      AvlTree<Obj>& remove(const  Obj& x);
      void printTree() const;
};

#include ""
End of File AvlTree.h

File :
#include <stack>
#include <vector>  /* Needed in printTree */
#include <math.h>  /* pow and log10 functions needed in printTree */
#include <iomanip> /* Needed in printTree */
#include <sstream> /* Needed in printTree */
#include <fstream> /* Needed in printTree */
template <class Obj>
AvlTree<Obj>::AvlTree(const AvlTree& rhs) { // The copy constructor
      root = copyTree(rhs.root);
}

template <class Obj>
AvlNode<Obj>* AvlTree<Obj>::copyTree(AvlNode<Obj>* t) const {
       return ( t == NULL ) ? NULL : new AvlNode<Obj>( t->key, copyTree(t->left), copyTree(t->right), t->height );
}

template <class Obj>
void AvlTree<Obj>::makeEmpty(AvlNode<Obj>*& t) const { // Deletes bottom-up the whole tree
       if( t != NULL ) {
             makeEmpty(t->left);
             makeEmpty(t->right);
             delete t;
             t = NULL;
       }
}

template <class Obj> 
const AvlTree<Obj>& AvlTree<Obj>::operator=(const AvlTree& rhs) { // The assignment operator
      if( this != &rhs ) {
            makeEmpty(root); // Free the existing tree before copying
            root = copyTree(rhs.root);
      }
      return *this;
}

template <class Obj>
void AvlTree<Obj>::printTree() const { // Prints the tree level by level
    if( root != NULL ) {
        vector<AvlNode<Obj>*> crtLvlNodes(0); // Allocated on the stack (no delete needed)
        vector<AvlNode<Obj>*> nxtLvlNodes(0);
        for ( int lvl=height(root)+1; lvl>0; lvl-- ) {
            int count = 2;
            for( int i=2; i<=lvl; i++ )
                count += pow(2,i) + pow(2,i-2)*(KEY_SPACE-4)/2;