Thursday, June 30, 2016

Pretty Printing of Tree Data Structures [AVL Tree Example]

To debug a tree data structure, one usually constructs a set of test cases and uses a pen and a piece of paper to draw and trace the tree's contents at different steps of the operations (insertion, deletion, ...). Everyone who has done this knows well that it is a tedious task, especially when the tree is big and the operations are complex.

This is why I'm introducing a nice tree printing (visualization) member function for you in C++. The function is generic, and you can easily modify it to work with any other tree data structure. Its output is shown in the two examples below (enlarge the images to see them better; the first is for an AVL tree and the second for a 2-3-like tree):

The function name is "printTree". I've integrated it with an AVL tree implementation adapted from Prof. Florin Balasa's graduate class on advanced algorithms. There are three source code files: "AvlTree.h", "", and "". To compile with the GNU compiler, simply run: "g++".

File AvlTree.h :
#ifndef  AVLTREE_H
#define  AVLTREE_H

#include <cstdlib>
#include <iostream>
using namespace std;

#define KEY_SPACE 10 // The maximum number of characters needed to print a key value like 6, 8, 10, 20, ... (including the node braces)

//   Template classes for a node and for an AVL tree

template <class Obj>  class AvlTree;
template <class Obj>
class AvlNode {
    // Data members: the key, two child pointers, and the height
    Obj key;
    AvlNode *left, *right;
    int height;
    AvlNode(const Obj& ob = Obj(), AvlNode *l = NULL, AvlNode *r = NULL, int h = 0)
        : key(ob), left(l), right(r), height(h) { }
    friend class AvlTree<Obj>;
};

template <class Obj>
class AvlTree {
        AvlNode<Obj>* root;
        void makeEmpty(AvlNode<Obj>*& t) const;
        void rotateLeft(AvlNode<Obj>*& k1);
        void rotateRight(AvlNode<Obj>*& k2);
        void rotateLeftRight(AvlNode<Obj>*& k3);
        void rotateRightLeft(AvlNode<Obj>*& k1);
        AvlNode<Obj>* copyTree(AvlNode<Obj>* t) const;
        void insert(const Obj& x, AvlNode<Obj>*& t);
        int height(AvlNode<Obj>* t) const { return (t == NULL) ? -1 : t->height; }
        int max(int a, int b) const { return a >= b ? a : b; }
public:
        AvlTree() { root = NULL; }
        AvlTree(const AvlTree& rhs); // The copy constructor
        ~AvlTree() { makeEmpty(root); }

        const AvlTree<Obj>& operator=(const AvlTree& rhs);
        AvlTree<Obj>& insert(const Obj& x) { insert(x, root); return *this; }
        AvlNode<Obj>* find(const Obj& x) const;
        AvlNode<Obj>* findMin() const;
        AvlNode<Obj>* findMax() const;
        AvlTree<Obj>& remove(const Obj& x);
        void printTree() const;
};

#include ""
#endif // AVLTREE_H
End of File AvlTree.h

File :
#include <stack>
#include <vector>  /* Needed in printTree */
#include <math.h>  /* pow and log10 functions needed in printTree */
#include <iomanip> /* Needed in printTree */
#include <sstream> /* Needed in printTree */
#include <fstream> /* Needed in printTree */
template <class Obj>
AvlTree<Obj>::AvlTree(const AvlTree& rhs) { // The copy constructor
      root = copyTree(rhs.root);
}

template <class Obj>
AvlNode<Obj>* AvlTree<Obj>::copyTree(AvlNode<Obj>* t) const {
       return ( t == NULL ) ? NULL : new AvlNode<Obj>( t->key, copyTree(t->left), copyTree(t->right), t->height );
}

template <class Obj>
void AvlTree<Obj>::makeEmpty(AvlNode<Obj>*& t) const { // Deletes bottom-up the whole tree
       if( t != NULL ) {
             makeEmpty(t->left);       makeEmpty(t->right);
             delete t;      t = NULL;
       }
}

template <class Obj> 
const AvlTree<Obj>& AvlTree<Obj>::operator=(const AvlTree& rhs) { // The assignment operator
      if( this != &rhs ) {
            makeEmpty(root); // Release the old tree first to avoid a memory leak
            root = copyTree(rhs.root);
      }
      return *this;
}

template <class Obj>
void AvlTree<Obj>::printTree() const { // Prints the tree level by level, root first
    if( root == NULL ) return;
    vector<AvlNode<Obj>*> crtLvlNodes(1, root); // Nodes are owned by the tree (no delete is needed)
    vector<AvlNode<Obj>*> nxtLvlNodes;
    for ( int lvl=height(root)+1; lvl>0; lvl-- ) {
        int count = 2; // Leading indentation for this level
        for( int i=2; i<=lvl; i++ )
            count += pow(2,i) + pow(2,i-2)*(KEY_SPACE-4)/2;
        nxtLvlNodes.clear();
        for( size_t n=0; n<crtLvlNodes.size(); n++ ) {
            AvlNode<Obj>* t = crtLvlNodes[n];
            if( t != NULL ) cout << setw(count) << t->key;
            else            cout << setw(count) << ' ';
            nxtLvlNodes.push_back(t ? t->left  : NULL);
            nxtLvlNodes.push_back(t ? t->right : NULL);
        }
        cout << endl;
        crtLvlNodes = nxtLvlNodes;
    }
}

Thursday, June 9, 2016

Do More Features Always Mean Higher Accuracy? [Bayesian Classifier MATLAB Code]

In machine learning, is it always true that you will achieve higher classification accuracy if you use more features? In other words, do more features always mean higher accuracy?
This is the question I'm going to analyze and answer in this post. I hope you find it useful; please let me know in the comments if you have any questions, and I will be happy to answer. To answer it, we will use the Skin Segmentation Dataset and a naive Bayesian classifier. I implemented the Bayesian classifier myself; no libraries or toolboxes are used.

The code is here for you to download and play with.

The Skin Segmentation Dataset (Rajen Bhatt, Abhinav Dhall, UCI Machine Learning Repository) was collected by randomly sampling B, G, R values from face images of various age groups (young, middle, and old), race groups (white, black, and Asian), and genders, obtained from the FERET and PAL databases. The total learning sample size is 245057, of which 50859 are skin samples and 194198 are non-skin samples. The dataset has dimension 245057 × 4, where the first three columns are the B, G, R values (features x1, x2, and x3) and the fourth column holds the class labels (decision variable y).
The first 90% of the samples in each class are chosen to form the training data, while the remaining 10% of each class form the test data.


Figure 1 shows plots of the two classes' likelihoods, the fitted normal distributions (not used by the classifier; plotted for visualization only), and the posterior probabilities built from the training data. The plot is for the best single feature, which I found to be the third one (the pixel red color feature). The prior for the first class w1 (the skin class) is chosen to be 0.4 and for the second class w2 (the non-skin class) 0.6. These values reflect the fact that skin colors are really a small subset of all possible colors, and this is why they achieve a better Minimum Correct Classification Rate (MCCR).

Figure 1. Plots of the two classes’ likelihood and posterior probabilities for the best single feature (the red color feature)

Next, I try features in pairs to identify the best couple of features. Figure 2 shows plots of the likelihoods and posteriors for the best couple, which turns out to be the blue and red color features. You have the code, so you can try it yourself and you will reach the same result.

Figure 2. Plots of the two classes’ likelihood and posterior probabilities for the best couple of features (blue and red colors features)

The normal Bayes rule is applied to each test sample x: x is assigned to class w1 (skin) if p(x|w1)P(w1) > p(x|w2)P(w2), and to class w2 (non-skin) otherwise. The priors are chosen as P(w1) = 0.4 and P(w2) = 0.6 to reflect the fact that skin colors are really a small subset of all possible colors, which is why these values achieve a better Minimum Correct Classification Rate (MCCR). For a fair comparison across the three feature-space dimensionalities, the histogram bin width is kept fixed in all cases; a color feature value ranges from 0 to 255, and the bin width is chosen to be 10 (set empirically based on the database size).

For the best single feature, the best couple of features, and all three features combined, the resulting MCCRs are 0.8439, 0.9581, and 0.9128, respectively, as shown in Figure 3.

Figure 3. MCCR for different number of features

As stated earlier, the red feature proved to be the best single feature. My experiments also showed that combining any two of the three color features achieves a better MCCR than the red color feature alone! The gain is largest when the red and blue color features are combined (the best couple of features).
If the green feature is added to the best couple to form a three-feature set, the MCCR is still better than in the best-single-feature case, but it slightly decreases compared to the best couple. In general, higher accuracy comes not from using more features but from using the right features. Every feature has good aspects, where it represents some properties of the data well, and bad aspects, where its representation is noisy. If the good aspects of a new feature are already correlated with the existing features in the system (more than its bad aspects are), adding that feature contributes mostly noise to the overall representation and reduces the MCCR. This is what happened when we added the green color feature to the best couple of features.

Note: On the other hand, the more features you use (especially when the number of training samples is small), the more likely you are to over-fit the training data, and hence the lower your classification accuracy on test data. This is one reason why PCA is useful. We will discuss it in detail in a coming post.