Wednesday, December 23, 2015

C/C++ Real-time Task Dispatcher (micro OS for small program memory size)

This is my full implementation for the real-time embedded system task dispatcher of Ron Kreymborgi. It's a small but yet powerful task dispatcher aimed specifically at resource starved systems. It is suitable for many small embedded systems which have restrictions on program memory size.

The dispatcher is described in the following document. I have also integrated it into a PIC32 project and provided a test application for my implementation. It's provided in Microchip MPLAB-X. To run it, install the Microchip MPLAB-X development environment as well as XC32 compiler (make sure that you install PLIB).



Thursday, December 17, 2015

Collision Avoidance and Navigation with Evolutionary Neural Network


1- Project Description
This is a fully configurable MATLAB project that implements and provides simulation for vehicle self-learning of collision avoidance and navigation with a rangefinder sensor using an evolutionary artificial neural network. The neural network guides the vehicle around the environment and a genetic algorithm is used to pick and breed generations of more intelligent vehicles.
The vehicle uses a rangefinder sensor that calculates N intersections depths with the environment and then feeds these N values as inputs to the neural network. The inputs are then passed through a neural network and finally to an output layer of 2 neurons: a left and right steering force. These forces are used to turn the vehicle by deciding the vehicle steering angle.
Each vehicle represents a different chromosome in a generation (or a unique set weight for the neural net) which are evaluated and potentially carried through to the next generation by a fitness score. The fitness score has different definition in each of my three experiments for collision avoidance and navigation self-learning.

2- Software Configurations
  • The vehicles dimensions and its wheels (base and dimensions)
  • Rangefinder range and number of beams
  • The environment
  • Neural network architecture
  • Number of vehicles and their replacement strategy
  • The generic algorithm parameters: mutation probability, crossover probability, crossover site probability distribution, population size, selection strategy, replacement strategy …
3- Simple 2D vehicle steering physics
Given the vehicle speed and simulation time tick Δt the travelled distance L per a single time step is calculated. Given wheel base, vehicle position P, heading θ, and distance travelled per time step L, the new vehicle position Pnew and heading θnew are calculated as shown in figure 1. Video 1 shows a simulation result.

Figure 1. Simple 2D vehicle steering physics

Video 1. Simple 2D vehicle steering physics in action

4- Self-learning Navigation Experiment
Fitness function is chosen to be the distance that the vehicle traveled along the track before it collides with track boundaries. I was surprised by how fast vehicles learn navigation without any human interaction! In less than 50 generations with each generation having a population of 200 chromosomes, and with a neural network of only 3 hidden layers, perfect navigation is learnt! Mutation probability is 0.1, crossover probability is 1, cross over site follows the normal distribution: ~N(95%,5%), selection is based on tournaments of size 10 candidates, and all children replace their parents replacement strategy is adopted.

There is an interesting observation here. For the track map shown in figure 2, the vehicle took 12 generations to learn how to successfully turn in the first critical location A marked by red circle in the figure. Once the vehicle learns that, it achieves a huge fitness increase by implicitly learning how drive through all the following tricky turns in the track. This fact is demonstrated in figure 3 plot. This interesting because it is similar to the way humans learn things. The same effect happens for the vehicle to learn how to turn by 180° in the critical learning location B.

Figure 2. A track critical learning locations

As in figure 3, after 12 generations, the vehicle tries to learn how to turn by 180° in the critical learning location B, so it modifies its behavior but in a way that makes it fail to pass through the critical location A. This is why the fitness decreases again after it has increased, and that repeats until the vehicle learns to avoid such bad behavior by itself. However, the vehicle still fails to turn by 180°, and this is why the fitness function saturates. Actually, the road is too narrow for the vehicle to learn how to achieve that tricky 180° turn in a small number of learning generations. In another experiment, I modified the track to have a wider width (30 meters width instead of 12 meters, new map is also shown in figure 4). In only 16 iterations the vehicle learned to do that tricky 180° turn and navigate through the map almost forever without colliding! (Specifically, the car travelled the whole track more than 100 times until I stopped it manually.)

Figure 3. Fitness function per generation for figure 3 track set-up

Figure 4. Fitness function per generation for figure 3 track set-up with a wider track

Video 2. Navigation Self-learning

For the track map of figure 5, the time traveled by the vehicle before crash for each generation is shown for different rangefinder sensor number of beams. Moderate number of beams (5 beams performed best) is proven to be the better. Figure 6 shows the same information for different rangefinder sensor ranges. The higher the sensor range is proven to be the better.

Figure 5. Fitness per generation for different number of rangefinder sensor number of beams

Figure 6. Fitness per generation for different number of rangefinder sensor ranges

It’s important to mention that to prevent vehicles from rotating around themselves, a trick that is described later (section 6) in this report is used.

5- Can vehicle learn route to a specific destination?
With a simple modification to the fitness function, such that the fitness function becomes the subtraction of the vehicle drive time before collision and the Euclidean distance between the vehicle position and the destination location just before collision, the vehicle easily learns its route to the destination. In my recorded video for this experiment, it took the vehicle only 7 generations to learn its route to a far destination!

Video 3. Self-learning route to a specific destination

Figure 7. Vehicle learns to decide which turn to take to reach the destination correctly

6- Self-learning Collision Avoidance Experiment
Fitness for each vehicle is simply to survive. A vehicle dies and starts from a random location if it collides with track boundaries or with another vehicle. It’s important to penalize the vehicle responsible for the accident when a collision happens as shown in figure 8. I came to that simple role: when a collision happens, ask the question: “Would crash still happen if a vehicle x is the only vehicle that moved at collision time step?”. If the answer is yes, vehicle x is a reason for that accident, and should be penalized.

Figure 8. Collision penalization. Two examples with two vehicles before and after the accident time step

It is interesting to discover that vehicles started to learn bad habits to survive. Each vehicle learned to rotate around itself such that it avoids colliding with track boundaries and other vehicles! Figure 9 show such behavior. To cope with that, the fitness function is modified such that if a vehicle “gets smart” and starts to rotate around itself, it is penalized with a fitness of zero. That was a banality that is good enough for vehicles not to adopt such a bad habit. The standard deviation of vehicle position can easily detect such behavior.

Figure 9. Vehicles learn bad habit too!

Eventually, vehicles learned to avoid collision. Videos 4 and 5 show the experiment results for early and late generations respectively. The fascinating thing is that no human has told the vehicles how to drive and avoid collision! The video for late generation is recorded while cars are in generations 33, 20, 12, 12, 31, 21, 18, and 14 respectively. A different replacement strategy is adopted to achieve such good performance; the new population is composed of the best 90% children chromosomes in addition to 10% of the best chromosomes from all the vehicles.

Video 4. Collision avoidance experiment for early generations

Video 5. Collision avoidance experiment for late generations

Thursday, November 19, 2015

MLP Neural Network with Backpropagation [MATLAB Code]

This is an implementation for Multilayer Perceptron (MLP) Feed Forward Fully Connected Neural Network with a Sigmoid activation function. The training is done using the Backpropagation algorithm with options for Resilient Gradient Descent, Momentum Backpropagation, and Learning Rate Decrease. The training stops when the Mean Square Error (MSE) reaches zero or a predefined maximum number of epochs is reached.

Four example data for training and testing are included with the project. They are generated by SharkTime Sharky Neural Network.

1- Download Code
http://www.mathworks.com/matlabcentral/fileexchange/54076-mlp-neural-network-with-backpropagation

2- Network architecture & Training Parameters:
The code configuration parameters are as follows:

1- Numbers of hidden layers and neurons per hidden layer. It’s represented by the variable nbrOfNeuronsInEachHiddenLayer. To have a neural network with 3 hidden layers with number of neurons 4, 10, and 5 respectively; that variable is set to [4 10 5].
2- Number of output layer nits. Usually the number of output units is equal to the number of classes, but it still can be less (≤ log2(nbrOfClasses)). It’s represented by the variable nbrOfOutUnits. The number of input layer units is obtained from the training samples dimension.
3- The selection if the sigmoid activation function is unipolar or polar. It’s represented by the variable unipolarBipolarSelector.
4- The learning rate η.
5- The maximum number of epochs at which the training stops unless MSE reaches zero. It’s represented by the variable nbrOfEpochs_max.
6- Option to enable or disable Momentum Backpropagation. It’s represented by the variable enable_learningRate_momentum.
7- The Momentum Backpropagation rate α. It’s represented by the variable momentum_alpha.
8- Option to enable or disable Resilient Gradient Descent. It’s represented by the variable enable_resilient_gradient_descent.
9- The Resilient Gradient Descent parametersη+η-Δmin, Δmax, represented by the variables learningRate_plus, learningRate_negative, deltas_min, and deltas_max.
10- Option to enable or disable Learning Rate Decrease. It’s represented by the variable enable_decrease_learningRate.
11- The Learning Rate Decrease parameters: and . It’s represented by the variables learningRate_decreaseValue and min_learningRate.

The code also contains a parameter for drawing the decision boundary separating the classes and the MSE curve. The number of epochs after which a figure is drawn and saved on the machine is specified. The figures are saved in a folder named Results besides the m files. This parameter is represented by the variable draw_each_nbrOfEpochs. The variable dataFileName takes the Sharky input points file name as string.

3- Results:
In all of the following four test cases, MCCR=1. The stop condition is that MSE reaches zero in all the cases. A unipolar sigmoid function is chosen.

A) Linear Points Case:
Figure 1. Network: 2-4-2 , Unipolar Sigmoid Activation, No Options, η=0.15.

B) Circle Points Case:

Figure2. Network: 2-10-2 , Unipolar Sigmoid Activation, Resilient Gradient Descent, η+=1.2, η-=0.5, Δmin=10^-6, Δmax=50.

C) Wave Points Case:
Figure 3. Network: 2-10-10-2 , Unipolar Sigmoid Activation, Resilient Gradient Descent, η+=1.2, η-=0.5, Δmin=10^-6, Δmax=50.

D) Spiral Points Case:

Figure 4. Network: 2-10-10-2 , Unipolar Sigmoid Activation, Resilient Gradient Descent, η+=1.2, η-=0.5, Δmin=10^-6, Δmax=50.

The following image shows a magnification for the decision boundary with better resolution.
Figure 5. Magnified decision boundary with better resolution for Spiral points case. 

The following video shows the solving of the Two Spirals problem.

Video 1. Solving the Two Spirals Problem.

Wednesday, October 14, 2015

Solving Two Spirals Problem with Multilayer Perceptron [Video]

Video for Solving Two Spirals Problem with Multilayer Perceptron Neural Networks

I will post the code implementation for the network and the backpropagation training trick used to solve it soon. More details about the problem are in this paper.

Monday, June 8, 2015

C/C++ Goes-to Operator ??

Have you ever heard about the "Goes-to" operator (-->) in C and C++ ?
No? So, check out the following code:

#include <stdio.h>
int main()
{
    int x = 5;
    while (x --> 0) // X goes to 0
    {
        printf("%d ", x);
    }
}


The previous code compiles. Actually, there is no operator in C and C++ called "Goes-to". It's two separate operators; decrementing x and return its original value (-- operator), and then compare it if greater than zero (> operator).

To better understand, the while statement could be written as follows:
while ( (x--) > 0 ) .



Thursday, June 4, 2015

A New Variant of Douglas–Peucker Algorithm (2/4) [Original Algo. MATLAB Implementation]

Previous Posts  

Original Douglas-Peucker Algorithm

The following algorithm description is taken from Wikipdeia [1] and CodeProject article by [2]. The Ramer–Douglas–Peucker algorithm (RDP) is an algorithm for reducing the number of points in a curve that is approximated by a series of points. This algorithm is also known under the names Douglas–Peucker algorithm, iterative end-point fit algorithm and split-and-merge algorithm.

The Douglas-Peucker algorithm uses a point-to-edge distance tolerance. The algorithm starts with a crude simplification that is the single edge joining the first and last vertices of the original polyline. It then computes the distance of all intermediate vertices to that edge. The vertex that is furthest away from that edge, and that has a computed distance that is larger than a specified tolerance, will be marked as a key and added to the simplification. This process will recurse for each edge in the current simplification, until all vertices of the original polyline are within tolerance of the simplification results. This process is illustrated below:

Initially, the simplification consists of a single edge. During the first step, the fourth vertex is marked as a key and the simplification is adjusted accordingly. During the second step, the first edge of the current simplification is processed. The maximum vertex distance to that edge falls below the tolerance threshold, and no new key is added. During the third step, a key is found for the second edge of the current simplification. This edge is split at the key and the simplification is updated. This process continues until no more keys can be found. Note that at each step, only one edge of the current simplification is processed.
This algorithm has a worst case running time of O(nm), and O(n log m) on average, where m is the size of the simplified polyline. As such, this is an output dependent algorithm, and will be very fast when m is small. To make it even faster, the Distance between points routine is applied as a pre-processing step.

MATLAB Implementation

function out_points = points_simplify_2d(points, epsilon)
  id_end = size(points,1);
  id_start = 1;

  % Indeces of points are not removed by Douglas-Peucker algorithm
  remaining_ids = true(id_end,1);

  % Call douglas_peucker_iteration (recursive function)
  [points remaining_ids] = douglas_peucker_iteration( ...
     points,epsilon,id_start,id_end, remaining_ids);
  out_points = points(remaining_ids,:);
end

function [points remaining_ids]  = douglas_peucker_iteration(points, epsilon, id_start, id_end, remaining_ids)
  % For the points (start + 1) to (end) -> next_points_relative is
  % relative coordinates from start point
  next_points_relative = bsxfun(@minus, points(id_start+1:id_end,:), 
     points(id_start,:));
  end_point_relative = next_points_relative(end,:)';

  % Efficient Method to get distances_from_start_end_line
  beta = (end_point_relative' *
     next_points_relative')./(end_point_relative'*end_point_relative);
  b = next_points_relative-bsxfun(@times,beta,end_point_relative)';
  distances_from_start_end_line = hypot(b(:,1),b(:,2));

  % Identify maximum distance and get its index
  [dmax dmax_id] = max(distances_from_start_end_line);
  dmax_id  = id_start + dmax_id; %ID of the edge point

  if dmax <= epsilon;
    remaining_ids(id_start+1:id_end-1) = false; 
  else  
    [points remaining_ids] = douglas_peucker_iteration( ...
       points,epsilon,id_start,dmax_id, remaining_ids);
    [points remaining_ids] = douglas_peucker_iteration( ...
       points,epsilon,dmax_id,id_end, remaining_ids);
  end
end

______________________________________________________________________
References:
[1] http://en.wikipedia.org/wiki/Ramer-Douglas-Peucker_algorithm
[2] Polyline Simplification

Wednesday, June 3, 2015

A New Variant of Douglas–Peucker Algorithm (1/4) [Introduction]

The Ramer-Douglas-Peucker is a polyline simplification algorithm that uses a point-to-edge (point-to-line) distance tolerance “Epsilon”. The algorithm starts with a crude simplification that is the single edge (line) joining the first and last vertices (points) of the original polyline (polyline needed to be simplified). There is a variant of this algorithm already that allows it to produce a polyline with a fixed number of vertices “N”.

From software points of view, sometimes it’s preferred to have a fixed-size array to store a polyline vertices. Thus, the variant version of the Douglas-Peucker algorithm, that produces a fixed number of vertices “N”, is more convenient. However, from applications point of view, it doesn’t make sense to describe a simple polyline (rectangular area for example) with the same number of vertices that describe a complex-shaped one. This is why I’ve introduced my own version of the Douglas-Peucker algorithm, which I call a “Regularized” version.

My modified algorithm takes two pieces of data as inputs. The first is a predefined maximum number of vertices "N". The second is a tolerance "Epsilon". The algorithm returns n ≤ N vertices; it excludes the vertices that over-fit according to the predefined tolerance "Epsilon”. Those are the vertices that would have not been selected using the original Douglas-Peucker algorithm if it runs using the given "epsilon". A sample results of my variant of the Douglas–Peucker algorithm are shown in the coming table.

(A) N=50 , Epsilon=1.5 :

(B) N=50 , Epsilon=1.0 :

(C) N=100 , Epsilon=1.5 :

(D) N=10 , Epsilon=1.5 :


In the previous table, polyline (B) contains more vertices than polyline (A), because decreasing Epsilon allows more vertices to produce max edge distances higher than Epsilon, while these points still produce a polyline with number of vertices less than N. The resulted polyline in (C) is the same as (A), because increasing N, while having the same Epsilon will not produce new vertices that fits that Epsilon. Polyline in (D) contains fewer vertices than the polyline in (A), because the number of vertices in polyline (A) is greater than the N in (D).

In a coming posts, I will post the algorithms and MATLAB implementations for the three configurations:
- Original Douglas-Peucker algorithm.
- Variant of Douglas-Peucker algorithm (fixed number of simplified vertices "N").
- My Variant of Douglas-Peucker algorithm (with "N" and a tolerance "Epsilon").

Tuesday, April 7, 2015

Using OpenCV with Visual Studio

Step1. CMake:

CMake is used to make (build) your own libraries from the OpenCV source files and not to use the pre-built libraries. This allows to take advantage of the most advanced technologies OpenCV integrates into their library. You should build the ALL_BUILD project for both of the Debug and Release modes. Then, build the INSTALL project for both of Debug and Release modes. This step will prepare the folder: C:\openCV_build\visual_studio_10\install\ for you. In this tutorial, I assume that the chosen CMake build binaries directory is: C:\openCV_build\visual_studio_10\ .

-----------------------------------------------------------------------------------------------------------

Step2. Integration with Visual Studio 10 Project:

You can do this step globally once on your machine, or you can make it per project. In this tutorial, I choose to add the OpenCV libraries locally (per project), not to always clump all of the projects in my machine with OpenCV information.

At this point, you need to configure the Visual Studio project, so it can locate OpenCV headers and libraries. Go to the Project Properties (ALT+F7), and once the new window shows up do the following:
  1. On the Configuration box, select All Configurations.
  2. Open Configuration Properties > C/C++ > General, and edit the field Additional Include Directories to add these 3 paths (for the headers):C:\openCV_build\visual_studio_10\install\include\opencv
    C:\openCV_build\visual_studio_10\install\include\opencv2
    C:\openCV_build\visual_studio_10\install\include
  3. Note that include\opencv is for the C interface of OpenCV and include\opencv2 if for the C++ interface. We are also adding the folder include to prevent our build from being broken by some headers of the C interface that refer to C++ headers as opencv2\core.


  4. Then, add the path of the libraries on Configuration Properties > Linker > General, and on the Additional Library Directories field, add this: C:\openCV_build\visual_studio_10\install\x86\vc10\lib.

  5. Finally, according to the complexity and need of the desired application, add the libraries that you need from:
    opencv_calib3d300.lib
    opencv_core300.lib
    opencv_features2d300.lib
    opencv_flann300.lib
    opencv_highgui300.lib
    opencv_imgcodecs300.lib
    opencv_imgproc300.lib
    opencv_ml300.lib
    opencv_objdetect300.lib
    opencv_photo300.lib
    opencv_shape300.lib
    opencv_stitching300.lib
    opencv_superres300.lib
    opencv_ts300.lib
    opencv_video300.lib
    opencv_videoio300.lib
    opencv_videostab300.lib


    So go to Configuration Properties > Linker > Input and add them:

Now, press F7 to Build Solution and you should see:
========== Build: 1 succeeded, 0 failed, 0 up-to-date, 0 skipped ========== .
To be able to execute the application you'll need to modify the PATH environment variable of your system to add the location of OpenCV's DLLs. Add this to end of PATH:
;C:\openCV_build\visual_studio_10\install\x86\vc10\bin .

In the above discussion, you can just add a new environment variable called OPENCV_DIR and set the path: C:\openCV_build\visual_studio_10\install to it. And like that you will use $(OPENCV_DIR)\include, instead of: C:\openCV_build\visual_studio_10\install\include .

-----------------------------------------------------------------------------------------------------------

Step3. Debugging Images:


Just download NativeViewer, then open as text:
C:\Program Files (x86)\Microsoft Visual Studio 10.0\Common7\Packages\Debugger\autoexp.datto add the NativeViewer dll file path to it. In other words, under the [AutoExpand] section add the following line (modify the "E:\_PhD\Programs\NativeViewer_VS2010" with the link as in your machine):
cv::Mat=$ADDIN(E:\_PhD\Programs\NativeViewer_VS2010\NativeViewer_VS2010_v1.0.1\NativeViewer10.dll,CvMatViewer).You can also use Visual Studio 2012 or higher with Image Watch plug-in.

Like that, you will be able to see image during debugging as in the following image. Don't forget to hold down the Ctrl button before hovering over an image variable to see its image content.

Monday, March 2, 2015

Modified Condition/Decision Coverage (MC/DC - MCDC)

Consider the following C++ code, where x, y, and z are Integers :
if ( (A==1 || B==20) && C==30 ) {
    cout << "Decision is True!";
}
else {
    cout << "Decision is False!";
}.
We want to write a group of test cases T to test that the above Boolean expression: ( (A==1 || B==20) && C==30 ) is written correctly without inspecting the code with eyes. This task is very useful, especially when the Boolean expression is complicated. One example for a test case can be: If A, B, and C are at some moment are equal to 1, 20, and 30 respectively, the code within the first if condition should execute ("Decision is True!" to be printed) in case that the coder didn't commit a writing mistake within the Boolean expression. In simpler format :
A=1, B=20, C=30  ->  "Decision is True!".

The modified condition/decision coverage (MC/DC) is a code coverage criterion to achieve that testing. It allows generating test cases that can discover if :
  • there is any missing condition that should be present, 
  • there is any wrongly implemented "OR" or "AND" operation(s),
  • or there is any wrongly inverted condition.
Firstly, let's define two concepts: Condition and Decision. We mean with Condition the atomic (indivisible, it's also called leaf-level) conditions. (A==1 || B==20) is a condition, but it's not atomic, while (A==1) is an atomic condition. On the other hand, Decision is the whole big condition under testing. It's ( (A==1 || B==20) && C==30 ) in our example code.
------------------------------------------------------------------------
For Condition Coverage:
(A==1)
,
(B==20), and (C==30) are conditions. If the test cases in T guarantee that each condition is true in some test case and is false in another test case, we achieve condition coverage.
Example for a T that achieve condition coverage:
T = {
A=1, B=20, C=30 ->  "Decision is True!"  ,
A=0, B=0 , C=0  ->  "Decision is False!" }
------------------------------------------------------------------------
For Decision Coverage:
If the test cases guarantee that the Decision is true in some test case and is false in another test case, we achieve decision coverage. So, the above example T achieve decision coverage too.
------------------------------------------------------------------------
For Modified Condition/Decision Coverage:
For each atomic condition X, there should be two test cases T1 and T2 in T that achieve the following three conditions:
  • In T1 X is true, while it's false in T2.
  • In T1, the Decision is true, while it's false in T2.
    Or the opposite.
  • T1 and T2 are common for all the conditions other than X.
For example:
T = {
A=0, B=0C=30 ->  "Decision is False!" ,
A=0, B=20, C=30 ->  "Decision is True!"  ,
A=0, B=20, C=0  ->  "Decision is False!" ,
A=1, B=0 , C=30 ->  "Decision is True!}

In the previous example test cases, for the condition (A=0),
(A=0, B=0C=30 ->  "Decision is False!") is T1, and 
(A=1, B=0 , C=30 ->  "Decision is True!") , is T2. Note that, for any Boolean expression there will be at least n+1 test cases to achieve Modified Condition/Decision Coverage, where n is the number of conditions.

Thursday, February 5, 2015

Entropy .. Simply !

In information theory, entropy is the average amount of information contained in each message received. Let's dig into an example directly,
the four possible outcomes (events) that could occur if you flipped a coin twice (experiment) are:
Head then Head , Head then Tail , Tail then Tail , and Tail then Head .

Note that the four outcomes (sample space) are equally likely, so each probability should be equal to 1/4 :
P(X1=Head & X2=Head) = 1/4 ,
P(X1=Head & X2=Tail) = 1/4 ,
P(X1=Tail & X2=Tail) = 1/4 , and
P(X1=Tail & X2=Head) = 1/4 .

The entropy for the previous example is given by:
H(X)=ΣiP(x)log2[P(x)]=
 = - (1/4) log2(1/4) - (1/4) log2(1/4) - (1/4) log2(1/4) - (1/4) log2(1/4) = 2.

This means that you need 2 bits to store the actual status at any moment that comes out when conducting that experiment (event).
---------------------------------------------------------------------------------------------
Let's have another simple example. If we expect if the sun will rise tomorrow or not (experiment), we obtain two possible outcomes (sample space):
P(SunRisesTommorrow) = 1 and
P(SunWontRiseTommorrow) =0 .
  
So, the entropy is : = - (1) log2(1) - (0) log2(0)  = 0.
This means that we don't need any bits to store the actual status of that experiment, because we are 100% sure that tomorrow the sun will rise (P(SunRisesTommorrow) = 1).
---------------------------------------------------------------------------------------------
"The entropy of the unknown result of the next toss of the coin is maximized if the coin is fair (that is, if heads and tails both have equal probability 1/2). This is the situation of maximum uncertainty as it is most difficult to predict the outcome of the next toss; the result of each toss of the coin delivers one full bit of information.
However, if we know the coin is not fair, but comes up heads or tails with probabilities p and q, where pq, then there is less uncertainty. Every time it is tossed, one side is more likely to come up than the other. The reduced uncertainty is quantified in a lower entropy: on average each toss of the coin delivers less than one full bit of information." Wikipedia