Thursday, January 1, 2015

Hidden Markov Model (HMM) .. Simply! (1/3)

In this series of tutorials, I will explain very important concepts from probability theory in a simple way. We will simplify the concept and the mathematics of Hidden Markov Models (HMMs), named after the Russian mathematician Andrey Markov. Almost all present-day large vocabulary continuous speech recognition (LVCSR) and text/handwriting recognition systems are based on HMMs; they lie at the heart of almost all modern successful speech and text recognition solutions.

In this tutorial, we will explain what a Markov Model is and which probability theory concepts are related to it. This topic is essential for understanding HMMs. For this purpose, let's try to work through this problem: we need to build a probability model that we can use to predict tomorrow's weather in Cairo.


    

For the weather state, let's assume we only have three possibilities:
  • State 1: Sunny
  • State 2: Cloudy
  • State 3: Rain or Snow (Rain/Snow)
By carefully examining the weather of Cairo over a long period of time, we found the following weather change pattern:


--------------------------------------------------------------------------------------------

What you see in the previous image is a Markov Model diagram that can be used to predict the weather in Cairo. In that model:
if today is Sunny, the probability of having a Sunny day tomorrow = 0.6,
the probability of having a Cloudy day tomorrow = 0.3,
and the probability of having a Rain/Snow day tomorrow = 0.1. It makes sense that these values sum to one, because they cover all three possibilities for tomorrow's weather given that today is Sunny. The sum of the outgoing edge weights from every state should be one.

Also, note that tomorrow's weather depends on today's and is totally independent of yesterday's. In other words, the next state depends only on the current state and not on the previous one(s). A stochastic process (a system whose state is non-deterministic: random) has the Markov property if the conditional probability distribution of future states of the process (conditional on both past and present values) depends only upon the present state, not on the sequence of events that preceded it. A process with this property, like ours, is called a Markov process.
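This Markov property is easy to see in code. Here is a minimal Python sketch of the weather model as a transition table; the Sunny row comes from the probabilities above, while the entries not given in the article (most of the Cloudy and Rain/Snow rows, which are shown only in the diagram) are hypothetical placeholders:

```python
# Transition probabilities for the Cairo weather model.
# The Sunny row comes from the article; entries not stated in the
# text are hypothetical placeholders for the values in the diagram.
A = {
    "Sunny":     {"Sunny": 0.6, "Cloudy": 0.3, "Rain/Snow": 0.1},
    "Cloudy":    {"Sunny": 0.3, "Cloudy": 0.2, "Rain/Snow": 0.5},  # partly hypothetical
    "Rain/Snow": {"Sunny": 0.4, "Cloudy": 0.3, "Rain/Snow": 0.3},  # partly hypothetical
}

# The outgoing edge weights of every state must sum to one.
for state, row in A.items():
    assert abs(sum(row.values()) - 1.0) < 1e-9, state

# The Markov property in code: tomorrow's distribution is looked up
# using only today's state -- yesterday is never consulted.
def tomorrow_distribution(today):
    return A[today]
```

Note that `tomorrow_distribution` takes a single argument: feeding it any history beyond today's state would add no information.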

Our weather problem looks simple; we can easily estimate the different weather probabilities for a few days. But since the purpose of these tutorials is to explain HMMs, which are more complex, we need to define properties like the ones in the previous two paragraphs in terms of mathematics, so that when things get complicated, we have some nice equations to resolve them. So now, let's make some definitions concerning the Markov Model:
  • Number of States :
    N, where N=3 in the previous weather example.
  • 1st order Markov assumption :
P( St = X | St-1 = Y , St-2 = Z , St-3 = W , ... ) = P( St = X | St-1 = Y ) ,
    In words,
    the probability of having a state X at some day (t)
    given that
    in the previous day (t-1) it was state Y,
    and that the day before it (t-2), it was state Z,
    and that the day before that (t-3), it was state W,
    and so on for the days before that (t-4, t-5, ...),
    is equal to

    the probability of having a state X at some day (t)
    given
    that in the previous day (t-1) it was state Y.
  • Stationary :
P( St = X | St-1 = Y ) = P( St+1 = X | St = Y ) ,
    In words, the probability of having a state X at some day (t), given that in the previous day it was state Y, is the same value regardless of which day we are at.
  • State Transition Matrix :
A = { axy } , where:
    axy = P( St = Y | St-1 = X ), 1 ≤ x,y ≤ N , with constraints axy ≥ 0 and Σy axy = 1 for every x,
  • Initial state probability:
πx = P( S1 = X )
    In words, πx is the probability that the very first day (as if there are no days before it) is state X.
  • Conditional Probability:
    P( St = X , Sv = Y ) = P( St = X | Sv = Y ) * P( Sv = Y )
    In words,
    the probability of having a state X at some day t
    and (means that both events should happen)
    having a state Y at some other day v,

    is equal to
    the probability of having a state X at some day t
    given that we (will) have a state Y at some other day v
    multiplied by
    the probability of having a state Y at the day v.
  • Chain Rule:
P( S1 = X , S2 = Y , S3 = Z , ... , ST-1 = W , ST = U ) =
    P( S1 = X ) * P( S2 = Y | S1 = X )
        * P( S3 = Z | S1 = X , S2 = Y ) * ... * P( ST = U | S1 = X , S2 = Y , S3 = Z , ... , ST-1 = W )
    And using the 1st order Markov assumption definition/property:
    = P( S1 = X ) * P( S2 = Y | S1 = X ) * P( S3 = Z | S2 = Y ) * ... * P( ST = U | ST-1 = W )
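The chain rule combined with the 1st order Markov assumption gives a simple recipe for the probability of any state sequence. Here is a sketch in Python; the Sunny row of the matrix comes from the article, while the entries not stated in the text are hypothetical placeholders for the values in the diagram:

```python
def sequence_probability(pi, A, seq):
    # Chain rule + 1st-order Markov assumption:
    # P(S1..ST) = pi[S1] * A[S1][S2] * A[S2][S3] * ... * A[S(T-1)][ST]
    p = pi[seq[0]]
    for prev, cur in zip(seq, seq[1:]):
        p *= A[prev][cur]
    return p

# States indexed 0=Sunny, 1=Cloudy, 2=Rain/Snow.
pi = [0.9, 0.08, 0.02]
A = [[0.6, 0.3, 0.1],   # Sunny row, from the article
     [0.3, 0.2, 0.5],   # partly hypothetical
     [0.4, 0.3, 0.3]]   # partly hypothetical

# P(day 1 sunny, day 2 sunny) = pi[0] * A[0][0] = 0.9 * 0.6
p = sequence_probability(pi, A, [0, 0])
```

The loop walks over consecutive pairs of states, so only one transition lookup per day is needed, exactly as the 1st order assumption promises.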
--------------------------------------------------------------------------------------------
Now, let's go back to our previous Cairo weather example, we will have a state transition matrix as follows :

Let's also say we have initial state probabilities as follows: π = {0.9, 0.08, 0.02}. Note that the sum of these values should be equal to one. We need to answer a question:
" What is the probability that the weather for the next five days will be: “sunny-rain-sunny-cloudy-cloudy” when today is sunny? ". We should remember that sunny is state 1, cloudy is state 2, and rain/snow is state 3. The solution to that question is as follows:

P(O | model) = P (S1 = 1 , S2 = 1, S3 = 3, S4 = 1, S5 = 2, S6 = 2 | model)
  =  P (S1 = 1) * P (S2 = 1 | S1 = 1) * P (S3 = 3 | S2 = 1) *
                          P (S4 = 1 | S3 = 3) * P (S5 = 2 | S4 = 1) * P (S6 = 2 | S5 = 2)
 = π1 * a11 * a13 * a31 * a12 * a22 = 0.9 * 0.6 * 0.1 * 0.4 * 0.3 * 0.2 = 0.001296,
which means: 0.1296 % !
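We can verify this arithmetic with a few lines of Python, using the transition probabilities named in the solution:

```python
# Today is sunny (pi_1 = 0.9), then the next five days are
# sunny-rain-sunny-cloudy-cloudy; multiply along the chain.
pi_1 = 0.9
a11, a13, a31, a12, a22 = 0.6, 0.1, 0.4, 0.3, 0.2

p = pi_1 * a11 * a13 * a31 * a12 * a22
print(p)  # ~0.001296, i.e. about 0.1296 %
```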

In the next article, we will explain Hidden Markov Models themselves.
I will be glad to answer your questions or hear your feedback in the comments.
