In information theory, entropy is the average amount of information contained in each message received. Let's dig into an example directly:
the four possible outcomes (events) that could occur if you flip a coin twice (experiment) are:
Head then Head, Head then Tail, Tail then Tail, and Tail then Head.
Note that the four outcomes (sample space) are equally likely, so each probability should be equal to 1/4:
P(X1=Head & X2=Head) = 1/4,
P(X1=Head & X2=Tail) = 1/4,
P(X1=Tail & X2=Tail) = 1/4, and
P(X1=Tail & X2=Head) = 1/4.
The entropy for the previous example is given by:
H(X) = - Σ_i P(x_i) log2(P(x_i))
= - (1/4) log2(1/4) - (1/4) log2(1/4) - (1/4) log2(1/4) - (1/4) log2(1/4) = 2.
This means that you need 2 bits to store the actual outcome (event) each time you conduct that experiment.
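As a rough sanity check, here is a minimal Python sketch (the `entropy` helper is my own naming, not something from the article) that computes Shannon entropy for a list of probabilities and reproduces the 2-bit result:

```python
from math import log2

def entropy(probabilities):
    """Shannon entropy in bits: H(X) = sum of -p * log2(p), skipping zero-probability terms."""
    return sum(-p * log2(p) for p in probabilities if p > 0)

# Four equally likely outcomes of two fair coin flips: HH, HT, TT, TH
two_flips = [1/4, 1/4, 1/4, 1/4]
print(entropy(two_flips))  # 2.0 -> 2 bits needed on average to store the outcome
```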
---------------------------------------------------------------------------------------------
Let's take another simple example. If we ask whether the sun will rise tomorrow or not (experiment), we obtain two possible outcomes (sample space):
P(SunRisesTomorrow) = 1 and
P(SunWontRiseTomorrow) = 0.
So the entropy is: - (1) log2(1) - (0) log2(0) = 0, using the convention that 0 log2(0) = 0.
This means that we don't need any bits to store the outcome of that experiment, because we are 100% sure that tomorrow the sun will rise (P(SunRisesTomorrow) = 1).
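Under the same assumptions as before (the `entropy` helper is mine, repeated here so the snippet runs on its own), a quick check confirms the zero-entropy result; the `p > 0` filter encodes the 0 log2(0) = 0 convention:

```python
from math import log2

def entropy(probabilities):
    """Shannon entropy in bits; the p > 0 filter applies the convention 0 * log2(0) = 0."""
    return sum(-p * log2(p) for p in probabilities if p > 0)

# The sun rises tomorrow with probability 1, so the outcome is already certain.
sunrise = [1.0, 0.0]
print(entropy(sunrise))  # 0.0 -> zero bits needed
```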
---------------------------------------------------------------------------------------------
"The entropy of the unknown result of the next toss of the coin is maximized if the coin is fair (that is, if heads and tails both have equal probability 1/2). This is the situation of maximum uncertainty as it is most difficult to predict the outcome of the next toss; the result of each toss of the coin delivers one full bit of information.
However, if we know the coin is not fair, but comes up heads or tails with probabilities p and q, where p ≠ q, then there is less uncertainty. Every time it is tossed, one side is more likely to come up than the other. The reduced uncertainty is quantified in a lower entropy: on average each toss of the coin delivers less than one full bit of information." (Wikipedia)
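To illustrate the quoted point numerically, here is a small Python sketch (the `binary_entropy` helper is hypothetical, just for illustration) that evaluates H(p) = - p log2(p) - (1 - p) log2(1 - p) for a few biases; it peaks at exactly one bit for the fair coin and falls below one bit as the coin becomes more biased:

```python
from math import log2

def binary_entropy(p):
    """Entropy in bits of a coin that lands heads with probability p."""
    return sum(-q * log2(q) for q in (p, 1 - p) if q > 0)

for p in (0.5, 0.6, 0.9, 0.99):
    print(f"p = {p:.2f} -> H = {binary_entropy(p):.4f} bits")
# p = 0.50 -> H = 1.0000 bits  (fair coin: maximum uncertainty, one full bit per toss)
# p = 0.90 -> H = 0.4690 bits  (biased coin: less than one full bit per toss)
```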