This is an implementation of a Multilayer Perceptron (MLP) feed-forward, fully connected neural network with a sigmoid activation function. Training uses the Backpropagation algorithm, with options for Resilient Gradient Descent, Momentum Backpropagation, and Learning Rate Decrease. Training stops when the Mean Square Error (MSE) reaches zero or when a predefined maximum number of epochs is reached.
Four example data sets for training and testing are included with the project. They were generated with SharkTime Sharky Neural Network.
1- Download Code
2- Network architecture & Training Parameters:
The code configuration parameters are as follows:
1- Number of hidden layers and number of neurons per hidden layer. It’s represented by the variable nbrOfNeuronsInEachHiddenLayer. For a neural network with 3 hidden layers of 4, 10, and 5 neurons respectively, that variable is set to [4 10 5].
2- Number of output layer units. Usually the number of output units equals the number of classes, but it can be smaller if the classes are binary-encoded (at least ⌈log2(nbrOfClasses)⌉ units are then needed). It’s represented by the variable nbrOfOutUnits. The number of input layer units is obtained from the dimension of the training samples.
3- Whether the sigmoid activation function is unipolar or bipolar. It’s represented by the variable unipolarBipolarSelector.
4- The learning rate η.
5- The maximum number of epochs at which the training stops unless MSE reaches zero. It’s represented by the variable nbrOfEpochs_max.
6- Option to enable or disable Momentum Backpropagation. It’s represented by the variable enable_learningRate_momentum.
7- The Momentum Backpropagation rate α. It’s represented by the variable momentum_alpha.
8- Option to enable or disable Resilient Gradient Descent. It’s represented by the variable enable_resilient_gradient_descent.
9- The Resilient Gradient Descent parameters η+, η-, Δmin, and Δmax, represented by the variables learningRate_plus, learningRate_negative, deltas_min, and deltas_max.
The code also contains a parameter for drawing the decision boundary that separates the classes, together with the MSE curve. It specifies the number of epochs after which a figure is drawn and saved to disk, and is represented by the variable draw_each_nbrOfEpochs. The figures are saved in a folder named Results next to the .m files. The variable dataFileName takes the file name of the Sharky input points as a string.
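As a rough illustration, the parameters above could be set as in the sketch below. The variable names come from the list, except learningRate, which is an assumed name for η. The selector value, the epoch limit, the momentum rate, the drawing interval, and the file name are placeholders chosen for the example, not values taken from the project.

```matlab
% Network architecture
nbrOfNeuronsInEachHiddenLayer = [4 10 5];  % three hidden layers: 4, 10 and 5 neurons
nbrOfOutUnits = 2;                         % one output unit per class
unipolarBipolarSelector = 0;               % assumption: 0 selects the unipolar sigmoid

% Training
learningRate = 0.15;                       % eta (assumed variable name)
nbrOfEpochs_max = 5000;                    % placeholder epoch limit

% Momentum Backpropagation
enable_learningRate_momentum = 0;          % disabled in this sketch
momentum_alpha = 0.05;                     % placeholder value for alpha

% Resilient Gradient Descent (values as quoted in the figure captions)
enable_resilient_gradient_descent = 1;
learningRate_plus = 1.2;                   % eta+
learningRate_negative = 0.5;               % eta-
deltas_min = 1e-6;                         % Delta min
deltas_max = 50;                           % Delta max

% Visualization and data
draw_each_nbrOfEpochs = 100;               % placeholder: draw a figure every 100 epochs
dataFileName = 'example.points';           % hypothetical file name
```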
3- Results:
In all of the following four test cases, MCCR=1. The stop condition is that MSE reaches zero in all the cases. A unipolar sigmoid function is chosen.
A) Linear Points Case:
Figure 1. Network: 2-4-2, Unipolar Sigmoid Activation, No Options, η=0.15.
Figure 2. Network: 2-10-2, Unipolar Sigmoid Activation, Resilient Gradient Descent, η+=1.2, η-=0.5, Δmin=10^-6, Δmax=50.
Figure 3. Network: 2-10-10-2, Unipolar Sigmoid Activation, Resilient Gradient Descent, η+=1.2, η-=0.5, Δmin=10^-6, Δmax=50.
Figure 4. Network: 2-10-10-2, Unipolar Sigmoid Activation, Resilient Gradient Descent, η+=1.2, η-=0.5, Δmin=10^-6, Δmax=50.

Figure 5. Magnified decision boundary with better resolution for Spiral points case.
Video 1. Solving the Two Spirals Problem.
There is a bug in the original code. The bias-zeroing line should be guarded with "if i ~= length(Weights)-1", because the last weight matrix has no redundant bias column.
for i = 1:length(Weights)-1
    Weights{i} = 2*rand(nbrOfNodesPerLayer(i), nbrOfNodesPerLayer(i+1))-1; %RowIndex: From Node Number, ColumnIndex: To Node Number
    if i ~= length(Weights)-1
        Weights{i}(:,1) = 0; %Bias nodes weights with previous layer (Redundant step)
    end
    Delta_Weights{i} = zeros(nbrOfNodesPerLayer(i), nbrOfNodesPerLayer(i+1));
    ResilientDeltas{i} = deltas_start*ones(nbrOfNodesPerLayer(i), nbrOfNodesPerLayer(i+1));
end
Hello 王健,
Many Thanks. Yes, you are right. This is a mistake.
It will always force the weights connected to the first neuron in the output layer to be zero, instead of starting from random values.
As I say in the code comments, this line of code is only for convenience ("Redundant") and can be removed. However, as long as it's there, it should be protected with an if condition as you did.
I have corrected the code on MathWorks File Exchange.
Thanks again,
Dear Hesham,
How are you? I hope you are fine.
I'm AbdelMonaem AbdAllah, doing my PhD at UNSW @ Canberra, Australia. I'm still new to NN, so I'm wondering whether your NN can learn a function, for example the squares of a number of variables, or whether it works on binary values only?
Please, could you contact me on my e-mail,
Dear AbdelMonaem,
I'm fine, thank you. I wish you best of success with your PhD.
Simply, a neural network is a black box that understands/models the relation between some patterns (feature vectors) and their corresponding labels (classes). The understanding phase is called "training". A trained neural network is used later on to estimate the class (label) of a test pattern, this is called the "testing" or "deployment" phase.
I understand you are asking about the input vector dimensionality. My implementation is generic; it detects and works with the given input dimension whatever its length. The input vectors should be stored row by row in the "Samples" variable, and their corresponding class labels in "TargetClasses". A feature can take any number. If you need more than two output classes, you need to uncomment line 59 and implement the "To-Do" I mention in line 68. Please let me know if you have any difficulties doing it.
Best Regards, Hesham
Thank you Hesham for your quick reply. I'm doing PhD in optimization.
Actually, I'm wondering if you have any document that would help me in understanding the parameters that you used in your code, and how they would affect the performance of your NN.
Also, after training, how can I test which class a new record belongs to?
Hello AbdelMonaem,
I'm afraid that I don't have a specific reference in mind to recommend for you. Nevertheless, the parameters of the network, resilient Gradient Descent, decreasing learning rate, momentum, and epochs are widely discussed online. Have a look, and let me know if you have any specific questions.
I wish you best of luck.
Hi, can you help me? I'm doing my final project on classifying TB by color features using a multilayer perceptron, and I'm getting confused about extracting the input and output values from an image.
Hi Siti,
I wish you all the best with your final project.
I didn't get what you mean.
The inputs to the MLP are your features and the output is the classification result. You should have one output neuron per class, and decide the predicted class by picking the neuron with the highest activation value.
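Picking the neuron with the highest activation can be sketched as follows. The activation values are made up for the example, and the names Outputs, winner, and PredictedClasses are illustrative rather than taken from the project; the zero-based labels follow the convention used for the class labels elsewhere in this thread.

```matlab
% Each row holds the output-layer activations for one sample (made-up values).
Outputs = [0.91 0.05 0.12;
           0.10 0.20 0.85;
           0.30 0.75 0.40];

% Column index of the strongest output neuron in every row.
[~, winner] = max(Outputs, [], 2);

% Convert the 1-based neuron index to a zero-based class label.
PredictedClasses = winner - 1;
disp(PredictedClasses');
```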
Dear Hesham,
Lovely Stuff mate. Love what you have done!
I'm trying to create my own code and was looking at how others have attempted it to get a general idea of how to start coding.
Just a quick question and it may seem a silly one....
but what is the difference between activation_func and activation_func_drev?
As in what does the drev stand for. I understand the function itself is different but what is the reasoning behind that?
Thank you very much
Dear Michael,
Thank you so much for the motivation :)
Very soon, I will post more interesting similar articles about different areas of machine learning.
"Drev" stands for "Derivative", that function returns the value of the derivative (gradient) of the activation function at a query point.
Best Wishes, Hesham
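For readers wondering what such a pair of functions might look like, here is a minimal sketch for the unipolar case only. The anonymous functions below are stand-ins written for this example and are not the contents of the project's activation files. The derivative is written in terms of the sigmoid output and checked against a finite-difference estimate.

```matlab
% Unipolar (logistic) sigmoid and its derivative expressed through the output z.
sigmoid = @(s) 1 ./ (1 + exp(-s));
sigmoid_drev_from_output = @(z) z .* (1 - z);

s = linspace(-4, 4, 9);      % query points
z = sigmoid(s);              % activations at those points
analytic = sigmoid_drev_from_output(z);

% Central finite difference as an independent check of the derivative.
h = 1e-6;
numeric = (sigmoid(s + h) - sigmoid(s - h)) / (2*h);

maxAbsError = max(abs(analytic - numeric));
fprintf('Largest difference between the two estimates: %g\n', maxAbsError);
```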
Hey , Great work.
I am a newbie in ANN and its coding. But from what I have read, this seems to be a perfectly generic code for "ANY" MLP feedforward network with the backpropagation training algorithm. Right?
If so: I have generated a 1570*7 Excel table for my project with MATLAB code. Now I want to create a network that takes 5 columns (the first five natural frequencies of the structure) as input and the remaining 2 columns (size and location of the defect) as target.
My question is: you used a different input data format, and I have an Excel table. How can I read that format instead of the .points data file?
Hello Narendra,
Yes, it is a generic implementation for MLP feedforward network.
Concerning parsing the Excel data in MATLAB, there are many tutorials online for that.
In line 52, the variable 'Samples' should contain the training data, a row for each data sample. In your case it should be a matrix of 1570 rows X 5 columns.
Because you don't have classes here (i.e. you don't have a single output neuron that should fire for each different class), you should comment out the 'for' loop in line 70. After it, fill the matrix 'TargetOutput' such that each row is the desired output for the corresponding data sample. In your case it should be a matrix of 1570 rows X 2 columns.
Also, for the same reason, your training stop criterion is different. So comment out lines 184:195 and line 198, and add other code that estimates how good the neural network is (calculate 'MSE(Epoch)' using the 'outputs' results for each sample). 'MSE(Epoch)' should be the number of mistakenly classified samples from the neural network divided by the total number of data samples.
If any of this is unclear, I'm happy to take your data and upgrade my code accordingly for you.
Best wishes, Hesham
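A minimal sketch of that data preparation is shown below, under several assumptions. The Excel file name is hypothetical and the import line is commented out, so a random matrix stands in for the table. The min-max scaling of the targets and the plain mean squared error at the end are common choices for continuous targets that were added for this example; they are not steps taken from the project, and the variable Outputs is only a placeholder for what the network would produce.

```matlab
% data = readmatrix('defects.xlsx');  % hypothetical file name; readmatrix needs R2019a or later
data = rand(1570, 7);                 % stand-in for the imported 1570 x 7 table

Samples = data(:, 1:5);               % one row per sample: five natural frequencies
TargetOutputs = data(:, 6:7);         % one row per sample: defect size and location

% Min-max scale every target column to [0, 1] so that a unipolar sigmoid
% output can reach it (implicit expansion needs R2016b or later).
tMin = min(TargetOutputs, [], 1);
tMax = max(TargetOutputs, [], 1);
TargetOutputs = (TargetOutputs - tMin) ./ (tMax - tMin);

% Placeholder network outputs, only to show the error computation.
Outputs = rand(size(TargetOutputs));

% Mean squared error over all samples and both output units.
regressionMSE = mean((Outputs(:) - TargetOutputs(:)).^2);
fprintf('MSE on the placeholder outputs: %g\n', regressionMSE);
```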
Hello 70R50,
1- In terms of code: after getting 'outputs', you need to modify the if conditions that follow it like this:
if isequal(outputs,[1 0 0 0 0 0 0 0 0 0])
    ActualClasses(Sample) = 0;
elseif isequal(outputs,[0 1 0 0 0 0 0 0 0 0])
    ActualClasses(Sample) = 1;
% (branches for the other classes follow the same pattern)
elseif isequal(outputs,[0 0 0 0 0 0 0 0 1 0])
    ActualClasses(Sample) = 8;
end
2- No, you should modify the function in Activation_func.m to be the 'logsig' activation, and modify the function in Activation_func_drev.m to be the derivative of the 'logsig' activation, which you should figure out for your assignment :-)
Best wishes, Hesham
@70R50: I forgot to mention that the MSE calculation should change accordingly, so that it counts the misclassified samples:
MSE(Epoch) = sum(ActualClasses ~= TargetClasses)/size(Samples,1);
Hello Sir,
I am doing my final assignment on yield forecasting using ANN. First of all, I am a newbie to NN; my background is agriculture. I would like to ask whether my code is correct.
%% load divided input data set
load annflo.mat
%% define training inputs
trainInp = [trainflo];
%% define targets
T = [targetflo];
net = newff(trainInp,T,[],{},'trainlm');
net = init(net);
net.trainParam.epochs = 1000;
net.trainParam.goal = 0.01;
net.divideParam.trainRatio = 1;
net.divideParam.valRatio = 0;
net.divideParam.testRatio = 0;
net.trainParam.max_fail = 2;
net = train(net,trainInp,T,[],[]);
Y = sim(net,trainInp);
I was using Levenberg-Marquardt as the training algorithm, but then I realized that It was supposed to be Back-propagation as the training algorithm and I had no idea how to change the code. Would you like to help me, Sir?
Thank you :)
I am new to NN. Can the following code be used for time series prediction?
What modifications do I have to make?
Hi Hesham,
I'm trying to adapt your code for 4 inputs and 2 output neurons(no classes). I read your comment given to Narendra. I'm struggling with the evaluation part of the code. Can you please help me out.
Thank you :)
Really Nice work Hesham. Good luck with all your work.
Best Regards
@freakme whatsoever:
You are using MATLAB Neural Network Toolbox. Sorry, I don't have it. But there is a nice example for you here:
Good luck, and I'm here for any questions if you use my code.
@mahum pervez:
Yes NN can solve the series prediction problem. I've uploaded an image for you that summarizes the idea:
Good Luck.
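The image is not reproduced here. One common way to pose series prediction for an MLP, which may or may not match the image, is a sliding window: each sample holds the last few values of the series and the target is the value that follows. The sketch below builds such samples from a toy series; windowLength and the other names are assumptions made for this example.

```matlab
series = sin(0.1 * (1:200))';   % toy series as a column vector
windowLength = 4;               % assumed number of past values used as features

nbrOfSamples = numel(series) - windowLength;
Samples = zeros(nbrOfSamples, windowLength);
TargetOutputs = zeros(nbrOfSamples, 1);

for k = 1:nbrOfSamples
    Samples(k, :) = series(k : k + windowLength - 1)';  % the past values
    TargetOutputs(k) = series(k + windowLength);        % the value to predict
end

fprintf('%d samples with %d features each\n', size(Samples, 1), size(Samples, 2));
```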
@Chathura: Hi, thanks for your nice words. Sorry for the late reply.
You can simply replace the evaluation code from line 184 to line 195 with these lines of code:
[~, winner] = max(outputs);
ActualClasses(Sample) = winner - 1;
Good Luck :)
Dear Hesham,
First of all, I want to thank you for this awesome and clean MLP implementation.
I have a question about my case, which doesn't work very well. I have 400 samples with 20 features, divided into 3 classes. I followed almost all of your advice from the other questions and comments for changing the code, but in my case the code runs for 99 epochs and then these errors appear. Could you please help me?
Error using *
Inner matrix dimensions must agree.
Error in EvaluateNetwork (line 7)
NodesActivations{Layer} = NodesActivations{Layer-1}*Weights{Layer-1};
Error in MLP_NN (line 228)
outputs = EvaluateNetwork([1 x y], NodesActivations, Weights, unipolarBipolarSelector);
Thank you Milad for your nice words, glad it's useful.
You are right. This happens because my visualization code only supports problems with 2 features, but in your case there are 20.
You have two options to proceed:
- Comment the visualization code section, or set the variable draw_each_nbrOfEpochs to a large number like 10^9, so that visualization won't occur.
- Try to do PCA on your features to reduce them to 2 features, this will make training much faster and visualization feasible. You may even end up with better classification performance (depending on the variance in your features).
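The second option can be sketched with base MATLAB only, using the singular value decomposition instead of a toolbox function. The random matrix is a stand-in for the real 400 x 20 feature matrix, and the names Centered, Samples2D, and explained are illustrative.

```matlab
Samples = randn(400, 20);                 % stand-in for 400 samples with 20 features

% Center every feature, then take the principal directions from the SVD.
mu = mean(Samples, 1);
Centered = Samples - mu;                  % implicit expansion needs R2016b or later
[~, S, V] = svd(Centered, 'econ');

% Project onto the first two principal components.
Samples2D = Centered * V(:, 1:2);

% Share of the total variance carried by each component.
singularValues = diag(S);
explained = singularValues.^2 / sum(singularValues.^2);
fprintf('Variance kept by 2 components: %.1f%%\n', 100 * sum(explained(1:2)));
```

If the two components keep only a small share of the variance, the reduced features may not separate the classes well, so it is worth checking that number before training on Samples2D.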
Hi Hesham,
I have a classification problem which is to classify the music genres for more than 50,000 songs. Here, I have 10 genres. I've been studying your code for several days and I also read your comment about the multiply outputs problem. But I still cannot figure out how to construct the outputs. The actual labels for the 10 genres are [1,2,...,10]. And I construct a matrix for all the samples with their corresponding labels:
[1 0 0 0 0 0 0 0 0 0;
1 0 0 0 0 0 0 0 0 0;
0 1 0 0 0 0 0 0 0 0;
0 0 0 0 0 0 0 0 0 1];
Here [1 0 0 0 0 0 0 0 0 0] means the label for the sample point is 1 and [0 0 0 0 0 0 0 0 0 1] means the label for this sample point is 10, etc.
The following is your code:
%% Read Data
importedData = importdata(dataFileName, '\t', 6);
Samples = importedData.data(:, 1:length(importedData.data(1,:))-1);
TargetClasses = importedData.data(:, length(importedData.data(1,:)));
TargetClasses = TargetClasses - min(TargetClasses);
ActualClasses = -1*ones(size(TargetClasses));
Problem 1: Do I still needs the above code to construct the TargetClasses and the ActualClasses?
Problem 2: What are the TargetClasses and the ActualClasses in my case?
Thank you!
Hi Eve,
Problem 1: No, you won't need that code. Replace it with other code that fills the variables 'Samples', 'TargetClasses', and 'ActualClasses'.
Please note that the variable 'Samples' should contain your training data, one row for each data sample.
Problem 2: 'TargetClasses' is the labels matrix you have constructed as you describe above.
'ActualClasses' at that moment is a variable of the same size as 'TargetClasses'. It is just for allocation, so you can put zeros or any numbers for now. Later on, in the "Evaluation" code section, it should be filled with the neural network output for all the samples. You should modify that section as I described in a previous comment.
If you still have problems, I will be happy to upgrade my code if you send me a sample of your data by email.
Good luck.
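A labels matrix like the one in the question can be built from a vector of genre labels as in the sketch below. The names labels, OneHotLabels, and decoded are chosen for this example and do not come from the project; the four labels reproduce the rows shown in the question.

```matlab
labels = [1; 1; 2; 10];          % genre label of each sample, values 1..10
nbrOfClasses = 10;
nbrOfSamples = numel(labels);

% One row per sample, with a single 1 in the column of its label.
OneHotLabels = zeros(nbrOfSamples, nbrOfClasses);
linearIdx = sub2ind(size(OneHotLabels), (1:nbrOfSamples)', labels);
OneHotLabels(linearIdx) = 1;

% Going back from a one-hot row to the label is a column-wise argmax.
[~, decoded] = max(OneHotLabels, [], 2);
disp(OneHotLabels);
```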
Dear Hesham Eraqi,
can I use your code as classifier ?
As I understand the data there are:
Samples - training data,
TargetClasses - labels for the training data (classes).
Where (in which variable) can I store the testing data?
And how do I read the final (predicted) labels of such data?
Hi! Congratulations for your code. I'm using it to approximate a function. It's more or less like this: I have five inputs and one output that is the price of a commodity. And I want the Neural Network to learn these prices and give me prices on new five inputs that it hasn't been taught. So I think it is only one output. Ok ? If I just reduce the number of output in your code to one, will it work to approximate the function ?
And another doubt, why do you subtract from the Target Classes its minimum ?
Thank you very much!
Just to let you know what changes I have made to your code in order to get the NN to calculate a single output (the value of a function). I started by changing the number of outputs to one and importing my data. My data have six input variables and one output. As my target output is a price, I normalized it to lie between 0 and 1 with min-max normalization, so my TargetClasses are just my normalized prices. Besides, the TargetOutputs, as I guess, must now be the same as the TargetClasses, because there are no classes at all. So it must be the value of the output itself, which is what I want my NN to evaluate given the inputs. As for the evaluation, I commented out everything that has to do with ActualClasses and inserted ActualClasses(Sample) = outputs. After that I commented out the part of the visualization that is not related to the error. It doesn't work with these changes, because the evaluation produces the same value for my whole training set. What have I forgotten to do?
Hi there. I need MATLAB code for training with the backpropagation algorithm for wind forecasting. Please reply to this mail.
Dear Hesham Eraqi
I'm doing final-year research at university on "landslide susceptibility analysis" using the ANN backpropagation method. The problem is that I have 10 input data layers (images, in numerical form) for each landslide, and I have data for 8 landslides. Now I want to create a network for finding landslide susceptibility.
Can you provide ideas with code? (I'm trying to do this with MATLAB.)
Hello Hesham Eraqi,
Good work. However, I found that the code is only for training. How do you test your data? And if you get the MSE close to 0, how do you ensure that your model is not overfitting?
Christian Sanchez commented on your file MLP Neural Network with Backpropagation :
Hi ,
I am trying to understand backpropagation, and your code is being really helpful, thanks.
I have a question: your input to the derivative of the sigmoid is "NodesActivations", which has previously gone through the sigmoid function. I do not understand this. Shouldn't the input to the derivative of the sigmoid be the input to the activation function? Obviously not, because your code works, but could you please help me understand this?
Input to activation function: s
Output from activation function: z = f(s)
Derivative: dz/ds = d/ds f(s) = f(s)(1-f(s))
while in the code it looks like f(z)(1-f(z)) is calculated, because you do:
Line 111-->NodesActivations{Layer} = Activation_func(NodesActivations{Layer}, unipolarBipolarSelector );
Line 121-->gradient = Activation_func_drev(NodesActivations{Layer+1}, unipolarBipolarSelector);
@Christian Sanchez:
You are absolutely correct; to get the derivative of a function you need the input to it (not the output) to substitute with. But if that function is a Sigmoid (which is our case) you can use a beautiful feature that allows calculating the derivative using the output. I've provided the proof in this post:
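The linked post is not reproduced here. For reference, the standard derivation for the unipolar sigmoid runs as follows, writing s for the input and z for the output as in the question above. The bipolar line assumes the form f(s) = 2/(1+e^{-s}) - 1, which may differ from the definition used in the code.

```latex
% Unipolar sigmoid: the derivative depends on s only through the output z.
z = f(s) = \frac{1}{1 + e^{-s}}, \qquad 1 - z = \frac{e^{-s}}{1 + e^{-s}}

\frac{dz}{ds} = \frac{e^{-s}}{\left(1 + e^{-s}\right)^{2}}
             = \frac{1}{1 + e^{-s}} \cdot \frac{e^{-s}}{1 + e^{-s}}
             = z\,(1 - z)

% Bipolar sigmoid (assumed form): the same idea gives
z = \frac{2}{1 + e^{-s}} - 1
\quad\Longrightarrow\quad
\frac{dz}{ds} = \frac{2\,e^{-s}}{\left(1 + e^{-s}\right)^{2}} = \frac{1 - z^{2}}{2}
```

So a derivative function that receives the stored activation z only has to evaluate z(1-z); it does not apply the sigmoid a second time.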
How to determine the validation set?
this only for training ?
"How to determine the validation set?
this only for training ?"
In our example, we used the training data for validation and for testing as well. But in a real-data problem, the validation data could be selected using techniques like cross-validation, and the test data are kept separate.
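A k-fold split of the sample indices can be sketched in base MATLAB as below. The sample count, the number of folds, and all variable names are assumptions made for this example; the training and evaluation calls are left as a comment because they depend on how the network code is wrapped.

```matlab
nbrOfSamples = 100;   % assumed number of samples
k = 5;                % assumed number of folds

rng(0);               % fixed seed so the split is repeatable
shuffled = randperm(nbrOfSamples);

% Assign every sample to exactly one fold, in shuffled order.
foldOfSample = zeros(1, nbrOfSamples);
foldOfSample(shuffled) = mod(0:nbrOfSamples-1, k) + 1;

for fold = 1:k
    validationIdx = find(foldOfSample == fold);
    trainingIdx = find(foldOfSample ~= fold);
    % Train on Samples(trainingIdx, :) and evaluate on Samples(validationIdx, :) here.
    fprintf('Fold %d: %d training, %d validation samples\n', ...
        fold, numel(trainingIdx), numel(validationIdx));
end
```

Averaging the validation error over the folds gives an estimate that does not reuse the data the network was trained on.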
