AI – Neural networks and machine learning
Intro to AI, neural networks and machine learning
Artificial intelligence (AI) is a set of technologies that enable computers to perform a variety of advanced functions, including the ability to see, understand and translate spoken and written language, analyze data, make recommendations, and more.
Machine learning is a subset of artificial intelligence (AI) that allows machines to learn and improve from data without being explicitly programmed.
A neural network is a computing system that uses a network of interconnected nodes to process data in a way that mimics the human brain.
Neural networks are a subset of machine learning, and machine learning is a subset of artificial intelligence.
Neural networks will be the focus of this blog post series. Think of it as the equivalent of a hello world tutorial for a programming language: a beginner's guide to neural networks in machine learning and artificial intelligence.
In order to understand how a neural network works, it's critical to understand the math behind it, specifically linear algebra and, from calculus, derivatives and partial derivatives. Some programming skills are also required, preferably in Python.
The hello world neural network we will be building is a model that can recognize handwritten digits. Training the model requires a large set of labeled handwritten digits, and we will be using the famous MNIST data set, which contains a large number of handwritten digit images along with the digit each image represents. We will feed the handwritten digit data as the input to the neural network, and the neural network will produce a digit as the output. If the network produces the digit that is actually in the image, that is a success, and we update the model to let it know it did a good job; otherwise, we update the model to let it know it didn't do a good job this time. For example, if there are 50,000 handwritten digits available for training, the model can be trained 50,000 times. This works very much like how a human being learns to recognize digits using the neural networks in the brain, hence the name neural networks in machine learning and artificial intelligence. This is also called supervised learning, because we tell the model what is right and what is wrong; as the model gets more and more training, it gets smarter and produces more accurate outputs.
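To make that training loop concrete, here is a minimal Python sketch of the flow just described. DummyModel and its predict and update methods are hypothetical placeholders, not the real network; later posts in this series build the actual model.

import random

class DummyModel:
    def predict(self, image):
        # Placeholder guess; a real network computes this from the pixels.
        return random.randint(0, 9)

    def update(self, image, label, prediction):
        # A real network adjusts its weights and biases here.
        pass

model = DummyModel()
images = [[0.0] * 784 for _ in range(3)]  # stand-ins for 28 by 28 pixel arrays
labels = [5, 0, 4]                        # the digit each image actually shows

for image, label in zip(images, labels):  # one update per labeled example
    prediction = model.predict(image)
    model.update(image, label, prediction)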
What is a data set?
A data set is the data used to train a neural network. In the case of the neural network for recognizing handwritten digits, it's a collection of images, each representing a digit. In order to use an image as input to a neural network, the image has to be converted into pixel values. An image of a digit from the MNIST data set is 28 by 28 pixels, and 28 × 28 = 784, so a total of 784 pixels. Each pixel is represented by a value between 0.0 and 1.0: 0.0 represents white, 1.0 represents black, and in-between values represent the darkness of the pixel; the higher the value, the darker the pixel.
Here is what a sample image of a 5 from the MNIST data set looks like. Ignore the border; it is only added here to show that the image is a square of 28 by 28 pixels.
It is represented by an array of 784 float values, each between 0.0 and 1.0. A completely white pixel is represented by 0.0, and pixels with some greyscale are represented by values greater than 0.0 and less than or equal to 1.0, such as 0.0703125, 0.66796875, 0.85546875, 0.98828125, 0.98828125, 0.98828125, 0.98828125, 0.76171875, 0.3125. Each image is represented by an array such as this one; if there are 50,000 images, there will be 50,000 arrays like this one.
[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.01171875, 0.0703125, 0.0703125, 0.0703125, 0.4921875, 0.53125, 0.68359375, 0.1015625, 0.6484375, 0.99609375, 0.96484375, 0.49609375, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.1171875, 0.140625, 0.3671875, 0.6015625, 0.6640625, 0.98828125, 0.98828125, 0.98828125, 0.98828125, 0.98828125, 0.87890625, 0.671875, 0.98828125, 0.9453125, 0.76171875, 0.25, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.19140625, 0.9296875, 0.98828125, 0.98828125, 0.98828125, 0.98828125, 0.98828125, 0.98828125, 0.98828125, 0.98828125, 0.98046875, 0.36328125, 0.3203125, 0.3203125, 0.21875, 0.15234375, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0703125, 0.85546875, 0.98828125, 0.98828125, 0.98828125, 0.98828125, 0.98828125, 0.7734375, 0.7109375, 0.96484375, 0.94140625, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.3125, 0.609375, 0.41796875, 0.98828125, 0.98828125, 0.80078125, 0.04296875, 0.0, 0.16796875, 0.6015625, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0546875, 0.00390625, 0.6015625, 0.98828125, 0.3515625, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.54296875, 0.98828125, 0.7421875, 0.0078125, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.04296875, 0.7421875, 0.98828125, 0.2734375, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.13671875, 0.94140625, 0.87890625, 0.625, 0.421875, 0.00390625, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.31640625, 0.9375, 0.98828125, 0.98828125, 0.46484375, 0.09765625, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.17578125, 0.7265625, 0.98828125, 0.98828125, 0.5859375, 0.10546875, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0625, 0.36328125, 0.984375, 0.98828125, 0.73046875, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.97265625, 0.98828125, 0.97265625, 0.25, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.1796875, 0.5078125, 0.71484375, 0.98828125, 0.98828125, 0.80859375, 0.0078125, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.15234375, 0.578125, 0.89453125, 0.98828125, 0.98828125, 0.98828125, 0.9765625, 0.7109375, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.09375, 0.4453125, 0.86328125, 0.98828125, 0.98828125, 0.98828125, 0.98828125, 0.78515625, 0.3046875, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.08984375, 0.2578125, 0.83203125, 0.98828125, 0.98828125, 0.98828125, 0.98828125, 0.7734375, 0.31640625, 0.0078125, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0703125, 0.66796875, 0.85546875, 0.98828125, 0.98828125, 0.98828125, 0.98828125, 0.76171875, 0.3125, 0.03515625, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.21484375, 0.671875, 0.8828125, 0.98828125, 0.98828125, 0.98828125, 0.98828125, 0.953125, 0.51953125, 0.04296875, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.53125, 0.98828125, 0.98828125, 0.98828125, 0.828125, 0.52734375, 0.515625, 0.0625, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
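As a side note, the float values above are consistent with taking raw greyscale intensities in the 0-255 range and dividing by 256: for example, 18/256 = 0.0703125 and 253/256 = 0.98828125. Here is a minimal sketch of that normalization, assuming the raw pixels come in as 0-255 integers:

def normalize(raw_pixels):
    # Convert raw 0-255 greyscale intensities to floats in [0.0, 1.0).
    return [p / 256.0 for p in raw_pixels]

print(normalize([0, 18, 253]))  # [0.0, 0.0703125, 0.98828125]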
Here is another way to understand how an array of these values represents an image of a digit. Convert the values to 0s or asterisks (*), 0 for all 0.0 values and * for all values greater than 0.0, then print them out in a 28 by 28 grid. It will look like this; one can roughly tell it is a 5 just by looking at it.
0000000000000000000000000000
0000000000000000000000000000
0000000000000000000000000000
0000000000000000000000000000
0000000000000000000000000000
000000000000************0000
00000000****************0000
0000000****************00000
0000000***********0000000000
00000000*******0**0000000000
000000000*****00000000000000
00000000000****0000000000000
00000000000****0000000000000
000000000000******0000000000
0000000000000******000000000
00000000000000******00000000
000000000000000*****00000000
00000000000000000****0000000
00000000000000*******0000000
000000000000********00000000
0000000000*********000000000
00000000**********0000000000
000000**********000000000000
0000**********00000000000000
0000********0000000000000000
0000000000000000000000000000
0000000000000000000000000000
0000000000000000000000000000
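Here is a minimal Python sketch that produces a grid like the one above. The pixels argument is assumed to be a flat list of 784 floats like the sample array shown earlier.

def print_digit(pixels):
    # pixels: a flat list of 784 floats, one per pixel of a 28 by 28 image
    for row in range(28):
        line = ""
        for col in range(28):
            line += "*" if pixels[row * 28 + col] > 0.0 else "0"
        print(line)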
What is a neural network?
Here is what a neural network looks like. Usually there is an input layer, one or more hidden layers and an output layer. For the handwritten digit recognition neural network, the input layer is an array of floats representing an image, in this case 784 floats between 0.0 and 1.0. The hidden layer is where the input values get processed and transformed; the number of neurons there is usually smaller than in the input layer. Some models have more than one hidden layer, and some have only one; in this case, there is only one hidden layer. The output layer for the digit recognition neural network has 10 neurons representing the digits 0-9. To train the network, if there are 50,000 training images, there will be 50,000 arrays of floats along with 50,000 corresponding digit labels to feed into the neural network. After the network is trained on them, feed it an image it has never seen before, and the network is expected to recognize the digit correctly.
What is a neuron?
A neuron is a node in a neural network; each neuron/node holds a value. The total number of neurons in a neural network varies depending on the problem it is meant to solve. Usually the number of neurons in the input layer is greater than in the hidden layer, and the number of neurons in the hidden layer is greater than in the output layer. In the handwritten digit recognition neural network above, there are 784 neurons in the input layer because the input is the 784 pixel values of an image. The number of neurons in the input layer strictly depends on the number of inputs.
The hidden layer only has 15 neurons. However, it does not necessarily have to be 15; it could be more or fewer. We can do some analysis of the problem as well as some testing to figure out the best number of neurons for a hidden layer. Each node in the hidden layer holds a value that is calculated via a math formula by plugging in the input values from the neurons in the input layer, along with weights and biases.
The output layer has 10 neurons representing the digits 0 to 9. It could also be just 4 neurons, because the digits 0 to 9 can be represented in binary using 4 bits. For example, 9 in binary is 1001, 8 is 1000, 7 is 0111, etc. However, using the decimal digits 0-9 makes more sense because it is easier for us humans to understand.
How does a neuron get its value?
For neurons in the input layer, the values are simply the input values. For example, an input neuron in the handwritten digit recognition neural network is simply a value between 0.0 and 1.0 that represents the greyscale of one pixel in an image. A 28 by 28 pixel digit image has 784 pixel values, hence 784 neurons as the inputs.
A neuron in a hidden layer gets its value by calculating a weighted sum of the input values from the connected neurons in the previous layer, adding a bias, and then passing the result through an activation function, which transforms the sum into the final output value of the neuron. For example, to get the value of the first neuron of the hidden layer in the handwritten digit recognition neural network with 784 input neurons, we calculate the weighted sum of the 784 input values, add a bias and pass the result through an activation function. Since there are 784 input values, there are 784 weights, because each input has its own weight. A weighted value is obtained by multiplying an input value by its weight; doing this for all 784 inputs gives 784 weighted values, and the weighted sum is obtained by summing them up. A bias is then added to the weighted sum, and the result is passed through an activation function to get the final value of the neuron. Each neuron in the hidden layer goes through this entire calculation to get its value. The input values stay the same, but the weights and bias are different for every neuron in the hidden layer. Graphically, in the handwritten digit recognition neural network above, you can see each neuron in the hidden layer is connected by 784 lines because there are 784 input values. If there is more than one hidden layer, each neuron in the next hidden layer goes through the same calculation process; the only difference is that the inputs are now the neurons from the previous hidden layer instead of the input layer.
For the output layer, each neuron gets its value essentially the same way as the neurons in the hidden layer. The only difference is that the input neurons now come from the hidden layer right before the output layer.
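In code, the calculation for a single neuron might look like the following minimal Python sketch. It assumes the sigmoid activation function that this network uses, which is introduced in a later section below.

import math

def sigmoid(z):
    # Squashes any real number into the range (0, 1).
    return 1.0 / (1.0 + math.exp(-z))

def neuron_value(inputs, weights, bias):
    # Weighted sum of all inputs, plus a bias, through the activation function.
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    return sigmoid(weighted_sum + bias)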
Weights and biases
In a neural network, weights and biases are initially assigned random values and then learned during training through a process called backpropagation, where the network adjusts these values based on the difference between its predictions and the actual target values. Technically, it takes the derivative of a cost function, which represents the “slope” of the cost function at a given point, indicating how much the cost changes with respect to a small change in the network’s parameters (weights and biases). The network fine-tunes the weights and biases to minimize the cost during training using gradient descent, allowing it to make better predictions on new data.
The number of weights needed to compute a single neuron value in a hidden layer is the total number of neurons in the input layer. Hence the total number of weights needed to compute all of the neuron values in a hidden layer is the number of neurons in the input layer times the number of neurons in the hidden layer. A hidden layer can also act as an input layer when there are multiple hidden layers, or when it is the hidden layer right before the output layer. In the handwritten digit recognition neural network for the MNIST image data, the number of weights needed to compute the value of one neuron in the hidden layer is 784, because there are 784 inputs. The number of weights needed to compute all 15 neuron values in the hidden layer is therefore 784 times 15, which is 11,760. To go from the hidden layer to the output layer, we can think of the hidden layer as the new input layer and the output layer as its hidden layer, or still view it as the output layer if you wish; the idea is really just to convert a set of numbers into a new set of numbers. Since there are 15 neurons in the hidden layer, 15 weights are needed to compute the value of one neuron in the output layer, and since there are 10 neurons in the output layer, 15 times 10 = 150 weights are needed to compute all of the output neuron values. In total, 11,760 plus 150 gives 11,910 weights to process a single 28 by 28 pixel image with this handwritten digit recognition neural network.
Biases are only needed for the hidden and output layers; the input layer (or a hidden layer acting as the input layer for the next hidden or output layer) does not need biases. The number of biases needed for a hidden or output layer is the number of neurons in that layer. In the handwritten digit recognition neural network above, there are 15 neurons in the hidden layer, so it needs 15 biases to compute all of the hidden neuron values, and there are 10 neurons in the output layer, so it needs 10 biases to compute all of the output neuron values. The total number of biases needed for this network is therefore 15 plus 10, which is 25 biases to process a single 28 by 28 pixel image.
As for the actual values of the weights and biases, they are all randomly generated at first. Then, each time a training image is fed into the network, the weights and biases are updated to improve the accuracy of the network. If we feed 50,000 labeled images into the network to train it, these weights and biases get adjusted 50,000 times.
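As a sketch of what that random initialization could look like for the 784-15-10 network above (using NumPy, with the common convention of one row of weights per neuron in the next layer; the exact layout is an assumption):

import numpy as np

rng = np.random.default_rng()

hidden_weights = rng.standard_normal((15, 784))  # 11,760 weights
hidden_biases = rng.standard_normal((15, 1))     # 15 biases
output_weights = rng.standard_normal((10, 15))   # 150 weights
output_biases = rng.standard_normal((10, 1))     # 10 biases

print(hidden_weights.size + output_weights.size)  # 11910
print(hidden_biases.size + output_biases.size)    # 25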
Here are sample random weights for the 784 input neurons:
|-1.2651|-0.8090|-1.6988|1.0857 |1.7013 |0.6900 |-0.6129|-1.9394|-1.0222|-0.8834|1.1505 |0.9092 |-1.9330|0.0095 |1.5569 |0.1401 |-1.1669|-0.1204|0.3014 |0.6593 |0.1555 |-0.4565|-0.9909|-1.5216|0.3594 |0.9056 |-0.2062|-1.5522 |-0.9483|1.3362 |-0.6070|1.8282 |-0.1789|0.6141 |0.5472 |-0.9354|0.8640 |0.0924 |-0.7577|1.6009 |-0.8171|0.2852 |-0.5260|0.9520 |0.5780 |-0.2995|0.5894 |0.1866 |1.6496 |0.2473 |-0.8720|-0.8557|1.5978 |-0.1836|0.4430 |0.5171 |-0.3337|-1.0105|-1.8163|-0.8011|-0.6917|0.0632 |1.1730 |-1.3502|1.0767 |-0.1825|-0.6915|0.7578 |-3.0192|-0.2360|0.0290 |0.3663 |-1.3019|-2.5041|-2.0399|0.4270 |0.2351 |-0.6625|2.2626 |0.5103 |-0.8128|-0.2559|-0.0367|-0.7988 |-0.2473|-0.5732|0.0354 |-0.9617|-0.8300|0.7985 |-0.1095|3.2137 |0.5705 |1.1844 |1.0324 |0.1807 |-0.6404|-0.2727|0.4857 |-0.2006|0.9660 |1.0861 |0.6007 |-1.2798|-0.9728|0.1788 |0.5255 |-1.1118|-1.1845|0.5173 |-1.3254|0.3985 |-0.2091|0.7610 |0.1427 |-0.4636|0.2184 |-0.0644|0.3691 |1.7039 |-0.1189|0.5549 |-2.3208|-1.4346|1.9114 |-1.1832|-1.3017|-1.6506|-0.1663|-1.1634|-0.5145|-0.1915|-0.9990|-0.1367|-1.3794|1.5742 |-0.2395|-0.0425|1.4574 |0.8374 |1.9605 |-1.0456|-1.5838|-0.9174|-1.6890|-1.9890|-0.1524|-0.4575|0.8071 |-0.1180|0.1266 |2.1392 |-0.1684|-0.4655|-0.5491|-0.6427|0.4363 |0.2576 |-0.4743|0.0304 |0.3868 |0.3844 |-0.2387|-1.0501|-0.5985|0.7943 |0.9768 |0.1820 |0.1079 |1.1747 |-1.6871|0.9148 |-0.3037|-0.0975|0.7095 |0.4520 |-1.6048|-1.0688|-0.4722|1.1812 |2.5588 |-0.8409|-0.1794|-0.0455|0.6720 |1.2040 |2.1276 |-0.1733|0.0216 |0.9822 |0.2301 |-0.2372|0.1765 |-1.5534|0.1028 |-0.6330 |0.2769 |0.0801 |0.7863 |0.4045 |-1.4451|0.1932 |-2.1793|0.6200 |1.0722 |-0.3041|1.1274 |-0.5882|-1.6750|-1.2911|1.6641 |-0.7506|0.8829 |-0.8319|-0.3433|0.9099 |0.4169 |-0.7916|-0.2650|1.8875 |-1.7294|1.6555 |-0.6461|0.8998 |-0.7588|-0.0047|-0.3807|1.1342 |-0.9185|0.4284 |-0.8798|-0.0980|1.0921 |0.3411 |0.7303 |1.0955 |-1.0335|-1.5995|-0.5956|-0.2658|-0.2068|-1.3410|-0.6982|-0.5938|-1.4017|0.6647 |-1.7120|1.3547 |0.7302 |0.4720 |0.6161 |0.5229 |-0.2833|0.4588 |0.6378 |0.7116 |0.2193 |-0.6537|0.4497 |-1.2164|-0.4218|-0.8660|-0.6212|1.0792 |-0.9675|-1.0070|-0.2309|0.0961 |1.8327 |-0.2759|-0.8867|-1.2470|0.2773 |0.6117 |-0.3106|0.6871 |-1.3887|0.4738 |0.0220 |0.1999 |0.0234 |-0.5314|0.4472 |-1.4622|0.3696 |0.6047 |-0.4555|-0.5797|0.3877 |-1.1618|0.7890 |-0.4213|-0.4857|-0.3695|-0.7607|-0.7803|0.6929 |1.5959 |-0.6924|-0.4945|0.0490 |-0.6329|1.3685 |0.3880 |0.9510 |-1.5665|0.1692 |1.2779 |0.7525 |1.4819 |0.0420 |-0.2748|1.2278 |1.0543 |1.3383 |-1.9387|0.5130 |-0.0480|-0.4897|-0.4528|0.0175 |1.2384 |1.6149 |1.1592 |0.0699 |-0.2542|-1.2511|1.3750 |1.6376 |-0.4390|-0.3944|-1.7702|-0.9258|1.2617 |-0.3761|-1.7335 |0.7250 |0.8946 |1.2770 |1.1781 |1.3152 |0.1565 |-1.2534|1.0231 |-0.1569|-2.6348|-0.9410|-2.6443|-0.4126|0.5497 |0.8071 |0.6509 |-0.1762|-1.4725|0.1747 |0.4348 |-0.1511|2.2003 |-0.8347|0.2130 |0.8037 |-0.1511|0.2942 |-0.5883 |1.0036 |2.3024 |1.0902 |2.5935 |0.8247 |-0.3559|0.0038 |0.3448 |0.7777 |0.2198 |1.0155 |1.4515 |-1.0259|0.3454 |0.2638 |0.2246 |-1.1845|-0.7379|-0.0591|-1.0454|1.8739 |-0.1274|0.8863 |1.4194 |-0.5352|1.4770 |2.3563 |-1.7962 |1.7356 |-0.1939|-0.3487|2.3315 |0.6808 |-0.6493|0.2827 |0.5546 |0.0839 |2.1831 |-1.4125|-1.7297|-2.8487|1.0667 |-0.7694|0.2745 |1.7802 |1.1453 |-1.7867|0.2305 |-1.6025|-0.7509|1.2229 |-0.6469|0.6259 |0.1587 |0.0278 |-0.1503 |0.6697 |0.2856 |-1.2223|0.1342 |1.1180 |-1.0757|-0.2285|-0.4738|0.1442 |-0.7988|-0.0077|0.0917 |-0.2127|3.3935 |0.1405 |1.4117 |-0.0880|2.2570 |0.2732 |1.5207 |1.6523 |-0.6015|0.3255 
|-0.0707|-0.9008|0.5615 |-0.2526|1.0088 |0.3419 |-0.3067|-0.1686|0.6982 |0.8937 |1.0834 |-1.5886|1.0095 |-0.7140|1.4363 |0.1797 |0.1583 |-0.2321|0.5748 |-0.8885|-0.3649|-0.7686|0.3705 |-0.4175|0.5016 |-1.0176|0.9921 |-0.8346|0.9017 |-0.5603|0.5733 |0.2231 |1.2691 |-1.3398|2.0787 |0.4097 |0.2369 |0.7777 |0.7632 |0.7955 |-1.7404|-1.1122|0.6601 |1.5493 |0.0238 |-2.2743|0.3588 |-1.4871|0.6881 |2.8387 |-0.8347|0.1356 |0.2807 |1.0909 |0.3039 |-0.9827|-0.2422|0.4016 |-0.8349|-0.6510|0.4862 |-0.9125|0.1139 |-0.1802|-0.5839|0.8631 |0.3968 |0.1698 |0.7373 |-0.7946|0.3511 |-0.6150|-1.4119|-0.5886|-2.1193|-0.3019|-1.5221|1.2542 |1.2289 |1.4185 |0.4473 |1.4577 |-0.3072|0.2278 |1.3761 |1.0143 |-1.6921|-1.2519|-0.8534 |1.5319 |-1.2128|0.8104 |-0.3597|0.1744 |-0.7490|0.4101 |-0.6922|0.1464 |0.2505 |-0.6485|-0.1695|2.0543 |-0.6121|0.0332 |-0.2082|0.0593 |-0.5102|1.5261 |0.9716 |-0.0420|1.5297 |-0.1109|1.5647 |1.3095 |1.3344 |0.4348 |0.5685 |1.8482 |-0.6132|0.3778 |0.5004 |0.9601 |-1.0597|0.8826 |-0.8194|0.0848 |0.4991 |0.3799 |-0.7854|-0.0987|-0.9148|-0.6672|-0.4274|0.2184 |-1.6459|0.5997 |-0.8600|-1.2120|-0.1096|-0.8476|-0.1695|0.2510 |0.7718 |-1.1733|-0.2107 |-0.6142|-0.0813|0.3893 |-0.3111|-0.4327|-0.3136|-1.4807|0.8057 |0.2799 |-1.3194|-0.7022|0.4558 |-0.2323|0.9940 |-1.5923|-0.0419|0.0426 |-0.8603|0.2030 |0.0190 |0.4770 |-0.6641|0.2422 |1.7720 |0.7094 |-0.0201|0.5056 |-0.5142 |-1.1804|-2.6988|-2.7833|-0.2256|-0.3594|0.4988 |-0.5088|-0.8392|-0.4861|0.7217 |0.0065 |-1.6970|-1.3064|1.5622 |-1.6589|1.8592 |0.3702 |0.7018 |0.4939 |-1.0106|0.3800 |-0.0069|-0.4358|-1.2641|1.0627 |1.7309 |2.3332 |-0.5836 |-0.0493|0.1993 |-1.8130|-0.4307|0.2030 |0.1258 |-0.7235|-0.8324|-0.6912|-0.5757|1.6416 |0.0444 |-0.0945|0.3423 |0.1498 |-1.3850|-0.8457|1.4177 |0.5590 |-0.4354|-1.1580|-1.0993|1.0765 |0.9689 |1.0205 |-1.7498|-1.0893|2.0413 |-1.9350|0.5455 |-0.6231|-0.2228|0.1711 |0.2557 |0.5389 |2.2794 |-0.8472|-0.3560|1.6968 |-0.4342|1.6031 |1.2132 |0.7609 |-0.1876|0.1865 |1.5044 |-0.6984|0.2358 |-2.1500|-0.9907|-0.2379|-0.7045|-0.7605|0.2737 |-0.1254|-0.0408 |-0.7681|-0.5618|1.2383 |0.2756 |-2.3619|-0.4972|0.5598 |0.4120 |2.3939 |-0.0342|1.2705 |-1.8408|-1.2955|-0.7969|0.0126 |-0.6161|1.3888 |-1.3473|-0.6412|0.1739 |-1.2546|-0.1134|0.5235 |0.6589 |-0.7404|0.2669 |-0.0613|0.8041 |1.5621 |-0.7603|-0.0271|-0.6179|0.6909 |0.4051 |0.4783 |-0.0841|0.3704 |-0.1265|1.8606 |0.0941 |1.0369 |-2.3488|1.5964 |-0.1276|-0.5809|0.8567 |-1.1057|-1.2037|0.2941 |1.4856 |-2.1382|-0.5322|0.7903 |-0.9932|1.1370 |1.0686 |0.0958 |1.2012 |1.1947 |-0.9830|-0.8157|1.2283 |-0.1353|0.2716 |0.1185 |0.7029 |0.0298 |0.1787 |1.1689 |-0.9922|-0.0346|-0.7023|3.1942 |-0.2896|0.1963 |0.0654 |-0.0978|0.7838 |-3.0162|0.5753 |0.0551 |-0.5275|-1.0970|-1.1497
Here is a sample list of weights for the hidden layer when it is processed as the input layer to produce an output layer neuron value (15 weights, one per hidden neuron):
[-0.8405,0.2393 ,-0.0073,0.9292 ,0.6307 ,2.1406 ,0.9013 ,-1.6851,-0.0139,0.2420 ,-0.7958,-0.0896,-0.3030,-0.8843,-0.3770]
Here are sample biases for the hidden layer:
[[-0.74555834] [ 1.32840211] [ 0.60680441] [-0.61619871] [-0.75000556] [-1.27153402] [ 2.48666413] [-0.13057067] [ 1.31535368] [ 1.89554804] [ 0.96974504] [ 0.94212068] [-1.51683047] [-0.5537063 ] [ 0.30924096]]
Here are sample biases for the output layer:
[[ 1.02342246] [-0.45875094] [ 0.56651443] [ 1.82527478] [ 1.46097958] [-0.48350051] [ 1.32493587] [ 2.01979146] [ 1.07815773] [-1.63518175]]
What is an activation function?
An activation function is a function that takes the weighted sum of the inputs plus the bias and produces the value of a neuron in the hidden or output layer. The handwritten digit recognition neural network uses the sigmoid function as its activation function. It transforms a neuron’s input into an output value typically ranging between 0 and 1, making it useful for binary classification tasks where outputs can be interpreted as probabilities. This works very well for digit recognition. We can think of each neuron in the hidden layer as representing a part of a digit, such as a curve, a corner, a loop, a line, etc. If the value of the neuron that represents a curve is close to 1, it is very likely a curve is present; otherwise it may not be. For the output layer, there are 10 neurons representing the digits 0 to 9, and each of these neurons also holds a value between 0 and 1. If the neuron representing the digit 6 has a value of 0.99, the network has recognized the image as a 6.
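Here is a minimal sketch of the sigmoid function itself, showing how it squashes any input into the range between 0 and 1:

import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

print(sigmoid(-10.0))  # ~0.0000454, strongly "off"
print(sigmoid(0.0))    # 0.5, undecided
print(sigmoid(10.0))   # ~0.9999546, strongly "on"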
What is a cost function in neural networks?
A cost function in a neural network is a mathematical function that measures how well the network performs on a given task by calculating the difference between the predicted output and the actual target value. It essentially acts as a metric to guide the network in adjusting its weights and biases to minimize the error and improve accuracy during training; the goal is to find the set of parameters that produces the lowest cost value, representing the best possible predictions for the network.
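One common choice of cost function, used here purely as an illustrative assumption, is the quadratic (mean squared error) cost. A minimal sketch for a single training example:

def quadratic_cost(predicted, target):
    # predicted: the 10 output neuron values; target: the one-hot label,
    # e.g. the digit 5 becomes [0, 0, 0, 0, 0, 1, 0, 0, 0, 0].
    return 0.5 * sum((p - t) ** 2 for p, t in zip(predicted, target))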
The derivative of a cost function in neural networks
The derivative of a cost function represents the “slope” of the cost function at a given point, indicating how much the cost changes with respect to a small change in the model’s parameters (weights and biases). It essentially guides the direction in which to update the parameters to minimize the cost during training using gradient descent. By calculating the derivative of the cost function with respect to each parameter, we can determine how much adjusting that parameter will affect the overall cost, allowing us to update the parameters in the direction that reduces the cost the most. The derivative is crucial for gradient descent optimization, where the algorithm iteratively updates the parameters in the direction of the negative gradient (the direction that minimizes the cost).
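Here is a minimal sketch of a single gradient descent step, assuming the gradients have already been computed (for example by backpropagation) and using an arbitrary learning rate of 0.1:

def gradient_descent_step(params, grads, learning_rate=0.1):
    # Move each parameter a small step against its gradient to reduce the cost.
    return [p - learning_rate * g for p, g in zip(params, grads)]

weights = [0.5, -1.2]
gradients = [0.2, -0.4]  # hypothetical derivatives of the cost
print(gradient_descent_step(weights, gradients))  # [0.48, -1.16]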