K Fold Cross Validation:
Friday, June 9, 2017
Monday, May 1, 2017
At Bala's place!
Two days have just slipped by. It was quick and good. Some learnings, some insights.
And the bigger event is that three months have passed since I left my job.
So, how was it?
It was good. It was mostly exploration, and a big relaxation mode. I need to take it to the next level now.
More travel needs to be done. And I will be staying at friends' places more often.
Tomorrow I will meet my previous employer and visit the office. I will also meet a friend I met randomly on a BlaBla trip, and will end the day by meeting Ratul, my old-time friend.
I will spend more time with you from tomorrow on. Will sleep for now. Goodnight!
Monday, April 17, 2017
Wednesday, April 12, 2017
Convolution
How do we apply convolution to images?
An image is made up of 3 matrices: the pixel values for red, green and blue.
If we convolve that, we get a first-level feature map. The matrix we convolve with is called the convolution kernel.
What do we convolve it with?
We do it with a convolution kernel: a small matrix of weights that slides over the image.
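A minimal sketch of that operation on a single channel, assuming a 3x3 edge-detection kernel and "valid" padding with stride 1; for an RGB image you would convolve each of the three matrices and sum the results:

    import numpy as np

    def convolve2d(image, kernel):
        """Slide the kernel over the image; at each position, multiply
        element-wise and sum. (Frameworks call this convolution, though
        strictly it is cross-correlation.)"""
        kh, kw = kernel.shape
        out_h = image.shape[0] - kh + 1
        out_w = image.shape[1] - kw + 1
        out = np.zeros((out_h, out_w))
        for i in range(out_h):
            for j in range(out_w):
                out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
        return out

    image = np.random.rand(28, 28)         # one colour channel
    kernel = np.array([[-1, -1, -1],       # a classic edge-detection kernel
                       [-1,  8, -1],
                       [-1, -1, -1]])
    feature_map = convolve2d(image, kernel)
    print(feature_map.shape)               # (26, 26): a first-level feature map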
Friday, April 7, 2017
[Udacity]Perceptron
https://classroom.udacity.com/nanodegrees/nd101/parts/8b03c7e1-308c-477a-bd21-28872f5fed78/modules/2afd43e6-f4ce-4849-bde6-49d7164da71b/lessons/dc37fa92-75fd-4d41-b23e-9659dde80866/concepts/5ab911d0-fe20-4113-852c-8a07fe9bdacc
Adding weights and bias.
Summing the inputs.
If a weight is zero: that parameter has no effect.
If exactly opposite: no bias.
If opposite at some level:
If the bias is zero: the weights can be scaled freely; you can multiply or divide them all by any number.
The building blocks of a neural network are again NOT, AND and OR gates.
This is best said in the following comment from Udacity, where we created an XOR gate from NOT, AND and OR gates:
"You've seen that a perceptron can solve linearly separable problems. Solving more complex problems, you use more perceptrons. You saw this by calculating AND, OR, NOT, and XOR operations using perceptrons. These operations can be used to create any computer program. With enough data and time, a neural network can solve any problem that a computer can calculate. However, you don't build a Twitter using a neural network. A neural network is like any tool, you have to know when to use it.
The power of a neural network isn't building it by hand, like we were doing. It's the ability to learn from examples. In the next few sections, you'll learn how a neural network sets its own weights and biases."
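As a minimal sketch of building gates by hand, here is a single perceptron with hand-set weights and bias computing AND, assuming a step activation:

    import numpy as np

    def perceptron(inputs, weights, bias):
        """Weighted sum of the inputs plus bias, passed through a step function."""
        return int(np.dot(inputs, weights) + bias >= 0)

    # Hand-chosen weights and bias that implement AND:
    # the sum crosses zero only when both inputs are 1.
    weights = np.array([1.0, 1.0])
    bias = -1.5

    for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
        print(x, "->", perceptron(np.array(x), weights, bias))
    # Prints 0, 0, 0, 1: the AND truth table.

Note how the points above play out here: setting a weight to zero removes that input's effect, and multiplying both weights and the bias by any positive number leaves every output unchanged.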
Tuesday, April 4, 2017
Maths for deep learning!
- Linear Algebra.
- Calculus.
- Statistics.
If you just want to use existing models, you don't need much maths. But if you want to build your own models, you need some understanding of the terminology.
Even at the level of data cleaning, we need some maths.
Like min-max scaling.
This squashes all the data into the range 0-1.
The formula for it:
X' = (X - Xmin) / (Xmax - Xmin)
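In NumPy that formula is a one-liner; a small sketch, assuming a 1-D feature:

    import numpy as np

    x = np.array([10.0, 20.0, 25.0, 40.0])
    x_scaled = (x - x.min()) / (x.max() - x.min())  # (X - Xmin) / (Xmax - Xmin)
    print(x_scaled)  # [0.0, 0.333..., 0.5, 1.0]: everything now in the range 0-1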
When we have to feed this data to TensorFlow, we will mainly be dealing with the following (a quick NumPy illustration follows the list):
1) Scalar: a constant (0-D).
2) Vector: 1-D.
3) Matrix: 2-D.
4) Tensor: N-D.
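    import numpy as np

    scalar = np.array(5.0)                 # 0-D: a constant
    vector = np.array([1.0, 2.0, 3.0])     # 1-D
    matrix = np.array([[1.0, 2.0],
                       [3.0, 4.0]])        # 2-D
    tensor = np.zeros((32, 28, 28, 3))     # N-D, e.g. a batch of 32 RGB images

    for t in (scalar, vector, matrix, tensor):
        print(t.ndim, t.shape)             # the shape tells you which one you have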
So, overall, we use maths in the following cases:
1) Normalising.
2) Learning hyperparameters.
3) Initialising weights.
4) Forward propagation.
5) Calculating the error.
6) Backpropagating to correct the cost (a toy sketch of steps 3-6 follows this list).
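As a toy sketch of steps 3 through 6, here is gradient descent on a single linear neuron; the data, learning rate and iteration count are all made up for illustration:

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 2))        # made-up inputs
    y = X @ np.array([2.0, -3.0]) + 1.0  # made-up target rule to recover

    w = rng.normal(size=2)               # 3) initialise weights
    b = 0.0
    lr = 0.1                             # the learning rate hyperparameter

    for _ in range(200):                 # iteration count, another hyperparameter
        pred = X @ w + b                 # 4) forward propagation
        error = pred - y                 # 5) calculate the error
        w -= lr * (X.T @ error) / len(X) # 6) backpropagate: gradient step on w
        b -= lr * error.mean()           #    ... and on b

    print(w, b)                          # approaches [2, -3] and 1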
Hyperparameters:
1) Batch size.
2) Initial learning rate.
3) Learning rate schedule.
4) Rotations.
5) Number of iterations.
6) Weight decay.
7) Random mirroring.
8) Transformations.
These parameters can be chosen based on domain knowledge, or we can use a search approach where we just define a range for each parameter and sample from it (a sketch follows).
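A hedged sketch of that search approach; train_and_score is a hypothetical stand-in for an actual training run, and the ranges are illustrative:

    import random

    def train_and_score(batch_size, lr):
        """Hypothetical placeholder: train a model with these settings
        and return its validation score."""
        return random.random()  # stand-in for a real validation metric

    best_score, best_params = float("-inf"), None
    for _ in range(20):                              # 20 random trials
        params = {
            "batch_size": random.choice([32, 64, 128, 256]),
            "lr": 10 ** random.uniform(-4, -1),      # sample the rate on a log scale
        }
        score = train_and_score(**params)
        if score > best_score:
            best_score, best_params = score, params

    print(best_params, best_score)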
Q: If you can do matrix operations in plain Python, will you still use NumPy?
A: YES. The NumPy library is written in C, which makes it faster than native Python.
Python lists can hold mixed data types, but a NumPy array holds exactly one data type.
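A quick illustration of that difference; the speed advantage comes from the same fact, since a single dtype lets NumPy run tight C loops:

    import numpy as np

    py_list = [1, "two", 3.0]      # a Python list happily mixes types
    arr = np.array([1, 2, 3.0])    # NumPy coerces everything to one dtype
    print(arr.dtype)               # float64: the ints were upcast to floats

    # np.array(py_list) would fall back to a string/object dtype,
    # losing the fast numeric code path entirely.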
Monday, March 27, 2017
TensorFlow and Deep Learning without a PhD, Part 1 (Google Cloud Next '17) -
https://www.youtube.com/watch?v=u4alGiomYP4
It looks like neural networks always have a non-linear activation function.
This video explains digit recognition, which is a classification problem. Maybe multinomial logistic regression!
Softmax is one algorithm that works well on classification problems.
NumPy is a numerical library for Python.
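Since softmax keeps coming up, here is a minimal NumPy sketch of it: it turns raw class scores into probabilities that sum to 1.

    import numpy as np

    def softmax(logits):
        """Exponentiate and normalise; subtracting the max keeps exp() stable."""
        shifted = logits - logits.max()
        exps = np.exp(shifted)
        return exps / exps.sum()

    scores = np.array([1.0, 2.0, 5.0])  # raw scores for three classes
    print(softmax(scores))              # ~[0.017, 0.047, 0.936], sums to 1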
Sunday, March 26, 2017
Deep Learning from Nvidia blog!
Ref: https://devblogs.nvidia.com/parallelforall/deep-learning-nutshell-core-concepts/
---------
Part 1 focuses on introducing the main concepts of deep learning.
Part 2 provides historical background and delves into the training procedures, algorithms and practical tricks that are used in training for deep learning.
Part 3 covers sequence learning, including recurrent neural networks, LSTMs, and encoder-decoder systems for neural machine translation.
Part 4 covers reinforcement learning.
Feature engineering was an essential part of machine learning; it was needed to make the learning process easier.
And it was a difficult process: every dataset needed different feature engineering, so if we changed the dataset we had to come up with new features all over again.
Even so, feature engineering remains the single most effective technique for doing well on difficult tasks.
Know the meaning of regression. Simply put, it means a statistical relation between input and output.
Deep Learning learns the features. So, feature engineering is replaced by feature learning.
Imagine how difficult it would have been if we had to spell out all the features of a cat, a building, a car.
Anyway, even humans are not taught that way. Our parents never told us "since it has paws, eyes, a tail and ears, you call it a cat". If we were taught that way, we would have called a tiger a cat! (Which is technically right, though.)
We saw a cat a hundred times and figured out on our own that it is a cat. The same is expected from a machine doing deep learning.
One major difference: we were able to understand a cat with just tens or a hundred images of cats, whereas a machine needs thousands or millions to get to the same understanding!
And of course there is the power we consume. Machines consume an exorbitant amount of power to run such algorithms!
We underestimate human capability so much, indeed! But what can we say: we have evolved these skills over millions of years, and the computers have just started. And I would say they have already become pretty advanced in a matter of a few decades!
One more reason for the heightened interest in deep learning in recent years (I didn't understand one or two of the terms in it): it seems we now have better activation functions that keep the gradients from becoming too small for the later layers to learn from. Previously these activation functions were absent, and deep architectures suffered from vanishing gradients.
Source:
"While hierarchical feature learning was used before the field deep learning existed, these architectures suffered from major problems such as the vanishing gradient problem where the gradients became too small to provide a learning signal for very deep layers, thus making these architectures perform poorly when compared to shallow learning algorithms (such as support vector machines).
The term deep learning originated from new methods and strategies designed to generate these deep hierarchies of non-linear features by overcoming the problems with vanishing gradients so that we can train architectures with dozens of layers of non-linear hierarchical features. In the early 2010s, it was shown that combining GPUs with activation functions that offered better gradient flow was sufficient to train deep architectures without major difficulties. From here the interest in deep learning grew steadily."
Non-linear hierarchical features: what does this mean?
LSTM is another reason for the success story of deep learning. It deals with dependency on the time variable. I don't know if it helps in image recognition, but maybe in videos, where the time factor is involved. What it does is correlate the output with hundreds of older inputs and outputs.
This is very unlike earlier practice, where only up to 10 previous inputs were used. The technique was introduced by Hochreiter and Schmidhuber back in 1997, but it picked up only in recent years.
Maybe this LSTM and the activation functions are related. I am just guessing!!
A perceptron is similar to a neuron in our brain.
The one app which used AI and got famous. Any guesses? It's Prisma. There are very few people who wouldn't have heard of it.
It turns normal pictures into art-like photos. This was one of the biggest buzzes of last year!
In this context, one can see a deep learning algorithm as multiple feature learning stages, which then pass their features into a logistic regression that classifies an input.
Logistic regression is a simple and well-known algorithm. It doesn't have any hidden layers and can work with very little data.
So, now into artificial neural networks! What happens there?
1) Take the input.
2) Apply a transformation: a weighted sum over all the inputs.
3) Turn it into an intermediate state with a non-linear function.
This gives you the feature. This entire thing happens in a layer, and the transformative function is called a unit.
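A minimal sketch of those three steps for a single layer, assuming a sigmoid as the non-linear function and made-up sizes:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    x = np.array([0.5, -1.2, 3.0])  # 1) take the input
    W = np.random.randn(4, 3)       # one row of weights per unit in the layer
    b = np.zeros(4)                 # the bias

    z = W @ x + b                   # 2) weighted sum over all the inputs
    h = sigmoid(z)                  # 3) non-linear function -> intermediate state
    print(h)                        # the features this layer produces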
Like any other learning, the neural network gets better by looking at the error (or cost) and modifying the weights. Probably at the second step.
So, the word convolution that we hear so often is involved in the 2nd step, as a transformative function. Similar to this is pooling. And also units, which are mostly the activation functions we were talking about earlier.
Units are generally compared to neurons. The word "neuron" is misleading: people compare it to how our brains work, but recently people have started realising that biological neurons behave more like entire multi-layer perceptrons than like a single unit. Hence it is better to avoid the word "neuron" and prefer "unit", or even "perceptron". My guess is that perceptrons are the traditional approach to deep learning, without good activation functions.
Weighted data: the input matrix multiplied by the weight matrix.
The difference between an activation function and a unit is that a unit can have multiple activation functions, like an LSTM, or can have a more complex structure. (Doesn't that just mean a more complex activation function?)
Is it necessary that activation functions are always non-linear?
I don't think so; we just saw a rectified linear function. (Though despite the name, ReLU is in fact non-linear: it is only piecewise linear, which is enough to break linearity.)
The difference between linear and non-linear functions:
With non-linear functions, we create new relations between the input parameters.
It seems using non-linear functions creates increasingly complex features in deep learning.
The big point is that a chain of layers (even thousands of them) with linear functions is equivalent to a single layer, because a chain of matrix multiplications can be represented by a single matrix multiplication.
That's why non-linear functions are so important in deep learning.
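That collapse is easy to check numerically: composing two linear layers gives exactly the same output as the single layer formed by multiplying their weight matrices.

    import numpy as np

    rng = np.random.default_rng(0)
    W1 = rng.normal(size=(4, 3))     # first linear layer
    W2 = rng.normal(size=(2, 4))     # second linear layer
    x = rng.normal(size=3)

    two_layers = W2 @ (W1 @ x)       # a "deep" stack of two linear layers
    one_layer = (W2 @ W1) @ x        # one equivalent layer
    print(np.allclose(two_layers, one_layer))  # True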
A layer is usually uniform, meaning it applies only one type of activation function, so that it can be easily compared to other parts of the network.
The first and last layers are called the input and output layers, and the ones in the middle are hidden layers.
Hidden doesn't mean it is hidden from the developer as well; just that it is neither the input nor the output.
Monday, February 27, 2017
Machine Learning. Beginnings!
Started with machine learning. It has been on my mind for quite some time.
Though I did pretty well in the Machine Learning course from Coursera two years back, I have almost forgotten it.
My interest in it was kindled again by deep learning. I felt I should give it a shot. So, I am back!
I spoke to a few people to find out whether I should go back to machine learning in order to learn deep learning. By the time they answered the question, I knew what I wanted to do!
It's better to start with machine learning and then move to deep learning. The reasoning behind it: deep learning is just one tool from a mechanic's toolbox.
If we learn about just this one tool, we tend to look at all problems from this tool's perspective. It's like only knowing how to work with a screwdriver and hence trying to fix everything with a screwdriver. It's good to know one tool really well, but before that it is important to have an overview of all the other tools as well!
With that clarity in mind, I have started with a book called Machine Learning with R. I am on the third chapter now. I will keep writing whatever I learn from now on!
I haven't got any interesting data to work on yet. I need to file some RTIs and get data.