I’ve started taking a popular free course on Coursera about Machine Learning taught by Andrew Ng, one of the experts in the field. Before this, I had learned a bit about machine learning from YouTube, mostly covering genetic algorithms and artificial neural networks in general. Now I finally understand its foundations slightly better.

Several topics are covered during the first and second weeks. In brief, Ng gives a short history of machine learning, explains the two main categories of machine learning methods (supervised and unsupervised learning), and then introduces the concept of linear regression and how to use gradient descent to find the local or global optimum of the linear regression hypothesis.

This time, I will cover linear regression in more detail and show how to implement gradient descent. First of all, linear regression is an approach to model the relationship between a target value and one or more features. For example, suppose we have data on land area in relation to house price. Look at the table below:

| Land Area | House Price  |
|-----------|--------------|
| 100 m²    | Rp 1.000.000 |
| 200 m²    | Rp 2.000.000 |
| 300 m²    | Rp 3.000.000 |
| 400 m²    | Rp 4.000.000 |

Then, by looking at the data, we can conclude that the function for determining the House Price is:

House Price = Land Area * 10.000

or in a more “symbolic way”

Y = 𝒙 * 10.000
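To check this relationship numerically, here is a minimal sketch in Python using the values from the table above (the function name `predict` is my own, just for illustration):

```python
# Example data from the table above: land area (in m²) and house price (in Rp).
land_areas = [100, 200, 300, 400]
house_prices = [1_000_000, 2_000_000, 3_000_000, 4_000_000]

# The guessed rule: House Price = Land Area * 10.000
def predict(area):
    return area * 10_000

# Every prediction matches the observed price exactly.
for area, price in zip(land_areas, house_prices):
    print(area, predict(area), predict(area) == price)
```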

Now, for linear regression itself, let us find a more general way to write our function. But wait a moment: we cannot call it a function yet, since we are only “guessing” it, so we call it a hypothesis instead:

Y = θ0 + θ1 * 𝒙1

This is a simple hypothesis for linear regression with one feature. We can adjust the hypothesis to match the dataset; for example, if we want to use more features from the dataset:

Y = θ0 + θ1 * 𝒙1 + … + θn * 𝒙n
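The multi-feature hypothesis above can be sketched in a few lines of Python. A common trick (which the formula hints at) is to prepend a constant x0 = 1 so that θ0 is handled uniformly as θ0 * x0; the function name `hypothesis` is my own choice here:

```python
def hypothesis(thetas, features):
    """Compute Y = θ0 + θ1*x1 + ... + θn*xn for a list of features."""
    # Prepend x0 = 1 so the intercept θ0 pairs with it in the sum.
    x = [1.0] + list(features)
    return sum(t * xi for t, xi in zip(thetas, x))

# With θ0 = 0 and θ1 = 10_000 this reproduces the house-price rule:
print(hypothesis([0, 10_000], [300]))  # 3000000.0
```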

Now, imagine that we have new data about the price of an application with respect to its version.

Now, by using the known hypothesis Y = θ0 + θ1 * 𝒙1, we can start to look for the optimal values of θ0 and θ1.

First, give θ0 and θ1 some random values. Make sure they are still within the range of the feature and the Y value. This time, for the sake of simplicity, I will set both to 1.
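As a quick sketch of this initial guess (the actual version/price dataset is not reproduced here, so the feature values below are made up purely for illustration):

```python
# Initial guess described above: θ0 = θ1 = 1.
theta0, theta1 = 1.0, 1.0

def h(x):
    # The single-feature hypothesis: Y = θ0 + θ1 * x
    return theta0 + theta1 * x

# Hypothetical feature values, just to see what the line predicts:
for x in [1, 2, 3, 4]:
    print(x, h(x))
```

Plotting these predictions against the real data would give the blue line discussed next.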

As we can see, the blue line is our hypothesis, and it is still far from the expected values. Now let us see how large the error of our current hypothesis is by using the cost function.

## Cost Function

We need a function to see how big the error of our hypothesis is. One common choice is the mean squared error, which measures the average squared difference between the observed values (the dataset) and the estimated values (the predictions). It looks like this: