Machine learning has become quite the trend, and you have probably noticed a lot of people talking about it already. So today we will explain what all the fuss is about. In simple words, machine learning is all about learning from data.
What is this hype all about?
Truth be told, the hype around machine learning is not going to fizzle out any time soon. Machine learning has become important in a number of domains because it has produced impressive results in fields such as medicine, agriculture, cybersecurity, financial services and transportation. At its core, the idea is really simple: it involves lots and lots of data and making machines learn from that data.
What is Machine Learning?
Machine learning is a subset of artificial intelligence (AI) that gives systems the ability to learn and improve automatically from previous data, without being explicitly programmed. In other words, a system learns from past experience and uses what it has learned to make future decisions on its own.
If you have never written code or dealt with algorithms, you may already be feeling lost, so let’s take an example. Suppose you are a real estate agent who, over years of experience, has built a good intuitive sense of how to price a home just by looking at it. Now suppose your business has grown rapidly and you no longer have time to look at every house as before. How do you determine the price now? This is where machine learning comes in. You start by collecting a bunch of historical data about the houses sold in your area: details such as square footage, number of bedrooms, region and number of floors, along with the price each house sold for. You then build a machine learning model from this data. The machine learning algorithm automatically figures out the pricing logic from the training data, and the resulting model is able to predict the price of a house it has not seen before.
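To make this concrete, here is a minimal Python sketch of the idea, assuming just one feature (square footage) and a handful of made-up sales. The numbers are hypothetical and were chosen to lie exactly on a straight line so the learned rule is easy to check; real data would be noisy and would include the other features mentioned above. The helper names `fit_line` and `predict_price` are our own, and the "learning" here is ordinary least squares, one of the simplest ways an algorithm can extract a pricing rule from examples.

```python
# Minimal sketch: learn a pricing rule from hypothetical historical sales.
# The data below is invented for illustration only.

def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical training data: square feet -> sale price
square_feet = [1000, 1500, 2000, 2500]
prices = [200_000, 275_000, 350_000, 425_000]

slope, intercept = fit_line(square_feet, prices)

def predict_price(sqft):
    """Apply the learned rule to a house the model has not seen."""
    return slope * sqft + intercept

print(predict_price(1800))  # prints 320000.0
```

Nobody told the program that each square foot adds 150 to the price on top of a base of 50,000; it recovered that rule from the examples alone.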
Now let’s go a little deeper into machine learning. Machine learning is all about feeding data into a generic algorithm and letting it build its own logic based on the data fed to it. This way, you don’t have to write the decision rules by hand. The two main subcategories of machine learning are supervised and unsupervised learning.
Supervised Machine Learning
Supervised learning is a machine learning technique in which we teach, or train, the machine using labeled data, that is, examples for which the correct output is already known. The process can be thought of as a teacher supervising the learning, hence the name supervised learning. Because we know in advance what the output values for our samples should be, the algorithm can learn from this training data set to make better predictions in future scenarios. The goal of supervised learning, therefore, is to learn a function that, given a sample of data and the desired outputs, best approximates the relationship between the input and output features in the data. There are two main types of supervised machine learning: classification and regression.
Regression is used when we want to map inputs to a continuous output. Regression algorithms include linear regression, decision trees, k-nearest neighbors and so on. Of the many models that can be used, the simplest and most commonly used is linear regression, which fits the hyperplane that best approximates the data points (usually the one that minimizes the sum of squared errors). Typical regression tasks include predicting a person’s weight, predicting a person’s age and predicting house prices in a particular area.
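As a sketch of another regression technique listed above, here is k-nearest-neighbors regression in plain Python: the prediction for a new input is the average target value of the k closest training examples. The height and weight numbers are invented for illustration, and `knn_regress` is a hypothetical helper name. The sketch uses a single feature, so "distance" is simply the absolute difference.

```python
# k-nearest-neighbors regression sketch (single feature, invented data).

def knn_regress(train_x, train_y, query, k=3):
    """Predict a continuous value as the mean target of the k nearest points."""
    # Sort training pairs by distance to the query; with one feature this is
    # the absolute difference (use e.g. math.dist for several features).
    by_distance = sorted(zip(train_x, train_y), key=lambda p: abs(p[0] - query))
    nearest = by_distance[:k]
    return sum(y for _, y in nearest) / k

# Hypothetical data: height in cm -> weight in kg
heights = [150, 160, 170, 180, 190]
weights = [50, 58, 66, 75, 85]

# The 3 nearest heights to 172 are 170, 180 and 160, so the prediction is
# the average of their weights: (66 + 75 + 58) / 3
print(knn_regress(heights, weights, 172, k=3))
```

Unlike linear regression, k-NN does not learn a formula up front; it keeps the training data and does its work at prediction time.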
Classification is the task of approximating a mapping function from input variables to discrete output variables. The output variables, often called labels or categories, are predicted by the mapping function for a given observation. Examples include filtering emails as “spam” or “not spam”, classifying gender as “male” or “female”, and flagging transactions as “fraudulent” or “authorized”. There are many classification models, including logistic regression, decision trees, random forests, gradient-boosted trees, multilayer perceptrons and Naive Bayes.
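Here is a minimal sketch of one of the listed models, Naive Bayes, applied to the spam example. It is a multinomial Naive Bayes with add-one (Laplace) smoothing, written from scratch in Python; the four training emails are invented, and the function names are hypothetical. Real spam filters train on far more data and use better text processing than splitting on spaces.

```python
import math
from collections import Counter

def train_naive_bayes(docs, labels):
    """Count word frequencies per class and how often each class occurs."""
    word_counts = {label: Counter() for label in set(labels)}
    class_counts = Counter(labels)
    for doc, label in zip(docs, labels):
        word_counts[label].update(doc.lower().split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, class_counts, vocab

def predict(doc, word_counts, class_counts, vocab):
    """Return the label with the highest log-probability score."""
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, counts in word_counts.items():
        # log prior + sum of log likelihoods, with add-one smoothing so that
        # unseen words do not make the probability zero
        score = math.log(class_counts[label] / total_docs)
        denom = sum(counts.values()) + len(vocab)
        for word in doc.lower().split():
            score += math.log((counts[word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Invented training emails and their labels
emails = [
    "win a free prize now",
    "free money click now",
    "meeting agenda for monday",
    "lunch with the team on monday",
]
labels = ["spam", "spam", "not spam", "not spam"]

model = train_naive_bayes(emails, labels)
print(predict("claim your free prize", *model))   # prints spam
print(predict("team meeting on monday", *model))  # prints not spam
```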
Unsupervised Machine Learning
Unsupervised learning is used when we have only input data and no corresponding output data. The goal of unsupervised learning is to model the underlying structure or distribution of the data in order to learn more about it. According to Wikipedia, “In data mining or even in data science world, the problem of unsupervised learning task is trying to find hidden structure in unlabeled data. Since the examples given to the learner are unlabeled, there is no error or reward signal to evaluate a potential solution”. This quote alone gives a broad idea of what unsupervised learning is. Common algorithms in this category include k-means clustering, autoencoders and principal component analysis. Because there are no labels, there is no direct way to check a model’s answers against known outputs, which makes comparing model performance harder. Unsupervised learning problems can be further divided into clustering and association problems.
A clustering problem is where you want to discover the inherent groupings in the data. For instance, grouping customers based on purchasing behavior.
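A minimal k-means sketch for the customer example, in plain Python. Each customer is a hypothetical (visits per month, average spend) pair, and the data is invented. For simplicity and reproducibility, the sketch initializes the centroids with the first k points and runs a fixed number of iterations; library implementations typically use smarter initialization (such as k-means++) and stop once the assignments no longer change.

```python
import math

def kmeans(points, k, iterations=10):
    """Group points into k clusters by alternating assignment and update steps."""
    # Simple deterministic initialization: the first k points become centroids.
    centroids = [points[i] for i in range(k)]
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        for i, cluster in enumerate(clusters):
            if cluster:  # keep the old centroid if a cluster ends up empty
                centroids[i] = tuple(sum(c) / len(cluster) for c in zip(*cluster))
    return centroids, clusters

# Invented customers: (visits per month, average spend)
customers = [(1, 20), (2, 25), (1, 22), (10, 200), (12, 210), (11, 190)]

centroids, clusters = kmeans(customers, k=2)
for centroid, members in zip(centroids, clusters):
    print(centroid, members)
```

No labels were given, yet the algorithm separates occasional low spenders from frequent high spenders. Because spend is on a much larger scale than visits, it dominates the distance here; in practice, features are usually scaled before clustering.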
If you want to discover rules that describe large portions of your data, the problem is known as association. For example, discovering that people who buy A also tend to buy B is an association problem.
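The "people who buy A also tend to buy B" idea is commonly quantified with two measures: support (the fraction of transactions that contain a set of items) and confidence (how often B appears in transactions that already contain A). The sketch below computes both for a single rule over a few invented shopping baskets; the function names are our own. Algorithms such as Apriori use these measures to search for rules automatically, which this sketch does not attempt.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(1 for t in transactions if items <= set(t)) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Estimate P(consequent | antecedent) as support(A and B) / support(A)."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)

# Invented shopping baskets
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "eggs"},
    {"bread", "butter", "eggs"},
]

# Bread and butter appear together in 3 of the 5 baskets.
print(support(baskets, {"bread", "butter"}))
# Butter appears in 3 of the 4 baskets that contain bread.
print(confidence(baskets, {"bread"}, {"butter"}))
```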
We have now gone through the major categories of machine learning. Last but not least, machine learning today creates solutions for specific problems and is still very far from reaching the capacity of the human brain (general intelligence). We hope this overview has given you a broad idea of machine learning. In future articles we will dive deeper into machine learning and help you build a good intuition for it. Until then, Happy Machine Learning!