Author: PURVA HUILGOL | Compiled by: Flin | Source: analyticsvidhya

Ready to start your deep learning career?

Deep learning is a complex and daunting field for beginners. Concepts like hidden layers, convolutional neural networks, and backpropagation keep coming up as you try to work through the material.

It’s not easy, especially if you follow an unstructured learning path without first understanding the basic concepts. You’ll be limping around like a tourist in a foreign city without a map!

Here’s the good news — you don’t need an advanced degree or PhD to learn and master deep learning. Before entering the world of deep learning, however, you should know (and master) certain key concepts.

In this article, I will introduce five such basic concepts. I also recommend you enrich your deep learning experience with the following resources:

  • Introduction to Neural Networks (Free course)

    • Courses.analyticsvidhya.com/courses/Int…
  • Computer vision using deep learning

    • Courses.analyticsvidhya.com/courses/com…
  • Comprehensive Learning path for deep learning in 2020

    • www.analyticsvidhya.com/blog/2020/0…

The five basic elements to begin the journey of deep learning are:

  1. Prepare the system

  2. Python programming

  3. Linear algebra and calculus

  4. Probability and statistics

  5. Key machine learning concepts

Let’s introduce them all.

1. Prepare the system

To learn a new skill (cooking, for example), you first need to have all the equipment. You’ll need tools such as knives, cooking utensils, and of course a gas range! You also need to know how to use these tools.

The same goes for deep learning: it’s important to set up your system, know which tools you need, and learn how to use them.

Whether you use Windows, Linux, or Mac, you must understand basic commands. Here’s a handy table for your reference:

Here is a very good tutorial to get you started with Git and basic Git commands: www.vogella.com/tutorials/G…

The deep learning boom has led not only to ground-breaking research in AI, but also to new breakthroughs in computer hardware.

GPU (Graphics Processing Unit):

For most deep learning projects, you’ll need a GPU to process image and video data. You can also build deep learning models on a laptop/PC without a GPU, but doing so would be time-consuming. The main advantages a GPU provides are:

  1. It allows parallel processing
  2. In a CPU + GPU combination, the CPU assigns the complex tasks to the GPU and keeps the other tasks for itself, saving a lot of time

Here’s a great video explaining the difference between GPUs and CPUs:

  • youtu.be/-P28LKWTzrI

You don’t have to buy a GPU or install one in your computer. There are a variety of cloud computing services that provide GPUs for free or at very low cost. Some of them even come preloaded with practice datasets and their own tutorials. Examples include Paperspace Gradient, Google Colab, and Kaggle Kernels.

On the other hand, there are full-fledged servers, such as Amazon Web Services EC2, that require some setup steps but offer more customization.

The following table illustrates the options you have:

Deep learning has also led Google to develop its own type of processing unit, called the TPU, dedicated to building neural networks and handling deep learning tasks.

TPUs

A TPU, or Tensor Processing Unit, is essentially a coprocessor used alongside a CPU. TPUs are cheaper than GPUs and much faster, making it easy to build deep learning models.

Google Colab also offers TPUs for free (not the full enterprise version, but a cloud version). Here is Google’s own tutorial on using TPUs and building models on Colab: Colab notebooks | Cloud TPU.

  • Cloud.google.com/tpu/docs/co…

To summarize, these are the basic minimum hardware requirements to start building a deep learning model:

2. Python programming

Continuing the analogy of learning to cook, you now have the hang of operating a knife and a gas range. But what about the skills and recipes needed to actually cook food?

This is where we come across the software needed for deep learning. Python is the programming language used across the industry for deep learning.

However, plain Python alone can’t handle all the calculations and operations needed for deep learning. Those extra capabilities are provided by Python libraries. A library can contain hundreds of small tools, called functions, that we can use when programming.

While you don’t need to be a coding ninja, you do need to understand the basics of Python programming.

That said, instead of trying to master the vast ocean of Python programming, focus on a few specific libraries dedicated to machine learning and data processing.

Anaconda is a framework that helps you keep track of Python versions and libraries. It is a handy, multipurpose tool that is very popular, easy to use, and simply documented. Here’s how to install Anaconda:

  • www.analyticsvidhya.com/blog/2019/0…

So what do I mean by Python basics? Let’s discuss it in more detail.

Note: You can start learning Python in our free course:

  • Courses.analyticsvidhya.com/courses/int…

1. Variables and data types in Python

The main data types in Python are (a short snippet follows the list):

  • Int: integers
  • Float: decimal numbers
  • String: a single character or a sequence of characters
  • Bool: holds one of the two Boolean values, True and False
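For example, here is a minimal, runnable snippet illustrating all four types (the variable names and values are just for illustration):

age = 25            # int
height = 1.75       # float
name = "Ada"        # string
is_student = True   # bool

# each value carries its type with it
print(type(age), type(height), type(name), type(is_student))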

2. Operators in Python

There are five main types of operators in Python (a quick snippet follows the list):

  • Arithmetic operators: +, -, *, /, etc.
  • Comparison operators: <, >, <=, >=, ==, !=
  • Logical operators: and, or, not
  • Identity operators: is, is not
  • Membership operators: in, not in
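Here is one example of each operator type in action (the values are arbitrary):

a, b = 7, 3

print(a + b, a * b)        # arithmetic: 10 21
print(a > b, a != b)       # comparison: True True
print(a > 0 and b > 0)     # logical: True
print(a is not b)          # identity: True (a and b are different objects)
print(3 in [1, 2, 3])      # membership: True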

3. Data structures in Python

Python provides a variety of data structures that can be used for different purposes. Each data structure has unique properties that we can use to store different kinds of data. The key attributes are:

  • Ordered: This means that elements in a data structure are stored in a specific order. This order remains the same no matter how and when we use the structure (unless we explicitly change it)

  • Immutable: This means that the data structure cannot be changed. If a data structure is mutable, it can be changed

In data science, the most common data structures are:

  • Lists: Ordered and mutable

Example: We have a list like this:

my_list = [1,3,7,9]

This order remains the same everywhere this list is used. We can also change the list, for example by removing 7 or adding 11, as shown below.
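A quick sketch of that mutability:

my_list = [1, 3, 7, 9]
my_list.remove(7)    # lists are mutable: [1, 3, 9]
my_list.append(11)   # [1, 3, 9, 11]
print(my_list)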

  • Tuple: Similar to a list (ordered), but unlike a list, tuples are immutable

Example: Tuples can be declared as:

my_tuple = ("apple", "banana", "cherry")

Here too, the order remains the same, but unlike with a list, we cannot remove “cherry” from the tuple or add “orange” to it.

  • Sets: Unordered and mutable, although they can only hold unique values

Example: Sets are declared with curly braces:

my_set = {'apple', 'banana', 'cherry'}

No order is defined for the elements of a set.

  • Dictionaries: a group of <key, value> pairs. Dictionaries are unordered and mutable. This means that the pairs are not stored in a fixed order and can be changed, and each value is accessed by its key. Dictionaries can only have unique keys, although the keys do not need to have unique values.

Example: Dictionaries also use curly braces in key-value format:

my_dict = {"brand": "Ford", "model": "Mustang", "year": 1964}

Here, “brand”, “model”, and “year” are the keys, with the values “Ford”, “Mustang”, and 1964, respectively. The order of the keys can differ each time the dictionary is printed.
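For instance, looking up and changing values by key (a short sketch using the dictionary above):

print(my_dict["brand"])   # Ford
my_dict["year"] = 1965    # dictionaries are mutable
print(my_dict["year"])    # 1965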

4. Control flow in Python

Control flow means controlling the order in which code executes. We execute code line by line, and what happens on one line affects how we write the next:

Conditional statements

Conditions are set using the comparison operators we saw earlier.

  • If-else: What would you like to eat today, a burger or a salad? If you want the healthier option, you pick the salad; if you just want a quick bite and don’t care about calories, you pick the burger. That’s exactly what an if-else condition does

Example: You need to check whether a student passed or failed. If the score is >= 40, the student passed; otherwise, the student failed.

In this case, our conditional statement would be:

if marks >= 40:
    print("Pass")
else:
    print("Fail")
Loops

For loop: used to iterate over a sequence. This sequence can be a sequence of characters (a string) or any of the data structures above, such as lists, sets, tuples, and dictionaries

Example: We have a list of values from 1 to 5, and we need to multiply each value in this list by 3:

numbers_list = [1, 2, 3, 4, 5]

for each_number in numbers_list:
    print(each_number * 3)

Try the code snippet above and you’ll see how easy Python is!

Here’s the interesting thing: unlike many other programming languages, we don’t need to store values of the same type in a data structure. We can have a list like ["John", 153, 78.5, "A+"], or even a list of lists like [["A", 56], ["B", 36.5]]. It is this versatility and flexibility that makes Python so popular with data scientists!

You can also take advantage of the following free courses that cover the basics of Python and Pandas:

  • Python for Data Science — Free course
    • Courses.analyticsvidhya.com/courses/int…
  • Data analysis with Pandas in Python
    • Courses.analyticsvidhya.com/courses/pan…

5. Pandas

This is one of the libraries you’ll encounter when you start machine learning and deep learning. Pandas is a very popular library that is required for both deep learning and machine learning.

We store data in a variety of formats, such as CSV (comma-separated values) files, Excel worksheets, etc. To work with the data in these files, Pandas provides a data structure called the DataFrame (you can think of it as a table).

DataFrames, and the extensive set of operations Pandas provides on them, make Pandas the workhorse library for machine learning and deep learning.
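As a minimal sketch of what that looks like (the file name "data.csv" and the column name "marks" are placeholders for your own data):

import pandas as pd

# read a CSV file into a DataFrame - the table-like structure described above
df = pd.read_csv("data.csv")

print(df.head())            # first five rows of the table
print(df["marks"].mean())   # quick statistics on a single column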

If you haven’t used Pandas before, you can take this free short course: courses.analyticsvidhya.com/courses/pan…

Now, if you recall the list of five things we started with, you might have a question: what does math have to do with deep learning?

Well, let’s find out!

3. Linear algebra and calculus for deep learning

It is a common misconception that deep learning requires advanced knowledge of linear algebra and calculus. Well, let me dispel that myth here.

All you have to do is recall your high school math and start the journey of deep learning!

Let’s take a simple example. We have images of cats and dogs, and we want the machine to tell us which animal is present in any given image:

Now, we can easily identify the cats and dogs here. But how will a machine distinguish between the two? The only way is to feed the model data in numerical form, and that’s where we need linear algebra. We basically convert the images of cats and dogs into numbers, which can be represented as vectors or matrices.

We’ll cover some key terms and some important resources you can learn from them.

Linear algebra for deep learning

1. Scalars and vectors: While scalars have only magnitude, vectors have both direction and magnitude.

  • Dot product: The dot product of two vectors returns a scalar value
  • Cross product: The cross product of two vectors returns another vector that is orthogonal to the two vectors

Example: If we have two vectors a = [1, -3, 5] and b = [4, -2, -1], then:

A) Dot product:

a . b = (a1 * b1) + (a2 * b2) + (a3 * b3) = (1 * 4) + (-3 * -2) + (5 * -1) = 4 + 6 - 5 = 5

B) Cross product:

a X b = [c1, c2, c3] = [13, 21, 10]

where

c1 = (a2 * b3) - (a3 * b2)
c2 = (a3 * b1) - (a1 * b3)
c3 = (a1 * b2) - (a2 * b1)
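You can verify these results yourself with NumPy, a library you will meet constantly in deep learning; a minimal sketch:

import numpy as np

a = np.array([1, -3, 5])
b = np.array([4, -2, -1])

print(np.dot(a, b))    # dot product: 5
print(np.cross(a, b))  # cross product: [13 21 10]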

2. Matrices and matrix operations: A matrix is an array of numbers arranged in rows and columns. For example, the cat image above can be written as a matrix of pixel values:

Just like numbers, two matrices can be added and subtracted. However, operations such as multiplication and division work slightly differently (a short NumPy sketch follows this list):

  • Scalar multiplication: When we multiply a single scalar value by a matrix, we multiply the scalar by every element of the matrix

  • Matrix multiplication: Multiplying two matrices means taking the dot products of rows and columns, creating a new matrix whose dimensions can differ from those of the two input matrices

  • Transpose of a matrix: We swap the rows and columns of a matrix to obtain its transpose

  • Inverse of a matrix: Conceptually, it is similar to the reciprocal of a number, in that multiplying a matrix by its inverse produces an identity matrix
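Here is a minimal NumPy sketch of these four operations (the matrix values are arbitrary):

import numpy as np

A = np.array([[1, 2], [3, 4]])

print(3 * A)             # scalar multiplication: every element times 3
print(A @ A)             # matrix multiplication: dot products of rows and columns
print(A.T)               # transpose: rows and columns swapped
print(np.linalg.inv(A))  # inverse: A @ inv(A) gives the identity matrix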

You can refer to this excellent Khan Academy course on linear algebra to learn more about these concepts. You can also check out 10 powerful applications of linear algebra here.

  • Course: www.khanacademy.org/math/linear…

  • 10 powerful applications of linear algebra: www.analyticsvidhya.com/blog/2019/0…

Calculus for deep learning

The value we are trying to predict, say “y” (whether the image is a cat or a dog), can be expressed as a function of the input variables/input vector. Our main aim is to bring the predicted value close to the actual value.

Now, imagine processing thousands of images of cats and dogs. These look really cute, but as you can imagine, it’s not easy to work with these images and numbers!

Because deep learning by its nature involves large amounts of data and complex machine learning models, working with both together can easily waste time and resources. That’s why it’s important to optimize our deep learning model, so that it can make predictions as accurately as possible without using excessive resources and time.

This is the key role of calculus in deep learning: optimization.

In any deep learning or machine learning model, we can represent the output as a mathematical function of the input variables. Therefore, we need to see how the output varies with each input variable. We need derivatives to do this, because derivatives represent rates of change.

Derivatives and partial derivatives: Simply put, the derivative measures the change in the output value when we change the input value. In mathematical terms:

If y = f(x), then the derivative of y with respect to x is given as:

dy/dx = change in y / change in x

Geometrically, if we represent f(x) as a graph, then the derivative at a point is the slope of the tangent line to the graph at that point.

Here’s a picture to help you understand it:

The derivative we saw above involves only one variable, x. In deep learning, however, the final output y may depend on hundreds of variables. In that case, we need to calculate the rate of change of y with respect to each input variable. This is where partial derivatives come in.

Partial derivatives: Basically, we vary only one variable and hold all the other variables constant, then compute the derivative of y with respect to that one variable. Repeating this for each variable gives us the derivative with respect to every variable.

Chain rule: In general, y can be a much more complicated function of the input variables. So how do we compute its derivative? The chain rule helps us:

If y = f(g(x)), where g(x) is a function of x and f is a function of g(x), then

dy/dx = (df/dg) * (dg/dx)

Let’s consider a relatively simple example:

y = sin(x^2)

Therefore, using the chain rule:

dy/dx = d(sin(x^2))/dx = cos(x^2) * 2x
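You can sanity-check this result numerically with a finite-difference approximation (a minimal sketch; the step size h and the point x are arbitrary choices):

import math

def f(x):
    return math.sin(x ** 2)

x = 1.5
h = 1e-6  # small step for the finite-difference approximation

numerical = (f(x + h) - f(x - h)) / (2 * h)  # central difference
analytical = math.cos(x ** 2) * 2 * x        # chain-rule result

print(numerical, analytical)  # both print roughly -1.8846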

Resources for deep learning calculus:

  • Khan Academy’s calculus course
    • www.khanacademy.org/math/differ…
  • 3Blue1Brown has great videos on math and calculus:
    • www.youtube.com/channel/UCY…

4. Probability and statistics for deep learning

Like linear algebra, “statistics and probability” is its own new mathematical world. This can be very daunting for beginners, and even experienced data scientists sometimes find it challenging to recall advanced statistical concepts.

However, statistics is undeniably the backbone of machine learning and deep learning. Concepts from probability and statistics, such as descriptive statistics and hypothesis testing, are crucial in an industry where the interpretability of deep learning models is paramount.

Let’s start with the basic definition:

  • Statistics is the study of data

  • Descriptive statistics is the study of mathematical tools for describing and representing data

  • Probability measures the likelihood of an event happening

Descriptive statistics

Let me give you a simple example. Suppose 1,000 students take an entrance exam scored out of 100, and someone asks you: how did the students do on the exam? Can you describe the students’ scores to that person? Rather than reciting all 1,000 scores, you might simply say that the average score is 68. That is the mean of the data.

Similarly, we can derive other simple summary statements from the data:

At this point, in just a few lines, we can say that most students did well but few did poorly on the test. That’s descriptive statistics: we represent the data of 1,000 students using only 5 values.

Other key terms used in descriptive statistics include:

  • Standard deviation
  • Variance
  • Normal distribution
  • Central limit theorem

Probability

Sticking with the same example, suppose you were asked: if I pick a student at random from these 1,000 students, what are his/her chances of having passed the exam? The concept of probability helps you answer this. If the probability is 0.6, it means he/she has a 60% chance of having passed (assuming the passing mark is 40).

Hypothesis testing and inferential statistics can be used to answer further questions about the same data, such as:

  • Can the entrance examination be considered difficult?
  • Are students’ high scores the result of hard work or because the questions in the exam were easy?

You can learn all about statistics and probability from the following resources:

  • Introduction to Data Science (Statistics and Probability)

    • Courses.analyticsvidhya.com/courses/int…
  • Comprehensive and practical guide to inferential statistics

    • www.analyticsvidhya.com/blog/2017/0…
  • Probabilistic foundations of data science

    • www.analyticsvidhya.com/blog/2017/0…
  • Your guide to statistical hypothesis testing

    • www.analyticsvidhya.com/blog/2015/0…

5. Key machine learning concepts for deep learning

Here’s more good news: you don’t need to know the full range of machine learning algorithms that exist today. Not that they don’t matter, but to get started with deep learning, you don’t need to know very many.

However, there are a few concepts that are critical for building your foundation, and you should get familiar with them. Let’s review these concepts.

Supervised and unsupervised algorithms

  • Supervised learning: In these algorithms, we know the target variable (the thing we want to predict) as well as the input variables (the independent features that contribute to the target variable). We then generate an equation that captures the relationship between the input variables and the target variable and apply it to our data. Examples: kNN, SVM, linear regression, etc.

  • Unsupervised learning: In unsupervised learning, we do not know the target variable. It is mainly used to cluster data into groups, and after clustering we can identify and interpret the groups. Examples of unsupervised learning include k-means clustering, the Apriori algorithm, etc. (a short sketch contrasting the two follows)
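A minimal sklearn sketch contrasting the two approaches (the data here is random and purely illustrative):

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.cluster import KMeans

X = np.random.rand(100, 2)

# supervised: we know the target y and learn the input-to-target relationship
y = 3 * X[:, 0] + 2 * X[:, 1]
model = LinearRegression().fit(X, y)
print(model.coef_)  # approximately [3, 2]

# unsupervised: no target - we only group the data into clusters
clusters = KMeans(n_clusters=2, n_init=10).fit_predict(X)
print(clusters[:10])  # cluster label (0 or 1) for the first ten points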

Evaluation metrics

Building a predictive model is not the only step in deep learning. You need to check the quality of the model and keep improving it until you reach the best model you can.

So how do we judge the performance of a deep learning model? We use evaluation metrics. Depending on the task, we have different metrics for regression and classification (a short sklearn sketch follows the list):

  • Classification evaluation metrics:

    • Confusion matrix

    • Accuracy

    • Precision and recall

    • F1 score

    • AUC-ROC

    • Log loss

  • Regression evaluation metrics:

    • RMSE

    • RMSLE

    • R2 and adjusted R2
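As a minimal sketch of computing two of these with sklearn (the labels below are made up for illustration):

from sklearn.metrics import accuracy_score, confusion_matrix

y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 0, 1, 0, 1]

print(accuracy_score(y_true, y_pred))    # 0.8333... (5 of 6 correct)
print(confusion_matrix(y_true, y_pred))  # rows: actual class, columns: predicted class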

Evaluation metrics are critical in deep learning. Whether in research or industry, your deep learning model will be judged by the value of the metrics.

  • 11 Important Machine Learning Model Evaluation Metrics Everyone Should Know

    • www.analyticsvidhya.com/blog/2019/0…
  • Free courses — Evaluation metrics for machine learning models

    • Courses.analyticsvidhya.com/courses/eva…

Validation techniques

A deep learning model trains itself on the data provided to it. However, as mentioned above, we need to improve the model and check its performance. The true power of a model can only be observed when we feed it completely new (albeit cleaned) data.

But how do we improve the model? Do we feed it new data every time we want to change a parameter? You can imagine how time-consuming and expensive that would be!

That’s why we use validation. We divide the whole dataset into three parts: training, validation, and test sets. Here’s a simple sentence to help you remember:

We train the model on the training set, refine it on the validation set, and finally make predictions on the hitherto unseen test set.

Some common strategies for cross-validation are k-fold cross-validation and leave-one-out cross-validation (LOOCV).
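In code, the three-way split and a k-fold loop look roughly like this (a sketch; the random X and y below are placeholders for your own feature matrix and target):

import numpy as np
from sklearn.model_selection import train_test_split, KFold

X = np.random.rand(100, 5)  # placeholder features
y = np.random.rand(100)     # placeholder target

# hold out a test set, then carve a validation set out of the rest
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
X_train, X_valid, y_train, y_valid = train_test_split(X_train, y_train, test_size=0.25)

# k-fold cross-validation: each fold takes a turn as the validation set
kf = KFold(n_splits=5, shuffle=True)
for train_idx, valid_idx in kf.split(X_train):
    fold_train, fold_valid = X_train[train_idx], X_train[valid_idx]
    # ...fit the model on fold_train and evaluate it on fold_valid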

Here is a comprehensive article on validation techniques and how to implement them in Python: Improve model performance using cross-validation (in Python/R)

  • www.analyticsvidhya.com/blog/2018/0…

Gradient descent

Let’s go back to the calculus we saw earlier and the need for optimization. How do we know we’ve reached the best model? We can make small changes in the equation and, with each change, check whether we’ve moved closer to the actual values.

Taking small steps in the direction of the best possible value is the basic intuition behind gradient descent. Gradient descent is one of the most important concepts you will encounter and revisit frequently in deep learning.
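Here is a tiny sketch of that intuition: gradient descent minimizing the one-variable function f(x) = (x - 3)^2, whose derivative is 2 * (x - 3) (the learning rate and starting point are arbitrary choices):

x = 0.0              # arbitrary starting point
learning_rate = 0.1

for step in range(100):
    gradient = 2 * (x - 3)             # slope of f at the current point
    x = x - learning_rate * gradient   # small step against the slope

print(x)  # approximately 3.0, the minimum of f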

For an explanation and implementation of gradient descent in Python, see: Introduction to the gradient descent algorithm (and its variants) in machine learning.

Linear model

What’s the simplest equation you can think of? Let me list a few:

  1. y = x + 1
  2. 4x + 3y - 2z = 56
  3. y = x/(1 - x)

Did you notice something common to all three equations? Yes, they’re all linear functions. What if we could use these functions to predict the value of y?

These are called linear models. You’d be surprised how popular linear models are in the industry. They are not too complex, they are interpretable, and with well-tuned gradient descent we can also achieve high evaluation metrics! Moreover, linear models form the basis for further learning. For example, did you know that you can build a logistic regression model using a simple neural network?

Here is a detailed guide that covers not only linear and logistic regression but also other linear models: 7 types of regression techniques in data science.

  • www.analyticsvidhya.com/blog/2015/0…

Overfitting and underfitting

You’ll often run into situations where your deep learning model performs well on the training set but gives poor accuracy on the validation set. This happens because the model learns every single pattern in the training set, including the noise, and so fails to generalize to the patterns in the validation set. This is called overfitting the data, and it means the model has become too complex.

On the other hand, if your deep learning model performs poorly on both the training set and the validation set, it is probably underfitting. Think of it as applying a linear equation (an overly simple model) to data that is actually non-linear (complex):

A simple analogy for overfitting and underfitting is a student in a math class:

  • Overfitting corresponds to the student who learns all the questions discussed in class by rote, but cannot answer different questions on the same concepts during the exam

  • Underfitting corresponds to the student who does poorly both in class and on exams. Our target is the model/student who does not need to memorize every question discussed in class, yet does well on the test, showing that he/she understands the concepts

Take a look at this intuitive explanation of overfitting and underfitting, and of how they compare: overfitting and underfitting in machine learning.

  • www.analyticsvidhya.com/blog/2020/0…

Bias and variance

In the simplest terms, bias is the difference between the actual values and the predicted values, while variance measures how much the output changes when the training data is changed.

Let’s quickly summarize what the classic bulls-eye diagram above illustrates:

  1. Top left: A very accurate model, so the error is low, meaning low bias and low variance. All the data points fit within the bulls-eye

  2. Top right: The predicted points are centered around the bulls-eye (low bias) but spread far apart from one another (high variance)

  3. Bottom left: The predicted values are clustered together (low variance) but far from the bulls-eye (high bias)

  4. Bottom right: The predicted points are neither near the bulls-eye (high bias) nor close to one another (high variance)

Both high bias and high variance lead to increased error. In general, high bias indicates underfitting, while high variance indicates overfitting. Achieving both low bias and low variance is very difficult; one often comes at the expense of the other.

In terms of model complexity, we can use the following figure to determine the optimal complexity of the model:

sklearn

Just like Pandas, there is another library that forms the foundation of machine learning: sklearn (scikit-learn), the most popular machine learning library. It contains a large collection of machine learning algorithms that you can apply to your data as ready-made functions.

In addition, sklearn even has functions for all the evaluation metrics, for cross-validation, and for scaling/standardizing data.

Here is an example of sklearn in action:

from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

regr = LinearRegression()

# train your model - remember how we train the model on our training set?
regr.fit(X_train, y_train)

# predict on our validation set to improve the model
y_pred = regr.predict(X_valid)

# evaluation metric: MSE
print('Mean Squared Error:', mean_squared_error(y_valid, y_pred))

# ...further improvement of our model

We can build a simple linear regression model in fewer than 10 lines of code!

Here are some great resources to learn more about SkLearn:

  • Introduction to machine learning with scikit-learn (sklearn)

    • Courses.analyticsvidhya.com/courses/get…
  • Everything you need to know about scikit-learn’s latest update

    • www.analyticsvidhya.com/blog/2020/0…

Endnotes

In this article, we covered the five basics to know before building your first deep learning model. From here you’ll encounter popular deep learning frameworks such as PyTorch and TensorFlow. They are built in Python, and since you now have a good command of Python, you’ll find it easy to learn how to use them.

Here are a few good articles about these frameworks:

  • Deep Learning Guide: How to implement neural networks in Python using TensorFlow

    • www.analyticsvidhya.com/blog/2016/1…
  • A beginner-friendly guide to PyTorch and how it works from scratch

    • www.analyticsvidhya.com/blog/2019/0…

Once you’ve built your foundation on these five pillars, you can explore more advanced concepts such as hyperparameter tuning, backpropagation, and more. Those are the concepts I’ve saved for deeper study.

Original article: www.analyticsvidhya.com/blog/2020/0…
