Chapter 1 Questionnaire

Do you need these for deep learning?
- Lots of Math T/F:
  - False, High school math is sufficient.
- Lots of Data T/F:
  - False, Record breaking results have been seen with <50 items of data.
- Lots of expensive computers T/F:
  - False, you can get what you need for state-of-the-art work for free.
- A PhD T/F:
  - False, no formal training is necessary.
Name five areas where deep learning is now the best in the world.

Natural language processing, computer vision, medicine, biology, image generation. Other areas also include recommendation systems, playing games, robotics, and other applications such as financial and logistical forecasting, text to speech and much more.
What was the name of the first device that was based on the principle of the artificial neuron?

A psychologist named Frank Rosenblatt worked on building the first artificial neuron device, called the Mark I Perceptron.
Based on the book of the same name, what are the requirements for parallel distributed processing (PDP)?
The requirements for parallel distributed processing are:
- A set of processing units
- A state of activation
- An output function for each unit
- A pattern of connectivity among units
- A propagation rule for propagating patterns of activities through the network of connectivities
- An activation rule for combining the inputs impinging on a unit with the current state of that unit to produce an output for the unit
- A learning rule whereby patterns of connectivity are modified by experience
- An environment within which the systems must operate
What were the two theoretical misunderstandings that held back the field of neural networks?

The first theoretical misunderstanding came from the global academic community, which essentially gave up on neural networks. This happened because of a misreading of the book Perceptrons (MIT Press) by Marvin Minsky and Seymour Papert. Minsky and Papert showed that a single layer of perceptrons was unable to learn some simple but critical mathematical functions (such as XOR). This is what the community took away from the book. What they missed was that Minsky and Papert also showed in the same book that using multiple layers of perceptrons would allow these limitations to be addressed.

The second theoretical misunderstanding came from the implementation of neural networks. In theory, adding just one extra layer of neurons was enough to allow any mathematical function to be approximated with these neural networks, but in practice such networks were often too big and too slow to be useful.
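To make the XOR point concrete, here is a minimal sketch in plain Python. The weights and biases are hand-picked for illustration (they are not learned, and the function names are hypothetical), but they show that three threshold neurons arranged in two layers can compute XOR, which no single such neuron can do.

```python
def step(x):
    """Threshold activation: output 1 when the weighted sum is positive."""
    return 1 if x > 0 else 0

def perceptron(inputs, weights, bias):
    """A single artificial neuron: weighted sum of inputs, then a threshold."""
    return step(sum(i * w for i, w in zip(inputs, weights)) + bias)

def xor_two_layer(a, b):
    """XOR from two layers of neurons (hand-picked, illustrative weights)."""
    h_or = perceptron((a, b), (1, 1), -0.5)    # fires if a OR b
    h_and = perceptron((a, b), (1, 1), -1.5)   # fires only if a AND b
    # Output neuron fires when OR is on but AND is off.
    return perceptron((h_or, h_and), (1, -1), -0.5)

xor_table = {(a, b): xor_two_layer(a, b) for a in (0, 1) for b in (0, 1)}
print(xor_table)
```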
What is a GPU?

A Graphics Processing Unit (GPU), also known as a graphics card, is a special kind of processor in your computer that can handle thousands of single tasks at the same time.
Open a notebook and execute a cell containing: 1+1. What happens?
Running a cell executes the instructions in the cell, and the output, 2 in this example, is shown directly below the executed cell.
Follow through each cell of the stripped version of the notebook for this chapter. Before executing each cell, guess what will happen.

Running each cell executes the instructions in the cell and shows the results in the cell right below the executed cell.
Complete the Jupyter Notebook online appendix.

Go to https://oreil.ly/9uPZe to complete this. Make sure to go through it and learn the shortcuts and tricks towards the bottom of the page.
Why is it hard to use a traditional computer program to recognize images in a photo?
When we write a regular program, we are able to think about the process needed to complete the task and then translate it into code. We would need to tell the computer the exact steps required to solve the problem in detail. Recognizing objects in a photo is tricky because we cannot easily say what steps we ourselves follow to recognize an object in a picture.
What did Samuel mean by "weight assignment"?
By "weight assignment", Samuel meant choosing values for a set of variables called weights. These weights are the program's values that define how the program will operate.
What term do we normally use in deep learning for what Samuel called "weights"?
What Samuel called "weights" are generally referred to as model parameters. The term weight now is reserved for a particular type of model parameter.
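As a rough illustration of this idea, the sketch below treats a model as a function of both its inputs and its weights. The function name and the numbers are hypothetical and chosen only to show that changing the weight assignment changes what the same program computes.

```python
def model(inputs, weights):
    """Combine the inputs according to the current weight assignment."""
    return sum(x * w for x, w in zip(inputs, weights))

inputs = [1.0, 2.0, 3.0]

# Same inputs, two different weight assignments, two different behaviors.
result_a = model(inputs, [0.5, 0.5, 0.5])
result_b = model(inputs, [1.0, 0.0, -1.0])
print(result_a, result_b)
```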
Draw a picture that summarizes Samuel's view of a machine learning model.
See above image at the top of the page for Samuel's view of a machine learning model.
Why is it hard to understand why a deep learning model makes a particular prediction?
It is hard because it is not obvious what a deep model has learned for any given problem, such as vision. This is in contrast to a checkers program, where we have encoded strategies, a search mechanism, and weights that vary depending on the move and the checkerboard area.
What is the name of the theorem that shows that a neural network can solve any mathematical problem to any level of accuracy?
The universal approximation theorem shows that a neural network can, in theory, solve any problem to any level of accuracy.
What do you need in order to train a model?
Fundamentals for training a deep learning model:
- A model cannot be created without data.
- A model can learn to operate on only the patterns seen in the input data used to train it.
- This learning approach creates only predictions, not recommended actions.
- It is not enough to have just examples of input data. We need labels for that data. This means we need a label on each picture telling the computer what it is, such as a picture of a cat labeled as a cat or a picture of a dog labeled as a dog.
How could a feedback loop impact the rollout of a predictive policing model?
A positive feedback loop can impact the rollout of a predictive policing model. The more the model is used, the more biased the data becomes. This makes the model more biased, and the pattern repeats. For example, a predictive policing model is built from historical arrests. It is not predicting crime but rather predicting arrests. Law enforcement then uses the model to focus on the areas it predicts, thereby increasing arrests there. The additional arrests feed back into the model and make it more biased.
Do we always have to use 224×224-pixel images with the cat recognition model?
No. 224×224 is the standard size for historical reasons (old pretrained models require this size), but you can use any size. If you increase the size, you will often get a model with better results, because the model has more detail to focus on. The downside is that training becomes slower and memory consumption increases. The opposite is true if the images are smaller.
What is the difference between classification and regression?
- A classification model is a model that attempts to predict a class or category. It predicts from a number of discrete possibilities, such as "dog" or "cat".
- A regression model is a model that attempts to predict one or more numeric quantities, such as temperature or a location.
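The difference in output type can be sketched in a few lines of plain Python. Both functions and all values below are hypothetical, illustrative stand-ins rather than real models: one returns a label chosen from a fixed set, the other returns a number.

```python
def classify(scores, labels):
    """Classification: return the label with the highest score."""
    best = max(range(len(scores)), key=lambda i: scores[i])
    return labels[best]

def regress(features, weights, bias):
    """Regression: return a numeric quantity."""
    return sum(f * w for f, w in zip(features, weights)) + bias

predicted_class = classify([0.2, 0.8], ["dog", "cat"])   # a category
predicted_temp = regress([2.0, 3.0], [1.5, 0.5], 10.0)   # a number
print(predicted_class, predicted_temp)
```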
What is a validation set? What is a test set? Why do we need them?
- A validation set is a set of data held out from training, used only for measuring how good the model is. Since we see the validation set, we can indirectly influence the model as we explore and adjust hyperparameter values. This has the potential to allow for overfitting the validation data through human trial and error and exploration.
- A test set is another level of highly reserved data. This is different from the validation set because it is not used to improve the model. The test data needs to be totally hidden and is only used to evaluate the model at the very end.
What will fastai do if you don't provide a validation set?
Fastai will help you out and provide a default validation set for you. By default, fastai will set valid_pct to 0.2 which means it will hold out 20% of the data and not use it for training the model.
Can we always use a random sample for a validation set? Why or why not?
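No, not always. A random sample works when the items are independent, but for data such as a time series a random subset is not representative of how the model will be used; there it is better to hold out the most recent period as the validation set. When a random split is appropriate, it is helpful to set a random seed every time we run the code. This tells fastai to give the same validation set on every run. If we then change our model and retrain it, we know that any differences are due to the changes in the model and not due to having a different random validation set.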
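The 20% hold-out and the fixed random seed described above can be sketched with the standard library alone. This is not fastai's implementation: `split_indices` is a hypothetical helper, its `valid_pct` argument only mirrors the name used in the text, and the seed value 42 is an arbitrary assumption.

```python
import random

def split_indices(n_items, valid_pct=0.2, seed=42):
    """Shuffle item indices with a fixed seed and hold out valid_pct of them."""
    rng = random.Random(seed)          # fixed seed gives the same shuffle each run
    indices = list(range(n_items))
    rng.shuffle(indices)
    n_valid = int(n_items * valid_pct)
    return indices[n_valid:], indices[:n_valid]   # (train, valid)

train_a, valid_a = split_indices(100)
train_b, valid_b = split_indices(100)

# Same seed, same validation set, so model comparisons are like for like.
print(len(train_a), len(valid_a), valid_a == valid_b)
```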
What is overfitting? Provide an example.

Overfitting occurs when a model is trained in such a way that it remembers specific features of the input data, rather than generalizing well to data not seen during training. For example, an image model memorizes the images it has already seen and then makes poor predictions about new images.
What is a metric? How does it differ from "loss"?

A metric is a function that measures the quality of the model's predictions using the validation set, and it is printed at the end of each epoch. The purpose of the loss is to define a "measure of performance" that the training system can use to update the weights automatically. The loss is chosen to be easy for stochastic gradient descent to use. A metric is defined for human consumption: it should be easy to understand and match closely what you want your model to do.
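One way to see the difference is to compare how each responds to a small improvement in the predictions. The sketch below uses accuracy as the metric and binary cross-entropy as the loss; these are common choices but are assumptions here, as are the function names and the made-up probabilities.

```python
import math

def accuracy(probs, targets):
    """Metric: fraction of examples whose thresholded prediction is correct."""
    correct = sum((p > 0.5) == bool(t) for p, t in zip(probs, targets))
    return correct / len(targets)

def cross_entropy(probs, targets):
    """Loss: mean binary cross-entropy of the predicted probabilities."""
    total = sum(math.log(p if t else 1 - p) for p, t in zip(probs, targets))
    return -total / len(targets)

targets = [1, 0, 1, 0]
before = [0.60, 0.40, 0.70, 0.30]
after = [0.65, 0.35, 0.75, 0.25]   # slightly more confident, same decisions

print(accuracy(before, targets), accuracy(after, targets))
print(cross_entropy(before, targets), cross_entropy(after, targets))
```

In this toy case the accuracy is identical for both sets of predictions, while the loss drops. That sensitivity to small changes is what gives the training process something to follow, and the accuracy remains the easier number for a person to read.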
How can pretrained models help?
Pretrained models can help because they are the most important method we have for training more accurate models more quickly, with less data, time, and money.
What is the "head" of a model?
The "head" of a pretrained model is the last layer. The last layer is specifically customized to the original training task. We remove the last layer with cnn_learner and replace it with one or more new layers with randomized weights of an appropriate size for the dataset we are working with.
What kinds of features do the early layers of a CNN find? How about the later layers?
In the early layers, a CNN finds features such as diagonal, horizontal, and vertical edges, as well as various gradients. As we go up the layers, the model creates feature detectors that look for corners, repeating lines, circles, and other simple patterns. These are built upon the basic building blocks from the earlier layers. As we get into the later layers, the model is able to identify and match with higher-level semantic components, such as car wheels, text, and flower petals.
Are image models only useful for photos?
No, they can be used for many applications other than photos because a lot of other things can be represented as images. For example, a sound can be converted to a spectrogram. A time series can be converted into an image by plotting it on a graph. In general, a small number of general approaches in deep learning can go a long way if you are creative in how you represent your data.
What is an "architecture"?
Every model starts with a choice of "architecture", which is the template of the model that we're trying to fit. This is the actual mathematical function that we're passing the input data and parameters to.
What is segmentation?
Segmentation is creating a model that can recognize the content of every individual pixel in an image.
What is y_range used for? When do we need it?
The y_range parameter is used to tell fastai what range our target has. We need it when predicting a continuous number, rather than a category.
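A common way to enforce such a range is to squash the raw model output with a sigmoid and rescale it. The sketch below shows that idea in plain Python; `squash_to_range` is a hypothetical helper, not fastai's code, and the range of 0.5 to 5.5 is an illustrative assumption (for example, for ratings on a 1 to 5 scale).

```python
import math

def squash_to_range(x, low, high):
    """Map any real number into the open interval (low, high) via a sigmoid."""
    sigmoid = 1 / (1 + math.exp(-x))
    return sigmoid * (high - low) + low

y_range = (0.5, 5.5)   # illustrative target range

mid = squash_to_range(0.0, *y_range)        # raw output 0 lands in the middle
small = squash_to_range(-10.0, *y_range)    # very negative stays just above low
large = squash_to_range(10.0, *y_range)     # very positive stays just below high
print(mid, small, large)
```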
What are "hyperparameters"?
They are parameters about parameters, since they are the higher-level choices that govern the meaning of the weight parameters.
What's the best way to avoid failures when using AI in an organization?
The best way to avoid failures when using AI in an organization is to use a test set. If using an external vendor, we hold out test data that the vendor never gets to see. Then we check their model on the held-out test data, using a metric we choose based on what actually matters to us in practice. We then decide what level of performance is adequate.