1. Where do text models currently have a major deficiency?

    Text models are not good at generating correct responses: we don’t have a reliable way to combine a knowledge base with a deep learning model so that it generates factually correct language. Medical responses are a good example: it is easy to create content that appears compelling to a layperson but is actually entirely incorrect.

  2. What are possible negative societal implications of text generation models?

    A possible negative implication is that context-appropriate, highly compelling responses could be generated on social media at a massive scale, thousands of times greater than any troll farm previously seen, and used to spread disinformation, create unrest, and encourage conflict.

  3. In situations where a model might make mistakes, and those mistakes could be harmful, what is a good alternative to automating a process?

    A good alternative is to use deep learning not as an entirely automated process, but as part of a process in which the model and a human user interact closely. This can make humans orders of magnitude more productive than they would be with entirely manual methods, and result in a more accurate process than using a human alone.

  4. What kind of tabular data is deep learning particularly good at?

    Deep learning is particularly good at analyzing time series and tabular data. It greatly increases the variety of columns you can include, such as columns containing natural language (book titles, reviews, etc.) and high-cardinality categorical columns (columns with a large number of discrete choices, such as zip code or product ID).
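
    A minimal sketch with fastai's tabular API; the DataFrame `df` and its column names here are hypothetical, and the target is assumed to be categorical:

    ```python
    from fastai.tabular.all import *

    # Hypothetical DataFrame `df`: 'zip_code' is a high-cardinality
    # categorical column, 'age' is continuous, and 'salary' is a
    # categorical target (e.g. '>=50k' / '<50k').
    dls = TabularDataLoaders.from_df(
        df, y_names='salary',
        cat_names=['zip_code'], cont_names=['age'],
        procs=[Categorify, FillMissing, Normalize])

    learn = tabular_learner(dls, metrics=accuracy)
    learn.fit_one_cycle(3)
    ```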

  5. What's a key downside of directly using a deep learning model for recommendation systems?

    They have the downside that they tell you only which products a particular user might like, rather than which recommendations would actually be helpful to that user. Many products a user might like are not helpful to recommend, for instance products the user is already familiar with or has already purchased.

  6. What are the steps of the Drivetrain Approach?

    The basic idea is to start by considering your objective; then think about what actions you can take to meet that objective (the levers) and what data you have, or can acquire, that can help; and finally build a model that you can use to determine the best actions to take to get the best results in terms of your objective.

  7. How do the steps of the Drivetrain Approach map to a recommendation system?

    The objective of a recommendation engine is to drive additional sales by surprising and delighting the customer with recommendations of items they would not have purchased without the recommendation. The lever is the ranking of the recommendations. New data must be collected to generate recommendations that will cause new sales; this requires conducting many randomized experiments in order to collect data about a wide range of recommendations for a wide range of customers. This is a step that few organizations take, but without it, you don’t have the information you need to optimize recommendations based on your true objective (more sales).

  8. Create an image recognition model using data you curate, and deploy it on the web.

  9. What is DataLoaders?

    DataLoaders is a thin class that just stores whatever DataLoader objects you pass to it and makes them available as train and valid. In fastai, it provides the data for your model.
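
    In fact, its (slightly simplified) definition in the fastai source, as shown in the book, is just:

    ```python
    # Stores the DataLoader objects passed in; exposes the first as
    # `train` and the second as `valid`.
    class DataLoaders(GetAttr):
        def __init__(self, *loaders): self.loaders = loaders
        def __getitem__(self, i): return self.loaders[i]
        train,valid = add_props(lambda i,self: self[i])
    ```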

  10. What four things do we need to tell fastai to create DataLoaders?

    To turn our downloaded data into a DataLoaders object, we need to tell fastai at least four things (illustrated in the example after this list):

    • What kinds of data we are working with
    • How to get the list of items
    • How to label these items
    • How to create the validation set
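
    For example, the book's bear classifier provides all four at once (`path` is assumed to point at the downloaded images):

    ```python
    from fastai.vision.all import *

    bears = DataBlock(
        blocks=(ImageBlock, CategoryBlock),  # kinds of data: images in, categories out
        get_items=get_image_files,           # how to get the list of items
        get_y=parent_label,                  # how to label them: by parent folder name
        splitter=RandomSplitter(valid_pct=0.2, seed=42),  # how to create the validation set
        item_tfms=Resize(128))               # resize items so they can be batched

    dls = bears.dataloaders(path)
    ```
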
  11. What does the splitter parameter to DataBlock do?

    The splitter parameter tells the DataBlock how to split the items into training and validation sets. In the example above, RandomSplitter(valid_pct=0.2, seed=42) randomly reserves 20% of the items for validation.

  12. How do we ensure a random split always gives the same validation set?

    We want to split our training and validation sets randomly, but we would also like the same training/validation split each time we run the notebook, so we fix the random seed. (Computers don’t really know how to create random numbers at all; they simply produce lists of numbers that look random. If you provide the same starting point for that list each time, called the seed, you will get the exact same list each time.)
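
    A minimal sketch of the determinism, using fastai's RandomSplitter on a toy list:

    ```python
    from fastai.data.transforms import RandomSplitter

    items = list(range(100))
    # Same seed, same split: the validation indices are identical across runs.
    train_a, valid_a = RandomSplitter(valid_pct=0.2, seed=42)(items)
    train_b, valid_b = RandomSplitter(valid_pct=0.2, seed=42)(items)
    assert list(valid_a) == list(valid_b)
    ```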

  13. What letters are often used to signify the independent and dependent variables?

    The independent variable is often referred to as x, and the dependent variable is often referred to as y.

  14. What's the difference between the crop, pad, and squish resize approaches? When might you choose one over the others?

    Crop cuts out part of the image, removing some of the features that allow us to perform recognition. Pad fills the borders with zeros (black), creating empty space that wastes computation for our model and lowers the effective resolution of the part of the image we actually use. Squish (or stretch) distorts the image into an unrealistic shape, leading the model to learn that things look different from how they actually are. In practice, we instead randomly select part of the image and crop to just that part. On each epoch (one complete pass through all of the images in the dataset), we randomly select a different part of each image. This causes our model to learn to focus on, and recognize, different features in our images. It also reflects how images work in the real world: different photos of the same thing may be framed in slightly different ways.
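
    The book demonstrates the options by swapping the item_tfms on the bears DataBlock defined earlier:

    ```python
    # Squish/stretch to a square (may distort shapes)
    dls = bears.new(item_tfms=Resize(128, ResizeMethod.Squish)).dataloaders(path)

    # Pad with zeros (black) to a square (wastes computation on empty borders)
    dls = bears.new(item_tfms=Resize(128, ResizeMethod.Pad, pad_mode='zeros')).dataloaders(path)

    # Randomly crop a different part of each image every epoch
    dls = bears.new(item_tfms=RandomResizedCrop(128, min_scale=0.3)).dataloaders(path)
    ```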

  15. What is data augmentation? Why is it needed?

    Data augmentation refers to creating random variations of our input data, such that they appear different but do not change the meaning of the data. Examples of common data augmentation techniques for images are rotation, flipping, perspective warping, brightness changes, and contrast changes. It is needed because an untrained neural network knows nothing about how images behave; it doesn’t even recognize that when an object is rotated by one degree, it is still a picture of the same thing. Training a neural network with examples of images in which the objects are in slightly different places and are slightly different sizes helps it understand the basic concept of what an object is, and how it can be represented in an image.
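
    In fastai, a standard set of these augmentations comes from aug_transforms, applied to whole batches (again reusing the bears DataBlock from above):

    ```python
    bears = bears.new(item_tfms=Resize(128), batch_tfms=aug_transforms(mult=2))
    dls = bears.dataloaders(path)
    # unique=True repeats one image so the random variations are easy to see
    dls.train.show_batch(max_n=8, nrows=2, unique=True)
    ```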

  16. What is the difference between item_tfms and batch_tfms?
    • Item transforms (item_tfms) are pieces of code that run on each individual item, whether it is an image, category, or so forth.
    • Batch transforms (batch_tfms) are pieces of code that run on the GPU over an entire batch at a time, which requires all items in the batch to already be the same size (see the sketch below).
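
    The book's final pipeline for the bear classifier uses both together:

    ```python
    bears = bears.new(
        item_tfms=RandomResizedCrop(224, min_scale=0.5),  # per item, on the CPU
        batch_tfms=aug_transforms())                      # per batch, on the GPU
    dls = bears.dataloaders(path)
    ```
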
  17. What is a confusion matrix?

    A confusion matrix helps us see where the model is performing well and where it is making mistakes. In the bear classifier, the rows represent the actual black, grizzly, and teddy bears in the dataset, and the columns represent the images that the model predicted as black, grizzly, and teddy bears, respectively. The diagonal of the matrix shows the images that were classified correctly, and the off-diagonal cells represent those that were classified incorrectly.
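
    With a trained Learner (here assumed to be the bear classifier, learn), fastai can plot one directly:

    ```python
    from fastai.vision.all import *

    interp = ClassificationInterpretation.from_learner(learn)
    interp.plot_confusion_matrix()
    ```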

  18. What does export save?

    The export method saves the architecture and the trained parameters of the model. It also saves the definition of how to create your DataLoaders.
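
    Calling it writes everything to a single pickle file, by default export.pkl in the learner's path (learn is assumed to be a trained Learner):

    ```python
    from fastai.vision.all import *

    learn.export()               # writes 'export.pkl' by default
    Path().ls(file_exts='.pkl')  # fastai's patched ls confirms the file exists
    ```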

  19. What is it called when we use a model for getting predictions, instead of training?

    When we use a model for getting predictions, instead of training, we call it inference.
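
    Inference with an exported fastai model looks like this; the image path here is hypothetical:

    ```python
    from fastai.vision.all import *

    learn_inf = load_learner('export.pkl')
    # predict returns the class, its index, and the per-class probabilities
    pred_class, pred_idx, probs = learn_inf.predict('images/grizzly.jpg')
    ```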

  20. What are IPython widgets?

    IPython widgets are GUI components that bring together JavaScript and Python functionality in a web browser, and can be created and used within a Jupyter notebook.
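
    A minimal example, as used for the book's in-notebook app, is an upload button:

    ```python
    import ipywidgets as widgets

    btn_upload = widgets.FileUpload()  # renders a file-upload button
    btn_upload                         # display it in a notebook cell
    ```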

  21. When might you want to use CPU for deployment? When might GPU be better?

    We do not need a GPU to serve a model in production. For an image classification model, a single CPU is usually the most cost-effective choice, because you would normally be classifying one user’s image at a time. A GPU only pays off if you can wait for multiple users to submit their images, batch them up, and process them all at once; that makes users wait for their results, so it is only appropriate for a high-volume site.

  22. What are the downsides of deploying your app to a server, instead of to a client (or edge) device such as a phone or PC?

    Your application will require a network connection, and there will be some latency each time the model is called. Also, if your application uses sensitive data, your users may be concerned about an approach that sends that data to a remote server.

  23. What are three examples of problems that could occur when rolling out a bear warning system in practice?

    • Working with video data instead of images
    • Handling nighttime images, which may not appear in this dataset
    • Dealing with low-resolution camera images
    • Ensuring results are returned fast enough to be useful in practice
    • Recognizing bears in positions that are rarely seen in photos that people post online (for example from behind, partially covered by bushes, or a long way away from the camera)
  24. What is "out-of-domain data"?

    Out-of-domain data is data that our model sees in production that is very different from what it saw during training. There isn’t a complete technical solution to this problem; instead, we have to be careful about our approach to rolling out the technology.

  25. What is "domain shift"?

    Domain shift is a common problem whereby the type of data that our model sees changes over time. For instance, an insurance company may use a deep learning model as part of its pricing and risk algorithm, but over time the types of customers the company attracts and the types of risks they represent may change so much that the original training data is no longer relevant.

  26. What are the three steps in the deployment process?
    • Step 1 - Manual Process
      • Run model in parallel
      • Humans check all predictions
    • Step 2 - Limited Scope Deployment
      • Careful human supervision
      • Time or geography limited
    • Step 3 - Gradual Expansion
      • Good reporting systems needed
      • Consider what could go wrong