DLJS - Book Sample Chapters
Kevin Scott
Foreword
1. What is Deep Learning
Inference
2. Making Predictions
3. Data & Tensors
4. How to Prepare Image Data
Training
5. Training Your Neural Network
6. Training from Scratch
7. Working With Non-Linear Data
8. Structured Data
9. Recognizing Images
10. Transfer Learning with ImageNet
Conclusion
Resources
Hi!
This sample contains two chapters from my book, Deep Learning With
Javascript.
The first chapter talks about why you'd want to build a Neural Network in
Javascript; the second, how to get started doing Image Classification.
— Kevin
Foreword
Most software today gets written manually, line by line. You, the programmer,
specify everything about how a program should operate, from how a user
interacts to the way data is stored and retrieved.
Some have called this approach to code explicit programming, though a more
straightforward description would simply be "programming".
When you're building a login form, or any sort of system where you want this
action to cause that outcome, explicitly programming software makes a lot of
sense. You can reason through code like this. There's little danger that the
software wakes up one day on the wrong side of the bed and decides it's tired of
this whole "login business" and why don't we try logging users out for a change?
Your code will always do exactly what you've told it to.
But let's say you want software that can detect whether a photograph contains a
human face or not. How might you approach that?
// Explicit programming: spell out every rule by hand
// (hasEyes, hasEars, and friends are imaginary helper functions)
function isThisAHumanFace(pixels) {
  return (
    hasEyes(pixels) &&
    hasEars(pixels) &&
    hasNose(pixels) &&
    hasMouth(pixels)
  )
}
Perhaps you start by reasoning that a human head is made up of eyes, ears, a
nose, and a mouth. What if the photograph is from above, and all we see is the
top of the person's head? What about from below? What if they're upside down,
or half their face is in shadow? What if they're wearing a floppy hat? What if
they have a beard or have long hair over their eyes? Trying to define every
possibility in a scenario like this becomes very complicated, very quickly.
That's the promise of Deep Learning: tackling problems that have traditionally
been hard or impossible to solve with regular programming, and in the process
revolutionizing how we build software.
I wanted a book like this one when I began studying Deep Learning. I'm a hacker
at heart, and I learn by doing.
Most books on Deep Learning assume strong mathematics or statistics
backgrounds. Traditionally this hasn't been a problem, because anyone looking to
enter the field would be expected to tackle problems that required these sorts of
backgrounds.
The goal of this book is to demystify Neural Networks and prepare you for that
future, through hacking and through putting them to work in projects you'd
build in the real world.
To get the most out of this book, I assume you're familiar with modern Javascript,
or at least confident enough to fake it. If you do need a primer on Javascript,
there are a number of great recommendations in the Resources section.
How This Book is Organized
At the beginning of most chapters is a URL that provides a sandbox for writing
your code, with everything set up and ready to go, that looks like:
https://dljsbook.com/i/inference
You are of course welcome to run the examples locally if you'd prefer. If you do,
install the required libraries via npm. The examples in this sample rely on
TensorFlow.js and the book's @dljsbook/data package, so at a minimum:
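npm install @tensorflow/tfjs @dljsbook/data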
The field of Deep Learning changes incredibly fast. If anything is missing,
confusing, or just plain wrong, please let me know at feedback@dljsbook.com so
I can fix it for the next version.
Happy hacking!
1. What is Deep Learning
When I was growing up, I read a story about a man from Serbia named Kalfa
Manojlo. He was a blacksmith's apprentice who dreamt of becoming the first man
in human history to fly.
He wasn't the first to try. People had been flapping their arms like birds for
centuries with no apparent success, but Manojlo, with the confidence of
somebody too young to know what they don't know, believed he could build a
pair of wings better than any that had come before. On a chilly November day in
1841 Manojlo took his winged contraption, scaled the town Customs Office, and
launched himself into the air above a crowd of bemused onlookers.
You've probably never heard of this guy unless you're Serbian, so you can
probably guess what happened: Manojlo landed head first in a nearby snowbank
to much amusement from the gathered crowd. (Pretty good entertainment in the
1840s.)
Many early attempts at flight were like this. People thought that to fly like a bird,
you had to imitate a bird. It's not an unreasonable assumption; birds have been
flying pretty darn well for a pretty long time. But to conquer flight on a human
scale requires a fundamentally different approach.
Neural Networks, the engines that power Deep Learning, are inspired by the
human brain, in particular its capacity to learn over time. Similar to biological
brains, they are composed of cells connected together that change in response to
exposure to stimuli. However, Neural Networks are really closer to statistics
on steroids than they are to a human brain, and the strategies we'll use to build
and train them diverge pretty quickly from anything related to the animal
kingdom.
Neural Networks have been around since at least the '50s, and from the beginning
people have asked when we might expect machines to achieve consciousness.
Depending on the speaker, the term Artificial Intelligence can mean anything
from Logistic Regression to Skynet taking over the world. For this reason, we'll
instead refer to the technology we're interested in as Machine Learning and
Deep Learning.
Machine Learning is the act of making predictions. That's it. You put data in, you
get data out. This includes a large range of methods not under the Deep Learning
umbrella, including many traditional statistical methods. And Deep Learning
covers the specific technology we'll study in this book, Neural Networks.
It turns out that asking a machine to make predictions, and giving it the tools to
improve those predictions, is useful in a startlingly diverse set of applications.
You've probably encountered Neural Networks in use through one of the popular
virtual assistants, like Siri, Alexa, or Google Home, on the market today. When
you use your face or fingerprint to unlock your phone, that's a Neural Network.
There's a Neural Network running on your phone's keyboard, predicting which
words you're likely to type next, or autosuggesting likely phrases in your email
application. Perhaps you've been prompted to tag your friends in uploaded
photos, with your friends' faces highlighted and their names autosuggested.
Deep Learning can recognize and classify images with accuracy that rivals, and
in some tasks exceeds, that of humans. Deep Learning monitors your inbox for
spam and your purchases for fraud. An autonomous agent uses Deep Learning to
decide which move to make in a game of Go. An autonomous car uses Deep
Learning to decide whether to speed up, slow down, or turn right or left. Hedge
funds use Deep Learning to predict which stocks to buy, and stores use it to
forecast demand. Magazines and newspapers use Deep Learning to automatically
generate summaries of sporting events, and doctors are using Deep Learning to
identify cancerous cells, perform surgery, and sequence genomes.
There's almost no industry that won't realize huge changes from Deep Learning,
and these changes are coming not in decades, but today and over the next few
years.
Andrew Ng, co-founder of Google Brain, often refers to this technology as "the
new electricity": a technology that will become so ubiquitous as to be embedded
in every device, everywhere around us. This oncoming sea change has huge
implications for how we build applications and craft software. Andrej Karpathy,
Director of AI at Tesla, calls it "Software 2.0":
It turns out that a large portion of real-world problems have the property
that it is significantly easier to collect the data (or more generally,
identify a desirable behavior) than to explicitly write the program. In
these cases, the programmers will split into two teams. The 2.0
programmers manually curate, maintain, massage, clean and label
datasets; each labeled example literally programs the final system
because the dataset gets compiled into Software 2.0 code via the
optimization. Meanwhile, the 1.0 programmers maintain the surrounding
tools, analytics, visualizations, labeling interfaces, infrastructure, and the
training code. — Andrej Karpathy
Intelligent Devices
Traditionally, Neural Networks have been run exclusively on servers, massive
computers with the computing horsepower to support them. That's beginning to
change.
And just in time, because there are compelling reasons to run Neural Networks
directly on the device.
One of the biggest is latency. If you're in a self-driving car, you can't rely on a cloud
connection to detect whether pedestrians are in front of you. Even with a good
connection, it's hard to do real-time analysis on a 60 FPS video or audio stream if
you're processing it on the server; that goes double for cutting-edge AR and VR
applications. Processing directly on the device avoids the round trip.
As more companies face the decision of whether to deploy their Neural Network
on a server or directly on the device, increasingly that answer will be the device.
It just makes sense.
Javascript is ideal for our purposes in this book, because you already have it
installed (through your web browser) and it excels at handling rich, interactive
experiences. Though all the Networks we'll write in this book are
designed to run in a browser, you may wish to tackle larger datasets requiring
more computation in the future; if so, all examples can be ported to run in
Node.js and can take advantage of whatever server-side GPUs you have at your
disposal.
The biggest drawback I see for using Javascript for Deep Learning is the nascent
ecosystem. npm still lags Python's tools in the breadth and depth of packages
supporting Deep Learning, and a huge amount of resources, tutorials, and books
demonstrate AI concepts in Python or R. To me, this presents an opportunity as a
community to step up and contribute the next generation of tools. Javascript is a
wily language and I have no doubt developers will fill in the gaps soon.
A Lone Neuron
The Neuron takes in a single number and transforms it. You'll specify the nature
of this transformation when you architect your Network.
Weights describe the strength of the connection between a given pair of Neurons.
A bigger weight implies a stronger connection between one Neuron and another,
increasing the influence that Neuron will have on the final prediction.
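As a rough sketch in plain Javascript - the sigmoid squashing function and the
sample numbers here are just illustrative choices - a lone Neuron boils down to
a tiny function:

// A lone Neuron: weight the input, add a bias, then squash the
// result into the range (0, 1)
const sigmoid = (x) => 1 / (1 + Math.exp(-x))
const neuron = (input, weight, bias) => sigmoid(input * weight + bias)

neuron(0.5, 0.8, -0.1) // ≈ 0.57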
Picture a three-layer Neural Network. The first layer is the input
layer. This is where data enters the Network. The last layer is the output layer,
responsible for emitting the transformed values from the Network.
The layer in the middle is called a hidden layer. Hidden layers make up the bulk
of Neural Networks, and it's Networks with a lot of hidden layers that give rise to
the phrase "Deep Learning": those Networks are "Deep". There's no limit to the
number of hidden layers you can use.
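In TensorFlow.js, the library we'll use throughout this book, a Network like the
one just described can be sketched in a few lines. The layer sizes below are
arbitrary choices, just to show the shape of the API:

import * as tf from '@tensorflow/tfjs'

const model = tf.sequential()
// Hidden layer; inputShape describes the input layer implicitly
model.add(tf.layers.dense({ units: 4, inputShape: [2], activation: 'relu' }))
// Output layer, emitting the Network's transformed values
model.add(tf.layers.dense({ units: 1, activation: 'sigmoid' }))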
Inference describes the flow of data moving forward in one direction through the
Network, from the input layer to the output layer. You can think of this as the
network predicting the expected value, based on the given input values. For
instance, you might feed it a picture of an animal, and the Network might answer
"dog".
When you do this, specific Neurons fire based on the presence or absence of
certain characteristics: Does it have fur? Does it have floppy ears? Is its tongue
hanging out at an odd angle? Based on the answers to these questions, the
Network might answer "Yes, I think this is a dog!"
Inference is usually, though not always, how your users will interact with your
Neural Network.
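With a trained model in hand, Inference amounts to a single call - a minimal
sketch, assuming input is a Tensor holding your data:

// Inference: data flows forward through the Network;
// no weights change during this pass
const output = model.predict(input)
output.print()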
Training describes the flow of data forward and backward through the network.
Based on the accuracy of the Network's predictions, changes ripple backwards,
adjusting weights so that the network can improve and produce more accurate
predictions in the future. You might feed the network a hundred photos of dogs,
allowing the network to figure out - on its own - that all dogs tend to have fur
and oddly angled tongues.
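As a minimal sketch with the model from above, assuming inputs and labels are
Tensors holding examples and their correct answers:

// Tell the Network how to measure and reduce its error...
model.compile({ optimizer: 'sgd', loss: 'meanSquaredError' })
// ...then let it adjust its own weights over repeated passes
await model.fit(inputs, labels, { epochs: 100 })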
You may see the terms forward propagation and backpropagation. These are the
formal terms for describing Inference and an element of Training, respectively.
Often these two phases - Inference and Training - are approached separately, and
in this book that's how we'll tackle them. We'll start by looking at Inference,
including how to interact with pretrained Neural Networks, how to pass them
data, and how to interpret predictions. After that, we'll discuss Training,
including how to build Neural Networks from scratch, and how to train them to
return accurate predictions.
We'll start by learning how to load a Neural Network in our browser and use it.
2. Making Predictions
https://dljsbook.com/i/inference
Have you ever built an app that relies on users to accurately tag the things they
upload? If so, you've probably found you can't rely on users to tag things
accurately, or tag things at all.
Part of the problem is it's a hassle. Remember Google+? One of its major
innovations was prompting you to categorize your friends into "Circles" to give
you a more relevant feed, but nobody had time for that.
To make it easier to collect accurate tags from our users, we can build a Neural
Network that will automatically suggest tags when an image is uploaded.
For this example and subsequent chapters, we'll use a Neural Network called
MobileNet. (Not all Neural Networks have names, but this one does.) MobileNet
is a Neural Network developed by Google. It's designed to be small and efficient,
perfect for usage in a Browser.
All Neural Networks are trained on some collection of data, and if you have
access to the original dataset, you can expect the Network to perform at or near
its theoretical best. MobileNet was trained on a dataset called ImageNet. Its
breadth and quality of classification (1,000 categories to choose from), along
with its size (an average of 500 images per category), have made it a sort of
"gold standard" for Image Classification benchmarks.
Samples of ImageNet
Let's jump into the dataset and see what things look like. The @dljsbook/data
package provides a handy interface for loading images from ImageNet. You can
load an image with:
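A sketch of what that looks like - treat the ImageNet class and its load method
as illustrative stand-ins, and defer to the sandbox for the exact call:

import { ImageNet } from '@dljsbook/data'

const imagenet = new ImageNet()
const image = await imagenet.load()
image.print()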
This will print an image to tfvis if it's available, or to your browser console.
Any call to print from @dljsbook accepts an optional HTML element as the
first argument, like print(document.getElementById('root')).
Printing an image from ImageNet to your browser
Now that you've seen the kind of data we're dealing with, let's see an example of
Inference in action. We'll load MobileNet and feed it a random image, and see
how accurate it is. The code to perform Inference with MobileNet is:
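A sketch of that code follows. The tf.loadModel call and the overall flow come
straight from the steps below; the ImageNet helper and its load method are
illustrative stand-ins for the @dljsbook/data API, so defer to the sandbox for
the exact calls:

const model = await tf.loadModel('path_to_a_network') // fetch MobileNet
const imagenet = new ImageNet() // illustrative @dljsbook/data helper
const image = await imagenet.load() // a random ImageNet image
image.print()
const prediction = model.predict(image.tensor) // ask MobileNet what it sees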
If we open up tfvis or the browser console, we should see an image and its associated
category. Run the code again to load a new, random image. We'll step through
this code line by line.
The first line loads a Neural Network from the web into your browser:
tf.loadModel('path_to_a_network')
tf.loadModel returns a promise that resolves with the Neural Network. Once
the Network has loaded, we can load a random image from the ImageNet dataset:
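Continuing the sketch (again, the ImageNet helper is illustrative):

const imagenet = new ImageNet()
const image = await imagenet.load()
const prediction = model.predict(image.tensor)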
This returns a prediction. Let's next look at the prediction variable. To translate
prediction into a normal array, we can write:
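One way to do that is with the Tensor's built-in dataSync method:

// Pull the Tensor's values into a plain Javascript array
const values = Array.from(prediction.dataSync())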
If you write console.log(prediction) directly, you'll see some strange
output. This is because prediction is a Tensor, not a number. To see the
output of prediction directly, write prediction.print() . Tensors are a
type of data container we'll cover in depth in Chapter 3.
We can find the category with the highest prediction with this snippet:
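Using the values array from above:

// The index of the highest score is the predicted category
const pred = values.indexOf(Math.max(...values))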
This pulls out the index of the highest number, corresponding to the category
the Network is most confident matches the image. We can map the value of pred
to the string name of its category:
console.log(ImageNet.classes[pred])
// Dog
Why does the Network return a number instead of the name of the category
directly?
In the previous chapter, we discussed how Inference was like the Network
looking at a picture of a dog and asking: "Does it have floppy ears? A tongue? It's
a dog!" In reality, it's more accurate to say that a Network receives data as a series
of numbers - 1, 1, 0, 1, 0 - each representing a pixel value, and the Network emits a
number - 61, for example - that must in turn be interpreted as "dog" or not.
Neural Networks deal entirely in numbers, and all the data you send through and
collect from the Network needs to be converted. While converting data that
comes out of the network is usually pretty straightforward, preparing and
converting the data before it goes into the model can be trickier.
Thanks for reading!
Again, I'd love to hear any feedback you have to offer, good or bad. You can write
to me at:
feedback@dljsbook.com
— Kevin