Convolutional Neural Networks with Swift for Tensorflow: Image Recognition and Dataset Categorization
By Brett Koonce
About this ebook
Dive into and apply practical machine learning and dataset categorization techniques while learning Tensorflow and deep learning. This book uses convolutional neural networks to do image recognition all in the familiar and easy to work with Swift language.
It begins with a basic machine learning overview and then ramps up to neural networks and convolutions and how they work. Using Swift and Tensorflow, you'll perform data augmentation, build and train large networks, and build networks for mobile devices. You’ll also cover cloud training. The networks you build range from simple models that categorize greyscale data, such as MNIST, to large-scale modern approaches that categorize large datasets, such as ImageNet.
Convolutional Neural Networks with Swift for Tensorflow uses a simple approach that adds progressive layers of complexity until you have arrived at the current state of the art for this field.
What You'll Learn
- Categorize and augment datasets
- Build and train large networks, including via cloud solutions
- Deploy complex systems to mobile devices
Who This Book Is For
Developers with Swift programming experience who would like to learn convolutional neural networks by example using Swift for Tensorflow as a starting point.
© Brett Koonce 2021
B. Koonce, Convolutional Neural Networks with Swift for Tensorflow, https://doi.org/10.1007/978-1-4842-6168-2_1
1. MNIST: 1D Neural Network
Brett Koonce¹
(1) Jefferson, MO, USA
In this chapter, we will look at a simple image recognition dataset called MNIST and build a basic one-dimensional neural network, often called a multilayer perceptron, to classify our digits and categorize black and white images.
Dataset overview
MNIST (Modified National Institute of Standards and Technology) is a dataset put together in 1999 that is an extremely important testbed for computer vision problems. You will see it everywhere in academic papers in this field, and it is considered the computer vision equivalent of hello world. It is a collection of preprocessed grayscale images of handwritten digits, the numbers 0–9. Each image is 28 by 28 pixels, for a total of 784 pixels. For each pixel, there is a corresponding 8-bit grayscale value, a number from 0 (white) to 255 (completely black).
At first, we’re not even going to treat this as actual image data. We’re going to unroll it: starting with the top row, we pull off one row at a time and lay the rows end to end until we have a really long string of numbers. Doing this across all 28 by 28 pixels produces a long row of input values, a vector 784 values long, each a number from 0 to 255.
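As a minimal sketch of the unrolling described above, the following plain Swift snippet flattens a small grid of pixel values into a single row. The helper name `unroll` and the 3x3 stand-in image are hypothetical and chosen only for illustration; they are not part of Swift for Tensorflow.

```swift
// Hypothetical helper (not part of Swift for TensorFlow): flattens a 2D grid
// of pixel values into one long row, top row first.
func unroll(_ image: [[UInt8]]) -> [UInt8] {
    // joined() concatenates the rows in order, so row 0 comes first.
    return Array(image.joined())
}

// A tiny 3x3 stand-in for a 28x28 MNIST image.
let tiny: [[UInt8]] = [
    [0, 255, 0],
    [0, 255, 0],
    [0, 255, 0],
]
let row = unroll(tiny)
print(row)  // [0, 255, 0, 0, 255, 0, 0, 255, 0]

// A blank image of the real MNIST size unrolls to 28 * 28 = 784 values.
let blank = [[UInt8]](repeating: [UInt8](repeating: 0, count: 28), count: 28)
print(unroll(blank).count)  // 784
```

The same idea scales directly to the real data: only the row and column counts change.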
The dataset has been cleaned so that there’s not a lot of non-digit noise (e.g., off-white backgrounds). This will make our job simpler. If you download the actual dataset, you will usually get it in the form of a comma-separated file, with each row corresponding to an entry. We can convert a row back into an image by reversing the unrolling, assigning the values to pixels one at a time. The actual dataset is 60000 hand-drawn **training** digits with corresponding **labels** (the actual number), and 10000 **test** digits with corresponding **labels**. The dataset proper is usually distributed as a Python pickle file (a simple way of storing a dictionary). You don’t need to know this; it is mentioned just in case you run across it online.
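To make the "in reverse" step concrete, here is a hedged sketch that parses one comma-separated row and cuts the pixel values back into image rows. The row layout (label first, then pixels listed row by row) is an assumption about a typical CSV export, not something the text specifies, and `parseRow` is a hypothetical helper.

```swift
// Assumed layout (not confirmed by the text): "label,p0,p1,...,pN", with the
// pixels listed row by row. `parseRow` is a hypothetical helper.
func parseRow(_ line: String, width: Int) -> (label: Int, image: [[UInt8]])? {
    let fields = line.split(separator: ",")
    guard let first = fields.first, let label = Int(String(first)) else { return nil }
    let pixels = fields.dropFirst().compactMap { UInt8(String($0)) }
    // Reject rows with unparsable values or a pixel count that does not
    // divide evenly into rows of `width` pixels.
    guard width > 0, !pixels.isEmpty,
          pixels.count == fields.count - 1,
          pixels.count % width == 0 else { return nil }
    // Reverse the unrolling: cut the long row back into rows of `width` pixels.
    let image = stride(from: 0, to: pixels.count, by: width).map {
        Array(pixels[$0..<($0 + width)])
    }
    return (label: label, image: image)
}

// A 2x2 "image" labelled 7, standing in for a 28x28 MNIST row.
if let example = parseRow("7,0,255,128,0", width: 2) {
    print(example.label)  // 7
    print(example.image)  // [[0, 255], [128, 0]]
}
```

For real MNIST rows the width would be 28, giving 28 rows of 28 pixels each.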
So, our goal is to learn how to correctly guess what number we are looking at in the **test** dataset, based on our **model** that we have learned from the **training** dataset. This is called a **supervised learning** task since our goal is to emulate what another human (or model) has done. We will simply take individual rows and try to guess the corresponding digit using a simple version of a neural network called a **multilayer perceptron**. This is often shortened to **MLP**.
Dataset handler
We can use the dataset loader from swift-models, part of the Swift for Tensorflow project, to make dealing with the preceding dataset simpler. In order for the following code to work, you will need the following Swift Package Manager manifest, which automatically adds the datasets to your code.
BASIC: If you are new to Swift programming and just want to get started, simply use the swift-models checkout you got working in the chapter where we set up Swift for Tensorflow, place the following code (MLP demo) into the main.swift file in the LeNet-MNIST example, and run the command swift run LeNet-MNIST.
ADVANCED: If you are a Swift programmer already, here is the base swift-models import file we will be using:
```
// swift-tools-version:5.3
// The swift-tools-version declares the minimum version of Swift required to build this package.
import PackageDescription

let package = Package(
  name: "ConvolutionalNeuralNetworksWithSwiftForTensorFlow",
  platforms: [
    .macOS(.v10_13),
  ],
  dependencies: [
    .package(
      name: "swift-models", url: "https://github.com/tensorflow/swift-models.git", .branch("master")
    ),
  ],
  targets: [
    .target(
      name: "MNIST-1D", dependencies: [.product(name: "Datasets", package: "swift-models")],
      path: "MNIST-1D"
    ),
  ]
)
```
Hopefully, the preceding code is not too confusing. Importing this code library will make our lives much easier. Now, let’s build our first neural network!
Code: Multilayer perceptron + MNIST
Let’s look at a very simple demo. Put this code into a main.swift file with the proper imports, and we’ll run it:
```
// 1
import Datasets
import TensorFlow

// 2
struct MLP: Layer {
  var flatten = Flatten<Float>()
  // Sizes follow the text: 784 -> 512 -> 512 -> 10.
  // The relu activations on the first two layers are an assumption.
  var inputLayer = Dense<Float>(inputSize: 784, outputSize: 512, activation: relu)
  var hiddenLayer = Dense<Float>(inputSize: 512, outputSize: 512, activation: relu)
  var outputLayer = Dense<Float>(inputSize: 512, outputSize: 10)

  @differentiable
  public func forward(_ input: Tensor<Float>) -> Tensor<Float> {
    return input.sequenced(through: flatten, inputLayer, hiddenLayer, outputLayer)
  }
}

// 3
let batchSize = 128
let epochCount = 12
var model = MLP()
let optimizer = SGD(for: model, learningRate: 0.1)
let dataset = MNIST(batchSize: batchSize)

print("Starting training...")

for (epoch, epochBatches) in dataset.training.prefix(epochCount).enumerated() {
  // 4
  Context.local.learningPhase = .training
  for batch in epochBatches {
    let (images, labels) = (batch.data, batch.label)
    // Compute the loss and its gradient with respect to the model parameters.
    let (_, gradients) = valueWithGradient(at: model) { model -> Tensor<Float> in
      let logits = model(images)
      return softmaxCrossEntropy(logits: logits, labels: labels)
    }
    optimizer.update(&model, along: gradients)
  }

  // 5
  Context.local.learningPhase = .inference
  var testLossSum: Float = 0
  var testBatchCount = 0
  var correctGuessCount = 0
  var totalGuessCount = 0
  for batch in dataset.validation {
    let (images, labels) = (batch.data, batch.label)
    let logits = model(images)
    testLossSum += softmaxCrossEntropy(logits: logits, labels: labels).scalarized()
    testBatchCount += 1

    // A guess is correct when the largest logit sits at the label's index.
    let correctPredictions = logits.argmax(squeezingAxis: 1) .== labels
    correctGuessCount += Int(Tensor<Int32>(correctPredictions).sum().scalarized())
    totalGuessCount = totalGuessCount + batch.data.shape[0]
  }

  let accuracy = Float(correctGuessCount) / Float(totalGuessCount)
  print(
    """
    [Epoch \(epoch + 1)] \
    Accuracy: \(correctGuessCount)/\(totalGuessCount) (\(accuracy)) \
    Loss: \(testLossSum / Float(testBatchCount))
    """
  )
}
```
Results
When you run the preceding code, you should get an output that looks like this:
```
Loading resource: train-images-idx3-ubyte
Loading resource: train-labels-idx1-ubyte
Loading resource: t10k-images-idx3-ubyte
Loading resource: t10k-labels-idx1-ubyte
Starting training...
[Epoch 1] Accuracy: 9364/10000 (0.9364) Loss: 0.21411717
[Epoch 2] Accuracy: 9547/10000 (0.9547) Loss: 0.15427242
[Epoch 3] Accuracy: 9630/10000 (0.963) Loss: 0.12323072
[Epoch 4] Accuracy: 9645/10000 (0.9645) Loss: 0.11413358
[Epoch 5] Accuracy: 9700/10000 (0.97) Loss: 0.094898805
[Epoch 6] Accuracy: 9747/10000 (0.9747) Loss: 0.0849531
[Epoch 7] Accuracy: 9757/10000 (0.9757) Loss: 0.076825164
[Epoch 8] Accuracy: 9735/10000 (0.9735) Loss: 0.082270846
[Epoch 9] Accuracy: 9782/10000 (0.9782) Loss: 0.07173009
[Epoch 10] Accuracy: 9782/10000 (0.9782) Loss: 0.06860765
[Epoch 11] Accuracy: 9779/10000 (0.9779) Loss: 0.06677916
[Epoch 12] Accuracy: 9794/10000 (0.9794) Loss: 0.063436724
```
Congratulations, you’ve done machine learning! This demo is only a few lines long, but a lot is actually happening under the hood. Let’s break down what’s going on.
Demo breakdown (high level)
We will look at all of the preceding code, going through section by section using the number in the comments (e.g., //1, //2, etc.). We will first do a pass to try and explain what is going on at a high level and then do a second pass where we explain the nitty-gritty details.
Imports (1)
Our first few lines are pretty simple; we’re importing the swift-models MNIST dataset handler and then the TensorFlow library.
Model breakdown (2)
Next, we build our actual neural network, an MLP model:
```
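// 2
struct MLP: Layer {
  var flatten = Flatten<Float>()
  // Sizes follow the text: 784 -> 512 -> 512 -> 10.
  // The relu activations on the first two layers are an assumption.
  var inputLayer = Dense<Float>(inputSize: 784, outputSize: 512, activation: relu)
  var hiddenLayer = Dense<Float>(inputSize: 512, outputSize: 512, activation: relu)
  var outputLayer = Dense<Float>(inputSize: 512, outputSize: 10)

  @differentiable
  public func forward(_ input: Tensor<Float>) -> Tensor<Float> {
    return input.sequenced(through: flatten, inputLayer, hiddenLayer, outputLayer)
  }
}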
```
What’s in this data structure? Our first line just defines a new struct called MLP, which conforms to **Layer**, a protocol in Swift for Tensorflow. (Swift structs cannot subclass; they adopt protocols instead.) To conform, S4TF requires through the **protocol** definition that we implement the function **forward** (formerly **callAsFunction**), which takes an **input** and maps it to an **output**. Our middle lines then actually define the layers of our perceptron:
```
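var flatten = Flatten<Float>()
// Sizes follow the text; the relu activations are an assumption.
var inputLayer = Dense<Float>(inputSize: 784, outputSize: 512, activation: relu)
var hiddenLayer = Dense<Float>(inputSize: 512, outputSize: 512, activation: relu)
var outputLayer = Dense<Float>(inputSize: 512, outputSize: 10)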
```
We have four internal layers:
1) A flatten operation: This just takes the input and reduces it to a single row of input numbers (a vector). Our dataset is internally giving us a picture of 28x28 pixels, and this just converts it into a row of numbers, 784 pixels long.
Next, we have three **dense** layers, a type of neural network layer also known as a **fully connected** layer. The first goes from our initial input (e.g., the flattened 784x1 vector) to 512 nodes, like so.
2) A dense layer: 784 (the preceding input) to 512 nodes.
3) Another dense layer: 512 nodes to 512 nodes again.
4) An output layer: 512 nodes to 10 nodes (the number of digits, 0–9).
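To get a feel for the size of this network, the sketch below counts the trainable values in the three dense layers listed above. It assumes each dense layer stores one weight per input/output pair plus one bias per output node; the helper name `denseParameterCount` is hypothetical.

```swift
// Assumption: each dense layer has inputSize * outputSize weights
// plus one bias value per output node.
func denseParameterCount(inputSize: Int, outputSize: Int) -> Int {
    return inputSize * outputSize + outputSize
}

// The three dense layers from the list: 784 -> 512 -> 512 -> 10.
let layerSizes = [(784, 512), (512, 512), (512, 10)]
let counts = layerSizes.map { denseParameterCount(inputSize: $0.0, outputSize: $0.1) }
print(counts)               // [401920, 262656, 5130]
print(counts.reduce(0, +))  // 669706
```

The flatten operation only rearranges its input, so it contributes no parameters of its own.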
And then, finally, a forward function, which is where our neural network logic happens. We literally take the input and run it through the flatten, inputLayer, hiddenLayer, and outputLayer layers to produce our result.
And so our return input.sequenced(through: flatten, inputLayer, hiddenLayer, outputLayer) is then the call that actually takes the input and maps it through these four layers. We will look at the actual training loop next to understand how all of that actually happens, but a very large part of the magic of Swift for Tensorflow is on these few lines. We’ll talk a little bit more about what is happening here in a second, but conceptually this function is nothing more than applying the preceding four layers in a sequence.
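The idea of applying layers in a sequence can be sketched in plain Swift without any machine learning library. The `TinyDense` type and the free function `sequenced` below are hypothetical stand-ins written for illustration; they are not the Swift for Tensorflow `Dense` layer or its `sequenced(through:)` method, and the weights are made-up numbers.

```swift
// Hypothetical plain-Swift stand-in for a dense layer:
// output[j] = activation(bias[j] + sum over i of input[i] * weights[i][j]).
struct TinyDense {
    var weights: [[Float]]  // shape: [inputSize][outputSize]
    var bias: [Float]       // shape: [outputSize]
    var activation: (Float) -> Float = { $0 }  // identity by default

    func callAsFunction(_ input: [Float]) -> [Float] {
        var output = bias
        for (i, x) in input.enumerated() {
            for j in output.indices {
                output[j] += x * weights[i][j]
            }
        }
        return output.map(activation)
    }
}

// Applying layers in sequence is a left-to-right fold over the layer list:
// the output of one layer becomes the input of the next.
func sequenced(_ input: [Float], through layers: [TinyDense]) -> [Float] {
    return layers.reduce(input) { activations, layer in layer(activations) }
}

let relu: (Float) -> Float = { max(0, $0) }
let first = TinyDense(weights: [[1, -1], [2, 0]], bias: [0, 1], activation: relu)
let second = TinyDense(weights: [[1], [1]], bias: [0.5])
print(sequenced([1, 2], through: [first, second]))  // [5.5]
```

Here the first layer maps [1, 2] to [5, 0] and the second maps that to [5.5], which is the same "one layer feeds the next" pattern the MLP uses at a much larger scale.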
Global variables (3)
These lines are just setting up some different tools we’re going to use:
```
let batchSize = 128
let epochCount = 12
var model = MLP()
let optimizer = SGD(for: model, learningRate: 0.1)
let dataset = MNIST(batchSize: batchSize)
```
The first two lines set a couple of global variables: our batchSize (how many MNIST examples we are going to look at in each training step) and epochCount (the number of passes over the whole dataset we’re going to do).
The next line initializes our model, which we talked about earlier.
The fourth line initializes our optimizer, which we’re going to talk about more in a second.
The last line sets up our dataset handler.
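As a rough sanity check on these settings, the arithmetic below estimates how many model updates the run performs. Whether the final partial batch is dropped or kept depends on the dataset loader, so both cases are shown; treat the figures as a sketch rather than a statement about what swift-models does.

```swift
let trainingExamples = 60_000  // size of the MNIST training set
let batchSize = 128
let epochCount = 12

// Full batches only (a loader that drops the final partial batch).
let fullBatches = trainingExamples / batchSize
// Including one final, smaller batch (a loader that keeps the remainder).
let allBatches = (trainingExamples + batchSize - 1) / batchSize

print(fullBatches, allBatches)   // 468 469
print(fullBatches * epochCount)  // 5616
print(allBatches * epochCount)   // 5628
```

Either way, the optimizer adjusts the model several thousand times over the twelve epochs.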
The next line starts our actual training process by looping over our data:
```
for (epoch, epochBatches) in dataset.training.prefix(epochCount).enumerated() {
```
Now we can get into the actual training loop!
Training loop: Updates (4)
Here’s what the actual core of our training loop looks like. Conceptually, we’re going to be taking a set of pictures or **batch** and showing each individual picture to the first input set of dense nodes, which will **fire** and go to the next hidden set of dense nodes, which will **fire** and go to the final output set of dense nodes. Then, we will take all of the outputs of the final layer of our network, select the largest one, and look at it. If this node is the same number as the original input we gave it, then we will give the network a **reward** and tell it to increase its confidence in the results. If this answer is the wrong one, then we will give the network a **negative reward** and tell it to decrease its confidence in its results. By repeating this process using thousands of samples, our network can learn to accurately predict inputs it has never seen before.
```
Context.local.learningPhase = .training
for batch in epochBatches {
  let (images, labels) = (batch.data, batch.label)
  // Compute the loss and its gradient with respect to the model parameters.
  let (_, gradients) = valueWithGradient(at: model) { model -> Tensor<Float> in
    let logits = model(images)
    return softmaxCrossEntropy(logits: logits, labels: labels)
  }
  optimizer.update(&model, along: gradients)
}
```
How does this work under the hood? A little bit of calculus mixed together with all of our data. For each training batch, we get the raw pixel values (image data) and the corresponding labels (the actual number for each picture). Then, we determine the **gradient** for the **model** by calculating the values that the model predicts for the images (often written X) and then measuring how those predictions compare with the actual labels (often written y) using a function called softmaxCrossEntropy. Conceptually, softmax takes a collection of inputs and normalizes them across the set so that they behave like percentages that add up to 100%. It does this by raising e (the base of the natural logarithm) to the power of each input and then dividing by the sum of those exponentials, which has the useful dual properties of being consistent across arbitrary inputs and easy to evaluate on a computer. The cross-entropy part then scores how much of that probability landed on the correct label. Then, we update our **model** slightly in the direction that reduces the gap between its prediction and the right answer (reinforcing it when it was right, correcting it when it was wrong). Our learning rate determines how far we go each step (e.g., since our rate is 0.1, we only move 10% of the way along the direction the gradient suggests each time). In the for loop that calls all of this, we repeat this process across all of our data (one pass) for multiple rounds, or **epochs**.
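The following plain Swift sketch works through softmax, cross-entropy, and a single gradient step on one made-up example. It is not the Swift for Tensorflow implementation of softmaxCrossEntropy: the function names are hypothetical, the logits are invented, and for brevity the "parameters" being updated are the logits themselves rather than the weights of a network.

```swift
import Foundation

// Softmax: exponentiate each logit and divide by the sum of the exponentials,
// so the outputs are positive and add up to 1.
func softmax(_ logits: [Float]) -> [Float] {
    // Subtracting the maximum first avoids overflow and leaves the result unchanged.
    let maxLogit = logits.max() ?? 0
    let exps = logits.map { exp($0 - maxLogit) }
    let total = exps.reduce(0, +)
    return exps.map { $0 / total }
}

// Cross-entropy for one example: minus the log of the probability
// assigned to the true label.
func crossEntropy(probabilities: [Float], label: Int) -> Float {
    return -log(probabilities[label])
}

let logits: [Float] = [2.0, 1.0, 0.1]  // made-up network outputs for 3 classes
let label = 0                           // the correct class
let probabilities = softmax(logits)
let loss = crossEntropy(probabilities: probabilities, label: label)

// For softmax followed by cross-entropy, the gradient with respect to the
// logits is (probabilities - oneHot(label)).
var gradient = probabilities
gradient[label] -= 1

// One gradient descent step with the chapter's learning rate of 0.1.
// Simplification: we update the logits directly instead of network weights.
let learningRate: Float = 0.1
let updated = zip(logits, gradient).map { $0 - learningRate * $1 }
let newLoss = crossEntropy(probabilities: softmax(updated), label: label)
print(loss, newLoss)  // the second value is smaller than the first
```

A small step against the gradient lowers the loss a little, and repeating that over many batches is what the training loop does.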
Training loop: Accuracy (5)
Next, we run our model on our test data and calculate how often it was correct on images it hasn’t seen yet (but that we know the right answers to). So then, what does accuracy mean, and how do we calculate it? Our code looks like this:
```
Context.local.learningPhase = .inference
var testLossSum: Float = 0
var testBatchCount = 0
var correctGuessCount = 0
var totalGuessCount = 0
for batch in dataset.validation {
  let (images, labels) = (batch.data, batch.label)
  let logits = model(images)
  testLossSum += softmaxCrossEntropy(logits: logits, labels: labels).scalarized()
  testBatchCount += 1

  // A guess is correct when the largest logit sits at the label's index.
  let correctPredictions = logits.argmax(squeezingAxis: 1) .== labels
  correctGuessCount += Int(Tensor<Int32>(correctPredictions).sum().scalarized())
  totalGuessCount = totalGuessCount + batch.data.shape[0]
}

let accuracy = Float(correctGuessCount) / Float(totalGuessCount)
print(
  """
  [Epoch \(epoch + 1)] \
  Accuracy: \(correctGuessCount)/\(totalGuessCount) (\(accuracy)) \
  Loss: \(testLossSum / Float(testBatchCount))
  """
)
```
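The accuracy bookkeeping in the loop above can be illustrated with a small plain Swift sketch. The logits and labels here are invented, and `argmax` is a hypothetical helper that mirrors the idea of picking the largest output, not the tensor method used in the chapter's code.

```swift
// Index of the largest value: the class the network is most confident about.
func argmax(_ values: [Float]) -> Int {
    var best = 0
    for i in values.indices where values[i] > values[best] {
        best = i
    }
    return best
}

// Made-up logits for four images over three classes, with their true labels.
let batchLogits: [[Float]] = [
    [0.1, 2.0, 0.3],  // predicts 1
    [1.5, 0.2, 0.1],  // predicts 0
    [0.0, 0.4, 0.9],  // predicts 2
    [0.7, 0.6, 0.2],  // predicts 0 (wrong: the label is 1)
]
let labels = [1, 0, 2, 1]

let predictions = batchLogits.map(argmax)
let correctGuessCount = zip(predictions, labels).filter { $0 == $1 }.count
let accuracy = Float(correctGuessCount) / Float(labels.count)
print("\(correctGuessCount)/\(labels.count) (\(accuracy))")  // 3/4 (0.75)
```

Accuracy is simply the count of matching predictions divided by the number of examples checked.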
In a similar process to our training dataset, we simply take our test input images, run them through our model, and then compare our results to what we know the right answer to be. Then we literally calculate the number of correct answers divided by the total number of images to produce our accuracy percentage. Our final few lines just print out various numbers each pass through the dataset, or **epoch**, so we can see if our loss is decreasing