Deep Learning with Computer Vision
Training Description
You have probably heard that deep learning is making news across the world as one of the most promising techniques in machine learning, especially for analyzing image data. With every industry dedicating resources to unlock its potential, to stay competitive you will want to apply these models to tasks such as image tagging, object recognition, speech recognition, and text analysis.
Case studies
• In this training session you will build deep learning models for computer vision. One detailed case study uses an attention-based mechanism for image segmentation and object recognition.
• Another detailed case study covers image captioning and understanding, combining an attention-based CNN with a sequence model (LSTM).
• Image Classification
• Facial Recognition
• Object Detection
• Image Captioning
• Video Analysis
Techniques
• OpenCV-based techniques
• GAN
• Deep CNNs
• Deep RNNs
• Segmentation
• YOLO
Key skills
• Face recognition
• Image generation
• Video classification
• Image captioning
• Medical image segmentation
• Product search
• Optical character/word/sentence recognition
• Person detection
Pre-requisites
• This is an advanced-level session; it assumes good familiarity with machine learning.
• Working knowledge of Python
• Machine learning internals
• Sequence-based models (RNNs, LSTMs, GRUs, attention, language models), as covered in the "Modern Natural Language Processing (NLP) with Deep Learning" session (kindly refer to the first item in the brief TOC).
Instructional Method
This instructor-led course provides lecture topics and the practical application of deep learning and its underlying technologies. Most concepts are presented pictorially, and a detailed case study strings together the technologies, patterns, and design.
Topics
Software Tools
• TensorFlow
1. Installation
2. Sharing Variables
3. Creating Your First Graph and Running It in a Session
4. Managing Graphs
5. Visualizing the Graph and Training Curves Using TensorBoard
6. Implementing Gradient Descent
7. Lifecycle of a Node Value
8. Linear Regression with TensorFlow
9. Modularity
10. Saving and Restoring Models
11. Name Scopes
12. Feeding Data to the Training Algorithm
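Items 6 and 8 above (gradient descent, linear regression) reduce to one update rule, which TensorFlow's optimizers apply automatically via autodiff. As a minimal framework-free sketch, with made-up toy data (the learning rate and step count are illustrative choices, not from the course):

```python
import numpy as np

# Toy data for y = 3x + 2 with a little noise
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(100, 1))
y = 3.0 * X[:, 0] + 2.0 + rng.normal(0, 0.01, size=100)

w, b = 0.0, 0.0   # trainable parameters
lr = 0.1          # learning rate
for _ in range(500):
    pred = w * X[:, 0] + b
    err = pred - y
    # Gradients of mean squared error w.r.t. w and b
    grad_w = 2.0 * np.mean(err * X[:, 0])
    grad_b = 2.0 * np.mean(err)
    w -= lr * grad_w
    b -= lr * grad_b
```

After training, `w` and `b` should approach the true slope and intercept; in TensorFlow the same loop becomes a graph of ops plus a `GradientDescentOptimizer` step run in a session.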
• Keras
Convolutional Neural Networks Internals
• Pooling layer
• Image augmentation
• Convolutional layer
• History of CNNs
• Convolutional layers in Keras
• Code for visualizing an image
• Input layer
• How do computers interpret images?
• Practical example image classification
• Convolutional neural networks
• Dropout
• Attention Mechanism for CNN and Visual Models
1. Types of Attention
2. Glimpse Sensor in code
3. Attention mechanism for image captioning
4. Hard Attention
5. Applying the RAM on a noisy MNIST sample
6. Recurrent models of visual attention
7. Using attention to improve visual models
8. Reasons for sub-optimal performance of visual CNN models
9. Soft Attention
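Soft attention (item 9) is mechanically a softmax-weighted average of region features, whereas hard attention (item 4) samples a single location. A minimal sketch with made-up region features and attention scores:

```python
import numpy as np

def soft_attention(features, scores):
    """Weighted average of region features; weights = softmax(scores)."""
    e = np.exp(scores - scores.max())   # numerically stable softmax
    weights = e / e.sum()
    return weights @ features, weights

# 4 image regions, 3-dim features each (toy values)
features = np.array([[1., 0., 0.],
                     [0., 1., 0.],
                     [0., 0., 1.],
                     [1., 1., 1.]])
scores = np.array([0.1, 0.1, 0.1, 5.0])   # the model strongly attends region 3
context, weights = soft_attention(features, scores)
```

Because the weighted sum is differentiable, soft attention trains with plain backpropagation; hard attention needs techniques such as REINFORCE (as in the recurrent attention model, RAM).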
• Build Your First CNN and Performance Optimization
1. Convolution and pooling operations in TensorFlow
2. Convolutional operations
3. Using tanh
4. Convolution operations in TensorFlow
5. Regularization
6. Fully connected layer
7. Weight and bias initialization
8. Pooling, stride, and padding operations
9. CNN architectures and drawbacks of DNNs
10. Applying pooling operations in TensorFlow
11. Using sigmoid
12. Training a CNN
13. Using ReLU
14. Activation functions
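The convolution, activation, and pooling operations listed above compose into one forward pass. A naive NumPy sketch (valid convolution, stride 1, no padding, then ReLU and 2×2 max pooling; the input image and kernel are toy values, not course data):

```python
import numpy as np

def conv2d(img, kernel):
    """Valid 'convolution' (really cross-correlation, as in most DL libraries)."""
    kh, kw = kernel.shape
    oh, ow = img.shape[0] - kh + 1, img.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(img[i:i+kh, j:j+kw] * kernel)
    return out

def relu(x):
    return np.maximum(x, 0)

def max_pool(x, size=2):
    """Non-overlapping max pooling: keep the strongest response per window."""
    h, w = x.shape[0] // size, x.shape[1] // size
    return x[:h*size, :w*size].reshape(h, size, w, size).max(axis=(1, 3))

img = np.arange(36, dtype=float).reshape(6, 6)     # toy 6x6 "image"
kernel = np.array([[-1., 1.], [-1., 1.]])          # horizontal-gradient filter
fmap = max_pool(relu(conv2d(img, kernel)))
```

In TensorFlow the same pipeline is `tf.nn.conv2d` followed by `tf.nn.relu` and `tf.nn.max_pool`, operating on batched 4-D tensors.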
• Building, training, and evaluating our first CNN
1. Creating a CNN model
2. Defining CNN hyperparameters
3. Model evaluation
4. Dataset description
5. Loading the required packages
6. Running the TensorFlow graph to train the CNN model
7. Preparing the TensorFlow graph
8. Loading the training/test images to generate train/test set
9. Constructing the CNN layers
• Model performance optimization
1. Applying dropout operations with TensorFlow
2. Building the second CNN by putting everything together
3. Appropriate layer placement
4. Which optimizer to use?
5. Creating the CNN model
6. Dataset description and preprocessing
7. Number of neurons per hidden layer
8. Number of hidden layers
9. Batch normalization
10. Memory tuning
11. Training and evaluating the network
12. Advanced regularization and avoiding overfitting
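Dropout (item 1) as applied in TensorFlow is "inverted": units kept at training time are scaled up by 1/keep_prob so that inference needs no rescaling. A sketch with toy activations:

```python
import numpy as np

def dropout(x, keep_prob, rng):
    """Inverted dropout: zero units with prob 1-keep_prob, rescale the rest."""
    mask = rng.random(x.shape) < keep_prob
    return x * mask / keep_prob

rng = np.random.default_rng(42)
acts = np.ones((1000,))
dropped = dropout(acts, keep_prob=0.5, rng=rng)
```

Roughly half the units are zeroed and the survivors become 2.0, so the expected activation is unchanged; at test time dropout is simply switched off.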
• Popular CNN Model Architectures
1. Architecture insights
2. ResNet architecture
3. AlexNet architecture
4. VGG image classification code example
5. Introduction to ImageNet
6. VGGNet architecture
7. GoogLeNet architecture
8. LeNet
9. Traffic sign classifiers using AlexNet
10. Inception module
• Transfer Learning
1. Multi-task learning
2. Target dataset is small but different from the original training dataset
3. Autoencoders for CNN
4. Applications
5. Target dataset is large and similar to the original training dataset
6. Introduction to autoencoders
7. Convolutional autoencoder
8. Target dataset is large and different from the original training dataset
9. Transfer learning example
10. Feature extraction approach
11. Target dataset is small and similar to the original training dataset
12. An example of compression
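The feature-extraction approach listed above treats a frozen pre-trained model as a fixed function producing "bottleneck" features, then fits only a light classifier on top. In this sketch the frozen extractor is a stand-in (a random projection plus ReLU) and the data are toy clusters, purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
W_frozen = rng.normal(size=(8, 4))       # stand-in for a frozen pre-trained base

def extract_features(x):
    # Frozen forward pass: no gradients, no weight updates
    return np.maximum(x @ W_frozen, 0)

# Tiny labelled target dataset: two well-separated clusters (toy data)
X = np.vstack([rng.normal(0, 1, (20, 8)), rng.normal(5, 1, (20, 8))])
y = np.array([0] * 20 + [1] * 20)

# Only the new "head" is trained: here, nearest class mean in feature space
feats = extract_features(X)
means = np.array([feats[y == c].mean(axis=0) for c in (0, 1)])
pred = ((feats[:, None, :] - means[None]) ** 2).sum(-1).argmin(axis=1)
acc = (pred == y).mean()
```

With a real network the recipe is the same: run images through the frozen convolutional base once, cache the features, and train only the new classifier layers; fine-tuning additionally unfreezes some top layers.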
• GAN: Generating New Images with CNN
1. Feature matching
2. GAN code example
3. Deep convolutional GAN
4. Adding the optimizer
5. Training a GAN model
6. Semi-supervised learning and GAN
7. Pix2pix - Image-to-Image translation GAN
8. Calculating loss
9. Semi-supervised classification using a GAN example
10. CycleGAN
11. Batch normalization
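GAN training (items 5 and 8) alternates two losses over the same discriminator outputs. A framework-free sketch of the loss arithmetic, with made-up discriminator probabilities (the non-saturating generator loss shown here is the common practical variant):

```python
import numpy as np

def bce(p, label):
    """Binary cross-entropy for a discriminator probability p."""
    eps = 1e-7
    p = np.clip(p, eps, 1 - eps)
    return -(label * np.log(p) + (1 - label) * np.log(1 - p))

# Hypothetical discriminator outputs D(x) on real and D(G(z)) on fake batches
d_real = np.array([0.9, 0.8, 0.95])
d_fake = np.array([0.1, 0.2, 0.05])

# Discriminator wants real -> 1 and fake -> 0
d_loss = bce(d_real, 1).mean() + bce(d_fake, 0).mean()
# Generator wants the discriminator fooled: fake -> 1 (non-saturating loss)
g_loss = bce(d_fake, 1).mean()
```

Here the discriminator is winning (low `d_loss`, high `g_loss`); each training step would update the discriminator on `d_loss` and the generator on `g_loss` in turn.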
• Object Detection and Instance Segmentation with CNN
1. Creating the environment
2. Fast R-CNN (fast region-based CNN)
3. The differences between object detection and image classification
4. Mask R-CNN (Instance segmentation with CNN)
5. Cascading classifiers
6. Haar Features
7. Faster R-CNN (faster region proposal network-based CNN)
8. Traditional, non-CNN approaches to object detection
9. R-CNN (Regions with CNN features)
10. Running the pre-trained model on the COCO dataset
11. Why is object detection much more challenging than image classification?
12. The Viola-Jones algorithm
13. Preparing the COCO dataset folder structure
14. Downloading and installing the COCO API and Detectron library (OS shell commands)
15. Instance segmentation in code
16. Haar features, cascading classifiers, and the Viola-Jones algorithm
17. Installing Python dependencies (Python environment)
Deep Generative Models
• Deep Boltzmann Machines
• Back-Propagation through Random Operations
• Restricted Boltzmann Machines
• Generative Stochastic Networks
• Boltzmann Machines for Structured or Sequential Outputs
• Boltzmann Machines
• Other Boltzmann Machines
• Other Generation Schemes
• Directed Generative Nets
• Boltzmann Machines for Real-Valued Data
• Evaluating Generative Models
• Drawing Samples from Autoencoders
• Deep Belief Networks
• Convolutional Boltzmann Machines
OpenCV
• The Core Functionality (core module)
• Introduction to OpenCV
• Object Detection (objdetect module)
• Image Processing (imgproc module)
• Deep Neural Networks (dnn module)
• GPU-Accelerated Computer Vision (cuda module)
Deep learning for computer vision
• Similarity learning
• Human face analysis
• Face landmarks and attributes
1. Multi-Task Facial Landmark (MTFL) dataset
2. The Kaggle keypoint dataset
3. The Multi-Attribute Facial Landmark (MAFL) dataset
4. Learning the facial key points
• Face recognition
1. Finding the optimum threshold
2. The YouTube faces dataset
3. The labeled faces in the wild (LFW) dataset
4. The CelebFaces Attributes dataset
5. CASIA web face database
6. The VGGFace2 dataset
7. Computing the similarity between faces
• Face detection
• Face clustering
• Algorithms for similarity learning
1. Visual recommendation systems
2. DeepRank
3. FaceNet
4. The DeepNet model
5. Contrastive loss
6. Triplet loss
7. Siamese networks
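Triplet loss (item 6), popularized by FaceNet, pulls an anchor embedding toward a positive (same identity) and pushes it away from a negative by at least a margin. A sketch with toy 2-D embeddings (real systems use, e.g., 128-D L2-normalized vectors):

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge on the gap between positive and negative squared distances."""
    d_pos = np.sum((anchor - positive) ** 2)
    d_neg = np.sum((anchor - negative) ** 2)
    return max(d_pos - d_neg + margin, 0.0)

anchor   = np.array([1.0, 0.0])
positive = np.array([0.9, 0.1])    # same identity: close to the anchor
negative = np.array([-1.0, 0.2])   # different identity: far from the anchor
loss = triplet_loss(anchor, positive, negative)
```

When the triplet is already satisfied (positive much closer than negative, as here) the loss is zero and contributes no gradient; mining "hard" triplets that violate the margin is what drives training.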
• Classification
• Image Classification
• The bigger deep learning models
1. The DenseNet model
2. The Google Inception-V3 model
3. The VGG-16 model
4. The SqueezeNet model
5. The AlexNet model
6. Spatial transformer networks
7. The Microsoft ResNet-50 model
• Other popular image testing datasets
1. The Fashion-MNIST dataset
2. The CIFAR dataset
3. The ImageNet dataset and competition
• Training the MNIST model in TensorFlow
1. The MNIST datasets
2. Building a multilayer convolutional network
3. Building a perceptron
4. Loading the MNIST data
• Training a model for binary classification
1. Transfer learning or fine-tuning of a model
2. Preparing the data
3. Augmenting the dataset
4. Benchmarking with simple CNN
5. Fine-tuning several layers in deep learning
• Developing real-world applications
1. Brand safety
2. Tackling the underfitting and overfitting scenarios
3. Gender and age detection from face
4. Choosing the right model
5. Fine-tuning apparel models
• Image Retrieval
• Model inference
1. Serving the trained model
2. Exporting a model
• Understanding visual features
1. Embedding visualization
2. The DeepDream
3. Visualizing activation of deep learning models
4. Adversarial examples
5. Guided backpropagation
• Content-based image retrieval
1. Matching faster using approximate nearest neighbour
2. Extracting bottleneck features for an image
3. Computing similarity between query image and target database
4. Autoencoders of raw images
5. Building the retrieval pipeline
6. Efficient retrieval
7. Advantages of ANNOY
8. Denoising using autoencoders
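The retrieval pipeline above boils down to: embed the query, compare it against the database of bottleneck features, and return the nearest items (exactly, or approximately via ANNOY for speed). An exact cosine-similarity sketch with random stand-in features:

```python
import numpy as np

rng = np.random.default_rng(7)
db = rng.normal(size=(100, 64))                   # bottleneck features of 100 images
db /= np.linalg.norm(db, axis=1, keepdims=True)   # L2-normalise once, up front

def retrieve(query, k=5):
    q = query / np.linalg.norm(query)
    sims = db @ q                                 # cosine similarity to every item
    return np.argsort(-sims)[:k]                  # indices of the k most similar

# Query with a slightly perturbed copy of image 42's features
hits = retrieve(db[42] + 0.01 * rng.normal(size=64))
```

Brute-force scoring is O(database size) per query; approximate nearest-neighbour indexes such as ANNOY trade a little recall for sublinear lookup.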
• Generative models
• Generative Adversarial Networks
1. Drawbacks of GAN
2. Image translation
3. InfoGAN
4. Conditional GAN
5. Adversarial loss
6. Vanilla GAN
• Applications of generative models
1. Inpainting
2. Super-resolution of images
3. Blending
4. 3D models from photos
5. Text to image generation
6. Transforming attributes
7. Creating training data
8. Image to image translation
9. Interactive image generation
10. Artistic style transfer
11. Creating new animation characters
12. Predicting the next frame in a video
• Neural artistic style transfer
1. Style transfer
2. Style loss using the Gram matrix
3. Content loss
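Style loss (item 2) compares Gram matrices, i.e. channel-by-channel correlations of feature maps, between the style image and the generated image, while content loss compares the feature maps directly. A sketch on random stand-in feature maps:

```python
import numpy as np

def gram_matrix(fmap):
    """fmap: (channels, height, width) -> (channels, channels) correlations."""
    c, h, w = fmap.shape
    flat = fmap.reshape(c, h * w)
    return flat @ flat.T / (h * w)

def style_loss(fmap_style, fmap_gen):
    """Mean squared difference between the two Gram matrices."""
    return np.mean((gram_matrix(fmap_style) - gram_matrix(fmap_gen)) ** 2)

rng = np.random.default_rng(0)
f_style = rng.normal(size=(4, 8, 8))   # toy feature map: 4 channels, 8x8
```

Because the Gram matrix discards spatial layout and keeps only which channels co-activate, matching it transfers texture and style without copying content.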
• Visual dialogue model
1. Algorithm for VDM
2. Discriminator
3. Generator
• Video analysis
• Extending image-based approaches to videos
1. Captioning videos
2. Regressing the human pose
3. Generating videos
4. Tracking facial landmarks
5. Segmenting videos
• Exploring video classification datasets
1. UCF101
2. YouTube-8M
3. Other datasets
• Understanding and classifying videos
• Approaches for classifying videos
1. Using trajectory for classification
2. Multi-modal fusion
3. Using 3D convolution for temporal learning
4. Classifying videos over long periods
5. Fusing parallel CNN for video classification
6. Attending regions for classification
7. Streaming two CNNs for action recognition
• Image captioning
• Understanding natural language processing for image captioning
1. Expressing words in vector form
2. Training an embedding
3. Converting words to vectors
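Converting words to vectors (item 3) is mechanically just a lookup: each word's index selects one row of a trained embedding matrix. A sketch with a tiny made-up vocabulary and a random (untrained) matrix:

```python
import numpy as np

rng = np.random.default_rng(3)
vocab = {"a": 0, "man": 1, "rides": 2, "horse": 3}
E = rng.normal(size=(len(vocab), 5))    # embedding matrix: one row per word

def embed(sentence):
    idx = [vocab[w] for w in sentence.split()]
    return E[idx]                        # shape (n_words, embedding_dim)

vecs = embed("a man rides a horse")
```

Training (item 2) adjusts the rows of `E` by backpropagation so that words used in similar contexts end up with similar vectors; the lookup itself never changes.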
• Implementing attention-based image captioning
• Approaches for image captioning and related problems
1. Retrieving captions from images and images from captions
2. Creating captions using image ranking
3. Using attention network for captioning
4. Using multimodal metric space
5. Knowing when to look
6. Using a conditional random field for linking image and text
7. Using RNN on CNN features to generate captions
8. Using RNN for captioning
9. Dense captioning
• Understanding the problem and datasets
• Detection or localization and segmentation
• Object Detection
• Object detection API
1. Re-training object detection models
2. Data preparation for the Pet dataset
3. The YOLO object detection algorithm
4. Monitoring loss and accuracy using TensorBoard
5. Pre-trained models
6. Training the model
7. Object detection training pipeline
8. Training a pedestrian detector for a self-driving car
• Detecting objects in an image
• Localizing algorithms
1. Convolution implementation of sliding window
2. Combining regression with the sliding window
3. Thinking about localization as a regression problem
4. Applying regression to other problems
5. The scale-space concept
6. Localizing objects using sliding windows
7. Training a fully connected layer as a convolution layer
• Detecting objects
1. Single shot multi-box detector
2. Regions of the convolutional neural network (R-CNN)
3. Fast R-CNN
4. Faster R-CNN
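All the detectors above (the R-CNN family and SSD) end with non-maximum suppression: keep the highest-scoring box, drop boxes that overlap it beyond an IoU threshold, and repeat. A sketch with made-up boxes in [x1, y1, x2, y2] form:

```python
import numpy as np

def iou(a, b):
    """Intersection over union of two [x1, y1, x2, y2] boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter)

def nms(boxes, scores, thresh=0.5):
    """Greedy non-maximum suppression: highest score first."""
    order = np.argsort(-np.asarray(scores))
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= thresh for j in keep):
            keep.append(i)
    return keep

boxes = [[0, 0, 10, 10], [1, 1, 10, 10], [20, 20, 30, 30]]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)
```

The second box overlaps the first at IoU 0.81 and is suppressed; the third, disjoint box survives.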
• Exploring the datasets
1. Intersection over Union
2. ImageNet dataset
3. PASCAL VOC challenge
4. COCO object detection challenge
5. Evaluating datasets using metrics
6. The mean average precision
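Mean average precision (item 6) averages, over classes, the area under each class's precision–recall curve; detections count as true positives when they match a ground-truth box by IoU (item 1). A sketch of AP for one class, given detections already sorted by score and marked true/false positive (this is the simple non-interpolated variant; PASCAL VOC and COCO use interpolated versions):

```python
import numpy as np

def average_precision(tp, n_gt):
    """tp: 1/0 per detection, sorted by descending score; n_gt: ground-truth count."""
    tp = np.asarray(tp, dtype=float)
    cum_tp = np.cumsum(tp)
    precision = cum_tp / (np.arange(len(tp)) + 1)   # precision after each detection
    recall = cum_tp / n_gt                          # recall after each detection
    # Sum precision at each new true positive, i.e. at each recall step
    return float(np.sum(precision * tp) / n_gt)

# 5 detections for one class, 3 ground-truth boxes in total (toy values)
ap = average_precision([1, 0, 1, 1, 0], n_gt=3)
```

mAP is then the mean of these per-class AP values, which is why it rewards both ranking quality (precision) and finding every object (recall).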
• Semantic Segmentation
• Segmenting satellite images
1. Modeling FCN for segmentation
• Datasets
• Predicting pixels
1. Understanding the earth from satellite imagery
2. Diagnosing medical images
3. Enabling robots to see
• Algorithms for semantic segmentation
1. Large kernel matters
2. The Fully Convolutional Network
3. RefineNet
4. Upsampling the layers by pooling
5. The SegNet architecture
6. DeepLab
7. PSPNet
8. Skipping connections for better training
9. Sampling the layers by convolution
10. Dilated convolutions
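Dilated convolutions (item 10) enlarge the receptive field without adding parameters or downsampling, which is why architectures such as DeepLab use them for dense prediction. A 1-D sketch showing the dilation arithmetic (toy signal and kernel):

```python
import numpy as np

def dilated_conv1d(x, kernel, dilation):
    """Valid 1-D convolution with gaps of (dilation - 1) between kernel taps."""
    k = len(kernel)
    span = (k - 1) * dilation + 1    # receptive field of a single output value
    out = [sum(kernel[t] * x[i + t * dilation] for t in range(k))
           for i in range(len(x) - span + 1)]
    return np.array(out)

x = np.arange(10, dtype=float)
out = dilated_conv1d(x, kernel=[1.0, 1.0, 1.0], dilation=2)
```

A 3-tap kernel at dilation 2 covers a span of 5 inputs (here each output is x[i] + x[i+2] + x[i+4]); stacking layers with growing dilation multiplies the receptive field while keeping resolution.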
• Ultra-nerve segmentation
• Segmenting instances
