
Machine Learning Internals
Training Description
Machine learning is the science of getting computers to act without being explicitly programmed. In the past decade, machine learning has given us self-driving cars, practical speech recognition, effective web search, and a vastly improved understanding of the human genome. Machine learning is so pervasive today that you probably use it dozens of times a day without knowing it. Many researchers also think it is the best way to make progress towards human-level AI.
This training session provides a deep dive into machine learning, data mining, and statistical pattern recognition. Topics include: (i) supervised learning (parametric and non-parametric algorithms, support vector machines, kernels, neural networks); (ii) unsupervised learning (clustering, dimensionality reduction, recommender systems, deep learning); (iii) best practices in machine learning (bias/variance theory; the innovation process in machine learning and AI). The course also draws on numerous case studies and applications, so you will learn how to apply learning algorithms to building smart robots (perception, control), text understanding (web search, anti-spam), computer vision, medical informatics, audio, database mining, and other areas.
Key skills
- Measuring and tuning the performance of ML algorithms
- Not only the theoretical underpinnings of learning, but also the practical know-how needed to apply these techniques quickly and effectively to new problems
- The most effective machine learning techniques
- How to prototype a model and then productionize it
- Best practices in innovation as it pertains to machine learning and AI
- Using tools such as scikit-learn for ML tasks
Pre-requisites
- Experience in programming
- A familiarity with probability theory, calculus, linear algebra, and statistics is required
- An understanding of introductory statistics would be helpful
- Working knowledge of Python
Intended Audience
- Data scientists
- Software engineers
- Practitioners who want to take their skills to the next level, especially towards state-of-the-art NLP
Instructional Method
This is an instructor-led course that combines lectures with the practical application of machine learning and its underlying technologies. Most concepts are presented pictorially, and a detailed case study strings together the technologies, patterns, and design.
Topics
Introduction
- Model selection 
- Supervised learning 
- Discovering graph structure 
- Types of machine learning 
- Machine learning: what and why? 
- Parametric vs non-parametric models 
- No free lunch theorem 
- Linear regression 
- Some basic concepts in machine learning 
- Discovering clusters 
- Classification 
- Regression 
- Matrix completion 
- Logistic regression 
- Parametric models for classification and regression 
- The curse of dimensionality 
- Overfitting 
- Unsupervised learning 
- Discovering latent factors 
- A simple non-parametric classifier: K-nearest neighbors 
Machine Learning for Predictive Data Analytics
- Predictive Data Analytics Tools 
- How Does Machine Learning Work? 
- The Road Ahead 
- What Can Go Wrong with Machine Learning? 
- The Predictive Data Analytics Project Lifecycle: CRISP-DM 
- What Is Machine Learning? 
- What Is Predictive Data Analytics? 
Data to Insights to Decisions
- Different Types of Data 
- Different Types of Features 
- Designing the Analytics Base Table 
- Designing and Implementing Features 
- Assessing Feasibility 
- Converting Business Problems into Analytics Solutions 
- Case Study: Motor Insurance Fraud 
- Implementing Features 
- Handling Time 
Data Exploration
- Outliers 
- Handling Missing Values 
- Handling Outliers 
- Missing Values 
- Irregular Cardinality 
- Handling Data Quality Issues 
- The Data Quality Report 
- The Normal Distribution 
- Identifying Data Quality Issues 
- Getting to Know the Data 
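The data-quality topics above can be previewed with a short pandas sketch; the tiny table, the median imputation, and the IQR-based clamping threshold are illustrative choices, not the course's prescribed method.

```python
import numpy as np
import pandas as pd

# Toy column with a missing value and a likely outlier (4.2 m height).
df = pd.DataFrame({"height": [1.70, 1.80, np.nan, 1.65, 4.20, 1.75]})

print(df.describe())               # ingredients of a basic data quality report
print(df.isna().mean())            # proportion of missing values per column

df["height"] = df["height"].fillna(df["height"].median())     # impute missing values
q1, q3 = df["height"].quantile([0.25, 0.75])
df["height"] = df["height"].clip(upper=q3 + 1.5 * (q3 - q1))  # clamp extreme outliers
```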
Advanced Data Exploration
- Measuring Covariance and Correlation 
- Visualizing Relationships Between Features 
- Binning 
- Data Preparation 
- Normalization 
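As a taste of the covariance/correlation, binning, and normalization items above, here is a small pandas sketch on synthetic data; the feature names, bin count, and range normalization are illustrative.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(9)
df = pd.DataFrame({"age": rng.integers(18, 70, size=200)})
df["income"] = 1_000 * df["age"] + rng.normal(scale=5_000, size=200)

print(df.cov())                                   # covariance between features
print(df.corr())                                  # Pearson correlation
df["age_bin"] = pd.cut(df["age"], bins=4)         # equal-width binning
df["income_norm"] = (df["income"] - df["income"].min()) / (
    df["income"].max() - df["income"].min()       # range (min-max) normalization
)
```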
Information-based Learning
- Shannon’s Entropy Model 
- Handling Continuous Descriptive Features 
- Decision Trees 
- Predicting Continuous Targets 
- Extensions and Variations 
- Fundamentals 
- Information Gain 
- Big Idea 
- Standard Approach: The ID3 Algorithm 
- Tree Pruning 
- Alternative Feature Selection and Impurity Metrics 
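A minimal NumPy sketch of Shannon's entropy and the information gain of one candidate split, the quantities the tree-growing topics above build on; the toy label array and split point are illustrative.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy (in bits) of a discrete label array."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

labels = np.array(["spam"] * 6 + ["ham"] * 6)
left, right = labels[:4], labels[4:]          # one candidate split of the dataset
remainder = (len(left) / len(labels)) * entropy(left) + \
            (len(right) / len(labels)) * entropy(right)
print(entropy(labels), entropy(labels) - remainder)   # total entropy and information gain
```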
Similarity-based Learning
- Standard Approach: The Nearest Neighbor Algorithm 
- Predicting Continuous Targets 
- Fundamentals 
- Other Measures of Similarity 
- Extensions and Variations 
- Data Normalization 
- Feature Space 
- Big Idea 
- Measuring Similarity Using Distance Metrics 
- Feature Selection 
- Handling Noisy Data 
- Efficient Memory Search 
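The nearest neighbor "standard approach" above, including the data normalization step, can be sketched with scikit-learn as below; the dataset, k, and distance metric are illustrative.

```python
from sklearn.datasets import load_wine
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

X, y = load_wine(return_X_y=True)

# Range-normalize features so no single feature dominates the distance metric.
knn = make_pipeline(MinMaxScaler(),
                    KNeighborsClassifier(n_neighbors=3, metric="euclidean"))
knn.fit(X, y)
print(knn.predict(X[:2]))          # query the model with two known instances
```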
Probability-based Learning
- Big Idea 
- Smoothing 
- Extensions and Variations 
- Bayes’ Theorem 
- Bayesian Networks 
- Continuous Features: Probability Density Functions 
- Continuous Features: Binning 
- Bayesian Prediction 
- Conditional Independence and Factorization 
- Fundamentals 
- Standard Approach: The Naive Bayes Model 
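A small scikit-learn sketch of the naive Bayes "standard approach" with Laplace-style smoothing; the toy count matrix is illustrative (GaussianNB would be the analogous choice for continuous features modeled with probability density functions).

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

# Toy descriptive features as event counts (e.g. word counts per document).
X_counts = np.array([[2, 1, 0],
                     [0, 1, 3],
                     [1, 0, 2],
                     [3, 0, 0]])
y = np.array([0, 1, 1, 0])

nb = MultinomialNB(alpha=1.0)           # alpha is the Laplace smoothing parameter
nb.fit(X_counts, y)
print(nb.predict_proba(X_counts[:1]))   # Bayesian prediction as class posteriors
```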
Error-based Learning
- Setting the Learning Rate Using Weight Decay 
- Error Surfaces 
- Multinomial Logistic Regression 
- Modeling Non-linear Relationships 
- Handling Categorical Descriptive Features 
- Interpreting Multivariable Linear Regression Models 
- Simple Linear Regression 
- Big Idea 
- Handling Categorical Target Features: Logistic Regression 
- Extensions and Variations 
- Fundamentals 
- Choosing Learning Rates and Initial Weights 
- Standard Approach: Multivariable Linear Regression with Gradient Descent 
- Gradient Descent 
- Multivariable Linear Regression 
- Measuring Error 
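A NumPy sketch of the "standard approach" above: multivariable linear regression fitted by batch gradient descent on a synthetic dataset; the learning rate and iteration count are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + 3.0 + rng.normal(scale=0.1, size=200)

Xb = np.c_[np.ones(len(X)), X]        # prepend a bias (intercept) column
w = np.zeros(Xb.shape[1])
eta = 0.1                             # learning rate
for _ in range(1_000):
    grad = 2 / len(y) * Xb.T @ (Xb @ w - y)   # gradient of the mean squared error
    w -= eta * grad                           # descend the error surface
print(w)                              # should approach [3.0, 2.0, -1.0, 0.5]
```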
Evaluation
- Performance Measures: Prediction Scores 
- Designing Evaluation Experiments 
- Evaluating Models after Deployment 
- Performance Measures: Multinomial Targets 
- Extensions and Variations 
- Fundamentals 
- Performance Measures: Continuous Targets 
- Performance Measures: Categorical Targets 
- Big Idea 
- Standard Approach: Misclassification Rate on a Hold-out Test Set 
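The "misclassification rate on a hold-out test set" standard approach, sketched with scikit-learn; the dataset, model, and split size are illustrative.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

model = LogisticRegression(max_iter=5_000).fit(X_train, y_train)
pred = model.predict(X_test)
print(1 - accuracy_score(y_test, pred))   # misclassification rate on the hold-out set
print(confusion_matrix(y_test, pred))     # basis for categorical-target performance measures
```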
Software Tools
- scikit-learn 
Scikit-Learn
- Introduction to scikit-learn 
- Types 
- Model persistence 
- Setting up scikit-learn 
- Machine learning: the problem setting 
- Learning and predicting 
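A minimal sketch of the "learning and predicting" and "model persistence" items above, following scikit-learn's fit/predict conventions; the digits dataset and SVC hyperparameters are illustrative.

```python
import joblib
from sklearn import datasets
from sklearn.svm import SVC

digits = datasets.load_digits()
clf = SVC(gamma=0.001, C=100.0)
clf.fit(digits.data[:-1], digits.target[:-1])   # learn from all but the last image
print(clf.predict(digits.data[-1:]))            # predict the held-out image

joblib.dump(clf, "digits_svc.joblib")           # model persistence
restored = joblib.load("digits_svc.joblib")
```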
TensorFlow
- Feeding Data to the Training Algorithm 
- Saving and Restoring Models 
- Introduction and setting up 
- Creating Your First Graph and Running It in a Session 
- Lifecycle of a Node Value 
- Managing Graphs 
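A sketch of "creating your first graph and running it in a session". This assumes the TensorFlow 1.x graph/session style, which TensorFlow 2 still exposes under tf.compat.v1.

```python
import tensorflow.compat.v1 as tf   # 1.x-style API; plain `import tensorflow` on TF 1.x

tf.disable_eager_execution()        # build a static graph instead of executing eagerly

x = tf.Variable(3, name="x")        # nodes are added to the default graph
y = tf.Variable(4, name="y")
f = x * x * y + y + 2

with tf.Session() as sess:          # a session places and evaluates graph nodes
    sess.run(tf.global_variables_initializer())
    print(sess.run(f))              # -> 42
```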
Linear regression
- Regularization effects of big data 
- Bayesian inference when σ² is unknown * 
- Model specification 
- Numerically stable computation * 
- Computing the posterior 
- Geometric interpretation 
- Convexity 
- Connection with PCA * 
- Maximum likelihood estimation (least squares) 
- Bayesian linear regression 
- Derivation of the MLE 
- Computing the posterior predictive 
- EB for linear regression (evidence procedure) 
- Ridge regression 
- Basic idea 
- Robust linear regression * 
- Introduction 
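The maximum likelihood (least squares) estimate and its ridge-regularized counterpart in closed form, as a NumPy sketch on synthetic data; the true weights and ridge strength are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
X = np.c_[np.ones(100), rng.normal(size=(100, 2))]      # design matrix with intercept
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(scale=0.1, size=100)

w_mle = np.linalg.solve(X.T @ X, X.T @ y)               # least squares / MLE
lam = 1.0                                               # ridge penalty strength
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)
print(w_mle, w_ridge)                                   # ridge shrinks the weights
```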
Logistic regression
- Residual analysis (outlier detection) * 
- Generative vs discriminative classifier 
- Multi-class logistic regression 
- Online learning and regret minimization 
- Iteratively reweighted least squares (IRLS) 
- Quasi-Newton (variable metric) methods 
- Newton’s method 
- Bayesian logistic regression 
- A Bayesian view 
- Laplace approximation 
- l2 regularization 
- Gaussian approximation for logistic regression 
- Approximating the posterior predictive 
- Derivation of the BIC 
- Steepest descent 
- Introduction 
- MLE 
- Model specification 
- Online learning and stochastic optimization 
- Dealing with missing data 
- Fisher’s linear discriminant analysis (FLDA) * 
- Model fitting 
- Stochastic optimization and risk minimization 
- Pros and cons of each approach 
- The LMS algorithm 
- Logistic regression 
- The perceptron algorithm 
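A NumPy sketch of fitting binary logistic regression by steepest descent on the ℓ2-regularized negative log-likelihood; the synthetic data, step size, and penalty strength are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
X = np.c_[np.ones(300), rng.normal(size=(300, 2))]
w_true = np.array([0.5, 2.0, -1.5])
y = (rng.uniform(size=300) < 1 / (1 + np.exp(-X @ w_true))).astype(float)

w, lam, eta = np.zeros(3), 0.1, 0.1
for _ in range(2_000):
    p = 1 / (1 + np.exp(-X @ w))                 # predicted probabilities
    grad = X.T @ (p - y) / len(y) + lam * w      # gradient of the penalized NLL
    w -= eta * grad                              # steepest-descent step
print(w)
```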
Support Vector Machine
- Linear SVM Classification 
- SVM Regression 
- Nonlinear SVM Classification 
- Under the Hood 
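Linear and nonlinear SVM classification plus SVM regression, sketched with scikit-learn on a toy dataset; the kernels and C/gamma/epsilon settings are illustrative.

```python
from sklearn.datasets import make_moons
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC, SVR, LinearSVC

X, y = make_moons(n_samples=200, noise=0.15, random_state=0)

linear_clf = make_pipeline(StandardScaler(), LinearSVC(C=1.0))                 # linear SVM
rbf_clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", gamma=5, C=1.0))   # nonlinear SVM
linear_clf.fit(X, y)
rbf_clf.fit(X, y)
print(linear_clf.score(X, y), rbf_clf.score(X, y))

svr = SVR(kernel="rbf", C=10.0, epsilon=0.1)     # SVM regression on a toy target
svr.fit(X, X[:, 0] ** 2)
```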
Kernels
- Smoothing kernels 
- Kernels for comparing documents 
- The kernel trick 
- SVMs for classification 
- Kernelized ridge regression 
- SVMs for regression 
- Linear kernels 
- Kernel machines 
- Introduction 
- Comparison of discriminative kernel methods 
- Kernelized nearest neighbor classification 
- Using kernels inside GLMs 
- A probabilistic interpretation of SVMs 
- Kernel functions 
- RBF kernels 
- Mercer (positive definite) kernels 
- Kernel density estimation (KDE) 
- Kernel PCA 
- String kernels 
- L1VMs, RVMs, and other sparse vector machines 
- Kernels for building generative models 
- Choosing C 
- Kernelized K-medoids clustering 
- Pyramid match kernels 
- Kernel regression 
- Kernels derived from probabilistic generative models 
- Locally weighted regression 
- Summary of key points 
- Support vector machines (SVMs) 
- From KDE to KNN 
- Matern kernels 
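An RBF kernel and kernelized ridge regression in NumPy, illustrating the kernel trick on a 1-D toy problem; the bandwidth gamma and ridge strength are illustrative.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """Gram matrix K[i, j] = exp(-gamma * ||A_i - B_j||^2)."""
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dists)

rng = np.random.default_rng(3)
X = rng.uniform(-3, 3, size=(50, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=50)

lam = 0.1
alpha = np.linalg.solve(rbf_kernel(X, X) + lam * np.eye(len(X)), y)  # dual coefficients

X_new = np.linspace(-3, 3, 5).reshape(-1, 1)
print(rbf_kernel(X_new, X) @ alpha)   # predictions use only kernel evaluations
```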
Decision Trees
- Regularization Hyperparameters 
- Regression 
- Making Predictions 
- Training and Visualizing a Decision Tree 
- Gini Impurity or Entropy? 
- Estimating Class Probabilities 
- Decision Trees 
- Instability 
- Computational Complexity 
- The CART Training Algorithm 
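Training and visualizing a decision tree with scikit-learn, plus class-probability estimates; the iris data, depth limit, and Gini criterion are illustrative.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
tree = DecisionTreeClassifier(max_depth=2, criterion="gini", random_state=0)  # CART-style tree
tree.fit(iris.data, iris.target)

print(export_text(tree, feature_names=list(iris.feature_names)))  # text rendering of the tree
print(tree.predict_proba(iris.data[:1]))                          # estimated class probabilities
```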
Dimensionality Reduction
- SVD 
- PCA 
- Kernel PCA 
- LLE 
- Main Approaches for Dimensionality Reduction 
- The Curse of Dimensionality 
- CUR 
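PCA, kernel PCA, and the underlying SVD on the same data, sketched with scikit-learn and NumPy; projecting to two components is an illustrative choice.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA, KernelPCA

X = load_digits().data
X_pca = PCA(n_components=2).fit_transform(X)                        # linear projection
X_kpca = KernelPCA(n_components=2, kernel="rbf").fit_transform(X)   # nonlinear (kernel) PCA

U, s, Vt = np.linalg.svd(X - X.mean(axis=0), full_matrices=False)   # SVD behind PCA
print(X_pca.shape, X_kpca.shape, s[:2])                             # top singular values
```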
Introduction to Deep Learning
- Parameter Hyperspace 
- Minimizing Cross Entropy 
- Normalized Inputs And Initial Weights 
- Measuring Performance 
- Transition Into Practical Aspects Of Learning 
- Stochastic Gradient Descent 
- Training your Logistic Classifier 
- Transition: Overfitting -> Dataset Size 
- Momentum And Learning Rate Decay 
- Supervised Classification 
- Solving Problems 
- Lather Rinse Repeat 
- Optimizing A Logistic Classifier 
- Cross Entropy 
- What is Deep Learning 
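A NumPy sketch of the logistic-classifier pipeline above: softmax scores, cross-entropy loss, and one stochastic gradient descent step on a mini-batch; the shapes and learning rate are illustrative.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)        # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

rng = np.random.default_rng(4)
X = rng.normal(size=(32, 10))                   # a mini-batch of 32 examples, 10 features
y = np.eye(3)[rng.integers(0, 3, size=32)]      # one-hot labels over 3 classes
W, b, eta = rng.normal(scale=0.01, size=(10, 3)), np.zeros(3), 0.5

p = softmax(X @ W + b)
loss = -np.mean(np.sum(y * np.log(p + 1e-12), axis=1))   # cross entropy
W -= eta * X.T @ (p - y) / len(X)                        # one SGD step
b -= eta * (p - y).mean(axis=0)
print(loss)
```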
Deep Neural Network
- "2-layer" neural network 
- Network Of ReLUs 
- Dropout 
- Intro to Deep Neural Network 
- No Neurons 
- Backprop 
- Regularization Intro 
- Linear Models Are Limited 
- The Chain Rule 
- Dropout Pt-2 
- Regularization 
- Training A Deep Learning Network 
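A tiny NumPy sketch of a "2-layer" ReLU network with inverted dropout and one backprop update via the chain rule; layer sizes, keep probability, and learning rate are illustrative.

```python
import numpy as np

rng = np.random.default_rng(5)
X, y = rng.normal(size=(64, 20)), rng.normal(size=(64, 1))
W1, b1 = rng.normal(scale=0.1, size=(20, 32)), np.zeros(32)
W2, b2 = rng.normal(scale=0.1, size=(32, 1)), np.zeros(1)
eta, keep = 0.01, 0.8

h_pre = X @ W1 + b1
h = np.maximum(h_pre, 0)                             # ReLU hidden layer
mask = (rng.uniform(size=h.shape) < keep) / keep     # inverted dropout mask
h_drop = h * mask
out = h_drop @ W2 + b2

d_out = 2 * (out - y) / len(X)                       # dLoss/dOut for squared error
d_h = (d_out @ W2.T) * mask * (h_pre > 0)            # chain rule through dropout and ReLU
W2 -= eta * h_drop.T @ d_out; b2 -= eta * d_out.sum(axis=0)
W1 -= eta * X.T @ d_h;        b1 -= eta * d_h.sum(axis=0)
```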
Clustering
- Hierarchical risk parity 
- The expectation-maximization algorithm 
- k-Means clustering 
- Hierarchical DBSCAN 
- Gaussian mixture models 
- Density-based clustering 
- Visualization – dendrograms 
- Hierarchical clustering 
- DBSCAN 
- Evaluating cluster quality 
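k-Means, a Gaussian mixture, and DBSCAN on the same toy blobs with scikit-learn, plus a silhouette score for judging cluster quality; all hyperparameters are illustrative.

```python
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.mixture import GaussianMixture

X, _ = make_blobs(n_samples=300, centers=4, random_state=0)

km_labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X)
gmm_labels = GaussianMixture(n_components=4, random_state=0).fit_predict(X)   # EM under the hood
db_labels = DBSCAN(eps=0.8, min_samples=5).fit_predict(X)                     # density-based

print(silhouette_score(X, km_labels))   # higher means better-separated clusters
```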
Gradient Boosting Techniques
Adaptive boosting
- AdaBoost with sklearn 
- The AdaBoost algorithm 
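"AdaBoost with sklearn" as listed above, in minimal form; the default base learner is a depth-1 decision stump, and the ensemble size and learning rate are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, random_state=0)

# By default AdaBoostClassifier boosts depth-1 decision stumps.
ada = AdaBoostClassifier(n_estimators=200, learning_rate=0.5, random_state=0)
print(cross_val_score(ada, X, y, cv=5).mean())
```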
Gradient boosting machines
- How to use gradient boosting with sklearn 
- Shrinkage and learning rate 
- How to tune parameters with GridSearchCV 
- How to test on the holdout set 
- How to train and tune GBM models 
- Parameter impact on test scores 
- Ensemble size and early stopping 
- Subsampling and stochastic gradient boosting 
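Training and tuning a sklearn GradientBoostingClassifier with GridSearchCV and scoring it on a hold-out set; the parameter grid and subsample setting (stochastic gradient boosting) are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = make_classification(n_samples=1_000, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

param_grid = {
    "learning_rate": [0.05, 0.1],   # shrinkage
    "n_estimators": [100, 300],     # ensemble size
    "max_depth": [2, 3],
}
gbm = GradientBoostingClassifier(subsample=0.8, random_state=0)   # stochastic gradient boosting
search = GridSearchCV(gbm, param_grid, cv=5).fit(X_train, y_train)
print(search.best_params_, search.score(X_test, y_test))          # hold-out performance
```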
Fast scalable GBM implementations
- Randomized grid search 
- DART – dropout for trees 
- How algorithmic innovations drive performance 
- Second-order loss function approximation 
- Treatment of categorical features 
- How to create binary data formats 
- Depth-wise versus leaf-wise growth 
- Regularization 
- Additional features and optimizations 
- GPU-based training 
- Objectives and loss functions 
- How to tune hyperparameters 
- Simplified split-finding algorithms 
- Cross-validation results across models 
- How to use XGBoost, LightGBM, and CatBoost 
- Learning parameters 
- How to evaluate the results 
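A hedged sketch of the sklearn-style wrappers in XGBoost and LightGBM (both assumed installed; CatBoost follows the same fit/predict pattern); the hyperparameters are illustrative.

```python
from lightgbm import LGBMClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

X, y = make_classification(n_samples=2_000, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

xgb = XGBClassifier(n_estimators=300, learning_rate=0.1, max_depth=4)      # depth-wise growth
xgb.fit(X_train, y_train)

lgbm = LGBMClassifier(n_estimators=300, learning_rate=0.1, num_leaves=31)  # leaf-wise growth
lgbm.fit(X_train, y_train)

print(xgb.score(X_val, y_val), lgbm.score(X_val, y_val))
```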
Time Series Models
Multivariate time series models
- Testing for cointegration 
- The vector autoregressive (VAR) model 
- Cointegration – time series with a common trend 
- How to use cointegration for a pairs-trading strategy 
- How to use the VAR model for macro fundamentals forecasts 
- Systems of equations 
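A statsmodels sketch of fitting a VAR model and running an Engle-Granger cointegration test; the two synthetic series share a common trend by construction, and the lag search is illustrative.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.api import VAR
from statsmodels.tsa.stattools import coint

rng = np.random.default_rng(8)
trend = rng.normal(size=400).cumsum()                 # the shared (stochastic) trend
data = pd.DataFrame({
    "a": trend + rng.normal(scale=0.5, size=400),
    "b": 0.8 * trend + rng.normal(scale=0.5, size=400),
})

var_res = VAR(data.diff().dropna()).fit(maxlags=5, ic="aic")   # system of equations on differenced data
print(var_res.k_ar)                                            # selected lag order
print(coint(data["a"], data["b"])[1])                          # cointegration test p-value
```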
Analytical tools for diagnostics and feature extraction
- How to diagnose and achieve stationarity 
- How to compute rolling window statistics 
- How to decompose time series patterns 
- How to diagnose and address unit roots 
- Moving averages and exponential smoothing 
- How to apply time series transformations 
- Time series transformations 
- How to measure autocorrelation 
- Unit root tests 
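A few of the diagnostics above on a synthetic random-walk series: rolling-window statistics, autocorrelation, and the augmented Dickey-Fuller unit-root test (statsmodels assumed available); the window length is illustrative.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(6)
prices = pd.Series(100 + rng.normal(size=500).cumsum())   # a random-walk "price" series

rolling_mean = prices.rolling(window=21).mean()           # rolling window statistic
returns = prices.pct_change().dropna()                    # transformation toward stationarity
print(returns.autocorr(lag=1))                            # lag-1 autocorrelation
print(adfuller(prices)[1], adfuller(returns)[1])          # ADF p-values: level vs. returns
```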
Univariate time series models
- Selecting the lag order 
- How to build ARIMA models and extensions 
- How to identify the number of lags 
- How to build autoregressive models 
- How to diagnose model fit 
- How to identify the number of AR and MA terms 
- Adding features – ARMAX 
- Adding seasonal differencing – SARIMAX 
- How to forecast macro fundamentals 
- How to build moving average models 
- Generalizing ARCH – the GARCH model 
- The autoregressive conditional heteroskedasticity (ARCH) model 
- The relationship between AR and MA models 
- How to identify the number of lags 
- How to build a volatility-forecasting model 
- How to use time series models to forecast volatility 
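Fitting an ARIMA model and a seasonal SARIMAX extension with statsmodels; the (p, d, q) orders and seasonal period are illustrative, and a GARCH volatility model would typically come from the separate arch package.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.tsa.statespace.sarimax import SARIMAX

rng = np.random.default_rng(7)
y = pd.Series(rng.normal(size=300).cumsum())              # synthetic non-stationary series

arima = ARIMA(y, order=(1, 1, 1)).fit()                   # AR, differencing, and MA terms
sarimax = SARIMAX(y, order=(1, 1, 1), seasonal_order=(1, 0, 0, 12)).fit(disp=False)
print(arima.aic, sarimax.aic)                             # compare fits via information criteria
print(arima.forecast(steps=5))                            # out-of-sample forecast
```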
SOTA
- CNN 
- ALSTM-FQN 
- LSTM 
- LSTM-FQN 
