This course will give a detailed introduction to learning theory with a focus on the classification problem. It will be shown how to obtain (probabilistic) bounds on the generalization error for certain types of algorithms. The main themes will be:

- probabilistic inequalities and concentration inequalities
- union bounds, chaining
- measuring the size of a function class: Vapnik-Chervonenkis dimension, shattering dimension, and Rademacher averages
- classification with real-valued functions

Some knowledge of probability theory would be helpful but is not required, since the main tools will be introduced.
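How concentration inequalities and union bounds combine to control generalization error can be illustrated by a standard result (not part of the course description itself): for a finite hypothesis class $\mathcal{H}$ with bounded loss in $[0,1]$, Hoeffding's inequality applied to each hypothesis, followed by a union bound over all $|\mathcal{H}|$ hypotheses, gives

```latex
\Pr\!\left( \sup_{h \in \mathcal{H}} \left| R(h) - \hat{R}_n(h) \right| > \varepsilon \right)
\;\le\; 2\,|\mathcal{H}|\, e^{-2 n \varepsilon^2},
```

where $R(h)$ is the true risk of hypothesis $h$ and $\hat{R}_n(h)$ is its empirical risk on $n$ i.i.d. samples. For infinite classes, the $|\mathcal{H}|$ factor is replaced by capacity measures such as the VC dimension or Rademacher averages listed above.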
Turning procedural and structural knowledge into programs has established methodologies, but what about turning knowledge into probabilistic models? I explore a few examples of what such a process could look like.
This course covers the design and analysis of randomized algorithms and, more generally, applications of randomness in computing. You will learn fundamental tools from probability and see many applications of randomness in computing.
- Robust and stochastic optimization
- Convex analysis
- Linear programming
- Monte Carlo simulation
- Model-based estimation
- Matrix algebra review
- Probability and statistics basics
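As a minimal illustration of the Monte Carlo simulation topic above (my own sketch, not course material): estimate $\pi$ by sampling uniform points in the unit square and counting the fraction that land inside the quarter circle.

```python
import random

def estimate_pi(n_samples, seed=0):
    """Monte Carlo estimate of pi: the fraction of uniform points in
    [0,1]^2 falling inside the unit quarter-circle approximates pi/4."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(n_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / n_samples
```

The standard error decays as $O(1/\sqrt{n})$, so each extra digit of accuracy costs roughly 100x more samples.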
John D. Cook, Greg Egan, Dan Piponi and I had a fun mathematical adventure on Twitter. It started when John Cook wrote a program to compute the probability distribution of distances $latex |xy - yx|$ where $latex x$ and $latex y$ were two randomly chosen unit quaternions: • John D. Cook, How far is xy…
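A program along the lines John Cook describes can be sketched as follows (my own reconstruction, not his code): sample unit quaternions uniformly by normalizing four i.i.d. Gaussians, multiply them in both orders, and take the norm of the commutator $xy - yx$.

```python
import math
import random

def rand_unit_quaternion(rng):
    """Sample uniformly on S^3: normalize four i.i.d. standard normals."""
    q = [rng.gauss(0.0, 1.0) for _ in range(4)]
    n = math.sqrt(sum(c * c for c in q))
    return [c / n for c in q]

def qmul(a, b):
    """Hamilton product of quaternions (w, x, y, z)."""
    w1, x1, y1, z1 = a
    w2, x2, y2, z2 = b
    return [
        w1*w2 - x1*x2 - y1*y2 - z1*z2,
        w1*x2 + x1*w2 + y1*z2 - z1*y2,
        w1*y2 - x1*z2 + y1*w2 + z1*x2,
        w1*z2 + x1*y2 - y1*x2 + z1*w2,
    ]

def commutator_norm(x, y):
    """|xy - yx| for quaternions x, y."""
    xy, yx = qmul(x, y), qmul(y, x)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(xy, yx)))

def sample_distances(n, seed=0):
    """Draw n samples of |xy - yx| for random unit quaternions x, y."""
    rng = random.Random(seed)
    return [commutator_norm(rand_unit_quaternion(rng),
                            rand_unit_quaternion(rng)) for _ in range(n)]
```

Since the real parts of $xy$ and $yx$ agree, the commutator is purely imaginary and its norm lies in $[0, 2]$; histogramming `sample_distances` reproduces the distribution the thread investigates.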
The score function estimator is widely used for estimating gradients of stochastic objectives in stochastic computation graphs (SCGs), e.g. in reinforcement learning and meta-learning. While deriving the first-order gradient estimators by differentiating a surrogate loss (SL) objective is computationally and conceptually simple, using the same approach for higher-order gradients is more challenging. Firstly, analytically deriving and implementing such estimators is laborious and not compliant with automatic differentiation. Secondly, repeatedly applying SL to construct new objectives for each order of gradient involves increasingly cumbersome graph manipulations. Lastly, to match the first-order gradient under differentiation, SL treats part of the cost as a fixed sample, which we show leads to missing and wrong terms for higher-order gradient estimators. To address all these shortcomings in a unified way, we introduce DiCE, which provides a single objective that can be differentiated repeatedly, generating correct gradient estimators of any order in SCGs. Unlike SL, DiCE relies on automatic differentiation for performing the requisite graph manipulations. We verify the correctness of DiCE both through a proof and through numerical evaluation of the DiCE gradient estimates. We also use DiCE to propose and evaluate a novel approach for multi-agent learning. Our code is available at https://goo.gl/xkkgxn.
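For readers unfamiliar with the score function estimator the abstract builds on, here is a minimal Monte Carlo sketch (my own illustration, not the DiCE implementation): for $x \sim \mathcal{N}(\theta, 1)$, the identity $\nabla_\theta \mathbb{E}[f(x)] = \mathbb{E}[f(x)\,\nabla_\theta \log p(x;\theta)]$ holds, and for a unit-variance Gaussian the score simplifies to $x - \theta$.

```python
import random

def score_function_grad(theta, f, n_samples=200000, seed=0):
    """Score-function (REINFORCE) estimate of d/dtheta E_{x~N(theta,1)}[f(x)].

    Uses E[f(x) * d/dtheta log p(x; theta)] = E[f(x) * (x - theta)]
    for a Gaussian with unit variance."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        x = rng.gauss(theta, 1.0)
        total += f(x) * (x - theta)
    return total / n_samples
```

For example, with $f(x) = x^2$ we have $\mathbb{E}[x^2] = \theta^2 + 1$, so the true gradient is $2\theta$ and the estimator should converge to it. The estimator is unbiased but high-variance, which is part of why correct higher-order extensions like DiCE are nontrivial.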