Statistical Learning for Data Science
About This Course
This is an introductory course in both the theory and the applications of statistical machine learning.
The widespread use of statistical machine learning techniques has made data-driven decision-making possible. Used well, data can help improve business profits, individual quality of life, the performance of sports teams, and the way we network with each other.
The course will expose students to real-world examples of how machine learning methods are being used, and will teach the fundamentals of how these methods work. In particular, we will cover empirical risk minimization, linear and logistic regression, discrete choice models, nonparametric inference, model selection in high dimensions, classification and regression trees, random forests, clustering, ensemble learning, and multiple testing. The course will also cover active learning, differential privacy, and ethical concerns in data analytics. We will use the statistical software R to implement the algorithms taught in class.
What You'll Learn
2. Describe data effectively, predict future outcomes, and prescribe decisions using machine learning tools.
3. Understand and apply fundamental concepts of statistical machine learning, including empirical risk minimization, linear and logistic regression, model selection, clustering, ensemble learning, learning in high dimensions, and multiple testing.
Entry Requirements
Students should have a basic understanding of linear algebra, calculus, probability, and statistics.