## Data Science and Analytics - Master of Professional Studies

## Course Descriptions

**DATA601: Probability and Statistics, 3 credits.** The course aims to provide a solid understanding of the fundamental concepts of probability theory and statistics. It covers basic probabilistic concepts such as probability spaces, random variables and vectors, expectation, covariance, correlation, and probability distribution functions. Important classes of discrete and continuous random variables, their interrelations, and their relevance to applications are discussed. Conditional probabilities, the Bayes formula, and properties of jointly distributed random variables are covered. Limit theorems, which describe the behavior of the sum of a large number of random variables, are also discussed. The main concepts of random processes are then introduced. The latter part of the course concerns the basic problems of mathematical statistics, in particular point and interval estimation and hypothesis testing.

**DATA602: Principles of Data Science, 3 credits.** An introduction to the data science pipeline, i.e., the end-to-end process of going from unstructured, messy data to knowledge and actionable insights. Provides a broad overview of what data science means and of the systems and tools commonly used for it, and illustrates the principles of data science through several case studies.

**DATA603: Principles of Machine Learning, 3 credits.** A broad introduction to machine learning and statistical pattern recognition. Topics include supervised learning (Bayes decision theory, discriminant functions, maximum likelihood estimation, the nearest neighbor rule, linear discriminant analysis, support vector machines, neural networks, and deep learning networks) and unsupervised learning (clustering, dimensionality reduction, PCA, and auto-encoders). The course also discusses recent applications of machine learning, such as computer vision, data mining, autonomous navigation, and speech recognition.

**DATA604: Data Representation and Modeling, 3 credits.** An introductory course connecting students to the most recent developments in the field of data science. It covers several fundamental mathematical concepts that form the foundations of Big Data theory. Topics include Principal Component Analysis, metric learning and nearest neighbor search, elementary spectral graph theory, minimum and maximum graph cuts, graph partitions, Laplacian Eigenmaps, manifold learning and dimension reduction, and clustering and classification techniques such as k-means, kernel methods, Mercer’s theorem, and Support Vector Machines. Some relevant concepts from geometry and topology are also covered. Expected learning outcomes include that students should be able to recognize, articulate, describe, differentiate, compare, and apply core concepts of mathematical data science, and use these concepts to interpret and analyze the provided curated data sets while generating analytical results with justifiable logical conclusions. Moreover, they should be able to select, justify, design, revise, and apply the learned algorithms to obtain the desired analytics outcomes. They should also be able to review, evaluate, and assess other similar algorithms in a critical and constructive manner.

**DATA605: Big Data Systems, 3 credits.** An overview of data management systems for performing data science on large volumes of data, including relational databases and NoSQL systems. Topics include the different types of data management systems, their pros and cons, how and when to use each, and best practices for data modeling.

**DATA606: Algorithms for Data Science, 3 credits.** Provides an in-depth understanding of some of the key data structures and algorithms essential for advanced data science. Topics include random sampling, graph algorithms, network science, data streams, and optimization.

**DATA607: Communication in Data Science and Analytics, 3 credits.** Expected learning outcomes include that, in the context of data science and analytics, students should be able to: summarize, report, and organize prose, statistics, graphics, and presentations; explain uncertainty, sensitivity/robustness, and limitations; describe model generation and representation; discuss interpretations and implications; communicate effectively to diverse audiences within a business organization; and possibly other outcomes.

**DATA611: Analysis of Networks, 3 credits.** Expected learning outcomes include that students should be able to: effectively and efficiently represent a variety of systems as networks; calculate and interpret structural measures on these networks; generate statistical predictions of unobserved portions of the network; model the growth of networks over time; model dynamical processes (e.g., the spread of information) on static networks; visualize large complex networks at different levels of resolution, depending on the context.

**DATA612: Deep Learning, 3 credits.** This course provides an introduction to the construction and use of deep neural networks, that is, models composed of several layers of nonlinear processing. The class focuses especially on a foundational understanding of the structural features of deep neural nets that make them attractive computationally and for a number of applications. Specific topics include the key concept of backpropagation and its importance in reducing the computational cost of training neural nets, a discussion of the various coding tools available and how they use parallelization, and a presentation and study of convolutional neural networks. The concepts introduced in the class are illustrated by examples of applications drawn from classification and clustering problems, computer vision, and natural language processing.

**DATA698: Research Methods and Study Design, 3 credits.** Expected learning outcomes include that students should be able to: compose problem specifications relevant to their work environment; create project descriptions; determine data and resource requirements; propose appropriate analytical methods; construct research plans; determine reporting requirements appropriate to various employment situations; identify intended audiences and uses; propose supporting documentation; and possibly other outcomes. Includes ethical and legal considerations in data science. Intended to be the penultimate course of the program, though it may be taken concurrently with other courses.

**DATA699: Capstone Research Project, 3 credits.** Expected learning outcomes include that students should be able to: originate and conduct a data science/data analysis project applicable to organizational objectives; develop a written report; construct other deliverables (e.g., a website, database, or enriched media presentation); and possibly other outcomes. Students choose a project reflecting their domain interests (and perhaps of professional relevance). The course instructor provides guidance to set appropriate expectations and scope. Intended to be the final course of the program, though it may be taken concurrently with other courses except DATA698.