DS Interview Prep Notes
1
Summary resources for DS learning
1.1
Statistics, Probability, and A/B Testing
1.1.1
Stanford Courses (NON-EDUC):
1.1.2
Online Resources
1.2
SQL / R / Python / Visualization Tools
1.3
Machine Learning resources
1.3.1
Stanford Classes:
1.3.2
Online Resources
1.4
Interview Questions & Resume
2
Statistics and Probability
2.1
IMPORTANT Concepts to review
2.1.1
Key terms for Variability Metrics
2.2
Designing Studies
2.2.1
Identifying variable types:
2.2.2
Classify study type as observational or experimental
2.2.3
Sampling Techniques
2.2.4
Principles of Experimental Design—Control, Randomize, Replicate, and Block—and their purposes
2.2.5
Random Sampling vs. Random Assignment
2.2.6
Hypothesis Tests and Resampling
2.2.7
Statistical Significance and p-values
3
A/B Testing
3.1
What is A/B testing?
3.2
What can’t you do with A/B testing?
3.3
Defining the hypothesis
3.4
Defining Metrics and Gathering Data
3.4.1
High-Level concepts for metrics
3.4.2
Methods for Coming up with Proxy Metrics or Validating Metrics
3.4.3
Gathering Additional Data
3.4.4
Segmenting and Filtering Data
3.5
Designing an A/B test
3.5.1
Summary workflow of A/B testing:
3.6
Step 1: Choose and characterize metrics for both sanity check and evaluation
3.7
Step 2: Choose significance level, statistical power and practical significance level
3.8
Step 3: Calculate required sample size
3.9
Step 4: Take sample for control/treatment groups and run the test
3.10
Step 5: Analyze the results and draw conclusions
3.10.1
First step — Sanity Check
3.10.2
Second step — Analyze the Results
3.10.3
Last step — Draw Conclusions
3.11
Other things to keep in mind
4
Database Management and Data Systems (SQL)
4.1
CRUD (create, read, update, delete) Operations
4.1.1
CREATE —> Databases | Tables | Views | Users | Permissions | Security Groups
4.1.2
INSERT — insert new records into existing database tables
4.1.3
UPDATE — Amend existing database records
4.1.4
DELETE — delete existing records from tables
4.1.5
Declare Variables —> so values can be reused in later conditions without repeating them
4.1.6
Temporary tables
4.1.7
READ | VIEW
4.2
Exploratory Data Analysis in SQL (T-SQL)
4.2.1
GROUP BY | HAVING | WHERE
4.2.2
JOIN examples
4.2.3
UNION Operator
4.2.4
CASE statements can be used to create columns (new variables) for
4.2.5
TEXT operations
4.2.6
Substituting NULL values using COALESCE in T-SQL
4.2.7
DATE
4.2.8
ROUND and TRUNCATE
4.3
Advanced SQL - loops/CTE/Windows
4.3.1
WHILE Loops & DECLARE
4.3.2
Derived Tables
4.3.3
CTE (Common Table Expressions)
4.3.4
Window Functions in SQL
4.3.5
Window Functions Cheatsheet
4.3.6
Defining a window alias
5
CS145 — DBs & Data Systems
5.1
Introduction — DB overview
5.1.1
Project goals
5.1.2
IO Blocks for Efficiency
5.1.3
Basic System Numbers
5.1.4
IO Cost Model
5.2
System Primer
5.3
5.3.1
Data Independence
5.3.2
Data Model
6
Machine Learning Models
6.1
Introduction
6.1.1
What is Machine Learning
6.1.2
Why do we use Machine Learning?
6.1.3
Examples of ML applications in industries
6.1.4
Types of Machine Learning Systems
6.1.5
Practical ML advice
6.2
Supervised Learning
7
Product Sense
Data Science Prep Notes
Chapter 7
Product Sense