Wharton Ph.D. Summer Tech Camp 2017

Instructor: Alex Miller

Ph.D. Student, Information Systems (OID Department)
Email: alexmill@wharton.upenn.edu
Office: 526.4 JMHH

Course Description

The aim of this course is to familiarize incoming and current Wharton Ph.D. students with the basic technical skills and tools required for empirical research. This includes publicly available and open source tools (e.g., AWS, Python, R) and Wharton-specific resources (e.g., the Wharton grid computing cluster, WRDS). The course is primarily concerned with acquiring, cleaning, managing, and analyzing data. It will provide hands-on experience using a variety of computing tools, including intro-level machine learning and natural language processing techniques. By the end of this short course, students will have a better understanding of which tools are most appropriate for different data analysis tasks.

This course has no prerequisites. Feel free to attend the sessions selectively. Auditing is welcome. Each session will be roughly a 60-minute lecture followed by a 30-minute lab, during which you are encouraged to work on exercises. There is no exam. Please bring your own laptop.

Dates and Time

July 31 - Aug 16 (8 sessions)
10:00-11:30am Monday/Wednesday/Friday

Location

First 3 Sessions: F55
Last 5 Sessions: F70

GitHub Repository

All notes and slides for the course can be downloaded from the course’s repository on GitHub.

Sessions

Intro, Unix, Git, and R Basics
Mon 31 July, Room F55
- Before class (optional):
  - Install R and RStudio (ignore the “SDSFoundations” stuff)
  - Windows users: Install Git Bash
  - Register for a GitHub account
- Slides
- Exercises
More R & Python Intro
Wed 2 Aug, Room F55
- Pre-class: Apply for a Wharton grid account
- Links:
  - Recommended Python IDE for beginners: Thonny
  - Also consider Anaconda, which handles a lot of the hassle of installing scientific Python packages automatically.
- Slides
- Exercises
  - Answers to beginner exercises
Wharton HPCC and Behavioral Lab (Guest Speakers)
Fri 4 Aug, Room F55
- Pre-class: Apply for a Wharton grid account
- Link to Hugh’s Slides
- Exercises
Structured Data Collection: Consuming APIs in Python
Mon 7 Aug, Room F70
- Note the room change!
- Slides
- Exercises
Unstructured Data Collection in Python: Crawling and Scraping
Wed 9 Aug, Room F70
Advanced Scraping and Regex
Fri 11 Aug, Room F70
- Slides
- Exercises
  - Solution
Intro to Text Mining and NLP
Mon 14 Aug, Room F70
Intro to Concepts in Machine Learning
Wed 16 Aug, Room F70
- Slides
- Word2Vec Notebook Files:
  - Browse on GitHub
  - Download ZIP