Plots for numeric variables and how to interpret them:
boxplot, hist, violin, joint, pair/scatter, heatmap
Attributes and Functions for data exploration:
`.corr()`,
`pd.concat()` (a top-level pandas function, not a DataFrame method), `.index`, `.reindex()`,
`.select_dtypes()`,
`.drop()`
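A minimal sketch of how these attributes and functions fit together, assuming pandas is installed. The toy DataFrames and their column names (`height`, `weight`, `name`) are hypothetical and only for illustration.

```python
import pandas as pd

# Hypothetical toy data, invented for illustration only.
df_a = pd.DataFrame({
    "height": [150.0, 160.0, 170.0],
    "weight": [50.0, 60.0, 70.0],
    "name": ["a", "b", "c"],
})
df_b = pd.DataFrame({"height": [180.0], "weight": [85.0], "name": ["d"]})

# pd.concat() is called on pandas itself: stack the rows of both frames.
df = pd.concat([df_a, df_b], ignore_index=True)

# .select_dtypes() keeps only the columns of a given type.
numeric = df.select_dtypes(include="number")

# .corr() gives pairwise correlations between the numeric columns.
corr = numeric.corr()

# .index is an attribute (no parentheses) holding the row labels.
row_labels = list(df.index)

# .reindex() conforms the rows to a new set of labels; unknown labels become NaN rows.
reindexed = numeric.reindex([3, 2, 1, 0, 4])

# .drop() removes rows or columns by label and returns a new DataFrame.
without_weight = df.drop(columns=["weight"])
```

Note that `.reindex()` and `.drop()` return new objects here and leave `df` unchanged.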
Dealing with categorical variables via `.astype('category')` on Series objects - more on this later!
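A small sketch of the conversion, assuming pandas is installed. The Series values are hypothetical.

```python
import pandas as pd

# Hypothetical Series of text labels.
sizes = pd.Series(["small", "large", "medium", "small"])

# Convert the Series to the pandas category dtype.
sizes_cat = sizes.astype("category")

# The distinct labels are stored once (sorted by default for strings)...
categories = list(sizes_cat.cat.categories)

# ...and each row holds an integer code pointing into that list.
codes = list(sizes_cat.cat.codes)
```

Here the categories come out in alphabetical order, not in the order small < medium < large. An ordered categorical would need to be declared explicitly.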
Overview 🗺️
(Business) Problem ↔ Data Acquisition ↔ Data Cleaning ↔ Data Exploration / Visualisation ↔ Modelling ↔ Reporting ↔ Deployment
Purpose: familiarise yourself with the data, know which features might be relevant, prepare data for modelling
Data cleaning (while exploring & visualising): dealing with NaN, checking for nonsensical outliers, cleaning up strings, finding relevant subsets of data (rows that meet the conditions you are interested in)
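The cleaning steps above can be sketched as follows, assuming pandas and numpy are installed. The data, the column names `city` and `age`, the cut-off of 120 and the choice of median filling are all assumptions made for this example. The right treatment depends on the dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical messy data.
df = pd.DataFrame({
    "city": ["  Singapore", "singapore ", "Tokyo", None],
    "age": [25.0, np.nan, 230.0, 31.0],
})

# Dealing with NaN: first count the missing values per column.
n_missing = df.isna().sum()

# Cleaning up strings: trim whitespace and normalise capitalisation.
df["city"] = df["city"].str.strip().str.title()

# Nonsensical outliers: an age above 120 is treated as missing (assumed cut-off).
df.loc[df["age"] > 120, "age"] = np.nan

# One possible way to fill missing ages: use the median of the remaining values.
df["age"] = df["age"].fillna(df["age"].median())

# Relevant subset: rows that meet the conditions of interest.
subset = df[(df["city"] == "Singapore") & (df["age"] < 30)]
```

Dropping rows with `.dropna()` is an alternative to filling, at the cost of losing data.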
2 key learning points in Lab 3
Plots for categorical variables and how to interpret them: catplot
(strip/swarm; (enhanced) boxplot, violin; point, bar, count)
Credits to Charlene for the slides (and the pointers on dealing with NaNs)!
One more thing!
Do well in Year 1 - it opens up a lot of opportunities (good internships, URECA, ABP, double major)
Work hard and smart - you are competing with non-CS/CE people too; at some point the market will saturate
Have realistic goals in Uni - don't do too much, health comes first
Set 1 big goal if possible and break it down, e.g. take relevant external courses/certs, win competitions, do well in your internship
Don't forget to have fun - best if what you do for fun is linked to the goal; it is then fun to talk about / share during interviews
References
This set of slides was made using reveal.js.
It's really easy to make a basic set of slides (just HTML) and you can consider using it for simple (tech) presentations!
More advanced customisation does need CSS and JS, but example scripts are easy to find online and reveal.js has good documentation.