Swarm plot: like a box plot, but it shows the actual data points (at the cost of not showing the quartiles)
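Build a decision tree with sklearn, along with the methods and helper functions used with the model object:
Building the model: .fit()
Checking the model: plot_tree() (a function in sklearn.tree, not a method of the model)
Using the model: .predict(), .predict_proba()
Analysing the model: .score(), confusion_matrix() (a function in sklearn.metrics, not a method of the model)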
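The decision tree workflow can be sketched roughly as follows. This is a minimal illustration rather than the lab's actual code: the iris dataset, the 70/30 split, and `max_depth=3` are assumptions chosen only to keep the example small. `export_text` is used here as a text-only stand-in for `plot_tree`, which draws the same tree but needs matplotlib.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

# Assumption: the iris dataset stands in for the lab's data.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# Building the model
model = DecisionTreeClassifier(max_depth=3, random_state=0)
model.fit(X_train, y_train)

# Checking the model: export_text returns the learned splits as text.
# sklearn.tree.plot_tree(model) would draw the same tree with matplotlib.
tree_rules = export_text(model)
print(tree_rules)

# Using the model
predictions = model.predict(X_test)            # one class label per row
probabilities = model.predict_proba(X_test)    # one probability per class per row

# Analysing the model
accuracy = model.score(X_test, y_test)         # mean accuracy on the test set
cm = confusion_matrix(y_test, predictions)     # rows: true class, columns: predicted
print(accuracy)
print(cm)
```

The diagonal of the confusion matrix counts the correct predictions, so dividing its sum by the total number of test rows gives the same value as `.score()`.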
Overview 🗺️
(Business) Problem → Data Acquisition → Data Cleaning → Data Exploration / Visualisation → Modelling → Reporting → Deployment
We've seen regression and classification models
Many other models exist, but first understand the basics well
We haven't gone into the details, e.g. regularisation (see Slide 31): lasso, ridge regression, elastic net
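For orientation only, the three regularised regression variants named above are commonly written as least-squares objectives with an added penalty on the coefficients. The notation below is an assumption (it may differ from Slide 31): $X$ is the feature matrix, $y$ the target, $\beta$ the coefficient vector, and $\lambda, \lambda_1, \lambda_2 \ge 0$ the penalty strengths. Libraries often scale these terms differently.

```latex
\begin{aligned}
\text{Ridge:} \quad & \min_{\beta} \; \lVert y - X\beta \rVert_2^2 + \lambda \lVert \beta \rVert_2^2 \\
\text{Lasso:} \quad & \min_{\beta} \; \lVert y - X\beta \rVert_2^2 + \lambda \lVert \beta \rVert_1 \\
\text{Elastic net:} \quad & \min_{\beta} \; \lVert y - X\beta \rVert_2^2 + \lambda_1 \lVert \beta \rVert_1 + \lambda_2 \lVert \beta \rVert_2^2
\end{aligned}
```

The elastic net reduces to the lasso when $\lambda_2 = 0$ and to ridge regression when $\lambda_1 = 0$.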
4 key learning points in Lab 6
One-hot encoding for categorical variables
Dealing with imbalanced datasets (imblearn is good)
The data engineering side of things: Using Apache Spark, Kubernetes
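The first two learning points can be illustrated with a small standard-library sketch. It is a hand-rolled stand-in, not the lab's code: the helper names `one_hot` and `random_oversample` and the toy data are hypothetical. In practice a library (for example the imblearn package mentioned above) would usually do the resampling, and resampling should be applied to the training split only.

```python
import random
from collections import Counter


def one_hot(values):
    """Return (categories, rows): each row is a 0/1 list with one column per category."""
    categories = sorted(set(values))
    index = {c: i for i, c in enumerate(categories)}
    rows = []
    for v in values:
        row = [0] * len(categories)
        row[index[v]] = 1  # exactly one column is switched on per row
        rows.append(row)
    return categories, rows


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of smaller classes until every class
    has as many rows as the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, l in zip(rows, labels) if l == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


# Hypothetical toy data: one categorical feature, class 1 is the minority.
colours = ["red", "green", "blue", "green", "red", "red"]
labels = [1, 0, 0, 0, 1, 0]

categories, encoded = one_hot(colours)
balanced_rows, balanced_labels = random_oversample(encoded, labels)

print(categories)
print(encoded)
print(Counter(balanced_labels))
```

One-hot encoding avoids implying an order between categories, which a plain integer code (blue=0, green=1, red=2) would do. Random oversampling is the simplest rebalancing strategy; it only repeats existing rows, so it adds no new information.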
Lab 5 Deliverables
No submission! :)
All the best for the quiz!
The human side of DS
For insights derived from data to be useful, they have to be actionable (i.e. predictive and prescriptive).
Convincing decision makers with insights from data is hard, especially if they are used to relying on their instincts.
Thick data: information that humans can obtain but that we don't yet know how to encode well for computers to understand (the 'semantic gap')
One more thing!
Aim high but have legitimate backup plans (that you'll enjoy)
References
This set of slides was made using reveal.js.
It's really easy to make a basic set of slides (just HTML), so consider using it for simple (tech) presentations!
More advanced customisation does need CSS and JS, but example scripts are easy to find online and reveal.js has good documentation.