Kaggle is a website that hosts machine learning competitions. I recently completed its Titanic competition, where the task is to predict which passengers survived and which died, given data on approximately 900 passengers. I used a random forest model to classify the passengers and correctly classified 74% of them. I had never used a random forest before, so I taught myself how the model works as I went along. The competition made me realize how important it is to know the theory behind the model being fitted and to understand the data: both make it easier to interpret the model and to gain insight from the data.
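To give a feel for the idea behind a random forest, here is a toy sketch in plain Python. It is not the code from my kernel, and it simplifies a real random forest a lot: each "tree" is a single split (a stump) on one randomly chosen feature, fitted to a bootstrap sample of the rows, and the forest predicts by majority vote. The function names (`fit_stump`, `fit_forest`, `predict`) and the ten rows of data are made up for illustration and are not taken from the Titanic data set.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of 0/1 labels (0.0 means perfectly pure)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def fit_stump(rows, labels, feature):
    """Fit a one-split tree on a single feature by minimising Gini impurity."""
    majority = Counter(labels).most_common(1)[0][0]
    best = None
    for threshold in sorted({row[feature] for row in rows}):
        left = [y for row, y in zip(rows, labels) if row[feature] <= threshold]
        right = [y for row, y in zip(rows, labels) if row[feature] > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if best is None or score < best[0]:
            left_label = Counter(left).most_common(1)[0][0]
            # The right side is empty when the threshold is the largest value.
            right_label = Counter(right).most_common(1)[0][0] if right else majority
            best = (score, feature, threshold, left_label, right_label)
    return best[1:]


def fit_forest(rows, labels, n_trees=25, seed=0):
    """Fit n_trees stumps, each on a bootstrap sample and a random feature."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        # Bootstrap: sample row indices with replacement.
        idx = [rng.randrange(len(rows)) for _ in rows]
        feature = rng.randrange(len(rows[0]))
        forest.append(
            fit_stump([rows[i] for i in idx], [labels[i] for i in idx], feature)
        )
    return forest


def predict(forest, row):
    """Majority vote over all stumps in the forest."""
    votes = [left if row[f] <= t else right for f, t, left, right in forest]
    return Counter(votes).most_common(1)[0][0]


# Made-up toy data: (is_female, passenger_class) -> survived (1) or died (0).
rows = [(1, 1), (1, 2), (1, 3), (1, 1), (0, 1),
        (0, 2), (0, 3), (0, 3), (0, 2), (1, 2)]
labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 1]

forest = fit_forest(rows, labels)
predictions = [predict(forest, row) for row in rows]
accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)
print(f"Training accuracy on the toy data: {accuracy:.0%}")
```

A real random forest grows deeper trees and considers a random subset of features at every split, but the two ingredients shown here (bootstrap samples and voting across many trees) are the same.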
Before fitting and improving the model, I got the data into the desired format and analysed it. I go into this in more detail in my kernel (the code, with comments explaining each step):