Project Overview
This project set out to answer the following questions:
- Question 1: Across all countries worldwide, how many children finished primary school in 2019?
- Question 2: How does country income affect the result from Question 1?
- Question 3: Can the reasons why children do not finish primary school be identified objectively? Which factors influence the completion rate?
- Question 4: How well can the figures be predicted 5 and 10 years into the future?
Technologies Used
- Python
- ML
- LinearRegression
- RandomForestRegressor
- matplotlib
Project Highlights
- Challenges in Data Collection: Accurate primary school completion rates are difficult to determine because data availability varies across countries. A significant portion of the project effort went into gathering datasets compiled with different methodologies.
- Saturation Point Hypothesis: The project identified a potential pattern in primary school completion rates, suggesting that, at a certain level of a country's development, a saturation point for completion rates may be reached.
- Comparison of Data Models: The study compared completion rates using linear interpolation, constant extrapolation, auxiliary datasets, and a regression model, revealing differences in the predictions made by each approach.
- Regression Model Insights: The regression model was effective in predicting completion rates for countries that do not regularly report data. However, it is not suitable for predicting future values, because its explanatory variables would themselves have to be forecast first. Its results are also harder to interpret because of the complexity of the random forest regression model used.
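
As a rough illustration of two of the gap-filling approaches compared above, linear interpolation and constant extrapolation, the following Python sketch fills the missing years in one country's series. The function name `fill_gaps` and the sample values are hypothetical and not taken from the project, and the project's actual implementation may differ.

```python
def fill_gaps(years, reported):
    """Fill missing years in a completion-rate series.

    Years between two reported values are linearly interpolated;
    years outside the reported range reuse the nearest reported value
    (constant extrapolation).

    `reported` maps year -> completion rate in percent (hypothetical layout).
    """
    if not reported:
        raise ValueError("at least one reported value is required")

    known = sorted(reported)
    first, last = known[0], known[-1]
    filled = {}

    for year in years:
        if year in reported:
            filled[year] = reported[year]
        elif year < first:
            # Constant extrapolation before the first reported year.
            filled[year] = reported[first]
        elif year > last:
            # Constant extrapolation after the last reported year.
            filled[year] = reported[last]
        else:
            # Linear interpolation between the nearest reported neighbours.
            lo = max(y for y in known if y < year)
            hi = min(y for y in known if y > year)
            weight = (year - lo) / (hi - lo)
            filled[year] = reported[lo] + weight * (reported[hi] - reported[lo])

    return filled


# Made-up values for a hypothetical country, not real data.
reported = {2015: 80.0, 2017: 84.0}
filled = fill_gaps(range(2014, 2020), reported)
```

Here 2016 lies between two reported years and is interpolated, while 2014, 2018, and 2019 fall outside the reported range and take the nearest reported value. This flat continuation is one reason the approaches can disagree for countries whose last report is several years old.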