Using R to predict ATAR

Statistics using R - UniSA

Last year I completed a course with UniSA called Statistics using R. This 10-week course built upon skills taught in Programming Fundamentals and Database Fundamentals.

The content, developed by Tim Bogomolov, introduced R, a powerful statistical computing environment. It was outstanding. Shortly after the course concluded, I decided to use these new skills to analyse and predict a student’s ATAR outcome.

This article is a follow-up to ‘Closing the ATAR gap with predictive analytics’, where I introduced the concept. Here I discuss the technical aspects of how I achieved the prediction with R, and how that was implemented in Power BI.

Business Insights

BI comes in three flavours: descriptive, predictive and prescriptive. We use data mining techniques to produce insights into what has happened (descriptive), what will happen (predictive), and what should be done (prescriptive).

Let’s apply this to education. We need historical data, and there is plenty of it: attendance data, pastoral data, and standardised testing data, for example. NAPLAN is a good example of standardised, structured data. ATAR data is another very rich dataset. TISC provides this to each school, and I believe it is critical for each school to garner as much knowledge from this data as possible.

TISC data

To get started you will need your school’s ATAR data from TISC. This is a sample dataset. From a data mining perspective, this is an incredibly rich data source!

This data contains so many stories.

  • How has a student’s School Assessment score been moderated?

  • Which way has a Course been scaled?

  • What is happening from one year to the next?

  • Why is this happening and what can be done to improve student and teacher performance?
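As a rough illustration of how the first two questions could be explored in R, here is a minimal sketch. It assumes a simplified table with one row per student per course; every column name and value is invented for the example and will not match a real TISC extract, so treat it as a starting point rather than the method used in this article.

```r
# Toy data: all column names and values are hypothetical, made up for this
# sketch. A real TISC extract will have its own layout and field names.
results <- data.frame(
  year      = c(2016, 2016, 2016, 2017, 2017, 2017),
  course    = c("MAT", "MAT", "ENG", "MAT", "ENG", "ENG"),
  school    = c(62, 71, 58, 66, 60, 74),  # raw school assessment (assumed)
  moderated = c(60, 70, 61, 67, 62, 75),  # moderated school assessment (assumed)
  combined  = c(61, 69, 60, 68, 63, 73),  # combined mark before scaling (assumed)
  scaled    = c(65, 72, 57, 71, 61, 70)   # scaled score (assumed)
)

# How far moderation and scaling moved each student's mark.
results$moderation_shift <- results$moderated - results$school
results$scaling_shift    <- results$scaled - results$combined

# Average shift per course and year: a positive value means marks went up.
summary_by_course <- aggregate(
  cbind(moderation_shift, scaling_shift) ~ course + year,
  data = results,
  FUN  = mean
)

print(summary_by_course)
```

Comparing the rows for the same course across years gives a first, purely descriptive look at the third question, what is happening from one year to the next.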

For more information on taking descriptive insights from your ATAR data, see this post.