
zanderswai/research-analytics-ml

ML analytics cron service

This is an automated, scheduled cron module that triggers specific processes according to business requirements. The main data generation sequence runs for each onboarded district whose dataframe tables have been created. It consists of three major processes:

1. Dailypull
2. Most used software
3. Recommendations
  • The dailypull updates the dataframes with the latest software usage data from the student analytics reporting table. The dataframe tables are labelled per district, and each district table contains one column per app holding data for each student.

  • Most used software calculates the software each individual student uses most, as well as the software used on assessments or in latest usage, such as forecast data.

  • Recommendations suggests the best software to use based on each student's track record of usage and overall district performance. It produces recommendations only for students with assessments.
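To make the dailypull and most used software steps more concrete, here is a minimal sketch that models one district table as a plain dictionary mapping each student to per-app usage. The functions `daily_pull` and `most_used_software`, the `(student_id, app, minutes)` row shape, and the use of minutes as the usage measure are all hypothetical, not the repository's real tables or functions. The sketch covers overall usage only, not the per-data-type split.

```python
# Hypothetical in-memory stand-in for one district's dataframe table:
# student_id -> {app_name: usage_minutes}
DistrictTable = dict[str, dict[str, float]]


def daily_pull(table: DistrictTable, apps: list[str],
               usage_rows: list[tuple[str, str, float]]) -> DistrictTable:
    """Add the latest usage rows (student_id, app, minutes) to the table.

    Rows for apps that are not columns of this district's table are ignored,
    and a student seen for the first time starts with every app at zero.
    """
    for student_id, app, minutes in usage_rows:
        if app not in apps:
            continue  # app was not selected for this district
        student = table.setdefault(student_id, {a: 0.0 for a in apps})
        student[app] = student.get(app, 0.0) + minutes
    return table


def most_used_software(table: DistrictTable) -> dict[str, str]:
    """Return each student's highest-usage app, skipping students with no usage.

    Ties are broken alphabetically so the result is deterministic.
    """
    result = {}
    for student_id, usage in table.items():
        if sum(usage.values()) == 0:
            continue  # no usage recorded yet for this student
        # max() keeps the first maximal item, so sorting first gives the
        # alphabetically earliest app on a tie.
        result[student_id] = max(sorted(usage), key=usage.get)
    return result
```

A dataframe library would express the same idea as a row-wise maximum over the app columns; plain dictionaries keep the sketch dependency-free.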

Note: The recommendations process is resource heavy, so it is allocated to a single instance, unlike the dailypull and most used software processes.

Main columns for sorting during operations

- Col:      Type

- Value:    1 -> student with assessment data
            2 -> student with forecast data

            [ Year to date ]
            3 -> student with latest pre-aggregated score average
            4 -> student with old/previous pre-aggregated score average

- Col:      Performance: defines the score bands according to district score definitions,
            which allows grouping and sorting as per system requirements.

- Value:    excellent -> scores 88 to 100
            satisfactory -> scores 69 to 88
            needs improvement -> scores 55 to 68
            unsatisfactory -> scores 0 to 54
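The band definitions above can be applied with a simple threshold lookup, and the Type column gives a natural primary sort key. This is a sketch under stated assumptions: the names `performance_band` and `sort_students` and the row keys `type` and `score` are hypothetical, a score of exactly 88 (listed in two bands) is treated as excellent, and fractional scores between the listed bands, such as 68.5, fall into the lower band.

```python
# Lower bounds of the Performance bands, checked from the highest band down.
# 88 appears in both "excellent" and "satisfactory" in the band list; checking
# top-down assigns exactly 88 to "excellent" (an assumption).
PERFORMANCE_BANDS = [
    (88, "excellent"),
    (69, "satisfactory"),
    (55, "needs improvement"),
]


def performance_band(score: float) -> str:
    """Map a 0-100 score to its Performance band."""
    if not 0 <= score <= 100:
        raise ValueError(f"score out of range: {score}")
    for lower_bound, label in PERFORMANCE_BANDS:
        if score >= lower_bound:
            return label
    return "unsatisfactory"  # anything from 0 up to (but not including) 55


def sort_students(rows: list[dict]) -> list[dict]:
    """Order rows by Type (1 = assessment data first), then by score, highest first."""
    return sorted(rows, key=lambda row: (row["type"], -row["score"]))
```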

The main sequence of operations runs in the following order:

1. The dailypull runs first and serves as the source of truth. After it sorts and updates student data, the dataframes contain the appropriate score data.

2. After the dailypull completes, the most used software process generates software data per student data type, whether assessment data or forecast data.

3. Finally, the recommendations process runs, generating recommendations from assessment data only.
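The ordering can be sketched as a single driver function. The `run_sequence` name, the three stage callables, and the `{"id": ..., "type": ...}` student rows are hypothetical placeholders rather than the service's actual code; the sketch only illustrates the stage order and the assessment-only filter applied before recommendations.

```python
from typing import Callable

# Type value for students with assessment data (see the Type column above).
ASSESSMENT_TYPE = 1


def run_sequence(
    district_id: str,
    dailypull: Callable[[str], list[dict]],
    most_used_software: Callable[[str, list[dict]], None],
    recommendations: Callable[[str, list[dict]], None],
) -> None:
    """Run the three stages for one district, strictly in order.

    The stage functions are hypothetical callables supplied by the caller.
    If an earlier stage raises, the exception propagates and later stages
    do not run.
    """
    # 1. dailypull: refresh the district's data; this is the source of truth.
    students = dailypull(district_id)
    # 2. most used software: runs over every student data type.
    most_used_software(district_id, students)
    # 3. recommendations: restricted to students with assessment data.
    assessed = [s for s in students if s["type"] == ASSESSMENT_TYPE]
    recommendations(district_id, assessed)
```

Passing the stages in as arguments keeps the ordering testable with stub functions, and letting a dailypull failure propagate means the later stages never run on stale data.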

Onboarding a district

Onboarding creates the data visualized on each district's dashboard by running the three main data generation processes above. To onboard a new district:

1. Create the dataframe table for the district ID and its selected applications.
2. Add the district ID to the global list in `./trigger.py`.
3. Run the data generation sequence.
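Steps 2 and 3 might look roughly like the following in a scheduler script. The list name `DISTRICT_IDS`, the helper `run_onboarded`, and the `table_exists` and `run_district` callables are hypothetical; `./trigger.py` may structure this differently.

```python
from typing import Callable

# Hypothetical shape of the global district list in ./trigger.py; the real
# variable name and ID format may differ. Onboarding step 2 appends here.
DISTRICT_IDS: list[str] = ["district_a", "district_b"]


def run_onboarded(
    district_ids: list[str],
    table_exists: Callable[[str], bool],
    run_district: Callable[[str], None],
) -> list[str]:
    """Run the data generation sequence for every listed district.

    Districts whose dataframe table has not been created yet (onboarding
    step 1) are skipped and returned, so a missing table is reported
    instead of stopping the whole scheduled run.
    """
    skipped = []
    for district_id in district_ids:
        if not table_exists(district_id):
            skipped.append(district_id)
            continue
        run_district(district_id)
    return skipped
```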

About

Predictive analytics machine learning cron server that runs on a time schedule.
