We followed CRISP-DM, the Cross-Industry Standard Process for Data Mining. The general CRISP-DM process model includes six phases that address the main issues in data mining: business understanding, data understanding, data preparation, modeling, evaluation, and deployment.
The six phases fit together in a cyclical process designed to incorporate data mining into your larger business practices.
R is a powerful and extensible environment with a wide range of statistics and data visualization capabilities. Users can perform data analysis and visualization with a minimal amount of R code. They
can also write their own R functions and packages, which can be used locally, shared within their organizations, or shared with the broader R community through CRAN.
To mine, model, and predict, we used Oracle R Enterprise. To visualize the results, we used OBIEE.
R is an open-source language and environment that supports:
Statistical computing and data visualization
Data manipulation and transformation
Sophisticated graphical displays
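As a brief illustration of these three capabilities, here is a minimal sketch in base R. It uses the built-in mtcars dataset, which is unrelated to our student data; the derived column kpl and the variable names are chosen only for this example.

```r
# Illustration only: built-in mtcars data, no extra packages required.

# Data transformation: derive fuel economy in km per litre from miles per gallon.
cars <- transform(mtcars, kpl = mpg * 0.4251)

# Data manipulation: mean fuel economy for each cylinder count.
by_cyl <- aggregate(kpl ~ cyl, data = cars, FUN = mean)

# Statistical computing: test the correlation between fuel economy and weight.
cor_result <- cor.test(cars$kpl, cars$wt)

# Graphical display: distribution of fuel economy by cylinder count.
boxplot(kpl ~ cyl, data = cars,
        xlab = "Cylinders", ylab = "Fuel economy (km/l)",
        main = "Fuel economy by cylinder count")

print(by_cyl)
print(cor_result$estimate)
```

Each step is a single call, which is what we mean by a minimal amount of R code.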
As mentioned previously, R is a statistics language that is similar to SAS or SPSS.
Oracle R Enterprise (ORE) contains a statistics engine and provides transparent access to database-resident data from R.
We gathered data from different sources into R, joined them into a single table, and ran several multivariable regressions.
After several iterations, omitting statistically insignificant variables each time, we arrived at the final model.
Factors such as high school GPA, the student's ethnicity, and the amount of financial aid needed had the most statistically significant coefficients. Our level of significance was 0.05. The final model therefore consists of
fitted parameters: by loading new student data into it, we can identify at-risk students.
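The workflow described above (join the sources, fit a regression, drop insignificant variables, then score new students) can be sketched in base R as follows. This is a hedged illustration, not our actual model: the data are synthetic, the column names (hs_gpa, aid_needed, noise_var, first_year_gpa), the outcome variable, and the 2.0 at-risk threshold are all assumptions made for the example, and plain lm() stands in for the Oracle R Enterprise run. Ethnicity is left out because a categorical predictor expands into several coefficients, which this simple elimination loop does not handle.

```r
set.seed(42)
n <- 200

# Two hypothetical source tables keyed by student_id (synthetic data).
admissions <- data.frame(
  student_id = 1:n,
  hs_gpa     = round(runif(n, 2.0, 4.0), 2),
  noise_var  = rnorm(n)                      # unrelated to the outcome by construction
)
finance <- data.frame(
  student_id = 1:n,
  aid_needed = round(runif(n, 0, 20), 1)     # hypothetical units: thousands of dollars
)

# Join the sources into one table and simulate a hypothetical outcome.
students <- merge(admissions, finance, by = "student_id")
students$first_year_gpa <- 0.5 + 0.8 * students$hs_gpa -
  0.05 * students$aid_needed + rnorm(n, sd = 0.3)

# Backward elimination: refit, dropping the least significant predictor,
# until every remaining predictor has a p-value at or below alpha.
alpha <- 0.05
predictors <- c("hs_gpa", "aid_needed", "noise_var")
repeat {
  fit <- lm(reformulate(predictors, response = "first_year_gpa"), data = students)
  p_values <- summary(fit)$coefficients[-1, "Pr(>|t|)", drop = FALSE]
  worst <- which.max(p_values[, 1])
  if (p_values[worst, 1] <= alpha || length(predictors) == 1) break
  predictors <- setdiff(predictors, rownames(p_values)[worst])
}
print(summary(fit)$coefficients)

# Score new (hypothetical) students with the fitted parameters.
new_students <- data.frame(
  student_id = 1001:1003,
  hs_gpa     = c(3.8, 2.1, 2.9),
  aid_needed = c(2, 18, 10),
  noise_var  = c(0, 0, 0)
)
new_students$predicted_gpa <- predict(fit, newdata = new_students)
new_students$at_risk <- new_students$predicted_gpa < 2.0   # assumed threshold
print(new_students[, c("student_id", "predicted_gpa", "at_risk")])
```

Once the coefficients are fixed, scoring a new cohort is a single predict() call, which is what makes it practical to rerun the model whenever new student data arrives.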
The results and graphical analysis can then be presented in Oracle Business Intelligence Enterprise Edition (OBIEE). The reports and functions are generated dynamically by executing a SQL
query that invokes the R function stored in the database R script repository. Oracle R Enterprise enables graphical, analytical, and statistical insights to be deployed throughout the enterprise.
Teachers and administrators can now identify at-risk students and constructively intervene with personalized assistance. They can also predict which intervention activities will have the greatest impact on students.
The approach also cut at least 25 hours from the workload.