Towards Ethical and Transparent Recommendation Algorithms for Learning Materials:
Empowering Online Learners
The success of many video-hosting social networks is largely attributed to their recommendation algorithms, which boost user retention and consequently increase ad revenue. These algorithms are essential to a company's commercial success, and as such, they are closely guarded and unintelligible to outsiders.
Reverse engineering these algorithms is the only way to understand their inner workings. Notably, large non-profit organizations like the Mozilla Foundation have used crowdsourcing to decipher YouTube's algorithm. Using a special browser extension and numerous volunteers, they gathered data on video recommendations. In their report, they urged immediate action to regulate YouTube's algorithm. This is just one example of the many critiques targeting recommendation algorithms.
Despite the opaque nature of these recommendations, websites like YouTube host a wealth of educational content created by both amateurs and professionals. This high-quality content is likely subject to the same recommendation algorithm, which can leave good educational material undiscoverable, or push creators to modify their videos to appease the algorithm.
Given the influence of recommendation algorithms and the abundance of free online learning materials, the challenge lies in designing an algorithm that is both ethical and transparent for learners.
Our team is diligently addressing this multifaceted problem, which encompasses technological, ethical, and social aspects. We will delve into each of these facets in upcoming blog posts, beginning with a general overview.
At the heart of the recommendation algorithm lies the connectome of all content. For learning materials, we must capture two primary "dimensions."
First, we need a method to relate videos based on their covered topics. For instance, we may want to determine if two given videos address the same Common Core Standard. This can be achieved either by manually labeling each video or by using AI to automatically detect topic content. Both approaches are valid, and a hybrid method may be the most practical solution. Additionally, topics are often interrelated and exhibit a clear hierarchical structure. We will discuss this further in an upcoming post.
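To make this a bit more concrete, here is a minimal Python sketch of how topic matching might work, assuming each video carries a set of topic labels (such as Common Core identifiers) assigned by hand, by a model, or by a hybrid of the two. The class names, the example standards, and the tiny hierarchy are illustrative placeholders, not a finished design.

```python
# Sketch: relating videos by topic, assuming topic labels plus a parent map
# that encodes the hierarchy. All names here are hypothetical.

from dataclasses import dataclass, field


@dataclass
class Video:
    video_id: str
    topics: set[str] = field(default_factory=set)  # e.g., Common Core identifiers


# Each topic points to its parent in the hierarchy (None at the root).
TOPIC_PARENT = {
    "CCSS.MATH.CONTENT.8.EE.A.1": "CCSS.MATH.CONTENT.8.EE.A",
    "CCSS.MATH.CONTENT.8.EE.A.2": "CCSS.MATH.CONTENT.8.EE.A",
    "CCSS.MATH.CONTENT.8.EE.A": "CCSS.MATH.CONTENT.8.EE",
    "CCSS.MATH.CONTENT.8.EE": None,
}


def ancestors(topic: str) -> set[str]:
    """Collect a topic and all of its ancestors in the hierarchy."""
    seen = set()
    while topic is not None:
        seen.add(topic)
        topic = TOPIC_PARENT.get(topic)
    return seen


def share_topic(a: Video, b: Video) -> bool:
    """True if the two videos' topic labels overlap anywhere in the hierarchy.

    This is deliberately coarse: sharing only a high-level ancestor still counts,
    which a real system would likely weight rather than treat as a yes/no match.
    """
    a_closure = set().union(*(ancestors(t) for t in a.topics))
    b_closure = set().union(*(ancestors(t) for t in b.topics))
    return bool(a_closure & b_closure)


v1 = Video("vid_001", {"CCSS.MATH.CONTENT.8.EE.A.1"})
v2 = Video("vid_002", {"CCSS.MATH.CONTENT.8.EE.A.2"})
print(share_topic(v1, v2))  # True: both roll up to CCSS.MATH.CONTENT.8.EE.A
```

Whether the labels come from human annotators or from an AI classifier, they would feed into the same structure, which is what makes the hybrid approach attractive.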
Second, videos differ in style. Factors such as visual style (e.g., Khan Academy-like videos, lecturers in front of blackboards, prop manipulation, etc.), intensity (e.g., frequency of edits, word counts, etc.), and complexity (e.g., use of equations, amount of written text on slides, video duration, etc.) all contribute to the overall style. AI can be used to assign labels to video files based on these factors. This aspect of video analysis will be elaborated on in another post.
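As a rough sketch, a per-video style record could look something like the following, flattened into a numeric feature vector so that videos can be compared. The field names, categories, and scales are assumptions chosen for illustration; in practice these values would come from automated analysis of the video file.

```python
# Sketch: a hypothetical style profile for one video and its numeric encoding.

from dataclasses import dataclass

VISUAL_STYLES = ["khan_style_sketching", "blackboard_lecture", "prop_manipulation", "slides"]


@dataclass
class StyleProfile:
    visual_style: str         # one of VISUAL_STYLES
    cuts_per_minute: float    # intensity: editing frequency
    words_per_minute: float   # intensity: narration density
    equations_on_screen: int  # complexity: count of rendered equations
    words_on_slides: int      # complexity: volume of written text
    duration_minutes: float   # complexity: overall length


def to_vector(style: StyleProfile) -> list[float]:
    """One-hot encode the visual style and append the numeric factors."""
    one_hot = [1.0 if style.visual_style == s else 0.0 for s in VISUAL_STYLES]
    return one_hot + [
        style.cuts_per_minute,
        style.words_per_minute,
        float(style.equations_on_screen),
        float(style.words_on_slides),
        style.duration_minutes,
    ]
```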
These two components enable us to compare videos by topic and style. While this doesn't constitute a recommendation algorithm on its own, it establishes a parameter space in which a recommendation algorithm can operate. Users can also navigate this space independently by selecting their desired learning objectives and preferences.
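Here is a minimal sketch of what comparing two videos in this parameter space might look like: a topic distance, a style distance, and learner-chosen weights that mix the two. The specific distance measures and default weights are illustrative assumptions, not the algorithm we will ultimately use.

```python
# Sketch: a combined distance over the topic/style parameter space.
# A recommender (or a learner browsing directly) could rank videos by this value.

import math


def style_distance(vec_a: list[float], vec_b: list[float]) -> float:
    """Euclidean distance between style vectors (features assumed pre-normalized)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(vec_a, vec_b)))


def topic_distance(topics_a: set[str], topics_b: set[str]) -> float:
    """1 - Jaccard overlap of topic labels: 0.0 means identical topic coverage."""
    if not topics_a and not topics_b:
        return 0.0
    return 1.0 - len(topics_a & topics_b) / len(topics_a | topics_b)


def combined_distance(topics_a, topics_b, style_a, style_b,
                      topic_weight: float = 0.7, style_weight: float = 0.3) -> float:
    """Weighted mix of topic and style distance; a learner could tune the weights."""
    return (topic_weight * topic_distance(topics_a, topics_b)
            + style_weight * style_distance(style_a, style_b))
```

The point of exposing the weights is transparency: instead of a black box deciding what "relevant" means, the learner can see, and shift, the trade-off between topic match and presentation style.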
Stay tuned for further updates!