Buildup of speaking skills in an online learning community: a network-analytic exploration

Studies in learning communities have consistently found evidence that peer interactions contribute to students’ performance outcomes. A particularly important competence in the modern context is the ability to communicate ideas effectively. One measure of this is speaking, which is an important skill in professional and casual settings. In this study, we explore peer-interaction effects in online networks on speaking skill development. In particular, we present evidence for a gradual buildup of skills in a small-group setting that has not been reported in the literature. Evaluating the development of such skills requires objective evidence, for which purpose we introduce a novel dataset of six online communities consisting of 158 participants focusing on improving their speaking skills. They video-record speeches for 5 prompts in 10 days and exchange comments and performance ratings with their peers. We ask (i) whether the participants’ ratings are affected by their interaction patterns with peers, and (ii) whether there is any gradual buildup of speaking skills in the communities towards homogeneity. To analyze the data, we employ tools from the emerging field of Graph Signal Processing (GSP). GSP differs from Social Network Analysis in that the latter is concerned primarily with the connection structures of graphs, while the former studies signals on top of graphs. We study the performance ratings of the participants as graph signals atop underlying interaction topologies. Total variation analysis of the graph signals shows that the participants’ rating differences decrease with time (slope = −0.04, p < 0.01), while average ratings increase (slope = 0.07, p < 0.05), thereby gradually building up the ratings towards community-wide homogeneity. We provide evidence for peer influence through a prediction formulation. Our consensus-based prediction model outperforms baseline network-agnostic regression models by about 23% in predicting performance ratings. This in turn shows that participants’ ratings are affected by their peers’ ratings and the associated interaction patterns, corroborating previous findings. We then formulate a consensus-based diffusion model that captures these observations of peer influence from our analyses. We anticipate that this study will open up future avenues for a broader exploration of peer-influenced skill development mechanisms, and potentially help design innovative interventions in small groups to maximize peer effects.


Introduction
Contrast a classical one-way classroom setting, where the teacher delivers lectures and the students take notes, with an interactive one that encourages in-class and out-of-class student interaction in the form of learning communities. Research supports the intuition that the latter is more effective in achieving higher academic outcomes, better retention rates 1, 2 , decreased faculty isolation 3 , stronger engagement and increased satisfaction with college 4,5 . Learning communities and the associated peer effects have motivated decisions in various spheres, from supporting universities' agenda of providing superior education 1 to informing government policy debates regarding school desegregation and ability grouping [6][7][8] .
Let us take a closer look at what value such a learning community adds at an individual and global level. At an individual level, a student learns in the roles of both a tutee and a tutor. As a tutee, s/he receives support from peers 9 , observes their actions, and collaborates to accumulate knowledge in the field 2,10 . As a tutor, s/he supports them back, and the meta-cognitive task of explaining something or aiding someone clarifies his/her own understanding 11 . At a global level, the participants and their interactions are well suited to be modeled as a network of nodes and edges, in which knowledge diffusion is simulated as a barter process 12 . Indeed, knowledge can be traded without decreasing the level possessed by each trader as it is a nonrival good. The word of mouth interactions inherent in such settings can be modeled as the currency of knowledge transfer across a network 13 , while effective diffusion of knowledge has been shown to be achieved through a small-world network structure 12,14 .
Modeling learners and their interactions as a network of nodes and edges naturally leads to the use of Social Network Analysis tools to gain insight into how various network parameters affect offline and online learning. For instance, higher network density has been shown to correlate with higher sense of community 15 , which in turn correlates with better educational outcomes 5 . Network centrality and prestige have been shown to be robust predictors of cognitive learning outcomes 16 . Furthermore, longitudinal studies of communities of students have demonstrated mean-reverting effects in terms of performance, as for example in an offline study on the joint effect of social influence and social selection 17 .
However, when considering online communities, distinctive characteristics are found as compared to traditional offline ones, for better and for worse. On the one hand, visual cues available in oral communication are missed out online 18 , and physical separation between the students and their instructor often leads to a sense of isolation and contributes to dissatisfaction 19 . On the other hand, learners have ample time to organize their thoughts when composing posts or comments for online platforms 20 , they can post their comments or messages at their convenience, and do not have to compete for opportunities to talk 21 . Importantly, online networks enjoy an advantage in quantifying interactions more objectively through posts and comments 16,22 , as opposed to the use of questionnaires and surveys in offline datasets 17,23 .
In the existing literature, relatively few experimental studies examine how online interactions shape the temporal evolution of the knowledge of individual participants and of the community as a whole. We address this gap by introducing a new dataset of six online learning communities, and by formulating analysis, simulation and prediction frameworks that shed light on the mechanisms and implications of knowledge propagation within the dataset, temporally at individual and global levels.
The data comes from a ubiquitous online system, ROC Speak 24 , built by us, that gives people semi-automated feedback on public speaking. A total of 158 participants, aged 18 to 54 years, were hired from Amazon Mechanical Turk ("Turkers"), and worked towards a common goal of improving their public speaking skills. They were randomly assigned to 6 groups, which we refer to as Groups 1 through 6, having 26, 31, 26, 30, 22 and 23 participants respectively. They had to record 5 videos in 10 days on common job interview question prompts, one video every other day. For Groups 2, 4 and 6, the system generated automatic feedback on smile intensity, body movement/gesture, loudness, pitch, unique word count, word cloud and instances of weak language use, as shown in Figure 1. In all six groups, the participants exchanged subjective feedback with their peers. They were required to give feedback to at least three peers in each prompt, whom they could choose randomly. Each feedback consisted of (1) at least three comments and (2) performance ratings on a 1-5 Likert scale. The exchange of comments within groups essentially formed the six independent online learning communities in the dataset, and is considered the currency of interaction in our analysis. On the other hand, the average of the performance ratings each participant received from his/her peers in a given prompt is considered a proxy of the participant's knowledge, and we track its temporal behavior towards understanding how it is affected by the interactions. The dataset comprises 25,665 ratings (5,053 of them 'overall' performance ratings, which we use in this study) and 14,597 comments in total.
We present insights obtained through our analyses of the ROC Speak data. At the global level, we find evidence of diffusion of knowledge across the network. In essence, we show that as the participants interact more, the total variation in speaking performances decreases in the online communities, i.e., the participants' ratings gradually come closer to each other, while the ratings themselves increase over time. We introduce a framework to model and simulate such an online knowledge propagation mechanism based on consensus protocols for network synchronization and distributed decision making 25 . For example, Olfati-Saber and Murray 26 studied such consensus algorithms and presented relevant convergence analysis, but we modify the protocol therein to suit the online learning community in question. We discuss how the proposed network dynamics capture the unique features encountered with the online community, namely that participants "pull" each other closer towards the better as they interact more, gradually turning a heterogeneous learning community into a homogeneous one. Leveraging these insights, we look at an individual level, and find evidence that the participants are impacted by the quality of their peers and the amounts of interaction. We establish this by formulating a convex optimization problem to predict future performance ratings of the participants. We compare prediction errors between two models: (i) an ordinary least-squares regression using only past performance ratings of the participant, i.e., a model which is agnostic to the network effects on the participant; and (ii) the same least-squares predictor augmented with a smoothness regularization term encouraging network-wide consensus 27 . We find that the latter performs better at predicting the ratings consistently across all six learning communities in the dataset, thereby capturing the neighbors' effect on individual learning.
In developing these analysis, simulation and prediction methods we leverage concepts from Graph Signal Processing, an innovative framework with well-documented merits for studying similarities and connectivity patterns between interacting entities in applications such as sensor, brain and social networks 28 .
In this context, our contributions can be summarized as follows:
• Constructing a dataset of six online learning communities that comprises the speaking ratings and interaction information of a total of 158 participants across 5 prompts;
• Analyzing total variation and performance trends in the dataset to observe the fact that at a global level a heterogeneous learning community tends towards homogeneity;
• Modeling and simulating the observed online knowledge propagation mechanism; and
• Predicting future performance ratings of the learners through an optimized network-analytic approach, which provides evidence that individual performances are impacted by the quality of the participants' peers and the amounts of interaction.

Figure 1.
A snapshot of the ROC Speak feedback interface. The page shows automatically generated measurements for (A) smile intensity, gestural movements, (B) loudness, pitch, (C) unique word count, word recognition confidence, a transcription of the speech, word cloud, and instances of weak language. Additionally, the peer-generated feedback is shown in (D) a Feedback Summary section, and (E) a top-ranked comments section classified by usefulness and sentiment. In our study, we use the overall ratings given by the peers, as shown in segment (D).

Analysis of ROC Speak Data
Consider modeling an online community's participants and their interaction patterns through a network graph. To that end, we naturally identify each participant with a node in the said graph. In the ROC Speak dataset, people leave a comment only after they have watched a video of another user. How does knowledge diffuse through such pairwise interactions? Participant A can learn from the feedback given by Participant B, and Participant B can also learn from Participant A's public speaking approach, mistakes, and qualities from watching his/her video. Therefore we model each interaction via an undirected edge, acknowledging mutual benefit of the two nodes or participants. Exchanges of multiple comments are encoded through integer-valued edge weights. More formally, for each community in the dataset, let us consider an undirected, weighted graph $G(\mathcal{N}, \mathcal{E}, \mathbf{W}^{(p)})$ representing the network with a node set $\mathcal{N}$ of known cardinality $N$ (i.e., the total number of participants in the community), and the edge set $\mathcal{E}$ of unordered pairs of elements in $\mathcal{N}$. The so-called symmetric weighted adjacency matrix is denoted by $\mathbf{W}^{(p)} \in \mathbb{R}^{N \times N}$, whose $ij$-th element represents the total number of comments that participants $i$ and $j$ have given to each other during the $p$-th prompt. The participants could randomly choose whom to give feedback to, which makes $G$ a random graph. Notice that since comment and feedback patterns tend to change across prompts, so does the connectivity pattern (i.e., the topology) of the resulting graph and hence the explicit dependency of the weights in $\mathbf{W}^{(p)}$ with respect to $p$.
Let us now incorporate the participants' rating information in the model in the form of graph signals. Here, a graph signal 28 is a vertex-valued network process that can be represented as a vector of size $N$ supported on the nodes of $G$, where its $i$-th component is the rating of node $i$. As explained in the previous section, the participants are given performance ratings by their peers, and we take a participant's average rating in each prompt as the graph signal value. Thus, we collect the ratings of the $p$-th prompt in a vector $\mathbf{r}^{(p)} \in \mathbb{R}^{N}$, where $r_i^{(p)}$ is the $i$-th participant's rating. This representation is visualized in Figure 2. Under the natural assumption that the signal properties are related to the topology of the graph where they are supported, the goal of Graph Signal Processing is to develop models and algorithms that fruitfully leverage this relational structure, and can make inferences about these signal values when they are only partially observed 28 . We contend this is a valuable and innovative approach, which will be henceforth adopted for the analysis and simulation of the ROC Speak ratings data.
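To make the graph construction concrete, the following is a minimal sketch of how the adjacency matrix and the rating signal for one prompt could be assembled from raw records. The record formats (pairs of participant indices for comments, index and score pairs for ratings), the function names `build_adjacency` and `average_ratings`, and the toy values are all assumptions made for illustration; they are not the ROC Speak data format.

```python
def build_adjacency(comments, n):
    """Symmetric weighted adjacency matrix for one prompt.

    comments: iterable of (giver, receiver) index pairs, one per comment
              (assumed record format).
    n: number of participants in the community.
    """
    W = [[0] * n for _ in range(n)]
    for giver, receiver in comments:
        if giver == receiver:
            continue  # a self-comment is not a peer interaction
        # Undirected edge: the comment counts for both participants.
        W[giver][receiver] += 1
        W[receiver][giver] += 1
    return W


def average_ratings(ratings, n):
    """Graph signal: mean rating each participant received in the prompt."""
    totals, counts = [0.0] * n, [0] * n
    for receiver, score in ratings:
        totals[receiver] += score
        counts[receiver] += 1
    # Participants with no ratings get NaN rather than a made-up value.
    return [totals[i] / counts[i] if counts[i] else float("nan")
            for i in range(n)]


# Hypothetical toy prompt with 4 participants (not taken from the dataset).
comments = [(0, 1), (0, 1), (1, 0), (2, 3), (0, 2)]
ratings = [(1, 4), (1, 5), (0, 3), (3, 2), (2, 4)]
W = build_adjacency(comments, 4)
r = average_ratings(ratings, 4)
```

In this toy case participants 0 and 1 exchange three comments in total, so their edge weight is 3 regardless of who wrote which comment.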

Total Variation Analysis
Exposure to knowledge does not guarantee the ability to act upon it. A student can be taught the appropriate use of grammar, but unless s/he is able to use grammar accurately, the evaluations would not reflect the acquisition of knowledge. From the interactions in the online learning community, a participant might get to know how to modulate voice better, for instance, but his/her ratings would not improve unless the uploaded videos are reflective of this knowledge. Through mutually beneficial interactions and repetitive practice, however, we hypothesize that effective diffusion will gradually take place, i.e., people will eventually be able to convert the knowledge into actions. These will in turn enable them to perform closer to the qualities of their peers, since knowledge can be traded without decreasing an individual's share of it. Therefore the variation of ratings within the community will gradually decrease. The more people interact and the better their peers' ratings are, the more this effect is likely to show. To test this hypothesized phenomenon in our dataset, we carry out a total variation analysis of the rating signals as elaborated in the Methods section.
The total variation $\mathrm{TV}(\mathbf{r})$ is often referred to as a smoothness measure of the signal $\mathbf{r}$ with respect to the graph $G$. If user rating values in $\mathbf{r}$ differ significantly, especially between pairs of nodes that show strong patterns of interaction (or equivalently large weights), then $\mathrm{TV}(\mathbf{r})$ is expected to be large. If on the other hand the variations are minor, we say the graph signal is smooth and accordingly $\mathrm{TV}(\mathbf{r})$ will take on small values. All in all, one can assert that if the ratings of the users come closer to each other, the total variation measure will decrease.
In the ROC Speak data, we evaluate the total variation of the rating signals $\mathbf{r}^{(p)}$ across the network at the end of each prompt [cf. (1)]. The normalized total variation for each of the six groups is plotted in Figure 3a, and exhibits a decreasing pattern from the third prompt onwards. The dashed plot in Figure 3a indicates the linear trend of the average across the six groups. A diminishing trend is apparent, providing evidence that the interacting users are coming closer to each other in terms of rating outcomes.
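As a rough illustration of the smoothness measure, the sketch below computes a total variation of the common graph-Laplacian form, i.e., the weighted sum of squared rating differences over interacting pairs. This form is an assumption: the exact definition (1) and the normalization used for Figure 3a are given in the Methods section and may differ. The function name `total_variation` and the toy graph are hypothetical.

```python
def total_variation(W, r):
    """Sum of W[i][j] * (r[i] - r[j])**2 over unordered pairs i < j.

    Assumed Laplacian quadratic form; the paper's definition (1) may
    include a normalization that is not reproduced here.
    """
    n = len(r)
    return sum(
        W[i][j] * (r[i] - r[j]) ** 2
        for i in range(n)
        for j in range(i + 1, n)
    )


# Hypothetical 3-node graph: nodes 0 and 1 interact heavily, 0 and 2 lightly.
W = [[0, 3, 1],
     [3, 0, 0],
     [1, 0, 0]]
rough = [2.0, 5.0, 3.0]   # strongly connected peers rated very differently
smooth = [4.0, 4.0, 3.5]  # ratings close together

tv_rough = total_variation(W, rough)    # 3 * 9 + 1 * 1 = 28.0
tv_smooth = total_variation(W, smooth)  # 3 * 0 + 1 * 0.25 = 0.25
```

The heavily weighted edge dominates the measure, which is why a rating gap between frequent interaction partners raises the total variation far more than the same gap between occasional ones.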

Overall Trajectory of Ratings
To complement the gradual decrease of total variation, we calculate the network-wide average ratings at each prompt, as detailed in the Methods section, and plot them as a function of the prompt index p. Figure 3b shows that all of the six groups have a trend of improvement in average ratings across prompts. The dashed plot indicates that the average trend across all six groups has a positive-slope characteristic.
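The trend computation can be sketched as follows, assuming an ordinary least-squares line fitted to the network-wide averages against the prompt index. The helper name `linear_trend_slope` and the rating values are hypothetical and serve only to show the arithmetic; the averaging procedure actually used is the one detailed in the Methods section.

```python
def linear_trend_slope(values):
    """Least-squares slope of values against prompt index 1..len(values)."""
    n = len(values)
    xs = range(1, n + 1)
    x_mean = sum(xs) / n
    y_mean = sum(values) / n
    num = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, values))
    den = sum((x - x_mean) ** 2 for x in xs)
    return num / den


# Hypothetical ratings for 3 participants over 5 prompts (rows are prompts).
ratings_by_prompt = [
    [3.0, 3.5, 4.0],
    [3.2, 3.5, 4.1],
    [3.3, 3.7, 4.1],
    [3.5, 3.8, 4.1],
    [3.6, 3.9, 4.2],
]
# Network-wide average rating at each prompt, then its linear trend.
averages = [sum(row) / len(row) for row in ratings_by_prompt]
slope = linear_trend_slope(averages)
```

A positive slope corresponds to the collective improvement described above; the same helper applied to a per-prompt total variation series would give the sign of the variation trend.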
Our findings from the analysis of ROC Speak data indicate that at a global level, the participants improve collectively, while at the same time their ratings tend to a network-wide consensus. In line with the hypothesized phenomenon, repetitive interactions among the users in the roles of tutor and tutee enable effective diffusion to take place, and a participant thus accumulates knowledge. This accumulation allows the average ratings to have the observed positive trend. Knowledge being a non-rival good, the knowledge gap among participants also reduces, explaining the diminishing trend in total variation. We therefore interpret that the users experience a "pull" effect from their peers towards a better average rating, enabling the community to progress towards homogeneity at a global level.

Simulation Framework for Knowledge Propagation
Leveraging the aforementioned insights on the mechanisms of knowledge propagation in our dataset, we proceed to develop a model for the user ratings' evolution which facilitates simulation of the network process. We model the temporal evolution of user ratings $\mathbf{r}^{(p)}$ as a diffusion process on the graph $G$, where the trade of knowledge can take place without decreasing the individual level of knowledge. We impose a positive drift on the ratings in order to model the fact that a learner can accumulate knowledge from various sources such as automated feedback, peer feedback, experience of watching peers' videos, and other external resources. In a real network, people will have different learning rates, their individual talents will differ, external inflow of knowledge will not be consistent, and in some less probable cases they may forget information as well. To lump all these noisy yet positively skewed effects into one variable, we model the positive drift $\varepsilon$ as a Gaussian random variable with a positive mean $\mu > 0$. A superposition of random effects is well modeled by a Gaussian random variable by virtue of the Central Limit Theorem. The ratings received by participants lie between 1 and 5, so we introduce an explicit control procedure to bound the ratings in our model. Based on these assumptions, the formulation of our proposed simulation framework is elaborated in the Methods section. Figure 4 demonstrates the overall idea pictorially.

Figure 4.
User 1's rating in the $(p+1)$-th prompt, $r_1^{(p+1)}$, is calculated from the user's own rating in the $p$-th prompt, $r_1^{(p)} = 4$ (denoted by a blue bar), single interactions with users 2, 3 and 4, and a positive drift $\varepsilon_1$. Here, users 2 and 4 both have ratings of 3 in the first prompt, and user 3 has a rating of 5. Since two of the peers are "pulling" user 1 downwards and one peer is "pulling" upwards, the net diffusion effect takes user 1's rating from 4 down to 3.99 (denoted by a yellow bar). However, a Gaussian random positive drift $\varepsilon_1 = 0.0676$ (denoted by a green bar on top of the yellow bar) represents a sample realization of user 1's gathering of knowledge from sources external to peer feedback, and pushes the user's rating in the second prompt to $r_1^{(p+1)} = 3.99 + 0.0676 = 4.0576$. The magnified plot reflects the probability distribution of user 1's rating at the $(p+1)$-th prompt, which is centered around $3.99 + \mu$, where $\mu$ is the mean of the drift. This also shows that the rating in the $(p+1)$-th prompt can take any value in the interval $[r_{\min}, r_{\max}]$, but with higher probability it is close to $3.99 + \mu$.
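The arithmetic in the Figure 4 example can be reproduced with a short sketch. It assumes a consensus-type update in which each user moves by a step size c times the weighted sum of rating differences to his/her peers, then receives the drift, and is finally clipped to the rating range. This form is inferred from the worked numbers above with c = 0.01 (the value used in the numerical test); the exact dynamics (3) and the control procedure are specified in the Methods section, so the function `consensus_step` should be read as a hypothetical stand-in rather than the paper's implementation.

```python
def consensus_step(W, r, c, drift, r_min=1.0, r_max=5.0):
    """One prompt-to-prompt update: pull toward peers, add drift, clip.

    Assumed form: r_i + c * sum_j W[i][j] * (r_j - r_i) + drift_i,
    bounded to [r_min, r_max] by simple clipping (an assumption).
    """
    n = len(r)
    nxt = []
    for i in range(n):
        pull = sum(W[i][j] * (r[j] - r[i]) for j in range(n))
        value = r[i] + c * pull + drift[i]
        nxt.append(min(r_max, max(r_min, value)))
    return nxt


# Figure 4 scenario: user 1 (index 0) has one interaction each with
# users 2, 3 and 4 (indices 1, 2, 3), whose ratings are 3, 5 and 3.
W = [[0, 1, 1, 1],
     [1, 0, 0, 0],
     [1, 0, 0, 0],
     [1, 0, 0, 0]]
r = [4.0, 3.0, 5.0, 3.0]

no_drift = consensus_step(W, r, c=0.01, drift=[0.0] * 4)
with_drift = consensus_step(W, r, c=0.01, drift=[0.0676, 0.0, 0.0, 0.0])
```

Under these assumptions the diffusion term alone moves user 1 from 4 to 3.99, and adding the sampled drift of 0.0676 gives 4.0576, matching the values quoted in the caption.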

Numerical Test
We test the proposed model by running a simulation that synthetically generates user ratings across prompts according to the dynamics in (3) in the Methods section. Since the dataset spans 5 prompts and we would like to run the simulation for a longer time horizon, we synthetically generate graphs with structural properties resembling our dataset. Since the nature of interaction is random in our communities, we generate Erdős-Rényi random graphs for each prompt with the same number of nodes as our dataset ($N = 26$ nodes) 29 . Each edge is included in the graph with probability $p = 0.5$, so that the expected number of edges $\binom{N}{2} p$ matches the number of edges in the dataset. We set the positive drift term $\varepsilon$ to have mean $\mu = 0.05$ and standard deviation $\sigma = 0.1$ based on the ROC Speak average ratings behavior, and select $c = 0.01$. Then, we run the iterations described in (3) with $\mathbf{r}^{(1)}$ as initialization, where the entries $r_i^{(1)}$ are drawn uniformly at random from the interval $[1, 5]$. Figure 5a shows two realizations of the total variation (1) as it evolves over prompts, superimposed on the mean evolution obtained after averaging 1000 such independent realizations. The specific realizations show fluctuations similar to those observed in Figure 3a, but the trend is diminishing towards zero, meaning that the ratings become similar with time. Likewise, Figure 5b shows two realizations and the mean evolution of the network-wide average ratings (2) as a function of prompts. Again the simulation captures the observed effect of users globally improving with time.

Figure 6.
A sample visualization of how the users' ratings $\mathbf{r}^{(p)}$ (blue bars) change as total variation diminishes and average rating saturates at limiting time (i.e., prompt $p \to \infty$). The network topology changes in every prompt as the users randomly interact with each other. A decreasing $\mathrm{TV}(\mathbf{r})$ measure is a direct manifestation of the signal $\mathbf{r}^{(p)}$ being smoother as $p$ increases.
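A single realization of this numerical test can be sketched in a few lines, using the parameter values stated above (N = 26, edge probability 0.5, µ = 0.05, σ = 0.1, c = 0.01). Several parts are assumptions: the update rule is the consensus-type form consistent with the Figure 4 example rather than a transcription of (3), the bounding is done by simple clipping, the total variation uses the unnormalized Laplacian form, and the horizon of 100 prompts, the seed, and the names `er_graph`, `total_variation` and `simulate` are chosen only for illustration.

```python
import random


def er_graph(n, edge_prob, rng):
    """Erdős-Rényi graph as a symmetric 0/1 adjacency matrix."""
    W = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < edge_prob:
                W[i][j] = W[j][i] = 1
    return W


def total_variation(W, r):
    """Assumed form: sum of W[i][j] * (r[i] - r[j])**2 over pairs i < j."""
    n = len(r)
    return sum(W[i][j] * (r[i] - r[j]) ** 2
               for i in range(n) for j in range(i + 1, n))


def simulate(n=26, prompts=100, edge_prob=0.5, c=0.01, mu=0.05, sigma=0.1,
             r_min=1.0, r_max=5.0, seed=0):
    """One sample path of total variation and average rating over prompts."""
    rng = random.Random(seed)
    r = [rng.uniform(r_min, r_max) for _ in range(n)]
    tv_path, avg_path = [], []
    for _ in range(prompts):
        W = er_graph(n, edge_prob, rng)  # fresh random topology per prompt
        tv_path.append(total_variation(W, r))
        avg_path.append(sum(r) / n)
        # Synchronous update: every user sees the ratings of the same prompt.
        r = [
            min(r_max, max(r_min,
                r[i]
                + c * sum(W[i][j] * (r[j] - r[i]) for j in range(n))
                + rng.gauss(mu, sigma)))
            for i in range(n)
        ]
    return tv_path, avg_path, r


tv_path, avg_path, final = simulate()
```

Averaging `tv_path` and `avg_path` over many seeds would give mean curves of the kind shown in Figures 5a and 5b; a single path fluctuates from prompt to prompt because both the topology and the drift are random.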
To summarize, this section characterized the observed phenomenon of the users "pulling" each other closer towards the better through a diffusion model with a positive drift. This captures the positively skewed effects of repetitive interactions in accumulating and propagating knowledge. While the network model is admittedly simple, the simulation results indicate that the synthesized sample paths match the salient observations made from the real-data analysis. This effect is further illustrated in Figure 6, where the ratings of the users are shown as blue bars. The figure demonstrates that as the users interact more, the variation of their ratings diminishes while the average ratings approach the limiting value $r_{\max}$.

Ratings Prediction Under Smoothness Prior
Does the interaction information hold any key to where the knowledge of a participant and the community as a whole might stand after a given period of time? We have already shed light on how the community at a global level gradually becomes more homogeneous and evolves towards the better. To further support our insights into the underlying "pull" mechanism, we explore the phenomenon at an individual level, and approach it as a prediction problem. It is important to note that we do not seek to fit the data to find the best possible regression model for predicting future ratings of the users. Rather, we intend to compare prediction errors between network-agnostic and consensus-based regression models to illustrate our point that the network has an impact on individual outcomes. In particular, the network-agnostic regression model takes into account only the trajectory of a user's ratings to estimate his/her future rating, while the consensus-based regression model also considers the ratings of his/her peers and the quantity of interactions. The latter consistently makes a better estimate across the six groups, suggesting evidence of peer influence.
In our dataset, we have interaction and rating information across five prompts. We use the first four prompts' data for training, and predict the fifth prompt's ratings for all the users in the six groups. We compare the predictions against the original ground-truth ratings of the final prompt to report the prediction errors. In making the predictions, we use our proposed prediction frameworks of linear regression models with (a) linear and (b) non-linear features that encourage network-wide consensus in terms of small total variation. The objective functions for the regression models are elaborated through equations (5) and (6) in the Methods section, as well as the figures of merit used for computing prediction errors.
The results are summarized in Table 1. The first and third columns of the table show the relative errors obtained by our proposed consensus-based predictors across all six groups. The second and fourth columns tabulate the prediction errors when we switch off the smoothness regularization term (by plugging λ = 0 in (5) and (6)), i.e., when network-agnostic regression baselines are adopted to predict user ratings. Table 1 shows that if we incorporate the "pull" effect of the network into our predictions (λ ≠ 0), we are consistently able to markedly improve the prediction performance against the network-agnostic models (λ = 0) across all six learning communities. This result validates the idea that interactions with the neighbors in one's network indeed impact his/her learning outcomes.
For the sake of illustration, we can consider a scenario in a classroom setting. A teacher can track the test scores of John and make a prediction of his future score by running a regression through his records. We observe that John is essentially pulled closer to the qualities of his peers. Hence, if the teacher also plugs into the prediction framework how much John interacts with his peers and what the peers' scores are, our arguments suggest that the teacher will be able to make a better prediction. This is consistent with the interpretations of gradual convergence towards global knowledge homogeneity.
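The contrast between the two predictors can be illustrated with a simplified two-stage sketch. It is not the formulation in (5) and (6), which is given in the Methods section; instead it assumes that (i) a network-agnostic baseline extrapolates each user's own linear trend one prompt ahead, and (ii) a consensus-based variant then minimizes the squared distance to that baseline plus λ times the assumed Laplacian-form total variation. The minimizer is found with a Jacobi iteration, which converges here because the underlying linear system is strictly diagonally dominant. All names (`extrapolate_next`, `smooth_prediction`) and the toy histories are hypothetical.

```python
def extrapolate_next(history):
    """Network-agnostic baseline: fit a line to past ratings, extend by one."""
    n = len(history)
    xs = range(1, n + 1)
    x_mean = sum(xs) / n
    y_mean = sum(history) / n
    den = sum((x - x_mean) ** 2 for x in xs)
    slope = sum((x - x_mean) * (y - y_mean)
                for x, y in zip(xs, history)) / den
    return y_mean + slope * (n + 1 - x_mean)


def smooth_prediction(W, baseline, lam, iters=200):
    """Minimize sum_i (x_i - baseline_i)^2 + lam * sum_{i<j} W_ij (x_i - x_j)^2.

    Setting the gradient to zero gives, for every user i,
    x_i * (1 + lam * d_i) = baseline_i + lam * sum_j W_ij * x_j,
    which is iterated below (Jacobi); d_i is the weighted degree.
    """
    n = len(baseline)
    degree = [sum(row) for row in W]
    x = list(baseline)
    for _ in range(iters):
        x = [
            (baseline[i] + lam * sum(W[i][j] * x[j] for j in range(n)))
            / (1 + lam * degree[i])
            for i in range(n)
        ]
    return x


# Hypothetical: three users, four past prompts each; users 0 and 1
# exchanged two comments, user 2 did not interact with either of them.
histories = [[2.0, 2.5, 3.0, 3.5], [3.0, 3.0, 3.0, 3.0], [4.5, 4.5, 4.5, 4.5]]
W = [[0, 2, 0],
     [2, 0, 0],
     [0, 0, 0]]
baseline = [extrapolate_next(h) for h in histories]    # [4.0, 3.0, 4.5]
agnostic = smooth_prediction(W, baseline, lam=0.0)     # unchanged baseline
consensus = smooth_prediction(W, baseline, lam=0.5)    # peers pulled together
```

With λ = 0.5 the two interacting users' predictions move from 4.0 and 3.0 to roughly 3.67 and 3.33, while the isolated user's prediction is untouched; this is the "pull" that the teacher in the example above would be adding to John's forecast.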

Discussion
In this work we studied a dataset of six online learning communities from a network-centric, Graph Signal Processing perspective, and explored the mechanisms and implications of knowledge propagation. At a global level, we found evidence of total variation declining and average ratings improving temporally, suggesting diffusion and accumulation of knowledge. This leads to the gradual closure of the knowledge gap towards achieving community-wide homogeneity. We proposed a network diffusion model to simulate knowledge propagation dynamics, designed to capture the observed diminishing total variation and increasing trends in ratings. At an individual level, we found evidence of the participants being impacted by the knowledge of the peers and the amount of interaction. We observed this by developing a statistical inference tool to predict the users' future ratings based on their individual rating trajectories coupled with a network-wide smoothness regularization. Numerical results showed a consistent improvement in predictions across all six communities in the dataset when compared against baseline network-agnostic regression schemes, thereby providing support for our proposed mechanisms.

Table 1.
Fifth prompt prediction errors comparison between consensus-based (λ ≠ 0) and network-agnostic (λ = 0) frameworks, for both regression models with (a) linear and (b) non-linear features. Bold errors denote better prediction results. The objective functions for the regression models are elaborated through equations (5) and (6) in the Methods section.
(a) Prediction errors with linear features; (b) Prediction errors with non-linear features

The traditional view of knowledge as a foundational social construction contends that there is a 'reality' out there, and in pursuing the goal of discovering that best or correct answer, two heads or more are likely to do better than one. In contrast, in the non-foundational view, knowledge is constructed by a group of people working together, and not discovered, which makes having a community of learners a necessity 30 . For instance, William Whipple argues that teachers and students work interdependently to construct knowledge, rather than knowledge being transferred in an authoritarian structure from teacher to student 31 . Accordingly, in our dataset, participants interact online in an interdependent manner to construct knowledge together, showing evidence of mutual benefit.
The idea that learners actively construct knowledge as they build mental frameworks to make sense of their environments is a fundamental assumption of constructivism 2 . In a cognitive psychology study, subjects were shown the sentence "The window is not closed." Later, most of them recalled the sentence as "The window is open." This leads to the idea that students build a mental image or a cognitive map, a schema, as they learn something, and later build new knowledge in connection to what they already know 2 . In our analyses of the online communities, we found evidence that the total variation had a gradual decline while the average rating increased as the online interactions took place repetitively. This is in agreement with the idea of gradual buildup of knowledge through mental schemas.
The influential Coleman Report 8 was among the first comprehensive explorations to suggest that children's schooling outcomes are likely to depend on the attainment of their school peers. Establishing social ties with competent and helpful peers is likely to help students achieve better academic results and extract additional benefits through the utility of friendship 32 . Using Texas Schools Microdata, it has been shown that an exogenous change of 1 point in peers' reading scores raises a student's own score by between 0.15 and 0.4 points 33 . In our study of online learning communities, we observe in a similar way that participants are influenced by the quality of their peers as they interact temporally.
In a rather innovative study on peer effects, Goethals explored whether students would perform better writing about newspaper articles they read and discussed in academically homogeneous or heterogeneous groups of three 34 . It was observed that homogeneous groups outperformed the heterogeneous ones, irrespective of whether they were made up of students from the top or bottom halves of the class. In our study we do not explore outcomes of ability grouping in the context of online communities, but rather show that accumulation of knowledge gradually leads to global homogeneity. Exploring the impact of ability grouping on maximizing learning outcomes is left for future work.
One important aspect of the learning communities in our online dataset is that the participants could choose randomly whom they gave feedback to in each of the five prompts, leading to the formation of random networks. Cowan and Jonard explored in their simulations how the average and variance of knowledge behave as a network goes from regular to random by changing the rewiring probability 12 . It was observed that small-world networks with rewiring probabilities around 0.06 show greater average knowledge than regular or random networks, but at the cost of greater knowledge variance. Therefore, theoretically, the networks in our dataset are not optimized for facilitating the greatest average ratings of the participants, and perhaps the use of small-world structures can lead to better average performances. On the other hand, simulation results do show lower variance in knowledge for random graphs, which supports the idea of effective closing of the knowledge gap in our online communities, towards achieving homogeneity at a global level.
In his study on behavior propagation, Centola showed that health behavior spreads farther and faster across clustered-lattice networks than across corresponding random networks, since the former structure facilitates social reinforcement through redundant network ties 35 . Exploring how incorporating such regular-lattice and small-world structures affects the propagation of knowledge-based action, i.e., improved public speaking skills in the communities, is another direction for our future work.
In our dataset, the Turkers were recruited and grouped randomly, and we found no evidence of friendship ties forming over the short 10-day duration of the study. Therefore, in our analyses we ignored the complexities of a possible co-evolutionary process between the dynamics of social selection (formation of friendship ties) and social influence (performance outcomes), which Lomi et al. 17 discussed for an offline MBA learning community.
Although our dataset contains six independent groups, their sizes (26, 31, 26, 30, 22 and 23 participants, respectively) are not large enough to give conclusive insights into knowledge propagation dynamics in large communities with thousands of users. In all six groups, the users interacted over five prompts, given to them across 10 days. This short horizon likewise prevents us from conclusively characterizing performance trends at a distant limiting time, for which we had to resort to simulation. The peers were not trained raters, hence we cannot rule out bias in their judgments. These are some of the sources of noise that might have corrupted the data we used.
ROC Speak is an online learning community, but the analysis, simulation and prediction frameworks we present can be applied to any online or offline learning community that has similar interaction characteristics and quantifiable outcomes. An example of such a learning community is a Massive Open Online Course (MOOC) forum. Many contemporary MOOCs have discussion forums, but in many of these, participation is an optional endeavor rather than an integral part of the learning process. Our findings show that the amount and nature of interaction have an effect on the learning outcomes. The effectiveness of deploying interaction interfaces as integral components of MOOC learning can therefore be explored, with a view to maximizing knowledge propagation.

Total Variation Analysis
Recall the network adjacency matrix $W^{(p)} \in \mathbb{R}^{N \times N}$ defined previously. The degree of user $i$ at the $p$th prompt is the total number of comments that s/he has exchanged by the end of that prompt, and is defined as $d_i^{(p)} := \sum_{j} W_{ij}^{(p)}$. The graph Laplacian is then $L^{(p)} := D^{(p)} - W^{(p)}$, where $D^{(p)} := \mathrm{diag}(W^{(p)} 1_N)$ is a diagonal matrix that collects the users' degrees on its diagonal, and $1_N$ stands for an all-ones vector of length $N$ 36 . For a graph signal $r$ at any given prompt, one can utilize the graph Laplacian to compute the so-termed total variation (TV) of the users' ratings over the network as

$$\mathrm{TV}(r) := r^T L^{(p)} r = \frac{1}{2} \sum_{i \neq j} W_{ij}^{(p)} (r_i - r_j)^2, \qquad (1)$$

where $r_i$ and $r_j$ reflect the ratings of users $i$ and $j$, respectively, while the weights $W_{ij}^{(p)}$ account for the amount of interaction between users $i$ and $j$ in the prompt in question. Equation (1) clearly shows that if the ratings do not vary much across connected nodes, then $\mathrm{TV}(r)$ is small, and vice versa, making total variation a measure of smoothness. To better present total variations in a graph, we collect them in a vector $\mathbf{TV} = [\mathrm{TV}(1), \mathrm{TV}(2), \ldots, \mathrm{TV}(m)]^T$ and plot the normalized total variation $\mathbf{TV} / \|\mathbf{TV}\|$ over prompts in Figures 3a and 5a, where $\|\cdot\|$ maps a vector to its Euclidean norm.
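The total variation computation described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function names and the toy weights and ratings are hypothetical, and the interaction matrix is assumed to be symmetric with a zero diagonal.

```python
import numpy as np

def graph_laplacian(W):
    """Return L = D - W for a symmetric, nonnegative weight matrix W."""
    W = np.asarray(W, dtype=float)
    degrees = W.sum(axis=1)            # d_i = sum_j W_ij
    return np.diag(degrees) - W

def total_variation(W, r):
    """TV(r) = r^T L r for the ratings r on the graph with weights W."""
    r = np.asarray(r, dtype=float)
    return float(r @ graph_laplacian(W) @ r)

def normalized_tv(weights_per_prompt, ratings_per_prompt):
    """Stack TV over prompts and divide by the Euclidean norm of the stack."""
    tv = np.array([total_variation(W, r)
                   for W, r in zip(weights_per_prompt, ratings_per_prompt)])
    return tv / np.linalg.norm(tv)

# Toy example with made-up numbers (not taken from the ROC Speak data):
# users 0 and 1 exchanged 2 comments, users 1 and 2 exchanged 1 comment.
W = np.array([[0, 2, 0],
              [2, 0, 1],
              [0, 1, 0]])
r = np.array([3.0, 4.0, 2.0])
tv_value = total_variation(W, r)       # 2*(3-4)^2 + 1*(4-2)^2 = 6
```

Heavily interacting pairs with very different ratings dominate the sum, which is why a decreasing TV across prompts can be read as connected users' ratings moving closer together.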

Overall Trajectory of Ratings
The network-wide average rating at each prompt is calculated as

$$\bar{r}^{(p)} := \frac{1}{N} \sum_{i=1}^{N} r_i^{(p)}, \qquad (2)$$

where $r_i^{(p)}$ is the rating of the $i$th user in the $p$th prompt.

Network Diffusion Model of Knowledge Propagation
Based on the assumptions discussed in the Simulation subsection of Results, we model the evolution of ratings via the Laplacian dynamics

$$r^{(p+1)} = P_{[r_{\min}, r_{\max}]}\left( r^{(p)} - c L^{(p)} r^{(p)} + \varepsilon \right), \qquad (3)$$

where $r^{(p)}$ is the ratings vector of the $p$th prompt, $c \in (0, 1/d_{\max})$ is the diffusion constant and $d_{\max}$ is the maximum degree of nodes at the corresponding prompt, $L^{(p)}$ is the Laplacian matrix of the graph $G(\mathcal{N}, \mathcal{E}, W^{(p)})$ at the $p$th prompt, $\varepsilon$ is a Gaussian random vector with mean $\mu > 0$ and given variance $\sigma^2$, and $P_{[r_{\min}, r_{\max}]}(\cdot)$ is a projection operator onto the interval $[r_{\min}, r_{\max}]$.
For ROC Speak one has $r_{\min} = 1$ and $r_{\max} = 5$.
To understand the chosen dynamics, disregard the projection operator in (3) for the sake of a simpler argument. Then notice that the update $r^{(p+1)} = r^{(p)} - c L^{(p)} r^{(p)}$ represents a Laplacian-based network diffusion process 26 , where the future rating of a given user depends on the ratings of his/her peers in the current prompt ($r^{(p)}$) and the nature of the interactions taking place in the learning community ($L^{(p)}$). Focusing on the $i$th user recursion in (3) (modulo the projection operator), one obtains the scalar update

$$r_i^{(p+1)} = \left(1 - c\, d_i^{(p)}\right) r_i^{(p)} + c \sum_{j \neq i} W_{ij}^{(p)} r_j^{(p)} + \varepsilon_i, \qquad (4)$$

where $r_i^{(p)}$ is the $i$th user's rating at the $p$th prompt and $\varepsilon_i$ is the $i$th entry of $\varepsilon$. In obtaining (4), we have used the definition of the graph Laplacian, i.e., $L^{(p)} := D^{(p)} - W^{(p)}$. Because $c < 1/d_{\max}$, the noise-free part of $r_i^{(p+1)}$ is a weighted average of user $i$'s and his/her neighbors' ratings, with each neighbor's weight being proportional to the number of interactions that s/he has with user $i$. This way, the model captures knowledge propagation across the network, where the diffusion constant $c$ is a relatively small number. Further intuition can be gained by interpreting $r^{(p+1)} = r^{(p)} - c L^{(p)} r^{(p)}$ as a gradient-descent iteration (with step size $c/2$, since $\nabla \mathrm{TV}(r) = 2 L^{(p)} r$) to minimize the total variation functional $\mathrm{TV}(r) := r^T L^{(p)} r$ in (1). This suggests that (3) will drive the ratings towards a consensus of minimum total variation, directly modeling the observed "pull" effect.
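A minimal simulation sketch of the projected Laplacian dynamics may help make the update concrete. It rests on several assumptions that are not part of the study: the interaction matrix is kept fixed across prompts (the model allows a different $W^{(p)}$ at each prompt), the diffusion constant is set to $c = 0.5/d_{\max}$ as one admissible value in $(0, 1/d_{\max})$, and the noise parameters, function names and toy network are hypothetical.

```python
import numpy as np

def diffusion_step(r, W, c, mu, sigma, rng, r_min=1.0, r_max=5.0):
    """One update r <- P(r - c L r + eps), with P clipping to [r_min, r_max]."""
    W = np.asarray(W, dtype=float)
    L = np.diag(W.sum(axis=1)) - W                   # graph Laplacian D - W
    eps = rng.normal(loc=mu, scale=sigma, size=r.shape)
    return np.clip(r - c * (L @ r) + eps, r_min, r_max)

def simulate(r0, W, n_prompts, mu=0.05, sigma=0.02, seed=0):
    """Iterate the update with a fixed W (an assumption of this sketch)."""
    rng = np.random.default_rng(seed)
    W = np.asarray(W, dtype=float)
    d_max = W.sum(axis=1).max()
    c = 0.5 / d_max                                  # any value in (0, 1/d_max)
    history = [np.asarray(r0, dtype=float)]
    for _ in range(n_prompts):
        history.append(diffusion_step(history[-1], W, c, mu, sigma, rng))
    return np.array(history)                         # shape (n_prompts + 1, N)

# Toy ring of four users with made-up initial ratings.
W = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]])
r0 = np.array([1.5, 2.5, 4.0, 4.5])
trajectory = simulate(r0, W, n_prompts=5)
```

With the noise switched off (`mu=0`, `sigma=0`), each step is a convex combination of a user's own rating and the neighbors' ratings, so the gap between the highest and lowest rating cannot grow. A positive noise mean adds an upward drift on top of this consensus effect.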

Ratings Prediction Under Smoothness Prior
We collect the vectors $r^{(p)} \in \mathbb{R}^N$ for the first $m$ prompts ($m = 4$ in our data set) to form the training set, where $N$ is the number of users and the $i$th component of $r^{(p)}$ indicates the rating of the $i$th user by the end of the $p$th prompt. Then, we predict the $(m + 1)$th prompt ratings using two different linear regression models: (a) $\hat{r} = \beta_0 + \beta_1 p$, and (b) $\hat{r} = \beta_0 + \beta_1 p + \beta_2 \sqrt{p}$, where $p$ is the prompt index, and $\beta_0, \beta_1, \beta_2 \in \mathbb{R}^N$ are parameters learned through the following regularized least-squares criteria, respectively:

(a)

$$\min_{\beta_0, \beta_1} \sum_{p=1}^{m} \left\| r^{(p)} - \beta_0 - \beta_1 p \right\|^2 + \lambda \sum_{p=1}^{m} (\beta_0 + \beta_1 p)^T L^{(p)} (\beta_0 + \beta_1 p) + \mu \|\beta_1\|^2, \qquad (5)$$

and (b)

$$\min_{\beta_0, \beta_1, \beta_2} \sum_{p=1}^{m} \left\| r^{(p)} - \beta_0 - \beta_1 p - \beta_2 \sqrt{p} \right\|^2 + \lambda \sum_{p=1}^{m} (\beta_0 + \beta_1 p + \beta_2 \sqrt{p})^T L^{(p)} (\beta_0 + \beta_1 p + \beta_2 \sqrt{p}) + \mu \left( \|\beta_1\|^2 + \|\beta_2\|^2 \right). \qquad (6)$$

Here, regression model (b) has the non-linear $\sqrt{p}$ term to capture the saturation effect at limiting time, as observed in our numerical tests. Problems (5) and (6) are both convex, specifically unconstrained quadratic programs with closed-form solutions 37 . The first summands in the objective functions in (5) and (6) are data fidelity terms, which take into account each participant's individual trajectory of learning across the first $m$ prompts and fit it to the respective linear regression model. Accordingly, the first terms can be attributed to a person's own talent or pace of learning. The second summands in both objective functions incorporate the effect of the neighbors' ratings and the amount of interaction into a participant's learning curve, in accordance with our observation that the whole community is being "pulled" closer. Notice that these terms are nothing other than smoothing regularizers of the form $\sum_{p=1}^{m} \mathrm{TV}(\hat{r})$, hence encouraging predicted user ratings with small total variation. The tuning parameter $\lambda > 0$ balances the trade-off between faithfulness to the past ratings data and the smoothness (in a total variation sense) of the predicted graph signals $\hat{r} = \hat{\beta}_0 + \hat{\beta}_1 p$ and $\hat{r} = \hat{\beta}_0 + \hat{\beta}_1 p + \hat{\beta}_2 \sqrt{p}$, and can be chosen via model selection techniques such as cross validation 38 .
If we switch off the network-consensus effect by setting $\lambda = 0$ in (5) and (6), they boil down to network-agnostic baseline regression models. The parameter $\mu$, which is chosen along with $\lambda$ via leave-one-out cross validation, prevents overfitting via a shrinkage mechanism that ensures that neither $\beta_1$ nor $\beta_2$ dominates the objective functions disproportionately 38 .
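To illustrate how such a criterion can be solved in closed form, the sketch below fits model (a) by stacking $\beta_0$ and $\beta_1$ into one vector and solving the resulting normal equations. It is an illustration under stated assumptions rather than the authors' code: the objective is taken to be a squared-error fidelity term plus $\lambda$ times the total variation of the fitted ratings at each training prompt plus $\mu \|\beta_1\|^2$, and the function names and toy data are hypothetical.

```python
import numpy as np

def fit_smooth_linear(ratings, laplacians, lam, mu):
    """Fit r_hat(p) = beta0 + beta1 * p with a TV smoothness penalty.

    ratings:    array of shape (m, N), row p-1 holds the ratings at prompt p
    laplacians: sequence of m Laplacian matrices, each of shape (N, N)
    """
    ratings = np.asarray(ratings, dtype=float)
    m, N = ratings.shape
    I = np.eye(N)
    H = np.zeros((2 * N, 2 * N))       # quadratic term of the objective
    g = np.zeros(2 * N)                # linear term of the objective
    for p in range(1, m + 1):
        A = np.hstack([I, p * I])      # A @ [beta0; beta1] = beta0 + beta1 * p
        M = I + lam * np.asarray(laplacians[p - 1], dtype=float)
        H += A.T @ M @ A
        g += A.T @ ratings[p - 1]
    H[N:, N:] += mu * I                # shrinkage acts on beta1 only
    theta = np.linalg.solve(H, g)
    return theta[:N], theta[N:]

def predict(beta0, beta1, p):
    """Predicted ratings vector at prompt index p."""
    return beta0 + beta1 * p

# Toy data with made-up numbers: N = 3 users, m = 4 training prompts.
W = np.array([[0.0, 2.0, 0.0],
              [2.0, 0.0, 1.0],
              [0.0, 1.0, 0.0]])
L = np.diag(W.sum(axis=1)) - W
R = np.array([[2.0, 3.0, 4.0],
              [2.5, 3.2, 4.1],
              [2.9, 3.5, 4.0],
              [3.2, 3.6, 4.2]])
b0, b1 = fit_smooth_linear(R, [L] * 4, lam=0.5, mu=0.1)
next_ratings = predict(b0, b1, p=5)
```

Setting `lam=0` and `mu=0` reduces the fit to an independent ordinary least-squares line per user, which corresponds to the network-agnostic baseline. Larger `lam` pulls the fitted ratings of interacting users towards each other.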

Data Availability
The dataset analyzed in the study is available from the corresponding author on reasonable request.