The majority of today’s online social networks offer the users the possibility to actively contribute to the platform’s growth by sharing different forms of UGC, according to their interests, competencies, and willingness. Looking from a different perspective, users are also exposed to the content generated by others on the platform. Thanks to the integrated search engines and the use of hashtags they can explore this content, discover users with similar interests, and ultimately decide to become followers. Given the limited time, they can spend on social media platforms, intuitively users seek to optimize the list of followees so as to receive high-quality content.

### Meritocratic principle

In order to support our intuition, we collected a longitudinal Twitter data set^{30} on a network composed of >6000 scientists working in the area of complex social networks. Compared with other data sets, one of the advantages is that we can expect most of the complex network scientists to be active on Twitter, given that they consistently study social networks effects. Moreover, the most popular nodes can be easily associated with renowned researchers in the field. Arguably, the number of followers can be considered as a proxy for the quality of the content generated by a user. We support this hypothesis by manually inspecting and labeling the top in-degree nodes.

Then, we enumerate the agents in decreasing in-degree order, so that agent 1 (presumably providing the best UGC) is the most followed, agent 2 is the second most, and so on. For each agent *i*, we reconstruct the temporal sequence of the outgoing connections (({i}_{1},{i}_{2},ldots ,{i}_{{d}_{i}^{{{{{{{{rm{out}}}}}}}}}})), where *i*_{k} is the index (rank) of the destination of the *k-*th outgoing connection of agent *i*, and ({d}_{i}^{{{{{{{{rm{out}}}}}}}}}) is her final out-degree. Then, for each agent *i* we compute the probability

$${p}_{i}=frac{left|{jin left{2,ldots ,{d}_{i}^{{{{{{{{rm{out}}}}}}}}}right},,{{mbox{s.t.}}}{i}_{j} , > , {{mbox{median}}},{{i}_{1},ldots ,{i}_{j-1}}}right|}{{d}_{i}^{{{{{{{{rm{out}}}}}}}}}-1},$$

(1)

that estimates the likelihood that a new connection of agent *i* is higher (in ranking) than the median of the previous connections. At one extreme, for *p*_{i} = 0 the sequence (({i}_{1},{i}_{2},ldots ,{i}_{{d}_{i}^{{{{{{{{rm{out}}}}}}}}}})) is such that every new followee has a rank smaller than (or equal to) the median of the current list of followees’ ranks, i.e., ({i}_{j}le ,{{mbox{median}}},{{i}_{1},ldots ,{i}_{j-1}}). As such, the rolling median, i.e., the median computed among the first *k* elements, is always non-increasing in *k*, and the user is continuously seeking better UGC. Conversely, for *p*_{i} = 1 the rolling median is always increasing.

In Fig. 1 we compare the histogram of the empirical distribution of *p*_{i} (in purple) with the null hypothesis, plotted in blue, in which we remove the temporal-ordered pattern of the sequence. The elements in agent *i*’s followees set ({{{{{{{{mathcal{F}}}}}}}}}_{i}^{{{{{{{{rm{out}}}}}}}}}:= {{i}_{1},{i}_{2},ldots ,{i}_{{d}_{i}^{{{{{{{{rm{out}}}}}}}}}}}) are randomly re-ordered: in the new sequence (({bar{i}}_{1},{bar{i}}_{2},ldots ,{bar{i}}_{{d}_{i}^{{{{{{{{rm{out}}}}}}}}}})), the *k-*th element of the list is drawn uniformly at random from ({{{{{{{{mathcal{F}}}}}}}}}_{i}^{{{{{{{{rm{out}}}}}}}}}setminus left({bar{i}}_{1},ldots ,{bar{i}}_{k-1}right)). The null hypothesis has a median value of ~0.5, which is easy to interpret: if the sequence is completely random, adding an extra element to the partial sequence has a 50% probability of being above the median, and 50% probability of being below it. Comparing the two distributions, we notice that empirical data tend to have a decreasing quality-ranking sequence of followees. As the difference is statistically significant, we can reject the hypothesis that the temporal sequence is random.

Ultimately, this empirical evidence confirms our intuition, i.e., users tend to continuously increase the quality threshold of the new followees. This characteristic, though, is missing in the network formation literature, as there is typically no quality associated with the users. For instance, in the preferential attachment model^{27}, each user selects *m* followees proportional to their in-degree. If the network is large enough, the probability of selecting a node *k* in the *d-*th draw does not depend on *d*. Therefore, the temporal sequence of connections of the preferential attachment model is similar to the null hypothesis. Moreover, even in the fitness model^{16} where the quality (fitness) is considered, users tend to connect to high-quality nodes first, rather than later.

### Quality-based model

To formalize our quality-based model, we consider the unweighted directed network among *N* ≥ 2 agents whose UGC revolves around a specific common interest, e.g., a particular traveling destination. We denote the directed tie from *i* to *j* with ({a}_{ij}in left{0,1right}), where *a*_{ij} = 1 means *i* follows *j*. Then, we assume there are no self-loops and that each agent *i* can only control her followees *a*_{ij} but not her followers *a*_{ji}. Similarly to the approach in the fitness model^{16}, we endow each actor *i* with an attribute *q*_{i}, drawn from a probability distribution, e.g., uniform, normal, exponential distribution, that describes the average quality of *i*’s content^{32}, e.g., a picture taken at that traveling destination. As will be manifested later, our model predictions are independent of the numerical representation of these qualities, which could be somehow subjective and arbitrary. Instead, in our model, only the ordering of the individual qualities matters. Therefore, the choice of the underlying probability distribution does not affect any of the following results, contrary to the fitness model^{16}.

The quality *q*_{i} can be seen as the expectation of a Bernoulli random variable *Q*_{i} describing the probability of followers liking agent *i*’s content. Higher values of *q*_{i} are then associated with better UGC. A value of zero, instead, can be used to model users that do not produce any UGC. With this setup, the model can be directly applied to the platforms, e.g., YouTube or Twitch, in which users can be partitioned into two classes, i.e., the content creators and their followers (or viewers, see Supplementary Note 1).

We then consider a sequential dynamical process starting from the empty network, where at each time-step (tin left{1,2,ldots right}) each actor *i* picks another distinct actor *j* chosen randomly from a probability distribution on (left{1,ldots ,i-1,i+1,ldots ,Nright}). In the following theoretical analysis, we consider the uniform distribution. However, we also integrate into our discussion the numerical comparison between uniform distribution and the in-degree-based preferential attachment meeting process.

To reflect the meritocratic principle, we base the tie formation decision on the comparison between *i*’s current followees’ and *j*’s qualities. Let the payoff function of agent *i* measure the maximum quality received by *i*, i.e.,

$${V}_{i}

(2)

where ({{{{{{{{mathcal{F}}}}}}}}}_{i}^{{{{{{{{rm{out}}}}}}}}}

(3)

meaning that *i* will add *j* in her followees’ set if *j* provides better quality content compared to *i*’s current followees. Note that, if *i* finds a node that already belongs to her set of followees, the connection will not be re-discussed. While, intuitively, this may lead to a large out-degree, we will show that this is not the case because the cost of good-quality connections is infinitely low, but the cost of poor-quality ones is infinitely high.

We emphasize that the choice of the payoff function eq. (2) reflects the natural continuous chase for the maximum quality^{37} (exploitation) while minimizing the effort owing to non-improving connections. Alternatively, one could consider smoother payoffs such as the followees’ average quality

$${V}_{i}

### Theorem 1

(Convergence) For any set of qualities (left{{q}_{1},ldots ,{q}_{N}right}), the network reaches an equilibrium almost surely. Moreover, the probability of reaching equilibrium within *t* time-steps reads as follows:

$${mathsf{P}} [,{{mbox{An equilibrium is reached within}}};t;{{mbox{time steps}}},]\ ={mathsf{P}}[{a}_{12}

From the above theoretical cumulative distribution function, we derive the expected number of time-steps to reach equilibrium. The result, shown in Fig. 2 (in blue), indicates that the number of time-steps grows almost exponentially in the network size *N*.

For comparison, we consider two alternative scenarios for the meeting probability distribution. In orange, we use a preferential attachment approach: at each time-step, each agent *i* meets a distinct agent *j* according to a probability that is proportional to the current in-degree of the agent *j*. By doing so, the quality-based dynamics (in the follow/not follow decision) are combined with a meeting process that coarsely resembles the effect of recommendation systems (increasing the visibility of high in-degree nodes). In purple, the meeting agent is chosen with 50% probability, from the uniform distribution, and with the remaining 50% probability, according to the preferential attachment mechanism. As the numerical results indicate, the preferential attachment-based meeting process significantly reduces the number of required time-steps, even when mixed with a uniform distribution.

### In-degree distribution

The network structure at equilibrium is a consequence of the probability distribution ruling the meeting process, and of the meeting order itself. Even though there can be multiple equilibria, some network macroscopic properties can be statistically described. A preliminary observation is that, at equilibrium, every other node follows node 1, and node 1 follows node 2. Before studying the stationary state, we statistically describe the transient in-degree distribution as a function of the quality ranking *i*. In particular, we provide an analytical formula for the expected in-degree of each node as a function of *i*, as stated in the following theorem.

### Theorem 2

(In-degree distribution) The probability that node *i* is followed by node *j* ≠ *i* after *t* > 0 time-steps is:

$${mathsf{P}}[{a}_{ji}

(4)

Moreover, the probability of node *i* having in-degree ({d}_{i}^{{{{{{{{rm{in}}}}}}}}}

(5)

where we omitted the time-step dependency on ({bar{p}}_{i}) and ({underline{p}}_{i}). Finally, the expected in-degree of node *i* after *t* time-steps reads as:

$${mathbb{E}}left[{d}_{i}^{{{{{{{{rm{in}}}}}}}}}

(6)

The proof of the theorem is provided in Supplementary Note 1, whereas in Fig. 3a we show the probability density function, together with its expectation, for a network of 1000 agents, after *t* = 200 time-steps (non-stationary state). The following corollary, instead, studies the probability density functions upon reaching the network dynamics equilibrium.

### Corollary 1

At equilibrium, the probability that node *i* is followed by node *j* ≠ *i* is:

$${mathsf{P}}[{a}_{ji}=1]:= mathop{lim }limits_{tto infty }{mathsf{P}}[{a}_{ji}

(7)

and the expected in-degree of node *i* reads as:

$${mathbb{E}}left[{d}_{i}^{{{{{{{{rm{in}}}}}}}},star }right]=left{begin{array}{ll}N-1,quad &, {{mbox{if}}},i=1,\ frac{N}{i},quad &,{{mbox{otherwise}}}.end{array}right.$$

(8)

### Proof

First, we derive the probability of node *i* being followed by node *j* at equilibrium by taking the limit *t* → *∞* in eq. (4):

$$mathop{lim }limits_{tto infty }{mathsf{P}}[{a}_{ji}

Similarly, taking the limit for *t* → *∞* of eq. (6) yields exactly to

$${mathbb{E}}left[{d}_{i}^{{{{{{{{rm{in}}}}}}}},star }right]=left{begin{array}{ll}N-1,quad &, {{mbox{if}}},i=1,\ frac{N}{i},quad &,{{mbox{otherwise}}},.end{array}right.$$

According to the above result, at equilibrium the best content provider, node 1, receives *N* − 1 connections, node 2 has *N*/2 expected followers, node 3 has *N*/3, and so on. The result can be intuitively reached with the following plausible reasoning: any user that has not yet found node 1 nor node 2, has the same probability of finding any of the two in the coming time-step. In expectation, in half of the cases, the user will become a follower of node 2 before finding and following (necessarily) node 1. In the other half of the case, she will find node 1 before having seen node 2. Thus, the expected number of followers of node 2 is half of the expected number of followers of node 1. The reasoning can be extended for any node *j* > 2.

Such a regular scaling property is called Zipf’s law^{34} and it is illustrated in Fig. 3b, where we plot the expected in-degree of each node as a function of its quality ranking, together with the probability density functions. In the log–log scale, the expected in-degree perfectly matches a line with coefficient −1. Real-world evidence of Zipf’s law has been reported in many systems, including firm sizes^{38}, city sizes^{33}, connections between web-pages^{39}.

To empirically validate our results in the context of online social networks, we collected and analyzed data from Twitch, an online social media platform focusing on video streaming, including broadcasts of gameplay, e-sports competitions, and real-life content. Over the past decade, Twitch gradually became one of the most popular social media platforms, reaching up to four million unique creators streaming each month and 17 million average daily visitors (https://www.twitch.tv/p/press-center/), and serving as a virtual third place for online entertainment and social interactions^{40}. Our data sets consist of the followership networks in three different categories, i.e., poker, chess, and art, where users can live-watch broadcasters playing or discussing these arguments. Among them, we only retain the first two categories, as the third one did not satisfy the criterion of having a baseline community of interested users. The description of our crawling method and of the collected data are provided in the Methods section and in Supplementary Note 3. In Fig. 4, we show the in-degree of the 15 most followed users in chess and poker as a function of the rank. The empirical data are fitted via linear regression (in the log–log plot) and look strikingly similar to Zipf’s law.

### Zipf’s law is more than a power-law

The peculiarity and apparent ubiquity of Zipf’s law have triggered numerous efforts to explain its origins^{33}. Despite being a discrete distribution, Zipf’s law is often associated with the continuous Pareto distribution, better known as power-law^{41}. For this reason, it is frequently seen as the result of a linear preferential attachment process^{27} based on Gibrat’s rule of proportional growth^{42} which leads to Yule-Simon (power-law) distributions^{43}. However, as noticed in^{35}, there is more than a power-law in Zipf: although a power-law distribution is certainly necessary to reproduce the asymptotic behavior of Zipf’s law at large values of rank *i*, any random sampling of data does not lead to Zipf’s law and the deviations are dramatic for the largest objects. In particular, Zipf’s law emphasizes the relation among the top-ranking elements, which essentially correspond to the most important nodes, i.e., the network influencers. The typical Zipf’s sequence *N*, *N*/2, *N*/3, … can be observed only if data constitutes a coherent set^{35}. Thus, Zipf’s law is more insightful than just a power-law.

The difference with a Pareto distribution becomes evident when considering our in-degree probability density function, which can be analytically derived by using the result of the previous theorem and computing the average of each user’s probability density function

$${mathsf{P}}[{d}^{{{{{{{{rm{in}}}}}}}},star }=d]=frac{1}{N}mathop{sum }limits_{i=1}^{N}{mathsf{P}}[{d}_{i}^{{{{{{{{rm{in}}}}}}}},star }=d].$$

As shown in Fig. 5, the theoretical in-degree probability density function follows a power-law of coefficient (hat{alpha }=-2.06pm 0.003) (*p* value < 10^{−8}), which is not surprising since the expected in-degree is distributed according to a Zipf’s law (see again the discussion in ref. ^{41}). However, notice that the power-law fit does not hold in the region of low-in-degree nodes (*d*_{min} = 7, according to the Clauset algorithm^{44}). In this region, the log-normal distribution is a better fit compared to the power-law^{45}. On the other hand, the fit with a log-normal distribution is worse (compared to the power-law) in the heavy tail of our distribution, i.e., in the region of high-quality nodes. However, the most noticeable difference is that, contrary to both the power-law and the log-normal curve, our theoretical distribution is not monotonically decreasing, especially in its right tail. The probability of having a node of in-degree *N*/2 is slightly larger than the probability of having a node of in-degree *N*/2 ± *α*, for some range of *α* ≥ 1. Moreover, the probability of having a node of in-degree in a neighborhood of *N*/2 is larger compared with the power-law fit (chosen as baseline). Conversely, our theoretical probability of having a node of in-degree slightly smaller than *N* − 1 is negligible (<10^{−8}), stating that the second-best node cannot be too close to the best node. This is in sharp contrast with the monotonic Pareto and log-normal distributions.

In synthesis, our probability distribution differs from Pareto and log-normal distributions when focusing on the network influencers. In this range, i.e., in the extreme right tail, our theoretical distribution is similar to its power-law fit, only after logarithmically (and thus coarsely) binning the data (as common practice when empirical data are scattered and the distributions are continuous). Yet, without such arguably coarse binning of the data, our theoretical distribution can predict more accurately the Zipf’s regularity found on the top influencers of real-world networks, as in the Twitch data set shown in Fig. 4. We finally refer to Supplementary Note 4 (see in particular, Supplementary Figs. 12, 13) for a detailed analysis of the empirical in-degree distribution.

### Preferential attachment meeting process

The theoretical results on the nodes’ in-degree distributions are derived based on a uniform distribution meeting process. In this scenario, every user has the same probability of being found. However, most social media platforms personalize the content users are exposed to, thus it is fundamental to understand what can be the impact if some users have more visibility than others. In order to do so, we numerically studied the results of a preferential attachment in-degree-based meeting process, which mimics the idea that popular users might get promoted (in terms of visibility) by the platform’s recommendation system. Compared with the preferential attachment model^{27}, though, here we only alter the probability of being found, but the connection will still depend on the meritocratic principle of eq. (3). In Fig. 6, we report the results of a mixed scenario in which the potential followee is chosen with 50% probability from the uniform distribution, and with the remaining 50% probability from an in-degree based distribution. We refer to Supplementary Note 2 (see in particular Supplementary Fig. 1) for more details on the numerical comparison between uniform distribution, mixed preferential attachment, and pure preferential attachment. Compared to the uniform distribution scenario in Figs. 3 and 5, the introduction of a mixed preferential attachment-based meeting process slightly increases the variance of the in-degree probability distribution of each agent. In this scenario, it becomes possible that some low-quality agents get an initial (i.e., in the early stage of the network formation process) advantage (purely by chance), which gets reinforced by the (mixed) preferential attachment mechanism. Yet, this effect quickly fades away, because potential followers still undergo the quality threshold rule of eq. (3). On the other hand, the (mixed) preferential attachment may penalize some other agents, which receive fewer followers than what they would expect with the uniform distribution process. It is remarkable that, even after introducing the mixed preferential attachment process, the correlation between quality and followers persists (on average): the higher the quality, the higher the average number of followers. Even more importantly, Zipf’s relation is robust under mixed preferential attachment-based meeting processes (e.g., recommendation systems). Similarly, on the right column of Fig. 6, the average in-degree probability distribution function is strikingly similar to the one obtained with the uniform distribution scenario.

### Out-degree distribution

Similarly to the in-degree, we can study the statistical distribution of the nodes’ out-degree at equilibrium. First, notice that, contrary to the in-degree, the out-degree distribution does not depend on the rank, but is uniform for all the nodes in the network. In fact, according to the dynamics, each node creates connections to increasing quality nodes, until reaching its own equilibrium. Let ({d}_{N}^{{{{{{{{rm{out}}}}}}}},star }) be the random variable describing the out-degree (at equilibrium) of a general node *i* in a network of *N* agents. First, note that the probability of node *i* > 1 finding node 1 (or node 1 finding node 2) at her first choice is simply:

$${mathsf{P}}left[{d}_{N}^{{{{{{{{rm{out}}}}}}}},star }=1right]=frac{1}{N-1}.$$

Conversely, the probability of having maximum out-degree corresponds to the probability of meeting all the other nodes in increasing quality ordering, i.e.,

$${mathsf{P}}left[{d}_{N}^{{{{{{{{rm{out}}}}}}}},star }=N-1right]=frac{1}{(N-1)!}.$$

Obviously, a node cannot have an out-degree larger than *N*−1 in a network of *N* agents, thus ({mathsf{P}}left[{d}_{N}^{{{{{{{{rm{out}}}}}}}},star }=dright]=0), for all *d* ≥ *N*. Similarly, ({mathsf{P}}left[{d}_{N}^{{{{{{{{rm{out}}}}}}}},star }=dright]=0), for all *d* < 1, every node must have out-degree of at least 1. Then, we denote as *C*_{1} the random variable describing the first meeting of *i*. *C*_{1} can take any value (jin left{1,ldots ,i-1,i+1,Nright}), with uniform probability 1/(*N* − 1). Then, the probability ({mathsf{P}}left[{d}_{N}^{{{{{{{{rm{out}}}}}}}},star }=dright]), for *N* ≥ 2 and 1 ≤ *d* ≤ *N* − 1, can be described recursively as follows:

$${mathsf{P}}left[{d}_{N}^{{{{{{{{rm{out}}}}}}}},star }=dright]= mathop{sum }limits_{j=1}^{i-1}{mathsf{P}}[{C}_{1}=j]{mathsf{P}}left[{d}_{j}^{{{{{{{{rm{out}}}}}}}},star }=d-1right] \ +mathop{sum }limits_{j=i+1}^{N}{mathsf{P}}[{C}_{1}=j]{mathsf{P}}left[{d}_{j-1}^{{{{{{{{rm{out}}}}}}}},star }=d-1right] \ = frac{1}{N-1}mathop{sum }limits_{j=1}^{N-1}{mathsf{P}}left[{d}_{j}^{{{{{{{{rm{out}}}}}}}},star }=d-1right].$$

(9)

In other words, ({mathsf{P}}left[{d}_{N}^{{{{{{{{rm{out}}}}}}}},star }=dright]) is equal to the sum of the probability of agent *i* choosing (jin left{1,ldots ,i-1,i+1,ldots ,Nright}) as first choice times the probability of having out-degree *d* − 1 in a network with *j* remaining nodes (the node itself and the *j* − 1 nodes to which she can still connect to).

Thanks to eq. (9), we can describe the out-degree probability functions by means of recursion on the network size *N*. The result for different values of *N* is pictured in Fig. 7a. The out-degree distribution exhibits non-monotonic properties (first increasing, then decreasing) and quickly vanishing tail: extremely large values are particularly rare. Even though the theoretical out-degree distribution cannot be directly associated with any known distribution, it is very close to a gamma distribution (or to a Poisson distribution), as shown in Supplementary Fig. 2 (see also the discussion in Supplementary Note 2). Thanks to its non-monotonic nature and its fast decay, there are also some similarities with the log-normal distributions (whose decrease is more than linear in the log–log plot). On the other hand, it differs more significantly from a power-law distribution. Despite being observed in the in-degree distributions of many real-world social networks, power-laws do not always properly describe their out-degree distributions. Empirical evidence on the asymmetry between the in- and out-degree distributions is found on Twitter^{46} and YouTube^{47}. In particular, several empirical studies on the out-degree distributions of online social networks highlight a much faster decrease in the probability of seeing very high out-degree nodes^{48,49,50}. Albeit some influencers on YouTube, Instagram, or Twitch, might have one million followers, it is hard to imagine that some users follow several thousand or even a million of other users, since every single action of the following someone involves a decision-making process and at least a click on the Follow button. As a result, the out-degree distributions of these platforms feature a clear cut-off (sometimes even artificially imposed to prevent fake-user accounts) on the order of roughly a thousand connections (for network samples, this clearly reduces even further), instead of a heavy tail as featured by the power-law.

In our Twitch data sets, we find evidence of this fast drop in the frequency of the high out-degree nodes. As shown in Fig. 8, the out-degree probability density function is concentrated in the range *d* ∈ [1, 10], with the 99-percentile being at *d* = 15 (*d* = 19) in the chess (poker) data set. As a comparison, our theoretical distribution would predict the 99-percentile to be at *d* ~ 10, while a power-law of classical exponent −2 predicts it at *d* = 100. The maximum out-degrees are found to be, respectively, *d* = 151 and *d* = 142, which are higher than the maximum out-degree of the theoretical distribution *d* ~ 20, but at least three orders of magnitude smaller than the maximum predicted by a power-law of exponent −2, i.e., from the maximum in-degree of the empirical distribution. At first glance, the theoretical and empirical results in Figs. 7a and 8 show some differences. In particular, compared with the data the empirical distribution exhibits a larger frequency of low out-degree nodes. The difference can be due to the fact that the network is still in the formation process, and many users only joined it recently. Therefore these most recent users have just started their search and there is an abundance of low out-degree nodes. We conjecture that another possible reason may be the recommendation systems behind the Twitch platform (we explored this effect by studying the numerical out-degree distribution under preferential attachment-based meeting process, see Supplementary Fig. 3). To better understand this inconsistency, in Supplementary Fig. 15, we studied the stacked frequencies of the out-degree of the followers of the 15 most followed nodes in each data set.

From the recursive description of the probability density function in eq. (9), it is possible to compute the expected nodes’ out-degree, as stated in the following theorem.

### Theorem 3

(Out-degree distribution) At equilibrium, the nodes’ expected out-degree ({mathbb{E}}left[{d}_{N}^{{{{{{{{rm{out}}}}}}}},star }right]) in a network of *N* ≥ 2 agents equals the (*N* − 1)-th harmonic number:

$${mathbb{E}}left[{d}_{N}^{{{{{{{{rm{out}}}}}}}},star }right]=mathop{sum }limits_{k=1}^{N-1}frac{1}{k}.$$

According to the theorem, the expected out-degrees as a function of *N* are given by the harmonic sequence: 1, 3/2, 11/6, 25/12, …. The proof provided in Supplementary Note 1 is built on the derivation of the out-degree probability distribution, however, the result can be intuitively derived from the following observations: the expected out-degree is uniform across all the nodes, and the sum of the expected out-degree equals the sum of the expected in-degree of all the nodes. From the result on the in-degree distribution (see eq. (8)), the sum of the expected in-degree is equal to (*N* − 1) + *N*/2 + *N*/3 + … + 1. Thus, the expected out-degree is equal to (*N* − 1)/*N* + *N*/(2*N*) + *N*/(3*N*) + … + 1/*N*, which in fact corresponds to the *N* − 1-th harmonic number. As shown in Fig. 7b, the expected out-degree has a similar growth compared with the base-2 logarithm of the network size *N*.

### Diameter and clustering

Since each node is connected, on average, to roughly ({{{{{{{mathrm{log}}}}}}},}_{2}(N)) other nodes, intuitively, the network should feature the small-world property. In Fig. 9a we provide some numerical results on the average network diameter and average nodes’ distance for different network sizes. According to the results, also the network diameter has a growth rate similar to the base-2 logarithm of the network size. Moreover, the average node’s distance has even slower growth, with a value of 5 for networks of 10^{4} nodes. These results match the empirical observations on the small-world property of real-world networks, according to which the distance between two randomly chosen nodes grows proportionally to the logarithm of network size^{51}. Moreover, they are quantitatively similar to previous findings on, e.g., an Instagram network sample^{52} (of size equal to 44 thousand nodes) where the diameter is found to be 11, and the average distance 3.16.

As such, our theoretical model already captures some of the most widely observed features of real-world networks, i.e., the in-degree scaling and the small-world properties. A third common feature is the high clustering coefficient. Even though it has been shown that the average value on directed social networks such as Instagram or Twitter, is smaller than on other undirected social networks, e.g., Facebook, it remains on the order of 10%^{22}. To compute the clustering coefficient for directed networks, we adopted the class out in the taxonomy of^{53} in which a triad around node *i* is closed when *i* follows two distinct agents *j* and *k*, and there exists a tie from *j* to *k* or from *k* to *j*. Accordingly, the clustering coefficient of *i* reads as

$${c}_{i}=frac{{sum }_{jne i}{sum }_{kne i,j}{a}_{ij}{a}_{ik}{a}_{jk}}{{sum }_{jne i}{sum }_{kne i,j}{a}_{ij}{a}_{ik}}.$$

The numerical results pictured in Fig. 9b indicate that the average clustering coefficient is monotonically decreasing in the network size *N*, yet it remains above 10% on networks composed of 10^{6} nodes. Only a marginal increase is observed when introducing the preferential attachment on the meeting process (see the comparison in Supplementary Fig. 4).

### Audience overlap

The previous analysis on the clustering coefficient studied the probability that two user’s followees are friends with each other. Similarly, it is interesting to analyze the probability that two (highly followed) users are followed by the same third user. In other words, we aim at studying the similarity between the followers’ sets of the different agents. The similarity between agents reveals the existence of common interest, and it can be used for link prediction^{54} or to improve the recommendation systems^{55}. Inspired by the Jaccard index introduced for species similarities^{56}, and already used to measure followers’ overlap^{57}, we propose the following real-valued matrix *O* to measure the overlap between the audiences of two agents:

$$O(i,j):= frac{left|{{{{{{{{mathcal{F}}}}}}}}}_{i}^{{{{{{{{rm{in}}}}}}}}}cap {{{{{{{{mathcal{F}}}}}}}}}_{j}^{{{{{{{{rm{in}}}}}}}}}right|}{left|{{{{{{{{mathcal{F}}}}}}}}}_{i}^{{{{{{{{rm{in}}}}}}}}}right|}in left[0,1right],$$

(10)

if (left|{{{{{{{{mathcal{F}}}}}}}}}_{i}^{{{{{{{{rm{in}}}}}}}}}right| , > , 0), and 0 otherwise, where ({{{{{{{{mathcal{F}}}}}}}}}_{i}^{{{{{{{{rm{in}}}}}}}}}) denotes the set of followers of agent *i*. In other words, this coefficient measures the number of common followers of *i* and *j*, normalized by the number of followers of *i*. Note that, when agent *i* is lower in the ranking list with respect to agent *j*, i.e., *i* > *j*, we typically have (| {{{{{{{{mathcal{F}}}}}}}}}_{i}^{{{{{{{{rm{in}}}}}}}}}| < | {{{{{{{{mathcal{F}}}}}}}}}_{j}^{{{{{{{{rm{in}}}}}}}}}|), and eq. (10) corresponds to the Szymkiewicz-Simpson coefficient, also known as the overlap coefficient^{58}, where the denominator is replaced by (min{| {{{{{{{{mathcal{F}}}}}}}}}_{i}^{{{{{{{{rm{in}}}}}}}}}| ,| {{{{{{{{mathcal{F}}}}}}}}}_{j}^{{{{{{{{rm{in}}}}}}}}}| }). Compared with it and to the Jaccard index^{56}, whose denominator is (|{{{{{{{{mathcal{F}}}}}}}}}_{i}^{{{{{{{{rm{in}}}}}}}}}cup {{{{{{{{mathcal{F}}}}}}}}}_{j}^{{{{{{{{rm{in}}}}}}}}}|), our measure leads to a non-symmetric matrix.

In Fig. 10a, we plot the numerical results on the overlap index for networks of 10^{5} nodes, upon reaching the equilibrium. The results are averaged over 10 simulations. According to the previous results, all the nodes should follow node 1 at equilibrium, thus any follower of a node *i* should also be a follower of node 1 and the overlap in the first column is simply (*O*(*i*, 1) = 1). Moreover, consistent with our result in eq. (8), we observe the Zipf’s sequence 1, 1/2, 1/3, …, in the first row (*O*(1, *j*) = 1/*j*). Perhaps surprisingly, a similar pattern appears in all the rows, upon averaging on a sufficiently large number of simulations. We emphasize that the results are independent of the number of nodes *N*. An intuition for these observations is provided in Supplementary Fig. 5.

To further validate our theoretical model, we perform the same analysis on the overlap among the 15 most followed nodes of our Twitch data sets, reported in Fig. 10b (for the chess data set) and in Supplementary Fig. 14 (for the poker data set). Our numerical results are qualitatively well aligned with the real-world data with respect to the horizontal decrease of the overlap index. Yet, the Twitch data sets show that the overlap index is not always uniform across the different rows. For instance, in the chess data set, low-ranking nodes exhibit a slightly higher overlap index with respect to the numerical results. Conversely, high-ranking nodes show the opposite behavior. This phenomenon might be related to the statistical dependency of individuals’ followee sets on their out-degrees, as discussed in Supplementary Note 4.