March 11, 2019

Network of Thrones: Recapping Season 7 and Predicting Season 8

In about a month, HBO will begin the final season of Game of Thrones. The show has reached a cultural critical mass and its viewership has increased over recent seasons. Its branding has seemed limitless, extending to beers, whiskeys, and makeup, just to name a few. The coming season has become one of the most heavily anticipated final seasons in recent memory, if not the most.

With the season premiere coming in mid-April, I thought now might be a good time to recap Season 7 and generate some predictions for what will happen in the final season. Given that this is a data blog, and I’m a data scientist, it seems reasonable to use statistical methods to do these things. Specifically, I want to use network analysis to draw insights from Season 7 that I can use to recap the season and generate predictions for Season 8. There is a tradition of using network analysis to study plot points from Game of Thrones, but I want to move beyond tracking previous plot points using network analysis to predicting plot points using network analysis.

The data used was generously collected by Andrew Beveridge, an Associate Professor at Macalester College. Specifically, the data details the number of interactions between any two characters. Here, an interaction is defined as one of the following:

Character A speaks directly after Character B.
Character A speaks about Character B.
Character C speaks about Character A and Character B.
Character A and Character B are mentioned in the same stage direction.
Character A and Character B appear in a scene together.

This produces a “weighted” and “undirected” network, wherein two characters interact a number of times (weighted) but we are agnostic to who initiated the interaction (undirected). I will try to explain as much network parlance as I can as we go along, but for a canonical primer, see Wasserman and Faust.
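To make the "weighted" and "undirected" idea concrete, here is a minimal Python sketch (not the code used for this post). The interaction records and the `build_network` helper are hypothetical; the real data comes from Andrew Beveridge's collection.

```python
from collections import defaultdict

# Hypothetical interaction records: (character_a, character_b, count).
interactions = [
    ("Jon", "Daenerys", 3),
    ("Daenerys", "Jon", 2),  # same pair, listed in the reverse order
    ("Jon", "Tyrion", 4),
]


def build_network(records):
    """Collapse records into an undirected, weighted edge dictionary."""
    weights = defaultdict(int)
    for a, b, count in records:
        if a == b:
            continue  # ignore self-interactions
        edge = tuple(sorted((a, b)))  # order-free key makes the edge undirected
        weights[edge] += count        # summed counts make the edge weighted
    return dict(weights)


network = build_network(interactions)
print(network)
```

Sorting each pair before using it as a key is what makes the network agnostic to who initiated the interaction.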

When generating predictions, I will pay particularly close attention to plot points, potential interactions, and who may ultimately become the ruler of Westeros.

It is at this point I should warn you: If you haven’t yet seen Season 7 and care at all about having it spoiled for you, stop reading. Right now. Seriously. Go finish Season 7.

To tell its story, Game of Thrones bounces back and forth between different locations. For example, an episode might start at The Wall, only to bounce between Dorne and King’s Landing. To understand these different plot narratives and how they may interact in the future, I use community detection.

Community detection typically seeks to group actors into a finite number of groups based upon pockets of connectivity. Think about your office. Perhaps the data scientists and engineers interact with one another fairly closely, forming a tight-knit community. Perhaps the sales team interacts with one another fairly closely, forming a tight-knit community. While these communities may interact with one another, more often than not, any member is most likely to interact with someone in their own community. Within the context of Game of Thrones, it makes sense that Tyrion Lannister and Jon Snow would interact more frequently than Jon Snow and Cersei Lannister would; they spent most of the season in the same location.

Results from the Louvain community detection algorithm. This algorithm is specifically designed to work efficiently on large, weighted networks.
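The Louvain algorithm searches for the grouping that maximizes a score called modularity. The full search is too long to show here, so the following Python sketch only computes weighted modularity for a given grouping. The toy network, the character groupings, and the `modularity` helper are illustrative assumptions, not the Season 7 data or the code behind the figure.

```python
from collections import defaultdict


def modularity(edges, membership):
    """Weighted modularity Q of a partition.

    edges: {(u, v): weight} for an undirected network, each pair listed once.
    membership: {node: community label}.
    """
    total_weight = sum(edges.values())
    internal = defaultdict(float)  # edge weight falling inside each community
    strength = defaultdict(float)  # summed weighted degree of each community
    for (u, v), w in edges.items():
        strength[membership[u]] += w
        strength[membership[v]] += w
        if membership[u] == membership[v]:
            internal[membership[u]] += w
    return sum(
        internal[c] / total_weight - (strength[c] / (2 * total_weight)) ** 2
        for c in strength
    )


# Hypothetical toy network: two tight triangles joined by a single bridge.
toy_edges = {
    ("Cersei", "Jaime"): 1, ("Cersei", "Bronn"): 1, ("Jaime", "Bronn"): 1,
    ("Sansa", "Arya"): 1, ("Sansa", "Bran"): 1, ("Arya", "Bran"): 1,
    ("Bronn", "Sansa"): 1,
}
split = {"Cersei": "south", "Jaime": "south", "Bronn": "south",
         "Sansa": "north", "Arya": "north", "Bran": "north"}
lumped = {name: "all" for name in split}

q_split = modularity(toy_edges, split)
q_lumped = modularity(toy_edges, lumped)
print(q_split, q_lumped)
```

The two-community split scores higher than lumping everyone together, which is the kind of comparison a modularity-based algorithm makes repeatedly as it merges nodes.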

The following visualization indicates that there are four communities within Season 7 of Game of Thrones. This certainly makes sense. Furthest left, colored in blue, we have a community that consists of characters such as Cersei Lannister, Jaime Lannister, Ser Bronn of the Blackwater, Ser Gregor Clegane (the Mountain), and the Greyjoys. For much of the season, characters within this cluster work to maintain a firm grasp on the Lannister-led Westeros, and they interact frequently toward this end.

However, as the season progresses and Jon Snow convinces Queen Daenerys Stormborn of the House Targaryen (… the First of Her Name, Queen of the Andals, the Rhoynar and the First Men, Lady of the Seven Kingdoms and Protector of the Realm, Lady of Dragonstone, Queen of Meereen, Khaleesi of the Great Grass Sea, the Unburnt, Breaker of Chains and Mother of Dragons) that Winter is indeed coming, they reach out to the Lannisters in King’s Landing to call a truce and turn their attention to the North. As such, the blue and green communities overlap slightly. This green community, consisting mostly of Jon Snow, Queen Daenerys Stormborn (I won’t repeat the title again, for our collective sake), Tyrion, and everyone’s favorite Onion Knight Ser Davos, spends most of the season at Dragonstone.

I also find a community reflecting the interactions within Winterfell, wherein Sansa is attempting to field an army capable of defending the North. This community, colored in yellow, consists largely of Sansa, Littlefinger, Arya, Bran, Brienne of Tarth, and everyone’s favorite squire Podrick. There is certainly some interaction with the Dragonstone community, for example, Jon and Ser Davos spent a good amount of time in Winterfell during Season 7. Nevertheless, the bulk of interactions between members of the Winterfell community occur with other individuals within the Winterfell community.

The final community detected, colored in red, includes Ser Jorah Mormont, Tormund Giantsbane, the Hound, the Night King, and White Walkers. This alludes to a large plot narrative wherein a team is sent north of the Wall to capture living White Walkers as proof for Daenerys and Cersei. You might remember this concluding with a team of White Walkers pulling Viserion (one of Daenerys’ dragons) out of a lake.

This exercise has been good for recapping Season 7, but what can it teach us about Season 8? Quite a bit! First and foremost, these communities, while distinct, are fairly close together. This indicates something that most would expect: Westeros must unify if it is to confront the Night King. There are no totally disconnected components or groups of characters, and as time progressed, three of these communities started to converge and interact with one another with greater frequency. One thing that does become apparent, though, is that some characters are far less likely to converge. For example, it may not make sense for those in Dorne, such as Tyene or Ellaria Sand, to interact with those likely to stay in Winterfell, such as Sansa.

To better understand who is likely to interact, or not interact, in Season 8, I use a latent space model to estimate the “affinity” between any two characters. Specifically, I use an Additive and Multiplicative Effects (AME) model introduced by Peter Hoff and implemented in R with the “amen” package. This model allows users to examine which actors within a network are more or less likely to interact (their “affinity”) based upon common connections and a variety of complex interdependencies within the network. For an excellent primer, I recommend Minhas, Hoff, and Ward.

Heatmap of interactions, lighter values indicate higher positive affinity, darker values indicate lower or negative affinity. I recommend opening this in a separate tab for closer examination.

The preceding visualization presents a heatmap for these affinity values, wherein darker values reflect negative or lower affinity and lighter values reflect greater positive affinity. It should be noted that, true to form, these color palettes were produced by Alejandro Rico and made available in his R package “gameofthrones”. I went with the White Walkers’ palette because while winter will hopefully be coming to an end (seriously, it’s March…), Winter is coming.

Overall, it appears that most characters don’t have a great deal of affinity for one another (but not necessarily negative affinity). The highest levels of affinity appear to be associated with particular characters, as shown by the light blue horizontal and vertical stripes. These characters include Tyrion, Sansa, Jon, Daenerys, and Cersei. This makes sense: as the main characters, they are the most likely to form a relationship with any other character.

However, this ultimately tells us little about what we might expect from Season 8. To generate interesting predictions, I look relationally at the pairs of characters who do not connect in Season 7 but have the highest latent affinity, or probability of forming a relationship. The ten largest affinity scores among those with no Season 7 relationships are presented in the preceding plot. The first relationship is perhaps one of the most heavily anticipated meetings in the series: Sansa meeting her brother Jon’s new queen, lover, and in the largest turn of dramatic irony in recent memory, aunt, Daenerys. This indicates a fairly large prediction that has actually already been verified! We’re already off to a good start!
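The ranking step described above can be sketched in a few lines of Python. This is only an illustration: the affinity scores below are made up, the `top_unobserved_pairs` helper is hypothetical, and in the actual analysis the scores come from the fitted AME model in R.

```python
def top_unobserved_pairs(affinity, observed, k=10):
    """Return the k highest-affinity pairs with no observed interaction."""
    observed_keys = {frozenset(pair) for pair in observed}
    candidates = [
        (score, tuple(sorted(pair)))
        for pair, score in affinity.items()
        if frozenset(pair) not in observed_keys
    ]
    candidates.sort(key=lambda item: item[0], reverse=True)
    return [(pair, score) for score, pair in candidates[:k]]


# Hypothetical affinity scores, not output from the fitted model.
affinity = {
    ("Jon", "Daenerys"): 3.0,
    ("Sansa", "Daenerys"): 2.1,
    ("Jon", "Yara"): 1.7,
    ("Arya", "Tyrion"): 1.2,
}
# Pairs that already interacted in the season are excluded from the ranking.
observed = [("Daenerys", "Jon")]

predictions = top_unobserved_pairs(affinity, observed, k=2)
print(predictions)
```

Using `frozenset` for the lookup means a pair is treated as observed regardless of the order in which the two names are listed.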

Many of the remaining relations are between Jon and several major characters that will likely join Jon’s command in the battle against the White Walkers. After omitting those who die, this includes Ellaria and Tyene Sand and Yara Greyjoy. The model also predicts that Arya Stark and Daenerys will meet, which makes sense because, like Sansa, Arya is the sister of Jon, Daenerys’ new general and lover. Additionally, the model predicts that Arya and Tyrion Lannister will eventually meet. Given that Arya’s sister Sansa is Tyrion’s amicable ex, this seems likely.

Finally, I want to end with the projections that most readers may care about: Who will lead Westeros once the dust settles? Even Vegas cares about this question, putting Bran Stark as the favorite at +125. You can even put money on Tormund at +15000. Given the political nature of Game of Thrones, perhaps we can use social network analysis to produce measures of political power. Within the study of networks, centrality is a concept frequently used to characterize an actor’s position within the broader network. To capture the influence of a character, we might want to use measures like eigenvector, betweenness, or degree centrality.

Each measure in the preceding plot presents a different “facet” of the social power and influence one might wield in Westeros. Degree centrality measures a character’s popularity: how frequently they interact with other characters. Weighted degree specifically captures the volume of interactions with all other characters, while unweighted degree simply measures the number of individuals a character interacts with. For example, Samwell Tarly may have a great many interactions, but they’re largely with the same people (Jon, Gilly, etc.).
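The difference between the two degree measures is easy to see in a small Python sketch. The edge weights here are invented to mimic the Samwell Tarly example, and `degree_centrality` is a hypothetical helper, not the function used to build the plot.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Return (unweighted, weighted) degree for an undirected edge dictionary."""
    unweighted = defaultdict(int)
    weighted = defaultdict(int)
    for (u, v), w in edges.items():
        for node in (u, v):
            unweighted[node] += 1  # one more distinct partner
            weighted[node] += w    # total volume of interactions
    return dict(unweighted), dict(weighted)


# Hypothetical weights: Sam talks a lot, but only to two people.
edges = {
    ("Sam", "Jon"): 20,
    ("Sam", "Gilly"): 30,
    ("Tyrion", "Jon"): 5,
    ("Tyrion", "Daenerys"): 5,
    ("Tyrion", "Cersei"): 5,
}

unweighted, weighted = degree_centrality(edges)
print(unweighted["Sam"], weighted["Sam"])
print(unweighted["Tyrion"], weighted["Tyrion"])
```

In this toy data Sam has the larger weighted degree while Tyrion has the larger unweighted degree, which is why the two measures can rank characters differently.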

Betweenness centrality, on the other hand, captures how frequently a character might be between other individuals in the network. A character with high betweenness is a bridge character, linking many other characters together indirectly.
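For readers curious how "being between others" is counted, here is a Python sketch of Brandes' algorithm for betweenness. As a simplifying assumption it ignores edge weights and treats every tie as equal, which may differ from the weighted version behind the plot; the four-character chain is hypothetical.

```python
from collections import deque


def betweenness(adjacency):
    """Unnormalized betweenness for an undirected, unweighted graph (Brandes)."""
    score = {node: 0.0 for node in adjacency}
    for source in adjacency:
        # Breadth-first search that counts shortest paths from the source.
        order = []
        parents = {node: [] for node in adjacency}
        paths = {node: 0 for node in adjacency}
        dist = {node: -1 for node in adjacency}
        paths[source] = 1
        dist[source] = 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adjacency[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    paths[w] += paths[v]
                    parents[w].append(v)
        # Walk back from the farthest nodes, accumulating dependencies.
        dependency = {node: 0.0 for node in adjacency}
        for w in reversed(order):
            for v in parents[w]:
                dependency[v] += paths[v] / paths[w] * (1 + dependency[w])
            if w != source:
                score[w] += dependency[w]
    # Each undirected pair was counted once from each endpoint.
    return {node: value / 2 for node, value in score.items()}


# Hypothetical chain: Sansa - Jon - Daenerys - Tyrion.
adjacency = {
    "Sansa": ["Jon"],
    "Jon": ["Sansa", "Daenerys"],
    "Daenerys": ["Jon", "Tyrion"],
    "Tyrion": ["Daenerys"],
}
bridge_scores = betweenness(adjacency)
print(bridge_scores)
```

In the chain, Jon and Daenerys each sit on the shortest path for two pairs of other characters, while the characters at the ends bridge no one.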

Finally, eigenvector centrality measures the connectedness of a character to other highly connected characters. A character who scores highly on this measure has influence, and has it with individuals who themselves have a lot of influence.
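Eigenvector centrality can be approximated by repeatedly letting each character's score become the weighted sum of their neighbors' scores. The Python sketch below does this by power iteration. Adding each node's own score at every step is my own choice to keep the iteration stable, and the toy edges are hypothetical.

```python
def eigenvector_centrality(edges, iterations=200, tol=1e-10):
    """Power-iteration estimate of eigenvector centrality, scaled so the max is 1."""
    nodes = sorted({node for pair in edges for node in pair})
    neighbors = {node: [] for node in nodes}
    for (u, v), w in edges.items():
        neighbors[u].append((v, w))
        neighbors[v].append((u, w))
    score = {node: 1.0 for node in nodes}
    for _ in range(iterations):
        # Keeping the node's own score (a shift by the identity matrix)
        # avoids oscillation on bipartite networks.
        updated = {
            n: score[n] + sum(w * score[m] for m, w in neighbors[n])
            for n in nodes
        }
        norm = max(updated.values())
        updated = {n: value / norm for n, value in updated.items()}
        change = max(abs(updated[n] - score[n]) for n in nodes)
        score = updated
        if change < tol:
            break
    return score


# Hypothetical toy network: a triangle plus one peripheral character.
toy = {
    ("Jon", "Daenerys"): 1,
    ("Jon", "Tyrion"): 1,
    ("Daenerys", "Tyrion"): 1,
    ("Tyrion", "Cersei"): 1,
}
influence = eigenvector_centrality(toy)
print(influence)
```

Cersei has only one tie here, but because that tie is to the best-connected character, her score is not negligible, which is the intuition behind the measure.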

It is unclear whether one of these should produce a firm set of predictions over others. As such, they must be considered in combination. Regardless of measure, the same names frequently appear: Jon, Daenerys, Cersei, Tyrion. Jon and Daenerys are near the top in all four measures while Cersei and Tyrion are included in the top five in three of four measures. As such, it seems that safe money would be, in this order: Jon, Daenerys, Tyrion, and Cersei.

For the record, these are my predictions:

There will be a battle in which individuals from all communities will fight. It is unclear whether we should expect them to fight alongside one another or against one another.
In a major plot point and in a significant way:
- Daenerys and Sansa will interact.
- Ellaria and Jon will interact.
- Jon and Yara will interact.
- Daenerys and Arya will interact.
- Jon and Tyene will interact.
- Arya and Tyrion will interact.
One of the following will rule Westeros: Jon, Daenerys, Tyrion, or Cersei.

I will try my hardest to update these predictions as episodes come along!

Benjamin Campbell

February 25, 2019

What is Twitter saying about the injury of Duke's Zion Williamson?

Benjamin Campbell

In last week’s NCAA basketball game of the week, the UNC Tar Heels (#5) faced off against the Duke Blue Devils (#1). The game had all the makings of a classic: two top five contenders, a widely known rivalry, and top NBA prospects. Tickets to the game reached Super Bowl prices, fetching at least $2,500. While UNC-Duke games typically receive significant national attention, it was Duke’s freshman sensation Zion Williamson who attracted big names like President Barack Obama and recent Oscar winner Spike Lee (finally). As a true freshman, Zion has taken the sports world by storm, having drawn attention for years for his jaw-dropping dunks and tremendous NBA potential.

Unfortunately, not even a full minute into this hyped game, Zion’s Nike PG 2.5 (Oklahoma City Thunder forward Paul George’s signature shoe) exploded out from under him, leaving him to fall awkwardly and sustain a Grade 1 knee sprain. This injury sent shockwaves through sports reporting and led Nike to lose $1.1 billion in stock value. Beyond the quality of Nike shoes, Zion’s injury raised many important questions about college athletics, the NBA’s “one-and-done” rule, and the shamateurism of collegiate athletics.

One question that piqued my interest was how the Twitterverse has reacted to Zion’s injury. As a daily listener of sports radio (especially The Dan Le Batard Show), I became curious about how we could define the conversation surrounding this momentous event.

To study this conversation, I downloaded approximately 1,000 Tweets that came from Verified Twitter users (a classification typically reserved for journalists, celebrities, or other people of acclaim) and included the words Zion, Williamson, or Duke. For this project I used Mike Kearney’s “rtweet” package, which I highly recommend. These Tweets came between February 12th and 22nd, straddling the game, which occurred on February 20th. The breakdown of Tweets is as follows:

Hourly distribution of relevant Tweets from Verified users. Dashed line is drawn at the time of the injury.

As you can see, the number of Tweets dramatically increases following Zion’s injury. Naturally, the question of who most frequently contributed to the conversation interested me. One might expect the large sports news reporting firms to dominate the conversation, but perhaps some more regionally-based reporters might contribute significantly. The following graph plots the 10 most active users in this data:
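Counting the most active accounts is a simple tally. The following Python sketch shows the idea; the tweet records are invented placeholders, and `most_active` is a hypothetical helper rather than the rtweet-based code used for the graph.

```python
from collections import Counter

# Hypothetical records; the real data has roughly 1,000 Tweets.
tweets = [
    {"user": "SInow", "text": "placeholder text 1"},
    {"user": "dukebasketball", "text": "placeholder text 2"},
    {"user": "SInow", "text": "placeholder text 3"},
    {"user": "JeremyWoo", "text": "placeholder text 4"},
    {"user": "SInow", "text": "placeholder text 5"},
    {"user": "dukebasketball", "text": "placeholder text 6"},
]


def most_active(records, n=10):
    """Return the n users with the most Tweets as (user, count) pairs."""
    return Counter(record["user"] for record in records).most_common(n)


top_users = most_active(tweets, n=2)
print(top_users)
```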

Verified Users contributing the most to the conversation regarding Zion Williamson before and after his injury.

As expected, the four most active users are relatively large sports reporting agencies, such as Sports Illustrated (SInow) and their affiliated accounts for NCAA basketball (si_ncaabb) and NBA basketball (TheCrossover). Following that, there’s activity by Duke Basketball (dukebasketball). Perhaps unsurprisingly, but rarely considered, local news outlets appear to discuss Zion significantly as well, including the flagship radio station for UNC (WCHLChapelboro). The inclusion of Sports Illustrated’s NBA Draft contributor Jeremy Woo (JeremyWoo) on this list appears to indicate that many were considering the impact of the injury on the NBA draft.

However, how can we learn about the discussion with respect to Zion and his injury? In processing these tweets, I removed any tokenized portions of links, Zion, or Williamson. Using the principles of semantic coherence (again, not the most principled), I estimated a Latent Dirichlet Allocation model with 6 different topics. The results of this analysis are presented as follows:
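The cleaning step can be sketched in Python as follows. The regular expressions, the example Tweet, and the `tokenize` helper are my own assumptions about what "removing links, Zion, or Williamson" might look like; the original processing was done in R and may differ in its details.

```python
import re

URL_PATTERN = re.compile(r"https?://\S+")
# Search terms appear in nearly every Tweet, so they carry no topical signal.
DROPPED = {"zion", "williamson"}


def tokenize(text):
    """Lowercase, strip links, split into words, and drop the search terms."""
    text = URL_PATTERN.sub(" ", text.lower())
    tokens = re.findall(r"[a-z]+", text)
    return [token for token in tokens if token not in DROPPED]


# Hypothetical Tweet text with a made-up link.
example = "Zion Williamson injured after Nike shoe blows out https://t.co/abc123"
tokens = tokenize(example)
print(tokens)
```

Removing the link before splitting into words matters; otherwise fragments such as "https" and "co" would end up as tokens in the topic model.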

Labels are assigned according to commonalities in the words frequently appearing in each topic.

Overall, many of the topics discussed reflect narratives that I heard preceding, during, and following the UNC-Duke game. The first topic reflects a series of discussions wherein NBA players encouraged Zion to hang up his shoes and prepare for the draft and recognized the unethical treatment of student athletes. Many of the words associated with this topic include “nba”, “lebron”, “james”, and “players.” One of the largest voices during this time was LeBron James, who told Zion that he should do what he thought was best for him. Other voices included Utah Jazz guard Donovan Mitchell who said,

“Again let’s remember all the money that went into this game.... and these players get none of it.... and now Zion gets hurt... something has to change.”

Denver Nuggets’ Isaiah Thomas tweeted,

“Let these kids go straight out of HS!!! Too much on the line to be messing with college if you got a legit chance to turn pro. One injury can change somebody career, Zion sit yo ass down lol and we will be ready for you in the big boy league #LookingOutForThePlayers.”

These are NBA players, like LeBron James, who are reacting to Zion’s injury. As such, given these words, this topic seems relevant.

As mentioned, prices for this UNC-Duke game had skyrocketed. ESPN reported that minimum prices were around $2,500 and some tickets were sold for over $10,000. This was a dominant topic of conversation preceding the game and following the game when many expressed compassion for those spending so much without seeing Zion play more than a minute. This topic emerges from the model, where “ticket”, “tickets”, and “prices” frequently occur.

Another topic appears to focus on Zion’s teammates, their importance, and potentially how they may react to the injury. This is evidenced by the inclusion of words such as “rj” and “barrett” in this topic, referring to Duke’s RJ Barrett, another well-known NBA prospect. Certainly, Zion’s injury has shifted additional burden on Barrett to pick up the load, but even before the injury, Barrett was seen as an essential part of the team's success.

A fourth topic seems to focus on the injury in general, frequently focusing on aspects of the injury and its cause, including words like “shoe”, “nike”, “knee”, and “injury.” This, naturally, was a big part of the conversation around Zion.

The fifth topic seems to largely focus upon Nike and the exploding shoe as the most prevalent words were “nike” and “shoe”, with “blows” (as in blow-out) following closely behind. This was certainly a common topic of conversation, leading to discussion about shoe design and engineering.

Finally, the last topic seems to focus in on the influence of the injury on the game and whether he would return. Frequently appearing in this topic are “injury” and “game”, relating these words to how the injury influenced the game. Also appearing in this topic, but no others, is “return”, likely referring to the question of whether or not Zion would have returned to the game.

These topics seem to do a decent job summarizing the discussion surrounding Zion Williamson and his injury. But, how much did the conversation change surrounding his injury? Were certain topics more prevalent after the injury than before? To analyze this, I examine the volume of Tweets assigned to a particular topic over time. This is visualized as follows:
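One way to turn topic probabilities into volume over time is to assign each Tweet its most probable topic and count Tweets per hour. The Python sketch below illustrates this under assumptions: the timestamps, topic labels, and probabilities are invented, and the helper names are hypothetical.

```python
from collections import Counter
from datetime import datetime


def dominant_topic(probabilities):
    """Pick the topic label with the highest probability for one Tweet."""
    return max(probabilities, key=probabilities.get)


def prevalence_by_hour(records):
    """Count Tweets per (hour, dominant topic)."""
    counts = Counter()
    for record in records:
        hour = record["created_at"].replace(minute=0, second=0, microsecond=0)
        counts[(hour, dominant_topic(record["topics"]))] += 1
    return counts


# Hypothetical Tweets with made-up topic probabilities.
tweets = [
    {"created_at": datetime(2019, 2, 19, 14, 5),
     "topics": {"tickets": 0.7, "nike_shoe": 0.1, "teammates": 0.2}},
    {"created_at": datetime(2019, 2, 20, 22, 10),
     "topics": {"tickets": 0.1, "nike_shoe": 0.8, "teammates": 0.1}},
    {"created_at": datetime(2019, 2, 20, 22, 45),
     "topics": {"tickets": 0.2, "nike_shoe": 0.6, "teammates": 0.2}},
]

hourly = prevalence_by_hour(tweets)
print(hourly)
```

Assigning a single dominant topic discards the uncertainty in each Tweet's topic mixture; summing the probabilities per hour instead would be a reasonable alternative.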

Prevalence of topics over time. Vertical line drawn at time of injury.

Unsurprisingly, Tweets regarding ticket price and Zion’s Duke Teammates were fairly prevalent before the game. This makes sense, as the discussion preceding the game was largely about ticket price and the importance of RJ Barrett as a complement to Zion.

Naturally, there’s a large uptick in the Nike and shoes topic following the injury, which makes sense as it almost certainly would not have merited discussion prior. The same can be said for the injury and the game and return topics. Interestingly, there’s not a huge change in the NBA player response topic. While some are certainly discussing Zion prior to the game, there is an uptick in discussion following the injury.

At the time of this post, the discussion over whether Zion should return or should prepare for the draft is still raging. On Friday, NBA commissioner Adam Silver indicated an interest in removing the one-and-done rule, allowing would-be student athletes to join the NBA where they could immediately profit off their labor and likeness without spending a year “playing school.”

Benjamin Campbell

February 15, 2019

Learning from Environmental Interest Group Emails using LDA

Benjamin Campbell

A great deal is known about how interest groups lobby the government. Some of my earliest research examined the roles interest groups adopt when collectively lobbying the courts. Yet, we know relatively little about the other avenues through which interest groups influence the world. This problem motivated a team of us at Ohio State to examine the grassroots-level lobbying from interest groups.

This team, comprised of Prof. Jan Box-Steffensmeier, Seth Walker, Dalton Flanagan, and myself, examined a variety of questions related to grassroots lobbying. We began by developing a set of email addresses, and with each email address we registered for one interest group’s listserv. If an email address started getting emails from an interest group other than the one it was assigned to, we could identify that the interest groups had shared their listservs. These accounts collected emails between February 2015 and November 2016. While the primary motive of the project was to detect coalitions of interest groups that assisted one another in lobbying the public, a question that intrigued me was what interest groups discussed when lobbying the public. What could we learn from almost two years of emails sent from interest groups? Were these messages on brand? Were they highly responsive to certain agenda setting events?
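The listserv-sharing check described above boils down to comparing who an address signed up with against who actually emailed it. Here is a minimal Python sketch of that logic; the addresses, group names, and `detect_sharing` helper are hypothetical stand-ins for the project's real records.

```python
def detect_sharing(assignments, received):
    """Return (list owner, other sender) pairs that suggest a shared listserv.

    assignments: {email address: group it was registered with}
    received: iterable of (email address, sending group)
    """
    shared = set()
    for address, sender in received:
        owner = assignments[address]
        if sender != owner:
            # Mail from a group this address never signed up with.
            shared.add((owner, sender))
    return shared


# Hypothetical addresses and groups.
assignments = {
    "inbox01@example.com": "Group A",
    "inbox02@example.com": "Group B",
}
received = [
    ("inbox01@example.com", "Group A"),
    ("inbox01@example.com", "Group B"),  # unexpected sender
    ("inbox02@example.com", "Group B"),
]

shared_lists = detect_sharing(assignments, received)
print(shared_lists)
```

Because each address is registered with exactly one group, any other sender is evidence that the first group passed its list along.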

I sought to answer these questions using a set of 1,864 emails sent by 22 groups. These groups had previously lobbied the United States courts over environmental issues. To learn about the topics discussed, I fit a Latent Dirichlet Allocation (LDA) with 4 topics (while there are much more principled ways to select the number of topics, I wanted to keep a small number to help in interpretation).

From this analysis, we effectively learn that groups lobby over specialized issues related to their broader mission. While this is not surprising, what is surprising is that the timing of these emails does not seem to vary according to the topic. This question of timing could mean several things. Interest groups could all respond to the same agenda setting events, but tailor their discussion of the event to their message. Alternatively, they may not be particularly moved to respond to events and may all have common communication cycles.

The preceding figure shows the breakdown of topics uncovered through LDA. I have labeled these clusters according to semantic coherence and the common denominator for the most common words populating the topic. A set of emails appears to relate to issues of interest to business. This first topic includes words that largely refer to the Cato Institute, a well-known Libertarian think tank that often pushes back against environmental regulations.

Another topic appears that consists of words related to agriculture, such as trade, wheat, and soybean. This certainly makes sense as oftentimes agricultural-based interest groups find themselves involved in environmental cases (e.g. land use, irrigation and water use, pesticide/fertilizer usage).

The last two topics are consistent with what one might expect to be present in environmental interest groups’ emails. Some groups seem to send emails reflecting a topic covering conservation and habitat protection. Others seem to send emails reflecting a topic covering wildlife and wilderness protection. The former topic consists of many core words, such as birds, climate, and salmon. In addition, the National Audubon Society appears to be largely affiliated with this topic. The latter topic includes many words covering protection of national parks.

The prior image shows the topic assignment probabilities per email for a handful of interest groups. The takeaway is that, more often than not, these interest groups predominantly send emails associated with certain topics. These email-topic-group relationships are often fairly intuitive. For example, unsurprisingly, the American Seed Trade Association tends to send emails associated with the agricultural topic more often than any other topic. Additionally, while often sending emails associated with the wildlife and wilderness protection topic, the National Audubon Society and the Audubon Society of the Everglades tend more frequently to send emails associated with conservation and habitat protection.
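A group-level summary of these per-email probabilities can be computed by averaging within each group. The Python sketch below shows one way to do so; the probabilities are invented for illustration and `mean_topic_probabilities` is a hypothetical helper, not the code behind the image.

```python
from collections import defaultdict


def mean_topic_probabilities(emails):
    """Average per-email topic probabilities within each group.

    emails: iterable of (group name, {topic: probability}) pairs.
    """
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for group, probabilities in emails:
        counts[group] += 1
        for topic, value in probabilities.items():
            sums[group][topic] += value
    return {
        group: {topic: total / counts[group] for topic, total in topics.items()}
        for group, topics in sums.items()
    }


# Hypothetical per-email probabilities, not output from the fitted LDA.
emails = [
    ("American Seed Trade Association", {"agriculture": 0.8, "conservation": 0.2}),
    ("American Seed Trade Association", {"agriculture": 0.6, "conservation": 0.4}),
    ("National Audubon Society", {"agriculture": 0.1, "conservation": 0.9}),
]

group_profiles = mean_topic_probabilities(emails)
print(group_profiles)
```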

Counterintuitively, however, the timing of emails does not appear to vary significantly by topic. We might expect that certain groups would ramp up their messaging in response to particular agenda setting events. However, these groups have relatively uniform timing with respect to outgoing emails. This might indicate that groups, regardless of broader objective, still respond to national agenda setting events, perhaps by tailoring their message. Alternatively, it might mean that these groups are not responsive to these events and have common cycles for sending out emails, either for advocacy or solicitation.