Researchers have found a way to predict a tweet’s popularity — with an astounding 84% accuracy.
Here, per one algorithm, is the Platonic version of the news tweet:
If that seems a little dull for Twitter perfection … well, that’s the point. Steadiness — compelling news expressed in straightforward, not hyperbolic, language — is actually a component of a maximally effective tweet, according to the algorithm. And this particular tweet is also sent from a credible source, The New York Times, making it extra spreadable. It’s also about technology, the most popular, shareable category of news story. It’s engaging without being insistent. And it stars a company — Apple — with high name recognition.
The algorithm comes courtesy of a fascinating paper from UCLA and Hewlett-Packard’s HP Labs. Researchers Roja Bandari, Sitaram Asur and Bernardo Huberman teamed up to try to predict the popularity (which is to say, the spreadability) of news-based tweets. While previous work has relied on tweets’ early performance to predict their popularity over their remaining lifespan, Bandari et al. focused on predicting tweets’ popularity even before they become tweets in the first place.
The researchers have developed a tool that allows Twitterers — in particular, news organizations — to calibrate their tweets in advance of their posting, creating content that’s optimized for maximum attention and impact. That tool allows for the forecasting of a tweeted article’s popularity with a remarkable 84% accuracy.
To develop their algorithm, the researchers hypothesized that four factors would determine an article’s social success:
The news source that creates and publishes the article
The category of news the article belongs to (technology, health, sports)
Whether the language in the article is emotional or objective
Whether celebrities, famous brands or other notable institutions are mentioned
The team then used publicly available tools like Feedzilla’s API to gather a dataset of more than 40,000 tweeted news articles, collected during a nine-day span in August 2011. They used Feedzilla’s topic metadata to assign a category to each article (distinguishing among, say, tech stories, business stories, sports stories and the like). And they used Stanford’s Named Entity Recognizer to identify text representing a famous person or company name within the tweet (Lady Gaga, say) and to measure the prominence of that name relative to others. What resulted was a score for each of those 40,000 tweeted articles based on the team’s four factors.
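To make the four-factor scoring a little more concrete, here is a minimal Python sketch of what one scored article might look like. This is not the researchers' code: the `ArticleFeatures` record, the `max_entity_score` helper and the prominence lookup table are all hypothetical names and assumptions made for illustration, and the numbers are invented.

```python
from dataclasses import dataclass


@dataclass
class ArticleFeatures:
    """One article described by the four hypothesized factors (hypothetical layout)."""
    source: str          # who published the article, e.g. "nytimes.com"
    category: str        # news category, e.g. "Technology"
    subjectivity: float  # 0.0 = objective language, 1.0 = emotional (assumed scale)
    entity_score: float  # prominence of the best-known name mentioned


def max_entity_score(entities, prominence):
    """Return the highest prominence among the named entities found.

    `prominence` is an assumed lookup table of name -> score; names that
    are missing from the table count as 0.0.
    """
    return max((prominence.get(name, 0.0) for name in entities), default=0.0)


# Invented prominence values, for illustration only.
prominence = {"Apple": 0.8, "Lady Gaga": 0.9}

article = ArticleFeatures(
    source="nytimes.com",
    category="Technology",
    subjectivity=0.1,
    entity_score=max_entity_score(["Apple"], prominence),
)
```

Taking the single most prominent name is just one simple way to collapse several mentions into one number; the article does not say how the team combined them.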
The team then compared the number of retweets and shares each news article garnered over time. Their key metric was what they termed t-density, or the number of tweets earned by each news link.
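As a rough illustration of the metric, t-density can be read as tweets per link. The Python sketch below averages it within each news category. The function name, the input layout and the sample counts are assumptions made for this example, not details taken from the paper.

```python
from collections import defaultdict


def t_density_by_category(articles):
    """Average number of tweets per news link, grouped by category.

    `articles` is an iterable of (category, tweet_count) pairs, one pair
    per news link. This input layout is assumed for the sketch.
    """
    tweets = defaultdict(int)
    links = defaultdict(int)
    for category, tweet_count in articles:
        tweets[category] += tweet_count
        links[category] += 1
    return {category: tweets[category] / links[category] for category in tweets}


# Invented counts, for illustration only.
sample = [("Technology", 120), ("Technology", 80), ("Health", 30)]
densities = t_density_by_category(sample)
```

With these made-up numbers, Technology averages 100 tweets per link and Health averages 30, which is the kind of category-level comparison the findings describe.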
Here are their findings, broken down by news category:
As the graph shows, the category of news involved in the article certainly made a difference to a tweet’s popularity: Technology was the most tweetable news area, followed by Health and the ever-shareable Fun Stuff. Also impactful was the name recognition of the people and companies mentioned in the tweet itself. You can know with some certainty that a story about Lady Gaga will do well, and you can know with even more certainty that a tech story about Lady Gaga will do well. But what led most overwhelmingly, and most predictably, to sharing was the person or organization that shared the information in the first place; hence the @NYTimes origin of the tweet above. A “WHOA, GUYS, HERE’S HUGE NEWS ABOUT LADY GAGA” sent from @NYTimes means a lot more than the same declaration from me, or even from @LadyGaga herself. Brand, even and especially on the Internet, matters.
Furthermore, emotional language doesn’t seem to matter when it comes to predictable sharing. A tweet that calmly describes what you’ll get by clicking on a link — “Here is some news about Lady Gaga” — will have about the same attentional impact as a tweet that HYPERBOLICALLY SHOUTS IT. Even within the tumult that is the Internet, when it comes to framing the news, objective language does just as well as emotional.