Show Me Your Tweets and I'll Tell You Where You're From
Researchers at Carnegie Mellon University took a closer look at the regional dialects of Twitter users and found that they could pinpoint a user’s location with reasonable confidence simply by looking at the words that user chooses. While some reports about this study argue that this means social networks “create their own regional dialects,” most of the words reported in the study simply reflect the dialect the user already speaks. The researchers did find a few words, though, whose preferred spellings differed enough between regions to at least hint at the formation of regional “Twitter dialects.”
The researchers looked at 380,000 messages with 4.7 million words from 9,500 users. The paper about this research effort (“Statistical exploration of geographical lexical variation in social media”) was presented at the annual meeting of the Linguistic Society of America last weekend.
As the researchers point out, New Yorkers say “cab” instead of “taxi,” and “y’all” will quickly mark you as a southerner. On Twitter, the way you spell certain words can also give away your location: “cool,” for example, becomes “koo” in San Francisco and “coo” in southern California. While most Twitter users tend to shorten words rather than lengthen them, New Yorkers have a stronger tendency than other users to write “you” as “youu.” Jacob Eisenstein, a post-doctoral fellow in CMU’s Machine Learning Department and one of the co-authors of the paper, also told the San Francisco Chronicle that Spanish-language terms tend to appear in regions with large Spanish-speaking populations (no surprise there).
Based on all of this information, the researchers created an algorithm that could pinpoint the location of most Twitter users to within a 300-mile radius.
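To get a feel for how word choice alone can hint at a location, here is a minimal sketch of a naive Bayes text classifier in Python. It is not the researchers' algorithm, which this article does not describe in detail; it is a simpler, standard technique swapped in for illustration. The training messages, region labels, and the names train and predict are all invented for this example, and only the marker words (“cab,” “y’all,” “koo,” “coo,” “youu”) come from the study as reported above.

```python
import math
from collections import Counter, defaultdict

# Toy training data, invented for illustration only (not from the study).
TRAINING = [
    ("new_york", "took a cab home youu should come"),
    ("new_york", "cab was late again"),
    ("south", "y'all coming over tonight"),
    ("south", "see y'all soon"),
    ("san_francisco", "that show was koo"),
    ("southern_california", "that show was coo"),
]


def train(examples):
    """Count words per region and messages per region."""
    word_counts = defaultdict(Counter)
    region_counts = Counter()
    vocab = set()
    for region, text in examples:
        tokens = text.lower().split()
        word_counts[region].update(tokens)
        region_counts[region] += 1
        vocab.update(tokens)
    return word_counts, region_counts, vocab


def predict(text, model):
    """Return the region with the highest naive Bayes log-score."""
    word_counts, region_counts, vocab = model
    total_messages = sum(region_counts.values())
    best_region, best_score = None, float("-inf")
    for region in region_counts:
        # Prior: share of training messages that came from this region.
        score = math.log(region_counts[region] / total_messages)
        total_words = sum(word_counts[region].values())
        for token in text.lower().split():
            if token not in vocab:
                continue  # words never seen in training carry no signal
            # Add-one smoothing so an unseen (region, word) pair is not zero.
            count = word_counts[region][token] + 1
            score += math.log(count / (total_words + len(vocab)))
        if score > best_score:
            best_region, best_score = region, score
    return best_region


MODEL = train(TRAINING)
print(predict("grabbing a cab", MODEL))
print(predict("are y'all ready", MODEL))
```

With only six made-up messages this guesses a region label rather than a point on the map; a real system would need far more data and a way to turn its guess into coordinates before any distance figure like the 300-mile radius could be measured.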