On Friday and Saturday I had the great pleasure of attending the UCSB Interdisciplinary Humanities Center’s Geography of Place conference, “Mapping Place: GIS and the Spatial Humanities.” I’m definitely going to write about that after I recover from two full days of geography-infused goodness. For the moment you can check out the live tweets that some of us contributed from our Twitter accounts using #MappingPlace.
In the context of a conference largely devoted to exploring the different ways humanists use and display and theorize data, using #MappingPlace reminded me of the power of Twitter in collecting data for my own research. For a while I’ve kept tabs on Twitter users who report publicly on their prayerwalking activities. Before I get technical about what I’m learning about and what I’m trying to make, let’s talk about Twitter and its restrictions for researchers.
1. Content Boundaries: First, private information such as street addresses cannot be shared without a user’s explicit authorization. To the degree that a researcher might want to geo-locate a Twitter user based on non-public information, this can pose an obstacle to mapping tweets or Twitter content.
3. Uses & Distribution: In short, by tweeting, a Twitter user has granted folks “a worldwide, non-exclusive, royalty-free license (with the right to sublicense) to use, copy, reproduce, process, adapt, modify, publish, transmit, display and distribute such Content in any and all media or distribution methods (now known or later developed).” For researchers who wish to use public tweets, the combination of point #2 above and #3 here amounts to a near blank slate for current and future research. Users cannot expect compensation for the use of their public information (tweets), nor can they legally object to the use of that information (with a few extreme legal exceptions related to demonstrable copyright infringement and, potentially and only in certain places, violation of local defamation, slander or libel laws).
The sum of these points–despite any fears that researchers might have over rare legal issues–is that Twitter is a golden goose for humanists. Let me show a couple of the ways this is true for my own subject, spatial prayer practices like prayerwalking.
If you want to find out who went prayerwalking recently and tweeted about it, you simply use Twitter’s search tool. In this instance “prayerwalking” (one word) returns nothing. Most users don’t combine the terms pray and walk together into a single word even if they are going on a prayerwalk. But if you search for “prayer walking,” Twitter returns dozens of results:
As you can see, these results are not really about folks who are going out into their neighborhoods and cities praying on-site. It’s a problem that no amount of creative searching will solve because Twitter can’t distinguish between “Say a prayer for help and be thankful you’re walking with the Lord” and “I’m prayerwalking for the Lord to help me.” Combining pray and walk in a dozen different ways doesn’t return things that are substantially better or more related to what I’m looking for, which is reports by folks planning a prayerwalk, during a prayerwalk, or reporting that they have completed a prayerwalk.
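A search box can’t make that distinction, but a rough filter applied after the results are collected can at least sort them into likely and unlikely candidates. What follows is a minimal sketch in Python, assuming the tweet text is already in hand as plain strings. The pattern and the function name `looks_like_prayerwalk` are illustrative assumptions, not anything Twitter provides, and the heuristic (count a tweet only when “prayer” is immediately followed by some form of “walk”) will still let false positives through, so reading the tweets by hand remains necessary.

```python
import re

# Assumed heuristic: match "prayerwalk", "prayer walk", "prayer-walking",
# "prayer walked", and similar forms where the two words sit side by side.
PRAYERWALK = re.compile(
    r"\bprayer[\s-]?walk(?:s|ed|ing|er|ers)?\b",
    re.IGNORECASE,
)


def looks_like_prayerwalk(text):
    """Return True when the text contains an adjacent prayer+walk form."""
    return PRAYERWALK.search(text) is not None


if __name__ == "__main__":
    # The two example sentences quoted in the paragraph above.
    samples = [
        "Say a prayer for help and be thankful you're walking with the Lord",
        "I'm prayerwalking for the Lord to help me.",
    ]
    for tweet in samples:
        label = "candidate" if looks_like_prayerwalk(tweet) else "skip"
        print(label, "|", tweet)
```

A filter like this only narrows the pile. A tweet such as “keep prayer walking beside you” would still match, which is why a manual pass over the candidates is part of the plan.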
So while on one hand I’m thrilled that I can easily search Twitter (this is a newer feature of the site that didn’t exist a year ago), it fails to provide a sophisticated enough search and sort experience for my research needs. What to do?
One solution might be to begin archiving any and all tweets that involve the many combinations of pray and walk that could include desirable tweets. Then you could go through them all and identify the ones that were relevant (deferring for just a moment the hurdle of saving just those items). Thankfully I don’t have to do this (or I only do it occasionally using the resources within my personal Twitter account). While the past week or two of tweets is always readily searchable within Twitter, pretty much everything that has been tweeted is available through Google’s Realtime search feature. (Everything is technically available through Twitter, too, but the site doesn’t return very much data when you use its native search features.) Check out this search of “prayer walk” that shows tweets from August 2010:
What’s great about this aggregation of results is that Google provides a time distribution of the number of tweets matching your search that researchers can manipulate to isolate periods or dates that are of particular interest to them. For me I’m more concerned with the content of particular tweets, but the option of seeing a time-scale distribution of “prayer + walk” tweets could reveal a community of prayerwalkers or a prayerwalking event that is worth investigating. (You can also take advantage of Google’s more sophisticated search options to refine your terms in ways that are not possible within Twitter.)
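The same kind of time distribution is easy to rebuild from a saved set of tweets. Here is a small sketch, assuming each tweet is stored as a pair of an ISO-formatted timestamp and its text. The function names and the sample data are invented for illustration and are not drawn from any real search results.

```python
from collections import Counter
from datetime import datetime


def tweets_per_day(tweets):
    """Count tweets per calendar day.

    `tweets` is an iterable of (iso_timestamp, text) pairs, for example
    ("2010-08-14T09:30:00", "Prayer walk downtown tonight").
    Returns a list of (date, count) pairs sorted by date.
    """
    counts = Counter(datetime.fromisoformat(ts).date() for ts, _ in tweets)
    return sorted(counts.items())


def busiest_days(tweets, top=3):
    """Return the `top` days with the most tweets, busiest first."""
    counts = Counter(datetime.fromisoformat(ts).date() for ts, _ in tweets)
    return counts.most_common(top)


if __name__ == "__main__":
    # Invented sample data, purely for illustration.
    sample = [
        ("2010-08-14T09:30:00", "Prayer walk downtown tonight"),
        ("2010-08-14T18:05:00", "Back from our prayer walk"),
        ("2010-08-15T07:00:00", "Prayer walking the neighborhood"),
    ]
    for day, n in tweets_per_day(sample):
        print(day.isoformat(), "#" * n)
```

A cluster of tweets on a single day is the sort of spike that might point to an organized prayerwalking event.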
But we haven’t really done anything spectacular yet, have we? Hashtags are user-assigned meta-tags in Twitter (#MappingPlace), but these don’t exist for prayerwalking. I could begin a campaign to help generate that meta-tag among users who do tweet about prayerwalking, but that would only help future data collection. Word-content searches can reveal a recent set of data about a topic in Twitter (“prayer+walking”), and Google can further refine those kinds of searches by broadening the time-range and presenting the data in a way that highlights moments of intense activity (“prayer+walk”).
If we want to do more, we can be extremely thankful for both Google’s and Twitter’s embrace of open API documentation. (Google has a custom search API that would allow one to return the same results that Realtime does. Right now there is still no API specifically for Realtime.) What this means is that a researcher who wants to can build tools that take advantage of back-end access to re-implement and re-purpose publicly available tweets. Using a combination of Google and Twitter it is possible not only to identify, store, display, and organize a data set of tweets, but to do this continuing forward in the future.
Goal: Build a web application that displays and manages a database of tweets culled first from past tweets and which then continues to monitor Twitter/Google to periodically add tweets to the database.
- Use Google’s Fusion Tables to build a scaffold database to populate with tweets and tweet-related data.
- Write a script or series of scripts for Google search and/or Twitter to gather past data and add it to the Fusion database.
- Write a script or series of scripts for Twitter to continue capturing future tweets and periodically adding them to the database.
- Create an interface for my host web site to display Fusion data, including scripts to annotate and modify the database (i.e., embed the Fusion API and the Twitter monitoring script as well as a way to highlight prayerwalking tweets and hide false positives).
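To make the storage side of this plan concrete, here is a rough sketch of how the database could behave. It uses Python’s built-in `sqlite3` module as a local stand-in for Fusion Tables, so no Fusion or Twitter API calls appear here. The table layout, the status labels, and the function names are all assumptions made for illustration. The one idea worth carrying over is keying each row on the tweet’s ID, so a script that runs periodically can re-submit overlapping results without creating duplicates.

```python
import sqlite3

# Assumed review states for the annotate/hide step in the plan.
STATUSES = ("unreviewed", "relevant", "false_positive")


def open_store(path=":memory:"):
    """Open (or create) the tweet store. SQLite stands in for Fusion Tables."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS tweets (
               tweet_id  TEXT PRIMARY KEY,
               author    TEXT NOT NULL,
               posted_at TEXT NOT NULL,
               body      TEXT NOT NULL,
               status    TEXT NOT NULL DEFAULT 'unreviewed'
           )"""
    )
    return conn


def add_tweets(conn, rows):
    """Insert (tweet_id, author, posted_at, body) rows, skipping known IDs.

    Returns the number of rows that were actually new.
    """
    before = conn.total_changes
    conn.executemany(
        "INSERT OR IGNORE INTO tweets (tweet_id, author, posted_at, body) "
        "VALUES (?, ?, ?, ?)",
        rows,
    )
    conn.commit()
    return conn.total_changes - before


def mark(conn, tweet_id, status):
    """Annotate a tweet as relevant or as a false positive."""
    if status not in STATUSES:
        raise ValueError("unknown status: " + status)
    conn.execute(
        "UPDATE tweets SET status = ? WHERE tweet_id = ?", (status, tweet_id)
    )
    conn.commit()


def visible(conn):
    """Rows to display: everything except tweets marked as false positives."""
    return conn.execute(
        "SELECT tweet_id, body, status FROM tweets "
        "WHERE status != 'false_positive' ORDER BY posted_at"
    ).fetchall()
```

The gathering scripts for past and future tweets would both feed `add_tweets`, and the display page would read from `visible` while offering a way to call `mark` on each row.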
So there it is. One small conference has me finally building up the resolve to really get my hands dirty (well past what I’m working on in my previous post about author relationships). It’s amazing what being around a group of motivated, enthusiastic folks can do to you–and it really affirms everything we know about collaborative workspaces. So with that, check this out on where ideas and inspiration and breakthroughs come from:
Suggestions and recommendations and corrections entirely welcome.