If many people named "Marc" live in an area, what does this say about how the area tends to vote? What do name distributions look like for precincts? Which names are most predictive of how a precinct votes? How can we visualize this information in an insightful and exciting way? I don’t have the answers, but here are some ideas:
- Use “words-used-by” bubble graphs to visualize the data. For example, treat the last names from Democratic-voting precincts as one bag of words and those from Republican-voting precincts as the other, then visualize the two bags with something like http://www.nytimes.com/interactive/2012/09/06/us/politics/convention-word-counts.html?_r=0
- Use word clouds to view precinct voter names. http://www.wordle.net/ or
- Build a predictive model of how precincts will vote, using counts of registered voters’ names as features. Each precinct would be an instance, and the outcome to predict would be Democratic vs. Republican. It would be interesting to see how well the model works on held-out data. We could also look at which names matter most to the model’s accuracy. A random forest classifier would be a reasonable place to start.
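
To make the modeling idea concrete, here is a minimal sketch of the precinct-as-instance setup. It is not a random forest: to stay self-contained it uses a multinomial naive Bayes baseline written with only the Python standard library, which pools names into one bag per party (as in the first idea) and scores held-out precincts. The precinct data, the names "Lena" and "Otis", and the helper names `fit`, `predict`, and `name_log_odds` are all made up for illustration; real inputs would come from voter rolls and precinct-level results.

```python
import math
from collections import Counter

# Hypothetical toy data: each precinct is (first-name counts, winning party).
TRAIN = [
    (Counter({"Marc": 5, "Lena": 1, "Otis": 2}), "D"),
    (Counter({"Marc": 4, "Otis": 1}), "D"),
    (Counter({"Lena": 6, "Otis": 2, "Marc": 1}), "R"),
    (Counter({"Lena": 3, "Otis": 1}), "R"),
]
HELD_OUT = [
    (Counter({"Marc": 3, "Otis": 1}), "D"),
    (Counter({"Lena": 4, "Marc": 1}), "R"),
]


def fit(precincts, alpha=1.0):
    """Fit a multinomial naive Bayes model with add-alpha smoothing."""
    bags = {}                # party -> pooled name counts (one bag per party)
    n_precincts = Counter()  # party -> number of training precincts
    for names, party in precincts:
        bags.setdefault(party, Counter()).update(names)
        n_precincts[party] += 1
    vocab = sorted(set().union(*bags.values()))
    total = sum(n_precincts.values())
    model = {"vocab": vocab, "prior": {}, "loglik": {}}
    for party, bag in bags.items():
        denom = sum(bag.values()) + alpha * len(vocab)
        model["prior"][party] = math.log(n_precincts[party] / total)
        model["loglik"][party] = {
            name: math.log((bag[name] + alpha) / denom) for name in vocab
        }
    return model


def predict(model, names):
    """Return the party with the highest posterior log-probability."""
    def score(party):
        loglik = model["loglik"][party]
        # Names never seen in training are skipped.
        return model["prior"][party] + sum(
            count * loglik[name] for name, count in names.items() if name in loglik
        )
    return max(model["prior"], key=score)


def name_log_odds(model, a="D", b="R"):
    """Log-odds of each name under party a vs. party b; positive leans a."""
    return {
        name: model["loglik"][a][name] - model["loglik"][b][name]
        for name in model["vocab"]
    }


model = fit(TRAIN)
accuracy = sum(predict(model, names) == party for names, party in HELD_OUT) / len(HELD_OUT)
ranked = sorted(name_log_odds(model).items(), key=lambda kv: kv[1], reverse=True)

print(f"held-out accuracy: {accuracy:.2f}")
for name, odds in ranked:
    print(f"{name:5s} {odds:+.2f}")
```

The per-name log-odds give a rough answer to "which names are most predictive": names near zero appear about equally in both bags, while large positive or negative values lean toward one party. With real data, a random forest from a library such as scikit-learn could be swapped in for `fit` and `predict`, keeping the same train and held-out split.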