Friday, November 11, 2011

Understanding Relationships Through Data

This session on the Large Scale Computing track called "Understanding Relationships Through Data" was presented by Ming Hua, one of my lab-mates from grad school!   Ming now works at Facebook so it was great to see her again.  Of course, I knew from being a student with her that her talk would be excellent. :-)

Ming began the talk with some data: there are ~350 million users on Facebook, with 80 million photos uploaded.  Billions of pieces of content are shared every week, with people listening to music, checking in with friends, posting photos and videos, and sharing those updates with their friends.  The average user has 130 friends and is connected to 80 pages, groups, locations, etc.  This can be organized as a giant social graph.

The key point Ming makes is that there is a 'data generation loop', with reciprocal relationships between shared content, social context, and social connections.  Each node in this loop brings up different questions:
  • What do people share about?
  • What do people do together?
  • What do people connect with?
To explore these questions, Ming presented several aspects of her research.  First, she spoke about her investigations into 'happiness analysis', in which she tried to predict whether people (Facebook users) are happy.  She stressed that they conducted a voluntary study where they received feedback from users and analyzed their Facebook updates.  An ideal measure of happiness:
  • uses a representative sample of the population
  • is based on naturalistic behaviour
  • is computational (no human raters)
  • is efficient (must process millions of updates per hour)
She conducted sentiment analysis on FB status updates:
  • billions of status updates are shared monthly
  • these updates are used for self-expression
  • the identified updates are then published to friends and the public
The words in Facebook statuses can be mapped to emotion:
  • there are word categories to represent psychological content
  • this is well validated in many corpora
  • e.g. "Fred hates passive aggressive Facebook updates, but loves irony" - "hate" and "aggress" are negative emotional terms, "updates" is a neutral term, and "love" is positive.
The hypothesis is that happier people use more positive and/or fewer negative words.
Facebook recruited 1341 English speakers to complete the study.  They looked at the percentage of positive and negative words in updates and graphed the gross national happiness in the US as a result.  It was interesting to see that the happiest days of the year according to the data are Thanksgiving, Christmas, New Year's Eve, Valentine's Day, Easter, and Halloween (however, Ming stressed the need to normalize for phrases like 'Happy Halloween' in Facebook status updates).
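
To make the word-counting idea concrete, here is a minimal sketch of how the percentage of positive and negative words in a single update could be computed.  This is not the code from the study: the tiny stem lists and the function name emotion_percentages are my own placeholders, and a real analysis would use a validated set of word categories.

```python
import re

# Hypothetical toy lexicon of word stems (an assumption for illustration).
# The study relied on validated word categories that are not reproduced here.
POSITIVE_STEMS = ("love", "happy", "great")
NEGATIVE_STEMS = ("hate", "aggress", "sad")


def emotion_percentages(update):
    """Return (percent positive words, percent negative words) for one update."""
    words = re.findall(r"[a-z']+", update.lower())
    if not words:
        return 0.0, 0.0
    # str.startswith accepts a tuple, so each word is checked against every stem.
    positive = sum(w.startswith(POSITIVE_STEMS) for w in words)
    negative = sum(w.startswith(NEGATIVE_STEMS) for w in words)
    return 100.0 * positive / len(words), 100.0 * negative / len(words)


if __name__ == "__main__":
    example = "Fred hates passive aggressive Facebook updates, but loves irony"
    pos, neg = emotion_percentages(example)
    print(f"positive: {pos:.1f}%  negative: {neg:.1f}%")
```

Averaging these percentages over all updates posted on a given day would give one point on a happiness-over-time graph.  The 'Happy Halloween' problem shows up right away with this kind of matching: a greeting counts as a positive word even when it says nothing about the writer's mood.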

People also like many pages (owned, authentic pages, sourced pages like Wikipedia, community pages), so Ming asked, what if we represent all entities in the world by Facebook pages, thus turning them into connections in the graph?  We might be better able to provide page recommendations, insights, ad targeting, and de-duplication of existing pages and pages provided in results.

Another interesting application of this kind of data analysis is to predict hours of business based on user check-ins, using the words and data provided in their status updates.  There are other applications as well, given the demographic and temporal data mapped geographically.  For example, check-ins could be predicted.  A study was done by holding out some check-ins and trying to predict them by feeding the features into a logistic regression.  This worked well for predicting check-ins, but not as well for predicting comments or 'likes' on a given check-in.
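
For anyone curious what "hold out some check-ins and predict them with a logistic regression" looks like in practice, here is a rough sketch.  Everything specific in it is an assumption on my part: the two features (is_evening, friends_nearby), the synthetic data, and the plain gradient-descent training loop are made up for illustration and are not the features or code Ming described.

```python
import math
import random


def sigmoid(z):
    # Clamp z so math.exp cannot overflow on extreme inputs.
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(X, y, lr=1.0, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


def predict(w, b, x):
    """Return True if the model predicts a check-in for feature vector x."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5


# Synthetic, made-up data: two hypothetical features in [0, 1] and a label
# saying whether a check-in happened. Real features would come from the
# demographic, temporal, and geographic data mentioned in the talk.
rng = random.Random(0)
X, y = [], []
for _ in range(200):
    is_evening = rng.random()
    friends_nearby = rng.random()
    X.append([is_evening, friends_nearby])
    y.append(1 if is_evening + friends_nearby > 1.0 else 0)

# Hold out the last 50 examples and train only on the first 150.
train_X, train_y = X[:150], y[:150]
held_X, held_y = X[150:], y[150:]

w, b = train_logistic(train_X, train_y)
correct = sum(predict(w, b, x) == bool(t) for x, t in zip(held_X, held_y))
accuracy = correct / len(held_X)

if __name__ == "__main__":
    print(f"held-out accuracy: {accuracy:.2f}")
```

The important part is the split: the model never sees the held-out check-ins during training, so its accuracy on them is an honest estimate of how well check-ins can be predicted from the features.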

Responses to check-ins could also be predicted: Ming found that responses are more likely when the person checks in far from their usual location.  For example, when Ming checks in in Portland, she gets more responses than she does for her check-ins in SF, which are much more common for her.  Also, if you check in close to a commenter/liker, then you are more likely to get a response.

Ming showed us several prototypes; for example, one that suggests better content for you based on the people you tag.  Or, instead of just searching for pages when you type in #ghc11, you would get events, pages, friends' statuses, etc.  The goal is to rank the things that best match your original purpose.  There are new features based on geolocation, and real-time search of friends' status updates.  Ming ended by reminding us that the ultimate goal is to stress the loop to make the world more open and connected.

There were many questions, including one of my own: what are the concerns/implications around being able to identify people's moods?  Do happier people have more happy friends?  What are the future directions for Facebook along these lines?  Ming said that happier people may have more interactions with their friends, so that can help identify them.

2 comments:

Kerry C said...

I'm so glad you posted this summary, since this is the industry presentation that actually relates to my work, and I really wanted to see.

Thanks!

Kate said...

Glad you got something out of it! :D