Hilary studies data, and gets to work with scientists from fields like sociology, physics, and biology to do it! She shared a quote that describes her philosophy:
"The purpose of computing is insight, not numbers." ~ Richard Hamming, 1961
Building on this, Hilary talked about how her work is really about learning about humanity and increasing our ability to make better decisions about the world. I could really relate to thinking about CS as a means to an end, or as a way to model and think about the world, rather than focusing only on code itself. She also talked about a shift in our thinking, from naive data (single pieces of information that are interesting on their own) to data that is interesting in the human context.
There have been advances in the field of Big Data recently: scalability/clustering, algorithms, data storage/analysis. Some fascinating examples of recent Big Data applications are:
- Using heat maps to identify restaurants that were illegally disposing of used oil/grease (an organized crime problem) in New York. Using this data, the City of New York set up its own grease disposal company and approached the restaurants about selling their grease to it, eliminating the problem.
- Using ambulance response data to find out why ambulance drivers parked in non-optimal locations (it turned out there were coffee shops there). This led to deals with coffee shops to entice ambulance drivers to park in better locations!
What are Data Scientists?
Data science is a mix of disciplines like math, comp sci, engineering, and curiosity. Hilary showed us how the overlaps of these disciplines contain nerds, but the intersection of all of them contains awesome nerds! It was quite funny. Data scientists are concerned with building mathematical models for the right questions. It's important to find the questions that matter.
Bitly is actually a spinoff from a failed product. Its first year was consumed mostly with building scalable systems. They now see tens of millions of URLs per day and hundreds of millions of clicks per day. Its research goal is: Can we understand human social attention in real time? A few things they've learned are:
- social experience in a given social environment varies by individual
- attention is fickle
- data needs to be normalized for it to be truthful and accurate
- the frame changes the way we consume content
- new geographies matter (e.g. the best time to get clicks on Twitter is different from Facebook)
- the internet IS the real world (e.g. look at network data during the Arab Spring, one reflects the other)
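To make the point about normalization a bit more concrete, here is a minimal sketch of my own (not something shown in the talk). It assumes we have raw click counts for one topic per region plus each region's total clicks; the function name `click_share` and all of the numbers are made up for illustration. Dividing by the total keeps a big city from looking more interested in a topic just because it produces more clicks overall.

```python
def click_share(topic_clicks, total_clicks):
    """Return each region's topic clicks as a fraction of its total clicks."""
    shares = {}
    for region, clicks in topic_clicks.items():
        total = total_clicks.get(region, 0)
        # Skip regions with no recorded traffic to avoid dividing by zero.
        if total > 0:
            shares[region] = clicks / total
    return shares


# Hypothetical numbers, for illustration only.
topic_clicks = {"New York": 5000, "Boise": 400}
total_clicks = {"New York": 1_000_000, "Boise": 20_000}

shares = click_share(topic_clicks, total_clicks)
for region, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{region}: {share:.2%}")
```

With these made-up counts, New York has far more raw clicks on the topic, but Boise comes out ahead once each region is measured against its own traffic.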
Engineering Process at Bitly
The process is as follows:
- Research offline
- Do fancy math - find the shortcut
- Design infrastructure
- Re-design to run at scale and speed
Steps two and three are generally done in Python, while steps three and four often involve C and Go.
Another interesting problem they are working on is Realtime Search. Rankings are done dynamically and can vary by the second! That's amazing.
They have a @GHCBot which tweets about items of interest to the GHC community, so be sure to check it out! For the Star Trek fans among us, there is also a Star Trek bot, which you'll have to let me know about if you find it. :-)