We know Unix logs everything, which makes web-based data collection easy; in fact, it is almost difficult not to do. As a result, Internet startups often find themselves gathering enormous amounts of data: site use patterns, click-streams, user demographics and preference functions, purchase histories… Many of these companies know they are sitting on a goldmine, but how do they extract the relevant information from these scads of data? More precisely, how can they better predict user behavior and preferences?
Statisticians, particularly through machine learning, have been working on this problem for a long time. Since arriving in New York City from Silicon Valley, I’ve observed an enormous amount of quantitative talent here, at least in part due to the influence of the finance industry. But these same quantitative skills are precisely what’s needed to make sense of the data collected by startups, and here it looks like NYC has an edge over Silicon Valley. Friends Evan Korth, Hilary Mason, and Chris Wiggins (two professors and a former professor) are building bridges to connect these two worlds. Their primary effort, HackNY, is a summer program that links quantitatively talented students with startups that need them. (Wiggins’ mantra is to “get the kids off the street” by giving them alternatives to entering the finance profession.)
The New York startup scene is distinguishing itself from Silicon Valley by making direct use of the abundance of quantitative skills available here. Hilary and Chris created an excellent guideline for data-driven analysis in the startup context, “A Taxonomy of Data Science”: Obtain, Scrub, Explore, Model, and iNterpret. These data often measure phenomena in new ways, use novel data structures, and provide new opportunities for innovative data research and model building. Lots of data, lots of skill – great for statisticians and folks with an interest in learning from data, as well as for those collecting the data.