Startups Awash in Data: Quantitative Thinkers Needed

We know Unix logs everything, which makes web-based data collection easy; in fact, it is almost difficult not to do. As a result, internet startups often find themselves gathering enormous amounts of data: site use patterns, click-streams, user demographics and preference functions, purchase histories… Many of these companies know they are sitting on a goldmine, but how do they extract the relevant information from these scads of data? More precisely, how can they better predict user behavior and preferences?

Statisticians, particularly through machine learning, have been working on this problem for a long time. Since arriving in New York City from Silicon Valley, I’ve observed an enormous amount of quantitative talent here, at least in part due to the influence of the finance industry. These quantitative skills are precisely what’s needed to make sense of the data collected by startups, and here it looks like NYC has an edge over Silicon Valley. Friends Evan Korth, Hilary Mason, and Chris Wiggins (two professors and a former professor) are building bridges to connect these two worlds. Their primary effort, HackNY, is a summer program that links quantitatively talented students with startups in need. (Wiggins’ mantra is to “get the kids off the street” by giving them alternatives to entering the finance profession.)

The New York startup scene is distinguishing itself from Silicon Valley by making direct use of the abundance of quantitative skills available here. Hilary and Chris created an excellent guideline for data-driven analysis in the startup context, “A Taxonomy of Data Science”: Obtain, Scrub, Explore, Model, and iNterpret. Startup data often measure phenomena in new ways and use novel data structures, providing new opportunities for innovative data research and model building. Lots of data, lots of skill: great for statisticians and folks with an interest in learning from data, as well as for those collecting the data.
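
To make the five steps of the taxonomy a little more concrete, here is a toy sketch in Python of how they might map onto a click-stream. Everything in it is an assumption made for illustration: the log format (`user_id,event`), the sample data, the function names, and the deliberately naive "model" (comparing purchase rates of heavy and light viewers). It is not taken from the taxonomy itself, and a real startup pipeline would look quite different.

```python
from collections import Counter

# Hypothetical raw log: one "user_id,event" record per line (made-up data).
RAW_LOG = """\
u1,view
u1,view
u1,purchase
u2,view
bad line
u3,view
u3,view
u3,view
u3,purchase
u4,view
"""


def obtain(raw):
    """Obtain: split the raw text into candidate records."""
    return [line.split(",") for line in raw.splitlines()]


def scrub(records):
    """Scrub: keep only well-formed (user, event) pairs with a known event."""
    known = {"view", "purchase"}
    return [(r[0], r[1]) for r in records if len(r) == 2 and r[1] in known]


def explore(events):
    """Explore: count views per user and collect the users who purchased."""
    views = Counter(user for user, event in events if event == "view")
    buyers = {user for user, event in events if event == "purchase"}
    return views, buyers


def model(views, buyers, threshold=2):
    """Model: purchase rate among heavy viewers vs. light viewers.

    The threshold is an arbitrary illustrative choice, not a recommendation.
    """
    def rate(users):
        return sum(u in buyers for u in users) / len(users) if users else 0.0

    heavy = [u for u, n in views.items() if n >= threshold]
    light = [u for u, n in views.items() if n < threshold]
    return {"heavy": rate(heavy), "light": rate(light)}


def interpret(rates):
    """iNterpret: turn the numbers into a plain-language statement."""
    if rates["heavy"] > rates["light"]:
        return "Heavy viewers purchased more often than light viewers."
    return "No evidence that heavy viewers purchase more often."


if __name__ == "__main__":
    events = scrub(obtain(RAW_LOG))
    views, buyers = explore(events)
    rates = model(views, buyers)
    print(rates)
    print(interpret(rates))
```

With ten made-up records the conclusion means nothing, of course; the point is only the shape of the workflow, where each step hands a cleaner or more summarized object to the next.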