THE DATA ENTREPRENEUR


Jeff Hammerbacher

Before cofounding Silicon Valley software start-up Cloudera in 2009, at the age of 26, Jeff Hammerbacher was a quantitative analyst on Wall Street and one of Facebook’s first employees.

 

The open-source advantage

 

I was Facebook’s first research scientist. The initial goal for that position was to understand how changes to the site were impacting user behavior. We had built our own infrastructure to allow us to do some terabyte analytics, but we were going to have to scale it up to petabytes. We realized that instead of continuing to invest in infrastructure, we could build a more powerful shared resource to facilitate business analysis by working with the open-source community.

 

In founding Cloudera, I saw a path to a complete infrastructure for doing analytical data management. It would be made up of existing open-source projects as well as open-source versions of a lot of the technologies that we had built out internally at Facebook. Cloudera would be a corporate entity for pursuing those goals and ensuring that it wasn’t just Facebook that would be able to use this technology but, really, any enterprise.

 

Data leaders

When we started Cloudera, we didn’t have a core thesis around where the technology would be adopted or what the market was going to look like. Early adopters were clearly in the Web and digital-media spaces. But in terms of traditional industries, the federal government surprised me. They really are the leaders in multimedia data analysis—working with text, images, video. In the intelligence agencies, I’ve seen more sophistication than in commercial domains.

 

I was also surprised to see the retail space. Retailers had very large volumes of data, and because many were branching out into e-commerce, they had a lot of Web logs and Web data as well. There is an arms race going on right now in retail. If you can understand consumer behavior and get your hands around as much behavioral data as possible to better guide product decision making, then every penny you can eke out is increasing your margins and allowing you to invest more.

 

Financial services was one sector that I had hoped would be an early adopter, but these companies tend not to look at their businesses as a whole in the same way that retail does. Data management is thought of as project specific, even to the point where individual trading desks could have their own chief technology officers. Our technology tends to work best as a shared infrastructure for multiple lines of business.

 

Where this is headed is learning how to point this new infrastructure for storing and analyzing data at real business problems, as well as growing the imagination of businesspeople about what they can do when a variety of experts analyze the data. If you can digitize reality, then you can move your world faster than before.

 

Building a big data function

You need to make a commitment to conceiving of data as a competitive advantage. The next step is to build out a low-cost, reliable infrastructure for data collection and storage for whichever line of business you perceive to be most critical to your company. If you don’t have that digital asset, then you’re not even going to be able to play the game. And then you can start layering on the complex analytics. Most companies go wrong when they start with the complex analytics.

 

When deciding how to incorporate analytics expertise into an organization, you have to be honest about what your organization looks like—your capacity to hire and your long-term vision for what that organization is going to be. There isn’t one right answer. Yahoo! built a centralized group called Strategic Data Solutions to run the entire gamut. Rather than just building a small group of people primarily focused on marketing analytics, the company took an end-to-end view, extending from data storage to the actual P&L. In our group at Facebook, because we were a very fast-moving organization, we were much more of a platform—a service organization for the rest of the company.

 

The rise of the ‘data scientist’

I tried to articulate this title of data scientist in a book I put together with O’Reilly Media. I now actually see people describing themselves as data scientists in their job titles on LinkedIn and scientists talking about themselves as data scientists. So it’s evolving. People realize that there is a gap between the current role of statistician or data analyst or business analyst and what they actually want. They are grappling with the set of tools and the set of skills that they need. Across the whole research cycle, it’s a combination of skills that social scientists understand, plus additional programming skills, plus the ability to do aggressive prioritization. And, of course, a good grounding in statistics and machine learning. That collection of skills is difficult to find.

 

Pasted from <https://www.mckinseyquarterly.com/Competing_through_data_Three_experts_offer_their_game_plans_2868>
