Tuesday 20 June 2017

How Data Science Helps Us Ask The Right Questions: And Why IBM Never Became The King of Photocopies

Leaders sometimes ask questions that get in the way of solving the problem that really matters to them. We can learn a lot from a real example of two business titans.


During the 1960s, Big Blue had the opportunity to purchase or license the new Xerox xerographic reproduction process (think: photocopies). IBM hired a consulting firm to answer the following question: "If a more reliable, cheaper, and faster copying process were available, how many more copies of an original would people make in a given year?" It was the wrong question, by a mile. According to Paul Schoemaker and Steven Krupp's article in the MIT Sloan Management Review on the art of asking pivotal questions (Winter 2016), IBM "ignored a new market segment that turned out to be many times larger: copies of copies." This was a huge opportunity overlooked. What would have happened if IBM had asked instead:

"How could the new Xerox process change when and how people make copies, and how much could that grow the total number of copies made in the next few years?"

Id. The answer to the right question may well have resulted in IBM owning the new technology. Xerox might never have become a verb. We'll never know. What we do know, however, is that IBM paid dearly for asking the wrong question, and that companies should take the time they need to come up with the right questions to guide their initiatives. Spend a lot of time aiming and little time shooting.

Why this example? In the recent past, we have observed that data science challenges that ask the right questions from the start produce far-reaching results that their sponsors could never have conceived. The Harvard Medical School example below demonstrates how asking the right question can produce extraordinary results.

World-class data science challenges have three important features: (1) they atomize the problem (a step often called decomposition, after the scientific practice known as systems decomposition) into its component parts; (2) they distill the hardest questions out of the problem to serve as the precondition for proceeding; and (3) they abstract those distilled questions from their domain. Collectively, these processes are known as DEA. They allow hundreds of thousands of the world's best problem solvers to analyze and solve problems without even knowing whose challenge it is, such as helping Harvard Medical School (HMS) tackle DNA sequencing, or positioning the longerons that hold the International Space Station's solar arrays in place in order to maximize energy capture within strictly prescribed parameters. In the latter case, abstraction from the domain allowed a professor in Italy to produce the winning solution.

By atomizing a challenge, a crowdsourcing community divides it into small component parts. This delivers several benefits for its clients. First, community members self-select to compete in those sub-areas where they feel they have a comparative advantage and can win. Second, atomization allows parallel development (as opposed to sequential progress). With so many community members choosing to compete in specific challenges, one can find 143 contestants working on challenge (A), 110 on (B), and 79 on (C). When each of these challenges ends, its result can be resynthesized with the others into a whole, e.g., (A) through (C). The alternative is sequential development, which requires that (A) be completed before (B), then (B) before (C), and so on. That leaves development vulnerable to weak links in the chain, and it can be painfully slow. Atomization also allows more efficient participation. A contestant is more likely to be an expert at improving the algorithm used in a program than to be an expert at improving both the algorithm and the user interface of the program serving that algorithm. Likewise, atomization improves a contestant's perceived probability of winning, and therefore participation in the atomized contest.
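The contrast between parallel and sequential development described above can be sketched loosely in code. The sub-challenges below are invented placeholders, not anything from the HMS or NASA contests; the point is only that independent, atomized parts can proceed at once and be resynthesized afterward.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical atomized sub-challenges (A), (B), (C): each can be
# worked on independently, then resynthesized into a whole.
def solve_a() -> str: return "A-solution"
def solve_b() -> str: return "B-solution"
def solve_c() -> str: return "C-solution"

# Parallel development: all three proceed at once; a slow part
# does not prevent the others from starting.
with ThreadPoolExecutor() as pool:
    futures = [pool.submit(f) for f in (solve_a, solve_b, solve_c)]
    whole = [f.result() for f in futures]  # resynthesize (A)-(C)

# Sequential development: (A) must finish before (B) can begin,
# and (B) before (C) -- vulnerable to weak links in the chain.
whole_sequential = [solve_a(), solve_b(), solve_c()]

print(whole)  # same whole either way; only the path differs
```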

Data science also abstracts a problem from its domain (e.g., genetics, space, or predictive toxicology) into the common denominator that unifies crowdsourcing communities: mathematics. This must be done with great care, given that each domain carries the tacit or implicit assumptions of its practitioners, which need to be restated as constraints on the abstracted problem. While this is demanding, the benefits of abstraction can far outweigh the hard work required to do it right. Instead of restricting competition to scientists who specialize in a single field, abstraction fosters new perspectives by engaging a much larger community with experience across domains. The paradigm shifts this strategy produces are the beauty of data science and systems decomposition.

At HMS, DEA produced precisely such a paradigm shift. Consider the statistics. The medical school wanted to minimize the time needed to compute the edit distance between genomic strings in order to increase the effectiveness of its work in immunogenomics. A previous attempt had processed 100,000 sequences in 15,622 seconds (260.4 minutes). HMS first looked inward and dedicated a full-time resource (salary: $120,000) to the challenge. That developer lowered processing time to 2,845 seconds (47.4 minutes), a significant (but still unsatisfactory) result. Harvard Catalyst, a university-backed clinical science center at HMS, wanted to see whether crowdsourcing could be applied within a traditional scientific community. The partners first atomized the problem to encourage community experts to self-select and tackle the questions where they held a comparative advantage. The challenge had to be stripped of highly domain-specific immunogenomic concepts to be attractive and accessible to participants outside the life sciences. See Karim R. Lakhani et al., Prize-based contests can provide solutions to computational biology problems, Nature Biotechnology 108, 109 (Feb. 2013).

By abstracting the problem of sequence alignment as one involving string comparison, the challenge became accessible to a much larger audience of contestants from various fields. HMS devised a scoring metric that supported the medical school's goals of greater accuracy and computational efficiency (speed). "That metric was revealed to the contestants and was the only measure used to determine awards." The HMS contest lasted two weeks and offered only $6,000 in prizes, with the top contestants receiving cash prizes of up to $500 each week.
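The string-comparison framing that opened the contest to non-biologists is, at heart, an edit-distance computation. As an illustrative sketch only (the actual contest metric and winning algorithms are not reproduced here), the classic dynamic-programming Levenshtein distance in Python:

```python
def edit_distance(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn string a into string b."""
    # prev[j] holds the distance between a[:i-1] and b[:j]
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i] + [0] * len(b)
        for j, cb in enumerate(b, start=1):
            curr[j] = min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            )
        prev = curr
    return prev[len(b)]

print(edit_distance("GATTACA", "GACTATA"))  # → 2
```

The quadratic-time baseline above is exactly the kind of starting point contest algorithms compete to beat through pruning, indexing, and hardware-aware tricks.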

The HMS edit-distance challenge attracted 733 participants, of whom 122 submitted algorithms. The submitters represented 69 countries. According to Karim Lakhani, a Harvard Business School professor who helped oversee the competition in his role as Principal Investigator at the Harvard-NASA Tournament Lab at Harvard's Institute for Quantitative Social Science, "none of the contestants were academic or industrial computational biologists, and only five described themselves as coming from R&D or the life sciences in any capacity." Id. Eighty-nine (89) distinct approaches were explored across the 122 submitted algorithms. It is difficult to conceive of any traditional research effort that could economically and easily muster an equivalent scale of effort on a specific problem in such a short duration (two weeks). HMS kept a narrow focus and asked the right question: how can we dramatically reduce the time it takes to compute the edit distance between genomic sequences? That focus never wavered, even though this was HMS's first use of crowdsourcing, and it paid off.

How did HMS fare in its first foray into crowdsourcing? The results were astounding. The winning solution ran in 16 seconds (976 times faster than the first attempt), and several solutions came "very close to the theoretical maximum for the dataset." Id. at 110. This extreme result represents a change not only in how HMS tackled this complex immunogenomics problem but also in the future of its innovation initiatives.
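The speedups quoted throughout this account follow directly from the reported timings; a quick check of the arithmetic:

```python
# Reported timings for processing 100,000 sequences, in seconds
baseline = 15_622   # first attempt (260.4 minutes)
in_house = 2_845    # dedicated full-time developer (47.4 minutes)
winner = 16         # winning crowdsourced solution

print(round(baseline / winner))        # → 976 (times faster than baseline)
print(round(baseline / in_house, 1))   # → 5.5 (the in-house gain, for contrast)
```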

Conclusion

We learn to ask the right questions at an early age. At an intersection, for example, a child might ask a parent, "Does red mean we have to stop, or just that we should slow down?" The answer, in that case, confirms which question was the valid one. Years later we ask questions about every aspect of our lives: jobs, finances, relationships, and so on. We hope to ask the right questions at the right time. In the business world, IBM's misguided question in the example above meant that an entire industry changed hands in a way it might not have. The implications for IBM were profound, just as they were for Xerox.

When we tackle complex scientific problems with data science, asking the right question at each step is critical to the process. Whether or not we do can make the difference between frustration and deep innovation. Aim carefully and with due consideration in order to sculpt the right question. You may not get a second chance.