Thursday, August 28, 2014

Information Economics: The Foundation of Business Analytics.

The technology industry often produces multiple, not-altogether-consistent definitions of the latest "hot" thing. Business analytics is no exception.  Confusion can be the result.

Hence, the intensity of my focus on definitions. Having previously defined "data science", I now drag readers through an exercise in defining business analytics.  At the risk of appearing obsessively compulsive, I repeatedly emphasize the business context.

The charter statement for this blog emphasizes the study of the application of data science to business problems.  I seek to apply the scientific method to its practice.
"Data and statistical methods" have become inseparably associated with the "how" of business analytics.  I want to dig deeper.  

Science is about answering "why." Since business is the domain of interest for business analytics, we should look to economics as a candidate foundational science. In the following, I:

  • Make the case for a foundation for business analytics in economics;
  • Introduce a specialized domain of economics on which business analytics is based; and
  • Provide a simplified illustration of its use.
Why, reader, should you care about these things? Big data and business analytics are subjects of many bold claims regarding their transformational abilities.  Some of these claims are valid, and some are not.  I seek here to help you separate the science from the alchemy.

What is business analytics?

Service oriented architecture (SOA), a source of significant tech-industry buzz during the last decade, provides a case study in definitions.  Distinct definitions appeared to arise for each stakeholder class. The Open Group, a non-profit organization promoting open standards for technology, offers two definitions of SOA.  Software vendors tend to emphasize the key technology components.  For SOA, those are an Enterprise Service Bus (ESB) and a services registry.

Merrifield, et al,¹ identified the business payoff for a SOA approach to strategic technology management. SOA promises a cost-effective approach to mass customization of information technology. They describe a SOA planning method. But they did not explicitly define the practice.

Enterprise architecture (EA) provides another example.  I like Gartner's definition for two reasons:
  • The veracity of its source (i.e., Gartner said it); and
  • Its focus on "enterprise" in the business sense, independent of the technology.
The technology community captured the EA term to connote the architecture of the IT infrastructure — for either an organizational enterprise or for an individual system. Technology consulting group Forrester Research invented a new term, Business Architecture, apparently in response. Ross and Weill² established the need for a business-centric definition.

Why this circuitous path?  I want to make the case for an economics foundation for business analytics. Getting definitions right is important to my case. Statistical (and deterministic) models, modeling tools, and enterprise information management technologies are "how" business analytics is done.  Science seeks to answer the question, "Why"?

The important point here is that we focus business analytics on answering questions that lead to measurable, net-positive business outcomes.  We seek a scientific underpinning from which to achieve this objective.

What, then, does this mean for the discipline of business analytics? I continue with the pattern of a business-centric perspective. I also want to define it as precisely as possible. Wikipedia, the reflexive "go to" source, offers a pretty good definition:
Business analytics (BA) refers to the skills, technologies, practices for continuous iterative exploration and investigation of past business performance to gain insight and drive business planning. Business analytics focuses on developing new insights and understanding of business performance based on data and statistical methods.
This definition borrows from Bartlett³ (whom I have yet to read, but have added to my Kindle wish list).  Davenport⁴ defines the payoff — "improving performance in key business domains" — without explicitly providing a definition.

So, business analytics is about quantitative characterization of business performance. What then is business about?  Those who have sat through an MBA program might observe that the majority of the curriculum is derived from economics and its applications. An economic foundation for business analytics therefore seems reasonable.

Hence, an economics grounding for business analytics.  I preserve here the distinction between business analytics and econometrics.  The two disciplines use many of the same tools. Business analytics focuses however on a distinct organization.  It arguably constitutes a subset of econometrics.


The foundation of business analytics

Information economics is business analytics in its most fundamental form. Information economics is the science of assigning economic value to information.  It combines principles from the following disciplines:
  • Game theory, by which economic transactions are defined and modeled;
  • Information theory, with bases both in engineering and psychology disciplines; and
  • Microeconomics.
Figure 1 illustrates. 

Figure 1: Information Economics resides at the intersection of four more familiar disciplines.

Practitioners of Info Economics employ a clearly defined toolset.
  • Information modeling,⁵ borrowed from information theory, precisely represents the distribution of elements of information among participants in an interaction;
  • Game theory contributes patterns for archetypical transactions between participants in an information exchange; and
  • Microeconomics provides bases for economic valuation of elements of information involved in a transaction between counterparties.
The presence of uncertainty in economic transactions introduces probability theory as well.

Information economics provides the foundation for many well known theories about the operation of financial markets.⁶  The interplay between bid-ask prices in a financial exchange, for example, telegraphs considerable information about counterparties' intentions and abilities without explicitly "showing their hands."  The Efficient Market Hypothesis finds partial justification in Info Economics.

Marketing economics is replete with examples.  Applications occur of course in other business disciplines.  For example, information economics can inform investment decision making.  Its principles can also guide what aspects of operational cost and efficiency are most worthy of measuring.

As an aside, the similarities between information economics and real options⁷ are striking. Real options theory assigns economic value to flexibility in making investment decisions.  I leave that discussion for a future installment.


A simple, contrived illustration of assigning economic value to information†

My illustration here is based on the "Elementary Game,"⁸ one of the simplest models from game theory. It resembles the Binary Symmetric Channel (BSC). I first saw the BSC in communications courses while studying electrical engineering.  Info economics and communications theory (the engineering variety) share roots from information theory. That their toolsets resemble each other does not surprise me.

Figure 2 — a BSC illustration — gives us a passable representation of the "Elementary Game." (Note:  Texts in communications theory⁹ and info economics share the "Alice" and "Bob" notation.) The a priori events appear on the left-hand side.  There is some probability of either of two events occurring.  "Alice" initiates one event or the other.  

"Bob" receives a "signal" indicating — with a probability p that it is correct — which event "Alice" effected.  He therefore views the event from an a posteriori perspective.  Based on knowledge of the a priori probability of what "Alice" did, the probability that the signal is correct, and the cost/benefit of either of two resulting courses of actions, "Bob" must decide what his optimum next step is.


Figure 2: The "Elementary Game" from game theory resembles the Binary Symmetric Channel (BSC), a basic building block of communication theory.  (Source: Wikipedia, http://en.wikipedia.org/wiki/Binary_symmetric_channel)
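
Bob's decision can be sketched in a few lines of Python. This is a minimal illustration of my own, not something taken from the cited texts. It assumes a symmetric signal that is correct with probability p whichever event Alice chose. The function names, the example prior, and the payoff table are all hypothetical.

```python
def posterior_given_signal(prior_a: float, p_correct: float) -> float:
    """Pr{Alice chose A | Bob's signal says A}, from Bayes' rule.

    Assumes a symmetric signal: it is correct with probability
    p_correct whichever event Alice actually chose.
    """
    says_a_and_is_a = prior_a * p_correct
    says_a_but_is_b = (1.0 - prior_a) * (1.0 - p_correct)
    total = says_a_and_is_a + says_a_but_is_b
    if total == 0.0:
        raise ValueError("a signal of A has zero probability under these inputs")
    return says_a_and_is_a / total


def best_action(posterior_a: float, payoffs: dict) -> tuple:
    """Pick the action with the highest expected payoff.

    payoffs maps an action name to (payoff if A, payoff if B).
    """
    expected = {
        action: posterior_a * if_a + (1.0 - posterior_a) * if_b
        for action, (if_a, if_b) in payoffs.items()
    }
    choice = max(expected, key=expected.get)
    return choice, expected[choice]


# Hypothetical inputs: both events equally likely a priori, and a
# signal that is correct 90% of the time.
posterior = posterior_given_signal(prior_a=0.5, p_correct=0.9)

# Hypothetical payoff table for Bob's two courses of action.
payoffs = {"act": (100.0, -50.0), "wait": (0.0, 0.0)}
choice, value = best_action(posterior, payoffs)
print(f"Pr(A | signal A) = {posterior:.2f}; best action: {choice} ({value:.1f})")
```

With these made-up numbers the signal lifts Bob's belief in event A from 50% to 90%, which is enough to make acting the better choice. A noisier signal or a harsher penalty for acting wrongly could reverse that.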

Let's see the "Elementary Game" in action. Say that I'm a BMW dealer. I operate in a market that generates 10,000 sales annually.  I capture an average of 1,000 of those sales — or a 10% market share. Sales produce an average of $50,000 in revenue.  I have traditionally used mass media — broadcast and newspapers — for my advertising.


Let's now say that I can identify decision factors — information — that influence buyers' decisions about whether and from whom to make a purchase of a new car in my market segment.  These factors might include:
  • Capacity to make the purchase (e.g., disposable income);
  • Brand preferences; and
  • Age of current vehicle;
among other factors. I can use this information as the basis for a targeted advertising campaign that — with probability of 25% — increases my market share to 15%. Assume that (in order to keep this example simple) the targeted ad campaign costs the same as the mass media campaign.

How much is this information worth?  Information economics defines the value of information as:  "...the increase in utility from receiving the information and from optimally reacting to it."⁸  So, without the information I can take a course of action leading to one outcome — a specific revenue level in our case.  Given the information, I can make a decision to pursue an alternative course of action.  This alternative leads to a different outcome. I characterize my two alternatives using the same measure.

The increase in utility in our example is the change in revenue realized from a targeted ad campaign. From elementary probability theory,

ΔRevenue = Pr{ΔSales} × ΔSales × Average revenue/sale
         = 25% × 500 × $50,000
         = $6,250,000.

Information that can, with probability 25%, increase my market share from 10% to 15% is worth about $6 million to me!  This trivially simple illustration demonstrates the power of Google's business model.
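
The same arithmetic can be written as a short Python sketch. The figures are the ones from the dealership example. The function and parameter names are my own labels, chosen only for illustration.

```python
def expected_revenue_gain(p_success, baseline_sales, targeted_sales, revenue_per_sale):
    """Expected change in revenue: Pr{change} x change in sales x revenue per sale."""
    delta_sales = targeted_sales - baseline_sales
    return p_success * delta_sales * revenue_per_sale


gain = expected_revenue_gain(
    p_success=0.25,           # probability the targeted campaign works
    baseline_sales=1_000,     # 10% of a 10,000-sale market
    targeted_sales=1_500,     # 15% of a 10,000-sale market
    revenue_per_sale=50_000,  # average revenue per sale, in dollars
)
print(f"Expected revenue gain: ${gain:,.0f}")  # Expected revenue gain: $6,250,000
```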

Information economics at work

I illustrated a scientific approach — based on Information Economics — to assigning value to two specific elements of information in a specific business context. These essential elements of information are:

  • What change in market share might I be able to effect with a targeted ad campaign; and
  • What is the probability that my targeted ad campaign will produce that result.
This gives me an economic value of those two information elements.  I apply business analytics — data and statistical methods, tools, data scientists — to obtain the answers to these specific questions.  If the cost of getting this information is less than its economic value, then applying business analytics here yields a net-positive economic benefit.

This illustrative example admittedly oversimplifies things. Business decision makers should base their decisions on a range of probabilities. Few business questions lead to discrete, binary answers.
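
A first step toward a range of probabilities is to repeat the calculation across several values and compare each result with the cost of obtaining the information. The sketch below does that in Python. The probability grid and the $1,000,000 cost figure are assumptions chosen purely for illustration. Only the 500-sale increase and the $50,000 average revenue come from the example above.

```python
DELTA_SALES = 500            # from the example: 1,500 sales versus 1,000
REVENUE_PER_SALE = 50_000.0  # from the example, in dollars
INFO_COST = 1_000_000.0      # hypothetical cost of obtaining the information


def expected_gain(p_success: float) -> float:
    """Expected revenue gain if the campaign succeeds with probability p_success."""
    return p_success * DELTA_SALES * REVENUE_PER_SALE


def break_even_probability(info_cost: float) -> float:
    """Success probability at which the expected gain just covers info_cost."""
    return info_cost / (DELTA_SALES * REVENUE_PER_SALE)


# Hypothetical grid of success probabilities to examine.
for p in (0.05, 0.10, 0.25, 0.50):
    gain = expected_gain(p)
    net = gain - INFO_COST
    print(f"p = {p:.2f}: expected gain ${gain:>12,.0f}, net of cost ${net:>12,.0f}")

print(f"Break-even probability: {break_even_probability(INFO_COST):.2%}")
```

Under these assumed numbers the analytics effort pays for itself whenever the chance of success exceeds the break-even probability. A fuller treatment would replace the grid with a probability distribution over outcomes.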

So, what does all this mean?  First, information economics provides a scientific approach to business case analyses for business analytics initiatives.  The value of business analytics is measured by its economic returns.  We now have a rigorous approach to determining the "goodness" of business analytics initiatives.

Second, this leads to criteria for a strategy for adoption of business analytics by organizations.  LaValle, et al, advise would-be data-driven organizations to, "Start with questions, not data!"¹⁰  Successful adopters of business analytics as a foundation for decision making keep a laser-like focus on:

  • Business outcomes; and 
  • The questions that lead to them. 
This also implies a gradual, evolutionary approach to adoption.  But more on that, later.

Next installment:  Is more data always better?


Note:  Missed my cadence last week.  A short-notice proposal turned a slack week into a frenetic one.  But back in the saddle again, this week.


¹ R. Merrifield, J. Calhoun, and D. Stevens, "The next revolution in productivity," Harvard Business Review, June 2008,  http://goo.gl/Y58xqm.
² J. W. Ross and P. Weill, Enterprise architecture as strategy, Boston:  HBR Press, 2006, http://goo.gl/B7J5P8.
³ R. Bartlett, A practitioner's guide to business analytics, McGraw-Hill, 2013, http://goo.gl/o6dTOS.
⁴ T. H. Davenport, J. G. Harris, and R. Morison, Analytics at work, Boston:  HBR Press, 2010, Location 112, Kindle Edition, http://goo.gl/olZkKm.
⁵ L. Samuelson, "Modeling of knowledge in economic analysis," Journal of Economic Literature, June 2004, pp. 367-402.
⁶ M. K. Brunnermeier, Asset pricing under asymmetric information, London:  Oxford, 2001, http://goo.gl/7IMFDv.
⁷ See, e.g., M. Amram and N. Kulatilaka, Real options, Boston: HBR Press, 1999, http://goo.gl/6Bswjk.
⁸ M. Bütler, Information Economics, New York:  Routledge, 2007, p. 42, Kindle Edition, http://goo.gl/1zZKQ1.
⁹ See, e.g., B. Schneier, Applied cryptography, 2nd ed, New York:  Wiley, 2001.
¹⁰ S. LaValle, et al, "Big data, analytics, and the path from insights to value," MIT Sloan Management Review, Winter 2011, pp. 21-31, http://goo.gl/8RSn5H.
† This example is purely fictional.  Any resemblance to experiences by actual BMW dealerships is purely coincidental.


© The Quant's Prism, 2014

Thursday, August 14, 2014

The "Science" of Data Science.

The inaugural installment of this journal described a broad vision related to the study of "data science." I attempt here to focus that ambition somewhat. Attempting to define the term "data science" is the objective of this discussion.  

Most importantly, I want to assert a point of view that emphasizes science. I also approach data science from the point of view of a business professional. I am interested in applications of data science to deepening insight into strategies and operations of businesses and other organizations.

As an aside, I found this task more difficult than expected. A recent NY Times op-ed piece asserts that "Writing Is a Risky, Humiliating Endeavor." My experience here may corroborate that author's point of view.

The challenge with defining "data science" is that it is intertwined with a number of related IT-industry terms including, but not limited to:
  • Business intelligence;
  • Analytics and business analytics; and
  • Big data.
A number of Internet tools attempt to measure the intensity of attention given to topics.  Google Trends reports relative rates of search-engine queries.  The graphic below reports the relative frequencies for our four terms of interest.  We see that the term analytics attracts the lion's share of the interest.

"Data science" is practically lost in the noise. The growth rate for "data-science" queries appears relatively steady. Seeing total-volume statistics would be interesting.


A number of definitions of "data science" have been attempted:
  • Strata's Mike Loukides published What is Data Science? through O'Reilly.  This meandering narrative covers technologies and anecdotal instances of their use. A definition of data science is not concisely presented.
  • Gartner performed a text-analytics study identifying the core competencies of a data scientist as data management, analytics modeling, and business analysis. Gartner extends this list to "soft" consultant-related skills.  We find here an indirect definition of data science — in terms of its primary practitioner.  
  • Wikipedia provides a definition represented as an intersection of computer science, applied mathematics, and subject-domain expertise for a specific field.
  • Forbes contributor Gil Press, in "A very short history of data science," observes 52 years of history of the use of the term.
Regrettably, no definition suitable to our purposes conspicuously presents itself.  I therefore presumptuously undertake to proffer my own. 


What is Data Science?

We seek a business-focused definition of data science.  We also emphasize the scientific aspect. Business analytics captures one aspect of data science.  Data science can moreover be applied to "big data."

My approach is inspired by a 2008 Gartner report, "Gartner Clarifies the Definition of the Term Enterprise Architecture."  I address each of the points in the outline used for Gartner's definition.

What it is:  Data science is the application of the scientific method to extract actionable insights from diverse sets of business information. 

What the scope is:  Data science applies systematic, reproducible methods to the entire information lifecycle, from information source to information consumer.  This includes mathematics, data visualization, information management, and business analysis.   The breadth of data science's span transcends the business domains of strategy, operations, finance, and logistics. 

What the result is:  The application of data science to business information leads to the best achievable information on which to base a specific business decision or action. Its results are organized and presented in a manner specifically designed for the information consumer.  They also contain indications of the degree of confidence that the consumer should assign to them.

What the benefit is:  Consumers of information produced by data science receive the best achievable, actionable information specific to high-priority business questions. These results are provided with the minimum expenditure of resources compared to less-scientific approaches. The scientific method leads to application of the most-appropriate mathematical and information-management methods to answer specific business questions given the available data.


How is Data Science related to the Scientific Method?

My preference for the term "data science" over "analytics" or "big data" is grounded in the prominence of science.  "Applying classic scientific methods to the practice of management¹" is one of the key promises offered by the movement encompassing data science, business analytics, and big data. "...The ultimate goal of data science is improving decision making, as this generally is of paramount interest to business.²" Improved business decision-making leads to:

  • Improved predictability of decision activities;
  • Reproducibility and transparency in the decision-making process; and
  • Precise separation of uncertainty into aspects that can be mitigated and those that cannot.
Data science — including big data, business analytics, and business intelligence — can never completely remove uncertainty from making decisions. It does separate resolvable uncertainties from those that remain — to varying degrees — "known unknowns."  

So how do business-focused data scientists apply the scientific method to business analytics?  Business analytics thought leaders describe high-level approaches in leading business journals.  

This list is by no means exhaustive.  These approaches are consistent with the analysis methods in which data scientists are indoctrinated during their education.

I turn to Pirsig⁵, an admittedly quirky source, for an accessible summary here.  Pirsig summarizes the scientific method in five steps:

  1. State the problem in terms that are no more than you are positive that you know;
  2. Formulate hypotheses about candidate causes for the problem;
  3. Design experiments to test each hypothesis in isolation;
  4. Interpret the experiment results in terms of whether the hypothesis is proven or refuted; and
  5. Update the candidate hypotheses and return to step 3.
This is similar to Davenport's six-step "procedure."

This rigorous, systematic approach is necessary and sufficient to achieve the fundamental objective of data science in business decision-making:  Isolating uncertainty factors into those that can be resolved and those that remain uncertain.  Cutting corners leaves residual uncertainty.

Practicing analysts who may have read this far may ask, "But, what about unsupervised learning?"  I will address unsupervised learning in depth in a later installment dedicated to the topic.  Suffice it to say, for now, that unsupervised learning provides a source of candidate hypotheses.  Each resultant candidate hypothesis must be scientifically tested as described above.

Next Installment:  The economics of information.


¹ "Big data: The next frontier for innovation, competition, and productivity," McKinsey Global Institute, McKinsey & Company, 2011, p. 98,  http://www.mckinsey.com/~/media/McKinsey/dotcom/Insights%20and%20pubs/MGI/Research/Technology%20and%20Innovation/Big%20Data/MGI_big_data_full_report.ashx.
² F. Provost, T. Fawcett, "Data science and its relationship to big data and data-driven decision making," Big Data, Mary Ann Liebert, Inc., February 13, 2013, p. BD 53, http://online.liebertpub.com/doi/pdfplus/10.1089/big.2013.1508.
³ LaValle, Steve, et al, Big Data, Analytics and the Path From Insights to Value, MIT Sloan Management Review, Winter 2011, http://sloanreview.mit.edu/article/big-data-analytics-and-the-path-from-insights-to-value/
⁴ T. Davenport, "Keeping up with your quants," Harvard Business Review, July-August 2013, http://hbr.org/2013/07/keep-up-with-your-quants/ar/1
⁵ Pirsig, R. M., Zen and the Art of Motorcycle Maintenance, Harper-Collins, 1974, (Kindle edition 2009), http://goo.gl/si1ayP



Thursday, August 7, 2014

Introduction: Motivation and Charter

Charter

I dedicate this journal to promulgating and testing ideas about mathematics and its role in society and the economy. The popular media seems to pay Mathematics much attention these days.  Imaginations about the potential of the closely related field of artificial intelligence have also been reignited.  IBM's Watson project makes Stanley Kubrick's 2001: A Space Odyssey HAL 9000 character seem almost within reach.

I use this venue to contribute my verse — to borrow from Whitman — to the data science cacophony. Installments will address topics including:
  • Consideration of the potential and limitations of data science in contributing decisively to business and public policy;
  • Summary and critical analysis of topics of current interest in the media; 
  • Discussion of approaches to applying data science to practical problems;
  • Review of noteworthy literature germane to data science; and
  • Illustrative results from my own computational tinkering.
Providing a dispassionate view of selected topics related to mathematics will be a successful outcome of this project.  

I intend to attempt weekly installments. One installment a month will be a quantitative case study. Openly available data sources provide rich troves of opportunities for curious analysts.  The first installments will use labor-economics data from the Bureau of Labor Statistics.

The remaining weekly installments will cover "softer" topics. The list of "soft" topics contains fifteen entries. The order is still under consideration.

Motivation

Why write about Mathematics and its role in society and the economy?  A couple of reasons come to front of mind:

  1. I love mathematics.  Insight into a nuance satisfies like few other things. Math represents to me The Spiritual; The Mystical; A Glimpse of the Divine.
  2. Math has suddenly become cool!  The popular media recently produced shows like Numb3rs.  Harvard Business Review called data science "The sexiest job of the 21st century."   Yes, we still have Big Bang Theory.  But it's a long way from Steve Urkel.
So how can somebody who feels this way about Mathematics provide a dispassionate view?  I can because I am the recipient of a scientific education. The scientific calling demands passionate dispassion.

My experience and education also equip me for dispassionate exploration.  For example:

  • I follow the pattern of a one-time consultant and mentor.  R.C. Hansen published a widely circulated paper about the fundamental limitations of his field, antennas.  The grip of this arcane, narrow discipline upon him was sufficiently powerful to drive him to edit or write the definitive reference in his field not once, but twice.  The first appeared in 1985 and the second in 1998.  (A copy of each collects dust on my bookshelf.)  This commitment sharpened his objectivity.
  • I view the limitations of mathematics as an attribute of its beauty. Applying mathematics to business and public policy is fundamentally grounded in information theory. Cover and Thomas open their Information Theory text with presentations of the Information Inequality and the Data processing Inequality.  These fundamentals constrain data science to the same extent that gravity and drag impose bounds on aerodynamics.
But the popular media appears to assign limitless potential to mathematical computation, with its power perpetually increasing due to the inevitable advance of Moore's Law.  The Gartner Group annually publishes a variety of "Hype Cycles."  Many within the IT industry will be familiar with this methodology.  A hype cycle represents a technology's lifecycle.  Similar methods appear elsewhere in the business literature (e.g., see Moore, HBR, July 2004).

Figure 1 shows Gartner's 2013 installment of its "omnibus" Hype Cycle for Emerging Technologies. (The entire volume is available for purchase at a price accessible to many medium-sized and large businesses.)  "Big Data" appears prominently at the "Peak of Inflated Expectations."  The term "Big Data" is frequently applied to the application of data science.


Figure 1: The Gartner Group's 2013 installment of "Hype Cycle for Emerging Technologies." Source:  http://goo.gl/a4xlEY.

What does this mean for Data Science?  It is presently the boom phase of the boom-bust cycle followed by many industries. Gartner forecasts that "Big Data" will reach the "Plateau of Productivity" stage of industry maturity in five to ten years.  Between now and then, many dreams will be smashed, hearts broken, and fortunes lost.  The illusion that any analytical problem can be solved given enough CPU clock cycles will be shattered.

How does one get from now to then?  Two obvious paths present themselves:
  • We can hop on the roller coaster, raise our arms above our heads, and scream at the top of our lungs; or
  • We can practice passionate dispassion.
My preference ought to be obvious by now.