Cedalion on the shoulders of the giant Orion (source: Wikimedia)

Enterprises today are working hard to embrace artificial intelligence (AI) to compete. The greatest challenge they face is that AI represents a true paradigm shift for how we solve problems using software and data. Not only must your organization acquire or build new skills, but it must also unlearn patterns that previously made it successful. This unlearning/learning challenge exists at all levels — for an experienced professional, for an organization, for a company. And if it’s difficult for an individual to adapt, it’s a million times harder for a company to adapt.

Based on my success driving a large-scale development tools transformation initiative at IBM in the mid-2010s, last year (2019) I got the opportunity to lead IBM’s AI for developers mission within the Watson group. It turned out that my lack of personal experience with AI was an asset, as I could experience my own learning journey and thus gain deep understanding and empathy for the developers and the companies whom I would be helping to accelerate.

This article shares what I learned in the first year with the goal of helping other developers and leaders of development organizations who might embark on a similar learning journey.

Drinking from a firehose

The absent-minded maestro was racing up New York’s Seventh Avenue to a rehearsal, when a stranger stopped him. “Pardon me,” he said, “can you tell me how to get to Carnegie Hall?”

“Yes,” answered the maestro breathlessly. “Practice!”
— E.E. Kenyon

The most important skill of a knowledge worker is the ability to learn. To succeed and thrive, you must be intentional in your approach to learning, focusing on consuming only quality information and doing so efficiently [1]. In my 20 years in the tech industry, I have developed my own approach to effective learning, which I describe briefly here.

I alternate between studying concepts and applying them in realistic but tractable practice settings, an approach grounded in Ericsson’s theory of deliberate practice [2]. When learning about a technology like AI, I work to understand how others have applied it (the problem space) and the cases in which it is superior to the alternatives, which here means traditional programming (the solution space). This is important when embracing a new technology because it helps ward off the golden hammer problem (if all you have is a hammer, everything looks like a nail) and helps you understand the art of the possible, thus avoiding magical thinking. Finally, beyond self-study and practice, my learning relies heavily on discussions with experts, which frankly is the greatest luxury and privilege of being an IBM Distinguished Engineer.

My own learning journey for AI turned out to be the most intense and difficult learning I’ve done since getting my computer science degree at Penn State. AI was just so different from traditional programming! It was so difficult to rewire my mind to think about software systems that improve from experience [3] vs. software systems that merely do the things you’ve told them to do. And the math and statistics—I thought I left those things behind in college with my dumb haircut!

When you are mired in a swamp of complexity, you need tractable conceptual frameworks that help you gain a footing. For my own AI learning journey, I found this in The AI Ladder.

The AI Ladder

All models are wrong, but some are useful.
— George Box

Early in my career I thought it was a bit uncool to work for a very old tech company. But as the years went by and as I saw both startups and established firms go out of business or get acquired and assimilated into obscurity, I came to gain deep pride in IBM’s adaptive capacity. If you think about it, the ultimate superpower of both humans and of human organizations is our ability to learn and adapt to changing circumstances, and a company cannot survive in the tech industry if it cannot reinvent itself every 10 to 20 years. IBM has reinvented itself many times over—it’s a core competency.

Through this lens, one may understand that another core competency of IBM is to help other companies adapt, which is very hard to do at scale. IBM does this holistically, with consulting, services, and technology. In the case of AI, IBM had developed a conceptual model to help enterprises reason about AI-based transformation called the AI Ladder [4].

The AI Ladder has four conceptual rungs: collect, organize, analyze, and infuse. The fuel of AI is data, but all enterprises have a massive data sprawl problem because of years of siloed IT work, the projects vs. products mentality, and acquisitions. In any given enterprise, you might have twenty databases and three data warehouses with redundant and different data about customers and customer relationships, and then you have the same problem for several hundred other data types (orders, employees, product information, etc.). IBM promoted the AI Ladder to help organizations (metaphorically) climb out of this morass, and we organized around it, with new learning offerings, new professional services, and an updated data and AI software portfolio, including significant updates and changes to mainstays like databases and analytics/reporting as well as brand new products in our AI portfolio.

IBM’s AI Ladder: Collect, Organize, Analyze, Infuse (source: IBM)

The most interesting rung for me was “infuse,” which deals with how a company fundamentally improves its user experiences, its capabilities, and its business processes by integrating trained machine learning (ML) models into production systems and designing feedback loops so that the models continue to improve from the experience of being used.

As an example, imagine that Blockbuster Video in the 1990s had a data science department (it probably did). Their head of retail could ask this data science department to analyze sales trends to inform the mix of movies displayed on shelves, by region, with updates to the basic model on a quarterly basis. This is certainly applied data science and may even make use of machine learning, but it is not infused. Now consider the Netflix recommendation system. To the user, it’s a similar grid of potential movies to watch, but behind the scenes there are sophisticated machine learning models personalizing not just the selection of movies but even the screen art, all with the performance goal of keeping you happily engaged inside the Netflix app. That’s infused AI, even though users don’t realize it.
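To make the contrast concrete, here is a toy sketch of the feedback loop that separates an infused model from a periodic report. Everything in it is hypothetical (the `FeedbackRecommender` class, the titles, the smoothing rule). It says nothing about how Netflix or any real recommender works; it only illustrates a system whose ranking changes as it is used.

```python
from collections import defaultdict


class FeedbackRecommender:
    """Toy recommender that re-ranks titles from usage feedback (hypothetical)."""

    def __init__(self, titles):
        self.titles = list(titles)
        self.shown = defaultdict(int)    # times each title was displayed
        self.clicked = defaultdict(int)  # times each title was chosen

    def score(self, title):
        # Smoothed click rate: unseen titles start at 0.5 rather than 0.
        return (self.clicked[title] + 1) / (self.shown[title] + 2)

    def recommend(self, k=3):
        # Highest score first; sorted() is stable, so ties keep their original order.
        return sorted(self.titles, key=self.score, reverse=True)[:k]

    def record(self, title, clicked):
        # The feedback loop: every impression updates the statistics.
        self.shown[title] += 1
        if clicked:
            self.clicked[title] += 1


rec = FeedbackRecommender(["Alien", "Big", "Clue"])
print(rec.recommend())              # ['Alien', 'Big', 'Clue'] (all scores tied)
rec.record("Alien", clicked=False)  # shown but ignored
rec.record("Clue", clicked=True)    # shown and chosen
print(rec.recommend())              # ['Clue', 'Big', 'Alien']
```

The quarterly Blockbuster-style analysis would correspond to computing the ranking once and shipping it; the infused version calls `record` on every interaction, so the next `recommend` already reflects it.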

Where does this leave developers? The skill sets for machine learning and software engineering are mostly non-overlapping, except for very fundamental things like “complex problem solving” and “programming.” The AI Ladder framework initially made me think that developers needed to wait for someone to collect, organize, and analyze their data, ultimately resulting in machine learning models that the developer could then integrate (infuse) into applications and business processes. While this is often the case for specialized models (like the Netflix recommendation system), it turns out that there is a class of problems where developers can skip right to infuse.

When my manager Beth Smith first told me this, I found it confusing—the ladder must be climbed! But as I continued my learning journey, I realized (unsurprisingly) Beth was completely right, and it’s grounded in some of the most fundamental principles of software engineering and architecture, which I understand very well.

Fundamentals

The entire history of software engineering is that of the rise in levels of abstraction.
— Grady Booch

In 1972, Canadian software engineering pioneer David Parnas wrote his paradigm-establishing paper “On the Criteria To Be Used in Decomposing Systems into Modules,” which popularized now-fundamental software engineering concepts like modularity, encapsulation, and information hiding. You can trace a straight line from the perceived benefits of microservices architectures back to Dr. Parnas’s groundbreaking paper.

Sometime between reading this paper many years ago and my 2019 study of AI, I had tacitly come to think of APIs as simply a mechanism to make your service more useful to other services and to aid in rapid composition of services into applications. I’d somehow forgotten that APIs are also a useful mechanism for making some difficult software implementation accessible to a broad audience [5]. As a simple example, think about this Google search query:

google.com/search?q=Watson

This simple URL, which you can paste into any web browser, encapsulates tremendously complex computing, Internet, web, information retrieval, and machine learning technology in which our industry has collectively invested literally hundreds of billions of dollars.
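As a small sketch of how narrow that interface is, the snippet below builds the same URL with Python's standard library and then reads back its only parameter. The `search_url` helper is hypothetical, and the `https://www.` prefix is an assumption added to make the URL complete. No request is sent.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def search_url(query):
    """Build the search URL from the text above for an arbitrary query string."""
    return "https://www.google.com/search?" + urlencode({"q": query})


url = search_url("Watson")
print(url)  # https://www.google.com/search?q=Watson

# The entire surface the caller sees is a single parameter, `q`.
print(parse_qs(urlsplit(url).query))  # {'q': ['Watson']}
```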

APIs are a special case of Parnas’s concept of information hiding in that they make three related assumptions [6]:

  • The API creator does not directly collaborate with the API consumers
  • There are many, typically heterogeneous, consumers
  • The interface must be designed for durability, as breaking changes are horrendously expensive for the community to absorb in aggregate
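A minimal sketch of the three assumptions in code, with every name hypothetical: the consumer is written against a documented contract, knows nothing about the implementation, and keeps working when the internals are replaced and the response grows a new field.

```python
from typing import Callable, Dict

# Public contract (hypothetical): text in, a dict with a "label" key out.
# Consumers depend only on this shape, never on how the label is produced.
Analyzer = Callable[[str], Dict[str, object]]


def _keyword_rule(text: str) -> str:
    """Version 1 internals: a single hand-written rule."""
    return "positive" if "great" in text.lower() else "neutral"


def _scored_rule(text: str) -> str:
    """Version 2 internals: a different implementation behind the same contract."""
    words = text.lower().split()
    score = sum(w in {"great", "good", "love"} for w in words)
    return "positive" if score > 0 else "neutral"


def analyze_v1(text: str) -> Dict[str, object]:
    return {"label": _keyword_rule(text)}


def analyze_v2(text: str) -> Dict[str, object]:
    # Additive change only: "label" keeps its meaning, a new key appears.
    return {"label": _scored_rule(text), "engine": "scored"}


def consumer(analyze: Analyzer, text: str) -> str:
    """A consumer written against v1; it reads only the documented key."""
    return str(analyze(text)["label"])


print(consumer(analyze_v1, "This is great"))  # positive
print(consumer(analyze_v2, "This is great"))  # positive
```

Renaming `label` or changing its meaning would be the kind of breaking change the third bullet warns about, because every unknown consumer would have to absorb it.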

This fundamental of software architecture also helps explain Beth’s statement and informs how we may — in some scenarios — jump to the top of the AI Ladder.

On the shoulders of giants

If I have seen further, it is by standing on the shoulders of giants.
— Isaac Newton

Academia and industry have been working towards today’s machine learning technology for many, many years and we as a civilization crossed some sort of threshold in the past five to ten years such that these technologies are now accessible to any hacker, startup, government, or enterprise.

While it is necessary for you to climb the AI Ladder for custom models (that is, those models where you collect the data, organize it, select the ML algorithm(s), and train the models [7]), it is also possible to encapsulate and thus outsource the lower rungs to external experts, in two scenarios:

  • Developer APIs
  • AI applications (not covered here)

Several years before I joined Watson, previous leaders had reasoned—correctly—that we could use the combination of APIs, pre-built ML models, and (optional) tooling to encapsulate the collect, organize, and analyze rungs of the AI ladder for several common ML domains including natural language understanding, conversations with virtual agents, visual recognition, speech, and enterprise search, to name a few.

Let’s use Watson’s Natural Language Understanding (NLU) as an example. Human language is incredibly rich and complex and, as a practical matter, impossible to understand using traditional programming. However, machine learning (especially the deep learning variety) is now very good at understanding many aspects of language including concepts, relationships between concepts, and emotional content, to name a few. We can explain this via analogy: a human child learns language through sensory input (hearing others speak), practice (trying words and phrases), error correction (a parent correcting wrong usage or pronunciation), and repetition (kids like to talk!). On the other hand, it would be ludicrous to teach a child to speak via vocabulary and grammar textbooks. We teach our NLU service to understand language in a conceptually similar way, though with (obviously) quite different mechanics. Finally, we make all of this capability and all of the many hundreds of person-years of research and development on machine learning-based natural language processing available to developers via an elegant API and supporting set of SDKs. You can see this API in action through this cool demo.
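To give a rough sense of what calling such a service looks like from the developer's side, the sketch below builds (but does not send) an HTTP request to an NLU-style analyze endpoint using only the standard library. Treat the details as assumptions to verify against the current Watson NLU API reference: the `/v1/analyze` path, the `version` date, the basic-auth scheme, and the feature names `concepts`, `relations`, and `emotion`. The helper name, service URL, and API key are placeholders.

```python
import base64
import json
from urllib.request import Request

# Assumption: a version date accepted by the service; check the API reference.
API_VERSION = "2022-04-07"


def build_analyze_request(service_url, api_key, text):
    """Build (but do not send) a POST request asking for three kinds of analysis."""
    body = {
        "text": text,
        # One entry per capability mentioned above; empty dicts mean "defaults".
        "features": {"concepts": {}, "relations": {}, "emotion": {}},
    }
    token = base64.b64encode(f"apikey:{api_key}".encode()).decode()
    return Request(
        url=f"{service_url}/v1/analyze?version={API_VERSION}",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )


# Placeholder values; a real call needs your own service URL and API key.
req = build_analyze_request(
    "https://example.invalid/nlu",
    "YOUR_API_KEY",
    "I love standing on the shoulders of giants.",
)
print(req.get_method(), req.full_url)
print(json.loads(req.data))
```

The point is the size of the request: a sentence of text and a short list of features. In practice the official SDKs wrap this plumbing, so application code is shorter still.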

Thus developers can today begin leveraging certain types of AI in their applications, even if they lack any formal training in data science or machine learning. It doesn’t entirely eliminate the AI learning curve—you still need to get your head around things like probabilistic systems, how to integrate error detection and correction to improve the underlying models, and simply how to make use of data types like language and images which heretofore were inaccessible to you—but this is a far gentler learning curve than starting from first principles (let’s talk about linear algebra!). Also, from an organizational perspective, it means that leaders can execute a bimodal adoption strategy: build up an internal data science / ML capability and start climbing the AI ladder for business- or industry-specific data [8], while simultaneously getting your current application developers started today, via the APIs and SDKs, especially now that Watson is available anywhere, on any cloud and on premises.
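One way to get a feel for working with a probabilistic system is to treat every prediction as a label plus a confidence score, act automatically only above a threshold, and capture human corrections for the rest. The sketch below is illustrative only: the `Prediction` and `Router` types, the 0.8 threshold, and the example labels are all hypothetical and are not part of any Watson API.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Prediction:
    label: str
    confidence: float  # the model's score in [0, 1], not a guarantee


@dataclass
class Router:
    """Accept confident predictions; queue the rest for human correction."""

    threshold: float = 0.8  # hypothetical cut-off; tune per application
    review_queue: List[Tuple[str, Prediction]] = field(default_factory=list)
    corrections: List[Tuple[str, str]] = field(default_factory=list)

    def handle(self, text: str, pred: Prediction) -> str:
        if pred.confidence >= self.threshold:
            return pred.label
        self.review_queue.append((text, pred))
        return "needs_review"

    def correct(self, text: str, true_label: str) -> None:
        # Corrected examples can later be fed back to improve the model.
        self.corrections.append((text, true_label))


router = Router()
print(router.handle("Refund my order", Prediction("billing", 0.93)))  # billing
print(router.handle("It broke again", Prediction("billing", 0.41)))   # needs_review
router.correct("It broke again", "support")
print(router.corrections)  # [('It broke again', 'support')]
```

Traditional code rarely needs a branch for "the function might be wrong"; here that branch is part of the design.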

So with good AI APIs we can raise the level of abstraction such that developers who lack a machine learning background can start leveraging AI today. They are so simple to use that it’s easy to overlook the power and the science behind them. When you write a line of code that calls an AI API, you’ve skipped right to the top of the AI Ladder. It almost feels like cheating! But you haven’t cheated; you’re standing on the shoulders of the giants of the field [9] whose research, insights, persistence, and genius brought us to this point.

So then, what will you do?

Footnotes

[1] Deep learning pioneer and Turing Award winner Yann LeCun recently shared an interesting hypothesis comparing human learning to machine learning:

It is more efficient for evolution to specify the behavior of an intelligent organism by encoding an objective to be optimized by learning than by directly encoding a behavior. The price is learning time.

The reason for this is also why it’s more efficient for human engineers to build AI systems through machine learning than through direct programming. The price is training data.

[2] For a great primer on deliberate practice, read Morten Hansen’s book Great at Work, chapter 4, “Don’t Just Learn, Loop.”

[3] Carnegie Mellon professor Tom Mitchell provided the following popular definition of machine learning:

A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P if its performance at tasks in T, as measured by P, improves with experience E.
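A toy illustration of the definition, with all names hypothetical: the task T is deciding whether a number is at or above a hidden threshold, the experience E is a list of labelled examples, and the performance measure P is accuracy on a fixed grid of test points. The learner is deliberately simple and is not meant to represent any real ML algorithm.

```python
import random

TRUE_THRESHOLD = 0.7  # hidden rule the learner must discover


def label(x):
    return x >= TRUE_THRESHOLD


def learn(examples):
    """Experience E -> a learned threshold (midpoint of the closest examples)."""
    below = [x for x, y in examples if not y]
    above = [x for x, y in examples if y]
    lo = max(below) if below else 0.0
    hi = min(above) if above else 1.0
    return (lo + hi) / 2


def performance(threshold):
    """Performance measure P: accuracy on a fixed grid of test points (task T)."""
    grid = [i / 100 for i in range(101)]
    return sum((x >= threshold) == label(x) for x in grid) / len(grid)


rng = random.Random(0)
experience = [(x, label(x)) for x in (rng.random() for _ in range(1000))]
for n in (0, 10, 1000):
    # Accuracy after seeing the first n examples.
    print(n, round(performance(learn(experience[:n])), 3))
```

With no experience the learner falls back to a guess of 0.5 and misclassifies the grid points between 0.5 and 0.7. As examples accumulate, the learned threshold closes in on the true one and accuracy tends to rise, although not necessarily at every single step.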

[4] For a fuller exploration of the AI Ladder, Rob Thomas, IBM’s SVP of Cloud and Data, wrote both a short O’Reilly Radar report and a full book, both of which are available for free download.

[5] Subsequent to originally publishing this article, I had a thought-provoking conversation on LinkedIn regarding accessibility with Frances West, former IBM Chief Accessibility Officer and author of Authentic Inclusion. Prior to meeting Frances in the mid-2010s, I—like many people—had the misconception that accessibility was only about making technology usable by people with physical disabilities. While that’s certainly a critical aspect of it, Frances taught me a broader view, which she reinforced in our conversation:

Accessibility to me has never been just about disability. It’s about extreme personalization and recognizing the human first in an increasingly tech driven, tech dominate world. As technologists, it’s our responsibility to respect human differences and make technology work for all, especially foundational technologies such as AI.

Through this lens, it made me realize that APIs are fundamentally about accessibility, which was a revelation (thank you Frances! 🙌🏻).

[6] The purpose and nature of APIs remind me of something I once read about writing books, perhaps by Stephen King (?), in that they connect author and reader across time and space, and the author must attempt to imagine how the reader will interpret the prose, while accepting and embracing that the reader may interpret the prose in ways never imagined by the author. Similarly, API designers must try to imagine all the ways their API might be used while accepting and embracing the fact that developers will use it in ways never imagined by the API designer.

[7] IBM’s Watson Studio supports building custom ML models from the ground up, for any sort of data.

[8] If you’re a developer who wants to go all-in and try your hand at applied machine learning, I recommend Andrew Ng’s famous Coursera course and the excellent book Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow, 2nd Ed. by Aurélien Géron. On the latter, the great Tim O’Reilly gives it his highest endorsement.

[9] Turing, McCarthy, Rosenblatt, Minsky, Jordan, Thrun, Ferrucci, Hinton, Bengio, LeCun, and Li, to name a few. For deeper historical narratives on the development of AI and ML, I recommend John Markoff’s Machines of Loving Grace and Sean Gerrish’s How Smart Machines Think. Another excellent resource is Architects of Intelligence, where futurist Martin Ford interviews twenty-four of the leading AI pioneers of the past several decades, both for historical perspective and for speculation about where we might be going.

Acknowledgments

Many thanks to the following dear IBM and industry colleagues for reading and providing feedback on earlier versions of this article: Allie Miller, Barry O’Reilly, Chunhui Higgins, Dallas Hudgens, Erik Didriksen, Grady Booch, Katelyn Rothney, Lindsay Wershaw, Rachael Morin, Rick Gebhardt, and Robyn Johnson.

A special thanks to dear friend and long-time mentor Kyle Brown who, after reading the first draft, explained to me what I was actually trying to say. ☺️

A special thanks to Watson API architect Jeff Stylos, who patiently explained the ideas described in the “On the shoulders of giants” section to me, many times, until I finally got it. Thanks for your patience Jeff and your dedication to your craft. A similar special thank you to Watson NLU senior manager Olivia Buzek for helping me (slowly) get my head around deeper machine learning concepts. 🙇🏻‍♂️

My deepest thanks to Beth Smith, Rob Thomas, Daniel Hernandez, and Arvind Krishna for believing in me as a leader, helping me to understand our strategy, and pushing me and trusting me to contribute to it.

Bill Higgins is an IBM Distinguished Engineer based in Raleigh, North Carolina, USA. The above article is personal and does not necessarily represent IBM’s positions, strategies or opinions.
