Toyota Ventures’ Chris Abshire: Understanding Generative AI and Where It’s Headed
ABSTRACT
KEY POINTS FROM CHRIS ABSHIRE'S POV
Why is Generative AI such an important category moving forward?
- The conditions are set for a Cambrian Explosion in Generative AI applications. Foundation Models, trained on massive volumes of raw web-scale multi-modal datasets, are adaptable to many types of tasks, including AI-generated language and vision. In the last year, companies like Stability AI, OpenAI, Midjourney, and others have made versions of their Foundation Models available for commercial use. “The leading companies have positioned themselves as sector-agnostic providers,” says Abshire. “They built these general-purpose language and text-to-image models, and made them available via APIs. They’ve left it up to the AI community, businesses, and consumers to discover use cases.”
- We’re likely to see many more technological inflection points that quickly expand what’s possible. Current capabilities will be quickly superseded. “Lots of people get stuck on text-to-image generation, thinking it perhaps does not have much real business value,” says Abshire. “And that may be true — there may not be a big market there. But this attitude is too short-sighted. It misses that we may be just a few years away from creating high-quality long-form videos through the same process. People don't always think far enough out in terms of future applications of breakthrough technologies.”
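The API access Abshire describes boils down to sending a text prompt and a handful of generation parameters to a hosted model. As a minimal sketch, here is how a request body for an OpenAI-style completions endpoint might be assembled; the model name and parameter values are illustrative and should be checked against the provider’s current documentation.

```python
import json

# OpenAI's text-completion REST endpoint (illustrative; see provider docs).
API_URL = "https://api.openai.com/v1/completions"

def build_completion_request(prompt: str, model: str = "text-davinci-002",
                             max_tokens: int = 256) -> dict:
    """Assemble the JSON body for a text-completion call.

    The model identifier and defaults here are examples from the GPT-3
    family; they are not guaranteed to match current offerings.
    """
    return {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": 0.7,  # higher values produce more varied output
    }

payload = build_completion_request(
    "Write a two-sentence product description for a solar lantern.")
print(json.dumps(payload, indent=2))
# Actually sending the request requires an "Authorization: Bearer <API key>"
# header, e.g. via urllib.request or the provider's official client library.
```

The point is that the barrier to entry for a downstream product is a prompt and an HTTP call, which is why Abshire expects use-case discovery to happen broadly across the AI community.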
What are some top applications that might be attached to this category?
- Various text applications are already seeing traction in business-to-business areas, with products that package Generative AI to augment functions or workflows within organizations. One example is content marketing: “There already are tools out there that help users create a full blog post by providing just a short prompt, and the results are impressive,” says Abshire. Productivity is another area ripe for Generative AI: “The AI could summarize Zoom meetings or even emails in your inbox and then generate responses to action items or requests,” he adds.
- Three categories are further out but worth watching, given the potential:
- Generative coding: “Imagine an AI that can build an app based on inputs that describe how it will be used,” says Abshire. “We see this as a giant opportunity, and are tracking the efforts mainly led by the Microsoft-OpenAI partnership, which has spawned a lot of innovations in code generation through GitHub and VS Code.”
- Text-to-video: “Even without the ability to produce a full-length video, a marketing team would see a lot of value if they could easily produce short videos and try out concepts in a brainstorming session.”
- Text-to-object creation with 3D printing: “You can pair Generative AI with 3D printers, and literally use text prompts to conjure up objects from your imagination.”
- Generative Simulators will allow business users to create and manipulate hi-fi replicas of real-world objects and environments, which could be helpful in industrial settings, but also in entertainment for gaming and film. “Today’s Generative AI platforms may help you produce really cool and fun art, but what organizations need is digital models and simulations of the real world that can be precisely controlled,” says Abshire. “There are a few companies now building what are called Generative Simulators. Think Matrix-type worlds. You have a replica of a slice of the world at a high resolution and then you can type in whatever you want to pop up in that simulated environment. For industrial settings, you can use these simulations to train robots in object manipulation. In gaming, you could generate a Minecraft or Roblox world many times faster than a human could.”
What are some potential roadblocks?
- Generative AI still faces high tech barriers in the more complex tasks like generating video or code. “There’s a lot of technology friction that is holding up certain applications,” says Abshire. “Video is just a lot more compute intensive than image generation. The latest examples of text-to-video are still pretty basic. There’s a lot more progress that has to happen for this to be done well. A world with AI-generated 3D films is still very far out. Generative coding is starting to get a lot of attention and we think it’s a massive opportunity.”
- Fear of job losses might hinder adoption if careful messaging doesn’t defuse this issue. “People might be worried about losing their jobs, but this technology will likely serve to augment work, at least in the near term. It’s making human jobs easier, not eliminating them.”
It may be true that text-to-image generation may not have real business value. But this attitude misses that we may be just a few years away from creating high-quality long-form videos through the same process. People don't always think far enough out in terms of future applications of breakthrough technologies.
Chris Abshire ~quoteblock
VISUAL: APPLICATIONS UP FOR GRABS in GENERATIVE AI
IN THE INVESTOR'S OWN WORDS
Foundation Models represent a new approach for building AI systems. They set the stage for the current wave of innovation and new techniques are constantly emerging to make these models even more powerful.
It all started at Stanford where they were trying to create a base “foundation model” that could understand the English language and work with text. They wanted to make an AI that could write a book, or chatbots that could answer a large range of questions.
The first well-known Foundation Model was GPT-3, a large language model, or LLM, released in 2020. But the big breakthrough came in late 2021, when folks started working with a lot of the tech giants’ compute resources, leveraging AWS and Microsoft Azure to basically train these Foundation Models on all the internet’s raw data.
The models leverage important AI techniques called self-supervised learning and transfer learning. With these, the models are able to label unstructured data on their own, and they can constantly apply learnings from one context to another.
What makes Foundation Models especially powerful and valuable is that certain properties can appear that a human would never anticipate. For example, a model trained on a large dataset might be able to tell stories on its own or be able to do arithmetic, without the AI having been specifically programmed to do that.
In Generative AI, we’re especially excited for applications in perception and training, marketing, and media and entertainment. But we’re all still in the exploratory phase figuring out where the biggest opportunities lie. There’s a lot of unknowns as we’re still very early in understanding where the most business value will be created and captured.
MORE Q&A
Q: How should companies think about competing when some of the training datasets and base Foundation Models are widely available?
A: Most generative AI startups have a slide deck and a copy of the Stable Diffusion weights, but don't know how to go beyond that. The top companies are coming up with creative ways to scrape data and form partnerships with the tech giants whose compute resources are critical given how expensive it is to train on web-scale data. Folks that have proprietary data have an advantage because they can fine-tune off-the-shelf models for specific applications. For example, a company that generates videos could do the best video generation if they were able to scrape YouTube and get all of that video data to train their models on. Of course, there’s not enough computation power on Earth to train on all of YouTube’s video data. But thinking by analogy, that's the kind of advantage companies will seek.
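The fine-tuning advantage Abshire describes can be caricatured in a few lines. Real fine-tuning updates a neural network’s weights on domain data; the sketch below instead interpolates a general-purpose word distribution with one estimated from proprietary text, just to show how domain data shifts a model toward its niche.

```python
from collections import Counter

def unigram_probs(text: str) -> dict:
    """Estimate word probabilities from a text sample."""
    counts = Counter(text.lower().split())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def fine_tune(base: dict, domain_text: str, weight: float = 0.5) -> dict:
    """Blend a 'pretrained' general distribution with one from
    proprietary domain data. This interpolation is only an analogy for
    gradient-based fine-tuning of neural-network weights."""
    domain = unigram_probs(domain_text)
    vocab = set(base) | set(domain)
    return {w: (1 - weight) * base.get(w, 0.0) + weight * domain.get(w, 0.0)
            for w in vocab}

general = unigram_probs("the weather is nice and the food is good")
tuned = fine_tune(general, "torque converter torque sensor torque curve")
# Domain jargon like "torque" now carries probability mass it lacked before.
print(tuned["torque"] > general.get("torque", 0.0))  # True
```

A startup with proprietary data is, in effect, the only one who can perform this blending step for its vertical, which is the moat Abshire points to.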
Q: How do you see the dynamic playing out between vertical-specific applications versus companies like Open AI building out core technology?
A: On the whole, the larger successes and revenue opportunities will tend to come from verticalized applications, rather than the underlying engines. When it comes to Foundation Models and the core horizontal technology, what people don’t realize is how much goes into getting these models trained on data and then getting them good enough to be useful. Most people can’t do it themselves because they don’t have the money to pay for the computing resources and they don’t have a clever way of getting the right type of data needed. The few well-known AI companies like Stability AI were built on the shoulders of giants, and they did most of the hard work before ever fundraising…which is refreshing to see in the fundraising landscape.
Q: What is a misconception people commonly have about this space?
A: A big misconception is that Foundation Models can only be used for generating digital art. Foundation Models are generative models, but they deal with different modalities, which impacts the applications they could be useful for.
Large text-to-image models like DALL-E are used mostly for digital art currently, but might disrupt other visual industries (illustrations, stock photos, or even commercials). Because CLIP is open source, it is also becoming a fundamental building block in many vision systems.
LLMs like GPT-3 are more mature (1 year in AI feels like 10 years in other industries), so they have many more applications: anywhere text is the interaction medium or product. Well-known successful ones are Copy.ai and Jasper.ai, and many more are building on the GPT-3 API or their own clones, such as Adept.ai, which has cool demos of a web-browsing virtual assistant. A huge revolution is also brewing in coding (code is just text!). There is also a nascent trend of LLMs in robotics (language as interface, for instance for task specification).
WHAT ELSE TO WATCH FOR
- Generative AI is not all of the same quality, and access to the most powerful models and tools will increasingly be determined by the market. There will be a natural stratification in this category. Companies will begin to experiment with pricing, and hold back some of their models for internal use, or for special sets of clients. This is already visible on many text-to-image platforms, where API access is priced depending on models’ relative power. “Some of the content that you see online isn’t a great representation of what's currently possible,” says Abshire. “It depends on what you’re doing and paying for. Most users don’t have access to the best in-house models.”
- Watch for new jobs such as Query Engineers. “This technology has created an entirely new job called Query Engineering,” says Abshire. “The role involves trying to figure out which keywords and descriptions will output the desired image, video, or 3D model. This is a lot harder than one might think.”
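One systematic way to do the query engineering Abshire describes is to enumerate combinations of subject, style, and quality modifiers and evaluate the generated results side by side. A minimal sketch; the phrasing categories are illustrative, not a documented prompt grammar for any particular model.

```python
from itertools import product

def build_prompts(subject: str, styles, modifiers):
    """Generate every style/modifier combination for a subject -- a
    brute-force way to explore which phrasing a text-to-image model
    responds to best."""
    return [f"{subject}, {style}, {mod}"
            for style, mod in product(styles, modifiers)]

prompts = build_prompts(
    "a lighthouse at dusk",
    styles=["oil painting", "studio photograph"],
    modifiers=["highly detailed", "soft lighting"],
)
for p in prompts:
    print(p)
# 2 styles x 2 modifiers -> 4 candidate prompts to evaluate side by side
```

The craft of the role lies in choosing which dimensions to vary and judging the outputs; the enumeration itself is the easy part.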
STARTUPS MENTIONED IN THIS BRIEF
Acknowledgements
Special thanks to Adrien Gaidon, Head of ML at Toyota Research Institute, for his insights in the rapidly evolving Foundation Model space.
Editor's note: The interview for this article was conducted in late October 2022, before the release of GPT-3.5 and other developments in Generative AI.