SpaceX founder Elon Musk watches a post-launch press conference after the SpaceX Falcon 9 rocket with the Crew Dragon spacecraft takes off on an unmanned test flight to the International Space Station from the Kennedy Space Center in Cape Canaveral, Florida, March 2, 2019.
Mike Blake | Reuters
Avocado-shaped armchairs and baby daikon radishes wearing tutus are among the quirky images created by a new piece of software from OpenAI, an Elon Musk-backed artificial intelligence lab in San Francisco.
OpenAI has trained the software, known as Dall-E, to generate images from short text captions. The model has 12 billion parameters and was trained on a dataset of images and their captions found on the internet.
The lab said Dall-E, a portmanteau of Spanish surrealist artist Salvador Dalí and Wall-E, the small animated robot from the Pixar movie of the same name, has learned how to create images for a wide variety of concepts.
OpenAI showed some of the results in a blog post published Tuesday. “We’ve discovered it [Dall-E] has several capabilities, including creating anthropomorphic versions of animals and objects, combining unrelated concepts in plausible ways, displaying text, and applying transformations to existing images,” the company wrote.
Dall-E is built on a neural network, a computing system loosely inspired by the human brain that can spot patterns and identify relationships in vast amounts of data.
Neural networks have generated images and videos before, but Dall-E is unusual in that it relies on text input, whereas most earlier systems do not.
Synthetic videos and images have become so sophisticated in recent years that it has become difficult for humans to distinguish between what is real and what is computer generated. Generative adversarial networks (GANs), which pit two neural networks against each other, have been used, for example, to create fake videos of politicians.
OpenAI acknowledged that Dall-E has the “potential for significant, broad societal impacts,” adding that it plans to analyze how models such as Dall-E relate to societal issues, including the economic impact on certain work processes and professions, the potential for bias in the model’s outputs and the longer-term ethical challenges that this technology implies.
GPT-3 successor
Dall-E comes just months after OpenAI announced it had built a text generator called GPT-3 (Generative Pre-trained Transformer 3), which is also underpinned by a neural network.
The language-generation tool is capable of producing human-like text on demand, and it became relatively famous for an AI program when people realized it could write its own poetry, news articles and short stories.
“Dall-E is a Text2Image system based on GPT-3 but trained on text and images,” Mark Riedl, associate professor at the Georgia Tech School of Interactive Computing, told CNBC.
“Text2Image isn’t new, but the Dall-E demo is remarkable for producing illustrations that are much more coherent than other Text2Image systems I’ve seen over the years.”
OpenAI competes with companies like DeepMind and the Facebook AI Research group to build general purpose algorithms that can perform a wide variety of tasks at the human level and beyond.
Researchers have built AIs that can play complex games such as chess and the Chinese board game Go, translate one human language into another, and recognize tumors in a mammogram. But getting an AI system to show true “creativity” is a major challenge in the industry.
Riedl said Dall-E’s results show that it has learned how to coherently combine concepts, adding that “the ability to coherently combine concepts is considered a key form of creativity in humans.”
“This is a major step forward from a creativity point of view,” Riedl added. “While there is not much agreement on what it means for an AI system to ‘understand’ something, the ability to use concepts in new ways is an important part of creativity and intelligence.”
Neil Lawrence, the former director of machine learning at Amazon Cambridge, told CNBC that Dall-E looks “very impressive.”
Lawrence, who is now a professor of machine learning at the University of Cambridge, described it as “an inspiring demonstration of the ability of these models to store and generalize information about our world in ways that people find very natural.”
He said: “I expect there will be all kinds of applications of this kind of technology, I can’t even begin to imagine. But it’s also interesting in terms of being another pretty amazing technology that is solving problems we didn’t even know we actually had.”
‘Does not advance the state of AI’
However, not everyone is that impressed with Dall-E.
Gary Marcus, an entrepreneur who sold a machine learning start-up to Uber in 2016 for an undisclosed amount, told CNBC that while Dall-E is interesting, it “does not advance the state of AI.”
He also pointed out that Dall-E has not been open-sourced and that the company has not yet published an academic paper on the research.
Marcus has previously questioned whether some of the research published in recent years by rival lab DeepMind should be classified as ‘breakthroughs’.
OpenAI was founded as a nonprofit with a $1 billion pledge from a group of founders including Tesla CEO Elon Musk. In February 2018, Musk left the board of OpenAI, but he continues to donate to and advise the organization.
OpenAI became a for-profit company in 2019 and raised a further $1 billion from Microsoft to fund its research. GPT-3 will be OpenAI’s first commercial product, and Reddit has signed up as one of the first customers.