Google Brain’s Andrew Ng says data-centric approach boosts AI success
Written by Ben Wodecki and republished with permission from AI Business.
The Google Brain and Coursera co-founder also says vertical platforms are key to greater AI adoption.
Andrew Ng, the revered AI expert and co-founder of Google Brain, said developers should adopt a data-centric AI approach to unlock the full potential of AI.
Speaking during a keynote at ScaleUp:AI ’22, Ng referred to a McKinsey report that suggested AI would add $13 trillion to the global economy by 2030. However, he said that outside the sectors served by consumer internet companies, AI adoption was “very low.”
“Despite the rapid progress of AI in the last decade, it won’t reach its full potential until it’s available to everyone – which is a long way off,” he said.
One reason is that companies still struggle to see a meaningful return on investment from their AI projects. Citing an MIT/BCG study, Ng said only 10% of large companies have seen a “significant” payoff.
He said the key to increasing adoption is to make AI produce better outcomes through a data-centric approach. For decades, data scientists used a model-centric approach that focused on improving the algorithms or code applied to a dataset.
Today, these algorithms have been pretty much fine-tuned. Meanwhile, the available data has exploded and has become more unwieldy. A data-centric approach works on curating high-quality, relevant datasets by systematically engineering the data used to build an AI system.
Not enough right data
However, companies face two common barriers when looking to adopt AI: small datasets and customization issues.
Ng told the audience that outside the consumer tech industry, most sectors do not have enough data to build an effective dataset. While some larger organizations may have 100 million data points, Ng recalled conversations with manufacturing firms, the vast majority of which had 50 or fewer images of each defect.
However, a data-centric approach can work with even small datasets since it results in high-quality data, he said.
The second adoption barrier Ng touched on was that many industries require more customization for potential deployments.
The AI community has done well in completing projects worth hundreds of millions or billions of dollars. “But a large number of $1 million to $5 million projects are sitting there because we don’t know how to value them,” he said.
To solve the customization issue, Ng said AI developers need to build vertical platforms that would enable the end customer’s IT team to create the custom AI system they need.
The data-centric way
In a data-centric approach, subject matter experts curate and engineer the data – to ensure that the right data is being used to train the AI model. This makes all the difference.
Ng cited an example in which a company wanted to use computer vision to inspect steel sheets. The company’s baseline accuracy was 76.2%. After an AI team came in and spent two months on a model-centric approach, accuracy had not improved.
Ng sent in a team of his engineers, who used a data-centric approach to raise accuracy by 16.9 percentage points, to 93.1%. “Our approach took two weeks to get to a much higher level of accuracy,” he recalled.
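The 16.9 figure is the gap between the two accuracy numbers, so it is a gain in percentage points and not a relative increase. A minimal sketch of the arithmetic follows. Only the 76.2% and 93.1% figures come from Ng's example; the function name and the relative-gain and error-reduction calculations are illustrative additions, not numbers Ng quoted.

```python
def accuracy_gains(baseline: float, final: float) -> dict:
    """Compare two accuracy figures, both given in percent."""
    point_gain = final - baseline                # absolute gain, in percentage points
    relative_gain = point_gain / baseline * 100  # gain as a share of the baseline accuracy
    # Share of the baseline error rate (100 - baseline) that was eliminated.
    error_reduction = point_gain / (100 - baseline) * 100
    return {
        "point_gain": round(point_gain, 1),
        "relative_gain": round(relative_gain, 1),
        "error_reduction": round(error_reduction, 1),
    }


# Figures from the steel-sheet inspection example.
gains = accuracy_gains(baseline=76.2, final=93.1)
print(gains)
```

Read this way, the same result can be described as a gain of 16.9 points, a relative improvement of roughly 22%, or the removal of about 71% of the baseline errors.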
“The world needs a lot more vertical platforms to address [various] issues,” he said, adding that data-centric AI platforms ensure “consistently high-quality data through all stages of an AI project.”
“Even for good engineers, projects could take 12 months. Now they can take one month.”
“By addressing small data and customization problems, it is key to democratizing access to AI,” Ng concluded. “Democratizing AI benefits everyone. This community holds the key to unlocking this next era of AI.”