TinyML will become the largest driver of the microcontroller market in the next 10 years, according to Remi El-Ouazzane, the president of STMicroelectronics’ microcontrollers and digital ICs group.
“I really believe this is the beginning of a tsunami wave,” he told EE Times in an exclusive interview. “We’re going to see a tsunami of products coming with ML functionality: It’s only going to increase, and it’s going to attract a lot of attention.”
STMicro has roughly a quarter of the microcontroller (MCU) market today, shipping between five and 10 million STM32 MCUs every day. According to El-Ouazzane, over the next five years, 500 million of those MCUs will be running some form of tinyML or AI workloads.
TinyML, which refers to running AI or machine learning inference on otherwise generic MCUs, “will become the largest endpoint market in the world,” he said.
El-Ouazzane—who previously served as CEO of edge AI chip startup Movidius and COO of Intel’s AI product group—and his team at STMicro have been hard at work the last few years bringing AI capabilities to the company’s portfolio.
“While I believe [tinyML] is the biggest market in the making, I’m also humbled by the fact that we have gone through three to five years of education of management of companies who make fans, pumps, inverters, washing machine drum companies—all those people are coming to it,” he said. “We live in the world of ChatGPT, but all these laggards are finally coming to use AI. It was my vision for Movidius back in the day. I thought it would happen… it is taking a long time, but we see it coming now.”

TinyML deployments
Energy-management and automation firm Schneider Electric is using a mainstream STM32 device for people-counting and thermal-imaging applications. To do so, it uses classification and segmentation algorithms on sensor data from a thermal infrared camera. Both the thermal camera pipeline and the AI run on the microcontroller. Schneider can use the result to optimize HVAC systems, thereby reducing the CO2 footprint of buildings.
Industrial door specialist Crouzet is also combining STM32 devices with tinyML for predictive maintenance purposes.
“This was interesting because, for them, the cost of maintenance is a huge deal,” El-Ouazzane said. “They have to deploy the maintenance person post-mortem, and if a plane is grounded because a door is malfunctioning… it is not good news when they receive that phone call.”
Crouzet’s tinyML system detects signal drift in real time with high accuracy, staying one step ahead of potential failures. The system processes the data in the door itself, then sends metadata on for analysis.
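The drift-detection idea can be sketched with a classic streaming change detector. The sketch below uses a two-sided CUSUM in Python purely for illustration; Crouzet’s actual system is built on NanoEdge AI-generated C libraries, and the target, slack and threshold values here are invented.

```python
# Illustrative streaming drift detector (CUSUM), not Crouzet's real
# implementation: flags when readings drift persistently from nominal.

def make_cusum(target, threshold, slack=0.5):
    """Two-sided CUSUM: returns a function that consumes one sample at
    a time and reports True once cumulative drift from `target`
    (beyond the `slack` allowance) exceeds `threshold`."""
    state = {"hi": 0.0, "lo": 0.0}

    def update(x):
        state["hi"] = max(0.0, state["hi"] + (x - target - slack))
        state["lo"] = max(0.0, state["lo"] + (target - x - slack))
        return state["hi"] > threshold or state["lo"] > threshold

    return update

detect = make_cusum(target=10.0, threshold=5.0)
healthy = [10.1, 9.8, 10.2, 9.9]           # nominal sensor readings
drifting = [11.2, 11.5, 11.9, 12.3, 12.8]  # gradual upward drift

print([detect(x) for x in healthy + drifting])
# -> eight False values, then True on the last drifting sample
```

A single noisy spike does not trip the detector; only sustained drift accumulates past the threshold, which is what makes this style of detector useful for predictive maintenance.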
“They are literally changing their business model to be able to deploy maintenance before it is needed, which has allowed them to be way more efficient in how they deploy their maintenance people. And, for sure, it saves them from receiving a phone call they don’t want to receive,” El-Ouazzane said.
Other examples include Chinese smart energy company Goodwe, which is using tinyML on vibration and temperature sensor data to prevent arcing in its high-power inverters.
While these are great examples, why are we not seeing the tsunami today?
“Between starting an engagement and [deployment], after having gone through understanding the platform, prototyping, proof of concept, testing, you name it, and several layers of management approval, it takes three years,” he said. “In the industrial world, it takes three years for a company to start from thinking about something, working with us for the first time, to the library being deployed for production in their product.”
Software stack
In general, STMicro splits its tinyML customers into two groups. Industrial customers, those with the three-year lead time, generally have little experience with AI, while companies that have invested in data science expertise can generally turn things around faster. STMicro takes a similar approach to competitors, including NXP: a software stack that presents different entry points depending on the user’s level of AI experience.
For the industrial group, NanoEdge AI Studio requires no advanced data-science knowledge, allowing embedded software developers to create optimal ML libraries from a user-friendly UI. It currently supports four types of libraries: anomaly detection, outlier detection, classification and regression. These libraries can be combined and chained.
For example, outlier detection might detect a problem, classification might identify the source of the problem, then regression might extrapolate information to provide further insight. NanoEdge AI is used by customers like Crouzet as a low-code platform for working with vibration, pressure, sound, magnetic field and time-of-flight sensors.
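The chain described above can be sketched as three small stages. Everything below is illustrative: the function names, thresholds and the toy linear life model are invented, and NanoEdge AI Studio actually ships these stages as optimized C libraries rather than Python.

```python
# Illustrative chaining of the three library types: outlier detection
# flags a problem, classification names it, regression extrapolates.

def detect_outlier(vibration_rms, nominal=1.0, tolerance=0.3):
    """Stage 1: flag readings far from the nominal signature."""
    return abs(vibration_rms - nominal) > tolerance

def classify_fault(vibration_rms, temperature_c):
    """Stage 2: name the likely source of the anomaly (toy rules)."""
    if temperature_c > 80.0:
        return "bearing_overheat"
    return "imbalance" if vibration_rms > 1.0 else "looseness"

def estimate_hours_to_failure(vibration_rms, nominal=1.0):
    """Stage 3: extrapolate remaining useful life (toy linear model)."""
    severity = abs(vibration_rms - nominal)
    return max(0.0, 200.0 - 400.0 * severity)

def monitor(vibration_rms, temperature_c):
    if not detect_outlier(vibration_rms):
        return None  # nothing anomalous, later stages never run
    fault = classify_fault(vibration_rms, temperature_c)
    return fault, estimate_hours_to_failure(vibration_rms)

print(monitor(1.1, 40.0))  # within tolerance -> None
print(monitor(1.4, 85.0))  # anomaly -> (fault label, hours remaining)
```

Note that the later, more expensive stages only run once the cheap outlier check fires, which is exactly the property that makes this pattern fit an always-on MCU.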
The other entry point, STM32 Cube.AI, allows developers to take trained neural networks and optimize them for memory- and compute-constrained environments.
Counterintuitively, this platform is growing faster than its low-code brother. El-Ouazzane said that STM32 Cube.AI’s desktop downloads grew 400% between March last year and May this year.
“Here, the time to market is very fast—less than two years—because the people on this platform know what they want, and know how to deploy, and the level of sophistication is pretty high,” he said.
El-Ouazzane knows that AI software is both a compiler issue and a toolchain issue. Armed with the knowledge that it would be difficult to get developers to change away from familiar toolchains, STMicro approached Nvidia with the idea of working with its popular TAO toolchain. The resulting collaboration means models from Nvidia’s or STMicro’s model zoo, in ONNX format, can be ported to the TAO toolchain, trained and optimized (quantized and pruned), and then converted back to ONNX for export to STM32 Cube.AI, which compiles them to C code that runs on the STM32.
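The "quantized and pruned" step can be illustrated from scratch. The sketch below is not Nvidia TAO's actual API; it only shows what the two optimizations do: magnitude pruning zeroes the smallest weights, and symmetric INT8 quantization maps floats to 8-bit integers plus a single scale factor, the format MCU inference engines favor.

```python
# Generic illustration of pruning and INT8 quantization, the two
# optimizations named above (not TAO's real interface).

def prune(weights, sparsity=0.5):
    """Magnitude pruning: zero out the smallest fraction of weights."""
    ranked = sorted(weights, key=abs)
    cutoff = abs(ranked[int(len(weights) * sparsity) - 1])
    return [0.0 if abs(w) <= cutoff else w for w in weights]

def quantize_int8(weights):
    """Symmetric INT8 quantization: q = round(w / scale)."""
    scale = max(abs(w) for w in weights) / 127.0
    return [round(w / scale) for w in weights], scale

def dequantize(q, scale):
    """Recover approximate float weights for accuracy checks."""
    return [qi * scale for qi in q]

w = [0.02, -0.91, 0.47, -0.03, 0.65, 0.01]
pruned = prune(w, sparsity=0.5)
q, scale = quantize_int8(pruned)
print(pruned)    # -> [0.0, -0.91, 0.47, 0.0, 0.65, 0.0]
print(q, scale)  # 8-bit integers plus one float scale factor
```

Pruning shrinks the model (zeros compress well and can be skipped at runtime), while quantization replaces 32-bit float math with 8-bit integer math, both of which matter when the target has kilobytes, not gigabytes, of memory.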
“For us, the mindset was: there is a reference toolchain, and the more we integrate into it, the more we can expand the universe of developers and the downloads we’ve got,” El-Ouazzane said. “I believe Nvidia sees there is a huge market of 500 million microcontrollers per year, and [the models] have to be trained somewhere.”
STMicro’s example application shows an STM32 MCU executing person detection before handing off only images containing people to an Nvidia Jetson GPU for further classification tasks. This reduces the amount of GPU compute needed and may help an edge system fit within a tighter power budget.
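The cascade amounts to a simple gating loop. In the hypothetical sketch below, both "models" are stand-in functions (the real ones would be a tinyML detector on the STM32 and a full classifier on the Jetson); only the control flow is the point.

```python
# Hypothetical two-stage cascade: a cheap always-on detector (the MCU's
# role) gates frames so the expensive classifier (the Jetson's role)
# only runs on frames that contain people.

def mcu_person_detector(frame):
    """Stand-in for the STM32-side detector: cheap, runs on every frame."""
    return frame.get("person_score", 0.0) > 0.5

def gpu_classifier(frame):
    """Stand-in for the Jetson-side model: expensive, runs rarely."""
    return "person:" + frame["label"]

def cascade(frames):
    gpu_calls = 0
    results = []
    for frame in frames:
        if mcu_person_detector(frame):             # stage 1: every frame
            results.append(gpu_classifier(frame))  # stage 2: hits only
            gpu_calls += 1
    return results, gpu_calls

frames = [
    {"person_score": 0.1, "label": "empty"},
    {"person_score": 0.9, "label": "worker"},
    {"person_score": 0.2, "label": "empty"},
    {"person_score": 0.8, "label": "visitor"},
]
results, gpu_calls = cascade(frames)
print(results, gpu_calls)  # the GPU stage ran on only 2 of 4 frames
```

The power saving comes directly from the skipped stage-2 calls: the GPU can sleep through every frame the MCU rejects.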

El-Ouazzane is also open to the ecosystem of third parties writing software compatible with STMicro’s Arm Cortex-M devices, including OctoML, Plumerai and others—with the likely outcome being a “co-opetition.”
“Some of these companies are helping to keep us honest!” he said. “If companies or customers want to leverage our solution, never ever would we stop that: That is not best practice. We are trying each [MLPerf] round to get better [benchmark scores] than them, and we are closing the gap, but I want them to be healthy and gain momentum with customers.”
Around three-quarters of submissions in the recent round of MLPerf Tiny benchmarks were submitted on STM32 hardware, which El-Ouazzane said illustrates the STMicro stack’s maturity. The company plans to enable potential customers to reproduce its MLPerf results in its dev cloud.
“I learned the hard way that hardware performance matters, but your stack and the land grab you make with your stack makes the whole difference,” El-Ouazzane said. “We are so vigilant in expanding our ecosystem and expanding the number of developers and keeping them under our roof, and we are going to make it hard for [them] to escape that environment.”
Hardware roadmap
STMicro is also working on next-generation hardware for AI at the embedded edge.
“The edge is a different ballgame than training,” El-Ouazzane said, adding that while training is limited by interconnect, as well as compute and memory, at the edge the main limiting factor is cost.
“At the tiny edge, when you build products you are constrained by cost; you cannot go wild,” he said. “There is a nominal price point, and it’s between 1 and 3 bucks… and part of the cost is captured by the non-volatile memory in the microcontroller.”

The STM32N6, the first Cortex-M device with a home-grown NPU on chip, was recently demonstrated running a custom version of YOLO at 314 fps; this is one to two orders of magnitude faster than the same network running on the STM32H7, STMicro’s most powerful MCU without an NPU.
“The N6 has a traditional Von Neumann architecture, very similar to what we did at Movidius back in the day, but way more optimized in its footprint, super compact and delivering a decent amount of TOPS/W,” El-Ouazzane said.
The N6 will be sampled to 10-15 lead customers in September, with an official launch next year.
However, El-Ouazzane is clear that the N6 is not the end goal for STMicro in AI.
“If we nominally say we want to reach our performance-per-Watt end goal between 2025 and 2030, you can assume N6 is one-tenth of the way there,” he said. “That’s the amount of boost you’re going to see in the coming years. The N6 is a kick-ass product, and it is getting a lot of traction in AV-centric use cases, but there is an explosion of performance coming: There will be neural networks on microcontrollers fusing vision, audio and time series data.”
In his vision, the required 10× performance jump hinges on non-volatile memory, which enables analog compute-in-memory schemes.
STMicro presented a paper at ISSCC this year about an SRAM-based analog compute-in-memory design it is developing for future generations. The demonstrator achieved 57 TOPS at 77 TOPS/W at the chip level (at INT4), implying a chip power of roughly 0.74 W. However, it may be a little while before this reaches the mass market.
“The technology is in silicon today, we can demonstrate it and measure its performance,” El-Ouazzane said. “But it is becoming a question of roadmap intersect. This is something that will come in the next three to five years, for sure.”
For STMicro, he points out, when it does come, it will come at scale.
Getting a product ready for that kind of volume takes time—testing, documentation, support—so the timing is less to do with technology and more to do with how quickly STMicro can turn technologies into mass-market products.
“We are super excited about being the driver in this microcontroller-AI accelerator space,” he said. “Some of us have done this before in the data center and client space, and we think we can reproduce it. Our roadmap will allow us to do mind-blowing things in the next five years.”