
Axelera Demos Early Silicon, Raises $50 Million



Axelera, the European chip startup working on an edge AI accelerator, has demonstrated a working chip at the Embedded Vision Summit. The company also announced it has raised an oversubscribed $50 million Series A round in an increasingly challenging fundraising climate for chip startups. New investors include a consortium of CDP Venture Capital, Verve Ventures and Fractionelera, which was formed to invest in Axelera.

Axelera Metis chip
Axelera’s chip is up and running four weeks after receiving first silicon. (Source: EE Times)

The company will use the capital to scale up its Metis accelerator production, expand its sales force and grow its fledgling operation in the United States, Axelera CEO Fabrizio Del Maffeo told EE Times. The funding will also go towards designing a next-gen version of the accelerator.

Core technologies

Axelera CTO Evangelos Eleftheriou said the company’s Metis chip relies on two key technologies: a digital in-memory-compute matrix-vector multiply (MVM) accelerator and a RISC-V core that controls dataflow. The quad-core design can achieve 214 TOPS peak performance, with a peak efficiency of 14.7 TOPS/W.

“The whole design is hand-crafted down to the last transistor,” Eleftheriou said. “The reason is to minimize area and energy consumption.”

Axelera’s MVM is a digital in-memory compute design
Axelera’s MVM is a digital in-memory compute design. (Source: Axelera)

Energy efficiency does not depend on high utilization, Eleftheriou added, since blocks can be disabled using a flag. At the core level, efficiency is 14.1 TOPS/W at 100% utilization, and even at a utilization as low as 6.25%, Metis still achieves 11.4 TOPS/W.

Axelera’s 52.5-TOPS MVM accelerator features densely interleaved weight storage and compute units, and uses pipelining to maintain throughput. Weights are stored in INT8 and products accumulate into INT32, while activations are kept in FP32. This is done to preserve accuracy: an INT8-quantized ResNet-50 model lost only 0.1 percentage points versus the unquantized FP32 model, without retraining.
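The numeric scheme described above can be sketched in a few lines of numpy. This is a toy software model, not Axelera's hardware datapath: shapes, the symmetric per-tensor quantizer and the random data are all assumptions for illustration. It shows INT8 weights and inputs multiplying into an INT32 accumulator, then rescaling back to FP32, and measures how little accuracy the integer path loses.

```python
import numpy as np

def quantize_int8(x):
    """Symmetric per-tensor INT8 quantization: int8 values plus an FP32 scale.
    (Hypothetical quantizer -- the article does not specify Axelera's scheme.)"""
    scale = max(float(np.abs(x).max()) / 127.0, 1e-12)
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, np.float32(scale)

# Hypothetical layer shapes; the real MVM is a fixed hand-crafted array.
rng = np.random.default_rng(0)
w_fp32 = rng.standard_normal((64, 128)).astype(np.float32)
x_fp32 = rng.standard_normal(128).astype(np.float32)

w_q, w_s = quantize_int8(w_fp32)
x_q, x_s = quantize_int8(x_fp32)

# INT8 x INT8 products accumulated in INT32, as in the Metis MVM.
acc_int32 = w_q.astype(np.int32) @ x_q.astype(np.int32)

# Rescale the INT32 accumulator back to an FP32 activation.
y_int8_path = acc_int32.astype(np.float32) * (w_s * x_s)
y_fp32_path = w_fp32 @ x_fp32

rel_err = np.abs(y_int8_path - y_fp32_path).max() / np.abs(y_fp32_path).max()
print(f"max relative error vs FP32: {rel_err:.3%}")
```

With random Gaussian data the integer path tracks the FP32 reference to within a few percent, which is the same qualitative story as the 0.1-percentage-point ResNet-50 result quoted above.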

“We did lots of simulations to understand what optimizations we needed to do,” Eleftheriou said. “We know that, in general, neural networks are forgiving when it comes to weight precision, but they are not forgiving when it comes to activation precision.”

Axelera Metis chip block diagram
Each AI core has a 4 MB L1 cache, with 3 MB used for pre-fetching data to maintain throughput. (Source: Axelera)

A small RISC-V CPU in each AI core manages dataflow over memory-mapped I/O. The core has a scalar floating-point unit, but Eleftheriou said a next-gen chip might add vector extensions.

A data processing unit (DPU) in the AI core handles element-wise operations and activations (the activations are computed in FP32 for accuracy then reduced to INT8). Axelera’s 1 TOPS depth-wise processing unit (DWPU) is used for pooling, depth-wise convolution and up-sampling. These operations could be done in the MVM, Eleftheriou said, but not as efficiently.
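Why depth-wise convolution gets its own unit becomes clear when you write one out: each channel is convolved with its own small kernel, independently of every other channel, so there is none of the dense weight reuse an MVM array is built for. A minimal numpy sketch (hypothetical shapes, kernels and loop structure chosen for clarity):

```python
import numpy as np

# (channels, H, W) input and one 3x3 kernel per channel -- all toy values.
x = np.arange(2 * 4 * 4, dtype=np.float32).reshape(2, 4, 4)
k = np.ones((2, 3, 3), dtype=np.float32) / 9.0  # 3x3 box filter per channel

out = np.zeros((2, 2, 2), dtype=np.float32)
for c in range(2):                  # each channel is processed independently,
    for i in range(2):              # with no cross-channel accumulation for
        for j in range(2):          # a dense MVM array to exploit
            out[c, i, j] = np.sum(x[c, i:i + 3, j:j + 3] * k[c])

print(out.shape)  # (2, 2, 2)
```

Pooling and up-sampling have the same per-channel, low-arithmetic-intensity character, which is presumably why they are grouped onto the same 1-TOPS DWPU.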

Each core has a 4 MB L1 cache: 1 MB is used for compute and 3 MB for pre-fetching data. Weights and activations can be placed in a 32 MB shared L2, which is also used to transfer data between the cores. An LPDDR4x interface allows up to 4 GB of external memory to be connected.
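The split between a compute region and a larger prefetch region suggests a multi-buffered pipeline: while one tile is being processed, the next few are staged ahead of it. The sketch below is an assumption about how such a split behaves, not Axelera's actual controller; the `prefetch_depth=3` mirrors the 3:1 prefetch-to-compute ratio of the L1 partition.

```python
from collections import deque

def run_tiles(tiles, prefetch_depth=3):
    """Toy model of a split L1: one 'compute' slot works on a tile while up
    to `prefetch_depth` tiles are staged ahead of it."""
    staged = deque()             # plays the role of the 3 MB prefetch region
    results = []
    it = iter(tiles)
    for _ in range(prefetch_depth):      # fill buffers before compute starts
        try:
            staged.append(next(it))
        except StopIteration:
            break
    while staged:
        tile = staged.popleft()          # move one tile into the compute slot
        results.append(sum(tile))        # stand-in for the MVM on this tile
        try:
            staged.append(next(it))      # overlap: fetch while computing
        except StopIteration:
            pass
    return results

print(run_tiles([[1, 2], [3, 4], [5, 6], [7, 8]]))  # -> [3, 7, 11, 15]
```

In hardware the fetch and compute happen concurrently, of course; the point of the sketch is only the buffering discipline that keeps the compute slot fed.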

Models can run concurrently on individual cores (different models simultaneously, or batched operation), in a cascaded pipeline, or a larger model can be spread across more than one core, in which case the L2 cache is used to exchange data between the cores.
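The split-across-cores mode can be illustrated with a toy numpy example (the partitioning scheme here, splitting a layer's output rows, is an assumption for illustration): each "core" computes its slice of the matrix-vector product independently, and a shared buffer, standing in for the L2, is where the slices are combined.

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.standard_normal((8, 4)).astype(np.float32)  # toy layer weights
x = rng.standard_normal(4).astype(np.float32)       # toy activation vector

# Split the layer's output rows across two "cores" (hypothetical mapping).
W_core0, W_core1 = W[:4], W[4:]

# Each core computes its slice independently...
part0 = W_core0 @ x
part1 = W_core1 @ x

# ...and the shared L2 is where the slices are exchanged and combined.
shared_l2 = np.concatenate([part0, part1])

assert np.allclose(shared_l2, W @ x)  # identical to the single-core result
```

Because the slices are mathematically independent, the only inter-core traffic is the final exchange through L2, which matches the article's description of how the cores cooperate.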

Axelera’s software stack is up and running. Pipelines of multiple neural networks (and image pre-/post-processing) can be designed easily via a YAML file.
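The article does not show the schema, but a pipeline file of the kind described might plausibly look like the following; every key name here is invented for illustration and is not Axelera's actual format.

```yaml
# Hypothetical pipeline description -- all keys are assumptions,
# not Axelera's real YAML schema.
pipeline:
  - preprocess:
      resize: [224, 224]
      normalize: imagenet
  - model:
      name: resnet50-int8
      cores: [0, 1]        # split across two AI cores
  - model:
      name: detector-int8
      cores: [2]           # runs concurrently on its own core
  - postprocess:
      op: nms
```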

Roadmap

Axelera Metis demo
Axelera’s demo had the Metis chip running a non-optimized version of SqueezeNet at 12,000 fps and 5 W. (Source: EE Times)

With each core operating self-sufficiently, the design is scalable in either direction, Eleftheriou said.

“Of course, the devil is always in the detail—you need to have a network on chip, you have to connect all the ports on the chip, you have to make them collaborate with each other, but in principle, every core can be replicated,” he added.

Axelera is already thinking about its next-gen product, which will offer better performance for transformers at the edge. While Metis can run ViT, a future product complementing it may include a dedicated softmax accelerator, Del Maffeo said. The company is targeting 2025 for the next step on its roadmap.

Axelera currently has 55 companies signed up for its early access program, which is still open for applications. The 15 or so lead customers selected are all working on computer vision applications, Del Maffeo added, though enquiries have ranged from security to agriculture. Lead customers will get samples this summer, with the first software release due in July.


