Esperanto Pivots to HPC and Generative AI

AI chip startup Esperanto recently pivoted its focus from recommendation acceleration to large language models (LLMs) and high-performance computing (HPC), releasing a general-purpose software development kit and a PCIe accelerator card for the ET-SoC-1, its first-generation RISC-V data center accelerator chip.

The company believes its chip is well-positioned to take advantage of the market for LLM inference today, Craig Cochran, a marketing executive at the Mountain View, Calif.-based company, told EE Times.

“The real opportunity is for people to do their inferencing on one or two cards with low power, so low [total cost of ownership], with faster latency and performance than running on a [CPU],” he said. “We don’t expect people will want to do inference on GPUs; it’s overkill. And that’s why we think, for this application, we’ll be competing more with CPUs instead of Nvidia.”

Esperanto ET-SoC-1
Esperanto’s first-gen RISC-V AI and HPC accelerator, ET-SoC-1. (Source: EE Times/Sally Ward-Foxton)

Esperanto has demonstrated Meta’s OPT-13B LLM running on a single Esperanto chip, which operates in the 15-50 W power envelope with typical consumption around 25 W. Cochran said the company also has other generative AI models up and running today via its AI software development kit.
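A back-of-envelope calculation shows why a 13-billion-parameter model is plausible on a single card: weights dominate the memory footprint, and at reduced precision they fit comfortably within the 32 GB of LPDDR4x on Esperanto's PCIe card. (This sketch ignores activations and KV cache, and the precision choices are illustrative, not confirmed by Esperanto.)

```python
# Rough weight-memory footprint of an LLM at different numeric precisions.
# Activations and KV cache are ignored; this is a sizing sketch only.
def weight_footprint_gb(params_billion: float, bytes_per_param: float) -> float:
    """Return the model weights' size in GiB."""
    return params_billion * 1e9 * bytes_per_param / 2**30

for fmt, nbytes in [("FP16", 2), ("INT8", 1)]:
    print(f"OPT-13B weights in {fmt}: {weight_footprint_gb(13, nbytes):.1f} GiB")
```

At FP16 the weights come to roughly 24 GiB, and at INT8 roughly 12 GiB, either of which fits in the card's 32 GB of memory.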

A new focus on LLMs is a natural consequence of the technology’s recent surge in popularity.

“When we launched this chip two years ago, recommendation was a big deal and transformers were not yet born, and now we have transformers and LLMs and the vertical applications are changing very fast, too,” Cochran said. “So we’re taking our hardware, which is good for all of these, adapting our software to make sure we can optimally support the models and going after those opportunities—because the opportunity space is shifting very quickly.”

Esperanto ET-SoC-1 power consumption graph
Power consumed by Esperanto’s PCIe card for ResNet, BERT and DLRM benchmarks. (Source: Esperanto)

Esperanto has optimized its AI software development kit (SDK) to handle partitioning of LLM layers efficiently, and it is experimenting with versions of OPT up to 30B parameters with plans to scale to larger versions and other models, including Llama.
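The core idea behind partitioning an LLM across cards is to assign contiguous blocks of transformer layers to each device. The following is a minimal illustrative sketch of that idea, not Esperanto's actual SDK logic, which handles this automatically:

```python
# Minimal sketch of splitting a model's transformer layers across
# accelerator cards in contiguous, near-equal blocks.
def partition_layers(num_layers: int, num_cards: int) -> list[range]:
    """Assign contiguous blocks of layers to cards as evenly as possible."""
    base, extra = divmod(num_layers, num_cards)
    parts, start = [], 0
    for card in range(num_cards):
        size = base + (1 if card < extra else 0)  # spread the remainder
        parts.append(range(start, start + size))
        start += size
    return parts

# OPT-13B has 40 transformer layers; split across two cards:
print(partition_layers(40, 2))
```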

Esperanto’s second new focus is HPC.

While there is increasing crossover between AI and HPC workloads, Esperanto's view is that the two require separate software toolchains but that the same hardware should be able to handle both.

Speaking at a recent RISC-V event in Barcelona, Spain, Esperanto CTO Dave Ditzel said that RISC-V is the obvious choice for AI and HPC.

“We think RISC-V is not only the best choice, it’s the only logical choice,” Ditzel said. “When you think about building great systems for the future…there aren’t many alternatives. X86 is too heavyweight to serve as both the main CPU and the accelerators, GPUs are just too hard to program and they can’t really serve as your main CPU. Only RISC-V has the ability to do both things.”

The opportunity in AI and HPC segments is perfect for RISC-V offerings with the right software, he added.

“The big issue is, how do we make these machines easier to program?” he said. “That’s where RISC-V really has an opportunity. We think RISC-V is in a unique position to let us build the best converged HPC and ML system.”

PCIe card

Esperanto was previously targeting recommendation acceleration, typically limited to hyperscalers’ data centers that provide online shopping and social media newsfeed predictions. For this market, the company had previously planned an OCP Glacier Point-compatible, dual M.2 card and was running its chip within that power envelope, which is 20 W. Shifting focus to generative AI and HPC has necessitated development of a low-profile PCIe card. But moving to the PCIe form factor means power consumption can be higher, as much as 40 or 50 W if required, though typically it might be around the 25-W mark, Cochran said.

Esperanto ET-SoC-1 PCIe card
Esperanto has pivoted to the PCIe form factor for its ET-SoC-1 accelerator card. (Source: Esperanto)

“We were planning on doing both [M.2 cards and PCIe cards], but we ended up putting all our eggs in the PCIe basket,” Cochran said. “That’s not to say we won’t do M.2 cards if customers show interest in that.”

Esperanto’s AI software stack
Esperanto’s AI software stack. (Source: Esperanto)

Esperanto’s production PCIe card, developed by Penguin Solutions, has 32 GB LPDDR4x memory. The company has built a 2U server as an eval system that can hold eight or 16 PCIe cards. This system, with dual Intel Xeon host CPUs, can offer up to 16,000 RISC-V CPU cores per server. A data center rack with 20 Esperanto servers can deliver around 320,000 cores.
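The quoted core counts follow directly from the chip's 1,024 ET-Minion cores (discussed below in the context of the GP-SDK):

```python
# Core-count arithmetic behind the figures quoted in the article.
MINION_CORES_PER_CHIP = 1024   # ET-Minion compute cores per ET-SoC-1
CARDS_PER_SERVER = 16          # maximum for the 2U eval server
SERVERS_PER_RACK = 20

cores_per_server = MINION_CORES_PER_CHIP * CARDS_PER_SERVER
cores_per_rack = cores_per_server * SERVERS_PER_RACK
print(cores_per_server)  # prints 16384; the article rounds this to "up to 16,000"
print(cores_per_rack)    # prints 327680; the article rounds this to "around 320,000"
```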

Software stacks

Esperanto has two software stacks: one for AI, one for HPC.

Esperanto’s HPC software stack
Esperanto’s HPC software stack. (Source: Esperanto)

The existing AI software stack is built on Glow, Meta’s open-source AI compiler, which accepts PyTorch or ONNX format models and generates RISC-V executable code. There is also an execution engine tailored for Esperanto’s hardware. Esperanto has demonstrated LLM, computer vision (detection/segmentation) and recommendation models up and running via this stack.

A new HPC-oriented software stack, which Esperanto calls its general-purpose software development kit (GP-SDK), enables direct programming of the 1,024 ET-Minion cores and their vector/tensor units for massively parallel computation. A standard C++ toolchain runs on the x86 host; users write their own application, which calls Esperanto’s runtime to control the chip. The RISC-V GCC toolchain compiles kernel code using Esperanto’s libraries and packager.

Second generation

Esperanto is planning a second-generation chip (ET-SoC-2), which Ditzel said in his talk will incorporate more features oriented toward HPC.

This chip is already under development with a lead customer. It will be fully compatible with the new RISC-V vector specification, Ditzel said, with a goal of at least 10 TFLOPS FP64 performance per chip (FP64 and FP32 support across all cores will be added for the second generation). The second-gen chip will use HBM rather than the LPDDR memory used by the first gen.

Esperanto servers in a rack
A rack of Esperanto eval servers up and running at Esperanto HQ. (Source: Esperanto)

“Our view is that RISC-V is now mature enough and ready to start the revolution for future combined machine learning and HPC,” Ditzel said. “One final prediction…with what we’re doing and what we see others doing, within at least the next five years, a RISC-V-based system will win a Green500 award [for energy efficiency in supercomputers]. Our goal is to make that happen with Esperanto hardware, and we’re happy to be challenged by anyone else out there who wants to build other systems.”

Esperanto is currently shipping evaluation servers to commercial customers and offers a cloud access program. Customers include several in the Fortune 100, Cochran said, noting that there is interest from both AI and HPC realms. The company also licenses IP to select strategic partners.
