

# Realizing a more productive EDA environment

Improving the economics of semiconductor design with HPE systems and AMD EPYC<sup>™</sup> processors with AMD 3D V-Cache<sup>™</sup> technology

**Business white paper** 





# **Table of contents**

#### Introduction

- Customer challenges

#### EDA software licensing

#### The EPYC advantage

- AMD EPYC 7003 series
- AMD 3D V-Cache technology
- An ideal architecture for memory-intensive EDA workloads
- Choosing the right server platform

#### The HPE Apollo 2000 Gen10 Plus system

- Optional liquid cooling

#### HPE ProLiant servers

- Comprehensive server security and management
- Power efficiency
- Performance where it matters
- Solutions ideal for EDA workloads



## Introduction

Few industries are more competitive than modern electronics manufacturing and chip design. Consumers expect devices to be faster, cheaper, and more reliable with each generation. Whether large or small, electronics manufacturers rely on electronic design automation (EDA) to enable these improvements.

High-performance computing (HPC) is used in all phases of the EDA cycle, from system-level, logic, and analog design to simulation and layout. Even for midsized projects, verifying proper device functionality is one of the largest challenges faced by chip designers. As engineers make changes to a design, they need to run extensive computer simulations to verify functionality. By most estimates, regression testing and verification account for roughly 80% of simulation workloads in modern electronic design environments.<sup>1</sup> Given the enormous cost of committing a design to silicon, projects must be error-free before tape-out. The performance and capacity of the EDA simulation environment directly affect product quality, time to market, downstream support costs, and IT costs, all of which affect the bottom line.

EDA firms compete based on the effectiveness of their design environments. This brief explains how high-performance HPE Apollo 2000 Gen10 Plus systems and HPE ProLiant servers powered by AMD EPYC<sup>™</sup> processors can provide a decisive advantage to electronics manufacturers. HPE solutions with AMD EPYC processors can help customers increase simulation capacity, improve throughput and productivity, and reduce TCO in EDA server farms.

## **Customer challenges**

Device simulation becomes more difficult as designs become larger. As the number of register and memory bits in a device increases (call this n), the number of states to be modeled increases exponentially (2<sup>n</sup>). System-on-a-chip (SoC) designs frequently contain hundreds of millions or even billions of logic gates, making verification more challenging with each product generation.
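To make the exponential growth concrete, the short Python sketch below counts the states implied by n bits of register and memory state. The function name and the sample bit counts are illustrative assumptions, not figures from any specific design.

```python
def state_count(n_bits: int) -> int:
    """Return the number of distinct states for n_bits of state (2**n)."""
    if n_bits < 0:
        raise ValueError("n_bits must be non-negative")
    return 2 ** n_bits


# Illustrative bit counts only; real SoCs hold far more state than this.
for n in (8, 16, 32, 64):
    print(f"{n:>3} state bits -> {state_count(n):,} possible states")
```

Each additional bit doubles the state space, which is why exhaustive coverage quickly becomes impractical and verification teams rely on large volumes of targeted simulation instead.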

In addition to size and complexity, reliability and security are important considerations. Products such as sensors for autonomous vehicles, embedded control systems, and medical devices need to work flawlessly. This demands higher levels of verification coverage and increased simulation to ensure quality and reliability.

Figure 1 illustrates the challenge faced by EDA design centers. Bringing innovative new products to market and improving reliability requires more simulation capacity. However, firms face pressure to shorten design cycles to meet time-to-market objectives with limited budgets for hardware and software.



Figure 1. EDA firms need more simulation capacity but face tight resource constraints

Semiconductor manufacturers need to deliver ever more complex designs, get to market faster, and continuously improve product quality—all with limited resources.

<sup>1</sup> Based on estimates from HPE internal VLSI design environment, 2022.



Besides raw performance, energy efficiency is also an important consideration. As data center managers seek to become more sustainable, they need servers that deliver maximum throughput per watt to minimize power and cooling requirements. They also need dense systems that minimize data center space requirements.

Making the most out of costly software licenses is an important part of the solution. EDA software tools need to simulate multiple aspects of device functionality over different periods. Chip designers typically run tools from leading EDA vendors, including Cadence®, Synopsys®, and Siemens EDA (formerly Mentor Graphics®). Workloads are diverse, with some simulations running for minutes or hours while others run for days or even weeks on large server farms.

Table 1 describes some typical verification workloads and their characteristics.

| Category                | Verification type                   | Description                                                                                                                                                                |
|-------------------------|-------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Digital<br>abstractions | Gate-level simulations (GLS)        | Models may consist of billions of gates.<br>Simulation runtimes can range from hours to<br>weeks, depending on the model and simulator.                                    |
|                         | Register-transfer level (RTL)       | Models typically consist of millions of lines of<br>C-like code running 10K to 100K simulated<br>cycles per second (cps). Runtimes range from<br>seconds to multiple days. |
|                         | Transaction-level modeling<br>(TLM) | Models consist of up to 1 million lines of<br>C++-like code running 10K to 1M simulated<br>cps. Runtimes range from seconds to hours.                                      |
| Analog<br>abstractions  | Transistor level (SPICE)            | Models consist of analog primitives: resistors, capacitors, transistors, and others.                                                                                       |
|                         | Verilog-AMS / VHDL-AMS              | Models consist of behavioral code and operate<br>on voltage and current values in an analog<br>simulator to solve a network.                                               |
|                         | System-level verification           | Models consist of C-like code run on a digital simulator.                                                                                                                  |

Table 1. Different types of EDA verification workloads

As models get larger, servers increasingly require large amounts of physical memory and cache. This is particularly true for workloads such as register-transfer level (RTL) simulations. For large RTL simulations, performance generally improves as more of the model fits in the processor's cache, so a larger L3 cache tends to deliver better performance, depending on the model being simulated.<sup>2</sup>

Many EDA tools are single-threaded, processing one command at a time. To optimize throughput and server utilization, customers tend to run many simulations concurrently on multicore servers. To achieve high throughput, customers need:

- High clock frequencies
- Large amounts of physical memory
- Large amounts of L3 cache per simulation
- Low latency and high bandwidth to cache and memory

<sup>2</sup> AMD EPYC processors with AMD 3D V-Cache technology





# **EDA software licensing**

A specific challenge faced by electronics manufacturers is the high cost of software tools. Software license costs for EDA environments are typically much higher than hardware costs. Because of this cost disparity, IT administrators tend to be much more concerned about using software resources efficiently than maximizing server utilization. The cost of engineering talent is also an important consideration, and organizations need to maximize their productivity. Figure 2 provides a simplified view of a typical design environment.





Project teams typically work on multiple and sometimes overlapping designs and need fast access to software tools and servers to run them on. Before licensed EDA tools can run, they need to contact a license server and check out a license. Tools return license features to the license manager when the run is complete. In some cases, a simulation may consume multiple license features.
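The checkout-and-return cycle can be sketched as a toy in-memory pool. This is only an illustration: `LicensePool`, its methods, and the feature name `rtl_sim` are hypothetical and do not correspond to the API of any real license manager.

```python
from contextlib import contextmanager


class LicensePool:
    """Toy stand-in for a license server (hypothetical, for illustration)."""

    def __init__(self, features: dict):
        # Map of feature name -> number of license features currently free.
        self._available = dict(features)

    def available(self, feature: str) -> int:
        return self._available.get(feature, 0)

    @contextmanager
    def checkout(self, feature: str, count: int = 1):
        """Check out `count` features; return them when the job finishes."""
        if self.available(feature) < count:
            raise RuntimeError(f"not enough free '{feature}' license features")
        self._available[feature] -= count
        try:
            yield
        finally:
            # Features go back to the pool even if the job fails.
            self._available[feature] += count


pool = LicensePool({"rtl_sim": 2})  # hypothetical feature name and count
with pool.checkout("rtl_sim"):
    print("running; free features:", pool.available("rtl_sim"))
print("finished; free features:", pool.available("rtl_sim"))
```

While a job holds a feature, no other job can use it, so the time each simulation spends between checkout and return directly limits how many simulations a fixed license pool can complete.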

A single license for a verification tool can cost multiple thousands of dollars per year. Regression tests can involve millions of discrete simulations and completing these quickly requires a large number of licenses. For high-demand tools, a design environment may have hundreds of license features. Overall license costs can easily exceed \$1M annually for a single tool.<sup>3</sup>

Because licenses are expensive, design firms have a strong incentive to keep these licenses fully utilized. Workload management software plays a critical role, coordinating with license servers and efficiently scheduling various batch and interactive jobs. The scheduler seeks to ensure that project deadlines are met and that resources are shared according to policy, optimizing both licenses and infrastructure resources.

Not only is it important to minimize the idle time for licenses, but it is also essential to use the licenses efficiently by running simulations as quickly as possible. A key metric for EDA firms is the number of simulations run per day per license. It's critical that these high-performance, high-value tools can run on compute nodes that can deliver high throughput for optimal cost efficiency.
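As a rough illustration of the simulations-per-day-per-license metric, the sketch below assumes each simulation holds exactly one license feature for its full runtime. The runtimes and utilization figure are hypothetical, not measured results.

```python
MINUTES_PER_DAY = 24 * 60


def sims_per_day_per_license(avg_runtime_min: float, utilization: float = 1.0) -> float:
    """Simulations one license feature can complete per day.

    utilization is the fraction of the day the license is checked out
    and doing useful work (0.0 to 1.0).
    """
    if avg_runtime_min <= 0:
        raise ValueError("avg_runtime_min must be positive")
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be between 0 and 1")
    return MINUTES_PER_DAY * utilization / avg_runtime_min


# Hypothetical numbers: the same job on a baseline node and on a faster node.
baseline = sims_per_day_per_license(avg_runtime_min=30, utilization=0.9)
faster = sims_per_day_per_license(avg_runtime_min=24, utilization=0.9)
print(f"baseline node: {baseline:.1f} simulations/day/license")
print(f"faster node:   {faster:.1f} simulations/day/license")
```

The metric improves either by keeping licenses busy (higher utilization) or by shortening each run (faster compute nodes), which is why both scheduling and processor choice matter.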

<sup>3</sup> Price estimate provided by HPE VLSI design environment manager.



AMD EPYC processors deliver exceptional performance and scalability for EDA workloads.

- Industry-leading 7 nm x86 server CPU<sup>14</sup>
- PCIe 4.0 support offers double the data transfer rate of PCIe 3.0
- Eight memory channels per socket for broad memory bandwidth
- Up to 3,200 MT/s DDR4 memory support
- Large L3 cache (up to 768 MB per socket on 3rd Gen EPYC with 3D V-Cache)

<sup>4</sup> AMD EPYC-based systems have been chosen as the basis of exascale supercomputers. Design wins include Frontier, a collaboration between the US Department of Energy (DOE), ORNL, and HPE. AMD EPYC processors will also power El Capitan, a collaboration between US DOE, LLNL, and HPE expected in early 2023.

<sup>5</sup> AMD EPYC<sup>™</sup> 7002 Series Processors

<sup>6</sup> amd.com/en/press-releases/2021-03-15-amd-epyc-7003-series-cpus-set-new-standard-highest-performance-server

<sup>7</sup> Tuning Guide AMD EPYC 7003, March 2022: See section 1.2.2 Core Complex (CCX) and Complex Die (CCD)

<sup>8</sup> amd.com/en/press-releases/2021-03-15-amd-epyc-7003-series-cpus-set-new-standard-highest-performance-server

<sup>9, 12</sup> For HPE Apollo and HPE ProLiant systems, a BIOS update is required when upgrading to 7003 series processors. Also, minimum OS requirements include Red Hat® Enterprise Linux® (RHEL) 8.3, SUSE Linux Enterprise Server (SLES) 12 SP5, or SLES 15 SP2.

<sup>10, 13</sup> EPYC-026: Based on calculated areal density and based on bump pitch between AMD hybrid bond AMD 3D V-Cache stacked technology compared to AMD 2D chiplet technology and Intel<sup>®</sup> 3D stacked micro-bump technology.

<sup>11</sup> EPYC-027: Based on AMD internal simulations and published Intel data on Foveros technology specifications.

<sup>14</sup> amd.com/en/processors/epyc-7003-series

# The EPYC advantage

AMD EPYC processors bring together high core counts, large memory capacity, extreme memory bandwidth, large cache sizes, and massive I/O with the right ratios to enable exceptional HPC workload performance. For EDA users, this can translate into higher-quality designs, reduced regression runtimes, and better license utilization.

While AMD EPYC processors are the choice of next-generation exascale supercomputers,<sup>4</sup> they are also highly affordable, delivering exceptional performance while fitting within the design environment budgets of all sizes.

When AMD EPYC 7002 series processors were first introduced in August 2019, they were a game changer. This second generation of AMD EPYC processors delivered leadership clock frequencies, latency, memory bandwidth, and cache per core, making AMD EPYC a preferred processor for EDA workloads.<sup>5</sup> The current generation of AMD EPYC 7003 series processors, introduced in March 2021, extended this leadership even further, offering exceptional single-core performance with a consistent feature set across the stack. The new set of AMD EPYC 7003 series processors with AMD 3D V-Cache™ technology, introduced in March 2022, raises the bar even higher with 3x the L3 cache of the standard EPYC 7003 series. All these processors deliver excellent performance, and customers can select the processor that best meets their needs depending on their unique workloads.

#### AMD EPYC 7003 series

AMD EPYC 7003 series processors offer several advantages over the previous generation.<sup>6</sup> Among these advantages are:

- A unified 8-core cache complex sharing a single 32 MB L3 cache per core complex die (CCD) providing up to twice the amount of directly accessible L3 cache per core with low latency<sup>7</sup>
- Up to a 19% improvement in instructions per cycle (IPC)<sup>8</sup>
- A faster Infinity Fabric™, clocked at 1,600 MHz enabling synchronous transfers with the 3,200 MT/s DDR4 memory
- Advanced chip-level security enhancements (SME, SEV-ES, SEV-SNP)
- 3x the L3 cache available from EPYC 7003 processors with AMD 3D V-Cache technology versus standard EPYC 7003 processors.

AMD EPYC 7003 series processors are a drop-in upgrade, fully compatible with AMD EPYC 7002 series systems.<sup>9</sup>

#### AMD 3D V-Cache technology

The newest members of the AMD EPYC 7003 processor family feature AMD 3D V-Cache technology. This new technology extends the capabilities of the 7003 series with an innovative 3D vertical cache that adds 64 MB of L3 cache per CCD. This triples the L3 cache to 96 MB per CCD, or 768 MB per socket, a significant benefit for cache-sensitive workloads.
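The cache totals follow from simple arithmetic, sketched below in Python. The per-CCD figures come from the text; the count of eight CCDs per socket is taken from the 9-die SoC described later in this paper, and the constant and function names are illustrative.

```python
BASE_L3_PER_CCD_MB = 32    # standard EPYC 7003 L3 cache per CCD
VCACHE_L3_PER_CCD_MB = 64  # L3 cache added per CCD by 3D V-Cache
CCDS_PER_SOCKET = 8        # CCD count assumed from the 9-die SoC description


def l3_per_ccd_mb(with_vcache: bool) -> int:
    """L3 cache per CCD in MB, with or without 3D V-Cache."""
    return BASE_L3_PER_CCD_MB + (VCACHE_L3_PER_CCD_MB if with_vcache else 0)


def l3_per_socket_mb(with_vcache: bool, ccds: int = CCDS_PER_SOCKET) -> int:
    """Total L3 cache per socket in MB."""
    return l3_per_ccd_mb(with_vcache) * ccds


for label, flag in (("standard 7003", False), ("7003 with 3D V-Cache", True)):
    print(f"{label}: {l3_per_ccd_mb(flag)} MB per CCD, "
          f"{l3_per_socket_mb(flag)} MB per socket")
```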

AMD EPYC 7003 series processors with AMD 3D V-Cache technology provide both outstanding density and energy efficiency with their unique solderless design.

- More than 200x the interconnect density compared to on-package 2D chiplets<sup>10</sup>
- More than 15x the interconnect density compared to micro bump 3D technology<sup>11</sup>
- More than 3x the interconnect energy efficiency compared to 3D micro bump<sup>12, 13</sup>

The additional throughput that users can expect with 3D V-Cache technology varies depending on the workload. While additional cache may have only a modest impact on some workloads, for cache-intensive applications such as RTL simulations, the results can be dramatic. Design engineers and data center managers can select the optimal AMD EPYC processor depending on their unique workloads and mix of tools.



#### An ideal architecture for memory-intensive EDA workloads

The unique architecture shown in Figure 3 is the key to the EPYC processor's throughput advantage. The 9-die SoC features 8 CCDs, each providing up to 8 cores and 32 MB of cache. AMD EPYC 7003 processors with 3D V-Cache technology expand this to an unprecedented 96 MB of cache per CCD. This design places large amounts of L3 cache close to the compute cores, enabling optimal throughput for clock- and cache-sensitive RTL and verification workloads. The advanced 7 nm process enables clock frequencies to scale up to 4.10 GHz, helping minimize ISV application license checkout time and enabling users to get more productivity from expensive ISV license features.<sup>15</sup>

While other processors share relatively small amounts of L3 cache across multiple cores, AMD EPYC processors provide a direct path between each core and its associated L3 cache to speed throughput and help reduce latency.<sup>16</sup> The combination of more L3 cache per core and direct channels to cache delivers exceptional throughput on the EPYC 7003 series with 3D V-Cache.



Figure 3. AMD EPYC high-level processor design—EPYC 7003 series with AMD 3D V-Cache technology

For most EDA applications, the high-frequency AMD EPYC 7xF3 processors will be of interest. These parts deliver leadership per-core performance while offering up to 32 MB of L3 cache per core. The new 7x73X series processors with AMD 3D V-Cache technology shown in Table 2 are ideal for RTL verification tools that benefit from huge amounts of cache.

<sup>15</sup> EPYC-026: Based on calculated areal density and based on bump pitch between AMD hybrid bond AMD 3D V-Cache stacked technology compared to AMD 2D chiplet technology and Intel 3D stacked micro-bump technology.

<sup>16</sup> CCX is a term used in AMD CPUs and stands for core complex. It refers to a group of up to four CPU cores in 7002 series processors or up to eight cores in 7003 series processors and their CPU caches (L1, L2, and L3). The number of cores per CCX varies by processor as described in the document at amd.com/system/files/documents/high-performance-computing-tuning-guide-amd-epyc7003-series-processors.pdf.



Table 2. AMD EPYC 7xF3 and 7x73X processors recommended for EDA workloads

| EPYC<br>model | Cores /<br>threads | Base<br>speed | Boost<br>speed<sup>17</sup> | L3 cache | Power<br>(watts) | L3 cache<br>per core |
|---------------|--------------------|---------------|-----------------------------|----------|------------------|----------------------|
| **AMD EPYC 7003 series** | | | | | | |
| 75F3          | 32/64              | 2.95 GHz      | Up to 4.0 GHz               | 256 MB   | 280              | 8 MB                 |
| 74F3          | 24/48              | 3.20 GHz      | Up to 4.0 GHz               | 256 MB   | 240              | 10.7 MB              |
| 73F3          | 16/32              | 3.50 GHz      | Up to 4.0 GHz               | 256 MB   | 240              | 16 MB                |
| 72F3          | 8/16               | 3.70 GHz      | Up to 4.1 GHz               | 256 MB   | 180              | 32 MB                |
| **AMD EPYC 7003 series with 3D V-Cache technology** | | | | | | |
| 7573X         | 32/64              | 2.80 GHz      | Up to 3.6 GHz               | 768 MB   | 280              | 24 MB                |
| 7473X         | 24/48              | 2.80 GHz      | Up to 3.7 GHz               | 768 MB   | 240              | 32 MB                |
| 7373X         | 16/32              | 3.05 GHz      | Up to 3.8 GHz               | 768 MB   | 240              | 48 MB                |
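The "L3 cache per core" column in Table 2 is simply total L3 cache divided by core count. The sketch below recomputes it from the table's core counts and cache sizes; the variable and function names are illustrative.

```python
# (model, cores, total L3 cache in MB) as listed in Table 2
TABLE_2 = [
    ("75F3", 32, 256),
    ("74F3", 24, 256),
    ("73F3", 16, 256),
    ("72F3", 8, 256),
    ("7573X", 32, 768),
    ("7473X", 24, 768),
    ("7373X", 16, 768),
]


def l3_per_core_mb(total_l3_mb: int, cores: int) -> float:
    """L3 cache available per core in MB, rounded to one decimal place."""
    return round(total_l3_mb / cores, 1)


per_core = {model: l3_per_core_mb(l3, cores) for model, cores, l3 in TABLE_2}
for model, mb in per_core.items():
    print(f"EPYC {model}: {mb} MB of L3 cache per core")
```

Because the total L3 cache is fixed within each family, lower core-count parts give each single-threaded simulation a larger share of cache.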

While performance for EDA applications will depend on the tool and design simulated, industry-standard benchmarks illustrate the advantage of AMD EPYC processors.



Figure 4. EPYC 7xF3 series high-frequency parts vs. comparable CPU competitors

Figure 4 shows relative SPECrate2017\_fp\_base scores per core comparing EPYC 7xF3 high-frequency parts to comparable competitor processors with similar core counts on dual-processor systems.<sup>18</sup> While the SPEC<sup>®</sup> benchmarks are not necessarily indicative of EDA application performance, they provide an objective basis for comparison. The EPYC processors' superior performance is a result of high clock speeds, fast DDR4 memory supporting up to 3,200 MT/s, eight memory channels per processor, and ample amounts of L3 cache per core. The green bars in Figure 4 represent different EPYC 7003 series processors (7xF3) with varying numbers of cores.

<sup>17</sup> EPYC-18: Maximum boost for AMD EPYC processors is the maximum frequency achievable by any single core on the processor under normal operating conditions for server systems.

<sup>18</sup> All stated results are as of May 5th, 2022. See <u>spec.org</u> for more information.

All benchmarks referenced were conducted on 2P systems, so the core counts referenced are across both processors. Configurations are as follows:

2P Intel® Xeon® Platinum 8358 (64C) scoring 454 SPECrate2017\_fp\_base (454/64 = 7.09 score per core): spec.org/cpu2017/results/res2021q4/cpu2017-20211025-29752.html

2P AMD EPYC 75F3 (64C) scoring 546 SPECrate2017\_fp\_base (546/64 = 8.53 score per core): spec.org/cpu2017/results/res2021q2/cpu2017-20210409-25543.html

2P Intel® Xeon® Gold 6342 (48C) scoring 395 SPECrate2017\_fp\_base (395/48 = 8.23 score per core): spec.org/cpu2017/results/res2022q2/cpu2017-20220327-31254.html

2P AMD EPYC 74F3 (48C) scoring 484 SPECrate2017\_fp\_base (484/48 = 10.08 score per core): spec.org/cpu2017/results/res2021q2/cpu2017-20210510-25992.htm

2P Intel Xeon Gold 6346 (32C) scoring 325 SPECrate2017\_fp\_base (325/32 = 10.16 score per core): spec.org/cpu2017/results/res2021q3/cpu2017-20210802-28471.html

2P AMD EPYC 73F3 (32C) scoring 398 SPECrate2017\_fp\_base (398/32 = 12.44 score per core): spec.org/cpu2017/results/res2021q3/cpu2017-20210816-28714.html

2P Intel Xeon Gold 6334 (16C) scoring 191 SPECrate2017\_fp\_base (191/16 = 11.94 score per core): spec.org/cpu2017/results/res2021q4/cpu2017-20211025-29748.htm

2P AMD EPYC 72F3 (16C) scoring 249 SPECrate2017\_fp\_base (249/16 = 15.56 score per core): spec.org/cpu2017/results/res2021q4/cpu2017-20210928-29647.html



The highest per-core throughput is generally achieved using processors with lower core counts. As Figure 4 illustrates, the 8-core AMD EPYC 72F3 processor delivers the highest per-core performance on the SPECrate2017\_fp\_base benchmark among these 7xF3 processors. With eight cores and eight memory channels per socket, each core effectively has a dedicated memory channel and a large share of L3 cache, and cores do not need to compete with other cores on the same CCD for access to cache and memory. For applications where licenses are expensive, server farm administrators often deploy a larger number of lower core count servers, even though this increases the number of servers and racks required, resulting in higher infrastructure and management costs. For many EDA applications, the AMD EPYC 73F3 processor is a strong SKU. It provides a balance of high per-core throughput and density while supporting up to 32 single-threaded simulation jobs per dual-socket server.
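The per-core comparison behind Figure 4 can be reproduced from the published scores listed in the benchmark footnote. The sketch below divides each 2P system's SPECrate2017\_fp\_base score by its total core count; the data layout and function name are illustrative.

```python
# (system, total cores across both sockets, SPECrate2017_fp_base score),
# taken from the configurations listed in the benchmark footnote.
RESULTS = [
    ("2P Intel Xeon Platinum 8358", 64, 454),
    ("2P AMD EPYC 75F3", 64, 546),
    ("2P Intel Xeon Gold 6342", 48, 395),
    ("2P AMD EPYC 74F3", 48, 484),
    ("2P Intel Xeon Gold 6346", 32, 325),
    ("2P AMD EPYC 73F3", 32, 398),
    ("2P Intel Xeon Gold 6334", 16, 191),
    ("2P AMD EPYC 72F3", 16, 249),
]


def score_per_core(score: float, cores: int) -> float:
    """Benchmark score divided by total core count."""
    return score / cores


# Sort from highest to lowest per-core score.
ranked = sorted(RESULTS, key=lambda r: score_per_core(r[2], r[1]), reverse=True)
for system, cores, score in ranked:
    print(f"{system:<30} {cores:>2} cores  {score_per_core(score, cores):6.2f} per core")
```

Sorting by score per core rather than total score reflects how license-bound environments evaluate servers: each single-threaded job occupies one core and one license, so per-core throughput determines how much work each license completes.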

Table 3 illustrates the advantages of AMD EPYC 7003 series processors over comparable competitive offerings across multiple points of comparison.<sup>19</sup>

Table 3. AMD EPYC 7003 series processors provide superior clock speed, L3 cache, and cache per core

|                            | Intel Xeon Gold 6346<sup>20</sup> | AMD EPYC 73F3<sup>21</sup> | AMD EPYC 7373X<sup>22</sup> |
|----------------------------|-----------------------------------|----------------------------|-----------------------------|
| Cores                      | 16                                | 16                         | 16                          |
| Total L3 cache             | 36 MB                             | 256 MB                     | 768 MB                      |
| L3 cache per core          | 2.25 MB                           | 16 MB                      | 48 MB                       |
| Memory speed               | 3,200 MT/s                        | 3,200 MT/s                 | 3,200 MT/s                  |
| Memory channels            | 8                                 | 8                          | 8                           |
| Base clock                 | 3.10 GHz                          | 3.50 GHz                   | 3.05 GHz                    |
| Boost clock<sup>23</sup>   | Up to 3.60 GHz<sup>24</sup>       | Up to 4.00 GHz             | Up to 3.80 GHz              |
| Max memory                 | 6 TB<sup>25</sup>                 | 4 TB                       | 4 TB                        |
| PCIe lanes                 | 64                                | 128                        | 128                         |

When these specifications are plotted as illustrated in Figure 5A, the differences become apparent. The AMD EPYC 73F3 offers dramatically more L3 cache and cache per core, as well as double the number of PCIe lanes per socket, compared with the Intel Xeon Gold 6346.



Figure 5A. Comparing AMD EPYC 73F3 to an alternative processor

<sup>19</sup> These comparisons were made in May 2022. The Intel Xeon Gold 6346 and AMD EPYC 73F3 were both introduced in Q2 2021. Both processors feature high clock speeds, the same number of cores/threads, and the same number of memory channels.

<sup>20</sup> ark.intel.com/content/www/us/en/ark/products/212457/intel-xeon-gold-6346-processor-36m-cache-3-10-ghz.html

<sup>21</sup> amd.com/en/products/cpu/amd-epyc-73f3

<sup>22</sup> amd.com/en/products/cpu/amd-epyc-7373x

<sup>23</sup> Maximum boost for AMD EPYC processors is the maximum frequency achievable by any single core on the processor under normal operating conditions for server systems.

<sup>24</sup> Maximum Turbo Frequency is the maximum single-core frequency at which the processor is capable of operating using Intel Turbo Boost Technology and, if present, Intel Turbo Boost Max Technology 3.0 and Intel Thermal Velocity Boost.

<sup>25</sup> See Intel Xeon Gold 6346 Processor specs at ark.intel.com/content/www/us/en/ark/products/212457/intel-xeon-gold-6346-processor-36m-cache-3-10-ghz.html. Note that 6 TB maximum memory assumes the use of Intel® Optane® Persistent Memory. With DRAM, maximum memory capacity on the 6346 processor is 4 TB (same as the EPYC 73F3).

The differences between AMD and alternative processors in terms of L3 cache and cache per core are even more dramatic when comparing the AMD EPYC 7373X with AMD 3D V-Cache to the Intel Xeon Gold 6346 as illustrated in Figure 5B.



Figure 5B. Comparing AMD EPYC 7373X to an alternative processor

#### Choosing the right server platform

For EDA server farm administrators, choosing the right processor and server platform can be challenging. There are multiple AMD EPYC processors with different capabilities and price points. Also, applications have different performance characteristics. Given the high cost of software tools, faster processors are generally preferred, but the additional investment to get the most capable processors only makes sense if it results in tangible throughput gains.

Organizations routinely face trade-offs between throughput, cost, and productivity. This concept is illustrated in Figure 6. Organizations can purchase more expensive and capable processors (represented by the right side of the curve), but if the additional investment doesn't result in better application performance, the investment is wasted and throughput per dollar spent decreases.
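The trade-off can be illustrated with a small throughput-per-dollar comparison. Every number below is hypothetical, chosen only to show the shape of the calculation; real inputs would come from measured throughput and actual hardware and license pricing.

```python
from dataclasses import dataclass


@dataclass
class ServerOption:
    name: str
    sims_per_day: float   # measured throughput for the workload of interest
    hardware_cost: float  # annualized hardware cost (hypothetical)
    license_cost: float   # annual license cost attributed to the server (hypothetical)

    def sims_per_day_per_dollar(self) -> float:
        """Daily throughput divided by total annual cost."""
        return self.sims_per_day / (self.hardware_cost + self.license_cost)


# Hypothetical scenarios: a premium CPU that speeds up the workload,
# and the same premium CPU on a workload that does not benefit from it.
baseline = ServerOption("baseline CPU", 1000, 10_000, 100_000)
premium_gain = ServerOption("premium CPU, workload benefits", 1300, 15_000, 100_000)
premium_flat = ServerOption("premium CPU, no speedup", 1000, 15_000, 100_000)

for option in (baseline, premium_gain, premium_flat):
    print(f"{option.name:<32} {option.sims_per_day_per_dollar():.5f} sims/day per dollar")
```

When license costs dominate, a more expensive processor pays for itself only if it raises throughput; if the workload does not benefit, the extra hardware spend lowers throughput per dollar.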

This is why EDA server farm IT administrators usually deploy different server types for different application workloads. Verification engineers and EDA IT administrators need to weigh multiple considerations and determine what processor and server are optimal for each EDA application.

#### Performance

- Throughput
- Turnaround time
- Productivity
- License utilization
- Time to market

#### Cost

- HW costs
- SW costs
- Facilities costs
- Management costs
- Asset depreciation







Figure 6. Selecting the optimal processor and server platform involves balancing many factors

HPE Apollo 2000 Gen10 Plus system with 4 x HPE ProLiant XL225n Gen10 Plus servers powered by AMD EPYC processors achieved ten world records on SPECpower\_ssj2008, making it the most energy-efficient multinode server in the world.<sup>29</sup>

<sup>26</sup> HPE has conducted internal performance testing with the latest AMD EPYC 7003 series processors. These results can be shared with a non-disclosure agreement (NDA) so that customers can make an informed decision about the optimal server and processor.

<sup>27</sup> PCIe 4.0 delivers 16.0 GT/s, twice the transfer speed of PCIe 3.0; en.wikipedia.org/wiki/PCI\_Express.

<sup>28</sup> HPE HDR InfiniBand adapters are based on standard Mellanox ConnectX-6 technology.

<sup>29</sup> HPE ProLiant XL225n Gen10 Plus achieves 10 records on SPECpower\_ssj®2008

Factors to consider include technology refresh cycles, management costs, a customer's unique application mix, license utilization, license checkout time, TCO considerations, and more.

Unfortunately, there is no one-size-fits-all solution. AMD EPYC 7003 processors generally offer excellent throughput relative to processor costs for most EDA simulation workloads. For cache-intensive RTL simulations, AMD EPYC 7003 processors with 3D V-Cache technology are a good choice. HPE has benchmarked multiple workloads on different AMD EPYC processors and can help customers navigate the trade-offs between license costs, hardware costs, and throughput improvements, and help customers choose a server configuration optimal for their workload.<sup>26</sup>

# The HPE Apollo 2000 Gen10 Plus system

The HPE Apollo 2000 Gen10 Plus system is a dense, multiserver platform delivering tremendous performance, throughput, and workload flexibility in a small data center space footprint. Based on leading-edge AMD EPYC processors, HPE Apollo 2000 Gen10 Plus systems deliver twice the density of traditional rack-mount servers. Each chassis supports up to four dual-processor HPE ProLiant XL225n Gen10 Plus hot-plug servers, each with 2 TB of high-performance 3,200 MT/s DDR4 memory in just two rack units (2U).

For EDA environments, HPE Apollo 2000 Gen10 Plus systems provide the ideal blend of features. They offer exceptional simulation performance, expanded power capacity with 3,000W power supplies, N+N redundant power, and increased thermal capacity and airflow to reliably support long-running, high-throughput EDA simulations.



Figure 7. HPE Apollo 2000 Gen10 Plus system features

With support for the full family of AMD EPYC 7002 and 7003 series processors, including 7003 series processors with 3D V-Cache technology, EDA server farm IT administrators can configure systems to precisely meet workload demands. Customers can choose high-frequency EPYC 7xF3 processors with fewer cores per processor to optimize per-core performance or select high-throughput parts such as the EPYC 7763 processors with 64 cores. For demanding RTL workloads that benefit from large amounts of cache, customers might consider HPE ProLiant XL225n Gen10 Plus hot-plug servers populated with EPYC 7373X CPUs.

Fast I/O is also critical for EDA server farms to help ensure that file and network I/O do not emerge as bottlenecks. HPE Apollo 2000 Gen10 Plus systems offer PCIe Gen4, providing twice the throughput of the previous generation.<sup>27</sup> HPE offers a variety of high-performance PCIe options, including 200 Gbps HPE HDR InfiniBand adapters,<sup>28</sup> multiport 100GbE adapters, and high-performance NVMe SSD drives. Multiple storage options are available inside the chassis ranging from 0 to 24 SFF SAS/SATA hard drives.

#### **Optional liquid cooling**

For customers with suitably equipped data centers, a new option for HPE Apollo 2000 Gen10 Plus systems is plug-and-play support for Direct Liquid Cooling (DLC). The DLC option allows customers to increase power density and data center efficiency. HPE server racks connect directly to facility water supplies without the need for secondary plumbing. Options are available for CPU-only or CPU-plus-memory cooling. While air cooling is fine for most applications using the latest AMD EPYC processors described in this document, HPE may recommend the DLC option for specific dense configurations.

## **HPE ProLiant servers**

For EDA customers that prefer 1U, single-processor systems, the HPE ProLiant DL325 Gen10 Plus v2 server is an excellent solution. This server has modest power and cooling requirements and fits easily into most data center environments.

For physical design workloads that require large amounts of memory, either HPE ProLiant DL365 Gen10 Plus or HPE ProLiant DL385 Gen10 Plus v2 servers are good choices. Both servers support up to 8 TB of memory, critical for memory-intensive EDA applications such as placement and routing.<sup>30</sup>
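The 8 TB figure can be checked with simple arithmetic. The sketch below assumes a dual-socket server with eight memory channels per socket, populated with the 256 GB LRDIMMs at two DIMMs per channel described in the accompanying footnote. The function name is illustrative, and the dual-socket count is an assumption not stated in the paragraph above.

```python
def max_memory_tb(sockets: int, channels_per_socket: int,
                  dimms_per_channel: int, dimm_gb: int) -> float:
    """Total installed memory in TB (1 TB = 1,024 GB)."""
    total_gb = sockets * channels_per_socket * dimms_per_channel * dimm_gb
    return total_gb / 1024


# Assumed configuration: 2 sockets x 8 channels x 2 DIMMs x 256 GB
full = max_memory_tb(sockets=2, channels_per_socket=8, dimms_per_channel=2, dimm_gb=256)
# Same server with one 256 GB LRDIMM per channel
half = max_memory_tb(sockets=2, channels_per_socket=8, dimms_per_channel=1, dimm_gb=256)
print(f"two DIMMs per channel: {full:.0f} TB")
print(f"one DIMM per channel:  {half:.0f} TB")
```

Reaching the full capacity requires two DIMMs per channel, which is the configuration where the footnote notes a reduced memory transfer speed, so capacity and memory speed need to be weighed against each other for a given workload.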





The HPE ProLiant DL325 Gen10 Plus v2, HPE ProLiant DL365 Gen10 Plus, and HPE ProLiant DL385 Gen10 Plus v2 servers run the entire SKU stack of 3rd generation AMD EPYC processors including those that support 3D V-Cache technology. For customers running EPYC 7003 series processors, minimum OS requirements apply. Supported Linux operating environments include RHEL 8.3, SLES 12 SP5, and SLES 15 SP2.<sup>31</sup>

#### **Comprehensive server security and management**

For security-conscious design environments, HPE Apollo and HPE ProLiant systems provide runtime firmware validation that authenticates critical firmware at startup. Only HPE offers industry-standard servers with firmware anchored into silicon with HPE iLO 5<sup>32</sup> and silicon root of trust from HPE. Tied into the silicon root of trust from HPE is the AMD Secure Processor, a dedicated security processor embedded in the AMD EPYC SoC.

Customers can also take advantage of optional HPE Apollo Platform Manager (APM), a rack-level power and system management solution for HPE Apollo servers providing an enhanced graphical interface for ease of system management.<sup>33</sup> An optional HPE Apollo 2000 Rack Consolidation Module kit allows HPE iLO aggregation at the chassis level that can be daisy-chained to connect to a top-of-rack (TOR) management switch.

HPE Performance Cluster Manager (HPCM) is a complete integrated cluster management solution for HPE Apollo systems. HPCM provides system setup, hardware monitoring and management (aggregating system metrics and remote management from HPE iLO), cluster health management, image management, software updates, and power management.

<sup>30</sup> With the 256 GB LRDIMMs, memory transfer speed is limited to 2,933 MT/s when two DIMMs are installed per memory channel (required to install 8 TB). 3,200 MT/s transfer speeds are supported with one LRDIMM per channel and with smaller DIMM types.
<sup>31</sup> HPE ProLiant DL385 Gen10 Plus v2 server QuickSpecs

<sup>32</sup> HPE iLO is a remote server management processor embedded in the system boards of HPE ProLiant servers, providing lights-out operation.

<sup>33</sup> HPE Apollo Platform Manager QuickSpecs



HPE Apollo 2000 Gen10 Plus systems deliver sustained high performance across multiple cores.

EDA users can reduce regression runtime, help maximize license utilization, and help reduce TCO by delivering more simulation capacity with a smaller data center footprint.

<sup>34</sup> amd.com/en/corporate-responsibility/data-center-sustainability

<sup>35</sup> AMD EPYC Bare Metal and Greenhouse Gas Emissions TCO Estimation Tool

<sup>36</sup> HPE Apollo 2000 Gen10 Plus System with HPE ProLiant XL225n Gen10 Plus Servers Achieves 18 World Records in Energy Efficie

<sup>37</sup> The latest industry standard SPECpower\_ssj2008 results are detailed at SPECpower\_ssj2008 Results. The results referenced are as of March 15, 2021. Details of the four-node HPE Apollo XL225n Gen10 Plus benchmark result are provided here: spec.org/power\_ssj2008/results/res2021q1/power\_ssj2008-20210273-01073.html

<sup>38</sup> Annual energy cost and rack space calculated based on the performance envelope of a 42U rack populated with HPE ProLiant XL225n Gen10 Plus servers running at 100% utilization vs. the energy and rack space required by competitor products to achieve the same performance. Average price per kWh = \$0.0693.

<sup>39</sup> AMD benchmark result shared with HPE; AMD EPYC Family Claim Information | AMD

<sup>40</sup> MLNX-001A: EDA RTL Simulation comparison based on AMD internal testing completed on 9/20/2021 measuring the average time to complete a test case simulation. Comparing: 1 x 16C EPYC 7373X with AMD 3D V-Cache technology vs. 1 x 16C AMD EPYC 73F3 on the same AMD Daytona reference platform. Results may vary based on factors including silicon version, hardware and software configuration, and driver versions.

#### **Power efficiency**

For data center operators, power consumption and associated carbon emissions are increasingly important considerations—not just for environmental sustainability goals but to help reduce TCO as well. In September 2021, AMD announced an ambitious goal to deliver a 30x increase in energy efficiency for AMD EPYC CPUs and AMD Instinct accelerators used to power artificial intelligence (AI) training and high-performance computing applications by 2025.<sup>34</sup> AMD offers a Greenhouse Gas Emissions TCO estimation tool that can be used to estimate the potential savings and emission reductions with various AMD EPYC CPUs.<sup>35</sup> In addition to innovations in silicon, power efficiency gains stem from the fact that EDA users can run more concurrent simulations and get results faster, meaning that fewer nodes are required to deliver the same simulation throughput.

The HPE Apollo 2000 Gen10 Plus system with HPE ProLiant XL225n Gen10 Plus servers builds on the power efficiency of AMD EPYC 7003 series processors, delivering 18 world records in energy efficiency.<sup>36</sup> These dense multi-node servers deliver real space and power savings to data centers of any size. Taking energy efficiency to the maximum, the HPE ProLiant XL225n Gen10 Plus server has the highest result of 17,696 overall ssj\_ops/watt for 4-node blade configurations on the SPECpower\_ssj® 2008 benchmark.<sup>37</sup> With the HPE Apollo 2000 Gen10 Plus system, it is estimated that customers can see up to \$15,000 in annual energy cost savings.<sup>38</sup>
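To put the savings estimate in physical terms, the following sketch converts the quoted "up to \$15,000" annual figure into energy and average power using the \$0.0693 per kWh price given in the accompanying footnote. It is a back-of-the-envelope conversion only: it assumes the rack runs continuously for a full year and says nothing about how the original estimate was derived.

```python
ANNUAL_SAVINGS_USD = 15_000   # "up to" figure quoted in the text
PRICE_PER_KWH_USD = 0.0693    # average price per kWh from the footnote
HOURS_PER_YEAR = 24 * 365     # assumption: continuous operation, non-leap year

# Energy that the quoted savings corresponds to at the stated price.
energy_saved_kwh = ANNUAL_SAVINGS_USD / PRICE_PER_KWH_USD

# Equivalent constant power draw avoided over the whole year.
avg_power_saved_kw = energy_saved_kwh / HOURS_PER_YEAR

print(f"Energy saved per year: {energy_saved_kwh:,.0f} kWh")
print(f"Average power avoided: {avg_power_saved_kw:.1f} kW")
```

At this price, the quoted savings corresponds to roughly 216,000 kWh per year, or an average of about 25 kW of avoided draw per rack-equivalent of performance.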

#### **Performance where it matters**

AMD internal testing shows that AMD 3D V-Cache technology can dramatically accelerate EDA RTL workloads.<sup>39</sup> The 16-core AMD EPYC 7373X CPU can deliver up to 66% faster simulations on Synopsys VCS when compared to the EPYC 73F3 CPU.<sup>40</sup> The throughput advantage provided by AMD 3D V-Cache technology is illustrated in Figure 9.

With RTL simulations and other EDA applications, multiple simulations typically run simultaneously on the same server, each on a dedicated processor core. Because software licenses are a precious resource, EDA users need to maximize per-core performance to help minimize license checkout time. Semiconductor firms look for processors that deliver the best per-core performance while simultaneously supporting the most concurrent simulations to optimize resource utilization.
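The link between per-core performance and license checkout time can be sketched with a simple model. The example below is hypothetical: the job count, license count, and one-hour baseline runtime are invented for illustration, and "66% faster" is interpreted here as a 1.66x throughput gain (each job finishing in 1/1.66 of the baseline time). Actual gains depend on the design and the tool.

```python
import math


def regression_wall_clock_hours(num_jobs: int, hours_per_job: float, licenses: int) -> float:
    """Wall-clock time when every job holds one license and one core for its full run."""
    waves = math.ceil(num_jobs / licenses)  # jobs run in batches limited by licenses
    return waves * hours_per_job


# Hypothetical regression suite (illustrative values, not from the source).
NUM_JOBS = 1_000
LICENSES = 100
BASELINE_HOURS_PER_JOB = 1.0
SPEEDUP = 1.66  # assumed reading of "up to 66% faster"

baseline_hours = regression_wall_clock_hours(NUM_JOBS, BASELINE_HOURS_PER_JOB, LICENSES)
accelerated_hours = regression_wall_clock_hours(
    NUM_JOBS, BASELINE_HOURS_PER_JOB / SPEEDUP, LICENSES
)

# License-hours consumed: each job holds a license for its entire runtime.
baseline_license_hours = NUM_JOBS * BASELINE_HOURS_PER_JOB
accelerated_license_hours = NUM_JOBS * BASELINE_HOURS_PER_JOB / SPEEDUP

print(f"Regression wall clock: {baseline_hours:.1f} h -> {accelerated_hours:.1f} h")
print(f"License-hours used:    {baseline_license_hours:.0f} -> {accelerated_license_hours:.0f}")
```

In this model, the same pool of licenses completes the regression in about 60% of the original time, so the freed license-hours can be spent on additional verification runs rather than on additional licenses.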



Figure 9. AMD EPYC 7373X vs. EPYC 73F3 CPU running Synopsys VCS

While the benefit of AMD 3D V-Cache technology will vary depending on the EDA model, for memory and cache-intensive workloads, the latest AMD EPYC processors can substantially boost productivity. Customers will typically want to use AMD EPYC 7003 processors for most applications and use processors with 3D V-Cache technology only for workloads that benefit from the additional cache.

To learn more about AMD EPYC 7003 series processors, visit amd.com/en/processors/epyc-7003series.






With HPE Apollo 2000 Gen10 Plus systems and HPE ProLiant servers based on AMD EPYC processors, EDA users can:

- Reduce regression runtimes to help maximize productivity
- Enable high verification throughput to improve design quality
- Help maximize EDA software license utilization to help minimize cost
- Significantly reduce data center footprint by running more simulations per server on high-core-count CPUs

#### **Solutions ideal for EDA workloads**

Whether large or small, silicon design firms are dealing with multiple challenges, including increasing design complexity, time-to-market pressures, and the high cost of engineering talent and software tools. Electronic devices increasingly require more thorough verification as new applications demand high levels of reliability and safety.

HPE servers powered by AMD EPYC processors provide an important new tool and added flexibility for organizations needing to improve the productivity and efficiency of their chip design environments. While results will vary depending on the EDA tools used and the models simulated, AMD EPYC processors with 3D V-Cache technology can deliver dramatic performance gains in some instances—up to 66% for specific RTL simulation workloads.<sup>41</sup>

By deploying HPE Apollo 2000 Gen10 Plus systems or HPE ProLiant servers, customers can:

- Accelerate the design process to meet time-to-market pressures
- Improve product quality and meet more stringent reliability requirements with the capacity to run more simulation and verification workloads within available time frames
- Help maximize value from limited IT budgets by deploying cost-effective, higher throughput systems that deliver improved server farm utilization, more efficient software license utilization, and better engineering productivity

# Learn more at

- hpe.com/servers/apollo2000
- hpe.com/us/en/servers/proliant-servers.html
- hpe.com/partners/amd

© Copyright 2022 Hewlett Packard Enterprise Development LP. The information contained herein is subject to change without notice. The only warranties for Hewlett Packard Enterprise products and services are set forth in the express warranty statements accompanying such products and services. Nothing herein should be construed as constituting an additional warranty. Hewlett Packard Enterprise shall not be liable for technical or editorial errors or omissions contained herein.

AMD is a trademark of Advanced Micro Devices, Inc. Intel Optane, Intel, Intel Xeon Gold, and Intel Xeon Platinum are trademarks of Intel Corporation or its subsidiaries in the U.S. and/or other countries. Linux is the registered trademark of Linus Torvalds in the U.S. and other countries. Red Hat is a registered trademark of Red Hat, Inc. in the United States and other countries. All third-party marks are property of their respective owners.

a50002333ENW, Rev. 2