Cortex-A15 Architecture Analysis: Exploring the Secret of Powerful Performance

This year's trend in new mobile phones is the across-the-board move to quad-core processors, but not all quad-core chips are alike, and actual performance varies widely. For example, quad-core phones for the entry-level and mainstream market generally use Cortex-A7 and Cortex-A9 CPU cores. These cores offer lower performance but also lower cost and heat output, so they are popular in the entry-level market.

High-end smartphones have seen some new developments. In addition to the quad-core chips based on Qualcomm's Krait architecture that emerged last year, ARM's own Cortex-A15 has also arrived in quad-core phones, in chips such as Samsung's Exynos 5 Octa and NVIDIA's Tegra 4.

The Cortex-A15 is the most powerful CPU core architecture in the ARM Cortex-A family and was released in 2010. Texas Instruments was the first, in 2011, to announce a processor based on this architecture, the OMAP 5.

Compared with ARM's Cortex-A7, Cortex-A9 and other microarchitectures, the Cortex-A15 is very different.

Both the A15 and the A9 support out-of-order execution, but the Cortex-A15 has twice the instruction issue ports and execution resources, 50% higher instruction decode capability, stronger dynamic branch prediction (it adopts a multi-level branch target buffer), and wider instruction fetch bandwidth (128 bits vs. 64 bits), all of which make the A15's pipeline more efficient. In addition, the A15 uses a VFPv4 floating-point unit, which can execute FMA instructions and hardware divide instructions. The A9's peak vector floating-point performance is only about half that of the A15.

In practice, however, the A15's real opponent is Krait, an ARMv7-A compatible processor architecture designed by Qualcomm. Qualcomm has not revealed many details about Krait's architecture. Roughly, it has three instruction decode ports (the same as the A15), seven execution ports (eight for the A15), four issue ports (eight for the A15), and a 4KB + 4KB L0 cache with single-cycle latency.

If you use the old Dhrystone DMIPS/MHz figure as a performance measure, Krait scores 3.3, the A9 2.5, and the A15 3.5. On paper, Krait really does look like a fitting opponent for the A15.
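To make the on-paper comparison concrete, here is a minimal sketch that scales the quoted DMIPS/MHz figures by clock speed. The function names are illustrative, the linear scaling with frequency is an assumption, and the clock speeds are simply the ones this article mentions for the Tegra 4 and the APQ8064.

```python
# DMIPS/MHz figures as quoted in the text above.
DMIPS_PER_MHZ = {"Cortex-A9": 2.5, "Krait": 3.3, "Cortex-A15": 3.5}


def estimated_dmips(core: str, clock_mhz: float) -> float:
    """Scale a core's DMIPS/MHz figure linearly by its clock speed (assumption)."""
    return DMIPS_PER_MHZ[core] * clock_mhz


def relative_to(core: str, baseline: str) -> float:
    """Per-clock Dhrystone throughput of `core` relative to `baseline`."""
    return DMIPS_PER_MHZ[core] / DMIPS_PER_MHZ[baseline]


if __name__ == "__main__":
    # Clock speeds mentioned in this article: Tegra 4 at 1.9GHz, APQ8064 at 1.7GHz.
    for core, mhz in (("Cortex-A15", 1900), ("Krait", 1700)):
        print(f"{core} @ {mhz} MHz: about {estimated_dmips(core, mhz):.0f} DMIPS")
    print(f"A15 vs Krait per clock: {relative_to('Cortex-A15', 'Krait'):.2f}x")
    print(f"A15 vs A9 per clock:    {relative_to('Cortex-A15', 'Cortex-A9'):.2f}x")
```

By this measure the A15 is only about 6% ahead of Krait per clock, while it is 40% ahead of the A9.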

However, Dhrystone's shortcomings are obvious. It fits entirely in the CPU's L1 cache. This means that it cannot exercise the L2 cache (the A15 uses an integrated design and Krait a separate one; the integrated design can cut much of the latency caused by memory swapping), the hardware efficiency and complexity of out-of-order execution, or the memory subsystem (the A15's memory unit can execute a load instruction early under certain conditions; whether Krait has such a capability is not clear). As a result, it cannot give a valuable assessment of how many of these architectural differences affect actual performance.

Of course, the DMIPS figure that ARM uses does not actually come from the Dhrystone of 28 years ago but from EEMBC's CoreMark (in fact, CoreMark is an improved version of the former, mainly limiting pre-optimization and imposing stricter testing rules). But CoreMark also fits in the L1 cache of most processors today, so Dhrystone's problem of not reflecting the real applications of today's mobile devices still exists here.

As application environments grow more complex, properly evaluating the performance of a mobile processor becomes harder. The web browsing, 3D games, audio and video, artificial intelligence, and other workloads of today's mobile devices cannot fit entirely in the L1 cache, because these applications process large amounts of data.
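The cache argument can be illustrated with a minimal sketch that asks which level of the memory hierarchy a working set fits in. The cache sizes and workload sizes below are illustrative assumptions, not measurements of any particular chip or benchmark.

```python
KB = 1024
MB = 1024 * KB

# Hypothetical cache hierarchy, smallest level first (sizes are assumptions).
CACHE_LEVELS = [("L1", 32 * KB), ("L2", 2 * MB)]


def smallest_level_that_fits(working_set_bytes: int, levels=CACHE_LEVELS) -> str:
    """Return the first cache level large enough to hold the working set."""
    for name, size in levels:
        if working_set_bytes <= size:
            return name
    return "DRAM"  # larger than every cache level


if __name__ == "__main__":
    # Hypothetical working sets: a tiny benchmark kernel versus one
    # uncompressed 1920x1080 frame at 4 bytes per pixel.
    workloads = {
        "small benchmark kernel": 16 * KB,
        "one 1080p RGBA frame": 1920 * 1080 * 4,
    }
    for name, size in workloads.items():
        print(f"{name}: {size} bytes -> {smallest_level_that_fits(size)}")
```

A benchmark in the first category never leaves L1, so it says nothing about the L2 cache or the memory subsystem that the second category depends on.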

At this point, the experience and testing methods learned from desktop performance evaluation can be applied to mobile devices. For CPU testing, the most reasonable method is to take the source code of real applications covering a range of computation scales, compile it to native code, and run it. In this way both the compute units and the memory units of the mobile device are fully exercised, and the test results are the most informative.

The CPU test that is officially recognized by the industry (both the computer industry and academic research) is SPEC.org's SPEC CPU. It is distributed as source code, which testers compile to native code for testing. Many processors use SPEC CPU as their most important performance evaluation indicator from the very beginning of development.

The latest version of SPEC CPU is CPU2006, but CPU2006 targets the application environment of current desktop, workstation, and server processors. Its requirements for memory capacity (CPU2006 supports multi-threaded testing, so the memory required is quite high; even 16 GB is a bit tight for an 8-thread processor) and for storage space (several GB before compilation, 1xGB after compilation) are high, so using CPU2006 on current mobile devices is not realistic.

SPEC CPU is updated every few years. The version before CPU2006 was CPU2000, whose integer speed test can run on a mobile device with 1 GB of memory. In the past, some CPU2000 tests were even ported to GPUs for accelerated performance testing.

The ARM camp rarely publishes SPEC CPU results, and there is a reason for this: for much of the past, ARM devices had only a few hundred megabytes of memory, and even less was left for programs once the operating system was loaded. In addition, because power saving was the overriding requirement, the performance of ARM processors was actually not very good.

Interestingly, NVIDIA, a member of the ARM camp, announced CPU2000 INT results when the Tegra 4 was released: on NVIDIA's reference platform clocked at 1.9GHz, the Tegra 4's SPEC CPU2000 int_base score is 1168. This result is equivalent to the AMD K8 Sledgehammer 2GHz result published on SPEC.org in the fourth quarter of 2003.
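For readers unfamiliar with how a single CPU2000 number such as this one is formed: each benchmark's ratio is its reference time divided by the measured time, multiplied by 100, and the overall score is the geometric mean of those ratios. The sketch below shows that arithmetic only; the benchmark names and timings are made-up placeholders, not results from any real run.

```python
import math


def spec_ratio(reference_seconds: float, measured_seconds: float) -> float:
    """CPU2000-style ratio: 100 means as fast as the reference machine."""
    return 100.0 * reference_seconds / measured_seconds


def geometric_mean(values) -> float:
    """Geometric mean, computed via logarithms to avoid overflow."""
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))


if __name__ == "__main__":
    # Hypothetical (reference time, measured time) pairs in seconds.
    timings = {
        "bench_a": (1400.0, 140.0),
        "bench_b": (1800.0, 150.0),
        "bench_c": (1100.0, 100.0),
    }
    ratios = {name: spec_ratio(ref, run) for name, (ref, run) in timings.items()}
    for name, ratio in ratios.items():
        print(f"{name}: ratio {ratio:.0f}")
    print(f"overall score: {geometric_mean(ratios.values()):.0f}")
```

Because the overall score is a geometric mean, one unusually fast benchmark cannot dominate the result the way it would in an arithmetic average.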

NVIDIA also ran the CPU2000 test on the Xiaomi Phone 2 (which uses the Qualcomm Snapdragon S4 Pro, APQ8064, at 1.7GHz), and estimated CPU2000 results for the S800 from the S800's changes in IPC (instructions per cycle) and frequency relative to the S600.

According to NVIDIA's chart, the S600's CPU2000 int_base result is less than half that of the Tegra 4, which largely reflects the real-application difference between the Cortex-A15 and the Krait processor.
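An estimate of the kind NVIDIA is said to have made for the S800 can be sketched as scaling a measured score by the relative change in IPC and in clock frequency. This is a rough first-order model assumed here for illustration, not NVIDIA's published method, and every number in the example is a placeholder.

```python
def estimate_score(base_score: float, ipc_ratio: float,
                   base_ghz: float, new_ghz: float) -> float:
    """First-order estimate: score scales with IPC and with clock frequency.

    This ignores memory latency and bandwidth, which do not scale with the
    core clock, so real results can come in lower than the estimate.
    """
    return base_score * ipc_ratio * (new_ghz / base_ghz)


if __name__ == "__main__":
    # Placeholder inputs: a measured score of 1000 at 1.5GHz, a 10% IPC gain,
    # and a new clock of 2.0GHz.
    estimate = estimate_score(base_score=1000.0, ipc_ratio=1.10,
                              base_ghz=1.5, new_ghz=2.0)
    print(f"estimated score: {estimate:.0f}")
```

Any such projection is only as good as its IPC and frequency inputs, which is one more reason independently measured results would be preferable.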

It should be pointed out that the test platforms on each side also have some influence. For example, NVIDIA does not explain whether the Xiaomi Phone 2 throttled its CPU frequency while running this test.

In general, when the APQ8064 runs all four cores at full speed, its frequency drops from the 1.7GHz maximum after a while because of overheating. Of course, what NVIDIA announced here are CPU2000 INT results in speed mode, which is a single-threaded test that uses only one CPU core.

Unfortunately, Qualcomm has not raised any objection to this test result (it is said that Qualcomm does not care much about peak processor performance; the joke is that you buy the baseband and get the CPU for free), and configuring CPU2000 is quite complicated for the average person, so for now no third party has corroborated this result with tests on the same platforms.

When VIA released the Nano X2 processor, it published a document that also used CPU2000 to test the Nano X2 1.2+GHz and the Atom D525. With the gcc compiler, the CPU2000 INT scores are 799 and 582 respectively; with the Intel compiler, they are 955 and 725 respectively.
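The four scores in VIA's document show how much the compiler alone can move a CPU2000 result. The sketch below only does arithmetic on the figures quoted above; the dictionary layout and function names are illustrative.

```python
# CPU2000 INT scores as quoted from VIA's document.
SCORES = {
    "gcc": {"Nano X2": 799, "Atom D525": 582},
    "Intel compiler": {"Nano X2": 955, "Atom D525": 725},
}


def compiler_gain(cpu: str) -> float:
    """Score with the Intel compiler divided by the score with gcc."""
    return SCORES["Intel compiler"][cpu] / SCORES["gcc"][cpu]


def nano_lead(compiler: str) -> float:
    """Nano X2 score divided by Atom D525 score under the same compiler."""
    return SCORES[compiler]["Nano X2"] / SCORES[compiler]["Atom D525"]


if __name__ == "__main__":
    for cpu in ("Nano X2", "Atom D525"):
        print(f"{cpu}: Intel compiler vs gcc = {compiler_gain(cpu):.3f}x")
    for compiler in SCORES:
        print(f"{compiler}: Nano X2 vs Atom D525 = {nano_lead(compiler):.3f}x")
```

On the same hardware, switching compilers changes the score by roughly a fifth to a quarter, so knowing which compiler a vendor used matters when comparing published numbers.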

NVIDIA's Tegra 4 CPU implements the ARMv7-A instruction set, so the compiler was probably armcc or gcc. PGI, which NVIDIA recently acquired, is a veteran compiler vendor and might be able to provide NVIDIA with an internal beta, but PGI has never released an ARM compiler before.

At this point, you should have a general understanding of the architectural features of the Cortex-A15 and of how its performance differs from that of some of its competitors. But how does such a flagship processor fare in actual smart devices?
