Nvidia hopes that the 64-bit “Denver” version of its Tegra K1 processor will offer PC-like performance in a tablet form factor, and on Monday the company released its first benchmarks to back up that claim. At the Hot Chips conference in San Jose, Nvidia revealed some of the differences between its 32-bit, quad-core Tegra K1 chip, which debuted at CES in January, and the 64-bit, dual-core version of the same chip. While Nvidia has already shipped the 32-bit Tegra K1, most recently in the Acer Chromebook 13, the “Denver” version of the Tegra K1 has not yet been released. Denver will run slightly faster than the 32-bit K1: up to 2.5GHz, versus 2.3GHz for the 32-bit chip.
One of the highlights of the first day of this year’s Hot Chips, an annual conference on chip technology, was Nvidia’s unveiling of Denver, a custom CPU used in an upcoming version of the Tegra K1 processor. There are two versions of the Tegra K1. The 32-bit version has four Cortex-A15 CPU cores running at up to 2.3GHz, 32KB L1 instruction and data caches, and 2MB of L2 cache. This chip is available now and is used in a handful of devices, including the Shield tablet, the Xiaomi MiPad sold in China, and the Acer Chromebook 13 announced earlier today. The second version has two custom Denver CPU cores based on the 64-bit ARMv8 instruction set, running at up to 2.5GHz, with 128KB of L1 instruction cache, 64KB of L1 data cache, and 2MB of L2 cache. Both use the same Kepler GPU with 192 CUDA cores. The two chips are pin-compatible, which should make it easy to design devices that work with either one.
The idea behind Denver, according to Darrell Boggs, Nvidia’s Director of CPU Architecture and Principal Architect, is to deliver PC-class performance in mobile devices while remaining compatible with the massive ARM hardware and software ecosystem. To reach that level of performance, Denver uses a 7-wide superscalar architecture, meaning it can execute up to seven instructions per clock cycle, compared with three per clock for the A15. It also uses “aggressive” hardware prefetching, a common technique that moves data closer to the CPU before it is needed, so the processor spends less time waiting on memory.
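To put the width and clock figures side by side, here is a back-of-the-envelope sketch of theoretical peak instruction throughput for the two Tegra K1 versions. It uses only the core counts, top clock speeds and per-clock widths quoted above, and it assumes every core sustains its maximum width on every cycle at its top clock, which real workloads never do. The function name `peak_gips` is purely illustrative, not anything Nvidia published.

```python
# Rough theoretical peak instruction throughput for the two Tegra K1 versions.
# Assumption (hypothetical best case): every core issues its full width on every
# cycle at its top clock speed. Real sustained throughput is far lower.

CHIPS = {
    "Tegra K1 (32-bit, Cortex-A15)": {"cores": 4, "ghz": 2.3, "width": 3},
    "Tegra K1 (64-bit, Denver)": {"cores": 2, "ghz": 2.5, "width": 7},
}


def peak_gips(cores: int, ghz: float, width: int) -> tuple[float, float]:
    """Return (per-core, whole-chip) peak throughput in billions of instructions/s."""
    per_core = width * ghz  # instructions per clock x billions of clocks per second
    return per_core, per_core * cores


if __name__ == "__main__":
    for name, spec in CHIPS.items():
        per_core, total = peak_gips(spec["cores"], spec["ghz"], spec["width"])
        print(f"{name}: {per_core:.1f} G instr/s per core, {total:.1f} G instr/s total")
```

On these idealized numbers, a single Denver core (about 17.5 billion instructions per second) has roughly two and a half times the peak of a single A15 core (about 6.9 billion), while the chip-level totals (35.0 versus 27.6) are much closer because the 32-bit version has twice as many cores. That gap in per-core peak is one way to see why a wider core might help single-threaded, PC-style workloads, though actual results depend on how often the pipeline can be kept full.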