It was only a matter of time before Nvidia took the pristine TU116 graphics processor in its GeForce GTX 1660 Ti and carved it up a little to create a lower-cost derivative. The new GeForce GTX 1660 is, unsurprisingly, very similar to the higher-end model in that it lacks the Turing architecture's signature RT and Tensor cores. Instead, it aims its on-die resources at accelerating today's rasterized games.
Nvidia does not even cut much from TU116's resource pool in creating the GeForce GTX 1660: a couple of Streaming Multiprocessors are excised, taking 128 CUDA cores and eight texture units with them. The GPU is otherwise quite complete. This card's biggest loss is its lack of GDDR6 memory. By swapping in 8 Gb/s GDDR5 instead, bandwidth drops from the 1660 Ti's 288 GB/s to a mere 192 GB/s.
Naturally, the GeForce GTX 1660 is primarily aimed at FHD gaming, where 6GB of slower memory will not hurt performance as much as it would at higher resolutions. Can the $220 / £200 board maintain a fast enough frame rate to stave off AMD's Radeon RX 590, which has more GDDR5 on a wider bus, though?
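To see where the 288 GB/s and 192 GB/s bandwidth figures come from, here is a rough back-of-the-envelope sketch. Peak memory bandwidth is the per-pin data rate multiplied by the bus width, divided by eight bits per byte. The 192-bit bus width and the 12 Gb/s GDDR6 data rate for the GeForce GTX 1660 Ti are assumptions not stated in the text above, and the function name is purely illustrative.

```python
def memory_bandwidth_gb_s(data_rate_gbps: float, bus_width_bits: int) -> float:
    """Peak bandwidth in GB/s: per-pin rate (Gb/s) x bus width (bits) / 8 bits per byte."""
    return data_rate_gbps * bus_width_bits / 8


# Assumption: both TU116-based cards use a 192-bit memory interface.
BUS_WIDTH_BITS = 192

# 8 Gb/s GDDR5 is the figure quoted for the GeForce GTX 1660.
gtx_1660 = memory_bandwidth_gb_s(8.0, BUS_WIDTH_BITS)

# Assumption: the GeForce GTX 1660 Ti's GDDR6 runs at 12 Gb/s per pin.
gtx_1660_ti = memory_bandwidth_gb_s(12.0, BUS_WIDTH_BITS)

# Fraction of the Ti's bandwidth lost by moving to slower GDDR5.
reduction = 1 - gtx_1660 / gtx_1660_ti

print(f"GTX 1660:    {gtx_1660:.0f} GB/s")
print(f"GTX 1660 Ti: {gtx_1660_ti:.0f} GB/s")
print(f"Reduction:   {reduction:.0%}")
```

Under those assumptions, the two results line up with the quoted figures, and the GeForce GTX 1660 gives up roughly a third of the Ti's memory bandwidth.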
TU116 Recap: Turing Without the RT and Tensor Cores
The GPU at the heart of the GeForce GTX 1660 is specifically named TU116-300-A1. It's a close relative of the GeForce GTX 1660 Ti's TU116-400-A1, trimmed from 24 Streaming Multiprocessors to 22. We're obviously still dealing with a processor lacking Nvidia's future-looking RT and Tensor cores, measuring 284mm² and composed of 6.6 billion transistors manufactured on a 12nm process. Despite its smaller transistors, TU116 is 42 percent larger than the GP106 that preceded it. Some of that growth is attributable to the more sophisticated shaders of the Turing architecture. Like the higher-end GeForce RTX 20-series cards, the GeForce GTX 1660 supports simultaneous execution of FP32 arithmetic instructions, which represent most shader workloads, and INT32 operations (for addressing/fetching data, floating-point min/max, compare, etc.). When you hear about Turing cores achieving better performance than Pascal at a given clock rate, this capability largely explains why.
Turing's Streaming Multiprocessors are composed of fewer CUDA cores than Pascal's, but the design compensates in part by spreading more SMs across each GPU. The newer architecture assigns one scheduler to each set of 16 CUDA cores (2x Pascal) along with one dispatch unit per 16 CUDA cores (same as Pascal). Four of these 16-core groupings make up the SM, along with 96KB of cache that can be configured as 64KB L1 / 32KB shared memory or vice versa, and four texture units. Because Turing doubles up on schedulers, it only needs to issue an instruction to the CUDA cores every other clock cycle to keep them full. In between, it's free to issue a different instruction to any other unit. In TU116, Nvidia replaces Turing's Tensor cores with 128 dedicated FP16 cores per SM, which allow the GeForce GTX 1660 to process half-precision operations at 2x the rate of FP32. The other Turing-based GPUs boast double-rate FP16 as well through their Tensor cores, so TU116's configuration serves to maintain that standard through hardware put in place specifically for this GPU. The accompanying chart is an updated version of the one published in our GeForce GTX 1660 Ti review, and it illustrates TU116's massive improvement in half-precision throughput compared to the GeForce GTX 1060 and its Pascal-based GP106 chip.
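The SM layout described above can be tallied up in a few lines. This is a minimal sketch, not Nvidia's own accounting: the class and function names are made up for illustration, the 1785 MHz clock is an assumed boost figure that does not appear in the text above, and peak throughput counts a fused multiply-add as two operations per core per clock.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TuringSM:
    """Per-SM resources for TU116, using the figures quoted in the article."""
    partitions: int = 4             # 16-core groupings per SM
    cores_per_partition: int = 16   # FP32 CUDA cores per grouping
    texture_units: int = 4
    fp16_cores: int = 128           # dedicated FP16 cores (TU116-specific)

    @property
    def cuda_cores(self) -> int:
        return self.partitions * self.cores_per_partition


def gpu_totals(sm_count: int, sm: TuringSM = TuringSM()) -> dict:
    """Scale per-SM resources up to a whole GPU with sm_count active SMs."""
    return {
        "cuda_cores": sm_count * sm.cuda_cores,
        "texture_units": sm_count * sm.texture_units,
        "fp16_cores": sm_count * sm.fp16_cores,
    }


def peak_tflops(units: int, clock_mhz: float, ops_per_clock: int = 2) -> float:
    """Theoretical peak: units x ops per clock (FMA = 2) x clock, in TFLOPS."""
    return units * ops_per_clock * clock_mhz * 1e6 / 1e12


gtx_1660 = gpu_totals(22)      # TU116-300-A1: 22 SMs
gtx_1660_ti = gpu_totals(24)   # TU116-400-A1: 24 SMs

# Assumption: a 1785 MHz boost clock, used only to put numbers on the ratio.
CLOCK_MHZ = 1785.0
fp32 = peak_tflops(gtx_1660["cuda_cores"], CLOCK_MHZ)
fp16 = peak_tflops(gtx_1660["fp16_cores"], CLOCK_MHZ)

print(gtx_1660)
print(gtx_1660_ti)
print(f"FP32: {fp32:.2f} TFLOPS, FP16: {fp16:.2f} TFLOPS, ratio {fp16 / fp32:.1f}x")
```

The difference between the two configurations works out to the 128 CUDA cores and eight texture units mentioned earlier. The FP16-to-FP32 ratio is 2x regardless of the clock chosen, since it follows directly from having 128 FP16 cores alongside 64 FP32 cores in each SM.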