Can ATI’s new architecture do what the HD2xxx and HD3xxx families of GPUs couldn’t?
Not long ago, both ATI and NVIDIA claimed that their latest 3D accelerators were a little more environment-friendly (read green) than before in terms of power consumption. The top priority, as we know (and one that conflicts with anything green), is a more powerful GPU, so that games can run faster and look better. Performance and efficiency are two goals that are rarely achieved together. When we looked at NVIDIA’s GTX2xx architecture last month (July 2008), we saw something more powerful than the earlier G80/G92, and therefore more power-hungry. We rejoice about the former; after all, a boost of 30 to 75 per cent is no small matter. Complain as we may about the latter, very little power saving can be expected when the transistor count nearly doubles (from 754 million to some 1.4 billion).
Power issues aside and visual nirvana firmly in sight, ATI was quick to offer the 48xx series a little after the GTX2xx launch. This was a no-brainer; after all, ATI has had its back firmly against the wall since November 2006 and the G80 (GeForce 8800GTX), an architecture to which ATI had absolutely no retort. Its floundering position in the GPU market, with failures like the HD2900XT and pseudo-failures like the HD3870 while 8800GTXs sold like hot cakes, is ample testament to this. ATI did have a fan following in the mid-range segment, but NVIDIA completely dominated the flagship segment, and the performance war gave way to a reign of peace, with the mighty G80 trampling ATI’s offerings beneath its 128 shader units, a.k.a. stream processors (SPs).
With multithreading in mind, ATI’s new Radeon HD4870 and HD4850 feature a staggering 800 SPs each, unheard of in any previous GPU. There are also plans to launch an X2 version, which is simply a larger PCB housing two HD4870 GPUs, taking the SP count to a colossal 1,600. In terms of fabrication, ATI has a slight advantage, its new series having migrated to a 55-nm manufacturing process much earlier. NVIDIA’s newer GPUs, such as the GeForce 9800GTX and the GeForce GTX280/260, are manufactured on a 65-nm process. All else being equal, a 65-nm core will run hotter than a 55-nm core, and it will also cost more, as more silicon is used per GPU. Note that NVIDIA also plans a die shrink to 55 nm with its (as yet unseen) GeForce 9800GTX+.
Codenamed RV770, the GPU behind the Radeon HD48xx series has also seen tremendous growth in transistor count, to 956 million, up from 666 million on the RV670. Of course, the GTX2xx is still the most populous GPU transistor-wise, with close to 1.4 billion, but this is bigger than anything we’ve seen from ATI. With its smaller die and newer fabrication process, the RV770 is also cheaper to manufacture than the GTX2xx.
Do note that we aren’t doing any in-depth comparison between the two companies’ offerings; this is strictly a look at ATI’s new wonder kid. Comparisons, if any, are made simply for the sake of better understanding. Last year (August 2007, to be exact) we looked at both NVIDIA’s and ATI’s then-new architectures, the G80 and the R600, in our insight feature Riders On The Stream. It was interesting to note that the two companies’ approaches to SPs were radically different. Instead of the rigid architecture that NVIDIA followed, fixing the number of SPs in relation to the number of texture processors and memory channel controllers, ATI used a more dynamic approach. While NVIDIA had 128 scalar SPs on the G80, ATI used 64 SPs. We say 64 SPs and not 320 because each of these SPs consists of five ALUs (Arithmetic Logic Units), which is how ATI arrived at the magic figure of 320 SPs, that is, 64 x 5. By the proper definition of an SP, ATI’s 320 units on the RV670 (Radeon HD38xx) cannot be termed SPs, any more than the HD48xx can claim to have 800 SPs, as they aren’t really independent scalar processors.
In essence, each of ATI’s 64 SPs is capable of working on more complex (read multithreaded) operations than NVIDIA’s simpler SPs. In ATI’s case, it’s also true that each cluster of five ALUs (one SP) can only work on a single thread at a time. For a complex, multithreaded operation this setup would be brilliant, as one ATI SP is more powerful than one NVIDIA SP. But for simple operations that can be handled by a single ALU, the other four ALUs in the SP would be unoccupied, or at best under-occupied. In that case, NVIDIA nullifies the advantage ATI has with its more powerful SP, and it comes down to sheer parallelism, where 128 SPs would outperform 64 SPs. This scenario has occurred often enough in the past, and was one reason for the HD3870’s failure against the 8800GTX.
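To put rough numbers on the occupancy argument above, here is a minimal Python sketch. It is a toy model of our own, not anything published by ATI or NVIDIA: the `ShaderArray` class and its methods are hypothetical names, and the model assumes every thread offers the same number of independent operations per clock while ignoring clock speeds, memory bandwidth and scheduling altogether.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ShaderArray:
    """Toy description of a GPU's shader array (hypothetical model)."""
    name: str
    units: int          # independent SPs, each handling one thread at a time
    alus_per_unit: int  # ALUs inside each SP

    def busy_alus(self, independent_ops: int) -> int:
        """ALUs doing useful work per clock when every thread offers
        `independent_ops` operations that can run side by side."""
        return self.units * min(independent_ops, self.alus_per_unit)

    def utilisation(self, independent_ops: int) -> float:
        """Fraction of all ALUs that are kept busy."""
        return self.busy_alus(independent_ops) / (self.units * self.alus_per_unit)


# SP and ALU counts as quoted in the article; everything else is assumed.
G80 = ShaderArray("G80 (8800GTX)", units=128, alus_per_unit=1)
RV670 = ShaderArray("RV670 (HD3870)", units=64, alus_per_unit=5)

if __name__ == "__main__":
    print("ops/thread  G80 busy  RV670 busy  RV670 utilisation")
    for ops in range(1, 6):
        print(f"{ops:^10}  {G80.busy_alus(ops):^8}  "
              f"{RV670.busy_alus(ops):^10}  {RV670.utilisation(ops):^17.0%}")
```

In this simplified picture, the 64 five-wide SPs only pull ahead of 128 scalar SPs once each thread can keep more than two ALUs busy. Real workloads sit somewhere in between, so treat this as an illustration of the trade-off rather than a performance prediction.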
ATI’s underlying architecture hasn’t changed, so the 800 SPs on the new HD48xx are really equal to 160 SPs (800/5). NVIDIA too has retained the same scalar architecture. So what has ATI done differently? In its press releases, ATI claims that the execution units on the HD48xx are 40 per cent more efficient than those on the earlier HD3870. We have no way to validate this, but the sheer number of SPs (160) is definitely enough to make the 48xx series far faster than anything else ATI has marketed before, the previous best being the HD3870 with 64 SPs.
Both ATI and NVIDIA are guilty of using unnecessarily confusing nomenclature and designations for the various parts that make up their GPUs. Both ATI GPUs, the RV670 and the RV770, are built around SIMD cores (see figure 5). Each SIMD core consists of 16 SPs grouped together. While the older RV670 had four such SIMD units, the new RV770 drops in an additional six, taking the total to ten. Apart from this increase in the number of SIMD units (and therefore SPs), the RV770 is identical to the RV670.
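Since the nomenclature gets confusing, the counting can be written out as a short Python sketch. It uses only the figures quoted in this article (16 SPs per SIMD unit, five ALUs per SP, four SIMD units on the RV670 and ten on the RV770); the constant and function names are our own and purely illustrative.

```python
ALUS_PER_SP = 5     # ALUs inside each SP
SPS_PER_SIMD = 16   # SPs grouped into one SIMD unit

# Number of SIMD units per GPU, as quoted in the article.
SIMD_UNITS = {"RV670": 4, "RV770": 10}


def sp_count(chip: str) -> int:
    """SPs in the stricter sense: one five-ALU cluster counts as one SP."""
    return SIMD_UNITS[chip] * SPS_PER_SIMD


def marketed_sp_count(chip: str) -> int:
    """The headline figure, which counts every ALU as an SP."""
    return sp_count(chip) * ALUS_PER_SP


if __name__ == "__main__":
    for chip in SIMD_UNITS:
        print(f"{chip}: {SIMD_UNITS[chip]} SIMD units, "
              f"{sp_count(chip)} SPs, {marketed_sp_count(chip)} marketed SPs")
    # Growth in SP count from the RV670 to the RV770.
    print(f"RV770 has {sp_count('RV770') / sp_count('RV670')}x the SPs of RV670")
```

Counted this way, the RV670 works out to 64 SPs (320 marketed) and the RV770 to 160 SPs (800 marketed), which is two and a half times the SP count before any per-unit efficiency gains are considered.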