The TM3270 Media-processor
In this thesis, we present the TM3270 VLIW media-processor, the latest of the TriMedia processors, and describe its innovations with respect to its predecessor, the TM3260. We describe enhancements to the load/store unit design, such as a new data prefetching technique, and architectural enhancements, such as additions to the TriMedia Instruction Set Architecture (ISA). Examples of ISA enhancements include collapsed load operations, two-slot operations and H.264-specific CABAC decoding operations. All of the TM3270 innovations contribute to a common goal: a balanced processor design in terms of silicon area and power consumption, which enables audio and standard resolution video processing for both the connected and portable markets. To measure the speedup of the individual innovations of the TM3270 design, we evaluate processor performance on a set of complete video applications: motion estimation, MPEG2 encoding and temporal upconversion. Each of these applications has been optimized to take advantage of the TM3270 enhancements, and the associated speedups have been measured to evaluate the impact of, for example, load/store unit improvements and new operations. We show that load/store unit improvements, such as data prefetching, may improve the dynamic performance complexity (processor cycle count) by more than a factor of two for larger on-chip memory latencies. The speedup of individual ISA enhancements is measured in terms of both static (VLIW instruction count) and dynamic (processor cycle count) performance complexity, at the level of both individual kernels and complete applications. Combined, the TM3270 enhancements result in speedups of more than a factor of two for the evaluated video applications.
