AltiVec

AltiVec is a floating-point and integer SIMD instruction set designed and owned by Apple, IBM and Motorola (the AIM alliance), and implemented on versions of the PowerPC such as Motorola's G4 and IBM's G5. AltiVec is a trademark held solely by Motorola; the unit is therefore also called Velocity Engine by Apple and VMX by IBM.

The PowerPC 970 desktop CPU from IBM (dubbed the G5 by Apple) includes a high-performance AltiVec unit. It has two functional units to allow superscalar execution: a full VMX unit in one, and a multiplier/adder in the other.

Origins

In the mid-1990s, several microprocessor manufacturers took an interest in developing parallel computing units that would lighten the load on the processor by handling large quantities of data. The rise of multimedia had highlighted how much more computing power these applications required than traditional ones such as office software. Early experiments by HP and Sun Microsystems laid the foundations for SIMD processing units in such applications and opened the way for Intel and Motorola. Thus, in 1996, Intel finalized its MMX extension, which made it possible to work on 64-bit data through a vector unit. This extension evolved into the SSE, SSE2 and SSE3 units which, building on MMX, support single- or double-precision floating-point work in multimedia applications such as video compression or 3D image synthesis.

However, despite the performance advance that MMX and its successors represented, the technology appeared to be under-exploited by developers. Problems such as the non-orthogonality of the instruction sets, alignment constraints, and mismatches between data structures and the limitations of these units considerably slowed the development of large-scale applications. In parallel with Intel's work, and having observed the problems that microprocessor manufacturers were running into, Motorola teamed up with Apple in 1996 to design the future architecture of the PowerPC G4. Keith Diefendorff, then head of the AltiVec project, laid out the broad outlines of the unit, conceiving it not as a simple multimedia extension but as a general-purpose computing unit. Drawing on its competitors' experience, Motorola decided to restart the hardware design of this kind of unit from scratch. Finally, in 1999, the PowerPC G4 used by Apple shipped with Motorola's AltiVec unit. In 2002, IBM in turn integrated AltiVec into the processors behind the PowerPC G5.

When it was first introduced in the late 1990s, AltiVec was the most powerful SIMD system for a desktop CPU. Compared with its contemporaries (Intel's integer-only MMX, the floating-point SSE, and various systems from other RISC processor vendors), AltiVec offered more registers, which could be used in more varied ways and operated on by a much more flexible instruction set. However, SSE2, Intel's fourth-generation SIMD system introduced with the Pentium 4, offers many features that close the gap between these technologies.

Principles of operation

AltiVec is based on an intra-processor implementation of the SIMD model. In this model, special data structures are manipulated so as to reproduce the behavior of a SIMD unit within a single processor. These data types, called vectors, are 128-bit blocks that can hold data of several types and are handled by dedicated operators that perform their computation on all elements of the vector simultaneously. A set of 32 vector registers of 128 bits each can store 16 signed or unsigned 8-bit integers, 8 signed or unsigned 16-bit integers, 4 signed or unsigned 32-bit integers, or 4 32-bit floating-point numbers. Unlike SSE2, AltiVec offers no support for double-precision floating-point numbers.
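The register layout described above can be pictured with a small scalar model. The sketch below is portable C, not AltiVec code: the names vreg128, vreg_add_f32 and vreg_add_u8 are hypothetical, and the loops only imitate what a single vector instruction would do on every lane at once.

```c
#include <stdint.h>

/* Hypothetical scalar model of one 128-bit vector register.
   This is plain C, not the AltiVec `vector` type. */
typedef union {
    uint8_t  u8[16];   /* 16 unsigned 8-bit integers */
    int8_t   s8[16];   /* 16 signed 8-bit integers */
    uint16_t u16[8];   /* 8 unsigned 16-bit integers */
    int16_t  s16[8];   /* 8 signed 16-bit integers */
    uint32_t u32[4];   /* 4 unsigned 32-bit integers */
    int32_t  s32[4];   /* 4 signed 32-bit integers */
    float    f32[4];   /* 4 single-precision floats */
} vreg128;

/* Element-wise addition on the four float lanes. */
static inline vreg128 vreg_add_f32(vreg128 a, vreg128 b)
{
    vreg128 r;
    for (int i = 0; i < 4; i++)
        r.f32[i] = a.f32[i] + b.f32[i];
    return r;
}

/* Modular (wrap-around) addition on the sixteen unsigned 8-bit lanes. */
static inline vreg128 vreg_add_u8(vreg128 a, vreg128 b)
{
    vreg128 r;
    for (int i = 0; i < 16; i++)
        r.u8[i] = (uint8_t)(a.u8[i] + b.u8[i]);
    return r;
}
```

The same 16 bytes are viewed as 16, 8 or 4 lanes depending on the element type, which is why the number of elements processed per operation varies with the type.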

Within the PowerPC architecture, AltiVec is completely independent of the traditional execution units. It has four pipelined processing units that let it efficiently handle single-precision floating-point operations, permutation operations, simple integer operations (such as addition) and complex integer operations (such as partial multiplications). It can also start several operations in several units simultaneously, which makes it a superscalar unit. In addition, the programmer can use the full set of ordinary PowerPC registers and the 32 vector registers provided by AltiVec at the same time.

Instruction set

Recent versions of development tools such as the GNU Compiler Collection, IBM's VisualAge and other compilers provide direct support for AltiVec instructions from C and C++ programs. In practice, 162 C primitives are available, divided into four main groups:
  1. load and store instructions, which move one or more values from memory into a vector register or from a register into memory.
  2. preload, or prefetch, instructions, which allow data present in the cache to be handled directly so as to optimize the order in which they are processed and thereby increase the overall performance of the application.
  3. vector manipulation instructions. Certainly the most powerful functions in AltiVec, this category contains all the instructions for bit shifts, element permutation, merging and duplication of vector data.
  4. arithmetic instructions, which perform conventional operations on vectors such as addition, subtraction, or floating-point operations (rounding or truncation). AltiVec also provides a number of instructions emulating DSP-type functions, as well as Boolean operations and comparisons.
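The element permutation mentioned in the third group can be sketched with a scalar model of the vec_perm primitive. This is a portable C illustration, not AltiVec code: the type v16u8 and the function model_vec_perm are hypothetical names, and the behavior shown (each result byte picked from the 32-byte concatenation of two source vectors, using the low 5 bits of a selector byte) is assumed from the documented semantics of vec_perm.

```c
#include <stdint.h>

/* Hypothetical scalar model of a vector of 16 unsigned bytes. */
typedef struct { uint8_t b[16]; } v16u8;

/* Models vec_perm(a, b, sel): byte i of the result is taken from the
   32-byte concatenation of a and b, indexed by the low 5 bits of sel.b[i]. */
static inline v16u8 model_vec_perm(v16u8 a, v16u8 b, v16u8 sel)
{
    v16u8 r;
    for (int i = 0; i < 16; i++) {
        unsigned idx = sel.b[i] & 0x1F;                 /* index 0..31 */
        r.b[i] = (idx < 16) ? a.b[idx] : b.b[idx - 16]; /* a first, then b */
    }
    return r;
}
```

With a selector holding 15, 14, ..., 0 the model reverses the bytes of the first vector; with 0, 16, 1, 17, ... it interleaves the two sources, which is how merges and duplications can be expressed as permutations.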

The main difference between the instruction set provided by AltiVec and that of SSE is that AltiVec provides a large number of so-called horizontal operations, i.e. operations that apply not to the corresponding elements of two vectors but to the contents of a single vector. For example, the function vec_sums computes the sum of the elements of a vector.
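A horizontal operation such as vec_sums can be sketched with a scalar model. The code below is portable C, not AltiVec code: v4i32, sat32 and model_vec_sums are hypothetical names, and the details (the last word of the second operand is added in, the result saturates and lands in the last word, the other words are zero) are assumed from the documented behavior of the primitive.

```c
#include <stdint.h>

/* Hypothetical scalar model of a vector of four signed 32-bit words. */
typedef struct { int32_t w[4]; } v4i32;

/* Clamp a 64-bit intermediate result into the int32 range (saturation). */
static inline int32_t sat32(int64_t x)
{
    if (x > INT32_MAX) return INT32_MAX;
    if (x < INT32_MIN) return INT32_MIN;
    return (int32_t)x;
}

/* Models vec_sums(a, b): the four words of a plus the last word of b,
   saturated and stored in the last word; the other words are zero. */
static inline v4i32 model_vec_sums(v4i32 a, v4i32 b)
{
    int64_t sum = (int64_t)b.w[3];
    for (int i = 0; i < 4; i++)
        sum += a.w[i];
    v4i32 r = { { 0, 0, 0, sat32(sum) } };
    return r;
}
```

Passing a zero vector as the second operand yields the plain sum of the four elements; passing a previous result accumulates sums across several vectors.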

Example of use

Using AltiVec from C code is relatively simple. Consider, for example, a C function that computes the difference between two images of NxN pixels. The task is to subtract the pixel intensities and apply a threshold to the result to obtain a black-and-white image.

    unsigned char *I1, *I2, *R;
    for (size_t i = 0; i < N*N; i++) {
        signed short s = I2[i] - I1[i];        /* pixel difference */
        R[i] = (s < 10 || s > 240) ? 0 : 255;  /* threshold to black or white */
    }

The equivalent C code using AltiVec is the following:

    unsigned char *I1, *I2, *R;
    vector unsigned char vS, vR, v240, v10, v0, v255;
    vector bool char vC;
    v0   = vec_splat_u8(0);
    v10  = vec_splat_u8(10);
    v240 = (vector unsigned char) vec_splat_s8(-16);  /* 0xF0 = 240 */
    v255 = (vector unsigned char) vec_splat_s8(-1);   /* 0xFF = 255 */
    for (size_t i = 0; i < (N*N)/16; i++) {
        /* vec_ld and vec_st take the byte offset before the pointer;
           vec_sub on unsigned char wraps modulo 256 */
        vS = vec_sub(vec_ld(16*i, I2), vec_ld(16*i, I1));
        vC = vec_or(vec_cmplt(vS, v10), vec_cmpgt(vS, v240));
        vR = vec_sel(v255, v0, vC);
        vec_st(vR, 16*i, R);
    }

Several points should be noted:

  • vector variables are declared using the vector type qualifier. Thus, for a vector holding elements of type T, the corresponding type is vector T;
  • the primitives vec_ld and vec_st replace the accesses to array elements by loading or storing whole blocks, respectively;
  • each AltiVec primitive processes the data in blocks of 16 elements, which means that the main loop no longer performs NxN iterations but only NxN/16;
  • the vector constants are generated outside the main loop for performance reasons;
  • the tests needed for the thresholding are also vectorized, thanks to the primitive vec_sel, which makes it possible to carry out the equivalent of 16 selections in three cycles.
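The branch-free selection described in the last point can be sketched with a scalar model. The code below is portable C, not AltiVec code: v16u8, model_vec_sel and threshold_block are hypothetical names, and the bitwise rule used for the selection (take the second operand where the mask bit is 1, the first where it is 0) is assumed from the documented behavior of vec_sel.

```c
#include <stdint.h>

/* Hypothetical scalar model of a vector of 16 unsigned bytes. */
typedef struct { uint8_t b[16]; } v16u8;

/* Models vec_sel(a, b, mask): each result bit comes from b where the
   mask bit is 1 and from a where it is 0. */
static inline v16u8 model_vec_sel(v16u8 a, v16u8 b, v16u8 mask)
{
    v16u8 r;
    for (int i = 0; i < 16; i++)
        r.b[i] = (uint8_t)((a.b[i] & ~mask.b[i]) | (b.b[i] & mask.b[i]));
    return r;
}

/* Thresholds one block of 16 pixel differences: the comparisons build an
   all-ones or all-zeros mask per byte, then the select picks 0 where the
   mask is set and 255 elsewhere. */
static inline v16u8 threshold_block(v16u8 diff)
{
    v16u8 mask, v0, v255;
    for (int i = 0; i < 16; i++) {
        mask.b[i] = (diff.b[i] < 10 || diff.b[i] > 240) ? 0xFF : 0x00;
        v0.b[i] = 0;
        v255.b[i] = 255;
    }
    return model_vec_sel(v255, v0, mask);
}
```

Because the comparison result is a mask rather than a control-flow decision, all 16 pixels follow the same instruction sequence, which is what lets the vector unit treat them together.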

The main advantage of these functions is their ease of use, thanks to transparent overloading across the whole set of vector types, unlike the instructions available on Intel architectures, where the name of the function changes with the type of the vector. An exhaustive list of the available functions can be found on Apple's "Altivec Instructions Cross Reference" page.

Applications

Apple is the primary customer for AltiVec, and uses it to accelerate multimedia applications such as QuickTime and iTunes, and image-processing programs such as Adobe Photoshop. Motorola has supplied AltiVec processors for all desktop machines since the introduction of the G4. AltiVec is also used in some embedded systems to provide high performance for digital signal processing.
