GL_NV_fragment_program

Name
Name Strings
Contact
Notice
IP Status
Status
Version
Number
Dependencies
Overview
Issues
New Procedures and Functions
New Tokens
Additions to Chapter 2 of the OpenGL 1.2.1 Specification (OpenGL Operation)
Additions to Chapter 3 of the OpenGL 1.2.1 Specification (Rasterization)
Additions to Chapter 4 of the OpenGL 1.2.1 Specification (Per-Fragment Operations and the Framebuffer)
Additions to Chapter 5 of the OpenGL 1.2.1 Specification (Special Functions)
Additions to Chapter 6 of the OpenGL 1.2.1 Specification (State and State Requests)
Additions to Appendix F of the OpenGL 1.2.1 Specification (ARB Extensions)
Additions to the AGL/GLX/WGL Specifications
Dependencies on GL_NV_vertex_program
Dependencies on NV_texture_shader
Dependencies on NV_texture_rectangle
Dependencies on ARB_texture_cube_map
Dependencies on EXT_fog_coord
Dependencies on NV_depth_clamp
Dependencies on ARB_depth_texture and SGIX_depth_texture
Dependencies on NV_float_buffer
Dependencies on ARB_vertex_program
Dependencies on ARB_fragment_program
GLX Protocol
Errors
New State
New Implementation Dependent State
Revision History

Name

      
    NV_fragment_program

Name Strings

  
    GL_NV_fragment_program

Contact

  
    Pat Brown, NVIDIA Corporation (pbrown 'at' nvidia.com)  
    Mark J. Kilgard, NVIDIA Corporation (mjk 'at' nvidia.com)

Notice

  
    Copyright NVIDIA Corporation, 2001-2002.

IP Status

  
    NVIDIA Proprietary.

Status

  
    Implemented in CineFX (NV30) Emulation driver, August 2002.  
    Shipping in Release 40 NVIDIA driver for CineFX hardware, January 2003.

Version

  
    Last Modified Date:  $Date: 2005/05/24 $  
    NVIDIA Revision:     73

Number

Dependencies

  
    Written based on the wording of the OpenGL 1.2.1 specification and  
    requires OpenGL 1.2.1.  
  
    Requires support for the ARB_multitexture extension with at least  
    two texture units.  
  
    NV_vertex_program affects the definition of this extension.  The only  
    dependency is that both extensions use the same mechanisms for defining  
    and binding programs.  
  
    NV_texture_shader trivially affects the definition of this extension.  
  
    NV_texture_rectangle trivially affects the definition of this extension.  
  
    ARB_texture_cube_map trivially affects the definition of this extension.  
  
    EXT_fog_coord trivially affects the definition of this extension.  
  
    NV_depth_clamp affects the definition of this extension.  
  
    ARB_depth_texture and SGIX_depth_texture affect the definition of this  
    extension.  
  
    NV_float_buffer affects the definition of this extension.  
  
    ARB_vertex_program affects the definition of this extension.  
  
    ARB_fragment_program affects the definition of this extension.

Overview

  
    OpenGL mandates a certain set of configurable per-fragment computations  
    defining texture lookup, texture environment, color sum, and fog  
    operations.  Each of these areas provides a useful but limited set of fixed  
    operations.  For example, unextended OpenGL 1.2.1 provides only four  
    texture environment modes, color sum, and three fog modes.  Many OpenGL  
    extensions have either improved existing functionality or introduced new  
    configurable fragment operations.  While these extensions have enabled new  
    and interesting rendering effects, the set of effects is limited by the  
    set of special modes introduced by the extension.  This lack of  
    flexibility is in contrast to the high level of programmability of  
    general-purpose CPUs and other (frequently software-based) shading  
    languages.  The purpose of this extension is to expose to the OpenGL  
    application writer an unprecedented degree of programmability in the  
    computation of final fragment colors and depth values.  
  
    This extension provides a mechanism for defining fragment program  
    instruction sequences for application-defined fragment programs.  When in  
    fragment program mode, a program is executed each time a fragment is  
    produced by rasterization.  The inputs for the program are the attributes  
    (position, colors, texture coordinates) associated with the fragment and a  
    set of constant registers.  A fragment program can perform mathematical  
    computations and texture lookups using arbitrary texture coordinates.  The  
    results of a fragment program are new color and depth values for the  
    fragment.  
  
    This extension defines a programming model including a 4-component vector  
    instruction set, 16- and 32-bit floating-point data types, and a  
    relatively large set of temporary registers.  The programming model also  
    includes a condition code vector which can be used to mask register writes  
    at run-time or kill fragments altogether.  The syntax, program  
    instructions, and general semantics are similar to those in the  
    NV_vertex_program and NV_vertex_program2 extensions, which provide for the  
    execution of an arbitrary program each time the GL receives a vertex.  
  
    The fragment program execution environment is designed for efficient  
    hardware implementation and to support a wide variety of programs.  By  
    design, the entire set of existing fragment programs defined by existing  
    OpenGL per-fragment computation extensions can be implemented using the  
    extension's programming model.  
  
    The fragment program execution environment accesses textures via  
    arbitrarily computed texture coordinates.  As such, there is no necessary  
    correspondence between the texture coordinates and texture maps previously  
    lumped into a single "texture unit".  This extension separates the notion  
    of "texture coordinate sets" and "texture image units" (texture maps and  
    associated parameters), allowing implementations with a different number  
    of each.  The initial implementation of this extension will support 8  
    texture coordinate sets and 16 texture image units.

Issues

  
    What limitations exist in this extension?  
  
        RESOLVED:  Very few.  Programs cannot exceed a maximum program length  
        (which is no less than 1024 instructions), and can use no more than  
        32-64 temporary registers.  Programs cannot access more than one  
        fragment attribute or program parameter (constant) per instruction,  
        but can work around this restriction using temporaries.  The number of  
        textures that can be used by a program is limited to the number of  
        texture image units provided by the implementation (16 in the initial  
        implementation of this extension).  
  
        These limits are fairly high.  Additionally, there is no limit on the  
        total number of texture lookups that can be performed by a program.  
        There is no limit on the length of a texture dependency chain -- one  
        can write a program that performs over 1000 consecutive dependent  
        texture lookups.  There are no restrictions on dependencies between  
        texture mapping instructions and arithmetic instructions.  Texture  
        lookups can be performed using arbitrarily computed texture  
        coordinates.  Applications can carry out their calculations with full  
        32-bit single precision, although two lower-precision modes are also  
        available.  
  
    How does texture mapping work with fragment programs?  
  
        RESOLVED:  This extension provides three instructions used to perform  
        texture lookups.  
  
        The "TEX" instruction performs a lookup with the (s,t,r) values taken  
        from an interpolated texture coordinate, an arbitrarily computed  
        vector, or even a program constant.  The "TXP" instruction performs a  
        similar lookup, except that it uses the fourth component of the source  
        vector to perform a perspective divide, using (s/q, t/q, r/q).  In  
        both cases, the GL will automatically compute partial derivatives used  
        for filter and LOD selection.  
  
        The "TXD" instruction operates like "TEX", except that it allows the  
        program to explicitly specify two additional vectors containing the  
        partial derivatives of the texture coordinate with respect to x and y  
        window coordinates.  
  
        All three instructions write a filtered texel value to a temporary or  
        output register.  Other than the computation of texture coordinates  
        and partial derivatives, texture lookups are not performed any differently  
        in fragment program mode.  In particular, any applicable LOD biases,  
        wrap modes, minification and magnification filters, and anisotropic  
        filtering controls are still applied in fragment program mode.  
  
        The results of the texture lookup are available to be used arbitrarily  
        by subsequent fragment program instructions.  Fragment programs are  
        allowed to access any texture map arbitrarily many times.  
  
    Can fragment programs be used to compute depth values?  
  
         RESOLVED:  Yes.  A fragment program can perform arbitrary  
         computations to compute a final value for the fragment, which it  
         should write to the "z" component of the o[DEPR] register.  The "z"  
         value written should be in the range [0,1], regardless of the size of  
         the depth buffer.    
  
         To assist in the computation of the final Z value, a fragment program  
         can access the interpolated depth of the fragment (prior to any  
         displacement) by reading the "z" component of the f[WPOS] attribute  
         register.  
  
    How should near and far plane clipping work in fragment program mode if  
    the current fragment program computes a depth value?  
  
        RESOLVED:  Geometric clipping to the near and far clip plane should be  
        disabled.  Clipping should be done based on the depth values computed  
        per-fragment.  The rationale is that per-fragment depth displacement  
        operations may effectively move portions of a primitive initially  
        outside the clip volume inside, and vice versa.  
  
        Note that under the NV_depth_clamp extension, geometric clipping to  
        the near and far clip planes is also disabled, and the fragment depth  
        values are clamped to the depth range.  If depth clamp mode is enabled  
        when using a fragment program that computes a depth value, the  
        computed depth value will be clamped to the depth range.  
  
    Should fragment programs be allowed to use multiple precisions for  
    operands and operations?  
  
        RESOLVED:  Yes.  Low-precision operands are generally adequate for  
        representing colors.  Allowing low-precision registers also allows for  
        a larger number of temporary registers (at lower precision).  
        Low-precision operations also provide the opportunity for a higher  
        level of performance.    
  
        Applications are free to use only high-precision operations or mix  
        high- and low-precision operations as necessary.  
  
    What levels of precision are supported in arithmetic operations?  
  
        RESOLVED:  Arithmetic operations can be performed at three different  
        precisions.  32-bit floating point precision (fp32) uses the IEEE  
        single-precision standard with a sign bit, 8 exponent bits, and 23  
        mantissa bits.  16-bit floating-point precision (fp16) uses a similar  
        floating-point representation, but with 5 exponent bits and 10  
        mantissa bits.  Additionally, many arithmetic operations can also be  
        carried out at 12-bit fixed point precision (fx12), where values in  
        the range [-2,+2) are represented as signed values with 10 fraction  
        bits.  
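
        As a rough illustration of the fx12 format described above, the
        following C sketch models a signed 12-bit value with 10 fraction
        bits.  The function names are hypothetical, and round-to-nearest is
        an assumption made for this example; clamping on overflow follows
        the data type conversion issue in this document.

```c
/* Illustrative model of fx12: codes -2048..2047 represent code / 1024,
   i.e. the range [-2, +2) in steps of 2^-10.  Names are hypothetical. */
#define FX12_SCALE 1024.0f

/* Convert a float to an fx12 code, clamping values outside [-2, +2). */
static int fx12_from_float(float v)
{
    float scaled;
    if (v != v) return 0;                 /* NaN: arbitrary choice for this sketch */
    scaled = v * FX12_SCALE;
    if (scaled <= -2048.0f) return -2048; /* most negative code, -2.0 */
    if (scaled >= 2047.0f) return 2047;   /* most positive code, 2 - 2^-10 */
    /* Assumed rounding mode: round half away from zero. */
    return (int)(scaled >= 0.0f ? scaled + 0.5f : scaled - 0.5f);
}

/* Convert an fx12 code back to the value it represents. */
static float fx12_to_float(int code)
{
    return (float)code / FX12_SCALE;
}
```

        Under these assumptions, 0.5 maps to code 512, and any input of 2.0
        or more maps to code 2047, which represents 2 - 2^-10.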
  
    How should the precision with which operations are carried out be  
    specified?  Should we infer the precision from the types of the operands  
    or result vectors?  Or should it be an attribute of the instruction?  
  
        RESOLVED:  Applications can optionally specify the precision of  
        individual instructions by adding a suffix of "R", "H", or "X" to  
        instruction names to select fp32, fp16, and fx12 precision,  
        respectively.    
  
        By default, instructions will be carried out using the precision of  
        the destination register.  Always inferring the precision from the  
        operands has a number of issues.  First, there are a number of  
        operations (e.g., TEX/TXP/TXD) where result type has little to no  
        correspondence to the type of the operands.  In these cases, precision  
        suffixes are not supported.  Second, one could have instructions  
        automatically cast operands and compute results using the type of the  
        highest precision operand or result.  This behavior would be  
        problematic since all fragment attribute registers and program  
        parameters are kept at full precision, but full precision may not be  
        needed by the operation.  
  
        The choice of precision level allows programs to trade off precision  
        for potentially higher performance.  Giving the program explicit  
        control over the precision also allows it to dictate precision  
        explicitly and eliminate any uncertainty over type casting.  
  
    For instructions whose specified precision is different than the precision  
    of the operands or the result registers, how are the operations performed?  
    How are the condition codes updated?  
  
        RESOLVED:  Operations are performed with operands and results at the  
        precision specified by the instruction.  After the operation is  
        complete, the result is converted to the precision of the destination  
        register, after which the condition code is generated.  
  
        In an alternate approach, the condition code could be generated from  
        the result.  However, in some cases, the register contents would not  
        match the condition code.  In such cases, it may not be reliable to  
        use the condition code to prevent division by zero or other special  
        cases.  
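
        The mismatch described above can be sketched in C.  The example
        assumes an fp32-precision operation whose tiny nonzero result is
        written to an fp16 register, and assumes round-to-nearest conversion
        so that magnitudes of 2^-25 or less become zero.  All names are
        hypothetical.

```c
/* Which value the condition code is derived from (hypothetical enum). */
typedef enum { CC_FROM_RAW_RESULT, CC_FROM_CONVERTED_RESULT } cc_source;

/* The smallest positive fp16 value is the denorm 2^-24.  Assuming
   round-to-nearest-even, magnitudes of at most 2^-25 convert to zero. */
static int fp16_conversion_gives_zero(float fp32_result)
{
    const float half_min_denorm = 1.0f / 33554432.0f;   /* 2^-25 */
    float mag = fp32_result < 0.0f ? -fp32_result : fp32_result;
    return mag <= half_min_denorm;
}

/* Returns 1 if the condition code would report "not equal to zero". */
static int cc_reports_nonzero(float fp32_result, cc_source src)
{
    if (src == CC_FROM_RAW_RESULT)
        return fp32_result != 0.0f;
    return !fp16_conversion_gives_zero(fp32_result);
}
```

        For a result of 1e-10, the raw-result approach reports "nonzero"
        while the fp16 register holds zero, so a later reciprocal guarded
        by that condition code would still divide by zero.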
  
    How does this extension interact with the ARB_multisample extension?  In  
    the ARB_multisample extension, each fragment has multiple depth values.  
    In this extension, a single interpolated depth value may be modified by a  
    fragment program.  
  
        RESOLVED:  The depth values for the extra samples are generated by  
        computing partials of the computed depth value and using these  
        partials to derive the depth values for each of the extra samples.  
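
        One way to read this resolution is as a first-order extrapolation
        from the fragment center.  The C sketch below is only a model: the
        function name and the sample offsets are hypothetical, and actual
        sample positions are implementation dependent.

```c
/* Estimate the depth at a sample located (dx, dy) pixels away from the
   fragment center, given the program-computed depth z and its
   screen-space partial derivatives dz/dx and dz/dy. */
static float sample_depth(float z, float dzdx, float dzdy, float dx, float dy)
{
    return z + dzdx * dx + dzdy * dy;
}
```

        For example, with z = 0.5, dz/dx = 0.25, and dz/dy = 0, a sample
        half a pixel to the right of the center would receive 0.625.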
  
    How does this extension interact with polygon offset?  Both extensions  
    modify fragment depth values.  
  
        RESOLVED:  As in the base OpenGL spec, the depth offset generated by  
        polygon offset is added during polygon rasterization.  The depth value  
        provided to programs in f[WPOS].z already includes polygon offset, if  
        enabled.  If the depth value is replaced by a fragment program, the  
        polygon offset value will NOT be recomputed and added back after  
        program execution.  
    
        This is probably not desirable for fragment programs that modify depth  
        values since the partials used to generate the offset may not match  
        the partials of the computed depth value.  Polygon offset for filled  
        polygons can be approximated in a fragment program using the depth  
        partials obtained by the DDX and DDY instructions.  This will not work  
        properly for line- and point-mode polygons, since the partials used  
        for offset are computed over the polygon, while the partials resulting  
        from the DDX and DDY instructions are computed along the line (or are  
        zero for point-mode polygons).  In addition, separate treatment of  
        points, line segments, and polygons is not possible in a fragment  
        program.  
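
        The approximation mentioned above can be sketched in C using the
        core OpenGL offset formula o = m * factor + r * units, with m
        approximated by the larger of the two depth partials.  In a fragment
        program the partials would come from DDX and DDY; here they are
        plain inputs.  The function names are hypothetical, and r (the
        minimum resolvable depth difference) is implementation dependent.

```c
static float abs_f(float v) { return v < 0.0f ? -v : v; }

/* Approximate polygon offset from per-fragment depth partials.
   m = max(|dz/dx|, |dz/dy|) is the approximation of the maximum depth
   slope that the core specification allows. */
static float approx_polygon_offset(float dzdx, float dzdy,
                                   float factor, float units, float r)
{
    float ax = abs_f(dzdx);
    float ay = abs_f(dzdy);
    float m = ax > ay ? ax : ay;
    return m * factor + r * units;
}
```

        The returned value would be added to the depth computed by the
        program before it is written out.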
  
    Should depth component replacement be a property of the fragment program  
    or a separate enable?  
  
        RESOLVED:  It should be a program property.  Using the output register  
        notation simplifies matters:  depth components are replaced if and  
        only if the DEPR register is written to.  This alleviates the  
        application and driver burden of maintaining separate state.  
  
    How does this extension affect the handling of q texture coordinates in  
    the OpenGL spec?  
         
        RESOLVED:  Fragment programs are allowed to access an associated q  
        texture coordinate, so this attribute must be produced by  
        rasterization.  In unextended OpenGL 1.2, the q coordinate is  
        eliminated in the rasterization portions of the spec after dividing  
        each of s, t, and r by it.  This extension updates the specification  
        to pass q coordinates through at least to conventional texture  
        mapping.  When fragment program mode is disabled, q coordinates will  
        be eliminated there in an identical manner.  This modification has the  
        added benefit of simplifying the equations used for attribute  
        interpolation.  
  
    How should clip w coordinates be handled by this extension?  
  
        RESOLVED:  Fragment programs are allowed to access the reciprocal of  
        the clip w coordinate, so this attribute must be produced by  
        rasterization.  The OpenGL 1.2 spec doesn't explicitly enumerate the  
        attributes associated with the fragment, but we add treatment of the w  
        clip coordinate in the appropriate locations.    
  
        The reciprocal of the clip w coordinate in traditional graphics  
        hardware is produced by screen-space linear interpolation of the  
        reciprocals of the clip w coordinates of the vertices.  However, this  
        spec says the clip w coordinate is produced by perspective-correct  
        interpolation of the (non-reciprocated) clip w vertex coordinates.  
        These two formulations turn out to be equivalent, and the latter is  
        more convenient since the core OpenGL spec already contains formulas  
        for perspective-correct interpolation of vertex attributes.  
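
        The equivalence can be checked numerically.  The C sketch below
        uses the two-vertex (line segment) case for brevity, with t as the
        screen-space interpolation parameter; the function names are
        hypothetical.  When the interpolated attribute is w itself, the
        numerator of the perspective-correct formula collapses to
        (1-t) + t = 1, leaving the reciprocal of the linearly interpolated
        1/w.

```c
/* Screen-space linear interpolation of the reciprocals of clip w. */
static float lerp_recip_w(float wa, float wb, float t)
{
    return (1.0f - t) / wa + t / wb;
}

/* Perspective-correct interpolation of an attribute with values fa and fb
   at vertices whose clip w coordinates are wa and wb. */
static float persp_correct(float fa, float fb, float wa, float wb, float t)
{
    float num = (1.0f - t) * fa / wa + t * fb / wb;
    float den = (1.0f - t) / wa + t / wb;
    return num / den;
}
```

        With wa = 1, wb = 4, and t = 0.5, lerp_recip_w gives 0.625, and
        persp_correct applied to the w values gives 1.6, whose reciprocal
        is 0.625 up to rounding.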
  
    What is produced by the TEX/TXP/TXD instructions if the requested texture  
    image is inconsistent?  
  
        RESOLVED:  The result vector is specified to be (0,0,0,0).  This  
        behavior is consistent with the NV_texture_shader extension.  Note  
        that like in NV_texture_shader, these instructions ignore the standard  
        hierarchy of texture enables and programs can access textures that are  
        not specifically "enabled".  
  
    Should a minimum precision be specified for certain fragment attribute  
    registers (in particular COL0, COL1) that may not be generated with full  
    fp32 precision?  
  
        RESOLVED:  No.  It is expected that the precision of COL0/COL1 should  
        generally be at least as high as that of the frame buffer.  
  
    Fragment color components (f[COL0] and f[COL1]) are generally  
    low-precision fixed-point values in the range [0,1].  Is it possible to  
    pass unclamped or high-precision color components to fragment programs?  
  
        RESOLVED:  Yes, although you can't exactly call them "colors".  
        High-precision per-vertex color values can be written into any unused  
        texture coordinate set, either via a MultiTexCoord call or using a  
        vertex program.  These "texture coordinates" will be interpolated  
        during rasterization, and can be used arbitrarily by a fragment  
        program.  
  
        In particular, there is no requirement that per-fragment attributes  
        called "texture coordinates" be used for texture mapping.  
  
    Should this specification guarantee that temporary registers are  
    initialized to zero?  
  
        RESOLVED:  Yes.  This will allow for the modular construction of  
        programs that accumulate results in registers.  For example,  
        per-fragment lighting may use MAD instructions to accumulate color  
        contributions at each light.  Without zero-initialization, the program  
        would require an explicit MOV instruction to load 0 or the use of the  
        MUL instruction for the first light.  
  
    Should this specification support Unicode program strings?  
  
        RESOLVED:  Not necessary.  
  
    Programs defined by NV_vertex_program begin with "!!VP1.0".  Should  
    fragment programs have a similar identifier?  
  
        RESOLVED:  Yes, "!!FP1.0", identifying the first revision of this  
        fragment program language.  
  
    Should per-fragment attributes have equivalent integer names in the  
    program language, as per-vertex attributes do in NV_vertex_program?  
  
        RESOLVED:  No.  In NV_vertex_program, "generic" vertex attributes  
        could be specified directly by an application using only an attribute  
        number.  Those numbers may have no necessary correlation with the  
        conventional attribute names, although conventional vertex attributes  
        are mapped to attribute numbers.  However, conventional attributes are  
        the only outputs of vertex programs and of rasterization.  Therefore,  
        there is no need for a similar input-by-number functionality for  
        fragment programs.  
  
    Should we provide the ability to issue instructions that do not update  
    temporary or output registers?  
  
        RESOLVED:  Yes.  Programs may issue instructions whose only purpose is  
        to update the condition code register, and requiring such instructions  
        to write to a temporary may require the use of an additional temporary  
        and/or defeat possible program optimizations.  We accomplish this by  
        adding two write-only temporary pseudo-registers ("RC" and "HC") that  
        can be specified as destination registers.  
  
    Do the packing and unpacking instructions in this extension make any  
    sense?  
  
        RESOLVED:  Yes.  They are useful for packing and unpacking multiple  
        components in a single channel of a floating-point frame buffer.  For  
        example, a 128-bit "RGBA" frame buffer could pack 16 8-bit quantities  
        or 8 16-bit quantities, all of which could be used in later  
        rasterization passes.  See the NV_float_buffer extension for more  
        information.  
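
        To illustrate the idea, the C sketch below packs four values in
        [0,1] into one 32-bit word as unsigned bytes and extracts them
        again.  It is a model only: the function names, byte order, and
        rounding are assumptions for this example and are not a restatement
        of the pack and unpack instructions defined by this extension.

```c
#include <stdint.h>

/* Pack four components into one 32-bit word, one byte per component,
   with component 0 in the least significant byte (assumed layout). */
static uint32_t pack4ub(const float v[4])
{
    uint32_t word = 0;
    int i;
    for (i = 0; i < 4; ++i) {
        float c = v[i];
        if (!(c > 0.0f)) c = 0.0f;   /* clamps negatives and NaN to 0 */
        if (c > 1.0f) c = 1.0f;
        word |= (uint32_t)(c * 255.0f + 0.5f) << (8 * i);
    }
    return word;
}

/* Extract component i (0..3) from a packed word as a value in [0,1]. */
static float unpack_ub(uint32_t word, int i)
{
    return (float)((word >> (8 * i)) & 0xFFu) / 255.0f;
}
```

        Each of the four 32-bit channels of a 128-bit frame buffer could
        hold one such word, giving the 16 8-bit quantities mentioned above.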
  
    Should we provide a method for specifying an fp16 depth component output  
    value?  
  
        RESOLVED:  No.  There is no good reason for supporting half-precision  
        Z outputs.  Even with 16-bit Z buffers, the 10-bit mantissa of the  
        half-precision float is rather limiting.  There would effectively be  
        only 11 good bits in the back half of the Z buffer.  
  
    Should RequestResidentProgramsNV (or a new equivalent function) take a  
    target?  Dealing with working sets of different program types is a bit  
    messy.  Should we document some limitation if we get programs of different  
    types?  
            
        RESOLVED:  In retrospect, it may have been a good idea to attach a  
        target to this command, but there isn't a good reason to mess with  
        something that already works for vertex programs.  The driver is  
        responsible for ensuring consistent results when the program types  
        specified are mixed.  
      
    What happens on data type conversions where the original value is not  
    exactly representable in the new data type, either due to overflow or  
    insufficient precision in the destination type?  
  
        RESOLVED:  In case of overflow, the original value is clamped to  
        +/-INF (fp16 or fp32) or to the nearest representable value (fx12).  
        In case of imprecision, the conversion either rounds or truncates to  
        the nearest representable value.  
  
    Should this extension support IEEE-style denorms?  For 32-bit IEEE  
    floating point, denorms are numbers smaller in absolute value than 2^-126.  
    For 16-bit floats used by this extension, denorms are numbers smaller in  
    absolute value than 2^-14.  
  
        RESOLVED:  For 32-bit data types, hardware support for denorms was  
        considered too expensive relative to the benefit provided.  
        Computational results that would otherwise produce denorms are flushed  
        to zero.  For 16-bit data types, hardware denorm support will be  
        present.  The expense of hardware denorm support is lower and the  
        potential precision benefit is greater for 16-bit data types.  
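
        The two thresholds can be made concrete with a small C model.  The
        decoder below interprets a 16-bit pattern with 1 sign bit, 5
        exponent bits, and 10 mantissa bits, including denorms; the second
        function models the fp32 flush-to-zero rule.  Both function names
        are hypothetical, and the code is a host-side sketch rather than a
        description of the hardware.

```c
#include <float.h>
#include <stdint.h>
#include <string.h>

/* Decode a 16-bit floating-point bit pattern into a C float. */
static float half_bits_to_float(uint16_t h)
{
    uint32_t sign = (uint32_t)(h >> 15) & 1u;
    uint32_t exponent = (uint32_t)(h >> 10) & 0x1Fu;
    uint32_t mantissa = (uint32_t)h & 0x3FFu;
    float value;

    if (exponent == 0u) {
        /* Zero or denorm: mantissa * 2^-24, with no implicit leading one. */
        value = (float)mantissa / 16777216.0f;
    } else {
        /* Normal, Inf, or NaN: rebias the exponent and build fp32 bits. */
        uint32_t exp32 = (exponent == 31u) ? 255u : exponent + 112u;
        uint32_t bits = (exp32 << 23) | (mantissa << 13);
        memcpy(&value, &bits, sizeof value);
    }
    return sign ? -value : value;
}

/* Model of the fp32 rule: magnitudes below 2^-126 are flushed to zero. */
static float flush_fp32_denorm(float v)
{
    return (v > -FLT_MIN && v < FLT_MIN) ? 0.0f : v;
}
```

        In this model the pattern 0x0001 decodes to 2^-24, the smallest
        fp16 denorm, and 0x0400 decodes to 2^-14, the smallest normalized
        fp16 value.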
  
    OpenGL provides a hierarchy of texture enables.  The texture lookup  
    operations in NV_texture_shader effectively override the texture enable  
    hierarchy and select a specific texture to enable.  What should be done by  
    this extension?  
  
        RESOLVED:  This extension will build upon NV_texture_shader and reduce  
        the driver overhead of validating the texture enables.  Texture  
        lookups can be specified by instructions like "TEX H0, f[TEX2], TEX2,  
        3D", which would indicate to use texture coordinate set number 2 to do  
        a lookup in the texture object bound to the TEXTURE_3D target in  
        texture image unit 2.  
  
        Each texture unit can have only one "active" target.  Programs are not  
        allowed to reference different texture targets in the same texture  
        image unit.  In the example above, any other texture instructions  
        using texture image unit 2 must specify the 3D texture target.  
  
    What is the interaction with NV_register_combiners?  
  
        RESOLVED:  Register combiners are not available when fragment programs  
        are enabled.  
  
        Previous versions of this specification supported the notion of  
        combiner programs, where the result of fragment program execution was  
        a set of four "texture lookup" values that fed the register combiners.  
  
    For convenience, should we include pseudo-instructions not present in the  
    hardware instruction set that are trivially implementable?  For example,  
    absolute value and subtract instructions could fall in this category.  An  
    "ABS R1,R0" instruction would be equivalent to "MAX R1,R0,-R0", and a "SUB  
    R2,R0,R1" would be equivalent to "ADD R2,R0,-R1".  
  
        RESOLVED:  In general, yes.  A SUB instruction is provided for  
        convenience.  This extension does not provide a separate ABS  
        instruction because it supports absolute value operations of each  
        operand.  
  
    Should there be a '+' in the <optionalSign> portion of the grammar?  There  
    isn't one in the GL_NV_vertex_program spec.  
  
        RESOLVED:  Yes, for orthogonality/readability.  A '+' obviously adds  
        no functionality.  In NV_vertex_program, an <optionalSign> of "-" was  
        always a negation operator.  However, in fragment programs, it can  
        also be used as a sign for a constant value.  
  
    Can the same fragment attribute register, program parameter register, or  
    constants be used for multiple operands in the same instruction?  If so,  
    can it be used with different swizzle patterns?  
  
        RESOLVED:  Yes and yes.  
  
    This extension allows different limits for the number of texture  
    coordinate sets and the number of texture image units (i.e., texture maps  
    and associated data).  The state in ActiveTextureARB affects both  
    coordinate sets (TexGen, matrix operations) and image units (TexParameter,  
    TexEnv).  How should we deal with this?  
  
        RESOLVED:  Continue to use ActiveTextureARB and emit an  
        INVALID_OPERATION if the active texture refers to an unsupported  
        coordinate set/image unit.  Other options included creating dummy  
        (unusable) state for unsupported coordinate sets/image units and  
        continue to use ActiveTextureARB normally, or creating separate state  
        and state-setting commands for coordinate sets and image units.  
        Separate state is the cleanest solution, but would add more calls and  
        potentially cause more programmer confusion.  Dummy state would avoid  
        additional error checks, but the demands of dummy state could grow if  
        the number of texture image units and texture coordinate sets  
        increases.  
  
        The current OpenGL spec is vague as to what state is affected by the  
        active texture selector and has no distinction between  
        coordinate-related and image-related state.  The state tables could  
        use a good clean-up in this area.  
  
    The LRP instruction is defined so that the result of "LRP R0, R0, R1, R2"  
    is R0*R1+(1-R0)*R2.  There are conflicting precedents here.  The  
    definition here matches the "lrp" instruction in the DirectX 8.0 pixel  
    shader language.  However, an equivalent RenderMan lerp operation would  
    yield a result of (1-R0)*R1+R0*R2.  Which ordering should be implemented?  
  
        RESOLVED:  NVIDIA hardware implements the former operand ordering, and  
        there is no good reason to specify a different ordering.  To convert a  
        "LRP" using the latter ordering to NV_fragment_program, swap the third  
        and fourth arguments.  
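
        The two orderings, and the operand swap that converts between them,
        can be compared with a scalar C sketch.  The real instruction
        operates per component on 4-component vectors; the function names
        here are hypothetical.

```c
/* Ordering used by this extension: a*b + (1-a)*c. */
static float lrp_nv(float a, float b, float c)
{
    return a * b + (1.0f - a) * c;
}

/* The alternate ordering discussed above: (1-a)*b + a*c. */
static float lerp_alt(float a, float b, float c)
{
    return (1.0f - a) * b + a * c;
}
```

        With a = 0.25, b = 8, and c = 4, lrp_nv yields 5 and lerp_alt
        yields 7.  Swapping the last two operands, lrp_nv(0.25, 4, 8), also
        yields 7, matching lerp_alt(0.25, 8, 4).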
  
    Should this extension provide tracking of matrices or any other state,  
    similar to that provided in NV_vertex_program?  
  
        RESOLVED:  No.  
  
    Should this extension provide global program parameters -- values shared  
    between multiple fragment programs?  
  
        RESOLVED:  No.  
  
    Should this extension provide program parameters specific to a program?  
    If so, how?  
  
        RESOLVED:  Yes.  These parameters will be called "local parameters".  
        This extension will provide both named and numbered local parameters.  
        Local parameters can be managed by the driver and eliminate the need  
        for applications to manage a global name space.    
  
        Named local parameters work much like standard variable names in most  
        programming languages.  They are created using the "DECLARE"  
        instruction within the fragment program itself.  For example:  
  
            DECLARE color = {1,0,0,1};  
  
        Named local parameters are used simply by referencing the variable  
        name.  They do not require the array syntax like the global parameters  
        in the NV_vertex_program extension.  They can be updated using the  
        commands ProgramNamedParameter4[f,fv]NV.  
  
        Numbered local parameters are not declared.  They are used by simply  
        referencing an element of an array called "p".  For example,  
  
            MOV R0, p[12];  
  
        loads the value of numbered local parameter 12 into register R0.  
        Numbered local parameters can be updated using the commands  
        ProgramLocalParameter4[d,dv,f,fv]ARB.  
  
        The numbered local parameter APIs were added to this extension late in  
        its development, and are provided for compatibility with the  
        ARB_vertex_program extension, and what will likely be supported in  
        ARB_fragment_program as well.  Providing this mechanism allows  
        programs to use the same mechanisms to set local parameters in both  
        extensions.  
  
    Why are the APIs for setting named and numbered local parameters  
    different?  
  
        RESOLVED:  The named parameter API was created prior to  
        ARB_vertex_program (and the possible future ARB_fragment_program) and  
        uses conventions borrowed from NV_vertex_program.  A slightly  
        different API was chosen during the ARB standardization process; see  
        the ARB_vertex_program specification for more details.  
  
        The named parameter API takes a program ID and a parameter name, and  
        sets the parameter for the program with the specified ID.  The  
        specified program does not need to be bound (via BindProgramNV) in  
        order to modify the values of its named parameters.  The numbered  
        parameter API takes a program target enum (FRAGMENT_PROGRAM_NV) and a  
        parameter number and modifies the corresponding numbered parameter of  
        the currently bound program.  
  
    What should be the initial value of uninitialized local parameters?  
  
        RESOLVED:  (0,0,0,0).  This choice is somewhat arbitrary, but matches  
        previous extensions (e.g., NV_vertex_program).  
  
    Should this extension support program parameter arrays?  
  
        RESOLVED:  No hardware support is present.  Note that from the point  
        of view of a fragment program, a texture map can be used as a 1-, 2-,  
        or 3-dimensional array of constants.  
          
    Should this extension support constants in fragment programs?  If so,  
    how?  
  
        RESOLVED:  Yes.  Scalar or vector constants can be defined inline  
        (e.g., "1.0" or "{1,2,3,4}").  In addition, named constants are  
        supported using the "DEFINE" instruction, which allows programmers to  
        change the values of constants used in multiple instructions simply by  
        changing the value assigned to the named constant.  
  
        Note that because this extension uses program strings, the  
        floating-point value of any constants generated on the fly must be  
        printed to the program string.  An alternate method that avoids the  
        need to print constants is to declare a named local program parameter  
        and initialize it with the ProgramNamedParameter4[d,dv,f,fv]NV calls.  
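
        As a rough illustration of the first approach, the C sketch below
        prints a run-time constant into a "DEFINE" statement.  The helper name
        format_define and the "%.9G" format are assumptions of this sketch and
        are not part of this extension; nine significant digits are enough to
        recover a 32-bit float from its decimal text.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper (not part of this extension): prints a DEFINE
 * statement for a 4-component constant into buf.  Returns the number of
 * characters written, or -1 if buf is too small. */
int format_define(char *buf, size_t size, const char *name, const float v[4])
{
    int n;

    assert(buf != NULL && name != NULL && v != NULL);

    /* %.9G keeps enough digits to recover the exact fp32 value and uses
     * an uppercase "E" exponent, the form shown for scalar constants. */
    n = snprintf(buf, size, "DEFINE %s = { %.9G, %.9G, %.9G, %.9G };\n",
                 name, v[0], v[1], v[2], v[3]);

    return (n < 0 || (size_t)n >= size) ? -1 : n;
}
```

        The resulting text would be spliced into the program string before it
        is loaded.  Infinities and NaNs are not handled by this sketch.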
  
    Should named constants be allowed to be redefined?  
  
        RESOLVED:  No.  If you want to redefine the values of constants, you  
        can create an equivalent named program parameter by changing the  
        "DEFINE" keyword to "DECLARE".  
  
    Should functions used to update or query named local parameters take a  
    zero-terminated string (as with most strings in the C programming  
    language), or should they require an explicit string length?  If the  
    former, should we create a version of LoadProgramNV that does not require  
    a string length?  
  
        RESOLVED:  Stick with explicit string length.  Strings that are  
        defined as constants can have the length computed at compile-time.  
        Strings read from files will have the length known in advance.  
        Programs that build strings at run time likely also keep the length  
        up to date.  Passing an explicit length saves time, since the driver  
        doesn't have to do a strlen().  
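
        The two cases mentioned above might look as follows in C.  The macro
        LITERAL_LEN, the ParamName structure, and param_name_from_cstr are
        hypothetical names used only for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Length of a string literal, excluding the terminating NUL, computed at
 * compile time.  Only valid for literals and char arrays, not pointers. */
#define LITERAL_LEN(s) (sizeof(s) - 1)

/* Hypothetical (name, length) pair in the form expected by the <name> and
 * <len> arguments of the named parameter commands. */
typedef struct {
    const unsigned char *name;
    size_t               len;
} ParamName;

/* For strings only known at run time, the length is measured once here
 * and then carried along with the pointer. */
ParamName param_name_from_cstr(const char *s)
{
    ParamName p;

    assert(s != NULL);
    p.name = (const unsigned char *)s;
    p.len  = strlen(s);
    return p;
}
```

        An application could then pass LITERAL_LEN("color") and "color" as the
        <len> and <name> arguments of ProgramNamedParameter4fNV.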
  
    What is the deal with the alpha of the secondary color?  
  
        RESOLVED:  In unextended OpenGL 1.2, the alpha component of the  
        secondary color is forced to 0.0.  In the EXT_secondary_color  
        extension, the alpha of the per-vertex secondary colors is defined to  
        be 0.0.  NV_vertex_program allows vertex programs to produce a  
        per-vertex alpha component, but it is forced to zero for the purposes  
        of the color sum.  In the NV_register_combiners extension, the alpha  
        component of the secondary color is undefined.  What a mess.  
  
        In this extension, the alpha of the secondary color is well-defined  
        and can be used normally.  When in vertex program mode  
  
    Why are fragment program instructions involving f[FOGC] or f[TEX0] through  
    f[TEX7] automatically carried out at full precision?  
  
        RESOLVED:  This is an artifact of the method by which these  
        interpolants are generated by the NVIDIA graphics hardware.  If such  
        instructions absolutely must be carried out at lower precision, the  
        requirement can be met by first loading the interpolants into a  
        temporary register.  
  
    With a different number of texture coordinate sets and texture image  
    units, how many copies of each kind of texture state are there?  
  
        RESOLVED:  The intention is that texture state be broken into three  
        groups.  (1) There are MAX_TEXTURE_COORDS_NV copies of texture  
        coordinate set state, which includes current texture coordinates,  
        TexGen state, and texture matrices.  (2) There are  
        MAX_TEXTURE_IMAGE_UNITS_NV copies of texture image unit state, which  
        includes texture maps, texture parameters, and LOD bias parameters.  (3)  
        There are MAX_TEXTURE_UNITS_ARB copies of legacy OpenGL texture unit  
        state (e.g., texture enables, TexEnv blending state), all of which are  
        unused when in fragment program mode.  
  
        It is not necessary that MAX_TEXTURE_UNITS_ARB be equal to the minimum  
        of MAX_TEXTURE_COORDS_NV and MAX_TEXTURE_IMAGE_UNITS_NV --  
        implementations may choose not to extend fixed-function OpenGL texture  
        mapping modes beyond a certain point.  
  
    The GLX protocol for LoadProgramNV (and ProgramNamedParameterNV) may end  
    up with programs >64KB.  This will overflow the limits of the GLX Render  
    protocol, resulting in the need to use the RenderLarge path.  This is an  
    issue with vertex programs, also.  
  
        RESOLVED:  Yes, it is.  
  
    Should textures used by fragment programs be declared?  For example,  
    "TEXTURE TEX3, 2D", indicating that the 2D texture should be used for all  
    accesses to texture unit 3.  The dimension could be dropped from the TEX  
    family of instructions, and some of the compile-time error checking could  
    be dropped.  
  
        RESOLVED:  Maybe it should be, but for better or worse, it isn't.  
  
    It is not all that uncommon to have negative q values with projective  
    texture mapping, but results are undefined if any q values are negative in  
    this specification.  Why?  
  
        RESOLVED:  This restriction carries over a similar one from the initial  
        OpenGL specification.  The motivation for this restriction is that  
        when interpolating, it is possible for a fragment to have an  
        interpolated q coordinate at or near 0.0.  Since the texture  
        coordinates used for projective texture mapping are s/q, t/q, and r/q,  
        this will result in a divide-by-zero error or significant numerical  
        instability.  Results will be inaccurate for such fragments.  
  
        Other than the numerical stability issue above, NVIDIA hardware should  
        have no problems with negative q coordinates.  
  
    Should programs that replace depth have their own special program type,  
    such as "!!FPD1.0" and "!!FPDC1.0"?  
  
        RESOLVED:  No.  If a program has an instruction that writes to  
        o[DEPR], the final fragment depth value is taken from o[DEPR].z.  
        Otherwise, the fragment's original depth value is used.  
  
    What fx12 value should NaN map to?  
  
        RESOLVED:  For the lack of any better choice, 0.0.  
  
    How are special-case encodings (-INF, +INF, -0.0, +0.0, NaN) handled for  
    arithmetic and comparison operations?  
  
        RESOLVED:  The special cases for all floating-point operations are  
        designed to match the IEEE specification for floating-point numbers as  
        closely as possible.  The results produced by special cases should be  
        enumerated in the sections of this spec describing the operations.  
        There are some cases where the implemented fragment program behavior  
        does not match IEEE conventions, and these cases should be noted in  
        this specification.  
  
    How can condition codes be used to mask out register writes?  How about  
    killing fragments?  What other things can you do?  
  
        RESOLVED:  The following example computes a component-wise |R1-R2|:  
  
          SUBC R0, R1, R2;      # "C" suffix means update condition code  
          MOV  R0 (LT), -R0;    # Conditional write mask in parentheses  
  
        The first instruction computes a component-wise difference between R1  
        and R2, storing R1-R2 in register R0.  The "C" suffix in the  
        instruction means to update the condition code based on the sign of  
        the result vector components.  The second instruction inverts the sign  
        of the components of R0.  However, the "(LT)" portion says that the  
        destination register should be updated only if the corresponding  
        condition code component is LT (negative).  This means that only those  
        components of R0 that are negative are negated, leaving |R1-R2|.  
  
        To kill a fragment if the red (x) component of a texture lookup  
        returns zero:  
  
          TEXC R0, f[TEX0], TEX0, 2D;  
          KIL EQ.x;  
  
        To kill based on the green (y) component, use "EQ.y" instead.  To kill  
        if any of the four components is zero, use "EQ.xyzw" or just "EQ".  
          
        Fragment programs do not support boolean expressions.  These can  
        generally be achieved using conditional write masks.  
  
        To evaluate the expression "(R0.x == 0) && (R1.x == 0)":  
  
          MOVC RC.x, R0.x;  
          MOVC RC.x (EQ), R1.x;  
  
        To evaluate the expression "(R0.x == 0) || (R1.x == 0)":  
  
          MOVC RC.x, R0.x;  
          MOVC RC.x (NE), R1.x;  
  
        In both cases, the x component of the condition code will contain "EQ"  
        if and only if the condition is TRUE.  
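
        The two sequences above can be modeled in C as shown below.  This is
        only a sketch: the type and function names are hypothetical, and it
        assumes (as the examples rely on) that a failed condition test also
        suppresses the condition code update of the second MOVC.

```c
#include <assert.h>

/* One component of the condition code register. */
typedef enum { CC_LT, CC_EQ, CC_GT, CC_UN } CondCode;

/* Condition code produced by a "C"-suffixed instruction for one result. */
static CondCode classify(float v)
{
    if (v != v)   return CC_UN;   /* NaN compares unequal to itself */
    if (v < 0.0f) return CC_LT;
    if (v > 0.0f) return CC_GT;
    return CC_EQ;
}

/* Models "MOVC RC.x, a; MOVC RC.x (EQ), b;".
 * Returns 1 if CC.x ends up EQ, i.e. (a == 0) && (b == 0). */
int cc_and_zero(float a, float b)
{
    CondCode cc = classify(a);
    if (cc == CC_EQ)              /* (EQ) mask: update only if still EQ */
        cc = classify(b);
    return cc == CC_EQ;
}

/* Models "MOVC RC.x, a; MOVC RC.x (NE), b;".
 * Returns 1 if CC.x ends up EQ, i.e. (a == 0) || (b == 0). */
int cc_or_zero(float a, float b)
{
    CondCode cc = classify(a);
    if (cc != CC_EQ)              /* (NE) mask: update only if not EQ */
        cc = classify(b);
    return cc == CC_EQ;
}

/* Usage example covering the mixed cases. */
void cc_bool_example(void)
{
    assert(cc_and_zero(0.0f, 0.0f) == 1);
    assert(cc_and_zero(0.0f, 2.0f) == 0);
    assert(cc_or_zero(3.0f, 0.0f) == 1);
    assert(cc_or_zero(3.0f, 2.0f) == 0);
}
```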
  
    How can fragment programs be used to implement non-standard texture  
    filtering modes?  
  
        RESOLVED:  As one example, consider a case where you want to do linear  
        filtering in a 2D texture map, but only horizontally.  To achieve  
        this, first set the texture filtering mode to NEAREST.  For a 16 x n  
        texture, you might do something like:  
  
          DEFINE halfTexel = { 0.03125, 0 };   # 1/32 (1/2 a texel)  
          ADD R2, f[TEX0], -halfTexel;         # coords of left sample  
          ADD R1, f[TEX0], +halfTexel;         # coords of right sample  
          TEX R0, R2, TEX0, 2D;                # lookup left sample  
          TEX R1, R1, TEX0, 2D;                # lookup right sample  
          MUL R2.x, R2.x, 16;                  # scale X coords to texels  
          FRC R2.x, R2.x;                      # get fraction, filter weight  
          LRP R0, R2.x, R1, R0;                # blend samples based on weight  
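
        For reference, a CPU-side C model of the same computation on a single
        16-texel row is sketched below.  The function names are hypothetical,
        and clamp-to-edge addressing is an assumption of the sketch; the
        example above does not specify a wrap mode.

```c
#include <assert.h>
#include <stddef.h>

#define TEX_WIDTH 16

/* floor() for the small magnitudes used here, avoiding <math.h>. */
static float floor_f(float x)
{
    float t = (float)(long)x;
    return (t > x) ? t - 1.0f : t;
}

/* NEAREST lookup into one row, clamped to the edge texels (assumed). */
static float tex_nearest(const float row[TEX_WIDTH], float s)
{
    int i = (int)floor_f(s * TEX_WIDTH);
    if (i < 0) i = 0;
    if (i > TEX_WIDTH - 1) i = TEX_WIDTH - 1;
    return row[i];
}

/* Horizontal linear filter built from two NEAREST lookups. */
float filter_horizontal(const float row[TEX_WIDTH], float s)
{
    const float half_texel = 0.5f / TEX_WIDTH;   /* 0.03125 */
    float left_s, left, right, weight;

    assert(row != NULL);
    left_s = s - half_texel;                     /* ADD R2, f[TEX0], -halfTexel */
    left   = tex_nearest(row, left_s);           /* TEX R0, R2, TEX0, 2D */
    right  = tex_nearest(row, s + half_texel);   /* TEX R1, R1, TEX0, 2D */
    weight = left_s * TEX_WIDTH;                 /* MUL R2.x, R2.x, 16 */
    weight = weight - floor_f(weight);           /* FRC R2.x, R2.x */
    return weight * right + (1.0f - weight) * left;  /* LRP R0, R2.x, R1, R0 */
}
```

        With a row whose texel i holds the value i, sampling at a texel center
        returns that texel and sampling midway between two centers returns
        their average.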
  
        There are plenty of other interesting things that can be done.  
  
    Should this specification provide more examples?  
  
        RESOLVED:  Yes, it should.  
  
    Is the OpenGL ARB working on a multi-vendor standard for fragment  
    programmability?  Will there be an ARB_fragment_program extension?  If so,  
    how will this extension interact with the ARB standard?  
  
        RESOLVED:  Yes, as of July 2002, there was a multi-vendor working  
        group and a draft specification.  The ARB extension is expected to  
        have several features not present in this extension, such as state  
        tracking and global parameters (called "program environment  
        parameters").  It will also likely lack certain features found in this  
        extension.  
  
    Why does the HEMI mapping apply to the third component of signed HILO  
    textures, but not to unsigned HILO textures?  
  
        RESOLVED:  This behavior matches the behavior of NV_texture_shader  
        (e.g., the DOT_PRODUCT_NV mode).  The HEMI mapping will construct the  
        third component of a unit vector whose first two components are  
        encoded in the HILO texture.

New Procedures and Functions

  
    void ProgramNamedParameter4fNV(uint id, sizei len, const ubyte *name,  
                                   float x, float y, float z, float w);  
    void ProgramNamedParameter4dNV(uint id, sizei len, const ubyte *name,  
                                   double x, double y, double z, double w);  
    void ProgramNamedParameter4fvNV(uint id, sizei len, const ubyte *name,  
                                    const float v[]);  
    void ProgramNamedParameter4dvNV(uint id, sizei len, const ubyte *name,  
                                    const double v[]);  
    void GetProgramNamedParameterfvNV(uint id, sizei len, const ubyte *name,  
                                      float *params);  
    void GetProgramNamedParameterdvNV(uint id, sizei len, const ubyte *name,  
                                      double *params);  
  
    void ProgramLocalParameter4dARB(enum target, uint index,  
                                    double x, double y, double z, double w);  
    void ProgramLocalParameter4dvARB(enum target, uint index,  
                                     const double *params);  
    void ProgramLocalParameter4fARB(enum target, uint index,  
                                    float x, float y, float z, float w);  
    void ProgramLocalParameter4fvARB(enum target, uint index,  
                                     const float *params);  
    void GetProgramLocalParameterdvARB(enum target, uint index,  
                                       double *params);  
    void GetProgramLocalParameterfvARB(enum target, uint index,   
                                       float *params);

New Tokens

  
    Accepted by the <cap> parameter of Disable, Enable, and IsEnabled, by the  
    <pname> parameter of GetBooleanv, GetIntegerv, GetFloatv, and GetDoublev,  
    and by the <target> parameter of BindProgramNV, LoadProgramNV,  
    ProgramLocalParameter4dARB, ProgramLocalParameter4dvARB,  
    ProgramLocalParameter4fARB, ProgramLocalParameter4fvARB,  
    GetProgramLocalParameterdvARB, and GetProgramLocalParameterfvARB:  
  
        FRAGMENT_PROGRAM_NV                            0x8870  
  
    Accepted by the <pname> parameter of GetBooleanv, GetIntegerv, GetFloatv,  
    and GetDoublev:  
  
        MAX_TEXTURE_COORDS_NV                          0x8871  
        MAX_TEXTURE_IMAGE_UNITS_NV                     0x8872  
        FRAGMENT_PROGRAM_BINDING_NV                    0x8873  
        MAX_FRAGMENT_PROGRAM_LOCAL_PARAMETERS_NV       0x8868  
  
    Accepted by the <name> parameter of GetString:  
  
        PROGRAM_ERROR_STRING_NV                        0x8874

Additions to Chapter 2 of the OpenGL 1.2.1 Specification (OpenGL Operation)

  
    Modify Section 2.11, Clipping (p.39)  
  
    (replace the first paragraph of the section, p. 39)  Primitives are clipped  
    to the clip volume.  In clip coordinates, the view volume is defined by  
      
        -w_c <= x_c <= w_c,  
        -w_c <= y_c <= w_c, and  
        -w_c <= z_c <= w_c.  
  
    Clipping to the near and far clip planes is ignored if fragment program  
    mode (section 3.11) or texture shaders (see NV_texture_shader  
    specification) are enabled, if the current fragment program or texture  
    shader computes per-fragment depth values.  In this case, the view volume  
    is defined by:  
      
        -w_c <= x_c <= w_c and  
        -w_c <= y_c <= w_c.
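
    The two view volume definitions above can be expressed as a single
    containment test.  The following C fragment is a sketch only; the
    structure and function names are hypothetical, and <depth_replaced>
    stands for "the enabled fragment program or texture shader computes
    per-fragment depth values".

```c
#include <assert.h>
#include <stddef.h>

typedef struct { float x, y, z, w; } ClipCoord;

/* Returns 1 if the clip-space point lies inside the view volume.  When
 * depth_replaced is nonzero, the near and far planes are not tested. */
int inside_view_volume(const ClipCoord *c, int depth_replaced)
{
    assert(c != NULL);

    if (c->x < -c->w || c->x > c->w) return 0;
    if (c->y < -c->w || c->y > c->w) return 0;
    if (!depth_replaced && (c->z < -c->w || c->z > c->w)) return 0;
    return 1;
}
```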

Additions to Chapter 3 of the OpenGL 1.2.1 Specification (Rasterization)

  
    Modify Chapter 3 introduction (p. 57)  
  
    (p.57, modify 1st paragraph) ... Figure 3.1 diagrams the rasterization  
    process.  The color value assigned to a fragment is initially determined  
    by the rasterization operations (Sections 3.3 through 3.7) and modified by  
    either the execution of the texturing, color sum, and fog operations as  
    defined in Sections 3.8, 3.9, and 3.10, or of a fragment program defined  
    in Section 3.11.  The final depth value is initially determined by the  
    rasterization operations and may be modified by a fragment program.  
    
    Note:  Antialiasing Application is renumbered from Section 3.11 to Section  
    3.12.  
  
    Modify Figure 3.1 (p.58)  
  
                             Primitive Assembly  
                                      |  
              +-----------+-----------+-----------+-----------+  
              |           |           |           |           |  
              |           |           |        Pixel          |  
            Point       Line       Polygon     Rectangle   Bitmap  
           Raster-     Raster-     Raster-     Raster-     Raster-  
           ization     ization     ization     ization     ization     
              |           |           |           |           |  
              +-----------+-----------+-----------+-----------+  
                                      |  
                                      |  
                    +-----------------+-----------------+  
                    |                 |                 |  
              Conventional         Texture          Fragment  
              Texture Fetch        Shaders          Programs  
                    |                 |                 |  
                    |  +--------------+                 |  
                    |  |                                |  
        TEXTURE_    o  o                                |  
        SHADER_NV                                       |  
        enable      o                                   |   
                    |                                   |  
                    +-------------+                     |  
                    |             |                     |  
               Conventional   Register                  |  
                  TexEnv      Combiners                 |  
                    |             |                     |  
                Color Sum         |                     |  
                    |             |                     |  
                   Fog            |                     |  
                    |             |                     |  
                    |  +----------+                     |  
                    |  |                                |   
        REGISTER_   o  o                                |  
        COMBINERS_                                      |  
        NV enable   o                                   |  
                    |                                   |  
                    +-----------------+  +--------------+  
                                      |  |  
                           FRAGMENT_  o  o  
                           PROGRAM_  
                           NV enable  o  
                                      |  
                                      |  
                                   Coverage   
                                  Application  
                                      |  
                                      v  
                            to fragment processing  
  
  
    Modify Section 3.3, Points (p.61)  
  
    All fragments produced in rasterizing a non-antialiased point are assigned  
    the same associated data, which are those of the vertex corresponding to  
    the point.  (delete reference to divide by q).  
  
    If antialiasing is enabled, then ...  The data associated with each  
    fragment are otherwise the data associated with the point being  
    rasterized.  (delete reference to divide by q)  
  
    Modify Section 3.4.1, Basic Line Segment Rasterization (p.66)  
  
    (Note that t=0 at p_a and t=1 at p_b).  The value of an associated datum f  
    for the fragment, whether it be R, G, B, or A (in RGBA mode) or a color  
    index (in color index mode), the s, t, r, or q texture coordinate, or the  
    clip w coordinate (the depth value, window z, must be found using equation  
    3.3, below), is found as  
  
      f = (1-t) * f_a / w_a + t * f_b / w_b                     (3.2)  
          ---------------------------------  
                (1-t) / w_a + t / w_b  
  
    where f_a and f_b are the data associated with the starting and ending  
    endpoints of the segment, respectively; w_a and w_b are the clip  
    w coordinates of the starting and ending endpoints of the segment,  
    respectively.  Note that linear interpolation would use  
  
      f = (1-t) * f_a + t * f_b.                                (3.3)  
  
    ... A GL implementation may choose to approximate equation 3.2 with 3.3,  
    but this will normally lead to unacceptable distortion effects when  
    interpolating texture coordinates or clip w coordinates.  
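
    The difference between equations 3.2 and 3.3 can be seen in a small C
    sketch (function names are hypothetical).  For example, with f_a = 0,
    f_b = 1, w_a = 1, w_b = 3, and t = 0.5, equation 3.2 yields 0.25 while
    equation 3.3 yields 0.5.

```c
#include <assert.h>

/* Equation 3.2: perspective-correct interpolation along a line segment. */
double interp_perspective(double t, double f_a, double w_a,
                          double f_b, double w_b)
{
    double num, den;

    assert(w_a != 0.0 && w_b != 0.0);
    num = (1.0 - t) * f_a / w_a + t * f_b / w_b;
    den = (1.0 - t) / w_a + t / w_b;
    return num / den;
}

/* Equation 3.3: plain linear interpolation. */
double interp_linear(double t, double f_a, double f_b)
{
    return (1.0 - t) * f_a + t * f_b;
}
```

    When w_a equals w_b, the two functions agree.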
  
    Modify Section 3.5.1, Basic Polygon Rasterization (p.71)  
  
    Denote a datum at p_a, p_b, or p_c ... is given by  
  
      f = a * f_a / w_a + b * f_b / w_b + c * f_c / w_c         (3.4)  
          ---------------------------------------------  
                  a / w_a + b / w_b + c / w_c  
  
    where w_a, w_b, and w_c are the clip w coordinates of p_a, p_b, and p_c,  
    respectively.  a, b, and c are the barycentric coordinates of the fragment  
    for which the data are produced. a, b, and c must correspond precisely to  
    the exact coordinates ... at the fragment's center.  
      
    Just as with line segment rasterization, equation 3.4 may be approximated  
    by  
      
      f = a * f_a + b * f_b + c * f_c;                          (3.5)  
  
    this may yield ... for texture coordinates or clip w coordinates.  
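
    Equation 3.4 can be sketched in C in the same way.  The function name
    and the array-based argument layout are assumptions made for this
    illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Equation 3.4: perspective-correct interpolation over a triangle.
 * bary holds (a, b, c), f the data at p_a, p_b, p_c, and w the clip w
 * coordinates of those vertices. */
double interp_triangle(const double bary[3], const double f[3],
                       const double w[3])
{
    double num = 0.0, den = 0.0;
    int i;

    assert(bary != NULL && f != NULL && w != NULL);
    for (i = 0; i < 3; i++) {
        assert(w[i] != 0.0);
        num += bary[i] * f[i] / w[i];
        den += bary[i] / w[i];
    }
    return num / den;
}
```

    If all three w values are equal, the result reduces to equation 3.5.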
  
    Modify Section 3.6.4, Rasterization of Pixel Rectangles (p.100)  
  
    A fragment arising from a group ... are given by those associated with the  
    current raster position.  (delete reference to divide by q)  
        
    Modify Section 3.7, Bitmaps (p.111)  
  
    Otherwise, a rectangular array ... The associated data for each fragment  
    are those associated with the current raster position.  (delete reference  
    to divide by q)  Once the fragments have been produced ...  
  
    Modify Section 3.8, Texturing (p.112)  
  
    ... an image at the location indicated by a fragment's texture coordinates  
    to modify the fragment's primary RGBA color.  Texturing does not affect the  
    secondary color.    
  
    Texturing is specified only for RGBA mode; its use in color index mode is  
    undefined.  
  
    Except when in fragment program mode (Section 3.11), the (s,t,r) texture  
    coordinates used for texturing are the values s/q, t/q, and r/q,  
    respectively, where s, t, r, and q are the texture coordinates associated  
    with the fragment.  When in fragment program mode, the (s,t,r) texture  
    coordinates are specified by the program.  If q is less than or equal to  
    zero, the results of texturing are undefined.  
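
    Outside of fragment program mode, the projection described above
    amounts to the following C sketch.  The names are hypothetical, and
    returning a failure flag for q <= 0 is merely how this sketch chooses to
    represent "undefined".

```c
#include <assert.h>
#include <stddef.h>

typedef struct { float s, t, r, q; } TexCoord;

/* Computes (s/q, t/q, r/q) into out.  Returns 1 on success, or 0 if
 * q <= 0, in which case out is left untouched. */
int project_texcoord(const TexCoord *in, float out[3])
{
    assert(in != NULL && out != NULL);

    if (in->q <= 0.0f)
        return 0;

    out[0] = in->s / in->q;
    out[1] = in->t / in->q;
    out[2] = in->r / in->q;
    return 1;
}
```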
  
    Add new Section 3.11, Fragment Programs (p.140)    
  
    Fragment program mode is enabled and disabled with the Enable and Disable  
    commands using the symbolic constant FRAGMENT_PROGRAM_NV.  When fragment  
    program mode is enabled, standard and extended texturing, color sum, and  
    fog application stages are ignored and a general purpose program is  
    executed instead.    
  
    A fragment program is a sequence of instructions that execute on a  
    per-fragment basis.  In fragment program mode, the currently bound  
    fragment program is executed as each fragment is generated by the  
    rasterization operations.  Fragment programs execute a finite fixed  
    sequence of instructions with no branching or looping, and operate  
    independently from the processing of other fragments.  Fragment programs  
    are used to compute new color values to be associated with each fragment,  
    and can optionally compute a new depth value for each fragment as well.  
  
    Fragment program mode is not available in color index mode and is  
    considered disabled, regardless of the state of FRAGMENT_PROGRAM_NV.  When  
    fragment program mode is enabled, texture shaders and register combiners  
    (NV_texture_shader and NV_register_combiners extensions) are disabled,  
    regardless of the state of TEXTURE_SHADER_NV and REGISTER_COMBINERS_NV.  
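
    The precedence among these enables, as described above and diagrammed
    in Figure 3.1, can be summarized by the following C sketch.  The
    structure, enum, and function names are hypothetical, and the sketch
    models only the interactions stated in this section.

```c
#include <assert.h>
#include <stddef.h>

typedef enum {
    PATH_CONVENTIONAL,        /* TexEnv, color sum, and fog */
    PATH_REGISTER_COMBINERS,
    PATH_FRAGMENT_PROGRAM
} ColorPath;

typedef struct {
    int rgba_mode;            /* 0 when in color index mode */
    int fragment_program;     /* FRAGMENT_PROGRAM_NV enable */
    int register_combiners;   /* REGISTER_COMBINERS_NV enable */
    int texture_shader;       /* TEXTURE_SHADER_NV enable */
} EnableState;

/* Fragment program mode counts as enabled only in RGBA mode. */
static int fragment_program_mode(const EnableState *s)
{
    return s->rgba_mode && s->fragment_program;
}

/* Selects the stage that produces the fragment color. */
ColorPath select_color_path(const EnableState *s)
{
    assert(s != NULL);
    if (fragment_program_mode(s)) return PATH_FRAGMENT_PROGRAM;
    if (s->register_combiners)    return PATH_REGISTER_COMBINERS;
    return PATH_CONVENTIONAL;
}

/* Texture shaders run only if enabled and not overridden. */
int texture_shaders_active(const EnableState *s)
{
    assert(s != NULL);
    return s->texture_shader && !fragment_program_mode(s);
}
```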
  
    Section 3.11.1, Fragment Program Registers  
  
    Fragment programs operate on a set of program registers.  Each program  
    register is a 4-component vector, whose components are referred to as "x",  
    "y", "z", and "w" respectively.  The components of a fragment register are  
    always referred to in this manner, regardless of the meaning of their  
    contents.  
  
    The four components of each fragment program register have one of two  
    different representations:  32-bit floating-point (fp32) or 16-bit  
    floating-point (fp16).  More details on these representations can be found  
    in Section 3.11.4.1.  
  
    There are several different classes of program registers.  Attribute  
    registers (Table X.1) correspond to the fragment's associated data  
    produced by rasterization.  Temporary registers (Table X.2) hold  
    intermediate results generated by the fragment program.  Output registers  
    (Table X.3) hold the final results of a fragment program.  The single  
    condition code register is used to mask writes to other registers or to  
    determine if a fragment should be discarded.  
  
  
    Section 3.11.1.1, Fragment Program Attribute Registers  
  
    The fragment program attribute registers (Table X.1) hold the location of  
    the fragment and the data associated with the fragment produced by  
    rasterization.  
  
    Fragment Attribute                                    Component  
    Register Name    Description                          Interpretation  
    --------------   -----------------------------------  --------------  
       f[WPOS]       Position of the fragment center.     (x,y,z,1/w)  
       f[COL0]       Interpolated primary color           (r,g,b,a)  
       f[COL1]       Interpolated secondary color         (r,g,b,a)  
       f[FOGC]       Interpolated fog distance/coord      (z,0,0,0)  
       f[TEX0]       Texture coordinate (unit 0)          (s,t,r,q)  
       f[TEX1]       Texture coordinate (unit 1)          (s,t,r,q)  
       f[TEX2]       Texture coordinate (unit 2)          (s,t,r,q)  
       f[TEX3]       Texture coordinate (unit 3)          (s,t,r,q)  
       f[TEX4]       Texture coordinate (unit 4)          (s,t,r,q)  
       f[TEX5]       Texture coordinate (unit 5)          (s,t,r,q)  
       f[TEX6]       Texture coordinate (unit 6)          (s,t,r,q)  
       f[TEX7]       Texture coordinate (unit 7)          (s,t,r,q)  
  
    Table X.1:  Fragment Attribute Registers.  The component interpretation  
    column describes the mapping of attribute values to register components.  
    For example, the "x" component of f[COL0] holds the red color component,  
    and the "x" component of f[TEX0] holds the "s" texture coordinate for  
    texture unit 0.  The entries "0" and "1" indicate that the attribute  
    register components hold the constants 0 and 1, respectively.  
  
    f[WPOS].x and f[WPOS].y hold the (x,y) window coordinates of the fragment  
    center, relative to the lower left corner of the window.  f[WPOS].z  
    holds the associated z window coordinate, normally in the range [0,1].  
    f[WPOS].w holds the reciprocal of the associated clip w coordinate.  
  
    f[COL0] and f[COL1] hold the associated RGBA primary and secondary colors  
    of the fragment, respectively.    
  
    f[FOGC] holds the associated eye distance or fog coordinate normally used  
    for fog computations.  
  
    f[TEX0] through f[TEX7] hold the associated texture coordinates for  
    texture coordinate sets 0 through 7, respectively.  
  
    All attribute register components are treated as 32-bit floats.  However,  
    the components of primary and secondary colors (f[COL0] and f[COL1]) may  
    be generated with reduced precision.  
  
    The contents of the fragment attribute registers may not be modified by a  
    fragment program.  In addition, each fragment program instruction can use  
    at most one unique attribute register.  
  
  
    Section 3.11.1.2, Fragment Program Temporary Registers  
  
    The fragment temporary registers (Table X.2) hold intermediate values used  
    during the execution of a fragment program.  There are 96 temporary  
    register names, but not all can be used simultaneously.  
  
    Fragment Temporary  
    Register Name       Description  
    ------------------  -----------------------------------------------------  
        R0-R31          Four 32-bit (fp32) floating point values (s.e8.m23)  
        H0-H63          Four 16-bit (fp16) floating point values (s.e5.m10)  
  
    Table X.2:  Fragment Temporary Registers.  
  
    In addition to the normal temporary registers, there are two temporary  
    pseudo-registers, "RC" and "HC".  RC and HC are treated as unnumbered,  
    write-only temporary registers.  The components of RC have an fp32 data  
    type; the components of HC have an fp16 data type.  The sole purpose of  
    these registers is to permit instructions to modify the condition code  
    register (section 3.11.1.4) without overwriting the values in any  
    temporary register.  
  
    Fragment program instructions can read and write temporary registers.  
    There is no restriction on the number of temporary registers that can be  
    accessed by any given instruction.  
  
    All temporary registers are initialized to (0,0,0,0) each time a fragment  
    program executes.  
  
  
    Section 3.11.1.3, Fragment Program Output Registers  
  
    The fragment program output registers hold the final results of the  
    fragment program.  The possible final results of a fragment program are a  
    high- or low-precision RGBA fragment color, and a fragment depth value.  
  
       Output  
    Register Name      Description  
    -------------      -------------------------------------------------------  
       o[COLR]         Final RGBA fragment color, fp32 format  
       o[COLH]         Final RGBA fragment color, fp16 format  
       o[DEPR]         Final fragment depth value, fp32 format  
  
    Table X.3:  Fragment Program Output Registers.  
  
    o[COLR] and o[COLH] specify the color of a fragment.  These two registers  
    are identical, except for the associated data type of the components.  The  
    R, G, B, and A components of the fragment color are taken from the x, y,  
    z, and w components respectively of o[COLR] or o[COLH].  A fragment  
    program will fail to load if it writes to both o[COLR] and o[COLH].  
  
    o[DEPR] can be used to replace the associated depth value of a fragment.  
    The new depth value is taken from the z component of o[DEPR].  If a  
    fragment program does not write to o[DEPR], the associated depth value is  
    unmodified.  
  
    A fragment program will fail to load if it does not write to at least one  
    output register.  
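
    The two load-time rules on output registers stated above can be
    checked as in the following C sketch.  The flag names and the function
    are hypothetical; an actual implementation performs this check while
    parsing the program string.

```c
#include <assert.h>

/* One bit per output register written anywhere in the program. */
enum {
    WRITES_COLR = 1,
    WRITES_COLH = 2,
    WRITES_DEPR = 4
};

/* Returns 1 if the set of written outputs permits the program to load. */
int outputs_loadable(unsigned writes)
{
    assert((writes & ~7u) == 0);   /* only the three known flags */

    /* Writing both o[COLR] and o[COLH] is an error. */
    if ((writes & WRITES_COLR) && (writes & WRITES_COLH))
        return 0;

    /* At least one output register must be written. */
    if (writes == 0)
        return 0;

    return 1;
}
```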
  
    The fragment program output registers may not be read by a fragment  
    program, but may be written to multiple times.    
  
    The values of all fragment program output registers are initially  
    undefined.  
  
  
    Section 3.11.1.4, Fragment Program Condition Code Register  
  
    The condition code register (CC) is a single four-component vector.  Each  
    component of this register is one of four enumerated values:  GT (greater  
    than), EQ (equal), LT (less than), or UN (unordered).  The condition code  
    register can be used to mask writes to fragment data register components  
    or to terminate processing of a fragment altogether (via the KIL  
    instruction).  
  
    Most fragment program instructions can optionally update the condition  
    code register.  When a fragment program instruction updates the condition  
    code register, a condition code component is set to LT if the  
    corresponding component of the result vector is less than zero, EQ if it  
    is equal to zero, GT if it is greater than zero, and UN if it is NaN (not  
    a number).  
  
    The condition code register is initialized to a vector of EQ values each  
    time a fragment program executes.  
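
    The update rule above can be modelled in a few lines of Python.  This is
    a sketch for illustration only; the names condition_code, update_cc, and
    INITIAL_CC are hypothetical, and write masking of the condition code
    update is not modelled.

```python
import math

# The condition code register starts each program execution as all EQ.
INITIAL_CC = ("EQ", "EQ", "EQ", "EQ")


def condition_code(value):
    """Map one result component to its condition code."""
    if math.isnan(value):
        return "UN"  # unordered: the result is NaN
    if value < 0.0:
        return "LT"
    if value > 0.0:
        return "GT"
    return "EQ"


def update_cc(result):
    """Return the condition code vector for a 4-component result."""
    return tuple(condition_code(component) for component in result)
```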
  
  
    Section 3.11.2, Fragment Program Parameters  
  
    In addition to using the registers defined in Section 3.11.1, fragment  
    programs may also use fragment program parameters in their computation.  
    Fragment program parameters are constant during the execution of fragment  
    programs, but some parameters may be modified outside the execution of a  
    fragment program.  
  
    There are five different types of program parameters:  embedded scalar  
    constants, embedded vector constants, named constants, named local  
    parameters, and numbered local parameters.  
  
    Embedded scalar constants are written as standard floating-point numbers  
    with an optional sign designator ("+" or "-") and optional scientific  
    notation (e.g., "E+06", meaning "times 10^6").  
   
    Embedded vector constants are written as a comma-separated array of one to  
    four scalar constants, surrounded by braces (like a C/C++ array  
    initializer).  Vector constants are always treated as 4-component vectors:
    constants with fewer than four components are expanded to four components
    by filling missing y and z components with 0.0 and missing w components
    with 1.0.  Thus, the vector constant "{2}" is equivalent to "{2,0,0,1}",
    "{3,4}" is equivalent to "{3,4,0,1}", and "{5,6,7}" is equivalent to  
    "{5,6,7,1}".  
  
    Named constants allow fragment program instructions to define scalar or  
    vector constants that can be referenced by name.  Named constants are  
    created using the DEFINE instruction:  
  
        DEFINE pi = 3.1415926535;  
        DEFINE color = {0.2, 0.5, 0.8, 1.0};  
  
    The DEFINE instruction associates a constant name with a scalar or vector  
    constant value.  Subsequent fragment program instructions that use the  
    constant name are equivalent to those using the corresponding constant  
    value.  
  
    Named local parameters are similar to named vector constants, but their  
    values can be modified after the program is loaded.  Local parameters are  
    created using the DECLARE instruction:  
  
        DECLARE fog_color1;  
        DECLARE fog_color2 = {0.3, 0.6, 0.9, 0.1};  
  
    The DECLARE instruction creates a 4-component vector associated with the  
    local parameter name.  Subsequent fragment program instructions  
    referencing the local parameter name are processed as though the current  
    value of the local parameter vector were specified instead of the  
    parameter name.  A DECLARE instruction can optionally specify an initial  
    value for the local parameter, which can be either a scalar or vector  
    constant.  Scalar constants are expanded to 4-component vectors by  
    replicating the scalar value in each component.  The initial value of  
    local parameters not initialized by the program is (0,0,0,0).  
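
    Scalar and vector initializers expand differently: a scalar is
    replicated, while a short vector constant is padded.  The Python sketch
    below contrasts the two rules.  The function names are hypothetical, and
    representing a vector constant as a Python list is an assumption of the
    example.

```python
def expand_vector_constant(components):
    """Pad a 1- to 4-component vector constant to four components."""
    values = [float(c) for c in components]
    if not 1 <= len(values) <= 4:
        raise ValueError("a vector constant has one to four components")
    # Missing y and z become 0.0; missing w becomes 1.0.
    fill = [0.0, 0.0, 0.0, 1.0]
    return tuple(values + fill[len(values):])


def replicate_scalar(value):
    """Expand a scalar initializer by copying it into every component."""
    return (float(value),) * 4


def declare_initial_value(initializer=None):
    """Initial value of a DECLAREd local parameter."""
    if initializer is None:
        return (0.0, 0.0, 0.0, 0.0)  # not initialized by the program
    if isinstance(initializer, (int, float)):
        return replicate_scalar(initializer)
    return expand_vector_constant(initializer)
```

    Under this model, an initializer of 2 yields (2,2,2,2), whereas {2}
    yields (2,0,0,1).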
  
    A named local parameter for a specific program can be updated using the  
    calls ProgramNamedParameter4fNV or ProgramNamedParameter4fvNV (section  
    5.7).  Named local parameters are accessible only by the program in which  
    they are defined.  Modifying a local parameter affects only the
    associated program and does not affect local parameters with the same name  
    that are found in any other fragment program.  
  
    Numbered local parameters are similar to named local parameters, except  
    that they are referred to by number and are not declared in fragment  
    programs.  Each fragment program object has an array of four-component  
    floating-point vectors that can be used by the program.  The number of  
    vectors is given by the implementation-dependent constant  
    MAX_FRAGMENT_PROGRAM_LOCAL_PARAMETERS_NV, and must be at least 64.
    Numbered local parameters are accessed by a fragment program as members
    of an array called "p".  For example, the instruction
  
        MOV R0, p[31];  
  
    copies the contents of numbered local parameter 31 into temporary register  
    R0.  
  
    Constant and local parameter names can be arbitrary strings consisting of
    letters (upper- or lower-case), numbers, underscores ("_"), and dollar
    signs ("$").  Keywords defined in the grammar (including instruction
    names) cannot be used as constant names, nor can strings that start with
    numbers, or strings that specify valid temporary register or texture
    numbers (e.g., "R0"-"R31", "H0"-"H63", "TEX0"-"TEX15").  A fragment
    program will fail to load if a DEFINE or DECLARE instruction specifies an  
    invalid constant or local parameter name.  
  
    A fragment program will fail to load if an instruction contains a named  
    parameter not specified in a previous DEFINE or DECLARE instruction.  A  
    fragment program will also fail to load if a DEFINE or DECLARE instruction  
    attempts to re-define a named parameter specified in a previous DEFINE or  
    DECLARE instruction.  
  
    The contents of the fragment program parameters may not be modified by a  
    fragment program.  In addition, each fragment program instruction can  
    normally use at most one unique program parameter.  The only exception to  
    this rule is if all program parameter references specify named or embedded  
    constants that taken together contain no more than four unique scalar  
    values.  For such instructions, the GL will automatically generate an  
    equivalent instruction that references a single merged vector constant.  
    This merging allows programs to specify instructions like the following:  
  
        Instruction              Equivalent Instruction  
        ---------------------    ---------------------------------------  
        MAD R0, R1, 2, -1;       MAD R0, R1, {2,-1,0,0}.x, {2,-1,0,0}.y;  
        ADD R0, {1,2,3,4}, 4;    ADD R0, {1,2,3,4}.xyzw, {1,2,3,4}.w;  
  
    Before counting the number of unique values, any named constants are first  
    converted to the equivalent embedded constants.  When generating a  
    combined vector constant, the GL does not perform swizzling, component  
    selection, negation, or absolute value operations.  The following  
    instructions are invalid, as they contain more than four unique scalar  
    values.  
  
        Invalid Instructions  
        -----------------------------------  
        ADD R0, {1,2,3,4}, -4;  
        ADD R0, {1,2,3,4}, |-4|;  
        ADD R0, {1,2,3,4}, -{-1,-2,-3,-4};  
        ADD R0, {1,2,3,4}, {4,5,6,7}.x;  
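
    The unique-value test behind this merging can be sketched in Python as
    shown below.  The function merge_constants is hypothetical; it assumes
    each constant operand is supplied as the list of scalar values exactly
    as written, before any swizzle, component selection, negation, or
    absolute value is applied, and that unused components of the merged
    vector are filled with 0 as in the table of equivalent instructions.

```python
def merge_constants(operands):
    """Merge constant operands into one vector, or return None.

    `operands` is a list of lists of scalar values as written in the
    instruction (a hypothetical input format).  None means the instruction
    needs more than four unique scalar values and is therefore invalid.
    """
    merged = []
    for operand in operands:
        for value in operand:
            value = float(value)
            if value not in merged:
                merged.append(value)
    if len(merged) > 4:
        return None
    # Pad unused components with 0.0.
    return tuple(merged + [0.0] * (4 - len(merged)))
```

    For example, the operands of "MAD R0, R1, 2, -1" merge to (2,-1,0,0),
    while "ADD R0, {1,2,3,4}, -4" requires five unique values and is
    rejected.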
  
  
    Section 3.11.3, Fragment Program Specification  
  
    Fragment programs are specified as an array of ubytes.  The array is a  
    string of ASCII characters encoding the program.  The command  
    LoadProgramNV loads a fragment program when the target parameter is  
    FRAGMENT_PROGRAM_NV.  The command BindProgramNV enables a fragment program  
    for execution.  
  
    At program load time, the program is parsed into a set of tokens possibly  
    separated by white space.  Spaces, tabs, newlines, carriage returns, and  
    comments are considered whitespace.  Comments begin with the character "#"  
    and are terminated by a newline, a carriage return, or the end of the  
    program array.  Fragment programs are case-sensitive -- upper and lower  
    case letters are treated differently.  The proper choice of case can be  
    inferred from the grammar.  
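
    The comment rule can be illustrated with a short Python sketch.  The
    helper strip_comments is hypothetical and handles only comment removal;
    it replaces each comment with a single space so that the terminating
    newline or carriage return is kept as whitespace.

```python
import re

# A comment runs from "#" up to, but not including, the next newline or
# carriage return, or to the end of the program string.
_COMMENT = re.compile(r"#[^\r\n]*")


def strip_comments(source):
    """Replace every comment in `source` with a single space."""
    return _COMMENT.sub(" ", source)
```

    Splitting the result on whitespace then gives a rough view of the
    remaining text; a real tokenizer would also separate punctuation such
    as "," and ";".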
  
    The Backus-Naur Form (BNF) grammar below specifies the syntactically valid  
    sequences for fragment programs.  The set of valid tokens can be inferred  
    from the grammar.  The token "" represents an empty string and is used to  
    indicate optional rules.  A program is invalid if it contains any  
    undefined tokens or characters.  
  
    <program>              ::= <progPrefix> <instructionSequence> "END"  
  
    <progPrefix>           ::= "!!FP1.0"  
  
    <instructionSequence>  ::= <instructionSequence> <instructionStatement>  
                             | <instructionStatement>  
  
    <instructionStatement> ::= <instruction> ";"   
                             | <constantDefinition> ";"  
                             | <localDeclaration> ";"  
  
    <instruction>          ::= <VECTORop-instruction>  
                             | <SCALARop-instruction>  
                             | <BINSCop-instruction>  
                             | <BINop-instruction>  
                             | <TRIop-instruction>  
                             | <KILop-instruction>  
                             | <TEXop-instruction>  
                             | <TXDop-instruction>  
  
    <VECTORop-instruction> ::= <VECTORop> <maskedDstReg> ","   
                               <vectorSrc>  
  
    <VECTORop>             ::= "DDX"   | "DDX_SAT"  
                             | "DDXR"  | "DDXR_SAT"  
                             | "DDXH"  | "DDXH_SAT"  
                             | "DDXC"  | "DDXC_SAT"  
                             | "DDXRC" | "DDXRC_SAT"  
                             | "DDXHC" | "DDXHC_SAT"  
                             | "DDY"   | "DDY_SAT"  
                             | "DDYR"  | "DDYR_SAT"  
                             | "DDYH"  | "DDYH_SAT"  
                             | "DDYC"  | "DDYC_SAT"  
                             | "DDYRC" | "DDYRC_SAT"  
                             | "DDYHC" | "DDYHC_SAT"  
                             | "FLR"   | "FLR_SAT"  
                             | "FLRR"  | "FLRR_SAT"  
                             | "FLRH"  | "FLRH_SAT"  
                             | "FLRX"  | "FLRX_SAT"  
                             | "FLRC"  | "FLRC_SAT"  
                             | "FLRRC" | "FLRRC_SAT"  
                             | "FLRHC" | "FLRHC_SAT"  
                             | "FLRXC" | "FLRXC_SAT"  
                             | "FRC"   | "FRC_SAT"  
                             | "FRCR"  | "FRCR_SAT"  
                             | "FRCH"  | "FRCH_SAT"  
                             | "FRCX"  | "FRCX_SAT"  
                             | "FRCC"  | "FRCC_SAT"  
                             | "FRCRC" | "FRCRC_SAT"  
                             | "FRCHC" | "FRCHC_SAT"  
                             | "FRCXC" | "FRCXC_SAT"  
                             | "LIT"   | "LIT_SAT"  
                             | "LITR"  | "LITR_SAT"  
                             | "LITH"  | "LITH_SAT"  
                             | "LITC"  | "LITC_SAT"  
                             | "LITRC" | "LITRC_SAT"  
                             | "LITHC" | "LITHC_SAT"  
                             | "MOV"   | "MOV_SAT"  
                             | "MOVR"  | "MOVR_SAT"  
                             | "MOVH"  | "MOVH_SAT"  
                             | "MOVX"  | "MOVX_SAT"  
                             | "MOVC"  | "MOVC_SAT"  
                             | "MOVRC" | "MOVRC_SAT"  
                             | "MOVHC" | "MOVHC_SAT"  
                             | "MOVXC" | "MOVXC_SAT"  
                             | "PK2H"  
                             | "PK2US"    
                             | "PK4B"    
                             | "PK4UB"  
  
    <SCALARop-instruction> ::= <SCALARop> <maskedDstReg> ","   
                               <scalarSrc>  
  
    <SCALARop>             ::= "COS"     | "COS_SAT"  
                             | "COSR"    | "COSR_SAT"  
                             | "COSH"    | "COSH_SAT"  
                             | "COSC"    | "COSC_SAT"  
                             | "COSRC"   | "COSRC_SAT"  
                             | "COSHC"   | "COSHC_SAT"  
                             | "EX2"     | "EX2_SAT"  
                             | "EX2R"    | "EX2R_SAT"  
                             | "EX2H"    | "EX2H_SAT"  
                             | "EX2C"    | "EX2C_SAT"  
                             | "EX2RC"   | "EX2RC_SAT"  
                             | "EX2HC"   | "EX2HC_SAT"  
                             | "LG2"     | "LG2_SAT"  
                             | "LG2R"    | "LG2R_SAT"  
                             | "LG2H"    | "LG2H_SAT"  
                             | "LG2C"    | "LG2C_SAT"  
                             | "LG2RC"   | "LG2RC_SAT"  
                             | "LG2HC"   | "LG2HC_SAT"  
                             | "RCP"     | "RCP_SAT"  
                             | "RCPR"    | "RCPR_SAT"  
                             | "RCPH"    | "RCPH_SAT"  
                             | "RCPC"    | "RCPC_SAT"  
                             | "RCPRC"   | "RCPRC_SAT"  
                             | "RCPHC"   | "RCPHC_SAT"  
                             | "RSQ"     | "RSQ_SAT"  
                             | "RSQR"    | "RSQR_SAT"  
                             | "RSQH"    | "RSQH_SAT"  
                             | "RSQC"    | "RSQC_SAT"  
                             | "RSQRC"   | "RSQRC_SAT"  
                             | "RSQHC"   | "RSQHC_SAT"  
                             | "SIN"     | "SIN_SAT"  
                             | "SINR"    | "SINR_SAT"  
                             | "SINH"    | "SINH_SAT"  
                             | "SINC"    | "SINC_SAT"  
                             | "SINRC"   | "SINRC_SAT"  
                             | "SINHC"   | "SINHC_SAT"  
                             | "UP2H"    | "UP2H_SAT"  
                             | "UP2HC"   | "UP2HC_SAT"  
                             | "UP2US"   | "UP2US_SAT"  
                             | "UP2USC"  | "UP2USC_SAT"  
                             | "UP4B"    | "UP4B_SAT"  
                             | "UP4BC"   | "UP4BC_SAT"  
                             | "UP4UB"   | "UP4UB_SAT"  
                             | "UP4UBC"  | "UP4UBC_SAT"  
  
    <BINSCop-instruction> ::=  <BINSCop> <maskedDstReg> ","   
                               <scalarSrc> "," <scalarSrc>  
  
    <BINSCop>              ::= "POW"   | "POW_SAT"  
                             | "POWR"  | "POWR_SAT"  
                             | "POWH"  | "POWH_SAT"  
                             | "POWC"  | "POWC_SAT"  
                             | "POWRC" | "POWRC_SAT"  
                             | "POWHC" | "POWHC_SAT"  
  
    <BINop-instruction>    ::= <BINop> <maskedDstReg> ","  
                               <vectorSrc> "," <vectorSrc>  
  
    <BINop>                ::= "ADD"   | "ADD_SAT"  
                             | "ADDR"  | "ADDR_SAT"  
                             | "ADDH"  | "ADDH_SAT"  
                             | "ADDX"  | "ADDX_SAT"  
                             | "ADDC"  | "ADDC_SAT"  
                             | "ADDRC" | "ADDRC_SAT"  
                             | "ADDHC" | "ADDHC_SAT"  
                             | "ADDXC" | "ADDXC_SAT"  
                             | "DP3"   | "DP3_SAT"  
                             | "DP3R"  | "DP3R_SAT"  
                             | "DP3H"  | "DP3H_SAT"  
                             | "DP3X"  | "DP3X_SAT"  
                             | "DP3C"  | "DP3C_SAT"  
                             | "DP3RC" | "DP3RC_SAT"  
                             | "DP3HC" | "DP3HC_SAT"  
                             | "DP3XC" | "DP3XC_SAT"  
                             | "DP4"   | "DP4_SAT"  
                             | "DP4R"  | "DP4R_SAT"  
                             | "DP4H"  | "DP4H_SAT"  
                             | "DP4X"  | "DP4X_SAT"  
                             | "DP4C"  | "DP4C_SAT"  
                             | "DP4RC" | "DP4RC_SAT"  
                             | "DP4HC" | "DP4HC_SAT"  
                             | "DP4XC" | "DP4XC_SAT"  
                             | "DST"   | "DST_SAT"  
                             | "DSTR"  | "DSTR_SAT"  
                             | "DSTH"  | "DSTH_SAT"  
                             | "DSTC"  | "DSTC_SAT"  
                             | "DSTRC" | "DSTRC_SAT"  
                             | "DSTHC" | "DSTHC_SAT"  
                             | "MAX"   | "MAX_SAT"  
                             | "MAXR"  | "MAXR_SAT"  
                             | "MAXH"  | "MAXH_SAT"  
                             | "MAXX"  | "MAXX_SAT"  
                             | "MAXC"  | "MAXC_SAT"  
                             | "MAXRC" | "MAXRC_SAT"  
                             | "MAXHC" | "MAXHC_SAT"  
                             | "MAXXC" | "MAXXC_SAT"  
                             | "MIN"   | "MIN_SAT"  
                             | "MINR"  | "MINR_SAT"  
                             | "MINH"  | "MINH_SAT"  
                             | "MINX"  | "MINX_SAT"  
                             | "MINC"  | "MINC_SAT"  
                             | "MINRC" | "MINRC_SAT"  
                             | "MINHC" | "MINHC_SAT"  
                             | "MINXC" | "MINXC_SAT"  
                             | "MUL"   | "MUL_SAT"  
                             | "MULR"  | "MULR_SAT"  
                             | "MULH"  | "MULH_SAT"  
                             | "MULX"  | "MULX_SAT"  
                             | "MULC"  | "MULC_SAT"  
                             | "MULRC" | "MULRC_SAT"  
                             | "MULHC" | "MULHC_SAT"  
                             | "MULXC" | "MULXC_SAT"  
                             | "RFL"   | "RFL_SAT"  
                             | "RFLR"  | "RFLR_SAT"  
                             | "RFLH"  | "RFLH_SAT"  
                             | "RFLC"  | "RFLC_SAT"  
                             | "RFLRC" | "RFLRC_SAT"  
                             | "RFLHC" | "RFLHC_SAT"  
                             | "SEQ"   | "SEQ_SAT"  
                             | "SEQR"  | "SEQR_SAT"  
                             | "SEQH"  | "SEQH_SAT"  
                             | "SEQX"  | "SEQX_SAT"  
                             | "SEQC"  | "SEQC_SAT"  
                             | "SEQRC" | "SEQRC_SAT"  
                             | "SEQHC" | "SEQHC_SAT"  
                             | "SEQXC" | "SEQXC_SAT"  
                             | "SFL"   | "SFL_SAT"  
                             | "SFLR"  | "SFLR_SAT"  
                             | "SFLH"  | "SFLH_SAT"  
                             | "SFLX"  | "SFLX_SAT"  
                             | "SFLC"  | "SFLC_SAT"  
                             | "SFLRC" | "SFLRC_SAT"  
                             | "SFLHC" | "SFLHC_SAT"  
                             | "SFLXC" | "SFLXC_SAT"  
                             | "SGE"   | "SGE_SAT"  
                             | "SGER"  | "SGER_SAT"  
                             | "SGEH"  | "SGEH_SAT"  
                             | "SGEX"  | "SGEX_SAT"  
                             | "SGEC"  | "SGEC_SAT"  
                             | "SGERC" | "SGERC_SAT"  
                             | "SGEHC" | "SGEHC_SAT"  
                             | "SGEXC" | "SGEXC_SAT"  
                             | "SGT"   | "SGT_SAT"  
                             | "SGTR"  | "SGTR_SAT"  
                             | "SGTH"  | "SGTH_SAT"  
                             | "SGTX"  | "SGTX_SAT"  
                             | "SGTC"  | "SGTC_SAT"  
                             | "SGTRC" | "SGTRC_SAT"  
                             | "SGTHC" | "SGTHC_SAT"  
                             | "SGTXC" | "SGTXC_SAT"  
                             | "SLE"   | "SLE_SAT"  
                             | "SLER"  | "SLER_SAT"  
                             | "SLEH"  | "SLEH_SAT"  
                             | "SLEX"  | "SLEX_SAT"  
                             | "SLEC"  | "SLEC_SAT"  
                             | "SLERC" | "SLERC_SAT"  
                             | "SLEHC" | "SLEHC_SAT"  
                             | "SLEXC" | "SLEXC_SAT"  
                             | "SLT"   | "SLT_SAT"  
                             | "SLTR"  | "SLTR_SAT"  
                             | "SLTH"  | "SLTH_SAT"  
                             | "SLTX"  | "SLTX_SAT"  
                             | "SLTC"  | "SLTC_SAT"  
                             | "SLTRC" | "SLTRC_SAT"  
                             | "SLTHC" | "SLTHC_SAT"  
                             | "SLTXC" | "SLTXC_SAT"  
                             | "SNE"   | "SNE_SAT"  
                             | "SNER"  | "SNER_SAT"  
                             | "SNEH"  | "SNEH_SAT"  
                             | "SNEX"  | "SNEX_SAT"  
                             | "SNEC"  | "SNEC_SAT"  
                             | "SNERC" | "SNERC_SAT"  
                             | "SNEHC" | "SNEHC_SAT"  
                             | "SNEXC" | "SNEXC_SAT"  
                             | "STR"   | "STR_SAT"  
                             | "STRR"  | "STRR_SAT"  
                             | "STRH"  | "STRH_SAT"  
                             | "STRX"  | "STRX_SAT"  
                             | "STRC"  | "STRC_SAT"  
                             | "STRRC" | "STRRC_SAT"  
                             | "STRHC" | "STRHC_SAT"  
                             | "STRXC" | "STRXC_SAT"  
                             | "SUB"   | "SUB_SAT"  
                             | "SUBR"  | "SUBR_SAT"  
                             | "SUBH"  | "SUBH_SAT"  
                             | "SUBX"  | "SUBX_SAT"  
                             | "SUBC"  | "SUBC_SAT"  
                             | "SUBRC" | "SUBRC_SAT"  
                             | "SUBHC" | "SUBHC_SAT"  
                             | "SUBXC" | "SUBXC_SAT"  
  
    <TRIop-instruction>    ::= <TRIop> <maskedDstReg> ","  
                               <vectorSrc> "," <vectorSrc> ","  
                               <vectorSrc>  
  
    <TRIop>                ::= "MAD"   | "MAD_SAT"  
                             | "MADR"  | "MADR_SAT"  
                             | "MADH"  | "MADH_SAT"  
                             | "MADX"  | "MADX_SAT"  
                             | "MADC"  | "MADC_SAT"  
                             | "MADRC" | "MADRC_SAT"  
                             | "MADHC" | "MADHC_SAT"  
                             | "MADXC" | "MADXC_SAT"  
                             | "LRP"   | "LRP_SAT"  
                             | "LRPR"  | "LRPR_SAT"  
                             | "LRPH"  | "LRPH_SAT"  
                             | "LRPX"  | "LRPX_SAT"  
                             | "LRPC"  | "LRPC_SAT"  
                             | "LRPRC" | "LRPRC_SAT"  
                             | "LRPHC" | "LRPHC_SAT"  
                             | "LRPXC" | "LRPXC_SAT"  
                             | "X2D"   | "X2D_SAT"  
                             | "X2DR"  | "X2DR_SAT"  
                             | "X2DH"  | "X2DH_SAT"  
                             | "X2DC"  | "X2DC_SAT"  
                             | "X2DRC" | "X2DRC_SAT"  
                             | "X2DHC" | "X2DHC_SAT"  
  
    <KILop-instruction>    ::= <KILop> <ccMask>  
  
    <KILop>                ::= "KIL"  
  
    <TEXop-instruction>    ::= <TEXop> <maskedDstReg> ","  
                               <vectorSrc> "," <texImageId>  
  
    <TEXop>                ::= "TEX"  | "TEX_SAT"  
                             | "TEXC" | "TEXC_SAT"  
                             | "TXP"  | "TXP_SAT"  
                             | "TXPC" | "TXPC_SAT"  
  
    <TXDop-instruction>    ::= <TXDop> <maskedDstReg> ","  
                               <vectorSrc> "," <vectorSrc> ","  
                               <vectorSrc> "," <texImageId>  
  
    <TXDop>                ::= "TXD"  | "TXD_SAT"  
                             | "TXDC" | "TXDC_SAT"  
  
    <scalarSrc>            ::= <absScalarSrc>  
                             | <baseScalarSrc>  
  
    <absScalarSrc>         ::= <negate> "|" <baseScalarSrc> "|"  
  
    <baseScalarSrc>        ::= <signedScalarConstant>  
                             | <negate> <namedScalarConstant>  
                             | <negate> <vectorConstant> <scalarSuffix>  
                             | <negate> <namedLocalParameter> <scalarSuffix>  
                             | <negate> <numberedLocal> <scalarSuffix>  
                             | <negate> <srcRegister> <scalarSuffix>  
  
    <vectorSrc>            ::= <absVectorSrc>  
                             | <baseVectorSrc>  
  
    <absVectorSrc>         ::= <negate> "|" <baseVectorSrc> "|"  
  
    <baseVectorSrc>        ::= <signedScalarConstant>  
                             | <negate> <namedScalarConstant>  
                             | <negate> <vectorConstant> <scalarSuffix>  
                             | <negate> <vectorConstant> <swizzleSuffix>  
                             | <negate> <namedLocalParameter> <scalarSuffix>  
                             | <negate> <namedLocalParameter> <swizzleSuffix>  
                             | <negate> <numberedLocal> <scalarSuffix>  
                             | <negate> <numberedLocal> <swizzleSuffix>  
                             | <negate> <srcRegister> <scalarSuffix>  
                             | <negate> <srcRegister> <swizzleSuffix>  
  
    <maskedDstReg>         ::= <dstRegister> <optionalWriteMask>   
                               <optionalCCMask>  
  
    <dstRegister>          ::= <fragTempReg>  
                             | <fragOutputReg>  
                             | "RC"  
                             | "HC"  
  
    <optionalCCMask>       ::= "(" <ccMask> ")"  
                             | ""  
  
    <ccMask>               ::= <ccMaskRule> <swizzleSuffix>  
                             | <ccMaskRule> <scalarSuffix>  
  
    <ccMaskRule>           ::= "EQ" | "GE" | "GT" | "LE" | "LT" | "NE" |  
                               "TR" | "FL"  
                             
    <optionalWriteMask>    ::= ""  
                             | "." "x"  
                             | "."     "y"  
                             | "." "x" "y"  
                             | "."         "z"  
                             | "." "x"     "z"  
                             | "."     "y" "z"  
                             | "." "x" "y" "z"  
                             | "."             "w"  
                             | "." "x"         "w"  
                             | "."     "y"     "w"  
                             | "." "x" "y"     "w"  
                             | "."         "z" "w"  
                             | "." "x"     "z" "w"  
                             | "."     "y" "z" "w"  
                             | "." "x" "y" "z" "w"  
  
    <srcRegister>          ::= <fragAttribReg>  
                             | <fragTempReg>  
  
    <fragAttribReg>        ::= "f" "[" <fragAttribRegId> "]"  
  
    <fragAttribRegId>      ::= "WPOS" | "COL0" | "COL1" | "FOGC" | "TEX0"  
                             | "TEX1" | "TEX2" | "TEX3" | "TEX4" | "TEX5"  
                             | "TEX6" | "TEX7"  
  
    <fragTempReg>          ::= <fragF32Reg>  
                             | <fragF16Reg>  
  
    <fragF32Reg>           ::= "R0"  | "R1"  | "R2"  | "R3"  
                             | "R4"  | "R5"  | "R6"  | "R7"  
                             | "R8"  | "R9"  | "R10" | "R11"  
                             | "R12" | "R13" | "R14" | "R15"  
                             | "R16" | "R17" | "R18" | "R19"  
                             | "R20" | "R21" | "R22" | "R23"  
                             | "R24" | "R25" | "R26" | "R27"  
                             | "R28" | "R29" | "R30" | "R31"  
  
    <fragF16Reg>           ::= "H0"  | "H1"  | "H2"  | "H3"  
                             | "H4"  | "H5"  | "H6"  | "H7"  
                             | "H8"  | "H9"  | "H10" | "H11"  
                             | "H12" | "H13" | "H14" | "H15"  
                             | "H16" | "H17" | "H18" | "H19"  
                             | "H20" | "H21" | "H22" | "H23"  
                             | "H24" | "H25" | "H26" | "H27"  
                             | "H28" | "H29" | "H30" | "H31"  
                             | "H32" | "H33" | "H34" | "H35"  
                             | "H36" | "H37" | "H38" | "H39"  
                             | "H40" | "H41" | "H42" | "H43"  
                             | "H44" | "H45" | "H46" | "H47"  
                             | "H48" | "H49" | "H50" | "H51"  
                             | "H52" | "H53" | "H54" | "H55"  
                             | "H56" | "H57" | "H58" | "H59"  
                             | "H60" | "H61" | "H62" | "H63"  
  
    <fragOutputReg>        ::= "o" "[" <fragOutputRegName> "]"  
  
    <fragOutputRegName>    ::= "COLR" | "COLH" | "DEPR"  
  
    <numberedLocal>        ::= "p" "[" <localNumber> "]"  
  
    <localNumber>          ::= <integer> from 0 to  
                               MAX_FRAGMENT_PROGRAM_LOCAL_PARAMETERS_NV - 1  
  
    <scalarSuffix>         ::= "." <component>  
  
    <swizzleSuffix>        ::= ""  
                             | "." <component> <component>  
                                   <component> <component>  
  
    <component>            ::= "x" | "y" | "z" | "w"  
  
    <texImageId>           ::= <texImageUnit> "," <texImageTarget>  
  
    <texImageUnit>         ::= "TEX0"  | "TEX1"  | "TEX2"  | "TEX3"  
                             | "TEX4"  | "TEX5"  | "TEX6"  | "TEX7"  
                             | "TEX8"  | "TEX9"  | "TEX10" | "TEX11"  
                             | "TEX12" | "TEX13" | "TEX14" | "TEX15"  
  
    <texImageTarget>       ::= "1D" | "2D" | "3D" | "CUBE" | "RECT"  
  
    <constantDefinition>   ::= "DEFINE" <namedVectorConstant> "="   
                               <vectorConstant>  
                             | "DEFINE" <namedScalarConstant> "="   
                               <scalarConstant>  
  
    <localDeclaration>     ::= "DECLARE" <namedLocalParameter>   
                               <optionalLocalValue>  
  
    <optionalLocalValue>   ::= ""  
                             | "=" <vectorConstant>  
                             | "=" <scalarConstant>  
  
    <vectorConstant>       ::= "{" <vectorConstantList> "}"
                             | <namedVectorConstant>  
  
    <vectorConstantList>   ::= <scalarConstant>  
                             | <scalarConstant> "," <scalarConstant>  
                             | <scalarConstant> "," <scalarConstant> ","  
                               <scalarConstant>  
                             | <scalarConstant> "," <scalarConstant> ","  
                               <scalarConstant> "," <scalarConstant>  
  
    <scalarConstant>       ::= <signedScalarConstant>  
                             | <namedScalarConstant>  
  
    <signedScalarConstant> ::= <optionalSign> <floatConstant>  
  
    <namedScalarConstant>  ::= <identifier>    ((name of a scalar constant  
                                                 in a DEFINE instruction))  
  
    <namedVectorConstant>  ::= <identifier>    ((name of a vector constant  
                                                 in a DEFINE instruction))  
  
    <namedLocalParameter>  ::= <identifier>    ((name of a local parameter  
                                                 in a DECLARE instruction))  
  
    <negate>               ::= "-" | "+" | ""  
  
    <optionalSign>         ::= "-" | "+" | ""  
  
    <identifier>           ::= see text below  
  
    <floatConstant>        ::= see text below  
  
  
    The <identifier> rule matches a sequence of one or more letters ("A"
    through "Z", "a" through "z", "_", and "$") and digits ("0" through "9");
    the first character must be a letter.  The underscore ("_") and dollar
    sign ("$") count as letters.  Upper and lower case letters are different
    (names are case-sensitive).
  
    The <floatConstant> rule matches a floating-point constant consisting  
    of an integer part, a decimal point, a fraction part, an "e" or  
    "E", and an optionally signed integer exponent.  The integer and  
    fraction parts both consist of a sequence of one or more digits ("0"
    through "9").  Either the integer part or the fraction part (not
    both) may be missing; either the decimal point or the "e" (or "E")  
    and the exponent (not both) may be missing.  
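
    The two lexical rules above can be sketched as regular expressions.  This
    is a minimal illustration, not part of the extension: the names
    IDENTIFIER_RE, FLOAT_CONSTANT_RE, is_identifier, and is_float_constant are
    hypothetical, and the float pattern assumes a literal reading of the
    paragraph above (a constant with neither a decimal point nor an exponent,
    such as "1", does not match).

```python
import re

# Lexical form only; whether a name has actually been defined is a
# separate, semantic check.
IDENTIFIER_RE = re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*")

# Either the integer or the fraction part may be missing (not both), and
# either the decimal point or the exponent may be missing (not both).
FLOAT_CONSTANT_RE = re.compile(
    r"(?:[0-9]+\.[0-9]*|\.[0-9]+)(?:[eE][+-]?[0-9]+)?"  # has a decimal point
    r"|[0-9]+[eE][+-]?[0-9]+"                            # exponent, no point
)

def is_identifier(text):
    """Return True if text has the form described for <identifier>."""
    return IDENTIFIER_RE.fullmatch(text) is not None

def is_float_constant(text):
    """Return True if text has the form described for <floatConstant>."""
    return FLOAT_CONSTANT_RE.fullmatch(text) is not None
```

    The sign is deliberately not part of the float pattern, since the grammar
    supplies it separately through <optionalSign>.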
  
    A fragment program fails to load if it contains more than the maximum  
    number of executable instructions.  If ARB_fragment_program is supported,  
    this limit is the value of MAX_PROGRAM_INSTRUCTIONS_ARB for the  
    FRAGMENT_PROGRAM_ARB target.  Otherwise, the limit is 1024.  Executable  
    instructions are those matching the <instruction> rule in the grammar, and  
    do not include DEFINE or DECLARE instructions.  
  
    A fragment program fails to load if its total temporary and output  
    register count exceeds 64.  Each fp32 temporary or output register used by  
    the program (R0-R31, o[COLR], and o[DEPR]) counts as two registers; each  
    fp16 temporary or output register used by the program (H0-H63 and o[COLH])  
    counts as a single register.
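
    The register budget described above can be tallied as in the following
    sketch.  The helper names (register_cost, total_register_count,
    fits_register_budget) are hypothetical, and the sketch assumes the caller
    has already collected the set of temporary and output register names used
    by the program.

```python
import re

REGISTER_LIMIT = 64  # total temporary and output register count allowed

def register_cost(name):
    """Return 2 for an fp32 register and 1 for an fp16 register."""
    if name in ("o[COLR]", "o[DEPR]"):
        return 2
    if name == "o[COLH]":
        return 1
    match = re.fullmatch(r"([RH])([0-9]+)", name)
    if match is not None:
        kind, index = match.group(1), int(match.group(2))
        if kind == "R" and index <= 31:
            return 2
        if kind == "H" and index <= 63:
            return 1
    raise ValueError("not a temporary or output register: " + name)

def total_register_count(used):
    """Sum the cost of each distinct register name in 'used'."""
    return sum(register_cost(name) for name in set(used))

def fits_register_budget(used):
    """Return True if the program stays within the 64-register limit."""
    return total_register_count(used) <= REGISTER_LIMIT
```

    For example, a program using R0 through R30 and o[COLR] sits exactly at
    the limit (31 * 2 + 2 = 64); adding a single fp16 temporary pushes it
    over.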
        
    A fragment program fails to load if any instruction sources more than one  
    unique fragment attribute register.  Instructions sourcing the same  
    attribute register multiple times are acceptable.  
  
    A fragment program fails to load if any instruction sources more than one  
    unique program parameter register.  Instructions sourcing the same program  
    parameter multiple times are acceptable.  
  
    A fragment program fails to load if multiple texture lookup instructions  
    reference different targets for the same texture image unit.  
  
    A fragment program fails to load if it writes to both the o[COLR] and  
    o[COLH] output registers.  
  
    The error INVALID_OPERATION is generated by LoadProgramNV if a fragment  
    program fails to load because it is not syntactically correct or for one  
    of the semantic restrictions listed above.  
  
    The error INVALID_OPERATION is generated by LoadProgramNV if a program is  
    loaded for id when id is currently loaded with a program of a different  
    target.  
  
    A successfully loaded fragment program is parsed into a sequence of  
    instructions.  Each instruction is identified by its tokenized name.  The  
    operation of these instructions when executed is defined in Sections  
    3.11.4 and 3.11.5.  
  
  
    Section 3.11.4, Fragment Program Operation  
  
    There are forty-five fragment program instructions.  Fragment program
    instructions may have up to sixteen variants, formed by combining an
    optional suffix of "R", "H", or "X" to specify arithmetic precision
    (section 3.11.4.2), an optional suffix of "C" to allow an update of the
    condition code register (section 3.11.4.4), and an optional suffix of
    "_SAT" to clamp the result vector components to the range [0,1] (section
    3.11.4.4).  For example, the sixteen forms of the "ADD" instruction are
    "ADD", "ADDR", "ADDH", "ADDX", "ADDC", "ADDRC", "ADDHC", "ADDXC",
    "ADD_SAT", "ADDR_SAT", "ADDH_SAT", "ADDX_SAT", "ADDC_SAT", "ADDRC_SAT",
    "ADDHC_SAT", and "ADDXC_SAT".
  
    Some mathematical instructions that support precision suffixes, typically  
    those that involve complicated floating-point computations, do not support  
    the "X" precision suffix.  
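
    The suffix combinations can be enumerated mechanically.  The following
    sketch uses a hypothetical helper, opcode_variants, whose "precisions"
    argument is an assumption of this example: pass a tuple without "X" for
    instructions that do not support the fixed-point suffix.

```python
from itertools import product

def opcode_variants(base, precisions=("", "R", "H", "X")):
    """List every opcode spelling built from the optional suffixes.

    The precision suffix comes first, then "C", then "_SAT".
    """
    return [
        base + precision + cc + sat
        for sat, cc, precision in product(("", "_SAT"), ("", "C"), precisions)
    ]
```

    With the default arguments, opcode_variants("ADD") yields the sixteen
    forms of "ADD"; with precisions=("", "R", "H") it yields twelve forms, as
    would apply to an instruction such as "COS".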
  
    The fragment program instructions and their respective input and output  
    parameters are summarized in Table X.4.  
  
      Instruction          Inputs  Output   Description  
      -----------------    ------  ------   --------------------------------  
      ADD[RHX][C][_SAT]    v,v     v        add  
      COS[RH ][C][_SAT]    s       ssss     cosine  
      DDX[RH ][C][_SAT]    v       v        derivative relative to x  
      DDY[RH ][C][_SAT]    v       v        derivative relative to y  
      DP3[RHX][C][_SAT]    v,v     ssss     3-component dot product  
      DP4[RHX][C][_SAT]    v,v     ssss     4-component dot product  
      DST[RH ][C][_SAT]    v,v     v        distance vector  
      EX2[RH ][C][_SAT]    s       ssss     exponential base 2  
      FLR[RHX][C][_SAT]    v       v        floor  
      FRC[RHX][C][_SAT]    v       v        fraction  
      KIL                  none    none     conditionally discard fragment  
      LG2[RH ][C][_SAT]    s       ssss     logarithm base 2  
      LIT[RH ][C][_SAT]    v       v        compute light coefficients  
      LRP[RHX][C][_SAT]    v,v,v   v        linear interpolation  
      MAD[RHX][C][_SAT]    v,v,v   v        multiply and add  
      MAX[RHX][C][_SAT]    v,v     v        maximum  
      MIN[RHX][C][_SAT]    v,v     v        minimum  
      MOV[RHX][C][_SAT]    v       v        move  
      MUL[RHX][C][_SAT]    v,v     v        multiply  
      PK2H                 v       ssss     pack two 16-bit floats  
      PK2US                v       ssss     pack two unsigned 16-bit scalars  
      PK4B                 v       ssss     pack four signed 8-bit scalars  
      PK4UB                v       ssss     pack four unsigned 8-bit scalars  
      POW[RH ][C][_SAT]    s,s     ssss     exponentiation (x^y)  
      RCP[RH ][C][_SAT]    s       ssss     reciprocal  
      RFL[RH ][C][_SAT]    v,v     v        reflection vector  
      RSQ[RH ][C][_SAT]    s       ssss     reciprocal square root  
      SEQ[RHX][C][_SAT]    v,v     v        set on equal  
      SFL[RHX][C][_SAT]    v,v     v        set on false  
      SGE[RHX][C][_SAT]    v,v     v        set on greater than or equal  
      SGT[RHX][C][_SAT]    v,v     v        set on greater than  
      SIN[RH ][C][_SAT]    s       ssss     sine  
      SLE[RHX][C][_SAT]    v,v     v        set on less than or equal  
      SLT[RHX][C][_SAT]    v,v     v        set on less than  
      SNE[RHX][C][_SAT]    v,v     v        set on not equal  
      STR[RHX][C][_SAT]    v,v     v        set on true  
      SUB[RHX][C][_SAT]    v,v     v        subtract  
      TEX[C][_SAT]         v       v        texture lookup  
      TXD[C][_SAT]         v,v,v   v        texture lookup w/partials  
      TXP[C][_SAT]         v       v        projective texture lookup  
      UP2H[C][_SAT]        s       v        unpack two 16-bit floats  
      UP2US[C][_SAT]       s       v        unpack two unsigned 16-bit scalars  
      UP4B[C][_SAT]        s       v        unpack four signed 8-bit scalars  
      UP4UB[C][_SAT]       s       v        unpack four unsigned 8-bit scalars  
      X2D[RH ][C][_SAT]    v,v,v   v        2D coordinate transformation  
       
    Table X.4:  Summary of fragment program instructions.  "[RHX]" indicates  
    an optional arithmetic precision suffix.  "[C]" indicates an optional  
    condition code update suffix.  "[_SAT]" indicates an optional clamp of  
    result vector components to [0,1].  "v" indicates a 4-component vector  
    input or output, "s" indicates a scalar input, and "ssss" indicates a  
    scalar output replicated across a 4-component vector.  
  
  
    Section 3.11.4.1:  Fragment Program Storage Precision  
  
    Registers in fragment programs are stored in two different representations:
    16-bit floating-point (fp16) and 32-bit floating-point (fp32).  There is  
    an additional 12-bit fixed-point representation (fx12) used only as an  
    internal representation for instructions with the "X" precision qualifier.  
  
    In the 32-bit float (fp32) representation, each component is represented  
    in floating-point with eight exponent and twenty-three mantissa bits, as  
    in the standard IEEE single-precision format.  If S represents the sign (0  
    or 1), E represents the exponent in the range [0,255], and M represents  
    the mantissa in the range [0,2^23-1], then an fp32 float is decoded as:  
  
       (-1)^S * 0.0,                           if E == 0,  
       (-1)^S * 2^(E-127) * (1 + M/2^23),      if 0 < E < 255,  
       (-1)^S * INF,                           if E == 255 and M == 0,  
       NaN,                                    if E == 255 and M != 0.  
  
    INF (Infinity) is a special representation indicating numerical overflow.  
    NaN (Not a Number) is a special representation indicating the result of  
    illegal arithmetic operations, such as computing the square root or  
    logarithm of a negative number.  Note that all normal fp32 values, zero,  
    and INF have an associated sign.  -0.0 and +0.0 are considered equivalent  
    for the purposes of comparisons.  
  
    This representation is identical to the IEEE single-precision  
    floating-point standard, except that no special representation is provided  
    for denorms -- numbers in the range (-2^-126, +2^-126).  All such numbers  
    are flushed to zero.  
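
    The fp32 decoding rules above, including the flush of denorms to zero, can
    be written out as follows.  The function name decode_fp32 is hypothetical;
    the input is assumed to be the raw 32-bit pattern as a non-negative
    integer.

```python
import math

def decode_fp32(bits):
    """Decode a 32-bit pattern using the fp32 rules given in the text."""
    sign = -1.0 if (bits >> 31) & 1 else 1.0
    exponent = (bits >> 23) & 0xFF
    mantissa = bits & 0x7FFFFF
    if exponent == 0:
        # Zero and denorms alike decode to a signed zero.
        return sign * 0.0
    if exponent == 255:
        return sign * math.inf if mantissa == 0 else math.nan
    return sign * 2.0 ** (exponent - 127) * (1 + mantissa / 2 ** 23)
```

    Under these rules the pattern 0x00000001, which IEEE single precision
    would treat as the smallest positive denorm, decodes to +0.0.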
  
    In a 16-bit float (fp16) register, each component is represented  
    similarly, except with only five exponent and ten mantissa bits.  If S  
    represents the sign (0 or 1), E represents the exponent in the range  
    [0,31], and M represents the mantissa in the range [0,2^10-1], then an
    fp16 float is decoded as:

       (-1)^S * 0.0,                           if E == 0 and M == 0,
       (-1)^S * 2^-14 * M/2^10,                if E == 0 and M != 0,
       (-1)^S * 2^(E-15) * (1 + M/2^10),       if 0 < E < 31,
       (-1)^S * INF,                           if E == 31 and M == 0,
       NaN,                                    if E == 31 and M != 0.
  
    One important difference is that the fp16 representation, unlike fp32,  
    supports denorms to maximize the limited precision of the 16-bit floating  
    point encodings.  
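
    For comparison with fp32, the fp16 decoding rules can be sketched in the
    same way.  The function name decode_fp16 is hypothetical; the input is
    assumed to be the raw 16-bit pattern as a non-negative integer.

```python
import math

def decode_fp16(bits):
    """Decode a 16-bit pattern using the fp16 rules given in the text."""
    sign = -1.0 if (bits >> 15) & 1 else 1.0
    exponent = (bits >> 10) & 0x1F
    mantissa = bits & 0x3FF
    if exponent == 0:
        # Covers both zero (mantissa == 0) and denorms (mantissa != 0).
        return sign * 2.0 ** -14 * (mantissa / 2 ** 10)
    if exponent == 31:
        return sign * math.inf if mantissa == 0 else math.nan
    return sign * 2.0 ** (exponent - 15) * (1 + mantissa / 2 ** 10)
```

    The smallest positive denorm, 0x0001, decodes to 2^-24, and the largest
    finite value, 0x7BFF, decodes to 65504.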
  
    In the 12-bit fixed-point (fx12) format, numbers are represented as signed  
    12-bit two's complement integers with 10 fraction bits.  The range of  
    representable values is [-2048/1024, +2047/1024].  
  
    Section 3.11.4.2:  Fragment Program Operation Precision  
  
    Fragment program instructions frequently perform mathematical operations.  
    Such operations may be performed at one of three different precisions.  
    Fragment programs can specify the precision of each instruction by using  
    the precision suffix.  If an instruction has a suffix of "R", calculations  
    are carried out with 32-bit floating point operands and results.  If an  
    instruction has a suffix of "H", calculations are carried out using 16-bit  
    floating point operands and results.  If an instruction has a suffix of  
    "X", calculations are carried out using 12-bit fixed point operands and  
    results.  For example, the instruction "MULR" performs a 32-bit  
    floating-point multiply, "MULH" performs a 16-bit floating-point multiply,  
    and "MULX" performs a 12-bit fixed-point multiply.  If no precision suffix  
    is specified, calculations are carried out using the precision of the  
    temporary register receiving the result.  
  
    Fragment program instructions may source registers or constants whose  
    precisions differ from the precision specified with the instruction.  
    Instructions may also generate intermediate results with a different  
    precision than that of the destination register.  In these cases, the  
    values sourced are converted to the precision specified by the  
    instruction.  
  
    When converting to fx12 format, -INF and any values less than -2048/1024
    become -2048/1024.  +INF and any values greater than +2047/1024 become
    +2047/1024.  NaN becomes 0.
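
    The fx12 conversion can be sketched as a clamp followed by quantization to
    steps of 1/1024.  The function name to_fx12 is hypothetical.  The text
    above does not state how in-range values that fall between two fx12 steps
    are rounded; rounding to the nearest step is an assumption of this sketch.

```python
import math

FX12_MIN = -2048 / 1024   # most negative representable value
FX12_MAX = 2047 / 1024    # most positive representable value

def to_fx12(value):
    """Convert a float to the nearest value representable in fx12."""
    if math.isnan(value):
        return 0.0
    if value <= FX12_MIN:          # includes -INF
        return FX12_MIN
    if value >= FX12_MAX:          # includes +INF
        return FX12_MAX
    # Assumption: round to the nearest multiple of 1/1024.
    return round(value * 1024) / 1024
```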
  
    When converting to fp16 format, any values less than or equal to -2^16 are  
    converted to -INF.  Any values greater than or equal to +2^16 are  
    converted to +INF.  -INF, +INF, NaN, -0.0, and +0.0 are unchanged.  Any  
    other values that are not exactly representable in fp16 format are  
    converted to one of the two nearest representable values.  
  
    When converting to fp32 format, any values less than or equal to -2^128  
    are converted to -INF.  Any values greater than or equal to +2^128 are  
    converted to +INF.  -INF, +INF, NaN, -0.0, and +0.0 are unchanged.  Any  
    other values that are not exactly representable in fp32 format are  
    converted to one of the two nearest representable values.  
  
    Fragment program instructions using the fragment attribute registers  
    f[FOGC] or f[TEX0] through f[TEX7] will be carried out at full fp32  
    precision, regardless of the precision specified by the instruction.  
  
    Section 3.11.4.3:  Fragment Program Operands  
  
    Except for KIL, fragment program instructions operate on either vector or  
    scalar operands, indicated in the grammar (see section 3.11.3) by the  
    rules <vectorSrc> and <scalarSrc> respectively.  
  
    The basic set of scalar operands is defined by the grammar rule  
    <baseScalarSrc>.  Scalar operands can be scalar constants (embedded or  
    named), or single components of vector constants, local parameters, or  
    registers allowed by the <srcRegister> rule.  A vector component is  
    selected by the <scalarSuffix> rule, where the characters "x", "y", "z",  
    and "w" select the x, y, z, and w components, respectively, of the vector.  
  
    The basic set of vector operands is defined by the grammar rule  
    <baseVectorSrc>.  Vector operands can include vector constants, local  
    parameters, or registers allowed by the <srcRegister> rule.  
  
    Basic vector operands can be swizzled according to the <swizzleSuffix>  
    rule.  In its most general form, the <swizzleSuffix> rule matches the  
    pattern ".????" where each question mark is one of "x", "y", "z", or "w".  
    For such patterns, the x, y, z, and w components of the operand are taken  
    from the vector components named by the first, second, third, and fourth  
    character of the pattern, respectively.  For example, if the swizzle  
    suffix is ".yzzx" and the specified source contains {2,8,9,0}, the  
    swizzled operand used by the instruction is {8,9,9,2}.  If the  
    <swizzleSuffix> rule matches "", it is treated as though it were ".xyzw".  
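
    The four-character swizzle described above amounts to an index lookup per
    component.  The following sketch uses a hypothetical helper named swizzle
    and handles only the empty and four-character forms discussed in this
    paragraph.

```python
COMPONENT_INDEX = {"x": 0, "y": 1, "z": 2, "w": 3}

def swizzle(vector, suffix=""):
    """Apply a swizzle suffix such as ".yzzx" to a 4-component vector."""
    pattern = suffix.lstrip(".") or "xyzw"   # "" behaves like ".xyzw"
    if len(pattern) != 4:
        raise ValueError("expected four component letters: " + suffix)
    return tuple(vector[COMPONENT_INDEX[letter]] for letter in pattern)
```

    Applied to the example in the text, swizzle((2, 8, 9, 0), ".yzzx") gives
    (8, 9, 9, 2).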
  
    Operands can optionally be negated according to the <negate> rule in  
    <baseScalarSrc> or <baseVectorSrc>.  If the <negate> matches "-", each  
    value is negated.  
  
    The absolute value of operands can be taken if the <vectorSrc> or  
    <scalarSrc> rules match <absScalarSrc> or <absVectorSrc>.  In this case,  
    the absolute value of each component is taken.  In addition, if the  
    <negate> rule in <absScalarSrc> or <absVectorSrc> matches "-", the result  
    is then negated.  
  
    Instructions requiring vector operands can also use scalar operands in the  
    case where the <vectorSrc> rule matches <scalarSrc>.  In such cases, a  
    4-component vector is produced by replicating the scalar.  
  
    After operands are loaded, they are converted to a data type corresponding  
    to the operation precision specified in the fragment program instruction.  
   
    The following pseudo-code spells out the operand generation process.  
    "SrcT" and "InstT" refer to the data types of the specified register or  
    constant and the instruction, respectively.  "VecSrcT" and "VecInstT"  
    refer to 4-component vectors of the corresponding type.  "absolute" is  
    TRUE if the operand matches the <absScalarSrc> or <absVectorSrc> rules,  
    and FALSE otherwise.  "negateBase" is TRUE if the <negate> rule in  
    <baseScalarSrc> or <baseVectorSrc> matches "-" and FALSE otherwise.  
    "negateAbs" is TRUE if the <negate> rule in <absScalarSrc> or  
    <absVectorSrc> matches "-" and FALSE otherwise.  The ".c***", ".*c**",  
    ".**c*", ".***c" modifiers refer to the x, y, z, and w components obtained  
    by the swizzle operation.  TypeConvert() is assumed to convert a scalar of  
    type SrcT to a scalar of type InstT using the type conversion process  
    specified above.  
  
      VecInstT VectorLoad(VecSrcT source)  
      {  
          VecSrcT srcVal;  
          VecInstT convertedVal;  
  
          srcVal.x = source.c***;  
          srcVal.y = source.*c**;  
          srcVal.z = source.**c*;  
          srcVal.w = source.***c;  
          if (negateBase) {  
             srcVal.x = -srcVal.x;  
             srcVal.y = -srcVal.y;  
             srcVal.z = -srcVal.z;  
             srcVal.w = -srcVal.w;  
          }  
          if (absolute) {  
             srcVal.x = abs(srcVal.x);  
             srcVal.y = abs(srcVal.y);  
             srcVal.z = abs(srcVal.z);  
             srcVal.w = abs(srcVal.w);  
          }  
          if (negateAbs) {  
             srcVal.x = -srcVal.x;  
             srcVal.y = -srcVal.y;  
             srcVal.z = -srcVal.z;  
             srcVal.w = -srcVal.w;  
          }  
  
          convertedVal.x = TypeConvert(srcVal.x);  
          convertedVal.y = TypeConvert(srcVal.y);  
          convertedVal.z = TypeConvert(srcVal.z);  
          convertedVal.w = TypeConvert(srcVal.w);  
          return convertedVal;  
      }  
  
      InstT ScalarLoad(VecSrcT source)   
      {  
          SrcT srcVal;  
          InstT convertedVal;  
  
          srcVal = source.c***;  
          if (negateBase) {  
            srcVal = -srcVal;  
          }  
          if (absolute) {  
             srcVal = abs(srcVal);  
          }  
          if (negateAbs) {  
            srcVal = -srcVal;  
          }  
  
          convertedVal = TypeConvert(srcVal);  
          return convertedVal;  
      }  
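
    The order of the three modifiers in the pseudo-code matters: the base
    negate is applied before the absolute value, so it has no visible effect
    when the absolute value is also taken.  The following sketch illustrates
    this on an already-swizzled operand.  The function name
    apply_operand_modifiers is hypothetical and type conversion is omitted.

```python
def apply_operand_modifiers(values, negate_base=False, absolute=False,
                            negate_abs=False):
    """Apply negate, absolute value, and negate again, in that order."""
    out = list(values)
    if negate_base:
        out = [-v for v in out]
    if absolute:
        out = [abs(v) for v in out]
    if negate_abs:
        out = [-v for v in out]
    return tuple(out)
```

    For the operand (1, -2, 3, -4), negating the base alone gives
    (-1, 2, -3, 4), while combining the absolute value with the outer negate
    gives (-1, -2, -3, -4) regardless of the base negate.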
  
  
    Section 3.11.4.4, Fragment Program Destination Register Update  
  
    Each fragment program instruction, except for KIL, writes a 4-component  
    result vector to a single temporary or output register.    
  
    The four components of the result vector are first optionally clamped to  
    the range [0,1].  The components will be clamped if and only if the result  
    clamp suffix "_SAT" is present in the instruction name.  The instruction  
    "ADD_SAT" will clamp the results to [0,1]; the otherwise equivalent  
    instruction "ADD" will not.  
  
    Since the instruction may be carried out at a different precision than the  
    destination register, the components of the result vector are then
    converted to the data type corresponding to the destination register.
  
    Writes to individual components of the temporary register are controlled  
    by two sets of enables: individual component write masks specified as part  
    of the instruction and the optional condition code mask.  
  
    The component write mask is specified by the <optionalWriteMask> rule  
    found in the <maskedDstReg> rule.  If the optional mask is "", all  
    components are enabled.  Otherwise, the optional mask names the individual  
    components to enable.  The characters "x", "y", "z", and "w" match the x,  
    y, z, and w components respectively.  For example, an optional mask of  
    ".xzw" indicates that the x, z, and w components should be enabled for  
    writing but the y component should not.  The grammar requires that the  
    destination register mask components must be listed in "xyzw" order.  
  
    The optional condition code mask is specified by the <optionalCCMask> rule  
    found in the <maskedDstReg> rule.  If <optionalCCMask> matches "", all  
    components are enabled.  Otherwise, the condition code register is loaded  
    and swizzled according to the swizzling specified by <swizzleSuffix>.  
    Each component of the swizzled condition code is tested according to the  
    rule given by <ccMaskRule>.  <ccMaskRule> may have the values "EQ", "NE",
    "LT", "GE", "LE", or "GT", which mean to enable writes if the corresponding
    condition code field evaluates to equal, not equal, less than, greater
    than or equal, less than or equal, or greater than, respectively.
    Comparisons involving condition codes of "UN" (unordered) evaluate to true
    for "NE" and false otherwise.  For example, if the condition code is
    (GT,LT,EQ,GT) and the condition code mask is "(NE.zyxw)", the swizzle
    operation will load (EQ,LT,GT,GT) and the mask will thus enable
    writes on the y, z, and w components.  In addition, "TR" always enables
    writes and "FL" always disables writes, regardless of the condition code.  
  
    Each component of the destination register is updated with the result of  
    the fragment program if and only if the component is enabled for writes by  
    both the component write mask and the optional condition code mask.  
    Otherwise, the component of the destination register remains unchanged.  
  
    A fragment program instruction can also optionally update the condition  
    code register.  The condition code is updated if the condition code  
    register update suffix "C" is present in the instruction name.  The  
    instruction "ADDC" will update the condition code; the otherwise  
    equivalent instruction "ADD" will not.  If condition code updates are  
    enabled, each component of the destination register enabled for writes is  
    compared to zero.  The corresponding component of the condition code is  
    set to "LT", "EQ", or "GT", if the written component is less than, equal  
    to, or greater than zero, respectively.  Condition code components are set  
    to "UN" if the written component is NaN.  Note that values of -0.0 and  
    +0.0 both evaluate to "EQ".  If a component of the destination register is  
    not enabled for writes, the corresponding condition code component is  
    unchanged.  
  
    In the following example code,  
  
        # R1=(-2, 0, 2, NaN)              R0                  CC  
        MOVC R0, R1;               # ( -2,  0,   2, NaN) (LT,EQ,GT,UN)  
        MOVC R0.xyz, R1.yzwx;      # (  0,  2, NaN, NaN) (EQ,GT,UN,UN)  
        MOVC R0 (NE), R1.zywx;     # (  0,  0, NaN,  -2) (EQ,EQ,UN,LT)  
  
    the first instruction writes (-2,0,2,NaN) to R0 and updates the condition
    code to (LT,EQ,GT,UN).  In the second instruction, only the "x", "y", and
    "z" components of R0 and the condition code are updated, so R0 ends up
    with (0,2,NaN,NaN) and the condition code ends up with (EQ,GT,UN,UN).  In
    the third instruction, the condition code mask disables writes to the x
    component (its condition code field is "EQ"), so R0 ends up with
    (0,0,NaN,-2) and the condition code ends up with (EQ,EQ,UN,LT).
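
    The three-instruction example can be traced with a small simulation.  This
    is a sketch only: the helper names (generate_cc, cc_passes, movc,
    run_example) are hypothetical, precision conversion and clamping are
    omitted, and the initial contents of R0 and the condition code are
    assumed values that the first instruction overwrites.

```python
import math

NAN = float("nan")
INDEX = {"x": 0, "y": 1, "z": 2, "w": 3}

def generate_cc(value):
    """Classify a written component as UN, LT, EQ, or GT."""
    if math.isnan(value):
        return "UN"
    if value < 0:
        return "LT"
    return "EQ" if value == 0 else "GT"

def cc_passes(rule, field):
    """Test one condition code field against a mask rule ("" = no mask)."""
    if rule in ("", "TR"):
        return True
    if rule == "FL":
        return False
    if rule == "NE":
        return field != "EQ"                 # UN counts as "not equal"
    allowed = {"EQ": "EQ", "LT": "LT", "GT": "GT",
               "GE": "GT EQ", "LE": "LT EQ"}
    return field in allowed[rule].split()

def movc(dst, cc, src, src_swizzle="xyzw", write_mask="xyzw",
         cc_rule="", cc_swizzle="xyzw"):
    """Simulate MOVC; return the new (destination, condition code)."""
    new_dst, new_cc = list(dst), list(cc)
    for i, component in enumerate("xyzw"):
        tested = cc[INDEX[cc_swizzle[i]]]    # mask reads the old code
        if component in write_mask and cc_passes(cc_rule, tested):
            new_dst[i] = src[INDEX[src_swizzle[i]]]
            new_cc[i] = generate_cc(new_dst[i])
    return new_dst, new_cc

def run_example():
    """Return (R0, CC) after each of the three instructions."""
    r1 = (-2.0, 0.0, 2.0, NAN)
    r0, cc = [0.0] * 4, ["EQ"] * 4           # assumed initial state
    states = []
    r0, cc = movc(r0, cc, r1)                            # MOVC R0, R1;
    states.append((r0, cc))
    r0, cc = movc(r0, cc, r1, "yzwx", write_mask="xyz")  # MOVC R0.xyz, R1.yzwx;
    states.append((r0, cc))
    r0, cc = movc(r0, cc, r1, "zywx", cc_rule="NE")      # MOVC R0 (NE), R1.zywx;
    states.append((r0, cc))
    return states
```

    The condition codes returned by run_example() after each step are
    (LT,EQ,GT,UN), (EQ,GT,UN,UN), and (EQ,EQ,UN,LT), matching the comments in
    the example.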
  
    The following pseudocode illustrates the process of writing a result  
    vector to the destination register.  In the example, "ccMaskRule" refers  
    to the condition code mask rule given by <ccMaskRule> (or "" if no rule is  
    specified), "instrMask" refers to the component write mask given by the
    <optionalWriteMask> rule, "updatecc" is TRUE if condition code updates are  
    enabled, and "clamp01" is TRUE if [0,1] result clamping is enabled.  
    "destination" and "cc" refer to the register selected by <dstRegister> and  
    the condition code, respectively.  
  
      boolean TestCC(CondCode field) {
          switch (ccMaskRule) {
          case "EQ":  return (field == "EQ");
          case "NE":  return (field != "EQ");
          case "LT":  return (field == "LT");
          case "GE":  return (field == "GT" || field == "EQ");
          case "LE":  return (field == "LT" || field == "EQ");
          case "GT":  return (field == "GT");
          case "TR":  return TRUE;
          case "FL":  return FALSE;
          case "":    return TRUE;
          }
      }
  
      enum GenerateCC(DstT value) {  
        if (isnan(value)) {
          return UN;  
        } else if (value < 0) {  
          return LT;  
        } else if (value == 0) {  
          return EQ;  
        } else {  
          return GT;  
        }  
      }  
  
      void UpdateDestination(VecDstT destination, VecInstT result)  
      {  
          // Working copies of the converted result, the destination
          // register, and the condition code.
          VecDstT resultDst;  
          VecDstT merged;  
          VecCC   mergedCC;  
  
          // Clamp the result vector components to [0,1], if requested.  
          if (clamp01) {  
              if (result.x < 0)      result.x = 0;  
              else if (result.x > 1) result.x = 1;  
              if (result.y < 0)      result.y = 0;  
              else if (result.y > 1) result.y = 1;  
              if (result.z < 0)      result.z = 0;  
              else if (result.z > 1) result.z = 1;  
              if (result.w < 0)      result.w = 0;  
              else if (result.w > 1) result.w = 1;  
          }  
  
          // Convert the result to the type of the destination register.  
          resultDst.x = TypeConvert(result.x);  
          resultDst.y = TypeConvert(result.y);  
          resultDst.z = TypeConvert(result.z);  
          resultDst.w = TypeConvert(result.w);  
  
          // Merge the converted result into the destination register, under  
          // control of the compile- and run-time write masks.  
          merged = destination;
          mergedCC = cc;
          if (instrMask.x && TestCC(cc.c***)) {
              merged.x = resultDst.x;
              if (updatecc) mergedCC.x = GenerateCC(resultDst.x);
          }
          if (instrMask.y && TestCC(cc.*c**)) {
              merged.y = resultDst.y;
              if (updatecc) mergedCC.y = GenerateCC(resultDst.y);
          }
          if (instrMask.z && TestCC(cc.**c*)) {
              merged.z = resultDst.z;
              if (updatecc) mergedCC.z = GenerateCC(resultDst.z);
          }
          if (instrMask.w && TestCC(cc.***c)) {
              merged.w = resultDst.w;
              if (updatecc) mergedCC.w = GenerateCC(resultDst.w);
          }

          // Write out the new destination register and condition code.
          destination = merged;
          cc = mergedCC;
      }  
  
    Section 3.11.5, Fragment Program Instruction Set  
  
    The following sections describe the instruction set available to fragment  
    programs.  
  
  
    Section 3.11.5.1,  ADD:  Add  
  
    The ADD instruction performs a component-wise add of the two operands to  
    yield a result vector.  
  
      tmp0 = VectorLoad(op0);  
      tmp1 = VectorLoad(op1);  
      result.x = tmp0.x + tmp1.x;  
      result.y = tmp0.y + tmp1.y;  
      result.z = tmp0.z + tmp1.z;  
      result.w = tmp0.w + tmp1.w;  
  
    The following special-case rules apply to addition:  
  
      1. "A+B" is always equivalent to "B+A".  
      2. NaN + <x> = NaN, for all <x>.  
      3. +INF + <x> = +INF, for all <x> except NaN and -INF.  
      4. -INF + <x> = -INF, for all <x> except NaN and +INF.  
      5. +INF + -INF = NaN.  
      6. -0.0 + <x> = <x>, for all <x>.  
      7. +0.0 + <x> = <x>, for all <x> except -0.0.  
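
    The special-case rules for addition appear to coincide with IEEE 754
    addition under round-to-nearest, which can be spot-checked with ordinary
    Python floats.  The helper names same and add_rules_hold are hypothetical,
    and Python floats are 64-bit rather than fp32 or fp16; only special values
    and a few exactly representable numbers are exercised, so the difference
    in width should not matter here.

```python
import math

INF = math.inf
NAN = math.nan

def same(a, b):
    """Equality that matches NaN with NaN and tells -0.0 from +0.0."""
    if math.isnan(a) or math.isnan(b):
        return math.isnan(a) and math.isnan(b)
    return a == b and math.copysign(1.0, a) == math.copysign(1.0, b)

def add_rules_hold():
    """Return True if all seven rules hold for the sample values."""
    finite = [-1.5, -0.0, 0.0, 2.0]
    every = finite + [-INF, INF, NAN]
    return all([
        all(same(a + b, b + a) for a in every for b in every),      # rule 1
        all(math.isnan(NAN + x) for x in every),                     # rule 2
        all(same(INF + x, INF) for x in finite + [INF]),             # rule 3
        all(same(-INF + x, -INF) for x in finite + [-INF]),          # rule 4
        math.isnan(INF + -INF),                                      # rule 5
        all(same(-0.0 + x, x) for x in every),                       # rule 6
        all(same(0.0 + x, x) for x in every if not same(x, -0.0)),   # rule 7
    ])
```

    The exception in rule 7 reflects that +0.0 + -0.0 yields +0.0 rather than
    -0.0.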
  
  
    Section 3.11.5.2,  COS:  Cosine  
  
    The COS instruction approximates the cosine of the angle specified by the  
    scalar operand and replicates the approximation to all four components of  
    the result vector.  The angle is specified in radians and does not have to  
    be in the range [0,2*PI].  
  
      tmp = ScalarLoad(op0);  
      result.x = ApproxCosine(tmp);  
      result.y = ApproxCosine(tmp);  
      result.z = ApproxCosine(tmp);  
      result.w = ApproxCosine(tmp);  
  
    The approximation function ApproxCosine is accurate to at least 22 bits  
    with an angle in the range [0,2*PI].  
  
      | ApproxCosine(x) - cos(x) | < 1.0 / 2^22, if 0.0 <= x < 2.0 * PI.  
  
    The error in the approximation will typically increase with the absolute  
    value of the angle when the angle falls outside the range [0,2*PI].  
  
    The following special-case rules apply to cosine approximation:  
  
      1. ApproxCosine(NaN) = NaN.  
      2. ApproxCosine(+/-INF) = NaN.  
      3. ApproxCosine(+/-0.0) = +1.0.  
  
  
    Section 3.11.5.3,  DDX:  Derivative Relative to X  
  
    The DDX instruction computes approximate partial derivatives of the four  
    components of the single operand with respect to the X window coordinate  
    to yield a result vector.  The partial derivative is evaluated at the  
    center of the pixel.  
  
      f = VectorLoad(op0);  
      result = ComputePartialX(f);  
  
    Note that the partial derivatives obtained by this instruction are
    approximate, and derivative-of-derivative instruction sequences may not
    yield accurate second derivatives.    
  
    For components with partial derivatives that overflow (including +/-INF  
    inputs), the resulting partials may be encoded as large floating-point  
    numbers instead of +/-INF.  
  
  
    Section 3.11.5.4,  DDY:  Derivative Relative to Y  
  
    The DDY instruction computes approximate partial derivatives of the four  
    components of the single operand with respect to the Y window coordinate  
    to yield a result vector.  The partial derivative is evaluated at the  
    center of the pixel.  
  
      f = VectorLoad(op0);  
      result = ComputePartialY(f);  
  
    Note that the partial derivatives obtained by this instruction are
    approximate, and derivative-of-derivative instruction sequences may not
    yield accurate second derivatives.  
  
    For components with partial derivatives that overflow (including +/-INF  
    inputs), the resulting partials may be encoded as large floating-point  
    numbers instead of +/-INF.  
  
  
    Section 3.11.5.5,  DP3:  3-Component Dot Product  
  
    The DP3 instruction computes a three component dot product of the two  
    operands (using the x, y, and z components) and replicates the dot product  
    to all four components of the result vector.  
  
      tmp0 = VectorLoad(op0);  
      tmp1 = VectorLoad(op1);
      result.x = (tmp0.x * tmp1.x) + (tmp0.y * tmp1.y) +
                 (tmp0.z * tmp1.z);
      result.y = (tmp0.x * tmp1.x) + (tmp0.y * tmp1.y) +
                 (tmp0.z * tmp1.z);
      result.z = (tmp0.x * tmp1.x) + (tmp0.y * tmp1.y) +
                 (tmp0.z * tmp1.z);
      result.w = (tmp0.x * tmp1.x) + (tmp0.y * tmp1.y) +
                 (tmp0.z * tmp1.z);
  
  
    Section 3.11.5.6,  DP4:  4-Component Dot Product  
  
    The DP4 instruction computes a four component dot product of the two  
    operands and replicates the dot product to all four components of the  
    result vector.  
  
      tmp0 = VectorLoad(op0);  
      tmp1 = VectorLoad(op1);
      result.x = (tmp0.x * tmp1.x) + (tmp0.y * tmp1.y) +
                 (tmp0.z * tmp1.z) + (tmp0.w * tmp1.w);
      result.y = (tmp0.x * tmp1.x) + (tmp0.y * tmp1.y) +
                 (tmp0.z * tmp1.z) + (tmp0.w * tmp1.w);
      result.z = (tmp0.x * tmp1.x) + (tmp0.y * tmp1.y) +
                 (tmp0.z * tmp1.z) + (tmp0.w * tmp1.w);
      result.w = (tmp0.x * tmp1.x) + (tmp0.y * tmp1.y) +
                 (tmp0.z * tmp1.z) + (tmp0.w * tmp1.w);
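
    As a worked illustration of the two dot product instructions, the
    following sketch computes the replicated results for already-loaded
    operands.  The function names dp3 and dp4 are hypothetical, and operand
    loading and precision conversion are omitted.

```python
def dp3(a, b):
    """3-component dot product, replicated to all four components."""
    dot = (a[0] * b[0]) + (a[1] * b[1]) + (a[2] * b[2])
    return (dot, dot, dot, dot)

def dp4(a, b):
    """4-component dot product, replicated to all four components."""
    dot = (a[0] * b[0]) + (a[1] * b[1]) + (a[2] * b[2]) + (a[3] * b[3])
    return (dot, dot, dot, dot)
```

    For operands (1,2,3,4) and (5,6,7,8), dp3 gives 5 + 12 + 21 = 38 in every
    component, and dp4 adds the w term for 38 + 32 = 70.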
  
  
    Section 3.11.5.7,  DST:  Distance Vector  
  
    The DST instruction computes a distance vector from two specially-  
    formatted operands.  The first operand should be of the form [NA, d^2,  
    d^2, NA] and the second operand should be of the form [NA, 1/d, NA, 1/d],  
    where NA values are not relevant to the calculation and d is a vector  
    length.  If both vectors satisfy these conditions, the result vector will  
    be of the form [1.0, d, d^2, 1/d].  
  
    The exact behavior is specified in the following pseudo-code:  
  
      tmp0 = VectorLoad(op0);  
      tmp1 = VectorLoad(op1);  
      result.x = 1.0;  
      result.y = tmp0.y * tmp1.y;  
      result.z = tmp0.z;  
      result.w = tmp1.w;  
  
    Given an arbitrary vector, d^2 can be obtained using the DP3 instruction
    (using the same vector for both operands) and 1/d can be obtained from d^2
    using the RSQ instruction.

    This distance vector is useful for per-fragment light attenuation
    calculations:  a DP3 operation involving the distance vector and an
    attenuation constants vector will yield the attenuation factor.
  
  
    Section 3.11.5.8,  EX2:  Exponential Base 2  
  
    The EX2 instruction approximates 2 raised to the power of the scalar  
    operand and replicates it to all four components of the result  
    vector.  
  
      tmp = ScalarLoad(op0);  
      result.x = Approx2ToX(tmp);  
      result.y = Approx2ToX(tmp);  
      result.z = Approx2ToX(tmp);  
      result.w = Approx2ToX(tmp);  
  
    The approximation function is accurate to at least 22 bits:  
  
      | Approx2ToX(x) - 2^x | < 1.0 / 2^22, if 0.0 <= x < 1.0,  
  
    and, in general,  
     
      | Approx2ToX(x) - 2^x | < (1.0 / 2^22) * (2^floor(x)).  
  
    The following special-case rules apply to exponential approximation:  
  
      1. Approx2ToX(NaN) = NaN.  
      2. Approx2ToX(-INF) = +0.0.  
      3. Approx2ToX(+INF) = +INF.  
      4. Approx2ToX(+/-0.0) = +1.0.  
  
  
    Section 3.11.5.9,  FLR:  Floor  
  
    The FLR instruction performs a component-wise floor operation on the  
    operand to generate a result vector.  The floor of a value is defined as  
    the largest integer less than or equal to the value.  The floor of 2.3 is  
    2.0; the floor of -3.6 is -4.0.  
  
      tmp = VectorLoad(op0);  
      result.x = floor(tmp.x);  
      result.y = floor(tmp.y);  
      result.z = floor(tmp.z);  
      result.w = floor(tmp.w);  
  
    The following special-case rules apply to floor computation:  
  
      1. floor(NaN) = NaN.  
      2. floor(<x>) = <x>, for -0.0, +0.0, -INF, and +INF.  In all cases, the  
         sign of the result is equal to the sign of the operand.  
  
  
    Section 3.11.5.10,  FRC:  Fraction  
  
    The FRC instruction extracts the fractional portion of each component of  
    the operand to generate a result vector.  The fractional portion of a  
    component is defined as the result after subtracting off the floor of the  
    component (see FLR), and is always in the range [0.00, 1.00).  
  
    For negative values, the fractional portion is NOT the number written to  
    the right of the decimal point -- the fractional portion of -1.7 is not  
    0.7 -- it is 0.3.  0.3 is produced by subtracting the floor of -1.7 (-2.0)  
    from -1.7.  
  
      tmp = VectorLoad(op0);  
      result.x = tmp.x - floor(tmp.x);  
      result.y = tmp.y - floor(tmp.y);  
      result.z = tmp.z - floor(tmp.z);  
      result.w = tmp.w - floor(tmp.w);  
  
    The following special-case rules, which can be derived from the rules for
    FLR and ADD, apply to fraction computation:
  
      1. fraction(NaN) = NaN.  
      2. fraction(+/-INF) = NaN.  
      3. fraction(+/-0.0) = +0.0.  
  
  
    Section 3.11.5.11,  KIL:  Conditionally Discard Fragment  
  
    The KIL instruction is unlike any other instruction in the instruction  
    set.  This instruction evaluates components of a swizzled condition code  
    using a test expression identical to that used to evaluate condition code  
    write masks (Section 3.11.4.4).  If any condition code component evaluates  
    to TRUE, the fragment is discarded.  Otherwise, the instruction has no  
    effect.  The condition code components are specified, swizzled, and  
    evaluated in the same manner as the condition code write mask.  
  
      if (TestCC(rc.c***) || TestCC(rc.*c**) ||   
          TestCC(rc.**c*) || TestCC(rc.***c)) {  
        // Discard the fragment.
      } else {  
        // Do nothing.  
      }  
  
    If the fragment is discarded, it is treated as though it were not produced  
    by rasterization.  In particular, none of the per-fragment operations  
    (such as stencil tests, blends, stencil, depth, or color buffer writes)  
    are performed on the fragment.  
  
  
    Section 3.11.5.12,  LG2:  Logarithm Base 2  
  
    The LG2 instruction approximates the base 2 logarithm of the scalar  
    operand and replicates it to all four components of the result vector.  
  
      tmp = ScalarLoad(op0);  
      result.x = ApproxLog2(tmp);  
      result.y = ApproxLog2(tmp);  
      result.z = ApproxLog2(tmp);  
      result.w = ApproxLog2(tmp);  
     
    The approximation function is accurate to at least 22 bits:  
  
      | ApproxLog2(x) - log_2(x) | < 1.0 / 2^22.  
  
    Note that for large values of x, there are not enough bits in the  
    floating-point storage format to represent a result that precisely.  
  
    The following special-case rules apply to logarithm approximation:  
  
      1. ApproxLog2(NaN) = NaN.  
      2. ApproxLog2(+INF) = +INF.  
      3. ApproxLog2(+/-0.0) = -INF.  
      4. ApproxLog2(x) = NaN, -INF < x < -0.0.  
      5. ApproxLog2(-INF) = NaN.  
  
  
    Section 3.11.5.13,  LIT:  Compute Light Coefficients  
  
    The LIT instruction accelerates per-fragment lighting by computing  
    lighting coefficients for ambient, diffuse, and specular light  
    contributions.  The "x" component of the operand is assumed to hold a  
    diffuse dot product (n dot VP_pli, as in the vertex lighting equations in  
    Section 2.13.1).  The "y" component of the operand is assumed to hold a  
    specular dot product (n dot h_i).  The "w" component of the operand is  
    assumed to hold the specular exponent of the material (s_rm).  
  
    The "x" component of the result vector receives the value that should be  
    multiplied by the ambient light/material product (always 1.0).  The "y"  
    component of the result vector receives the value that should be  
    multiplied by the diffuse light/material product (n dot VP_pli).  The "z"  
    component of the result vector receives the value that should be  
    multiplied by the specular light/material product (f_i * (n dot h_i) ^  
    s_rm).  The "w" component of the result is the constant 1.0.  
  
    Negative diffuse and specular dot products are clamped to 0.0, as is done  
    in the standard per-vertex lighting operations.  In addition, if the  
    diffuse dot product is zero or negative, the specular coefficient is  
    forced to zero.  
  
      tmp = VectorLoad(op0);
      if (tmp.x < 0) tmp.x = 0;
      if (tmp.y < 0) tmp.y = 0;
      result.x = 1.0;
      result.y = tmp.x;
      result.z = (tmp.x > 0) ? ApproxPower(tmp.y, tmp.w) : 0.0;
      result.w = 1.0;
  
    The exponentiation approximation used to compute result.z is identical to
    that used in the POW instruction, including errors and the processing of
    any special cases.
  
  
    Section 3.11.5.14,  LRP:  Linear Interpolation  
  
    The LRP instruction performs a component-wise linear interpolation to  
    yield a result vector.  It interpolates between the components of the  
    second and third operands, using the first operand as a weight.  
  
      tmp0 = VectorLoad(op0);  
      tmp1 = VectorLoad(op1);  
      tmp2 = VectorLoad(op2);  
      result.x = tmp0.x * tmp1.x + (1 - tmp0.x) * tmp2.x;  
      result.y = tmp0.y * tmp1.y + (1 - tmp0.y) * tmp2.y;  
      result.z = tmp0.z * tmp1.z + (1 - tmp0.z) * tmp2.z;  
      result.w = tmp0.w * tmp1.w + (1 - tmp0.w) * tmp2.w;  
  
  
    Section 3.11.5.15,  MAD:  Multiply and Add  
  
    The MAD instruction performs a component-wise multiply of the first two  
    operands, and then does a component-wise add of the product to the third  
    operand to yield a result vector.  
  
      tmp0 = VectorLoad(op0);  
      tmp1 = VectorLoad(op1);  
      tmp2 = VectorLoad(op2);  
      result.x = tmp0.x * tmp1.x + tmp2.x;  
      result.y = tmp0.y * tmp1.y + tmp2.y;  
      result.z = tmp0.z * tmp1.z + tmp2.z;  
      result.w = tmp0.w * tmp1.w + tmp2.w;  
  
  
    Section 3.11.5.16,  MAX:  Maximum
  
    The MAX instruction computes component-wise maximums of the values in the  
    two operands to yield a result vector.  
  
      tmp0 = VectorLoad(op0);  
      tmp1 = VectorLoad(op1);  
      result.x = max(tmp0.x, tmp1.x);  
      result.y = max(tmp0.y, tmp1.y);  
      result.z = max(tmp0.z, tmp1.z);  
      result.w = max(tmp0.w, tmp1.w);  
  
    The following special cases apply to the maximum operation:  
  
      1. max(A,B) is always equivalent to max(B,A).  
      2. max(NaN, <x>) == NaN, for all <x>.  
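
    Note that the NaN rule above differs from the C library's fmaxf, which
    returns the non-NaN operand when exactly one operand is NaN.  A host-side
    emulation therefore needs an explicit check; the sketch below shows one
    way to write it (the name max_nan is chosen for this sketch only).

```c
#include <math.h>

/* Maximum that propagates NaN from either operand, per the rules above. */
static float max_nan(float a, float b)
{
    if (isnan(a) || isnan(b))
        return NAN;
    return (a > b) ? a : b;
}
```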
  
      
  
    Section 3.11.5.17,  MIN:  Minimum
  
    The MIN instruction computes component-wise minimums of the values in the  
    two operands to yield a result vector.  
  
      tmp0 = VectorLoad(op0);  
      tmp1 = VectorLoad(op1);  
      result.x = min(tmp0.x, tmp1.x);  
      result.y = min(tmp0.y, tmp1.y);  
      result.z = min(tmp0.z, tmp1.z);  
      result.w = min(tmp0.w, tmp1.w);  
  
    The following special cases apply to the minimum operation:  
  
      1. min(A,B) is always equivalent to min(B,A).  
      2. min(NaN, <x>) == NaN, for all <x>.  
  
  
    Section 3.11.5.18,  MOV:  Move  
  
    The MOV instruction copies the value of the operand to yield a result  
    vector.  
  
      result = VectorLoad(op0);  
  
  
    Section 3.11.5.19,  MUL:  Multiply  
  
    The MUL instruction performs a component-wise multiply of the two operands  
    to yield a result vector.  
  
      tmp0 = VectorLoad(op0);  
      tmp1 = VectorLoad(op1);  
      result.x = tmp0.x * tmp1.x;  
      result.y = tmp0.y * tmp1.y;  
      result.z = tmp0.z * tmp1.z;  
      result.w = tmp0.w * tmp1.w;  
  
    The following special-case rules apply to multiplication:  
  
      1. "A*B" is always equivalent to "B*A".  
      2. NaN * <x> = NaN, for all <x>.  
      3. +/-0.0 * +/-INF = NaN.  
      4. +/-0.0 * <x> = +/-0.0, for all <x> except -INF, +INF, and NaN.  The  
         sign of the result is positive if the signs of the two operands match  
         and negative otherwise.  
      5. +/-INF * <x> = +/-INF, for all <x> except -0.0, +0.0, and NaN.  The   
         sign of the result is positive if the signs of the two operands match  
         and negative otherwise.  
      6. +1.0 * <x> = <x>, for all <x>.  
  
  
    Section 3.11.5.20,  PK2H:  Pack Two 16-bit Floats  
  
    The PK2H instruction converts the "x" and "y" components of the single  
    operand into 16-bit floating-point format, packs the bit representation of  
    these two floats into a 32-bit value, and replicates that value to all  
    four components of the result vector.  The PK2H instruction can be  
    reversed by the UP2H instruction below.  
  
      tmp0 = VectorLoad(op0);  
      /* result obtained by combining raw bits of tmp0.x, tmp0.y */  
      result.x = RawBits(tmp0.x) | (RawBits(tmp0.y) << 16);  
      result.y = RawBits(tmp0.x) | (RawBits(tmp0.y) << 16);  
      result.z = RawBits(tmp0.x) | (RawBits(tmp0.y) << 16);  
      result.w = RawBits(tmp0.x) | (RawBits(tmp0.y) << 16);  
  
    The result must be written to a register with 32-bit components (an "R"  
    register, o[COLR], or o[DEPR]).  A fragment program will fail to load if  
    any other register type is specified.  
  
  
    Section 3.11.5.21,  PK2US:  Pack Two Unsigned 16-bit Scalars  
  
    The PK2US instruction converts the "x" and "y" components of the single  
    operand into a packed pair of 16-bit unsigned scalars.  The scalars are  
    represented in a bit pattern where all '0' bits correspond to 0.0 and all
    '1' bits correspond to 1.0.  The bit representations of the two converted
    components are packed into a 32-bit value, and that value is replicated to  
    all four components of the result vector.  The PK2US instruction can be  
    reversed by the UP2US instruction below.  
  
      tmp0 = VectorLoad(op0);  
      if (tmp0.x < 0.0) tmp0.x = 0.0;  
      if (tmp0.x > 1.0) tmp0.x = 1.0;  
      if (tmp0.y < 0.0) tmp0.y = 0.0;  
      if (tmp0.y > 1.0) tmp0.y = 1.0;  
      us.x = round(65535.0 * tmp0.x);  /* us is a ushort vector */  
      us.y = round(65535.0 * tmp0.y);  
      /* result obtained by combining raw bits of us. */  
      result.x = ((us.x) | (us.y << 16));  
      result.y = ((us.x) | (us.y << 16));  
      result.z = ((us.x) | (us.y << 16));  
      result.w = ((us.x) | (us.y << 16));  
  
    The result must be written to a register with 32-bit components (an "R"  
    register, o[COLR], or o[DEPR]).  A fragment program will fail to load if  
    any other register type is specified.  
  
  
    Section 3.11.5.22,  PK4B:  Pack Four Signed 8-bit Scalars  
  
    The PK4B instruction converts the four components of the single operand  
    into 8-bit signed quantities.  The signed quantities are represented in a  
    bit pattern where all '0' bits correspond to -128/127 and all '1' bits
    correspond to +127/127.  The bit representations of the four converted
    components are packed into a 32-bit value, and that value is replicated to  
    all four components of the result vector.  The PK4B instruction can be  
    reversed by the UP4B instruction below.  
  
      tmp0 = VectorLoad(op0);  
      if (tmp0.x < -128/127) tmp0.x = -128/127;  
      if (tmp0.y < -128/127) tmp0.y = -128/127;  
      if (tmp0.z < -128/127) tmp0.z = -128/127;  
      if (tmp0.w < -128/127) tmp0.w = -128/127;  
      if (tmp0.x > +127/127) tmp0.x = +127/127;  
      if (tmp0.y > +127/127) tmp0.y = +127/127;  
      if (tmp0.z > +127/127) tmp0.z = +127/127;  
      if (tmp0.w > +127/127) tmp0.w = +127/127;  
      ub.x = round(127.0 * tmp0.x + 128.0);  /* ub is a ubyte vector */  
      ub.y = round(127.0 * tmp0.y + 128.0);  
      ub.z = round(127.0 * tmp0.z + 128.0);  
      ub.w = round(127.0 * tmp0.w + 128.0);  
      /* result obtained by combining raw bits of ub. */  
      result.x = ((ub.x) | (ub.y << 8) | (ub.z << 16) | (ub.w << 24));  
      result.y = ((ub.x) | (ub.y << 8) | (ub.z << 16) | (ub.w << 24));  
      result.z = ((ub.x) | (ub.y << 8) | (ub.z << 16) | (ub.w << 24));  
      result.w = ((ub.x) | (ub.y << 8) | (ub.z << 16) | (ub.w << 24));  
  
    The result must be written to a register with 32-bit components (an "R"  
    register, o[COLR], or o[DEPR]).  A fragment program will fail to load if  
    any other register type is specified.  
  
  
    Section 3.11.5.23,  PK4UB:  Pack Four Unsigned 8-bit Scalars  
  
    The PK4UB instruction converts the four components of the single operand  
    into a packed grouping of 8-bit unsigned scalars.  The scalars are  
    represented in a bit pattern where all '0' bits correspond to 0.0 and all
    '1' bits correspond to 1.0.  The bit representations of the four
    converted components are packed into a 32-bit value, and that value is  
    replicated to all four components of the result vector.  The PK4UB  
    instruction can be reversed by the UP4UB instruction below.  
  
      tmp0 = VectorLoad(op0);  
      if (tmp0.x < 0.0) tmp0.x = 0.0;  
      if (tmp0.x > 1.0) tmp0.x = 1.0;  
      if (tmp0.y < 0.0) tmp0.y = 0.0;  
      if (tmp0.y > 1.0) tmp0.y = 1.0;  
      if (tmp0.z < 0.0) tmp0.z = 0.0;  
      if (tmp0.z > 1.0) tmp0.z = 1.0;  
      if (tmp0.w < 0.0) tmp0.w = 0.0;  
      if (tmp0.w > 1.0) tmp0.w = 1.0;  
      ub.x = round(255.0 * tmp0.x);  /* ub is a ubyte vector */  
      ub.y = round(255.0 * tmp0.y);  
      ub.z = round(255.0 * tmp0.z);  
      ub.w = round(255.0 * tmp0.w);  
      /* result obtained by combining raw bits of ub. */  
      result.x = ((ub.x) | (ub.y << 8) | (ub.z << 16) | (ub.w << 24));  
      result.y = ((ub.x) | (ub.y << 8) | (ub.z << 16) | (ub.w << 24));  
      result.z = ((ub.x) | (ub.y << 8) | (ub.z << 16) | (ub.w << 24));  
      result.w = ((ub.x) | (ub.y << 8) | (ub.z << 16) | (ub.w << 24));  
  
    The result must be written to a register with 32-bit components (an "R"  
    register, o[COLR], or o[DEPR]).  A fragment program will fail to load if  
    any other register type is specified.  
  
  
    Section 3.11.5.24,  POW:  Exponentiation  
  
    The POW instruction approximates the value of the first scalar operand  
    raised to the power of the second scalar operand and replicates it to all  
    four components of the result vector.  
  
      tmp0 = ScalarLoad(op0);  
      tmp1 = ScalarLoad(op1);  
      result.x = ApproxPower(tmp0, tmp1);  
      result.y = ApproxPower(tmp0, tmp1);  
      result.z = ApproxPower(tmp0, tmp1);  
      result.w = ApproxPower(tmp0, tmp1);  
     
    The exponentiation approximation function is defined in terms of the base  
    2 exponentiation and logarithm approximation operations in the EX2 and LG2  
    instructions, including errors and the processing of any special cases.  
    In particular,  
  
      ApproxPower(a,b) = Approx2ToX(b * ApproxLog2(a)).
  
    The following special-case rules, which can be derived from the rules in  
    the LG2, MUL, and EX2 instructions, apply to exponentiation:  
  
      1. ApproxPower(<x>, <y>) = NaN, if x < -0.0.
      2. ApproxPower(<x>, <y>) = NaN, if x or y is NaN.  
      3. ApproxPower(+/-0.0, +/-0.0) = NaN.  
      4. ApproxPower(+INF, +/-0.0) = NaN.  
      5. ApproxPower(+1.0, +/-INF) = NaN.  
      6. ApproxPower(+/-0.0, <x>) = +0.0, if x > +0.0.  
      7. ApproxPower(+/-0.0, <x>) = +INF, if x < -0.0.  
      8. ApproxPower(+1.0, <x>)   = +1.0, if -INF < x < +INF.  
      9. ApproxPower(+INF, <x>) = +INF, if x > +0.0.  
      10. ApproxPower(+INF, <x>) = +0.0, if x < -0.0.
      11. ApproxPower(<x>, +/-0.0) = +1.0, if +0.0 < x < +INF.  
      12. ApproxPower(<x>, +1.0) ~= <x>, if x >= +0.0.  
      13. ApproxPower(<x>, +INF) = +0.0, if -0.0 <= x < +1.0,
                                   +INF, if x > +1.0.
      14. ApproxPower(<x>, -INF) = +INF, if -0.0 <= x < +1.0,
                                   +0.0, if x > +1.0.
  
    Note that 0^0 is defined here as NaN, since ApproxLog2(0) = -INF, and  
    0*(-INF) = NaN.  In many other applications, including the standard C  
    pow() function, 0^0 is defined as 1.0.  This behavior can be emulated  
    using additional instructions in much the same way that the pow()
    function is implemented on many CPUs.  
  
    Note that a logarithm is involved even if the exponent is an integer.  
    This means that any exponentiation with a negative base will produce NaN.
    In contrast, it is possible in a "normal" mathematical formulation to
    raise negative numbers to integral powers (e.g., (-3)^2 == 9, and
    (-0.5)^-2 == 4).
  
  
    Section 3.11.5.25,  RCP:  Reciprocal  
  
    The RCP instruction approximates the reciprocal of the scalar operand and  
    replicates it to all four components of the result vector.  
  
      tmp = ScalarLoad(op0);  
      result.x = ApproxReciprocal(tmp);  
      result.y = ApproxReciprocal(tmp);  
      result.z = ApproxReciprocal(tmp);  
      result.w = ApproxReciprocal(tmp);  
  
    The approximation function is accurate to at least 22 bits:  
  
      | ApproxReciprocal(x) - (1/x) | < 1.0 / 2^22, if 1.0 <= x < 2.0.  
  
    The following special-case rules apply to reciprocation:  
  
      1. ApproxReciprocal(NaN) = NaN.  
      2. ApproxReciprocal(+INF) = +0.0.  
      3. ApproxReciprocal(-INF) = -0.0.  
      4. ApproxReciprocal(+0.0) = +INF.  
      5. ApproxReciprocal(-0.0) = -INF.  
  
  
    Section 3.11.5.26,  RFL:  Reflection Vector  
  
    The RFL instruction computes the reflection of the second vector operand  
    (the "direction" vector) about the vector specified by the first vector  
    operand (the "axis" vector).  Both operands are treated as 3D vectors (the  
    w components are ignored).  The result vector is another 3D vector (the  
    "reflected direction" vector).  The length of the result vector, ignoring  
    rounding errors, should equal that of the second operand.  
  
      axis = VectorLoad(op0);  
      direction = VectorLoad(op1);  
      tmp.w = (axis.x * axis.x + axis.y * axis.y +  
               axis.z * axis.z);  
      tmp.x = (axis.x * direction.x + axis.y * direction.y +   
               axis.z * direction.z);  
      tmp.x = 2.0 * tmp.x;  
      tmp.x = tmp.x / tmp.w;  
      result.x = tmp.x * axis.x - direction.x;  
      result.y = tmp.x * axis.y - direction.y;  
      result.z = tmp.x * axis.z - direction.z;  
  
    A fragment program will fail to load if the w component of the result is  
    enabled in the component write mask (see the <optionalWriteMask> rule in  
    the grammar).  
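
    A small C sketch of the reflection computation follows.  The vec3 type
    and the function names are assumptions made for this illustration, and
    the division is exact here rather than approximated.

```c
/* Hypothetical 3-component vector type used only for this sketch. */
typedef struct { float x, y, z; } vec3;

static float dot3(vec3 a, vec3 b)
{
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

/* Reflects direction about axis: 2 * (a.d / a.a) * a - d.
   The axis need not be normalized. */
static vec3 rfl(vec3 axis, vec3 direction)
{
    float k = 2.0f * dot3(axis, direction) / dot3(axis, axis);
    vec3 r;
    r.x = k * axis.x - direction.x;
    r.y = k * axis.y - direction.y;
    r.z = k * axis.z - direction.z;
    return r;
}
```

    Reflecting (1, 2, 3) about the axis (0, 0, 2) gives (-1, -2, 3):  the
    component along the axis is kept, the perpendicular components are
    negated, and the length is unchanged.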
  
  
    Section 3.11.5.27,  RSQ:  Reciprocal Square Root  
  
    The RSQ instruction approximates the reciprocal of the square root of the  
    scalar operand and replicates it to all four components of the result  
    vector.  
  
      tmp = ScalarLoad(op0);  
      result.x = ApproxRSQRT(tmp);  
      result.y = ApproxRSQRT(tmp);  
      result.z = ApproxRSQRT(tmp);  
      result.w = ApproxRSQRT(tmp);  
  
    The approximation function is accurate to at least 22 bits:  
  
      | ApproxRSQRT(x) - (1/sqrt(x)) | < 1.0 / 2^22, if 1.0 <= x < 4.0.
  
    The following special-case rules apply to reciprocal square roots:  
  
      1. ApproxRSQRT(NaN) = NaN.  
      2. ApproxRSQRT(+INF) = +0.0.  
      3. ApproxRSQRT(-INF) = NaN.  
      4. ApproxRSQRT(+0.0) = +INF.  
      5. ApproxRSQRT(-0.0) = -INF.  
      6. ApproxRSQRT(x) = NaN, if -INF < x < -0.0.  
  
  
    Section 3.11.5.28,  SEQ:  Set on Equal To  
  
    The SEQ instruction performs a component-wise comparison of the two  
    operands.  Each component of the result vector is 1.0 if the corresponding  
    component of the first operand is equal to that of the second, and 0.0  
    otherwise.  
  
      tmp0 = VectorLoad(op0);  
      tmp1 = VectorLoad(op1);  
      result.x = (tmp0.x == tmp1.x) ? 1.0 : 0.0;  
      result.y = (tmp0.y == tmp1.y) ? 1.0 : 0.0;  
      result.z = (tmp0.z == tmp1.z) ? 1.0 : 0.0;  
      result.w = (tmp0.w == tmp1.w) ? 1.0 : 0.0;  
  
    The following special-case rules apply to SEQ:  
  
      1. (<x> == <y>) and (<y> == <x>) always produce the same result.
      2. (NaN == <x>) is FALSE for all <x>, including NaN.
      3. (+INF == +INF) and (-INF == -INF) are TRUE.
      4. (-0.0 == +0.0) and (+0.0 == -0.0) are TRUE.
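
    These rules match the behavior of the == operator on IEEE 754 floats in
    C, so a host-side emulation of one SEQ component can be sketched as
    follows (the function name seq is chosen for this sketch only).

```c
#include <math.h>

/* One component of SEQ: 1.0 if the operands compare equal, else 0.0. */
static float seq(float a, float b)
{
    return (a == b) ? 1.0f : 0.0f;
}
```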
  
  
    Section 3.11.5.29,  SFL:  Set on False  
  
    The SFL instruction is a degenerate case of the other "Set on"  
    instructions that sets all components of the result vector to  
    0.0.  
  
      result.x = 0.0;  
      result.y = 0.0;  
      result.z = 0.0;  
      result.w = 0.0;  
  
  
    Section 3.11.5.30,  SGE:  Set on Greater Than or Equal  
  
    The SGE instruction performs a component-wise comparison of the two  
    operands.  Each component of the result vector is 1.0 if the corresponding  
    component of the first operand is greater than or equal to that of the
    second, and 0.0 otherwise.
  
      tmp0 = VectorLoad(op0);  
      tmp1 = VectorLoad(op1);  
      result.x = (tmp0.x >= tmp1.x) ? 1.0 : 0.0;  
      result.y = (tmp0.y >= tmp1.y) ? 1.0 : 0.0;  
      result.z = (tmp0.z >= tmp1.z) ? 1.0 : 0.0;  
      result.w = (tmp0.w >= tmp1.w) ? 1.0 : 0.0;  
  
    The following special-case rules apply to SGE:  
  
      1. (NaN >= <x>) and (<x> >= NaN) are FALSE for all <x>.  
      2. (+INF >= +INF) and (-INF >= -INF) are TRUE.  
      3. (-0.0 >= +0.0) and (+0.0 >= -0.0) are TRUE.  
  
  
    Section 3.11.5.31,  SGT:  Set on Greater Than  
  
    The SGT instruction performs a component-wise comparison of the two  
    operands.  Each component of the result vector is 1.0 if the corresponding  
    component of the first operand is greater than that of the second, and
    0.0 otherwise.  
  
      tmp0 = VectorLoad(op0);  
      tmp1 = VectorLoad(op1);  
      result.x = (tmp0.x > tmp1.x) ? 1.0 : 0.0;  
      result.y = (tmp0.y > tmp1.y) ? 1.0 : 0.0;  
      result.z = (tmp0.z > tmp1.z) ? 1.0 : 0.0;  
      result.w = (tmp0.w > tmp1.w) ? 1.0 : 0.0;  
  
    The following special-case rules apply to SGT:  
  
      1. (NaN > <x>) and (<x> > NaN) are FALSE for all <x>.  
      2. (-0.0 > +0.0) and (+0.0 > -0.0) are FALSE.  
  
  
    Section 3.11.5.32,  SIN:  Sine  
  
    The SIN instruction approximates the sine of the angle specified by the  
    scalar operand and replicates it to all four components of the result  
    vector.  The angle is specified in radians and does not have to be in the  
    range [0,2*PI].  
  
      tmp = ScalarLoad(op0);  
      result.x = ApproxSine(tmp);  
      result.y = ApproxSine(tmp);  
      result.z = ApproxSine(tmp);  
      result.w = ApproxSine(tmp);  
  
    The approximation function is accurate to at least 22 bits with an angle  
    in the range [0,2*PI].  
  
      | ApproxSine(x) - sin(x) | < 1.0 / 2^22, if 0.0 <= x < 2.0 * PI.  
  
    The error in the approximation will typically increase with the absolute  
    value of the angle when the angle falls outside the range [0,2*PI].  
  
    The following special-case rules apply to sine approximation:
  
      1. ApproxSine(NaN) = NaN.  
      2. ApproxSine(+/-INF) = NaN.  
      3. ApproxSine(+/-0.0) = +/-0.0.  The sign of the result is equal to the  
         sign of the single operand.  
  
  
    Section 3.11.5.33,  SLE:  Set on Less Than or Equal  
  
    The SLE instruction performs a component-wise comparison of the two  
    operands.  Each component of the result vector is 1.0 if the corresponding  
    component of the first operand is less than or equal to that of the  
    second, and 0.0 otherwise.  
  
      tmp0 = VectorLoad(op0);  
      tmp1 = VectorLoad(op1);  
      result.x = (tmp0.x <= tmp1.x) ? 1.0 : 0.0;  
      result.y = (tmp0.y <= tmp1.y) ? 1.0 : 0.0;  
      result.z = (tmp0.z <= tmp1.z) ? 1.0 : 0.0;  
      result.w = (tmp0.w <= tmp1.w) ? 1.0 : 0.0;  
  
    The following special-case rules apply to SLE:  
  
      1. (NaN <= <x>) and (<x> <= NaN) are FALSE for all <x>.  
      2. (+INF <= +INF) and (-INF <= -INF) are TRUE.  
      3. (-0.0 <= +0.0) and (+0.0 <= -0.0) are TRUE.  
  
  
    Section 3.11.5.34,  SLT:  Set on Less Than  
  
    The SLT instruction performs a component-wise comparison of the two  
    operands.  Each component of the result vector is 1.0 if the corresponding  
    component of the first operand is less than that of the second, and 0.0  
    otherwise.  
  
      tmp0 = VectorLoad(op0);  
      tmp1 = VectorLoad(op1);  
      result.x = (tmp0.x < tmp1.x) ? 1.0 : 0.0;  
      result.y = (tmp0.y < tmp1.y) ? 1.0 : 0.0;  
      result.z = (tmp0.z < tmp1.z) ? 1.0 : 0.0;  
      result.w = (tmp0.w < tmp1.w) ? 1.0 : 0.0;  
  
    The following special-case rules apply to SLT:  
  
      1. (NaN < <x>) and (<x> < NaN) are FALSE for all <x>.  
      2. (-0.0 < +0.0) and (+0.0 < -0.0) are FALSE.  
  
  
    Section 3.11.5.35,  SNE:  Set on Not Equal  
  
    The SNE instruction performs a component-wise comparison of the two  
    operands.  Each component of the result vector is 1.0 if the corresponding  
    component of the first operand is not equal to that of the second, and 0.0  
    otherwise.  
  
      tmp0 = VectorLoad(op0);  
      tmp1 = VectorLoad(op1);  
      result.x = (tmp0.x != tmp1.x) ? 1.0 : 0.0;  
      result.y = (tmp0.y != tmp1.y) ? 1.0 : 0.0;  
      result.z = (tmp0.z != tmp1.z) ? 1.0 : 0.0;  
      result.w = (tmp0.w != tmp1.w) ? 1.0 : 0.0;  
  
    The following special-case rules apply to SNE:  
  
      1. (<x> != <y>) and (<y> != <x>) always produce the same result.  
      2. (NaN != <x>) is TRUE for all <x>, including NaN.  
      3. (+INF != +INF) and (-INF != -INF) are FALSE.  
      4. (-0.0 != +0.0) and (+0.0 != -0.0) are FALSE.
  
  
    Section 3.11.5.36,  STR:  Set on True  
  
    The STR instruction is a degenerate case of the other "Set on"  
    instructions that sets all components of the result vector to 1.0.  
  
      result.x = 1.0;  
      result.y = 1.0;  
      result.z = 1.0;  
      result.w = 1.0;  
  
  
    Section 3.11.5.37,  SUB:  Subtract  
  
    The SUB instruction performs a component-wise subtraction of the second  
    operand from the first to yield a result vector.  
  
      tmp0 = VectorLoad(op0);  
      tmp1 = VectorLoad(op1);  
      result.x = tmp0.x - tmp1.x;  
      result.y = tmp0.y - tmp1.y;  
      result.z = tmp0.z - tmp1.z;  
      result.w = tmp0.w - tmp1.w;  
  
    The SUB instruction is completely equivalent to an identical ADD  
    instruction in which the negate operator on the second operand is  
    reversed:  
  
      1. "SUB R0, R1, R2" is equivalent to "ADD R0, R1, -R2".  
      2. "SUB R0, R1, -R2" is equivalent to "ADD R0, R1, R2".  
      3. "SUB R0, R1, |R2|" is equivalent to "ADD R0, R1, -|R2|".  
      4. "SUB R0, R1, -|R2|" is equivalent to "ADD R0, R1, |R2|".  
  
  
    Section 3.11.5.38,  TEX:  Texture Lookup
  
    The TEX instruction performs a filtered texture lookup using the texture  
    target given by <texImageTarget> belonging to the texture image unit given  
    by <texImageUnit>.  <texImageTarget> values of "1D", "2D", "3D", "CUBE",  
    and "RECT" correspond to the texture targets TEXTURE_1D, TEXTURE_2D,  
    TEXTURE_3D, TEXTURE_CUBE_MAP_ARB, and TEXTURE_RECTANGLE_NV, respectively.  
      
    The (s,t,r) texture coordinates used for the lookup are the x, y, and z  
    components of the single operand.  
  
    The texture lookup is performed as specified in Section 3.8.  The LOD  
    calculations in Section 3.8.5 are performed using an implementation  
    dependent method to derive ds/dx, ds/dy, dt/dx, dt/dy, dr/dx, and dr/dy.  
    The mapping of filtered texture components to the components of the result  
    vector is dependent on the base internal format of the texture and is  
    specified in Table X.5.  
  
                                 Result Vector Components  
      Base Internal Format        X      Y      Z      W  
      --------------------      -----  -----  -----  -----  
      ALPHA                      0.0    0.0    0.0    At  
      LUMINANCE                  Lt     Lt     Lt     1.0  
      LUMINANCE_ALPHA            Lt     Lt     Lt     At  
      INTENSITY                  It     It     It     It  
      RGB                        Rt     Gt     Bt     1.0  
      RGBA                       Rt     Gt     Bt     At  
      HILO_NV (signed)           HIt    LOt    HEMI   1.0  
      HILO_NV (unsigned)         HIt    LOt    1.0    1.0  
      DSDT_NV                    DSt    DTt    0.0    1.0  
      DSDT_MAG_NV                DSt    DTt    MAGt   1.0  
      DSDT_MAG_INTENSITY_NV      DSt    DTt    MAGt   It  
      FLOAT_R_NV                 Rt     0.0    0.0    1.0  
      FLOAT_RG_NV                Rt     Gt     0.0    1.0  
      FLOAT_RGB_NV               Rt     Gt     Bt     1.0  
      FLOAT_RGBA_NV              Rt     Gt     Bt     At  
        
      Table X.5:  Mapping of filtered texel components to result vector  
      components for the TEX instruction.  0.0 and 1.0 indicate that the  
      corresponding constant value is written to the result vector.  
      DEPTH_COMPONENT textures are treated as ALPHA, LUMINANCE, or INTENSITY,  
      as specified in the texture's depth texture mode.  
  
      For HILO_NV textures with signed components, "HEMI" is defined as  
      sqrt(MAX(0, 1-(HIt^2+LOt^2))).  
  
    This instruction specifies a particular texture target, ignoring the  
    standard hierarchy of texture enables (TEXTURE_CUBE_MAP_ARB, TEXTURE_3D,  
    TEXTURE_2D, TEXTURE_1D) used to select a texture target in unextended  
    OpenGL.  If the specified texture target has a consistent set of images, a  
    lookup is performed.  Otherwise, the result of the instruction is the  
    vector (0,0,0,0).  
  
    Although this instruction allows the selection of any texture target, a  
    fragment program cannot use more than one texture target for any given  
    texture image unit.  
        
  
    Section 3.11.5.39,  TXD: Texture Lookup with Derivatives  
  
    The TXD instruction performs a filtered texture lookup using the texture  
    target given by <texImageTarget> belonging to the texture image unit given  
    by <texImageUnit>.  <texImageTarget> values of "1D", "2D", "3D", "CUBE",  
    and "RECT" correspond to the texture targets TEXTURE_1D, TEXTURE_2D,  
    TEXTURE_3D, TEXTURE_CUBE_MAP_ARB, and TEXTURE_RECTANGLE_NV, respectively.  
      
    The (s,t,r) texture coordinates used for the lookup are the x, y, and z  
    components of the first operand.  The partial derivatives in the X  
    direction (ds/dx, dt/dx, dr/dx) are specified by the x, y, and z  
    components of the second operand.  The partial derivatives in the Y  
    direction (ds/dy, dt/dy, dr/dy) are specified by the x, y, and z  
    components of the third operand.  
  
    The texture lookup is performed as specified in Section 3.8.  The LOD  
    calculations in Section 3.8.5 are performed using the specified partial  
    derivatives.  The mapping of filtered texture components to the components  
    of the result vector is dependent on the base internal format of the  
    texture and is specified in Table X.5.  
  
    This instruction specifies a particular texture target, ignoring the  
    standard hierarchy of texture enables (TEXTURE_CUBE_MAP_ARB, TEXTURE_3D,  
    TEXTURE_2D, TEXTURE_1D) used to select a texture target in unextended  
    OpenGL.  If the specified texture target has a consistent set of images, a  
    lookup is performed.  Otherwise, the result of the instruction is the  
    vector (0,0,0,0).  
        
    Although this instruction allows the selection of any texture target, a  
    fragment program cannot use more than one texture target for any given  
    texture image unit.  
        
  
    Section 3.11.5.40,  TXP: Projective Texture Lookup  
  
    The TXP instruction performs a filtered texture lookup using the texture  
    target given by <texImageTarget> belonging to the texture image unit given  
    by <texImageUnit>.  <texImageTarget> values of "1D", "2D", "3D", "CUBE",  
    and "RECT" correspond to the texture targets TEXTURE_1D, TEXTURE_2D,  
    TEXTURE_3D, TEXTURE_CUBE_MAP_ARB, and TEXTURE_RECTANGLE_NV, respectively.  
  
    For cube map textures, the (s,t,r) texture coordinates used for the lookup  
    are given by x, y, and z, respectively.  For all other textures, the  
    (s,t,r) texture coordinates used for the lookup are given by x/w, y/w, and  
    z/w, respectively, where x, y, z, and w are the corresponding components  
    of the operand.  
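
    A non-normative C sketch of the coordinate selection follows.  The type
    TexCoord and the function txp_coords are hypothetical names introduced
    only for illustration; the behavior when w is zero is not addressed here.

```c
typedef struct { float s, t, r; } TexCoord;   /* illustrative type */

/* Derive the (s,t,r) lookup coordinates for TXP from the operand
 * components.  Cube map lookups use x, y, z unchanged; every other
 * target divides by w. */
static TexCoord txp_coords(float x, float y, float z, float w, int is_cube)
{
    TexCoord c;
    if (is_cube) {
        c.s = x;
        c.t = y;
        c.r = z;
    } else {
        c.s = x / w;
        c.t = y / w;
        c.r = z / w;
    }
    return c;
}
```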
  
    The texture lookup is performed as specified in Section 3.8.  The LOD  
    calculations in Section 3.8.5 are performed using an implementation  
    dependent method to derive ds/dx, ds/dy, dt/dx, dt/dy, dr/dx, and dr/dy.  
    The mapping of filtered texture components to the components of the result  
    vector is dependent on the base internal format of the texture and is  
    specified in Table X.5.  
  
    This instruction specifies a particular texture target, ignoring the  
    standard hierarchy of texture enables (TEXTURE_CUBE_MAP_ARB, TEXTURE_3D,  
    TEXTURE_2D, TEXTURE_1D) used to select a texture target in unextended  
    OpenGL.  If the specified texture target has a consistent set of images, a  
    lookup is performed.  Otherwise, the result of the instruction is the  
    vector (0,0,0,0).  
        
    Although this instruction allows the selection of any texture target, a  
    fragment program cannot use more than one texture target for any given  
    texture image unit.  
        
  
    Section 3.11.5.41,  UP2H:  Unpack Two 16-Bit Floats  
  
    The UP2H instruction unpacks two 16-bit floats stored together in a 32-bit  
    scalar operand.  The first 16-bit float (stored in the 16 least  
    significant bits) is written into the "x" and "z" components of the result  
    vector; the second is written into the "y" and "w" components of the  
    result vector.  
  
    This operation undoes the type conversion and packing performed by the  
    PK2H instruction.  
  
      tmp = ScalarLoad(op0);  
      result.x = (fp16) (RawBits(tmp) & 0xFFFF);  
      result.y = (fp16) ((RawBits(tmp) >> 16) & 0xFFFF);  
      result.z = (fp16) (RawBits(tmp) & 0xFFFF);  
      result.w = (fp16) ((RawBits(tmp) >> 16) & 0xFFFF);  
      
    Since the source operand must be a 32-bit scalar, a fragment program will  
    fail to load if the operand is not obtained from a register with 32-bit  
    components or from a program parameter.  
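
    The unpacking can be sketched in C as shown below.  This is a
    non-normative illustration that assumes the fp16 encoding uses 1 sign
    bit, 5 exponent bits (bias 15), and 10 mantissa bits.  The argument raw
    stands for RawBits(tmp); Vec4, half_to_float, and up2h are hypothetical
    names.

```c
#include <math.h>
#include <stdint.h>

typedef struct { float x, y, z, w; } Vec4;   /* illustrative result vector */

/* Decode one 16-bit float (assumed layout: 1 sign, 5 exponent, 10 mantissa). */
static float half_to_float(uint16_t h)
{
    int sign = (h >> 15) & 1;
    int exp  = (h >> 10) & 0x1F;
    int man  = h & 0x3FF;
    float v;

    if (exp == 0)
        v = ldexpf((float)man, -24);                 /* zero or denormal */
    else if (exp == 31)
        v = man ? NAN : INFINITY;                    /* NaN or infinity */
    else
        v = ldexpf((float)(man | 0x400), exp - 25);  /* normal value */
    return sign ? -v : v;
}

/* UP2H: the low half goes to x and z, the high half to y and w. */
static Vec4 up2h(uint32_t raw)
{
    Vec4 r;
    r.x = r.z = half_to_float((uint16_t)(raw & 0xFFFFu));
    r.y = r.w = half_to_float((uint16_t)(raw >> 16));
    return r;
}
```

    For example, a raw value of 0xC0003C00 unpacks to (1.0, -2.0, 1.0, -2.0)
    under the assumed encoding.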
  
  
    Section 3.11.5.42,  UP2US:  Unpack Two Unsigned 16-Bit Scalars  
  
    The UP2US instruction unpacks two 16-bit unsigned values packed together  
    in a 32-bit scalar operand.  The unsigned quantities are encoded where a  
    bit pattern of all '0' bits corresponds to 0.0 and a pattern of all '1'  
    bits corresponds to 1.0.  The "x" and "z" components of the result vector  
    are obtained from the 16 least significant bits of the operand; the "y"  
    and "w" components are obtained from the 16 most significant bits.  
  
    This operation undoes the type conversion and packing performed by the  
    PK2US instruction.  
  
      tmp = ScalarLoad(op0);  
      result.x = ((RawBits(tmp) >> 0)  & 0xFFFF) / 65535.0;  
      result.y = ((RawBits(tmp) >> 16) & 0xFFFF) / 65535.0;  
      result.z = ((RawBits(tmp) >> 0)  & 0xFFFF) / 65535.0;  
      result.w = ((RawBits(tmp) >> 16) & 0xFFFF) / 65535.0;  
  
    Since the source operand must be a 32-bit scalar, a fragment program will  
    fail to load if the operand is not obtained from a register with 32-bit  
    components or from a program parameter.  
  
  
    Section 3.11.5.43,  UP4B:  Unpack Four Signed 8-Bit Values  
  
    The UP4B instruction unpacks four 8-bit signed values packed together in a  
    32-bit scalar operand.  The signed quantities are encoded where a bit  
    pattern of all '0' bits corresponds to -128/127 and a pattern of all '1'  
    bits corresponds to +127/127.  The "x" component of the result vector is  
    the converted value corresponding to the 8 least significant bits of the  
    operand; the "w" component corresponds to the 8 most significant bits.  
  
    This operation undoes the type conversion and packing performed by the  
    PK4B instruction.  
  
      tmp = ScalarLoad(op0);  
      result.x = (((RawBits(tmp) >> 0) & 0xFF) - 128) / 127.0;  
      result.y = (((RawBits(tmp) >> 8) & 0xFF) - 128) / 127.0;  
      result.z = (((RawBits(tmp) >> 16) & 0xFF) - 128) / 127.0;  
      result.w = (((RawBits(tmp) >> 24) & 0xFF) - 128) / 127.0;  
  
    Since the source operand must be a 32-bit scalar, a fragment program will  
    fail to load if the operand is not obtained from a register with 32-bit  
    components or from a program parameter.  
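
    The biased encoding can be sketched in C as shown below (non-normative).
    The argument raw stands for RawBits(tmp); Vec4, signed_byte_to_float, and
    up4b are hypothetical names.  Note that a byte value of 0x80 decodes to
    exactly 0.0, while a byte of all '0' bits decodes to a value slightly
    below -1.0.

```c
#include <stdint.h>

typedef struct { float x, y, z, w; } Vec4;   /* illustrative result vector */

/* Decode the byte starting at bit position 'shift' using the
 * (value - 128) / 127 mapping. */
static float signed_byte_to_float(uint32_t raw, int shift)
{
    int biased = (int)((raw >> shift) & 0xFFu);   /* 0..255 */
    return (float)(biased - 128) / 127.0f;
}

/* UP4B: x comes from the least significant byte, w from the most. */
static Vec4 up4b(uint32_t raw)
{
    Vec4 r;
    r.x = signed_byte_to_float(raw, 0);
    r.y = signed_byte_to_float(raw, 8);
    r.z = signed_byte_to_float(raw, 16);
    r.w = signed_byte_to_float(raw, 24);
    return r;
}
```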
  
  
    Section 3.11.5.44,  UP4UB:  Unpack Four Unsigned 8-Bit Scalars  
  
    The UP4UB instruction unpacks four 8-bit unsigned values packed together  
    in a 32-bit scalar operand.  The unsigned quantities are encoded where a  
    bit pattern of all '0' bits corresponds to 0.0 and a pattern of all '1'  
    bits corresponds to 1.0.  The "x" component of the result vector is  
    obtained from the 8 least significant bits of the operand; the "w"  
    component is obtained from the 8 most significant bits.  
  
    This operation undoes the type conversion and packing performed by the  
    PK4UB instruction.  
  
      tmp = ScalarLoad(op0);  
      result.x = ((RawBits(tmp) >> 0)  & 0xFF) / 255.0;  
      result.y = ((RawBits(tmp) >> 8)  & 0xFF) / 255.0;  
      result.z = ((RawBits(tmp) >> 16) & 0xFF) / 255.0;  
      result.w = ((RawBits(tmp) >> 24) & 0xFF) / 255.0;  
  
    Since the source operand must be a 32-bit scalar, a fragment program will  
    fail to load if the operand is not obtained from a register with 32-bit  
    components or from a program parameter.  
  
  
    Section 3.11.5.45,  X2D:  2D Coordinate Transformation  
  
    The X2D instruction multiplies the 2D offset vector specified by the "x"  
    and "y" components of the second vector operand by the 2x2 matrix  
    specified by the four components of the third vector operand, and adds the  
    transformed offset vector to the 2D vector specified by the "x" and "y"  
    components of the first vector operand.  The first component of the sum is  
    written to the "x" and "z" components of the result; the second component  
    is written to the "y" and "w" components of the result.  
  
    The X2D instruction can be used to displace texture coordinates in the  
    same manner as the OFFSET_TEXTURE_2D_NV mode in the GL_NV_texture_shader  
    extension.  
  
      tmp0 = VectorLoad(op0);  
      tmp1 = VectorLoad(op1);  
      tmp2 = VectorLoad(op2);  
      result.x = tmp0.x + tmp1.x * tmp2.x + tmp1.y * tmp2.y;  
      result.y = tmp0.y + tmp1.x * tmp2.z + tmp1.y * tmp2.w;  
      result.z = tmp0.x + tmp1.x * tmp2.x + tmp1.y * tmp2.y;  
      result.w = tmp0.y + tmp1.x * tmp2.z + tmp1.y * tmp2.w;  
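
    A non-normative C rendering of the same computation, with a small worked
    example, follows.  Vec4, vec4, and x2d are hypothetical names used only
    for this illustration.

```c
typedef struct { float x, y, z, w; } Vec4;   /* illustrative vector type */

static Vec4 vec4(float x, float y, float z, float w)
{
    Vec4 v = { x, y, z, w };
    return v;
}

/* X2D: op0.xy + M * op1.xy, where the 2x2 matrix M is stored row by row
 * in op2 as (x y / z w).  The sum is replicated into (x,z) and (y,w). */
static Vec4 x2d(Vec4 op0, Vec4 op1, Vec4 op2)
{
    Vec4 r;
    r.x = r.z = op0.x + op1.x * op2.x + op1.y * op2.y;
    r.y = r.w = op0.y + op1.x * op2.z + op1.y * op2.w;
    return r;
}
```

    With op0 = (1,2,0,0), op1 = (3,4,0,0), and op2 = (0,1,-1,0), the result
    is (5,-1,5,-1).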
  
  
    Section 3.11.6, Fragment Program Outputs  
  
    Upon completion of fragment program execution, the output registers are  
    used to replace the fragment's associated data.  
  
    The RGBA color of the fragment is taken from the color output register  
    used by the program (COLR or COLH).  The R, G, B, and A color components  
    are extracted from the "x", "y", "z", and "w" components, respectively, of  
    the output register and are clamped to the range [0,1].  
  
    If the DEPR output register is written by the fragment program, the depth  
    value of the fragment is taken from the z component of the DEPR output  
    register.  If depth clamping is enabled, the depth value is clamped to the  
    range [min(n,f), max(n,f)], where n and f are the near and far depth range  
    values.  If depth clamping is disabled, the fragment is discarded if its  
    depth value is outside the range [min(n,f), max(n,f)].  
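
    The depth handling described above can be sketched as follows.  This is a
    non-normative illustration; DepthResult and resolve_depth are
    hypothetical names, and z stands for the z component of DEPR.

```c
typedef struct {
    int   discarded;   /* non-zero if the fragment is discarded */
    float depth;       /* final depth value when not discarded */
} DepthResult;

/* Apply the depth range rule to a program-written depth value.
 * n and f are the near and far depth range values. */
static DepthResult resolve_depth(float z, float n, float f, int depth_clamp)
{
    float lo = n < f ? n : f;   /* min(n,f) */
    float hi = n < f ? f : n;   /* max(n,f) */
    DepthResult r = { 0, z };

    if (z < lo || z > hi) {
        if (depth_clamp)
            r.depth = z < lo ? lo : hi;   /* clamp into [lo, hi] */
        else
            r.discarded = 1;              /* out of range: discard */
    }
    return r;
}
```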
  
  
    Section 3.11.7, Required Fragment Program State  
  
    The state required for managing fragment programs consists of:  
  
      a bit indicating whether or not fragment program mode is enabled;  
  
      an unsigned integer naming the currently bound fragment program;  
  
      and the state that must be maintained to indicate which integers are  
      currently in use as fragment program names.  
  
    Fragment program mode is initially disabled.  The initial state of all 128  
    fragment program parameter registers is (0,0,0,0).  The initial currently  
    bound fragment program is zero.  
  
    Each fragment program object consists of:  
  
      an enumerant giving the program target (FRAGMENT_PROGRAM_NV);  
  
      a boolean indicating whether the program is resident;  
  
      an array of type ubyte containing the program string;  
  
      an integer representing the length of the program string array;  
  
      one four-component floating-point vector for each named local  
      parameter in the program;  
  
      and a set of MAX_FRAGMENT_PROGRAM_LOCAL_PARAMETERS_NV four-component  
      floating-point vectors to hold numbered local parameters, each initially  
      set to (0,0,0,0).  
  
    Initially, no program objects exist.  
  
    Additionally, the state required during the execution of a fragment  
    program consists of:  twelve 4-component floating-point fragment attribute  
    registers, thirty-two 128-bit physical temporary registers, and a single  
    4-component condition code, whose components have one of four values (LT,  
    EQ, GT, or UN).  
  
    Each time a fragment program is executed, the fragment attribute registers  
    are initialized with the fragment's location and associated data, all  
    temporary register components are initialized to zero, and all condition  
    code components are initialized to EQ.  
  
  
    Renumber Section 3.11 to Section 3.12, Antialiasing Application (p.140).  
    No changes to the text of the section.

Additions to Chapter 4 of the OpenGL 1.2.1 Specification (Per-Fragment Operations and the Framebuffer)

  
    None

Additions to Chapter 5 of the OpenGL 1.2.1 Specification (Special Functions)

  
    Add new section 5.7, Programs (after "Flush and Finish")  
  
    Programs are specified as an array of ubytes used to control the operation  
    of portions of the GL.  The array is a string of ASCII characters encoding  
    the program.  
  
    The command  
  
      LoadProgramNV(enum target, uint id, sizei len, const ubyte *program);  
  
    loads a program.  The target parameter specifies the type of program  
    loaded and can be VERTEX_PROGRAM_NV, VERTEX_STATE_PROGRAM_NV, or  
    FRAGMENT_PROGRAM_NV.  VERTEX_PROGRAM_NV specifies a program to be executed  
    in vertex program mode as each vertex is specified.  VERTEX_STATE_PROGRAM_NV  
    specifies a program to be run manually to update vertex state.  
    FRAGMENT_PROGRAM_NV specifies a program to be executed in fragment program  
    mode as each fragment is rasterized.  
  
    Multiple programs can be loaded with different names.  id names the  
    program to load.  The name space for programs is the set of positive  
    integers (zero is reserved).  The error INVALID_VALUE is generated by  
    LoadProgramNV if a program is loaded with an id of zero.  The error  
    INVALID_OPERATION is generated by LoadProgramNV if a program is loaded  
    for an id that is currently loaded with a program of a different program  
    target.  program is a pointer to an array of ubytes that represents the  
    program being loaded.  The length of the array in ubytes is indicated by  
    len.  
  
    At program load time, the program is parsed into a set of tokens possibly  
    separated by white space.  Spaces, tabs, newlines, carriage returns, and  
    comments are considered whitespace.  Comments begin with the character "#"  
    and are terminated by a newline, a carriage return, or the end of the  
    program array.  Tokens are processed in a case-sensitive manner:  upper  
    and lower-case letters are not considered equivalent.  
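
    The whitespace and comment rules can be illustrated with the following
    non-normative C sketch, which advances past anything that separates
    tokens.  The function name skip_whitespace is hypothetical; an actual
    implementation may structure its scanner differently.

```c
#include <stddef.h>

/* Return the index of the next character at or after 'pos' that is not
 * whitespace or part of a comment, or 'len' if none remains. */
static size_t skip_whitespace(const unsigned char *program, size_t len,
                              size_t pos)
{
    while (pos < len) {
        unsigned char c = program[pos];
        if (c == ' ' || c == '\t' || c == '\n' || c == '\r') {
            pos++;
        } else if (c == '#') {
            /* A comment runs to a newline, a carriage return, or the end
             * of the program array. */
            while (pos < len && program[pos] != '\n' && program[pos] != '\r')
                pos++;
        } else {
            break;   /* start of a token */
        }
    }
    return pos;
}
```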
  
    Each program target has a corresponding Backus-Naur Form (BNF) grammar  
    specifying the syntactically valid sequences for programs of the specified  
    type.  The set of valid tokens can be inferred from the grammar.  The  
    token "" represents an empty string and is used to indicate optional  
    rules.  A program is invalid if it contains any undefined tokens or  
    characters.  
  
    The error INVALID_OPERATION is generated by LoadProgramNV if a program  
    fails to load because it is not syntactically correct or fails to satisfy  
    all of the semantic restrictions corresponding to the program target.  
  
    A successfully loaded program is parsed into a sequence of instructions.  
    Each instruction is identified by its tokenized name.  The operation of  
    these instructions is specific to the program target and is defined  
    elsewhere.  
  
    A successfully loaded program replaces the program previously assigned to  
    the name specified by id.  If the OUT_OF_MEMORY error is generated by  
    LoadProgramNV, no change is made to the previous contents of the named  
    program.  
  
    Querying the value of PROGRAM_ERROR_POSITION_NV returns a ubyte offset  
    into the program string most recently passed to LoadProgramNV indicating  
    the position of the first error, if any, in the program.  If the program  
    fails to load because of a semantic restriction that cannot be determined  
    until the program is fully scanned, the error position will be len, the  
    length of the program.  If the program loads successfully, the value of  
    PROGRAM_ERROR_POSITION_NV is assigned the value negative one.  
  
    For targets whose programs are executed automatically (e.g., vertex and  
    fragment programs), there must be a current program.  The current vertex  
    program is executed automatically in vertex program mode as vertices are  
    specified.  The current fragment program is executed automatically in  
    fragment program mode as fragments are generated by rasterization.  
    Current programs for a program target are updated by  
  
      BindProgramNV(enum target, uint id);  
  
    where target must be VERTEX_PROGRAM_NV or FRAGMENT_PROGRAM_NV.  The error  
    INVALID_OPERATION is generated by BindProgramNV if id names a program that  
    has a type different than target (for example, if id names a vertex state  
    program as described in section 2.14.4).  
  
    Binding to a nonexistent program id does not generate an error.  In  
    particular, binding to program id zero does not generate an error.  
    However, because program zero cannot be loaded, program zero is always  
    nonexistent.  If a program id is successfully loaded with a new vertex  
    program and id is also the currently bound vertex program, the new program  
    is considered the currently bound vertex program.  
  
    The INVALID_OPERATION error is generated when both vertex program mode is  
    enabled and Begin is called (or when a command that performs an implicit  
    Begin is called) if the current vertex program is nonexistent or not  
    valid.  A vertex program may not be valid for reasons explained in section  
    2.14.5.  
  
    The INVALID_OPERATION error is generated when both fragment program mode  
    is enabled and Begin, another GL command that performs an implicit Begin,  
    or any other GL command that generates fragments is called, if the current  
    fragment program is nonexistent or not valid.  A fragment program may be  
    invalid for reasons explained in Section 3.11.3.  
  
    Programs are deleted by calling  
  
      void DeleteProgramsNV(sizei n, const uint *ids);  
  
    ids contains n names of programs to be deleted.  After a program is  
    deleted, it becomes nonexistent, and its name is again unused.  If a  
    program that is currently bound is deleted, it is as though BindProgramNV  
    has been executed with the same target as the deleted program and program  
    zero.  Unused names in ids are silently ignored, as is the value zero.  
  
    The command  
  
      void GenProgramsNV(sizei n, uint *ids);  
  
    returns n currently unused program names in ids.  These names are marked  
    as used, for the purposes of GenProgramsNV only, but they become existent  
    programs only when they are first loaded using LoadProgramNV.  
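
    The life cycle of a program name (unused, generated, existent) can be
    modeled with the toy C sketch below.  It is non-normative and does not
    call the GL; every identifier in it is hypothetical.  The choice to hand
    out the lowest unused name and the fixed table size are assumptions of
    the model, not requirements of this extension.

```c
enum { MODEL_MAX_NAMES = 16 };   /* arbitrary limit for the sketch */
enum NameState { NAME_UNUSED, NAME_GENERATED, NAME_EXISTENT };

/* Index 0 is never used because zero is a reserved name. */
static enum NameState model_names[MODEL_MAX_NAMES + 1];

/* Models GenProgramsNV with n == 1: marks an unused name as used and
 * returns it, or returns 0 if the toy table is full. */
static unsigned model_gen(void)
{
    unsigned id;
    for (id = 1; id <= MODEL_MAX_NAMES; id++) {
        if (model_names[id] == NAME_UNUSED) {
            model_names[id] = NAME_GENERATED;
            return id;
        }
    }
    return 0;
}

/* Models a successful LoadProgramNV: the name becomes existent.
 * Returns -1 for id zero (INVALID_VALUE) or an id outside the toy table. */
static int model_load(unsigned id)
{
    if (id == 0 || id > MODEL_MAX_NAMES)
        return -1;
    model_names[id] = NAME_EXISTENT;
    return 0;
}

/* Models DeleteProgramsNV for one name: the name becomes unused again.
 * Zero and out-of-range names are ignored. */
static void model_delete(unsigned id)
{
    if (id != 0 && id <= MODEL_MAX_NAMES)
        model_names[id] = NAME_UNUSED;
}

/* Models IsProgramNV: only loaded names are program objects. */
static int model_is_program(unsigned id)
{
    return id != 0 && id <= MODEL_MAX_NAMES &&
           model_names[id] == NAME_EXISTENT;
}
```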
  
    An implementation may choose to establish a working set of programs on  
    which binding and/or manual execution are performed with higher  
    performance.  A program that is currently part of this working set is said  
    to be resident.  
  
    The command  
        
      boolean AreProgramsResidentNV(sizei n, const uint *ids,  
                                    boolean *residences);  
  
    returns TRUE if all of the n programs named in ids are resident, or if the  
    implementation does not distinguish a working set.  If at least one of the  
    programs named in ids is not resident, then FALSE is returned, and the  
    residence of each program is returned in residences.  Otherwise the  
    contents of residences are not changed.  If any of the names in ids are  
    nonexistent or zero, FALSE is returned, the error INVALID_VALUE is  
    generated, and the contents of residences are indeterminate.  The  
    residence status of a single named program can also be queried by calling  
    GetProgramivNV (Section 6.1.13) with id set to the name of the program and  
    pname set to PROGRAM_RESIDENT_NV.  
  
    AreProgramsResidentNV indicates only whether a program is currently  
    resident, not whether it could be made resident.  An implementation  
    may choose to make a program resident only on first use, for example.  The  
    client may guide the GL implementation in determining which programs  
    should be resident by requesting a set of programs to make resident.  
  
    The command  
  
      void RequestResidentProgramsNV(sizei n, const uint *ids);  
  
    requests that the n programs named in ids should be made resident.  
    While all the programs are not guaranteed to become resident,  
    the implementation should make a best effort to make as many of  
    the programs resident as possible.  As a result of making the  
    requested programs resident, program names not among the requested  
    programs may become non-resident.  Higher priority for residency  
    should be given to programs listed earlier in the ids array.  
    RequestResidentProgramsNV silently ignores attempts to make resident  
    nonexistent program names or zero.  AreProgramsResidentNV can be  
    called after RequestResidentProgramsNV to determine which programs  
    actually became resident.  
  
    The commands  
  
      void ProgramNamedParameter4fNV(uint id, sizei len, const ubyte *name,  
                                     float x, float y, float z, float w);  
      void ProgramNamedParameter4dNV(uint id, sizei len, const ubyte *name,  
                                     double x, double y, double z, double w);  
      void ProgramNamedParameter4fvNV(uint id, sizei len, const ubyte *name,  
                                      const float v[]);  
      void ProgramNamedParameter4dvNV(uint id, sizei len, const ubyte *name,  
                                      const double v[]);  
  
    specify a new value for the named program local parameter <name> belonging  
    to the fragment program specified by <id>.  <name> is a pointer to an  
    array of ubytes holding the parameter name.  <len> specifies the number of  
    ubytes in the array given by <name>.  The new x, y, z, and w components of  
    the named local parameter are given by x, y, z, and w, respectively, for  
    ProgramNamedParameter4fNV and ProgramNamedParameter4dNV, and by v[0],  
    v[1], v[2], and v[3], respectively, for ProgramNamedParameter4fvNV and  
    ProgramNamedParameter4dvNV.  The error INVALID_OPERATION is generated if  
    <id> specifies a nonexistent program or a program whose type does not  
    support named local parameters.  The error INVALID_VALUE is generated  
    if <name> does not specify the name of a local parameter in the program  
    corresponding to <id>.  The error INVALID_VALUE is also generated if <len>  
    is zero.  
  
    The commands  
  
      void ProgramLocalParameter4fARB(enum target, uint index,  
                                      float x, float y, float z, float w);  
      void ProgramLocalParameter4fvARB(enum target, uint index,   
                                       const float *params);  
      void ProgramLocalParameter4dARB(enum target, uint index,  
                                      double x, double y, double z, double w);  
      void ProgramLocalParameter4dvARB(enum target, uint index,   
                                       const double *params);  
  
    update the values of the numbered program local parameter <index>  
    belonging to the program object currently bound to <target>.  For  
    ProgramLocalParameter4fARB and ProgramLocalParameter4dARB, the four  
    components of the parameter are updated with the values of <x>, <y>, <z>,  
    and <w>, respectively.  For ProgramLocalParameter4fvARB and  
    ProgramLocalParameter4dvARB, the four components of the parameter are  
    updated with the array of four values pointed to by <params>.  The error  
    INVALID_VALUE is generated if <index> is greater than or equal to the  
    number of numbered program local parameters supported by <target>.

Additions to Chapter 6 of the OpenGL 1.2.1 Specification (State and State Requests)

  
    Modify Section 6.1.11, Pointer and String Queries (p. 206)  
  
    (modify last paragraph, p. 206) ... The possible values for <name> are  
    VENDOR, RENDERER, VERSION, EXTENSIONS, and PROGRAM_ERROR_STRING_NV.  
  
    (add after last paragraph of section, p. 207) Queries of  
    PROGRAM_ERROR_STRING_NV return a pointer to an implementation-dependent  
    program load error string.  If the last call to LoadProgramNV failed to  
    load a program, the returned string describes a reason that the program  
    failed to load.  Otherwise, a pointer to an empty string (containing only  
    a terminator) is returned.  
  
    Rename and modify Section 6.1.13, Vertex and Fragment Program Queries  
    (from GL_NV_fragment_program).  Portions of this section pertaining to  
    fragment programs are copied verbatim.  
  
    (insert after discussion of GetProgramParameter[fd]vNV)  
  
    The commands  
  
      void GetProgramNamedParameterfvNV(uint id, sizei len,  
                                        const ubyte *name, float *params);  
      void GetProgramNamedParameterdvNV(uint id, sizei len,  
                                        const ubyte *name, double *params);  
  
    obtain the current program named local parameter value for the parameter  
    named <name> belonging to the program given by <id>.  <name> is a pointer  
    to an array of ubytes holding the parameter name.  <len> specifies the  
    number of ubytes in the array given by <name>.  The error  
    INVALID_OPERATION is generated if <id> specifies a nonexistent program or  
    a program whose type does not support named local parameters.  The error  
    INVALID_VALUE is generated if <name> does not specify the name of a local  
    parameter in the program corresponding to <id>.  The error INVALID_VALUE  
    is also generated if <len> is zero.  Each named program local parameter is  
    an array of four values.  
  
    The commands  
  
      void GetProgramLocalParameterdvARB(enum target, uint index,  
                                         double *params);  
      void GetProgramLocalParameterfvARB(enum target, uint index,  
                                         float *params);  
  
    obtain the current value for the numbered program local parameter <index>  
    belonging to the program object currently bound to <target>, and place  
    the information in the array <params>.  The error INVALID_ENUM is  
    generated if <target> specifies a nonexistent program target or a program  
    target that does not support numbered program local parameters.  The error  
    INVALID_VALUE is generated if <index> is greater than or equal to the  
    implementation-dependent number of supported numbered program local  
    parameters for the program target.  
  
    When the program target type is FRAGMENT_PROGRAM_NV, each numbered program  
    local parameter returned is an array of four values.  ...  
  
    The command  
  
      void GetProgramivNV(uint id, enum pname, int *params);  
  
    obtains program state named by pname for the program named id in the array  
    params.  pname must be one of PROGRAM_TARGET_NV, PROGRAM_LENGTH_NV, or  
    PROGRAM_RESIDENT_NV.  The error INVALID_OPERATION is generated if the  
    program named id does not exist.  
  
    The command  
  
      void GetProgramStringNV(uint id, enum pname,  
                              ubyte *program);  
  
    obtains the program string for program id.  pname must be  
    PROGRAM_STRING_NV.  n ubytes are returned into the array program  
    where n is the length of the program in ubytes.  GetProgramivNV with  
    PROGRAM_LENGTH_NV can be used to query the length of a program's  
    string.  The INVALID_OPERATION error is generated if the program  
    named id does not exist.  
  
    ...  
  
    The command  
  
      boolean IsProgramNV(uint id);  
  
    returns TRUE if id is the name of a program object.  If id is zero  
    or is a non-zero value that is not the name of a program object, or  
    if an error condition occurs, IsProgramNV returns FALSE.  A name  
    returned by GenProgramsNV but not yet loaded with a program is not  
    the name of a program object.

Additions to Appendix F of the OpenGL 1.2.1 Specification (ARB Extensions)

  
    Modify Section F.2.3 (Changes to Section 2.6), p.240  
   
    (modify last paragraph on p.240) ... Multiple sets of texture coordinates  
    may be used to specify how multiple texture images are mapped onto a  
    primitive.  The number of texture coordinate sets supported is  
    implementation dependent, but must be at least 1.  The number of texture  
    coordinate sets supported may be queried with the state  
    MAX_TEXTURE_COORDS_NV.  
  
    Modify Section F.2.4 (Changes to Section 2.7), p.241  
  
    (modify the last paragraph on p.241, carrying over to p.243)  
    Implementations may support more than one set of texture coordinates.  The  
    commands  
  
        void MultiTexCoord{1234}{sifd}ARB(enum texture, T coords)  
        void MultiTexCoord{1234}{sifd}vARB(enum texture, T coords)  
  
    take the coordinate set to be modified as the <texture> parameter.  
    <texture> is a symbolic constant of the form TEXTUREi_ARB, indicating that  
    texture coordinate set i is to be modified.  The constants obey  
    TEXTUREi_ARB = TEXTURE0_ARB + i (i is in the range 0 to k-1, where k is  
    the implementation dependent number of texture coordinate sets defined by  
    MAX_TEXTURE_COORDS_NV).  
  
  
    Modify Section F.2.5 (Changes to Section 2.8), p.243  
  
    (modify first and second paragraphs of section) ... The client may specify  
    up to 5 plus the value of MAX_TEXTURE_COORDS_NV arrays; one each to store  
    vertex coordinates...  
  
    In implementations which support more than one texture coordinate set, the  
    command  
  
        void ClientActiveTextureARB(enum texture)  
  
    is used to select the vertex array client state parameters to be modified  
    by the TexCoordPointer command and the array affected by EnableClientState  
    and DisableClientState with the parameter TEXTURE_COORD_ARRAY.  This  
    command sets the state variable CLIENT_ACTIVE_TEXTURE_ARB.  Each texture  
    coordinate set has a client state vector which is selected when this  
    command is invoked.  This state vector also includes the vertex array  
    state.  This command also selects the texture coordinate set state used  
    for queries of client state.  
  
    (modify first paragraph on p.244) If the number of supported texture  
    coordinate sets (the value of MAX_TEXTURE_COORDS_NV) is k, ...  
  
  
    Modify Section F.2.6 (Changes to Section 2.10.2), p.244  
  
    (modify first paragraph)  For each texture coordinate set, a 4x4 matrix is  
    applied to the corresponding texture coordinates...  
  
    (replace second and third paragraphs) The command  
  
      void ActiveTextureARB(enum texture);  
  
    specifies the active texture unit selector, ACTIVE_TEXTURE_ARB.  Each  
    texture unit contains up to two distinct sub-units:  a texture coordinate  
    processing unit (consisting of a texture matrix stack and texture  
    coordinate generation state) and a texture image unit (consisting of all  
    the texture state defined in Section 3.8).  In implementations with a  
    different number of supported texture coordinate sets and texture image  
    units, some texture units may consist of only one of the two sub-units.  
  
    The active texture unit selector specifies the texture unit accessed by  
    commands involving texture coordinate processing.  Such commands include  
    those accessing the current matrix stack (if MATRIX_MODE is TEXTURE),  
    TexGen (Section 2.10.4), Enable/Disable (if any texture coordinate  
    generation enum is selected), as well as queries of the current texture  
    coordinates and current raster texture coordinates.  If the texture unit  
    number corresponding to the current value of ACTIVE_TEXTURE_ARB is greater  
    than or equal to the implementation dependent constant  
    MAX_TEXTURE_COORDS_NV, the error INVALID_OPERATION is generated by any  
    such command.  
  
    The active texture unit selector also selects the texture unit accessed by  
    commands involving texture image processing (Section 3.8).  Such commands  
    include all variants of TexEnv, TexParameter, and TexImage commands,  
    BindTexture, Enable/Disable for any texture target (e.g., TEXTURE_2D), and  
    queries of all such state.  If the texture unit number corresponding to  
    the current value of ACTIVE_TEXTURE_ARB is greater than or equal to the  
    implementation dependent constant MAX_TEXTURE_IMAGE_UNITS_NV, the error  
    INVALID_OPERATION is generated by any such command.  
  
    ActiveTextureARB generates the error INVALID_ENUM if an invalid <texture>  
    is specified.  <texture> is a symbolic constant of the form TEXTUREi_ARB,  
    indicating that texture unit i is to be modified.  The constants obey  
    TEXTUREi_ARB = TEXTURE0_ARB + i (i is in the range 0 to k-1, where k is  
    the larger of MAX_TEXTURE_COORDS_NV and MAX_TEXTURE_IMAGE_UNITS_NV).  
    For compatibility with old OpenGL specifications, the implementation  
    dependent constant MAX_TEXTURE_UNITS_ARB specifies the number of  
    conventional texture units supported by the implementation.  Its value  
    must be no larger than the minimum of MAX_TEXTURE_COORDS_NV and  
    MAX_TEXTURE_IMAGE_UNITS_NV.  
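
    The three range checks described above can be summarized in a small  
    model.  This is only an illustrative sketch, not driver code:  the  
    TextureLimits struct and all function names are hypothetical, and the  
    two limits would in practice be obtained with GetIntegerv.  

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical container for the two implementation dependent limits. */
typedef struct {
    int max_texture_coords;       /* value of MAX_TEXTURE_COORDS_NV */
    int max_texture_image_units;  /* value of MAX_TEXTURE_IMAGE_UNITS_NV */
} TextureLimits;

/* k in the text: the larger of the two limits. */
static int selectable_units(const TextureLimits *lim)
{
    assert(lim != NULL);
    return lim->max_texture_coords > lim->max_texture_image_units
               ? lim->max_texture_coords
               : lim->max_texture_image_units;
}

/* ActiveTextureARB(TEXTURE0_ARB + unit) is accepted only for 0 <= unit < k;
   any other value produces INVALID_ENUM. */
static bool active_texture_accepts(const TextureLimits *lim, int unit)
{
    return unit >= 0 && unit < selectable_units(lim);
}

/* Texture coordinate processing commands (texture matrix, TexGen, ...)
   produce INVALID_OPERATION when this returns false. */
static bool coord_commands_allowed(const TextureLimits *lim, int unit)
{
    assert(lim != NULL);
    return unit >= 0 && unit < lim->max_texture_coords;
}

/* Texture image commands (TexImage, TexParameter, BindTexture, ...)
   produce INVALID_OPERATION when this returns false. */
static bool image_commands_allowed(const TextureLimits *lim, int unit)
{
    assert(lim != NULL);
    return unit >= 0 && unit < lim->max_texture_image_units;
}
```

    With example limits of 8 coordinate sets and 16 image units, unit 12 can  
    be selected and used for BindTexture, but a TexGen call while it is  
    active would generate INVALID_OPERATION.  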
  
    Modify Section F.2.12 (Changes to Section 3.8.10), p.249  
  
    (modify next-to-last paragraph) Texturing is enabled and disabled  
    individually for each texture unit.  If texturing is disabled for one of  
    the units, then the fragment resulting from the previous unit is passed  
    unaltered to the following unit.  Individual texture units beyond those  
    specified by MAX_TEXTURE_UNITS_ARB may be incomplete and are always  
    treated as disabled.  
  
    Modify Section F.2.15 (Changes to Section 6.1.2), p.251  
      
    (add to end of paragraph) Queries of texture state variables corresponding  
    to the texture coordinate processing unit (namely, TexGen state and enables,  
    and matrices) will produce an INVALID_OPERATION error if the value of  
    ACTIVE_TEXTURE_ARB is greater than or equal to MAX_TEXTURE_COORDS_NV.  All  
    other texture state queries will result in an INVALID_OPERATION error if  
    the value of ACTIVE_TEXTURE_ARB is greater than or equal to  
    MAX_TEXTURE_IMAGE_UNITS_NV.

Additions to the AGL/GLX/WGL Specifications

  
    Program objects are shared between AGL/GLX/WGL rendering contexts if  
    and only if the rendering contexts share display lists.  No change  
    is made to the AGL/GLX/WGL API.

Dependencies on GL_NV_vertex_program

  
    If NV_vertex_program is supported, the description of LoadProgramNV in  
    Section 2.14.1.7 (up to the BNF description of vertex programs) is  
    deleted, as it is replaced by the contents of Section 5.7 in this  
    specification.  The general error descriptions in Section 2.14.1.7 common  
    to Section 5.7 (like INVALID_OPERATION if the program fails to compile)  
    should also be deleted.  Section 2.14.1.8 should also be deleted.  Section  
    6.1.13 is modified by this specification as described above.

Dependencies on NV_texture_shader

  
    If NV_texture_shader is not supported, the comment about texture shaders  
    being disabled in fragment program mode is not applicable.

Dependencies on NV_texture_rectangle

    
    If NV_texture_rectangle is not supported, the references to "RECT" in the  
    <texImageTarget> grammar rule and TEXTURE_RECTANGLE_NV are not applicable.

Dependencies on ARB_texture_cube_map

    
    If ARB_texture_cube_map is not supported, the references to "CUBE" in the  
    <texImageTarget> grammar rule and TEXTURE_CUBE_MAP_ARB are not applicable.

Dependencies on EXT_fog_coord

  
    If EXT_fog_coord is not supported, references to "fog coordinate" in the  
    definition of the "FOGC" fragment attribute register should be removed.

Dependencies on NV_depth_clamp

  
    If NV_depth_clamp is not supported, section 3.11.6 is modified to remove  
    discussion of the depth clamp enable and instead indicate that fragments  
    with depth values outside [min(n,f), max(n,f)] are always discarded.

Dependencies on ARB_depth_texture and SGIX_depth_texture

  
    If ARB_depth_texture is not supported, but SGIX_depth_texture is  
    supported, the discussion of Table X.5 is modified to indicate that  
    DEPTH_COMPONENT textures are treated as LUMINANCE.  
  
    If neither extension is supported, the discussion of DEPTH_COMPONENT  
    textures in Table X.5 should be removed.

Dependencies on NV_float_buffer

  
    If NV_float_buffer is not supported, references to FLOAT_R_NV,  
    FLOAT_RG_NV, FLOAT_RGB_NV, and FLOAT_RGBA_NV internal texture formats in  
    Table X.5 should be removed.

Dependencies on ARB_vertex_program

  
    This extension does not have any explicit dependencies on  
    ARB_vertex_program, but the APIs for setting and querying numbered local  
    parameters (ProgramLocalParameter*ARB and GetProgramLocalParameter*ARB)  
    were taken directly from that extension.

Dependencies on ARB_fragment_program

  
    If ARB_fragment_program is not supported, the maximum number of executable  
    instructions in any !!FP1.0 program is 1024.  If ARB_fragment_program is  
    supported, the maximum number of executable instructions for an !!FP1.0  
    program is at least 1024, but can be larger.  The limit can be queried by  
    calling GetProgramivARB with <target> set to FRAGMENT_PROGRAM_ARB and  
    <pname> set to MAX_PROGRAM_INSTRUCTIONS_ARB.

GLX Protocol

  
    Most of the GLX protocol needed to implement this extension is described  
    in the GL_NV_vertex_program extension specification and will not be  
    repeated here.  
  
    The following two rendering commands are potentially large, and hence can  
    be sent in a glXRender or glXRenderLarge request.  
  
        ProgramNamedParameter4fvNV  
            2           28+len+p        rendering command length  
            2           4218            rendering command opcode  
            4           CARD32          id  
            4           CARD32          len  
            4           FLOAT32         params[0]  
            4           FLOAT32         params[1]  
            4           FLOAT32         params[2]  
            4           FLOAT32         params[3]  
            len         LISTofCARD8     name  
            p                           unused, p=pad(len)  
  
         If the command is encoded in a glXRenderLarge request, the command  
         opcode and command length fields above are expanded to 4 bytes each:  
  
            4           32+len+p        rendering command length  
            4           4218            rendering command opcode  
  
  
        ProgramNamedParameter4dvNV  
            2           44+len+p        rendering command length  
            2           4219            rendering command opcode  
            4           CARD32          id  
            4           CARD32          len  
            8           FLOAT64         params[0]  
            8           FLOAT64         params[1]  
            8           FLOAT64         params[2]  
            8           FLOAT64         params[3]  
            len         LISTofCARD8     name  
            p                           unused, p=pad(len)  
  
         If the command is encoded in a glXRenderLarge request, the command  
         opcode and command length fields above are expanded to 4 bytes each:  
  
            4           48+len+p        rendering command length  
            4           4219            rendering command opcode  
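
    The command lengths listed above follow directly from the field sizes.  
    The sketch below recomputes them and packs the glXRender form of  
    ProgramNamedParameter4fvNV into a buffer.  It is an illustration only:  
    the function names are hypothetical, fields are written in host byte  
    order, and a 4-byte float is assumed.  

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* pad(len): number of bytes needed to round len up to a multiple of 4. */
static size_t glx_pad(size_t len)
{
    return (4 - (len & 3)) & 3;
}

/* Rendering command length of ProgramNamedParameter4{f,d}vNV.
   is_large selects the glXRenderLarge form (4-byte length and opcode). */
static size_t named_param_cmd_length(size_t name_len, int is_double,
                                     int is_large)
{
    size_t header = is_large ? 8 : 4;     /* length + opcode fields */
    size_t params = is_double ? 32 : 16;  /* four FLOAT64 or four FLOAT32 */
    return header + 4 /* id */ + 4 /* len */ + params
           + name_len + glx_pad(name_len);
}

/* Packs the glXRender form of ProgramNamedParameter4fvNV into buf.
   Returns the number of bytes written, or 0 if the command does not fit
   in buf or in the 2-byte length field. */
static size_t encode_named_param_4fv(uint8_t *buf, size_t cap, uint32_t id,
                                     const char *name, size_t name_len,
                                     const float params[4])
{
    size_t total = named_param_cmd_length(name_len, 0, 0);
    if (total > cap || total > UINT16_MAX)
        return 0;

    uint16_t length = (uint16_t)total;
    uint16_t opcode = 4218;               /* opcode from the table above */
    uint32_t len32 = (uint32_t)name_len;
    size_t off = 0;

    memcpy(buf + off, &length, 2);  off += 2;
    memcpy(buf + off, &opcode, 2);  off += 2;
    memcpy(buf + off, &id, 4);      off += 4;
    memcpy(buf + off, &len32, 4);   off += 4;
    memcpy(buf + off, params, 16);  off += 16;
    memcpy(buf + off, name, name_len);
    off += name_len;
    memset(buf + off, 0, glx_pad(name_len));  /* unused pad bytes */
    off += glx_pad(name_len);

    assert(off == total);
    return off;
}
```

    For a five-character name such as "color", p is 3 and the command  
    occupies 28+5+3 = 36 bytes in a glXRender request.  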
  
  
    The remaining two commands are non-rendering commands.  These commands are  
    sent separately (i.e., not as part of a glXRender or glXRenderLarge  
    request), using the glXVendorPrivateWithReply request:  
  
        GetProgramNamedParameter4fvNV  
            1           CARD8           opcode (X assigned)  
            1           17              GLX opcode (glXVendorPrivateWithReply)  
            2           4+(len+p)/4     request length  
            4           1310            vendor specific opcode  
            4           GLX_CONTEXT_TAG context tag  
            4           INT32           len  
            len         LISTofCARD8     name  
            p                           unused, p=pad(len)  
          =>  
  
          If the command succeeds, 4 floats are sent in the reply:  
  
            1           1               reply  
            1                           unused  
            2           CARD16          sequence number  
            4           4               reply length  
            24                          unused  
            16          LISTofFLOAT32   params  
  
          Otherwise, an empty reply is sent, indicating that a GL error  
          occurred:  
  
            1           1               reply  
            1                           unused  
            2           CARD16          sequence number  
            4           0               reply length  
            24                          unused  
  
  
        GetProgramNamedParameter4dvNV  
            1           CARD8           opcode (X assigned)  
            1           17              GLX opcode (glXVendorPrivateWithReply)  
            2           4+(len+p)/4     request length  
            4           1311            vendor specific opcode  
            4           GLX_CONTEXT_TAG context tag  
            4           INT32           len  
            len         LISTofCARD8     name  
            p                           unused, p=pad(len)  
          =>  
  
          If the command succeeds, 4 doubles are sent in the reply:  
  
            1           1               reply  
            1                           unused  
            2           CARD16          sequence number  
            4           8               reply length  
            24                          unused  
            32          LISTofFLOAT64   params  
  
          Otherwise, an empty reply is sent, indicating that a GL error  
          occurred:  
  
            1           1               reply  
            1                           unused  
            2           CARD16          sequence number  
            4           0               reply length  
            24                          unused
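
    The request and reply length fields above are expressed in 4-byte units.  
    As a rough sketch (function names are hypothetical), they can be derived  
    from the name length and the parameter type as follows:  

```c
#include <assert.h>
#include <stddef.h>

/* pad(len): number of bytes needed to round len up to a multiple of 4. */
static size_t glx_pad4(size_t len)
{
    return (4 - (len & 3)) & 3;
}

/* Request length field of GetProgramNamedParameter4{f,d}vNV, in 4-byte
   units, as given in the tables above: 4 + (len + p) / 4. */
static size_t get_named_param_request_units(size_t name_len)
{
    size_t padded = name_len + glx_pad4(name_len);
    assert(padded % 4 == 0);
    return 4 + padded / 4;
}

/* Reply length field, in 4-byte units of data following the 32-byte
   reply header.  A failed query sends an empty reply. */
static size_t get_named_param_reply_units(int is_double, int succeeded)
{
    if (!succeeded)
        return 0;
    size_t bytes = is_double ? 4 * 8 : 4 * 4;  /* four FLOAT64 or FLOAT32 */
    return bytes / 4;
}
```

    A client can therefore detect a GL error on the query by checking for a  
    reply length of zero.  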

Errors

  
    INVALID_OPERATION is generated by Begin, DrawPixels, Bitmap, CopyPixels,  
    or a command that performs an implicit Begin if FRAGMENT_PROGRAM_NV is  
    enabled and the currently bound fragment program does not exist.  
  
    INVALID_OPERATION is generated by ProgramNamedParameter4fNV,  
    ProgramNamedParameter4dNV, ProgramNamedParameter4fvNV,  
    ProgramNamedParameter4dvNV, GetProgramNamedParameterfvNV, or  
    GetProgramNamedParameterdvNV if <id> specifies a nonexistent program or a  
    program whose type does not support local parameters.  
  
    INVALID_VALUE is generated by ProgramNamedParameter4fNV,  
    ProgramNamedParameter4dNV, ProgramNamedParameter4fvNV,  
    ProgramNamedParameter4dvNV, GetProgramNamedParameterfvNV, or  
    GetProgramNamedParameterdvNV if <len> is zero.  
  
    INVALID_VALUE is generated by ProgramNamedParameter4fNV,  
    ProgramNamedParameter4dNV, ProgramNamedParameter4fvNV,  
    ProgramNamedParameter4dvNV, GetProgramNamedParameterfvNV, or  
    GetProgramNamedParameterdvNV if <name> does not specify the name of a  
    local parameter in the program corresponding to <id>.  
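
    The three named-parameter error conditions above can be modeled as a  
    single validation routine.  This is a hedged sketch:  ProgramModel and  
    the function name are hypothetical stand-ins for driver state, and the  
    order in which the conditions are tested is an assumption, since this  
    specification does not define which error takes precedence.  

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum {
    ERR_NONE = 0,
    ERR_INVALID_VALUE = 0x0501,     /* numeric value of GL_INVALID_VALUE */
    ERR_INVALID_OPERATION = 0x0502  /* numeric value of GL_INVALID_OPERATION */
} ErrorCode;

/* Hypothetical stand-in for a program object. */
typedef struct {
    int exists;                      /* nonzero once the program is loaded */
    int supports_local_params;       /* nonzero for fragment programs */
    const char *const *param_names;  /* named local parameters it declares */
    size_t param_count;
} ProgramModel;

/* Returns the error that a ProgramNamedParameter* or
   GetProgramNamedParameter* call would generate for (<len>, <name>).
   <name> is a counted string and need not be NUL-terminated. */
static ErrorCode check_named_parameter_access(const ProgramModel *prog,
                                              size_t len, const char *name)
{
    if (prog == NULL || !prog->exists || !prog->supports_local_params)
        return ERR_INVALID_OPERATION;
    if (len == 0)
        return ERR_INVALID_VALUE;
    assert(name != NULL);
    for (size_t i = 0; i < prog->param_count; i++) {
        const char *candidate = prog->param_names[i];
        if (strlen(candidate) == len && memcmp(candidate, name, len) == 0)
            return ERR_NONE;
    }
    return ERR_INVALID_VALUE;  /* <name> is not declared by the program */
}
```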
  
    INVALID_OPERATION is generated by any command accessing texture coordinate  
    processing state if the texture unit number corresponding to the current  
    value of ACTIVE_TEXTURE_ARB is greater than or equal to the implementation  
    dependent constant MAX_TEXTURE_COORDS_NV.  
  
    INVALID_OPERATION is generated by any command accessing texture image  
    processing state if the texture unit number corresponding to the current  
    value of ACTIVE_TEXTURE_ARB is greater than or equal to the implementation  
    dependent constant MAX_TEXTURE_IMAGE_UNITS_NV.  
  
  
    (The following are error descriptions copied from GL_NV_vertex_program  
     that apply to this extension as well.  These modifications do not affect  
     the behavior of that extension.)  
  
    INVALID_VALUE is generated by LoadProgramNV if id is zero.  
  
    INVALID_OPERATION is generated by LoadProgramNV if the program  
    corresponding to id is currently loaded but has a program type different  
    from that given by target.  
  
    INVALID_OPERATION is generated by LoadProgramNV if the program specified  
    is syntactically incorrect for the program type specified by target.  The  
    value of PROGRAM_ERROR_POSITION_NV is still updated when this error is  
    generated.  
  
    INVALID_OPERATION is generated by LoadProgramNV if the program specified  
    fails to conform to any of the semantic restrictions imposed on programs  
    of the type specified by target.  The value of PROGRAM_ERROR_POSITION_NV  
    is still updated when this error is generated.  
  
    INVALID_OPERATION is generated by BindProgramNV if target does not match  
    the type of the program named by id.  
  
    INVALID_VALUE is generated by AreProgramsResidentNV if any of the queried  
    programs are zero or do not exist.  
  
    INVALID_OPERATION is generated by GetProgramivNV or GetProgramStringNV if  
    the program named id does not exist.

New State

  
Get Value                          Type  Get Command              Initial Value  Description         Section   Attribute  
---------------------------------  ----  -----------------------  -------------  ------------------  --------  ------------  
FRAGMENT_PROGRAM_NV                B     IsEnabled                FALSE          fragment program    3.11      enable  
                                                                                 mode enable  
FRAGMENT_PROGRAM_BINDING_NV        Z+    GetIntegerv              0              bound fragment      5.7       -  
                                                                                 program  
  
Table X.6.  New State Introduced by NV_fragment_program.  
  
  
Get Value                  Type    Get Command          Initial Value  Description         Section   Attribute  
-------------------------  ------  ------------------   -------------  ------------------  --------  ---------  
PROGRAM_ERROR_POSITION_NV  Z       GetIntegerv          -1             program error       5.7       -  
                                                                       position  
PROGRAM_TARGET_NV          Z2      GetProgramivNV       0              program target      6.1.13    -  
PROGRAM_LENGTH_NV          Z+      GetProgramivNV       0              program length      6.1.13    -  
PROGRAM_RESIDENT_NV        Z2      GetProgramivNV       False          program residency   6.1.13    -  
PROGRAM_STRING_NV          ubxn    GetProgramStringNV   ""             program string      6.1.13    -  
-                          nxR4    GetProgramNamed-     (0,0,0,0)      named program local 5.7       -  
                                   ParameterNV                         parameter value  
-                          64+xR4  GetProgramLocal-     (0,0,0,0)      numbered program    5.7       -  
                                   ParameterARB                        local parameter  
  
Table X.7.  Program Object State common to NV_vertex_program and NV_fragment_program.  
  
  
Get Value    Type    Get Command   Initial Value  Description               Section   Attribute  
---------    ------  -----------   -------------  -----------------------   --------  ---------  
-            12xR4   -             fragment data  fragment attribute        3.11.1.1  -  
                                                  registers  
-            16xR4   -             (0,0,0,0)      fp32 temporary registers  3.11.1.2  -  
-            32xR4   -             (0,0,0,0)      fp16 temporary registers  3.11.1.2  -  
-            (Z_4)4  -             (EQ,EQ,EQ,EQ)  condition code register   3.11.1.4  -  
                                                  address register  
  
Table X.8.  Fragment Program Per-Fragment Execution State.

New Implementation Dependent State

  
                                                 Minimum  
Get Value                   Type   Get Command    Value       Description    Section  Attribute  
---------                   ----   -----------   -------  -----------------  -------  ---------  
MAX_TEXTURE_COORDS_NV       Z+     GetIntegerv      2     number of texture  2.6      -  
                                                          coordinate sets  
                                                          supported  
MAX_TEXTURE_IMAGE_UNITS_NV  Z+     GetIntegerv      2     number of texture  2.10.2   -  
                                                          image units  
                                                          supported  
MAX_FRAGMENT_PROGRAM_       Z+     GetIntegerv     64     number of numbered 3.11.7   -  
  LOCAL_PARAMETERS_NV                                     local parameters  
                                                          supported

Revision History

  
    Rev.    Date    Author   Changes  
    ----  -------- --------  --------------------------------------------  
     73   05/23/05  pbrown   Fixed cut-and-paste error in the dependency   
                             section where it said "NV_texture_rectangle"  
                             instead of "ARB_texture_cube_map".  
  
     72   05/16/04  pbrown   Documented that it's not possible to get results  
                             from  
                             LG2 that are any more precise than what is  
                             available in the fp32 storage format.  
  
     71   04/23/04  pbrown   Fixed incorrect example.  
  
     70   03/20/03  pbrown   Made the instruction count limit for !!FP1.0  
                             programs queryable instead of a hard-wired value  
                             of 1024.  The limit can be queried using  
                             ARB_fragment_program mechanisms, and remains 1024  
                             if ARB_fragment_program is unsupported.  
  
     69   02/01/03  pbrown   Removed support for combiner fragment programs  
                             (!!FCP1.0).  
  
     68   01/08/03  pbrown   Correct spec language providing examples of NaNs,  
                             such as sqrt(-1) or log(-1).  Division by zero  
                             produces an infinity, not a NaN.  
  
     67   12/23/02  pbrown   Fix incorrect syntax of examples of "KIL"  
                             instruction. The condition code test is not  
                             parenthesized in KIL.   
  
     66   10/31/02  pbrown   Cleaned up special cases of POW, including the  
                             fact that "POW dst, 0, 0" produces NaN in this  
                             spec, not 1.0.  
  
     65   10/28/02  pbrown   Documented that signed HILO textures will have  
                             the hemisphere remapping applied, but unsigned  
                             textures will not.  
  
     64   09/17/02  pbrown   Minor typo fixes.  
  
     63   08/14/02  pbrown   Clarified the value of the "other" components  
                             of f[FOGC].  
  
     62   07/24/02  pbrown   Removed PK4UBG and UP4UBG instructions.  
                             Simplified the implementation of the temporary  
                             and output register limit for combiner  
                             programs by counting all four o[TEXn] registers  
                             against the limit, whether or not they are  
                             written.  
  
     61   07/19/02  pbrown   Renamed ProgramLocalParameter*NV to  
                             ProgramNamedParameter*NV to eliminate naming  
                             conflicts with ARB_vertex_program (and presumably  
                             ARB_fragment_program).  
                               
                             Added support for numbered program local  
                             parameters for compatibility with the ARB vertex  
                             program extension (and upcoming ARB fragment  
                             program extension), so it's possible to set local  
                             parameters the same way in both extensions.  
  
                             Eliminated the language describing "register  
                             slots" and how the "H" and "R" registers overlap.  
                             Instead, registers are guaranteed not to overlap,  
                             and a semantic limit is added on the number of  
                             temporaries and output registers that can be used  
                             by a program.  
  
                             Eliminated the requirement that non-combiner  
                             programs actually write a color value; the only  
                             requirement is that one output register be  
                             written.  When using fragment programs that use  
                             depth replacement, there may not be a need to  
                             compute color if color writes are currently  
                             disabled.  
  
                             Cleaned up the issues section.  Added several  
                             examples of fragment program operation.  
  
                             Cleaned up GLX protocol.  
  
     59   07/07/02  pbrown   Minor clarifications of texture lookup handling.  
                             Documented that DDX and DDY may not always  
                             produce infinities.  
  
     58   06/27/02  pbrown   Added clarification that instructions can use the  
                             same attribute or parameter register more than  
                             once.  Added support for "X" precision on the  
                             "set on" instructions.  Removed "X" precision  
                             support from DST.  
  
     57   06/27/02  pbrown   Added missing table entries covering the use of  
                             floating-point textures.  
  
     56   06/27/02  pbrown   Modified the spec to indicate that depth textures  
                             are treated as alpha, luminance, or intensity  
                             according to the depth texture mode in ARB_shadow.  
  
     55   06/26/02  pbrown   Corrected the aliased register number and  
                             "read-only" mappings for o[DEPR] in combiner  
                             programs.  
  
     54   06/05/02  pbrown   Fixed the spec to indicate that near and far  
                             frustum clipping is disabled for depth  
                             replacement programs.  Fixed the spec to indicate  
                             that the register combiners enable is overridden  
                             for fragment programs (enabled for combiner  
                             programs, disabled for color programs).  
  
     53   05/20/02  pbrown   Miscellaneous bug fixes for wording and  
                             special-case handling errors.  
  
     52   05/16/02  pbrown   Added "_SAT" suffix to clamp result vector  
                             components to [0,1].  Fixed special case rules  
                             for MUL instruction and the "UN" condition code.  
  
     50   04/19/02  pbrown   Added "$" as a legal character in an identifier  
                             name.  Added example for fixed and conditional  
                             write masks and condition code updates.  
  
     49   04/16/02  pbrown   Added new query of PROGRAM_ERROR_STRING_NV to  
                             return more detailed information on program load  
                             failures.  
  
     48   04/02/02  pbrown   Added missing enum value for the  
                             FRAGMENT_PROGRAM_BINDING_NV query.   
  
     47   03/15/02  pbrown   Fixed various typos, and an incorrect description  
                             of the MAX operation.  
  
     45   01/31/02  pbrown   Renamed the packing and unpacking opcodes to more  
                             closely match OpenGL data type naming conventions  
                             (PK2 becomes PK2H, PK16 becomes PK2US, PK4  
                             becomes PK4B, PKB becomes PK4UB).  Renamed "BEM"  
                             instruction to "X2D" to reflect the fact that it  
                             does a 2D coordinate transformation (not just a  
                             bump mapping operation).  Added PK4UBG and UP4UBG  
                             instructions to support sRGB gamma correction  
                             when packing and unpacking components.  
  
     44   01/18/02  pbrown   Double the number of available temporaries (16 to  
                             32 fp32 vectors).  Add BEM (texture coordinate  
                             offset), PKB/UPB (unsigned byte packing), and  
                             PK16/UP16 (unsigned short packing) instructions.  
  
     43   01/04/02  pbrown   Documented special cases for comparisons,  
                             including the handling of NaN in the SNE  
                             instruction. Added automatic generation of a  
                             third normal component for HILO textures.  
                             Documented the restriction that RFL can't write  
                             to the w component of the result.  Trivial fix of  
                             the special-cases for RCP.  Fixed minor typo on  
                             the TEX instruction.  
  
     40   11/26/01  pbrown   Eliminated "X" precision specifier on those  
                             instructions that do complicated math or don't  
                             otherwise need it (e.g., "SGE").  Fixed special  
                             case math on LG2 instruction.  Eliminated  
                             incorrectly specified exponent clamping on LIT  
                             instruction.  Fixed description and special-case  
                             math on LIT/POW instructions.  Specified that  
                             combiner program outputs are clamped to [-1,+1],  
                             not [+0,+1].  
  
     39   11/16/01  pbrown   Added semantic restriction that PK2/PK4 must  
                             write to a 32-bit register.  Cleaned up the  
                             converse restrictions on UP2/UP4, making sure to  
                             allow UP2/UP4 from a program parameter.  Fix  
                             section numberings and a few typos.  
  
     36   11/07/01  pbrown   Cleaned up explanation of the "negative q is  
                             undefined" for texture mapping spec restriction.  
                             Fixed a nit on the number of condition code  
                             values (now 4 with UN - unordered).  
  
     35   10/29/01  pbrown   Add a SUB instruction for programmer  
                             convenience. Moved unresolved issue list back to  
                             the "Issues" section.  Fix several minor wording  
                             issues.  Clarify register combiners/texture  
                             shader/fragment program flow control diagram.  
  
     32   10/19/01  pbrown   Document the fragment program restriction that  
                             instructions involving f[FOGC] and f[TEX0-TEX7]  
                             are always carried out at fp32 precision.  
  
     31   10/19/01  pbrown   Fixed incorrect description of encoding of fp16  
                             denorms.  
  
     30   10/12/01  pbrown   Documented (0,0,0,0) local parameter  
                             initialization.  Disallow multiple defines of the  
                             same token.  Allow tokens that look like a  
                             possible register or texture name, but have  
                             numbers that are too big (e.g., "TEX24", "R37").  
                             Fixed up several grammar bugs.  Documented that  
                             LG2 and RSQ now do not automatically take  
                             absolute values, plus new math special cases.  

Last update: November 14, 2006.