GL_NV_gpu_program4

Name
Name Strings
Contact
Status
Version
Number
Dependencies
Overview
New Procedures and Functions
New Tokens
Additions to Chapter 2 of the OpenGL 1.5 Specification (OpenGL Operation)
Additions to Chapter 3 of the OpenGL 1.5 Specification (Rasterization)
Additions to Chapter 4 of the OpenGL 1.5 Specification (Per-Fragment Operations and the Frame Buffer)
Additions to Chapter 5 of the OpenGL 1.5 Specification (Special Functions)
Additions to Chapter 6 of the OpenGL 1.5 Specification (State and State Requests)
Additions to Appendix A of the OpenGL 1.5 Specification (Invariance)
Additions to the AGL/GLX/WGL Specifications
GLX Protocol
Errors
Dependencies on NV_parameter_buffer_object
Dependencies on ARB_texture_rectangle
Dependencies on EXT_gpu_program_parameters
Dependencies on EXT_texture_integer
Dependencies on EXT_texture_array
Dependencies on EXT_texture_buffer_object
Dependencies on NV_primitive_restart
New State
New Implementation Dependent State
Issues
Revision History

Name

  
    NV_gpu_program4

Name Strings

  
    GL_NV_gpu_program4

Contact

  
    Pat Brown, NVIDIA Corporation (pbrown 'at' nvidia.com)

Status

  
    Shipping for GeForce 8 Series (November 2006)

Version

  
    Last Modified Date:         02/04/2008  
    NVIDIA Revision:            4

Number

Dependencies

  
    This extension is written against the OpenGL 2.0 specification.  
  
    OpenGL 2.0 is not required, but we expect all implementations of this  
    extension will also support OpenGL 2.0.  
  
    This extension is also written against the ARB_vertex_program  
    specification, which provides the basic mechanisms for the assembly  
    programming model used by this extension.  
  
    This extension serves as the basis for the NV_fragment_program4,  
    NV_geometry_program4, and NV_vertex_program4 extensions, which build on  
    this extension to support fragment, geometry, and vertex programs,  
    respectively.  If "GL_NV_gpu_program4" is found in the extension string,  
    all of these extensions are supported.  
  
    NV_parameter_buffer_object affects the definition of this extension.  
  
    ARB_texture_rectangle trivially affects the definition of this extension.  
  
    EXT_gpu_program_parameters trivially affects the definition of this  
    extension.  
  
    EXT_texture_integer trivially affects the definition of this extension.  
  
    EXT_texture_array trivially affects the definition of this extension.  
  
    EXT_texture_buffer_object trivially affects the definition of this  
    extension.  
  
    NV_primitive_restart trivially affects the definition of this extension.

Overview

  
    This specification documents the common instruction set and basic  
    functionality provided by NVIDIA's 4th generation of assembly instruction  
    sets supporting programmable graphics pipeline stages.    
  
    The instruction set builds upon the basic framework provided by the  
    ARB_vertex_program and ARB_fragment_program extensions to expose  
    considerably more capable hardware.  In addition to new capabilities for  
    vertex and fragment programs, this extension provides a new program type  
    (geometry programs) further described in the NV_geometry_program4  
    specification.  
  
    NV_gpu_program4 provides a unified instruction set -- all instruction set  
    features are available for all program types, except for a small number of  
    features that make sense only for a specific program type.  It provides  
    fully capable signed and unsigned integer data types, along with a set of  
    arithmetic, logical, and data type conversion instructions capable of  
    operating on integers.  It also provides a uniform set of structured  
    branching constructs (if tests, loops, and subroutines) that fully support  
    run-time condition testing.  
  
    This extension provides several new texture mapping capabilities.  Shadow  
    cube maps are supported, where cube map faces can encode depth values.  
    Texture lookup instructions can include an immediate texel offset, which  
    can assist in advanced filtering.  New instructions are provided to fetch  
    a single texel by address in a texture map (TXF) and query the size of a  
    specified texture level (TXQ).  
  
    By and large, vertex and fragment programs written to ARB_vertex_program  
    and ARB_fragment_program can be ported directly by simply changing the  
    program header from "!!ARBvp1.0" or "!!ARBfp1.0" to "!!NVvp4.0" or  
    "!!NVfp4.0", and then modifying the code to take advantage of the expanded  
    feature set.  In a small number of areas, documented in this  
    specification, this extension is not a functional superset of previous  
    vertex program extensions.
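    As a rough illustration of the header change described above, the
    following C sketch rewrites only the first token of a program string.
    The function name port_program_header is hypothetical and not part of
    any GL API.  It does not address the areas where this extension is not
    a functional superset of earlier extensions; those still need review by
    hand.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a newly allocated copy of <src> whose
 * leading "!!ARBvp1.0" or "!!ARBfp1.0" header is replaced by "!!NVvp4.0"
 * or "!!NVfp4.0".  Returns NULL if the header is not recognized or if
 * allocation fails.  The caller frees the result. */
char *port_program_header(const char *src)
{
    static const struct { const char *from; const char *to; } map[] = {
        { "!!ARBvp1.0", "!!NVvp4.0" },
        { "!!ARBfp1.0", "!!NVfp4.0" },
    };
    assert(src != NULL);
    for (size_t i = 0; i < sizeof map / sizeof map[0]; i++) {
        size_t from_len = strlen(map[i].from);
        if (strncmp(src, map[i].from, from_len) == 0) {
            size_t to_len = strlen(map[i].to);
            size_t rest_len = strlen(src + from_len);
            char *out = malloc(to_len + rest_len + 1);
            if (out == NULL)
                return NULL;
            memcpy(out, map[i].to, to_len);
            /* copy the program body unchanged, including the final NUL */
            memcpy(out + to_len, src + from_len, rest_len + 1);
            return out;
        }
    }
    return NULL; /* not an ARB vertex or fragment program header */
}
```

    The rest of the program text is copied verbatim; taking advantage of
    the new instructions remains a manual step.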

New Procedures and Functions

  
    void ProgramLocalParameterI4iNV(enum target, uint index,   
                                    int x, int y, int z, int w);  
    void ProgramLocalParameterI4ivNV(enum target, uint index,   
                                     const int *params);  
    void ProgramLocalParametersI4ivNV(enum target, uint index,   
                                      sizei count, const int *params);  
    void ProgramLocalParameterI4uiNV(enum target, uint index,   
                                     uint x, uint y, uint z, uint w);  
    void ProgramLocalParameterI4uivNV(enum target, uint index,   
                                      const uint *params);  
    void ProgramLocalParametersI4uivNV(enum target, uint index,   
                                       sizei count, const uint *params);  
  
    void ProgramEnvParameterI4iNV(enum target, uint index,   
                                  int x, int y, int z, int w);  
    void ProgramEnvParameterI4ivNV(enum target, uint index,   
                                   const int *params);  
    void ProgramEnvParametersI4ivNV(enum target, uint index,   
                                    sizei count, const int *params);  
    void ProgramEnvParameterI4uiNV(enum target, uint index,   
                                   uint x, uint y, uint z, uint w);  
    void ProgramEnvParameterI4uivNV(enum target, uint index,   
                                    const uint *params);  
    void ProgramEnvParametersI4uivNV(enum target, uint index,   
                                     sizei count, const uint *params);  
  
    void GetProgramLocalParameterIivNV(enum target, uint index,  
                                       int *params);  
    void GetProgramLocalParameterIuivNV(enum target, uint index,  
                                        uint *params);  
    void GetProgramEnvParameterIivNV(enum target, uint index,  
                                     int *params);  
    void GetProgramEnvParameterIuivNV(enum target, uint index,  
                                      uint *params);

New Tokens

  
  
    Accepted by the <pname> parameter of GetBooleanv, GetIntegerv,  
    GetFloatv, and GetDoublev:   
  
        MIN_PROGRAM_TEXEL_OFFSET_EXT                    0x8904  
        MAX_PROGRAM_TEXEL_OFFSET_EXT                    0x8905  
  
    (note:  these tokens are shared with the EXT_gpu_shader4 extension.)  
  
    Accepted by the <pname> parameter of GetProgramivARB:  
  
        PROGRAM_ATTRIB_COMPONENTS_NV                    0x8906  
        PROGRAM_RESULT_COMPONENTS_NV                    0x8907  
        MAX_PROGRAM_ATTRIB_COMPONENTS_NV                0x8908  
        MAX_PROGRAM_RESULT_COMPONENTS_NV                0x8909  
        MAX_PROGRAM_GENERIC_ATTRIBS_NV                  0x8DA5  
        MAX_PROGRAM_GENERIC_RESULTS_NV                  0x8DA6

Additions to Chapter 2 of the OpenGL 1.5 Specification (OpenGL Operation)

  
    (Modify "Section 2.14.1" of the ARB_vertex_program specification,  
    describing program parameters.)  
  
    Each program object has an associated array of program local parameters.  
    Program local parameters are four-component vectors whose components can  
    hold floating-point, signed integer, or unsigned integer values.  The data  
    type of each local parameter is established when the parameter's values  
    are assigned.  If a program attempts to read a local parameter using a  
    data type other than the one used when the parameter is set, the values  
    returned are undefined.  ... The commands  
  
      void ProgramLocalParameter4fARB(enum target, uint index,  
                                      float x, float y, float z, float w);  
      void ProgramLocalParameter4fvARB(enum target, uint index,   
                                       const float *params);  
      void ProgramLocalParameter4dARB(enum target, uint index,  
                                      double x, double y, double z, double w);  
      void ProgramLocalParameter4dvARB(enum target, uint index,   
                                       const double *params);  
  
      void ProgramLocalParameterI4iNV(enum target, uint index,   
                                      int x, int y, int z, int w);  
      void ProgramLocalParameterI4ivNV(enum target, uint index,   
                                       const int *params);  
      void ProgramLocalParameterI4uiNV(enum target, uint index,   
                                       uint x, uint y, uint z, uint w);  
      void ProgramLocalParameterI4uivNV(enum target, uint index,   
                                        const uint *params);  
  
    update the values of the program local parameter numbered <index>  
    belonging to the program object currently bound to <target>.  For the  
    non-vector versions of these commands, the four components of the  
    parameter are updated with the values of <x>, <y>, <z>, and <w>,  
    respectively.  For the vector versions, the components of the parameter  
    are updated with the array of four values pointed to by <params>.  The  
    error INVALID_VALUE is generated if <index> is greater than or equal to  
    the number of program local parameters supported by <target>.  
  
    The commands  
  
      void ProgramLocalParameters4fvNV(enum target, uint index,   
                                       sizei count, const float *params);  
      void ProgramLocalParametersI4ivNV(enum target, uint index,   
                                        sizei count, const int *params);  
      void ProgramLocalParametersI4uivNV(enum target, uint index,   
                                         sizei count, const uint *params);  
  
    update the values of the program local parameters numbered <index> through  
    <index> + <count> - 1 with the array of 4 * <count> values pointed to by  
    <params>.  The error INVALID_VALUE is generated if the sum of <index> and  
    <count> is greater than the number of program local parameters supported  
    by <target>.  
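    The two INVALID_VALUE rules above differ slightly: a single-parameter
    update requires <index> < N, while a multiple-parameter update requires
    <index> + <count> <= N, where N is the number of parameters supported by
    <target>.  The C sketch below states both checks.  The helper names are
    hypothetical, and it assumes <count> is non-negative, since this section
    does not say how a negative count is handled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical check for the single-parameter commands
 * (e.g. ProgramLocalParameterI4iNV): valid iff index < num_params. */
bool param_index_ok(uint32_t index, uint32_t num_params)
{
    return index < num_params;
}

/* Hypothetical check for the multiple-parameter commands
 * (e.g. ProgramLocalParametersI4ivNV): parameters index through
 * index + count - 1 are written, so the call is valid iff
 * index + count <= num_params.  The sum is formed in 64 bits so that a
 * large index cannot wrap around. */
bool params_range_ok(uint32_t index, int32_t count, uint32_t num_params)
{
    assert(count >= 0); /* assumption: negative counts handled elsewhere */
    return (uint64_t)index + (uint64_t)count <= (uint64_t)num_params;
}
```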
  
    When a program local parameter is updated, the data type of its components  
    is assigned according to the data type of the provided values.  If values  
    provided are of type "float" or "double", the components of the parameter  
    are floating-point.  If the values provided are of type "int", the  
    components of the parameter are signed integers.  If the values provided  
    are of type "uint", the components of the parameter are unsigned integers.  
  
    Additionally, each program target has an associated array of program  
    environment parameters.  Unlike program local parameters, program  
    environment parameters are shared by all program objects of a given  
    target.  Program environment parameters are four-component vectors whose  
    components can hold floating-point, signed integer, or unsigned integer  
    values.  The data type of each environment parameter is established when  
    the parameter's values are assigned.  If a program attempts to read an  
    environment parameter using a data type other than the one used when the  
    parameter is set, the values returned are undefined.  ... The commands  
  
      void ProgramEnvParameter4fARB(enum target, uint index,  
                                    float x, float y, float z, float w);  
      void ProgramEnvParameter4fvARB(enum target, uint index,  
                                     const float *params);  
      void ProgramEnvParameter4dARB(enum target, uint index,  
                                    double x, double y, double z, double w);  
      void ProgramEnvParameter4dvARB(enum target, uint index,  
                                     const double *params);  
      void ProgramEnvParameterI4iNV(enum target, uint index,   
                                    int x, int y, int z, int w);  
      void ProgramEnvParameterI4ivNV(enum target, uint index,   
                                     const int *params);  
      void ProgramEnvParameterI4uiNV(enum target, uint index,   
                                     uint x, uint y, uint z, uint w);  
      void ProgramEnvParameterI4uivNV(enum target, uint index,   
                                      const uint *params);  
  
    update the values of the program environment parameter numbered <index>  
    for the given program target <target>.  For the non-vector versions of  
    these commands, the four components of the parameter are updated with the  
    values of <x>, <y>, <z>, and <w>, respectively.  For the vector versions,  
    the four components of the parameter are updated with the array of four  
    values pointed to by <params>.  The error INVALID_VALUE is generated if  
    <index> is greater than or equal to the number of program environment  
    parameters supported by <target>.  
  
    The commands  
  
      void ProgramEnvParameters4fvNV(enum target, uint index,   
                                     sizei count, const float *params);  
      void ProgramEnvParametersI4ivNV(enum target, uint index,   
                                      sizei count, const int *params);  
      void ProgramEnvParametersI4uivNV(enum target, uint index,   
                                       sizei count, const uint *params);  
  
    update the values of the program environment parameters numbered <index>  
    through <index> + <count> - 1 with the array of 4 * <count> values pointed  
    to by <params>.  The error INVALID_VALUE is generated if the sum of  
    <index> and <count> is greater than the number of program environment  
    parameters supported by <target>.  
  
    When a program environment parameter is updated, the data type of its  
    components is assigned according to the data type of the provided values.  
    If values provided are of type "float" or "double", the components of the  
    parameter are floating-point.  If the values provided are of type "int",  
    the components of the parameter are signed integers.  If the values  
    provided are of type "uint", the components of the parameter are unsigned  
    integers.  
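    The difference in scope between local and environment parameters can
    be sketched as a storage layout: one environment array per program
    target, and one local array inside each program object.  This is a
    hypothetical C model with illustrative sizes and target numbering.
    "program.env[]" is the ARB program syntax for reading an environment
    parameter; the type and function names below are invented for the
    example.

```c
#include <assert.h>

/* Hypothetical storage model; sizes and target indices are illustrative. */
enum { NUM_TARGETS = 3, NUM_PARAMS = 8 };

typedef struct { float v[4]; } Vec4;

typedef struct {
    int target;              /* index of the target this program belongs to */
    Vec4 local[NUM_PARAMS];  /* local parameters: private to this program */
} Program;

/* Environment parameters: one array per target, shared by its programs. */
static Vec4 env_params[NUM_TARGETS][NUM_PARAMS];

void set_env(int target, int index, Vec4 value)
{
    assert(target >= 0 && target < NUM_TARGETS);
    assert(index >= 0 && index < NUM_PARAMS);
    env_params[target][index] = value;
}

/* What a program sees when it reads program.env[index]. */
Vec4 read_env(const Program *p, int index)
{
    assert(index >= 0 && index < NUM_PARAMS);
    return env_params[p->target][index];
}
```

    Two programs bound to the same target observe the same environment
    values, while writing one program's local parameters leaves the other
    program unaffected.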
  
    ...  
  
  
    Insert New Section 2.X between Sections 2.Y and 2.Z:  
  
    Section 2.X, GPU Programs  
  
    The GL provides a number of different program targets that allow an  
    application to either replace certain fixed-function pipeline stages with  
    a fully programmable model or use a program to control aspects of the GL  
    pipeline that previously had only hard-wired behavior.  
  
    A common base instruction set is available for all program types,   
    providing both integer and floating-point operations.  Structured  
    branching operations and subroutine calls are available.  Texture  
    mapping (loading data from external images) is supported for all  
    program types.  The main differences between program types are the set  
    of available inputs and outputs, which are specific to each program  
    type, and a few instructions that are meaningful for only a subset of  
    program types.  
  
  
  
    Section 2.X.2, Program Grammar  
  
    GPU program strings are specified as an array of ASCII characters  
    containing the program text.  When a GPU program is loaded by a call to  
    ProgramStringARB, the program string is parsed into a set of tokens  
    possibly separated by whitespace.  Spaces, tabs, newlines, carriage  
    returns, and comments are considered whitespace.  Comments begin with the  
    character "#" and are terminated by a newline, a carriage return, or the  
    end of the program array.  
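    The whitespace and comment rule can be expressed as a small skipping
    routine.  The sketch below is a hypothetical C helper that assumes a
    NUL-terminated string for simplicity (ProgramStringARB actually takes
    an explicit length); it returns a pointer to the first character of
    the next token, or to the terminating NUL if none remains.

```c
#include <assert.h>
#include <stddef.h>

/* Spaces, tabs, newlines, carriage returns, and "#" comments are all
 * whitespace.  A comment runs up to (not including) the next '\n' or
 * '\r', or to the end of the string. */
const char *skip_whitespace(const char *p)
{
    assert(p != NULL);
    for (;;) {
        if (*p == ' ' || *p == '\t' || *p == '\n' || *p == '\r') {
            p++;
        } else if (*p == '#') {
            while (*p != '\0' && *p != '\n' && *p != '\r')
                p++;
        } else {
            return p; /* start of a token, or end of string */
        }
    }
}
```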
  
    The Backus-Naur Form (BNF) grammar below specifies the syntactically valid  
    sequences for GPU programs.  The set of valid tokens can be inferred  
    from the grammar.  A line containing "/* empty */" represents an empty  
    string and is used to indicate optional rules.  A program is invalid if it  
    contains any tokens or characters not defined in this specification.  
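    As an example of reading the grammar, the <opModifiers> rule below
    accepts any sequence of "." followed by an <opModifier> keyword.  The
    C sketch recognizes such a suffix (for instance the ".U.CC0" part of
    "ADD.U.CC0") and records which keywords appeared in a bit mask.  The
    bit assignments and the function name are hypothetical, and the sketch
    checks only the syntax; it does not check which modifiers a particular
    opcode accepts.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical bit assignments: bit i corresponds to kModifiers[i]. */
static const char *const kModifiers[] = {
    "F", "U", "S", "CC", "CC0", "CC1",
    "SAT", "SSAT", "NTC", "S24", "U24", "HI"
};

/* Parses an <opModifiers> suffix into a bit mask.  Each item must be "."
 * followed by one whole keyword.  Returns -1 on a syntax error. */
long parse_op_modifiers(const char *s)
{
    long mask = 0;
    assert(s != NULL);
    while (*s == '.') {
        const char *start = ++s;
        while (*s != '\0' && *s != '.')
            s++;
        size_t len = (size_t)(s - start);
        int found = -1;
        for (int i = 0; i < (int)(sizeof kModifiers / sizeof kModifiers[0]); i++) {
            if (strlen(kModifiers[i]) == len &&
                strncmp(start, kModifiers[i], len) == 0) {
                found = i;
                break;
            }
        }
        if (found < 0)
            return -1; /* empty or unknown keyword */
        mask |= 1L << found;
    }
    return *s == '\0' ? mask : -1;
}
```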
  
    Note that this extension is not a standalone extension, and a small  
    number of grammar rules are left to be defined in the extensions  
    defining the specific vertex, fragment, and geometry program types.  
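    The <texOffset> rule in the grammar below permits one to three signed
    integer components in parentheses, for example "(1, -2)".  A
    hypothetical C parser for that form is sketched next.  It assumes no
    whitespace between a sign and its digits, and it leaves any range check
    (for instance against the limits queried with
    MIN_PROGRAM_TEXEL_OFFSET_EXT and MAX_PROGRAM_TEXEL_OFFSET_EXT) to the
    caller.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

/* Parses a <texOffset> such as "(1, -2)" into out[0..n-1] and returns n
 * (1 to 3), or 0 on a syntax error.  Whitespace is allowed around the
 * parentheses and commas. */
int parse_tex_offset(const char *s, int out[3])
{
    int n = 0;
    assert(s != NULL && out != NULL);
    while (isspace((unsigned char)*s)) s++;
    if (*s++ != '(')
        return 0;
    for (;;) {
        while (isspace((unsigned char)*s)) s++;
        /* at most three components; each starts with a sign or a digit */
        if (n == 3 || !(isdigit((unsigned char)*s) || *s == '-' || *s == '+'))
            return 0;
        char *end;
        long v = strtol(s, &end, 10);
        if (end == s)
            return 0; /* a lone sign with no digits */
        out[n++] = (int)v;
        s = end;
        while (isspace((unsigned char)*s)) s++;
        if (*s == ')')
            break;
        if (*s++ != ',')
            return 0;
    }
    s++; /* skip ')' */
    while (isspace((unsigned char)*s)) s++;
    return *s == '\0' ? n : 0;
}
```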
  
  
    <program>               ::= <optionSequence> <declSequence>   
                                <statementSequence> "END"  
  
    <optionSequence>        ::= <option> <optionSequence>  
                              | /* empty */  
  
    <option>                ::= "OPTION" <identifier> ";"  
  
    <declSequence>          ::= /* empty */  
  
    <statementSequence>     ::= <statement> <statementSequence>  
                              | /* empty */  
  
    <statement>             ::= <instruction> ";"  
                              | <namingStatement> ";"  
                              | <instLabel> ":"  
  
    <instruction>           ::= <ALUInstruction>  
                              | <TexInstruction>  
                              | <FlowInstruction>  
  
    <ALUInstruction>        ::= <VECTORop_instruction>  
                              | <SCALARop_instruction>  
                              | <BINSCop_instruction>  
                              | <BINop_instruction>  
                              | <VECSCAop_instruction>  
                              | <TRIop_instruction>  
                              | <SWZop_instruction>  
  
    <TexInstruction>        ::= <TEXop_instruction>  
                              | <TXDop_instruction>  
  
    <FlowInstruction>       ::= <BRAop_instruction>  
                              | <FLOWCCop_instruction>  
                              | <IFop_instruction>  
                              | <REPop_instruction>  
                              | <ENDFLOWop_instruction>  
  
    <VECTORop_instruction>  ::= <VECTORop> <opModifiers> <instResult> ","   
                                <instOperandV>  
  
    <VECTORop>              ::= "ABS"  
                              | "CEIL"  
                              | "FLR"  
                              | "FRC"  
                              | "I2F"  
                              | "LIT"  
                              | "MOV"  
                              | "NOT"  
                              | "NRM"  
                              | "PK2H"  
                              | "PK2US"  
                              | "PK4B"  
                              | "PK4UB"  
                              | "ROUND"  
                              | "SSG"  
                              | "TRUNC"  
  
    <SCALARop_instruction>  ::= <SCALARop> <opModifiers> <instResult> ","   
                                <instOperandS>  
  
    <SCALARop>              ::= "COS"  
                              | "EX2"  
                              | "LG2"  
                              | "RCC"  
                              | "RCP"  
                              | "RSQ"  
                              | "SCS"  
                              | "SIN"  
                              | "UP2H"  
                              | "UP2US"  
                              | "UP4B"  
                              | "UP4UB"  
  
    <BINSCop_instruction>   ::= <BINSCop> <opModifiers> <instResult> ","   
                                <instOperandS> "," <instOperandS>  
  
    <BINSCop>               ::= "POW"  
  
    <VECSCAop_instruction>  ::= <VECSCAop> <opModifiers> <instResult> ","   
                                <instOperandV> "," <instOperandS>  
  
    <VECSCAop>              ::= "DIV"  
                              | "SHL"  
                              | "SHR"  
                              | "MOD"  
  
    <BINop_instruction>     ::= <BINop> <opModifiers> <instResult> ","   
                                <instOperandV> "," <instOperandV>  
  
    <BINop>                 ::= "ADD"  
                              | "AND"  
                              | "DP3"  
                              | "DP4"  
                              | "DPH"  
                              | "DST"  
                              | "MAX"  
                              | "MIN"  
                              | "MUL"  
                              | "OR"  
                              | "RFL"  
                              | "SEQ"  
                              | "SFL"  
                              | "SGE"  
                              | "SGT"  
                              | "SLE"  
                              | "SLT"  
                              | "SNE"  
                              | "STR"  
                              | "SUB"  
                              | "XPD"  
                              | "DP2"  
                              | "XOR"  
  
    <TRIop_instruction>     ::= <TRIop> <opModifiers> <instResult> ","   
                                <instOperandV> "," <instOperandV> ","   
                                <instOperandV>  
  
    <TRIop>                 ::= "CMP"  
                              | "DP2A"  
                              | "LRP"  
                              | "MAD"  
                              | "SAD"  
                              | "X2D"  
  
    <SWZop_instruction>     ::= <SWZop> <opModifiers> <instResult> ","   
                                <instOperandVNS> "," <extendedSwizzle>  
  
    <SWZop>                 ::= "SWZ"  
  
    <TEXop_instruction>     ::= <TEXop> <opModifiers> <instResult> ","   
                                <instOperandV> "," <texAccess>  
  
    <TEXop>                 ::= "TEX"  
                              | "TXB"  
                              | "TXF"  
                              | "TXL"  
                              | "TXP"  
                              | "TXQ"  
  
    <TXDop_instruction>     ::= <TXDop> <opModifiers> <instResult> ","   
                                <instOperandV> "," <instOperandV> ","   
                                <instOperandV> "," <texAccess>  
  
    <TXDop>                 ::= "TXD"  
  
    <BRAop_instruction>     ::= <BRAop> <opModifiers> <instTarget>   
                                <optBranchCond>  
  
    <BRAop>                 ::= "CAL"  
  
    <FLOWCCop_instruction>  ::= <FLOWCCop> <opModifiers> <optBranchCond>  
  
    <FLOWCCop>              ::= "RET"  
                              | "BRK"  
                              | "CONT"  
  
    <IFop_instruction>      ::= <IFop> <opModifiers> <ccTest>  
  
    <IFop>                  ::= "IF"  
  
    <REPop_instruction>     ::= <REPop> <opModifiers> <instOperandV>  
                              | <REPop> <opModifiers>  
  
    <REPop>                 ::= "REP"  
  
    <ENDFLOWop_instruction> ::= <ENDFLOWop> <opModifiers>  
  
    <ENDFLOWop>             ::= "ELSE"  
                              | "ENDIF"  
                              | "ENDREP"  
  
    <opModifiers>           ::= <opModifierItem> <opModifiers>  
                              | /* empty */  
  
    <opModifierItem>        ::= "." <opModifier>  
  
    <opModifier>            ::= "F"  
                              | "U"  
                              | "S"  
                              | "CC"  
                              | "CC0"  
                              | "CC1"  
                              | "SAT"  
                              | "SSAT"  
                              | "NTC"  
                              | "S24"  
                              | "U24"  
                              | "HI"  
  
    <texAccess>             ::= <texImageUnit> "," <texTarget>  
                              | <texImageUnit> "," <texTarget> "," <texOffset>  
  
    <texImageUnit>          ::= "texture" <optArrayMemAbs>  
  
    <texTarget>             ::= "1D"  
                              | "2D"  
                              | "3D"  
                              | "CUBE"  
                              | "RECT"  
                              | "SHADOW1D"  
                              | "SHADOW2D"  
                              | "SHADOWRECT"  
                              | "ARRAY1D"  
                              | "ARRAY2D"  
                              | "SHADOWCUBE"  
                              | "SHADOWARRAY1D"  
                              | "SHADOWARRAY2D"  
  
    <texOffset>             ::= "(" <texOffsetComp> ")"  
                              | "(" <texOffsetComp> "," <texOffsetComp> ")"  
                              | "(" <texOffsetComp> "," <texOffsetComp> ","   
                                <texOffsetComp> ")"  
  
    <texOffsetComp>         ::= <optSign> <int>  
  
    <optBranchCond>         ::= /* empty */  
                              | <ccMask>  
  
    <instOperandV>          ::= <instOperandAbsV>  
                              | <instOperandBaseV>  
  
    <instOperandAbsV>       ::= <operandAbsNeg> "|" <instOperandBaseV> "|"  
  
    <instOperandBaseV>      ::= <operandNeg> <attribUseV>  
                              | <operandNeg> <tempUseV>  
                              | <operandNeg> <paramUseV>  
                              | <operandNeg> <bufferUseV>  
  
    <instOperandS>          ::= <instOperandAbsS>  
                              | <instOperandBaseS>  
  
    <instOperandAbsS>       ::= <operandAbsNeg> "|" <instOperandBaseS> "|"  
  
    <instOperandBaseS>      ::= <operandNeg> <attribUseS>  
                              | <operandNeg> <tempUseS>  
                              | <operandNeg> <paramUseS>  
                              | <operandNeg> <bufferUseS>  
  
    <instOperandVNS>        ::= <attribUseVNS>  
                              | <tempUseVNS>  
                              | <paramUseVNS>  
                              | <bufferUseVNS>  
  
    <operandAbsNeg>         ::= <optSign>  
  
    <operandNeg>            ::= <optSign>  
  
    <instResult>            ::= <instResultCC>  
                              | <instResultBase>  
  
    <instResultCC>          ::= <instResultBase> <ccMask>  
  
    <instResultBase>        ::= <tempUseW>  
                              | <resultUseW>  
  
    <namingStatement>       ::= <varMods> <ATTRIB_statement>  
                              | <varMods> <PARAM_statement>  
                              | <varMods> <TEMP_statement>  
                              | <varMods> <OUTPUT_statement>  
                              | <varMods> <BUFFER_statement>  
                              | <ALIAS_statement>  
  
    <ATTRIB_statement>      ::= "ATTRIB" <establishName> "=" <attribUseD>  
  
    <PARAM_statement>       ::= <PARAM_singleStmt>  
                              | <PARAM_multipleStmt>  
  
    <PARAM_singleStmt>      ::= "PARAM" <establishName> <paramSingleInit>  
  
    <PARAM_multipleStmt>    ::= "PARAM" <establishName> <optArraySize>   
                                <paramMultipleInit>  
  
    <paramSingleInit>       ::= "=" <paramUseDB>  
  
    <paramMultipleInit>     ::= "=" "{" <paramMultInitList> "}"  
  
    <paramMultInitList>     ::= <paramUseDM>  
                              | <paramUseDM> "," <paramMultInitList>  
  
    <TEMP_statement>        ::= "TEMP" <varNameList>  
  
    <OUTPUT_statement>      ::= "OUTPUT" <establishName> "=" <resultUseD>  
  
    <varMods>               ::= <varModifier> <varMods>  
                              | /* empty */  
  
    <varModifier>           ::= "SHORT"  
                              | "LONG"  
                              | "INT"  
                              | "UINT"  
                              | "FLOAT"  
  
    <ALIAS_statement>       ::= "ALIAS" <establishName> "=" <establishedName>  
  
    <BUFFER_statement>      ::= <bufferDeclType> <establishName>   
                                <bufferSingleInit>  
                              | <bufferDeclType> <establishName>   
                                <optArraySize> <bufferMultInit>  
  
    <bufferDeclType>        ::= "BUFFER"  
                              | "BUFFER4"  
  
    <bufferSingleInit>      ::= "=" <bufferUseDB>  
  
    <bufferMultInit>        ::= "=" "{" <bufferMultInitList> "}"  
  
    <bufferMultInitList>    ::= <bufferUseDM>  
                              | <bufferUseDM> "," <bufferMultInitList>  
  
    <varNameList>           ::= <establishName>  
                              | <establishName> "," <varNameList>  
  
    <attribUseV>            ::= <attribBasic> <swizzleSuffix>  
                              | <attribVarName> <swizzleSuffix>  
                              | <attribVarName> <arrayMem> <swizzleSuffix>  
                              | <attribColor> <swizzleSuffix>  
                              | <attribColor> "." <colorType> <swizzleSuffix>  
  
    <attribUseS>            ::= <attribBasic> <scalarSuffix>  
                              | <attribVarName> <scalarSuffix>  
                              | <attribVarName> <arrayMem> <scalarSuffix>  
                              | <attribColor> <scalarSuffix>  
                              | <attribColor> "." <colorType> <scalarSuffix>  
  
    <attribUseVNS>          ::= <attribBasic>  
                              | <attribVarName>  
                              | <attribVarName> <arrayMem>  
                              | <attribColor>  
                              | <attribColor> "." <colorType>  
  
    <attribUseD>            ::= <attribBasic>  
                              | <attribColor>  
                              | <attribColor> "." <colorType>  
                              | <attribMulti>  
  
    <paramUseV>             ::= <paramVarName> <optArrayMem> <swizzleSuffix>  
                              | <stateSingleItem> <swizzleSuffix>  
                              | <programSingleItem> <swizzleSuffix>  
                              | <constantVector> <swizzleSuffix>  
                              | <constantScalar>  
  
    <paramUseS>             ::= <paramVarName> <optArrayMem> <scalarSuffix>  
                              | <stateSingleItem> <scalarSuffix>  
                              | <programSingleItem> <scalarSuffix>  
                              | <constantVector> <scalarSuffix>  
                              | <constantScalar>  
  
    <paramUseVNS>           ::= <paramVarName> <optArrayMem>  
                              | <stateSingleItem>  
                              | <programSingleItem>  
                              | <constantVector>  
                              | <constantScalar>  
  
    <paramUseDB>            ::= <stateSingleItem>  
                              | <programSingleItem>  
                              | <constantVector>  
                              | <signedConstantScalar>  
  
    <paramUseDM>            ::= <stateMultipleItem>  
                              | <programMultipleItem>  
                              | <constantVector>  
                              | <signedConstantScalar>  
  
    <stateMultipleItem>     ::= <stateSingleItem>  
                              | "state" "." <stateMatrixRows>  
  
    <stateSingleItem>       ::= "state" "." <stateMaterialItem>  
                              | "state" "." <stateLightItem>  
                              | "state" "." <stateLightModelItem>  
                              | "state" "." <stateLightProdItem>  
                              | "state" "." <stateFogItem>  
                              | "state" "." <stateMatrixRow>  
                              | "state" "." <stateTexGenItem>  
                              | "state" "." <stateClipPlaneItem>  
                              | "state" "." <statePointItem>  
                              | "state" "." <stateTexEnvItem>  
                              | "state" "." <stateDepthItem>  
  
    <stateMaterialItem>     ::= "material" "." <stateMatProperty>  
                              | "material" "." <faceType> "."   
                                <stateMatProperty>  
  
    <stateMatProperty>      ::= "ambient"  
                              | "diffuse"  
                              | "specular"  
                              | "emission"  
                              | "shininess"  
  
    <stateLightItem>        ::= "light" <arrayMemAbs> "." <stateLightProperty>  
  
    <stateLightProperty>    ::= "ambient"  
                              | "diffuse"  
                              | "specular"  
                              | "position"  
                              | "attenuation"  
                              | "spot" "." <stateSpotProperty>  
                              | "half"  
  
    <stateSpotProperty>     ::= "direction"  
  
    <stateLightModelItem>   ::= "lightmodel" "." <stateLModProperty>  
  
    <stateLModProperty>     ::= "ambient"  
                              | "scenecolor"  
                              | <faceType> "." "scenecolor"  
  
    <stateLightProdItem>    ::= "lightprod" <arrayMemAbs> "."   
                                <stateLProdProperty>  
                              | "lightprod" <arrayMemAbs> "." <faceType> "."   
                                <stateLProdProperty>  
  
    <stateLProdProperty>    ::= "ambient"  
                              | "diffuse"  
                              | "specular"  
  
    <stateFogItem>          ::= "fog" "." <stateFogProperty>  
  
    <stateFogProperty>      ::= "color"  
                              | "params"  
  
    <stateMatrixRows>       ::= <stateMatrixItem>  
                              | <stateMatrixItem> "." <stateMatModifier>  
                              | <stateMatrixItem> "." "row" <arrayRange>  
                              | <stateMatrixItem> "." <stateMatModifier> "."   
                                "row" <arrayRange>  
  
    <stateMatrixRow>        ::= <stateMatrixItem> "." "row" <arrayMemAbs>  
                              | <stateMatrixItem> "." <stateMatModifier> "."   
                                "row" <arrayMemAbs>  
  
    <stateMatrixItem>       ::= "matrix" "." <stateMatrixName>  
  
    <stateMatModifier>      ::= "inverse"  
                              | "transpose"  
                              | "invtrans"  
  
    <stateMatrixName>       ::= "modelview" <optArrayMemAbs>  
                              | "projection"  
                              | "mvp"  
                              | "texture" <optArrayMemAbs>  
                              | "program" <arrayMemAbs>  
  
    <stateTexGenItem>       ::= "texgen" <optArrayMemAbs> "."   
                                <stateTexGenType> "." <stateTexGenCoord>  
  
    <stateTexGenType>       ::= "eye"  
                              | "object"  
  
    <stateTexGenCoord>      ::= "s"  
                              | "t"  
                              | "r"  
                              | "q"  
  
    <stateClipPlaneItem>    ::= "clip" <arrayMemAbs> "." "plane"  
  
    <statePointItem>        ::= "point" "." <statePointProperty>  
  
    <statePointProperty>    ::= "size"  
                              | "attenuation"  
  
    <stateTexEnvItem>       ::= "texenv" <optArrayMemAbs> "."   
                                <stateTexEnvProperty>  
  
    <stateTexEnvProperty>   ::= "color"  
  
    <stateDepthItem>        ::= "depth" "." <stateDepthProperty>  
  
    <stateDepthProperty>    ::= "range"  
  
    <programSingleItem>     ::= <progEnvParam>  
                              | <progLocalParam>  
  
    <programMultipleItem>   ::= <progEnvParams>  
                              | <progLocalParams>  
  
    <progEnvParams>         ::= "program" "." "env" <arrayMemAbs>  
                              | "program" "." "env" <arrayRange>  
  
    <progEnvParam>          ::= "program" "." "env" <arrayMemAbs>  
  
    <progLocalParams>       ::= "program" "." "local" <arrayMemAbs>  
                              | "program" "." "local" <arrayRange>  
  
    <progLocalParam>        ::= "program" "." "local" <arrayMemAbs>  
  
    <constantVector>        ::= "{" <constantVectorList> "}"  
  
    <constantVectorList>    ::= <signedConstantScalar>  
                              | <signedConstantScalar> ","   
                                <signedConstantScalar>  
                              | <signedConstantScalar> ","   
                                <signedConstantScalar> ","   
                                <signedConstantScalar>  
                              | <signedConstantScalar> ","   
                                <signedConstantScalar> ","   
                                <signedConstantScalar> ","   
                                <signedConstantScalar>  
  
    <signedConstantScalar>  ::= <optSign> <constantScalar>  
  
    <constantScalar>        ::= <floatConstant>  
                              | <intConstant>  
  
    <floatConstant>         ::= <float>  
  
    <intConstant>           ::= <int>  
  
    <tempUseV>              ::= <tempVarName> <swizzleSuffix>  
  
    <tempUseS>              ::= <tempVarName> <scalarSuffix>  
  
    <tempUseVNS>            ::= <tempVarName>  
  
    <tempUseW>              ::= <tempVarName> <optWriteMask>  
  
    <resultUseW>            ::= <resultBasic> <optWriteMask>  
                              | <resultVarName> <optWriteMask>  
  
    <resultUseD>            ::= <resultBasic>  
  
    <bufferUseV>            ::= <bufferVarName> <optArrayMem> <swizzleSuffix>  
  
    <bufferUseS>            ::= <bufferVarName> <optArrayMem> <scalarSuffix>  
  
    <bufferUseVNS>          ::= <bufferVarName> <optArrayMem>  
  
    <bufferUseDB>           ::= <bufferBinding> <arrayMemAbs>  
  
    <bufferUseDM>           ::= <bufferBinding> <arrayMemAbs>  
                              | <bufferBinding> <arrayRange>  
                              | <bufferBinding>  
  
    <bufferBinding>         ::= "program" "." "buffer" <arrayMemAbs>  
  
    <optArraySize>          ::= "[" "]"  
                              | "[" <int> "]"  
  
    <optArrayMem>           ::= /* empty */  
                              | <arrayMem>  
  
    <arrayMem>              ::= <arrayMemAbs>  
                              | <arrayMemRel>  
  
    <optArrayMemAbs>        ::= /* empty */  
                              | <arrayMemAbs>  
  
    <arrayMemAbs>           ::= "[" <int> "]"  
  
    <arrayMemRel>           ::= "[" <arrayMemReg> <arrayMemOffset> "]"  
  
    <arrayMemReg>           ::= <addrUseS>  
  
    <arrayMemOffset>        ::= /* empty */  
                              | "+" <int>  
                              | "-" <int>  
  
    <arrayRange>            ::= "[" <int> ".." <int> "]"  
  
    <addrUseS>              ::= <addrVarName> <scalarSuffix>  
  
    <ccMask>                ::= "(" <ccTest> ")"  
  
    <ccTest>                ::= <ccMaskRule> <swizzleSuffix>  
  
    <ccMaskRule>            ::= "EQ"  
                              | "GE"  
                              | "GT"  
                              | "LE"  
                              | "LT"  
                              | "NE"  
                              | "TR"  
                              | "FL"  
                              | "EQ0"  
                              | "GE0"  
                              | "GT0"  
                              | "LE0"  
                              | "LT0"  
                              | "NE0"  
                              | "TR0"  
                              | "FL0"  
                              | "EQ1"  
                              | "GE1"  
                              | "GT1"  
                              | "LE1"  
                              | "LT1"  
                              | "NE1"  
                              | "TR1"  
                              | "FL1"  
                              | "NAN"  
                              | "NAN0"  
                              | "NAN1"  
                              | "LEG"  
                              | "LEG0"  
                              | "LEG1"  
                              | "CF"  
                              | "CF0"  
                              | "CF1"  
                              | "NCF"  
                              | "NCF0"  
                              | "NCF1"  
                              | "OF"  
                              | "OF0"  
                              | "OF1"  
                              | "NOF"  
                              | "NOF0"  
                              | "NOF1"  
                              | "AB"  
                              | "AB0"  
                              | "AB1"  
                              | "BLE"  
                              | "BLE0"  
                              | "BLE1"  
                              | "SF"  
                              | "SF0"  
                              | "SF1"  
                              | "NSF"  
                              | "NSF0"  
                              | "NSF1"  
  
    <optWriteMask>          ::= /* empty */  
                              | <xyzwMask>  
                              | <rgbaMask>  
  
    <xyzwMask>              ::= "." "x"  
                              | "." "y"  
                              | "." "xy"  
                              | "." "z"  
                              | "." "xz"  
                              | "." "yz"  
                              | "." "xyz"  
                              | "." "w"  
                              | "." "xw"  
                              | "." "yw"  
                              | "." "xyw"  
                              | "." "zw"  
                              | "." "xzw"  
                              | "." "yzw"  
                              | "." "xyzw"  
  
    <rgbaMask>              ::= "." "r"  
                              | "." "g"  
                              | "." "rg"  
                              | "." "b"  
                              | "." "rb"  
                              | "." "gb"  
                              | "." "rgb"  
                              | "." "a"  
                              | "." "ra"  
                              | "." "ga"  
                              | "." "rga"  
                              | "." "ba"  
                              | "." "rba"  
                              | "." "gba"  
                              | "." "rgba"  
  
    <swizzleSuffix>         ::= /* empty */  
                              | "." <component>  
                              | "." <xyzwSwizzle>  
                              | "." <rgbaSwizzle>  
  
    <extendedSwizzle>       ::= <extSwizComp> "," <extSwizComp> ","   
                                <extSwizComp> "," <extSwizComp>  
  
    <extSwizComp>           ::= <optSign> <xyzwExtSwizSel>  
                              | <optSign> <rgbaExtSwizSel>  
  
    <xyzwExtSwizSel>        ::= "0"  
                              | "1"  
                              | <xyzwComponent>  
  
    <rgbaExtSwizSel>        ::= <rgbaComponent>  
  
    <scalarSuffix>          ::= "." <component>  
  
    <component>             ::= <xyzwComponent>  
                              | <rgbaComponent>  
  
    <xyzwComponent>         ::= "x"  
                              | "y"  
                              | "z"  
                              | "w"  
  
    <rgbaComponent>         ::= "r"  
                              | "g"  
                              | "b"  
                              | "a"  
  
    <optSign>               ::= /* empty */  
                              | "-"  
                              | "+"  
  
    <faceType>              ::= "front"  
                              | "back"  
  
    <colorType>             ::= "primary"  
                              | "secondary"  
  
    <instLabel>             ::= <identifier>  
  
    <instTarget>            ::= <identifier>  
  
    <establishedName>       ::= <identifier>  
  
    <establishName>         ::= <identifier>  
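
    The <xyzwMask> and <rgbaMask> alternatives above enumerate every
    non-empty subset of components drawn from a single set, always listed in
    x, y, z, w (or r, g, b, a) order.  As a hedged illustration, a
    hypothetical C helper, valid_write_mask(), could check the text that
    follows the "." against those alternatives without listing all of them:

```c
#include <stdbool.h>
#include <string.h>

/* Returns true if `mask` (the text after ".") is one of the <xyzwMask> or
 * <rgbaMask> alternatives: a non-empty set of components drawn from a
 * single set and listed in strictly increasing x,y,z,w (or r,g,b,a) order. */
bool valid_write_mask(const char *mask)
{
    static const char *const sets[2] = { "xyzw", "rgba" };

    for (int s = 0; s < 2; ++s) {
        int last = -1;                 /* index of the previous component */
        const char *p = mask;
        for (; *p; ++p) {
            const char *hit = strchr(sets[s], *p);
            if (hit == NULL || (int)(hit - sets[s]) <= last)
                break;                 /* wrong set, repeated, or out of order */
            last = (int)(hit - sets[s]);
        }
        if (*p == '\0' && last >= 0)   /* consumed everything, non-empty */
            return true;
    }
    return false;
}
```

    For example, ".xw" and ".rga" are accepted, while ".yx", ".xx", and the
    mixed ".xg" are rejected, since none of those strings appear in the
    grammar.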
  
  
    The <int> rule matches an integer constant.  The integer consists of a  
    sequence of one or more digits ("0" through "9"), or a sequence in  
    hexadecimal form beginning with "0x" followed by a sequence of one or more  
    hexadecimal digits ("0" through "9", "a" through "f", "A" through "F").  
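
    As an illustration only, the <int> rule could be matched by a C function
    like the hypothetical match_int_constant() below.  A leading sign is not
    part of <int> (it comes from <optSign>), and the 32-bit limit is an
    assumption standing in for the rule, stated under Constant Bindings, that
    integer constants too large to be represented cause a load failure.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdint.h>

/* Sketch of the <int> lexical rule: one or more decimal digits, or "0x"
 * followed by one or more hexadecimal digits.  Returns true only if the
 * whole string matches, storing the value in *out.  The 32-bit range
 * check is an assumption of this sketch. */
bool match_int_constant(const char *s, uint32_t *out)
{
    uint64_t value = 0;
    unsigned base = 10;

    if (s[0] == '0' && s[1] == 'x') {   /* hexadecimal form */
        base = 16;
        s += 2;
    }
    if (*s == '\0')
        return false;                   /* at least one digit is required */
    for (; *s; ++s) {
        unsigned d;
        if (isdigit((unsigned char)*s))
            d = (unsigned)(*s - '0');
        else if (base == 16 && isxdigit((unsigned char)*s))
            d = (unsigned)(tolower((unsigned char)*s) - 'a' + 10);
        else
            return false;               /* character not allowed here */
        value = value * base + d;
        if (value > UINT32_MAX)
            return false;               /* too large to be represented */
    }
    *out = (uint32_t)value;
    return true;
}
```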
  
    The <float> rule matches a floating-point constant consisting of an  
    integer part, a decimal point, a fraction part, an "e" or "E", and an  
    optionally signed integer exponent.  The integer and fraction parts both  
    consist of a sequence of one or more digits ("0" through "9").  Either the  
    integer part or the fraction part (not both) may be missing; either the  
    decimal point or the "e" (or "E") and the exponent (not both) may be  
    missing.  Most grammar rules that allow floating-point values also allow  
    integers matching the <int> rule.  
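
    A minimal C sketch of a recognizer for the <float> rule follows; the
    function name is hypothetical.  It requires at least one of the integer
    and fraction parts, and at least one of the decimal point and the
    exponent, so a bare digit string (an <int>) is not accepted as a <float>.

```c
#include <ctype.h>
#include <stdbool.h>

/* Advance *p past a run of decimal digits and return how many were seen. */
static int skip_digits(const char **p)
{
    int n = 0;
    while (isdigit((unsigned char)**p)) {
        ++*p;
        ++n;
    }
    return n;
}

/* Returns true if the whole string matches the <float> rule. */
bool match_float_constant(const char *s)
{
    int int_digits = skip_digits(&s);
    int frac_digits = 0;
    bool has_point = false, has_exp = false;

    if (*s == '.') {
        has_point = true;
        ++s;
        frac_digits = skip_digits(&s);
    }
    if (int_digits == 0 && frac_digits == 0)
        return false;          /* integer and fraction parts both missing */
    if (*s == 'e' || *s == 'E') {
        has_exp = true;
        ++s;
        if (*s == '+' || *s == '-')
            ++s;               /* optionally signed exponent */
        if (skip_digits(&s) == 0)
            return false;      /* exponent needs at least one digit */
    }
    if (!has_point && !has_exp)
        return false;          /* point and exponent both missing */
    return *s == '\0';
}
```

    Under this sketch, "1.5", "1.", ".5", "1e5", and "2.5E-3" match, while
    ".", "1", "1e", and "e5" do not.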
  
    The <identifier> rule matches a sequence of one or more letters ("A"  
    through "Z", "a" through "z"), digits ("0" through "9"), underscores ("_"),  
    or dollar signs ("$"); the first character must not be a digit.  Upper  
    and lower case letters are considered different (names are  
    case-sensitive).  The following strings are reserved keywords and may not  
    be used as identifiers:  "fragment" (for fragment programs only), "vertex"  
    (for vertex and geometry programs), "primitive" (for fragment and geometry  
    programs), "program", "result", "state", and "texture".  
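
    The identifier rules above, including the program-type-dependent
    reserved keywords, can be sketched in C as follows.  The enum prog_type
    and the function is_valid_identifier() are hypothetical names used only
    for this illustration; the comparison is case-sensitive, as the rule
    requires.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical program types for this sketch. */
enum prog_type { PROG_VERTEX, PROG_GEOMETRY, PROG_FRAGMENT };

static bool is_ident_char(char c)
{
    return isalnum((unsigned char)c) || c == '_' || c == '$';
}

/* Returns true if s may be used as an identifier in a program of the
 * given type: valid characters, no leading digit, not a reserved keyword. */
bool is_valid_identifier(const char *s, enum prog_type type)
{
    static const char *const always[] = { "program", "result", "state",
                                          "texture" };

    if (s[0] == '\0' || isdigit((unsigned char)s[0]))
        return false;
    for (const char *p = s; *p; ++p)
        if (!is_ident_char(*p))
            return false;

    for (size_t i = 0; i < sizeof always / sizeof always[0]; ++i)
        if (strcmp(s, always[i]) == 0)
            return false;                 /* reserved for every program type */

    if (strcmp(s, "fragment") == 0 && type == PROG_FRAGMENT)
        return false;
    if (strcmp(s, "vertex") == 0 && type != PROG_FRAGMENT)
        return false;                     /* vertex and geometry programs */
    if (strcmp(s, "primitive") == 0 && type != PROG_VERTEX)
        return false;                     /* fragment and geometry programs */
    return true;
}
```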
  
    The <tempVarName>, <paramVarName>, <attribVarName>, <resultVarName>, and  
    <bufferVarName> rules match identifiers that have been previously established  
    as names of temporary, program parameter, attribute, result, and program  
    parameter buffer variables, respectively.  
  
    The <xyzwSwizzle> and <rgbaSwizzle> rules match any 4-character string  
    consisting only of the characters "x", "y", "z", and "w" (<xyzwSwizzle>)  
    or "r", "g", "b", and "a" (<rgbaSwizzle>).  
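
    A hedged C sketch of parsing a <swizzleSuffix> into per-component
    selectors is shown below; parse_swizzle() is a hypothetical name.  An
    empty suffix selects x, y, z, w in order, and the four characters of a
    full swizzle may not mix the two sets.  Replicating a single component
    to all four positions is an assumption of this sketch, following the
    ARB_vertex_program convention.

```c
#include <stdbool.h>
#include <string.h>

/* Index (0..3) of c within one component set, or -1 if absent. */
static int comp_index(char c, const char *set)
{
    const char *p = strchr(set, c);
    return (c != '\0' && p != NULL) ? (int)(p - set) : -1;
}

/* Parse the text after "." (or "" for no suffix).  On success sel[i] is
 * the source component read for result component i. */
bool parse_swizzle(const char *text, int sel[4])
{
    static const char *const sets[2] = { "xyzw", "rgba" };
    size_t len = strlen(text);

    if (len == 0) {                       /* no suffix: identity .xyzw */
        for (int i = 0; i < 4; ++i)
            sel[i] = i;
        return true;
    }
    for (int s = 0; s < 2; ++s) {
        if (len == 1) {
            int c = comp_index(text[0], sets[s]);
            if (c >= 0) {                 /* assumed: replicate one component */
                for (int i = 0; i < 4; ++i)
                    sel[i] = c;
                return true;
            }
        } else if (len == 4) {
            bool ok = true;
            for (int i = 0; i < 4 && ok; ++i) {
                sel[i] = comp_index(text[i], sets[s]);
                ok = sel[i] >= 0;         /* all four from the same set */
            }
            if (ok)
                return true;
        }
    }
    return false;
}
```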
  
    The error INVALID_OPERATION is generated if a program fails to load  
    because it is not syntactically correct or for one of the semantic  
    restrictions described in the following sections.  
  
    A successfully loaded program is parsed into a sequence of instructions.  
    Each instruction is identified by its tokenized name.  The operation of  
    these instructions when executed is defined in section 2.X.4.  A  
    successfully loaded program string replaces the program string previously  
    loaded into the specified program object.  If the OUT_OF_MEMORY error is  
    generated by ProgramStringARB, no change is made to the previous contents  
    of the current program object.  
  
  
    Section 2.X.3, Program Variables  
  
    Programs may operate on a number of different variables during their  
    execution.  The following sections define the different classes of  
    variables that can be declared and used by a program.    
  
    Some variable classes require variable bindings.  Variable classes with  
    bindings refer to state that is either generated or consumed outside the  
    program.  Examples of variable bindings include a vertex's normal, the  
    position of a vertex computed by a vertex program, an interpolated texture  
    coordinate, and the diffuse color of light 1.  Variables that are used  
    only during program execution do not have bindings.  
  
    Variables may be declared explicitly according to the <namingStatement>  
    grammar rule.  Explicit variable declarations allow a program to establish  
    a variable name that can be used to refer to a specified resource in  
    subsequent instructions.  Variables may be declared anywhere in the  
    program string, but must be declared prior to use.  A program will fail to  
    load if it declares the same variable name more than once, or if it refers  
    to a variable name that has not been previously declared in the program  
    string.  
  
    Variables may also be declared implicitly, simply by using a variable  
    binding as an operand in a program instruction.  Such uses are considered  
    to automatically create a nameless variable using the specified binding.  
    Only variables from classes with bindings can be declared implicitly.  
  
  
    Section 2.X.3.1, Program Variable Types  
  
    Explicit variable declarations may include one or more modifiers that  
    specify additional information about the variable, such as the size and  
    data type of the components of the variable.  Variable modifiers are  
    specified according to the <varModifier> grammar rule.  
  
    By default, variables are considered typeless.  They can be used in  
    instructions that read or write the variable as floating-point values,  
    signed integers, or unsigned integers.  If a variable is written using one  
    data type but then read using a different one, the results of the  
    operation are undefined.  Variables with bindings are considered to be  
    read or written when their values are produced or consumed; the data type  
    used by the GL is specified in the description of each binding.  
  
    Explicitly declared variables may optionally have one data type modifier,  
    which can be used to detect data type mismatch errors.  Type modifiers of  
    "INT", "UINT", and "FLOAT" indicate that the components of the variable  
    are stored as signed integers, unsigned integers, or floating-point  
    values, respectively.  A program will fail to load if it attempts to read  
    or write a variable using a data type other than the one indicated by the  
    data type modifier.  Variables without a data type modifier can be read or  
    written using any data type.  
  
    Explicitly declared variables may optionally have one storage size  
    modifier.  Variables declared as "SHORT" will be represented using at least  
    16 bits per component.  "SHORT" floating-point values will have at least 5  
    bits of exponent and 10 bits of mantissa.  Variables declared as "LONG"  
    will be represented with at least 32 bits per component.  "LONG"  
    floating-point values will have at least 8 bits of exponent and 23 bits of  
    mantissa.  If no size modifier is provided, the GL will automatically  
    select component sizes.  Implementations are not required to support more  
    than one component size, so "SHORT", "LONG", and the default could all  
    refer to the same component size.  
  
    Each variable declaration can include at most one data type and one  
    storage size modifier.  A program will fail to load if it specifies  
    multiple data type or multiple storage size modifiers in a single variable  
    declaration.  
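
    The "at most one of each" rule can be sketched as a simple count over a
    declaration's modifier tokens.  The function modifier_list_ok() is
    hypothetical, and it ignores any modifiers other than the data type and
    storage size modifiers (for example, the fragment-only "FLAT").

```c
#include <stdbool.h>
#include <string.h>

/* Returns true if the modifier list has at most one data type modifier
 * (INT, UINT, FLOAT) and at most one storage size modifier (SHORT, LONG). */
bool modifier_list_ok(const char *const mods[], int count)
{
    int type_mods = 0, size_mods = 0;

    for (int i = 0; i < count; ++i) {
        if (strcmp(mods[i], "INT") == 0 || strcmp(mods[i], "UINT") == 0 ||
            strcmp(mods[i], "FLOAT") == 0)
            ++type_mods;
        else if (strcmp(mods[i], "SHORT") == 0 || strcmp(mods[i], "LONG") == 0)
            ++size_mods;
        /* any other modifier is outside the scope of this check */
    }
    return type_mods <= 1 && size_mods <= 1;
}
```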
  
    (NOTE:  Fragment programs also support the modifiers "FLAT", "CENTROID",  
    and "NOPERSPECTIVE", which control how per-fragment attribute values are  
    produced.  These modifiers are described in detail in the  
    NV_fragment_program4 specification.)  
  
    Explicitly declared variables of all types may be declared as arrays.  An  
    array variable has one or more members, numbered 0 through <n>-1, where  
    <n> is the number of entries in the array.  The total number of entries in  
    the array can be declared using the <optArraySize> grammar rule.  For  
    variable classes without bindings, an array size must be specified in the  
    program, and must be a positive integer.  For variable classes with  
    bindings, a declared size is optional, and is taken from the number of  
    bindings assigned in the declaration if omitted.  A program will fail to  
    load if the declared size of an array variable does not match the number  
    of assigned bindings.  
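
    The array size rules above reduce to a small decision, sketched here in
    C.  The function array_size_ok() and its encoding of an omitted size as
    -1 are assumptions made only for this illustration.

```c
#include <stdbool.h>

/* has_bindings  - the variable class has bindings (e.g. attributes)
 * declared_size - the <int> inside "[n]", or -1 if the size is omitted
 * num_bindings  - number of bindings assigned in the declaration
 * Returns false where the program would fail to load. */
bool array_size_ok(bool has_bindings, int declared_size, int num_bindings)
{
    if (!has_bindings)
        return declared_size > 0;    /* size required and must be positive */
    if (declared_size < 0)
        return true;                 /* size taken from the binding count */
    return declared_size == num_bindings;
}
```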
  
    When a variable is declared as an array, instructions that use the  
    variable must specify an array member to access according to the  
    <arrayMem> grammar rule.  A program will fail to load if it contains an  
    instruction that accesses an array variable without specifying an array  
    member or an instruction that specifies an array member for a non-array  
    variable.  
  
  
    Section 2.X.3.2, Program Attribute Variables  
  
    Program attribute variables represent per-vertex or per-fragment inputs to  
    the program.  All attribute variables have associated bindings, and are  
    read-only during program execution.  Attribute variables may be declared  
    explicitly via the <ATTRIB_statement> grammar rule, or implicitly by using  
    an attribute binding in an instruction.  
  
    The set of available attribute bindings depends on the program type, and  
    is enumerated in the specifications for each program type.  
  
    The set of bindings allowed for attribute array variables is limited to  
    attribute state grouped in arrays (e.g., texture coordinates, generic  
    vertex attributes).  Additionally, all bindings assigned to the array must  
    be of the same binding type and must increase consecutively.  Examples of  
    valid and invalid binding lists include:  
  
      vertex.attrib[1], vertex.attrib[2]      # valid, 2-entry array  
      vertex.texcoord[0..3]                   # valid, 4-entry array  
      vertex.attrib[1], vertex.attrib[3]      # invalid, skipped attrib 2  
      vertex.attrib[2], vertex.attrib[1]      # invalid, wrong order  
      vertex.attrib[1], vertex.texcoord[2]    # invalid, different types  
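
    The "same type, consecutively increasing" requirement can be checked as
    sketched below.  The struct attrib_binding representation is
    hypothetical, and the sketch does not verify that the binding type is
    one of the attribute arrays allowed by the program type.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical representation of a binding such as "vertex.attrib[2]". */
struct attrib_binding {
    const char *type;   /* e.g. "vertex.attrib", "vertex.texcoord" */
    int index;          /* array index within that binding type */
};

/* Returns true if the bindings may form one attribute array variable. */
bool valid_attrib_array(const struct attrib_binding *b, int count)
{
    for (int i = 1; i < count; ++i) {
        if (strcmp(b[i].type, b[0].type) != 0)
            return false;                 /* different binding types */
        if (b[i].index != b[i - 1].index + 1)
            return false;                 /* skipped entry or wrong order */
    }
    return count > 0;
}
```

    Applied to the five example lists, this returns true for the first two
    and false for the remaining three.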
  
    Additionally, attribute bindings may be used in no more than one array  
    variable accessed with relative addressing.  
  
    Implementations may have a limit on the total number of attribute binding  
    components used by each program target (MAX_PROGRAM_ATTRIB_COMPONENTS).  
    Programs that use more attribute binding components than this limit will  
    fail to load.  The method of counting used attribute binding components is  
    implementation-dependent, but must satisfy the following properties:  
  
      * If an attribute binding is not referenced in a program, or is  
        referenced only in declarations of attribute variables that are not  
        used, none of its components are counted.  
  
      * An attribute binding component may be counted as used only if there  
        exists an instruction operand where  
  
          - the component is enabled for read by the swizzle pattern (Section  
            2.X.4.2), and  
  
          - the attribute binding is  
  
              - referenced directly by the operand,  
  
              - bound to a declared variable referenced by the operand, or  
  
              - bound to a declared array variable where another binding in  
                the array satisfies one of the two previous conditions.  
  
        Implementations are not required to optimize out unused elements of an  
        attribute array or components that are used in only some elements of  
        an array.  The last of these rules is intended to cover the case where  
        the same attribute binding is used in multiple variables.  
  
        For example, an operand whose swizzle pattern selects only the x  
        component may result in the x component of an attribute binding being  
        counted, but may never result in the counting of the y, z, or w  
        components of any attribute binding.  
  
      * Implementations are not required to determine that components read by  
        an instruction are actually unused due to:  
  
          - instruction write masks (for example, a component-wise ADD  
            operation that only writes the "x" component doesn't have to read  
            the "y", "z", and "w" components of its operands) or  
  
          - any other properties of the instruction (for example, the DP3  
            instruction, which computes a 3-component dot product, doesn't  
            have to read the "w" component of its operands).  
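
    For a single operand, the components that may be counted are exactly
    those its swizzle selects.  The hypothetical C helper below computes that
    set as a 4-bit mask (bit 0 = x through bit 3 = w).  It deliberately does
    not shrink the mask using write masks or instruction semantics, since the
    rules above do not require that, and it assumes the suffix text was
    already validated against <swizzleSuffix>.

```c
#include <string.h>

/* Mask of components enabled for read by one operand's swizzle suffix
 * (text after ".", or "" when there is no suffix). */
unsigned countable_component_mask(const char *swizzle)
{
    static const char xyzw[] = "xyzw";
    static const char rgba[] = "rgba";
    unsigned mask = 0;

    if (swizzle[0] == '\0')
        return 0xFu;                      /* no suffix reads .xyzw */
    for (const char *p = swizzle; *p; ++p) {
        const char *set = xyzw;
        const char *hit = strchr(xyzw, *p);
        if (hit == NULL) {
            set = rgba;
            hit = strchr(rgba, *p);
        }
        if (hit != NULL)
            mask |= 1u << (unsigned)(hit - set);
    }
    return mask;
}
```

    For instance, ".xxxx" yields only the x bit, so such an operand may
    never cause y, z, or w of any attribute binding to be counted.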
  
  
    Section 2.X.3.3, Program Parameters  
  
    Program parameter variables are used as constants during program  
    execution.  All program parameter variables have associated bindings and  
    are read-only during program execution.  Program parameters retain their  
    values across program invocations, although their values may change  
    between invocations due to GL state changes.  Program parameter variables  
    may be declared explicitly via the <PARAM_statement> grammar rule, or  
    implicitly by using a parameter binding in an instruction.  Except where  
    otherwise specified, program parameter bindings always specify  
    floating-point values.  
  
    When declaring program parameter array variables, all bindings are  
    supported and can be assigned to array members in any order.  The only  
    restriction is that no parameter binding may be used more than once in  
    array variables accessed using relative addressing.  A program will fail  
    to load if any program parameter binding is used more than once in a  
    single array accessed using relative addressing or used at least once in  
    two or more arrays accessed using relative addressing.  
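
    The restriction on relatively addressed parameter arrays amounts to
    requiring that all of their bindings be distinct.  In the sketch below,
    each binding is modeled as a small integer id; that encoding and the
    function relative_bindings_unique() are hypothetical.  Arrays accessed
    only with absolute indices are simply not passed in.

```c
#include <stdbool.h>

/* arrays[k] lists the binding ids of the k-th parameter array that the
 * program accesses with relative addressing; lens[k] is its length.
 * Returns false if any binding id occurs twice, whether within one array
 * or across two different arrays. */
bool relative_bindings_unique(const int *const arrays[], const int lens[],
                              int narrays)
{
    for (int a = 0; a < narrays; ++a)
        for (int i = 0; i < lens[a]; ++i)
            /* compare against every later entry, here or in later arrays */
            for (int b = a; b < narrays; ++b)
                for (int j = (b == a ? i + 1 : 0); j < lens[b]; ++j)
                    if (arrays[a][i] == arrays[b][j])
                        return false;
    return true;
}
```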
  
  
    Constant Bindings  
  
    If a program parameter binding matches the <constantScalar> or  
    <signedConstantScalar> grammar rules, the corresponding program parameter  
    variable is bound to the vector (X,X,X,X), where X is the value of the  
    specified constant.  
  
    If a program parameter binding matches <constantVector>, the corresponding  
    program parameter variable is bound to the vector (X,Y,Z,W), where X, Y,  
    Z, and W are the values corresponding to the first, second, third, and  
    fourth match of <signedConstantScalar>.  If fewer than four constants are  
    specified, any of Y, Z, and W whose constants are not specified assume  
    the values 0, 0, and 1, respectively.  
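
    The two constant forms expand differently, as this floating-point C
    sketch shows; both function names are hypothetical.  A scalar constant
    X binds (X,X,X,X), while a one-element vector {X} binds (X,0,0,1).

```c
/* <constantVector> with n (1..4) entries: unspecified Y, Z, W default to
 * 0, 0, and 1. */
void expand_constant_vector(const float *vals, int n, float out[4])
{
    static const float defaults[4] = { 0.0f, 0.0f, 0.0f, 1.0f };
    for (int i = 0; i < 4; ++i)
        out[i] = (i < n) ? vals[i] : defaults[i];
}

/* <constantScalar> or <signedConstantScalar>: replicate X. */
void expand_constant_scalar(float x, float out[4])
{
    for (int i = 0; i < 4; ++i)
        out[i] = x;
}
```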
  
    Constant bindings can be interpreted as having signed integer, unsigned  
    integer, or floating-point values, depending on how they are used in the  
    program text.  For constants in variable declarations, the components of  
    the constant are interpreted according to the variable's component data  
    type modifier.  If no data type modifier is specified in a declaration,  
    constants are interpreted as floating-point values.  For constant bindings  
    used directly in an instruction, the components of the constant are  
    interpreted according to the required data type of the operand.  A program  
    will fail to load if it specifies a floating-point constant value  
    (matching the <floatConstant> grammar rule) that should be interpreted as  
    a signed or unsigned integer, or a negative integer constant value that  
    should be interpreted as an unsigned integer.  
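
    The two load-failure cases above can be expressed as a small predicate.
    The struct const_token summary of a lexed constant, the enum
    operand_type, and the function constant_type_ok() are hypothetical names
    for this sketch.  Integer constants used where floating-point values are
    expected are not rejected, consistent with the earlier note that most
    floating-point grammar rules also accept <int>.

```c
#include <stdbool.h>

enum operand_type { OPERAND_FLOAT, OPERAND_INT, OPERAND_UINT };

/* Hypothetical summary of one constant component in the program text. */
struct const_token {
    bool is_float;   /* matched <floatConstant> rather than <intConstant> */
    bool negative;   /* preceded by "-" */
};

/* Returns false where the program must fail to load. */
bool constant_type_ok(struct const_token t, enum operand_type want)
{
    if (t.is_float && want != OPERAND_FLOAT)
        return false;            /* float constant used as an integer */
    if (!t.is_float && t.negative && want == OPERAND_UINT)
        return false;            /* negative integer used as unsigned */
    return true;
}
```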
  
    If the value used to specify a floating-point constant cannot be exactly  
    represented, the nearest floating-point value will be used.  If the value  
    used to specify an integer constant is too large to be represented, the  
    program will fail to load.  
  
  
    Program Environment/Local Parameter Bindings  
  
      Binding                    Components  Underlying State  
      -------------------------  ----------  -------------------------------  
      program.env[a]             (x,y,z,w)   program environment parameter a  
      program.local[a]           (x,y,z,w)   program local parameter a  
      program.env[a..b]          (x,y,z,w)   program environment parameters   
                                             a through b  
      program.local[a..b]        (x,y,z,w)   program local parameters   
                                             a through b  
  
      Table X.1:  Program Environment/Local Parameter Bindings.  <a> and <b>  
      indicate parameter numbers, where <a> must be less than or equal to <b>.  
  
    If a program parameter binding matches "program.env[a]" or  
    "program.local[a]", the four components of the program parameter variable  
    are filled with the four components of program environment parameter <a>  
    or program local parameter <a> respectively.  
  
    Additionally, for program parameter array bindings, "program.env[a..b]"  
    and "program.local[a..b]" are equivalent to specifying program environment  
    or local parameters <a> through <b> in order, respectively.  A program  
    using any of these bindings will fail to load if <a> is greater than <b>.  
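
    A minimal sketch of the range expansion, using a hypothetical C helper:
    "program.env[2..5]" denotes parameters 2, 3, 4, and 5 in that order, and
    a reversed range such as "[3..1]" is rejected.

```c
/* Expand the range [a..b] of "program.env" or "program.local" into the
 * parameter indices it denotes, in order.  Returns the number of entries
 * written to out[], or -1 if a > b (the program fails to load).
 * out[] must have room for b - a + 1 entries. */
int expand_param_range(int a, int b, int out[])
{
    if (a > b)
        return -1;
    for (int i = a; i <= b; ++i)
        out[i - a] = i;
    return b - a + 1;
}
```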
  
    Program environment and local parameters are typeless, and may be  
    specified as signed integer, unsigned integer, or floating-point values.  
    If a program environment or local parameter is read using a data type  
    other than the one used to specify it, an undefined value is returned.  
  
  
    Material Property Bindings  
  
      Binding                        Components  Underlying State  
      -----------------------------  ----------  ----------------------------  
      state.material.ambient         (r,g,b,a)   front ambient material color  
      state.material.diffuse         (r,g,b,a)   front diffuse material color  
      state.material.specular        (r,g,b,a)   front specular material color  
      state.material.emission        (r,g,b,a)   front emissive material color  
      state.material.shininess       (s,0,0,1)   front material shininess  
      state.material.front.ambient   (r,g,b,a)   front ambient material color  
      state.material.front.diffuse   (r,g,b,a)   front diffuse material color  
      state.material.front.specular  (r,g,b,a)   front specular material color  
      state.material.front.emission  (r,g,b,a)   front emissive material color  
      state.material.front.shininess (s,0,0,1)   front material shininess  
      state.material.back.ambient    (r,g,b,a)   back ambient material color  
      state.material.back.diffuse    (r,g,b,a)   back diffuse material color  
      state.material.back.specular   (r,g,b,a)   back specular material color  
      state.material.back.emission   (r,g,b,a)   back emissive material color  
      state.material.back.shininess  (s,0,0,1)   back material shininess  
  
      Table X.3:  Material Property Bindings.  If a material face is not  
      specified in the binding, the front property is used.  
  
    If a program parameter binding matches any of the material properties  
    listed in Table X.3, the program parameter variable is filled according to  
    the table.  For ambient, diffuse, specular, or emissive colors, the "x",  
    "y", "z", and "w" components are filled with the "r", "g", "b", and "a"  
    components, respectively, of the corresponding material color.  For  
    material shininess, the "x" component is filled with the material's  
    specular exponent, and the "y", "z", and "w" components are filled with  
    the floating-point constants 0, 0, and 1, respectively.  Bindings  
    containing ".back" refer to the back material; all other bindings refer to  
    the front material.  
  
    Material properties can be changed inside a Begin/End pair, either  
    directly by calling Material, or indirectly through color material.  
    However, such property changes are not guaranteed to update program  
    parameter bindings until the following End command.  Program parameter  
    variables bound to material properties changed inside a Begin/End pair are  
    undefined until the following End command.  
  
  
    Light Property Bindings  
  
      Binding                        Components  Underlying State  
      -----------------------------  ----------  ----------------------------  
      state.light[n].ambient         (r,g,b,a)   light n ambient color  
      state.light[n].diffuse         (r,g,b,a)   light n diffuse color  
      state.light[n].specular        (r,g,b,a)   light n specular color  
      state.light[n].position        (x,y,z,w)   light n position  
      state.light[n].attenuation     (a,b,c,e)   light n attenuation constants  
                                                 and spot light exponent  
      state.light[n].spot.direction  (x,y,z,c)   light n spot direction and  
                                                 cutoff angle cosine  
      state.light[n].half            (x,y,z,1)   light n infinite half-angle  
      state.lightmodel.ambient       (r,g,b,a)   light model ambient color  
      state.lightmodel.scenecolor    (r,g,b,a)   light model front scene color  
      state.lightmodel.              (r,g,b,a)   light model front scene color  
               front.scenecolor  
      state.lightmodel.              (r,g,b,a)   light model back scene color  
               back.scenecolor  
      state.lightprod[n].ambient     (r,g,b,a)   light n / front material  
                                                 ambient color product  
      state.lightprod[n].diffuse     (r,g,b,a)   light n / front material  
                                                 diffuse color product  
      state.lightprod[n].specular    (r,g,b,a)   light n / front material  
                                                 specular color product  
      state.lightprod[n].            (r,g,b,a)   light n / front material  
              front.ambient                      ambient color product  
      state.lightprod[n].            (r,g,b,a)   light n / front material  
              front.diffuse                      diffuse color product  
      state.lightprod[n].            (r,g,b,a)   light n / front material  
              front.specular                     specular color product  
      state.lightprod[n].            (r,g,b,a)   light n / back material  
              back.ambient                       ambient color product  
      state.lightprod[n].            (r,g,b,a)   light n / back material  
              back.diffuse                       diffuse color product  
      state.lightprod[n].            (r,g,b,a)   light n / back material  
              back.specular                      specular color product  
  
      Table X.4: Light Property Bindings.  <n> indicates a light number.  
  
    If a program parameter binding matches "state.light[n].ambient",  
    "state.light[n].diffuse", or "state.light[n].specular", the "x", "y", "z",  
    and "w" components of the program parameter variable are filled with the  
    "r", "g", "b", and "a" components, respectively, of the corresponding  
    light color.  
  
    If a program parameter binding matches "state.light[n].position", the "x",  
    "y", "z", and "w" components of the program parameter variable are filled  
    with the "x", "y", "z", and "w" components, respectively, of the light  
    position.  
      
    If a program parameter binding matches "state.light[n].attenuation", the  
    "x", "y", and "z" components of the program parameter variable are filled  
    with the constant, linear, and quadratic attenuation parameters of the  
    specified light, respectively (section 2.13.1).  The "w" component of the  
    program parameter variable is filled with the spot light exponent of the  
    specified light.  
  
    If a program parameter binding matches "state.light[n].spot.direction",  
    the "x", "y", and "z" components of the program parameter variable are  
    filled with the "x", "y", and "z" components of the spot light direction  
    of the specified light, respectively (section 2.13.1).  The "w" component  
    of the program parameter variable is filled with the cosine of the spot  
    light cutoff angle of the specified light.  
  
    If a program parameter binding matches "state.light[n].half", the "x",  
    "y", and "z" components of the program parameter variable are filled with  
    the x, y, and z components, respectively, of the normalized infinite  
    half-angle vector  
  
      h_inf = || P + (0, 0, 1) ||.  
  
    The "w" component is filled with 1.0.  In the computation of h_inf, the
    notation ||v|| denotes the normalized (unit-length) form of v, and P
    consists of the x, y, and z coordinates of the normalized vector from the
    eye position P_e to the eye-space light position P_pli (section 2.13.1).
    h_inf is defined to correspond to the normalized half-angle vector when  
    using an infinite light (w coordinate of the position is zero) and an  
    infinite viewer (v_bs is FALSE).  For local lights or a local viewer,  
    h_inf is well-defined but does not match the normalized half-angle vector,  
    which will vary depending on the vertex position.  
  
    If a program parameter binding matches "state.lightmodel.ambient", the  
    "x", "y", "z", and "w" components of the program parameter variable are  
    filled with the "r", "g", "b", and "a" components of the light model  
    ambient color, respectively.  
  
    If a program parameter binding matches "state.lightmodel.scenecolor" or  
    "state.lightmodel.front.scenecolor", the "x", "y", and "z" components of  
    the program parameter variable are filled with the "r", "g", and "b"  
    components, respectively, of the "front scene color"
  
      c_scene = a_cs * a_cm + e_cm,  
  
    where a_cs is the light model ambient color, a_cm is the front ambient  
    material color, and e_cm is the front emissive material color.  The "w"  
    component of the program parameter variable is filled with the alpha  
    component of the front diffuse material color.  If a program parameter  
    binding matches "state.lightmodel.back.scenecolor", a similar back scene  
    color, computed using back-facing material properties, is used.  The front  
    and back scene colors match the values that would be assigned to vertices  
    using conventional lighting if all lights were disabled.  
  
    If a program parameter binding matches anything beginning with  
    "state.lightprod[n]", the "x", "y", and "z" components of the program  
    parameter variable are filled with the "r", "g", and "b" components,  
    respectively, of the corresponding light product.  The three light product  
    components are the products of the corresponding color components of the  
    specified material property and the light color of the specified light  
    (see Table X.4).  The "w" component of the program parameter variable is  
    filled with the alpha component of the specified material property.  
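    The scene colors and light products described above reduce to
    component-wise arithmetic.  The C sketch below is illustrative only; the
    color4 type and function names are hypothetical, and the material and
    light colors are assumed to be supplied by the caller.

```c
#include <assert.h>
#include <math.h>

typedef struct { float r, g, b, a; } color4;

/* state.lightmodel[.front|.back].scenecolor:
 * rgb = a_cs * a_cm + e_cm, alpha = material diffuse alpha. */
static color4 scene_color(color4 a_cs, color4 a_cm, color4 e_cm, color4 d_cm)
{
    color4 c = { a_cs.r * a_cm.r + e_cm.r,
                 a_cs.g * a_cm.g + e_cm.g,
                 a_cs.b * a_cm.b + e_cm.b,
                 d_cm.a };
    return c;
}

/* state.lightprod[n].*: rgb = material rgb * light rgb,
 * alpha = material alpha. */
static color4 light_product(color4 material, color4 light)
{
    color4 c = { material.r * light.r,
                 material.g * light.g,
                 material.b * light.b,
                 material.a };
    return c;
}
```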
  
    Light products depend on material properties, which can be changed inside  
    a Begin/End pair.  Such property changes are not guaranteed to take effect  
    until the following End command.  Program parameter variables bound to  
    light products whose corresponding material property changes inside a  
    Begin/End pair are undefined until the following End command.  
  
  
    Texture Coordinate Generation Property Bindings  
  
      Binding                    Components  Underlying State  
      -------------------------  ----------  ----------------------------  
      state.texgen[n].eye.s      (a,b,c,d)   TexGen eye linear plane  
                                             coefficients, s coord, unit n  
      state.texgen[n].eye.t      (a,b,c,d)   TexGen eye linear plane  
                                             coefficients, t coord, unit n  
      state.texgen[n].eye.r      (a,b,c,d)   TexGen eye linear plane  
                                             coefficients, r coord, unit n  
      state.texgen[n].eye.q      (a,b,c,d)   TexGen eye linear plane  
                                             coefficients, q coord, unit n  
      state.texgen[n].object.s   (a,b,c,d)   TexGen object linear plane  
                                             coefficients, s coord, unit n  
      state.texgen[n].object.t   (a,b,c,d)   TexGen object linear plane  
                                             coefficients, t coord, unit n  
      state.texgen[n].object.r   (a,b,c,d)   TexGen object linear plane  
                                             coefficients, r coord, unit n  
      state.texgen[n].object.q   (a,b,c,d)   TexGen object linear plane  
                                             coefficients, q coord, unit n  
  
      Table X.5:  Texture Coordinate Generation Property Bindings.  "[n]" is  
      optional -- texture unit <n> is used if specified; texture unit 0 is  
      used otherwise.  
  
    If a program parameter binding matches a set of TexGen plane coefficients,  
    the "x", "y", "z", and "w" components of the program parameter variable  
    are filled with the coefficients p1, p2, p3, and p4, respectively, for  
    object linear coefficients, and the coefficients p1', p2', p3', and p4',
    respectively, for eye linear coefficients (section 2.10.4).  
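    Because the bound plane coefficients are the ones used by conventional
    object linear and eye linear texture coordinate generation, a program can
    reproduce that generation with a single dot product (for example, a DP4
    instruction).  The C sketch below mirrors that arithmetic; texgen_linear
    is a hypothetical name, and the caller is assumed to pass object
    coordinates with an object plane or eye coordinates with an eye plane.

```c
#include <assert.h>

typedef struct { float x, y, z, w; } vec4;

/* Linear texture coordinate generation: the generated coordinate is the
 * dot product of the bound plane with the vertex position.  Pair the
 * state.texgen[n].object.* plane with object coordinates, or the
 * state.texgen[n].eye.* plane with eye coordinates. */
static float texgen_linear(vec4 plane, vec4 position)
{
    return plane.x * position.x + plane.y * position.y +
           plane.z * position.z + plane.w * position.w;
}
```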
  
  
    Fog Property Bindings  
  
      Binding                        Components  Underlying State  
      -----------------------------  ----------  ----------------------------  
      state.fog.color                (r,g,b,a)   RGB fog color (section 3.10)  
      state.fog.params               (d,s,e,r)   fog density, linear start  
                                                 and end, and 1/(end-start)  
                                                 (section 3.10)   
  
      Table X.6:  Fog Property Bindings  
  
    If a program parameter binding matches "state.fog.color", the "x", "y",  
    "z", and "w" components of the program parameter variable are filled with  
    the "r", "g", "b", and "a" components, respectively, of the fog color  
    (section 3.10).  
  
    If a program parameter binding matches "state.fog.params", the "x", "y",  
    and "z" components of the program parameter variable are filled with the  
    fog density, linear fog start, and linear fog end parameters (section  
    3.10), respectively.  The "w" component is filled with 1/(end-start),  
    where end and start are the linear fog end and start parameters,  
    respectively.  
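    Precomputing 1/(end-start) lets a program evaluate the linear fog factor
    f = (end - c)/(end - start) with a subtraction and a multiplication.  The
    C sketch below assumes c is the fog coordinate supplied by the caller and
    clamps f to [0, 1] as conventional fog does; the function names are
    hypothetical.

```c
#include <assert.h>
#include <math.h>

typedef struct { float x, y, z, w; } vec4;

/* Value loaded for state.fog.params: (density, start, end, 1/(end-start)). */
static vec4 fog_params(float density, float start, float end)
{
    vec4 p = { density, start, end, 1.0f / (end - start) };
    return p;
}

/* Linear fog factor f = (end - c) / (end - start), clamped to [0, 1],
 * computed with a multiply by the precomputed reciprocal. */
static float linear_fog_factor(vec4 params, float c)
{
    float f = (params.z - c) * params.w;
    if (f < 0.0f) f = 0.0f;
    if (f > 1.0f) f = 1.0f;
    return f;
}
```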
  
  
    Clip Plane Property Bindings  
  
      Binding                        Components  Underlying State  
      -----------------------------  ----------  ----------------------------  
      state.clip[n].plane            (a,b,c,d)   clip plane n coefficients  
  
      Table X.7:  Clip Plane Property Bindings.  <n> specifies the clip plane  
      number, and is required.  
  
    If a program parameter binding matches "state.clip[n].plane", the "x",  
    "y", "z", and "w" components of the program parameter variable are filled  
    with the coefficients p1', p2', p3', and p4', respectively, of clip plane  
    <n> (section 2.11).  
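    In conventional OpenGL, a point lies in the half-space of a client clip
    plane when the dot product of (p1', p2', p3', p4') with its eye
    coordinates is nonnegative.  The C sketch below evaluates that test from
    the bound plane; the function names are hypothetical.

```c
#include <assert.h>

typedef struct { float x, y, z, w; } vec4;

/* Dot product of state.clip[n].plane = (p1', p2', p3', p4') with an
 * eye-space position (x_e, y_e, z_e, w_e). */
static float clip_plane_eval(vec4 plane, vec4 eye)
{
    return plane.x * eye.x + plane.y * eye.y + plane.z * eye.z + plane.w * eye.w;
}

/* Nonzero if the position lies in the half-space kept by the plane. */
static int inside_clip_plane(vec4 plane, vec4 eye)
{
    return clip_plane_eval(plane, eye) >= 0.0f;
}
```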
  
  
    Point Property Bindings  
  
      Binding                        Components  Underlying State  
      -----------------------------  ----------  ----------------------------  
      state.point.size               (s,n,x,f)   point size, min and max size  
                                                 clamps, and fade threshold  
                                                 (section 3.3)   
      state.point.attenuation        (a,b,c,1)   point size attenuation consts  
  
      Table X.8:  Point Property Bindings  
  
    If a program parameter binding matches "state.point.size", the "x", "y",  
    "z", and "w" components of the program parameter variable are filled with  
    the point size, minimum point size, maximum point size, and fade  
    threshold, respectively (section 3.3).  
  
    If a program parameter binding matches "state.point.attenuation", the "x",  
    "y", and "z" components of the program parameter variable are filled with  
    the constant, linear, and quadratic point size attenuation parameters (a,  
    b, and c), respectively (section 3.3).  The "w" component is filled with  
    1.0.  
  
  
    Texture Environment Property Bindings  
  
      Binding                    Components  Underlying State  
      -------------------------  ----------  ----------------------------  
      state.texenv[n].color      (r,g,b,a)   texture environment n color  
  
      Table X.9:  Texture Environment Property Bindings.  "[n]" is optional --  
      texture unit <n> is used if specified; texture unit 0 is used otherwise.  
  
    If a program parameter binding matches "state.texenv[n].color", the "x",  
    "y", "z", and "w" components of the program parameter variable are filled  
    with the "r", "g", "b", and "a" components, respectively, of the  
    corresponding texture environment color.  Note that only "legacy" texture  
    units, as queried by MAX_TEXTURE_UNITS, include texture environment state.  
    Texture image units and texture coordinate sets do not have associated  
    texture environment state.  
  
  
    Depth Property Bindings  
  
      Binding                      Components  Underlying State  
      ---------------------------  ----------  ----------------------------  
      state.depth.range            (n,f,d,1)   Depth range near, far, and  
                                               (far-near) (section 2.10.1)  
  
      Table X.10:  Depth Property Bindings  
  
    If a program parameter binding matches "state.depth.range", the "x" and  
    "y" components of the program parameter variable are filled with the  
    mappings of near and far clipping planes to window coordinates,  
    respectively.  The "z" component is filled with the difference of the  
    mappings of near and far clipping planes, far minus near.  The "w"  
    component is filled with 1.0.  
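    Using the "z" component, a program can map a normalized device depth to
    window depth without recomputing far minus near.  The C sketch below
    assumes the conventional viewport depth transform
    z_w = (f - n)/2 * z_d + (n + f)/2, rewritten as n + (z_d + 1)/2 * (f - n);
    the function names are hypothetical.

```c
#include <assert.h>

typedef struct { float x, y, z, w; } vec4;

/* Value loaded for state.depth.range: (near, far, far - near, 1). */
static vec4 depth_range_binding(float n, float f)
{
    vec4 r = { n, f, f - n, 1.0f };
    return r;
}

/* Window depth for a normalized device z in [-1, 1]:
 * z_w = n + (z_ndc + 1) / 2 * (f - n). */
static float window_depth(vec4 range, float z_ndc)
{
    return range.x + 0.5f * (z_ndc + 1.0f) * range.z;
}
```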
  
  
    Matrix Property Bindings  
  
      Binding                               Underlying State  
      ------------------------------------  ---------------------------  
      * state.matrix.modelview[n]           modelview matrix n  
        state.matrix.projection             projection matrix  
        state.matrix.mvp                    modelview-projection matrix  
      * state.matrix.texture[n]             texture matrix n  
        state.matrix.program[n]             program matrix n  
  
      Table X.11:  Base Matrix Property Bindings.  The "[n]" syntax indicates  
      a specific matrix number.  For modelview and texture matrices, a matrix  
      number is optional, and matrix zero will be used if the matrix number is  
      omitted.  These base bindings may further be modified by an
      inverse/transpose selector and a row selector.
  
    If the beginning of a program parameter binding matches any of the matrix  
    binding names listed in Table X.11, the binding corresponds to a 4x4  
    matrix.  If the parameter binding is followed by ".inverse", ".transpose",  
    or ".invtrans" (<stateMatModifier> grammar rule), the inverse, transpose,  
    or transpose of the inverse, respectively, of the matrix specified in  
    Table X.11 is selected.  Otherwise, the matrix specified in Table X.11 is  
    selected.  If the specified matrix is poorly-conditioned (singular or  
    nearly so), its inverse matrix is undefined.  The binding name  
    "state.matrix.mvp" refers to the product of modelview matrix zero and the  
    projection matrix, defined as  
  
       MVP = P * M0,  
  
    where P is the projection matrix and M0 is modelview matrix zero.  
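    The product above is an ordinary 4x4 matrix multiply with the projection
    matrix on the left.  The C sketch below shows the operand order; it
    assumes matrices are stored as m[row][col] to match the row-oriented
    bindings in this section (GL client arrays use column-major order), and
    the names are hypothetical.

```c
#include <assert.h>

/* 4x4 matrix stored as m[row][col]. */
typedef struct { float m[4][4]; } mat4;

/* r = a * b */
static mat4 mat4_mul(const mat4 *a, const mat4 *b)
{
    mat4 r;
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j) {
            float s = 0.0f;
            for (int k = 0; k < 4; ++k)
                s += a->m[i][k] * b->m[k][j];
            r.m[i][j] = s;
        }
    return r;
}

/* state.matrix.mvp = P * M0 (modelview applied first, then projection). */
static mat4 modelview_projection(const mat4 *projection, const mat4 *modelview0)
{
    return mat4_mul(projection, modelview0);
}
```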
  
    If the selected matrix is followed by ".row[<a>]" (matching the  
    <stateMatrixRow> grammar rule), the "x", "y", "z", and "w" components of  
    the program parameter variable are filled with the four entries of row <a>  
    of the selected matrix.  In the example,  
  
      PARAM m0 = state.matrix.modelview[1].row[0];  
      PARAM m1 = state.matrix.projection.transpose.row[3];  
  
    the variable "m0" is set to the first row (row 0) of modelview matrix 1  
    and "m1" is set to the last row (row 3) of the transpose of the projection  
    matrix.  
  
    For program parameter array bindings, multiple rows of the selected matrix  
    can be bound via the <stateMatrixRows> grammar rule.  If the selected  
    matrix binding is followed by ".row[<a>..<b>]", the result is equivalent  
    to specifying matrix rows <a> through <b>, in order.  A program will fail  
    to load if <a> is greater than <b>.  If no row selection is specified  
    (<optMatrixRows> matches ""), matrix rows 0 through 3 are bound in order.  
    In the example,  
  
      PARAM m2[] = { state.matrix.program[0].row[1..2] };  
      PARAM m3[] = { state.matrix.program[0].transpose };  
  
    the array "m2" has two entries, containing rows 1 and 2 of program matrix  
    zero, and "m3" has four entries, containing all four rows of the transpose  
    of program matrix zero.  
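    The row-selection rules can be modeled as copying matrix rows into
    consecutive four-component entries.  In the C sketch below, selecting
    rows of the transpose copies columns of the original matrix, and a range
    with <a> greater than <b> is reported as a load failure; rejecting rows
    outside 0..3 is an added assumption, and the names are hypothetical.  A
    program can then, for example, transform a vector with one DP4 per bound
    row.

```c
#include <assert.h>

typedef struct { float x, y, z, w; } vec4;
typedef struct { float m[4][4]; } mat4;   /* m[row][col] */

/* Fill out[] with rows a..b of the selected matrix (or of its transpose
 * when transpose is nonzero).  Returns the number of rows bound, or -1
 * where the program would fail to load (a > b) or rows fall outside 0..3. */
static int bind_matrix_rows(const mat4 *mat, int transpose, int a, int b,
                            vec4 out[4])
{
    if (a < 0 || b > 3 || a > b)
        return -1;
    int n = 0;
    for (int row = a; row <= b; ++row, ++n) {
        float v[4];
        for (int col = 0; col < 4; ++col)
            v[col] = transpose ? mat->m[col][row] : mat->m[row][col];
        out[n].x = v[0];
        out[n].y = v[1];
        out[n].z = v[2];
        out[n].w = v[3];
    }
    return n;
}
```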
  
  
    Section 2.X.3.4, Program Temporaries  
  
    Program temporary variables are used to hold temporary results during  
    program execution.  Temporaries do not persist between program  
    invocations, and are undefined at the beginning of each program  
    invocation.  
  
    Temporary variables are declared explicitly using the <TEMP_statement>  
    grammar rule.  Each such statement can declare one or more temporaries.  
    Temporaries cannot be declared implicitly.  Temporaries can be declared
    using any component size ("SHORT" or "LONG") and type ("FLOAT" or "INT")  
    modifier.  
  
    Temporary variables may be declared as arrays.  Temporary variables  
    declared as arrays may be stored in slower memory than those not declared  
    as arrays, and it is recommended to use non-array variables unless array  
    functionality is required.  
  
  
    Section 2.X.3.5, Program Results  
  
    Program result variables represent the per-vertex or per-fragment results  
    of the program.  All result variables have associated bindings, are  
    write-only during program execution, and are undefined at the beginning of  
    each program invocation.  Any vertex or fragment attributes corresponding  
    to unwritten result variables will be undefined in subsequent stages of  
    the pipeline.  Result variables may be declared explicitly via the  
    <OUTPUT_statement> grammar rule, or implicitly by using a result binding  
    in an instruction.  
  
    The set of available result bindings depends on the program type, and is  
    enumerated in the specifications for each program type.  
  
    Result variables may generally be declared as arrays, but the set of  
    bindings allowed for arrays is limited to state grouped in arrays (e.g.,  
    texture coordinates, clip distances, colors).  Additionally, all bindings  
    assigned to the array must be of the same binding type and must increase  
    consecutively.  Examples of valid and invalid binding lists for vertex  
    programs include:  
  
      result.clip[1], result.clip[2]          # valid, 2-entry array  
      result.texcoord[0..3]                   # valid, 4-entry array  
      result.texcoord[1], result.texcoord[3]  # invalid, skipped texcoord 2  
      result.texcoord[2], result.texcoord[1]  # invalid, wrong order  
      result.texcoord[1], result.clip[2]      # invalid, different types  
  
    Additionally, result bindings may be used in no more than one array  
    addressed with relative addressing.  
  
    Implementations may have a limit on the total number of result binding  
    components used by each program target (MAX_PROGRAM_RESULT_COMPONENTS_NV).  
    Programs that require more result binding components than this limit will  
    fail to load.  The method of counting used result binding components is  
    implementation-dependent, but must satisfy the following properties:  
  
      * If a result binding is not referenced in a program, or is referenced  
        only in declarations of result variables that are not used, none of  
        its components are counted.  
  
      * A result binding component may be counted as used only if there exists  
        an instruction operand where  
  
          - the component is enabled in the write mask (Section 2.X.4.3), and  
  
          - the result binding is either  
  
              - referenced directly by the operand,  
  
              - bound to a declared variable referenced by the operand, or  
  
              - bound to a declared array variable where another binding in  
                the array satisfies one of the two previous conditions.  
  
        Implementations are not required to optimize out unused elements of a
        result array or components that are used in only some elements of an
        array.  The last of these rules is intended to cover the case where  
        the same result binding is used in multiple variables.  
  
        For example, an instruction whose write mask selects only the x  
        component may result in the x component of a result binding being  
        counted, but may never result in the counting of the y, z, or w  
        components of any result binding.  
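    One counting method that satisfies these properties, for programs without
    result arrays, is to take the union of the write masks applied to each
    result binding and count the enabled bits.  The C sketch below assumes a
    simplified summary of the instruction operands; the result_write type and
    binding indices are hypothetical, and an actual implementation may count
    differently.

```c
#include <assert.h>

/* Hypothetical summary of one operand writing a result binding: binding is
 * an index assigned by the compiler, mask is the write mask
 * (bit 0 = x, bit 1 = y, bit 2 = z, bit 3 = w). */
typedef struct { int binding; unsigned mask; } result_write;

enum { MAX_RESULT_BINDINGS = 64 };

/* Count each component enabled in at least one write mask for a binding.
 * Unreferenced bindings and masked-off components are never counted. */
static int count_result_components(const result_write *writes, int count)
{
    unsigned used[MAX_RESULT_BINDINGS] = { 0 };
    for (int i = 0; i < count; ++i) {
        assert(writes[i].binding >= 0 && writes[i].binding < MAX_RESULT_BINDINGS);
        used[writes[i].binding] |= writes[i].mask & 0xFu;
    }
    int total = 0;
    for (int b = 0; b < MAX_RESULT_BINDINGS; ++b)
        for (unsigned m = used[b]; m != 0u; m >>= 1)
            total += (int)(m & 1u);
    return total;
}
```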
  
  
    Section 2.X.3.6, Program Parameter Buffers  
  
    Program parameter buffers are arrays consisting of single-component  
    typeless values or four-component typeless vectors stored in a buffer  
    object.  The GL provides an implementation-dependent number of buffer  
    object binding points for each program target, to which buffer objects can  
    be attached.  Program parameter buffer variables can be changed either by  
    updating the contents of bound buffer objects, or simply by changing the  
    buffer object attached to a binding point.  
  
    Program parameter buffer variables are used as constants during program  
    execution.  All program parameter buffer variables have an associated  
    binding and are read-only during program execution.  Program parameter  
    buffers retain their values across program invocations, although their  
    values may change as buffer object bindings or contents change.  Program  
    parameter buffer variables must be declared explicitly via the  
    <BUFFER_statement> grammar rule.  Program parameter buffer bindings
    cannot be used directly in executable instructions.
  
    Program parameter buffer variables are treated as an array of  
    single-component values if the <bufferDeclType> grammar rule matches  
    "BUFFER" or as an array of four-component vectors if it matches "BUFFER4".  
    A program will fail to load if a variable declared as "BUFFER" and another  
    variable declared as "BUFFER4" use the same buffer binding point.  
  
    Program parameter buffer variables may be declared as arrays, but all  
    bindings assigned to the array must use the same binding point and must  
    increase consecutively.  
  
      Binding                        Components  Underlying State  
      -----------------------------  ----------  -----------------------------  
      program.buffer[a][b]           (x,x,x,x)   program parameter buffer a,  
                                                   element b  
      program.buffer[a][b..c]        (x,x,x,x)   program parameter buffer a,  
                                                   elements b through c  
      program.buffer[a]              (x,x,x,x)   program parameter buffer a,  
                                                   all elements  
  
      Table X.12: Program Parameter Buffer Bindings.  <a> indicates a buffer  
      number, <b> and <c> indicate individual elements.  
  
    If a program parameter buffer binding matches "program.buffer[a][b]", the
    program parameter variable is filled with element <b> of the buffer
    object bound to binding point <a>.  Each element of the bound buffer
    object is treated as one or four words of data that can hold integer or
    floating-point values.  When a single-component binding is evaluated, the
    selected word is broadcast to all four components of the variable.  When a  
    four-component binding is evaluated, the four components of the buffer  
    element are loaded into the variable.  If no buffer object is bound to  
    binding point <a>, or the bound buffer object is not large enough to hold  
    an element <b>, the values used are undefined.  The binding point <a> must  
    be a nonnegative integer constant.  
  
    For program parameter buffer array declarations, "program.buffer[a][b..c]"  
    is equivalent to specifying elements <b> through <c> of the buffer object  
    bound to binding point <a> in order.  
  
    For program parameter buffer array declarations, "program.buffer[a]" is  
    equivalent to specifying the entire buffer -- elements 0 through <n>-1,  
    where <n> is either the size of the array (if declared) or the  
    implementation-dependent maximum parameter buffer object size limit (if no  
    size is declared).  
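    The C sketch below models the single-component ("BUFFER") and
    four-component ("BUFFER4") element layouts on the CPU side.  It assumes
    elements are tightly packed 32-bit words starting at offset zero of the
    bound buffer, and it returns 0 in place of the undefined results described
    above.  All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Four 32-bit typeless components, as loaded into a program variable. */
typedef struct { uint32_t x, y, z, w; } word4;

/* "BUFFER" binding: element b is word b, broadcast to all four components.
 * Returns 0 where the GL result would be undefined. */
static int fetch_buffer(const uint32_t *words, size_t nwords, size_t b,
                        word4 *out)
{
    if (words == NULL || b >= nwords)
        return 0;
    out->x = out->y = out->z = out->w = words[b];
    return 1;
}

/* "BUFFER4" binding: element b is the four consecutive words 4b..4b+3. */
static int fetch_buffer4(const uint32_t *words, size_t nwords, size_t b,
                         word4 *out)
{
    if (words == NULL || b >= nwords / 4)
        return 0;
    const uint32_t *e = words + 4 * b;
    out->x = e[0];
    out->y = e[1];
    out->z = e[2];
    out->w = e[3];
    return 1;
}
```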
  
  
    Section 2.X.3.7, Program Condition Code Registers  
  
    The program condition code registers are four-component vectors.  Each
    component of these registers is a collection of single-bit flags, including
    a sign flag (SF), a zero flag (ZF), an overflow flag (OF), and a carry  
    flag (CF).  There are two condition code registers (CC0 and CC1), whose  
    values are undefined at the beginning of program execution.  
  
    Most program instructions can optionally update one of the condition code
    registers by designating the condition code to update in the instruction.
    When a condition code is updated, the four flags of each of its
    components are set according to the corresponding component of the
    instruction result.  Full details on the condition code updates and tests
    can be found in Section 2.X.4.3.
  
    The values of these four flags can be combined in various condition code
    tests, which can be used to mask writes to destination variables and to
    perform conditional branches or other conditional operations.
  
  
    Section 2.X.3.8, Program Aliases  
  
    Programs can create aliases by matching the <ALIAS_statement> grammar  
    rule.  Aliases allow programs to use multiple variable names to refer to a  
    single underlying variable.  For example, the statement  
  
      ALIAS var1 = var0;
  
    establishes a variable name of "var1".  Subsequent references to "var1" in  
    the program text are treated as references to "var0".  The left hand side  
    of an ALIAS statement must be a new variable name, and the right hand side  
    must be an established variable name.  
  
    Aliases are not considered variable declarations and therefore do not
    count against the limits on the number of variable declarations allowed
    in the program text.
  
  
    Section 2.X.3.9, Program Resource Limits  
  
    (See the ARB_vertex_program specification; this section incorporates all
    the different limits on instruction counts, temporaries, attribute
    bindings, program parameters, and so on.)
  
  
    Section 2.X.4, Program Execution Environment  
  
    The set of instructions supported for GPU programs is given in Table X.13  
    below and is described in detail in Section 2.X.8.  An instruction can use  
    up to three operands when it executes, and most instructions can write a  
    single result vector.  Instructions may also specify one or more  
    modifiers, according to the <opModifiers> grammar rule.  Instruction  
    modifiers affect how the specified operation is performed.  
  
    GPU programs may operate on signed integer, unsigned integer, or  
    floating-point values; some instructions are capable of operating on any  
    of the three types.  However, the data type of the operands and the result  
    are always determined based solely on the instruction and its modifiers.  
    If any of the variables used in the instruction are typeless, they will be  
    interpreted according to the data type derived from the instruction.  If  
    any variables with a conflicting data type are used in the instruction,  
    the program will fail to load unless the "NTC" (no type checking)  
    instruction modifier is specified.  
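    To illustrate what interpreting a typeless variable according to the data
    type derived from the instruction can mean, the C sketch below reads one
    32-bit pattern as either a float or a signed integer.  It assumes typeless
    storage is reinterpreted bit for bit (unlike I2F, which converts the
    integer value) and that floats are IEEE-754 single precision; the helper
    names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Read typeless 32-bit storage as a float, keeping the bits unchanged. */
static float bits_as_float(uint32_t bits)
{
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Read the same storage as a signed integer, keeping the bits unchanged. */
static int32_t bits_as_int(uint32_t bits)
{
    int32_t i;
    memcpy(&i, &bits, sizeof i);
    return i;
}
```

    For example, the pattern 0x3F800000 reads as 1.0 under a floating-point
    interpretation but as 1065353216 under a signed integer interpretation.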
  
                  Modifiers   
      Instruction F I C S H D  Out Inputs    Description  
      ----------- - - - - - -  --- --------  --------------------------------  
      ABS         X X X X X F  v   v         absolute value  
      ADD         X X X X X F  v   v,v       add  
      AND         - X X - - S  v   v,v       bitwise and  
      BRK         - - - - - -  -   c         break out of loop instruction  
      CAL         - - - - - -  -   c         subroutine call  
      CEIL        X X X X X F  v   vf        ceiling  
      CMP         X X X X X F  v   v,v,v     compare  
      CONT        - - - - - -  -   c         continue with next loop iteration
      COS         X - X X X F  s   s         cosine with reduction to [-PI,PI]  
      DIV         X X X X X F  v   v,s       divide vector components by scalar  
      DP2         X - X X X F  s   v,v       2-component dot product  
      DP2A        X - X X X F  s   v,v,v     2-comp. dot product w/scalar add  
      DP3         X - X X X F  s   v,v       3-component dot product  
      DP4         X - X X X F  s   v,v       4-component dot product  
      DPH         X - X X X F  s   v,v       homogeneous dot product  
      DST         X - X X X F  v   v,v       distance vector  
      ELSE        - - - - - -  -   -         start if test else block  
      ENDIF       - - - - - -  -   -         end if test block  
      ENDREP      - - - - - -  -   -         end of repeat block  
      EX2         X - X X X F  s   s         exponential base 2  
      FLR         X X X X X F  v   vf        floor  
      FRC         X - X X X F  v   v         fraction  
      I2F         - X X - - S  vf  v         integer to float  
      IF          - - - - - -  -   c         start of if test block  
      KIL         X X - - X F  -   vc        kill fragment  
      LG2         X - X X X F  s   s         logarithm base 2  
      LIT         X - X X X F  v   v         compute lighting coefficients  
      LRP         X - X X X F  v   v,v,v     linear interpolation  
      MAD         X X X X X F  v   v,v,v     multiply and add  
      MAX         X X X X X F  v   v,v       maximum  
      MIN         X X X X X F  v   v,v       minimum  
      MOD         - X X - - S  v   v,s       modulus vector components by scalar  
      MOV         X X X X X F  v   v         move  
      MUL         X X X X X F  v   v,v       multiply  
      NOT         - X X - - S  v   v         bitwise not  
      NRM         X - X X X F  v   v         normalize 3-component vector  
      OR          - X X - - S  v   v,v       bitwise or  
      PK2H        X X - - - F  s   vf        pack two 16-bit floats  
      PK2US       X X - - - F  s   vf        pack two floats as unsigned 16-bit  
      PK4B        X X - - - F  s   vf        pack four floats as signed 8-bit  
      PK4UB       X X - - - F  s   vf        pack four floats as unsigned 8-bit  
      POW         X - X X X F  s   s,s       exponentiate  
      RCC         X - X X X F  s   s         reciprocal (clamped)  
      RCP         X - X X X F  s   s         reciprocal  
      REP         X X - - X F  -   v         start of repeat block  
      RET         - - - - - -  -   c         subroutine return  
      RFL         X - X X X F  v   v,v       reflection vector  
      ROUND       X X X X X F  v   vf        round to nearest integer  
      RSQ         X - X X X F  s   s         reciprocal square root  
      SAD         - X X - - S  vu  v,v,vu    sum of absolute differences  
      SCS         X - X X X F  v   s         sine/cosine without reduction  
      SEQ         X X X X X F  v   v,v       set on equal  
      SFL         X X X X X F  v   v,v       set on false  
      SGE         X X X X X F  v   v,v       set on greater than or equal  
      SGT         X X X X X F  v   v,v       set on greater than  
      SHL         - X X - - S  v   v,s       shift left  
      SHR         - X X - - S  v   v,s       shift right   
      SIN         X - X X X F  s   s         sine with reduction to [-PI,PI]  
      SLE         X X X X X F  v   v,v       set on less than or equal  
      SLT         X X X X X F  v   v,v       set on less than  
      SNE         X X X X X F  v   v,v       set on not equal  
      SSG         X - X X X F  v   v         set sign  
      STR         X X X X X F  v   v,v       set on true  
      SUB         X X X X X F  v   v,v       subtract  
      SWZ         X - X X X F  v   v         extended swizzle  
      TEX         X X X X - F  v   vf        texture sample  
      TRUNC       X X X X X F  v   vf        truncate (round toward zero)  
      TXB         X X X X - F  v   vf        texture sample with bias  
      TXD         X X X X - F  v   vf,vf,vf  texture sample w/partials        
      TXF         X X X X - F  v   vs        texel fetch  
      TXL         X X X X - F  v   vf        texture sample w/LOD  
      TXP         X X X X - F  v   vf        texture sample w/projection  
      TXQ         - - - - - S  vs  vs        texture info query  
      UP2H        X X X X - F  vf  s         unpack two 16-bit floats  
      UP2US       X X X X - F  vf  s         unpack two unsigned 16-bit ints  
      UP4B        X X X X - F  vf  s         unpack four signed 8-bit ints  
      UP4UB       X X X X - F  vf  s         unpack four unsigned 8-bit ints  
      X2D         X - X X X F  v   v,v,v     2D coordinate transformation  
      XOR         - X X - - S  v   v,v       exclusive or  
      XPD         X - X X X F  v   v,v       cross product  
  
      Table X.13:  Summary of NV_gpu_program4 instructions.  The "Modifiers"  
      columns specify the set of modifiers allowed for the instruction:  
  
        F = floating-point data type modifiers  
        I = signed and unsigned integer data type modifiers  
        C = condition code update modifiers  
        S = clamping (saturation) modifiers  
        H = half-precision float data type suffix  
        D = default data type modifier (F, U, or S)  
  
      The input and output columns describe the formats of the operands and  
      results of the instruction.  
  
        v:  4-component vector (data type is inherited from operation)  
        vf: 4-component vector (data type is always floating-point)  
        vs: 4-component vector (data type is always signed integer)  
        vu: 4-component vector (data type is always unsigned integer)  
        s:  scalar (replicated if written to a vector destination;  
                    data type is inherited from operation)  
        c:  condition code test result (e.g., "EQ", "GT1.x")  
        vc: 4-component vector or condition code test  
  
  
    Section 2.X.4.1, Program Instruction Modifiers  
  
    There are several types of instruction modifiers available.  A data type  
    modifier specifies that an instruction should operate on signed integer,  
    unsigned integer, or floating-point data, when multiple data types are  
    supported.  A clamping modifier applies to instructions with  
    floating-point results, and specifies the range to which the results  
    should be clamped.  A condition code update modifier specifies that the  
    instruction should update one of the condition code variables.  Several  
    other special modifiers are also provided.  
  
    Instruction modifiers may be specified as stand-alone modifiers or as  
    suffixes concatenated with the opcode name.  A program will fail to load  
    if it contains an instruction that  
  
      * specifies more than one modifier of any given type,  
  
      * specifies a clamping modifier on an instruction, unless it produces  
        floating-point results, or  
  
      * specifies a modifier that is not supported by the instruction (see  
        Table X.13 and the instruction description).  
  
    Stand-alone instruction modifiers are specified according to the  
    <opModifiers> grammar rule using a ".<modifier>" syntax.  Multiple  
    modifiers, separated by periods, may be specified.  The set of supported
    modifiers is described in Table X.14.  
  
      Modifier  Description  
      --------  -----------------------------------------------  
      F         Floating-point operation  
      U         Fixed-point operation, unsigned operands  
      S         Fixed-point operation, signed operands  
      CC        Update condition code register zero  
      CC0       Update condition code register zero  
      CC1       Update condition code register one  
      SAT       Floating-point results clamped to [0,1]  
      SSAT      Floating-point results clamped to [-1,1]  
      NTC       Disable type-checking on operands/results  
      S24       Signed multiply (24-bit operands)  
      U24       Unsigned multiply (24-bit operands)  
      HI        Multiplies two 32-bit integer operands, returns  
                  the 32 MSBs of the product  
  
      Table X.14, Instruction Modifiers.
  
    "F", "U", and "S" modifiers are data type modifiers and specify that the  
    instruction should operate on floating-point, unsigned integer, or  
    signed integer values, respectively.  For example, "ADD.F", "ADD.U", and  
    "ADD.S" specify component-wise addition of floating-point, unsigned  
    integer, or signed integer vectors, respectively.  These modifiers specify  
    a data type, but do not specify a precision at which the operation is  
    performed.  Floating-point operations will be carried out with an internal  
    precision no less than that used to represent the largest operand.  
    Fixed-point operations will be carried out using at least as many bits as  
    used to represent the largest operand.  Operands represented with fewer  
    bits than used to perform the instruction will be promoted to a larger  
    data type.  Signed integer operands will be sign-extended, where the most  
    significant bits are filled with ones if the operand is negative and with
    zeroes otherwise.  Unsigned integer operands will be zero-extended, where
    the most significant bits are always filled with zeroes.  For some
    instructions, the data type of some operands or of the result is fixed;
    in these cases, the data type modifier specifies the data type of the
    remaining values.
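
    The following C sketch models the promotion rules above for 16-bit
    operands widened to 32 bits.  It is an illustration only, assuming a
    16-bit source and a 32-bit operation; the function names are
    hypothetical and do not describe how any implementation performs the
    conversion.

```c
#include <stdint.h>

/* Hypothetical model: promote a 16-bit signed operand to 32 bits.
   Arithmetically this equals sign extension, i.e. copying bit 15 into
   the upper 16 bits (ones for negative values, zeroes otherwise). */
int32_t promote_signed16(uint16_t bits)
{
    int32_t v = (int32_t)bits;   /* 0..65535 is always representable */
    if (bits & 0x8000u) {
        v -= 0x10000;            /* bit 15 set: the operand is negative */
    }
    return v;
}

/* Hypothetical model: promote a 16-bit unsigned operand to 32 bits by
   zero extension (the upper 16 bits are always zero). */
uint32_t promote_unsigned16(uint16_t bits)
{
    return (uint32_t)bits;
}
```

    Under this model, the bit pattern 0xFFFE promotes to -2 (0xFFFFFFFE)
    as a signed operand and to 65534 as an unsigned operand.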
  
    "CC", "CC0", and "CC1" are condition code update modifiers that specify  
    that one of the condition code registers should be updated based on the  
    result of the instruction, as described in section 2.X.4.3.  "CC" and  
    "CC0" specify that the condition code register CC0 be updated; "CC1"  
    specifies an update to CC1.  If no condition code update modifier is  
    provided, the condition code registers will not be affected.  
  
    "SAT" and "SSAT" are clamping modifiers that specify that the  
    floating-point components of the instruction result should be clamped to  
    [0,1] or [-1,1], respectively, before updating the condition code and the  
    destination variable.  If no clamping modifier is specified, unclamped
    results will be used for condition code updates (if any) and destination  
    variable writes.  Clamping modifiers are not supported on instructions  
    that do not produce floating-point results.  
  
    "NTC" (no type checking) disables data type checking on the instruction,  
    and allows instructions to use operands or result variables whose data  
    types are inconsistent with the expected data types of the instruction.  
  
    "S24", "U24", and "HI" are special modifiers that are allowed only for the  
    MUL instruction, and are described in detail where MUL is documented.  No  
    more than one such modifier may be provided for any instruction.  
  
    If an instruction supports data type modifiers, but none is provided, a  
    default data type will be chosen based on the instruction, as specified in  
    Table X.13 and the instruction set description (Section 2.X.8).  If  
    condition code update or clamping modifiers are not specified, the  
    corresponding operation will not be performed.  
  
    Additionally, each instruction name may have one or more suffixes,  
    concatenated onto the base instruction name, that operate as instruction  
    modifiers.  For conciseness, these suffixes are not spelled out in the  
    grammar -- the base opcode name is used as a placeholder for the opcode  
    and all of its possible suffixes.  Instruction suffixes are provided  
    mainly for compatibility with prior GPU program instruction sets (e.g.,  
    NV_vertex_program3, NV_fragment_program2, and predecessors).  The set of  
    allowable suffixes, and their equivalent stand-alone modifiers, are listed  
    in Table X.15.  
  
      Suffix  Modifier     Description  
      ------  ----------   ---------------------------------------------------  
      R       F            Floating-point operation, 32-bit precision  
      H       F(*)         Floating-point operation, at least 16-bit precision  
      C       CC0          Update condition code register zero  
      C0      CC0          Update condition code register zero  
      C1      CC1          Update condition code register one  
      _SAT    SAT          Floating-point results clamped to [0,1]  
      _SSAT   SSAT         Floating-point results clamped to [-1,1]  
  
      Table X.15,  Instruction Suffixes.  
  
    The "R" and "H" suffixes specify floating-point operations and are  
    equivalent to the "F" data type modifier.  They additionally specify a  
    minimum precision for the operations.  Instructions with an "R" precision  
    modifier will be carried out at no less than IEEE single-precision  
    floating-point (8 bits of exponent, 23 bits of mantissa).  Instructions  
    with an "H" precision modifier will be carried out at no less than 16-bit  
    floating-point precision (5 bits of exponent, 10 bits of mantissa).  
  
    An instruction may have multiple suffixes, but they must appear in order,  
    with data type suffixes first, followed by condition code update suffixes,  
    followed by clamping suffixes.  For example, "ADDR" carries out an add at  
    32-bit precision.  "ADDH_SAT" carries out an add at 16-bit precision (or  
    better) and clamps the results to [0,1].  "ADDRC1_SSAT" carries out an add  
    at 32-bit floating-point precision, clamps the results to [-1,1], and  
    updates condition code one based on the clamped result.  
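
    As a rough illustration of the ordering rule, the C sketch below
    decodes the suffix text that remains after a base opcode has been
    identified (e.g., "RC1_SSAT" from "ADDRC1_SSAT").  The Suffixes type
    and parse_suffixes() are hypothetical, and the sketch assumes the caller
    has already separated the base opcode, which in practice requires
    knowing the full set of opcode names.

```c
#include <string.h>

/* Hypothetical decoded form of the suffixes of Table X.15. */
typedef struct {
    char type;   /* 'R', 'H', or 0 when no data type suffix is present */
    int  cc;     /* -1 = no update, 0 = CC0, 1 = CC1 */
    int  clamp;  /* 0 = none, 1 = _SAT [0,1], 2 = _SSAT [-1,1] */
} Suffixes;

/* Decodes the suffix text following a base opcode, enforcing the order
   data type, then condition code update, then clamping.
   Returns 0 on success or -1 if the text is unknown or out of order. */
int parse_suffixes(const char *s, Suffixes *out)
{
    out->type = 0;
    out->cc = -1;
    out->clamp = 0;

    if (*s == 'R' || *s == 'H') {          /* 1. data type */
        out->type = *s++;
    }
    if (*s == 'C') {                        /* 2. condition code update */
        s++;
        out->cc = 0;                        /* "C" and "C0" select CC0 */
        if (*s == '0') {
            s++;
        } else if (*s == '1') {
            out->cc = 1;
            s++;
        }
    }
    if (strcmp(s, "_SAT") == 0) {           /* 3. clamping */
        out->clamp = 1;
        s += 4;
    } else if (strcmp(s, "_SSAT") == 0) {
        out->clamp = 2;
        s += 5;
    }
    return (*s == '\0') ? 0 : -1;           /* leftovers are errors */
}
```

    Under this model, "H_SAT" decodes to 16-bit precision with [0,1]
    clamping, while "_SATR" is rejected because a data type suffix may not
    follow a clamping suffix.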
  
  
    Section 2.X.4.2, Program Operands  
  
    Most program instructions operate on one or more scalar or vector  
    operands.  Each operand specifies an operand variable, which is either the  
    name of a previously declared variable or an implicit variable declaration  
    created by using a variable binding in the instruction.  Attribute,  
    parameter, or parameter buffer variables can be declared implicitly by  
    using a valid binding name in an operand.  Instruction operands are  
    specified by the <instOperandV>, <instOperandS>, or <instOperandVNS>  
    grammar rules.  
  
    If the operand variable is not an array, its contents are loaded directly.  
    If the operand variable is an array, a single element of the array is  
    loaded according to the <arrayMem> grammar rule.  The elements of an array  
    are numbered from 0 to <n>-1, where <n> is the number of entries in the  
    array.  Array members can be accessed using either absolute or relative  
    addressing.  
  
    Absolute array addressing is used when the <arrayMemAbs> grammar rule is  
    matched; the array member to load is specified by the matching integer.  
    Out-of-bounds absolute array accesses are not allowed.  If the specified
    member number is greater than or equal to the size of the array, the  
    program will fail to load.  
  
    Relative array addressing is used when the <arrayMemRel> grammar rule is  
    matched.  This grammar rule allows the program to specify a scalar integer  
    operand and an optional constant offset, according to the <arrayMemReg>  
    and <arrayMemOffset> grammar rules.  When performing relative addressing,  
    the GL evaluates the specified integer scalar operand (according to the  
    rules specified in this section) and adds the constant offset.  The array  
    member loaded is given by this sum.  The constant offset is considered  
    zero if an offset is omitted.  If the sum is negative or greater than or
    equal to the size of the array, the results of the access are undefined,
    but must not lead to program or GL termination.  The set of constant
    offsets supported for
    relative addressing is limited to values in the range [0,<n>-1], where <n>  
    is the size of the array.  A program will fail to load if it specifies an  
    offset outside that range.  If offsets outside that range are required,  
    they can be applied by using an integer ADD instruction writing to a  
    temporary variable.  
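
    The addressing rules above can be sketched in C as follows.  The helper
    names are hypothetical; the run-time helper only reports whether an
    access is in range, since the GL leaves the results of out-of-range
    accesses undefined.

```c
#include <stdbool.h>

/* Hypothetical load-time check: the constant offset used with relative
   addressing must lie in [0, n-1] for an array of n members. */
bool rel_offset_valid(int offset, int n)
{
    return offset >= 0 && offset <= n - 1;
}

/* Hypothetical run-time model: the member accessed is the integer
   operand plus the constant offset.  Returns false when the sum is
   negative or >= n, where the GL's results are undefined. */
bool rel_member(int reg, int offset, int n, int *member)
{
    long long sum = (long long)reg + offset;   /* cannot overflow */
    if (sum < 0 || sum >= n) {
        return false;
    }
    *member = (int)sum;
    return true;
}
```

    An offset such as -1, which the load-time check rejects, can instead be
    folded into the index with an integer ADD before the access, as the
    text above suggests.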
  
    After the operand is loaded, its components can be rearranged according to  
    the <swizzleSuffix> grammar rule, or it can be converted to a scalar  
    operand according to the <scalarSuffix> grammar rule.  
  
    The <swizzleSuffix> grammar rule rearranges the components of a loaded  
    vector to produce another vector.  If the <swizzleSuffix> rule matches the  
    <xyzwSwizzle> or <rgbaSwizzle> grammar rule, a pattern of the form ".????"  
    is used, where each question mark is replaced with one of "x", "y", "z",  
    "w", "r", "g", "b", or "a".  For such patterns, the x, y, z, and w
    components of the operand are taken from the vector components named by  
    the first, second, third, and fourth character of the pattern,  
    respectively.  Swizzle components of "r", "g", "b", and "a" are equivalent  
    to "x", "y", "z", and "w", respectively.  For example, if the swizzle  
    suffix is ".yzzx" or ".gbbr" and the specified source contains {2,8,9,0},  
    the result is the vector {8,9,9,2}.  If the <swizzleSuffix> matches the  
    <component> grammar rule, a pattern of the form ".?" is used.  For this  
    pattern, all four components of the operand are taken from the single  
    component identified by the pattern.  If the swizzle suffix is omitted,  
    components are not rearranged and swizzling has no effect, as though  
    ".xyzw" were specified.  
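
    The swizzle rules can be modeled with a short C function.  The name
    apply_swizzle() is hypothetical, the leading "." is assumed to have
    been stripped from the pattern, and the sketch rejects mixed
    "xyzw"/"rgba" patterns, matching the load-time rule in the next
    paragraph.

```c
#include <string.h>

/* Hypothetical model of operand swizzling.  pattern is a selector string
   without the leading '.', either four characters or a single character
   that is replicated.  Returns 0 on success, -1 for an invalid pattern. */
int apply_swizzle(const float src[4], const char *pattern, float dst[4])
{
    size_t len = strlen(pattern);
    const char *set;
    float tmp[4];
    int idx[4];

    if (len != 1 && len != 4) {
        return -1;
    }
    /* The first selector chooses the set; all others must match it. */
    set = strchr("xyzw", pattern[0]) ? "xyzw" : "rgba";
    for (size_t i = 0; i < len; i++) {
        const char *p = strchr(set, pattern[i]);
        if (p == NULL) {
            return -1;
        }
        idx[i] = (int)(p - set);
    }
    memcpy(tmp, src, sizeof tmp);   /* allow src and dst to alias */
    for (int i = 0; i < 4; i++) {
        dst[i] = tmp[len == 1 ? idx[0] : idx[i]];
    }
    return 0;
}
```

    With a source of {2,8,9,0}, both "yzzx" and "gbbr" yield {8,9,9,2}, as
    in the example above, and "y" yields {8,8,8,8}.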
  
    The swizzle suffix rules do not allow mixing "x", "y", "z", or "w"  
    selectors with "r", "g", "b", or "a" selectors.  A program will fail to  
    load if it contains a swizzle suffix with selectors from both of these  
    sets.  
  
    The <scalarSuffix> grammar rule converts a vector to a scalar by selecting  
    a single component.  The <scalarSuffix> rule is similar to the swizzle  
    selector, except that only a single component is selected.  If the scalar  
    suffix is ".y" and the specified source contains {2,8,9,0}, the value is  
    the scalar value 8.  
  
    Next, a component-wise negate operation is performed on the operand if the  
    <operandNeg> grammar rule matches "-".  Negation is not performed if the  
    operand has no sign prefix, or is prefixed with "+".  For unsigned integer  
    operands, negation is performed as a two's complement operation.
  
    Next, a component-wise absolute value operation is performed on the  
    operand if the <instOperandAbsV> or <instOperandAbsS> grammar rule is  
    matched, by surrounding the operand with two "|" characters.  The result  
    is optionally negated if the <operandAbsNeg> grammar rule matches "-".  
    For unsigned integer operands, the absolute value operation has no effect.  
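
    The order in which these operand modifiers apply can be sketched in C
    for a single component.  The function names and flag parameters are
    hypothetical: neg models the "-" matched by <operandNeg>, abs_ models
    the enclosing "|" characters, and abs_neg models the "-" matched by
    <operandAbsNeg>.

```c
#include <stdint.h>

/* Hypothetical model for an unsigned integer component. */
uint32_t modify_uint(uint32_t x, int neg, int abs_, int abs_neg)
{
    if (neg) {
        x = 0u - x;             /* two's complement negation */
    }
    /* |x| leaves an unsigned value unchanged. */
    if (abs_ && abs_neg) {
        x = 0u - x;             /* negate the "absolute" value */
    }
    return x;
}

/* Hypothetical model for a floating-point component. */
float modify_float(float x, int neg, int abs_, int abs_neg)
{
    if (neg) {
        x = -x;
    }
    if (abs_) {
        x = (x < 0.0f) ? -x : x;   /* component-wise absolute value */
        if (abs_neg) {
            x = -x;
        }
    }
    return x;
}
```

    For example, "-|-x|" with x = 3.0 evaluates to -3.0 under this model,
    and "-x" applied to the unsigned value 5 yields 0xFFFFFFFB.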
  
  
    Section 2.X.4.3, Program Destination Variable Update  
  
    Most program instructions perform computations that produce a result,  
    which will be written to a variable.  Each instruction that computes a  
    result specifies a destination variable, which is either the name of a  
    previously declared variable or an implicit variable declaration created  
    by using a variable binding in the instruction.  Result variables can be  
    declared implicitly by using a valid program result binding name in the  
    result portion of the instruction.  Instruction results are specified  
    according to the <instResult> grammar rule.  
  
    The destination variable may be a single member of an array.  In this  
    case, a single array member is specified using the <arrayMem> grammar  
    rule, and the array member to update is computed in the exact same manner  
    as done for operand loads.  If the array member is computed at run time,  
    and is negative or greater than or equal to the size of the array, the  
    results of the destination variable update are undefined and could result  
    in overwriting other program variables.  
  
    The results of the operation may be obtained at a different precision than  
    that used to store the destination variable.  If so, the results are  
    converted to match the size of the destination variable.  For  
    floating-point values, the results are rounded to the nearest  
    floating-point value that can be represented in the destination variable.  
    If a result component is larger in magnitude than the largest  
    representable floating-point value in the data type of the destination  
    variable, an infinity encoding (+/-INF) is used.  Signed or unsigned  
    integer values are sign-extended or zero-extended, respectively, if the  
    destination variable has more bits than the result, and have their most  
    significant bits discarded if the destination variable has fewer bits.  
  
    Writes to individual components of a vector destination variable can be  
    controlled at compile time by individual component write masks specified  
    in the instruction.  The component write mask is specified by the  
    <optWriteMask> grammar rule, and is a string of up to four characters,  
    naming the components to enable for writing.  If no write mask is  
    specified, all components are enabled for writing.  The characters "x",  
    "y", "z", and "w" match the x, y, z, and w components respectively.  For  
    example, a write mask of ".xzw" indicates that the x, z, and w
    components should be enabled for writing but the y component should not be
    written.  The grammar requires that write mask components be listed in
    "xyzw" order.  Additionally, write mask components of "r", "g", "b", and
    "a" are equivalent to "x", "y", "z", and "w", respectively.  The grammar
    does not allow mixing "x", "y", "z", or "w" components with "r", "g",
    "b", or "a" components.
  
    Writes to individual components of a vector destination variable, or to a  
    scalar destination variable, can also be controlled at run time using  
    condition code write masks.  The condition code write mask is specified by  
    the <ccMask> grammar rule.  If a mask is specified, a condition code  
    variable is loaded according to the <ccMaskRule> grammar rule and tested  
    as described in Table X.16 to produce a four-component vector of TRUE/FALSE  
    values.  
  
         mask rule         test name                condition  
         ---------------   ----------------------   -----------------  
         EQ,  EQ0,  EQ1    equal                    !SF && ZF  
         GE,  GE0,  GE1    greater than or equal    !(SF ^ OF)  
         GT,  GT0,  GT1    greater than             (!SF ^ OF) && !ZF  
         LE,  LE0,  LE1    less than or equal       SF ^ (ZF || OF)  
         LT,  LT0,  LT1    less than                (SF && !ZF) ^ OF  
         NE,  NE0,  NE1    not equal                SF || !ZF  
         FL,  FL0,  FL1    false                    always false  
         TR,  TR0,  TR1    true                     always true  
  
         NAN, NAN0, NAN1   not a number             SF && ZF  
         LEG, LEG0, LEG1   less, equal, or greater  !SF || !ZF  
                             (anything but a NaN)  
  
         CF,  CF0,  CF1    carry flag               CF  
         NCF, NCF0, NCF1   no carry flag            !CF  
         OF,  OF0,  OF1    overflow flag            OF  
         NOF, NOF0, NOF1   no overflow flag         !OF  
         SF,  SF0,  SF1    sign flag                SF  
         NSF, NSF0, NSF1   no sign flag             !SF  
         AB,  AB0,  AB1    above                    CF && !ZF  
         BLE, BLE0, BLE1   below or equal           !CF || ZF  
         
      Table X.16, Condition Code Tests.  The allowed rules are specified in  
      the "mask rule" column.  If "0" or "1" is appended to the rule name  
      (e.g., "EQ1"), the corresponding condition code register (CC1 in this  
      example) is loaded, otherwise CC0 is loaded.  After loading, each  
      component is tested, using the expression listed in the "condition"  
      column.  
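
    The tests in Table X.16 can be transcribed directly into C.  In the
    sketch below, the CCFlags type and cc_test() are hypothetical; the
    caller is assumed to have already stripped any trailing "0" or "1"
    register selector from the rule name and loaded the flags of one
    condition code component.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical model of the flags held by one condition code component. */
typedef struct {
    bool sf, zf, of, cf;
} CCFlags;

/* Evaluates one Table X.16 rule (without its "0"/"1" suffix) against one
   condition code component.  Returns 1 (TRUE), 0 (FALSE), or -1 for an
   unknown rule name. */
int cc_test(const char *rule, CCFlags f)
{
    if (strcmp(rule, "EQ") == 0)  return !f.sf && f.zf;
    if (strcmp(rule, "GE") == 0)  return !(f.sf ^ f.of);
    if (strcmp(rule, "GT") == 0)  return (!f.sf ^ f.of) && !f.zf;
    if (strcmp(rule, "LE") == 0)  return f.sf ^ (f.zf || f.of);
    if (strcmp(rule, "LT") == 0)  return (f.sf && !f.zf) ^ f.of;
    if (strcmp(rule, "NE") == 0)  return f.sf || !f.zf;
    if (strcmp(rule, "FL") == 0)  return 0;
    if (strcmp(rule, "TR") == 0)  return 1;
    if (strcmp(rule, "NAN") == 0) return f.sf && f.zf;
    if (strcmp(rule, "LEG") == 0) return !f.sf || !f.zf;
    if (strcmp(rule, "CF") == 0)  return f.cf;
    if (strcmp(rule, "NCF") == 0) return !f.cf;
    if (strcmp(rule, "OF") == 0)  return f.of;
    if (strcmp(rule, "NOF") == 0) return !f.of;
    if (strcmp(rule, "SF") == 0)  return f.sf;
    if (strcmp(rule, "NSF") == 0) return !f.sf;
    if (strcmp(rule, "AB") == 0)  return f.cf && !f.zf;
    if (strcmp(rule, "BLE") == 0) return !f.cf || f.zf;
    return -1;
}
```

    With both SF and ZF set and OF clear (a NaN under the floating-point
    rules described later in this section), "EQ", "LT", "LE", "GE", and
    "GT" all evaluate to FALSE, while "NE" and "NAN" evaluate to TRUE.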
  
    After the condition code tests are performed, the four-component result  
    can be swizzled according to the <swizzleSuffix> grammar rule.  Individual  
    components of the destination variable are written only if the  
    corresponding component of the swizzled condition code test result is  
    TRUE.  If both a (compile-time) component write mask and a condition code  
    write mask are specified, destination variable components are written only  
    if the corresponding component is enabled in both masks.  
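
    A minimal C model of how the two masks combine is shown below.  It
    assumes a hypothetical bit encoding (x = bit 0, y = bit 1, z = bit 2,
    w = bit 3) and represents the condition code swizzle as an array of
    source component indices; effective_write_mask() is likewise
    hypothetical.

```c
/* Hypothetical model.  static_mask: compile-time write mask bits.
   cc_result: per-component TRUE/FALSE test results before swizzling.
   swz[i]: which test result controls destination component i. */
unsigned effective_write_mask(unsigned static_mask, unsigned cc_result,
                              const int swz[4])
{
    unsigned cc_mask = 0;
    for (int i = 0; i < 4; i++) {
        if (cc_result & (1u << swz[i])) {
            cc_mask |= 1u << i;
        }
    }
    /* A component is written only if it is enabled in both masks. */
    return static_mask & cc_mask;
}
```

    For instance, a ".xzw" write mask (0xD) combined with a ".xxxx"
    condition code swizzle whose x test passes writes x, z, and w, while an
    unswizzled test that passes only for x and y writes only x.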
  
    A program instruction can also optionally update one of the two condition  
    code registers if the "CC", "CC0", or "CC1" instruction modifier is
    specified.  These instruction modifiers update condition code register
    CC0, CC0, or CC1, respectively.  The instructions "ADD.CC" or "ADD.CC0"  
    will perform an add and update condition code zero, "ADD.CC1" will add and  
    update condition code one, and "ADD" will simply perform the add without a  
    condition code update.  The components of the selected condition code  
    register are updated if and only if the corresponding component of the  
    destination variable is enabled by both write masks.  For the purposes of
    condition code update, a scalar destination variable is treated as a  
    vector where the scalar result is written to "x" (if enabled in the write  
    mask), and writes to the "y", "z", and "w" components are disabled.  
  
    When condition code components are written, the condition code flags are  
    updated based on the corresponding component of the result.  If a  
    component of the destination register is not enabled for writes, the  
    corresponding condition code component is also unchanged.  
  
    For floating-point results, the sign flag (SF) is set if the result is  
    less than zero or is a NaN (not a number) value.  The zero flag (ZF) is  
    set if the result is equal to zero or is a NaN.  
  
    For signed and unsigned integer results, the sign flag (SF) is set if the  
    most significant bit of the value written to the result variable is set  
    and the zero flag (ZF) is set if the result written is zero.  For  
    instructions other than those performing an integer add or subtract (ADD,  
    MAD, SAD, SUB), the overflow and carry flags (OF and CF) are cleared.  
  
    For integer add or subtract operations, the overflow and carry flags are
    computed by performing both signed and unsigned adds/subtracts, as
    follows:
  
      The overflow flag (OF) is set by interpreting the two operands as signed  
      integers and performing a signed add or subtract.  If the result is  
      representable as a signed integer (i.e., doesn't overflow), the overflow  
      flag is cleared; otherwise, it is set.  
  
      The carry flag (CF) is set by interpreting the two operands as unsigned  
      integers and performing an unsigned add or subtract.  If the result of  
      an add is representable as an unsigned integer (i.e., doesn't overflow),  
      the carry flag is cleared; otherwise, it is set.  If the result of a  
      subtract is greater than or equal to zero, the carry flag is set;  
      otherwise, it is cleared.  
  
    For the purposes of condition code setting, negation modifiers turn add  
    operations into subtracts and vice versa.  If the operation is equivalent  
    to an add with both operands negated (-A-B), the carry and overflow flags  
    are both undefined.  
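
    The integer flag rules can be sketched in C for 32-bit operands.  The
    IntCC type and int_addsub_flags() are hypothetical; arithmetic is done
    on uint32_t so wraparound is well defined, and the sketch does not
    model the effect of negation modifiers described in the previous
    paragraph.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the flags for one integer result component. */
typedef struct {
    bool sf, zf, of, cf;
} IntCC;

/* Computes a + b (sub == false) or a - b (sub == true) and the flags. */
IntCC int_addsub_flags(uint32_t a, uint32_t b, bool sub, uint32_t *res)
{
    IntCC f;
    uint32_t r = sub ? a - b : a + b;

    f.sf = (r >> 31) & 1u;   /* most significant bit of the result */
    f.zf = (r == 0u);
    if (!sub) {
        /* Signed overflow: operands share a sign the result lacks. */
        f.of = ((~(a ^ b) & (a ^ r)) >> 31) & 1u;
        /* Unsigned overflow: the wrapped sum is smaller than an operand. */
        f.cf = r < a;
    } else {
        /* Signed overflow: operand signs differ and the result's sign
           differs from the sign of a. */
        f.of = (((a ^ b) & (a ^ r)) >> 31) & 1u;
        /* Per the text, CF is set when the unsigned difference is >= 0,
           i.e. when no borrow occurs. */
        f.cf = a >= b;
    }
    *res = r;
    return f;
}
```

    Following the text, CF after a subtract is set when no borrow occurs
    (a >= b as unsigned values), which is the opposite of the borrow flag
    convention used by x86 processors.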
  
  
    Section 2.X.4.4, Program Texture Access  
  
    Certain program instructions may access texture images, as described in  
    section 3.8.  The coordinates, level-of-detail, and partial derivatives  
    used for performing the texture lookup are derived from values provided in  
    the program as described in the various sub-sections of Section 2.X.8.  
    These descriptions use the function  
  
      result_t_vec  
        TextureSample(float_vec coord, float lod, float_vec ddx,   
                      float_vec ddy, int_vec offset);  
  
    which obtains a filtered texel value <tau> as described in Section 3.8.8  
    and returns a 4-component vector (R,G,B,A) according to the format  
    conversions specified in Table 3.21.  The result vector is interpreted as  
    floating-point, signed integer, or unsigned integer, according to the data  
    type modifier of the instruction.  If the internal format of the texture  
    does not match the instruction's data type modifier, the results of the
    texture lookup are undefined.  
  
    (Note:  For unextended OpenGL 2.0, all supported texture internal formats  
    store integer values but return floating-point results in the range [0,1]  
    on a texture lookup.  The ARB_texture_float extension introduces  
    floating-point internal formats where components are both stored and
    returned as floating-point values.  The EXT_texture_integer extension  
    introduces formats that both store and return either signed or unsigned  
    integer values.)  
  
    <coord> is a four-component floating-point vector from which the (s,t,r)  
    texture coordinates used for the texture access, the layer used for array  
    textures, and the reference value used for depth comparisons (section  
    3.8.14) are extracted according to Table X.17.  If the texture is a cube  
    map, (s,t,r) is projected to one of the six cube faces to produce a new  
    (s,t) vector according to Section 3.8.6.  For array textures, the layer  
    used is derived by rounding the extracted floating-point component to the  
    nearest integer and clamping the result to the range [0,<n>-1], where <n>  
    is the number of layers in the texture.  
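
    The layer selection for array textures can be sketched in C as shown
    below.  The function name is hypothetical, and two details are
    assumptions not stated here: halfway values round up, and a NaN
    coordinate selects layer 0.

```c
/* Hypothetical model: round coord to the nearest integer and clamp the
   result to [0, n-1], where n is the number of layers (n >= 1).
   Clamping in floating point first avoids overflow in the int conversion. */
int array_layer(float coord, int n)
{
    float f = coord + 0.5f;         /* assumption: ties round up */
    if (!(f >= 0.0f)) {
        return 0;                   /* below zero, or NaN (assumption) */
    }
    if (f >= (float)n) {
        return n - 1;               /* rounded value is >= n */
    }
    return (int)f;                  /* truncation equals floor for f >= 0 */
}
```

    For a four-layer texture, a coordinate of 1.49 selects layer 1, while
    -0.6 and 7.0 are clamped to layers 0 and 3, respectively.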
  
    <lod> specifies the level of detail parameter and replaces the value  
    computed in equation 3.18.  <ddx> and <ddy> specify partial derivatives  
    (ds/dx, dt/dx, dr/dx, ds/dy, dt/dy, and dr/dy) for the texture  
    coordinates, and may be used to derive footprint shapes for anisotropic  
    texture filtering.  
  
    <offset> is a constant 3-component signed integer vector specified  
    according to the <texOffset> grammar rule, which is added to the computed  
    <u>, <v>, and <w> texel locations prior to sampling.  One, two, or three  
    components may be specified in the instruction; if fewer than three are  
    specified, the remaining offset components are zero.  A limited range of  
    offset values is supported; the minimum and maximum <texOffset> values
    are implementation-dependent and given by MIN_PROGRAM_TEXEL_OFFSET_EXT and  
    MAX_PROGRAM_TEXEL_OFFSET_EXT, respectively.  A program will fail to load:  
  
      * if the texture target specified in the instruction is 1D, ARRAY1D,  
        SHADOW1D, or SHADOWARRAY1D, and the second or third component of the  
        offset vector is non-zero,  
  
      * if the texture target specified in the instruction is 2D, RECT,  
        ARRAY2D, SHADOW2D, SHADOWRECT, or SHADOWARRAY2D, and the third  
        component of the offset vector is non-zero,  
  
      * if the texture target is CUBE or SHADOWCUBE, and any component of the  
        offset vector is non-zero -- texel offsets are not supported for cube  
        map or buffer textures, or  
  
      * if any component of the offset vector is less than  
        MIN_PROGRAM_TEXEL_OFFSET_EXT or greater than  
        MAX_PROGRAM_TEXEL_OFFSET_EXT.  
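
    The load-time offset rules can be summarized with the C sketch below.
    texel_offset_ok() and target_in() are hypothetical, and min_off and
    max_off stand in for the implementation-dependent
    MIN_PROGRAM_TEXEL_OFFSET_EXT and MAX_PROGRAM_TEXEL_OFFSET_EXT values.
    Targets not named in the rules, such as 3D, are assumed to allow all
    three components.

```c
#include <stdbool.h>
#include <string.h>

/* Returns true if the <texTarget> name appears in the NULL-terminated list. */
static bool target_in(const char *target, const char *const *list)
{
    for (; *list != NULL; list++) {
        if (strcmp(target, *list) == 0) {
            return true;
        }
    }
    return false;
}

/* Hypothetical model of the texel offset load-time checks. */
bool texel_offset_ok(const char *target, const int off[3],
                     int min_off, int max_off)
{
    static const char *const one_d[] =
        { "1D", "ARRAY1D", "SHADOW1D", "SHADOWARRAY1D", NULL };
    static const char *const two_d[] =
        { "2D", "RECT", "ARRAY2D", "SHADOW2D", "SHADOWRECT",
          "SHADOWARRAY2D", NULL };
    static const char *const cube[] = { "CUBE", "SHADOWCUBE", NULL };
    int allowed = 3;    /* components that may be non-zero (e.g., 3D) */

    if (target_in(target, one_d)) allowed = 1;
    else if (target_in(target, two_d)) allowed = 2;
    else if (target_in(target, cube)) allowed = 0;

    for (int i = 0; i < 3; i++) {
        if (i >= allowed && off[i] != 0) {
            return false;   /* offset component not valid for target */
        }
        if (off[i] < min_off || off[i] > max_off) {
            return false;   /* outside the implementation's range */
        }
    }
    return true;
}
```

    A real application would query the two limits from the GL rather than
    hard-coding them, since their values are implementation-dependent.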
  
    (NOTE:  Texel offsets are a new feature provided by this extension and are  
    described in more detail in edits to Section 3.8 below.)  
  
    The texture used by TextureSample() is one of the textures bound to the  
    texture image unit whose number is specified in the instruction according  
    to the <texImageUnit> grammar rule.  The texture target accessed is  
    specified according to the <texTarget> grammar rule and Table X.17.  
    Fixed-function texture enables are always ignored when determining the  
    texture to access in a program.  
  
                                                     coordinates used  
      texTarget          Texture Type               s t r  layer  shadow  
      ----------------   ---------------------      -----  -----  ------  
      1D                 TEXTURE_1D                 x - -    -      -  
      2D                 TEXTURE_2D                 x y -    -      -  
      3D                 TEXTURE_3D                 x y z    -      -  
      CUBE               TEXTURE_CUBE_MAP           x y z    -      -  
      RECT               TEXTURE_RECTANGLE_ARB      x y -    -      -  
      ARRAY1D            TEXTURE_1D_ARRAY_EXT       x - -    y      -  
      ARRAY2D            TEXTURE_2D_ARRAY_EXT       x y -    z      -  
      SHADOW1D           TEXTURE_1D                 x - -    -      z  
      SHADOW2D           TEXTURE_2D                 x y -    -      z  
      SHADOWRECT         TEXTURE_RECTANGLE_ARB      x y -    -      z  
      SHADOWCUBE         TEXTURE_CUBE_MAP           x y z    -      w  
      SHADOWARRAY1D      TEXTURE_1D_ARRAY_EXT       x - -    y      z  
      SHADOWARRAY2D      TEXTURE_2D_ARRAY_EXT       x y -    z      w  
      BUFFER             TEXTURE_BUFFER_EXT           <not supported>  
  
      Table X.17:  Texture types accessed for each <texTarget> value, and
      coordinate mappings.  The "SHADOW" and "ARRAY" targets are special
      pseudo-targets described below.  The "coordinates used" column indicates
      the input values used for each coordinate of the texture lookup, the  
      layer selector for array textures, and the reference value for texture  
      comparisons.  Buffer textures are not supported by normal texture lookup  
      functions, but are supported by TXF and TXQ, described below.  
  
    Texture targets with "SHADOW" are used to access textures with a  
    DEPTH_COMPONENT base internal format using depth comparisons (Section  
    3.8.14).  Results of a texture access are undefined:  
  
      * if a "SHADOW" target is used, and the corresponding texture has a base  
        internal format other than DEPTH_COMPONENT or a TEXTURE_COMPARE_MODE  
        of NONE, or  
  
      * if a non-"SHADOW" target is used, and the corresponding texture has a  
        base internal format of DEPTH_COMPONENT and a TEXTURE_COMPARE_MODE  
        other than NONE.  
  
    If the texture being accessed is not complete (or cube complete for  
    cubemap textures), no texture access is performed and the result is  
    undefined.  
  
    A program will fail to load if it attempts to sample from multiple texture  
    targets (including the SHADOW pseudo-targets) on the same texture image  
    unit.  For example, a program containing any two of the following  
    instructions will fail to load:  
  
      TEX out, coord, texture[0], 1D;  
      TEX out, coord, texture[0], 2D;  
      TEX out, coord, texture[0], ARRAY2D;  
      TEX out, coord, texture[0], SHADOW2D;  
      TEX out, coord, texture[0], 3D;  
  
    Additionally, multiple texture targets for a single texture image unit may  
    not be used at the same time by the GL.  The error INVALID_OPERATION is  
    generated by Begin, RasterPos, or any command that performs an implicit  
    Begin if an enabled program accesses one texture target for a texture unit  
    while another enabled program or fixed-function fragment processing  
    accesses a different texture target for the same texture image unit.  
  
    Some texture instructions use standard methods to compute partial  
    derivatives and/or the level-of-detail used to perform texture accesses.  
    For fragment programs, the functions  
  
      float_vec ComputePartialsX(float_vec coord);  
      float_vec ComputePartialsY(float_vec coord);  
  
    compute approximate component-wise partial derivatives of the  
    floating-point vector <coord> relative to the X and Y coordinates,  
    respectively.  For vertex and geometry programs, these functions always  
    return (0,0,0,0).  The function  
  
      float ComputeLOD(float_vec ddx, float_vec ddy);  
  
    maps partial derivative vectors <ddx> and <ddy> to ds/dx, dt/dx, dr/dx,  
    ds/dy, dt/dy, and dr/dy and computes lambda_base(x,y) according to  
    equation 3.18.  
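    The Python sketch below illustrates one reading of ComputeLOD() for a
    normalized (non-RECT) texture, using the OpenGL 2.0 level-of-detail
    definitions.  The partial derivatives are scaled by the texture
    dimensions into texel space.  rho is the larger of the X and Y gradient
    lengths, and lambda_base = log2(rho).  The name compute_lod and its size
    parameter are hypothetical.  LOD bias and clamping are not modeled.
    Implementations may also approximate rho, so hardware results can differ.

```python
import math

def compute_lod(ddx, ddy, size):
    """Hypothetical sketch of lambda_base(x, y).

    ddx: (ds/dx, dt/dx, dr/dx); ddy: (ds/dy, dt/dy, dr/dy), taken on
    normalized texture coordinates.
    size: (width, height, depth) of the base level; use 1 for dimensions
    the texture does not have.
    """
    # Scale normalized derivatives into texel space (du/dx = width * ds/dx, ...).
    gx = [d * n for d, n in zip(ddx, size)]
    gy = [d * n for d, n in zip(ddy, size)]
    # rho is the larger of the lengths of the X and Y gradient vectors.
    rho = max(math.sqrt(sum(c * c for c in gx)),
              math.sqrt(sum(c * c for c in gy)))
    # lambda_base = log2(rho); a zero footprint means unbounded magnification.
    return math.log2(rho) if rho > 0 else -math.inf
```

    Consider a 256x256 2D texture whose s and t coordinates change by 1/64
    per pixel.  compute_lod((1/64, 0, 0), (0, 1/64, 0), (256, 256, 1))
    evaluates to 2.0, because a footprint of four texels per pixel gives a
    lambda_base of 2.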
  
    The TXF instruction provides the ability to extract a single texel from a  
    specified texture image using the function  
      
      result_t_vec TexelFetch(int_vec coord, int_vec offset);  
  
    The extracted texel is converted to an (R,G,B,A) vector according to Table  
    3.21.  The result vector is interpreted as floating-point, signed integer,  
    or unsigned integer, according to the data type modifier of the  
    instruction.  If the internal format of the texture is not compatible with  
    the instruction's data type modifier, the extracted texel value is  
    undefined.  
  
    <coord> is a four-component signed integer vector used to identify the  
    single texel accessed.  The (i,j,k) coordinates of the texel and the layer  
    used for array textures are extracted according to Table X.18.  The level  
    of detail accessed is obtained by adding the w component of <coord> to the  
    base level (level_base).  <offset> is a constant 3-component signed  
    integer vector added to the texel coordinates prior to the texel fetch as  
    described above.  In addition to the restrictions described above,  
    non-zero offset components are also not supported for BUFFER targets.  
  
    The texture used by TexelFetch() is specified by the image unit and target  
    parameters provided in the instruction, as for TextureSample() above.  
    Single-texel fetches cannot perform depth comparisons or access cubemaps.  
    If a program contains a TXF instruction specifying one of the "SHADOW" or  
    "CUBE" targets, it will fail to load.  
  
                                      coordinates used  
      texTarget          supported      i j k  layer  lod  
      ----------------   ---------      -----  -----  ---  
      1D                    yes         x - -    -     w  
      2D                    yes         x y -    -     w  
      3D                    yes         x y z    -     w  
      CUBE                  no          - - -    -     -  
      RECT                  yes         x y -    -     w  
      ARRAY1D               yes         x - -    y     w  
      ARRAY2D               yes         x y -    z     w  
      SHADOW1D              no          - - -    -     -  
      SHADOW2D              no          - - -    -     -  
      SHADOWRECT            no          - - -    -     -  
      SHADOWCUBE            no          - - -    -     -  
      SHADOWARRAY1D         no          - - -    -     -  
      SHADOWARRAY2D         no          - - -    -     -  
      BUFFER                yes         x - -    -     -  
  
      Table X.18, Mappings of texel fetch coordinates to texel location.  
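    The Python sketch below encodes Table X.18 as data.  It shows how a TXF
    <coord> operand and <offset> resolve to a texel location, array layer,
    and level of detail.  The names TXF_MAPPING and txf_location are
    hypothetical.  The sketch assumes that <offset> applies only to the
    i, j, and k texel coordinates and never to the array layer.

```python
# Hypothetical encoding of Table X.18: the <coord> component that supplies
# i, j, k, the array layer, and the LOD (None = unused).
TXF_MAPPING = {
    "1D":      ("x", None, None, None, "w"),
    "2D":      ("x", "y",  None, None, "w"),
    "3D":      ("x", "y",  "z",  None, "w"),
    "RECT":    ("x", "y",  None, None, "w"),
    "ARRAY1D": ("x", None, None, "y",  "w"),
    "ARRAY2D": ("x", "y",  None, "z",  "w"),
    "BUFFER":  ("x", None, None, None, None),
}

def txf_location(target, coord, offset=(0, 0, 0), level_base=0):
    """Return (i, j, k, layer, lod) for a TXF lookup on <target>."""
    if target not in TXF_MAPPING:
        # CUBE and every SHADOW target: the program fails to load.
        raise ValueError(f"TXF does not support target {target}")
    if target == "BUFFER" and any(offset):
        raise ValueError("non-zero offsets are not supported for BUFFER")
    comp = dict(zip("xyzw", coord))
    i_src, j_src, k_src, layer_src, lod_src = TXF_MAPPING[target]
    # Assumed: offsets are added to the texel coordinates, not to the layer.
    ijk = tuple(None if src is None else comp[src] + off
                for src, off in zip((i_src, j_src, k_src), offset))
    layer = None if layer_src is None else comp[layer_src]
    # The level accessed is level_base plus the w component of <coord>.
    lod = None if lod_src is None else level_base + comp[lod_src]
    return ijk + (layer, lod)
```

    For example, txf_location("ARRAY2D", (5, 7, 3, 1), (1, -1, 0), 2)
    selects texel (6, 6) in layer 3 of level 3.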
  
    Single-texel fetches do not support LOD clamping or any texture wrap mode,  
    and require a mipmapped minification filter to access any level of detail  
    other than the base level.  The results of the texel fetch are undefined:  
  
      * if the computed LOD is less than the texture's base level (level_base)  
        or greater than the maximum level (level_max),  
  
      * if the computed LOD is not the texture's base level and the texture's  
        minification filter is NEAREST or LINEAR,  
  
      * if the layer specified for array textures is negative or greater than  
        or equal to the number of layers in the array texture,  
  
      * if the (i,j,k) coordinates refer to a texel outside the defined  
        extents of the specified LOD, including its border; that is, if  
  
         i < -b_s, j < -b_s, k < -b_s,   
         i >= w_s - b_s, j >= h_s - b_s, or k >= d_s - b_s,   
  
        where the size parameters (w_s, h_s, d_s, and b_s) refer to the width,  
        height, depth, and border size of the image, as in equations 3.15,  
        3.16, and 3.17, or  
  
      * if the texture being accessed is not complete (or cube complete for  
        cubemaps).  
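    The Python sketch below checks the conditions listed above for a
    non-buffer texture.  It returns False wherever the result of a texel
    fetch would be undefined.  The function name and the dictionary
    describing the texture are hypothetical.  Array layers are assumed to
    be numbered 0 through layers-1, and w_s, h_s, d_s are the image sizes
    including the border, as in the size equations cited above.

```python
def txf_defined(i, j, k, layer, lod, tex):
    """Return True if a TXF result is defined (non-buffer textures only).

    tex is a hypothetical description of the texture with keys:
      complete, level_base, level_max, min_filter,
      layers (ignored when layer is None),
      sizes[lod] = (w_s, h_s, d_s, b_s).
    Coordinates passed as None are unused by the target and not checked.
    """
    if not tex["complete"]:
        return False
    if lod < tex["level_base"] or lod > tex["level_max"]:
        return False
    # Non-base levels need a mipmapped minification filter.
    if lod != tex["level_base"] and tex["min_filter"] in ("NEAREST", "LINEAR"):
        return False
    if layer is not None and not 0 <= layer < tex["layers"]:
        return False
    w_s, h_s, d_s, b_s = tex["sizes"][lod]
    for c, extent in ((i, w_s), (j, h_s), (k, d_s)):
        # Defined texels satisfy -b_s <= c < extent - b_s.
        if c is not None and not -b_s <= c < extent - b_s:
            return False
    return True
```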
  
  
    Section 2.X.5, Program Flow Control  
  
    In addition to basic arithmetic, logical, and texture instructions, a  
    number of flow control instructions are provided, which are described in  
    detail in Section 2.X.8.  Programs can contain several types of  
    instruction blocks:  IF/ELSE/ENDIF blocks, REP/ENDREP blocks, and  
    subroutine blocks.  IF/ELSE/ENDIF blocks are a set of instructions  
    beginning with an "IF" instruction, ending with an "ENDIF" instruction,  
    and optionally containing an "ELSE" instruction.  REP/ENDREP blocks  
    are a set of instructions beginning with a "REP" instruction and ending  
    with an "ENDREP" instruction.  Subroutine blocks begin with an instruction  
    label identifying the name of the subroutine and end just before the  
    next instruction label or the end of the program.  Examples include the  
    following:  
  
        MOVC CC, R0;  
        IF GT.x;  
          MOV R0, R1;     # executes if R0.x > 0  
        ELSE;  
          MOV R0, R2;     # executes if R0.x <= 0  
        ENDIF;  
  
        REP repCount;  
        ADD R0, R0, R1;  
        ENDREP;  
  
      square:             # subroutine to compute R0^2  
        MUL R0, R0, R0;  
        RET;  
      main:  
        MOV R0, 9.0;  
        CAL square;       # compute 9.0^2 in R0  
  
    IF/ELSE/ENDIF and REP/ENDREP blocks may be nested inside each other, and  
    inside subroutines.  In all cases, each instruction block must be  
    terminated with the appropriate instruction (ENDIF for IF, ENDREP for  
    REP).  Nested instruction blocks must be wholly contained within a block  
    -- if a REP instruction is found between an IF and ELSE instruction, the  
    corresponding ENDREP must also be present between the IF and ELSE.  
    Subroutines may not be nested inside IF/ELSE/ENDIF or REP/ENDREP blocks,  
    or inside other subroutines.  A program will fail to load if any  
    instruction block is terminated by an incorrect instruction, is not  
    terminated before the block containing it, or contains an instruction  
    label.  
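    The structural rules above can be checked with a single stack of open
    blocks.  The Python sketch below is a minimal illustration of that idea.
    It assumes a hypothetical tokenized program: a list of opcode strings in
    which labels end with ":".  It covers only block matching and labels,
    not nesting depth limits or condition codes.

```python
def check_blocks(instructions):
    """Return None if IF/ELSE/ENDIF and REP/ENDREP blocks are well formed,
    otherwise a message describing the first problem found.

    instructions: opcode strings; a trailing ':' marks a label, e.g.
    ["main:", "IF", "REP", "ENDREP", "ENDIF"].
    """
    stack = []  # open blocks: "IF", "ELSE" (an IF past its ELSE), or "REP"
    for pos, op in enumerate(instructions):
        if op.endswith(":"):
            # A label starts a new subroutine, so no block may be open.
            if stack:
                return f"{pos}: label {op} inside an open {stack[-1]} block"
        elif op in ("IF", "REP"):
            stack.append(op)
        elif op == "ELSE":
            # ELSE must belong to the innermost block, which must be an IF
            # that has not already seen an ELSE.
            if not stack or stack[-1] != "IF":
                return f"{pos}: ELSE without a matching IF"
            stack[-1] = "ELSE"
        elif op == "ENDIF":
            if not stack or stack[-1] not in ("IF", "ELSE"):
                return f"{pos}: ENDIF does not close an IF block"
            stack.pop()
        elif op == "ENDREP":
            if not stack or stack[-1] != "REP":
                return f"{pos}: ENDREP does not close a REP block"
            stack.pop()
    if stack:
        return f"end of program reached with an open {stack[-1]} block"
    return None
```

    The ENDREP for a REP found between IF and ELSE must itself appear before
    the ELSE.  A sequence like IF, REP, ELSE, ENDREP, ENDIF is therefore
    rejected at the ELSE.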
  
    IF/ELSE/ENDIF blocks evaluate a condition to determine which instructions  
    to execute.  If the condition is true, all instructions between the IF and  
    ELSE are executed.  If the condition is false, all instructions between  
    the ELSE and ENDIF are executed.  The ELSE instruction is optional.  If  
    the ELSE is omitted, all instructions between the IF and ENDIF are  
    executed if the condition is true, or skipped if the condition is false.  
    A limited amount of nesting is supported -- a program will fail to load if  
    an IF instruction is nested inside MAX_PROGRAM_IF_DEPTH_NV or more  
    IF/ELSE/ENDIF blocks.  
  
    REP/ENDREP blocks are used to execute a sequence of instructions multiple  
    times.  The REP instruction includes an optional scalar operand to specify  
    a loop count indicating the number of times the block of instructions  
    should be repeated.  If the loop count is omitted, the contents of a  
    REP/ENDREP block will be repeated indefinitely until the loop is  
    explicitly terminated.  A limited amount of nesting is supported -- a  
    program will fail to load if a REP instruction is nested inside  
    MAX_PROGRAM_LOOP_DEPTH_NV or more REP/ENDREP blocks.  
  
    Within a REP/ENDREP block, the CONT instruction can be used to terminate  
    the current iteration of the loop by effectively jumping to the ENDREP  
    instruction.  The BRK instruction can be used to terminate the entire loop  
    by effectively jumping to the instruction immediately following the ENDREP  
    instruction.  If CONT and BRK instructions are found inside multiply  
    nested REP/ENDREP blocks, they apply to the innermost block.  A program  
    will fail to load if it includes a CONT or BRK instruction that is not  
    contained inside a REP/ENDREP block.  
  
    A REP/ENDREP block without a specified loop count can result in an  
    infinite loop.  To prevent obvious infinite loops, a program will fail to  
    load if it contains a REP/ENDREP block that contains neither a BRK  
    instruction at the current nesting level nor a RET instruction at any  
    nesting level.  
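    The Python sketch below shows one way to apply this load-time check.
    It interprets "a BRK at the current nesting level" as a BRK that is not
    inside a more deeply nested REP/ENDREP block; a BRK inside an IF block
    still counts.  Following the preceding sentence, it applies the check
    only to REP instructions without a loop count.  The token format and
    function name are hypothetical.

```python
def find_unbounded_loops(instructions):
    """Return indices of count-less REP instructions whose block contains
    neither a BRK at its own loop level nor a RET at any level.

    instructions: strings such as "REP", "REP R0.x", "BRK", "RET"; a REP
    with no operand has no loop count.  Blocks are assumed well formed.
    """
    bad = []
    loops = []  # one entry per open REP: [index, has_count, saw_brk, saw_ret]
    for pos, text in enumerate(instructions):
        words = text.split()
        op = words[0]
        if op == "REP":
            loops.append([pos, len(words) > 1, False, False])
        elif op == "BRK" and loops:
            loops[-1][2] = True          # BRK exits only the innermost loop
        elif op == "RET":
            for loop in loops:           # RET leaves every enclosing loop
                loop[3] = True
        elif op == "ENDREP":
            start, has_count, saw_brk, saw_ret = loops.pop()
            if not has_count and not (saw_brk or saw_ret):
                bad.append(start)
    return bad
```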
  
    Subroutines are supported via the CAL and RET instructions.  A subroutine  
    block is identified by an instruction label, which can be any valid identifier  
    according to the <instLabel> grammar rule.  The CAL instruction identifies  
    a subroutine name to call according to the <instTarget> grammar rule.  
    Instruction labels used in CAL instructions do not need to be defined in  
    the program text that precedes the instruction, but a program will fail to  
    load if it includes a CAL instruction that references an instruction label  
    that is not defined anywhere in the program.  When a CAL instruction is  
    executed, it transfers control to the instruction immediately following  
    the specified instruction label.  Subsequent instructions in that  
    subroutine are executed until a RET instruction is executed, or until  
    program execution reaches another instruction label or the end of the  
    program text.  After the subroutine finishes, execution continues with the  
    instruction immediately following the CAL instruction.  When a RET  
    instruction is issued, it will break out of any IF/ELSE/ENDIF or  
    REP/ENDREP blocks that contain it.  
  
    Subroutines may call other subroutines before completing, up to an  
    implementation-dependent maximum depth of MAX_PROGRAM_CALL_DEPTH_NV calls.  
    Subroutines may call any subroutine in the program, including themselves,  
    as long as the call depth limit is obeyed.  Issuing a CAL instruction  
    while MAX_PROGRAM_CALL_DEPTH_NV subroutines have not completed has  
    undefined results, including possible program termination.  
  
    Several flow control instructions include condition code tests.  The IF  
    instruction requires a condition test to determine what instructions are  
    executed.  The CONT, BRK, CAL, and RET instructions have an optional  
    condition code test; if the test fails, the instruction has no effect.  
    Condition code tests are specified by the <ccTest> grammar rule.  The test  
    is evaluated like the condition code write mask (section 2.X.4.3), and  
    passes if and only if any of the four components passes.  
  
    If an instruction label named "main" is specified, GPU program execution  
    begins with the instruction immediately following that label.  Otherwise,  
    it begins with the first instruction of the program.  Instructions are  
    executed in sequence until either a RET instruction is issued in the main  
    subroutine or the end of the program text is reached.  
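    The Python sketch below traces the execution order implied by the rules
    above.  Execution starts after "main:" when that label exists.  CAL pushes
    a return point, and RET pops it.  Reaching another label or the end of
    the program acts as an implicit RET, and returning from the main
    subroutine ends the program.  The program format, the function name,
    and the max_call_depth value of 4 are hypothetical.  Conditional
    CAL/RET and the other flow-control instructions are not modeled.  The
    sketch raises an error on call stack overflow, whose results the
    specification leaves undefined.

```python
def trace_execution(program, max_call_depth=4):
    """Return the non-flow-control instructions in execution order.

    program: strings; "name:" is a label, "CAL name" calls it, "RET"
    returns, and anything else is recorded in the trace.
    """
    labels = {text[:-1]: pos for pos, text in enumerate(program)
              if text.endswith(":")}
    if "main" in labels:
        pc = labels["main"] + 1
    else:
        pc = 0
        # A leading label only names the first subroutine; skip past it.
        while pc < len(program) and program[pc].endswith(":"):
            pc += 1
    call_stack, trace = [], []
    while True:
        # End of program, explicit RET, or a new label all return.
        if (pc >= len(program) or program[pc] == "RET"
                or program[pc].endswith(":")):
            if not call_stack:
                return trace            # returning from main ends execution
            pc = call_stack.pop()
            continue
        op, *args = program[pc].split()
        if op == "CAL":
            if len(call_stack) >= max_call_depth:
                raise RuntimeError("call stack overflow (undefined in spec)")
            call_stack.append(pc + 1)   # resume after the CAL
            pc = labels[args[0]] + 1    # first instruction after the label
        else:
            trace.append(program[pc])
            pc += 1
```

    Applied to the "square" example from earlier in this section, written
    without semicolons, the trace is "MOV R0, 9.0" followed by
    "MUL R0, R0, R0".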
  
  
    Section 2.X.6, Program Options  
  
    Programs may specify a number of options to indicate that one or more  
    extended language features are used by the program.  All program options  
    used by the program must be declared at the beginning of the program  
    string.  Each program option specified in a program string will modify the  
    syntactic or semantic rules used to interpret the program and the execution  
    environment used to execute the program.  Features provided by a program  
    option are not available to a program that does not declare that option,  
    even if the option is otherwise supported by the GL.  Each option  
    declaration consists of two tokens: the keyword "OPTION" and an identifier.  
  
    The set of available options depends on the program type, and is  
    enumerated in the specifications for each program type.  Some program  
    types may not provide any options.  
  
  
    Section 2.X.7, Program Declarations  
  
    Programs may include a number of declaration statements to specify  
    characteristics of the program.  Each declaration statement is followed by  
    one or more arguments, separated by commas.  
  
    The set of available declarations depends on the program type, and is  
    enumerated in the specifications for each program type.  Some program  
    types may not provide declarations.  
  
  
    Section 2.X.8, Program Instruction Set  
  
    The following sections enumerate the set of instructions supported for GPU  
    programs.    
  
    Some instructions allow the use of one of the three basic data type  
    modifiers (floating point, signed integer, and unsigned integer).  Unless  
    otherwise mentioned:  
  
      * the result and all of the operands will be interpreted according to  
        the specified data type, and  
  
      * if no data type modifier is specified, the instruction will operate as  
        though a floating-point modifier ("F") were specified.  
  
    Some instructions will override one or both of these rules.  
  
  
    Section 2.X.8.Z, ABS:  Absolute Value  
  
    The ABS instruction performs a component-wise absolute value operation on  
    the single operand to yield a result vector.  
  
      tmp = VectorLoad(op0);   
      result.x = abs(tmp.x);  
      result.y = abs(tmp.y);  
      result.z = abs(tmp.z);  
      result.w = abs(tmp.w);  
  
    ABS supports all three data type modifiers.  Taking the absolute value of  
    an unsigned integer is not a useful operation, but is not illegal.  
  
  
    Section 2.X.8.Z, ADD:  Add  
  
    The ADD instruction performs a component-wise add of the two operands to  
    yield a result vector.  
  
      tmp0 = VectorLoad(op0);  
      tmp1 = VectorLoad(op1);  
      result.x = tmp0.x + tmp1.x;  
      result.y = tmp0.y + tmp1.y;  
      result.z = tmp0.z + tmp1.z;  
      result.w = tmp0.w + tmp1.w;  
  
    ADD supports all three data type modifiers.      
  
  
    Section 2.X.8.Z, AND:  Bitwise AND  
  
    The AND instruction performs a bitwise AND operation on the components of  
    the two source vectors to yield a result vector.  
  
      tmp0 = VectorLoad(op0);  
      tmp1 = VectorLoad(op1);  
      result.x = tmp0.x & tmp1.x;  
      result.y = tmp0.y & tmp1.y;  
      result.z = tmp0.z & tmp1.z;  
      result.w = tmp0.w & tmp1.w;  
  
    AND supports only signed and unsigned integer data type modifiers.  If no  
    type modifier is specified, both operands and the result are treated as  
    signed integers.  
  
  
    Section 2.X.8.Z, BRK:  Break out of Loop Instruction  
  
    The BRK instruction conditionally transfers control to the instruction  
    immediately following the ENDREP instruction that ends the innermost  
    enclosing REP/ENDREP block.  A BRK instruction has no effect if the  
    condition code test evaluates to FALSE.  
  
    The following pseudocode describes the operation of the instruction:  
  
      if (TestCC(cc.c***) || TestCC(cc.*c**) ||   
          TestCC(cc.**c*) || TestCC(cc.***c)) {  
        continue execution at instruction following the corresponding ENDREP;  
      }  
  
  
    Section 2.X.8.Z, CAL:  Subroutine Call  
  
    The CAL instruction conditionally transfers control to the instruction  
    following the label specified in the instruction.  It also pushes a  
    reference to the instruction immediately following the CAL instruction  
    onto the call stack, where execution will continue after executing the  
    matching RET instruction.  The following pseudocode describes the  
    operation of the instruction:  
  
      if (TestCC(cc.c***) || TestCC(cc.*c**) ||   
          TestCC(cc.**c*) || TestCC(cc.***c)) {  
        if (callStackDepth >= MAX_PROGRAM_CALL_DEPTH_NV) {  
          // undefined results  
        } else {  
          callStack[callStackDepth] = nextInstruction;  
          callStackDepth++;  
        }  
        // continue execution at instruction following <instTarget>  
      } else {  
        // do nothing  
      }  
  
    In the pseudocode, <instTarget> is the label specified in the instruction  
    matching the <instTarget> grammar rule, <callStackDepth> is the current  
    depth of the call stack, <callStack> is an array holding the call stack,  
    and <nextInstruction> is a reference to the instruction immediately  
    following the CAL instruction in the program string.  
  
    If the call stack overflows, the results of the CAL instruction are  
    undefined, and can result in immediate program termination.  
  
    An instruction label signifies the beginning of a new subroutine.  
    Subroutines may not nest or overlap.  If a CAL instruction is executed and  
    subsequent program execution reaches an instruction label before a  
    corresponding RET instruction is executed, the subroutine call returns  
    immediately, as though an unconditional RET instruction were inserted  
    immediately before the instruction label.  
  
    (Note:  On previous vertex program extensions -- NV_vertex_program2 and  
    NV_vertex_program3 -- instruction labels were also used as targets for  
    branch (BRA) instructions.  This unstructured branching functionality has  
    been replaced with the structured branching constructs found in this  
    instruction set.)  
  
  
    Section 2.X.8.Z, CEIL:  Ceiling  
  
    The CEIL instruction loads a single vector operand and performs a  
    component-wise ceiling operation to generate a result vector.  
  
      tmp = VectorLoad(op0);  
      result.x = ceil(tmp.x);  
      result.y = ceil(tmp.y);  
      result.z = ceil(tmp.z);  
      result.w = ceil(tmp.w);  
  
    The ceiling operation returns the nearest integer greater than or equal to  
    the operand.  For example, ceil(-1.7) = -1.0, ceil(+1.0) = +1.0, and  
    ceil(+3.7) = +4.0.  
  
    CEIL supports all three data type modifiers.  The single operand is always  
    treated as a floating-point vector, but the result is written as a  
    floating-point value, a signed integer, or an unsigned integer, as  
    specified by the data type modifier.  If a value is not exactly  
    representable using the data type of the result (e.g., an overflow or  
    writing a negative value to an unsigned integer), the result is undefined.  
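    The Python sketch below models how the data type modifier affects CEIL.
    The operand is always floating-point, and the result is stored as a
    float, a signed integer, or an unsigned integer.  The function name is
    hypothetical, and 32-bit integer ranges are assumed.  Where the
    specification says the result is undefined, the sketch returns None so
    that those cases are visible.

```python
import math

def ceil_instr(values, dtype="F"):
    """Model CEIL.F / CEIL.S / CEIL.U on a list of float components."""
    out = []
    for v in values:
        c = math.ceil(v)  # nearest integer >= v
        if dtype == "F":
            out.append(float(c))
        elif dtype == "S":
            # Assumed 32-bit signed destination; out of range is undefined.
            out.append(c if -2**31 <= c < 2**31 else None)
        elif dtype == "U":
            # Negative or too-large values are undefined for unsigned.
            out.append(c if 0 <= c < 2**32 else None)
        else:
            raise ValueError(f"unknown data type modifier {dtype}")
    return out
```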
  
  
    Section 2.X.8.Z, CMP:  Compare  
  
    The CMP instruction performs a component-wise comparison of the first  
    operand against zero, and copies the values of the second or third  
    operands based on the results of the compare.  
      
      tmp0 = VectorLoad(op0);  
      tmp1 = VectorLoad(op1);  
      tmp2 = VectorLoad(op2);  
      result.x = (tmp0.x < 0) ? tmp1.x : tmp2.x;  
      result.y = (tmp0.y < 0) ? tmp1.y : tmp2.y;  
      result.z = (tmp0.z < 0) ? tmp1.z : tmp2.z;  
      result.w = (tmp0.w < 0) ? tmp1.w : tmp2.w;  
  
    CMP supports all three data type modifiers.  CMP with an unsigned data  
    type modifier is not a useful operation, but is not illegal.  
  
  
    Section 2.X.8.Z, CONT:  Continue with Next Loop Iteration  
  
    The CONT instruction conditionally transfers control to the ENDREP  
    instruction that ends the innermost enclosing REP/ENDREP block.  A CONT  
    instruction has no effect if the condition code test evaluates to FALSE.  
  
    The following pseudocode describes the operation of the instruction:  
  
      if (TestCC(cc.c***) || TestCC(cc.*c**) ||   
          TestCC(cc.**c*) || TestCC(cc.***c)) {  
        continue execution at the corresponding ENDREP;  
      }  
  
  
    Section 2.X.8.Z, COS:  Cosine with Reduction to [-PI,PI]  
  
    The COS instruction approximates the trigonometric cosine of the angle  
    specified by the scalar operand and replicates it to all four components  
    of the result vector.  The angle is specified in radians and does not have  
    to be in the range [-PI,PI].  
  
      tmp = ScalarLoad(op0);  
      result.x = ApproxCosine(tmp);  
      result.y = ApproxCosine(tmp);  
      result.z = ApproxCosine(tmp);  
      result.w = ApproxCosine(tmp);  
  
    COS supports only floating-point data type modifiers.  
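    The Python sketch below illustrates the reduction step named in the
    section title.  An arbitrary angle is reduced by a multiple of 2*PI into
    [-PI,PI] before the cosine is approximated.  math.cos stands in for the
    implementation-dependent ApproxCosine(), and the function name is
    hypothetical.  The sketch does not reproduce any particular hardware
    approximation.

```python
import math

def cos_instr(angle):
    """Sketch of COS: reduce to [-pi, pi], approximate, replicate."""
    # math.remainder returns angle - n * 2*pi with n the nearest integer,
    # so the reduced angle lies in [-pi, pi].
    reduced = math.remainder(angle, 2 * math.pi)
    c = math.cos(reduced)          # stand-in for ApproxCosine()
    return (c, c, c, c)            # replicated to all four components
```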
  
  
    Section 2.X.8.Z, DDX:  Partial Derivative Relative to X  
  
    The DDX instruction computes approximate partial derivatives of a vector  
    operand with respect to the X window coordinate, and is only available to  
    fragment programs.  See the NV_fragment_program4 specification for more  
    details.  
  
  
    Section 2.X.8.Z, DDY:  Partial Derivative Relative to Y  
  
    The DDY instruction computes approximate partial derivatives of a vector  
    operand with respect to the Y window coordinate, and is only available to  
    fragment programs.  See the NV_fragment_program4 specification for more  
    details.  
  
  
    Section 2.X.8.Z, DIV:  Divide Vector Components by Scalar  
  
    The DIV instruction performs a component-wise divide of the first vector  
    operand by the second scalar operand to produce a 4-component result  
    vector.  
  
      tmp0 = VectorLoad(op0);  
      tmp1 = ScalarLoad(op1);  
      result.x = tmp0.x / tmp1;  
      result.y = tmp0.y / tmp1;  
      result.z = tmp0.z / tmp1;  
      result.w = tmp0.w / tmp1;  
  
    DIV supports all three data type modifiers.  For floating-point division,  
    this instruction is not guaranteed to produce results identical to a  
    RCP/MUL instruction sequence.  
  
    The results of a signed or unsigned integer division by zero are  
    undefined.  
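    The short Python example below shows why DIV need not match an RCP/MUL
    sequence.  Rounding the reciprocal first and then rounding the product
    can differ from a single rounded division.  It uses Python's 64-bit
    floats rather than the 32-bit floats a GPU program would use.  It only
    demonstrates the rounding effect and does not describe any particular
    implementation.

```python
x = y = 49.0
div_result = x / y                # one rounding: the exact quotient is 1
rcp_mul_result = x * (1.0 / y)    # reciprocal rounded, then product rounded
print(div_result, rcp_mul_result)
```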
  
      
    Section 2.X.8.Z, DP2:  2-Component Dot Product  
  
    The DP2 instruction computes a two-component dot product of the two  
    operands (using the first two components) and replicates the dot product  
    to all four components of the result vector.  
  
      tmp0 = VectorLoad(op0);  
      tmp1 = VectorLoad(op1);  
      dot = (tmp0.x * tmp1.x) + (tmp0.y * tmp1.y);  
      result.x = dot;  
      result.y = dot;  
      result.z = dot;  
      result.w = dot;  
  
    DP2 supports only floating-point data type modifiers.  
  
  
    Section 2.X.8.Z, DP2A:  2-Component Dot Product with Scalar Add  
  
    The DP2A instruction computes a two-component dot product of the two  
    operands (using the first two components), adds the x component of the  
    third operand, and replicates the result to all four components of the  
    result vector.  
  
      tmp0 = VectorLoad(op0);  
      tmp1 = VectorLoad(op1);  
      tmp2 = VectorLoad(op2);  
      dot = (tmp0.x * tmp1.x) + (tmp0.y * tmp1.y) + tmp2.x;  
      result.x = dot;  
      result.y = dot;  
      result.z = dot;  
      result.w = dot;  
  
    DP2A supports only floating-point data type modifiers.  
  
  
    Section 2.X.8.Z, DP3:  3-Component Dot Product  
  
    The DP3 instruction computes a three-component dot product of the two  
    operands (using the x, y, and z components) and replicates the dot product  
    to all four components of the result vector.  
  
      tmp0 = VectorLoad(op0);  
      tmp1 = VectorLoad(op1);  
      dot = (tmp0.x * tmp1.x) + (tmp0.y * tmp1.y) +   
            (tmp0.z * tmp1.z);  
      result.x = dot;  
      result.y = dot;  
      result.z = dot;  
      result.w = dot;  
  
    DP3 supports only floating-point data type modifiers.  
  
  
    Section 2.X.8.Z, DP4:  4-Component Dot Product  
  
    The DP4 instruction computes a four-component dot product of the two  
    operands and replicates the dot product to all four components of the  
    result vector.  
  
      tmp0 = VectorLoad(op0);  
      tmp1 = VectorLoad(op1);  
      dot = (tmp0.x * tmp1.x) + (tmp0.y * tmp1.y) +   
            (tmp0.z * tmp1.z) + (tmp0.w * tmp1.w);  
      result.x = dot;  
      result.y = dot;  
      result.z = dot;  
      result.w = dot;  
  
    DP4 supports only floating-point data type modifiers.  
  
  
    Section 2.X.8.Z, DPH:  Homogeneous Dot Product  
  
    The DPH instruction computes a three-component dot product of the two  
    operands (using the x, y, and z components), adds the w component of the  
    second operand, and replicates the sum to all four components of the  
    result vector.  This is equivalent to a four-component dot product where  
    the w component of the first operand is forced to 1.0.  
  
      tmp0 = VectorLoad(op0);  
      tmp1 = VectorLoad(op1);  
      dot = (tmp0.x * tmp1.x) + (tmp0.y * tmp1.y) +   
            (tmp0.z * tmp1.z) + tmp1.w;  
      result.x = dot;  
      result.y = dot;  
      result.z = dot;  
      result.w = dot;  
  
    DPH supports only floating-point data type modifiers.  
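    The Python sketch below checks the stated equivalence between DPH and a
    DP4 whose first operand has w forced to 1.0.  One typical use is
    transforming a position by a row of an affine matrix regardless of the
    position's stored w.  The helper names and sample values are
    hypothetical.

```python
def dp4(a, b):
    """DP4: four-component dot product."""
    return sum(x * y for x, y in zip(a, b))

def dph(a, b):
    """DPH, following the pseudocode above: xyz dot product plus b.w."""
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + b[3]

pos = (2.0, 3.0, 4.0, 123.0)   # DPH ignores the w of the first operand
row = (1.0, 0.0, 0.5, 10.0)    # hypothetical matrix row with translation 10
via_dph = dph(pos, row)
via_dp4 = dp4((*pos[:3], 1.0), row)
```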
  
  
    Section 2.X.8.Z, DST:  Distance Vector  
  
    The DST instruction computes a distance vector from two specially-  
    formatted operands.  The first operand should be of the form [NA, d^2,  
    d^2, NA] and the second operand should be of the form [NA, 1/d, NA, 1/d],  
    where NA values are not relevant to the calculation and d is a vector  
    length.  If both vectors satisfy these conditions, the result vector will  
    be of the form [1.0, d, d^2, 1/d].  
  
    The exact behavior is specified in the following pseudo-code:  
  
      tmp0 = VectorLoad(op0);  
      tmp1 = VectorLoad(op1);  
      result.x = 1.0;  
      result.y = tmp0.y * tmp1.y;  
      result.z = tmp0.z;  
      result.w = tmp1.w;  
  
    Given an arbitrary vector, d^2 can be obtained using the DP3 instruction  
    (using the same vector for both operands) and 1/d can be obtained from d^2  
    using the RSQ instruction.  
  
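    This distance vector is useful for per-vertex light attenuation  
    calculations:  a DP3 operation using the distance vector and a vector of  
    constant, linear, and quadratic attenuation coefficients as operands will  
    yield the reciprocal of the attenuation factor, which RCP converts to the  
    attenuation factor itself.  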
  
    DST supports only floating-point data type modifiers.  
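    The Python sketch below walks through the DP3/RSQ/DST/DP3 sequence
    described above for a light vector of length 5.  It then takes the
    reciprocal, as fixed-function GL lighting computes attenuation as
    1/(k0 + k1*d + k2*d^2).  The coefficient values and helper names are
    hypothetical.  The "NA" operand slots are filled with 0.0.

```python
import math

def dst(op0, op1):
    """DST, following the pseudocode above."""
    return (1.0, op0[1] * op1[1], op0[2], op1[3])

def dp3(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

light = (3.0, 4.0, 0.0)            # vector of length d = 5
d2 = dp3(light, light)             # DP3: d^2 = 25
inv_d = 1.0 / math.sqrt(d2)        # RSQ: 1/d = 0.2
dist = dst((0.0, d2, d2, 0.0), (0.0, inv_d, 0.0, inv_d))  # (1, d, d^2, 1/d)

k = (1.0, 0.5, 0.04)               # hypothetical constant/linear/quadratic terms
denominator = dp3(dist, k)         # k0 + k1*d + k2*d^2 = 4.5
attenuation = 1.0 / denominator    # RCP gives the attenuation factor
```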
  
  
    Section 2.X.8.Z, ELSE:  Start of If Test Else Block  
  
    The ELSE instruction signifies the end of the "execute if true" portion of  
    an IF/ELSE/ENDIF block and the beginning of the "execute if false"  
    portion.  
  
    If the condition evaluated at the IF statement was TRUE, when a program  
    reaches the ELSE statement, it has completed the entire "execute if true"  
    portion of the IF/ELSE/ENDIF block.  Execution will continue at the  
    corresponding ENDIF instruction.  
  
    If the condition evaluated at the IF statement was FALSE, program  
    execution skips over the entire "execute if true" portion of the  
    IF/ELSE/ENDIF block, including the ELSE instruction.  
  
  
    Section 2.X.8.Z, EMIT:  Emit Vertex  
  
    The EMIT instruction emits a new vertex to be added to the current output  
    primitive generated by a geometry program, and is only available to  
    geometry programs.  See the NV_geometry_program4 specification for more  
    details.  
  
  
    Section 2.X.8.Z, ENDIF:  End of If Test Block  
  
    The ENDIF instruction signifies the end of an IF/ELSE/ENDIF block.  It has  
    no other effect on program execution.  
  
  
    Section 2.X.8.Z, ENDPRIM:  End of Primitive  
  
    A geometry program can emit multiple primitives in a single invocation.  
    The ENDPRIM instruction is used in a geometry program to signify the end  
    of the current primitive and the beginning of a new primitive of the same  
    type.  It is only available to geometry programs.  See the  
    NV_geometry_program4 specification for more details.  
  
  
    Section 2.X.8.Z, ENDREP:  End of Repeat Block  
  
    The ENDREP instruction specifies the end of a REP block.    
  
    When used in conjunction with a REP instruction with a loop count,  
    ENDREP decrements the loop counter.  If the decremented loop counter is  
    greater than zero, ENDREP transfers control to the instruction immediately  
    after the corresponding REP instruction.  If the loop counter is less than  
    or equal to zero, execution continues at the instruction following the  
    ENDREP instruction.  When used in conjunction with a REP instruction  
    without a loop count, ENDREP always transfers control to the instruction  
    immediately after the REP instruction.  
  
      if (REP instruction includes a loop count) {  
        LoopCount--;  
        if (LoopCount > 0) {  
          continue execution at instruction following corresponding REP  
            instruction;  
        }  
      } else {  
        continue execution at instruction following corresponding REP  
          instruction;  
      }  
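    The Python sketch below counts how many times a REP/ENDREP body runs
    under the ENDREP rules above, with BRK and CONT outcomes supplied by a
    callback.  The function name and callback protocol are hypothetical.
    It also assumes that the REP instruction skips the block entirely when
    the initial loop count is not positive; that rule belongs to REP, not
    ENDREP, and is not stated in this section.

```python
def run_rep(loop_count, body):
    """Return the number of iterations executed.

    loop_count: initial count, or None for a REP without a count.
    body(iteration) returns "BRK", "CONT", or None for each pass.
    """
    executed = 0
    if loop_count is not None and loop_count <= 0:
        return executed          # assumed REP behavior for non-positive counts
    while True:
        executed += 1
        if body(executed) == "BRK":
            return executed      # BRK: resume after the ENDREP
        # CONT and normal completion both arrive at ENDREP.
        if loop_count is not None:
            loop_count -= 1      # ENDREP decrements the counter
            if loop_count <= 0:
                return executed  # counter exhausted: fall out of the loop
        # Otherwise ENDREP branches back to the instruction after REP.
```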
  
  
    Section 2.X.8.Z, EX2:  Exponential Base 2  
  
    The EX2 instruction approximates 2 raised to the power of the scalar  
    operand and replicates the approximation to all four components of the  
    result vector.  
  
      tmp = ScalarLoad(op0);  
      result.x = Approx2ToX(tmp);  
      result.y = Approx2ToX(tmp);  
      result.z = Approx2ToX(tmp);  
      result.w = Approx2ToX(tmp);  
  
    EX2 supports only floating-point data type modifiers.  
  
  
    Section 2.X.8.Z, FLR:  Floor  
  
    The FLR instruction loads a single vector operand and performs a  
    component-wise floor operation to generate a result vector.  
  
      tmp = VectorLoad(op0);  
      result.x = floor(tmp.x);  
      result.y = floor(tmp.y);  
      result.z = floor(tmp.z);  
      result.w = floor(tmp.w);  
  
    The floor operation returns the nearest integer less than or equal to the  
    operand.  For example, floor(-1.7) = -2.0, floor(+1.0) = +1.0, and floor(+3.7)  
    = +3.0.  
  
    FLR supports all three data type modifiers.  The single operand is always  
    treated as a floating-point value, but the result is written as a  
    floating-point value, a signed integer, or an unsigned integer, as  
    specified by the data type modifier.  If a value is not exactly  
    representable using the data type of the result (e.g., an overflow or  
    writing a negative value to an unsigned integer), the result is undefined.  
  
  
    Section 2.X.8.Z, FRC:  Fraction  
  
    The FRC instruction extracts the fractional portion of each component of  
    the operand to generate a result vector.  The fractional portion of a  
    component is defined as the result after subtracting off the floor of the  
    component (see FLR), and is always in the range [0.0, 1.0).  
  
    For negative values, the fractional portion is NOT the number written to  
    the right of the decimal point -- the fractional portion of -1.7 is not  
    0.7 -- it is 0.3.  0.3 is produced by subtracting the floor of -1.7 (-2.0)  
    from -1.7.  
  
      tmp = VectorLoad(op0);  
      result.x = fraction(tmp.x);  
      result.y = fraction(tmp.y);  
      result.z = fraction(tmp.z);  
      result.w = fraction(tmp.w);  
  
    FRC supports only floating-point data type modifiers.  
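    The following C code is a minimal host-side sketch of the fraction()
    operation, assuming IEEE single-precision floats; the names frc and frc4
    are hypothetical.  It reproduces the -1.7 example above: the result is
    about 0.3, not 0.7.

      ```c
      #include <math.h>

      /* Hypothetical helper: fraction(x) = x - floor(x), as defined above. */
      static float frc(float x)
      {
          return x - floorf(x);
      }

      /* Component-wise FRC on a 4-component vector. */
      static void frc4(const float in[4], float out[4])
      {
          for (int i = 0; i < 4; i++)
              out[i] = frc(in[i]);
      }
      ```

    In host float arithmetic, a very small negative input can round up to
    exactly 1.0, so the [0.0, 1.0) range stated above describes the
    instruction, not necessarily this sketch.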
  
  
    Section 2.X.8.Z, I2F:  Integer to Float  
  
    The I2F instruction converts the components of an integer vector operand  
    to floating-point to produce a floating-point result vector.  
  
      tmp = VectorLoad(op0);  
      result.x = (float) tmp.x;  
      result.y = (float) tmp.y;  
      result.z = (float) tmp.z;  
      result.w = (float) tmp.w;  
  
    I2F supports only signed and unsigned integer data type modifiers.  The  
    single operand is interpreted according to the data type modifier.  If no  
    data type modifier is specified, the operand is treated as a signed  
    integer vector.  The result is always written as a float.  
  
  
    Section 2.X.8.Z, IF:  Start of If Test Block  
  
    The IF instruction performs a condition code test to determine what  
    instructions inside an IF/ELSE/ENDIF block are executed.  If the test  
    passes, execution continues at the instruction immediately following the  
    IF instruction.  If the test fails, IF transfers control to the  
    instruction immediately following the corresponding ELSE instruction (if  
    present) or the ENDIF instruction (if no ELSE is present).  
  
    Implementations may have a limited ability to nest IF blocks in any  
    subroutine.  If the number of IF/ENDIF blocks nested inside each other is  
    MAX_PROGRAM_IF_DEPTH_NV or higher, a program will fail to compile.  
  
      // Evaluate the condition.  If the condition is true, continue at the
      // next instruction.  Otherwise, continue after the corresponding ELSE
      // (if present) or ENDIF.
      if (TestCC(cc.c***) || TestCC(cc.*c**) ||   
          TestCC(cc.**c*) || TestCC(cc.***c)) {  
        continue execution at the next instruction;  
      } else if (IF block contains an ELSE statement) {  
        continue execution at instruction following corresponding ELSE;  
      } else {  
        continue execution at instruction following corresponding ENDIF;  
      }  
  
    (Note:  Unlike the NV_fragment_program2 extension, there is no run-time  
    limit on the maximum overall depth of IF/ENDIF nesting.  As long as each  
    individual subroutine of the program obeys the static nesting limits,  
    there will be no run-time errors in the program.  With the  
    NV_fragment_program2 extension, a program could terminate abnormally if it  
    called a subroutine inside a very deeply nested set of IF/ENDIF blocks and  
    the called subroutine also contained deeply nested IF/ENDIF blocks.  Such
    an error could occur even if neither subroutine exceeded static limits.)  
  
  
    Section 2.X.8.Z, KIL:  Kill Fragment  
  
    The KIL instruction conditionally kills a fragment, and is only available  
    to fragment programs.  See the NV_fragment_program4 specification for more  
    details.  
  
  
    Section 2.X.8.Z, LG2:  Logarithm Base 2  
  
    The LG2 instruction approximates the base 2 logarithm of the scalar  
    operand and replicates it to all four components of the result vector.  
  
      tmp = ScalarLoad(op0);  
      result.x = ApproxLog2(tmp);  
      result.y = ApproxLog2(tmp);  
      result.z = ApproxLog2(tmp);  
      result.w = ApproxLog2(tmp);  
  
    If the scalar operand is zero or negative, the result is undefined.  
  
    LG2 supports only floating-point data type modifiers.  
  
  
    Section 2.X.8.Z, LIT:  Compute Lighting Coefficients  
  
    The LIT instruction accelerates lighting computations by computing  
    lighting coefficients for ambient, diffuse, and specular light  
    contributions.  The "x" component of the single operand is assumed to hold  
    a diffuse dot product (n dot VP_pli, as in the vertex lighting equations  
    in Section 2.13.1).  The "y" component of the operand is assumed to hold a  
    specular dot product (n dot h_i).  The "w" component of the operand is  
    assumed to hold the specular exponent of the material (s_rm), and is  
    clamped to the range (-128, +128) exclusive.  
  
    The "x" component of the result vector receives the value that should be  
    multiplied by the ambient light/material product (always 1.0).  The "y"  
    component of the result vector receives the value that should be  
    multiplied by the diffuse light/material product (n dot VP_pli).  The "z"  
    component of the result vector receives the value that should be  
    multiplied by the specular light/material product (f_i * (n dot h_i) ^  
    s_rm).  The "w" component of the result is the constant 1.0.  
  
    Negative diffuse and specular dot products are clamped to 0.0, as is done  
    in the standard per-vertex lighting operations.  In addition, if the  
    diffuse dot product is zero or negative, the specular coefficient is  
    forced to zero.  
  
      tmp = VectorLoad(op0);  
      if (tmp.x < 0) tmp.x = 0;  
      if (tmp.y < 0) tmp.y = 0;  
      if (tmp.w < -(128.0-epsilon)) tmp.w = -(128.0-epsilon);
      else if (tmp.w > 128.0-epsilon) tmp.w = 128.0-epsilon;
      result.x = 1.0;  
      result.y = tmp.x;  
      result.z = (tmp.x > 0) ? RoughApproxPower(tmp.y, tmp.w) : 0.0;  
      result.w = 1.0;  
  
    Since 0^0 is defined to be 1, RoughApproxPower(0.0, 0.0) will produce 1.0.  
  
    LIT supports only floating-point data type modifiers.  
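    The following C code is a minimal sketch of LIT under stated assumptions:
    powf() stands in for RoughApproxPower() (hardware may be less precise),
    nextafterf(128.0f, 0.0f) models "128.0-epsilon" (the exact epsilon is an
    assumption), and the vec4 struct and lit() name are hypothetical.

      ```c
      #include <math.h>

      typedef struct { float x, y, z, w; } vec4;

      static vec4 lit(vec4 op)
      {
          /* Largest float below 128.0, used as the exponent clamp bound. */
          const float wmax = nextafterf(128.0f, 0.0f);
          float diffuse = (op.x < 0.0f) ? 0.0f : op.x;   /* n dot VP_pli */
          float specDot = (op.y < 0.0f) ? 0.0f : op.y;   /* n dot h_i */
          float expo = op.w;                             /* s_rm */
          if (expo < -wmax) expo = -wmax;
          else if (expo > wmax) expo = wmax;

          vec4 r;
          r.x = 1.0f;                                    /* ambient */
          r.y = diffuse;                                 /* diffuse */
          /* Specular is forced to zero unless the diffuse term is positive.
             C's powf(0, 0) returns 1, matching the 0^0 rule above. */
          r.z = (diffuse > 0.0f) ? powf(specDot, expo) : 0.0f;
          r.w = 1.0f;
          return r;
      }
      ```

    For example, an operand of (0.5, 0.5, *, 2.0) yields (1.0, 0.5, 0.25,
    1.0), while a negative diffuse dot product yields zero diffuse and
    specular coefficients.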
  
  
    Section 2.X.8.Z, LRP:  Linear Interpolation  
  
    The LRP instruction performs a component-wise linear interpolation between  
    the second and third operands using the first operand as the blend factor.  
      
      tmp0 = VectorLoad(op0);  
      tmp1 = VectorLoad(op1);  
      tmp2 = VectorLoad(op2);  
      result.x = tmp0.x * tmp1.x + (1 - tmp0.x) * tmp2.x;  
      result.y = tmp0.y * tmp1.y + (1 - tmp0.y) * tmp2.y;  
      result.z = tmp0.z * tmp1.z + (1 - tmp0.z) * tmp2.z;  
      result.w = tmp0.w * tmp1.w + (1 - tmp0.w) * tmp2.w;  
  
    LRP supports only floating-point data type modifiers.  
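    The operand order of LRP is easy to get backwards.  The following C code
    is a minimal sketch of the pseudocode above; the vec4 struct and the
    lrp1/lrp names are hypothetical.  A blend factor of 1.0 selects the
    second operand, and 0.0 selects the third.

      ```c
      typedef struct { float x, y, z, w; } vec4;

      /* One component of LRP: t weights a (op1) and (1 - t) weights b (op2). */
      static float lrp1(float t, float a, float b)
      {
          return t * a + (1.0f - t) * b;
      }

      static vec4 lrp(vec4 t, vec4 a, vec4 b)
      {
          vec4 r = { lrp1(t.x, a.x, b.x), lrp1(t.y, a.y, b.y),
                     lrp1(t.z, a.z, b.z), lrp1(t.w, a.w, b.w) };
          return r;
      }
      ```

    Since GLSL defines mix(x, y, a) as x * (1 - a) + y * a, "LRP t, a, b"
    corresponds to mix(b, a, t), with the last two arguments swapped.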
  
  
    Section 2.X.8.Z, MAD:  Multiply and Add  
  
    The MAD instruction performs a component-wise multiply of the first two  
    operands, and then does a component-wise add of the product to the third  
    operand to yield a result vector.  
  
      tmp0 = VectorLoad(op0);  
      tmp1 = VectorLoad(op1);  
      tmp2 = VectorLoad(op2);  
      result.x = tmp0.x * tmp1.x + tmp2.x;  
      result.y = tmp0.y * tmp1.y + tmp2.y;  
      result.z = tmp0.z * tmp1.z + tmp2.z;  
      result.w = tmp0.w * tmp1.w + tmp2.w;  
  
    The multiplication and addition operations in this instruction are subject  
    to the same rules as described for the MUL and ADD instructions.  
  
    MAD supports all three data type modifiers.  
  
  
    Section 2.X.8.Z, MAX:  Maximum  
  
    The MAX instruction computes component-wise maximums of the values in the  
    two operands to yield a result vector.  
  
      tmp0 = VectorLoad(op0);  
      tmp1 = VectorLoad(op1);  
      result.x = (tmp0.x > tmp1.x) ? tmp0.x : tmp1.x;  
      result.y = (tmp0.y > tmp1.y) ? tmp0.y : tmp1.y;  
      result.z = (tmp0.z > tmp1.z) ? tmp0.z : tmp1.z;  
      result.w = (tmp0.w > tmp1.w) ? tmp0.w : tmp1.w;  
  
    MAX supports all three data type modifiers.  
  
  
    Section 2.X.8.Z, MIN:  Minimum  
  
    The MIN instruction computes component-wise minimums of the values in the  
    two operands to yield a result vector.  
  
      tmp0 = VectorLoad(op0);  
      tmp1 = VectorLoad(op1);  
      result.x = (tmp0.x > tmp1.x) ? tmp1.x : tmp0.x;  
      result.y = (tmp0.y > tmp1.y) ? tmp1.y : tmp0.y;  
      result.z = (tmp0.z > tmp1.z) ? tmp1.z : tmp0.z;  
      result.w = (tmp0.w > tmp1.w) ? tmp1.w : tmp0.w;  
  
    MIN supports all three data type modifiers.  
  
  
    Section 2.X.8.Z, MOD:  Modulus  
  
    The MOD instruction performs a component-wise modulus operation on the first  
    vector operand by the second scalar operand to produce a 4-component result  
    vector.  
  
      tmp0 = VectorLoad(op0);  
      tmp1 = ScalarLoad(op1);  
      result.x = tmp0.x % tmp1;  
      result.y = tmp0.y % tmp1;  
      result.z = tmp0.z % tmp1;  
      result.w = tmp0.w % tmp1;  
  
    MOD supports both signed and unsigned integer data type modifiers.  If no  
    data type modifier is specified, both operands and the result are treated  
    as signed integers.  
  
  
    Section 2.X.8.Z, MOV:  Move  
  
    The MOV instruction copies the value of the operand to yield a result  
    vector.  
  
      result = VectorLoad(op0);  
  
    MOV supports all three data type modifiers.  
  
  
    Section 2.X.8.Z, MUL:  Multiply  
  
    The MUL instruction performs a component-wise multiply of the two operands  
    to yield a result vector.  
  
      tmp0 = VectorLoad(op0);  
      tmp1 = VectorLoad(op1);  
      result.x = tmp0.x * tmp1.x;  
      result.y = tmp0.y * tmp1.y;  
      result.z = tmp0.z * tmp1.z;  
      result.w = tmp0.w * tmp1.w;  
  
    MUL supports all three data type modifiers.  The MUL instruction  
    additionally supports three special modifiers.  
  
    The "S24" and "U24" modifiers specify "fast" signed or unsigned integer  
    multiplies of 24-bit quantities, respectively.  The results of such  
    multiplies are undefined if either operand is outside the range  
    [-2^23,+2^23-1] for S24 or [0,2^24-1] for U24.  If "S24" or "U24" is  
    specified, the data type is implied and normal data type modifiers may not  
    be provided.  
  
    The "HI" modifier specifies a 32-bit integer multiply that returns the 32  
    most significant bits of the 64-bit product.  Integer multiplies without  
    the "HI" modifier normally return the least significant bits of the  
    product.  If "HI" is specified, either of the "S" or "U" integer data type  
    modifiers must also be specified.    
  
    Note that if condition code updates are performed on integer multiplies,  
    the overflow or carry flags are always cleared, even if the product  
    overflowed.  If it is necessary to determine if the results of an integer  
    multiply overflowed, the MUL.HI instruction may be used.  
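    The following C code is a minimal host-side sketch of what the "HI"
    modifier computes, assuming 32-bit operands and a 64-bit intermediate
    product; the function names are hypothetical.  It also shows the
    unsigned overflow test suggested above: a 32x32-bit unsigned product fits
    in 32 bits exactly when the high half is zero.

      ```c
      #include <stdint.h>

      /* MUL.U.HI: upper 32 bits of the full 64-bit unsigned product. */
      static uint32_t mul_u_hi(uint32_t a, uint32_t b)
      {
          return (uint32_t)(((uint64_t)a * b) >> 32);
      }

      /* MUL.S.HI: upper 32 bits of the full 64-bit signed product.  Right-
         shifting a negative value is implementation-defined in C; this
         assumes the usual arithmetic shift. */
      static int32_t mul_s_hi(int32_t a, int32_t b)
      {
          int64_t p = (int64_t)a * b;
          return (int32_t)(p >> 32);
      }

      /* Unsigned overflow check: the low-half product overflowed iff HI != 0. */
      static int mul_u_overflows(uint32_t a, uint32_t b)
      {
          return mul_u_hi(a, b) != 0u;
      }
      ```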
  
  
    Section 2.X.8.Z, NOT:  Bitwise Not  
  
    The NOT instruction performs a component-wise bitwise NOT operation on the  
    source vector to produce a result vector.  
  
      tmp = VectorLoad(op0);
      result.x = ~tmp.x;
      result.y = ~tmp.y;
      result.z = ~tmp.z;
      result.w = ~tmp.w;
  
    NOT supports only integer data type modifiers.  If no type modifier is  
    specified, the operand and the result are treated as signed integers.  
  
  
    Section 2.X.8.Z, NRM:  Normalize 3-Component Vector  
  
    The NRM instruction normalizes the vector given by the x, y, and z  
    components of the vector operand to produce the x, y, and z components of  
    the result vector.  The w component of the result is undefined.  
  
      tmp = VectorLoad(op0);  
      scale = ApproxRSQRT(tmp.x * tmp.x + tmp.y * tmp.y + tmp.z * tmp.z);
      result.x = tmp.x * scale;  
      result.y = tmp.y * scale;  
      result.z = tmp.z * scale;  
      result.w = undefined;  
  
    NRM supports only floating-point data type modifiers.  
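    The following C code is a minimal sketch of NRM, assuming 1.0f / sqrtf()
    as a stand-in for ApproxRSQRT(); hardware results may differ slightly.
    The vec4 struct and nrm() name are hypothetical.  Because the w
    component is undefined, this sketch deliberately writes NaN there so
    that callers cannot rely on it.

      ```c
      #include <math.h>

      typedef struct { float x, y, z, w; } vec4;

      static vec4 nrm(vec4 v)
      {
          /* Reciprocal length of the xyz part only; w is ignored. */
          float scale = 1.0f / sqrtf(v.x * v.x + v.y * v.y + v.z * v.z);
          vec4 r = { v.x * scale, v.y * scale, v.z * scale, NAN };
          return r;
      }
      ```

    In this sketch a zero-length input produces NaN components (0 times
    infinity); the specification text above does not define that case.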
  
  
    Section 2.X.8.Z, OR:  Bitwise Or  
  
    The OR instruction performs a bitwise OR operation on the components of  
    the two source vectors to yield a result vector.  
  
      tmp0 = VectorLoad(op0);  
      tmp1 = VectorLoad(op1);  
      result.x = tmp0.x | tmp1.x;  
      result.y = tmp0.y | tmp1.y;  
      result.z = tmp0.z | tmp1.z;  
      result.w = tmp0.w | tmp1.w;  
  
    OR supports only integer data type modifiers.  If no type modifier is  
    specified, both operands and the result are treated as signed integers.  
  
  
    Section 2.X.8.Z, PK2H:  Pack Two 16-bit Floats  
  
    The PK2H instruction converts the "x" and "y" components of the single  
    floating-point vector operand into 16-bit floating-point format, packs the  
    bit representation of these two floats into a 32-bit unsigned integer, and  
    replicates that value to all four components of the result vector.  The  
    PK2H instruction can be reversed by the UP2H instruction below.  
  
      tmp0 = VectorLoad(op0);  
      /* result obtained by combining raw bits of tmp0.x, tmp0.y */  
      result.x = RawBits(tmp0.x) | (RawBits(tmp0.y) << 16);  
      result.y = RawBits(tmp0.x) | (RawBits(tmp0.y) << 16);  
      result.z = RawBits(tmp0.x) | (RawBits(tmp0.y) << 16);  
      result.w = RawBits(tmp0.x) | (RawBits(tmp0.y) << 16);  
  
    PK2H supports all three data type modifiers.  The single operand is always  
    treated as a floating-point value, but the result is written as a  
    floating-point value, a signed integer, or an unsigned integer, as  
    specified by the data type modifier.  For integer results, the bits can be  
    interpreted as described above.  For floating-point result variables, the  
    packed results do not constitute a meaningful floating-point variable and  
    should only be used to feed future unpack instructions.  
  
    A program will fail to load if it contains a PK2H instruction that writes  
    its results to a variable declared as "SHORT".  
  
  
    Section 2.X.8.Z, PK2US:  Pack Two Floats as Unsigned 16-bit  
  
    The PK2US instruction converts the "x" and "y" components of the single  
    floating-point vector operand into a packed pair of 16-bit unsigned  
    scalars.  The scalars are represented in a bit pattern where all '0' bits  
    correspond to 0.0 and all '1' bits correspond to 1.0.  The bit
    representations of the two converted components are packed into a 32-bit  
    unsigned integer, and that value is replicated to all four components of  
    the result vector.  The PK2US instruction can be reversed by the UP2US  
    instruction below.  
  
      tmp0 = VectorLoad(op0);  
      if (tmp0.x < 0.0) tmp0.x = 0.0;  
      if (tmp0.x > 1.0) tmp0.x = 1.0;  
      if (tmp0.y < 0.0) tmp0.y = 0.0;  
      if (tmp0.y > 1.0) tmp0.y = 1.0;  
      us.x = round(65535.0 * tmp0.x);  /* us is a ushort vector */  
      us.y = round(65535.0 * tmp0.y);  
      /* result obtained by combining raw bits of us. */  
      result.x = ((us.x) | (us.y << 16));  
      result.y = ((us.x) | (us.y << 16));  
      result.z = ((us.x) | (us.y << 16));  
      result.w = ((us.x) | (us.y << 16));  
  
    PK2US supports all three data type modifiers.  The single operand is  
    always treated as a floating-point value, but the result is written as a  
    floating-point value, a signed integer, or an unsigned integer, as  
    specified by the data type modifier.  For integer result variables, the  
    bits can be interpreted as described above.  For floating-point result  
    variables, the packed results do not constitute a meaningful  
    floating-point variable and should only be used to feed future unpack  
    instructions.  
  
    A program will fail to load if it contains a PK2US instruction that writes
    its results to a variable declared as "SHORT".  
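    The following C code is a minimal sketch of one 32-bit PK2US result,
    assuming that round() here behaves as described for ROUND (halfway cases
    go to the even integer), which lrintf() does under the default
    floating-point rounding mode.  The clamp01 and pk2us names are
    hypothetical, and NaN inputs are not modeled.

      ```c
      #include <math.h>
      #include <stdint.h>

      /* Clamp to [0.0, 1.0]. */
      static float clamp01(float v)
      {
          if (v < 0.0f) v = 0.0f;
          if (v > 1.0f) v = 1.0f;
          return v;
      }

      /* x lands in bits 0-15 and y in bits 16-31 of the packed value. */
      static uint32_t pk2us(float x, float y)
      {
          uint32_t ux = (uint32_t)lrintf(65535.0f * clamp01(x));
          uint32_t uy = (uint32_t)lrintf(65535.0f * clamp01(y));
          return ux | (uy << 16);
      }
      ```

    For example, (0.0, 1.0) packs to 0xFFFF0000, and (0.5, 0.5) packs to
    0x80008000 because 32767.5 rounds to the even value 32768.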
  
  
    Section 2.X.8.Z, PK4B:  Pack Four Floats as Signed 8-bit  
  
    The PK4B instruction converts the four components of the single  
    floating-point vector operand into 8-bit signed quantities.  The signed  
    quantities are represented in a bit pattern where all '0' bits correspond
    to -128/127 and all '1' bits correspond to +127/127.  The bit
    representations of the four converted components are packed into a 32-bit  
    unsigned integer, and that value is replicated to all four components of  
    the result vector.  The PK4B instruction can be reversed by the UP4B  
    instruction below.  
  
      tmp0 = VectorLoad(op0);  
      if (tmp0.x < -128/127) tmp0.x = -128/127;  
      if (tmp0.y < -128/127) tmp0.y = -128/127;  
      if (tmp0.z < -128/127) tmp0.z = -128/127;  
      if (tmp0.w < -128/127) tmp0.w = -128/127;  
      if (tmp0.x > +127/127) tmp0.x = +127/127;  
      if (tmp0.y > +127/127) tmp0.y = +127/127;  
      if (tmp0.z > +127/127) tmp0.z = +127/127;  
      if (tmp0.w > +127/127) tmp0.w = +127/127;  
      ub.x = round(127.0 * tmp0.x + 128.0);  /* ub is a ubyte vector */  
      ub.y = round(127.0 * tmp0.y + 128.0);  
      ub.z = round(127.0 * tmp0.z + 128.0);  
      ub.w = round(127.0 * tmp0.w + 128.0);  
      /* result obtained by combining raw bits of ub. */  
      result.x = ((ub.x) | (ub.y << 8) | (ub.z << 16) | (ub.w << 24));  
      result.y = ((ub.x) | (ub.y << 8) | (ub.z << 16) | (ub.w << 24));  
      result.z = ((ub.x) | (ub.y << 8) | (ub.z << 16) | (ub.w << 24));  
      result.w = ((ub.x) | (ub.y << 8) | (ub.z << 16) | (ub.w << 24));  
  
    PK4B supports all three data type modifiers.  The single operand is always  
    treated as a floating-point value, but the result is written as a  
    floating-point value, a signed integer, or an unsigned integer, as  
    specified by the data type modifier.  For integer result variables, the  
    bits can be interpreted as described above.  For floating-point result  
    variables, the packed results do not constitute a meaningful  
    floating-point variable and should only be used to feed future unpack  
    instructions.  A program will fail to load if it contains a PK4B  
    instruction that writes its results to a variable declared as "SHORT".  
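    The following C code is a minimal sketch of the PK4B packing for one
    result component, assuming lrintf() under the default rounding mode as
    the round() operation (ties to even).  The pk4b name is hypothetical.
    Note the biased encoding: 0.0 maps to byte 128, -128/127 to 0, and
    +1.0 to 255.

      ```c
      #include <math.h>
      #include <stdint.h>

      /* Component i of v ends up in byte i (bits 8*i .. 8*i+7). */
      static uint32_t pk4b(const float v[4])
      {
          uint32_t packed = 0;
          for (int i = 0; i < 4; i++) {
              float c = v[i];
              if (c < -128.0f / 127.0f) c = -128.0f / 127.0f;
              if (c > 1.0f) c = 1.0f;
              uint32_t b = (uint32_t)lrintf(127.0f * c + 128.0f);
              packed |= b << (8 * i);
          }
          return packed;
      }
      ```

    For example, (0.0, 1.0, -2.0, 0.5) packs to 0xC000FF80: bytes 0x80,
    0xFF, 0x00 (clamped), and 0xC0 (191.5 rounds to the even value 192).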
  
  
    Section 2.X.8.Z, PK4UB:  Pack Four Floats as Unsigned 8-bit  
  
    The PK4UB instruction converts the four components of the single  
    floating-point vector operand into a packed grouping of 8-bit unsigned  
    scalars.  The scalars are represented in a bit pattern where all '0' bits  
    correspond to 0.0 and all '1' bits correspond to 1.0.  The bit
    representations of the four converted components are packed into a 32-bit  
    unsigned integer, and that value is replicated to all four components of  
    the result vector.  The PK4UB instruction can be reversed by the UP4UB  
    instruction below.  
  
      tmp0 = VectorLoad(op0);  
      if (tmp0.x < 0.0) tmp0.x = 0.0;  
      if (tmp0.x > 1.0) tmp0.x = 1.0;  
      if (tmp0.y < 0.0) tmp0.y = 0.0;  
      if (tmp0.y > 1.0) tmp0.y = 1.0;  
      if (tmp0.z < 0.0) tmp0.z = 0.0;  
      if (tmp0.z > 1.0) tmp0.z = 1.0;  
      if (tmp0.w < 0.0) tmp0.w = 0.0;  
      if (tmp0.w > 1.0) tmp0.w = 1.0;  
      ub.x = round(255.0 * tmp0.x);  /* ub is a ubyte vector */  
      ub.y = round(255.0 * tmp0.y);  
      ub.z = round(255.0 * tmp0.z);  
      ub.w = round(255.0 * tmp0.w);  
      /* result obtained by combining raw bits of ub. */  
      result.x = ((ub.x) | (ub.y << 8) | (ub.z << 16) | (ub.w << 24));  
      result.y = ((ub.x) | (ub.y << 8) | (ub.z << 16) | (ub.w << 24));  
      result.z = ((ub.x) | (ub.y << 8) | (ub.z << 16) | (ub.w << 24));  
      result.w = ((ub.x) | (ub.y << 8) | (ub.z << 16) | (ub.w << 24));  
  
    PK4UB supports all three data type modifiers.  The single operand is  
    always treated as a floating-point value, but the result is written as a  
    floating-point value, a signed integer, or an unsigned integer, as  
    specified by the data type modifier.  For integer result variables, the  
    bits can be interpreted as described above.  For floating-point result  
    variables, the packed results do not constitute a meaningful  
    floating-point variable and should only be used to feed future unpack  
    instructions.  
  
    A program will fail to load if it contains a PK4UB instruction that writes  
    its results to a variable declared as "SHORT".  
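    A common use of PK4UB is packing a normalized RGBA color into one 32-bit
    word.  The following C code is a minimal sketch of that packing,
    assuming lrintf() under the default rounding mode as the round()
    operation; the pk4ub name is hypothetical.

      ```c
      #include <math.h>
      #include <stdint.h>

      /* Component i of v (clamped to [0,1]) ends up in byte i. */
      static uint32_t pk4ub(const float v[4])
      {
          uint32_t packed = 0;
          for (int i = 0; i < 4; i++) {
              float c = v[i];
              if (c < 0.0f) c = 0.0f;
              if (c > 1.0f) c = 1.0f;
              uint32_t b = (uint32_t)lrintf(255.0f * c);
              packed |= b << (8 * i);
          }
          return packed;
      }
      ```

    For example, RGBA (1.0, 0.5, 0.0, 1.0) packs to 0xFF0080FF, with red in
    the least significant byte.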
  
  
    Section 2.X.8.Z, POW:  Exponentiate  
  
    The POW instruction approximates the value of the first scalar operand  
    raised to the power of the second scalar operand and replicates it to all  
    four components of the result vector.  
  
      tmp0 = ScalarLoad(op0);  
      tmp1 = ScalarLoad(op1);  
      result.x = ApproxPower(tmp0, tmp1);  
      result.y = ApproxPower(tmp0, tmp1);  
      result.z = ApproxPower(tmp0, tmp1);  
      result.w = ApproxPower(tmp0, tmp1);  
  
    The exponentiation approximation function may be implemented using the  
    base 2 exponentiation and logarithm approximation operations in the EX2  
    and LG2 instructions.  In particular,  
  
      ApproxPower(a,b) = ApproxExp2(b * ApproxLog2(a)).  
  
    Note that a logarithm may be involved even for cases where the exponent is  
    an integer.  This means that it may not be possible to exponentiate  
    correctly with a negative base.  In contrast, it is possible in a
    "normal" mathematical formulation to raise negative numbers to integral
    powers (e.g., (-3)^2 == 9, and (-0.5)^-2 == 4).
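    The following C code is a minimal sketch of the EX2/LG2 formulation,
    using the C library's exp2f() and log2f() as stand-ins for the spec's
    approximations; the approx_power name is hypothetical.  It shows why a
    negative base fails: log2f() of a negative number is NaN, whereas powf()
    handles integral exponents.

      ```c
      #include <math.h>

      /* ApproxPower(a, b) = ApproxExp2(b * ApproxLog2(a)). */
      static float approx_power(float a, float b)
      {
          return exp2f(b * log2f(a));
      }
      ```

    For instance, approx_power(2.0f, 10.0f) is about 1024, but
    approx_power(-3.0f, 2.0f) is NaN, while powf(-3.0f, 2.0f) is 9.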
  
    POW supports only floating-point data type modifiers.  
  
  
    Section 2.X.8.Z, RCC:  Reciprocal (Clamped)  
  
    The RCC instruction approximates the reciprocal of the scalar operand,  
    clamps the result to one of two ranges, and replicates the clamped result  
    to all four components of the result vector.  
  
    If the approximated reciprocal is greater than 0.0, the result is clamped  
    to the range [2^-64, 2^+64].  If the approximate reciprocal is not greater  
    than zero, the result is clamped to the range [-2^+64, -2^-64].  
  
      tmp = ScalarLoad(op0);  
      result.x = ClampApproxReciprocal(tmp);  
      result.y = ClampApproxReciprocal(tmp);  
      result.z = ClampApproxReciprocal(tmp);  
      result.w = ClampApproxReciprocal(tmp);  
  
    RCC supports only floating-point data type modifiers.  
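    The following C code is a minimal sketch of the RCC clamping rule,
    assuming IEEE single precision and an exact 1.0f / x as a stand-in for
    ApproxReciprocal(); the rcc name is hypothetical.  Division by zero
    yields an infinity, which the clamp turns into a finite value.

      ```c
      #include <math.h>

      static float rcc(float x)
      {
          const float lo = ldexpf(1.0f, -64);   /* 2^-64 */
          const float hi = ldexpf(1.0f, 64);    /* 2^+64 */
          float r = 1.0f / x;
          if (r > 0.0f) {
              /* Positive results are clamped to [2^-64, 2^+64]. */
              if (r < lo) r = lo;
              if (r > hi) r = hi;
          } else {
              /* Results not greater than zero are clamped to
                 [-2^+64, -2^-64]. */
              if (r < -hi) r = -hi;
              if (r > -lo) r = -lo;
          }
          return r;
      }
      ```

    For example, rcc(0.0f) returns 2^64 rather than infinity, and
    rcc(-0.0f) returns -2^64.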
  
  
    Section 2.X.8.Z, RCP:  Reciprocal  
  
    The RCP instruction approximates the reciprocal of the scalar operand and  
    replicates it to all four components of the result vector.  
  
      tmp = ScalarLoad(op0);  
      result.x = ApproxReciprocal(tmp);  
      result.y = ApproxReciprocal(tmp);  
      result.z = ApproxReciprocal(tmp);  
      result.w = ApproxReciprocal(tmp);  
  
    RCP supports only floating-point data type modifiers.  
  
  
    Section 2.X.8.Z, REP:  Start of Repeat Block  
  
    The REP instruction begins a REP/ENDREP block.  The REP instruction  
    supports an optional operand whose x component specifies the initial value  
    for the loop count.  The loop count indicates the number of times the  
    instructions between the REP and corresponding ENDREP instruction will be  
    executed.  If the initial value of the loop count is not positive, the  
    entire block is skipped and execution continues at the instruction  
    following the corresponding ENDREP instruction.  If the loop count is  
    specified as a floating-point value, it is converted to the largest  
    integer less than or equal to the specified value (i.e., taking its  
    floor).  
  
    If no operand is provided to REP, the loop count is ignored and the  
    corresponding ENDREP instruction unconditionally transfers control to the  
    instruction immediately following the REP instruction.  The only way to  
    exit such a loop is with the BRK instruction.  To prevent obvious infinite  
    loops, a program that includes a REP/ENDREP block with no loop count will  
    fail to compile unless it contains either a BRK instruction at the current  
    nesting level or a RET instruction at any nesting level.  
  
    Implementations may have a limited ability to nest REP/ENDREP blocks.  If  
    the number of REP/ENDREP blocks nested inside each other is  
    MAX_PROGRAM_LOOP_DEPTH_NV or higher, a program will fail to compile.  
  
      // Set up loop information for the new nesting level.  
      tmp = VectorLoad(op0);  
      LoopCount = floor(tmp.x);  
      if (LoopCount <= 0) {  
        continue execution at the corresponding ENDREP;  
      }  
  
    REP supports all three data type modifiers.  The single operand is  
    interpreted according to the data type modifier.  
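    The following C code is a minimal sketch of how many times a REP/ENDREP
    body executes for a floating-point loop count, following the floor and
    skip rules above.  The rep_iterations name is hypothetical, and the
    sketch assumes floor(count) fits in an int.

      ```c
      #include <math.h>

      static int rep_iterations(float count)
      {
          float loopCount = floorf(count);   /* e.g. 2.7 -> 2 */
          if (loopCount <= 0.0f)
              return 0;                      /* whole block is skipped */
          return (int)loopCount;
      }
      ```

    For example, a loop count of 2.7 runs the body twice, and any count
    below 1.0 skips the block entirely.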
  
    (Note:  Unlike the NV_fragment_program2 extension, REP blocks in this  
    extension support fully general looping; the specified loop count can be  
    computed in the program itself.  Additionally, there is no run-time limit  
    on the maximum overall depth of REP/ENDREP nesting.  As long as each  
    individual subroutine of the program obeys the static nesting limits,  
    there will be no run-time errors in the program.  With the  
    NV_fragment_program2 extension, a program could terminate abnormally if it  
    called a subroutine inside a deeply nested set of REP/ENDREP blocks and  
    the called subroutine also contained deeply nested REP/ENDREP blocks.  
    Such an error could occur even if neither subroutine exceeded static  
    limits.)  
  
  
    Section 2.X.8.Z, RET:  Subroutine Return  
  
    The RET instruction conditionally returns from a subroutine initiated by a  
    CAL instruction by popping an instruction reference off the top of the  
    call stack and transferring control to the referenced instruction.  The  
    following pseudocode describes the operation of the instruction:  
  
      if (TestCC(cc.c***) || TestCC(cc.*c**) ||   
          TestCC(cc.**c*) || TestCC(cc.***c)) {  
        if (callStackDepth <= 0) {  
          // terminate program  
        } else {  
          callStackDepth--;  
          instruction = callStack[callStackDepth];  
        }  
  
        // continue execution at <instruction>  
      } else {  
        // do nothing  
      }  
  
    In the pseudocode, <callStackDepth> is the depth of the call stack,  
    <callStack> is an array holding the call stack, and <instruction> is a  
    reference to an instruction previously pushed onto the call stack.  
  
    If the call stack is empty when RET executes, the program terminates  
    normally.  
  
  
    Section 2.X.8.Z, RFL:  Reflection Vector  
  
    The RFL instruction computes the reflection of the second vector operand  
    (the "direction" vector) about the vector specified by the first vector  
    operand (the "axis" vector).  Both operands are treated as 3D vectors (the  
    w components are ignored).  The result vector is another 3D vector (the  
    "reflected direction" vector).  The length of the result vector, ignoring  
    rounding errors, should equal that of the second operand.  
  
      axis = VectorLoad(op0);  
      direction = VectorLoad(op1);  
      tmp.w = (axis.x * axis.x + axis.y * axis.y + axis.z * axis.z);  
      tmp.x = (axis.x * direction.x + axis.y * direction.y +   
               axis.z * direction.z);  
      tmp.x = 2.0 * tmp.x;  
      tmp.x = tmp.x / tmp.w;  
      result.x = tmp.x * axis.x - direction.x;  
      result.y = tmp.x * axis.y - direction.y;  
      result.z = tmp.x * axis.z - direction.z;  
  
    RFL supports only floating-point data type modifiers.  
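    The following C code is a minimal sketch of the RFL pseudocode; the vec3
    struct and rfl() name are hypothetical.  Because the dot product is
    divided by the squared length of the axis, the axis does not need to be
    normalized.

      ```c
      typedef struct { float x, y, z; } vec3;

      /* result = 2 * (axis . dir) / (axis . axis) * axis - dir */
      static vec3 rfl(vec3 axis, vec3 dir)
      {
          float axisLen2 = axis.x * axis.x + axis.y * axis.y + axis.z * axis.z;
          float scale = 2.0f * (axis.x * dir.x + axis.y * dir.y +
                                axis.z * dir.z) / axisLen2;
          vec3 r = { scale * axis.x - dir.x,
                     scale * axis.y - dir.y,
                     scale * axis.z - dir.z };
          return r;
      }
      ```

    Note the sign convention: for a unit axis n, RFL(n, d) = 2(n.d)n - d,
    which is the negation of GLSL's reflect(d, n) = d - 2(n.d)n.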
  
  
    Section 2.X.8.Z, ROUND:  Round to Nearest Integer  
  
    The ROUND instruction loads a single vector operand and performs a  
    component-wise round operation to generate a result vector.  
  
      tmp = VectorLoad(op0);  
      result.x = round(tmp.x);  
      result.y = round(tmp.y);  
      result.z = round(tmp.z);  
      result.w = round(tmp.w);  
  
    The round operation returns the nearest integer to the operand.  If the  
    fractional portion of the operand is 0.5, round() selects the nearest even  
    integer.  For example, round(-1.7) = -2.0, round(+1.0) = +1.0,
    round(+3.7) = +4.0, and round(+2.5) = +2.0.
  
    ROUND supports all three data type modifiers.  The single operand is  
    always treated as a floating-point value, but the result is written as a  
    floating-point value, a signed integer, or an unsigned integer, as  
    specified by the data type modifier.  If a value is not exactly  
    representable using the data type of the result (e.g., an overflow or  
    writing a negative value to an unsigned integer), the result is undefined.  
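    The following C code is a minimal host-side sketch of the round
    operation, assuming the default round-to-nearest floating-point mode, in
    which rintf() and lrintf() send halfway cases to the even integer.  The
    function names are hypothetical.  C's roundf() is not a substitute: it
    rounds halfway cases away from zero.

      ```c
      #include <math.h>

      /* ROUND.F: nearest integer, ties to even. */
      static float round_even(float x)
      {
          return rintf(x);
      }

      /* ROUND.S: signed integer result.  Values that are not representable
         are undefined in the spec and are not handled here. */
      static long round_even_s(float x)
      {
          return lrintf(x);
      }
      ```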
  
  
    Section 2.X.8.Z, RSQ:  Reciprocal Square Root  
  
    The RSQ instruction approximates the reciprocal of the square root of the  
    scalar operand and replicates it to all four components of the result  
    vector.  
  
      tmp = ScalarLoad(op0);  
      result.x = ApproxRSQRT(tmp);  
      result.y = ApproxRSQRT(tmp);  
      result.z = ApproxRSQRT(tmp);  
      result.w = ApproxRSQRT(tmp);  
  
    If the operand is less than or equal to zero, the results of the  
    instruction are undefined.  
  
    RSQ supports only floating-point data type modifiers.  
  
    Note that this instruction differs from the RSQ instruction in  
    ARB_vertex_program in that it does not implicitly take the absolute value  
    of its operand.  The |abs| operator can be used to achieve equivalent  
    semantics.  
  
  
    Section 2.X.8.Z, SAD:  Sum of Absolute Differences  
  
    The SAD instruction performs a component-wise difference of the first two  
    integer operands (subtracting the second from the first), and then does a  
    component-wise add of the absolute value of the difference to the third  
    unsigned integer operand to yield an unsigned integer result vector.  
  
      tmp0 = VectorLoad(op0);  
      tmp1 = VectorLoad(op1);  
      tmp2 = VectorLoad(op2);  
      result.x = abs(tmp0.x - tmp1.x) + tmp2.x;  
      result.y = abs(tmp0.y - tmp1.y) + tmp2.y;  
      result.z = abs(tmp0.z - tmp1.z) + tmp2.z;  
      result.w = abs(tmp0.w - tmp1.w) + tmp2.w;  
  
    SAD supports signed and unsigned integer data type modifiers.  The first  
    two operands are interpreted according to the data type modifier.  The  
    third operand and the result are always unsigned integers.  
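    The following C code is a minimal sketch of one SAD component for each
    integer type.  The difference is formed in 64 bits so that it cannot
    overflow; wrapping of the final add modulo 2^32 is an assumption of this
    sketch, not something stated above.  The function names, including the
    sad4_u accumulation helper, are hypothetical.

      ```c
      #include <stdint.h>

      /* SAD.S: signed inputs, unsigned accumulator and result. */
      static uint32_t sad_s(int32_t a, int32_t b, uint32_t acc)
      {
          int64_t d = (int64_t)a - (int64_t)b;
          uint32_t ad = (uint32_t)(d < 0 ? -d : d);
          return ad + acc;
      }

      /* SAD.U: unsigned inputs. */
      static uint32_t sad_u(uint32_t a, uint32_t b, uint32_t acc)
      {
          uint32_t ad = (a > b) ? a - b : b - a;
          return ad + acc;
      }

      /* Hypothetical use: chain the third operand as a running total over
         four values, as in a block-matching search. */
      static uint32_t sad4_u(const uint32_t a[4], const uint32_t b[4])
      {
          uint32_t acc = 0;
          for (int i = 0; i < 4; i++)
              acc = sad_u(a[i], b[i], acc);
          return acc;
      }
      ```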
  
  
    Section 2.X.8.Z, SCS:  Sine/Cosine without Reduction  
  
    The SCS instruction approximates the trigonometric sine and cosine of the  
    angle specified by the scalar operand and places the cosine in the x  
    component and the sine in the y component of the result vector.  The z and  
    w components of the result vector are undefined.  The angle is specified  
    in radians and must be in the range [-PI,PI].  
  
      tmp = ScalarLoad(op0);  
      result.x = ApproxCosine(tmp);  
      result.y = ApproxSine(tmp);  
  
    If the scalar operand is not in the range [-PI,PI], the result vector is  
    undefined.  
  
    SCS supports only floating-point data type modifiers.  
  
  
    Section 2.X.8.Z, SEQ:  Set on Equal  
  
    The SEQ instruction performs a component-wise comparison of the two  
    operands.  Each component of the result vector returns a TRUE value  
    (described below) if the corresponding component of the first operand is  
    equal to that of the second, and a FALSE value otherwise.  
  
      tmp0 = VectorLoad(op0);  
      tmp1 = VectorLoad(op1);  
      result.x = (tmp0.x == tmp1.x) ? TRUE : FALSE;  
      result.y = (tmp0.y == tmp1.y) ? TRUE : FALSE;  
      result.z = (tmp0.z == tmp1.z) ? TRUE : FALSE;  
      result.w = (tmp0.w == tmp1.w) ? TRUE : FALSE;  
  
    SEQ supports all data type modifiers.  For floating-point data types, the  
    TRUE value is 1.0 and the FALSE value is 0.0.  For signed integer data  
    types, the TRUE value is -1 and the FALSE value is 0.  For unsigned  
    integer data types, the TRUE value is the maximum integer value (all bits  
    are ones) and the FALSE value is zero.  
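    The following C code is a minimal sketch of the TRUE/FALSE encodings
    listed above for SEQ; the function names are hypothetical.  The signed
    TRUE value -1 and the unsigned TRUE value have the same all-ones bit
    pattern, so integer results can be used directly as bit masks, while the
    floating-point TRUE value 1.0 suits multiplication.

      ```c
      #include <stdint.h>

      static float    seq_f(float a, float b)       { return a == b ? 1.0f : 0.0f; }
      static int32_t  seq_s(int32_t a, int32_t b)   { return a == b ? -1 : 0; }
      static uint32_t seq_u(uint32_t a, uint32_t b) { return a == b ? UINT32_MAX : 0u; }
      ```

    For example, ANDing a value with an SEQ.U result keeps the value where
    the comparison passed and zeroes it where it failed.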
  
  
    Section 2.X.8.Z, SFL:  Set on False  
  
    The SFL instruction is a degenerate case of the other "Set on"  
    instructions that sets all components of the result vector to a FALSE  
    value (described below).  
  
      result.x = FALSE;  
      result.y = FALSE;  
      result.z = FALSE;  
      result.w = FALSE;  
  
    SFL supports all data type modifiers.  For floating-point data types, the  
    FALSE value is 0.0.  For signed and unsigned integer data types, the FALSE  
    value is zero.  
  
  
    Section 2.X.8.Z, SGE:  Set on Greater Than or Equal  
  
    The SGE instruction performs a component-wise comparison of the two  
    operands.  Each component of the result vector returns a TRUE value  
    (described below) if the corresponding component of the first operand is  
    greater than or equal to that of the second, and a FALSE value otherwise.  
  
      tmp0 = VectorLoad(op0);  
      tmp1 = VectorLoad(op1);  
      result.x = (tmp0.x >= tmp1.x) ? TRUE : FALSE;  
      result.y = (tmp0.y >= tmp1.y) ? TRUE : FALSE;  
      result.z = (tmp0.z >= tmp1.z) ? TRUE : FALSE;  
      result.w = (tmp0.w >= tmp1.w) ? TRUE : FALSE;  
  
    SGE supports all data type modifiers.  For floating-point data types, the  
    TRUE value is 1.0 and the FALSE value is 0.0.  For signed integer data  
    types, the TRUE value is -1 and the FALSE value is 0.  For unsigned  
    integer data types, the TRUE value is the maximum integer value (all bits  
    are ones) and the FALSE value is zero.  
  
  
    Section 2.X.8.Z, SGT:  Set on Greater Than  
  
    The SGT instruction performs a component-wise comparison of the two  
    operands.  Each component of the result vector returns a TRUE value  
    (described below) if the corresponding component of the first operand is  
    greater than that of the second, and a FALSE value otherwise.  
  
      tmp0 = VectorLoad(op0);  
      tmp1 = VectorLoad(op1);  
      result.x = (tmp0.x > tmp1.x) ? TRUE : FALSE;  
      result.y = (tmp0.y > tmp1.y) ? TRUE : FALSE;  
      result.z = (tmp0.z > tmp1.z) ? TRUE : FALSE;  
      result.w = (tmp0.w > tmp1.w) ? TRUE : FALSE;  
  
    SGT supports all data type modifiers.  For floating-point data types, the  
    TRUE value is 1.0 and the FALSE value is 0.0.  For signed integer data  
    types, the TRUE value is -1 and the FALSE value is 0.  For unsigned  
    integer data types, the TRUE value is the maximum integer value (all bits  
    are ones) and the FALSE value is zero.  
  
  
    Section 2.X.8.Z, SHL:  Shift Left  
  
    The SHL instruction performs a component-wise left shift of the bits of  
    the first operand by the value of the second scalar operand to produce a  
    result vector.  The bits vacated during the shift operation are filled  
    with zeroes.  
  
      tmp0 = VectorLoad(op0);  
      tmp1 = ScalarLoad(op1);  
      result.x = tmp0.x << tmp1;  
      result.y = tmp0.y << tmp1;  
      result.z = tmp0.z << tmp1;  
      result.w = tmp0.w << tmp1;  
  
    The results of a shift operation ("<<") are undefined if the value of the  
    second operand is negative, or greater than or equal to the number of bits  
    in the first operand.  
  
    SHL supports both signed and unsigned integer data type modifiers.  If no  
    modifier is provided, the operands and the result are treated as signed  
    integers.  
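    The following is a minimal C sketch of SHL for one component.  The
    function names and the "ok" flag are illustrative only.  The flag reports
    shift counts for which the spec leaves the result undefined.

```c
#include <stdint.h>

/* Unsigned SHL: vacated low-order bits are filled with zeroes. */
static uint32_t shl_u(uint32_t v, int32_t n, int *ok)
{
    *ok = (n >= 0 && n < 32);      /* other counts are undefined in the spec */
    return *ok ? (v << n) : 0u;
}

/* Signed SHL: shift the bit pattern as unsigned to avoid C's undefined
 * behavior on left-shifting negative values.  The conversion back to
 * int32_t assumes two's complement wraparound, as on mainstream compilers. */
static int32_t shl_s(int32_t v, int32_t n, int *ok)
{
    return (int32_t)shl_u((uint32_t)v, n, ok);
}
```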
  
  
    Section 2.X.8.Z, SHR:  Shift Right  
  
    The SHR instruction performs a component-wise right shift of the bits of  
    the first operand by the value of the second scalar operand to produce a  
    result vector.  The bits vacated during the shift operation are filled  
    with zeros if the operand is non-negative and ones otherwise.  Because
    unsigned operands are never negative, their vacated bits are always zero.  
  
      tmp0 = VectorLoad(op0);  
      tmp1 = ScalarLoad(op1);  
      result.x = tmp0.x >> tmp1;  
      result.y = tmp0.y >> tmp1;  
      result.z = tmp0.z >> tmp1;  
      result.w = tmp0.w >> tmp1;  
  
    The results of a shift operation (">>") are undefined if the value of the  
    second operand is negative, or greater than or equal to the number of bits  
    in the first operand.  
  
    SHR supports both signed and unsigned integer data type modifiers.  If no  
    modifiers are provided, the operands and the result are treated as signed  
    integers.  
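    The fill rule can be sketched in C for one component.  The function
    names are hypothetical.  The signed version avoids C's
    implementation-defined right shift of negative values.  The caller is
    assumed to keep the shift count in [0, 31], because the spec leaves other
    counts undefined.

```c
#include <stdint.h>

/* Unsigned SHR: vacated bits are always zero. */
static uint32_t shr_u(uint32_t v, int32_t n) { return v >> n; }

/* Signed SHR: vacated bits copy the sign (ones for negative operands). */
static int32_t shr_s(int32_t v, int32_t n)
{
    if (v >= 0) {
        return v >> n;
    }
    return ~(~v >> n);   /* ~v is non-negative, so this shift is well defined */
}
```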
  
  
    Section 2.X.8.Z, SIN:  Sine with Reduction to [-PI,PI]  
  
    The SIN instruction approximates the trigonometric sine of the angle  
    specified by the scalar operand and replicates it to all four components  
    of the result vector.  The angle is specified in radians and does not have  
    to be in the range [-PI,PI].  
  
      tmp = ScalarLoad(op0);  
      result.x = ApproxSine(tmp);  
      result.y = ApproxSine(tmp);  
      result.z = ApproxSine(tmp);  
      result.w = ApproxSine(tmp);  
  
    SIN supports only floating-point data type modifiers.  
  
  
    Section 2.X.8.Z, SLE:  Set on Less Than or Equal  
  
    The SLE instruction performs a component-wise comparison of the two  
    operands.  Each component of the result vector returns a TRUE value  
    (described below) if the corresponding component of the first operand is  
    less than or equal to that of the second, and a FALSE value otherwise.  
  
      tmp0 = VectorLoad(op0);  
      tmp1 = VectorLoad(op1);  
      result.x = (tmp0.x <= tmp1.x) ? TRUE : FALSE;  
      result.y = (tmp0.y <= tmp1.y) ? TRUE : FALSE;  
      result.z = (tmp0.z <= tmp1.z) ? TRUE : FALSE;  
      result.w = (tmp0.w <= tmp1.w) ? TRUE : FALSE;  
  
    SLE supports all data type modifiers.  For floating-point data types, the  
    TRUE value is 1.0 and the FALSE value is 0.0.  For signed integer data  
    types, the TRUE value is -1 and the FALSE value is 0.  For unsigned  
    integer data types, the TRUE value is the maximum integer value (all bits  
    are ones) and the FALSE value is zero.  
  
  
    Section 2.X.8.Z, SLT:  Set on Less Than  
  
    The SLT instruction performs a component-wise comparison of the two  
    operands.  Each component of the result vector returns a TRUE value  
    (described below) if the corresponding component of the first operand is  
    less than that of the second, and a FALSE value otherwise.  
  
      tmp0 = VectorLoad(op0);  
      tmp1 = VectorLoad(op1);  
      result.x = (tmp0.x < tmp1.x) ? TRUE : FALSE;  
      result.y = (tmp0.y < tmp1.y) ? TRUE : FALSE;  
      result.z = (tmp0.z < tmp1.z) ? TRUE : FALSE;  
      result.w = (tmp0.w < tmp1.w) ? TRUE : FALSE;  
  
    SLT supports all data type modifiers.  For floating-point data types, the  
    TRUE value is 1.0 and the FALSE value is 0.0.  For signed integer data  
    types, the TRUE value is -1 and the FALSE value is 0.  For unsigned  
    integer data types, the TRUE value is the maximum integer value (all bits  
    are ones) and the FALSE value is zero.  
  
  
    Section 2.X.8.Z, SNE:  Set on Not Equal  
  
    The SNE instruction performs a component-wise comparison of the two  
    operands.  Each component of the result vector returns a TRUE value  
    (described below) if the corresponding component of the first operand is  
    not equal to that of the second, and a FALSE value otherwise.  
  
      tmp0 = VectorLoad(op0);  
      tmp1 = VectorLoad(op1);  
      result.x = (tmp0.x != tmp1.x) ? TRUE : FALSE;  
      result.y = (tmp0.y != tmp1.y) ? TRUE : FALSE;  
      result.z = (tmp0.z != tmp1.z) ? TRUE : FALSE;  
      result.w = (tmp0.w != tmp1.w) ? TRUE : FALSE;  
  
    SNE supports all data type modifiers.  For floating-point data types, the  
    TRUE value is 1.0 and the FALSE value is 0.0.  For signed integer data  
    types, the TRUE value is -1 and the FALSE value is 0.  For unsigned  
    integer data types, the TRUE value is the maximum integer value (all bits  
    are ones) and the FALSE value is zero.  
  
  
    Section 2.X.8.Z, SSG:  Set Sign  
  
    The SSG instruction generates a result vector containing the signs of  
    each component of the single vector operand.  Each component of the  
    result vector is 1.0 if the corresponding component of the operand  
    is greater than zero, 0.0 if the corresponding component of the  
    operand is equal to zero, and -1.0 if the corresponding component  
    of the operand is less than zero.  
  
      tmp = VectorLoad(op0);  
      result.x = SetSign(tmp.x);  
      result.y = SetSign(tmp.y);  
      result.z = SetSign(tmp.z);  
      result.w = SetSign(tmp.w);  
  
    SSG supports only floating-point data type modifiers.  
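    The SetSign() function used above can be sketched in C for a single
    component.  The spec does not define the result for NaN operands.  This
    sketch happens to return 0.0 for them, and it also maps -0.0 to 0.0.

```c
/* SetSign sketch: 1.0 for positive, -1.0 for negative, 0.0 otherwise. */
static float ssg(float v)
{
    if (v > 0.0f) return 1.0f;
    if (v < 0.0f) return -1.0f;
    return 0.0f;
}
```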
  
  
    Section 2.X.8.Z, STR:  Set on True  
  
    The STR instruction is a degenerate case of the other "Set on"  
    instructions that sets all components of the result vector to a TRUE value  
    (described below).  
  
      result.x = TRUE;  
      result.y = TRUE;  
      result.z = TRUE;  
      result.w = TRUE;  
  
    STR supports all data type modifiers.  For floating-point data types, the  
    TRUE value is 1.0.  For signed integer data types, the TRUE value is -1.  
    For unsigned integer data types, the TRUE value is the maximum integer  
    value (all bits are ones).  
  
  
    Section 2.X.8.Z, SUB:  Subtract  
  
    The SUB instruction performs a component-wise subtraction of the second  
    operand from the first to yield a result vector.  
  
      tmp0 = VectorLoad(op0);  
      tmp1 = VectorLoad(op1);  
      result.x = tmp0.x - tmp1.x;  
      result.y = tmp0.y - tmp1.y;  
      result.z = tmp0.z - tmp1.z;  
      result.w = tmp0.w - tmp1.w;  
  
    SUB supports all three data type modifiers.  
  
  
    Section 2.X.8.Z, SWZ:  Extended Swizzle  
  
    The SWZ instruction loads the single vector operand and performs a  
    swizzle operation, more powerful than that provided for loading normal  
    vector operands, to yield a result vector.  
  
    After the operand is loaded, the "x", "y", "z", and "w" components of the  
    result vector are selected by the first, second, third, and fourth matches  
    of the <extSwizComp> pattern in the <extendedSwizzle> rule.  
  
    A result component can be selected from any of the four components of the  
    operand or the constants 0.0 and 1.0.  The result component can also be  
    optionally negated.  The following pseudocode describes the component  
    selection method.  "operand" refers to the vector operand, "select" is an  
    enumerant where the values ZERO, ONE, X, Y, Z, and W correspond to the  
    <extSwizSel> rule matching "0", "1", "x", "y", "z", and "w", respectively.  
    "negate" is TRUE if and only if the <optionalSign> rule in <extSwizComp>  
    matches "-".  
  
      float ExtSwizComponent(floatVec operand, enum select, boolean negate)  
      {  
          float result;  
          switch (select) {  
            case ZERO:  result = 0.0; break;  
            case ONE:   result = 1.0; break;  
            case X:     result = operand.x; break;  
            case Y:     result = operand.y; break;  
            case Z:     result = operand.z; break;  
            case W:     result = operand.w; break;  
          }  
          if (negate) {  
            result = -result;  
          }  
          return result;  
      }  
  
    The entire extended swizzle operation is then defined using the following  
    pseudocode:  
  
      tmp = VectorLoad(op0);  
      result.x = ExtSwizComponent(tmp, xSelect, xNegate);  
      result.y = ExtSwizComponent(tmp, ySelect, yNegate);  
      result.z = ExtSwizComponent(tmp, zSelect, zNegate);  
      result.w = ExtSwizComponent(tmp, wSelect, wNegate);  
  
    "xSelect", "xNegate", "ySelect", "yNegate", "zSelect", "zNegate",  
    "wSelect", and "wNegate" correspond to the "select" and "negate" values  
    above for the four <extSwizComp> matches.    
  
    Since this instruction allows for component selection and negation for  
    each individual component, the grammar does not allow the use of the  
    normal swizzle and negation operations allowed for vector operands in  
    other instructions.  
  
    SWZ supports only floating-point data type modifiers.  
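    The ExtSwizComponent() pseudocode above translates directly into C.  The
    vec4 type, the selector enum, and the swz() helper are hypothetical
    names.  The usage below applies the extended swizzle "-x,0,1,w".

```c
/* Illustrative vector type and selector enum; the names are hypothetical. */
typedef struct { float x, y, z, w; } vec4;
typedef enum { SEL_ZERO, SEL_ONE, SEL_X, SEL_Y, SEL_Z, SEL_W } swz_select;

/* Mirrors ExtSwizComponent(): select a component or constant, then negate. */
static float ext_swiz_component(vec4 v, swz_select sel, int negate)
{
    float r = 0.0f;
    switch (sel) {
    case SEL_ZERO: r = 0.0f; break;
    case SEL_ONE:  r = 1.0f; break;
    case SEL_X:    r = v.x;  break;
    case SEL_Y:    r = v.y;  break;
    case SEL_Z:    r = v.z;  break;
    case SEL_W:    r = v.w;  break;
    }
    return negate ? -r : r;
}

/* Applies one selector/negate pair per result component, x through w. */
static vec4 swz(vec4 v, const swz_select sel[4], const int neg[4])
{
    vec4 r;
    r.x = ext_swiz_component(v, sel[0], neg[0]);
    r.y = ext_swiz_component(v, sel[1], neg[1]);
    r.z = ext_swiz_component(v, sel[2], neg[2]);
    r.w = ext_swiz_component(v, sel[3], neg[3]);
    return r;
}
```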
  
  
    Section 2.X.8.Z, TEX:  Texture Sample  
  
    The TEX instruction takes the four components of a single floating-point  
    source vector and performs a filtered texture access as described in  
    Section 2.X.4.4.  The returned (R,G,B,A) value is written to the  
    floating-point result vector.  Partial derivatives and the level of detail  
    are computed automatically.  
  
      tmp = VectorLoad(op0);  
      ddx = ComputePartialsX(tmp);  
      ddy = ComputePartialsY(tmp);  
      lambda = ComputeLOD(ddx, ddy);  
      result = TextureSample(tmp, lambda, ddx, ddy, texelOffset);  
  
    TEX supports all three data type modifiers.  The single operand is always  
    treated as a floating-point vector; the results are interpreted according  
    to the data type modifier.  
  
  
    Section 2.X.8.Z, TRUNC:  Truncate (Round Toward Zero)  
  
    The TRUNC instruction loads a single vector operand and performs a  
    component-wise truncate operation to generate a result vector.  
  
      tmp = VectorLoad(op0);  
      result.x = trunc(tmp.x);  
      result.y = trunc(tmp.y);  
      result.z = trunc(tmp.z);  
      result.w = trunc(tmp.w);  
  
    The truncate operation rounds toward zero.  It returns the integer  
    nearest to the operand whose magnitude is no greater than that of the  
    operand.  For example, trunc(-1.7) = -1.0, trunc(+1.0) = +1.0, and  
    trunc(+3.7) = +3.0.  
  
    TRUNC supports all three data type modifiers.  The single operand is  
    always treated as a floating-point value, but the result is written as a  
    floating-point value, a signed integer, or an unsigned integer, as  
    specified by the data type modifier.  If a value is not exactly  
    representable using the data type of the result (e.g., an overflow or  
    writing a negative value to an unsigned integer), the result is undefined.  
  
  
    Section 2.X.8.Z, TXB:  Texture Sample with Bias  
  
    The TXB instruction takes the four components of a single floating-point  
    source vector and performs a filtered texture access as described in  
    Section 2.X.4.4.  The returned (R,G,B,A) value is written to the  
    floating-point result vector.  Partial derivatives and the level of detail  
    are computed automatically, but the fourth component of the source vector  
    is added to the computed LOD prior to sampling.  
  
      tmp = VectorLoad(op0);  
      ddx = ComputePartialsX(tmp);  
      ddy = ComputePartialsY(tmp);  
      lambda = ComputeLOD(ddx, ddy);  
      result = TextureSample(tmp, lambda + tmp.w, ddx, ddy, texelOffset);  
  
    The single source vector in the TXB instruction does not have enough  
    coordinates to specify a lookup into a two-dimensional array texture or  
    cube map texture with both an LOD bias and an explicit reference value for  
    depth comparison.  A program will fail to load if it contains a TXB  
    instruction with a target of SHADOWCUBE or SHADOWARRAY2D.  
  
    TXB supports all three data type modifiers.  The single operand is always  
    treated as a floating-point vector; the results are interpreted according  
    to the data type modifier.  
  
  
    Section 2.X.8.Z, TXD:  Texture Sample with Partials        
  
    The TXD instruction takes the four components of the first floating-point  
    source vector and performs a filtered texture access as described in  
    Section 2.X.4.4.  The returned (R,G,B,A) value is written to the  
    floating-point result vector.  The partial derivatives of the texture  
    coordinates with respect to X and Y are specified by the second and third  
    floating-point source vectors.  The level of detail is computed  
    automatically using the provided partial derivatives.  
  
    Note that for cube map texture targets, the provided partial derivatives  
    are in the coordinate system used before texture coordinates are projected  
    onto the appropriate cube face.  The partial derivatives of the  
    post-projection texture coordinates, which are used for level-of-detail  
    and anisotropic filtering calculations, are derived from the original  
    coordinates and partial derivatives in an implementation-dependent manner.  
  
      tmp0 = VectorLoad(op0);  
      tmp1 = VectorLoad(op1);  
      tmp2 = VectorLoad(op2);  
      lambda = ComputeLOD(tmp1, tmp2);  
      result = TextureSample(tmp0, lambda, tmp1, tmp2, texelOffset);  
  
    TXD supports all three data type modifiers.  All three operands are always  
    treated as floating-point vectors; the results are interpreted according  
    to the data type modifier.  
  
  
    Section 2.X.8.Z, TXF:  Texel Fetch  
  
    The TXF instruction takes the four components of a single signed integer  
    source vector and performs a single texel fetch as described in Section  
    2.X.4.4.  The first three components provide the <i>, <j>, and <k> values  
    for the texel fetch, and the fourth component is used to determine the LOD  
    to access.  The returned (R,G,B,A) value is written to the floating-point  
    result vector.  Partial derivatives are irrelevant for single texel  
    fetches.  
  
      tmp = VectorLoad(op0);  
      result = TexelFetch(tmp, texelOffset);  
  
    TXF supports all three data type modifiers.  The single vector operand is  
    treated as a signed integer vector; the results are interpreted according  
    to the data type modifier.  
  
  
    Section 2.X.8.Z, TXL:  Texture Sample with LOD  
  
    The TXL instruction takes the four components of a single floating-point  
    source vector and performs a filtered texture access as described in  
    Section 2.X.4.4.  The returned (R,G,B,A) value is written to the  
    floating-point result vector.  The level of detail is taken from the  
    fourth component of the source vector.  
  
    Partial derivatives are not computed by the TXL instruction and  
    anisotropic filtering is not performed.  
  
      tmp = VectorLoad(op0);  
      ddx = (0,0,0);  
      ddy = (0,0,0);  
      result = TextureSample(tmp, tmp.w, ddx, ddy, texelOffset);  
  
    The single source vector in the TXL instruction does not have enough  
    coordinates to specify a lookup into a 2D array or cube map texture with  
    both an explicit LOD and a reference value for depth comparison.  A  
    program will fail to load if it contains a TXL instruction with a target  
    of SHADOWCUBE or SHADOWARRAY2D.  
  
    TXL supports all three data type modifiers.  The single vector operand is  
    treated as a floating-point vector; the results are interpreted according  
    to the data type modifier.  
  
  
    Section 2.X.8.Z, TXP:  Texture Sample with Projection  
  
    The TXP instruction divides the first three components of its single  
    floating-point source vector by its fourth component, maps the results to  
    s, t, and r, and performs a filtered texture access as described in  
    Section 2.X.4.4.  The returned (R,G,B,A) value is written to the  
    floating-point result vector.  Partial derivatives and the level of detail  
    are computed automatically.  
  
      tmp0 = VectorLoad(op0);  
      tmp0.x = tmp0.x / tmp0.w;  
      tmp0.y = tmp0.y / tmp0.w;  
      tmp0.z = tmp0.z / tmp0.w;  
      ddx = ComputePartialsX(tmp0);  
      ddy = ComputePartialsY(tmp0);  
      lambda = ComputeLOD(ddx, ddy);  
      result = TextureSample(tmp0, lambda, ddx, ddy, texelOffset);  
  
    The single source vector in the TXP instruction does not have enough  
    coordinates to specify a lookup into a 2D array or cube map texture with  
    both a Q coordinate and an explicit reference value for depth comparison.  
    A program will fail to load if it contains a TXP instruction with a target  
    of SHADOWCUBE or SHADOWARRAY2D.  
  
    TXP supports all three data type modifiers.  The single vector operand is  
    treated as a floating-point vector; the results are interpreted according  
    to the data type modifier.  
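    The projective divide that TXP applies before sampling is simple to
    sketch in C.  The vec4 type and the function name are illustrative.  This
    sketch does not special-case a fourth component of zero.

```c
/* Illustrative four-component vector; not a type defined by this extension. */
typedef struct { float x, y, z, w; } vec4;

/* Divide x, y, and z by w; w itself is carried through unchanged. */
static vec4 txp_project(vec4 c)
{
    vec4 r = { c.x / c.w, c.y / c.w, c.z / c.w, c.w };
    return r;
}
```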
  
  
    Section 2.X.8.Z, TXQ:  Texture Size Query  
  
    The TXQ instruction takes the first component of the single integer vector  
    operand, adds the number of the base level of the specified texture to  
    determine a texture image level, and returns an integer result vector  
    containing the size of the image at that level of the texture.  
  
    For one-dimensional and one-dimensional array textures, the "x" component  
    of the result vector is filled with the width of the image(s).  For  
    two-dimensional, rectangle, cube map, and two-dimensional array textures,  
    the "x" and "y" components are filled with the width and height of the  
    image(s).  For three-dimensional textures, the "x", "y", and "z"  
    components are filled with the width, height, and depth of the image.  
    Additionally, the number of layers in an array texture is returned in the  
    "y" component of the result for one-dimensional array textures or the "z"  
    component for two-dimensional array textures.  All other components of the  
    result vector are undefined.  For the purposes of this instruction, the  
    width, height, and depth of a texture do NOT include any border.  
  
      tmp0 = VectorLoad(op0);  
      tmp0.x = tmp0.x + texture[op1].target[op2].base_level;  
      result.x = texture[op1].target[op2].level[tmp0.x].width;  
      result.y = texture[op1].target[op2].level[tmp0.x].height;  
      result.z = texture[op1].target[op2].level[tmp0.x].depth;  
  
    If the level computed by adding the operand to the base level of the  
    texture is less than the base level number or greater than the maximum  
    level number, the results are undefined.  
  
    TXQ supports no data type modifiers; the scalar operand and the result  
    vector are both interpreted as signed integers.  
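    The following C sketch of TXQ covers a three-dimensional texture only.
    The tex3d_info structure is hypothetical and is not an OpenGL type.  The
    sketch assumes a consistent mipmap chain, in which each dimension of the
    image n levels above the base level is max(1, floor(base_dim / 2^n)).
    Borders, array layers, and other targets are not modeled.

```c
#include <stdint.h>

/* Hypothetical description of a 3D texture; not an OpenGL type. */
typedef struct {
    int32_t width, height, depth;   /* base-level image size, no border */
    int32_t base_level, max_level;
} tex3d_info;

/* One dimension of the image n levels above the base level. */
static int32_t mip_dim(int32_t base_dim, int32_t n)
{
    int32_t d = (n >= 31) ? 0 : (base_dim >> n);
    return (d > 0) ? d : 1;
}

/* Returns 0 when the spec leaves the result undefined; otherwise writes
 * width, height, and depth to out[0..2]. */
static int txq_3d(const tex3d_info *t, int32_t lod, int32_t out[3])
{
    int32_t level = lod + t->base_level;
    if (level < t->base_level || level > t->max_level) {
        return 0;
    }
    int32_t n = level - t->base_level;
    out[0] = mip_dim(t->width, n);
    out[1] = mip_dim(t->height, n);
    out[2] = mip_dim(t->depth, n);
    return 1;
}
```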
  
  
    Section 2.X.8.Z, UP2H:  Unpack Two 16-bit Floats  
  
    The UP2H instruction unpacks two 16-bit floats stored together in a 32-bit  
    scalar operand.  The first 16-bit float (stored in the 16 least  
    significant bits) is written into the "x" and "z" components of the result  
    vector; the second is written into the "y" and "w" components of the  
    result vector.  
  
    This operation undoes the type conversion and packing performed by  
    the PK2H instruction.  
  
      tmp = ScalarLoad(op0);  
      result.x = (fp16) (RawBits(tmp) & 0xFFFF);  
      result.y = (fp16) ((RawBits(tmp) >> 16) & 0xFFFF);  
      result.z = (fp16) (RawBits(tmp) & 0xFFFF);  
      result.w = (fp16) ((RawBits(tmp) >> 16) & 0xFFFF);  
  
    UP2H supports all three data type modifiers.  The single operand is read  
    as a floating-point value, a signed integer, or an unsigned integer, as  
    specified by the data type modifier; the 32 least significant bits of the  
    encoding are used for unpacking.  For floating-point operand variables, it  
    is expected (but not required) that the operand was produced by a previous  
    pack instruction.  The result is always written as a floating-point  
    vector.  
      
    A program will fail to load if it contains a UP2H instruction whose  
    operand is a variable declared as "SHORT".  
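    The following C sketch of UP2H assumes that each 16-bit float uses the
    IEEE 754 binary16 layout (1 sign bit, 5 exponent bits, and 10 mantissa
    bits).  The helper names are hypothetical.  The low half fills x and z,
    and the high half fills y and w.

```c
#include <math.h>
#include <stdint.h>

typedef struct { float x, y, z, w; } vec4;

/* Decode one binary16 value held in the low 16 bits of h. */
static float half_to_float(uint32_t h)
{
    uint32_t sign = (h >> 15) & 0x1u;
    uint32_t exp  = (h >> 10) & 0x1Fu;
    uint32_t man  = h & 0x3FFu;
    float v;
    if (exp == 0) {
        v = ldexpf((float)man, -24);                      /* zero or subnormal */
    } else if (exp == 31) {
        v = (man == 0) ? INFINITY : NAN;                  /* infinity or NaN */
    } else {
        v = ldexpf((float)(man | 0x400u), (int)exp - 25); /* normal value */
    }
    return sign ? -v : v;
}

/* UP2H sketch: low half to x and z, high half to y and w. */
static vec4 up2h(uint32_t bits)
{
    float lo = half_to_float(bits & 0xFFFFu);
    float hi = half_to_float((bits >> 16) & 0xFFFFu);
    vec4 r = { lo, hi, lo, hi };
    return r;
}
```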
  
  
    Section 2.X.8.Z, UP2US:  Unpack Two Unsigned 16-bit Integers  
  
    The UP2US instruction unpacks two 16-bit unsigned values packed  
    together in a 32-bit scalar operand.  The unsigned quantities are  
    encoded where a bit pattern of all '0' bits corresponds to 0.0 and  
    a pattern of all '1' bits corresponds to 1.0.  The "x" and "z"  
    components of the result vector are obtained from the 16 least  
    significant bits of the operand; the "y" and "w" components are  
    obtained from the 16 most significant bits.  
  
    This operation undoes the type conversion and packing performed by  
    the PK2US instruction.  
  
      tmp = ScalarLoad(op0);  
      result.x = ((RawBits(tmp) >> 0)  & 0xFFFF) / 65535.0;  
      result.y = ((RawBits(tmp) >> 16) & 0xFFFF) / 65535.0;  
      result.z = ((RawBits(tmp) >> 0)  & 0xFFFF) / 65535.0;  
      result.w = ((RawBits(tmp) >> 16) & 0xFFFF) / 65535.0;  
  
    UP2US supports all three data type modifiers.  The single operand is read  
    as a floating-point value, a signed integer, or an unsigned integer, as  
    specified by the data type modifier; the 32 least significant bits of the  
    encoding are used for unpacking.  For floating-point operand variables, it  
    is expected (but not required) that the operand was produced by a previous  
    pack instruction.  The result is always written as a floating-point  
    vector.  
  
    A program will fail to load if it contains a UP2US instruction whose  
    operand is a variable declared as "SHORT".  
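    The UP2US pseudocode above translates directly into C.  The vec4 type
    and the function name are illustrative.  The operand is taken as its
    raw 32-bit encoding.

```c
#include <stdint.h>

typedef struct { float x, y, z, w; } vec4;

/* Each 16-bit field maps 0x0000 to 0.0 and 0xFFFF to 1.0; the low field
 * feeds x and z, and the high field feeds y and w. */
static vec4 up2us(uint32_t bits)
{
    float lo = (float)(bits & 0xFFFFu) / 65535.0f;
    float hi = (float)((bits >> 16) & 0xFFFFu) / 65535.0f;
    vec4 r = { lo, hi, lo, hi };
    return r;
}
```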
  
  
    Section 2.X.8.Z, UP4B:  Unpack Four Signed 8-bit Integers  
  
    The UP4B instruction unpacks four 8-bit signed values packed together  
    in a 32-bit scalar operand.  The signed quantities are encoded where  
    a bit pattern of all '0' bits corresponds to -128/127 and a pattern  
    of all '1' bits corresponds to +127/127.  The "x" component of the  
    result vector is the converted value corresponding to the 8 least  
    significant bits of the operand; the "w" component corresponds to  
    the 8 most significant bits.  
  
    This operation undoes the type conversion and packing performed by  
    the PK4B instruction.  
  
      tmp = ScalarLoad(op0);  
      result.x = (((RawBits(tmp) >> 0) & 0xFF) - 128) / 127.0;  
      result.y = (((RawBits(tmp) >> 8) & 0xFF) - 128) / 127.0;  
      result.z = (((RawBits(tmp) >> 16) & 0xFF) - 128) / 127.0;  
      result.w = (((RawBits(tmp) >> 24) & 0xFF) - 128) / 127.0;  
  
    UP4B supports all three data type modifiers.  The single operand is read  
    as a floating-point value, a signed integer, or an unsigned integer, as  
    specified by the data type modifier; the 32 least significant bits of the  
    encoding are used for unpacking.  For floating-point operand variables, it  
    is expected (but not required) that the operand was produced by a previous  
    pack instruction.  The result is always written as a floating-point  
    vector.  
  
    A program will fail to load if it contains a UP4B instruction whose  
    operand is a variable declared as "SHORT".  
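    The following C sketch mirrors the UP4B pseudocode above.  It shows that
    each byte is an excess-128 value: 0x00 maps to -128/127, 0x80 to 0.0, and
    0xFF to 1.0.  The vec4 type and the function name are illustrative.

```c
#include <stdint.h>

typedef struct { float x, y, z, w; } vec4;

/* Byte i (x = least significant, w = most significant) maps to (b - 128) / 127. */
static vec4 up4b(uint32_t bits)
{
    vec4 r;
    r.x = ((float)((bits >> 0)  & 0xFFu) - 128.0f) / 127.0f;
    r.y = ((float)((bits >> 8)  & 0xFFu) - 128.0f) / 127.0f;
    r.z = ((float)((bits >> 16) & 0xFFu) - 128.0f) / 127.0f;
    r.w = ((float)((bits >> 24) & 0xFFu) - 128.0f) / 127.0f;
    return r;
}
```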
  
  
    Section 2.X.8.Z, UP4UB:  Unpack Four Unsigned 8-bit Integers  
  
    The UP4UB instruction unpacks four 8-bit unsigned values packed  
    together in a 32-bit scalar operand.  The unsigned quantities are  
    encoded where a bit pattern of all '0' bits corresponds to 0.0 and a  
    pattern of all '1' bits corresponds to 1.0.  The "x" component of the  
    result vector is obtained from the 8 least significant bits of the  
    operand; the "w" component is obtained from the 8 most significant  
    bits.  
  
    This operation undoes the type conversion and packing performed by  
    the PK4UB instruction.  
  
      tmp = ScalarLoad(op0);  
      result.x = ((RawBits(tmp) >> 0)  & 0xFF) / 255.0;  
      result.y = ((RawBits(tmp) >> 8)  & 0xFF) / 255.0;  
      result.z = ((RawBits(tmp) >> 16) & 0xFF) / 255.0;  
      result.w = ((RawBits(tmp) >> 24) & 0xFF) / 255.0;  
  
    UP4UB supports all three data type modifiers.  The single operand is read  
    as a floating-point value, a signed integer, or an unsigned integer, as  
    specified by the data type modifier; the 32 least significant bits of the  
    encoding are used for unpacking.  For floating-point operand variables, it  
    is expected (but not required) that the operand was produced by a previous  
    pack instruction.  The result is always written as a floating-point  
    vector.  
  
    A program will fail to load if it contains a UP4UB instruction whose  
    operand is a variable declared as "SHORT".  
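    A C sketch mirroring the UP4UB pseudocode above follows.  The vec4 type
    and the function name are illustrative.

```c
#include <stdint.h>

typedef struct { float x, y, z, w; } vec4;

/* Byte i (x = least significant, w = most significant) maps to b / 255. */
static vec4 up4ub(uint32_t bits)
{
    vec4 r;
    r.x = (float)((bits >> 0)  & 0xFFu) / 255.0f;
    r.y = (float)((bits >> 8)  & 0xFFu) / 255.0f;
    r.z = (float)((bits >> 16) & 0xFFu) / 255.0f;
    r.w = (float)((bits >> 24) & 0xFFu) / 255.0f;
    return r;
}
```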
  
  
    Section 2.X.8.Z, X2D:  2D Coordinate Transformation  
  
    The X2D instruction multiplies the 2D offset vector specified by the  
    "x" and "y" components of the second vector operand by the 2x2 matrix  
    specified by the four components of the third vector operand, and adds  
    the transformed offset vector to the 2D vector specified by the "x"  
    and "y" components of the first vector operand.  The first component  
    of the sum is written to the "x" and "z" components of the result;  
    the second component is written to the "y" and "w" components of  
    the result.  
  
      tmp0 = VectorLoad(op0);  
      tmp1 = VectorLoad(op1);  
      tmp2 = VectorLoad(op2);  
      result.x = tmp0.x + tmp1.x * tmp2.x + tmp1.y * tmp2.y;  
      result.y = tmp0.y + tmp1.x * tmp2.z + tmp1.y * tmp2.w;  
      result.z = tmp0.x + tmp1.x * tmp2.x + tmp1.y * tmp2.y;  
      result.w = tmp0.y + tmp1.x * tmp2.z + tmp1.y * tmp2.w;  
  
    X2D supports only floating-point data type modifiers.  
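    X2D can be read as "origin plus a 2x2 matrix times an offset".  The
    following C sketch follows the pseudocode above.  The third operand is
    treated as a row-major 2x2 matrix (x y / z w).  The vec4 type and the
    function name are illustrative.

```c
typedef struct { float x, y, z, w; } vec4;

/* result.xy = a.xy + M * b.xy, with M = | m.x m.y |
 *                                      | m.z m.w |; z and w replicate x and y. */
static vec4 x2d(vec4 a, vec4 b, vec4 m)
{
    float u = a.x + b.x * m.x + b.y * m.y;
    float v = a.y + b.x * m.z + b.y * m.w;
    vec4 r = { u, v, u, v };
    return r;
}
```

    For example, a matrix of (0,-1,1,0) rotates the offset by 90 degrees
    before adding it to the origin.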
  
  
    Section 2.X.8.Z, XOR:  Exclusive Or  
  
    The XOR instruction performs a bitwise XOR operation on the components of  
    the two source vectors to yield a result vector.  
  
      tmp0 = VectorLoad(op0);  
      tmp1 = VectorLoad(op1);  
      result.x = tmp0.x ^ tmp1.x;  
      result.y = tmp0.y ^ tmp1.y;  
      result.z = tmp0.z ^ tmp1.z;  
      result.w = tmp0.w ^ tmp1.w;  
  
    XOR supports only integer data type modifiers.  If no type modifier is  
    specified, both operands and the result are treated as signed integers.  
  
  
    Section 2.X.8.Z, XPD:  Cross Product  
  
    The XPD instruction computes the cross product using the first three  
    components of its two vector operands to generate the x, y, and z  
    components of the result vector.  The w component of the result vector is  
    undefined.  
  
      tmp0 = VectorLoad(op0);  
      tmp1 = VectorLoad(op1);  
      result.x = tmp0.y * tmp1.z - tmp0.z * tmp1.y;  
      result.y = tmp0.z * tmp1.x - tmp0.x * tmp1.z;  
      result.z = tmp0.x * tmp1.y - tmp0.y * tmp1.x;  
  
    XPD supports only floating-point data type modifiers.
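    The following C sketch of XPD follows the pseudocode above.  The w
    component is set to NaN only to mark it as undefined.  The vec4 type and
    the function name are illustrative.

```c
#include <math.h>

typedef struct { float x, y, z, w; } vec4;

/* Cross product of the xyz parts of a and b; w is undefined per the spec. */
static vec4 xpd(vec4 a, vec4 b)
{
    vec4 r;
    r.x = a.y * b.z - a.z * b.y;
    r.y = a.z * b.x - a.x * b.z;
    r.z = a.x * b.y - a.y * b.x;
    r.w = NAN;
    return r;
}
```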

Additions to Chapter 3 of the OpenGL 1.5 Specification (Rasterization)

  
    Modify Section 3.8.1, Texture Image Specification, p. 150  
  
    (modify 4th paragraph, p. 151 -- add cubemaps to the list of texture  
    targets that can be used with DEPTH_COMPONENT textures) Textures with a  
    base internal format of DEPTH_COMPONENT are supported by texture image  
    specification commands only if <target> is TEXTURE_1D, TEXTURE_2D,  
    TEXTURE_CUBE_MAP, TEXTURE_RECTANGLE_ARB, TEXTURE_1D_ARRAY_EXT,  
    TEXTURE_2D_ARRAY_EXT, PROXY_TEXTURE_1D, PROXY_TEXTURE_2D,  
    PROXY_TEXTURE_CUBE_MAP, PROXY_TEXTURE_RECTANGLE_ARB,  
    PROXY_TEXTURE_1D_ARRAY_EXT, or PROXY_TEXTURE_2D_ARRAY_EXT.  Using this  
    format in conjunction with any other target will result in an  
    INVALID_OPERATION error.  
  
  
    Delete Section 3.8.7, Texture Wrap Modes.  (The language in this section  
    is folded into updates to the following section, and is no longer needed  
    here.)  
  
  
    Modify Section 3.8.8, Texture Minification:  
  
    (replace the last paragraph, p. 171):  Let s(x,y) be the function that  
    associates an s texture coordinate with each set of window coordinates  
    (x,y) that lie within a primitive; define t(x,y) and r(x,y) analogously.  
    Let  
  
      u(x,y) = w_t * s(x,y) + offsetu_shader,  
      v(x,y) = h_t * t(x,y) + offsetv_shader, and  
      w(x,y) = d_t * r(x,y) + offsetw_shader,  
  
    where w_t, h_t, and d_t are as defined by equations 3.15, 3.16, and 3.17  
    with w_s, h_s, and d_s equal to the width, height, and depth of the image  
    array whose level is level_base.  (offsetu_shader, offsetv_shader,  
    offsetw_shader) is the texel offset specified in the vertex, geometry, or  
    fragment program instruction used to perform the access.  For  
    fixed-function texture accesses, all three shader offsets are taken to be  
    zero.  For a one-dimensional texture, define v(x,y) == 0 and w(x,y) == 0;  
    for two-dimensional textures, define w(x,y) == 0.  
  
    (start a new paragraph with "For a polygon, rho is given at a fragment  
    with window coordinates...", and then continue with the original spec  
    text.)  
  
    (replace text starting with the last paragraph on p. 172, continuing to  
    the end of p. 174)  
  
    The (u,v,w) coordinates are then modified according to the texture wrap
    modes, as specified in Table X.19, to generate a new set of coordinates
    (u',v',w').
  
      TEXTURE_WRAP_S              Coordinate Transformation  
      --------------------------  ------------------------------------------  
      CLAMP                       u' = clamp(u, 0, w_t-0.5),  
                                         if NEAREST filtering, or
                                     = clamp(u, 0, w_t),
                                         otherwise  
      CLAMP_TO_EDGE               u' = clamp(u, 0.5, w_t-0.5)  
      CLAMP_TO_BORDER             u' = clamp(u, -0.5, w_t+0.5)  
      REPEAT                      u' = clamp(fmod(u, w_t), 0.5, w_t-0.5)  
      MIRROR_CLAMP_EXT            u' = clamp(fabs(u), 0.5, w_t-0.5),  
                                         if NEAREST filtering, or  
                                     = clamp(fabs(u), 0.5, w_t),  
                                         otherwise  
      MIRROR_CLAMP_TO_EDGE_EXT    u' = clamp(fabs(u), 0.5, w_t-0.5)  
      MIRROR_CLAMP_TO_BORDER_EXT  u' = clamp(fabs(u), 0.5, w_t+0.5)  
      MIRRORED_REPEAT             u' = w_t - clamp(fabs(w_t - fmod(u, 2*w_t)),  
                                                   0.5, w_t-0.5)
  
      Table X.19:  Texel coordinate wrap mode application.  clamp(a,b,c)  
      returns b if a<b, c if a>c, and a otherwise.  fmod(a,b) returns  
      a-b*floor(a/b), and fabs(a) returns the absolute value of a.  For the v  
      and w coordinates, TEXTURE_WRAP_T and h_t, and TEXTURE_WRAP_R and d_t,  
      respectively, are used.  
  
    When lambda indicates minification, the value assigned to  
    TEXTURE_MIN_FILTER is used to determine how the texture value for a  
    fragment is selected.  
  
    When TEXTURE_MIN_FILTER is NEAREST, the texel in the image array of level
    level_base that is nearest (in Manhattan distance) to that specified by
    (s,t,r) is obtained.  Its location (i,j,k) is given by

      i = floor(u'),
      j = floor(v'), and
      k = floor(w').

    For a three-dimensional texture, the texel at location (i,j,k) becomes the
    texture value.  For a two-dimensional texture, k is irrelevant, and the
    texel at location (i,j) becomes the texture value.  For a one-dimensional
    texture, j and k are irrelevant, and the texel at location i becomes the
    texture value.
  
    If the selected (i,j,k), (i,j), or i location refers to a border texel  
    that satisfies any of the following conditions:  
  
      i < -b_s,  
      j < -b_s,   
      k < -b_s,   
      i >= w_l + b_s,   
      j >= h_l + b_s, or  
      k >= d_l + b_s,
   
    then the border values defined by TEXTURE_BORDER_COLOR are used in place  
    of the non-existent texel. If the texture contains color components, the  
    values of TEXTURE_BORDER_COLOR are interpreted as an RGBA color to match  
    the texture's internal format in a manner consistent with table 3.15. If
    the texture contains depth components, the first component of  
    TEXTURE_BORDER_COLOR is interpreted as a depth value.  
  
    When TEXTURE_MIN_FILTER is LINEAR, a 2x2x2 cube of texels in the image  
    array of level level_base is selected.  Let:  
  
      i_0   = floor(u' - 0.5),  
      j_0   = floor(v' - 0.5),  
      k_0   = floor(w' - 0.5),  
      i_1   = i_0 + 1,  
      j_1   = j_0 + 1,  
      k_1   = k_0 + 1,  
      alpha = frac(u' - 0.5),  
      beta  = frac(v' - 0.5),  
      gamma = frac(w' - 0.5),  
  
    where frac(<x>) denotes the fractional part of <x>.  
  
    For a three-dimensional texture, the texture value tau is found as...  
  
    (replace last paragraph, p.174) For any texel in the equation above that  
    refers to a border texel outside the defined range of the image, the texel  
    value is taken from the texture border color as with NEAREST filtering.  
  
  
    Modify Section 3.8.14, Texture Comparison Modes (p. 185)  
  
    (modify 2nd paragraph, p. 188, indicating that the Q texture coordinate is  
    used for depth comparisons on cubemap textures)  
  
    Let D_t be the depth texture value, in the range [0, 1].  For  
    fixed-function texture lookups, let R be the interpolated <r> texture  
    coordinate, clamped to the range [0, 1].  For texture lookups generated by  
    a program instruction, let R be the reference value for depth comparisons  
    provided in the instruction, also clamped to [0, 1].  Then the effective  
    texture value L_t, I_t, or A_t is computed as follows:

Additions to Chapter 4 of the OpenGL 1.5 Specification (Per-Fragment Operations and the Frame Buffer)

  
    None.

Additions to Chapter 5 of the OpenGL 1.5 Specification (Special Functions)

  
    None.

Additions to Chapter 6 of the OpenGL 1.5 Specification (State and State Requests)

  
    Modify Section 6.1.12 of the ARB_vertex_program specification.  
  
    (Add new integer program parameter queries, plus language that program  
    environment or local parameter query results are undefined if the query  
    specifies a data type incompatible with the data type of the parameter  
    being queried.)  
  
    The commands  
  
      void GetProgramEnvParameterdvARB(enum target, uint index,  
                                       double *params);  
      void GetProgramEnvParameterfvARB(enum target, uint index,  
                                       float *params);  
      void GetProgramEnvParameterIivNV(enum target, uint index,  
                                       int *params);  
      void GetProgramEnvParameterIuivNV(enum target, uint index,  
                                        uint *params);  
  
    obtain the current value for the program environment parameter numbered  
    <index> for the given program target <target>, and place the information
    in the array <params>.  The values returned are undefined if the data type  
    of the components of the parameter is not compatible with the data type of  
    <params>.  Floating-point components are compatible with "double" or  
    "float"; signed and unsigned integer components are compatible with "int"  
    and "uint", respectively.  The error INVALID_ENUM is generated if <target>  
    specifies a nonexistent program target or a program target that does not  
    support program environment parameters.  The error INVALID_VALUE is  
    generated if <index> is greater than or equal to the  
    implementation-dependent number of supported program environment  
    parameters for the program target.  
  
    ...  
  
    The commands  
  
      void GetProgramLocalParameterdvARB(enum target, uint index,  
                                         double *params);  
      void GetProgramLocalParameterfvARB(enum target, uint index,  
                                         float *params);  
      void GetProgramLocalParameterIivNV(enum target, uint index,  
                                         int *params);  
      void GetProgramLocalParameterIuivNV(enum target, uint index,  
                                          uint *params);  
  
    obtain the current value for the program local parameter numbered <index>  
    belonging to the program object currently bound to <target>, and place
    the information in the array <params>.  The values returned are undefined  
    if the data type of the components of the parameter is not compatible with  
    the data type of <params>.  Floating-point components are compatible with  
    "double" or "float"; signed and unsigned integer components are compatible
    with "int" and "uint", respectively.  The error INVALID_ENUM is generated  
    if <target> specifies a nonexistent program target or a program target  
    that does not support program local parameters.  The error INVALID_VALUE  
    is generated if <index> is greater than or equal to the  
    implementation-dependent number of supported program local parameters for  
    the program target.  
  
    ...  
  
    The command  
  
      void GetProgramivARB(enum target, enum pname, int *params);  
  
    obtains program state for the program target <target>, writing ...  
      
    (add new paragraphs describing the new supported queries)  
  
    If <pname> is PROGRAM_ATTRIB_COMPONENTS_NV or  
    PROGRAM_RESULT_COMPONENTS_NV, GetProgramivARB returns a single integer  
    holding the number of active attribute or result variable components,  
    respectively, used by the program object currently bound to <target>.  
  
    If <pname> is MAX_PROGRAM_ATTRIB_COMPONENTS_NV or
    MAX_PROGRAM_RESULT_COMPONENTS_NV, GetProgramivARB returns a single integer  
    holding the maximum number of active attribute or result variable  
    components, respectively, supported for programs of type <target>.

Additions to Appendix A of the OpenGL 1.5 Specification (Invariance)

  
    None.

Additions to the AGL/GLX/WGL Specifications

  
    None.

GLX Protocol

  
    None.

Errors

  
    The error INVALID_VALUE is generated by ProgramLocalParameter4fARB,  
    ProgramLocalParameter4fvARB, ProgramLocalParameter4dARB,  
    ProgramLocalParameter4dvARB, ProgramLocalParameterI4iNV,  
    ProgramLocalParameterI4ivNV, ProgramLocalParameterI4uiNV,  
    ProgramLocalParameterI4uivNV, GetProgramLocalParameterfvARB,
    GetProgramLocalParameterdvARB, GetProgramLocalParameterIivNV, and
    GetProgramLocalParameterIuivNV if <index> is greater than or equal to the
    number of program local parameters supported by <target>.  
  
    The error INVALID_VALUE is generated by ProgramEnvParameter4fARB,  
    ProgramEnvParameter4fvARB, ProgramEnvParameter4dARB,  
    ProgramEnvParameter4dvARB, ProgramEnvParameterI4iNV,  
    ProgramEnvParameterI4ivNV, ProgramEnvParameterI4uiNV,  
    ProgramEnvParameterI4uivNV, GetProgramEnvParameterfvARB,
    GetProgramEnvParameterdvARB, GetProgramEnvParameterIivNV, and
    GetProgramEnvParameterIuivNV if <index> is greater than or equal to the
    number of program environment parameters supported by <target>.  
  
    The error INVALID_VALUE is generated by ProgramLocalParameters4fvNV,  
    ProgramLocalParametersI4ivNV, and ProgramLocalParametersI4uivNV if the sum  
    of <index> and <count> is greater than the number of program local  
    parameters supported by <target>.  
  
    The error INVALID_VALUE is generated by ProgramEnvParameters4fvNV,  
    ProgramEnvParametersI4ivNV, and ProgramEnvParametersI4uivNV if the sum of  
    <index> and <count> is greater than the number of program environment  
    parameters supported by <target>.
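
    The index checks above reduce to two comparisons.  The following C
    sketch models them; the function names are hypothetical and this is not
    driver code.  num_params stands for the implementation-dependent number
    of local or environment parameters supported by <target>, and <count> is
    treated as unsigned here (negative counts are not modeled).  The range
    check widens to 64 bits before adding so that a very large <index> plus
    <count> cannot wrap around and appear valid.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the single-parameter rule: INVALID_VALUE if
 * index >= num_params. */
static bool single_index_ok(uint32_t index, uint32_t num_params)
{
    return index < num_params;
}

/* Hypothetical model of the multi-parameter rule: INVALID_VALUE if
 * index + count > num_params.  The sum is computed in 64 bits. */
static bool index_range_ok(uint32_t index, uint32_t count, uint32_t num_params)
{
    return (uint64_t)index + count <= num_params;
}
```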

Dependencies on NV_parameter_buffer_object

  
    If NV_parameter_buffer_object is not supported, references to program  
    parameter buffer variables and bindings should be removed.

Dependencies on ARB_texture_rectangle

  
    If ARB_texture_rectangle is not supported, references to rectangle  
    textures and the RECT and SHADOWRECT texture target identifiers should be  
    removed.

Dependencies on EXT_gpu_program_parameters

  
    If EXT_gpu_program_parameters is not supported, references to the  
    Program{Local,Env}Parameters4fvNV commands, which set multiple program  
    local or environment parameters in a single call, should be removed.  
    These prototypes were included in this spec for completeness only.

Dependencies on EXT_texture_integer

  
    If EXT_texture_integer is not supported, references to texture lookups  
    returning integer values in Section 2.X.4.4 (Texture Access) should be  
    removed, and all texture formats are considered to produce floating-point  
    values.

Dependencies on EXT_texture_array

  
    If EXT_texture_array is not supported, references to array textures in  
    Section 2.X.4.4 (Texture Access) and elsewhere should be removed, as  
    should all references to the "ARRAY1D", "ARRAY2D", "SHADOWARRAY1D", and  
    "SHADOWARRAY2D" tokens.

Dependencies on EXT_texture_buffer_object

  
    If EXT_texture_buffer_object is not supported, references to buffer  
    textures in Section 2.X.4.4 (Texture Access) and elsewhere should be  
    removed, as should all references to the "BUFFER" tokens.

Dependencies on NV_primitive_restart

  
    If NV_primitive_restart is supported, index values that cause a primitive
    restart are not treated as an End command followed by another Begin.
    Primitive restart is therefore not guaranteed to immediately update
    bindings for material properties changed inside a Begin/End pair.  The
    spec language states that such changes "are not guaranteed to update
    program parameter bindings until the following End command."

New State

  
                                                         Initial  
    Get Value                     Type  Get Command       Value  Description             Sec     Attrib  
    ----------------------------  ----  ---------------  ------- ----------------------  ------  ------  
    PROGRAM_ATTRIB_COMPONENTS_NV  Z+    GetProgramivARB     -    number of components    6.1.12   -  
                                                                 used for attributes  
    PROGRAM_RESULT_COMPONENTS_NV  Z+    GetProgramivARB     -    number of components    6.1.12   -  
                                                                 used for results  
  
    Table X.20.  New Program Object State.  Program object queries return  
    attributes of the program object currently bound to the program target  
    <target>.

New Implementation Dependent State

  
                                                             Minimum  
    Get Value                         Type  Get Command       Value   Description           Sec.   Attrib  
    --------------------------------  ----  ---------------  -------  --------------------- ------ ------  
    MIN_PROGRAM_TEXEL_OFFSET_EXT      Z     GetIntegerv        -8     minimum texel offset  2.X.4.4  -
                                                                      allowed in lookup
    MAX_PROGRAM_TEXEL_OFFSET_EXT      Z     GetIntegerv        +7     maximum texel offset  2.X.4.4  -
                                                                      allowed in lookup  
    MAX_PROGRAM_ATTRIB_COMPONENTS_NV  Z+    GetProgramivARB    (*)    maximum number of     6.1.12   -  
                                                                      components allowed  
                                                                      for attributes  
    MAX_PROGRAM_RESULT_COMPONENTS_NV  Z+    GetProgramivARB    (*)    maximum number of     6.1.12   -  
                                                                      components allowed  
                                                                      for results  
    MAX_PROGRAM_GENERIC_ATTRIBS_NV    Z+    GetProgramivARB    (*)    number of generic     6.1.12   -  
                                                                      attribute vectors  
                                                                      supported  
    MAX_PROGRAM_GENERIC_RESULTS_NV    Z+    GetProgramivARB    (*)    number of generic     6.1.12   -  
                                                                      result vectors  
                                                                      supported  
    MAX_PROGRAM_CALL_DEPTH_NV         Z+    GetProgramivARB     4     maximum program       2.X.5    -  
                                                                      call stack depth  
    MAX_PROGRAM_IF_DEPTH_NV           Z+    GetProgramivARB     48    maximum program       2.X.5    -  
                                                                      if nesting  
    MAX_PROGRAM_LOOP_DEPTH_NV         Z+    GetProgramivARB     4     maximum program       2.X.5    -  
                                                                      loop nesting  
  
    Table X.21:  New Implementation-Dependent Values Introduced by  
    NV_gpu_program4.  (*) means that the required minimum is program  
    type-specific.  There are separate limits for each program type.

Issues

  
    (1) How does this extension differ from previous NV_vertex_program and  
    NV_fragment_program extensions?  
  
      RESOLVED:  
  
        - This extension provides a uniform set of instructions and bindings.
          Unlike previous extensions, the set of instructions and bindings
          available is generally the same for all program types.  The only
          exceptions are a small number of instructions and bindings that make
          sense only for one specific program type.
  
        - This extension supports integer data types and provides a  
          full-fledged integer instruction set.  
  
        - This extension supports array variables of all types, including  
          temporaries.  Array variables can be accessed directly or indirectly  
          (using integer temporaries as indices).  
  
        - This extension provides a uniform set of structured branching  
          constructs (if tests, loops, subroutines) that fully support  
          run-time condition testing.  Previous versions of NV_vertex_program  
          provided unstructured branching.  Previous versions of  
          NV_fragment_program provided structured branching constructs, but the
          support was more limited -- for example, looping constructs couldn't  
          specify loop counts with values computed at run time.  
  
        - This extension supports geometry programs, which are described in  
          more detail in the NV_geometry_program4 extension.  
  
        - This extension provides the ability to specify and use cubemap  
          textures with a DEPTH_COMPONENT internal format.  Shadow mapping is  
          supported; the Q texture coordinate is used as the reference value  
          for comparisons.  
  
    (2) Is this extension backward-compatible with previous NV_vertex_program  
    and NV_fragment_program extensions?  If not, what support has been  
    removed?  
  
      RESOLVED:  This extension is largely, but not completely,  
      backward-compatible.  Functionality removed includes:  
  
        - Unstructured branching:  NV_vertex_program2 included a general  
          branch instruction "BRA" that could be used to jump to an arbitrary  
          instruction.  The "CAL" instruction could "call" an arbitrary
          instruction, jumping into code that was not necessarily structured
          as simple subroutine blocks.  Arbitrary unstructured branching can be
          difficult to implement efficiently on highly parallel GPU  
          architectures, while basic structured branching is not nearly as  
          difficult.  
  
          This extension retains the "CAL" instruction but treats each block  
          of code between instruction labels as a separate subroutine.  The  
          "BRA" instruction and arbitrary branching have been removed.  The
          structured branching constructs in this extension are sufficient to  
          implement almost all of the looping/branching support in high-level  
          languages ("goto" being the most obvious exception).  
  
        - Address registers:  NV_vertex_program added the notion of address  
          registers, which were effectively under-powered integer temporaries.  
          The set of instructions used to manipulate address registers was  
          severely limited.  NV_vertex_program[23] extended the original
          scalar address registers to vectors and added a few more
          instructions to manipulate address registers.  Fragment programs
          had no address registers until
          NV_fragment_program2 added the loop counter, which was very similar  
          in functionality to vertex program address registers, but even more  
          limited.  This extension adds true integer temporaries, which can  
          accomplish everything old address registers could do, and much more.  
          Address register support was removed to simplify the API.  
  
        - NV_fragment_program2 LOOP construct:  NV_fragment_program2 added a  
          LOOP instruction, which let you repeat a block of code <N> times,  
          with a parallel loop counter that started at <A> and stepped by <B>  
          on each iteration.  This construct was significantly limited in
          several ways -- the loop count had to be constant, and you could
          only access the innermost loop counter in a nested loop.  This
          extension drops LOOP and retains the simpler "REP"
          construct to implement loops.  If desired, a loop counter can be  
          implemented by manipulating an integer temporary.  The "BRK"  
          instruction (conditional break) is retained, and a "CONT"  
          instruction (conditional continue) is added.  Additionally, the loop  
          count need not be a constant.  
  
        - NV_vertex_program and ARB_vertex_program EXP and LOG instructions:  
          NV_vertex_program provided EXP and LOG instructions that computed a  
          rough approximation of 2^x or log_2(x) and provided some additional  
          values that could help refine the approximation.  Those opcodes were  
          carried forward into ARB_vertex_program.  Both ARB_vertex_program  
          and NV_vertex_program2 provided EX2 and LG2 instructions that  
          computed a better approximation.  All fragment program extensions  
          also provided EX2 and LG2, but did not bother to include EXP and  
          LOG.  On the hardware targeted by this extension, there is no  
          advantage to using EXP and LOG, so these opcodes have been removed  
          for simplicity.  
  
        - NV_vertex_program3 and NV_fragment_program2 provide the ability to  
          do indirect addressing of inputs/outputs when using bindings in  
          instructions -- for example:  
  
            MOV R0, vertex.attrib[A0.x+2];      # vertex  
            MOV result.texcoord[A0.y], R1;      # vertex  
            MOV R2, fragment.texcoord[A0.x];    # fragment  
  
          This extension provides the same indexing capability using named
          array variables instead; the index (A0 in the example below) is an
          integer temporary rather than an address register.
  
            ATTRIB attribs[] = { vertex.attrib[2..5] };  
            MOV R0, attribs[A0.x];  
            OUTPUT outcoords[] = { result.texcoord[0..3] };  
            MOV outcoords[A0.y], R1;  
            ATTRIB texcoords[] = { fragment.texcoord[0..2] };  
            MOV R2, texcoords[A0.x];  
  
          This approach makes the set of attribute and result bindings more  
          regular.  Additionally, it helps the assembler determine which  
          vertex/fragment attributes are actually needed -- when the assembler  
          sees constructs like "fragment.texcoord[A0.x]", it must treat *all*  
          texture coordinates as live unless it can determine the range of  
          values used for indexing.  The named array variable approach  
          explicitly identifies which attributes are needed when indexing is  
          used.  
  
      Functionality altered includes:  
  
        - The RSQ instruction in the original NV_vertex_program and  
          ARB_vertex_program extensions implicitly took the absolute value of  
          its operand.  Since the ARB extensions don't have numerics
          guarantees, computing the reciprocal square root of a negative value
          was not meaningful.  To allow for the possibility of taking the
          reciprocal square root of a negative value (which should yield NaN
          -- "not a number"), the RSQ instruction in this extension no
          longer implicitly takes the absolute value of its operand.  
          Equivalent functionality can be achieved using the explicit |abs|  
          absolute value operator on the operand to RSQ.  
  
        - The results of texture lookups accessing inconsistent textures are  
          now undefined, instead of producing a fixed constant vector.  
  
  
    (3) What should this set of extensions be called?  
  
      RESOLVED:  NV_gpu_program4, NV_vertex_program4, NV_fragment_program4,  
      and NV_geometry_program4.  Only NV_gpu_program4 will appear in the  
      extension string; the other three specifications exist simply to define  
      vertex, fragment, and geometry program-specific features.  
  
      The "gpu_program" name was chosen because the same instruction set is
      shared by all GPU program types.  On previous chip generations, the
      vertex and
      fragment instruction sets were similar, but there were enough  
      differences to package them separately.  
  
      The choice of "4" indicates that this is the fourth generation of  
      programmable hardware from NVIDIA.  The GeForce3 and GeForce4 series  
      supported NV_vertex_program.  The GeForce FX series supported  
      NV_vertex_program2 and added fragment programmability with  
      NV_fragment_program.  Around this time, the OpenGL Architecture Review  
      Board (ARB) approved ARB_vertex_program and ARB_fragment_program  
      extensions, and NVIDIA added NV_vertex_program2_option and  
      NV_fragment_program_option extensions exposing GeForce FX features using  
      the ARB extensions' instruction set.  The GeForce6 and GeForce7 series  
      brought the NV_vertex_program3 and NV_fragment_program2 extensions,  
      which extend the ARB extensions further.  This extension adds geometry  
      programs, and brings the "version number" for each of these extensions  
      up to "4".  
  
  
    (4) This extension adds integer data type support in programmable
    shaders that were previously float-centric.  Should applications be able  
    to pass integer values directly to the shaders, and if so, how does it  
    work?  
  
      RESOLVED:  The diagram at the bottom of this issue depicts data flows in  
      the GL, as extended by this and related extensions.  
  
      This extension generalizes some state to be "typeless", instead of being  
      strongly typed (and almost invariably floating-point) as in the core  
      specification.  We introduce a new set of functions to specify GL state  
      as signed or unsigned integer values, instead of floating point values.  
      These functions include:  
  
        * VertexAttribI*{i,ui}() -- Specify generic vertex attributes as  
          integers.  This extension does not create "integer" versions for  
          fixed-function attribute functions (e.g., glColor, glTexCoord),  
          which remain fully floating-point.  
  
        * Program{Env,Local}ParameterI*{i,ui}() -- Specify environment and  
          local parameters as integers.  
  
        * TexImage*() with EXT_texture_integer internal formats -- Specify  
          texture images as containing integer data whose values are not  
          converted to floating-point values.  
  
        * NV_parameter_buffer_object functions -- Bind (typeless) buffer
          object data stores for use as program parameters.  These buffer  
          objects can be loaded with either integer or floating-point data.  
  
        * EXT_texture_buffer_object functions -- Bind (typeless) buffer object  
          data stores for use as textures.  These buffer objects can be loaded  
          with either integer or floating-point data.  
  
      Each type of program (using NV_gpu_program4 and related extensions) can
      read attributes using any data type (float, signed integer, unsigned  
      integer) and write result values used by subsequent stages using any  
      data type.  
  
      Finally, there are several new places where integer data can be  
      consumed by the GL:  
  
        * NV_transform_feedback -- Stream transformed vertex attribute  
          components to a (typeless) buffer object.  The transformed  
          attributes can be written as signed or unsigned integers in vertex  
          and geometry programs.  
  
        * EXT_texture_integer internal formats and framebuffer objects --  
          Provide support for rendering to integer texture formats, where  
          final fragment values are treated as signed or unsigned integers,  
          rather than floating-point values.  
  
      The diagram below represents a substantial portion of the GL pipeline.  
      Each line connecting blocks represents an interface where data is  
      "produced" from the GL state or by fixed-function or programmable  
      pipeline stages and "consumed" by another pipeline stage.  Each producer  
      and consumer is labeled with a data type.  For producers, the  
      "(typeless)" designation generally means that the state and/or output  
      can be written as floating-point values or as signed or unsigned  
      integers.  "(float)" means that the outputs are always written as  
      floating-point.  The same distinction applies to consumers --  
      "(typeless)" means that the consumer is capable of reading inputs using  
      any data type, and "(float)" means that the consumer always reads
      inputs as floating-point values.
  
      To get sane results, applications must ensure that each value passed
      between pipeline stages is produced and consumed using the same data
      type.  If a value is written in one stage as a floating-point value, it
      must be read as a floating-point value as well.  If such a value is read
      as a signed or unsigned integer, its value is considered undefined.  In
      practice, the raw bits used to represent the floating-point value (IEEE
      single-precision floating-point encoding in the initial implementation
      of this spec) will be treated as an integer.
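
      To illustrate the practical behavior described above (the spec itself
      leaves the result undefined), this C sketch reinterprets the bits of an
      IEEE single-precision float as an unsigned 32-bit integer, which is
      roughly what a program reading a float-written value as an unsigned
      integer would observe on the initial implementation.  memcpy is used
      because it is the well-defined way to type-pun in C; the helper name is
      hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: return the raw IEEE-754 bits of f as a uint32_t. */
static uint32_t float_bits_as_uint(float f)
{
    uint32_t bits;
    assert(sizeof bits == sizeof f);   /* single precision is 32 bits here */
    memcpy(&bits, &f, sizeof bits);    /* well-defined type punning in C */
    return bits;
}
```

      So a value of 1.0 written as a float and read as an unsigned integer
      would appear as 1065353216 (0x3F800000), not 1.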
  
      Type matching between stages is not enforced by the GL, because the  
      overhead of doing so would be substantial.  Such overhead would include:  
  
        * matching the inputs and outputs of each pipeline stage  
          (fixed-function or programmable) every time the program  
          configuration or fixed-function state changes,  
  
        * tracking the data type of each generic vertex attribute and checking  
          it against the vertex program's inputs,  
  
        * tracking the data type of each program parameter and checking it  
          against the manner the parameters were used in programs,  
  
        * matching color buffers against fragment program outputs.  
  
      Such error checking is certainly valuable, but the additional CPU  
      overhead cost is substantial.  Given that current CPUs often have a hard  
      time keeping up with high-end GPUs, adding more overhead is a step in  
      the wrong direction.  We expect developer tools, such as instrumented  
      drivers, to be able to provide type checking on most interfaces.    
  
      The diagram below depicts assembly programmability.  Using vertex,  
      geometry, and fragment shaders provided by the OpenGL Shading Language  
      (GLSL) isn't substantially different from the assembly interface, except  
      that the interfaces between programmable pipeline stages are more  
      tightly coupled in GLSL (vertex, geometry, and fragment shaders are  
      linked together into a single program object), and that shader variables  
      are more strongly typed in GLSL than in the assembly interface.  
  
      In the figure below, the first programmable stage is vertex program
      execution.  All inputs read by the vertex program must be specified in
      the GL vertex APIs (immediate mode or vertex arrays) using a data type
      matching the data type read by the shader.  Additionally,
      vertex programs (and all other program types) can read program  
      parameters, parameter buffers, and textures.  In all cases the  
      parameter, buffer, or texture data must be accessed in the shader using  
      the same data type used to specify the data.  If vertex programs are  
      disabled, fixed-function vertex processing is used.  Fixed-function  
      vertex processing is fully floating-point, and all the conventional  
      vertex attributes and state used by fixed-function are floating-point  
      values.  
  
      After vertex processing, an optional geometry program can be executed,
      which reads attributes written by vertex programs (or fixed-function) and
      writes out new vertex attributes.  The vertex attributes it reads must  
      have been written by the vertex program (or fixed-function) using a  
      matching data type.  
  
      After geometry program execution, vertex attributes can optionally be  
      written out to buffer objects using the NV_transform_feedback extension.  
      The vertex attributes are written by the GL to the buffer objects using  
      the same data type used to write the attribute in the geometry program  
      (or vertex program if geometry programs are disabled).  
  
      Then, rasterization generates fragments based on transformed vertices.  
      Most attributes written by vertex or geometry programs can be read by  
      fragment programs, after the rasterization hardware "interpolates" them.  
      This extension allows fragment programs to control how each attribute is  
      interpolated.  If an attribute is flat-shaded, it will be taken from the  
      output attribute of the provoking vertex of the primitive using the same  
      data type.  If an attribute is smooth-shaded, the per-vertex attributes
      will be interpreted as floating-point values, and the interpolated
      result will also be a floating-point value.  One necessary consequence
      of this is that any integer
      per-fragment attributes must be flat-shaded.  To prevent some  
      interpolation type errors, assembly and GLSL fragment shaders will not  
      compile if they declare an integer fragment attribute that is not flat  
      shaded.  [NOTE:  While point primitives generally have constant  
      attributes, any integer attributes must still be flat-shaded; point  
      rasterization may perform (degenerate) floating-point interpolation.]  
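      As a rough illustration of why integer attributes must be flat-shaded,
      the following C sketch contrasts a barycentric blend of per-vertex
      floating-point values with flat shading, which copies the provoking
      vertex's value in its original data type.  The function names are
      hypothetical and not part of the GL API, and perspective correction is
      ignored.  Blending the values 1 and 2 at the midpoint yields 1.5, which
      has no integer representation.

```c
#include <stdint.h>

/* Illustrative helpers only; not part of any GL API. */

/* Smooth shading (simplified, no perspective correction): blend the three
   per-vertex values with barycentric weights that sum to 1.  Inputs and
   result are floating-point. */
float smooth_attrib(const float v[3], const float bary[3])
{
    return v[0] * bary[0] + v[1] * bary[1] + v[2] * bary[2];
}

/* Flat shading: copy the provoking vertex's value unchanged, so integer
   data survives exactly. */
int32_t flat_attrib_s32(const int32_t v[3], int provoking)
{
    return v[provoking];
}
```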
  
      Fragment programs must read attributes using data types matching the  
      outputs of the interpolation or flat-shading operations.  They may write  
      one or more color outputs using any data type, but the data type used  
      must match the corresponding framebuffer attachments.  Outputs directed  
      at signed or unsigned integer textures (EXT_texture_integer) must be  
      written using the appropriate integer data type; all other outputs must  
      be written as floating-point values.  Note that some of the  
      fixed-function per-fragment operations (e.g., blending, alpha test) are  
      specified as floating-point operations and are skipped when directed at  
      signed or unsigned integer color buffers.  
  
  
  
                                     generic               conventional  
                                     vertex                  vertex  
                                    attributes              attributes  
                                       | (typeless)             | (float)  
                                       |                        |  
                                       |                        |  
                                       | +----------------------+  
         program                       | |                      |  
        parameters ----+               | |                      |  
        (typeless)     |               | | (typeless)           | (float)  
                       |               V V                      V  
         constant      +-+----------> vertex              fixed-function  
         buffers   ----+ |(typeless)  program                 vertex  
        (typeless)     | |              |                       |  
                       | |              | (typeless)            | (float)  
         textures  ----+ |              V                       |  
        (typeless)       |              |<----------------------+  
            |            |              |  
            |            |              +---------------+  
            |            |              |               |  
            |            |              | (typeless)    |  
            |            |              V               |  
            |            +---------> geometry           |  
            |            |(typeless) program            |  
            |            |              |               |  
            |            |              | (typeless)    |  
            |            |              V               |  
            |            |              |<--------------+  
            |            |              |  
            |            |              |  
            |            |              +-----------------+  
            |            |              |                 |(typeless)  
            |            |              |                 v  
            |            |              |             transform  
            |            |              |             feedback  
            |            |              |              buffers  
            |            |              |  
            |            |              |  
            |            |              +-----------------------+  
            |            |              |                       |  
            |            |              | (float)               | (typeless)  
            |            |              V                       V  
            |            |         interpolated               flat  
            |            |          attributes             attributes  
            |            |              |                       |  
            |            |              | (float)               | (typeless)  
            |            |              V                       |  
            |            |              |<----------------------+  
            |            |              |  
            |            |              +-----------------------+  
            |            |              |                       |  
            |            |              | (typeless)            | (float)  
            |            |(typeless)    V                       V  
            |            +---------> fragment     +------> fixed-function  
            |                        program      |(float)   fragment  
            |                           |         |             |  
            +--------------------------/|/--------+             |  
                                        |                       |  
                                        | (typeless)            | (float)  
                                        V                       |  
                                        |<----------------------+  
                                        |  
                                        +-----------------------+------ ....  
                                        |                       |  
                                        | (typeless)            | (typeless)  
                                        V                       V  
                                      color                   color  
                                    attachment              attachment  
                                        0                       1  
           
  
    (5) Instructions can operate on signed integer, unsigned integer, and  
    floating-point values.  Some operations make sense on all three data
    types.  How is this supported, and what type checking support is provided
    by the assembler?  
  
      RESOLVED:  One important property of the instruction set is that the  
      data type for all operands and the result is fully specified by the  
      instructions themselves.  For instructions (such as ADD) that make sense  
      for both integer and floating-point values, an optional data type  
      modifier is provided to indicate which type of operation should be  
      performed.  For example, "ADD.S", "ADD.U", and "ADD.F" add signed
      integers, unsigned integers, or floating-point values, respectively.  If  
      no data type modifier is provided, ".F" is assumed if the instruction  
      can apply to floating-point values and ".S" is assumed otherwise.  
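      A minimal C model of this rule, assuming 32-bit register components
      that simply hold bits, shows that the opcode modifier alone decides how
      those bits are interpreted.  The helper names are illustrative only.
      For addition, "ADD.S" and "ADD.U" produce identical result bits under
      two's complement and differ in how the result is interpreted (and in
      the condition code flags they set); "ADD.F" reinterprets the same bits
      as IEEE single-precision floats.

```c
#include <stdint.h>
#include <string.h>

/* Illustrative model only (not a GL API): one 32-bit register component
   holds raw bits with no type attached. */
typedef uint32_t reg32;

/* ADD.S / ADD.U: integer add, wrapping modulo 2^32.  Signed and unsigned
   adds yield the same bits in two's complement. */
reg32 add_int(reg32 a, reg32 b)
{
    return a + b;
}

/* ADD.F: read the bits as IEEE single-precision floats, add, and store
   the bits of the floating-point sum. */
reg32 add_float(reg32 a, reg32 b)
{
    float fa, fb, sum;
    reg32 r;
    memcpy(&fa, &a, sizeof fa);
    memcpy(&fb, &b, sizeof fb);
    sum = fa + fb;
    memcpy(&r, &sum, sizeof r);
    return r;
}
```

      With both operands holding the bits of 1.0 (0x3F800000), add_float
      returns 0x40000000 (2.0), whereas add_int returns 0x7F000000.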
  
      To help identify errors where the wrong data type is used -- for  
      example, adding integer values in an ADD instruction that omits a data  
      type modifier and thus defaults to "ADD.F" -- variables may be declared  
      with optional data type modifiers.  In the following code:  
  
        INT TEMP a;  
        UINT TEMP b;  
        FLOAT TEMP c;  
        TEMP d;  
  
      "a", "b", "c", and "d" are declared as temporary variables holding  
      signed integer, unsigned integer, floating-point, and typeless values.  
      Since each instruction fully specifies the data type of each operand and  
      its result, these data types can be checked against the data type  
      assigned to the variables operated on.  If the types don't match, and  
      the variable is not typeless, an error is reported.  The opcode modifier  
      ".NTC" can be used to ignore such errors on a per-opcode basis, if  
      required.  
  
      Note that when bindings are used directly in instructions, they are  
      always considered typeless for simplicity.  Some fixed-function bindings  
      have an obvious data type, but other bindings (e.g., program parameters)  
      can hold either integer or floating-point values, depending on how they  
      were specified.  
  
      Variable data types are optional.  Typeless variables are provided  
      because some programs may want to reuse the same variable in several  
      places with different data types.  
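      The type check described above can be sketched as follows.  This is a
      hypothetical model of the assembler's behavior, not an NVIDIA API: an
      operand passes if its variable is typeless (which includes bindings
      used directly in instructions), if the instruction carries the ".NTC"
      modifier, or if the declared type matches the type implied by the
      opcode.

```c
#include <stdbool.h>

/* Hypothetical variable types tracked by an assembler (illustrative). */
typedef enum { VAR_TYPELESS, VAR_INT, VAR_UINT, VAR_FLOAT } var_type;

/* 'declared' is the variable's declared type; 'required' is the operand
   type fixed by the opcode and its modifier (e.g. ADD.F -> VAR_FLOAT);
   'ntc' is true when the instruction has the ".NTC" modifier. */
bool operand_type_ok(var_type declared, var_type required, bool ntc)
{
    if (ntc || declared == VAR_TYPELESS)
        return true;              /* no check performed */
    return declared == required;  /* otherwise the types must match */
}
```

      For "INT TEMP a;", an unmodified "ADD a, a, a;" defaults to "ADD.F", so
      operand_type_ok(VAR_INT, VAR_FLOAT, false) is false and an error would
      be reported; using "ADD.S", or adding ".NTC", avoids the error.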
  
    (6) Should both signed (INT) and unsigned integer (UINT) data types be  
    provided?  
  
      RESOLVED:  Yes.  Signed and unsigned integer operations are supported.  
      Providing both "INT" and "UINT" variable modifiers allows the assembler
      to distinguish between signed and unsigned values for type checking
      purposes, to ensure that unsigned values aren't read as signed values
      and vice versa.
  
      This specification says that if a value is read as a signed integer, but was
      written as an unsigned integer, the value returned is undefined.  
      However, signed and unsigned integers are interchangeable in practice,  
      except for very large unsigned integers (which can't be represented as  
      signed values of the equivalent size) or negative signed integers.  
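      The following C sketch models what "interchangeable in practice" means,
      assuming 32-bit two's-complement components (the specification itself
      leaves such mixed reads undefined).  The helper names are hypothetical.
      Values from 0 to 0x7FFFFFFF read back identically either way; larger
      unsigned values read back as negative signed values.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: read the bits of an unsigned 32-bit value as a
   signed 32-bit value. */
int32_t read_as_signed(uint32_t u)
{
    int32_t s;
    memcpy(&s, &u, sizeof s);
    return s;
}

/* True if the value means the same number whether read as signed or
   unsigned, i.e. it is not one of the "very large" unsigned values. */
bool safely_interchangeable(uint32_t u)
{
    return u <= (uint32_t)INT32_MAX;
}
```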
  
      If programs know that they won't generate negative or very large values,  
      signed and unsigned integers can be used interchangeably.  To avoid type  
      errors in the assembler in this case, typeless variables can be used,
      or the ".NTC" modifier can be applied to the affected instructions.
  
    (7) Integer and floating-point constants are supported in the instruction  
    set.  Integer constants might be interpreted to mean either "real integer"  
    values or floating-point values.  How are they supported?  
  
      RESOLVED:  When an obvious floating-point constant is specified (e.g.,
      "3.0"), the developer's intent is clear.  If you try to use a
      floating-point value in an instruction that wants an integer operand, or  
      a declaration of an integer parameter variable, the program will fail to  
      load.  An integer constant used in an instruction isn't quite as clear.  
      But its meaning can be easily inferred because the operand types of  
      instructions are well-known at compile time.  An integer multiply  
      involving the constant "2" will interpret the "2" as an integer.  A  
      floating-point multiply involving the same constant "2" will interpret  
      it as a floating-point value.  
  
      The only real problem is for a parameter declaration that is typeless.  
      For typed variables, the intent is clear:  
  
        INT PARAM two = 2;               # use integer 2  
        FLOAT PARAM twoPt0 = 2;          # use floating-point 2.0  
  
      For typeless variables, there's no context to go on:  
  
        PARAM two = 2;                   # 2?  2.0?  
  
      This extension is intended to be largely upward-compatible with  
      ARB_vertex_program, ARB_fragment_program, and the other extensions built  
      on top of them.  In all of these, the previous declaration is legal and  
      means "2.0".  For compatibility, we choose to interpret integer  
      constants in this case as floating-point values.  The assembler in the  
      NVIDIA implementation will issue a warning if this case ever occurs.  
  
      This extension does not provide decoration of integer constant values --  
      we considered adding suffixed integers such as "2U" to mean "2, and  
      don't even think about converting me to a float!".  We expect that it  
      will be sufficient to use the "INT" or "FLOAT" modifiers to disambiguate  
      effectively.  
  
    (8) Should hexadecimal constants (e.g., 0x87A3 or 0xFFFFFFFF) be supported?  
  
      RESOLVED:  Yes.  
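      The interpretation rules from issues (7) and (8) can be summarized in a
      small C sketch.  It is a hypothetical model, not the NVIDIA assembler:
      it only handles unsigned decimal, hexadecimal, and simple
      floating-point literals, it omits the warning issued for integer
      constants in typeless declarations, and converting hexadecimal
      constants to floating-point in typeless or FLOAT declarations is an
      assumption made here for illustration.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical model: PARAM declaration types ("PARAM", "INT PARAM", ...). */
typedef enum { DECL_TYPELESS, DECL_INT, DECL_UINT, DECL_FLOAT } decl_type;

typedef struct {
    bool ok;          /* false: the program would fail to load */
    bool is_float;    /* true: the constant is stored as floating-point */
    double f;         /* value if is_float */
    long long i;      /* value if !is_float */
} param_const;

param_const interpret_literal(decl_type decl, const char *lit)
{
    param_const c = { true, false, 0.0, 0 };
    bool is_hex = lit[0] == '0' && (lit[1] == 'x' || lit[1] == 'X');
    bool is_float_lit = !is_hex && strpbrk(lit, ".eE") != NULL;

    if (is_float_lit) {
        if (decl == DECL_INT || decl == DECL_UINT) {
            c.ok = false;               /* float constant, integer decl */
        } else {
            c.is_float = true;
            c.f = strtod(lit, NULL);
        }
        return c;
    }

    long long v = strtoll(lit, NULL, is_hex ? 16 : 10);
    if (decl == DECL_INT || decl == DECL_UINT) {
        c.i = v;                        /* "INT PARAM two = 2;" -> 2 */
    } else {
        c.is_float = true;              /* FLOAT or typeless -> 2.0, */
        c.f = (double)v;                /* for ARB_vertex_program compat */
    }
    return c;
}
```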
  
    (9) Should we provide data type modifiers with explicit component sizes?  
    For example, "INT8", "FLOAT16", or "INT32".  If so, should we provide a  
    mechanism to query the size (in bits) of a variable, or of different  
    variable types/qualifiers?  
  
      RESOLVED:  No.  
  
    (10) Should this extension provide better support for array variables?  
  
      RESOLVED:  Yes; array variables of all types are allowed.  
  
      In ARB_vertex_program, program parameter (constant) variables could be  
      addressed as arrays.  Temporary variables, vertex attributes, and vertex  
      results could not be declared as arrays.  
  
      In NV_vertex_program3 and NV_fragment_program2, relative addressing was  
      supported in program bindings:  
  
        MOV R0, vertex.attrib[A0.x];            # vertex  
        MOV result.texcoord[A0.x], R0;          # vertex  
        MOV R0, fragment.texcoord[A0.x];        # fragment -- inside LOOP  
  
      Explicitly declared attribute or result arrays were not supported, and
      temporaries could not be arrays either.
  
      This extension allows users to declare attribute, result, and temporary  
      arrays such as:  
  
        ATTRIB attribs[] = { vertex.attrib[7..11] };  
        TEMP scratch[10];  
        RESULT texcoords[] = { result.texcoord[0..3] };  
  
      Additionally, the relative addressing mechanisms provided by  
      NV_vertex_program3 and NV_fragment_program2 are NOT supported in this  
      extension -- instead, declared array variables are the only way to get  
      relative addressing.  Using declared arrays allows the assembler to  
      identify which attributes will actually be used.  An expression like  
      "vertex.texcoord[A0.x]" doesn't identify which texture coordinates are  
      referenced, and the assembler must be conservative in this case and  
      assume that they all are.  
  
    (11) Is relative addressing of temporaries allowed?  
  
      RESOLVED:  Yes.  However, arrays of temporaries may end up being stored  
      in off-chip memory, and may be slower to access than non-array  
      temporaries.  
  
    (12) Should this extension add bindings to pass generic attributes between  
    vertex, geometry, and fragment programs, or are texture coordinates  
    sufficient?  
  
      RESOLVED:  While texture coordinates have been used in the past, generic  
      attributes should be provided.    
  
      The assembler provides a large set of bindings and automatically  
      eliminates generic attributes or components that are unused.  At each  
      interface between programs, there is an implementation-dependent limit  
      on the number of attribute components that can be passed.  
  
      There are several reasons that this approach was chosen.  First, if the  
      number of attributes that can be passed between program stages exceeds  
      the number of texture coordinate sets supported when specifying
      vertices, a second implementation-dependent number of texture
      coordinates would need to be exposed to cover the number supported
      between stages.  Second, the mechanisms described above reduce or
      eliminate the need to pack attributes into four-component vectors.
      Third, values that aren't used for texture lookups no longer need to be
      passed in attributes named "texture coordinates", which have
      historically been used for such lookups.
  
    (13) The structured branching support in NV_fragment_program2 provides a  
    REP instruction that says to repeat a block of code <N> times, as well as  
    a LOOP instruction that does the same, but also provides a special loop  
    counter variable.  What sort of looping mechanism should we provide here?  
  
      RESOLVED:  Provide only the REP instruction.  The functionality provided  
      by the LOOP instruction can be easily achieved by using an integer  
      temporary as the loop index.  This avoids two annoyances of the old LOOP  
      models:  (a) the loop index (A0.x) is a special variable name, while all  
      other variables are declared normally and (b) instructions can only  
      access the loop index of the innermost loop -- loop indices at higher  
      nesting levels are not accessible.  
  
      One other option was considered -- a "LOOPV" instruction (LOOP with a
      variable), where the program would specify a variable name and
      component to hold the loop index instead of using the implicit variable
      name "A0.x".  In the end, it was decided that using an integer temporary
      as a loop counter was sufficient.
  
    (14) The structured branching support in NV_fragment_program2 provides a  
    REP instruction that requires a loop count.  Some looping constructs may  
    not have a definite loop count, such as a "while" statement in C.  Should  
    this construct be supported, and if so, how?  
  
      RESOLVED:  The REP instruction is extended to make the loop count  
      optional.  If no loop count is provided, the REP instruction specifies a
      loop that can only be exited using the BRK (break) or RET instructions.  
      To avoid obvious infinite loops, an error will be reported if a  
      REP/ENDREP block contains no BRK instruction at the current nesting  
      level and no RET instruction at any nesting level.  
  
      To implement a loop like "while (value < 7.0) ...", code such as the  
      following can be used:  
  
        TEMP cc;                        # dummy variable  
        REP;  
          SLT.CC cc.x, value.x, 7.0;    # compare value.x to 7.0, set CC0  
          BRK EQ.x;                     # break out if not true
          ...  
          ...                           # presumably update value!  
          ...  
        ENDREP;  
  
    (15) The structured branching support in NV_fragment_program2 provides a  
    BRK instruction that operates like C's "break" statement.  Should we  
    provide something similar to C's "continue" statement, which skips to the  
    next iteration of the loop?  
  
      RESOLVED:  Yes, a new CONT opcode is provided for this purpose.  
  
    (16) Can the BRK or CONT instructions break out of multiple levels of  
    nested loops at once?  
  
      RESOLVED:  No.  BRK and CONT only exit the current nesting level.  To  
      break out of multiple levels of nested loops, multiple BRK/CONT  
      instructions are required.  
  
    (17) For REP instructions, is the loop counter reloaded on each iteration  
    of the loop?  
  
      RESOLVED:  No.  The loop counter is loaded once at the top of the loop,  
      compared to zero at the top of the loop, and decremented when each loop  
      iteration completes.  A program may overwrite the variable used to  
      specify the initial value of the loop counter inside the loop without  
      affecting the number of times the loop body is executed.  
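      A C model of these REP semantics is shown below, using a hypothetical
      function name and assuming the loop exits once the counter is no longer
      greater than zero.  The loop body deliberately overwrites the variable
      that supplied the count, and the iteration count is unaffected.

```c
/* Hypothetical model of "REP count; <body> ENDREP;" where the body
   overwrites count with 0. */
int rep_iterations(int *count)
{
    int counter = *count;        /* loaded once, at the top of the loop */
    int iterations = 0;

    while (counter > 0) {        /* compared to zero at the top */
        iterations++;
        *count = 0;              /* body overwrites the source variable */
        counter--;               /* decremented when the iteration completes */
    }
    return iterations;
}
```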
  
    (18) How are floating-point values represented in this extension?  What  
    about floating-point arithmetic operations?  
  
      RESOLVED:  In the initial hardware implementation of this extension,  
      floating-point values are represented using the standard 32-bit IEEE  
      single-precision encoding, consisting of a sign bit, 8 exponent bits,  
      and 23 mantissa bits.  Special encodings for NaN (not a number), +/-INF  
      (infinity), and positive and negative zero are supported.  Denorms  
      (nonzero values with magnitude less than 2^-126, which have an exponent
      encoding of "0" and no implied leading one) are supported, but may be
      flushed to zero, preserving the sign bit of the original value.
      Arithmetic operations
      are carried out at single-precision using normal IEEE floating-point  
      rules, including special rules for generating infinities, NaNs, and  
      zeros of each sign.  
  
      Floating-point temporaries declared as "SHORT" may be, but are not  
      necessarily, stored as 16-bit "fp16" values (sign bit, five exponent  
      bits, ten mantissa bits), as specified in the NV_float_buffer and  
      ARB_half_float_pixel extensions.  
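      The permitted flush-to-zero behavior for denorms can be modeled in C as
      below.  This is only a sketch of what an implementation may do, since
      denorms may also be preserved; the function name is hypothetical.

```c
#include <math.h>

/* Hypothetical helper: replace a single-precision denorm (nonzero,
   magnitude below 2^-126) with a zero of the same sign; all other values,
   including +/-0, INF, and NaN, pass through unchanged. */
float flush_denorm(float x)
{
    if (fpclassify(x) == FP_SUBNORMAL)
        return signbit(x) ? -0.0f : 0.0f;
    return x;
}
```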
  
    (19) Should we provide a method to declare how fragment attributes are  
    interpolated?  It is possible to have flat-shaded attributes,  
    perspective-corrected attributes, and centroid-sampled attributes.  
  
      RESOLVED:  Yes.  Fragment program attribute variable declarations may  
      specify the "FLAT", "NOPERSPECTIVE", and "CENTROID" modifiers.    
  
      These modifiers are documented in detail in the NV_fragment_program4  
      specification.  
        
    (20) Should vertex and primitive identifiers be supported?  If so, how?  
  
      RESOLVED:  A vertex identifier is available as "vertex.id" in a vertex  
      program.  The vertex ID is equal to the value effectively passed to
      ArrayElement when the vertex is specified, and is defined only if vertex  
      arrays are used with buffer objects (VBOs).  
  
      A primitive identifier is available as "primitive.id" in a geometry or  
      fragment program.  The primitive ID is equal to the number of primitives  
      processed since the last implicit or explicit call to glBegin().  
  
      See the NV_vertex_program4 spec for more information on vertex IDs, and  
      the NV_geometry_program4 or NV_fragment_program4 specs for more  
      information on primitive IDs.  
  
    (21) For integer opcodes, should a bitwise inversion operator "~" be  
    provided, analogous to the existing negation operator?
  
      RESOLVED:  No.  If this operator were provided, it might allow a program  
      to evaluate the expression "a&(~b)" using a single instruction:  
  
        AND.U a, a, ~b;  
  
      Instead, it is necessary to do something like:
  
        UINT TEMP t;  
        NOT.U t, b;  
        AND.U a, a, t;  
  
      If necessary, this functionality could be added in a subsequent  
      extension.  
  
    (22) What happens if you negate or take the absolute value of the  
    biggest-magnitude negative integer?  
  
      RESOLVED:  Signed integers are represented using two's complement  
      representation.  For 32-bit integers, the largest possible value is  
      2^31-1; the smallest possible value is -2^31.  There is no way to  
      represent 2^31, which is what these operators "should" return.  The  
      value returned in this case is the original value of -2^31.  
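      In C, where signed overflow is undefined, the same wraparound can be
      modeled with unsigned arithmetic.  A minimal sketch, assuming 32-bit
      components; the helper names are illustrative:

```c
#include <stdint.h>
#include <string.h>

/* Illustrative helpers, not part of any API. */

/* Reinterpret 32 bits as a two's-complement signed integer. */
static int32_t bits_to_s32(uint32_t u)
{
    int32_t s;
    memcpy(&s, &u, sizeof s);
    return s;
}

/* Negation modulo 2^32: -(-2^31) wraps back to -2^31. */
int32_t neg_s32(int32_t x)
{
    return bits_to_s32(0u - (uint32_t)x);
}

/* Absolute value built on neg_s32, so |-2^31| is also -2^31. */
int32_t abs_s32(int32_t x)
{
    return x < 0 ? neg_s32(x) : x;
}
```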
  
    (23) How do condition codes work?  How are they different from those  
    provided in previous NVIDIA extensions?  
  
      RESOLVED:  There are two condition codes -- CC0 and CC1 -- each of which  
      is a four-component vector.  The condition codes are set based on the  
      result of an instruction that specifies a condition code update  
      modifier.  Examples include:  
  
        ADD.S.CC  R0, R1, R2;       # add signed integers R1 and R2, update  
                                    #   CC0 based on the result, write the   
                                    #   final value to R0  
        ADD.F.CC1 R3, R4, R5;       # add floats R4 and R5, update CC1 based  
                                    #   on the result, write the final value  
                                    #   to R3  
        ADD.U.CC0 R6.xy, R7, R8;    # add unsigned integers R7 and R8, update  
                                    #   CC0 (x and y components) based on the  
                                    #   result, write the final value to R6  
                                    #   (x and y components)  
  
      Condition codes can be used for conditional writes, conditional  
      branches, or other operations.  The condition codes aren't used  
      directly, but are instead used with a condition code test such as "LT"  
      (less than) or "EQ" (equal to).  Examples include:  
  
        MOV R0 (GT.x), R1;          # move R1 to R0 only if the x component of  
                                    #   CC0 indicates a result of ">0"  
        MOV R2 (NE1), R3;           # component-wise move of R3 to R2 if the  
                                    #   corresponding component of CC1   
                                    #   indicates a result of "!=0"  
        IF LE0.xyxy;                # execute the block of code if the x or  
          ...                       #   y components of CC0 indicate a result  
        ENDIF;                      #   of "<=0"  
        REP;                          
          ...  
          BRK EQ1.xyzx;             # break out of loop if the x, y, or z  
        ENDREP;                     #   components of CC1 indicate a result of  
                                    #   "==0".  
  
      Previous NVIDIA extensions provide eight tests, which are still  
      supported here.  The tests "EQ" (equal), "GE" (greater/equal), "GT"  
      (greater than), "LE" (less/equal), "LT" (less than), and "NE" (not  
      equal) can be used to determine the relation of the result used to set  
      the condition code with zero.  The tests "TR" (true) and "FL" (false)
      are special tests that always evaluate to true or false, respectively.
  
      For floating-point results, a NaN (not a number) encoding causes the  
      "NE" condition to evaluate to TRUE and all other conditions to evaluate  
      to FALSE.  IEEE encodings for "negative" and "positive" zero are both  
      treated as equal to zero.  
  
      Condition codes are implemented as a set of flags, which are set  
      depending on the type of operation, as described in the spec.    
  
      For instructions that return floating-point or signed integer values,  
      the normal condition code tests reliably indicate the relationship of  
      the result to zero.  For instructions that return unsigned values, the  
      condition codes are a bit more complicated.  For example, the sign flag  
      is set if the most significant bit of the result written is set.  As a  
      result, very large unsigned integer values (0x80000000 through
      0xFFFFFFFF for 32-bit integers) are effectively treated as negative
      values.  Condition code
      tests should be used with care with unsigned results -- to test if an  
      unsigned integer is ">0", use a sequence like:  
  
        MOV.U.CC R0, R1;            # move R1 to R0, set condition code  
        IF NE;                      # test if the result is "!=0", a very   
          ...                       #   large value might fail "GT"!  
        ENDIF;  
  
      This extension provides a number of additional condition code tests  
      useful for different floating-point or integer operations:  
  
        * NAN (not a number) is true if a floating-point result is a NaN.  LEG  
          (less, equal to, or greater) is the opposite of NAN.  
  
        * CF (carry flag) is true if an unsigned add overflows, or if an  
          unsigned subtract produces a non-negative value.  NCF (no carry  
          flag) is the opposite of CF.  
  
        * OF (overflow flag) is true if a signed add or subtract overflows.  
          NOF (no overflow flag) is the opposite of OF.  
  
        * SF (sign flag) is true if the sign flag is set.  NSF (no sign flag)  
          is the opposite of SF.  
  
        * AB (above) is true if an unsigned subtract produces a positive  
          result.  BLE (below or equal) is the opposite of AB, and is true if  
          an unsigned subtract produces a negative result or zero.  Note that  
          CF can be used to test if the result is greater than or equal to  
          zero, and NCF can be used to test if the result is less than zero.  
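      The unsigned-subtract flags above can be modeled in C as follows.  This
      is a simplified, hypothetical sketch: the flag and function names mirror
      the condition code tests but are not an API, and "GT" is modeled as
      "sign flag clear and result nonzero".  It shows why "AB" is the right
      test for "a > b" on unsigned values, while "GT" fails whenever the
      difference has its most significant bit set.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified flags for an unsigned 32-bit subtract r = a - b. */
typedef struct {
    bool zf;   /* result is zero */
    bool sf;   /* most significant bit of the result is set */
    bool cf;   /* no borrow: the true difference a - b is non-negative */
} cc_flags;

cc_flags flags_sub_u(uint32_t a, uint32_t b)
{
    uint32_t r = a - b;          /* wraps modulo 2^32 */
    cc_flags f = { r == 0, (r >> 31) != 0, a >= b };
    return f;
}

bool test_AB(cc_flags f)  { return f.cf && !f.zf; }   /* a >  b */
bool test_BLE(cc_flags f) { return !f.cf || f.zf; }   /* a <= b */
bool test_GT(cc_flags f)  { return !f.sf && !f.zf; }  /* result "> 0" */
```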
  
    (24) How do the "set on" instructions (SEQ, SGE, SGT, SLE, SLT, SNE) work  
    with integer values and/or condition codes?  
  
      RESOLVED:  "Set on" instructions comparing signed and unsigned values  
      return zero if the condition is false, and an integer with all bits set  
      if the condition is true.  If the result is signed, it is interpreted as  
      -1.  If the result is unsigned, it is interpreted as the largest unsigned
      value (0xFFFFFFFF for 32-bit integers).  This is different from the  
      floating-point "set on", which is defined to return 1.0.  
  
      This specific result encoding was chosen so that bitwise operators (NOT,  
      AND, OR, XOR) can be used to evaluate boolean expressions.  
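      A C sketch of these encodings shows how the all-bits-set TRUE value
      lets ordinary bitwise operators combine comparison results, which the
      floating-point encoding of 1.0 does not allow.  The function names are
      hypothetical stand-ins for SLT.S, SLT.U, and SLT.F.

```c
#include <stdint.h>

/* Hypothetical stand-ins for integer "set on less than":
   TRUE is all bits set, FALSE is zero. */
int32_t slt_s(int32_t a, int32_t b)    { return a < b ? -1 : 0; }
uint32_t slt_u(uint32_t a, uint32_t b) { return a < b ? UINT32_MAX : 0u; }

/* Floating-point "set on less than": TRUE is 1.0. */
float slt_f(float a, float b)          { return a < b ? 1.0f : 0.0f; }

/* "(a < c) && !(b < c)" evaluated with AND and NOT on SLT.S-style results. */
int32_t lt_and_not_lt(int32_t a, int32_t b, int32_t c)
{
    return slt_s(a, c) & ~slt_s(b, c);
}
```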
  
      When performing condition code tests on the results of an integer "set  
      on" instruction, keep in mind that a TRUE result has the most  
      significant bit set and will be interpreted as a negative value.  To  
      test if a condition is true, use "NE" (!=0).  A condition code test of  
      "GT" will always fail if the condition code was written by an integer  
      "set on" instruction.  
  
    (25) What new texture functionality is provided?  
  
      RESOLVED:  Several new features are provided.  
  
      First, the TXF (texel fetch) instruction allows programs to access a  
      texture map like a normal array.  Integer coordinates identifying an  
      individual texel and LOD are provided, and the corresponding texture  
      data is returned without filtering of any type.  
  
      Second, the TXQ (texture size query) instruction allows programs to  
      query the size of a specified level of detail of a texture.  This  
      feature allows programs to perform computations dependent on the size of  
      the texture without having to pass the size as a program parameter or  
      via some other mechanism.  
  
      Third, applications may specify a constant texel offset in a texture  
      instruction that moves the texture sample point by the specified number  
      of texels.  This offset can be used to perform custom texture filtering,  
      and is also independent of the size of the texture LOD -- the same  
      offsets are applied, regardless of the mipmap level.  
  
      Fourth, shadow mapping is supported for cube map textures.  The first  
      three coordinates are the normal (s,t,r) coordinates for a cube map  
      texture lookup, and the fourth component is a depth reference value that  
      can be compared to the depth value stored in the texture.  
  
    (26) What "consistency" requirements are in effect for textures accessed  
    via the TXF (texel fetch) instruction?  
  
      UNRESOLVED:  The texture must be usable for regular texture mapping  
      operations -- if texture sizes or formats are inconsistent and a  
      mipmapped min filter is used, the results are undefined.  
  
    (27) How does the TXF instruction work with bordered textures?  
  
      RESOLVED:  The entire image can be accessed, including the border  
      texels.  For a 64x64 2D texture plus border (66x66 overall), the lower  
      left border texel is accessed using the coordinates (-1,-1); the upper  
      right border texel is accessed using the coordinates (64,64).  
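      To make the coordinate range concrete, the C sketch below maps TXF-style
      integer coordinates, which run from -1 to the interior width (or
      height), onto a bordered image.  It assumes a hypothetical row-major
      storage layout with the bottom border row stored first; the function
      name is illustrative only.

```c
#include <assert.h>

/* Row-major index of texel (x, y) in a bordered 2D image whose interior is
 * width x height.  Valid coordinates run from -1 to width (or height), so
 * the stored image is (width + 2) x (height + 2) texels. */
static int bordered_texel_index(int x, int y, int width, int height)
{
    assert(x >= -1 && x <= width);
    assert(y >= -1 && y <= height);
    return (y + 1) * (width + 2) + (x + 1);
}
```

      For the 64x64 example, (-1,-1) maps to the first stored texel (index 0)
      and (64,64) to the last one (index 66*66-1 = 4355).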
  
    (28) What should TXQ (texture size query) return for "irrelevant" texture  
    sizes (e.g., height of a 1D texture)?  Should it return any other  
    information at the same time?  
  
      RESOLVED:  This specification leaves all "extra" components undefined.  
  
    (29) How do texture offsets interact with cubemap textures?  
  
      RESOLVED:  They are not supported in this extension.  
  
    (30) How do texture offsets interact with mipmapped textures?  
  
      RESOLVED:  The texture offsets are added after the (s,t,r) coordinates  
      have been divided by q (if applicable) and converted to (u,v,w)  
      coordinates by multiplying by the size of the selected texture level.  
      The offsets are added to the (u,v,w) coordinates, and always move the
      sample point by an integral number of texels.  If multiple
      mipmaps are accessed, the sample point in each mipmap level is moved by  
      an identical offset.  The applied offsets are independent of the  
      selected mipmap level.  
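      A minimal C sketch of the ordering described above, for one coordinate:
      scale the (already q-divided) normalized coordinate by the width of the
      selected level, then add the integer offset.  The function name and the
      omission of wrap modes and filtering are simplifying assumptions.

```c
#include <assert.h>

/* Texel-space u coordinate for normalized s at mipmap level `lod`, with a
 * constant integer texel offset applied after scaling.  s is assumed to be
 * divided by q already; wrap modes and filtering are not modeled. */
static float offset_u(float s, int base_width, int lod, int texel_offset)
{
    int level_width = base_width >> lod;
    if (level_width < 1)
        level_width = 1;
    /* The offset is added in texel units, so it moves the sample point by
     * the same number of texels at every level. */
    return s * (float)level_width + (float)texel_offset;
}
```

      At level 0 of a 64-texel-wide texture, s = 0.5 with an offset of 1
      gives u = 33; at level 2 (16 texels wide) it gives u = 9.  In both
      cases the sample point moves by exactly one texel.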
  
    (31) How do shadow cube maps work?  
  
      UNRESOLVED:  An application can define a cube map texture with a  
      DEPTH_COMPONENT internal format, and then render a scene using the cube  
      map faces as the depth buffer(s).  When rendering, the projection should
      be set up using the "center" of the cube map as the eye, and using a
      normal projection matrix.  When applying the shadow map, the fragment
      program reads the (x,y,z) eye coordinates, computes the length of the
      major axis (MAX(|x|,|y|,|z|)), and then transforms this coordinate to
      [0,1] space using the same parameters used to derive Z in the projection
      matrix.  A 4-component vector consisting of x, y, z, and this computed
      depth value should be passed to the texture lookup, and normal shadow  
      mapping operations will be performed.  
  
      This issue should include the math needed to do this computation and  
      sample code.  
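      One possible form of that math, offered only as a sketch: assume each
      face was rendered with a standard glFrustum-style perspective projection
      (near n, far f) and the default [0,1] depth range.  A point at eye-space
      depth z_e = -ma then has clip z = ((f+n)/(f-n))*ma - 2fn/(f-n) and clip
      w = ma, so its window depth is

        depth = 0.5 * ((f+n)/(f-n) - 2fn/((f-n)*ma)) + 0.5

      The C function below (a hypothetical name) evaluates this for the
      fourth texture coordinate.  Polygon offset and depth bias are ignored.

```c
#include <assert.h>

static float abs_f(float v) { return v < 0.0f ? -v : v; }
static float max_f(float a, float b) { return a > b ? a : b; }

/* Depth reference for a shadow cube map lookup.  (x, y, z) is the position
 * relative to the cube map center; n and f are the near and far planes used
 * when rendering the faces.  Assumes a standard perspective projection and
 * the default depth range [0,1]. */
static float cube_shadow_depth(float x, float y, float z, float n, float f)
{
    /* Distance along the major axis: MAX(|x|,|y|,|z|). */
    float ma = max_f(abs_f(x), max_f(abs_f(y), abs_f(z)));
    /* NDC z that the projection produces for eye depth -ma. */
    float ndc = (f + n) / (f - n) - (2.0f * f * n) / ((f - n) * ma);
    /* Map NDC [-1,1] to window depth [0,1]. */
    return 0.5f * ndc + 0.5f;
}
```

      A point on the near plane (ma == n) yields 0, and one on the far plane
      (ma == f) yields 1, matching what the depth buffer would have stored.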
  
    (32) Integer multiplies can overflow by a lot.  Should there be some way  
    to return the high part of both unsigned and signed integer multiplies?  
  
      RESOLVED:  Yes.  The ".HI" modifier is provided to return the 32
      MSBs of a 32x32 integer multiply.  The instruction sequence:
  
        INT TEMP R0, R1, R2, R3;  
        MUL.S    R0, R2, R3;  
        MUL.S.HI R1, R2, R3;  
  
      will do a 32x32 signed integer multiply of R2 and R3, with the 32 LSBs of
      the 64-bit result in R0 and the 32 MSBs in R1.
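      For reference, the C sketch below computes the same two halves on the
      host: a full 64-bit signed product split into the low word (what MUL.S
      writes) and the high word (what MUL.S.HI writes).  The function name is
      hypothetical; both halves are returned as raw 32-bit patterns.

```c
#include <assert.h>
#include <stdint.h>

/* Signed 32x32 -> 64-bit multiply, split into low and high 32-bit words. */
static void mul_s_lo_hi(int32_t a, int32_t b, uint32_t *lo, uint32_t *hi)
{
    int64_t product = (int64_t)a * (int64_t)b;
    uint64_t bits = (uint64_t)product;   /* two's complement bit pattern */
    *lo = (uint32_t)bits;                /* 32 LSBs, as in R0 above */
    *hi = (uint32_t)(bits >> 32);        /* 32 MSBs, as in R1 above */
}
```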
  
    (33) Should there be any other special multiplication modifiers?  
  
      RESOLVED:  Yes.  The ".S24" and ".U24" modifiers allow for signed and  
      unsigned integer multiplies where both operands are guaranteed to fit in  
      the least significant 24 bits.  On some architectures supporting this  
      extension, ".S24" and ".U24" integer multiplies may be faster than  
      general-purpose ".S" and ".U" multiplies.  If either value doesn't fit  
      in 24 bits, the results of the operation are undefined --  
      implementations may, but are not required to, ignore the MSBs of the  
      operands if ".S24" or ".U24" is specified.  
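      Because out-of-range operands give undefined results, an application
      choosing ".S24" or ".U24" needs to know its values stay in range.  The
      C predicates below (hypothetical names) express that range check; they
      are a host-side aid, not part of this extension.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if v is representable as a signed 24-bit integer. */
static bool fits_s24(int32_t v)
{
    return v >= -(1 << 23) && v < (1 << 23);
}

/* True if v is representable as an unsigned 24-bit integer. */
static bool fits_u24(uint32_t v)
{
    return v < (1u << 24);
}
```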
  
    (34) This extension provides subroutines, but doesn't provide a stack to  
    push and pop parameters.  How do we deal with this?  NV_vertex_program3  
    supported PUSHA/POPA instructions to push and pop address registers.  
  
      RESOLVED:  No explicit stack is required.  A program can implement a  
      stack by allocating a temporary array plus a single integer temporary to  
      use as the stack "pointer".  For example:  
  
        TEMP stack[256];                # 256 4-component vectors  
        INT TEMP sp;                    # sp.x == stack pointer  
        INT TEMP cc;                    # condition code results  
          
        function:  
          SGE.S.CC cc.x, sp.x, 256;     # compute stackPointer >= 256  
          RET NE.x;                     # return if TRUE  
          MOV stack[sp.x], R0;          # push R0 onto the stack
          ADD.S sp.x, sp.x, 1;
          ...
          SUB.S sp.x, sp.x, 1;          # pop R0 off the stack
          MOV R0, stack[sp.x];
          RET;
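      The same idea expressed in C, as an illustration only: a fixed array of
      4-component registers plus an integer index serving as the stack
      pointer.  The type and function names are hypothetical.  Unlike the
      assembly fragment, this pop also checks for underflow.

```c
#include <assert.h>
#include <stdbool.h>

#define STACK_SIZE 256

typedef struct { float v[4]; } vec4;          /* one 4-component register */

typedef struct {
    vec4 slots[STACK_SIZE];                   /* like "TEMP stack[256]" */
    int sp;                                   /* like "sp.x" */
} reg_stack;

/* Push r; returns false and leaves the stack unchanged when it is full,
 * mirroring the early RET in the assembly example. */
static bool stack_push(reg_stack *s, vec4 r)
{
    if (s->sp >= STACK_SIZE)
        return false;
    s->slots[s->sp++] = r;
    return true;
}

/* Pop into *r; returns false when the stack is empty. */
static bool stack_pop(reg_stack *s, vec4 *r)
{
    if (s->sp <= 0)
        return false;
    *r = s->slots[--s->sp];
    return true;
}
```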
  
    (35) Should we provide new vector semantics for previously-defined opcodes  
    (e.g., LG2 computes a component-wise logarithm)?  
  
      RESOLVED:  Not in this extension.  The instructions we define here are  
      compatible with the vector or scalar nature of previously defined  
      opcodes.  This simplifies the implementation of an assembler that needs  
      to support both old and new instruction sets.  
  
    (36) Should it really be undefined to read from a register storing data of  
    one type with an instruction of the other type (e.g., to read the bits of  
    a floating-point number as an unsigned integer)?  
  
      RESOLVED:  The spec describes undefined results for simplicity.  In  
      practice, mixing data types can be done, where signed integers are  
      represented as two's complement integers and floating-point numbers are  
      represented using IEEE single-precision representation.  For example:  
  
        TEMP R0, R1;                    # typeless  
        MOV.U R0, 0x3F800000;           # R0 = 1.0  
        MOV.U R1, 0xBF800000;           # R1 = -1.0  
        MUL.F R0, R0, R1;               # R0 = -1 * 1 = -1 (0xBF800000)  
        XOR.U R0, R0, R1;               # R0 = 0xBF800000 ^ 0xBF800000 = 0  
        NOT.U R0, R0;                   # R0 = 0xFFFFFFFF  
        I2F.S R0, R0;                   # R0 = -1.0 (0xFFFFFFFF = -1 signed)  
        SEQ.F R0, R0, R1;               # R0 = 1.0 (-1.0 == -1.0)  
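      The C sketch below replays the sequence above on the host, assuming the
      same representations (IEEE single precision and two's complement).  The
      helper names are hypothetical; memcpy is used to reinterpret bits
      without changing them, which is the behavior a typeless TEMP exhibits.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static float bits_to_float(uint32_t u) { float f; memcpy(&f, &u, sizeof f); return f; }
static uint32_t float_to_bits(float f) { uint32_t u; memcpy(&u, &f, sizeof u); return u; }

/* Replays the instruction sequence above; returns the final R0 as a float. */
static float replay_example(void)
{
    uint32_t r0 = 0x3F800000u;                                 /* MOV.U: 1.0 */
    uint32_t r1 = 0xBF800000u;                                 /* MOV.U: -1.0 */
    r0 = float_to_bits(bits_to_float(r0) * bits_to_float(r1)); /* MUL.F: -1.0 */
    r0 ^= r1;                                                  /* XOR.U: 0 */
    r0 = ~r0;                                                  /* NOT.U: 0xFFFFFFFF */
    int32_t as_signed;
    memcpy(&as_signed, &r0, sizeof as_signed);                 /* bits as signed: -1 */
    r0 = float_to_bits((float)as_signed);                      /* I2F.S: -1.0 */
    return bits_to_float(r0) == bits_to_float(r1) ? 1.0f : 0.0f; /* SEQ.F */
}
```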
  
    (37) Buffer objects can be sourced as program parameters using the  
    NV_parameter_buffer_object extension.  How are they accessed in a program?  
  
      RESOLVED:  The instruction set and existing program environment and  
      local parameter bindings operate largely on four-component vectors.  
      However, NV_parameter_buffer_object exposes the ability to reach into  
      buffers consisting of user-generated data or data written to the buffer  
      object by the GPU.  Such data sets may not consist entirely of
      four-component floating-point vectors, so a four-component vector API
      may be unnatural.  An application might need to reformat its data set to  
      deal with this issue.  Or it might generate odd code to compensate for  
      mis-alignment -- for example, reading an array of 3-component vectors by  
      doing two four-component vector accesses and then rotating based on  
      alignment.  Neither approach is particularly satisfying.  
  
      Instead, this extension takes the approach of treating parameter buffers  
      as arrays of scalar words.  When an individual buffer element is read,
      the single word is replicated to produce a four-component vector.  To  
      access an array of 3-component vectors, code like the following can be  
      used:  
  
        PARAM buffer[] = { program.buffer[0] };  
        INT TEMP index;  
        TEMP R0;  
        ...  
        MUL.S index, index, 3;          # to read "vec3" #X, compute 3*X
        MOV R0.x, buffer[index.x+0];
        MOV R0.y, buffer[index.x+1];
        MOV R0.z, buffer[index.x+2];
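      The equivalent host-side indexing, sketched in C under the assumption
      that the vectors are tightly packed as x0 y0 z0 x1 y1 z1 and so on.
      The function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Read vec3 number x from a buffer of scalar words laid out as
 * x0 y0 z0 x1 y1 z1 ... */
static void read_vec3(const float *buffer, size_t x, float out[3])
{
    size_t base = 3 * x;        /* same as the MUL.S by 3 above */
    out[0] = buffer[base + 0];  /* first word of the vector */
    out[1] = buffer[base + 1];
    out[2] = buffer[base + 2];
}
```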
  
    (38) Should recursion be allowed?  If so, how is the total amount of  
    recursion limited?  
  
      RESOLVED:  Recursion is allowed, and a call stack is provided by the  
      implementation.  The size of the call stack is limited to the  
      implementation-dependent constant MAX_PROGRAM_CALL_DEPTH, and when the
      call stack is full, the results of further CAL instructions are
      undefined.  In the initial implementation of this extension, such
      instructions will have no effect.    
  
      Note that no stack is provided to hold local registers; a program may  
      implement its own via a temporary array and integer stack "pointer".  
  
    (39) Variables are all four-component vectors in previous extensions.  
    Should scalar or small-vector variables be provided?  
  
      RESOLVED:  It would be a useful feature, but it was left out for  
      simplicity.  In practice, a variable where only the X component is used  
      will be equivalent to a scalar.  
  
    (40) The PK* (pack) and UP* (unpack) instructions allow packing multiple  
    components of data into a single component.  The bit packing is  
    well-defined.  Should we require specific data types (e.g., unsigned  
    integer) to hold packed values?  
  
      RESOLVED:  No.  Previous instruction sets only allowed programs to write  
      packed values to a floating-point variable (the only data type  
      provided).  We will allow packed results to be written to a variable of  
      any data type.  Integer instructions can be used to manipulate bits of  
      packed data in place.  
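      To illustrate manipulating packed data with integer operations, the C
      sketch below packs four [0,1] values into bytes of one 32-bit word and
      then reads a single byte back with a shift and mask.  It assumes
      component 0 occupies the least significant byte and rounds to nearest;
      the authoritative bit layout and rounding for PK4UB are given in that
      instruction's own description.  Function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Pack four values in [0,1] into one 32-bit word, one byte each.
 * Assumes component 0 goes to the least significant byte. */
static uint32_t pack4ub(const float c[4])
{
    uint32_t word = 0;
    for (int i = 0; i < 4; i++) {
        float v = c[i] < 0.0f ? 0.0f : (c[i] > 1.0f ? 1.0f : c[i]); /* clamp */
        uint32_t byte = (uint32_t)(v * 255.0f + 0.5f);              /* round */
        word |= byte << (8 * i);
    }
    return word;
}

/* Extract byte i with integer shifts and masks, without unpacking all four. */
static uint32_t packed_byte(uint32_t word, int i)
{
    return (word >> (8 * i)) & 0xFFu;
}
```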
  
    (41) What happens when converting integers to floats or vice versa if  
    there is insufficient precision or range to represent the result?  
  
      RESOLVED:  For integer-to-float conversions, the nearest representable  
      floating-point value is used, and the least significant bits of the  
      original integer value are lost.  For float-to-integer conversions,  
      out-of-range values are clamped to the nearest representable integer.  
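      A host-side C sketch of both rules, for illustration only.  The integer
      2^24+1 has no exact single-precision representation and becomes 2^24.
      For float-to-integer, the clamp must be written explicitly, because C
      leaves out-of-range float-to-int conversions undefined.  The C cast
      truncates toward zero; the rounding used by the conversion instructions
      themselves is specified in their own descriptions.  NaN handling is not
      covered by this issue, so returning 0 for NaN below is an arbitrary
      assumption.  Function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Integer-to-float: rounds to a representable value (to nearest under the
 * default rounding mode), losing low-order bits of large integers. */
static float int_to_float(int32_t i) { return (float)i; }

/* Float-to-integer with out-of-range values clamped. */
static int32_t float_to_int_clamped(float f)
{
    if (f != f)                  /* NaN: arbitrary choice for this sketch */
        return 0;
    if (f >= 2147483648.0f)      /* 2^31 and above */
        return INT32_MAX;
    if (f <= -2147483648.0f)     /* -2^31 and below */
        return INT32_MIN;
    return (int32_t)f;           /* in range; C truncates toward zero */
}
```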
  
    (42) Why are some of the grammar rules so bizarre (e.g., attribUseD,  
    attribUseV, attribUseS, attribUseVNS)?  
  
      RESOLVED:  This grammar is based upon the original ARB_vertex_program  
      grammar, which has a number of "interesting" characteristics.  For  
      example, some of the bindings provided by ARB_vertex_program naturally  
      require some amount of lookahead.  For instance, a vertex program can
      write an output color using any of the following:  
  
        MOV result.color, 0;            # primary color  
        MOV result.color.primary, 0;    # primary color again  
        MOV result.color.secondary, 0;  # secondary color this time  
  
      The pieces of the color binding are separated by "." tokens.  However,  
      writemasks are also supported, which also use "." before the write  
      mask.  So, we could also have something like:  
  
        MOV result.color.xyz, 0;        # primary color with W masked off  
  
      In this form, a parser needs to look at both the "." and the "xyz" to  
      determine that the binding being used is "result.color" (and not  
      "result.color.secondary").  
  
      Additionally, some checks that should probably be semantic errors (e.g.,  
      allowing different swizzle or scalar operand selectors per instruction,  
      or disallowing both in the case of SWZ) were specified in the original
      grammar.  
  
      ARB_fragment_program and subsequent NVIDIA extensions built upon this
      grammar, and the grammar for this extension was rewritten in its current
      form so it could be validated more easily.
  
    (43) This is an NV extension (NV_gpu_program4).  Why does the
    MAX_PROGRAM_TEXEL_OFFSET_EXT token have an "EXT" suffix?
  
      RESOLVED:  This token is shared between this extension and the  
      comparable high-level GLSL programmability extension (EXT_gpu_shader4).  
      Rather than provide a duplicate set of token names, we simply use the  
      EXT version here.

Revision History

  
    Rev.    Date    Author    Changes  
    ----  --------  --------  --------------------------------------------  
     4    02/04/08  pbrown    Fix errors in texture wrap mode handling.  
                              Added a missing clamp to avoid sampling border  
                              in REPEAT mode.  Fixed incorrectly specified  
                              weights for LINEAR filtering.  
  
     3    02/09/07  pbrown    Updated status section (now released).  
  
     2    10/19/06  pbrown    Change the token suffix for maximum texel offset  
                              values from NV to EXT, since it is shared with  
                              EXT_gpu_shader4.  Clarify what happens on a  
                              negate of an unsigned value.  Fix typo in data  
                              type modifier description.  Add missing  
                              description of the "BUFFER4" declaration   
                              keyword.  
  
     1              pbrown    Internal spec development.  

Last update: November 14, 2006.