Microsoft.ML.DataView

Akin to FindIndexSorted, except that it stores the found index in the output index parameter and returns whether that index is valid and points to a value equal to the input parameter value.

Assumes the input is sorted and finds the value using BinarySearch. If the value is not found, returns the logical index of 'value' in the sorted list, i.e., the index of the first element greater than value. In case of duplicates, it returns the index of the first one. It guarantees that items before the returned index are < value, while those at and after the returned index are >= value.

A structure serving as the identifier of a row of . For datasets with millions of records, these IDs need to be unique, hence the need for such a large structure to hold the values. The IDs are derived from the IDs of previous components of the pipeline. Dividing the structure in two, into high-order and low-order bits, reduces the chances of collisions even further.

The low order bits. Corresponds to H1 in the Murmur algorithms.

The high order bits. Corresponds to H2 in the Murmur algorithms.

Initializes a new instance of . The low order ulong. The high order ulong.

An operation that treats the value as an unmixed Murmur3 128-bit hash state, and returns the hash state that would result if we hashed an additional 16 bytes that were all zeros, except for the last bit, which is one.

An operation that treats the value as an unmixed Murmur3 128-bit hash state, and returns the hash state that would result if we hashed an additional 16 bytes that were all zeros.
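The FindIndexSorted contract described at the start of this section is the classic lower-bound binary search. Items before the returned index are < value, items at and after it are >= value, and the first of several duplicates wins. The following Python sketch illustrates that contract only. It is not the ML.NET implementation, and the names `find_index_sorted` and `try_find_index_sorted` are hypothetical stand-ins for the two methods.

```python
from typing import Sequence, Tuple


def find_index_sorted(items: Sequence, value) -> int:
    """Lower bound: first index whose element is >= value (len(items) if none).

    Invariant: every element before lo is < value, every element at or after
    hi is >= value, so duplicates resolve to their first occurrence.
    """
    lo, hi = 0, len(items)
    while lo < hi:
        mid = (lo + hi) // 2
        if items[mid] < value:
            lo = mid + 1  # the answer lies strictly to the right of mid
        else:
            hi = mid      # mid itself may be the answer
    return lo


def try_find_index_sorted(items: Sequence, value) -> Tuple[bool, int]:
    """Like find_index_sorted, but also report whether items[index] == value."""
    index = find_index_sorted(items, value)
    found = index < len(items) and items[index] == value
    return found, index
```

When the value is absent, the loop still returns its insertion point. The try-variant then only has to check one element for equality.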
An operation that treats the value as an unmixed Murmur3 128-bit hash state, and returns the hash state that would result if we took , scrambled it using , then hashed the result of that.

This is the abstract base class for all types in the type system. Those that wish to extend the type system should derive from one of the more specific abstract classes or .

Constructor for extension types, which must be either or .

The raw for this . Note that this is the raw representation type and not the complete information content of the . Code should not assume that a uniquely identifies a . For example, most practical instances of ML.NET's KeyType and will have a of , but the two convey very different kinds of information in that number.

Returns if is equivalent to and otherwise. Another to be compared with .

The abstract base class for all non-primitive types. This class stands in contrast to . That class encapsulates cases where instances of the representation type can be freely copied without concerns about ownership, mutability, or disposal. This class, by contrast, is defined for those types where these factors become concerns. To take the most conspicuous example, is a structure type for which assignment is not sufficient to create an independent copy, because of the buffer-sharing mechanisms of its representation type.

The abstract base class for all primitive types. Values of these types can be freely copied without concern for ownership, mutation, or disposal.

The standard text type. This has representation type of with type parameter . Note that there is only one instance of this type, accessible by the singleton static property .

The singleton instance of this type.

The standard number type. This class is not directly instantiable. All allowed instances of this type are singletons and are accessible as static properties on this class.

The singleton instance of the with representation type of .
The singleton instance of the with representation type of .
The singleton instance of the with representation type of .
The singleton instance of the with representation type of .
The singleton instance of the with representation type of .
The singleton instance of the with representation type of .
The singleton instance of the with representation type of .
The singleton instance of the with representation type of .
The singleton instance of the with representation type of .
The singleton instance of the with representation type of .

The type. This has representation type of . Note that there is only one instance of this type, accessible by the singleton static property .

The singleton instance of this type.

The standard boolean type. This has representation type of . Note that there is only one instance of this type, accessible by the singleton static property .

The singleton instance of this type.

The standard date time type. This has representation type of . Note that there is only one instance of this type, accessible by the singleton static property .

The singleton instance of this type.

The standard date time offset type. This has representation type of . Note that there is only one instance of this type, accessible by the singleton static property .

The singleton instance of this type.

The standard timespan type. This has representation type of . Note that there is only one instance of this type, accessible by the singleton static property .

The singleton instance of this type.

should be used to decorate class properties and fields if instances of that class will be loaded as ML.NET . The function will be called to register a for a with its s. Whenever a value is typed to the registered and its s, that value's type (i.e., a ) in will be the associated .

A function implicitly invoked by ML.NET when processing a custom type. It binds a DataViewType to a custom type plus its attributes.

Returns if is equivalent to and otherwise. Another to be compared with .
Type representing categorical or enumerated values, most commonly used for the values of labels in multiclass classification models. The underlying .NET type is one of the unsigned integer types. The default is , but it can also be , , or .

Although keys are stored as numerical types, the information they carry is not inherently numeric, so arithmetic on them is typically not meaningful. Missing values are mapped to 0. The first non-missing value of the set is always 1. The other values range up to the value of . For example, if you have a key value with a of 3, then the value 0 corresponds to missing key values, the valid values are 1, 2, and 3, and no other values are used.

Initializes a new instance of the class. The underlying representation type. Should be one of , , (the most common choice), or . The cardinality of the underlying set. This must not exceed the associated maximum value of the representation type. For example, if is , then this must not exceed .

Initializes a new instance of the class. This differs from the hypothetically more general constructor by taking an for . This more naturally facilitates the most common case, where the key value is used as an enumeration over an array or list of some form. The underlying representation type. Should be one of , , (the most common choice), or . The cardinality of the underlying set. This must not exceed the associated maximum value of the representation type. For example, if is , then this must not exceed .

Returns true iff the given type is valid for a . The valid ones are , , , and , that is, the unsigned integer types.

is the cardinality of the . The typical legal values for data of this type are the missing value 0 and the non-missing values 1 through , inclusive. The non-missing values enumerate whatever set the key values are enumerated over.

Determines whether this object is equal to another instance. Checks whether the other item is of type , whether the is the same, and whether the is the same.
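The key convention above uses 0 for missing and 1 through the cardinality for valid values. A small Python sketch can illustrate it. The helpers `build_key_map` and `to_key` are hypothetical. Three further choices are assumptions of this sketch, not documented ML.NET behavior:

- keys are assigned in first-seen order;
- unseen values map to 0;
- the default maximum corresponds to a 32-bit unsigned representation type.

```python
from typing import Dict, Hashable, Iterable, Optional

UINT32_MAX = 2**32 - 1  # assumed representation type: 32-bit unsigned integer


def build_key_map(categories: Iterable[Hashable],
                  max_count: int = UINT32_MAX) -> Dict[Hashable, int]:
    """Assign keys 1..Count to distinct categories in first-seen order."""
    key_map: Dict[Hashable, int] = {}
    for category in categories:
        if category in key_map:
            continue
        if len(key_map) >= max_count:
            # The cardinality must not exceed the representation type's maximum.
            raise ValueError("cardinality exceeds the representation type's maximum")
        key_map[category] = len(key_map) + 1  # keys start at 1; 0 means missing
    return key_map


def to_key(key_map: Dict[Hashable, int], value: Optional[Hashable]) -> int:
    """Map a value to its key; missing (None) or unseen values map to 0."""
    if value is None:
        return 0
    return key_map.get(value, 0)
```

Take the categories cat, dog, cat, bird. The cardinality is 3: keys 1, 2, and 3 are valid and 0 stands for a missing value, matching the example in the text.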
The other object to compare against. if both objects are equal, otherwise .

Determines whether a instance is equal to another instance. Checks whether the other object is of type , whether the is the same, and whether the is the same. The other object to compare against. if both objects are equal, otherwise .

Retrieves the hash code. An integer representing the hash code.

The string representation of the . A formatted string.

A buffer that supports both dense and sparse representations. This is the representation type for all instances. The explicitly defined values of this vector are exposed through and, if not dense, . This structure is by itself immutable. To enable buffer editing, including reuse of the internal buffers, a mutable variant can be accessed through .

Throughout the code, we assume that a sparse is logically equivalent to a dense in which the implicit entries are filled in with the default value of the element type.

The element type of the vector. There are no compile-time restrictions on what this could be. However, this code and practically all code that uses assumes that assigning a value is sufficient to make a completely independent copy of it. This means, for example, that a buffer of buffers is not possible, but types like , , and are fine.

The internal reusable array of values.

The internal reusable array of indices.

The number of items explicitly represented. This equals when the representation is dense and is less than when sparse.

The logical length of the buffer. Note that if this vector is dense, then this will be the same as the as returned from , since all values are explicitly represented in a dense representation. If this is a sparse representation, then that will be somewhat shorter, as this field counts both explicit and implicit entries.

The explicitly represented values. When this is dense, the of the returned value will equal ; otherwise it will be less than .

The indices.
For a dense representation, this array is not used, and the default "empty" span is returned. For a sparse representation, it is parallel to the span returned from . It specifies the logical indices of all explicitly defined values, in increasing order, between 0 inclusive and exclusive. All values at unspecified indices should be treated as implicitly defined with the default value of . To give one example, suppose returns [3, 5] and () produces [98, 76]. This stands for a vector with non-zero values 98 and 76 at the 4th and 6th coordinates, respectively, and zeros at all other indices. (Zero, because that is the default value for all .NET numeric types.)

Gets a value indicating whether every logical element is explicitly represented in the buffer.

Constructs a dense representation. The array is often unspecified. If it is specified, it should be considered a buffer to be held on to for possible reuse. The logical length of the resulting instance. The values to be used. This must be at least as long as . If is 0, it is legal for this to be . The constructed buffer takes ownership of this array. The internal indices buffer. Because this constructor is for dense representations, this will not be immediately useful, but it does provide a buffer that can be reused to avoid allocation. This is mostly non-null when you want to produce a dense but happen to have an indices array "left over" that you don't want to needlessly lose. The resulting structure takes ownership of the passed-in arrays, so they should not be used for other purposes afterward.

Constructs a possibly sparse vector representation. The length of the constructed buffer. The count of explicit entries. This must be between 0 and , both inclusive. If it equals , the result is a dense vector; if it is less, the result is a sparse vector. The values to be used. This must be at least as long as . If is 0, it is legal for this to be .
The indices to be used. If we are constructing a dense representation, or if is 0, this can be . Otherwise, this must be at least as long as . The resulting structure takes ownership of the passed-in arrays, so they should not be used for other purposes afterward.

Copy from this buffer to the given destination, forcing a dense representation. The destination buffer. After the copy, this will have of .

Copy from this buffer to the given destination. The destination buffer. After the copy, this will have of .

Copy a range of values from this buffer to the given destination. The destination buffer. After the copy, this will have of . The minimum inclusive index to start copying from this vector. The logical number of values to copy from this vector into .

Copy from this buffer to the given destination span. This "densifies." The destination buffer. This must have at least .

Copy from this buffer to the given destination span, starting at the specified index. This "densifies." The destination buffer. This must be at least plus . The starting index of at which to start copying. The value to fill in for the implicit sparse entries. This is a potential exception to the general expectation of sparse that the implicit sparse entries have the default value of .

Copy from a section of a source array to the given destination.

Returns the joint list of all index/value pairs. If , all pairs will be returned, including the implicit values of a sparse representation, which take the default value. If left , only explicitly defined values are returned. The index/value pairs.

Returns an enumerable with items, representing the values.

Gets the item stored in this structure. In the case of a dense vector, this is a simple lookup. In the case of a sparse vector, it tries to find the entry with that index and sets to the stored value. If no such entry is found, it assigns the default value.
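The dense/sparse semantics of the buffer can be sketched in Python, including the bisection lookup used for sparse entries. `SparseBuffer` is a hypothetical, simplified stand-in. Unlike the real buffer, its values list holds exactly the explicit entries (no spare capacity), and the element type is fixed to float with default 0.0. The usage mirrors the [3, 5] / [98, 76] example given earlier, with an assumed logical length of 7.

```python
from bisect import bisect_left
from dataclasses import dataclass, field
from typing import List


@dataclass
class SparseBuffer:
    """Hypothetical, simplified dense-or-sparse float buffer (not ML.NET's VBuffer)."""
    length: int                                       # logical length
    values: List[float]                               # explicitly represented values
    indices: List[int] = field(default_factory=list)  # sorted; ignored when dense

    @property
    def is_dense(self) -> bool:
        # Dense when every logical element is explicitly represented.
        return len(self.values) == self.length

    def get_item_or_default(self, index: int) -> float:
        if not 0 <= index < self.length:
            raise IndexError(index)
        if self.is_dense:
            return self.values[index]                 # constant-time lookup
        # Sparse: bisection over the sorted indices, O(log count).
        pos = bisect_left(self.indices, index)
        if pos < len(self.indices) and self.indices[pos] == index:
            return self.values[pos]
        return 0.0                                    # implicit entry: default value

    def densify(self) -> List[float]:
        if self.is_dense:
            return list(self.values)
        dense = [0.0] * self.length                   # implicit entries get the default
        for i, v in zip(self.indices, self.values):
            dense[i] = v
        return dense
```

For example, `SparseBuffer(7, [98.0, 76.0], [3, 5]).densify()` yields zeros everywhere except 98.0 at index 3 and 76.0 at index 5.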
In the case where is , this takes constant time since it is a direct lookup. For sparse vectors, however, it must perform a bisection search on the indices to find the appropriate value. That search takes logarithmic time with respect to the number of explicitly represented items, that is, the of the return value of . For that reason, this method may be more efficient for a single, completely isolated lookup, since constructing as does is not a free operation. However, for a more involved computation with many operations, it may be faster to use and, if appropriate, directly. The index, which must be a non-negative number less than . The value stored at that index, or, if this is a sparse vector where this is an implicit entry, the default value for .

A variant of that returns the value instead of passing it back using a reference parameter. The index, which must be a non-negative number less than . The value stored at that index, or, if this is a sparse vector where this is an implicit entry, the default value for .

Returns an enumerator that iterates through the values in VBuffer.

A helper method that gives us an iterable over the items given the fields from a . This lives in a separate utility class, rather than in its more natural location of itself, due to a bug in the C++/CLI compiler (DevDiv 1097919: [C++/CLI] Nested generic types are not correctly imported from metadata). So, if we want to use in C++/CLI projects, we cannot have a generic struct with a nested class that has the outer struct type as a field.

Various methods for creating instances.

Creates a with the same shape (length and density) as the . The destination buffer. Note that the resulting is assumed to take ownership of this passed-in object. Whatever was passed in as this parameter should therefore not be used again, since its underlying buffers may be reused.

Creates a using 's values and indices buffers. The destination buffer.
Note that the resulting is assumed to take ownership of this passed-in object. Whatever was passed in as this parameter should therefore not be used again, since its underlying buffers may be reused. The logical length of the new buffer being edited. The optional number of physical values to be represented in the buffer. The buffer will be dense if is omitted. The optional maximum number of physical values to represent in the buffer. The buffer won't grow beyond this maximum size. True means that the old buffer values and indices are preserved, if possible (Array.Resize is called). False means that a new array will be allocated, if necessary. True means to ensure the Indices buffer is available, even if the buffer will be dense.

An object capable of editing a by filling out (and if the buffer is not dense). The structure by itself is immutable. However, the purpose of is to enable buffer reuse: buffers can be edited through this structure, as created through or .

The mutable span of values.

The mutable span of indices.

Gets a value indicating whether a new array was allocated.

Commits the edits and creates a new using the current and . Note that this structure and its properties should not be used once this is called. The newly created .

Commits the edits and creates a new using the current Values and Indices, while allowing the length of and, if sparse, to be truncated. Like , this structure and its properties should not be used once this is called. The new number of physical values to be represented in the created buffer. The newly created . This method allows the number of explicitly defined values to be changed. This is useful in sparse situations where the was created with a larger physical value count than needed, because the final value count was not known at creation time.

The standard vector type. The representation type of this is , where the type parameter is in . The dimensions.
This will always have at least one item. All values will be non-negative. As with , a zero value indicates that the vector type is considered to have unknown length along that dimension. For a multi-dimensional type, where has length greater than one, we must clarify what the indices mean, since is itself a single-dimensional structure. The indices represent a "flattened" view of the coordinates implicit in the dimensions. We consider the last dimension to be the most "minor" index. In the case where has length 2, this is commonly referred to as row-major order. So, if you hypothetically had dimensions of { 5, 2 }, then the values would all be of length 10. The flattened indices 0, 1, 2, 3, 4, ... would correspond to "coordinates" of (0, 0), (0, 1), (1, 0), (1, 1), (2, 0), ..., respectively.

Constructs a new single-dimensional vector type. The type of the items contained in the vector. The size of the single dimension.

Constructs a potentially multi-dimensional vector type. The type of the items contained in the vector. The dimensions. Note that, like , must be non-empty, with all non-negative values. Also, because is the product of , the result of multiplying all these values together must not overflow .

Whether this is a vector type with known size. Equivalent to > 0.

The type of the items stored as values in vectors of this type.

The size of the vector. A value of zero means it is a vector whose size is unknown. A vector whose size is known should correspond to values that always have the same . A vector whose size is unknown may have values whose varies from record to record. Note that this is always the product of the elements in .
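The row-major flattening described above can be checked with a short Python sketch. The functions `vector_size`, `flat_index`, and `coords_of` are hypothetical helpers, not ML.NET APIs. When converting between flat indices and coordinates, they assume every dimension is known (non-zero).

```python
from math import prod
from typing import Sequence, Tuple


def vector_size(dims: Sequence[int]) -> int:
    """Product of the dimensions; any 0 (unknown) dimension makes the size 0."""
    return prod(dims)


def flat_index(dims: Sequence[int], coords: Sequence[int]) -> int:
    """Row-major flattening: the last dimension is the most minor (fastest varying)."""
    if len(dims) != len(coords):
        raise ValueError("coords must have one entry per dimension")
    index = 0
    for size, c in zip(dims, coords):
        if not 0 <= c < size:
            raise IndexError(tuple(coords))
        index = index * size + c
    return index


def coords_of(dims: Sequence[int], index: int) -> Tuple[int, ...]:
    """Inverse of flat_index for a vector with fully known dimensions."""
    if not 0 <= index < vector_size(dims):
        raise IndexError(index)
    coords = []
    for size in reversed(dims):   # peel off the most minor dimension first
        index, c = divmod(index, size)
        coords.append(c)
    return tuple(reversed(coords))
```

For dimensions { 5, 2 }, `coords_of` reproduces the sequence (0, 0), (0, 1), (1, 0), (1, 1), (2, 0) from the text. A zero dimension makes `vector_size` return 0, consistent with a size of zero meaning unknown.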
Represents the schema of an or an . The schema is a collection of .

Number of columns in the schema.

Gets the column by name. Throws an exception if no such column exists. Note that if multiple columns exist with the same name, the one with the largest index is returned. The other columns are considered 'hidden' and are accessible only by their index.

Gets the column by index.

Gets the column by name, or null if the column is not present.

This class describes one column in the particular schema. The name of the column. The column's index in the schema. Whether this column is hidden (accessible only by index). The type of the column. The annotations of the column.

This class represents the schema of one column of a data view, without an attachment to a particular . The name of the column. The type of the column. The annotations associated with the column. Creates an instance of a . Creates an instance of from an existing schema's column.

The schema annotations of one . Annotation getter delegates. Useful for constructing annotations out of other annotations. The schema of the annotations row. It is different from the schema that the column belongs to. Creates an annotations row by supplying the schema columns and the getter delegates for all the values. Note: The array is owned by this instance. Gets a getter delegate for one value of the annotations row. Gets the value of an annotation, by annotation kind (aka column name).

Class containing operations to build an . Add some columns from into our new annotations, by applying to all the names. The annotations row to take values from. The predicate describing which annotation columns to keep.

Add one annotation column, strongly-typed version. The type of the value. The annotation name. The annotation type. The getter delegate. Annotations of the input column. Note that annotations on an annotation column are somewhat rare except for certain types (for example, slot names for a vector, key values for something of key type).
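The name-lookup rule for schemas can be sketched as follows: the column with the largest index wins, and earlier columns with the same name become hidden. `Column`, `build_columns`, `get_column`, and `get_column_or_none` are hypothetical Python names chosen for illustration. They are not the ML.NET schema API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Column:
    name: str
    index: int
    is_hidden: bool  # True if a later column reuses this name


def build_columns(names: List[str]) -> List[Column]:
    last_index = {name: i for i, name in enumerate(names)}  # later index overwrites
    return [Column(name, i, last_index[name] != i) for i, name in enumerate(names)]


def get_column_or_none(columns: List[Column], name: str) -> Optional[Column]:
    # Scan from the end so the column with the largest index wins.
    for column in reversed(columns):
        if column.name == name:
            return column
    return None


def get_column(columns: List[Column], name: str) -> Column:
    column = get_column_or_none(columns, name)
    if column is None:
        raise KeyError(name)  # by-name lookup throws when no such column exists
    return column
```

A hidden column is still reachable through its position in the list, mirroring "accessible only by index."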
Add one annotation column, weakly-typed version. The annotation name. The annotation type. The getter delegate that provides the value. Note that the type of the getter is still checked inside this method. Annotations of the input column. Note that annotations on an annotation column are somewhat rare except for certain types (for example, slot names for a vector, key values for something of key type).

Add one annotation column for a primitive value type. The annotation name. The annotation type. The value of the annotation. Annotations of the input column. Note that annotations on an annotation column are somewhat rare except for certain types (for example, slot names for a vector, key values for something of key type).

Returns a row that contains the current contents of this .

Class containing operations to build a . Creates a new instance of . Add one column to the schema being built. The column name. The column type. The column annotations. Add multiple existing columns to the schema being built. Columns to add. Returns a that contains the current contents of this .

This constructor should only be called by . The input columns. The constructed instance takes ownership of the array.

The input and output of Query Operators (Transforms). This is the fundamental data pipeline type, comparable to for LINQ.

Whether this IDataView supports shuffling of rows, to any degree.

Returns the number of rows if known. Returning null means that the row count is unknown, but a later call might return a non-null value. In other words, the transform does not yet know the number of rows but may know it in the future. Implementations should run in O(1) time. Most implementations will return the same answer every time. Some, like a cache, might return null until the cache is fully populated.

Gets a row cursor. The indicate the active columns that are needed to iterate over.
If set to an empty , no column is requested. The schema of the returned cursor will be the same as the schema of the IDataView, but requesting a getter for an inactive column will throw. The active columns needed. If passed an empty , no column is requested. An instance of to seed randomizing the access for a shuffled cursor.

This constructs a set of parallel batch cursors. The value is a recommended limit on cardinality. If is non-positive, the caller has no recommendation, and the implementation should fall back to some default behavior. Note that this is strictly a recommendation: an implementation may return a different number of cursors. The cursors should return the same data as returned through , except partitioned. No two cursors should return the "same" row as would have been returned through the regular serial cursor, and every row should be returned by exactly one of the cursors returned from this method. The cursors can have their values reconciled downstream through the use of the property. In the typical usage pattern, a set of cursors is requested and each is handed to one of a set of worker threads that consume from them independently. The results are ultimately collated by exploiting the ordering of the property described above. Most scenarios, however, will be content to pull from the single serial cursor of . The active columns needed. If passed an empty , no column is requested. The suggested degree of parallelism. An instance of to seed randomizing the access.

Gets an instance of Schema.

Delegate type to get a value. This can be used for efficient access to data in a or .

A logical row of data. May be a row of an or a stand-alone row.

This is incremented when the underlying contents change, giving clients a way to detect change. It should be -1 when the object is in a state where values cannot be fetched.
In particular, for an , this will be before is ever called for the first time, or after the first time is called and returns . Note that this is not the position within the underlying data, but the position of this cursor only. For example, if one opened a set of parallel streaming cursors, or a shuffled cursor, each such cursor's first valid entry would always have position 0.

This provides a means for reconciling multiple rows that have been produced generally from . When getting a set, parallel processing should be allowed to proceed while the original order remains recoverable. Whether a user cares about that original order in a specific application is another matter. As a practical matter, most callers of this do not; otherwise they would not call it. At least in principle, though, it should be possible to reconstruct the original order one would get from an identically configured . So: for any cursor implementation, batch numbers should be non-decreasing. Furthermore, any given batch number should appear in only one of the cursors as returned by . In this way, order is determined by batch number. An operation that reconciles these cursors into a consistent single cursoring could do so by always drawing from the cursor, among all cursors in the set, that has the smallest batch number available.

Note that the batches for a particular entry are not expected to be consistent from cursoring to cursoring. The only consistency required is that they result in the same overall ordering. The same entry could have different batch numbers from one cursoring to another. There is also no requirement that any given batch number appear at all. It is merely a mechanism for recovering ordering from a possibly arbitrary partitioning of the data. It also follows, of course, that treating the batch as a property of the data is completely invalid.

A getter for a 128-bit ID value.
It is common for objects to serve multiple instances to iterate over what is supposed to be the same data. For example, in a , a cursor set will produce the same data as a serial cursor, just partitioned. A shuffled cursor will produce the same data as a serial cursor or any other shuffled cursor, only shuffled. The ID exists for applications that need to reconcile which entry is actually which. Ideally this ID should be unique, but for practical reasons, it suffices if collisions are simply extremely improbable.

Note that this ID is not considered part of the data per se, although it must be consistent across multiple streams according to the semantics above. To take the example of a data view specifically, a single data view must render consistent IDs across all cursorings. However, if the "same" data were presented in a different data view (say, by being transformed, cached, or saved), there is no suggestion at all that the IDs in the two data views would have any discernible relationship.

Returns whether the given column is active in this row.

Returns a value getter delegate to fetch the value of the given from the row. This throws if the column is not active in this row, or if the type differs from this column's type. is the column's content type. is the output column whose getter should be returned.

Gets a , which provides name and type information for variables (i.e., columns in ML.NET's type system) stored in this row.

Implementation of dispose. Calls with .

The disposable method for the disposable pattern. This default implementation does nothing. Whether this was called from . Subclasses that implement should call this method with , but I hasten to add that implementing finalizers should be avoided if at all possible.

Class used to cursor through rows of an . Note that this is also an . The is incremented by . Prior to the first call to , or after returns , is -1. Otherwise, when returns , >= 0.

Advance to the next row.
When the cursor is first created, this method should be called to move to the first row. Returns if there are no more rows.

The debugger proxy for .
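Two parts of the cursor contract in this section can be sketched together in Python. First, Position is -1 before the first MoveNext call and again after it returns false. Second, batch numbers never decrease within a cursor, and each batch number belongs to only one cursor in a set. `ListCursor`, `drain`, and `reconcile` are hypothetical names, and the (batch, payload) row shape is an assumption. `reconcile` implements the strategy suggested in the text, always drawing from the cursor with the smallest available batch number, using the standard library's `heapq.merge`.

```python
import heapq
from typing import Iterator, List, Tuple

Row = Tuple[int, str]  # (batch number, payload): a hypothetical row shape


class ListCursor:
    """Hypothetical cursor over an in-memory list of rows.

    position is -1 before the first move_next() call and again after
    move_next() has returned False; otherwise it counts rows seen by this
    cursor only, starting at 0.
    """

    def __init__(self, rows: List[Row]) -> None:
        self._rows = rows
        self.position = -1
        self._done = False

    def move_next(self) -> bool:
        if not self._done and self.position + 1 < len(self._rows):
            self.position += 1
            return True
        self.position = -1  # exhausted: values can no longer be fetched
        self._done = True
        return False

    def _current(self) -> Row:
        if self.position < 0:
            raise RuntimeError("no current row; call move_next() first")
        return self._rows[self.position]

    @property
    def batch(self) -> int:
        return self._current()[0]

    @property
    def payload(self) -> str:
        return self._current()[1]


def drain(cursor: ListCursor) -> Iterator[Row]:
    while cursor.move_next():
        yield cursor.batch, cursor.payload


def reconcile(cursors: List[ListCursor]) -> Iterator[Row]:
    """Recover serial order by always taking the smallest available batch."""
    return heapq.merge(*(drain(c) for c in cursors), key=lambda row: row[0])
```

Merging on the batch number alone is enough to restore the serial order. Each batch number lives in a single cursor and batches never decrease within a cursor. Rows inside one batch therefore keep the order in which their cursor produced them.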