RxXdfData function (RevoScaleR)

Heidi Steen | Last Updated: 12/15/2016

This is the main generator for S4 class RxXdfData, which extends RxDataSource.

Usage

RxXdfData(file, varsToKeep = NULL, varsToDrop = NULL, returnDataFrame = TRUE,
        stringsAsFactors = FALSE, blocksPerRead = rxGetOption("blocksPerRead"),
        fileSystem = NULL, createCompositeSet = NULL,
        blocksPerCompositeFile = 3)

## S3 method for class 'RxXdfData'
head(x, n = 6L, reportProgress = 0L, ...)

## S3 method for class 'RxXdfData'
summary(object, ...)

## S3 method for class 'RxXdfData'
tail(x, n = 6L, addrownums = TRUE, reportProgress = 0L, ...)

Arguments

The following table describes, in order, the arguments to RxXdfData and to its head, summary, and tail methods; default values are shown in the Usage section above.

- file: Character string specifying the .xdf file.
- varsToKeep: Character vector of variable names to keep around during operations. If NULL, the argument is ignored. Cannot be used with varsToDrop.
- varsToDrop: Character vector of variable names to drop from operations. If NULL, the argument is ignored. Cannot be used with varsToKeep.
- returnDataFrame: Logical indicating whether or not to convert the result to a data frame when reading with rxReadNext. If FALSE, a list is returned when reading with rxReadNext.
- stringsAsFactors: Logical indicating whether or not to convert strings into factors in R (for reader mode only).
- blocksPerRead: Number of blocks to read for each chunk of data read from the data source.
- fileSystem: Character string or RxFileSystem object indicating the type of file system; "native" or an RxNativeFileSystem object can be used for the local operating system, or an RxHdfsFileSystem object for the Hadoop file system. If NULL, the file system is set to that of the current compute context, if available; otherwise the fileSystem option is used.
- createCompositeSet: Logical value or NULL. Used only when writing. If TRUE, a composite set of files is created instead of a single .xdf file: a directory is created whose name is the same as the .xdf file that would otherwise be created, but with no extension, and 'data' and 'metadata' subdirectories are created inside it. In the 'data' subdirectory, the data is split across a set of .xdfd files (see blocksPerCompositeFile below for how many blocks of data go into each file). The 'metadata' subdirectory contains a single .xdfm file, which holds the metadata for all of the .xdfd files in the 'data' subdirectory. When the compute context is RxHadoopMR, a composite set of files is always created. See the sketch following this table.
- blocksPerCompositeFile: Integer value. If createCompositeSet=TRUE and the compute context is not RxHadoopMR, this is the number of blocks put into each .xdfd file in the composite set. When importing is done on Hadoop using MapReduce, the number of rows per .xdfd file is determined by the rows assigned to each MapReduce task, and the number of blocks per .xdfd file is therefore determined by rowsPerRead.
- x: An RxXdfData object.
- object: An RxXdfData object.
- n: Positive integer. Number of rows of the data set to extract.
- addrownums: Logical. If TRUE, row numbers are created to match the original data set.
- reportProgress: Integer value with options:
  - 0: no progress is reported.
  - 1: the number of processed rows is printed and updated.
  - 2: rows processed and timings are reported.
  - 3: rows processed and all timings are reported.
- ...: Arguments to be passed to underlying functions.
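
The createCompositeSet and blocksPerCompositeFile arguments take effect only when the data source is written to, for example through rxImport. The following is a minimal illustrative sketch, not part of the original page: it assumes the AirlineDemoSmall.csv file in the sample data directory and a writable output path named "airlineComposite", both of which are illustrative choices.

# Text input from the sample data directory (illustrative file name)
inFile <- file.path(rxGetOption("sampleDataDir"), "AirlineDemoSmall.csv")
inData <- RxTextData(inFile)

# Request a composite set: a directory with 'data' and 'metadata'
# subdirectories rather than a single .xdf file
outData <- RxXdfData("airlineComposite",
                     createCompositeSet = TRUE,
                     blocksPerCompositeFile = 3)

# Writing through rxImport produces the composite set on disk
rxImport(inData = inData, outFile = outData, overwrite = TRUE)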

Return Value

Object of class RxXdfData.

Examples

# Create an RxXdfData data source pointing to the claims sample data
myDataSource <- RxXdfData(file.path(rxGetOption("sampleDataDir"), "claims"))

# Both of these should return TRUE
is(myDataSource, "RxXdfData")
is(myDataSource, "RxDataSource")

# List the variable names available in the data source
names(myDataSource)

# Build a model formula from the data source, with "cost" as the dependent
# variable and the "RowNum" variable dropped
modelFormula <- formula(myDataSource, depVars = "cost", varsToDrop = "RowNum")
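
The head, summary, and tail methods documented above can also be applied directly to the data source. As a brief illustrative sketch, continuing with the myDataSource object created above (output depends on the local sample data):

# Peek at the first and last rows without reading the full file
head(myDataSource, n = 5)
tail(myDataSource, n = 5, addrownums = TRUE)

# Summary of the variables in the data source
summary(myDataSource)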

See Also

ScaleR Functions (RevoScaleR package)

Comparison of rx Functions and CRAN R Functions
