Read a CSV or Parquet file, as exported from the GESLA Shiny app interface (geslaR-app). A "GESLA dataset file" is a subset of the GESLA dataset, fetched from the geslaR-app. When using that app, you can choose to download the selected subset in CSV or Parquet format. Whichever option is chosen, this function will automatically identify the file type and use the appropriate functions to import the dataset into R.
This function can be used for files exported from the online
interface (hosted on this server) or from a local interface, as when
using the run_gesla_app() function.
Arguments
- file: The file name (must end in .csv or .parquet only).
- as_data_frame: If FALSE (default), the data will be imported in the
  Arrow Table format. Otherwise, the data will be in tbl_df
  (data.frame) format. See Details.
- ...: Other arguments passed to arrow::read_csv_arrow() or
  arrow::read_parquet(), from the arrow package.
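For instance, column selection arguments supported by arrow::read_parquet() and arrow::read_csv_arrow() can be passed through the ... argument. This is a minimal sketch, assuming the arrow package is installed; the file name and column names below are hypothetical and only for illustration:

```r
## Hypothetical example: import only two columns from a subset
## exported by the geslaR-app. "date_time" and "sea_level" are
## placeholder column names, not guaranteed to match your file.
da <- read_gesla("subset.parquet",
    col_select = c("date_time", "sea_level"))
```

The col_select argument is forwarded to the underlying arrow reader, so only the requested columns are read from disk.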
Details
We highly recommend exporting subsets of the GESLA dataset from the geslaR-app in the Parquet file format. This format has a much smaller file size when compared to the CSV format.
In any case, the only difference between CSV and Parquet files will
be the file size. However, when importing these data into R, both
file types can be imported in the Arrow Table format, which is the
default (argument as_data_frame = FALSE). This way, the object
created in R will be very small, regardless of the file size. To
work with this type of object, you can use dplyr verbs, in the same
way as with a normal data.frame (or tbl_df). Some examples can be
found in the Arrow documentation.
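As a sketch of that workflow (assuming the dplyr package is installed, and using a hypothetical column name, sea_level), an Arrow Table returned by read_gesla() can be manipulated lazily with dplyr verbs and pulled into memory with collect():

```r
library(dplyr)

## 'da' is an Arrow Table as returned by read_gesla() with
## as_data_frame = FALSE; "sea_level" is a hypothetical column
## name used only for illustration.
da |>
    filter(sea_level > 2) |>
    summarise(mean_level = mean(sea_level, na.rm = TRUE)) |>
    collect()  ## collect() brings the (small) result into R memory
```

Only the final, aggregated result is materialised in memory; the intermediate steps are evaluated by the arrow backend.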
If the as_data_frame argument is set to TRUE, the imported R object
will vary in size according to the size of the dataset, regardless
of the file type. In many situations, this can be infeasible, since
the object can have a "larger-than-memory" size, which may make R
operations slow or even crash the session. Therefore, we always
recommend starting with as_data_frame = FALSE, and working with the
dataset from there.
See Examples below.
Author
Fernando Mayer fernando.mayer@mu.ie
Examples
##------------------------------------------------------------------
## Import an internal example Parquet file
tmp <- tempdir()
file.copy(system.file(
"extdata", "ireland.parquet", package = "geslaR"), tmp)
#> [1] TRUE
da <- read_gesla(paste0(tmp, "/ireland.parquet"))
## Check size in memory
object.size(da)
#> 488 bytes
##------------------------------------------------------------------
## Import an internal example CSV file
tmp <- tempdir()
file.copy(system.file(
"extdata", "ireland.csv", package = "geslaR"), tmp)
#> [1] TRUE
da <- read_gesla(paste0(tmp, "/ireland.csv"))
## Check size in memory
object.size(da)
#> 488 bytes
##------------------------------------------------------------------
## Import an internal example Parquet file as data.frame
tmp <- tempdir()
file.copy(system.file(
"extdata", "ireland.parquet", package = "geslaR"), tmp)
#> [1] FALSE
da <- read_gesla(paste0(tmp, "/ireland.parquet"),
as_data_frame = TRUE)
## Check size in memory
object.size(da)
#> 11112 bytes
##------------------------------------------------------------------
## Import an internal example CSV file as data.frame
tmp <- tempdir()
file.copy(system.file(
"extdata", "ireland.csv", package = "geslaR"), tmp)
#> [1] FALSE
da <- read_gesla(paste0(tmp, "/ireland.csv"),
as_data_frame = TRUE)
## Check size in memory
object.size(da)
#> 11104 bytes
## Remove files from temporary directory
unlink(paste0(tmp, "/ireland.parquet"))
unlink(paste0(tmp, "/ireland.csv"))