When I import a .csv file with read.table, using the call
df <- read.table("ModelSugar(new) real_thesis_experiment-table_1.csv", skip = 6, sep = ",", head = TRUE)
and check the summary of the data, I get (only the first 3 of 45 columns are shown):
 X.run.number.           scenario         configuration
 Min.   :   1   "pessimistic":999994   "central":999994
 1st Qu.: 650
 Median :1299
 Mean   :1299
 3rd Qu.:1949
 Max.   :2600
With this dataframe I can make nice graphics. However, I have 80 .csv files with a total size of 40 GB, so I want to import only specific columns.
I figured this would be easier with fread (from the data.table package). So I imported 5 columns from each file and combined them with rbind into one dataframe with the call
my.files <- list.files(pattern=".csv")
my.data <- lapply(my.files,fread, header = FALSE, select = c(1,2,3,25,29), sep=",")
df <- do.call("rbind", my.data)
The summary of that dataframe looks like this (4 of 5 columns shown):
 [run number]        scenario            configuration       [step]
 Length:999994       Length:999994       Length:999994       Length:999994
 Class :character    Class :character    Class :character    Class :character
 Mode  :character    Mode  :character    Mode  :character    Mode  :character
With this dataframe I cannot make the graphics that I could with read.table. I guess that this has to do with the class of the columns' values.
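To illustrate the guess about column classes, here is a minimal sketch on made-up toy data (the column names and values are hypothetical, not taken from the real files). It shows that summary() only reports Length/Class/Mode for character columns, and that converting the columns brings back the kind of summary seen with read.table:

```r
# Hypothetical toy data; names and values are made up for illustration.
as_read <- data.frame(run = c("1", "2", "3"),
                      scenario = c("pessimistic", "pessimistic", "central"),
                      stringsAsFactors = FALSE)

# Every column is character, so summary() only reports Length/Class/Mode.
print(sapply(as_read, class))
print(summary(as_read))

# Convert to an integer column and a factor column, matching the kind of
# columns seen in the read.table summary.
typed <- as_read
typed$run <- as.integer(typed$run)
typed$scenario <- factor(typed$scenario)

# Now summary() reports quantiles for run and level counts for scenario.
print(sapply(typed, class))
print(summary(typed))
```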
How can I make sure that the dataframe created with fread has the same characteristics as the one with read.table, so that I can make the graphics I want?
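One thing that might be worth trying is to give fread the same arguments as the read.table call (skip = 6 and header = TRUE), since the fread call above uses header = FALSE and no skip. The sketch below is only an assumption about the cause, not a confirmed fix. It writes two small hypothetical demo files (six made-up preamble lines, simplified column names, four columns instead of 45) so that it can run on its own; skip, header, sep, select and stringsAsFactors are existing fread arguments, and rbindlist is the data.table counterpart of do.call("rbind", ...):

```r
library(data.table)

# Hypothetical demo files standing in for the real experiment tables:
# six preamble lines, one header row, then the data rows.
demo_dir <- tempfile("fread_demo_")
dir.create(demo_dir)
for (i in 1:2) {
  writeLines(c(
    paste("metadata line", 1:6),
    "run,scenario,configuration,step",
    sprintf("%d,pessimistic,central,%d", i, 0L),
    sprintf("%d,pessimistic,central,%d", i, 1L)
  ), file.path(demo_dir, sprintf("table_%d.csv", i)))
}

# Match only names ending in ".csv" (the pattern is a regular expression).
my.files <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)

# Same options as the read.table call, plus a column selection by position.
# stringsAsFactors = TRUE turns text columns into factors.
my.data <- lapply(my.files, fread, skip = 6, header = TRUE, sep = ",",
                  select = c(1, 2, 4), stringsAsFactors = TRUE)
df <- rbindlist(my.data)

print(sapply(df, class))
print(summary(df))
```

With the real files, the select positions would go back to c(1,2,3,25,29) and list.files would point at the working directory.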
EDIT
I found out that it does work when I first split the .csv into columns in Excel and then use the fread call with sep = ";" instead of sep = ",". Strange... And I don't want to split all the .csv files into columns in Excel manually.