Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
130 views
in Technique[技术] by (71.8m points)

r - Importing fread vs read.table and errors

When I import a .csv file with read.table, with the call df <- read.table("ModelSugar(new) real_thesis_experiment-table_1.csv", skip = 6, sep = ",", head = TRUE) and I check the summary of the data I get (only first 3 columns of 45 are shown):

 X.run.number. scenario        configuration   
 Min.   :   1 "pessimistic":999994   "central":999994  
 1st Qu.: 650                                            
 Median :1299                                            
 Mean   :1299                                            
 3rd Qu.:1949                                            
 Max.   :2600  

With this dataframe I can make nice graphics. However, I have 80 .csv files with a total size of 40 GB, so I want to import only specific columns.

I figured this would be easier with fread (from the data.table package). So I imported 5 columns and rbind them together into one dataframe with the call

my.files <- list.files(pattern=".csv")
my.data <- lapply(my.files,fread, header = FALSE, select = c(1,2,3,25,29), sep=",") 
df <- do.call("rbind", my.data)

The summary of that dataframe looks like(4 of 5 columns shown:

[run number]         scenario         configuration         [step]         
 Length:999994      Length:999994      Length:999994      Length:999994     
 Class :character   Class :character   Class :character   Class :character  
 Mode  :character   Mode  :character   Mode  :character   Mode  :character 

With this dataframe I cannot make the graphics that I could with read.table. I guess that this has to do with the class of the columns' values.

How can I make sure that the dataframe created with fread has the same characteristics as the one with read.table, so that I can make the graphics I want?

EDIT

I found out that when I first split the .csv in excel into columns and then use the fread call with sep = ";" instead of sep = ",", that it does work. Strange... And I don't want to convert the .csv files into columns in excel manually.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

First 5 columns (of 45) of dfshort look like this:

   X.run.number.      scenario configuration biobased.chemical.industry
1              3 "pessimistic"     "central"    "modification-dominant"
2              2 "pessimistic"     "central"    "modification-dominant"
3              3 "pessimistic"     "central"    "modification-dominant"
4              4 "pessimistic"     "central"    "modification-dominant"
5              2 "pessimistic"     "central"    "modification-dominant"
6              1 "pessimistic"     "central"    "modification-dominant"
7              3 "pessimistic"     "central"    "modification-dominant"
8              3 "pessimistic"     "central"    "modification-dominant"
9              2 "pessimistic"     "central"    "modification-dominant"
10             4 "pessimistic"     "central"    "modification-dominant"
   distributed.sugar.factory.investment.costs
1                                    70000000
2                                    70000000
3                                    70000000
4                                    70000000
5                                    70000000
6                                    70000000
7                                    70000000

Template looks like this:

 run_number      scenario configuration tick financial_balance_SU
1          3 "pessimistic"     "central"    0                    0
2          2 "pessimistic"     "central"    0                    0
3          3 "pessimistic"     "central"    1                    0
4          4 "pessimistic"     "central"    0                    0
5          2 "pessimistic"     "central"    1                    0
6          1 "pessimistic"     "central"    0                    0

df looks like this:

   run_number        scenario configuration tick financial_balance_SU
1:      23377 ""pessimistic""     ""mixed""  200  6.079728695488823E9
2:      23377 ""pessimistic""     ""mixed""  201  6.079728695488823E9
3:      23378 ""pessimistic""     ""mixed""  192   9.10006561818864E9
4:      23377 ""pessimistic""     ""mixed""  202  6.079728695488823E9
5:      23377 ""pessimistic""     ""mixed""  203  6.079728695488823E9
6:      23378 ""pessimistic""     ""mixed""  193   9.10006561818864E9

EDIT

str(dfshort)

'data.frame':   10 obs. of  45 variables:
 $ X.run.number.                                        : int  3 2 3 4 2 1 3 3 2 4
 $ scenario                                             : Factor w/ 1 level ""pessimistic"": 1 1 1 1 1 1 1 1 1 1
 $ configuration                                        : Factor w/ 1 level ""central"": 1 1 1 1 1 1 1 1 1 1
 $ biobased.chemical.industry                           : Factor w/ 1 level ""modification-dominant"": 1 1 1 1 1 1 1 1 1 1
 $ distributed.sugar.factory.investment.costs           : int  70000000 70000000 70000000 70000000 70000000 70000000 70000000 70000000 70000000 70000000
 $ beet.syrups.factory.investment.costs                 : int  1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000
 $ ethanol.factory.investment.costs                     : int  1500000 1500000 1500000 1500000 1500000 1500000 1500000 1500000 1500000 1500000
 $ market.share.beet.syrups.increase                    : num  0.002 0.002 0.002 0.002 0.002 0.002 0.002 0.002 0.002 0.002
 $ demand.beets.for.chemical.EU.increase                : num  0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01
 $ transport.costs                                      : int  1 1 1 1 1 1 1 1 1 1
 $ washing.at.farmer                                    : Factor w/ 1 level ""no"": 1 1 1 1 1 1 1 1 1 1
 $ beet.syrups.price.percentage.of.sugar.price          : num  0.3 0.3 0.3 0.3 0.3 0.3 0.3 0.3 0.3 0.3
 $ CO2.tax.                                             : Factor w/ 1 level ""yes"": 1 1 1 1 1 1 1 1 1 1
 $ sugar.tax                                            : num  0.2 0.2 0.2 0.2 0.2 0.2 0.2 0.2 0.2 0.2
 $ CO2.tax                                              : int  13 13 13 13 13 13 13 13 13 13
 $ market.share.increase.period                         : int  10 10 10 10 10 10 10 10 10 10
 $ electricity.source                                   : Factor w/ 1 level ""conventional-mix"": 1 1 1 1 1 1 1 1 1 1
 $ white.sugar.price.EU.maximum                         : int  1000 1000 1000 1000 1000 1000 1000 1000 1000 1000
 $ white.sugar.price.EU.minimum                         : int  200 200 200 200 200 200 200 200 200 200
 $ beet.syrups.price.EU.maximum                         : int  500 500 500 500 500 500 500 500 500 500
 $ beet.syrups.price.EU.minimum                         : int  100 100 100 100 100 100 100 100 100 100
 $ ethanol.price.EU.maximum                             : int  2 2 2 2 2 2 2 2 2 2
 $ ethanol.price.EU.minimum                             : num  0.3 0.3 0.3 0.3 0.3 0.3 0.3 0.3 0.3 0.3
 $ years.taken.into.account                             : int  5 5 5 5 5 5 5 5 5 5
 $ X.step.                                              : int  0 0 1 0 1 0 2 3 2 1
 $ financial.balance.farmers                            : num  0 0 0 0 0 ...
 $ diesel.use.farmers                                   : int  0 0 0 0 0 0 0 0 0 0
 $ N.use.farmers                                        : int  0 0 0 0 0 0 0 0 0 0
 $ financial.balance.SU                                 : num  0 0 0 0 0 ...
 $ electricity.use.SU                                   : int  0 0 0 0 0 0 0 0 0 0
 $ financial.balance.central.sugar.factories            : num  0 0 0 0 0 ...
 $ electricity.use.central.sugar.factories              : int  0 0 0 0 0 0 0 0 0 0
 $ financial.balance.distributed.sugar.factories        : num  0 0 0 0 0 ...
 $ electricity.use.distributed.sugar.factories          : int  0 0 0 0 0 0 0 0 0 0
 $ financial.balance.beet.syrups.factories              : int  0 0 0 0 0 0 0 0 0 0
 $ electricity.use.beet.syrups.factories                : int  0 0 0 0 0 0 0 0 0 0
 $ financial.balance.ethanol.factories                  : int  0 0 0 0 0 0 0 0 0 0
 $ electricity.use.ethanol.factories                    : int  0 0 0 0 0 0 0 0 0 0
 $ transport.costs.yearly                               : num  0 0 0 0 0 ...
 $ diesel.use.total.transport                           : num  0 0 0 0 0 ...
 $ profit.per.tonne.sugar.beet.central.sugar.factory    : num  0 0 0 0 0 ...
 $ profit.per.tonne.sugar.beet.distributed.sugar.factory: num  0 0 0 0 0 ...
 $ profit.per.tonne.sugar.beet.sugar.from.beet.syrups   : int  0 0 0 0 0 0 0 0 0 0
 $ profit.per.tonne.sugar.beet.beet.syrups.factory      : int  0 0 0 0 0 0 0 0 0 0
 $ profit.per.tonne.sugar.beet.ethanol.factory          : num  0 0 0 0 0 ...

str(df)

Classes ‘data.table’ and 'data.frame':  19000000 obs. of  5 variables:
 $ run_number          : chr  "23377" "23377" "23378" "23377" ...
 $ scenario            : chr  """pessimistic""" """pessimistic""" """pessimistic""" """pessimistic""" ...
 $ configuration       : chr  """mixed""" """mixed""" """mixed""" """mixed""" ...
 $ tick                : chr  "200" "201" "192" "202" ...
 $ financial_balance_SU: chr  "6.079728695488823E9" "6.079728695488823E9" "9.10006561818864E9" "6.079728695488823E9" ...
 - attr(*, ".internal.selfref")=<externalptr> 

str(template)

'data.frame':   10 obs. of  5 variables:
 $ run_number          : int  3 2 3 4 2 1 3 3 2 4
 $ scenario            : Factor w/ 1 level ""pessimistic"": 1 1 1 1 1 1 1 1 1 1
 $ configuration       : Factor w/ 1 level ""central"": 1 1 1 1 1 1 1 1 1 1
 $ tick                : int  0 0 1 0 1 0 2 3 2 1
 $ financial_balance_SU: num  0 0 0 0 0 ...

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

1.4m articles

1.4m replys

5 comments

57.0k users

...