Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

DataFrames.jl - Using Julia, how can I read multiple CSV files and combine columns?

I'm pretty new to Julia and I consider myself a beginner in programming in general. I have written a bit of MATLAB and Python.

I have a bunch of CSVs and I want to combine them to do data analysis. My data look like this:

using DataFrames
using Plots
using CSV
using Glob
using Pipe

file_list = glob("*.csv")  # list of all CSV files in the current directory
df = @pipe file_list[1] |> CSV.File(_, header = 2) |> DataFrame  # read the first file
# I could have used df = CSV.File(file_list[1], header = 2) |> DataFrame, but
# I wanted to try piping multiple operations, and it didn't work.

Result of the code snippet: https://i.stack.imgur.com/nZTFy.png

What I would like to do:

  1. Combine the first 5 columns, as they define the time as yyyy-mm-dd-hh-mm-ss.
  2. Ideally, add a column with the name of the file, so that all files can be merged into a single DataFrame.

As I said, I'm pretty new to Julia and programming in general. Any help is appreciated.

Thank you.

Question from: https://stackoverflow.com/questions/66064894/using-julia-how-can-i-read-multiple-csv-and-combine-columns

1 Reply

To pipe every item in a list through a function, use the broadcasting pipe operator .|>

julia> [1,2,3] .|> sqrt
3-element Array{Float64,1}:
 1.0
 1.4142135623730951
 1.7320508075688772
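The same operator also works with anonymous functions, which is what you need when a step takes extra arguments. Here is a minimal sketch using only Base functions; the file names are made up for illustration and are not from the question:

```julia
# Hypothetical file names, for illustration only.
files = ["data/run1.csv", "data/run2.csv"]

# .|> applies each step to every element; wrap anonymous functions in parentheses.
stems = files .|> basename .|> (f -> first(splitext(f)))

# A plain |> passes the whole collection on, so the two can be mixed.
total = ([1, 4, 9] .|> sqrt) |> sum

println(stems)   # ["run1", "run2"]
println(total)   # 6.0
```

Reading every CSV in file_list should follow the same pattern, for example file_list .|> (f -> DataFrame(CSV.File(f, header = 2))), assuming all files share the layout of the first one.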

You can add columns like this:

julia> using DataFrames, Dates

julia> df = DataFrame("yr"=>2000, "m"=>1:2, "d"=>[30,1], "h"=>12:13, "min"=>30:31, "sec"=>58:59)
2×6 DataFrame
 Row │ yr     m      d      h      min    sec
     │ Int64  Int64  Int64  Int64  Int64  Int64
─────┼──────────────────────────────────────────
   1 │  2000      1     30     12     30     58
   2 │  2000      2      1     13     31     59

julia> df[!,"datetime"] = DateTime.(df[!,"yr"], df[!,"m"], df[!,"d"], df[!,"h"], df[!,"min"], df[!,"sec"])
2-element Array{DateTime,1}:
 2000-01-30T12:30:58
 2000-02-01T13:31:59

julia> df[!,"file"] .= "file.csv"
2-element Array{String,1}:
 "file.csv"
 "file.csv"

julia> df
2×8 DataFrame
 Row │ yr     m      d      h      min    sec    datetime             file
     │ Int64  Int64  Int64  Int64  Int64  Int64  DateTime             String
─────┼─────────────────────────────────────────────────────────────────────────
   1 │  2000      1     30     12     30     58  2000-01-30T12:30:58  file.csv
   2 │  2000      2      1     13     31     59  2000-02-01T13:31:59  file.csv
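Putting the pieces together for several files could look roughly like the sketch below. It is only an illustration under a few assumptions: the column names yr, m, d, h, min, sec are taken from the toy DataFrame above and may not match your real headers, read_with_metadata is a hypothetical helper name, and the two demo files are generated in a temporary directory so that the example runs on its own.

```julia
using CSV, DataFrames, Dates

# Hypothetical helper: read one file, then add a DateTime column and a file-name column.
# Assumes the column names yr, m, d, h, min, sec (adjust to your real headers).
function read_with_metadata(path; header = 2)
    df = CSV.read(path, DataFrame; header = header)
    df.datetime = DateTime.(df.yr, df.m, df.d, df.h, df.min, df.sec)
    df[!, :file] .= basename(path)
    return df
end

# Demo data: two small files whose column names sit on line 2, as with header = 2.
dir = mktempdir()
for (name, day) in [("a.csv", 1), ("b.csv", 2)]
    write(joinpath(dir, name),
          "some description line\nyr,m,d,h,min,sec,value\n2000,1,$day,12,30,58,1.5\n")
end

# With Glob.jl, glob("*.csv") would serve the same purpose for the current directory.
file_list = filter(endswith(".csv"), readdir(dir; join = true))

# Read every file and stack the results into a single DataFrame.
combined = reduce(vcat, read_with_metadata.(file_list))
println(combined)
```

If the original time columns are no longer needed afterwards, select!(combined, Not([:yr, :m, :d, :h, :min, :sec])) should drop them.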
