I haven't found a way to do this elegantly, but I'm very familiar with this problem (getting data from FactSet PA reports -> Excel -> R, right?). I understand different reports have different formats, and this can be a pain.
For a slightly different version of annoyingly formatted spreadsheets, I do the following. It's not the most elegant (it requires two reads of the file) but it works. I like reading the file twice, to make sure the columns are of the correct type, and with good headers. It's easy to mess up column imports, so I'd rather have my code read the file twice than go through and clean up columns myself, and the read_excel defaults, if you start at the right row, are pretty good.
Also, it's worth noting that as of today (2017-04-20), readxl had an update. I installed the new version to see if that would make this very easy, but I don't believe that's the case, although I could be mistaken.
library(readxl)
library(stringr)
library(dplyr)
f_path <- file.path("whatever.xlsx")
if (!file.exists(f_path)) {
f_path <- file.choose()
}
# I read this twice, temp_read to figure out where the data actually starts...
# Maybe you need something like this -
# excel_sheets <- readxl::excel_sheets(f_path)
# desired_sheet <- which(stringr::str_detect(excel_sheets,"2 Factor Brinson Attribution"))
desired_sheet <- 1
temp_read <- readxl::read_excel(f_path,sheet = desired_sheet)
skip_rows <- NULL
col_skip <- 0
search_string <- "Monthly Returns"
max_cols_to_search <- 10
max_rows_to_search <- 10
# Note, for the - 0, you may need to add/subtract a row if you end up skipping too far later.
while (length(skip_rows) == 0) {
col_skip <- col_skip + 1
if (col_skip == max_cols_to_search) break
skip_rows <- which(stringr::str_detect(temp_read[1:max_rows_to_search,col_skip][[1]],search_string)) - 0
}
# ... now we re-read from the known good starting point.
real_data <- readxl::read_excel(
f_path,
sheet = desired_sheet,
skip = skip_rows
)
# You likely don't need this if you start at the right row
# But given that all weird spreadsheets are weird in their own way
# You may want to operate on the col_skip, maybe like so:
# real_data <- real_data %>%
# select(-(1:col_skip))
与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…