Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

python - index out of range when generating a csv

I am trying to build a sports betting program.

Right now I am stuck at the part where it is generating a csv with all of the box scores for the previous two college basketball seasons. It is pulling the boxscore indexes from the csv I have already generated.

I keep getting an "index out of range" error once the progress bar reaches 10653 iterations. I can't find anything unusual in the 10653rd row of the csv it is reading.

I know the iterations correspond to rows in the csv because when I run all lines of code prior to _df = Boxscore(box_link).dataframe, the progress bar completes at 14980 iterations, which is the same as the number of rows in the csv it is reading.

Any help would be greatly appreciated. The code is below along with the error message.

from sportsreference.ncaab.boxscore import Boxscore
start_season = 2020 # below code will pull data from all seasons starting from this year
box_df = None
schedule_df = pd.read_csv('ncaab - sheet81 - ncaab - sheet81.csv')#if only running for testing, a smaller csv may be used to speed up the process
season_df = schedule_df.loc[schedule_df.Season>=start_season]
for index, row in tqdm(season_df.iterrows()):
    box_link = row['BoxscoreIndex']
    _df = Boxscore(box_link).dataframe #The line to left is where the error keeps coming in "list index out of range". I ran everything above this and it works fine.  
        
    if box_df is not None:
        box_df = pd.concat([box_df,_df],axis=0)
    else:
        box_df = _df
            
box_df.to_csv('boxscores3.csv'.format(start_season),index=None)    

Error:

IndexError                                Traceback (most recent call last)
<ipython-input-24-91c5b71b03e2> in <module>
      6 for index, row in tqdm(season_df.iterrows()):
      7     box_link = row['BoxscoreIndex']
----> 8     _df = Boxscore(box_link).dataframe #The line to left is where the error keeps coming in "list index out of range". I ran everything above this and it works fine.
      9 
     10     if box_df is not None:

~\Downloads\WPy64-3860\python-3.8.6.amd64\lib\site-packages\sportsreference\ncaab\boxscore.py in __init__(self, uri)
    223         self._home_defensive_rating = None
    224 
--> 225         self._parse_game_data(uri)
    226 
    227     def _retrieve_html_page(self, uri):

~\Downloads\WPy64-3860\python-3.8.6.amd64\lib\site-packages\sportsreference\ncaab\boxscore.py in _parse_game_data(self, uri)
    668             if short_field == 'away_record' or \
    669                short_field == 'home_record':
--> 670                 value = self._parse_record(short_field, boxscore, index)
    671                 setattr(self, field, value)
    672                 continue

~\Downloads\WPy64-3860\python-3.8.6.amd64\lib\site-packages\sportsreference\ncaab\boxscore.py in _parse_record(self, field, boxscore, index)
    375         records = boxscore(BOXSCORE_SCHEME[field]).items()
    376         records = [x.text() for x in records if x.text() != '']
--> 377         return records[index]
    378 
    379     def _find_boxscore_tables(self, boxscore):

IndexError: list index out of range
question from:https://stackoverflow.com/questions/65836240/index-out-of-range-when-generating-a-csv


1 Reply


First, I just want to point out that the .format() call in 'boxscores3.csv'.format(start_season) doesn't do anything: it still returns 'boxscores3.csv'. You need a placeholder within the string for the value to appear in the filename.

For example, if start_season = '2020', then 'boxscores3_{0}.csv'.format(start_season) gives you 'boxscores3_2020.csv'.

So if you want the filename to be dynamic, change it to:

box_df.to_csv('boxscores3_{0}.csv'.format(start_season),index=None)

or

box_df.to_csv('boxscores3_{some_variable}.csv'.format(some_variable = start_season),index=None)

or

box_df.to_csv('boxscores3_%s.csv' % start_season, index=None)
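
The point about the missing placeholder can be checked in isolation. This is a minimal sketch; `season` is a hypothetical stand-in for start_season, and the f-string line is an extra option not mentioned above.

```python
# Hypothetical stand-in for start_season from the question.
season = 2020

# No placeholder in the string, so format() has nothing to substitute.
unchanged = 'boxscores3.csv'.format(season)

# Equivalent ways to put the season into the filename.
positional = 'boxscores3_{0}.csv'.format(season)
percent = 'boxscores3_%s.csv' % season
fstring = f'boxscores3_{season}.csv'  # requires Python 3.6+

print(unchanged)
print(positional)
```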

Next, without a sample of that csv file, specifically row 10653, I can't really help with the specific issue.
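
The traceback does show where it fails: _parse_record drops empty strings from records and then returns records[index], so for that game the list ends up shorter than the index being requested. One way to find the offending BoxscoreIndex values without stopping the whole run is to catch the exception per row. The sketch below is only an illustration: `fetch_all` and `fake_fetch` are hypothetical names, and `fake_fetch` stands in for Boxscore(link).dataframe so the example runs offline.

```python
def fetch_all(keys, fetch):
    """Call fetch(key) for each key; collect results and record failures."""
    results, failures = [], []
    for position, key in enumerate(keys):
        try:
            results.append(fetch(key))
        except IndexError as exc:
            # Remember which row failed and why, then move on.
            failures.append((position, key, str(exc)))
    return results, failures


def fake_fetch(key):
    """Stand-in for Boxscore(key).dataframe; fails on one made-up key."""
    if key == 'bad-game':
        records = []        # nothing survived the empty-string filter
        return records[0]   # raises IndexError, as in the traceback
    return {'game': key}


results, failures = fetch_all(['game-a', 'bad-game', 'game-b'], fake_fetch)
print(failures)
```

In the loop from the question, fetch would be something like lambda link: Boxscore(link).dataframe and keys would be season_df['BoxscoreIndex']. A single pd.concat(results) after the loop could then replace the repeated concat inside it, and the failures list tells you which links to inspect by hand.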

In the meantime, I can offer an alternative solution using the ESPN API.

You can get box scores of college basketball games, provided you have the gameId. This code goes through each date (you need to give it a start date) and gets the gameIds of each game. Then, with the gameIds, it gets the box score from another API endpoint. Unfortunately, the box score isn't returned as JSON but as HTML (which is fine, because we can use pandas to read in the table).

I don't know exactly what you need or want, but this may help you, while you are learning Python, to see other ways to get data:

Code:

from tqdm import tqdm
import requests
import pandas as pd
import datetime


date_list = []
sdate = datetime.date(2021, 1, 1)   # start date
edate = datetime.date.today()  # end date

delta = edate - sdate       # as timedelta

for i in range(delta.days + 1):
    day = sdate + datetime.timedelta(days=i)
    date_list.append(day.strftime("%Y%m%d"))

 


headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.141 Safari/537.36'}
payload = {
    'xhr': '1',
    'device': 'desktop',
    'country': 'us',
    'lang': 'en',
    'region': 'us',
    'site': 'espn',
    'edition-host': 'espn.com',
    'site-type': 'full'}

# Get gameIds
gameId_dict = {}
for dateStr in tqdm(date_list):
    url = 'https://secure.espn.com/core/mens-college-basketball/schedule/_/date/{dateStr}/group/50'.format(dateStr=dateStr)
    games = requests.get(url, headers=headers, params=payload).json()['content']['schedule'][dateStr]['games']
    gameId_dict[dateStr] = []
    for game in games:
        # Check if game was postponed
        if game['status']['type']['name'] in ['STATUS_POSTPONED','STATUS_CANCELED','STATUS_SCHEDULED']:
            continue
        game_info = {}
        game_info[game['id']] = {}
        game_info[game['id']]['awayTeam'] = game['shortName'].split('@')[0].strip()
        game_info[game['id']]['homeTeam'] = game['shortName'].split('@')[1].strip()
        gameId_dict[dateStr].append(game_info)



full_df = pd.DataFrame()
# Box score - gameId needed
box_url = 'https://secure.espn.com/core/mens-college-basketball/boxscore'
for dateStr, games in tqdm(gameId_dict.items()):
    for game in tqdm(games):
        for gameId, teams in game.items():
            payload = {
                'gameId': gameId,
                'xhr': '1',
                'render': 'true',
                'device': 'desktop',
                'country': 'us',
                'lang': 'en',
                'region': 'us',
                'site': 'espn',
                'edition-host': 'espn.com',
                'site-type': 'full'}
            
            data = requests.get(box_url, headers=headers, params=payload).json()
            away_df = pd.read_html(data['content']['html'], header=1)[0].rename(columns={'Bench':'Player'})
            away_df = away_df[away_df['Player'] != 'TEAM']
            away_df = away_df[away_df['Player'].notna()]
            away_df['Team'] = teams['awayTeam']
            away_df['Home_Away'] = 'Away'
            away_df['Starter_Bench'] = 'Bench'
            away_df.loc[0:4, 'Starter_Bench'] = 'Starter'
            away_df['Player'] = away_df['Player'].str.split(r"([a-z]+)([A-Z].+)", expand=True)[2]
            away_df[['Player','Team']] = away_df['Player'].str.extract('^(.*?)([A-Z]+)$', expand=True)
                    
            home_df = pd.read_html(data['content']['html'], header=1)[1].rename(columns={'Bench':'Player'})
            home_df = home_df[home_df['Player'] != 'TEAM']
            home_df = home_df[home_df['Player'].notna()]
            home_df['Team'] = teams['homeTeam']
            home_df['Home_Away'] = 'Home'
            home_df['Starter_Bench'] = 'Bench'
            home_df.loc[0:4, 'Starter_Bench'] = 'Starter'
            home_df['Player'] = home_df['Player'].str.split(r"([a-z]+)([A-Z].+)", expand=True)[2]
            home_df[['Player','Team']] = home_df['Player'].str.extract('^(.*?)([A-Z]+)$', expand=True)
            
            # DataFrame.append was removed in pandas 2.0; pd.concat works in old and new versions
            game_df = pd.concat([away_df, home_df], sort=False)
            game_df['Date'] = datetime.datetime.strptime(dateStr, '%Y%m%d').strftime('%m/%d/%y')
            full_df = pd.concat([full_df, game_df], sort=False)

full_df = full_df.reset_index(drop=True)

Output:

print (full_df.head(30).to_string())
                Player MIN    FG  3PT   FT OREB DREB REB AST STL BLK TO PF PTS  Team Home_Away Starter_Bench Pos      Date
0             H. Drame  22   2-7  0-2  0-0    1    1   2   0   0   1  1  4   4   SPU      Away       Starter   F  01/01/21
1             F. Drame  20   2-3  0-1  0-0    1    5   6   0   3   1  1  4   4   SPU      Away       Starter   F  01/01/21
2               M. Lee  24  2-11  0-4  1-2    1    2   3   0   0   0  3  0   5   SPU      Away       Starter   G  01/01/21
3             D. Banks  26  4-12  1-6  2-4    0    5   5   6   1   0  1  1  11   SPU      Away       Starter   G  01/01/21
4             D. Edert  32  6-10  2-4  1-2    0    4   4   0   2   0  1  2  15   SPU      Away       Starter   G  01/01/21
5           O. Diahame   1   0-1  0-0  0-0    0    0   0   0   0   0  0  0   0   SPU      Away         Bench   F  01/01/21
6             K. Ndefo  23  7-10  0-0  3-3    1    6   7   2   1   5  1  4  17   SPU      Away         Bench   F  01/01/21
7            B. Diallo  14   0-2  0-0  0-0    1    1   2   0   0   0  0  0   0   SPU      Away         Bench   G  01/01/21
8             T. Brake  24   1-2  0-1  0-0    0    0   0   1   0   0  0  1   2   SPU      Away         Bench   G  01/01/21
9           M. Silvera   6   0-0  0-0  0-0    0    1   1   1   0   0  1  0   0   SPU      Away         Bench   G  01/01/21
10            N. Kamba   8   0-1  0-0  0-0    0    0   0   0   0   0  2  0   0   SPU      Away         Bench   G  01/01/21
11            J. Fritz  38   5-9  0-0  4-5    2    8  10   4   1   3  1  3  14   CAN      Home       Starter   F  01/01/21
12            J. White  17   4-7  1-2  0-0    1    4   5   2   0   0  5  2   9   CAN      Home       Starter   F  01/01/21
13           A. Fofana  20   1-7  1-4  1-2    0    1   1   1   0   0  1  2   4   CAN      Home       Starter   G  01/01/21
14          A. Harried  23  3-10  1-4  0-1    2    5   7   1   1   1  0  1   7   CAN      Home       Starter   G  01/01/21
15        J. Henderson  37   3-8  3-5  5-6    0    1   1   2   0   0  1  1  14   CAN      Home       Starter   G  01/01/21
16      G. Maslennikov   2   0-2  0-1  0-0    0    0   0   0   0   0  1  1   0   CAN      Home         Bench   F  01/01/21
17            M. Green  18   3-4  0-0  2-2    1    4   5   2   1   0  2  1   8   CAN      Home         Bench   F  01/01/21
18          S. Hitchon   3   0-0  0-0  0-0    0    0   0   1   0   0  0  0   0   CAN      Home         Bench   F  01/01/21
19       S. Uijtendaal  20   2-4  1-2  0-0    0    0   0   0   1   0  0  2   5   CAN      Home         Bench   G  01/01/21
20          M. Brandon  19   4-5  1-2  0-0    0    3   3   2   2   0  2  1   9   CAN      Home         Bench   G  01/01/21
21           A. Ahemed   3   0-0  0-0  0-0    0    1   1   1   0   0  0  1   0   CAN      Home         Bench   G  01/01/21
22           K. Nwandu  34  5-13  1-3  0-1    1    3   4   3   1   0  3  1  11  NIAG      Away       Starter   F  01/01/21
23      G. Kuakumensah  23   1-2  1-2  1-2    0    2   2   1   0   0  1  1   4  NIAG      Away       Starter   F  01/01/21
24         N. Kratholm  18   4-7  0-0  3-5    2    2   4   1   0   0  0  2  11  NIAG      Away       Starter   F  01/01/21
25          M. Hammond  33  7-14  3-6  0-0    0    4   4   1   1   0  2  2  17  NIAG      Away       Starter   G  01/01/21
26          J. Roberts  28   2-6  2-6  2-2    0    2   2   3   1   0  2  3   8  NIAG      Away       Starter   G  01/01/21
27          J. Cintron  14   0-2  0-0  0-0    1    3   4   0   0   1  2  1   0  NIAG      Away         Bench   F  01/01/21
28  DonaldN. MacDonald   9   0-1  0-1  0-0    0    3   3   0   0   0  0  0   0  NIAG      Away         Bench   G  01/01/21
29          R. Solomon  25  4-11  0-2  2-2    1    3   4   0   3   0  0  1  10  NIAG      Away         Bench   G  01/01/21
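
The gameId collection step in the code above can also be tried offline on a hand-written sample. The sketch below is an assumption-laden illustration: the sample data is made up and only mirrors the keys the code reads (id, shortName, and the nested status name), `extract_games` is a hypothetical helper, and 'STATUS_FINAL' is a placeholder status name, so the real ESPN response may differ.

```python
SKIP_STATUSES = {'STATUS_POSTPONED', 'STATUS_CANCELED', 'STATUS_SCHEDULED'}


def extract_games(games):
    """Map gameId -> (away, home) for games that were not skipped."""
    played = {}
    for game in games:
        if game['status']['type']['name'] in SKIP_STATUSES:
            continue
        away, sep, home = game['shortName'].partition('@')
        if not sep:
            # No '@' separator; split('@')[1] would raise IndexError here.
            continue
        played[game['id']] = (away.strip(), home.strip())
    return played


# Made-up sample shaped like the fields the original code accesses.
sample = [
    {'id': '1', 'shortName': 'SPU @ CAN',
     'status': {'type': {'name': 'STATUS_FINAL'}}},
    {'id': '2', 'shortName': 'AAA @ BBB',
     'status': {'type': {'name': 'STATUS_POSTPONED'}}},
    {'id': '3', 'shortName': 'CCC VS DDD',
     'status': {'type': {'name': 'STATUS_FINAL'}}},
]
print(extract_games(sample))
```

Using partition instead of split means a shortName without '@' is skipped rather than raising an error partway through a long run.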
