First of all, the error is pretty straightforward: your column_headers list has 22 columns, but the player_data entries only have 21. So you need to find out which column is missing and why. Just by visually comparing the entries from the dataframe with the headers list, it appears that one of the first two columns is missing. player_data[0][0] returns
1, CLE, Andrew Wiggins, University of Kansas,...
but it should be
1, 1, CLE, Andrew Wiggins, University of Kansas,...
The problem is the table itself. Navigate to the website, hover over the table, right-click and choose Inspect.
The first row of data (underneath the 'Rk' header) consists of 21 td elements and 1 th element. The 'Rk' entry is actually a th element, not a td.
That is why your
player_data = [[td.getText() for td in data_rows[i].findAll('td')] for i in range(len(data_rows))]
skips the first column: it only iterates over td elements, so every row comes out one entry short of the headers list.
I don't know how important the first column is to you; a quick fix would be to drop the Rk column from your headers list.
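A minimal sketch of that quick fix, assuming 'Rk' is the first entry of column_headers. The lists below are shortened, hypothetical stand-ins for your real 22-column data, and headers_without_rk and records are names made up for this illustration:

```python
# Hypothetical, shortened stand-ins for the real lists.
column_headers = ["Rk", "Pk", "Tm", "Player", "College"]
player_data = [
    ["1", "CLE", "Andrew Wiggins", "University of Kansas"],
]

# Drop the leading 'Rk' header so headers and rows have the same length.
headers_without_rk = column_headers[1:]

# Sanity check: every row must now line up with the headers.
assert all(len(row) == len(headers_without_rk) for row in player_data)

# Pair each header with its value to confirm the columns are aligned.
records = [dict(zip(headers_without_rk, row)) for row in player_data]
print(records[0]["Player"])
```

This throws the rank away entirely, so it only makes sense if you don't need that column later on.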
Alternatively, search for both td and th elements:
player_data = [[td.getText() for td in data_rows[i].findAll(['td','th'])] for i in range(len(data_rows))]
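To see the difference in isolation, here is a small self-contained sketch. The HTML is a made-up four-column miniature of the real table (the second row holds placeholder values), and td_only and td_and_th are names chosen just for this example. It assumes BeautifulSoup 4 is installed and uses the built-in html.parser backend:

```python
from bs4 import BeautifulSoup

# Hypothetical miniature of the table; the real one has 22 columns.
html = """
<table>
  <tr><th>Rk</th><th>Pk</th><th>Tm</th><th>Player</th></tr>
  <tr><th scope="row">1</th><td>1</td><td>CLE</td><td>Andrew Wiggins</td></tr>
  <tr><th scope="row">2</th><td>2</td><td>XXX</td><td>Some Player</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
rows = soup.findAll("tr")
column_headers = [th.getText() for th in rows[0].findAll("th")]
data_rows = rows[1:]

# Only td cells: the th cell holding the rank is skipped.
td_only = [[cell.getText() for cell in row.findAll("td")] for row in data_rows]

# Both td and th cells, returned in document order.
td_and_th = [[cell.getText() for cell in row.findAll(["td", "th"])] for row in data_rows]

print(len(column_headers), len(td_only[0]), len(td_and_th[0]))
print(td_and_th[0])
```

With the list passed to findAll, each row has as many entries as there are headers, so the lengths match again.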