Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

python - I keep getting a KeyError while trying to ETL data using Python. Any ideas on how to fix this?

# The following modules are used for MySQL - they require the MySQL Python interface to be
# loaded from the MySQL installer
import mysql.connector
import sqlalchemy
from sqlalchemy import create_engine
import pandas as pd

# The next line creates the connection to the MySQL database on my PC
# Notice that the schema (newyorkres2) is included in the connection string
engine = create_engine('mysql+mysqlconnector://root:[email protected]:3306/newyorkres2', 
echo=False)

# Read the contents of the New York restaurant data into a Pandas dataframe
newyorkres2 = pd.read_csv('/users/MichaelMcCuen/desktop/MIS467 Data Warehousing/NewYorkResData.csv', engine='python')

# Extract the list of columns (field names) in the CSV file, then print them
colnames = newyorkres2.columns.tolist()
print(colnames,'')


# Print the first five lines of the data, just to see what it looks like; then print the
# number of rows in the dataset
print(newyorkres2.head())
print('Num Rows: ', newyorkres2.shape[0])


# newyorkest == a dataframe that contains the New York establishments (restaurants).
# Note that we are selecting the fields that we want in this particular subset of the data

newyorkest=newyorkres2['camis','cuisineType','establishmentName','building','street','zipCode','Boro','phone', 'communityBoard', 'councilDistrict', 'censusTract', 'BIN']

# Variant also tried:
# newyorkest = newyorkres2['camis','cuisineType','EstablishmentName','building','street','zipCode','Boro','phone','communityBoard','councilDistrict','censusTract','BIN ']

# Sort the data by camis, which serves as a key; then, drop the duplicate rows.
# It is necessary to sort by a key and then remove the duplicate key values that are adjacent
# in the output

newyorkest = newyorkest.sort_values(['camis'], ascending = [True])
newyorkres2clean = newyorkest.drop_duplicates(subset = 'camis')

# Write the 'clean' version of the New York restaurant data to MySQL, as a table named
# "establishmentName"
newyorkres2clean.to_sql(name='establishmentName', con=engine, if_exists = 'replace', 
index=False)
asked by Michael McCuen, translated from Stack Overflow
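
A likely cause of the KeyError, sketched under assumptions: the selection `newyorkres2['camis','cuisineType', ...]` uses single brackets, so Python packs the comma-separated names into one tuple and pandas looks for a single column whose label is that whole tuple. Selecting several columns takes a list of labels, which means double brackets. The small frame below is made-up stand-in data (only a few of the columns, with invented values), not the real NewYorkResData.csv.

```python
import pandas as pd

# Hypothetical stand-in for the CSV; the real file has more columns and rows.
newyorkres2 = pd.DataFrame({
    'camis': [3, 1, 3],
    'establishmentName': ['C', 'A', 'C'],
    'Boro': ['Queens', 'Bronx', 'Queens'],
    'grade': ['A', 'B', 'A'],
})

# Single brackets with commas build a tuple key, which pandas
# treats as ONE column label, so the lookup fails.
try:
    newyorkres2['camis', 'establishmentName', 'Boro']
    tuple_key_failed = False
except KeyError:
    tuple_key_failed = True

# A list of labels (note the double brackets) selects several columns.
wanted = ['camis', 'establishmentName', 'Boro']
newyorkest = newyorkres2[wanted]

# Same clean-up steps as in the question: sort by the key, drop duplicate keys.
newyorkest = newyorkest.sort_values('camis')
newyorkres2clean = newyorkest.drop_duplicates(subset='camis')
```

Applied to the question's code, this would mean wrapping the twelve column names in a list before indexing, assuming each name matches the CSV header exactly.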


1 Reply

Waiting for an expert to reply.
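
One thing worth checking for a KeyError on column selection is whether every requested name matches the CSV header exactly, including capitalisation and stray whitespace (compare `establishmentName` with `EstablishmentName`, or `BIN` with `'BIN '`). A minimal sketch follows; `missing_columns` is a hypothetical helper, and the frame is made-up data rather than the real file.

```python
import pandas as pd

def missing_columns(df, wanted):
    """Return the requested labels that are not columns of df."""
    return [name for name in wanted if name not in df.columns]

# Made-up header with a trailing space and different capitalisation.
df = pd.DataFrame({'camis': [1], 'EstablishmentName': ['A'], 'BIN ': [42]})

wanted = ['camis', 'establishmentName', 'BIN']
before = missing_columns(df, wanted)

# Strip surrounding whitespace from the header labels, then check again.
df.columns = df.columns.str.strip()
after = missing_columns(df, wanted)

print('Missing before strip:', before)
print('Missing after strip: ', after)
```

Running a check like this against `newyorkres2.columns` before the selection would show which names, if any, differ from the header.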
