Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


python - Pandas table scrape

I am trying to figure out the best approach to converting a table into JSON records. At present I have the output I want; however, the format of the table is puzzling me a little. The example below should explain:

ID   Product        Item_Material   Owner           Interest %
123  Test Item 1    Electric        Elctrotech              60%
null null           null            Spark inc               40%
124  Test Item 2    Wood            TY Toys                 100%
125  Test Item 3    Plastic         NA Materials            100%

Newline-delimited JSON is the format I want, but I would like table rows that are continuations of a parent row to be nested inside that parent's JSON record. This is what I currently get:

{"ID":"Test Item 1", "Item_Material":"Electric", "Owner":"Elctrotech","Interest %":"60%"}
{"ID":null, "Item_Material":null, "Owner":"Spark inc","Interest %":"40%"}
{"ID":"Test Item 2", "Item_Material":"Wood", "Owner":"TY Toys","Interest %":"100%"}
{"ID":"Test Item 3","Item_Material":"Plastic","Owner":"NA Materials","Interest %":"100%"}

The aim is for the first record to look something like this:

{"ID":"Test Item 1", "Item_Material":"Electric", "Owners": [{"Owner": "Elctrotech", "Interest %":"60%", "Owner":"Spark inc","Interest %":"40%"}]}

The data comes from a table scraped with Beautiful Soup. Each row of the table I have provided is in its own <tr> tag, so this is how it appears once it is pulled into a pandas DataFrame. I don't know whether pandas has any way to merge a row into the row above it, so that I end up with one JSON record per 'Product'. An item can have more than two 'Owners'.



1 Reply


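The output dict is not exactly what you expected, because the dict syntax in your example is wrong: it repeats the "Owner" and "Interest %" keys inside a single object, so the second pair would overwrite the first. Try this, which uses only pandas: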

import pandas as pd

# Sample rows; every row here already carries its ID, Product and Item_Material
p = [[123, "Test Item 1", "Electric", "Elctrotech", "60%"],
     [124, "Test Item 2", "Wood", "TY Toys", "100%"],
     [125, "Test Item 1", "Plastic", "NA Materials", "100%"],
     [123, "Test Item 1", "Foo", "Bar", "80%"],
     [123, "Test Item 1", "Electric", "TRY TRY TRY", "70%"]]

x = pd.DataFrame(p, columns=["ID", "Product", "Item_Material", "Owner", "Interest %"])

# Collect the owners and interests of each (Product, Item_Material) group into lists
x_gb = x.groupby(["Product", "Item_Material"])
grouped_Series_Owner = x_gb["Owner"].apply(list).to_dict()
grouped_Series_Interest = x_gb["Interest %"].apply(list).to_dict()

# `out` was never defined in the snippet as posted; one row per distinct group is assumed here
out = x.drop_duplicates(["Product", "Item_Material"]).to_dict(orient="index")
for k in out.keys():
    key = (out[k]["Product"], out[k]["Item_Material"])
    # Build a fresh dict for each record instead of mutating a shared one
    d = {"ID": out[k]["Product"], "Item_Material": out[k]["Item_Material"],
         "Owners": {"Owner": grouped_Series_Owner[key], "Interest %": grouped_Series_Interest[key]}}
    print(d)
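
The sample data above already has the ID, Product and Item_Material filled in on every row, whereas the scraped table in the question leaves them empty on the continuation rows. A minimal sketch of one way to handle that, assuming the empty cells arrive as missing values (None/NaN) rather than as the literal string "null": forward-fill the parent columns, then group and nest the owners as a list of dicts. The names rows, parent_cols and records are only illustrative.

```python
import json
import pandas as pd

# Rows as they appear in the question's table; None stands in for the "null" cells
# (assumption: the scrape yields missing values, not the string "null").
rows = [
    [123, "Test Item 1", "Electric", "Elctrotech", "60%"],
    [None, None, None, "Spark inc", "40%"],
    [124, "Test Item 2", "Wood", "TY Toys", "100%"],
    [125, "Test Item 3", "Plastic", "NA Materials", "100%"],
]
df = pd.DataFrame(rows, columns=["ID", "Product", "Item_Material", "Owner", "Interest %"])

parent_cols = ["ID", "Product", "Item_Material"]
# Copy each parent's values down into its continuation rows.
df[parent_cols] = df[parent_cols].ffill()

records = []
# sort=False keeps the groups in the order they appear in the table.
for (item_id, product, material), group in df.groupby(parent_cols, sort=False):
    records.append({
        "ID": int(item_id),  # the ID column became float because of the missing values
        "Product": product,
        "Item_Material": material,
        # One dict per owner row, so any number of owners per item is supported.
        "Owners": group[["Owner", "Interest %"]].to_dict(orient="records"),
    })

# One JSON record per line, one line per product.
for record in records:
    print(json.dumps(record))
```

If the scrape does produce the string "null", replacing it with a missing value first (for example with df.replace("null", pd.NA)) should make the forward fill behave the same way. Forward filling assumes a continuation row always belongs to the nearest filled row above it, which matches the table as described in the question.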
