Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


python - Trying to group different values that have some similarities in a dictionary

I'm parsing a JSON file that kinda looks like this:

[{"acc":"P1","Lenght":855,...,"MBDB-1":{"source_id":"2btp_A","regions":[[70,73],[231,234]],"content_fraction":0.033,"content_count":8},"MBDB-2":{...},"MDB-2":{...}},
{"acc":"P2","Lenght":145,...,"MBDB-14":{...},...}]

And I'm trying to generate a dictionary with only the information that I want (i.e., "acc" and "Lenght") plus all the information INSIDE the keys that start with "MBDB", no matter what comes after that (the actual file is huge, with a lot of information that I don't really need).

For the first two items, it's fairly easy. This is what I have so far:

import json

my_dict = dict.fromkeys(['ID', 'MISSING', 'LENGHT'])
with open("...mypathJson1.json") as f:
    data = json.load(f)  # parse the whole file into a list of dicts
    for i in data:
        if "acc" in i:
            # note: this overwrites my_dict["ID"] on every record
            my_dict["ID"] = i["acc"]

But I'm really lost on how to append each of the values of "MBDB-something" to the MISSING key. As far as I understand, I can't use startswith(), because I'm working with a dict (generated by json.loads()).
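For what it's worth, the keys of a dict produced by json.loads() are ordinary Python str objects, so str.startswith() can be applied to each key while iterating over the dict. A minimal sketch, using a small hypothetical record shaped like the sample above (the "MDB-2" values are made up for illustration):

```python
import json

# Hypothetical record shaped like the sample above (fields shortened)
raw = '''{"acc": "P1", "Lenght": 855,
          "MBDB-1": {"source_id": "2btp_A", "regions": [[70, 73], [231, 234]], "content_count": 8},
          "MDB-2": {"source_id": "made_up", "regions": [], "content_count": 0}}'''
record = json.loads(raw)

# Dict keys are plain strings, so str.startswith works on them
mbdb_keys = [k for k in record if k.startswith("MBDB")]
print(mbdb_keys)  # ['MBDB-1']
```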

This is what the result should look like:

  ID LENGHT source_id             regions content_count
0 P1    855    2btp_A [[70,73],[231,234]]             8
1 P1    855       ...               [...]             #
2 P2    145       ...               [...]             #

So that I can later use pandas' .explode() and perform different operations on some of the information these keys hold. I feel like I'm out of my league with this issue, so any advice is welcome!

EDIT: I've edited the desired output to show the contents of the different keys INSIDE all the "MBDB" keys.

Question from: https://stackoverflow.com/questions/65890539/trying-to-group-different-values-that-have-some-similarities-in-a-dictionary

He who fights with dragons too long becomes a dragon himself; gaze too long into the abyss, and the abyss gazes back into you…

1 Reply


Since the keys are consistent across the JSON objects, you can append one item to a list for every key that starts with "MBDB".

import json

# load data
with open("...mypathJson1.json") as f:
    data = json.load(f)

out = []  # final output: one dict per "MBDB..." entry
for d in data:
    for k, v in d.items():
        if k.startswith("MBDB"):  # matches "MBDB-1", "MBDB-14", ...
            out.append({
                "ID": d["acc"],
                "LENGTH": d["Lenght"],  # the source key is spelled "Lenght"
                "source_id": v["source_id"],
                "regions": v["regions"],
                "content_count": v["content_count"]
            })

The final output here is a list of dicts, which you can convert into a DataFrame with pandas:

import pandas as pd

df = pd.DataFrame(out)

# output
   ID  LENGTH source_id                 regions  content_count
0  P1     855    2btp_A  [[70, 73], [231, 234]]              8
1  P1     855    2btp_B  [[70, 73], [231, 234]]              8
2  P2     855    2btp_A  [[70, 73], [231, 234]]              8
3  P2     855    2btp_B  [[70, 73], [231, 234]]              8
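Since the question mentions .explode(), here is a minimal end-to-end sketch under a couple of assumptions: data is a hand-written stand-in for the parsed file (the values are illustrative, not from the real file), and every "MBDB…" value is a dict. Unpacking the inner dict with **v keeps all of its fields (including content_fraction) instead of listing them one by one, and DataFrame.explode() (pandas 0.25+) then gives one row per [start, end] pair in regions:

```python
import pandas as pd

# Hypothetical stand-in for the parsed JSON (illustrative values only)
data = [
    {"acc": "P1", "Lenght": 855,
     "MBDB-1": {"source_id": "2btp_A", "regions": [[70, 73], [231, 234]],
                "content_fraction": 0.033, "content_count": 8},
     "MDB-2": {"source_id": "ignored", "regions": [[1, 2]]}},
]

rows = []
for d in data:
    for k, v in d.items():
        if k.startswith("MBDB"):
            # **v copies every field stored inside the "MBDB..." entry
            rows.append({"ID": d["acc"], "LENGTH": d["Lenght"], **v})

df = pd.DataFrame(rows)

# One row per [start, end] pair; the other columns are repeated
exploded = df.explode("regions").reset_index(drop=True)
# Split each pair into its own columns for further calculations
exploded["start"] = exploded["regions"].str.get(0).astype(int)
exploded["end"] = exploded["regions"].str.get(1).astype(int)
print(exploded[["ID", "source_id", "start", "end", "content_count"]])
```

Whether the region bounds are inclusive is not stated in the question, so any length calculation on start/end is left to the reader.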
