python - I use NOT IN to compare an object's id to a list of ids. Comparison seems to always evaluate to TRUE

posted Oct 7, 2021 in Technique[技术] by 深蓝 (71.8m points)

Consider the following code in Python 3.8:

import re
from datetime import datetime

import mysql.connector


def clean_data(data) -> list:
    """Takes a list of WebElements and extracts the needed information using RegEx.
        Groups the information into dictionaries.
        Returns a list of dictionaries."""
    clean_data = []

    for element in data:

        datapoint = {}
        text = element.text

        name = re.search(r"A.*", text).group()

        auction_id = re.search(r"(?:LOT ID: )(\d+)", text).group(1)

        auction_price = re.search(r"(?:€)(\d+.\d+.\d+)", text).group(1)
        auction_price = auction_price.replace(",", "")
        auction_price = float(auction_price)

        date = re.search(r"(?:Sold on |Reserve not met )(.*)", text).group(1)
        date = datetime.strptime(date, "%d/%m/%Y")

        datapoint["auction_id"] = int(auction_id)
        datapoint["name"] = name
        datapoint["auction_price"] = auction_price
        datapoint["date"] = date

        clean_data.append(datapoint)
        
    return clean_data

def read_id_from_database() -> list:
    """Connects to the MySQL Database and returns a list of all existing auction_ids."""
    connection = mysql.connector.connect(**database_credentials)
    cursor = connection.cursor()
    SQL = """SELECT auction_id FROM database"""
    cursor.execute(SQL)
    ids = cursor.fetchall()
    existing_ids = [id[0] for id in ids]

    return existing_ids


def select_new_datapoints(data : list) -> list:
    """Takes a list of dictionaries. Compares the auction_id with a list of existing auction IDs.
        Returns a list of all dictionaries with new auction IDs"""
    existing_ids = read_id_from_database()
    new_datapoints = []
    for element in data:
        if int(element["auction_id"]) not in existing_ids:
            new_datapoints.append(element)
    return new_datapoints

clean_data extracts the relevant information about auction items from a list of webdriver elements. Each item is represented as a dictionary called datapoint, in which every key-value pair holds one piece of information. The function returns a list of these dictionaries.

read_id_from_database just returns a list of all auction_ids which are already stored in my database.
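A minimal sketch of this step that can be run without a MySQL server is shown here. It uses the standard-library sqlite3 module as a stand-in for mysql.connector, and the table name auctions, the helper name read_ids, and the sample IDs are assumptions made for illustration only. It shows that fetchall() returns one tuple per row, which is why the list comprehension takes index 0 of each row.

```python
import sqlite3


def read_ids(connection: sqlite3.Connection) -> list:
    """Return all auction_ids from the (hypothetical) auctions table."""
    cursor = connection.cursor()
    cursor.execute("SELECT auction_id FROM auctions")
    rows = cursor.fetchall()          # one tuple per row, e.g. (101,)
    return [row[0] for row in rows]   # unpack each tuple into a plain value


# In-memory database with made-up sample data (assumption for illustration).
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE auctions (auction_id INTEGER)")
connection.executemany("INSERT INTO auctions VALUES (?)", [(101,), (102,), (103,)])

existing_ids = read_ids(connection)
print(existing_ids)
connection.close()
```

With a real MySQL connection the column type decides which Python type each value has, so the result may differ from this stand-in.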

select_new_datapoints checks whether an item in the list returned by clean_data has an ID that is already in my database, by comparing its ID to the list provided by read_id_from_database. If the ID does not exist yet, the item is added to the new_datapoints list.

for element in data:
    if int(element["auction_id"]) not in existing_ids:
        new_datapoints.append(element)
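The same filter can be written as a self-contained sketch that takes the existing IDs as an argument, which makes it testable without a database. The function name select_new and the sample data are hypothetical. Converting the list to a set is an optional choice that speeds up the lookup; it does not change which items are kept.

```python
def select_new(data: list, existing_ids: list) -> list:
    """Keep only the dictionaries whose auction_id is not in existing_ids."""
    known = set(existing_ids)  # set membership is O(1) on average
    return [item for item in data if int(item["auction_id"]) not in known]


# Made-up sample data (assumption for illustration).
data = [{"auction_id": 101}, {"auction_id": 102}, {"auction_id": 999}]
existing_ids = [101, 102, 103]

new_datapoints = select_new(data, existing_ids)
print(new_datapoints)
```

With plain int values on both sides, only the item whose ID is missing from existing_ids survives the filter.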

Those items get added to my database later on.

The problem is that select_new_datapoints does not work. It always keeps all items, even though most or even all of the IDs are already present in my database.

I checked this by writing the IDs from clean_data and from existing_ids to a CSV file and comparing them visually:

CSV File
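A visual comparison of a CSV export can hide differences, because writing to CSV turns every value into text. A minimal sketch of a programmatic comparison with set operations is shown here; compare_ids and the sample values are hypothetical and chosen only to illustrate the idea, not taken from the real data.

```python
def compare_ids(scraped_ids: list, existing_ids: list) -> dict:
    """Summarize how two ID collections differ, using set operations."""
    scraped, existing = set(scraped_ids), set(existing_ids)
    return {
        # key=repr lets sorted() handle a mix of types such as int and str
        "only_scraped": sorted(scraped - existing, key=repr),
        "only_existing": sorted(existing - scraped, key=repr),
        "in_both": len(scraped & existing),
    }


# Made-up sample: the last existing ID is a string that looks like a number.
scraped_ids = [101, 102, 103]
existing_ids = [101, 102, "103"]

report = compare_ids(scraped_ids, existing_ids)
print(report)
```

In this made-up sample both lists would look identical in a CSV file, yet the report lists 103 and "103" as different values.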

Both lists are identical, so select_new_datapoints should return an empty new_datapoints list. Instead, it returns a list with 226 elements, so something is going wrong here:

    new_datapoints = []
    for element in data:
        if int(element["auction_id"]) not in existing_ids:
            new_datapoints.append(element)
    return new_datapoints

I assume that for some reason

    if int(element["auction_id"]) not in existing_ids:
        new_datapoints.append(element)

always evaluates to True, so all elements get appended. I was unable to find out why, since

if int(element["auction_id"]) not in existing_ids:

should work as intended, and my CSV showed that every element's ID is already in the existing_ids list. I also checked the types of the values I am comparing: both are int.
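Checking the type of a single value can miss a list that mixes types, so one way to make the type check exhaustive is to count the types of all elements. The sketch below is an illustration under assumptions: type_summary is a hypothetical helper and the sample list is made up.

```python
from collections import Counter


def type_summary(values: list) -> Counter:
    """Count how many values of each Python type a list contains."""
    return Counter(type(value).__name__ for value in values)


# Made-up sample with one string hidden among the ints.
existing_ids = [101, 102, "103"]

print(type_summary(existing_ids))
print([repr(value) for value in existing_ids])  # repr() keeps the quotes on strings
```

Running the same summary on both the scraped IDs and existing_ids would show whether every element really is an int.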

question from:https://stackoverflow.com/questions/65831154/i-use-not-in-to-compare-and-objects-id-to-a-list-of-ids-comparison-seems-to-al
