regex - How can I read data from a list and index specific values into Elasticsearch, using python?

Question

posted Oct 24, 2021 in Technique by 深蓝 (71.8m points)

I used "paramiko" to connect from my PC to a devboard and execute a script. I save the output of this script in a list (output). I want to extract some values from the list and insert them into Elasticsearch. I have done it manually with the first result in the list, but how can I automate it for the rest of the values? Do I need "regex"? Please give me some clues.

Thank you

THIS IS PART OF THE CODE THAT CONNECTS TO THE DEVBOARD, EXECUTES A SCRIPT AND RETRIEVES A LIST=output

def main():
    ssh = initialize_ssh()
    stdin, stdout, stderr = ssh.exec_command('cd coral/tflite/python/examples/classification/Auto_benchmark\n python3 auto_benchmark.py')
    output = stdout.readlines()
    type(output)
    #print(type(output))
    print('\n'.join(output))
    ssh.close()

THE LIST LOOKS LIKE THIS:

labels: imagenet_labels.txt 

Model: efficientnet-edgetpu-S_quant_edgetpu.tflite 

Image: img0000.jpg 


----INFERENCE TIME----

Note: The first inference on Edge TPU is slow because it includes loading the model into Edge TPU memory.

Time: 6.2ms

Results: wall clock

Score: 0.25781

##################################### 

labels: imagenet_labels.txt 

Model: mobilenet_v1_1.0_224_quant_edgetpu.tflite 

Image: img0000.jpg 


----INFERENCE TIME----

Note: The first inference on Edge TPU is slow because it includes loading the model into Edge TPU memory.

Time: 2.8ms

Results: umbrella

Score: 0.22266

##################################### 
Temperature: 35C

THIS IS THE MAPPING THAT IS NEEDED TO INDEX DATA INTO ELASTICSEARCH

def initialize_mapping_classification(es):
    """
    Initialize the mappings
    """
    mapping_classification = {
        'properties': {
            '@timestamp': {'type': 'date'},
            'type': 'coralito',
            'Model': {'type': 'string'},
            'Time': {'type': 'float'},
            'Results': {'type': 'string'},
            'Score': {'type': 'float'},
            'Temperature': {'type': 'float'}
        }
    }

    if not es.indices.exists(CORAL):
        es.indices.create(CORAL)
        es.indices.put_mapping(body=mapping_classification, doc_type=DOC_TYPE, index=CORAL)

THIS IS MY ATTEMPT. I HAVE DONE IT MANUALLY WITH THE FIRST RESULT OF THE LIST. I WANT TO AUTOMATE IT

if CLASSIFY == 1:
                
        doc = {
            '@timestamp':  str(datetime.datetime.utcnow().strftime("%Y-%m-%d"'T'"%H:%M:%S")),
            'type': 'coralito',
            'Model': "efficientnet-edgetpu-S_quant_edgetpu.tflite",
            'Time': "6.2 ms",
            'Results': "wall clock",
            'Score': "0.25781",
            'Temperature': "35 C"
        }

        response = send_data_elasticsearch(CORAL, DOC_TYPE, doc, es)

        print(doc)

------------------------------EDIT 2---------------------------------------

So this is what my data looks like after using regex to extract the values of interest

This is what I get indexed:

This is my code:

import elasticsearch  
from elasticsearch import Elasticsearch, helpers
import datetime
import re

data = [
    'labels: imagenet_labels.txt \n', '\n',
    'Model: efficientnet-edgetpu-S_quant_edgetpu.tflite \n', '\n',
    'Image: insect.jpg \n', '\n',
    '*The first inference on Edge TPU is slow because it includes loading the model into Edge TPU memory*\n',
    'Time(ms): 23.1\n', 'Time(ms): 5.7\n', '\n', '\n',
    'Inference: corkscrew, bottle screw\n', 'Score: 0.03125 \n', '\n',
    'TPU_temp(°C): 57.05\n', '##################################### \n', '\n',
    'labels: imagenet_labels.txt \n', '\n',
    'Model: efficientnet-edgetpu-M_quant_edgetpu.tflite \n', '\n',
    'Image: insect.jpg \n', '\n',
    '*The first inference on Edge TPU is slow because it includes loading the model into Edge TPU memory*\n',
    'Time(ms): 29.3\n', 'Time(ms): 10.8\n', '\n', '\n',
    "Inference: dragonfly, darning needle, devil's darning needle, sewing needle, snake feeder, snake doctor, mosquito hawk, skeeter hawk\n", 'Score: 0.09375 \n', '\n',
    'TPU_temp(°C): 56.8\n', '##################################### \n', '\n',
    'labels: imagenet_labels.txt \n', '\n',
    'Model: efficientnet-edgetpu-L_quant_edgetpu.tflite \n', '\n',
    'Image: insect.jpg \n', '\n',
    '*The first inference on Edge TPU is slow because it includes loading the model into Edge TPU memory*\n',
    'Time(ms): 45.6\n', 'Time(ms): 31.0\n', '\n', '\n',
    'Inference: pick, plectrum, plectron\n', 'Score: 0.09766 \n', '\n',
    'TPU_temp(°C): 57.55\n', '##################################### \n', '\n',
    'labels: imagenet_labels.txt \n', '\n',
    'Model: inception_v3_299_quant_edgetpu.tflite \n', '\n',
    'Image: insect.jpg \n', '\n',
    '*The first inference on Edge TPU is slow because it includes loading the model into Edge TPU memory*\n',
    'Time(ms): 68.8\n', 'Time(ms): 51.3\n', '\n', '\n',
    'Inference: ringlet, ringlet butterfly\n', 'Score: 0.48047 \n', '\n',
    'TPU_temp(°C): 57.3\n', '##################################### \n', '\n',
    'labels: imagenet_labels.txt \n', '\n',
    'Model: inception_v4_299_quant_edgetpu.tflite \n', '\n',
    'Image: insect.jpg \n', '\n',
    '*The first inference on Edge TPU is slow because it includes loading the model into Edge TPU memory*\n',
    'Time(ms): 121.8\n', 'Time(ms): 101.2\n', '\n', '\n',
    'Inference: admiral\n', 'Score: 0.59375 \n', '\n',
    'TPU_temp(°C): 57.05\n', '##################################### \n', '\n',
    'labels: imagenet_labels.txt \n', '\n',
    'Model: inception_v2_224_quant_edgetpu.tflite \n', '\n',
    'Image: insect.jpg \n', '\n',
    '*The first inference on Edge TPU is slow because it includes loading the model into Edge TPU memory*\n',
    'Time(ms): 34.3\n', 'Time(ms): 16.6\n', '\n', '\n',
    'Inference: lycaenid, lycaenid butterfly\n', 'Score: 0.41406 \n', '\n',
    'TPU_temp(°C): 57.3\n', '##################################### \n', '\n',
    'labels: imagenet_labels.txt \n', '\n',
    'Model: mobilenet_v2_1.0_224_quant_edgetpu.tflite \n', '\n',
    'Image: insect.jpg \n', '\n',
    '*The first inference on Edge TPU is slow because it includes loading the model into Edge TPU memory*\n',
    'Time(ms): 14.4\n', 'Time(ms): 3.3\n', '\n', '\n',
    'Inference: leatherback turtle, leatherback, leathery turtle, Dermochelys coriacea\n', 'Score: 0.36328 \n', '\n',
    'TPU_temp(°C): 57.3\n', '##################################### \n', '\n',
    'labels: imagenet_labels.txt \n', '\n',
    'Model: mobilenet_v1_1.0_224_quant_edgetpu.tflite \n', '\n',
    'Image: insect.jpg \n', '\n',
    '*The first inference on Edge TPU is slow because it includes loading the model into Edge TPU memory*\n',
    'Time(ms): 14.5\n', 'Time(ms): 3.0\n', '\n', '\n',
    'Inference: bow tie, bow-tie, bowtie\n', 'Score: 0.33984 \n', '\n',
    'TPU_temp(°C): 57.3\n', '##################################### \n', '\n',
    'labels: imagenet_labels.txt \n', '\n',
    'Model: inception_v1_224_quant_edgetpu.tflite \n', '\n',
    'Image: insect.jpg \n', '\n',
    '*The first inference on Edge TPU is slow because it includes loading the model into Edge TPU memory*\n',
    'Time(ms): 21.2\n', 'Time(ms): 3.6\n', '\n', '\n',
    'Inference: pick, plectrum, plectron\n', 'Score: 0.17578 \n', '\n',
    'TPU_temp(°C): 57.3\n', '##################################### \n', '\n',
]


# declare a client instance of the Python Elasticsearch library
client = Elasticsearch("http://localhost:9200")

#using regex 
regex = re.compile(r'(\w+)\((.+)\):\s(.*)|(\w+:)\s(.*)')
match_regex = list(filter(regex.match, data))
match = [line.rstrip('\n') for line in match_regex]


#using "bulk"
def yield_docs():
    """
    Generate the documents for the bulk request
    """
    
    doc_source = {
        "data": match
        
        }

    # use a yield generator so that the doc data isn't loaded into memory
    yield {
        "_index": "coralito",
        "_type": "coralote",
        "_source": doc_source
        }

try:
    # make the bulk call using 'actions' and get a response
    resp = helpers.bulk(
        client,
        yield_docs()
    )
    print ("\nhelpers.bulk() RESPONSE:", resp)
    print ("RESPONSE TYPE:", type(resp))
except Exception as err:
    print("\nhelpers.bulk() ERROR:", err)

-----------------------------EDIT 3---------------------


1 Reply

深蓝 · Answer 1 · 2021-10-23T18:25:56+0000

1. Remove the line breaks.
2. Split the text by a common delimiter (----INFERENCE TIME---- would be a good start, I think).
3. Extract the keys and values using, for example, r'(\w+:)\s(.*)' or a positive lookbehind such as r'(?<=Note: ).*'.
4. Parse the numeric values (time, score, temperature, ...) -- you'll thank me later ;)
5. Extend the Model mapping with a keyword datatype -- otherwise the dot will be tokenized away and you'll wonder why you can't search for exact matches or aggregate on it.
6. Prepare the objects that you'll want to sync.
7. Bulk upload to Elasticsearch.
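
The steps above can be sketched roughly as follows. This is a minimal sketch, not a definitive implementation, and it rests on several assumptions. It expects the output format shown in EDIT 2 of the question ("Key: value" or "Key(unit): value" lines, with a row of # characters closing each run), and it splits on that # row rather than on ----INFERENCE TIME----, since the EDIT 2 output does not contain the latter. The names LINE_RE, SEPARATOR_RE, NUMERIC_KEYS, MAPPING, parse_records and build_actions are made up for illustration. The text/keyword field types assume a recent Elasticsearch version in which they replaced the older string type.

```python
import datetime
import re

# Matches "Key: value" and "Key(unit): value", e.g. "Time(ms): 5.7".
LINE_RE = re.compile(r'^(\w+)(?:\((.+?)\))?:\s*(.*)$')
# A row of '#' characters closes one benchmark run.
SEPARATOR_RE = re.compile(r'^#{5,}\s*$')
# Keys whose values should be stored as numbers (assumed from the sample output).
NUMERIC_KEYS = {'Time', 'Score', 'TPU_temp'}

# Hypothetical mapping: "Model" gets a keyword sub-field for exact matches
# and aggregations, as the answer suggests.
MAPPING = {
    'properties': {
        '@timestamp': {'type': 'date'},
        'Model': {'type': 'text', 'fields': {'keyword': {'type': 'keyword'}}},
        'Time': {'type': 'float'},
        'Results': {'type': 'text'},
        'Score': {'type': 'float'},
        'Temperature': {'type': 'float'},
    }
}


def parse_records(lines):
    """Group the output lines into one dict per benchmark run."""
    records, current = [], {}
    for raw in lines:
        line = raw.strip()  # removes the trailing line break and spaces
        if SEPARATOR_RE.match(line):
            if current:
                records.append(current)
            current = {}
            continue
        m = LINE_RE.match(line)
        if not m:
            continue  # blank lines and the "*The first inference...*" note
        key, _unit, value = m.groups()
        if key in NUMERIC_KEYS:
            # A later value overwrites an earlier one, so the second
            # "Time(ms)" line of each run is the one that is kept.
            current[key] = float(value)
        else:
            current[key] = value
    if current:
        records.append(current)
    return records


def build_actions(records, index='coralito'):
    """Yield one action dict per record for a bulk request."""
    now = datetime.datetime.now(datetime.timezone.utc)
    timestamp = now.strftime('%Y-%m-%dT%H:%M:%S')
    for rec in records:
        yield {
            '_index': index,
            '_source': {
                '@timestamp': timestamp,
                'Model': rec.get('Model'),
                'Time': rec.get('Time'),
                'Results': rec.get('Inference'),
                'Score': rec.get('Score'),
                'Temperature': rec.get('TPU_temp'),
            },
        }
```

With the data list and client from the question, the upload would then be something like helpers.bulk(client, build_actions(parse_records(data))), with MAPPING supplied when the index is created. Keeping only the second Time(ms) value is a design choice based on the note that the first inference is slow; adjust it if both timings matter. On older clusters that still use document types, a "_type" entry may also be needed in each action.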
