Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


regex - How can I read data from a list and index specific values into Elasticsearch, using Python?

I used "paramiko" to connect from my PC to a devboard and execute a script, and I save the script's output in a list (output). I want to extract some values from the list and insert them into Elasticsearch. I have done this manually for the first result in the list, but how can I automate it for the rest of the values? Do I need "regex"? Please give me some clues.

Thank you

THIS IS PART OF THE CODE THAT CONNECTS TO THE DEVBOARD, EXECUTES A SCRIPT AND RETRIEVES A LIST=output

def main():
    ssh = initialize_ssh()
    stdin, stdout, stderr = ssh.exec_command('cd coral/tflite/python/examples/classification/Auto_benchmark\n python3 auto_benchmark.py')
    output = stdout.readlines()
    type(output)
    #print(type(output))
    print('\n'.join(output))
    ssh.close()

THE LIST LOOKS LIKE THIS:

labels: imagenet_labels.txt 

Model: efficientnet-edgetpu-S_quant_edgetpu.tflite 

Image: img0000.jpg 


----INFERENCE TIME----

Note: The first inference on Edge TPU is slow because it includes loading the model into Edge TPU memory.

Time: 6.2ms

Results: wall clock

Score: 0.25781

##################################### 

labels: imagenet_labels.txt 

Model: mobilenet_v1_1.0_224_quant_edgetpu.tflite 

Image: img0000.jpg 


----INFERENCE TIME----

Note: The first inference on Edge TPU is slow because it includes loading the model into Edge TPU memory.

Time: 2.8ms

Results: umbrella

Score: 0.22266

##################################### 
Temperature: 35C

THIS IS THE MAPPING THAT IS NEEDED TO INDEX DATA INTO ELASTICSEARCH

def initialize_mapping_classification(es):
    """
    Initialize the mappings
    """
    mapping_classification = {
        'properties': {
            '@timestamp': {'type': 'date'},
            'type': 'coralito',
            'Model': {'type': 'string'},
            'Time': {'type': 'float'},
            'Results': {'type': 'string'},
            'Score': {'type': 'float'},
            'Temperature': {'type': 'float'}
        }
    }

    if not es.indices.exists(CORAL):
        es.indices.create(CORAL)
        es.indices.put_mapping(body=mapping_classification, doc_type=DOC_TYPE, index=CORAL)

THIS IS MY ATTEMPT. I HAVE DONE IT MANUALLY WITH THE FIRST RESULT OF THE LIST. I WANT TO AUTOMATE IT

if CLASSIFY == 1:
                
        doc = {
            '@timestamp':  str(datetime.datetime.utcnow().strftime("%Y-%m-%d"'T'"%H:%M:%S")),
            'type': 'coralito',
            'Model': "efficientnet-edgetpu-S_quant_edgetpu.tflite",
            'Time': "6.2 ms",
            'Results': "wall clock",
            'Score': "0.25781",
            'Temperature': "35 C"
        }

        response = send_data_elasticsearch(CORAL, DOC_TYPE, doc, es)

        print(doc)

------------------------------EDIT 2---------------------------------------

This is what my data looks like after using regex to extract the values of interest:

[image]

This is what gets indexed:

[image]

This is my code:

import elasticsearch  
from elasticsearch import Elasticsearch, helpers
import datetime
import re

data = [
    'labels: imagenet_labels.txt \n', '\n',
    'Model: efficientnet-edgetpu-S_quant_edgetpu.tflite \n', '\n',
    'Image: insect.jpg \n', '\n',
    '*The first inference on Edge TPU is slow because it includes loading the model into Edge TPU memory*\n',
    'Time(ms): 23.1\n', 'Time(ms): 5.7\n', '\n', '\n',
    'Inference: corkscrew, bottle screw\n',
    'Score: 0.03125 \n', '\n',
    'TPU_temp(°C): 57.05\n',
    '##################################### \n', '\n',
    'labels: imagenet_labels.txt \n', '\n',
    'Model: efficientnet-edgetpu-M_quant_edgetpu.tflite \n', '\n',
    'Image: insect.jpg \n', '\n',
    '*The first inference on Edge TPU is slow because it includes loading the model into Edge TPU memory*\n',
    'Time(ms): 29.3\n', 'Time(ms): 10.8\n', '\n', '\n',
    "Inference: dragonfly, darning needle, devil's darning needle, sewing needle, snake feeder, snake doctor, mosquito hawk, skeeter hawk\n",
    'Score: 0.09375 \n', '\n',
    'TPU_temp(°C): 56.8\n',
    '##################################### \n', '\n',
    'labels: imagenet_labels.txt \n', '\n',
    'Model: efficientnet-edgetpu-L_quant_edgetpu.tflite \n', '\n',
    'Image: insect.jpg \n', '\n',
    '*The first inference on Edge TPU is slow because it includes loading the model into Edge TPU memory*\n',
    'Time(ms): 45.6\n', 'Time(ms): 31.0\n', '\n', '\n',
    'Inference: pick, plectrum, plectron\n',
    'Score: 0.09766 \n', '\n',
    'TPU_temp(°C): 57.55\n',
    '##################################### \n', '\n',
    'labels: imagenet_labels.txt \n', '\n',
    'Model: inception_v3_299_quant_edgetpu.tflite \n', '\n',
    'Image: insect.jpg \n', '\n',
    '*The first inference on Edge TPU is slow because it includes loading the model into Edge TPU memory*\n',
    'Time(ms): 68.8\n', 'Time(ms): 51.3\n', '\n', '\n',
    'Inference: ringlet, ringlet butterfly\n',
    'Score: 0.48047 \n', '\n',
    'TPU_temp(°C): 57.3\n',
    '##################################### \n', '\n',
    'labels: imagenet_labels.txt \n', '\n',
    'Model: inception_v4_299_quant_edgetpu.tflite \n', '\n',
    'Image: insect.jpg \n', '\n',
    '*The first inference on Edge TPU is slow because it includes loading the model into Edge TPU memory*\n',
    'Time(ms): 121.8\n', 'Time(ms): 101.2\n', '\n', '\n',
    'Inference: admiral\n',
    'Score: 0.59375 \n', '\n',
    'TPU_temp(°C): 57.05\n',
    '##################################### \n', '\n',
    'labels: imagenet_labels.txt \n', '\n',
    'Model: inception_v2_224_quant_edgetpu.tflite \n', '\n',
    'Image: insect.jpg \n', '\n',
    '*The first inference on Edge TPU is slow because it includes loading the model into Edge TPU memory*\n',
    'Time(ms): 34.3\n', 'Time(ms): 16.6\n', '\n', '\n',
    'Inference: lycaenid, lycaenid butterfly\n',
    'Score: 0.41406 \n', '\n',
    'TPU_temp(°C): 57.3\n',
    '##################################### \n', '\n',
    'labels: imagenet_labels.txt \n', '\n',
    'Model: mobilenet_v2_1.0_224_quant_edgetpu.tflite \n', '\n',
    'Image: insect.jpg \n', '\n',
    '*The first inference on Edge TPU is slow because it includes loading the model into Edge TPU memory*\n',
    'Time(ms): 14.4\n', 'Time(ms): 3.3\n', '\n', '\n',
    'Inference: leatherback turtle, leatherback, leathery turtle, Dermochelys coriacea\n',
    'Score: 0.36328 \n', '\n',
    'TPU_temp(°C): 57.3\n',
    '##################################### \n', '\n',
    'labels: imagenet_labels.txt \n', '\n',
    'Model: mobilenet_v1_1.0_224_quant_edgetpu.tflite \n', '\n',
    'Image: insect.jpg \n', '\n',
    '*The first inference on Edge TPU is slow because it includes loading the model into Edge TPU memory*\n',
    'Time(ms): 14.5\n', 'Time(ms): 3.0\n', '\n', '\n',
    'Inference: bow tie, bow-tie, bowtie\n',
    'Score: 0.33984 \n', '\n',
    'TPU_temp(°C): 57.3\n',
    '##################################### \n', '\n',
    'labels: imagenet_labels.txt \n', '\n',
    'Model: inception_v1_224_quant_edgetpu.tflite \n', '\n',
    'Image: insect.jpg \n', '\n',
    '*The first inference on Edge TPU is slow because it includes loading the model into Edge TPU memory*\n',
    'Time(ms): 21.2\n', 'Time(ms): 3.6\n', '\n', '\n',
    'Inference: pick, plectrum, plectron\n',
    'Score: 0.17578 \n', '\n',
    'TPU_temp(°C): 57.3\n',
    '##################################### \n', '\n',
]


# declare a client instance of the Python Elasticsearch library
client = Elasticsearch("http://localhost:9200")

#using regex 
regex = re.compile(r'(\w+)\((.+)\):\s(.*)|(\w+:)\s(.*)')
match_regex = list(filter(regex.match, data))
match = [line.rstrip('\n') for line in match_regex]


#using "bulk"
def yield_docs():
    """
    Initialize the mappings
    """
    
    doc_source = {
        "data": match
        
        }

    # use a yield generator so that the doc data isn't loaded into memory
    yield {
        "_index": "coralito",
        "_type": "coralote",
        "_source": doc_source
        }

try:
    # make the bulk call using 'actions' and get a response
    resp = helpers.bulk(
        client,
        yield_docs()
    )
    print ("\nhelpers.bulk() RESPONSE:", resp)
    print ("RESPONSE TYPE:", type(resp))
except Exception as err:
    print("\nhelpers.bulk() ERROR:", err)



1 Reply

  1. Remove the line breaks.
  2. Split the text by a common delimiter (----INFERENCE TIME---- would be a good start, I think).
  3. Extract the keys and values using, for example, r'(\w+:)\s(.*)' or a positive lookbehind such as r'(?<=Note: ).*'.
  4. Parse the numeric values (time, score, temperature, ...) into numbers -- you'll thank me later ;)
  5. Extend the Model mapping with a keyword datatype; otherwise the model name will be tokenized and you'll wonder why you can't search for exact matches or aggregate on it.
  6. Prepare the objects that you want to sync.
  7. Bulk upload to Elasticsearch.
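
The steps above could be sketched roughly as follows. This is only an illustration, assuming the output format from EDIT 2 of the question: it splits on the ##### separator rather than on ----INFERENCE TIME----, because that is what closes each block there. The names parse_blocks, to_actions, LINE_RE, NUMERIC_KEYS and MODEL_MAPPING are made up for this example, and the mapping fragment assumes Elasticsearch 5 or later, where the keyword type exists.

```python
import datetime
import re

# Matches "Key: value" and "Key(unit): value"; the unit in parentheses is optional.
LINE_RE = re.compile(r'^(\w+)(?:\(([^)]*)\))?:\s*(.*)$')

# Fields converted to float (step 4). The names follow the sample output in the question.
NUMERIC_KEYS = {'Time', 'Score', 'TPU_temp'}

# Step 5 (assumption: Elasticsearch 5+): keep Model searchable as text and add
# an exact-match keyword sub-field for term queries and aggregations.
MODEL_MAPPING = {'type': 'text', 'fields': {'keyword': {'type': 'keyword'}}}


def parse_blocks(lines):
    """Return one dict per model run found in the list of output lines."""
    docs, current = [], {}
    for raw in lines:
        line = raw.strip()                    # step 1: drop line breaks and padding
        if line.startswith('#####'):          # step 2: a separator closes the block
            if current:
                docs.append(current)
            current = {}
            continue
        match = LINE_RE.match(line)           # step 3: extract key and value
        if not match:
            continue                          # blank lines, the "*...*" note, etc.
        key, _unit, value = match.groups()
        if key in NUMERIC_KEYS:
            value = float(value)              # step 4: parse numeric values
        if key == 'Time':
            # Two timings are printed per model, so collect them in a list.
            current.setdefault('Time', []).append(value)
        else:
            current[key] = value
    if current:                               # last block without a trailing separator
        docs.append(current)
    return docs


def to_actions(docs, index='coralito'):
    """Yield one bulk action per parsed block (steps 6 and 7)."""
    stamp = datetime.datetime.now(datetime.timezone.utc).strftime('%Y-%m-%dT%H:%M:%S')
    for doc in docs:
        yield {'_index': index, '_source': {'@timestamp': stamp, **doc}}


sample = [
    'Model: inception_v4_299_quant_edgetpu.tflite \n', '\n',
    'Time(ms): 121.8\n', 'Time(ms): 101.2\n',
    'Inference: admiral\n', 'Score: 0.59375 \n',
    'TPU_temp(°C): 57.05\n', '##################################### \n',
]
for action in to_actions(parse_blocks(sample)):
    print(action)
```

The generator can then be handed to the same call the question already uses, helpers.bulk(client, to_actions(parse_blocks(data))), so that each model run becomes its own document instead of one document holding every line. The "_type" key from the question is left out here; whether you need it depends on your Elasticsearch version.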
