Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

python - Reset index name in Elasticsearch DSL

I'm trying to create an ETL that extracts from Mongo, processes the data, and loads it into Elastic.

I will do a daily load so I thought of naming my index with the current date.

This will help with later processing I need to do on this first index.

I followed the elasticsearch-dsl guide (https://elasticsearch-dsl.readthedocs.io/en/latest/persistence.html). My problem comes from my limited experience working with classes.

I don't know how to reset the Index name from the class.

Here is my code for the class (custom_indices.py):

import datetime

from elasticsearch_dsl import Date, Document, Keyword, Text


class News(Document):
    title = Text(analyzer='standard', fields={'raw': Keyword()})
    manual_tagging = Keyword()
    # Assumed field: is_published() reads self.processed, which the
    # original snippet never declared.
    processed = Date()

    class Index:
        # Evaluated once, when this module is first imported.
        name = 'processed_news_' + datetime.datetime.now().strftime("%Y%m%d")

    def save(self, **kwargs):
        return super(News, self).save(**kwargs)

    def is_published(self):
        # The module is imported as "datetime", so the class is datetime.datetime.
        return datetime.datetime.now() >= self.processed

And this is the part of the code where I create instances of that class:

from custom_indices import News
from elasticsearch_dsl.connections import connections
import pandas as pd

connections.create_connection(hosts=['localhost'])
News.init()  # create the index and its mappings

# df is a pandas DataFrame built earlier in the ETL (not shown here)
for index, doc in df.iterrows():
    new_insert = News(
        meta={'id': doc.url_hashed},
        title=doc.title,
        manual_tagging=doc.customTags,
    )
    new_insert.save()

Every time I call the "News" class I would expect to have a new name.

However, the name doesn't change even if I load the class again (from custom_indices import News).

I know this is only a problem I have when testing but I'd like to know how to force that "reset".

Actually, I originally wanted to change the name outside the class with this line right before the loop:

News.Index.name = "NEW_NAME"

However, that didn't work.

I was still seeing the name defined on the class.

Could anyone please assist?

Many thanks!

PS: this must be just an object-oriented programming issue.

Apologies for my ignorance on the subject.

Asked by raul; translated from Stack Overflow.

1 Reply

Maybe you could take advantage of the fact that Document.init() accepts an index keyword argument.
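As a rough sketch of how that keyword could be used: build the name at call time and pass it in explicitly, rather than relying on the value fixed in the class body. The helper daily_index_name below is a made-up name, not part of elasticsearch-dsl. The commented usage assumes the News document from the question and, as far as I know, that save() accepts the same index keyword as init().

```python
import datetime


def daily_index_name(prefix="processed_news_", day=None):
    """Build an index name such as 'processed_news_20200105'.

    The date is looked up when the function is called, not when the
    module is imported, so each run gets the current day.
    """
    day = day or datetime.date.today()
    return prefix + day.strftime("%Y%m%d")


# Hypothetical usage with the News document from the question
# (needs elasticsearch-dsl and a running cluster, so it is left commented out):
#
#   name = daily_index_name()
#   News.init(index=name)        # create the index and mappings under that name
#   new_insert.save(index=name)  # write the document to the same index
```

Passing the name around explicitly also makes it easy to load into a differently named index during testing.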

If you want the index name to get set automatically, you could implement init() in the News class and call super().init(...) in your implementation.

A simplified example (Python 3.x):

from elasticsearch_dsl import Document
from elasticsearch_dsl.connections import connections
import datetime


class News(Document):
    @classmethod
    def init(cls, index=None, using=None):
        index_name = index or 'processed_news_' + datetime.datetime.now().strftime("%Y%m%d")
        return super().init(index=index_name, using=using)
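On why News.Index.name = "NEW_NAME" had no visible effect: as far as I can tell, elasticsearch-dsl reads the inner Index class once, when the News class is created, and keeps its own copy of the name. The plain-Python analog below illustrates that pattern only; it is not the library's actual code, and CapturingMeta, Doc and target_index are invented for the illustration.

```python
class CapturingMeta(type):
    """Copies Index.name into _index_name once, when the class is created."""

    def __new__(mcls, name, bases, attrs):
        cls = super().__new__(mcls, name, bases, attrs)
        opts = attrs.get("Index")
        # Snapshot taken here; later changes to Index.name are not seen.
        cls._index_name = getattr(opts, "name", None)
        return cls


class Doc(metaclass=CapturingMeta):
    def target_index(self, index=None):
        # An explicit argument wins over the snapshot taken at class creation.
        return index or self._index_name


class News(Doc):
    class Index:
        name = "processed_news_20200105"


News.Index.name = "NEW_NAME"  # changes the inner class only
print(News().target_index())            # still the snapshot from class creation
print(News().target_index("NEW_NAME"))  # the explicit name is used
```

If the real library behaves the same way, overriding init() only decides which index gets created, so the same name probably has to be passed to save(index=...) as well. For the testing case in the question, importlib.reload(custom_indices) followed by re-running the from-import should re-execute the class body, since a plain repeated import just returns the module already cached in sys.modules.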
