I'm trying to create an ETL that extracts data from Mongo, processes it, and loads it into Elasticsearch.
I will do a daily load, so I thought of naming my index with the current date.
This will help with later processing I need to do on this first index.
I followed the elasticsearch-dsl guide: https://elasticsearch-dsl.readthedocs.io/en/latest/persistence.html
The problem I have comes from my limited experience working with classes.
I don't know how to reset the index name from the class.
Here is my code for the class (custom_indices.py):
from elasticsearch_dsl import Document, Date, Integer, Keyword, Text
from elasticsearch_dsl.connections import connections
from elasticsearch_dsl import Search
import datetime


class News(Document):
    title = Text(analyzer='standard', fields={'raw': Keyword()})
    manual_tagging = Keyword()
    # Assumed field: is_published() reads self.processed, which was not declared.
    processed = Date()

    class Index:
        # Evaluated once, when this module is first imported.
        name = 'processed_news_' + datetime.datetime.now().strftime("%Y%m%d")

    def save(self, **kwargs):
        return super(News, self).save(**kwargs)

    def is_published(self):
        # The module is imported as `datetime`, so the class is datetime.datetime.
        return datetime.datetime.now() >= self.processed
And this is the part of the code where I create instances of that class:
from custom_indices import News
import elasticsearch
import elasticsearch_dsl
from elasticsearch_dsl.connections import connections
import pandas as pd
import datetime

connections.create_connection(hosts=['localhost'])
News.init()  # create the index and its mapping in Elasticsearch

# df is a pandas DataFrame built earlier in the ETL (not shown here)
for index, doc in df.iterrows():
    new_insert = News(
        meta={'id': doc.url_hashed},
        title=doc.title,
        manual_tagging=doc.customTags,
    )
    new_insert.save()
Every time I use the News class, I would expect the index to get a new name.
However, the name doesn't change even if I import the class again (from custom_indices import News).
I know this is only a problem when testing, but I'd like to know how to force that "reset".
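For context on the re-import behaviour described above, here is a minimal sketch. It assumes a throwaway module named demo_indices (hypothetical, standing in for custom_indices.py) and uses a random suffix instead of the date so that a re-run of the class body is visible. It only illustrates Python's module caching; it does not touch Elasticsearch.

```python
import importlib
import pathlib
import sys
import tempfile

# Write a hypothetical stand-in for custom_indices.py into a temp directory.
tmp_dir = tempfile.mkdtemp()
pathlib.Path(tmp_dir, "demo_indices.py").write_text(
    "import uuid\n"
    "class News:\n"
    "    class Index:\n"
    "        # Runs once per execution of the module body.\n"
    "        name = 'processed_news_' + uuid.uuid4().hex\n"
)
sys.path.insert(0, tmp_dir)
importlib.invalidate_caches()

import demo_indices
first = demo_indices.News.Index.name

# A second import finds the module in sys.modules and does not re-run it.
import demo_indices
second = demo_indices.News.Index.name

# importlib.reload() executes the module body again, so the name is rebuilt.
reloaded = importlib.reload(demo_indices)
third = reloaded.News.Index.name

print(first == second)  # same name after a plain re-import
print(first == third)   # different name after reload
```

After a reload, any name previously bound with `from demo_indices import News` still points to the old class until that import statement runs again.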
Actually, I originally wanted to change the name from outside the class with this line, right before the loop:
News.Index.name = "NEW_NAME"
However, that didn't work.
I was still seeing the name defined in the class.
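One plausible explanation for the assignment having no effect is that a metaclass copies the settings from the inner Index class once, when the document class is created, so later changes to News.Index.name are never read. The sketch below is a simplified stand-in, not elasticsearch-dsl's actual code: FakeDocumentMeta, FakeDocument, and daily_index_name are hypothetical names introduced only for illustration.

```python
import datetime


def daily_index_name(prefix="processed_news_", day=None):
    """Build the index name when called, not when the module is imported."""
    day = day or datetime.date.today()
    return prefix + day.strftime("%Y%m%d")


class FakeDocumentMeta(type):
    """Hypothetical stand-in for a document metaclass."""

    def __new__(mcls, name, bases, attrs):
        cls = super().__new__(mcls, name, bases, attrs)
        opts = attrs.get("Index")
        # Copy the name once, at class creation time.
        cls._index_name = getattr(opts, "name", None)
        return cls


class FakeDocument(metaclass=FakeDocumentMeta):
    def save(self, index=None):
        # Return the index this document would be written to.
        return index or type(self)._index_name


class News(FakeDocument):
    class Index:
        name = "processed_news_20200101"


# Changing the inner class afterwards does not update the copied value.
News.Index.name = "NEW_NAME"
target_after_assignment = News().save()

# Passing the name explicitly at call time sidesteps the copied value.
target_explicit = News().save(index=daily_index_name(day=datetime.date(2020, 1, 2)))

print(target_after_assignment)
print(target_explicit)
```

As far as I know, elasticsearch-dsl's Document.init() and Document.save() accept an `index` keyword argument that could be used the same way, but that is worth confirming against the documentation for the installed version.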
Could anyone please assist?
Many thanks!
PS: This must be just an object-oriented programming issue.
Apologies for my ignorance on the subject.
Asked by raul (translated from SO).