Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


python - Use separate environ and sys.path between dags

* TLDR: This question was originally based on a problem that was later determined to be caused by the issue named in the updated title of this question. Skip to "Update 2" for the most relevant details.

I have a DAG file that imports a Python list of dicts from another Python file in another location and creates a DAG based on the values in those dicts. Airflow is behaving strangely: it appears to see something different from what I see when I run the DAG file manually. A snippet like...

...
import sys
from os import environ

environ["PROJECT_HOME"] = "/path/to/some/project/files"
# import certain project files
sys.path.append(environ["PROJECT_HOME"])
import tables as tt

tables = tt.tables

for table in tables:
    print(table)
    assert isinstance(table, dict)
    # <create some dag task 1>
    # <create some dag task 2>
    ...

When I run the .py file manually from the ~/airflow/dag/ dir, no errors are thrown and the for loop prints the dicts, but Airflow apparently sees things differently, both in the webserver and when running airflow list_dags. Running airflow list_dags, I get the error

    assert isinstance(table, dict)
AssertionError

and I don't know how to test what is causing this. Again, when I run the .py file manually from the DAG location there is no problem and the print statement shows dicts, and the webserver UI shows no further error message.

Anyone know what could be going on here? Maybe something about how the imports are working?

* Update 1:

I am seeing more weirdness: when the DAG file calls functions from the imported Python module, everything works when I run the file manually, but airflow list_dags says...

AttributeError: 'module' object has no attribute 'my_func'

which makes me suspect some import weirdness even more. This is the exact same process I use in another DAG file (i.e. setting some environ values and appending to sys.path) to import modules for that DAG, and I have no problems there.

* Update 2:

After printing various sys.path, environ, and module.__all__ info at the failing assert, the problem appears to be that the module being imported is a similarly named module from another project, one that I set up with this exact same procedure. I.e. I have another file that does...

...
import sys
from os import environ

environ["PROJECT_HOME"] = "/path/to/some/project/files"
# import certain project files
sys.path.append(environ["PROJECT_HOME"])
import tables as tt

tables = tt.tables

for table in tables:
    print(table)
    assert isinstance(table, dict)
    # <create some dag task 1>
    # <create some dag task 2>
    ...

and that project's home is being used instead to load a similarly named module, which also has an object with the name I was expecting. This happens even when I insert the intended project's folder at the front of sys.path. Other than making packaged DAGs, is there a way to keep Airflow from combining the environ and sys.path values of different DAGs (since I use $PROJECT_HOME in various bash and Python task scripts)?
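
One likely mechanism behind this, assuming Airflow parses both DAG files inside a single Python process: Python caches imported modules in sys.modules by name, so a second import of the same name returns the cached module without consulting sys.path at all. The sketch below reproduces that behaviour with two throwaway project folders. The names project_tables, make_project, first_project and second_project are hypothetical stand-ins and do not come from the question.

```python
import importlib
import os
import sys
import tempfile


def make_project(root, name, tables_literal):
    """Create a throwaway project folder containing a project_tables.py file."""
    project_dir = os.path.join(root, name)
    os.makedirs(project_dir)
    with open(os.path.join(project_dir, "project_tables.py"), "w") as handle:
        handle.write("tables = " + tables_literal + "\n")
    return project_dir


root = tempfile.mkdtemp()
first_home = make_project(root, "first_project", "['plain', 'strings']")
second_home = make_project(root, "second_project", "[{'name': 'orders'}]")
importlib.invalidate_caches()

# What the first DAG file does: extend sys.path, then import by bare name.
sys.path.append(first_home)
import project_tables as tt_first

# What the second DAG file does later in the same process. Its folder goes
# to the front of sys.path, yet the import is answered from sys.modules.
sys.path.insert(0, second_home)
import project_tables as tt_second

same_module = tt_second is tt_first      # the cached module is reused
loaded_from = tt_second.__file__         # points into first_project

# Dropping the cache entry forces a fresh search of sys.path.
del sys.modules["project_tables"]
import project_tables as tt_fresh
```

If this is what is happening, reordering sys.path cannot help on its own, because the name collision is resolved before sys.path is ever searched. os.environ is process-wide in the same way, so the last DAG file to set PROJECT_HOME wins for anything read at parse time.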



1 Reply


To bring in specific modules from other paths, I use the solution here, which imports other Python modules by specifying their absolute file path.
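
The linked solution is not reproduced on this page, so the following is only a minimal sketch of the general technique, not necessarily what that solution does. It uses importlib.util from Python 3; on Python 2, which the question's print statement suggests, imp.load_source(name, path) plays a similar role. The helper load_module_from_path, the module names project_a_tables and project_b_tables, and the demo folders are all hypothetical.

```python
import importlib.util
import os
import sys
import tempfile


def load_module_from_path(unique_name, file_path):
    """Load the file at file_path as a module registered under unique_name."""
    spec = importlib.util.spec_from_file_location(unique_name, file_path)
    module = importlib.util.module_from_spec(spec)
    # Register under the unique name so two tables.py files cannot collide.
    sys.modules[unique_name] = module
    spec.loader.exec_module(module)
    return module


def make_project(root, name, tables_literal):
    """Create a throwaway project folder containing a tables.py file."""
    project_dir = os.path.join(root, name)
    os.makedirs(project_dir)
    with open(os.path.join(project_dir, "tables.py"), "w") as handle:
        handle.write("tables = " + tables_literal + "\n")
    return project_dir


demo_root = tempfile.mkdtemp()
project_a = make_project(demo_root, "project_a", "[{'name': 'a'}]")
project_b = make_project(demo_root, "project_b", "['not', 'dicts']")

# Each DAG file would load its own tables.py under its own module name,
# without touching sys.path.
tables_a = load_module_from_path(
    "project_a_tables", os.path.join(project_a, "tables.py"))
tables_b = load_module_from_path(
    "project_b_tables", os.path.join(project_b, "tables.py"))
```

Because each file is loaded by path and stored under a distinct name, neither DAG file depends on the order of sys.path or on what another DAG file imported earlier.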

For running various python scripts as airflow tasks using different python interpreters, I do something like...

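from datetime import timedelta
from airflow.operators.bash_operator import BashOperator  # Airflow 1.x import path

# `dag` is the DAG object defined elsewhere in this DAG file
do_stuff_a = BashOperator(
        task_id='my_task_a',
        bash_command='/path/to/virtualenv_a/bin/python /path/to/script_a.py',
        execution_timeout=timedelta(minutes=30),
        dag=dag)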

as done in a similar question here.
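
The reason this approach sidesteps the shared environ problem is that each script runs in its own child process, with its own interpreter, sys.path and environment. The sketch below illustrates that isolation using only the standard library, as a stand-in for what a bash task does. The helper run_with_project_home and the project paths are hypothetical.

```python
import os
import subprocess
import sys


def run_with_project_home(project_home):
    """Run a child interpreter that sees its own PROJECT_HOME value."""
    # Copy the parent environment and override one variable for this child only.
    child_env = dict(os.environ, PROJECT_HOME=project_home)
    result = subprocess.run(
        [sys.executable, "-c", "import os; print(os.environ['PROJECT_HOME'])"],
        env=child_env,
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return result.stdout.strip()


seen_a = run_with_project_home("/path/to/project_a")
seen_b = run_with_project_home("/path/to/project_b")
```

BashOperator also takes an env mapping that is passed to the child process, so a per-task PROJECT_HOME can probably be set the same way instead of through os.environ in the DAG file. Check how env interacts with inherited variables in the installed Airflow version before relying on it.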

