* TLDR: This question was originally about a problem that was later determined to be the issue described in the updated title. Skip to "Update 2" for the most relevant details.
I have a DAG file that imports a Python list of dicts from another Python file in another location and creates a DAG based on the list's dict values, and Airflow is having a weird problem where it appears to see something different from what I see when I run the DAG file manually. Some snippet like...
```python
from os import environ
import sys

# ...
environ["PROJECT_HOME"] = "/path/to/some/project/files"
# import certain project files
sys.path.append(environ["PROJECT_HOME"])
import tables as tt

tables = tt.tables
for table in tables:
    print table
    assert isinstance(table, dict)
    # <create some dag task 1>
    # <create some dag task 2>
# ...
```
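For context on why the assert can fail under the scheduler but not when the file is run alone: Airflow imports every DAG file into the same interpreter process, so each file's `sys.path` additions persist, and the earliest matching entry wins the import. A self-contained sketch of that shadowing (temp dirs stand in for the two project homes; written in Python 3, unlike the snippet above):

```python
import sys
import tempfile
from pathlib import Path

# Two "projects" each ship a module named tables, as in the DAG files.
proj_a = Path(tempfile.mkdtemp())
proj_b = Path(tempfile.mkdtemp())
(proj_a / "tables.py").write_text("tables = [{'name': 'a'}]\n")
(proj_b / "tables.py").write_text("tables = ['not', 'dicts']\n")

# Both DAG files mutate the shared sys.path; whichever entry ends up
# earlier in the list is the one `import tables` resolves to.
sys.path.insert(0, str(proj_b))
sys.path.insert(0, str(proj_a))  # proj_a now precedes proj_b

import tables

print(tables.__file__)  # lives under proj_a; proj_b's copy is shadowed
```

If `proj_b`'s copy had been imported instead, its list of strings would trip exactly the `assert isinstance(table, dict)` above.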
When running the py file manually from the `~/airflow/dag/` dir, no errors are thrown and the for loop prints the dicts, but Airflow apparently sees things differently in the webserver and when running `airflow list_dags`. Running `airflow list_dags` I get the error
```
assert isinstance(table, dict)
AssertionError
```
and I don't know how to test what is causing this, since again, when running the py file manually from the DAG location there is no problem: the print statement shows dicts, and the webserver UI shows no further error message. Does anyone know what could be going on here? Maybe something about how the imports are working?
* Update 1:
Seeing more weirdness: when calling functions from the imported Python module, everything runs fine when running the DAG file manually, but `airflow list_dags` says...

```
AttributeError: 'module' object has no attribute 'my_func'
```

making me suspect import weirdness even further, since this is the exact same process I am using in another DAG file (i.e. setting some `environ` values and appending to `sys.path`) to import modules for that DAG, and I have no problems there.
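An `AttributeError` like this is consistent with the import resolving to a different same-named module that simply never defined the function. A couple of print diagnostics in the DAG file can confirm which file the scheduler actually imported. A hypothetical, self-contained sketch (Python 3; `mymod` and `my_func` stand in for the real names):

```python
import sys
import tempfile
from pathlib import Path

# Stand-in for a same-named module from another project: the import
# succeeds, but this copy never defined my_func.
other_project = Path(tempfile.mkdtemp())
(other_project / "mymod.py").write_text("tables = []\n")
sys.path.insert(0, str(other_project))

import mymod as tt

# Diagnostics worth placing right before the failing call:
print(tt.__file__)             # which file actually satisfied the import
print(hasattr(tt, "my_func"))  # False here: the wrong copy was loaded
```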
* Update 2:
The problem appears to be (after printing various `sys.path`, `environ`, and `module.__all__` info at the erroring assert) that the similarly-named module being imported comes from another project for which I used this same exact procedure. I.e. I have another file that does...
```python
from os import environ
import sys

# ...
environ["PROJECT_HOME"] = "/path/to/some/project/files"
# import certain project files
sys.path.append(environ["PROJECT_HOME"])
import tables as tt

tables = tt.tables
for table in tables:
    print table
    assert isinstance(table, dict)
    # <create some dag task 1>
    # <create some dag task 2>
# ...
```
and that project's home is getting used instead to load a similarly-named module that also has an object named what I was expecting (even when I insert this project's folder at the front of `sys.path`). Other than making packaged DAGs, is there a way to keep Airflow from combining the `environ` and `sys.path` values of different DAGs (since I use $PROJECT_HOME in various bash and python task scripts)?
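Short of packaged DAGs, one workaround is to stop relying on `sys.path` resolution altogether and load the module from an explicit file path, so a same-named module appended by another DAG file can never shadow it. A Python 3 sketch (the helper name is mine, not an Airflow API; on Python 2, `imp.load_source` does the same job):

```python
import importlib.util
from pathlib import Path

def load_project_module(project_home, name):
    """Load <project_home>/<name>.py directly, bypassing sys.path."""
    path = Path(project_home) / (name + ".py")
    spec = importlib.util.spec_from_file_location(name, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module

# In the DAG file, instead of sys.path.append(...) followed by
# `import tables as tt`:
# tt = load_project_module(environ["PROJECT_HOME"], "tables")
```

Note that `os.environ` is just as process-global as `sys.path`, so if task scripts read $PROJECT_HOME, it is safer to pass the value into each task explicitly (e.g. via the BashOperator `env` argument) than to set it at DAG-parse time.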