python - Cannot find col function in pyspark

Question

posted Oct 17, 2021 in Technique[技术] by 深蓝 (71.8m points)

In PySpark 1.6.2, I can import the col function with:

```
from pyspark.sql.functions import col
```

But when I look it up in the GitHub source code, I find no col function in the functions.py file. How can Python import a function that doesn't exist?

1 Reply

深蓝 · Answer 1 · 2021-10-17T00:13:29+0000

It exists. It just isn't explicitly defined. Functions exported from pyspark.sql.functions are thin wrappers around JVM code and, with a few exceptions which require special treatment, are generated automatically using helper methods.

If you check the source carefully, you'll find col listed among the other entries of the _functions dictionary. The module iterates over this dictionary and uses _create_function to generate a wrapper for each entry. Each generated function is assigned directly to the corresponding name in globals.
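The pattern can be sketched without Spark. This is a simplified illustration, not the PySpark source: the names `_functions` and `_create_function` follow the description above, but the dictionary entries are only illustrative, and the wrapper body builds a string where the real wrapper would presumably delegate to the JVM.

```python
# Simplified stand-in for the pattern; NOT the actual PySpark implementation.
# Maps each function name to its docstring (entries are illustrative).
_functions = {
    "col": "Returns a Column based on the given column name.",
    "lit": "Creates a Column of literal value.",
}


def _create_function(name, doc=""):
    """Build a wrapper function for `name` and attach its docstring."""
    def _(arg):
        # Assumption: the real wrapper would call into the JVM here.
        return "{0}({1!r})".format(name, arg)
    _.__name__ = name
    _.__doc__ = doc
    return _


# Assign each generated function to a module-level name.
for _name, _doc in _functions.items():
    globals()[_name] = _create_function(_name, _doc)

# No `def col` appears anywhere above, yet the name is usable.
result = col("x")
print(result)
```

Searching this file for `def col` finds nothing, which is the same situation the question describes.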

Finally, __all__, which defines the list of names exported from the module, exports all globals except those contained in the blacklist.
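A rough sketch of how a computed `__all__` controls what `from module import *` exposes is shown below. The module name `toy_functions`, its contents, and the exact filter are hypothetical and chosen for illustration; the real filter in functions.py may differ.

```python
import sys
import types

# Hypothetical module source; names and the filter are illustrative only.
TOY_SOURCE = '''
def since(version):
    return version

def col(name):
    return "col(%r)" % name

def lit(value):
    return "lit(%r)" % value

blacklist = ["since"]

# Export every public callable in globals except the blacklisted names.
__all__ = sorted(
    name for name in list(globals())
    if not name.startswith("_")
    and callable(globals()[name])
    and name not in blacklist
)
'''

# Build the module in memory and register it so it can be imported.
toy_module = types.ModuleType("toy_functions")
exec(TOY_SOURCE, toy_module.__dict__)
sys.modules["toy_functions"] = toy_module

# A star import copies only the names listed in __all__.
namespace = {}
exec("from toy_functions import *", namespace)
exported = sorted(name for name in namespace if not name.startswith("__"))
print(exported)
```

Here `since` stays reachable as `toy_functions.since`, but the star import skips it because it is not in `__all__`.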

If this mechanism is still not clear, you can create a toy example:

Create a Python module called foo.py with the following content:

```
# Creates a function assigned to the name foo
globals()["foo"] = lambda x: "foo {0}".format(x)

# Exports all entries from globals which start with foo
__all__ = [x for x in globals() if x.startswith("foo")]
```

Place it somewhere on the Python path (for example in the working directory).
Import foo:
```
from foo import foo

foo(1)
```
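To try the toy example without creating files by hand, the steps can be combined into one script. This is only a convenience sketch: it writes the same foo.py content into a temporary directory (an assumption made here so nothing is left behind) and puts that directory on the Python path just long enough to import the module.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Same content as the foo.py module from the toy example.
MODULE_SOURCE = '''\
# Creates a function assigned to the name foo
globals()["foo"] = lambda x: "foo {0}".format(x)

# Exports all entries from globals which start with foo
__all__ = [x for x in globals() if x.startswith("foo")]
'''

with tempfile.TemporaryDirectory() as tmp_dir:
    # Write foo.py and make its directory importable.
    Path(tmp_dir, "foo.py").write_text(MODULE_SOURCE)
    sys.path.insert(0, tmp_dir)
    try:
        importlib.invalidate_caches()
        foo_module = importlib.import_module("foo")
    finally:
        sys.path.remove(tmp_dir)

# The function was never defined with `def`, but it can still be called.
result = foo_module.foo(1)
exported = foo_module.__all__
print(result)
print(exported)
```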

An undesired side effect of this metaprogramming approach is that the generated functions might not be recognized by tools that depend purely on static code analysis. This is not a critical issue and can be safely ignored during development.
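The gap between static and runtime views can be illustrated with a deliberately naive check. The sketch below is not how any particular IDE or linter works. It only collects `def` statements from the syntax tree, which is enough to show why a name created through `globals()` is invisible to a purely syntactic scan. The names `explicit` and `dynamic` are hypothetical.

```python
import ast

# Hypothetical module source with one explicit and one generated function.
SOURCE = '''
def explicit(x):
    return x

globals()["dynamic"] = lambda x: x
'''

# Static view: only names introduced by `def` statements are found.
tree = ast.parse(SOURCE)
static_names = {
    node.name for node in ast.walk(tree) if isinstance(node, ast.FunctionDef)
}

# Runtime view: execute the source and inspect the resulting namespace.
namespace = {}
exec(SOURCE, namespace)
runtime_names = {
    name for name, obj in namespace.items()
    if callable(obj) and not name.startswith("__")
}

# Names that exist at runtime but are missed by the static scan.
missed = runtime_names - static_names
print(sorted(static_names))
print(sorted(runtime_names))
print(sorted(missed))
```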

Depending on the IDE, installing type annotations might resolve the problem (see, for example, zero323/pyspark-stubs#172).
