Why are the columns reordered in alphabetical order?
Because a Row created with **kwargs sorts the arguments by name.
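This design choice is required to address the issues described in PEP 468: before Python 3.6 the order of keyword arguments was not preserved, so sorting the names gives a deterministic column order. See SPARK-12467 for the discussion.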
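To make the effect concrete, here is a minimal sketch of what sorting keyword arguments by name does. The function `sorted_row` is a hypothetical stand-in written for illustration; it is not Spark's code and runs without Spark.

```python
# Hypothetical stand-in for a Row built from keyword arguments (not Spark's code).
def sorted_row(**kwargs):
    # Field names are put in alphabetical order, whatever order they were passed in.
    names = sorted(kwargs)
    # Values are rearranged to follow the sorted names.
    values = tuple(kwargs[name] for name in names)
    return names, values

names, values = sorted_row(c=0, b=1, a=2)
print(names)   # ['a', 'b', 'c']
print(values)  # (2, 1, 0)
```

The arguments were passed as c, b, a but come back as a, b, c, which is the reordering seen in the DataFrame columns.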
Can I preserve the original order of columns without adding a schema?
Not with **kwargs. You can use plain tuples:
df = spark.createDataFrame([(0, 1, 2), (10, 11, 12)], ["c", "b", "a"])
or namedtuple:
from collections import namedtuple
CBA = namedtuple("CBA", ["c", "b", "a"])
spark.createDataFrame([CBA(0, 1, 2), CBA(10, 11, 12)])
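As a quick check that needs only the standard library, the sketch below shows that a namedtuple keeps its fields in the declared order. It assumes, without verifying it against a running Spark session, that this declared order is what ends up as the column order.

```python
from collections import namedtuple

# Fields are declared in the order c, b, a.
CBA = namedtuple("CBA", ["c", "b", "a"])
row = CBA(0, 1, 2)

print(CBA._fields)  # ('c', 'b', 'a')
print(tuple(row))   # (0, 1, 2)
```

Unlike keyword arguments that get sorted by name, the field names and values stay in the positions they were given.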