(Warning: Mammoth answer ahead. The part up to the first horizontal line makes a good tl;dr section, I suppose)
I'm not sure if I qualify as Python guru... but I have a solid grasp on iteration in Python, so let's try :)
First off: Afaik, LINQ queries are executed lazily - if that's the case, generator expressions are a closer Python concept (either way, list-, dict- and set comprehensions are conceptually just generator expressions fed to the list/dict/set constructor!).
Also, there is a conceptual difference: LINQ is for, as the name says, querying data structures. List-/dict-/set comprehensions are possible application of this (e.g. filtering and projecting the items of a list). So they are in fact less general (as we will see, many things built into LINQ are not built into them). Likewise, generator expressions are a way to formulate an one-time forward iterator in-place (I like to think of it as lambda for generator functions, only without an ugly, long keyword ;) ) and not a way to describe a complex query. They overlap, yes, but they are not identical. If you want all the power of LINQ in Python, you will have to write a fully-fledged generator. Or combine the numerous powerful generators built-in and in itertools
.
Now, Python counterparts for the LINQ capabilities Jon Skeet named:
Projections: (x.foo for ...)
Filtering: (... if x.bar > 5)
- Joins (x join y on x.foo equals y.bar)
The closest thing would be((x_item, next(y_item for y_item in y if x_item.foo == y_item.bar)) for x_item in x)
, I suppose.
Note that this will not iterate over the whole y for each x_item, it will only get the first match.
- Group joins (x join y on x.foo equals y.bar into g)
This is harder. Python doesn't have anonymous types, though they are trivial to do yourself if you don't mind messing with __dict__
:
class Anonymous(object):
def __init__(self, **kwargs):
self.__dict__ = kwargs
Then, we could do (Anonymous(x=x, y=y) for ...)
to get a list of objects that are have x
and y
members with the respective values.
The right thing is usually feeding the results to the constructor of an approriate class, say, XY.
- Grouping (group x.foo by x.bar)
Now it gets hairy... there is no build-in way, afaik. But we can define it ourself if we need it:
from collections import defaultdict
def group_by(iterable, group_func):
groups = defaultdict(list)
for item in iterable:
groups[group_func(item)].append(item)
return groups
Example:
>>> from operator import attrgetter
>>> group_by((x.foo for x in ...), attrgetter('bar'))
defaultdict(<class 'list'>, {some_value_of_bar: [x.foo of all x where x.bar == some_value_of_bar], some_other_value_of_bar: [...], ...})
This requires whatever we group by to be hashable, though. It's possible to avoid this, and I'll make a stab if there is public demand. But for now, I'm being lazy :)
We can also just return an iterable of groups without the values we grouped by, by calling .values()
on the result (of course we can feed that to list
to get something we can index and iterate several times). But who knows if we won't need the group values...
- Ordering (orderby x.foo ascending, y.bar descending)
Sorting needs special syntax? The build-in sorted
works for iterables, too: sorted(x % 2 for x in range(10))
or sorted(x for x in xs, key=attrgetter('foo'))
. Sorted ascending by default, the keyword argument reverse
gives descending order.
Alas, afaik sorting by multiple attributes is not that easy, especially when mixing ascending and descending. Hmm... topic for a recipe?
- Intermediate variables (let tmp = x.foo)
No, not possible in comprehensions or generator expressions - they are, as the name says, supposed to be expressions (and usually only span one or two lines). It's perfectly possible in generator function, though:
(x * 2 for x in iterable)
rewritten as generator with intermediate variable:
def doubles(iterable):
for x in iterable:
times2 = x * 2
yield times2
Flattening: (c for s in ("aa","bb") for c in s )
Note that although LINQ to Objects deals with delegates, other query providers (e.g. LINQ to SQL) can deal in expression trees which describe the query instead of just presenting executable delegates. This allows the query to be translated into SQL (or other query languages) - again, I don't know whether Python supports that sort of thing or not. It's a significant part of LINQ though.
Python definitely does no such thing. List expressions correspond one-to-one to accumulating a plain list in a (possibly nested) for-loop, generator expressions correspond one-to-one to a generator.
Given the parser
and ast
module, it would be possible in theory to write a library for converting a comprehension into e.g. an SQL query. But nobody cares to.