The comparison of the SubjectId
is not the problem because c.SubjectId
is a value of a primitive type (int
, I guess). The exception complains about Equals(c.Students)
. c.Students
is a constant (with respect to the query duplicates
) but not a primitive type.
I would also try to do the comparison in memory and not in the database. You are loading the whole data into memory anyway when you start your first foreach
loop: It executes the query allClasses
. Then inside of the loop you extend the IQueryable allClasses
to the IQueryable duplicates
which gets executed then in the inner foreach
loop. This is one database query per element of your outer loop! This could explain the poor performance of the code.
So I would try to perform the content of the first foreach
in memory. For the comparison of the Students
list it is necessary to compare element by element, not the references to the Students collections because they are for sure different.
var sb = new StringBuilder();
using (var ctx = new Ctx())
{
ctx.CommandTimeout = 10000; // Perhaps not necessary anymore
var allClasses = ctx.Classes.Include("Students").OrderBy(o => o.Id)
.ToList(); // executes query, allClasses is now a List, not an IQueryable
// everything from here runs in memory
foreach (var c in allClasses)
{
var duplicates = allClasses.Where(
o => o.SubjectId == c.SubjectId &&
o.Id != c.Id &&
o.Students.OrderBy(s => s.Name).Select(s => s.Name)
.SequenceEqual(c.Students.OrderBy(s => s.Name).Select(s => s.Name)));
// duplicates is an IEnumerable, not an IQueryable
foreach (var d in duplicates)
sb.Append(d.LongName)
.Append(" is a duplicate of ")
.Append(c.LongName)
.Append("<br />");
}
}
lblResult.Text = sb.ToString();
Ordering the sequences by name is necessary because, I believe, SequenceEqual
compares length of the sequence and then element 0 with element 0, then element 1 with element 1 and so on.
Edit To your comment that the first query is still slow.
If you have 1300 classes with 30 students each the performance of eager loading (Include
) could suffer from the multiplication of data which are transfered between database and client. This is explained here: How many Include I can use on ObjectSet in EntityFramework to retain performance? . The query is complex because it needs a JOIN
between classes and students and object materialization is complex as well because EF must filter out the duplicated data when the objects are created.
An alternative approach is to load only the classes without the students in the first query and then load the students one by one inside of a loop explicitely. It would look like this:
var sb = new StringBuilder();
using (var ctx = new Ctx())
{
ctx.CommandTimeout = 10000; // Perhaps not necessary anymore
var allClasses = ctx.Classes.OrderBy(o => o.Id).ToList(); // <- No Include!
foreach (var c in allClasses)
{
// "Explicite loading": This is a new roundtrip to the DB
ctx.LoadProperty(c, "Students");
}
foreach (var c in allClasses)
{
// ... same code as above
}
}
lblResult.Text = sb.ToString();
You would have 1 + 1300 database queries in this example instead of only one, but you won't have the data multiplication which occurs with eager loading and the queries are simpler (no JOIN
between classes and students).
Explicite loading is explained here:
If you work with Lazy Loading the first foreach
with LoadProperty
would not be necessary as the Students
collections will be loaded the first time you access it. It should result in the same 1300 additional queries like explicite loading.