Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


C# - Optimize Entity Framework query

I'm trying to build a Stack Overflow clone in my spare time to learn EF6 and MVC5. I'm currently using OWIN for authentication.

Everything works fine with 50-60 questions. I then used Red Gate's data generator to ramp it up to 1 million questions, plus a couple of thousand rows in child tables without relationships, just to 'stress' the ORM a bit. Here's what the LINQ looks like:

var query = ctx.Questions
               .AsNoTracking()     //read-only performance boost.. http://visualstudiomagazine.com/articles/2010/06/24/five-tips-linq-to-sql.aspx
               .Include("Attachments")                                
               .Include("Location")
               .Include("CreatedBy") //IdentityUser
               .Include("Tags")
               .Include("Upvotes")
               .Include("Upvotes.CreatedBy")
               .Include("Downvotes")
               .Include("Downvotes.CreatedBy")
               .AsQueryable();

// Always apply an ordering: Skip/Take in LINQ to Entities requires a sorted query,
// so an unrecognized sort value falls back to newest first.
switch ((sort ?? string.Empty).ToLowerInvariant())
{
    case "popular":
        //most viewed
        query = query.OrderByDescending(x => x.ViewCount);
        break;
    default: // "latest", empty, or anything else
        query = query.OrderByDescending(x => x.CreatedDate);
        break;
}

var complaints = query.Skip(skipCount)
                      .Take(pageSize)
                      .ToList(); //makes an evaluation..

Needless to say, I'm getting SQL timeouts. After installing MiniProfiler and looking at the generated SQL statement, I found it's a monstrous few hundred lines long.

I know I'm joining/including too many tables, but how many real-life projects only ever join one or two tables? There may be situations where we have to do this many joins over multi-million-row tables. Are stored procedures the only way?

If that's the case, is EF itself only suitable for small-scale projects?


1 Reply


Most likely the problem you are experiencing is a Cartesian product.

Based on just some sample data:

var query = ctx.Questions // 50 
  .Include("Attachments") // 20                                
  .Include("Location") // 10
  .Include("CreatedBy") // 5
  .Include("Tags") // 5
  .Include("Upvotes") // 5
  .Include("Upvotes.CreatedBy") // 5
  .Include("Downvotes") // 5
  .Include("Downvotes.CreatedBy") // 5

  // Where Blah
  // Order By Blah

Joined into a single result set, this can return up to

50 x 20 x 10 x 5 x 5 x 5 x 5 x 5 x 5 = 156,250,000

Seriously... that is an INSANE number of rows to return.
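
To make the multiplication concrete, here is a minimal sketch using plain LINQ to Objects (not the SQL that EF generates). The counts and anonymous types are made up for illustration: one question with 3 attachments, 2 tags and 4 upvotes, joined into one flat result versus fetched as three separate lists.

```csharp
using System;
using System.Linq;

public static class FanOutDemo
{
    public static void Main()
    {
        // Child rows that all belong to the same (hypothetical) question, Id = 1.
        var attachments = Enumerable.Range(1, 3)
            .Select(i => new { QuestionId = 1, Name = "file" + i }).ToList();
        var tags = Enumerable.Range(1, 2)
            .Select(i => new { QuestionId = 1, Name = "tag" + i }).ToList();
        var upvotes = Enumerable.Range(1, 4)
            .Select(i => new { QuestionId = 1, UserId = i }).ToList();

        // One flat, joined result: every attachment is repeated for every tag
        // and every upvote, so the row count is the product of the child counts.
        var joined =
            from a in attachments
            join t in tags on a.QuestionId equals t.QuestionId
            join u in upvotes on a.QuestionId equals u.QuestionId
            select new { a, t, u };

        Console.WriteLine(joined.Count());                                  // 3 * 2 * 4 = 24
        Console.WriteLine(attachments.Count + tags.Count + upvotes.Count);  // 3 + 2 + 4 = 9
    }
}
```

The same effect, scaled up to the sample counts above, is what turns a page of questions into millions of rows.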

You really have two options if you are having this issue:

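First, the easy way: rely on Entity Framework to wire up the models automatically as they enter the context (relationship fix-up). Fix-up only happens for tracked entities, so these queries must not use AsNoTracking(). Once everything is loaded, use the entities and dispose of the context.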

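// Continuing with the query above, but without the Include calls and without
// AsNoTracking(). ToList() is what actually runs each query.
// (Assumes Attachments is a collection and Location is a single reference.)

var questions = query.ToList();
var attachments = query.SelectMany(q => q.Attachments).ToList();
var locations = query.Select(q => q.Location).ToList();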

This makes one request per table, but instead of 156 million rows you download only 110 (50 + 20 + 10 + 5 x 6, using the sample counts above). The nice part is that EF wires the entities together in the context's cache as they arrive, so the questions variable ends up fully populated.
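
A slightly fuller sketch of this first option follows. It is a variant, not the original code: it loads the page of questions first and then fetches each child table by the page's question IDs. The model (Question, Attachment, Vote, QaContext) and the QuestionId foreign-key properties are assumptions made for illustration and trimmed to two relationships. It needs the EF6 package and a database to run.

```csharp
using System;
using System.Collections.Generic;
using System.Data.Entity; // EF6: DbContext, DbSet, Load()
using System.Linq;

// Hypothetical model, trimmed to two child tables.
public class Question
{
    public int Id { get; set; }
    public DateTime CreatedDate { get; set; }
    public ICollection<Attachment> Attachments { get; set; } = new List<Attachment>();
    public ICollection<Vote> Upvotes { get; set; } = new List<Vote>();
}

public class Attachment
{
    public int Id { get; set; }
    public int QuestionId { get; set; } // assumed foreign key
}

public class Vote
{
    public int Id { get; set; }
    public int QuestionId { get; set; } // assumed foreign key
}

public class QaContext : DbContext
{
    public DbSet<Question> Questions { get; set; }
    public DbSet<Attachment> Attachments { get; set; }
    public DbSet<Vote> Votes { get; set; }
}

public static class QuestionPageLoader
{
    public static List<Question> LoadPage(QaContext ctx, int skipCount, int pageSize)
    {
        // Avoid extra lazy-load queries when the collections are read later.
        ctx.Configuration.LazyLoadingEnabled = false;

        // 1. One query for the page of questions: tracked, no Include.
        var questions = ctx.Questions
            .OrderByDescending(q => q.CreatedDate)
            .Skip(skipCount)
            .Take(pageSize)
            .ToList();

        var ids = questions.Select(q => q.Id).ToList();

        // 2. One query per child table. Load() materializes the rows into the
        //    context, and fix-up attaches them to the tracked questions.
        ctx.Attachments.Where(a => ids.Contains(a.QuestionId)).Load();
        ctx.Votes.Where(v => ids.Contains(v.QuestionId)).Load();

        return questions;
    }
}
```

Each query returns only its own table's rows, so the total stays at the sum of the row counts rather than their product.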

Second: create a stored procedure that returns multiple result sets and have EF materialize the classes from them.
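
In EF6, one way to read several result sets from a single call is ObjectContext.Translate, sketched below. The procedure name dbo.GetQuestionPage, its parameters and the trimmed model are hypothetical. The sketch also assumes that the result-set column names match the entity property names and that the context's connection is not already open.

```csharp
using System.Collections.Generic;
using System.Data;
using System.Data.Entity;                // EF6
using System.Data.Entity.Core.Objects;   // MergeOption
using System.Data.Entity.Infrastructure; // IObjectContextAdapter
using System.Linq;

// Hypothetical model, trimmed to one child table.
public class Question
{
    public int Id { get; set; }
    public ICollection<Attachment> Attachments { get; set; } = new List<Attachment>();
}

public class Attachment
{
    public int Id { get; set; }
    public int QuestionId { get; set; } // assumed foreign key
}

public class QaContext : DbContext
{
    public DbSet<Question> Questions { get; set; }
    public DbSet<Attachment> Attachments { get; set; }
}

public static class QuestionPageProc
{
    public static List<Question> LoadPage(QaContext ctx, int skipCount, int pageSize)
    {
        var objectContext = ((IObjectContextAdapter)ctx).ObjectContext;
        var connection = ctx.Database.Connection;

        using (var cmd = connection.CreateCommand())
        {
            cmd.CommandText = "dbo.GetQuestionPage"; // hypothetical procedure
            cmd.CommandType = CommandType.StoredProcedure;
            AddParameter(cmd, "@skipCount", skipCount);
            AddParameter(cmd, "@pageSize", pageSize);

            try
            {
                connection.Open();
                using (var reader = cmd.ExecuteReader())
                {
                    // First result set: the page of questions. It must be fully
                    // read (ToList) before moving to the next result set.
                    var questions = objectContext
                        .Translate<Question>(reader, "Questions", MergeOption.AppendOnly)
                        .ToList();

                    // Second result set: the attachments for those questions.
                    // AppendOnly tracks them, so fix-up links them to the questions.
                    reader.NextResult();
                    objectContext
                        .Translate<Attachment>(reader, "Attachments", MergeOption.AppendOnly)
                        .ToList();

                    return questions;
                }
            }
            finally
            {
                connection.Close();
            }
        }
    }

    private static void AddParameter(IDbCommand cmd, string name, object value)
    {
        var parameter = cmd.CreateParameter();
        parameter.ParameterName = name;
        parameter.Value = value;
        cmd.Parameters.Add(parameter);
    }
}
```

The procedure would need one SELECT per table, in the same order the code reads them.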

