EF Core's Silent Performance Bug: The GroupBy That Runs on the CLIENT
EF Core's Silent Performance Bug: The GroupBy That Runs on the CLIENT
Series: Real Build, Real Bugs — Part 1 of 3
This LINQ ran perfectly on my machine. It passed code review. It shipped. And then, with real data, it quietly started pulling thousands of rows into memory on every page load.
.GroupBy(m => m.ThreadId)
.Select(g => g.OrderByDescending(m => m.CreatedDate).First())
One line. Let me show you exactly why — and the two-line fix.
The Setup
I'm building a messaging inbox. For each thread I want a preview — the latest message in that thread. So the obvious approach: group the messages by thread, and in each group take the newest one.
var lastMessages = await _db.Messages.AsNoTracking()
.Where(m => threadIds.Contains(m.ThreadId) && m.IsActive)
.GroupBy(m => m.ThreadId)
.Select(g => g.OrderByDescending(m => m.CreatedDate).First())
.ToListAsync();
Reads like plain English. "Group by thread, take the newest in each group."
And depending on your EF Core version, this either throws at runtime, or — worse — silently runs the grouping on the client. Meaning EF downloads every active message for every thread into your app's memory and groups them there. Twenty threads in dev? Invisible. Real traffic? It's the query dragging your whole page down.
Why It Breaks
Here's the why. SQL GROUP BY can only return aggregates — a max, a count, a sum. One value per group. It physically cannot hand you back a whole row per group.
But "the newest row per thread" is a top-N-per-group problem. In SQL that needs a totally different shape — a window function with ROW_NUMBER partitioned by thread, or a self-join.
EF Core's MySQL provider will not silently rewrite your First() into a window function. So it's cornered. It can't express "first row per group" as one SQL statement.
- EF Core 3+: just refuses and throws "could not be translated."
- Older provider combos: fall back to client evaluation — and that's the silent killer. Nothing errors. It just gets slow as your data grows.
The mental model:
| Expression | Translatable? |
|---|---|
g.Max(m => m.Id) |
✅ Yes — real aggregate |
g.Count() |
✅ Yes — real aggregate |
g.First() |
❌ No — whole row per group |
g.Select(m => m.Body) |
❌ No — whole entity |
The Fix
Step 1 — Get the id, not the row.
Max(Id) per group is a real aggregate. It translates cleanly to GROUP BY ... MAX(Id).
var lastMessageIds = await _db.Messages.AsNoTracking()
.Where(m => m.OrgLibId == orgId && threadIds.Contains(m.ThreadId) && m.IsActive)
.GroupBy(m => m.ThreadId)
.Select(g => g.Max(m => m.Id))
.ToListAsync();
Step 2 — Fetch exactly those rows.
var lastMessages = await _db.Messages.AsNoTracking()
.Where(m => lastMessageIds.Contains(m.Id))
.Select(m => new { m.ThreadId, m.Body, m.IsFromParent })
.ToListAsync();
Two fully-translated SQL statements instead of one that explodes. That's the whole fix.
⚠️ The One Assumption You Must Write Down
Max(Id) equals "newest" only if your id is monotonic with time — an auto-increment key on an append-only table where rows are inserted in order and never back-dated. For a single-writer message log, that's safe.
If you can back-date rows, order by the real timestamp using a window function instead.
When In-Memory Grouping Is the Right Call
Sometimes grouping in memory is the right call. For a small dashboard count, projecting a tiny two-column row set, pulling it down, and grouping in C# on purpose is perfectly reasonable:
var rows = await _db.Tickets.AsNoTracking()
.Where(t => t.OrgLibId == orgId && t.Status != "Void")
.Select(t => new { t.EventId, t.Status }) // tiny, bounded projection
.ToListAsync();
var soldByEvent = rows.GroupBy(r => r.EventId)
.ToDictionary(g => g.Key, g => g.Count());
The sin isn't grouping in memory — it's grouping in memory by accident, over an unbounded table. Make it a decision, not a surprise.
The Rules
GroupBythenMax,Count, orSumtranslates.GroupBythenFirst()or a whole entity does not.- For "latest row per group," aggregate the id, then fetch by id.
- Turn on EF Core's translation warnings so client-eval shows up in your logs, not your incident channel.
- Small bounded set? Group in memory — deliberately.
#dotnet #csharp #efcore #mysql #performance #aspnetcore