Entity Framework – Don’t take aggregates for granted
With the advent of LINQ enabled ORM products, it would be easy to get carried away with the idea that the ‘Framework’ will just do it all for you. For example, all those wonderful aggregation extension methods that have been put into our eager little hands, such as .Sum(), .Count() and .Average() etc, however the question begs, are we naturally being lead into the pit of success? What are we doing to the concurrency and throughput of our system when we aggregate this way?
Isolation default scenario
Take this LINQ To Entities query:
var sumAmt1 = ctx.Accounts.Sum(a => a.Amount);
when profiled produces this output:
– network protocol: LPC set transaction isolation level read committed SELECT [GroupBy1].[A1] AS [C1] FROM ( SELECT SUM([Extent1].[Amount]) AS [A1] FROM [dbo].[Accounts] AS [Extent1] ) AS [GroupBy1]
Therefore we can see that the default Isolation level chosen is Read Committed which will mean that the accuracy of .Sum() is questionable, particularly as account information becomes more volatile with increased frequency is visiting existing values in the Amount field and with the addition of new rows, in what we would hope is a highly concurrent system. Typically however we might come across aggregate functions such as those show below, included in a reporting module of some kind.
var ctx = new AggregateProblemEntities(); var query = from c in ctx.Customer join a in ctx.Accounts on c.EntityID equals a.EntityID let sum = ctx.Accounts .Where(aa => aa.EntityID == c.EntityID) .Sum(aa => aa.Amount) select new { ID = c.EntityID, Name = c.FirstName + e.LastName, TotalAccount = sum }; var byacctId = from a in ctx.Accounts group a by a.AccountID into g select new { ID = g.FirstOrDefault().AccountID, TotalAmoutForID = g.Sum(ac => ac.Amount) };
I found over time that many developers will make heavy use of aggregate functions in this way, whilst addressing an OLTP systems.
If of course we start to wind up our Isolation Levels in the hope that we gain some accuracy as to the value reported in the .Sum() we can expect to hurt the concurrency of the system and increase the likelihood of deadlocks. Even Repeatable Read has it’s issues – phantom rows can still effect aggregates in a negative way.
using (TransactionScope sc = new TransactionScope(TransactionScopeOption.RequiresNew, new TransactionOptions() { IsolationLevel = IsolationLevel.RepeatableRead })) { var sumAmt2 = ctx.Accounts .Sum(a => a.Amount); }
Perhaps at this stage we are best left to consider teasing out the aggregated fields and putting them somewhere else in our schema, perhaps addressing the problem by adding a new column – say AccountTotals on the Customer entity (and table). From a Domain Model perspective, each account amount when updated / added or removed would change the value in Customer table, requiring that we only change the value for one row at a time and when querying it for a Sum (even though there is no sum in this case), there will be no need to lock the table or page for consistency. One other thing to consider is that aggregations are usually a function of reporting and this is where architecturally speaking CQRS can help us solve the problem and build a more highly concurrent system by publishing events (from commands) and storing query data in a denormalized persistent storage through publishing domain events.
The LINQ queries from the his post were written to correlate and run against the TSQL examples demonstrated in this post http://bit.ly/h4foC by Merrill Aldrich.
This all leads into some soon to follow posts which will begin the journey into working with Entity Framework (4.0 only) in a way that is very explicit where interfaces are used to describe roles (business behaviours) and those roles are enlisted to describe to the system a) how to fetch data (eager or lazy) and b) which part of an entities data structure to fetch and c) which polymorphic behaviours we are interested in. Here is a quick preview of the type of code it brings:
var custs = repo.Get<IMakeCustomerDelinquent>();
Ok I got off the track a little and that’s all I want to say about the future post.
No comments yet. Be the first.
Leave a reply








