Deividas wrote:
OK, let's assume that a query does return duplicate results. In that case, what is the logic behind removing the TOP(10)? The query would return 10 results with duplicates, which could be exactly what I'm expecting, so why is that bad in any way?
That's a discussion which has been held a couple of times, so I won't rehash it here. From the beginning, our framework has focused on returning the data you are interested in. So if you do a Take(10) (or, in our own API, fetch 10 rows), you don't want 10 rows that contain duplicates, but 10 distinct rows. For example, when you fetch customers with a filter on their orders, the join with orders produces duplicate customer rows. Fetching the customer entities filters out those duplicates. This is done across the board, so projections behave the same way.
The same is true for paging. If you move from one page to the next, you want new data, not rows repeated from the previous page. Server-side paging only works if the set being paged has unique rows (otherwise the same row can end up on more than one page), so the query has to guarantee that the rows are unique. That requires DISTINCT.
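A minimal Python sketch of why paging needs unique rows. The customer ids are made-up sample data standing in for a customers/orders join projected onto the customer id, and `page()` and `distinct()` are hypothetical helpers, not framework API. Without distinct, one customer is spread over two consecutive pages:

```python
# Made-up sample data: customer ids from a customers/orders join.
# "ALFKI" has three orders, so the join returns it three times.
CUSTOMER_IDS = ["ALFKI", "ALFKI", "ALFKI", "ANATR", "ANTON", "AROUT"]


def page(rows, page_number, page_size):
    """Return a 1-based page, the way a server-side OFFSET/LIMIT slices a set."""
    start = (page_number - 1) * page_size
    return rows[start:start + page_size]


def distinct(rows):
    """Remove duplicate rows while keeping the first-seen order."""
    seen = set()
    unique = []
    for row in rows:
        if row not in seen:
            seen.add(row)
            unique.append(row)
    return unique


# Without distinct: page 1 is ['ALFKI', 'ALFKI'] and page 2 is ['ALFKI', 'ANATR'],
# so ALFKI shows up on two pages.
pages_with_duplicates = [page(CUSTOMER_IDS, n, 2) for n in (1, 2, 3)]

# With distinct applied before paging, every customer lands on exactly one page.
unique_ids = distinct(CUSTOMER_IDS)
pages_without_duplicates = [page(unique_ids, n, 2) for n in (1, 2)]
```

Applying distinct before slicing is what keeps the pages disjoint; applying it after slicing would still leave ALFKI on both page 1 and page 2.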
So, to apply a limit on the server side, the framework has to be sure there are no duplicate rows. If Distinct hasn't been specified (e.g. in a Linq query without a Distinct() call) and the query contains joins, it can't guarantee that the set has unique rows, so it switches to client-side limiting and distinct filtering. Specifying Distinct() makes the server-side limit work. Client-side filtering and limiting works on the datareader: rows are read one at a time and duplicates are skipped until the limit is reached, so the complete set isn't fetched.
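The client-side path described above can be sketched roughly as follows. This is an illustration, not the framework's code: `CountingReader` is a hypothetical stand-in for a datareader that counts how many rows were read, and `take_distinct` is a hypothetical name for the distinct-plus-limit loop.

```python
from typing import Hashable, Iterable, List


class CountingReader:
    """Hypothetical stand-in for a datareader: yields rows one at a time
    and counts how many rows were actually read."""

    def __init__(self, rows):
        self._rows = rows
        self.rows_read = 0

    def __iter__(self):
        for row in self._rows:
            self.rows_read += 1
            yield row


def take_distinct(reader: Iterable[Hashable], limit: int) -> List[Hashable]:
    """Client-side distinct + limit: skip duplicates, stop after `limit` unique rows."""
    seen = set()
    result = []
    if limit <= 0:
        return result
    for row in reader:
        if row in seen:
            continue  # duplicate of a row already returned
        seen.add(row)
        result.append(row)
        if len(result) == limit:
            break  # limit reached: the remaining rows are never read
    return result


reader = CountingReader(["ALFKI", "ALFKI", "ALFKI", "ANATR", "ANTON", "AROUT"])
first_two = take_distinct(reader, 2)
# first_two == ['ALFKI', 'ANATR'], and only 4 of the 6 rows were read.
```

Because the loop pulls rows lazily and breaks as soon as it has enough unique ones, the rest of the result set is never read, which is the property the paragraph above relies on.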
Your query could result in duplicates:
select customers.customerid from customers join orders on customers.customerid = orders.customerid
returns duplicate customerids, because customers to orders is a 1:n relationship: a customer with three orders is returned three times.
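To see those duplicates, here is a sketch using Python's built-in `sqlite3` module with an in-memory database as a stand-in for the real server; the tables and rows are made up, and `customer_ids()` is a hypothetical helper. SQLite spells the limit as `LIMIT` where SQL Server uses `TOP`. A plain limit over the join returns the same customer twice, while adding `DISTINCT` lets the limit run on the server and still return distinct customers.

```python
import sqlite3

# Made-up schema and data: ALFKI has three orders, ANATR has one.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customerid TEXT PRIMARY KEY);
    CREATE TABLE orders (orderid INTEGER PRIMARY KEY, customerid TEXT);
    INSERT INTO customers VALUES ('ALFKI'), ('ANATR');
    INSERT INTO orders VALUES (1, 'ALFKI'), (2, 'ALFKI'), (3, 'ALFKI'), (4, 'ANATR');
""")

FROM_JOIN = ("FROM customers JOIN orders "
             "ON customers.customerid = orders.customerid")


def customer_ids(select, suffix=""):
    """Run a SELECT over the join and return the first column as a list."""
    sql = f"{select} {FROM_JOIN} ORDER BY customers.customerid {suffix}"
    return [row[0] for row in conn.execute(sql)]


# The 1:n join returns one row per order.
all_rows = customer_ids("SELECT customers.customerid")
# A limit without DISTINCT returns a duplicate.
limited = customer_ids("SELECT customers.customerid", "LIMIT 2")
# With DISTINCT the set is unique, so the limit can safely run on the server.
distinct_limited = customer_ids("SELECT DISTINCT customers.customerid", "LIMIT 2")
```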
This behavior has been in our framework for many years, and therefore we can't (and actually won't) change it. We won't change it because it's consistent with our entity fetch logic.
However, as some of you want to get duplicate rows with Take() or paging (I really have no idea why on earth one would want that, but alas...), we'll add a feature in 3.1 which lets you define the distinct behavior of limited projections (Take(), paging). In practice, however, you won't need it: the data will be consumed somewhere, and having to deal with duplicate rows always leads to unwanted behavior, because a duplicate row carries no meaning; it's the same as data already read. If you want to count things, run a GroupBy with a count.
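If duplicates are only wanted to count things (say, orders per customer), the database can do the counting. A sketch in Python's `sqlite3`, used here as a stand-in for the real server, with made-up customers/orders tables; it shows the SQL shape of a group-by-with-count, one row per customer instead of one duplicate row per order.

```python
import sqlite3

# Made-up data: ALFKI has three orders, ANATR has one.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customerid TEXT PRIMARY KEY);
    CREATE TABLE orders (orderid INTEGER PRIMARY KEY, customerid TEXT);
    INSERT INTO customers VALUES ('ALFKI'), ('ANATR');
    INSERT INTO orders VALUES (1, 'ALFKI'), (2, 'ALFKI'), (3, 'ALFKI'), (4, 'ANATR');
""")

# One row per customer carrying the count, instead of one duplicate row per order.
order_counts = dict(conn.execute("""
    SELECT customers.customerid, COUNT(orders.orderid)
    FROM customers JOIN orders ON customers.customerid = orders.customerid
    GROUP BY customers.customerid
"""))
# order_counts == {'ALFKI': 3, 'ANATR': 1}
```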