Paging on a slow query is still slow
Joined: 02-Jan-2019
I don't get this. I have a logevents table which contains around 260,000 rows. Not much, but enough to slow things down.
I query for logevents like this:
var qf = new QueryFactory();
var query = qf.Logevent
.From(QueryTarget
.LeftJoin(LogeventEntity.Relations.UserEntityUsingUserId))
.WithPath(LogeventEntity.PrefetchPathUser);
... sorting and filtering not relevant here ...
query = query.Page(uiLogEventGrid.CurrentPageIndex + 1, uiLogEventGrid.PageSize);
PageIndex is 0 and PageSize is 20.
For sorting purposes I need the left join to the users table, but in this specific scenario I don't use it.
This produces a query like
SELECT DISTINCT
`logevents`.`Data`,
`logevents`.`Description`,
`logevents`.`Id`,
`logevents`.`ObjectId`,
`logevents`.`ObjectType`,
`logevents`.`Timestamp`,
`logevents`.`UserId`
FROM
(
`users`
RIGHT JOIN `logevents`
ON `users`.`Id` = `logevents`.`UserId`
)
ORDER BY `logevents`.`Id` DESC
LIMIT 20
This query takes 7.8 seconds to execute. If I remove the join, it takes 0.1 seconds. It's like the join is executed on all rows, and only then is the limit applied.
On table logevents, there is a PRIMARY index on Id as well as a KEY index on UserId. On table users, there is a PRIMARY index on Id.
Any ideas on how to speed this up?
Version information
- LLBLGen 5.8 (5.8.0) RTM Adapter template
- MySQL 5.6
- Devart dotconnect for MySQL express 8.16.1541.0
Joined: 21-Aug-2005
First things first: an FK-side LEFT JOIN to the PK side is effectively the same as an INNER JOIN, so please try using an inner join in your query. Then make sure LogEvents.UserId is properly indexed, and you can also try indexing the columns used in the sort clause.
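The indexing advice can be sketched in a quick script. This is illustrative only: it uses Python's sqlite3 as a stand-in for MySQL, with a simplified, hypothetical schema and made-up index names.

```python
import sqlite3

# Hypothetical, simplified schema; sqlite3 stands in for MySQL here.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE logevents (Id INTEGER PRIMARY KEY, UserId INTEGER NOT NULL, Timestamp TEXT);
CREATE INDEX ix_logevents_userid ON logevents (UserId);       -- join column
CREATE INDEX ix_logevents_timestamp ON logevents (Timestamp); -- sort column
""")

# With the index in place, the planner can satisfy the join predicate
# from the index instead of scanning the whole table.
plan = con.execute(
    "EXPLAIN QUERY PLAN SELECT Id FROM logevents WHERE UserId = 1").fetchone()
detail = plan[-1]
print(detail)  # the plan mentions ix_logevents_userid
```

The same idea applies in MySQL: check `EXPLAIN` output to confirm the join and sort columns are served by an index rather than a table scan.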
Joined: 02-Jan-2019
Walaa wrote:
First things first: an FK-side LEFT JOIN to the PK side is effectively the same as an INNER JOIN, so please try using an inner join in your query. ...
I don't understand that.
What I have is conceptually
Logevents
- id (PK)
- timestamp
- description
- userid (FK)
Users
- id (PK)
- adusername
- firstname
- lastname
What I want is ALL logevents, with users joined on where a userid is present, which isn't always the case. I.e. what I would describe as a typical logevents LEFT JOIN users, whereas a users INNER JOIN logevents would only give me logevents for which a user can be found. Isn't that so?
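The distinction described here can be shown with a toy example. This is a sketch using Python's sqlite3 and a simplified schema (not the actual MySQL database); the userid column is deliberately nullable:

```python
import sqlite3

# Toy schema (names simplified); userid is nullable here on purpose.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, adusername TEXT);
CREATE TABLE logevents (id INTEGER PRIMARY KEY, userid INTEGER, description TEXT);
INSERT INTO users VALUES (1, 'jan');
INSERT INTO logevents VALUES (1, 1, 'with user'), (2, NULL, 'no user');
""")

left = con.execute(
    "SELECT COUNT(*) FROM logevents LEFT JOIN users ON users.id = logevents.userid"
).fetchone()[0]
inner = con.execute(
    "SELECT COUNT(*) FROM logevents INNER JOIN users ON users.id = logevents.userid"
).fetchone()[0]
print(left, inner)  # 2 1: LEFT JOIN keeps the logevent without a user
```

With a nullable FK the two joins differ; as discussed further down in the thread, once the FK is non-nullable they return the same rows.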
When I query
var query = qf.Logevent
    .From(QueryTarget
        .LeftJoin(LogeventEntity.Relations.UserEntityUsingUserId))
    .WithPath(LogeventEntity.PrefetchPathUser);
this SQL is generated:
FROM users RIGHT JOIN logevents ON ...
This will only give me logevents where a userid is present. However, I would expect a
FROM logevents LEFT JOIN users ON ...
as that is what I try to define in the code. Am I doing it wrong/backwards?
Anyway, I found the source of the performance issue. The query generated by LLBLGen includes a DISTINCT, which I presume forces the DB engine to compare all selected rows before returning them. If I remove the DISTINCT keyword, the query performs as fast as expected. Why is this keyword added, and what can/should I do to avoid it?
Thanks for your kind help!
Joined: 17-Aug-2003
Yes, sorry, it's my fault: I suggested it might be the cause and Walaa just followed that advice, but it's my mistake and you're correct. I made the error because in my hurry I misinterpreted the LogeventEntity.Relations.User directive. FK LEFT JOIN PK is correct. I'm sorry for the confusion.
The right join follows the relationship you defined (you can specify a relationship in two ways, FK-PK or PK-FK; the runtime then flips it if necessary so the first join is always PK-FK). It's fine; our runtime does that to better sort the elements that have already been emitted into the query. PK RIGHT JOIN FK is equal to FK LEFT JOIN PK.
I think one of the main issues in your query is that you don't have an index on Logevents.userid. As the join has to compare values from that column with Users.id, and there's no index on the Logevents.userid column, it has to do a table scan. So adding an index there is beneficial.
The distinct is emitted because the runtime can't determine whether the resulting rows are all unique. For paging, distinct is required to make sure the rows on a page are unique, so that fetching e.g. page 3 indeed gives the 3rd page of results. We don't have analysis in the runtime that traverses joins to see if the resulting query produces unique elements. In this particular query duplicates aren't possible, as the projection contains only fields from the FK side of the join; however, if you e.g. use the same relationship but fetch all users, duplicates will happen and therefore distinct is necessary. As said, we don't do deep analysis of the query to determine this, so to be sure we emit DISTINCT.
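The duplicates scenario described above can be shown with a toy script (a sketch using Python's sqlite3 and simplified, hypothetical tables, not the actual schema): projecting the PK side (users) over the join yields one row per logevent, so duplicates appear unless DISTINCT collapses them.

```python
import sqlite3

# Toy tables (simplified, hypothetical names): one user with three logevents.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE logevents (id INTEGER PRIMARY KEY, userid INTEGER NOT NULL);
INSERT INTO users VALUES (1, 'jan');
INSERT INTO logevents VALUES (1, 1), (2, 1), (3, 1);
""")

# Projecting users over the join: one row per logevent, so the same user
# appears three times unless DISTINCT collapses the duplicates.
join_sql = "FROM users JOIN logevents ON logevents.userid = users.id"
plain = con.execute(f"SELECT users.id, users.name {join_sql}").fetchall()
distinct = con.execute(f"SELECT DISTINCT users.id, users.name {join_sql}").fetchall()
print(len(plain), len(distinct))  # 3 1
```

Projecting the FK side (logevents) over the same join never duplicates rows, which is why the DISTINCT is redundant in this particular query.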
A join with 260K rows shouldn't take 7 seconds with indexes, so if you add the index it'll be faster, and you can use the paging.
Joined: 02-Jan-2019
Otis wrote:
... A join with 260K rows shouldn't take 7 seconds with indexes, so if you add the index it'll be faster, and you can use the paging.
Thanks for your quick reply, Frans!
One misunderstanding from my side: there is a relation so no logevent entries can exist without having a valid userid. Hence, a logevents LEFT JOIN users will return the same number of rows as users RIGHT JOIN logevents. Correct?
As I understand your reply, duplicates might happen in a different query (but not in this one), but as you don't analyse for that, the DISTINCT is always added. However, the way I see it, I have indexes on the relevant columns; here is the CREATE for the two tables:
CREATE TABLE `logevents` (
`Id` int(11) NOT NULL AUTO_INCREMENT,
`UserId` int(11) NOT NULL,
`ObjectType` int(11) NOT NULL,
`ObjectId` int(11) NOT NULL,
`Timestamp` datetime NOT NULL,
`Description` text COLLATE utf8_unicode_ci NOT NULL,
`Data` text COLLATE utf8_unicode_ci NOT NULL,
PRIMARY KEY (`Id`),
KEY `FK_Logevents_Users` (`UserId`),
CONSTRAINT `FK_Logevents_Users` FOREIGN KEY (`UserId`) REFERENCES `users` (`Id`)
) ENGINE=InnoDB AUTO_INCREMENT=286043 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
Primary key (index) on Id and another index on UserId.
And:
CREATE TABLE `users` (
`Id` int(11) NOT NULL AUTO_INCREMENT,
`FirstName` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
`LastName` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
`Email` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
`ADUserName` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
`IsDisabled` tinyint(1) NOT NULL,
`Password` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
PRIMARY KEY (`Id`)
) ENGINE=InnoDB AUTO_INCREMENT=114 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
Index on primary key Id.
Am I missing something with regards to the indexes?
Is there a way to instruct LLBLGen to not generate the DISTINCT keyword, if that actually makes the difference for me in this particular query?
Joined: 17-Aug-2003
First things first: after re-checking, I have to correct myself. Walaa is correct here in that FK LEFT JOIN PK is equal to FK INNER JOIN PK; this holds when the FK side is non-nullable (so every FK-side row has an associated PK row).
Ok, back to your problem:
JanRHansen wrote:
Otis wrote:
... A join with 260K rows shouldn't take 7 seconds with indexes, so if you add the index it'll be faster, and you can use the paging.
Thanks for your quick reply, Frans!
One misunderstanding from my side: there is a relation so no logevent entries can exist without having a valid userid. Hence, a logevents LEFT JOIN users will return the same number of rows as users RIGHT JOIN logevents. Correct?
Yes, and also the same rows as logevents INNER JOIN users, as the logevents side has a non-nullable FK and therefore there's always a user instance for every logevent instance.
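This equivalence is easy to verify in a toy setup (a sketch with Python's sqlite3 and a simplified, hypothetical schema): with a non-nullable FK, LEFT JOIN and INNER JOIN return the same number of rows.

```python
import sqlite3

# Toy schema (simplified): every logevent has a non-nullable userid.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY);
CREATE TABLE logevents (id INTEGER PRIMARY KEY,
                        userid INTEGER NOT NULL REFERENCES users(id));
INSERT INTO users VALUES (1), (2);
INSERT INTO logevents VALUES (1, 1), (2, 1), (3, 2);
""")

counts = {}
for join in ("LEFT JOIN", "INNER JOIN"):
    counts[join] = con.execute(
        f"SELECT COUNT(*) FROM logevents {join} users ON users.id = logevents.userid"
    ).fetchone()[0]
print(counts)  # both joins return all 3 logevents
```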
As I understand your reply, duplicates might happen in a different query (but not in this one), but as you don't analyse for that, the DISTINCT is always added. However, the way I see it, I have indexes on the relevant columns; here is the CREATE for the two tables:
CREATE TABLE `logevents` (
  `Id` int(11) NOT NULL AUTO_INCREMENT,
  `UserId` int(11) NOT NULL,
  `ObjectType` int(11) NOT NULL,
  `ObjectId` int(11) NOT NULL,
  `Timestamp` datetime NOT NULL,
  `Description` text COLLATE utf8_unicode_ci NOT NULL,
  `Data` text COLLATE utf8_unicode_ci NOT NULL,
  PRIMARY KEY (`Id`),
  KEY `FK_Logevents_Users` (`UserId`),
  CONSTRAINT `FK_Logevents_Users` FOREIGN KEY (`UserId`) REFERENCES `users` (`Id`)
) ENGINE=InnoDB AUTO_INCREMENT=286043 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
Primary key (index) on Id and another index on UserId.
Alright, so that's ok. I misinterpreted your "KEY index on UserId On table users" as an index on the users table, not the logevents table.
And:
CREATE TABLE `users` (
  `Id` int(11) NOT NULL AUTO_INCREMENT,
  `FirstName` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
  `LastName` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
  `Email` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
  `ADUserName` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
  `IsDisabled` tinyint(1) NOT NULL,
  `Password` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
  PRIMARY KEY (`Id`)
) ENGINE=InnoDB AUTO_INCREMENT=114 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
Index on primary key Id.
Am I missing something with regards to the indexes?
No, that's all you can do.
Is there a way to instruct LLBLGen to not generate the DISTINCT keyword, if that actually makes the difference for me in this particular query?
If you don't join the users entity, it'll be different. I don't see why you'd join the users table, as the query you posted at the top has
ORDER BY `logevents`.`Id` DESC
which orders on the logevents entity, so no users table is needed. Please post the full code of the query so there's no more confusion. Thanks.
Joined: 02-Jan-2019
Sure - it was just to shorten things a bit. The data is bound to a grid and can be
- filtered by objecttype and/or objectid
- filtered by contains(text) on Description
- sorted by Id, Timestamp, Description or User.ADUserName.
- and finally paged
So, I construct the query little by little, the most annoying part being the sorting, as I have to look at the grid's sort expressions (something like "Description ASC") and translate them to an actual SortExpression. If you have a better way to do this, I'll gladly hear about it.
var qf = new QueryFactory();
var query = qf.Logevent
.From(QueryTarget
.LeftJoin(LogeventEntity.Relations.UserEntityUsingUserId))
.WithPath(LogeventEntity.PrefetchPathUser);
if (currentObjectType != LogEventManager.ObjectType.All)
{
query = query.AndWhere(LogeventFields.ObjectType.Equal((int)currentObjectType));
}
if (ObjectId > 0)
{
query = query.AndWhere(LogeventFields.ObjectId.Equal((int)ObjectId));
}
// filter for search text
if (!String.IsNullOrWhiteSpace(uiSearchInput.Text))
{
IPredicateExpression whereClause = new PredicateExpression();
var searchParts = uiSearchInput.Text.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries).ToList();
foreach (var searchPart in searchParts)
{
var pe = new PredicateExpression(LogeventFields.Description.Contains(searchPart));
whereClause.AddWithAnd(pe);
}
query = query.AndWhere(whereClause);
}
// sorting
if (uiLogEventGrid.MasterTableView.SortExpressions.Count > 0)
{
SortExpression sort = null;
var gridSortOrder = uiLogEventGrid.MasterTableView.SortExpressions[0].SortOrder;
switch (uiLogEventGrid.MasterTableView.SortExpressions[0].FieldName)
{
case "Id":
{
sort = gridSortOrder == GridSortOrder.Ascending
? new SortExpression(LogeventFields.Id.Ascending())
: new SortExpression(LogeventFields.Id.Descending());
break;
}
case "Timestamp":
{
sort = gridSortOrder == GridSortOrder.Ascending
? new SortExpression(LogeventFields.Timestamp.Ascending())
: new SortExpression(LogeventFields.Timestamp.Descending());
break;
}
case "User.ADUserName":
{
sort = gridSortOrder == GridSortOrder.Ascending
? new SortExpression(UserFields.AduserName.Ascending())
: new SortExpression(UserFields.AduserName.Descending());
break;
}
case "Description":
{
sort = gridSortOrder == GridSortOrder.Ascending
? new SortExpression(LogeventFields.Description.Ascending())
: new SortExpression(LogeventFields.Description.Descending());
break;
}
}
query = query.OrderBy(sort);
}
sw.Start();
// Get total rowcount
var count = adapter.FetchScalar<int>(qf.Create().Select(query.CountRow()));
uiLogEventGrid.VirtualItemCount = count;
sw.Stop();
Logger.Info($@"count {sw.Elapsed}");
// possibly adjust current page
if ((uiLogEventGrid.CurrentPageIndex + 1) > count/uiLogEventGrid.PageSize)
{
// current pageindex is larger than supported by the resultset size
uiLogEventGrid.CurrentPageIndex = count / uiLogEventGrid.PageSize;
}
// query the actual page of data
query = query.Page(uiLogEventGrid.CurrentPageIndex + 1, uiLogEventGrid.PageSize);
sw.Restart();
adapter.FetchQuery(query, logEvents);
sw.Stop();
Logger.Info($@"fetch {sw.Elapsed}");
So I only need to join the users table to sort by their AD user name.
I hope it makes better sense now. Sorry about the confusion.
Joined: 21-Aug-2005
Could you please capture the generated query, and run it in MySQL, using Toad for example, and see if it takes the same amount of time to get executed?
And in this case you may re-run it again using the EXPLAIN keyword (before the query), to see the execution plan. This might shed some light into what we are overlooking.
Joined: 02-Jan-2019
Walaa wrote:
Could you please capture the generated query, and run it in MySQL, using Toad for example, and see if it takes the same amount of time to get executed?
And in this case you may re-run it again using the EXPLAIN keyword (before the query), to see the execution plan. This might shed some light into what we are overlooking.
Yes, sure. I'll run the "fetch everything" with no filtering, and only the default order by Id.
I'm using SQLyog and DBeaver to test with; both reveal the same time spent.
So, three queries are generated. The first originates from
var count = adapter.FetchScalar<int>(qf.Create().Select(query.CountRow()));
and is generated as
SELECT
(SELECT
COUNT(*) AS `LPAV_`
FROM
(
`users`
RIGHT JOIN `logevents`
ON `users`.`Id` = `logevents`.`UserId`
)
ORDER BY `logevents`.`Id` DESC) AS `LLBLV_1`
FROM
`logevents`
LIMIT 1;
it takes 0 seconds.
The second and third originate from
adapter.FetchQuery(query, logEvents);
and is generated as :
SELECT DISTINCT
`logevents`.`Data`,
`logevents`.`Description`,
`logevents`.`Id`,
`logevents`.`ObjectId`,
`logevents`.`ObjectType`,
`logevents`.`Timestamp`,
`logevents`.`UserId`
FROM
(
`users`
RIGHT JOIN `logevents`
ON `users`.`Id` = `logevents`.`UserId`
)
ORDER BY `logevents`.`Id` DESC
LIMIT 20;
this query takes between 7 and 8 seconds to run, consistently.
and
SELECT
`users`.`ADUserName` AS `AduserName`,
`users`.`Email`,
`users`.`FirstName`,
`users`.`Id`,
`users`.`IsDisabled`,
`users`.`LastName`,
`users`.`Password`
FROM
`users`
WHERE (`users`.`Id` = @p1)
Parameter :@p1 :Int32.Length :0. PRECISION :0. Scale :0. Direction :Input.Value :2.
this query takes, as expected, 0 seconds, so again in the milliseconds range.
All entries in the first 20 originate from the same user (with id=2), so only one user's details are fetched in the third query.
Running an EXPLAIN on the slow query provides:
id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
---|---|---|---|---|---|---|---|---|---|
1 | SIMPLE | logevents | index | | PRIMARY | 4 | | 20 | Using temporary |
1 | SIMPLE | users | eq_ref | PRIMARY | PRIMARY | 4 | hevesydoc.logevents.UserId | 1 | Using index; Distinct |
From SQLyog there is a "SHOW PROFILE Result (ordered by duration)" as well:
state | duration (summed) in sec | percentage |
---|---|---|
Sending data | 7.56409 | 99.29038 |
removing tmp table | 0.05332 | 0.69991 |
Opening tables | 0.00032 | 0.00420 |
freeing items | 0.00016 | 0.00210 |
Creating tmp table | 0.00006 | 0.00079 |
closing tables | 0.00004 | 0.00053 |
cleaning up | 0.00003 | 0.00039 |
starting | 0.00003 | 0.00039 |
end | 0.00003 | 0.00039 |
statistics | 0.00002 | 0.00026 |
init | 0.00002 | 0.00026 |
System lock | 0.00001 | 0.00013 |
preparing | 0.00001 | 0.00013 |
query end | 0.00001 | 0.00013 |
Total | 7.61815 | 100.00000 |
I just noticed that if I remove the
ORDER BY `logevents`.`Id` DESC
from the slow query, it becomes lightning fast as well. That would typically happen if the index was missing, right?
Ah well... ideas are very welcome.
Joined: 17-Aug-2003
SELECT DISTINCT
`logevents`.`Data`,
`logevents`.`Description`,
`logevents`.`Id`,
`logevents`.`ObjectId`,
`logevents`.`ObjectType`,
`logevents`.`Timestamp`,
`logevents`.`UserId`
FROM
(
`users`
RIGHT JOIN `logevents`
ON `users`.`Id` = `logevents`.`UserId`
)
ORDER BY `logevents`.`Id` DESC
LIMIT 20;
Here the users table isn't needed, as you're sorting on a field in logevents, so the whole join can be removed. The FK field logevents.userid isn't nullable, so a matching user row is always there; no join is needed and the number of rows is the same.
What I find strange about the profile is that the slow part is 'Sending data', so not something in the query itself but rather the processing of the resultset. This might indeed be the DISTINCT you mentioned. Removing the ORDER BY isn't recommended when paging: you need an ordering, otherwise you get random rows in each page (i.e. the second page might be the same as the first page).
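Why the ORDER BY matters for paging can be sketched as follows (illustrative Python/sqlite3 with a simplified, hypothetical table): with a deterministic ordering, consecutive LIMIT/OFFSET pages are stable and never overlap.

```python
import sqlite3

# Toy table (simplified): 60 logevents, paged 20 at a time, newest first.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE logevents (id INTEGER PRIMARY KEY)")
con.executemany("INSERT INTO logevents VALUES (?)", [(i,) for i in range(1, 61)])

def page(n, size=20):
    # A deterministic ORDER BY makes LIMIT/OFFSET pages stable and disjoint.
    sql = "SELECT id FROM logevents ORDER BY id DESC LIMIT ? OFFSET ?"
    return [row[0] for row in con.execute(sql, (size, (n - 1) * size))]

p1, p2 = page(1), page(2)
print(p1[0], p2[0])  # 60 40
```

Without the ORDER BY, the engine is free to return rows in any order, so nothing guarantees that page 2 doesn't repeat rows from page 1.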
Joined: 02-Jan-2019
Otis wrote:
Here the users table isn't needed, as you're sorting on a field in logevents, so the whole join can be removed. The FK field logevents.userid isn't nullable, so a matching user row is always there; no join is needed and the number of rows is the same.
I know, but as you can see from the C# code, I may need to sort on users.adusername if the user decides to do that in the grid. Also, regardless of any sorting, I need to display the adusername property value in the grid, so I need the join for that purpose.
Otis wrote:
What I find strange about the profile is that the slow part is 'Sending data', so not something in the query itself but rather the processing of the resultset. This might indeed be the DISTINCT you mentioned.
Yep, that's what led me to test the DISTINCT's impact in the first place.
Otis wrote:
Removing the ORDER BY isn't recommended when paging: you need an ordering, otherwise you get random rows in each page (i.e. the second page might be the same as the first page).
I know, I just wanted to test it, and since the result improved dramatically, I wanted to let you know. It might help you figure out what is wrong.
Joined: 02-Jan-2019
Walaa wrote:
I'd try the following:
1- Make sure users.Id and logevents.UserId have the exact same data type. God forbid they are strings; if so, make sure the collation is the same.
Both are int(11)
Walaa wrote:
2- Rebuild indexes in both tables.
I ran
OPTIMIZE TABLE logevents;
OPTIMIZE TABLE users;
Both returned the same type of answer (the one for logevents is shown below):
Table | Op | Msg_type | Msg_text |
---|---|---|---|
hevesydoc.logevents | optimize | note | Table does not support optimize, doing recreate + analyze instead |
hevesydoc.logevents | optimize | status | OK |
The query still takes 7-8 seconds.
Joined: 17-Aug-2003
I don't see anything we can do, to be honest. The query should run fast, but MySQL apparently chokes on it. Other than the indexes, which you already have, there's little else we can do.
As you say, the distinct is slow: the database has to spool the temporary resultset into a temp table and then filter, and the same goes for the ORDER BY DESC. This is usually not that slow, as the distinct can be applied while fetching rows to send to the client (the whole set doesn't need to be de-duplicated, only 20 rows are requested). But the temp-table entry in your profile is fast; the only slow part is 'sending data', and that can be anything.
Joined: 02-Jan-2019
Yeah...
But hey, news from here now: it seems to be related to the, admittedly, very old MySQL server version (5.5 or 5.6).
I just tried installing MySQL 8.0.25, copying the database, and running the slow query. Execution time is now 0.011 seconds. I should have looked there in the first place, but hindsight is always 20/20, right?
Don't waste more time on this issue, I'll upgrade the DB as part of the project.
Thanks a lot for your very kind and quick help!
/Jan