Paging on a slow query is still slow

JanRHansen
User
Posts: 37
Joined: 02-Jan-2019
# Posted on: 19-May-2021 15:23:31   

I don't get this. I have a logevents table which contains around 260,000 rows. Not much, but enough to slow things down.

I query for logevents like this:

var qf = new QueryFactory();
var query = qf.Logevent
    .From(QueryTarget
        .LeftJoin(LogeventEntity.Relations.UserEntityUsingUserId))
    .WithPath(LogeventEntity.PrefetchPathUser);

... sorting and filtering not relevant here ...

query = query.Page(uiLogEventGrid.CurrentPageIndex + 1, uiLogEventGrid.PageSize);

Pageindex is 0 and pagesize is 20.

For sorting purposes I need the left join to the users table, but in this specific scenario I don't use it.

This produces a query like:

SELECT DISTINCT
  `logevents`.`Data`,
  `logevents`.`Description`,
  `logevents`.`Id`,
  `logevents`.`ObjectId`,
  `logevents`.`ObjectType`,
  `logevents`.`Timestamp`,
  `logevents`.`UserId`
FROM
  (
    `users`
    RIGHT JOIN `logevents`
      ON `users`.`Id` = `logevents`.`UserId`
  )
ORDER BY `logevents`.`Id` DESC
LIMIT 20

This query takes 7.8 seconds to execute. If I remove the join, it takes 0.1 seconds. It's like the join is executed on all rows, and only then is the limit applied.

On table logevents there is a PRIMARY index on Id as well as a KEY index on UserId. On table users there is a PRIMARY index on Id.

Any ideas on how to speed this up?

Version information

  • LLBLGen 5.8 (5.8.0) RTM, Adapter template
  • MySQL 5.6
  • Devart dotConnect for MySQL Express 8.16.1541.0
Walaa
Support Team
Posts: 14950
Joined: 21-Aug-2005
# Posted on: 20-May-2021 00:49:08   

First things first: FK side LEFT JOIN PK side is effectively the same as an INNER JOIN, so please try using an inner join in your query. Then make sure Logevents.UserId is properly indexed, and you can also try indexing the columns used in the sort clause.
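
For example, something along these lines (a sketch; the index names are illustrative, and the Timestamp index is only useful if you actually sort on that column):

-- Illustrative: index the FK column used in the join...
ALTER TABLE logevents ADD INDEX IX_Logevents_UserId (UserId);
-- ...and, if you sort on it, the sort column as well:
ALTER TABLE logevents ADD INDEX IX_Logevents_Timestamp (Timestamp);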

JanRHansen
User
Posts: 37
Joined: 02-Jan-2019
# Posted on: 20-May-2021 09:27:43   

Walaa wrote:

First things first: FK side LEFT JOIN PK side is effectively the same as an INNER JOIN, so please try using an inner join in your query. ...

I don't understand that.

What I have is conceptually

Logevents

  • id (PK)
  • timestamp
  • description
  • userid (FK)

Users

  • id (PK)
  • adusername
  • firstname
  • lastname

What I want is ALL logevents, with users joined on where a userid is present, which isn't always the case. I.e. what I would describe as a typical logevents LEFT JOIN users, whereas a users INNER JOIN logevents would only give me logevents for which a user can be found. Isn't that so?

When I query

var query = qf.Logevent
    .From(QueryTarget
        .LeftJoin(LogeventEntity.Relations.UserEntityUsingUserId))
    .WithPath(LogeventEntity.PrefetchPathUser);

this SQL is generated:

FROM users RIGHT JOIN logevents ON ...

This will only give me logevents where a userid is present. However, I would expect a

FROM logevents LEFT JOIN users ON ...

as that is what I try to define in the code. Am I doing it wrong/backwards?

Anyway, I found the source of the performance issue. The query generated by LLBLGen includes a DISTINCT, which I presume forces the DB engine to compare all selected rows before returning them. If I remove the DISTINCT keyword, the query performs as fast as expected. Why is this keyword added, and what can/should I do to avoid it?

Thanks for your kind help!

Otis
LLBLGen Pro Team
Posts: 39614
Joined: 17-Aug-2003
# Posted on: 20-May-2021 09:42:33   

Yes, sorry, that's my fault: I suggested it might be that, and Walaa just followed my advice. It's my mistake; you're correct. I made the error because, in my hurry, I misinterpreted the LogeventEntity.Relations.User directive. FK LEFT JOIN PK is correct. I'm sorry for the confusion.

The right join follows the relationship you defined (you can specify a relationship in two ways, the FK-PK one and the PK-FK one; the runtime then flips it if necessary so the first join is always PK-FK). It's fine; our runtime does that to better sort, internally, the elements that have already been emitted into the query. PK RIGHT JOIN FK is equal to FK LEFT JOIN PK.

I think one of the main issues in your query is that you don't have an index on Logevents.UserId. As the join has to compare values from that column with Users.Id, and there's no index on the Logevents.UserId column, it has to do a table scan. So adding an index there is beneficial.

The distinct is emitted because the runtime can't determine whether the resulting rows are all unique. For paging, distinct is required to make sure the rows on a page are unique, so that fetching e.g. page 3 indeed gives the 3rd page of results. We don't have analysis in the runtime that traverses joins to see whether the resulting query produces unique elements. In this particular query duplicates aren't possible, as the projection contains only fields from the FK side of the join; however, if you e.g. use the same relationship but fetch all users, duplicates will happen, and distinct is therefore necessary. As said, we don't do that deep analysis of the query, so to be sure we emit distinct.
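
To illustrate with the reverse fetch over the same relationship (an illustrative hand-written query, not something generated from your code):

-- Fetching the PK side (users) over the same join: each user row is
-- repeated once per matching logevent, so without DISTINCT a page
-- could contain the same user several times.
SELECT DISTINCT `users`.`Id`, `users`.`ADUserName`
FROM `users`
INNER JOIN `logevents` ON `users`.`Id` = `logevents`.`UserId`
ORDER BY `users`.`Id`
LIMIT 20;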

A join with 260K rows shouldn't take 7 seconds with indexes, so if you add the index it'll be faster, and you can use the paging.

Frans Bouma | Lead developer LLBLGen Pro
JanRHansen
User
Posts: 37
Joined: 02-Jan-2019
# Posted on: 20-May-2021 10:05:07   

Otis wrote:

... A join with 260K rows shouldn't take 7 seconds with indexes, so if you add the index it'll be faster, and you can use the paging.

Thanks for your quick reply, Frans!

One misunderstanding from my side: there is a relation so no logevent entries can exist without having a valid userid. Hence, a logevents LEFT JOIN users will return the same number of rows as users RIGHT JOIN logevents. Correct?

As I understand your reply, duplicates might happen in a different query (but not in this one), but as you don't analyse for that, the DISTINCT is always added. However, the way I see it, I already have indexes on the relevant columns; here is the CREATE for the two tables:

CREATE TABLE `logevents` (
  `Id` int(11) NOT NULL AUTO_INCREMENT,
  `UserId` int(11) NOT NULL,
  `ObjectType` int(11) NOT NULL,
  `ObjectId` int(11) NOT NULL,
  `Timestamp` datetime NOT NULL,
  `Description` text COLLATE utf8_unicode_ci NOT NULL,
  `Data` text COLLATE utf8_unicode_ci NOT NULL,
  PRIMARY KEY (`Id`),
  KEY `FK_Logevents_Users` (`UserId`),
  CONSTRAINT `FK_Logevents_Users` FOREIGN KEY (`UserId`) REFERENCES `users` (`Id`)
) ENGINE=InnoDB AUTO_INCREMENT=286043 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci

Primary key (index) on Id and another index on UserId.

And:

CREATE TABLE `users` (
  `Id` int(11) NOT NULL AUTO_INCREMENT,
  `FirstName` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
  `LastName` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
  `Email` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
  `ADUserName` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
  `IsDisabled` tinyint(1) NOT NULL,
  `Password` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
  PRIMARY KEY (`Id`)
) ENGINE=InnoDB AUTO_INCREMENT=114 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci

Index on primary key Id.

Am I missing something with regards to the indexes?

Is there a way to instruct LLBLGen not to generate the DISTINCT keyword, if that is actually what makes the difference in this particular query?

Otis
LLBLGen Pro Team
Posts: 39614
Joined: 17-Aug-2003
# Posted on: 20-May-2021 10:40:14   

First things first. After re-checking I have to correct myself: Walaa is correct here in that FK LEFT JOIN PK is equal to FK INNER JOIN PK; the context in which this is true is when the FK side is non-nullable (so every FK-side row has an associated PK row).

Ok, back to your problem:

JanRHansen wrote:

Otis wrote:

... A join with 260K rows shouldn't take 7 seconds with indexes, so if you add the index it'll be faster, and you can use the paging.

Thanks for your quick reply, Frans!

One misunderstanding from my side: there is a relation so no logevent entries can exist without having a valid userid. Hence, a logevents LEFT JOIN users will return the same number of rows as users RIGHT JOIN logevents. Correct?

Yes, and it's also the same rows as logevents INNER JOIN users, as the logevents side has a non-nullable FK and therefore there's always a user instance for every logevent instance.
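
You can verify that directly on the database; all three counts should be identical (a quick hand-written sketch):

-- Same row count for all three, because logevents.UserId is NOT NULL:
SELECT COUNT(*) FROM `logevents` le LEFT JOIN `users` u ON u.`Id` = le.`UserId`;
SELECT COUNT(*) FROM `users` u RIGHT JOIN `logevents` le ON u.`Id` = le.`UserId`;
SELECT COUNT(*) FROM `logevents` le INNER JOIN `users` u ON u.`Id` = le.`UserId`;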

As I understand your reply, duplicates might happen in a different query (but not in this one), but as you don't analyse for that, the DISTINCT is always added. However, the way I see it, I already have indexes on the relevant columns; here is the CREATE for the two tables:

CREATE TABLE `logevents` (
  `Id` int(11) NOT NULL AUTO_INCREMENT,
  `UserId` int(11) NOT NULL,
  `ObjectType` int(11) NOT NULL,
  `ObjectId` int(11) NOT NULL,
  `Timestamp` datetime NOT NULL,
  `Description` text COLLATE utf8_unicode_ci NOT NULL,
  `Data` text COLLATE utf8_unicode_ci NOT NULL,
  PRIMARY KEY (`Id`),
  KEY `FK_Logevents_Users` (`UserId`),
  CONSTRAINT `FK_Logevents_Users` FOREIGN KEY (`UserId`) REFERENCES `users` (`Id`)
) ENGINE=InnoDB AUTO_INCREMENT=286043 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci

Primary key (index) on Id and another index on UserId.

Alright, so that's OK. I misread your remark about the KEY index on UserId as describing an index on the users table, not on the logevents table.

And:

CREATE TABLE `users` (
  `Id` int(11) NOT NULL AUTO_INCREMENT,
  `FirstName` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
  `LastName` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
  `Email` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
  `ADUserName` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
  `IsDisabled` tinyint(1) NOT NULL,
  `Password` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
  PRIMARY KEY (`Id`)
) ENGINE=InnoDB AUTO_INCREMENT=114 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci

Index on primary key Id.

Am I missing something with regards to the indexes?

No, that's all you can do.

Is there a way to instruct LLBLGen not to generate the DISTINCT keyword, if that is actually what makes the difference in this particular query?

If you don't join the users entity, it'll be different (no distinct is needed then). I don't see why you'd join the users table, as the query you posted at the top has

ORDER BY `logevents`.`Id` DESC

which orders on the logevents entity, so no users are needed. Please post all the code of the query so there's no more confusion. Thanks.

Frans Bouma | Lead developer LLBLGen Pro
JanRHansen
User
Posts: 37
Joined: 02-Jan-2019
# Posted on: 20-May-2021 10:56:13   

Sure - it was just to shorten things a bit. The data is bound to a grid and can be

  • filtered by objecttype and/or objectid
  • filtered by contains(text) on Description
  • sorted by Id, Timestamp, Description or User.ADUserName.
  • and finally paged

So I construct the query little by little. The most annoying part is the sorting, as I have to look at the grid's sort expressions, which are something like "Description ASC", and translate them to an actual SortExpression. If you have a better way to do this, I'll gladly hear about it simple_smile

                var qf = new QueryFactory();
                var query = qf.Logevent
                    .From(QueryTarget
                        .LeftJoin(LogeventEntity.Relations.UserEntityUsingUserId))
                    .WithPath(LogeventEntity.PrefetchPathUser);

                if (currentObjectType != LogEventManager.ObjectType.All)
                {
                    query = query.AndWhere(LogeventFields.ObjectType.Equal((int)currentObjectType));
                }

                if (ObjectId > 0)
                {
                    query = query.AndWhere(LogeventFields.ObjectId.Equal((int)ObjectId));
                }

                // filter for search text
                if (!String.IsNullOrWhiteSpace(uiSearchInput.Text))
                {
                    IPredicateExpression whereClause = new PredicateExpression();
                    var searchParts = uiSearchInput.Text.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries).ToList();

                    foreach (var searchPart in searchParts)
                    {
                        var pe = new PredicateExpression(LogeventFields.Description.Contains(searchPart));

                        whereClause.AddWithAnd(pe);
                    }

                    query = query.AndWhere(whereClause);
                }

                // sorting
                if (uiLogEventGrid.MasterTableView.SortExpressions.Count > 0)
                {
                    SortExpression sort = null;
                    
                    var gridSortOrder = uiLogEventGrid.MasterTableView.SortExpressions[0].SortOrder;
                    
                    switch (uiLogEventGrid.MasterTableView.SortExpressions[0].FieldName)
                    {
                        case "Id":
                        {
                            sort = gridSortOrder == GridSortOrder.Ascending
                                ? new SortExpression(LogeventFields.Id.Ascending())
                                : new SortExpression(LogeventFields.Id.Descending());
                            break;
                        }
                        case "Timestamp":
                        {
                            sort = gridSortOrder == GridSortOrder.Ascending
                                ? new SortExpression(LogeventFields.Timestamp.Ascending())
                                : new SortExpression(LogeventFields.Timestamp.Descending());
                            break;
                        }
                        case "User.ADUserName":
                        {
                            sort = gridSortOrder == GridSortOrder.Ascending
                                ? new SortExpression(UserFields.AduserName.Ascending())
                                : new SortExpression(UserFields.AduserName.Descending());
                            break;
                        }
                        case "Description":
                        {
                            sort = gridSortOrder == GridSortOrder.Ascending
                                ? new SortExpression(LogeventFields.Description.Ascending())
                                : new SortExpression(LogeventFields.Description.Descending());
                            break;
                        }

                    }
                    query = query.OrderBy(sort);
                }

                sw.Start();

                // Get total rowcount
                var count = adapter.FetchScalar<int>(qf.Create().Select(query.CountRow()));
                uiLogEventGrid.VirtualItemCount = count;

                sw.Stop();
                Logger.Info($@"count {sw.Elapsed}");

                // possibly adjust the current page: clamp to the last page
                // (ceiling division gives the number of pages)
                var pageCount = Math.Max(1, (count + uiLogEventGrid.PageSize - 1) / uiLogEventGrid.PageSize);
                if (uiLogEventGrid.CurrentPageIndex >= pageCount)
                {
                    // current pageindex is larger than supported by the resultset size
                    uiLogEventGrid.CurrentPageIndex = pageCount - 1;
                }

                // query the actual page of data
                query = query.Page(uiLogEventGrid.CurrentPageIndex + 1, uiLogEventGrid.PageSize);

                sw.Restart();
                adapter.FetchQuery(query, logEvents);
                sw.Stop();
                Logger.Info($@"fetch {sw.Elapsed}");

So I only need to join the users table to sort by their AD user name.

I hope it makes better sense now simple_smile Sorry about the confusion.

Walaa
Support Team
Posts: 14950
Joined: 21-Aug-2005
# Posted on: 20-May-2021 19:04:02   

Could you please capture the generated query and run it in MySQL, using Toad for example, and see if it takes the same amount of time to execute?

And in that case you may re-run it with the EXPLAIN keyword (before the query) to see the execution plan. This might shed some light on what we are overlooking.
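
I.e., with the query you posted at the top, something like:

EXPLAIN
SELECT DISTINCT
  `logevents`.`Data`,
  `logevents`.`Description`,
  `logevents`.`Id`,
  `logevents`.`ObjectId`,
  `logevents`.`ObjectType`,
  `logevents`.`Timestamp`,
  `logevents`.`UserId`
FROM
  (
    `users`
    RIGHT JOIN `logevents`
      ON `users`.`Id` = `logevents`.`UserId`
  )
ORDER BY `logevents`.`Id` DESC
LIMIT 20;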

JanRHansen
User
Posts: 37
Joined: 02-Jan-2019
# Posted on: 21-May-2021 09:16:36   

Walaa wrote:

Could you please capture the generated query and run it in MySQL, using Toad for example, and see if it takes the same amount of time to execute?

And in that case you may re-run it with the EXPLAIN keyword (before the query) to see the execution plan. This might shed some light on what we are overlooking.

Yes, sure. I'll run the "fetch everything" case with no filtering and only the default ORDER BY Id.

I'm using SQLyog and DBeaver to test with; both reveal the same time spent.

So, three queries are generated. The first originates from

var count = adapter.FetchScalar<int>(qf.Create().Select(query.CountRow()));

and is generated as

SELECT
  (SELECT
    COUNT(*) AS `LPAV_`
  FROM
    (
      `users`
      RIGHT JOIN `logevents`
        ON `users`.`Id` = `logevents`.`UserId`
    )
  ORDER BY `logevents`.`Id` DESC) AS `LLBLV_1`
FROM
  `logevents`
LIMIT 1;

It takes 0 seconds.

The second and third originate from

adapter.FetchQuery(query, logEvents);

and are generated as:

SELECT DISTINCT
  `logevents`.`Data`,
  `logevents`.`Description`,
  `logevents`.`Id`,
  `logevents`.`ObjectId`,
  `logevents`.`ObjectType`,
  `logevents`.`Timestamp`,
  `logevents`.`UserId`
FROM
  (
    `users`
    RIGHT JOIN `logevents`
      ON `users`.`Id` = `logevents`.`UserId`
  )
ORDER BY `logevents`.`Id` DESC
LIMIT 20;

This query takes between 7 and 8 seconds to run, consistently.

and


SELECT
  `users`.`ADUserName` AS `AduserName`,
  `users`.`Email`,
  `users`.`FirstName`,
  `users`.`Id`,
  `users`.`IsDisabled`,
  `users`.`LastName`,
  `users`.`Password`
FROM
  `users`
WHERE (`users`.`Id` = @p1) 
Parameter: @p1. Type: Int32. Length: 0. Precision: 0. Scale: 0. Direction: Input. Value: 2.

This query takes, as expected, 0 seconds, so again in the milliseconds range.

All entries in the first 20 rows originate from the same user (with id=2), so only one user's details are fetched in the third query.

Running an EXPLAIN on the slow query gives:

id  select_type  table      type    possible_keys  key      key_len  ref                         rows  Extra
1   SIMPLE       logevents  index   NULL           PRIMARY  4        NULL                        20    Using temporary
1   SIMPLE       users      eq_ref  PRIMARY        PRIMARY  4        hevesydoc.logevents.UserId  1     Using index; Distinct

From SQLyog there is a "SHOW PROFILE Result (ordered by duration)" as well:

state               duration (summed) in sec  percentage
Sending data        7.56409                   99.29038
removing tmp table  0.05332                   0.69991
Opening tables      0.00032                   0.00420
freeing items       0.00016                   0.00210
Creating tmp table  0.00006                   0.00079
closing tables      0.00004                   0.00053
cleaning up         0.00003                   0.00039
starting            0.00003                   0.00039
end                 0.00003                   0.00039
statistics          0.00002                   0.00026
init                0.00002                   0.00026
System lock         0.00001                   0.00013
preparing           0.00001                   0.00013
query end           0.00001                   0.00013
Total               7.61815                   100.00000

I just noticed that if I remove the

ORDER BY `logevents`.`Id` DESC

from the slow query, it becomes lightning fast as well. That would typically happen if the index was missing, right?

Ah well... ideas are very welcome simple_smile

Otis
LLBLGen Pro Team
Posts: 39614
Joined: 17-Aug-2003
# Posted on: 21-May-2021 09:24:53   
SELECT DISTINCT
  `logevents`.`Data`,
  `logevents`.`Description`,
  `logevents`.`Id`,
  `logevents`.`ObjectId`,
  `logevents`.`ObjectType`,
  `logevents`.`Timestamp`,
  `logevents`.`UserId`
FROM
  (
    `users`
    RIGHT JOIN `logevents`
      ON `users`.`Id` = `logevents`.`UserId`
  )
ORDER BY `logevents`.`Id` DESC
LIMIT 20;

Here the users table isn't needed, as you're sorting on a field in logevents, so the whole join can be removed. The FK field logevents.UserId isn't nullable, so the user rows are always there; no join is needed and the number of rows is the same.
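
For this particular page the query could then be reduced to something like:

-- Same 20 rows, but no join and therefore no DISTINCT needed:
SELECT
  `logevents`.`Data`,
  `logevents`.`Description`,
  `logevents`.`Id`,
  `logevents`.`ObjectId`,
  `logevents`.`ObjectType`,
  `logevents`.`Timestamp`,
  `logevents`.`UserId`
FROM `logevents`
ORDER BY `logevents`.`Id` DESC
LIMIT 20;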

What I find strange about the profile is that the slow part is 'Sending data', so it's not something in the query itself but rather the processing of the resultset. This might indeed be the Distinct you mentioned. Removing the ORDER BY isn't recommended when paging, as you need an ordering; otherwise you get random rows in the page (i.e. the second page might be the same as the first page).

Frans Bouma | Lead developer LLBLGen Pro
JanRHansen
User
Posts: 37
Joined: 02-Jan-2019
# Posted on: 21-May-2021 09:31:32   

Otis wrote:

Here the users table isn't needed, as you're sorting on a field in logevents, so the whole join can be removed. The FK field logevents.UserId isn't nullable, so the user rows are always there; no join is needed and the number of rows is the same.

I know, but as you can see from the C# code, I may need to sort on users.adusername if the user decides to do that in the grid. Also, regardless of any sorting, I need to display the adusername property value in the grid, so I need the join for that purpose.

Otis wrote:

What I find strange about the profile is that the slow part is 'Sending data', so it's not something in the query itself but rather the processing of the resultset. This might indeed be the Distinct you mentioned.

Yep, that's what led me to test the DISTINCT's impact in the first place.

Otis wrote:

Removing the ORDER BY isn't recommended when paging, as you need an ordering; otherwise you get random rows in the page (i.e. the second page might be the same as the first page).

I know - I just wanted to test it, and since the result improved dramatically, I wanted to let you know. It might help you figure out what is wrong.

Walaa
Support Team
Posts: 14950
Joined: 21-Aug-2005
# Posted on: 21-May-2021 10:19:43   

I'd try the following:

1- Make sure users.Id and logevents.UserId have the exact same data type. God forbid they are strings; if so, make sure the collation is the same. (See the sketch below.)

2- Rebuild the indexes in both tables.
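
For point 1, a hand-written check against information_schema shows both column definitions side by side (a sketch):

SELECT TABLE_NAME, COLUMN_NAME, COLUMN_TYPE, COLLATION_NAME
FROM information_schema.COLUMNS
WHERE TABLE_SCHEMA = DATABASE()
  AND ((TABLE_NAME = 'users' AND COLUMN_NAME = 'Id')
    OR (TABLE_NAME = 'logevents' AND COLUMN_NAME = 'UserId'));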

JanRHansen
User
Posts: 37
Joined: 02-Jan-2019
# Posted on: 21-May-2021 10:37:59   

Walaa wrote:

I'd try the following:

1- Make sure users.Id and logevents.UserId have the exact same data type. God forbid they are strings; if so, make sure the collation is the same.

Both are int(11)

Walaa wrote:

2- Rebuild the indexes in both tables.

I ran

OPTIMIZE TABLE logevents;
OPTIMIZE TABLE users;

Both returned the same type of answer (the one for logevents is shown below):

Table                Op        Msg_type  Msg_text
hevesydoc.logevents  optimize  note      Table does not support optimize, doing recreate + analyze instead
hevesydoc.logevents  optimize  status    OK

The query still takes 7-8 seconds.

Otis
LLBLGen Pro Team
Posts: 39614
Joined: 17-Aug-2003
# Posted on: 21-May-2021 12:50:35   

I don't see anything we can do, to be honest. The query should run fast, but MySQL apparently chokes on it. Other than the indexes, which you already have, there's little else we can do.

As you say, the distinct is slow: the database has to spool the intermediate resultset into a temp table and then filter; the same goes for the ORDER BY ... DESC. This is usually not that slow, as the distinct can be applied while fetching rows to send to the client (it doesn't need to make the whole set distinct; only 20 rows are requested). But the temp table entries in your profile are fast; the only slow step is the 'Sending data' part, and that can be anything.
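
If it is the distinct over the whole set that this MySQL version chokes on, one hand-written workaround sketch (not something our runtime generates) is to page in a derived table first and join the users afterwards, so the expensive work only touches the 20 requested rows:

-- Page the logevents first (fast, uses the PK index for the ORDER BY),
-- then join the users only for the 20 selected rows:
SELECT le.*, u.`ADUserName`
FROM
  (SELECT * FROM `logevents` ORDER BY `Id` DESC LIMIT 20) AS le
LEFT JOIN `users` u ON u.`Id` = le.`UserId`
ORDER BY le.`Id` DESC;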

Frans Bouma | Lead developer LLBLGen Pro
JanRHansen
User
Posts: 37
Joined: 02-Jan-2019
# Posted on: 21-May-2021 12:54:54   

Yeah...

But hey - news from here now. It seems to be related to the - admittedly - very old MySQL server version (5.5 or 5.6).

I just tried to install MySQL 8.0.25, copy the database over and run the slow query. Execution time is now 0.011 seconds. I should have looked there in the first place, but hindsight is always 20/20, right?

Don't waste any more time on this issue; I'll upgrade the DB as part of the project.

Thanks a lot for your very kind and quick help!

/Jan