Oracle ODP.NET "Connection request timed out" error

Posts   
 
    
greenstone
User
Posts: 132
Joined: 20-Jun-2007
# Posted on: 12-Oct-2021 23:42:58   

Our web application (using llblgen 5.6.2.0) with Oracle (using ODP.NET) as the database. There are typically several thousand concurrent users.

We are getting the error "Connection request timed out" errors (different web site users) in bursts of 20 at the same time. Then it might be 70 minutes, or 60, or 10 minutes before we see a burst of "Connection request timed out" again. The "Connection request timed out" is on different pages handlers, but all happen at the same time.

This oracle link (below) suggests to either "Validate connection=true" in the connection string (which would likely really diminish performance) -or-

The Validate Connection attribute validates connections coming out of the pool. This attribute should be used only when absolutely necessary, because it causes a round-trip to the database to validate each connection immediately before it is provided to the application. If invalid connections are uncommon, developers can create their own event handler to retrieve and validate a new connection, rather than using the Validate Connection attribute. This generally provides better performance.*

https://community.oracle.com/tech/developers/discussion/4343363/oracle-manageddataaccess-client-oracleexception-0x80004005-connection-request-timed-out

Below is a sample of the error we receive.

Suggestion of addressing this ODP.NET issue? Maybe something put in the llblgen layer to re-invigorate the connection without failing the connection?

Error Description: The following exception occurred at PR Get(Oracle.ManagedDataAccess.Client.ConnectionString, Boolean, System.String, Boolean): Connection request timed out

The following exception occurred at Grb.Platform.Framework.Business.Lower.EntityClasses.UserEmployeeEntity GetUserEmployeeByEmpGeneralId(Int32, Boolean): BusinessManager_GetUserEmployeeByEmpGeneralId, empGeneralId: xxx9113

The following exception occurred at Boolean HandleError(System.Exception): Exception of type 'System.Web.HttpUnhandledException' was thrown.

Stack Trace:

   at OracleInternal.ConnectionPool.PoolManager`3.Get(ConnectionString csWithDiffOrNewPwd, Boolean bGetForApp, String affinityInstanceName, Boolean bForceMatch)
   at OracleInternal.ConnectionPool.OraclePoolManager.Get(ConnectionString csWithNewPassword, Boolean bGetForApp, String affinityInstanceName, Boolean bForceMatch)
   at OracleInternal.ConnectionPool.OracleConnectionDispenser`3.Get(ConnectionString cs, PM conPM, ConnectionString pmCS, SecureString securedPassword, SecureString securedProxyPassword)
   at Oracle.ManagedDataAccess.Client.OracleConnection.Open()
   at SD.LLBLGen.Pro.ORMSupportClasses.DataAccessAdapterCore.OpenConnection()
   at SD.LLBLGen.Pro.ORMSupportClasses.DataAccessAdapterCore.ExecuteMultiRowRetrievalQuery(IRetrievalQuery queryToExecute, IEntityFactory2 entityFactory, IEntityCollection2 collectionToFill, IFieldPersistenceInfo[] fieldsPersistenceInfo, Boolean allowDuplicates, IEntityFields2 fieldsUsedForQuery)
   at SD.LLBLGen.Pro.ORMSupportClasses.DataAccessAdapterCore.FetchEntityCollectionInternal(QueryParameters parameters)
   at SD.LLBLGen.Pro.ORMSupportClasses.DataAccessAdapterCore.FetchEntityCollection(QueryParameters parameters)
   at SD.LLBLGen.Pro.ORMSupportClasses.DataAccessAdapterBase.<>c__DisplayClass10_0.<FetchEntityCollection>b__0()
   at SD.LLBLGen.Pro.ORMSupportClasses.DataAccessAdapterCore.FetchEntityCollection(IEntityCollection2 collectionToFill, IRelationPredicateBucket filterBucket, Int32 maxNumberOfItemsToReturn, ISortExpression sortClauses, IPrefetchPath2 prefetchPath)
   at Grb.Platform.Business.HR.BusinessManager.GetUserEmployeeByEmpGeneralId(Int32 empGeneralId, Boolean fetchLoginEntity) in D:\Source\WorkingSource\Production\Platform\Business\HR\BusinessManager.vb:line 10295   at Grb.Platform.Business.HR.BusinessManager.GetUserEmployeeByEmpGeneralId(Int32 empGeneralId, Boolean fetchLoginEntity) in D:\Source\WorkingSource\Production\Platform\Business\HR\BusinessManager.vb:line 10309
   at Grb.Platform.Presentation.Employee.NewUserVideo.Page_Init(Object sender, EventArgs e) in D:\Source\WorkingSource\Production\Platform\Presentation\Employee\NewUserVideo.aspx.cs:line 31
   at System.EventHandler.Invoke(Object sender, EventArgs e)
   at System.Web.UI.Control.OnInit(EventArgs e)
   at System.Web.UI.Page.OnInit(EventArgs e)
   at Grb.Platform.Framework.Presentation.CommonWebForm.OnInit(EventArgs e) in D:\Source\WorkingSource\Production\Platform\Framework\Presentation\CommonWebForm.vb:line 212
   at System.Web.UI.Control.InitRecursive(Control namingContainer)
   at System.Web.UI.Page.ProcessRequestMain(Boolean includeStagesBeforeAsyncPoint, Boolean includeStagesAfterAsyncPoint)   at System.Web.UI.Page.HandleError(Exception e)
   at System.Web.UI.Page.ProcessRequestMain(Boolean includeStagesBeforeAsyncPoint, Boolean includeStagesAfterAsyncPoint)
   at System.Web.UI.Page.ProcessRequest(Boolean includeStagesBeforeAsyncPoint, Boolean includeStagesAfterAsyncPoint)
   at System.Web.UI.Page.ProcessRequest()
   at System.Web.UI.Page.ProcessRequest(HttpContext context)
   at System.Web.HttpApplication.CallHandlerExecutionStep.System.Web.HttpApplication.IExecutionStep.Execute()
   at System.Web.HttpApplication.ExecuteStepImpl(IExecutionStep step)
   at System.Web.HttpApplication.ExecuteStep(IExecutionStep step, Boolean& completedSynchronously)
Otis avatar
Otis
LLBLGen Pro Team
Posts: 39588
Joined: 17-Aug-2003
# Posted on: 13-Oct-2021 10:10:48   

Is this on a cloud based system? The error looks like a transient error one, so you could recover from that using a RecoveryStrategy. https://www.llblgen.com/Documentation/5.6/LLBLGen%20Pro%20RTF/Using%20the%20generated%20code/gencode_transientrecovery.htm

The SqlAzureRecoveryStrategy type is what you should look into (copy the class from the source code archive, available on the website in the Extra's section). Of course you have to replace the code with ODP.NET specific code and Oracle errors, but you can use that to catch the exception there, check if the error code is one you can retry (like with this one), and the strategy will retry the action for the number of times you specify. I don't have a list of transient error codes handy for oracle, sorry but I don't think these are hard to come by.

Would that help you?

Frans Bouma | Lead developer LLBLGen Pro
greenstone
User
Posts: 132
Joined: 20-Jun-2007
# Posted on: 13-Oct-2021 12:11:05   

Otis wrote:

Is this on a cloud based system? The error looks like a transient error one, so you could recover from that using a RecoveryStrategy. https://www.llblgen.com/Documentation/5.6/LLBLGen%20Pro%20RTF/Using%20the%20generated%20code/gencode_transientrecovery.htm

The SqlAzureRecoveryStrategy type is what you should look into (copy the class from the source code archive, available on the website in the Extra's section). Of course you have to replace the code with ODP.NET specific code and Oracle errors, but you can use that to catch the exception there, check if the error code is one you can retry (like with this one), and the strategy will retry the action for the number of times you specify. I don't have a list of transient error codes handy for oracle, sorry but I don't think these are hard to come by.

Would that help you?

Not a cloud based server or database, a local web server and a local physical database. Thanks for the info on the RecoveryStrategy: That's good info and seems like a good point I may need to implement this kind of code. I'm still puzzling a bit of why this ODP.NET error has started happening, as this has been running for years now and this error (seeming to do with ODP.NET Connection Pooling) is moderately new.

Otis avatar
Otis
LLBLGen Pro Team
Posts: 39588
Joined: 17-Aug-2003
# Posted on: 13-Oct-2021 13:50:22   

Perhaps someone updated the ODP.NET Client? (If you're using the old client wrapping one) or updated the ODP.NET provider perhaps?

Frans Bouma | Lead developer LLBLGen Pro
greenstone
User
Posts: 132
Joined: 20-Jun-2007
# Posted on: 18-Oct-2021 16:06:08   

Thanks for the thought. Still looking into it, and will post what we find (for future reference)

Otis wrote:

Perhaps someone updated the ODP.NET Client? (If you're using the old client wrapping one) or updated the ODP.NET provider perhaps?

Otis avatar
Otis
LLBLGen Pro Team
Posts: 39588
Joined: 17-Aug-2003
# Posted on: 19-Oct-2021 10:32:39   

If it's of any help: we didn't change anything in how we operate with Oracle ODP.NET: it's straight forward ADO.NET, nothing specific. We do support 2 different ODP.NET variants (managed, which doesn't require the CLI and the CLI wrapping one), not sure which one you're using. Both are obtained through the DbProviderFactory system and obtained via the GAC so a new install of ODP.NET might have affected your software as well.

Frans Bouma | Lead developer LLBLGen Pro
greenstone
User
Posts: 132
Joined: 20-Jun-2007
# Posted on: 09-Dec-2021 00:41:06   

Otis wrote:

If it's of any help: we didn't change anything in how we operate with Oracle ODP.NET: it's straight forward ADO.NET, nothing specific. We do support 2 different ODP.NET variants (managed, which doesn't require the CLI and the CLI wrapping one), not sure which one you're using. Both are obtained through the DbProviderFactory system and obtained via the GAC so a new install of ODP.NET might have affected your software as well.

Thanks for letting us know you use straight ODP.NET. We use the Managed ODP.NET.

We've implemented the recovery strategy and it does seem to recover after one or a couple tries (takes sometimes 30 seconds...up to 2 minutes to fully recover)...we know the timing as we log the info provided by the recovery strategy when it's reported a given SQL call is successful after X tries. Thanks for pointing us this way.

We're still trying to figure out the root cause of the "Connection request timed out".

We're wondering if there is some query we're making right before these error start happening that is "breaking" the connection for us. Is there a way we can log the last llblgen generated SQL query before a recovery is found to be needed?

daelmo avatar
daelmo
Support Team
Posts: 8245
Joined: 28-Nov-2005
# Posted on: 09-Dec-2021 07:20:30   

greenstone wrote:

We're wondering if there is some query we're making right before these error start happening that is "breaking" the connection for us. Is there a way we can log the last llblgen generated SQL query before a recovery is found to be needed?

Have you tried ORMProfiler to profile the sql calls?

David Elizondo | LLBLGen Support Team
Otis avatar
Otis
LLBLGen Pro Team
Posts: 39588
Joined: 17-Aug-2003
# Posted on: 09-Dec-2021 10:20:11   

greenstone wrote:

We're wondering if there is some query we're making right before these error start happening that is "breaking" the connection for us. Is there a way we can log the last llblgen generated SQL query before a recovery is found to be needed?

The recovery strategy traces its recovery to the query execution trace switch (ORMQueryExecution) (info level). executed queries are also traced to that trace switch (verbose level). The verbose level gives a lot of data, so running this for a day might not be that helpful. But if you know which functionality will trigger it, you might be able to narrow it down a bit.

Connection timeouts however happen when the connection is opened. You could override: SD.LLBLGen.Pro.ORMSupportClasses.DataAccessAdapterCore.ExecuteMultiRowRetrievalQuery(IRetrievalQuery queryToExecute, IEntityFactory2 entityFactory, IEntityCollection2 collectionToFill, IFieldPersistenceInfo[] fieldsPersistenceInfo, Boolean allowDuplicates, IEntityFields2 fieldsUsedForQuery)

in a partial class of DataAccessAdapter, and wrap the call to the base method with a try/catch. If the connection open fails with the exception, you can then log the query there as it's in queryToExecute.

Frans Bouma | Lead developer LLBLGen Pro
greenstone
User
Posts: 132
Joined: 20-Jun-2007
# Posted on: 09-Dec-2021 16:44:45   

daelmo wrote:

greenstone wrote:

We're wondering if there is some query we're making right before these error start happening that is "breaking" the connection for us. Is there a way we can log the last llblgen generated SQL query before a recovery is found to be needed?

Have you tried ORMProfiler to profile the sql calls?

Thanks Daelmo. This is an issue we're only seeing in production when there are thousands of concurrent users, so I don't think the profiler will be able to handle that level of traffic.

greenstone
User
Posts: 132
Joined: 20-Jun-2007
# Posted on: 09-Dec-2021 16:45:15   

Otis wrote:

greenstone wrote:

We're wondering if there is some query we're making right before these error start happening that is "breaking" the connection for us. Is there a way we can log the last llblgen generated SQL query before a recovery is found to be needed?

The recovery strategy traces its recovery to the query execution trace switch (ORMQueryExecution) (info level). executed queries are also traced to that trace switch (verbose level). The verbose level gives a lot of data, so running this for a day might not be that helpful. But if you know which functionality will trigger it, you might be able to narrow it down a bit.

Connection timeouts however happen when the connection is opened. You could override: SD.LLBLGen.Pro.ORMSupportClasses.DataAccessAdapterCore.ExecuteMultiRowRetrievalQuery(IRetrievalQuery queryToExecute, IEntityFactory2 entityFactory, IEntityCollection2 collectionToFill, IFieldPersistenceInfo[] fieldsPersistenceInfo, Boolean allowDuplicates, IEntityFields2 fieldsUsedForQuery)

in a partial class of DataAccessAdapter, and wrap the call to the base method with a try/catch. If the connection open fails with the exception, you can then log the query there as it's in queryToExecute.

Thanks Otis!