The following mail reached me a few days ago and I think the solution could also help others. It concerns a single Exchange 2016 server on Windows Server 2012 R2. Here is an excerpt of the problem from the email:
But now I am at the end of my rope and wanted to find out if you have ever had such an error in your Exchange installations. My Exchange Server "dies" almost daily with the following effects:
- Mails are no longer delivered and I receive an error message in the queue overview (see PrintScreen in the attachment)
- Some users can no longer connect to the Exchange Server (in the connection overview, it counts up the connection attempts endlessly - but no connection is established).
I am currently working around this by restarting the Information Store and the RPC Client Access service. Everything then runs again for a few hours until the same error returns.
A small screenshot was also attached:
Here is the text of the message:
432 4.3.2 STOREDRV; mailbox server is offline; STOREDRV.Deliver.Exception:ConnectionFailedTransientException.MapiExceptionNetworkError; Failed to process message due to a transient exception with message Underlying MAPI stream threw exception
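The same error text can also be read without the queue viewer. A small sketch for the Exchange Management Shell; filtering on LastError is just one way to narrow the output and is not part of the original mail:

```powershell
# List only the queues that currently report an error, with the full error text
Get-Queue |
    Where-Object { $_.LastError } |
    Format-List Identity, DeliveryType, Status, MessageCount, LastError
```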
I already had a hunch where the problem was and then requested the event log. They kindly provided me with the event log and also gave me a time when the problem last occurred.
The following error was logged in the application log:
Source: MSExchange Availability
ID: 4009
Process Microsoft.Exchange.InfoWorker.Common.Delayed`1[System.String]: Unable to open connection for mailbox MAILBOX SMTP:EMAIL-Address. Exception returned is: Microsoft.Exchange.Data.Storage.StorageTransientException: The process failed to get the correct properties. ---> Microsoft.Mapi.MapiExceptionRpcServerTooBusy: MapiExceptionRpcServerTooBusy: Unable to get properties on object. (hr=0x80004005, ec=2419)
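If you want to look for this event on your own server, a query along these lines should work. It is only a sketch: the 24-hour window is an assumption and should be adjusted to the time the problem occurred, and Get-WinEvent reports an error if no matching event exists.

```powershell
# Search the application log for event 4009 from MSExchange Availability
# Assumption: the problem occurred within the last 24 hours
Get-WinEvent -FilterHashtable @{
    LogName      = 'Application'
    ProviderName = 'MSExchange Availability'
    Id           = 4009
    StartTime    = (Get-Date).AddDays(-1)
} | Select-Object TimeCreated, Id, Message | Format-List
```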
That made the cause clear: MapiExceptionRpcServerTooBusy.
The server is too busy to answer the request. Throttling policies exist to prevent individual users from consuming so many system resources that other users or the whole system slow down. They limit the resources that a single user can use.
To be on the safe side, I then asked another question:
Do many of your users have several mailboxes open? How many mailboxes are in one database?
The answer to my question:
I would estimate the number of mailboxes to be around 450 and yes, most of my users (I have around 80-100 concurrent users) have several (up to 20) mailboxes open.
In this case, all mailboxes were stored in a single database.
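The distribution of mailboxes across databases can be checked quickly in the Exchange Management Shell. A minimal sketch:

```powershell
# Count the mailboxes per database, largest database first
Get-Mailbox -ResultSize Unlimited |
    Group-Object -Property Database |
    Sort-Object -Property Count -Descending |
    Format-Table Name, Count -AutoSize
```

In the environment described here, this would list a single database with roughly 450 mailboxes.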
On the trail of the problem
I recommended creating a second database and moving about half of the mailboxes to it. The aim is a balanced ratio so that the load is spread as evenly as possible. So rather than moving all power users to the new database, move only half of them and fill the rest of the new database with less active mailboxes. This usually gives a fairly even split of both database size and load. Note that Exchange Server 2016 Standard Edition supports a maximum of 5 mounted databases per server (Enterprise Edition: 100 mounted databases per server).
The load can therefore be better distributed by using several databases. Multiple databases also make sense with regard to backup and restore, as one stream can be opened for each database during backup (if the appropriate software is used), thereby increasing the throughput of the backup. Restore times can also be improved; instead of restoring a large database from the backup in the event of an error, it may be sufficient to restore one of several smaller databases.
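The database split recommended above could look roughly like this. All names and paths (DB2, EX01, the folder paths and the mailbox address) are hypothetical placeholders, not values from the affected environment:

```powershell
# Hypothetical names and paths: DB2, EX01, D:\DB2\DB2.edb, E:\Logs\DB2
New-MailboxDatabase -Name 'DB2' -Server 'EX01' `
    -EdbFilePath 'D:\DB2\DB2.edb' -LogFolderPath 'E:\Logs\DB2'
Mount-Database -Identity 'DB2'

# Move a single mailbox (hypothetical address); repeat for every mailbox that should move
New-MoveRequest -Identity 'user1@example.com' -TargetDatabase 'DB2'

# Watch the progress of the move requests
Get-MoveRequest | Get-MoveRequestStatistics |
    Format-Table DisplayName, Status, PercentComplete -AutoSize
```

Which mailboxes to move is a manual decision based on size and activity, as described above. The sketch does not try to automate that selection.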
Back to the throttling policies:
By default, there is a global throttling policy for all Exchange 2016 servers. Among other things, this policy defines the number of simultaneous connections per user/client/protocol. One value is particularly important for the current case: RcaMaxConcurrency
The RcaMaxConcurrency parameter specifies how many concurrent connections an RPC Client Access user can have against an Exchange server at one time. A connection is held from the moment a request is received until the connection is closed or the connection is otherwise disconnected (for example, if the user goes offline). If users attempt to make more concurrent requests than their policy allows, the new connection attempt fails. However, the existing connections remain valid.
RcaMaxConcurrency therefore defines the number of simultaneous RPC Client Access connections per user. The default value is 40:
Get-ThrottlingPolicy | ft name,RcaMaxConcurrency
In this case, however, some users have up to 20 mailboxes open. Here is a screenshot for my user who has 2 mailboxes open:
Here you can see 5 connections, 3 for the primary mailbox (my own) and 2 for the additional mailbox. Extrapolated to 20 mailboxes, this results in the following:
3 connections for your own mailbox and 20 x 2 connections for the additional mailboxes, a total of 43 connections. However, the policy allows only 40 simultaneous connections.
Apparently the RcaMaxConcurrency value of the throttling policy is applied per database, because all connections can be established if 10 of the mailboxes are stored in DB1 and the other 10 mailboxes in DB2. In that case there would be only 23 connections to DB1 (own mailbox + 10 other mailboxes) and 20 connections to DB2 (10 other mailboxes), both below the limit of 40.
However, I have not yet been able to verify this conclusively.
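The estimate above, written out as a small calculation. The figures are the ones from the text (3 connections for the own mailbox, 2 per additional mailbox, a limit of 40). The variable names are only for illustration, and the per-database counting is the unverified assumption mentioned above:

```powershell
$ownMailboxConnections = 3    # connections for the primary mailbox
$perAdditionalMailbox  = 2    # connections per additional mailbox
$additionalMailboxes   = 20
$rcaMaxConcurrency     = 40   # default value of the global policy

# All mailboxes in one database
$total = $ownMailboxConnections + $perAdditionalMailbox * $additionalMailboxes
"One database: $total connections (limit $rcaMaxConcurrency)"

# Own mailbox plus 10 additional mailboxes in DB1, the other 10 in DB2
$db1 = $ownMailboxConnections + $perAdditionalMailbox * 10
$db2 = $perAdditionalMailbox * 10
"Two databases: DB1 = $db1, DB2 = $db2 (limit $rcaMaxConcurrency each, if counted per database)"
```

With one database the result is 43, which is above the limit. With two databases the result is 23 and 20.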
The solution to the problem
Ultimately, two measures were implemented that solved the problem:
- An additional database was created and the mailboxes were distributed as evenly as possible based on mailbox size and user activity.
- An additional throttling policy was created with a higher value for the RcaMaxConcurrency parameter:
New-ThrottlingPolicy -Name FrankysWebPolicy -ThrottlingPolicyScope Organization -RcaMaxConcurrency 80
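To check that the new value is in place, the policies can be listed side by side. The second part of the sketch shows an alternative that was not used here: a policy with the scope Regular that is assigned only to selected mailboxes. The policy name ManyMailboxesPolicy and the mailbox address are hypothetical.

```powershell
# Show all throttling policies with their scope and RcaMaxConcurrency value
Get-ThrottlingPolicy |
    Format-Table Name, ThrottlingPolicyScope, RcaMaxConcurrency -AutoSize

# Alternative (assumption, not what was done in this case):
# raise the limit only for users who open many mailboxes
New-ThrottlingPolicy -Name 'ManyMailboxesPolicy' -ThrottlingPolicyScope Regular -RcaMaxConcurrency 80
Set-Mailbox -Identity 'user1@example.com' -ThrottlingPolicy 'ManyMailboxesPolicy'
```

A policy with the scope Regular keeps the default limit for everyone else, which may be preferable if only a few users open that many mailboxes.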
So far, the problem has not recurred.