Macrium Support Forum

Annoying Erroneous Backup Failed Alerts Caused By Temporary Agent Disconnect

https://forum.macrium.com/Topic72116.aspx

By uit - 7 July 2023 1:28 AM

We are loving SM 8.1!  Thanks for making it much faster!
Though, one thing that's bothering us is lots of email alert messages when an SM Agent briefly loses connection with the SM Server. Can you all throttle back the sensitivity please?

Unknown Operation failed on Server1
Error - Agent disconnected
Definition: Server1 Backup
Repository: Server1 Offsite
Please check the Site Manager for further information.

That reminds me: after an SM Agent loses its connection to the SM Server, even briefly, all subsequent messages, including the final backup success/failure email, no longer show the SM Agent's server name. Where the Agent's server name would normally appear, there's a blank space.

Thanks!

By Alex - 9 July 2023 10:45 AM

Hi,

Internally, the Site Manager Agent keeps an active connection to the server. The connection uses TCP timeouts and retries to stay active, so it should not break unless network conditions cause the break.
It might be best to contact support and see if they can collect some logs for analysis so we can check what's going on. Sometimes backups (particularly with 8.1's faster speed) can saturate a client's network connection to the point that they drop the agent comms channel. Backup Definitions have a rate limit option for this, so you could try setting that to 80% of your client's network connection bandwidth and see if it helps.
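As a rough illustration of the 80% suggestion above, the arithmetic can be sketched as follows. The function name and the use of Mbit/s are assumptions made for this example; the thread does not say which units the rate limit option in a Backup Definition expects.

```python
def rate_limit_mbps(link_mbps: float, fraction: float = 0.8) -> float:
    """Return a backup rate limit as a fraction of the link bandwidth.

    Hypothetical helper: the units (Mbit/s) are an assumption, not
    something the Site Manager documentation is quoted as requiring.
    """
    if link_mbps <= 0:
        raise ValueError("link_mbps must be positive")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    return link_mbps * fraction


# Example: a client with a 100 Mbit/s uplink.
print(rate_limit_mbps(100))
```

The remaining 20% is headroom for other traffic, including the agent comms channel.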

For the email issue you are seeing, are you using Site Manager or the Agent Reflect to send those emails?

By uit - 9 July 2023 3:20 PM

Yes, the alerts are caused by a very brief network communication slowdown or interruption. I know this because it often happens while remote servers are backing up and chewing up a good amount of the Internet connection. HOWEVER, Macrium could be less sensitive to brief interruptions. It's not transiting the backup data, so it needn't be such a stickler. Deliberately slowing down backups is not a proper way to fix this issue.
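To make "less sensitive" concrete, here is a minimal sketch of a grace period for disconnect alerts: an alert is raised only if the agent stays disconnected longer than a configurable window. This is purely illustrative. The class and method names are hypothetical and it does not describe how Site Manager actually handles alerts.

```python
class DisconnectDebouncer:
    """Suppress alerts for disconnects shorter than a grace period.

    Hypothetical illustration only, not Site Manager's implementation.
    Timestamps are plain seconds supplied by the caller.
    """

    def __init__(self, grace_seconds: float):
        self.grace_seconds = grace_seconds
        self._down_since = {}  # agent name -> time the disconnect was first seen

    def on_disconnect(self, agent: str, now: float) -> None:
        # Keep the earliest timestamp if the agent is already marked as down.
        self._down_since.setdefault(agent, now)

    def on_reconnect(self, agent: str) -> None:
        # A reconnect inside the grace period cancels the pending alert.
        self._down_since.pop(agent, None)

    def due_alerts(self, now: float) -> list:
        """Return agents down for at least the grace period, once each."""
        due = [a for a, t in self._down_since.items()
               if now - t >= self.grace_seconds]
        for agent in due:
            del self._down_since[agent]
        return due


debouncer = DisconnectDebouncer(grace_seconds=60)
debouncer.on_disconnect("Server1", now=0)
debouncer.on_reconnect("Server1")       # brief blip: no alert is sent
print(debouncer.due_alerts(now=100))
```

The trade-off is that a genuine failure is reported one grace period later.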

SM sends the emails.

Reminder: After a SM Agent loses connection, even briefly, to the SM Server, from that point forward, all additional messages, including the final backup success/failure email message, don't show the SM Agent's Server name any longer. Where the Agent Server name would typically appear, there's a blank space.

Finally, the server running SM is very busy during backups, which is curious. Again, getting back to Macrium's design, could it be more efficient in the way it monitors backups? We're intending to add more Agent licenses. I'm wondering if it's going to overwhelm the server with its incessant behavior.

By Alex - 10 July 2023 10:53 AM

A drop should only happen because of events that exceed the TCP comms tolerances, which we can set, and that should only happen in cases of genuine disconnection or complete network saturation. Normally, we'd set these tolerances high to prevent network drop-off, but since the number of simultaneous backups to a repo is limited, we don't want to hold up queued backups for an Agent that has genuinely lost its connection, so we fail a bit sooner. It's possible that these limits aren't working well for your setup. If you talk to our support team, they can gather some logs and information for the dev team to analyse.
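The trade-off described above can be illustrated with standard TCP keepalive settings, where the time to declare a silent peer dead is roughly the idle time plus the probe interval multiplied by the probe count. This is a sketch under assumptions: the thread does not say that Site Manager uses keepalive specifically, and the helper names and the values below are hypothetical.

```python
import socket


def worst_case_detection_seconds(idle: int, interval: int, probes: int) -> int:
    """Approximate time before a silent peer is declared dead."""
    return idle + interval * probes


def enable_keepalive(sock: socket.socket, idle: int, interval: int, probes: int) -> None:
    """Turn on TCP keepalive and apply tuning options where available.

    The per-socket tuning constants are platform dependent, so each one
    is applied only if this Python build exposes it.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    for name, value in (("TCP_KEEPIDLE", idle),
                        ("TCP_KEEPINTVL", interval),
                        ("TCP_KEEPCNT", probes)):
        option = getattr(socket, name, None)
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)


# Illustrative values only: 60 s idle, then 5 probes sent 10 s apart.
print(worst_case_detection_seconds(idle=60, interval=10, probes=5))
```

Higher values tolerate longer interruptions but keep a repository slot occupied longer when an agent has genuinely gone away, which is the balance described above.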

For the second point, the notifications should have the agent information attached. That is all done via cached data from the Site Manager server, not a reference back to the original data. It might be a bug or something else going on, so I'd ask you to contact support so we can gather logs and get a fix sorted out.

We have some users running 500 agents to a server, so things should scale fine to that level. The server is generally busy maintaining a live cache for any web sessions to view, and this load should be fairly constant regardless of the number of backups available.