Recently I was troubleshooting a case where the Database Replication link had failed between the CAS and one of the Primary Sites.
Problem: The main issue was with the Data Replication Service (DRS). The Primary Site’s database server was found to be running low on storage. However, after the database storage was increased, the link between the CAS and the Primary Site remained broken.
Findings and Resolution: There was a drive issue on the Primary Site’s database site system, which caused replication to break.
Actions taken upon investigation:
- Restarted the SMS Executive and SMS Site Component Manager services on the Primary Site (PS3) server
- Restarted the SQL services on the CAS and the affected Primary Site’s database servers
Symptoms: It was reported that applications deployed to Windows 10 machines were not appearing in Software Center, and that only machines under the Primary Site (PS3) were affected.
During investigation: We confirmed that package contents were not being distributed to any of the Distribution Points (DPs) under the Primary Site (PS3).
Checking the Database Replication: Link state = Link Failed
Checking System Status > Site Status: All statuses were OK; however, free space on the Primary Site’s database was running low.
Resolution 1: Worked with the Server and Database teams to increase the storage for the database.
Problem solved?: No, the link state was still Link Failed.
Further investigation and actions: The CAS was confirmed active, and the Primary Site was in replication maintenance.
The following query was run against the Primary Site’s database server:
select * from RCM_DrsInitializationTracking where InitializationStatus not in (6,7)
It did not return any output.
We then ran another query:
select * from RCM_ReplicationLinkStatus where SnapshotApplied <>1
it returned the following:
From the SCCM console, the Link State was Link Initializing
The next step was to restart the SMS Executive and SMS Site Component Manager services on the Primary Site (PS3) server. After the service restart, we were getting vLogs entries from the PS3 database that corresponded to the same status seen in the console for the two replication groups below:
Requests got stuck.
We did the same on the CAS database server. There were no cab files or SQL table dumps for the Init request in rcm.box.
The final step was to restart the SQL services on the CAS and PS3 database servers. After the restart, it took some time for the PS3 link to become Active. After running a query, we got the current site status: ReplicationActive SMS_REPLICATION_CONFIGURATION_MONITOR
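The post does not record which query produced that final status. As a rough sketch, assuming the query window is connected to the site database, one way to recheck replication after the SQL restart is to rerun the two queries from earlier and then call spDiagDRS, a diagnostic stored procedure that ships in the Configuration Manager site database. This is one possible check, not necessarily the query that was used here.

```sql
-- Run in the context of the site database (CAS or PS3).
-- The database name is environment-specific, so it is not shown here.

-- 1. Initialization requests whose status is not 6 or 7 (same filter as the first query above).
select * from RCM_DrsInitializationTracking where InitializationStatus not in (6,7);

-- 2. Replication links where SnapshotApplied is not 1 (same filter as the second query above).
select * from RCM_ReplicationLinkStatus where SnapshotApplied <> 1;

-- 3. Overall DRS diagnostics for the site, including its replication status.
exec spDiagDRS;
```

If the first two queries come back empty and the console shows the link as Active, replication has most likely recovered.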
xxx End of troubleshooting xxx
Hope this post helps someone who’s encountering the same problem.
Have a nice day!