Hello there!
SQL Server 2008 (10.0.4000)
Dynamics NAV 2009 R2 (6.0.32012)
Classic Client only
Background story:
Got a call from a client some time ago. The client told me they had to shut down the server for planned maintenance work (on the power lines). After they started the server again, the database "didn't come up immediately". Instead of waiting, they rebooted the server again. Not the best idea, but it seemed to work: the database was accessible. Soon after, NAV started showing nasty errors. Once the errors were so bad that they couldn't work anymore, they contacted me.
It was clear from the error messages that they had consistency errors within the database, and a DBCC CHECKDB confirmed this (hundreds of them!). Luckily, only nonclustered indexes (NCI) were involved. Before I started repairing the database, I checked the system.
It's an old server (>5 years) without warranty. The SQL errorlog and Windows event logs were fine. So were the server's maintenance logs (HP). No indication of a hardware fault or a damaged logical volume. That's strange.
My first step was to update all firmware and drivers (all of it pretty old).
After that I repaired all broken indexes and did a final consistency check, which returned no errors. I set up a consistency check task to run daily (at night). Just in case I missed anything or the problem returns.
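For reference, a nightly check like this can be as small as a single statement in a SQL Server Agent job step. This is only a minimal sketch, not the exact job from the server: the database name `NAVDB` is a placeholder.

```sql
-- Minimal sketch of a nightly consistency check (T-SQL job step).
-- NAVDB is a hypothetical database name; replace it with the real one.
-- NO_INFOMSGS suppresses informational output, ALL_ERRORMSGS lists every
-- error per object instead of only the first ones.
DBCC CHECKDB (N'NAVDB') WITH NO_INFOMSGS, ALL_ERRORMSGS;
```

With these options the output stays empty as long as the database is clean, so any output at all is worth a look.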
All was good - for two days.
After two days I got a call: the CheckDB task reported errors (error 8936, severity 16). Indeed, there were new consistency errors, just a handful. Again, only NCI (and one VSIFT$0). It took me only a couple of minutes to fix them, no big deal. My problem is, I still don't know how or why they happen. And I don't want to wait until a table's clustered index (CI) is affected, that would be nasty.
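To illustrate why damaged NCIs are a cheap fix: a nonclustered index can be thrown away and rebuilt from the base table. The following is a sketch of one common way to do that, not necessarily the exact commands used here. The table name `CRONUS$Sales Line` and the index name `$1` are placeholders in NAV's naming style.

```sql
-- Sketch only: rebuild a damaged nonclustered index from the base table.
-- [CRONUS$Sales Line] and [$1] are hypothetical names; take the real ones
-- from the object and index IDs in the DBCC CHECKDB output.

-- Disabling first drops the index data, so the rebuild has to read the
-- clustered index (or heap) instead of the damaged index pages.
ALTER INDEX [$1] ON [dbo].[CRONUS$Sales Line] DISABLE;
ALTER INDEX [$1] ON [dbo].[CRONUS$Sales Line] REBUILD;

-- Re-check just this table afterwards.
DBCC CHECKTABLE (N'dbo.CRONUS$Sales Line') WITH NO_INFOMSGS, ALL_ERRORMSGS;
```

This only works because the clustered index still holds a clean copy of the data. Once the CI itself is damaged, the options are a restore or a repair that may lose data.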
The affected tables/indexes seem to be totally random. Server logs and Windows logs are clean. So it seems it's not a hardware/SQL Server issue, but maybe an application one? Could NAV be the cause of this? Maybe because of a bad build (though we have installed this build quite a lot in the past and never saw anything like this) or through bad programming? Am I missing something?
Errors still happen, usually after a couple of days, and the past two times there were only 2 errors each time.