Database Corrupted

vikram7_dabas Member Posts: 611
Dear Concern,
I'm using Navision 4.0 SP3 LS Retail IN version with the native database. There is one table, "Trans Sales Entry", and when I take a backup it shows the following error:
There is a corrupted area in the database. This type of error occurs if the database file is changed by another program or if the device driver does not function correctly.

When I test this single table it also shows the same error. When I copy that table and paste it into Excel, no error is shown. When I transfer the data into another table (created by us using Save As) with the TRANSFERFIELDS function, no error is shown either. Please let me know how I can find the bad data.
Vikram Dabas
Navision Technical Consultant

Answers

  • rdebath Member Posts: 383
    Okay, firstly DON'T PANIC.

    The good thing is you seem to be able to copy the data in the table, that means you have a good chance of recovering the situation without losing any data. Still ... CHECK your backups and make sure you don't overwrite any good backups.

    These sorts of corruption are very rare with Navision, but one way they can occur is when the database is being written to while the PC's power is failing. The memory can start to lose power and become corrupted while the system is still running well enough to write that data to the disk drives, complete with the flag that says it's valid ...

    I mention this because Navision A/S was rumoured to have a tool called C/DART that could roll back the most recent versions of the database. Microsoft may still have it and may be willing to use it, or even provide it to you (as the native database is no longer in the future plans).

    Okay, the easiest solution is to go to a backup, if you have one recent enough for the business to accept. If not, are you backing up frequently enough? Do you need MS-SQL's point-in-time restores?

    Assuming you want a full recovery, here is a list of things you can try ON A COPY OF THE DATABASE:
    1. Delete all the indexes on the table; recreate them by importing a saved copy of the object. Does it back up now?
    2. Try to delete all the data in the table, then delete and re-import the table and copy the data back.
      You need to check that you really have all the data, and that there isn't a chunk skipped somewhere in the middle!
      If some is missing, can you recover it from a previous backup?
      Does it back up?
    3. Copy every table to a different company in the database and back up that company instead.

    If you can't get all the data, this will still let you get most of it. Is that sufficient?
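    Checking that you "really have all the data" can be done by comparing primary keys. A minimal Python sketch, assuming you can export the primary-key values of the original table (for example from the copy pasted into Excel) and of the rebuilt table as lists of tuples; the export step and the key layout in the example are hypothetical. It reports each contiguous run of keys missing from the rebuilt table, which is exactly the "chunk skipped somewhere in the middle" case.

```python
from typing import Hashable, List, Optional, Sequence, Tuple

Key = Tuple[Hashable, ...]


def missing_runs(original_keys: Sequence[Key], restored_keys: Sequence[Key]) -> List[Tuple[Key, Key]]:
    """Return (first, last) key pairs for each contiguous run of original keys
    that is absent from the restored table, in the original key order."""
    restored = set(restored_keys)
    runs: List[Tuple[Key, Key]] = []
    start: Optional[int] = None  # index in original_keys where the current gap began
    for i, key in enumerate(original_keys):
        if key not in restored:
            if start is None:
                start = i
        elif start is not None:
            runs.append((original_keys[start], original_keys[i - 1]))
            start = None
    if start is not None:  # the table ended inside a gap
        runs.append((original_keys[start], original_keys[-1]))
    return runs


# Hypothetical keys: (store, transaction no.)
original = [("S1", 1), ("S1", 2), ("S1", 3), ("S1", 4), ("S1", 5)]
restored = [("S1", 1), ("S1", 4), ("S1", 5)]
print(missing_runs(original, restored))  # [(('S1', 2), ('S1', 3))]
```

    An empty result means every original key made it across; anything else tells you which range to look for in an older backup.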

    After you've created your backup, use it to create a brand-new database.

    ALSO ... Make sure the live hardware has no defects, run it through as many system tests as you can. Make sure the UPS works and the system shuts down BEFORE the battery runs out!
  • vikram7_dabas Member Posts: 611
    Dear Concern
    Can you please explain step 1? I think you mean that first I have to transfer the data of that table into another table, then delete the data of that table, then delete all keys from that table, and after that import the data again. Am I right?
    Vikram Dabas
    Navision Technical Consultant
  • rdebath Member Posts: 383
    Not step 1, it's the first option. Any one of A, B or C may fix the issue; if you're unlucky you will still lose some data.

    So to do Option A you:
    1. Export the table object as a .fob file
    2. Go into table design for that object and remove all the keys except the primary key.
    3. Remove any SumIndexFields on the primary key
    4. Save the object
    5. Reimport the object from the fob file.

    If this and the backup all work without error the problem was on one of the secondary indexes and the database may be fine now; nevertheless you should use your backup to build a new database.

    If there's an error at any point try Option B with a new copy of the database.
  • vikram7_dabas Member Posts: 611
    Dear concern
    I have done these steps, but it still shows the same error. I think there is something wrong in the primary key.
    Vikram Dabas
    Navision Technical Consultant
  • vikram7_dabas Member Posts: 611
    Dear Concern
    Please give me some other solution. I'm waiting for your response. Thanks in advance.
    Vikram Dabas
    Navision Technical Consultant
  • rdebath Member Posts: 383
    vikram7_dabas wrote:
    Dear Concern
    Please give me some other solution. I'm waiting for your response. Thanks in advance.
    Huh?
    vikram7_dabas wrote:
    Dear concern
    I have done these steps, but it still shows the same error. I think there is something wrong in the primary key.
    Not unreasonable, as it's the backup that found it. How about B or C then?
    Can you delete the contents of the table? e.g. TableVar.RESET; TableVar.DELETEALL;
    If not you're gonna have to copy the entire database record by record as I said in C.
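    Option C boils down to a loop that reads every record from the damaged company and inserts it into a fresh one, noting the records that cannot be read instead of stopping at the first failure. The Python below only sketches that control flow: `read_record`, `write_record` and `CorruptRecordError` are hypothetical stand-ins for whatever the real C/AL code (record variables, with CHANGECOMPANY to reach the other company) would do, not Navision APIs.

```python
from typing import Callable, Dict, Hashable, Iterable, List, Tuple


class CorruptRecordError(Exception):
    """Hypothetical error raised when a damaged record cannot be read."""


def copy_record_by_record(
    keys: Iterable[Hashable],
    read_record: Callable[[Hashable], Dict],
    write_record: Callable[[Hashable, Dict], None],
) -> Tuple[int, List[Hashable]]:
    """Copy each readable record; return (number copied, keys that failed)."""
    copied = 0
    failed: List[Hashable] = []
    for key in keys:
        try:
            record = read_record(key)
        except CorruptRecordError:
            failed.append(key)  # note it and carry on with the next record
            continue
        write_record(key, record)
        copied += 1
    return copied, failed


# Toy source table in which record 3 is damaged.
source = {1: {"amount": 10}, 2: {"amount": 20}, 3: None, 4: {"amount": 40}}
target: Dict[Hashable, Dict] = {}


def read_from_source(key: Hashable) -> Dict:
    if source[key] is None:
        raise CorruptRecordError(key)
    return dict(source[key])


copied, failed = copy_record_by_record(source, read_from_source, target.__setitem__)
print(copied, failed)  # 3 [3]
```

    The list of failed keys is what you would then try to recover from an older backup.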
  • David_Singleton Member Posts: 5,479
    Hi Vikram,

    I assume this is a client's live system. By now you should have contacted Microsoft and sent the database to them for repair. As rdebath has said, they could have used C/DART (C/SIDE Data Recovery Tool) to recover the database. But since you have waited so long, it's probably too late, since you would now have to roll back too far and lose too much data.

    Generally you have been given some very good advice on how to fix this. But this is not a job you want to do for the first time in a live environment, and you really should have gotten an expert to help you. The longer you let this go, the higher the risk of total failure. Basically, if the NAV application tries to read the damaged record it will lock the database and the client will be shut down. You are skating on thin ice. Or, more to the point, you have sent your client out on ice that you know is too thin to skate on.

    Over the years I have seen a lot of these errors, not because they happen a lot, but simply because recovery is what I do. One thing I can say is that the longer you wait, the worse this issue gets. You need to get the DB to Microsoft NOW and pay them to fix it, instead of trying to do it on the cheap.

    On the posts above, I do have one comment. As mentioned, I have fixed a lot of these corruptions, and NEVER ONCE did I ever see a database corrupted due to a power failure. I have seen a case where the client was using a badly configured RAID5, and when it failed it wrote a corrupt packet to disk, but this was a failure of the RAID setup, NOT of the power. I have had people tell me it was due to a power failure, and every time we found it to be something else.

    In my experience, the causes, from most to least likely, are:
    • Network errors, most commonly because of hubs or printers (more than 50% of all cases)
    • Bad SCSI cables used to connect drives or incorrect placement of terminators. (maybe 20%)
    • Incorrectly configured RAID or some "intelligent disk management software". (less than 10%)
    • Bad sector on drive (maybe 5%)
    • Power failure (0%)

    It's nearly always a network error.
    David Singleton
  • David_Singleton Member Posts: 5,479
    vikram7_dabas wrote:
    I'm waiting for your response.

    [-X
    David Singleton
  • David_Singleton Member Posts: 5,479
    David_Singleton wrote:
    NEVER ONCE did I ever see a database corrupted due to a power failure.

    Sorry there is one exception to this, but not the same corruption as the one Vikram has.

    If you expand a database, the whole version principle is turned off. So if the power fails during an expand, your database is dead.
    David Singleton
  • vikram7_dabas Member Posts: 611
    Thanks
    :D
    Vikram Dabas
    Navision Technical Consultant
  • rdebath Member Posts: 383
    David_Singleton wrote:
    On the posts above, I do have one comment. As mentioned, I have fixed a lot of these corruptions, and NEVER ONCE did I ever see a database corrupted due to a power failure. I have seen a case where the client was using a badly configured RAID5, and when it failed it wrote a corrupt packet to disk, but this was a failure of the RAID setup, NOT of the power. I have had people tell me it was due to a power failure, and every time we found it to be something else.
    I mentioned the power failure mode because it is a known method of failure and would give a fixed point in time for C/DART to roll back to. However, as you imply, I think it's likely that this mode cannot happen with Navision Native because, I think, it's a COW (copy-on-write) database where the CPU is used to initiate a write of the 'Current version' pointer once the version data has been physically written to the DB. It's very likely that the CPU is more voltage-sensitive than the RAM or the DMA controllers, so the CPU dies first. This failure mode would be more likely on logfile-style databases where the writes are pushed to the disk as an ordered stream without intervention from the CPU, e.g. using the SCSI scatter/gather commands at the HW level.
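    The copy-on-write commit described here can be made concrete with a toy model. This is a minimal Python sketch under the stated guess that the native database is COW; it is not Navision's on-disk format, and the class names are made up. The new version is written alongside the old one and only then is the "current version" pointer switched, so losing power before the switch leaves the old version intact. The in-place store shows the opposite case, closer to what David describes for a database expand with versioning turned off: a write cut short leaves a half-applied change.

```python
from typing import Dict, Optional


class CowStore:
    """Toy copy-on-write store: versions are never overwritten in place."""

    def __init__(self, data: Dict[str, int]) -> None:
        self.versions: Dict[int, Dict[str, int]] = {0: dict(data)}
        self.current = 0  # the 'current version' pointer

    def commit(self, changes: Dict[str, int], crash_before_pointer: bool = False) -> None:
        new_id = max(self.versions) + 1
        snapshot = dict(self.versions[self.current])
        snapshot.update(changes)
        self.versions[new_id] = snapshot  # 1. write the new version data
        if crash_before_pointer:
            return  # power lost: pointer still names the old version
        self.current = new_id  # 2. only now switch the pointer

    def read(self) -> Dict[str, int]:
        return self.versions[self.current]


class InPlaceStore:
    """Toy store that overwrites data directly, with no versioning."""

    def __init__(self, data: Dict[str, int]) -> None:
        self.data = dict(data)

    def write(self, changes: Dict[str, int], crash_after: Optional[int] = None) -> None:
        for i, (key, value) in enumerate(changes.items()):
            if crash_after is not None and i == crash_after:
                return  # power lost part-way through
            self.data[key] = value


cow = CowStore({"a": 1, "b": 2})
cow.commit({"a": 10, "b": 20}, crash_before_pointer=True)
print(cow.read())  # {'a': 1, 'b': 2}

raw = InPlaceStore({"a": 1, "b": 2})
raw.write({"a": 10, "b": 20}, crash_after=1)
print(raw.data)  # {'a': 10, 'b': 2}
```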

    There is one thing though, David: I would expect that you've seen more of the high-end hardware, where stupid decisions to save a few pennies are rarer. One such decision would be omitting a tiny UPS ...

    Ahhh, RAID5, There's absolutely NO :-#
    David_Singleton wrote:
    It's nearly always a network error.
    Now this I don't get.

    As the link to the DB isn't encrypted I can see that a network error may introduce corruption into the user data written to the DB, but I really can't see how this will cause the kind of errors in the structure of the database that we're talking about because you'd get connection or command errors first. Unless, you're seeing people who access the DB as a local database on a mapped drive... eeeeeuw :shock: , Nope I really would NOT trust MS-networking to be truthful about successful physical writes to that level. But ...
  • David_Singleton Member Posts: 5,479
    rdebath wrote:
    It's nearly always a network error.
    Now this I don't get.

    As the link to the DB isn't encrypted I can see that a network error may introduce corruption into the user data written to the DB, but I really can't see how this will cause the kind of errors in the structure of the database that we're talking about because you'd get connection or command errors first.

    This took me a long time to work out (years, in fact). For a long time I thought bad sectors were the issue, but sitting with some guys at PC&C and reviewing some specific cases, it all became clear. Basically, if a sector is bad then you won't be able to write to it. All the packets you send from fin.exe to server.exe have an additional CRC overlaid on them by Navision. This is why you can often run extensive network tests that say the network is fine, while Navision gives network errors.
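    Navision's actual packet layout isn't described in this thread, so the following is only a generic illustration of an application-level CRC layered over whatever the network stack already checks. It assumes a CRC-32 (from Python's zlib) appended as a 4-byte trailer; that framing is hypothetical. A packet damaged in transit is rejected even if it got past the lower layers.

```python
import struct
import zlib


def frame(payload: bytes) -> bytes:
    """Append a big-endian CRC-32 of the payload as a 4-byte trailer."""
    return payload + struct.pack(">I", zlib.crc32(payload) & 0xFFFFFFFF)


def unframe(packet: bytes) -> bytes:
    """Return the payload if the trailer matches; raise ValueError otherwise."""
    if len(packet) < 4:
        raise ValueError("packet too short to carry a CRC")
    payload, trailer = packet[:-4], packet[-4:]
    (expected,) = struct.unpack(">I", trailer)
    if zlib.crc32(payload) & 0xFFFFFFFF != expected:
        raise ValueError("CRC mismatch: packet damaged in transit")
    return payload


packet = frame(b"INSERT record 42")
damaged = bytearray(packet)
damaged[3] ^= 0x01  # flip a single bit in the payload

print(unframe(packet))  # b'INSERT record 42'
try:
    unframe(bytes(damaged))
except ValueError as err:
    print(err)  # CRC mismatch: packet damaged in transit
```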

    Now if you send 100 packets with errors, there is virtually no chance of them passing the CRC. But when there is a bad device on the network, like a faulty hub or a network card with bad contacts, the network may resend a packet many thousands of times before it gets it right. As the number of packets increases, the odds of one passing the CRC start to come into the real world (you can do the math). And it just needs one bad packet to get through and write one bad byte to a disk, and the disk then looks corrupted, since the odds of it surviving the same CRC error the other way are virtually nil.
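    The "you can do the math" step can be written down. Assume, as a simplification, that a corrupted packet looks like random garbage, so a k-bit check lets it through with probability about 2^-k. Then the chance that at least one of N corrupted packets is accepted is 1 - (1 - 2^-k)^N, which climbs towards certainty as N grows. A short Python sketch (the function name is made up for illustration):

```python
import math


def prob_any_undetected(n_bad_packets: int, check_bits: int) -> float:
    """P(at least one of n corrupted packets passes a check_bits-bit check),
    assuming each one passes independently with p = 2**-check_bits."""
    p = 2.0 ** -check_bits
    # 1 - (1 - p)**n, computed with log1p/expm1 so a tiny p doesn't round to zero
    return -math.expm1(n_bad_packets * math.log1p(-p))


for n in (100, 10_000, 1_000_000):
    print(n, prob_any_undetected(n, 16), prob_any_undetected(n, 32))
```

    Under this assumption, with a 16-bit check 100 bad packets give roughly a 0.15% chance of one slipping through, while a million make it a near certainty; a 32-bit check keeps a million bad packets at roughly 0.02%.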

    So for me a bad network is the first thing I look for. What I do is set up a copy of the Navision database and run a backup, a restore and Adjust Cost at the same time. Then I monitor the network and look for bad packets. While this is running, I have the client go to every printer and whatever other devices they have and test them; often, under load, the bad device shows up.

    Once, of all things, it was a guy in a corner office who never used Navision at all. He printed huge AutoCAD drawings to a colour laser printer (this was when colour lasers had just come out), and he did this using a no-name Ethernet-to-parallel-port connector. He was connected to the network by a hub, not a switch. We simply plugged the printer into the back of his computer and never had another corruption.

    I had another client that replaced every drive and controller in their server TWICE, as well as swapping to another server and swapping all the cards from one to another. They were adamant that Navision was the problem. And the db corrupted a few times. Eventually they got a network analyzer on the net and found a net card in a client PC that was not pushed all the way in. Pushed the card in and two years later not a single issue.

    I have many stories like that, though in the DOS version most of the errors I found were from SCSI terminators. Companies would expand their DB and add more drives, so they plugged more SCSI drives in. The correct procedure is to then move the terminators to the end of the cable, but often they simply added the drive to the end of the existing cable, and that would kill it.
    rdebath wrote:
    ... the CPU is used to initiate a write of the 'Current version' pointer once the version data has been physically written to the DB. It's very likely that the CPU is more voltage sensitive than the RAM or the DMA controllers so the CPU dies first. This failure mode would be more likely on logfile style databases where the writes are pushed to the disk as a ordered stream without intervention from the CPU, eg using the SCSI scatter/gather commands at the HW level.

    There is one thing though, David: I would expect that you've seen more of the high-end hardware, where stupid decisions to save a few pennies are rarer. One such decision would be omitting a tiny UPS ...

    Ahhh, RAID5, There's absolutely NO :-#
    Unless, you're seeing people who access the DB as a local database on a mapped drive... eeeeeuw :shock: , Nope I really would NOT trust MS-networking to be truthful about successful physical writes to that level. But ...

    Yes, seen that as well, but it's normally a catastrophic and instant db failure. It's a sign of true insanity when someone does that.
    David Singleton
  • rdebath Member Posts: 383
    David_Singleton wrote:
    Basically, if a sector is bad then you won't be able to write to it.
    Wrong, very wrong. For all drives nowadays the S.M.A.R.T. system (the idea copied from older high spec SCSI drives) means that if you write to a sector that the drive knows is bad it will reallocate that sector to a different location on the disk so it will work fine from then on. In addition, as I understand it, when a modern drive overwrites sectors it overwrites everything including headers and synch markers. This means that if you write every single sector on the disk it works exactly like the old style low level format. You need to check the "reallocated sector count" in the S.M.A.R.T. stats for the drive to see how many bad sectors the drive has had.
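    The reallocation described here, which S.M.A.R.T. reports as the "reallocated sector count", can be pictured with a toy model. This Python sketch is a simplification and not how any real drive firmware works: a write aimed at a sector the drive already knows is bad is silently redirected to a spare sector, and the counter goes up. It also shows why both sides of the later hair-splitting have a point: the logical write succeeds, but the physical bad sector is never written.

```python
from typing import Dict, Iterable


class ToyDrive:
    """Toy model of bad-sector remapping; real firmware is far more involved."""

    def __init__(self, n_sectors: int, n_spares: int, known_bad: Iterable[int]) -> None:
        self.physical: Dict[int, bytes] = {}  # physical sector -> contents
        self.known_bad = set(known_bad)  # sectors the drive has flagged as bad
        self.remap: Dict[int, int] = {}  # logical sector -> spare physical sector
        self.free_spares = list(range(n_sectors, n_sectors + n_spares))
        self.reallocated_sector_count = 0

    def write(self, sector: int, data: bytes) -> None:
        target = self.remap.get(sector, sector)
        if target in self.known_bad:
            if not self.free_spares:
                raise OSError("no spare sectors left")
            target = self.free_spares.pop(0)
            self.remap[sector] = target
            self.reallocated_sector_count += 1
        self.physical[target] = data

    def read(self, sector: int) -> bytes:
        return self.physical[self.remap.get(sector, sector)]


drive = ToyDrive(n_sectors=100, n_spares=4, known_bad={7})
drive.write(7, b"first")
drive.write(7, b"second")  # already remapped, so no new reallocation
print(drive.read(7), drive.reallocated_sector_count)  # b'second' 1
```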

    There's an old specification joke ...
    All SCSI drives must send an email 7 days before failure
    S.M.A.R.T. tries to make that a reality.
    David_Singleton wrote:
    an additional CRC overlaid on them by Navision
    This is a good thing. But are you sure it's not just actively complaining about errors detected by the IP checksums? For modern network speeds these are a bad joke: one in maybe 10MB of errored packets will get through. But if Navision adds another checksum it'll be at least 32 bits, which will knock 5 to 9 orders of magnitude off the undetected error rate. More than good enough to stem this source of corruption.
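    One concrete weakness of the IP-style checksum is easy to demonstrate. The 16-bit ones' complement sum used by IP, TCP and UDP (RFC 1071) doesn't depend on the order of the 16-bit words, so a packet with two words swapped gets the same checksum, while CRC-32 catches it. This Python sketch only illustrates that difference; it says nothing about which check Navision itself uses.

```python
import zlib


def internet_checksum(data: bytes) -> int:
    """16-bit ones' complement checksum as described in RFC 1071."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length data with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)  # fold the carry back in
    return ~total & 0xFFFF


original = b"\x12\x34\x56\x78"
swapped = b"\x56\x78\x12\x34"  # same two 16-bit words, opposite order

print(hex(internet_checksum(original)), hex(internet_checksum(swapped)))  # identical
print(hex(zlib.crc32(original)), hex(zlib.crc32(swapped)))  # different
```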
    David_Singleton wrote:
    I had another client that replaced every drive and controller in their server TWICE, as well as swapping to another server and swapping all the cards from one to another. They were adamant that Navision was the problem. And the db corrupted a few times. Eventually they got a network analyzer on the net and found a net card in a client PC that was not pushed all the way in. Pushed the card in and two years later not a single issue.
    Okay, this sure looks like network errors causing database errors.
    Error 18 in module 244 is a CRC error passed on to Navision Server by the network driver.
    Okay, fine. I must be wrong, in some (almost impossible I thought) way a corruption of the network command stream can lead to a structural corruption of the Navision database. The only way I can see this happening is if the service blindly trusts that the data the client gives it is correct and error free and some important part of the B-Tree manipulations is done by the client. This looks very weird to me, though it would explain why record level locking isn't there. Also I can only believe that Navision does NOT add any other CRCs to the channel, it just reports errored packets whereas other applications always ignore them.
    David_Singleton wrote:
    Yes, seen that as well, but it's normally a catastrophic and instant db failure. It's a sign of true insanity when someone does that.
    Ahhh, users, don't cha just lov 'em. :mrgreen:

    Though, don't tell anyone, but it seems to work fine on our network. Not too slow either, must be because of all that write caching. :wink:
  • David_Singleton Member Posts: 5,479
    rdebath wrote:
    Basically, if a sector is bad then you won't be able to write to it.
    Wrong, very wrong. For all drives nowadays the S.M.A.R.T. system (the idea copied from older high spec SCSI drives) means that if you write to a sector that the drive knows is bad it will reallocate that sector to a different location on the disk so it will work fine from then on.

    Correct, but basically it is not then writing to that physical sector; it is using a new location on the disk. It won't let you write to a bad sector. Either way you look at it I am right, very right: basically the sector is being (virtually) repaired BEFORE it is written to, so my statement stands that you can't write to a BAD sector. :mrgreen: But hey, who wants to split hairs. :wink:
    David Singleton
  • rdebath Member Posts: 383
    David_Singleton wrote:
    my statement stands that you can't write to a BAD sector. :mrgreen: But hey who wants to split hairs. :wink:

    :lol: , of course, there's another hair to split; without SMART in the way you can actually ONLY write to bad sectors, the write will succeed. It's just that you can never read what you wrote.

    Next ... :mrgreen:
  • rdebath Member Posts: 383
    David, something I noticed in the Security Hardening Guide:

    viewtopic.php?f=23&t=41573&p=204013#p204013