Database suddenly at 100%

bbrown Member Posts: 3,268
edited 2012-01-12 in Navision Attain
This is for one of our clients that is approaching the 256 GB limit of the C/SIDE database. They are currently involved with us in an upgrade to NAV 2009 with SQL.

Over this past weekend we worked with them to delete some historical data. That brought the free space to around 25% as of Tuesday.

This morning they reported the DB was showing 100% full and not allowing any activity. I have done the following:

First I got all users out and stopped the database service.

Then I made copies of the database files to another server as a safety net.

Next I logged in as single user. The database was reporting 100% full. I tried to optimize a table but it would not let me.

Next I slightly expanded the database files. They were at a total of 259,000,000 and I expanded them to 262,143,952.
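
For scale, a rough check of how much room that expansion adds. The post does not state the unit of these figures; the sketch below assumes KB, and the variable names are made up for illustration.

```python
# Sizes as quoted in the post. The unit is an assumption (KB); it is not stated.
size_before = 259_000_000
size_after = 262_143_952

added = size_after - size_before
growth_pct = 100 * added / size_before

# Convert to GiB under the KB assumption (1 GiB = 1024 * 1024 KB).
KB_PER_GIB = 1024 * 1024
print(f"added: {added:,} ({growth_pct:.2f}% growth)")
print(f"before: {size_before / KB_PER_GIB:.1f} GiB, after: {size_after / KB_PER_GIB:.1f} GiB")
```

Whatever the unit, the expansion is only about 1.2% of the existing size, so it leaves very little headroom for a free-space recovery pass.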

I then logged out and back in. The system briefly showed a "recovering free space" message but then failed with a no space message.

Now when I log in it's back to 100%.

Any thoughts on a solution?
There are no bugs - only undocumented features.

Answers

  • bbrown Member Posts: 3,268
    BTW - I have opened a support call with MS. Just seeing if anyone had ideas while waiting on them. :D
    There are no bugs - only undocumented features.
  • David_Singleton Member Posts: 5,479
    This is a common and known bug in a few versions of Navision. I have posted about it quite a few times.

    Here is one thread that you even replied to:

    viewtopic.php?t=23953&view=previous

    And a Google search:

    https://www.google.com/search?source=ig ... 0l0l0ll0l0
    David Singleton
  • David_Singleton Member Posts: 5,479
    By the way, every time I had a client with this error, in the end it turned out to be a network error. I always found the single hardest part of fixing the issue was convincing the customer that it was a network error.
    David Singleton
  • bbrown Member Posts: 3,268
    Thanks.

    I am currently logged in as single-user (service is off) and running a backup. If successful, I'll restore to a new database.

    BTW - this is a 3.60 DB with 5.0 executables. We did the executable update a few years back because we wanted to use the GETLASTERRORTEXT feature.
    There are no bugs - only undocumented features.
  • bbrown Member Posts: 3,268
    By the way, every time I had a client with this error, in the end it turned out to be a network error. I always found the single hardest part of fixing the issue was convincing the customer that it was a network error.

    I can't confirm or deny that, but I won't rule it out. This system has been hanging on for some time, and the client has repeatedly pushed off upgrading. We're finally upgrading them to NAV 2009, but it will be a couple more months before go-live. Actually, it's a complete re-implementation, with very few modifications and only master data coming forward.

    This is a client that wrote 50 invoices the first month we implemented NAV. Today that's 1500+ per day. Management thinks nothing of throwing millions at the plant floor, but takes 3 years to approve a NAV upgrade project for 10% of that. Because, in their words, "NAV doesn't make them any money".
    There are no bugs - only undocumented features.
  • David_Singleton Member Posts: 5,479
    bbrown wrote:
    Because, in their words, "NAV doesn't make them any money".

    ](*,) ](*,) ](*,) ](*,)

    Good luck... you will need it.
    David Singleton
  • bbrown Member Posts: 3,268
    The restore has succeeded.
    There are no bugs - only undocumented features.
  • David_Singleton Member Posts: 5,479
    bbrown wrote:
    The restore has succeeded.

    But have you fixed the issue? Sometimes it just comes back again. Monitor it closely and check out the network with some heavy-duty network testers (hardware-based, not software ones).
    David Singleton
  • bbrown Member Posts: 3,268
    Have to find it before you can fix it.

    The feedback I got from MS was that this could also be a disk issue. Sort of makes some sense. That is, a transaction gets committed which, because of the commit cache, is actually only committed in memory. Then something goes wrong during the write to the disk....
    There are no bugs - only undocumented features.
  • David_Singleton Member Posts: 5,479
    bbrown wrote:
    Have to find it before you can fix it.

    The feedback I got from MS was that this could also be a disk issue. Sort of makes some sense. That is, a transaction gets committed which, because of the commit cache, is actually only committed in memory. Then something goes wrong during the write to the disk....


    Yes, I have gotten that exact reply from PC&C/Navision/MS each time. We had one client that put all new drives in their machine, and then the error came back. This time they replaced the controllers and, again, the drives. When it happened the third time, we got them to do a thorough network analysis and found a bad network card in one of the client machines. They replaced that card and all was well. The thing is that the network error allows the server to pass bad data on to the drives, so it looks like a drive error.

    Of course it could be a drive error, but I have had this error a few times, and in the end it was always a network issue.

    Another interesting thing was that the second time I had this issue with a client and logged it at Navision, they said that this was an unknown issue and that no one had ever reported it before, which was odd since I knew that I had reported it before, though through a different partner.

    And one last thing I remembered. Although we could never reproduce the error, it's clear that it happens at times of extremely high network activity, which traditionally is also when the disks are most active. The client that had it three times found that it was happening when they were doing a backup and some batch tasks at night, such as inventory cost adjustment. So it might be a good idea to have your client make sure to log everyone else off when they do a backup. Though with a DB that big I guess they use HotCopy.
    David Singleton
  • David_Singleton Member Posts: 5,479
    Also, since they probably already have a SQL Server, is there any way you can move the current DB to SQL until the new implementation is ready?
    David Singleton
  • bbrown Member Posts: 3,268
    bbrown wrote:
    Have to find it before you can fix it.

    The feedback I got from MS was that this could also be a disk issue. Sort of makes some sense. That is, a transaction gets committed which, because of the commit cache, is actually only committed in memory. Then something goes wrong during the write to the disk....


    Yes, I have gotten that exact reply from PC&C/Navision/MS each time. We had one client that put all new drives in their machine, and then the error came back. This time they replaced the controllers and, again, the drives. When it happened the third time, we got them to do a thorough network analysis and found a bad network card in one of the client machines. They replaced that card and all was well. The thing is that the network error allows the server to pass bad data on to the drives, so it looks like a drive error.

    Of course it could be a drive error, but I have had this error a few times, and in the end it was always a network issue.

    Another interesting thing was that the second time I had this issue with a client and logged it at Navision, they said that this was an unknown issue and that no one had ever reported it before, which was odd since I knew that I had reported it before, though through a different partner.

    And one last thing I remembered. Although we could never reproduce the error, it's clear that it happens at times of extremely high network activity, which traditionally is also when the disks are most active. The client that had it three times found that it was happening when they were doing a backup and some batch tasks at night, such as inventory cost adjustment. So it might be a good idea to have your client make sure to log everyone else off when they do a backup. Though with a DB that big I guess they use HotCopy.


    I'm not ready to rule out a network issue. That was actually the first cause I mentioned to their network integrator. I've also seen it before, and with databases other than NAV. But when it comes down to it, I can only recommend they look into it. I'll be discussing it with them again tomorrow.
    There are no bugs - only undocumented features.
  • bbrown Member Posts: 3,268
    Also, since they probably already have a SQL Server, is there any way you can move the current DB to SQL until the new implementation is ready?

    The only SQL Server currently in place is the one being used for the "conference room pilot". The best way to describe it would be "glorified workstation". Its purpose was to support a small group (2-4) of users reviewing and redesigning business processes in the new system.

    Moving to SQL would also mean a need to migrate the existing database. That equals time, which equals downtime.

    There is also no guarantee that the existing code runs fine under SQL. We could go through the conversion effort and end up with critical performance issues. Remember this is still a 3.60 database.
    There are no bugs - only undocumented features.
  • David_Singleton Member Posts: 5,479
    Yes, it looks like you have done your homework and are doing the best possible under the circumstances. In that case, moving to SQL right now would probably be riskier than the chance of the error happening again.

    I guess the best thing you can do is just make sure they have good, solid, regular backups, and a good strategy to rebuild and recover if it happens again.

    On the bright side, at least this gives more motivation to make the upgrade happen.
    David Singleton
  • bbrown Member Posts: 3,268
    Well, as predicted above, this has occurred again. But this time we know the cause. It was a "disk issue" brought on by the "perfect storm" of "user initiated" events. Or, as one client says, "the loose nut behind the keyboard".

    Here's what happened:

    A user went to an electrical panel to shut off power to some machinery on the plant floor. In doing so, they accidentally turned off the circuit that supplies power to a UPS in the computer room. One of the devices connected to that UPS was the disk array holding the NAV database.

    Then for several hours nobody noticed this (or the UPS beeping, etc.) and eventually it shut off, bringing NAV to a crashing halt.

    When we got it back up, the DB was at 100%. But this time we were unable to do a backup and are having to revert to the last HotCopy.

    ](*,)
    There are no bugs - only undocumented features.
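
The commit-cache explanation relayed from MS earlier in the thread would fit a power loss like this one. As a rough illustration only, here is a toy write-back cache in Python. The class and method names are hypothetical, and this is not a description of how the C/SIDE server actually implements its commit cache; it only shows why a commit that has been acknowledged is not necessarily on disk yet.

```python
class ToyCommitCache:
    """Toy write-back cache: commits are acknowledged before reaching 'disk'."""

    def __init__(self):
        self.disk = {}      # stands in for durable storage
        self.pending = {}   # committed in memory, not yet written out

    def commit(self, key, value):
        # Acknowledged immediately; the write to disk is deferred.
        self.pending[key] = value
        return True

    def flush(self):
        # Deferred write of everything pending to durable storage.
        self.disk.update(self.pending)
        self.pending.clear()

    def power_loss(self):
        # Memory contents vanish; only what was already flushed survives.
        lost = list(self.pending)
        self.pending.clear()
        return lost


cache = ToyCommitCache()
cache.commit("invoice-1", "posted")
cache.flush()
cache.commit("invoice-2", "posted")   # acknowledged, but still only in memory
lost = cache.power_loss()

print("on disk:", sorted(cache.disk))
print("lost:", lost)
```

A real failure can be worse than this model suggests, since a write interrupted partway through can leave on-disk structures inconsistent rather than simply missing the latest transactions.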