Database Corruption In Sql Server

Ultimately, there are only two main things DBAs need to worry about: making data available to the right people and making it inaccessible to the wrong people. All other considerations are really just extensions of these two concerns. For example, performance is just an extension of making data available to the right users: if they can’t get to it in a timely manner, it’s not really as “available” as it should be.

Therefore, disaster prevention and recovery should be high on every DBA’s list. Still, I’m surprised to find that most database administrators (especially the “reluctant” ones) don’t know enough about how to adequately protect themselves from database corruption.

So I thought it would be fun to jump into a multi-part series of posts that provides a practical overview of the basics of SQL Server database corruption, including what corruption is, why you can’t really prevent it, and what you can do to deal with that reality in a way that ensures adequate availability of your data, even in the case of corruption.

For the purposes of this series of articles, database corruption is defined as a problem associated with the incorrect storage of zeros and ones required to store database data at the disk or IO subsystem level.

In this sense, the corruption discussed in this series of posts is VERY different from other types of “disasters” that could render corporate data useless (for example, when a developer or administrator executes an UPDATE statement without a WHERE clause, or a software bug “mangles” unit prices). While user and application errors are technically very different from corruption, it’s important to note that many of the mitigation strategies described in this series for combating SQL Server database corruption are also best practices that can easily be reused to deal with other types of database disasters.

In virtually all cases of SQL Server database corruption (more than 99.99% by most accounts), the root cause is a problem with the I/O subsystem: drives, controllers, and possibly drivers. And while the specific root causes can vary widely (simply due to the sheer complexity of managing magnetic storage), one key thing to remember about disk systems is that every major operating system ships with the equivalent of a Check Disk utility (CHKDSK) that can scan for bad sectors, corrupted entries, and other storage problems. Those tools exist because such problems can and will occur in any storage environment.

In other words, when you think about all the complex operations that go on (at breakneck speed) when it comes to writing and reading data, it shouldn’t surprise you that things can go wrong from time to time. In fact, in a fantastic excerpt from one of the best “hard” sci-fi books I’ve read in a while, author Christian Cantrell explores the dangers of magnetic storage through the eyes of his protagonist:

Removing moving parts from machines was the best way to increase reliability. Since the subatomic laws of the universe dictate that it is physically impossible for a moving part to move in exactly the same way every time it moves, every moving part in the system introduces some degree of unpredictability and unexplained variability. One could even say vulnerability. What if the critical part doesn’t move enough? What if it moves too much, too fast, or at a slightly slower speed than last time? What if it doesn’t move at all? How many times can it move before it wears out? How long will it take for the anti-friction lubricant to wear out? How does temperature affect the properties of this lubricant?
Christian Cantrell, Containment (Chapter 30)

In other words, corruption is not really a question of if. It’s a question of when. And when SQL Server databases do become corrupted, the cause is almost always some failure at the disk subsystem level: the highly structured data “stored” on disk was simply written badly. As a result, when SQL Server reads this data back from disk, it encounters “zeros and ones” that are scrambled or “broken”, which in turn means that databases can and DO lose data when corruption occurs at the IO subsystem level.
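
To make “broken zeros and ones” concrete, here is a minimal Python sketch of how a single flipped bit in an 8 KB page can be caught by comparing a checksum recorded at write time with one recomputed at read time. This is an illustration only: `zlib.crc32` is a stand-in and not the algorithm SQL Server uses, and the function names and sample payload are made up for the example.

```python
import zlib

PAGE_SIZE = 8192  # SQL Server stores data in 8 KB pages


def write_page(payload: bytes) -> tuple[bytes, int]:
    """Pad the payload to one page and return (page, checksum at write time)."""
    if len(payload) > PAGE_SIZE:
        raise ValueError("payload does not fit in one page")
    page = payload.ljust(PAGE_SIZE, b"\x00")
    return page, zlib.crc32(page)


def flip_bit(page: bytes, byte_offset: int, bit: int) -> bytes:
    """Simulate a storage fault by flipping a single bit in the page."""
    damaged = bytearray(page)
    damaged[byte_offset] ^= 1 << bit
    return bytes(damaged)


def is_intact(page: bytes, stored_checksum: int) -> bool:
    """Recompute the checksum on read and compare it with the stored value."""
    return zlib.crc32(page) == stored_checksum


# Hypothetical page content, used only for the demonstration.
page, checksum = write_page(b"unit_price=19.99")
damaged = flip_bit(page, byte_offset=11, bit=0)

print(is_intact(page, checksum))     # True
print(is_intact(damaged, checksum))  # False
```

The point of the sketch is that the damaged page is still a perfectly readable run of bytes. Only the comparison against something recorded earlier reveals that it is no longer the data that was written.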

So now that we’ve covered what corruption is and how it’s caused, I thought it would be fun to look at a practical example of creating or simulating corruption, since most people tend to learn best by example. So in my next blog post, I’ll provide a “hands-on” overview of ways you can deliberately go in and “cripple” or “corrupt” your own non-production databases to see exactly how corruption works. That will also serve as the basis for more examples and “demonstrations” in which we look at ways to repair the damage.

First of all, if you’re reading this article and you’re not regularly performing consistency checks on your SQL Server databases, you should drop whatever you’re doing and go do that. What do I mean by regularly? In my opinion, based on years of experience, you should run DBCC CHECKDB at least as often as you take a full backup. If you don’t have a clean consistency check, you don’t know whether the last backup you took is valid. SQL Server will readily back up a damaged database.
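
As a rough sketch of the “check at least as often as you back up” advice, the Python helper below builds the two T-SQL statements for a check-then-backup job. The helper names, the database name `SalesDemo`, and the backup path are hypothetical. The statements are returned separately on purpose: the assumption is that whatever runs them (a SQL Agent job, a script) executes them in order and stops at the first error, so the backup is only taken after a clean check.

```python
def quote_ident(name: str) -> str:
    """Bracket-quote a T-SQL identifier, doubling any closing bracket."""
    return "[" + name.replace("]", "]]") + "]"


def quote_literal(value: str) -> str:
    """Build a T-SQL Unicode string literal, doubling single quotes."""
    return "N'" + value.replace("'", "''") + "'"


def check_then_backup_statements(database: str, backup_path: str) -> list[str]:
    """Return the statements to run in order; stop at the first one that fails."""
    db = quote_ident(database)
    return [
        # Consistency check: suppress informational output, report all errors.
        f"DBCC CHECKDB ({db}) WITH NO_INFOMSGS, ALL_ERRORMSGS;",
        # Full backup; CHECKSUM asks the backup to verify page checksums as it reads.
        f"BACKUP DATABASE {db} TO DISK = {quote_literal(backup_path)} WITH CHECKSUM;",
    ]


# Hypothetical database name and path, for illustration only.
for statement in check_then_backup_statements("SalesDemo", r"D:\Backups\SalesDemo.bak"):
    print(statement)
```

Note that `WITH CHECKSUM` on the backup is not a substitute for CHECKDB. It is included here only as an extra, cheaper safeguard.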

I cheated a bit here and used an undocumented command called DBCC WRITEPAGE to corrupt one 8KB page in a non-clustered index on a table I created. You should never use this command unless you’re trying to corrupt something for a demo like this, but as you can see after the page corruption, SQL Server fails CHECKDB, then happily creates a backup of our already corrupted database.

Aside from doing something horrible like editing a page, database corruption is mainly caused by storage failures. An example would be your local SAN, where the SAN acknowledges to the host operating system that a write has completed, but the write never actually makes it to disk. SQL Server receives the write confirmation and thinks the data was successfully written to the page, but for some reason it wasn’t. This happened several times at a previous job, when our “data center” SAN crashed hard after the building lost power. What actually happened was that the SAN was acknowledging writes as soon as the data hit memory on the SAN, which is a performance optimization that assumes you have the right power infrastructure to prevent a hard SAN crash. You know what happens when you assume, right?
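
The failure mode described above can be sketched with a toy model in Python. This is not how any particular SAN is implemented. The class and method names are invented for the illustration, and the model simply assumes that writes are acknowledged from volatile memory and destaged to disk later.

```python
from typing import Optional


class WriteBackCache:
    """Toy model of a storage array that acknowledges writes from volatile memory."""

    def __init__(self) -> None:
        self.cache: dict[int, bytes] = {}  # volatile: lost on power failure
        self.disk: dict[int, bytes] = {}   # durable: survives power failure

    def write(self, page_id: int, data: bytes) -> bool:
        """Hold the page in cache and acknowledge immediately."""
        self.cache[page_id] = data
        return True  # the host is told the write succeeded

    def flush(self) -> None:
        """Destage cached pages to durable storage."""
        self.disk.update(self.cache)
        self.cache.clear()

    def power_loss(self) -> None:
        """With no power protection, writes still in cache simply vanish."""
        self.cache.clear()

    def read(self, page_id: int) -> Optional[bytes]:
        """Serve from cache if present, otherwise from disk."""
        return self.cache.get(page_id, self.disk.get(page_id))


san = WriteBackCache()
san.write(1, b"v1")
san.flush()                    # v1 is safely on disk
acked = san.write(1, b"v2")    # acknowledged, but only in cache
san.power_loss()               # building loses power before the flush

print(acked, san.read(1))      # True b'v1'
```

The host was told that `v2` had been written, yet after the power loss the array returns `v1`. From SQL Server’s point of view the page on disk no longer matches what it believes it wrote.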

In any case, this is much less common than it used to be, for several reasons. One is the use of cloud storage, which is very robust in terms of data protection. In addition, modern enterprise-class SANs are more robust and less likely to experience such failures. It’s still very possible though: I had a minor corruption event on an Azure VM a few years ago, and we had a customer fill up their very non-enterprise-class SAN with terrible results (corruption everywhere). So the moral of the story is that wherever you run SQL Server, you should run CHECKDB (except for Azure SQL DB and possibly Managed Instance).

There are many tools that people on the internet will try to sell you to fix your database corruption. Almost all of them are crap. If you have corruption in a heap or clustered index, or worse, in one of the system pages that track allocation, you’re screwed and need to restore your last good backup. (See why it’s important to have good backups?)

However, in some cases you may get lucky. If your damage is limited to a non-clustered index, you don’t need to restore the database and can just rebuild your index.
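
As a hedged sketch, the Python helper below builds the T-SQL for rebuilding a single non-clustered index. The helper names and the schema, table and index names are hypothetical. The optional `disable_first` flag covers the disable-then-rebuild sequence, in case a plain rebuild fails.

```python
def quote_ident(name: str) -> str:
    """Bracket-quote a T-SQL identifier, doubling any closing bracket."""
    return "[" + name.replace("]", "]]") + "]"


def rebuild_index_statements(
    schema: str, table: str, index: str, disable_first: bool = False
) -> list[str]:
    """Return the statements to run, in order, to rebuild one non-clustered index."""
    target = f"{quote_ident(index)} ON {quote_ident(schema)}.{quote_ident(table)}"
    statements = []
    if disable_first:
        # Disabling drops the index data but keeps its definition.
        statements.append(f"ALTER INDEX {target} DISABLE;")
    # Rebuilding recreates the index structure.
    statements.append(f"ALTER INDEX {target} REBUILD;")
    return statements


# Hypothetical object names, for illustration only.
for statement in rebuild_index_statements(
    "dbo", "Orders", "IX_Orders_CustomerId", disable_first=True
):
    print(statement)
```

Whichever variant is used, a follow-up DBCC CHECKDB is what confirms that the corruption is actually gone.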

However, in my case the rebuild just generated the dreaded SQL Server error 824. I suspect it has something to do with how I corrupted the page, but that investigation is not complete. I was able to disable the index and then rebuild it, and after that we had a successful CHECKDB.

I was asked a very good question while researching this: how does database corruption occur in SQL Server, and how do you repair a damaged database in MS SQL Server? For example: “We plan and store our valuable data in a SQL database. Suddenly we found that the SQL Server database was corrupted
