Dealing with SQL Server Space - Part 1

Recently, I have been building a large database for the new House-Buddy project. The site uses a huge collection of data to identify the sweet-spots around a particular location. The site is hosted on a shared hosting which in theory had unlimited DB storage, but after they noticed the size of the database has grown to 47 GB, they put their foot down and ask me to make it smaller. Granted, I was a bit careless when it came to storing data, but I was still determined to build and host this site without changing my hosting provider.

I was blissfully oblivion of the internal workings of MSSQL Server when I started this project. For example, I didn't know that MSSQL doesn't shrink the file automatically and if you are in a shared environment you may lack privileges to do it manually using DBCC SHRINKFILE. I had to build the database that was 'small' from the very beginning. This article lists the key steps I made to reduce the size of the database.

Watch Space

There are really two main scripts that are needed to watch your space: one for the database size and one for the tables. 'Database size' is technically a sum of the database and its logs.

You can use the following script to assess which tables are causing the biggest issues:

The total space taken by tables should (roughly) equal the Used space return by the first query. However, as mentioned before, SQL Server counts logs and doesn't automatically shrink the files so there may be some drift.

Prevent the log growing

Outcome: 7 GB down to 300 MB

MSSQL databases are really made of two files: data and its log. The size of the log has got a considerable impact on the 'total' size of the database if it is not being watched. How the log behaves is set by 'recovery model'. There are three main types:

  1. Full - stores every transaction until either a transaction log backup occurs or the transaction log is truncated.
  2. Simple - stores transaction when it is being executed. Once the transaction is commited, the log is cleared.
  3. Bulk-logged - a mix between the two. Broadly behaves as Full except for certain operations such as Bulk insert where it behaves like simple.

Therefore, to reduce the log, one needs to either use SIMPLE or BULK-LOGGED. I decided for SIMPLE since seeding data wasn't a critical operation and I could always go back and reseed if necessary. Do read Microsoft document before changing your recovery mode, it may not be the right option for you.

The trick here is: switch between SIMPLE and FULL causes the log to be trunkated.