Checks
There are 17 checks currently supported in 3 categories.
Data Types
- Time based data types
- Arbitrary VARCHAR length
- Unspecified VARCHAR length
- Mad VARCHAR Max
- NVARCHAR data type (Express only)
- FLOAT and REAL data types
- Deprecated data types
- BIGINT as IDENTITY (Express only)
- NUMERIC or DECIMAL with 0 scale
- Enum columns not implemented as foreign key
File Growth
- Database growth past 10GB (Express only)
- Database growth type
Architecture
- Default fill factor
- Number of indexes
- Inefficient indexes
- Sparse columns
- Heap tables
Time based data types
Checks that commonly named time columns are using one of the recommended date/time data types. Storing date/time data in other data types may take up more storage and cause performance issues.
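As a minimal sketch (the table and column names here are hypothetical), a commonly named time column declared with a recommended date/time data type instead of character data:

```sql
CREATE TABLE dbo.OrderAudit (
    OrderId     INT NOT NULL,
    -- Recommended: a date/time data type such as DATETIME2
    CreatedDate DATETIME2(0) NOT NULL
    -- Avoid: CreatedDate VARCHAR(30), which uses more bytes and cannot be
    -- validated, sorted, or compared as a point in time
);
```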
Arbitrary VARCHAR length
A variable length column should be based on business requirements and only be as large as the maximum amount of data that needs to be stored. Using classic numbers like 256/255 to estimate the length of a column usually indicates that the exact length required has not been properly assessed.
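For illustration (the table and the requirement are assumptions), a column sized from an actual business rule rather than a guess of 255:

```sql
CREATE TABLE dbo.Customer (
    CustomerId  INT NOT NULL,
    -- ISO 3166-1 alpha-2 country codes are always exactly 2 characters
    CountryCode CHAR(2) NOT NULL
    -- Avoid: CountryCode VARCHAR(255), a guess rather than a requirement
);
```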
Unspecified VARCHAR length
If a VARCHAR column is created without specifying a length, it defaults to one. If this is done by mistake, it may cause truncation of data as it is inserted into the table. If only one character is needed, CHAR(1) is preferable as it uses 2 fewer bytes than VARCHAR(1).
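A quick way to see the default length of one is with a variable declaration, which follows the same rule:

```sql
-- VARCHAR declared without a length holds a single character
DECLARE @v VARCHAR = 'abc';
SELECT @v;  -- returns 'a'
```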
Mad VARCHAR Max
While using VARCHAR(MAX)/NVARCHAR(MAX) can be tempting as a one-size-fits-all solution, it is generally bad design practice unless absolutely required. It limits the column's ability to be used as an index key, makes online index rebuilding impossible, can lead to storing data pages out of row, and causes row size estimates to balloon. Generally, performance will be degraded.
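As an example of the index key limitation (table and column names are hypothetical):

```sql
CREATE TABLE dbo.Notes (
    NoteId   INT NOT NULL,
    NoteText VARCHAR(MAX) NOT NULL
);

-- Fails: a MAX column is not valid as an index key column
CREATE INDEX IX_Notes_NoteText ON dbo.Notes (NoteText);
```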
NVARCHAR data type
With the database size limit of 10GB for user databases in SQL Server Express, choosing the smallest data types is integral. Avoid NVARCHAR unless the column requires Unicode data; VARCHAR stores comparable data in approximately half the space.
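The difference is easy to see with DATALENGTH:

```sql
-- NVARCHAR stores 2 bytes per character, VARCHAR stores 1 for this data
SELECT DATALENGTH(CAST('example' AS VARCHAR(50)))   AS VarcharBytes,   -- 7
       DATALENGTH(CAST(N'example' AS NVARCHAR(50))) AS NvarcharBytes;  -- 14
```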
FLOAT and REAL data types
While FLOAT and REAL can store a precision up to 53, they inherently store approximate data. If stored values are required to be exact or are queried with equality predicates (WHERE =), inexact results may be returned. DECIMAL or NUMERIC should be chosen if exact values are required and can fit in a precision of up to 38. For more information, see Microsoft's Using decimal, float, and real Data.
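A small sketch of how approximate storage can break equality comparisons:

```sql
DECLARE @f FLOAT = 0.1;
SELECT CASE WHEN @f + @f + @f = 0.3
            THEN 'equal' ELSE 'not equal' END;  -- 'not equal'

DECLARE @d DECIMAL(10, 1) = 0.1;
SELECT CASE WHEN @d + @d + @d = 0.3
            THEN 'equal' ELSE 'not equal' END;  -- 'equal'
```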
Deprecated data types
The use of NTEXT, TEXT, and IMAGE data types should be avoided. These are deprecated data types and will be removed in future versions of SQL Server. They can be replaced by the NVARCHAR(MAX), VARCHAR(MAX), and VARBINARY(MAX) data types.
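As a hedged sketch (table and column names are assumptions), a deprecated TEXT column can be converted in place:

```sql
-- Migrate a deprecated TEXT column to VARCHAR(MAX)
ALTER TABLE dbo.LegacyDocuments
    ALTER COLUMN Body VARCHAR(MAX);
```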
BIGINT as IDENTITY
Each BIGINT value requires 8 bytes of space and supports numbers up to 2^63-1, but INT only requires 4 bytes with numbers up to 2^31-1. Since each user database in SQL Server Express has a limit of 10GB, it is unlikely that enough rows can exist to necessitate using BIGINT as an IDENTITY data type. Defaulting to INT can save space and reflects a more reasonable upper limit given the restrictions.
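A minimal sketch (table and column names are hypothetical):

```sql
CREATE TABLE dbo.Orders (
    -- INT IDENTITY covers up to 2,147,483,647 values in 4 bytes each;
    -- BIGINT would double the key size for a range the 10GB limit
    -- makes practically unreachable
    OrderId INT IDENTITY(1,1) NOT NULL PRIMARY KEY
);
```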
NUMERIC or DECIMAL with 0 scale
DECIMAL or NUMERIC data types with a scale of 0 and a precision of 18 or less are more effectively stored as an INT or BIGINT due to their smaller sizes on disk and ability to hold equivalent values.
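The size difference can be confirmed with DATALENGTH:

```sql
SELECT DATALENGTH(CAST(1 AS DECIMAL(18, 0))) AS Decimal18Bytes,  -- 9
       DATALENGTH(CAST(1 AS BIGINT))         AS BigIntBytes,     -- 8
       DATALENGTH(CAST(1 AS DECIMAL(9, 0)))  AS Decimal9Bytes,   -- 5
       DATALENGTH(CAST(1 AS INT))            AS IntBytes;        -- 4
```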
Enum columns not implemented as foreign key
Enumeration based columns should be implemented in a separate table to adhere to third normal form, which brings benefits of data integrity, space savings, and performance increases. If these values are stored in character data types directly in a table rather than as a foreign key to said enumeration table, table size increases, data integrity becomes much harder to maintain, and performance can suffer.
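A minimal sketch of the recommended shape (all table and column names are hypothetical):

```sql
-- Enumeration values live in their own table
CREATE TABLE dbo.OrderStatus (
    OrderStatusId TINYINT     NOT NULL PRIMARY KEY,
    StatusName    VARCHAR(20) NOT NULL UNIQUE
);

CREATE TABLE dbo.SalesOrder (
    SalesOrderId  INT     NOT NULL PRIMARY KEY,
    OrderStatusId TINYINT NOT NULL
        REFERENCES dbo.OrderStatus (OrderStatusId)
    -- Avoid: StatusName VARCHAR(20) repeated on every row
);
```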
Database growth past 10GB
In most versions of SQL Server Express, user databases and the model database are subject to a limit of 10GB. Setting the file growth limit to stay under that amount helps prevent the database from auto growing too large and being unable to function properly.
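As a sketch (the database and logical file names are assumptions), a maximum size can be set below the limit:

```sql
ALTER DATABASE MyDatabase
    MODIFY FILE (NAME = MyDatabase_Data, MAXSIZE = 9GB);
```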
Database growth type
Setting data file growth to a fixed value, rather than a percentage, helps avoid exponential growth of the files. With percentage based growth, each auto growth event adds progressively more space as the file gets larger.
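A fixed growth increment can be set the same way (names are assumptions):

```sql
ALTER DATABASE MyDatabase
    MODIFY FILE (NAME = MyDatabase_Data, FILEGROWTH = 256MB);
```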
Default fill factor
SQL Server defaults all table fill factors to 100%, but modifying it to leave room for future data can greatly reduce the rate of fragmentation for a table in certain situations. Due to size limitations in SQL Server Express, however, changing this from the default value can artificially increase file size and cause the limit to be reached faster than desired. Fragmentation on small data sets rarely results in performance issues, so keeping the default fill factor is preferred for Express instances.
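One way to review indexes that override the default (not necessarily how the check itself is implemented):

```sql
-- fill_factor values of 0 and 100 both mean fully packed pages (the default)
SELECT OBJECT_NAME(object_id) AS TableName,
       name                   AS IndexName,
       fill_factor
FROM sys.indexes
WHERE fill_factor NOT IN (0, 100);
```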
Number of indexes
While indexes should be added to complement key queries, too many on a single table can hamper performance. There is no magic number for how many indexes are too many, so this value is provided as a parameter. The default value is 7.
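One possible query to find tables over the default threshold (the check's actual implementation may differ):

```sql
SELECT OBJECT_NAME(object_id) AS TableName,
       COUNT(*)               AS IndexCount
FROM sys.indexes
WHERE index_id > 0  -- exclude the heap entry
  AND OBJECTPROPERTY(object_id, 'IsUserTable') = 1
GROUP BY object_id
HAVING COUNT(*) > 7;
```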
Inefficient indexes
Indexes that are exact duplicates of or overlap with others may unnecessarily increase the storage footprint of a database and slow down data modifications while providing little or no benefit to query performance. Each index should be assessed based on the required workload and only removed once it is determined to be unnecessary.
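A hypothetical example of an overlapping pair, where the first index is covered by the leading key of the second:

```sql
CREATE INDEX IX_Customer_LastName           ON dbo.Customer (LastName);
CREATE INDEX IX_Customer_LastName_FirstName ON dbo.Customer (LastName, FirstName);
```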
Sparse columns
If sparse columns are an available feature and a column reaches the threshold percentage of NULL values, significant space can be saved by converting the column to a sparse column. Statistics are used to estimate the number of NULL values, so any suggestion should be validated before implementing this advice.
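As a sketch (table and column names are hypothetical), a mostly-NULL column can be converted after the estimate has been validated:

```sql
ALTER TABLE dbo.Customer
    ALTER COLUMN MiddleName ADD SPARSE;
```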
Heap Tables
Heap tables cause an excessive amount of fragmentation when rows are inserted, updated, or deleted from the table. This results in underutilized data pages, adding additional I/O and causing table size to balloon. Most tables should have a clustered index, unless they are used for staging data or are temporary in nature.
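A heap can be given a clustered index, for example through its primary key (names are hypothetical):

```sql
ALTER TABLE dbo.CustomerAddress
    ADD CONSTRAINT PK_CustomerAddress
        PRIMARY KEY CLUSTERED (CustomerAddressId);
```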
See also: