In this post, I decided to explain to you how to use SQL Server FILESTREAM to store unstructured data. Here you can also read what are the positive and negative sides of FILESTREAM. Here are some other SQL queries to help you use it.
In earlier versions of SQL Server, storage of unstructured data posed many challenges in terms of maintaining consistency between structured and unstructured data, managing backup/restore procedure, performance issues, scalability, etc.… In earlier versions of MSSQL before SQL Server 2008, there were various mechanisms for storing unstructured data. This information is usually stored in the form of files in a shared folder for which certain users are granted access rights. The UNC path of these files is usually stored in the database as a column in the table (varchar (n)) so that application logic can access the specific files. The problem was the security of the files, administering access to them and maintaining them.
The later concept of Binary Large Objects (BLOB) helped to store unstructured data to some degree. The main advantage of this concept is the integrated management and transactional consistency within the database. Security issues (the previous file solution) were resolved in this case. There were problems though, which is a 2Gb limit and log file overload.
To address these issues, SQL Server 2008 introduced for the first time an enhancement called FILESTREAM.
It is not a data type such as VARBINARY (MAX) but an attribute/property set via the VARBINARY column to indicate that the data should be stored directly on the file system. Such data remains an integral part of the system and maintains transactional consistency.
Some of the features of FILESTREAM are:
- Binary data is stored as individual files outside the database,
- These individual files can be accessed through the WIN32 API for file operations,
- T-SQL statements are applicable,
- The 2GB file size limit for VARBINARY (MAX) columns has been removed for objects stored in the file system through FILESTREAM,
- FILESTREAM can also be used on compressed folders, disks, volumes.
In order to use FILESTREAM, it must be enabled. FILESTREAM is enabled on two levels. The first level is the SQL Server level. This level was enabled when installing SQL Server. If this is not done then enable can be done through SQL Server Configuration Manager by checking the checkboxes on the SQL Server FILESTREAM tab of Properties (Figure 1).
Figure 1. Enable FILESTREAM at the SQL Server level using SQL Server Configuration Manager
The second level of enabling FILESTREAM is at the SQL Server Instance level. This enable is done by executing the T-SQL script provided in script 1.
Script 1. Enable FILESTREAM at Server Instance level
Table 1 lists the possible values for filestream_access_level.
|0||Disables FILESTREAM support for this instance.|
|1||Enables FILESTREAM for Transact-SQL access.|
|2||Enables FILESTREAM for Transact-SQL and Win32 streaming access.|
Table 1. Possible values for filestream_access_level
The next step is to create a database where it is possible to create tables in which FILESTREAM will be used. An example of a T-SQL script for creating a database where FILESTREAM can be used is given in script 2.
Script 2. An example of creating a database where it will be possible to use FILESTREAM
Executing script 2 on the file system will create a $ FSLOG folder and file filestream.hdr. The filestream.hdr file is a header file for the FILESTREAM container.
The next step is to create a table with column VARBINARY (MAX) with the FILESTREAM attribute (script 3).
Script 3. Creating a table with a VARBINARY (MAX) FILESTREAM column
It remains to insert the record into the newly created table (script 4).
Script 4. Inserting a record into a table with a VARBINARY (MAX) FILESTREAM column
Inserting a record with script 4 will also create a new folder on the file system.
Records from the table with the VARBINARY (MAX) FILESTREAM column can be obtained by executing script 5.
Script 5. Viewing records from a table with a VARBINARY (MAX) FILESTREAM column
The result of executing script 5 is given in Figure 2.
Figure 2. Record from a table with a VARBINARY (MAX) FILESTREAM column
Each cell in the FILESTREAM column is a file path on the file system associated with it. To read the path, it is necessary to use the PathName property of the varbinary (max) column in the T-SQL statement. Script 6 provides an example of how to read the varbinary (max) column file path.
Script 6. Example of reading a FILESTREAM file path on a file system
T-SQL can be used to work with FILESTREAM data, but it’s a more natural MS Visual Studio environment. The use of FILESTREAM functionality, in application logic, is made possible by implementing the System.Data.SqlTypes.SqlFileStream class. To maintain consistency of data, each SQL FILESTREAM operation must be part of a transaction. MARS (Multiple Active Result Sets) connections have special rules for batch transactions, which the T-SQL BEGIN TRANSACTION statement violates. To avoid this problem, the application client should use the appropriate transaction management API to mean the class System.Data.SqlClient.SqlTransaction.
To allow transactional access to a FILESTREAM data file system, it is necessary to use the T-SQL function GET_FILESTREAM_TRASACTION_CONTEXT () to provide a token that represents the current transaction within a particular session (C # code 1).
C # code 1. Using the GET_FILESTREAM_TRASACTION_CONTEXT () function
This way the transaction is initiated but not yet committed or rollbacked. Multiple operations with FILESTREAM data can be performed in one transaction. An example of a data entry code using the Win32 API is given in C # code 2 in which a sqlFileStream object is initially created based on the System.Data.SqlTypes.SqlFileStream class.
C # code 2. Example code for entering data in FILESTREAM
An example code to read FILESTREAM data, using the Win32 API, is given in C # code 3. The same sqlFileStream object that was created at the beginning of the C # code 1 example is used.
C # code 3. Example code to read FILESTREAM data
Upon completion of FILESTREAM enrollment transactions (C # code 1) and reading FILESTREAM data (C # code 2), the sqlFileStream object (created at the beginning of the C # code 1 example based on the System.Data.SqlTypes.SqlFileStream class) must be closed and commit transaction (C # code 4). The sqlCommand object was created at the beginning of example C # code 1.
C # code 4. Close the sqlFileStream object and commit the transaction
A graph of comparative values of data loading performance, when VARBINARY (MAX) and FILESTREAM are used to store unstructured data and T-SQL or Win32 is given in Figure 3 to access the data.
Figure 3. Load performance of BLOBs of different sizes
It should also be noted that FILESTREAM is also used in other SQL Server technologies, such as FileTable and In-Memory OLTP.
In the end:
Like any other technology, FILESTREAM has its advantages and disadvantages.
- Files created as part of FILESTREAM are managed by SQL Server itself in its own file groups from which backup and restore can be done together with other SQL Server data,
- Reading and writing these files is part of the database transaction,
- It is possible to store very large BLOB objects
- FILESTREAM data can only be stored on local disk volumes,
- Not supported in database snapshot,
- Not supported on database mirroring,
- Transparent Data Encryption is not supported,
- Cannot be used with table-valued parameters