ProblemI have heard about Full Text Search and I know it has been around for the last few versions of SQL Server. I have never really used it and I have seen some of the other tips related to the technology on your web site, so I assume someone must be using it. Can you provide any use cases for Full Text Search? I think I understand the general premise, but not sure if it is any quicker than just using a LIKE statement or how I would need to change my queries.
SolutionYes - You are correct, Full Text Search has been an available feature in the last few versions of SQL Server and does provide another means of data access. In general, you need to setup a new catalog for table and column you would like to use and then change some of the syntax in your queries to use the Full Text Catalog. Here is some information to get you started:
Use Case for Full Text Search
On a recent project, one of the needs was to search by a few different sets of criteria which were in a number of different tables with a number of one to many relationships. Most of the tables had millions of rows of data with the largest table having over 12 million rows. The data was primarily read-only and updated on a monthly basis. Unfortunately, all of the queries had to access the 12 million row table with 1 to 5 joins to the child tables and 1 to 5 WHERE clauses in the queries. The data needed to be returned as quickly as possible with the highest level of concurrency based on the existing hardware resources.
In this scenario, a number of test cases were built with a few different T-SQL coding techniques to include:
- Logic in the JOIN statements
- INNER JOIN
- LEFT OUTER JOIN
- RIGHT OUTER JOIN
- EXCEPT and INTERSECT syntax
- Logic in WHERE clauses
- IN and NOT IN statements
- LIKE and NOT LIKE Clauses
- = or <>
- Sub queries
- UNION statements
- Load temp tables, update flags or delete records
- Full Text Catalog queries
- etc.
Based on the testing conducted (queries, data, concurrency, etc.), the EXCEPT and INTERSECT syntax was the best T-SQL option for a single JOIN and/or a single WHERE clause statement, which was considered a rare condition. The best T-SQL option for a numerous JOIN and numerous WHERE clause statements, which was considered the norm, was the Full Text Catalog with the CONTAINS command. This was a little counter intuitive initially, but the Full Text Catalog was the most efficient as we tested each scenario and reviewed the cost of the query plans. However, this may not be the case for your data and queries, so test thoroughly.
*** NOTE *** - As a point of reference, the testing was conducted with the DBCC DROPCLEANBUFFERS command issued in between statement executions to ensure cached results would not unfairly benefit subsequent queries.
Setting up a Full Text Catalog
Since, Full Text Catalogs may be new to some DBAs and Developers, let's walk through an example of setting up a Full Text Catalog for the AdventureWorks database (SQL Server 2005 sample database):
Creation - To create a new Full Text catalog navigate to root | Databases | AdventureWorks (Database Name) | Storage | Full Text Catalogs. To create the catalog, right click on the Full Text Catalogs folder and select the 'New Full-Text Catalog...' option.
On the New Full-Text Catalog interface the following settings should be configured:
|
Corresponding Script
USE [AdventureWorks]
GOCREATE FULLTEXT CATALOG [zProduction.Product]IN PATH N'C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\FTData'WITH ACCENT_SENSITIVITY = ON AUTHORIZATION [dbo] GO |
Configuration - To configure the Full Text Catalog (tables, columns, schedule, etc) navigate to root | Databases | AdventureWorks (Database Name) | Storage | Full Text Catalogs | Catalog Name (i.e. zProduction.Product in our example) and select the 'Properties' option.
On the Full Text Catalog Properties interface the following settings should be configured:
|
General Page - This interface corresponds primarily to the catalog that was setup in the previous steps, with the ability to indicate if the catalog is the default, the catalog owner and if the catalog is accent sensitive. |
Tables/Views Page - This page provides the opportunity to determine the tables, columns to include in the Full Text Catalog. Although multiple tables and columns could be included. In the lower portion of the interface, it is important to note the unique index for each table, the language and the track changes setting. For more information about the track changes setting, reference the Rebuilding the Full Text Catalog section below. |
Population Schedule - |
Corresponding Script
USE [AdventureWorks]
GOCREATE FULLTEXT INDEX ON [Production].[Product] KEY INDEX [PK_Product_ProductID] ON[zProduction.Product] WITH CHANGE_TRACKING AUTOGO
USE [AdventureWorks]
GOALTER FULLTEXT INDEX ON [Production].[Product] ADD ([Name])GO
USE [AdventureWorks]
GOALTER FULLTEXT INDEX ON [Production].[Product] ENABLEGO
USE [msdb]
GODECLARE @jobId BINARY(16)EXEC msdb.dbo.sp_add_job @job_name=N'Start Optimize Catalog Population on AdventureWorks.zProduction.Product', @enabled=1, @start_step_id=1, @description=N'Scheduled full-text optimize catalog population for full-text catalog zProduction.Product in database AdventureWorks. This job was created by the Full-Text Catalog Scheduling dialog or Full-Text Indexing Wizard.', @category_name=N'Full-Text', @job_id = @jobId OUTPUT select @jobId GO
EXEC msdb.dbo.sp_add_jobserver @job_name=N'Start Optimize Catalog Population on AdventureWorks.zProduction.Product', @server_name = N'JTKLAPTOP'GO
USE [msdb]
GODECLARE @schedule_id int EXEC msdb.dbo.sp_add_jobschedule @job_name=N'Start Optimize Catalog Population on AdventureWorks.zProduction.Product', @name=N'zProduction.Product Full Text Catalog ', @enabled=1, @freq_type=4, @freq_interval=1, @freq_subday_type=1, @freq_subday_interval=0, @freq_relative_interval=0, @freq_recurrence_factor=1, @active_start_date=20070924, @active_end_date=99991231, @active_start_time=20000, @active_end_time=235959, @schedule_id = @schedule_id OUTPUT
select @schedule_id
GO
USE [msdb]
GOEXEC msdb.dbo.sp_add_jobstep @job_name=N'Start Optimize Catalog Population on AdventureWorks.zProduction.Product', @step_name=N'Full-Text Indexing', @step_id=1, @cmdexec_success_code=0, @on_success_action=1, @on_success_step_id=-1, @on_fail_action=2, @on_fail_step_id=-1, @retry_attempts=0, @retry_interval=0, @os_run_priority=0, @subsystem=N'TSQL', @command=N'USE [AdventureWorks] ALTER FULLTEXT CATALOG [zProduction.Product] REORGANIZE', @database_name=N'master'GO |
Querying with the Full Text Catalog
Here are a few Full Text Catalogs query examples with the CONTAINS command for the AdventureWorks database as a point of reference:
Product ID and Product Name Selection
USE AdventureWorks;GOSELECT ProductID, [Name]FROM Production.ProductWHERE CONTAINS([Name], '"*washer*" OR "*ball*"');
GO |
Description Selection
USE AdventureWorks;GOSELECT ProductDescriptionID, DescriptionFROM Production.ProductDescriptionWHERE CONTAINS(Description, '"*technology*" OR "*performance*"');GO
|
Rebuilding the Full Text Catalog
One item to keep in mind is that the Full Text Catalogs need to be rebuilt automatically, on a schedule or on an as needed basis in order to include all of the updated data since the initial population. Please reference the options below to rebuild the Full Text Catalogs based on the Tables/Views Page interface shown above:
- Automatic - The data in the full-text index is automatically updated as the data in the corresponding table is inserted, updated or deleted.
- Manual - In this scenario, when the indexed data is inserted, updated or deleted, SQL Server will track the change, but not update the index as is the case with the automatic option. A SQL Server Agent job needs to run in order to update the index.
- Do not track changes - When the indexed data is inserted, updated or deleted, SQL Server will not track the change and the index must be rebuilt to reflect all of the underlying data changes.
Each of these options need to be understood in terms of data availability versus performance. If the data in the Full Text catalog always needs to be up to date at any cost, then use the automatic setting. If the data changes need to be recorded, but not update the Full Text indexes then the manual setting should be used. This setting should balance some of the performance needs by rebuilding the Full Text indexes during a low usage period. If the data is updated in bulk on a regular basis, then not tracking changes is probably appropriate with the practice that the Full Text index will need to be updated following the bulk loading of the data.
Please refer MSSQL Tips blog.
Please refer MSSQL Tips blog.
No comments:
Post a Comment