Al's BrainDump

RSS

Navigation





Quick Search
»
Advanced Search »

PoweredBy

Index Scan vs Index Seek

Index Scan retrieves all the rows from the table. Index Seek retrieves selective rows from the table.

Index Scan

Since a scan touches every row in the table whether or not it qualifies, the cost is proportional to the total number of rows in the table. Thus, a scan is an efficient strategy if the table is small or if most of the rows qualify for the predicate. Index Scan is nothing but scanning on the data pages from the first page to the last page. If there is an index on a table, and if the query is touching a larger amount of data, which means the query is retrieving more than 50 percent or 90 percent of the data, and then optimizer would just scan all the data pages to retrieve the data rows. If there is no index, then you might see a Table Scan (Index Scan) in the execution plan.

Index Seek

Since a seek only touches rows that qualify and pages that contain these qualifying rows, the cost is proportional to the number of qualifying rows and pages rather than to the total number of rows in the table. Index seeks are generally preferred for the highly selective queries. What that means is that the query is just requesting a fewer number of rows or just retrieving the other 10 (some documents says 15 percent) of the rows of the table.

Joins

Summary

Nested loops is simple and has no setup overhead so good for datasets where at least one side of the join has a trivial amount of rows. Hash joins have to put the contents of the smaller side into a dictionary first (overhead) but can handle larger datasets on both side of the join. Merge joins are good for sorted data. Nested loops join works well with relatively small data sets and merge join helps with moderately sized data sets, hash join excels at performing the largest joins. Hash joins parallelize and scale better than any other join and are great at maximizing throughput in data warehouses.

Nested loops

Simple outter and inner loop. Good for when 1 side has small number of rows, as in WHERE p.productid = 870. Muder when both inner and outter have large datasets as the worst case scenario is O(A*B). When A only has 1ish rows then not a problem.

When extra columns are required for the select list that do not exist in the indexes being used then the column values either have to be looked up (key lookup operator) or alternatively the table is scanned (index scan operator) once for every joined row in the other table.

Merge Join

Most efficient of all three operators. Requires the data be sorted first and then step through both sides in an interleaved fashion. Merge will be very efficient when one or both tables have a non clustered index that sorts the join column as that can be used as opposed to performing a sort at query time.

Hash Join

Craig Freeman Step1: Create hashtable\dictionary of the smaller set first (this has to fit in memory), Step2: Loop on larger set and use dictionary.find for every row to lookup into premade dictionary Unlike the other join methods, the hash join requires a memory grant (reserved buffer page counter) to store the hash table, thus, there is a limit to the number of concurrent hash joins that SQL Server can run at any given time.



ScrewTurn Wiki version 3.0.5.600. Some of the icons created by FamFamFam.