Cost of Retrieval
select studnum, mark from MARK
where course = 'PHYS' and studnum = 100 and mark > 90
Physical design for MARK
- Btree primary index on course
MARK-PINDEX/(RID-P, studnum, course, assignnum, mark)
- Btree secondary index on studnum
MARK-SINDEX/(RID-S, studnum, RID-P)
Statistical metadata
- |MARK| = 10000
(MARK-PINDEX) = 50 - distinct(MARK, studnum) = 500 (number of different students)
- distinct(MARK, course) = 100 (number of different courses)
- distinct(MARK, mark) = 100 (number of different marks)
Strategy 1: Use Primary Index
Query plan
- Assuming uniform distribution of tuples over the course, there are about |MARK| / 100 = 100 tuples with course = PHYS
- Searching MARK-PINDEX Btree has a cost of 2
- Retrieving the 100 matching tuples adds a cost of 100/
(MARK-INDEX) data blocks - Total cost is 2 + 100 / 50 = 4 block reads
Selection of tuples from relation using a clustered primary key has cost
Searching Btree has cost 2
Whether its for primary or secondary index
Strategy 2: Use Secondary Index
Query Plan:
Nested index join
- Assuming uniform distribution of tuples over student numbers there will be about |MARK|/500 = 20
- Searching MARK-SINDEX Btree has a cost of 2
- Not clustered index so we assume each matching record in Btree is on a separate data block (Pessimistic)
- 20 blocks will need to be read
- Total cost is 22 block reads
Selection of tuples from relation using an unclustered secondary index has a cost of 2+
Strategy 3: Scan Primary Index
Query Plan:
- 10000/50 = 200 MARK-INDEX Btree data pages
- Total cost is 200 block reads
Selection of tuples from relation by an exhaustive scan of its primary index has a cost of