№ | Слайд | Текст |
1 |
 |
Object Based Disk: the key to intelligent I/OGeorge Gorbatenko Data Machine International St Paul, MN 55115 gorby@ece.umn.edu Sept 23, 2002 1 |
2 |
 |
Why are we interestedfaster transportable more accessible cheaper facilitates holistic design improve reliability Sept 23, 2002 DMI 2 |
3 |
 |
I/O is considered the weak link in systems architectureI/O problem memory wall bottle neck Sept 23, 2002 DMI 3 |
4 |
 |
Issuesrandomness is painful mechanical time vs electronic time ratio of times is about 200:1 operating system obscures the disk Sept 23, 2002 DMI 4 |
5 |
 |
Operating Systemseamless view of space legacy of data storage goes back to punched card accommodates all applications Sept 23, 2002 DMI 5 |
6 |
 |
Data evolutiontape reflected a 80 column card image disk reflected tape Sept 23, 2002 DMI 6 |
7 |
 |
In short…nothing much has changed data format-wise since the 1930’s we are pretty much dealing with records in a linear format, one record after the next Sept 23, 2002 DMI 7 |
8 |
 |
The advantage of object based design isencapsulate the data define the application subset don’t have the operating system getting in the way Sept 23, 2002 DMI 8 |
9 |
 |
SQL object is good choicebroad user base de facto standard for data bases high enough to exploit the power in the I/O yesterdays CPU in today’s disk (controller) aggregate compute power exceeds the host Sept 23, 2002 DMI 9 |
10 |
 |
Researchers in Intelligent Disks are motivated by…exploiting the latent processing potential filtering data in place Sept 23, 2002 DMI 10 |
11 |
 |
Consider a disk farm…Sept 23, 2002 DMI 11 |
12 |
 |
But where do we place the intelligencehost I/O controller disk Sept 23, 2002 DMI 12 |
13 |
 |
Disk basicsmany platters (ea fixed head) 10 many concentric tracks / platter 10k each track holds many sectors 100 Total number of 512 byte sectors 10M ____ disk capacity: 5GB Sept 23, 2002 DMI 13 |
14 |
 |
To access a random blockseek to track 10-15 us wait for block to roll around 4 –5 us read block 80 us hence… 200:1 Sept 23, 2002 DMI 14 |
15 |
 |
Design Goalssynchronous operation next data you want is beneath head process data in place (filter) touch the min amount of data for what you touch you pay in time and space exploit locality amortize random access read over large data block Sept 23, 2002 DMI 15 |
16 |
 |
Access strategies…Amortize the (inefficient) access over large block of data Make sure the data has utility Sept 23, 2002 DMI 16 |
17 |
 |
Optimum Block SizeSept 23, 2002 DMI 17 |
18 |
 |
Select name, address,salary where salary >22KSept 23, 2002 DMI 18 |
19 |
 |
Data Utility…Sept 23, 2002 DMI 19 |
20 |
 |
Consider the travels of an inchworm…A1 B1 C1 D1 E1 A2 B2 C2 D2 E2 A3 B3 C3 D3 E3 A4 B4 C4 D4 E4 A5 B5 C5 D5 E5 Sept 23, 2002 DMI 20 |
21 |
 |
Travels of an inchworm…A1 B1 C1 D1 E1 A2 B2 C2 D2 E2 A3 B3 C3 D3 E3 A4 B4 C4 D4 E4 A5 B5 C5 D5 E5 Sept 23, 2002 DMI 21 |
22 |
 |
Travels of an inchworm…A1 B1 C1 D1 E1 A2 B2 C2 D2 E2 A3 B3 C3 D3 E3 A4 B4 C4 D4 E4 A5 B5 C5 D5 E5 Sept 23, 2002 DMI 22 |
23 |
 |
Locality of ReferenceA1 B1 C1 D1 E1 A2 B2 C2 D2 E2 A3 B3 C3 D3 E3 A4 B4 C4 D4 E4 A5 B5 C5 D5 E5 (a) Logical view of two dimensional table. A1 B1 C1 D1 E1 A2 B2 C2 D2 E2 A3 B3 C3 D3…… (b) Row ordered mapping (physical). A1 A2 A3 A4 A5B1 B2 B3 B4 B5 C1 C2 C3 C4…… (c) Column ordered mapping (physical) Sept 23, 2002 DMI 23 |
24 |
 |
Preservation of Logical TopologyTo preserve the logical topology of n dimensional logical data space, the physical space must at least be of like dimension. - for a 2D table (rows and columns) we need to view disk as two dimensional Sept 23, 2002 DMI 24 |
25 |
 |
Observations:SQL can be decomposed in two operations select - favored by column order extract – favored by row order granular access permits touching min data map data so as to preserve topology when going from logical to physical medium reading a tracks worth of data appears reasonable Sept 23, 2002 DMI 25 |
26 |
 |
Treating disk as 2D spacedata objects are 2D spaces solves “design boundaries” disk is basically a 3D medium cylinder-track-sector Sept 23, 2002 DMI 26 |
27 |
 |
The disk is 3 dimensionalSept 23, 2002 DMI 27 |
28 |
 |
Consider the first cylinder of the set…Sept 23, 2002 DMI 28 |
29 |
 |
Examining a single cylinder…Sept 23, 2002 DMI 29 |
30 |
 |
which has tracks and sectors…Sept 23, 2002 DMI 30 |
31 |
 |
track for each head…Sept 23, 2002 DMI 31 |
32 |
 |
track read…Sept 23, 2002 DMI 32 |
33 |
 |
diagonal (sector block) read…Sept 23, 2002 DMI 33 |
34 |
 |
sector block shadowSept 23, 2002 DMI 34 |
35 |
 |
Unwrap a cylinder…Sept 23, 2002 DMI 35 |
36 |
 |
2 dimensional space: hd x sectorSept 23, 2002 DMI 36 |
37 |
 |
track read or sector block read…Sept 23, 2002 DMI 37 |
38 |
 |
Physical Sector Block Organization…Physical sector (512) Logical sector size (lss) Sept 23, 2002 DMI 38 |
39 |
 |
Logical Sector Block Organization…Physical sector (512) Logical sector size (lss) Sept 23, 2002 DMI 39 |
40 |
 |
record structure…typedef struct _record { char employee_no [8]; // employee number; field A char name [12]; // name; field B char address [24]; // address; field C char zip [5]; // zip code; field D char salary [6]; // salary; field E char doh [6]; // data of hire; field F char dept [3]; // department; field G char tbd [16]; // reserved for future use; field H } Record; Sept 23, 2002 DMI 40 |
41 |
 |
modified best fit algorithmLSS = ceil (rec_len / num_hds) = ceil (64 /10) = 4n = 8 rec_space = LSS * num_hds = 80 bytes LSS (8 bytes) Sept 23, 2002 DMI 41 |
42 |
 |
modified best fit algorithmLSS (8 bytes) B C D G E F Sept 23, 2002 DMI 42 A typedef struct _record { char employee_no [8]; // field A char name [12]; // field B char address [24]; // field C char zip [5]; // field D char salary [6]; // field E char doh [6]; // field F char dept [3]; // field G char tbd [16]; // field H } Record; |
43 |
 |
SQL Decomposition…Select records scan the salary field stores ordinal position in bit vector Extract records optimizer decides strategy (trk or sb read) Sept 23, 2002 DMI 43 |
44 |
 |
Comparison Results…Sept 23, 2002 DMI 44 |
45 |
 |
Prototypetwo 4 GB Seagate Baracudas 21 heads (29 zones) 40 KLOC skew = 5 sectors Solaris 2.51 OS emulated intelligence in IOP context sw every 60 ms Sept 23, 2002 DMI 45 |
46 |
 |
Data particulars…168 byte records LSS = 8 bytes 63 records per Sector Block 7,749 records per cylinder 3 fields (2 heads) involved in query 2 records extracted from disjoint blocks Sept 23, 2002 DMI 46 |
47 |
 |
Test Runswrite cyl worth data w/o optimizer write same with optimizer enabled scan cyl involving 3 col; extract 2 blks repeat operation (c) Sept 23, 2002 DMI 47 |
48 |
 |
Results…Observed Calculated case (a) 2.5 sec 2.427 sec case (b) 196 ms 216 ± 4 ms case (c) 51 ms 54.5 ± 4ms case (d) 42 ms 37.6 ± 4ms Sept 23, 2002 DMI 48 |
49 |
 |
Benchmark Analysis3 Benchmarks selected - Wisconsin - Set Query - TPC D/H selected non-join cases reversed engineered the I/O detail Sept 23, 2002 DMI 49 |
50 |
 |
Wisconsin results…Sept 23, 2002 DMI 50 WISCONSIN BENCHMARK 1000 WIS 2D 100.0 Time ( seconds) 10.00 1.000 0.100 Q1 Q2 Q3 Q4 Q5 Benchmarks |
51 |
 |
Average time within class…Sept 23, 2002 DMI 51 AVERAGE TIME WITHIN CLASS 200 DB2 M204 180 2D 160 140 120 Average Time (seconds) 100 80 60 40 20 0 Q1 Q2A Q3A Q4A Q5 Query Group |
52 |
 |
TPC Q6LINEAL block size 8,192 65,536 time (sec) 155 19.38 8.00 ratio FOM 0.009 0.05 5.47 ratio 2D lss (bytes) 8 4 time (sec) 1.95 1.59 1.23 ratio FOM 0.454 0.639 1.41 ratio FOM ratio 49.38 12.71 time ratio 79.52 12.22 Sept 23, 2002 DMI 52 |
53 |
 |
But what about indexesDepending on selection rate, may make sense. useful when… enforcing unique key SQL JOIN operation add noise to coherent system Sept 23, 2002 DMI 53 |
54 |
 |
The “S” curveSept 23, 2002 DMI 54 EXTRACTION TIME vs SELECTION 25 DEL=1 20 DEL=2 DEL=4 DEL=8 DEL=16 15 Number of Spins 10 5 0 0.01% 0.10% 1.00% 10.0% 100% Selection (percent) |
55 |
 |
Index vs Exhaustive ScanSept 23, 2002 DMI 55 COMPARISON: INDEX vs EXHAUSTIVE SCAN seq X rand X iso X 2D seq 2D rand Time to Extract Records from 1M Row Table (seconds) Extraction Rate (expressed as percentage) |
56 |
 |
Penalty for wrong guess is less7 hours 3 seconds Sept 23, 2002 DMI 56 COMPARISON: INDEX vs EXHAUSTIVE SCAN seq X rand X iso X 2D seq 2D rand Time to Extract Records from 1M Row Table (seconds) Extraction Rate (expressed as percentage) |
57 |
 |
Index StrategyUse on unique fields consider for JOIN operations evaluate based on knowledge of data AVOID where no information is available about data - penalty is bounded Sept 23, 2002 DMI 57 |
58 |
 |
Intelligent I/O ProcessorIOP 20 KC 200 MB/s 36 MB/s Sept 23, 2002 DMI 58 |
59 |
 |
I/O ops are atomic to the cylinderInformation contained in a “clip” spindle & cylinder number & LSS data number records scan info (selection criteria) command fields of interest (project) read/write/scan Data and bit (beta) vector returned Sept 23, 2002 DMI 59 |
60 |
 |
IOP Characteristics…Manages own memory and - tracks are cached DSP is ideal building block - columns, vectors Has no semantic awareness of data offsets and widths Process data in real time RISC type - executes simple jobs quickly Sept 23, 2002 DMI 60 |
61 |
 |
Parallel Architecture10 MZ / .020 = 500 IOP’s 500 IOP’s => 4K spindles HOST PROCESSOR IOP 1 IOP 2 IOP 3 IOP n Sept 23, 2002 DMI 61 |
62 |
 |
Comments…Issue becomes being able to evenly distribute data DB vendors have to pass the info down Disk OEMs… capacity vs intelligence Performance is scalable Evaluate performance of indexes in new light… - could introduce noise in otherwise coherent system Sept 23, 2002 DMI 62 |
63 |
 |
SummaryObject Based Design permits local optimization with minimum compromise 2D mapping Synchronous disk Granular access is key Intelligent IOP (real time processing) For I/O limited applications… significant performance gains are possible Sept 23, 2002 DMI 63 |
64 |
 |
Challenge and future workModel a more rigorous system Explore other applications spatial scientific Integrate concept in a communication model add disk instrumentation define role for disk Sept 23, 2002 DMI 64 |
«Класс сгате» |