No. | Slide | Text |
1 |
 |
Part 1: Bag-of-words models. By Li Fei-Fei (Princeton) |
2 |
 |
Related works. Early “bag of words” models (mostly texture recognition): Cula & Dana, 2001; Leung & Malik, 2001; Mori, Belongie & Malik, 2001; Schmid, 2001; Varma & Zisserman, 2002, 2003; Lazebnik, Schmid & Ponce, 2003. Hierarchical Bayesian models for documents (pLSA, LDA, etc.): Hofmann, 1999; Blei, Ng & Jordan, 2004; Teh, Jordan, Beal & Blei, 2004. Object categorization: Csurka, Bray, Dance & Fan, 2004; Sivic, Russell, Efros, Freeman & Zisserman, 2005; Sudderth, Torralba, Freeman & Willsky, 2005. Natural scene categorization: Vogel & Schiele, 2004; Fei-Fei & Perona, 2005; Bosch, Zisserman & Munoz, 2006 |
3 |
 |
|
4 |
 |
Analogy to documents. Of all the sensory impressions proceeding to the brain, the visual experiences are the dominant ones. Our perception of the world around us is based essentially on the messages that reach the brain from our eyes. For a long time it was thought that the retinal image was transmitted point by point to visual centers in the brain; the cerebral cortex was a movie screen, so to speak, upon which the image in the eye was projected. Through the discoveries of Hubel and Wiesel we now know that behind the origin of the visual perception in the brain there is a considerably more complicated course of events. By following the visual impulses along their path to the various cell layers of the optical cortex, Hubel and Wiesel have been able to demonstrate that the message about the image falling on the retina undergoes a step-wise analysis in a system of nerve cells stored in columns. In this system each cell has its specific function and is responsible for a specific detail in the pattern of the retinal image. |
5 |
 |
A clarification: definition of “BoW”. Looser definition: independent features |
6 |
 |
A clarification: definition of “BoW”. Looser definition: independent features. Stricter definition: independent features + histogram representation |
7 |
 |
|
8 |
 |
Representation: 1. Feature detection and representation; 2. Codewords dictionary formation; 3. Image representation |
9 |
 |
1. Feature detection and representation |
10 |
 |
1. Feature detection and representation. Regular grid: Vogel & Schiele, 2003; Fei-Fei & Perona, 2005 |
11 |
 |
1. Feature detection and representation. Regular grid: Vogel & Schiele, 2003; Fei-Fei & Perona, 2005. Interest point detector: Csurka et al., 2004; Fei-Fei & Perona, 2005; Sivic et al., 2005 |
12 |
 |
1. Feature detection and representation. Regular grid: Vogel & Schiele, 2003; Fei-Fei & Perona, 2005. Interest point detector: Csurka, Bray, Dance & Fan, 2004; Fei-Fei & Perona, 2005; Sivic, Russell, Efros, Freeman & Zisserman, 2005. Other methods: random sampling (Vidal-Naquet & Ullman, 2002); segmentation-based patches (Barnard, Duygulu, Forsyth, de Freitas, Blei & Jordan, 2003) |
13 |
 |
1. Feature detection and representation. Detect patches [Mikolajczyk & Schmid '02; Matas, Chum, Urban & Pajdla '02; Sivic & Zisserman '03], normalize each patch, and compute a SIFT descriptor [Lowe '99]. Slide credit: Josef Sivic |
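To make this step concrete, below is a minimal sketch of extracting patch descriptors both on a regular grid and at interest points. It assumes the opencv-python (≥ 4.4, which exposes cv2.SIFT_create) and numpy packages; the grid spacing and patch size are illustrative values, not ones taken from the slides.

```python
import cv2
import numpy as np

def extract_sift_descriptors(image_path, step=16, patch_size=16.0):
    """Extract 128-D SIFT descriptors on a regular grid and at interest points."""
    gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    sift = cv2.SIFT_create()

    # (a) Regular grid: fixed keypoints every `step` pixels (dense sampling)
    grid_kps = [cv2.KeyPoint(float(x), float(y), patch_size)
                for y in range(step // 2, gray.shape[0], step)
                for x in range(step // 2, gray.shape[1], step)]
    _, grid_desc = sift.compute(gray, grid_kps)

    # (b) Interest point detector: DoG keypoints with scale-adapted patches
    _, ip_desc = sift.detectAndCompute(gray, None)
    return grid_desc, ip_desc
```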
14 |
 |
1. Feature detection and representation |
15 |
 |
2. Codewords dictionary formation |
16 |
 |
2. Codewords dictionary formation. Vector quantization. Slide credit: Josef Sivic |
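A minimal sketch of the vector-quantization step, assuming scikit-learn is available and taking a list of per-image descriptor arrays (for example, the hypothetical SIFT output sketched above); the vocabulary size of 300 is an illustrative choice.

```python
import numpy as np
from sklearn.cluster import KMeans

def build_codebook(descriptor_list, n_codewords=300, seed=0):
    """Cluster all training descriptors; the cluster centers are the codewords."""
    all_descriptors = np.vstack(descriptor_list)          # (total_patches, 128)
    kmeans = KMeans(n_clusters=n_codewords, n_init=10, random_state=seed)
    kmeans.fit(all_descriptors)
    return kmeans                                          # kmeans.cluster_centers_ is the dictionary
```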
17 |
 |
2. Codewords dictionary formation. Fei-Fei et al., 2005 |
18 |
 |
Image patch examples of codewords. Sivic et al., 2005 |
19 |
 |
3. Image representation: histogram of codeword frequencies |
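Continuing the hypothetical pipeline above: each patch descriptor is assigned to its nearest codeword, and the image is summarized by the resulting frequency histogram.

```python
import numpy as np

def bow_histogram(descriptors, kmeans, normalize=True):
    """Represent an image as a histogram of codeword frequencies."""
    words = kmeans.predict(descriptors)                    # nearest codeword per patch
    hist = np.bincount(words, minlength=kmeans.n_clusters).astype(float)
    if normalize:
        hist /= hist.sum()                                 # frequencies rather than raw counts
    return hist
```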
20 |
 |
Representation: 1. Feature detection and representation; 2. Codewords dictionary formation; 3. Image representation |
21 |
 |
Learning and Recognition. Category models (and/or) classifiers |
22 |
 |
Learning and Recognition. Generative method: graphical models. Discriminative method: SVM. Category models (and/or) classifiers |
23 |
 |
2 generative models. Naïve Bayes classifier: Csurka, Bray, Dance & Fan, 2004. Hierarchical Bayesian text models (pLSA and LDA). Background: Hofmann, 2001; Blei, Ng & Jordan, 2004. Object categorization: Sivic et al., 2005; Sudderth et al., 2005. Natural scene categorization: Fei-Fei et al., 2005 |
24 |
 |
First, some notation. w_n: each patch in an image, w_n = [0, 0, …, 1, …, 0, 0]^T. w: the collection of all N patches in an image, w = [w_1, w_2, …, w_N]. d_j: the j-th image in an image collection. c: category of the image. z: theme or topic of the patch |
25 |
 |
Case #1: the Naïve Bayes model (graphical model: the class c generates each of the N patches w). Csurka et al., 2004 |
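For reference, the decision rule in this case is the standard multinomial Naïve Bayes over codeword counts, c* = argmax_c P(c) ∏_n P(w_n | c). The sketch below is a generic implementation of that rule (with Laplace smoothing as an illustrative choice), not the authors' code.

```python
import numpy as np

def train_naive_bayes(count_vectors, labels, n_classes, alpha=1.0):
    """Estimate log P(c) and log P(codeword | c) from per-image codeword counts."""
    X = np.asarray(count_vectors, dtype=float)             # (n_images, n_codewords)
    y = np.asarray(labels)
    log_prior = np.log(np.bincount(y, minlength=n_classes) / len(y))
    class_counts = np.stack([X[y == c].sum(axis=0) for c in range(n_classes)])
    smoothed = class_counts + alpha                        # Laplace smoothing
    log_likelihood = np.log(smoothed / smoothed.sum(axis=1, keepdims=True))
    return log_prior, log_likelihood

def classify_naive_bayes(counts, log_prior, log_likelihood):
    """c* = argmax_c [ log P(c) + sum_i n_i * log P(w_i | c) ]."""
    return int(np.argmax(log_prior + log_likelihood @ counts))
```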
26 |
 |
Csurka et al., 2004 |
27 |
 |
Csurka et al., 2004 |
28 |
 |
Case #2: Hierarchical Bayesian text models. Probabilistic Latent Semantic Analysis (pLSA): Hofmann, 2001. Latent Dirichlet Allocation (LDA): Blei et al., 2001 |
29 |
 |
Case #2: Hierarchical Bayesian text models. Probabilistic Latent Semantic Analysis (pLSA): Sivic et al., ICCV 2005 |
30 |
 |
Case #2: Hierarchical Bayesian text models. Latent Dirichlet Allocation (LDA): Fei-Fei et al., ICCV 2005 |
31 |
 |
Case #2: the pLSA model |
32 |
 |
Case #2: the pLSA model. Slide credit: Josef Sivic |
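For reference, the factorization this slide illustrates is the standard pLSA decomposition (Hofmann, 2001), with images as documents, codewords as words, and z as the latent theme:

```latex
P(w_i \mid d_j) \;=\; \sum_{k=1}^{K} P(z_k \mid d_j)\, P(w_i \mid z_k),
\qquad
P(w_i, d_j) \;=\; P(d_j)\, P(w_i \mid d_j).
```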
33 |
 |
Case #2: Recognition using pLSA. Slide credit: Josef Sivic |
34 |
 |
Case #2: Learning the pLSA parameters. Maximize the likelihood of the data using EM. Observed counts of word i in document j; M = number of codewords; N = number of images. Slide credit: Josef Sivic |
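A compact sketch of the standard pLSA EM updates on a codeword-by-image count matrix (M codewords by N images, matching the slide's definitions). This illustrates the algorithm named on the slide, not the course demo code; the dense (M × N × K) responsibility array is only practical for small vocabularies.

```python
import numpy as np

def plsa_em(counts, n_topics, n_iters=50, seed=0, eps=1e-12):
    """Fit pLSA by EM. counts[i, j] = occurrences of codeword i in image j (shape M x N).
    Returns P(w|z) of shape (M, K) and P(z|d) of shape (K, N)."""
    counts = np.asarray(counts, dtype=float)
    rng = np.random.default_rng(seed)
    M, N = counts.shape
    p_w_z = rng.random((M, n_topics))
    p_w_z /= p_w_z.sum(axis=0, keepdims=True)
    p_z_d = rng.random((n_topics, N))
    p_z_d /= p_z_d.sum(axis=0, keepdims=True)

    for _ in range(n_iters):
        # E-step: responsibilities P(z | d, w), shape (M, N, K)
        resp = p_w_z[:, None, :] * p_z_d.T[None, :, :]
        resp /= resp.sum(axis=2, keepdims=True) + eps
        weighted = counts[:, :, None] * resp              # weight by observed counts
        # M-step: re-estimate the two conditional distributions
        p_w_z = weighted.sum(axis=1)
        p_w_z /= p_w_z.sum(axis=0, keepdims=True) + eps
        p_z_d = weighted.sum(axis=0).T
        p_z_d /= counts.sum(axis=0, keepdims=True) + eps
    return p_w_z, p_z_d
```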
35 |
 |
Demo: course website |
36 |
 |
Task: face detection – no labeling |
37 |
 |
Demo: feature detection. Output of a crude feature detector: find edges; draw points randomly from the edge set; draw from a uniform distribution to get the scale |
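A hedged sketch of the crude detector the demo describes, assuming OpenCV's Canny edge detector; the thresholds, number of points, and scale range are illustrative, not the demo's actual settings.

```python
import cv2
import numpy as np

def crude_feature_detector(gray, n_points=200, scale_range=(10.0, 30.0), seed=0):
    """Find edges, draw point locations at random from the edge set,
    and draw each patch scale from a uniform distribution."""
    rng = np.random.default_rng(seed)
    edges = cv2.Canny(gray, 100, 200)                     # edge map
    ys, xs = np.nonzero(edges)
    idx = rng.choice(len(xs), size=min(n_points, len(xs)), replace=False)
    scales = rng.uniform(*scale_range, size=len(idx))
    return np.stack([xs[idx], ys[idx], scales], axis=1)   # (x, y, scale) per patch
```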
38 |
 |
Demo: learnt parameters. Codeword distributions per theme (topic); theme distributions per image. Learning the model: do_plsa('config_file_1'). Evaluate and visualize the model: do_plsa_evaluation('config_file_1') |
39 |
 |
Demo: recognition examples |
40 |
 |
Demo: categorization results. Performance of each theme |
41 |
 |
Learning and Recognition. Generative method: graphical models. Discriminative method: SVM. Category models (and/or) classifiers |
42 |
 |
Discriminative methods based on the ‘bag of words’ representation. Decision boundary: zebra vs. non-zebra |
43 |
 |
Discriminative methods based on the ‘bag of words’ representation. Grauman & Darrell, 2005, 2006: SVM with pyramid match kernels. Others: Csurka, Bray, Dance & Fan, 2004; Serre & Poggio, 2005 |
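A minimal sketch of the discriminative route: an SVM over bag-of-words histograms with a precomputed histogram-intersection kernel, assuming scikit-learn; any other kernel (linear, chi-square, pyramid match) could be substituted.

```python
import numpy as np
from sklearn.svm import SVC

def histogram_intersection_kernel(A, B):
    """K[i, j] = sum_k min(A[i, k], B[j, k]) for two stacks of BoW histograms."""
    return np.array([[np.minimum(a, b).sum() for b in B] for a in A])

def train_and_predict(train_hists, train_labels, test_hists):
    """Fit an SVM with a precomputed kernel and classify new histograms."""
    clf = SVC(kernel="precomputed", C=1.0)
    clf.fit(histogram_intersection_kernel(train_hists, train_hists), train_labels)
    return clf.predict(histogram_intersection_kernel(test_hists, train_hists))
```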
44 |
 |
Summary: Pyramid match kernel. Optimal partial matching between sets of features. Grauman & Darrell, 2005. Slide credit: Kristen Grauman |
45 |
 |
Pyramid Match (Grauman & Darrell, 2005). Histogram intersection. Slide credit: Kristen Grauman |
46 |
 |
Pyramid Match (Grauman & Darrell, 2005). Histogram intersection. Slide credit: Kristen Grauman |
47 |
 |
Pyramid match kernel. Weights inversely proportional to bin size. Normalize kernel values to avoid favoring large sets. Slide credit: Kristen Grauman |
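A small sketch of the pyramid match computation for low-dimensional point sets: bin size doubles at each level, new matches are counted by histogram intersection, weights are inversely proportional to bin size, and the kernel is normalized to avoid favoring large sets. The real method uses higher-dimensional features and sparse histograms, so treat this only as an illustration.

```python
import numpy as np

def _grid_hist(points, cell, extent):
    """Histogram of points on a regular grid with the given cell size."""
    edges = [np.arange(0.0, extent + cell, cell)] * points.shape[1]
    hist, _ = np.histogramdd(points, bins=edges)
    return hist

def pyramid_match(X, Y, extent=256, levels=8):
    """Unnormalized pyramid match between point sets X, Y (n_points, dim) in [0, extent)."""
    k, prev_inter = 0.0, 0.0
    for i in range(levels + 1):
        cell = 2 ** i                                     # bin size doubles at each level
        inter = np.minimum(_grid_hist(X, cell, extent),
                           _grid_hist(Y, cell, extent)).sum()
        k += (inter - prev_inter) / (2 ** i)              # new matches, weighted by 1 / bin size
        prev_inter = inter
    return k

def normalized_pyramid_match(X, Y, **kw):
    """Normalize to avoid favoring large feature sets."""
    return pyramid_match(X, Y, **kw) / np.sqrt(pyramid_match(X, X, **kw) * pyramid_match(Y, Y, **kw))
```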
48 |
 |
Example pyramid match: Level 0. Slide credit: Kristen Grauman |
49 |
 |
Example pyramid match: Level 1. Slide credit: Kristen Grauman |
50 |
 |
Example pyramid match: Level 2. Slide credit: Kristen Grauman |
51 |
 |
Example pyramid match: pyramid match vs. optimal match. Slide credit: Kristen Grauman |
52 |
 |
Summary: Pyramid match kernel. Optimal partial matching between sets of features; the kernel sums, over levels, (difficulty of a match at level i) × (number of new matches at level i). Slide credit: Kristen Grauman |
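The formula the slide annotates is, in the standard Grauman & Darrell notation (w_i is the per-level weight and N_i the number of new matches, the two quantities the captions point to):

```latex
K_{\Delta}(X, Y) = \sum_{i=0}^{L} w_i \, N_i,
\qquad
N_i = \mathcal{I}\!\big(H_i(X), H_i(Y)\big) - \mathcal{I}\!\big(H_{i-1}(X), H_{i-1}(Y)\big),
\qquad
w_i = \frac{1}{2^{i}},
```

where I is histogram intersection and H_i is the level-i histogram with bins that double in size at each level.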
53 |
 |
Object recognition results. ETH-80 database, 8 object classes (Eichhorn & Chapelle, 2004). Features: Harris detector, PCA-SIFT descriptor, d=10. Recognition rates by kernel: Match kernel [Wallraven et al.] 84%; Bhattacharyya affinity [Kondor & Jebara] 85%; Pyramid match 84%. Slide credit: Kristen Grauman |
54 |
 |
Object recognition results. Caltech objects database, 101 object classes. Features: SIFT detector, PCA-SIFT descriptor, d=10. 30 training images per class. 43% recognition rate (1% chance performance). 0.002 seconds per match. Slide credit: Kristen Grauman |
55 |
 |
|
56 |
 |
What about spatial info? |
57 |
 |
What about spatial info? Feature level: spatial influence through correlogram features (Savarese, Winn & Criminisi, CVPR 2006) |
58 |
 |
What about spatial info? Feature level. Generative models: Sudderth, Torralba, Freeman & Willsky, 2005, 2006; Niebles & Fei-Fei, CVPR 2007 |
59 |
 |
What about spatial info? Feature level. Generative models: Sudderth, Torralba, Freeman & Willsky, 2005, 2006; Niebles & Fei-Fei, CVPR 2007 |
60 |
 |
What about spatial info? Feature level. Generative models. Discriminative methods: Lazebnik, Schmid & Ponce, 2006 |
61 |
 |
Invariance issues. Scale and rotation: implicit in the detectors and descriptors (Kadir & Brady, 2003) |
62 |
 |
Invariance issues. Scale and rotation. Occlusion: implicit in the models; codeword distribution handles small variations; (in theory) the theme (z) distribution covers different occlusion patterns |
63 |
 |
Invariance issues. Scale and rotation. Occlusion. Translation: encode (relative) location information (Sudderth, Torralba, Freeman & Willsky, 2005, 2006; Niebles & Fei-Fei, 2007) |
64 |
 |
Invariance issues. Scale and rotation. Occlusion. Translation. View point (in theory): codewords via the detector and descriptor; theme distributions over different view points (Fergus, Fei-Fei, Perona & Zisserman, 2005) |
65 |
 |
Model properties. Intuitive: analogy to documents |
66 |
 |
Model properties. Intuitive: analogy to documents; analogy to human vision (Olshausen & Field, 2004; Fei-Fei & Perona, 2005) |
67 |
 |
Model properties. Intuitive. Generative models: convenient for weakly- or un-supervised, incremental training; can use prior information; flexible (e.g. HDP). Li, Wang & Fei-Fei, CVPR 2007; Sivic, Russell, Efros, Freeman & Zisserman, 2005 |
68 |
 |
Model properties. Intuitive. Generative models. Discriminative method: computationally efficient (Grauman et al., CVPR 2005) |
69 |
 |
Model properties. Intuitive. Generative models. Discriminative method. Learning and recognition are relatively fast compared to other methods |
70 |
 |
Weaknesses of the model. No rigorous geometric information about the object components; it is intuitive to most of us that objects are made of parts, yet the model carries no such information. Not extensively tested yet for view point invariance or scale invariance. Segmentation and localization unclear |
http://900igr.net/prezentacija/anglijskij-jazyk/part-1-bag-of-words-models-252492.html