what is the goal of our object recognition system?
must be able to deal with different representations of the same object
what kinds of variation must the object recognition system cope with?
structural differences
differences in pose, distance, lighting, position, viewpoint
degraded images - occlusion, noise, distortion, filtering
what is the two-stream model?
after V1, information is transmitted via two pathways - the ventral and dorsal streams
more complex networks link to frontal lobe and other areas
what is the ventral stream?
the “what” pathway
from V1 to V2, then V4 and inferotemporal (IT) cortex
associated with object recognition and memory
what is the dorsal stream?
the “where” and “how” pathway
from V1 to V2, then V5/MT, continuing into posterior parietal cortex
associated with motion, location, saccadic control
what is object agnosia?
an impairment in recognising objects, which can result from damage to the ventral stream
what is the problem with models of object recognition?
all models share a common assumption
the senses register the presence of a stimulus and an internal representation of it is generated (the perceptual representation)
object is recognised when there is a match between the perceptual representation and some stored representation of the same object
object recognition requires the interaction of perception and memory
what did David Marr assert about models of object recognition?
necessarily includes a computational approach
should have three levels of analysis - computational (what is the system doing and why?), algorithmic (which processes, rules and algorithms are used to solve the problem?), implementational (how are these processes implemented by the system?)
what are template-matching models?
simplest model for object recognition
assumes we have templates - perceptual representations stored in our long-term memory
a template is available for every object
each template acts as a detector: when an object matching the template appears in the detector’s receptive field, the detector signals
what is the problem with template-matching models?
works well, but only if letters appear in exactly the expected location and orientation
for this to work, we would need a detector for every possible orientation, scale and font - requiring an impossibly large brain
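a minimal Python sketch of template matching and why it breaks, assuming hypothetical 5x5 character grids for the letter T ('#' = ink, '.' = blank) and a hypothetical match threshold of 0.9. the detector fires for the T in the expected position, but not for the same T shifted one column to the right.

```python
# Illustrative template matching: not a model of real neurons.
# Grids and threshold are hypothetical choices for this sketch.

T_TEMPLATE = [
    "#####",
    "..#..",
    "..#..",
    "..#..",
    "..#..",
]

# The same letter T, shifted one column to the right.
T_SHIFTED = [
    ".####",
    "...#.",
    "...#.",
    "...#.",
    "...#.",
]


def match_score(image, template):
    """Fraction of cells where image and template agree, compared position by position."""
    pairs = [
        (img_cell, tpl_cell)
        for img_row, tpl_row in zip(image, template)
        for img_cell, tpl_cell in zip(img_row, tpl_row)
    ]
    return sum(a == b for a, b in pairs) / len(pairs)


def detector_fires(image, template, threshold=0.9):
    """The template's detector signals only if the match is (almost) exact."""
    return match_score(image, template) >= threshold


print(match_score(T_TEMPLATE, T_TEMPLATE), detector_fires(T_TEMPLATE, T_TEMPLATE))  # → 1.0 True
print(match_score(T_SHIFTED, T_TEMPLATE), detector_fires(T_SHIFTED, T_TEMPLATE))    # → 0.64 False
```

matching is cell by cell at a fixed position, so each new position, orientation, scale or font would need its own template, which is the storage problem noted above.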
what are feature detection models?
Selfridge’s Pandemonium model (1959)
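a built-up template model: letters are recognised from combinations of simpler features rather than from whole-letter templates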
described in terms of demons with different jobs
model fits well with Hubel & Wiesel’s work, which suggests the existence of feature-detecting neurons
what are feature demons in the feature detection model?
each looks at the image and simply records how many instances of its feature it sees
what are cognitive demons in the feature detection model?
each shouts if it thinks the combination of features matches its letter
the more confident they are, the louder they shout
what are decision demons in the feature detection model?
listens to the cognitive demons, decides which one is shouting loudest and outputs that letter as the perceived letter
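a minimal Python sketch of the three kinds of demon, assuming the image has already been reduced to a hypothetical list of stroke labels and that each letter is defined by hypothetical feature counts. the loudness formula 1 / (1 + mismatch) is also an assumption chosen for illustration, not Selfridge’s own.

```python
from collections import Counter

FEATURES = ["vertical", "horizontal", "oblique", "right_angle", "curve"]

# Hypothetical feature counts each cognitive demon expects for its letter.
LETTER_FEATURES = {
    "T": {"vertical": 1, "horizontal": 1, "right_angle": 2},
    "L": {"vertical": 1, "horizontal": 1, "right_angle": 1},
    "H": {"vertical": 2, "horizontal": 1, "right_angle": 4},
    "A": {"horizontal": 1, "oblique": 2},
}


def feature_demons(strokes):
    """Each feature demon simply reports how many instances of its feature it sees."""
    counts = Counter(strokes)
    return {feature: counts[feature] for feature in FEATURES}


def cognitive_demons(feature_report):
    """Each cognitive demon shouts louder the closer the report is to its letter's features."""
    shouts = {}
    for letter, expected in LETTER_FEATURES.items():
        mismatch = sum(abs(feature_report[f] - expected.get(f, 0)) for f in FEATURES)
        shouts[letter] = 1 / (1 + mismatch)  # 1.0 = perfect match, quieter as mismatch grows
    return shouts


def decision_demon(shouts):
    """Picks the letter whose cognitive demon is shouting loudest."""
    return max(shouts, key=shouts.get)


strokes = ["vertical", "horizontal", "right_angle", "right_angle"]  # hypothetical input
shouts = cognitive_demons(feature_demons(strokes))
print(shouts)
print(decision_demon(shouts))  # → T
```

because the feature demons only report counts, nothing records where the features are, which is the missing configuration information listed among the limitations.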
how were early implementations of the feature detection model crude?
feature demons still used templates
no information on configuration (how the features are arranged relative to each other)
still can’t distinguish between different versions
what are the controversies in object recognition?
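if asked to describe the shape of a coin, you might say it’s a disc - this is always true, because it is a property of the object itself
the physical shape of the coin is viewpoint-independent
but the perceptual representation changes with viewpoint (a circle face-on, an ellipse at an angle) and is therefore viewer-dependent
the controversy is whether the stored representations used for recognition are viewpoint-independent or viewer-dependent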
what is Marr & Nishihara’s (1978) structural description model?
believed the goal of the model is to describe the object unambiguously; the system must therefore be invariant to transformations in viewpoint, illumination, etc., which means it must know which properties are invariant under transformation and how other properties may vary
the coordinate system is object-centred, which avoids the problem of representations varying under transformation
what are the primitives in Marr & Nishihara’s (1978) structural description model?
primitives are the basic units of information in its representation
volumetric approach - volumes require only axis and size information, which keeps the description specific without requiring too much storage space
represents everything in terms of cylinders (more precisely, generalised cones)
how is information organised into an object description in Marr & Nishihara’s (1978) structural description model?
viewer-centred stages: input image, edge image, 2.5D sketch
object-centred stage: 3D model
at the 4th stage, the system creates a viewpoint-independent representation of the object by describing its volumetric structure; this 3D model allows the object to be recognised consistently from any angle
the object is described in terms of its axes and the volumes around them - the description is modular and hierarchical, so the object can be described at many scales, allowing both identity matching and discrimination
what is the input image in Marr & Nishihara’s (1978) structural description model?
retinal image
intensity and wavelength of light at each point
what is the edge image in Marr & Nishihara’s (1978) structural description model?
zero crossings, blobs, edges, bars, ends, curves, boundaries
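a minimal 1D Python sketch of finding an edge from zero crossings, assuming a hypothetical blurred intensity profile and a plain discrete second derivative. Marr & Hildreth’s actual scheme first smooths the 2D image with a Laplacian of Gaussian; that step is omitted here, so this only shows the core idea that an edge sits where the second derivative changes sign.

```python
def second_derivative(profile):
    """Discrete second derivative I[i-1] - 2*I[i] + I[i+1], for interior points only."""
    return [profile[i - 1] - 2 * profile[i] + profile[i + 1] for i in range(1, len(profile) - 1)]


def zero_crossings(profile):
    """Return (k, k+1) pixel pairs between which the second derivative changes sign."""
    d2 = second_derivative(profile)
    crossings = []
    for j in range(len(d2) - 1):
        if d2[j] * d2[j + 1] < 0:  # strict sign change; exact zeros are ignored in this sketch
            # d2[j] is centred on pixel j+1, so the crossing lies between pixels j+1 and j+2
            crossings.append((j + 1, j + 2))
    return crossings


# A hypothetical 1D intensity profile: a blurred step from dark (10) to bright (40).
profile = [10, 10, 12, 16, 26, 34, 38, 40, 40]
print(second_derivative(profile))  # → [2, 2, 6, -2, -4, -2, -2]
print(zero_crossings(profile))     # → [(3, 4)]
```

the crossing falls between pixels 3 and 4, where the intensity rises most steeply (16 to 26).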
what is the 2.5D sketch in Marr & Nishihara’s (1978) structural description model?
surfaces with local orientations and discontinuities in depth
what is the 3D model in Marr & Nishihara’s (1978) structural description model?
composed of 3D “primitive” volumes, organised hierarchically by scale
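a minimal Python sketch of a hierarchical 3D model as a data structure, assuming each primitive is a cylinder stored only as an axis length and a radius (arbitrary units), with its sub-parts nested below it. the human-figure parts and sizes are hypothetical; asking for a deeper description returns finer-scale parts.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Cylinder:
    """A volumetric primitive: only a name, an axis length and a radius are stored."""
    name: str
    axis_length: float
    radius: float
    parts: list[Cylinder] = field(default_factory=list)


def describe(model, max_depth, depth=0):
    """List the parts at a given scale: depth 0 is the coarsest description."""
    lines = [f"{'  ' * depth}{model.name} (axis={model.axis_length}, radius={model.radius})"]
    if depth < max_depth:
        for part in model.parts:
            lines.extend(describe(part, max_depth, depth + 1))
    return lines


# Hypothetical human figure: figure -> head, torso, arm; arm -> upper arm, forearm; forearm -> hand.
hand = Cylinder("hand", 0.2, 0.04)
forearm = Cylinder("forearm", 0.3, 0.04, [hand])
upper_arm = Cylinder("upper arm", 0.3, 0.05)
arm = Cylinder("arm", 0.75, 0.05, [upper_arm, forearm])
body = Cylinder("human figure", 1.8, 0.25, [Cylinder("head", 0.25, 0.1), Cylinder("torso", 0.6, 0.2), arm])

print("\n".join(describe(body, 1)))  # coarse scale
print("\n".join(describe(body, 3)))  # fine scale
```

the description contains nothing about the viewer, only the object’s own parts and their sizes, which is what makes it object-centred.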