Lecture 6 - Object recognition Flashcards

Question 1

Q

What is the problem and goal with many-to-one mapping?

Answer

A

Problem: many distinct objects occupy the same cognitive category (many-to-one mapping)
E.g. wooden chairs, rocking chairs, metal chairs = all chairs
Goal: our object recognition system must be able to deal with different representations of the same object
This doesn’t only apply to structural differences, but also pose, distance, lighting, position and viewpoint
We also need to cope with degraded images, e.g. through occlusion (other objects obstructing our target object), noise, distortion and filtering

Question 2

Q

Where does object recognition take place?

Answer

A

The two-stream model – post V1, info is transmitted via two pathways
Ventral stream ‘what’ (V1-V2-V4-IT cortex) associated with object recognition and memory
Dorsal stream ‘where’ (V1-V2-V5/MT) associated with motion, location, saccadic control
More complex networks link to the frontal lobe and other areas
Object agnosia – damage to ventral stream can create a deficiency in object recognition
Objects can still be recognised if prompted by other modalities (e.g. smell)

Question 3

Q

Models of object recognition: the problem and three levels of analysis

Answer

A

The Problem:
- All models share a common assumption
- The senses register the presence of a stimulus. An internal representation of the stimulus is generated (perceptual representation). The object is recognised when there is a match between the perceptual representation and some stored representation of the same object. Object recognition requires the interaction of perception and memory
- Several models have attempted to explain how the visual system constructs a perception of a recognised object
- David Marr took a computational approach to object recognition and proposed three levels of analysis
1. Computational – what is the system doing and why?
2. Algorithmic – which processes, rules, and algorithms are used to solve the problem?
3. Implementational – how are these processes implemented by the system?

Question 4

Q

What are template-matching models?

Answer

A

The simplest model for object recognition
It assumes we have a template (i.e. a representation stored in long-term memory) for every object, each with its own detector
When an object matching the template appears in the detector’s receptive field (RF), the detector signals
The computer vision equivalent is the machines that read cheques: they work well, but the characters must be in exactly the expected location and orientation
For this to work, we would need a detector for every possible orientation, scale, font etc – requiring a large brain!
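The brittleness of strict template matching can be illustrated with a minimal sketch. The 5x5 letter pattern and the function names below are hypothetical and chosen only for illustration; the detector signals only when the input is identical to the stored template, so a one-pixel shift is enough to silence it.

```python
# Hypothetical 5x5 binary template for the letter T (illustrative only).
TEMPLATE_T = [
    "#####",
    "..#..",
    "..#..",
    "..#..",
    "..#..",
]

def matches(image, template):
    """Signal only when every pixel equals the stored template."""
    return image == template

def shift_right(image):
    """Move the pattern one pixel to the right, padding with background."""
    return ["." + row[:-1] for row in image]

upright = list(TEMPLATE_T)
shifted = shift_right(upright)

print(matches(upright, TEMPLATE_T))  # input in the expected position
print(matches(shifted, TEMPLATE_T))  # same letter, shifted by one pixel
```

Coping with the shifted input under this scheme would need a second template, and likewise for every other position, scale and font.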

Question 5

Q

What are feature detection models?

Answer

A

Selfridge’s Pandemonium model (1959)
A built-up template model, described in terms of demons with different jobs
1. Feature demons – look at the image and simply write down how many examples of their feature they see
2. Cognitive demons – shout if they think that combination of features applies to their letter, the more confident they are, the louder they shout
3. Decision demon – listens to the cognitive demons and decides who is shouting the loudest, providing that as the perceived letter
This model fits well with Hubel and Wiesel’s work, suggesting feature-detecting neurons
Feature demons are still using templates, there is still no information on configuration, and the model still can’t distinguish between different versions of R
Modern viewpoint-dependent models are derived from models like this
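The three kinds of demon can be sketched in a few lines. This is a toy illustration, not Selfridge’s implementation: the feature inventory, the three letters and the loudness rule (1 / (1 + mismatch)) are all assumptions made for the example.

```python
# Hypothetical feature inventory; features and letters are illustrative only.
LETTER_FEATURES = {
    "A": {"oblique": 2, "horizontal": 1},
    "H": {"vertical": 2, "horizontal": 1},
    "L": {"vertical": 1, "horizontal": 1},
}

def feature_demons(strokes):
    """Count how many examples of each feature appear in the image."""
    counts = {}
    for stroke in strokes:
        counts[stroke] = counts.get(stroke, 0) + 1
    return counts

def cognitive_demon(expected, counts):
    """Shout louder (closer to 1.0) the better the counts fit this letter."""
    features = set(expected) | set(counts)
    mismatch = sum(abs(expected.get(f, 0) - counts.get(f, 0)) for f in features)
    return 1.0 / (1.0 + mismatch)

def decision_demon(strokes):
    """Pick the letter whose cognitive demon shouts loudest."""
    counts = feature_demons(strokes)
    shouts = {
        letter: cognitive_demon(expected, counts)
        for letter, expected in LETTER_FEATURES.items()
    }
    return max(shouts, key=shouts.get), shouts

letter, shouts = decision_demon(["vertical", "horizontal", "vertical"])
print(letter, shouts)
```

Because the feature demons only count features, reordering the strokes gives the same answer, which is the missing-configuration weakness noted above.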

Question 6

Q

What are some controversies in object recognition?

Answer

A

E.g. describe a coin as a disc – this is always true, it’s a feature of this object
The physical shape of a coin is viewpoint-independent
What about your viewpoint? Is your experience of the coin always the same?
The perceptual representation changes and therefore is viewpoint-dependent
What happens in our brain?
Different theories have decided on viewpoint-independent or viewpoint-dependent representation to solve the issue of shape constancy

Question 7

Q

Structural description models: Marr & Nishihara (1978)

Answer

A

They believed the goal of the model is to describe the object unambiguously
Therefore, the system must be invariant to transformations in viewpoint, illumination, etc
This means the system must know which properties are invariant under transformation and how other properties might vary
a) Should the coordinate system be viewer-centred or object-centred? An object-centred system negates the problem of variance under transformation
b) What are its primitives? (primitives are the basic units of info in its representation) Volumetric approach: volumes only require axis and size info – maintains specificity without requiring too much storage space
c) How is that information organised into an object description? This is their proposed process by which the image becomes a set of volumes
At this stage, the system creates a viewpoint-independent representation of the object by describing its volumetric structure. This 3D model enables consistent recognition of the object from any angle, ensuring that the object can be identified no matter how it’s viewed
The object is described in terms of its axes and the volumes around them
This description is modular and hierarchical
This means the object can be described at many scales, allowing for identity matching and discrimination
Now we have a representation, we just need recognition
Recognition: the ‘model store’: even if your object perception doesn’t exactly match anything in your model store, you’ll find the closest match and have sufficient info on your object from the image and your memory to help you interact with it
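A hierarchical, modular description and a closest-match lookup in a model store might be sketched as follows. Everything here is an assumption for illustration: the `Volume` class, the made-up proportions, and the use of Euclidean distance as the matching rule are not taken from Marr & Nishihara, and the sketch assumes every model has the same number of parts.

```python
from dataclasses import dataclass, field
from math import dist

@dataclass
class Volume:
    """One volumetric primitive: an axis length and a size (radius) around it."""
    name: str
    axis_length: float
    radius: float
    parts: list = field(default_factory=list)  # finer-scale sub-volumes

def describe(volume, depth):
    """Flatten the hierarchy down to `depth` levels into a list of numbers."""
    values = [volume.axis_length, volume.radius]
    if depth > 0:
        for part in volume.parts:
            values += describe(part, depth - 1)
    return values

# Hypothetical model store with made-up proportions.
MODEL_STORE = {
    "human": Volume("body", 1.8, 0.25,
                    [Volume("arm", 0.7, 0.05), Volume("leg", 0.9, 0.08)]),
    "giraffe": Volume("body", 5.0, 0.6,
                      [Volume("neck", 2.0, 0.2), Volume("leg", 1.8, 0.1)]),
}

def closest_model(percept, depth=1):
    """Return the stored model whose description is nearest to the percept."""
    target = describe(percept, depth)
    return min(
        MODEL_STORE,
        key=lambda name: dist(describe(MODEL_STORE[name], depth), target),
    )

# A percept that matches nothing exactly but is closest to the stored human.
percept = Volume("body", 1.7, 0.3,
                 [Volume("arm", 0.6, 0.06), Volume("leg", 1.0, 0.07)])
print(closest_model(percept, depth=0))  # coarse scale: whole-body axis only
print(closest_model(percept, depth=1))  # finer scale: body plus parts
```

The `depth` argument stands in for describing the object at different scales: a coarse description uses only the main axis, a finer one adds the parts.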

Question 8

Q

Structural description models: Biederman (1987)

Answer

A

This model is also called recognition by components (RBC)
Proposed a set of primitive volumes into which objects are decomposed
The volumes are called Geons (geometric ions)
Many features of these geons remain the same in 2D & 3D (e.g. collinearity, curvature, symmetry, parallelism, co-termination)
He estimated that there are < 36 of these geons, therefore 36 squared = 1296 pairs of geons, which can be attached in different ways and at different relative sizes
This gives roughly 75,000 possible 2-geon objects
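The arithmetic behind these figures can be checked directly. The only inputs are the two numbers given in the notes (36 geons, about 75,000 two-geon objects); the arrangements-per-pair value is simply what those two numbers imply, not a figure quoted in the notes.

```python
n_geons = 36                # upper bound given in the notes
pairs = n_geons ** 2        # ordered pairs of geons
two_geon_objects = 75_000   # approximate figure quoted in the notes

# Implied average number of attachment / relative-size arrangements per pair.
arrangements_per_pair = two_geon_objects / pairs

print(pairs)
print(round(arrangements_per_pair, 1))
```

So each pair of geons would need to be combinable in a little under 60 distinct ways to reach the quoted total.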
He gave experimental evidence for geons in human object recognition: recognition was much poorer in the missing-geon condition, particularly when objects were presented only for a short time
Partial occlusion changes the object from flat to 3D
The visual system generates hypotheses about how the object’s contours may continue behind the occluder – a cognitive process that allows people to infer the full shape even if part of it is hidden
Geon theory struggles to explain how humans perceive and recognise partially obscured objects, as it doesn’t fully account for this inferential processing

Question 9

Q

Assessment of structural description models

Answer

A

Pros:
- Invariance is well explained
- Recognition relies on description rather than matching
- Graded representations cope with discrimination and generalization
- Evidence that structural information matters to humans and to neurons
Cons:
- Extracting model parameters can be hard in real images (e.g. occlusion)
- Structural description is difficult for some objects (e.g. crumpled paper)
- Driven by theoretical desirability rather than behavioural or physiological evidence

Question 10

Q

What are view-dependent models?

Answer

A

Bülthoff & Edelman (1992), Riesenhuber & Poggio (1999)
View-dependent models start with a completely different idea
Don’t need a catalogue of fixed 3D models, but rather a catalogue of shape descriptions that match view-dependent characteristics of objects
The finding of a canonical perspective supports the idea of an experience-based catalogue
Canonical perspective – the view that comes to mind when we think of an object, and the one that seems to maximise the info available for recognition
Uses a viewer-based coordinate system
The primitives are sub-regions of the image:
a) Not the whole image. These are ‘abstract features’ (i.e. lines, curves, texture, colour, shading, etc)
b) Feature-sensitive units feed into one another in a weighted way, coding increasingly complex features
c) Size and position invariant
d) These feed into view-tuned object recognition cells
e) Recognition by matching input to the closest stored view
Main difference is a weighted approach between layers, rather than winner takes all
1. View layer: units code a particular view of the object
2. Object layer: units respond to a particular object (view-independent)
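The two layers can be sketched as a toy network. This is a simplified illustration under stated assumptions, not the published models: the feature vectors, the Gaussian tuning of the view units, the tuning width `SIGMA` and the equal pooling weights are all hypothetical.

```python
from math import dist, exp

SIGMA = 0.5  # hypothetical tuning width of a view-tuned unit

# Hypothetical stored views: each is a short vector of abstract feature strengths.
STORED_VIEWS = {
    "mug": [[1.0, 0.2, 0.0], [0.8, 0.6, 0.1]],
    "stapler": [[0.1, 0.9, 1.0], [0.0, 0.5, 0.9]],
}

def view_unit(stored, features):
    """View layer: Gaussian response, maximal when input equals the stored view."""
    return exp(-dist(stored, features) ** 2 / (2 * SIGMA ** 2))

def object_unit(views, features):
    """Object layer: pool all view units of one object (equal weights here)."""
    return sum(view_unit(view, features) for view in views) / len(views)

def recognise(features):
    """Return the best-matching object plus the graded activation of each."""
    activations = {
        name: object_unit(views, features)
        for name, views in STORED_VIEWS.items()
    }
    return max(activations, key=activations.get), activations

# An input lying between the two stored mug views.
best, activations = recognise([0.9, 0.4, 0.05])
print(best, activations)
```

Every object unit receives a graded activation rather than a single unit winning outright at the view layer, which is the weighted approach described above.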

Question 11

Q

Evidence for view-dependent models

Answer

A

Human object recognition is not perfectly viewpoint invariant
The viewing sphere: participants practised recognising objects from specific viewpoints and were then tested at novel viewpoints
1. Interpolation: between previous viewpoints (easiest)
2. Extrapolation: beyond previous viewpoints but on the same axis (medium difficulty)
3. Orthogonal axis: from a completely new viewpoint (hardest)
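Why interpolation should be easier than extrapolation under a stored-views account can be shown with a one-axis toy calculation. The practised angles, the test angles and the tuning width are hypothetical, and the sketch deliberately leaves out the orthogonal-axis condition, which needs a second dimension.

```python
from math import exp

SIGMA = 40.0                 # hypothetical tuning width, in degrees
TRAINED_VIEWS = [0.0, 75.0]  # hypothetical practised viewpoints on one axis

def familiarity(test_angle):
    """Summed Gaussian response of units tuned to the practised views."""
    return sum(
        exp(-((test_angle - view) ** 2) / (2 * SIGMA ** 2))
        for view in TRAINED_VIEWS
    )

interpolated = familiarity(37.5)   # between the practised views
extrapolated = familiarity(112.5)  # beyond them, on the same axis

print(interpolated > extrapolated)
```

An interpolated view is close to both practised views, whereas an extrapolated view is close to only one of them, so its summed response is weaker.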

Question 12

Q

Assessment of view-dependent models

Answer

A

Pros:
- Straightforward
- Minimises transformations that must be performed
- Newer models are based directly on what we know of physiology
- Abstract features are recombinable
- Good behavioural, physiological, and simulation-based evidence
Cons:
- Humans often show quite good generalisation across viewpoints even for novel objects
- Still more memory intensive than e.g. the geon model
