Lecture 6 - Object recognition Flashcards

(12 cards)

1
Q

What is the problem and goal with many-to-one mapping?

A
  • Problem: many distinct objects occupy the same cognitive category (many-to-one mapping)
  • E.g. wooden chair, rocking chair, metal chair = all chairs
  • Goal: our object recognition system must be able to deal with different representations of the same object
  • This doesn’t only apply to structural differences, but also pose, distance, lighting, position and viewpoint
  • We also need to cope with degraded images, e.g. occlusion (other objects obstructing the target object), noise, distortion and filtering
2
Q

Where does object recognition take place?

A
  • The two-stream model – post V1, info is transmitted via two pathways
  • Ventral stream ‘what’ (V1-V2-V4-IT cortex) associated with object recognition and memory
  • Dorsal stream ‘where’ (V1-V2-V5/MT) associated with motion, location, saccadic control
  • More complex networks link to the frontal lobe and other areas
  • Object agnosia – damage to the ventral stream can cause a deficit in object recognition
  • Objects can still be recognised if prompted by other modalities (e.g. smell)
3
Q

Models of object recognition: the problem and three levels of analysis

A

The Problem:
- All models share a common assumption:
- The senses register the presence of a stimulus
- An internal representation of the stimulus is generated (the perceptual representation)
- The object is recognised when the perceptual representation matches a stored representation of the same object
- Object recognition therefore requires the interaction of perception and memory
- Several models have attempted to explain how the visual system constructs a percept of a recognised object
- David Marr argued that object recognition should be approached computationally, with three levels of analysis:
1. Computational – what is the system doing and why?
2. Algorithmic – which processes, rules, and algorithms are used to solve the problem?
3. Implementational – how are these processes implemented by the system?

4
Q

What are template-matching models?

A
  • The simplest model for object recognition
  • It assumes we have a template (i.e. a perceptual representation stored in long-term memory) for every object, each with its own detector
  • When an object matching the template appears in the detector’s receptive field (RF), the detector signals
  • The computer vision equivalent is the machines that read cheques: they work well, but the characters must be in exactly the expected location and orientation
  • For this to work, we would need a detector for every possible orientation, scale, font, etc. – requiring a very large brain!
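A minimal sketch of the template-matching idea, under loud assumptions: the 3x3 binary "images", the two letter templates, and the function names are all hypothetical and are not taken from the lecture. Each stored template acts as a detector that fires only on an exact pixel-wise match, which shows why a rotated letter goes unrecognised unless a separate template is stored for it.

```python
# Hypothetical 3x3 binary "images": 1 = ink, 0 = blank.
TEMPLATES = {
    "T": ((1, 1, 1),
          (0, 1, 0),
          (0, 1, 0)),
    "L": ((1, 0, 0),
          (1, 0, 0),
          (1, 1, 1)),
}

def match_score(image, template):
    """Fraction of pixels on which the image and the template agree."""
    pairs = [(p, q)
             for image_row, template_row in zip(image, template)
             for p, q in zip(image_row, template_row)]
    return sum(p == q for p, q in pairs) / len(pairs)

def recognise(image, threshold=1.0):
    """Return the best-matching template label, or None if no detector fires."""
    scores = {name: match_score(image, t) for name, t in TEMPLATES.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] >= threshold else None

upright_t = ((1, 1, 1), (0, 1, 0), (0, 1, 0))
rotated_t = ((0, 0, 1), (1, 1, 1), (0, 0, 1))  # the same T rotated 90 degrees clockwise

print(recognise(upright_t))  # matches the stored "T" template exactly
print(recognise(rotated_t))  # no stored template matches the rotated version
```

Handling the rotated T in this scheme would mean adding another entry to `TEMPLATES`, and likewise for every scale and font, which is the storage problem the card describes.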
5
Q

What are feature detection models?

A
  • Selfridge’s Pandemonium model (1959)
  • A built-up template model, described in terms of demons with different jobs
    1. Feature demons – look at the image and simply write down how many examples of their feature they see
    2. Cognitive demons – shout if they think that combination of features applies to their letter; the more confident they are, the louder they shout
    3. Decision demon – listens to the cognitive demons and decides who is shouting the loudest, providing that as the perceived letter
  • This model fits well with Hubel and Wiesel’s work, which suggested feature-detecting neurons
  • Limitations: feature demons are still using templates, there is still no information on configuration, and the model still can’t distinguish between different versions of R
  • Modern viewpoint-dependent models are derived from models like this
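The three kinds of demon can be sketched as three small functions. This is an illustrative toy rather than Selfridge's actual model: the feature inventory, the per-letter feature counts, and the "loudness" rule (negative count mismatch) are all assumptions made for the example.

```python
# Hypothetical feature inventory: how many of each stroke type a letter contains.
FEATURES = ("vertical", "horizontal", "oblique", "curve")
LETTER_FEATURES = {
    "A": {"vertical": 0, "horizontal": 1, "oblique": 2, "curve": 0},
    "H": {"vertical": 2, "horizontal": 1, "oblique": 0, "curve": 0},
    "P": {"vertical": 1, "horizontal": 0, "oblique": 0, "curve": 1},
    "R": {"vertical": 1, "horizontal": 0, "oblique": 1, "curve": 1},
}

def feature_demons(strokes):
    """Each feature demon counts the examples of its own feature in the image."""
    return {feature: strokes.count(feature) for feature in FEATURES}

def cognitive_demons(counts):
    """Each letter's demon shouts louder the closer the counts are to its own list."""
    return {letter: -sum(abs(counts[f] - n) for f, n in wanted.items())
            for letter, wanted in LETTER_FEATURES.items()}

def decision_demon(shouts):
    """Report the letter whose cognitive demon is shouting the loudest."""
    return max(shouts, key=shouts.get)

def perceive(strokes):
    return decision_demon(cognitive_demons(feature_demons(strokes)))

print(perceive(["vertical", "curve", "oblique"]))  # features of an R
print(perceive(["oblique", "curve", "vertical"]))  # same features, different order
print(perceive(["vertical", "curve"]))             # features of a P
```

Because the feature demons only count, the first two calls give the same answer whatever the arrangement of the strokes, which illustrates the missing configuration information noted in the card.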
6
Q

What are some controversies in object recognition?

A
  • E.g. describing a coin as a disc is always true; it is a feature of the object
  • The physical shape of a coin is viewpoint-independent
  • What about your viewpoint? Is your experience of the coin always the same?
  • The perceptual representation changes with viewpoint and is therefore viewpoint-dependent
  • What happens in our brain?
  • Different theories adopt either viewpoint-independent or viewpoint-dependent representations to solve the problem of shape constancy
7
Q

Structural description models: Marr & Nishihara (1978)

A
  • They believed the goal of the model is to describe the object unambiguously
  • Therefore, the system must be invariant to transformations in viewpoint, illumination, etc
  • This means the system must know which properties are invariant under transformation and how other properties might vary
    a) Should the coordinate system be viewer-centred or object-centred? Object-centred negates the problem of transformation variance
    b) What are its primitives? (Primitives are the basic units of info in the representation.) Volumetric approach: volumes only require axis and size info, which maintains specificity without requiring too much storage space
    c) How is that information organised into an object description? This is their proposed process by which the image becomes a set of volumes
  • At this stage, the system creates a viewpoint-independent (object-centred) representation of the object by describing its volumetric structure. This 3D model enables consistent recognition of the object from any angle, so the object can be identified no matter how it is viewed
  • The object is described in terms of its axes and the volumes around them
  • This description is modular and hierarchical
  • This means the object can be described at many scales, allowing for identity matching and discrimination
  • Now we have a representation, we just need recognition
  • Recognition: the ‘model store’ – even if your percept of the object doesn’t exactly match anything in your model store, you’ll find the closest match, and have sufficient info on the object from the image and your memory to help you interact with it
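The ideas of a modular, hierarchical volumetric description and a closest-match lookup in a model store can be sketched roughly as follows. This is not Marr and Nishihara's algorithm: the `Volume` class, the two parameters per volume (`axis_length`, `width`), the distance measure, and every number are hypothetical choices made only to illustrate the card.

```python
from dataclasses import dataclass, field

@dataclass
class Volume:
    """One primitive volume: axis and size info, plus finer-scale component volumes."""
    name: str
    axis_length: float
    width: float
    parts: list = field(default_factory=list)

def describe(volume, depth):
    """Flatten the hierarchy down to `depth` levels into (axis_length, width) pairs."""
    summary = [(volume.axis_length, volume.width)]
    if depth > 0:
        for part in volume.parts:
            summary.extend(describe(part, depth - 1))
    return summary

def distance(desc_a, desc_b):
    """Sum of parameter differences, plus a penalty of 1 per unmatched volume.
    Pairing volumes by list order is a simplification made for this sketch."""
    paired = sum(abs(la - lb) + abs(wa - wb)
                 for (la, wa), (lb, wb) in zip(desc_a, desc_b))
    return paired + abs(len(desc_a) - len(desc_b))

def closest_match(percept, model_store, depth=1):
    """Return the name of the stored model closest to the percept at a given scale."""
    target = describe(percept, depth)
    best = min(model_store, key=lambda m: distance(target, describe(m, depth)))
    return best.name

def figure(name, body, head, arm, leg):
    """Build a body volume with a head, two arms and two legs (hypothetical sizes)."""
    parts = [Volume("head", *head), Volume("arm", *arm), Volume("arm", *arm),
             Volume("leg", *leg), Volume("leg", *leg)]
    return Volume(name, *body, parts=parts)

human = figure("human", (1.7, 0.5), (0.25, 0.2), (0.7, 0.1), (0.9, 0.15))
table = Volume("table", 0.75, 1.2, [Volume("leg", 0.7, 0.05) for _ in range(4)])
percept = figure("unknown", (1.6, 0.45), (0.2, 0.2), (0.65, 0.1), (0.85, 0.15))

print(closest_match(percept, [human, table], depth=0))  # coarse scale: whole object only
print(closest_match(percept, [human, table], depth=1))  # finer scale: object plus parts
```

The percept matches nothing in the store exactly, yet the lookup still returns the nearest stored model, and the `depth` argument stands in for describing the object at coarser or finer scales.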
8
Q

Structural description models: Biederman (1987)

A
  • This model is also called recognition by components (RBC)
  • Proposed a set of primitive volumes into which objects are decomposed
  • The volumes are called geons (geometric ions)
  • Many features of these geons remain the same in 2D & 3D (e.g. collinearity, curvature, symmetry, parallelism, co-termination)
  • He estimated that there are < 36 of these geons, therefore 36 squared = 1296 pairs of geons, which can be attached in different ways and at different relative sizes
  • Roughly 75,000 possible 2-geon objects
  • He gave experimental evidence for geons in human object recognition: recognition was much poorer in the missing-geon condition, particularly when objects were presented only briefly
  • Partial occlusion changes the object from flat to 3D
  • The visual system generates hypotheses about how the object’s contours may continue behind the occluder – a cognitive process that allows people to infer the full shape even if part of it is hidden
  • Geon theory struggles to explain how humans perceive and recognise partially obscured objects, as it doesn’t fully account for this inferential processing
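The geon counts quoted in this card can be checked with a few lines of arithmetic. The sketch takes the card's figures at face value (36 geons, about 75,000 two-geon objects). The "arrangements per pair" value is derived here from those two numbers and is not a figure quoted from Biederman.

```python
n_geons = 36                  # geon count as given in the card
two_geon_objects = 75_000     # approximate figure as given in the card

# Every geon can be paired with every geon (including itself), in order.
ordered_pairs = n_geons ** 2

# Implied number of distinct attachments / relative sizes per pair of geons.
arrangements_per_pair = two_geon_objects / ordered_pairs

print(ordered_pairs)                    # 1296
print(round(arrangements_per_pair, 1))  # about 57.9
```

So the jump from 1296 pairs to roughly 75,000 objects corresponds to a little under 60 ways of attaching and sizing each pair.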
9
Q

Assessment of structural description models

A

Pros:
- Invariance is well explained
- Recognition relies on description rather than matching
- Graded representations cope with discrimination and generalisation
- Evidence that structural information matters to humans and to neurons
Cons:
- Extracting model parameters can be hard in real images (e.g. occlusion)
- Structural description is difficult for some objects (e.g. crumpled paper)
- Driven by theoretical desirability rather than behavioural or physiological evidence

10
Q

What are view-dependent models?

A
  • Bülthoff & Edelman (1992), Riesenhuber & Poggio (1999)
  • View-dependent models start with a completely different idea
  • They don’t need a catalogue of fixed 3D models, but rather a catalogue of shape descriptions that match view-dependent characteristics of objects
  • The finding of a canonical perspective supports the idea of an experience-based catalogue
  • Canonical perspective – the view that comes to mind when we think of an object, and that seems to maximise the info available for recognition
  • Uses a viewer-based coordinate system
  • The primitives are sub-regions of the image:
    a) Not the whole image. These are ‘abstract features’ (i.e. lines, curves, texture, colour, shading, etc)
    b) Feature-sensitive units feed into one another in a weighted way, becoming progressively more complex
    c) Size and position invariant
    d) These feed into view-tuned object recognition cells
    e) Recognition by matching input to the closest stored view
  • The main difference is a weighted approach between layers, rather than winner-takes-all
    1. View layer: units code a particular view of the object
    2. Object layer: units respond to a particular object (view-independent)
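The view layer and object layer can be sketched as follows. This is a toy illustration, not the Riesenhuber and Poggio model: a view is reduced to a single rotation angle in degrees, and the Gaussian tuning curve, its width, the equal weights, and the stored angles are all assumptions made for the example.

```python
import math

def view_unit(stored_angle, width=20.0):
    """View layer: a unit tuned to one stored view, responding less to other views."""
    def response(angle):
        diff = (angle - stored_angle + 180) % 360 - 180  # signed angular difference
        return math.exp(-(diff ** 2) / (2 * width ** 2))
    return response

def object_unit(view_units, weights=None):
    """Object layer: a weighted sum over the object's view-tuned units."""
    if weights is None:
        weights = [1.0] * len(view_units)
    def response(angle):
        return sum(w * unit(angle) for w, unit in zip(weights, view_units))
    return response

stored_views = [0.0, 60.0]  # hypothetical views stored through experience
chair = object_unit([view_unit(angle) for angle in stored_views])

for test_angle in (0.0, 30.0, 150.0):
    print(test_angle, round(chair(test_angle), 3))
```

With these assumed settings the object unit responds most strongly at a stored view, less at a view between the stored ones, and hardly at all at a view far from both, so its output is graded rather than all-or-none.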
11
Q

Evidence for view-dependent models

A
  • Human object recognition is not perfectly viewpoint invariant
  • The viewing sphere: participants practised recognising objects from specific viewpoints and were then tested at novel viewpoints
    1. Interpolation: between previous viewpoints (easiest)
    2. Extrapolation: beyond previous viewpoints but on the same axis (medium difficulty)
    3. Orthogonal axis: from a completely new viewpoint (hardest)
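The three test conditions can be stated precisely with a small classifier. This is only a restatement of the card's definitions under assumptions: trained views are taken to lie along one axis of the viewing sphere (elevation 0), views are given as azimuth and elevation in degrees, and the function name and the example angles (0 and 75) are hypothetical.

```python
DIFFICULTY = {
    "interpolation": "easiest",
    "extrapolation": "medium",
    "orthogonal": "hardest",
}

def classify_view(trained_azimuths, azimuth, elevation=0.0):
    """Classify a test view relative to views trained along one axis (elevation 0)."""
    if elevation != 0.0:
        return "orthogonal"        # off the trained axis altogether
    if min(trained_azimuths) <= azimuth <= max(trained_azimuths):
        return "interpolation"     # between the trained views
    return "extrapolation"         # on the same axis, beyond the trained range

trained = [0.0, 75.0]  # hypothetical trained viewpoints
for azimuth, elevation in [(30.0, 0.0), (100.0, 0.0), (30.0, 20.0)]:
    condition = classify_view(trained, azimuth, elevation)
    print(azimuth, elevation, condition, DIFFICULTY[condition])
```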
12
Q

Assessment of view-dependent models

A

Pros:
- Straightforward
- Minimises transformations that must be performed
- Newer models are based directly on what we know of physiology
- Abstract features are recombinable
- Good behavioural, physiological, and simulation-based evidence
Cons:
- Humans often show quite good generalisation across viewpoints even for novel objects
- Still more memory-intensive than, e.g., the geon model
