Chapter 12 - Seeing
The visual code
It may seem paradoxical to say that there are programs for seeing. The essence of programs is that they are plans for action, whereas in the conventional schemes of both philosophers and physiologists sensation and action are separated, and sensation is usually considered to come first, before action. We are proposing that the reverse is true: that higher animals, at least, go around actively searching for things to see, and that they 'see' mainly those things that they expect, because the program includes hypotheses and rules for testing them.
To understand this paradox we have to explain why seeing is not like photography, and this involves some further and rather subtle considerations about symbolism, which even today are not widely understood. When we see a red object, what goes up the optic nerve? It is obviously not red light, so what is it? The physiologist replies, 'a set of nerve impulses'. But nerve impulses are not red either. We may express the situation by saying that the nerve impulses are signals that act as symbols of red light, in the sense that they can be decoded by an appropriate brain. Our task, then, is to explain what is implied by speaking in this way of 'signals', 'symbols', and 'decoding'.
Although seeing is not like photography, we shall begin by noting that the eye is in some ways very like a camera. It has a lens, a diaphragm (the iris), and a focusing device. What is more, the first step in the process of vision is a photochemical change somewhat like that in a photographic plate. The retina contains a mosaic of more than one hundred million separate receiving elements of two sorts, the rods and cones, each of which detects a tiny part of the image thrown on it by the lens and produces a minute electrical or chemical change. Only the cones are sensitive to colour, and they are most densely packed near the centre of the retina. Here there is a small area, the fovea, containing only about 30 000 cones. These perform nearly all the detailed work of seeing, except in dim light. To see things in detail we therefore have to explore them continually with minute movements of the eyes, bringing the part we want to examine onto the fovea. This is of fundamental importance for our way of thinking about vision, because the program that controls these eye movements largely determines what we see.
Seeking what to see
It is easy to see that a person moves his eyes in jerks, for example when he is reading or looking around the room. These large movements are separated by periods of fixation. They occur at most about five times a second (in rapid reading), but often they are less frequent. The movements are ballistic and very fast (up to about 1000°/s). Even during fixation the eyes continue to make small tremor movements, at about 50 per second and with an amplitude of about 10 minutes of arc, enough to shift the image on the fovea across about 30 cones. Both the large and the small fast movements are now usually called saccades (from a French word for a sharp pull on the reins of a horse). In addition, the eyes make slow drifts during fixation, and they may also follow moving targets (pursuit movements) or converge and diverge to maintain stereoscopic vision (vergence movements).
No information is received during a saccade, so the large movements divide up the process of seeing into a discontinuous series of packages, and it can be shown that information does in fact reach the cerebral cortex in bursts corresponding to them. Figure 12.1 shows the sequence of pieces of information that might be sent from the fovea as the eye scans a pyramid. Each jump is towards a point that is likely to be interesting. The direction that is chosen depends on the program in the brain, which makes a forecast on the basis of the information already received. In Fig. 12.2(a and b) the interest is obviously in the human figures, and especially their eyes. Incidentally, this is one more example of the propensity of our brain programs to direct attention to human features, a tendency that is probably partly inborn and no doubt accentuated by life as a social creature. In Fig. 12.2(b), when instructions were given to search for particular features, the program was modified accordingly.
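The idea that each jump goes towards the point forecast to be most interesting can be caricatured in a few lines of Python. This is a minimal sketch under loud assumptions, not a model of real eye-movement control: the scene is reduced to a handful of points, each given a hypothetical 'interest' score standing in for the brain's forecast, and each saccade simply jumps to the highest-scoring point not yet fixated. An instruction to look for particular features is represented, again hypothetically, by multiplying some scores. The names `scan`, `reweight` and `pyramid` are illustrative only.

```python
# A toy scan path: each saccade jumps to the most "interesting" point not yet
# fixated. The interest scores are hypothetical stand-ins for the brain's forecast.
from typing import Dict, List, Tuple

Point = Tuple[int, int]  # (x, y) position in the scene, arbitrary units


def scan(interest: Dict[Point, float], start: Point, n_fixations: int) -> List[Point]:
    """Return the fixation sequence: one point per 'package' of information."""
    path = [start]
    visited = {start}
    while len(path) < n_fixations:
        # Only points not yet examined by the fovea are candidates for the next jump.
        candidates = [p for p in interest if p not in visited]
        if not candidates:
            break  # nothing left worth looking at
        # Greedy choice: jump to the candidate with the highest forecast interest.
        target = max(candidates, key=lambda p: interest[p])
        path.append(target)
        visited.add(target)
    return path


def reweight(interest: Dict[Point, float], bias: Dict[Point, float]) -> Dict[Point, float]:
    """Model an instruction ('look for X') as multiplying the scores of some points."""
    return {p: score * bias.get(p, 1.0) for p, score in interest.items()}


# Hypothetical scores for a pyramid: the apex and base corners are the interesting parts.
pyramid = {
    (5, 9): 0.9,   # apex
    (10, 0): 0.7,  # right base corner
    (0, 0): 0.6,   # left base corner
    (5, 0): 0.2,   # middle of the base edge
    (5, 4): 0.1,   # plain face
}

print(scan(pyramid, start=(5, 4), n_fixations=4))
# [(5, 4), (5, 9), (10, 0), (0, 0)]
print(scan(reweight(pyramid, {(5, 0): 10.0}), start=(5, 4), n_fixations=3))
# [(5, 4), (5, 0), (5, 9)]
```

The greedy choice with a memory of visited points is the simplest rule that produces a sequence of jumps; in the terms of the text, a learned program would also revise the scores after each fixation in the light of the information just received.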
So the programs of enquiry that are learned from childhood onwards dictate what movements are made in response to the information coming in with each jump. Vision is a dynamic process, using a series of scans, but these are not rigidly determined as in a television raster. They vary according to the nature of the scene itself and the previous experience of the individual. Moreover, the scanning does not convert the information in the spatial scene into a single channel, but distributes it into many parallel channels, which preserve the spatial relations; so in a sense the original picture is reproduced on the cortex, though modified and much expanded (p. 125).
We can thus regard all seeing as a continual search for the answers to questions posed by the brain. The signals sent from the retina constitute 'messages' conveying these answers. The brain then uses this information to construct a suitable hypothesis about what is there and a program of action to meet the situation. As a hungry boy looks around, his eyes may send signals that suggest a fruit tree. Signals go back to the eyes to search for food, and if the returning messages indicate 'apples' he starts to climb the tree to pick and eat them.
Encoding in the retina
The sequence of processes involved in the act of seeing does not, therefore, really begin in the retina, but involves the brain. Nevertheless it is convenient to ask just how the retina composes its messages. The rods and cones are the light-sensitive elements. They contain special pigments, which change when the intensity of the light falling on them varies. This change alters the electrical potentials of the cells, so that the pattern of light thrown by the lens produces a corresponding pattern of electrical and chemical change in the various neurons that make up the retina (Fig. 12.3). Many of these cells are little 'microneurons', which do not send yes-or-no signals but produce graded changes that increase or decrease the probability that their larger neighbours will set off action potentials. This is a sort of analogue computation, which finally generates discontinuous (digital) all-or-nothing signals in the largest cells of the retina, the ganglion cells, whose axons run to the brain. The impulses in the optic nerve fibres at each moment of scanning a scene are the answers, in code, to the 'questions' that were asked at the previous moment. Of course, if something quite unexpected happens it is still seen, even though it had not been anticipated. The point is that what goes on in the retina is not the recording of a 'picture' but the detection of a series of items, which are reported to the brain. If the eyes are prevented from moving, the signals fade within a second and no picture can be seen.