Thursday 6 August 2015

Operant Studies of Learning and Memory

TRIAL & ERROR LEARNING

Behaviourist Theory / Thorndike's Theory of Connectionism / Trial and Error Learning / S-R Bond Theory

Trial and error is a method of learning in which various responses are tentatively tried and some discarded until a solution is attained.

E. L. Thorndike (1874-1949) was the chief exponent of the theory of connectionism, or trial and error learning. He was an American psychologist who conducted Stimulus-Response (S-R) experiments with the help of animals. Thorndike was the first to study the subject of learning systematically, using standardized procedures and apparatus. All learning, according to Thorndike, is the formation of bonds or connections between stimulus and response.

The Puzzle Box Experiment

Thorndike's experiment on a cat in the puzzle box is widely known and often quoted in the psychology of learning. The experimental set-up was very simple. A hungry cat was confined in a puzzle box, and a dish of food was kept outside the box. The cat had to pull a string to come out of the box. The cat made several random movements, jumping, dashing and running about, and at last succeeded in pulling the string. The door of the puzzle box opened, and the cat came out and ate the food. Thorndike promptly put the cat to the next trial. The cat again behaved frantically, but it soon succeeded in pulling the string. This was repeated several times, and Thorndike noticed that as the repetitions increased, the errors decreased; the cat showed slow, gradual and continuous improvement in performance over successive trials. He concluded that the cat's learning in the puzzle box could be explained in terms of the formation of a direct connection between stimulus and response.

Features of Trial and Error Learning

1. Learning by trial and error is a gradual process.
2. For learning to occur, the learner must be definitely motivated.
3. The learner makes random and variable responses.
4. Some responses do not lead to the goal (annoying responses).
5. Some responses lead to the goal (satisfying responses).
6. With the increase in number of trials the annoying responses will tend to be eliminated and the satisfying responses will be strengthened and repeated.
7. The time taken to perform the task (to repeat the satisfying response) decreases with successive trials.

The experiment brings out the following elements in the process of learning:
1. Drive: In the present experiment, the drive was hunger, intensified by the sight of food (motivation).
2. Goal: To get the food by getting out of the box.
3. Block: The cat was confined in the box with a closed door.
4. Random Movements: The cat persistently made random movements, trying to get out of the box.
5. Chance Success: As a result of this striving and random movement, the cat, by chance, succeeded in opening the door.
6. Selection (of the proper movement): Gradually, the cat recognised the correct manipulation of the latch.
7. Fixation: At last, the cat learned the proper way of opening the door by eliminating all the incorrect responses and fixing the right response.

Through this experiment, Thorndike explained that learning is nothing but the stamping in of correct responses and the stamping out of incorrect responses through trial and error.
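This stamping-in process lends itself to a simple simulation. The sketch below is not Thorndike's own model; the response names, strengths and update factors are assumed purely for illustration. The satisfying response is strengthened and the annoying ones weakened, so escape times fall over successive trials, as in the experiment.

```python
import random

# A minimal sketch (assumed names and numbers, not Thorndike's model):
# S-R bond strengths for a few candidate responses in the puzzle box.
strengths = {"jump": 1.0, "dash": 1.0, "run": 1.0, "pull_string": 1.0}
SATISFYING = "pull_string"   # the only response that opens the door

def run_trial():
    """Sample responses until the satisfying one occurs (law of effect)."""
    attempts = 0
    while True:
        attempts += 1
        responses, weights = zip(*strengths.items())
        choice = random.choices(responses, weights=weights)[0]
        if choice == SATISFYING:
            strengths[choice] *= 1.5   # satisfying response is stamped in
            return attempts
        strengths[choice] *= 0.9       # annoying response is stamped out

for trial in range(1, 11):
    print(f"trial {trial}: escaped after {run_trial()} attempts")
```

Run repeatedly, the average number of attempts per trial falls, mirroring the slow, gradual improvement Thorndike observed.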

Thorndike's Laws of Learning

i) Law of Readiness: The law states, "When any conduction unit is ready to conduct, for it to do so is satisfying. When a conduction unit is not ready to conduct, for it to conduct is annoying. When any conduction unit is ready to conduct, for it not to do so is annoying."

ii) Law of Effect: The law states, "When a modifiable connection between a stimulus and response is made and is accompanied or followed by a satisfying state of affairs, the strength of the connection is increased. When a connection between stimulus and response is made and accompanied or followed by an annoying state of affairs, its strength decreases."

iii) Law of Exercise: The law states, "Any response to a situation will, other things being equal, be more strongly connected with the situation in proportion to the number of times it has been connected with that situation and to the average vigour and duration of the connection."

The law has two sub-parts: a) Law of Use and b) Law of Disuse.

a) The Law of Use states that "When a modifiable connection is made between a situation and response, that connection's strength is increased if it is practised."
b) The Law of Disuse states that "When a modifiable connection is not made between a situation and response over a length of time, that connection's strength is decreased." This means that any act that is not practised for some time gradually decays.
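These two sub-laws can be pictured as a simple strength update. Thorndike gave no such numbers; the gain and decay constants below are assumed values chosen only to make the idea concrete.

```python
# Connection strength rises with each practised repetition (law of use)
# and decays across idle periods (law of disuse). All numbers are assumed.
PRACTICE_GAIN = 0.2   # assumed increment per practised repetition
DISUSE_DECAY = 0.95   # assumed retention factor per idle period

strength = 1.0
for _ in range(5):                    # five practice sessions
    strength += PRACTICE_GAIN
print(f"after practice: {strength:.2f}")

for _ in range(10):                   # ten periods of disuse
    strength *= DISUSE_DECAY
print(f"after disuse:   {strength:.2f}")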

Educational Implications

1. Thorndike's theory emphasizes the importance of motivation in learning, so learning should be made purposeful and goal-directed.
2. It stresses the importance of mental readiness, meaningful practice and incentives in the learning process.
3. The law of readiness implies that the teacher should prepare the minds of the students to be ready to accept the knowledge, skills and attitudes before teaching the topic.
4. More and more opportunities should be given to the learners to use and repeat the knowledge they gain in the classroom, for effectiveness and longer retention.
5. To maintain learned connections for a longer period, review of the learned material is necessary.
6. The law of effect has called attention to the importance of motivation and reinforcement in learning.
7. To benefit from the mechanism of association in the learning process, what is being taught in one situation should be linked with the past experiences of the learner.
 

INSIGHT LEARNING



     Wolfgang Kohler, a psychologist trained at the University of Berlin, was working at a primate research facility maintained by the Prussian Academy of Sciences in the Canary Islands when the First World War broke out. Marooned there, he had at his disposal a large outdoor pen and nine chimpanzees of various ages. The pen, described by Kohler as a playground, was provided with a variety of objects including boxes, poles, and sticks, with which the primates could experiment.

Kohler constructed a variety of problems for the chimps, each of which involved obtaining food that was not directly accessible. In the simplest task, food was put on the other side of a barrier. Dogs and cats in previous experiments had faced the barrier in order to reach the food, rather than moving away from the goal to circumvent the barrier. The chimps, however, presented with an apparently analogous situation, set off immediately on the circuitous route to the food.

It is important to note that the dogs and cats that had apparently failed this test were not necessarily less intelligent than the chimps. The earlier experiments that psychologists had run on dogs and cats differed from Kohler's experiments on chimps in two important ways. First, the barriers were not familiar to the dogs and cats, and thus there was no opportunity for using latent learning, whereas the chimps were well acquainted with the rooms used in Kohler's tests. Second, whereas the food remained visible in the dog and cat experiments, in the chimp test the food was tossed out the window (after which the window was shut) and fell out of sight. Indeed, when Kohler tried the same test on a dog familiar with the room, the animal (after proving to itself that the window was shut) took the shortest of the possible indirect routes to the unseen food.

The ability to select an indirect (or even novel) route to a goal is not restricted to chimps, cats, and dogs. At least some insects routinely perform similar feats. The cognitive processing underlying these abilities will become clearer when we look at navigation by chimps in a later chapter. For now, the point is that the chimpanzees' abilities to plan routes are not as unique as they appeared at the time.

Some of the other tests that Kohler is known for are preserved on film. In a typical sequence, a chimp jumps fruitlessly at bananas that have been hung out of reach. Usually, after a period of unsuccessful jumping, the chimp apparently becomes angry or frustrated, walks away in seeming disgust, pauses, then looks at the food in what might be a more reflective way, then at the toys in the enclosure, then back at the food, and then at the toys again. Finally the animal begins to use the toys to get at the food.

The details of the chimps' solutions to Kohler's food-gathering puzzle varied. One chimp tried to shinny up a toppling pole it had poised under the bananas; several succeeded by stacking crates underneath, but were hampered by difficulties in getting their centers of gravity right. Another chimp had good luck moving a crate under the bananas and using a pole to knock them down. The theme common to each of these attempts is that, to all appearances, the chimps were solving the problem by a kind of cognitive trial and error, as if they were experimenting in their minds before manipulating the tools. The pattern of these behaviors--failure, pause, looking at the potential tools, and then the attempt--would seem to involve insight and planning, at least on the first occasion. 

PAVLOV'S EXPERIMENT

Pavlov's Dogs


 Like many great scientific advances, Pavlovian conditioning (aka classical conditioning) was discovered accidentally.
During the 1890s, Russian physiologist Ivan Pavlov was studying salivation in dogs in response to being fed when he noticed that his dogs would begin to salivate whenever he entered the room, even when he was not bringing them food. At first this was something of a nuisance (not to mention messy!).

Pavlovian Conditioning

Pavlov (1902) started from the idea that there are some things that a dog does not need to learn. For example, dogs don't learn to salivate whenever they see food. This reflex is 'hard-wired' into the dog. In behaviorist terms, it is an unconditioned response (i.e. a stimulus-response connection that requires no learning), which we write as:
Unconditioned Stimulus (Food) > Unconditioned Response (Salivate)
Pavlov showed the existence of the unconditioned response by presenting a dog with a bowl of food and then measuring its salivary secretions.
However, when Pavlov discovered that any object or event which the dogs learnt to associate with food (such as the lab assistant) would trigger the same response, he realized that he had made an important scientific discovery. Accordingly, he devoted the rest of his career to studying this type of learning.
Pavlov knew that somehow, the dogs in his lab had learned to associate food with his lab assistant. This must have been learned, because at one point the dogs did not do it, and there came a point where they started, so their behavior had changed. A change in behavior of this type must be the result of learning.
      In behaviorist terms, the lab assistant was originally a neutral stimulus. It is called neutral because it produces no response. What had happened was that the neutral stimulus (the lab assistant) had become associated with an unconditioned stimulus (food).

     In his experiment, Pavlov used a bell as his neutral stimulus. Whenever he gave food to his dogs, he also rang a bell. After a number of repeats of this procedure, he tried the bell on its own. As you might expect, the bell on its own now caused an increase in salivation.
So the dog had learned an association between the bell and the food and a new behavior had been learnt. Because this response was learned (or conditioned), it is called a conditioned response. The neutral stimulus has become a conditioned stimulus.
Pavlov found that for associations to be made, the two stimuli had to be presented close together in time. He called this the law of temporal contiguity. If the time between the conditioned stimulus (bell) and unconditioned stimulus (food) is too great, then learning will not occur.
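The resulting acquisition curve is often described with a later formal model, the Rescorla-Wagner rule, which is not Pavlov's own formulation. The sketch below uses assumed parameters simply to show how repeated bell-food pairings build the conditioned response, with gains shrinking as learning approaches its limit.

```python
# A later formal model (Rescorla-Wagner), not Pavlov's own; the learning
# rate and asymptote below are assumed values for illustration.
LAMBDA = 1.0   # maximum associative strength the food (UCS) can support
ALPHA = 0.3    # assumed salience/learning rate for the bell (CS)

strength = 0.0  # the bell starts as a neutral stimulus: no salivation
for pairing in range(1, 9):
    strength += ALPHA * (LAMBDA - strength)  # gains shrink as CS nears UCS
    print(f"pairing {pairing}: salivation to bell alone = {strength:.2f}")
```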
Pavlov and his studies of classical conditioning have become famous since his early work between 1890 and 1930. Classical conditioning is "classical" in that it was the first systematic study of the basic laws of learning and conditioning.

Summary

To summarize, classical conditioning (later developed by John Watson) involves learning to associate an unconditioned stimulus that already brings about a particular response (i.e. a reflex) with a new (conditioned) stimulus, so that the new stimulus brings about the same response.
Pavlov developed some rather unfriendly technical terms to describe this process. The unconditioned stimulus (or UCS) is the object or event that originally produces the reflexive / natural response.
The response to this is called the unconditioned response (or UCR). The neutral stimulus (NS) is a new stimulus that does not produce a response.
Once the neutral stimulus has become associated with the unconditioned stimulus, it becomes a conditioned stimulus (CS). The conditioned response (CR) is the response to the conditioned stimulus.

References

Pavlov, I. P. (1897/1902). The work of the digestive glands. London: Griffin.
Pavlov, I. P. (1928). Lectures on conditioned reflexes (W. H. Gantt, Trans.). London: Allen and Unwin.
Pavlov, I. P. (1955). Selected works. Moscow: Foreign Languages Publishing House.

SKINNER


By the 1920s John B. Watson had left academic psychology, and other behaviorists were becoming influential, proposing new forms of learning other than classical conditioning. Perhaps the most important of these was Burrhus Frederic Skinner, although, for obvious reasons, he is more commonly known as B. F. Skinner.
Skinner's views were slightly less extreme than those of Watson (1913). Skinner believed that we do have such a thing as a mind, but that it is simply more productive to study observable behavior rather than internal mental events.
The work of Skinner was rooted in a view that classical conditioning was far too simplistic to be a complete explanation of complex human behaviour. He believed that the best way to understand behavior is to look at the causes of an action and its consequences. He called this approach operant conditioning.

Operant Conditioning deals with operants - intentional actions that have an effect on the surrounding environment. Skinner set out to identify the processes which made certain operant behaviours more or less likely to occur.
Skinner's theory of operant conditioning was based on the work of Thorndike (1905). Edward Thorndike studied learning in animals using a puzzle box to propose the theory known as the 'Law of Effect'.

BF Skinner: Operant Conditioning

Skinner is regarded as the father of Operant Conditioning, but his work was based on Thorndike's law of effect. Skinner introduced a new term into the law of effect: reinforcement. Behavior which is reinforced tends to be repeated (i.e. strengthened); behavior which is not reinforced tends to die out or be extinguished (i.e. weakened).

Skinner (1948) studied operant conditioning by conducting experiments using animals which he placed in a 'Skinner Box' which was similar to Thorndike’s puzzle box.
B.F. Skinner (1938) coined the term operant conditioning; it means roughly the changing of behavior by the use of reinforcement given after the desired response. Skinner identified three types of responses, or operants, that can follow behavior.
    Neutral operants: responses from the environment that neither increase nor decrease the probability of a behavior being repeated.
    Reinforcers: Responses from the environment that increase the probability of a behavior being repeated. Reinforcers can be either positive or negative.
    Punishers: Responses from the environment that decrease the likelihood of a behavior being repeated. Punishment weakens behavior.
We can all think of examples of how our own behavior has been affected by reinforcers and punishers. As a child you probably tried out a number of behaviors and learned from their consequences. 
For example, if when you were younger you tried smoking at school, and the chief consequence was that you got in with the crowd you always wanted to hang out with, you would have been positively reinforced (i.e. rewarded) and would be likely to repeat the behavior. If, however, the main consequence was that you were caught, caned, suspended from school, and your parents became involved, you would most certainly have been punished, and you would consequently be much less likely to smoke now. The three consequence types are sketched in code below.
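A minimal sketch of the three consequence types follows. Skinner's account is qualitative, not numeric, so the multipliers here are assumptions chosen only to show the direction of each effect on the probability that a behavior recurs.

```python
# Assumed multipliers: reinforcers raise, punishers lower, and neutral
# operants leave unchanged the probability that the behavior recurs.
def update(prob, consequence):
    if consequence == "reinforcer":   # strengthens the behavior
        return min(1.0, prob * 1.3)
    if consequence == "punisher":     # weakens the behavior
        return max(0.0, prob * 0.7)
    return prob                       # neutral operant: no change

p = 0.5
for outcome in ["reinforcer", "reinforcer", "neutral", "punisher"]:
    p = update(p, outcome)
    print(f"after {outcome}: p(repeat) = {p:.2f}")
```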

Positive Reinforcement

Skinner showed how positive reinforcement worked by placing a hungry rat in his Skinner box. The box contained a lever in the side, and as the rat moved about the box it would accidentally knock the lever. Immediately it did so, a food pellet would drop into a container next to the lever. The rat quickly learned to go straight to the lever after a few times of being put in the box. The consequence of receiving food if it pressed the lever ensured that it would repeat the action again and again.
Positive reinforcement strengthens a behavior by providing a consequence an individual finds rewarding. For example, if your teacher gives you £5 each time you complete your homework (i.e. a reward) you are more likely to repeat this behavior in the future, thus strengthening the behavior of completing your homework.

Negative Reinforcement

The removal of an unpleasant reinforcer can also strengthen behavior. This is known as negative reinforcement because it is the removal of an adverse stimulus which is ‘rewarding’ to the animal or person. Negative reinforcement strengthens behavior because it stops or removes an unpleasant experience.
For example, if you do not complete your homework you give your teacher £5. You will complete your homework to avoid paying £5, thus strengthening the behavior of completing your homework.
Skinner showed how negative reinforcement worked by placing a rat in his Skinner box and then subjecting it to an unpleasant electric current which caused it some discomfort. As the rat moved about the box it would accidentally knock the lever. Immediately it did so, the electric current would be switched off. The rat quickly learned to go straight to the lever after a few times of being put in the box. The consequence of escaping the electric current ensured that it would repeat the action again and again.
In fact, Skinner even taught the rats to avoid the electric current by turning on a light just before the current came on. The rats soon learned to press the lever when the light came on, because they knew that this would stop the electric current being switched on.
These two learned responses are known as Escape Learning and Avoidance Learning.

Punishment (weakens behavior)

Punishment is defined as the opposite of reinforcement, since it is designed to weaken or eliminate a response rather than increase it. It is an aversive event that decreases the behavior that it follows.
Like reinforcement, punishment can work either by directly applying an unpleasant stimulus like a shock after a response or by removing a potentially rewarding stimulus, for instance, deducting someone’s pocket money to punish undesirable behavior.
Note: It is not always easy to distinguish between punishment and negative reinforcement.
There are many problems with using punishment, such as:
  • Punished behavior is not forgotten, it's suppressed - behavior returns when punishment is no longer present.
  • Causes increased aggression - shows that aggression is a way to cope with problems.
  • Creates fear that can generalize to undesirable behaviors, e.g., fear of school.
  • Does not necessarily guide toward desired behavior - reinforcement tells you what to do, punishment only tells you what not to do.

Schedules of Reinforcement

Imagine a rat in a “Skinner box”. In operant conditioning, if no food pellet is delivered immediately after the lever is pressed, then after several attempts the rat stops pressing the lever (how long would someone continue to go to work if their employer stopped paying them?). The behaviour has been extinguished.
Behaviorists discovered that different patterns (or schedules) of reinforcement had different effects on the speed of learning and on extinction. Ferster and Skinner (1957) devised different ways of delivering reinforcement, and found that this had effects on:
    1. The Response Rate - The rate at which the rat pressed the lever (i.e. how hard the rat worked).
    2. The Extinction Rate - The rate at which lever pressing died out (i.e. how soon the rat gave up).
Skinner found that the type of reinforcement which produces the slowest rate of extinction (i.e. people will go on repeating the behaviour for the longest time without reinforcement) is variable-ratio reinforcement. The type of reinforcement which has the quickest rate of extinction is continuous reinforcement.

(A) Continuous Reinforcement

An animal/human is positively reinforced every time a specific behaviour occurs, e.g. every time a lever is pressed a pellet is delivered, and then food delivery is shut off.
  • Response rate is SLOW
  • Extinction rate is FAST

(B) Fixed Ratio Reinforcement

Behavior is reinforced only after it occurs a specified number of times, e.g. one reinforcement is given after every fifth correct response. For example, a child receives a star for every five words spelt correctly.
  • Response rate is FAST
  • Extinction rate is MEDIUM

(C) Fixed Interval Reinforcement

One reinforcement is given after a fixed time interval, providing at least one correct response has been made. An example is being paid by the hour. Another example: every 15 minutes (or half hour, hour, etc.) a pellet is delivered, providing at least one lever press has been made, and then food delivery is shut off.
  • Response rate is MEDIUM
  • Extinction rate is MEDIUM

(D) Variable Ratio Reinforcement

Behavior is reinforced after an unpredictable number of times. Examples include gambling and fishing.
  • Response rate is FAST
  • Extinction rate is SLOW (very hard to extinguish because of unpredictability)

(E) Variable Interval Reinforcement

Providing one correct response has been made, reinforcement is given after an unpredictable amount of time has passed, e.g. on average every 5 minutes. An example is a self-employed person being paid at unpredictable times.
  • Response rate is FAST
  • Extinction rate is SLOW
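
The five schedules above can be summarized as simple yes/no reward rules evaluated at each response. This is only an illustrative sketch: the ratios and intervals are the example values from the list, and details such as resetting the interval timer after each reward are omitted for brevity.

```python
import random

# Illustrative reward rules for the five schedules; counts and intervals
# mirror the examples above, and timer resets are omitted for brevity.
def continuous(press_count, elapsed_s):
    return True                          # (A) every response reinforced

def fixed_ratio(press_count, elapsed_s):
    return press_count % 5 == 0          # (B) every 5th response

def fixed_interval(press_count, elapsed_s):
    return elapsed_s >= 900              # (C) first response after 15 min

def variable_ratio(press_count, elapsed_s):
    return random.random() < 1 / 5       # (D) about 1 in 5, unpredictably

def variable_interval(press_count, elapsed_s):
    return random.random() < 1 / 300     # (E) about once per 5 min on average

# e.g. which of the first 12 presses a fixed-ratio schedule would reward:
print([n for n in range(1, 13) if fixed_ratio(n, 0)])   # [5, 10]
```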

Behavior Shaping

A further important contribution made by Skinner (1951) is the notion of behaviour shaping through successive approximation. Skinner argued that the principles of operant conditioning can be used to produce extremely complex behaviour if rewards and punishments are delivered in such a way as to move an organism closer and closer to the desired behaviour each time.
In order to do this, the conditions (or contingencies) required to receive the reward should shift each time the organism moves a step closer to the desired behaviour.
According to Skinner, most animal and human behaviour (including language) can be explained as a product of this type of successive approximation.
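Shaping can be sketched as a loop in which the reward criterion shifts towards the target after each reinforced approximation. The scale, step sizes and response variability below are assumed values, chosen only to make the successive-approximation idea concrete.

```python
import random

# Shaping by successive approximation (assumed scale and step sizes):
# reward anything past the criterion, then shift the criterion closer
# to the target behavior.
target = 10.0     # the desired behavior, on an arbitrary scale
criterion = 1.0   # initially, even rough approximations are rewarded
behavior = 0.0    # the organism's current typical behavior
rewards = 0

while behavior < target:
    attempt = behavior + random.uniform(-1, 2)   # responding varies
    if attempt >= criterion:                     # close enough: reinforce
        behavior = attempt
        rewards += 1
        criterion = min(target, behavior + 0.5)  # shift the contingency
print(f"target reached after {rewards} reinforcements")
```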

Behavior Modification

Behavior modification is a set of therapies / techniques based on operant conditioning (Skinner, 1938, 1953). The main principle comprises changing environmental events that are related to a person's behavior. For example, the reinforcement of desired behaviors and ignoring or punishing undesired ones.
This is not as simple as it sounds — always reinforcing desired behavior, for example, is basically bribery.
There are different types of positive reinforcement. Primary reinforcement is when a reward strengthens a behavior by itself. Secondary reinforcement is when something strengthens a behavior because it leads to a primary reinforcer.
Examples of behavior modification therapy include token economy and behavior shaping.

Token Economy

Token economy is a system in which targeted behaviors are reinforced with tokens (secondary reinforcers) and are later exchanged for rewards (primary reinforcers).
Tokens can be in the form of fake money, buttons, poker chips, stickers, etc., while rewards can range anywhere from snacks to privileges or activities.
Token economy has been found to be very effective in managing psychiatric patients. However, patients can become over-reliant on the tokens, making it difficult for them to adjust to society once they leave prison, hospital, etc.
Teachers also use token economy at primary school by giving young children stickers to reward good behavior.
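The two-tier structure of a token economy is easy to see in code. The behaviors, token values and reward prices below are invented for illustration; real programmes set these contingencies clinically.

```python
# A token economy in miniature: target behaviors earn tokens (secondary
# reinforcers), later exchanged for rewards (primary reinforcers).
# Behaviors, token values and prices are invented for illustration.
TOKEN_VALUES = {"completed homework": 2, "helped a classmate": 1}
REWARD_PRICES = {"snack": 3, "extra playtime": 5}

tokens = 0
observed = ["completed homework", "helped a classmate", "completed homework"]
for behavior in observed:
    tokens += TOKEN_VALUES[behavior]      # reinforce with tokens

print(f"earned {tokens} tokens")
for reward, price in REWARD_PRICES.items():
    if tokens >= price:                   # exchange for a primary reinforcer
        tokens -= price
        print(f"exchanged for {reward}; {tokens} tokens left")
```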

Operant Conditioning in the Classroom

In the conventional learning situation operant conditioning applies largely to issues of class and student management, rather than to learning content. It is very relevant to shaping skill performance.
A simple way to shape behavior is to provide feedback on learner performance, e.g. compliments, approval, encouragement, and affirmation. A variable-ratio schedule produces the highest response rate for students learning a new task: initially reinforcement (e.g. praise) occurs at frequent intervals, and as the performance improves, reinforcement occurs less frequently, until eventually only exceptional outcomes are reinforced.
For example, if a teacher wanted to encourage students to answer questions in class they should praise them for every attempt (regardless of whether their answer is correct). Gradually the teacher will only praise the students when their answer is correct, and over time only exceptional answers will be praised.
Unwanted behaviors, such as tardiness and dominating class discussion can be extinguished through being ignored by the teacher (rather than being reinforced by having attention drawn to them).
Knowledge of success is also important, as it motivates future learning. However, it is important to vary the type of reinforcement given so that the behavior is maintained. This is not an easy task, as the teacher may appear insincere if he/she thinks too much about the way to behave.

Operant Conditioning Summary

Looking at Skinner's classic studies of pigeons' and rats' behavior, we can identify some of the major assumptions of the behaviorist approach.
    • Psychology should be seen as a science, to be studied in a scientific manner. Skinner's study of behavior in rats was conducted under carefully controlled laboratory conditions.
    • Behaviorism is primarily concerned with observable behavior, as opposed to internal events like thinking and emotion. Note that Skinner did not say that the rats learned to press a lever because they wanted food. He instead concentrated on describing the easily observed behavior that the rats acquired.
    • The major influence on human behavior is learning from our environment. In the Skinner study, because food followed a particular behavior, the rats learned to repeat that behavior, i.e. operant conditioning.
    • There is little difference between the learning that takes place in humans and that in other animals. Therefore research (e.g. operant conditioning) can be carried out on animals (Rats / Pigeons) as well as on humans. Skinner proposed that the way humans learn behavior is much the same as the way the rats learned to press a lever.
So, if your layperson's idea of psychology has always been of people in laboratories wearing white coats and watching hapless rats try to negotiate mazes in order to get to their dinner, then you are probably thinking of behavioral psychology.
Behaviorism and its offshoots tend to be among the most scientific of the psychological perspectives. The emphasis of behavioral psychology is on how we learn to behave in certain ways. We are all constantly learning new behaviors and how to modify our existing behavior. Behavioral psychology is the psychological approach that focuses on how this learning takes place.