The Technion Prediction Tournament
Organized by: Ido Erev, Eyal Ert, and Alvin E. Roth
7.3 Baseline models - Repeated decisions from experience (condition E-repeated)

7.3.1 Normalized Reinforcement Learning (NRL)

A SAS example of the NRL model is available for download.

The normalized reinforcement learning model (see Erev & Barron, 2005, and a similar model in Erev, Bereby-Meyer & Roth, 1999) assumes a stochastic choice rule that is similar to the SCPT rule. Specifically, the probability of selecting the risky prospect at trial t is given by:

$$P_t(\text{Risk}) = \frac{e^{WV_t(\text{Risk})\,\mu/D_t}}{e^{WV_t(\text{Risk})\,\mu/D_t} + e^{WV_t(\text{Safe})\,\mu/D_t}}$$
where WV_t(k) is the weighted value of action k at trial t, μ is a free payoff sensitivity parameter, and D_t is a measure of experienced payoff variability. If strategy k was selected at trial t, its weighted value at trial t+1 is a weighted average of WV_t(k) and v_t, the payoff obtained at t:

$$WV_{t+1}(k) = (1-\omega)\,WV_t(k) + \omega\,v_t$$
The parameter 0 < ω < 1 captures the weight given to recent outcomes. The initial value, WV_1(k), is assumed to equal A(1), the expected payoff from random choice (e.g., A(1) in Problem 1 is .5[(18.8(.8) + 7.6(.2)) + 15.5] = 16.03). The payoff variability term D_t is a weighted average of the absolute difference between the payoffs obtained at trials t and t-1:

$$D_{t+1} = (1-\omega)\,D_t + \omega\,\lvert v_t - v_{t-1}\rvert$$
where v_0 is assumed to equal A(1), and D_1 is assumed to equal l.

The predictions of this model, with the parameters estimated by Erev and Barron and with the parameters that best fit the data, are presented in Table 4. The results reveal a relatively large advantage of the estimated parameters.
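The NRL description above can be sketched in a few lines of Python. This is a minimal illustration, not a port of the authors' SAS example. It assumes a logit choice rule with exponent WV·μ/D, the same weight ω in both updates, and an absolute payoff difference in the variability term. The function names, the `d1` argument (the initial value D_1, supplied by the caller and assumed positive), and all parameter values are hypothetical.

```python
import math
import random


def expected_random_choice_payoff(high, p_high, low, safe):
    """A(1): expected payoff when Risk and Safe are each chosen with probability .5."""
    ev_risk = high * p_high + low * (1.0 - p_high)
    return 0.5 * (ev_risk + safe)


def p_risk(wv_risk, wv_safe, mu, d):
    """Assumed logit rule: e^(WV_risk*mu/d) / (e^(WV_risk*mu/d) + e^(WV_safe*mu/d))."""
    z = (wv_safe - wv_risk) * mu / d
    z = max(-700.0, min(700.0, z))  # clamp so math.exp cannot overflow
    return 1.0 / (1.0 + math.exp(z))  # algebraically the same ratio as above


def simulate_nrl(high, p_high, low, safe, mu, omega, d1, n_trials, rng):
    """Simulate one agent; returns the list of choices ('risk' or 'safe')."""
    a1 = expected_random_choice_payoff(high, p_high, low, safe)
    wv = {"risk": a1, "safe": a1}  # WV_1(k) = A(1) for both actions
    d = d1                         # D_1, assumed positive (caller-supplied)
    v_prev = a1                    # v_0 = A(1)
    choices = []
    for _ in range(n_trials):
        pr = p_risk(wv["risk"], wv["safe"], mu, d)
        choice = "risk" if rng.random() < pr else "safe"
        if choice == "risk":
            v = high if rng.random() < p_high else low
        else:
            v = safe
        # Only the selected action's weighted value is updated.
        wv[choice] = (1.0 - omega) * wv[choice] + omega * v
        # Payoff variability: weighted average of absolute payoff changes.
        d = (1.0 - omega) * d + omega * abs(v - v_prev)
        v_prev = v
        choices.append(choice)
    return choices


if __name__ == "__main__":
    # Illustrative parameter values only; not the estimates reported in Table 4.
    run = simulate_nrl(18.8, 0.8, 7.6, 15.5, mu=1.0, omega=0.2, d1=1.0,
                       n_trials=100, rng=random.Random(1))
    print(sum(c == "risk" for c in run) / len(run))
```

Averaging the proportion of risky choices over many simulated agents gives a prediction comparable in form to the Prisk values the text refers to. Setting the variability term to a constant 1 instead of updating it corresponds to the simplification the basic reinforcement learning model makes.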
7.3.2 Basic Reinforcement Learning

The basic reinforcement learning model considered here is a simplification of the NRL model. The simplification involves the assumption D_t = 1. Table 4 reveals that this simplification impairs the fit.

7.3.3 RELACS

Erev and Barron (2005) present a generalization of the NRL model that assumes reinforcement learning among cognitive strategies. Since this model is rather complex, we do not present it here. Table 4 presents its predictions for the current task with the parameters estimated by Erev and Barron, and with the parameters that best fit the current data. The results reveal that RELACS provides relatively good predictions with the original parameters, but that its best fit is not very good.
7.3.4 Explorative sampler

The explorative sampler model (Erev, Ert & Yechiam, 2008) can be summarized with the following assumptions:

A1: Exploration and exploitation. The agents are assumed to consider two cognitive strategies: exploration and exploitation. Exploration implies a random choice. The probability of exploration is 1 in the very first trial, and (when information concerning the forgone payoffs is not available) it decreases toward an asymptote (at ε) with experience. The effect of experience on the probability of exploration depends on the expected length of the experiment (T). Exploration diminishes quickly when T is small, and slowly when T is large. This assumption is quantified as follows:

$$P(\text{Explore}_t) = \varepsilon^{\frac{t-1}{t+T^{d}}}$$
where d is a free parameter that captures the sensitivity to the length of the experiment.

A2: Experiences. The experiences with each alternative include the set of observed outcomes yielded by this alternative in previous trials. In addition, when the feedback is limited to the obtained payoff, the subjective value of the very first outcome is recalled as an experience with all the alternatives.

A3: Naïve sampling. Under exploitation the agent draws (with replacement) a sample of m_t past experiences with each alternative. All previous experiences are equally likely to be sampled.

A4: Sampling algorithm. The value of m_t at trial t is assumed to be randomly selected from the set {1, 2, …, k}, where k is a free parameter. The sampling algorithm is assumed to depend on the available information. When the feedback is limited to the obtained payoffs, the samples from the experiences with the different alternatives are independent. When the forgone payoffs are known (the decision makers receive complete feedback that includes the payoff from the unselected alternatives), the distinct samples are perfectly correlated: the decision maker selects one set of m_t trials, and the outcomes in those trials are used to determine the values of the different alternatives.

A5: Regressiveness, diminishing sensitivity, and choice. The recalled subjective value of the outcome x (from selecting alternative j) at trial t is assumed to be affected by two factors: regression to the mean of all the experiences with the relevant alternative (in the first t-1 trials), and diminishing sensitivity. Regression is captured with the assumption that the regressed value is Rx = (1-w)x + w A_j(t), where w is a free parameter and A_j(t) is the average outcome from the relevant alternative.[1] Diminishing sensitivity is captured with a variant of prospect theory's (Kahneman and Tversky, 1979) value function that assumes

$$sv_t = \begin{cases} (Rx)^{\alpha_t} & \text{if } Rx \ge 0 \\ -(-Rx)^{\alpha_t} & \text{if } Rx < 0 \end{cases}$$
where α_t = (1 + V_t)^(-β), β > 0 is a free parameter, and V_t is a measure of payoff variability. V_t is computed as the average absolute difference between consecutive obtained payoffs in the first t-1 trials (with an initial value of 0). The parameter β captures the effect of diminishing sensitivity: a large β implies that sensitivity diminishes quickly as payoff variability increases.

The estimated subjective value of each alternative at trial t is the mean of the subjective values of the alternative's sample in that trial. Under exploitation the agent selects the alternative with the highest estimated value.

Table 4 presents the predictions of the explorative sampler model with the parameters estimated by Erev et al. (2008), and with the parameters that best fit the current data. The results reveal that the model tends to over-predict the tendency to select the risky prospect.
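Assumptions A1 to A5 can be put together in a short Python sketch for the case where feedback is limited to the obtained payoff, so the samples are independent. It is an illustration under stated assumptions, not the authors' implementation. The exploration probability is taken to be ε^((t-1)/(t+T^d)), which equals 1 at t = 1 and approaches ε as t grows. The value function is taken to be a sign-preserving power function of the regressed value. All function and argument names are hypothetical, and the caller is assumed to have seeded every alternative with at least one experience (see A2).

```python
import random


def p_explore(t, T, eps, d):
    """Assumed form of A1: equals 1 at t = 1 and tends to eps as t grows."""
    return eps ** ((t - 1) / (t + T ** d))


def payoff_variability(obtained):
    """V_t: mean absolute difference between consecutive obtained payoffs (0 if none)."""
    diffs = [abs(b - a) for a, b in zip(obtained, obtained[1:])]
    return sum(diffs) / len(diffs) if diffs else 0.0


def subjective_value(x, mean_j, w, alpha):
    """A5: regress x toward the alternative's mean, then apply diminishing sensitivity."""
    rx = (1.0 - w) * x + w * mean_j
    return rx ** alpha if rx >= 0 else -((-rx) ** alpha)


def estimated_value(experiences, m, w, alpha, rng):
    """A3: mean subjective value of m experiences drawn uniformly with replacement."""
    mean_j = sum(experiences) / len(experiences)
    sample = [rng.choice(experiences) for _ in range(m)]
    return sum(subjective_value(x, mean_j, w, alpha) for x in sample) / m


def choose(t, T, experiences, obtained, eps, d, k, w, beta, rng):
    """Return the chosen alternative at trial t.

    experiences: dict mapping each alternative to a non-empty list of outcomes
    obtained:    list of the agent's own obtained payoffs so far (used for V_t)
    """
    alternatives = list(experiences)
    if rng.random() < p_explore(t, T, eps, d):
        return rng.choice(alternatives)          # exploration: random choice
    m = rng.randint(1, k)                        # A4: m_t drawn from {1, ..., k}
    alpha = (1.0 + payoff_variability(obtained)) ** (-beta)
    values = {j: estimated_value(experiences[j], m, w, alpha, rng)
              for j in alternatives}
    return max(alternatives, key=values.get)     # exploitation: highest estimate
```

The complete-feedback case described in A4 would instead draw one set of m trial indices and reuse it for every alternative. That variant is omitted here to keep the sketch short.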
7.3.5 Explorative sampler with recency

A SAS example of the explorative sampler with recency model is available for download.

The last model presented here is a refinement of the explorative sampler model that was developed to capture the bias considered above. Specifically, the refined model assumes that the most recent outcome with each alternative is always considered. This change is implemented by replacing assumption A3 with the following assumption:

A3': Naïve sampling with recency. Under exploitation the agent draws (with replacement) a sample of m_t past experiences with each alternative. The first draw is the most recent experience with each alternative. All previous experiences are equally likely to be sampled in the remaining m_t - 1 draws.

Notice that Assumption A3' implies a hot stove effect (see Denrell & March, 2001): an increase in risk aversion with experience. The right-hand column in Table 4 presents the predictions of the refined model. The results show that the refinement improves the fit. Additional analysis shows that the added recency effect does not impair the predictions of the explorative sampler models in the experimental conditions reviewed by Erev & Haruvy, 2007.

Table 4: Condition E-repeated: The upper panel presents the aggregated proportion of choices in Risk (Prisk) and the predictions of the baseline models. The lower panel presents summary statistics.
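The sampling rule in Assumption A3' differs from A3 only in how the first draw is made, which a small Python sketch can make concrete. The function name is hypothetical, and the sketch assumes that the most recent experience stays in the pool for the remaining draws. The text says all previous experiences are equally likely, which suggests this but does not state it outright.

```python
import random


def sample_with_recency(experiences, m, rng):
    """A3' sketch: draw m experiences for one alternative.

    experiences: non-empty list of outcomes, ordered oldest to newest
    m:           sample size m_t (at least 1)
    """
    if not experiences or m < 1:
        raise ValueError("need at least one experience and m >= 1")
    sample = [experiences[-1]]                     # first draw: most recent outcome
    # Remaining m - 1 draws: uniform over all experiences, with replacement.
    sample.extend(rng.choice(experiences) for _ in range(m - 1))
    return sample
```

With m = 1 the agent relies only on the latest outcome of each alternative. A single bad outcome from the risky prospect can then keep the agent away from it, which is the hot stove pattern noted above.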
[1] Implicit in this regressiveness (the assumption w > 0) is the assumption that all the experiences are weighted (because all the experiences affect the mean). The value of this implicit assumption was demonstrated by Lebiere, Gonzalez and Martin (2006): it is necessary to capture the observed behavior in Problem 24.