<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD with MathML3 v1.2 20190208//EN" "JATS-archivearticle1-mathml3.dtd"> <article xmlns:ali="http://www.niso.org/schemas/ali/1.0/" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article" dtd-version="1.2"><front><journal-meta><journal-id journal-id-type="nlm-ta">elife</journal-id><journal-id journal-id-type="publisher-id">eLife</journal-id><journal-title-group><journal-title>eLife</journal-title></journal-title-group><issn pub-type="epub" publication-format="electronic">2050-084X</issn><publisher><publisher-name>eLife Sciences Publications, Ltd</publisher-name></publisher></journal-meta><article-meta><article-id pub-id-type="publisher-id">75308</article-id><article-id pub-id-type="doi">10.7554/eLife.75308</article-id><article-categories><subj-group subj-group-type="display-channel"><subject>Research Article</subject></subj-group><subj-group subj-group-type="heading"><subject>Computational and Systems Biology</subject></subj-group><subj-group subj-group-type="heading"><subject>Physics of Living Systems</subject></subj-group></article-categories><title-group><article-title>Conformist social learning leads to self-organised prevention against adverse bias in risky decision making</article-title></title-group><contrib-group><contrib contrib-type="author" corresp="yes" id="author-236566"><name><surname>Toyokawa</surname><given-names>Wataru</given-names></name><contrib-id authenticated="true" contrib-id-type="orcid">https://orcid.org/0000-0001-8558-8568</contrib-id><email>wataru.toyokawa@uni-konstanz.de</email><xref ref-type="aff" rid="aff1">1</xref><xref ref-type="other" rid="fund1"/><xref ref-type="fn" rid="con1"/><xref ref-type="fn" rid="conf1"/></contrib><contrib contrib-type="author" id="author-270982"><name><surname>Gaissmaier</surname><given-names>Wolfgang</given-names></name><xref ref-type="aff" 
rid="aff1">1</xref><xref ref-type="aff" rid="aff2">2</xref><xref ref-type="other" rid="fund1"/><xref ref-type="fn" rid="con2"/><xref ref-type="fn" rid="conf1"/></contrib><aff id="aff1"><label>1</label><institution-wrap><institution-id institution-id-type="ror">https://ror.org/0546hnb39</institution-id><institution>Department of Psychology, University of Konstanz</institution></institution-wrap><addr-line><named-content content-type="city">Konstanz</named-content></addr-line><country>Germany</country></aff><aff id="aff2"><label>2</label><institution-wrap><institution-id institution-id-type="ror">https://ror.org/0546hnb39</institution-id><institution>Centre for the Advanced Study of Collective Behaviour, University of Konstanz,</institution></institution-wrap><addr-line><named-content content-type="city">Konstanz</named-content></addr-line><country>Germany</country></aff></contrib-group><contrib-group content-type="section"><contrib contrib-type="editor"><name><surname>Liljeholm</surname><given-names>Mimi</given-names></name><role>Reviewing Editor</role><aff><institution-wrap><institution-id institution-id-type="ror">https://ror.org/04gyf1771</institution-id><institution>University of California, Irvine</institution></institution-wrap><country>United States</country></aff></contrib><contrib contrib-type="senior_editor"><name><surname>Frank</surname><given-names>Michael J</given-names></name><role>Senior Editor</role><aff><institution-wrap><institution-id institution-id-type="ror">https://ror.org/05gq02987</institution-id><institution>Brown University</institution></institution-wrap><country>United States</country></aff></contrib></contrib-group><pub-date date-type="publication" publication-format="electronic"><day>10</day><month>05</month><year>2022</year></pub-date><pub-date pub-type="collection"><year>2022</year></pub-date><volume>11</volume><elocation-id>e75308</elocation-id><history><date date-type="received" 
iso-8601-date="2021-11-05"><day>05</day><month>11</month><year>2021</year></date><date date-type="accepted" iso-8601-date="2022-04-01"><day>01</day><month>04</month><year>2022</year></date></history><pub-history><event><event-desc>This manuscript was published as a preprint at bioRxiv.</event-desc><date date-type="preprint" iso-8601-date="2021-02-23"><day>23</day><month>02</month><year>2021</year></date><self-uri content-type="preprint" xlink:href="https://doi.org/10.1101/2021.02.22.432286"/></event></pub-history><permissions><copyright-statement>© 2022, Toyokawa and Gaissmaier</copyright-statement><copyright-year>2022</copyright-year><copyright-holder>Toyokawa and Gaissmaier</copyright-holder><ali:free_to_read/><license xlink:href="http://creativecommons.org/licenses/by/4.0/"><ali:license_ref>http://creativecommons.org/licenses/by/4.0/</ali:license_ref><license-p>This article is distributed under the terms of the <ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution License</ext-link>, which permits unrestricted use and redistribution provided that the original author and source are credited.</license-p></license></permissions><self-uri content-type="pdf" xlink:href="elife-75308-v1.pdf"/><self-uri content-type="figures-pdf" xlink:href="elife-75308-figures-v1.pdf"/><abstract><p>Given the ubiquity of potentially adverse behavioural bias owing to myopic trial-and-error learning, it seems paradoxical that improvements in decision-making performance through conformist social learning, a process widely considered to be bias amplification, still prevail in animal collective behaviour. Here we show, through model analyses and large-scale interactive behavioural experiments with 585 human subjects, that conformist influence can indeed promote favourable risk taking in repeated experience-based decision making, even though many individuals are systematically biased towards adverse risk aversion. 
Although strong positive feedback conferred by copying the majority’s behaviour could result in unfavourable informational cascades, our differential equation model of collective behavioural dynamics identified a key role for increasing exploration by negative feedback arising when a weak minority influence undermines the inherent behavioural bias. This ‘collective behavioural rescue’, emerging through coordination of positive and negative feedback, highlights a benefit of collective learning in a broader range of environmental conditions than previously assumed and resolves the ostensible paradox of adaptive collective behavioural flexibility under conformist influences.</p></abstract><abstract abstract-type="executive-summary"><title>eLife digest</title><p>When it comes to making decisions, like choosing a restaurant or political candidate, most of us rely on limited information that is not accurate enough to find the best option. Considering others’ decisions and opinions can help us make smarter choices, a phenomenon called “collective intelligence”.</p><p>Collective intelligence relies on individuals making unbiased decisions. If individuals are biased toward making poor choices over better ones, copying the group’s behavior may exaggerate biases. Humans are persistently biased. To avoid repeated failure, humans tend to avoid risky behavior. Instead, they often choose safer alternatives even when there might be a greater long-term benefit to risk-taking. This may hamper collective intelligence.</p><p>Toyokawa and Gaissmaier show that learning from others helps humans make better decisions even when most people are biased toward risk aversion. The experiments first used computer modeling to assess the effect of individual bias on collective intelligence. 
Then, Toyokawa and Gaissmaier conducted an online investigation in which 185 people performed a task that involved choosing a safer or riskier alternative, and 400 people completed the same task in groups of 2 to 8. The online experiment showed that participating in a group changed the learning dynamics to make information sampling less biased over time. This mitigated people’s tendency to be risk-averse when risk-taking is beneficial.</p><p>The model and experiments help explain why humans have evolved to learn through social interactions. Social learning and the tendency of humans to conform to the group’s behavior mitigate individual risk aversion. Studies of the effect of bias on individual decision-making in other circumstances are needed. For example, would the same finding hold in the context of social media, which allows individuals to share unprecedented amounts of sometimes incorrect information?</p></abstract><kwd-group kwd-group-type="author-keywords"><kwd>social learning</kwd><kwd>conformity</kwd><kwd>reinforcement learning</kwd><kwd>hot stove effect</kwd><kwd>risky decision making</kwd><kwd>collective behaviour</kwd></kwd-group><kwd-group kwd-group-type="research-organism"><title>Research organism</title><kwd>Human</kwd></kwd-group><funding-group><award-group id="fund1"><funding-source><institution-wrap><institution-id institution-id-type="FundRef">http://dx.doi.org/10.13039/501100001659</institution-id><institution>Deutsche Forschungsgemeinschaft</institution></institution-wrap></funding-source><award-id>EXC 2117 - 422037984</award-id><principal-award-recipient><name><surname>Toyokawa</surname><given-names>Wataru</given-names></name><name><surname>Gaissmaier</surname><given-names>Wolfgang</given-names></name></principal-award-recipient></award-group><funding-statement>The funders had no role in study design, data collection and interpretation, or the decision to submit the work for 
publication.</funding-statement></funding-group><custom-meta-group><custom-meta specific-use="meta-only"><meta-name>Author impact statement</meta-name><meta-value>Mathematical modelling and large-scale online experiments revealed that learning from others can induce 'smarter' decisions even when most individuals are biased towards adverse risk aversion.</meta-value></custom-meta></custom-meta-group></article-meta></front><body><sec id="s1" sec-type="intro"><title>Introduction</title><p>Collective intelligence, a self-organised improvement of decision making among socially interacting individuals, has been considered one of the key evolutionary advantages of group living (<xref ref-type="bibr" rid="bib33">Harrison et al., 2001</xref>; <xref ref-type="bibr" rid="bib41">Krause and Ruxton, 2002</xref>; <xref ref-type="bibr" rid="bib67">Sumpter, 2005</xref>; <xref ref-type="bibr" rid="bib74">Ward and Zahavi, 1973</xref>). Although what information each individual can access may be a subject of uncertainty, information transfer through the adaptive use of social cues filters such ‘noises’ out (<xref ref-type="bibr" rid="bib42">Laland, 2004</xref>; <xref ref-type="bibr" rid="bib60">Rendell et al., 2010</xref>), making individual behaviour on average more accurate (<xref ref-type="bibr" rid="bib34">Hastie and Kameda, 2005</xref>; <xref ref-type="bibr" rid="bib40">King and Cowlishaw, 2007</xref>; <xref ref-type="bibr" rid="bib64">Simons, 2004</xref>). 
Evolutionary models (<xref ref-type="bibr" rid="bib14">Boyd and Richerson, 1985</xref>; <xref ref-type="bibr" rid="bib38">Kandler and Laland, 2013</xref>; <xref ref-type="bibr" rid="bib39">Kendal et al., 2005</xref>) and empirical evidence (<xref ref-type="bibr" rid="bib71">Toyokawa et al., 2014</xref>; <xref ref-type="bibr" rid="bib73">Toyokawa et al., 2019</xref>) have both shown that the benefit brought by the balanced use of both socially and individually acquired information is usually larger than the cost of possibly creating an alignment of suboptimal behaviour among individuals by herding (<xref ref-type="bibr" rid="bib11">Bikhchandani et al., 1992</xref>; <xref ref-type="bibr" rid="bib29">Giraldeau et al., 2002</xref>; <xref ref-type="bibr" rid="bib57">Raafat et al., 2009</xref>). This prediction holds as long as individual trial-and-error learning leads to higher accuracy than merely random decision making (<xref ref-type="bibr" rid="bib26">Efferson et al., 2008</xref>). Copying a common behaviour exhibited by many others is adaptive if the output of these individuals is expected to be better than uninformed decisions.</p><p>However, both humans and non-human animals suffer not only from environmental noise but also commonly from systematic biases in their decision making (e.g. <xref ref-type="bibr" rid="bib32">Harding et al., 2004</xref>; <xref ref-type="bibr" rid="bib35">Hertwig and Erev, 2009</xref>; <xref ref-type="bibr" rid="bib58">Real, 1981</xref>; <xref ref-type="bibr" rid="bib59">Real et al., 1982</xref>). Under such circumstances, simply aggregating individual inputs does not guarantee collective intelligence because a majority of the group may be biased towards suboptimization. 
A prominent example of such a potentially suboptimal bias is risk aversion that emerges through trial-and-error learning with adaptive information-sampling behaviour (<xref ref-type="bibr" rid="bib21">Denrell, 2007</xref>; <xref ref-type="bibr" rid="bib46">March, 1996</xref>). Because it is a robust consequence of decision making based on learning (<xref ref-type="bibr" rid="bib35">Hertwig and Erev, 2009</xref>; <xref ref-type="bibr" rid="bib79">Yechiam et al., 2006</xref>; <xref ref-type="bibr" rid="bib77">Weber, 2006</xref>; <xref ref-type="bibr" rid="bib46">March, 1996</xref>), risk aversion can be a major constraint of animal behaviour, especially when taking a high-risk high-return behavioural option is favourable in the long run. Therefore, the ostensible prerequisite of collective intelligence, that is, that individuals should be unbiased and more accurate than mere chance, may not always hold. A theory that incorporates dynamics of trial-and-error learning and the learnt risk aversion into social learning is needed to understand the conditions under which collective intelligence operates in risky decision making.</p><p>Given that behavioural biases are omnipresent and learning animals rarely escape from them, it may seem that social learning, especially the ‘copy-the-majority’ behaviour (aka, ‘conformist social learning’ or ‘positive frequency-based copying’; <xref ref-type="bibr" rid="bib42">Laland, 2004</xref>), whereby the most common behaviour in a group is disproportionately more likely to be copied (<xref ref-type="bibr" rid="bib14">Boyd and Richerson, 1985</xref>), may often lead to maladaptive herding, because recursive social interactions amplify the common bias (i.e. 
a positive feedback loop; <xref ref-type="bibr" rid="bib22">Denrell and Le Mens, 2007</xref>; <xref ref-type="bibr" rid="bib23">Denrell and Le Mens, 2017</xref>; <xref ref-type="bibr" rid="bib25">Dussutour et al., 2005</xref>; <xref ref-type="bibr" rid="bib57">Raafat et al., 2009</xref>). Previous studies in humans have indeed suggested that individual decision-making biases are transmitted through social influences (<xref ref-type="bibr" rid="bib15">Chung et al., 2015</xref>; <xref ref-type="bibr" rid="bib8">Bault et al., 2011</xref>; <xref ref-type="bibr" rid="bib69">Suzuki et al., 2016</xref>; <xref ref-type="bibr" rid="bib63">Shupp and Williams, 2008</xref>; <xref ref-type="bibr" rid="bib37">Jouini et al., 2011</xref>; <xref ref-type="bibr" rid="bib51">Moussaïd et al., 2015</xref>). Nevertheless, the collective improvement of decision accuracy through simple copying processes has been widely observed across different taxa (<xref ref-type="bibr" rid="bib61">Sasaki and Biro, 2017</xref>; <xref ref-type="bibr" rid="bib62">Seeley et al., 1991</xref>; <xref ref-type="bibr" rid="bib1">Alem et al., 2016</xref>; <xref ref-type="bibr" rid="bib67">Sumpter, 2005</xref>; <xref ref-type="bibr" rid="bib33">Harrison et al., 2001</xref>), including the very species known to exhibit learnt risk-taking biases, such as bumblebees (<xref ref-type="bibr" rid="bib58">Real, 1981</xref>; <xref ref-type="bibr" rid="bib59">Real et al., 1982</xref>), honeybees (<xref ref-type="bibr" rid="bib24">Drezner-Levy and Shafir, 2007</xref>), and pigeons (<xref ref-type="bibr" rid="bib44">Ludvig et al., 2014</xref>). Such observations may indicate, counter-intuitively, that social learning may not necessarily trap animal groups in suboptimization even when most of the individuals are suboptimally biased.</p><p>In this paper, we propose a parsimonious computational mechanism that accounts for the emerging improvement of decision accuracy among suboptimally risk-aversive individuals. 
In our agent-based model, we allow our hypothetical agents to compromise between individual trial-and-error learning and the frequency-based copying process, that is, a balanced reliance on social learning that has been repeatedly supported in previous empirical studies (e.g. <xref ref-type="bibr" rid="bib19">Deffner et al., 2020</xref>; <xref ref-type="bibr" rid="bib47">McElreath et al., 2005</xref>; <xref ref-type="bibr" rid="bib48">McElreath et al., 2008</xref>; <xref ref-type="bibr" rid="bib72">Toyokawa et al., 2017</xref>; <xref ref-type="bibr" rid="bib73">Toyokawa et al., 2019</xref>). This is a natural extension of some previous models that assumed that individual decision making was regulated fully by others’ beliefs (<xref ref-type="bibr" rid="bib22">Denrell and Le Mens, 2007</xref>; <xref ref-type="bibr" rid="bib23">Denrell and Le Mens, 2017</xref>). Under such extremely strong social influence, individual bias was always exaggerated because information sampling was always directed towards the most popular alternative, often resulting in a mismatch between the true environmental state and what individuals believed (‘collective illusion’; <xref ref-type="bibr" rid="bib23">Denrell and Le Mens, 2017</xref>). By allowing a mixture of social and asocial learning processes within a single individual, the emergent collective behaviour is able to remain flexible (<xref ref-type="bibr" rid="bib3">Aplin et al., 2017</xref>; <xref ref-type="bibr" rid="bib73">Toyokawa et al., 2019</xref>), which may allow groups to escape from the suboptimal behavioural state.</p><p>We focused on a repeated decision-making situation where individuals updated their beliefs about the value of behavioural alternatives through their own action–reward experiences (experience-based task). Experience-based decision making is widespread in animals that learn in a range of contexts (<xref ref-type="bibr" rid="bib35">Hertwig and Erev, 2009</xref>). 
The time-depth interaction between belief updating and decision making may create a non-linear relationship between social learning and individual behavioural biases (<xref ref-type="bibr" rid="bib12">Biro et al., 2016</xref>), which we hypothesised is key in improving decision accuracy in self-organised collective systems (<xref ref-type="bibr" rid="bib33">Harrison et al., 2001</xref>; <xref ref-type="bibr" rid="bib67">Sumpter, 2005</xref>).</p><p>In the study reported here, we firstly examined whether a simple form of conformist social influence can improve collective decision performance in a simple multi-armed bandit task using an agent-based model simulation. We found that promotion of favourable risk taking can indeed emerge across different assumptions and parameter spaces, including individual heterogeneity within a group. This phenomenon occurs thanks, apparently, to the non-linear effect of social interactions, namely, <italic>collective behavioural rescue</italic>. To disentangle the core dynamics behind this ostensibly self-organised process, we then analysed a differential equation model representing approximate population dynamics. Combining these two theoretical approaches, we identified that it is a combination of positive and negative feedback loops that underlies collective behavioural rescue, and that the key mechanism is a promotion of information sampling by modest conformist social influence.</p><p>Finally, to investigate whether the assumptions and predictions of the model hold in reality, we conducted a series of online behavioural experiments with human participants. The experimental task was basically a replication of the task used in the agent-based model described above, although the parameters of the bandit tasks were modified to explore wider task spaces beyond the simplest two-armed task. 
Experimental results show that the human collective behavioural pattern was consistent with the theoretical prediction, and model selection and parameter estimation suggest that our model assumptions fit well with our experimental data. In sum, we provide a general account of the robustness of collective intelligence even under systematic risk aversion and highlight a previously overlooked benefit of conformist social influence.</p></sec><sec id="s2" sec-type="results"><title>Results</title><sec id="s2-1"><title>The decision-making task</title><p>The minimal task that allowed us to study both learnt risk aversion and conformist social learning was a two-armed bandit task where one alternative provided certain payoffs <inline-formula><mml:math id="inf1"><mml:msub><mml:mi>π</mml:mi><mml:mi>s</mml:mi></mml:msub></mml:math></inline-formula> constantly (safe option <inline-formula><mml:math id="inf2"><mml:mi>s</mml:mi></mml:math></inline-formula>) and the other alternative provided a range of payoffs stochastically, following a Gaussian distribution <inline-formula><mml:math id="inf3"><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:msub><mml:mi>π</mml:mi><mml:mrow><mml:mi>r</mml:mi></mml:mrow></mml:msub><mml:mo>∼</mml:mo><mml:mrow><mml:mi mathvariant="script">N</mml:mi></mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>μ</mml:mi><mml:mo>,</mml:mo><mml:mi>s</mml:mi><mml:mo>.</mml:mo><mml:mi>d</mml:mi><mml:mo>.</mml:mo><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mstyle></mml:math></inline-formula> (risky option <inline-formula><mml:math id="inf4"><mml:mi>r</mml:mi></mml:math></inline-formula>; <xref ref-type="fig" rid="fig1">Figure 1a</xref>). Unless otherwise stated, we followed the same task setup as <xref ref-type="bibr" rid="bib21">Denrell, 2007</xref>, who mathematically derived the condition under which individual reinforcement learners would exhibit risk aversion. 
In the main analysis, we focus on the case where the risky alternative had a higher mean payoff than the safe alternative (i.e. producing more payoffs on average in the long run; positive risk premium [positive RP]), meaning that choosing the risky alternative was the optimal strategy for a decision maker to maximise accumulated payoffs. Unless otherwise stated, the total number of decision-making trials (time horizon) was set to <inline-formula><mml:math id="inf5"><mml:mrow><mml:mi>T</mml:mi><mml:mo>=</mml:mo><mml:mn>150</mml:mn></mml:mrow></mml:math></inline-formula> in the main simulations described below.</p><fig-group><fig id="fig1" position="float"><label>Figure 1.</label><caption><title>Mitigation of suboptimal risk aversion by social influence.</title><p>(<bold>a</bold>) A schematic diagram of the task. A safe option provides a constant reward <inline-formula><mml:math id="inf6"><mml:mrow><mml:msub><mml:mi>π</mml:mi><mml:mi>s</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:math></inline-formula> whereas a risky option provides a reward randomly drawn from a Gaussian distribution with mean <inline-formula><mml:math id="inf7"><mml:mrow><mml:mi>μ</mml:mi><mml:mo>=</mml:mo><mml:mn>1.5</mml:mn></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="inf8"><mml:mrow><mml:mtext>s.d.</mml:mtext><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:math></inline-formula>. (<bold>b, c</bold>): The emergence of suboptimal risk aversion (the hot stove effect) depending on a combination of the reinforcement learning parameters; (<bold>b</bold>): under no social influence (i.e. 
the copying weight <inline-formula><mml:math id="inf9"><mml:mrow><mml:mi>σ</mml:mi><mml:mo>=</mml:mo><mml:mn>0</mml:mn></mml:mrow></mml:math></inline-formula>), and (<bold>c</bold>): under social influences with different values of the conformity exponents <inline-formula><mml:math id="inf10"><mml:mi>θ</mml:mi></mml:math></inline-formula> and copying weights <inline-formula><mml:math id="inf11"><mml:mi>σ</mml:mi></mml:math></inline-formula>. The dashed curve is the asymptotic equilibrium at which asocial learners are expected to end up choosing the two alternatives with equal likelihood (i.e. <inline-formula><mml:math id="inf12"><mml:mrow><mml:msub><mml:mi>P</mml:mi><mml:mrow><mml:mrow><mml:mi>r</mml:mi><mml:mo>,</mml:mo><mml:mi>t</mml:mi></mml:mrow><mml:mo>→</mml:mo><mml:mi mathvariant="normal">∞</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mn>0.5</mml:mn></mml:mrow></mml:math></inline-formula>), which is given analytically by <inline-formula><mml:math id="inf13"><mml:mrow><mml:mi>β</mml:mi><mml:mo>=</mml:mo><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mn>2</mml:mn><mml:mo>-</mml:mo><mml:mi>α</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>/</mml:mo><mml:mi>α</mml:mi></mml:mrow></mml:mrow></mml:math></inline-formula> (<xref ref-type="bibr" rid="bib21">Denrell, 2007</xref>). 
The coloured background is a result of the agent-based simulation with total trials <inline-formula><mml:math id="inf14"><mml:mrow><mml:mi>T</mml:mi><mml:mo>=</mml:mo><mml:mn>150</mml:mn></mml:mrow></mml:math></inline-formula> and group size <inline-formula><mml:math id="inf15"><mml:mrow><mml:mi>N</mml:mi><mml:mo>=</mml:mo><mml:mn>10</mml:mn></mml:mrow></mml:math></inline-formula>, showing the average proportion of choosing the risky option in the second half of the learning trials <inline-formula><mml:math id="inf16"><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:msub><mml:mi>P</mml:mi><mml:mrow><mml:mi>r</mml:mi><mml:mo>,</mml:mo><mml:mi>t</mml:mi><mml:mo>></mml:mo><mml:mn>75</mml:mn></mml:mrow></mml:msub><mml:mo>></mml:mo><mml:mn>0.5</mml:mn></mml:mrow></mml:mstyle></mml:math></inline-formula> under a given combination of the parameters. (<bold>d</bold>): The differences between the mean proportion of risk aversion of asocial learners and that of social learners, highlighting regions in which performance is improved (orange) or undermined (purple) by social learning.</p></caption><graphic mime-subtype="jpeg" mimetype="image" xlink:href="elife-75308.xml.media/fig1.jpg"/></fig><fig id="fig1s1" position="float" specific-use="child-fig"><label>Figure 1—figure supplement 1.</label><caption><title>The simulation result with a wider parameter space.</title><p>The effect of the relationship between individual learning rate (<inline-formula><mml:math id="inf17"><mml:mi>α</mml:mi></mml:math></inline-formula>) and individual inverse temperature (<inline-formula><mml:math id="inf18"><mml:mi>β</mml:mi></mml:math></inline-formula>) across the different combinations of social learning parameters on the mean proportion of choosing the risky alternative in the second half of the trials of the two-armed bandit task described in <xref ref-type="fig" rid="fig1">Figure 1</xref> in the main text. 
The dashed curves give a set of parameter combinations with which asocial learners are expected to choose the risky alternative in the same proportion as they choose the safe alternative (i.e. <inline-formula><mml:math id="inf19"><mml:mrow><mml:msubsup><mml:mi>P</mml:mi><mml:mi>r</mml:mi><mml:mo>⋆</mml:mo></mml:msubsup><mml:mo>=</mml:mo><mml:mn>0.5</mml:mn></mml:mrow></mml:math></inline-formula>) in the infinite time horizon <inline-formula><mml:math id="inf20"><mml:mrow><mml:mi>T</mml:mi><mml:mo>→</mml:mo><mml:mi mathvariant="normal">∞</mml:mi></mml:mrow></mml:math></inline-formula>, given by <inline-formula><mml:math id="inf21"><mml:mrow><mml:mi>β</mml:mi><mml:mo>=</mml:mo><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mn>2</mml:mn><mml:mo>-</mml:mo><mml:mi>α</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>/</mml:mo><mml:mi>α</mml:mi></mml:mrow></mml:mrow></mml:math></inline-formula>.</p></caption><graphic mime-subtype="jpeg" mimetype="image" xlink:href="elife-75308.xml.media/fig1-figsupp1.jpg"/></fig><fig id="fig1s2" position="float" specific-use="child-fig"><label>Figure 1—figure supplement 2.</label><caption><title>The results of the value-shaping social influence model.</title><p>The relationships between individual learning rate (<inline-formula><mml:math id="inf22"><mml:mi>α</mml:mi></mml:math></inline-formula>) and individual inverse temperature (<inline-formula><mml:math id="inf23"><mml:mi>β</mml:mi></mml:math></inline-formula>) across different combinations of social learning parameters. 
The coloured background shows the average proportion of choosing the risky option in the second half of the learning trials <inline-formula><mml:math id="inf24"><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:msub><mml:mi>P</mml:mi><mml:mrow><mml:mi>r</mml:mi><mml:mo>,</mml:mo><mml:mi>t</mml:mi><mml:mo>></mml:mo><mml:mn>75</mml:mn></mml:mrow></mml:msub><mml:mo>></mml:mo><mml:mn>0.5</mml:mn></mml:mrow></mml:mstyle></mml:math></inline-formula>. Different social learning weights (<inline-formula><mml:math id="inf25"><mml:msub><mml:mi>σ</mml:mi><mml:mrow><mml:mi>v</mml:mi><mml:mo>⁢</mml:mo><mml:mi>s</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula>) are shown from top to bottom (<inline-formula><mml:math id="inf26"><mml:mrow><mml:msub><mml:mi>σ</mml:mi><mml:mrow><mml:mi>v</mml:mi><mml:mo>⁢</mml:mo><mml:mi>s</mml:mi></mml:mrow></mml:msub><mml:mo>∈</mml:mo><mml:mrow><mml:mo stretchy="false">{</mml:mo><mml:mn>0</mml:mn><mml:mo>,</mml:mo><mml:mn>0.1</mml:mn><mml:mo>,</mml:mo><mml:mn>0.25</mml:mn><mml:mo>,</mml:mo><mml:mn>0.5</mml:mn><mml:mo>,</mml:mo><mml:mn>1</mml:mn><mml:mo>,</mml:mo><mml:mn>2</mml:mn><mml:mo stretchy="false">}</mml:mo></mml:mrow></mml:mrow></mml:math></inline-formula>). Different conformity exponents are shown from left to right (<inline-formula><mml:math id="inf27"><mml:mrow><mml:mi>θ</mml:mi><mml:mo>∈</mml:mo><mml:mrow><mml:mo stretchy="false">{</mml:mo><mml:mn>0.5</mml:mn><mml:mo>,</mml:mo><mml:mn>1</mml:mn><mml:mo>,</mml:mo><mml:mn>2</mml:mn><mml:mo stretchy="false">}</mml:mo></mml:mrow></mml:mrow></mml:math></inline-formula>). The dashed curve is the asymptotic equilibrium at which asocial learners are expected to end up choosing both alternatives with equal likelihood (i.e. 
<inline-formula><mml:math id="inf28"><mml:mrow><mml:msubsup><mml:mi>P</mml:mi><mml:mi>r</mml:mi><mml:mo>⋆</mml:mo></mml:msubsup><mml:mo>=</mml:mo><mml:mn>0.5</mml:mn></mml:mrow></mml:math></inline-formula>), given by <inline-formula><mml:math id="inf29"><mml:mrow><mml:mi>β</mml:mi><mml:mo>=</mml:mo><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mn>2</mml:mn><mml:mo>-</mml:mo><mml:mi>α</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>/</mml:mo><mml:mi>α</mml:mi></mml:mrow></mml:mrow></mml:math></inline-formula>.</p></caption><graphic mime-subtype="jpeg" mimetype="image" xlink:href="elife-75308.xml.media/fig1-figsupp2.jpg"/></fig><fig id="fig1s3" position="float" specific-use="child-fig"><label>Figure 1—figure supplement 3.</label><caption><title>The simulation result with the negative risk premium.</title><p>The relationships between individual learning rate (<inline-formula><mml:math id="inf30"><mml:mi>α</mml:mi></mml:math></inline-formula>) and individual inverse temperature (<inline-formula><mml:math id="inf31"><mml:mi>β</mml:mi></mml:math></inline-formula>) across different combinations of social learning parameters. The coloured background shows the average proportion of choosing the risky option in the second half of the learning trials <inline-formula><mml:math id="inf32"><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:msub><mml:mi>P</mml:mi><mml:mrow><mml:mi>r</mml:mi><mml:mo>,</mml:mo><mml:mi>t</mml:mi><mml:mo>></mml:mo><mml:mn>75</mml:mn></mml:mrow></mml:msub><mml:mo>></mml:mo><mml:mn>0.5</mml:mn></mml:mrow></mml:mstyle></mml:math></inline-formula>. 
Different social learning weights (<inline-formula><mml:math id="inf33"><mml:mi>σ</mml:mi></mml:math></inline-formula>) are shown from top to bottom (<inline-formula><mml:math id="inf34"><mml:mrow><mml:mi>σ</mml:mi><mml:mo>∈</mml:mo><mml:mrow><mml:mo stretchy="false">{</mml:mo><mml:mn>0</mml:mn><mml:mo>,</mml:mo><mml:mn>0.25</mml:mn><mml:mo>,</mml:mo><mml:mn>0.5</mml:mn><mml:mo>,</mml:mo><mml:mn>0.75</mml:mn><mml:mo>,</mml:mo><mml:mn>0.9</mml:mn><mml:mo stretchy="false">}</mml:mo></mml:mrow></mml:mrow></mml:math></inline-formula>). Different conformity exponents are shown from left to right (<inline-formula><mml:math id="inf35"><mml:mrow><mml:mi>θ</mml:mi><mml:mo>∈</mml:mo><mml:mrow><mml:mo stretchy="false">{</mml:mo><mml:mn>1</mml:mn><mml:mo>,</mml:mo><mml:mn>2</mml:mn><mml:mo>,</mml:mo><mml:mn>4</mml:mn><mml:mo>,</mml:mo><mml:mn>8</mml:mn><mml:mo stretchy="false">}</mml:mo></mml:mrow></mml:mrow></mml:math></inline-formula>). The risk premium is negative (<inline-formula><mml:math id="inf36"><mml:mrow><mml:mi>μ</mml:mi><mml:mo>=</mml:mo><mml:mrow><mml:mo>-</mml:mo><mml:mn>0.5</mml:mn></mml:mrow></mml:mrow></mml:math></inline-formula>).</p></caption><graphic mime-subtype="jpeg" mimetype="image" xlink:href="elife-75308.xml.media/fig1-figsupp3.jpg"/></fig><fig id="fig1s4" position="float" specific-use="child-fig"><label>Figure 1—figure supplement 4.</label><caption><title>The simulation result with the Bernoulli noise distribution.</title><p>The relationships between individual learning rate (<inline-formula><mml:math id="inf37"><mml:mi>α</mml:mi></mml:math></inline-formula>) and individual inverse temperature (<inline-formula><mml:math id="inf38"><mml:mi>β</mml:mi></mml:math></inline-formula>) across different combinations of social learning parameters. 
The coloured background shows the average proportion of choosing the risky option in the second half of the learning trials <inline-formula><mml:math id="inf39"><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:msub><mml:mi>P</mml:mi><mml:mrow><mml:mi>r</mml:mi><mml:mo>,</mml:mo><mml:mi>t</mml:mi><mml:mo>></mml:mo><mml:mn>75</mml:mn></mml:mrow></mml:msub><mml:mo>></mml:mo><mml:mn>0.5</mml:mn></mml:mrow></mml:mstyle></mml:math></inline-formula>. Different social learning weights (<inline-formula><mml:math id="inf40"><mml:mi>σ</mml:mi></mml:math></inline-formula>) are shown from top to bottom (<inline-formula><mml:math id="inf41"><mml:mrow><mml:mi>σ</mml:mi><mml:mo>∈</mml:mo><mml:mrow><mml:mo stretchy="false">{</mml:mo><mml:mn>0</mml:mn><mml:mo>,</mml:mo><mml:mn>0.2</mml:mn><mml:mo>,</mml:mo><mml:mn>0.4</mml:mn><mml:mo>,</mml:mo><mml:mn>0.6</mml:mn><mml:mo>,</mml:mo><mml:mn>0.8</mml:mn><mml:mo stretchy="false">}</mml:mo></mml:mrow></mml:mrow></mml:math></inline-formula>). Different conformity exponents are shown from left to right (<inline-formula><mml:math id="inf42"><mml:mrow><mml:mi>θ</mml:mi><mml:mo>∈</mml:mo><mml:mrow><mml:mo stretchy="false">{</mml:mo><mml:mn>1</mml:mn><mml:mo>,</mml:mo><mml:mn>2</mml:mn><mml:mo>,</mml:mo><mml:mn>4</mml:mn><mml:mo>,</mml:mo><mml:mn>8</mml:mn><mml:mo stretchy="false">}</mml:mo></mml:mrow></mml:mrow></mml:math></inline-formula>). 
A binary payoff distribution was used in which the safe alternative always provides <inline-formula><mml:math id="inf43"><mml:mrow><mml:msub><mml:mi>π</mml:mi><mml:mi>s</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:math></inline-formula>, while the risky alternative provides <inline-formula><mml:math id="inf44"><mml:mrow><mml:msub><mml:mi>π</mml:mi><mml:mi>r</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mn>0</mml:mn></mml:mrow></mml:math></inline-formula> with a 70% chance or <inline-formula><mml:math id="inf45"><mml:mrow><mml:msub><mml:mi>π</mml:mi><mml:mi>r</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mn>5</mml:mn></mml:mrow></mml:math></inline-formula> with a 30% chance. The risk premium was 1.5.</p></caption><graphic mime-subtype="jpeg" mimetype="image" xlink:href="elife-75308.xml.media/fig1-figsupp4.jpg"/></fig><fig id="fig1s5" position="float" specific-use="child-fig"><label>Figure 1—figure supplement 5.</label><caption><title>The simulation results under the positive risk premium experimental setups (a,d: the 1-risky-1-safe; b,e: the 1-risky-3-safe; c,f: the 2-risky-2-safe).</title><p>The relationships between individual learning rate (<inline-formula><mml:math id="inf46"><mml:mi>α</mml:mi></mml:math></inline-formula>) and individual inverse temperature (<inline-formula><mml:math id="inf47"><mml:mi>β</mml:mi></mml:math></inline-formula>) across different combinations of social learning parameters. 
(a–c): The coloured background shows the average proportion of choosing the risky option in the second half of the learning trials (<inline-formula><mml:math id="inf48"><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:msub><mml:mi>P</mml:mi><mml:mrow><mml:mi>r</mml:mi><mml:mo>,</mml:mo><mml:mi>t</mml:mi><mml:mo>></mml:mo><mml:mn>75</mml:mn></mml:mrow></mml:msub><mml:mo>></mml:mo><mml:mn>0.5</mml:mn></mml:mrow></mml:mstyle></mml:math></inline-formula>) under social influences with different values of the conformity exponents <inline-formula><mml:math id="inf49"><mml:mi>θ</mml:mi></mml:math></inline-formula> and copying weights <inline-formula><mml:math id="inf50"><mml:mi>σ</mml:mi></mml:math></inline-formula>. The dashed curve is the asymptotic equilibrium at which asocial learners are expected to end up choosing the two alternatives with equal likelihood (i.e. <inline-formula><mml:math id="inf51"><mml:mrow><mml:msub><mml:mi>P</mml:mi><mml:mi>r</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mn>0.5</mml:mn></mml:mrow></mml:math></inline-formula>). 
(d–f): The differences between the mean proportion of risk aversion of asocial learners and that of social learners, highlighting regions in which performance is improved (that is, risk-seeking increases; orange) or undermined (that is, risk-aversion is amplified; purple) by social learning.</p></caption><graphic mime-subtype="jpeg" mimetype="image" xlink:href="elife-75308.xml.media/fig1-figsupp5.jpg"/></fig><fig id="fig1s6" position="float" specific-use="child-fig"><label>Figure 1—figure supplement 6.</label><caption><title>The simulation results under the negative risk premium experimental setup.</title><p>The relationships between individual learning rate (<inline-formula><mml:math id="inf52"><mml:mi>α</mml:mi></mml:math></inline-formula>) and individual inverse temperature (<inline-formula><mml:math id="inf53"><mml:mi>β</mml:mi></mml:math></inline-formula>) across different combinations of social learning parameters. (left): The coloured background shows the average proportion of choosing the (optimal) safe option in the second half of the learning trials under social influences with different values of the conformity exponents <inline-formula><mml:math id="inf54"><mml:mi>θ</mml:mi></mml:math></inline-formula> and copying weights <inline-formula><mml:math id="inf55"><mml:mi>σ</mml:mi></mml:math></inline-formula>. The dashed curve shows the proportion of choosing the safe option at <inline-formula><mml:math id="inf56"><mml:mrow><mml:msub><mml:mi>P</mml:mi><mml:mi>s</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mn>0.85</mml:mn></mml:mrow></mml:math></inline-formula>. 
(right): The differences between the mean proportion of risk aversion of asocial learners and that of social learners, highlighting regions in which (suboptimal) risk-seeking increases (orange) and (optimal) risk-aversion increases (purple) by social learning.</p></caption><graphic mime-subtype="jpeg" mimetype="image" xlink:href="elife-75308.xml.media/fig1-figsupp6.jpg"/></fig></fig-group><p>To maximise one’s own long-term individual profit under such circumstances, it is crucial to strike the right balance between exploiting the option that has seemed better so far and exploring the other options to seek informational gain. Because of the nature of adaptive information sampling under such exploration–exploitation trade-offs, lone decision makers often end up being risk averse, trying to reduce the chance of further failures once the individual has experienced an unfavourable outcome from the risky alternative (<xref ref-type="bibr" rid="bib46">March, 1996</xref>; <xref ref-type="bibr" rid="bib21">Denrell, 2007</xref>; <xref ref-type="bibr" rid="bib35">Hertwig and Erev, 2009</xref>), a phenomenon known as the <italic>hot stove effect</italic>. Within the framework of this task, risk aversion is suboptimal in the long run if the risk premium is positive (<xref ref-type="bibr" rid="bib20">Denrell and March, 2001</xref>).</p></sec><sec id="s2-2"><title>The baseline model</title><p>For the baseline asocial reinforcement learning, we assumed a standard, well-established model that is a combination of the Rescorla–Wagner learning rule and softmax decision making (<xref ref-type="bibr" rid="bib68">Sutton and Barto, 2018</xref>, see Materials and methods for the full details). There are two parameters, a <italic>learning rate</italic> (<inline-formula><mml:math id="inf57"><mml:mi>α</mml:mi></mml:math></inline-formula>) and an <italic>inverse temperature</italic> (<inline-formula><mml:math id="inf58"><mml:mi>β</mml:mi></mml:math></inline-formula>). 
The larger the <inline-formula><mml:math id="inf59"><mml:mi>α</mml:mi></mml:math></inline-formula>, the more weight is given to recent experiences, making the agent’s belief update more myopic. The parameter <inline-formula><mml:math id="inf60"><mml:mi>β</mml:mi></mml:math></inline-formula> regulates how sensitive the choice probability is to the belief about the option’s value (i.e. controlling the proneness to explore). As <inline-formula><mml:math id="inf61"><mml:mrow><mml:mi>β</mml:mi><mml:mo>→</mml:mo><mml:mn>0</mml:mn></mml:mrow></mml:math></inline-formula>, the softmax choice probability approximates to a random choice (i.e. highly explorative). Conversely, if <inline-formula><mml:math id="inf62"><mml:mrow><mml:mi>β</mml:mi><mml:mo>→</mml:mo><mml:mrow><mml:mo>+</mml:mo><mml:mi mathvariant="normal">∞</mml:mi></mml:mrow></mml:mrow></mml:math></inline-formula>, it asymptotes to a deterministic choice in favour of the option with the highest subjective value (i.e. highly exploitative).</p><p>Varying these two parameters systematically, it is possible to see under what conditions trial-and-error learning leads individuals to be risk averse (<xref ref-type="fig" rid="fig1">Figure 1b</xref>). Suboptimal risk aversion becomes prominent when value updating in learning is myopic (i.e. when <inline-formula><mml:math id="inf63"><mml:mi>α</mml:mi></mml:math></inline-formula> is large) or action selection is exploitative (i.e. when <inline-formula><mml:math id="inf64"><mml:mi>β</mml:mi></mml:math></inline-formula> is large) or both (the blue area of <xref ref-type="fig" rid="fig1">Figure 1b</xref>). Under such circumstances, the hot stove effect occurs (<xref ref-type="bibr" rid="bib21">Denrell, 2007</xref>): Experiences of low-value payoffs from the risky option tend to discourage decision makers from further choosing the risky option, trapping them in the safe alternative. 
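As a minimal sketch of the baseline model described above, the following Python snippet implements a Rescorla–Wagner value update and a two-option softmax choice rule. The function names, the initial values, and the example payoff are illustrative assumptions rather than the authors' simulation code; the full specification is given in Materials and methods.

```python
import math


def softmax_prob_risky(q_risky, q_safe, beta):
    """Softmax probability of choosing the risky option among two options."""
    # Logistic form of the two-option softmax: beta = 0 gives 0.5 (random
    # choice), while a large beta favours the option with the higher value.
    return 1.0 / (1.0 + math.exp(-beta * (q_risky - q_safe)))


def rescorla_wagner_update(q, payoff, alpha):
    """Move the value estimate q towards the observed payoff by a fraction alpha."""
    return q + alpha * (payoff - q)


def hot_stove_susceptibility(alpha, beta):
    """The composite quantity alpha * (beta + 1) discussed in the text."""
    return alpha * (beta + 1.0)


# Hypothetical agent with myopic updating and exploitative choice.
alpha, beta = 0.8, 5.0
q_risky, q_safe = 1.0, 1.0

p_before = softmax_prob_risky(q_risky, q_safe, beta)  # equal values give 0.5

# One unfavourable payoff from the risky option (illustrative value).
q_risky = rescorla_wagner_update(q_risky, payoff=-1.0, alpha=alpha)
p_after = softmax_prob_risky(q_risky, q_safe, beta)

print(hot_stove_susceptibility(alpha, beta), p_before, p_after)
```

In this example a single poor payoff from the risky option, combined with a large learning rate and a large inverse temperature, drives the probability of choosing that option again close to zero, which is the mechanism behind the hot stove effect.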
In sum, whenever the interaction between the two learning parameters <inline-formula><mml:math id="inf65"><mml:mrow><mml:mi>α</mml:mi><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>β</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:math></inline-formula> exceeds a threshold value, which was 2 in the current example, decision makers are expected to become averse to the risky option (the black solid lines in <xref ref-type="fig" rid="fig2">Figure 2</xref>). The hot stove effect is known to emerge in a range of model implementations and has been widely observed in previous human experiments (<xref ref-type="bibr" rid="bib46">March, 1996</xref>; <xref ref-type="bibr" rid="bib21">Denrell, 2007</xref>; <xref ref-type="bibr" rid="bib35">Hertwig and Erev, 2009</xref>).</p><fig-group><fig id="fig2" position="float"><label>Figure 2.</label><caption><title>The effect of social learning on average decision performance.</title><p>The <italic>x</italic> axis is a product of two reinforcement learning parameters <inline-formula><mml:math id="inf66"><mml:mrow><mml:mi>α</mml:mi><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>β</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:math></inline-formula>, namely, the susceptibility to the hot stove effect. The <italic>y</italic> axis is the mean probability of choosing the optimal risky alternative in the last 75 trials in a two-armed bandit task whose setup was the same as in <xref ref-type="fig" rid="fig1">Figure 1</xref>. 
The black solid curve is the analytical prediction of the asymptotic performance of individual reinforcement learning with infinite time horizon <inline-formula><mml:math id="inf67"><mml:mrow><mml:mi>T</mml:mi><mml:mo>→</mml:mo><mml:mrow><mml:mo>+</mml:mo><mml:mi mathvariant="normal">∞</mml:mi></mml:mrow></mml:mrow></mml:math></inline-formula> (<xref ref-type="bibr" rid="bib21">Denrell, 2007</xref>). The analytical curve shows a choice shift emerging at <inline-formula><mml:math id="inf68"><mml:mrow><mml:mrow><mml:mi>α</mml:mi><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>β</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mo>=</mml:mo><mml:mn>2</mml:mn></mml:mrow></mml:math></inline-formula>; that is, individual learners ultimately prefer the safe to the risky option in the current setup of the task when <inline-formula><mml:math id="inf69"><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:mi>α</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mi>β</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn><mml:mo stretchy="false">)</mml:mo><mml:mo>></mml:mo><mml:mn>2</mml:mn></mml:mrow></mml:mstyle></mml:math></inline-formula>. The dotted curves are mean results of agent-based simulations of social learners with two different mean values of the copying weight <inline-formula><mml:math id="inf70"><mml:mrow><mml:mi>σ</mml:mi><mml:mo>∈</mml:mo><mml:mrow><mml:mo stretchy="false">{</mml:mo><mml:mn>0.25</mml:mn><mml:mo>,</mml:mo><mml:mn>0.5</mml:mn><mml:mo stretchy="false">}</mml:mo></mml:mrow></mml:mrow></mml:math></inline-formula> (green and yellow, respectively) and asocial learners with <inline-formula><mml:math id="inf71"><mml:mrow><mml:mi>σ</mml:mi><mml:mo>=</mml:mo><mml:mn>0</mml:mn></mml:mrow></mml:math></inline-formula> (purple). 
The difference between the agent-based simulation with <inline-formula><mml:math id="inf72"><mml:mrow><mml:mi>σ</mml:mi><mml:mo>=</mml:mo><mml:mn>0</mml:mn></mml:mrow></mml:math></inline-formula> and the analytical result was due to the finite number of decision trials in the simulation, and hence, the longer the horizon, the closer they become (<xref ref-type="fig" rid="fig2s1">Figure 2—figure supplement 1</xref>). Each panel shows a different combination of the inverse temperature <inline-formula><mml:math id="inf73"><mml:mi>β</mml:mi></mml:math></inline-formula> and the conformity exponent <inline-formula><mml:math id="inf74"><mml:mi>θ</mml:mi></mml:math></inline-formula>.</p></caption><graphic mime-subtype="jpeg" mimetype="image" xlink:href="elife-75308.xml.media/fig2.jpg"/></fig><fig id="fig2s1" position="float" specific-use="child-fig"><label>Figure 2—figure supplement 1.</label><caption><title>The effect of social learning on the average decision performance on the longer time horizon.</title><p>The <italic>x</italic> axis is an interaction of two reinforcement learning parameters <inline-formula><mml:math id="inf75"><mml:mrow><mml:mi>α</mml:mi><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>β</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:math></inline-formula>, that is, the susceptibility to the hot stove effect. The <italic>y</italic> axis is the mean probability of choosing the optimal risky alternative in the last 75 trials in the two-armed bandit task whose setup was the same as in <xref ref-type="fig" rid="fig1">Figures 1</xref> and <xref ref-type="fig" rid="fig2">2</xref> in the main text (i.e. <inline-formula><mml:math id="inf76"><mml:mrow><mml:mi>μ</mml:mi><mml:mo>=</mml:mo><mml:mn>0.5</mml:mn></mml:mrow></mml:math></inline-formula>, s.d. 
= 1) except for the longer time horizon <inline-formula><mml:math id="inf77"><mml:mrow><mml:mi>T</mml:mi><mml:mo>=</mml:mo><mml:mn>1075</mml:mn></mml:mrow></mml:math></inline-formula> compared to the time horizon used in the main text (<inline-formula><mml:math id="inf78"><mml:mrow><mml:mi>T</mml:mi><mml:mo>=</mml:mo><mml:mn>150</mml:mn></mml:mrow></mml:math></inline-formula>). The dotted curves are the mean result of agent-based simulations of groups of social learners with two different mean values of the copying weight <inline-formula><mml:math id="inf79"><mml:mrow><mml:mi>σ</mml:mi><mml:mo>∈</mml:mo><mml:mrow><mml:mo stretchy="false">{</mml:mo><mml:mn>0.25</mml:mn><mml:mo>,</mml:mo><mml:mn>0.5</mml:mn><mml:mo stretchy="false">}</mml:mo></mml:mrow></mml:mrow></mml:math></inline-formula> or individual learners with <inline-formula><mml:math id="inf80"><mml:mrow><mml:mi>σ</mml:mi><mml:mo>=</mml:mo><mml:mn>0</mml:mn></mml:mrow></mml:math></inline-formula>. Each panel shows a different combination of the inverse temperature <inline-formula><mml:math id="inf81"><mml:mi>β</mml:mi></mml:math></inline-formula> and the conformity exponent <inline-formula><mml:math id="inf82"><mml:mi>θ</mml:mi></mml:math></inline-formula>. The black solid curve is the theoretical benchmark where individual reinforcement learners were expected to asymptote with <inline-formula><mml:math id="inf83"><mml:mrow><mml:mi>T</mml:mi><mml:mo>→</mml:mo><mml:mrow><mml:mo>+</mml:mo><mml:mi mathvariant="normal">∞</mml:mi></mml:mrow></mml:mrow></mml:math></inline-formula>. Compared to <xref ref-type="fig" rid="fig2">Figure 2</xref> in the main text, individual learners got closer to the benchmark. 
By contrast, the performance of social learners still deviated from the benchmark, suggesting that social influence had a qualitative impact on the course of learning and decision making, rather than merely slowing the approach to the equilibrium of individual learning.</p></caption><graphic mime-subtype="jpeg" mimetype="image" xlink:href="elife-75308.xml.media/fig2-figsupp1.jpg"/></fig><fig id="fig2s2" position="float" specific-use="child-fig"><label>Figure 2—figure supplement 2.</label><caption><title>The effect of social learning on the time evolution of decision performance.</title><p>The <italic>x</italic> axis is the number of trials. The <italic>y</italic> axis is the mean proportion of choosing the optimal risky alternative. Each colour shows a different <inline-formula><mml:math id="inf84"><mml:mi>β</mml:mi></mml:math></inline-formula>. For the asocial learning condition (i.e. <inline-formula><mml:math id="inf85"><mml:mrow><mml:mi>σ</mml:mi><mml:mo>=</mml:mo><mml:mn>0</mml:mn></mml:mrow></mml:math></inline-formula>), the analytical benchmark to which reinforcement learners asymptote is shown as a horizontal line. Conformity exponent <inline-formula><mml:math id="inf86"><mml:mi>θ</mml:mi></mml:math></inline-formula> was 2. Group size was 8. The simulation was repeated 1000 times for each combination of parameters. 
Compared to asocial learning cases, social learning (<inline-formula><mml:math id="inf87"><mml:mrow><mml:mi>σ</mml:mi><mml:mo>=</mml:mo><mml:mn>0.3</mml:mn></mml:mrow></mml:math></inline-formula>) qualitatively alters the course of learning, rather than just speeding up or slowing down learning.</p></caption><graphic mime-subtype="jpeg" mimetype="image" xlink:href="elife-75308.xml.media/fig2-figsupp2.jpg"/></fig></fig-group></sec><sec id="s2-3"><title>The conformist social influence model</title><p>We next considered a collective learning situation in which a group of individuals performs the task simultaneously and individuals can observe others’ actions. We assumed a simple frequency-based social cue specifying distributions of individual choices (<xref ref-type="bibr" rid="bib47">McElreath et al., 2005</xref>; <xref ref-type="bibr" rid="bib48">McElreath et al., 2008</xref>; <xref ref-type="bibr" rid="bib72">Toyokawa et al., 2017</xref>; <xref ref-type="bibr" rid="bib73">Toyokawa et al., 2019</xref>; <xref ref-type="bibr" rid="bib19">Deffner et al., 2020</xref>). We assumed that individuals could not observe others’ earnings, ensuring that they could not sample information about payoffs that were no longer available because of their own choice (i.e. forgone payoffs; <xref ref-type="bibr" rid="bib21">Denrell, 2007</xref>; <xref ref-type="bibr" rid="bib78">Yechiam and Busemeyer, 2006</xref>).</p><p>A realised payoff was independent of others’ decisions and was drawn solely from the payoff probability distribution specific to each alternative (and hence no externality was assumed), thereby ensuring there would be neither direct social competition over the monetary reward (<xref ref-type="bibr" rid="bib28">Giraldeau and Caraco, 2000</xref>) nor normative pressure towards majority alignment (<xref ref-type="bibr" rid="bib16">Cialdini and Goldstein, 2004</xref>; <xref ref-type="bibr" rid="bib45">Mahmoodi et al., 2018</xref>). 
The value of social information was assumed to be only informational (<xref ref-type="bibr" rid="bib26">Efferson et al., 2008</xref>; <xref ref-type="bibr" rid="bib54">Nakahashi, 2007</xref>). Nevertheless, our model may apply to the context of normative social influences, because what we assumed here was modification in individual choice probabilities by social influences, irrespective of underlying motivations of conformity.</p><p>To model a compromise between individual trial-and-error learning and the frequency-based copying process, we formulated the social influences on reinforcement learning as a weighted average between the asocial (<inline-formula><mml:math id="inf88"><mml:mi>A</mml:mi></mml:math></inline-formula>) and social (<inline-formula><mml:math id="inf89"><mml:mi>S</mml:mi></mml:math></inline-formula>) processes of decision making, that is, <inline-formula><mml:math id="inf90"><mml:mrow><mml:msub><mml:mi>P</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mn>1</mml:mn><mml:mo>-</mml:mo><mml:mi>σ</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>⁢</mml:mo><mml:msub><mml:mi>A</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>t</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo>+</mml:mo><mml:mrow><mml:mi>σ</mml:mi><mml:mo>⁢</mml:mo><mml:msub><mml:mi>S</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>t</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:mrow></mml:mrow></mml:math></inline-formula>, where <inline-formula><mml:math id="inf91"><mml:msub><mml:mi>P</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>t</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> is the individual net probability of choosing an option <inline-formula><mml:math id="inf92"><mml:mrow><mml:mi>i</mml:mi><mml:mo>∈</mml:mo><mml:mrow><mml:mo 
stretchy="false">{</mml:mo><mml:mi>r</mml:mi><mml:mo>,</mml:mo><mml:mi>s</mml:mi><mml:mo stretchy="false">}</mml:mo></mml:mrow></mml:mrow></mml:math></inline-formula> at time <inline-formula><mml:math id="inf93"><mml:mi>t</mml:mi></mml:math></inline-formula> and <inline-formula><mml:math id="inf94"><mml:mi>σ</mml:mi></mml:math></inline-formula> is a weight given to the social influence (<italic>copying weight</italic>).</p><p>In addition, the level of social frequency dependence was determined by another social learning parameter <inline-formula><mml:math id="inf95"><mml:mi>θ</mml:mi></mml:math></inline-formula> (<italic>conformity exponent</italic>), such that <inline-formula><mml:math id="inf96"><mml:mrow><mml:msub><mml:mi>S</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mrow><mml:msubsup><mml:mi>N</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>t</mml:mi></mml:mrow><mml:mi>θ</mml:mi></mml:msubsup><mml:mo>/</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msubsup><mml:mi>N</mml:mi><mml:mrow><mml:mi>r</mml:mi><mml:mo>,</mml:mo><mml:mi>t</mml:mi></mml:mrow><mml:mi>θ</mml:mi></mml:msubsup><mml:mo>+</mml:mo><mml:msubsup><mml:mi>N</mml:mi><mml:mrow><mml:mi>s</mml:mi><mml:mo>,</mml:mo><mml:mi>t</mml:mi></mml:mrow><mml:mi>θ</mml:mi></mml:msubsup></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:mrow></mml:math></inline-formula>, where <inline-formula><mml:math id="inf97"><mml:msub><mml:mi>N</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:math></inline-formula> is the number of agents who chose option <inline-formula><mml:math id="inf98"><mml:mi>i</mml:mi></mml:math></inline-formula> (see Materials and methods for the exact formulation). 
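The two equations above can be sketched in Python as follows. This is a simplified illustration, not the authors' implementation: the function names are hypothetical, and the sketch assumes that both options have been chosen by at least one agent (the exact formulation is given in Materials and methods).

```python
def social_frequency(n_risky, n_safe, theta):
    """Return (S_r, S_s), the frequency-dependent social cue for each option.

    Assumption of this sketch: both counts are positive, so that the
    powers are defined for any real theta.
    """
    if not (n_risky > 0 and n_safe > 0):
        raise ValueError("this sketch requires positive counts for both options")
    w_risky = n_risky ** theta
    w_safe = n_safe ** theta
    total = w_risky + w_safe
    return w_risky / total, w_safe / total


def net_choice_prob(asocial_prob, social_prob, sigma):
    """Weighted average of the asocial (A) and social (S) choice probabilities."""
    return (1.0 - sigma) * asocial_prob + sigma * social_prob


# Hypothetical example: 2 of 8 agents chose the risky option.
s_risky_linear, _ = social_frequency(2, 6, theta=1)      # 2 / 8 = 0.25
s_risky_conformist, _ = social_frequency(2, 6, theta=2)  # 4 / 40 = 0.1
s_risky_flat, _ = social_frequency(2, 6, theta=0)        # 1 / 2 = 0.5

# An agent whose own softmax probability of the risky option is 0.3,
# with copying weight sigma = 0.25 and theta = 2.
p_risky = net_choice_prob(0.3, s_risky_conformist, sigma=0.25)  # about 0.25
```

In this example an exponent of 2 weights the minority option below its raw frequency (0.1 rather than 0.25), whereas an exponent of 0 yields a uniform cue of 0.5 regardless of the counts.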
The larger the <inline-formula><mml:math id="inf99"><mml:mi>θ</mml:mi></mml:math></inline-formula>, the more the net choice probability favours a common alternative chosen by the majority of a group at the moment (a conformity bias; <xref ref-type="bibr" rid="bib14">Boyd and Richerson, 1985</xref>). Note that there is no actual social influence when <inline-formula><mml:math id="inf100"><mml:mrow><mml:mi>θ</mml:mi><mml:mo>=</mml:mo><mml:mn>0</mml:mn></mml:mrow></mml:math></inline-formula> because in this case the ‘social influence’ favours a uniformly random choice, irrespective of whether it is a common behaviour.</p><p>Our model is a natural extension of both asocial reinforcement learning and the model of ‘extreme conformity’ assumed in some previous models (e.g. <xref ref-type="bibr" rid="bib23">Denrell and Le Mens, 2017</xref>), as both can be expressed as special cases of particular parameter combinations. We explore the implications of this extension in the Discussion. The descriptions of the parameters are summarised in <xref ref-type="table" rid="table1">Table 1</xref>.</p><table-wrap id="table1" position="float"><label>Table 1.</label><caption><title>Summary of the learning model parameters.</title></caption><table frame="hsides" rules="groups"><thead><tr><th valign="bottom">Symbol</th><th valign="bottom">Meaning</th><th valign="bottom">Range of values</th></tr></thead><tbody><tr><td align="left" valign="bottom">α</td><td align="left" valign="bottom">Learning rate</td><td align="left" valign="bottom">[0, 1]</td></tr><tr><td align="left" valign="bottom">β</td><td align="left" valign="bottom">Inverse temperature</td><td align="left" valign="bottom">[0, +∞)</td></tr><tr><td align="left" valign="bottom">α(β+1)</td><td align="left" valign="bottom">Susceptibility to the hot stove effect</td><td align="left" valign="bottom"/></tr><tr><td align="left" valign="bottom">σ</td><td align="left" valign="bottom">Copying weight</td><td align="left" valign="bottom">[0, 1]</td></tr><tr><td align="left" valign="bottom">θ</td><td align="left" valign="bottom">Conformity exponent</td><td align="left" valign="bottom">(-∞, +∞)</td></tr></tbody></table></table-wrap></sec><sec id="s2-4"><title>The collective behavioural rescue effect</title><p>Varying these two social learning parameters, <inline-formula><mml:math id="inf101"><mml:mi>σ</mml:mi></mml:math></inline-formula> and <inline-formula><mml:math id="inf102"><mml:mi>θ</mml:mi></mml:math></inline-formula>, systematically, we observed a mitigation of suboptimal risk aversion under positive frequency-based social influences. As shown in <xref ref-type="fig" rid="fig1">Figure 1c</xref>, even with a strong conformity bias (<inline-formula><mml:math id="inf103"><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:mi>θ</mml:mi><mml:mo>></mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:mstyle></mml:math></inline-formula>), social influence widened the region of parameter combinations where the majority of decision makers could escape from suboptimal risk aversion (the increase of the red area in <xref ref-type="fig" rid="fig1">Figure 1c</xref>). The increase in the area of adaptive risk seeking was greater with <inline-formula><mml:math id="inf104"><mml:mrow><mml:mi>θ</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:math></inline-formula> than with <inline-formula><mml:math id="inf105"><mml:mrow><mml:mi>θ</mml:mi><mml:mo>=</mml:mo><mml:mn>4</mml:mn></mml:mrow></mml:math></inline-formula>. 
When <inline-formula><mml:math id="inf106"><mml:mrow><mml:mi>θ</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:math></inline-formula>, a large copying weight (<inline-formula><mml:math id="inf107"><mml:mi>σ</mml:mi></mml:math></inline-formula>) could eliminate almost all the area of risk aversion (<xref ref-type="fig" rid="fig1">Figure 1c</xref>; see also <xref ref-type="fig" rid="fig1s1">Figure 1—figure supplement 1</xref> for a greater range of parameter combinations), whereas when <inline-formula><mml:math id="inf108"><mml:mrow><mml:mi>θ</mml:mi><mml:mo>=</mml:mo><mml:mn>4</mml:mn></mml:mrow></mml:math></inline-formula>, there was also a region in which optimal risk seeking was weakened (<xref ref-type="fig" rid="fig1">Figure 1d</xref>). On the other hand, such substantial switching of the majority to being risk seeking did not emerge in the negative risk premium (negative RP) task (<xref ref-type="fig" rid="fig1s3">Figure 1—figure supplement 3</xref>), although there was a parameter region where the proportion of suboptimal risk seeking increased relative to that of individual learners (<xref ref-type="fig" rid="fig1s6">Figure 1—figure supplement 6</xref>). Naturally, as the copying weight approached 1 (<inline-formula><mml:math id="inf109"><mml:mrow><mml:mi>σ</mml:mi><mml:mo>→</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:math></inline-formula>), performance eventually approached the chance level in both positive and negative RP cases (<xref ref-type="fig" rid="fig1s1">Figure 1—figure supplement 1</xref>, <xref ref-type="fig" rid="fig1s3">Figure 1—figure supplement 3</xref>). In sum, simulations suggest that conformist social influence widely promoted risk seeking under the positive RP, and that such a promotion of risk seeking was less evident in the negative RP task.</p><p><xref ref-type="fig" rid="fig2">Figure 2</xref> highlights the extent to which risk aversion was relaxed through social influences. 
Individuals with positive <inline-formula><mml:math id="inf110"><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:mi>σ</mml:mi><mml:mo>></mml:mo><mml:mn>0</mml:mn></mml:mrow></mml:mstyle></mml:math></inline-formula> could maintain a high proportion of risk seeking even in the region of high susceptibility to the hot stove effect (<inline-formula><mml:math id="inf111"><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:mi>α</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mi>β</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn><mml:mo stretchy="false">)</mml:mo><mml:mo>></mml:mo><mml:mn>2</mml:mn></mml:mrow></mml:mstyle></mml:math></inline-formula>). Although social learners eventually fell into a risk-averse regime with increasing <inline-formula><mml:math id="inf112"><mml:mrow><mml:mi>α</mml:mi><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>β</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:math></inline-formula>, risk aversion was largely mitigated compared to the performance of individual learners who had <inline-formula><mml:math id="inf113"><mml:mrow><mml:mi>σ</mml:mi><mml:mo>=</mml:mo><mml:mn>0</mml:mn></mml:mrow></mml:math></inline-formula>. 
Interestingly, the probability of choosing the optimal risky option was maximised at an intermediate value of <inline-formula><mml:math id="inf114"><mml:mrow><mml:mi>α</mml:mi><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>β</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:math></inline-formula> when the conformity exponent was large (<inline-formula><mml:math id="inf115"><mml:mrow><mml:mi>θ</mml:mi><mml:mo>=</mml:mo><mml:mn>4</mml:mn></mml:mrow></mml:math></inline-formula>) and the copying weight was high (<inline-formula><mml:math id="inf116"><mml:mrow><mml:mi>σ</mml:mi><mml:mo>=</mml:mo><mml:mn>0.5</mml:mn></mml:mrow></mml:math></inline-formula>).</p><p>In the region of lower susceptibility to the hot stove effect (<inline-formula><mml:math id="inf117"><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:mi>α</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mi>β</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn><mml:mo stretchy="false">)</mml:mo><mml:mo>&lt;</mml:mo><mml:mn>2</mml:mn></mml:mrow></mml:mstyle></mml:math></inline-formula>), social influence could enhance individual optimal risk seeking up to the theoretical benchmark expected in individual reinforcement learning with an infinite time horizon (the solid curves in <xref ref-type="fig" rid="fig2">Figure 2</xref>). 
A socially induced increase in risk seeking in the region <inline-formula><mml:math id="inf118"><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:mi>α</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mi>β</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn><mml:mo stretchy="false">)</mml:mo><mml:mo>&lt;</mml:mo><mml:mn>2</mml:mn></mml:mrow></mml:mstyle></mml:math></inline-formula> was more evident with larger <inline-formula><mml:math id="inf119"><mml:mi>β</mml:mi></mml:math></inline-formula>, and hence with smaller <inline-formula><mml:math id="inf120"><mml:mi>α</mml:mi></mml:math></inline-formula> to satisfy <inline-formula><mml:math id="inf121"><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:mi>α</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mi>β</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn><mml:mo stretchy="false">)</mml:mo><mml:mo>&lt;</mml:mo><mml:mn>2</mml:mn></mml:mrow></mml:mstyle></mml:math></inline-formula>. The smaller the learning rate <inline-formula><mml:math id="inf122"><mml:mi>α</mml:mi></mml:math></inline-formula>, the longer it would take to achieve the asymptotic equilibrium state, due to slow value updating. Asocial learners, as well as social learners with high <inline-formula><mml:math id="inf123"><mml:mi>σ</mml:mi></mml:math></inline-formula> (=0.5) coupled with high <inline-formula><mml:math id="inf124"><mml:mi>θ</mml:mi></mml:math></inline-formula> (=4), were still far from the analytical benchmark, whereas social learners with weak social influence (<inline-formula><mml:math id="inf125"><mml:mrow><mml:mi>σ</mml:mi><mml:mo>=</mml:mo><mml:mn>0.25</mml:mn></mml:mrow></mml:math></inline-formula>) nearly converged on the benchmark performance, suggesting that social learning might affect the speed of learning. 
Indeed, a longer time horizon (<inline-formula><mml:math id="inf126"><mml:mrow><mml:mi>T</mml:mi><mml:mo>=</mml:mo><mml:mn>1075</mml:mn></mml:mrow></mml:math></inline-formula>) reduced the advantage of weak social learners in this <inline-formula><mml:math id="inf127"><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:mi>α</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mi>β</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn><mml:mo stretchy="false">)</mml:mo><mml:mo>&lt;</mml:mo><mml:mn>2</mml:mn></mml:mrow></mml:mstyle></mml:math></inline-formula> region because slow learners could now achieve the benchmark accuracy (<xref ref-type="fig" rid="fig2s1">Figure 2—figure supplement 1</xref> and <xref ref-type="fig" rid="fig2s2">Figure 2—figure supplement 2</xref>).</p><p>This approach to the benchmark with a longer time horizon, and the concomitant reduction in the advantage of social learners, were also found in the high-susceptibility region <inline-formula><mml:math id="inf128"><mml:mrow><mml:mrow><mml:mi>α</mml:mi><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>β</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mo>≫</mml:mo><mml:mn>2</mml:mn></mml:mrow></mml:math></inline-formula>, especially for those who had a high conformity exponent <inline-formula><mml:math id="inf129"><mml:mrow><mml:mi>θ</mml:mi><mml:mo>=</mml:mo><mml:mn>4</mml:mn></mml:mrow></mml:math></inline-formula> (<xref ref-type="fig" rid="fig2s1">Figure 2—figure supplement 1</xref>). 
Notably, however, facilitation of optimal risk seeking became even more evident in the intermediate region <inline-formula><mml:math id="inf130"><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:mn>2</mml:mn><mml:mo>&lt;</mml:mo><mml:mi>α</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mi>β</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn><mml:mo stretchy="false">)</mml:mo><mml:mo>&lt;</mml:mo><mml:mn>4</mml:mn></mml:mrow></mml:mstyle></mml:math></inline-formula>. This suggests that merely speeding up or slowing down learning could not satisfactorily account for the qualitative ‘choice shift’ emerging through social influences.</p><p>We obtained similar results across different settings of the multi-armed bandit task, such as a skewed payoff distribution in which either large or small payoffs were randomly drawn from a Bernoulli process (<xref ref-type="bibr" rid="bib46">March, 1996</xref>; <xref ref-type="bibr" rid="bib21">Denrell, 2007</xref>; <xref ref-type="fig" rid="fig1s4">Figure 1—figure supplement 4</xref>) and an increased number of options (<xref ref-type="fig" rid="fig1s5">Figure 1—figure supplement 5</xref>). Further, the conclusion still held for an alternative model in which social influences modified the belief-updating process (the value-shaping model; <xref ref-type="bibr" rid="bib53">Najar et al., 2020</xref>) rather than directly influencing the choice probability (the decision-biasing model) as assumed in the main text thus far (see Supplementary Methods; <xref ref-type="fig" rid="fig1s2">Figure 1—figure supplement 2</xref>). One could derive many other more complex social learning processes that may operate in reality; however, a comprehensive search of the possible model space is beyond the scope of the present study. 
Nevertheless, decision biasing fit our behavioural experimental data better than value shaping (<xref ref-type="fig" rid="fig6s2">Figure 6—figure supplement 2</xref>), leading us to focus our analysis on the decision-biasing model.</p></sec><sec id="s2-5"><title>The robustness of individual heterogeneity</title><p>We have thus far assumed no parameter variations across individuals in a group to focus on the qualitative differences between social and asocial learners’ behaviour. However, individual differences in development, state, or experience, or variations in behaviour caused by personality traits, might either facilitate or undermine collective decision performance. In particular, if a group is composed of both types of individuals, those who are less susceptible to the hot stove effect (<inline-formula><mml:math id="inf131"><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:mi>α</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mi>β</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn><mml:mo stretchy="false">)</mml:mo><mml:mo>&lt;</mml:mo><mml:mn>2</mml:mn></mml:mrow></mml:mstyle></mml:math></inline-formula>) as well as those who are more susceptible (<inline-formula><mml:math id="inf132"><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:mi>α</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mi>β</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn><mml:mo stretchy="false">)</mml:mo><mml:mo>&gt;</mml:mo><mml:mn>2</mml:mn></mml:mrow></mml:mstyle></mml:math></inline-formula>), it remains unclear who benefits from the rescue effect: Is it only those individuals with <inline-formula><mml:math id="inf133"><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:mi>α</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mi>β</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn><mml:mo stretchy="false">)</mml:mo><mml:mo>&gt;</mml:mo><mml:mn>2</mml:mn></mml:mrow></mml:mstyle></mml:math></inline-formula> who enjoy the benefit, or can collective intelligence benefit a group as a 
whole? For the sake of simplicity, here we considered groups of five individuals, which were composed of either homogeneous (yellow in <xref ref-type="fig" rid="fig3">Figure 3</xref>) or heterogeneous (green, blue, purple in <xref ref-type="fig" rid="fig3">Figure 3</xref>) individuals. Individual values of a focal behavioural parameter were varied across individuals in a group. Other non-focal parameters were identical across individuals within a group. The basic parameter values assigned to non-focal parameters were <inline-formula><mml:math id="inf134"><mml:mrow><mml:mi>α</mml:mi><mml:mo>=</mml:mo><mml:mn>0.5</mml:mn></mml:mrow></mml:math></inline-formula>, <inline-formula><mml:math id="inf135"><mml:mrow><mml:mi>β</mml:mi><mml:mo>=</mml:mo><mml:mn>7</mml:mn></mml:mrow></mml:math></inline-formula>, <inline-formula><mml:math id="inf136"><mml:mrow><mml:mi>σ</mml:mi><mml:mo>=</mml:mo><mml:mn>0.3</mml:mn></mml:mrow></mml:math></inline-formula>, and <inline-formula><mml:math id="inf137"><mml:mrow><mml:mi>θ</mml:mi><mml:mo>=</mml:mo><mml:mn>2</mml:mn></mml:mrow></mml:math></inline-formula>, which were chosen so that the homogeneous group could generate the collective rescue effect. The groups’ mean values of the various focal parameters were matched to these basic values.</p><fig id="fig3" position="float"><label>Figure 3.</label><caption><title>The effect of individual heterogeneity on the proportion of choosing the risky option in the two-armed bandit task.</title><p>(<bold>a</bold>) The effect of heterogeneity of <inline-formula><mml:math id="inf138"><mml:mi>α</mml:mi></mml:math></inline-formula>, (<bold>b</bold>) <inline-formula><mml:math id="inf139"><mml:mi>β</mml:mi></mml:math></inline-formula>, (<bold>c</bold>) <inline-formula><mml:math id="inf140"><mml:mi>σ</mml:mi></mml:math></inline-formula>, and (<bold>d</bold>) <inline-formula><mml:math id="inf141"><mml:mi>θ</mml:mi></mml:math></inline-formula>. 
Individual values of a focal behavioural parameter were varied across individuals in a group of five. Other non-focal parameters were identical across individuals within a group. The basic parameter values assigned to non-focal parameters were <inline-formula><mml:math id="inf142"><mml:mrow><mml:mi>α</mml:mi><mml:mo>=</mml:mo><mml:mn>0.5</mml:mn></mml:mrow></mml:math></inline-formula>, <inline-formula><mml:math id="inf143"><mml:mrow><mml:mi>β</mml:mi><mml:mo>=</mml:mo><mml:mn>7</mml:mn></mml:mrow></mml:math></inline-formula>, <inline-formula><mml:math id="inf144"><mml:mrow><mml:mi>σ</mml:mi><mml:mo>=</mml:mo><mml:mn>0.3</mml:mn></mml:mrow></mml:math></inline-formula>, and <inline-formula><mml:math id="inf145"><mml:mrow><mml:mi>θ</mml:mi><mml:mo>=</mml:mo><mml:mn>2</mml:mn></mml:mrow></mml:math></inline-formula>, and groups’ mean values of the various focal parameters were matched to these basic values. We simulated 3 different heterogeneous compositions: The majority (3 of 5 individuals) potentially suffered the hot stove effect <inline-formula><mml:math id="inf146"><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:msub><mml:mi>α</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo stretchy="false">(</mml:mo><mml:msub><mml:mi>β</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:mn>1</mml:mn><mml:mo stretchy="false">)</mml:mo><mml:mo>></mml:mo><mml:mn>2</mml:mn></mml:mrow></mml:mstyle></mml:math></inline-formula> (<bold>a, b</bold>) or had the highest diversity in social learning parameters (c, d; purple); the majority were able to overcome the hot stove effect <inline-formula><mml:math id="inf147"><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:msub><mml:mi>α</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo stretchy="false">(</mml:mo><mml:msub><mml:mi>β</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:mn>1</mml:mn><mml:mo 
stretchy="false">)</mml:mo><mml:mo>&lt;</mml:mo><mml:mn>2</mml:mn></mml:mrow></mml:mstyle></mml:math></inline-formula> (<bold>a, b</bold>) or had moderate heterogeneity in the social learning parameters (c, d; blue); and all individuals had <inline-formula><mml:math id="inf148"><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:msub><mml:mi>α</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo stretchy="false">(</mml:mo><mml:msub><mml:mi>β</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:mn>1</mml:mn><mml:mo stretchy="false">)</mml:mo><mml:mo>&gt;</mml:mo><mml:mn>2</mml:mn></mml:mrow></mml:mstyle></mml:math></inline-formula> but smaller heterogeneity (green). The yellow diamond shows the homogeneous groups’ performance. Lines are drawn through average results across the same compositional groups. Each round dot represents a group member’s mean performance. The diamonds are the average performance of each group for each composition category. For comparison, asocial learners’ performance, with which the performance of social learners can be evaluated, is shown in grey. For heterogeneous <inline-formula><mml:math id="inf149"><mml:mi>α</mml:mi></mml:math></inline-formula> and <inline-formula><mml:math id="inf150"><mml:mi>β</mml:mi></mml:math></inline-formula>, the analytical solution of asocial learning performance is shown as a solid-line curve. We ran 20,000 replications for each group composition.</p></caption><graphic mime-subtype="jpeg" mimetype="image" xlink:href="elife-75308.xml.media/fig3.jpg"/></fig><p><xref ref-type="fig" rid="fig3">Figure 3a</xref> shows the effect of heterogeneity in the learning rate (<inline-formula><mml:math id="inf151"><mml:mi>α</mml:mi></mml:math></inline-formula>). Heterogeneous groups performed better on average than a homogeneous group (represented by the yellow diamond). 
The heterogeneous groups owed this overall improvement to the large rescue effect operating for individuals who had a high susceptibility to the hot stove effect (<inline-formula><mml:math id="inf152"><mml:mrow><mml:mrow><mml:mi>α</mml:mi><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>β</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mo>≫</mml:mo><mml:mn>2</mml:mn></mml:mrow></mml:math></inline-formula>). On the other hand, the performance of less susceptible individuals (<inline-formula><mml:math id="inf153"><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:mi>α</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mi>β</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn><mml:mo stretchy="false">)</mml:mo><mml:mo>&lt;</mml:mo><mml:mn>2</mml:mn></mml:mrow></mml:mstyle></mml:math></inline-formula>) was slightly undermined compared to the asocial benchmark performance shown in grey. Notably, however, how large the detrimental effect was for the low-susceptibility individuals depended on the group’s composition: The undermining effect was largely mitigated when low-susceptibility individuals (<inline-formula><mml:math id="inf154"><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:mi>α</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mi>β</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn><mml:mo stretchy="false">)</mml:mo><mml:mo>&lt;</mml:mo><mml:mn>2</mml:mn></mml:mrow></mml:mstyle></mml:math></inline-formula>) made up a majority of a group (3 of 5; the blue line), whereas they performed worse than the asocial benchmark when the majority were those with high susceptibility (purple).</p><p>The advantage of a heterogeneous group was also found for the inverse temperature (<inline-formula><mml:math id="inf155"><mml:mi>β</mml:mi></mml:math></inline-formula>), although the impact of the group’s heterogeneity was much smaller than that for <inline-formula><mml:math 
id="inf156"><mml:mi>α</mml:mi></mml:math></inline-formula> (<xref ref-type="fig" rid="fig3">Figure 3b</xref>). Interestingly, no detrimental effect for individuals with <inline-formula><mml:math id="inf157"><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:mi>α</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mi>β</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn><mml:mo stretchy="false">)</mml:mo><mml:mo>&lt;</mml:mo><mml:mn>2</mml:mn></mml:mrow></mml:mstyle></mml:math></inline-formula> was found in association with the <inline-formula><mml:math id="inf158"><mml:mi>β</mml:mi></mml:math></inline-formula> variations.</p><p>On the other hand, individual variations in the copying weight (<inline-formula><mml:math id="inf159"><mml:mi>σ</mml:mi></mml:math></inline-formula>) had an overall detrimental effect on collective performance, although individuals in the highest diversity group could still perform better than the asocial learners (<xref ref-type="fig" rid="fig3">Figure 3c</xref>). Individuals who had an intermediate level of <inline-formula><mml:math id="inf160"><mml:mi>σ</mml:mi></mml:math></inline-formula> achieved relatively higher performance within the group than those who had either higher or lower <inline-formula><mml:math id="inf161"><mml:mi>σ</mml:mi></mml:math></inline-formula>. This was because individuals with lower <inline-formula><mml:math id="inf162"><mml:mi>σ</mml:mi></mml:math></inline-formula> could benefit less from social information, while those with higher <inline-formula><mml:math id="inf163"><mml:mi>σ</mml:mi></mml:math></inline-formula> relied so heavily on social frequency information that behaviour was barely informed by individual learning, resulting in maladaptive herding or collective illusion (<xref ref-type="bibr" rid="bib23">Denrell and Le Mens, 2017</xref>; <xref ref-type="bibr" rid="bib73">Toyokawa et al., 2019</xref>). 
As a result, the average performance decreased with increasing diversity in <inline-formula><mml:math id="inf164"><mml:mi>σ</mml:mi></mml:math></inline-formula>.</p><p>Such a substantial effect of individual differences was not observed in the conformity exponent <inline-formula><mml:math id="inf165"><mml:mi>θ</mml:mi></mml:math></inline-formula> (<xref ref-type="fig" rid="fig3">Figure 3d</xref>), where individual performance was almost stable regardless of whether the individual was heavily conformist (<inline-formula><mml:math id="inf166"><mml:mrow><mml:msub><mml:mi>θ</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mn>8</mml:mn></mml:mrow></mml:math></inline-formula>) or even negatively dependent on social information (<inline-formula><mml:math id="inf167"><mml:mrow><mml:msub><mml:mi>θ</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mrow><mml:mo>-</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:mrow></mml:math></inline-formula>). The existence of a few conformists in a group could not itself trigger positive feedback among the group unless other individuals also relied on social information in a conformist-biased way, because the flexible behaviour of non-conformists could keep the group’s distribution nearly flat (i.e. <inline-formula><mml:math id="inf168"><mml:mrow><mml:msub><mml:mi>N</mml:mi><mml:mi>s</mml:mi></mml:msub><mml:mo>≈</mml:mo><mml:msub><mml:mi>N</mml:mi><mml:mi>r</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>). 
Therefore, the existence of individuals with small <inline-formula><mml:math id="inf169"><mml:mi>θ</mml:mi></mml:math></inline-formula> in a heterogeneous group could prevent the strong positive feedback from being immediately elicited, compensating for the potential detrimental effect of maladaptive herding by strong conformists.</p><p>Overall, the relaxation of, and possibly the complete rescue from, suboptimal risk aversion in repeated risky decision making emerged under a range of conditions in collective learning. It was unlikely to be a mere speeding up or slowing down of the learning process (<xref ref-type="fig" rid="fig2s1">Figure 2—figure supplement 1</xref> and <xref ref-type="fig" rid="fig2s2">Figure 2—figure supplement 2</xref>), nor just an averaging process mixing the performances of risk seekers and risk-averse individuals (<xref ref-type="fig" rid="fig3">Figure 3</xref>). It depended neither on specific characteristics of social learning models (<xref ref-type="fig" rid="fig1s2">Figure 1—figure supplement 2</xref>) nor on the profile of the bandit task’s setups (<xref ref-type="fig" rid="fig1s4">Figure 1—figure supplement 4</xref>). Instead, our simulation suggests that self-organisation may play a key role in this emergent phenomenon. 
To seek a general mechanism underlying the observed collective behavioural rescue, in the next section we show a reduced, approximated differential equation model that can provide qualitative insights into the collective decision-making dynamics observed above.</p></sec><sec id="s2-6"><title>The simplified population dynamics model</title><p>To obtain a qualitative understanding of self-organisation that seems responsible for the pattern of adaptive behavioural shift observed in our individual-based simulation, we made a reduced model that approximates temporal changes of behaviour of an ‘average’ individual, or in other words, average dynamics of a population of multiple individuals, where the computational details of reinforcement learning were purposely ignored. Such a dynamic modelling approach has been commonly used in population ecology and collective animal behaviour research and has proven highly useful in disentangling the factors underlying complex systems (e.g. <xref ref-type="bibr" rid="bib9">Beckers et al., 1990</xref>; <xref ref-type="bibr" rid="bib30">Goss et al., 1989</xref>; <xref ref-type="bibr" rid="bib62">Seeley et al., 1991</xref>; <xref ref-type="bibr" rid="bib66">Sumpter and Pratt, 2003</xref>; <xref ref-type="bibr" rid="bib33">Harrison et al., 2001</xref>).</p><p>Specifically, we considered a differential equation that focuses only on increases and decreases in the number of individuals who are choosing the risky option (<inline-formula><mml:math id="inf170"><mml:msub><mml:mi>N</mml:mi><mml:mi>R</mml:mi></mml:msub></mml:math></inline-formula>) and the safe option (<inline-formula><mml:math id="inf171"><mml:msub><mml:mi>N</mml:mi><mml:mi>S</mml:mi></mml:msub></mml:math></inline-formula>) with either a positive (+) or a negative (-) ‘attitude’ (or preference) towards the risky option (<xref ref-type="fig" rid="fig4">Figure 4a</xref>). 
The part of the population that has a positive attitude (<inline-formula><mml:math id="inf172"><mml:msubsup><mml:mi>N</mml:mi><mml:mi>S</mml:mi><mml:mo>+</mml:mo></mml:msubsup></mml:math></inline-formula> and <inline-formula><mml:math id="inf173"><mml:msubsup><mml:mi>N</mml:mi><mml:mi>R</mml:mi><mml:mo>+</mml:mo></mml:msubsup></mml:math></inline-formula>) is more likely to move on to, and stay at, the risky option, whereas the other part of the population that has a negative attitude (<inline-formula><mml:math id="inf174"><mml:msubsup><mml:mi>N</mml:mi><mml:mi>S</mml:mi><mml:mo>-</mml:mo></mml:msubsup></mml:math></inline-formula> and <inline-formula><mml:math id="inf175"><mml:msubsup><mml:mi>N</mml:mi><mml:mi>R</mml:mi><mml:mo>-</mml:mo></mml:msubsup></mml:math></inline-formula>) is more likely to move on to, and stay at, the safe option. Note that movements in the opposite direction also exist, such as moving on to the risky option when having a negative attitude (<inline-formula><mml:math id="inf176"><mml:msubsup><mml:mi>P</mml:mi><mml:mi>R</mml:mi><mml:mo>-</mml:mo></mml:msubsup></mml:math></inline-formula>), but at a lower rate than <inline-formula><mml:math id="inf177"><mml:msubsup><mml:mi>P</mml:mi><mml:mi>S</mml:mi><mml:mo>-</mml:mo></mml:msubsup></mml:math></inline-formula>, depicted by the thickness of the arrows in <xref ref-type="fig" rid="fig4">Figure 4a</xref>. 
We assumed that the probability of moving towards the option that matched an individual’s attitude (<inline-formula><mml:math id="inf178"><mml:mrow><mml:msubsup><mml:mi>P</mml:mi><mml:mi>S</mml:mi><mml:mo>-</mml:mo></mml:msubsup><mml:mo>=</mml:mo><mml:msubsup><mml:mi>P</mml:mi><mml:mi>R</mml:mi><mml:mo>+</mml:mo></mml:msubsup><mml:mo>=</mml:mo><mml:msub><mml:mi>p</mml:mi><mml:mi>h</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>) was higher than that of moving in the opposite direction (<inline-formula><mml:math id="inf179"><mml:mrow><mml:msubsup><mml:mi>P</mml:mi><mml:mi>R</mml:mi><mml:mo>-</mml:mo></mml:msubsup><mml:mo>=</mml:mo><mml:msubsup><mml:mi>P</mml:mi><mml:mi>S</mml:mi><mml:mo>+</mml:mo></mml:msubsup><mml:mo>=</mml:mo><mml:msub><mml:mi>p</mml:mi><mml:mi>l</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>), that is, <inline-formula><mml:math id="inf180"><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:msub><mml:mi>p</mml:mi><mml:mrow><mml:mi>h</mml:mi></mml:mrow></mml:msub><mml:mo>&gt;</mml:mo><mml:msub><mml:mi>p</mml:mi><mml:mrow><mml:mi>l</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:mstyle></mml:math></inline-formula>. The probabilities <inline-formula><mml:math id="inf181"><mml:msub><mml:mi>p</mml:mi><mml:mi>l</mml:mi></mml:msub></mml:math></inline-formula> and <inline-formula><mml:math id="inf182"><mml:msub><mml:mi>p</mml:mi><mml:mi>h</mml:mi></mml:msub></mml:math></inline-formula> can be regarded approximately as the per capita rates of exploration and exploitation, respectively.</p><fig-group><fig id="fig4" position="float"><label>Figure 4.</label><caption><title>The population dynamics model.</title><p>(<bold>a</bold>) A schematic diagram of the dynamics. Solid arrows represent a change in population density between connected states at a time step. The thicker the arrow, the larger the per-capita rate of behavioural change. 
(<bold>b, c</bold>) The results of the asocial, baseline model where <inline-formula><mml:math id="inf183"><mml:mrow><mml:msubsup><mml:mi>P</mml:mi><mml:mi>S</mml:mi><mml:mo>-</mml:mo></mml:msubsup><mml:mo>=</mml:mo><mml:msubsup><mml:mi>P</mml:mi><mml:mi>R</mml:mi><mml:mo>+</mml:mo></mml:msubsup><mml:mo>=</mml:mo><mml:msub><mml:mi>p</mml:mi><mml:mi>h</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="inf184"><mml:mrow><mml:msubsup><mml:mi>P</mml:mi><mml:mi>R</mml:mi><mml:mo>-</mml:mo></mml:msubsup><mml:mo>=</mml:mo><mml:msubsup><mml:mi>P</mml:mi><mml:mi>S</mml:mi><mml:mo>+</mml:mo></mml:msubsup><mml:mo>=</mml:mo><mml:msub><mml:mi>p</mml:mi><mml:mi>l</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> (<inline-formula><mml:math id="inf185"><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:msub><mml:mi>p</mml:mi><mml:mrow><mml:mi>h</mml:mi></mml:mrow></mml:msub><mml:mo>&gt;</mml:mo><mml:msub><mml:mi>p</mml:mi><mml:mrow><mml:mi>l</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:mstyle></mml:math></inline-formula>). Both panels show the equilibrium bias towards risk seeking (i.e. <inline-formula><mml:math id="inf186"><mml:mrow><mml:msubsup><mml:mi>N</mml:mi><mml:mi>R</mml:mi><mml:mo>*</mml:mo></mml:msubsup><mml:mo>-</mml:mo><mml:msubsup><mml:mi>N</mml:mi><mml:mi>S</mml:mi><mml:mo>*</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula>) as a function of the degree of risk premium <inline-formula><mml:math id="inf187"><mml:mi>e</mml:mi></mml:math></inline-formula> as well as of the per-capita probability of moving to the less preferred behavioural option <inline-formula><mml:math id="inf188"><mml:msub><mml:mi>p</mml:mi><mml:mi>l</mml:mi></mml:msub></mml:math></inline-formula>. 
(<bold>b</bold>) The explicit form of the curve is given by <inline-formula><mml:math id="inf189"><mml:mrow><mml:mo>-</mml:mo><mml:mrow><mml:mrow><mml:mrow><mml:mi>n</mml:mi><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mi>p</mml:mi><mml:mi>h</mml:mi></mml:msub><mml:mo>-</mml:mo><mml:msub><mml:mi>p</mml:mi><mml:mi>l</mml:mi></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>⁢</mml:mo><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mn>1</mml:mn><mml:mo>-</mml:mo><mml:mi>e</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>⁢</mml:mo><mml:msub><mml:mi>p</mml:mi><mml:mi>h</mml:mi></mml:msub></mml:mrow><mml:mo>-</mml:mo><mml:mrow><mml:mi>e</mml:mi><mml:mo>⁢</mml:mo><mml:msub><mml:mi>p</mml:mi><mml:mi>l</mml:mi></mml:msub></mml:mrow></mml:mrow><mml:mo>}</mml:mo></mml:mrow></mml:mrow><mml:mo>/</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mi>p</mml:mi><mml:mi>h</mml:mi></mml:msub><mml:mo>+</mml:mo><mml:msub><mml:mi>p</mml:mi><mml:mi>l</mml:mi></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mo>⁢</mml:mo><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mn>1</mml:mn><mml:mo>-</mml:mo><mml:mi>e</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>⁢</mml:mo><mml:msub><mml:mi>p</mml:mi><mml:mi>h</mml:mi></mml:msub></mml:mrow><mml:mo>+</mml:mo><mml:mrow><mml:mi>e</mml:mi><mml:mo>⁢</mml:mo><mml:msub><mml:mi>p</mml:mi><mml:mi>l</mml:mi></mml:msub></mml:mrow></mml:mrow><mml:mo>}</mml:mo></mml:mrow></mml:mrow></mml:mrow></mml:math></inline-formula>. 
(<bold>c</bold>) The dashed curve is the analytically derived neutral equilibrium of the asocial system that results in <inline-formula><mml:math id="inf190"><mml:mrow><mml:msubsup><mml:mi>N</mml:mi><mml:mi>R</mml:mi><mml:mo>*</mml:mo></mml:msubsup><mml:mo>=</mml:mo><mml:msubsup><mml:mi>N</mml:mi><mml:mi>S</mml:mi><mml:mo>*</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula>, given by <inline-formula><mml:math id="inf191"><mml:mrow><mml:mi>e</mml:mi><mml:mo>=</mml:mo><mml:mrow><mml:msub><mml:mi>p</mml:mi><mml:mi>h</mml:mi></mml:msub><mml:mo>/</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mi>p</mml:mi><mml:mi>h</mml:mi></mml:msub><mml:mo>+</mml:mo><mml:msub><mml:mi>p</mml:mi><mml:mi>l</mml:mi></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:mrow></mml:math></inline-formula>. (<bold>d</bold>) The equilibrium of the collective behavioural dynamics with social influences. The numerical results were obtained with <inline-formula><mml:math id="inf192"><mml:mrow><mml:msubsup><mml:mi>N</mml:mi><mml:mrow><mml:mrow><mml:mi>S</mml:mi><mml:mo>,</mml:mo><mml:mi>t</mml:mi></mml:mrow><mml:mo>=</mml:mo><mml:mn>0</mml:mn></mml:mrow><mml:mo>-</mml:mo></mml:msubsup><mml:mo>=</mml:mo><mml:msubsup><mml:mi>N</mml:mi><mml:mrow><mml:mrow><mml:mi>S</mml:mi><mml:mo>,</mml:mo><mml:mi>t</mml:mi></mml:mrow><mml:mo>=</mml:mo><mml:mn>0</mml:mn></mml:mrow><mml:mo>+</mml:mo></mml:msubsup><mml:mo>=</mml:mo><mml:mn>5</mml:mn></mml:mrow></mml:math></inline-formula>, <inline-formula><mml:math id="inf193"><mml:mrow><mml:msub><mml:mi>N</mml:mi><mml:mrow><mml:mrow><mml:mi>R</mml:mi><mml:mo>,</mml:mo><mml:mi>t</mml:mi></mml:mrow><mml:mo>=</mml:mo><mml:mn>0</mml:mn></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mn>10</mml:mn></mml:mrow></mml:math></inline-formula>, and <inline-formula><mml:math 
id="inf194"><mml:mrow><mml:msub><mml:mi>p</mml:mi><mml:mi>h</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mn>0.7</mml:mn></mml:mrow></mml:math></inline-formula>.</p></caption><graphic mime-subtype="jpeg" mimetype="image" xlink:href="elife-75308.xml.media/fig4.jpg"/></fig><fig id="fig4s1" position="float" specific-use="child-fig"><label>Figure 4—figure supplement 1.</label><caption><title>The result of the differential equation model.</title><p>The effect of both the per capita probability of exploration <inline-formula><mml:math id="inf195"><mml:msub><mml:mi>p</mml:mi><mml:mi>l</mml:mi></mml:msub></mml:math></inline-formula> and <inline-formula><mml:math id="inf196"><mml:mi>e</mml:mi></mml:math></inline-formula> (i.e. the ratio of individuals who prefer behavioural state <inline-formula><mml:math id="inf197"><mml:mi>R</mml:mi></mml:math></inline-formula>) on the equilibrium degree of risk seeking (i.e. <inline-formula><mml:math id="inf198"><mml:mrow><mml:msubsup><mml:mi>N</mml:mi><mml:mi>R</mml:mi><mml:mo>*</mml:mo></mml:msubsup><mml:mo>-</mml:mo><mml:msubsup><mml:mi>N</mml:mi><mml:mi>S</mml:mi><mml:mo>*</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula>), across the different combinations of social influence parameters. Different social influence weights are shown from top to bottom (<inline-formula><mml:math id="inf199"><mml:mrow><mml:mi>σ</mml:mi><mml:mo>∈</mml:mo><mml:mrow><mml:mo stretchy="false">{</mml:mo><mml:mn>0</mml:mn><mml:mo>,</mml:mo><mml:mn>0.25</mml:mn><mml:mo>,</mml:mo><mml:mn>0.5</mml:mn><mml:mo>,</mml:mo><mml:mn>0.75</mml:mn><mml:mo stretchy="false">}</mml:mo></mml:mrow></mml:mrow></mml:math></inline-formula>). 
Different conformity exponents are shown from left to right (<inline-formula><mml:math id="inf200"><mml:mrow><mml:mi>θ</mml:mi><mml:mo>∈</mml:mo><mml:mrow><mml:mo stretchy="false">{</mml:mo><mml:mn>1</mml:mn><mml:mo>,</mml:mo><mml:mn>2</mml:mn><mml:mo>,</mml:mo><mml:mn>10</mml:mn><mml:mo stretchy="false">}</mml:mo></mml:mrow></mml:mrow></mml:math></inline-formula>). The dashed curve is <inline-formula><mml:math id="inf201"><mml:mrow><mml:mi>e</mml:mi><mml:mo>=</mml:mo><mml:mrow><mml:msub><mml:mi>p</mml:mi><mml:mi>h</mml:mi></mml:msub><mml:mo>/</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mi>p</mml:mi><mml:mi>h</mml:mi></mml:msub><mml:mo>+</mml:mo><mml:msub><mml:mi>p</mml:mi><mml:mi>l</mml:mi></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:mrow></mml:math></inline-formula>. The numeric solution was obtained with conditions <inline-formula><mml:math id="inf202"><mml:mrow><mml:msubsup><mml:mi>N</mml:mi><mml:mrow><mml:mrow><mml:mi>S</mml:mi><mml:mo>,</mml:mo><mml:mi>t</mml:mi></mml:mrow><mml:mo>=</mml:mo><mml:mn>0</mml:mn></mml:mrow><mml:mo>-</mml:mo></mml:msubsup><mml:mo>=</mml:mo><mml:msubsup><mml:mi>N</mml:mi><mml:mrow><mml:mrow><mml:mi>S</mml:mi><mml:mo>,</mml:mo><mml:mi>t</mml:mi></mml:mrow><mml:mo>=</mml:mo><mml:mn>0</mml:mn></mml:mrow><mml:mo>+</mml:mo></mml:msubsup><mml:mo>=</mml:mo><mml:mn>5</mml:mn></mml:mrow></mml:math></inline-formula>, <inline-formula><mml:math id="inf203"><mml:mrow><mml:msub><mml:mi>N</mml:mi><mml:mrow><mml:mrow><mml:mi>R</mml:mi><mml:mo>,</mml:mo><mml:mi>t</mml:mi></mml:mrow><mml:mo>=</mml:mo><mml:mn>0</mml:mn></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mn>10</mml:mn></mml:mrow></mml:math></inline-formula>, and <inline-formula><mml:math id="inf204"><mml:mrow><mml:msub><mml:mi>p</mml:mi><mml:mi>h</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mn>0.7</mml:mn></mml:mrow></mml:math></inline-formula>.</p></caption><graphic mime-subtype="jpeg" mimetype="image" 
xlink:href="elife-75308.xml.media/fig4-figsupp1.jpg"/></fig></fig-group><p>An attitude can change when the risky option is chosen. We assumed that a proportion <inline-formula><mml:math id="inf205"><mml:mi>e</mml:mi></mml:math></inline-formula> (<inline-formula><mml:math id="inf206"><mml:mrow><mml:mn>0</mml:mn><mml:mo>≤</mml:mo><mml:mi>e</mml:mi><mml:mo>≤</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:math></inline-formula>) of the risk-taking part of the population would have a good experience, thereby holding a positive attitude (i.e. <inline-formula><mml:math id="inf207"><mml:mrow><mml:msubsup><mml:mi>N</mml:mi><mml:mi>R</mml:mi><mml:mo>+</mml:mo></mml:msubsup><mml:mo>=</mml:mo><mml:mrow><mml:mi>e</mml:mi><mml:mo>⁢</mml:mo><mml:msub><mml:mi>N</mml:mi><mml:mi>R</mml:mi></mml:msub></mml:mrow></mml:mrow></mml:math></inline-formula>). The rest of the risk-taking population would hold a negative attitude (i.e. <inline-formula><mml:math id="inf208"><mml:mrow><mml:msubsup><mml:mi>N</mml:mi><mml:mi>R</mml:mi><mml:mo>-</mml:mo></mml:msubsup><mml:mo>=</mml:mo><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mn>1</mml:mn><mml:mo>-</mml:mo><mml:mi>e</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>⁢</mml:mo><mml:msub><mml:mi>N</mml:mi><mml:mi>R</mml:mi></mml:msub></mml:mrow></mml:mrow></mml:math></inline-formula>). This proportion <inline-formula><mml:math id="inf209"><mml:mi>e</mml:mi></mml:math></inline-formula> can be interpreted as an approximation of the risk premium under the Gaussian noise of risk, because the larger <inline-formula><mml:math id="inf210"><mml:mi>e</mml:mi></mml:math></inline-formula> is, the more individuals are expected to have a better experience with the risky option than with the safe one. 
The full details are shown in the Materials and methods (<xref ref-type="table" rid="table2">Table 2</xref>).</p><table-wrap id="table2" position="float"><label>Table 2.</label><caption><title>Summary of the differential equation model parameters.</title></caption><table frame="hsides" rules="groups"><thead><tr><th valign="bottom">Symbol</th><th valign="bottom">Meaning</th><th valign="bottom">Range of the value</th></tr></thead><tbody><tr><td align="left" valign="bottom"><inline-formula><mml:math id="inf211"><mml:msubsup><mml:mi>N</mml:mi><mml:mi>R</mml:mi><mml:mo>+</mml:mo></mml:msubsup></mml:math></inline-formula></td><td align="left" valign="bottom">Density of individuals choosing <inline-formula><mml:math id="inf212"><mml:mi>R</mml:mi></mml:math></inline-formula> and preferring <inline-formula><mml:math id="inf213"><mml:mi>R</mml:mi></mml:math></inline-formula></td><td align="left" valign="bottom"><inline-formula><mml:math id="inf214"><mml:mrow><mml:msubsup><mml:mi>N</mml:mi><mml:mi>R</mml:mi><mml:mo>+</mml:mo></mml:msubsup><mml:mo>=</mml:mo><mml:mrow><mml:mi>e</mml:mi><mml:mo>⁢</mml:mo><mml:msub><mml:mi>N</mml:mi><mml:mi>R</mml:mi></mml:msub></mml:mrow></mml:mrow></mml:math></inline-formula></td></tr><tr><td align="left" valign="bottom"><inline-formula><mml:math id="inf215"><mml:msubsup><mml:mi>N</mml:mi><mml:mi>R</mml:mi><mml:mo>-</mml:mo></mml:msubsup></mml:math></inline-formula></td><td align="left" valign="bottom">Density of individuals choosing <inline-formula><mml:math id="inf216"><mml:mi>R</mml:mi></mml:math></inline-formula> and preferring <inline-formula><mml:math id="inf217"><mml:mi>S</mml:mi></mml:math></inline-formula></td><td align="left" valign="bottom"><inline-formula><mml:math id="inf218"><mml:mrow><mml:msubsup><mml:mi>N</mml:mi><mml:mi>R</mml:mi><mml:mo>-</mml:mo></mml:msubsup><mml:mo>=</mml:mo><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mn>1</mml:mn><mml:mo>-</mml:mo><mml:mi>e</mml:mi></mml:mrow><mml:mo 
stretchy="false">)</mml:mo></mml:mrow><mml:mo>⁢</mml:mo><mml:msub><mml:mi>N</mml:mi><mml:mi>R</mml:mi></mml:msub></mml:mrow></mml:mrow></mml:math></inline-formula></td></tr><tr><td align="left" valign="bottom"><inline-formula><mml:math id="inf219"><mml:msubsup><mml:mi>N</mml:mi><mml:mi>S</mml:mi><mml:mo>+</mml:mo></mml:msubsup></mml:math></inline-formula></td><td align="left" valign="bottom">Density of individuals choosing <inline-formula><mml:math id="inf220"><mml:mi>S</mml:mi></mml:math></inline-formula> and preferring <inline-formula><mml:math id="inf221"><mml:mi>R</mml:mi></mml:math></inline-formula></td><td align="left" valign="bottom"/></tr><tr><td align="left" valign="bottom"><inline-formula><mml:math id="inf222"><mml:msubsup><mml:mi>N</mml:mi><mml:mi>S</mml:mi><mml:mo>-</mml:mo></mml:msubsup></mml:math></inline-formula></td><td align="left" valign="bottom">Density of individuals choosing <inline-formula><mml:math id="inf223"><mml:mi>S</mml:mi></mml:math></inline-formula> and preferring <inline-formula><mml:math id="inf224"><mml:mi>S</mml:mi></mml:math></inline-formula></td><td align="left" valign="bottom"/></tr><tr><td align="left" valign="bottom"><inline-formula><mml:math id="inf225"><mml:msub><mml:mi>p</mml:mi><mml:mi>l</mml:mi></mml:msub></mml:math></inline-formula></td><td align="left" valign="bottom">Per capita rate of moving to the unfavourable option</td><td align="left" valign="bottom"><inline-formula><mml:math id="inf226"><mml:mrow><mml:mn>0</mml:mn><mml:mo>≤</mml:mo><mml:msub><mml:mi>p</mml:mi><mml:mi>l</mml:mi></mml:msub><mml:mo>≤</mml:mo><mml:msub><mml:mi>p</mml:mi><mml:mi>h</mml:mi></mml:msub><mml:mo>≤</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:math></inline-formula></td></tr><tr><td align="left" valign="bottom"><inline-formula><mml:math id="inf227"><mml:msub><mml:mi>p</mml:mi><mml:mi>h</mml:mi></mml:msub></mml:math></inline-formula></td><td align="left" valign="bottom">Per capita rate of moving to the favourable option</td><td align="left" 
valign="bottom"><inline-formula><mml:math id="inf228"><mml:mrow><mml:mn>0</mml:mn><mml:mo>≤</mml:mo><mml:msub><mml:mi>p</mml:mi><mml:mi>l</mml:mi></mml:msub><mml:mo>≤</mml:mo><mml:msub><mml:mi>p</mml:mi><mml:mi>h</mml:mi></mml:msub><mml:mo>≤</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:math></inline-formula></td></tr><tr><td align="left" valign="bottom"><inline-formula><mml:math id="inf229"><mml:mi>e</mml:mi></mml:math></inline-formula></td><td align="left" valign="bottom">Per capita rate of becoming enchanted with the risky option</td><td align="left" valign="bottom"><inline-formula><mml:math id="inf230"><mml:mrow><mml:mo stretchy="false">[</mml:mo><mml:mn>0</mml:mn><mml:mo>,</mml:mo><mml:mn>1</mml:mn><mml:mo stretchy="false">]</mml:mo></mml:mrow></mml:math></inline-formula></td></tr><tr><td align="left" valign="bottom"><inline-formula><mml:math id="inf231"><mml:mi>σ</mml:mi></mml:math></inline-formula></td><td align="left" valign="bottom">Social influence weight</td><td align="left" valign="bottom"><inline-formula><mml:math id="inf232"><mml:mrow><mml:mo stretchy="false">[</mml:mo><mml:mn>0</mml:mn><mml:mo>,</mml:mo><mml:mn>1</mml:mn><mml:mo stretchy="false">]</mml:mo></mml:mrow></mml:math></inline-formula></td></tr><tr><td align="left" valign="bottom"><inline-formula><mml:math id="inf233"><mml:mi>θ</mml:mi></mml:math></inline-formula></td><td align="left" valign="bottom">Conformity exponent</td><td align="left" valign="bottom"><inline-formula><mml:math id="inf234"><mml:mrow><mml:mo stretchy="false">[</mml:mo><mml:mrow><mml:mo>-</mml:mo><mml:mi mathvariant="normal">∞</mml:mi></mml:mrow><mml:mo>,</mml:mo><mml:mrow><mml:mo>+</mml:mo><mml:mi mathvariant="normal">∞</mml:mi></mml:mrow><mml:mo stretchy="false">]</mml:mo></mml:mrow></mml:math></inline-formula></td></tr></tbody></table></table-wrap><p>To confirm that this approximated model can successfully replicate the fundamental property of the hot stove effect, we first describe the asocial behavioural model without 
social influence. The baseline, asocial dynamic system has a locally stable non-trivial equilibrium that gives <inline-formula><mml:math id="inf235"><mml:mrow><mml:msubsup><mml:mi>N</mml:mi><mml:mi>S</mml:mi><mml:mo>⋆</mml:mo></mml:msubsup><mml:mo>≥</mml:mo><mml:mn>0</mml:mn></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="inf236"><mml:mrow><mml:msubsup><mml:mi>N</mml:mi><mml:mi>R</mml:mi><mml:mo>⋆</mml:mo></mml:msubsup><mml:mo>≥</mml:mo><mml:mn>0</mml:mn></mml:mrow></mml:math></inline-formula>, where <inline-formula><mml:math id="inf237"><mml:msup><mml:mi>N</mml:mi><mml:mo>⋆</mml:mo></mml:msup></mml:math></inline-formula> means the equilibrium density at which the system stops changing (<inline-formula><mml:math id="inf238"><mml:mrow><mml:mrow><mml:mrow><mml:mrow><mml:mi>d</mml:mi><mml:mo>⁢</mml:mo><mml:msubsup><mml:mi>N</mml:mi><mml:mi>S</mml:mi><mml:mo>⋆</mml:mo></mml:msubsup></mml:mrow><mml:mo>/</mml:mo><mml:mi>d</mml:mi></mml:mrow><mml:mo>⁢</mml:mo><mml:mi>t</mml:mi></mml:mrow><mml:mo>=</mml:mo><mml:mrow><mml:mrow><mml:mrow><mml:mi>d</mml:mi><mml:mo>⁢</mml:mo><mml:msubsup><mml:mi>N</mml:mi><mml:mi>R</mml:mi><mml:mo>⋆</mml:mo></mml:msubsup></mml:mrow><mml:mo>/</mml:mo><mml:mi>d</mml:mi></mml:mrow><mml:mo>⁢</mml:mo><mml:mi>t</mml:mi></mml:mrow><mml:mo>=</mml:mo><mml:mn>0</mml:mn></mml:mrow></mml:math></inline-formula>). 
At equilibrium, the ratio between the number of individuals choosing the safe option <inline-formula><mml:math id="inf239"><mml:mi>S</mml:mi></mml:math></inline-formula> and the number choosing the risky option <inline-formula><mml:math id="inf240"><mml:mi>R</mml:mi></mml:math></inline-formula> is given by <inline-formula><mml:math id="inf241"><mml:mrow><mml:msubsup><mml:mi>N</mml:mi><mml:mi>S</mml:mi><mml:mo>⋆</mml:mo></mml:msubsup><mml:mo>:</mml:mo><mml:mrow><mml:msubsup><mml:mi>N</mml:mi><mml:mi>R</mml:mi><mml:mo>⋆</mml:mo></mml:msubsup><mml:mo>=</mml:mo><mml:mrow><mml:mrow><mml:mi>e</mml:mi><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mi>p</mml:mi><mml:mi>l</mml:mi></mml:msub><mml:mo>/</mml:mo><mml:msub><mml:mi>p</mml:mi><mml:mi>h</mml:mi></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mo>+</mml:mo><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mn>1</mml:mn><mml:mo>-</mml:mo><mml:mi>e</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mi>p</mml:mi><mml:mi>h</mml:mi></mml:msub><mml:mo>/</mml:mo><mml:msub><mml:mi>p</mml:mi><mml:mi>l</mml:mi></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:mrow></mml:mrow><mml:mo>:</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:math></inline-formula>, indicating that risk aversion (defined as the case where a larger part of the population chooses the safe option; <inline-formula><mml:math id="inf242"><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:msubsup><mml:mi>N</mml:mi><mml:mrow><mml:mi>S</mml:mi></mml:mrow><mml:mrow><mml:mo>⋆</mml:mo></mml:mrow></mml:msubsup><mml:mo>></mml:mo><mml:msubsup><mml:mi>N</mml:mi><mml:mrow><mml:mi>R</mml:mi></mml:mrow><mml:mrow><mml:mo>⋆</mml:mo></mml:mrow></mml:msubsup></mml:mrow></mml:mstyle></mml:math></inline-formula>) emerges when the inequality 
<inline-formula><mml:math id="inf243"><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:mi>e</mml:mi><mml:mo>&lt;</mml:mo><mml:msubsup><mml:mi>P</mml:mi><mml:mrow><mml:mi>S</mml:mi></mml:mrow><mml:mrow><mml:mo>−</mml:mo></mml:mrow></mml:msubsup><mml:mrow><mml:mo>/</mml:mo></mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:msubsup><mml:mi>P</mml:mi><mml:mrow><mml:mi>S</mml:mi></mml:mrow><mml:mrow><mml:mo>−</mml:mo></mml:mrow></mml:msubsup><mml:mo>+</mml:mo><mml:msubsup><mml:mi>P</mml:mi><mml:mrow><mml:mi>R</mml:mi></mml:mrow><mml:mrow><mml:mo>−</mml:mo></mml:mrow></mml:msubsup><mml:mo stretchy="false">)</mml:mo><mml:mo>=</mml:mo><mml:msub><mml:mi>p</mml:mi><mml:mrow><mml:mi>h</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo>/</mml:mo></mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:msub><mml:mi>p</mml:mi><mml:mrow><mml:mi>h</mml:mi></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:msub><mml:mi>p</mml:mi><mml:mrow><mml:mi>l</mml:mi></mml:mrow></mml:msub><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mstyle></mml:math></inline-formula> holds.</p><p><xref ref-type="fig" rid="fig4">Figure 4b</xref> shows that the population is indeed attracted to the safe option <inline-formula><mml:math id="inf244"><mml:mi>S</mml:mi></mml:math></inline-formula> (that is, <inline-formula><mml:math id="inf245"><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:msubsup><mml:mi>N</mml:mi><mml:mrow><mml:mi>S</mml:mi></mml:mrow><mml:mrow><mml:mo>⋆</mml:mo></mml:mrow></mml:msubsup><mml:mo>></mml:mo><mml:msubsup><mml:mi>N</mml:mi><mml:mrow><mml:mi>R</mml:mi></mml:mrow><mml:mrow><mml:mo>⋆</mml:mo></mml:mrow></mml:msubsup></mml:mrow></mml:mstyle></mml:math></inline-formula>) across a wide region of the parameter space, even when there is a positive ‘risk premium’ defined as <inline-formula><mml:math id="inf246"><mml:mstyle displaystyle="true" 
scriptlevel="0"><mml:mrow><mml:mi>e</mml:mi><mml:mo>></mml:mo><mml:mn>1</mml:mn><mml:mrow><mml:mo>/</mml:mo></mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:mstyle></mml:math></inline-formula>. Although individuals choosing the risky option are more likely to become enchanted with the risky option than to be disappointed (i.e., <inline-formula><mml:math id="inf247"><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:mi>e</mml:mi><mml:msub><mml:mi>N</mml:mi><mml:mrow><mml:mi>R</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:msubsup><mml:mi>N</mml:mi><mml:mrow><mml:mi>R</mml:mi></mml:mrow><mml:mrow><mml:mo>+</mml:mo></mml:mrow></mml:msubsup><mml:mo>></mml:mo><mml:mo stretchy="false">(</mml:mo><mml:mn>1</mml:mn><mml:mo>−</mml:mo><mml:mi>e</mml:mi><mml:mo stretchy="false">)</mml:mo><mml:msub><mml:mi>N</mml:mi><mml:mrow><mml:mi>R</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:msubsup><mml:mi>N</mml:mi><mml:mrow><mml:mi>R</mml:mi></mml:mrow><mml:mrow><mml:mo>−</mml:mo></mml:mrow></mml:msubsup></mml:mrow></mml:mstyle></mml:math></inline-formula>), the risk-seeking equilibrium (defined as <inline-formula><mml:math id="inf248"><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:msubsup><mml:mi>N</mml:mi><mml:mrow><mml:mi>S</mml:mi></mml:mrow><mml:mrow><mml:mo>⋆</mml:mo></mml:mrow></mml:msubsup><mml:mo>&lt;</mml:mo><mml:msubsup><mml:mi>N</mml:mi><mml:mrow><mml:mi>R</mml:mi></mml:mrow><mml:mrow><mml:mo>⋆</mml:mo></mml:mrow></mml:msubsup></mml:mrow></mml:mstyle></mml:math></inline-formula>) becomes less likely to emerge as the exploration rate <inline-formula><mml:math id="inf249"><mml:msub><mml:mi>p</mml:mi><mml:mi>l</mml:mi></mml:msub></mml:math></inline-formula> decreases, consistent with the hot stove effect caused by asymmetric adaptive sampling (<xref ref-type="bibr" rid="bib21">Denrell, 2007</xref>). 
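As a minimal sketch of the asocial result stated above, the following Python snippet evaluates the equilibrium ratio of safe to risky choosers, e(p_l/p_h) + (1 - e)(p_h/p_l), together with the neutral threshold p_h/(p_h + p_l). The function names and the example value p_l = 0.2 are illustrative assumptions and are not taken from the original analysis code.

```python
def safe_to_risky_ratio(e, p_h, p_l):
    """Equilibrium ratio N_S*/N_R* of the asocial model, as stated in the text."""
    return e * (p_l / p_h) + (1 - e) * (p_h / p_l)


def neutral_threshold(p_h, p_l):
    """Value of e at which N_S* equals N_R* (the dashed curve in the figures)."""
    return p_h / (p_h + p_l)


def is_risk_averse(e, p_h, p_l):
    """Risk aversion: more individuals end up choosing S than R at equilibrium."""
    return safe_to_risky_ratio(e, p_h, p_l) > 1


# Illustrative parameter values (assumed for this example only).
p_h, p_l = 0.7, 0.2
print("threshold:", round(neutral_threshold(p_h, p_l), 3))
for e in (0.5, 0.75, 0.8):
    ratio = safe_to_risky_ratio(e, p_h, p_l)
    print(e, round(ratio, 3), is_risk_averse(e, p_h, p_l))
```

With these assumed values the threshold is about 0.78, so a risk premium of e = 0.75 still yields risk aversion even though it exceeds 1/2, whereas e = 0.8 does not. 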
Risk seeking never emerges when <inline-formula><mml:math id="inf250"><mml:mrow><mml:mi>e</mml:mi><mml:mo>≤</mml:mo><mml:mrow><mml:mn>1</mml:mn><mml:mo>/</mml:mo><mml:mn>2</mml:mn></mml:mrow></mml:mrow></mml:math></inline-formula>, which is also consistent with the results of reinforcement learning.</p><p>This dynamic model provides an illustrative understanding of how the asymmetry of adaptive sampling causes the hot stove effect. Consider the case of high inequality between exploitation (<inline-formula><mml:math id="inf251"><mml:msub><mml:mi>p</mml:mi><mml:mi>h</mml:mi></mml:msub></mml:math></inline-formula>) and exploration (<inline-formula><mml:math id="inf252"><mml:msub><mml:mi>p</mml:mi><mml:mi>l</mml:mi></mml:msub></mml:math></inline-formula>), namely, <inline-formula><mml:math id="inf253"><mml:mrow><mml:msub><mml:mi>p</mml:mi><mml:mi>h</mml:mi></mml:msub><mml:mo>≫</mml:mo><mml:msub><mml:mi>p</mml:mi><mml:mi>l</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>. Under such a condition, the state <inline-formula><mml:math id="inf254"><mml:msup><mml:mi>S</mml:mi><mml:mo>-</mml:mo></mml:msup></mml:math></inline-formula>, that is, choosing the safe option while holding the negative inner attitude (−), becomes a ‘dead end’ from which individuals can seldom escape once they have entered it. 
However, if the inequality <inline-formula><mml:math id="inf255"><mml:mrow><mml:msub><mml:mi>p</mml:mi><mml:mi>h</mml:mi></mml:msub><mml:mo>≥</mml:mo><mml:msub><mml:mi>p</mml:mi><mml:mi>l</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> is weak enough that a substantial fraction of the population comes back to <inline-formula><mml:math id="inf256"><mml:msup><mml:mi>R</mml:mi><mml:mo>-</mml:mo></mml:msup></mml:math></inline-formula> from <inline-formula><mml:math id="inf257"><mml:msup><mml:mi>S</mml:mi><mml:mo>-</mml:mo></mml:msup></mml:math></inline-formula>, the growing number of people belonging to <inline-formula><mml:math id="inf258"><mml:msup><mml:mi>R</mml:mi><mml:mo>+</mml:mo></mml:msup></mml:math></inline-formula> (that is, <inline-formula><mml:math id="inf259"><mml:msubsup><mml:mi>N</mml:mi><mml:mi>R</mml:mi><mml:mo>+</mml:mo></mml:msubsup></mml:math></inline-formula>) could eventually exceed the number of people ‘spilling out’ to <inline-formula><mml:math id="inf260"><mml:msup><mml:mi>S</mml:mi><mml:mo>-</mml:mo></mml:msup></mml:math></inline-formula>. This illustrative analysis shows that the hot stove effect can be overcome if the number of people who get stuck in the dead end <inline-formula><mml:math id="inf261"><mml:msup><mml:mi>S</mml:mi><mml:mo>-</mml:mo></mml:msup></mml:math></inline-formula> can somehow be reduced. This is possible if the ‘come-backs’ to <inline-formula><mml:math id="inf262"><mml:msup><mml:mi>R</mml:mi><mml:mo>-</mml:mo></mml:msup></mml:math></inline-formula> can be increased. 
In other words, any mechanism that increases <inline-formula><mml:math id="inf263"><mml:msubsup><mml:mi>P</mml:mi><mml:mi>R</mml:mi><mml:mo>-</mml:mo></mml:msubsup></mml:math></inline-formula> relative to <inline-formula><mml:math id="inf264"><mml:msubsup><mml:mi>P</mml:mi><mml:mi>S</mml:mi><mml:mo>-</mml:mo></mml:msubsup></mml:math></inline-formula> should help to overcome the hot stove effect.</p><p>Next, we assumed a frequency-dependent reliance on social information operating in these population dynamics. Specifically, we considered that the net per capita probability of choosing each option, <inline-formula><mml:math id="inf265"><mml:mi>P</mml:mi></mml:math></inline-formula>, is a weighted average of the asocial baseline probability (<inline-formula><mml:math id="inf266"><mml:mi>p</mml:mi></mml:math></inline-formula>) and the social frequency influence (<inline-formula><mml:math id="inf267"><mml:mi>F</mml:mi></mml:math></inline-formula>), namely, <inline-formula><mml:math id="inf268"><mml:mrow><mml:mi>P</mml:mi><mml:mo>=</mml:mo><mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mn>1</mml:mn><mml:mo>-</mml:mo><mml:mi>σ</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>⁢</mml:mo><mml:mi>p</mml:mi></mml:mrow><mml:mo>+</mml:mo><mml:mrow><mml:mi>σ</mml:mi><mml:mo>⁢</mml:mo><mml:mi>F</mml:mi></mml:mrow></mml:mrow></mml:mrow></mml:math></inline-formula>. 
Again, <inline-formula><mml:math id="inf269"><mml:mi>σ</mml:mi></mml:math></inline-formula> is the weight of social influence, and we also assumed that there would be the conformity exponent <inline-formula><mml:math id="inf270"><mml:mi>θ</mml:mi></mml:math></inline-formula> in the social frequency influence <inline-formula><mml:math id="inf271"><mml:mi>F</mml:mi></mml:math></inline-formula> such that <inline-formula><mml:math id="inf272"><mml:mrow><mml:mi>F</mml:mi><mml:mo>=</mml:mo><mml:mrow><mml:msubsup><mml:mi>N</mml:mi><mml:mi>i</mml:mi><mml:mi>θ</mml:mi></mml:msubsup><mml:mo>/</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msubsup><mml:mi>N</mml:mi><mml:mi>S</mml:mi><mml:mi>θ</mml:mi></mml:msubsup><mml:mo>+</mml:mo><mml:msubsup><mml:mi>N</mml:mi><mml:mi>R</mml:mi><mml:mi>θ</mml:mi></mml:msubsup></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:mrow></mml:math></inline-formula> where <inline-formula><mml:math id="inf273"><mml:mrow><mml:mi>i</mml:mi><mml:mo>∈</mml:mo><mml:mrow><mml:mo stretchy="false">{</mml:mo><mml:mi>S</mml:mi><mml:mo>,</mml:mo><mml:mi>R</mml:mi><mml:mo stretchy="false">}</mml:mo></mml:mrow></mml:mrow></mml:math></inline-formula> (see Materials and methods).</p><p>Through numerical analyses, we have confirmed that social influence can indeed increase the flow-back rate <inline-formula><mml:math id="inf274"><mml:msubsup><mml:mi>P</mml:mi><mml:mi>R</mml:mi><mml:mo>-</mml:mo></mml:msubsup></mml:math></inline-formula>, which raises the possibility of risk-seeking equilibrium <inline-formula><mml:math id="inf275"><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:msubsup><mml:mi>N</mml:mi><mml:mrow><mml:mi>R</mml:mi></mml:mrow><mml:mrow><mml:mo>⋆</mml:mo></mml:mrow></mml:msubsup><mml:mo>></mml:mo><mml:msubsup><mml:mi>N</mml:mi><mml:mrow><mml:mi>S</mml:mi></mml:mrow><mml:mrow><mml:mo>⋆</mml:mo></mml:mrow></mml:msubsup></mml:mrow></mml:mstyle></mml:math></inline-formula> (<xref ref-type="fig" 
rid="fig4">Figure 4d</xref>; see <xref ref-type="fig" rid="fig4s1">Figure 4—figure supplement 1</xref> for a wider parameter region). For an approximation of the bifurcation analysis, we recorded the equilibrium density of the risky state <inline-formula><mml:math id="inf276"><mml:msubsup><mml:mi>N</mml:mi><mml:mi>R</mml:mi><mml:mo>⋆</mml:mo></mml:msubsup></mml:math></inline-formula> starting from various initial population distributions (that is, varying <inline-formula><mml:math id="inf277"><mml:msub><mml:mi>N</mml:mi><mml:mrow><mml:mrow><mml:mi>R</mml:mi><mml:mo>,</mml:mo><mml:mi>t</mml:mi></mml:mrow><mml:mo>=</mml:mo><mml:mn>0</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula> and <inline-formula><mml:math id="inf278"><mml:mrow><mml:msub><mml:mi>N</mml:mi><mml:mrow><mml:mrow><mml:mi>S</mml:mi><mml:mo>,</mml:mo><mml:mi>t</mml:mi></mml:mrow><mml:mo>=</mml:mo><mml:mn>0</mml:mn></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mrow><mml:mn>20</mml:mn><mml:mo>-</mml:mo><mml:msub><mml:mi>N</mml:mi><mml:mrow><mml:mrow><mml:mi>R</mml:mi><mml:mo>,</mml:mo><mml:mi>t</mml:mi></mml:mrow><mml:mo>=</mml:mo><mml:mn>0</mml:mn></mml:mrow></mml:msub></mml:mrow></mml:mrow></mml:math></inline-formula>). <xref ref-type="fig" rid="fig5">Figure 5</xref> shows the conditions under which the system ends up in risk-seeking equilibrium. 
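The social influence terms defined above can be sketched in Python as follows. This is a minimal illustration that assumes only the weighted average P = (1 - σ)p + σF and the frequency term F = N_i^θ / (N_S^θ + N_R^θ) as written in the text; the function names and the example state of 5 individuals in S and 15 in R are hypothetical.

```python
def social_frequency(n_i, n_s, n_r, theta):
    """F = N_i^theta / (N_S^theta + N_R^theta) for the option chosen by n_i individuals."""
    return n_i ** theta / (n_s ** theta + n_r ** theta)


def net_choice_probability(p, f, sigma):
    """P = (1 - sigma) * p + sigma * F, the weighted average of asocial and social terms."""
    return (1 - sigma) * p + sigma * f


# Hypothetical state: 5 individuals in S and 15 in R (N = 20).
n_s, n_r = 5, 15
for theta in (0, 1, 2, 10):
    f_r = social_frequency(n_r, n_s, n_r, theta)
    # Asocial baseline p_l = 0.2 mixed with social influence at sigma = 0.5 (assumed values).
    print(theta, round(f_r, 3), round(net_choice_probability(0.2, f_r, 0.5), 3))
```

With θ = 0 the frequency term equals 1/2 whatever the state of the population, so social influence reduces to a random choice made at rate σ, whereas larger θ pushes F towards the majority option. 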
When the conformity exponent <inline-formula><mml:math id="inf279"><mml:mi>θ</mml:mi></mml:math></inline-formula> is not too large (<inline-formula><mml:math id="inf280"><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:mi>θ</mml:mi><mml:mo>&lt;</mml:mo><mml:mn>10</mml:mn></mml:mrow></mml:mstyle></mml:math></inline-formula>), there is a region in which risk seeking can be the unique equilibrium irrespective of the initial distribution, attracting the population even from an extremely biased initial distribution such as <inline-formula><mml:math id="inf281"><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:msub><mml:mi>N</mml:mi><mml:mrow><mml:mi>R</mml:mi><mml:mo>,</mml:mo><mml:mi>t</mml:mi><mml:mo>=</mml:mo><mml:mn>0</mml:mn></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mn>0</mml:mn></mml:mrow></mml:mstyle></mml:math></inline-formula> (<xref ref-type="fig" rid="fig5">Figure 5</xref>).</p><fig-group><fig id="fig5" position="float"><label>Figure 5.</label><caption><title>The approximate bifurcation analysis.</title><p>The relationships between the social influence weight <inline-formula><mml:math id="inf282"><mml:mi>σ</mml:mi></mml:math></inline-formula> and the equilibrium number of individuals in the risky behavioural state <inline-formula><mml:math id="inf283"><mml:msubsup><mml:mi>N</mml:mi><mml:mi>R</mml:mi><mml:mo>⋆</mml:mo></mml:msubsup></mml:math></inline-formula> across different conformity exponents <inline-formula><mml:math id="inf284"><mml:mrow><mml:mi>θ</mml:mi><mml:mo>∈</mml:mo><mml:mrow><mml:mo stretchy="false">{</mml:mo><mml:mn>0</mml:mn><mml:mo>,</mml:mo><mml:mn>1</mml:mn><mml:mo>,</mml:mo><mml:mn>2</mml:mn><mml:mo>,</mml:mo><mml:mn>10</mml:mn><mml:mo stretchy="false">}</mml:mo></mml:mrow></mml:mrow></mml:math></inline-formula> and different values of the risk premium <inline-formula><mml:math id="inf285"><mml:mrow><mml:mi>e</mml:mi><mml:mo>∈</mml:mo><mml:mrow><mml:mo 
stretchy="false">{</mml:mo><mml:mn>0.55</mml:mn><mml:mo>,</mml:mo><mml:mn>0.65</mml:mn><mml:mo>,</mml:mo><mml:mn>0.7</mml:mn><mml:mo>,</mml:mo><mml:mn>0.75</mml:mn><mml:mo stretchy="false">}</mml:mo></mml:mrow></mml:mrow></mml:math></inline-formula> are shown as black dots. The background colours indicate regions where the system approaches either risk aversion (<inline-formula><mml:math id="inf286"><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:msubsup><mml:mi>N</mml:mi><mml:mrow><mml:mi>R</mml:mi></mml:mrow><mml:mrow><mml:mo>⋆</mml:mo></mml:mrow></mml:msubsup><mml:mo>&lt;</mml:mo><mml:msubsup><mml:mi>N</mml:mi><mml:mrow><mml:mi>S</mml:mi></mml:mrow><mml:mrow><mml:mo>⋆</mml:mo></mml:mrow></mml:msubsup></mml:mrow></mml:mstyle></mml:math></inline-formula>; blue) or risk seeking (<inline-formula><mml:math id="inf287"><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:msubsup><mml:mi>N</mml:mi><mml:mrow><mml:mi>R</mml:mi></mml:mrow><mml:mrow><mml:mo>⋆</mml:mo></mml:mrow></mml:msubsup><mml:mo>></mml:mo><mml:msubsup><mml:mi>N</mml:mi><mml:mrow><mml:mi>S</mml:mi></mml:mrow><mml:mrow><mml:mo>⋆</mml:mo></mml:mrow></mml:msubsup></mml:mrow></mml:mstyle></mml:math></inline-formula>; red). The horizontal dashed line is <inline-formula><mml:math id="inf288"><mml:mrow><mml:msub><mml:mi>N</mml:mi><mml:mi>R</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mi>N</mml:mi><mml:mi>S</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mn>10</mml:mn></mml:mrow></mml:math></inline-formula>. Two locally stable equilibria emerge when <inline-formula><mml:math id="inf289"><mml:mrow><mml:mi>θ</mml:mi><mml:mo>≥</mml:mo><mml:mn>2</mml:mn></mml:mrow></mml:math></inline-formula>, which suggests that the system has a bifurcation when <inline-formula><mml:math id="inf290"><mml:mi>σ</mml:mi></mml:math></inline-formula> is sufficiently large. 
The other parameters are set to <inline-formula><mml:math id="inf291"><mml:mrow><mml:msub><mml:mi>p</mml:mi><mml:mi>h</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mn>0.7</mml:mn></mml:mrow></mml:math></inline-formula>, <inline-formula><mml:math id="inf292"><mml:mrow><mml:msub><mml:mi>p</mml:mi><mml:mi>l</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mn>0.2</mml:mn></mml:mrow></mml:math></inline-formula>, and <inline-formula><mml:math id="inf293"><mml:mrow><mml:mi>N</mml:mi><mml:mo>=</mml:mo><mml:mn>20</mml:mn></mml:mrow></mml:math></inline-formula>.</p></caption><graphic mime-subtype="jpeg" mimetype="image" xlink:href="elife-75308.xml.media/fig5.jpg"/></fig><fig id="fig5s1" position="float" specific-use="child-fig"><label>Figure 5—figure supplement 1.</label><caption><title>The approximate bifurcation analysis.</title><p>The relationship between the social influence weight <inline-formula><mml:math id="inf294"><mml:mi>σ</mml:mi></mml:math></inline-formula> and the equilibrium number of individuals choosing the risky alternative <inline-formula><mml:math id="inf295"><mml:msubsup><mml:mi>N</mml:mi><mml:mi>R</mml:mi><mml:mo>⋆</mml:mo></mml:msubsup></mml:math></inline-formula> across the different conformity exponents <inline-formula><mml:math id="inf296"><mml:mrow><mml:mi>θ</mml:mi><mml:mo>∈</mml:mo><mml:mrow><mml:mo stretchy="false">{</mml:mo><mml:mn>0</mml:mn><mml:mo>,</mml:mo><mml:mn>1</mml:mn><mml:mo>,</mml:mo><mml:mn>2</mml:mn><mml:mo>,</mml:mo><mml:mn>10</mml:mn><mml:mo stretchy="false">}</mml:mo></mml:mrow></mml:mrow></mml:math></inline-formula> is shown as black dots. 
The triangular points shown in the background of each panel indicate regions in which the group approaches risk aversion (i.e., <inline-formula><mml:math id="inf297"><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:msubsup><mml:mi>N</mml:mi><mml:mrow><mml:mi>R</mml:mi></mml:mrow><mml:mrow><mml:mo>⋆</mml:mo></mml:mrow></mml:msubsup><mml:mo>&lt;</mml:mo><mml:mn>10</mml:mn></mml:mrow></mml:mstyle></mml:math></inline-formula>; blue) or the risk-seeking equilibrium (i.e. <inline-formula><mml:math id="inf298"><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:msubsup><mml:mi>N</mml:mi><mml:mrow><mml:mi>R</mml:mi></mml:mrow><mml:mrow><mml:mo>⋆</mml:mo></mml:mrow></mml:msubsup><mml:mo>></mml:mo><mml:mn>10</mml:mn></mml:mrow></mml:mstyle></mml:math></inline-formula>; red). Two different equilibria mean that the system has a bifurcation under a given <inline-formula><mml:math id="inf299"><mml:mi>σ</mml:mi></mml:math></inline-formula>. The direction of the background triangles indicates whether <inline-formula><mml:math id="inf300"><mml:msub><mml:mi>N</mml:mi><mml:mi>R</mml:mi></mml:msub></mml:math></inline-formula> increases (<inline-formula><mml:math id="inf301"><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:mi mathvariant="normal">Δ</mml:mi></mml:mrow></mml:mstyle></mml:math></inline-formula>) or decreases (<inline-formula><mml:math id="inf302"><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:mi mathvariant="normal">∇</mml:mi></mml:mrow></mml:mstyle></mml:math></inline-formula>) relative to its starting position. 
The other parameters are set to <inline-formula><mml:math id="inf303"><mml:mrow><mml:msub><mml:mi>p</mml:mi><mml:mi>h</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mn>0.7</mml:mn></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="inf304"><mml:mrow><mml:msub><mml:mi>p</mml:mi><mml:mi>l</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mn>0.2</mml:mn></mml:mrow></mml:math></inline-formula>.</p></caption><graphic mime-subtype="jpeg" mimetype="image" xlink:href="elife-75308.xml.media/fig5-figsupp1.jpg"/></fig></fig-group><p>Under the conformist bias <inline-formula><mml:math id="inf305"><mml:mrow><mml:mi>θ</mml:mi><mml:mo>≥</mml:mo><mml:mn>2</mml:mn></mml:mrow></mml:math></inline-formula>, two locally stable equilibria exist. Strong positive feedback dominates the system when both <inline-formula><mml:math id="inf306"><mml:mi>σ</mml:mi></mml:math></inline-formula> and <inline-formula><mml:math id="inf307"><mml:mi>θ</mml:mi></mml:math></inline-formula> are large. Therefore, the system can end up in either of the equilibria depending solely on the initial density distribution, consistent with the conventional view of herding (<xref ref-type="bibr" rid="bib23">Denrell and Le Mens, 2017</xref>; <xref ref-type="bibr" rid="bib73">Toyokawa et al., 2019</xref>). 
This is also consistent with a well-known result of collective foraging by pheromone trail ants, which react to social information in a conformity-like manner (<xref ref-type="bibr" rid="bib9">Beckers et al., 1990</xref>; <xref ref-type="bibr" rid="bib33">Harrison et al., 2001</xref>).</p><p>Notably, however, even with a positive conformist bias, such as <inline-formula><mml:math id="inf308"><mml:mrow><mml:mi>θ</mml:mi><mml:mo>=</mml:mo><mml:mn>2</mml:mn></mml:mrow></mml:math></inline-formula>, there is a region with a moderate value of <inline-formula><mml:math id="inf309"><mml:mi>σ</mml:mi></mml:math></inline-formula> where risk seeking remains the unique equilibrium when the risk premium is high (<inline-formula><mml:math id="inf310"><mml:mrow><mml:mi>e</mml:mi><mml:mo>≥</mml:mo><mml:mn>0.7</mml:mn></mml:mrow></mml:math></inline-formula>). In this regime, the benefit of collective behavioural rescue can dominate without any possibility of maladaptive herding.</p><p>It is worth noting that in the case of <inline-formula><mml:math id="inf311"><mml:mrow><mml:mi>θ</mml:mi><mml:mo>=</mml:mo><mml:mn>0</mml:mn></mml:mrow></mml:math></inline-formula>, where individuals merely make a random choice at a rate <inline-formula><mml:math id="inf312"><mml:mi>σ</mml:mi></mml:math></inline-formula>, risk aversion is also relaxed (<xref ref-type="fig" rid="fig5">Figure 5</xref>, the leftmost column), and the adaptive risky shift even emerges around <inline-formula><mml:math id="inf313"><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:mn>0.25</mml:mn><mml:mo><</mml:mo><mml:mi>σ</mml:mi><mml:mo><</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:mstyle></mml:math></inline-formula>. 
However, this ostensible behavioural rescue is due solely to the pure effect of additional random exploration that reduces <inline-formula><mml:math id="inf314"><mml:mrow><mml:msubsup><mml:mi>P</mml:mi><mml:mi>S</mml:mi><mml:mo>-</mml:mo></mml:msubsup><mml:mo>/</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msubsup><mml:mi>P</mml:mi><mml:mi>S</mml:mi><mml:mo>-</mml:mo></mml:msubsup><mml:mo>+</mml:mo><mml:msubsup><mml:mi>P</mml:mi><mml:mi>R</mml:mi><mml:mo>-</mml:mo></mml:msubsup></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:math></inline-formula>, mitigating stickiness to the dead-end status <inline-formula><mml:math id="inf315"><mml:msup><mml:mi>S</mml:mi><mml:mo>-</mml:mo></mml:msup></mml:math></inline-formula>. When <inline-formula><mml:math id="inf316"><mml:mrow><mml:mi>σ</mml:mi><mml:mo>→</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:math></inline-formula> with <inline-formula><mml:math id="inf317"><mml:mrow><mml:mi>θ</mml:mi><mml:mo>=</mml:mo><mml:mn>0</mml:mn></mml:mrow></mml:math></inline-formula>, therefore, the risky shift eventually disappears because the individuals choose between <inline-formula><mml:math id="inf318"><mml:mi>S</mml:mi></mml:math></inline-formula> and <inline-formula><mml:math id="inf319"><mml:mi>R</mml:mi></mml:math></inline-formula> almost randomly.</p><p>However, the collective risky shift observed in the conditions of <inline-formula><mml:math id="inf320"><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:mi>θ</mml:mi><mml:mo>></mml:mo><mml:mn>0</mml:mn></mml:mrow></mml:mstyle></mml:math></inline-formula> cannot be explained solely by the mere addition of exploration. A weak conformist bias (i.e. 
a linear response to the social frequency; <inline-formula><mml:math id="inf321"><mml:mrow><mml:mi>θ</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:math></inline-formula>) monotonically increases the equilibrium density <inline-formula><mml:math id="inf322"><mml:msubsup><mml:mi>N</mml:mi><mml:mi>R</mml:mi><mml:mo>⋆</mml:mo></mml:msubsup></mml:math></inline-formula> with increasing social influence <inline-formula><mml:math id="inf323"><mml:mi>σ</mml:mi></mml:math></inline-formula>, which goes beyond the level of risky shift observed with the addition of random choice (<xref ref-type="fig" rid="fig5">Figure 5</xref>). Therefore, although the collective rescue might indeed owe part of its mitigation of the hot stove effect to increased exploration, the further enhancement of risk seeking cannot be fully explained by it alone.</p><p>The key is the interaction between negative and positive feedback. As we discussed above, risk aversion is reduced if the ratio <inline-formula><mml:math id="inf324"><mml:mrow><mml:msubsup><mml:mi>P</mml:mi><mml:mi>S</mml:mi><mml:mo>-</mml:mo></mml:msubsup><mml:mo>/</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msubsup><mml:mi>P</mml:mi><mml:mi>S</mml:mi><mml:mo>-</mml:mo></mml:msubsup><mml:mo>+</mml:mo><mml:msubsup><mml:mi>P</mml:mi><mml:mi>R</mml:mi><mml:mo>-</mml:mo></mml:msubsup></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:math></inline-formula> decreases, either by increasing <inline-formula><mml:math id="inf325"><mml:msubsup><mml:mi>P</mml:mi><mml:mi>R</mml:mi><mml:mo>-</mml:mo></mml:msubsup></mml:math></inline-formula> or reducing <inline-formula><mml:math id="inf326"><mml:msubsup><mml:mi>P</mml:mi><mml:mi>S</mml:mi><mml:mo>-</mml:mo></mml:msubsup></mml:math></inline-formula>. 
The per individual probability of choosing the safe option with the negative attitude, that is, <inline-formula><mml:math id="inf327"><mml:mrow><mml:msubsup><mml:mi>P</mml:mi><mml:mi>S</mml:mi><mml:mo>-</mml:mo></mml:msubsup><mml:mo>=</mml:mo><mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mn>1</mml:mn><mml:mo>-</mml:mo><mml:mi>σ</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>⁢</mml:mo><mml:msub><mml:mi>p</mml:mi><mml:mi>h</mml:mi></mml:msub></mml:mrow><mml:mo>+</mml:mo><mml:mrow><mml:mrow><mml:mi>σ</mml:mi><mml:mo>⁢</mml:mo><mml:msubsup><mml:mi>N</mml:mi><mml:mi>S</mml:mi><mml:mi>θ</mml:mi></mml:msubsup></mml:mrow><mml:mo>/</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msubsup><mml:mi>N</mml:mi><mml:mi>R</mml:mi><mml:mi>θ</mml:mi></mml:msubsup><mml:mo>+</mml:mo><mml:msubsup><mml:mi>N</mml:mi><mml:mi>S</mml:mi><mml:mi>θ</mml:mi></mml:msubsup></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:mrow></mml:mrow></mml:math></inline-formula>, becomes smaller than the baseline exploitation probability <inline-formula><mml:math id="inf328"><mml:msub><mml:mi>p</mml:mi><mml:mi>h</mml:mi></mml:msub></mml:math></inline-formula>, when <inline-formula><mml:math id="inf329"><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:msubsup><mml:mi>N</mml:mi><mml:mrow><mml:mi>S</mml:mi></mml:mrow><mml:mrow><mml:mi>θ</mml:mi></mml:mrow></mml:msubsup><mml:mrow><mml:mo>/</mml:mo></mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:msubsup><mml:mi>N</mml:mi><mml:mrow><mml:mi>R</mml:mi></mml:mrow><mml:mrow><mml:mi>θ</mml:mi></mml:mrow></mml:msubsup><mml:mo>+</mml:mo><mml:msubsup><mml:mi>N</mml:mi><mml:mrow><mml:mi>S</mml:mi></mml:mrow><mml:mrow><mml:mi>θ</mml:mi></mml:mrow></mml:msubsup><mml:mo stretchy="false">)</mml:mo><mml:mo><</mml:mo><mml:msub><mml:mi>p</mml:mi><mml:mrow><mml:mi>h</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:mstyle></mml:math></inline-formula>. 
Even though the majority of the population may still choose the safe alternative and hence <inline-formula><mml:math id="inf330"><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:msub><mml:mi>N</mml:mi><mml:mrow><mml:mi>S</mml:mi></mml:mrow></mml:msub><mml:mo>></mml:mo><mml:msub><mml:mi>N</mml:mi><mml:mrow><mml:mi>R</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:mstyle></mml:math></inline-formula>, the inequality <inline-formula><mml:math id="inf331"><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:msubsup><mml:mi>N</mml:mi><mml:mrow><mml:mi>S</mml:mi></mml:mrow><mml:mrow><mml:mi>θ</mml:mi></mml:mrow></mml:msubsup><mml:mrow><mml:mo>/</mml:mo></mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:msubsup><mml:mi>N</mml:mi><mml:mrow><mml:mi>R</mml:mi></mml:mrow><mml:mrow><mml:mi>θ</mml:mi></mml:mrow></mml:msubsup><mml:mo>+</mml:mo><mml:msubsup><mml:mi>N</mml:mi><mml:mrow><mml:mi>S</mml:mi></mml:mrow><mml:mrow><mml:mi>θ</mml:mi></mml:mrow></mml:msubsup><mml:mo stretchy="false">)</mml:mo><mml:mo><</mml:mo><mml:msub><mml:mi>p</mml:mi><mml:mrow><mml:mi>h</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:mstyle></mml:math></inline-formula> can nevertheless hold if one takes a sufficiently small value of <inline-formula><mml:math id="inf332"><mml:mi>θ</mml:mi></mml:math></inline-formula>. 
Crucially, the reduction of <inline-formula><mml:math id="inf333"><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:msubsup><mml:mi>P</mml:mi><mml:mrow><mml:mi>S</mml:mi></mml:mrow><mml:mrow><mml:mo>−</mml:mo></mml:mrow></mml:msubsup></mml:mrow></mml:mstyle></mml:math></inline-formula> leads to a further reduction of <inline-formula><mml:math id="inf334"><mml:msubsup><mml:mi>P</mml:mi><mml:mi>S</mml:mi><mml:mo>-</mml:mo></mml:msubsup></mml:math></inline-formula> itself through decreasing <inline-formula><mml:math id="inf335"><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:msubsup><mml:mi>N</mml:mi><mml:mrow><mml:mi>S</mml:mi></mml:mrow><mml:mrow><mml:mo>−</mml:mo></mml:mrow></mml:msubsup></mml:mrow></mml:mstyle></mml:math></inline-formula>, thereby further decreasing the social influence supporting the safe option. Such a negative feedback process weakens the concomitant risk aversion. Naturally, this negative feedback is maximised with <inline-formula><mml:math id="inf336"><mml:mrow><mml:mi>θ</mml:mi><mml:mo>=</mml:mo><mml:mn>0</mml:mn></mml:mrow></mml:math></inline-formula>.</p><p>Once the negative feedback has weakened the underlying risk aversion, the majority of the population eventually choose the risky option, an effect evident in the case of <inline-formula><mml:math id="inf337"><mml:mrow><mml:mi>θ</mml:mi><mml:mo>=</mml:mo><mml:mn>0</mml:mn></mml:mrow></mml:math></inline-formula> (<xref ref-type="fig" rid="fig5">Figure 5</xref>). What uniquely operates in cases of <inline-formula><mml:math id="inf338"><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:mi>θ</mml:mi><mml:mo>></mml:mo><mml:mn>0</mml:mn></mml:mrow></mml:mstyle></mml:math></inline-formula> is that because <inline-formula><mml:math id="inf339"><mml:msub><mml:mi>N</mml:mi><mml:mi>R</mml:mi></mml:msub></mml:math></inline-formula> is a majority by now, positive feedback starts. 
Thanks to the conformist bias, the inequality <inline-formula><mml:math id="inf340"><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:msub><mml:mi>N</mml:mi><mml:mrow><mml:mi>R</mml:mi></mml:mrow></mml:msub><mml:mo>></mml:mo><mml:msub><mml:mi>N</mml:mi><mml:mrow><mml:mi>S</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:mstyle></mml:math></inline-formula> is further <italic>amplified</italic>. In this phase, the larger <inline-formula><mml:math id="inf341"><mml:mi>θ</mml:mi></mml:math></inline-formula>, the stronger the concomitant relationship <inline-formula><mml:math id="inf342"><mml:mrow><mml:mrow><mml:msubsup><mml:mi>N</mml:mi><mml:mi>S</mml:mi><mml:mi>θ</mml:mi></mml:msubsup><mml:mo>/</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msubsup><mml:mi>N</mml:mi><mml:mi>R</mml:mi><mml:mi>θ</mml:mi></mml:msubsup><mml:mo>+</mml:mo><mml:msubsup><mml:mi>N</mml:mi><mml:mi>S</mml:mi><mml:mi>θ</mml:mi></mml:msubsup></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mo>≪</mml:mo><mml:msub><mml:mi>p</mml:mi><mml:mi>h</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>. Such positive feedback will never operate with <inline-formula><mml:math id="inf343"><mml:mrow><mml:mi>θ</mml:mi><mml:mo>≤</mml:mo><mml:mn>0</mml:mn></mml:mrow></mml:math></inline-formula>.</p><p>In conclusion, it is the synergy of negative and positive feedback that explains the full range of adaptive risky shift. Neither positive nor negative feedback alone can account for both accuracy and flexibility emerging through collective learning and decision making. 
The results are qualitatively unchanged across a range of different combinations of <inline-formula><mml:math id="inf344"><mml:mi>e</mml:mi></mml:math></inline-formula>, <inline-formula><mml:math id="inf345"><mml:msub><mml:mi>p</mml:mi><mml:mi>l</mml:mi></mml:msub></mml:math></inline-formula>, and <inline-formula><mml:math id="inf346"><mml:msub><mml:mi>p</mml:mi><mml:mi>h</mml:mi></mml:msub></mml:math></inline-formula> (<xref ref-type="fig" rid="fig4s1">Figure 4—figure supplement 1</xref> and <xref ref-type="fig" rid="fig5s1">Figure 5—figure supplement 1</xref>). It is worth noting that when <inline-formula><mml:math id="inf347"><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:mi>e</mml:mi><mml:mo><</mml:mo><mml:mn>0.5</mml:mn></mml:mrow></mml:mstyle></mml:math></inline-formula>, this social frequency-dependent population tends to exhibit risk aversion (<xref ref-type="fig" rid="fig5s1">Figure 5—figure supplement 1</xref>), consistent with the result of the agent-based simulation for the case where the mean payoff of the risky option was smaller than that of the safe option (<xref ref-type="fig" rid="fig1s3">Figure 1—figure supplement 3</xref>). Therefore, the system does not mindlessly prefer risk seeking, but becomes risk prone only when doing so is favourable in the long run.</p></sec><sec id="s2-7"><title>An experimental demonstration</title><p>One hundred eighty-five adult human subjects performed the individual task without social interactions, while 400 subjects performed the task collectively with group sizes ranging from 2 to 8. We confirmed that the model predictions were qualitatively unchanged across the experimental settings used in the online experiments (<xref ref-type="fig" rid="fig1s5">Figure 1—figure supplement 5</xref>).</p><p>We used four different task settings. 
Three of them were positive risk premium (positive RP) tasks that had an optimal risky alternative, while the other was a negative risk premium (negative RP) task that had a suboptimal risky alternative. On the basis of both the agent-based simulation (<xref ref-type="fig" rid="fig1">Figure 1</xref> and <xref ref-type="fig" rid="fig1s3">Figure 1—figure supplement 3</xref>) and the population dynamics (<xref ref-type="fig" rid="fig5">Figure 5</xref> and <xref ref-type="fig" rid="fig5s1">Figure 5—figure supplement 1</xref>), we hypothesised that conformist social influence promotes risk seeking to a lesser extent when the RP is negative than when it is positive. We also expected that whether the collective rescue effect emerges under positive RP settings depends on learning parameters such as <inline-formula><mml:math id="inf348"><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:msub><mml:mi>α</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo stretchy="false">(</mml:mo><mml:msub><mml:mi>β</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:mn>1</mml:mn><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mstyle></mml:math></inline-formula> (<xref ref-type="fig" rid="fig1s5">Figure 1—figure supplement 5d-f</xref>).</p><p>The Bayesian model comparison (<xref ref-type="bibr" rid="bib65">Stephan et al., 2009</xref>) revealed that participants in the group condition were more likely to employ decision-biasing social learning than either asocial reinforcement learning or the value-shaping process (<xref ref-type="fig" rid="fig6s2">Figure 6—figure supplement 2</xref>). Therefore, in the following analysis, we focus on results obtained from the decision-biasing model fit. 
Individual parameters were estimated using a hierarchical Bayesian method whose performance had been supported by the parameter recovery (<xref ref-type="fig" rid="fig6s3">Figure 6—figure supplement 3</xref>).</p><p>Parameter estimation (<xref ref-type="table" rid="table3">Table 3</xref>) showed that individuals in the group condition across all four tasks were likely to use social information in their decision making at a rate ranging between 4% and 18% (Mean <inline-formula><mml:math id="inf349"><mml:mi>σ</mml:mi></mml:math></inline-formula>; <xref ref-type="table" rid="table3">Table 3</xref>), and that mean posterior values of <inline-formula><mml:math id="inf350"><mml:mi>θ</mml:mi></mml:math></inline-formula> were above 1 for all four tasks. These suggest that participants were likely to use a mix of individual reinforcement learning and conformist social learning.</p><table-wrap id="table3" position="float"><label>Table 3.</label><caption><title>Means and 95% Bayesian credible intervals (shown in square brackets) of the global parameters of the learning model.</title><p>The group condition and individual condition are shown separately. All parameters satisfied the Gelman–Rubin criterion <inline-formula><mml:math id="inf351"><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:mrow><mml:mover><mml:mi>R</mml:mi><mml:mo stretchy="false">^</mml:mo></mml:mover></mml:mrow><mml:mo><</mml:mo><mml:mn>1.01</mml:mn></mml:mrow></mml:mstyle></mml:math></inline-formula>. 
All estimates are based on over 500 effective samples from the posterior.</p></caption><table frame="hsides" rules="groups"><thead><tr><th align="left" valign="bottom">Task category</th><th align="left" colspan="3" valign="bottom">Positive risk premium (positive RP)</th><th align="left" valign="bottom">Negative risk premium (negative RP)</th></tr><tr><th align="left" valign="bottom">Task</th><th align="char" char="hyphen" valign="bottom">1-risky-1-safe</th><th align="char" char="hyphen" valign="bottom">1-risky-3-safe</th><th align="char" char="hyphen" valign="bottom">2-risky-2-safe</th><th align="char" char="hyphen" valign="bottom">1-risky-1-safe</th></tr></thead><tbody><tr><td align="left" valign="bottom">Group</td><td align="left" valign="bottom">n = 123</td><td align="left" valign="bottom">n = 97</td><td align="left" valign="bottom">n = 87</td><td align="left" valign="bottom">n = 93</td></tr><tr><td align="left" valign="top">μ<sub>logitα</sub></td><td align="left" valign="bottom">–2.2 [-2.8,–1.5]</td><td align="left" valign="bottom">–1.8 [-2.3,–1.4]</td><td align="left" valign="bottom">–1.7 [-2.1,–1.3]</td><td align="left" valign="bottom">–0.09 [-0.7, 0.6]</td></tr><tr><td align="left" valign="bottom">(Mean α)</td><td align="left" valign="bottom">0.10 [0.06, 0.18]</td><td align="left" valign="bottom">0.14 [0.09, 0.20]</td><td align="left" valign="bottom">0.15 [0.11, 0.21]</td><td align="left" valign="bottom">0.48 [0.3, 0.6]</td></tr><tr><td align="left" valign="bottom">μ<sub>logitβ</sub></td><td align="left" valign="bottom">1.4 [1.1, 1.6]</td><td align="left" valign="bottom">1.5 [1.3, 1.8]</td><td align="left" valign="bottom">1.3 [1.0, 1.5]</td><td align="left" valign="bottom">1.2 [1.0, 1.5]</td></tr><tr><td align="left" valign="bottom">(Mean β)</td><td align="left" valign="bottom">4.1 [3.0, 5.0]</td><td align="left" valign="bottom">4.5 [3.7, 6.0]</td><td align="left" valign="bottom">3.7 [2.7, 4.5]</td><td align="left" valign="bottom">3.3 [2.7, 
4.5]</td></tr><tr><td align="left" valign="bottom">μ<sub>logitσ</sub></td><td align="left" valign="bottom">–2.4 [-3.1,–1.8]</td><td align="left" valign="bottom">–2.1 [-2.6,–1.6]</td><td align="left" valign="bottom">–2.1 [-2.5,–1.7]</td><td align="left" valign="bottom">–2.0 [-2.7,–1.5]</td></tr><tr><td align="left" valign="bottom">(Mean σ)</td><td align="left" valign="bottom">0.08 [0.04, 0.14]</td><td align="left" valign="bottom">0.11 [0.07, 0.17]</td><td align="left" valign="bottom">0.11 [0.08, 0.15]</td><td align="left" valign="bottom">0.12 [0.06, 0.18]</td></tr><tr><td align="left" valign="bottom">μ<sub>θ</sub> = mean θ</td><td align="left" valign="bottom">1.4 [0.58, 2.3]</td><td align="left" valign="bottom">1.6 [0.9, 2.4]</td><td align="left" valign="bottom">1.8 [1.0, 2.9]</td><td align="left" valign="bottom">1.6 [0.9, 2.3]</td></tr><tr><td align="left" valign="bottom">Individual</td><td align="left" valign="bottom">n = 45</td><td align="left" valign="bottom">n = 51</td><td align="left" valign="bottom">n = 64</td><td align="left" valign="bottom">n = 25</td></tr><tr><td align="left" valign="bottom">μ<sub>logitα</sub></td><td align="left" valign="bottom">–2.1 [-3.1,–0.87]</td><td align="left" valign="bottom">–2.1 [-2.6,–1.6]</td><td align="left" valign="bottom">–1.3 [-2.1,–0.50]</td><td align="left" valign="bottom">–1.3 [-2.2,–0.4]</td></tr><tr><td align="left" valign="bottom">(Mean α)</td><td align="left" valign="bottom">0.11 [0.04, 0.30]</td><td align="left" valign="bottom">0.11 [0.07, 0.17]</td><td align="left" valign="bottom">0.21 [0.11, 0.38]</td><td align="left" valign="bottom">0.2 [0.1, 0.4]</td></tr><tr><td align="left" valign="bottom">μ<sub>logitβ</sub></td><td align="left" valign="bottom">0.42 [-0.43, 1.1]</td><td align="left" valign="bottom">0.91 [0.63, 1.2]</td><td align="left" valign="bottom">0.76 [0.42, 1.1]</td><td align="left" valign="bottom">1.2 [0.9, 1.4]</td></tr><tr><td align="left" valign="bottom">(Mean β)</td><td align="left" 
valign="bottom">1.5 [0.65, 3.0]</td><td align="left" valign="bottom">2.5 [1.9, 3.3]</td><td align="left" valign="bottom">2.1 [1.5, 3.0]</td><td align="left" valign="bottom">3.3 [2.5, 4.1]</td></tr></tbody></table></table-wrap><p>To address whether the behavioural data are well explained by our social learning model and whether collective rescue was indeed observed for social learning individuals, we conducted agent-based simulations of the fit computational model with the calibrated parameters, including 100,000 independent runs for each task setup (see Materials and methods).</p><p>The results of the agent-based simulations agreed with our hypotheses (<xref ref-type="fig" rid="fig6">Figure 6</xref>). Overall, the 80% Bayesian credible intervals of the predicted performance of the group condition (shades of orange in <xref ref-type="fig" rid="fig6">Figure 6</xref>) cover an area of more risk taking than the area covered by the individual condition (shades of grey). As predicted, in the negative RP task, social learning promoted suboptimal risk taking for some values of <inline-formula><mml:math id="inf352"><mml:mrow><mml:mi>α</mml:mi><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>β</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:math></inline-formula>, but the magnitude appeared smaller than in the positive RP tasks. 
Additionally, increasing <inline-formula><mml:math id="inf353"><mml:msub><mml:mi>σ</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:math></inline-formula> led to an increasing probability of risk taking in the positive RP tasks (<xref ref-type="fig" rid="fig6">Figure 6a–c</xref>), whereas in the negative RP task, increasing <inline-formula><mml:math id="inf354"><mml:mi>σ</mml:mi></mml:math></inline-formula> did not always increase risk taking (<xref ref-type="fig" rid="fig6">Figure 6d</xref>).</p><fig-group><fig id="fig6" position="float"><label>Figure 6.</label><caption><title>Prediction of the fit learning model.</title><p>Results of a series of agent-based simulations with individual parameters that were drawn randomly from the best fit global parameters. Independent simulations were conducted 100,000 times for each condition. Group size was fixed to six for the group condition. Lines are means (black-dashed: individual, coloured-solid: group) and the shaded areas are 80% Bayesian credible intervals. Mean performances of agents with different <inline-formula><mml:math id="inf355"><mml:msub><mml:mi>σ</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:math></inline-formula> are shown in the colour gradient. (<bold>a</bold>) A two-armed bandit task. (<bold>b</bold>) A 1-risky-3-safe (four-armed) bandit task. (<bold>c</bold>) A 2-risky-2-safe (four-armed) bandit task. (<bold>d</bold>) A negative risk premium two-armed bandit task.</p></caption><graphic mime-subtype="jpeg" mimetype="image" xlink:href="elife-75308.xml.media/fig6.jpg"/></fig><fig id="fig6s1" position="float" specific-use="child-fig"><label>Figure 6—figure supplement 1.</label><caption><title>Experimental results with the mixed logit model regression.</title><p>The black triangles are subjects in the individual learning condition; the orange dots are those in the group condition with group sizes ranging from 2 to 8. 
The solid lines are predictions from a mixed logit model for the individual condition (black) and for the group condition (orange), with the shaded area showing the 95% Bayesian credible intervals (CIs). (<bold>a</bold>) A two-armed bandit task (<inline-formula><mml:math id="inf356"><mml:mrow><mml:mi>N</mml:mi><mml:mo>=</mml:mo><mml:mn>168</mml:mn></mml:mrow></mml:math></inline-formula>). (<bold>b</bold>) A 1-risky-3-safe (four-armed) bandit task (<inline-formula><mml:math id="inf357"><mml:mrow><mml:mi>N</mml:mi><mml:mo>=</mml:mo><mml:mn>148</mml:mn></mml:mrow></mml:math></inline-formula>). (<bold>c</bold>) A 2-risky-2-safe (four-armed) bandit task (<inline-formula><mml:math id="inf358"><mml:mrow><mml:mi>N</mml:mi><mml:mo>=</mml:mo><mml:mn>151</mml:mn></mml:mrow></mml:math></inline-formula>). (<bold>d</bold>) A negative risk premium (RP) two-armed bandit task (<inline-formula><mml:math id="inf359"><mml:mrow><mml:mi>N</mml:mi><mml:mo>=</mml:mo><mml:mn>118</mml:mn></mml:mrow></mml:math></inline-formula>). The width of the CI for the individual condition in the negative RP task is due to the lack of data points in the region. The <italic>x</italic> axis is <inline-formula><mml:math id="inf360"><mml:mrow><mml:msub><mml:mi>α</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mi>β</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>+</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:math></inline-formula>, namely, the susceptibility to the hot stove effect. (<bold>a</bold>, <bold>b,</bold> and <bold>d</bold>) The <italic>y</italic> axis is the mean proportion of choosing the risky alternative averaged over the second half of the trials. (<bold>c</bold>) The <italic>y</italic> axis is the mean proportion of choosing the optimal risky alternative averaged over the second half of the trials. 
The horizontal lines show the chance-level probability.</p></caption><graphic mime-subtype="jpeg" mimetype="image" xlink:href="elife-75308.xml.media/fig6-figsupp1.jpg"/></fig><fig id="fig6s2" position="float" specific-use="child-fig"><label>Figure 6—figure supplement 2.</label><caption><title>Bayesian model comparison.</title><p>(<bold>a</bold>) The model recovery performance: model frequencies (dark shade) and exceedance probability (XP) for each pair of simulated and fitted models, calculated by the Widely Applicable Information Criterion (WAIC). (b–d) Model comparison results. The lengths of the bars indicate model frequencies. Exceedance probability (XP) of the decision-biasing model is shown.</p></caption><graphic mime-subtype="jpeg" mimetype="image" xlink:href="elife-75308.xml.media/fig6-figsupp2.jpg"/></fig><fig id="fig6s3" position="float" specific-use="child-fig"><label>Figure 6—figure supplement 3.</label><caption><title>The parameter recovery performance.</title><p>The top half and bottom half of the figure are the results of parameter recovery test 1 and 2, respectively. The left column shows the global parameters fitted for each of the two four-armed bandit tasks, the 1-risky-3-safe task (<inline-formula><mml:math id="inf361"><mml:mrow><mml:mi>N</mml:mi><mml:mo>=</mml:mo><mml:mn>105</mml:mn></mml:mrow></mml:math></inline-formula>) and the 2-risky-2-safe task (<inline-formula><mml:math id="inf362"><mml:mrow><mml:mi>N</mml:mi><mml:mo>=</mml:mo><mml:mn>105</mml:mn></mml:mrow></mml:math></inline-formula>). The red points are the true values and the black points are the mean posterior values (i.e. recovered values). The 95% Bayesian credible intervals are shown with error bars. The middle and right column are individual-level parameters across the two task conditions (<inline-formula><mml:math id="inf363"><mml:mrow><mml:mi>N</mml:mi><mml:mo>=</mml:mo><mml:mn>210</mml:mn></mml:mrow></mml:math></inline-formula>). 
The <italic>x</italic> axis is the true value and the <italic>y</italic> axis is the fitted (i.e. the mean posterior) individual value. The differences between the true value and the estimated value are shown in different colours (Dark: fit well). Pearson’s correlation coefficients between the true and fitted values are shown.</p></caption><graphic mime-subtype="jpeg" mimetype="image" xlink:href="elife-75308.xml.media/fig6-figsupp3.jpg"/></fig></fig-group><p>However, a complete switch of the majority’s behaviour from the suboptimal safe options to the optimal risky option (i.e. <inline-formula><mml:math id="inf364"><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:msub><mml:mi>P</mml:mi><mml:mrow><mml:mi>r</mml:mi></mml:mrow></mml:msub><mml:mo>></mml:mo><mml:mn>0.5</mml:mn></mml:mrow></mml:mstyle></mml:math></inline-formula> for the two-armed task and <inline-formula><mml:math id="inf365"><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:msub><mml:mi>P</mml:mi><mml:mrow><mml:mi>r</mml:mi></mml:mrow></mml:msub><mml:mo>></mml:mo><mml:mn>0.25</mml:mn></mml:mrow></mml:mstyle></mml:math></inline-formula> for the four-armed task) was not widely observed. This might be because of the low copying weight (<inline-formula><mml:math id="inf366"><mml:mi>σ</mml:mi></mml:math></inline-formula>), coupled with the fact that <inline-formula><mml:math id="inf367"><mml:mrow><mml:msub><mml:mi>α</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mi>β</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>+</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:math></inline-formula> was lower for individual learners (mean [median] = 0.8 [0.3]) than for social learners (mean [median] = 1.1 [0.5]; <xref ref-type="table" rid="table3">Table 3</xref>). 
The weak average reliance on social learning (<inline-formula><mml:math id="inf368"><mml:msub><mml:mi>σ</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:math></inline-formula>) hindered the strong collective rescue effect because strong positive feedback was not robustly formed.</p><p>To quantify the effect size of the relationship between the proportion of risk taking and each subject’s best fit learning parameters, we analysed a generalised linear mixed model (GLMM) fitted with the experimental data (see Materials and methods; <xref ref-type="table" rid="table4">Table 4</xref>). Within the group condition, the GLMM analysis showed a positive effect of <inline-formula><mml:math id="inf369"><mml:msub><mml:mi>σ</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:math></inline-formula> on risk taking for every task condition (<xref ref-type="table" rid="table4">Table 4</xref>), which supports the simulated pattern. Also consistent with the simulations, in the positive RP tasks, subjects exhibited risk aversion more strongly when they had a higher value of <inline-formula><mml:math id="inf370"><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:msub><mml:mi>α</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo stretchy="false">(</mml:mo><mml:msub><mml:mi>β</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:mn>1</mml:mn><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mstyle></mml:math></inline-formula> (<xref ref-type="fig" rid="fig6s1">Figure 6—figure supplement 1a-c</xref>). There was no such clear trend in data from the negative RP task, although we cannot make a strong inference because of the large width of the Bayesian credible interval (<xref ref-type="fig" rid="fig6s1">Figure 6—figure supplement 1d</xref>). In the negative RP task, subjects were biased more towards the (favourable) safe option than subjects in the positive RP tasks (i.e. 
the intercept of the GLMM was lower in the negative RP task than in the others).<xref ref-type="table" rid="table2">Table 2</xref>.</p><table-wrap id="table4" position="float"><label>Table 4.</label><caption><title>Means and 95% Bayesian credible intervals (CIs; shown in square brackets) of the posterior estimations of the mixed logit model (generalised linear mixed model) that predicts the probability of choosing the risky alternative in the second half of the trial (<inline-formula><mml:math id="inf371"><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:mi>t</mml:mi><mml:mo>></mml:mo><mml:mn>35</mml:mn><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mstyle></mml:math></inline-formula>.</title><p>All parameters satisfied the Gelman–Rubin criterion <inline-formula><mml:math id="inf372"><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:mrow><mml:mover><mml:mi>R</mml:mi><mml:mo stretchy="false">^</mml:mo></mml:mover></mml:mrow><mml:mo><</mml:mo><mml:mn>1.01</mml:mn></mml:mrow></mml:mstyle></mml:math></inline-formula>. All estimates are based on over 500 effective samples from the posterior. 
Coefficients whose CI is either below or above 0 are highlighted.</p></caption><table frame="hsides" rules="groups"><thead><tr><th align="left" valign="bottom">Task category</th><th align="left" colspan="3" valign="bottom">Positive Risk Premium (positive RP)</th><th align="left" valign="bottom">Negative Risk Premium (negative RP)</th></tr><tr><th align="left" valign="bottom">Task</th><th align="char" char="hyphen" valign="bottom">1-risky-1-safe</th><th align="char" char="hyphen" valign="bottom">1-risky-3-safe</th><th align="char" char="hyphen" valign="bottom">2-risky-2-safe</th><th align="char" char="hyphen" valign="bottom">1-risky-1-safe</th></tr></thead><tbody><tr><td align="left" valign="bottom"/><td align="left" valign="bottom">n = 168</td><td align="left" valign="bottom">n = 148</td><td align="left" valign="bottom">n = 151</td><td align="left" valign="bottom">n = 118</td></tr><tr><td align="left" valign="bottom">Intercept</td><td align="char" char="." valign="bottom">–0.1 [-0.6, 0.3]</td><td align="char" char="." valign="bottom">–1.1 [-1.5,–0.6]</td><td align="char" char="." valign="bottom">–0.8 [-1.2,–0.4]</td><td align="char" char="." valign="bottom">–3.5 [-4.4,–2.7]</td></tr><tr><td align="left" valign="bottom">Susceptibility to the hot stove effect (α(β+1))</td><td align="char" char="." valign="bottom">–0.9 [-1.3,–0.4]</td><td align="char" char="." valign="bottom">–1.0 [-1.5,–0.5]</td><td align="char" char="." valign="bottom">–0.9 [-1.3,–0.6]</td><td align="char" char="." valign="bottom">0.6 [-0.1, 1.4]</td></tr><tr><td align="left" valign="bottom">Group (no = 0/yes = 1)</td><td align="char" char="." valign="bottom">0.0 [-0.7, 0.7]</td><td align="char" char="." valign="bottom">–0.2 [-1.0, 0.7]</td><td align="char" char="." valign="bottom">0.4 [-0.5, 1.2]</td><td align="char" char="." valign="bottom">3.8 [2.7, 4.9]</td></tr><tr><td align="left" valign="bottom">Group × α(β+1)</td><td align="char" char="." 
valign="bottom">0.6 [0.0, 1.1]</td><td align="char" char="." valign="bottom">0.4 [0.0, 0.9]</td><td align="char" char="." valign="bottom">0.3 [-0.1, 0.7]</td><td align="char" char="." valign="bottom">–1.1 [-1.9,–0.3]</td></tr><tr><td align="left" valign="bottom">Group × copying weight σ</td><td align="char" char="." valign="bottom">1.4 [0.5, 2.3]</td><td align="char" char="." valign="bottom">1.9 [0.8, 3.0]</td><td align="char" char="." valign="bottom">2.2 [0.4, 4.0]</td><td align="char" char="." valign="bottom">3.8 [2.2, 5.3]</td></tr><tr><td align="left" valign="bottom">Group × conformity exponent θ</td><td align="char" char="." valign="bottom">–0.7 [-0.9,–0.5]</td><td align="char" char="." valign="bottom">0.2 [0.0, 0.5]</td><td align="char" char="." valign="bottom">–0.3 [-0.5,–0.1]</td><td align="char" char="." valign="bottom">–1.8 [-2.1,–1.5]</td></tr></tbody></table></table-wrap><p>In sum, the experimental data analysis supports our prediction that conformist social influence promotes favourable risk taking even if individuals are biased towards risk aversion. The GLMM generally agreed with the theoretical prediction, and the fitted computational model that was supported by the Bayesian model comparison confirmed that the observed pattern was indeed likely to be a product of the collective rescue effect by conformist social learning. As predicted, the key was the balance between individual learning and the use of social information. In the Discussion, we consider the effect of the experimental setting on human learning strategies, which can be explored in future studies.</p></sec></sec><sec id="s3" sec-type="discussion"><title>Discussion</title><p>We have demonstrated that frequency-based copying, one of the most common forms of social learning strategy, can rescue decision makers from committing to adverse risk aversion in a risky trial-and-error learning task, even though a majority of individuals are potentially biased towards suboptimal risk aversion. 
Although an extremely strong reliance on conformist influence can raise the possibility of getting stuck on a suboptimal option, consistent with the previous view of herding by conformity (<xref ref-type="bibr" rid="bib57">Raafat et al., 2009</xref>; <xref ref-type="bibr" rid="bib23">Denrell and Le Mens, 2017</xref>), the mitigation of risk aversion and the concomitant collective behavioural rescue could emerge in a wide range of situations under modest use of conformist social learning.</p><p>Neither the averaging process of diverse individual inputs nor the speeding up of learning could account for the rescue effect. The individual diversity in the learning rate (<inline-formula><mml:math id="inf373"><mml:msub><mml:mi>α</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:math></inline-formula>) was beneficial for the group performance, whereas that in the social learning weight (<inline-formula><mml:math id="inf374"><mml:msub><mml:mi>σ</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:math></inline-formula>) undermined the average decision performance, which could not be explained simply by a monotonic relationship between diversity and the wisdom of crowds (<xref ref-type="bibr" rid="bib43">Lorenz et al., 2011</xref>). Self-organisation through collective behavioural dynamics emerging from the experience-based decision making must be responsible for the seemingly counter-intuitive phenomenon of collective rescue.</p><p>Our simplified differential equation model has identified a key mechanism of the collective behavioural rescue: the synergy of positive and negative feedback. Despite conformity, the probability of choosing the suboptimal option can fall below what is expected from individual learning alone. 
Indeed, an inherent individual preference for the safe alternative, expressed by the softmax function <inline-formula><mml:math id="inf375"><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:msup><mml:mi>e</mml:mi><mml:mrow><mml:mi>β</mml:mi><mml:msub><mml:mi>Q</mml:mi><mml:mrow><mml:mi>s</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:msup><mml:mrow><mml:mo>/</mml:mo></mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:msup><mml:mi>e</mml:mi><mml:mrow><mml:mi>β</mml:mi><mml:msub><mml:mi>Q</mml:mi><mml:mrow><mml:mi>s</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:msup><mml:mo>+</mml:mo><mml:msup><mml:mi>e</mml:mi><mml:mrow><mml:mi>β</mml:mi><mml:msub><mml:mi>Q</mml:mi><mml:mrow><mml:mi>r</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:msup><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mstyle></mml:math></inline-formula>, is mitigated by the conformist influence <inline-formula><mml:math id="inf376"><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:msubsup><mml:mi>N</mml:mi><mml:mrow><mml:mi>s</mml:mi></mml:mrow><mml:mrow><mml:mi>θ</mml:mi></mml:mrow></mml:msubsup><mml:mrow><mml:mo>/</mml:mo></mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:msubsup><mml:mi>N</mml:mi><mml:mrow><mml:mi>s</mml:mi></mml:mrow><mml:mrow><mml:mi>θ</mml:mi></mml:mrow></mml:msubsup><mml:mo>+</mml:mo><mml:msubsup><mml:mi>N</mml:mi><mml:mrow><mml:mi>r</mml:mi></mml:mrow><mml:mrow><mml:mi>θ</mml:mi></mml:mrow></mml:msubsup><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mstyle></mml:math></inline-formula> as long as the former is larger than the latter. In other words, risk aversion was mitigated neither because the majority chose the risky option nor because individuals were simply attracted towards the majority. Rather, participants’ choices became riskier even though the majority chose the safer alternative at the outset. 
Under social influences (whether because of informational or normative motivations), individuals become more explorative and are likely to continue sampling the risky option even after being disappointed by poor rewards. Once individual risk aversion is reduced, there will be fewer individuals choosing the suboptimal safe option, which further shrinks the majority choosing the safe option. This negative feedback facilitates individuals revisiting the risky alternative. Such an attraction to the risky option allows more individuals, including those who are currently sceptical about the value of the risky option, to experience a large bonanza from the risky option, which results in ‘gluing’ them to the risky alternative for a while. Once a majority of individuals get glued to the risky alternative, positive feedback from conformity kicks in, and optimal risk seeking is further strengthened.</p><p>Models of conformist social influences have suggested that influences from the majority on individual decision making can lead a group as a whole to a collective illusion in which individuals learn to prefer any behavioural alternative supported by many other individuals (<xref ref-type="bibr" rid="bib22">Denrell and Le Mens, 2007</xref>; <xref ref-type="bibr" rid="bib23">Denrell and Le Mens, 2017</xref>). However, previous empirical studies have repeatedly demonstrated that collective decision making under frequency-based social influences is broadly beneficial and can maintain more flexibility than suggested by models of herding and collective illusion (<xref ref-type="bibr" rid="bib73">Toyokawa et al., 2019</xref>; <xref ref-type="bibr" rid="bib3">Aplin et al., 2017</xref>; <xref ref-type="bibr" rid="bib9">Beckers et al., 1990</xref>; <xref ref-type="bibr" rid="bib62">Seeley et al., 1991</xref>; <xref ref-type="bibr" rid="bib33">Harrison et al., 2001</xref>; <xref ref-type="bibr" rid="bib38">Kandler and Laland, 2013</xref>). 
For example, <xref ref-type="bibr" rid="bib3">Aplin et al., 2017</xref> demonstrated that populations of great tits (<italic>Parus major</italic>) could switch their behavioural tradition after an environmental change even though individual birds were likely to have a strong conformist tendency. A similar phenomenon was also reported in humans (<xref ref-type="bibr" rid="bib73">Toyokawa et al., 2019</xref>).</p><p>Although these studies did not focus on risky decision making, and hence individuals were not inherently biased, experimentally induced environmental change was able to create a situation in which a majority of individuals exhibited an out-dated, suboptimal behaviour. However, as we have shown, a collective learning system could rescue their performance even though the individual distribution was strongly biased towards the suboptimal direction at the outset. The great tit and human groups were able to switch their tradition because of, rather than despite, the conformist social influence, thanks to the synergy of negative and positive feedback processes. Such a synergistic interaction between positive and negative feedback could not be predicted by the collective illusion models where individual decision making is determined fully by the majority influence because no negative feedback would be able to operate.</p><p>Through online behavioural experiments using a risky multi-armed bandit task, we have confirmed our theoretical prediction that simple frequency-based copying could mitigate risk aversion that many individual learners, especially those who had higher learning rates or lower exploration rates or both, would have exhibited as a result of the hot stove effect. The mitigation of risk aversion was also observed in the negative RP task, in which social learning slightly undermined the decision performance. 
However, because riskiness and expected reward are often positively correlated in a wide range of decision-making environments in the real world (<xref ref-type="bibr" rid="bib27">Frank, 2009</xref>; <xref ref-type="bibr" rid="bib56">Pleskac and Hertwig, 2014</xref>), the detrimental effect of reducing optimal risk aversion when the risk premium is negative could be negligible in many ecological circumstances, making conformist social learning beneficial in most cases.</p><p>Yet, a majority, albeit a smaller one, still showed risk aversion. The weak reliance on social learning, which affected less than 20% of decisions, was unable to facilitate strong positive feedback. This limited use of social information might have been due to the lack of normative motivations for conformity and to the stationarity of the task. In a stable environment, learners could eventually gather enough information as trials proceeded, which might have made them less curious about information gathering, including social learning (<xref ref-type="bibr" rid="bib60">Rendell et al., 2010</xref>). In reality, people might use more sophisticated social learning strategies whereby they change the reliance on social information flexibly over trials (<xref ref-type="bibr" rid="bib19">Deffner et al., 2020</xref>; <xref ref-type="bibr" rid="bib72">Toyokawa et al., 2017</xref>; <xref ref-type="bibr" rid="bib73">Toyokawa et al., 2019</xref>). 
Future research should consider more strategic use of social information and examine the conditions that elicit heavier reliance on conformist social learning in humans, such as normative pressures for aligning with the majority, volatility in the environment, time pressure, or an increasing number of behavioural options (<xref ref-type="bibr" rid="bib52">Muthukrishna et al., 2016</xref>), coupled with much larger group sizes (<xref ref-type="bibr" rid="bib73">Toyokawa et al., 2019</xref>).</p><p>The low learning rate <inline-formula><mml:math id="inf377"><mml:mi>α</mml:mi></mml:math></inline-formula>, which was at most 0.2 for many individuals in all the experimental tasks except for the negative RP task, should also have hindered the potential benefits of collective rescue in our current experiment, because the benefit of mitigating the hot stove effect would be minimal or hardly realised under such a small susceptibility to the hot stove effect. Although we believe that the simplest stationary environment was a necessary first step in building our understanding of the collective behavioural rescue effect, we would suggest that future studies use a temporally unstable (‘restless’) bandit task to elicit both a higher learning rate and a heavier reliance on social learning, so as to investigate the possibility of a stronger effect. 
Indeed, previous studies with changing environments have reported a learning rate as high as <inline-formula><mml:math id="inf378"><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:mi>α</mml:mi><mml:mo>></mml:mo><mml:mn>0.5</mml:mn></mml:mrow></mml:mstyle></mml:math></inline-formula> (<xref ref-type="bibr" rid="bib72">Toyokawa et al., 2017</xref>; <xref ref-type="bibr" rid="bib73">Toyokawa et al., 2019</xref>; <xref ref-type="bibr" rid="bib19">Deffner et al., 2020</xref>), under which individual learners should have suffered the hot stove trap more often.</p><p>Information about others’ payoffs might also be available in addition to inadvertent social frequency cues in some social contexts (<xref ref-type="bibr" rid="bib8">Bault et al., 2011</xref>; <xref ref-type="bibr" rid="bib13">Bolton and Harris, 1999</xref>). Knowing others’ payoffs allows one to use the ‘copy-successful-individuals’ strategy, which has been suggested to promote risk seeking irrespective of the risk premium because at least a subset of a population can be highly successful by sheer luck in risk taking (<xref ref-type="bibr" rid="bib5">Baldini, 2012</xref>; <xref ref-type="bibr" rid="bib6">Baldini, 2013</xref>; <xref ref-type="bibr" rid="bib70">Takahashi and Ihara, 2019</xref>). Additionally, cooperative communications may further amplify the suboptimal decision bias if information senders selectively communicate their own, biased, beliefs (<xref ref-type="bibr" rid="bib51">Moussaïd et al., 2015</xref>). 
Therefore, although communication may transfer information about forgone payoffs of other alternatives, which could mitigate the hot stove effect (<xref ref-type="bibr" rid="bib21">Denrell, 2007</xref>; <xref ref-type="bibr" rid="bib78">Yechiam and Busemeyer, 2006</xref>), future research should explore the potential impact of active sharing of richer information on collective learning situations (<xref ref-type="bibr" rid="bib71">Toyokawa et al., 2014</xref>).</p><p>In contrast, previous studies suggested that competitions or conflicts of interest among individuals can lead to better collective intelligence than fully cooperative situations (<xref ref-type="bibr" rid="bib17">Conradt et al., 2013</xref>) and can promote adaptive risk taking (<xref ref-type="bibr" rid="bib4">Arbilly et al., 2011</xref>). Further research will identify conditions under which cooperative communication containing richer information can improve decision making and drive adaptive cumulative cultural transmission (<xref ref-type="bibr" rid="bib18">Csibra and Gergely, 2011</xref>; <xref ref-type="bibr" rid="bib50">Morgan et al., 2015</xref>), when adverse biases in individual decision-making processes prevail.</p><p>The generality of our dynamics model should apply to various collective decision-making systems, not only to human groups. Because it is a fundamental property of adaptive reinforcement learning, risk aversion due to the hot stove effect should be widespread in animals (<xref ref-type="bibr" rid="bib58">Real, 1981</xref>; <xref ref-type="bibr" rid="bib76">Weber et al., 2004</xref>; <xref ref-type="bibr" rid="bib35">Hertwig and Erev, 2009</xref>). 
Therefore, its solution, the collective behavioural rescue, should also operate broadly in collective animal decision making because frequency-based copying is one of the common social learning strategies (<xref ref-type="bibr" rid="bib36">Hoppitt and Laland, 2013</xref>; <xref ref-type="bibr" rid="bib31">Grüter and Leadbeater, 2014</xref>). Future research should determine to what extent the collective behavioural rescue actually impacts animal decision making in wider contexts, and whether it influences the evolution of social learning, information sharing, and the formation of group living.</p><p>We have identified a previously overlooked mechanism underlying the adaptive advantages of frequency-based social learning. Our results suggest that an informational benefit of group living could exist well beyond simple informational pooling where individuals can enjoy the wisdom of crowds effect (<xref ref-type="bibr" rid="bib74">Ward and Zahavi, 1973</xref>). Furthermore, the flexibility emerging through the interaction of negative and positive feedback suggests that conformity could evolve in a wider range of environments than previously assumed (<xref ref-type="bibr" rid="bib2">Aoki and Feldman, 2014</xref>; <xref ref-type="bibr" rid="bib55">Nakahashi et al., 2012</xref>), including temporally variable environments (<xref ref-type="bibr" rid="bib3">Aplin et al., 2017</xref>). 
Social learning can drive self-organisation, regulating the mitigation and amplification of behavioural biases and canalising the course of repeated decision making under risk and uncertainty.</p></sec><sec id="s4" sec-type="materials|methods"><title>Materials and methods</title><sec id="s4-1"><title>The baseline asocial learning model and the hot stove effect</title><p>We assumed that the decision maker updates their value of choosing the alternative <inline-formula><mml:math id="inf379"><mml:mi>i</mml:mi></mml:math></inline-formula> (<inline-formula><mml:math id="inf380"><mml:mrow><mml:mi/><mml:mo>∈</mml:mo><mml:mrow><mml:mo stretchy="false">{</mml:mo><mml:mi>s</mml:mi><mml:mo>,</mml:mo><mml:mi>r</mml:mi><mml:mo stretchy="false">}</mml:mo></mml:mrow></mml:mrow></mml:math></inline-formula>) at time <inline-formula><mml:math id="inf381"><mml:mi>t</mml:mi></mml:math></inline-formula> following the Rescorla–Wagner learning rule: <inline-formula><mml:math id="inf382"><mml:mrow><mml:msub><mml:mi>Q</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mrow><mml:mi>t</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:mrow></mml:msub><mml:mo>←</mml:mo><mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mn>1</mml:mn><mml:mo>-</mml:mo><mml:mi>α</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>⁢</mml:mo><mml:msub><mml:mi>Q</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>t</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo>+</mml:mo><mml:mrow><mml:mi>α</mml:mi><mml:mo>⁢</mml:mo><mml:msub><mml:mi>π</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>t</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:mrow></mml:mrow></mml:math></inline-formula>, where <inline-formula><mml:math id="inf383"><mml:mi>α</mml:mi></mml:math></inline-formula> (<inline-formula><mml:math 
id="inf384"><mml:mrow><mml:mn>0</mml:mn><mml:mo>≤</mml:mo><mml:mi>α</mml:mi><mml:mo>≤</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:math></inline-formula>) is a <italic>learning rate</italic>, manipulating the step size of the belief updating, and <inline-formula><mml:math id="inf385"><mml:msub><mml:mi>π</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>t</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> is a realised payoff from the chosen alternative <inline-formula><mml:math id="inf386"><mml:mi>i</mml:mi></mml:math></inline-formula> at time <inline-formula><mml:math id="inf387"><mml:mi>t</mml:mi></mml:math></inline-formula> (<xref ref-type="bibr" rid="bib68">Sutton and Barto, 2018</xref>). The larger the <inline-formula><mml:math id="inf388"><mml:mi>α</mml:mi></mml:math></inline-formula>, the more weight is given to recent experiences, making reinforcement learning more myopic. The <inline-formula><mml:math id="inf389"><mml:mi>Q</mml:mi></mml:math></inline-formula> value for the unchosen alternative is unchanged. Before the first choice, individuals had no previous preference for either option (i.e. <inline-formula><mml:math id="inf390"><mml:mrow><mml:msub><mml:mi>Q</mml:mi><mml:mrow><mml:mi>r</mml:mi><mml:mo>,</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mi>Q</mml:mi><mml:mrow><mml:mi>s</mml:mi><mml:mo>,</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mn>0</mml:mn></mml:mrow></mml:math></inline-formula>). 
Then <inline-formula><mml:math id="inf391"><mml:mi>Q</mml:mi></mml:math></inline-formula> values were translated into choice probabilities through a softmax (or multinomial-logistic) function such that <inline-formula><mml:math id="inf392"><mml:mrow><mml:msub><mml:mi>P</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mrow><mml:mrow><mml:mi>exp</mml:mi><mml:mo>⁡</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>β</mml:mi><mml:mo>⁢</mml:mo><mml:msub><mml:mi>Q</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>t</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mo>/</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mrow><mml:mi>exp</mml:mi><mml:mo>⁡</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>β</mml:mi><mml:mo>⁢</mml:mo><mml:msub><mml:mi>Q</mml:mi><mml:mrow><mml:mi>s</mml:mi><mml:mo>,</mml:mo><mml:mi>t</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mo>+</mml:mo><mml:mrow><mml:mi>exp</mml:mi><mml:mo>⁡</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>β</mml:mi><mml:mo>⁢</mml:mo><mml:msub><mml:mi>Q</mml:mi><mml:mrow><mml:mi>r</mml:mi><mml:mo>,</mml:mo><mml:mi>t</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:mrow></mml:math></inline-formula>, where <inline-formula><mml:math id="inf393"><mml:mi>β</mml:mi></mml:math></inline-formula>, the <italic>inverse temperature</italic>, is a parameter regulating how sensitive the choice probability is to the value of the estimate <inline-formula><mml:math id="inf394"><mml:mi>Q</mml:mi></mml:math></inline-formula> (i.e. 
controlling the proneness to explore).</p><p>In such a risk-heterogeneous multi-armed bandit setting, reinforcement learners are prone to exhibiting suboptimal risk aversion (<xref ref-type="bibr" rid="bib46">March, 1996</xref>; <xref ref-type="bibr" rid="bib21">Denrell, 2007</xref>; <xref ref-type="bibr" rid="bib35">Hertwig and Erev, 2009</xref>), even though they could have achieved high performance in a risk-homogeneous task where all options have an equivalent payoff variance (<xref ref-type="bibr" rid="bib68">Sutton and Barto, 2018</xref>). <xref ref-type="bibr" rid="bib21">Denrell, 2007</xref> mathematically derived the condition under which suboptimal risk aversion arises, depicted by the dashed curve in <xref ref-type="fig" rid="fig1">Figure 1b</xref>. In the main analysis, we focused on the case where the risky alternative had <inline-formula><mml:math id="inf395"><mml:mrow><mml:mi>μ</mml:mi><mml:mo>=</mml:mo><mml:mn>1.5</mml:mn></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="inf396"><mml:mrow><mml:mtext>s.d.</mml:mtext><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:math></inline-formula> and the safe alternative generated <inline-formula><mml:math id="inf397"><mml:mrow><mml:msub><mml:mi>π</mml:mi><mml:mi>s</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:math></inline-formula> unless otherwise stated, that is, where choosing the risky alternative was the optimal strategy for a decision maker in the long run.</p></sec><sec id="s4-2"><title>Collective learning and social influences</title><p>We extended the baseline model to a collective learning situation in which a group of 10 individuals completed the task simultaneously and individuals could obtain social information. 
For social information, we assumed a simple frequency-based social cue specifying distributions of individual choices (<xref ref-type="bibr" rid="bib47">McElreath et al., 2005</xref>; <xref ref-type="bibr" rid="bib48">McElreath et al., 2008</xref>; <xref ref-type="bibr" rid="bib72">Toyokawa et al., 2017</xref>; <xref ref-type="bibr" rid="bib73">Toyokawa et al., 2019</xref>; <xref ref-type="bibr" rid="bib19">Deffner et al., 2020</xref>). Following the previous modelling of social learning in such multi-agent multi-armed bandit situations (e.g. <xref ref-type="bibr" rid="bib3">Aplin et al., 2017</xref>; <xref ref-type="bibr" rid="bib7">Barrett et al., 2017</xref>; <xref ref-type="bibr" rid="bib47">McElreath et al., 2005</xref>; <xref ref-type="bibr" rid="bib48">McElreath et al., 2008</xref>; <xref ref-type="bibr" rid="bib72">Toyokawa et al., 2017</xref>; <xref ref-type="bibr" rid="bib73">Toyokawa et al., 2019</xref>; <xref ref-type="bibr" rid="bib19">Deffner et al., 2020</xref>), we assumed that social influences on reinforcement learning would be expressed as a weighted average between the softmax probability based on the <inline-formula><mml:math id="inf398"><mml:mi>Q</mml:mi></mml:math></inline-formula> values and the conformist social influence, as follows:<disp-formula id="equ1"><label>(1)</label><mml:math id="m1"><mml:mrow><mml:msub><mml:mi>P</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mo stretchy="false">(</mml:mo><mml:mn>1</mml:mn><mml:mo>−</mml:mo><mml:mi>σ</mml:mi><mml:mo stretchy="false">)</mml:mo><mml:mfrac><mml:mrow><mml:mi>exp</mml:mi><mml:mo>⁡</mml:mo><mml:mo stretchy="false">(</mml:mo><mml:mi>β</mml:mi><mml:msub><mml:mi>Q</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mrow><mml:mi>exp</mml:mi><mml:mo>⁡</mml:mo><mml:mo 
stretchy="false">(</mml:mo><mml:mi>β</mml:mi><mml:msub><mml:mi>Q</mml:mi><mml:mrow><mml:mi>r</mml:mi><mml:mo>,</mml:mo><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:mo stretchy="false">)</mml:mo><mml:mo>+</mml:mo><mml:mi>exp</mml:mi><mml:mo>⁡</mml:mo><mml:mo stretchy="false">(</mml:mo><mml:mi>β</mml:mi><mml:msub><mml:mi>Q</mml:mi><mml:mrow><mml:mi>s</mml:mi><mml:mo>,</mml:mo><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mfrac><mml:mo>+</mml:mo><mml:mi>σ</mml:mi><mml:mfrac><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:msub><mml:mi>N</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>t</mml:mi><mml:mo>−</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:mn>0.1</mml:mn><mml:msup><mml:mo stretchy="false">)</mml:mo><mml:mrow><mml:mi>θ</mml:mi></mml:mrow></mml:msup></mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:msub><mml:mi>N</mml:mi><mml:mrow><mml:mi>s</mml:mi><mml:mo>,</mml:mo><mml:mi>t</mml:mi><mml:mo>−</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:mn>0.1</mml:mn><mml:msup><mml:mo stretchy="false">)</mml:mo><mml:mrow><mml:mi>θ</mml:mi></mml:mrow></mml:msup><mml:mo>+</mml:mo><mml:mo stretchy="false">(</mml:mo><mml:msub><mml:mi>N</mml:mi><mml:mrow><mml:mi>r</mml:mi><mml:mo>,</mml:mo><mml:mi>t</mml:mi><mml:mo>−</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:mn>0.1</mml:mn><mml:msup><mml:mo stretchy="false">)</mml:mo><mml:mrow><mml:mi>θ</mml:mi></mml:mrow></mml:msup></mml:mrow></mml:mfrac></mml:mrow></mml:math></disp-formula></p><p>where <inline-formula><mml:math id="inf399"><mml:mi>σ</mml:mi></mml:math></inline-formula> was a weight given to the social influence (<italic>copying weight</italic>) and <inline-formula><mml:math id="inf400"><mml:mi>θ</mml:mi></mml:math></inline-formula> was the strength of conformist influence (<italic>conformity exponent</italic>), which determines the influence 
of social frequency on choosing the alternative <inline-formula><mml:math id="inf401"><mml:mi>i</mml:mi></mml:math></inline-formula> at time <inline-formula><mml:math id="inf402"><mml:mrow><mml:mi>t</mml:mi><mml:mo>-</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:math></inline-formula>, that is, <inline-formula><mml:math id="inf403"><mml:msub><mml:mi>N</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mrow><mml:mi>t</mml:mi><mml:mo>-</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:mrow></mml:msub></mml:math></inline-formula>. The larger the conformity exponent <inline-formula><mml:math id="inf404"><mml:mi>θ</mml:mi></mml:math></inline-formula>, the higher the influence that was given to an alternative that was chosen by more individuals, with non-linear conformist social influence arising when <inline-formula><mml:math id="inf405"><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:mi>θ</mml:mi><mml:mo>></mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:mstyle></mml:math></inline-formula>. We added a small number, 0.1, to <inline-formula><mml:math id="inf406"><mml:msub><mml:mi>N</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mrow><mml:mi>t</mml:mi><mml:mo>-</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:mrow></mml:msub></mml:math></inline-formula> so that an option chosen by no one (i.e., <inline-formula><mml:math id="inf407"><mml:mrow><mml:msub><mml:mi>N</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mrow><mml:mi>t</mml:mi><mml:mo>-</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mn>0</mml:mn></mml:mrow></mml:math></inline-formula>) could provide the highest social influence when <inline-formula><mml:math id="inf408"><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:mi>θ</mml:mi><mml:mo><</mml:mo><mml:mn>0</mml:mn></mml:mrow></mml:mstyle></mml:math></inline-formula> (negative frequency bias). 
Although this additional 0.1 slightly reduces the conformity influence when <inline-formula><mml:math id="inf409"><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:mi>θ</mml:mi><mml:mo>></mml:mo><mml:mn>0</mml:mn></mml:mrow></mml:mstyle></mml:math></inline-formula>, we confirmed that the results were qualitatively unchanged. Note also that in the first trial <inline-formula><mml:math id="inf410"><mml:mrow><mml:mi>t</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:math></inline-formula>, we assumed that the choice was determined solely by the asocial softmax function because there was no social information available yet.</p><p>Note that when <inline-formula><mml:math id="inf411"><mml:mrow><mml:mi>σ</mml:mi><mml:mo>=</mml:mo><mml:mn>0</mml:mn></mml:mrow></mml:math></inline-formula>, there is no social influence, and the decision maker is considered an asocial learner. It is also worth noting that when <inline-formula><mml:math id="inf412"><mml:mrow><mml:mi>σ</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:math></inline-formula> with <inline-formula><mml:math id="inf413"><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:mi>θ</mml:mi><mml:mo>></mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:mstyle></mml:math></inline-formula>, individual choices become fully contingent on the group’s most common behaviour, which was assumed in some previous models of strong conformist social influences in sampling behaviour (<xref ref-type="bibr" rid="bib23">Denrell and Le Mens, 2017</xref>). The descriptions of the parameters are shown in <xref ref-type="table" rid="table1">Table 1</xref>. 
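</p><p>As a minimal illustration of the choice rule described above, the following Python sketch (not the authors’ R code) computes the choice probabilities for a single trial. The function name <monospace>choice_probabilities</monospace> and its argument names are hypothetical. The (1 − σ) weight on the asocial softmax term is an assumption, inferred from the statements that σ = 0 yields an asocial learner and that σ = 1 makes choices fully contingent on social frequency. Passing no counts stands for the first trial, in which only the asocial softmax applies:</p><code language="python">
```python
import math


def choice_probabilities(q_values, counts, beta, sigma, theta, eps=0.1):
    """Return the probability of choosing each option on one trial.

    q_values -- current value estimates Q_{i,t}, one per option
    counts   -- N_{i,t-1}, how many individuals chose each option on the
                preceding trial, or None on the first trial
    beta     -- inverse temperature of the asocial softmax
    sigma    -- copying weight
    theta    -- conformity exponent
    eps      -- the small constant (0.1) added to every count
    """
    # Asocial softmax. Subtracting the largest value leaves the
    # probabilities unchanged and avoids overflow in exp().
    q_max = max(q_values)
    weights = [math.exp(beta * (q - q_max)) for q in q_values]
    weight_sum = sum(weights)
    asocial = [w / weight_sum for w in weights]

    if counts is None:
        # First trial: no social information is available yet.
        return asocial

    # Frequency-dependent term. Because eps is positive, an option that
    # nobody chose still has a defined weight when theta is negative.
    social = [(n + eps) ** theta for n in counts]
    social_sum = sum(social)

    return [(1 - sigma) * a + sigma * s / social_sum
            for a, s in zip(asocial, social)]
```
</code><p>Under these assumptions, a call with σ = 1, θ = −1 and counts of 0 and 4 assigns most of the probability to the option that nobody chose, which is the negative frequency bias that the added constant is meant to allow.</p><p>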
The simulations were run in R 4.0.2 (<ext-link ext-link-type="uri" xlink:href="https://www.r-project.org">https://www.r-project.org</ext-link>) and the code is available at (<ext-link ext-link-type="uri" xlink:href="https://github.com/WataruToyokawa/ToyokawaGaissmaier2021/tree/main/dynamicsModel">the author’s github repository</ext-link>).</p></sec><sec id="s4-3"><title>The approximated dynamics model of collective behaviour</title><p>We assume a group of <inline-formula><mml:math id="inf414"><mml:mi>N</mml:mi></mml:math></inline-formula> individuals who exhibit two different behavioural states: choosing a safe alternative <inline-formula><mml:math id="inf415"><mml:mi>S</mml:mi></mml:math></inline-formula>, exhibited by <inline-formula><mml:math id="inf416"><mml:msub><mml:mi>N</mml:mi><mml:mi>S</mml:mi></mml:msub></mml:math></inline-formula> individuals; and choosing a risky alternative <inline-formula><mml:math id="inf417"><mml:mi>R</mml:mi></mml:math></inline-formula>, exhibited by <inline-formula><mml:math id="inf418"><mml:msub><mml:mi>N</mml:mi><mml:mi>R</mml:mi></mml:msub></mml:math></inline-formula> individuals (<inline-formula><mml:math id="inf419"><mml:mrow><mml:mi>N</mml:mi><mml:mo>=</mml:mo><mml:mrow><mml:msub><mml:mi>N</mml:mi><mml:mi>S</mml:mi></mml:msub><mml:mo>+</mml:mo><mml:msub><mml:mi>N</mml:mi><mml:mi>R</mml:mi></mml:msub></mml:mrow></mml:mrow></mml:math></inline-formula>). We also assume that there are two different ‘inner belief’ states, labelled ‘-’ and ‘+’. Individuals who possess the negative belief prefer the safe alternative <inline-formula><mml:math id="inf420"><mml:mi>S</mml:mi></mml:math></inline-formula> to <inline-formula><mml:math id="inf421"><mml:mi>R</mml:mi></mml:math></inline-formula>, while those who possess the positive belief prefer <inline-formula><mml:math id="inf422"><mml:mi>R</mml:mi></mml:math></inline-formula> to <inline-formula><mml:math id="inf423"><mml:mi>S</mml:mi></mml:math></inline-formula>. 
The per capita probability of shifting from one behavioural alternative to the other is denoted by <inline-formula><mml:math id="inf424"><mml:mi>P</mml:mi></mml:math></inline-formula>. For example, <inline-formula><mml:math id="inf425"><mml:msubsup><mml:mi>P</mml:mi><mml:mi>S</mml:mi><mml:mo>-</mml:mo></mml:msubsup></mml:math></inline-formula> is the individual probability of changing the choice to the safe alternative from the risky alternative under the negative belief. Because there exist <inline-formula><mml:math id="inf426"><mml:msubsup><mml:mi>N</mml:mi><mml:mi>R</mml:mi><mml:mo>-</mml:mo></mml:msubsup></mml:math></inline-formula> individuals who chose <inline-formula><mml:math id="inf427"><mml:mi>R</mml:mi></mml:math></inline-formula> with belief -, the total number of individuals who ‘move on’ to <inline-formula><mml:math id="inf428"><mml:mi>S</mml:mi></mml:math></inline-formula> from <inline-formula><mml:math id="inf429"><mml:mi>R</mml:mi></mml:math></inline-formula> at one time step is given by <inline-formula><mml:math id="inf430"><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:msubsup><mml:mi>P</mml:mi><mml:mrow><mml:mi>S</mml:mi></mml:mrow><mml:mrow><mml:mo>−</mml:mo></mml:mrow></mml:msubsup><mml:msubsup><mml:mi>N</mml:mi><mml:mrow><mml:mi>R</mml:mi></mml:mrow><mml:mrow><mml:mo>−</mml:mo></mml:mrow></mml:msubsup></mml:mrow></mml:mstyle></mml:math></inline-formula>. 
We assume that the probability of shifting to the more preferable option is larger than that of shifting to the less preferable option, that is, <inline-formula><mml:math id="inf431"><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:msubsup><mml:mi>P</mml:mi><mml:mrow><mml:mi>S</mml:mi></mml:mrow><mml:mrow><mml:mo>−</mml:mo></mml:mrow></mml:msubsup><mml:mo>></mml:mo><mml:msubsup><mml:mi>P</mml:mi><mml:mrow><mml:mi>R</mml:mi></mml:mrow><mml:mrow><mml:mo>−</mml:mo></mml:mrow></mml:msubsup></mml:mrow></mml:mstyle></mml:math></inline-formula> and <inline-formula><mml:math id="inf432"><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:msubsup><mml:mi>P</mml:mi><mml:mrow><mml:mi>R</mml:mi></mml:mrow><mml:mrow><mml:mo>+</mml:mo></mml:mrow></mml:msubsup><mml:mo>></mml:mo><mml:msubsup><mml:mi>P</mml:mi><mml:mrow><mml:mi>S</mml:mi></mml:mrow><mml:mrow><mml:mo>+</mml:mo></mml:mrow></mml:msubsup></mml:mrow></mml:mstyle></mml:math></inline-formula> (<xref ref-type="fig" rid="fig4">Figure 4a</xref>).</p><p>We assume that choosing the risky alternative can change the belief state. We define the per capita probability of entering the + state, that is, of having a higher preference for the risky alternative, as <inline-formula><mml:math id="inf433"><mml:mi>e</mml:mi></mml:math></inline-formula> (<inline-formula><mml:math id="inf434"><mml:mrow><mml:mn>0</mml:mn><mml:mo>≤</mml:mo><mml:mi>e</mml:mi><mml:mo>≤</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:math></inline-formula>), and hence <inline-formula><mml:math id="inf435"><mml:mrow><mml:msubsup><mml:mi>N</mml:mi><mml:mi>R</mml:mi><mml:mo>+</mml:mo></mml:msubsup><mml:mo>=</mml:mo><mml:mrow><mml:mi>e</mml:mi><mml:mo>⁢</mml:mo><mml:msub><mml:mi>N</mml:mi><mml:mi>R</mml:mi></mml:msub></mml:mrow></mml:mrow></mml:math></inline-formula>. 
The remaining individuals who choose the risky alternative enter the - belief state, that is, <inline-formula><mml:math id="inf436"><mml:mrow><mml:msubsup><mml:mi>N</mml:mi><mml:mi>R</mml:mi><mml:mo>-</mml:mo></mml:msubsup><mml:mo>=</mml:mo><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mn>1</mml:mn><mml:mo>-</mml:mo><mml:mi>e</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>⁢</mml:mo><mml:msub><mml:mi>N</mml:mi><mml:mi>R</mml:mi></mml:msub></mml:mrow></mml:mrow></mml:math></inline-formula>.</p><p>We define ‘<inline-formula><mml:math id="inf437"><mml:mi>e</mml:mi></mml:math></inline-formula>’ so that it can be seen as a risk premium of the gambles. For example, imagine a two-armed bandit task with one risky arm with Gaussian noise and one sure arm. The larger the expected reward of the risky option (i.e. the higher the risk premium), the more of the people who choose the risky arm are expected to obtain a larger reward than the safe alternative would provide. Assuming <inline-formula><mml:math id="inf438"><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:mi>e</mml:mi><mml:mo>></mml:mo><mml:mn>1</mml:mn><mml:mrow><mml:mo>/</mml:mo></mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:mstyle></mml:math></inline-formula> therefore approximates a situation where risk seeking is optimal in the long run.</p><p>Here, we focus only on the population dynamics: If more people choose <inline-formula><mml:math id="inf439"><mml:mi>S</mml:mi></mml:math></inline-formula>, <inline-formula><mml:math id="inf440"><mml:msub><mml:mi>N</mml:mi><mml:mi>S</mml:mi></mml:msub></mml:math></inline-formula> increases. On the other hand, if more people choose <inline-formula><mml:math id="inf441"><mml:mi>R</mml:mi></mml:math></inline-formula>, <inline-formula><mml:math id="inf442"><mml:msub><mml:mi>N</mml:mi><mml:mi>R</mml:mi></mml:msub></mml:math></inline-formula> increases. 
As a consequence, the system may eventually reach an equilibrium state where both <inline-formula><mml:math id="inf443"><mml:msub><mml:mi>N</mml:mi><mml:mi>S</mml:mi></mml:msub></mml:math></inline-formula> and <inline-formula><mml:math id="inf444"><mml:msub><mml:mi>N</mml:mi><mml:mi>R</mml:mi></mml:msub></mml:math></inline-formula> no longer change. If we find that the equilibrium state of the population (denoted by ⋆) satisfies <inline-formula><mml:math id="inf445"><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:msubsup><mml:mi>N</mml:mi><mml:mrow><mml:mi>R</mml:mi></mml:mrow><mml:mrow><mml:mo>⋆</mml:mo></mml:mrow></mml:msubsup><mml:mo>></mml:mo><mml:msubsup><mml:mi>N</mml:mi><mml:mrow><mml:mi>S</mml:mi></mml:mrow><mml:mrow><mml:mo>⋆</mml:mo></mml:mrow></mml:msubsup></mml:mrow></mml:mstyle></mml:math></inline-formula>, we say that the population exhibits risk seeking, escaping from the hot stove effect. For the sake of simplicity, we assumed <inline-formula><mml:math id="inf446"><mml:mrow><mml:msub><mml:mi>p</mml:mi><mml:mi>l</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:msubsup><mml:mi>P</mml:mi><mml:mi>R</mml:mi><mml:mo>-</mml:mo></mml:msubsup><mml:mo>=</mml:mo><mml:msubsup><mml:mi>P</mml:mi><mml:mi>S</mml:mi><mml:mo>+</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="inf447"><mml:mrow><mml:msub><mml:mi>p</mml:mi><mml:mi>h</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:msubsup><mml:mi>P</mml:mi><mml:mi>R</mml:mi><mml:mo>+</mml:mo></mml:msubsup><mml:mo>=</mml:mo><mml:msubsup><mml:mi>P</mml:mi><mml:mi>S</mml:mi><mml:mo>-</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula>, where <inline-formula><mml:math id="inf448"><mml:mrow><mml:mn>0</mml:mn><mml:mo>≤</mml:mo><mml:msub><mml:mi>p</mml:mi><mml:mi>l</mml:mi></mml:msub><mml:mo>≤</mml:mo><mml:msub><mml:mi>p</mml:mi><mml:mi>h</mml:mi></mml:msub><mml:mo>≤</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:math></inline-formula>, for the asocial baseline 
model.</p><p>Considering <inline-formula><mml:math id="inf449"><mml:mrow><mml:msubsup><mml:mi>N</mml:mi><mml:mi>R</mml:mi><mml:mo>+</mml:mo></mml:msubsup><mml:mo>=</mml:mo><mml:mrow><mml:mi>e</mml:mi><mml:mo>⁢</mml:mo><mml:msub><mml:mi>N</mml:mi><mml:mi>R</mml:mi></mml:msub></mml:mrow></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="inf450"><mml:mrow><mml:msubsup><mml:mi>N</mml:mi><mml:mi>R</mml:mi><mml:mo>-</mml:mo></mml:msubsup><mml:mo>=</mml:mo><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mn>1</mml:mn><mml:mo>-</mml:mo><mml:mi>e</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>⁢</mml:mo><mml:msub><mml:mi>N</mml:mi><mml:mi>R</mml:mi></mml:msub></mml:mrow></mml:mrow></mml:math></inline-formula>, the dynamics are written as the following differential equations:<disp-formula id="equ2"><label>(2)</label><mml:math id="m2"><mml:mrow><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:mo>{</mml:mo><mml:mtable columnalign="left left" columnspacing="1em" displaystyle="false" rowspacing=".2em"><mml:mtr><mml:mtd><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mfrac><mml:mrow><mml:mi>d</mml:mi><mml:msub><mml:mi>N</mml:mi><mml:mrow><mml:mi>R</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:mi>d</mml:mi><mml:mi>t</mml:mi></mml:mrow></mml:mfrac><mml:mo>=</mml:mo><mml:msub><mml:mi>p</mml:mi><mml:mrow><mml:mi>l</mml:mi></mml:mrow></mml:msub><mml:msubsup><mml:mi>N</mml:mi><mml:mrow><mml:mi>S</mml:mi></mml:mrow><mml:mrow><mml:mo>−</mml:mo></mml:mrow></mml:msubsup><mml:mo>−</mml:mo><mml:msub><mml:mi>p</mml:mi><mml:mrow><mml:mi>h</mml:mi></mml:mrow></mml:msub><mml:mo stretchy="false">(</mml:mo><mml:mn>1</mml:mn><mml:mo>−</mml:mo><mml:mi>e</mml:mi><mml:mo 
stretchy="false">)</mml:mo><mml:msub><mml:mi>N</mml:mi><mml:mrow><mml:mi>R</mml:mi></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:msub><mml:mi>p</mml:mi><mml:mrow><mml:mi>h</mml:mi></mml:mrow></mml:msub><mml:msubsup><mml:mi>N</mml:mi><mml:mrow><mml:mi>S</mml:mi></mml:mrow><mml:mrow><mml:mo>+</mml:mo></mml:mrow></mml:msubsup><mml:mo>−</mml:mo><mml:msub><mml:mi>p</mml:mi><mml:mrow><mml:mi>l</mml:mi></mml:mrow></mml:msub><mml:mi>e</mml:mi><mml:msub><mml:mi>N</mml:mi><mml:mrow><mml:mi>R</mml:mi></mml:mrow></mml:msub></mml:mstyle></mml:mtd><mml:mtd/></mml:mtr><mml:mtr><mml:mtd><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mfrac><mml:mrow><mml:mi>d</mml:mi><mml:msubsup><mml:mi>N</mml:mi><mml:mrow><mml:mi>S</mml:mi></mml:mrow><mml:mrow><mml:mo>−</mml:mo></mml:mrow></mml:msubsup></mml:mrow><mml:mrow><mml:mi>d</mml:mi><mml:mi>t</mml:mi></mml:mrow></mml:mfrac><mml:mo>=</mml:mo><mml:mo>−</mml:mo><mml:msub><mml:mi>p</mml:mi><mml:mrow><mml:mi>l</mml:mi></mml:mrow></mml:msub><mml:msubsup><mml:mi>N</mml:mi><mml:mrow><mml:mi>S</mml:mi></mml:mrow><mml:mrow><mml:mo>−</mml:mo></mml:mrow></mml:msubsup><mml:mo>+</mml:mo><mml:msub><mml:mi>p</mml:mi><mml:mrow><mml:mi>h</mml:mi></mml:mrow></mml:msub><mml:mo stretchy="false">(</mml:mo><mml:mn>1</mml:mn><mml:mo>−</mml:mo><mml:mi>e</mml:mi><mml:mo stretchy="false">)</mml:mo><mml:msub><mml:mi>N</mml:mi><mml:mrow><mml:mi>R</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo></mml:mstyle></mml:mtd><mml:mtd/></mml:mtr><mml:mtr><mml:mtd><mml:mstyle displaystyle="true" 
scriptlevel="0"><mml:mfrac><mml:mrow><mml:mi>d</mml:mi><mml:msubsup><mml:mi>N</mml:mi><mml:mrow><mml:mi>S</mml:mi></mml:mrow><mml:mrow><mml:mo>+</mml:mo></mml:mrow></mml:msubsup></mml:mrow><mml:mrow><mml:mi>d</mml:mi><mml:mi>t</mml:mi></mml:mrow></mml:mfrac><mml:mo>=</mml:mo><mml:mo>−</mml:mo><mml:msub><mml:mi>p</mml:mi><mml:mrow><mml:mi>h</mml:mi></mml:mrow></mml:msub><mml:msubsup><mml:mi>N</mml:mi><mml:mrow><mml:mi>S</mml:mi></mml:mrow><mml:mrow><mml:mo>+</mml:mo></mml:mrow></mml:msubsup><mml:mo>+</mml:mo><mml:msub><mml:mi>p</mml:mi><mml:mrow><mml:mi>l</mml:mi></mml:mrow></mml:msub><mml:mi>e</mml:mi><mml:msub><mml:mi>N</mml:mi><mml:mrow><mml:mi>R</mml:mi></mml:mrow></mml:msub><mml:mo>.</mml:mo></mml:mstyle></mml:mtd><mml:mtd/></mml:mtr></mml:mtable><mml:mo fence="true" stretchy="true" symmetric="true"/></mml:mrow></mml:mstyle></mml:mrow></mml:math></disp-formula></p><p>Overall, our model crystallises the asymmetry emerging from adaptive sampling, which is considered as a fundamental mechanism of the hot stove effect (<xref ref-type="bibr" rid="bib21">Denrell, 2007</xref>; <xref ref-type="bibr" rid="bib46">March, 1996</xref>): Once decision makers underestimate the expected value of the risky alternative, they start avoiding it and do not have another chance to correct the error. In other words, although there would potentially be more individuals who obtain a preference for <inline-formula><mml:math id="inf451"><mml:mi>R</mml:mi></mml:math></inline-formula> by choosing the risky alternative (i.e. 
<inline-formula><mml:math id="inf452"><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:mi>e</mml:mi><mml:mo>></mml:mo><mml:mn>0.5</mml:mn></mml:mrow></mml:mstyle></mml:math></inline-formula>), the asymmetry arising from the adaptive balance between exploration and exploitation may constantly increase the number of people who possess a preference for <inline-formula><mml:math id="inf453"><mml:mi>S</mml:mi></mml:math></inline-formula> due to underestimation of the value of the risky alternative. If our model captures these asymmetric dynamics properly, the relationship between <inline-formula><mml:math id="inf454"><mml:mi>e</mml:mi></mml:math></inline-formula> (i.e. the potential goodness of the risky option) and <inline-formula><mml:math id="inf455"><mml:mrow><mml:msub><mml:mi>p</mml:mi><mml:mi>l</mml:mi></mml:msub><mml:mo>/</mml:mo><mml:msub><mml:mi>p</mml:mi><mml:mi>h</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> (i.e. the exploration–exploitation balance) should account for the hot stove effect, as suggested by a previous learning model analysis (<xref ref-type="bibr" rid="bib21">Denrell, 2007</xref>). The equilibrium analysis was conducted in Mathematica (<ext-link ext-link-type="uri" xlink:href="https://github.com/WataruToyokawa/ToyokawaGaissmaier2021/tree/main/dynamicsModel">code is available online</ext-link>). 
The results are shown in <xref ref-type="fig" rid="fig4">Figure 4</xref>.</p></sec><sec id="s4-4"><title>Collective dynamics with social influences</title><p>For social influences, we assumed that the behavioural transition rates, <inline-formula><mml:math id="inf456"><mml:msub><mml:mi>P</mml:mi><mml:mi>S</mml:mi></mml:msub></mml:math></inline-formula> and <inline-formula><mml:math id="inf457"><mml:msub><mml:mi>P</mml:mi><mml:mi>R</mml:mi></mml:msub></mml:math></inline-formula>, would depend on the number of individuals <inline-formula><mml:math id="inf458"><mml:msub><mml:mi>N</mml:mi><mml:mi>S</mml:mi></mml:msub></mml:math></inline-formula> and <inline-formula><mml:math id="inf459"><mml:msub><mml:mi>N</mml:mi><mml:mi>R</mml:mi></mml:msub></mml:math></inline-formula> as follows:<disp-formula id="equ3"><label>(3)</label><mml:math id="m3"><mml:mrow><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:mo>{</mml:mo><mml:mtable columnalign="left left" columnspacing="1em" displaystyle="false" rowspacing=".2em"><mml:mtr><mml:mtd><mml:mstyle displaystyle="true" scriptlevel="0"><mml:msubsup><mml:mi>P</mml:mi><mml:mrow><mml:mi>S</mml:mi></mml:mrow><mml:mrow><mml:mo>−</mml:mo></mml:mrow></mml:msubsup><mml:mo>=</mml:mo><mml:mo stretchy="false">(</mml:mo><mml:mn>1</mml:mn><mml:mo>−</mml:mo><mml:mi>σ</mml:mi><mml:mo stretchy="false">)</mml:mo><mml:msub><mml:mi>p</mml:mi><mml:mrow><mml:mi>h</mml:mi></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:mi>σ</mml:mi><mml:mfrac><mml:msubsup><mml:mi>N</mml:mi><mml:mrow><mml:mi>S</mml:mi></mml:mrow><mml:mrow><mml:mi>θ</mml:mi></mml:mrow></mml:msubsup><mml:mrow><mml:msubsup><mml:mi>N</mml:mi><mml:mrow><mml:mi>R</mml:mi></mml:mrow><mml:mrow><mml:mi>θ</mml:mi></mml:mrow></mml:msubsup><mml:mo>+</mml:mo><mml:msubsup><mml:mi>N</mml:mi><mml:mrow><mml:mi>S</mml:mi></mml:mrow><mml:mrow><mml:mi>θ</mml:mi></mml:mrow></mml:msubsup></mml:mrow></mml:mfrac><mml:mo>,</mml:mo></mml:mstyle></mml:mtd><mml:mtd/></mml:mtr><mml:mtr><mml:mtd><mml:mstyle 
displaystyle="true" scriptlevel="0"><mml:msubsup><mml:mi>P</mml:mi><mml:mrow><mml:mi>R</mml:mi></mml:mrow><mml:mrow><mml:mo>−</mml:mo></mml:mrow></mml:msubsup><mml:mo>=</mml:mo><mml:mo stretchy="false">(</mml:mo><mml:mn>1</mml:mn><mml:mo>−</mml:mo><mml:mi>σ</mml:mi><mml:mo stretchy="false">)</mml:mo><mml:msub><mml:mi>p</mml:mi><mml:mrow><mml:mi>l</mml:mi></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:mi>σ</mml:mi><mml:mfrac><mml:msubsup><mml:mi>N</mml:mi><mml:mrow><mml:mi>R</mml:mi></mml:mrow><mml:mrow><mml:mi>θ</mml:mi></mml:mrow></mml:msubsup><mml:mrow><mml:msubsup><mml:mi>N</mml:mi><mml:mrow><mml:mi>R</mml:mi></mml:mrow><mml:mrow><mml:mi>θ</mml:mi></mml:mrow></mml:msubsup><mml:mo>+</mml:mo><mml:msubsup><mml:mi>N</mml:mi><mml:mrow><mml:mi>S</mml:mi></mml:mrow><mml:mrow><mml:mi>θ</mml:mi></mml:mrow></mml:msubsup></mml:mrow></mml:mfrac><mml:mo>,</mml:mo></mml:mstyle></mml:mtd><mml:mtd/></mml:mtr><mml:mtr><mml:mtd><mml:mstyle displaystyle="true" scriptlevel="0"><mml:msubsup><mml:mi>P</mml:mi><mml:mrow><mml:mi>S</mml:mi></mml:mrow><mml:mrow><mml:mo>+</mml:mo></mml:mrow></mml:msubsup><mml:mo>=</mml:mo><mml:mo stretchy="false">(</mml:mo><mml:mn>1</mml:mn><mml:mo>−</mml:mo><mml:mi>σ</mml:mi><mml:mo stretchy="false">)</mml:mo><mml:msub><mml:mi>p</mml:mi><mml:mrow><mml:mi>l</mml:mi></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:mi>σ</mml:mi><mml:mfrac><mml:msubsup><mml:mi>N</mml:mi><mml:mrow><mml:mi>S</mml:mi></mml:mrow><mml:mrow><mml:mi>θ</mml:mi></mml:mrow></mml:msubsup><mml:mrow><mml:msubsup><mml:mi>N</mml:mi><mml:mrow><mml:mi>R</mml:mi></mml:mrow><mml:mrow><mml:mi>θ</mml:mi></mml:mrow></mml:msubsup><mml:mo>+</mml:mo><mml:msubsup><mml:mi>N</mml:mi><mml:mrow><mml:mi>S</mml:mi></mml:mrow><mml:mrow><mml:mi>θ</mml:mi></mml:mrow></mml:msubsup></mml:mrow></mml:mfrac><mml:mo>,</mml:mo></mml:mstyle></mml:mtd><mml:mtd/></mml:mtr><mml:mtr><mml:mtd><mml:mstyle displaystyle="true" 
scriptlevel="0"><mml:msubsup><mml:mi>P</mml:mi><mml:mrow><mml:mi>R</mml:mi></mml:mrow><mml:mrow><mml:mo>+</mml:mo></mml:mrow></mml:msubsup><mml:mo>=</mml:mo><mml:mo stretchy="false">(</mml:mo><mml:mn>1</mml:mn><mml:mo>−</mml:mo><mml:mi>σ</mml:mi><mml:mo stretchy="false">)</mml:mo><mml:msub><mml:mi>p</mml:mi><mml:mrow><mml:mi>h</mml:mi></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:mi>σ</mml:mi><mml:mfrac><mml:msubsup><mml:mi>N</mml:mi><mml:mrow><mml:mi>R</mml:mi></mml:mrow><mml:mrow><mml:mi>θ</mml:mi></mml:mrow></mml:msubsup><mml:mrow><mml:msubsup><mml:mi>N</mml:mi><mml:mrow><mml:mi>R</mml:mi></mml:mrow><mml:mrow><mml:mi>θ</mml:mi></mml:mrow></mml:msubsup><mml:mo>+</mml:mo><mml:msubsup><mml:mi>N</mml:mi><mml:mrow><mml:mi>S</mml:mi></mml:mrow><mml:mrow><mml:mi>θ</mml:mi></mml:mrow></mml:msubsup></mml:mrow></mml:mfrac><mml:mo>,</mml:mo></mml:mstyle></mml:mtd><mml:mtd/></mml:mtr></mml:mtable><mml:mo fence="true" stretchy="true" symmetric="true"/></mml:mrow></mml:mstyle></mml:mrow></mml:math></disp-formula></p><p>where <inline-formula><mml:math id="inf460"><mml:mi>σ</mml:mi></mml:math></inline-formula> is the weight of social influence and <inline-formula><mml:math id="inf461"><mml:mi>θ</mml:mi></mml:math></inline-formula> is the strength of the conformist bias, corresponding to the agent-based learning model (<xref ref-type="table" rid="table1">Table 1</xref>). Other assumptions were the same as in the baseline dynamics model. The baseline dynamics model was a special case of this social influence model with <inline-formula><mml:math id="inf462"><mml:mrow><mml:mi>σ</mml:mi><mml:mo>=</mml:mo><mml:mn>0</mml:mn></mml:mrow></mml:math></inline-formula>. 
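</p><p>The following Python sketch illustrates one way in which such a system could be integrated numerically. It is not the authors’ code: the function names <monospace>transition_rates</monospace> and <monospace>simulate</monospace>, the forward-Euler scheme, and the step size are all assumptions made for illustration. The sketch also assumes that the social-influence dynamics are obtained by substituting the four transition rates above for the corresponding constant rates in the baseline differential equations, and that both behavioural states always contain a positive number of individuals:</p><code language="python">
```python
def transition_rates(n_s, n_r, p_l, p_h, sigma, theta):
    """Return (P_S^-, P_R^-, P_S^+, P_R^+) for the current population.

    Assumes n_s and n_r are both positive.
    """
    s_pow = n_s ** theta
    r_pow = n_r ** theta
    share_s = s_pow / (s_pow + r_pow)
    share_r = 1.0 - share_s
    return ((1 - sigma) * p_h + sigma * share_s,
            (1 - sigma) * p_l + sigma * share_r,
            (1 - sigma) * p_l + sigma * share_s,
            (1 - sigma) * p_h + sigma * share_r)


def simulate(n_r, n_s_neg, n_s_pos, e, p_l, p_h, sigma, theta,
             dt=0.01, steps=50_000):
    """Forward-Euler integration; returns (N_R, N_S^-, N_S^+) at the end."""
    for _ in range(steps):
        ps_neg, pr_neg, ps_pos, pr_pos = transition_rates(
            n_s_neg + n_s_pos, n_r, p_l, p_h, sigma, theta)
        # Flows between the behavioural states during one time step.
        to_s_neg = ps_neg * (1 - e) * n_r  # R -> S under belief -
        to_s_pos = ps_pos * e * n_r        # R -> S under belief +
        from_s_neg = pr_neg * n_s_neg      # S -> R under belief -
        from_s_pos = pr_pos * n_s_pos      # S -> R under belief +
        n_r += dt * (from_s_neg + from_s_pos - to_s_neg - to_s_pos)
        n_s_neg += dt * (to_s_neg - from_s_neg)
        n_s_pos += dt * (to_s_pos - from_s_pos)
    return n_r, n_s_neg, n_s_pos
```
</code><p>With σ = 0 the sketch reduces to the baseline equations, for which the equilibrium can be checked by hand: for e = 0.7, p_l = 0.2 and p_h = 0.8, setting the derivatives to zero gives an equilibrium N_R equal to N/2.375, so the safe alternative remains the majority choice even though e exceeds 0.5.</p><p>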
Because the system was not analytically tractable, we obtained numerical solutions across different initial distributions of <inline-formula><mml:math id="inf463"><mml:msub><mml:mi>N</mml:mi><mml:mrow><mml:mrow><mml:mi>S</mml:mi><mml:mo>,</mml:mo><mml:mi>t</mml:mi></mml:mrow><mml:mo>=</mml:mo><mml:mn>0</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula> and <inline-formula><mml:math id="inf464"><mml:msub><mml:mi>N</mml:mi><mml:mrow><mml:mrow><mml:mi>R</mml:mi><mml:mo>,</mml:mo><mml:mi>t</mml:mi></mml:mrow><mml:mo>=</mml:mo><mml:mn>0</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula> for various combinations of the parameters.</p></sec><sec id="s4-5"><title>The online experiments</title><p>The experimental procedure was approved by the Ethics Committee at the University of Konstanz (‘Collective learning and decision-making study’). Six hundred and nineteen English-speaking subjects [294 self-identified as women, 277 as men, 1 as other, and the remaining 47 unspecified; mean (minimum, maximum) age = 35.2 (18, 74) years] participated in the task through the online recruiting platform <ext-link ext-link-type="uri" xlink:href="https://www.prolific.co">Prolific Academic</ext-link>. We excluded from our computational model-fitting analysis those subjects who disconnected from the online task before completing at least the first 35 rounds, resulting in 585 subjects (the detailed distribution of subjects for each condition is shown in <xref ref-type="table" rid="table3">Table 3</xref>). A parameter recovery test had suggested that the sample size was sufficient to reliably estimate individual parameters using a hierarchical Bayesian fitting method (see below; <xref ref-type="fig" rid="fig6s3">Figure 6—figure supplement 3</xref>).</p><sec id="s4-5-1"><title>Design of the experimental manipulations</title><p>The group size was manipulated by randomly assigning different capacities of a ‘waiting lobby’ where subjects had to wait until other subjects arrived. 
When the lobby capacity was 1, which happened with probability 0.1, the individual condition started upon the first subject’s arrival. Otherwise, the group condition started when there were more than three people 3 min after the lobby opened (see Appendix 1 Supplementary Methods). If there were only two or fewer people in the lobby at this stage, each subject was assigned to the individual condition. Note that some groups in the group condition ended up with only two individuals because one individual dropped out during the task.</p><p>We used three different tasks: a 1-risky-1-safe task, a 1-risky-3-safe task, and a 2-risky-2-safe task, where one risky option was expected to give a higher payoff than the other options on average (that is, tasks with a positive risk premium [positive RP]). To confirm our prediction that a risky shift would not emerge strongly when the risk premium was negative (i.e. risk seeking was suboptimal), we also conducted another 1-risky-1-safe task with a negative risk premium (the negative RP task). Participants’ goal was to gather as much individual payoff as possible, because monetary incentives were based on individual performance. In the negative RP task, risk aversion was favourable instead. All tasks had 70 decision-making trials. The task proceeded on a trial basis; that is, trials of all individuals in a group were synchronised. Subjects in the group condition could see social frequency information, namely, how many people chose each alternative in the preceding trial. No social information was available in the first trial. 
These tasks were assigned randomly as a between-subject condition, and subjects were allowed to participate in one session only.</p><p>We employed a skewed payoff probability distribution rather than a normal distribution for the risky alternative, and we conducted not only a two-armed task but also four-armed bandit tasks, because our pilot study had suggested that subjects tended to have a small susceptibility to the hot stove effect (<inline-formula><mml:math id="inf465"><mml:mrow><mml:mrow><mml:msub><mml:mi>α</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mi>β</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>+</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mo>≪</mml:mo><mml:mn>2</mml:mn></mml:mrow></mml:math></inline-formula>), and hence we needed more difficult settings than the conventional Gaussian noise binary-choice task to elicit risk aversion from individual decision makers. Running agent-based simulations, we confirmed that the task setups used in the experiment could elicit the collective rescue effect (<xref ref-type="fig" rid="fig1s5">Figure 1—figure supplement 5</xref> and <xref ref-type="fig" rid="fig1s6">Figure 1—figure supplement 6</xref>).</p><p>The details of the task setups are as follows:</p><sec id="s4-5-1-1"><title>The 1-risky-1-safe task (positive RP)</title><p>The optimal risky option produced either 50 or 550 points with probabilities 0.7 and 0.3, respectively (the expected payoff was 200). The safe option produced 150 points (with a small amount of Gaussian noise with s.d. = 5).</p></sec><sec id="s4-5-1-2"><title>The 1-risky-3-safe task (positive RP)</title><p>The optimal risky option produced either 50 or 425 points with probabilities 0.6 and 0.4, respectively (the expected payoff was 200). The three safe options produced 150, 125, and 100 points, respectively, with a small amount of Gaussian noise with s.d. 
= 5.</p></sec><sec id="s4-5-1-3"><title>The 2-risky-2-safe task (positive RP)</title><p>The optimal risky option produced either 50 or 425 points with probabilities 0.6 and 0.4, respectively (the expected payoff was 200). The two safe options produced 150 and 125 points, respectively, with a small amount of Gaussian noise with s.d. = 5. The suboptimal risky option, whose expected value was approximately 125, produced either 50 or 238 points with probabilities 0.6 and 0.4, respectively.</p></sec><sec id="s4-5-1-4"><title>The 1-risky-1-safe task (negative RP)</title><p>The setting was the same as in the 1-risky-1-safe positive RP task, except that the expected payoff from the risky option was smaller than that of the safe option, producing either 50 or 220 points with probabilities 0.7 and 0.3, respectively (the expected payoff was 101).</p><p>We have confirmed through agent-based model simulations that the collective behavioural rescue could emerge in tasks equipped with the experimental settings (<xref ref-type="fig" rid="fig1s5">Figure 1—figure supplement 5</xref>). We have also confirmed that risk seeking does not always increase when the risk premium is negative (<xref ref-type="fig" rid="fig1s6">Figure 1—figure supplement 6</xref>). With the four-armed tasks we aimed to demonstrate that the rescue effect is not limited to binary-choice situations. Other procedures of the collective learning task were the same as those used in our agent-based simulation shown in the main text. 
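</p><p>The expected payoffs quoted for the risky options in the four tasks can be re-derived with a few lines of arithmetic. The Python sketch below is purely illustrative: the names <monospace>RISKY_OPTIONS</monospace>, <monospace>expected_payoff</monospace> and <monospace>risk_premium</monospace> are hypothetical, and treating the risk premium as the difference between the risky option’s expected payoff and the mean of the best safe option (150 points) is a simplifying assumption:</p><code language="python">
```python
# (low payoff, high payoff, probability of the high payoff) per risky option
RISKY_OPTIONS = {
    "1-risky-1-safe (positive RP)": (50, 550, 0.3),
    "1-risky-3-safe (positive RP)": (50, 425, 0.4),
    "2-risky-2-safe, optimal (positive RP)": (50, 425, 0.4),
    "2-risky-2-safe, suboptimal (positive RP)": (50, 238, 0.4),
    "1-risky-1-safe (negative RP)": (50, 220, 0.3),
}

# Mean payoff of the best safe option, which is 150 points in every task.
BEST_SAFE_MEAN = 150


def expected_payoff(low, high, p_high):
    """Expected value of an option paying `high` with probability p_high."""
    return (1 - p_high) * low + p_high * high


def risk_premium(low, high, p_high, safe_mean=BEST_SAFE_MEAN):
    """Expected payoff of a risky option minus that of the best safe option."""
    return expected_payoff(low, high, p_high) - safe_mean


for name, option in RISKY_OPTIONS.items():
    print(f"{name}: expected payoff = {expected_payoff(*option):.1f}, "
          f"premium = {risk_premium(*option):+.1f}")
```
</code><p>Under this definition the premium is positive only for the optimal risky options of the three positive RP tasks; the suboptimal risky option comes out at 125.2 points, in line with the value of about 125 stated above.</p><p>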
The experimental materials, including illustrated instructions, can be found in <xref ref-type="video" rid="video1">Video 1</xref> (individual condition) and <xref ref-type="video" rid="video2">Video 2</xref> (group condition).</p><media id="video1" mime-subtype="mp4" mimetype="video" xlink:href="elife-75308-video1.mp4"><label>Video 1.</label><caption><title>A sample screen recording of the online experimental task (individual condition).</title><p>This video was recorded for demonstration purposes only and hence is not associated with any actual participant’s behaviour.</p></caption></media><media id="video2" mime-subtype="mp4" mimetype="video" xlink:href="elife-75308-video2.mp4"><label>Video 2.</label><caption><title>A sample screen recording of the online experimental task with N = 3 (group condition).</title><p>This video was recorded for demonstration purposes only and hence is not associated with any actual participant’s behaviour. Note also that each actual participant could see only one browser window in the experimental sessions.</p></caption></media></sec></sec></sec><sec id="s4-6"><title>The hierarchical Bayesian model fitting</title><p>To fit the mixed logit model (GLMM) as well as the learning model, we used a hierarchical Bayesian method. 
For the learning model, we estimated the global means (<inline-formula><mml:math id="inf466"><mml:msub><mml:mi>μ</mml:mi><mml:mi>α</mml:mi></mml:msub></mml:math></inline-formula>, <inline-formula><mml:math id="inf467"><mml:msub><mml:mi>μ</mml:mi><mml:mi>β</mml:mi></mml:msub></mml:math></inline-formula>, <inline-formula><mml:math id="inf468"><mml:msub><mml:mi>μ</mml:mi><mml:mi>σ</mml:mi></mml:msub></mml:math></inline-formula>, and <inline-formula><mml:math id="inf469"><mml:msub><mml:mi>μ</mml:mi><mml:mi>θ</mml:mi></mml:msub></mml:math></inline-formula>) and global variances (<inline-formula><mml:math id="inf470"><mml:msub><mml:mi>v</mml:mi><mml:mi>α</mml:mi></mml:msub></mml:math></inline-formula>, <inline-formula><mml:math id="inf471"><mml:msub><mml:mi>v</mml:mi><mml:mi>β</mml:mi></mml:msub></mml:math></inline-formula>, <inline-formula><mml:math id="inf472"><mml:msub><mml:mi>v</mml:mi><mml:mi>σ</mml:mi></mml:msub></mml:math></inline-formula>, and <inline-formula><mml:math id="inf473"><mml:msub><mml:mi>v</mml:mi><mml:mi>θ</mml:mi></mml:msub></mml:math></inline-formula>) for each of the four experimental conditions and for the individual and group conditions separately. For the individual condition, we assumed <inline-formula><mml:math id="inf474"><mml:mrow><mml:mi>σ</mml:mi><mml:mo>=</mml:mo><mml:mn>0</mml:mn></mml:mrow></mml:math></inline-formula> for all subjects and hence no social learning parameters were estimated. Full details of the model-fitting procedure and prior assumptions are shown in the Supplementary Methods. 
The R and Stan code used in the model fitting is available from <ext-link ext-link-type="uri" xlink:href="https://github.com/WataruToyokawa/ToyokawaGaissmaier2021">an online repository</ext-link>.</p><sec id="s4-6-1"><title>The GLMM</title><p>We conducted a mixed logit model analysis to investigate the relationship between the proportion of choosing the risky option in the second half of the trials (<inline-formula><mml:math id="inf475"><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:msub><mml:mi>P</mml:mi><mml:mrow><mml:mi>r</mml:mi><mml:mo>,</mml:mo><mml:mi>t</mml:mi><mml:mo>></mml:mo><mml:mn>35</mml:mn></mml:mrow></mml:msub></mml:mrow></mml:mstyle></mml:math></inline-formula>) and the fit learning parameters (<inline-formula><mml:math id="inf476"><mml:mrow><mml:msub><mml:mi>α</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mi>β</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>+</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:math></inline-formula>, <inline-formula><mml:math id="inf477"><mml:msub><mml:mi>σ</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:math></inline-formula>, and <inline-formula><mml:math id="inf478"><mml:msub><mml:mi>θ</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:math></inline-formula>). Since no social learning parameters exist in the individual condition, a dummy variable for the group condition was included (<inline-formula><mml:math id="inf479"><mml:mrow><mml:msub><mml:mi>G</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:math></inline-formula> if individual <inline-formula><mml:math id="inf480"><mml:mi>i</mml:mi></mml:math></inline-formula> was in the group condition or 0 otherwise). 
The formula used is <inline-formula><mml:math id="inf481"><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:mi>l</mml:mi><mml:mi>o</mml:mi><mml:mi>g</mml:mi><mml:mi>i</mml:mi><mml:mi>t</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:msub><mml:mi>P</mml:mi><mml:mrow><mml:mi>r</mml:mi><mml:mo>,</mml:mo><mml:mi>t</mml:mi><mml:mo>></mml:mo><mml:mn>35</mml:mn></mml:mrow></mml:msub><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mstyle></mml:math></inline-formula> = <inline-formula><mml:math id="inf482"><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:msub><mml:mi>γ</mml:mi><mml:mrow><mml:mn>0</mml:mn></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:msub><mml:mi>γ</mml:mi><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:msub><mml:mi>α</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo stretchy="false">(</mml:mo><mml:msub><mml:mi>β</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:mn>1</mml:mn><mml:mo stretchy="false">)</mml:mo><mml:mo>+</mml:mo><mml:msub><mml:mi>γ</mml:mi><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:msub><mml:mi>G</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:msub><mml:mi>γ</mml:mi><mml:mrow><mml:mn>3</mml:mn></mml:mrow></mml:msub><mml:msub><mml:mi>G</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:msub><mml:mi>α</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo stretchy="false">(</mml:mo><mml:msub><mml:mi>β</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:mn>1</mml:mn><mml:mo 
stretchy="false">)</mml:mo><mml:mo>+</mml:mo><mml:msub><mml:mi>γ</mml:mi><mml:mrow><mml:mn>4</mml:mn></mml:mrow></mml:msub><mml:msub><mml:mi>G</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:msub><mml:mi>σ</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:msub><mml:mi>γ</mml:mi><mml:mrow><mml:mn>5</mml:mn></mml:mrow></mml:msub><mml:msub><mml:mi>G</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:msub><mml:mi>θ</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:msub><mml:mi>ϵ</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:msub><mml:mi>ϵ</mml:mi><mml:mrow><mml:mi>g</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:mstyle></mml:math></inline-formula>, where <inline-formula><mml:math id="inf483"><mml:msub><mml:mi>ϵ</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:math></inline-formula> and <inline-formula><mml:math id="inf484"><mml:msub><mml:mi>ϵ</mml:mi><mml:mi>g</mml:mi></mml:msub></mml:math></inline-formula> were the random effects of individual and group, respectively. The Markov chain Monte Carlo (MCMC) fitting procedure was the same as that used for the computational model fitting, and the code is available from the repository shown above.</p></sec><sec id="s4-6-2"><title>Model and parameter recovery, and post hoc simulation</title><p>To assess the adequacy of the hierarchical Bayesian model-fitting method, we tested how well the hierarchical Bayesian method (HBM) could recover ‘true’ parameter values that were used to simulate synthetic data. We simulated artificial agents’ behaviour assuming that they behave according to the social learning model with each parameter setting. We generated ‘true’ parameter values for each simulated agent based on the experimentally fit global parameters (<xref ref-type="table" rid="table1">Table 1</xref>; parameter recovery test 1). 
In addition, we ran another recovery test using arbitrary global parameters that deviated from the experimentally fit values (parameter recovery test 2), to confirm that our fitting procedure was not just ‘attracted’ to the fit values. We then simulated synthetic behavioural data and recovered their parameter values using the HBM described above. Both parameter recovery tests showed that all the recovered individual parameters were positively correlated with the true values, with correlation coefficients all larger than 0.5. We also confirmed that 30 of the 32 global parameters in total were recovered within the 95% Bayesian credible intervals, and that even the two non-recovered parameters (<inline-formula><mml:math id="inf485"><mml:msub><mml:mi>μ</mml:mi><mml:mi>β</mml:mi></mml:msub></mml:math></inline-formula> for the 2-risky-2-safe task in parameter recovery test 1 and <inline-formula><mml:math id="inf486"><mml:msub><mml:mi>μ</mml:mi><mml:mi>α</mml:mi></mml:msub></mml:math></inline-formula> for the 1-risky-3-safe task in parameter recovery test 2) did not deviate substantially from their true values (<xref ref-type="fig" rid="fig6s3">Figure 6—figure supplement 3</xref>).</p><p>We compared the baseline reinforcement learning model, the decision-biasing model, and the value-shaping model (see Supplementary Methods) using Bayesian model selection (<xref ref-type="bibr" rid="bib65">Stephan et al., 2009</xref>). The model frequency and exceedance probability were calculated based on the Widely Applicable Information Criterion (WAIC) values for each subject (<xref ref-type="bibr" rid="bib75">Watanabe and Opper, 2010</xref>). We confirmed accurate model recovery by simulations using our task setting (<xref ref-type="fig" rid="fig6s2">Figure 6—figure supplement 2</xref>).</p><p>We also ran a series of individual-based model simulations using the calibrated global parameter values for each condition. 
First, we randomly sampled a set of agents whose individual parameter values were drawn from the fit global parameters. Second, we let this synthetic group of agents perform the task for 70 rounds. We repeated these steps 100,000 times for each task setting and for each individual and group condition.</p></sec></sec></sec></body><back><sec id="s5" sec-type="additional-information"><title>Additional information</title><fn-group content-type="competing-interest"><title>Competing interests</title><fn fn-type="COI-statement" id="conf1"><p>No competing interests declared</p></fn></fn-group><fn-group content-type="author-contribution"><title>Author contributions</title><fn fn-type="con" id="con1"><p>Conceptualization, Data curation, Formal analysis, Funding acquisition, Investigation, Methodology, Project administration, Resources, Software, Validation, Visualization, Writing - original draft, Writing - review and editing</p></fn><fn fn-type="con" id="con2"><p>Conceptualization, Funding acquisition, Project administration, Visualization, Writing - original draft, Writing - review and editing</p></fn></fn-group><fn-group content-type="ethics-information"><title>Ethics</title><fn fn-type="other"><p>Human subjects: The experimental procedure was approved by the Ethics Committee at the University of Konstanz ('Collective learning and decision-making study'). All subjects consented to participation through an online consent form at the beginning of the task.</p></fn></fn-group></sec><sec id="s6" sec-type="supplementary-material"><title>Additional files</title><supplementary-material id="transrepform"><label>Transparent reporting form</label><media mime-subtype="docx" mimetype="application" xlink:href="elife-75308-transrepform1-v1.docx"/></supplementary-material></sec><sec id="s7" sec-type="data-availability"><title>Data availability</title><p>Code for the agent-based simulation as well as for the experimental data analyses can be found in the main author's GitHub repository 
<ext-link ext-link-type="uri" xlink:href="https://github.com/WataruToyokawa/ToyokawaGaissmaier2021">https://github.com/WataruToyokawa/ToyokawaGaissmaier2021</ext-link> (copy archived at <ext-link ext-link-type="uri" xlink:href="https://archive.softwareheritage.org/swh:1:dir:b7a125996741a1edef36a6ff5d6dfd71607ceccb;origin=https://github.com/WataruToyokawa/ToyokawaGaissmaier2021;visit=swh:1:snp:f5e052d6d238d3e6a6a60e7e9a2122c33a3b4ae7;anchor=swh:1:rev:6fca0b26c33af3a5b3c415719fa3df0dced15149">swh:1:rev:6fca0b26c33af3a5b3c415719fa3df0dced15149</ext-link>).</p></sec><ack id="ack"><title>Acknowledgements</title><p>This work was funded by a Small Project Grant from the Centre for the Advanced Study of Collective Behaviour, the University of Konstanz (S20-06), by the University of Konstanz Committee on Research (FP031/19), and by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Germany’s Excellence Strategy – EXC 2117–422037984. We thank Iain Couzin, Lucy Aplin, Brendan Barrett, Ralf Kurvers, Charley Wu, Gota Morishita, and Anita Todd for many helpful comments on earlier versions of this paper.</p></ack><ref-list><title>References</title><ref id="bib1"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Alem</surname><given-names>S</given-names></name><name><surname>Perry</surname><given-names>CJ</given-names></name><name><surname>Zhu</surname><given-names>X</given-names></name><name><surname>Loukola</surname><given-names>OJ</given-names></name><name><surname>Ingraham</surname><given-names>T</given-names></name><name><surname>Søvik</surname><given-names>E</given-names></name><name><surname>Chittka</surname><given-names>L</given-names></name></person-group><year iso-8601-date="2016">2016</year><article-title>Associative Mechanisms Allow for Social Learning and Cultural Transmission of String Pulling in an Insect</article-title><source>PLOS 
Biology</source><volume>14</volume><elocation-id>e1002564</elocation-id><pub-id pub-id-type="doi">10.1371/journal.pbio.1002564</pub-id><pub-id pub-id-type="pmid">27701411</pub-id></element-citation></ref><ref id="bib2"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Aoki</surname><given-names>K</given-names></name><name><surname>Feldman</surname><given-names>MW</given-names></name></person-group><year iso-8601-date="2014">2014</year><article-title>Evolution of learning strategies in temporally and spatially variable environments: a review of theory</article-title><source>Theoretical Population Biology</source><volume>91</volume><fpage>3</fpage><lpage>19</lpage><pub-id pub-id-type="doi">10.1016/j.tpb.2013.10.004</pub-id><pub-id pub-id-type="pmid">24211681</pub-id></element-citation></ref><ref id="bib3"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Aplin</surname><given-names>LM</given-names></name><name><surname>Sheldon</surname><given-names>BC</given-names></name><name><surname>McElreath</surname><given-names>R</given-names></name></person-group><year iso-8601-date="2017">2017</year><article-title>Conformity does not perpetuate suboptimal traditions in a wild population of songbirds</article-title><source>PNAS</source><volume>114</volume><fpage>7830</fpage><lpage>7837</lpage><pub-id pub-id-type="doi">10.1073/pnas.1621067114</pub-id><pub-id pub-id-type="pmid">28739943</pub-id></element-citation></ref><ref id="bib4"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Arbilly</surname><given-names>M</given-names></name><name><surname>Motro</surname><given-names>U</given-names></name><name><surname>Feldman</surname><given-names>MW</given-names></name><name><surname>Lotem</surname><given-names>A</given-names></name></person-group><year iso-8601-date="2011">2011</year><article-title>Evolution of social learning when 
high expected payoffs are associated with high risk of failure</article-title><source>Journal of the Royal Society, Interface</source><volume>8</volume><fpage>1604</fpage><lpage>1615</lpage><pub-id pub-id-type="doi">10.1098/rsif.2011.0138</pub-id><pub-id pub-id-type="pmid">21508013</pub-id></element-citation></ref><ref id="bib5"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Baldini</surname><given-names>R</given-names></name></person-group><year iso-8601-date="2012">2012</year><article-title>Success-biased social learning: cultural and evolutionary dynamics</article-title><source>Theoretical Population Biology</source><volume>82</volume><fpage>222</fpage><lpage>228</lpage><pub-id pub-id-type="doi">10.1016/j.tpb.2012.06.005</pub-id><pub-id pub-id-type="pmid">22743216</pub-id></element-citation></ref><ref id="bib6"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Baldini</surname><given-names>R</given-names></name></person-group><year iso-8601-date="2013">2013</year><article-title>Two success-biased social learning strategies</article-title><source>Theoretical Population Biology</source><volume>86</volume><fpage>43</fpage><lpage>49</lpage><pub-id pub-id-type="doi">10.1016/j.tpb.2013.03.005</pub-id><pub-id pub-id-type="pmid">23587700</pub-id></element-citation></ref><ref id="bib7"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Barrett</surname><given-names>BJ</given-names></name><name><surname>McElreath</surname><given-names>RL</given-names></name><name><surname>Perry</surname><given-names>SE</given-names></name></person-group><year iso-8601-date="2017">2017</year><article-title>Pay-off-biased social learning underlies the diffusion of novel extractive foraging traditions in a wild primate</article-title><source>Proceedings of the Royal Society 
B</source><volume>284</volume><elocation-id>20170358</elocation-id><pub-id pub-id-type="doi">10.1098/rspb.2017.0358</pub-id><pub-id pub-id-type="pmid">28344797</pub-id></element-citation></ref><ref id="bib8"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Bault</surname><given-names>N</given-names></name><name><surname>Joffily</surname><given-names>M</given-names></name><name><surname>Rustichini</surname><given-names>A</given-names></name><name><surname>Coricelli</surname><given-names>G</given-names></name></person-group><year iso-8601-date="2011">2011</year><article-title>Medial prefrontal cortex and striatum mediate the influence of social comparison on the decision process</article-title><source>PNAS</source><volume>108</volume><fpage>16044</fpage><lpage>16049</lpage><pub-id pub-id-type="doi">10.1073/pnas.1100892108</pub-id><pub-id pub-id-type="pmid">21896760</pub-id></element-citation></ref><ref id="bib9"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Beckers</surname><given-names>R</given-names></name><name><surname>Deneubourg</surname><given-names>JLD</given-names></name><name><surname>Goss</surname><given-names>S</given-names></name><name><surname>Pasteels</surname><given-names>JM</given-names></name></person-group><year iso-8601-date="1990">1990</year><article-title>Collective decision making through food recruitment</article-title><source>Insectes Sociaux</source><volume>37</volume><fpage>258</fpage><lpage>267</lpage><pub-id pub-id-type="doi">10.1007/BF02224053</pub-id></element-citation></ref><ref id="bib10"><element-citation publication-type="journal"><person-group 
person-group-type="author"><name><surname>Biele</surname><given-names>G</given-names></name><name><surname>Rieskamp</surname><given-names>J</given-names></name><name><surname>Krugel</surname><given-names>LK</given-names></name><name><surname>Heekeren</surname><given-names>HR</given-names></name><name><surname>Behrens</surname><given-names>T</given-names></name></person-group><year iso-8601-date="2011">2011</year><article-title>The Neural Basis of Following Advice</article-title><source>PLOS Biology</source><volume>9</volume><elocation-id>e1001089</elocation-id><pub-id pub-id-type="doi">10.1371/journal.pbio.1001089</pub-id><pub-id pub-id-type="pmid">21713027</pub-id></element-citation></ref><ref id="bib11"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Bikhchandani</surname><given-names>S</given-names></name><name><surname>Hirshleifer</surname><given-names>D</given-names></name><name><surname>Welch</surname><given-names>I</given-names></name></person-group><year iso-8601-date="1992">1992</year><article-title>A Theory of Fads, Fashion, Custom, and Cultural Change as Informational Cascades</article-title><source>Journal of Political Economy</source><volume>100</volume><fpage>992</fpage><lpage>1026</lpage><pub-id pub-id-type="doi">10.1086/261849</pub-id></element-citation></ref><ref id="bib12"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Biro</surname><given-names>D</given-names></name><name><surname>Sasaki</surname><given-names>T</given-names></name><name><surname>Portugal</surname><given-names>SJ</given-names></name></person-group><year iso-8601-date="2016">2016</year><article-title>Bringing a Time-Depth Perspective to Collective Animal Behaviour</article-title><source>Trends in Ecology & Evolution</source><volume>31</volume><fpage>550</fpage><lpage>562</lpage><pub-id pub-id-type="doi">10.1016/j.tree.2016.03.018</pub-id><pub-id 
pub-id-type="pmid">27105543</pub-id></element-citation></ref><ref id="bib13"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Bolton</surname><given-names>P</given-names></name><name><surname>Harris</surname><given-names>C</given-names></name></person-group><year iso-8601-date="1999">1999</year><article-title>Strategic Experimentation</article-title><source>Econometrica</source><volume>67</volume><fpage>349</fpage><lpage>374</lpage><pub-id pub-id-type="doi">10.1111/1468-0262.00022</pub-id></element-citation></ref><ref id="bib14"><element-citation publication-type="book"><person-group person-group-type="author"><name><surname>Boyd</surname><given-names>R</given-names></name><name><surname>Richerson</surname><given-names>PJ</given-names></name></person-group><year iso-8601-date="1985">1985</year><source>Culture and the Evolutionary Process</source><publisher-loc>Chicago, IL</publisher-loc><publisher-name>University of Chicago Press</publisher-name></element-citation></ref><ref id="bib15"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Chung</surname><given-names>D</given-names></name><name><surname>Christopoulos</surname><given-names>GI</given-names></name><name><surname>King-Casas</surname><given-names>B</given-names></name><name><surname>Ball</surname><given-names>SB</given-names></name><name><surname>Chiu</surname><given-names>PH</given-names></name></person-group><year iso-8601-date="2015">2015</year><article-title>Social signals of safety and risk confer utility and have asymmetric effects on observers’ choices</article-title><source>Nature Neuroscience</source><volume>18</volume><fpage>912</fpage><lpage>916</lpage><pub-id pub-id-type="doi">10.1038/nn.4022</pub-id><pub-id pub-id-type="pmid">25984890</pub-id></element-citation></ref><ref id="bib16"><element-citation publication-type="journal"><person-group 
person-group-type="author"><name><surname>Cialdini</surname><given-names>RB</given-names></name><name><surname>Goldstein</surname><given-names>NJ</given-names></name></person-group><year iso-8601-date="2004">2004</year><article-title>Social influence: compliance and conformity</article-title><source>Annual Review of Psychology</source><volume>55</volume><fpage>591</fpage><lpage>621</lpage><pub-id pub-id-type="doi">10.1146/annurev.psych.55.090902.142015</pub-id><pub-id pub-id-type="pmid">14744228</pub-id></element-citation></ref><ref id="bib17"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Conradt</surname><given-names>L</given-names></name><name><surname>List</surname><given-names>C</given-names></name><name><surname>Roper</surname><given-names>TJ</given-names></name></person-group><year iso-8601-date="2013">2013</year><article-title>Swarm intelligence: when uncertainty meets conflict</article-title><source>The American Naturalist</source><volume>182</volume><fpage>592</fpage><lpage>610</lpage><pub-id pub-id-type="doi">10.1086/673253</pub-id><pub-id pub-id-type="pmid">24107367</pub-id></element-citation></ref><ref id="bib18"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Csibra</surname><given-names>G</given-names></name><name><surname>Gergely</surname><given-names>G</given-names></name></person-group><year iso-8601-date="2011">2011</year><article-title>Natural pedagogy as evolutionary adaptation</article-title><source>Philosophical Transactions of the Royal Society of London. 
Series B, Biological Sciences</source><volume>366</volume><fpage>1149</fpage><lpage>1157</lpage><pub-id pub-id-type="doi">10.1098/rstb.2010.0319</pub-id><pub-id pub-id-type="pmid">21357237</pub-id></element-citation></ref><ref id="bib19"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Deffner</surname><given-names>D</given-names></name><name><surname>Kleinow</surname><given-names>V</given-names></name><name><surname>McElreath</surname><given-names>R</given-names></name></person-group><year iso-8601-date="2020">2020</year><article-title>Dynamic social learning in temporally and spatially variable environments</article-title><source>Royal Society Open Science</source><volume>7</volume><elocation-id>200734</elocation-id><pub-id pub-id-type="doi">10.1098/rsos.200734</pub-id><pub-id pub-id-type="pmid">33489255</pub-id></element-citation></ref><ref id="bib20"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Denrell</surname><given-names>J</given-names></name><name><surname>March</surname><given-names>JG</given-names></name></person-group><year iso-8601-date="2001">2001</year><article-title>Adaptation as Information Restriction: The Hot Stove Effect</article-title><source>Organization Science</source><volume>12</volume><fpage>523</fpage><lpage>538</lpage><pub-id pub-id-type="doi">10.1287/orsc.12.5.523.10092</pub-id></element-citation></ref><ref id="bib21"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Denrell</surname><given-names>J</given-names></name></person-group><year iso-8601-date="2007">2007</year><article-title>Adaptive learning and risk taking</article-title><source>Psychological Review</source><volume>114</volume><fpage>177</fpage><lpage>187</lpage><pub-id pub-id-type="doi">10.1037/0033-295X.114.1.177</pub-id><pub-id pub-id-type="pmid">17227186</pub-id></element-citation></ref><ref 
id="bib22"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Denrell</surname><given-names>J</given-names></name><name><surname>Le Mens</surname><given-names>G</given-names></name></person-group><year iso-8601-date="2007">2007</year><article-title>Interdependent sampling and social influence</article-title><source>Psychological Review</source><volume>114</volume><fpage>398</fpage><lpage>422</lpage><pub-id pub-id-type="doi">10.1037/0033-295X.114.2.398</pub-id><pub-id pub-id-type="pmid">17500632</pub-id></element-citation></ref><ref id="bib23"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Denrell</surname><given-names>J</given-names></name><name><surname>Le Mens</surname><given-names>G</given-names></name></person-group><year iso-8601-date="2017">2017</year><article-title>Information Sampling, Belief Synchronization, and Collective Illusions</article-title><source>Management Science</source><volume>63</volume><fpage>528</fpage><lpage>547</lpage><pub-id pub-id-type="doi">10.1287/mnsc.2015.2354</pub-id></element-citation></ref><ref id="bib24"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Drezner-Levy</surname><given-names>T</given-names></name><name><surname>Shafir</surname><given-names>S</given-names></name></person-group><year iso-8601-date="2007">2007</year><article-title>Parameters of variable reward distributions that affect risk sensitivity of honey bees</article-title><source>The Journal of Experimental Biology</source><volume>210</volume><fpage>269</fpage><lpage>277</lpage><pub-id pub-id-type="doi">10.1242/jeb.02656</pub-id><pub-id pub-id-type="pmid">17210963</pub-id></element-citation></ref><ref id="bib25"><element-citation publication-type="journal"><person-group 
person-group-type="author"><name><surname>Dussutour</surname><given-names>A</given-names></name><name><surname>Deneubourg</surname><given-names>JL</given-names></name><name><surname>Fourcassié</surname><given-names>V</given-names></name></person-group><year iso-8601-date="2005">2005</year><article-title>Amplification of individual preferences in a social context: the case of wall-following in ants</article-title><source>Proceedings. Biological Sciences</source><volume>272</volume><fpage>705</fpage><lpage>714</lpage><pub-id pub-id-type="doi">10.1098/rspb.2004.2990</pub-id><pub-id pub-id-type="pmid">15870033</pub-id></element-citation></ref><ref id="bib26"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Efferson</surname><given-names>C</given-names></name><name><surname>Lalive</surname><given-names>R</given-names></name><name><surname>Richerson</surname><given-names>P</given-names></name><name><surname>McElreath</surname><given-names>R</given-names></name><name><surname>Lubell</surname><given-names>M</given-names></name></person-group><year iso-8601-date="2008">2008</year><article-title>Conformists and mavericks: the empirics of frequency-dependent cultural transmission</article-title><source>Evolution and Human Behavior</source><volume>29</volume><fpage>56</fpage><lpage>64</lpage><pub-id pub-id-type="doi">10.1016/j.evolhumbehav.2007.08.003</pub-id></element-citation></ref><ref id="bib27"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Frank</surname><given-names>SA</given-names></name></person-group><year iso-8601-date="2009">2009</year><article-title>The common patterns of nature</article-title><source>Journal of Evolutionary Biology</source><volume>22</volume><fpage>1563</fpage><lpage>1585</lpage><pub-id pub-id-type="doi">10.1111/j.1420-9101.2009.01775.x</pub-id><pub-id pub-id-type="pmid">19538344</pub-id></element-citation></ref><ref 
id="bib28"><element-citation publication-type="book"><person-group person-group-type="author"><name><surname>Giraldeau</surname><given-names>LA</given-names></name><name><surname>Caraco</surname><given-names>T</given-names></name></person-group><year iso-8601-date="2000">2000</year><source>Social Foraging Theory</source><publisher-loc>New Jersey, United States</publisher-loc><publisher-name>Princeton University Press</publisher-name><pub-id pub-id-type="doi">10.1515/9780691188348</pub-id></element-citation></ref><ref id="bib29"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Giraldeau</surname><given-names>LA</given-names></name><name><surname>Valone</surname><given-names>TJ</given-names></name><name><surname>Templeton</surname><given-names>JJ</given-names></name></person-group><year iso-8601-date="2002">2002</year><article-title>Potential disadvantages of using socially acquired information</article-title><source>Philosophical Transactions of the Royal Society of London. 
Series B, Biological Sciences</source><volume>357</volume><fpage>1559</fpage><lpage>1566</lpage><pub-id pub-id-type="doi">10.1098/rstb.2002.1065</pub-id><pub-id pub-id-type="pmid">12495513</pub-id></element-citation></ref><ref id="bib30"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Goss</surname><given-names>S</given-names></name><name><surname>Aron</surname><given-names>S</given-names></name><name><surname>Deneubourg</surname><given-names>JL</given-names></name><name><surname>Pasteels</surname><given-names>JM</given-names></name></person-group><year iso-8601-date="1989">1989</year><article-title>Self-organized shortcuts in the Argentine ant</article-title><source>Naturwissenschaften</source><volume>76</volume><fpage>579</fpage><lpage>581</lpage><pub-id pub-id-type="doi">10.1007/BF00462870</pub-id></element-citation></ref><ref id="bib31"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Grüter</surname><given-names>C</given-names></name><name><surname>Leadbeater</surname><given-names>E</given-names></name></person-group><year iso-8601-date="2014">2014</year><article-title>Insights from insects about adaptive social information use</article-title><source>Trends in Ecology & Evolution</source><volume>29</volume><fpage>177</fpage><lpage>184</lpage><pub-id pub-id-type="doi">10.1016/j.tree.2014.01.004</pub-id><pub-id pub-id-type="pmid">24560544</pub-id></element-citation></ref><ref id="bib32"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Harding</surname><given-names>EJ</given-names></name><name><surname>Paul</surname><given-names>ES</given-names></name><name><surname>Mendl</surname><given-names>M</given-names></name></person-group><year iso-8601-date="2004">2004</year><article-title>Cognitive bias and affective 
state</article-title><source>Nature</source><volume>427</volume><elocation-id>312</elocation-id><pub-id pub-id-type="doi">10.1038/427312a</pub-id><pub-id pub-id-type="pmid">14737158</pub-id></element-citation></ref><ref id="bib33"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Harrison</surname><given-names>JF</given-names></name><name><surname>Camazine</surname><given-names>S</given-names></name><name><surname>Marden</surname><given-names>JH</given-names></name><name><surname>Kirkton</surname><given-names>SD</given-names></name><name><surname>Rozo</surname><given-names>A</given-names></name><name><surname>Yang</surname><given-names>X</given-names></name></person-group><year iso-8601-date="2001">2001</year><article-title>Mite not make it home: tracheal mites reduce the safety margin for oxygen delivery of flying honeybees</article-title><source>The Journal of Experimental Biology</source><volume>204</volume><fpage>805</fpage><lpage>814</lpage><pub-id pub-id-type="doi">10.1242/jeb.204.4.805</pub-id><pub-id pub-id-type="pmid">11171363</pub-id></element-citation></ref><ref id="bib34"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Hastie</surname><given-names>R</given-names></name><name><surname>Kameda</surname><given-names>T</given-names></name></person-group><year iso-8601-date="2005">2005</year><article-title>The robust beauty of majority rules in group decisions</article-title><source>Psychological Review</source><volume>112</volume><fpage>494</fpage><lpage>508</lpage><pub-id pub-id-type="doi">10.1037/0033-295X.112.2.494</pub-id><pub-id pub-id-type="pmid">15783295</pub-id></element-citation></ref><ref id="bib35"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Hertwig</surname><given-names>R</given-names></name><name><surname>Erev</surname><given-names>I</given-names></name></person-group><year 
iso-8601-date="2009">2009</year><article-title>The description-experience gap in risky choice</article-title><source>Trends in Cognitive Sciences</source><volume>13</volume><fpage>517</fpage><lpage>523</lpage><pub-id pub-id-type="doi">10.1016/j.tics.2009.09.004</pub-id><pub-id pub-id-type="pmid">19836292</pub-id></element-citation></ref><ref id="bib36"><element-citation publication-type="book"><person-group person-group-type="author"><name><surname>Hoppitt</surname><given-names>W</given-names></name><name><surname>Laland</surname><given-names>KN</given-names></name></person-group><year iso-8601-date="2013">2013</year><source>Social Learning: An Introduction to Mechanisms, Methods, and Models</source><publisher-loc>New Jersey, United States</publisher-loc><publisher-name>Princeton University Press</publisher-name><pub-id pub-id-type="doi">10.1515/9781400846504</pub-id></element-citation></ref><ref id="bib37"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Jouini</surname><given-names>E</given-names></name><name><surname>Napp</surname><given-names>C</given-names></name><name><surname>Nocetti</surname><given-names>D</given-names></name></person-group><year iso-8601-date="2011">2011</year><article-title>Collective risk aversion</article-title><source>Social Choice and Welfare</source><volume>40</volume><fpage>411</fpage><lpage>437</lpage><pub-id pub-id-type="doi">10.1007/s00355-011-0611-9</pub-id></element-citation></ref><ref id="bib38"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Kandler</surname><given-names>A</given-names></name><name><surname>Laland</surname><given-names>KN</given-names></name></person-group><year iso-8601-date="2013">2013</year><article-title>Tradeoffs between the strength of conformity and number of conformists in variable environments</article-title><source>Journal of Theoretical 
Biology</source><volume>332</volume><fpage>191</fpage><lpage>202</lpage><pub-id pub-id-type="doi">10.1016/j.jtbi.2013.04.023</pub-id><pub-id pub-id-type="pmid">23643630</pub-id></element-citation></ref><ref id="bib39"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Kendal</surname><given-names>RL</given-names></name><name><surname>Coolen</surname><given-names>I</given-names></name><name><surname>Bergen</surname><given-names>Y</given-names></name><name><surname>Laland</surname><given-names>KN</given-names></name></person-group><year iso-8601-date="2005">2005</year><article-title>Trade-Offs in the Adaptive Use of Social and Asocial Learning</article-title><source>Advances in the Study of Behavior</source><volume>35</volume><fpage>333</fpage><lpage>379</lpage><pub-id pub-id-type="doi">10.1016/S0065-3454(05)35008-X</pub-id></element-citation></ref><ref id="bib40"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>King</surname><given-names>AJ</given-names></name><name><surname>Cowlishaw</surname><given-names>G</given-names></name></person-group><year iso-8601-date="2007">2007</year><article-title>When to use social information: the advantage of large group size in individual decision making</article-title><source>Biology Letters</source><volume>3</volume><fpage>137</fpage><lpage>139</lpage><pub-id pub-id-type="doi">10.1098/rsbl.2007.0017</pub-id><pub-id pub-id-type="pmid">17284400</pub-id></element-citation></ref><ref id="bib41"><element-citation publication-type="book"><person-group person-group-type="author"><name><surname>Krause</surname><given-names>J</given-names></name><name><surname>Ruxton</surname><given-names>GD</given-names></name></person-group><year iso-8601-date="2002">2002</year><source>Living in Groups</source><publisher-loc>Oxford</publisher-loc><publisher-name>Oxford University Press</publisher-name></element-citation></ref><ref 
id="bib42"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Laland</surname><given-names>KN</given-names></name></person-group><year iso-8601-date="2004">2004</year><article-title>Social learning strategies</article-title><source>Learning &amp; Behavior</source><volume>32</volume><fpage>4</fpage><lpage>14</lpage><pub-id pub-id-type="doi">10.3758/bf03196002</pub-id><pub-id pub-id-type="pmid">15161136</pub-id></element-citation></ref><ref id="bib43"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Lorenz</surname><given-names>J</given-names></name><name><surname>Rauhut</surname><given-names>H</given-names></name><name><surname>Schweitzer</surname><given-names>F</given-names></name><name><surname>Helbing</surname><given-names>D</given-names></name></person-group><year iso-8601-date="2011">2011</year><article-title>How social influence can undermine the wisdom of crowd effect</article-title><source>PNAS</source><volume>108</volume><fpage>9020</fpage><lpage>9025</lpage><pub-id pub-id-type="doi">10.1073/pnas.1008636108</pub-id><pub-id pub-id-type="pmid">21576485</pub-id></element-citation></ref><ref id="bib44"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Ludvig</surname><given-names>EA</given-names></name><name><surname>Madan</surname><given-names>CR</given-names></name><name><surname>Pisklak</surname><given-names>JM</given-names></name><name><surname>Spetch</surname><given-names>ML</given-names></name></person-group><year iso-8601-date="2014">2014</year><article-title>Reward context determines risky choice in pigeons and humans</article-title><source>Biology Letters</source><volume>10</volume><elocation-id>20140451</elocation-id><pub-id pub-id-type="doi">10.1098/rsbl.2014.0451</pub-id><pub-id pub-id-type="pmid">25165453</pub-id></element-citation></ref><ref id="bib45"><element-citation 
publication-type="journal"><person-group person-group-type="author"><name><surname>Mahmoodi</surname><given-names>A</given-names></name><name><surname>Bahrami</surname><given-names>B</given-names></name><name><surname>Mehring</surname><given-names>C</given-names></name></person-group><year iso-8601-date="2018">2018</year><article-title>Reciprocity of social influence</article-title><source>Nature Communications</source><volume>9</volume><fpage>1</fpage><lpage>9</lpage><pub-id pub-id-type="doi">10.1038/s41467-018-04925-y</pub-id><pub-id pub-id-type="pmid">29946078</pub-id></element-citation></ref><ref id="bib46"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>March</surname><given-names>JG</given-names></name></person-group><year iso-8601-date="1996">1996</year><article-title>Learning to be risk averse</article-title><source>Psychological Review</source><volume>103</volume><fpage>309</fpage><lpage>319</lpage><pub-id pub-id-type="doi">10.1037/0033-295X.103.2.309</pub-id></element-citation></ref><ref id="bib47"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>McElreath</surname><given-names>R</given-names></name><name><surname>Lubell</surname><given-names>M</given-names></name><name><surname>Richerson</surname><given-names>PJ</given-names></name><name><surname>Waring</surname><given-names>TM</given-names></name><name><surname>Baum</surname><given-names>W</given-names></name><name><surname>Edsten</surname><given-names>E</given-names></name><name><surname>Efferson</surname><given-names>C</given-names></name><name><surname>Paciotti</surname><given-names>B</given-names></name></person-group><year iso-8601-date="2005">2005</year><article-title>Applying evolutionary models to the laboratory study of social learning</article-title><source>Evolution and Human Behavior</source><volume>26</volume><fpage>483</fpage><lpage>508</lpage><pub-id 
pub-id-type="doi">10.1016/j.evolhumbehav.2005.04.003</pub-id></element-citation></ref><ref id="bib48"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>McElreath</surname><given-names>R</given-names></name><name><surname>Bell</surname><given-names>AV</given-names></name><name><surname>Efferson</surname><given-names>C</given-names></name><name><surname>Lubell</surname><given-names>M</given-names></name><name><surname>Richerson</surname><given-names>PJ</given-names></name><name><surname>Waring</surname><given-names>T</given-names></name></person-group><year iso-8601-date="2008">2008</year><article-title>Beyond existence and aiming outside the laboratory: estimating frequency-dependent and pay-off-biased social learning strategies</article-title><source>Philosophical Transactions of the Royal Society of London. Series B, Biological Sciences</source><volume>363</volume><fpage>3515</fpage><lpage>3528</lpage><pub-id pub-id-type="doi">10.1098/rstb.2008.0131</pub-id><pub-id pub-id-type="pmid">18799416</pub-id></element-citation></ref><ref id="bib49"><element-citation publication-type="book"><person-group person-group-type="author"><name><surname>McElreath</surname><given-names>R</given-names></name></person-group><year iso-8601-date="2020">2020</year><source>Statistical Rethinking</source><publisher-loc>London, United Kingdom</publisher-loc><publisher-name>CRC Press</publisher-name><pub-id pub-id-type="doi">10.1201/9780429029608</pub-id></element-citation></ref><ref id="bib50"><element-citation publication-type="journal"><person-group 
person-group-type="author"><name><surname>Morgan</surname><given-names>TJH</given-names></name><name><surname>Uomini</surname><given-names>NT</given-names></name><name><surname>Rendell</surname><given-names>LE</given-names></name><name><surname>Chouinard-Thuly</surname><given-names>L</given-names></name><name><surname>Street</surname><given-names>SE</given-names></name><name><surname>Lewis</surname><given-names>HM</given-names></name><name><surname>Cross</surname><given-names>CP</given-names></name><name><surname>Evans</surname><given-names>C</given-names></name><name><surname>Kearney</surname><given-names>R</given-names></name><name><surname>de la Torre</surname><given-names>I</given-names></name><name><surname>Whiten</surname><given-names>A</given-names></name><name><surname>Laland</surname><given-names>KN</given-names></name></person-group><year iso-8601-date="2015">2015</year><article-title>Experimental evidence for the co-evolution of hominin tool-making teaching and language</article-title><source>Nature Communications</source><volume>6</volume><elocation-id>6029</elocation-id><pub-id pub-id-type="doi">10.1038/ncomms7029</pub-id><pub-id pub-id-type="pmid">25585382</pub-id></element-citation></ref><ref id="bib51"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Moussaïd</surname><given-names>M</given-names></name><name><surname>Brighton</surname><given-names>H</given-names></name><name><surname>Gaissmaier</surname><given-names>W</given-names></name></person-group><year iso-8601-date="2015">2015</year><article-title>The amplification of risk in experimental diffusion chains</article-title><source>PNAS</source><volume>112</volume><fpage>5631</fpage><lpage>5636</lpage><pub-id pub-id-type="doi">10.1073/pnas.1421883112</pub-id><pub-id pub-id-type="pmid">25902519</pub-id></element-citation></ref><ref id="bib52"><element-citation publication-type="journal"><person-group 
person-group-type="author"><name><surname>Muthukrishna</surname><given-names>M</given-names></name><name><surname>Morgan</surname><given-names>TJH</given-names></name><name><surname>Henrich</surname><given-names>J</given-names></name></person-group><year iso-8601-date="2016">2016</year><article-title>The when and who of social learning and conformist transmission</article-title><source>Evolution and Human Behavior</source><volume>37</volume><fpage>10</fpage><lpage>20</lpage><pub-id pub-id-type="doi">10.1016/j.evolhumbehav.2015.05.004</pub-id></element-citation></ref><ref id="bib53"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Najar</surname><given-names>A</given-names></name><name><surname>Bonnet</surname><given-names>E</given-names></name><name><surname>Bahrami</surname><given-names>B</given-names></name><name><surname>Palminteri</surname><given-names>S</given-names></name></person-group><year iso-8601-date="2020">2020</year><article-title>The actions of others act as a pseudo-reward to drive imitation in the context of social reinforcement learning</article-title><source>PLOS Biology</source><volume>18</volume><elocation-id>e3001028</elocation-id><pub-id pub-id-type="doi">10.1371/journal.pbio.3001028</pub-id><pub-id pub-id-type="pmid">33290387</pub-id></element-citation></ref><ref id="bib54"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Nakahashi</surname><given-names>W</given-names></name></person-group><year iso-8601-date="2007">2007</year><article-title>The evolution of conformist transmission in social learning when the environment changes periodically</article-title><source>Theoretical Population Biology</source><volume>72</volume><fpage>52</fpage><lpage>66</lpage><pub-id pub-id-type="doi">10.1016/j.tpb.2007.03.003</pub-id><pub-id pub-id-type="pmid">17442355</pub-id></element-citation></ref><ref id="bib55"><element-citation 
publication-type="journal"><person-group person-group-type="author"><name><surname>Nakahashi</surname><given-names>W</given-names></name><name><surname>Wakano</surname><given-names>JY</given-names></name><name><surname>Henrich</surname><given-names>J</given-names></name></person-group><year iso-8601-date="2012">2012</year><article-title>Adaptive social learning strategies in temporally and spatially varying environments: how temporal vs. spatial variation, number of cultural traits, and costs of learning influence the evolution of conformist-biased transmission, payoff-biased transmission, and individual learning</article-title><source>Human Nature (Hawthorne, N.Y.)</source><volume>23</volume><fpage>386</fpage><lpage>418</lpage><pub-id pub-id-type="doi">10.1007/s12110-012-9151-y</pub-id><pub-id pub-id-type="pmid">22926986</pub-id></element-citation></ref><ref id="bib56"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Pleskac</surname><given-names>TJ</given-names></name><name><surname>Hertwig</surname><given-names>R</given-names></name></person-group><year iso-8601-date="2014">2014</year><article-title>Ecologically rational choice and the structure of the environment</article-title><source>Journal of Experimental Psychology</source><volume>143</volume><fpage>2000</fpage><lpage>2019</lpage><pub-id pub-id-type="doi">10.1037/xge0000013</pub-id></element-citation></ref><ref id="bib57"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Raafat</surname><given-names>RM</given-names></name><name><surname>Chater</surname><given-names>N</given-names></name><name><surname>Frith</surname><given-names>C</given-names></name></person-group><year iso-8601-date="2009">2009</year><article-title>Herding in humans</article-title><source>Trends in Cognitive Sciences</source><volume>13</volume><fpage>420</fpage><lpage>428</lpage><pub-id 
pub-id-type="doi">10.1016/j.tics.2009.08.002</pub-id></element-citation></ref><ref id="bib58"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Real</surname><given-names>LA</given-names></name></person-group><year iso-8601-date="1981">1981</year><article-title>Uncertainty and Pollinator-Plant Interactions: The Foraging Behavior of Bees and Wasps on Artificial Flowers</article-title><source>Ecology</source><volume>62</volume><fpage>20</fpage><lpage>26</lpage><pub-id pub-id-type="doi">10.2307/1936663</pub-id></element-citation></ref><ref id="bib59"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Real</surname><given-names>L</given-names></name><name><surname>Ott</surname><given-names>J</given-names></name><name><surname>Silverfine</surname><given-names>E</given-names></name></person-group><year iso-8601-date="1982">1982</year><article-title>On the Tradeoff Between the Mean and the Variance in Foraging: Effect of Spatial Distribution and Color Preference</article-title><source>Ecology</source><volume>63</volume><elocation-id>1617</elocation-id><pub-id pub-id-type="doi">10.2307/1940101</pub-id></element-citation></ref><ref id="bib60"><element-citation publication-type="journal"><person-group 
person-group-type="author"><name><surname>Rendell</surname><given-names>L</given-names></name><name><surname>Boyd</surname><given-names>R</given-names></name><name><surname>Cownden</surname><given-names>D</given-names></name><name><surname>Enquist</surname><given-names>M</given-names></name><name><surname>Eriksson</surname><given-names>K</given-names></name><name><surname>Feldman</surname><given-names>MW</given-names></name><name><surname>Fogarty</surname><given-names>L</given-names></name><name><surname>Ghirlanda</surname><given-names>S</given-names></name><name><surname>Lillicrap</surname><given-names>T</given-names></name><name><surname>Laland</surname><given-names>KN</given-names></name></person-group><year iso-8601-date="2010">2010</year><article-title>Why copy others? Insights from the social learning strategies tournament</article-title><source>Science (New York, N.Y.)</source><volume>328</volume><fpage>208</fpage><lpage>213</lpage><pub-id pub-id-type="doi">10.1126/science.1184719</pub-id><pub-id pub-id-type="pmid">20378813</pub-id></element-citation></ref><ref id="bib61"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Sasaki</surname><given-names>T</given-names></name><name><surname>Biro</surname><given-names>D</given-names></name></person-group><year iso-8601-date="2017">2017</year><article-title>Cumulative culture can emerge from collective intelligence in animal groups</article-title><source>Nature Communications</source><volume>8</volume><fpage>1</fpage><lpage>6</lpage><pub-id pub-id-type="doi">10.1038/ncomms15049</pub-id><pub-id pub-id-type="pmid">28416804</pub-id></element-citation></ref><ref id="bib62"><element-citation publication-type="journal"><person-group 
person-group-type="author"><name><surname>Seeley</surname><given-names>T</given-names></name><name><surname>Camazine</surname><given-names>S</given-names></name><name><surname>Sneyd</surname><given-names>J</given-names></name></person-group><year iso-8601-date="1991">1991</year><article-title>Collective decision-making in honey bees: how colonies choose among nectar sources</article-title><source>Behavioral Ecology and Sociobiology</source><volume>28</volume><fpage>277</fpage><lpage>290</lpage><pub-id pub-id-type="doi">10.1007/BF00175101</pub-id></element-citation></ref><ref id="bib63"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Shupp</surname><given-names>RS</given-names></name><name><surname>Williams</surname><given-names>AW</given-names></name></person-group><year iso-8601-date="2008">2008</year><article-title>Risk Preference Differentials of Small Groups and Individuals</article-title><source>The Economic Journal</source><volume>118</volume><fpage>258</fpage><lpage>283</lpage><pub-id pub-id-type="doi">10.1111/j.1468-0297.2007.02112.x</pub-id></element-citation></ref><ref id="bib64"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Simons</surname><given-names>AM</given-names></name></person-group><year iso-8601-date="2004">2004</year><article-title>Many wrongs: the advantage of group navigation</article-title><source>Trends in Ecology &amp; Evolution</source><volume>19</volume><fpage>453</fpage><lpage>455</lpage><pub-id pub-id-type="doi">10.1016/j.tree.2004.07.001</pub-id><pub-id pub-id-type="pmid">16701304</pub-id></element-citation></ref><ref id="bib65"><element-citation publication-type="journal"><person-group 
person-group-type="author"><name><surname>Stephan</surname><given-names>KE</given-names></name><name><surname>Penny</surname><given-names>WD</given-names></name><name><surname>Daunizeau</surname><given-names>J</given-names></name><name><surname>Moran</surname><given-names>RJ</given-names></name><name><surname>Friston</surname><given-names>KJ</given-names></name></person-group><year iso-8601-date="2009">2009</year><article-title>Bayesian model selection for group studies</article-title><source>NeuroImage</source><volume>46</volume><fpage>1004</fpage><lpage>1017</lpage><pub-id pub-id-type="doi">10.1016/j.neuroimage.2009.03.025</pub-id><pub-id pub-id-type="pmid">19306932</pub-id></element-citation></ref><ref id="bib66"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Sumpter</surname><given-names>D</given-names></name><name><surname>Pratt</surname><given-names>S</given-names></name></person-group><year iso-8601-date="2003">2003</year><article-title>A modelling framework for understanding social insect foraging</article-title><source>Behavioral Ecology and Sociobiology</source><volume>53</volume><fpage>131</fpage><lpage>144</lpage><pub-id pub-id-type="doi">10.1007/s00265-002-0549-0</pub-id></element-citation></ref><ref id="bib67"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Sumpter</surname><given-names>DJT</given-names></name></person-group><year iso-8601-date="2005">2005</year><article-title>The principles of collective animal behaviour</article-title><source>Philosophical Transactions of the Royal Society B</source><volume>361</volume><fpage>5</fpage><lpage>22</lpage><pub-id pub-id-type="doi">10.1098/rstb.2005.1733</pub-id></element-citation></ref><ref id="bib68"><element-citation publication-type="book"><person-group 
person-group-type="author"><name><surname>Sutton</surname><given-names>RS</given-names></name><name><surname>Barto</surname><given-names>AG</given-names></name></person-group><year iso-8601-date="2018">2018</year><source>Reinforcement Learning: An Introduction</source><publisher-loc>Massachusetts, United States</publisher-loc><publisher-name>MIT Press</publisher-name></element-citation></ref><ref id="bib69"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Suzuki</surname><given-names>S</given-names></name><name><surname>Jensen</surname><given-names>ELS</given-names></name><name><surname>Bossaerts</surname><given-names>P</given-names></name><name><surname>O’Doherty</surname><given-names>JP</given-names></name></person-group><year iso-8601-date="2016">2016</year><article-title>Behavioral contagion during learning about another agent’s risk-preferences acts on the neural representation of decision-risk</article-title><source>PNAS</source><volume>113</volume><fpage>3755</fpage><lpage>3760</lpage><pub-id pub-id-type="doi">10.1073/pnas.1600092113</pub-id><pub-id pub-id-type="pmid">27001826</pub-id></element-citation></ref><ref id="bib70"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Takahashi</surname><given-names>T</given-names></name><name><surname>Ihara</surname><given-names>Y</given-names></name></person-group><year iso-8601-date="2019">2019</year><article-title>Cultural and evolutionary dynamics with best-of-k learning when payoffs are uncertain</article-title><source>Theoretical Population Biology</source><volume>128</volume><fpage>27</fpage><lpage>38</lpage><pub-id pub-id-type="doi">10.1016/j.tpb.2019.05.004</pub-id><pub-id pub-id-type="pmid">31145878</pub-id></element-citation></ref><ref id="bib71"><element-citation publication-type="journal"><person-group 
person-group-type="author"><name><surname>Toyokawa</surname><given-names>W</given-names></name><name><surname>Kim</surname><given-names>H</given-names></name><name><surname>Kameda</surname><given-names>T</given-names></name></person-group><year iso-8601-date="2014">2014</year><article-title>Human collective intelligence under dual exploration-exploitation dilemmas</article-title><source>PLOS ONE</source><volume>9</volume><elocation-id>e95789</elocation-id><pub-id pub-id-type="doi">10.1371/journal.pone.0095789</pub-id><pub-id pub-id-type="pmid">24755892</pub-id></element-citation></ref><ref id="bib72"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Toyokawa</surname><given-names>W</given-names></name><name><surname>Saito</surname><given-names>Y</given-names></name><name><surname>Kameda</surname><given-names>T</given-names></name></person-group><year iso-8601-date="2017">2017</year><article-title>Individual differences in learning behaviours in humans: Asocial exploration tendency does not predict reliance on social learning</article-title><source>Evolution and Human Behavior</source><volume>38</volume><fpage>325</fpage><lpage>333</lpage><pub-id pub-id-type="doi">10.1016/j.evolhumbehav.2016.11.001</pub-id></element-citation></ref><ref id="bib73"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Toyokawa</surname><given-names>W</given-names></name><name><surname>Whalen</surname><given-names>A</given-names></name><name><surname>Laland</surname><given-names>KN</given-names></name></person-group><year iso-8601-date="2019">2019</year><article-title>Social learning strategies regulate the wisdom and madness of interactive crowds</article-title><source>Nature Human Behaviour</source><volume>3</volume><fpage>183</fpage><lpage>193</lpage><pub-id pub-id-type="doi">10.1038/s41562-018-0518-x</pub-id><pub-id pub-id-type="pmid">30944445</pub-id></element-citation></ref><ref 
id="bib74"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Ward</surname><given-names>P</given-names></name><name><surname>Zahavi</surname><given-names>A</given-names></name></person-group><year iso-8601-date="1973">1973</year><article-title>The importance of certain assemblages of birds as “information-centres” for food finding</article-title><source>Ibis</source><volume>115</volume><fpage>517</fpage><lpage>534</lpage><pub-id pub-id-type="doi">10.1111/j.1474-919X.1973.tb01990.x</pub-id></element-citation></ref><ref id="bib75"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Watanabe</surname><given-names>S</given-names></name><name><surname>Opper</surname><given-names>M</given-names></name></person-group><year iso-8601-date="2010">2010</year><article-title>Asymptotic equivalence of Bayes cross validation and widely applicable information criterion in singular learning theory</article-title><source>Journal of Machine Learning Research</source><volume>11</volume><elocation-id>12</elocation-id></element-citation></ref><ref id="bib76"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Weber</surname><given-names>EU</given-names></name><name><surname>Shafir</surname><given-names>S</given-names></name><name><surname>Blais</surname><given-names>AR</given-names></name></person-group><year iso-8601-date="2004">2004</year><article-title>Predicting risk sensitivity in humans and lower animals: risk as variance or coefficient of variation</article-title><source>Psychological Review</source><volume>111</volume><fpage>430</fpage><lpage>445</lpage><pub-id pub-id-type="doi">10.1037/0033-295X.111.2.430</pub-id><pub-id pub-id-type="pmid">15065916</pub-id></element-citation></ref><ref id="bib77"><element-citation publication-type="journal"><person-group 
person-group-type="author"><name><surname>Weber</surname><given-names>EU</given-names></name></person-group><year iso-8601-date="2006">2006</year><article-title>Experience-Based and Description-Based Perceptions of Long-Term Risk: Why Global Warming does not Scare us (Yet)</article-title><source>Climatic Change</source><volume>77</volume><fpage>103</fpage><lpage>120</lpage><pub-id pub-id-type="doi">10.1007/s10584-006-9060-3</pub-id></element-citation></ref><ref id="bib78"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Yechiam</surname><given-names>E</given-names></name><name><surname>Busemeyer</surname><given-names>JR</given-names></name></person-group><year iso-8601-date="2006">2006</year><article-title>The effect of foregone payoffs on underweighting small probability events</article-title><source>Journal of Behavioral Decision Making</source><volume>19</volume><fpage>1</fpage><lpage>16</lpage><pub-id pub-id-type="doi">10.1002/bdm.509</pub-id></element-citation></ref><ref id="bib79"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Yechiam</surname><given-names>E</given-names></name><name><surname>Erev</surname><given-names>I</given-names></name><name><surname>Barron</surname><given-names>G</given-names></name></person-group><year iso-8601-date="2006">2006</year><article-title>The effect of experience on using a safety device</article-title><source>Safety Science</source><volume>44</volume><fpage>515</fpage><lpage>522</lpage><pub-id pub-id-type="doi">10.1016/j.ssci.2005.11.006</pub-id></element-citation></ref></ref-list><app-group><app id="appendix-1"><title>Appendix 1</title><sec id="s8" sec-type="appendix"><title>Supplementary methods</title><sec id="s8-1" sec-type="appendix"><title>An analytical result derived by <xref ref-type="bibr" rid="bib21">Denrell, 2007</xref></title><p>In the simplest setup of the two-armed bandit task, <xref ref-type="bibr" 
rid="bib21">Denrell, 2007</xref> derived an explicit form for the asymptotic probability of choosing the risky alternative <inline-formula><mml:math id="inf487"><mml:msubsup><mml:mi>P</mml:mi><mml:mi>r</mml:mi><mml:mo>⋆</mml:mo></mml:msubsup></mml:math></inline-formula> (as <inline-formula><mml:math id="inf488"><mml:mrow><mml:mi>t</mml:mi><mml:mo>→</mml:mo><mml:mi mathvariant="normal">∞</mml:mi></mml:mrow></mml:math></inline-formula>) as follows:<disp-formula id="equ4"><label>(4)</label><mml:math id="m4"><mml:mrow><mml:msubsup><mml:mi>P</mml:mi><mml:mrow><mml:mi>r</mml:mi></mml:mrow><mml:mrow><mml:mo>⋆</mml:mo></mml:mrow></mml:msubsup><mml:mo>=</mml:mo><mml:mfrac><mml:mn>1</mml:mn><mml:mrow><mml:mn>1</mml:mn><mml:mo>+</mml:mo><mml:mi>exp</mml:mi><mml:mo>⁡</mml:mo><mml:mrow><mml:mo>[</mml:mo><mml:mrow><mml:mfrac><mml:mrow><mml:mi>α</mml:mi><mml:msup><mml:mi>β</mml:mi><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup><mml:msup><mml:mtext>s.d.</mml:mtext><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup></mml:mrow><mml:mrow><mml:mn>2</mml:mn><mml:mo stretchy="false">(</mml:mo><mml:mn>2</mml:mn><mml:mo>−</mml:mo><mml:mi>α</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mfrac><mml:mo>−</mml:mo><mml:mi>β</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mi>μ</mml:mi><mml:mo>−</mml:mo><mml:msub><mml:mi>π</mml:mi><mml:mrow><mml:mi>s</mml:mi></mml:mrow></mml:msub><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>]</mml:mo></mml:mrow></mml:mrow></mml:mfrac><mml:mo>.</mml:mo></mml:mrow></mml:math></disp-formula></p><p><xref ref-type="disp-formula" rid="equ4">Equation 4</xref> identifies a condition under which reinforcement learners exhibit risk aversion. In fact, when there is no risk premium (i.e. 
<inline-formula><mml:math id="inf489"><mml:mrow><mml:mi>μ</mml:mi><mml:mo>≤</mml:mo><mml:msub><mml:mi>π</mml:mi><mml:mi>s</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>), the condition of risk aversion always holds, that is, <inline-formula><mml:math id="inf490"><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:msubsup><mml:mi>P</mml:mi><mml:mrow><mml:mi>r</mml:mi></mml:mrow><mml:mrow><mml:mo>⋆</mml:mo></mml:mrow></mml:msubsup><mml:mo>&lt;</mml:mo><mml:mn>0.5</mml:mn></mml:mrow></mml:mstyle></mml:math></inline-formula>. Consider the case where risk aversion is suboptimal, that is, <inline-formula><mml:math id="inf491"><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:mi>μ</mml:mi><mml:mo>&gt;</mml:mo><mml:msub><mml:mi>π</mml:mi><mml:mrow><mml:mi>s</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:mstyle></mml:math></inline-formula>. <xref ref-type="disp-formula" rid="equ4">Equation 4</xref> suggests that suboptimal risk aversion emerges when learning is myopic (i.e. when <inline-formula><mml:math id="inf492"><mml:mi>α</mml:mi></mml:math></inline-formula> is large) and/or decision making is less explorative (i.e. when <inline-formula><mml:math id="inf493"><mml:mi>β</mml:mi></mml:math></inline-formula> is large). 
For instance, when the payoff distribution of the risky alternative is set to <inline-formula><mml:math id="inf494"><mml:mrow><mml:mi>μ</mml:mi><mml:mo>=</mml:mo><mml:mrow><mml:msub><mml:mi>π</mml:mi><mml:mi>s</mml:mi></mml:msub><mml:mo>+</mml:mo><mml:mn>0.5</mml:mn></mml:mrow></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="inf495"><mml:mrow><mml:msup><mml:mtext>s.d.</mml:mtext><mml:mn>2</mml:mn></mml:msup><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:math></inline-formula>, the condition of risk aversion, <inline-formula><mml:math id="inf496"><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:msubsup><mml:mi>P</mml:mi><mml:mrow><mml:mi>r</mml:mi></mml:mrow><mml:mrow><mml:mo>⋆</mml:mo></mml:mrow></mml:msubsup><mml:mo>&lt;</mml:mo><mml:mn>0.5</mml:mn></mml:mrow></mml:mstyle></mml:math></inline-formula>, holds under <inline-formula><mml:math id="inf497"><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:mi>β</mml:mi><mml:mo>&gt;</mml:mo><mml:mo stretchy="false">(</mml:mo><mml:mn>2</mml:mn><mml:mo>−</mml:mo><mml:mi>α</mml:mi><mml:mo stretchy="false">)</mml:mo><mml:mrow><mml:mo>/</mml:mo></mml:mrow><mml:mi>α</mml:mi></mml:mrow></mml:mstyle></mml:math></inline-formula>, which corresponds to the area above the dashed curve in <xref ref-type="fig" rid="fig1">Figure 1b</xref> in the main text. 
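As a minimal numerical sketch of <xref ref-type="disp-formula" rid="equ4">Equation 4</xref> (this is not part of the original analysis code, and the function names are hypothetical, chosen only for illustration), the asymptotic probability and the threshold on the inverse temperature may be evaluated as follows, assuming a risk premium of 0.5 and unit payoff variance as in the example above:<code language="python" xml:space="preserve">
```python
import math


def asymptotic_risky_choice_prob(alpha, beta, mu, pi_s, sd):
    """Equation 4: asymptotic probability of choosing the risky alternative.

    alpha: learning rate (0 to 1); beta: inverse temperature;
    mu, sd: mean and standard deviation of the risky payoff;
    pi_s: payoff of the safe alternative.
    Hypothetical helper for illustration, not the authors' code.
    """
    variance_term = alpha * beta**2 * sd**2 / (2.0 * (2.0 - alpha))
    premium_term = beta * (mu - pi_s)
    return 1.0 / (1.0 + math.exp(variance_term - premium_term))


def risk_aversion_threshold(alpha):
    """Value of beta above which the probability drops below 0.5.

    Valid only for mu - pi_s = 0.5 and sd^2 = 1 (the example in the text).
    """
    return (2.0 - alpha) / alpha


if __name__ == "__main__":
    # Assumed illustrative values: alpha = 0.5, safe payoff = 1.0
    alpha, pi_s = 0.5, 1.0
    beta_star = risk_aversion_threshold(alpha)
    for beta in (beta_star - 1.0, beta_star, beta_star + 1.0):
        p = asymptotic_risky_choice_prob(alpha, beta, pi_s + 0.5, pi_s, 1.0)
        print(f"beta = {beta:.1f}, P_r* = {p:.3f}")
```
</code>Under these assumptions, a learning rate of 0.5 gives a threshold of 3: the computed probability equals 0.5 at that value of the inverse temperature, falls below 0.5 for larger values, and rises above 0.5 for smaller values. 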
Risk aversion becomes more prominent when the risk premium <inline-formula><mml:math id="inf498"><mml:mrow><mml:mi>μ</mml:mi><mml:mo>-</mml:mo><mml:msub><mml:mi>π</mml:mi><mml:mi>s</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> is small and/or the payoff variance <inline-formula><mml:math id="inf499"><mml:msup><mml:mtext>s.d.</mml:mtext><mml:mn>2</mml:mn></mml:msup></mml:math></inline-formula> is large.</p></sec><sec id="s8-2" sec-type="appendix"><title>The online experiments</title><sec id="s8-2-1" sec-type="appendix"><title>Subjects</title><p>The positive risk premium (positive RP) tasks were conducted between August and October 2020 (recruiting 492 subjects), while the negative risk premium (negative RP) task was conducted in September 2021 (recruiting 127 subjects) in response to the comments from peer reviewers. All subjects declared their residence in the United Kingdom, the United States, Ireland, or Australia. All subjects consented to participation through an online consent form at the beginning of the task. We excluded subjects who disconnected from the online task before completing at least the first 35 rounds from our computational model-fitting analysis, resulting in 467 subjects for the positive RP tasks and 118 subjects for the negative RP task (the detailed distribution of subjects for each condition is shown in <xref ref-type="table" rid="table1">Table 1</xref> in the main text). The task was available only to English-speaking subjects aged 18 years or older. Only subjects who passed a comprehension quiz at the end of the instructions could enter the task. 
Subjects were paid £0.8 as a show-up fee as well as an additional bonus payment depending on their performance in the decision-making task. In the positive RP tasks, 500 artificial points were converted to 8 pence, while in the negative RP task 500 points were converted to 10 pence so as to compensate for the less productive environment, resulting in a bonus ranging between £1.0 and £3.5.</p></sec><sec id="s8-2-2" sec-type="appendix"><title>Sample size</title><p>Our original target sample size for the positive RP tasks was 50 subjects for the individual condition and 150 subjects for the group condition, where our target average group size was 5 individuals per group. For the negative RP task, we aimed to recruit 30 individuals for the individual condition and 100 individuals (that is, 20 groups of 5) for the group condition. Subjects each completed 70 trials of the task. The sample size and the trial number had been justified by a model recovery analysis of a previous study (<xref ref-type="bibr" rid="bib73">Toyokawa et al., 2019</xref>).</p><p>Because of the nature of the ‘waiting lobby’, which was available only for 3 min, we could not fully control the exact size of each experimental group. Therefore, we set the maximum capacity of a lobby to 8 individuals for the 1-risky-1-safe task, which was conducted in August 2020, so as to buffer potential dropouts during the waiting period. Since we learnt that dropouts happened far less often than we originally expected, we reduced the lobby capacity to 6 for both the 1-risky-3-safe and the 2-risky-2-safe tasks, which were conducted in October 2020. As a result, we had 20 groups (mean group size = 6.95), 21 groups (mean group size = 4.7), 19 groups (mean group size = 4.3), and 21 groups (mean group size = 4.4), for the 1-risky-1-safe, 1-risky-3-safe, 2-risky-2-safe task, and the negative risk premium 2-armed task, respectively. 
Although we could not achieve the targeted sample size, partly due to the dropouts during the task and to a fatal error occurring in the experimental server in the first few sessions of the four-armed tasks, the parameter recovery test with <inline-formula><mml:math id="inf500"><mml:mrow><mml:mi>N</mml:mi><mml:mo>=</mml:mo><mml:mn>105</mml:mn></mml:mrow></mml:math></inline-formula> suggested that the current sample size should be reliable enough to estimate social influences for each subject (<xref ref-type="fig" rid="fig6s3">Figure 6—figure supplement 3</xref>).</p></sec></sec><sec id="s8-3" sec-type="appendix"><title>The hierarchical Bayesian parameter estimation</title><p>We used the hierarchical Bayesian method (HBM) to estimate the free parameters of our learning model. HBM allowed us to estimate individual differences while constraining this individual variation by the group-level (i.e. hyper) parameters. To do so, we used the following non-centred reparameterisation (the ‘Matt trick’):<disp-formula id="equ5"><mml:math id="m5"><mml:mrow><mml:mtext>logit</mml:mtext><mml:mo stretchy="false">(</mml:mo><mml:msub><mml:mi>α</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo stretchy="false">)</mml:mo><mml:mo>=</mml:mo><mml:msub><mml:mi>μ</mml:mi><mml:mrow><mml:mi>α</mml:mi></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:msub><mml:mi>v</mml:mi><mml:mrow><mml:mi>α</mml:mi></mml:mrow></mml:msub><mml:mo>∗</mml:mo><mml:msub><mml:mi>α</mml:mi><mml:mrow><mml:mtext>raw</mml:mtext><mml:mo>,</mml:mo><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></disp-formula></p><p>where <inline-formula><mml:math id="inf501"><mml:msub><mml:mi>μ</mml:mi><mml:mi>α</mml:mi></mml:msub></mml:math></inline-formula> is a global mean of <inline-formula><mml:math id="inf502"><mml:mrow><mml:mtext>logit</mml:mtext><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:msub><mml:mi>α</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo 
stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="inf503"><mml:msub><mml:mi>v</mml:mi><mml:mi>α</mml:mi></mml:msub></mml:math></inline-formula> is a global scale parameter of the individual variations, which is multiplied by a standardised individual random variable <inline-formula><mml:math id="inf504"><mml:msub><mml:mi>α</mml:mi><mml:mrow><mml:mtext>raw</mml:mtext><mml:mo>,</mml:mo><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula>. We used a standardised normal prior distribution centred on 0 for <inline-formula><mml:math id="inf505"><mml:msub><mml:mi>μ</mml:mi><mml:mi>α</mml:mi></mml:msub></mml:math></inline-formula> and an exponential prior for <inline-formula><mml:math id="inf506"><mml:msub><mml:mi>v</mml:mi><mml:mi>α</mml:mi></mml:msub></mml:math></inline-formula>. The same method was applied to the other learning parameters <inline-formula><mml:math id="inf507"><mml:msub><mml:mi>β</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:math></inline-formula>, <inline-formula><mml:math id="inf508"><mml:msub><mml:mi>σ</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:math></inline-formula>, and <inline-formula><mml:math id="inf509"><mml:msub><mml:mi>θ</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:math></inline-formula>.</p><p>We assumed that the ‘raw’ values of individual random variables (<inline-formula><mml:math id="inf510"><mml:msub><mml:mi>α</mml:mi><mml:mrow><mml:mtext>raw</mml:mtext><mml:mo>,</mml:mo><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula>, <inline-formula><mml:math id="inf511"><mml:msub><mml:mi>β</mml:mi><mml:mrow><mml:mtext>raw</mml:mtext><mml:mo>,</mml:mo><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula>, <inline-formula><mml:math id="inf512"><mml:msub><mml:mi>σ</mml:mi><mml:mrow><mml:mtext>raw</mml:mtext><mml:mo>,</mml:mo><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula>, <inline-formula><mml:math 
id="inf513"><mml:msub><mml:mi>θ</mml:mi><mml:mrow><mml:mtext>raw</mml:mtext><mml:mo>,</mml:mo><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula>) were drawn from a multivariate normal distribution. The correlation matrix was estimated using a Cholesky decomposition with a weakly informative Lewandowski–Kurowicka–Joe prior that gave a low prior probability to very high or very low correlations between the parameters (<xref ref-type="bibr" rid="bib49">McElreath, 2020</xref>; <xref ref-type="bibr" rid="bib19">Deffner et al., 2020</xref>).</p><sec id="s8-3-1" sec-type="appendix"><title>Model fitting</title><p>All models were fitted using the Hamiltonian Monte Carlo engine CmdStan 2.25.0 (<ext-link ext-link-type="uri" xlink:href="https://mc-stan.org/cmdstanr/index.html">https://mc-stan.org/cmdstanr/index.html</ext-link>) in R 4.0.2 (<ext-link ext-link-type="uri" xlink:href="https://www.r-project.org">https://www.r-project.org</ext-link>). Each model was run with at least six parallel chains, and we confirmed convergence of the MCMC using both the Gelman–Rubin criterion <inline-formula><mml:math id="inf514"><mml:mrow><mml:mover accent="true"><mml:mi>R</mml:mi><mml:mo stretchy="false">^</mml:mo></mml:mover><mml:mo>≤</mml:mo><mml:mn>1.01</mml:mn></mml:mrow></mml:math></inline-formula> and effective sample sizes greater than 500. The R and Stan code used in the model fitting is available from <ext-link ext-link-type="uri" xlink:href="https://github.com/WataruToyokawa/ToyokawaGaissmaier2021">an online repository</ext-link>.</p></sec></sec><sec id="s8-4" sec-type="appendix"><title>The value-shaping social influence model</title><p>We considered another implementation of social influences in reinforcement learning, namely, a value-shaping (<xref ref-type="bibr" rid="bib53">Najar et al., 2020</xref>) (or ‘outcome-bonus’ <xref ref-type="bibr" rid="bib10">Biele et al., 2011</xref>) model rather than the decision-biasing process assumed in our main analyses. 
In the value-shaping model, social influence modifies the <inline-formula><mml:math id="inf515"><mml:mi>Q</mml:mi></mml:math></inline-formula> value’s updating process as follows:<disp-formula id="equ6"><mml:math id="m6"><mml:mrow><mml:msub><mml:mi>Q</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>t</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo stretchy="false">←</mml:mo><mml:mo stretchy="false">(</mml:mo><mml:mn>1</mml:mn><mml:mo>−</mml:mo><mml:mi>α</mml:mi><mml:mo stretchy="false">)</mml:mo><mml:msub><mml:mi>Q</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:mi>α</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msub><mml:mi>π</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:msub><mml:mi>σ</mml:mi><mml:mrow><mml:mi>v</mml:mi><mml:mi>s</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mover><mml:mi>π</mml:mi><mml:mo stretchy="false">¯</mml:mo></mml:mover></mml:mrow><mml:mfrac><mml:msubsup><mml:mi>N</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>t</mml:mi><mml:mo>−</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>θ</mml:mi></mml:mrow></mml:msubsup><mml:mrow><mml:msubsup><mml:mi>N</mml:mi><mml:mrow><mml:mi>s</mml:mi><mml:mo>,</mml:mo><mml:mi>t</mml:mi><mml:mo>−</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>θ</mml:mi></mml:mrow></mml:msubsup><mml:mo>+</mml:mo><mml:msubsup><mml:mi>N</mml:mi><mml:mrow><mml:mi>r</mml:mi><mml:mo>,</mml:mo><mml:mi>t</mml:mi><mml:mo>−</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>θ</mml:mi></mml:mrow></mml:msubsup></mml:mrow></mml:mfrac></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:math></disp-formula></p><p>where the social frequency cue acts as an additional ‘bonus’ to the value that was weighted by <inline-formula><mml:math 
id="inf516"><mml:msub><mml:mi>σ</mml:mi><mml:mrow><mml:mi>v</mml:mi><mml:mo>⁢</mml:mo><mml:mi>s</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> (<inline-formula><mml:math id="inf517"><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:msub><mml:mi>σ</mml:mi><mml:mrow><mml:mi>v</mml:mi><mml:mi>s</mml:mi></mml:mrow></mml:msub><mml:mo>></mml:mo><mml:mn>0</mml:mn></mml:mrow></mml:mstyle></mml:math></inline-formula>) and standardised by the expected payoff from choosing randomly among all alternatives <inline-formula><mml:math id="inf518"><mml:mover accent="true"><mml:mi>π</mml:mi><mml:mo stretchy="false">¯</mml:mo></mml:mover></mml:math></inline-formula>. Here we assumed no direct social influence on the action selection process (i.e., <inline-formula><mml:math id="inf519"><mml:mrow><mml:mi>σ</mml:mi><mml:mo>=</mml:mo><mml:mn>0</mml:mn></mml:mrow></mml:math></inline-formula> in our main model). We confirmed that the collective behavioural rescue could emerge when the inverse temperature <inline-formula><mml:math id="inf520"><mml:mi>β</mml:mi></mml:math></inline-formula> was sufficiently small (<xref ref-type="fig" rid="fig1s2">Figure 1—figure supplement 2</xref>). Although it is beyond the focus of this article whether any other types of models would fit better with human data than the models we considered in this study, it is an interesting question for future research. 
For such an attempt, see <xref ref-type="bibr" rid="bib53">Najar et al., 2020</xref>.</p></sec></sec></app></app-group></back><sub-article article-type="editor-report" id="sa0"><front-stub><article-id pub-id-type="doi">10.7554/eLife.75308.sa0</article-id><title-group><article-title>Editor's evaluation</article-title></title-group><contrib-group><contrib contrib-type="author"><name><surname>Liljeholm</surname><given-names>Mimi</given-names></name><role specific-use="editor">Reviewing Editor</role><aff><institution-wrap><institution-id institution-id-type="ror">https://ror.org/04gyf1771</institution-id><institution>University of California, Irvine</institution></institution-wrap><country>United States</country></aff></contrib></contrib-group></front-stub><body><p>The authors use reinforcement learning and dynamic modeling to formalize the favorable effects of conformity on risk taking, demonstrating that social influence can produce an adaptive risk-seeking equilibrium at the population level. 
The work provides a rigorous analysis of a paradoxical interplay between social and economic choice.</p></body></sub-article><sub-article article-type="decision-letter" id="sa1"><front-stub><article-id pub-id-type="doi">10.7554/eLife.75308.sa1</article-id><title-group><article-title>Decision letter</article-title></title-group><contrib-group content-type="section"><contrib contrib-type="editor"><name><surname>Liljeholm</surname><given-names>Mimi</given-names></name><role>Reviewing Editor</role><aff><institution-wrap><institution-id institution-id-type="ror">https://ror.org/04gyf1771</institution-id><institution>University of California, Irvine</institution></institution-wrap><country>United States</country></aff></contrib></contrib-group></front-stub><body><boxed-text id="sa2-box1"><p>In the interests of transparency, eLife publishes the most substantive revision requests and the accompanying author responses.</p></boxed-text><p><bold>Decision letter after peer review:</bold></p><p>[Editors’ note: the authors submitted for reconsideration following the decision after peer review. What follows is the decision letter after the first round of review.]</p><p>Thank you for submitting the paper "Conformist social learning leads to self-organised prevention against adverse bias in risky decision making" for consideration by <italic>eLife</italic>. Your article has been reviewed by 3 peer reviewers, including Mimi Liljeholm as the Reviewing Editor and Reviewer #3, and the evaluation has been overseen by a Senior Editor.</p><p>We are sorry to say that, after consultation with the reviewers, we have decided that this work will not be considered further for publication by <italic>eLife</italic>.</p><p>There was consensus among the reviewers that the paper addresses an important and impactful topic. The influence of conformity on economic choice is still largely unexplored, and the rigorous modeling approach employed here is valuable. 
However, a primary weakness is the lack of integration with the relevant empirical and theoretical context, which imperils both the novelty and interpretability of the results:</p><p>First, it is unclear whether these findings constitute enough of an advance over those reported by Denrell and Le Mens (2007) to warrant publication in <italic>eLife</italic>. Second, there is no effort to incorporate processes supporting normative and informational conformity into the models. Notably, these issues are somewhat connected, in that a formal integration of normative and informational conformity with sampling-based collective rescue might go a long way towards distinguishing this work from that of Denrell and Le Mens. Thus, while these issues are too open-ended for a revision decision at <italic>eLife</italic>, the enthusiasm among reviewers was such that, should these concerns be fully addressed, the paper might be considered again as a new submission.</p><p>The specific comments from the reviewers are appended below for further reference.</p><p><italic>Reviewer #1:</italic></p><p>In this study the authors investigated the effect of social influence on individuals' risk aversion in a 2-armed bandit task. Using a reinforcement learning model and a dynamic model they showed that social influence can in fact diminish risk aversion. The authors then conducted a series of online experiments and found that their experimental findings were consistent with the prediction of their simulations.</p><p>The authors addressed an important question in the literature and adopted an interesting approach by first making predictions using simulations and then verifying those predictions with experimental data. The modelling has been conducted very carefully.</p><p>However, I have some concerns about the interpretation of the findings which might be addressed using additional analysis and or rewriting some parts of the manuscript. 
The study does not clarify whether in this task participants copy others to maximize their accuracy (informational conformity) or alternatively to be aligned with others (normative conformity). It is possible that participants became riskier because most of the group were choosing the riskier decisions (regardless of the outcome). In addition to that, an earlier study showed that people make riskier decisions when they make decisions alongside other people. This might be a potential confound of this study.</p><p>One potentially interesting design would be to test people in a situation where only the minority of the group members choose the optimal option (riskier option). If participants' choices become riskier even in this condition, we can conclude that they were not just copying the majority, but were maximising their reward by observing others' decisions and outcomes.</p><p>In this study the authors investigated the effect of social influence on individuals' risk aversion in a 2-armed bandit task. Using a reinforcement learning model and a dynamic model they showed that social influence can in fact diminish risk aversion. The authors then conducted a series of online experiments and observed the same effect in their experimental data as well. The research question is timely, and the modelling has been done carefully. However, I have some comments and concerns about the interpretations of their findings.</p><p>– My first question is do the participants copy others because the other risky option sounds better in terms of reward or because they copy others just because being in alignment with others is rewarding? This brings us to the distinction between informational and normative influence. For example, a recent study showed that copying others is not necessarily motivated by maximising accuracy (Mahmoodi et al. 2018, see also Cialdini and Goldstein 2004). 
In their experimental data, the authors found that participants do not copy others (choosing the risky options) as much as they should do. Does it suggest that their conformity toward others cannot be fully explained by informational motives (where the aim of conformity is to maximise payoff/accuracy). I suggest that the authors discuss each of these possibilities and then explain to which of these two types of influence their findings belong to.</p><p>– An earlier study showed that people's decisions become riskier when they make decisions with others (Bault et al. PNAS 2011). Could this explain the findings that are presented in this paper? Can the models distinguish between these two types of change in behaviour? I strongly suggest the authors to discuss the Bault et al. paper and discuss how their findings deviate from this study.</p><p>– In one section the authors show that reducing heterogeneity in groups undermines group performance. It brought my attention to a study (Lorenz et al. PNAS 2011) which suggested that social influence can undermine wisdom of crowds through reducing heterogeneity of opinions. It seems that the authors are presenting the same phenomenon as that suggested by Lorenz and colleagues. I suggest that the authors cite that study and discuss how their results is related to that study and whether their findings broaden our understanding of the effect of heterogeneity on collective performance.</p><p>– Nothing can be found about the model in the main text. Similarly, some of the terms are not even defined before they are used in the main text. For example, the term "asocial learning" is only defined in the Figure 2 caption. I suggest that the authors briefly explain the model and the key terms in the main text before presenting the result. 
I also strongly suggest that the authors mention in the main text that the detail of the model is presented in the methods.</p><p>In the introduction reads: social influence does not mindlessly increase risk seeking; instead, it may work only when to do so is adaptive. I believe this sentence is vague in its current form. I suggest that the authors elaborate on it, especially on the last part (i.e. to do so is adaptive).</p><p><italic>Reviewer #2:</italic></p><p>The paper considers an interesting puzzle. While most psychological studies of conformity tend to focus on negative effects, animal research highlights positive effects of conformity. The current analysis tries to clarify this apparent puzzle by clarifying one positive effect of conformity: Reduction of the hot stove effect that impairs maximization when taking risk is optimal.</p><p>The paper can be improved by building on previous efforts (e.g., Denrell and Le Mens, 2007) to clarify the impact of social influence on the hot stove effect. It would also be good to try to simplify the model, and use the same tasks in the theoretical analysis and the experimental study.</p><p>One interesting open question involves the impact of the increase in the information, available today (in social networks like Facebook) concerning the behavior of other individuals. I think that the current analysis predicts an increase in risk taking.</p><p>The paper's main shortcoming is the fact that it is difficult to understand how it adds to the observations presented in Denrell and Le Mens (2007), and more recent research by Le Mens. 
It is possible that the authors can address this shortcoming by clarifying the difference between pure conformity (or imitation) and the impact of social influence examined by Le Mens and his co-authors.</p><p>Another shortcoming involves the difference between the choice task analyzed in the theoretical analysis, and the task examined in the experiment. The theoretical analysis focuses on normal distributions, and the experiment focuses on asymmetric bimodal distributions. The authors suggest that they chose to switch to asymmetric bimodal distributions as the hot stove effect, exhibited by human subjects, in the case of normal distributions is not strong. If this is the case, it would be good to adjust the theoretical model and use a model that better capture human behavior.</p><p>A third shortcoming involves the complexity to the theoretical model. Since this model is only used to demonstrate that conformity can reduce the hot stove effect, and is not supposed to capture the exact magnitude of the two effects, I could not understand why it includes so many parameters. For example, it would be nice to add only one parameter to the basic reinforcement learning model. If more parameters are needed it would be good to show why they are needed.</p><p><italic>Reviewer #3:</italic></p><p>The authors use reinforcement learning and dynamic modeling to formalize the favorable effects of conformity on risk taking, demonstrating that social influence can produce an adaptive risk-seeking equilibrium at the population level. The work provides a rigorous analysis of a paradoxical interplay between social and economic choice.</p><p>Conformity is commonly attributed to either an intrinsic reward of group membership, or to inferences about the optimality of others' behavior (i.e., normative vs. informational). Neither of these aspects of conformity are addressed here, which limits the interpretability of the results. 
For example, if there is an intrinsic reward associated with majority alignment, that should contribute to the reinforcement of such decisions; moreover, inferences about the optimality of observed behavior likely change from early trials, in which others can be assumed to simply explore, to later trials, in which the decisions of others may be indicative of their success. The work would be more impactful if it considered how these factors might affect the potential for collective rescue.</p><p>An interesting question is whether a substantial payoff contingent on choosing a risky option may serve to reinforce the act of risk taking itself, and how such processes might propagate social influence across environments.</p><p>I suspect that the paper was initially written with the Methods following the Introduction, and with the Methods being subsequently moved to the end without much additional editing. As a result, very few of the variables and concepts (e.g., conformist influence, copying weight, positive/negative feedback) are defined in the main text upon first mention, which makes for extremely onerous reading.</p><p>Cases where the risky option yields a lesser mean payoff, producing a potentially detrimental social influence, should be given full weight in the main text, and should have been included in the behavioral study. Generally, the discrepancy between modeling and behavioral results is a bit disappointing. It is unclear why the behavioral experiment was not designed so as to create the most relevant conditions.</p><p>My greatest concern is that the work does not integrate properly with its theoretical and empirical context. 
Additional analyses assessing the relative contributions of normative and informational conformity to socially induced risk-seeking would be helpful.</p><p>[Editors’ note: further revisions were suggested prior to acceptance, as described below.]</p><p>Thank you for resubmitting your work entitled "Conformist social learning leads to self-organised prevention against adverse bias in risky decision making" for further consideration by <italic>eLife</italic>. Your revised article has been evaluated by Michael Frank (Senior Editor) and a Reviewing Editor.</p><p>The manuscript has been improved, and Reviewer 2 recommends acceptance at this point, but Reviewer 3 has some remaining concerns, summarized below. We invite you to address all remaining concerns in a second round of revisions. Make sure to include point-by-point replies to each of Reviewer 3's recommendations.</p><p>1) Although the organization and writing is improved, it has some ways to go before the manuscript is ready for publication. For example, the basic aims and methods should be stated in one or two sentences at the end of the first, or at most second, paragraph of the introduction, giving the reader a clear sense of where things are going. 
Moreover, the "Agent-based model" section should be shortened (to include only what is needed to conceptually understand the model, leaving details for a table and the methods) and better integrated with the introduction, rather than inserted as (what appears to be) a super-section.</p><p>2) Just as the intro should include a concise description of the model, it should highlight the online experiments, and how they relate to the modeling.</p><p>3) The result section should start with a paragraph outlining the various hypotheses and corresponding analyses (i.e., a "roadmap" for the section).</p><p>4) Please quantify the performance of your model relative to others with formal comparisons (e.g., Bayesian Model Selection).</p><p>5) Please quantify all claims of associations with effect sizes and clearly justify all parameter cut-offs/values.</p><p>6) Streamline figures and include predicted/observed result plots wherever possible.</p><p><italic>Reviewer #3:</italic></p><p>The authors showed that when individuals learn about how they should decide in a situation where they can choose between a risky and a safe option, they might overcome maladaptive biases (e.g. exaggerated risk-aversion when risk-taking would be beneficial) by conforming with group behaviour. Strengths include rigorous and innovative computational modeling; a weakness might be that the set-up of the empirical study did not actually widely provoke the behavioral phenomena in question, e.g. social learning (which is arguably at the core of the research question). 
Even though I am reviewing a revised manuscript I would hope the authors find a way to further improve clarity in the presentation of their research question and results.</p><p>My main concern was that even in the revised version I read, I found the paper not as accessible as I think it should be for a wide readership of a journal like <italic>eLife</italic>; and often times I found that things you could communicate in a straightforward way are put too complicated/expressed very verbosely. When reading the previous reviews after having read the revised paper, I also got the feeling that there were some misunderstandings. You clarified these specific points well in your responses, I think, but the lack in clarity might be even more drastic with an interdisciplinary readership this journal aims at as compared to the experts the journal has recruited now for this review? I will try to give some examples below.</p><p>The introduction should, imho provide a general intro to the question of the paper and how you've arrived to ask that question, avoiding too much technical jargon. After having read the paper, I realized that the research question is pretty straight-forward (and interesting) and derived from 1/2 previous observations, but this didn't become clear on the very first read.</p><p>Just as an example, some of the first sentences are…</p><p>"One rationale behind this optimistic view might come from the assumption that individuals tend to prefer a behavioural option that provides larger net benefits in utility over those providing lower outcomes. 
Therefore, even though uncertainty makes individual decision-making fallible, statistical filtering through informational pooling may be able to reduce uncertainty by cancelling out such noise."</p><p>This is only 1 example (it is something I noted throughout the paper… and also came up in the previous round of reviews) where I think these 2 sentences require some if not a little more background in decision-making (net benefits, utility, uncertainty, noise, stat filtering, informational pooling) to be understandable.</p><p>Other terminology like e.g. collective illusion, opportunity costs, description-based vs. experienced-based risk-taking paradigms, frequency-based influences, would be nice to be either defined in the text or replaced by a more accessible description in the introduction.</p><p>I know it is sometimes hard to mentalize which terminology others not working on the same things might struggle with, but given that this is an interdisciplinary journal, might it perhaps make sense to ask a researcher friend who is not exactly working on this topic to give it a read?</p><p>"such a risk-taking bias constrained by the fundamental nature of learning may function independently from the adaptive risk perception (Frey et al., 2017), potentially preventing adaptive risk taking."</p><p>Unclear without knowing or looking up the Frey paper – shorten ("to be too risk-averse might be maladaptive in some contexts"?) or explain.</p><p>Line 74-82 extremely long sentence – I think can easily be simplified? 
– "previous studies have neglected contexts where individuals learn about the environment both by own and others' experiences"</p><p>Line 84-89 is really long, too.</p><p>I would have been interested to learn more about the online experiments in the intro.</p><p>I do understand the reasoning of copying and pasting the Agent-Based Model description after having read the previous reviews, but it confused me when reading the article first; I think it needs to be integrated with the Intro/Results section better (the first paragraph reads like intro/background still, but then it is about the authors' current study). I fear that readers don't know where they are in the paper at that point. (I much prefer journals with the Methods-Results order rather than the one <italic>eLife</italic> uses, as this would naturally circumvent this problem, but that's the challenge here, I reckon.)</p><p>Results</p><p>Subheadings: Make clearer which is the section that describes simulations and which is the empirical section.</p><p>Can you maybe start with reiterating in a structured way which parameters you set to which values and why in the simulation before describing the effects of it, how many trials you simulate (you start speaking of elongated time horizons but do not mention the original horizon length other than in the figure legend?) etc?</p><p>I know the Najar work and I think it is cool that you can generalize your results also to a value-based framework, but I do not think readers who do not know the Najar study will be able to follow this as it is described now, which makes it more confusing than interesting. So either elaborate what this means (accessible to non-initiated readers) or move it to the Supplement (would be a shame).</p><p>Would the value-based model fit the empirical data better?</p><p>When reading the first part of the Results section I constantly wondered: How did the group behave / How was the behaviour of the group determined in the simulation? 
Was variability considered? (this is something that's been manipulated in some empirical studies building on descriptive risk scenarios, e.g. Suzuki et al). It becomes clear when reading on/looking at Figure 3, but it is such a crucial point that it needs to be made clear from the beginning.</p><p>I think it is a major limitation that in the empirical study actual social learning was extremely limited, given that the paper claims to provide a formal account of the function of social learning in this situation. I would have thought that indeed trying to provoke more use of social influence by altering the experimental setup in a way the authors propose in their discussion would have been important, and given that this can be done online, also a feasible option.</p><p>Also, empirically, susceptibility to the hot stove effect, i.e., <italic>α<sub>i</sub></italic>(<italic>β<sub>i</sub></italic> + 1), seems to be very low – zero for most of the participants in some scenarios according to Figure 6A,B,C? – isn't this concerning, given that this is at the core of what the authors want to explain?</p><p>"those with a higher value of the susceptibility to the hot stove effect (<italic>α<sub>i</sub></italic>(<italic>β<sub>i</sub></italic> + 1)) were less likely to choose the risky alternative, whereas those who had a smaller value of <italic>α<sub>i</sub></italic>(<italic>β<sub>i</sub></italic> + 1) had a higher chance of choosing the safe alternative (Figure 6a-c), " -- I'm confused: should it read "those who had a smaller value of <italic>α<sub>i</sub></italic>(<italic>β<sub>i</sub></italic> + 1) had a higher chance of choosing the *risky* alternative"?</p><p>Could you please quantify this correlation in terms of an effect size? The association is not linear, from the Figure? Please specify.</p><p>Is the association driven by α or by β or really by the product?</p><p>"The behaviour in the group condition supports our theoretical predictions. 
In the PRP tasks, the proportion of choosing the favourable risky option increased with social influence (<italic>σ<sub>i</sub></italic>) particularly for individuals who had a high susceptibility to the hot stove effect. On the other hand, social influence had little benefit for those who had a low susceptibility to the hot stove effect (e.g., <italic>α<sub>i</sub></italic>(<italic>β<sub>i</sub></italic> + 1) ≤ 0.5)." Can you quantify this statistically (effect sizes etc.)?</p><p>Did you do any form of model selection on the empirical data (with different set-ups of your models, the reduced models (e.g. without σ), or the Najar type of model) to demonstrate that it is really your theoretically proposed model that fits the data best (e.g. Bayesian Model Selection)? Please include in the main manuscript. See Palminteri, TiCS on why this might be important.</p><p>I think Figure 6 is really overloaded. The legend somehow looks as if it would belong only to panel c? The coloured plots are individual data points as a function of group size (which is what?) or copying weight? For me, in the current formatting, dots were too small to detect a continuous colour coding scheme (only yellow vs purple). The solid lines are simulated data? Can you show a regression line for the empirical data to allow for comparisons? Does it not differ substantially from the model predictions? 
I suggest making different plots for different purposes (compare predicted behaviour to empirical behaviour, show effect of copying weight, show effect of group size etc, show the simulation where you plug in <inline-formula><mml:math id="sa1m1"><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:mrow><mml:mover><mml:mi>σ</mml:mi><mml:mo stretchy="false">¯</mml:mo></mml:mover></mml:mrow></mml:mrow></mml:mstyle></mml:math></inline-formula>>0.4)</p><p>"In keeping with this, if we extrapolated the larger value of the copying of weight (i.e., <inline-formula><mml:math id="sa1m2"><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:mrow><mml:mover><mml:msub><mml:mi>σ</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo stretchy="false">¯</mml:mo></mml:mover></mml:mrow></mml:mrow></mml:mstyle></mml:math></inline-formula>> 0.4) into the best fitting social learning model with the other parameters calibrated, a strong collective rescue became prominent " – sorry, where does the value <inline-formula><mml:math id="sa1m3"><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:mrow><mml:mover><mml:mi>σ</mml:mi><mml:mo stretchy="false">¯</mml:mo></mml:mover></mml:mrow></mml:mrow></mml:mstyle></mml:math></inline-formula>>0.4 exactly come from for this analysis? Please give more detail/contextualise better.</p><p>Please try to be consistent with terminology: <italic>α<sub>i</sub></italic>(<italic>β<sub>i</sub></italic> + 1) is sometimes called 'susceptibility' or 'susceptibility value', which might be confusing, given that in some published articles susceptibility refers to susceptibility to social influence which would be another parameter…. 
I suggest to go through the manuscript once more and strictly only use one term for each parameter (the one you introduce in the table).</p></body></sub-article><sub-article article-type="reply" id="sa2"><front-stub><article-id pub-id-type="doi">10.7554/eLife.75308.sa2</article-id><title-group><article-title>Author response</article-title></title-group></front-stub><body><p>[Editors’ note: the authors resubmitted a revised version of the paper for consideration. What follows is the authors’ response to the first round of review.]</p><disp-quote content-type="editor-comment"><p>The specific comments from the reviewers are appended below for further reference.</p><p>Reviewer #1:</p><p>In this study the authors investigated the effect of social influence on individuals' risk aversion in a 2-armed bandit task. Using a reinforcement learning model and a dynamic model they showed that social influence can in fact diminish risk aversion. The authors then conducted a series of online experiments and found that their experimental findings were consistent with the prediction of their simulations.</p><p>The authors addressed an important question in the literature and adopted an interesting approach by first making predictions using simulations and then verifying those predictions with experimental data. The modelling has been conducted very carefully.</p><p>However, I have some concerns about the interpretation of the findings which might be addressed using additional analysis and or rewriting some parts of the manuscript.</p><p>The study does not clarify whether in this task participants copy others to maximize their accuracy (informational conformity) or alternatively to be aligned with others (normative conformity). 
It is possible that participants became riskier because most of the group were choosing the riskier decisions (regardless of the outcome).</p></disp-quote><p>We agree with the reviewer’s point that both informational and normative motivations would underlie conformity behaviour. It is correct that our model does not specify underlying individual motivations. In other words, the model does not depend on whether there are normative motivations to align with the majority or there exist only informational motivations for conformity. The result of our model can hold irrespective of them. Whatever the proximate reasons behind the conformist social influence, as long as individual choices are influenced by many others’ behaviour, experiences from the (otherwise avoided) risky alternative can increase, which results in mitigation of the hot stove effect. To make this point clearer, we have added text in the Model section in the main text as follows:</p><p>– (Line 159 – 167): “A payoff realised was independent of others’ decisions and it was drawn solely from the payoff probability distribution specific to each alternative, thereby we assume neither direct social competition over the monetary reward (Giraldeau & Caraco, 2000) nor normative pressures towards majority alignment (Cialdini & Goldstein, 2004; Mahmoodi et al., 2018). The value of social information was assumed to be only informational (Nakahashi, 2012). Nevertheless, our model could apply to the context of normative social influences, because what we assumed here was modifications in individual choice probabilities due to social influences, irrespective of underlying motivations of conformity.”</p><p>For the sake of simplicity, in the online experiment we aimed to limit the underlying motivation for using social information to the informational one. 
Therefore, participants in our experiment had no direct monetary incentives for aligning with the majority and were kept anonymous during the task, which we thought minimised the possibility of evoking the normative motivation. Nevertheless, we agree with the reviewer’s point that the distinction between informational and normative conformity is important in a broader context of social influence, and both types of motivations could have in fact co-worked in the experiment as well as in many real-world situations. We added discussion to elaborate on this point (see below). In particular, we expect that the weight for social learning (that is, the σ parameter in our model) would increase if both informational and normative motivations for conformity operate together, which would either promote a more robust rescue effect if σ is still not too large, or trigger maladaptive herding (i.e. collective illusion) if σ becomes too large. The discussion we have added is as follows:</p><p>– (Lines 572 – 586): “The weak reliance on social learning, which affected only about 15% of decisions, was unable to facilitate strong positive feedback. The little use of social information might have been due to the lack of normative motivations for conformity and to the stationarity of the task. In a stable environment, learners could eventually gather enough information as trials proceeded, which might have made them less curious about information gathering including social learning (Rendell et al., 2010). In reality, people might use more sophisticated social learning strategies whereby they change the reliance on social information flexibly over trials (Deffner et al., 2020; Toyokawa et al., 2017, 2019). 
Future research should consider more strategic use of social information, and will look at the conditions that elicit heavier reliance on the conformist social learning in humans, such as normative pressures for aligning with majority, volatility in the environment, time pressure, or an increasing number of behavioural options (Muthukrishna et al., 2015), coupled with larger group sizes (Toyokawa et al., 2019).”</p><disp-quote content-type="editor-comment"><p>In addition to that, an earlier study showed that people make riskier decisions when they make decision alongside other people. This might be a potential confound of this study.</p></disp-quote><p>We now cite some earlier studies, highlighting how our approach and the previous literature differ qualitatively. Previous studies investigating social influence on risky decision making have focused mainly on the description-based task where information sampling from experience does not play any important role in decision making. In contrast, our focus here is on the experience-based (i.e., learning based) risky decision making where information sampling processes are responsible for the proximate causes of risk-aversion, whose mechanisms can be independent from the utility function-based risk sensitivity measured in the description-based task. This distinction is important, because the very nature of information sampling in the experienced-based task plays the core role in both the hot stove effect and the collective rescue effect. To make this point clear, we have added sentences in the Introduction as follows:</p><p>–(Lines 72 – 82): How, if at all, can group-living animals improve collective decision accuracy while suppressing the potentially deleterious constraint of decision-making biases through trial-and-error learning? 
One of the strong candidates of explaining this gap is the fact that studies in human social learning in risky decision making have focused only on either the description-based gambles (Chung et al., 2015; Bault et al., 2011; Suzuki et al., 2016; Shupp and Williams, 2008) or extreme conformity where individual choices are regulated fully by others’ behaviour (Denrell and Le Mens, 2007, 2016), but not on experienced-based situations where both individual and social learning affect behavioural outcomes, a form of decision making widespread in group-living animals and humans (Hertwig and Erev, 2009; Camazine et al., 2001; Toyokawa et al., 2019).</p><disp-quote content-type="editor-comment"><p>One potentially interesting design would be to test people in a situation where only the minority of the group members choose the optimal option (riskier option). If participants' choices become riskier even in this condition, we can conclude that they were not just copying the majority, but were maximising their reward by observing others' decisions and outcomes.</p></disp-quote><p>The situation described by the reviewer here is exactly what happened in our results. Risk-aversion was mitigated not because the majority chose the risky option, nor were individuals simply attracted towards the majority. Rather, participants’ choices became riskier even though the majority chose the safer alternative at the outset. The mechanism behind such an ostensibly ‘minority effect’ is explained in the main text as follows, and we now highlight more clearly what this means:</p><p>– (Line 516 – 526): Despite conformity, the probability of choosing the suboptimal option can decrease from what is expected by individual learning alone. 
Indeed, an inherent individual preference for the safe alternative, expressed by the softmax function <inline-formula><mml:math id="sa2m4"><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:msup><mml:mi>e</mml:mi><mml:mrow><mml:mi>β</mml:mi><mml:msub><mml:mi>Q</mml:mi><mml:mrow><mml:mi>s</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:msup><mml:mrow><mml:mo>/</mml:mo></mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:msup><mml:mi>e</mml:mi><mml:mrow><mml:mi>β</mml:mi><mml:msub><mml:mi>Q</mml:mi><mml:mrow><mml:mi>s</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:msup><mml:mo>+</mml:mo><mml:msup><mml:mi>e</mml:mi><mml:mrow><mml:mi>β</mml:mi><mml:msub><mml:mi>Q</mml:mi><mml:mrow><mml:mi>r</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:msup><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mstyle></mml:math></inline-formula>, is always mitigated by the conformist influence <inline-formula><mml:math id="sa2m5"><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:msubsup><mml:mi>N</mml:mi><mml:mrow><mml:mi>s</mml:mi></mml:mrow><mml:mrow><mml:mi>θ</mml:mi></mml:mrow></mml:msubsup><mml:mrow><mml:mo>/</mml:mo></mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:msubsup><mml:mi>N</mml:mi><mml:mrow><mml:mi>s</mml:mi></mml:mrow><mml:mrow><mml:mi>θ</mml:mi></mml:mrow></mml:msubsup><mml:mo>+</mml:mo><mml:msubsup><mml:mi>N</mml:mi><mml:mrow><mml:mi>r</mml:mi></mml:mrow><mml:mrow><mml:mi>θ</mml:mi></mml:mrow></mml:msubsup><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mstyle></mml:math></inline-formula> as long as the former is larger than the latter. In other words, risk-aversion was mitigated not because the majority chose the risky option, nor were individuals simply attracted towards the majority. Rather, participants’ choices became riskier even though the majority chose the safer alternative at the outset. 
Intuitively, under social influences (either because of informational or normative motivations), individuals become more explorative, likely to continue sampling the risky option even after they become disappointed by poor rewards.</p><disp-quote content-type="editor-comment"><p>In this study the authors investigated the effect of social influence on individuals' risk aversion in a 2-armed bandit task. Using a reinforcement learning model and a dynamic model they showed that social influence can in fact diminish risk aversion. The authors then conducted a series of online experiments and observed the same effect in their experimental data as well. The research question is timely, and the modelling has been done carefully. However, I have some comments and concerns about the interpretations of their findings.</p><p>– My first question is: do the participants copy others because the other risky option sounds better in terms of reward, or just because being in alignment with others is rewarding? This brings us to the distinction between informational and normative influence. For example, a recent study showed that copying others is not necessarily motivated by maximising accuracy (Mahmoodi et al. 2018, see also Cialdini and Goldstein 2004). In their experimental data, the authors found that participants do not copy others (choosing the risky options) as much as they should do. Does it suggest that their conformity toward others cannot be fully explained by informational motives (where the aim of conformity is to maximise payoff/accuracy)? I suggest that the authors discuss each of these possibilities and then explain to which of these two types of influence their findings belong.</p></disp-quote><p>We agree that two different motivations for conformity might have played a role in our experimental setup, although we did not explicitly distinguish these factors in our theoretical development. 
We have discussed this point further and edited some text in the Discussion as we have shown in our response to Reviewer 1 above.</p><disp-quote content-type="editor-comment"><p>– An earlier study showed that people's decisions become riskier when they make decisions with others (Bault et al. PNAS 2011). Could this explain the findings that are presented in this paper? Can the models distinguish between these two types of change in behaviour? I strongly suggest that the authors discuss the Bault et al. paper and discuss how their findings deviate from this study.</p></disp-quote><p>We have added text explaining the relationship between our study and other studies using the description-based gambling tasks such as Bault et al. (2011), as we have shown in our response to the reviewer 2 (see above). In general, Bault et al. (2011) focuses on the description-based task where individuals can access the profile of gambles, whereas our focus is on the experience-based decision making where information sampling through choices is crucial. The fact that previous human collective risky decision-making studies have been dominated mostly by the description-based gambling seems to account for the ostensible gap between maladaptive collective illusion reported in human conformity studies and collective intelligence documented in animal conformity studies. Since information sampling through experience is the crucial factor in our results, the rescue effect would never emerge if we used the description-based tasks.</p><p>Another key difference between our model and Bault et al. (2011) was whether others’ payoff information was available or not. Bault et al. focused on the situation where participants could see others’ payoffs, hence assuming the richer social information transmission than what is assumed by our frequency-based social learning model. 
The implications of this difference were discussed in detail as follows:</p><p>– (Lines 600 – 609): Information about others’ payoffs might also be available in addition to inadvertent social frequency cues in some social contexts (Bault et al., 2011), especially with the aid of online communication tools or benevolent pedagogical acts from others. Although communicative acts may transfer information about behavioural alternatives that one has never tried before and may inform about forgone payoffs from other alternatives, which could mitigate the hot stove effect (Denrell, 2007; Yechiam and Busemeyer, 2006), it may further amplify the suboptimal decision bias if information senders, despite their cooperative motivation, selectively filter out some pieces of information they think are redundant (Moussaïd et al., 2015).</p><disp-quote content-type="editor-comment"><p>– In one section the authors show that reducing heterogeneity in groups undermines group performance. It brought my attention to a study (Lorenz et al. PNAS 2011) which suggested that social influence can undermine wisdom of crowds through reducing heterogeneity of opinions. It seems that the authors are presenting the same phenomenon as that suggested by Lorenz and colleagues. I suggest that the authors cite that study and discuss how their results are related to that study and whether their findings broaden our understanding of the effect of heterogeneity on collective performance.</p></disp-quote><p>Thank you for asking us to clarify the relation to this important study. We now explain explicitly why the collective rescue effect we find cannot be explained by the monotonic relationship between diversity and the wisdom of crowds (as occurred in Lorenz et al., 2011). The following is the discussion that we added:</p><p>– (Lines 505 – 513): Neither the averaging process of diverse individual inputs nor the speeding up of learning could account for the rescue effect. 
The individual diversity in the learning rate (α) was beneficial for the group performance, whereas that in the social learning weight (σ) undermines the average decision performance, which could not be explained simply by a monotonic relationship between diversity and wisdom of crowds (Lorenz et al., 2011). Self-organisation through collective behavioural dynamics emerging from the experience-based decision making must be responsible for the seemingly counter-intuitive phenomenon of collective rescue.</p><disp-quote content-type="editor-comment"><p>– Nothing can be found about the model in the main text. Similarly, some of the terms are not even defined before they are used in the main text. For example, the term "asocial learning" is only defined in the Figure 2 caption. I suggest that the authors briefly explain the model and the key terms in the main text before presenting the result. I also strongly suggest that the authors mention in the main text that the detail of the model is presented in the methods.</p></disp-quote><p>We now include ‘the Agent-Based Model’ section right after the Introduction (line 94 – 196), explaining the details of both task setups and the reinforcement learning models. We introduce the term ‘asocial learning’ in this new section as follows:</p><p>– (Lines 188 – 196): Note that, when σ = 0, there is no social influence, and the decision maker is considered as an asocial learner. It is also worth noting that, when σ = 1 with θ > 0, individual choices are assumed to be contingent fully upon majority's behaviour, which was assumed in some previous models of strong conformist social influences in sampling behaviour (Denrell and LeMens, 2016). Our model is a natural extension of both the asocial reinforcement learning and the model of ‘extreme conformity’, as these conditions can be expressed as a special case of parameter combinations. We will discuss the implications of this extension in the Discussion. 
The descriptions of the parameters are shown in Table 1.</p><p>We mention that the details of the dynamics model and that of the online experiments are presented in the Method as follows:</p><p>– (Line 329): The full details of this dynamics model are shown in the Method and Table 3.</p><p>– (Lines 436 – 439): The experimental task was basically a replication of the agent-based model described above, although the parameters of the bandit tasks were different (see the Method for the details of the experimental procedures; Supplementary Figure 11).</p><disp-quote content-type="editor-comment"><p>In the introduction reads: social influence does not mindlessly increase risk seeking; instead, it may work only when to do so is adaptive. I believe this sentence is vague in its current form. I suggest that the authors elaborate on it, especially on the last part (i.e. to do so is adaptive).</p></disp-quote><p>To clarify this point, we have changed the sentence as follows:</p><p>– (Lines 208 – 211): Interestingly, such a switch to risk seeking did not emerge when risk aversion was actually optimal (Supplementary Figure 9), suggesting that social influence does not always increase risk seeking; instead, the effect seems to be more prominent especially when risk seeking is beneficial in the long run.</p><disp-quote content-type="editor-comment"><p>Reviewer #2:</p><p>The paper considers an interesting puzzle. While most psychological studies of conformity tend to focus on negative effects, animal research highlights positive effects of conformity. The current analysis tries to clarify this apparent puzzle by clarifying one positive effect of conformity: Reduction of the hot stove effect that impairs maximization when taking risk is optimal.</p><p>The paper can be improved by building on previous efforts (e.g., Denrell and Le Mens, 2007) to clarify the impact of social influence on the hot stove effect. 
It would also be good to try to simplify the model, and use the same tasks in the theoretical analysis and the experimental study.</p></disp-quote><p>We agree with the reviewer’s point that our theory could become more impactful by relating to previous models such as Denrell & Le Mens (2007; 2016). We have explained what the critical difference between our model and the previous model is, highlighting how our model can be considered as a natural extension of previous conformity models. Notably, our model includes the cases explored by Denrell & Le Mens (2016) as an extreme setting of the social learning parameters where individual decision making is regulated fully by conformist social influence (that is, σ = 1 and θ > 1 for all individuals). Although minor technical details differ between our model and theirs, we were indeed able to replicate the pattern they found (i.e., the collective illusion by copying the majority’s behaviour) especially when social learning parameters (i.e., σ and θ) were very high. The text we added is as follows:</p><p>–(Lines 188 – 196): “Note that, when σ = 0, there is no social influence, and the decision maker is considered as an asocial learner. It is also worth noting that, when σ = 1 with θ > 0, individual choices are assumed to be contingent fully upon majority's behaviour, which was assumed in some previous models of strong conformist social influences in sampling behaviour (Denrell & LeMens, 2016). Our model is a natural extension of both the asocial reinforcement learning and the model of ‘extreme conformity’, as these conditions can be expressed as a special case of parameter combinations. We will discuss the implications of this extension in the Discussion. The descriptions of the parameters are shown in Table 1.”</p><p>To keep the model as simple as possible, we have added only two parameters for the frequency-based social learning processes, namely, σ (copying weight) and θ (conformity exponent). 
Previous studies have established that these two processes (i.e., the rate of social learning and the strength of conformity) affect collective dynamics differently (e.g., Kandler & Laland, 2013; Toyokawa et al., 2019). Therefore, we must consider these two parameters explicitly. We could have made our model more complex and more realistic by, for instance, considering temporally changing social influences (Toyokawa et al., 2019; Deffner et al., 2020), which we believe is worth exploring in future studies. However, we aimed to limit our analysis to the simplest case so as to connect the literature on reinforcement learning and the hot stove effect (Denrell, 2007).</p><p>The discrepancy between our theoretical model and the online experimental tasks has now been resolved by additional simulations shown in Supplementary Fig. 11 (Figure 1 – figure supplement 5). Here, we have shown that the rescue effect emerges robustly across different settings of the bandit tasks used in the online experiment. The reason why we focused on the Gaussian distribution task in the main result of the Agent-Based Model section was for the sake of mathematical tractability. The Gaussian task was theoretically well established, and its analytical solution was available (Denrell, 2007), which made our findings much clearer because we can be sure that the performance of social learners truly deviates from the analytical solution of asocial reinforcement learners’ performance (Fig 1).</p><disp-quote content-type="editor-comment"><p>One interesting open question involves the impact of the increase in the information available today (in social networks like Facebook) concerning the behavior of other individuals. 
I think that the current analysis predicts an increase in risk taking.</p></disp-quote><p>We have added some discussion on this issue in the discussion section as follows:</p><p>– (Lines 600 – 609): “Information about others’ payoffs might also be available in addition to inadvertent social frequency cues in some social contexts (Bault et al., 2011), especially with the aid of online communication tools or benevolent pedagogical acts from others. Although communicative acts may transfer information about behavioural alternatives that one has never tried before and may inform about forgone payoffs from other alternatives, which could mitigate the hot stove effect (Denrell, 2007; Yechiam and Busemeyer, 2006), it may further amplify the suboptimal decision bias if information senders, despite their cooperative motivation, selectively filter out some pieces of information they think are redundant (Moussaïd et al., 2015).”</p><disp-quote content-type="editor-comment"><p>The paper considers an interesting puzzle. While most psychological studies of conformity tend to focus on negative effects, animal research highlights positive effects of conformity. The current analysis tries to clarify this apparent puzzle by clarifying one positive effect of conformity: Reduction of the hot stove effect that impairs maximization when taking risk is optimal.</p><p>The paper's main shortcoming is the fact that it is difficult to understand how it adds to the observations presented in Denrell and Le Mens (2007), and more recent research by Le Mens. It is possible that the authors can address this shortcoming by clarifying the difference between pure conformity (or imitation) and the impact of social influence examined by Le Mens and his co-authors.</p></disp-quote><p>Denrell and Le Mens (2007 and 2016) are indeed very relevant to our topic. We cite these two papers in our revised manuscript. 
Their 2007 paper considered opinion dynamics of a pair of individuals, while their 2016 paper extended it to multiple players (n≧2) that is more relevant to our model. The most crucial difference between their 2016 model and ours is that whilst they only considered a very strong conformity bias whereby individual choices were determined fully by other people’s opinion state, we have considered a wider range of conformist social influences from extremely weak (σ = 0; asocial reinforcement learning) to extremely strong (σ = 1; akin to the strong conformity assumed in Denrell and Le Mens (2016)). As we showed in our results, this relaxation of allowing the intermediate-level of conformist social influence in decision making is the necessary condition to generate the collective rescue effect. To clarify this point, we have modified several texts as follows:</p><p>– (Lines 72 – 82): “How, if at all, can group-living animals improve collective decision accuracy while suppressing the potentially deleterious constraint of decision-making biases through trial-and-error learning? One of the strong candidates of explaining this gap is the fact that studies in human social learning in risky decision making have focused only on either the description-based gambles (Chung et al., 2015; Bault et al., 2011; Suzuki et al., 2016; Shupp and Williams, 2008) or extreme conformity where individual choices are regulated fully by others’ behaviour (Denrell and Le Mens, 2007, 2016), but not on experienced-based situations where both individual and social learning affect behavioural outcomes, a form of decision making widespread in group-living animals and humans (Hertwig and Erev, 2009; Camazine et al., 2001; Toyokawa et al., 2019).”</p><p>– (Lines 188 – 196): “Note that, when σ = 0, there is no social influence, and the decision maker is considered as an asocial learner. 
It is also worth noting that, when σ = 1 with θ > 0, individual choices are assumed to be contingent fully upon majority's behaviour, which was assumed in some previous models of strong conformist social influences in sampling behaviour (Denrell and Le Mens, 2016). Our model is a natural extension of both the asocial reinforcement learning and the model of ‘extreme conformity’, as these conditions can be expressed as a special case of parameter combinations. We will discuss the implications of this extension in the Discussion. The descriptions of the parameters are shown in Table 1.”</p><p>– (Lines 288 – 292): “This was because individuals with lower σ could benefit less from social information, while those with higher σ relied so heavily on social frequency information that behaviour was barely informed by individual learning, resulting in maladaptive herding or collective illusion (Denrell and Le Mens, 2016; Toyokawa et al., 2019).”</p><p>– (Lines 495 – 504): “We have demonstrated that frequency-based copying, one of the most common forms of social learning strategy, can rescue decision makers from committing to adverse risk aversion in a risky trial-and-error learning task, even though a majority of individuals are potentially biased towards suboptimal risk aversion. 
Although an extremely strong reliance on conformist influence can raise the possibility of getting stuck on a suboptimal option, consistent with the previous view of herding by conformity (Raafat et al., 2009; Denrell and Le Mens, 2016), the mitigation of risk aversion and the concomitant collective behavioural rescue could emerge in a wide range of situations under modest use of conformist social learning.”</p><p>– (Lines 557 – 560): “Such the synergistic interaction between positive and negative feedback could not be predicted by the collective illusion models where individual decision making is determined fully by the majority influence because no negative feedback would be able to operate.”</p><disp-quote content-type="editor-comment"><p>Another shortcoming involves the difference between the choice task analyzed in the theoretical analysis, and the task examined in the experiment. The theoretical analysis focuses on normal distributions, and the experiment focuses on asymmetric bimodal distributions. The authors suggest that they chose to switch to asymmetric bimodal distributions as the hot stove effect, exhibited by human subjects, in the case of normal distributions is not strong. If this is the case, it would be good to adjust the theoretical model and use a model that better capture human behavior.</p></disp-quote><p>We conducted additional simulations using the same bandit task setups as we used in the online experiment, confirming that the results do not change across conditions. Please find the details of this point in our reply to the review from reviewer #2 above.</p><disp-quote content-type="editor-comment"><p>A third shortcoming involves the complexity to the theoretical model. Since this model is only used to demonstrate that conformity can reduce the hot stove effect, and is not supposed to capture the exact magnitude of the two effects, I could not understand why it includes so many parameters. 
For example, it would be nice to add only one parameter to the basic reinforcement learning model. If more parameters are needed it would be good to show why they are needed.</p></disp-quote><p>As we discussed in the reply to the above, we agree that we should restrict the model to be as simple as possible. We believe that the current modelling is one of the simplest forms that can capture both the reliance on social influence (captured by <italic>σ</italic>) and the strength of conformity (captured by <italic>θ</italic>) separately.</p><disp-quote content-type="editor-comment"><p>Reviewer #3:</p><p>The authors use reinforcement learning and dynamic modeling to formalize the favorable effects of conformity on risk taking, demonstrating that social influence can produce an adaptive risk-seeking equilibrium at the population level. The work provides a rigorous analysis of a paradoxical interplay between social and economic choice.</p><p>Conformity is commonly attributed to either an intrinsic reward of group membership, or to inferences about the optimality of others' behavior (i.e., normative vs. informational). Neither of these aspects of conformity are addressed here, which limits the interpretability of the results. For example, if there is an intrinsic reward associated with majority alignment, that should contribute to the reinforcement of such decisions; moreover, inferences about the optimality of observed behavior likely change from early trials, in which others can be assumed to simply explore, to later trials, in which the decisions of others may be indicative of their success. 
The work would be more impactful if it considered how these factors might affect the potential for collective rescue.</p><p>An interesting question is whether a substantial payoff contingent on choosing a risky option may serve to reinforce the act of risk taking itself, and how such processes might propagate social influence across environments.</p><p>I suspect that the paper was initially written with the Methods following the Introduction, and with the Methods being subsequently moved to the end without much additional editing. As a result, very few of the variables and concepts (e.g., conformist influence, copying weight, positive/negative feedback) are defined in the main text upon first mention, which makes for extremely onerous reading.</p></disp-quote><p>We have substantially revised the structure of the manuscript, now placing the ‘Agent-Based Model’ section between the Introduction and the Results. In the current form, all key parameters (namely, conformist influence and copying weight) as well as other important concepts (e.g., positive feedback) are defined where they first appear:</p><p>– (Line 61 – 64): “Given that behavioural biases are ubiquitous and learning animals rarely escape from them, it may seem that conformist social influences may often lead to suboptimal herding or collective illusion through recursive amplification of the majority influence (i.e., positive feedback)”</p><p>Also, we deleted the term ‘negative feedback’ from the Introduction so that the term first appears at line 412, where its meaning is clearer:</p><p>– (Lines 410 – 413): “Crucially, the reduction of <inline-formula><mml:math id="sa2m6"><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:mi>P</mml:mi><mml:msup><mml:mi>s</mml:mi><mml:mo>–</mml:mo></mml:msup></mml:mrow></mml:mstyle></mml:math></inline-formula><sup>−</sup> leads to further reduction of <inline-formula><mml:math id="sa2m7"><mml:mstyle displaystyle="true" 
scriptlevel="0"><mml:mrow><mml:mi>P</mml:mi><mml:msup><mml:mi>s</mml:mi><mml:mo>–</mml:mo></mml:msup></mml:mrow></mml:mstyle></mml:math></inline-formula><sup>−</sup> itself through decreasing <inline-formula><mml:math id="sa2m8"><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:mi>N</mml:mi><mml:msup><mml:mi>s</mml:mi><mml:mrow><mml:mo>–</mml:mo></mml:mrow></mml:msup></mml:mrow></mml:mstyle></mml:math></inline-formula>, thereby further decreasing the social influence supporting the safe option <inline-formula><mml:math id="sa2m9"><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:msubsup><mml:mi>N</mml:mi><mml:mrow><mml:mi>s</mml:mi></mml:mrow><mml:mrow><mml:mi>θ</mml:mi></mml:mrow></mml:msubsup><mml:mrow><mml:mo>/</mml:mo></mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:msubsup><mml:mi>N</mml:mi><mml:mrow><mml:mi>r</mml:mi></mml:mrow><mml:mrow><mml:mi>θ</mml:mi></mml:mrow></mml:msubsup><mml:mo>+</mml:mo><mml:msubsup><mml:mi>N</mml:mi><mml:mrow><mml:mi>s</mml:mi></mml:mrow><mml:mrow><mml:mi>θ</mml:mi></mml:mrow></mml:msubsup><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mstyle></mml:math></inline-formula>. Such a negative feedback process weakens the concomitant risk aversion.”</p><disp-quote content-type="editor-comment"><p>Cases where the risky option yields a lesser mean payoff, producing a potentially detrimental social influence, should be given full weight in the main text, and should have been included in the behavioral study. Generally, the discrepancy between modeling and behavioral results is a bit disappointing. 
It is unclear why the behavioral experiment was not designed so as create the most relevant conditions.</p></disp-quote><p>To address this important point, we have conducted an additional series of experiments in which the risky option yields a smaller mean payoff than the safe alternative (namely, the negative risk premium [NRP] task), and report on it both in the theoretical part (see Supplementary Figure 18 [Figure 6 —figure supplement 2]) as well as in the experimental part (Figure 6 and Table 2; see also Supplementary Figure 19). In general, the model prediction was supported by the data from the NRP condition, suggesting that social influences could slightly be detrimental in such a condition because promotion of exploration increased the suboptimal risk taking. Nevertheless, the extent to which risk taking was increased by social influence in the NRP task was smaller than the extent to which optimal risk taking was increased in the positive risk premium tasks. Also, a previous study found that risk and reward are often correlated positively in many real-life circumstances (Pleskac and Hertwig, 2014), suggesting that situations where social influence is detrimental might be less common than situations where social influence is beneficial. Therefore, our conclusion that conformist social learning is more likely to promote adaptive risk taking should widely hold.</p><p>To highlight the results from these additional analyses and experiments, we have modified several parts of our texts as follows:</p><p>– (Lines 434 – 445): “To investigate whether the collective rescue effect can operate in reality, we conducted a series of online behavioural experiments using human participants. The experimental task was basically a replication of the agent-based model described above, although the parameters of the bandit tasks were different (see the Method for the details of the experimental procedures; Supplementary Figure 11). 
One hundred eighty-five adult human subjects performed the individual task without social interactions, while 400 subjects performed the task collectively with group sizes ranging from 2 to 8 (Supplementary Figures 17 and 19). We used four different settings for the multi-armed bandit tasks. Three of them were positive risk premium (PRP) tasks that had an optimal risky alternative, while the other was a negative risk premium (NRP) task that had a suboptimal risky alternative (see Methods).”</p><p>– (Lines 446 – 455): “The behavioural results with statistical model fitting confirmed the predictions of the theoretical model. In the PRP tasks, subjects who had a larger estimated value of the susceptibility to the hot stove effect (<italic>α<sub>i</sub></italic>(<italic>β<sub>i</sub></italic> + 1)) were less likely to choose the risky alternative, whereas those who had a smaller value of <italic>α<sub>i</sub></italic>(<italic>β<sub>i</sub></italic> + 1) had a higher chance of choosing the safe alternative (Figure 6a–c), consistent with the theory of the hot stove effect (Figure 2, Supplementary Figure 11). In the NRP task, individuals tended to choose the favourable safe option more often than they chose the risky option in a range of the susceptibility value <italic>α<sub>i</sub></italic>(<italic>β<sub>i</sub></italic> + 1) (Figure 6d), which was also consistent with the model prediction (Supplementary Figure 18).”</p><p>– (Lines 480 – 493): “In the NRP task, conformist social influence undermined the proportion of choosing the optimal safe option and increased adverse risk seeking, although a complete switch of the majority's behaviour to the suboptimal risky option did not happen (Figure 6d; Supplementary Figure 19). Such promotion of suboptimal risk taking was particularly prominent when the susceptibility value <italic>α<sub>i</sub></italic>(<italic>β<sub>i</sub></italic> + 1) was large. 
Nonetheless, the extent to which risk taking was increased in the NRP condition was smaller than that in the PRP tasks, consistent with our model prediction that conformist social learning is more likely to promote favourable risk taking (Supplementary Figure 18). It is worth noting that the estimated learning rates (i.e., mean <italic>α<sub>i</sub></italic> = 0.48) in the NRP task were larger than those in the PRP tasks (mean <italic>α<sub>i</sub> </italic>< 0.21; Table 2), making social learning particularly deleterious when risk taking is suboptimal (Supplementary Figure 18). In the Discussion, we will discuss the effect of the experimental setting on human learning strategies, which can be explored in future studies.”</p><p>– (Lines 561 – 571): “Through online behavioural experiments using a risky multi-armed bandit task, we have confirmed our theoretical prediction that simple frequency-based copying could mitigate risk aversion that many individual learners, especially those who had higher learning rates and/or lower exploration rates, would have exhibited as a result of the hot stove effect. The mitigation of risk aversion was also observed in the NRP task, in which social learning slightly undermined the decision performance. However, because riskiness and expected reward are often positively correlated in a wide range of decision-making environments in the real world (Pleskac and Hertwig, 2014), the detrimental effect of reducing optimal risk aversion when risk premium is negative could be negligible in many ecological circumstances, making conformist social learning beneficial in most cases.”</p><disp-quote content-type="editor-comment"><p>My greatest concern is that the work does not integrate properly with its theoretical and empirical context. 
Additional analyses assessing the relative contributions of normative and informational conformity to socially induced risk-seeking would be helpful.</p></disp-quote><p>We have conducted additional simulations with the bandit task setting identical to the experimental tasks (see Supplementary Figure 11 (Figure 1 —figure supplement 5) for the PRP tasks; and Figure 18 (Figure 6 —figure supplement 2) for the NRP task). We have confirmed that our theoretical results can hold robustly across the range of different task settings, strengthening the implication of the finding.</p><p>We are fully aware of the important distinctions between normative and informational motivations underlying the use of social information. We have explicitly mentioned our stance at lines 159 – 167 that we limited our analysis to the informational context and suggested some future directions in the discussion at lines 572 – 586, as follows:</p><p>– (Line 159 – 167): “A payoff realised was independent of others’ decisions and it was drawn solely from the payoff probability distribution specific to each alternative, thereby we assume neither direct social competition over the monetary reward (Giraldeau and Caraco, 2000) nor normative pressures towards majority alignment (Cialdini and Goldstein, 2004; Mahmoodi et al., 2018). The value of social information was assumed to be only informational (Nakahashi, 2012). Nevertheless, our model could apply to the context of normative social influences, because what we assumed here was modifications in individual choice probabilities due to social influences, irrespective of underlying motivations of conformity.”</p><p>– (Lines 572 – 586): “The weak reliance on social learning, which affected only about 15% of decisions, was unable to facilitate strong positive feedback. The little use of social information might have been due to the lack of normative motivations for conformity and to the stationarity of the task. 
In a stable environment, learners could eventually gather enough information as trials proceeded, which might have made them less curious about information gathering including social learning (Rendell et al., 2010). In reality, people might use more sophisticated social learning strategies whereby they change the reliance on social information flexibly over trials (Deffner et al., 2020, Toyokawa et al., 2017, Toyokawa et al., 2019). Future research should consider more strategic use of social information, and will look at the conditions that elicit heavier reliance on the conformist social learning in humans, such as normative pressures for aligning with majority, volatility in the environment, time pressure, or an increasing number of behavioural options (Muthukrishna et al., 2015), coupled with larger group sizes (Toyokawa et al., 2019).”</p><p>[Editors’ note: what follows is the authors’ response to the second round of review.]</p><disp-quote content-type="editor-comment"><p>Essential revisions:</p><p>The manuscript has been improved, and Reviewer 2 recommends acceptance at this point, but Reviewer 3 has some remaining concerns, summarized below. We invite you to address all remaining concerns in a second round of revisions. Make sure to include point-by-point replies to each of Reviewer 3's recommendations.</p><p>1) Although the organization and writing is improved, it has some ways to go before the manuscript is ready for publication. For example, the basic aims and methods should be stated in one or two sentences at the end of the first, or at most second, paragraph of the introduction, giving the reader a clear sense of where things are going. 
Moreover, the "Agent-based model" section should be shortened (to include only what is needed to conceptually understand the model, leaving details for a table and the methods) and better integrated with the introduction, rather than inserted as (what appears to be) a super-section.</p></disp-quote><p>We thank the editors and the reviewers for this valuable suggestion. We fully agree with the value of giving readers a clear sense of the article’s structure at the beginning of the Introduction. To do this, while addressing the other points related to the Introduction (see below), we have revised the Introduction sentence by sentence and have made the central question and background of this paper much clearer in the first two paragraphs. In particular, the aim of this paper is summarised at the end of the second paragraph of the Introduction:</p><p>– Lines 60 – 63: “A theory that incorporates dynamics of trial-and-error learning and the learnt risk aversion into social learning is needed to understand the conditions under which collective intelligence operates in risky decision making.”</p><p>Also, we have deleted the subsection “Agent-based model” and integrated its contents into the end of the Introduction and the beginning of the Results. In both places, we defined technical terms as soon as they first appeared, and verbally described the concept and assumptions of the computational model. Please see the subsections “The decision-making task”, “The baseline model”, and “The conformist social influence model” in the Results section.</p><disp-quote content-type="editor-comment"><p>2) Just as the intro should include a concise description of the model, it should highlight the online experiments, and how they relate to the modeling.</p></disp-quote><p>Highlighting the online experiment and articulating the relationship between the experiment and models in the Introduction is a wonderful suggestion. We have modified the Introduction to make the aim of the experiment clear. 
The modified text is as follows:</p><p>– Lines 122 – 130: “Finally, to investigate whether the assumptions and predictions of the model hold in reality, we conducted a series of online behavioural experiments with human participants. The experimental task was basically a replication of the task used in the agent-based model described above, although the parameters of the bandit tasks were modified to explore wider task spaces beyond the simplest two-armed task. Experimental results show that the human collective behavioural pattern was consistent with the theoretical prediction, and model selection and parameter estimation suggest that our model assumptions fit well with our experimental data.”</p><disp-quote content-type="editor-comment"><p>3) The result section should start with a paragraph outlining the various hypotheses and corresponding analyses (i.e., a "roadmap" for the section).</p></disp-quote><p>We thank the editors for this insightful suggestion. Sketching a roadmap before showing detailed results is a great idea. To guide readers smoothly from the Introduction to the Results, we have outlined an overview of our analysis at the end of the Introduction:</p><p>– Lines 109 – 132: “In the study reported here, we firstly examined whether a simple form of conformist social influence can improve collective decision performance in a simple multi-armed bandit task using an agent-based model simulation. We found that promotion of favourable risk taking can indeed emerge across different assumptions and parameter spaces, including individual heterogeneity within a group. This phenomenon occurs thanks, apparently, to the non-linear effect of social interactions, namely, <italic>collective behavioural rescue</italic>. To disentangle the core dynamics behind this ostensibly self-organised process, we then analysed a differential equation model representing approximate population dynamics. 
Combining these two theoretical approaches, we identified that it is a combination of positive and negative feedback loops that underlies collective behavioural rescue, and that the key mechanism is a promotion of information sampling by modest conformist social influence.</p><p>Finally, to investigate whether the assumptions and predictions of the model hold in reality, we conducted a series of online behavioural experiments with human participants. The experimental task was basically a replication of the task used in the agent-based model described above, although the parameters of the bandit tasks were modified to explore wider task spaces beyond the simplest two-armed task. Experimental results show that the human collective behavioural pattern was consistent with the theoretical prediction, and model selection and parameter estimation suggest that our model assumptions fit well with our experimental data. In sum, we provide a general account of the robustness of collective intelligence even under systematic risk aversion and highlight a previously overlooked benefit of conformist social influence.”</p><p>As we believe that repeating the general outline at the beginning of the result section would be redundant, we put short introductory sentences at the beginning of each subsection in the Result. Especially, we have made our experimental hypotheses clearly stated at the beginning of the “Experimental demonstration” subsection as follows:</p><p>– Lines 507 – 512: “On the basis of both the agent-based simulation (Figure 1 and Supplementary Figure 9) and the population dynamics (Figure 5 and Supplementary Figure 16), we hypothesised that conformist social influence promotes risk seeking to a lesser extent when the RP is negative than when it is positive. 
We also expected that whether the collective rescue effect emerges under positive RP settings depends on learning parameters such as <inline-formula><mml:math id="sa2m10"><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:msub><mml:mi>α</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo stretchy="false">(</mml:mo><mml:msub><mml:mi>β</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:mn>1</mml:mn><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mstyle></mml:math></inline-formula> (Supplementary Figure 11d–f).”</p><disp-quote content-type="editor-comment"><p>4) Please quantify the performance of your model relative to others with formal comparisons (e.g., Bayesian Model Selection).</p></disp-quote><p>We really appreciate this insightful suggestion. Thanks to the formal model comparison using the Bayesian model selection based on WAIC, our finding has been much strengthened. We have now included both the model recovery check and the model comparison result in the new Supplementary Figure 18 (Figure 6 —figure supplement 2) in page 54. The successful model recovery has ensured that the hierarchical Bayesian model fitting method could reliably differentiate between the candidate models, and we confirmed that the model comparison favoured the decision-biasing model that we used in our main analysis. We have added some texts to include this finding as follows:</p><p>– Lines 277 – 286: “Further, the conclusion still held for an alternative model in which social influences modified the belief-updating process (the value-shaping model; Najar et al., 2020) rather than directly influencing the choice probability (the decision-biasing model) as assumed in the main text thus far (see Supplementary Methods; Supplementary Figure 8). One could derive many other more complex social learning processes that may operate in reality; however, the comprehensive search of possible model space is beyond the current interest. 
Yet, decision biasing was found to fit better than value shaping with our behavioural experimental data (Supplementary Figure 18), leading us to focus our analysis on the decision-biasing model.”</p><p>– Lines 513 – 517: “The Bayesian model comparison (Stephan et al., 2009) revealed that participants in the group condition were more likely to employ decision-biasing social learning than either asocial reinforcement learning or the value-shaping process (Supplementary Figure 18). Therefore, in the following analysis we focus on results obtained from the decision-biasing model fit.”</p><p>– Line 950 – 956: “We compared the baseline reinforcement learning model, the decision biasing model, and the value-shaping model (see Supplementary Methods) using Bayesian model selection (Stephan et al., 2009). The model frequency and exceedance probability were calculated based on the Widely Applicable Information Criterion (WAIC) values for each subject (Watanabe and Opper, 2010). We confirmed accurate model recovery by simulations using our task setting (Supplementary Figure 18).”</p><disp-quote content-type="editor-comment"><p>5) Please quantify all claims of associations with effect sizes and clearly justify all parameter cut-offs/values.</p><p>6) Streamline figures and included predicted/observed result plots wherever possible.</p></disp-quote><p>We thank the editors and the reviewers for this great suggestion. Having conducted an additional data analysis using a standard GLMM with a hierarchical Bayesian estimation method, we have quantified all the empirical findings with their effect sizes accompanied by the Bayesian credible intervals. Please see the new Table 3 (page 28) for the estimated coefficients of the GLMM as well as a new Supplementary Figure 17 (Figure 6 —figure supplement 1; page 53) for the prediction from the fit GLMM. 
Also, we deleted the arbitrary parameter cut-offs, and showed the effect of the copying weight in a continuous manner in Figure 6 (page 26).</p><p>To streamline the two different types of empirical analyses (namely, the fitted computational model and the raw data with the fitted GLMM), we have separated them into the computational model prediction (Figure 6) and the GLMM regression with the experimental data (Supplementary Figure 17). The matching patterns between them indicate that the fitted computational model was able to reproduce the actual participants’ behaviour. To highlight these points, we have added a new paragraph in the result section as follows:</p><p>– Lines 548 – 560: “To quantify the effect size of the relationship between the proportion of risk taking and each subject’s best fit learning parameters, we analysed a generalised linear mixed model (GLMM) fitted with the experimental data (see Methods; Table 3). Within the group condition, the GLMM analysis showed a positive effect of <italic>σ<sub>i</sub></italic> on risk taking for every task condition (Table 3), which supports the simulated pattern. Also consistent with the simulations, in the positive RP tasks, subjects exhibited risk aversion more strongly when they had a higher value of <inline-formula><mml:math id="sa2m11"><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:msub><mml:mi>α</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo stretchy="false">(</mml:mo><mml:msub><mml:mi>β</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:mn>1</mml:mn><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mstyle></mml:math></inline-formula> (Supplementary Figure 17a–c). There was no such clear trend in data from the negative RP task, although we cannot make a strong inference because of the large width of the Bayesian credible interval (Supplementary Figure 17d). 
In the negative RP task, subjects were biased more towards the (favourable) safe option than subjects in the positive RP tasks (i.e., the intercept of the GLMM was lower in the negative RP task than in the others).”</p><disp-quote content-type="editor-comment"><p>Reviewer #3:</p><p>The authors showed that when individuals learn about how they should decide in a situation where they can choose between a risky and a safe option, they might overcome maladaptive biases (e.g. exaggerated risk-aversion when risk-taking would be beneficial) by conforming with group behaviour. Strengths include rigorous and innovative computational modeling, a weakness might be that the set-up of the empirical study did not actually widely provoke behavioral phenomena at question, e.g. social learning (which is arguably at the core of the research question). Even though I am reviewing a revised manuscript I would hope the authors find a way to further improve clarity in the presentation of their research question and results.</p><p>My main concern was that even in the revised version I read, I found the paper not as accessible as I think it should be for a wide readership of a journal like eLife; and often times I found that things you could communicate in a straightforward way are put too complicated/expressed very verbosely. When reading the previous reviews after having read the revised paper, I also got the feeling that there were some misunderstandings. You clarified these specific points well in your responses, I think, but the lack in clarity might be even more drastic with an interdisciplinary readership this journal aims at as compared to the experts the journal has recruited now for this review? I will try to give some examples below.</p></disp-quote><p>We thank the reviewer very much for this valuable suggestion. We fully agreed that the previous manuscript was not very accessible to a wider audience we aim to reach. 
We have revised the manuscript substantially to improve its clarity, accessibility, and rigor. In particular, both the Introduction and the introductory paragraphs of the Results were rewritten so that they clearly articulate our research question and aims. In the following, we explain, point by point, how we have revised the manuscript in response to each of the reviewer’s concerns.</p><disp-quote content-type="editor-comment"><p>The introduction should, imho provide a general intro to the question of the paper and how you've arrived to ask that question, avoiding too much technical jargon. After having read the paper, I realized that the research question is pretty straight-forward (and interesting) and derived from 1/2 previous observations, but this didn't become clear on the very first read.</p><p>Just as an example, some of the first sentences are…</p><p>"One rationale behind this optimistic view might come from the assumption that individuals tend to prefer a behavioural option that provides larger net benefits in utility over those providing lower outcomes. Therefore, even though uncertainty makes individual decision-making fallible, statistical filtering through informational pooling may be able to reduce uncertainty by cancelling out such noise."</p><p>This is only 1 example (it is something I noted throughout the paper… and also came up in the previous round of reviews) where I think these 2 sentences require some if not a little more background in decision-making (net benefits, utility, uncertainty, noise, stat filtering, informational pooling) to be understandable.</p></disp-quote><p>We thank the reviewer for this valuable feedback. The sentence referred to here was in the first paragraph of the Introduction of the previous manuscript, which was indeed not very accessible to many interdisciplinary readers. Having restructured the Introduction, we believe that the aims and motivations behind this study have become clearer and self-explanatory. 
The first paragraph of the Introduction now reads as follows:</p><p>– Lines 29 – 46: “Collective intelligence, a self-organised improvement of decision making among socially interacting individuals, has been considered one of the key evolutionary advantages of group living (Camazine et al., 2001; Krause and Ruxton, 2002; Sumpter, 2006; Ward and Zahavi, 1973). Although what information each individual can access may be a subject of uncertainty, information transfer through the adaptive use of social cues filters such ‘noises’ out (Laland, 2004; Rendell et al., 2010), making individual behaviour on average more accurate (Hastie and Kameda, 2005; King and Cowlishaw, 2007; Simons, 2004). Evolutionary models (Boyd and Richerson, 1985; Kandler and Laland, 2013; Kendal et al., 2005) and empirical evidence (Toyokawa et al., 2014, 2019) have both shown that the benefit brought by the balanced use of both socially and individually acquired information is usually larger than the cost of possibly creating an alignment of suboptimal behaviour among individuals by herding (Bikhchandani et al., 1992; Giraldeau et al., 2002; Raafat et al., 2009). This prediction holds as long as individual trial-and-error learning leads to higher accuracy than merely random decision making (Efferson et al., 2008). Copying a common behaviour exhibited by many others is adaptive if the output of these individuals is expected to be better than uninformed decisions.”</p><disp-quote content-type="editor-comment"><p>Other terminology like e.g. collective illusion, opportunity costs, description-based vs. 
experienced-based risk-taking paradigms, frequency-based influences, would be nice to be either defined in the text or replaced by a more accessible description in the introduction.</p><p>I know it is sometimes hard to mentalize which terminology others not working on the same things might struggle with, but given that this is an interdisciplinary journal, might it perhaps make sense to ask a researcher friend who is not exactly working on this topic to give it a read?</p></disp-quote><p>We thank the reviewer very much for pointing out these reader-unfriendly technical terms. We have defined both “collective illusion” and “frequency-based” where they first appear, and have removed the other terms from the text. Please see the revised passages listed below. We have also asked several colleagues from different fields to read the manuscript, which we believe has made it much more accessible. The modified sentences are as follows:</p><p>– Lines 94 – 95: “a mismatch between the true environmental state and what individuals believed (’collective illusion’; Denrell and Le Mens, 2016).”</p><p>– Lines 65 – 71: “it may seem that social learning, especially the ’copy-the-majority’ behaviour (aka, ’conformist social learning’ or ’positive frequency-based copying’; Laland, 2004), whereby the most common behaviour in a group is disproportionately more likely to be copied (Boyd and Richerson, 1985), may often lead to maladaptive herding, because recursive social interactions amplify the common bias (i.e., a positive feedback loop; Denrell and Le Mens, 2007, 2016; Dussutour et al., 2005; Raafat et al., 2009).”</p><disp-quote content-type="editor-comment"><p>"such a risk-taking bias constrained by the fundamental nature of learning may function independently from the adaptive risk perception (Frey et al., 2017), potentially preventing adaptive risk taking."</p><p>Unclear without knowing or looking up the Frey paper – shorten ("to be too risk-averse might be maladaptive in some 
contexts"?) or explain.</p></disp-quote><p>We totally agreed that the sentence was unclear. Indeed, merely mentioning that risk aversion may be adaptive in some context (Real and Caraco, 1986; McNamara and Houston, 1992; Yoshimura and Clark, 1991), and that risk aversion may arise from different mechanisms (Frey et al. 2017), was not directly related to the focus of this paper. What we would like to highlight in the second paragraph of the Introduction was the omnipresent possibility of risk aversion arising from reinforcement learning. In the revised version, therefore, we concentrated on this focal point and deleted those irrelevant topics.</p><p>– Lines 47 – 63: “However, both humans and non-human animals suffer not only from environmental noise but also commonly from systematic biases in their decision making (e.g., Harding et al., 2004; Hertwig and Erev, 2009; Real, 1981; Real et al., 1982). Under such circumstances, simply aggregating individual inputs does not guarantee collective intelligence because a majority of the group may be biased towards suboptimization. A prominent example of such a potentially suboptimal bias is risk aversion that emerges through trial-and-error learning with adaptive information-sampling behaviour (Denrell, 2007; March, 1996). Because it is a robust consequence of decision making based on learning (Hertwig and Erev, 2009; Yechiam et al., 2006; Weber, 2006; March, 1996), risk aversion can be a major constraint of animal behaviour, especially when taking a high-risk high-return behavioural option is favourable in the long run. Therefore, the ostensible prerequisite of collective intelligence, that is, that individuals should be unbiased and more accurate than mere chance, may not always hold. 
A theory that incorporates dynamics of trial-and-error learning and the learnt risk aversion into social learning is needed to understand the conditions under which collective intelligence operates in risky decision making.”</p><disp-quote content-type="editor-comment"><p>Line 74-82 extremely long sentence – I think can easily be simplified? – "previous studies have neglected contexts where individuals learn about the environment both by own and others' experiences"</p></disp-quote><p>We agreed that the sentence was too long. Because this relates to the central motivation behind our choice of model and question, we elaborated it in two paragraphs as follows:</p><p>– Lines 83 – 99: “In this paper, we propose a parsimonious computational mechanism that accounts for the emerging improvement of decision accuracy among suboptimally risk-aversive individuals. In our agent-based model, we allow our hypothetical agents to compromise between individual trial-and-error learning and the frequency-based copying process, that is, a balanced reliance on social learning that has been repeatedly supported in previous empirical studies (e.g., Deffner et al., 2020; McElreath et al., 2005, 2008; Toyokawa et al., 2017, 2019). This is a natural extension of some previous models that assumed that individual decision making was regulated fully by others’ beliefs (Denrell and Le Mens, 2007, 2016). Under such extremely strong social influence, exaggeration of individual bias was always the case because information sampling was always directed towards the most popular alternative, often resulting in a mismatch between the true environmental state and what individuals believed (’collective illusion’; Denrell and Le Mens, 2016). 
By allowing a mixture of social and asocial learning processes within a single individual, the emergent collective behaviour is able to remain flexible (Aplin et al., 2017; Toyokawa et al., 2019), which may allow groups to escape from the suboptimal behavioural state.”</p><p>– Lines 100 – 108: “We focused on a repeated decision-making situation where individuals updated their beliefs about the value of behavioural alternatives through their own action–reward experiences (experience-based task). Experience-based decision making is widespread in animals that learn in a range of contexts (Hertwig and Erev, 2009). The time-depth interaction between belief updating and decision making may create a non-linear relationship between social learning and individual behavioural biases (Biro et al., 2016), which we hypothesised is key in improving decision accuracy in self-organised collective systems (Camazine et al., 2001; Sumpter, 2006).”</p><disp-quote content-type="editor-comment"><p>Line 84-89 is really long, too.</p><p>I would have been interested to learn more about the online experiments in the intro.</p></disp-quote><p>We thank the reviewer very much for pointing this out. We totally agreed. In the revised version, we gave a more accessible roadmap of the paper at the end of the Introduction, rather than putting such a long-sentence summary. In this roadmap section, we have described more about the experiment and highlight the relationship between the theoretical models and the experiment. The revised paragraph of the Introduction is as follows:</p><p>– Lines 109 – 132: “In the study reported here, we firstly examined whether a simple form of conformist social influence can improve collective decision performance in a simple multi-armed bandit task using an agent-based model simulation. We found that promotion of favourable risk taking can indeed emerge across different assumptions and parameter spaces, including individual heterogeneity within a group. 
This phenomenon occurs thanks, apparently, to the non-linear effect of social interactions, namely, <italic>collective behavioural rescue</italic>. To disentangle the core dynamics behind this ostensibly self-organised process, we then analysed a differential equation model representing approximate population dynamics. Combining these two theoretical approaches, we identified that it is a combination of positive and negative feedback loops that underlies collective behavioural rescue, and that the key mechanism is a promotion of information sampling by modest conformist social influence.</p><p>Finally, to investigate whether the assumptions and predictions of the model hold in reality, we conducted a series of online behavioural experiments with human participants. The experimental task was basically a replication of the task used in the agent-based model described above, although the parameters of the bandit tasks were modified to explore wider task spaces beyond the simplest two-armed task. Experimental results show that the human collective behavioural pattern was consistent with the theoretical prediction, and model selection and parameter estimation suggest that our model assumptions fit well with our experimental data. In sum, we provide a general account of the robustness of collective intelligence even under systematic risk aversion and highlight a previously overlooked benefit of conformist social influence.”</p><disp-quote content-type="editor-comment"><p>I do understand the reasoning of copy and pasting the agent Based Model description after having read the previous reviews, but it confused me when reading the article first; I think it needs to be integrated with Intro/Results section better (the first paragraph reads like intro/background still, but then it is about the authors' current study). I fear that readers don't know where they are in the paper at that point. 
(I much prefer journals with the Methods-Results order rather than the one eLife uses, as this would naturally circumvent this problem, but that's the challenge here, I reckon.)</p></disp-quote><p>We fully agree that the conceptual description of the method should be integrated into the Introduction and the Results. In the revised manuscript, we have elaborated the concept of the model using language that is as accessible as possible. In particular, the new subsections “The decision-making task” (page 5), “The baseline model” (page 7), “The conformist social influence model” (page 8), as well as “The simplified population dynamics model” (page 16) have been substantially revised to give conceptual verbal descriptions of the assumptions and formulation before showing detailed results.</p><disp-quote content-type="editor-comment"><p>Results</p><p>Subheadings: Make clearer which is the section that describes simulations and which is the empirical section.</p></disp-quote><p>Thank you for this valuable suggestion. We have now fully separated the subheadings between the theoretical part and the experimental results, so that the empirical results are shown only in the subsection “An experimental demonstration” (page 21).</p><disp-quote content-type="editor-comment"><p>Can you maybe start with reiterating in a structured way which parameters you set to which values and why in the simulation before describing the effects of it, how many trials you simulate (you start speaking of elongated time horizons but do not mention the original horizon length other than in the figure legend?) etc?</p></disp-quote><p>We have added the parameters used in the simulations before describing the results. 
The revisions we made are as follows:</p><p>– Lines 146 – 147: “Unless otherwise stated, the total number of decision-making trials (time horizon) was set to T = 150 in the main simulations described below.”</p><p>– Line 301 – 307: “Individual values of a focal behavioural parameter were varied across individuals in a group. Other non-focal parameters were identical across individuals within a group. The basic parameter values assigned to non-focal parameters were α = 0.5, β = 7, σ = 0.3, and θ = 2, which were chosen so that the homogeneous group could generate the collective rescue effect. The groups’ mean values of the various focal parameters were matched to these basic values.”</p><disp-quote content-type="editor-comment"><p>I know the Najar work and I think it is cool that you can generalize your results also to a value-based framework, but I do not think readers that do not know the Najar study will be able to follow this at it is described now, which makes it more confusing than interesting. So, either elaborate what this means (accessible to non-initiated readers) or ban to the Supplement (would be a shame).</p><p>Would the value-based model fit the empirical data better?</p></disp-quote><p>Thank you so much for this valuable comment. As we described in our response to the editor’s point (4), we have conducted the Bayesian model comparison and have established that the decision-biasing model was likely to fit better than both the value-shaping model and the baseline reinforcement learning model. 
We have verbally described the value shaping model in the following paragraph, while the full details are shown in the supplementary method.</p><p>– Lines 277 – 286: “Further, the conclusion still held for an alternative model in which social influences modified the belief-updating process (the value-shaping model; Najar et al., 2020) rather than directly influencing the choice probability (the decision-biasing model) as assumed in the main text thus far (see Supplementary Methods; Supplementary Figure 8). One could derive many other more complex social learning processes that may operate in reality; however, the comprehensive search of possible model space is beyond the current interest. Yet, decision biasing was found to fit better than value shaping with our behavioural experimental data (Supplementary Figure 18), leading us to focus our analysis on the decision-biasing model.”</p><disp-quote content-type="editor-comment"><p>When reading the first part of the Results section I constantly wondered: How did the group behave / How was the behaviour of the group determined in the simulation? Was variability considered? (this is something that's been manipulated in some empirical studies building on descriptive risk scenarios, e.g. Suzuki et al). It becomes clear when reading on/looking at Figure 3, but it is such a crucial point that it needs to be made clear from the beginning.</p></disp-quote><p>This is a great suggestion and we agreed with the importance to consider the variability of individuals. 
To make this point clear in the Introduction, we have included this in the “roadmap” paragraph as follows:</p><p>– Lines 111 – 113: “We found that promotion of favourable risk taking can indeed emerge across different assumptions and parameter spaces, including individual heterogeneity within a group.”</p><disp-quote content-type="editor-comment"><p>I think it is a major limitation that in the empirical study actual social learning was extremely limited, given that the paper claims to provide a formal account of the function of social learning in this situation?…. I would have thought that indeed trying to provoke more use of social influence by altering the experimental setup in a way the authors propose in their discussion would have been important, and given that this can be done online, also a feasible option.</p></disp-quote><p>We thank the reviewer for pointing out the limitation of the current study. We totally agree that increasing the copying weight by experimental manipulations will indeed be an important future direction. As we discussed in the main text (lines 649 – 676), one such promising manipulation would be to use a ‘restless’ bandit task, which is theoretically expected to induce both a higher learning rate and a higher copying weight. Nevertheless, we believe that a direct link between the simplest form of theory (that is, a static bandit task) and experimental findings was a necessary first step toward developing further theoretical hypotheses in more complex settings. Therefore, for the current purpose, we leave the restless bandit task as future work.</p><disp-quote content-type="editor-comment"><p>Also, empirically, susceptibility to the hot stove effect, i.e., <italic>α</italic><sub><italic>i</italic></sub>(<italic>β</italic><sub><italic>i</italic></sub> + 1), seems to be very low – zero for most of the participants in some scenarios according to Figure 6A,B,C? 
– isn't this concerning, given that this is at the core of what the authors want to explain?</p></disp-quote><p>We thank the reviewer for raising this point. This is a wonderful question. The average value of the susceptibility to the hot stove effect across the conditions was about 0.2 ~ 0.6, but not zero (which was not visually obvious due to the scale of the x-axis, but can be derived from the fit parameter values shown in Table 2). Such a low level of the susceptibility to the hot stove effect was problematic in the 1-risky-1-safe task because asocial individuals were not likely to suffer from the hot stove effect (see Supplementary Figure 11a). Therefore, we conducted the other two positive RP 4-armed tasks where risk aversion was expected to emerge even if α (β +1) was as small as such values (Supplementary Figure 11b, c). However, as we have discussed in the main text, it is indeed an interesting future direction to manipulate both task and environment to elicit higher learning rate as well as heavier reliance on social learning.</p><disp-quote content-type="editor-comment"><p>"those with a higher value of the susceptibility to the hot stove effect (<italic>α</italic><sub><italic>i</italic></sub>(<italic>β</italic><sub><italic>i</italic></sub> + 1)) were less likely to choose the risky alternative, whereas those who had a smaller value of <italic>α</italic><sub><italic>i</italic></sub>(<italic>β</italic><sub><italic>i</italic></sub> + 1) had a higher chance of choosing the safe alternative (Figure 6a-c), " -- I'm confused- should it read "those who had a smaller value of <italic>α</italic><sub><italic>i</italic></sub>(<italic>β</italic><sub><italic>i</italic></sub> + 1) had a higher chance of choosing the ‘risky’ alternative"?</p><p>Could you please quantify this correlation in terms of an effect size? The association is not linear, from the Figure? 
Please specify.</p><p>Is the association driven by α or by β or really by the product?</p></disp-quote><p>The reviewer is correct, the original sentence suggested the effect in the wrong, opposite direction. To quantify the effect size, we have conducted an additional GLMM, and the results are now read:</p><p>– Lines 548 – 560: “To quantify the effect size of the relationship between the proportion of risk taking and each subject’s best fit learning parameters, we analysed a generalised linear mixed model (GLMM) fitted with the experimental data (see Methods; Table 3). Within the group condition, the GLMM analysis showed a positive effect of on risk taking for every task condition (Table 3), which supports the simulated pattern. Also consistent with the simulations, in the positive RP tasks, subjects exhibited risk aversion more strongly when they had a higher value of <inline-formula><mml:math id="sa2m12"><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:msub><mml:mi>α</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo stretchy="false">(</mml:mo><mml:msub><mml:mi>β</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:mn>1</mml:mn><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mstyle></mml:math></inline-formula> (Supplementary Figure 17a–c). There was no such clear trend in data from the negative RP task, although we cannot make a strong inference because of the large width of the Bayesian credible interval (Supplementary Figure 17d). In the negative RP task, subjects were biased more towards the (favourable) safe option than subjects in the positive RP tasks (i.e., the intercept of the GLMM was lower in the negative RP task than in the others).”</p><p>The product, <italic>α</italic><sub><italic>i</italic></sub>(<italic>β</italic><sub><italic>i</italic></sub> + 1), has been derived from the theoretical development of Denrell (2007) and has been used in our theoretical analysis too (Figure 2). 
Of course, the learning rate (α) and the inverse temperature (β) are different free parameters that play different functional roles in the learning algorithm. However, in the context of the hot stove effect, they play a correlated role, which allows us to compress a dimension in the analysis. Thanks to this, we can understand both theory (Figure 2) and empirical results (Figure 6 and Supplementary Figure 17) in a simple way. Therefore, for the sake of brevity, we believe that treating them as a product form <italic>α</italic><sub><italic>i</italic></sub>(<italic>β</italic><sub><italic>i</italic></sub> + 1) is more straightforward for the current purpose than treating them separately.</p><disp-quote content-type="editor-comment"><p>"The behaviour in the group condition supports our theoretical predictions. In the PRP tasks, the proportion of choosing the favourable risky option increased with social influence (i) particularly for individuals who had a high susceptibility to the hot stove effect. On the other hand, social influence had little benefit for those who had a low susceptibility to the hot stove effect (e.g., <italic>α</italic><sub><italic>i</italic></sub>(<italic>β</italic><sub><italic>i</italic></sub> + 1) ≤ 0.5)." Can you quantify this with statistically (effect sizes etc)?</p></disp-quote><p>We thank the reviewer very much for suggesting that we formally quantify the effect sizes. As we described above, we have conducted a GLMM analysis and confirmed that the pattern that emerged in the experiment matched well with the prediction of the calibrated computational model. We believe that our findings have been made more convincing by this additional analysis.</p><disp-quote content-type="editor-comment"><p>Did you do any form of model selection on the empirical data (with different set ups of your models, the reduced models (e.g. 
without σ), or the Najar type of model) to demonstrate that it is really your theoretically proposed model that fits the data best (e.g. Bayesian Model Selection)? Please include in the main manuscript. See Palminteri, TiCS on why this might be important.</p><p>I think Figure 6 is really overloaded. The legend somehow looks as it would belong only to panel c? The coloured plots are individual data points as a function of group size (which is what?) or copying weight? For me, in the current formatting, dots were too small to detect a continuous colour coding scheme (only yellow vs purple). The solid lines are simulated data? Can you show a regression line for the empirical data to allow for comparisons? Does it not differ substantially from the model predictions? I suggest making different plots for different purposes (compare predicted behaviour to empirical behaviour, show effect of copying weight, show effect of group size etc, show simulation were you plug in σ>0.4)</p></disp-quote><p>Thank you so much also for giving us this terrific suggestion. We have included both the model recovery test and model comparison based on Bayesian model selection in the main text (Supplementary Figure 18 [Figure 6 —figure supplement 2]) on page 54. 
Please see our response to the editor’s comment (4) for more details.</p><disp-quote content-type="editor-comment"><p>"In keeping with this, if we extrapolated the larger value of the copying of weight (i.e., <inline-formula><mml:math id="sa2m13"><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:mrow><mml:mover><mml:msub><mml:mi>σ</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo stretchy="false">¯</mml:mo></mml:mover></mml:mrow></mml:mrow></mml:mstyle></mml:math></inline-formula>>0.4) into the best fitting social learning model with the other parameters calibrated, a strong collective rescue became prominent " – sorry, where does the value <inline-formula><mml:math id="sa2m14"><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:mrow><mml:mover><mml:mi>σ</mml:mi><mml:mo stretchy="false">¯</mml:mo></mml:mover></mml:mrow></mml:mrow></mml:mstyle></mml:math></inline-formula>>0.4 exactly come from for this analysis? Please give more detail/contextualise better.</p></disp-quote><p>This was indeed helpful feedback. As we described in our response to the editor’s comment (5) and (6), we have separated the computational model prediction and the data with a regression line into two figures. We have also included the varying σ in a gradual manner, varying across the range of individual fit σ for each task, rather than just showing an arbitrary high value (that was set to σ>0.4 in the previous manuscript). 
We believe that the current presentation of the empirical result allows readers to easily differentiate what was the data themselves and what was the model prediction.</p><disp-quote content-type="editor-comment"><p>Please try to be consistent with terminology <italic>α</italic><sub><italic>i</italic></sub>(<italic>β</italic><sub><italic>i</italic></sub>+ 1) is sometimes called 'susceptibility ' or 'susceptibility value', which might be confusing, given that in some published articles susceptibility refers to susceptibility to social influence which would be another parameter…. I suggest to go through the manuscript once more and strictly only use one term for each parameter (the one you introduce in the table).</p></disp-quote><p>Thank you so much again for your thorough review and insightful comments. We have gone through the text again and made all the terms consistent.</p></body></sub-article></article>