
Title:
Adaptive purchase tasks in the operant demand framework.
Authors:
Gilroy SP; Department of Psychology, Louisiana State University., Rzeszutek MJ; Department of Family and Community Medicine, University of Kentucky., Koffarnus MN; Department of Family and Community Medicine, University of Kentucky., Reed DD; Institutes for Behavior Resources, Inc., Hursh SR; Institutes for Behavior Resources, Inc.
Source:
Experimental and clinical psychopharmacology [Exp Clin Psychopharmacol] 2025 Apr; Vol. 33 (2), pp. 199-208. Date of Electronic Publication: 2025 Feb 24.
Publication Type:
Journal Article
Language:
English
Journal Info:
Publisher: American Psychological Association Country of Publication: United States NLM ID: 9419066 Publication Model: Print-Electronic Cited Medium: Internet ISSN: 1936-2293 (Electronic) Linking ISSN: 10641297 NLM ISO Abbreviation: Exp Clin Psychopharmacol Subsets: MEDLINE
Imprint Name(s):
Original Publication: Washington, DC : American Psychological Association, c1993-
Entry Date(s):
Date Created: 20250224 Date Completed: 20250325 Latest Revision: 20250814
Update Code:
20260130
DOI:
10.1037/pha0000757
PMID:
39992756
Database:
MEDLINE


Various avenues exist for quantifying the effects of reinforcers on behavior. Numerous nonlinear models derived from the framework of Hursh and Silberberg (2008) are often applied to elucidate key metrics in the operant demand framework (e.g., Q0, PMAX), with each approach presenting respective strengths and trade-offs. This work introduces and demonstrates an adaptive task capable of elucidating key features of operant demand without relying on nonlinear regression (i.e., a targeted form of empirical PMAX). An adaptive algorithm based on reinforcement learning is used to systematically guide questioning in the search for participant-level estimates related to peak work (e.g., PMAX), and this algorithm was evaluated across four iteration lengths (i.e., five, 10, 15, and 20 sequentially updated questions). Equivalence testing with simulated agent responses revealed that tasks with five or more sequentially updated questions recovered PMAX values statistically equivalent to seeded PMAX values, suggesting that quantitative modeling (i.e., nonlinear regression) may not be necessary to reveal valuable features of reinforcer consumption and how consumption scales as a function of price. Discussions are presented regarding extensions of contemporary hypothetical purchase tasks and strategies for extracting and comparing critical aspects of consumer demand. (PsycInfo Database Record (c) 2025 APA, all rights reserved)


Adaptive Purchase Tasks in the Operant Demand Framework

<cn> <bold>By: Shawn P. Gilroy</bold>
> Department of Psychology, Louisiana State University
> <bold>Mark J. Rzeszutek</bold>
> Department of Family and Community Medicine, University of Kentucky
> <bold>Mikhail N. Koffarnus</bold>
> Department of Family and Community Medicine, University of Kentucky
> <bold>Derek D. Reed</bold>
> Institutes for Behavior Resources, Inc., Baltimore, Maryland, United States
> <bold>Steven R. Hursh</bold>
> Institutes for Behavior Resources, Inc., Baltimore, Maryland, United States
> Department of Psychiatry and Behavioral Science, Johns Hopkins University School of Medicine </cn>

<bold>Acknowledgement: </bold>Neo Gebru served as action editor. The authors have no known conflicts of interest to disclose. This work was developed as part of a collaboration on novel methods for evaluating reinforcer effects. This work was not grant funded. The authors do not have any significant interests to disclose related to this work. Shawn P. Gilroy played a lead role in conceptualization, formal analysis, methodology, visualization, and writing–original draft and an equal role in writing–review and editing. Mark J. Rzeszutek played a supporting role in methodology and writing–original draft and an equal role in writing–review and editing. Mikhail N. Koffarnus played a supporting role in methodology and writing–original draft and an equal role in writing–review and editing. Derek D. Reed played a supporting role in methodology and writing–original draft and an equal role in writing–review and editing. Steven R. Hursh played a supporting role in methodology, writing–original draft, and writing–review and editing.

The operant demand framework is a mature and active collection of methods designed to evaluate how the effects of reinforcers on behavior scale as a function of various ecological factors, such as “cost” and the availability of alternatives (Hursh, 1980, 1984). Hursh and Silberberg (2008) emphasized the need to expand upon earlier efforts to quantify reinforcer effects (e.g., breakpoints, response rates) and move toward a reinforcer-based framework sensitive to the continuous nature of reinforcer effects (e.g., varying effects across potential schedules) as well as the reinforcer–reinforcer relations (e.g., complements, substitutes) that inevitably influence choice behavior (Hursh & Bauman, 1987; Madden et al., 2007).

The most well-represented approach to quantifying reinforcer consumption using economic principles has been the exponential model of operant demand proposed by Hursh and Silberberg (2008). This model (Equation 1) characterizes the effects of price (P) on levels of reinforcer consumption (Q). The model features three parameters, and inferences drawn from it emphasize two key measures related to consumption: demand intensity (Q0) and the scaling of reinforcer effects as a function of price (i.e., α and PMAX).<anchor name="b-fn1"></anchor><sups>1</sups><anchor name="eqn1"></anchor>

log10 Q = log10 Q0 + k(e^(−α·Q0·P) − 1) (Equation 1)

Demand intensity reflects consumption when scaling due to P is zeroed out (i.e., Q0 = Q at P = 0) and therefore reflects a dimension of reinforcer consumption independent of price. Consumption scaled as a function of price can be estimated directly via α and indirectly via PMAX. The α parameter reflects the rate of change in price elasticity (η) observed across prices, given the units implied by the intercept (Q0) and the span of the demand curve (k), and it determines the distance, in terms of P, at which the η of demand shifts from inelastic to elastic (i.e., PMAX). That is, α may be viewed as the rate by which the inelastic demand for reinforcers observed at lower prices advances toward elastic demand at increased prices. The price associated with unit elasticity (i.e., η = −1) and peak work, PMAX, can be computed exactly from parameters α, Q0, and k (Gilroy et al., 2019). Related to PMAX, the quantity OMAX refers to the peak level of expenditure, observed at PMAX. These metrics have each been found to capture distinct and meaningful dimensions of reinforcer consumption across various types of reinforcers (Aston et al., 2017; Bidwell et al., 2012; Mackillop et al., 2009).
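The model and the exact PMAX solution referenced above can be sketched numerically. The following is a minimal illustration (not the authors' code; the parameter values are arbitrary) of the exponential model and the Lambert W solution for unit elasticity reported by Gilroy et al. (2019):

```python
import numpy as np
from scipy.special import lambertw

def demand(p, q0, alpha, k):
    """Exponential demand: log10 Q = log10 Q0 + k(e^(-alpha*Q0*P) - 1)."""
    return 10 ** (np.log10(q0) + k * (np.exp(-alpha * q0 * p) - 1))

def pmax_exact(q0, alpha, k):
    """Exact price of unit elasticity (PMAX) via the Lambert W function."""
    return -lambertw(-1 / np.log(10 ** k), 0).real / (alpha * q0)

q0, alpha, k = 10.0, 0.003, 2.0              # arbitrary illustrative parameters
pmax = pmax_exact(q0, alpha, k)              # price at which elasticity = -1
omax = pmax * demand(pmax, q0, alpha, k)     # peak expenditure (OMAX) at PMAX
```

For these values, PMAX falls near 9.7; a numerical estimate of elasticity at that price is approximately −1, consistent with unit elasticity.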

<h31 id="pha-33-2-199-d241e242">Critical Elements in the Operant Demand Framework</h31>

Individual patterns of consumption have been linked to various ecological and contextual factors (Strickland et al., 2022). Researchers applying research synthesis to characterize various forms of substance use (e.g., cigarettes, alcohol) have found good support for a two-factor latent solution consisting of Amplitude and Persistence (Mackillop et al., 2009). Using Mackillop et al. (2009) as a representative example, these authors found that principal component analysis of demand curve metrics revealed one factor to be associated with volumetric consumption of alcohol (amplitude; e.g., the intensity of demand) and another with the degree to which individuals took steps to defend that consumption (persistence; i.e., [in]sensitivity to price, PMAX). These findings provided good evidence that substance use patterns are complex and heterogeneous and teams have since replicated this factor structure across various families of drug reinforcers, including cigarettes (Bidwell et al., 2012), marijuana (Aston et al., 2017), heroin, and cocaine (Schwartz et al., 2023). These findings suggest that the various demand curve metrics each provide information useful for evaluating reinforcer effects and that this utility extends to various families of reinforcers. This general framework has been useful for facilitating various forms of drug reinforcer research, such as drawing comparisons between reinforcers to compare the potential of each for use and abuse (i.e., balancing therapeutic effects with potential for problematic consumption levels) as well as conducting explorations of a single drug reinforcer within a range of varying units/dosages.

Hursh and Silberberg (2008) introduced the concept of essential value (EV) as a strategy for supporting comparisons across drug reinforcers and across differing units (e.g., dosages). Specifically, the goal of the EV strategy was to isolate variance associated with varying units (e.g., units of reinforcer delivery) and to promote a more universal interpretation of how a given reinforcer affects behavior across schedules. The expression presented by Hursh and Silberberg (2008) was later expanded and simplified in Gilroy (2023); see Equation 2.<anchor name="eqn2"></anchor> [Equation 2: the two essential value expressions of Gilroy (2023), with and without normalization for reinforcer units; image not reproduced]

The two expressions in Equation 2 reflect EV with (lower) and without (upper) a normalization accounting for reinforcer unit differences (i.e., consumption varying due to dosage differences). The upper expression reflects the more typical use case when patterns of consumption within or across participants are examined and share a common reinforcer unit (e.g., no. of cigarettes with equal nicotine content). In contrast, the lower expression is useful in the less common case wherein patterns of consumption are simultaneously analyzed across differing reinforcer units (e.g., high-nicotine vs. low-nicotine cigarette consumption). The EV metric highlights the importance of each of the metrics discussed thus far in characterizing and comparing reinforcer scaling (i.e., Q0 and PMAX).

The metrics Q0 and PMAX provide straightforward and representative reflections of Amplitude and Persistence. Although the estimated α parameter can also be used to characterize persistence (i.e., a continuous rate of change in elasticity [η]), parameters such as α are difficult to interpret outside of the specific contexts in which they are estimated. For example, parameter α is straightforward to estimate and compare in contexts where a common scale (k) exists but difficult to compare outside of these circumstances. In contrast, PMAX reflects the scaling of reinforcers in a way that is (a) generally robust to variance in spans (ks) and (b) easily interpreted via visual inspection of a work output curve, see Figure 1.<anchor name="b-fn2"></anchor><sups>2</sups> Regardless, the two are highly related and calculations describing this relationship have been established (see Appendix). Likewise, Q0 is also straightforward and can be easily estimated or directly sampled when the effects of pricing are absent, such as by directly observing or querying individual consumption at a P = 0 (see Amlung et al., 2015, for a relevant example of such).
>
><anchor name="fig1"></anchor>[Figure 1: Demand curve with the point of unit elasticity (PMAX) and the corresponding work output curve with peak work (OMAX)]

<h31 id="pha-33-2-199-d241e413">Elucidating Key Metrics in Operant Demand</h31>

The most prevalent means of reporting demand intensity and the scaling of reinforcer effects take the form of fitted model parameters (i.e., Q0, α). The high level of adoption for the exponential model presented by Hursh and Silberberg (2008) makes good sense given the broad applicability afforded by a small number of parameters. For example, in the absence of very small fractional consumption values (e.g., 0.001) or nonconsumption (i.e., 0), the original model fares quite well with consumption values that range across multiple orders of magnitude. Furthermore, metrics not revealed directly from modeling (e.g., PMAX) are calculated with ease via exact solutions (Gilroy et al., 2019). Although derivatives of this framework have been put forward to accommodate cases where nonconsumption values are observed (Gilroy et al., 2021; Koffarnus et al., 2015), each implementation represents a departure from the original manner of regression and presents respective trade-offs (Gilroy, 2022).

Empirical alternatives that do not require parameter estimation (i.e., model-free approaches) exist for Q0, OMAX, and PMAX. Demand intensity and responding representative of Q0 can be captured simply by querying and assessing consumption in the absence of any cost (i.e., QFREE; e.g., “How much of ____ would you consume if free?”). Additionally, values of OMAX and PMAX can be inferred visually via inspection of an expenditure curve (e.g., Greenwald & Hursh, 2006). Specifically, the empirical OMAX can be extracted from the peak level of expenditure (i.e., the amount of resources spent), and the empirical PMAX is the price at which that peak occurs.
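The empirical extraction described above reduces to locating the peak of the expenditure curve. A minimal sketch, using illustrative prices and consumption values (consumption at a price of zero, QFREE, would be queried separately):

```python
import numpy as np

# Illustrative responses to a fixed price assay (prices in USD, consumption in units):
prices = np.array([0.5, 1, 2, 5, 10, 20, 50, 100])
consumption = np.array([20, 19, 18, 15, 10, 4, 1, 0])

expenditure = prices * consumption           # resources allocated at each price
i = int(np.argmax(expenditure))
empirical_omax = expenditure[i]              # peak level of expenditure
empirical_pmax = prices[i]                   # price associated with that peak
```

With these illustrative values, peak expenditure (100) occurs at a price of 10, and the true peak could lie anywhere inside the wide increment between the sampled prices of 5 and 20, which is exactly the precision problem noted below.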

Although methods using empirical data are easily performed, these simpler approaches present various limitations. First, researchers typically do not have a priori information regarding which prices are likely meaningful to prospective participants and the operant demand framework. Said a bit more directly, price assays for hypothetical purchase tasks are weakly informed and cover a substantial range (e.g., 0–10,000 USD per unit), which typically results in an oversampling of responding at higher/lower prices and undersampling of responding in the region of the curve associated with the greatest change (i.e., the bend of the curve). This arrangement weakens the usefulness and precision of individual data points as empirical representations of key demand metrics because these values (e.g., PMAX) must be contained within increasingly large pricing increments, such as between 10 and 100 USD (see Gilroy et al., 2019, for representative simulations). These issues regarding precision are due to the fixed nature of typical pricing tasks, which are not designed to explore the questions and prices relevant to individual participants, and this feature presently limits the utility of empirical metrics in the framework (i.e., empirical PMAX and demand intensity).

<h31 id="pha-33-2-199-d241e489">Adaptive Algorithms in Operant Behavioral Economics</h31>

Adaptive assessments change in response to information provided by individual respondents. That is, an algorithm is put into place to present the most statistically informative questions for each respondent. Although not yet available in work applying operant demand methods, adaptive tasks have been available for delay discounting research for some time. These include tasks that adapt to participant responses to identify a representative data point, such as 50% decay or ED50 (J. H. Yoon & Higgins, 2008), and tasks that identify some parameter based on an assumed functional form of the process underlying the data (e.g., a decay rate matched to 50% decay).

<bold>Adaptive Tasks in Delay Discounting Research</bold>

The task presented by Du et al. (2002) features an algorithm to iteratively elucidate a boundary wherein the value of each prospect (e.g., smaller sooner, larger later) is essentially equivalent (i.e., neither substantially better than one another) by halving the difference of the commodity up or down from the previous participant response (i.e., the adaptive titration task). This is also known as an adjusting amount task, which adaptively adjusts the amount of the commodity to find a specific point (i.e., in the case of discounting, an indifference point). The goal of this process is to reveal a derived ordinate among a set of fixed delays, which yields a curve that may be characterized via statistical analysis. For the Du et al. (2002) task, the algorithm was driven by predefined iterative limits (i.e., a set number of adaptive choices for each fixed delay); however, other algorithms in this space included constraints more determined by participant responses (e.g., varying delays).
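The halving logic of the adjusting amount procedure can be sketched as follows. This is an illustrative reduction, not the Du et al. (2002) task itself; `prefers_immediate` stands in for a participant's choice on each trial:

```python
def adjusting_amount(larger_later, prefers_immediate, n_trials=6):
    """Titrate the immediate amount toward indifference with a delayed amount.

    prefers_immediate(immediate) -> True if the immediate option is chosen.
    The adjustment step halves after each choice, as in adjusting-amount
    (titration) procedures.
    """
    immediate = larger_later / 2          # conventional starting point
    step = larger_later / 4
    for _ in range(n_trials):
        if prefers_immediate(immediate):
            immediate -= step             # chose immediate: lower its value
        else:
            immediate += step             # chose delayed: raise immediate value
        step /= 2
    return immediate                      # approximate indifference point

# A simulated chooser indifferent at 30 of 100 delayed units:
point = adjusting_amount(100, lambda imm: imm > 30)
```

With the simulated chooser above, the returned value converges to within about one unit of the 30-unit indifference point after six trials.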

Johnson and Bickel (2002) used an algorithm that also adjusted the amount of the commodity based on participant responses; however, the amount determined was based on moving upper and lower limits until the difference between the upper and lower limits was 2% or less of the larger magnitude (i.e., increasing precision related to preference reversals). It warrants noting that both these tasks rely on a set of fixed delays, and only the values within those delay points are assessed adaptively (i.e., adapting amounts for the immediate options, not the delays). Furthermore, data generated from these tasks still required nonlinear model fitting, so relevant metrics needed to be derived from the data before statistical comparisons could be performed (Kaplan et al., 2021).

Other adaptive tasks exist and free the analyst from the need to perform model fitting by providing a parameter that references a presumed data-generating process. One example is the five-trial adjusting delay task (Koffarnus & Bickel, 2014), in which a larger later option is presented alongside an immediate option half the size of the larger. Each choice changes the delay to a larger later option based on the participant’s previous response throughout five questions. The last response is then used to identify the ED50, which refers to the point at which the larger, later option is subjectively equal to the smaller, sooner option and is reported as a fitted parameter, presuming a hyperbolic form (Mazur, 1987). Another example is the three-option adaptive discount rate measure (H. Yoon & Chapman, 2016), which has a similar logic to the Johnson and Bickel (2002) adjusting amount task but uses three choices to shift upper and lower limits over 10 choices to identify a discount rate based on a hyperbolic function. Although each of these adaptive tasks avoids the need for nonlinear model fitting, this is made possible by presuming some functional form for analytic purposes. Because of this, these tasks cannot be used for comparing different functional forms of the data, as they presuppose them to identify a discount rate rather than generate data to be modeled.

<bold>Translating Adaptive Tasks to Demand Research</bold>

Adaptive tasks and procedures in discounting have been useful for understanding discounting processes for various commodities and in varying contexts. Although these are actively used and developed in discounting research, several aspects of the operant demand framework and key metrics of interest present a challenge to implementing adaptive tasks (i.e., Q0, PMAX). First, levels of consumption are continuous and do not have a ceiling or absolute upper limit as is common for the discounting paradigm. Second, discounting methods typically involve pairs of choices (i.e., sooner vs. later) but demand methods may include one, two, or even more competing options for goods/services, which adds complexity to the task (i.e., continuous vs. dichotomous responding). Third, metrics extracted from demand curves emphasize relative changes in consumption and price (e.g., PMAX), and this is less straightforward than optimizing toward some assumed or fixed quantity, for example, ED50.

Accessing the benefits of adaptive tasks for operant demand purchase tasks (e.g., improved precision with fewer questions) requires a novel approach. Specifically, such tasks would need to accommodate the continuous nature of hypothetical consumption data and to be sensitive to relative changes in consumption across prices. Additionally, simplifying the approach would require adopting a framework for quantifying choice behavior without unnecessarily making assumptions regarding the underlying data-generating process. At present, there are no published approaches in studies of delay discounting and operant demand that are suited to accomplishing these aims, and this calls for further exploration of alternative approaches (e.g., artificial intelligence).

<bold>Machine Learning and Dynamic Adjustment Algorithms</bold>

The term machine learning (ML) refers to a family of methods designed to support drawing generalizable inferences from data (Blum et al., 2020). These tools are applied broadly, toward many practical and theoretical issues, and a complete review of these methods is beyond the scope of this work. Rather, the focus of this work is instead on how ML can be used to supplement contemporary methods in purchase tasks commonly used in operant demand research. The central goal of this work was to present and review a process for developing an agent that responds to the input of a participant and interactively guides the presentation of pricing questions (i.e., toward those associated with greater resource allocation/responding).

Reinforcement learning (RL) can be viewed as a derivative of ML methods; however, RL is distinct from both supervised and unsupervised forms of ML. For instance, ML is traditionally applied to either extract structure from data or perform classification, whereas RL is often used to guide the making of “sequential optimal decisions under uncertainty” via a Markov decision process (Rao & Jelvis, 2023). These tools are frequently used to model decision-making processes (e.g., an adversary for computer games), using agents designed to suggest actions (K) given prior/available information and environmental state. That is, the choice to demonstrate a given action (k) is conditioned on a history of prior reinforcement and present conditions. The reinforcement element of this approach refers to a reweighting of respective actions (K) given the history and likely future of rewards. Various algorithms that differentially weight actions differ in how each balances the need for exploring available actions and for exploiting prior experience.

Actions available to agents are favored or made more likely based on the concept of regret. Regret refers to the evaluation of observed rewards associated with actions (i.e., more regret = missing out on potential reinforcers). The selection of actions by the agent is driven by levels of regret, whereby the most likely action to pursue is the one that maximizes the probability and quantity of reward and minimizes regret. This process can be adjusted to balance the need for exploring unexplored actions in the pursuit of optimal reward. The choice of algorithms incorporated in RL approaches is guided based on various assumptions for the decision-making process and the types of data being optimized (Rao & Jelvis, 2023). For example, there are often distributional assumptions regarding the probability or magnitude of reward for specific actions (e.g., Binomial for yes [1] or no [0] responses). Additionally, the relative superiority of an action may not be stationary, and different actions may have superior outcomes at different points in time. Algorithms for dynamically predicting optimal actions vary considerably and are carefully selected or designed depending on the specific nature of the task, context, and manner of reward.
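The notion of regret can be made concrete with a toy example (entirely illustrative values; no specific RL algorithm is implied):

```python
import numpy as np

# Hypothetical estimated rewards for three candidate actions:
estimated_reward = np.array([0.2, 0.9, 0.5])

# The agent favors the action maximizing expected reward, which is
# equivalently the action minimizing regret:
best = int(np.argmax(estimated_reward))

# Regret of each action: reward foregone relative to the best available action.
regret = estimated_reward[best] - estimated_reward
```

Here the second action carries zero regret, and exploration policies differ chiefly in how often they still sample the nonzero-regret alternatives.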

<bold>Partially Ordered Set Master Algorithm</bold>

The partially ordered set master (POSM) algorithm is a variant of RL that explores various actions that are ordered (Missura & Gärtner, 2011). For example, the ordering may correspond to multiple settings that vary in terms of increasing difficulty for a hypothetical user. The POSM algorithm is unique in that the available actions (i.e., the K settings arranged in a partial order) are ordered rather than each action having a discrete distribution. The ordering inherent in the POSM algorithm is useful in purchase tasks because an ordering naturally exists among pricing options and because there is no assumption that the price associated with peak expenditure is stationary over time. The algorithm presented in Missura and Gärtner (2011) is depicted in Algorithm 1.
>Algorithm 1: Partially ordered set master algorithm
>Require: Partial order over the K settings (ks), reweighting constant β ∈ (0, 1), and agent observations across time (Ot)
>Initialize: let w1(k) = 1 for all k
>for t = 1, 2, … do
>At(k): let At(k) = Σ wt(x) across x ⪯ k
>Bt(k): let Bt(k) = Σ wt(x) across x ⪰ k
>Predict kt = arg max over k of min(At(k), Bt(k))
>Observe ot ∈ {−1 (too easy), +1 (too hard)}
>if ot = +1 then
>wt+1(x): let wt+1(x) = β · wt(x) for all x ⪰ kt
>if ot = −1 then
>wt+1(x): let wt+1(x) = β · wt(x) for all x ⪯ kt

The original purpose of the algorithm was to assist in identifying an optimal setting regarding user performance (i.e., neither too easy nor too hard). The use of the term “master” is related to the game-based context and the design of an intelligent adversary (i.e., as if opposing a “master” in a gamelike context). Beliefs regarding specific settings are updated using performances demonstrated by the user, particularly when a given setting appears “too easy” (−1) or “too hard” (+1) for the user. The total mass of beliefs is reflected across policies for “too low” and “too high” at a given time (t), represented for each action (k) as At and Bt, respectively. Each of these is listed in Equation 3.<anchor name="eqn3"></anchor>

At(k) = Σ wt(x) across x ⪯ k; Bt(k) = Σ wt(x) across x ⪰ k (Equation 3)

The process of reweighting beliefs across prices according to a fixed constant is illustrated in Algorithm 1. This approach is efficient in terms of maximizing information because the process of updating beliefs, β·wt(x), carries forward to all levels above or below action k. Estimates of beliefs favoring specific actions at a specific time (θt) are determined using the minimum of both policies, A and B, for each level of k at a given time t. The specific calculation is provided in Equation 4.<anchor name="eqn4"></anchor>

θt(k) = min{At(k), Bt(k)} (Equation 4)
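The belief masses and prediction rule of Equations 3 and 4 can be sketched as follows. This is an illustrative reading of the POSM update over a totally ordered set of levels, not the published implementation; function names are ours:

```python
import numpy as np

def posm_predict(w):
    """Predict the level k maximizing theta(k) = min(A(k), B(k))."""
    A = np.cumsum(w)                      # belief mass at or below each level
    B = np.cumsum(w[::-1])[::-1]          # belief mass at or above each level
    theta = np.minimum(A, B)
    return int(np.argmax(theta))

def posm_update(w, k, obs, beta=0.5):
    """Reweight beliefs after observing obs at level k (+1 = too high, -1 = too low).

    Levels on the contradicted side of k are downweighted by beta, so one
    observation carries forward to all levels above or below k.
    """
    w = w.copy()
    if obs > 0:                           # k too high: discount k and everything above
        w[k:] *= beta
    else:                                 # k too low: discount k and everything below
        w[:k + 1] *= beta
    return w

w = np.ones(11)                           # uniform initial beliefs over 11 ordered levels
assert posm_predict(w) == 5               # uniform prior -> midpoint prediction
```

With uniform initial weights the prediction is the middle level; each observation then shifts the prediction toward the side not yet ruled out.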

<h31 id="pha-33-2-199-d241e1103">An Algorithm-Driven Hypothetical Purchase Task</h31>

The POSM algorithm can be adapted for use in purchase tasks to explore levels of price (P) using the total amount of expenditure (i.e., P multiplied by Q) and the concept of regret across K levels of P. The minimization of regret guides the exploration of future prices in search of a point of peak expenditure. The point at which peak expenditure is optimized reveals the empirical PMAX and OMAX for the user. A visual of the link between the demand curve, the point of unit elasticity (PMAX), and the point of peak work (OMAX) is illustrated in Figure 1.

Figure 2 provides an overview of how the POSM algorithm can be used to guide questions presented to participants in a purchase task. The procedure begins with the initialization of a vector of beliefs at K levels of P (e.g., 1–500 USD; w = 1) and the generation of an initial prediction. Per Equation 4, this is essentially a uniform prior, and the initial prediction is the midpoint of the range of levels (e.g., P = 250 USD for pricing from 1 to 500 USD). Input from the user across iterations reveals expenditure (R) at respective levels of price (k), and beliefs are updated to favor levels that minimize regret (i.e., the level that produces the value closest to the presently known OMAX). This process is either repeated for a fixed number of iterations (t) or terminated once a threshold is met and further iterations are unlikely to further reduce regret. Figure 3 illustrates a simulated sequence wherein the agent adapts to the expenditure of the simulated user and guides the prices presented toward levels at or near PMAX (see left). Overall regret decreases as the user provides information that more consistently produces the highest expenditure (OMAX; see right). The empirical data comprising the work output and demand functions learned from the simulated user are presented in Figure 4.
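The Figure 2 loop can be sketched end to end with a simulated responder. Everything here is an illustrative assumption rather than the published implementation: the reweighting heuristic (discounting the side of the probe away from the best-known expenditure), the price grid, and the demand parameters are all ours:

```python
import numpy as np

def run_adaptive_task(respond, prices, n_questions=10, beta=0.5):
    """Adaptively probe prices in search of peak expenditure (empirical PMAX/OMAX).

    respond(price) -> reported consumption at that price. Beliefs over the
    ordered price levels are reweighted toward the level with the highest
    observed expenditure, in the spirit of the POSM algorithm.
    """
    w = np.ones(len(prices))
    best_p, best_o = prices[0], -np.inf
    for _ in range(n_questions):
        A = np.cumsum(w)                      # belief mass at or below each level
        B = np.cumsum(w[::-1])[::-1]          # belief mass at or above each level
        k = int(np.argmax(np.minimum(A, B)))  # next price level to probe
        o = prices[k] * respond(prices[k])    # observed expenditure (P x Q)
        if o > best_o:
            best_p, best_o = prices[k], o
        if prices[k] >= best_p:               # probed at/above the best level so far
            w[k:] *= beta
        else:                                 # probed below the best level so far
            w[:k + 1] *= beta
    return best_p, best_o                     # empirical PMAX and OMAX

# A simulated responder following the exponential demand model
# (Q0 = 10, alpha = 0.003, k = 2; arbitrary values):
respond = lambda p: 10 ** (np.log10(10.0) + 2.0 * (np.exp(-0.003 * 10.0 * p) - 1))
prices = np.linspace(1, 100, 100)             # ordered price levels, 1-100 USD
pmax_hat, omax_hat = run_adaptive_task(respond, prices)
```

The first probe falls at the midpoint of the price range, and within ten questions the probes settle within a price increment or two of the analytic PMAX (about 9.7 under these parameters).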
>
><anchor name="fig2"></anchor>[Figure 2: Overview of how the POSM algorithm guides the questions presented in a purchase task]
>
><anchor name="fig3"></anchor>[Figure 3: Simulated sequence of adaptively presented prices (left) and declining regret across iterations (right)]
>
><anchor name="fig4"></anchor>[Figure 4: Work output and demand functions learned from the simulated user]

<h31 id="pha-33-2-199-d241e1175">Research Questions</h31>

The POSM algorithm is well-suited to hypothetical purchase tasks given its strategic use of ordinal options (i.e., increased price = increased “difficulty” defending consumption) and its usefulness in seeking the questions that provide the most informational value (i.e., those nearer PMAX). Furthermore, informed pricing tasks also help avoid questions that query price values unlikely to be useful quantitatively (i.e., nonconsumption at prohibitively expensive prices). The primary goal of this work was to present and evaluate an adaptive approach to estimating reinforcer value in purchase tasks. This work primarily focused on whether the algorithm, across varying question lengths, could reliably and efficiently recover unknown values of PMAX. The algorithm was evaluated in two dimensions with research questions specific to each.
>Research Question 1: Given simulated PMAX values, does the POSM algorithm recover the price associated with peak work (PMAX) in purchase tasks consisting of five, 10, 15, and 20 sequential questions?
>Research Question 2: Related to Research Question 1, what is the minimum number of sequential questions necessary to recover statistically equivalent estimates of PMAX?

Method


> <h31 id="pha-33-2-199-d241e1204">Simulated Agents</h31>

A total of 4,000 (n = 1,000 × 4 run lengths) simulated agents were generated across four hypothetical purchase task lengths: five, 10, 15, and 20 questions. A lower limit of five questions was selected to mirror the brevity of adaptive tasks available in studies of delay discounting (Koffarnus & Bickel, 2014). Each simulated series randomly sampled OMAX, PMAX, and Q0 values from ranges of 50 to 950, 1,000 to 5,000, and 10 to 100, respectively. The quantity expended by the hypothetical user was generated using these values and the solutions provided in Gilroy (2023) and Gilroy et al. (2019) to produce a prediction from the exponential model proposed by Hursh and Silberberg (2008). All simulations were conducted using the R Statistical Program (R Core Team, 2013), and the same seed value was used across simulations to isolate differences solely due to iteration length.
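The seeding procedure can be sketched as follows. This is an illustrative reconstruction (the published simulations were conducted in R, and the common span value here is an assumption): seeded Q0 and PMAX values are drawn from the ranges in this section, and the matching α is obtained by inverting the exact PMAX solution of Gilroy et al. (2019):

```python
import numpy as np
from scipy.special import lambertw

rng = np.random.default_rng(2024)            # fixed seed across simulations
K_SPAN = 2.0                                 # assumed common span (k)

def seed_agent():
    """Sample seeded Q0 and PMAX values, deriving the alpha that matches.

    Ranges follow the Method section (Q0: 10-100; PMAX: 1,000-5,000); alpha
    inverts the exact PMAX solution so the seeded PMAX holds by construction.
    """
    q0 = rng.uniform(10, 100)
    pmax = rng.uniform(1000, 5000)
    alpha = -lambertw(-1 / np.log(10 ** K_SPAN), 0).real / (pmax * q0)
    return q0, pmax, alpha

def consumption(p, q0, alpha):
    """Predicted consumption under the exponential model of operant demand."""
    return 10 ** (np.log10(q0) + K_SPAN * (np.exp(-alpha * q0 * p) - 1))
```

Because α is derived from the seeded PMAX, each simulated agent's demand curve reaches unit elasticity exactly at its seeded value, which is what the recovery analysis then tests.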

<h31 id="pha-33-2-199-d241e1235">Analytic Strategy</h31>

Four individual equivalence tests were conducted, one for each task length. All equivalence tests were performed using the TOSTER R package (Lakens, 2017; Lakens & Caldwell, 2022). Equivalence tests were used to determine whether algorithm-produced PMAX values were statistically different from seeded values and not equivalent, neither statistically different nor equivalent, or not statistically different and statistically equivalent. For each test, the smallest effect size of interest for differences between simulated and true PMAX values was set to 0.01. On the log scale, the 0.01 value provides a convenient means of approximating a 1% difference between paired values, and differences within these upper and lower thresholds were not considered statistically meaningful.
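The equivalence-testing logic can be sketched with a paired two one-sided tests (TOST) procedure on the log scale. This is a generic illustration with simulated data, not the TOSTER implementation or its defaults:

```python
import numpy as np
from scipy import stats

def paired_tost(x, y, delta=0.01):
    """Two one-sided tests for equivalence of paired values within +/- delta.

    Differences are taken on the log10 scale; returns the larger of the two
    one-sided p values, so a small return value supports equivalence.
    """
    d = np.log10(x) - np.log10(y)           # paired differences on the log scale
    n = len(d)
    se = d.std(ddof=1) / np.sqrt(n)
    t_lower = (d.mean() + delta) / se       # H0: mean difference <= -delta
    t_upper = (d.mean() - delta) / se       # H0: mean difference >= +delta
    p_lower = 1 - stats.t.cdf(t_lower, df=n - 1)
    p_upper = stats.t.cdf(t_upper, df=n - 1)
    return max(p_lower, p_upper)

rng = np.random.default_rng(1)
seeded = rng.uniform(1000, 5000, size=1000)
derived = seeded * 10 ** rng.normal(0, 0.001, size=1000)   # near-perfect recovery
p = paired_tost(seeded, derived)
```

Equivalence is concluded when the larger of the two one-sided p values falls below the chosen alpha level; grossly nonequivalent values (e.g., a constant 20% offset) fail the same test.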

Results


<h31 id="pha-33-2-199-d241e1253">Research Question 1: Tests of Equivalence for PMAX</h31>

Results of the equivalence tests are illustrated in the right-hand portion of Figure 5 for tasks featuring five, 10, 15, and 20 sequential questions. For each iteration length, the analysis included tests of both statistical difference and statistical equivalence. Paired t tests revealed no statistically significant differences between log-transformed seeded and derived PMAX values for lengths of five (t = 0.756, p = .45), 10 (t = 0.159, p = .87), 15 (t = −0.879, p = .38), or 20 questions (t = 0.45, p = .65). Relatedly, equivalence tests (two one-sided t tests) were all significant, rejecting the null hypothesis of nonequivalence for lengths of five (t = −2.5, p &lt; .01), 10 (t = −31, p &lt; .01), 15 (t = 51.95, p &lt; .01), and 20 questions (t = −88, p &lt; .01). Overall, the algorithm performed well across all lengths; precision improved beyond five questions but gained little beyond 10.
<anchor name="fig5"></anchor>[Figure 5. Correspondence between seeded and derived PMAX values (left) and equivalence test results (right) for tasks of five, 10, 15, and 20 sequential questions.]

<h31 id="pha-33-2-199-d241e1304">Research Question 2: Effects of Iteration Length on PMAX</h31>

Visualizations of correspondence between seeded and derived PMAX values across varying task lengths are provided in the left-hand portion of Figure 5. Visual inspection revealed strong overall correspondence between estimates of PMAX across task lengths. Tasks featuring five sequentially updated questions yielded PMAX values that were highly correlated with seeded PMAX values (r = 0.97, p &lt; .001); however, seeded PMAX values at the lowest extremes were the least well detected at this short run length. All remaining task lengths demonstrated essentially perfect correspondence, with lengths of 10 (r ≥ .999, p &lt; .001), 15 (r ≥ .999, p = .001), and 20 questions (r ≥ .999, p = .001) producing near-identical estimates of PMAX.

Discussion



Purchase tasks are among the most frequently used tools in research applying the operant demand framework. Data from these tasks are readily analyzed using any of the modeling options derived from the framework of Hursh and Silberberg (2008). However, despite broad adoption and flexibility, the fixed and standardized pricing assays included in these tasks constrain research in several regards. In particular, fixed price points limit the usefulness of empirical data because the prices directly sampled in these tasks seldom correspond closely with values critical to the framework (e.g., the point at which demand for a reinforcer shifts from inelastic to elastic). The goal of this work was to introduce and evaluate an adaptive approach based on RL for use in purchase tasks. The results revealed strong overall performance across tasks of all lengths as well as good correspondence between seeded and derived PMAX values, even with the most abbreviated form of the task (i.e., just five questions). Despite this good correspondence, differing amounts of variability were observed across question lengths: the five-question task was the least precise, whereas all evaluations with lengths of 10 or more questions demonstrated statistical equivalence and consistently recovered the seeded PMAX values.

Overall findings suggest that the most abbreviated forms of adaptive purchase tasks may be less capable of consistently capturing PMAX values, and this warrants discussion. First, shorter tasks may be less consistent when the range of prices considered is substantial (e.g., 0–1,000 USD/unit) and/or the participant’s PMAX value falls near the extremes (e.g., 950 USD/unit). Because the task operates from an initial uniform prior, more questions/updates are necessary to give the algorithm enough freedom to explore the parameter space near the extremes (i.e., larger ranges require more iterations). Furthermore, additional questions give the algorithm a greater opportunity to recover from a chance errant response on the part of the participant (i.e., user error). Sufficient iterative updating could also support better data quality, as questions asked in this manner may lessen the risk of data being deemed “unsystematic” or less amenable to statistical analysis on the basis of a single errant response.
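The interaction between price range and iteration count can be made concrete with a deliberately simplified stand-in for the adaptive procedure. The article's RL algorithm is not reproduced here; the bisection sketch below only illustrates why a wide uniform prior demands more questions, especially near the extremes of the range:

```python
def adaptive_pmax(respond, lo=0.0, hi=1000.0, n_questions=10):
    """Estimate P_MAX by halving a uniform belief interval on each trial.

    `respond(price)` returns True if the (simulated) participant would still
    purchase at that price. This bisection is a hypothetical simplification of
    the article's sequential-updating algorithm, shown only to illustrate how
    the width of the price range and the number of questions interact.
    """
    for _ in range(n_questions):
        probe = (lo + hi) / 2.0
        if respond(probe):
            lo = probe  # still purchasing: P_MAX lies above the probe
        else:
            hi = probe  # no purchase: P_MAX lies below the probe
    return (lo + hi) / 2.0

# An agent whose P_MAX (950) sits near the upper extreme of a 0-1,000 range
true_pmax = 950.0
short_run = adaptive_pmax(lambda price: price < true_pmax, n_questions=5)
long_run = adaptive_pmax(lambda price: price < true_pmax, n_questions=10)
```

After five halvings the remaining uncertainty over a 1,000-unit range is roughly 31 units; after ten it is about 1 unit, consistent with the observation that precision gains level off beyond 10 questions.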

Simulations performed in this study revealed that adaptive purchase tasks with 10 or more sequential questions yielded empirical measures of PMAX that were statistically equivalent to the “true” seeded PMAX values. This finding suggests that RL algorithms have good potential for enhancing the flexibility and utility of purchase tasks and, furthermore, may lessen the need for statistically and mathematically complex operations when conducting behavioral economic research. Specifically, nonlinear models and the solutions necessary for deriving key metrics may not be required to answer research questions within the existing framework. That is, metrics such as Q0 (i.e., QFREE) and PMAX can be determined with good precision via empirical means and without the need to derive them mathematically.

<h31 id="pha-33-2-199-d241e1387">Implications for Future Research</h31>

The framework of Hursh and Silberberg (2008) has been critical in guiding modern approaches for evaluating choice behavior under constraint (e.g., scaled as a function of price, availability of alternatives), and the exponential model of operant demand and its derivatives have provided a reliable means of elucidating critical elements in the operant demand framework (e.g., revealing Q0 and PMAX). Statistical modeling has traditionally been necessary because fixed pricing assays seldom provided empirical information sensitive enough to support direct comparisons within or between participants. However, given that price assays can be made adaptive and directly explore relevant prices, nonlinear modeling such as that suggested by Hursh and Silberberg (2008) may not be necessary to answer many research questions commonly explored using this framework. That is, precise empirical representations of Q0 (i.e., QFREE) and PMAX/OMAX may circumvent the need to estimate these values from nonlinear models, and researchers could proceed with comparisons using much simpler methods (e.g., t tests, analysis of variance). Such a move has the potential to simplify practices in the operant demand framework in several notable ways.

First, the strategy provided here avoids operating with set assumptions regarding the underlying processes involved in decision making, a topic currently unresolved. The task presented in this work is not bound to a specific underlying process and allows researchers a flexible means of conducting basic and applied research free from specific assumptions regarding the underlying decision-making process. This largely avoids the need to explore and compare competing models for the data, which lessens the demands put on the analyst. Second, an increased focus on consumption at peak levels as opposed to consumption at low (or zero) levels both streamlines the task and avoids data that are traditionally associated with complex modeling decisions (e.g., nonconsumption). Third, and last, the model-agnostic approach does not limit the freedom of analysts to pursue quantitative modeling and supports multiple layers of interpretation. This is conceptually and theoretically valuable because it supports model- and theory-driven research related to reinforcer effects and operant demand.

<h31 id="pha-33-2-199-d241e1423">Limitations and Areas of Future Extension</h31>

The methods presented in this work provide an encouraging initial exploration of RL as an avenue for further research applying behavioral economic methods. Although these findings are highly promising and support new avenues for research, a few points warrant noting. First, the evaluation focused on fixed runs of sequentially updated beliefs regarding peak expenditure, and these lengths were selected largely for convenience and practicality. Future expansions of this work would benefit from exploring flexible thresholds for task termination based on decreasing regret rather than preset run lengths. Second, parameter recovery was excellent with simulated agents, but it remains to be seen how real-world participants respond to fixed versus sequentially updated batteries. Third, and related to the second point, comparative research evaluating correspondence between modeled and adaptively derived empirical metrics has yet to be performed.

In summary, methods from RL and ML have strong potential to add constructively to the methodology derived from the operant demand framework. This study highlighted an alternative avenue for extending these methods, one focused on the data collected rather than on how the data were modeled. This emphasis on good empirical data over complex statistical modeling presents exciting opportunities to simplify analytical procedures and improve the quality and interpretability of empirical data. Additional exploration of RL, ML, and other advanced methods from computer science remains warranted.

Footnotes

<anchor name="fn1"></anchor>

<sups> 1 </sups> Quantity PMAX is not fitted directly and is instead derived from overall model predictions; however, controlling for all other parameters, parameter α is most representative of price scaling effects.

<anchor name="fn2"></anchor>

<sups> 2 </sups> The process of estimating PMAX is complicated when parameters such as span (k) fall below an absolute minimum threshold (Gilroy et al., 2019).

References

<anchor name="c1"></anchor>

Amlung, M., McCarty, K. N., Morris, D. H., Tsai, C.-L., & McCarthy, D. M. (2015). Increased behavioral economic demand and craving for alcohol following a laboratory alcohol challenge. Addiction, 110(9), 1421–1428. 10.1111/add.12897

<anchor name="c2"></anchor>

Aston, E. R., Farris, S. G., MacKillop, J., & Metrik, J. (2017). Latent factor structure of a behavioral economic marijuana demand curve. Psychopharmacology, 234(16), 2421–2429. 10.1007/s00213-017-4633-6

<anchor name="c3"></anchor>

Bidwell, L. C., MacKillop, J., Murphy, J. G., Tidey, J. W., & Colby, S. M. (2012). Latent factor structure of a behavioral economic cigarette demand curve in adolescent smokers. Addictive Behaviors, 37(11), 1257–1263. 10.1016/j.addbeh.2012.06.009

<anchor name="c4"></anchor>

Blum, A., Hopcroft, J., & Kannan, R. (2020). Foundations of data science. Cambridge University Press. 10.1017/9781108755528

<anchor name="c5"></anchor>

Du, W., Green, L., & Myerson, J. (2002). Cross-cultural comparisons of discounting delayed and probabilistic rewards. The Psychological Record, 52(4), 479–492. 10.1007/BF03395199

<anchor name="c6"></anchor>

Gilroy, S. P. (2022). Hidden equivalence in the operant demand framework: A review and evaluation of multiple methods for evaluating nonconsumption. Journal of the Experimental Analysis of Behavior, 117(1), 105–119. 10.1002/jeab.724

<anchor name="c7"></anchor>

Gilroy, S. P. (2023). Interpretation(s) of essential value in operant demand. Journal of the Experimental Analysis of Behavior, 119(3), 554–564. 10.1002/jeab.845

<anchor name="c8"></anchor>

Gilroy, S. P., Kaplan, B. A., Reed, D. D., Hantula, D. A., & Hursh, S. R. (2019). An exact solution for unit elasticity in the exponential model of operant demand. Experimental and Clinical Psychopharmacology, 27(6), 588–597. 10.1037/pha0000268

<anchor name="c9"></anchor>

Gilroy, S. P., Kaplan, B. A., Schwartz, L. P., Reed, D. D., & Hursh, S. R. (2021). A zero-bounded model of operant demand. Journal of the Experimental Analysis of Behavior, 115(3), 729–746. 10.1002/jeab.679

<anchor name="c10"></anchor>

Greenwald, M. K., & Hursh, S. R. (2006). Behavioral economic analysis of opioid consumption in heroin-dependent individuals: Effects of unit price and pre-session drug supply. Drug and Alcohol Dependence, 85(1), 35–48. 10.1016/j.drugalcdep.2006.03.007

<anchor name="c11"></anchor>

Hursh, S. R. (1980). Economic concepts for the analysis of behavior. Journal of the Experimental Analysis of Behavior, 34(2), 219–238. 10.1901/jeab.1980.34-219

<anchor name="c12"></anchor>

Hursh, S. R. (1984). Behavioral economics. Journal of the Experimental Analysis of Behavior, 42(3), 435–452. 10.1901/jeab.1984.42-435

<anchor name="c13"></anchor>

Hursh, S. R., & Bauman, R. A. (1987). The behavioral analysis of demand. In L.Green & J.Kagel (Eds.), Advances in behavioral economics (Vol. 1, pp. 117–165). Ablex Publishing.

<anchor name="c14"></anchor>

Hursh, S. R., & Silberberg, A. (2008). Economic demand and essential value. Psychological Review, 115(1), 186–198. 10.1037/0033-295X.115.1.186

<anchor name="c15"></anchor>

Johnson, M. W., & Bickel, W. K. (2002). Within-subject comparison of real and hypothetical money rewards in delay discounting. Journal of the Experimental Analysis of Behavior, 77(2), 129–146. 10.1901/jeab.2002.77-129

<anchor name="c16"></anchor>

Kaplan, B. A., Franck, C. T., McKee, K., Gilroy, S. P., & Koffarnus, M. N. (2021). Applying mixed-effects modeling to behavioral economic demand: An introduction. Perspectives on Behavior Science, 44(2–3), 333–358. 10.1007/s40614-021-00299-7

<anchor name="c17"></anchor>

Koffarnus, M. N., & Bickel, W. K. (2014). A 5-trial adjusting delay discounting task: Accurate discount rates in less than one minute. Experimental and Clinical Psychopharmacology, 22(3), 222–228. 10.1037/a0035973

<anchor name="c18"></anchor>

Koffarnus, M. N., Franck, C. T., Stein, J. S., & Bickel, W. K. (2015). A modified exponential behavioral economic demand model to better describe consumption data. Experimental and Clinical Psychopharmacology, 23(6), 504–512. 10.1037/pha0000045

<anchor name="c20"></anchor>

Lakens, D. (2017). Equivalence tests: A practical primer for t tests, correlations, and meta-analyses. Social Psychological and Personality Science, 8(4), 355–362. 10.1177/1948550617697177

<anchor name="c21"></anchor>

Lakens, D., & Caldwell, A. (2022). Package ‘TOSTER’ [Computer software].

<anchor name="c22"></anchor>

MacKillop, J., Murphy, J. G., Tidey, J. W., Kahler, C. W., Ray, L. A., & Bickel, W. K. (2009). Latent structure of facets of alcohol reinforcement from a behavioral economic demand curve. Psychopharmacology, 203(1), 33–40. 10.1007/s00213-008-1367-5

<anchor name="c23"></anchor>

Madden, G. J., Smethells, J. R., Ewan, E. E., & Hursh, S. R. (2007). Tests of behavioral-economic assessments of relative reinforcer efficacy: Economic substitutes. Journal of the Experimental Analysis of Behavior, 87(2), 219–240. 10.1901/jeab.2007.80-06

<anchor name="c24"></anchor>

Mazur, J. E. (1987). An adjusting procedure for studying delayed reinforcement. In M. L.Commons, J. E.Mazur, J. A.Nevin, & H.Rachlin (Eds.), The effect of delay and of intervening events on reinforcement value (pp. 55–73). Lawrence Erlbaum.

<anchor name="c25"></anchor>

Missura, O., & Gärtner, T. (2011). Predicting dynamic difficulty. Advances in Neural Information Processing Systems, 24. <a href="https://papers.nips.cc/paper_files/paper/2011/file/7c9d0b1f96aebd7b5eca8c3edaa19ebb-Paper.pdf" target="_blank">https://papers.nips.cc/paper_files/paper/2011/file/7c9d0b1f96aebd7b5eca8c3edaa19ebb-Paper.pdf</a>

<anchor name="c27"></anchor>

R Core Team. (2013). R: A language and environment for statistical computing [Computer software].

<anchor name="c28"></anchor>

Rao, A., & Jelvis, T. (2023). Foundations of reinforcement learning with applications in finance (1st ed.). Chapman & Hall/CRC Press.

<anchor name="c29"></anchor>

Schwartz, L. P., Toegel, F., Devine, J. K., Holtyn, A. F., Roma, P. G., & Hursh, S. R. (2023). Latent factor structure of behavioral economic heroin and cocaine demand curves. Experimental and Clinical Psychopharmacology, 31(2), 378–385. 10.1037/pha0000594

<anchor name="c30"></anchor>

Strickland, J. C., Reed, D. D., Hursh, S. R., Schwartz, L. P., Foster, R. N. S., Gelino, B. W., LeComte, R. S., Oda, F. S., Salzer, A. R., Schneider, T. D., Dayton, L., Latkin, C., & Johnson, M. W. (2022). Behavioral economic methods to inform infectious disease response: Prevention, testing, and vaccination in the COVID-19 pandemic. PLOS ONE, 17(1), Article e0258828. 10.1371/journal.pone.0258828

<anchor name="c31"></anchor>

Yoon, H., & Chapman, G. B. (2016). A closer look at the yardstick: A new discount rate measure with precision and range. Journal of Behavioral Decision Making, 29(5), 470–480. 10.1002/bdm.1890

<anchor name="c32"></anchor>

Yoon, J. H., & Higgins, S. T. (2008). Turning k on its head: Comments on use of an ED50 in delay discounting research. Drug and Alcohol Dependence, 95(1–2), 169–172. 10.1016/j.drugalcdep.2007.12.011

<h31 id="pha-33-2-199-d241e2516">APPENDIX</h31> <anchor name="A"></anchor> <h31 id="pha-33-2-199-d241e2517">APPENDIX A: Extracting Model Values From Model-Free Estimates</h31>

The quantity PMAX maps directly onto α given respective units. The results of empirical fitting yielding Q0 and PMAX can be used to solve for α (given any suitable value for k). At unit elasticity, the exact solution of Gilroy et al. (2019) can be rearranged as α = −W(−1/ln(10^k)) / (Q0 · PMAX), where W denotes the principal branch of the Lambert W function (defined only when the span k exceeds the minimum threshold noted in Footnote 2).<anchor name="eqn5"></anchor>[Equation 5 and accompanying figure: a worked solution for α alongside plots of consumption (upper panel) and work (lower panel) across prices from 1 to 1,000.]
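The rearrangement can be checked numerically. The sketch below assumes an illustrative span k = 2 and example Q0 and PMAX values (all chosen for demonstration, not taken from the article); the solved α should place the peak of the work curve, C · Q(C), exactly at the supplied PMAX:

```python
import numpy as np
from scipy.special import lambertw

K = 2.0  # assumed span (log10 units) for illustration

def alpha_from_pmax(pmax, q0, k=K):
    """Solve for alpha given empirical Q0 and P_MAX.

    At unit elasticity, alpha * Q0 * P_MAX = -W(-1 / ln(10**k)), where W is
    the principal branch of the Lambert W function (a rearrangement of the
    exact solution in Gilroy et al., 2019).
    """
    u = -lambertw(-1.0 / np.log(10.0 ** k)).real
    return u / (q0 * pmax)

def demand(price, q0, alpha, k=K):
    # Hursh & Silberberg (2008) exponential model
    return 10.0 ** (np.log10(q0) + k * (np.exp(-alpha * q0 * price) - 1.0))

# Worked check with example empirical values
q0, pmax = 50.0, 100.0
alpha = alpha_from_pmax(pmax, q0)
prices = np.linspace(1.0, 1000.0, 5000)
peak_price = prices[np.argmax(prices * demand(prices, q0, alpha))]  # ~= pmax
```

Because the work curve peaks where demand elasticity equals −1, recovering the supplied PMAX from the solved α confirms the rearrangement is internally consistent.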

Submitted: May 22, 2024 Revised: August 17, 2024 Accepted: September 28, 2024

*