Probabilistic Reversal Learning (PRL) Task¶

Field	Value
Name	Probabilistic Reversal Learning (PRL) Task
Version	main (1.0)
URL / Repository	https://github.com/TaskBeacon/PRL
Short Description	A task where participants learn stimulus-reward associations that reverse unpredictably.
Created By	Zhipeng Cao (zhipeng30@foxmail.com)
Date Updated	2025-07-23
PsyFlow Version	0.1.0
PsychoPy Version	2025.1.1
Modality	Behavior/EEG
Language	Chinese
Voice Name	zh-CN-YunyangNeural

1. Task Overview¶

This task implements a probabilistic reversal learning paradigm designed for EEG studies. Participants are presented with two visual stimuli (images) and must learn which one has a higher probability of yielding a reward (+10 points). The reward contingencies are not fixed; after a certain number of correct choices, the probabilities reverse, requiring participants to adapt their strategy. The goal is to maximize the total score by tracking these changes and consistently choosing the more advantageous stimulus. The task consists of multiple blocks, and within each block, a new pair of stimuli is used.

2. Task Flow¶

Block-Level Flow (`main.py`)¶

Step	Description
1. Initialization	Load configuration, collect subject information, and set up the PsychoPy window, keyboard, and trigger sender.
2. Instructions	Display the task instructions text and play the corresponding voiceover. Wait for a spacebar press to continue.
3. Block Loop	Iterate through the total number of blocks (`total_blocks`: 6).
4. Countdown	Display a 3-second countdown before each block begins.
5. Stimulus Setup	For each block, select a new pair of images (`stima` and `stimb`) from the `assets` folder.
6. Controller Reset	A new `Controller` instance is created for each block, resetting the learning and reversal logic.
7. Run Trials	Execute the trial-level logic (`run_trial.py`) for the number of trials in the block (`trial_per_block`: 40).
8. Block Break	After each block, display a break screen showing the score for that block. Wait for a spacebar press to proceed to the next block.
9. End of Task	After all blocks are completed, display a “good bye” screen with the total score.
10. Data Saving	Save all recorded data to a CSV file.
11. Cleanup	Close the serial port and quit the PsychoPy application.

Trial-Level Flow (`run_trial.py`)¶

Step	Description
1. Fixation	A fixation cross (`+`) is displayed for a random duration between 0.6 and 0.8 seconds.
2. Cue Presentation	Two images (`stima` and `stimb`) are presented on the left and right sides of the screen. The positions are determined by the `condition` (`AB` or `BA`).
3. Response Collection	The participant has 1.5 seconds (`cue_duration`) to choose one of the images by pressing ‘f’ for the left or ‘j’ for the right. A highlight box appears around the selected image.
4. Feedback Logic	The outcome is determined probabilistically. If the correct stimulus is chosen, there is an 80% (`win_prob`) chance of winning +10 points and a 20% chance of losing 10 points. If the incorrect stimulus is chosen, the probabilities are reversed. No response results in a loss of 10 points.
5. Blank Screen	A blank screen is shown for a random duration between 0.4 and 0.6 seconds.
6. Feedback Display	The feedback (“+10分”, “-10分”, or “未反应：-10分”) is displayed for 0.8 seconds (`feedback_duration`).
7. Controller Update	The `Controller` is updated with the outcome of the trial (`hit` or `miss`).

Controller Logic (`utils.py`)¶

Component	Description
Reversal Mechanism	The `Controller` uses a sliding window approach to determine when to reverse the stimulus-reward contingencies.
Sliding Window	The controller tracks the last 10 trials (`sliding_window`).
Reversal Trigger	If the participant correctly identifies the higher-probability stimulus in at least 9 of the last 10 trials (`sliding_window_hits`), the reward probabilities are reversed.
State Tracking	The controller tracks the `current_correct` stimulus (‘stima’ or ‘stimb’), the `reversal_count`, and the history of hits within the current phase (`phase_hits`).
Win Probability	The initial win probability is 80% (`win_prob`). After the first reversal, it changes to 90% (`rev_win_prob`).

3. Configuration Summary¶

a. Subject Info¶

Field	Description
`subject_id`	The unique identifier for the subject (3 digits, 101-999).
`subname`	The subject’s name in Pinyin.
`age`	The subject’s age (5-60).
`gender`	The subject’s gender (‘Male’ or ‘Female’).

b. Window Settings¶

Parameter	Value
`size`	[1920, 1080]
`units`	deg
`screen`	1
`bg_color`	gray
`fullscreen`	True
`monitor_width_cm`	60
`monitor_distance_cm`	72

c. Stimuli¶

Name	Type	Description
`fixation`	text	A white ‘+’ symbol.
`win_feedback`	text	“+10分” in white.
`lose_feedback`	text	“-10分” in white.
`no_response_feedback`	text	“未反应：-10分” in white.
`blank`	text	An empty text stimulus.
`stima`	image	The first image in a pair, with a size of [5, 5] degrees.
`stimb`	image	The second image in a pair, with a size of [5, 5] degrees.
`highlight_left`	rect	A white rectangle to highlight the left stimulus.
`highlight_right`	rect	A white rectangle to highlight the right stimulus.
`block_break`	text	A multi-line text displaying the score at the end of a block.
`instruction_text`	textbox	The initial instructions for the task.
`good_bye`	text	The final message at the end of the experiment, showing the total score.

d. Timing¶

Phase	Duration (seconds)
`fixation_duration`	[0.6, 0.8] (randomized)
`cue_duration`	1.5
`feedback_duration`	0.8
`blank_duration`	[0.4, 0.6] (randomized)

e. Triggers¶

Event	Code
`exp_onset`	98
`exp_end`	99
`block_onset`	100
`block_end`	101
`fixation_onset`	1
`cue_onset`	2
`key_press`	3
`no_response`	4
`win_feedback_onset`	5
`lose_feedback_onset`	6
`no_response_feedback_onset`	7

f. Adaptive Controller¶

Parameter	Value
`win_prob`	0.8
`rev_win_prob`	0.9
`sliding_window`	10
`sliding_window_hits`	9

4. Methods (for academic publication)¶

In this experiment, participants performed a probabilistic reversal learning task. Each trial began with a central fixation cross, displayed for a variable duration of 600 to 800 ms. Subsequently, two distinct images were presented simultaneously on the left and right sides of the screen for 1500 ms. Participants were instructed to select one of the two images by pressing the ‘f’ key for the left image or the ‘j’ key for the right image. Following their response, a blank screen was shown for 400 to 600 ms, after which feedback was provided for 800 ms.

The task was structured into 6 blocks of 40 trials each. Within each block, one of the two images was designated as the “correct” stimulus, associated with an 80% probability of reward (+10 points) and a 20% probability of punishment (-10 points). The “incorrect” stimulus had the inverse probabilities. The reward contingencies were subject to reversal based on the participant’s performance. A reversal was triggered when the participant chose the correct stimulus in at least 9 of the last 10 trials. Upon reversal, the previously “incorrect” stimulus became the “correct” one, and its associated win probability was increased to 90% for the remainder of the block to facilitate the learning of the new rule. This design allows for the examination of cognitive flexibility and reinforcement learning under changing environmental conditions.