Research Report // 2024 · GSND 5130 Experimental Draft
Academic Archive

One with the Beat: Does Audio-Visual Desynchronization Affect Player Performance in Rhythm Games?

TheGrapeEscape · Charles Schwimmer · Godfrey Yang

Abstract

An investigation into the impact of audio-visual latency on player performance in mobile rhythm games. Using Cytus II as a testbed, this study employs a between-subjects design to measure how a 250 ms audio-visual desynchronization affects timing accuracy (Technical Points, TP) across different player proficiency levels.

Figure 1: A screenshot of Cytus II showing a perfect score for a late-game level.

1. Abstract

Rhythm games require players to react to synchronized audio-visual cues [1]. Popular titles like Guitar Hero have achieved widespread commercial success [2], while others like osu! have maintained communities of more than 10,000 daily active players even 17 years after release [3]. Rhythm games typically feature audio-visual latency adjustment to prevent conflicting cues. However, the impact of audio-visual desynchronization on player performance and perception remains largely unexplored. To investigate how audio-visual desynchronization affects rhythm game players, we used a between-subjects design and measured changes in performance with and without a desynchronization condition in Cytus II. The study did not find conclusive evidence of an effect. However, it highlighted the need for further research with larger sample sizes and more controlled environments to explore the nuances of this relationship.

2. Introduction

Rhythm games challenge players to cultivate and utilize a “rhythm sense,” a form of perception that allows them to anticipate and respond to rhythmic cues precisely [4]. This interplay between instincts and audio-visual stimuli demands that players not only react to what they see and hear but also internalize the rhythmic patterns and predict upcoming events [4].

There is limited academic research exploring the development of rhythm sense, its place in player performance, and the factors that influence it. While studies have investigated audio-visual synchrony perception in various contexts, its application to the unique mechanics and challenges of rhythm games still needs to be explored. This study aims to address this gap by investigating the impact of audio-visual desynchronization on player performance in Cytus II, a popular mobile rhythm game. Specifically, we examine how introducing a delay between the audio and visual cues affects players' ability to time their inputs accurately. We hypothesize that audio-visual desynchronization will negatively impact player performance.

3. Related Works

Research on audio-visual synchronization in games draws on a rich history of work investigating the perception and impact of temporal discrepancies between auditory and visual stimuli. The ability to accurately perceive and respond to these discrepancies is crucial for various cognitive tasks, including speech perception, music performance, and interaction with dynamic environments.

3.1 Audio-Visual Synchrony and Performance

Human perception is highly sensitive to temporal discrepancies between auditory and visual stimuli [5][6]. Even small asynchronies can disrupt perceived synchrony and negatively impact user experience, particularly in multimedia contexts like video games [5][7][8]. Asynchronous feedback can reduce player performance and enjoyment [8][9]. This is particularly relevant for rhythm games, which demand precise timing and coordination with audio-visual cues. Alvarez-Molina et al. (2024) demonstrated that even minor asynchronies between vibrotactile and audio stimuli significantly reduced player accuracy in a rhythm game [7], underscoring the potential for subtle audio-visual desynchronization to disrupt gameplay.

3.2 Individual Differences in Synchrony Perception

Individual differences play a crucial role in audio-visual synchrony perception. Factors such as age, gender, musical experience, and familiarity with rhythm games can influence a player’s sensitivity to asynchrony [7][8][10]. For instance, Chen et al. (2022) found considerable variability in participants’ ability to identify synchronized and desynchronized audio tracks in the rhythm game Rhythm Dungeon [9]. This variability highlights the importance of considering individual differences in studies examining the impact of audio-visual desynchronization.

3.3 Beat Perception, Rhythm Games, and Serious Applications

Beat perception, or rhythm sense, is a core component of rhythm gameplay [4]. Research suggests that visual cues can enhance the processing of temporal patterns in the auditory domain, improving synchronization accuracy with auditory rhythms [11][12]. This finding supports the notion that visual elements in rhythm games contribute significantly to players’ ability to synchronize with the music.

Beyond entertainment, beat perception holds potential for applications in therapeutic contexts. Research suggests a strong link between audio-visual synchrony and brain function development, particularly in children [13]. Children with Attention Deficit Hyperactivity Disorder (ADHD) often exhibit difficulties in audio-visual synchrony and beat perception [13]. A study by Tierney et al. (2017) also suggested a link between differences in rhythmic skills, neural consistency, and linguistic proficiency [14].

4. Method

Previous studies have demonstrated the potential impact of asynchrony on performance and experience [15], the influence of individual differences in perception [9], and the role of music in shaping player experience [10]. This study builds upon this foundation by specifically investigating the effects of audio-visual desynchronization on player performance in Cytus II.

4.1 Participants

Participants were recruited primarily from the Game Science and Design master’s program at Northeastern University, with additional subjects selected from both within and outside the university. The sample included males, females, and non-binary individuals aged 18-32 with no prior experience with Cytus II. While prior experience with rhythm or music games was not assessed via self-report due to reliability concerns, participant proficiency was determined based on their performance in the first round of the study.

4.2 Apparatus

  • Cytus II: A rhythm game for mobile devices where players tap notes as a scanline crosses them [16]. Performance is measured in Technical Points (TP), a percentage score reflecting timing accuracy [17].

Figure 2: A screenshot of Cytus II showing the notes and the scanline.

  • Audio Setup: On-board speakers were used instead of headphones to minimize latency variations.
  • Mobile Devices: three devices were used for testing:
      • Apple iPhone 15
      • Google Pixel 8
      • Samsung Galaxy Note

4.3 Procedure

4.3.1 Pilot Study

A qualitative pilot study (sample size: 3) was conducted to select appropriate levels in Cytus II for the main study. This involved evaluating various levels to identify two suitable options:

  1. Tutorial Level: Bullet Waiting For Me (Difficulty: Easy, Level: 1). A level that effectively introduces core game mechanics to participants with no prior Cytus II experience.
  2. Test Level: Baptism of Fire (Difficulty: Easy, Level: 3). A level with specific characteristics deemed important for this study:
      • Relatively easy note patterns: To reduce the influence of individual learning differences on performance, a song with simple note patterns was chosen. This helps ensure that any observed performance changes are primarily attributable to the audio-visual desynchronization rather than to variations in players' ability to learn complex patterns.
      • Moderate pacing: The selected test level features a tempo and note density that encourage reliance on both audio and visual cues for accurate timing, as observed and reported in the pilot study. This is crucial because the study investigates the interplay between these cues.

4.3.2 Study Procedure

Participants were given a clear explanation of the study’s objectives and a step-by-step outline of the procedure. They were then asked for their informed consent, which included acknowledging that they understood the study’s purpose, its procedures, how their data would be anonymized, stored, and analyzed, and the potential risks. Participants were explicitly informed of their right to withdraw at any stage of the study.

Once participants provided informed consent, they completed the following procedure under the researcher's supervision. Data were recorded in Google Sheets.

  1. Tutorial: Participants played the tutorial level under the researcher’s guidance.
  2. Round 1 (Synchronized): All participants played the test level with synchronized audio and visual cues.
  3. Round 2:
      1. Control Group: played the test level again with synchronized audio and visual cues.
      2. Experiment Group: played the test level with a 250 ms delay between the audio and visual cues, introduced using the game’s calibration feature.
  4. Question: Participants were asked whether they perceived the second round as more challenging or cognitively demanding. They were also invited to share additional thoughts.

4.3.3 Proficiency Grouping

Self-reported measurements can be unreliable, and prior experience often does not indicate a player’s proficiency in rhythm games. Instead of relying on self-report, participants were therefore divided into three proficiency groups (low, medium, and high) based on their TP scores in the first round. This approach, similar to the Input Multi-tasking metric employed by Ono et al. (2022) [18], allows for a more objective assessment of participant skill level within Cytus II, mitigating potential biases associated with self-reported prior experience.

4.4 Design and Counterbalancing

This study defines player performance in rhythm games solely in terms of rhythm timing accuracy, so TP was chosen as the only performance metric. Although Cytus II also provides metrics such as combo length and overall score, these were excluded because their calculations introduce unnecessary complexity.

To account for potential learning effects and fatigue between Round 1 and Round 2, the change in TP score (ΔTP) was used as the dependent variable. This is calculated as:

ΔTP = Round 2 TP – Round 1 TP

This approach aims to reduce the influence of these confounding factors. Because participants differ in baseline skill, expressing performance as a change from Round 1 compares each participant with their own baseline. Learning and fatigue effects shared by both groups are then accounted for by comparing ΔTP between the control and experimental groups, which helps isolate the effect of the independent variable (audio-visual desynchronization).
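A minimal Python sketch of the ΔTP computation and the per-group summary statistics follows. The participant scores are hypothetical placeholders, not the study's data, and the function names are illustrative. Using the sample (n − 1) standard deviation is an assumption, since the text does not state which estimator was used.

```python
from statistics import mean, stdev

# Hypothetical (Round 1 TP, Round 2 TP) pairs for illustration only;
# these are NOT the study's participant data.
control = [(82.0, 84.5), (91.2, 90.8), (97.0, 96.5)]
experiment = [(78.5, 85.0), (88.0, 86.1), (96.4, 95.9)]


def delta_tp(pairs):
    """Return ΔTP = Round 2 TP - Round 1 TP for each participant."""
    return [round2 - round1 for round1, round2 in pairs]


def summarize(deltas):
    """Mean and sample standard deviation (n - 1 denominator, assumed) of ΔTP."""
    return mean(deltas), stdev(deltas)


for name, pairs in [("control", control), ("experiment", experiment)]:
    m, sd = summarize(delta_tp(pairs))
    print(f"{name}: mean dTP = {m:.3f}, SD = {sd:.3f}")
```

Because each participant is compared only with their own Round 1 score, the group comparison then operates on ΔTP values rather than on raw TP.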

4.5 Hypothesis and Variables

IV: Audio-visual synchrony: 80 ms for the control group, 250 ms for the experimental group.
DV: ΔTP

H0: There is no statistically significant negative performance change when introducing a 250ms audio-visual desynchronization.

H1: There is a statistically significant negative performance change when introducing a 250ms audio-visual desynchronization.

4.6 Data Analysis

We computed the mean and standard deviation of ΔTP for the control and experimental groups and used an independent-samples t-test on ΔTP to assess statistical significance, with a Mann–Whitney U test as a nonparametric check. We also analyzed player performance across the proficiency groups.

5. Results

We ran the experiment with 14 participants. Two were excluded, one because of a data-collection error and one as an outlier, leaving 12 participants for analysis: 6 randomly assigned to the control group and 6 to the experimental group. Mean ΔTP differed only slightly between the control group (M = -0.423, SD = 4.139) and the experimental group (M = 1.135, SD = 3.724). An independent-samples t-test yielded p = 0.509, and a Mann–Whitney U test yielded p = 0.818, so we failed to reject the null hypothesis.
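As a consistency check, the reported t-test p-value can be approximately reproduced from the group summaries alone. This sketch assumes a two-sided Student's (pooled-variance) t-test, which the text does not state explicitly, and integrates the t density numerically so that only the Python standard library is needed; the helper names are hypothetical, and in practice a statistics package would normally be used instead.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def two_sided_p(t, df, steps=2000):
    """Two-sided p-value: 1 - 2 * P(0 < T < |t|), via Simpson's rule."""
    a = abs(t)
    h = a / steps  # steps is even, as Simpson's rule requires
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 1.0 - 2.0 * (total * h / 3.0)


def pooled_t_test(m1, sd1, n1, m2, sd2, n2):
    """Student's independent-samples t-test (equal variances) from summary stats."""
    sp2 = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    t = (m2 - m1) / se
    df = n1 + n2 - 2
    return t, df, two_sided_p(t, df)


# Reported summaries: control vs. experimental ΔTP, n = 6 per group.
t, df, p = pooled_t_test(-0.423, 4.139, 6, 1.135, 3.724, 6)
print(f"t({df}) = {t:.3f}, two-sided p = {p:.3f}")
```

With these inputs the sketch gives t ≈ 0.69 on 10 degrees of freedom and a two-sided p of about 0.51, in line with the reported value. The Mann–Whitney U result cannot be checked this way because it requires the individual ΔTP values.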

Figure 3. Control Group TP By Round

Figure 4. Experiment Group TP by Round

Figure 5. ΔTP by Groups

We additionally categorized participants into three proficiency groups based on their Round 1 TP score. Participants with TP ≤ 80 were assigned to Proficiency Group 1, participants with 80 < TP ≤ 90 to Proficiency Group 2, and participants with TP > 90 to Proficiency Group 3.
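The proficiency thresholds can be written as a small classification function. This is a sketch under two assumptions: TP is treated as a percentage between 0 and 100, and a score of exactly 90 is placed in Group 2 (the 80 < TP ≤ 90 band). The function name and example scores are hypothetical.

```python
def proficiency_group(round1_tp):
    """Map a Round 1 TP score to a proficiency group (1 = low, 3 = high).

    Assumes a score of exactly 90 belongs to Group 2 (80 < TP <= 90);
    this boundary convention is an assumption, not taken from the study.
    """
    if not 0.0 <= round1_tp <= 100.0:
        raise ValueError("TP is a percentage between 0 and 100")
    if round1_tp <= 80.0:
        return 1
    if round1_tp <= 90.0:
        return 2
    return 3


# Hypothetical Round 1 scores, for illustration only.
scores = [76.2, 80.0, 84.9, 90.0, 97.3]
print([proficiency_group(s) for s in scores])  # → [1, 1, 2, 2, 3]
```

Making each band closed on the same side (lower bound exclusive, upper bound inclusive) ensures every score falls into exactly one group.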

Figure 6. ΔTP By Proficiency Group

Mean ΔTP by proficiency group was P1: M = 6.453, SD = 1.664; P2: M = -0.833, SD = 3.506; P3: M = 0.094, SD = 1.017. Overall TP for each proficiency group is also shown (P1: M = 79.750, SD = 5.508; P2: M = 84.508, SD = 3.793; P3: M = 96.997, SD = 1.935).

Figure 7. TP By Proficiency Group

As an additional analysis, we ran Welch’s t-tests on ΔTP between proficiency groups. The tests reported p = 0.050 between groups 1 and 2, p = 0.010 between groups 1 and 3, and p = 0.680 between groups 2 and 3.
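The Welch comparisons can be sketched from the group summaries in the same way. The group sizes used here (3, 3, and 6) are an assumption inferred from the group composition described in the Discussion (two control participants per level, plus 1 low, 1 medium, and 4 high in the experimental group), and the helper names are hypothetical. Because the inputs are rounded and the sizes inferred, the resulting p-values are expected to be close to, not identical with, those reported.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution (df may be non-integer)."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def two_sided_p(t, df, steps=2000):
    """Two-sided p-value via Simpson's rule on the t density over [0, |t|]."""
    a = abs(t)
    h = a / steps  # steps is even, as Simpson's rule requires
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 1.0 - 2.0 * (total * h / 3.0)


def welch_t_test(m1, sd1, n1, m2, sd2, n2):
    """Welch's unequal-variance t-test with Welch–Satterthwaite degrees of freedom."""
    v1, v2 = sd1**2 / n1, sd2**2 / n2
    t = (m1 - m2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, df, two_sided_p(t, df)


# (mean ΔTP, SD, n); the n values are inferred, not reported directly.
groups = {1: (6.453, 1.664, 3), 2: (-0.833, 3.506, 3), 3: (0.094, 1.017, 6)}
for a, b in [(1, 2), (1, 3), (2, 3)]:
    t, df, p = welch_t_test(*groups[a], *groups[b])
    print(f"group {a} vs {b}: t = {t:.2f}, df = {df:.2f}, p = {p:.3f}")
```

Welch's test does not assume equal variances, which matters here because the group SDs differ by more than a factor of three; the non-integer degrees of freedom are why the t density is evaluated with `math.lgamma` rather than factorials.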

Figure 8. Average ΔTP For Each Proficiency in Control

Figure 9. Average ΔTP For Each Proficiency in Experiment

Figure 10. Average ΔTP For Control and Experiment

Figure 11. Distribution of ΔTP

6. Discussion

6.1 Qualitative Findings

Most players identified the audio-visual desynchronization within seconds of starting the second round. However, the study lacked sufficient data to quantitatively analyze the relationship between player proficiency and perception of the desynchronization.

Low-proficiency players often held the misconception that note placement and timing in rhythm games do not correspond to the music. Participants who mentioned this misconception often also reported relying solely on visual cues and did not perceive an increase in challenge.

Conversely, high-proficiency players used both audio and visual cues to predict timing and reacted to the desynchronization in more varied ways. Some ignored the visual cues and relied solely on audio (their results were excluded from the analysis). Others related their experience to music theory concepts and employed various techniques to mitigate the desynchronization's effects. These players also noted that the desynchronization's impact varied with changes in tempo and note density.

6.2 Learning and Outliers

Figure 8 indicates that control group performance becomes more consistent with increased proficiency, which is expected. However, with only two control participants per proficiency level, further research is needed to confirm the validity of our method of determining player proficiency.

The uneven distribution of player proficiency in the experimental group (1 low, 1 medium, 4 high) may have contributed to the observed inconsistency in performance. This imbalance likely resulted from the assignment procedure, which did not consider proficiency when allocating participants to groups. Notably, high-proficiency players performed less consistently when subjected to 250 ms of audio-visual desynchronization (Figures 8, 9). However, the difference in sample size (4 vs. 2) should be kept in mind when interpreting this finding.

Learning effects disproportionately influenced the performance of the sole low-proficiency participant in the experimental group.

Figure 12. Average ΔTP For Control and Experiment with possible outlier removed

To illustrate this impact, Figure 12 displays the average ΔTP for both groups after excluding this participant as an outlier. Removing this outlier reveals a more consistent decrease in performance within the experimental group, in line with the expected effect of desynchronization. However, this apparent consistency could be attributed to several factors, including the higher proportion of high-proficiency players in the experimental group, the inherent noise in the data, and the overall limited sample size. This observation should therefore be read primarily as an illustration of how learning effects and outliers can influence the study's results.

6.3 Power Analysis

We calculated a Cohen’s d of 0.4 from our data, a small-to-medium effect size. A post hoc power analysis indicated that the study achieved a power of only 0.16 (tails: one, α = 0.05, effect size d = 0.4, n = 6 per group). This low power is consistent with our preliminary exploration and visualizations. An a priori power analysis indicated that future studies would need a total sample size of 272 to achieve a power of 0.95.
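The effect size and sample-size figures can be approximated with the standard library. This sketch uses the pooled-SD form of Cohen's d and the usual normal approximation for a one-tailed two-sample test; it is not necessarily the exact procedure behind the reported numbers, and power software based on the noncentral t distribution can give slightly different values at small n. The helper names are hypothetical.

```python
import math
from statistics import NormalDist


def cohens_d(m1, sd1, n1, m2, sd2, n2):
    """Absolute Cohen's d using the pooled standard deviation."""
    sp = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))
    return abs(m2 - m1) / sp


def approx_power(d, n_per_group, alpha=0.05):
    """One-tailed power for two equal groups (normal approximation)."""
    z = NormalDist()
    return 1 - z.cdf(z.inv_cdf(1 - alpha) - d * math.sqrt(n_per_group / 2))


def n_per_group(d, power=0.95, alpha=0.05):
    """Per-group n for a one-tailed two-sample test (normal approximation)."""
    z = NormalDist()
    return math.ceil(2 * (z.inv_cdf(1 - alpha) + z.inv_cdf(power)) ** 2 / d**2)


d = cohens_d(-0.423, 4.139, 6, 1.135, 3.724, 6)
print(f"d = {d:.2f}")
print(f"power with 6 per group: {approx_power(0.4, 6):.2f}")
print(f"total N for 0.95 power: {2 * n_per_group(0.4)}")
```

This yields d ≈ 0.40 from the reported group summaries, an approximate power of about 0.17 (slightly above the reported 0.16, likely because the normal approximation ignores the heavier tails of the t distribution at such small samples), and 136 participants per group, or 272 in total.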

6.4 Implications

This study underscores the potential of rhythm games as tools for investigating the complex interplay of perception, cognition, and action. Our preliminary results, while inconclusive, suggest individual differences in strategy and predisposition to rhythmic timing. These findings highlight the multifaceted nature of rhythm games, which demand not only precise motor coordination but also rapid decision-making and adaptive strategies in dynamic environments. Rhythm games therefore offer a unique platform for evaluating abilities that rely on integrating multiple sensory inputs and responding effectively to unpredictable stimuli.

7. Limitations / Future Work

While this study offers a promising preliminary look at the potential performance-degrading effect of desynchronization and simulated local latency, it is not without its flaws. Our testing procedure did not control the overall testing environment, so some participants were more distracted during testing than others: some talked while playing, while others had music playing in the background. This could have affected both performance and learning outcomes.

In addition to the uncontrolled testing environment, we used three different devices for testing. Although all three phones are recent models, each has its own native audio-visual latency, which could have caused unpredictable changes in performance. We did not run an additional pilot study to evaluate how similar the test devices were, and we kept a consistent delay setting of 0.08 s (80 ms) for the control group and 0.33 s (330 ms) for the experimental group, settings that differ by 250 ms. Future studies should either standardize the hardware or run such a pilot study to standardize synchronization across devices.

We did not select participants based on prior experience with rhythm games, so skill levels in our sample varied widely. To account for this, we added a proficiency grouping based on Round 1 performance to help explain the different patterns of skill progression and song learning. Future studies should either restrict the sample to low-skill, low-experience participants or to high-skill, high-experience participants, or double the sample size so that both experience and skill levels can be tested adequately.

The test itself lacked sufficient repetition and counterbalancing to be as robust as possible. A future study could use an ABBA design over a full song, or 10 repeated short sections of one song (1-minute segments instead of full songs), to obtain a larger and more robust dataset.

Finally, the chosen song allowed players to achieve near-perfect scores on their first attempt, which affected the data collected. A different song could have produced lower perfect rates among higher-skill players and lower learning rates among newer players. A future study could use a more difficult song, or select songs based on participants' prior experience and skill, to help control for rapid learning and mastery.

8. Conclusion

This study investigated the impact of audio-visual desynchronization on player performance in the rhythm game Cytus II. While the primary hypothesis was not supported, the study revealed potential differences in performance and susceptibility across proficiency groups. This suggests that the relationship between audio-visual synchrony and player performance in rhythm games may be more complex than initially anticipated and warrants further investigation. Future research should address the limitations of this study, including the small sample size, the insufficiently controlled testing environment, and device variations. Larger-scale studies with standardized testing conditions and a focus on proficiency-based analysis are needed to gain a more comprehensive and generalizable understanding of how audio-visual desynchronization affects rhythm game players.

Works Cited

[1] Song, D. H., Kim, K. B., & Lee, J. H. (2013). Analysis and evaluation of mobile rhythm games: Game structure and playability. Proceedings of the 2013 International Conference on ICT Convergence (ICTC), 341–345. https://doi.org/10.1109/ICTC.2013.6676905

[2] Miller, K. (2016). Schizophonic Performance: Guitar Hero, Rock Band, and Virtual Virtuosity. In M. Austin (Ed.), Music Video Games: Performance, Politics, and Play (pp. 189-210). Bloomsbury Academic.

[3] ppy Pty Ltd. (n.d.). osu!. Retrieved December 12, 2024, from https://osu.ppy.sh/

[4] Schultz, P. (2016). Rhythm Sense: Modality and Enactive Perception in Rhythm Heaven. In M. Austin (Ed.), Music Video Games: Performance, Politics, and Play (pp. 251-272). Bloomsbury Academic.

[5] Schwartz, J.-L., & Savariaux, C. (2014). No, there is no 150 ms lead of visual speech on auditory speech, but a range of audiovisual asynchronies varying from small audio lead to large audio lag. PLoS Computational Biology, 10(7), e1003743. https://doi.org/10.1371/journal.pcbi.1003743

[6] Hagman, E. (2015). Audiovisual desynchronization impact on listening effort: How does audio delayed to visuals affect listeners’ effort to understand a news reporter in a noisy background? [Bachelor's thesis, Luleå University of Technology]. Luleå University of Technology, Department of Arts, Communication and Education.

[7] Alvarez-Molina, K., Reinschluessel, A. V., Kratky, T., Scharpenberg, M., & Malaka, R. (2024). Can you feel the rhythm? Comparing vibrotactile and auditory stimuli in the rhythm video game Jump'n'Rhythm. Behaviour & Information Technology, 43(11), 2343–2360. https://doi.org/10.1080/0144929X.2023.2243525

[8] Bellino, A. (2022). Rhythmic-synchronization-based interaction: Effect of interfering auditory stimuli, age and gender on users’ performances. Applied Sciences, 12(6), 3053. https://doi.org/10.3390/app12063053

[9] Chen, Y., Min, T., Zhao, J., & Cai, W. (2022). Synchronization in games sound: An audiovisual study on player experience and performance. In Proceedings of the 2nd Game Systems Workshop (GameSys ’22), June 14, 2022, Athlone, Ireland (7 pages). ACM. https://doi.org/10.1145/3534085.3534342

[10] Pesek, M., Hirci, N., Žnideršič, K., & Marolt, M. (2024). Enhancing music rhythmic perception and performance with a VR game. Virtual Reality, 28(118). https://doi.org/10.1007/s10055-024-01014-y

[11] Kjærbo, R., Romeu, R., Pérez, M. G., Correia, F. R., Guruvayurappan, V., Overholt, D., & Dahl, S. (2020). Rhythm Rangers: An evaluation of beat synchronisation skills and musical confidence through multiplayer gamification influence. In Proceedings of the 17th Sound and Music Computing Conference, Torino, June 24th–26th 2020.

[12] Hove, M. J., Iversen, J. R., Zhang, A., & Repp, B. H. (2012). Synchronization with competing visual and auditory rhythms: Bouncing ball meets metronome. Experimental Brain Research, 221(3), 303–313. https://doi.org/10.1007/s00221-012-3134-7

[13] Giannaraki, M., Moumoutzis, N., Papatzanis, Y., Kourkoutas, E., & Mania, K. (2020). A 3D rhythm-based serious game for collaboration improvement of children with attention deficit hyperactivity disorder (ADHD). In Proceedings of the 17th Sound and Music Computing Conference, Torino, June 24th–26th 2020.

[14] Tierney, A., White-Schwoch, T., MacLean, J., & Kraus, N. (2017). Individual differences in rhythm skills: Links with neural consistency and linguistic ability. Journal of Cognitive Neuroscience, 29(5), 855–868. https://doi.org/10.1162/jocn_a_01092

[15] Albert, I., Burkard, N., Queck, D., & Herrlich, M. (2022). The effect of auditory-motor synchronization in exergames on the example of the VR rhythm game BeatSaber. Proceedings of the ACM on Human-Computer Interaction, 6(CHI PLAY), Article 253, 26 pages. https://doi.org/10.1145/3549516

[16] Rayark Inc. (n.d.). Cytus II. Retrieved December 12, 2024, from https://rayark.com/g/cytus2/

[17] Cytus Wiki. (n.d.). Score. Retrieved December 12, 2024, from https://cytus.fandom.com/wiki/Score

[18] Ono, T., Sakurai, T., Kasuno, S., & Murai, T. (2022). Novel 3-D action video game mechanics reveal differentiable cognitive constructs in young players, but not in old. Scientific Reports, 12(1), Article 11751. https://doi.org/10.1038/s41598-022-15679-5

Academic Archive © 2024

TheGrapeEscape · Charles Schwimmer · Godfrey Yang

Rhythm Games · Cognitive Load · Experimental Design